Wednesday, 12 November 2014

Week 9: Problem Solving Episode: The Stockbroker's Algorithm

Context


            So you graduate from UofT and land your dream job.  You work a few years and realize there is this thing called inflation.  So you get a brokerage firm to invest your savings.

            To guide its investment decisions, the firm uses the Stockbroker’s algorithm, which inputs a list of real numbers and outputs the max sum over a slice of the list. 

            Unfortunately, the firm uses the following implementation of the algorithm, which is called John Smith’s Sum as nobody knows the name of the employee who wrote it:

            def john_smith_sum(L: list) -> int:
                        """
                        Return the maximum sum over a slice of L in cubic time.
                        """

                        max = front_index = 0
                        n = len(L)

                        while front_index < n:
                                    end_index = front_index
     
                                    while end_index <= n:
                                                middle_index = front_index
                                                sum = 0
               
                                                while middle_index < end_index:
                                                            sum += L[middle_index]
                                                            middle_index += 1

                                                if sum > max:
                                                max = sum

                                                end_index += 1

                                    front_index += 1

                        return max

The runtime of john_smith_sum is cubic.  This means that it is agonizingly slow for large inputs and that you will miss out on great investments.  So you decide to do the following:

Main Problem:  Come up with an implementation of the Stockbroker’s algorithm with a linear runtime.

You will then give it to the firm because you are very generous person.  First things first; how will you come up with an implementation?

Step 1:  Understand the Problem


            Polya’s How to Solve It recommends understanding the problem first.  We have to meet two goals: our implementation should be correct and linear in its runtime.  Let us answer the following to see if we know what these goals mean:

            (1) What does an iterative algorithm with a linear runtime look like?
            (2) What is the correct output of the Stockbroker’s algorithm for different inputs?

            For (1), a single loop and a monotone loop index should suffice.  For (2), work out examples:  Consider the list $[3, 1, 4]$.  The max sum is $8$ so that is what our implementation should return.  How about $[1, -2, 3, -4, 5, -6]$?  Each negative number makes all the preceding partial sums negative so the max sum is $5$ here.  What about $[3, -1, 4, -1, 5]$?.  We may be tempted to look only at positive numbers but we see that adding $-1$ after $3$ and $4$ still keeps the partial sum positive.  Hence, the max sum is $10$.

Step 2: Devise a Plan


 “No plan survives contact with the enemy” – The Desert Fox

            There are a couple of approaches to this problem.  One is to assume an arbitrary list of real numbers as the input.  We write a single implementation for that input.  However, this is a little hard as we have to juggle negative and non-negative numbers simultaneously.

            A second approach is to implement by cases through the lists in (1).  Then combine them into a single implementation.  However, we are lazy and we do not want to write multiple unrelated implementations.

            Check out this approach.  We write a single implementation that works for a simple input and then extend it to all inputs.  This may be the best approach as there appears to be only a few cases.  Hopefully, this assumption is correct.  Then this approach is our plan.

Step 3: Carry out the Plan


            This is the part where we execute the plan so try not to lose your head.  Of the three lists in (1), the easiest one to write an implementation for is $[3, 1, 4]$ as all the elements are non-negative.  By delegating a variable to store the max sum, then we have version one of our implementation:

My Sum Version 1:

            def my_sum(L: list) -> int:
                        """
                        Return the maximum sum over a slice of L in linear time.
                        """

                        n = len(L)
                        sum = max = index = 0

                        while index < n:
                                    element = L[index]
                                    sum += element    

                                    if max < sum:
                                                max = sum

                                    index += 1

                        return max

            This version features a single loop with a monotone increasing loop index so we satisfy the linear runtime component.  Unfortunately, this version only works if all the elements are positive.  It falters for lists of the following nature: $[1, -2, 3, -4, 5, -6]$.  So we break down each iteration into two cases: one for negative and one for nonnegative.  This leads to version two of our implementation:
                                                                                     
My Sum Version 2:

            def my_sum(L: list) -> int:
                        """
                        Return the maximum sum over a slice of L in linear time.
                        """

                        n = len(L)
                        sum = max = index = 0

                        while index < n:
                                    element = L[index]

                                    if element >= 0:
                                                sum += element

                                    else:
                                                sum = 0           

                                    if max < sum:
                                                max = sum

                                    index += 1

                        return max

            This version resets the sum once it iterates over a negative number.  The linear runtime is still preserved.  However, consider the following list: $[3, -1, 4, -1, 5]$.  The element $-1$ is not enough to reduce the sum to below zero.  Then we should not reset the sum unless it is below zero.  This also allows us to eliminate some code: the negative and non-negative checks and the memoization step where L[index] is assigned to the variable element:

My Sum Version 3:

            def my_sum(L: list) -> int:
                        """
                        Return the maximum sum over a slice of L in linear time.
                        """

                        n = len(L)
                        sum = max = index = 0

                        while index < n:
                                    sum += L[index]
                                    
                                    if sum < 0:
                                                sum = 0           

                                    if max < sum:
                                                max = sum

                                    index += 1

                        return max

Step 4: Look Back


            Version three is our final version but our obsessive tendencies force us to double check its correctness and runtime.  Consider the test code below.  It checks that my_sum and john_smith_sum have matching outputs for each input.

            Verify this by running the code from the link on Python.  It contains the code for john_smith sum, the final version of my_sum and the test cases:


Test Cases:

            if __name__ == "__main__":
                        L = [[3, 1, 4], [3, -1, 4, 1], [3, -1, 4, -1, 5], [3, 1, -4, 1, -5, -9],
                                                [-3, -1, -4], [1, -2, 3, -4, 5, -6], [3, -5, 7, 1, -2, 0, 3, -2, 1]]

                        for x in L:
                                    assert john_smith_sum(x) == my_sum(x)

            Likewise for the runtime, the loop executes $7$ steps for each iteration in the worst case so that is $7n$ steps in total.  Add the steps outside the loop plus the loop guard and we have $7n + 4$ as the runtime, which is linear.  Hence, we have a credible implementation of the Stockbroker's algorithm. 

2 comments:

  1. This a really concise and easy to follow blog post. I commend you for taking the time to gradually build up to the final version of the linear function. I wonder if it's possible to further improve this algorithm runtime? Maybe logn?

    ReplyDelete
  2. I doubt that you can do anything less than linear since you have to traverse through each element in the list. But thanks for the suggestion.

    ReplyDelete