Tuesday 2 December 2014

Week 12: A Fun Induction Problem

Introducing a Fellow Slogger:


            Please note that this is not a problem solving episode (see Week 9).  This final week, the instructors introduce simple induction.  A fellow slogger gives a nice induction post under Week 12 here.


You can also find my comments on the following slog under Week 3:


            What is really cool about the induction post is that the slogger talks about how induction is used in a proof of the following statement: the determinant of any $n \times n$ identity matrix is $1$.  In fact, I am so excited that a fellow slogger uses CSC165 concepts for bigger and better things that I feel compelled to follow in the slogger's footsteps.  I will also use CSC165 concepts as a guide for a problem that is outside the scope of this course:

The Fun Problem:


            Problem:  Is the following statement true: the determinant of any upper triangular matrix is
            the product of its diagonal entries?

The truth is I have no clue.  In my Problem Solving Episode under Week 9 (under the heading Step 1: Understand the Problem), I work out examples of an idea that I do not fully understand to gain intuition.  We should work out examples for this problem too.

            First, we know that an upper triangular matrix is an $n \times n$ matrix $A$ that is defined by:

$A_{ij} = \begin{cases} a_{ij} & i \leq j \\ 0 & i > j \end{cases}$.

Then the following matrices satisfy the definition nicely:

$\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}, \begin{bmatrix} 4 & 5 & 6 \\ 0 & 7 & 8 \\ 0 & 0 & 9 \end{bmatrix}$.

Secondly, let $A$ be an $n \times n$ matrix and $F$ be a field.  Then the determinant is a function $det: M_{n \times n}(F) \to F$ that is defined by:

$det(A) = \begin{cases} A_{11} & n = 1 \\ \sum\limits_{j= 1}^n (-1)^{1 + j} A_{1j} det(\bar{A_{1j}}) & n \geq 2 \end{cases}$.

There is some fancy notation here but it is not difficult to understand.  For simplicity's sake, think of $F$ as the set of real numbers.  Also, the strange bar on top of $A_{1j}$ just means $A$ with both row $1$ and column $j$ removed.  Note that this version of the definition expands along the first row.  We see that the determinant is recursively defined.  This may be useful later.
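
To make the recursive definition concrete, here is a minimal Python sketch of it (the function name det and the list-of-lists representation of a matrix are my own choices, not course material).  It should match the hand calculations below:

            def det(A):
                """Compute det(A) by cofactor expansion along the first row.

                A is a square matrix represented as a list of lists of numbers.
                """
                n = len(A)
                if n == 1:
                    return A[0][0]
                total = 0
                for j in range(n):
                    # A_bar is A with row 1 and column j + 1 removed
                    # (everything is 0-indexed here, unlike the math above).
                    A_bar = [row[:j] + row[j + 1:] for row in A[1:]]
                    total += (-1) ** j * A[0][j] * det(A_bar)
                return total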

            To find the determinant of the above matrices, use the definition.  For the $2 \times 2$ matrix, we have:

$det(\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}) = (-1)^{1 + 1}(1)(3) + (-1)^{1 + 2}(2)(0) = 3$.

For the $3 \times 3$ matrix, we have:

$det(\begin{bmatrix} 4 & 5 & 6 \\ 0 & 7 & 8 \\ 0 & 0 & 9\end{bmatrix}) = $

$(-1)^{1 + 1}(4)det(\begin{bmatrix} 7 & 8 \\ 0 & 9 \end{bmatrix}) + (-1)^{1 + 2}(5)det(\begin{bmatrix} 0 & 8 \\ 0 & 9 \end{bmatrix}) + (-1)^{1 + 3}(6)det(\begin{bmatrix} 0 & 7 \\ 0 & 0 \end{bmatrix}) = $

$(-1)^{1 + 1}(4)\big((7)(9) - (8)(0)\big) + (-1)^{1 + 2}(5)(0) + (-1)^{1 + 3}(6)(0) =$

$(4)(7)(9) + 0 + 0 = 252$.

It seems like the statement is true ($3 = 1 \cdot 3$ and $252 = 4 \cdot 7 \cdot 9$) and that we should try proving it.  Something interesting in the calculation is that the determinant of a matrix with a column of $0$s is $0$.  Again, this may be useful later.
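
For extra confidence, here is a quick check that reuses the det sketch from above on random upper triangular matrices (the sizes and entry bounds are arbitrary; a passing check is evidence, not a proof):

            import random

            # Compare det(A) against the product of the diagonal for
            # random upper triangular matrices.
            for trial in range(100):
                n = random.randint(1, 5)
                A = [[random.randint(-9, 9) if i <= j else 0 for j in range(n)]
                     for i in range(n)]
                product = 1
                for i in range(n):
                    product *= A[i][i]
                assert det(A) == product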

The Attempt at a Solution:


            How should we prove the statement?  A direct proof comes to mind but remember that the determinant is recursively defined.  This makes it difficult to work with unless we know the determinant of a previous recursive case.  Proof by simple induction sounds attractive here.  The induction hypothesis may give us the determinant of a previous recursive case.  Let us try it.

            Since a $0 \times 0$ matrix is undefined, then we show that $\forall n \in N, n \geq 1 \implies$ the determinant of an upper triangular $n \times n$ matrix is the product of its diagonal entries.  By simple induction on $n$, we show that this statement is true.  The CSC165 proof structure for induction is our guide.

            For the base case, assume that $n = 1$.  Let $A$ be a $1 \times 1$ upper triangular matrix.  By the definition of the determinant, then $det(A) = A_{11}$.  Since $A_{11}$ is the only diagonal entry, then the statement is true for $n = 1$.

            For the induction step, assume that $n \in N, n \geq 1$ and that the statement is true for $n$.  That will be our induction hypothesis.  We need to show that the statement is true for $n + 1$.  Let $A$ be an $(n + 1) \times (n + 1)$ upper triangular matrix.  By the definition of the determinant, we have that

$det(A) =  \sum\limits_{j = 1}^{n + 1} (-1)^{1 + j}A_{1j} det(\bar{A_{1j}})$.

            Remember this observation: the determinant of a matrix with a column of $0$s is $0$?  Turns out, that is because a square matrix with a column of $0$s is not invertible and some fancy theorem tells us that the determinant of such a matrix is $0$.

            Luckily, we make the following observation:

$det(A) = \sum\limits_{j = 1}^{n + 1} (-1)^{1 + j} A_{1j} det(\bar{A_{1j}}) =  (-1) ^{1 + 1}A_{11} det(\bar{A_{11}}) +  \sum\limits_{j = 2}^{n + 1} (-1)^{1 + j}A_{1j} det(\bar{A_{1j}})$.

Notice that the matrix $\bar{A_{1j}}$ is $A$ without row $1$ and without column $j$, for some $j \geq 2$.  Minus the first element, column $1$ of $A$ is a column of $0$s.  So the matrix $\bar{A_{1j}}$ also has a column of $0$s.  Thus, $det(\bar{A_{1j}}) = 0$ and

$det(A) = (-1) ^{1 + 1}A_{11} det(\bar{A_{11}}) +  \sum\limits_{j = 2}^{n + 1} (-1)^{1 + j}A_{1j} det(\bar{A_{1j}}) = (-1) ^{1 + 1}A_{11} det(\bar{A_{11}})$.

            The matrix $\bar{A_{11}}$ has both row $1$ and column $1$ removed but it is still an upper triangular $n \times n$ matrix.  By the induction hypothesis, then $det(\bar{A_{11}})$ is the product of its diagonal entries.  Notice that the diagonal entries of $\bar{A_{11}}$ are the diagonal entries of $A$ without $A_{11}$.  Since

$det(A) = (-1) ^{1 + 1}A_{11} det(\bar{A_{11}}) =  A_{11} det(\bar{A_{11}})$,


then we have that $det(A)$ is the product of the diagonal entries of $A$.  Since $A$ is an $(n + 1) \times (n + 1)$ upper triangular matrix, then the statement is true for $n + 1$.  By simple induction, then $\forall n \in N, n \geq 1 \implies$ the statement is true for $n$.  So that is my attempt at this fun problem.

Wednesday 26 November 2014

Week 11: Harder Time Complexity Problems

           There is no class this week so for fun, we go over a few harder time complexity problems.  Consider the sequences $f(n) = n^4 - 165n^3 - 240n^2 - 265n - 373$ and  $g(n) = 157n^3 + 257n^2 + 337n + 457$.

Problem 1: Show that $f \notin O(g)$ without using any limit definitions.

Problem 2:  Show that $f \in \Omega (g)$ without using any limit definitions.

Problem 1 Tips:


            The definition that we have for $f \in O(g)$ is:

$\exists c \in R^+, \exists B \in N, ( \forall n \in N, n \geq B \implies f(n) \leq cg(n))$.

Thus, its negation is:

$\forall c \in R^+, \forall B \in N, ( \exists n \in N, n \geq B \wedge f(n) > cg(n))$.

We must choose $n$.  Using the recommended strategies of my instructors, we get:

$n^4 - 165n^3 - 240n^2 - 265n - 373 \geq$

$n^4 - 1043n^3$

$???$

$c(1208n^3) \geq$

$c(157n^3 + 257n^2 + 337n + 457)$.

The $???$ part indicates the gap that the strategies of my instructors do not cover.  So I think hard here (like really hard) and I come up with this.  All we must do is find an $n$ that satisfies the inequality $n^4 - 1043n^3 > c(1208n^3)$.  We have:

$n^4 - 1043n^3 > c(1208n^3)$

Move the $-1043n^3$ on the left side to the right and factor out $n^3$ from the resulting right side:

$n^4 > n^3(1208c + 1043)$

It looks like we will choose $n > 0$.  That way, we can divide both sides by $n^3$:
$n > 1208c + 1043$.

Our choice of $n$ is then $max(\lceil 1208c + 1043 \rceil + 1, B)$.
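
To build confidence in this choice, here is a small numeric spot check (the sample values of $c$ and $B$ are arbitrary, and a finite check is of course not a proof):

            import math

            def f(n): return n**4 - 165*n**3 - 240*n**2 - 265*n - 373
            def g(n): return 157*n**3 + 257*n**2 + 337*n + 457

            # For sample values of c and B, the chosen n should satisfy
            # both n >= B and f(n) > c * g(n).
            for c in (0.5, 1, 100):
                for B in (0, 10, 10**6):
                    n = max(math.ceil(1208 * c + 1043) + 1, B)
                    assert n >= B and f(n) > c * g(n)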

Problem 2 Tips:


            Likewise, we show that $f \in \Omega (g)$, which is defined as:

$\exists c \in R^+, \exists B \in N, ( \forall n \in N, n \geq B \implies f(n) \geq cg(n))$.

We must choose $c, B$.  Choosing $c = 1$, we have a similar inequality to problem 1:

$n^4 - 1043n^3 \geq 1208n^3$

Again, we choose $B > 0$ so that we can divide both sides by $n^3$.  Then add $1043$ to both sides to yield:

$n \geq 2251$.

Of course, to choose $B = 2251$, we must choose $c = 1$.  Otherwise, the inequality in the first step no longer holds as written.
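
A quick spot check of this choice, reusing f and g from the Problem 1 check above (again, finite checking only catches arithmetic slips; it proves nothing):

            # With c = 1 and B = 2251, f(n) >= c * g(n) should hold for
            # every n past the breakpoint; we sample a range of them.
            c, B = 1, 2251
            for n in list(range(B, B + 1000)) + [10**6]:
                assert f(n) >= c * g(n)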

Tuesday 18 November 2014

Week 10: Computability and Confusion

            Computability is the first topic in CSC165 that is new to me.  It is both interesting and somewhat confusing.  Unfortunately, any confusion is undesirable at this point in the course.  After thinking about my confusion, I realize that the halting problem is the culprit.  Why is this?

Elementary Definitions and Theorems:


            First, the stuff that makes sense.  Consider a set of inputs $I$ and a set of outputs $O$.  Computability begins with three relatively simple definitions and a single theorem.

Definition 1:  Let $f: I \to O$ be a function.  Then $f$ is well-defined $\iff$ for each $x \in I$, there is a unique $y \in O$ such that $f(x) = y$.


For remaining definitions and theorems, let $g, h: I \to O$ be well-defined functions.

Definition 2:  Then $g$ is computable $\iff$ there is a program $P$ such that $\forall x \in I, g(x) = P(x)$.

Definition 3:  Then $g$ reduces to $h \iff g$ can be implemented with $h$.  Below is what $g, h$ look like in Python code if $g$ reduces to $h$.  Let $P$ be the implementation of $g$ and $Q$ be the implementation of $h$.

            def P():
                        """An implementation of the function g."""
                        return Q()

Theorem 1:  Let $g$ reduce to $h$.  Then $h$ is computable $\implies g$ is computable.
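
To make Definition 3 and Theorem 1 concrete, here is a toy example of my own (the functions $g(x) = 2x + 2$ and $h(x) = x + 1$ are hypothetical, not from lecture).  $g$ reduces to $h$ because an implementation P of $g$ only needs calls to an implementation Q of $h$:

            def Q(x):
                """An implementation of h(x) = x + 1."""
                return x + 1

            def P(x):
                """An implementation of g(x) = 2x + 2, built from calls to Q."""
                return Q(x) + Q(x)

Since Q is clearly computable, Theorem 1 tells us that P is computable too.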

The Confusing Part:


            Now, the stuff that is not so clear.  The function halt(f, i) tells us if a function $f$ with an input $i$ halts.  The halting function is not computable.  The proof of this statement starts simple enough:  Assume to the contrary that there is an implementation of halt(f, i) that works for all functions and inputs.  Then consider the function what_is_love(f):

            def what_is_love(f):

                        if halt(f, f):
                                    while True:
                                                print("Look to the skies.")
                        else:
                                    print("Nobody knows.")

The “while True” statement is the incomprehensible part of the proof.  I ask in lecture why it is there.  The instructor lets a classmate several rows ahead of me answer my question.  The explanation takes nearly half a minute.  I did not get any of it.  Apparently, the instructor did not have any problems with the explanation nor did my other classmates.

The Comprehending Part:


            It takes several days for me to understand why the “while True” statement is there.  Eventually, I look at what we are proving.  We assume that there is an implementation of halt(f, i) that works for all functions and inputs.  To disprove, we only have to show that one function and input does not work for halt(f, i).  This is a classic existential proof.  We are free to construct the function what_is_love(f) however we want to so long as it disproves the computability of halt(f, i).

            The “while True” statement guarantees that what_is_love(what_is_love) goes into an infinite loop if halt(what_is_love, what_is_love) returns True.  Likewise, what_is_love(what_is_love) halts if halt(what_is_love, what_is_love) returns False.  This is the contradiction and it disproves the computability of halt(f, i).  I wonder why that classmate who answered my question did not explain it like this....

Wednesday 12 November 2014

Week 9: Problem Solving Episode: The Stockbroker's Algorithm

Context


            So you graduate from UofT and land your dream job.  You work a few years and realize there is this thing called inflation.  So you get a brokerage firm to invest your savings.

            To guide its investment decisions, the firm uses the Stockbroker’s algorithm, which takes a list of real numbers as input and outputs the maximum sum over a slice of the list.

            Unfortunately, the firm uses the following implementation of the algorithm, which is called John Smith’s Sum as nobody knows the name of the employee who wrote it:

            def john_smith_sum(L: list) -> int:
                        """
                        Return the maximum sum over a slice of L in cubic time.
                        """

                        max = front_index = 0
                        n = len(L)

                        while front_index < n:
                                    end_index = front_index
     
                                    while end_index <= n:
                                                middle_index = front_index
                                                sum = 0
               
                                                while middle_index < end_index:
                                                            sum += L[middle_index]
                                                            middle_index += 1

                                                if sum > max:
                                                            max = sum

                                                end_index += 1

                                    front_index += 1

                        return max

The runtime of john_smith_sum is cubic.  This means that it is agonizingly slow for large inputs and that you will miss out on great investments.  So you decide to do the following:

Main Problem:  Come up with an implementation of the Stockbroker’s algorithm with a linear runtime.

You will then give it to the firm because you are a very generous person.  First things first; how will you come up with an implementation?

Step 1:  Understand the Problem


            Polya’s How to Solve It recommends understanding the problem first.  We have to meet two goals: our implementation should be correct and linear in its runtime.  Let us answer the following to see if we know what these goals mean:

            (1) What does an iterative algorithm with a linear runtime look like?
            (2) What is the correct output of the Stockbroker’s algorithm for different inputs?

            For (1), a single loop and a monotone loop index should suffice.  For (2), work out examples:  Consider the list $[3, 1, 4]$.  The max sum is $8$ so that is what our implementation should return.  How about $[1, -2, 3, -4, 5, -6]$?  Each negative number makes all the preceding partial sums negative so the max sum is $5$ here.  What about $[3, -1, 4, -1, 5]$?  We may be tempted to look only at positive numbers but we see that adding $-1$ after $3$ and $4$ still keeps the partial sum positive.  Hence, the max sum is $10$.

Step 2: Devise a Plan


 “No plan survives contact with the enemy” – Helmuth von Moltke

            There are a couple of approaches to this problem.  One is to assume an arbitrary list of real numbers as the input.  We write a single implementation for that input.  However, this is a little hard as we have to juggle negative and non-negative numbers simultaneously.

            A second approach is to implement by cases through the example lists in (2).  Then combine them into a single implementation.  However, we are lazy and we do not want to write multiple unrelated implementations.

            Check out this approach.  We write a single implementation that works for a simple input and then extend it to all inputs.  This may be the best approach as there appear to be only a few cases.  Hopefully, this assumption is correct.  Then this approach is our plan.

Step 3: Carry out the Plan


            This is the part where we execute the plan so try not to lose your head.  Of the three lists in (2), the easiest one to write an implementation for is $[3, 1, 4]$ as all the elements are non-negative.  By delegating a variable to store the max sum, we have version one of our implementation:

My Sum Version 1:

            def my_sum(L: list) -> int:
                        """
                        Return the maximum sum over a slice of L in linear time.
                        """

                        n = len(L)
                        sum = max = index = 0

                        while index < n:
                                    element = L[index]
                                    sum += element    

                                    if max < sum:
                                                max = sum

                                    index += 1

                        return max

            This version features a single loop with a monotone increasing loop index so we satisfy the linear runtime component.  Unfortunately, this version only works if all the elements are non-negative.  It falters for lists of the following nature: $[1, -2, 3, -4, 5, -6]$.  So we break down each iteration into two cases: one for negative and one for non-negative.  This leads to version two of our implementation:
                                                                                     
My Sum Version 2:

            def my_sum(L: list) -> int:
                        """
                        Return the maximum sum over a slice of L in linear time.
                        """

                        n = len(L)
                        sum = max = index = 0

                        while index < n:
                                    element = L[index]

                                    if element >= 0:
                                                sum += element

                                    else:
                                                sum = 0           

                                    if max < sum:
                                                max = sum

                                    index += 1

                        return max

            This version resets the sum once it iterates over a negative number.  The linear runtime is still preserved.  However, consider the following list: $[3, -1, 4, -1, 5]$.  The element $-1$ is not enough to reduce the sum to below zero.  Then we should not reset the sum unless it is below zero.  This also allows us to eliminate some code: the negative and non-negative checks and the step where L[index] is stored in the variable element:

My Sum Version 3:

            def my_sum(L: list) -> int:
                        """
                        Return the maximum sum over a slice of L in linear time.
                        """

                        n = len(L)
                        sum = max = index = 0

                        while index < n:
                                    sum += L[index]
                                    
                                    if sum < 0:
                                                sum = 0           

                                    if max < sum:
                                                max = sum

                                    index += 1

                        return max

Step 4: Look Back


            Version three is our final version but our obsessive tendencies force us to double check its correctness and runtime.  Consider the test code below.  It checks that my_sum and john_smith_sum have matching outputs for each input.

            Verify this by running the code from the link in Python.  It contains the code for john_smith_sum, the final version of my_sum and the test cases:


Test Cases:

            if __name__ == "__main__":
                        L = [[3, 1, 4], [3, -1, 4, 1], [3, -1, 4, -1, 5], [3, 1, -4, 1, -5, -9],
                                                [-3, -1, -4], [1, -2, 3, -4, 5, -6], [3, -5, 7, 1, -2, 0, 3, -2, 1]]

                        for x in L:
                                    assert john_smith_sum(x) == my_sum(x)

            Likewise for the runtime, the loop executes $7$ steps for each iteration in the worst case so that is $7n$ steps in total.  Add the $3$ steps outside the loop plus the final loop guard and we have $7n + 4$ as the runtime, which is linear.  Hence, we have a credible implementation of the Stockbroker's algorithm.
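
For the skeptical, here is a rough timing experiment of my own (results are machine-dependent, and it assumes john_smith_sum and my_sum from above are defined): doubling $n$ should multiply john_smith_sum's time by roughly $8$ and my_sum's time by roughly $2$:

            import random
            import time

            # Time both implementations on random lists of growing size.
            for n in (200, 400, 800):
                L = [random.randint(-100, 100) for _ in range(n)]
                start = time.perf_counter()
                john_smith_sum(L)
                cubic = time.perf_counter() - start
                start = time.perf_counter()
                my_sum(L)
                linear = time.perf_counter() - start
                print(n, round(cubic, 4), round(linear, 6))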

Tuesday 4 November 2014

Week 8: Time Complexity

            The post in Week 7 talks about algorithms being linear, quadratic or cubic.  Such descriptions are neither precise nor fancy enough to keep computer scientists satisfied.  So we have this thing called the time complexity of an algorithm, which describes the growth in runtime as the input size grows.

Here Comes the Notation!


            Time complexity is described in Big O.  It looks something like this.  Let $f: N \to R^{\geq 0}$ be a sequence that describes the runtime of an algorithm $P$ and $g: N \to R^{\geq 0}$ be a sequence that describes an arbitrary runtime (i.e. linear, quadratic, cubic, etc).  If we say that $f \in O(g)$, then we mean that:

\[\exists c \in R^+, \exists B \in N, (\forall n \in N, n \geq B \implies f(n) \leq cg(n))\]

Intuitively, what these little symbols say is that as the input becomes arbitrarily large, the runtime of $P$ grows no faster than $g$, up to a constant factor.  Likewise, if we say that $f \in \Omega (g)$, then we mean that:

\[\exists c \in R^+, \exists B \in N, (\forall n \in N, n \geq B \implies f(n) \geq cg(n))\]

Again, this is just a fancy way of saying that as the input becomes arbitrarily large, the runtime of $P$ grows at least as fast as $g$, up to a constant factor.
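
These definitions can be spot-checked numerically.  Below is a little helper of my own invention (check_O is not a standard function) that tests a Big O claim for specific witnesses $c$ and $B$ over a finite range; passing is evidence for the claim, not a proof:

            def check_O(f, g, c, B, trials=10**4):
                """Spot-check that f(n) <= c * g(n) for n = B, ..., B + trials."""
                return all(f(n) <= c * g(n) for n in range(B, B + trials))

            # For example, n^2 is in O(n^3) with witnesses c = 1 and B = 1.
            print(check_O(lambda n: n**2, lambda n: n**3, c=1, B=1))  # True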

A Fun Problem


            Let $f: N \to R^{\geq 0}$ be a polynomial sequence of degree $m$ and $g: N \to R^{\geq 0}$ be a polynomial sequence of degree $n$ with $m, n \in N$.  Find all $m, n$ such that $f \in O(g)$ and $(f \times g) \in \Omega(f)$.

Monday 27 October 2014

Week 7: Nice Mathematical Toys in Algorithm Analysis

            Constant, linear, quadratic, logarithmic or linearithmic are terms that vaguely describe the worst case runtime of algorithms in CSC148 but they are not precise enough in CSC165.  We need the exact number of steps.  This calls for some nice mathematical toys, which we introduce with the following algorithm:
           
            def mystery_sort(L: list):
                        """
                        Sort the elements of L in mysterious order.
                        """

                        n = len(L)
                        outer_index = 1

                        while outer_index < n:
                                    value = L[outer_index]
                                    inner_index = outer_index

                                    while inner_index > 0 and L[inner_index - 1] < value:
                                                L[inner_index] = L[inner_index - 1]
                                                inner_index = inner_index - 1

                                    L[inner_index] = value
                                    outer_index = outer_index + 1

            Since mystery_sort has a nested loop with monotone loop indexes, then its worst-case runtime is quadratic.  What is less obvious is the exact number of steps that it takes for mystery_sort to run in the worst case.  Our procedure for counting the steps of mystery_sort is as follows:

            (1) Assume that each line represents one step to make the analysis easier.
            (2) Count from the inner loop to the outer loop.
            (3) Add up the steps that are outside the outer loop.

So assume (1).  Then do (2) and (3).

Inner Loop


            To count the steps of the inner loop, our first mathematical toy comes into play: abstraction.  It helps us determine the number of times that the inner loop executes.  Observe that the inner loop depends on the outer_index.  However, the outer_index is not concrete since it changes each time the outer loop executes.  So we must think abstractly; use an arbitrary integer $i$ to represent the outer_index.  Then $i$ becomes the number of times that the inner loop executes.  Now the easy part: the inner loop performs three steps for each iteration.  Since the inner loop executes $i$ times, then the number of steps of the inner loop is $3i$.

Outer Loop


           In addition to the number of steps of the inner loop, we have the following steps inside each outer loop:

            (1) while outer_index < n
            (2) value = L[outer_index]
            (3) inner_index = outer_index
            (4) loop guard of the inner loop
            (5) A[inner_index] = value
            (6) outer_index = outer_index + 1

This is $6$ steps in total.  Add it to the steps of the inner loop and you see that each iteration of the outer loop executes $3i + 6$ steps.  We still have to add up the steps for each iteration so here our second mathematical toy comes into play: summation.  Since the outer loop runs $n - 1$ times, then the total steps of the outer loop is the following summation:

\[\sum_{i = 1}^{n - 1} (3i + 6)\].

            To find the sum of this series, our final mathematical toy comes into play: Gauss's trick.  This is a technique for finding the sum of an arithmetic series.  First decompose the summation:

\[\sum_{i = 1}^{n - 1} (3i + 6) = 3 \sum_{i = 1}^{n - 1} i + \sum_{i = 1}^{n - 1} 6\]

Then perform the following calculation:

\[\sum_{i = 1}^{n - 1} i = 1 + 2 + 3 + … + (n - 3) + (n - 2) + (n - 1) =\]
\[\frac{1}{2}(1 + 2 + 3 + … + (n - 3) + (n - 2) + (n - 1) + \]
\[(n - 1) + (n - 2) + (n - 3) + … + 3 + 2 + 1) =\]
\[\frac{n(n - 1)}{2} = \frac{n^2 - n}{2}\]

Substitute back into the series and we get the number of steps that the outer loop executes:

\[\frac{3n^2 - 3n + 12n - 12}{2} = \]
\[\frac{3n^2 + 9n - 12}{2}\]

Outside Both Loops


There are still the following steps to consider:

(1) n = len(L)
(2) outer_index = 1
(3) loop guard of the outer loop

This is $3$ steps so the total number of steps that the algorithm performs is:

\[\frac{3n^2 + 9n - 6}{2}\]
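
To double-check this formula, we can instrument mystery_sort with a step counter that follows the same one-step-per-line convention (the placement of the counting statements is my own reading of the analysis above):

            def mystery_sort_steps(L):
                """Count the steps of mystery_sort, one step per line."""
                steps = 2                      # n = len(L); outer_index = 1
                n = len(L)
                outer_index = 1
                while outer_index < n:
                    steps += 3                 # outer guard + two assignments
                    value = L[outer_index]
                    inner_index = outer_index
                    while inner_index > 0 and L[inner_index - 1] < value:
                        steps += 3             # inner guard + two assignments
                        L[inner_index] = L[inner_index - 1]
                        inner_index = inner_index - 1
                    steps += 3                 # failing inner guard + two lines
                    L[inner_index] = value
                    outer_index = outer_index + 1
                steps += 1                     # failing outer loop guard
                return steps

            # The worst case is an ascending list (every element shifts past
            # all previous ones); the count should match (3n^2 + 9n - 6) / 2.
            for n in range(1, 50):
                assert mystery_sort_steps(list(range(n))) == (3*n*n + 9*n - 6) // 2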

Monday 20 October 2014

Week 6: $\epsilon - \delta$ Definition of Limit

            As someone who has no prior experience with the $\epsilon - \delta$ definition of limit, I will teach myself about it with the help of my favourite math textbook and some creativity.  This is what I have gleaned so far.

            In high school, we calculate limits.  In university, we like to think that we are sophisticated thinkers. Knowing how to calculate is not enough for us.  We must know why our calculation is correct.  Then we need a definition to work with.  This is where the $\epsilon - \delta$ definition of limit comes in:

            Let $X \subseteq R$, $f:X \to R$ be a function, $a \in X$ and $L \in R$.  Then

\[\lim_{x \to a} f(x) = L \iff\]
\[\forall \epsilon \in R^+, \exists \delta \in R^+, (\forall x \in X, 0 < |x - a| < \delta \implies |f(x) - L| < \epsilon).\]

After digesting this definition, we can prove or disprove statements about the limits of functions.
For example, consider the following statement:

\[\lim_{x \to 2} x^2 = 4\]

By substituting two into $x$, we discover that this statement should be true.  To show that our hunch is correct, we plug in $a = 2$ and $L = 4$ into the $\epsilon - \delta$ definition and prove it:

\[\forall \epsilon \in R^+, \exists \delta \in R^+, (\forall x \in R, 0 < |x - 2| < \delta \implies |x^2 - 4| < \epsilon)\]

Our value for $\epsilon$ will be arbitrary but our value for $\delta$ must be concrete since $\epsilon$ has a $\forall$ quantifier but $\delta$ has an $\exists$ quantifier.  The hard part is choosing $\delta$.  How do we do it?

Where to Start


            Think of $\epsilon$ as a distance from $L$ on the vertical axis and $\delta$ as a distance from $2$ on the horizontal axis.  Once the distance of $x$ from $2$ is less than $\delta$ on the horizontal, then the $\epsilon - \delta$ definition of limit guarantees that the distance of $x^2$ from $4$ is less than $\epsilon$ on the vertical.  Hence, our choice of $\delta$ must satisfy the definition for all $\epsilon$.  To achieve this, we express $\delta$ in terms of $\epsilon$, which is done by working backwards from the definition.

How to Work Backwards


       Since the definition is an implication, then working backwards begins with the consequent:

$|x^2 - 4| < \epsilon$

Since we operate in $R$, then we can manipulate the left side of the inequality:

$|x + 2||x - 2| = |x^2 - 4| < \epsilon$

We see that the $|x - 2|$ in $|x + 2||x - 2| < \epsilon$ will be useful in getting the antecedent of the definition, which is $|x - 2| < \delta$.  However, there is still a bit of baggage to unload.

The Creative Step


            Since we choose our $\delta$, then we can give it an upper bound.  For simplicity, we choose $1$ as an upper bound for $\delta$.  So if $|x - 2| < \delta$, then $|x - 2| < 1$.

            From $|x - 2| < 1$, the absolute value definition gives $-1 < x - 2 < 1$.  By adding $4$ to all sides of the inequality, then we have $3 < x + 2 < 5$.  Use the absolute value definition again, which gives us $|x + 2| < 5$.

Now we can unload $|x + 2|$ from $|x + 2||x - 2| < \epsilon$.  Since $|x + 2| < 5$, then $|x + 2||x - 2| < 5|x - 2|$.  Then $5|x - 2| < \epsilon$ is equivalent to $|x - 2| < \frac{\epsilon}{5}$.

Endgame


We can now choose $\delta$.  If $|x - 2| < \frac{\epsilon}{5}$ is true, then its equivalent statement $5|x - 2| < \epsilon$ guarantees that $|x - 2||x + 2| = |x^2 - 4| < \epsilon$ since $|x + 2| < 5$.  Yet we chose $1$ as an upper bound for $\delta$.  So if $\frac{\epsilon}{5} > 1$, then $\delta = 1$.  Hence our choice of $\delta$ is $min(1, \frac{\epsilon}{5})$.  The proof follows naturally from the scratch work so it will be omitted.
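
As a sanity check on the scratch work, here is a quick random test of my own of the choice $\delta = min(1, \frac{\epsilon}{5})$ (sampling is evidence that we did not slip up, not a proof):

            import random

            # Whenever 0 < |x - 2| < delta, we should get |x^2 - 4| < epsilon.
            for trial in range(10**5):
                epsilon = random.uniform(0.001, 10)
                delta = min(1, epsilon / 5)
                x = 2 + random.uniform(-delta, delta)
                if 0 < abs(x - 2) < delta:
                    assert abs(x**2 - 4) < epsilon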

Monday 13 October 2014

Week 5: The Pros and Pitfalls of CSC165 Proofs


Many of my classmates have their first exposure to proofs in CSC165 so this class caters towards them.  I think this is fair but I am sure that at least some of my classmates did proofs in other classes.  I became used to the essay-like proofs of my previous proof-oriented math classes: MAT138 (Intro to Proofs) and MAT240 (Algebra I).  So when the first proofs in CSC165 presented themselves in mainly logical sentences instead of mainly English sentences, I was pleasantly surprised.  Their aesthetic appearance reminds me of programming code.  Why are the proofs in CSC165 presented this way?

The Pros


After thinking about it for a few days, I arrive at two justifications.  Proofs can seem overwhelming to those of us who are new to them; this certainly was the case for me.  Presenting the proofs in the style of programming code can make them less intimidating to those of us who are comfortable with programming.  In addition, the use of mainly logical sentences instead of mainly English sentences can make it easier to catch logic errors.




The Pitfalls



            However, there is a downside to the CSC165 way of writing the proofs.  Where is the intuition? There is emphasis on annotating the logical sentences correctly but not much focus on how to prove a claim.  How do we decide between a direct proof and a proof by contradiction if we are looking for a sound and efficient way to prove?  Do we memorize the cases for when one proof strategy works better than another?  Experience tells me that this is a bad idea because we will be at a loss if we see a new, unfamiliar claim to prove.




So what is the best way to gain intuition?  My opinion is to try things out.  We should think about different ways to prove a claim.  We should think about how the mathematical structures in a claim make a specific proof strategy efficient or inefficient.  Unfortunately, the focus on annotating the logical sentences correctly and making the aesthetics of a proof comparable to programming code is contrary to the goal of gaining intuition.  If CSC165 can focus more on the strategies for approaching proofs, then we might benefit more from the logical rigour through which the course presents its proofs.

Friday 3 October 2014

Week 4: Logical Chameleons


Suppose that you walk into the forest today and see a purple chameleon.  Then next week, you walk into the same forest and see an orange chameleon.  The orange chameleon looks similar to the purple chameleon except for its colour.  Surely they are not the same chameleon, right?


A Purple Panther Chameleon

An Orange Panther Chameleon


            In logic, a similar problem occurs.  Consider these two logical sentences.  Let $D$ be a set and $P, Q$ be predicates:

                        (1) $\exists x \in D, P(x) \implies Q(x)$
                        (2)  $( \forall x \in D, P(x)) \implies ( \exists x \in D, Q(x))$

These two sentences look very different.  Surely they are not the same sentence, right? 

            For chameleons, you are out of luck but for logic, we can use logical axioms, which link together equivalent logical statements, to determine if two logical sentences are saying the same thing.  Let $S$ be a set and $P, Q, R$ be sentences.  These are our axioms:

Table of Logical Axioms for Dummies

Commutativity:
$P \vee Q \iff Q \vee P$
$P \wedge Q \iff Q \wedge P$

Associativity:
$P \vee (Q \vee R) \iff (P \vee Q) \vee R$
$P \wedge (Q \wedge R) \iff (P \wedge Q) \wedge R$

Distributivity:
$P \vee (Q \wedge R) \iff (P \vee Q) \wedge (P \vee R)$
$P \wedge (Q \vee R) \iff (P \wedge Q) \vee (P \wedge R)$

Implication:
$(P \implies Q) \iff (\neg P \vee Q)$

Bi-implication:
$(P \iff Q) \iff ((P \implies Q) \wedge (Q \implies P))$

Identity:
$P \wedge (Q \vee \neg Q) \iff P$
$P \vee (Q \wedge \neg Q) \iff P$

Idempotency:
$P \wedge P \iff P$
$P \vee P \iff P$

DeMorgan’s Law:
$\neg (P \wedge Q) \iff \neg P \vee \neg Q$
$\neg (P \vee Q) \iff \neg P \wedge \neg Q$

Quantifier Negation:
$\neg (\forall x \in S, P(x)) \iff \exists x \in S, \neg P(x)$
$\neg (\exists x \in S, P(x)) \iff \forall x \in S, \neg P(x)$

Quantifier Commutativity:
$\forall x \in S, \forall y \in S, P(x, y) \iff \forall y \in S, \forall x \in S, P(x, y)$
$\exists x \in S, \exists y \in S, P(x, y) \iff \exists y \in S, \exists x \in S, P(x, y)$

Quantifier Distributivity:
$(\forall x \in S, P(x) \wedge Q(x)) \iff (\forall x \in S, P(x)) \wedge (\forall x \in S, Q(x))$
$(\exists x \in S, P(x) \vee Q(x)) \iff (\exists x \in S, P(x)) \vee (\exists x \in S, Q(x))$


Our job is to determine if (1) and (2) are equivalent with these axioms.  Start with (1):

$\exists x \in D, P(x) \implies Q(x)$

By the implication axiom, then (1) has the same meaning as this sentence:

$\exists x \in D, \neg P(x) \vee Q(x)$

By the quantifier distributivity axiom, then (1) camouflages into this sentence:

$(\exists x \in D, \neg P(x)) \vee (\exists x\in D, Q(x))$

By the quantifier negation axiom, $\exists x \in D, \neg P(x)$ becomes $\neg (\forall x \in D, P(x))$.  Use the implication axiom again so that (1) looks like this sentence:

$(\forall x \in D, P(x)) \implies (\exists x\in D, Q(x))$


We see that sentences (1) and (2) are equivalent even though they look different.  Appearances can be deceiving!
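
If you do not trust the axioms, here is a brute-force check of my own: enumerate every pair of predicates $P, Q$ over a three-element domain and confirm that (1) and (2) always agree (a finite check over one domain, so evidence rather than proof):

            from itertools import product

            # Enumerate every truth assignment for P and Q over D = {0, 1, 2}.
            D = range(3)
            for P_vals in product([False, True], repeat=len(D)):
                for Q_vals in product([False, True], repeat=len(D)):
                    # Sentence (1): exists x in D, P(x) implies Q(x).
                    s1 = any((not P_vals[x]) or Q_vals[x] for x in D)
                    # Sentence (2): (forall x, P(x)) implies (exists x, Q(x)).
                    s2 = (not all(P_vals[x] for x in D)) or any(Q_vals[x] for x in D)
                    assert s1 == s2
            print("(1) and (2) agree on all 64 assignments")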

Friday 26 September 2014

Week 3: Fancy Translations from English to Logic


Last week, we saw that a sentence in a natural language can be represented in the formal language of logic through predicates, implications and quantifiers.  Yet we can be a little fancier with our representations by introducing two new members to our zoo of logical symbols. 

Suppose that we want to represent these English statements in logic:

(1) Some student that is in CSC165 is both cool and likes math.
(2) All students that are in CSC165 are either cool or like math.

Let $U$ be the set of all students.  Let $C(x)$ mean $x$ is in CSC165, $K(x)$ mean $x$ is cool and $M(x)$ mean $x$ likes math.  The conjunction symbol, which represents the English word “and”, and the disjunction symbol, which represents the English word “or”, allow the translation of statements (1) and (2) into logic.  (Note that the existential statement uses a conjunction; an implication under an existential quantifier would be satisfied vacuously by any student outside CSC165.)

                        (1) $\exists x \in U, C(x) \wedge (K(x) \wedge M(x))$.
                        (2) $\forall x \in U, C(x) \implies (K(x) \vee M(x))$.

Now let us play a game of I-Spy.   See these statements:

                        (3) $\exists x \in U, C(x) \wedge (K(x) \implies M(x))$.
                        (4) $\forall x \in U, C(x) \implies (K(x) \implies M(x))$.

I spy a nuance between statements (1) and (3) and between statements (2) and (4).  What is this nuance?  It is the relation between the predicates $K(x)$ and $M(x)$.  Replacing either the conjunction or disjunction symbol in a statement with an implication symbol changes the meaning of that statement.  Let us illustrate the difference between statements (1) and (3) and statements (2) and (4) with a CSC165 student named Bobby.

Bobby wants to take up the burden of satisfying statement (1).  Then he needs to be cool and he has to like math.  Later Bobby decides that this is too much of a burden for him so he decides to satisfy statement (3) instead.  From last week's post about implications, we know that he has a choice of either being uncool or keeping his fondness for math.  Hence, the implication is much more flexible than the conjunction.


Now Bobby wants to take up the burden of satisfying statement (2).  He campaigns to convince all his CSC165 classmates to either be cool or like math.  His classmates rebel and decide to neither be cool nor like math.  Yet Bobby is enterprising.  He decides to satisfy statement (4) instead.  Then he needs to get all his classmates to either be uncool or like math.  However, his CSC165 classmates are just as adept at defying Bobby and they decide to be both cool and dislike math.  Poor Bobby!  Hence, the implication and disjunction are different in how they are satisfied but they are the same in their flexibility.
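
We can act out Bobby's predicament in Python (the encoding of a student as a triple of booleans and the function names s1 through s4 are my own devices):

            # Each student is a triple (in CSC165, cool, likes math).
            def s1(U): return any(c and (k and m) for (c, k, m) in U)
            def s2(U): return all((not c) or (k or m) for (c, k, m) in U)
            def s3(U): return any(c and ((not k) or m) for (c, k, m) in U)
            def s4(U): return all((not c) or ((not k) or m) for (c, k, m) in U)

            bobby = [(True, False, False)]    # in CSC165, uncool, dislikes math
            print(s1(bobby), s3(bobby))       # False True: (3) is the lighter burden

            rebels = [(True, True, False)]    # cool classmates who dislike math
            print(s4(rebels))                 # False: poor Bobby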

Friday 19 September 2014

Week 2: Introducing... the Logical Zoo

The English Language


            Currently among the most widely used languages in the world, the English language has a diverse vocabulary, a sophisticated grammar system and a rich history that dates back to the end of the Roman Empire.  Like all natural languages, English allows for ambiguity of expression.  Take a look at this sentence:
           
Panda mating fails: veterinarian takes over.

Either one of two things is being said:

(1) The veterinarian will investigate the pandas.
(2) The pandas will have to mate with the veterinarian.

Common sense is enough here to determine what the sentence-writer meant.  Yet there are occasions in which both interpretations are plausible.  What if you have a weird friend who said this to you:

"People are fashionably hungry unless they eat Satsumas"

Your friend means either one of two things:

(1) People that are fashionably hungry do not eat Satsumas.
(2) People that do not eat Satsumas are fashionably hungry.

Neither common sense nor the semantics of the English language tells us beyond doubt what your friend meant.  We need some other tool.


The Language of Logic


            Logic can solve the riddle of what your friend said.  The language of logic uses symbols to represent meaning just like English uses the alphabet to represent words.  Yet unlike English, logic communicates everything in terms of whether something is true or false.  The following table introduces common logical symbols:

Table of Logical Symbols for Dummies

$x, y, z, \ldots$ (Variable): A variable represents some real-world object.  It can be a person, word, number or any object you like.

$S, T, U, \ldots$ (Set): A set is an unordered collection of distinct objects.  It can be a set of people, numbers, cars or any objects you like.

$P(x), Q(x), R(x), \ldots$ (Predicate): A predicate is a function that maps some variable to the value “true” or the value “false”.  Think of it as telling the reader whether or not some real-world object has some attribute.

$\exists$ (There Exists): A quantifier that conveys that at least one member of some set has some attribute.

$\forall$ (For All): A quantifier that conveys that all members of some set have some attribute.

$\implies$ (If... then...): A composition of two sentences.  The first is an assumption, which is called the antecedent.  The second is a conclusion, called the consequent.  Its structure is as follows: if the antecedent is true, then the consequent is true.

$\neg$ (Negation): An operation that can be applied to a sentence.  A negated sentence is false if the original sentence is true and vice-versa.

These are all the symbols that we need to translate what your weird friend said from English to logic.  There are two more hoops to jump through, however.  The English word “unless” is translated to logic as an implication that has a negated antecedent.  Additionally, since "unless" is in the middle of the sentence, the overall sentence is in the form of "then... if...".  Now, we can translate the sentence:

Let $P$ be a set of all people and $H(x), S(x)$ be predicates with the following definitions:

$H(x): x$ is fashionably hungry
$S(x): x$ eats Satsumas

Then the sentence “People are fashionably hungry unless they eat Satsumas” is translated to logic as:

$\forall x \in P, \neg S(x) \implies H(x)$


The meaning of this logical statement in English is “People that do not eat Satsumas are fashionably hungry", so your weird friend meant (2).
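
As a final check, here is a tiny truth table of my own confirming the translation: "P unless Q" as $\neg Q \implies P$ is satisfied exactly when $Q \vee P$ holds:

            from itertools import product

            # "P unless Q" as (not Q) implies P, which is equivalent to Q or P.
            for P, Q in product([False, True], repeat=2):
                print(P, Q, Q or P)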