Tuesday 2 December 2014

Week 12: A Fun Induction Problem

Introducing a Fellow Slogger:


            Please note that this is not a problem solving episode (see Week 9).  This final week, the instructors introduce simple induction.  A fellow slogger gives a nice induction post under Week 12 here.


You can also find my comments to the following slog under Week 3:


            What is really cool about the induction post is that the slogger talks about how induction is used in a proof of the following statement: the determinant of any $n \times n$ identity matrix is $1$.  In fact, I am so excited that a fellow slogger uses CSC165 concepts for greater and better things that I feel compelled to follow in the slogger's footsteps.  I will also use CSC165 concepts as a guide for a problem that is outside the scope of this course:

The Fun Problem:


            Problem:  Is the following statement true: the determinant of any upper triangular matrix is
            the product of its diagonal entries?

The truth is I have no clue.  In my Problem Solving Episode under Week 9 (under the heading Step 1: Understand the Problem), I work out examples of an idea that I do not fully understand to gain intuition.  We should work out examples for this problem too.

            First we know that an upper triangular matrix is an $n \times n$ matrix $A$ whose entries are defined by:

$A_{ij} = \begin{cases} a_{ij} & i \leq j \\ 0 & i > j \end{cases}$.

Then the following matrices satisfy the definition nicely:

$\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}, \begin{bmatrix} 4 & 5 & 6 \\ 0 & 7 & 8 \\ 0 & 0 & 9 \end{bmatrix}$.

Secondly, let $A$ be an $n \times n$ matrix and $F$ be a field.  Then the determinant is a function $det: M_{n \times n}(F) \to F$ that is defined by:

$det(A) = \begin{cases} A_{11} & n = 1 \\ \sum\limits_{j= 1}^n (-1)^{1 + j} A_{1j} det(\bar{A_{1j}}) & n \geq 2 \end{cases}$.

There is some fancy notation here but it is not difficult to understand.  For simplicity's sake, think of $F$ as the set of real numbers.  Also, the strange bar on top of $A_{1j}$ just means $A$ with both row $1$ and column $j$ removed, so the sum expands the determinant along the first row.  We see that the determinant is recursively defined.  This may be useful later.
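
            Since the determinant is defined recursively, it translates almost directly into Python.  Below is a minimal sketch of that translation (the function name det and the list-of-rows representation are my own choices, not anything from the course):

            def det(A: list) -> float:
                        """Return the determinant of the square matrix A (a list of rows)
                        by expanding along the first row, mirroring the recursive definition."""
                        n = len(A)
                        if n == 1:
                                    return A[0][0]
                        total = 0
                        for j in range(n):
                                    # A_bar is A with row 1 and column j + 1 removed (0-indexed here).
                                    A_bar = [row[:j] + row[j + 1:] for row in A[1:]]
                                    total += (-1) ** j * A[0][j] * det(A_bar)
                        return total

For example, det([[1, 2], [0, 3]]) returns 3, matching the calculation below.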

            To find the determinant of the above matrices, use the definition.  For the $2 \times 2$ matrix, we have:

$det(\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}) = (-1)^{1 + 1}(1)(3) + (-1)^{1 + 2}(2)(0) = 3$.

For the $3 \times 3$ matrix, we have:

$det(\begin{bmatrix} 4 & 5 & 6 \\ 0 & 7 & 8 \\ 0 & 0 & 9\end{bmatrix}) = $

$(-1)^{1 + 1}(4)det(\begin{bmatrix} 7 & 8 \\ 0 & 9 \end{bmatrix}) + (-1)^{1 + 2}(5)det(\begin{bmatrix} 0 & 8 \\ 0 & 9 \end{bmatrix}) + (-1)^{1 + 3}(6)det(\begin{bmatrix} 0 & 7 \\ 0 & 0 \end{bmatrix}) = $

$(4)\big[(7)(9) - (8)(0)\big] + (-1)(5)\big[(0)(9) - (8)(0)\big] + (6)\big[(0)(0) - (7)(0)\big] =$

$(4)(7)(9) + 0 + 0 = 252$.

It seems like the statement is true and that we should try proving it.  Something interesting in the calculation is that the determinant of a matrix with a column of $0$s is $0$.  Again, this may be useful later.
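
            Before attempting a proof, a quick numerical sanity check never hurts.  Below is a small sketch that compares the determinant with the product of the diagonal entries for the two example matrices, assuming NumPy is available (NumPy is my own choice of tool here, not something from the course):

            import numpy as np

            # The two upper triangular examples from above.
            examples = [np.array([[1, 2], [0, 3]], dtype=float),
                        np.array([[4, 5, 6], [0, 7, 8], [0, 0, 9]], dtype=float)]

            for A in examples:
                        # Compare the determinant with the product of the diagonal entries.
                        assert np.isclose(np.linalg.det(A), np.prod(np.diag(A)))

Both checks pass for these examples, which is encouraging.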

The Attempt at a Solution:


            How should we prove the statement?  A direct proof comes to mind but remember that the determinant is recursively defined.  This makes it difficult to work with unless we know the determinant of a previous recursive case.  Proof by simple induction sounds attractive here.  The induction hypothesis may give us the determinant of a previous recursive case.  Let us try it.

            Since a $0 \times 0$ matrix is undefined, we show that $\forall n \in N, n \geq 1 \implies$ the determinant of an upper triangular $n \times n$ matrix is the product of its diagonal entries.  By simple induction on $n$, we show that this statement is true.  The CSC165 proof structure for induction is our guide.

            For the base case, assume that $n = 1$.  Let $A$ be a $1 \times 1$ upper triangular matrix.  By the definition of the determinant, $det(A) = A_{11}$.  Since $A_{11}$ is the only diagonal entry, the statement is true for $n = 1$.

            For the induction step, assume that $n \in N, n \geq 1$ and that the statement is true for $n$.  That will be our induction hypothesis.  We need to show that the statement is true for $n + 1$.  Let $A$ be an $(n + 1) \times (n + 1)$ upper triangular matrix.  By the definition of the determinant, we have that

$det(A) =  \sum\limits_{j = 1}^{n + 1} (-1)^{1 + j}A_{1j} det(\bar{A_{1j}})$.

            Remember this observation: the determinant of a matrix with a column of $0$s is $0$?  Turns out, that is because a square matrix with a column of $0$s is not invertible and some fancy theorem tells us that the determinant of such a matrix is $0$.

            Luckily, we make the following observation:

$det(A) = \sum\limits_{j = 1}^{n + 1} (-1)^{1 + j} A_{1j} det(\bar{A_{1j}}) =  (-1) ^{1 + 1}A_{11} det(\bar{A_{11}}) +  \sum\limits_{j = 2}^{n + 1} (-1)^{1 + j}A_{1j} det(\bar{A_{1j}})$.

Notice that, for $j \geq 2$, the matrix $\bar{A_{1j}}$ is $A$ without row $1$ and without column $j$.  Apart from its first entry, column $1$ of $A$ is a column of $0$s, and that first entry sits in row $1$, which is removed.  So the matrix $\bar{A_{1j}}$ also has a column of $0$s.  Thus, $det(\bar{A_{1j}}) = 0$ and

$det(A) = (-1) ^{1 + 1}A_{11} det(\bar{A_{11}}) +  \sum\limits_{j = 2}^{n + 1} (-1)^{1 + j}A_{1j} det(\bar{A_{1j}}) = (-1) ^{1 + 1}A_{11} det(\bar{A_{11}})$.

            The matrix $\bar{A_{11}}$ has both row $1$ and column $1$ removed but it is still an upper triangular $n \times n$ matrix.  By the induction hypothesis, $det(\bar{A_{11}})$ is the product of its diagonal entries.  Notice that the diagonal entries of $\bar{A_{11}}$ are the diagonal entries of $A$ without $A_{11}$.  Since

$det(A) = (-1) ^{1 + 1}A_{11} det(\bar{A_{11}}) =  A_{11} det(\bar{A_{11}})$,


then $det(A)$ is $A_{11}$ times the product of the remaining diagonal entries of $A$, which is the product of all the diagonal entries of $A$.  Since $A$ is an $(n + 1) \times (n + 1)$ upper triangular matrix, the statement is true for $n + 1$.  By simple induction, $\forall n \in N, n \geq 1 \implies$ the statement is true for $n$.  So that is my attempt at this fun problem.

Wednesday 26 November 2014

Week 11: Harder Time Complexity Problems

           There is no class this week so for fun, we go over a few harder time complexity problems.  Consider the sequences $f(n) = n^4 - 165n^3 - 240n^2 - 265n - 373$ and  $g(n) = 157n^3 + 257n^2 + 337n + 457$.

Problem 1: Show that $f \notin O(g)$ without using any limit definitions.

Problem 2:  Show that $f \in \Omega (g)$ without using any limit definitions.

Problem 1 Tips:


            The definition that we have is $f \in O(g)$, which is defined as:

$\exists c \in R^+, \exists B \in N, ( \forall n \in N, n \geq B \implies f(n) \leq cg(n))$.

Thus, its negation is:

$\forall c \in R^+, \forall B \in N, ( \exists n \in N, n \geq B \wedge f(n) > cg(n))$.

We must choose $n$.  Using the recommended strategies of my instructors, we get:

$n^4 - 165n^3 - 240n^2 - 265n - 373 \geq n^4 - 1043n^3$

$???$

$c(1208n^3) \geq c(157n^3 + 257n^2 + 337n + 457)$.

The $???$ part indicates the gap that the strategies of my instructors do not cover.  So I think hard here (like really hard) and I come up with this.  All we must do is find an $n$ that satisfies the inequality $n^4 - 1043n^3 > c(1208n^3)$.  We have:

$n^4 - 1043n^3 > c(1208n^3)$

Move the $-1043n^3$ to the right side and factor $n^3$ out of the result:

$n^4 > n^3(1208c + 1043)$

It looks like we will choose $n > 0$.  That way, we can divide both sides by $n^3$:
$n > 1208c + 1043$.

Our choice of $n$ is then $\max(\lceil 1208c + 1043 \rceil + 1, B)$.
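
            This choice of $n$ is easy to sanity check numerically.  Below is a tiny sketch (my own check, not course material) that tries a few values of $c$ and $B$ and confirms that the chosen $n$ really satisfies $n \geq B$ and $f(n) > cg(n)$:

            import math

            def f(n):
                        return n**4 - 165*n**3 - 240*n**2 - 265*n - 373

            def g(n):
                        return 157*n**3 + 257*n**2 + 337*n + 457

            for c in (0.5, 1, 10, 1000):
                        for B in (0, 100, 10**6):
                                    # The witness from the scratch work above.
                                    n = max(math.ceil(1208*c + 1043) + 1, B)
                                    assert n >= B and f(n) > c * g(n)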

Problem 2 Tips:


            Likewise, we show that $f \in \Omega (g)$, which is defined as:

$\exists c \in R^+, \exists B \in N, ( \forall n \in N, n \geq B \implies f(n) \geq cg(n))$.

We must choose $c, B$.  We have a similar inequality to problem 1:

$n^4 - 1043n^3 \geq 1208n^3$

Again, we choose $B > 0$ so that we can divide both sides by $n^3$.  Then add $1043$ to both sides to yield:

$n \geq 2251$.

Of course, to choose $B = 2251$, we must choose $c = 1$.  Otherwise, the first step of the chain does not go through.
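
            Again, a quick sketch (mine, not the course's) can spot check the choices $c = 1$ and $B = 2251$ over a sample of values:

            def f(n):
                        return n**4 - 165*n**3 - 240*n**2 - 265*n - 373

            def g(n):
                        return 157*n**3 + 257*n**2 + 337*n + 457

            # Spot check f(n) >= 1 * g(n) for a range of n at and above B = 2251.
            for n in range(2251, 5000):
                        assert f(n) >= g(n)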

Tuesday 18 November 2014

Week 10: Computability and Confusion

            Computability is the first topic in CSC165 that is new to me.  It is both interesting and somewhat confusing.  Unfortunately, any confusion is undesirable at this point in the course.  After thinking about my confusion, I realize that the halting problem is the culprit.  Why is this?

Elementary Definitions and Theorems:


            First, the stuff that makes sense.  Consider a set of inputs $I$ and a set of outputs $O$.  Computability begins with three relatively simple definitions and a single theorem.

Definition 1:  Let $f: I \to O$ be a function.  Then $f$ is well-defined $\iff$ for each $x \in I$, there is a unique $y \in O$ such that $f(x) = y$.


For remaining definitions and theorems, let $g, h: I \to O$ be well-defined functions.

Definition 2:  Then $g$ is computable $\iff$ there is a program $P$ such that $\forall x \in I, g(x) = P(x)$.

Definition 3:  Then $g$ reduces to $h \iff g$ can be implemented with $h$.  Below is what $g, h$ look like in Python code if $g$ reduces to $h$.  Let $P$ be the implementation of $g$ and $Q$ be the implementation of $h$.

            def P(x):
                        """An implementation of g that calls Q, the implementation of h."""
                        return Q(x)

Theorem 1:  Let $g$ reduce to $h$.  Then $h$ is computable $\implies g$ is computable.

The Confusing Part:


            Now, the stuff that is not so clear.  The function halt(f, i) tells us if a function $f$ with an input $i$ halts.  The halting function is not computable.  The proof of this statement starts simple enough:  Assume to the contrary that there is an implementation of halt(f, i) that works for all functions and inputs.  Then consider the function what_is_love(f):

            def what_is_love(f):

                        if halt(f, f):
                                    while True:
                                                print("Look to the skies.")
                        else:
                                    print("Nobody knows.")

The “while True” statement is the incomprehensible part of the proof.  I ask in lecture why it is there.  The instructor lets a classmate several rows ahead of me answer my question.  The explanation takes nearly half a minute.  I did not get any of it.  Apparently, the instructor did not have any problems with the explanation nor did my other classmates.

The Comprehending Part:


            It takes several days for me to understand why the “while True” statement is there.  Eventually, I look at what we are proving.  We assume that there is an implementation of halt(f, i) that works for all functions and inputs.  To disprove this, we only have to exhibit one function and one input on which halt(f, i) fails.  This is a classic existential proof.  We are free to construct the function what_is_love(f) however we want to so long as it disproves the computability of halt(f, i).

            The “while True” statement guarantees that what_is_love(what_is_love) goes into an infinite loop if halt(what_is_love, what_is_love) returns True.  Likewise, what_is_love(what_is_love) halts if halt(what_is_love, what_is_love) returns False.  This is the contradiction and it disproves the computability of halt(f, i).  I wonder why that classmate who answered my question did not explain it like this....

Wednesday 12 November 2014

Week 9: Problem Solving Episode: The Stockbroker's Algorithm

Context


            So you graduate from UofT and land your dream job.  You work a few years and realize there is this thing called inflation.  So you get a brokerage firm to invest your savings.

            To guide its investment decisions, the firm uses the Stockbroker’s algorithm, which takes a list of real numbers as input and outputs the maximum sum over a contiguous slice of the list. 

            Unfortunately, the firm uses the following implementation of the algorithm, which is called John Smith’s Sum as nobody knows the name of the employee who wrote it:

            def john_smith_sum(L: list) -> int:
                        """
                        Return the maximum sum over a slice of L in cubic time.
                        """

                        max = front_index = 0
                        n = len(L)

                        while front_index < n:
                                    end_index = front_index
     
                                    while end_index <= n:
                                                middle_index = front_index
                                                sum = 0
               
                                                while middle_index < end_index:
                                                            sum += L[middle_index]
                                                            middle_index += 1

                                                if sum > max:
                                                            max = sum

                                                end_index += 1

                                    front_index += 1

                        return max

The runtime of john_smith_sum is cubic.  This means that it is agonizingly slow for large inputs and that you will miss out on great investments.  So you decide to do the following:

Main Problem:  Come up with an implementation of the Stockbroker’s algorithm with a linear runtime.

You will then give it to the firm because you are a very generous person.  First things first; how will you come up with an implementation?

Step 1:  Understand the Problem


            Polya’s How to Solve It recommends understanding the problem first.  We have to meet two goals: our implementation should be correct and linear in its runtime.  Let us answer the following to see if we know what these goals mean:

            (1) What does an iterative algorithm with a linear runtime look like?
            (2) What is the correct output of the Stockbroker’s algorithm for different inputs?

            For (1), a single loop and a monotone loop index should suffice.  For (2), work out examples:  Consider the list $[3, 1, 4]$.  The max sum is $8$ so that is what our implementation should return.  How about $[1, -2, 3, -4, 5, -6]$?  Each negative number wipes out the partial sum built up before it, so the max sum is $5$ here.  What about $[3, -1, 4, -1, 5]$?  We may be tempted to look only at positive numbers but we see that adding $-1$ after $3$ and $4$ still keeps the partial sum positive.  Hence, the max sum is $10$.
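
            Since john_smith_sum is correct (just painfully slow), we can use it to confirm these hand-worked answers.  A quick usage example:

            assert john_smith_sum([3, 1, 4]) == 8
            assert john_smith_sum([1, -2, 3, -4, 5, -6]) == 5
            assert john_smith_sum([3, -1, 4, -1, 5]) == 10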

Step 2: Devise a Plan


 “No plan survives contact with the enemy” – The Desert Fox

            There are a couple of approaches to this problem.  One is to assume an arbitrary list of real numbers as the input.  We write a single implementation for that input.  However, this is a little hard as we have to juggle negative and non-negative numbers simultaneously.

            A second approach is to implement by cases through the lists in (2).  Then combine them into a single implementation.  However, we are lazy and we do not want to write multiple unrelated implementations.

            A third approach is this: we write a single implementation that works for a simple input and then extend it to all inputs.  This may be the best approach as there appear to be only a few cases.  Hopefully, this assumption is correct.  Then this approach is our plan.

Step 3: Carry out the Plan


            This is the part where we execute the plan so try not to lose your head.  Of the three lists in (2), the easiest one to write an implementation for is $[3, 1, 4]$ as all the elements are non-negative.  By dedicating a variable to store the max sum, we have version one of our implementation:

My Sum Version 1:

            def my_sum(L: list) -> int:
                        """
                        Return the maximum sum over a slice of L in linear time.
                        """

                        n = len(L)
                        sum = max = index = 0

                        while index < n:
                                    element = L[index]
                                    sum += element    

                                    if max < sum:
                                                max = sum

                                    index += 1

                        return max

            This version features a single loop with a monotone increasing loop index so we satisfy the linear runtime requirement.  Unfortunately, this version only works if all the elements are non-negative.  It falters for lists of the following nature: $[1, -2, 3, -4, 5, -6]$.  So we break each iteration into two cases: one for negative elements and one for non-negative elements.  This leads to version two of our implementation:
                                                                                     
My Sum Version 2:

            def my_sum(L: list) -> int:
                        """
                        Return the maximum sum over a slice of L in linear time.
                        """

                        n = len(L)
                        sum = max = index = 0

                        while index < n:
                                    element = L[index]

                                    if element >= 0:
                                                sum += element

                                    else:
                                                sum = 0           

                                    if max < sum:
                                                max = sum

                                    index += 1

                        return max

            This version resets the sum once it iterates over a negative number.  The linear runtime is still preserved.  However, consider the following list: $[3, -1, 4, -1, 5]$.  The element $-1$ is not enough to reduce the sum to below zero.  Then we should not reset the sum unless it is below zero.  This also allows us to eliminate some code: the negative and non-negative checks and the temporary assignment of L[index] to the variable element:

My Sum Version 3:

            def my_sum(L: list) -> int:
                        """
                        Return the maximum sum over a slice of L in linear time.
                        """

                        n = len(L)
                        sum = max = index = 0

                        while index < n:
                                    sum += L[index]
                                    
                                    if sum < 0:
                                                sum = 0           

                                    if max < sum:
                                                max = sum

                                    index += 1

                        return max

Step 4: Look Back


            Version three is our final version but our obsessive tendencies force us to double check its correctness and runtime.  Consider the test code below.  It checks that my_sum and john_smith_sum have matching outputs for each input.

            Verify this by running the code from the link in Python.  It contains the code for john_smith_sum, the final version of my_sum and the test cases:


Test Cases:

            if __name__ == "__main__":
                        L = [[3, 1, 4], [3, -1, 4, 1], [3, -1, 4, -1, 5], [3, 1, -4, 1, -5, -9],
                                                [-3, -1, -4], [1, -2, 3, -4, 5, -6], [3, -5, 7, 1, -2, 0, 3, -2, 1]]

                        for x in L:
                                    assert john_smith_sum(x) == my_sum(x)

            Likewise for the runtime, the loop executes $7$ steps for each iteration in the worst case so that is $7n$ steps in total.  Add the steps outside the loop plus the loop guard and we have $7n + 4$ as the runtime, which is linear.  Hence, we have a credible implementation of the Stockbroker's algorithm. 

Tuesday 4 November 2014

Week 8: Time Complexity

            The post in Week 7 talks about algorithms being linear, quadratic or cubic.  Such descriptions are neither precise nor fancy enough to keep computer scientists satisfied.  So we have this thing called the time complexity of an algorithm, which describes the growth in runtime relative to the growth in input size.

Here Comes the Notation!


            Time complexity is described in Big O.  It looks something like this.  Let $f: N \to R^{\geq 0}$ be a sequence that describes the runtime of an algorithm $P$ and $g: N \to R^{\geq 0}$ be a sequence that describes an arbitrary runtime (i.e. linear, quadratic, cubic, etc).  If we say that $f \in O(g)$, then we mean that:

\[\exists c \in R^+, \exists B \in N, (\forall n \in N, n \geq B \implies f(n) \leq cg(n))\]

Intuitively, what these little symbols say is that once the input becomes large enough, the runtime of $P$ grows no faster than $g$, up to a constant factor.  Likewise, if we say that $f \in \Omega (g)$, then we mean that:

\[\exists c \in R^+, \exists B \in N, (\forall n \in N, n \geq B \implies f(n) \geq cg(n))\]

Again, this is just a fancy way of saying that once the input becomes large enough, the runtime of $P$ grows at least as fast as $g$, up to a constant factor.
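
            These definitions are easy to play with numerically.  Below is a small sketch (my own illustration, not course material) that spot checks both inequalities for the hypothetical pair $f(n) = 3n^2 + 5$ and $g(n) = n^2$, using the hypothetical witnesses $c = 4, B = 3$ for Big O and $c = 1, B = 0$ for $\Omega$:

            def f(n):
                        return 3 * n**2 + 5

            def g(n):
                        return n**2

            # f in O(g): f(n) <= 4 * g(n) for all sampled n >= 3.
            assert all(f(n) <= 4 * g(n) for n in range(3, 1000))

            # f in Omega(g): f(n) >= 1 * g(n) for all sampled n >= 0.
            assert all(f(n) >= g(n) for n in range(0, 1000))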

A Fun Problem


            Let $f: N \to R^{\geq 0}$ be a polynomial sequence of degree $m$ and $g: N \to R^{\geq 0}$ be a polynomial sequence of degree $n$ with $m, n \in R$.  Find all $m, n$ such that $f \in O(g)$ and $(f \times g) \in \Omega(f)$.

Monday 27 October 2014

Week 7: Nice Mathematical Toys in Algorithm Analysis

            Constant, linear, quadratic, logarithmic or linearithmic are terms that vaguely describe the worst case runtime of algorithms in CSC148 but they are not precise enough in CSC165.  We need the exact number of steps.  This calls for some nice mathematical toys, which we introduce with the following algorithm:
           
            def mystery_sort(L: list):
                        """
                        Sort the elements of L in mysterious order.
                        """

                        n = len(L)
                        outer_index  = 1

                        while outer_index < n:
                                    value = L[outer_index]
                                    inner_index = outer_index

                                    while inner_index > 0 and L[inner_index - 1] < value:
                                                L[inner_index] = L[inner_index - 1]
                                                inner_index = inner_index - 1

                                    L[inner_index] = value
                                    outer_index = outer_index + 1

            Since mystery_sort has a nested loop with monotone loop indexes, then its worst-case runtime is quadratic.  What is less obvious is the exact number of steps that it takes for mystery_sort to run in the worst case.   Our procedure to count mystery_sort is as follows:

            (1) Assume that each line represents one step to make the analysis easier.
            (2) Count from the inner loop to the outer loop.
            (3) Add up the steps that are outside the outer loop.

So assume (1).  Then do (2) and (3).

Inner Loop


            To count the steps of the inner loop, our first mathematical toy comes into play: abstraction.  It helps us determine the number of times that the inner loop executes.  Observe that the inner loop depends on the outer_index.  However, the outer_index is not concrete since it changes each time the outer loop executes.  So we must think abstractly; use an arbitrary integer $i$ to represent the outer_index.  In the worst case, the inner loop then executes $i$ times.  Now the easy part: the inner loop performs three steps for each iteration.  Since the inner loop executes $i$ times, the number of steps of the inner loop is $3i$.

Outer Loop


           In addition to the number of steps of the inner loop, we have the following steps inside each outer loop:

            (1) while outer_index < n
            (2) value = L[outer_index]
            (3) inner_index = outer_index
            (4) loop guard of the inner loop
            (5) L[inner_index] = value
            (6) outer_index = outer_index + 1

This is $6$ steps in total.  Add it to the steps of the inner loop and you see that each iteration of the outer loop executes $3i + 6$ steps.  We still have to add up the steps for each iteration so here our second mathematical toy comes into play: summation.  Since the outer loop runs $n - 1$ times, the total steps of the outer loop is the following summation:

\[\sum_{i = 1}^{n - 1} (3i + 6)\].

            To find the sum of this series, our final mathematical toy comes into play: Euler’s trick.  This is a technique for finding the sum of an arithmetic series.  First decompose the summation:

\[\sum_{i = 1}^{n - 1} (3i + 6) = 3 \sum_{i = 1}^{n - 1} i + \sum_{i = 1}^{n - 1} 6\]

Then perform the following calculation:

\[\sum_{i = 1}^{n - 1} i = 1 + 2 + 3 + \dots + (n - 3) + (n - 2) + (n - 1)\]
\[= \frac{1}{2}\big[(1 + 2 + \dots + (n - 2) + (n - 1)) + ((n - 1) + (n - 2) + \dots + 2 + 1)\big]\]
\[= \frac{n(n - 1)}{2} = \frac{n^2 - n}{2}\]

Substitute back into the series and we get the number of steps that the outer loop executes:

\[3 \cdot \frac{n^2 - n}{2} + 6(n - 1) = \frac{3n^2 - 3n + 12n - 12}{2} = \]
\[\frac{3n^2 + 9n - 12}{2}\]

Outside Both Loops


There are still the following steps to consider:

(1) n = len(L)
(2) outer_index = 1
(3) loop guard of the outer loop

This is $3$ steps so the total number of steps that the algorithm performs is:

\[\frac{3n^2 + 9n - 6}{2}\]
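
            As a sanity check on the arithmetic, here is a sketch of an instrumented version of mystery_sort (my own, following the same one-step-per-line counting convention, with every loop guard evaluation counted including the failing one).  It counts the steps taken on an ascending list, which is the worst case since the inner loop always runs fully, and compares the count against $\frac{3n^2 + 9n - 6}{2}$:

            def mystery_sort_steps(L: list) -> int:
                        """Count the steps mystery_sort takes on L, one step per line executed."""
                        steps = 2                              # n = len(L) and outer_index = 1
                        n = len(L)
                        outer_index = 1
                        while outer_index < n:
                                    steps += 1                 # this outer loop guard evaluation
                                    value = L[outer_index]
                                    inner_index = outer_index
                                    steps += 2                 # value = ... and inner_index = ...
                                    while inner_index > 0 and L[inner_index - 1] < value:
                                                steps += 1     # this inner loop guard evaluation
                                                L[inner_index] = L[inner_index - 1]
                                                inner_index = inner_index - 1
                                                steps += 2     # the two lines of the inner loop body
                                    steps += 1                 # failing inner loop guard
                                    L[inner_index] = value
                                    outer_index = outer_index + 1
                                    steps += 2                 # L[inner_index] = value and outer_index update
                        steps += 1                             # failing outer loop guard
                        return steps

            for n in range(1, 30):
                        # An ascending list is the worst case: the inner loop always runs fully.
                        assert mystery_sort_steps(list(range(n))) == (3 * n**2 + 9 * n - 6) // 2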

Monday 20 October 2014

Week 6: $\epsilon - \delta$ Definition of Limit

            As someone who has no prior experience with the $\epsilon - \delta$ definition of limit, I will teach myself about it with the help of my favourite math textbook and some creativity.  This is what I have gleaned so far.

            In high school, we calculate limits.  In university, we like to think that we are sophisticated thinkers.  Knowing how to calculate is not enough for us.  We must know why our calculation is correct.  Then we need a definition to work with.  This is where the $\epsilon - \delta$ definition of limit comes in:

            Let $f:X \to R$ be a function, $L \in R$, $a \in X$ and $X \subseteq R$.  Then

\[\lim_{x \to a} f(x) = L \iff\]
\[\forall \epsilon \in R^+, \exists \delta \in R^+, (\forall x \in R, |x - a| < \delta \implies |f(x) - L| < \epsilon).\]

After digesting this definition, we can prove or disprove statements about the limits of functions.
For example, consider the following statement:

\[\lim_{x \to 2} x^2 = 4\]

By substituting $2$ for $x$, we discover that this statement should be true.  To show that our hunch is correct, we plug $a = 2$ and $L = 4$ into the $\epsilon - \delta$ definition and prove it:

\[\forall \epsilon \in R^+, \exists \delta \in R^+, (\forall x \in R, |x - 2| < \delta \implies |x^2 - 4| < \epsilon)\]

Our value for $\epsilon$ will be arbitrary but our value for $\delta$ must be concrete since $\epsilon$ has a $\forall$ quantifier but $\delta$ has a $\exists$ quantifier.  The hard part is choosing $\delta$.  How do we do it?

Where to Start


            Think of $\epsilon$ as a distance from $L$ on the vertical axis and $\delta$ as a distance from $2$ on the horizontal axis.  Once the distance of $x$ from $2$ is less than $\delta$ on the horizontal, the $\epsilon - \delta$ definition of limit guarantees that the distance of $x^2$ from $4$ is less than $\epsilon$ on the vertical.  Hence, our choice of $\delta$ must satisfy the definition for all $\epsilon$.  To achieve this, we express $\delta$ in terms of $\epsilon$, which is done by working backwards from the definition.

How to Work Backwards


       Since the definition is an implication, then working backwards begins with the consequent:

$|x^2 - 4| < \epsilon$

Since we operate in $R$, then we can manipulate the left side of the inequality:

$|x + 2||x - 2| = |x^2 - 4| < \epsilon$

We see that the $|x - 2|$ in $|x + 2||x - 2| < \epsilon$ will be useful in getting the antecedent of the definition, which is $|x - 2| < \delta$.  However, there is still a bit of baggage to unload.

The Creative Step


            Since we choose our $\delta$, we can give it an upper bound.  For simplicity, we choose $1$ as an upper bound for $\delta$.  So if $|x - 2| < \delta$, then $|x - 2| < 1$.

            From $|x - 2| < 1$, the absolute value definition gives $-1 < x - 2 < 1$.  By adding $4$ to all sides of the inequality, then we have $3 < x + 2 < 5$.  Use the absolute value definition again, which gives us $|x + 2| < 5$.

Now we can unload $|x + 2|$ from $|x + 2||x - 2| < \epsilon$.  Since $|x + 2| < 5$, then $|x + 2||x - 2| < 5|x - 2|$.  Then $5|x - 2| < \epsilon$ is equivalent to $|x - 2| < \frac{\epsilon}{5}$.

Endgame


We can now choose $\delta$.  If $|x - 2| < \frac{\epsilon}{5}$ is true, then its equivalent statement $5|x - 2| < \epsilon$ guarantees that $|x - 2||x + 2| = |x^2 - 4| < \epsilon$ since $|x + 2| < 5$.  Yet we chose $1$ as an upper bound for $\delta$.  So if $\frac{\epsilon}{5} > 1$, then $\delta = 1$.  Hence our choice of $\delta$ is $min(1, \frac{\epsilon}{5})$.  The proof follows naturally from the scratch work so it will be omitted.
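
            To close, here is a small numerical sketch (my own sanity check, not part of the scratch work) that samples values of $\epsilon$ and $x$ and confirms that $\delta = min(1, \frac{\epsilon}{5})$ really forces $|x^2 - 4| < \epsilon$:

            import random

            random.seed(165)

            for _ in range(10000):
                        epsilon = random.uniform(0.001, 10)
                        delta = min(1, epsilon / 5)
                        # Sample x strictly inside the delta-window around 2.
                        x = 2 + 0.999 * random.uniform(-delta, delta)
                        assert abs(x**2 - 4) < epsilon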