MATH2070: LAB 4: Newton's method




Introduction Exercise 1
Stopping Tests Exercise 2
Failure  
Introduction to Newton's Method Exercise 3
Writing Matlab code for functions Exercise 4
Choice of initial guess Exercise 5
Non-quadratic convergence Exercise 6
No root Exercise 7
Complex functions Exercise 8
Unpredictable convergence Exercise 9
Newton's method revisited Exercise 10


Introduction

The bisection method is very reliable, but it can be relatively slow, and it does not generalize easily to more than one dimension. In this lab we will look at Newton's method for finding roots of functions. Newton's method can be much faster than bisection, but it requires formulæ for both the function and its derivative, and it can easily fail. Nonetheless, it is a workhorse method in numerical analysis.

We will be taking three sessions to complete this lab. If you print this lab, you may find the pdf version more appropriate.


Stopping Tests

Root finding routines check after each step to see whether the current result is good enough. The tests that are made are called ``termination conditions'' or ``stopping tests''. Common tests include:

Residual size $ \vert f(x)\vert< \epsilon$
Increment size $ \vert x_{\mbox{new}} - x_{\mbox{old}} \vert < \epsilon$
Number of iterations: it_count > ITMAX

You should note that the true error, $ x_{\mbox{approx}}-x_{\mbox{exact}}$ is not included among the stopping tests. This is because you would need to know the exact solution to use it.


Failure

You know that the bisection method is very reliable and rarely fails but always takes a (sometimes large) fixed number of steps. Newton's method works more rapidly a good deal of the time, but does fail. And Newton's method works in more than one dimension. One objective of this lab will be to see how Newton can fail and get a flavor of what might be done about it.

Although we will seem to spend a lot of time looking at failures, you should still expect that Newton's method will converge most of the time. It is just that when it converges there is nothing special to say, but when it fails to converge there is always the interesting questions of, ``Why?'' and ``How can I remedy it?''


Introduction to Newton's Method

You will find the definition of Newton's method in Atkinson (Section 2.2) and in the lectures, along with convergence theorems and the like. The idea of Newton's method is that, starting from a guessed point $ x_0$, find the straight line that passes through the point $ (x_0,f(x_0))$ and has slope $ f'(x_0)$. The next iterate, $ x_1$ is simply the location that this straight line intersects the $ x$-axis.

A ``quick and dirty'' derivation of the formula can be taken from the definition of the derivative of a function. Assuming you are at the point $ x^{(k)}$, (I am using superscripts in parentheses to denote iteration to reduce confusion between iteration number and vector components.)

$\displaystyle \frac{f(x^{(k)}+\Delta x)-f(x^{(k)})}{\Delta x}\approx f'(x^{(k)}).$

If $ f$ were linear, this approximate equality would be true equality and the next iterate would be given by $ x^{(k+1)}=x^{(k)}+\Delta x$ and $ f(x^{(k+1)})=0$. Hence, the formula becomes

$\displaystyle \frac{-f(x^{(k)})}{\Delta x}\approx f'(x^{(k)}),$

or

$\displaystyle \Delta x=x^{(k+1)}-x^{(k)}=-\frac{f(x^{(k)})}{f'(x^{(k)})}.$ (1)

As you know, Newton's method also will work for vectors, so long as the derivative is properly handled. Assume that $ {\bf x}$ and $ {\bf f}$ are $ n$-vectors. Then the Jacobian matrix is the matrix $ J$ whose elements are

$\displaystyle J_{ij}=\frac{\partial f_i}{\partial x_j}.$

(Here, $ i=1$ for the first row of $ J$, $ i=2$ for the second row of $ J$, etc., so that the Matlab matrix subscripts correspond to the usual ones.) Thus, Newton's method for vector functions can be written

$\displaystyle {\bf x}^{(k+1)}-{\bf x}^{(k)}=-J({\bf x}^{(k)})^{-1}{\bf f}({\bf x}^{(k)}).$ (2)

Note: The Matlab backslash operator will be used instead of the inverse operator because it is much faster.


Writing Matlab code for functions

Newton's method requires both the function value and its derivative, unlike the bisection method that requireds only the function value. Matlab provides syntax for returning two or more values from a function. The f1.m file from the previous lab can be modified in the following way to return both the value (y) and its derivative (yprime).

function [y,yprime]=f1(x)
% [y,yprime]=f1(x) computes y=x^2-9 and its derivative, 
% yprime=2*x

% your name and the date

y=x.^2-9;
yprime=2*x;
Conveniently, this modified version could be used for the bisection method also because Matlab returns only the first value (y) unless the second is explicitly used. If this new function is used in code as if it had a single result y, then the value yprime is computed and discarded, so the modified function will continue to work in the old way. If both y and yprime are needed, as in Newton's method, then syntax like
[z z1]=f1(1)
will return z=-8 and z1=2. When it is necessary to call the function f1 from inside a function such as newton, you must use Matlab's feval function. The notation for this is similar:
[z z1]=feval('f1',1)
will again return z=-8 and z1=2.

In the following exercises you will write a version of Newton's method and apply it to several test functions, including the three from the previous lab. The first step is to modify each of those three functions to return both values of both the function and its derivative

Exercise 1:
  1. Modify each of the three function m-files f1.m, f2.m and f3.m to return both the function value and that of its derivative. the functions are:
     f1:  y=x^2-9           yprime=???
     f2:  y=x^6-x-1         yprime=???
     f3:  y=2*x-exp(-x)     yprime=???
    
  2. Use the help f1 command to confirm that your comments include at least the formula for f1, and similarly for f2 and f3.

Remark: There are several other programming strategies for computing derivatives. One might be to compute the derivative value in a new m-file with a new name such as df. Another way would be to compute the derivative using some sort of divided difference formula. This latter is useful for very complicated functions. For simple functions, Matlab provides ways of putting formulas in-line without making separate m-files. You could use Matlab's symbolic capabilities to symbolically differentiate the functions, too. Finally, there are automated ways to discover the formula for a derivative from the m-file or symbolic formula defining the function, and these can be used to generate the m-file for df automatically.

In the next exercise, you will get down to the task of writing Newton's method as a function m-file. This m-file is shorter than bisect.m and the instructions take you through it step by step.

Exercise 2:
  1. Open a new, empty m-file named newton.m. You can use either the menus or the command edit newton.m.
  2. Start off the function with its signature line and follow that with comments. The comments should repeat the signature as a comment, contain a line or two explaining what the function does, explain what each of the parameters mean, and include your name and the date.
    function [x,numIts,increment]=newton(f,x)
    % [x,numIts,increment]=newton(f,x)
    % f is the name of a function with signature [y,yprime]=f(x)
    % on input, x is the initial guess
    % on output, x is the final solution
    % numIts is ???
    % increment is ???
    % Newton's method is used to find x so that f(x)=0
    
    % your name and the date
    
    (You may have to complete this exercise before coming back to explain the parameters.)
  3. Mathematically speaking, the stopping criterion and its size are very important. From a programming standpoint, the iteration should also be limited to a fixed (large) number of steps. The reason for this is that a loop that never terminates appears to you as unresponsive. It is customary to give these ``fixed'' parameters names and use all upper-case to spell them out. Add the following statements to your file:
    EPSILON = 1.0e-6;
    MAXITS  = 500;
    
  4. Some people like to use while loops so that the stopping criterion appears at the beginning of the loop. I feel that if you are going to count the iterations you may as well use a for statement. Start off the loop with
    for numIts=1:MAXITS
    
  5. Evaluate the function and its derivative at the current point x using feval, much as was done in bisect.m. Remember that it is customary to indent the statements within a loop. You may choose the variable names as you please.
  6. Define a Matlab variable increment as the ratio of the function value divided by the value of its derivative. This is (except for the minus sign) the right side of Equation (1).
  7. Complete Equation (1) with the statement
      x = x - increment;
    
  8. Finish the loop and the m-file with the following statements
      disp(strcat(num2str(numIts), ' x=', num2str(x), ' increment=', ...
        num2str(increment)));
    
      if abs(increment)<EPSILON
        break;
      end
    end
    

    The break statement causes the loop to be completed before all MAXITS iterations are complete. Note that if newton ever returns with numIts having the value 500 (or whatever you have MAXITS set to), then Newton's method has failed.

    The disp statement prints out some useful numbers during the iteration. For the purpose of this lab, leave this printing in the file, but when the function is being used as part of a calculation, it should not print extraneous information.

  9. Test that your comments are complete with the help newton command.
  10. Test your newton function on the function f1 that you wrote in the previous exercise. Start from an initial guess of x=0.1. The correct solution, of course, is x=3.
  11. What is the true error in your approximate solution, $ \vert x-3\vert$? Is it comparable in size with EPSILON?
  12. Try again, starting at x=-0.1, you should get x=-3.
  13. Fill in the following table.
    Name Formula       guess  approxRoot No. Steps
     
     f1  x^2-9         0.1    _________  _________
     f2  x^6-x-1        10    _________  _________
     f3  2*x-exp(-x)    10    _________  _________
    

Remark: The theoretical convergence rate of Newton's method is quadratic. You can see this rate nicely if you look at the final several values of increment for the case of f2. This rate depends on an accurate computation of the derivative. Even a minor error in the derivative can yield a method that does not converge very quickly or that diverges. In the above table, none of the cases should have taken as many as 25 iterations. If they did, check your derivative formula. (As a general rule, if you apply Newton's method and it takes hundreds or thousands of iterations but it does converge, check your derivative calculation very carefully. In the overwhelming majority of cases you will find a bug there.)


Choice of initial guess

The theorems about Newton's method generally start off with the assumption that the initial guess is ``close enough'' to the solution. Since you don't know the solution when you start, how do you know when it is ``close enough?'' In one-dimensional problems, the answer is basically that if you stay away from places where the derivative is zero, then any number is OK. More to the point, if you know that the solution lies in some interval and $ f'(x)\ne0$ on that interval, then the Newton iteration will converge to the solution. When there are zeros of the derivative nearby, Newton's method can display highly erratic behavior and may or may not converge.

Exercise 3:
  1. In your newton.m file, change MAXITS from 500 to 2500.
  2. Write a function m-file for the cosmx function used in the previous lab ( $ f(x)=\cos x - x$). Be sure to calculate both the function and its derivative, as we did for f1, f2 and f3.
  3. Use newton to find the root of cosmx starting from the initial value x=0.1. What is the solution and how many iterations did it take?
  4. Again, use newton to find the root of cosmx, but start from the initial value x=12. Note that $ 3\pi<12<4\pi$, so there are several peaks and valleys between the initial guess and the root. Does it locate a solution in less than 2500 iterations? How many iterations does it take? Does it get the same root as before?
  5. Look at the sequence of values of x that Newton's method chose when starting from x=12. There is no real pattern to the numbers, and it is pure chance that finally put the iteration near the root. Once it is ``near enough,'' of course, it finds the root quickly.


Non-quadratic convergence

The proofs presented in class show that quadratic convergence depends on the ratio $ \frac{f''}{2f'}$ being finite at the solution. In most cases, this means that $ f'\ne 0$. When this condition fails at the desired solution, the convergence rate deteriorates to linear. You will see this illustrated in the following exercises.

Exercise 4:
  1. Write a function m-file for the function f4=x^2, returning both the function and its derivative.
  2. Eliminate the disp statement from newton so that so many printed lines are not a distraction.
  3. You recall using newton to find a root of f1=x^2-9. How many iterations did it take?
  4. Now try finding the root (x=0) of f4 using Newton, again starting at x=0.1. How many iterations does it take? Since the exact answer is zero, the error is simply the value of x. Is the true error larger than EPSILON or smaller?
  5. Now try finding the root (x=0) of f5=x^20 using Newton, again starting at x=0.1. How many iterations does it take? Is the true error larger than EPSILON or smaller?

In practice, if you don't know the root, how can you know whether or not the derivative is zero at the root? You can't, really, but you can tell if the convergence rate is deteriorating. Suppose you keep track of the two ratios

$\displaystyle r_1$ $\displaystyle =\bigl\vert\frac{\Delta x^{(k+1)}}{\Delta x^{(k)}}\bigr\vert,$    and (3)
$\displaystyle r_2$ $\displaystyle =\bigl\vert\frac{\Delta x^{(k+1)}}{(\Delta x^{(k)})^2}\bigr\vert.$ (4)

If the convergence is quadratic, $ r_2$ will remain bounded and $ r_1$ will approach zero. If the convergence deteriorates to linear, $ r_2$ will become unbounded and $ r_1$ will remain larger than zero.

Exercise 5:
  1. Make a copy of newton.m and call it newton1.m. Just before the beginning of the loop, insert the following statement
    increment=1;  % this is a dummy value
    
    Just before setting the value of increment inside the loop, save increment in oldIncrement:
      oldIncrement=increment;
    
  2. After the statement x=x-increment;, compute r1 and r2 according to (3) and (4) respectively.
  3. In addition, change the disp statement to display the values of r1 and r2.
  4. Use newton1 to fill in the following table, using the final values at convergence, starting from x=0.1. Enter the limiting values of r1 and r2 and use the term ``unbounded'' when the ratio has no bound.
    Function   numIts   r1       r2
    
    f1=x^2-9   ______  ______  ______
    f4=x^2     ______  ______  ______
    f5=x^20    ______  ______  ______
    

This exercise shows that quadratic convergence sometimes fails, usually resulting in a linear convergence rate, and you can estimate the rate. This is not always a bad situation-it is converging after all-but now the stopping criterion is not so good, as you saw in Exercise 4 above. In practice, it is usually as important to know that you are reliably close to the answer as it is to get the answer in the first place.

In the cases of $ x^2$ and $ x^{20}$, you saw that $ r_1$ turned out to be a constant, not merely bounded above and below. In that case,

$\displaystyle x^{(\infty)}=x^{(n)}+\sum_{k=n}^{\infty} \Delta x^{(k)}$

because it converges and the partial sums are collapsing sums. I have denoted the exact root by $ x^{(\infty)}$. Because $ r_1$ is constant, for $ k>n$, $ \Delta x^{(k)}=r_1^{k-n}\Delta x^{(n)}$. Hence,

$\displaystyle x^{(\infty)}$ $\displaystyle =x^{(n)}+\sum_{k=n}^{\infty} \Delta x^{(k)}$    
  $\displaystyle =x^{(n)}+\frac{\Delta x^{(n)}}{1-r_1}$    

Exercise 6:
  1. Conveniently, r1 either goes to zero or remains bounded. If the sequence converges, r1 should remain below 1, or at least its average should remain below 1. Change the convergence criterion in newton1 so that it reads
      if abs(increment)<EPSILON*(1-r1)
        break;
      end
    
    Note: I have multiplied through by $ (1-r_1)$ rather than divided by it. This avoids the problems that occur when $ r_1\ge1$.
  2. Use newton1 to fill in the following table, where you can compute the absolute value of the true error because you can easily guess the exact solution.
    Function   numIts     abs. error 
    
    f1=x^2-9   ______     ______
    f4=x^2     ______     ______
    f5=x^20    ______     ______
    
    You should see that the modified convergence does not harm quadratic convergence and greatly improves the estimated error in the linear case.

Warning: This modification of the stopping criterion is very nice when $ r_1$ settles down to a constant value quickly. In real problems, a great deal of care must be taken because $ r_1$ can cycle among several values, some larger than 1, or it can take a long time to settle down.


No root

Sometimes people become so confident of their computational tools that they attempt the impossible. What would happen if you attempted to find a root of a function that had no roots? Basically, the same kind of behavior occurs as when there are zeros of the derivative between the initial guess and the root.

Exercise 7:
  1. You will want to reduce MAXITS to 500 again, and make the disp statement active again in newton for this exercise.
  2. Write the usual function m-file for f6=x^2+9.
  3. Apply newton to the function f6=x^2+9, starting from x=0.1. Describe what happens.
  4. Comment out the disp statements in newton.m again.


Complex functions

But the function $ x^2+9$ does have roots! The roots are complex, but Matlab knows how to do complex arithmetic. (Actually the roots are imaginary, but it is all the same to Matlab.) All Matlab needs is to be reminded to use complex arithmetic.

Warning: If you have used the letter i as a variable in your Matlab session, its special value as the square root of -1 has been obscured. To eliminate the values you might have given it and return to its special value, use the command clear i. It is always safe to use this command, so I always use it just before I start to use complex numbers. For those people who prefer to use j for the imaginary unit, Matlab understands that one, too.

Exercise 8:
  1. Apply newton to f6=x^2+9, starting from the initial guess x=1+i. You should get the result 0+3*i and it should take fewer than 10 iterations.
  2. Fill in the following table by using newton to find a root of f6 from various initial guesses. The exact root, as you can see, is $ \pm 3i$.
    Initial guess     numIts      error
    1+i               _______   _________
    1-i               _______   _________
    10+5*i            _______   _________
    10+eps*i          _______   _________
    
    (Recall that eps is the smallest number you can add to 1.0 and still change it.)

What happened with that last case in Exercise 8? The derivative is $ f_6'(x)=2x$ and is not zero near $ x=10+eps*i\approx10$. In fact, the derivative is zero only at the origin. The origin is not near the initial guess nor either root. It turns out that the complex plane is just that: a plane. While Newton's method works in two or more dimensions, it is harder to see when it is going to have problems and when not. We will elaborate a bit in the next section and also return to this issue in the next lab.


Unpredictable convergence

The earlier, one-dimensional cases presented in this lab might lead you to think that there is some theory relating the initial guess and the final root found using Newton's method. For example, it is natural to expect that Newton's method will converge to the root nearest the initial guess. This is not true in general! In the exercise below, you will see that it is not possible, in general, to predict which of several roots will arise starting from a particular initial guess.

Consider the function $ f_7(z)=z^4+1$, where $ z=x+iy$ is a complex number with real part $ x$ and imaginary part $ y$. This function has the following four roots.

$\displaystyle \omega_1$ $\displaystyle =(1+i)/\sqrt2$    
$\displaystyle \omega_2$ $\displaystyle =(-1+i)/\sqrt2$    
$\displaystyle \omega_3$ $\displaystyle =(-1-i)/\sqrt2$    
$\displaystyle \omega_4$ $\displaystyle =(1-i)/\sqrt2$    

In the following exercise, you will choose a number of regularly-spaced points in the square given by $ \vert x\vert\le2$ and $ \vert y\vert\le2$ and then use your newton.m function to solve $ f(z)=0$ starting from each of those points. The surprise comes in seeing which of the four roots arises from which of the initial guesses.

Exercise 9:
  1. Be sure your newton.m file does not have any active disp statements in it.
  2. Create a function m-file named f7.m that computes the function $ f_7(z)$ above as well as its derivative.
  3. Copy the following code to a script m-file named exer4_9.m.
    NPTS=100;
    clear i
    x=linspace(-2,2,NPTS);
    y=linspace(-2,2,NPTS);
    omega(1)= ( 1+i)/sqrt(2);
    omega(2)= (-1+i)/sqrt(2);
    omega(3)= (-1-i)/sqrt(2);
    omega(4)= ( 1-i)/sqrt(2);
    
    hold on
    for k=1:NPTS
      for j=1:NPTS
        z=x(k)+i*y(j);
        plot(z,'.')
      end
    end
    hold off
    
    Be sure to add comments that identify the purpose of the file and your name.
  4. Execute this m-file to generate a plot of 10000 points that fill up a square in the complex plane. Please include a copy of this plot with your summary.
  5. What is the purpose of the statement clear i?
  6. What would happen if the two statements hold on and hold off were to be omitted?
  7. Next, discard the two hold and replace the plot statement with the following statements:
        root=newton('f7',z);
        [difference,whichRoot]=min(abs(root-omega));
        if difference>1.e-2
          whichRoot=0;
        end
        which(k,j)=whichRoot;
    
  8. If the variable root happened to take on the value root=0.707-i*0.707, what would the values of difference and whichRoot be? (Recall that (1/sqrt(2))=.70710687....)
  9. If the variable root happened to take on the value root=0.606-i*0.606, what would the values of difference and whichRoot be? What would this indicate about the Newton iteration that arrived at this value of root?
  10. Initialize the array which to be an empty array by placing the following statement just before the first for statement.
    which=[];
    
  11. The array which contains integers between 0 and 4. It can be plotted as if it were an image with the commands
    image(x,y,10*which)
    colorbar
    
  12. Execute your modified m-file. The position (z=x+i*y) on this image corresponds to the initial guess and the color values 10, 20, 30, and 40 correspond to roots 1, 2, 3, and 4. This image illustrates that, for example, some initial guesses in the third quadrant converge to the root in the first quadrant! It turns out that the set you have plotted is a Julia set (fractal). You can increase the detail of the plot by increasing NPTS and waiting longer. Please include this plot with your summary file.


Newton's method revisited

One disadvantage of Newton's method is that we have to supply not only the function, but also a derivative. In the following exercise, we will try to make life a little easier by approximating the derivative of the function. One way would be to consider a nearby point, evaluate the function there, and then approximate the derivative by

$\displaystyle y' \approx \frac{ f(x+\Delta x) - f(x) }{ \Delta x }$ (5)

Here are three possible ways to pick the stepsize $ \Delta x$:
  1. Simply set $ \Delta x=\vert x_{\mbox{previous}}-x\vert$, where $ x_{\mbox{previous}}$ is the value of the previous iteration. (On the first step, set $ \Delta x$ to $ .1$ or something.);
  2. Always use a small fixed value.
  3. Use a value that gets smaller as you converge, say $ \Delta x = f(x)$.
In the following exercises, you will be using a fixed value.

Exercise 10: Make a copy of newton.m and call it newton2.m.
  1. Change the signature (first line) to be
    function [x,numIts,increment]=newton2(f,x,dx)
    
    and change the comment lines to reflect the new function.
  2. Replace the evaluation of yprime through a function call with the expression described in Equation (5) using the constant stepsize dx.
  3. Check out your code using the function f1(x)=x^2-9, starting from the point x = 5.0, using an increment dx=0.00001. You should observe essentially the same converged value in the same number of iterations as using newton.

It turns out that the function f1(x)=x^2-9 is an ``easy'' function to solve using Newton iterations. More ``difficult'' functions require a more delicate choice for dx.

Exercise 11: For the two functions f2(x)=x^6-x-1 and f5(x)=x^20, and the starting point x = 5.0, compute the number of steps taken to converge using the true Newton method, and newton2 with different stepsizes dx. Remember that when the maximum number of iterations is reached, it indicates divergence, so write ``diverges'' in that case.

               Number of steps f2   Number of steps f5

true Newton    __________________   __________________
dx = 0.00001   __________________   __________________
dx = 0.0001    __________________   __________________
dx = 0.001     __________________   __________________
dx = 0.01      __________________   __________________
dx = 0.1       __________________   __________________
dx = 0.5       __________________   __________________

As you can see, the choice of dx is critical. It is also quite difficult in more complicated problems.

Remark: As mentioned earlier, Newton's method generally converges quite quickly. If you write a Newton solver and observe poor or no convergence, the first thing you should look for is an error in the derivative. One way of finding such an error is to apply a method like newton1 and print both the estimated derivative and the analytical derivative computed inside your function. They should be close. This trick is especially useful when you are using Newton's method in many dimensions, where the Jacobian can have some correctly-computed components and some components with errors in them.



Back to MATH2070 page.





Mike Sussman 2006-08-25