MATH2070: LAB 5: Multidimensional Newton's Method




Introduction Exercise 1
Modifications to newton.m for vector functions Exercise 2
A complex function revisited Exercise 3
Nonlinear least squares Exercise 4
Softening (damping) Exercise 5
  Exercise 6
  Exercise 7
  Exercise 8
  Exercise 9


Introduction

Last time we discussed Newton's method for nonlinear equations in one real or complex variable. In this lab, we will extend the discussion to two or more dimensions. One of the examples will include a common application of Newton's method, viz., nonlinear least squares fitting.

This lab will take two sessions. If you print it, you might find the pdf version more convenient.


Modifications to newton.m for vector functions

You should recall, if $ \bf x$ is a vector and $ {\bf f}({\bf x})$ a differentiable vector function with Jacobian matrix $ J$, then Newton's method can be written

$\displaystyle {\bf x}^{(k+1)}-{\bf x}^{(k)}=-J({\bf x}^{(k)})^{-1}{\bf f}({\bf x}^{(k)}).$ (1)

(The superscript indicates iteration number. A superscript is used to distinguish iteration number from vector index.)

The Jacobian matrix, $ J$, is the multidimensional equivalent of the derivative. It is given by

$\displaystyle J_{mn}=\frac{\partial f_m}{\partial x_n}.$

As an implementation note, the inverse of $ J$ should not be computed. Instead, the system

$\displaystyle J({\bf x}^{(k)})({\bf x}^{(k+1)}-{\bf x}^{(k)})=-{\bf f}({\bf x}^{(k)}).
$

should be solved. As you will see later in this course, this will save about a factor of three in computing time over computing the inverse. Matlab provides a special, division-like symbol for this solution operation: the backslash (\) operator. If you wish to solve the system

$\displaystyle J{\bf\Delta x}=-{\bf f}$

then the solution can be written as

$\displaystyle {\bf\Delta x}=-J\backslash{\bf f}.$

Note that the divisor is written so that it appears ``underneath'' the backslash. Mathematically speaking, this expression is equivalent to

$\displaystyle {\bf\Delta x}=-J^{-1}{\bf f}$

but the first form, using \ is about three times faster.

Exercise 1:
  1. Start from your version of newton.m from Lab 4, or download my version. Change its name to vnewton.m and change its signature to
    function [x,numIts]=vnewton(f,x)
    % comments
    
    and modify it so that it: (a) Uses the Matlab \ operator instead of scalar division for increment; (b) uses the norm rather than the absolute value of the increment to determine when to stop; (c) does not use the disp statement; and, (d) has appropriately modified comments. (Leaving the disp statements in will cause syntax errors when vector arguments are present, so be sure to eliminate them.)
  2. Test your modifications by comparing the value of the root and the number of iterations required for the complex scalar (not vector) case from last time ( $ f(z)=z^2+9$). This shows that vnewton will still work for scalars. Your results should agree with those from newton. Fill in the following table (same one as last lab, except the final case).
    Initial guess     numIts     error     numIts(newton)
    1+i               _______   _________  ______________
    1-i               _______   _________  ______________
    10+5*i            _______   _________  ______________
    10+1.e-25*i       _______   _________  ______________
    


A complex function revisited

It is possible to rewrite a complex function of a complex variable as a vector function of a vector variable. This is done by equating the real part of a complex number with the first component of a vector and the imaginary part of a complex number with the second component of a vector.

Consider the function $ f(z)=z^2+9$. Write $ z=x_1+x_2i$ and $ f(z)=f_1+f_2i$. Plugging these variables in yields

$\displaystyle f_1+f_2i=(x_1^2-x_2^2+9)+(2x_1x_2)i.$

This can be written in an equivalent matrix form as

$\displaystyle \left[\begin{array}{c}f_1(x_1,x_2) f_2(x_1,x_2)\end{array}\right]= \left[\begin{array}{c}x_1^2-x_2^2+9 2x_1x_2\end{array}\right]$ (2)

Exercise 2:
  1. Write a function m-file f2.m to compute the vector function described above in Equation (2) and its Jacobian. It should be in the form needed by vnewton and should have the signature
    function [f,J]=f2(x)
    % comments
    
    where f is the two-dimensional column vector from Equation (2) and J is its Jacobian matrix. Hint: Compute df1dx1 ( $ =\frac{\partial f_1}{\partial x_1}$), df1dx2, df2dx1, and df2dx2. Then set
    J=[df1dx1 df1dx2
       df2dx1 df2dx2];
    
  2. Fill in the table below. Note that it should be the same as the table in the previous exercise, except for its format, and the correct solution is

    $\displaystyle \left[\begin{array}{c}0 \pm 3\end{array}\right].$

    Initial guess     numIts   norm(error)
    [1;1]             _______   _________
    [1;-1]            _______   _________
    [10;5]            _______   _________
    [10;1.e-25]       _______   _________
    

At this point, you should be confident that vnewton is correctly programmed and yields the same results as newton from the previous lab. Now, you will turn your attention to seeing why the initial guess [10;1.e-25] requires so many Newton iterations. To do this, you will be using Matlab as an investigative tool.

Exercise 3: In this exercise, we will look more carefully at the iterates in the poorly-converging case from the previous exercise. Because we are interested in the sequence of iterates, we will generate a special version of vnewton.m that returns the full sequence of iterates. It will return the iterates as a matrix whose columns are the iterates, with as many columns as the number of iterates. (One would not normally return so much data from a function call, but we have a special need for this exercise.)
  1. Make a copy of vnewton.m and call it vnewton1.m. Change its signature line to the following
    function [x, numIts, iterates]=vnewton1(f,x)
    % comments
    
  2. Just before the loop, add the following statement
    iterates=x;
    
    to initialize the variable iterates.
  3. Add the following statement just after the new value of x has been computed.
    iterates=[iterates,x];
    
    Since iterates is a matrix and x is a column vector, this Matlab statement causes one column to be appended to the matrix iterates for each iteration.
  4. Test your modified function vnewton1 using the same f2.m from above, starting from the vector [1;1]. You should get the same results as before for solution and number of iterations. Check that the size of the matrix is $ 2\times$(number of iterations+1), i.e., it has 2 rows and (numIts+1) columns.

The study of the poorly-converging case continues with the following exercise. The objective is to try to understand what is happening. The point of the exercise is two-fold: on the one hand to learn something about Newton iterations, and on other hand to learn how one might use Matlab as an investigative tool.

Exercise 4:
  1. Apply vnewton1.m to the function f2.m starting from the column vector [10;1.e-25]. This is the poorly-converging case.
  2. Plot the iterates on the plane using the command
    plot(iterates(1,:),iterates(2,:),'*-')
    
    It is very hard to interpret this plot, but it is a place to start. You do not have to send me a copy of your plot. Note that most of the iterates lie along the $ x$-axis, with some quite large.
  3. Use the zoom feature of plots in Matlab (the magnifying glass icon with a + sign) to look at region with horizontal extent around [-20,20]. It is clear that most of the iterates appear to be on the $ x$-axis. Please send me a copy of this plot. Warning: The magnifying glass icon in Matlab on the computers in GSCC sometimes causes Matlab to crash! You can get the same effect with the following commands
    v=axis;
    axis([-20,20,v(3),v(4)]);
    
  4. Look at the formula you used for the Jacobian in f2.m. Explain why, if the initial guess starts with $ x_2=0$ exactly, then all subsequent iterates also have $ x_2=0$.
  5. You should be able to see what is happening. I chose an initial guess with $ x_2\approx 0$, so subsequent iterates stayed near the $ x$-axis. Since the root is at $ x_2=3$, it takes many iterates before the $ x_2$ component of the iterate can ``rise to the occasion.''
  6. Use a semilog plot to see how the iterates grow in the vertical direction:
    semilogy(iterates(2,:))
    
    The plot shows the $ x_2$ component seems to grow exponentially (linearly on the semilog plot) with seemingly random jumps superimposed. Please send me a copy of this plot.
You now have a rough idea of how this problem is behaving. It converges slowly because the initial guess is not close enough to the root for quadratic convergence to be exhibited. The $ x_2$ component grows exponentially, however, and eventually the iterates become close enough to the root to exhibit quadratic convergence. It is possible to prove these observations, although the proof is beyond the scope of this lab.

This type of behavior might be described as ``wandering around looking for a good initial guess.'' It is not uncommon and is one of the reasons that a good initial guess is important when using Newton's Method.


Nonlinear least squares

A second common application of Newton's method for vector functions is to nonlinear curve fitting by least squares. This application falls into the general topic of ``optimization'' wherein the extremum of some function $ F:\mathbb{R}^n\rightarrow\mathbb{R}$ is sought. Of course, if $ F$ is differentiable, then its extrema are given by the solution of the system of equations $ \partial F/\partial {x_k}=0$ for $ k=1,2,\dots,n$, and the solution can be found using Newton's method.

Suppose you are observing some phenomenon that is evolving in time. Suppose that you have a theoretical basis for assuming that the phenomenon obeys an equation

$\displaystyle v(t)=e^{-x_1t}(x_3\sin x_2t+x_4\cos x_2t).$

After the observations have progressed for some time, you have a large number of pairs of values $ (t_n,v_n)$ for $ n=1,\dots,N$. The question to be answered is, ``What values of $ x_k$ for $ k=1,2,3,4$ would best reproduce the observations?'' In other words, find the values of $ {\bf x}=[x_1,x_2,x_3,x_4]^T$ that minimize the norm of the differences between the formula and observations. Define

$\displaystyle F({\bf x})=\sum_{n=1}^N (v_n-e^{-x_1t_n}(x_3\sin x_2t_n+x_4\cos x_2t_n))^2$

and we seek the minimum of $ F$.

Note: The particular problem as stated can be reformulated as a linear problem, resulting in reduced numerical difficulty. However, it is quite common to solve the problem in this form and in more difficult cases it is not possible to reduce the problem to a linear one.

In order to solve this problem, it is best to note that when the minimum is achieved, the gradient $ {\bf f}=\nabla F$ must be zero. The components $ f_k$ of the gradient $ {\bf f}$ can be written as

$\displaystyle f_1 =\frac{\partial F}{\partial x_1}$ $\displaystyle = 2 \sum_{k=1}^K (v_k - e^{-x_1 t_k} (x_3 \sin x_2 t_k + x_4 \cos x_2 t_k )) t_k e^{-x_1 t_k} (x_3 \sin x_2 t_k + x_4 \cos x_2 t_k )$    
$\displaystyle f_2 =\frac{\partial F}{\partial x_2}$ $\displaystyle = -2 \sum_{k=1}^K (v_k - e^{-x_1 t_k} (x_3 \sin x_2 t_k + x_4 \cos x_2 t_k )) e^{-x_1 t_k} (x_3 \cos x_2 t_k t_k - x_4 \sin x_2 t_k t_k)$ (3)
$\displaystyle f_3 =\frac{\partial F}{\partial x_3}$ $\displaystyle = -2 \sum_{k=1}^K (v_k - e^{-x_1 t_k} (x_3 \sin x_2 t_k + x_4 \cos x_2 t_k )) e^{-x_1 t_k} \sin(x_2 t_k)$    
$\displaystyle f_4 =\frac{\partial F}{\partial x_4}$ $\displaystyle = -2 \sum_{k=1}^K (v_k - e^{-x_1 t_k} (x_3 \sin x_2 t_k + x_4 \cos x_2 t_k )) e^{-x_1 t_k} \cos(x_2 t_k)$    

(One rarely does this kind of calculation by hand any more. The Maple and Mathematica symbolic manipulation programs greatly reduce the manipulative chore. The symbolic toolbox in Matlab is a subset of Maple.)

To apply Newton's method to $ \bf f$ as defined in (3), the sixteen components of the Jacobian matrix are also needed. These are obtained from $ f_i$ above by differentiating with respect to $ x_j$ for $ j=1,2,3,4$.

In the following exercise, you will see that Newton's method applied to this system can require the convergence neighborhood to be quite small.

Exercise 5:
  1. Download my code for the least squares objective function $ F$, its gradient $ \bf f$, and the Jacobian matrix $ J$. The file is called objective.m because a function to be minimized is often called an ``objective function.'' (Our objective is to minimize the objective function.)
  2. Use the command help objective to see how to use it.
  3. Compute f and J at the solution x=[0.15;2;1;3] to be sure that the function is zero there.
  4. Compute the determinant of J (above) to see that it is nonsingular.

Exercise 6: The point of this exercise is to see how sensitive Newton's method can be when the initial guess is changed. Fill in the following table with either the number of iterations required or the word ``failed,'' using vnewton.m and the indicated initial guess. Note that the first line is the correct solution. For this exercise, restrict the number of iterations to be no more than 100.
 Initial guess             Number of iterations
[0.15; 2; 1.0; 3]          ______________

[0.15; 2; 0.9; 3]          ______________
[0.15; 2; 0.0; 3]          ______________
[0.15; 2;-0.5; 3]          ______________
[0.15; 2;-0.6; 3]          ______________

[0.15; 2; 1.0; 4]          ______________
[0.15; 2; 1.0; 5]          ______________
[0.15; 2; 1.0; 6]          ______________

[0.15; 1.99; 1.0; 3]       ______________
[0.15; 1.97; 1.0; 3]       ______________
[0.15; 1.95; 1.0; 3]       ______________
[0.15; 1.93; 1.0; 3]       ______________
[0.15; 1.91; 1.0; 3]       ______________

[0.17; 2; 1.0; 3]          ______________
[0.19; 2; 1.0; 3]          ______________
[0.21; 2; 1.0; 3]          ______________

You can see from the previous exercise that Newton can require a precise guess before it will converge. Let's look at what happens in one of the divergent cases.

Exercise 7: Recall that your vnewton1.m function has signature [x, numIts, iterates]=vnewton1(f,x), so that it returns the matrix of all iterates. Apply that function to the two final cases in the previous exercise, and save the iterates using the following commands.
[x,numIts,convg]=vnewton1('objective',[0.19; 2; 1.0; 3]);
[x,numIts,divg]=vnewton1('objective',[0.21; 2; 1.0; 3]);
In particular, look at x(1) (this is the one that is not already correct in the initial guess) for each of the first six iterates. Plot the first component of the iterates on the same plot using the commands
plot(convg(1,1:6),'b')
hold on
plot(divg(1,1:6),'r')
legend('convergent','divergent')
hold off
You should see that the divergent case starts out immediately with larger variations than the convergent case. Please send me this plot. (In the plot window, go to File $ \rightarrow$Export, then choose ``JPEG image'' from the drop-down box, and give the file a name.)


Softening (damping)

The exercise in the previous section indicates that Newton gets in trouble when its increment is too large. One way to mitigate this problem is to ``soften'' or ``dampen'' the iteration by putting a factor (less than one) on the iterate.

$\displaystyle {\bf x}^{(n+1)}={\bf x}^{(k)}-\alpha J({\bf x}^{(k)})^{-1}{\bf f}({\bf x}^{(k)})$ (4)

where $ \alpha$ is a number smaller than one. Try this idea out in the following exercise.

Exercise 8:
  1. Starting from your vnewton.m file, copy it to a new file named snewton.m (for ``softened Newton''), change its signature to
    function [x,numIts]=snewton(f,x,alpha)
    
    and change it to conform with Equation (4). Don't forget to change the comments and the convergence criterion. Matlab warning: The backslash operator does not observe the ``operator precedence'' you might expect, so you need parentheses. For example, 3*2\4=0.6667, but 3*(2\4)=6!
  2. Using $ \alpha=0.5$, to the nearest 0.01, how large can the first component of the initial guess get before the iteration diverges? (Leave the other three values at their correct values.)

Softening by a constant factor can improve the initial behavior of the iterates, but it destroys the quadratic convergence of the method. Further, it is hard to guess what the softening factor should be. There are tricks to soften the iteration in such a way that when it starts to converge, the softening goes away. One such trick is to compute

$\displaystyle \Delta x$ $\displaystyle =-J({\bf x}^{(k)})^{-1}{\bf f}({\bf x}^{(k)})$    
$\displaystyle \alpha$ $\displaystyle =\frac{1}{1+\Vert\Delta x\Vert}$ (5)
$\displaystyle {\bf x}^{(n+1)}$ $\displaystyle ={\bf x}^{(k)}+\alpha\Delta x$    

You should be able to see how you might prove that this softening strategy does not destroy the quadratic convergence rate.

Another trick is to choose $ \alpha$ so that the objective function necessarily decreases on each iteration. This latter idea can be quite effective, but offers no advantage in the present case.

Exercise 9:
  1. Starting from your snewton.m file, copy it to a new file named snewton1.m, change its signature to
    function [x,numIts]=snewton1(f,x)
    
    and change it to conform with Equation (5). Don't forget to change the comments.
  2. Consider the number of iterations used by snewton in the previous exercise for the convergent case with largest first component. Does snewton1 take fewer iterations? How many fewer?
  3. Using snewton1.m, to the nearest 0.01, how large can the first component of the initial guess get before the iteration diverges? (Leave the other three values at their correct values.)



Back to MATH2070 page.





Mike Sussman 2006-09-02