MATH2070: LAB 3: Roots of Equations




Introduction Exercise 1
A Sample Problem Exercise 2
The Bisection Idea Exercise 3
Programming Bisection Exercise 4
Variable Function Names Exercise 5
The secant method Exercise 6
  Exercise 7
  Exercise 8
Regula Falsi (Extra) Extra Credit


Introduction

The root finding problem is the task of finding one or more values for a variable $x$ so that an equation

\begin{displaymath}
f(x) = 0
\end{displaymath}

is satisfied. Denote the desired solution as $x_0$. As a computational problem, we are interested in effective and efficient ways of finding a value $x$ that is ``close enough'' to $x_0$. There are two common ways to measuring ``close enough.''
  1. The ``residual error,'' $\vert f(x)\vert$, is small, or
  2. The ``approximation error'' or ``true error,'' $\vert x-x_0\vert$ is small.
Of course, the true error is not known, so it must be approximated in some way. In this lab we will look at some nonlinear equations and some simple methods of finding an approximate solution. We will use an estimate of the true error when it is readily available and will use the residual error the rest of the time.

You will see in this lab that error estimates can be misleading and there are many ways to adjust and improve them. We will not examine more sophisticated error estimation in this lab.

This lab will take two sessions. If you print this lab, you may find the pdf version more appropriate.

Remarks on programming style

As you have undoubtedly noticed, Quarteroni, Sacco, and Saleri include a substantial amount of sample Matlab code in the text. Further, this code is available for download at http://www1.mate.polimi.it/%7ecalnum/programs.html. (Be careful, though; I have observed errors in some of the code on the web site.)
Warning: This link may not be working!

You will find that the code presented in these labs has a very different appearance than the code printed in the text. Perhaps the biggest difference is that the code in the book often includes several commands on a single line, separated by semicolons, while the code in these labs never has more than one command on a single line. It is important for these labs that you follow the formatting presented in the labs for code you hand in because the code in the labs is formatted to make reading and debugging as easy as possible, while code in the text is formatted to take up minimal space (so the book is less expensive).

A second difference is that the code presented in the labs tends to use much longer variable names than the code in the text. Longer variable names also improves clarity.

Clarity is important for debugging because if code is harder to read then it is harder to debug. Debugging is the most difficult and least rewarding activity you will engage in, so anything that simplifies debugging pays off. Clarity is also important when others read your code. Since I need to read your code in order to give you a grade, please format your code similarly to that in the labs.


A Sample Problem

Suppose we want to know if there is a solution to

\begin{displaymath}
\cos x = x ,
\end{displaymath}

whether the solution is unique, and what its value is. This is not an algebraic equation, and there is little hope of forming an explicit expression for the solution.

Since we can't solve this equation exactly, it's worth knowing whether there is anything we can say about it. In fact, if there is a solution $x_0$ to the equation, we can probably find a computer representable number x which approximately satisfies the equation (has low residual error) and which is close to $x_0$ (has a low approximation error). Each of the following two exercises uses a plot to illustrate the existance of a solution.

Note: Matlab has the capability of plotting some functions by name. We won't be using this capability in this course because it is too specialized. Instead, we will construct plots the way you probably learned when you first learned about plotting: by constructing a list of $x$-values (abscissæ) and corresponding $y$-values (ordinates) and then plotting the points with lines connecting them.

Exercise 1: The following steps show how to use Matlab to plot the functions $y=\cos x$ and $y=x$ together in the same graph from $-\pi$ to $\pi$.
  1. Define a Matlab (vector) variable xplot to take on 100 evenly-spaced values between -pi and pi. You can use linspace to do this.
  2. Define the (vector) variable y1plot by y1plot=cos(xplot). This is for the curve $y=\cos x$.
  3. Define the (vector) variable y2plot by y2plot=xplot. This is for the line $y=x$.
  4. Plot both y1plot and y2plot by plotting the first,
    plot(xplot,y1plot)
    
    then hold on, plot the second, and hold off. You saw this done before in Lab 2. Note: If you read the help files for the ``plot'' function, you will find that an alternative single command is plot(xplot,y1plot,xplot,y2plot).
  5. The two lines intersect, clearly showing that a solution to the equation $\cos x=x$ exists. Include the Matlab commands you used in your summary file. Send me the plot in the form of a jpg file. To create such a file, use the command
      print -djpeg exer1.jpg
    
    or use the ``File'' menu, ``export'' and choose JPEG image.

Exercise 2:
  1. Write an m-file named cosmx.m (``COS Minus X'') that defines the function

    \begin{displaymath}
f(x)=\cos x-x
\end{displaymath}

    Recall that a function m-file is one that starts with the word function. In this case it should start off as
    function y = cosmx ( x )
    % y = cosmx(x) computes the difference y=cos(x)-x
    
    % your name and the date
    
    y = ???
    
    (This is very simple-the object of this exercise is to get going.) Include a copy of this file with your summary.
  2. Now use your cosmx function to compute cosmx(0.5). Your result should be about 0.37758 and should not result in extraneous printed or plotted information.
  3. Now use your cosmx function in the same plot sequence
    plot(???
    hold on
    plot(???
    hold off
    
    as above to plot cosmx and the horizontal axis on the interval $-\pi\le x \le\pi$. Send me the plot file as exer2.jpg.

    Hint: The horizontal axis is the line $y=0$. To plot this as a function, you need a vector of at least two $x$-values between $-\pi$ and $\pi$ and a corresponding vector of $y$-values all of which are zero. Think of a way to generate an appropriate set of $x$-values and $y$-values when all the $y$-values are zero. You could use the Matlab function zeros.


The Bisection Idea

The idea of the bisection method is very simple. We assume that we are given two values $x=a$ and $x=b$ ($a<b$), and that the function $f(x)$ is positive at one of these values (say at $a$) and negative at the other. (When this occurs, we say that $[a,b]$ is a ``change-of-sign interval'' for the function.) Assuming $f$ is continuous, we know that there must be at least one root in the interval.

Intuitively speaking, if you divide a change-of-sign interval in half, one or the other half must end up being a change-of-sign interval, and it is only half as long. Keep dividing in half until the change-of-sign interval is so small that any point in it is within the specified tolerance.

More precisely, consider the point $x=(a+b)/2$. If $f(x)=0$ we are done. (This is pretty unlikely). Otherwise, depending on the sign of $f(x)$, we know that the root lies in $[a,x]$ or $[x,b]$. In any case, our change-of-sign interval is now half as large as before. Repeat this process with the new change of sign interval until the interval is sufficiently small and declare victory.

We are guaranteed to converge. We can even compute the maximum number of steps this will take, because if the original change-of-sign interval has length $\ell$ then after one bisection the current change-of-sign interval is of length $\ell/2$, etc. We know in advance how well we will approximate the root $x_0$. These are very powerful facts, which make bisection a robust algorithm--that is, it is very hard to defeat it.

Exercise 3:
  1. If we know the start points $a$ and $b$ and the interval size tolerance $\epsilon$, we can predict beforehand the maximum number of steps the bisection code can take. What is the formula for this value?
  2. Give an example of a continuous function that has only one root in the interval [-2, 1], but for which bisection could not be used.


Programming Bisection

Before we look at a sample bisection program, let's discuss some programming issues. If you haven't done much programming before, this is a good time to try to understand the logic behind how we choose variables to set up, what names to give them, and how to control the logic.

First of all, we should write the bisection algorithm as a Matlab function. There are two reasons for writing it as a function instead of a script m-file (without the function header) or by typing everything at the command line. The first reason is a practical one: we will be executing the algorithm several times, with differing end points, and functions allow us to use temporary variables without worrying about whether or not they have already been used before. The second reason is pedantic: you should learn how to do this now because it is an important tool in your toolbox.

A precise description of the bisection algorithm is presented by Quarteroni, Sacco, and Saleri in Section 6.2.1, on pages 250-251. Their algorithm can be expressed in the following way. (Note: the product $f(a)\cdot f(b)$ is negative if and only if $f(a)$ and $f(b)$ are of opposite sign and neither is zero.)

Given a function $f:\mathbb{R}\rightarrow\mathbb{R}$ and $a,b\in\mathbb{R}$ so that $f(a)\cdot f(b)<0$, then sequences $a^{(k)}$ and $b^{(k)}$ for $k=0,\ldots$ with $f(a^{(k)})\cdot f(b^{(k)})<0$ for each $k$ can be constructed by setting $a^{(1)}=a$, $b^{(1)}=b$ then, for each $k>0$,

  1. Set $x^{(k)}=(a^{(k)}+b^{(k)})/2$.
  2. If $x^k$ is close enough to the root, exit the algorithm.
  3. If $f(x^{(k)})\cdot f(a^{(k)})<0$, then set $a^{(k+1)}=a^{(k)}$ and $b^{(k+1)}=x^{(k)}$.
  4. If $f(x^{(k)})\cdot f(b^{(k)})<0$, then set $a^{(k+1)}=x^{(k)}$ and $b^{(k+1)}=b^{(k)}$.
  5. Return to step 1.

The bisection algorithm can be described in a manner more appropriate for computer implementation in the following way. Any algorithm intended for a computer program must address the issue of when to stop the iteration. In the following algorithm,

Algorithm Bisect($f$,$a$,$b$,$x$,$\epsilon$)

  1. Set $x$:=$(a+b)/2$.
  2. If $\vert b-a\vert \leq \epsilon$, or if $f(x)$ happens to be exactly zero, then accept $x$ as the approximate root, and exit.
  3. If sign($f(b)$) * sign($f(x)$) < 0, then $a$:=$x$; otherwise, $b$:=$x$.
  4. Return to step 1.

Remarks:

The language of this description, called ``pseudocode,'' is based on a computer language called Algol but is intended merely to be a clear means of specifying algorithms in books and papers. One objective of this lab is to show how to rewrite algorithms like it in the Matlab language. As a starting point, let's fix $f(x)$ to be the function cosmx that you just wrote. Later you will learn how to add it to the calling sequence. Let's also fix $\epsilon=10^{-6}$.

Here is a Matlab function that carries out the bisection algorithm for our cosmx function. In Matlab, returned values are separated from arguments, so the variables x and itCount are written on the left of an equal sign.

function [x,itCount] = bisect_cosmx( a, b)
% [x,itCount] = bisect_cosmx( a, b) uses bisection to find a
% root of cosmx between a and b to tolerance 0.000001=1.0e-6
% a=left end point of interval
% b=right end point of interval
% cosmx(a) and cosmx(b) should be of opposite signs
% x is the approximate root found
% itCount is the number of iterations required.

% your name and the date

EPSILON = 1.0e-6;
fa = cosmx(a); 
fb = cosmx(b);

for itCount = 1:(???)  % fill in this value from Exercise 3

  x = (b+a)/2;
  fx = cosmx(x);

  % The following statement prints the progress of the algorithm
  disp(strcat(  'a=' , num2str(a), ', fa=' , num2str(fa), ...
              ', x=' , num2str(x), ', fx=' , num2str(fx), ...
              ', b=' , num2str(b), ', fb=' , num2str(fb)))

  if ( fx == 0 )
    return;  % found the solution exactly!
  elseif ( abs ( b - a ) < EPSILON )
    return;  % satisfied the convergence criterion
  end

  if ( sign(fa) * sign(fx) <= 0 )
    b = x;
    fb = fx;
  else
    a = x; 
    fa = fx;
  end

end

error('bisect_cosmx failed with too many iterations!')

Note: I have violated my own rule about having functions print things, but the disp statement merely prints progress of the function and can be discarded once you are confident it is correct.

Remarks about the code: Compare the code and the algorithm. The code is similar to the algorithm, but there are some significant differences. For example, the algorithm generates a sequence of intervals $I_i$ and sequences of endpoints $(a_i,b_i)$ and midpoints $x_i$. The code does not explicitly generate the intervals and keeps track of only the most recent endpoint and midpoint values. This is a typical strategy in writing code--use variables to represent only the most recent values of a sequence when the full sequence does not need to be retained.

A second difference is in the looping. There is an error exit in the case that the convergence criterion is never satisfied. If the function ever exits with this error, it is because either your estimate of the maximum number of iterations is too small or because there is a bug in your code.

Note the symbol == to represent equality in a test. Use of a single = sign causes a syntax error in Matlab. In languages such as C no syntax error is produced but the statement is wrong nonetheless and can cause serious errors that are very hard to find.

Exercise 4: Create the file bisect_cosmx.m by typing it in yourself, or by using copy-and-paste to copy it from this window.
  1. Replace the ``???'' with your result from Exercise 3. (If you are not confident of your result, use 10000.)
    Hint: Recall that $\log_2(x)=\log(x)/\log(2)$.
  2. Why is the name EPSILON all capitalized?
  3. The variable EPSILON and the reserved Matlab variable eps have similar-sounding names. As EPSILON is used in this function, is it related to eps? If so, how?
  4. What does the sign(x) function do? What if x is 0? The sign function can avoid representation difficulties when fa and fm are near the largest or smallest numbers that can be represented on the computer.
  5. The disp command is used only to monitor progress. Note the use of ... as the continuation character.
  6. What is the result if the Matlab error function is called?
  7. What will the final value of itCount be if a and b are already closer together than the tolerance when the function starts?
  8. Try the command:
      [z,iterations] = bisect_cosmx ( 0, 3 )
    
    Is the value z close to a root of the equation cos(z)=z?
    Warning: You should not see the error message! If you do, your value for the maximum number of iterations is too small.
  9. How many iterations were required? Is this close to the value from your formula in Exercise 3?
  10. Type the command help bisect_cosmx at the Matlab command line. You should see your comments reproduced as a help message. This is one reason for putting comments at the top.
  11. For what input numbers will this program produce an incorrect answer (i.e., return a value that is not close to a root)? Add a check before the loop so that the function will call the Matlab error function instead of returning an incorrect answer.
  12. In your opinion, why did I use abs in the convergence test.


Variable Function Names

Now we have written an m-file called bisect_cosmx.m that can find roots of the function cosmx on any interval, but we really want to use it on an arbitrary function. To do this, we could create an m-file for the other function, but we would have to call that m-file cosmx.m even if there were no cosines in it, because that is what the bisect_cosmx function expects. Suppose that we have three files, called f1.m through f3.m, each of which evaluates a function that we want to use bisection on. We could rename f1.m to cosmx.m, and change its name inside the file as well, and repeat this for the other files. But who wants to do that?

It is more convenient to introduce a dummy variable for the function name, just as we used the dummy variables a and b for numerical values. In Matlab, function names can be passed as quoted strings or, in newer versions, as ``function handles,'' and their use is only a little more difficult than for numbers.

In the bisection function given below, the name of the function is specified in the function call using either a quoted string or the @ character. Either

[x, itCount] = bisect ( 'cosmx' , a, b )
or
[x, itCount] = bisect ( @cosmx , a, b )
will work. Inside the function bisect, the value of the function is found using the Matlab feval function, as shown below.

In newer versions of Matlab, either the quoted string or the @ character will still work, and feval can still be used to evaluate the function. However, in newer versions of Matlab, if you use the @ method (this is called a ``function handle''), then you can dispense with the feval syntax and simply call the function as f(x). If you are using your own recent version of Matlab, use the method you feel more comfortable with. If you are using the computers in GSCC, though, you must use feval.

Warning: If you forget the quotes around the function name or the @ in front of it, Matlab will produce errors that seem to make no sense at all (even less than usual). Keep in mind that function names in calling sequences must have single quotes around them or the @ character in front of them. Check for these first when you get errors in a function.

Remark: Quarteroni, Sacco, and Saleri have an m-file named bisect.m on p. 252. Their function includes a variable named fun that contains a string representing the function whose root is to be found. They find function values using eval. In contrast, bisect.m presented above uses a variable named f to contain a string or a function handle giving the name of an m-file and function values are found using feval. Using m-files for function calls is more flexible, allowing much more complicated functions, and is similar to the way that other programming languages (Fortran, C, C++, Java, etc.) work.

Exercise 5: Copy the file bisect_cosmx.m to a new file named bisect.m, or use the ``Save as'' menu choice to make a copy with the new name bisect.m.
  1. In the declaration line for bisect, include a dummy function name. We might as well call it f, so the line becomes
    function [x, itCount] = bisect ( f, a, b )
    
  2. Change the comments describing the purpose of the function, and explain the variable f.
  3. Where we evaluated the function by writing f(x), we now have to use a Matlab helper function called feval (Function EVALuation). For instance, the line
    fa = cosmx(a);
    
    must be rewritten as:
    fa = feval ( f, a );
    
    Make this change for fa, and a similar change for fb and fx.
  4. When we use bisect, we must include the function name, in quotes:
    [z, iterations] = bisect ( 'cosmx', 0, 3 )
    
    Execute this command and make sure that you get the same result as before. (This is called ``regression testing.'')

    Warning: If you forget the quotes around cosmx and use the command

    [z, iterations] = bisect ( cosmx, 0, 3 )   % error!
    
    you will get the following mysterious error message:
    >> bisect(cosmx,0,3)       
    ??? Input argument 'x' is undefined.
    
    Error in ==> cosmx.m
    On line 2  ==> y=cos(x)-x;
    
  5. It is always a good idea to check that your answers are correct. Consider the simple problem $f_0(x)=1-x$. Write a simple function m-file named f0.m (based on cosmx.m but defining this function and show that you get the correct solution (x=1) starting from the change-of-sign interval [0,3].

Exercise 6: The following table contains formulas and search intervals for four functions. Write four simple function m-files, f1.m, f2.m, f3.m and f4.m, each one similar to cosmx.m but with formulas from the table. Then use bisect to find a root in the given interval, and fill in the table. I have formatted the table as ASCII text so that you can easily include it in your summary file.

Name Formula           Interval     approxRoot No. Steps

 f1  x^2-9             [0,5]        _________  _________
 f2  x^5-x-1           [1,2]        _________  _________
 f3  x*exp(-x)         [-1,2]       _________  _________
 f4  2*cos(3*x)-exp(x) [0,6]        _________  _________
 f5  (x-1)^5           [0,3]        _________  _________


Convergence criteria

You found above that the bisection algorithm admits an excellent stopping criterion, and the number of iterations can be predicted before beginning the algorithm. This feature is remarkable and highly unusual. The much more common case is that it takes as much or more ingenuity to design a good stopping criterion as it does to design the algorithm itself. Furthermore, stopping criteria are likely to be problem-dependent.

One stopping criterion that is usually available is to stop when the residual of the function becomes small ( $\vert f(x)\vert\leq \epsilon$). This criterion is better than nothing, and often serves as a good starting point, but it can have drawbacks.

Exercise 7:
  1. Copy your bisect.m file to one named bisect0.m, or use ``Save as'' from the ``File'' menu. Be sure to change the name of the function inside the file. Change the line
      elseif ( abs ( b - a ) < EPSILON )
    
    to read
      elseif ( abs ( fx ) < EPSILON )
    
  2. Consider the function f0=x-1 on the interval [0,3]. Does your new function bisect0 find the correct answer?
  3. The amount the residual differs from zero is called the residual error and the difference between the solution you found and the true solution (found by hand or some other way) is called the true error. What are the residual and true errors when you used bisect0 to find the root of the function f0 above? How many iterations did it take?
  4. What are the residual and true errors when using bisect0 to find the root of the function f5=(x-1)^5? How many iterations did it take?
  5. To summarize your comparison, fill in the following table
                          residual      true          number of
    Name Formula Interval   error       error         steps
    
    f0    x-1     [0,3]  ___________   ___________   ____________
    f5   (x-1)^5  [0,3]  ___________   ___________   ____________
    
  6. The bisect0 function has the value EPSILON = 1.0e-6; in it. Does the table show that the residual error is smaller than EPSILON? What about the true error?


The secant method

The secant method is described by Quarteroni, Sacco, and Saleri in Section 6.2.2. Instead of dividing the interval in half, as is done in the bisection method, it regards the function as approximately linear, passing through the two points $x=a$ and $x=b$ and then finds the root of this linear function. The interval $[a,b]$ need no longer be a change-of-sign interval, and the next value of $x$ need no longer lie inside the interval $[a,b]$. In consequence, residual convergence must be used in the algorithm instead of true convergence.

It is easy to see that if a straight line passes through the two points $(a,f(a))$ and $(b,f(b))$, then it crosses the $x$-axis ($y=0$) when $x= b-\frac{b-a}{f(b)-f(a)}f(b)$.

The secant method can be described in the following manner.

Algorithm Secant($f$,$a$,$b$,$x$,$\epsilon$)

  1. Set

    \begin{displaymath}x= b-\frac{b-a}{f(b)-f(a)}f(b)\end{displaymath}

  2. If $\vert f(x)\vert\leq \epsilon$, then accept $x$ as the approximate root, and exit.
  3. Replace $a$ with $b$ and $b$ with $x$.
  4. Return to step 1.

The secant method is sometimes much faster than bisection, but since it does not maintain an interval inside which the solution must lie, the secant method can fail to converge at all. Unlike bisection, the secant method can be generalized to two or more dimensions, and the generalization is usually called Broyden's method.

Exercise 8:
  1. Write an m-file named secant.m, based partly on bisect.m, and beginning with the lines
    function [x,itCount] = secant(f, a, b)
    % [x,itCount] = secant(f, a, b)
    % more comments explaining what the function does and 
    % the use of each variable
    
    % your name and the date
    
  2. Because residual error is used for convergence, and because the number of iterations is unknown, choose the values
    EPSILON = 1.0e-12;
    ITMAX = 1000;
    
  3. Complete secant.m according to the secant algorithm given above.
  4. Repeat the experiments done with bisection above, using the secant method, and fill in the following table.
                                                   Secant     Bisection
    Name Formula           Interval     approxRoot No. Steps  No. Steps
    
     f1  x^2-9             [0,5]        _________  _________  _________
     f2  x^5-x-1           [1,2]        _________  _________  _________
     f3  x*exp(-x)         [-1,2]       _________  _________  _________
     f4  2*cos(3*x)-exp(x) [0,6]        _________  _________  _________
     f5   (x-1)^5          [0,3]        _________  _________  _________
     f3  x*exp(-x)         [-1.5,1]     _________  _________  _________
    
    You should observe faster and slower convergence, compared with bisection. You may also observe lack of convergence or convergence to a value that is not a root.


Extra credit (8 points): Regula Falsi

Regula falsi is described by Quarteroni, Sacco, and Saleri in Section 6.2.2. The regula falsi algorithm can be described in the following manner, reminiscent of the bisection and secant algorithms given above. Since the interval $[a,b]$ is supposed to begin as a change-of-sign interval, convergence is guaranteed.

Algorithm Regula($f$,$a$,$b$,$x$,$\epsilon$)

  1. Set

    \begin{displaymath}x= b-\frac{b-a}{f(b)-f(a)}f(b)\end{displaymath}

  2. If $\vert f(x)\vert\leq \epsilon$, then accept $x$ as the approximate root, and exit.
  3. If sign($f(a)$) * sign($f(x)$) $\geq 0$, then set $a=b$.
  4. Set $b=x$.
  5. Return to step 1.

Exercise 9:
  1. Look carefully at the algorithm above. The text presents regula falsi on p. 254 in terms of $x^{(k})$ and $x^{(k')}$ for various values of $k$. What values of $k$ and $k'$ correspond to the values $a$, $b$, and $x$ in the algorithm given above?
  2. Starting from your secant.m file, write an m-file named regula.m to carry out the regula falsi algorithm.
  3. Repeat the experiments from the previous exercise but using regula.m and fill in the following table.
                                               Regula     Secant    Bisection
    Name Formula           Interval approxRoot No. Steps  No. Steps No. Steps
    
     f1  x^2-9             [0,5]    _________  _________  _________ _________
     f2  x^5-x-1           [1,2]    _________  _________  _________ _________
     f3  x*exp(-x)         [-1,2]   _________  _________  _________ _________
     f4  2*cos(3*x)-exp(x) [0,6]    _________  _________  _________ _________
     f5   (x-1)^5          [0,3]    _________  _________  _________ _________
     f3  x*exp(-x)         [-1.5,1] _________  _________  _________ _________
    
    You should observe that both faster and slower convergence, compared with bisection and secant. You should not observe lack of convergence or convergence to an incorrect solution.



Back to MATH2070 page.





Mike Sussman 2012-08-15