MATH2071: LAB #2: Norms, Errors and Condition Numbers

Introduction Exercise 1
Vector Norms Exercise 2
Matrix Norms Exercise 3
Compatible Matrix Norms Exercise 4
More on the Spectral Radius Exercise 5
Types of Errors Exercise 6
Condition Numbers Exercise 7


Introduction

The objects we work with in linear systems are vectors and matrices. In order to make statements about the size of these objects, and the errors we make in solutions, we want to be able to describe the ``sizes'' of vectors and matrices, which we do by using norms.

We then need to consider whether we can bound the size of the product of a matrix and vector, given that we know the ``size'' of the two factors. In order for this to happen, we will need to use matrix and vector norms that are compatible. These kinds of bounds will become very important in error analysis.

We will then consider the notions of forward error and backward error in a linear algebra computation.

From the definitions of norms and errors, we can define the condition number of a matrix, which will give us an objective way of measuring how ``bad'' a matrix is, and how many digits of accuracy we can expect when solving a particular linear system.

This lab will take two sessions. You may find it convenient to print the pdf version of this lab rather than the web page itself.


Vector Norms

A vector norm assigns a size to a vector, in such a way that scalar multiples do what we expect, and the triangle inequality is satisfied. There are three common vector norms in $ n$ dimensions:
  * the $ L^1$ vector norm, $ \Vert x\Vert_1 = \sum_{i=1}^n \vert x_i\vert$;
  * the $ L^2$ (Euclidean) vector norm, $ \Vert x\Vert_2 = \left(\sum_{i=1}^n \vert x_i\vert^2\right)^{1/2}$; and
  * the $ L^\infty$ vector norm, $ \Vert x\Vert_\infty = \max_{1\le i\le n} \vert x_i\vert$.

To compute the norm of a vector $ x$ in Matlab:
  * norm(x,1) computes the $ L^1$ norm;
  * norm(x) or norm(x,2) computes the $ L^2$ norm; and
  * norm(x,inf) computes the $ L^\infty$ norm.

(Recall that inf is the Matlab symbol corresponding to $ \infty$.)
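For example, here is a short sketch (using an arbitrary example vector, not one from the exercise) showing the three commands in action:

  x = [ 3; -4; 0 ];        % an arbitrary example vector
  norm( x, 1 )             % L1 norm: sum of absolute values = 7
  norm( x, 2 )             % L2 norm: Euclidean length = 5
  norm( x, inf )           % L-infinity norm: largest absolute value = 4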

Exercise 1: For each of the following vectors:
   x1 = [ 1; 2; 3 ]
   x2 = [ 1; 0; 0 ]
   x3 = [ 1; 1; 1 ]
compute the vector norms, using the appropriate Matlab commands. Be sure your answers are reasonable.
        L1            L2            L Infinity

  x1    __________    __________    __________
  x2    __________    __________    __________
  x3    __________    __________    __________


Matrix Norms

A matrix norm assigns a size to a matrix, again, in such a way that scalar multiples do what we expect, and the triangle inequality is satisfied. However, what's more important is that we want to be able to mix matrix and vector norms in various computations. So we are going to be very interested in whether a matrix norm is compatible with a particular vector norm, that is, when it is safe to say:

$\displaystyle \Vert Ax\Vert \le \Vert A\Vert \Vert x\Vert $

There are five common matrix norms:
  * the $ L^1$ or ``maximum column sum'' norm, $ \Vert A\Vert_1 = \max_j \sum_i \vert A_{i,j}\vert$;
  * the $ L^2$ norm, the largest singular value of $ A$;
  * the $ L^\infty$ or ``maximum row sum'' norm, $ \Vert A\Vert_\infty = \max_i \sum_j \vert A_{i,j}\vert$;
  * the Frobenius norm, $ \Vert A\Vert_F = \left(\sum_{i,j} \vert A_{i,j}\vert^2\right)^{1/2}$; and
  * the spectral radius, $ \Vert A\Vert _{\mbox{spec}} = \max_i \vert\lambda_i\vert$, the largest eigenvalue magnitude of $ A$ (not a true norm, but commonly used as one).

To compute the norm of a matrix $ A$ in Matlab:
  * norm(A,1) computes the $ L^1$ matrix norm;
  * norm(A) or norm(A,2) computes the $ L^2$ matrix norm;
  * norm(A,inf) computes the $ L^\infty$ matrix norm;
  * norm(A,'fro') computes the Frobenius norm; and
  * max(abs(eig(A))) computes the spectral radius.
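For a quick check that these commands agree with the ``maximum column sum'' and ``maximum row sum'' descriptions, here is a small sketch (using the matrix that appears in Exercise 2 below):

  A = [ 4 1 1; 0 -2 2; 0 5 -4 ];
  norm( A, 1 )               % = 8, the largest column sum of abs(A)
  max( sum( abs(A), 1 ) )    % the same value, computed from the formula
  norm( A, inf )             % = 9, the largest row sum of abs(A)
  max( sum( abs(A), 2 ) )    % the same value, computed from the formula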


Compatible Matrix Norms

A matrix can be identified with a linear operator, and the norm of a linear operator is usually defined in the following way.

$\displaystyle \Vert A\Vert = \max_{x\ne0} \frac{\Vert Ax\Vert}{\Vert x\Vert}$

(It would be more precise to use $ \sup$ rather than $ \max$ here but the surface of a sphere in finite-dimensional space is a compact set, so the supremum is attained, and the maximum is correct.) A matrix norm defined in this way is said to be ``vector-bound'' to the given vector norm.

In order for a matrix norm to be consistent with the linear operator norm, you need to be able to say the following:

$\displaystyle \Vert Ax\Vert \leq \Vert A\Vert \Vert x\Vert$ (1)

but this expression is not necessarily true for an arbitrarily chosen pair of matrix and vector norms. When it is true, the two norms are said to be ``compatible''.

If a matrix norm is vector-bound to a particular vector norm, then the two norms are guaranteed to be compatible. Thus, for any vector norm, there is always at least one matrix norm that we can use. But that vector-bound matrix norm is not always the only choice. In particular, the $ L^2$ matrix norm is difficult (expensive) to compute, but there is a simple alternative.
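One way to see what ``vector-bound'' means experimentally (a rough sketch using random trial vectors, not a proof) is to compare the largest observed ratio $ \Vert Ax\Vert/\Vert x\Vert$ with the matrix norms:

  A = [ 4 1 1; 0 -2 2; 0 5 -4 ];
  ratio = 0;
  for trial = 1 : 1000
    x = randn( 3, 1 );                             % a random trial vector
    ratio = max( ratio, norm( A*x ) / norm( x ) );
  end
  disp( [ ratio, norm( A, 2 ), norm( A, 'fro' ) ] )

The largest observed ratio stays below norm(A,2), the vector-bound value, while norm(A,'fro') is larger still, so the Frobenius norm is also compatible with the Euclidean vector norm.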

Note that:
  * the $ L^1$, $ L^2$, and $ L^\infty$ matrix norms are vector-bound to (and hence compatible with) the $ L^1$, $ L^2$, and $ L^\infty$ vector norms, respectively;
  * the Frobenius matrix norm is compatible with the $ L^2$ vector norm, and is much cheaper to compute than the $ L^2$ matrix norm; and
  * the spectral radius is not vector-bound to (nor even compatible with) any vector norm, although, as discussed below, it ``almost'' is.

Exercise 2: Consider each of the following column vectors:
   x1 = [ 1, 2, 3 ]'
   x2 = [ 1, 0, 0 ]'
   x3 = [ 1, 1, 1 ]'
For the following matrix A:
A=[  4  1  1
     0 -2  2
     0  5 -4 ]
verify that the compatibility condition holds by computing the matrix norms $ \Vert A\Vert$ (using the Matlab commands above) and comparing them with the ratios of $ \Vert Ax\Vert/\Vert x\Vert$. The final column refers to satisfaction of the compatibility relationship (1).

Matrix  Vector            norm(A*x1)   norm(A*x2)   norm(A*x3)
                norm(A)   ----------   ----------   ----------   OK?
 norm    norm              norm(x1)     norm(x2)     norm(x3)

   1      1    _________  __________   __________   __________   ___
   2      2    _________  __________   __________   __________   ___
 'fro'    2    _________  __________   __________   __________   ___
  inf    inf   _________  __________   __________   __________   ___


More on the Spectral Radius

In the above text there is a statement that the spectral radius is not vector-bound with any norm, but that it ``almost'' is. In this section we will see by example what this means.

In the first place, let's try to see why it isn't vector-bound to the $ L^2$ norm. Consider the following false ``proof''. Given a matrix $ A$, for any vector $ \bf x$, break it into a sum of eigenvectors of $ A$ as

$\displaystyle {\bf x}=\sum x_i{\bf e}_i$

where $ {\bf e}_i$ are the eigenvectors of $ A$, normalized to unit length. Then, for $ \lambda_i$ the eigenvalues of $ A$,
$\displaystyle \Vert A{\bf x}\Vert^2_2 = \Big\Vert\sum_i \lambda_i x_i{\bf e}_i\Big\Vert^2_2 \leq \max_i \vert\lambda_i\vert^2\sum_i \vert x_i\vert^2 \leq \Vert A\Vert _{\mbox{spec}}^2\Vert{\bf x}\Vert^2_2$

and taking square roots completes the ``false proof.''

Why is this ``proof'' false? The error is in the statement that all vectors can be expanded as sums of orthonormal eigenvectors. If you knew, for example, that $ A$ were positive definite and symmetric, then the eigenvectors of $ A$ would form an orthonormal basis and the proof would be correct. Hence the term ``almost.'' On the other hand, this observation tells us where to look for a counterexample: a matrix whose eigenvectors do not form an orthonormal basis, such as the one in the following exercise.

Exercise 3: Consider the following matrix
A=[0.5  1   0   0   0   0   0 
    0  0.5  1   0   0   0   0 
    0   0  0.5  1   0   0   0 
    0   0   0  0.5  1   0   0
    0   0   0   0  0.5  1   0
    0   0   0   0   0  0.5  1
    0   0   0   0   0   0  0.5]
You should recognize this as a Jordan block. Any matrix can be decomposed into a collection of such blocks (its Jordan canonical form) by a change of basis.
  1. Use the Matlab routine [V,D]=eig(A) (recall that the notation [V,D]= is the way that Matlab denotes that the function--eig in this case--returns two quantities) to get the eigenvalues (diagonal entries of D) and eigenvectors (columns of V) of A. How many linearly independent eigenvectors are there?
  2. You just computed the eigenvalues of A. What is the spectral norm of A?
  3. For the vector x=[1;1;1;1;1;1;1], what are $ \Vert{\bf x}\Vert _{L^2}$, $ \Vert A{\bf x}\Vert _{L^2}$, and $ (\Vert A\Vert _{\mbox{spec}})(\Vert{\bf x}\Vert _{L^2})$?
  4. Is Equation (1) satisfied?

It is a well-known fact that if the spectral radius of a matrix A is smaller than 1.0 then $ \lim_{n\rightarrow\infty} A^n=0$.
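You can watch this happen numerically. Here is a minimal sketch (assuming the Jordan block A from Exercise 3 is still defined) that prints $ \Vert A^k\Vert_2$ for a few powers; the norm grows at first but eventually decays toward zero, since the spectral radius is 0.5:

  for k = [ 1 5 10 20 40 80 ]
    disp( [ k, norm( A^k, 2 ) ] )    % spectral radius 0.5 < 1, so A^k -> 0
  end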

Exercise 4:
  1. For x=[1;1;1;1;1;1;1] compute and plot $ \Vert A^kx\Vert _{L^2}$ for $ k=0,1,\dots,40$. Please include this plot with your summary.
  2. What is the smallest value of $ k>2$ for which $ \Vert A^kx\Vert _{L^2}\le\Vert Ax\Vert _{L^2}$?
  3. What is $ \max_{0\le k\le40} \Vert A^kx\Vert _{L^2}$?


Types of Errors

A natural assumption to make is that the term ``error'' refers always to the difference between the computed and exact ``answers.'' We are going to have to discuss several kinds of error, so let's refer to this first error as ``solution error'' or ``forward error.'' Suppose we want to solve a linear system of the form $ A{\bf x}={\bf b}$, (with exact solution $ \bf x$) and we computed $ {\bf x}_{\mbox{approx}}$. We define the solution error as $ \Vert {\bf x}_{\mbox{approx}} - {\bf x} \Vert $.

Sometimes it is not possible to compute the solution error, and we would like a substitute whose behavior is acceptably close to that of the solution error. One example arises during an iterative solution process, where we would like to monitor the progress of the iteration but do not yet have the solution and so cannot compute the solution error. In this case, we are interested in the ``residual error'' or ``backward error,'' which is defined by $ \Vert A {\bf x}_{\mbox{approx}} - {\bf b}\Vert = \Vert {\bf b}_{\mbox{approx}} - {\bf b} \Vert $, where, for convenience, we have defined the variable $ {\bf b}_{\mbox{approx}}$ to equal $ A{\bf x}_{\mbox{approx}}$. Another way of looking at the residual error is that it tells us the difference between the right hand side that would ``work'' for $ {\bf x}_{\mbox{approx}}$ and the right hand side we actually have.

If we think of the right hand side as being a target, and our solution procedure as determining how we should aim an arrow so that we hit this target, then the solution error measures how far our aim is from the correct aim, while the residual error measures how far the arrow actually lands from the target.

There are problems for which the solution error is huge and the residual error tiny, and all the other possible combinations also occur.
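The following short sketch shows how the two errors are computed once the true solution is known. (The $2\times2$ system here is made up for illustration and is not one of the cases in the exercise below.)

  A = [ 2 1; 1 3 ];
  xTrue = [ 1; 1 ];                      % choose the answer first ...
  b = A * xTrue;                         % ... so b is consistent with it
  xApprox = [ 1.001; 0.999 ];            % pretend this came from some algorithm
  solution_error = norm( xApprox - xTrue )   % forward ("solution") error
  residual_error = norm( A*xApprox - b )     % backward ("residual") error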

Exercise 5: For each of the following cases, compute the Euclidean norm (two norm) of the solution error and residual error and characterize each as ``large'' or ``small.'' (For the purpose of this exercise, take ``large'' to mean greater than 1 and ``small'' to mean smaller than 0.01.) In each case, you must find the true solution first (Pay attention! In each case you can guess the true solution, xTrue.), then compare it with the approximate solution xApprox.
  1. A=[1,1;1,(1-1.e-12)], b=[0;0], xApprox=[1;-1]
  2. A=[1,1;1,(1-1.e-12)], b=[1;1], xApprox=[1.00001;0]
  3. A=[1,1;1,(1-1.e-12)], b=[1;1], xApprox=[100;100]
  4. A=[1.e+12,-1.e+12;1,1], b=[0;2], xApprox=[1.001;1]

Case Residual large/small  xTrue   Error   large/small
 1   ________   _______   ______   _______   ________
 2   ________   _______   ______   _______   ________
 3   ________   _______   ______   _______   ________
 4   ________   _______   ______   _______   ________

The norms of the matrix and its inverse exert some limits on the relationship between the forward and backward errors. Assuming we have compatible norms:

$\displaystyle \Vert x_{\mbox{approx}} - x \Vert = \Vert A^{-1} A ( x_{\mbox{approx}} - x ) \Vert \le (\Vert A^{-1}\Vert) (\Vert b_{\mbox{approx}} - b \Vert )$

and

$\displaystyle \Vert A x_{\mbox{approx}} - b \Vert = \Vert A x_{\mbox{approx}} - A x \Vert \le (\Vert A \Vert)( \Vert x_{\mbox{approx}} - x \Vert )$

Put another way,

$\displaystyle (\mbox{solution error}) \le \Vert A^{-1}\Vert\,(\mbox{residual error})$

$\displaystyle (\mbox{residual error}) \le \Vert A \Vert\,(\mbox{solution error})$

Relative error

Often, it's useful to consider the size of an error relative to the true quantity. This quantity is sometimes multiplied by 100 and expressed as a ``percentage.'' If the true solution is $ x$ and we computed $ x_{\mbox{approx}}=x+\Delta x$, the relative solution error is defined as

$\displaystyle (\mbox{relative solution error}) = \frac{ \Vert x_{\mbox{approx}} - x \Vert}{\Vert x \Vert} = \frac{\Vert \Delta x \Vert}{\Vert x \Vert}$

Given the computed solution $ x_{\mbox{approx}}$, we know that it satisfies the equation $ Ax_{\mbox{approx}}=b_{\mbox{approx}}$. If we write $ b_{\mbox{approx}}=b+\Delta b$, then we can define the relative residual error as:

$\displaystyle (\mbox{relative residual error}) = \frac{\Vert b_{\mbox{approx}} - b \Vert}{\Vert b \Vert} = \frac{\Vert \Delta b \Vert}{\Vert b \Vert}$

These quantities depend on the vector norm used, they cannot be defined in cases where the divisor is zero, and they are problematic when the divisor is small.

Actually, relative quantities are important beyond being ``easier to understand.'' Consider the boundary value problem (BVP) for the ordinary differential equation

$\displaystyle u'' = -\frac{\pi^2}{4}\sin\left(\frac{\pi x}{2}\right), \qquad u(0) = 0, \qquad u(1) = 1$ (2)

This problem has the exact solution $ u=\sin(\pi x/2)$. In the following exercise you will compute the solution for various mesh sizes and use vector norms to compute the solution error. Since the topic of ordinary differential equations has not been covered yet, you will be given a Matlab function to compute the solution. You will see why relative errors are the most convenient for error computation.
Exercise 6:
  1. Download a copy of lab02bvp.m. Use the command help lab02bvp to see how it is used. Feel free to examine the code to see exactly what it does. Note: In earlier sections, the vector $ \mathbf{x}$ represented the unknown. Here, $ \mathbf{u}$ represents the unknown (dependent variable) and $ \mathbf{x}$ represents the independent variable.
  2. As a test, solve the system for npts=10, plot the solution and compare it with sin(pi*x/2). You can use the following code to do so.
    plot(x,u,x,sin(pi*x/2));
    legend('computed solution','exact solution');
    
    The two solutions should be quite close. You do not need to send me the plot.
  3. Use code such as the following to compute solution errors for various mesh sizes, and fill in the following table. Recall that the exact solution is $ \sin(\pi x/2)$. (Note: you may want to put this code into a script m-file.)
    sizes=[10 20 40 80 160 320 640];      % numbers of mesh points to try
    for k=1:7
      [x,y]=lab02bvp(sizes(k));           % computed solution on this mesh
      error(k,1)=norm(y-sin(pi*x'/2));    % solution error vs. exact solution
      relative_error(k,1)=error(k,1)/norm(sin(pi*x/2));
    end
    disp([error(1:6)./error(2:7) ,  ...   % ratios of successive errors
          relative_error(1:6)./relative_error(2:7)])
    
                      Euclidean norm
              error ratio       relative error ratio
     10/ 20   ___________       ___________
     20/ 40   ___________       ___________
     40/ 80   ___________       ___________
     80/160   ___________       ___________
    160/320   ___________       ___________
    320/640   ___________       ___________
    
  4. Repeat the above for the $ L^1$ vector norm.
                      1 vector norm
              error ratio       relative error ratio
     10/ 20   ___________       ___________
     20/ 40   ___________       ___________
     40/ 80   ___________       ___________
     80/160   ___________       ___________
    160/320   ___________       ___________
    320/640   ___________       ___________
    
  5. Repeat the above for the $ L^{\infty}$ vector norm.
                      Infinity vector norm
              error ratio       relative error ratio
     10/ 20   ___________       ___________
     20/ 40   ___________       ___________
     40/ 80   ___________       ___________
     80/160   ___________       ___________
    160/320   ___________       ___________
    320/640   ___________       ___________
    
  6. The method used to solve this boundary value problem converges as $ O(h^2)$. Since the number of mesh points is about $ 1/h$, doubling the number of mesh points should quarter the error. Which of the above calculations yields this rate?
Convergence rates are an important component of this course, and, as you can see, it is almost always best to use relative errors when computing convergence rates of vector quantities.

The reason that the $ L^1$ and $ L^2$ norms give different results is that the dimension of the space, $ n$, creeps into the calculation. For example, if a function is identically one, $ f(x)=1$, then its $ L^2$ norm, $ \left(\int_0^1\vert f(x)\vert^2dx\right)^{1/2}$, is one, but if a vector of dimension $ n$ has all components equal to one, then its $ L^2$ norm is $ \sqrt{n}$. Taking ratios of norms (relative errors) eliminates this dependence on $ n$.
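A tiny sketch (with $ n=100$ as an arbitrary example) makes this concrete:

  n = 100;
  v = ones( n, 1 );
  norm( v )                     % = sqrt(n) = 10: grows with the dimension
  w = v + 0.01;                 % a vector in error by 0.01 in every entry
  norm( w - v )                 % absolute error = 0.01*sqrt(n), also grows with n
  norm( w - v ) / norm( v )     % relative error = 0.01, independent of n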

When you are doing your own research, you will have occasion to compare theoretical and computed convergence rates. When doing so, you may use tables such as those above. If you find that your computed convergence rates differ from the theoretical ones, instead of looking for some error in your theory, you should first check that you are using the correct norm!


Condition Numbers

Given a square matrix $ A$, the $ L^{2}$ condition number $ k_{2}(A)$ is defined as:

$\displaystyle k_{2}(A) = \Vert A\Vert _{2} \Vert A^{-1}\Vert _{2}$ (3)

if the inverse of $ A$ exists. If the inverse does not exist, then we say that the condition number is infinite. Similar definitions apply for $ k_{1}(A)$ and $ k_{\infty}(A)$.

Matlab provides three functions for computing condition numbers: cond, condest, and rcond. cond computes the condition number according to Equation (3), and can use the one norm, the two norm, the infinity norm or the Frobenius norm. This calculation can be expensive, but it is accurate. condest computes an estimate of the condition number using the one norm. rcond uses a different method to estimate the ``reciprocal condition number,'' defined as

   rcond(A)$\displaystyle = \left\{\begin{array}{ll} 1/k_{1}(A) & \mbox{if $A$ is nonsingular}\\ 0 & \mbox{if $A$ is singular}\end{array}\right.$

So as a matrix ``goes singular,'' rcond(A) goes to zero in a way similar to the determinant.
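A brief sketch comparing the three functions on a nearly singular matrix (an arbitrary example, not one from the exercises):

  A = [ 1 1; 1 1+1e-10 ];     % a nearly singular 2x2 matrix
  cond( A )                   % two-norm condition number, roughly 4e10
  cond( A, 1 )                % one-norm condition number
  condest( A )                % estimate of the one-norm condition number
  rcond( A )                  % estimate of 1/cond(A,1), close to zero here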

We won't worry about the fact that the condition number is somewhat expensive to compute, since it requires computing the inverse or (possibly) the singular value decomposition (a topic to be studied later). Instead, we'll concentrate on what it's good for. We already know that it's supposed to give us some idea of how singular the matrix is. But its real role is in error estimation for the linear system problem.

We suppose that we are really interested in solving the linear system

$\displaystyle A x = b $

but that the right hand side we give to the computer has a small error or ``perturbation'' in it. We might denote this perturbed right hand side as $ b+\Delta b$. We can then assume that our solution will be ``slightly'' perturbed, so that we are justified in writing the system as

$\displaystyle A (x + \Delta x) = b + \Delta b $

The question is, if $ \Delta b$ is really small, can we expect that $ \Delta x$ is small? Can we actually guarantee such a limit?

If we are content to look at the relative errors, and if the norm used to define $ k(A)$ is compatible with the vector norm used, it is fairly easy to show that:

$\displaystyle \frac{\Vert\Delta x\Vert}{\Vert x\Vert}\le k(A)\frac{\Vert\Delta b\Vert}{\Vert b\Vert}$

You can see that we would like the condition number to be as small as possible. (It turns out that the condition number is bounded below. What is the smallest possible value of the condition number?). In particular, since we have about 14 digits of accuracy in Matlab, if a matrix has a condition number of $ 10^{14}$, or rcond(A) of $ 10^{-14}$, then an error in the last significant digit of any element of the right hand side has the potential to make all of the digits of the solution wrong!
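You can check this bound numerically. The following sketch (an arbitrary ill-conditioned example with a made-up perturbation) compares the actual relative change in the solution with the bound $ k(A)\Vert\Delta b\Vert/\Vert b\Vert$:

  A = [ 1 1; 1 1+1e-6 ];
  b = [ 2; 2 ];
  x = A \ b;
  db = 1e-10 * [ 1; -1 ];                     % a tiny perturbation of b
  dx = A \ ( b + db ) - x;                    % resulting change in the solution
  norm( dx ) / norm( x )                      % actual relative change in x
  cond( A ) * norm( db ) / norm( b )          % bound from the condition number

The first printed number is indeed smaller than the second.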

In the exercises below we will have occasion to use a special matrix called the Frank matrix. Row $ i$ of the $ n\times n$ Frank matrix has the formula:

$\displaystyle {\bf A}_{i,j}=\left\{ \begin{array}{cl} 0 & \mbox{ for }j<i-1\\ n+1-i & \mbox{ for }j=i-1\\ n+1-j & \mbox{ for }j\ge i \end{array}\right.$

The Frank matrix for $ n=5$ looks like:

$\displaystyle \left(\begin{array}{ccccc}
5&4&3&2&1\\
4&4&3&2&1\\
0&3&3&2&1\\
0&0&2&2&1\\
0&0&0&1&1
\end{array}\right)
$

The determinant of the Frank matrix is 1, but is difficult to compute numerically. This matrix has a special form called Hessenberg form wherein all elements below the first subdiagonal are zero. Matlab provides the Frank matrix in its ``gallery'' of matrices, gallery('frank',n), but we will use an m-file frank.m. You can find more information about the Frank matrix from the Matrix Market, and the references therein.
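For reference, here is a minimal sketch of how such a matrix could be built directly from the formula above; the name frank_sketch is made up, and the frank.m file supplied for the lab may be written quite differently.

  function A = frank_sketch( n )
  % FRANK_SKETCH  build the n-by-n Frank matrix from its entry formula
  A = zeros( n, n );
  for i = 1 : n
    for j = 1 : n
      if j >= i
        A(i,j) = n + 1 - j;
      elseif j == i - 1
        A(i,j) = n + 1 - i;
      end                      % entries below the first subdiagonal remain zero
    end
  end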

Exercise 7: To see how the condition number can warn you about loss of accuracy, let's try solving the problem $ Ax=b$, for x=ones(n,1), and with A being the Frank matrix. Compute
  b=A*x;                   % right hand side consistent with the known solution x
  xsolved=A\b;             % solve the linear system numerically
  difference=xsolved-x;    % solution error vector
Of course, xsolved would be the same as x if there were no arithmetic rounding errors, but there are rounding errors, so difference is not zero. This loss of accuracy is the point of the exercise. For convenience let's use Matlab's cond(A) for the $ L^2$ condition number, and assume that the relative error in b is eps, the machine precision. (Recall that Matlab understands the name eps to indicate the smallest number you can add to 1 and get a result different from 1.) We expect the relative solution error to be bounded by eps * cond(A). Since cond uses the Euclidean norm by default, use the Euclidean norm in constructing the table.
   Matrix 
   Size    cond(A)       eps*cond(A)   ||difference||/||x||
    6      __________    __________       __________
   12      __________    __________       __________
   18      __________    __________       __________
   24      __________    __________       __________
Your columns eps*cond(A) and ||difference||/||x|| should be roughly comparable in size (within a factor of 1000 or so).
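One possible way to organize the computation (a sketch only, using the gallery version of the Frank matrix; you may use the lab's frank.m instead):

  for n = [ 6 12 18 24 ]
    A = gallery( 'frank', n );      % or A = frank( n ) with the lab's m-file
    x = ones( n, 1 );
    b = A * x;
    xsolved = A \ b;
    difference = xsolved - x;
    disp( [ n, cond( A ), eps * cond( A ), norm( difference ) / norm( x ) ] )
  end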





Mike Sussman 2009-01-05