MATH2071: LAB #2: Norms, Errors and Condition Numbers

# Introduction

The objects we work with in linear systems are vectors and matrices. In order to make statements about the size of these objects, and the errors we make in solutions, we want to be able to describe the sizes'' of vectors and matrices, which we do by using norms.

We then need to consider whether we can bound the size of the product of a matrix and vector, given that we know the size'' of the two factors. In order for this to happen, we will need to use matrix and vector norms that are compatible. These kinds of bounds will become very important in error analysis.

We will then consider the notions of forward error and backward error in a linear algebra computation.

From the definitions of norms and errors, we can define the condition number of a matrix, which will give us an objective way of measuring how bad" a matrix is, and how many digits of accuracy we can expect when solving a particular linear system.

This lab will take two sessions. You may find it convenient to print the pdf version of this lab rather than the web page itself.

# Vector Norms

A vector norm assigns a size to a vector, in such a way that scalar multiples do what we expect, and the triangle inequality is satisfied. There are three common vector norms in dimensions:

• The vector norm

• The (or Euclidean'') vector norm

• The vector norm

To compute the norm of a vector in Matlab:

• norm(x,1);
• norm(x,2)= norm(x);
• norm(x,inf)
(Recall that inf is the Matlab symbol corresponding to .)

Exercise 1: For each of the following vectors:
   x1 = [ 1; 2; 3 ]
x2 = [ 1; 0; 0 ]
x3 = [ 1; 1; 1 ]

compute the vector norms, using the appropriate Matlab commands. Be sure your answers are reasonable.
        L1            L2            L Infinity

x1    __________    __________    __________
x2    __________    __________    __________
x3    __________    __________    __________


# Matrix Norms

A matrix norm assigns a size to a matrix, again, in such a way that scalar multiples do what we expect, and the triangle inequality is satisfied. However, what's more important is that we want to be able to mix matrix and vector norms in various computations. So we are going to be very interested in whether a matrix norm is compatible with a particular vector norm, that is, when it is safe to say:

There are five common matrix norms:

• The or max column sum'' matrix norm:

• The matrix norm:

where is a (necessarily real) eigenvalue of or

where is a singular value of ;
• The or max row sum'' matrix norm:

• The Frobenius'' matrix norm:

• The spectral'' matrix norm;

(only defined for a square matrix), where is a (possibly complex) eigenvalue of .

To compute the norm of a matrix in Matlab:

• norm(A,1);
• norm(A,2)=norm(A);
• norm(A,inf);
• norm(A,'fro')
• (see below)

# Compatible Matrix Norms

A matrix can be identified with a linear operator, and the norm of a linear operator is usually defined in the following way.

(It would be more precise to use rather than here but the surface of a sphere in finite-dimensional space is a compact set, so the supremum is attained, and the maximum is correct.) A matrix norm defined in this way is said to be vector-bound'' to the given vector norm.

In order for a matrix norm to be consistent with the linear operator norm, you need to be able to say the following:

 (1)

but this expression is not true for an arbitrary chosen pair of matrix and vector norms. If it is true, then the two are compatible''.

If a matrix norm is vector-bound to a particular vector norm, then the two norms are guaranteed to be compatible. Thus, for any vector norm, there is always at least one matrix norm that we can use. But that vector-bound matrix norm is not always the only choice. In particular, the matrix norm is difficult (expensive) to compute, but there is a simple alternative.

Note that:

• The , and matrix norms can be shown to be vector-bound to the corresponding vector norms and hence are guaranteed to be compatible with them;
• The Frobenius matrix norm is not vector-bound to the vector norm, but is compatible with it; the Frobenius norm is much easier to compute than the matrix norm.
• The spectral matrix norm is not vector-bound to any vector norm, but it almost" is. This norm is useful because we often want to think about the behavior of a matrix as being determined by its largest eigenvalue, and it often is. But there is no vector norm for which it is always true that

Exercise 2: Consider each of the following column vectors:
   x1 = [ 1, 2, 3 ]'
x2 = [ 1, 0, 0 ]'
x3 = [ 1, 1, 1 ]'

For the same matrix A you used above:
A=[  4  1  1
0 -2  2
0  5 -4 ]

verify that the compatibility condition holds by comparing the values of that you computed in the previous exercise with the ratios of . The final column refers to satisfaction of the compatibility relationship (1).

Matrix  Vector            norm(A*x1)   norm(A*x2)   norm(A*x3)
norm(A)   ----------   ----------   ----------   OK?
norm    norm              norm(x1)     norm(x2)     norm(x3)

1      1    _________  __________   __________   __________   ___
2      2    _________  __________   __________   __________   ___
'fro'    2    _________  __________   __________   __________   ___
inf    inf   _________  __________   __________   __________   ___


# More on the Spectral Radius

In the above text there is a statement that the spectral radius is not vector-bound with any norm, but that it almost'' is. In this section we will see by example what this means.

In the first place, let's try to see why it isn't vector-bound to the norm. Consider the following false proof''. Given a matrix , for any vector , break it into a sum of eigenvectors of as

where are the eigenvectors of , normalized to unit length. Then, for the eigenvalues of ,

and taking square roots completes the false proof.''

Why is this proof'' false? The error is in the statement that all vectors can be expanded as sums of orthonormal eigenvectors. If you knew, for example, that were positive definite and symmetric, then, the eigenvectors of would form an orthonormal basis and the proof would be correct. Hence the term almost.'' On the other hand, this observation provides the counterexample.

Exercise 3: Consider the following matrix
A=[0.5  1   0   0   0   0   0
0  0.5  1   0   0   0   0
0   0  0.5  1   0   0   0
0   0   0  0.5  1   0   0
0   0   0   0  0.5  1   0
0   0   0   0   0  0.5  1
0   0   0   0   0   0  0.5]

You should recognize this as a Jordan block. Any matrix can be decomposesed into several such blocks by a change of basis.
1. Use the Matlab routine [V,D]=eig(A) (recall that the notation [V,D]= is that way that Matlab denotes that the function--eigin this case--returns two quantities) to get the eigenvalues (diagonal entries of D) and eigenvectors (columns of V) of A. How many linearly independent eigenvectors are there?
2. You just computed the eigenvalues of A. What is the spectral norm of A?
3. For the vector x=[1;1;1;1;1;1;1], what are , , and ?
4. Is Equation (1) satisfied?

It is a well-known fact that if the spectral radius of a matrix A is smaller than 1.0 then .

Exercise 4:
1. For x=[1;1;1;1;1;1;1] compute and plot for . Please include this plot with your summary.
2. What is the smallest value of for which ?
3. What is ?

# Types of Errors

A natural assumption to make is that the term error'' refers always to the difference between the computed and exact answers.'' We are going to have to discuss several kinds of error, so let's refer to this first error as solution error'' or forward error.'' Suppose we want to solve a linear system of the form , (with exact solution ) and we computed . We define the solution error as .

Sometimes the solution error is not possible to compute, and we would like a substitute whose behavior is acceptably close to that of the solution error. One example would be during an iterative solution process where we would like to monitor the progress of the iteration but we do not yet have the solution and cannot compute the solution error. In this case, we are interested in the residual error'' or backward error,'' which is defined by where, for convenience, we have defined the variable to equal . Another way of looking at the residual error is to see that it's telling us the difference between the right hand side that would work'' for versus the right hand side we have.

If we think of the right hand side as being a target, and our solution procedure as determining how we should aim an arrow so that we hit this target, then

• The solution error is telling us how badly we aimed our arrow;
• The residual error is telling us how much we would have to move the target in order for our badly aimed arrow to be a bullseye.
There are problems for which the solution error is huge and the residual error tiny, and all the other possible combinations also occur.

Exercise 5: For each of the following cases, compute the Euclidean norm (two norm) of the solution error and residual error and characterize each as large'' or small.'' (For the purpose of this exercise, take large'' to mean greater than 1 and small'' to mean smaller than 0.01.) In each case, you must find the true solution first (Pay attention! In each case you can guess the true solution, xTrue.), then compare it with the approximate solution xApprox.
1. A=[1,1;1,(1-1.e-12)], b=[0;0], xApprox=[1;-1]
2. A=[1,1;1,(1-1.e-12)], b=[1;1], xApprox=[1.00001;0]
3. A=[1,1;1,(1-1.e-12)], b=[1;1], xApprox=[100;100]
4. A=[1.e+12,-1.e+12;1,1], b=[0;2], xApprox=[1.001;1]

Case Residual large/small  xTrue   Error   large/small
1   ________   _______   ______   _______   ________
2   ________   _______   ______   _______   ________
3   ________   _______   ______   _______   ________
4   ________   _______   ______   _______   ________


The norms of the matrix and its inverse exert some limits on the relationship between the forward and backward errors. Assuming we have compatible norms:

and

Put another way,
 solution error residual error residual error solution error

## Relative error

Often, it's useful to consider the size of an error relative to the true quantity. This quantity is sometimes multiplied by 100 and expressed as a percentage.'' If the true solution is and we computed , the relative solution error is defined as

 relative solution error

Given the computed solution , we know that it satifies the equation . If we write , then we can define the relative residual error as:
 relative residual error

These quantities depend on the vector norm used, they cannot be defined in cases where the divisor is zero, and they are problematic when the divisor is small.

Actually, relative quantities are important beyond being easier to understand.'' Consider the boundary value problem (BVP) for the ordinary differential equation

 0 (2)

This problem has the exact solution . In the following exercise you will be computing the solution for various mesh sizes and using vector norms to compute the solution error. Since the topic of ordinary differential equations has not been covered yet, you will be given a Matlab function to compute the solution. You will see why relative errors are most convenient for error computation.
Exercise 6:
1. Download a copy of lab02bvp.m. Use the command help lab02bvp to see how it is used. Feel free to examine the code to see exactly what it does. Note: In earlier sections, the vector represented the unknown. Here, represents the unknown (dependent variable) and represents the independent variable.
2. As a test, solve the system for npts=10, plot the solution and compare it with sin(pi*x/2). You can use the following code to do so.
plot(x,u,x,sin(pi*x/2));
legend('computed solution','exact solution');

The two solutions should be quite close. You do not need to send me the plot.
3. Use code such as the following to compute solution errors for various mesh sizes, and fill in the following table. Recall that the exact solution is . (Note: you may want to put this code into a script m-file.)
sizes=[10 20 40 80 160 320 640];
for k=1:7
[x,y]=lab02bvp(sizes(k));
error(k,1)=norm(y-sin(pi*x'/2));
relative_error(k,1)=error(k,1)/norm(sin(pi*x/2));
end
disp([error(1:6)./error(2:7) ,  ...
relative_error(1:6)./relative_error(2:7)])

                  Euclidean norm
error ratio       relative error ratio
10/ 20   ___________       ___________
20/ 40   ___________       ___________
40/ 80   ___________       ___________
80/160   ___________       ___________
160/320   ___________       ___________
320/640   ___________       ___________

4. Repeat the above for the vector norm.
                  1 vector norm
error ratio       relative error ratio
10/ 20   ___________       ___________
20/ 40   ___________       ___________
40/ 80   ___________       ___________
80/160   ___________       ___________
160/320   ___________       ___________
320/640   ___________       ___________

5. Repeat the above for the vector norm.
                  Infinity vector norm
error ratio       relative error ratio
10/ 20   ___________       ___________
20/ 40   ___________       ___________
40/ 80   ___________       ___________
80/160   ___________       ___________
160/320   ___________       ___________
320/640   ___________       ___________

6. The method used to solve this boundary value problem converges as . Since the number of mesh points is about , then doubling the number of mesh points should quarter the error. Which of the above calculations yields this rate?
As you will see, convergence rates are an important component of this course, and you can see it is almost always best to use relative errors in computing convergence rates of vector quantities.

The reason that the and norms give different results is that the dimension of the space, creeps into the calculation. For example, if a function is identically one, , then its norm, is one, but if a vector of dimension has all components equal to one, then its norm is . Taking relative norms eliminates the dependence on .

When you are doing your own research, you will have occasion to compare theoretical and computed convergence rates. When doing so, you may use tables such as those above. If you find that your computed convergence rates differ from the theoretical ones, instead of looking for some error in your theory, you should first check that you are using the correct norm!

# Condition Numbers

Given a square matrix , the condition number is defined as:

 (3)

if the inverse of exists. If the inverse does not exist, then we say that the condition number is infinite. Similar definitions apply for and .

Matlab provides three functions for computing condition numbers: cond, condest, and rcond. cond computes the condition number according to Equation (3), and can use the one norm, the two norm, the infinity norm or the Frobenius norm. This calculation can be expensive, but it is accurate. condest computes an estimate of the condition number using the one norm. rcond uses a different method to estimate the reciprocal condition number,'' defined as

rcond(A)

So as a matrix goes singular,'' rcond(A) goes to zero in a way similar to the determinant.

We won't worry about the fact that the condition number is somewhat expensive to compute, since it requires computing the inverse or (possibly) the singular value decomposition (a topic to be studied later). Instead, we'll concentrate on what it's good for. We already know that it's supposed to give us some idea of how singular the matrix is. But its real role is in error estimation for the linear system problem.

We suppose that we are really interested in solving the linear system

but that the right hand side we give to the computer has a small error or perturbation'' in it. We might denote this perturbed right hand side as . We can then assume that our solution will be slightly'' perturbed, so that we are justified in writing the system as

The question is, if is really small, can we expect that is small? Can we actually guarantee such a limit?

If we are content to look at the relative errors, and if the norm used to define is compatible with the vector norm used, it is fairly easy to show that:

You can see that we would like the condition number to be as small as possible. (It turns out that the condition number is bounded below. What is the smallest possible value of the condition number?). In particular, since we have about 14 digits of accuracy in Matlab, if a matrix has a condition number of , or rcond(A) of , then an error in the last significant digit of any element of the right hand side has the potential to make all of the digits of the solution wrong!

In the exercises below we will have occasion to use a special matrix called the Frank matrix. Row of the Frank matrix has the formula:

The Frank matrix for looks like:

The determinant of the Frank matrix is 1, but is difficult to compute numerically. This matrix has a special form called Hessenberg form wherein all elements below the first subdiagonal are zero. Matlab provides the Frank matrix in its gallery'' of matrices, gallery('frank',n), but we will use an m-file frank.m. You can find more information about the Frank matrix from the Matrix Market, and the references therein.

Exercise 7: To see how the condition number can warn you about loss of accuracy, let's try solving the problem , for x=ones(n,1), and with A being the Frank matrix. Compute
  b=A*x;
xsolved=A\b;
difference=xsolved-x;

Of course, xsolved would be the same as x if there were no arithmetic rounding errors, but there are rounding errors, so difference is not zero. This loss of accuracy is the point of the exercise. For convenience let's use Matlab's estimate cond(A) for the condition number, and assume that the relative error in b is eps, the machine precision. (Recall that Matlab understands the name eps to indicate the smallest number you can add to 1 and get a result different from 1.) We expect that the relative solution error to be bounded by eps * cond(A). Since cond uses the Euclidean norm by default, use the Euclidean norm in constructing the table.
   Matrix
Size    cond(A)       eps*cond(A)   ||difference||/||x||
6      __________    __________       __________
12      __________    __________       __________
18      __________    __________       __________
24      __________    __________       __________

Your second and third columns should be roughly comparable in size (within a factor of 1000 or so).

Back to MATH2071 page.

Mike Sussman 2009-01-05