The objects we work with in linear systems are vectors and matrices. In order to make statements about the size of these objects, and the errors we make in solutions, we want to be able to describe the ``sizes'' of vectors and matrices, which we do by using norms.
We then need to consider whether we can bound the size of the product of a matrix and vector, given that we know the ``size'' of the two factors. In order for this to happen, we will need to use matrix and vector norms that are compatible. These kinds of bounds will become very important in error analysis.
We will then consider the notions of forward error and backward error in a linear algebra computation.
From the definitions of norms and errors, we can define the condition number of a matrix, which will give us an objective way of measuring how ``bad" a matrix is, and how many digits of accuracy we can expect when solving a particular linear system.
This lab will take two sessions. You may find it convenient to print the pdf version of this lab rather than the web page itself.
A vector norm assigns a size to a vector, in such a way that scalar multiples do what we expect, and the triangle inequality is satisfied. There are three common vector norms in dimensions:
To compute the norm of a vector in Matlab:
x1 = [ 1; 2; 3 ] x2 = [ 1; 0; 0 ] x3 = [ 1; 1; 1 ]compute the vector norms, using the appropriate Matlab commands. Be sure your answers are reasonable.
L1 L2 L Infinity x1 __________ __________ __________ x2 __________ __________ __________ x3 __________ __________ __________
A matrix norm assigns a size to a matrix, again, in such a way that scalar multiples do what we expect, and the triangle inequality is satisfied. However, what's more important is that we want to be able to mix matrix and vector norms in various computations. So we are going to be very interested in whether a matrix norm is compatible with a particular vector norm, that is, when it is safe to say:
There are five common matrix norms:
To compute the norm of a matrix in Matlab:
A matrix can be identified with a linear operator, and the norm of a linear operator is usually defined in the following way.
In order for a matrix norm to be consistent with the linear operator norm, you need to be able to say the following:
If a matrix norm is vector-bound to a particular vector norm, then the two norms are guaranteed to be compatible. Thus, for any vector norm, there is always at least one matrix norm that we can use. But that vector-bound matrix norm is not always the only choice. In particular, the matrix norm is difficult (expensive) to compute, but there is a simple alternative.
Note that:
x1 = [ 1, 2, 3 ]' x2 = [ 1, 0, 0 ]' x3 = [ 1, 1, 1 ]'For the same matrix A you used above:
A=[ 4 1 1 0 -2 2 0 5 -4 ]verify that the compatibility condition holds by comparing the values of that you computed in the previous exercise with the ratios of . The final column refers to satisfaction of the compatibility relationship (1).
Matrix Vector norm(A*x1) norm(A*x2) norm(A*x3) norm(A) ---------- ---------- ---------- OK? norm norm norm(x1) norm(x2) norm(x3) 1 1 _________ __________ __________ __________ ___ 2 2 _________ __________ __________ __________ ___ 'fro' 2 _________ __________ __________ __________ ___ inf inf _________ __________ __________ __________ ___
In the above text there is a statement that the spectral radius is not vector-bound with any norm, but that it ``almost'' is. In this section we will see by example what this means.
In the first place, let's try to see why it isn't vector-bound to the norm. Consider the following false ``proof''. Given a matrix , for any vector , break it into a sum of eigenvectors of as
Why is this ``proof'' false? The error is in the statement that all vectors can be expanded as sums of orthonormal eigenvectors. If you knew, for example, that were positive definite and symmetric, then, the eigenvectors of would form an orthonormal basis and the proof would be correct. Hence the term ``almost.'' On the other hand, this observation provides the counterexample.
A=[0.5 1 0 0 0 0 0 0 0.5 1 0 0 0 0 0 0 0.5 1 0 0 0 0 0 0 0.5 1 0 0 0 0 0 0 0.5 1 0 0 0 0 0 0 0.5 1 0 0 0 0 0 0 0.5]You should recognize this as a Jordan block. Any matrix can be decomposesed into several such blocks by a change of basis.
It is a well-known fact that if the spectral radius of a matrix A is smaller than 1.0 then .
A natural assumption to make is that the term ``error'' refers always to the difference between the computed and exact ``answers.'' We are going to have to discuss several kinds of error, so let's refer to this first error as ``solution error'' or ``forward error.'' Suppose we want to solve a linear system of the form , (with exact solution ) and we computed . We define the solution error as .
Sometimes the solution error is not possible to compute, and we would like a substitute whose behavior is acceptably close to that of the solution error. One example would be during an iterative solution process where we would like to monitor the progress of the iteration but we do not yet have the solution and cannot compute the solution error. In this case, we are interested in the ``residual error'' or ``backward error,'' which is defined by where, for convenience, we have defined the variable to equal . Another way of looking at the residual error is to see that it's telling us the difference between the right hand side that would ``work'' for versus the right hand side we have.
If we think of the right hand side as being a target, and our solution procedure as determining how we should aim an arrow so that we hit this target, then
Case Residual large/small xTrue Error large/small 1 ________ _______ ______ _______ ________ 2 ________ _______ ______ _______ ________ 3 ________ _______ ______ _______ ________ 4 ________ _______ ______ _______ ________
The norms of the matrix and its inverse exert some limits on the relationship between the forward and backward errors. Assuming we have compatible norms:
solution error | residual error | ||
residual error | solution error |
Often, it's useful to consider the size of an error relative to the
true quantity. This quantity is sometimes multiplied by 100 and
expressed as a ``percentage.'' If the true solution is and we
computed
, the relative solution error is defined as
relative solution error | |||
relative residual error | |||
Actually, relative quantities are important beyond being ``easier
to understand.'' Consider the boundary value problem (BVP) for the ordinary
differential equation
plot(x,u,x,sin(pi*x/2)); legend('computed solution','exact solution');The two solutions should be quite close. You do not need to send me the plot.
sizes=[10 20 40 80 160 320 640]; for k=1:7 [x,y]=lab02bvp(sizes(k)); error(k,1)=norm(y-sin(pi*x'/2)); relative_error(k,1)=error(k,1)/norm(sin(pi*x/2)); end disp([error(1:6)./error(2:7) , ... relative_error(1:6)./relative_error(2:7)])
Euclidean norm error ratio relative error ratio 10/ 20 ___________ ___________ 20/ 40 ___________ ___________ 40/ 80 ___________ ___________ 80/160 ___________ ___________ 160/320 ___________ ___________ 320/640 ___________ ___________
1 vector norm error ratio relative error ratio 10/ 20 ___________ ___________ 20/ 40 ___________ ___________ 40/ 80 ___________ ___________ 80/160 ___________ ___________ 160/320 ___________ ___________ 320/640 ___________ ___________
Infinity vector norm error ratio relative error ratio 10/ 20 ___________ ___________ 20/ 40 ___________ ___________ 40/ 80 ___________ ___________ 80/160 ___________ ___________ 160/320 ___________ ___________ 320/640 ___________ ___________
The reason that the and norms give different results is that the dimension of the space, creeps into the calculation. For example, if a function is identically one, , then its norm, is one, but if a vector of dimension has all components equal to one, then its norm is . Taking relative norms eliminates the dependence on .
When you are doing your own research, you will have occasion to compare theoretical and computed convergence rates. When doing so, you may use tables such as those above. If you find that your computed convergence rates differ from the theoretical ones, instead of looking for some error in your theory, you should first check that you are using the correct norm!
Given a square matrix , the condition number is defined as:
Matlab provides three functions for computing condition numbers: cond, condest, and rcond. cond computes the condition number according to Equation (3), and can use the one norm, the two norm, the infinity norm or the Frobenius norm. This calculation can be expensive, but it is accurate. condest computes an estimate of the condition number using the one norm. rcond uses a different method to estimate the ``reciprocal condition number,'' defined as
We won't worry about the fact that the condition number is somewhat expensive to compute, since it requires computing the inverse or (possibly) the singular value decomposition (a topic to be studied later). Instead, we'll concentrate on what it's good for. We already know that it's supposed to give us some idea of how singular the matrix is. But its real role is in error estimation for the linear system problem.
We suppose that we are really interested in solving the linear system
If we are content to look at the relative errors, and if the norm used to define is compatible with the vector norm used, it is fairly easy to show that:
In the exercises below we will have occasion to use a special matrix called the Frank matrix. Row of the Frank matrix has the formula:
b=A*x; xsolved=A\b; difference=xsolved-x;Of course, xsolved would be the same as x if there were no arithmetic rounding errors, but there are rounding errors, so difference is not zero. This loss of accuracy is the point of the exercise. For convenience let's use Matlab's estimate cond(A) for the condition number, and assume that the relative error in b is eps, the machine precision. (Recall that Matlab understands the name eps to indicate the smallest number you can add to 1 and get a result different from 1.) We expect that the relative solution error to be bounded by eps * cond(A). Since cond uses the Euclidean norm by default, use the Euclidean norm in constructing the table.
Matrix Size cond(A) eps*cond(A) ||difference||/||x|| 6 __________ __________ __________ 12 __________ __________ __________ 18 __________ __________ __________ 24 __________ __________ __________Your second and third columns should be roughly comparable in size (within a factor of 1000 or so).
Back to MATH2071 page.