We have seen one factorization, the PLU factorization, that can be used to solve a linear system provided that the system is square, nonsingular, and not too badly conditioned. However, if we want to handle problems with a bad condition number, or that are singular, or even rectangular, we are going to need a different approach. In this lab, we will look at two versions of the QR factorization: one computed by the Gram-Schmidt process and one by Householder matrices.
We will see that the QR factorization can be used to solve the same problems that the PLU factorization handles, but can be extended to do several other tasks for which the PLU factorization is useless. In situations where we are concerned about controlling roundoff error, the QR factorization can be more accurate than the PLU factorization, and is even a little easier to use when solving linear systems, since the inverse of Q is simply its transpose Q'.
We will also see that the SVD can be used to compress data, to identify the range and nullspace of a linear operator, and even to solve non-square systems.
You may find it convenient to print the pdf version of this lab rather than the web page itself. This lab will take three sessions.
Definition: An ``orthogonal matrix'' is a real matrix whose inverse is equal to its transpose.
By convention, an orthogonal matrix is usually denoted by the symbol Q. The definition of an orthogonal matrix immediately implies that

Q'*Q = Q*Q' = I
If I multiply a two-dimensional vector x by Q and its norm doesn't change, then Q*x must lie on the circle whose radius is the norm of x. In other words, the only thing that has happened is that x has been rotated around the origin by some angle, or reflected through the origin or about a diameter. That means that a two-dimensional orthogonal matrix represents a rotation or reflection. In N dimensions, orthogonal matrices are also often called rotations.
When matrices are complex, the term ``unitary'' is the analog of ``orthogonal.'' A matrix U is unitary if U'*U = I, where the prime here refers to the ``Hermitian'' or ``conjugate-transpose'' of the matrix. In Matlab, the prime operator (') implements the Hermitian and the dot-prime operator (.') implements the transpose. A real matrix that is unitary is orthogonal.
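The lab's exercises are in Matlab, but the norm-preserving property of an orthogonal matrix is easy to check in any language. The following is a small illustrative Python sketch (the helper names mat_vec and norm are just for this example): a 2-by-2 rotation matrix is orthogonal, so multiplying by it leaves the norm of a vector unchanged.

```python
import math

# A 2-by-2 rotation by the angle theta; any such matrix is orthogonal,
# so Q'*Q = I and the norm of Q*x equals the norm of x.
theta = 0.3
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

def mat_vec(A, x):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def norm(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x_i * x_i for x_i in x))

x = [3.0, 4.0]
Qx = mat_vec(Q, x)
print(abs(norm(Qx) - norm(x)))  # essentially zero: rotation preserves norm
```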
Suppose we have k vectors x1, x2, ..., xk, each of dimension n. Here are some common and related questions:
The ``Gram-Schmidt method'' can be thought of as a process that analyzes a set of vectors X, producing an ``equivalent'' (and possibly smaller) set of vectors Q that span the same space, have unit norm, and are pairwise orthogonal. Note that if we can produce such a set of vectors, then we can easily answer many of the questions in our set of problems. In particular, the size of the set Q tells us whether the set X was linearly independent, as well as the dimension of the space spanned and the rank of a matrix constructed from the vectors. And of course, the vectors in Q are the orthonormal basis we were seeking. So you should believe that being able to compute the vectors Q is a valuable ability. In fact, the Gram-Schmidt method can help us with all the problems on our list.
We will be using the Gram-Schmidt process to factor a matrix, but the process itself pops up repeatedly whenever sets of vectors must be made orthogonal. You may see it, for example, in Krylov space methods for iteratively solving systems of equations and for eigenvalue problems.
In words, the Gram-Schmidt process goes like this:
n_q = 0                      % n_q will become the number of vectors q
for k = 1 to n_x
  y = x_k
  for j = 1 to n_q           % if n_q = 0, the loop is skipped
    r_jk = q_j' * x_k
    y = y - r_jk * q_j
  end
  r_kk = sqrt ( y' * y )
  if r_kk > 0
    n_q = n_q + 1
    q_(n_q) = y / r_kk
  end
end
You should be able to match this algorithm to the previous verbal description. Can you see how the norm of a vector is being computed?
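The exercise below asks for a Matlab implementation; purely as an illustration, the pseudocode above can also be sketched in Python, storing each vector as a list of floats rather than as a matrix column (the function name gram_schmidt and the tolerance tol are just for this sketch):

```python
import math

def gram_schmidt(X, tol=1.0e-12):
    """Classical Gram-Schmidt on a list of vectors X.  Returns a
    possibly smaller list Q of orthonormal vectors spanning the same
    space, mirroring the pseudocode above."""
    Q = []
    for x in X:
        y = list(x)
        for q in Q:
            r = sum(qi * xi for qi, xi in zip(q, x))   # r_jk = q_j' * x_k
            y = [yi - r * qi for yi, qi in zip(y, q)]  # y = y - r_jk * q_j
        r = math.sqrt(sum(yi * yi for yi in y))        # r_kk = norm of y
        if r > tol:                                    # skip dependent vectors
            Q.append([yi / r for yi in y])             # q = y / r_kk
    return Q

# The columns of the first test matrix below, given as separate vectors;
# the result is the identity, matching the expected answer in the lab.
Q = gram_schmidt([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
```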
function Q = gram_schmidt ( X )
  % Q = gram_schmidt ( X )
  % more comments
  % your name and the date
Assume the vectors are stored as columns of the matrix X. Be sure you understand this data structure because it is a potential source of confusion. In the algorithm description above, the vectors are members of a set X, but here these vectors are stored as the columns of a matrix X. Similarly, the vectors in the set Q are stored as the columns of a matrix Q. The first column of X corresponds to the vector x_1, so that the Matlab expression X(k,1) refers to the k-th component of the vector x_1, and the Matlab expression X(:,1) refers to the Matlab column vector corresponding to x_1. Similarly for the other columns. Include your code with your summary report.
X = [ 1 1 1
      0 1 1
      0 0 1 ]

You should be able to see that the correct result is the identity matrix. If you do not get the identity matrix, find your bug before continuing.
X = [ 1 1 1
      1 1 0
      1 0 0 ]

The columns of Q should have norm 1 and be pairwise orthogonal, and you should be able to confirm these facts ``by inspection.''
X = [ 2 -1  0
     -1  2 -1
      0 -1  2
      0  0 -1 ]

If your code is working correctly, you should compute approximately:
[ 0.8944  0.3586  0.1952
 -0.4472  0.7171  0.3904
  0.0000 -0.5976  0.5855
  0.0000  0.0000 -0.6831 ]

You should verify that the columns of Q have norm 1 and are pairwise orthogonal. If you like programming problems, think of a way to do this check in a single line of Matlab code.
We need to take a closer look at the Gram-Schmidt process. Recall how the process of Gauss elimination could actually be regarded as a process of factorization. This insight enabled us to solve many other problems. In the same way, the Gram-Schmidt process is actually carrying out a different factorization that will give us the key to other problems. Just to keep our heads on straight, let me point out that we're about to stop thinking of X as a bunch of vectors, and instead regard it as a matrix. Since it is traditional for matrices to be called A, that's what we'll call our set of vectors from now on.
Now, in the Gram-Schmidt algorithm, the numbers that we called r_jk and r_kk, that we computed, used, and discarded, actually record important information. They can be regarded as the nonzero elements of an upper triangular matrix R. The Gram-Schmidt process actually produces a factorization of the matrix A of the form:

A = Q * R
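Before writing the Matlab version asked for below, it may help to see the same bookkeeping in a Python sketch (illustrative only; the function name gs_factor mirrors the exercise, matrices are lists of rows, and the columns of A are assumed linearly independent so no r_kk is zero):

```python
import math

def gs_factor(A):
    """Gram-Schmidt QR factorization A = Q * R of an m-by-n matrix,
    keeping the r_jk and r_kk values in an upper triangular R instead
    of discarding them."""
    m, n = len(A), len(A[0])
    Q = [[0.0] * n for _ in range(m)]
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        y = [A[i][k] for i in range(m)]
        for j in range(k):
            R[j][k] = sum(Q[i][j] * A[i][k] for i in range(m))
            y = [y[i] - R[j][k] * Q[i][j] for i in range(m)]
        R[k][k] = math.sqrt(sum(yi * yi for yi in y))
        for i in range(m):
            Q[i][k] = y[i] / R[k][k]
    return Q, R

A = [[1.0, 1.0, 1.0],
     [1.0, 1.0, 0.0],
     [1.0, 0.0, 0.0]]
Q, R = gs_factor(A)
# Multiply the factors back together; Q*R should reproduce A.
QR = [[sum(Q[i][k] * R[k][j] for k in range(3)) for j in range(3)]
      for i in range(3)]
```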
function [ Q, R ] = gs_factor ( A )
  % [ Q, R ] = gs_factor ( A )
  % more comments
  % your name and the date

Include a copy of your code with your summary.
Recall: [Q,R]=gs_factor(A) is the syntax that Matlab uses to return two matrices from the function. When calling the function, you use the same syntax:
[Q,R]=gs_factor(...)

When you use this syntax, it is OK to leave out the comma between the Q and the R, but leaving out that comma is a syntax error in the signature line of a function m-file.
A = [ 1 1 1
      1 1 0
      1 0 0 ]

You computed Q in an earlier exercise, and you should get the same matrix Q again. Check that Q*R = A: if so, your code is correct.
A = [ 1 2 3
      4 5 6
      7 8 9
      1 0 0 ]
A = [ 1 2 3+10*eps
      4 5 6
      7 8 9 ]

Are the columns of your Q matrix orthonormal? (Recall that eps is the smallest number you can add to 1.0 and change its value.)
The ``solution'' to this problem is more art than science, and is beyond the scope of this lab. One feasible way to fix the problem would be to replace zero with, say, eps*max(max(R)).
It turns out that it is possible to take a given vector b and find a matrix H so that the product H*b is proportional to the vector e_1 = (1,0,...,0)'. That is, it is possible to find a matrix H that ``zeros out'' all but the first entry of b. The matrix H is given by

H = I - 2*(v*v')/(v'*v)

where v is a vector constructed from b.
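As an illustration of the formula H = I - 2*(v*v')/(v'*v), here is a Python sketch (names are hypothetical; it assumes the standard choice v = b - alpha*e_1, with the sign of alpha opposite to b[0] to avoid cancellation, which matches the sign test in the householder code below):

```python
import math

def householder_matrix(b):
    """Build the n-by-n Householder matrix H = I - 2*v*v'/(v'*v) that
    maps the vector b onto a multiple of e_1 = (1,0,...,0)'."""
    n = len(b)
    # alpha = -sign(b[0]) * norm(b), so v = b - alpha*e_1 avoids cancellation.
    alpha = -math.copysign(math.sqrt(sum(bi * bi for bi in b)), b[0])
    v = list(b)
    v[0] -= alpha
    vtv = sum(vi * vi for vi in v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vtv
             for j in range(n)] for i in range(n)]

b = [3.0, 4.0, 0.0]           # norm 5, so H*b should be (-5, 0, 0)'
H = householder_matrix(b)
Hb = [sum(H[i][j] * b[j] for j in range(3)) for i in range(3)]
# Hb has all entries below the first zeroed out.
```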
There is a way to use this idea to take any column of a matrix and make those entries below the diagonal entry be zero. Repeating this sequentially for each column will result in an upper triangular matrix. A way to do this is the following algorithm.
Start with a column vector b of length n. The Householder matrix H will be constructed from a vector v derived from b. The algorithm for constructing v, and hence H, can be summarized as follows.
function H = householder ( b, k )
  % H = householder ( b, k )
  % more comments
  % your name and the date
  n = length(b);
  d(:,1) = b(k:n);
  if d(1) >= 0
    alpha = -norm(d);
  else
    alpha = norm(d);
  end
If alpha turns out to be zero, there is nothing to do, so simply set

H = eye(n);

otherwise, complete the above algorithm.
This householder function can be used for the QR factorization of a matrix by proceeding through a series of partial factorizations A = Q_k * R_k, where Q_1 is the identity matrix, and R_1 is the matrix A. When we begin the k-th step of factorization, our factor R_k is only upper triangular in columns 1 to k-1. Our goal on the k-th step is to find a better factor R_(k+1) that is upper triangular through column k. If we can do this process n times, we're done. Suppose, then, that we've partially factored the matrix A, up to column k-1. In other words, we have a factorization

A = Q_k * R_k
For a rectangular m by n matrix A, the Householder QR factorization has the form

A = Q * R

where Q is an m by m orthogonal matrix and R is an m by n upper triangular matrix.
If the matrix A is not square, then this definition is different from the Gram-Schmidt factorization we discussed before. The obvious difference is the shape of the factors. Here, it is the matrix Q that is square. The other difference, which you'll have to take on faith, is that the Householder factorization is generally more accurate (smaller arithmetic errors) and easier to define compactly.
Householder QR Factorization Algorithm:
Q = I;
R = A;
for k = 1:min(m,n)
  Construct the Householder matrix H for column k of the matrix R;
  Q = Q * H';
  R = H * R;
end
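The algorithm above can be sketched in Python as well (illustrative only; the exercise itself should be done in Matlab with your householder.m). Here the full m-by-m Householder matrix is built explicitly at each step, as in the lab:

```python
import math

def h_factor(A):
    """Householder QR factorization A = Q * R of an m-by-n matrix
    stored as a list of rows, following the algorithm above."""
    m, n = len(A), len(A[0])
    Q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    R = [row[:] for row in A]
    for k in range(min(m, n)):
        # Householder vector for entries k..m-1 of column k of R.
        d = [R[i][k] for i in range(k, m)]
        alpha = -math.copysign(math.sqrt(sum(x * x for x in d)), d[0])
        v = [0.0] * m
        for i in range(k, m):
            v[i] = R[i][k]
        v[k] -= alpha
        vtv = sum(x * x for x in v)
        if vtv == 0.0:
            continue  # column is already zero below the diagonal
        H = [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vtv
              for j in range(m)] for i in range(m)]
        # Q = Q * H' and R = H * R  (H is symmetric, so H' = H).
        Q = [[sum(Q[i][p] * H[p][j] for p in range(m)) for j in range(m)]
             for i in range(m)]
        R = [[sum(H[i][p] * R[p][j] for p in range(m)) for j in range(n)]
             for i in range(m)]
    return Q, R

# The permutation-like test matrix from the exercise below.
A = [[0.0, 1.0, 1.0],
     [1.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
Q, R = h_factor(A)
QR = [[sum(Q[i][k] * R[k][j] for k in range(3)) for j in range(3)]
      for i in range(3)]
```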
function [ Q, R ] = h_factor ( A )
  % [ Q, R ] = h_factor ( A )
  % more comments
  % your name and the date

Use the routine householder.m in order to compute the H matrix that you need at each step.
A = [ 0 1 1
      1 0 1
      0 0 1 ]

You should see that simply interchanging the first and second columns of A turns it into an upper triangular matrix, so you can see that Q is a permutation matrix and you can guess what the solution is. Be sure that Q*R is A.
If we have computed the Householder QR factorization of a matrix without encountering any singularities, then it is easy to solve linear systems. We use the property of the factor Q that its inverse is its transpose:

A*x = b
Q*R*x = b
R*x = Q'*b   (2)
function x = h_solve ( Q, R, b )
  % x = h_solve ( Q, R, b )
  % more comments
  % your name and the date

The matrix R is upper triangular, so you can use u_solve to solve (2) above. Assume that the QR factors come from the h_factor routine. Set up your code to compute y = Q'*b and then solve the upper triangular system R*x = y using u_solve.
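As an illustration of the two steps (multiply by Q', then back-substitute through R, the job u_solve does in the lab), here is a Python sketch with hypothetical names:

```python
def h_solve(Q, R, b):
    """Solve A*x = b given a factorization A = Q*R with Q orthogonal
    and R upper triangular: form y = Q'*b, then back-substitute."""
    m, n = len(b), len(R[0])
    y = [sum(Q[i][k] * b[i] for i in range(m)) for k in range(n)]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = y[k] - sum(R[k][j] * x[j] for j in range(k + 1, n))
        x[k] = s / R[k][k]
    return x

# A tiny check with Q = I, so the system is just R*x = b:
# 2*x1 + x2 = 5 and 4*x2 = 8, whose solution is x = (1.5, 2).
Q = [[1.0, 0.0], [0.0, 1.0]]
R = [[2.0, 1.0], [0.0, 4.0]]
x = h_solve(Q, R, [5.0, 8.0])
```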
When you think your solver is working, test it out on a system as follows:
n = 7
A = magic ( n )
x = [ 1 : n ]'
b = A * x
[ Q, R ] = h_factor ( A )
x2 = h_solve ( Q, R, b )
norm(x - x2) % should be close to zero.
Now you know another way to solve a square linear system. It turns out that you can also use the QR algorithm to solve non-square systems, but that task is better left to the singular value decomposition.
Another very important matrix product decomposition is the singular value decomposition, or SVD, of a matrix into the product of three matrices. If A is an m by n matrix, then there are three matrices U, S and V, with

A = U * S * V'

where U is an m by m orthogonal matrix, V is an n by n orthogonal matrix, and S is an m by n diagonal matrix whose diagonal entries, the ``singular values,'' are nonnegative.
The singular value decomposition provides much useful information about the matrix . This information includes:
Numerical methods for finding the singular value decomposition will be addressed in the next lab. One ``obvious'' algorithm involves finding the eigenvalues of A'*A, but this is not really practical because of roundoff difficulties. Matlab includes a function called svd with signature [U S V]=svd(A) to compute the singular value decomposition and we will be using it in the following exercises. This function uses the Lapack subroutine dgesvd, so if you were to need it in a Fortran or C program, it would be available by linking against the Lapack library.
Supposing that r is the largest integer for which the singular value s_r is nonzero, then the SVD implies that the rank of the matrix is r. Similarly, if the matrix is n by n, then the dimension of the nullspace of A is n-r. The first r columns of U are an orthonormal basis of the range of A, and the last n-r columns of V are an orthonormal basis for the nullspace of A. Of course, in the presence of roundoff some judgement must be made as to what constitutes a zero singular value, but the SVD is the best (if expensive) way to discover the rank and nullity of a matrix. This is especially true when there is a large gap between the ``smallest of the large singular values'' and the ``largest of the small singular values.''
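The judgement call about what counts as a zero singular value amounts to picking a tolerance and counting the singular values above it. A minimal Python sketch (the function name and the list of singular values are hypothetical; in Matlab the values would come from svd):

```python
def numerical_rank(singular_values, tol):
    """Count singular values above tol; with a sensible tol this is
    the numerical rank r, and n - r is the nullity for an n-column
    matrix."""
    return sum(1 for s in singular_values if s > tol)

# A clear gap between the "large" and "small" singular values makes
# the choice of tolerance easy:
sigma = [5.2, 1.7, 3.0e-14, 1.0e-15]
r = numerical_rank(sigma, tol=1.0e-10)  # rank 2, nullity 4 - 2 = 2
```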
The SVD can also be used to solve a matrix system. Assuming that the matrix A is nonsingular, all singular values are strictly positive, and the SVD can be used to solve a system: since A = U*S*V' and the inverses of U and V are their transposes, the solution of A*x = b is x = V*(S\(U'*b)).
Imagine that you have been given many ``samples'' of related data involving several variables and you wish to find a linear relationship among the variables that approximates the given data in the best-fit sense. This can be written as a rectangular system
As we have done several times before, we will first generate a data set with known solution and then use svd to recover the known solution. Place Matlab code for the following steps into a script m-file called exer8.m
N=20;
d1=rand(N,1);
d2=rand(N,1);
d3=rand(N,1);
d4=4*d1-3*d2+2*d3-1;

It should be obvious that these vectors satisfy the equation

4*d1 - 3*d2 + 2*d3 - d4 = 1
EPSILON=1.e-5;
d1=d1.*(1+EPSILON*rand(N,1));
d2=d2.*(1+EPSILON*rand(N,1));
d3=d3.*(1+EPSILON*rand(N,1));
d4=d4.*(1+EPSILON*rand(N,1));
[U S V]=svd(A);

Please include the four non-zero values of S in your summary, but not the matrices U and V.
Sometimes the data you are given turns out to be deficient because the variables that are supposed to be independent are actually related. This dependency will cause the coefficient matrix to be singular or nearly singular. Dealing with dependencies of this sort is one of the greatest difficulties in using least-squares fitting methods. In the following exercise you will construct a deficient set of data and see that the singular value decomposition can find the solution nonetheless.
d3=rand(N,1);

with the line
d3=d1+d2;

This makes the data deficient and introduces nonuniqueness into the solution for the coefficients.
In the following exercise you will see how the singular value decomposition can be used to ``compress'' a graphical figure by representing the figure as a matrix and then using the singular value decomposition to find the closest matrix of lower rank to the original. This approach can form the basis of efficient compression methods.
mars=double(imread('mars.jpg'));

The variable mars will be a matrix of integers between 0 and 255.
colormap(gray(256));

to set a grayscale colormap. Now you can show the picture with the command
image(mars)

Please do not send me this plot.
mars200=U(:,1:200)*S(1:200,1:200)*V(:,1:200)';
mars100=U(:,1:100)*S(1:100,1:100)*V(:,1:100)';
mars25=U(:,1:25)*S(1:25,1:25)*V(:,1:25)';

These matrices are of lower rank than the mars matrix, and can be stored in a more efficient manner. (The mars matrix is 500 by 500 and requires 250,000 numbers to store it. In contrast, mars100 requires the subsets of U and V that are 500 by 100, plus the diagonal of S(1:100,1:100), for a total of 100,100 numbers. This is a better than two to one savings in space.) Do not include these matrices with your summary!
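The storage arithmetic is worth checking for yourself. A tiny Python sketch, assuming a 500-by-500 image and counting the entries of U(:,1:k), V(:,1:k) and the k diagonal entries of S:

```python
# Storage counts for a rank-k approximation of an m-by-n image.
m = n = 500
full = m * n  # 250000 numbers for the original image

def low_rank_storage(k):
    """Numbers needed for U(:,1:k), V(:,1:k) and the k diagonal
    entries of S(1:k,1:k)."""
    return m * k + n * k + k

print(full, low_rank_storage(100), low_rank_storage(25))
```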
figure(1);
colormap(gray(256));
image(mars);
title('Original mars');
figure(2);
colormap(gray(256));
image(mars25);
title('mars with 25 singular values');
figure(3);
colormap(gray(256));
image(mars100);
title('mars with 100 singular values');
figure(4);
colormap(gray(256));
image(mars200);
title('mars with 200 singular values');

You should see essentially no difference between the original and the one with 200 singular values, while you should see degradation of the image in the case of 25 singular values. Please only include the 25 singular value case with your summary.