MATH2071: LAB #4: Factorizations

# Introduction

We have seen one factorization-PLU-that can be used to solve a linear system provided that the system is square, and that it is nonsingular, and that it is not too badly conditioned. However, if we want to handle problems with a bad condition number, or that are singular, or even rectangular, we are going to need to come up with a different approach. In this lab, we will look at two versions of the QR factorization:

where is an orthogonal'' matrix, and is upper triangular, and also the singular value decomposition'' (SVD) that factors a matrix into the form

We will see that the QR factorization can be used to solve the same problems that the PLU factorization handles, but can be extended to do several other tasks for which the PLU factorization is useless. In situations where we are concerned about controlling roundoff error, the QR factorization can be more accurate than the PLU factorization, and is even a little easier to use when solving linear systems, since the inverse of is simply .

We will also see that the SVD can be used to compress data, to identify the range and nullspace of a linear operator, and even to solve non-square systems.

You may find it convenient to print the pdf version of this lab rather than the web page itself. This lab will take three sessions.

# Orthogonal Matrices

Definition: An orthogonal matrix'' is a real matrix whose inverse is equal to its transpose.

By convention, an orthogonal matrix is usually denoted by the symbol . The definition of an orthogonal matrix immediately implies that

From the definition of an orthogonal matrix, and from the fact that the vector norm of can be defined by:

and the fact that

you should be able to deduce that, for orthogonal matrices,

If I multiply a two-dimensional vector by and its norm doesn't change, then must lie on the circle whose radius is . In other words, the only thing that has happened is that has been rotated around the origin by some angle, or reflected through the origin or about a diameter. That means that a two-dimensional orthogonal matrix represents a rotation or reflection. In N dimensions, orthogonal matrices are also often called rotations.

When matrices are complex, the term unitary'' is an analog to orthogonal.'' A matrix is unitary if , where the superscript refers to the Hermitian'' or conjugate-transpose'' of the matrix. In Matlab, the prime operator implements the Hermitian and the dot-prime operator implements the transpose. A real matrix that is unitary is orthogonal.

# A Set of Questions

Suppose we have vectors , each of dimension . Here are some common and related questions:

• Can we determine if the vectors are linearly independent?
• Can we determine the dimension of the space the vectors span? (This is the rank.)
• Can we construct an orthonormal basis for the space spanned by the vectors?
• Can we construct an orthonormal basis for the space of vectors perpendicular to all the ?
• Given a new vector , can we determine whether this vector lies in the space spanned by the other vectors?
• If a new vector lies in the space, what specific linear combination of the old vectors is equal to the new one?
• If we create an by matrix using the vectors as columns, what is the rank of the matrix?
• How can we solve a rectangular system of linear equations?
Since we actually plan to do these calculations numerically we also need to be concerned about how we deal with numerical error. For instance, in exact arithmetic, it's enough to say a matrix is not singular. In numerical calculations, we would like to be able to say that a particular matrix, while not singular, is "nearly singular".

# The Gram Schmidt Method

The Gram Schmidt method'' can be thought of as a process that analyzes a set of vectors , producing an equivalent'' (and possibly smaller) set of vectors that span the same space, have unit norm, and are pairwise orthogonal. Note that if we can produce such a set of vectors, then we can easily answer many of the questions in our set of problems. In particular, the size of the set tells us whether the set was linearly independent, and the dimension of the space spanned and the rank of a matrix constructed from the vectors. And of course, the vectors in are the orthonormal basis we were seeking. So you should believe that being able to compute the vectors is a valuable ability. In fact, the Gram-Schmidt method can help us with all the problems on our list.

We will be using the Gram-Schmidt process to factor a matrix, but the process itself pops up repeatedly whenever sets of vectors must be made orthogonal. You may see it, for example, in Krylov space methods for iteratively solving systems of equations and for eigenvalue problems.

In words, the Gram-Schmidt process goes like this:

1. Start with no vectors in ;
2. Consider , the next (possibly the first) vector in ;
3. For every vector already in , compute the projection of on (i.e., ) and subtract it from to get a vector orthogonal to ;
4. Compute the norm of (what's left of) . If the norm is zero (or too small), discard from ; otherwise, divide by its norm and move it from to .
5. If there are more vectors in , return to step 2.
6. When you get here, is empty and you have an orthonormal set of vectors spanning the same space as the original set .
Here is a sketch of the Gram Schmidt process as an algorithm. Assume that is the number of vectors:
 = 0 % will become the number of vectors for k = 1 to for = 1 to         % if the loop is skipped = = - end = if > 0 = + 1 = end end

You should be able to match this algorithm to the previous verbal description. Can you see how the norm of a vector is being computed?

Exercise 1:
1. Implement the Gram-Schmidt process in an m-file called gram_schmidt.m. Your function should have the signature:
function Q = gram_schmidt ( X )
% Q = gram_schmidt ( X )

% your name and the date


Assume the vectors are stored as columns of the matrix X. Be sure you understand this data structure because it is a potential source of confusion. In the algorithm description above, the vectors are members of a set , but here these vectors are stored as the columns of a matrix X. Similarly, the vectors in the set are stored as the columns of a matrix Q. The first column of X corresponds to the vector , so that the Matlab expression X(k,1) refers to the component of the vector , and the Matlab expression X(:,1) refers to the Matlab column vector corresponding to . Similarly for the other columns. Include your code with your summary report.

2. It is always best to test your code on simple examples for which you know the answers. Test your code using the following input:
  X = [ 1  1  1
0  1  1
0  0  1 ]

You should be able to see that the correct result is the identity matrix. If you do not get the identity matrix, find your bug before continuing.
3. Another simple test you can do in your head is the following:
  X = [ 1  1  1
1  1  0
1  0  0 ]

The columns of Q should have norm 1, and be pairwise orthogonal, and you should be able to confirm these facts by inspection.''
4. Test your Gram-Schmidt code to compute a matrix Q using the following input:
  X = [  2  -1   0
-1   2  -1
0  -1   2
0   0  -1 ]

If your code is working correctly, you should compute approximately:
      [  0.8944   0.3586   0.1952
-0.4472   0.7171   0.3904
0.0000  -0.5976   0.5855
0.0000   0.0000  -0.6831 ]

You should verify that the columns of Q have norm 1, and are pairwise orthogonal. If you like programming problems, think of a way to do this check in a single line of Matlab code.
5. Show that the matrix you just computed is not an orthogonal matrix, even though its columns form an orthornormal set. Are you surprised?

# Gram-Schmidt Factorization

We need to take a closer look at the Gram-Schmidt process. Recall how the process of Gauss elimination could actually be regarded as a process of factorization. This insight enabled us to solve many other problems. In the same way, the Gram-Schmidt process is actually carrying out a different factorization that will give us the key to other problems. Just to keep our heads on straight, let me point out that we're about to stop thinking of as a bunch of vectors, and instead regard it as a matrix. Since it is traditional for matrices to be called , that's what we'll call our set of vectors from now on.

Now, in the Gram-Schmidt algorithm, the numbers that we called and , that we computed, used, and discarded, actually record important information. They can be regarded as the nonzero elements of an upper triangular matrix . The Gram-Schmidt process actually produces a factorization of the matrix of the form:

Here, the matrix has the same shape'' as , so it's only square if is. The matrix will be square (), and upper triangular. Now that we're trying to produce a factorization of a matrix, we need to modify the Gram-Schmidt algorithm of the previous section. Every time we consider a vector at the beginning of a loop, we will now always produce a vector at the end of the loop, instead of dropping some out. Hence, if , the norm of vector , is zero, we will simply set to the zero vector.

Exercise 2:
1. Make a copy of the m-file gram_schmidt.m and call it gs_factor.m. Modify it to compute the and factors of a (possibly) rectangular matrix . It should have the signature
function [ Q, R ] = gs_factor ( A )
% [ Q, R ] = gs_factor ( A )

% your name and the date

Include a copy of your code with your summary.

Recall: [Q,R]=gs_factor is the syntax that Matlab uses to return two matrices from the function. When calling the function, you use the same syntax:

[Q,R]=gs_factor(...

When you use this syntax, it is OK to leave out the comma between the Q and the R but leaving out that comma is a syntax error in the signature line of a function m-file.
2. Compute the QR factorization of the following matrix.
  A = [ 1  1  1
1  1  0
1  0  0 ]

You computed Q in an earlier exercise, and you should get the same matrix Q again. Check that : if so, your code is correct.
3. Compute the QR factorization of a matrix of random numbers (A=rand(100,100)). (Hint: use norms to check the equalities.)
• Is it true that ? (Hint: Does norm(Q'*Q-eye(100),'fro') equal 0?)
• Is it true that ?
• Is orthogonal?
• Is the matrix upper triangular? (Hint: the Matlab functions tril or triu can help, so you don't have to look at all those numbers.)
• Is it true that ?
4. Compute the QR factorization of the following matrix.
  A = [ 1 2 3
4 5 6
7 8 9
1 0 0 ]

• Is it true that ?
• Is it true that ?
• Is orthogonal?
• Is the matrix upper triangular?
• Is it true that ?

Exercise 3: One step in the Gram-Schmidt process avoids dividing by zero by checking that . This is, in fact, very poor practice! The reason is that might be of roundoff size. Find the matrices Q and R for the matrix
  A = [ 1 2 3+10*eps
4 5   6
7 8   9 ]

Are the columns of your Q matrix orthonormal? (Recall that eps is the smallest number you can add to 1.0 and change its value.)

The solution'' to this problem is more art than science, and is beyond the scope of this lab. One feasible way to fix the problem would be to replace zero with, say, eps*max(max(R)).

# Householder Matrices

It turns out that it is possible to take a given vector and find a matrix so that the product is proportinal to the vector . That is, it is possible to find a matrix that zeros out'' all but the first entry of . The matrix is given by

 (1)

Note that is a scalar but is a matrix. To see why Equation (1) is true, denote and do the following calculation.

You can find further discussion in several places on the web. For example, a Planet Math (planetmath.org) encyclopedia article.

There is a way to use this idea to take any column of a matrix and make those entries below the diagonal entry be zero. Repeating this sequentially for each column will result in an upper triangular matrix. A way to do this is the following algorithm.

Start with a column vector of length

and denote part of below the line as , a column vector of length .

We are thinking of the column vector as the th column of some matrix, so is the part of the column consisting of the diagonal and all values below the diagonal. The objective is to construct a matrix with the property that

where the component of the product, , is non-zero.

The Householder matrix will be constructed from a vector

where the vector is the same size as .

The algorithm for constructing and hence can be summarized as follows.

1. Set with signumsignum (The sign is arbitrary and chosen to minimize roundoff errors.)
2. Set
3. Set
4. Set for
5. Set .
6. Set .
In the following exercise you will write a Matlab function to implement this algorithm.

Exercise 4:
1. Start a function m-file named householder.m with signature and beginning lines
function H = householder(b, k)
% H = householder(b, k)

% your name and the date

n = length(b);
d(:,1) = b(k:n);
if d(1)>=0
alpha = -norm(d);
else
alpha =  norm(d);
end

2. Add comments at the beginning, being sure to explain the use of the variable k.
3. In the case that alpha is exactly zero, the algorithm will fail because of a division by zero. For the case that alpha is zero, return
H = eye(n);

otherwise, complete the above algorithm.
4. Test your function on the vector b=[10;9;8;7;6;5;4;3;2;1]; with k=1 and again with k=4. Check that H should be orthogonal and H*b should have zeros in positions below k. (Hint: Orthogonality of H is equivalent to the vector being a unit vector. If H is not orthogonal, check .)

This householder function can be used for the QR factorization of a matrix by proceeding through a series of partial factorizations , where is the identity matrix, and is the matrix . When we begin the step of factorization, our factor is only upper triangular in columns 1 to . Our goal on the -th step is to find a better factor that is upper triangular through column . If we can do this process times, we're done. Suppose, then, that we've partially factored the matrix , up to column . In other words, we have a factorization

for which the matrix is orthogonal, but for which the matrix is only upper triangular for the first columns. To proceed from our partial factorization, we're going to consider taking a Householder matrix , and inserting'' it into the factorization as follows:

Then, if we define

it will again be the case that:

and we're guaranteed that is still orthogonal. Our construction of guarantees that is actually upper triangular all the way through column .

Exercise 5: In this exercise you will see how the Householder matrix can be used in the steps of an algorithm that passes through a matrix, one column at a time, and turns the bottom of each column into zeros.
1. Define the matrix A to be the magic square of order 5, A=magic(5).
2. Compute H1, the Householder matrix that knocks out the subdiagonal entries in column 1 of A, and then compute A1=H1*A. Are there any non-zero values below the diagonal in the first column?
3. Now let's compute H2, the Householder matrix that knocks out the subdiagonal entries in column 2 of A1, and compute A2=H2*A1. This matrix should have subdiagonal zeros in column 2. You should be convinced that you can zero out any subdiagonal column you want. Since zeroing out one subdiagonal column doesn't affect the previous columns, you can proceed sequentially to zero out all subdiagonal columns.

# Householder Factorization

For a rectangular by matrix , the Householder QR factorization has the form

where the matrix is (hence square and truly orthogonal) and the matrix is , and upper triangular (or upper trapezoidal if you want to be more accurate.)

If the matrix is not square, then this definition is different from the Gram-Schmidt factorization we discussed before. The obvious difference is the shape of the factors. Here, it's the matrix that is square. The other difference, which you'll have to take on faith, is that the Householder factorization is generally more accurate (smaller arithmetic errors), and easier to define compactly.

Householder QR Factorization Algorithm:
 Q = I; R = A; for k = 1:min(m,n) Construct the Householder matrix H for column k of the matrix R; Q = Q * H'; R = H * R; end

Exercise 6:
1. Write an m-file h_factor.m. It should have the form
function [ Q, R ] = h_factor ( A )
% [ Q, R ] = h_factor ( A )

% your name and the date

Use the routine householder.m in order to compute the H matrix that you need at each step.
2. A simple test is the matrix
A = [ 0 1 1
1 0 1
0 0 1 ]

You should see that simply interchanging the first and second columns of A turns it into an upper triangular matrix, so you can see that Q is a permutation matrix and you can guess what the solution is. Be sure that Q*R is A.
3. Test your code by computing the QR factorization of the Hilbert matrix of order 4. Compare your results with those of the Matlab qr function. Warning: Q remains an orthogonal matrix if one of its columns is multiplied by (-1). This is equivalent to multiplying by a matrix that is like the identity matrix except with one (-1) on the diagonal instead of all (+1). This matrix is clearly orthogonal and can be incorporated into the QR factorization.

# The QR Method for Linear Systems

If we have computed the Householder QR factorization of a matrix without encountering any singularities, then it is easy to solve linear systems. We use the property of the factor that :

 (2)

so all we have to do is form the new right hand side and then solve the upper triangular system. And that's easy because it's the same as the u_solve step of our old PLU solution algorithm from the previous lab.

Exercise 7: Make a copy of your file u_solve.m from the previous lab (upper-triangular solver), or get a copy of my version. Write a file called h_solve.m. It should have the signature
function x = h_solve ( Q, R, b )
% x = h_solve ( Q, R, b )

% your name and the date

The matrix R is upper triangular, so you can use u_solve to solve (2) above. Assume that the QR factors come from the h_factor routine. Set up your code to compute and then solve the upper triangular system using u_solve.

When you think your solver is working, test it out on a system as follows:

    n = 7
A = magic ( n )
x = [ 1 : n ]'
b = A * x
[ Q, R ] = h_factor ( A )
x2 = h_solve ( Q, R, b )
norm(x - x2) % should be close to zero.


Now you know another way to solve a square linear system. It turns out that you can also use the QR algorithm to solve non-square systems, but that task is better left to the singular value decomposition.

# The singular value decomposition

Another very important matrix product decomposition is the singular value decomposition of a matrix into the product of three matrices. If is an matrix, then there are three matrices , and , with

 (3)

where is and unitary, and is and unitary. Recall that unitary matrices are like orthogonal matrices but they could have complex entries. Unitary matrices with all real entries are orthogonal matrices. The matrix is and is of the form

and . The number is the rank of the matrix and is necessarily no larger than . The singular values,'' , are real and positive and are the eigenvalues of the Hermitian matrix .

The singular value decomposition provides much useful information about the matrix . This information includes:

• Information about the range and nullspace of regarded as a linear operator, including the dimensions of these spaces. The matrices and provide orthonormal bases of these spaces.
• A method of solving'' non-square matrix systems.
• A method of compressing'' the matrix .
You will see these facts illustrated below.

Numerical methods for finding the singular value decomposition will be addressed in the next lab. One obvious'' algorithm involves finding the eigenvalues of , but this is not really practical because of roundoff difficulties. Matlab includes a function called svd with signature [U S V]=svd(A) to compute the singular value decomposition and we will be using it in the following exercises. This function uses the Lapack subroutine dgesvd, so if you were to need it in a Fortran or C program, it would be available by linking against the Lapack library.

Supposing that is the smallest integer so that , then the SVD implies that the rank of the matrix is . Similarly, if the matrix is , then the rank of the nullspace of is . The first columns of are an orthonormal basis of the range of , and the last columns of are an orthonormal basis for the nullspace of . Of course, in the presence of roundoff some judgement must be made as to what constitutes a zero singular value, but the SVD is the best (if expensive) way to discover the rank and nullity of a matrix. This is especially true when there is a large gap between the smallest of the large singular values'' and the largest of the small singular values.''

The SVD can also be used to solve a matrix system. Assuming that the matrix is non-singular, all singular values are strictly positive, and the SVD can be used to solve a system.

Where is the diagonal matrix whose entries are . It turns out that Equation (4) is not a good way to solve nonsingular systems, but is an excellent way to solve'' systems that are singular or almost singular! To this end, the matrix is the matrix with the same shape as whose elements are given as

Further, if is close to singular, a similar definition but with diagonal entries for for some can work very nicely. In these latter cases, the solution'' is the least-squares best fit solution and the matrix is called the Moore-Penrose pseudo-inverse of .

Exercise 8: In this exercise you will use the Matlab svd function to solve for the best-fit linear function of several variables through a set of points. This is an example of solving'' a rectangular system.

Imagine that you have been given many samples'' of related data involving several variables and you wish to find a linear relationship among the variables that approximates the given data in the best-fit sense. This can be written as a rectangular system

where represents the data and the desired coefficients. The best-fit solution can be found using the SVD.

As we have done several times before, we will first generate a data set with known solution and then use svd to recover the known solution. Place Matlab code for the following steps into a script m-file called exer8.m

1. Generate a data set consisting of twenty samples'' of each of four variables using the following Matlab code.
N=20;
d1=rand(N,1);
d2=rand(N,1);
d3=rand(N,1);
d4=4*d1-3*d2+2*d3-1;

It should be obvious that these vectors satisfy the equation

but, if you were solving a real problem, you would not be aware of this equation, and it would be your objective to discover it, i.e., to find the coefficients in the equation.
2. Introduce small errors'' into the data. The rand function produces numbers between 0 and 1.
EPSILON=1.e-5;
d1=d1.*(1+EPSILON*rand(N,1));
d2=d2.*(1+EPSILON*rand(N,1));
d3=d3.*(1+EPSILON*rand(N,1));
d4=d4.*(1+EPSILON*rand(N,1));

3. Regard the four vectors d1,d2,d3,d4 as data given to you, and construct the matrix consisting of the four column vectors, A=[d1,d2,d3,d4].
4. Use the Matlab svd function
[U S V]=svd(A);

Please include the four non-zero values of S in your summary, but not the matrices U and V.
5. Just to confirm that you have everything right, compute norm(A-U*S*V','fro'). This number should be roundoff-sized.
6. Construct the matrix (call it Splus) so that

7. We are seeking the coefficient vector that

Find this vector by setting b=ones(N,1) and then using Equation (4) above. Include your solution with your summary. It should be close to the known solution x=[4;-3;2;-1].

Sometimes the data you are given turns out to be deficient because the variables supposed to be independent are actually related. This dependency will cause the coefficient matrix to be singular or nearly singular. Dealing with dependencies of this sort is one of the greatest difficulties in using least-squares fitting methods. In the following exercise you will construct a deficient set of data and see that the singular value decomposition can find the solution nonetheless.

Exercise 9:
1. Copy your m-file exer8.m to exer9.m. Replace the line
d3=rand(N,1);

with the line
d3=d1+d2;

This makes the data deficient and introduces nonuniqueness into the solution for the coefficients.
2. After using svd to find U, S, and V, print S(1,1), S(2,2), S(3,3), S(4,4). You should see that S(4,4) is substantially smaller than the others. Treat it as if it were zero and find the coefficient vector x as before. The solution you found is probably not x=[4;-3;2;-1].
3. Look at V(:,4). This is the vector associated with the singular value that you set to zero. This vector should be associated with the extra relationship d1+d2-d3=0 that was introduced in the first step. What multiple of V(:,4) can be added to your solution to yield [4;-3;2;-1]?
Note that the singular value decomposition allows you to discover deficiencies in the data without knowing they are there (or even if there are any) in the first place. This idea can be the basis of a fail-safe method for computing least-squares coefficients.

In the following exercise you will see how the singular value decomposition can be used to compress'' a graphical figure by representing the figure as a matrix and then using the singular value decomposition to find the closest matrix of lower rank to the original. This approach can form the basis of efficient compression methods.

Exercise 10:
1. This photo from NASA is one of the pictures the Mars rovers sent back. Download it.
2. Matlab provides various image processing utilities. In this case, read the image in and convert it to a matrix of real (double'') numbers using the following commands.
mars=double(imread('mars.jpg'));

The variable mars will be a matrix of integers between 0 and 255.
3. Matlab has the notion of a colormap'' that determines how the values in the matrix will be colored. Use the command
colormap(gray(256));

to set a grayscale colormap. Now you can show the picture with the command
image(mars)

Please do not send me this plot.
4. Use the command [U S V]=svd(mars); to perform the singular value decomposition. Do not include these matrices with your summary!
5. Plot the singular values: plot(diag(S)). You should see that the final 300 singular values are essentially zero, and the singular values larger than 50 are still noticable. Please send me this plot with your summary.
6. Construct the three new matrices
mars200=U(:,1:200)*S(1:200,1:200)*V(:,1:200)';
mars100=U(:,1:100)*S(1:100,1:100)*V(:,1:100)';
mars25=U(:,1:25)*S(1:25,1:25)*V(:,1:25)';

These matrices are of lower rank than the mars matrix, and can be stored in a more efficient manner. (The mars matrix is and requires 250,000 numbers to store it. In contrast, mars100 requires subsets of and that are and the diagonal of , for a total of 20,100. This is better than two to one savings in space.) Do not include these matrices with your summary!
7. Plot all three files as images using the following commands;
figure(1);
colormap(gray(256));
image(mars);
title('Original mars');
figure(2);
colormap(gray(256));
image(mars25);
title('mars with 25 singular values');
figure(3);
colormap(gray(256));
image(mars100);
title('mars with 100 singular values');
figure(4);
colormap(gray(256));
image(mars200);
title('mars with 200 singular values');

You should see essentially no difference between the original and the one with 200 singular values, while you should see degradation of the image in the case of 25 singular values. Please only include the 25 singular value case with your summary.

Back to MATH2071 page.

Mike Sussman 2009-01-10