Contents

- Eigenvalues and eigenvectors (Exercise 2)
- The Power Method (Exercise 3)
- The Inverse Power Method (Exercise 4)
- Finding Several Eigenvectors at Once (Exercise 5)
- Using Shifts (Exercise 6)
- The QR Method (Exercise 7)
- Convergence of the QR Algorithm (Exercise 8)
This lab is concerned with several ways to compute eigenvalues and eigenvectors for a real matrix. All methods for computing eigenvalues and eigenvectors are iterative in nature, except for very small matrices. We begin with a short discussion of eigenvalues and eigenvectors, and then go on to the power method and inverse power methods. These are methods for computing a single eigenpair, but they can be modified to find several. We then look at shifting, which is an approach for computing one eigenpair whose eigenvalue is close to a specified value. We then look at the QR method, the most efficient method for computing all the eigenvalues at once, and also its shifted variant.
You may find it convenient to print the pdf version of this lab rather than the web page itself. This lab will take three sessions.
For any square matrix A, consider the equation det(A - lambda*I) = 0. This is a polynomial equation of degree n in the variable lambda, so there are exactly n complex roots (counted with multiplicity) that satisfy the equation. If the matrix A is real, then the complex roots occur in conjugate pairs. The roots are known as ``eigenvalues'' of A. Interesting questions include:
If lambda is an eigenvalue of A, then A - lambda*I is a singular matrix, and therefore there is at least one nonzero vector x with the property that (A - lambda*I)x = 0. This equation is usually written

    A x = lambda x.
Some useful facts about the eigenvalues lambda of a matrix A:
If a vector x is an exact eigenvector of a matrix A, then it is easy to determine the value of the associated eigenvalue: simply take the ratio of corresponding components of Ax and x, for any index whose component is nonzero. But suppose x is only an approximate eigenvector, or worse yet, just a wild guess. We could still compute the ratio of corresponding components for some index, but the value we get will vary from component to component. We might even try computing all the ratios and averaging them, or taking the ratios of the norms. However, the preferred method for estimating the value of an eigenvalue uses the so-called ``Rayleigh quotient.''
The Rayleigh quotient of a matrix A and vector x is

    R(A, x) = (x'*A*x) / (x'*x).
If x happens to be an eigenvector of the matrix A, then the Rayleigh quotient must equal its eigenvalue lambda. (Plug x into the formula and you will see why.) When the real vector x is an approximate eigenvector of A, the Rayleigh quotient is a very accurate estimate of the corresponding eigenvalue. Complex eigenvalues and eigenvectors require a little care because the dot product involves multiplication by the conjugate transpose. The Rayleigh quotient remains a valuable tool in the complex case, and most of the facts remain true.
If the matrix A is symmetric, then all eigenvalues are real, and the Rayleigh quotient satisfies the inequality

    lambda_min <= R(A, x) <= lambda_max,

where lambda_min and lambda_max are the smallest and largest eigenvalues of A.
function r = rayleigh ( A, x )
% r = rayleigh ( A, x )
% comments
% your name and the date

Be sure to include comments describing the input and output parameters and the purpose of the function.
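In case it helps to see the shape of such a function, here is a minimal sketch; the comment lines are placeholders for your own, and the body is just the Rayleigh quotient formula defined above.

```matlab
function r = rayleigh ( A, x )
% r = rayleigh ( A, x )
% A: square matrix
% x: nonzero column vector
% r: the Rayleigh quotient  x'*A*x / (x'*x)

r = ( x' * A * x ) / ( x' * x );
```

Note that when x is a unit vector the denominator is 1, and the quotient reduces to x'*A*x.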
     x           R(A,x)
 [ 1; 2; 3]     -0.78571
 [ 1; 0; 1]     __________   (is an eigenvector)
 [ 0; 1; 1]     __________   (is an eigenvector)
 [ 1;-2; 0]     __________   (is an eigenvector)
 [ 1; 1; 1]     __________
 [ 0; 0; 1]     __________
       x             R(A,x)
 [3; 2; 1]         __________
 A*[3; 2; 1]       __________
 A^2*[3; 2; 1]     __________
 A^3*[3; 2; 1]     __________
 A^4*[3; 2; 1]     __________
 A^5*[3; 2; 1]     __________
 A^10*[3; 2; 1]    __________
 A^15*[3; 2; 1]    __________
 A^20*[3; 2; 1]    __________
 A^25*[3; 2; 1]    __________
The Rayleigh quotient provides a way to approximate eigenvalues from an approximate eigenvector. Now we have to figure out a way to come up with an approximate eigenvector.
In many physical and engineering applications, the largest or the smallest eigenvalue associated with a system represents the dominant and most interesting mode of behavior. For a bridge or support column, the largest eigenvalue might reveal the maximum load, and the eigenvector the shape of the object as it begins to fail under this load. For a concert hall, the smallest eigenvalue of the acoustic equation reveals the lowest resonating frequency. For nuclear reactors, the largest eigenvalue determines whether the reactor is subcritical, critical, or supercritical. Hence, a simple means of approximating just one extreme eigenvalue might be enough for some cases.
The ``power method'' tries to determine the eigenvalue of largest magnitude, and the corresponding eigenvector, of a matrix, by computing (scaled) vectors in the sequence

    x^(k+1) = A x^(k).
Note: The superscript (k) in the above equation does not refer to a power of x, or to a component. It is simply the k-th vector in a sequence. The use of a superscript distinguishes it from the k-th component, x_k, and the parentheses distinguish it from an exponent. When you write Matlab code, you will typically denote both x^(k) and x^(k+1) with the same Matlab name x, and overwrite it each iterate. In the signature lines specified below, xold is used to denote x^(k) and x is used to denote x^(k+1) and all subsequent iterates.
If we include the scaling, and an estimate of the eigenvalue, the power method can be described in the following way.
function [r, x, R, X] = power_method ( A, xold, nsteps )
% [r, x, R, X] = power_method ( A, xold, nsteps )
% comments
% your name and the date

. . .
for k=1:nsteps
  . . .
  r= ???       % Rayleigh quotient
  R(k)=r;      % save Rayleigh quotient on each step
  X(k)=x(1);   % save one eigenvector component on each step
end

The history variables R and X will be used to illustrate the progress that the power method is making and have no mathematical importance.
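One plausible way to fill in the body is sketched below; the unit-norm scaling and the inline Rayleigh quotient computation are choices made here for illustration, not the official solution.

```matlab
function [r, x, R, X] = power_method ( A, xold, nsteps )
% [r, x, R, X] = power_method ( A, xold, nsteps )
% power method sketch: repeatedly multiply by A and rescale

xold = xold / norm(xold);      % start from a unit vector
for k = 1 : nsteps
  x = A * xold;                % one power step
  x = x / norm(x);             % rescale to unit length
  r = x' * A * x;              % Rayleigh quotient (x has unit norm)
  R(k) = r;                    % save Rayleigh quotient on each step
  X(k) = x(1);                 % save one eigenvector component
  xold = x;
end
```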
Note: Usually, when writing code that requires the Rayleigh quotient, I would ask that you use rayleigh.m, but in this case the Rayleigh quotient is so simple that it is probably better to just copy the code for it into power_method.m.
Using your power method code, try to determine the largest eigenvalue of the matrix eigen_test(1), starting from the vector [1;1;1]. Be sure to make this starting vector into a unit vector if you don't use rayleigh.m. (I have included the result for step 1 to help you check your code out.)
Step    Rayleigh q.    x(1)
  0     3              1.0
  1     0.72353        0.0
  2     __________     _____________________________
  3     __________     _____________________________
  4     __________     _____________________________
 10     __________     _____________________________
 15     __________     _____________________________
 20     __________     _____________________________

In addition, plot the progress of the method with the following commands
plot(R); title 'Rayleigh quotient vs. k'
plot(X); title 'Eigenvector component vs. k'

Please send me these two plots with your summary.
function [r, x, R, X] = power_method1 ( A, xold, tol )
% [r, x, R, X] = power_method1 ( A, xold, tol )
% comments
% your name and the date

nsteps=10000;   % maximum allowable number of steps

To determine when to stop iterating, add code to implement all three of the following criteria:
Matrix   Rayleigh q.   x(1)          no. iterations
  1      __________    __________    ____________
  2      __________    __________    ____________
  3      __________    __________    ____________
  4      __________    __________    ____________
  5      __________    __________    ____________
  6      __________    __________    ____________
  7      __________    __________    ____________

These matrices were chosen to illustrate different iterative behavior. Some converge quickly and some slowly, some oscillate and some don't. One does not converge. Generally speaking, the eigenvalue converges more rapidly than the eigenvector. You can check the eigenvalues of these matrices using the eig function. Plot the histories of the Rayleigh quotient and first eigenvector component for all the cases. Send me the plots for cases 3, 5, and 7.
Matlab hint: The eig command will show you the eigenvectors as well as the eigenvalues of a matrix A, if you use the command in the form:
[ V, D ] = eig ( A )

The quantities V and D are matrices, storing the n eigenvectors as columns in V and the eigenvalues along the diagonal of D.
The inverse power method reverses the iteration step of the power method. We write:

    A x^(k+1) = x^(k),

or, equivalently, x^(k+1) = A^(-1) x^(k) (again scaled to unit length at each step).
This means that we can freely choose to study the eigenvalue problem of either A or A^(-1), and easily convert information from one problem to the other. The difference is that the inverse power iteration finds the eigenvalue of A^(-1) of largest magnitude, whose reciprocal is the eigenvalue of A that is smallest in magnitude, while the plain power method finds the eigenvalue of A that is largest in magnitude.
The inverse power method iteration is given in the following algorithm.
Warning: The convergence test on x is different, and better, than the one we used in power_method1. You will have to keep track of the previous iterate to implement it.
Your file should have the signature:
function [r, x, R, X] = inverse_power ( A, xold, tol )
% [r, x, R, X] = inverse_power ( A, xold, tol )
% comments
% your name and the date

Be sure to include a check on the maximum number of iterations (10000 is a good choice) so that a non-convergent problem will not go on forever.
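A sketch of one possible implementation follows; the particular convergence test (comparing successive unit iterates) and the sign normalization are assumptions made for illustration.

```matlab
function [r, x, R, X] = inverse_power ( A, xold, tol )
% [r, x, R, X] = inverse_power ( A, xold, tol )
% inverse power method sketch: solve A*x = xold and rescale

maxsteps = 10000;              % cap on the number of iterations
xold = xold / norm(xold);
for k = 1 : maxsteps
  x = A \ xold;                % one inverse power step
  x = x / norm(x);             % rescale to unit length
  [~, location] = max(abs(x)); % fix the sign so successive
  x = sign(x(location)) * x;   %   iterates are comparable
  r = x' * A * x;              % Rayleigh quotient for A itself
  R(k) = r;
  X(k) = x(1);
  if norm(x - xold) < tol * norm(x)
    break                      % successive iterates agree
  end
  xold = x;
end
```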
[r x R X]=inverse_power(A,[1;1;1],1.e-8);

Please include r and x in your summary. How many steps did it take?
[rp xp Rp Xp]=power_method(inv(A),[1;1;1],nsteps);

where nsteps is the number of steps you just took using inverse_power. You should find that r is close to 1/rp, that x and xp are close, and that X and Xp are close. If these relations are not true, you have an error somewhere. Fix it before continuing. Warning: the convergence criteria of the two methods are different, so they may take slightly different numbers of iterations. Compare X and Xp only for the common iterations.
% find location of entry with largest magnitude and
% multiply all of x so its entry with largest
% magnitude is a positive number.  This code only
% works correctly when x is a real vector.
[absx,location]=max(abs(x));
factor=sign(x(location));
x=factor*x;    % makes x(location)>=0
Matrix   eigenvalue    x(1)          no. iterations
  1      __________    __________    ____________
  2      __________    __________    ____________
  3      __________    __________    ____________
  4      __________    __________    ____________
  5      __________    __________    ____________
  6      __________    __________    ____________
  7      __________    __________    ____________

Be careful! One of these still may not converge.
It is common to need more than one eigenpair, but not all eigenpairs. For example, finite element models of structures often have upwards of a million degrees of freedom (unknowns) and just as many eigenvalues. Only the lowest few are of practical interest. Fortunately, matrices arising from finite element structural models are typically symmetric and negative definite, so the eigenvalues are real and negative and the eigenvectors are orthogonal to one another. For the remainder of this section, and this section only, we will be assuming that the matrix whose eigenpairs are being sought is symmetric.
If you try to find two eigenvectors by using the power method (or the inverse power method) twice, on two different vectors, you will discover that you get two copies of the same dominant eigenvector. (The eigenvector with eigenvalue of largest magnitude is called ``dominant.'') Working smarter helps, though. Because the matrix is assumed symmetric, you know that the dominant eigenvector is orthogonal to the others, so you could use the power method (or the inverse power method) twice, forcing the two vectors to be orthogonal at each step of the way. This will guarantee they do not converge to the same vector. If this approach converges to anything, it probably converges to the dominant and ``next-dominant'' eigenvectors and their associated eigenvalues.
We considered orthogonalization in Lab 4 and we wrote a function called gram_schmidt.m. You can use your copy or mine for this lab. You will recall that the gram_schmidt function regards the vectors as columns of a matrix: the first vector is the first column of a matrix X, the second is the second column, etc. (Be warned that this function will sometimes return a matrix with fewer columns than the one it is given. This is because the original columns were not a linearly independent set.)
The power method applied to several vectors can be described in the following algorithm:
Note: There is no guarantee that the eigenvectors that you find have the largest eigenvalues in magnitude! It is possible that your starting vectors were themselves orthogonal to one or more eigenvectors, and you will never recover eigenvectors that are not spanned by the initial ones. In applications, there are consistency checks that serve to mitigate this problem.
In the following exercise, you will write a Matlab function to carry out this procedure to find several eigenpairs for a symmetric matrix. The initial set of eigenvectors will be represented as columns of a Matlab matrix V.
function [R, V, k] = power_several ( A, Vold, tol )
% [R, V, k] = power_several ( A, Vold, tol )
% comments
% your name and the date

to implement the algorithm described above.
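As a rough sketch of what the body might look like: this version uses Matlab's built-in qr for the orthonormalization step (your gram_schmidt.m from Lab 4 would serve the same purpose), and the stopping test on successive iterates is an assumption.

```matlab
function [R, V, k] = power_several ( A, Vold, tol )
% [R, V, k] = power_several ( A, Vold, tol )
% power method applied to several vectors at once (sketch)

maxsteps = 10000;
[Vold, ~] = qr(Vold, 0);         % orthonormalize the starting vectors
for k = 1 : maxsteps
  V = A * Vold;                  % one power step on all columns
  [V, ~] = qr(V, 0);             % re-orthonormalize the columns
  [~, indices] = max(abs(V));    % fix column signs so iterates are
  d = diag(sign(V(indices,:)));  %   comparable between steps
  V = V * diag(d);
  if norm(V - Vold, 'fro') < tol * norm(V, 'fro')
    break
  end
  Vold = V;
end
R = diag(V' * A * V);            % Rayleigh quotient eigenvalue estimates
```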
[absV,indices]=max(abs(V));    % indices of largest in columns
d=diag(sign(V(indices,:)));    % pick out their signs as +-1
D=diag(d);                     % diagonal matrix with +-1
V=V*D;                         % transform V
V = [ 1  1
      1  1
      1  1
      1 -1
      1 -1
      1 -1 ]

You should immediately recognize that these vectors are not only linearly independent, but also orthogonal. Is the largest eigenvalue this time the same as before?
V = [ 1  1  1  1  1  1
      1  1  1  1  1 -1
      1  1  1  1 -1 -1
      1  1  1 -1 -1 -1
      1  1 -1 -1 -1 -1
      1 -1 -1 -1 -1 -1 ];

What are the six eigenvalues?
A major limitation of the power method is that it can only find the eigenvalue of largest magnitude. Similarly, the inverse power iteration seems only able to find the eigenvalue of smallest magnitude. Suppose, however, that you want one (or a few) eigenpairs whose eigenvalues are away from the extremes? Instead of working with the matrix A, let's work with the shifted matrix A - sigma*I, obtained by subtracting sigma from every diagonal entry of A. This shift makes a world of difference.
We already said the inverse power method finds the eigenvalue of smallest magnitude. Denote the smallest-magnitude eigenvalue of A - sigma*I by mu. If mu is an eigenvalue of A - sigma*I, then A - (sigma + mu)I is singular. In other words, sigma + mu is an eigenvalue of A, and it is the eigenvalue of A that is closest to sigma.
In general, the idea of shifting allows us to focus the attention of the inverse power method on seeking the eigenvalue closest to any particular value we care to name. This also means that if we have an estimated eigenvalue, we can speed up convergence by using this estimate in the shift. Although various strategies for adjusting the shift during the iteration are known, and can result in very rapid convergence, we will be using a constant shift.
We are not going to worry here about the details or the expense of shifting and refactoring the matrix. (A real algorithm for a large problem would be very concerned about this!) Our simple-minded algorithm for the shifted inverse power method looks like this:
function [r, x, nsteps] = shifted_inverse ( A, xold, shift, tol )
% [r, x, nsteps] = shifted_inverse ( A, xold, shift, tol )
% comments
% your name and the date
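For concreteness, here is one way the routine might be organized; the body mirrors the inverse power method with A replaced by the shifted matrix A - shift*eye(n), and the convergence test is again an assumption.

```matlab
function [r, x, nsteps] = shifted_inverse ( A, xold, shift, tol )
% [r, x, nsteps] = shifted_inverse ( A, xold, shift, tol )
% shifted inverse power method sketch

n = size(A, 1);
B = A - shift * eye(n);          % a real code would factor B once
maxsteps = 10000;
xold = xold / norm(xold);
for nsteps = 1 : maxsteps
  x = B \ xold;                  % one shifted inverse power step
  x = x / norm(x);               % rescale to unit length
  [~, location] = max(abs(x));   % fix the sign so successive
  x = sign(x(location)) * x;     %   iterates are comparable
  r = x' * A * x;                % Rayleigh quotient for A itself
  if norm(x - xold) < tol * norm(x)
    break
  end
  xold = x;
end
```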
In the previous exercise, we used the backslash division operator. This is another example where the matrix could be factored first and then solved many times, with a possible time savings.
The value of the shift sigma is at the user's disposal, and there is no good reason to stick with the initial (guessed) shift. If it is not too expensive to re-factor the shifted matrix, then the shift can be reset after several iterations. One possible choice for sigma would be to set it equal to the current eigenvalue estimate, and do so after every, say, third iteration. The closer the shift is to the eigenvalue, the faster the convergence.
So far, the methods we have discussed seem suitable for finding one or a few eigenvalues and eigenvectors at a time. In order to find more eigenvalues, we'd have to do a great deal of special programming. Moreover, these methods are surely going to have trouble if the matrix has repeated eigenvalues, distinct eigenvalues of the same magnitude, or complex eigenvalues. The ``QR method'' is a method that handles these sorts of problems in a uniform way, and computes all the eigenvalues, but not the eigenvectors, at one time.
The QR method can be described in the following way.
The sequence of matrices A_k is orthogonally similar to the original matrix A, and hence has the same eigenvalues. This is because A_k = Q_k R_k, and hence A_(k+1) = R_k Q_k = Q_k' A_k Q_k. Under fairly general conditions, the sequence of matrices A_k converges to an upper triangular matrix with the eigenvalues on its diagonal (a diagonal matrix, when A is symmetric).
The Matlab command
[ Q, R ] = qr ( A )

computes the QR factorization (as does our own h_factor routine from Lab 4).
function [E,k] = qr_method ( A, tol )
% [E,k] = qr_method ( A, tol )
% comments
% your name and the date

E is a diagonal matrix of eigenvalues of A. Since the matrix is supposed to converge, terminate the iteration when successive iterates agree to within the tolerance tol.
[v,indices]=max(abs(Q));       % indices of largest in columns
d=diag(sign(Q(indices,:)));    % pick out their signs as +-1
D=diag(d);                     % diagonal matrix with +-1
Q=Q*D;                         % transform Q
R=D*R;                         % transform R so we still have A=Q*R

Add this code just after doing the QR decomposition in your code and try your revised code on A=eigen_test(1). It should converge. How many steps does it take?
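To fix ideas, here is a sketch of how the whole iteration might be assembled, including the sign-fixing step above; the stopping test on the change in the diagonal is an assumption.

```matlab
function [E, k] = qr_method ( A, tol )
% [E, k] = qr_method ( A, tol )
% QR method sketch: A <- R*Q is orthogonally similar to A

maxsteps = 10000;
for k = 1 : maxsteps
  olddiag = diag(A);
  [Q, R] = qr(A);                % factor A = Q*R
  [~, indices] = max(abs(Q));    % sign fix, so that the
  D = diag(diag(sign(Q(indices,:)))); % factorization is unique
  Q = Q * D;
  R = D * R;                     % still have A = Q*R
  A = R * Q;                     % similarity transform Q'*A*Q
  if norm(diag(A) - olddiag) < tol * norm(diag(A))
    break                        % eigenvalue estimates have settled
  end
end
E = diag(diag(A));               % diagonal matrix of eigenvalue estimates
```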
Matrix     3 largest eigenvalues             Number of steps
  1      _________ _________ _________       __________
  2      _________ _________ _________       __________
  3      _________ _________ _________       __________
  4      _________ _________ _________       __________
  skip no. 5
  6      _________ _________ _________       __________
  7      _________ _________ _________       __________
Serious implementations of the QR method save information about each step in order to construct the eigenvectors. The algorithm also uses shifts to try to speed convergence. Some eigenvalues can be determined early in the iteration, which speeds up the process even more. In the following section we consider the question of convergence of the algorithm.
Throughout this lab we have used a very simple convergence criterion. In general, convergence is a difficult subject, fraught with special cases and exceptions to the rules. To give you a flavor of how convergence might be addressed, we consider the case of the QR algorithm applied to a real, symmetric matrix A with distinct eigenvalues, no two of which are the same in magnitude. For such matrices, the following theorem can be shown.
Theorem. Let A be a real symmetric matrix of order n, and let its eigenvalues satisfy

    |lambda_1| > |lambda_2| > ... > |lambda_n| > 0.
The theorem points out a method for detecting convergence. Suppose you start out with a matrix A satisfying the above conditions, and call the matrix at the k-th iteration A_k. If D_k denotes the matrix consisting only of the diagonal entries of A_k, then the Frobenius norm of A_k - D_k must go to zero as a power of the ratio r = max_j |lambda_(j+1)| / |lambda_j| < 1, and if Lambda is the diagonal matrix with the true eigenvalues on its diagonal, then D_k must converge to Lambda as a geometric series with ratio r. In the following algorithm, only the diagonal entries of A_k (these are the eigenvalue estimates) will be considered. The off-diagonal entries converge to zero, but there is no need to make them small so long as the eigenvalues have converged.
The following algorithm includes a convergence estimate as part of the QR algorithm.
You may wonder where the denominator (1 - r) came from. Consider the geometric series

    1 + r + r^2 + r^3 + ... = 1 / (1 - r),   for |r| < 1.
function [E, nsteps]=qr_convergence(A,tol)
% [E, nsteps]=qr_convergence(A,tol)
% comments
% your name and the date

Modify the code to implement the above algorithm.
norm(sort(e)-sort(ee))/norm(e)

where e=diag(E) is the vector of eigenvalues from qr_convergence?
A = [ .01   1
      -1   .01 ];

It should fail.