Contents: Introduction; Two algorithms; The ``standard'' algorithm; Exercises 1 through 8.
Suppose $A$ is an $m\times n$ matrix. Then there exist two (different) unitary matrices: an $m\times m$ matrix $U$ and an $n\times n$ matrix $V$, and a ``diagonal'' $m\times n$ matrix $\Sigma$ with non-negative numbers on its diagonal and zeros elsewhere, so that
\[
A=U\Sigma V^H,
\]
where the superscript $H$ denotes the Hermitian (conjugate) transpose and the diagonal entries of $\Sigma$ are the singular values of $A$.
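If you want to see what the decomposition looks like numerically, Matlab's built-in svd function computes it directly; the following quick check (with an arbitrary random matrix) is only an illustration.

A=rand(5,3);
[U,S,V]=svd(A);        % U is 5x5, S is 5x3 and ``diagonal'', V is 3x3
norm(A-U*S*V')         % should be near roundoff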
As you have already seen, the SVD has a number of important applications.
This lab will take two sessions. You may find it convenient to print the pdf version of this lab rather than the web page itself.
The easiest algorithm for the SVD is to note the relationship between it and the eigenvalue decomposition: singular values are the square roots of the eigenvalues of $A^HA$ or $AA^H$. Indeed, if $A=U\Sigma V^H$, then
\[
A^HA=V(\Sigma^H\Sigma)V^H
\]
and
\[
AA^H=U(\Sigma\Sigma^H)U^H.
\]
Unfortunately, this is not a good algorithm because forming the product $A^HA$ roughly squares the condition number, so that the eigenvalue solution is not likely to be accurate. Of course, it will work fine for small matrices with small condition numbers, and you can find this algorithm presented on many web pages. Don't believe everything you see on the internet.
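For reference, here is a minimal sketch of the eigenvalue-based approach just described. The function name eig_svd is only illustrative, and the sketch assumes A is real, square and nonsingular so that the left singular vectors can be recovered by a diagonal solve.

function [U,S,V]=eig_svd(A)
% [U S V]=eig_svd(A)
% Sketch of the eigenvalue-based SVD described above; not recommended
% for ill-conditioned matrices.  Assumes A is real, square and nonsingular.
M=A'*A;
M=(M+M')/2;                   % symmetrize to guard against roundoff
[V,D]=eig(M);                 % columns of V are right singular vectors
[lambda,p]=sort(diag(D),'descend');
V=V(:,p);                     % put singular values in descending order
S=diag(sqrt(abs(lambda)));    % singular values = square roots of eigenvalues
U=(A*V)/S;                    % A*V = U*S, so recover U by a diagonal solve

Comparing eig_svd and svd on a matrix such as hilb(10) illustrates the loss of accuracy caused by squaring the condition number.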
A more practical algorithm is a Jacobi algorithm that is given in a 1989 report by James Demmel and Krešimir Veselić, which can be found at http://www.netlib.org/lapack/lawnspdf/lawn15.pdf. The algorithm is a one-sided Jacobi iterative algorithm and appears as Algorithm 4.1 on page 32 of that report. It amounts to the Jacobi algorithm for finding eigenvalues of a symmetric matrix. (See, for example, J. H. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, 1965, or G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, 3rd ed., 1996.)
The algorithm implicitly computes the product $A^TA$ and then uses a sequence of ``Jacobi rotations'' to diagonalize it. A Jacobi rotation is a $2\times2$ rotation matrix that annihilates the off-diagonal term of a symmetric $2\times2$ matrix. You can find more in a nice Wikipedia article located at http://en.wikipedia.org/wiki/Jacobi_rotation.
Given a $2\times2$ symmetric matrix $M$ with
\[
M=\left[\begin{array}{cc}\alpha&\gamma\\ \gamma&\beta\end{array}\right],
\]
it is possible to find a ``rotation by angle $\theta$'' matrix
\[
Q=\left[\begin{array}{cc}\cos\theta&\sin\theta\\ -\sin\theta&\cos\theta\end{array}\right]
\]
with the property that
\[
Q^TMQ=D,
\]
where $D$ is a diagonal matrix.
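As a quick numerical illustration of this property (the specific numbers below are arbitrary, and the angle formula is stated here only for the purpose of this check):

alpha=2; beta=1; gamma=0.5;               % an arbitrary symmetric 2x2 example
M=[alpha gamma; gamma beta];
theta=0.5*atan2(2*gamma,beta-alpha);      % a standard choice of Jacobi angle
Q=[cos(theta) sin(theta); -sin(theta) cos(theta)];
D=Q'*M*Q                                  % off-diagonal entries should be near zero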
The following algorithm repeatedly passes through the implicitly-constructed matrix $U^TU$, choosing pairs of indices $i<j$, constructing the $2\times2$ submatrix from the intersection of rows and columns $i$ and $j$, and using Jacobi rotations to annihilate its off-diagonal term. Unfortunately, the Jacobi rotation for one pair of indices disturbs the zeros produced for earlier pairs, so the process must be repeated until convergence. At convergence, the matrix $\Sigma$ of the SVD has been implicitly generated, and the right and left singular vectors are recovered by multiplying all the Jacobi rotations together. The following algorithm carries out this process.
The algorithm is given a convergence criterion $\epsilon$ and works with a matrix $U$, which starts out equal to $A$ and ends up (after a final normalization) as a matrix whose columns are the left singular vectors, and another matrix $V$, which starts out as the identity matrix and ends up as a matrix whose columns are the right singular vectors.
function [U,S,V]=jacobi_svd(A)
%
% [U S V]=jacobi_svd(A)
% A is original matrix
% Singular values come back in S (diag matrix)
% orig matrix = U*S*V'
%
% One-sided Jacobi algorithm for SVD
% lawn15: Demmel, Veselic, 1989,
% Algorithm 4.1, p. 32

TOL=1.e-8;
n=size(A,1);
U=A;
V=eye(n);
converge=TOL+1;
while converge>TOL
  converge=0;
  for j=2:n
    for i=1:j-1
      % compute [alpha gamma;gamma beta]=(i,j) submatrix of U'*U
      alpha=???   %might be more than 1 line
      beta=???    %might be more than 1 line
      gamma=???   %might be more than 1 line
      converge=max(converge,abs(gamma)/sqrt(alpha*beta));

      % compute Jacobi rotation that diagonalizes
      %    [alpha gamma;gamma beta]
      zeta=(beta-alpha)/(2*gamma);
      t=sign(zeta)/(abs(zeta)+sqrt(1+zeta^2));
      c=???
      s=???

      % update columns i and j of U
      t=U(:,i);
      U(:,i)=c*t-s*U(:,j);
      U(:,j)=s*t+c*U(:,j);

      % update matrix V of right singular vectors
      t=V(:,i);
      V(:,i)=???
      V(:,j)=???
    end
  end
end

% the singular values are the norms of the columns of U
% the left singular vectors are the normalized columns of U
for j=1:n
  singvals(j)=norm(U(:,j));
  U(:,j)=U(:,j)/singvals(j);
end
S=diag(singvals);
Apply your jacobi_svd function to the matrix A=U*S*V' constructed from

U=[0.6  0.8
   0.8 -0.6];
V=sqrt(2)/2*[1  1
             1 -1];
S=diag([5 4]);

It is easy to see that U and V are orthogonal matrices, so that the matrices U, S and V comprise the SVD of A. You may get a ``division by zero'' error, but this is harmless because it comes from gamma being zero, which will cause zeta to be infinite and t to be zero anyhow.
Your algorithm should essentially reproduce the matrices U, S and V. You might find that the diagonal entries of S are not in order, so that the U and V are similarly permuted, or you might observe that certain columns of U or V have been multiplied by (-1). Be sure, however, that an even number of factors of (-1) have been introduced.
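One check that is independent of column ordering and of these sign changes (a suggestion only, not part of the exercise) is to verify that the computed factors reproduce A and are orthogonal:

A=U*S*V';
[Uj,Sj,Vj]=jacobi_svd(A);
norm(Uj*Sj*Vj'-A)        % should be near roundoff
norm(Uj'*Uj-eye(2))      % Uj should be orthogonal
norm(Vj'*Vj-eye(2))      % Vj should be orthogonal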
If you do not get the right answers, you can debug your code in the following way. First, note that for this matrix there is only a single pair of indices, i=1 and j=2, in the double for loop.
Now apply your jacobi_svd function to the matrix

A1 = [ 1 3 2
       5 6 4
       7 8 9];

Compare the singular values with those you get from the Matlab svd function. You should find they agree almost to roundoff, despite the coarse tolerance inside jacobi_svd. You will also note that the values do not come out sorted, as they do from svd. It is a simple matter to sort them, but then you would have to permute the columns of U and V to match.
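If you do want sorted values, the following sketch shows one way to sort them and permute the columns of U and V consistently (this is optional, and the variable names are only illustrative):

[Uj,Sj,Vj]=jacobi_svd(A1);
[s,p]=sort(diag(Sj),'descend');  % sort singular values in descending order
Sj=diag(s);
Uj=Uj(:,p);                      % permute columns of U and V to match
Vj=Vj(:,p);
norm(Uj*Sj*Vj'-A1)               % the product still reproduces A1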
You now have an implementation of one SVD algorithm. This algorithm is a good one, with some realistic applications, and is one of the two algorithms available for the SVD in the GNU scientific library. (See http://www.gnu.org/software/gsl/ for more detail.) In the following sections, you will see a different algorithm.
The most commonly used SVD algorithm is found in Matlab and in the Lapack linear algebra library. (See http://www.netlib.org/lapack/.) It is a revised version of one that appeared in Golub and Van Loan. The revised version is presented by J. Demmel, W. Kahan, ``Accurate Singular Values of Bidiagonal Matrices,'' SIAM J. Sci. Stat. Comput., 11(1990) pp. 873-912. This paper can also be found at http://www.netlib.org/lapack/lawnspdf/lawn03.pdf.
The standard algorithm is a two-step algorithm. It first reduces the matrix to bidiagonal form and then finds the SVD of the bidiagonal matrix. Reduction to bidiagonal form is accomplished using Householder transformations, a topic you have already seen. Finding the SVD of a bidiagonal matrix is an iterative process that must be carefully performed in order to minimize both numerical errors and the number of iterations required. To accomplish these tasks, the algorithm chooses whether Golub and Van Loan's original algorithm is better than Demmel and Kahan's, or vice-versa. Further, other choices are made to speed up each iteration. There is not time to discuss all these details, so we will only consider a simplified version of Demmel and Kahan's zero-shift algorithm.
In the Jacobi algorithm in the previous section, you saw how the two matrices $U$ and $V$ can be constructed by multiplying the various rotation matrices together as the iterations progress. This procedure is the same for the standard algorithm, so, in the interest of simplicity, most of the rest of this lab will be concerned only with the singular values themselves.
The first step in the algorithm is to reduce the matrix to bidiagonal form. You have already seen how to use Householder matrices to reduce a matrix to upper-triangular form. Once that is done, the matrix can be transposed and Householder matrices can again be used to eliminate all non-zeros below the subdiagonal. You cannot reduce the matrix all the way to diagonal form this way, because a transformation that eliminated the superdiagonal entries would also destroy the zeros already created below the diagonal.
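The reduction skeleton in the next exercise calls a function householder(b,k), presumably from your earlier work with Householder transformations, that returns an orthogonal matrix H so that the entries of H*b below position k are zero. In case you no longer have it, here is a minimal sketch of one common construction; the sign conventions may differ from the version you wrote, and it is offered only as a reference.

function H = householder(b, k)
% H = householder(b, k)
% Sketch of a Householder reflector: H is orthogonal and symmetric, and
% H*b has zeros below position k.
n = length(b);
d = b(k:n);
if d(1) >= 0
  alpha = -norm(d);
else
  alpha =  norm(d);
end
if alpha == 0
  H = eye(n);                 % nothing to annihilate
  return
end
v = zeros(n,1);
v(k) = sqrt(0.5*(1-d(1)/alpha));
p = -alpha*v(k);
v(k+1:n) = d(2:end)/(2*p);
H = eye(n) - 2*(v*v');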
function [U,B,V]=bidiag_reduction(A)
% [U B V]=bidiag_reduction(A)
% Algorithm 6.5-1 in Golub & Van Loan, Matrix Computations
% Johns Hopkins University Press
% Finds an upper bidiagonal matrix B so that A=U*B*V'
% with U,V orthogonal.  A is an m x n matrix

[m,n]=size(A);
B=A;
U=eye(m);
V=eye(n);
for k=1:n
  % eliminate non-zeros below the diagonal
  H=householder(B(:,k),k);
  B=H*B;
  U=U*H;

  % eliminate non-zeros to the right of the
  % superdiagonal by working with the transpose
  if k<n-1
    H=householder(B(k,:)',k+1);
    B=B*H';
    V=???
  end
end
Test your bidiag_reduction function on the matrix

A1 = [ 1 3 2
       5 6 4
       7 8 9];

Be sure that the condition A1=U*B*V' is satisfied and that the matrix B is bidiagonal.
Repeat the test on a larger random matrix:

A=rand(50,50);

Be sure that the condition A=U*B*V' is satisfied and that the matrix B is bidiagonal.
An important piece of Demmel and Kahan's algorithm is a very efficient way of generating a $2\times2$ ``Givens'' rotation matrix that annihilates the second component of a vector. The algorithm is presented on page 13 of their paper and is called ``rot.'' This algorithm computes the cosine, $c$, and sine, $s$, of a rotation angle that satisfies the following condition.
\[
\left[\begin{array}{cc} c & s\\ -s & c\end{array}\right]
\left[\begin{array}{c} f\\ g\end{array}\right]
=
\left[\begin{array}{c} r\\ 0\end{array}\right]
\]
function [c,s,r]=rot(f,g)
% [c s r]=rot(f,g)
% more comments
%
% your name and the date
For a few choices of f and g, compute

[c s;-s c]*[f;g]

and check that the resulting vector is [r;0].
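A short test script along these lines might look like the following (the random test values are only an illustration):

f=randn; g=randn;
[c,s,r]=rot(f,g);
[c s;-s c]*[f;g]       % should be approximately [r;0]
c^2+s^2                % should be 1, since the matrix is a rotation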
The standard algorithm is based on repeated QR-type ``sweeps'' through the bidiagonal matrix. Suppose $B$ is a bidiagonal matrix, so that it looks like the following.
\[
B=\left[\begin{array}{ccccc}
d_1 & e_1 &        &         & \\
    & d_2 & e_2    &         & \\
    &     & \ddots & \ddots  & \\
    &     &        & d_{n-1} & e_{n-1}\\
    &     &        &         & d_n
\end{array}\right]
\]
In its simplest form, a sweep begins at the top left of the matrix and runs down the rows to the bottom. For row $i$, the sweep first annihilates the $(i,i+1)$ entry by multiplying on the right by a rotation matrix. This action introduces a non-zero value at $(i+1,i)$, immediately below the diagonal. This new non-zero is then annihilated by multiplying on the left by a rotation matrix, but this action introduces a non-zero value at $(i,i+2)$, outside the upper diagonal and one row further down than the previous such entry. Conveniently, this new non-zero is annihilated by the same rotation matrix that annihilates the $(i+1,i+2)$ entry at the next step of the sweep. (The proof is a consequence of the fact that the $(i,i+1)$ entry was zero as a consequence of the first rotation; see the paper.) The algorithm continues until the final step on the bottom row, which introduces no new non-zeros. This process has been termed ``chasing the bulge.''
The following Matlab code implements the above algorithm.
function [B]=msweep(B)
% [B]=msweep(B)
%
% Demmel & Kahan zero-shift QR downward sweep
% B starts as a bidiagonal matrix and is returned as
% a bidiagonal matrix

n=size(B,1);

for i=1:n-1
  [c s r]=rot(B(i,i),B(i,i+1));

  % construct matrix Q and multiply on the right by Q'
  % this annihilates both B(i-1,i+1) and B(i,i+1)
  % but makes B(i+1,i) non-zero
  Q=eye(n);
  Q(i:i+1,i:i+1)=[c s;-s c];
  B=B*Q';

  [c s r]=rot(B(i,i),B(i+1,i));

  % construct matrix Q and multiply on the left by Q
  % This annihilates B(i+1,i) but makes B(i,i+1) and
  % B(i,i+2) non-zero
  Q=eye(n);
  Q(i:i+1,i:i+1)=[c s;-s c];
  B=Q*B;
end
In this algorithm, two orthogonal (rotation) matrices, each called $Q$ in the code, are employed at each step. To see what their action is, consider the piece of $B$ consisting of rows and columns $i-1$, $i$, $i+1$, and $i+2$. Multiplying on the right by the transpose of the first rotation matrix has the following consequence.
\[
\left[\begin{array}{cccc}
\alpha & * & * & 0\\
0      & * & * & 0\\
0      & 0 & * & \gamma\\
0      & 0 & 0 & \delta
\end{array}\right] Q^T
=
\left[\begin{array}{cccc}
\alpha & * & 0 & 0\\
0      & * & 0 & 0\\
0      & * & * & \gamma\\
0      & 0 & 0 & \delta
\end{array}\right]
\]
where asterisks indicate (possible) non-zeros and Greek letters indicate values that do not change. The fact that two entries are annihilated in the third column is a non-obvious consequence of the algorithm.
The second matrix $Q$ multiplies on the left and has the following consequence.
\[
Q \left[\begin{array}{cccc}
\alpha & \beta & 0 & 0\\
0      & *     & 0 & 0\\
0      & *     & * & *\\
0      & 0     & 0 & \delta
\end{array}\right]
=
\left[\begin{array}{cccc}
\alpha & \beta & 0 & 0\\
0      & *     & * & *\\
0      & 0     & * & *\\
0      & 0     & 0 & \delta
\end{array}\right]
\]
The important consequence of these two rotations is that the row with three non-zeros in it (the ``bulge'') has moved down one row, from row $i-1$ to row $i$.
% set almost-zero entries to true zero
% display matrix and wait for a key
B(find(abs(B)<1.e-13))=0;
spy(B)
disp('Plot completed.  Strike a key to continue.')
pause

Insert two copies of this code into msweep.m, one after each of the statements that start off ``B=.''
Apply your modified msweep function to the matrix

B=[1 11  0  0  0  0  0  0  0  0
   0  2 12  0  0  0  0  0  0  0
   0  0  3 13  0  0  0  0  0  0
   0  0  0  4 14  0  0  0  0  0
   0  0  0  0  5 15  0  0  0  0
   0  0  0  0  0  6 16  0  0  0
   0  0  0  0  0  0  7 17  0  0
   0  0  0  0  0  0  0  8 18  0
   0  0  0  0  0  0  0  0  9 19
   0  0  0  0  0  0  0  0  0 10];

The sequence of plots shows that the ``bulge'' migrates from the top of the matrix down the rows until it disappears off the end. Please include any one of the plots with your work.
Demmel and Kahan present a streamlined form of this algorithm in their paper. This algorithm is not only faster but is more accurate because there are no subtractions to introduce cancellation and roundoff inaccuracy. Their algorithm is presented below.
The largest difference in speed comes from the representation of the bidiagonal matrix $B$ as a pair of vectors, $d$ and $e$, holding the diagonal and superdiagonal entries, respectively. In addition, the intermediate product $BQ^T$ is not explicitly formed. This algorithm begins and ends with the two vectors $d$ and $e$, representing the diagonal and superdiagonal of a bidiagonal matrix. The vector $d$ has length $n$, while the vector $e$ has length $n-1$.
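For reference, the zero-shift sweep itself, expressed in terms of the vectors d and e, is transcribed below from Algorithm 2 of the Demmel and Kahan paper cited above; the function name vsweep_sketch is used to distinguish it from the vsweep you are asked to write, it assumes your rot function is available, and you should check it against the paper before relying on it.

function [d,e]=vsweep_sketch(d,e)
% [d e]=vsweep_sketch(d,e)
% One zero-shift QR sweep, following Algorithm 2 of Demmel & Kahan.
% d is the diagonal (length n), e is the superdiagonal (length n-1).
n=length(d);
cs=1;
oldcs=1;
for i=1:n-1
  [cs,sn,r]=rot(cs*d(i),e(i));
  if i>1
    e(i-1)=oldsn*r;
  end
  [oldcs,oldsn,d(i)]=rot(oldcs*r,d(i+1)*sn);
end
h=cs*d(n);
e(n-1)=h*oldsn;
d(n)=h*oldcs;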
function [d,e]=vsweep(d,e)
% [d e]=vsweep(d,e)
% comments
%
% your name and the date
B=[1 11  0  0  0  0  0  0  0  0
   0  2 12  0  0  0  0  0  0  0
   0  0  3 13  0  0  0  0  0  0
   0  0  0  4 14  0  0  0  0  0
   0  0  0  0  5 15  0  0  0  0
   0  0  0  0  0  6 16  0  0  0
   0  0  0  0  0  0  7 17  0  0
   0  0  0  0  0  0  0  8 18  0
   0  0  0  0  0  0  0  0  9 19
   0  0  0  0  0  0  0  0  0 10];
d=diag(B);
e=diag(B,1);
n=200;
d=rand(n,1);
e=rand(n-1,1);
B=diag(d)+diag(e,1);
tic;B=msweep(B);mtime=toc
tic;[d e]=vsweep(d,e);vtime=toc
norm(d-diag(B))
norm(e-diag(B,1))
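As an additional, optional check: a sweep is just a sequence of orthogonal rotations, so it should leave the singular values of the bidiagonal matrix unchanged. The following snippet (on a small example) illustrates this.

n=10;
d=rand(n,1);
e=rand(n-1,1);
B0=diag(d)+diag(e,1);          % bidiagonal matrix before the sweep
[d1,e1]=vsweep(d,e);
B1=diag(d1)+diag(e1,1);        % bidiagonal matrix after the sweep
norm(svd(B0)-svd(B1))          % should be near roundoff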
It turns out that repeated sweeps tend to reduce the size of the superdiagonal entries. You can find a discussion of convergence in Demmel and Kahan, and the references therein, but the basic idea is to watch the values of $e$ and, when they become small, set them to zero. Demmel and Kahan prove that this does not perturb the singular values much. In the following exercise, you will see how the superdiagonal size decreases.
d=[1;2;3;4;5;6;7;8;9;10];
e=[11;12;13;14;15;16;17;18;19];

Plot the absolute values of the entries of e as the sweeps proceed.
You should have observed that one entry converges very rapidly to zero, and that overall convergence to zero is asymptotically linear. It is often one of the two ends that converges most rapidly. Further, a judiciously chosen shift can make convergence even more rapid. We will not investigate these issues further here, but you can be sure that the software appearing, for example, in Lapack for the SVD takes advantage of these and other acceleration methods.
In the following exercise, the superdiagonal is examined during the iteration. As the values become small, they are set to zero, and the iteration continues with a smaller matrix. When all the superdiagonals are zero, the iteration is complete.
function [d,iterations]=mysvd(d,e)
% [d,iterations]=mysvd(d,e)
% more comments
%
% your name and the date

TOL=100*eps;
n=length(d);
maxit=500*n^2;

% The following convergence criterion is discussed by
% Demmel and Kahan.  First, estimate the smallest
% singular value.
lambda(n)=abs(d(n));
for j=n-1:-1:1
  lambda(j)=abs(d(j))*lambda(j+1)/(lambda(j+1)+abs(e(j)));
end
mu(1)=abs(d(1));
for j=1:n-1
  mu(j+1)=abs(d(j+1))*mu(j)/(mu(j)+abs(e(j)));
end
sigmaLower=min(min(lambda),min(mu));
thresh=max(TOL*sigmaLower,maxit*realmin);

iUpper=n-1;
iLower=1;
for iterations=1:maxit
  % reduce problem size when some zeros are
  % on the superdiagonal

  % how many zeros are near the bottom right?
  for i=iUpper:-1:1
    iUpper=i;
    if abs(e(i))>thresh
      break;
    end
  end

  % how many zeros are near the top left?
  j=iUpper;
  for i=iLower:iUpper
    if abs(e(i))>thresh
      j=i;
      break;
    end
  end
  iLower=j;

  if (iUpper==iLower & abs(e(iUpper))<=thresh) | ...
     (iUpper<iLower)
    % all done, sort singular values
    d=sort(abs(d));
    % sort results in ascending order, so reverse it
    d(end:-1:1)=d;
    return
  end

  % do a sweep
  [d(iLower:iUpper+1),e(iLower:iUpper)]= ...
      vsweep(d(iLower:iUpper+1),e(iLower:iUpper));
end
error('mysvd: too many iterations!')
Use mysvd to find the singular values of the bidiagonal matrix defined by

d=[1;2;3;4;5;6;7;8;9;10];
e=[11;12;13;14;15;16;17;18;19];

What are the singular values? How many iterations did mysvd need? What is the largest difference between the singular values mysvd found and those that the Matlab function svd found for the same matrix? (You will have to expand the two diagonals into a matrix in order to use svd.)
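One way to carry out the comparison asked for in the last question (an illustration only):

[s,iterations]=mysvd(d,e);
B=diag(d)+diag(e,1);            % expand the two diagonals into a matrix
max(abs(sort(s)-sort(svd(B))))  % largest difference from Matlab's svd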
Remark: In the above algorithm, you can see that the singular values are forced to be positive by taking absolute values. This is legal because if a negative singular value arises, then multiplying both it and the corresponding column of $U$ by $-1$ does not change the unitary nature of $U$ and leaves the singular value positive.
You saw in the previous exercise that the number of iterations can get large. It turns out that the number of iterations is dramatically reduced by the proper choice of shift, and tricks such as choosing to run sweeps up from the bottom right to the top left instead of down as we did, depending on the matrix elements. Nonetheless, the presentation here should give you the flavor of the algorithm used.
In the following exercise, you will put your two functions, bidiag_reduction and mysvd, together to find the singular values of a full matrix.
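For example, the combination might look like the following sketch; the test matrix and variable names are only illustrative.

A=magic(5);                       % any full test matrix will do
[U,B,V]=bidiag_reduction(A);      % reduce A to bidiagonal form
[s,it]=mysvd(diag(B),diag(B,1));  % singular values of the bidiagonal matrix
norm(s-svd(A))                    % compare with Matlab's svd; should be small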