
SVD in action


What happens after calculating the singular value decomposition (SVD) of a matrix? In this topic, you'll explore the main applications of this decomposition. You'll finally get a geometric interpretation of the transpose and easily compute orthonormal bases of the spaces closely related to the matrix.

You'll also develop an alternative form of the SVD that allows you to progressively rebuild any matrix and accurately approximate it.

In the following topic, you'll be working with an $m \times n$ matrix $A$ of rank $r$ with SVD given by:

$$A = U \Sigma V^T$$

And, as before, the columns of its pieces are:

$$U = \left[\begin{array}{l|l|l|l} \mathbf{u}_1 & \mathbf{u}_2 & \cdots & \mathbf{u}_m \end{array}\right] \quad \text{ and } \quad V = \left[\begin{array}{l|l|l|l} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{array}\right]$$

Also, the singular values are ordered non-increasingly: $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r > 0$.

The geometry of the inverse and the transpose

Here comes the first application of singular values. When $A$ is square and all of its singular values are different from zero, $\Sigma$ is invertible and $\Sigma^{-1}$ is the diagonal matrix whose entries are the reciprocals of the singular values. As a result, $A$ is invertible and its inverse is simply:

$$A^{-1} = V \Sigma^{-1} U^T$$

You already know all of the pieces of this decomposition. Now the problem of finding $A^{-1}$ is reduced to the simpler tasks of getting $\Sigma^{-1}$ and then performing only two matrix products!
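To see this numerically, here's a minimal NumPy sketch (the matrix below is just the $3 \times 3$ example that appears later in this topic; any square matrix with nonzero singular values works). Note that `np.linalg.svd` returns $V^T$ rather than $V$:

```python
import numpy as np

A = np.array([[ 1.0,  0.0, 2.0],
              [ 0.0,  1.0, 1.0],
              [-2.0, -1.0, 1.0]])

U, s, Vt = np.linalg.svd(A)             # A = U @ np.diag(s) @ Vt
A_inv = Vt.T @ np.diag(1.0 / s) @ U.T   # A^{-1} = V Σ^{-1} U^T

print(np.allclose(A_inv, np.linalg.inv(A)))  # True
```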

Have you ever noticed that so far, you don't have a geometric interpretation of the transpose? It's time to fill this embarrassing gap. By leveraging the SVD, it immediately becomes clear that:
$$A^{T} = V \Sigma^{T} U^T$$

Think about the decompositions of $A^{-1}$ and $A^{T}$: they look quite similar to each other. Geometrically, they work through the transformations that make up $A = U \Sigma V^T$, but in the opposite order:

  • First, they apply $U^T$ in order to neutralize the effect of $U$.
  • Then, they stretch the resulting space.
  • Finally, as in the first step, they counteract $V^T$ by applying $V$.

The geometry of the transpose

The second step is where $A^{-1}$ and $A^{T}$ actually differ. While the former undoes the stretch of $A$, the latter stretches by the same amount. Thus, you can roughly think of $A^{T}$ as rotating in the opposite direction as $A$ but stretching in the same way.
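The same factors also give the transpose directly. Here's a quick check, using the $3 \times 4$ matrix from the next section (any rectangular matrix would do); the only extra step is rebuilding the full $m \times n$ matrix $\Sigma$, since NumPy returns the singular values as a flat vector:

```python
import numpy as np

A = np.array([[1.0,  1.0,  1.0,  1.0],
              [2.0,  2.0, -2.0, -2.0],
              [3.0, -3.0,  3.0, -3.0]])

U, s, Vt = np.linalg.svd(A)             # U: 3x3, s: 3 values, Vt: 4x4
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)    # the 3x4 matrix Σ

A_T = Vt.T @ Sigma.T @ U.T              # A^T = V Σ^T U^T
print(np.allclose(A_T, A.T))            # True
```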

The four fundamental spaces

The relationship between $A$, its transpose, and the SVD is even deeper than you've just seen. The four fundamental spaces of $A$ are $\operatorname{Ker}(L_A)$, $\operatorname{Im}(L_A)$, $\operatorname{Ker}(L_{A^T})$ and $\operatorname{Im}(L_{A^T})$. They're connected to each other by several remarkable relations. The most important one says that once you've computed the SVD, you have enough information to reconstruct every one of these spaces!

  • $\{ \mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_r \}$ is an orthonormal basis of $\operatorname{Im}(L_A)$.
  • $\{ \mathbf{u}_{r+1}, \mathbf{u}_{r+2}, \dots, \mathbf{u}_m \}$ is an orthonormal basis of $\operatorname{Ker}(L_{A^T})$.
  • $\{ \mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_r \}$ is an orthonormal basis of $\operatorname{Im}(L_{A^T})$.
  • $\{ \mathbf{v}_{r+1}, \mathbf{v}_{r+2}, \dots, \mathbf{v}_n \}$ is an orthonormal basis of $\operatorname{Ker}(L_A)$.

An example would be a useful illustration. Consider the following matrix:

$$A=\begin{pmatrix} 1 & 1 & 1 & 1 \\ 2 & 2 & -2 & -2 \\ 3 & -3 & 3 & -3 \end{pmatrix}$$

An SVD for $A$ is:

$$U=\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \qquad \Sigma=\begin{pmatrix} 6 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 2 & 0 \end{pmatrix} \qquad V= \frac{1}{2} \begin{pmatrix} 1 & 1 & 1 & 1 \\ -1 & 1 & 1 & -1 \\ 1 & -1 & 1 & -1 \\ -1 & -1 & 1 & 1 \end{pmatrix}$$

Since $A$ has three positive singular values, its rank is $r=3$. This means that:

  • $\left\{ \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \right\}$ is an orthonormal basis for $\operatorname{Im}(L_A)$.
  • $\operatorname{Ker}(L_{A^T}) = \{ \mathbf{0} \}$, since $r = m = 3$ leaves no columns of $U$ left over.
  • $\left\{ \frac{1}{2} \begin{pmatrix} 1 \\ -1 \\ 1 \\ -1 \end{pmatrix}, \frac{1}{2} \begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix}, \frac{1}{2} \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} \right\}$ is an orthonormal basis for $\operatorname{Im}(L_{A^T})$.
  • $\left\{ \frac{1}{2} \begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \end{pmatrix} \right\}$ is an orthonormal basis for $\operatorname{Ker}(L_A)$.
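You can extract all four bases programmatically as well. The sketch below is one way to do it with NumPy, using the same matrix as above; keep in mind that `np.linalg.svd` may return singular vectors that differ from the ones in the text by sign (or by order, when singular values coincide), but they span the same spaces:

```python
import numpy as np

A = np.array([[1.0,  1.0,  1.0,  1.0],
              [2.0,  2.0, -2.0, -2.0],
              [3.0, -3.0,  3.0, -3.0]])

U, s, Vt = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s.max()
r = int(np.sum(s > tol))                # numerical rank, here 3

col_space  = U[:, :r]                   # orthonormal basis of Im(L_A)
left_null  = U[:, r:]                   # orthonormal basis of Ker(L_{A^T}) (empty here)
row_space  = Vt[:r, :].T                # orthonormal basis of Im(L_{A^T})
null_space = Vt[r:, :].T                # orthonormal basis of Ker(L_A)

print(np.allclose(A @ null_space, 0))   # True: A sends Ker(L_A) to 0
```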

An alternative form of the SVD

Let's express the SVD in another way: as a sum of simpler matrices built from the singular vectors:

$$A = \sigma_1 \, \mathbf{u}_1 \mathbf{v}_1^T + \sigma_2 \, \mathbf{u}_2 \mathbf{v}_2^T + \dots + \sigma_r \, \mathbf{u}_r \mathbf{v}_r^T = \sum_{j=1}^r \sigma_j \, \mathbf{u}_j \mathbf{v}_j^T$$

Each term $\sigma_j \, \mathbf{u}_j \mathbf{v}_j^T$ in the summation is called a latent component of the original matrix, because each component contributes one increment of the structure hidden within the matrix. Notice that the columns of each latent component are all multiples of $\mathbf{u}_j$, so latent components are rank-$1$ matrices.
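In code, each latent component is an outer product. Here's a minimal sketch of the rebuild, using a small random matrix purely for illustration:

```python
import numpy as np

A = np.random.default_rng(0).normal(size=(5, 3))

U, s, Vt = np.linalg.svd(A)
r = np.linalg.matrix_rank(A)            # 3, almost surely, for a random matrix

# Each term sigma_j * u_j v_j^T is a rank-1 latent component of A.
A_rebuilt = sum(s[j] * np.outer(U[:, j], Vt[j, :]) for j in range(r))
print(np.allclose(A_rebuilt, A))        # True
```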

Proof

You can assume that $m \leq n$ (otherwise, first compute the product $\Sigma V^T$ instead of $U \Sigma$). Then:
$$\begin{align*} A &= U \Sigma V^T \\ &= \left[\begin{array}{l|l|l|l} \mathbf{u}_1 & \mathbf{u}_2 & \cdots & \mathbf{u}_m \end{array}\right] \begin{bmatrix} \sigma_1 & 0 & \dots & 0 & 0 & \dots & 0 \\ 0 & \sigma_2 & \dots & 0 & 0 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma_m & 0 & \dots & 0 \end{bmatrix} \begin{bmatrix} \mathbf{v}_1^T \\ \mathbf{v}_2^T \\ \vdots \\ \mathbf{v}_n^T \end{bmatrix} \\ &= \left[\begin{array}{l|l|l|l|l|l|l} \sigma_1 \mathbf{u}_1 & \sigma_2 \mathbf{u}_2 & \cdots & \sigma_m \mathbf{u}_m & \mathbf{0} & \cdots & \mathbf{0} \end{array}\right] \begin{bmatrix} \mathbf{v}_1^T \\ \mathbf{v}_2^T \\ \vdots \\ \mathbf{v}_n^T \end{bmatrix} \\ &= \sigma_1 \mathbf{u}_1 \mathbf{v}_1^T + \sigma_2 \mathbf{u}_2 \mathbf{v}_2^T + \dots + \sigma_m \mathbf{u}_m \mathbf{v}_m^T \end{align*}$$

Since $\sigma_j = 0$ for every $j > r$, this implies that:

$$\begin{align*} A &= \sigma_1 \mathbf{u}_1 \mathbf{v}_1^T + \sigma_2 \mathbf{u}_2 \mathbf{v}_2^T + \dots + \sigma_m \mathbf{u}_m \mathbf{v}_m^T \\ &= \sigma_1 \mathbf{u}_1 \mathbf{v}_1^T + \sigma_2 \mathbf{u}_2 \mathbf{v}_2^T + \dots + \sigma_r \mathbf{u}_r \mathbf{v}_r^T + 0 \\ &= \sigma_1 \mathbf{u}_1 \mathbf{v}_1^T + \sigma_2 \mathbf{u}_2 \mathbf{v}_2^T + \dots + \sigma_r \mathbf{u}_r \mathbf{v}_r^T \end{align*}$$

$\blacksquare$

To compute this alternative form in practice, let's work with this matrix:

$$A=\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ -2 & -1 & 1 \end{pmatrix}$$

Its SVD is:

$$U=\begin{pmatrix} \sqrt{\frac{2}{3}} & \sqrt{\frac{2}{15}} & -\frac{1}{\sqrt{5}} \\ \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{30}} & \frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{6}} & -\sqrt{\frac{5}{6}} & 0 \end{pmatrix} \qquad \Sigma=\begin{pmatrix} \sqrt{6} & 0 & 0 \\ 0 & \sqrt{6} & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad V=\begin{pmatrix} 0 & \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} \\ 0 & \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \\ 1 & 0 & 0 \end{pmatrix}$$

Thus:

$$\sigma_1 \, \mathbf{u}_1 \mathbf{v}_1^T = \sqrt{6} \begin{pmatrix} \sqrt{\frac{2}{3}} \\ \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}$$

$$\sigma_2 \, \mathbf{u}_2 \mathbf{v}_2^T = \sqrt{6} \begin{pmatrix} \sqrt{\frac{2}{15}} \\ \frac{1}{\sqrt{30}} \\ -\sqrt{\frac{5}{6}} \end{pmatrix} \begin{pmatrix} \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} & 0 \end{pmatrix} = \begin{pmatrix} \frac{4}{5} & \frac{2}{5} & 0 \\ \frac{2}{5} & \frac{1}{5} & 0 \\ -2 & -1 & 0 \end{pmatrix}$$

$$\sigma_3 \, \mathbf{u}_3 \mathbf{v}_3^T = 1 \begin{pmatrix} -\frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \\ 0 \end{pmatrix} \begin{pmatrix} -\frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} & 0 \end{pmatrix} = \begin{pmatrix} \frac{1}{5} & -\frac{2}{5} & 0 \\ -\frac{2}{5} & \frac{4}{5} & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

Now, putting it all together:

$$\begin{align*} \sum_{j=1}^3 \sigma_j \, \mathbf{u}_j \mathbf{v}_j^T &= \sigma_1 \, \mathbf{u}_1 \mathbf{v}_1^T + \sigma_2 \, \mathbf{u}_2 \mathbf{v}_2^T + \sigma_3 \, \mathbf{u}_3 \mathbf{v}_3^T \\ &= \begin{pmatrix} 0 & 0 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix} + \begin{pmatrix} \frac{4}{5} & \frac{2}{5} & 0 \\ \frac{2}{5} & \frac{1}{5} & 0 \\ -2 & -1 & 0 \end{pmatrix} + \begin{pmatrix} \frac{1}{5} & -\frac{2}{5} & 0 \\ -\frac{2}{5} & \frac{4}{5} & 0 \\ 0 & 0 & 0 \end{pmatrix} = A \end{align*}$$

Extra: SVD for the linear transformation associated with A

$$\begin{align*} L_A(\mathbf{x}) &= A \mathbf{x} = U \Sigma V^T \mathbf{x} \\ &= U \Sigma \begin{pmatrix} \mathbf{v}_1 \cdot \mathbf{x} \\ \mathbf{v}_2 \cdot \mathbf{x} \\ \vdots \\ \mathbf{v}_n \cdot \mathbf{x} \end{pmatrix} = U \begin{pmatrix} \sigma_1 \, \mathbf{v}_1 \cdot \mathbf{x} \\ \sigma_2 \, \mathbf{v}_2 \cdot \mathbf{x} \\ \vdots \\ \sigma_r \, \mathbf{v}_r \cdot \mathbf{x} \\ 0 \\ \vdots \\ 0 \end{pmatrix} \\ &= \sigma_1 \left( \mathbf{v}_1 \cdot \mathbf{x} \right) \mathbf{u}_1 + \sigma_2 \left( \mathbf{v}_2 \cdot \mathbf{x} \right) \mathbf{u}_2 + \dots + \sigma_r \left( \mathbf{v}_r \cdot \mathbf{x} \right) \mathbf{u}_r \end{align*}$$

Note that after computing the SVD, you no longer need to multiply by $A$ explicitly to evaluate $L_A$ at any vector $\mathbf{x}$: a few dot products and a weighted sum of the $\mathbf{u}_j$ suffice.

$\blacksquare$
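Here's a small sketch of that idea in NumPy: once the SVD is known, evaluating $L_A$ only takes dot products with the $\mathbf{v}_j$ and a weighted sum of the $\mathbf{u}_j$. The matrix and test vector below are just for illustration:

```python
import numpy as np

A = np.array([[ 1.0,  0.0, 2.0],
              [ 0.0,  1.0, 1.0],
              [-2.0, -1.0, 1.0]])
x = np.array([1.0, 2.0, 3.0])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))              # rank: all three singular values are positive

# L_A(x) = sum_j sigma_j (v_j . x) u_j
Ax = sum(s[j] * (Vt[j, :] @ x) * U[:, j] for j in range(r))
print(np.allclose(Ax, A @ x))           # True
```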

Truncated SVD

The alternative form of the SVD is the most important source of applications of this decomposition. The more latent components you add, the closer you get to the matrix. Each of these partial sums is known as a truncated singular value decomposition. For this reason, for every $k \in \{1, \dots, r\}$ we define:

$$A_k = \sum_{j=1}^k \sigma_j \, \mathbf{u}_j \mathbf{v}_j^T$$

The important point is that, for every $1 \leq k \leq r$, among all matrices of rank $k$, $A_k$ is the one closest to $A$. This is the main reason why the SVD is used in real applications. You can interpret it as the SVD arranging $A$ into its “most important” and “least important” pieces: the largest singular values describe the broad strokes of $A$, whilst the smallest singular values take care of the finer details.

The best way to approximate a high-rank matrix by a low-rank one is by discarding the pieces of its singular value decomposition which have the smallest singular values.

Let's compute the truncated SVD for the matrix from the previous section:

$$A=\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ -2 & -1 & 1 \end{pmatrix}$$

Its latent components are:

$$\sigma_1 \, \mathbf{u}_1 \mathbf{v}_1^T = \begin{pmatrix} 0 & 0 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix} \qquad \sigma_2 \, \mathbf{u}_2 \mathbf{v}_2^T = \begin{pmatrix} \frac{4}{5} & \frac{2}{5} & 0 \\ \frac{2}{5} & \frac{1}{5} & 0 \\ -2 & -1 & 0 \end{pmatrix} \qquad \sigma_3 \, \mathbf{u}_3 \mathbf{v}_3^T = \begin{pmatrix} \frac{1}{5} & -\frac{2}{5} & 0 \\ -\frac{2}{5} & \frac{4}{5} & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

Then, the best approximations (of rank $1$, $2$ and $3$ respectively) for $A$ are:

$$A_1 = \begin{pmatrix} 0 & 0 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix} \qquad A_2 = \begin{pmatrix} \frac{4}{5} & \frac{2}{5} & 2 \\ \frac{2}{5} & \frac{1}{5} & 1 \\ -2 & -1 & 1 \end{pmatrix} \qquad A_3 = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ -2 & -1 & 1 \end{pmatrix}$$
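The truncation is a one-liner once you have the factors; here's a sketch. One caveat for this particular matrix: since $\sigma_1 = \sigma_2 = \sqrt{6}$, the best rank-$1$ approximation isn't unique, so the $A_1$ NumPy produces may differ from the one shown above, while $A_2$ and $A_3$ will match:

```python
import numpy as np

A = np.array([[ 1.0,  0.0, 2.0],
              [ 0.0,  1.0, 1.0],
              [-2.0, -1.0, 1.0]])

U, s, Vt = np.linalg.svd(A)

def truncate(k):
    # Keep only the k largest singular values and their singular vectors.
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

for k in (1, 2, 3):
    A_k = truncate(k)
    print(k, np.linalg.matrix_rank(A_k), round(np.linalg.norm(A - A_k), 3))
```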

Image compression

Truncated singular value decomposition often retains a stunningly large level of accuracy even when the values of $k$ are much smaller than $r$. This is because, in real-world matrices, only a minuscule proportion of singular values are large. As a result, $A_k$ serves as an accurate approximation of $A$.

This is particularly useful for image compression. A black and white image can be represented as a matrix with values from 0 to 255, where 0 is full black and 255 equals white. As the numbers increase, lighter and lighter shades are obtained. Let's see truncated SVD in action with this cute panda:

A black and white image

This image corresponds to a $350 \times 634$ matrix $A$. Since its rows are linearly independent, the rank of $A$ is $350$, the largest possible. This implies that there are $350$ latent components. The first singular value is the largest, and the first latent component is the best rank-$1$ approximation to the image:

The first latent component

Perhaps it's not a good approximation, but note that since it's rank $1$, every row is a multiple of every other row, and the same goes for the columns. Now look at the approximation with $k=5$:

Truncated SVD with k=5

It's getting better with only $5$ singular values. But when $k=10, 20, 50$, the results are amazing:

Truncated SVD with k=10

Truncated SVD with k=20

Truncated SVD with k=50

$50$ singular values give an excellent result, and note that this is far fewer than $350$. Since the remaining singular values are negligible, with $k=100$ or $k=200$ the approximation is so good that the difference is no longer even noticeable:

Truncated SVD with k=100

Truncated SVD with k=200
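If you want to reproduce pictures like these yourself, a minimal sketch follows. The file name `panda.png` is just a placeholder for any grayscale image you have on disk; `full_matrices=False` keeps only the factors that are actually needed:

```python
import numpy as np
import matplotlib.pyplot as plt

image = plt.imread("panda.png")         # placeholder file name
if image.ndim == 3:                     # collapse RGB(A) channels to grayscale
    image = image[..., :3].mean(axis=2)
image = image.astype(float)

U, s, Vt = np.linalg.svd(image, full_matrices=False)

for k in (1, 5, 10, 20, 50, 100):
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    plt.imshow(approx, cmap="gray")
    plt.title(f"Truncated SVD with k={k}")
    plt.show()
```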

Conclusion

  • When $A$ is square and every singular value is positive, the matrix is invertible and $A^{-1} = V \Sigma^{-1} U^T$.
  • The geometry of $A^{-1}$ and $A^T$ is closely related to that of $A$.
  • The four fundamental spaces of $A$ are $\operatorname{Ker}(L_A)$, $\operatorname{Im}(L_A)$, $\operatorname{Ker}(L_{A^T})$ and $\operatorname{Im}(L_{A^T})$. The SVD of $A$ gives you an orthonormal basis for each of these spaces.

  • The alternative form of the SVD of $A$ is the sum of its latent components: $A = \sum_{j=1}^r \sigma_j \, \mathbf{u}_j \mathbf{v}_j^T$.

  • The best way to approximate a rank-$r$ matrix $A$ by a rank-$k$ one ($k \leq r$) is its truncated SVD $A_k = \sum_{j=1}^k \sigma_j \, \mathbf{u}_j \mathbf{v}_j^T$.

  • The singular values are ordered non-increasingly: $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r > 0$.
