
SVD in action


What happens after calculating the singular value decomposition (SVD) of a matrix? In this topic, you'll explore the main applications of this decomposition. You'll finally get a geometric interpretation of the transpose and easily compute orthonormal bases of the spaces closely related to the matrix.

You'll also develop an alternative form of the SVD that allows you to progressively rebuild any matrix and accurately approximate it.

In the following topic, you'll be working with an $m \times n$ matrix $A$ of rank $r$ with SVD given by:

$$A = U \Sigma V^T$$

And, as before, the columns of its pieces are:

$$U = \left[\begin{array}{l|l|l|l} \mathbf{u}_1 & \mathbf{u}_2 & \cdots & \mathbf{u}_m \end{array}\right] \quad \text{ and } \quad V = \left[\begin{array}{l|l|l|l} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{array}\right]$$

Also, the singular values are ordered non-increasingly: $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r > 0$.

The geometry of the inverse and the transpose

Here comes the first application of singular values. When $A$ is square and all of its singular values are different from zero, $\Sigma$ is invertible and $\Sigma^{-1}$ is the diagonal matrix whose entries are the reciprocals of the singular values. As a result, $A$ is invertible and its inverse is simply:

$$A^{-1} = V \Sigma^{-1} U^T$$

You already know all of the pieces of this decomposition. Now the problem of finding $A^{-1}$ is reduced to the simpler tasks of getting $\Sigma^{-1}$ and then performing only two matrix products!
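To see this numerically, here's a minimal NumPy sketch (the matrix below is just the $3 \times 3$ example that appears later in this topic; any square matrix with nonzero singular values works). Note that `np.linalg.svd` returns $V^T$ rather than $V$:

```python
import numpy as np

A = np.array([[ 1.0,  0.0, 2.0],
              [ 0.0,  1.0, 1.0],
              [-2.0, -1.0, 1.0]])

U, s, Vt = np.linalg.svd(A)             # A = U @ np.diag(s) @ Vt
A_inv = Vt.T @ np.diag(1.0 / s) @ U.T   # A^{-1} = V Σ^{-1} U^T

print(np.allclose(A_inv, np.linalg.inv(A)))  # True
```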

Have you ever noticed that so far, you don't have a geometric interpretation of the transpose? It's time to fill this embarrassing gap. By leveraging the SVD, it immediately becomes clear that:
$$A^{T} = V \Sigma^{T} U^T$$

Think about the decompositions of $A^{-1}$ and $A^{T}$: they look quite similar to each other. Geometrically, they work through the transformations that make up $A = U \Sigma V^T$, but in the opposite order:

  • First, they apply $U^T$ in order to neutralize the effect of $U$.
  • Then, they stretch the resulting space.
  • Finally, as in the first step, they counteract $V^T$ by applying $V$.

The geometry of the transpose

The second step is where $A^{-1}$ and $A^{T}$ actually differ. While the former undoes the stretch of $A$, the latter stretches by the same amount. Thus, you can roughly think of $A^{T}$ as rotating in the opposite direction as $A$ but stretching in the same way.
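The same factors also give the transpose directly. Here's a quick check, using the $3 \times 4$ matrix from the next section (any rectangular matrix would do); the only extra step is rebuilding the full $m \times n$ matrix $\Sigma$, since NumPy returns the singular values as a flat vector:

```python
import numpy as np

A = np.array([[1.0,  1.0,  1.0,  1.0],
              [2.0,  2.0, -2.0, -2.0],
              [3.0, -3.0,  3.0, -3.0]])

U, s, Vt = np.linalg.svd(A)             # U: 3x3, s: 3 values, Vt: 4x4
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)    # the 3x4 matrix Σ

A_T = Vt.T @ Sigma.T @ U.T              # A^T = V Σ^T U^T
print(np.allclose(A_T, A.T))            # True
```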

The four fundamental spaces

The relationship between $A$, its transpose, and the SVD is even deeper than you've just seen. The four fundamental spaces of $A$ are $\operatorname{Ker}(L_A)$, $\operatorname{Im}(L_A)$, $\operatorname{Ker}(L_{A^T})$ and $\operatorname{Im}(L_{A^T})$. They're connected to each other by several remarkable relations. The most important one says that once you've computed the SVD, you have enough information to reconstruct every one of these spaces!

  • $\{ \mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_r \}$ is an orthonormal basis of $\operatorname{Im}(L_A)$.
  • $\{ \mathbf{u}_{r+1}, \mathbf{u}_{r+2}, \dots, \mathbf{u}_m \}$ is an orthonormal basis of $\operatorname{Ker}(L_{A^T})$.
  • $\{ \mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_r \}$ is an orthonormal basis of $\operatorname{Im}(L_{A^T})$.
  • $\{ \mathbf{v}_{r+1}, \mathbf{v}_{r+2}, \dots, \mathbf{v}_n \}$ is an orthonormal basis of $\operatorname{Ker}(L_A)$.

An example would be a useful illustration. Consider the following matrix:

$$A=\begin{pmatrix} 1 & 1 & 1 & 1 \\ 2 & 2 & -2 & -2 \\ 3 & -3 & 3 & -3 \end{pmatrix}$$

An SVD for $A$ is:

$$U=\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \qquad \Sigma=\begin{pmatrix} 6 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 2 & 0 \end{pmatrix} \qquad V= \frac{1}{2} \begin{pmatrix} 1 & 1 & 1 & 1 \\ -1 & 1 & 1 & -1 \\ 1 & -1 & 1 & -1 \\ -1 & -1 & 1 & 1 \end{pmatrix}$$

Since $A$ has three positive singular values, its rank is $r=3$. This means that:

  • $\left\{ \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \right\}$ is an orthonormal basis for $\operatorname{Im}(L_A)$.
  • $\operatorname{Ker}(L_{A^T}) = \{ \mathbf{0} \}$, since $r = m = 3$ leaves no columns of $U$ left over.
  • $\left\{ \frac{1}{2} \begin{pmatrix} 1 \\ -1 \\ 1 \\ -1 \end{pmatrix}, \frac{1}{2} \begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix}, \frac{1}{2} \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} \right\}$ is an orthonormal basis for $\operatorname{Im}(L_{A^T})$.
  • $\left\{ \frac{1}{2} \begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \end{pmatrix} \right\}$ is an orthonormal basis for $\operatorname{Ker}(L_A)$.
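You can extract all four bases programmatically as well. The sketch below is one way to do it with NumPy, using the same matrix as above; keep in mind that `np.linalg.svd` may return singular vectors that differ from the ones in the text by sign (or by order, when singular values coincide), but they span the same spaces:

```python
import numpy as np

A = np.array([[1.0,  1.0,  1.0,  1.0],
              [2.0,  2.0, -2.0, -2.0],
              [3.0, -3.0,  3.0, -3.0]])

U, s, Vt = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s.max()
r = int(np.sum(s > tol))                # numerical rank, here 3

col_space  = U[:, :r]                   # orthonormal basis of Im(L_A)
left_null  = U[:, r:]                   # orthonormal basis of Ker(L_{A^T}) (empty here)
row_space  = Vt[:r, :].T                # orthonormal basis of Im(L_{A^T})
null_space = Vt[r:, :].T                # orthonormal basis of Ker(L_A)

print(np.allclose(A @ null_space, 0))   # True: A sends Ker(L_A) to 0
```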

An alternative form of the SVD

Let's express the SVD in another way: as a sum of simpler matrices built from the singular vectors:

$$A = \sigma_1 \, \mathbf{u}_1 \mathbf{v}_1^T + \sigma_2 \, \mathbf{u}_2 \mathbf{v}_2^T + \dots + \sigma_r \, \mathbf{u}_r \mathbf{v}_r^T = \sum_{j=1}^r \sigma_j \, \mathbf{u}_j \mathbf{v}_j^T$$

Each term $\sigma_j \, \mathbf{u}_j \mathbf{v}_j^T$ in the summation is called a latent component of the original matrix, because each component contributes one increment of the structure hidden within the matrix. Notice that the columns of each latent component are all multiples of $\mathbf{u}_j$, so latent components are rank-$1$ matrices.
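In code, each latent component is an outer product. Here's a minimal sketch of the rebuild, using a small random matrix purely for illustration:

```python
import numpy as np

A = np.random.default_rng(0).normal(size=(5, 3))

U, s, Vt = np.linalg.svd(A)
r = np.linalg.matrix_rank(A)            # 3, almost surely, for a random matrix

# Each term sigma_j * u_j v_j^T is a rank-1 latent component of A.
A_rebuilt = sum(s[j] * np.outer(U[:, j], Vt[j, :]) for j in range(r))
print(np.allclose(A_rebuilt, A))        # True
```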

Proof

You can assume that $m \leq n$ (otherwise, first compute the product $\Sigma V^T$ instead of $U \Sigma$). Then:
$$\begin{align*} A &= U \Sigma V^T \\ &= \left[\begin{array}{l|l|l|l} \mathbf{u}_1 & \mathbf{u}_2 & \cdots & \mathbf{u}_m \end{array}\right] \begin{bmatrix} \sigma_1 & 0 & \dots & 0 & 0 & \dots & 0 \\ 0 & \sigma_2 & \dots & 0 & 0 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma_m & 0 & \dots & 0 \end{bmatrix} \begin{bmatrix} \mathbf{v}_1^T \\ \mathbf{v}_2^T \\ \vdots \\ \mathbf{v}_n^T \end{bmatrix} \\ &= \left[\begin{array}{l|l|l|l|l|l|l} \sigma_1 \mathbf{u}_1 & \sigma_2 \mathbf{u}_2 & \cdots & \sigma_m \mathbf{u}_m & \mathbf{0} & \cdots & \mathbf{0} \end{array}\right] \begin{bmatrix} \mathbf{v}_1^T \\ \mathbf{v}_2^T \\ \vdots \\ \mathbf{v}_n^T \end{bmatrix} \\ &= \sigma_1 \mathbf{u}_1 \mathbf{v}_1^T + \sigma_2 \mathbf{u}_2 \mathbf{v}_2^T + \dots + \sigma_m \mathbf{u}_m \mathbf{v}_m^T \end{align*}$$

Since $\sigma_j = 0$ for every $j > r$, this implies that:

$$\begin{align*} A &= \sigma_1 \mathbf{u}_1 \mathbf{v}_1^T + \sigma_2 \mathbf{u}_2 \mathbf{v}_2^T + \dots + \sigma_m \mathbf{u}_m \mathbf{v}_m^T \\ &= \sigma_1 \mathbf{u}_1 \mathbf{v}_1^T + \sigma_2 \mathbf{u}_2 \mathbf{v}_2^T + \dots + \sigma_r \mathbf{u}_r \mathbf{v}_r^T + 0 \\ &= \sigma_1 \mathbf{u}_1 \mathbf{v}_1^T + \sigma_2 \mathbf{u}_2 \mathbf{v}_2^T + \dots + \sigma_r \mathbf{u}_r \mathbf{v}_r^T \end{align*}$$

$\blacksquare$

To compute this alternative form in practice, let's work with this matrix:

$$A=\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ -2 & -1 & 1 \end{pmatrix}$$

Its SVD is:

$$U=\begin{pmatrix} \sqrt{\frac{2}{3}} & \sqrt{\frac{2}{15}} & -\frac{1}{\sqrt{5}} \\ \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{30}} & \frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{6}} & -\sqrt{\frac{5}{6}} & 0 \end{pmatrix} \qquad \Sigma=\begin{pmatrix} \sqrt{6} & 0 & 0 \\ 0 & \sqrt{6} & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad V=\begin{pmatrix} 0 & \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} \\ 0 & \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \\ 1 & 0 & 0 \end{pmatrix}$$

Thus:

$$\sigma_1 \, \mathbf{u}_1 \mathbf{v}_1^T = \sqrt{6} \begin{pmatrix} \sqrt{\frac{2}{3}} \\ \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}$$

$$\sigma_2 \, \mathbf{u}_2 \mathbf{v}_2^T = \sqrt{6} \begin{pmatrix} \sqrt{\frac{2}{15}} \\ \frac{1}{\sqrt{30}} \\ -\sqrt{\frac{5}{6}} \end{pmatrix} \begin{pmatrix} \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} & 0 \end{pmatrix} = \begin{pmatrix} \frac{4}{5} & \frac{2}{5} & 0 \\ \frac{2}{5} & \frac{1}{5} & 0 \\ -2 & -1 & 0 \end{pmatrix}$$

$$\sigma_3 \, \mathbf{u}_3 \mathbf{v}_3^T = 1 \begin{pmatrix} -\frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \\ 0 \end{pmatrix} \begin{pmatrix} -\frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} & 0 \end{pmatrix} = \begin{pmatrix} \frac{1}{5} & -\frac{2}{5} & 0 \\ -\frac{2}{5} & \frac{4}{5} & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

Now, putting it all together:

$$\begin{align*} \sum_{j=1}^3 \sigma_j \, \mathbf{u}_j \mathbf{v}_j^T &= \sigma_1 \, \mathbf{u}_1 \mathbf{v}_1^T + \sigma_2 \, \mathbf{u}_2 \mathbf{v}_2^T + \sigma_3 \, \mathbf{u}_3 \mathbf{v}_3^T \\ &= \begin{pmatrix} 0 & 0 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix} + \begin{pmatrix} \frac{4}{5} & \frac{2}{5} & 0 \\ \frac{2}{5} & \frac{1}{5} & 0 \\ -2 & -1 & 0 \end{pmatrix} + \begin{pmatrix} \frac{1}{5} & -\frac{2}{5} & 0 \\ -\frac{2}{5} & \frac{4}{5} & 0 \\ 0 & 0 & 0 \end{pmatrix} = A \end{align*}$$

Extra: SVD for the linear transformation associated with A

$$\begin{align*} L_A(\mathbf{x}) &= A \mathbf{x} = U \Sigma V^T \mathbf{x} \\ &= U \Sigma \begin{pmatrix} \mathbf{v}_1 \cdot \mathbf{x} \\ \mathbf{v}_2 \cdot \mathbf{x} \\ \vdots \\ \mathbf{v}_n \cdot \mathbf{x} \end{pmatrix} = U \begin{pmatrix} \sigma_1 \, \mathbf{v}_1 \cdot \mathbf{x} \\ \sigma_2 \, \mathbf{v}_2 \cdot \mathbf{x} \\ \vdots \\ \sigma_r \, \mathbf{v}_r \cdot \mathbf{x} \\ 0 \\ \vdots \\ 0 \end{pmatrix} \\ &= \sigma_1 \left( \mathbf{v}_1 \cdot \mathbf{x} \right) \mathbf{u}_1 + \sigma_2 \left( \mathbf{v}_2 \cdot \mathbf{x} \right) \mathbf{u}_2 + \dots + \sigma_r \left( \mathbf{v}_r \cdot \mathbf{x} \right) \mathbf{u}_r \end{align*}$$

Note that after computing the SVD, you no longer need to multiply by $A$ explicitly to evaluate $L_A$ at any vector $\mathbf{x}$: a few dot products and a weighted sum of the $\mathbf{u}_j$ suffice.

$\blacksquare$
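Here's a small sketch of that idea in NumPy: once the SVD is known, evaluating $L_A$ only takes dot products with the $\mathbf{v}_j$ and a weighted sum of the $\mathbf{u}_j$. The matrix and test vector below are just for illustration:

```python
import numpy as np

A = np.array([[ 1.0,  0.0, 2.0],
              [ 0.0,  1.0, 1.0],
              [-2.0, -1.0, 1.0]])
x = np.array([1.0, 2.0, 3.0])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))              # rank: all three singular values are positive

# L_A(x) = sum_j sigma_j (v_j . x) u_j
Ax = sum(s[j] * (Vt[j, :] @ x) * U[:, j] for j in range(r))
print(np.allclose(Ax, A @ x))           # True
```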

Truncated SVD

The alternative form of the SVD is the most important source of applications of this decomposition. The more latent components you add, the closer you get to the matrix. Each of these partial sums is known as a truncated singular value decomposition. For this reason, for every $k \in \{1, \dots, r\}$ we define:

$$A_k = \sum_{j=1}^k \sigma_j \, \mathbf{u}_j \mathbf{v}_j^T$$

The important point is that, for every $1 \leq k \leq r$, among all matrices of rank $k$, $A_k$ is the one closest to $A$. This is the main reason why the SVD is used in real applications. You can interpret it as the SVD arranging $A$ into its “most important” and “least important” pieces: the largest singular values describe the broad strokes of $A$, whilst the smallest singular values take care of the finer details.

The best way to approximate a high-rank matrix by a low-rank one is by discarding the pieces of its singular value decomposition which have the smallest singular values.

Let's compute the truncated SVD for the matrix from the previous section:

$$A=\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ -2 & -1 & 1 \end{pmatrix}$$

Its latent components are:

$$\sigma_1 \, \mathbf{u}_1 \mathbf{v}_1^T = \begin{pmatrix} 0 & 0 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix} \qquad \sigma_2 \, \mathbf{u}_2 \mathbf{v}_2^T = \begin{pmatrix} \frac{4}{5} & \frac{2}{5} & 0 \\ \frac{2}{5} & \frac{1}{5} & 0 \\ -2 & -1 & 0 \end{pmatrix} \qquad \sigma_3 \, \mathbf{u}_3 \mathbf{v}_3^T = \begin{pmatrix} \frac{1}{5} & -\frac{2}{5} & 0 \\ -\frac{2}{5} & \frac{4}{5} & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

Then, the best approximations (of rank $1$, $2$ and $3$ respectively) for $A$ are:

$$A_1 = \begin{pmatrix} 0 & 0 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix} \qquad A_2 = \begin{pmatrix} \frac{4}{5} & \frac{2}{5} & 2 \\ \frac{2}{5} & \frac{1}{5} & 1 \\ -2 & -1 & 1 \end{pmatrix} \qquad A_3 = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ -2 & -1 & 1 \end{pmatrix}$$
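The truncation is a one-liner once you have the factors; here's a sketch. One caveat for this particular matrix: since $\sigma_1 = \sigma_2 = \sqrt{6}$, the best rank-$1$ approximation isn't unique, so the $A_1$ NumPy produces may differ from the one shown above, while $A_2$ and $A_3$ will match:

```python
import numpy as np

A = np.array([[ 1.0,  0.0, 2.0],
              [ 0.0,  1.0, 1.0],
              [-2.0, -1.0, 1.0]])

U, s, Vt = np.linalg.svd(A)

def truncate(k):
    # Keep only the k largest singular values and their singular vectors.
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

for k in (1, 2, 3):
    A_k = truncate(k)
    print(k, np.linalg.matrix_rank(A_k), round(np.linalg.norm(A - A_k), 3))
```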

Image compression

Truncated singular value decomposition often retains a stunningly large level of accuracy even when the values of $k$ are much smaller than $r$. This is because, in real-world matrices, only a minuscule proportion of singular values are large. As a result, $A_k$ serves as an accurate approximation of $A$.

This is particularly useful for image compression. A black and white image can be represented as a matrix with values from 0 to 255, where 0 is full black and 255 equals white. As the numbers increase, lighter and lighter shades are obtained. Let's see truncated SVD in action with this cute panda:

A black and white image

This image corresponds to a $350 \times 634$ matrix $A$. Since its rows are linearly independent, the rank of $A$ is $350$, the largest possible. This implies that there are $350$ latent components. The first singular value is the largest, and the first latent component is the best rank-$1$ approximation to the image:

The first latent component

Perhaps it's not a good approximation, but note that since it's rank $1$, every row is a multiple of every other row, and the same goes for the columns. Now look at the approximation with $k=5$:

Truncated SVD with k=5

It's getting better with only $5$ singular values. But when $k=10, 20, 50$, the results are amazing:

Truncated SVD with k=10

Truncated SVD with k=20

Truncated SVD with k=50

$50$ singular values give an excellent result, and note that this is far fewer than $350$. Since the remaining singular values are negligible, with $k=100$ or $k=200$ the approximation is so good that the difference is no longer even noticeable:

Truncated SVD with k=100

Truncated SVD with k=200
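If you want to reproduce pictures like these yourself, a minimal sketch follows. The file name `panda.png` is just a placeholder for any grayscale image you have on disk; `full_matrices=False` keeps only the factors that are actually needed:

```python
import numpy as np
import matplotlib.pyplot as plt

image = plt.imread("panda.png")         # placeholder file name
if image.ndim == 3:                     # collapse RGB(A) channels to grayscale
    image = image[..., :3].mean(axis=2)
image = image.astype(float)

U, s, Vt = np.linalg.svd(image, full_matrices=False)

for k in (1, 5, 10, 20, 50, 100):
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    plt.imshow(approx, cmap="gray")
    plt.title(f"Truncated SVD with k={k}")
    plt.show()
```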

Conclusion

  • When $A$ is square and every singular value is positive, the matrix is invertible and $A^{-1} = V \Sigma^{-1} U^T$.
  • The geometry of $A^{-1}$ and $A^T$ is closely related to that of $A$.
  • The four fundamental spaces of $A$ are $\operatorname{Ker}(L_A)$, $\operatorname{Im}(L_A)$, $\operatorname{Ker}(L_{A^T})$ and $\operatorname{Im}(L_{A^T})$. The SVD of $A$ gives you an orthonormal basis for each of these spaces.

  • The alternative form of the SVD of $A$ is the sum of its latent components: $A = \sum_{j=1}^r \sigma_j \, \mathbf{u}_j \mathbf{v}_j^T$.

  • The best way to approximate a rank-$r$ matrix $A$ by a rank-$k$ one ($k \leq r$) is its truncated SVD $A_k = \sum_{j=1}^k \sigma_j \, \mathbf{u}_j \mathbf{v}_j^T$.

  • The singular values are ordered non-increasingly: $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r > 0$.
