
Pseudoinverse


Systems of linear equations gave rise to linear algebra. But it has been a while since you began to learn this branch of mathematics, and you have picked up a lot of sophisticated techniques and concepts since then. At this point, solving a system of equations must seem easy. However, your new favorite tool, singular value decomposition, has much to say about it: thanks to it, you'll look at solving systems of linear equations in a slightly different way.

In this topic, you will learn how to calculate the solution of any system of equations. But wait, not all systems have solutions, and some even have infinitely many solutions. So, how can you talk about "the solution" of any system? Get ready to get a little dizzy and develop an infallible tool!

In the following, you'll be working with an $m \times n$ matrix $A$ of rank $r$. Also, consider a vector $\mathbf{b}$ in $\mathbb{R}^m$.

Definition

As you already know, every matrix has an SVD. This decomposition allows you to understand the properties of the matrix deeply. For our purposes, the most useful form of the SVD is the following:

$$A = \sum_{j=1}^r \sigma_j \, \mathbf{u}_j \mathbf{v}_j^T$$

Let's investigate the relationship between the decomposition and the inverse of the matrix. Remember that if $A$ is square of size $n$ and all of its singular values are positive, then it's invertible, and its inverse is given by:

$$A^{-1} = V \Sigma^{-1} U^T = \sum_{j=1}^n \frac{1}{\sigma_j} \, \mathbf{v}_j \mathbf{u}_j^T$$

For a general matrix, only the first $r$ singular values are non-zero. So, you draw inspiration from the last expression to define the closest thing to the inverse of $A$, namely, its pseudoinverse:
$$A^{\dag} = \sum_{j=1}^r \frac{1}{\sigma_j} \, \mathbf{v}_j \mathbf{u}_j^T$$

Note that the size of $A^{\dag}$ is $n \times m$. In terms of the original form of the SVD, if $A = U \Sigma V^T$, then:

$$A^{\dag} = V \Sigma^{\dag} U^T$$

where $\Sigma^{\dag}$ is the $n \times m$ matrix (as opposed to $\Sigma$, whose size is $m \times n$) obtained by transposing $\Sigma$ and replacing each non-zero entry with its reciprocal.
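To make the construction concrete, here is a minimal sketch in Python with NumPy (the library and the sample matrix are illustrative assumptions, not part of the topic itself). It builds $A^{\dag}$ from the rank-one terms of the SVD and checks the result against NumPy's built-in np.linalg.pinv:

```python
import numpy as np

# A small rectangular matrix, chosen arbitrarily for illustration.
A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])

# Full SVD: A = U @ diag(s) @ Vt (suitably padded), singular values in s.
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))  # numerical rank: number of non-zero singular values

# Pseudoinverse as a sum of the rank-one terms (1/sigma_j) v_j u_j^T.
A_dag = sum((1.0 / s[j]) * np.outer(Vt[j], U[:, j]) for j in range(r))

# NumPy's pinv is also computed from the SVD, so the two should agree.
print(np.allclose(A_dag, np.linalg.pinv(A)))  # True
```

Note the tolerance 1e-12: in floating point, "zero" singular values show up as tiny numbers, so the rank is counted with a threshold.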

This means that, although not every matrix is invertible, thanks to SVD, every matrix (even rectangular ones) has a pseudoinverse! As a first application, note that when the matrix is square of size $n$ and invertible, its pseudoinverse coincides with the inverse. This is because, in such a case, $\Sigma^{\dag} = \Sigma^{-1}$, which implies that:

$$A^{\dag} A = \left(V \Sigma^{-1} U^T\right) \left( U \Sigma V^T\right) = \left(V \Sigma^{-1} \right) \left( \Sigma V^T\right) = V V^T = I$$

(the middle step uses $U^T U = I$, and the last one uses $V V^T = I$)

In summary, you've just proved that:

If $A$ is invertible, then $A^{\dag} = A^{-1}$
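A quick numerical sanity check of this fact, again as an illustrative NumPy sketch:

```python
import numpy as np

# An arbitrary invertible matrix: all of its singular values are positive.
A = np.array([[3., 1.],
              [2., 2.]])

# For an invertible matrix, the pseudoinverse coincides with the inverse.
print(np.allclose(np.linalg.pinv(A), np.linalg.inv(A)))  # True
```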

Before delving into the properties of the pseudoinverse, let's see how you can easily calculate it.

Calculation

You already have plenty of practice calculating the SVD and are familiar with the advantages it brings. You can use this decomposition to construct the pseudoinverse easily, as you just saw. Let's see an example. Take the matrix:

$$A=\left( \begin{array}{cccc} 1 & 1 & 1 & 1 \\ 2 & 2 & -2 & -2 \\ 3 & -3 & 3 & -3 \end{array} \right)$$

It's clear that it isn't invertible, but since every matrix has an SVD, it has a pseudoinverse. First of all, an SVD for $A$ is given by:

$$U = \left( \begin{array}{ccc} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{array} \right) \qquad \Sigma = \left( \begin{array}{cccc} 6 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 2 & 0 \end{array} \right) \qquad V = \frac{1}{2} \left( \begin{array}{cccc} 1 & 1 & 1 & 1 \\ -1 & 1 & 1 & -1 \\ 1 & -1 & 1 & -1 \\ -1 & -1 & 1 & 1 \end{array} \right)$$

This means that:

$$\Sigma^{\dag} = \left( \begin{array}{ccc} \frac{1}{6} & 0 & 0 \\ 0 & \frac{1}{4} & 0 \\ 0 & 0 & \frac{1}{2} \\ 0 & 0 & 0 \end{array} \right)$$

Thus:

\begin{align*} A^{\dag} &= V \Sigma^{\dag} U^T \\ &= \frac{1}{2} \left( \begin{array}{cccc} 1 & 1 & 1 & 1 \\ -1 & 1 & 1 & -1 \\ 1 & -1 & 1 & -1 \\ -1 & -1 & 1 & 1 \end{array} \right) \left( \begin{array}{ccc} \frac{1}{6} & 0 & 0 \\ 0 & \frac{1}{4} & 0 \\ 0 & 0 & \frac{1}{2} \\ 0 & 0 & 0 \end{array} \right) \left( \begin{array}{ccc} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{array} \right) \\ &= \left( \begin{array}{ccc} \frac{1}{4} & \frac{1}{8} & \frac{1}{12} \\ \frac{1}{4} & \frac{1}{8} & -\frac{1}{12} \\ \frac{1}{4} & -\frac{1}{8} & \frac{1}{12} \\ \frac{1}{4} & -\frac{1}{8} & -\frac{1}{12} \end{array} \right) \end{align*}

Up to this point, you know that every matrix has a pseudoinverse, that it's easy to calculate after obtaining the SVD, and that it reduces to the inverse when the original matrix is invertible. Now, you will explore important applications of this new tool: systems of equations where there is no unique solution. As you're about to see, the pseudoinverse singles out one distinguished vector for any such system.
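But first, if you'd like to double-check the hand computation above, a short NumPy sketch (purely illustrative) confirms it:

```python
import numpy as np

A = np.array([[1., 1., 1., 1.],
              [2., 2., -2., -2.],
              [3., -3., 3., -3.]])

# The pseudoinverse computed by NumPy...
A_dag = np.linalg.pinv(A)

# ...matches the matrix obtained by hand above.
expected = np.array([[1/4,  1/8,  1/12],
                     [1/4,  1/8, -1/12],
                     [1/4, -1/8,  1/12],
                     [1/4, -1/8, -1/12]])
print(np.allclose(A_dag, expected))  # True
```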

The best solution among infinitely many

If you try to solve the system $A \mathbf{x} = \mathbf{b}$ given by:

$$A= \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \end{pmatrix} \qquad \mathbf{b}= \begin{pmatrix} 2 \\ -1 \end{pmatrix}$$

you should get that the solutions are:

$$\begin{pmatrix} -8 \\ 5 \\ 0 \end{pmatrix} + \lambda \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix} \quad \text{for any } \lambda \in \mathbb{R}$$

At first glance, it is not obvious which of all the possible solutions is the simplest. For instance, $\lambda = 100$ is perfectly valid, but the associated solution would be $(92, -195, 100)^T$, quite ugly. The pseudoinverse can help:

$$A^{\dag}= \frac{1}{3} \begin{pmatrix} -\frac{11}{2} & 4 \\ -1 & 1 \\ \frac{7}{2} & -2 \end{pmatrix}$$

If you calculate $\mathbf{x} = A^{\dag} \mathbf{b} = (-5, -1, 3)^T$, then $\mathbf{x}$ is a solution of the system, since $A \mathbf{x} = \mathbf{b}$.

But this solution has a curious quality: it is the smallest, in the sense that its norm is lower than that of any other solution. This means that it is the solution closest to the origin.

The smallest solution


Best of all, this is generally true:

If the system $A \mathbf{x} = \mathbf{b}$ has infinitely many solutions, then $\mathbf{x} = A^{\dag} \mathbf{b}$ is a solution with the property that for any other solution $\mathbf{y}$:

$$\lVert \mathbf{x} \rVert \leq \lVert \mathbf{y} \rVert$$
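A small NumPy sketch (illustrative only) confirms both claims on the example above: $A^{\dag} \mathbf{b}$ solves the system, and no point on the line of solutions is shorter:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 3., 4.]])
b = np.array([2., -1.])

# The solution picked out by the pseudoinverse.
x = np.linalg.pinv(A) @ b
print(np.round(x, 6))         # [-5. -1.  3.]
print(np.allclose(A @ x, b))  # True: x really solves the system

# Every solution has the form (-8, 5, 0) + lambda * (1, -2, 1);
# none of them is shorter than x.
direction = np.array([1., -2., 1.])
for lam in (-10., -1., 0.5, 3., 100.):
    y = np.array([-8., 5., 0.]) + lam * direction
    print(np.linalg.norm(x) <= np.linalg.norm(y))  # True every time
```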

The closest thing to a solution

Let's face it: very few rectangular systems of equations have solutions. This is especially true when there are more equations than variables; in that case, there are too many conditions for just a few variables to meet.

Recall that the linear system $A \mathbf{x} = \mathbf{b}$ can be rewritten as $x_1 \mathbf{a}_1 + x_2 \mathbf{a}_2 + \dots + x_n \mathbf{a}_n = \mathbf{b}$, where $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n$ are the columns of $A$. This means that the system has a solution only when $\mathbf{b}$ belongs to the subspace spanned by the columns of $A$, denoted by $\operatorname{Im}(A)$. When there is no solution, you could try to approximate $\mathbf{b}$ as closely as possible with some element of $\operatorname{Im}(A)$. That is, you look for the vector $A \mathbf{x} \in \operatorname{Im}(A)$ such that $\lVert A \mathbf{x} - \mathbf{b} \rVert \leq \lVert A \mathbf{y} - \mathbf{b} \rVert$ for any other $\mathbf{y} \in \mathbb{R}^n$.

The best approximation when there's no solution

The picture suggests that the vector $\mathbf{b} - A \mathbf{x}$ is orthogonal to $\operatorname{Im}(A)$. You should think of the vector $A \mathbf{x}$ as the perpendicular projection of $\mathbf{b}$ onto $\operatorname{Im}(A)$. It might seem tricky to find it, but the pseudoinverse comes to the rescue!

Consider the linear system $A \mathbf{x} = \mathbf{b}$ and define:

$$\mathbf{x} = A^{\dag} \mathbf{b}$$

Then, for any other $\mathbf{y} \in \mathbb{R}^n$:

$$\lVert A \mathbf{x} - \mathbf{b} \rVert \leq \lVert A \mathbf{y} - \mathbf{b} \rVert$$

This optimization technique is known as least squares, and you will explore it in the next topic. For now, let's look at a simple example. Suppose you want to solve the system given by:

$$A= \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \qquad \mathbf{b}= \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$

This system has no solution, so the best strategy is to approximate $\mathbf{b}$ with the projection of $\mathbf{b}$ onto $\operatorname{Im}(A)$. For this, verify that:

$$A^{\dag} = \frac{1}{3} \begin{pmatrix} \frac{1}{2} & 1 & -\frac{1}{2} \\ 1 & -1 & 2 \\ \frac{1}{2} & 1 & -\frac{1}{2} \end{pmatrix} \qquad \mathbf{x} = A^{\dag} \mathbf{b} = \frac{1}{3} \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \qquad A \mathbf{x} = \frac{1}{3} \begin{pmatrix} 4 \\ 2 \\ 2 \end{pmatrix}$$

Therefore, the best approximation to $\mathbf{b}$ among all vectors of $\operatorname{Im}(A)$ is $A \mathbf{x} = \frac{1}{3} (4, 2, 2)^T$, and this is your best estimate of $\mathbf{b}$.
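The same numbers drop out of a short NumPy sketch (illustrative only), which also spot-checks that no other candidate beats the residual of $\mathbf{x}$:

```python
import numpy as np

A = np.array([[1., 1., 1.],
              [1., 0., 1.],
              [0., 1., 0.]])
b = np.array([1., 1., 1.])

# Least-squares "solution" via the pseudoinverse.
x = np.linalg.pinv(A) @ b
print(np.round(3 * x))        # [1. 2. 1.]  -> x = (1/3)(1, 2, 1)
print(np.round(3 * (A @ x)))  # [4. 2. 2.]  -> A x = (1/3)(4, 2, 2)

# No other candidate y produces a smaller residual than x does.
rng = np.random.default_rng(0)
residual_x = np.linalg.norm(A @ x - b)
for _ in range(5):
    y = rng.standard_normal(3)
    print(residual_x <= np.linalg.norm(A @ y - b))  # True every time
```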

Conclusion

  • Every matrix $A$ has a pseudoinverse, given by $A^{\dag} = \sum_{j=1}^r \frac{1}{\sigma_j} \, \mathbf{v}_j \mathbf{u}_j^T$.
  • When $A$ is invertible, $A^{\dag} = A^{-1}$.
  • If the linear system $A \mathbf{x} = \mathbf{b}$ has infinitely many solutions, then $\mathbf{x} = A^{\dag} \mathbf{b}$ is the shortest one.
  • If the linear system $A \mathbf{x} = \mathbf{b}$ has no solutions, then $\mathbf{x} = A^{\dag} \mathbf{b}$ gives you the closest thing to a solution, in the sense that for any other $\mathbf{y} \in \mathbb{R}^n$, $A \mathbf{x}$ is closer to $\mathbf{b}$ than $A \mathbf{y}$ is.
