
Pseudoinverse


Systems of linear equations gave rise to linear algebra. But it has been a while since you began to learn this branch of mathematics, and you have picked up a lot of sophisticated techniques and concepts since then. At this point, solving a system of equations must seem easy. However, your new favorite tool, singular value decomposition, has much to say about it: thanks to it, you'll look at solving systems of linear equations in a slightly different way.

In this topic, you will learn how to calculate the solution of any system of equations. But wait, not all systems have solutions, and some even have infinitely many solutions. So, how can you talk about "the solution" of any system? Get ready to get a little dizzy and develop an infallible tool!

In the following, you'll be working with an $m \times n$ matrix $A$ of rank $r$. Also, consider a vector $\mathbf{b}$ in $\mathbb{R}^m$.

Definition

As you already know, every matrix has an SVD. This decomposition allows you to understand the properties of the matrix deeply. For our purposes, the most useful form of the SVD is the following:

$$A = \sum_{j=1}^r \sigma_j \, \mathbf{u}_j \mathbf{v}_j^T$$

Let's investigate the relationship between the decomposition and the inverse of the matrix. Remember that if $A$ is square of size $n$ and all of its singular values are positive, then it's invertible, and its inverse is given by:

$$A^{-1} = V \Sigma^{-1} U^T = \sum_{j=1}^n \frac{1}{\sigma_j} \, \mathbf{v}_j \mathbf{u}_j^T$$

For a general matrix, only the first $r$ singular values are non-zero. So, you draw inspiration from the last expression to define the closest thing to the inverse of $A$, namely, its pseudoinverse:
$$A^{\dag} = \sum_{j=1}^r \frac{1}{\sigma_j} \, \mathbf{v}_j \mathbf{u}_j^T$$

Note that the size of $A^{\dag}$ is $n \times m$. In terms of the original form of the SVD, if $A = U \Sigma V^T$, then:

$$A^{\dag} = V \Sigma^{\dag} U^T$$

where $\Sigma^{\dag}$ is the $n \times m$ matrix (as opposed to $\Sigma$, whose size is $m \times n$) obtained by transposing $\Sigma$ and replacing each non-zero entry with its reciprocal.
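To make the construction concrete, here is a minimal sketch in Python with NumPy (the library and the sample matrix are illustrative assumptions, not part of the topic itself). It builds $A^{\dag}$ from the rank-one terms of the SVD and checks the result against NumPy's built-in np.linalg.pinv:

```python
import numpy as np

# A small rectangular matrix, chosen arbitrarily for illustration.
A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])

# Full SVD: A = U @ diag(s) @ Vt (suitably padded), singular values in s.
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))  # numerical rank: number of non-zero singular values

# Pseudoinverse as a sum of the rank-one terms (1/sigma_j) v_j u_j^T.
A_dag = sum((1.0 / s[j]) * np.outer(Vt[j], U[:, j]) for j in range(r))

# NumPy's pinv is also computed from the SVD, so the two should agree.
print(np.allclose(A_dag, np.linalg.pinv(A)))  # True
```

Note the tolerance 1e-12: in floating point, "zero" singular values show up as tiny numbers, so the rank is counted with a threshold.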

This means that, although not every matrix is invertible, thanks to SVD, every matrix (even rectangular ones) has a pseudoinverse! As a first application, note that when the matrix is square of size $n$ and invertible, its pseudoinverse coincides with the inverse. This is because, in such a case, $\Sigma^{\dag} = \Sigma^{-1}$, which implies that:

$$A^{\dag} A = \left(V \Sigma^{-1} U^T\right) \left( U \Sigma V^T\right) = \left(V \Sigma^{-1} \right) \left( \Sigma V^T\right) = V V^T = I$$

(the middle step uses $U^T U = I$, and the last one uses $V V^T = I$)

In summary, you've just proved that:

If $A$ is invertible, then $A^{\dag} = A^{-1}$
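A quick numerical sanity check of this fact, again as an illustrative NumPy sketch:

```python
import numpy as np

# An arbitrary invertible matrix: all of its singular values are positive.
A = np.array([[3., 1.],
              [2., 2.]])

# For an invertible matrix, the pseudoinverse coincides with the inverse.
print(np.allclose(np.linalg.pinv(A), np.linalg.inv(A)))  # True
```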

Before delving into the properties of the pseudoinverse, let's see how you can easily calculate it.

Calculation

You already have plenty of practice calculating the SVD and are familiar with the advantages it brings. You can use this decomposition to construct the pseudoinverse easily, as you just saw. Let's see an example. Take the matrix:

$$A=\left( \begin{array}{cccc} 1 & 1 & 1 & 1 \\ 2 & 2 & -2 & -2 \\ 3 & -3 & 3 & -3 \end{array} \right)$$

It's clear that it isn't invertible, but since every matrix has an SVD, it has a pseudoinverse. First of all, an SVD for $A$ is given by:

$$U = \left( \begin{array}{ccc} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{array} \right) \qquad \Sigma = \left( \begin{array}{cccc} 6 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 2 & 0 \end{array} \right) \qquad V = \frac{1}{2} \left( \begin{array}{cccc} 1 & 1 & 1 & 1 \\ -1 & 1 & 1 & -1 \\ 1 & -1 & 1 & -1 \\ -1 & -1 & 1 & 1 \end{array} \right)$$

This means that:

$$\Sigma^{\dag} = \left( \begin{array}{ccc} \frac{1}{6} & 0 & 0 \\ 0 & \frac{1}{4} & 0 \\ 0 & 0 & \frac{1}{2} \\ 0 & 0 & 0 \end{array} \right)$$

Thus:

\begin{align*} A^{\dag} &= V \Sigma^{\dag} U^T \\ &= \frac{1}{2} \left( \begin{array}{cccc} 1 & 1 & 1 & 1 \\ -1 & 1 & 1 & -1 \\ 1 & -1 & 1 & -1 \\ -1 & -1 & 1 & 1 \end{array} \right) \left( \begin{array}{ccc} \frac{1}{6} & 0 & 0 \\ 0 & \frac{1}{4} & 0 \\ 0 & 0 & \frac{1}{2} \\ 0 & 0 & 0 \end{array} \right) \left( \begin{array}{ccc} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{array} \right) \\ &= \left( \begin{array}{ccc} \frac{1}{4} & \frac{1}{8} & \frac{1}{12} \\ \frac{1}{4} & \frac{1}{8} & -\frac{1}{12} \\ \frac{1}{4} & -\frac{1}{8} & \frac{1}{12} \\ \frac{1}{4} & -\frac{1}{8} & -\frac{1}{12} \end{array} \right) \end{align*}

Up to this point, you know that every matrix has a pseudoinverse, that it's easy to calculate after obtaining the SVD, and that it reduces to the inverse when the original matrix is invertible. Now, you will explore important applications of this new tool: systems of equations where there is no unique solution. As you're about to see, the pseudoinverse singles out one distinguished vector for any such system.
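But first, if you'd like to double-check the hand computation above, a short NumPy sketch (purely illustrative) confirms it:

```python
import numpy as np

A = np.array([[1., 1., 1., 1.],
              [2., 2., -2., -2.],
              [3., -3., 3., -3.]])

# The pseudoinverse computed by NumPy...
A_dag = np.linalg.pinv(A)

# ...matches the matrix obtained by hand above.
expected = np.array([[1/4,  1/8,  1/12],
                     [1/4,  1/8, -1/12],
                     [1/4, -1/8,  1/12],
                     [1/4, -1/8, -1/12]])
print(np.allclose(A_dag, expected))  # True
```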

The best solution among infinitely many

If you try to solve the system $A \mathbf{x} = \mathbf{b}$ given by:

$$A= \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \end{pmatrix} \qquad \mathbf{b}= \begin{pmatrix} 2 \\ -1 \end{pmatrix}$$

you should get that the solutions are:

$$\begin{pmatrix} -8 \\ 5 \\ 0 \end{pmatrix} + \lambda \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix} \quad \text{for any } \lambda \in \mathbb{R}$$

At first glance, it is not obvious which of all the possible solutions is the simplest. For instance, $\lambda = 100$ is perfectly valid, but the associated solution would be $(92, -195, 100)^T$, quite ugly. The pseudoinverse can help:

$$A^{\dag}= \frac{1}{3} \begin{pmatrix} -\frac{11}{2} & 4 \\ -1 & 1 \\ \frac{7}{2} & -2 \end{pmatrix}$$

If you calculate $\mathbf{x} = A^{\dag} \mathbf{b} = (-5, -1, 3)^T$, then $\mathbf{x}$ is a solution of the system, since $A \mathbf{x} = \mathbf{b}$.

But this solution has a curious quality: it is the smallest, in the sense that its norm is lower than that of any other solution. This means that it is the solution closest to the origin.

The smallest solution


Best of all, this is generally true:

If the system $A \mathbf{x} = \mathbf{b}$ has infinitely many solutions, then $\mathbf{x} = A^{\dag} \mathbf{b}$ is a solution with the property that for any other solution $\mathbf{y}$:

$$\lVert \mathbf{x} \rVert \leq \lVert \mathbf{y} \rVert$$
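A small NumPy sketch (illustrative only) confirms both claims on the example above: $A^{\dag} \mathbf{b}$ solves the system, and no point on the line of solutions is shorter:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 3., 4.]])
b = np.array([2., -1.])

# The solution picked out by the pseudoinverse.
x = np.linalg.pinv(A) @ b
print(np.round(x, 6))         # [-5. -1.  3.]
print(np.allclose(A @ x, b))  # True: x really solves the system

# Every solution has the form (-8, 5, 0) + lambda * (1, -2, 1);
# none of them is shorter than x.
direction = np.array([1., -2., 1.])
for lam in (-10., -1., 0.5, 3., 100.):
    y = np.array([-8., 5., 0.]) + lam * direction
    print(np.linalg.norm(x) <= np.linalg.norm(y))  # True every time
```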

The closest thing to a solution

Let's face it: very few rectangular systems of equations have solutions. This is especially true when there are more equations than variables; in that case, there are too many conditions for just a few variables to meet.

Recall that the linear system $A \mathbf{x} = \mathbf{b}$ can be rewritten as $x_1 \mathbf{a}_1 + x_2 \mathbf{a}_2 + \dots + x_n \mathbf{a}_n = \mathbf{b}$, where $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n$ are the columns of $A$. This means that the system has a solution only when $\mathbf{b}$ belongs to the subspace spanned by the columns of $A$, denoted by $\operatorname{Im}(A)$. When there is no solution, you could try to approximate $\mathbf{b}$ as closely as possible with some element of $\operatorname{Im}(A)$. That is, you look for the vector $A \mathbf{x} \in \operatorname{Im}(A)$ such that $\lVert A \mathbf{x} - \mathbf{b} \rVert \leq \lVert A \mathbf{y} - \mathbf{b} \rVert$ for any other $\mathbf{y} \in \mathbb{R}^n$.

The best approximation when there's no solution

The picture suggests that the vector $\mathbf{b} - A \mathbf{x}$ is orthogonal to $\operatorname{Im}(A)$. You should think of the vector $A \mathbf{x}$ as the perpendicular projection of $\mathbf{b}$ onto $\operatorname{Im}(A)$. It might seem tricky to find it, but the pseudoinverse comes to the rescue!

Consider the linear system $A \mathbf{x} = \mathbf{b}$ and define:

$$\mathbf{x} = A^{\dag} \mathbf{b}$$

Then, for any other $\mathbf{y} \in \mathbb{R}^n$:

$$\lVert A \mathbf{x} - \mathbf{b} \rVert \leq \lVert A \mathbf{y} - \mathbf{b} \rVert$$

This optimization technique is known as least squares, and you will explore it in the next topic. For now, let's look at a simple example. Suppose you want to solve the system given by:

$$A= \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \qquad \mathbf{b}= \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$

This system has no solution, so the best strategy is to approximate $\mathbf{b}$ with the projection of $\mathbf{b}$ onto $\operatorname{Im}(A)$. For this, verify that:

$$A^{\dag} = \frac{1}{3} \begin{pmatrix} \frac{1}{2} & 1 & -\frac{1}{2} \\ 1 & -1 & 2 \\ \frac{1}{2} & 1 & -\frac{1}{2} \end{pmatrix} \qquad \mathbf{x} = A^{\dag} \mathbf{b} = \frac{1}{3} \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \qquad A \mathbf{x} = \frac{1}{3} \begin{pmatrix} 4 \\ 2 \\ 2 \end{pmatrix}$$

Therefore, the best approximation to $\mathbf{b}$ among all vectors of $\operatorname{Im}(A)$ is $A \mathbf{x} = \frac{1}{3} (4, 2, 2)^T$, and this is your best estimate of $\mathbf{b}$.
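The same numbers drop out of a short NumPy sketch (illustrative only), which also spot-checks that no other candidate beats the residual of $\mathbf{x}$:

```python
import numpy as np

A = np.array([[1., 1., 1.],
              [1., 0., 1.],
              [0., 1., 0.]])
b = np.array([1., 1., 1.])

# Least-squares "solution" via the pseudoinverse.
x = np.linalg.pinv(A) @ b
print(np.round(3 * x))        # [1. 2. 1.]  -> x = (1/3)(1, 2, 1)
print(np.round(3 * (A @ x)))  # [4. 2. 2.]  -> A x = (1/3)(4, 2, 2)

# No other candidate y produces a smaller residual than x does.
rng = np.random.default_rng(0)
residual_x = np.linalg.norm(A @ x - b)
for _ in range(5):
    y = rng.standard_normal(3)
    print(residual_x <= np.linalg.norm(A @ y - b))  # True every time
```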

Conclusion

  • Every matrix $A$ has a pseudoinverse, given by $A^{\dag} = \sum_{j=1}^r \frac{1}{\sigma_j} \, \mathbf{v}_j \mathbf{u}_j^T$.
  • When $A$ is invertible, $A^{\dag} = A^{-1}$.
  • If the linear system $A \mathbf{x} = \mathbf{b}$ has infinitely many solutions, then $\mathbf{x} = A^{\dag} \mathbf{b}$ is the shortest one.
  • If the linear system $A \mathbf{x} = \mathbf{b}$ has no solutions, then $\mathbf{x} = A^{\dag} \mathbf{b}$ gives you the closest thing to a solution, in the sense that for any other $\mathbf{y} \in \mathbb{R}^n$, $A \mathbf{x}$ is closer to $\mathbf{b}$ than $A \mathbf{y}$ is.
