MathAlgebraLinear algebraMatrix decomposition

Positive definite matrices

5 minutes read

Positive definite matrices are a special family of symmetric matrices that behave like positive numbers. In this topic, you will learn about their geometry and a way to easily recognize them.

They're the last piece for building the singular value decomposition. You'll learn about their most useful properties and even define the square root of a matrix!

Definition

Let's start with the definition. Consider a symmetric matrix AA of size nn. It is called:

  • positive definite (PD) if all of its eigenvalues are strictly positive
  • positive semidefinite (PSD) if all of its eigenvalues are non-negative

For instance, take the following matrices:

(3113)(1111)\begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix} \qquad \begin{pmatrix} 1 & -1 \\ -1 &1 \end{pmatrix}The eigenvalues of the first one are 44 and 22, so it's PD, while those of the second one are 22 and 00, hence it's PSD.

Although the set of symmetric matrices seems very small, it contains another family of even more special matrices with new special properties. But if it's such a small set, why bother studying them? After all, symmetric matrices are not even common in practice.

No worries. At the end of the topic, you'll discover that you can use any matrix (not necessarily square) to build a PSD one. Furthermore, you can use its properties to analyze the original matrix.

Since PD (PSD) matrices have positive (non-positive) eigenvalues, a good analogy is that they play a role similar to that of positive (non-positive) numbers. This analogy will appear naturally in the properties of these matrices. For now, note that since they're symmetric, they have a spectral decomposition:

A=UDUTA=UDU^T
Here UU is orthogonal and DD is diagonal. But now the decomposition is even better since the diagonal entries of DD (the eigenvalues of AA) are positive (non-positive). Let's take a closer look to the geometry of these new matrices. In the following let AA denote a PD (PSD) matrix

Geometry

A good way to geometrically measure the force with which AA deforms space is to calculate the dot product between a vector and its image under LAL_A. Therefore, define the quadratic form associated with AA as the function q: RnRq :\ \mathbb{R}^n \rightarrow \mathbb{R} given by:q(v)=v(LAv)=v(Av)=vTA vq(\mathbf{v}) = \mathbf{v} \cdot (L_A \mathbf{v}) = \mathbf{v} \cdot (A \mathbf{v}) = \mathbf{v}^T A \ \mathbf{v}Using the first matrix from the previous section, A=(3113)A=\begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}

you can easily compute its quadratic form as:

q(v)=vTAv=[xy][3113][xy]=[xy][3x+yx+3y]=3x2+2xy+3y2q(\mathbf{v}) = \mathbf{v}^T A \mathbf{v} = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 3x + y\\ x + 3y \end{bmatrix} = 3x^2 +2 xy +3y^2A quadratic form is called positive definite (positive semidefinite) when q(v)>0\textcolor{black}{q(\mathbf{v}) > 0} (q(v)0\textcolor{black}{q(\mathbf{v}) \geq 0}) for every v0\mathbf{v} \neq \mathbf{0}. This name is no coincidence since a symmetric matrix is PD (PSD) if and only if its quadratic form is PD (PSD). Actually, when AA is PD, the function b(v,w)=vTAwb(\mathbf{v}, \mathbf{w}) = \mathbf{v}^T A \mathbf{w} defines an inner product in Rn\mathbb{R}^n.

Now, notice that q(v)=v(Av)=vAvcos(θ)q(\mathbf{v}) = \mathbf{v} \cdot (A \mathbf{v}) = \mid \mathbf{v} \mid \, \mid A \mathbf{v} \mid \cos(\theta) where θ\theta is the angle between v\mathbf{v} and AvA \mathbf{v}. So, when AA is PD (PSD), then q(v)>0\textcolor{black}{q(\mathbf{v}) > 0} (q(v)0\textcolor{black}{q(\mathbf{v}) \geq 0}) implies that cos(θ)>0\textcolor{black}{\cos(\theta) > 0} (cos(θ)0\textcolor{black}{\cos(\theta) \geq 0}). This in turn means that 0θ<π/2\textcolor{black}{0 \leq \theta < \pi /2} (0θπ/2\textcolor{black}{0 \leq \theta \leq \pi /2}), hence the angle is acute (or perpendicular). Thus you can think of PD (PSD) matrices as symmetric matrices that send the vectors to close places in the sense that the angle between them does not exceed π/2\pi /2.

The angle between each vector and its image under the matrix is always acute

Ellipses and the spectral decomposition

But let's go a step further by illustrating the power of our theoretical tools. Go back to the quadratic form you just calculated and consider all vectors v\mathbf{v} that satisfy q(v)=1q(\mathbf{v}) =1 that is:

3x2+2xy+3y2=13x^2 +2 xy +3y^2 = 1

If you remember anything about conics, you might recognize that this equation defines an ellipse.

However, the problem is that the "cross" term 2xy2xy makes its geometric interpretation very difficult. At this point, you're ready to use spectral decomposition to remove that cross term. Construct a spectral decomposition for AA with the pieces U=12(1111)U= \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} and D=(4002)D = \begin{pmatrix} 4 & 0 \\ 0 & 2 \end{pmatrix}. As A=UDUTA=UDU^T you can reconstruct the quadratic form just as:

q(v)=vTA v=vTUDUT v=(UTv)TD(UTv)q(\mathbf{v}) = \mathbf{v}^T A \ \mathbf{v} = \mathbf{v}^T U D U ^T \ \mathbf{v} = \left(U^T \mathbf{v} \right) ^T D \left(U^T \mathbf{v} \right)

It's easy to see that UTv=12(1111)(xy)=12(x+yx+y)U^T \mathbf{v} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} x + y \\ -x + y \end{pmatrix}, so:

q(v)=(UTv)TD(UTv)=12[x+yx+y][4002][x+yx+y]=12[x+yx+y][4(x+y)2(x+y)]=12[4(x+y)2+2(x+y)2]=2(x+y)2+(x+y)2\begin{align*} q(\mathbf{v}) &= \left(U^T \mathbf{v} \right) ^T D \left(U^T \mathbf{v} \right) \\ & = \frac{1}{2} \begin{bmatrix} x + y & -x + y \end{bmatrix}\begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x + y \\ -x + y \end{bmatrix} \\ &= \frac{1}{2} \begin{bmatrix} x + y & -x + y \end{bmatrix} \begin{bmatrix} 4(x + y) \\ 2(-x + y) \end{bmatrix} \\ &= \frac{1}{2} \left[ 4(x + y)^2 + 2(-x + y) ^2 \right] \\ &= 2(x + y)^2 + (-x + y) ^2 \end{align*}Great, no more cross terms. In fact, this trick always works: every PD matrix represents an ellipse (or its high-dimensional analogous) with cross terms that can be deleted thanks to the spectral decomposition. Also, take a look at columns of UU, they point in the directions of the principal axes of the ellipse.

A rotated ellipse

How to detect positive definiteness

You can tell if a matrix is symmetric or diagonal with just a glance, but to know if it's PD (PSD), you have to calculate all its eigenvalues, which isn't very straightforward.
The most popular method to facilitate this task is called Sylvester's criterion. It consists of calculating nn determinants in total. Specifically, for each number kk between 11 and nn, you have to compute the top-left k×kk \times k minor of AA. For example, for a matrix of size 33 the top-left minors look like this:

The top-left k minors of a matrix

The Sylvester criterion states that a symmetric matrix AA is PD if and only if every top-left k×kk \times k minor is positive.

You already know that the matrix (3113)\begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix} is PD. Its top-left minors are 3=3\begin{vmatrix} 3 \end{vmatrix}=3 and 3113=8\begin{vmatrix} 3 & 1 \\ 1 & 3 \end{vmatrix} = 8, so our test is consistent. Now take a bigger matrix
(201030102)\left( \begin{array}{ccc} 2 & 0 & -1 \\ 0 & 3 & 0 \\ -1 & 0 & 2 \\ \end{array} \right)Its top-left minors are all positive:

2=22003=6201030102=9\begin{vmatrix} 2 \end{vmatrix} = 2 \qquad \begin{vmatrix} 2 & 0 \\ 0 & 3 \end{vmatrix}=6 \qquad \begin{vmatrix} 2 & 0 & -1 \\ 0 & 3 & 0 \\ -1 & 0 & 2 \\ \end{vmatrix} = 9so, the matrix is PD.

Please note that the Sylvester criterion only works for PD matrices.

Leveraging PD matrices

Any real number xx satisfies that x20x^2 \geq 0. Something very similar happens with matrices, and it is the reason why you're studying PD (PSD) matrices: For any invertible (square) matrix BB, the matrix BTBB^T B is PD (PSD).

What is more surprising is that the converse is also true. If a matrix AA is PD (PSD), then there exists an invertible (square) BB such that BTB=AB^T B = A. In that sense, it's as if BB were the square root of AA, but there's a problem. It's possible to find several matrices CC with the property that CTC=AC^T C = A. Despite this fact, all of these square-root candidates are connected to each other. This result is known as the orthogonal freedom, and it states that if BTB=CTCB^T B = C^T C then there exists an orthogonal matrix UU such that B=UCB=UC. This result may seem boring to you, but it is the key to the construction of the spectral decomposition.

The number 44 has two possible square roots, 22 and 2-2, but among them 22 is the only one that is still a positive number. Indeed something very similar occurs with any PSD matrix AA, since there's always a PSD matrix PP with the property that P2=AP^2 =A. It's known as the principal root of AA and is denoted as A\sqrt{A}. You can calculate it easily thanks to the spectral decomposition, because if A=UDUTA = UDU^T, then by defining the matrix D\sqrt{D} as the matrix whose entries are the square roots of those of DD you get that:A=UD UT\sqrt{A} = U \sqrt{D} \ U^T

Finally, just as you define the absolute value of any number xx as x=x2\mid x \mid =\sqrt{x^2}, the same goe for any square matrix. So, if MM is a square matrix, then MT MM^T\ M is PSD and therefore has a square root. Thus, you define M =MTM\mid M \mid \ = \sqrt{M^T M}.

Actually, this matrix is intimately connected with MM since they have several properties, but it has the advantage of being PSD. The definitive connection between them is known as the polar decomposition. It establishes that for any square matrix MM there exists an orthogonal one such that:

M=U MM = U\ |M|

Conclusion

Let's review everything you've learned so far. Consider a symmetric square matrix AA.

  • AA is positive definite PD (positive semidefinite PSD) if all of its eigenvalues are positive (non-negative).
  • The quadratic form associated with AA is the function q: RnRq :\ \mathbb{R}^n \rightarrow \mathbb{R} given by q(v)=vTA vq(\mathbf{v}) = \mathbf{v}^T A \ \mathbf{v}.

  • AA is PD (PSD) if and only if its quadratic form is PD (PSD).

  • AA is PD if and only if every top-left minor is positive.

  • If AA is PSD, then its principal root is the unique PSD matrix A\sqrt{A} such that (A)2=A(\sqrt{A})^2 = A.

  • For every square matrix MM, the new matrix MT MM^T\ M is PSD. Then, the absolute value of MM is M =MTM\mid M \mid \ = \sqrt{M^T M}

3 learners liked this piece of theory. 0 didn't like it. What about you?
Report a typo