MathAlgebraLinear algebraMatrix decomposition

Positive definite matrices

Provided by: Edvancium

5 minutes read

Positive definite matrices are a special family of symmetric matrices that behave like positive numbers. In this topic, you will learn about their geometry and a way to easily recognize them.

They're the last piece for building the singular value decomposition. You'll learn about their most useful properties and even define the square root of a matrix!

Definition

Let's start with the definition. Consider a symmetric matrix $A$ of size $n$ . It is called:

positive definite (PD) if all of its eigenvalues are strictly positive
positive semidefinite (PSD) if all of its eigenvalues are non-negative

For instance, take the following matrices:

$\begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix} \qquad \begin{pmatrix} 1 & -1 \\ -1 &1 \end{pmatrix}$ The eigenvalues of the first one are $4$ and $2$ , so it's PD, while those of the second one are $2$ and $0$ , hence it's PSD.

Although the set of symmetric matrices seems very small, it contains another family of even more special matrices with new special properties. But if it's such a small set, why bother studying them? After all, symmetric matrices are not even common in practice.

No worries. At the end of the topic, you'll discover that you can use any matrix (not necessarily square) to build a PSD one. Furthermore, you can use its properties to analyze the original matrix.

Since PD (PSD) matrices have positive (non-positive) eigenvalues, a good analogy is that they play a role similar to that of positive (non-positive) numbers. This analogy will appear naturally in the properties of these matrices. For now, note that since they're symmetric, they have a spectral decomposition:

$A=UDU^T$
Here $U$ is orthogonal and $D$ is diagonal. But now the decomposition is even better since the diagonal entries of $D$ (the eigenvalues of $A$ ) are positive (non-positive). Let's take a closer look to the geometry of these new matrices. In the following let $A$ denote a PD (PSD) matrix

Geometry

A good way to geometrically measure the force with which $A$ deforms space is to calculate the dot product between a vector and its image under $L_A$ . Therefore, define the quadratic form associated with $A$ as the function $q :\ \mathbb{R}^n \rightarrow \mathbb{R}$ given by: $q(\mathbf{v}) = \mathbf{v} \cdot (L_A \mathbf{v}) = \mathbf{v} \cdot (A \mathbf{v}) = \mathbf{v}^T A \ \mathbf{v}$ Using the first matrix from the previous section, $A=\begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}$

you can easily compute its quadratic form as:

$q(\mathbf{v}) = \mathbf{v}^T A \mathbf{v} = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 3x + y\\ x + 3y \end{bmatrix} = 3x^2 +2 xy +3y^2$ A quadratic form is called positive definite (positive semidefinite) when $\textcolor{black}{q(\mathbf{v}) > 0}$ ( $\textcolor{black}{q(\mathbf{v}) \geq 0}$ ) for every $\mathbf{v} \neq \mathbf{0}$ . This name is no coincidence since a symmetric matrix is PD (PSD) if and only if its quadratic form is PD (PSD). Actually, when $A$ is PD, the function $b(\mathbf{v}, \mathbf{w}) = \mathbf{v}^T A \mathbf{w}$ defines an inner product in $\mathbb{R}^n$ .

Now, notice that $q(\mathbf{v}) = \mathbf{v} \cdot (A \mathbf{v}) = \mid \mathbf{v} \mid \, \mid A \mathbf{v} \mid \cos(\theta)$ where $\theta$ is the angle between $\mathbf{v}$ and $A \mathbf{v}$ . So, when $A$ is PD (PSD), then $\textcolor{black}{q(\mathbf{v}) > 0}$ ( $\textcolor{black}{q(\mathbf{v}) \geq 0}$ ) implies that $\textcolor{black}{\cos(\theta) > 0}$ ( $\textcolor{black}{\cos(\theta) \geq 0}$ ). This in turn means that $\textcolor{black}{0 \leq \theta < \pi /2}$ ( $\textcolor{black}{0 \leq \theta \leq \pi /2}$ ), hence the angle is acute (or perpendicular). Thus you can think of PD (PSD) matrices as symmetric matrices that send the vectors to close places in the sense that the angle between them does not exceed $\pi /2$ .

The angle between each vector and its image under the matrix is always acute

Ellipses and the spectral decomposition

But let's go a step further by illustrating the power of our theoretical tools. Go back to the quadratic form you just calculated and consider all vectors $\mathbf{v}$ that satisfy $q(\mathbf{v}) =1$ that is:

$3x^2 +2 xy +3y^2 = 1$

If you remember anything about conics, you might recognize that this equation defines an ellipse.

However, the problem is that the "cross" term $2xy$ makes its geometric interpretation very difficult. At this point, you're ready to use spectral decomposition to remove that cross term. Construct a spectral decomposition for $A$ with the pieces $U= \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$ and $D = \begin{pmatrix} 4 & 0 \\ 0 & 2 \end{pmatrix}$ . As $A=UDU^T$ you can reconstruct the quadratic form just as:

$q(\mathbf{v}) = \mathbf{v}^T A \ \mathbf{v} = \mathbf{v}^T U D U ^T \ \mathbf{v} = \left(U^T \mathbf{v} \right) ^T D \left(U^T \mathbf{v} \right)$

It's easy to see that $U^T \mathbf{v} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} x + y \\ -x + y \end{pmatrix}$ , so:

$\begin{align*} q(\mathbf{v}) &= \left(U^T \mathbf{v} \right) ^T D \left(U^T \mathbf{v} \right) \\ & = \frac{1}{2} \begin{bmatrix} x + y & -x + y \end{bmatrix}\begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x + y \\ -x + y \end{bmatrix} \\ &= \frac{1}{2} \begin{bmatrix} x + y & -x + y \end{bmatrix} \begin{bmatrix} 4(x + y) \\ 2(-x + y) \end{bmatrix} \\ &= \frac{1}{2} \left[ 4(x + y)^2 + 2(-x + y) ^2 \right] \\ &= 2(x + y)^2 + (-x + y) ^2 \end{align*}$ Great, no more cross terms. In fact, this trick always works: every PD matrix represents an ellipse (or its high-dimensional analogous) with cross terms that can be deleted thanks to the spectral decomposition. Also, take a look at columns of $U$ , they point in the directions of the principal axes of the ellipse.

A rotated ellipse

How to detect positive definiteness

You can tell if a matrix is symmetric or diagonal with just a glance, but to know if it's PD (PSD), you have to calculate all its eigenvalues, which isn't very straightforward.
The most popular method to facilitate this task is called Sylvester's criterion. It consists of calculating $n$ determinants in total. Specifically, for each number $k$ between $1$ and $n$ , you have to compute the top-left $k \times k$ minor of $A$ . For example, for a matrix of size $3$ the top-left minors look like this:

The top-left k minors of a matrix

The Sylvester criterion states that a symmetric matrix

A

is PD if and only if every top-left

k \times k

minor is positive.

You already know that the matrix $\begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}$ is PD. Its top-left minors are $\begin{vmatrix} 3 \end{vmatrix}=3$ and $\begin{vmatrix} 3 & 1 \\ 1 & 3 \end{vmatrix} = 8$ , so our test is consistent. Now take a bigger matrix
$\left( \begin{array}{ccc} 2 & 0 & -1 \\ 0 & 3 & 0 \\ -1 & 0 & 2 \\ \end{array} \right)$ Its top-left minors are all positive:

$\begin{vmatrix} 2 \end{vmatrix} = 2 \qquad \begin{vmatrix} 2 & 0 \\ 0 & 3 \end{vmatrix}=6 \qquad \begin{vmatrix} 2 & 0 & -1 \\ 0 & 3 & 0 \\ -1 & 0 & 2 \\ \end{vmatrix} = 9$ so, the matrix is PD.

Please note that the Sylvester criterion only works for PD matrices.

Leveraging PD matrices

Any real number $x$ satisfies that $x^2 \geq 0$ . Something very similar happens with matrices, and it is the reason why you're studying PD (PSD) matrices: For any invertible (square) matrix $B$ , the matrix $B^T B$ is PD (PSD).

What is more surprising is that the converse is also true. If a matrix $A$ is PD (PSD), then there exists an invertible (square) $B$ such that $B^T B = A$ . In that sense, it's as if $B$ were the square root of $A$ , but there's a problem. It's possible to find several matrices $C$ with the property that $C^T C = A$ . Despite this fact, all of these square-root candidates are connected to each other. This result is known as the orthogonal freedom, and it states that if $B^T B = C^T C$ then there exists an orthogonal matrix $U$ such that $B=UC$ . This result may seem boring to you, but it is the key to the construction of the spectral decomposition.

The number $4$ has two possible square roots, $2$ and $-2$ , but among them $2$ is the only one that is still a positive number. Indeed something very similar occurs with any PSD matrix $A$ , since there's always a PSD matrix $P$ with the property that $P^2 =A$ . It's known as the principal root of $A$ and is denoted as $\sqrt{A}$ . You can calculate it easily thanks to the spectral decomposition, because if $A = UDU^T$ , then by defining the matrix $\sqrt{D}$ as the matrix whose entries are the square roots of those of $D$ you get that: $\sqrt{A} = U \sqrt{D} \ U^T$

Finally, just as you define the absolute value of any number $x$ as $\mid x \mid =\sqrt{x^2}$ , the same goe for any square matrix. So, if $M$ is a square matrix, then $M^T\ M$ is PSD and therefore has a square root. Thus, you define $\mid M \mid \ = \sqrt{M^T M}$ .

Actually, this matrix is intimately connected with $M$ since they have several properties, but it has the advantage of being PSD. The definitive connection between them is known as the polar decomposition. It establishes that for any square matrix $M$ there exists an orthogonal one such that:

$M = U\ |M|$

Conclusion

Let's review everything you've learned so far. Consider a symmetric square matrix $A$ .

$A$ is positive definite PD (positive semidefinite PSD) if all of its eigenvalues are positive (non-negative).
The quadratic form associated with $A$ is the function $q :\ \mathbb{R}^n \rightarrow \mathbb{R}$ given by $q(\mathbf{v}) = \mathbf{v}^T A \ \mathbf{v}$ .
$A$ is PD (PSD) if and only if its quadratic form is PD (PSD).
$A$ is PD if and only if every top-left minor is positive.
If $A$ is PSD, then its principal root is the unique PSD matrix $\sqrt{A}$ such that $(\sqrt{A})^2 = A$ .
For every square matrix $M$ , the new matrix $M^T\ M$ is PSD. Then, the absolute value of $M$ is $\mid M \mid \ = \sqrt{M^T M}$

3 learners liked this piece of theory. 0 didn't like it. What about you?

Report a typo