
Diagonalization of matrices


Matrix diagonalization is a powerful tool in mathematics. It lets you unravel complex problems, reveal hidden insights, and navigate the intricate world of eigenvectors, characteristic polynomials, and the transformative nature of matrix operations.

Diagonalization also plays a vital role in many areas of mathematics, science, and engineering. Prepare to unlock the true potential of matrices.

Diagonal matrices

After the identity matrix, diagonal matrices are the easiest to manipulate. With zero entries off the main diagonal, calculations involving diagonal matrices become significantly streamlined: since most products vanish, basic operations such as addition, subtraction, and multiplication are straightforward. This property greatly reduces the complexity and computational burden of diagonal matrix operations.

Take an $n \times n$ diagonal matrix $A$ with diagonal entries $a_1, a_2, \dots, a_n$. Transforming a vector $x = (x_1, x_2, \dots, x_n)^t$ is as easy as:

$$Ax = \begin{bmatrix} a_{1} & 0 & \dots & 0 \\ 0 & a_{2} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & a_{n} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} a_1 x_1 \\ a_2 x_2 \\ \vdots \\ a_n x_n \end{bmatrix}$$

Computing many matrix products is tricky, especially if the dimensions are large. But with diagonal matrices, taking powers is exceptionally straightforward:

$$A^2 = AA = \begin{bmatrix} a_{1} & 0 & \dots & 0 \\ 0 & a_{2} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & a_{n} \end{bmatrix} \begin{bmatrix} a_{1} & 0 & \dots & 0 \\ 0 & a_{2} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & a_{n} \end{bmatrix} = \begin{bmatrix} a_{1}^2 & 0 & \dots & 0 \\ 0 & a_{2}^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & a_{n}^2 \end{bmatrix}$$

In general:

$$A^k = \begin{bmatrix} a_{1}^k & 0 & \dots & 0 \\ 0 & a_{2}^k & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & a_{n}^k \end{bmatrix}$$

Okay, manipulating diagonal matrices is easy, but in real life, most matrices are not. What's the point of knowing their properties? This is where the concept of diagonalization comes into play!
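Before moving on, here is a minimal NumPy sketch of the diagonal shortcuts above (the entries are made up for illustration):

```python
import numpy as np

# A small diagonal matrix with made-up entries.
a = np.array([2.0, -1.0, 0.5])
A = np.diag(a)
x = np.array([1.0, 2.0, 3.0])

# Applying A to x just scales each coordinate: a_i * x_i.
print(A @ x)  # [ 2.  -2.   1.5]
print(a * x)  # same result, no matrix product needed

# Powers of a diagonal matrix are element-wise powers of the diagonal.
k = 5
print(np.linalg.matrix_power(A, k))  # full matrix computation
print(np.diag(a ** k))               # same result, just a_i ** k
```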

Diagonalizable matrices

Diagonalizing a matrix consists of decomposing it as the product of other, simpler ones. The precise definition is that a square matrix $A$ is diagonalizable if there exists a diagonal matrix $D$ and an invertible matrix $P$ (both square of dimension $n$) such that:

$$A = P D P^{-1}$$

It may not be a simple or promising definition, but the central idea is that a diagonalizable matrix behaves very much like a diagonal one. You can see this by taking powers:

$$A^2 = \left(P D P^{-1}\right) \left(P D P^{-1}\right) = P D \left(P^{-1} P \right) D P^{-1} = P D^2 P^{-1}$$

And in general:

$$A^k = P D^k P^{-1}$$

Without diagonalizing, you would have had to perform $k-1$ matrix products, but with its help, you only raise some numbers to that power and perform two matrix products!
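As a sketch of this payoff, NumPy's np.linalg.eig can find $D$ (the eigenvalues) and $P$ (the eigenvectors) for us; the matrix below is a made-up example, and how to carry out the diagonalization by hand is the subject of the next topic:

```python
import numpy as np

# A made-up diagonalizable matrix.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# eig returns the eigenvalues (the diagonal of D) and the
# eigenvectors (the columns of P).
eigvals, P = np.linalg.eig(A)

k = 10
# A^k via diagonalization: scalar powers plus two matrix products.
A_k = P @ np.diag(eigvals ** k) @ np.linalg.inv(P)

# Sanity check against the direct computation (k - 1 products).
print(np.allclose(A_k, np.linalg.matrix_power(A, k)))  # True
```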

But how do you know whether a matrix is diagonalizable? And if it is, how do you find the matrices $D$ and $P$? We'll answer these questions in the next topic; for now, let's focus on the uses of diagonalization.

The importance of diagonalization

One of the key advantages of diagonalization is its ability to simplify complex matrices. While most matrices are not diagonal to begin with, diagonalization lets us rewrite many of them in terms of a diagonal matrix through a specific factorization. This process not only facilitates a clearer understanding of the matrix's structure but also unveils its inherent properties and relationships.

In data science, diagonalization plays a crucial role in dimensionality reduction techniques. By diagonalizing a covariance matrix, you can identify the principal components that capture the most significant variation in a dataset. These principal components provide a lower-dimensional representation of the data, allowing more efficient analysis, visualization, and modeling.

Dimensionality reduction from 3D into 2D
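Here is a minimal sketch of that idea on synthetic 3D data: diagonalize the covariance matrix (NumPy's eigh handles symmetric matrices) and keep the two dominant directions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 3D data, made up for illustration: two directions carry
# almost all of the variance, the third very little.
data = rng.normal(size=(200, 3)) * np.array([3.0, 2.0, 0.1])

centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# Diagonalize the symmetric covariance matrix; eigh returns the
# eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)

# The eigenvectors of the two largest eigenvalues are the principal components.
components = eigvecs[:, -2:]

# Project the 3D data down to 2D.
reduced = centered @ components
print(reduced.shape)  # (200, 2)
```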

In programming, diagonalization finds applications in various areas, such as graph theory and network analysis. Diagonalizing the adjacency matrix of a graph allows you to extract valuable information about its connectivity, centrality measures, and community structures. This information is vital for understanding social networks, recommendation systems, and fraud detection algorithms.
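One well-known instance is eigenvector centrality: the leading eigenvector of the adjacency matrix scores how well-connected each node is. A minimal sketch on a made-up undirected graph:

```python
import numpy as np

# Adjacency matrix of a small made-up undirected graph (4 nodes).
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 1],
                [1, 1, 0, 0],
                [0, 1, 0, 0]], dtype=float)

# A symmetric adjacency matrix diagonalizes with real eigenvalues.
eigvals, eigvecs = np.linalg.eigh(adj)

# The eigenvector of the largest eigenvalue (last column) gives the
# centrality scores, defined up to sign and scale.
centrality = np.abs(eigvecs[:, -1])
print(centrality / centrality.sum())  # node 1 is the most central
```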

Furthermore, diagonalization is closely related to eigenvalue problems, which arise in various scientific and engineering disciplines. Solving eigenvalue problems through diagonalization enables us to analyze vibrations, simulate physical systems, optimize algorithms, and predict the behavior of dynamic systems accurately.

Diagonalization in action: a weather forecast Markov chain

Imagine you live in a small city with very predictable weather. This city has three types of weather: sunny, cloudy, and snowy. The weather transitions from one type to another based only on the current weather condition. This is a random system where the future depends solely on the present state, known as a Markov chain.

Let's denote the states as $1 = \text{sunny}$, $2 = \text{cloudy}$, and $3 = \text{snowy}$. We can represent the probabilities of transitioning between conditions using a transition matrix. Call it $A$, with each entry $a_{ij}$ representing the probability of transitioning from weather state $i$ to state $j$. Our matrix is:

$$A=\begin{bmatrix} 0.7 & 0.2 & 0.1 \\ 0.4 & 0.5 & 0.1 \\ 0.2 & 0.3 & 0.5 \end{bmatrix}$$

Weather Markov chain

So, for example, the probability of going from a sunny day to a cloudy day is $0.2$. Similarly, if it is snowing today, the probability that it will be snowing tomorrow is $0.5$. Note that if it snows today, then tomorrow it must either keep snowing or change state; that is, the third row of the matrix contains the probabilities of transitioning from snowy to each state, and therefore its entries add up to $1$. The same is true for the other states.

If today is cloudy, you know the probability that tomorrow will be sunny, but how can you determine the probability that it will snow the day after tomorrow? Or that in a week it will be cloudy again? In other words, you need the probability of transitioning from state $i$ to state $j$ in $k$ steps. It turns out that this is the entry $(i, j)$ of the $k$th power of $A$!

But taking powers of $A$ is not an easy task, and it is precisely here that we should diagonalize it. We won't go through the process for now. We'll just note that the matrices

$$D=\begin{bmatrix}1&0&0\\ 0&0.3&0\\ 0&0&0.4\end{bmatrix} \qquad \text{and} \qquad P=\begin{bmatrix}1&0.125&-0.2\\ 1&-0.75&-0.2\\ 1&1&1\end{bmatrix}$$

do the work: $A = P D P^{-1}$. Then, the transition probabilities in $7$ steps are in the matrix:

$$A^7 = PD^7P^{-1} = P\begin{bmatrix}1&0&0\\ 0&0.3^7&0\\ 0&0&0.4^7\end{bmatrix} P^{-1} = \begin{bmatrix}0.5243&0.3092&0.1663\\ 0.5241&0.3094&0.1663\\ 0.5213&0.3106&0.1680\end{bmatrix}$$

Going back to our question: if it is cloudy today, then the probability that it will be cloudy again in a week is entry $(2, 2)$ of $A^7$, that is, $0.3094$. Bonus: if you look closely, you will see that the rows of the matrix are very similar. This is a sign that the Markov chain is reaching equilibrium.
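You can check the factorization and the seven-step computation with a few lines of NumPy (the matrices are taken from the example above):

```python
import numpy as np

A = np.array([[0.7, 0.2, 0.1],
              [0.4, 0.5, 0.1],
              [0.2, 0.3, 0.5]])
print(A.sum(axis=1))  # each row sums to 1, as a transition matrix should

D = np.diag([1.0, 0.3, 0.4])
P = np.array([[1.0,  0.125, -0.2],
              [1.0, -0.75,  -0.2],
              [1.0,  1.0,    1.0]])

# Check the factorization A = P D P^{-1}.
print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True

# Seven-step transition probabilities: scalar powers plus two products.
A7 = P @ np.diag(np.diag(D) ** 7) @ np.linalg.inv(P)
print(A7.round(4))
print(A7[1, 1].round(4))  # cloudy -> cloudy in a week: ~0.3094
```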

Conclusion

  • Diagonal matrices are the easiest to manipulate; they have simple properties but are uncommon in practice.
  • A square matrix $A$ is diagonalizable if there exists a diagonal matrix $D$ and an invertible matrix $P$ such that $A = P D P^{-1}$.
  • Diagonalizable matrices have properties similar to those of diagonal matrices.
  • If $A$ is diagonalizable, then its powers can be calculated as $A^k = P D^k P^{-1}$.
  • Diagonalization applications span various fields, including data science, programming, and optimization.