
Orthogonal and orthonormal bases


You already know that any two vectors not lying on the same line form a basis in the plane. However, if you ask a random person to draw two coordinate axes on a checkered piece of paper, they will likely draw two perpendicular lines. This is because the concept of orthogonality complements the idea of choosing a basis in a vector space quite elegantly. The combination of these concepts is highly fruitful and finds applications everywhere, from dimensionality reduction in machine learning and Fourier analysis to the frontiers of modern technology and science. The scope of applications for orthogonal bases is so vast that providing a comprehensive outline of its full extent is a challenging task. Considering this, let’s try to understand where it all comes from.

Example

Consider a two-dimensional Euclidean space $(V,\langle\cdot,\cdot\rangle)$. Choose arbitrary vectors $\vec{v}_{1}$ and $\vec{v}_{2}$ forming a basis of $V$. The vector
$$\vec{v}'_{2} = \vec{v}_{2} - \frac{\langle\vec{v}_{1},\vec{v}_{2}\rangle}{\langle\vec{v}_{1},\vec{v}_{1}\rangle}\cdot\vec{v}_{1}$$
will now be of particular interest to us. Why? Because it happens to be perpendicular to $\vec{v}_{1}$:

$$\langle\vec{v}_{1},\vec{v}'_{2}\rangle = \left\langle\vec{v}_{1},\vec{v}_{2} - \frac{\langle\vec{v}_{1},\vec{v}_{2}\rangle}{\langle\vec{v}_{1},\vec{v}_{1}\rangle}\cdot\vec{v}_{1}\right\rangle = \langle\vec{v}_{1},\vec{v}_{2}\rangle - \frac{\langle\vec{v}_{1},\vec{v}_{2}\rangle}{\cancel{\langle\vec{v}_{1},\vec{v}_{1}\rangle}}\cdot\cancel{\langle\vec{v}_{1},\vec{v}_{1}\rangle} = 0$$
And since $\{\vec{v}_{1},\vec{v}_{2}\}$ forms a basis, $\vec{v}_{1}$ and $\vec{v}'_{2}$ are linearly independent. Therefore, $\{\vec{v}_{1},\vec{v}'_{2}\}$ is also a basis of $V$. So you can construct a basis of mutually orthogonal vectors for any two-dimensional space, starting from an arbitrary basis of it! Such a basis is called an orthogonal basis.

For example, if
$$\langle\vec{v}_{1},\vec{v}_{1}\rangle = 9, \qquad \langle\vec{v}_{2},\vec{v}_{2}\rangle = 8, \qquad \langle\vec{v}_{1},\vec{v}_{2}\rangle = 6,$$
then
$$\vec{v}'_{2} = \vec{v}_{2} - \frac{\langle\vec{v}_{1},\vec{v}_{2}\rangle}{\langle\vec{v}_{1},\vec{v}_{1}\rangle}\cdot\vec{v}_{1} = \vec{v}_{2} - \frac{2}{3}\vec{v}_{1},$$
and $\{\vec{v}_{1},\vec{v}_{2} - \frac{2}{3}\vec{v}_{1}\}$ is an orthogonal basis (if you want, you can check it for this particular case). It is often useful to consider a basis in which all vectors have length $1$. Right now, the vectors $\vec{v}_{1}$ and $\vec{v}'_{2}$ have lengths:

$$\|\vec{v}_{1}\| = \sqrt{\langle\vec{v}_{1},\vec{v}_{1}\rangle} = 3$$
$$\|\vec{v}'_{2}\| = \sqrt{\left\langle\vec{v}_{2}-\frac{2}{3}\vec{v}_{1},\vec{v}_{2} - \frac{2}{3}\vec{v}_{1}\right\rangle} = \sqrt{\langle\vec{v}_{2},\vec{v}_{2}\rangle - \frac{4}{3}\langle\vec{v}_{1},\vec{v}_{2}\rangle + \frac{4}{9} \langle\vec{v}_{1},\vec{v}_{1}\rangle} = 2$$
If you divide a vector by its length, you end up with a vector co-directed with the initial one but with length $1$. This process is called normalization of a vector.
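Before moving on to normalization, you can double-check this arithmetic numerically. The minimal sketch below represents vectors by their coordinates in the basis $\{\vec{v}_{1},\vec{v}_{2}\}$ and encodes the inner product through the Gram matrix $G$ with $G_{ij} = \langle\vec{v}_{i},\vec{v}_{j}\rangle$, so that $\langle\vec{a},\vec{b}\rangle = a^{\mathsf{T}}G\,b$ (the variable names here are my own choice):

```python
import numpy as np

# Gram matrix of the inner product in the basis {v1, v2}:
# G[i, j] = <v_i, v_j>, with the values from the example above.
G = np.array([[9.0, 6.0],
              [6.0, 8.0]])

def inner(a, b):
    """Inner product of vectors given by their coordinates in {v1, v2}."""
    return a @ G @ b

v1 = np.array([1.0, 0.0])   # v1 itself, written in the basis {v1, v2}
v2 = np.array([0.0, 1.0])   # v2 itself
v2_prime = v2 - inner(v1, v2) / inner(v1, v1) * v1   # v2 - (2/3) v1

print(inner(v1, v2_prime))                  # ~0.0 -> v1 and v2' are orthogonal
print(np.sqrt(inner(v1, v1)))               # 3.0  -> length of v1
print(np.sqrt(inner(v2_prime, v2_prime)))   # 2.0  -> length of v2'
```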

The result of normalizing any vector $\vec{w}$ is always a unit vector:

$$\left\|\frac{\vec{w}}{\|\vec{w}\|}\right\| = \left\|\frac{1}{\|\vec{w}\|}\cdot \vec{w}\right\| = \frac{1}{\cancel{\|\vec{w}\|}}\cdot \cancel{\|\vec{w}\|} = 1$$
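As a quick sanity check, here is a tiny sketch (plain NumPy, standard dot product assumed; the helper name normalize is my own) showing that dividing a nonzero vector by its length indeed yields a unit vector:

```python
import numpy as np

def normalize(w):
    """Return the unit vector co-directed with w (w must be nonzero)."""
    return w / np.linalg.norm(w)

w = np.array([3.0, -4.0, 12.0])      # length 13
print(np.linalg.norm(normalize(w)))  # 1.0
```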

Let’s normalize the vectors $\vec{v}_{1}$ and $\vec{v}'_{2}$:
$$\vec{e}_{1} = \frac{\vec{v}_{1}}{\|\vec{v}_{1}\|} = \frac{1}{3}\vec{v}_{1}$$
$$\vec{e}_{2} = \frac{\vec{v}'_{2}}{\|\vec{v}'_{2}\|} = \frac{1}{2}\vec{v}'_{2} = \frac{1}{2}\vec{v}_{2} - \frac{1}{3}\vec{v}_{1}$$
The vectors $\vec{e}_{1}$ and $\vec{e}_{2}$ have the same directions as $\vec{v}_{1}$ and $\vec{v}'_{2}$ (normalization does not change direction), so $\{\vec{e}_{1},\vec{e}_{2}\}$ also forms a basis of $V$, and now it is orthogonal and consists of length-one vectors:

$$\langle\vec{e}_{1},\vec{e}_{1}\rangle = \langle\vec{e}_{2},\vec{e}_{2}\rangle = 1, \qquad \langle\vec{e}_{1},\vec{e}_{2}\rangle = 0$$
Lastly, notice that these relations are the same as the relations for $\mathbb{R}^{2}$ with the standard basis and the usual dot product!
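Continuing the numeric sketch from above (same assumed Gram matrix $G$), you can normalize $\vec{v}_{1}$ and $\vec{v}'_{2}$ and confirm these relations:

```python
import numpy as np

G = np.array([[9.0, 6.0],
              [6.0, 8.0]])           # G[i, j] = <v_i, v_j>
inner = lambda a, b: a @ G @ b

v1 = np.array([1.0, 0.0])            # v1 in the basis {v1, v2}
v2_prime = np.array([-2.0 / 3.0, 1.0])   # v2 - (2/3) v1

e1 = v1 / np.sqrt(inner(v1, v1))                      # (1/3) v1
e2 = v2_prime / np.sqrt(inner(v2_prime, v2_prime))    # (1/2) v2'

print(inner(e1, e1), inner(e2, e2))   # 1.0 1.0
print(inner(e1, e2))                  # ~0.0
```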

Geometric side of the story

All the algebraic manipulations introduced above may seem a little overcomplicated. But if you look at them from a geometric point of view, everything becomes much more natural.

So let’s consider the vectors $\vec{v}_{1}$ and $\vec{v}_{2}$ again, but now as arrows on a plane:

Two generic vectors on a plane.

Now notice that the vector
$$\frac{\langle\vec{v}_{1},\vec{v}_{2}\rangle}{\langle\vec{v}_{1}, \vec{v}_{1}\rangle}\cdot\vec{v}_{1}$$
is just the projection $\mathbf{proj}_{\vec{v}_{1}}(\vec{v}_{2})$ of the vector $\vec{v}_{2}$ onto $\vec{v}_{1}$. This projection is easily illustrated with the following picture:

A projection of one vector onto another

Therefore, the vector $\vec{v}'_{2}$ is the vector $\vec{v}_{2}$ from which you ‘removed’ its projection onto $\vec{v}_{1}$:

$$\vec{v}'_{2} = \vec{v}_{2} - \mathbf{proj}_{\vec{v}_{1}}(\vec{v}_{2})$$
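In coordinates, with the usual dot product, this ‘remove the projection’ step is a one-liner. A minimal sketch (the concrete vectors below are an arbitrary illustration, not taken from the text):

```python
import numpy as np

def proj(u, v):
    """Projection of v onto u with respect to the dot product."""
    return (u @ v) / (u @ u) * u

v1 = np.array([3.0, 1.0])
v2 = np.array([1.0, 2.0])
v2_prime = v2 - proj(v1, v2)

print(v1 @ v2_prime)   # ~0.0 -> v2' is perpendicular to v1
```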

You’ve already proven that $\vec{v}'_{2}$ is perpendicular to $\vec{v}_{1}$; now it is also easy to see it directly:

Two vectors and the result of ‘removing’ a projection of second vector on the first from the second vector

Last, but not least, you normalize the vectors so that their lengths are equal to $1$. In our picture, it looks like this.

The result of constructing two vectors of length 1 by shrinking the vectors of the basis

Here $\vec{e}_{1}$ and $\vec{e}_{2}$ are the normalized versions of $\vec{v}_{1}$ and $\vec{v}'_{2}$ correspondingly. They form a basis of this plane, which is really similar to the one you usually choose in a checkered notebook:

An example of the way we usually draw a basis in a checkered notebook.

Higher dimensions

Now this idea of a basis of length-one vectors that are orthogonal to each other can be adapted to an arbitrary dimension $n$. First, let’s give it a name.

Let $(V,\langle\cdot,\cdot\rangle)$ be a Euclidean space with $\dim(V) = n$. A basis $\{\vec{e}_{1},\vec{e}_{2},\dots,\vec{e}_{n}\}$ is called orthogonal if
$$\langle \vec{e}_{i} , \vec{e}_{j} \rangle = 0$$
for any $i,j\in\{1,2,\dots,n\}$ such that $i\ne j$. It literally means that each vector of this basis is orthogonal to every other vector.

An orthogonal basis $\{\vec{e}_{1},\vec{e}_{2},\dots,\vec{e}_{n}\}$ is called orthonormal if

$$\langle\vec{e}_{i},\vec{e}_{i}\rangle = 1$$
for any $i \in \{1,2,\dots,n\}$. This means that, besides orthogonality, each vector is a unit vector.

There is a very useful conventional symbol in mathematics called the Kronecker delta. It is defined in the following manner:

$$\delta_{i,j} = \begin{cases} 1, & \text{if } i = j\\ 0, & \text{if } i\ne j \end{cases}$$
For instance, $\delta_{4,4}$ and $\delta_{3,5}$ are just $1$ and $0$ correspondingly. How can it be used? Well, you can say that a basis $\{\vec{e}_{1},\vec{e}_{2},\dots,\vec{e}_{n}\}$ is orthonormal if

$$\langle\vec{e}_{i},\vec{e}_{j}\rangle = \delta_{i,j}$$
for $i,j\in\{1,2,\dots,n\}$. This way, you can combine the two above-mentioned definitions into one.
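This compact condition translates directly into code. Below is a small sketch under the standard dot product; the helper name is_orthonormal is hypothetical:

```python
import numpy as np

def is_orthonormal(vectors, tol=1e-9):
    """Check <e_i, e_j> == delta_{i,j} for all pairs (standard dot product)."""
    n = len(vectors)
    for i in range(n):
        for j in range(n):
            delta = 1.0 if i == j else 0.0
            if abs(vectors[i] @ vectors[j] - delta) > tol:
                return False
    return True

print(is_orthonormal([np.array([1.0, 0.0]), np.array([0.0, 1.0])]))  # True
print(is_orthonormal([np.array([1.0, 1.0]), np.array([0.0, 1.0])]))  # False
```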

You will learn more about why orthonormal bases are so great later. But first, let's state that such a basis can be constructed for any finite-dimensional Euclidean space. The process of this construction is called the Gram-Schmidt process, and it goes like this:

Let’s start with $V$ (with inner product $\langle\cdot,\cdot\rangle$ and $\dim(V) = n$) and its arbitrary basis $\{\vec{v}_{1},\vec{v}_{2},\dots,\vec{v}_{n}\}$. Introduce a new basis $\{\vec{w}_{1},\vec{w}_{2},\dots,\vec{w}_{n}\}$ in the following manner.

  1. Vector $\vec{w}_{1}$ is equal to $\vec{v}_{1}$.
  2. Each following vector $\vec{w}_{k}$ (for $1<k\le n$) is defined as $\vec{v}_{k}$ from which you ‘removed’ all the projections onto the previously constructed vectors $\vec{w}_{1}$, $\vec{w}_{2}$, …, $\vec{w}_{k-1}$:
$$\vec{w}_{k} = \vec{v}_{k} - \mathbf{proj}_{\vec{w}_{1}}(\vec{v}_{k}) - \mathbf{proj}_{\vec{w}_{2}}(\vec{v}_{k}) - \dots - \mathbf{proj}_{\vec{w}_{k-1}}(\vec{v}_{k})$$
These vectors turn out to be orthogonal to each other and form a basis of $V$.
  3. Finally, normalize the vectors $\vec{w}_{k}$:
$$\vec{e}_{k} = \frac{1}{\|\vec{w}_{k}\|}\cdot\vec{w}_{k}$$
The vectors $\vec{e}_{k}$ are still orthogonal, but now also have length $1$.

Therefore, $\{\vec{e}_{1},\vec{e}_{2},\dots,\vec{e}_{n}\}$ is an orthonormal basis of $V$.
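Here is a minimal sketch of the procedure in code, assuming vectors in $\mathbb{R}^{n}$ and the dot product as the inner product (any other inner product can be passed in through the inner parameter):

```python
import numpy as np

def gram_schmidt(vectors, inner=np.dot):
    """Turn a list of linearly independent vectors into an orthonormal basis."""
    orthonormal = []
    for v in vectors:
        w = v.astype(float)
        # Remove the projections of v onto all previously constructed vectors.
        for e in orthonormal:
            w = w - inner(e, v) / inner(e, e) * e
        orthonormal.append(w / np.sqrt(inner(w, w)))   # normalize
    return orthonormal
```

Note that this sketch projects onto the already normalized vectors $\vec{e}_{i}$ rather than onto the $\vec{w}_{i}$; the result is the same, since normalization changes only the length of a vector, not its direction.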

The proof of orthogonality of $\{\vec{w}_{1},\vec{w}_{2},\dots,\vec{w}_{n}\}$ is actually quite a tricky task. Think about it like this: by ‘removing’ from $\vec{v}_{k}$ all the components co-directed with $\vec{w}_{1}$, $\vec{w}_{2}$, …, $\vec{w}_{k-1}$, you end up with a vector that is perpendicular to all of them.

Here’s an example for better understanding. Consider three linearly independent vectors in the Euclidean space $\mathbb{R}^{3}$ with the dot product chosen as the inner product (meaning that $\langle\vec{v},\vec{w}\rangle = \vec{v}\cdot\vec{w}$):

$$\vec{v}_{1} = \begin{pmatrix}1&0&1\end{pmatrix}^{\mathsf{T}}\qquad \vec{v}_{2} = \begin{pmatrix}1&-2&0\end{pmatrix}^{\mathsf{T}}\qquad \vec{v}_{3} = \begin{pmatrix}1&-1&1\end{pmatrix}^{\mathsf{T}}$$
Applying the Gram-Schmidt process:

$$\vec{w}_{1} = \vec{v}_{1} = \begin{pmatrix}1 & 0 & 1\end{pmatrix}^{\mathsf{T}}, \qquad \vec{e}_{1} = \begin{pmatrix} \frac{1}{\sqrt{2}}& 0 & \frac{1}{\sqrt{2}}\end{pmatrix}^{\mathsf{T}}\\
\vec{w}_{2} = \vec{v}_{2} - \frac{\langle\vec{w}_{1},\vec{v}_{2}\rangle}{\langle\vec{w}_{1},\vec{w}_{1}\rangle}\cdot\vec{w}_{1} = \begin{pmatrix}\frac{1}{2} & -2 & -\frac{1}{2} \end{pmatrix}^{\mathsf{T}}, \qquad \vec{e}_{2} = \begin{pmatrix}\frac{\sqrt{2}}{6}&-\frac{2\sqrt{2}}{3}&-\frac{\sqrt{2}}{6}\end{pmatrix}^{\mathsf{T}}\\
\vec{w}_{3} = \vec{v}_{3} - \frac{\langle\vec{w}_{2},\vec{v}_{3}\rangle}{\langle\vec{w}_{2},\vec{w}_{2}\rangle}\cdot\vec{w}_{2} - \frac{\langle\vec{w}_{1},\vec{v}_{3}\rangle}{\langle\vec{w}_{1},\vec{w}_{1}\rangle}\cdot\vec{w}_{1} = \begin{pmatrix}-\frac{2}{9}& -\frac{1}{9}& \frac{2}{9}\end{pmatrix}^{\mathsf{T}},\qquad \vec{e}_{3} = \begin{pmatrix}-\frac{2}{3}&-\frac{1}{3}&\frac{2}{3}\end{pmatrix}^{\mathsf{T}}$$
By calculating the dot products, you can check that $\{\vec{e}_{1},\vec{e}_{2},\vec{e}_{3}\}$ is indeed an orthonormal basis of $\mathbb{R}^{3}$.
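The same computation can be reproduced with a few lines of NumPy (dot product as the inner product):

```python
import numpy as np

v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([1.0, -2.0, 0.0])
v3 = np.array([1.0, -1.0, 1.0])

w1 = v1
w2 = v2 - (w1 @ v2) / (w1 @ w1) * w1
w3 = v3 - (w2 @ v3) / (w2 @ w2) * w2 - (w1 @ v3) / (w1 @ w1) * w1

e1, e2, e3 = (w / np.linalg.norm(w) for w in (w1, w2, w3))

print(w2, w3)   # [ 0.5 -2.  -0.5]  [-0.2222... -0.1111...  0.2222...]
print(e3)       # [-0.6667 -0.3333  0.6667], i.e. (-2/3, -1/3, 2/3)
print(round(e1 @ e2, 10), round(e1 @ e3, 10), round(e2 @ e3, 10))  # 0.0 0.0 0.0
```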

Features of an orthonormal basis

The main feature of choosing an orthonormal basis in $V$ is that it effectively turns $V$ into $\mathbb{R}^{n}$ (here $n = \dim(V)$) and $\langle\cdot,\cdot\rangle$ into the standard dot product. Let's see how it works.

If $\{\vec{e}_{1},\vec{e}_{2},\dots,\vec{e}_{n}\}$ is an arbitrary basis of $V$, then you can express the inner product of vectors $\vec{a} = a_{1}\vec{e}_{1} + a_{2}\vec{e}_{2} + \dots + a_{n}\vec{e}_{n}$ and $\vec{b} = b_{1}\vec{e}_{1} + b_{2}\vec{e}_{2} + \dots + b_{n}\vec{e}_{n}$ using only the products of the form $\langle\vec{e}_{i},\vec{e}_{j}\rangle$ where $i\le j$:

$$\langle\vec{a},\vec{b}\rangle = \underbrace{a_{1}b_{1}\langle\vec{e}_{1},\vec{e}_{1}\rangle + a_{2}b_{2}\langle\vec{e}_{2},\vec{e}_{2}\rangle + \dots + a_{n}b_{n}\langle\vec{e}_{n},\vec{e}_{n}\rangle}_{i = j} + {}\\
{} + \underbrace{(a_{1}b_{2} + a_{2}b_{1})\langle\vec{e}_{1},\vec{e}_{2}\rangle + (a_{1}b_{3} + a_{3}b_{1})\langle\vec{e}_{1},\vec{e}_{3}\rangle + \dots + (a_{n-1}b_{n} + a_{n}b_{n-1})\langle\vec{e}_{n-1},\vec{e}_{n}\rangle}_{i<j}$$
This is one monstrous expression. The problem with it is that starting with $n = 4$ the number of terms with $i<j$ is greater than the number of terms with $i = j$, and it grows quadratically as $n$ increases. That means that computing inner products in an arbitrary basis is much harder than in $\mathbb{R}^{n}$ with the standard dot product, where the number of terms is just $n$.

But notice that if $\{\vec{e}_{1},\vec{e}_{2},\dots,\vec{e}_{n}\}$ is an orthonormal basis, this problem vanishes! Why? Because all the terms with $i < j$ in this enormous formula are equal to zero, and all the remaining inner products $\langle\vec{e}_{i},\vec{e}_{i}\rangle$ are equal to $1$! This results in a very elegant formula:

$$\langle\vec{a},\vec{b}\rangle = a_{1}b_{1} + a_{2}b_{2} + \dots + a_{n}b_{n}$$
Not only is this formula way simpler, it is also very familiar! It is just the dot product of the vectors $\begin{pmatrix}a_{1} & a_{2} & \dots & a_{n}\end{pmatrix}^{\mathsf{T}}\in\mathbb{R}^{n}$ and $\begin{pmatrix} b_{1} & b_{2} & \dots & b_{n} \end{pmatrix}^{\mathsf{T}}\in\mathbb{R}^{n}$. Take into account that writing down

$$\begin{pmatrix} a_{1}\\ a_{2}\\ \vdots\\ a_{n} \end{pmatrix} + \begin{pmatrix} b_{1}\\ b_{2}\\ \vdots\\ b_{n} \end{pmatrix} = \begin{pmatrix} a_{1} + b_{1}\\ a_{2} + b_{2}\\ \vdots\\ a_{n} + b_{n} \end{pmatrix}; \qquad \lambda \begin{pmatrix} a_{1}\\ a_{2}\\ \vdots\\ a_{n} \end{pmatrix} = \begin{pmatrix} \lambda a_{1}\\ \lambda a_{2}\\ \vdots\\ \lambda a_{n} \end{pmatrix}$$
is just another way of writing
$$(a_{1}\vec{e}_{1} + a_{2}\vec{e}_{2} + \dots + a_{n}\vec{e}_{n}) + (b_{1}\vec{e}_{1} + b_{2}\vec{e}_{2} + \dots + b_{n}\vec{e}_{n}) = (a_{1}+b_{1})\vec{e}_{1} + (a_{2} + b_{2})\vec{e}_{2} + \dots + (a_{n} + b_{n})\vec{e}_{n}$$
$$\lambda (a_{1}\vec{e}_{1} + a_{2}\vec{e}_{2} + \dots + a_{n}\vec{e}_{n}) = \lambda a_{1}\vec{e}_{1} + \lambda a_{2}\vec{e}_{2} + \dots + \lambda a_{n}\vec{e}_{n}$$
You can conclude that writing down vectors in an orthonormal basis of a Euclidean space $V$ is basically the same as working in $\mathbb{R}^{n}$ with the standard dot product!
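To see this numerically, the sketch below reuses the assumed Gram matrix $G$ and the orthonormal basis $\{\vec{e}_{1},\vec{e}_{2}\}$ from the first example, and checks that the ‘abstract’ inner product of two vectors equals the plain dot product of their coordinates in that basis:

```python
import numpy as np

# Inner product on V, given in the original basis {v1, v2} by a Gram matrix.
G = np.array([[9.0, 6.0],
              [6.0, 8.0]])
inner = lambda a, b: a @ G @ b

# The orthonormal basis built earlier, written in {v1, v2} coordinates.
e1 = np.array([1.0 / 3.0, 0.0])           # (1/3) v1
e2 = np.array([-1.0 / 3.0, 1.0 / 2.0])    # (1/2) v2 - (1/3) v1

# Two vectors given by their coordinates in the orthonormal basis {e1, e2}.
a_coords = np.array([2.0, -5.0])
b_coords = np.array([4.0, 1.0])
a = a_coords[0] * e1 + a_coords[1] * e2
b = b_coords[0] * e1 + b_coords[1] * e2

print(inner(a, b))           # 3.0 -- the "abstract" inner product
print(a_coords @ b_coords)   # 3.0 -- plain dot product of the coordinates
```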

There is one more way to express this property. Let $\vec{x}$ be a vector in a Euclidean space $(V,\langle\cdot,\cdot\rangle)$. Let $\{\vec{e}_{1},\dots,\vec{e}_{n}\}$ be an orthonormal basis of this space and $\vec{x} = x_{1}\vec{e}_{1} + \dots + x_{n}\vec{e}_{n}$. Now let’s calculate the following sum:

$$\sum_{i = 1}^{n}\langle\vec{x},\vec{e}_{i}\rangle\cdot\vec{e}_{i} = \langle\vec{x},\vec{e}_{1}\rangle\cdot\vec{e}_{1} + \dots + \langle\vec{x},\vec{e}_{n}\rangle\cdot\vec{e}_{n} = {}\\
{} = \langle x_{1}\vec{e}_{1} + \dots + x_{n}\vec{e}_{n},\vec{e}_{1}\rangle\cdot\vec{e}_{1} + \dots + \langle x_{1}\vec{e}_{1} + \dots + x_{n}\vec{e}_{n},\vec{e}_{n}\rangle\cdot\vec{e}_{n}$$
If you throw out all the inner products that are equal to $0$, you end up with

$$\langle\vec{x},\vec{e}_{1}\rangle\cdot\vec{e}_{1} + \dots + \langle\vec{x},\vec{e}_{n}\rangle\cdot\vec{e}_{n} = x_{1}\langle\vec{e}_{1},\vec{e}_{1}\rangle \cdot\vec{e}_{1} + \dots + x_{n}\langle\vec{e}_{n},\vec{e}_{n}\rangle\cdot\vec{e}_{n} = x_{1}\vec{e}_{1} + \dots + x_{n}\vec{e}_{n}$$
This literally means that
$$\sum_{i = 1}^{n}\langle\vec{x},\vec{e}_{i}\rangle\cdot\vec{e}_{i} = \langle\vec{x},\vec{e}_{1}\rangle\cdot\vec{e}_{1} + \dots + \langle\vec{x},\vec{e}_{n}\rangle\cdot\vec{e}_{n} = \vec{x}$$
Therefore, the coordinates $x_{i}$ are just the inner products $\langle\vec{x},\vec{e}_{i}\rangle$, that is, the projections of $\vec{x}$ onto $\vec{e}_{i}$.
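A short numeric illustration in $\mathbb{R}^{3}$ with the standard dot product (the particular orthonormal basis and vector below are arbitrary choices):

```python
import numpy as np

# An orthonormal basis of R^3 (a rotation of the standard one).
e1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
e2 = np.array([-1.0, 1.0, 0.0]) / np.sqrt(2)
e3 = np.array([0.0, 0.0, 1.0])

x = np.array([3.0, -2.0, 5.0])

# The coordinates of x in {e1, e2, e3} are just the inner products <x, e_i>.
coords = [x @ e for e in (e1, e2, e3)]
reconstructed = sum(c * e for c, e in zip(coords, (e1, e2, e3)))

print(coords)          # [0.7071..., -3.5355..., 5.0]
print(reconstructed)   # [ 3. -2.  5.] -- the original x
```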

Conclusion

Let $(V,\langle\cdot,\cdot\rangle)$ be a Euclidean space with $\dim(V) = n$.

  • A normalization of a vector $\vec{v}$ is the unit vector $\frac{1}{\|\vec{v}\|}\cdot \vec{v}$.
  • A basis $\{\vec{e}_{1},\vec{e}_{2},\dots,\vec{e}_{n}\}$ is orthogonal if $\langle\vec{e}_{i},\vec{e}_{j}\rangle = 0$ for all $i\ne j$.
  • A basis $\{\vec{e}_{1},\vec{e}_{2},\dots,\vec{e}_{n}\}$ is orthonormal if $\langle\vec{e}_{i},\vec{e}_{j}\rangle = \delta_{i,j}$.

  • If you write the vectors of $V$ in an orthonormal basis, then $V$ can be thought of as $\mathbb{R}^{n}$ and $\langle\cdot,\cdot\rangle$ as the dot product.

  • Knowing any basis of a Euclidean space, you can always obtain an orthonormal basis of this space by the Gram-Schmidt process.

  • For an orthonormal basis, the following identity holds: $\vec{x} = \langle\vec{x},\vec{e}_{1}\rangle\cdot\vec{e}_{1} + \dots + \langle\vec{x},\vec{e}_{n}\rangle\cdot\vec{e}_{n}$.
