
You already know how to add vectors and multiply them by numbers in $\mathbb{R}^{n}$. However, we haven't yet discussed whether it is possible to multiply vectors by other vectors in $\mathbb{R}^{n}$. Here you're going to learn one of the ways to construct such a new algebraic operation. It is going to provide you with new advanced geometric tools and shed new light on the concepts of angle and orthogonality. Some may argue that this sounds overly abstract and disconnected from real-life applications, but you will see that the opposite is true.

Motivation

Let's look at a plane, which is a classic representation of the two-dimensional vector space $\mathbb{R}^{2}$. A basis $\{\vec{e}_{1},\vec{e}_{2}\}$ of $\mathbb{R}^{2}$ is typically chosen in such a way that $\vec{e}_{1}$ and $\vec{e}_{2}$ are perpendicular to each other, and the length of each of them is $1$ (see the picture below). Here, a vector $\vec{x} = \begin{pmatrix} x_{1} & x_{2}\end{pmatrix}^{\mathsf{T}}$ is identified with $x_{1}\vec{e}_{1}+x_{2}\vec{e}_{2}$.

Basis vectors and other vectors

Now let's look at another pair of perpendicular vectors $\vec{a}_{1} = \begin{pmatrix} 2 & 4 \end{pmatrix}^{\mathsf{T}}$ and $\vec{a}_{2} = \begin{pmatrix} -6 & 3 \end{pmatrix}^{\mathsf{T}}$. We will perform a non-obvious procedure: multiply the first coordinates of both vectors and add the result to the product of the second coordinates, like this: $2\cdot(-6) + 4\cdot 3 = -12 + 12 = 0$. Notice that the sum is zero. If you repeat this with another pair of perpendicular vectors, for example $\vec{b}_{1} = \begin{pmatrix} -3 & -2 \end{pmatrix}^{\mathsf{T}}$ and $\vec{b}_{2} = \begin{pmatrix} 2 & -3 \end{pmatrix}^{\mathsf{T}}$, you again obtain $(-3)\cdot 2+(-2)\cdot(-3)=0$. On the contrary, if you take the non-perpendicular vectors $\vec{a}_{1}$ and $\vec{b}_{1}$, you get $2\cdot(-3) + 4\cdot(-2) = -14 \ne 0$. It seems we have stumbled upon a criterion for perpendicularity of vectors! But how are these strange calculations related to the mutual arrangement of the vectors? Let's figure it out!
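As a quick sanity check, the calculations above can be reproduced in a few lines (a minimal Python sketch; the vectors are exactly the ones from the text):

```python
# Pairs of vectors from the examples above, stored as coordinate tuples.
a1, a2 = (2, 4), (-6, 3)      # a perpendicular pair
b1, b2 = (-3, -2), (2, -3)    # another perpendicular pair

# Sum of pairwise products of coordinates for each pair.
print(a1[0] * a2[0] + a1[1] * a2[1])  # 0 for the perpendicular pair
print(b1[0] * b2[0] + b1[1] * b2[1])  # 0 again
print(a1[0] * b1[0] + a1[1] * b1[1])  # -14 for the non-perpendicular pair
```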

Definition and properties

Let's start with the formal definition. Define the dot product of two arbitrary columns (vectors) $\vec{x} = \begin{pmatrix}x_{1} & x_{2} & \dots & x_{n}\end{pmatrix}^{\mathsf{T}}$ and $\vec{y} = \begin{pmatrix}y_{1} & y_{2} & \dots & y_{n}\end{pmatrix}^{\mathsf{T}}$ from $\mathbb{R}^{n}$ as

$$\vec{x}\cdot\vec{y}=\sum_{k = 1}^{n}x_{k}y_{k}=x_{1}y_{1} + x_{2}y_{2} + \dots + x_{k}y_{k} + \dots + x_{n}y_{n}$$

Thereby, given a pair of columns, the dot product returns a real number (a scalar), which is the sum of their pairwise multiplied coordinates. This is why it is sometimes referred to as a scalar product. Despite not producing an element of the same nature as the elements in the input, it has a lot of similarities with the multiplication of numbers. For example, the dot product is

  • Commutative: $\vec{x}\cdot\vec{y} = \vec{y}\cdot\vec{x}$ (this is obvious, since $x_{k}y_{k} = y_{k}x_{k}$);
  • Distributive: $\vec{x}\cdot(\vec{y}+\vec{z})=\vec{x}\cdot\vec{y} + \vec{x}\cdot\vec{z}$ (as $x_{k}(y_{k}+z_{k})=x_{k}y_{k}+x_{k}z_{k}$).
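The definition and the two properties above are straightforward to check numerically. Here is a minimal Python sketch; `dot`, `add`, and the sample vectors are illustrative helpers, not a library API:

```python
def dot(x, y):
    """Dot product: the sum of pairwise products of coordinates."""
    assert len(x) == len(y), "vectors must come from the same R^n"
    return sum(xk * yk for xk, yk in zip(x, y))

def add(x, y):
    """Coordinate-wise vector addition."""
    return [xk + yk for xk, yk in zip(x, y)]

x, y, z = [1, 2, 3], [4, -5, 6], [0, 7, -1]

print(dot(x, y) == dot(y, x))                      # commutativity
print(dot(x, add(y, z)) == dot(x, y) + dot(x, z))  # distributivity
```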

However, there are, of course, some differences. For example, the notion of associativity doesn't apply to the dot product. The expression $(\vec{v}_{1}\cdot\vec{v}_{2})\cdot\vec{v}_{3}$ is not a dot product of three vectors; it is a product of the scalar $\vec{v}_{1}\cdot\vec{v}_{2}$ and the vector $\vec{v}_{3}$, which is a completely different operation.

The key idea of a vector space is that vectors interact with vectors through addition and with scalars through multiplication. The bilinear property describes the most general way of combining the dot product with these basic $\mathbb{R}^{n}$ operations:

  • $\left(\alpha_{1}\vec{x}_{1} + \alpha_{2}\vec{x}_{2}\right)\cdot \left(\beta_{1}\vec{y}_{1} + \beta_{2}\vec{y}_{2}\right) = \alpha_{1}\beta_{1}\,\vec{x}_{1}\cdot\vec{y}_{1} + \alpha_{1}\beta_{2}\,\vec{x}_{1}\cdot\vec{y}_{2} + \alpha_{2}\beta_{1}\,\vec{x}_{2}\cdot\vec{y}_{1} + \alpha_{2}\beta_{2}\,\vec{x}_{2}\cdot\vec{y}_{2}$

In particular, $(\alpha\vec{x})\cdot(\beta\vec{y}) = \alpha\beta\,\vec{x}\cdot\vec{y}$ (here Greek letters denote scalars). The bilinear property follows directly from the definition of the dot product.

Bilinearity is called this way because both factors of the dot product behave linearly with respect to addition and multiplication by numbers. It makes calculations with vectors almost identical to analogous calculations with numbers. Moreover, sometimes it allows you to carry out calculations without any information about the columns themselves. For example, if $\vec{x}\cdot\vec{x} = 4$ and $\vec{x}\cdot\vec{y} = -1$, then

$$3\vec{x}\cdot\left(\frac{1}{4}\vec{x} - 2\vec{y}\right)=\frac{3}{4}\vec{x}\cdot\vec{x} - 6\,\vec{x}\cdot\vec{y}=\frac{3}{4}\cdot 4 - 6\cdot(-1)=3+6=9$$
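The worked example above can be checked with concrete columns. Any vectors satisfying $\vec{x}\cdot\vec{x}=4$ and $\vec{x}\cdot\vec{y}=-1$ will do; the pair below is one such choice, picked purely for illustration:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scale(c, v):
    return [c * a for a in v]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

# One concrete pair with x·x = 4 and x·y = -1 (chosen for illustration).
x = [2.0, 0.0]
y = [-0.5, 3.0]

assert dot(x, x) == 4 and dot(x, y) == -1
# 3x · (x/4 - 2y) equals 9, exactly as computed by bilinearity.
print(dot(scale(3, x), sub(scale(0.25, x), scale(2, y))))  # 9.0
```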

Be careful when calculating: despite all the similarities between the dot product and the usual product of numbers, they are still very different operations in nature. For example, as mentioned above, when dot products appear repeatedly in one expression, it is important to understand in what order they are calculated, since associativity is meaningless for them in the usual sense.

Geometric interpretation (plane)

We have described many algebraic properties of the dot product, showing that one can multiply vectors much like numbers or standard expressions with variables. When we motivated the introduction of the dot product at the beginning of this topic, we concentrated on its geometric meaning. Let's find the connection between the dot product and geometry. This match of algebraic and geometric properties will allow us to combine these two dominant areas of mathematics, giving us the ability to translate geometric concepts into algebraic language and back again.

First, let's go back to the plane example.

Angle between vectors

Consider two arbitrary vectors $\vec{a} = \begin{pmatrix} a_{1} & a_{2} \end{pmatrix}^{\mathsf{T}}$ and $\vec{b} = \begin{pmatrix} b_{1} & b_{2}\end{pmatrix}^{\mathsf{T}}$ (see the picture). First of all, note that according to the Pythagorean theorem, the lengths of $\vec{a}$ and $\vec{b}$ (as a matter of fact, of any vector on a plane) can be calculated using the dot product:

$$\|\vec{a}\| = \sqrt{a_{1}^{2} + a_{2}^{2}} = \sqrt{\vec{a}\cdot\vec{a}}, \quad \|\vec{b}\| = \sqrt{b_{1}^{2} + b_{2}^{2}} = \sqrt{\vec{b}\cdot\vec{b}}$$

Now, let $\varphi$ be the angle between $\vec{a}$ and $\vec{b}$. Thus, if $\alpha$ is the angle between $\vec{a}$ and $\vec{e}_{1}$ and $\beta$ is the angle between $\vec{b}$ and $\vec{e}_{1}$, then $\varphi = \beta - \alpha$. By definition,

$$\cos(\alpha) = \frac{a_{1}}{\|\vec{a}\|}, \quad \sin(\alpha) = \frac{a_{2}}{\|\vec{a}\|},\quad \cos(\beta) = \frac{b_{1}}{\|\vec{b}\|},\quad \sin(\beta) = \frac{b_{2}}{\|\vec{b}\|}$$

As

$$\cos(\beta - \alpha) = \cos(\beta)\cos(\alpha) + \sin(\beta)\sin(\alpha)$$

we obtain

$$\cos(\varphi) = \cos(\beta - \alpha) = \frac{b_{1}}{\|\vec{b}\|}\cdot\frac{a_{1}}{\|\vec{a}\|} + \frac{b_{2}}{\|\vec{b}\|}\cdot\frac{a_{2}}{\|\vec{a}\|} = \frac{a_{1}b_{1} + a_{2}b_{2}}{\|\vec{a}\|\cdot\|\vec{b}\|}$$

By definition $a_{1}b_{1} + a_{2}b_{2} = \vec{a}\cdot\vec{b}$, which implies

$$\cos(\varphi) = \frac{\vec{a}\cdot\vec{b}}{\|\vec{a}\|\cdot\|\vec{b}\|} \Longrightarrow \boxed{\vec{a}\cdot\vec{b} = \|\vec{a}\|\cdot\|\vec{b}\|\cdot\cos(\varphi)}$$

This is the most important junction between the different points of view on the dot product. Here you saw that the plain calculation of the sum of pairwise products of the vectors' coordinates gives the same result as the product of the vectors' lengths with the cosine of the angle between them. From here we obtain our initial claim.

Two vectors are perpendicular if, and only if, their dot product is equal to zero.

This follows from the fact that the cosine of a right angle is zero.
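The boxed formula lets us recover the angle itself from coordinates. Here is a Python sketch that does so for the perpendicular pair from the motivation section; `dot` and `norm` are helpers defined here, not library calls:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

a, b = (2, 4), (-6, 3)  # the perpendicular pair from the motivation
cos_phi = dot(a, b) / (norm(a) * norm(b))
phi = math.degrees(math.acos(cos_phi))
print(phi)  # 90.0 — a right angle, as the criterion predicts
```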

Multidimensional generalization

The idea that the dot product is connected with lengths of vectors and angles between them can be used to generalize the $\mathbb{R}^{2}$ case. For example, in $\mathbb{R}^{3}$ it gives the well-known generalization of the Pythagorean theorem. In fact, in the general case, this dot product intuition is the simplest way to define lengths and angles in $\mathbb{R}^{n}$ for any natural $n$. For vectors $\vec{x} = \begin{pmatrix}x_{1}& x_{2} &\dots &x_{n}\end{pmatrix}^{\mathsf{T}}$ and $\vec{y} = \begin{pmatrix}y_{1}& y_{2} &\dots &y_{n}\end{pmatrix}^{\mathsf{T}}$ we define

  • the length of $\vec{x}$ as
    $$\|\vec{x}\| = \sqrt{\vec{x}\cdot\vec{x}} = \sqrt{x_{1}^{2} + x_{2}^{2} + \dots + x_{n}^{2}}$$
  • the cosine of the angle between $\vec{x}$ and $\vec{y}$ as
    $$\frac{\vec{x}\cdot\vec{y}}{\|\vec{x}\|\cdot\|\vec{y}\|} = \frac{x_{1}y_{1} + x_{2}y_{2}+\dots+x_{n}y_{n}}{\sqrt{x_{1}^{2}+x_{2}^{2} + \dots + x_{n}^{2}}\cdot\sqrt{y_{1}^{2} + y_{2}^{2} + \dots+y_{n}^{2}}}$$

There is an ambiguity when it comes to calculating the angle itself, as the cosine determines the angle only up to sign and a number of $360^{\circ}$ rotations; to resolve this issue, the angle is usually defined simply as

$$\arccos\left(\frac{\vec{x}\cdot\vec{y}}{\|\vec{x}\|\cdot\|\vec{y}\|}\right)$$

The dot product of a vector $\vec{x}$ with itself is often called the square of $\vec{x}$, and it is usually denoted $\vec{x}^{2}$ or $x^{2}$. Therefore, $\|\vec{x}\|^{2} = \vec{x}^{2}$.

We say that vectors $\vec{x}$ and $\vec{y}$ are orthogonal (or perpendicular) if $\vec{x}\cdot\vec{y} = 0$.

This way of defining lengths and angles is very universal. Let us look at an example. $\mathbb{R}^{5}$ is a five-dimensional vector space. Unfortunately, our perception is arranged in such a way that we can't imagine it; however, now you can calculate the angle between any two five-dimensional vectors without having an actual picture. The length of the vector $\vec{x} = \begin{pmatrix}1&1&-1&0&1\end{pmatrix}^{\mathsf{T}}$ is

$$\|\vec{x}\| = \sqrt{1 + 1 + 1 + 0 + 1} = 2$$

The length of $\vec{y} = \begin{pmatrix}1&0&0&0&1\end{pmatrix}^{\mathsf{T}}$ is

$$\|\vec{y}\| = \sqrt{1 + 0 + 0 + 0 + 1} = \sqrt{2}$$

The angle between $\vec{x}$ and $\vec{y}$ is

$$\arccos\left(\frac{\vec{x}\cdot\vec{y}}{\|\vec{x}\|\cdot\|\vec{y}\|}\right) = \arccos\left(\frac{1 + 0 + 0 + 0 + 1}{2\cdot\sqrt{2}}\right) = \arccos\left(\frac{1}{\sqrt{2}}\right) = 45^{\circ}$$

Note that to find this purely geometric characteristic of the vectors' disposition we didn't draw any pictures; we just did some simple algebraic calculations.
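The five-dimensional computation above can be replayed verbatim in code (a minimal Python sketch reusing the example's vectors; `dot` and `norm` are the same illustrative helpers as before):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

x = [1, 1, -1, 0, 1]
y = [1, 0, 0, 0, 1]

print(norm(x))                  # 2.0
print(norm(y) == math.sqrt(2))  # True
angle = math.degrees(math.acos(dot(x, y) / (norm(x) * norm(y))))
print(round(angle, 9))          # 45.0
```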

Lastly, multidimensionality also works in the other direction: if we take $n = 1$, then for $\mathbb{R}^{1} = \mathbb{R}$ the dot product is just the usual multiplication!

A few words on applications

Here you have the right to say that, of course, it's wonderful that you can compute some geometric things in a high-dimensional space, but isn't it just a fantasy applicable only to the fiction of Nolan films or idle reflections? Of course, one can use our developments for those purposes too. However, the dot product has many more mundane applications. It's used ubiquitously in science, especially in modern physics, not only because according to some theories spacetime is 26- or 10-dimensional, but because physicists usually work not with physical space itself, but with the so-called phase space of the equations arising in a problem, which can have any dimension, yet in which it is vital to be able to do geometric calculations. It is also indispensable in game development for describing rotations, movements, and light intensities.

In machine learning and data analysis, as you can guess, data is usually described using lists and arrays, which naturally equips it with the structure of a vector space. The whole idea of data analysis is to describe geometric patterns in these spaces in order to make predictions and classify data; without the ability to find lengths and directions of vectors this would be impossible, which is why you will inevitably be calculating dot products.

The philosophy worth taking away from here is that the operation we have constructed, and its generalizations, allow us to endow some not necessarily geometric objects with a geometric structure and to calculate distances and angles for them with the same immediacy with which we did it on squared paper in elementary school.

Conclusion

  • The dot product is an operation that matches a real number to any pair of vectors of $\mathbb{R}^{n}$.
  • The dot product of $\vec{x} = (x_{1},x_{2},\dots,x_{n})^{\mathsf{T}}$ and $\vec{y} = (y_{1},y_{2},\dots, y_{n})^{\mathsf{T}}$ is defined as
    $$\vec{x}\cdot\vec{y} = x_{1}y_{1} + x_{2}y_{2} + \dots + x_{n}y_{n}$$
  • The dot product allows one to define the lengths of vectors and the angles between them in a vector space. Namely, $\left\|\vec{v}\right\| = \sqrt{\vec{v}\cdot\vec{v}}$, and $\arccos\left(\frac{\vec{v}\cdot\vec{w}}{\left\|\vec{v}\right\|\cdot\left\|\vec{w}\right\|}\right)$ is the value of the angle between $\vec{v}$ and $\vec{w}$.
  • For any two vectors $\vec{v}$ and $\vec{w}$ such that the angle between them is $\varphi$,
    $$\vec{v}\cdot\vec{w} = \|\vec{v}\|\cdot\|\vec{w}\|\cdot\cos(\varphi)$$
  • Two vectors $\vec{v}_{1}$ and $\vec{v}_{2}$ such that $\vec{v}_{1}\cdot\vec{v}_{2} = 0$ are called orthogonal or perpendicular.