
Vector derivatives


Because you've already worked with curves, you're ready to expand your understanding of limits and derivatives to vector functions. In this topic, you'll explore these concepts and discover they are incredibly simple to calculate.

Actually, you don't need to master new differentiation techniques; you just need to use what you already know. In this section, you'll also become familiar with new properties and start applying them immediately.

Limits and continuity

A vector function in $\mathbb{R}^n$ is a function from $\mathbb{R}$ into $\mathbb{R}^n$. To make things simpler, you'll be working with $n=3$, but keep in mind that the results extend to higher dimensions. The component functions of a vector function $\mathbf{c}$ are represented by:

$$\mathbf{c}(t) = \begin{bmatrix} u(t) \\ v(t) \\ w(t) \end{bmatrix} = (u(t), v(t), w(t))^T \qquad t \in \mathbb{R}$$

Now, let's apply the basic principles of calculus to vector functions. The limit of $\mathbf{c}$ at a time $a$ is very similar to your past understanding: it's the vector that $\mathbf{c}(t)$ gets closer to as $t$ approaches $a$, if it exists. In particular, $\mathbf{c}$ is said to be continuous at $a$ if $\lim_{t \rightarrow a} \mathbf{c}(t) = \mathbf{c}(a)$.

The rigorous definition (optional)

If you are familiar with the rigorous definition of the limit for real-valued functions, you might have noticed that the key concept is distance: $\mathbf{l}$ is the limit if the distance between it and $\mathbf{c}(t)$ can be made arbitrarily small by bringing $t$ close enough to $a$. Since you know how to calculate the distance between vectors, the definition should be straightforward:

$\mathbf{l}$ is the limit of $\mathbf{c}$ at $a$ if for any $\varepsilon > 0$ there is a $\delta > 0$ such that $0 < |a - t| < \delta$ ensures that $\lVert \mathbf{l} - \mathbf{c}(t) \rVert < \varepsilon$.

A critical feature of limits is that they can be calculated for each entry.

To be more specific, $\mathbf{c}$ has a limit at $a$ if and only if each of its components has a limit at $a$; in that case, the limit can be calculated entry-wise:

$$\lim_{t \rightarrow a} \mathbf{c}(t) = \begin{bmatrix} \lim_{t \rightarrow a} u(t) \\ \lim_{t \rightarrow a} v(t) \\ \lim_{t \rightarrow a} w(t) \end{bmatrix}$$

Hopefully, this comes as a relief: you don't have to learn new techniques to calculate the limits of vector functions. Let's take the example of a function given by:

$$\mathbf{c}(t) = \begin{bmatrix} 3t \\ t^2 \\ e^t \end{bmatrix}$$

Since the component functions are continuous, it turns out that $\mathbf{c}$ is also continuous. As an easy example, its limit at $0$ is:

$$\lim_{t \rightarrow 0} \mathbf{c}(t) = \begin{bmatrix} \lim_{t \rightarrow 0} 3t \\ \lim_{t \rightarrow 0} t^2 \\ \lim_{t \rightarrow 0} e^t \end{bmatrix} = \begin{bmatrix} 3(0) \\ 0^2 \\ e^0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \mathbf{c}(0)$$
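
If you want to check this on a computer, here's a minimal sketch that takes the limit of each component separately; the use of sympy is just an assumption for illustration, any CAS would do.

```python
# Entry-wise limits of c(t) = (3t, t^2, e^t) at t = 0, computed with sympy.
import sympy as sp

t = sp.symbols('t')
components = [3 * t, t**2, sp.exp(t)]        # u(t), v(t), w(t) from the example

# Take the limit of each component separately, exactly as the formula says.
limit_at_0 = [sp.limit(f, t, 0) for f in components]
print(limit_at_0)                            # [0, 0, 1] -> c(0), confirming continuity at 0
```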

Derivatives

Vector functions represent curves in space. Similar to real-valued functions, it's intuitive to try to understand the tangent line at any point on the curve. Luckily, everything works smoothly and the derivative's definition remains almost unaltered:

$$\mathbf{c}'(a) = \lim_{h \rightarrow 0} \frac{\mathbf{c}(a+h) - \mathbf{c}(a)}{h}$$

How can you interpret this geometrically? You aim to find the tangent line at $\mathbf{c}(a)$. At time $a+h$, the curve is at $\mathbf{c}(a+h)$. So, $\mathbf{c}(a+h) - \mathbf{c}(a)$ is a secant vector to the curve:

A secant vector

It seems that the scaled vector $\frac{\mathbf{c}(a+h) - \mathbf{c}(a)}{h}$ approaches the tangent vector as $h \rightarrow 0$. Consequently, when the limit exists, $\mathbf{c}'(a)$ is known as the tangent vector to the curve traced by $\mathbf{c}$ at $\mathbf{c}(a)$. In physics, the tangent vector is understood as the velocity vector and its length $\lVert \mathbf{c}'(a) \rVert$ as the speed. Similarly, you can interpret $\mathbf{c}''(a)$, the derivative of $\mathbf{c}'(a)$, as the acceleration at that time.

The tangent vector

The tangent line to the curve defined by $\mathbf{c}$ at $\mathbf{c}(a)$ is the line parallel to $\mathbf{c}'(a)$ that passes through $\mathbf{c}(a)$.
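
To see the difference quotient at work numerically, here's a small sketch (assuming numpy) that reuses the earlier example $\mathbf{c}(t) = (3t, t^2, e^t)^T$ and watches the quotient approach the hand-computed tangent vector at $a = 0$:

```python
# Difference quotient (c(a+h) - c(a)) / h approaching the tangent vector as h -> 0.
import numpy as np

def c(t):
    return np.array([3 * t, t**2, np.exp(t)])

a = 0.0
exact = np.array([3.0, 0.0, 1.0])            # c'(0) = (3, 2*0, e^0), computed by hand
for h in [1e-1, 1e-3, 1e-6]:
    quotient = (c(a + h) - c(a)) / h
    print(h, quotient, np.linalg.norm(quotient - exact))   # error shrinks with h
```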

That was pretty simple, right? But there's even more: since you can calculate the limits entry-wise, computing the derivative becomes straightforward:

$$\mathbf{c}'(a) = \lim_{h \rightarrow 0} \frac{\mathbf{c}(a+h) - \mathbf{c}(a)}{h} = \begin{bmatrix} \lim_{h \rightarrow 0} \frac{u(a+h) - u(a)}{h} \\ \lim_{h \rightarrow 0} \frac{v(a+h) - v(a)}{h} \\ \lim_{h \rightarrow 0} \frac{w(a+h) - w(a)}{h} \end{bmatrix} = \begin{bmatrix} u'(a) \\ v'(a) \\ w'(a) \end{bmatrix}$$

In simple terms, differentiate each component and you're done! Let's examine one of the most well-known curves, the helix. Its parametric equations are:

$$x = \cos t \qquad y = \sin t \qquad z = t \qquad t \in \mathbb{R}$$

Set aside the last coordinate for a moment. The first two are the parametric equations of the unit circle you are familiar with. Then, as time passes, these coordinates just rotate around this circle in a counterclockwise direction. Now let's return to the last coordinate. It's simple, as it's identical to the time $t$. Because this coordinate represents height in space, its parametric equation tells you that the curve gains height uniformly while rotating around the unit circle:

The helix

As the component functions of the helix are differentiable at any point, the derivative of the curve at any time is simply:

$$\mathbf{c}'(t) = \begin{bmatrix} (\cos t)' \\ (\sin t)' \\ (t)' \end{bmatrix} = \begin{bmatrix} -\sin t \\ \cos t \\ 1 \end{bmatrix}$$

So, at time $t = \pi/2$ the curve is at the point

$$\mathbf{c}(\pi/2) = \begin{bmatrix} \cos \pi/2 \\ \sin \pi/2 \\ \pi/2 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ \pi/2 \end{bmatrix}$$

and its tangent vector is

$$\mathbf{c}'(\pi/2) = \begin{bmatrix} -\sin \pi/2 \\ \cos \pi/2 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$$
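
As a quick sanity check, here's a small sketch (sympy is just an assumed choice) that differentiates the helix entry by entry and evaluates the tangent vector at $t = \pi/2$:

```python
# Entry-wise differentiation of the helix and evaluation at t = pi/2.
import sympy as sp

t = sp.symbols('t')
helix = sp.Matrix([sp.cos(t), sp.sin(t), t])

tangent = helix.diff(t)                      # (-sin t, cos t, 1), component by component
print(tangent.subs(t, sp.pi / 2))            # Matrix([[-1], [0], [1]])
```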

Properties

The vector derivative behaves much like the ordinary derivative, and it inherits all the desirable properties that allow you to manipulate it with ease.

Take two vector functions $\mathbf{c}$ and $\mathbf{d}$, a real-valued function $f$, and a real number $a$. Then, for any $t$, the following basic properties hold:

  • $\left( \mathbf{c} + \mathbf{d} \right)'(t) = \mathbf{c}'(t) + \mathbf{d}'(t)$

  • $\left( a \mathbf{c}\right)'(t) = a \ \mathbf{c}'(t)$

  • $\left( f\, \mathbf{c} \right)'(t) = f'(t) \mathbf{c}(t) + f(t) \mathbf{c}'(t)$

  • $\left( \mathbf{c} \cdot \mathbf{d} \right)'(t) = \mathbf{c}'(t) \cdot \mathbf{d}(t) + \mathbf{c}(t) \cdot \mathbf{d}'(t)$

  • $\left( \mathbf{c} \circ f \right)'(t) = f'(t) \ \mathbf{c}'( f(t))$

Of course, the last property is the chain rule. As an example of the fourth property, consider the following functions:

$$\mathbf{c}(t) = \begin{bmatrix} \sin t \\ \cos t \\ t \end{bmatrix} \qquad \mathbf{d}(t) = \begin{bmatrix} t \\ \cos t \\ \sin t \end{bmatrix}$$

Since their derivatives are

$$\mathbf{c}'(t) = \begin{bmatrix} \cos t \\ -\sin t \\ 1 \end{bmatrix} \qquad \mathbf{d}'(t) = \begin{bmatrix} 1 \\ -\sin t \\ \cos t \end{bmatrix}$$

you get that:

$$\begin{aligned} \left( \mathbf{c} \cdot \mathbf{d} \right)'(t) &= \mathbf{c}'(t) \cdot \mathbf{d}(t) + \mathbf{c}(t) \cdot \mathbf{d}'(t) \\ &= t \cos t - \sin t \cos t + \sin t + \sin t - \sin t \cos t + t \cos t \\ &= 2\left( \sin t + t \cos t - \sin t \cos t \right) \end{aligned}$$
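
If you'd like to double-check the dot-product rule, here's a short symbolic sketch (sympy is an assumed choice) that compares the rule against differentiating $\mathbf{c}(t) \cdot \mathbf{d}(t)$ directly:

```python
# Verify the dot-product rule for the two curves from the example.
import sympy as sp

t = sp.symbols('t')
c = sp.Matrix([sp.sin(t), sp.cos(t), t])
d = sp.Matrix([t, sp.cos(t), sp.sin(t)])

direct = sp.diff(c.dot(d), t)                      # differentiate the scalar c . d directly
via_rule = c.diff(t).dot(d) + c.dot(d.diff(t))     # c' . d + c . d'
print(sp.simplify(direct - via_rule))              # 0, so both sides agree
```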

A curious fact about vector functions is that when a curve lies on a sphere centered at the origin, its derivative is perpendicular to its position vector. In other words, if $\lVert \mathbf{c}(t) \rVert$ is constant for every $t$, then $\mathbf{c}(t) \cdot \mathbf{c}'(t) = 0$. Let's discover why using the properties of the derivative!

A curve with constant length

First, note that since $\lVert \mathbf{c}(t) \rVert$ is constant, $\lVert \mathbf{c}(t) \rVert^2$ is also constant at every time. But $\lVert \mathbf{c}(t) \rVert^2 = \mathbf{c}(t) \cdot \mathbf{c}(t)$, so the real-valued function $\mathbf{c} \cdot \mathbf{c}$ must be constant, and this implies that its derivative is $0$:

$$(\mathbf{c} \cdot \mathbf{c})'(t) = 0$$

On the other hand, the fourth property says that:

$$(\mathbf{c} \cdot \mathbf{c})'(t) = \mathbf{c}'(t) \cdot \mathbf{c}(t) \ + \ \mathbf{c}(t) \cdot \mathbf{c}'(t) = 2 \, \mathbf{c}(t) \cdot \mathbf{c}'(t)$$

Connecting both parts you get that:

$$0 = 2 \ \mathbf{c}(t) \cdot \mathbf{c}'(t)$$

And finally, $\mathbf{c}(t) \cdot \mathbf{c}'(t) = 0$.
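
Here's a short symbolic check of this fact; the spherical curve below is not from the lesson, just a convenient example with constant norm, and sympy is again an assumed choice:

```python
# A curve on the unit sphere: its position and velocity vectors are perpendicular.
import sympy as sp

t = sp.symbols('t', real=True)
c = sp.Matrix([sp.cos(t) * sp.cos(2*t), sp.sin(t) * sp.cos(2*t), sp.sin(2*t)])

print(sp.simplify(c.dot(c)))                 # should print 1: the norm is constant
print(sp.simplify(c.dot(c.diff(t))))         # should print 0: c(t) is perpendicular to c'(t)
```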

Applications in machine learning

Vector derivatives may look a bit abstract at first. Apart from geometry and physics, they don't seem to have other applications, but don't judge a book by its cover. When you look at derivatives of curves, that is, of functions from time into space, in machine learning, you're often dealing with how certain quantities evolve over time and how fast they change. Here are some nice applications in machine learning that relate to the derivative of such curves:

  • Time-Series Analysis: In machine learning, time-series data is a sequence of data points indexed in time order. Derivatives can be used to analyze the rate of change of features over time, which can be crucial for forecasting and for understanding trends and seasonality. For example, the first derivative can tell you about the trend, while the second derivative can inform you about the concavity of the time-series curve, indicating acceleration or deceleration in the trend (see the sketch after this list).

  • Trajectory Optimization: In robotics and autonomous systems, machine learning models often need to predict and optimize trajectories over time. The derivative of a trajectory with respect to time gives velocity, and the second derivative gives acceleration. These derivatives are essential for planning and control algorithms that ensure smooth and efficient movements of robots or autonomous vehicles.

  • Signal Processing: Derivatives of signals with respect to time, such as in audio or other sensor data, can be used to extract features that are fed into machine learning models for classification, clustering, or anomaly detection tasks. For example, the derivative of an audio waveform can be used to identify the onset of sounds, which is useful in speech recognition and music analysis.

  • Recurrent Neural Networks (RNNs): RNNs are a class of neural networks that are explicitly designed to handle sequence data. When training RNNs, the derivatives of the loss function with respect to the network parameters are computed through time, using a technique called Backpropagation Through Time (BPTT). This involves unrolling the network over the sequence and propagating the derivative of the loss backwards through every time step.
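
As a rough illustration of the time-series point above, here's a sketch (assuming numpy; the signal is synthetic and purely illustrative) that extracts first- and second-derivative features with finite differences:

```python
# Numerical first and second derivatives of a sampled signal as trend and
# concavity features.
import numpy as np

t = np.linspace(0, 10, 200)
series = 0.5 * t + np.sin(t)                 # synthetic "time series"

first = np.gradient(series, t)               # approximate trend (rate of change)
second = np.gradient(first, t)               # approximate concavity (acceleration)
print(first[:3], second[:3])
```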

Conclusion

Take two vector functions $\mathbf{c}$ and $\mathbf{d}$, a real-valued function $f$, and a number $a$.

  • The limit of a vector function can be calculated entry-wise: $\lim_{t \rightarrow a} \mathbf{c}(t) = \begin{bmatrix} \lim_{t \rightarrow a} u(t) \\ \lim_{t \rightarrow a} v(t) \\ \lim_{t \rightarrow a} w(t) \end{bmatrix}$

  • The derivative of $\mathbf{c}$ at $a$ is defined as: $\mathbf{c}'(a) = \lim_{h \rightarrow 0} \frac{\mathbf{c}(a+h) - \mathbf{c}(a)}{h}$

  • The derivative is also computed entry-wise:

    $\mathbf{c}'(a) = \begin{bmatrix} u'(a) \\ v'(a) \\ w'(a) \end{bmatrix}$

  • $\left( \mathbf{c} \cdot \mathbf{d} \right)'(t) = \mathbf{c}'(t) \cdot \mathbf{d}(t) + \mathbf{c}(t) \cdot \mathbf{d}'(t)$

  • $\left( \mathbf{c} \circ f \right)'(t) = f'(t) \ \mathbf{c}'( f(t))$
