In one of the previous topics, we talked about approximants (functions that best describe the dependency in a dataset given as pairs of points) and their all-round usage. This time we'll discuss a method for finding the best coefficients of the chosen function given the data. After finishing this lesson, you will be well familiar with this interesting and quite useful method.
Creating a model for the least squares method
As we've discussed before, sometimes we would like to find out the relation within a set of points $(x_i, y_i)$ by picking the type of the dependency, that is, a class of functions, and finding the best function from that class for our data. Let us recall that the approximation problem consists in finding dependencies in the data in question and constructing functions that can help us predict the behavior of certain processes. Suppose we have the following data:
As we have already mentioned, in order to choose the class of functions where you might find the approximant, it is reasonable to visualize the data. Here is how our data looks on the plane:
Let's pick the quadratic function as the approximant. As you may have already guessed, we will use the cost function, the sum of squares of the deviations of the approximant's values $f(x_i)$ from the data values $y_i$, as the evaluation score for the approximant. This is why the method is called the least squares method: we are trying to find a function that results in the minimal sum, and every term of that sum is a squared difference, so we are looking for a function that gives us 'the least squares'.
If you have guessed that the lower the value of the cost function is, the better we approximate the data, you’re right.
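As a minimal sketch of this idea, here is how such a cost could be computed in Python (the data points and the candidate quadratic below are hypothetical, chosen only to make the snippet runnable):

```python
# Hypothetical data points (x_i, y_i), not the ones from this topic.
points = [(0.0, 1.2), (1.0, 2.8), (2.0, 9.5), (3.0, 19.1)]

def f(x):
    # Some candidate quadratic approximant; not necessarily the optimal one.
    return 2 * x**2 + 1

# The cost: the sum of squared deviations of f(x_i) from y_i.
S = sum((f(x) - y) ** 2 for x, y in points)
print(S)
```

The least squares method amounts to searching, over all functions in the chosen class, for the one with the smallest value of this sum.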
How to find the minimum of the cost function
In general, the cost function takes the following form: $S(a_1, \dots, a_k) = \sum_{i=1}^{n} \left(f(x_i; a_1, \dots, a_k) - y_i\right)^2$, where $a_1, \dots, a_k$ are the coefficients of the approximant. In order to find its minimum, we are going to use the theory of functions of multiple variables, in particular the interior extremum theorem. It means that in order to find our coefficients, we are going to set the partial derivatives of the cost function to zero: $\frac{\partial S}{\partial a_j} = 0$ for $j = 1, \dots, k$.
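To make this concrete, here is the system written out for the quadratic approximant $f(x) = ax^2 + bx + c$ chosen above (a worked instance added here for illustration):

$$\begin{cases}
\dfrac{\partial S}{\partial a} = 2\sum_{i=1}^{n}\left(a x_i^2 + b x_i + c - y_i\right)x_i^2 = 0 \\
\dfrac{\partial S}{\partial b} = 2\sum_{i=1}^{n}\left(a x_i^2 + b x_i + c - y_i\right)x_i = 0 \\
\dfrac{\partial S}{\partial c} = 2\sum_{i=1}^{n}\left(a x_i^2 + b x_i + c - y_i\right) = 0
\end{cases}$$

Note that this system is linear in $a$, $b$, and $c$, so it can be solved with any standard method for linear systems.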
After solving the system, we will find the coefficients. If you analyze the system, it becomes clear that you shouldn't choose a class of functions with more coefficients than the number of data pairs. If you did, the resulting system would have an infinite number of solutions, which means that infinitely many functions would pass through our dataset.
To give you an example: say we have only two points that we want to approximate. The simplest solution would be to 'connect' them, that is, draw the line that does just that. But if we were to look for a quadratic function, we could construct various quadratic functions passing through these two points (for instance, we could place the parabola's vertex to the left of both points, to the right of them, or between them; that's already three different options), as the sketch below shows.
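Here is a quick check of that claim, with two hypothetical points and two different parabolas passing exactly through both:

```python
def f1(x):
    # A parabola with its vertex at x = 0.
    return x**2 + 1

def f2(x):
    # A different parabola, with its vertex at x = 2.
    return -x**2 + 4 * x + 1

# Both pass exactly through the hypothetical points (0, 1) and (2, 5).
for f in (f1, f2):
    assert f(0) == 1 and f(2) == 5
```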
The loss of accuracy in this case is mainly related to the ability of our function to distort heavily between the points and outside the data range. Therefore, we need to remember our initial task: learning to make predictions based on the given data, not building a function that passes through all the given points! The latter could make us lose the dependency, even though the cost function would equal zero. In addition, it would increase the size of the equation system, which would mean more useless calculations.
This way we can find the best coefficients for the chosen class of functions and the given data.
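Here is a minimal Python sketch of the whole procedure for the quadratic case (the dataset is hypothetical): nulling the partial derivatives leads to the normal equations, a linear system we can solve directly:

```python
import numpy as np

# Hypothetical data points (x_i, y_i).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 9.2, 18.8, 33.1])

# Design matrix for the quadratic f(x) = a*x^2 + b*x + c.
A = np.column_stack([x**2, x, np.ones_like(x)])

# Setting the partial derivatives of S to zero yields the normal
# equations (A^T A) w = A^T y, a linear system for w = (a, b, c).
a, b, c = np.linalg.solve(A.T @ A, A.T @ y)
print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.4f}")

# The value of the cost function at the found coefficients.
residuals = a * x**2 + b * x + c - y
print("cost S =", np.sum(residuals**2))
```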
An example of finding the approximant using the least squares method
Let's go over the method step by step using an example. Consider the following dataset:
As we have discussed before, to choose the class of functions, we need to visualize our data first:
Assuming that we are looking for an exponential function, the approximant will have the form $f(x) = a e^{bx}$.
Construct the cost function: $S(a, b) = \sum_{i=1}^{n} \left(a e^{b x_i} - y_i\right)^2$
Set up the system of equations for its minimum:

$$\begin{cases}
\dfrac{\partial S}{\partial a} = 2\sum_{i=1}^{n}\left(a e^{b x_i} - y_i\right) e^{b x_i} = 0 \\
\dfrac{\partial S}{\partial b} = 2\sum_{i=1}^{n}\left(a e^{b x_i} - y_i\right) a x_i e^{b x_i} = 0
\end{cases}$$
After finding the partial derivatives, solve the system using one of the methods for solving equation systems (note that this one is nonlinear in $b$, so a numerical method is appropriate) and receive the following result:
Out of curiosity, we could evaluate how much our function differs from the dataset:
Considering the scatter of the points, we can say that the function fits quite well. Let's visualize the result:
In the graph below, the deviations from the constructed function are drawn as green squares. In our case, we have found a function for which the total area of these squares is minimal.
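As a sketch of how such an exponential fit can be carried out in practice, here is a version using scipy.optimize.curve_fit, which minimizes the same sum of squares numerically (the dataset below is hypothetical, standing in for the one above):

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical dataset, roughly exponential in shape.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
y = np.array([2.1, 3.4, 5.4, 9.1, 14.8, 24.5])

# The exponential model f(x) = a * exp(b * x).
def f(x, a, b):
    return a * np.exp(b * x)

# curve_fit searches for (a, b) minimizing the sum of squared deviations.
(a, b), _ = curve_fit(f, x, y, p0=(1.0, 1.0))
print(f"a = {a:.4f}, b = {b:.4f}")

# Evaluate how much the fitted function deviates from the data.
print("cost S =", np.sum((f(x, a, b) - y) ** 2))
```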
Conclusion
In this topic, we have examined a fascinating method, the least squares method, and learned one of the important approaches to finding approximants. It will help you understand not only the method itself but also certain intricacies, such as constructing a cost function and minimizing it. This, in turn, will help you solve the problem at hand and explore the science of optimization. Nowadays, these tools are widely used in areas such as computer vision, machine learning, and much more.