In one of the previous topics, we talked about approximants (functions that best describe the dependency in a dataset given as pairs of points) and their all-round usage. This time we'll discuss a method for finding the best coefficients of the chosen function given the data. After finishing this lesson, you will be well familiar with this interesting and quite useful method.
Creating a model for the least squares method
As we've discussed before, sometimes we would like to find out the relation within a set of points $(x_i, y_i)$ by picking the type of the dependency, that is, a class of functions, and finding the best function from that class for our data. Let us recall that the approximation problem consists in finding dependencies in the data in question and constructing functions that can help us predict the behavior of certain processes. Suppose we have the following data:
As we have already mentioned, in order to choose the class of functions where you might find the approximant, it is reasonable to visualize the data. Here is how our data looks on the plane:
Let's pick the quadratic function as the approximant. As you may have already guessed, we will use the cost function, the sum of squares of the deviations of the approximant's values $f(x_i)$ from the data values $y_i$, as the evaluation score for the approximant. This is why the method is called the least squares method: we are trying to find a function that results in the minimal sum, and every term of that sum is a squared difference, so we are looking for a function that gives us 'the least squares'.
If you have guessed that the lower the value of the cost function is, the better we approximate the data, you’re right.
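As a minimal sketch of this idea, here is how such a cost could be computed in Python (the data points and the candidate quadratic below are hypothetical, chosen only to make the snippet runnable):

```python
# Hypothetical data points (x_i, y_i), not the ones from this topic.
points = [(0.0, 1.2), (1.0, 2.8), (2.0, 9.5), (3.0, 19.1)]

def f(x):
    # Some candidate quadratic approximant; not necessarily the optimal one.
    return 2 * x**2 + 1

# The cost: the sum of squared deviations of f(x_i) from y_i.
S = sum((f(x) - y) ** 2 for x, y in points)
print(S)
```

The least squares method amounts to searching, over all functions in the chosen class, for the one with the smallest value of this sum.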
How to find the minimum of the cost function
In general, the cost function takes the following form: $S(a_1, \dots, a_k) = \sum_{i=1}^{n} \left(f(x_i; a_1, \dots, a_k) - y_i\right)^2$, where $a_1, \dots, a_k$ are the coefficients of the approximant. In order to find its minimum, we are going to use the theory of functions of multiple variables, in particular the interior extremum theorem. It means that in order to find our coefficients, we are going to set the partial derivatives of the cost function to zero: $\frac{\partial S}{\partial a_j} = 0$ for $j = 1, \dots, k$.
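To make this concrete, here is the system written out for the quadratic approximant $f(x) = ax^2 + bx + c$ chosen above (a worked instance added here for illustration):

$$\begin{cases}
\dfrac{\partial S}{\partial a} = 2\sum_{i=1}^{n}\left(a x_i^2 + b x_i + c - y_i\right)x_i^2 = 0 \\
\dfrac{\partial S}{\partial b} = 2\sum_{i=1}^{n}\left(a x_i^2 + b x_i + c - y_i\right)x_i = 0 \\
\dfrac{\partial S}{\partial c} = 2\sum_{i=1}^{n}\left(a x_i^2 + b x_i + c - y_i\right) = 0
\end{cases}$$

Note that this system is linear in $a$, $b$, and $c$, so it can be solved with any standard method for linear systems.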
After solving the system, we will find the coefficients. If you analyze the system, it becomes clear that you shouldn't choose a class of functions with more coefficients than the number of data pairs. If you did, the resulting system would have an infinite number of solutions, which means that infinitely many functions would pass through our dataset.
To give you an example: say we have only two points that we want to approximate. The simplest solution would be to 'connect' them, that is, draw the line that does just that. But if we were to look for a quadratic function, we could construct various quadratic functions passing through these two points (for instance, we could place the parabola's vertex to the left of both points, to the right of them, or between them; that's already three different options), as the sketch below shows.
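Here is a quick check of that claim, with two hypothetical points and two different parabolas passing exactly through both:

```python
def f1(x):
    # A parabola with its vertex at x = 0.
    return x**2 + 1

def f2(x):
    # A different parabola, with its vertex at x = 2.
    return -x**2 + 4 * x + 1

# Both pass exactly through the hypothetical points (0, 1) and (2, 5).
for f in (f1, f2):
    assert f(0) == 1 and f(2) == 5
```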
The loss of accuracy in this case is mainly related to the ability of our function to distort heavily between the points and outside the data range. Therefore, we need to remember our initial task: learning to make predictions based on the given data, not building a function that passes through all the given points! The latter could make us lose the dependency, even though the cost function would equal zero. In addition, it would increase the size of the equation system, which would mean more useless calculations.
This way we can find the best coefficients for the chosen class of functions and the given data.
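Here is a minimal Python sketch of the whole procedure for the quadratic case (the dataset is hypothetical): nulling the partial derivatives leads to the normal equations, a linear system we can solve directly:

```python
import numpy as np

# Hypothetical data points (x_i, y_i).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 9.2, 18.8, 33.1])

# Design matrix for the quadratic f(x) = a*x^2 + b*x + c.
A = np.column_stack([x**2, x, np.ones_like(x)])

# Setting the partial derivatives of S to zero yields the normal
# equations (A^T A) w = A^T y, a linear system for w = (a, b, c).
a, b, c = np.linalg.solve(A.T @ A, A.T @ y)
print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.4f}")

# The value of the cost function at the found coefficients.
residuals = a * x**2 + b * x + c - y
print("cost S =", np.sum(residuals**2))
```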
An example of finding the approximant using the least squares method
Let's go over the method step by step using an example. Consider the following dataset:
As we have discussed before, to choose the class of functions, we need to visualize our data first:
Assuming that we are looking for an exponential function, the approximant will have the form $f(x) = a e^{bx}$.
Construct the cost function: $S(a, b) = \sum_{i=1}^{n} \left(a e^{b x_i} - y_i\right)^2$
Set up the system of equations for its minimum:

$$\begin{cases}
\dfrac{\partial S}{\partial a} = 2\sum_{i=1}^{n}\left(a e^{b x_i} - y_i\right) e^{b x_i} = 0 \\
\dfrac{\partial S}{\partial b} = 2\sum_{i=1}^{n}\left(a e^{b x_i} - y_i\right) a x_i e^{b x_i} = 0
\end{cases}$$
After finding the partial derivatives, solve the system using one of the methods for solving equation systems (note that this one is nonlinear in $b$, so a numerical method is appropriate) and receive the following result:
Out of curiosity, we could evaluate how much our function differs from the dataset:
Considering the scatter of the points, we can say that the function fits quite well. Let's visualize the result:
In the graph below, the deviations from the constructed function are drawn as green squares. In our case, we have found a function for which the total area of these squares is minimal.
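As a sketch of how such an exponential fit can be carried out in practice, here is a version using scipy.optimize.curve_fit, which minimizes the same sum of squares numerically (the dataset below is hypothetical, standing in for the one above):

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical dataset, roughly exponential in shape.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
y = np.array([2.1, 3.4, 5.4, 9.1, 14.8, 24.5])

# The exponential model f(x) = a * exp(b * x).
def f(x, a, b):
    return a * np.exp(b * x)

# curve_fit searches for (a, b) minimizing the sum of squared deviations.
(a, b), _ = curve_fit(f, x, y, p0=(1.0, 1.0))
print(f"a = {a:.4f}, b = {b:.4f}")

# Evaluate how much the fitted function deviates from the data.
print("cost S =", np.sum((f(x, a, b) - y) ** 2))
```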
Conclusion
In this topic, we have examined a fascinating method, the least squares method, and learned one of the important approaches to finding approximants. It will help you understand not only the method itself but also certain intricacies, such as constructing a cost function and minimizing it. This, in turn, will help you solve the problem at hand and explore the science of optimization. Nowadays, these tools are widely used in areas such as computer vision, machine learning, and much more.