
Can you guess how many jelly beans are in this jar?

A glass jar filled with colorful jelly beans

Finance professor Jack Treynor conducted a similar experiment in his class with a jar containing 850 beans. Individually, most students' guesses were far from the correct answer. However, the average of all the students' guesses was 871, which is remarkably accurate.

This example illustrates the concept of ensembles in machine learning, where predictions obtained from several models are combined to achieve better performance on a given task.

In this topic, you will learn more about ensembles and various ensembling techniques.

What are ensembles?

Ensemble learning refers to training several ML models and combining their predictions in a way that yields more accurate results than any individual model would separately.

Ensembles are often employed to solve classification and regression tasks, but can also be used to tackle unsupervised problems.

Component models in an ensemble are commonly referred to as base learners. Technically, ensembling can be applied to any ML model, and you can even combine different models within a single ensemble. However, decision trees are most often used as base learners. One of the main reasons for this is that they are fast to build, so training many such models isn't too time-consuming.

But how and why does ensembling work? Let's consider the three main approaches to ensembling – bagging, boosting, and stacking – and discuss them in more detail.

Bagging

Bagging (short for bootstrap aggregating) suggests that every base learner in an ensemble should be trained on a different variant of the training data. Each variant, being slightly different from the others, will give rise to a different model. The predictions of these models will then be combined using majority vote for classification tasks or averaged for regression tasks.

But how do we create different variants of the training data? In bagging, they are created by randomly sampling examples from the training set with replacement. This process is called bootstrapping. As a result of bootstrapping, some examples will occur multiple times in each variant of the original training set, while others may not be selected at all. Such unselected examples are called out-of-bag samples.
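
To make this concrete, here is a minimal sketch of drawing a single bootstrap sample with NumPy; the toy data and variable names are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Toy training set: 10 examples with values 0..9 (illustrative data).
X = np.arange(10)

# Draw one bootstrap sample: same size as the original set, sampled with replacement.
bootstrap_indices = rng.choice(len(X), size=len(X), replace=True)
bootstrap_sample = X[bootstrap_indices]

# Examples that were never drawn are the out-of-bag samples for this base learner.
out_of_bag = np.setdiff1d(X, bootstrap_sample)

print("Bootstrap sample:", bootstrap_sample)
print("Out-of-bag samples:", out_of_bag)
```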

What's the purpose of this procedure? Imagine that the original training set contains outliers. These aren't always easy to identify upfront, but their presence can negatively affect the performance of any ML model. If there aren't many outliers in our data, chances are high that such 'bad' examples won't be included in a bootstrap sample, so the model trained on this sample won't be affected. By drawing several bootstrap samples, we use all of our data as much as possible while trying to minimize the impact of outliers.

The process of training a bagged ensemble is illustrated below:

Illustration of how to train a model with the bagging ensemble technique

Arguably the most widely used ML model that relies on bagging is Random Forest, which combines a set of decision trees in a bagged ensemble. In addition to traditional bagging, each decision tree in a random forest is trained on a random subset of the available input features, which further promotes diversity among the base models' predictions.
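
As an illustration, a random forest can be trained with scikit-learn's RandomForestClassifier; the dataset and hyperparameters below are just a convenient choice, not a recommendation:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 100 trees, each trained on a bootstrap sample and a random subset of features.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
forest.fit(X_train, y_train)

print("Test accuracy:", forest.score(X_test, y_test))
```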

Boosting

Another common ensembling strategy is boosting. Unlike bagging, where the base learners are trained independently of each other, boosting trains the models sequentially, so that each new model tries to correct the errors of the previous one.

For example, in adaptive boosting, commonly abbreviated as AdaBoost, this is achieved by assigning a weight to each training example. At every iteration, AdaBoost increases the weight of the examples on which the last base learner made mistakes, and decreases that of the correctly predicted examples. As a result, the new base learner that is trained at the next iteration will focus more on the 'difficult' examples from the training data and (hopefully) will improve the performance on them.
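
As a brief example, scikit-learn provides an implementation of this algorithm in AdaBoostClassifier; the dataset used here is only for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Builds 50 shallow decision trees (stumps by default) one after another,
# reweighting the training examples between iterations.
ada = AdaBoostClassifier(n_estimators=50, random_state=42)
ada.fit(X_train, y_train)

print("Test accuracy:", ada.score(X_test, y_test))
```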

This process is illustrated below:

Illustration of how to train a model with the boosting ensemble technique

In gradient boosting, each model predicts the errors of the previous one, and the predictions of all models in the ensemble are summed up. It turns out that this procedure corresponds to the numerical optimization algorithm known as gradient descent, which is where this ensembling method gets its name.
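
As a rough sketch of this idea, the loop below fits each new decision tree to the residuals (errors) of the current ensemble and adds its scaled predictions to the running total; with squared error, these residuals coincide with the negative gradient. The data and hyperparameters are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(seed=42)

# Toy regression data: y = x^2 plus some noise.
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.3, size=200)

prediction = np.zeros_like(y)  # start from a trivial prediction of zero
learning_rate = 0.1
trees = []

for _ in range(100):
    residuals = y - prediction                      # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)                          # the next tree predicts those errors
    prediction += learning_rate * tree.predict(X)   # predictions are summed up
    trees.append(tree)

print("Mean squared error:", np.mean((y - prediction) ** 2))
```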

While AdaBoost isn't used that much nowadays, gradient boosting-based ensembles are incredibly popular.

Stacking

Yet another kind of ensembling is called stacking. It refers to building a new ML model that learns how to combine the predictions of several other models to achieve better results.

To train a basic stacked ensemble, you first train a number of different machine learning models on part of the available training data. Then one more model is trained to predict the value of the target attribute based on the values predicted by the models from the previous step, often along with the original features:

Illustration of how to train a model with the stacking ensemble technique

When solving a classification task, logistic regression is typically used as the combiner model, while linear regression is often employed for numeric targets.

In reality, stacked ensembles can be more complex and contain more 'layers'.
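
Here is a minimal single-layer example built with scikit-learn's StackingClassifier, where logistic regression acts as the combiner model; the choice of base models and dataset is illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Base models whose predictions will be combined.
base_models = [
    ("forest", RandomForestClassifier(n_estimators=50, random_state=42)),
    ("knn", KNeighborsClassifier()),
    ("svm", SVC(probability=True, random_state=42)),
]

# Logistic regression learns how to combine the base models' predictions.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_train, y_train)

print("Test accuracy:", stack.score(X_test, y_test))
```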

Conclusions

  • Ensembling refers to combining predictions of several ML models.

  • An ensemble typically, though not always, performs better than the models it consists of.

  • Bagging refers to training a number of base models in parallel, each on a different bootstrap sample of the original training data.

  • In boosting, base models are trained sequentially, with each model aiming to improve the performance of the previous one.

  • Stacking is a type of ensembling where predictions obtained by several ML models are combined into the final prediction by a new ML model.
