
Matplotlib violin plot

A violin plot visualizes the distribution of a numeric variable for one or several groups. It combines a box plot with a histogram: like a box plot, it can show summary statistics such as the mean, the median, and the interquartile range, and like a histogram, it shows the shape of the distribution. This makes a violin plot especially useful when you want to compare the distributions of multiple groups. In the following sections, we will take a closer look at this statistical tool.

Violin plot composition

Let's start with a simple example in which several data collections are gathered into a single list called data_list. We need to create a figure with axes and then draw the violin plot. To do it, call the violinplot() function with data_list as its argument.

Suppose we have the sales figures for two months, January and August. The values vary: 0 percent, 20 percent, 50 percent, and so on. We want to see the data density to determine which month was more profitable. The following code defines the datasets and shows their density in a violin plot:

import matplotlib.pyplot as plt

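# Sales values (in percent) for each month
data1 = [0, 20, 20, 20, 20, 20, 70]  # January
data2 = [0, 20, 50, 50, 50, 90, 50]  # August
data_list = [data1, data2]

# Create the figure and axes, then draw one violin per dataset
fig, axes = plt.subplots()
axes.violinplot(data_list)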

plt.show()

A simple violin plot

The plot is hard to read: we can't tell which violin represents which month. For this reason, we need to customize it!

Labels

Let's customize the violin plot to make it easier to read. You've probably noticed that the graph is difficult to interpret without labels on the X and Y axes.

Right now, the violins sit at the X-coordinates 1 and 2. So, the first step is to set ticks at these positions with the set_xticks method. After that, we label the ticks as January and August with the set_xticklabels method:

axes.set_xticks((1, 2))
axes.set_xticklabels(("January", "August"))

In this example, the Y-axis shows the sales in percent, so an axis label is all we need here. The method for this is set_ylabel:

axes.set_ylabel("Sales in percentage")

Finally, we add the plot's title to show the general idea of the graphical representation. Nothing fancy this time; let's just call it Violin plot.

axes.set_title('Violin plot')

After adding the labels, you'll end up with something like this:

Set the ticks, ticklabels, ylabel and title of a violin plot
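To see the pieces together, here is a minimal sketch that combines the snippets above into one script. The helper name build_labeled_violin is ours and purely illustrative; the data and the labels come from the example above.

```python
import matplotlib.pyplot as plt


def build_labeled_violin():
    """Draw the two-month violin plot with ticks, labels, and a title.

    Hypothetical helper, used only to group the steps shown above.
    """
    data1 = [0, 20, 20, 20, 20, 20, 70]  # January
    data2 = [0, 20, 50, 50, 50, 90, 50]  # August

    fig, axes = plt.subplots()
    # With no explicit positions, the violins are placed at x = 1 and x = 2.
    axes.violinplot([data1, data2])

    axes.set_xticks((1, 2))
    axes.set_xticklabels(("January", "August"))
    axes.set_ylabel("Sales in percentage")
    axes.set_title("Violin plot")
    return fig, axes


fig, axes = build_labeled_violin()
```

Call plt.show() after building the figure to display it.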

Chart interpretation

Once we know how to build the graph, it's time to talk about its interpretation.

A violin plot is a wonderful tool for illustration. First of all, it shows the distribution, or simply the shape, of the data by using kernel density estimation, which tries to "guess" the density of the data from the sample. Without going into too much detail, you can think of it as a smoothed histogram. Its main advantage over a histogram is the lack of bins: the curve is continuous instead of a set of distinct bars. From the shape of the violin plot, we can read off where the observations are concentrated:

  • The violin is wider where the data entries are plentiful;
  • The violin is narrower where the data entries are scarce.
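To make the "smoothed histogram" idea concrete, here is a rough sketch of a Gaussian kernel density estimate written with NumPy. It is not Matplotlib's own code: the function name gaussian_kde_sketch and the bandwidth of 10 are our assumptions, chosen only for illustration (Matplotlib picks a bandwidth automatically unless you set bw_method).

```python
import numpy as np


def gaussian_kde_sketch(samples, points, bandwidth):
    """Average one Gaussian bump per sample, evaluated at each point."""
    samples = np.asarray(samples, dtype=float)
    points = np.asarray(points, dtype=float)
    # Distance from every evaluation point to every sample, in bandwidth units.
    z = (points[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)


january = [0, 20, 20, 20, 20, 20, 70]
# The bandwidth value is an arbitrary assumption for this sketch.
density_at_20, density_at_70 = gaussian_kde_sketch(january, [20, 70], bandwidth=10.0)

# Five of the seven values sit at 20, so the estimated density is higher there.
print(density_at_20 > density_at_70)
```

The higher the estimated density at a value, the wider the violin is drawn at that height.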

Apart from the shape of the data, a violin plot can illustrate several summary values: the minimum and maximum, the mean, the median, and quartiles. These features make a violin plot similar to a box plot. Now, let's update our example and introduce these parameters to our violin plot.

Adding parameters to a violin plot

We can choose to show means and medians in our violin plot. By default, only the extremes are present in the figure.

Some parameters of the violinplot() function include:

  • showmeans is False by default. If True, it adds the mean values to the violin plot;
  • showmedians is False by default. If True, it adds the median values to the violin plot;
  • showextrema is True by default. It shows the extreme points (the minimum and maximum) in the plot;
  • quantiles is None by default. You can set it to a list of floats in the interval [0, 1] to display those quantiles in the violin plot.
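These flags also affect what violinplot() gives back. The function returns a dictionary of the drawn parts, and entries such as cmeans and cmedians are present only when the matching flag is switched on. The short sketch below prints the keys for a default call and for a call with the extra statistics enabled; the variable names are ours.

```python
import matplotlib.pyplot as plt

data = [0, 20, 20, 20, 20, 20, 70]

# Default call: only the violin body and the extrema lines are drawn.
fig, axes = plt.subplots()
default_parts = axes.violinplot(data)
print(sorted(default_parts))

# Means, medians, and two quantiles switched on.
fig, axes = plt.subplots()
full_parts = axes.violinplot(
    data, showmeans=True, showmedians=True, quantiles=[0.25, 0.75]
)
print(sorted(full_parts))
```

Looking at the keys first is a simple way to avoid a KeyError when you style a part that was never drawn.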

Let's customize the first example to show the sales for the whole year in one violin plot. Also, let's add the mean (magenta) and the median (green) values.

The code for this example is:

import matplotlib.pyplot as plt

# Sales values (in percent) for the whole of 2021
data2021 = [0, 20, 20, 50, 85, 20, 20, 20, 70, 90, 10, 0]
fig, axes = plt.subplots()

axes.set_title('Violin plot')
xticklabels = ['Year 2021']
axes.set_xticks([1])
axes.set_xticklabels(xticklabels)
axes.set_ylabel("Annual sales in percentage")

# Draw the violin with the extrema, mean, and median lines
sales = axes.violinplot(data2021, showextrema=True, showmeans=True, showmedians=True)
sales['cmeans'].set_color('m')  # mean line in magenta
sales['cmedians'].set_color('g')  # median line in green

plt.show()

The figure for the 2021 sales will look like this:

Show the mean and median in a violin plot
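If you want to check where the two lines should sit, you can compute the same statistics directly. This small sketch uses Python's standard statistics module on the data2021 list from the example:

```python
from statistics import mean, median

data2021 = [0, 20, 20, 50, 85, 20, 20, 20, 70, 90, 10, 0]

# The twelve values sum to 405, so the mean is 405 / 12.
print(mean(data2021))    # 33.75
# Sorted, the two middle values are both 20.
print(median(data2021))  # 20.0
```

The mean lies well above the median because a few high values (70, 85, 90) pull it up, while most of the values sit at 20 or below.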

Quantiles

In the example below, we display only selected statistics: the quantiles 0.25, 0.75, and 0.9. Which quantiles to choose depends on what we want to depict in the chart. The code looks like this:

import matplotlib.pyplot as plt

# Sales values (in percent) for the whole of 2021
data2021 = [0, 20, 20, 50, 85, 20, 20, 20, 70, 90, 10, 0]
fig, axes = plt.subplots()
axes.set_title('Violin plot')
xticklabels = ['Year 2021']
axes.set_xticks([1])
axes.set_xticklabels(xticklabels)
axes.set_ylabel("Annual sales in percentage")

# Draw the violin with the extrema and the three chosen quantiles
sales = axes.violinplot(data2021, showextrema=True, quantiles=[0.25, 0.75, 0.9])
sales['cquantiles'].set_color('y')  # quantile lines in yellow

plt.show()

Below is the violin plot for the 2021 sales. Besides the extrema, it marks only the quantiles defined above:

Show the quantiles on a violin plot
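To see which values the three quantile lines correspond to, we can compute them with NumPy. This is a sketch under the assumption that the plot uses NumPy's default linear interpolation between neighboring data points; the variable name quantile_values is ours.

```python
import numpy as np

data2021 = [0, 20, 20, 50, 85, 20, 20, 20, 70, 90, 10, 0]

# Same quantile levels as in the violin plot example.
quantile_values = np.quantile(data2021, [0.25, 0.75, 0.9])

# With linear interpolation, these come out as 17.5, 55, and 83.5.
print(quantile_values)
```

So a quarter of the values lie at or below 17.5, three quarters at or below 55, and nine tenths at or below 83.5.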

Conclusion

Violin plots are a combination of a box plot and a histogram. They allow us to see how the data is spread and where it is concentrated. We've gone over how to create a violin plot using Matplotlib and how to customize it by adding axis labels and a title, as well as means, medians, and quantiles.
