The matplotlib library is all about data visualization. Among other kinds of plots, it lets us build various bar charts. A bar chart (also called a bar plot) is a diagram where variables are represented as rectangular bars: the taller or longer the bar, the higher the value it represents. Usually, one axis of a bar chart represents categories, and the other represents their values. A bar chart is used to compare discrete data, such as occurrences or proportions. Let's see how we can use matplotlib to visualize our data with bar charts.
Creating a bar chart
To create a bar chart with matplotlib, you simply need to call the bar() function. Its syntax is as follows:
import matplotlib.pyplot as plt
plt.bar(x, height, width, bottom, align)
where:
x is a category (it determines the position of the bar on the x-axis);
height is the corresponding value;
width is how wide you want your bars; the default value is 0.8;
bottom is the base y-coordinate; in other words, it is the point where your bars start. The default is 0;
align is how each bar is placed relative to its x position, and therefore to its category name. By default, the bar is centered on it (center); you can also line up the left edge of the bar with it by passing align='edge'.
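To make these parameters more concrete, here is a minimal sketch that passes width, bottom, and align explicitly and then reads the geometry of the first bar back from the container that bar() returns. The categories and values are made up for illustration, and the getter methods are used here only to inspect the result.

```python
import matplotlib.pyplot as plt

# made-up illustrative data, not taken from the text
categories = ['a', 'b', 'c']
values = [3, 5, 2]

# bars are 0.5 wide, start at y=1, and have their left edge on the tick position
bars = plt.bar(categories, values, width=0.5, bottom=1, align='edge')

first = bars[0]  # each bar is a matplotlib Rectangle
print(first.get_x(), first.get_y(), first.get_width(), first.get_height())

# call plt.show() here to display the figure
```

Note that height is measured from bottom, so the first bar spans from 1 to 4 on the y-axis rather than from 0 to 3.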
Let's create our first simple bar graph. Suppose you want to compare the box office of the movies released in 2020 in the USA. Our code will look something like this:
import matplotlib.pyplot as plt
films = ['Wonder Woman', 'Sonic', '1917', 'Star Wars', 'Onward']
box_office = [16.7, 26.1, 37.0, 34.5, 10.6]
plt.bar(films, box_office)
plt.show()
The result is a bar graph with one bar per film, where the height of each bar matches the film's box office value.
Looks pretty decent, doesn't it? It lacks the description, though. Let's see what we can do about it.
Matplotlib labels
What is the point of a graph if we do not know what the figures represent? One way to make your chart more illustrative is to add labels. The code below shows how it can be done:
import matplotlib.pyplot as plt
films = ['Wonder Woman', 'Sonic', '1917', 'Star Wars', 'Onward']
box_office = [16.7, 26.1, 37.0, 34.5, 10.6]
plt.bar(films, box_office)
plt.ylabel('Box Office (mil. $)') # labeling y-axis
plt.xlabel('Film title') # labeling x-axis
plt.title('Box office of 5 different films of 2020 in the USA') # giving chart a title
plt.show()
Good! Now, we can clearly see what the bars represent. A good thing about setting the labels is that the order doesn't matter: whether you call ylabel(), xlabel(), or title() first, the result will be the same.
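As a quick check of that claim, the sketch below sets the same labels in the opposite order and reads them back from the current axes. The functions plt.gca(), get_title(), get_xlabel(), and get_ylabel() are not covered elsewhere in this topic; they are used here only to inspect the result.

```python
import matplotlib.pyplot as plt

films = ['Wonder Woman', 'Sonic', '1917', 'Star Wars', 'Onward']
box_office = [16.7, 26.1, 37.0, 34.5, 10.6]

plt.figure()
plt.bar(films, box_office)

# the same labels as before, set in the opposite order
plt.title('Box office of 5 different films of 2020 in the USA')
plt.xlabel('Film title')
plt.ylabel('Box Office (mil. $)')

# read the labels back from the current axes
ax = plt.gca()
print(ax.get_title())
print(ax.get_xlabel())
print(ax.get_ylabel())

# call plt.show() here to display the figure
```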
Gridlines
Another way to make your chart easier to read is by adding gridlines. To do this, call the grid() function, passing the parameters for color, linestyle, linewidth, and axis. To make gridlines semi-transparent, you can tweak the alpha parameter. It can range from 0.0 (fully transparent) to 1.0 (fully opaque).
import matplotlib.pyplot as plt
films = ['Wonder Woman', 'Sonic', '1917', 'Star Wars', 'Onward']
box_office = [16.7, 26.1, 37.0, 34.5, 10.6]
plt.bar(films, box_office)
# add grid lines
plt.grid(color='grey', linestyle=':', linewidth=1.0, axis='y', alpha=0.5)
plt.ylabel('Box Office (mil. $)')
plt.xlabel('Film title')
plt.title('Box office of 5 different films of 2020 in the USA')
plt.show()
You can find more information on grid parameters in the official documentation.
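With matplotlib's default settings, gridlines are drawn on top of the bars. If you would rather have them behind the bars, one option is the set_axisbelow() method of the axes object. The sketch below is only a possible variation of the example above; plt.gca() and set_axisbelow() are not used elsewhere in this topic.

```python
import matplotlib.pyplot as plt

films = ['Wonder Woman', 'Sonic', '1917', 'Star Wars', 'Onward']
box_office = [16.7, 26.1, 37.0, 34.5, 10.6]

plt.figure()
plt.bar(films, box_office)
plt.grid(color='grey', linestyle=':', linewidth=1.0, axis='y', alpha=0.5)

# draw gridlines and ticks below the bars instead of over them
ax = plt.gca()
ax.set_axisbelow(True)

# call plt.show() here to display the figure
```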
Horizontal bar chart
If you want to display bars horizontally instead of vertically, you can call the barh() function. But don't forget to switch the axis labels!
import matplotlib.pyplot as plt
films = ['Wonder Woman', 'Sonic', '1917', 'Star Wars', 'Onward']
box_office = [16.7, 26.1, 37.0, 34.5, 10.6]
# plotting the chart horizontally
plt.barh(films, box_office)
plt.xlabel('Box Office (mil. $)')
plt.ylabel('Film title')
plt.title('Box office of 5 different films of 2020 in the USA')
plt.show()
Grouped bar plot
Plotting multiple bars next to each other can come in handy when we need to compare two or more data series that share categories. Suppose we have the survey results on whether people prefer dogs or cats for several years:
import numpy as np
import matplotlib.pyplot as plt
years = ["2016", "2017", "2018", "2019"]
cats = [57, 50, 47, 30]
dogs = [43, 50, 53, 70]
# create x-axis values depending on the number of years
x_axis = np.arange(len(years))
# increase the figure size
plt.figure(figsize=(10, 6))
plt.bar(x_axis-0.2, cats, width=0.4, label='Cats')
plt.bar(x_axis+0.2, dogs, width=0.4, label='Dogs')
# set tick labels and their location
plt.xticks(x_axis, years)
plt.xlabel('Years', fontsize=14)
plt.ylabel('Preference (%)', fontsize=14)
plt.title('The results of cat/dog survey', fontsize=20)
# add legend
plt.legend()
plt.show()
The code above can seem a bit intimidating. Let's unpack it step by step:
First, we import the required libraries: numpy for numerical calculations with arrays and matplotlib for data visualization;
We create lists with our data;
We use the np.arange() function to create a range of x values. Here, the number of x values is the number of items in our data set;
This time, we increase the size of the figure so that the chart is easier to read;
We plot multiple bars by calling the plt.bar() function twice. To avoid overlapping, we subtract 0.2 from the x-axis values for one series, add 0.2 for the other, and set the bar width to 0.4. Feel free to experiment with these values and see the result;
We also pass a label parameter to each bar() call. We need it for adding a legend to our plot;
We add ticks (or labels) to the x-axis, where the positions are determined by the x-axis values and the labels are the years. In most cases, bars on the x-axis are labeled automatically. Here we do it explicitly because the bars are positioned by numbers shifted in different directions rather than by category names;
Finally, we label each axis, name the plot, and add a legend to it.
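The offsets of 0.2 and the width of 0.4 were picked by hand for two series. As a rough sketch of how the same steps could be generalized, the code below computes the bar width and the offsets from the number of series. The series dictionary and the group_width value of 0.8 are assumptions made for this illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

years = ["2016", "2017", "2018", "2019"]
# series name -> values; the same survey data as above
series = {'Cats': [57, 50, 47, 30], 'Dogs': [43, 50, 53, 70]}

x_axis = np.arange(len(years))
group_width = 0.8                      # total width of one group of bars (assumed value)
bar_width = group_width / len(series)  # 0.4 for two series

plt.figure(figsize=(10, 6))
offsets = []
for i, (name, values) in enumerate(series.items()):
    # offsets are symmetric around 0, so each group stays centered on its tick
    offset = (i - (len(series) - 1) / 2) * bar_width
    offsets.append(offset)
    plt.bar(x_axis + offset, values, width=bar_width, label=name)

plt.xticks(x_axis, years)
plt.legend()

# call plt.show() here to display the figure
```

For two series, this reproduces the offsets of -0.2 and 0.2 used above; adding a third entry to the dictionary would make the bars narrower without any other change.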
Stacked bar plot
Another way of plotting several series of data is by creating a stacked bar chart. A stacked bar chart is a type of graph that displays multiple data points on top of each other. In a stacked bar chart, each bar represents a single category that contains smaller categories. Use it to demonstrate how parts relate to the total amount.
With stacked bar plots, we need to provide the additional bottom parameter that we have mentioned before. It indicates where the bar should start. Have a look at the code:
import matplotlib.pyplot as plt
years = ['2016', '2017', '2018', '2019']
cats = [57, 50, 47, 30]
dogs = [43, 50, 53, 70]
plt.figure(figsize=(10, 6))
plt.bar(years, cats, label='Cats')
plt.bar(years, dogs, bottom=cats, label='Dogs')
plt.xlabel('Years', fontsize=14)
plt.ylabel('Preference (%)', fontsize=14)
plt.title('The results of cat/dog survey', fontsize=20)
plt.legend()
plt.show()
We have provided the bottom argument to the second bar plot, so it starts not from 0 but from the value of cats.
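One way to convince yourself of this is to read the bar positions back from the container that bar() returns. The sketch below does that for the Dogs bars; get_y() and get_height() are methods of the rectangles matplotlib creates, used here only for inspection.

```python
import matplotlib.pyplot as plt

years = ['2016', '2017', '2018', '2019']
cats = [57, 50, 47, 30]
dogs = [43, 50, 53, 70]

plt.figure()
plt.bar(years, cats, label='Cats')
dog_bars = plt.bar(years, dogs, bottom=cats, label='Dogs')

# each Dogs bar starts where the Cats bar of the same year ends
starts = [float(bar.get_y()) for bar in dog_bars]
tops = [float(bar.get_y() + bar.get_height()) for bar in dog_bars]
print(starts)
print(tops)

# call plt.show() here to display the figure
```

Since the two percentages add up to 100 for every year, all stacks end at the same height.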
If you need to stack three or more categories, provide a sum of the previous category values to the bottom parameter. Have a look at the code below:
import matplotlib.pyplot as plt
import numpy as np
years = ['2016', '2017', '2018', '2019']
cats = np.array([50, 45, 37, 30])
dogs = np.array([40, 39, 50, 55])
hamsters = np.array([10, 16, 13, 15])
plt.figure(figsize=(10, 6))
plt.bar(years, cats, label='Cats')
plt.bar(years, dogs, bottom=cats, label='Dogs')
plt.bar(years, hamsters, bottom=cats+dogs, label='Hamsters')
plt.xlabel('Years', fontsize=14)
plt.ylabel('Preference (%)', fontsize=14)
plt.title('The results of cat/dog/hamster survey', fontsize=20)
plt.legend()
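plt.show()
You may have noticed that this time our data is represented by numpy arrays. This is necessary because the + operator concatenates plain Python lists instead of adding their items, whereas numpy arrays are added element by element, which is what the bottom parameter needs.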
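The sketch below illustrates both points: first the difference between adding lists and adding arrays, then a loop that keeps a running bottom so that any number of series can be stacked. The pets dictionary and the loop are just one possible way to organize the same data, not the only one.

```python
import numpy as np
import matplotlib.pyplot as plt

# + concatenates plain lists but adds numpy arrays element by element
list_sum = [50, 45] + [40, 39]                       # [50, 45, 40, 39]
array_sum = np.array([50, 45]) + np.array([40, 39])  # array([90, 84])

years = ['2016', '2017', '2018', '2019']
pets = {
    'Cats': np.array([50, 45, 37, 30]),
    'Dogs': np.array([40, 39, 50, 55]),
    'Hamsters': np.array([10, 16, 13, 15]),
}

plt.figure(figsize=(10, 6))
bottom = np.zeros(len(years))  # every stack starts at 0
for name, values in pets.items():
    plt.bar(years, values, bottom=bottom, label=name)
    bottom = bottom + values   # the next series starts on top of this one
plt.legend()

# call plt.show() here to display the figure
```

After the loop, bottom holds the total height of each stack, which is 100 for every year in this data set.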
Conclusion
In this topic, we have covered the basics of creating bar charts using matplotlib. However, there is much more you can do with this library. The versatility and high customizability of matplotlib are the reasons why it is so popular. If you want to know which other bar plot parameters you can tweak, you can check the official matplotlib bar chart documentation.