A bubble chart is very similar to a scatter plot. Both visualize data using circles. The key characteristic of a bubble chart is its ability to represent up to four variables in one chart.
We may want to use a bubble chart to display different relationships between data in various fields such as business, marketing, and research.
While bubble charts may appear simple, they can sometimes be challenging to read and interpret, particularly when bubbles overlap. We will attempt to clarify these issues.
In this topic, we will learn how to create a bubble chart using matplotlib in Python. We will also highlight the benefits of visualizing four different variables without delving too deeply into details.
Bubble chart composition
Imagine you work in a laboratory. Your task is to compile and visualize weather data for a specific location. The weather station collects data related to temperature, rainfall, and other weather activities. It calculates the average temperatures and rainfall for each month and also estimates whether a month has been predominantly cloudy, sunny, or rainy.
The months are represented by a range from 1 to 12. The temperatures for each month are stored in a list. The first step is to visualize this data in a scatter plot. The code for this task is shown below:
import matplotlib.pyplot as plt
months = range(1, 13)
avg_temperature = [8, 12, 15, 20, 21, 23, 24, 23, 23, 20, 14, 11]
plt.scatter(months, avg_temperature)
plt.xlabel("Months")
plt.ylabel("Average temperature")
plt.show()
The chart you see is a scatter plot; it shows the average temperature for each month. We can transform it into a bubble plot by adding a third element: the average rainfall amount. This will be displayed as the size of each circle, forming bubbles. The interpretation is as follows: a smaller bubble indicates a low average rainfall, while a bigger bubble represents a high average rainfall for the specific month.
The values for the average rainfall are stored in a list, similar to this one:
avg_rainfall = [110, 105, 94, 61, 66, 31, 22, 23, 98, 110, 188, 143]
The scatter() function accepts a size parameter, s, that sets the size of each bubble:
plt.scatter(months, avg_temperature, s=avg_rainfall)
This first example shows how easy it is to build a bubble chart that shows three variables in a single diagram. Now, let's see some other advantages of a bubble chart.
The fourth variable
The bubble chart shows four variables when we add another one: the bubble color. As mentioned earlier, the weather station records the predominant weather for each month. We will mark it in the following way:
yellow – most days were sunny;
orange – most days were hot;
light gray – most days were cloudy or rainy;
light blue – it was snowing on most days.
This data is stored as a list of color names. The code for this application is shown below:
import matplotlib.pyplot as plt
months = range(1, 13)
avg_temperature = [8, 12, 15, 20, 21, 23, 24, 23, 23, 20, 14, 11]
avg_rainfall = [110, 105, 94, 61, 66, 31, 22, 23, 98, 110, 188, 143]
# Colors that encode the predominant weather of each month
snow = "lightblue"
rain = "lightgray"
sun = "yellow"
hot = "orange"
skye = [snow, rain, rain, sun, sun, hot, hot, hot, rain, rain, snow, snow]
plt.scatter(months, avg_temperature, s=avg_rainfall, c=skye)
plt.xlabel("Months")
plt.ylabel("Average temperature")
plt.show()
Customizing the chart
We can significantly improve the examples above. First, we want to increase the bubble size to make the differences more apparent. However, larger bubbles bring a common issue: they can grow too big and overlap. Overlapping bubbles take up a lot of space, hide one another, and can mislead readers.
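One way to soften the overlap problem is to draw the largest bubbles first, so that smaller bubbles end up on top and stay visible. The sketch below is a minimal illustration and not part of the original example: the helper name order_largest_first is hypothetical, and it assumes that scatter() draws markers in the order the data is given.

```python
def order_largest_first(x, y, sizes, colors):
    """Reorder four parallel lists so the largest sizes come first."""
    # Indices of the bubbles, sorted by size in descending order
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    return (
        [x[i] for i in order],
        [y[i] for i in order],
        [sizes[i] for i in order],
        [colors[i] for i in order],
    )


months = list(range(1, 13))
avg_temperature = [8, 12, 15, 20, 21, 23, 24, 23, 23, 20, 14, 11]
avg_rainfall = [110, 105, 94, 61, 66, 31, 22, 23, 98, 110, 188, 143]
skye = ["lightblue", "lightgray", "lightgray", "yellow", "yellow", "orange",
        "orange", "orange", "lightgray", "lightgray", "lightblue", "lightblue"]

xs, ys, ss, cs = order_largest_first(months, avg_temperature, avg_rainfall, skye)
print(xs[:3])  # [11, 12, 1]
```

The reordered lists can then be passed to plt.scatter(xs, ys, s=ss, c=cs) in place of the originals.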
To increase the size of each bubble, we'll multiply the value it represents by a constant factor. Let's multiply each value in avg_rainfall by 25.
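It helps to know what the multiplier actually changes. In matplotlib, the s argument of scatter() sets the marker area in points squared, so the diameter of a bubble grows with the square root of s. Here is a short sketch of what a factor of 25 does to the rainfall values; the names scale, sizes, and diameter_factor are introduced only for illustration.

```python
import math

avg_rainfall = [110, 105, 94, 61, 66, 31, 22, 23, 98, 110, 188, 143]

scale = 25  # the multiplier used in this example
sizes = [scale * n for n in avg_rainfall]  # marker areas in points squared

# Area scales linearly with s, so the diameter scales with its square root
diameter_factor = math.sqrt(scale)

print(min(sizes), max(sizes))  # 550 4700
print(diameter_factor)         # 5.0
```

In other words, multiplying by 25 makes each bubble five times wider, not 25 times wider.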
Additionally, let's make each bubble slightly transparent by setting alpha=0.8. This draws the bubbles at 80% opacity, so overlapping bubbles remain distinguishable.
Here's how the updated code looks:
plt.scatter(months, avg_temperature, s=[25*n for n in avg_rainfall], c=skye, alpha=0.8)
The complete solution is:
import matplotlib.pyplot as plt
# Month names replace the numeric range, so the x-axis is labeled by name
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
avg_temperature = [8, 12, 15, 20, 21, 23, 24, 23, 23, 20, 14, 11]
avg_rainfall = [110, 105, 94, 61, 66, 31, 22, 23, 98, 110, 188, 143]
snow = "lightblue"
rain = "lightgray"
sun = "yellow"
hot = "orange"
skye = [snow, rain, rain, sun, sun, hot, hot, hot, rain, rain, snow, snow]
plt.scatter(months, avg_temperature, s=[25*n for n in avg_rainfall], c=skye, alpha=0.8)
plt.xlabel("Months")
plt.ylabel("Average temperature")
plt.show()
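Since the colors carry the fourth variable, readers need a key to decode them. The following is a minimal sketch of one way to add such a key, using matplotlib.patches.Patch objects as legend entries. The legend labels and title are assumptions made for illustration and do not come from the original example.

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
avg_temperature = [8, 12, 15, 20, 21, 23, 24, 23, 23, 20, 14, 11]
avg_rainfall = [110, 105, 94, 61, 66, 31, 22, 23, 98, 110, 188, 143]

snow = "lightblue"
rain = "lightgray"
sun = "yellow"
hot = "orange"
skye = [snow, rain, rain, sun, sun, hot, hot, hot, rain, rain, snow, snow]

plt.scatter(months, avg_temperature, s=[25 * n for n in avg_rainfall], c=skye, alpha=0.8)

# One colored patch per weather category (labels are illustrative)
handles = [
    Patch(color=snow, label="Snowy"),
    Patch(color=rain, label="Cloudy or rainy"),
    Patch(color=sun, label="Sunny"),
    Patch(color=hot, label="Hot"),
]
legend = plt.legend(handles=handles, title="Predominant weather")

plt.xlabel("Months")
plt.ylabel("Average temperature")
plt.show()
```

Patches are used here instead of the scatter markers themselves so that the legend entries stay small regardless of how large the bubbles are.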
Conclusions
A bubble chart is a representation of up to four variables in the same diagram.
Considering the above, we can point out the following advantages:
It shows the relationship between three variables in one view, which helps reveal trends;
It broadens the scope of analysis;
It provides a bigger picture of the entire dataset, making the data easier to take in at a glance;
We can incorporate a fourth variable by changing the colors of the bubbles.
One of the disadvantages of a bubble chart is that it can become too complex if you overload your plot with data. So, keep it simple!