A scatter plot visualizes how two variables relate to each other by drawing one point (marker) for each pair of values. It is widely used because it is simple to build and easy to read.
In this topic, you’ll become familiar with creating basic scatter plots using matplotlib. In later sections, you’ll learn how to customize such plots.
Building scatter plots
Now, let's try to visualize and compare data, for example, transportation prices from Vienna to Budapest.
You can create a scatter plot by calling plt.scatter() and passing the two variables you wish to compare as its first two arguments. In this case, we want to show the relationship between the means of transportation and the price. The scatter() function also takes the s parameter, which specifies the marker size. In this example, we will also add a label for each axis and a title for the visualization:
import matplotlib.pyplot as plt
travel = ['flight', 'car', 'train', 'taxi']
price = [142, 62, 36, 100]
plt.title("Travel Costs in Euro: Vienna - Budapest")
plt.xlabel("Transportation")
plt.ylabel("Price in Euro")
plt.scatter(travel, price, s=100)
plt.show()
In the next sections, we’ll start exploring more advanced uses of scatter().
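The s parameter described above does not have to be a single number: scatter() also accepts a sequence with one size per point. The sketch below is only an illustration built on the same data. The scaling factor of 2 and the names sizes and points are assumptions chosen for this example, and the Agg backend is selected only so that the code also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt

travel = ['flight', 'car', 'train', 'taxi']
price = [142, 62, 36, 100]

# One marker size per point (in points squared), proportional to the price.
# The factor 2 is an arbitrary choice for this illustration.
sizes = [2 * p for p in price]

plt.title("Travel Costs in Euro: Vienna - Budapest")
plt.xlabel("Transportation")
plt.ylabel("Price in Euro")
points = plt.scatter(travel, price, s=sizes)  # returns the drawn markers
```

With an interactive backend, calling plt.show() afterwards displays the figure, with the most expensive option drawn as the largest marker.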
Understanding the parameters
You’ve learned about the main input parameters to create scatter plots in the sections above. Here’s a summary of key points to remember about the main input parameters:
| Parameter | Description |
|---|---|
| x, y | The two variables whose relationship we want to show. |
| s | Defines the marker size. |
| c | Represents the marker color. |
| marker | Customizes the shape of the marker. |
| cmap | Selects the mapping between values and colors. |
| alpha | A float between 0 (fully transparent) and 1 (fully opaque) that sets the transparency of the markers. |
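To see the parameters from the table working together, here is a minimal sketch that sets all of them in one call. The specific values (size 150, square markers, alpha 0.6) are assumptions picked for illustration, not recommendations, and the Agg backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt

travel = ['flight', 'car', 'train', 'taxi']
price = [142, 62, 36, 100]

points = plt.scatter(
    travel, price,    # x and y: the two variables to compare
    s=150,            # marker size in points squared
    c=price,          # numeric values that are mapped to colors
    cmap='viridis',   # colormap used for that mapping
    marker='s',       # square markers instead of the default circles
    alpha=0.6,        # 0 is fully transparent, 1 is fully opaque
)
plt.xlabel("Transportation")
plt.ylabel("Price in Euro")
```

The alpha parameter is mostly useful when many markers overlap, because semi-transparent markers let dense regions show up as darker areas.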
Now, let's customize the first example of a scatter plot using different parameters. We will keep the same x and y values, in this case, the travel and price lists.
Changing the colors of the plots
A good idea is to give points with different prices different colors. In this case, the c parameter, which determines the colors, will take the price values. The code is shown below:
import matplotlib.pyplot as plt
travel = ['flight', 'car', 'train', 'taxi']
price = [142, 62, 36, 100]
plt.title("Travel Costs in Euro: Vienna - Budapest")
plt.xlabel("Transportation")
plt.ylabel("Price in Euro")
plt.scatter(travel, price, c=price, s=100)
plt.show()
If the colors are not clear enough, let's add another parameter: cmap. It selects the colormap, that is, the rule that maps each numeric value passed in c to a color, so that low and high prices are easy to tell apart. You can take a look at the list of all colormaps available in matplotlib. One of them is called viridis (it is also matplotlib's default colormap, so naming it mainly makes the choice explicit), and we can specify it by passing cmap='viridis'. Let's add it to our scatter() call. We can also add a color bar to make the visualization clearer.
plt.scatter(travel, price, c=price, cmap='viridis', s=100)
plt.colorbar()
The result is the scatter plot shown below.
As you can see, the relationship between the marker color and the price is clear now.
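Any other colormap name can be passed in the same way. As a hedged illustration, the sketch below switches to 'plasma', another built-in matplotlib colormap, and labels the color bar so that the reader knows what the colors encode. The variable names points and cbar are assumptions for this example, and the Agg backend is chosen only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt

travel = ['flight', 'car', 'train', 'taxi']
price = [142, 62, 36, 100]

plt.title("Travel Costs in Euro: Vienna - Budapest")
plt.xlabel("Transportation")
plt.ylabel("Price in Euro")

# Map the prices to colors with the 'plasma' colormap
points = plt.scatter(travel, price, c=price, cmap='plasma', s=100)

# Attach the color bar to these markers and describe its scale
cbar = plt.colorbar(points, label="Price in Euro")
```

Passing the object returned by scatter() to plt.colorbar() makes it explicit which markers the color bar describes, which helps once a figure contains more than one colored plot.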
Customizing the marker
We can choose to show different markers, not just circles. The shape of the points is controlled by the marker parameter, which is a circle by default. In this example, let's make it an x symbol. The function call will be similar to this:
plt.scatter(travel, price, c='orange', marker='x')
The resulting scatter plot is shown in the figure below.
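A single scatter() call uses one marker style for all of its points. If each means of transportation should get its own shape, one possible approach is to call scatter() once per point, as in the sketch below. The markers dictionary and its shape choices are assumptions made for this illustration, and the Agg backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt

travel = ['flight', 'car', 'train', 'taxi']
price = [142, 62, 36, 100]

# Hypothetical mapping from each category to a marker style
markers = {'flight': '^', 'car': 's', 'train': 'D', 'taxi': 'x'}

# One scatter() call per point, because marker applies to the whole call
for name, cost in zip(travel, price):
    plt.scatter([name], [cost], marker=markers[name], s=100, label=name)

plt.xlabel("Transportation")
plt.ylabel("Price in Euro")
legend = plt.legend(title="Transportation")  # explains which shape is which
```

The label argument gives each call an entry in the legend, so the shapes remain readable even without the axis labels.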
Scatter vs. plot functions
You can also create a scatter plot with another function from matplotlib.pyplot. The function plt.plot() is a general-purpose plotting function that draws lines, markers, or both. Passing the format string "o" tells it to draw circle markers without connecting lines, which gives a simple scatter plot:
import matplotlib.pyplot as plt
travel = ['flight', 'car', 'train', 'taxi']
price = [142, 62, 36, 100]
plt.title("Travel Costs in Euro: Vienna - Budapest")
plt.xlabel("Transportation")
plt.ylabel("Price in Euro")
plt.plot(travel, price, "o")
plt.show()
How do you choose between the plot() and scatter() functions? Here is a rule of thumb:
- If you need a basic scatter plot in which all markers look the same, use plt.plot(), especially if you want to prioritize performance.
- If you want to customize your scatter plot by using more advanced plotting features, such as a separate size or color for each point, use plt.scatter().
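The difference behind this rule of thumb can be seen by drawing the same data with both functions side by side. The sketch below is an illustration under a few assumptions: the two-panel layout, the marker sizes, and the variable names lines and points are chosen for this example only, and the Agg backend is used so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt

travel = ['flight', 'car', 'train', 'taxi']
price = [142, 62, 36, 100]

plt.figure(figsize=(9, 4))

# Left panel: plot() draws all markers with one shared size and color
plt.subplot(1, 2, 1)
lines = plt.plot(travel, price, "o", markersize=10)
plt.title("plt.plot()")

# Right panel: scatter() accepts one size and one color value per point
plt.subplot(1, 2, 2)
points = plt.scatter(travel, price, s=[2 * p for p in price],
                     c=price, cmap='viridis')
plt.title("plt.scatter()")
plt.colorbar(points, label="Price in Euro")

plt.tight_layout()
```

Here plot() returns a list containing a single line object whose markers all share one style, while scatter() returns a collection in which every marker can be styled individually. That extra flexibility is what makes scatter() somewhat slower on large datasets.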
Conclusions
In this topic, we've discussed how to create and customize scatter plots using plt.scatter(). You’re ready to start practicing with your datasets and examples. This function gives you a chance to explore your data and present your findings.
You can get the most out of visualization using plt.scatter() by learning more about all the features in matplotlib.