
Data visualization is often the final step when you work with data. Drawing graphs and plots may seem easy, but don't underestimate it! However good your analysis is, few people will appreciate it without a clear visualization. Graphs, charts, and tables help you and your audience gain useful insights into the data.

In this topic, we'll talk about seaborn — a data visualization library built on top of matplotlib. It makes your visualizations both prettier and easier. Many things that take long code lines in matplotlib are straightforward with seaborn. Let's dive in!

Getting started with seaborn

If you don't have seaborn, install it by running pip install seaborn in the command line. Once you have it, let's start with the imports! This is all we need for now:

import seaborn as sb

Note that import seaborn as sns is the more common alias, and it's the one you'll see in the official documentation. In this topic, we stick to sb.

As you already know, seaborn is built on top of matplotlib, so you'll often work with both libraries at once. Seaborn is also closely integrated with pandas, a great library for dealing with data. When it comes to real tasks, you'll need these two as well:

import matplotlib.pyplot as plt
import pandas as pd

Now, let's create some plots!

Seaborn scatterplot

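The good news is that seaborn gives you access to a bunch of example datasets that you can use to practice. They are downloaded from seaborn's online data repository when you load them, so you need an internet connection the first time.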

sb.get_dataset_names() returns the list of all available datasets; sb.load_dataset(name) loads one of them as a pandas DataFrame. In this topic, we'll use the penguins dataset. It contains information on 344 penguins: species, sex, island, and body measurements. Let's save it into a penguins variable:

penguins = sb.load_dataset('penguins')

Take a look at the data:

[Figure: the penguins seaborn dataset]

Let's visualize it. We need to see whether there's an association between the body mass and flipper length. That sounds like a scatterplot! The function is sb.scatterplot(), and we'll pass it three keyword arguments: x, y, and data. The x and y arguments are the column names (str) for the x- and y-axis, and data is the DataFrame that holds those columns. So your code would look like this:

sb.scatterplot(x='flipper_length_mm', y='body_mass_g', data=penguins)

That's already a bit simpler than matplotlib — less code to specify the variables you want on your plot. Let's see what we've got:

[Figure: Make a scatterplot with the seaborn library]

As you see, the axis labels have been added automatically. Of course, it's possible to change them. sb.scatterplot() returns a matplotlib Axes object, so save it to a variable (named plot in our example) and call its .set() method. It accepts the keyword arguments xlabel, ylabel, and title, each of which takes a str. Let's try that:

plot = sb.scatterplot(x='flipper_length_mm', y='body_mass_g', data=penguins)
plot.set(xlabel='flipper length', ylabel='body mass', title='Penguins body mass and flipper length')

Now our plot looks like this:

[Figure: Set labels and title on a scatterplot with seaborn]
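To see what .set() actually changes, here is a minimal sketch. It assumes a tiny hand-made DataFrame named toy (a hypothetical stand-in for the penguins data, so the snippet runs without downloading anything) and a non-interactive matplotlib backend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook

import pandas as pd
import seaborn as sb

# Hypothetical stand-in for a few rows of the penguins dataset
toy = pd.DataFrame({
    "flipper_length_mm": [181, 186, 195, 210, 220],
    "body_mass_g": [3750, 3800, 3250, 4500, 5400],
})

plot = sb.scatterplot(x="flipper_length_mm", y="body_mass_g", data=toy)

# Before .set(), the axis label is simply the column name
default_xlabel = plot.get_xlabel()

# plot is a matplotlib Axes, so .set() and the other Axes methods are available
plot.set(xlabel="flipper length", ylabel="body mass")
plot.set_title("Penguins body mass and flipper length")

print(default_xlabel, "->", plot.get_xlabel())
```

Since plot is a regular matplotlib Axes, anything you already know about customizing Axes objects should carry over.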

It's a bit more informative now, but what if we want to show another variable? In seaborn, we can do it with just one keyword argument! We have a choice: hue (color), size, or style (the marker's form). They are arguments of sb.scatterplot() that take a str: the column name. For example, we want to analyze the body mass and flipper length depending on sex. Let's add that info as color:

plot = sb.scatterplot(x='flipper_length_mm', y='body_mass_g', hue='sex', data=penguins)

Here's our plot:

[Figure: Use hue to distinguish between features on a scatterplot]

As you can see, the legend is added automatically. We can see that females generally have less body mass and shorter flippers than males. As for the size and style arguments, you can use them for the same variable as hue, so that dots vary not only in color but also in form or size. Another option is to use them to represent other variables (for example, hue for sex, size for bill length), but be careful not to overload your plot with information. Too much info on one plot makes it hard to interpret.
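Both options can be combined in one call. The sketch below is only an illustration: toy is a made-up DataFrame with penguin-like columns, hue and style are mapped to the same variable (sex), and size carries a second variable (bill length).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook

import pandas as pd
import seaborn as sb

# Hypothetical stand-in for a few rows of the penguins dataset
toy = pd.DataFrame({
    "flipper_length_mm": [181, 186, 195, 210, 220, 215],
    "body_mass_g": [3750, 3800, 3250, 4500, 5400, 5000],
    "bill_length_mm": [39.1, 39.5, 40.3, 46.1, 50.0, 48.7],
    "sex": ["Male", "Female", "Female", "Female", "Male", "Male"],
})

plot = sb.scatterplot(
    x="flipper_length_mm",
    y="body_mass_g",
    hue="sex",              # color encodes sex
    style="sex",            # marker shape repeats the same variable
    size="bill_length_mm",  # dot size encodes another variable
    data=toy,
)

# The automatic legend now describes every encoding used on the plot
legend_labels = [text.get_text() for text in plot.get_legend().get_texts()]
print(legend_labels)
```

Repeating a variable in hue and style keeps the groups distinguishable even when the plot is printed in grayscale.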

Seaborn pair plot

If you happen to have a lot of variables in your dataset, it may be a good idea to study the relationship between each pair of variables separately. Seaborn lets us do that easily with the sb.pairplot() function. Let's try that:

sb.pairplot(hue='species', data=penguins)

The result is quite impressive:

[Figure: Make a pair plot with the seaborn library]

This matrix may seem overwhelming at first, but after a closer look, you'll see that it gives a compact overview of the data. In this example, we've used color (the hue argument) to represent the species of penguins.

Here you see that each column has a common x-axis (for example, bill length for the first column), and each row has a common y-axis (like bill depth for the second row). The plots on the diagonal show the univariate distribution (the distribution of one variable), while the other plots are scatterplots that show the bivariate distribution (the joint distribution of two variables).

Matrices of this kind are useful and easy to plot with seaborn.

Styling options

Do you have a feeling that this topic is missing something? Right, we haven't used any of the styling opportunities yet! That's a big shame since seaborn is known for hyping plots up. It's esthetics time!

Seaborn makes styling easier by providing default themes. That means you can set a style that is used in all of your plots! It's a perfect tool to avoid styling each plot separately. Themes are set with sb.set_style(). This function can take two arguments: style and rc. The style argument takes the name of a pre-configured style or a dict of parameters if you want to create your own style. The rc argument takes a dict and can override some of the parameters defined in style. For now, we'll just use one of the ready-to-use styles: darkgrid, whitegrid, dark, white, and ticks. Let's try the darkgrid theme first:

sb.set_style('darkgrid')

After you've used this function once, all your plots will look like this:

[Figure: Set style on all plots with the seaborn library]

Much prettier than in the previous sections, right? The best thing is that you won't need to repeat the styling function all the time.
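The rc argument mentioned above deserves a tiny illustration. The sketch below starts from the darkgrid theme and overrides a single parameter; the choice of grid.linestyle is just an example, and sb.axes_style() called without arguments is used to read back the parameters currently in effect.

```python
import seaborn as sb

# Start from the built-in darkgrid theme and override one parameter via rc
sb.set_style("darkgrid", rc={"grid.linestyle": "--"})

# With no arguments, axes_style() returns the style parameters now in effect
current = sb.axes_style()
print(current["grid.linestyle"])

# The same dict lists every parameter that a custom style is allowed to change
print(sorted(current))
```

Printing the keys is a handy way to discover which parameters you can put into your own style dict.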

We can also change the colors. To do that, we can set a color palette with the sb.set_palette() function. It's also used once and then applied to all the plots. It takes a str as argument: either a built-in seaborn color palette (deep, muted, bright, pastel, dark, or colorblind) or a matplotlib colormap. Here's how you do that:

sb.set_palette('muted')

Now here's our plot:

[Figure: Set the color palette on all plots with seaborn]

Not much different from the previous one! All that changed is the tone, but the colors are the same. That's because all six built-in palettes consist of the same colors; the difference is only in the tone. If you want different colors, use the names of matplotlib colormaps.

It's considered good practice to put sb.set_style() and sb.set_palette() at the beginning of your code, usually right after the import statements.

Another useful trick is that seaborn lets you preview a palette with the sb.color_palette() and sb.palplot() functions. Let's see what the built-in bright palette looks like:

palette = sb.color_palette('bright')
sb.palplot(palette)

As you can see, sb.color_palette() takes the name of a palette or colormap (str), and sb.palplot() takes the result of sb.color_palette(). Here's what we get:

[Figure: Visualize the color spectrum of the bright palette]
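If you'd rather compare palettes without drawing anything, here's a small sketch. It relies on sb.color_palette() returning a list of RGB tuples; the n_colors=3 limit is an arbitrary choice to keep the output short.

```python
import seaborn as sb

# Take the first three colors of two built-in palettes and one matplotlib colormap
muted = sb.color_palette("muted", n_colors=3)
bright = sb.color_palette("bright", n_colors=3)
accent = sb.color_palette("Accent", n_colors=3)

# as_hex() converts the RGB tuples to hex strings, which are easier to read
print("muted: ", muted.as_hex())
print("bright:", bright.as_hex())
print("Accent:", accent.as_hex())
```

Comparing the printed values for muted and bright side by side shows how the two built-in palettes relate, while Accent comes from a matplotlib colormap.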

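Another useful function is sb.despine(). By default, it removes the top and right borders (spines) of your plot, which aren't necessary for the analysis. The effect is noticeable with the white, whitegrid, and ticks themes; the darkgrid and dark themes don't show visible borders in the first place.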

The sb.despine() function is used separately for each plot.
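As a quick check of what sb.despine() does to a plot, here's a minimal sketch. It assumes the white theme and a small made-up DataFrame (toy), and it reads the visibility of each border from the Axes object after the call.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook

import pandas as pd
import seaborn as sb

sb.set_style("white")

# Hypothetical stand-in for a few rows of the penguins dataset
toy = pd.DataFrame({
    "flipper_length_mm": [181, 186, 195, 210, 220],
    "body_mass_g": [3750, 3800, 3250, 4500, 5400],
})

plot = sb.scatterplot(x="flipper_length_mm", y="body_mass_g", data=toy)

# Without arguments, despine() acts on the current figure
sb.despine()

# Check which of the four borders (spines) are still drawn
visible = {side: spine.get_visible() for side, spine in plot.spines.items()}
print(visible)
```

The function also accepts keyword arguments such as left=True or bottom=True if you want to remove other borders too.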

Now, let's sum up everything we've learned about styles in seaborn and test other colors and styles on our penguin plot:

# set the default style and color palette
sb.set_style('whitegrid')
sb.set_palette('Accent')

# create a plot
plot = sb.scatterplot(x='flipper_length_mm', y='body_mass_g', hue='sex', data=penguins)
# set labels and title to the plot
plot.set(xlabel='flipper length', ylabel='body mass', title='Penguins body mass and flipper length')

# remove the top and right borders (spines)
sb.despine()

Our plot looks like this:

[Figure: Customize scatterplot with hue, palette, style, and despine]

What we've described here is just the tip of the iceberg. Seaborn lets you create your own styles and use them in sb.set_style(), so you can change almost anything you want: from fonts to the line width and color of the grid. The official documentation lists the full set of parameters.

Conclusion

In this topic, we've discussed the basics of the seaborn library: loading datasets, creating plots and matrices of plots, and styling your visualizations. We used scatterplots as an example, but of course, seaborn can create many other kinds of plots: histograms, heatmaps, violin plots, bar plots, and more. Many plots that take a lot of matplotlib code are simpler to create with seaborn.

To learn more about different plots in seaborn, make sure to check out the official documentation. It is very clear and easy to understand.

Now let's practice what you've learned!
