Data visualization with Python

10-minute read

In data analysis, communicating insights effectively is as important as the analysis itself. Data visualization is a key tool for this, letting analysts and data scientists present complex information in an easily understandable form. In the Python ecosystem, two libraries stand out for creating clear and informative visualizations: Matplotlib and Seaborn.

Matplotlib: A Foundation for Visualization

Matplotlib is a versatile and comprehensive plotting library, focused mainly on 2D graphics, and it serves as the foundation for many other Python visualization tools, including Seaborn. With Matplotlib, you can create static, animated, and interactive plots, which makes it well suited to both exploratory data analysis and presentation.

Basics of Matplotlib:

Getting started with Matplotlib involves importing its pyplot module, conventionally under the alias plt:

import matplotlib.pyplot as plt

Now let's go over the basic commands we will use to create graphs in Matplotlib:

plt.plot(): Generates line plots, with options to customize line styles, colors, and markers.

plt.scatter(): Creates scatter plots that show individual data points, with options to customize markers and colors.

plt.bar(): Visualizes categorical data as bar charts, with control over bar widths, colors, and positions.

plt.hist(): Draws histograms to show the distribution of numerical data, with options for the number of bins and colors.

plt.xlabel(), plt.ylabel(): Label the x- and y-axes, making the plot easier to read and interpret.

plt.title(): Adds a title to the plot, providing context and summarizing the main insight.

plt.legend(): Adds a legend that distinguishes between multiple datasets, using the label= values passed to the plotting calls.

plt.subplot(): Creates a subplot at a given position in a grid inside a larger figure, so that several plots can be displayed side by side.

plt.show(): Displays the figure, rendering it visible to the user.
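Several of the commands listed above (plt.scatter(), plt.bar(), plt.subplot(), and plt.legend()) are not used in the examples that follow, so here is a minimal sketch that combines them in one figure. The data values, category names, and colors are made up purely for illustration.

```python
import matplotlib.pyplot as plt

# Illustrative data (made up for this example)
x = [1, 2, 3, 4, 5]
y_a = [2, 3, 5, 7, 11]
y_b = [1, 4, 9, 16, 25]
categories = ['A', 'B', 'C']
counts = [5, 3, 8]

plt.figure(figsize=(10, 4))

# Left panel: grid of 1 row x 2 columns, position 1
plt.subplot(1, 2, 1)
plt.scatter(x, y_a, color='tab:blue', marker='o', label='Series A')
plt.scatter(x, y_b, color='tab:red', marker='^', label='Series B')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot')
plt.legend()  # builds the legend from the label= arguments above

# Right panel: grid of 1 row x 2 columns, position 2
plt.subplot(1, 2, 2)
bars = plt.bar(categories, counts, color='seagreen', width=0.6)
plt.xlabel('Category')
plt.ylabel('Count')
plt.title('Bar Chart')

plt.tight_layout()  # reduce overlap between the two panels
plt.show()
```

The pyplot functions always act on the "current" subplot, which is why each panel's labels are set right after the corresponding plt.subplot() call.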

Graphs in Matplotlib

Now that we know the main commands for creating graphs, let's put them together and see the results.

Here is how we make a line plot:

A line plot is a fundamental visualization that represents the relationship between two variables. In the example, we plot values on the x-axis against the corresponding values on the y-axis, which here grow linearly (y = 2x). Line plots are useful for illustrating trends over a continuous range.

import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Create a simple line plot
plt.plot(x, y)

# Adding labels and title
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Simple Line Plot')

# Display the plot
plt.show()

Simple Line Plot
Now let's create an orange histogram with 5 bins:

Histograms provide a visual representation of the distribution of a dataset. The example displays the frequency of values within specified bins. This type of plot is valuable for understanding the underlying pattern and distribution of continuous data.

import matplotlib.pyplot as plt

# Sample data
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5]

# Create a histogram
plt.hist(data, bins=5, color='orange', edgecolor='black')

# Adding labels and title
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Histogram')

# Display the plot
plt.show()

Histogram
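To see exactly which frequencies the histogram draws, it helps to know that plt.hist() returns three values: the count in each bin, the bin edges, and the bar patches it drew. The sketch below reuses the same sample data and prints each bin's range and count.

```python
import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5]

# plt.hist returns (counts per bin, bin edges, drawn bar patches)
counts, edges, patches = plt.hist(data, bins=5, color='orange', edgecolor='black')

# With bins=5 over the data range 1..5, each bin is (5 - 1) / 5 = 0.8 wide.
# Bins include their left edge; the last bin also includes its right edge.
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {int(n)}")

plt.show()
```

For this data the loop prints counts of 1, 2, 3, 2, and 4, matching the bar heights in the histogram above.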

These minimal examples show how straightforward Matplotlib's syntax is. You can customize plots extensively by changing colors, line styles, and markers. For more commands and details, refer to the official Matplotlib documentation.
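As one illustration of that customization, the sketch below draws two lines on the same axes with different colors, line styles, and markers, and adds a light grid. The data and style choices are arbitrary; any of Matplotlib's named colors, line styles ('-', '--', ':', '-.'), and marker codes could be substituted.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]
y2 = [1, 3, 5, 7, 9]

# color, linestyle, marker, linewidth and markersize are standard line keyword arguments
line1, = plt.plot(x, y1, color='navy', linestyle='--', marker='o', linewidth=2, label='y = 2x')
line2, = plt.plot(x, y2, color='crimson', linestyle=':', marker='s', markersize=8, label='y = 2x - 1')

plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Customized Line Plot')
plt.grid(True, alpha=0.3)  # faint background grid
plt.legend()
plt.show()
```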

Seaborn: Enhancing Aesthetics and Simplicity

While Matplotlib provides a solid foundation, Seaborn is a high-level interface built on top of it that offers a simpler way to create attractive statistical graphics.

Key Features of Seaborn:

  1. Built-in Themes and Color Palettes: Seaborn comes with several built-in themes and color palettes that instantly elevate the visual appeal of plots. Users can easily switch between themes to find the most suitable style for their data.

  2. Statistical Estimation: Seaborn makes it easy to incorporate statistical estimation into visualizations. For example, the sns.regplot function draws a scatter plot together with a fitted linear regression line and a confidence band around it, revealing trends in the data.
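Both features can be sketched in a few lines. This example assumes Seaborn 0.11 or newer, where sns.set_theme() is available, and uses synthetic data generated with NumPy; the slope, noise level, and random seed are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Apply a built-in style and color palette to all subsequent plots
sns.set_theme(style='whitegrid', palette='deep')

# Synthetic data: a noisy linear trend (values chosen only for illustration)
rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 2.0, size=50)

# regplot draws the scatter points, a fitted regression line, and a
# confidence band around the fit (95% by default)
ax = sns.regplot(x=x, y=y)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Scatter Plot with Regression Line')
plt.show()
```

Switching to another built-in look is a matter of changing the style argument, for example to 'darkgrid' or 'ticks'.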

To import the Seaborn library, run the following command:

import seaborn as sns

Now let's go over the basic Seaborn commands:

sns.relplot(): Creates relational plots (scatter plots or line plots, chosen with kind=) for exploring relationships between variables.

sns.catplot(): Generates different kinds of categorical plots (for example strip, swarm, box, violin, and bar, chosen with kind=) for visualizing relationships involving categorical variables.

sns.histplot(): Visualizes the distribution of univariate data as a histogram, with options to customize bins and colors.

sns.boxplot(): Summarizes the distribution of a numeric variable, optionally split by category, using box-and-whisker plots that highlight the median, quartiles, and outliers.

sns.heatmap(): Plots 2D (matrix-shaped) data, using color intensity to represent values; commonly used for correlation matrices.

sns.pairplot(): Creates a grid of scatter plots for every pair of variables, with each variable's distribution (histogram or density curve) on the diagonal, giving an overview of relationships between multiple variables.

sns.lineplot(): Draws line plots to visualize trends over continuous or categorical variables.

sns.barplot(): Shows an estimate of a numeric variable for each category (the mean, by default) as bars, with error bars indicating its uncertainty.

sns.countplot(): Creates bar plots that count the occurrences of observations in each category.

sns.set(): Sets aesthetic parameters such as the style and color palette, customizing the appearance of subsequent plots. In newer Seaborn releases the preferred name is sns.set_theme(), and sns.set() remains an alias for it.

Labels, titles, and legends: Seaborn has no sns.xlabel(), sns.ylabel(), sns.title(), or sns.legend() functions. Because Seaborn draws on Matplotlib axes, use plt.xlabel(), plt.ylabel(), plt.title(), and plt.legend(), or the equivalent methods (such as ax.set_xlabel() and ax.set_title()) on the Axes object returned by axes-level Seaborn functions.
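Because Seaborn draws onto Matplotlib axes, titles and axis labels come from Matplotlib. The sketch below pairs sns.countplot() with plt.xlabel(), plt.ylabel(), and plt.title(); the pet observations are made up for illustration.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up categorical observations
pets = ['cat', 'dog', 'dog', 'fish', 'cat', 'dog']

# countplot counts how often each category occurs and returns the Matplotlib Axes
ax = sns.countplot(x=pets, color='steelblue')

# Seaborn has no sns.xlabel()/sns.title(); Matplotlib's functions label the plot
plt.xlabel('Pet')
plt.ylabel('Count')
plt.title('Seaborn Count Plot')
plt.show()
```

Calling ax.set_xlabel('Pet') on the returned Axes would have the same effect as plt.xlabel('Pet') here, since that Axes is the current one.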

Creating a Seaborn Plot

Let's create a simple box plot using the code below:

A box plot, or box-and-whisker plot, summarizes the distribution of a dataset and highlights key statistics such as median, quartiles, and outliers. In this example, the box represents the interquartile range, providing insights into the spread of the data.

import matplotlib.pyplot as plt
import seaborn as sns

# Sample data
data = [20, 35, 15, 25, 30, 40, 50]

# Create a horizontal box plot using Seaborn (values along the x-axis)
sns.boxplot(x=data, color='purple')

# Adding labels and title
plt.xlabel('Values')
plt.title('Seaborn Box Plot')

# Display the plot
plt.show()

Seaborn Box Plot
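To connect the drawing with the statistics it summarizes, the sketch below computes the quartiles, the interquartile range (IQR), and the whisker limits for the same data with NumPy. It assumes the default whisker rule of 1.5 times the IQR, which is what Seaborn's box plot uses unless whis= is changed, and NumPy's default linear interpolation for quartiles; other quartile conventions can give slightly different numbers.

```python
import numpy as np

data = [20, 35, 15, 25, 30, 40, 50]

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # the length of the box

# Whiskers reach the most extreme points within 1.5 * IQR of the box;
# any value beyond these fences would be drawn as an outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in data if v < lower_fence or v > upper_fence]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"fences: {lower_fence} to {upper_fence}, outliers: {outliers}")
```

Here Q1 is 22.5, the median 30.0, and Q3 37.5, so the fences are 0.0 and 60.0. No value falls outside them, which is why the plot shows no outlier markers and the whiskers end at 15 and 50.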

Now let's visualize a heatmap:

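Heatmaps are particularly useful for visualizing relationships in a matrix. In our example, we compute the correlation matrix of randomly generated data and display it as a heatmap. With the 'coolwarm' colormap, red cells mark strong positive correlations, blue cells strong negative ones, and values near zero appear in a pale neutral color, giving a quick overview of patterns in the data. Because the variables here are independent random numbers, the off-diagonal cells stay close to zero and only the diagonal (each variable with itself) shows a correlation of 1.

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Sample data: 100 observations of 5 random variables
data = np.random.rand(100, 5)

# Correlation matrix between the 5 variables (columns), shape (5, 5)
corr = np.corrcoef(data, rowvar=False)

# Create a heatmap using Seaborn; fixing vmin/vmax to the full
# correlation range keeps 0 at the center of the colormap
sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1)

# Adding title
plt.title('Seaborn Heatmap')

# Display the plot
plt.show()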

Seaborn Heatmap

Seaborn's concise syntax and attractive defaults make it an excellent choice for users who value simplicity without sacrificing visual appeal. These commands are a starting point for creating many types of plots in Seaborn; for more details and customization options, refer to the official Seaborn documentation.

Conclusion

Matplotlib and Seaborn are versatile tools in the Python data visualization landscape that cover a wide range of needs. Whether you need the extensive customization and control of Matplotlib or the simplicity and polished defaults of Seaborn, mastering these libraries will help you turn raw data into insightful and compelling visualizations. Now, let's assess your new knowledge and apply these skills in practice!
