Matplotlib, Seaborn, and Bokeh are all libraries used for data visualization in Python, but they have some key differences in terms of their features and use cases.
Matplotlib is a versatile library that provides a wide range of plot types and customization options, making it a popular choice for creating static visualizations. It has a large user base and extensive documentation, making it relatively easy to learn and use. An example of a visualization that is more suitable for Matplotlib would be a line chart that shows the trend of a stock price over time:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [10, 12, 8, 15, 20]
plt.plot(x, y)
plt.title('Stock Price Trend')
plt.xlabel('Time')
plt.ylabel('Price')
plt.show()
Seaborn, on the other hand, is a library that is specifically designed for statistical data visualization. It provides a high-level interface for creating attractive and informative statistical graphics. Seaborn has built-in support for various statistical plots such as scatterplots, lineplots, barplots, and histograms, making it a popular choice for exploratory data analysis. An example of a visualization that is more suitable for Seaborn would be a scatterplot that shows the relationship between two variables.
import seaborn as sns
import pandas as pd
df = pd.read_csv('data.csv')
sns.scatterplot(x='age', y='income', data=df)
sns.set_style('darkgrid')
sns.set_palette('dark')
sns.set(rc={'figure.figsize':(10,8)})
plt.title('Income vs. Age')
plt.show()
Bokeh is a library that is used for creating interactive visualizations that can be displayed in web browsers. It provides a set of tools for creating interactive plots, dashboards, and data applications. Bokeh is ideal for creating visualizations that allow users to interact with data in real-time. An example of a visualization that is more suitable for Bokeh would be an interactive heatmap that shows the correlation between various features of a dataset.
from bokeh.io import output_file, show
from bokeh.layouts import gridplot
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource
from bokeh.transform import linear_cmap
import pandas as pd
df = pd.read_csv('data.csv')
source = ColumnDataSource(df)
corr = df.corr()
values = corr.values.tolist()
features = corr.columns.tolist()
p = figure(title='Correlation Heatmap',
x_range=features, y_range=features,
tools='hover,save', toolbar_location='below')
mapper = linear_cmap(field_name='value', palette='Spectral11', low=-1, high=1)
p.rect(x='x', y='y', width=1, height=1,
source=source,
fill_color=mapper,
line_color=None)
p.xaxis.major_label_orientation = 1
p.xaxis.axis_label = 'Features'
p.yaxis.axis_label = 'Features'
output_file('heatmap.html')
show(p)
Seaborn is a Python library used for data visualization. It has many functions to create different types of plots. Here are some of the main functions to create relational, categorical, and distribution plots:
Example:
import seaborn as sns
tips = sns.load_dataset("tips")
sns.scatterplot(x="total_bill", y="tip", data=tips)
This code will create a scatter plot of the “total_bill” column on the x-axis and the “tip” column on the y-axis from the “tips” dataset.
Example:
import seaborn as sns
tips = sns.load_dataset("tips")
sns.boxplot(x="day", y="total_bill", data=tips)
This code will create a box plot of the “total_bill” column on the y-axis and the “day” column on the x-axis from the “tips” dataset.
Example:
import seaborn as sns
tips = sns.load_dataset("tips")
sns.histplot(data=tips, x="total_bill")
This code will create a histogram of the “total_bill” column from the “tips” dataset.
The role of the Seaborn Cheat Sheet can be used as a reference to the many ways to display the visualization of data. I can see it being extremely useful since there are so many different types of graphs to plot for data.
Some key sections that I see being helpful for my work flow and reference is the ‘Categorical Plots’ section and the Basics/Data section. With these 2 sections it should not be too hard getting started.
I definitely want to learn more about the further customization of these data plots from seaborn that I saw in the cheat sheet. Also, further my knowledge on using this in combination with pandas to create beautiful charts.
I also want to know when is it preferred to use sklearn’s LinearRegression model over seaborn’s? Or do we just usually use the more powerful tool, in this case, seaborn.
References