12. Seaborn - Data Visualization
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
We use the load_dataset() function to load example datasets from seaborn's online data repository.
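The notebook's code cells were not preserved, so the examples in this section are sketches. They assume a recent seaborn release (0.11 or later) and an internet connection. On first use, load_dataset() downloads the named CSV, caches it locally and returns a pandas DataFrame.

```python
import seaborn as sns

# Download (or read from the local cache) the "tips" example dataset.
tips = sns.load_dataset("tips")

print(tips.head())  # first five rows, as in the output above
print(tips.shape)   # (number of rows, number of columns)
```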
     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
..          ...   ...     ...    ...   ...     ...   ...
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2
244 rows × 7 columns
Creating a graph with seaborn is as easy as with matplotlib. Suppose we want to create a scatter plot of the relationship between total bill and tip. All you need to do is call the scatterplot() function.
Note: In seaborn we use scatterplot() to create a scatter plot, instead of scatter() as we did in matplotlib.
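Here is a minimal sketch of that scatter plot. It assumes tips is loaded with sns.load_dataset("tips"), passes the DataFrame through data=, and names the columns as strings.

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

# One marker per row: total_bill on the x-axis, tip on the y-axis.
# seaborn uses the column names as the axis labels.
ax = sns.scatterplot(data=tips, x="total_bill", y="tip")
plt.show()
```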
hue: (optional) This parameter takes a column name whose values are used for colour encoding.
Suppose we also want to know whether each customer was a smoker, along with total_bill and tip. We can show this third piece of information with hue.
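A sketch of the same plot with hue added, again assuming tips comes from sns.load_dataset("tips"). The smoker column holds "Yes"/"No", so each point gets one of two colours and a legend is drawn.

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

# Colour each point by the value of the smoker column.
ax = sns.scatterplot(data=tips, x="total_bill", y="tip", hue="smoker")
plt.show()
```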
You can also change the order of the hue levels with the hue_order parameter.
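A sketch of hue_order, assuming the same tips DataFrame. The list fixes the order in which the levels receive palette colours and appear in the legend. Here "No" comes first.

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

# List the hue levels explicitly, in the order we want them.
ax = sns.scatterplot(
    data=tips, x="total_bill", y="tip",
    hue="smoker", hue_order=["No", "Yes"],
)
plt.show()
```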
What if you want to change the colours used for smokers and non-smokers? For this you can use the palette parameter.
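One way to do this, sketched under the same assumptions: pass palette a dictionary that maps each hue level to a colour. The red/blue choice is arbitrary. palette also accepts a list of colours or a named palette such as "Set2".

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

# Map each smoker value to an explicit matplotlib colour name.
ax = sns.scatterplot(
    data=tips, x="total_bill", y="tip", hue="smoker",
    palette={"Yes": "red", "No": "blue"},
)
plt.show()
```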
smoker is a categorical variable, but hue also accepts numeric columns. Let's say I want to colour-encode size (the number of people at the table).
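A sketch with a numeric hue, assuming the same tips DataFrame. Because size is numeric, seaborn maps it to a sequential colour gradient rather than a set of unrelated colours.

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

# size is an integer column (party size), so the colours form a gradient.
ax = sns.scatterplot(data=tips, x="total_bill", y="tip", hue="size")
plt.show()
```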
relplot (relational plot) is used for many purposes, but the main reason to use relplot is subplotting. Suppose we want to split our graph into subplots for smokers and non-smokers.
We can use the row and col parameters to achieve this.
Let's see this with an example.
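A sketch using col, assuming tips is loaded as before. relplot returns a FacetGrid rather than a single Axes. With col="smoker" it holds one subplot per smoker value, side by side.

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

# One subplot (facet) per value of smoker, arranged as columns.
g = sns.relplot(data=tips, x="total_bill", y="tip", col="smoker")
plt.show()
```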
What if I use the row parameter instead?
The subplots are then stacked vertically, with one row for each value of smoker.
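The row variant, sketched with the same assumptions. Only the parameter name changes, and the grid becomes two rows by one column.

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

# One subplot per value of smoker, stacked as rows.
g = sns.relplot(data=tips, x="total_bill", y="tip", row="smoker")
plt.show()
```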
We can do the same with size by passing col="size".
Now all six subplots sit side by side in a single row, which gets cramped. We can use col_wrap to limit the number of subplots per row; the remaining subplots wrap onto new rows.
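Both layouts side by side, as a sketch assuming tips is loaded as before. The wrap width of 3 is an arbitrary choice. tips has six distinct party sizes, so the wrapped grid has two rows of three.

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

# One facet per party size, all in a single row.
g_wide = sns.relplot(data=tips, x="total_bill", y="tip", col="size")

# Same facets, at most three per row; the rest wrap onto a second row.
g_wrapped = sns.relplot(data=tips, x="total_bill", y="tip", col="size", col_wrap=3)
plt.show()
```

Note that col_wrap cannot be combined with row.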
What if I want smokers split across columns and time split across rows? We can use col and row together in a single plot.
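A sketch of combining both, assuming the same tips DataFrame. smoker has two values and time has two (Lunch, Dinner), so the result is a 2 × 2 grid.

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

# Columns split by smoker, rows split by time.
g = sns.relplot(data=tips, x="total_bill", y="tip", col="smoker", row="time")
plt.show()
```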
What if one of the axes is a categorical column?
relplot also offers a size parameter, which maps a column to the size of the markers. Let's see this with an example.
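A sketch of the size parameter, assuming tips is loaded as before. The optional sizes=(15, 200) tuple is an illustrative choice. It sets the smallest and largest marker area (in points squared), and seaborn scales the party sizes between those two bounds.

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

# Larger parties get larger markers.
g = sns.relplot(
    data=tips, x="total_bill", y="tip",
    size="size",       # column mapped to marker size
    sizes=(15, 200),   # (min, max) marker area
)
plt.show()
```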
Seaborn offers another dataset named flights. Let's see this data first.
     year month  passengers
0    1949   Jan         112
1    1949   Feb         118
2    1949   Mar         132
3    1949   Apr         129
4    1949   May         121
..    ...   ...         ...
139  1960   Aug         606
140  1960   Sep         508
141  1960   Oct         461
142  1960   Nov         390
143  1960   Dec         432
144 rows × 3 columns
Suppose we want to know how many passengers were on board in a particular month and year.
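The original cell is lost, so this is only one possible way to answer the question, not necessarily the author's. It assumes flights is loaded with sns.load_dataset("flights") and draws a line plot of passengers over the years, with one line per month.

```python
import matplotlib.pyplot as plt
import seaborn as sns

flights = sns.load_dataset("flights")

# kind="line" switches relplot from scatter to line plot;
# hue="month" draws a separate line for each month.
g = sns.relplot(data=flights, x="year", y="passengers", hue="month", kind="line")
plt.show()
```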
When one column is categorical and the other is numerical, we generally don't use relplot. Let me show you why.
This is clearly not a readable graph. So whenever we deal with a categorical column, we use catplot (categorical plot).
A categorical plot is usually the best option when one variable is categorical and the other is numerical.
Suppose we want to see the total bill by day. We can use catplot, since day is a categorical variable while total_bill is numerical.
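A sketch of the catplot call, assuming tips is loaded as before. Without a kind argument, catplot draws a strip plot: one jittered dot per bill, grouped by day.

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

# Categorical variable on x, numeric variable on y; kind="strip" is the default.
g = sns.catplot(data=tips, x="day", y="total_bill")
plt.show()
```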
You can change the graph to a bar chart by using kind='bar'. By default it shows the mean of each category with an error bar.
You can change the graph to a point plot by using kind='point'.
You can change the graph to a violin plot by using kind='violin'.
From this violin chart it is easy to see the distribution of total bills on each day. For example, on Thursday most total bills were between 10 and 20 dollars.
You can change the graph to a boxen plot by using kind='boxen'.
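A boxen chart (letter-value plot) gives a picture similar to a violin chart, but instead of a smoothed density curve it draws a series of nested boxes for successive quantiles. The boxes closest to the median are shaded darkest, so the darkest area is where the central bulk of the data lies.
The horizontal line inside the boxes marks the median of the data.
The dots are outliers, i.e. observations beyond the outermost box.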
Finally, you can change the graph to a classic box plot by using kind='box'.
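The kinds above differ only in the kind argument, so a loop can draw them all for comparison. This is a sketch assuming the same tips DataFrame. The dictionary and the titles exist only to keep the figures apart.

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

grids = {}
for kind in ["bar", "point", "violin", "boxen", "box"]:
    # Same data and axes each time; only the representation changes.
    g = sns.catplot(data=tips, x="day", y="total_bill", kind=kind)
    g.ax.set_title(f"kind='{kind}'")
    grids[kind] = g

plt.show()
```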
seaborn.countplot() shows the counts of observations in each categorical bin.
Suppose we want to count how many pokemon there are of each Type 1 in our pokemon dataset. We can use countplot here.
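The pokemon dataset used in the original notebook is not reproduced here. As a stand-in, this sketch counts how many rows of tips fall on each day. The bar heights equal the per-category counts, so no numeric column is needed.

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

# One bar per day, whose height is the number of rows for that day.
ax = sns.countplot(data=tips, x="day")
plt.show()
```

With a DataFrame named pokemon that has a "Type 1" column, loaded yourself (for example with pandas.read_csv), the equivalent call would be sns.countplot(data=pokemon, x="Type 1").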
seaborn.pairplot() plots pairwise relationships in a dataset.
By default, this function creates a grid of Axes such that each numeric variable in data is shared across the y-axes of a single row and the x-axes of a single column. The diagonal subplots show the distribution of each variable on its own.
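A sketch with the tips DataFrame, assuming it is loaded as before. tips has three numeric columns (total_bill, tip, size), so pairplot returns a PairGrid with a 3 × 3 grid of subplots.

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

# Only the numeric columns are used; categorical ones are ignored.
g = sns.pairplot(tips)
plt.show()
```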
You can also add the hue parameter to colour the points by the sex of the customers.
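The same pairplot with hue="sex", sketched under the same assumptions. When hue is set, pairplot colours every subplot by that column and, by default, draws density curves on the diagonal.

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

# Colour every subplot by the sex column.
g = sns.pairplot(tips, hue="sex")
plt.show()
```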
The iris dataset:
     sepal_length  sepal_width  petal_length  petal_width    species
0             5.1          3.5           1.4          0.2     setosa
1             4.9          3.0           1.4          0.2     setosa
2             4.7          3.2           1.3          0.2     setosa
3             4.6          3.1           1.5          0.2     setosa
4             5.0          3.6           1.4          0.2     setosa
..            ...          ...           ...          ...        ...
145           6.7          3.0           5.2          2.3  virginica
146           6.3          2.5           5.0          1.9  virginica
147           6.5          3.0           5.2          2.0  virginica
148           6.2          3.4           5.4          2.3  virginica
149           5.9          3.0           5.1          1.8  virginica
150 rows × 5 columns
As discussed earlier in the note, a histogram is usually the better option when we are dealing with ranges of values. So let us create a histogram of sepal length for each species.
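A sketch using histplot() (available from seaborn 0.11). It assumes iris is loaded with sns.load_dataset("iris"). The original cell may have used a different function. hue="species" overlays one histogram per species on the same axes.

```python
import matplotlib.pyplot as plt
import seaborn as sns

iris = sns.load_dataset("iris")

# Bin sepal_length; each species gets its own colour.
ax = sns.histplot(data=iris, x="sepal_length", hue="species")
plt.show()
```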
When you install seaborn, you also get access to example datasets to practice on; load_dataset() downloads them from seaborn's online data repository. You can go through .