
12. Seaborn - Data Visualization


Last updated 2 years ago

Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

Install Seaborn

In [1]:

pip install seaborn

Get Data Repository:

Import All Libraries

In [2]:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

Get Dataset names:

In [3]:

sns.get_dataset_names()
['anagrams',
 'anscombe',
 'attention',
 'brain_networks',
 'car_crashes',
 'diamonds',
 'dots',
 'exercise',
 'flights',
 'fmri',
 'gammas',
 'geyser',
 'iris',
 'mpg',
 'penguins',
 'planets',
 'taxis',
 'tips',
 'titanic']

Load Dataset

In [4]:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = sns.load_dataset('tips')
df.head()

Out[4]:

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

How to create your first graph

Load Dataset

We use the load_dataset function to load a dataset from the data repository.

In [5]:

sns.load_dataset('tips')

Out[5]:

     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
...         ...   ...     ...    ...   ...     ...   ...
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

244 rows × 7 columns

Creating a graph with seaborn is as easy as with matplotlib. Suppose we want a scatter plot of the relationship between total_bill and tip. All you need to do is call the scatterplot function.

Note: We use scatterplot() to create a scatter plot, instead of scatter() as we did in matplotlib.

sns.scatterplot(x='total_bill',y='tip',data=df)
plt.show()
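
The call above can also be tried on a tiny hand-typed sample. The sketch below assumes the first five rows of tips shown in the table above, so it runs without downloading anything. It relies on scatterplot returning a matplotlib Axes, which lets the usual Axes methods set a title and axis labels; the title and label text are illustrative choices, not part of the original example.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# First five rows of tips, copied from the table above (no download needed)
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
})

# scatterplot draws onto a matplotlib Axes and returns it
ax = sns.scatterplot(x="total_bill", y="tip", data=sample)

# Illustrative title and labels, set through ordinary Axes methods
ax.set_title("Tip vs. total bill")
ax.set_xlabel("Total bill")
ax.set_ylabel("Tip")
plt.show()
```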

hue:

hue: (optional) This parameter takes a column name to use for colour encoding.

Suppose we also want to show which customers are smokers, along with total_bill and tip. We can show this third piece of information with hue.

In [7]:

sns.scatterplot(x='total_bill',y='tip',hue='smoker',data=df)
plt.show()

You can also change the order of the hue levels by using the hue_order parameter.

In [9]:

sns.scatterplot(x='total_bill',y='tip',hue='smoker',hue_order=['No','Yes'],data=df)
plt.show()

Palette:

What if you want to change the colour encoding for smokers and non-smokers?

For this you can use the palette parameter, which accepts a dictionary mapping each hue level to a colour.

In [11]:

c = {'No':'Green','Yes':'Red'}
sns.scatterplot(x='total_bill',y='tip',hue='smoker',hue_order=['No','Yes'],palette = c,data=df)
plt.show()
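
As a rough check of how hue, hue_order and palette fit together, the sketch below uses the last five rows of tips from the table above, typed in by hand. Reading the legend text back is only an illustration of which levels were drawn; the variable names sample, colors and labels are made up for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Last five rows of tips, copied from the table above
sample = pd.DataFrame({
    "total_bill": [29.03, 27.18, 22.67, 17.82, 18.78],
    "tip": [5.92, 2.00, 2.00, 1.75, 3.00],
    "smoker": ["No", "Yes", "Yes", "No", "No"],
})

# One colour per hue level; the keys must match the values in the column
colors = {"No": "green", "Yes": "red"}
ax = sns.scatterplot(x="total_bill", y="tip", hue="smoker",
                     hue_order=["No", "Yes"], palette=colors, data=sample)

# Read the legend entries back to see which levels were drawn, and in what order
labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(labels)
plt.show()
```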

smoker is a categorical column, but you can also pass a numerical column to hue. Let's say I want to colour-encode party size.

In [12]:

# sns.scatterplot(x='total_bill',y='tip',hue='size',data=df)
# plt.show()

relplot:

relplot can be used for many purposes, but the main reason to use it here is sub-plotting. Suppose we want to split our graph into sub-plots for smokers and non-smokers.

We can use the row and col parameters to achieve this.

Let's see this with an example.

In [14]:

sns.relplot(x='total_bill',y='tip',data=df,col='smoker')
plt.show()
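
To see what col does to the layout, here is a minimal sketch on the last five rows of tips from the table above, typed in by hand. relplot returns a FacetGrid, and its axes attribute holds the grid of panels; inspecting its shape is just one way to confirm the layout.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Last five rows of tips, copied from the table above
sample = pd.DataFrame({
    "total_bill": [29.03, 27.18, 22.67, 17.82, 18.78],
    "tip": [5.92, 2.00, 2.00, 1.75, 3.00],
    "smoker": ["No", "Yes", "Yes", "No", "No"],
})

# col="smoker" makes one panel per distinct smoker value, side by side
g = sns.relplot(x="total_bill", y="tip", col="smoker", data=sample)

# g.axes is a 2-D array of matplotlib Axes: (number of rows, number of columns)
print(g.axes.shape)
plt.show()
```

With two smoker values and no row parameter, the grid has one row and two columns.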

What if I used the row parameter instead?

In [15]:

# sns.relplot(x='total_bill',y='tip',data=df,row='smoker')
# plt.show()

With row, the two panels would be stacked vertically, one row per smoker value.

We can do the same with party size by passing size as col.

In [17]:

# %matplotlib notebook
# sns.relplot(x='total_bill',y='tip',hue='size',data=df,col='size')
# plt.show()

As you can see, all the graphs end up side by side in a single row. We can use col_wrap to limit the number of graphs per row; the remaining graphs wrap onto the next row.

In [21]:

# %matplotlib inline
# sns.relplot(x='total_bill',y='tip',hue='size',data=df,col='size',col_wrap=3)
# plt.show()
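
The effect of col_wrap can be checked on a small sample. The sketch below assumes the ten rows of tips visible in the table above, which contain three distinct party sizes (2, 3 and 4), and uses col_wrap=2 rather than 3 so that the wrapping is visible with so few panels.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# The ten rows of tips visible in the table above
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59,
                   29.03, 27.18, 22.67, 17.82, 18.78],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00],
    "size": [2, 3, 3, 2, 4, 3, 2, 2, 2, 2],
})

# One panel per party size, at most two panels per row
g = sns.relplot(x="total_bill", y="tip", col="size", col_wrap=2, data=sample)

# With col_wrap, g.axes is a flat array holding one Axes per panel
n_panels = len(g.axes.flat)
print(n_panels)
plt.show()
```

Three panels with col_wrap=2 are laid out as two on the first row and one on the second.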

Applying Row and Column property in a single plot

What if I want to see smokers in columns and time in rows? We can use the col and row parameters together in a single plot.

In [22]:

sns.relplot(x='total_bill',y='tip',data=df,row='time',col='smoker')
plt.show()

using categorical column

What if one of my axes is a categorical column?

In [23]:

# sns.relplot(x='total_bill',y='time',data=df)
# plt.show()

relplot offers a size parameter that scales each marker according to the values in a column. Let's see this with an example.

In [25]:

sns.relplot(x='total_bill',y='tip',data=df,size='size')
plt.show()

A Useful Example

Seaborn offers another dataset named flights. Let's see this data first.

In [13]:

import matplotlib.pyplot as plt
import seaborn as sns
df = sns.load_dataset('flights')
df

Out[13]:

     year month  passengers
0    1949   Jan         112
1    1949   Feb         118
2    1949   Mar         132
3    1949   Apr         129
4    1949   May         121
...   ...   ...         ...
139  1960   Aug         606
140  1960   Sep         508
141  1960   Oct         461
142  1960   Nov         390
143  1960   Dec         432

144 rows × 3 columns

Suppose we want to know how many passengers flew in a particular month and year.

In [7]:

sns.relplot(x='passengers',y='month',hue = 'year',data=df)
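
The plot answers this question visually. As a complementary sketch, the same lookup can be done numerically with a pandas pivot. The rows below are the ten rows of flights visible in the table above, typed in by hand; the names sample, table and jan_1949 are made up for this example.

```python
import pandas as pd

# First and last five rows of flights, copied from the table above
sample = pd.DataFrame({
    "year": [1949] * 5 + [1960] * 5,
    "month": ["Jan", "Feb", "Mar", "Apr", "May",
              "Aug", "Sep", "Oct", "Nov", "Dec"],
    "passengers": [112, 118, 132, 129, 121, 606, 508, 461, 390, 432],
})

# One row per month, one column per year; combinations that are
# missing from this small sample become NaN
table = sample.pivot(index="month", columns="year", values="passengers")

# Look up a single month and year
jan_1949 = table.loc["Jan", 1949]
print(jan_1949)
```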

Categorical column and relplot

When one column is categorical and the other is numerical, we do not use relplot. Let me show you why.

In [15]:

import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset('tips')
sns.relplot(x='time',y='tip',data=df)

This is clearly not an easy graph to read. So whenever we deal with a categorical column, we use catplot (categorical plot).

Catplot

A categorical plot is usually the best option when one variable is categorical and the other is numerical.

In [17]:

import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset('tips')
sns.catplot(x='time',y='tip',data=df)

A Useful example

Suppose we want to see the total bill day by day. We can use catplot, because day is a categorical column while total_bill is numerical.

In [23]:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = sns.load_dataset('tips')
sns.catplot(x='total_bill',y='day',data=df)
plt.show()

You can change the graph to a bar chart by passing kind='bar'.

In [25]:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = sns.load_dataset('tips')
sns.catplot(x='total_bill',y='day',data=df,kind='bar')
plt.show()
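
With kind='bar', each bar shows an aggregate of the numerical column per category rather than the individual points; under seaborn's default estimator this aggregate is the mean. The pandas-only sketch below reproduces those numbers for the ten rows of tips visible in the table earlier on this page, typed in by hand, so the values apply to that small sample and not to the full dataset.

```python
import pandas as pd

# The ten rows of tips visible in the table earlier on this page
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59,
                   29.03, 27.18, 22.67, 17.82, 18.78],
    "day": ["Sun", "Sun", "Sun", "Sun", "Sun",
            "Sat", "Sat", "Sat", "Sat", "Thur"],
})

# Mean total bill per day: the quantity a bar chart of this sample would show
mean_bill = sample.groupby("day")["total_bill"].mean()
print(mean_bill)
```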

You can change the graph to a point plot by passing kind='point'.

In [27]:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = sns.load_dataset('tips')
sns.catplot(x='total_bill',y='day',data=df,kind='point')
plt.show()

You can change the graph to a violin chart by passing kind='violin'.

In [29]:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = sns.load_dataset('tips')
sns.catplot(x='total_bill',y='day',data=df,kind='violin')
plt.grid()
plt.show()

From this violin chart it is easy to see the pattern of bills on any given day. For example, on Thursday most of the total bills were between 10 and 20 dollars.

You can change the graph to a boxen chart by passing kind='boxen'.

In [31]:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = sns.load_dataset('tips')
sns.catplot(x='day',y='total_bill',data=df,kind='boxen')
plt.grid()
plt.show()

A boxen chart (also called a letter-value plot) summarizes the distribution much like a violin chart, but as a stack of boxes. The darkest boxes cover the region where most of the data lies.

The horizontal line inside the central box is the median of the data.

The dots that you see in a boxen chart are outliers.

You can change the graph to a box plot by passing kind='box'.

In [32]:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = sns.load_dataset('tips')
sns.catplot(x='day',y='total_bill',data=df,kind='box')
plt.grid()
plt.show()
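
The numbers behind a box plot can be computed directly. The sketch below is a pandas-only illustration on the ten total_bill values visible in the table earlier on this page: the box spans the first to third quartile, the line inside it is the median, and points further than 1.5 times the interquartile range from the box (the default whisker rule) are drawn as outliers. The variable names are made up for this example.

```python
import pandas as pd

# The ten total_bill values visible in the table earlier on this page
bills = pd.Series([16.99, 10.34, 21.01, 23.68, 24.59,
                   29.03, 27.18, 22.67, 17.82, 18.78])

median = bills.median()          # line inside the box
q1 = bills.quantile(0.25)        # lower edge of the box
q3 = bills.quantile(0.75)        # upper edge of the box
iqr = q3 - q1                    # height of the box

# Points beyond 1.5 * IQR from the box edges count as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = bills[(bills < lower_fence) | (bills > upper_fence)]

print(median, q1, q3)
print(len(outliers))
```

In this small sample no value falls outside the fences, so a box plot of it would show no outlier dots.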

Countplot

seaborn.countplot() shows the number of observations in each category of a categorical column.

import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset('tips')
sns.countplot(y='smoker',data=df)
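
The counts that countplot draws as bars can also be obtained as numbers with pandas value_counts. The sketch below uses the smoker values from the ten rows of tips visible in the table earlier on this page, typed in by hand, so the counts describe that sample only.

```python
import pandas as pd

# smoker values from the ten rows of tips visible earlier on this page
smoker = pd.Series(["No", "No", "No", "No", "No",
                    "No", "Yes", "Yes", "No", "No"], name="smoker")

# Number of observations per category: the bar lengths in a count plot
counts = smoker.value_counts()
print(counts)
```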

A Useful Example

Suppose we want to count how many Pokemon there are of each 'Type 1' in our Pokemon dataset. We can use countplot here.

In [22]:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = pd.read_csv('Pokemon.csv')
sns.countplot(y='Type 1',data=df)

Introduction to pairplot:

Plot pairwise relationships in a dataset.

By default, this function will create a grid of Axes such that each numeric variable in data will be shared across the y-axes across a single row and the x-axes across a single column.

In [35]:

import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset('tips')
sns.pairplot(data = df)
plt.show()

You can also add the hue parameter to colour the points by the gender of the customers.

In [36]:

import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset('tips')
sns.pairplot(data = df,hue='sex')
plt.show()

A Useful Example :

In [38]:

import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset('iris')
df

     sepal_length  sepal_width  petal_length  petal_width    species
0             5.1          3.5           1.4          0.2     setosa
1             4.9          3.0           1.4          0.2     setosa
2             4.7          3.2           1.3          0.2     setosa
3             4.6          3.1           1.5          0.2     setosa
4             5.0          3.6           1.4          0.2     setosa
...           ...          ...           ...          ...        ...
145           6.7          3.0           5.2          2.3  virginica
146           6.3          2.5           5.0          1.9  virginica
147           6.5          3.0           5.2          2.0  virginica
148           6.2          3.4           5.4          2.3  virginica
149           5.9          3.0           5.1          1.8  virginica

150 rows × 5 columns

Check Range of Sepal length for species:

As discussed in the earlier note, a histogram is usually a good choice when we are looking at ranges of values. So let us create a histogram of sepal length for each species.

In [40]:

import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset('iris')
f = sns.FacetGrid(data = df,col='species')
f.map(plt.hist,'sepal_length')
plt.show()
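
The histograms show the range visually. As a complementary pandas-only sketch, the minimum and maximum sepal length per species can be computed with groupby. The rows below are the ten rows of iris visible in the table above, typed in by hand, so the ranges describe that sample and not the full dataset; versicolor does not appear because none of its rows are shown.

```python
import pandas as pd

# The ten rows of iris visible in the table above
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0, 6.7, 6.3, 6.5, 6.2, 5.9],
    "species": ["setosa"] * 5 + ["virginica"] * 5,
})

# Smallest and largest sepal length within each species
ranges = sample.groupby("species")["sepal_length"].agg(["min", "max"])
print(ranges)
```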

Seaborn gives you access to a set of example datasets to practice on. You can browse them in the Data Repository.