PDA Assignments
  • Python For Data Analytics
    • 1.Python
      • 1.Python Documents
        • 1.Data Types
        • 2.Variables In Python
        • 3.Operators In Python
        • 4.User Input In Python
        • 5.TypeCasting In Python
        • 6.Strings In Python
        • 7.Conditional Statements In Python
        • 8.Branching using Conditional Statements and Loops in Python
        • 9.Lists In Python
        • 10.Sets In Python
        • 11.Tuples In Python
        • 12.Dictionary In Python
        • 13.Functions In Python
        • 14.File Handling In Python
        • 15.Numerical Computing with Python and Numpy
      • 2.Python Assignments
        • Data Type & Variables
        • Operators Assignment
        • User Input & Type Casting
        • Functions- Basic Assignments
        • String Assignments
          • String CheatSheet
        • Conditional Statements Assignments
        • Loops Assignments
        • List Assignments
          • List Cheatsheet
        • Set Assignments
          • Sets Cheatsheet
        • Dictionary Assignments
          • Dictionary Cheatsheet
        • Function Assignments
        • Functions used in Python
      • 3.Python Projects
        • Employee Management System
        • Hamming distance
        • Webscraping With Python
          • Introduction To Web Scraping
          • Importing Necessary Libraries
          • Basic Introduction To HTML
          • Introduction To BeautifulSoup
          • Flipkart Web Scraping
            • Scraping Step By Step
        • Retail Sales Analysis
        • Guess the Word Game
        • Data Collection Through APIs
        • To-Do List Manager
        • Atm-functionalities(nested if)
        • Distribution of Cards(List & Nested for)
        • Guess the Number Game
      • 4.Python + SQL Projects
        • Bookstore Management System
    • 2.Data Analytics
      • 1.Pandas
        • 1.Pandas Documents
          • 1.Introduction To Pandas
          • Reading and Loading Different Data
          • 2.Indexing and Slicing In Pandas
          • 3.Joining In Pandas
          • 4.Missing Values In Pandas
          • 5.Outliers In Pandas
          • 6.Aggregating Data
          • 7.DateTime In Pandas
          • 8.Validation In Pandas
          • 9.Fetching Data From SQL
          • 10. Automation In Pandas
          • 11.Matplotlib - Data Visualization
          • 12. Seaborn - Data Visualization
          • 13. Required Files
        • 3.Pandas Projects
          • Retail Sales Analysis
            • Retail Sales Step By Step
          • IMDB - Dataset Analysis - Basic
        • 2. Pandas Assignments
          • 1. Reading and Loading the Data
          • 2. Data frame Functions and Properties
          • 3. Series - Basic Operations
          • 4. Filtering in Pandas
          • 5. Advance Filtering
          • 6. Aggregate Functions & Groupby
          • 7. Pivot Tables
          • 8. Datetime
          • 9. String Functions
Powered by GitBook
On this page
  • Introduction To joins in pandas:
  • Types of joins in Pandas :
  • Inner Join In Pandas
  • Merge Function :
  • Outer Join :
  • Left join
  • Right Join
  • Concatenate Dataframe :
  1. Python For Data Analytics
  2. 2.Data Analytics
  3. 1.Pandas
  4. 1.Pandas Documents

3.Joining In Pandas

Previous2.Indexing and Slicing In PandasNext4.Missing Values In Pandas

Last updated 2 years ago

Introduction To joins in pandas:

Working on a single table is easy , but what might be challenging is to work on multiple tables. You have to think of a lot of

criterias , how to join these tables . Pandas offers some joining methods that makes this task easy for you. There are four

different types of joins in pandas. Let's go through them one by one.

Types of joins in Pandas :

We’ll take a simple problem from a related marketing brand here. We are given two tables – one which contains data about products and the other that has customer- information.

We will use these tables to understand how the different types of joins work using Pandas.

Inner Join In Pandas

Inner join is the most common type of join you’ll be working with. It returns a dataframe with only those rows that have common characteristics.

This is similar to the intersection of two sets.

So let's start by importing our pandas library.

In [1]:

import pandas as pd

To understand join better, we have two dataframes :

In [2]:

product=pd.read_csv('Products.csv')
product
id
Product_ID
Product_name
Category
Price
Seller_City

0

0

101

Watch

Fashion

299.0

Delhi

1

1

102

Bag

Fashion

1350.5

Mumbai

2

2

103

Shoes

Fashion

2999.0

Chennai

3

3

104

Smartphone

Electronics

14999.0

Kolkata

4

4

105

Books

Study

145.0

Delhi

5

5

106

Oil

Grocery

110.0

Chennai

6

6

107

Laptop

Electronics

79999.0

Bengalore

customer=pd.read_csv('Customers.csv')
customer
id
name
age
Product_ID
Purchased_Product
City

0

1

Olivia

20

101

Watch

Mumbai

1

2

Aditya

25

0

NaN

Delhi

2

3

Cory

15

106

Oil

Bangalore

3

4

Isabell

10

0

NaN

Chennai

4

5

Dominic

30

103

Shoes

Chennai

5

6

Tyler

65

104

Smartphone

Delhi

6

7

Samuel

35

0

NaN

Kolkata

7

8

Daniel

18

0

NaN

Delhi

8

9

Jeremy

23

107

Laptop

Mumbai

Let's say i want to find all the products that are sold with customers details. To find this , we will perform inner join.

Merge Function :

Merge function helps to join these two tables and by default it performs an inner join.

In [4]:

product.merge(customer,on='Product_ID')

Out[4]:

id_x
Product_ID
Product_name
Category
Price
Seller_City
id_y
name
age
Purchased_Product
City

0

0

101

Watch

Fashion

299.0

Delhi

1

Olivia

20

Watch

Mumbai

1

2

103

Shoes

Fashion

2999.0

Chennai

5

Dominic

30

Shoes

Chennai

2

3

104

Smartphone

Electronics

14999.0

Kolkata

6

Tyler

65

Smartphone

Delhi

3

5

106

Oil

Grocery

110.0

Chennai

3

Cory

15

Oil

Bangalore

4

6

107

Laptop

Electronics

79999.0

Bengalore

9

Jeremy

23

Laptop

Mumbai

What if the column names are different in the two dataframes? Then, we have to explicitly mention both the column names.

left_on and right_on are two arguments through which we can achieve this. ‘left_on’ is the name of the key in the left dataframe and ‘right_on’ in the right dataframe.

In [5]:

product.merge(customer,left_on='Product_name',right_on='Purchased_Product')
id_x
Product_ID_x
Product_name
Category
Price
Seller_City
id_y
name
age
Product_ID_y
Purchased_Product
City

0

0

101

Watch

Fashion

299.0

Delhi

1

Olivia

20

101

Watch

Mumbai

1

2

103

Shoes

Fashion

2999.0

Chennai

5

Dominic

30

103

Shoes

Chennai

2

3

104

Smartphone

Electronics

14999.0

Kolkata

6

Tyler

65

104

Smartphone

Delhi

3

5

106

Oil

Grocery

110.0

Chennai

3

Cory

15

106

Oil

Bangalore

4

6

107

Laptop

Electronics

79999.0

Bengalore

9

Jeremy

23

107

Laptop

Mumbai

product.merge(customer,how='inner',left_on=['Product_ID','Seller_City'],right_on=['Product_ID','City'])
id_x
Product_ID
Product_name
Category
Price
Seller_City
id_y
name
age
Purchased_Product
City

0

2

103

Shoes

Fashion

2999.0

Chennai

5

Dominic

30

Shoes

Chennai

Outer Join :

Here’s another interesting task for you. We have to combine both dataframes so that we can find all the products that are not sold and all the customers who didn’t purchase anything from us. We can use Full Join for this purpose. Full Join, also known as Full Outer Join, returns all those records which either have a match in the left or right dataframe. When rows in both the dataframes do not match, the resulting dataframe will have NaN for every column of the dataframe that lacks a matching row. We can perform Full join by just passing the how argument as ‘outer’ to the merge() function:

In [7]:

product.merge(customer,on='Product_ID',how='outer')
id_x
Product_ID
Product_name
Category
Price
Seller_City
id_y
name
age
Purchased_Product
City

0

0.0

101

Watch

Fashion

299.0

Delhi

1.0

Olivia

20.0

Watch

Mumbai

1

1.0

102

Bag

Fashion

1350.5

Mumbai

NaN

NaN

NaN

NaN

NaN

2

2.0

103

Shoes

Fashion

2999.0

Chennai

5.0

Dominic

30.0

Shoes

Chennai

3

3.0

104

Smartphone

Electronics

14999.0

Kolkata

6.0

Tyler

65.0

Smartphone

Delhi

4

4.0

105

Books

Study

145.0

Delhi

NaN

NaN

NaN

NaN

NaN

5

5.0

106

Oil

Grocery

110.0

Chennai

3.0

Cory

15.0

Oil

Bangalore

6

6.0

107

Laptop

Electronics

79999.0

Bengalore

9.0

Jeremy

23.0

Laptop

Mumbai

7

NaN

0

NaN

NaN

NaN

NaN

2.0

Aditya

25.0

NaN

Delhi

8

NaN

0

NaN

NaN

NaN

NaN

4.0

Isabell

10.0

NaN

Chennai

9

NaN

0

NaN

NaN

NaN

NaN

7.0

Samuel

35.0

NaN

Kolkata

10

NaN

0

NaN

NaN

NaN

NaN

8.0

Daniel

18.0

NaN

Delhi

Left join

Now, let’s say the leadership team wants information about only those customers who bought something from us. You guessed it – we can use the concept of Left Join here.

Left join, also known as Left Outer Join, returns a dataframe containing all the rows of the left dataframe.

All the non-matching rows of the left dataframe contain NaN for the columns in the right dataframe. It is simply an inner join plus all the non-matching rows of the left dataframe filled with NaN for columns of the right dataframe.

Performing a left join is actually quite similar to a full join. Just change the how argument to ‘left’:

In [8]:

product.merge(customer,on='Product_ID',how='left')
id_x
Product_ID
Product_name
Category
Price
Seller_City
id_y
name
age
Purchased_Product
City

0

0

101

Watch

Fashion

299.0

Delhi

1.0

Olivia

20.0

Watch

Mumbai

1

1

102

Bag

Fashion

1350.5

Mumbai

NaN

NaN

NaN

NaN

NaN

2

2

103

Shoes

Fashion

2999.0

Chennai

5.0

Dominic

30.0

Shoes

Chennai

3

3

104

Smartphone

Electronics

14999.0

Kolkata

6.0

Tyler

65.0

Smartphone

Delhi

4

4

105

Books

Study

145.0

Delhi

NaN

NaN

NaN

NaN

NaN

5

5

106

Oil

Grocery

110.0

Chennai

3.0

Cory

15.0

Oil

Bangalore

6

6

107

Laptop

Electronics

79999.0

Bengalore

9.0

Jeremy

23.0

Laptop

Mumbai

Right Join

Similarly, if we want to create a table of customers including the information about the products they bought, we can use the right join.

Right join, also known as Right Outer Join, is similar to the Left Outer Join. The only difference is that all the rows of the right dataframe are taken as it is and only those of the left dataframe that are common in both.

Similar to other joins, we can perform a right join by changing the how argument to ‘right’:

In [9]:

product.merge(customer,on='Product_ID',how='right')
id_x
Product_ID
Product_name
Category
Price
Seller_City
id_y
name
age
Purchased_Product
City

0

0.0

101

Watch

Fashion

299.0

Delhi

1

Olivia

20

Watch

Mumbai

1

NaN

0

NaN

NaN

NaN

NaN

2

Aditya

25

NaN

Delhi

2

5.0

106

Oil

Grocery

110.0

Chennai

3

Cory

15

Oil

Bangalore

3

NaN

0

NaN

NaN

NaN

NaN

4

Isabell

10

NaN

Chennai

4

2.0

103

Shoes

Fashion

2999.0

Chennai

5

Dominic

30

Shoes

Chennai

5

3.0

104

Smartphone

Electronics

14999.0

Kolkata

6

Tyler

65

Smartphone

Delhi

6

NaN

0

NaN

NaN

NaN

NaN

7

Samuel

35

NaN

Kolkata

7

NaN

0

NaN

NaN

NaN

NaN

8

Daniel

18

NaN

Delhi

8

6.0

107

Laptop

Electronics

79999.0

Bengalore

9

Jeremy

23

Laptop

Mumbai

Concatenate Dataframe :

In [10]:

df1 = pd.read_csv('firsthalf.csv')
df1
id
name
age

0

1

abhi

23

1

2

rajesh

35

2

3

adarsh

56

3

4

suresh

40

4

5

dev

27

df2 = pd.read_csv('secondhalf.csv')
df2
id
name
age

0

6

raj

33

1

7

himesh

21

2

8

pranav

24

3

9

pawan

30

4

10

gaurav

32

pd.concat((df1,df2))
id
name
age

0

1

abhi

23

1

2

rajesh

35

2

3

adarsh

56

3

4

suresh

40

4

5

dev

27

0

6

raj

33

1

7

himesh

21

2

8

pranav

24

3

9

pawan

30

4

10

gaurav

32