6. Aggregate Functions & Groupby

  1. Find the total revenue generated from all transactions.

Solution
print(df['Total_Amount'].sum())

# Output
401978248.89447826
  1. Find the average value of the orders.

Solution
print(df['Total_Amount'].mean())

# Output
1367.6869831155632
  1. Find the highest sale value.

Solution
print(df['Total_Amount'].max())

# Output
4999.625796
  1. Find the lowest sale value.

Solution
print(df['Total_Amount'].max())

# Output
10.00374959
  1. Find the average rating given by customers.

Solution
print(df['Ratings'].mean())

# Output
3.16230083256496
  1. Find the total number of products sold.

Solution
print(df['Total_Purchases'].sum())

# Output
1575323
  1. Find the city and country with highest frequencies.

Solution
print(df['City'].mode())

# Output
Chicago

print(df['Country'].mode())

# output
USA
  1. Count the number of transactions per country wise.

Solution
ndf=df.groupby('Country').agg(
    total_transactions=('Transaction_ID','count')
)
ndf.reset_index()

# Output
        Country	total_transactions
0	Australia 44170
1	Canada	44110
2	Germany	51433
3	UK	61398
4	USA	92800
  1. Find the total sales city wise.

Solution
ndf=df.groupby('City').agg(
    total_sale=('Total_Amount','sum')
)
ndf.reset_index()

# Output
	City	total_sale
0	Adelaide	2.972902e+06
1	Albuquerque	1.213217e+06
2	Albury-Wodonga	3.026290e+06
3	Arlington	1.169384e+06
4	Atlanta	1.229512e+06
...	...	...
125	Wichita	1.188115e+06
126	Windsor	2.954510e+06
127	Winnipeg	3.160659e+06
128	Wollongong	3.101860e+06
129	Wuppertal	3.043289e+06
130 rows × 2 columns
  1. Find out average sales for every product category.

Solution
ndf=df.groupby('Product_Category').agg(
    avg_revenue=('Total_Amount','mean')
)
ndf=ndf.reset_index()
ndf

# Output
Product_Category	avg_revenue
0	Books	1367.069440
1	Clothing	1368.820004
2	Electronics	1369.736547
3	Grocery	1366.061942
4	Home Decor	1366.480831
  1. Create a new dataset which contains the total sale value of each brand.

Solution
ndf=df.groupby('Product_Brand').agg(
    total_sale=('Total_Amount','sum')
)
ndf=ndf.reset_index()
ndf

# Output
Product_Brand	total_sale
0	Adidas	2.423476e+07
1	Apple	2.396481e+07
2	Bed Bath & Beyond	2.427537e+07
3	BlueStar	2.992519e+06
4	Coca-Cola	2.446867e+07
5	HarperCollins	2.433139e+07
6	Home Depot	2.415726e+07
7	IKEA	2.390750e+07
8	Mitsubhisi	9.008944e+06
9	Nestle	2.407735e+07
10	Nike	2.435496e+07
11	Penguin Books	2.417092e+07
12	Pepsi	4.042013e+07
13	Random House	2.422442e+07
14	Samsung	2.469212e+07
15	Sony	2.446911e+07
16	Whirepool	9.884276e+06
17	Zara	2.434375e+07
  1. Find out the highest average transaction value as per payment method.

Solution
ndf=df.groupby('Payment_Method').agg(
    avg_trnasaction_value=('Total_Amount','mean')
)
ndf=ndf.reset_index()
ndf.max()

# Output
Payment_Method                PayPal
avg_trnasaction_value    1369.456358
dtype: object
  1. Find the total number of transactions for each shipping method.

Solution
ndf=df.groupby('Shipping_Method').agg(
    Shipping_method_count=('Transaction_ID','count')
)
ndf.reset_index()


# Output
    Shipping_Method	Shipping_method_count
0	Express	        99600
1	Same-Day	101541
2	Standard	92770
  1. Find out the top sale from each country.

Solution
ndf=df.groupby('Country').agg(
    top_sale=('Total_Amount','max')
)
ndf.reset_index()

# Output
        Country	        top_sale
0	Australia	4997.714637
1	Canada	4999.171428
2	Germany	4998.603558
3	UK	4999.625796
4	USA	4999.340097
  1. Find out total number of transactions for each product category in every country.

Solution
ndf=df.groupby(['Country', 'Product_Category']).agg(
    total_transactions=('Transaction_ID','count')
)
ndf

# Output
		        total_transactions
Country	Product_Category	
Australia	Books	8165
Clothing	8055
Electronics	11038
Grocery	8743
Home Decor	8169
Canada	Books	7965
Clothing	8175
Electronics	11334
Grocery	8676
Home Decor	7960
Germany	Books	9581
Clothing	9549
Electronics	12575
Grocery	10162
Home Decor	9566
UK	Books	11455
Clothing	11643
Electronics	14786
Grocery	12048
Home Decor	11466
USA	Books	16033
Clothing	15860
Electronics	19632
Grocery	25497
Home Decor	15778
  1. Find out the total number of transactions for each payment method for every country.

Solution
ndf=df.groupby(['Country', 'Payment_Method']).agg(
     new=('Transaction_ID','count')
)
ndf

# Output
		          new
Country	Payment_Method	
Australia	Cash	10733
Credit Card	13879
Debit Card	11283
PayPal	8275
Canada	Cash	10742
Credit Card	13972
Debit Card	11134
PayPal	8262
Germany	Cash	12439
Credit Card	15690
Debit Card	13137
PayPal	10167
UK	Cash	14934
Credit Card	18026
Debit Card	15848
PayPal	12590
USA	Cash	23079
Credit Card	26214
Debit Card	23342
PayPal	20165
Payment_Method

Last updated