8. Datetime

  1. Load the data and display.

Solution
import pandas as pd
df=pd.read_csv('bakery_sales.csv')
df

# Output
        datetime		total
0	11-07-2019 15:35	23800.0
1	11-07-2019 16:10	15800.0
2	12-07-2019 11:49	58000.0
3	13-07-2019 13:19	14800.0
4	13-07-2019 13:22	15600.0
...	...	...	...
2415	02-05-2020 11:37	19500.0
2416	02-05-2020 11:39	19800.0
2417	02-05-2020 12:15	14300.0
2418	02-05-2020 13:45	15000.0
2419	02-05-2020 14:45	24100.0
2420 rows × 3 columns
  1. Convert the datetime column into proper datetime format, considering the data in the column is in (dd-mm-yyyy hh:mm) format.

Solution
df['datetime']=pd.to_datetime(df['datetime'],format="%d-%m-%Y %H:%M")
df

# Output

        datetime		total
0	2019-07-11 15:35:00	23800.0
1	2019-07-11 16:10:00	15800.0
2	2019-07-12 11:49:00	58000.0
3	2019-07-13 13:19:00	14800.0
4	2019-07-13 13:22:00	15600.0
...	...	...	...
2415	2020-05-02 11:37:00	19500.0
2416	2020-05-02 11:39:00	19800.0
2417	2020-05-02 12:15:00	14300.0
2418	2020-05-02 13:45:00	15000.0
2419	2020-05-02 14:45:00	24100.0
2420 rows × 3 columns
  1. Separate day, month name, and year in different columns from datetime.

Solution
df['date']=df['datetime'].dt.date
df['time']=df['datetime'].dt.time
df

# Output
        datetime		total	date	        time
0	2019-07-11 15:35:00	23800.0	2019-07-11	15:35:00
1	2019-07-11 16:10:00	15800.0	2019-07-11	16:10:00
2	2019-07-12 11:49:00	58000.0	2019-07-12	11:49:00
3	2019-07-13 13:19:00	14800.0	2019-07-13	13:19:00
4	2019-07-13 13:22:00	15600.0	2019-07-13	13:22:00
...	...	...	...	...	...
2415	2020-05-02 11:37:00	19500.0	2020-05-02	11:37:00
2416	2020-05-02 11:39:00	19800.0	2020-05-02	11:39:00
2417	2020-05-02 12:15:00	14300.0	2020-05-02	12:15:00
2418	2020-05-02 13:45:00	15000.0	2020-05-02	13:45:00
2419	2020-05-02 14:45:00	24100.0	2020-05-02	14:45:00
  1. Separate date, month and year from date column.

Solution
df['day']=df['datetime'].dt.day
df['month']=df['datetime'].dt.month_name()
df['year']=df['datetime'].dt.year
df

# Output
        datetime	        total	day	month	year
0	2019-07-11 15:35:00	23800	11	July	2019
1	2019-07-11 16:10:00	15800	11	July	2019
2	2019-07-12 11:49:00	58000	12	July	2019
3	2019-07-13 13:19:00	14800	13	July	2019
4	2019-07-13 13:22:00	15600	13	July	2019
...	...	...	...	...	...
2415	2020-05-02 11:37:00	19500	2	May	2020
2416	2020-05-02 11:39:00	19800	2	May	2020
2417	2020-05-02 12:15:00	14300	2	May	2020
2418	2020-05-02 13:45:00	15000	2	May	2020
2419	2020-05-02 14:45:00	24100	2	May	2020
2420 rows × 5 columns
  1. Extract time from the datetime column.

Solution
df['time']=df['datetime'].dt.time
df

# Output
        datetime        	total	date	month	year	time
0	2019-07-11 15:35:00	23800	11	July	2019	15:35:00
1	2019-07-11 16:10:00	15800	11	July	2019	16:10:00
2	2019-07-12 11:49:00	58000	12	July	2019	11:49:00
3	2019-07-13 13:19:00	14800	13	July	2019	13:19:00
4	2019-07-13 13:22:00	15600	13	July	2019	13:22:00
...	...	...	...	...	...	...
2415	2020-05-02 11:37:00	19500	2	May	2020	11:37:00
2416	2020-05-02 11:39:00	19800	2	May	2020	11:39:00
2417	2020-05-02 12:15:00	14300	2	May	2020	12:15:00
2418	2020-05-02 13:45:00	15000	2	May	2020	13:45:00
2419	2020-05-02 14:45:00	24100	2	May	2020	14:45:00
2420 rows × 6 columns
  1. Create a sample of 700 rows from the dataset and find the oldest date in the dataset.

Solution
ndf=df.sample(700)
oldest_date=df['datetime'].max()
oldest_date
  1. Create a sample of 700 rows from the dataset and find the latest date in the dataset.

Solution
ndf=df.sample(700)
latest_date=df['datetime'].min()
latest_date
  1. Create a sample of 700 rows from the dataset and arrange them in ascending order on behalf of date.

Solution
ndf=df.sample(700)
ndf=ndf.sort_values(by='datetime')
ndf
  1. Display the data of the sale that happened between 1st Aug 2019 and 01 Dec 2019.

Solution
df_filtered = df[(df['datetime'] > '2019-08-01') & (df['datetime']< '2019-12-01')]
df_filtered

# Output
        datetime	        total	day	month	year	time
135	2019-08-01 11:03:00	14500	1	August	2019	11:03:00
136	2019-08-01 11:17:00	13100	1	August	2019	11:17:00
137	2019-08-01 11:25:00	16100	1	August	2019	11:25:00
138	2019-08-01 11:50:00	14800	1	August	2019	11:50:00
139	2019-08-01 12:02:00	21100	1	August	2019	12:02:00
...	...	...	...	...	...	...
1082	2019-11-30 13:01:00	38200	30	November	2019	13:01:00
1083	2019-11-30 13:44:00	25100	30	November	2019	13:44:00
1084	2019-11-30 13:49:00	20300	30	November	2019	13:49:00
1085	2019-11-30 14:24:00	15800	30	November	2019	14:24:00
1086	2019-11-30 17:04:00	27600	30	November	2019	17:04:00
952 rows × 6 columns
  1. Find out the total sales on each date.

Solution
df['date']=df['datetime'].dt.date
ndf=df.groupby('date').agg(
    total_sale=('total','sum'),
)
ndf.reset_index()

# Output

        date	    total_sale
0	2019-07-11	39600
1	2019-07-12	58000
2	2019-07-13	117400
3	2019-07-14	212000
4	2019-07-15	30900
...	...	...
244	2020-04-27	264600
245	2020-04-29	118100
246	2020-04-30	134100
247	2020-05-01	215700
248	2020-05-02	92700
249 rows × 2 columns
  1. Find out total sale on each day.

Solution
df['day_of_the_week']=df['datetime'].dt.day_name()
ndf=df.groupby('day_of_the_week').agg(
    total_sale=('total','sum')
)
ndf.reset_index()

# output
    day_of_the_week	total_sale
0	Friday	8177700
1	Monday	6887000
2	Saturday	9098000
3	Sunday	11287500
4	Thursday	8140700
5	Tuesday	56000
6	Wednesday	7590600
  1. Find rows where the transaction happened in the afternoon (12 PM - 6 PM)

Solution
df['hour']=df['datetime'].dt.hour
ndf=df[(df['hour']>12) & (df['hour']<18)]
ndf

# Output
                datetime	total	day	month	year	time	date	day_of_the_week	hour
0	2019-07-11 15:35:00	23800	11	July	2019	15:35:00	2019-07-11	Thursday	15
1	2019-07-11 16:10:00	15800	11	July	2019	16:10:00	2019-07-11	Thursday	16
3	2019-07-13 13:19:00	14800	13	July	2019	13:19:00	2019-07-13	Saturday	13
4	2019-07-13 13:22:00	15600	13	July	2019	13:22:00	2019-07-13	Saturday	13
5	2019-07-13 14:54:00	15800	13	July	2019	14:54:00	2019-07-13	Saturday	14
...	...	...	...	...	...	...	...	...	...
2412	2020-05-01 13:55:00	21300	1	May	2020	13:55:00	2020-05-01	Friday	13
2413	2020-05-01 15:03:00	14800	1	May	2020	15:03:00	2020-05-01	Friday	15
2414	2020-05-01 15:19:00	14500	1	May	2020	15:19:00	2020-05-01	Friday	15
2418	2020-05-02 13:45:00	15000	2	May	2020	13:45:00	2020-05-02	Saturday	13
2419	2020-05-02 14:45:00	24100	2	May	2020	14:45:00	2020-05-02	Saturday	14
1160 rows × 9 columns
  1. Find the sale happened on weekends(sat and sun).

Solution
ndf=df.loc[(df['day_of_the_week']=='Saturday') | (df['day_of_the_week']=='Sunday')]
ndf

# Output

datetime	total	day	month	year	time	date	day_of_the_week	hour
3	2019-07-13 13:19:00	14800	13	July	2019	13:19:00	2019-07-13	Saturday	13
4	2019-07-13 13:22:00	15600	13	July	2019	13:22:00	2019-07-13	Saturday	13
5	2019-07-13 14:54:00	15800	13	July	2019	14:54:00	2019-07-13	Saturday	14
6	2019-07-13 15:08:00	15800	13	July	2019	15:08:00	2019-07-13	Saturday	15
7	2019-07-13 15:09:00	14000	13	July	2019	15:09:00	2019-07-13	Saturday	15
...	...	...	...	...	...	...	...	...	...
2415	2020-05-02 11:37:00	19500	2	May	2020	11:37:00	2020-05-02	Saturday	11
2416	2020-05-02 11:39:00	19800	2	May	2020	11:39:00	2020-05-02	Saturday	11
2417	2020-05-02 12:15:00	14300	2	May	2020	12:15:00	2020-05-02	Saturday	12
2418	2020-05-02 13:45:00	15000	2	May	2020	13:45:00	2020-05-02	Saturday	13
2419	2020-05-02 14:45:00	24100	2	May	2020	14:45:00	2020-05-02	Saturday	14
1007 rows × 9 columns

Last updated