8. Datetime
Last updated
Last updated
Load the data and display.
import pandas as pd
df=pd.read_csv('bakery_sales.csv')
df
# Output
datetime total
0 11-07-2019 15:35 23800.0
1 11-07-2019 16:10 15800.0
2 12-07-2019 11:49 58000.0
3 13-07-2019 13:19 14800.0
4 13-07-2019 13:22 15600.0
... ... ... ...
2415 02-05-2020 11:37 19500.0
2416 02-05-2020 11:39 19800.0
2417 02-05-2020 12:15 14300.0
2418 02-05-2020 13:45 15000.0
2419 02-05-2020 14:45 24100.0
2420 rows × 3 columns
Convert the datetime column into proper datetime format, considering the data in the column is in (dd-mm-yyyy hh:mm) format.
df['datetime']=pd.to_datetime(df['datetime'],format="%d-%m-%Y %H:%M")
df
# Output
datetime total
0 2019-07-11 15:35:00 23800.0
1 2019-07-11 16:10:00 15800.0
2 2019-07-12 11:49:00 58000.0
3 2019-07-13 13:19:00 14800.0
4 2019-07-13 13:22:00 15600.0
... ... ... ...
2415 2020-05-02 11:37:00 19500.0
2416 2020-05-02 11:39:00 19800.0
2417 2020-05-02 12:15:00 14300.0
2418 2020-05-02 13:45:00 15000.0
2419 2020-05-02 14:45:00 24100.0
2420 rows × 3 columns
Separate day, month name, and year in different columns from datetime.
df['date']=df['datetime'].dt.date
df['time']=df['datetime'].dt.time
df
# Output
datetime total date time
0 2019-07-11 15:35:00 23800.0 2019-07-11 15:35:00
1 2019-07-11 16:10:00 15800.0 2019-07-11 16:10:00
2 2019-07-12 11:49:00 58000.0 2019-07-12 11:49:00
3 2019-07-13 13:19:00 14800.0 2019-07-13 13:19:00
4 2019-07-13 13:22:00 15600.0 2019-07-13 13:22:00
... ... ... ... ... ...
2415 2020-05-02 11:37:00 19500.0 2020-05-02 11:37:00
2416 2020-05-02 11:39:00 19800.0 2020-05-02 11:39:00
2417 2020-05-02 12:15:00 14300.0 2020-05-02 12:15:00
2418 2020-05-02 13:45:00 15000.0 2020-05-02 13:45:00
2419 2020-05-02 14:45:00 24100.0 2020-05-02 14:45:00
Separate date, month and year from date column.
df['day']=df['datetime'].dt.day
df['month']=df['datetime'].dt.month_name()
df['year']=df['datetime'].dt.year
df
# Output
datetime total day month year
0 2019-07-11 15:35:00 23800 11 July 2019
1 2019-07-11 16:10:00 15800 11 July 2019
2 2019-07-12 11:49:00 58000 12 July 2019
3 2019-07-13 13:19:00 14800 13 July 2019
4 2019-07-13 13:22:00 15600 13 July 2019
... ... ... ... ... ...
2415 2020-05-02 11:37:00 19500 2 May 2020
2416 2020-05-02 11:39:00 19800 2 May 2020
2417 2020-05-02 12:15:00 14300 2 May 2020
2418 2020-05-02 13:45:00 15000 2 May 2020
2419 2020-05-02 14:45:00 24100 2 May 2020
2420 rows × 5 columns
Extract time from the datetime column.
df['time']=df['datetime'].dt.time
df
# Output
datetime total date month year time
0 2019-07-11 15:35:00 23800 11 July 2019 15:35:00
1 2019-07-11 16:10:00 15800 11 July 2019 16:10:00
2 2019-07-12 11:49:00 58000 12 July 2019 11:49:00
3 2019-07-13 13:19:00 14800 13 July 2019 13:19:00
4 2019-07-13 13:22:00 15600 13 July 2019 13:22:00
... ... ... ... ... ... ...
2415 2020-05-02 11:37:00 19500 2 May 2020 11:37:00
2416 2020-05-02 11:39:00 19800 2 May 2020 11:39:00
2417 2020-05-02 12:15:00 14300 2 May 2020 12:15:00
2418 2020-05-02 13:45:00 15000 2 May 2020 13:45:00
2419 2020-05-02 14:45:00 24100 2 May 2020 14:45:00
2420 rows × 6 columns
Create a sample of 700 rows from the dataset and find the oldest date in the dataset.
Create a sample of 700 rows from the dataset and find the latest date in the dataset.
Create a sample of 700 rows from the dataset and arrange them in ascending order on behalf of date.
Display the data of the sale that happened between 1st Aug 2019 and 01 Dec 2019.
df_filtered = df[(df['datetime'] > '2019-08-01') & (df['datetime']< '2019-12-01')]
df_filtered
# Output
datetime total day month year time
135 2019-08-01 11:03:00 14500 1 August 2019 11:03:00
136 2019-08-01 11:17:00 13100 1 August 2019 11:17:00
137 2019-08-01 11:25:00 16100 1 August 2019 11:25:00
138 2019-08-01 11:50:00 14800 1 August 2019 11:50:00
139 2019-08-01 12:02:00 21100 1 August 2019 12:02:00
... ... ... ... ... ... ...
1082 2019-11-30 13:01:00 38200 30 November 2019 13:01:00
1083 2019-11-30 13:44:00 25100 30 November 2019 13:44:00
1084 2019-11-30 13:49:00 20300 30 November 2019 13:49:00
1085 2019-11-30 14:24:00 15800 30 November 2019 14:24:00
1086 2019-11-30 17:04:00 27600 30 November 2019 17:04:00
952 rows × 6 columns
Find out the total sales on each date.
df['date']=df['datetime'].dt.date
ndf=df.groupby('date').agg(
total_sale=('total','sum'),
)
ndf.reset_index()
# Output
date total_sale
0 2019-07-11 39600
1 2019-07-12 58000
2 2019-07-13 117400
3 2019-07-14 212000
4 2019-07-15 30900
... ... ...
244 2020-04-27 264600
245 2020-04-29 118100
246 2020-04-30 134100
247 2020-05-01 215700
248 2020-05-02 92700
249 rows × 2 columns
Find out total sale on each day.
df['day_of_the_week']=df['datetime'].dt.day_name()
ndf=df.groupby('day_of_the_week').agg(
total_sale=('total','sum')
)
ndf.reset_index()
# output
day_of_the_week total_sale
0 Friday 8177700
1 Monday 6887000
2 Saturday 9098000
3 Sunday 11287500
4 Thursday 8140700
5 Tuesday 56000
6 Wednesday 7590600
Find rows where the transaction happened in the afternoon (12 PM - 6 PM)
df['hour']=df['datetime'].dt.hour
ndf=df[(df['hour']>12) & (df['hour']<18)]
ndf
# Output
datetime total day month year time date day_of_the_week hour
0 2019-07-11 15:35:00 23800 11 July 2019 15:35:00 2019-07-11 Thursday 15
1 2019-07-11 16:10:00 15800 11 July 2019 16:10:00 2019-07-11 Thursday 16
3 2019-07-13 13:19:00 14800 13 July 2019 13:19:00 2019-07-13 Saturday 13
4 2019-07-13 13:22:00 15600 13 July 2019 13:22:00 2019-07-13 Saturday 13
5 2019-07-13 14:54:00 15800 13 July 2019 14:54:00 2019-07-13 Saturday 14
... ... ... ... ... ... ... ... ... ...
2412 2020-05-01 13:55:00 21300 1 May 2020 13:55:00 2020-05-01 Friday 13
2413 2020-05-01 15:03:00 14800 1 May 2020 15:03:00 2020-05-01 Friday 15
2414 2020-05-01 15:19:00 14500 1 May 2020 15:19:00 2020-05-01 Friday 15
2418 2020-05-02 13:45:00 15000 2 May 2020 13:45:00 2020-05-02 Saturday 13
2419 2020-05-02 14:45:00 24100 2 May 2020 14:45:00 2020-05-02 Saturday 14
1160 rows × 9 columns
Find the sale happened on weekends(sat and sun).
ndf=df.loc[(df['day_of_the_week']=='Saturday') | (df['day_of_the_week']=='Sunday')]
ndf
# Output
datetime total day month year time date day_of_the_week hour
3 2019-07-13 13:19:00 14800 13 July 2019 13:19:00 2019-07-13 Saturday 13
4 2019-07-13 13:22:00 15600 13 July 2019 13:22:00 2019-07-13 Saturday 13
5 2019-07-13 14:54:00 15800 13 July 2019 14:54:00 2019-07-13 Saturday 14
6 2019-07-13 15:08:00 15800 13 July 2019 15:08:00 2019-07-13 Saturday 15
7 2019-07-13 15:09:00 14000 13 July 2019 15:09:00 2019-07-13 Saturday 15
... ... ... ... ... ... ... ... ... ...
2415 2020-05-02 11:37:00 19500 2 May 2020 11:37:00 2020-05-02 Saturday 11
2416 2020-05-02 11:39:00 19800 2 May 2020 11:39:00 2020-05-02 Saturday 11
2417 2020-05-02 12:15:00 14300 2 May 2020 12:15:00 2020-05-02 Saturday 12
2418 2020-05-02 13:45:00 15000 2 May 2020 13:45:00 2020-05-02 Saturday 13
2419 2020-05-02 14:45:00 24100 2 May 2020 14:45:00 2020-05-02 Saturday 14
1007 rows × 9 columns