Datetimes and timedeltas
Python has several ways of representing datetimes and timedeltas. This notebook shows the three most common ways and how to convert between them.
Our general advice: use pandas whenever you can.
from datetime import datetime, timedelta
import numpy as np
import pandas as pd
Datetime/timestamp
The most common datetime representations in Python are:
datetime.datetime (Python built-in)
np.datetime64 (NumPy)
pd.Timestamp (pandas)
For string representations of datetimes, use ISO 8601 (e.g. 2021-09-07T19:03:12Z) whenever possible.
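The sketch below shows how to write and read ISO 8601 strings with each of the three libraries. It assumes Python 3.7+ for `datetime.fromisoformat()`, which accepts a trailing `Z` only from Python 3.11 on, so the string passed to it has no `Z`. The variable names are illustrative only.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# datetime.datetime -> ISO 8601 string (naive, so no "Z" or offset is written)
iso_dt = datetime(2021, 9, 7, 19, 3, 12)
iso_str = iso_dt.isoformat()
print(iso_str)  # 2021-09-07T19:03:12

# ISO 8601 string -> datetime.datetime (string without "Z" works on Python 3.7+)
print(datetime.fromisoformat(iso_str) == iso_dt)  # True

# NumPy parses ISO 8601 directly; datetime64 values carry no time zone
iso_np = np.datetime64(iso_str)
print(iso_np)  # 2021-09-07T19:03:12

# pandas accepts the trailing "Z" and returns a UTC-aware Timestamp
iso_pd = pd.Timestamp("2021-09-07T19:03:12Z")
print(iso_pd.isoformat())  # 2021-09-07T19:03:12+00:00
```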
See Python Pandas For Your Grandpa - 4.2 Dates and Times for an 18-minute video introduction to the three datetime representations (including time-zone handling).
datetime.datetime
The built-in datetime representation is quite simple:
dt_dt = datetime(2018, 1, 1, 19, 3, 1)
dt_dt
datetime.datetime(2018, 1, 1, 19, 3, 1)
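A datetime.datetime exposes its fields as plain attributes and can be formatted and parsed with arbitrary layouts via strftime() and strptime(). The short sketch below illustrates this; the format string "%d.%m.%Y %H:%M:%S" is just an example layout, not a recommendation over ISO 8601.

```python
from datetime import datetime

dt = datetime(2018, 1, 1, 19, 3, 1)

# Individual fields are plain integer attributes
print(dt.year, dt.month, dt.day, dt.hour)  # 2018 1 1 19

# datetime.datetime -> string with an explicit format
s = dt.strftime("%d.%m.%Y %H:%M:%S")
print(s)  # 01.01.2018 19:03:01

# string -> datetime.datetime using the same format
parsed = datetime.strptime(s, "%d.%m.%Y %H:%M:%S")
print(parsed == dt)  # True
```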
NumPy: np.datetime64
np.datetime64 is essentially an integer (np.int64) counting the time elapsed since the Unix epoch, 1970-01-01 00:00:00, in a specified unit, e.g. days, seconds or nanoseconds.
dt_np = np.datetime64('2018-01-01 19:03:01') # implicitly [s]
dt_np
np.datetime64('2018-01-01T19:03:01')
np.int64(dt_np)
np.int64(1514833381)
np.datetime64('1970-01-01 00:00:00') + np.int64(dt_np)
np.datetime64('2018-01-01T19:03:01')
dt_np.dtype.name
'datetime64[s]'
dt_np.astype(datetime) # np.datetime64 -> datetime.datetime (works for units of microseconds or coarser)
datetime.datetime(2018, 1, 1, 19, 3, 1)
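The unit of an np.datetime64 matters both for its integer value and for conversion back to Python. The sketch below shows unit inference, rescaling with astype(), and why conversion to datetime.datetime is only reliable for units of microseconds or coarser. At nanosecond resolution, .item() returns a plain int, because datetime.datetime cannot store nanoseconds.

```python
import numpy as np

# The unit is inferred from the string: a bare date gives day resolution
d = np.datetime64("2018-01-01")
print(d.dtype)  # datetime64[D]

# Converting to a finer unit rescales the underlying integer
d_s = d.astype("datetime64[s]")
print(d_s.astype(np.int64))  # 1514764800 (seconds since 1970-01-01)

# An explicit unit can also be passed as the second argument
t_us = np.datetime64("2018-01-01T19:03:01", "us")
t_ns = np.datetime64("2018-01-01T19:03:01", "ns")

# .item() yields a datetime.datetime for microsecond (or coarser) units ...
print(type(t_us.item()))  # <class 'datetime.datetime'>
# ... but a plain int for nanoseconds
print(type(t_ns.item()))  # <class 'int'>
```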
Pandas: pd.Timestamp
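pd.Timestamp is built on np.datetime64. Before pandas 2.0 it always used nanosecond resolution (datetime64[ns]); pandas 2.0 and later also support coarser units such as seconds. Pandas is good at recognizing various string representations of datetimes: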
dt_pd = pd.Timestamp("2018/8/1") # same result as pd.to_datetime("2018/8/1") for a single string
dt_pd
Timestamp('2018-08-01 00:00:00')
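To illustrate the string parsing mentioned above, here are a few layouts that pandas infers automatically. Ambiguous day/month orders are better made explicit with dayfirst= or, most robustly, with an explicit format=. The example strings are illustrative choices that all denote 1 August 2018.

```python
import pandas as pd

# pandas infers many common layouts; all of these denote 1 August 2018
candidates = ["2018-08-01", "2018/8/1", "1 Aug 2018", "August 1, 2018"]
parsed = [pd.Timestamp(s) for s in candidates]
print(all(p == parsed[0] for p in parsed))  # True

# Ambiguous day/month order: state it explicitly with dayfirst=...
print(pd.to_datetime("01/08/2018", dayfirst=True))  # 2018-08-01 00:00:00

# ... or, safest, give an explicit strftime-style format
print(pd.to_datetime("01.08.2018", format="%d.%m.%Y"))  # 2018-08-01 00:00:00
```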
dt_pd.to_numpy() # pd.Timestamp -> np.datetime64
np.datetime64('2018-08-01T00:00:00')
dt_pd.to_pydatetime() # pd.Timestamp -> datetime.datetime
datetime.datetime(2018, 8, 1, 0, 0)
pd.Timestamp(dt_np) # np.datetime64 -> pd.Timestamp
Timestamp('2018-01-01 19:03:01')
pd.Timestamp(dt_dt) # datetime.datetime -> pd.Timestamp
Timestamp('2018-01-01 19:03:01')
Timedeltas
We often need to represent differences between two timestamps. The most common representations are:
datetime.timedelta (Python built-in)
np.timedelta64 (NumPy)
pd.Timedelta (pandas)
These correspond to the three datetime representations above.
datetime.timedelta
The Python built-in way of representing differences between two datetimes.
del_dt = timedelta(days=6)
del_dt
datetime.timedelta(days=6)
dt_dt + del_dt # datetime.datetime + datetime.timedelta
datetime.datetime(2018, 1, 7, 19, 3, 1)
dt_dt2 = datetime(2018, 2, 3, 11, 3, 1)
dt_dt2 - dt_dt # datetime.datetime - datetime.datetime
datetime.timedelta(days=32, seconds=57600)
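datetime.timedelta stores only days, seconds and microseconds, and normalizes every other unit into those fields. That is why the difference above prints as days=32, seconds=57600 rather than in hours. The sketch below shows the normalization and two ways of turning a timedelta into a single number.

```python
from datetime import datetime, timedelta

# Hours, minutes, weeks, ... are normalized into days/seconds/microseconds
delta = timedelta(hours=36)
print(delta)  # 1 day, 12:00:00
print(delta.days, delta.seconds)  # 1 43200

diff = datetime(2018, 2, 3, 11, 3, 1) - datetime(2018, 1, 1, 19, 3, 1)

# Total length in seconds as a float
print(diff.total_seconds())  # 2822400.0

# Dividing by another timedelta expresses the duration in that unit (hours)
print(diff / timedelta(hours=1))  # 784.0
```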
NumPy: np.timedelta64
np.timedelta64 is likewise an int64 counting a duration in a specific unit, e.g. seconds or nanoseconds.
dt_np2 = np.datetime64('2018-02-02 16:21:11')
del_np = dt_np2 - dt_np # np.datetime64 - np.datetime64
del_np
np.timedelta64(2755090,'s')
dt_np + del_np
np.datetime64('2018-02-02T16:21:11')
np.int64(del_np), del_np.dtype.name
(np.int64(2755090), 'timedelta64[s]')
del_np.astype(timedelta) # np.timedelta64 -> datetime.timedelta
datetime.timedelta(days=31, seconds=76690)
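Because an np.timedelta64 is tied to a unit, expressing it in another unit is done by dividing by a one-unit timedelta64 or by casting. The sketch below recomputes the difference from above. Note that casting to a coarser unit drops the remainder: 31 days plus 76690 s becomes 31 days.

```python
import numpy as np

delta_np = np.datetime64("2018-02-02 16:21:11") - np.datetime64("2018-01-01 19:03:01")

# Express the duration in hours by dividing by a one-hour timedelta64
hours = delta_np / np.timedelta64(1, "h")
print(hours)  # a float, roughly 765.3

# Casting to a coarser unit drops the remainder (here: whole days only)
print(delta_np.astype("timedelta64[D]"))  # 31 days

# Values with different units compare by their actual duration
print(np.timedelta64(1, "D") == np.timedelta64(86400, "s"))  # True
```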
Pandas: pd.Timedelta
dt_pd2 = pd.Timestamp("2018/8/4 23:01:03")
del_pd = dt_pd2 - dt_pd # pd.Timestamp - pd.Timestamp
del_pd
Timedelta('3 days 23:01:03')
dt_pd + del_pd
Timestamp('2018-08-04 23:01:03')
del_pd.total_seconds()
342063.0
print(pd.Timedelta(del_dt)) # datetime.timedelta -> pd.Timedelta
print(pd.Timedelta(del_np)) # np.timedelta64 -> pd.Timedelta
6 days 00:00:00
31 days 21:18:10
print(del_pd.to_pytimedelta()) # pd.Timedelta -> datetime.timedelta
print(del_pd.to_timedelta64()) # pd.Timedelta -> np.timedelta64
3 days, 23:01:03
342063 seconds
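A pd.Timedelta can also be constructed directly, either from keyword arguments or by parsing a string in the same layout as its own repr. The sketch below rebuilds the 3 days 23:01:03 difference from above. It reads its parts via .days, .seconds and .components, and converts it to hours by division.

```python
import pandas as pd

# Built from keywords or parsed from a string: both give the same duration
a = pd.Timedelta(days=3, hours=23, minutes=1, seconds=3)
b = pd.Timedelta("3 days 23:01:03")
print(a == b)  # True

# .days and .seconds mirror datetime.timedelta; .components splits all fields
print(b.days, b.seconds)  # 3 82863
print(b.components.hours)  # 23

# Duration in hours: divide by a one-hour Timedelta
print(b / pd.Timedelta(hours=1))  # a float, roughly 95.02
```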
Datetime ranges
Pandas excels at handling whole vectors of datetimes. Use the pd.date_range() function to create a pd.DatetimeIndex:
dti = pd.date_range('2018', periods=8, freq='5D')
dti
DatetimeIndex(['2018-01-01', '2018-01-06', '2018-01-11', '2018-01-16',
               '2018-01-21', '2018-01-26', '2018-01-31', '2018-02-05'],
              dtype='datetime64[ns]', freq='5D')
tdi = dti - dti[0]
tdi
TimedeltaIndex([ '0 days',  '5 days', '10 days', '15 days', '20 days',
                '25 days', '30 days', '35 days'],
               dtype='timedelta64[ns]', freq='5D')
tdi.total_seconds().to_numpy()
array([      0.,  432000.,  864000., 1296000., 1728000., 2160000.,
       2592000., 3024000.])
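Besides start, periods and freq, pd.date_range() also accepts an end date, and pandas offers pd.timedelta_range() as the counterpart for a pd.TimedeltaIndex. The sketch below uses the "D" (calendar day) and "MS" (month start) frequency aliases as examples; the variable names are illustrative only.

```python
import pandas as pd

# start + end + freq instead of start + periods + freq
days = pd.date_range("2018-01-01", "2018-01-03", freq="D")
print(len(days))  # 3, both endpoints are included

# "MS" = month-start frequency
months = pd.date_range("2018-01-01", "2018-04-01", freq="MS")
print(months.month.tolist())  # [1, 2, 3, 4]

# The timedelta counterpart returns a pd.TimedeltaIndex
tdr = pd.timedelta_range(start="0 days", periods=3, freq="5D")
print(tdr.days.tolist())  # [0, 5, 10]
```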
Slicing with DatetimeIndex
df = pd.DataFrame(np.ones(8), index=dti, columns=['one'])
df
| | one |
---|---|
2018-01-01 | 1.0 |
2018-01-06 | 1.0 |
2018-01-11 | 1.0 |
2018-01-16 | 1.0 |
2018-01-21 | 1.0 |
2018-01-26 | 1.0 |
2018-01-31 | 1.0 |
2018-02-05 | 1.0 |
df.loc["2018-01-05":"2018-01-21"] # note: unlike positional slicing, the end label is included; the start label need not exist in the index
| | one |
---|---|
2018-01-06 | 1.0 |
2018-01-11 | 1.0 |
2018-01-16 | 1.0 |
2018-01-21 | 1.0 |
df.loc["2018-01-26":]
| | one |
---|---|
2018-01-26 | 1.0 |
2018-01-31 | 1.0 |
2018-02-05 | 1.0 |
df.loc["2018-02"]
| | one |
---|---|
2018-02-05 | 1.0 |
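The slicing rules above can be checked on the same DataFrame. The sketch below rebuilds it and shows three things. A partial date string selects the whole period it names. Label slices include their end while iloc slices exclude it. A boolean mask on the index works as an alternative.

```python
import numpy as np
import pandas as pd

dti = pd.date_range("2018", periods=8, freq="5D")
df = pd.DataFrame(np.ones(8), index=dti, columns=["one"])

# A partial date string selects the whole period it names (all of January)
print(len(df.loc["2018-01"]))  # 7

# Label slices include the end label; positional iloc slices exclude it
print(len(df.loc["2018-01-06":"2018-01-16"]))  # 3
print(len(df.iloc[1:3]))  # 2

# Boolean masks work too, e.g. all rows before a given Timestamp
print(len(df[df.index < pd.Timestamp("2018-01-11")]))  # 2
```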