10  Reading dfs0

This section guides you through loading time series data from dfs0 files into Pandas DataFrames. This approach allows you to leverage your existing Pandas skills, learned in Module 1, for powerful time series analysis and manipulation.

10.1 Workflow

The general workflow for working with dfs0 data is as follows:

  1. Read dfs0 file into Dataset
  2. Subset Dataset for specific items and times (optional)
  3. Convert Dataset (or DataArray) to DataFrame
  4. Perform some additional analysis via the DataFrame

10.2 Datasets

The primary function for reading MIKE IO files is mikeio.read(). It returns a Dataset object, which is a container for one or more DataArray objects (e.g. a specific time series).

Reading a dfs0 file into a Dataset is as simple as calling the read() method with the dfs0 file path as the argument:

ds = mikeio.read("data/sirius_idf_rainfall.dfs0")
ds
<mikeio.Dataset>
dims: (time:22)
time: 2019-01-01 00:00:00 - 2019-01-02 00:00:00 (22 non-equidistant records)
geometry: GeometryUndefined()
items:
  0:   F=20 <Rainfall Intensity> (mm per hour) - 3
  1:   F=10 <Rainfall Intensity> (mm per hour) - 3
  2:   F=5 <Rainfall Intensity> (mm per hour) - 3
  3:   F=2 <Rainfall Intensity> (mm per hour) - 3
  4:   F=1 <Rainfall Intensity> (mm per hour) - 3
  5:   F=0.5 <Rainfall Intensity> (mm per hour) - 3
  6:   F=0.2 <Rainfall Intensity> (mm per hour) - 3
  7:   F=0.1 <Rainfall Intensity> (mm per hour) - 3
  8:   F=0.05 <Rainfall Intensity> (mm per hour) - 3

Notice the representation of the Dataset object shows information about:

  • Total number of time steps
  • Timestamps for first and last time step
  • All the items (i.e. DataArrays) available

By default, mikeio.read() loads the entire dfs0 file into memory. This is fine for smaller files, but for very large dfs0 files, you might want to load only specific items or a particular time range to conserve memory and improve performance. You can do this directly with the items or time arguments in the read() function. For example, to read only the first item (index 0):

ds = mikeio.read("data/sirius_idf_rainfall.dfs0", items=0)
ds
<mikeio.Dataset>
dims: (time:22)
time: 2019-01-01 00:00:00 - 2019-01-02 00:00:00 (22 non-equidistant records)
geometry: GeometryUndefined()
items:
  0:   F=20 <Rainfall Intensity> (mm per hour) - 3

Similarly, to read only the data for the first time step (index 0):

ds = mikeio.read("data/sirius_idf_rainfall.dfs0", time=0)
ds
<mikeio.Dataset>
dims: ()
time: 2019-01-01 00:00:00 (time-invariant)
geometry: GeometryUndefined()
items:
  0:   F=20 <Rainfall Intensity> (mm per hour) - 3
  1:   F=10 <Rainfall Intensity> (mm per hour) - 3
  2:   F=5 <Rainfall Intensity> (mm per hour) - 3
  3:   F=2 <Rainfall Intensity> (mm per hour) - 3
  4:   F=1 <Rainfall Intensity> (mm per hour) - 3
  5:   F=0.5 <Rainfall Intensity> (mm per hour) - 3
  6:   F=0.2 <Rainfall Intensity> (mm per hour) - 3
  7:   F=0.1 <Rainfall Intensity> (mm per hour) - 3
  8:   F=0.05 <Rainfall Intensity> (mm per hour) - 3

10.3 DataArrays

DataArray objects are accessed via the Dataset object after reading.

Select a specific DataArray from the Dataset by its index or its name. For example, to select the first DataArray by its index:

da = ds[0]
da
<mikeio.DataArray>
name:  F=20
dims: ()
time: 2019-01-01 00:00:00 (time-invariant)
geometry: GeometryUndefined()
values: 0.0

Alternatively, select a DataArray by its name using square brackets:

da = ds[" F=20"]
da
<mikeio.DataArray>
name:  F=20
dims: ()
time: 2019-01-01 00:00:00 (time-invariant)
geometry: GeometryUndefined()
values: 0.0

Notice the representation of the DataArray object is also informative, just like the Dataset object.

10.4 Convert to Pandas

You can convert an entire Dataset (which might contain multiple time series) into a Pandas DataFrame. Each item in the Dataset will become a column in the DataFrame.

import pandas as pd

ds = mikeio.read("data/sirius_idf_rainfall.dfs0")
df = ds.to_dataframe()
df.head()
F=20 F=10 F=5 F=2 F=1 F=0.5 F=0.2 F=0.1 F=0.05
2019-01-01 00:00:00 0.00 0.000000 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.0
2019-01-01 06:00:00 0.15 0.283333 0.4 0.4 0.466667 0.683333 0.966667 1.316667 1.4
2019-01-01 07:00:00 0.20 0.400000 0.6 0.6 0.800000 1.100000 1.600000 2.200000 2.3
2019-01-01 08:00:00 0.30 0.600000 0.7 0.8 0.900000 1.400000 2.000000 2.800000 3.1
2019-01-01 09:00:00 0.30 0.600000 0.9 1.0 1.200000 1.800000 2.600000 3.600000 4.0

Similarly, a single DataArray can be converted to a Pandas DataFrame (which will have one data column).

da = ds[" F=20"]
df_T20 = da.to_dataframe()
df_T20.head()
F=20
2019-01-01 00:00:00 0.00
2019-01-01 06:00:00 0.15
2019-01-01 07:00:00 0.20
2019-01-01 08:00:00 0.30
2019-01-01 09:00:00 0.30

Once your data is in a DataFrame, you can use all of Pandas’ powerful methods. For instance, you can easily plot a time series:

df_T20.plot(
    title="Rainfall for Return Period F=20",
    ylabel="Rainfall (mm/hr)"
)

Or get some descriptive statistics:

df.describe().T
count mean std min 25% 50% 75% max
F=20 22.0 2.158333 4.824954 0.0 0.300 0.60 1.80 22.799999
F=10 22.0 4.039394 9.101874 0.0 0.525 1.10 3.00 43.200001
F=5 22.0 5.890909 13.389296 0.0 0.725 1.60 4.65 63.599998
F=2 22.0 9.408333 17.422466 0.0 0.825 2.00 8.40 75.599998
F=1 22.0 12.641667 25.616199 0.0 0.950 2.40 9.75 114.000000
F=0.5 22.0 15.833333 30.442241 0.0 1.425 3.40 13.50 134.399994
F=0.2 22.0 22.475000 36.939988 0.0 2.075 5.30 23.40 151.199997
F=0.1 22.0 27.500758 39.726610 0.0 2.900 7.60 34.05 153.600006
F=0.05 22.0 30.627272 42.803784 0.0 3.200 8.65 39.75 158.399994
Learning Python’s scientific ecosystem pays off…

Notice that using a common structure for data (e.g. DataFrame) unlocks familiar analyses independent of the original data source file format (e.g. dfs0, csv). This is an example of why converting data into a format compatible with the scientific Python ecosystem is useful.