Dfs0

Dfs0#

import pandas as pd
import mikeio

Reading data#

ds = mikeio.read("data/Oresund_ts.dfs0")
ds

<mikeio.Dataset>
dims: (time:2017)
time: 2018-03-04 00:00:00 - 2018-03-11 00:00:00 (2017 records)
geometry: GeometryUndefined()
number of items: 12

type(ds)

mikeio.dataset._dataset.Dataset

The mikeio read function returns a Dataset which is a container of DataArrays.

A DataArray can be selected by name or by index.

da = ds["Drogden: Surface elevation"]   # or ds.Drogden_Surface_elevation or ds[2]
da

<mikeio.DataArray>
name: Drogden: Surface elevation
dims: (time:2017)
time: 2018-03-04 00:00:00 - 2018-03-11 00:00:00 (2017 records)
geometry: GeometryUndefined()
values: [0.05063, 0.04713, ..., 0.2404]

Upon read, specific items can be selected with the items argument using name or index.

ds = mikeio.read("data/Oresund_ts.dfs0", items=[0,2,3])
ds

<mikeio.Dataset>
dims: (time:2017)
time: 2018-03-04 00:00:00 - 2018-03-11 00:00:00 (2017 records)
geometry: GeometryUndefined()
items:
  0:  Viken: Surface elevation <Surface Elevation> (meter)
  1:  Drogden: Surface elevation <Surface Elevation> (meter)
  2:  Klagshamn: Surface elevation <Surface Elevation> (meter)

Wildcards can be used to select multiple items:

ds = mikeio.read("data/Oresund_ts.dfs0", items="*Surf*")
ds

<mikeio.Dataset>
dims: (time:2017)
time: 2018-03-04 00:00:00 - 2018-03-11 00:00:00 (2017 records)
geometry: GeometryUndefined()
items:
  0:  Viken: Surface elevation <Surface Elevation> (meter)
  1:  Hornbæk: Surface elevation <Surface Elevation> (meter)
  2:  Drogden: Surface elevation <Surface Elevation> (meter)
  3:  Klagshamn: Surface elevation <Surface Elevation> (meter)

A specific time subset can be selected by index (or sel method) or already upon read:

ds["2018-03-04"]  # or ds.sel(time="2018-03-04")

<mikeio.Dataset>
dims: (time:288)
time: 2018-03-04 00:00:00 - 2018-03-04 23:55:00 (288 records)
geometry: GeometryUndefined()
items:
  0:  Viken: Surface elevation <Surface Elevation> (meter)
  1:  Hornbæk: Surface elevation <Surface Elevation> (meter)
  2:  Drogden: Surface elevation <Surface Elevation> (meter)
  3:  Klagshamn: Surface elevation <Surface Elevation> (meter)

ds = mikeio.read("data/Oresund_ts.dfs0", items="*Surf*", time="2018-03-04")
ds

<mikeio.Dataset>
dims: (time:288)
time: 2018-03-04 00:00:00 - 2018-03-04 23:55:00 (288 records)
geometry: GeometryUndefined()
items:
  0:  Viken: Surface elevation <Surface Elevation> (meter)
  1:  Hornbæk: Surface elevation <Surface Elevation> (meter)
  2:  Drogden: Surface elevation <Surface Elevation> (meter)
  3:  Klagshamn: Surface elevation <Surface Elevation> (meter)

The Dataset and DataArray have a number of useful attributes like time, items, ndims, shape, values (only DataArray) etc

ds.time

DatetimeIndex(['2018-03-04 00:00:00', '2018-03-04 00:05:00',
               '2018-03-04 00:10:00', '2018-03-04 00:15:00',
               '2018-03-04 00:20:00', '2018-03-04 00:25:00',
               '2018-03-04 00:30:00', '2018-03-04 00:35:00',
               '2018-03-04 00:40:00', '2018-03-04 00:45:00',
               ...
               '2018-03-04 23:10:00', '2018-03-04 23:15:00',
               '2018-03-04 23:20:00', '2018-03-04 23:25:00',
               '2018-03-04 23:30:00', '2018-03-04 23:35:00',
               '2018-03-04 23:40:00', '2018-03-04 23:45:00',
               '2018-03-04 23:50:00', '2018-03-04 23:55:00'],
              dtype='datetime64[ns]', length=288, freq=None)

ds.items

[Viken: Surface elevation <Surface Elevation> (meter),
 Hornbæk: Surface elevation <Surface Elevation> (meter),
 Drogden: Surface elevation <Surface Elevation> (meter),
 Klagshamn: Surface elevation <Surface Elevation> (meter)]

da.item

Drogden: Surface elevation <Surface Elevation> (meter)

da.shape

(2017,)

da.values

array([0.05062908, 0.04713159, 0.04382962, ..., 0.23521478, 0.23782268,
       0.24038477])

The time series can be plotted with the plot method.

ds.plot();

_images/eb4de9caa58a968f3eea30776d7bbc0841f0aa936913760e44c74bb0bf18c7a9.png

A simple timeseries Dataset can easily be converted to a Pandas DataFrame.

df = ds.to_pandas() # convert dataset to dataframe
df

	Viken: Surface elevation	Hornbæk: Surface elevation	Drogden: Surface elevation	Klagshamn: Surface elevation
2018-03-04 00:00:00	0.268416	0.279761	0.050629	0.018723
2018-03-04 00:05:00	0.265478	0.276792	0.047132	0.014833
2018-03-04 00:10:00	0.262521	0.273909	0.043830	0.011230
2018-03-04 00:15:00	0.259481	0.270935	0.040655	0.007876
2018-03-04 00:20:00	0.256348	0.267858	0.037621	0.004712
...	...	...	...	...
2018-03-04 23:35:00	0.232721	0.237954	0.099927	0.076122
2018-03-04 23:40:00	0.229933	0.235055	0.098214	0.073399
2018-03-04 23:45:00	0.227105	0.232199	0.096391	0.070642
2018-03-04 23:50:00	0.224168	0.229341	0.094460	0.067864
2018-03-04 23:55:00	0.221046	0.226392	0.092474	0.065087

288 rows × 4 columns

Writing data#

Often, time series data will come from a csv or an Excel file. Here is an example of how to read a csv file with pandas and then write the pandas DataFrame to a dfs0 file.

df = pd.read_csv("data/naples_fl.csv", skiprows=1, parse_dates=True, index_col=0)
df

	TAVG (Degrees Fahrenheit)	TMAX (Degrees Fahrenheit)	TMIN (Degrees Fahrenheit)	PRCP (Inches)	SNOW (Inches)	SNWD (Inches)
Date
2002-03-01	67.0	78.0	56.0	0.00	NaN	NaN
2002-03-02	76.0	83.0	69.0	0.00	NaN	NaN
2002-03-03	78.0	84.0	71.0	0.00	NaN	NaN
2002-03-04	64.0	76.0	51.0	0.08	NaN	NaN
2002-03-05	58.0	70.0	45.0	0.00	NaN	NaN
...	...	...	...	...	...	...
2021-08-11	NaN	93.0	77.0	0.23	NaN	NaN
2021-08-12	NaN	94.0	77.0	0.00	0.0	0.0
2021-08-13	NaN	95.0	77.0	0.03	0.0	0.0
2021-08-14	NaN	85.0	74.0	0.05	0.0	0.0
2021-08-15	NaN	83.0	75.0	0.01	0.0	0.0

7108 rows × 6 columns

You will probably have the need to parse certain a specific data formats many times, then it is a good idea to create a function.

def read_ncei_obs(filename):
    # old name : new name
    mapping = {'TAVG (Degrees Fahrenheit)': 'temperature_avg_f',
               'TMAX (Degrees Fahrenheit)': 'temperature_max_f',
               'TMIN (Degrees Fahrenheit)': 'temperature_min_f',
               'PRCP (Inches)': 'prec_in'}
    
    sel_cols = mapping.values() # No need to repeat ['temperature_avg_f',...]
    df = (
        pd.read_csv(filename, skiprows=1, parse_dates=True, index_col=0)
           .rename(columns=mapping)
    )[sel_cols]
    return df

df = read_ncei_obs("data/naples_fl.csv")
df.head()

	temperature_avg_f	temperature_max_f	temperature_min_f	prec_in
Date
2002-03-01	67.0	78.0	56.0	0.00
2002-03-02	76.0	83.0	69.0	0.00
2002-03-03	78.0	84.0	71.0	0.00
2002-03-04	64.0	76.0	51.0	0.08
2002-03-05	58.0	70.0	45.0	0.00

df.tail()

	temperature_avg_f	temperature_max_f	temperature_min_f	prec_in
Date
2021-08-11	NaN	93.0	77.0	0.23
2021-08-12	NaN	94.0	77.0	0.00
2021-08-13	NaN	95.0	77.0	0.03
2021-08-14	NaN	85.0	74.0	0.05
2021-08-15	NaN	83.0	75.0	0.01

df.shape

(7108, 4)

Convert temperature to Celsius and precipitation to mm.

df_final = df.assign(temperature_max_c=(df['temperature_max_f'] - 32)/1.8,
                 prec_mm=df['prec_in'] * 25.4)

df_final.head()

	temperature_avg_f	temperature_max_f	temperature_min_f	prec_in	temperature_max_c	prec_mm
Date
2002-03-01	67.0	78.0	56.0	0.00	25.555556	0.000
2002-03-02	76.0	83.0	69.0	0.00	28.333333	0.000
2002-03-03	78.0	84.0	71.0	0.00	28.888889	0.000
2002-03-04	64.0	76.0	51.0	0.08	24.444444	2.032
2002-03-05	58.0	70.0	45.0	0.00	21.111111	0.000

df_final.loc['2021'].plot();

_images/4e4d79f8e5b85b548fe3aa380efc04c2123cfc767900d39d3cd1c0afb22f0258.png

Creating a dfs0 file from a dataframe is pretty straightforward.

Convert the dataframe to a Dataset

ds = mikeio.from_pandas(df_final)
ds

<mikeio.Dataset>
dims: (time:7108)
time: 2002-03-01 00:00:00 - 2021-08-15 00:00:00 (7108 records)
geometry: GeometryUndefined()
items:
  0:  temperature_avg_f <Undefined> (undefined)
  1:  temperature_max_f <Undefined> (undefined)
  2:  temperature_min_f <Undefined> (undefined)
  3:  prec_in <Undefined> (undefined)
  4:  temperature_max_c <Undefined> (undefined)
  5:  prec_mm <Undefined> (undefined)

Write the Dataset to a dfs0 file.

ds.to_dfs("output/naples_fl.dfs0")

Let’s read it back in again…

saved_ds = mikeio.read("output/naples_fl.dfs0")
saved_ds

<mikeio.Dataset>
dims: (time:7108)
time: 2002-03-01 00:00:00 - 2021-08-15 00:00:00 (7108 records)
geometry: GeometryUndefined()
items:
  0:  temperature_avg_f <Undefined> (undefined)
  1:  temperature_max_f <Undefined> (undefined)
  2:  temperature_min_f <Undefined> (undefined)
  3:  prec_in <Undefined> (undefined)
  4:  temperature_max_c <Undefined> (undefined)
  5:  prec_mm <Undefined> (undefined)

By default, EUM types are undefined. But it can be specified. Let’s select a few colums.

df2 = df_final[['temperature_max_c', 'prec_in']]
df2.head()

	temperature_max_c	prec_in
Date
2002-03-01	25.555556	0.00
2002-03-02	28.333333	0.00
2002-03-03	28.888889	0.00
2002-03-04	24.444444	0.08
2002-03-05	21.111111	0.00

from mikeio import ItemInfo, EUMType, EUMUnit

ds2 = mikeio.from_pandas(df2, 
                         items=[
                   ItemInfo(EUMType.Temperature),
                   ItemInfo(EUMType.Precipitation_Rate, EUMUnit.inch_per_day)]
           )
ds2

<mikeio.Dataset>
dims: (time:7108)
time: 2002-03-01 00:00:00 - 2021-08-15 00:00:00 (7108 records)
geometry: GeometryUndefined()
items:
  0:  temperature_max_c <Temperature> (degree Celsius)
  1:  prec_in <Precipitation Rate> (inch per day)

EUM#

from mikeio.eum import ItemInfo, EUMType, EUMUnit

EUMType.search("wind")

[Wind Velocity,
 Wind Direction,
 Wind friction factor,
 Wind speed,
 Depth of Wind,
 Wind friction speed]

EUMType.Wind_speed.units

[meter per sec, feet per sec, knot, km per hour, miles per hour]

Inline Exercise#

What is the best EUM Type for “peak wave direction”? What is the default unit?

# insert your code here

Precipitation data#

df = pd.read_csv("data/precipitation.csv", parse_dates=True, index_col=0)
df.head()

	Precipitation station 1	Precipitation station 2	Precipitation station 3	Precipitation station 4	Precipitation station 5	Precipitation station 6	Precipitation station 7	Precipitation station 8	Precipitation station 9
date
2001-01-01	0.0	0.000	0.021	0.071	0.000	0.000	0.025	0.025	0.000
2001-01-02	0.0	0.025	0.037	0.000	0.004	0.054	0.042	0.021	0.054
2001-01-03	0.0	0.000	0.000	0.000	0.000	0.000	0.000	0.042	0.000
2001-01-04	0.0	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
2001-01-05	0.0	0.000	0.158	0.021	0.000	0.000	0.017	0.021	0.000

from mikecore.DfsFile import DataValueType

(mikeio.from_pandas(df, items=ItemInfo(EUMType.Precipitation_Rate, EUMUnit.mm_per_hour, data_value_type=DataValueType.MeanStepBackward))
        .to_dfs("output/precipitation.dfs0")
)

Selecting#

ds = mikeio.read("output/precipitation.dfs0", items=[1,4]) # select item by item number (starting from zero)
ds

<mikeio.Dataset>
dims: (time:31)
time: 2001-01-01 00:00:00 - 2001-01-31 00:00:00 (31 records)
geometry: GeometryUndefined()
items:
  0:  Precipitation station 2 <Precipitation Rate> (mm per hour) - 3
  1:  Precipitation station 5 <Precipitation Rate> (mm per hour) - 3

ds = mikeio.read("output/precipitation.dfs0", items=["Precipitation station 5","Precipitation station 1"]) # or by name (in the order you like it)
ds

<mikeio.Dataset>
dims: (time:31)
time: 2001-01-01 00:00:00 - 2001-01-31 00:00:00 (31 records)
geometry: GeometryUndefined()
items:
  0:  Precipitation station 5 <Precipitation Rate> (mm per hour) - 3
  1:  Precipitation station 1 <Precipitation Rate> (mm per hour) - 3

Inline Exercise#

Read all items to a variable ds. Select “Precipitation station 3” - which different ways can you select this item?

# insert your code here

import utils

utils.sysinfo()

    System: 3.11.9 (main, Jul 15 2024, 21:50:21) [GCC 11.4.0]
    NumPy: 2.1.1
    Pandas: 2.2.2
    MIKE IO: 2.1.0
    Last modified: 2024-09-18 15:25:46.457344
    

Dfs0

Contents

Dfs0#

Reading data#

Writing data#

EUM#

Inline Exercise#

Precipitation data#

Selecting#

Inline Exercise#