Dataset

The Dataset is the MIKE IO data structure for data from dfs files. The mikeio.read method returns a Dataset as a container of DataArrays (Dfs items). Each DataArray has the properties item, time, geometry and values. The time and geometry are common to all DataArrays in the Dataset.

The Dataset has the following primary properties:

items - a list of mikeio.ItemInfo items, one for each DataArray
time - a pandas.DatetimeIndex with the time instances of the data
geometry - a Geometry object with the spatial description of the data

Use the Dataset’s string representation to get an overview of the Dataset:

import mikeio
ds = mikeio.read("../data/HD2D.dfsu")
ds

<mikeio.Dataset>
dims: (time:9, element:884)
time: 1985-08-06 07:00:00 - 1985-08-07 03:00:00 (9 records)
geometry: Dfsu2D (884 elements, 529 nodes)
items:
  0:  Surface elevation <Surface Elevation> (meter)
  1:  U velocity <u velocity component> (meter per sec)
  2:  V velocity <v velocity component> (meter per sec)
  3:  Current speed <Current Speed> (meter per sec)

Selecting items

Selecting a specific item “itemA” (at position 0) from a Dataset ds can be done with:

ds[["itemA"]] - returns a new Dataset with “itemA”
ds["itemA"] - returns “itemA” DataArray
ds[[0]] - returns a new Dataset with “itemA”
ds[0] - returns “itemA” DataArray
ds.itemA - returns “itemA” DataArray

We recommend using named items for readability.

ds.Surface_elevation

<mikeio.DataArray>
name: Surface elevation
dims: (time:9, element:884)
time: 1985-08-06 07:00:00 - 1985-08-07 03:00:00 (9 records)
geometry: Dfsu2D (884 elements, 529 nodes)

A negative index, e.g. ds[-1], can also be used to select from the end. Several items (“itemA” at 0 and “itemC” at 2) can be selected with the notation:

ds[["itemA", "itemC"]]
ds[[0, 2]]

Note that this behavior is similar to pandas and xarray.

Temporal selection

A time slice of a Dataset can be selected in several different ways, for example with the sel method or by slicing with a time string:

ds.sel(time="1985-08-06 12:00")

<mikeio.Dataset>
dims: (element:884)
time: 1985-08-06 12:00:00 (time-invariant)
geometry: Dfsu2D (884 elements, 529 nodes)
items:
  0:  Surface elevation <Surface Elevation> (meter)
  1:  U velocity <u velocity component> (meter per sec)
  2:  V velocity <v velocity component> (meter per sec)
  3:  Current speed <Current Speed> (meter per sec)

ds["1985-8-7":]

<mikeio.Dataset>
dims: (time:2, element:884)
time: 1985-08-07 00:30:00 - 1985-08-07 03:00:00 (2 records)
geometry: Dfsu2D (884 elements, 529 nodes)
items:
  0:  Surface elevation <Surface Elevation> (meter)
  1:  U velocity <u velocity component> (meter per sec)
  2:  V velocity <v velocity component> (meter per sec)
  3:  Current speed <Current Speed> (meter per sec)

Spatial selection

Given x and y coordinates, the sel method finds a single element:

ds.sel(x=607002, y=6906734)

<mikeio.Dataset>
dims: (time:9)
time: 1985-08-06 07:00:00 - 1985-08-07 03:00:00 (9 records)
geometry: GeometryPoint2D(x=607002.7094112666, y=6906734.833048992)
items:
  0:  Surface elevation <Surface Elevation> (meter)
  1:  U velocity <u velocity component> (meter per sec)
  2:  V velocity <v velocity component> (meter per sec)
  3:  Current speed <Current Speed> (meter per sec)

Plotting

In most cases, you will not plot the Dataset, but rather its DataArrays. But there are two exceptions:

dfs0-Dataset : plot all items as timeseries with ds.plot()
scatter : compare two items using ds.plot.scatter(x="itemA", y="itemB")

See details in the Dataset Plotter API.

Properties

The Dataset (and DataArray) has several properties:

n_items - Number of items
n_timesteps - Number of timesteps
n_elements - Number of elements
start_time - First time instance (as datetime)
end_time - Last time instance (as datetime)
is_equidistant - Is the time series equidistant in time
timestep - Time step in seconds (if is_equidistant)
shape - Shape of each item
deletevalue - File delete value (NaN value)

Methods

Dataset (and DataArray) has several useful methods for working with data, including different ways of selecting data:

sel() - Select subset along an axis
isel() - Select subset along an axis with an integer

Aggregations along an axis:

mean() - Mean value along an axis
nanmean() - Mean value along an axis (NaN removed)
max() - Max value along an axis
nanmax() - Max value along an axis (NaN removed)
min() - Min value along an axis
nanmin() - Min value along an axis (NaN removed)
average() - Compute the weighted average along the specified axis.
aggregate() - Aggregate along an axis
quantile() - Quantiles along an axis
nanquantile() - Quantiles along an axis (NaN ignored)

Mathematical operations:

ds + value
ds - value
ds * value

Addition and subtraction are also supported between two Datasets (if the number of items and the shapes conform):

ds1 + ds2
ds1 - ds2

Other methods that also return a Dataset:

interp_like() - Spatial (and temporal) interpolation (see Dfsu interpolation notebook)
interp_time() - Temporal interpolation (see Time interpolation notebook)
dropna() - Remove time steps where all items are NaN
squeeze() - Remove axes of length 1

Conversion:

to_dataframe() - Convert Dataset to a pandas.DataFrame.
to_xarray() - Convert Dataset to an xarray.Dataset (great for Dfs2, Dfs3).
to_dfs() - Write Dataset to a Dfs file