Dataset

The Dataset is the MIKE IO data structure for data from dfs files. The mikeio.read method returns a Dataset as a container of DataArrays (Dfs items). Each DataArray has the properties item, time, geometry and values. The time and geometry are common to all DataArrays in the Dataset.

The Dataset has the following primary properties:

Use the Dataset’s string representation to get an overview of its contents:

import mikeio
ds = mikeio.read("../data/HD2D.dfsu")
ds
<mikeio.Dataset>
dims: (time:9, element:884)
time: 1985-08-06 07:00:00 - 1985-08-07 03:00:00 (9 records)
geometry: Dfsu2D (884 elements, 529 nodes)
items:
  0:  Surface elevation <Surface Elevation> (meter)
  1:  U velocity <u velocity component> (meter per sec)
  2:  V velocity <v velocity component> (meter per sec)
  3:  Current speed <Current Speed> (meter per sec)

Selecting items

Selecting a specific item “itemA” (at position 0) from a Dataset ds can be done with:

  • ds[["itemA"]] - returns a new Dataset with “itemA”
  • ds["itemA"] - returns “itemA” DataArray
  • ds[[0]] - returns a new Dataset with “itemA”
  • ds[0] - returns “itemA” DataArray
  • ds.itemA - returns “itemA” DataArray

We recommend using named items for readability.

ds.Surface_elevation
<mikeio.DataArray>
name: Surface elevation
dims: (time:9, element:884)
time: 1985-08-06 07:00:00 - 1985-08-07 03:00:00 (9 records)
geometry: Dfsu2D (884 elements, 529 nodes)

A negative index, e.g. ds[-1], can also be used to select from the end. Several items (“itemA” at 0 and “itemC” at 2) can be selected with the notation:

  • ds[["itemA", "itemC"]]
  • ds[[0, 2]]

Note that this behavior is similar to pandas and xarray.
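
The convention above (a single key returns the item, a list of keys returns a new container) can be illustrated with a small toy class. This is only a sketch of the indexing convention: ToyDataset and its names property are hypothetical stand-ins written for this example, not part of MIKE IO, and the values are made up.

```python
class ToyDataset:
    """Hypothetical stand-in that mimics only the item-selection convention."""

    def __init__(self, items):
        # items: dict mapping item name -> values; insertion order is the position
        self._items = dict(items)

    @property
    def names(self):
        return list(self._items)

    def _name_of(self, key):
        # Accept an item name or an integer position (negative counts from the end)
        return self.names[key] if isinstance(key, int) else key

    def __getitem__(self, key):
        if isinstance(key, list):
            # A list of keys returns a new container, even for a single key
            selected = [self._name_of(k) for k in key]
            return ToyDataset({name: self._items[name] for name in selected})
        # A single key returns the item itself
        return self._items[self._name_of(key)]


ds = ToyDataset({"itemA": [1.0, 2.0], "itemB": [3.0, 4.0], "itemC": [5.0, 6.0]})

single = ds["itemA"]              # the item itself
subset = ds[["itemA", "itemC"]]   # a new container holding two items
last = ds[-1]                     # negative position selects from the end
```

The double brackets are what keep the result a container, which is the same distinction pandas makes between df["a"] and df[["a"]].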

Temporal selection

A time slice of a Dataset can be selected in several different ways.

ds.sel(time="1985-08-06 12:00")
<mikeio.Dataset>
dims: (element:884)
time: 1985-08-06 12:00:00 (time-invariant)
geometry: Dfsu2D (884 elements, 529 nodes)
items:
  0:  Surface elevation <Surface Elevation> (meter)
  1:  U velocity <u velocity component> (meter per sec)
  2:  V velocity <v velocity component> (meter per sec)
  3:  Current speed <Current Speed> (meter per sec)

A slice with a date string selects a range of time steps:

ds["1985-8-7":]
<mikeio.Dataset>
dims: (time:2, element:884)
time: 1985-08-07 00:30:00 - 1985-08-07 03:00:00 (2 records)
geometry: Dfsu2D (884 elements, 529 nodes)
items:
  0:  Surface elevation <Surface Elevation> (meter)
  1:  U velocity <u velocity component> (meter per sec)
  2:  V velocity <v velocity component> (meter per sec)
  3:  Current speed <Current Speed> (meter per sec)
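
To see why the two selections above return one and two time steps respectively, the time axis can be reconstructed with the standard library alone. This is a sketch under an assumption: the 2.5-hour step is inferred from the printed overview (9 records spanning 20 hours) and is not read from the file. The list comprehensions only mirror the selection logic; they are not how MIKE IO implements it.

```python
from datetime import datetime, timedelta

# Time axis inferred from the overview: 9 records, 07:00 on 6 Aug to 03:00 on 7 Aug
start = datetime(1985, 8, 6, 7, 0)
step = timedelta(hours=2, minutes=30)  # assumption: 20 hours / 8 intervals
times = [start + i * step for i in range(9)]

# Mirrors ds.sel(time="1985-08-06 12:00"): a single matching time step
exact = [t for t in times if t == datetime(1985, 8, 6, 12, 0)]

# Mirrors ds["1985-8-7":]: every time step from midnight on 7 August onward
tail = [t for t in times if t >= datetime(1985, 8, 7)]

print(len(exact), len(tail))
```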

Spatial selection

Given x and y coordinates, the sel method selects a single element:

ds.sel(x=607002, y=6906734)
<mikeio.Dataset>
dims: (time:9)
time: 1985-08-06 07:00:00 - 1985-08-07 03:00:00 (9 records)
geometry: GeometryPoint2D(x=607002.7094112666, y=6906734.833048992)
items:
  0:  Surface elevation <Surface Elevation> (meter)
  1:  U velocity <u velocity component> (meter per sec)
  2:  V velocity <v velocity component> (meter per sec)
  3:  Current speed <Current Speed> (meter per sec)

Plotting

In most cases, you will not plot the Dataset, but rather its DataArrays. But there are two exceptions:

  • dfs0-Dataset : plot all items as timeseries with ds.plot()
  • scatter : compare two items using ds.plot.scatter(x="itemA", y="itemB")

See details in the Dataset Plotter API.

Properties

The Dataset (and DataArray) has several properties:

  • n_items - Number of items
  • n_timesteps - Number of timesteps
  • n_elements - Number of elements
  • start_time - First time instance (as datetime)
  • end_time - Last time instance (as datetime)
  • is_equidistant - Is the time series equidistant in time
  • timestep - Time step in seconds (if is_equidistant)
  • shape - Shape of each item
  • deletevalue - File delete value (NaN value)
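
The time-related properties in this list are closely linked: timestep is only meaningful when is_equidistant is true. The following standard-library sketch shows one way such values could be derived from a list of time stamps. The helper time_axis_summary is hypothetical and written for this example; it is not MIKE IO's implementation, and the 2.5-hour spacing is inferred from the HD2D.dfsu overview shown earlier.

```python
from datetime import datetime, timedelta


def time_axis_summary(times):
    """Toy helper (not part of MIKE IO): summarise a sorted list of datetimes."""
    # Collect the distinct gaps between consecutive time stamps, in seconds
    steps = {(b - a).total_seconds() for a, b in zip(times, times[1:])}
    is_equidistant = len(steps) <= 1
    return {
        "n_timesteps": len(times),
        "start_time": times[0],
        "end_time": times[-1],
        "is_equidistant": is_equidistant,
        # A single step size only exists for an equidistant axis
        "timestep": steps.pop() if is_equidistant and steps else None,
    }


start = datetime(1985, 8, 6, 7, 0)
times = [start + i * timedelta(hours=2, minutes=30) for i in range(9)]
summary = time_axis_summary(times)

# Dropping one record makes the axis non-equidistant
irregular = time_axis_summary(times[:3] + times[4:])
```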

Methods

Dataset (and DataArray) has several useful methods for working with data, including different ways of selecting data:

  • sel() - Select subset along an axis
  • isel() - Select subset along an axis with an integer

Aggregations along an axis:

  • mean() - Mean value along an axis
  • nanmean() - Mean value along an axis (NaN removed)
  • max() - Max value along an axis
  • nanmax() - Max value along an axis (NaN removed)
  • min() - Min value along an axis
  • nanmin() - Min value along an axis (NaN removed)
  • average() - Compute the weighted average along the specified axis.
  • aggregate() - Aggregate along an axis
  • quantile() - Quantiles along an axis
  • nanquantile() - Quantiles along an axis (NaN ignored)
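
The difference between an aggregation and its nan-prefixed variant matters as soon as some values are missing. The sketch below uses plain Python lists and made-up values to show the idea for a mean along the time axis; the functions mean_over_time and nanmean_over_time are hypothetical helpers for this example, not MIKE IO methods.

```python
import math

# Rows are time steps, columns are elements (hypothetical values)
data = [
    [1.0, 10.0],
    [2.0, math.nan],
    [3.0, 30.0],
]


def mean_over_time(rows):
    # Plain mean: a single NaN in a column makes that column's result NaN
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def nanmean_over_time(rows):
    # NaN-aware mean: missing values are dropped before averaging
    result = []
    for col in zip(*rows):
        valid = [v for v in col if not math.isnan(v)]
        result.append(sum(valid) / len(valid) if valid else math.nan)
    return result


plain = mean_over_time(data)
skipping = nanmean_over_time(data)
```

Aggregating along time leaves one value per element, so both results have two entries here; only the second element differs between the two variants.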

Mathematical operations

  • ds + value
  • ds - value
  • ds * value

Addition and subtraction are also supported between two Datasets (if the number of items and the shapes conform):

  • ds1 + ds2
  • ds1 - ds2

Other methods that also return a Dataset:

Conversion: