Dataset

Dataset(
    self,
    data,
    time=None,
    items=None,
    geometry=None,
    zn=None,
    dims=None,
    validate=True,
    dt=1.0,
)

Dataset containing one or more DataArrays with common geometry and time.

Most often obtained by reading a dfs file, but it can also be created from a sequence or dictionary of DataArrays. The mikeio.Dataset is inspired by, and similar to, the xarray.Dataset.

The Dataset is primarily a container for one or more DataArrays all having the same time and geometry (and shape, dims, etc.). For convenience, the Dataset provides direct access to these common properties (see Attributes below).
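
For example, once a file has been read, the common metadata can be inspected directly on the Dataset (a minimal sketch, using the dfsu file from the examples further down this page):

>>> ds = mikeio.read("HD2D.dfsu")
>>> ds.time        # common time axis (pd.DatetimeIndex)
>>> ds.geometry    # common spatial geometry
>>> ds.items       # list of ItemInfo
>>> ds.shape       # shape of each DataArray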

Selecting items

Selecting a specific item "itemA" (at position 0) from a Dataset ds can be done in any of the following ways (a short sketch follows the list):

  • ds[["itemA"]] - returns a new Dataset with "itemA"
  • ds["itemA"] - returns the "itemA" DataArray
  • ds[[0]] - returns a new Dataset with "itemA"
  • ds[0] - returns the "itemA" DataArray
  • ds.itemA - returns the "itemA" DataArray
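
A minimal sketch (the item name "itemA" is a placeholder):

>>> da = ds["itemA"]     # the "itemA" DataArray
>>> da = ds[0]           # same DataArray, selected by position
>>> sub = ds[["itemA"]]  # a new Dataset containing only "itemA"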

Examples

import mikeio
mikeio.read("../data/europe_wind_long_lat.dfs2")
<mikeio.Dataset>
dims: (time:1, y:101, x:221)
time: 2012-01-01 00:00:00 (time-invariant)
geometry: Grid2D (ny=101, nx=221)
items:
  0:  Mean Sea Level Pressure <Air Pressure> (hectopascal)
  1:  Wind x-comp (10m) <Wind Velocity> (meter per sec)
  2:  Wind y-comp (10m) <Wind Velocity> (meter per sec)

Attributes

Name Description
deletevalue File delete value.
dims Named array dimensions of each DataArray.
end_time Last time instance (as datetime).
geometry Geometry of each DataArray.
is_equidistant Is Dataset equidistant in time?
items ItemInfo for each of the DataArrays as a list.
n_elements Number of spatial elements/points.
n_items Number of items/DataArrays, equivalent to len().
n_timesteps Number of time steps.
names Name of each of the DataArrays as a list.
ndim Number of array dimensions of each DataArray.
shape Shape of each DataArray.
start_time First time instance (as datetime).
time Time axis.
timestep Time step in seconds if equidistant (and at least two time instances); otherwise None.

Methods

Name Description
aggregate Aggregate along an axis.
average Compute the weighted average along the specified axis.
concat Concatenate Datasets along the time axis.
copy Returns a copy of this dataset.
create_data_array Create a new DataArray with the same time and geometry as the dataset.
describe Generate descriptive statistics.
dropna Remove time steps where all items are NaN.
extract_track Extract data along a moving track.
flipud Flip data upside down (on first non-time axis).
insert Insert DataArray in a specific position.
interp Interpolate data in time and space.
interp_like Interpolate in space (and in time) to other geometry (and time axis).
interp_time Temporal interpolation.
isel Return a new Dataset whose data is given by integer indexing along the specified dimension(s).
max Max value along an axis.
mean Mean value along an axis.
merge Merge Datasets along the item dimension.
min Min value along an axis.
nanmax Max value along an axis (NaN removed).
nanmean Mean value along an axis (NaN removed).
nanmin Min value along an axis (NaN removed).
nanquantile Compute the q-th quantile of the data along the specified axis, while ignoring nan values.
nanstd Standard deviation along an axis (NaN removed).
ptp Range (max - min) a.k.a Peak to Peak along an axis
quantile Compute the q-th quantile of the data along the specified axis.
remove Remove DataArray from Dataset.
rename Rename items (DataArrays) in Dataset.
sel Return a new Dataset whose data is given by selecting index labels along the specified dimension(s).
squeeze Remove axes of length 1.
std Standard deviation along an axis.
to_dataframe Convert Dataset to a Pandas DataFrame.
to_dfs Write dataset to a new dfs file.
to_numpy Stack data to a single ndarray with shape (n_items, n_timesteps, …).
to_pandas Convert Dataset to a Pandas DataFrame.
to_xarray Export to xarray.Dataset.

aggregate

Dataset.aggregate(axis=0, func=np.nanmean, **kwargs)

Aggregate along an axis.

Parameters

Name Type Description Default
axis int | str axis number or “time”, “space” or “items”, by default 0 0
func Callable aggregation function, by default np.nanmean np.nanmean
**kwargs Any additional arguments passed to the function {}

Returns

Name Type Description
Dataset dataset with aggregated values
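
Examples

A minimal sketch; any reducing NumPy function can be passed as func (the file name follows the other examples on this page):

>>> import numpy as np
>>> import mikeio
>>> ds = mikeio.read("HD2D.dfsu")
>>> ds_std = ds.aggregate(axis="time", func=np.nanstd)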

average

Dataset.average(weights, axis=0, **kwargs)

Compute the weighted average along the specified axis.

Wraps numpy.average

Parameters

Name Type Description Default
weights weights to average over required
axis axis number or “time”, “space” or “items”, by default 0 0
**kwargs additional arguments passed to the function {}

Returns

Name Type Description
Dataset dataset with weighted average values

See Also

nanmean : Mean values with NaN values removed
aggregate : Aggregate along an axis with a custom function

Examples

>>> dfs = Dfsu("HD2D.dfsu")
>>> ds = dfs.read(["Current speed"])
>>> area = dfs.get_element_area()
>>> ds2 = ds.average(axis="space", weights=area)

concat

Dataset.concat(datasets, keep='last')

Concatenate Datasets along the time axis.

Parameters

Name Type Description Default
datasets Sequence['Dataset'] list of Datasets to concatenate required
keep Literal['last', 'first'] which values to keep in case of overlap, by default ‘last’ 'last'

Returns

Name Type Description
Dataset concatenated dataset

Examples

>>> import mikeio
>>> ds1 = mikeio.read("HD2D.dfsu", time=[0,1])
>>> ds2 = mikeio.read("HD2D.dfsu", time=[2,3])
>>> ds1.n_timesteps
2
>>> ds3 = mikeio.Dataset.concat([ds1,ds2])
>>> ds3.n_timesteps
4

copy

Dataset.copy()

Returns a copy of this dataset.

create_data_array

Dataset.create_data_array(data, item=None)

Create a new DataArray with the same time and geometry as the dataset.

Examples

>>> ds = mikeio.read("file.dfsu")
>>> values = np.zeros(ds.Temperature.shape)
>>> da = ds.create_data_array(values)
>>> da_name = ds.create_data_array(values,"Foo")
>>> da_eum = ds.create_data_array(values, item=mikeio.ItemInfo("TS", mikeio.EUMType.Temperature))

describe

Dataset.describe(**kwargs)

Generate descriptive statistics.

Wraps pandas.DataFrame.describe.
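
Examples

A minimal sketch; the result is a pandas DataFrame with descriptive statistics per item:

>>> ds = mikeio.read("HD2D.dfsu")
>>> stats = ds.describe()   # count, mean, std, min, quartiles, max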

dropna

Dataset.dropna()

Remove time steps where all items are NaN.

extract_track

Dataset.extract_track(track, method='nearest', dtype=np.float32)

Extract data along a moving track.

Parameters

Name Type Description Default
track pd.DataFrame DataFrame with DatetimeIndex and (x, y) of the track points as the first two columns (coordinates must be in the same coordinate system as the dfsu); alternatively, the filename of a csv or dfs0 file containing t, x, y required
method Literal['nearest', 'inverse_distance'] Spatial interpolation method (‘nearest’ or ‘inverse_distance’) default=‘nearest’ 'nearest'
dtype Any Data type of the returned data, default=np.float32 np.float32

Returns

Name Type Description
Dataset A dataset with data dimension t (the track time); the first two items will be the x- and y-coordinates of the track
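
Examples

A minimal sketch, assuming ds was read from a 2d dfsu file; the track times and coordinates are placeholders and must lie within the data:

>>> import pandas as pd
>>> times = pd.date_range("1985-08-06 07:00", periods=3, freq="h")
>>> track = pd.DataFrame({"x": [340000, 341000, 342000], "y": [6160000, 6161000, 6162000]}, index=times)
>>> ds_track = ds.extract_track(track)   # items: x, y, then one item per original item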

flipud

Dataset.flipud()

Flip data upside down (on first non-time axis).

insert

Dataset.insert(key, value)

Insert DataArray in a specific position.

Parameters

Name Type Description Default
key int index in Dataset where DataArray should be inserted required
value DataArray DataArray to be inserted, must conform with the existing DataArrays and must have a unique item name required
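
Examples

A minimal sketch, creating a new DataArray with create_data_array (documented above) and inserting it first; the item name "Foo" is a placeholder:

>>> values = np.zeros(ds.shape)
>>> da = ds.create_data_array(values, item=mikeio.ItemInfo("Foo"))
>>> ds.insert(0, da)   # "Foo" becomes item 0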

interp

Dataset.interp(time=None, x=None, y=None, z=None, n_nearest=3, **kwargs)

Interpolate data in time and space.

This method currently has limited functionality for spatial interpolation. It will be extended in the future.

The spatial parameters available depend on the geometry of the Dataset:

  • Grid1D: x
  • Grid2D: x, y
  • Grid3D: [not yet implemented!]
  • GeometryFM: (x,y)
  • GeometryFMLayered: (x,y) [surface point will be returned!]

Parameters

Name Type Description Default
time (float, pd.DatetimeIndex or Dataset) timestep in seconds or discrete time instances given by pd.DatetimeIndex (typically from another Dataset da2.time), by default None (=don’t interp in time) None
x float x-coordinate of point to be interpolated to, by default None None
y float y-coordinate of point to be interpolated to, by default None None
z float z-coordinate of point to be interpolated to, by default None None
n_nearest int When using IDW interpolation, how many nearest points should be used, by default: 3 3
**kwargs Any Additional keyword arguments are passed to the interpolant {}

Returns

Name Type Description
Dataset new Dataset with interped data

See Also

sel : Select data using label indexing
interp_like : Interpolate to the time/space of another Dataset
interp_time : Interpolate in the time direction only

Examples

>>> ds = mikeio.read("random.dfs1")
>>> ds.interp(time=3600)
>>> ds.interp(x=110)
>>> ds = mikeio.read("HD2D.dfsu")
>>> ds.interp(x=340000, y=6160000)

interp_like

Dataset.interp_like(other, **kwargs)

Interpolate in space (and in time) to other geometry (and time axis).

Note: currently only supports interpolation from dfsu-2d to dfs2 or other dfsu-2d Datasets

Parameters

Name Type Description Default
other 'Dataset' | DataArray | Grid2D | GeometryFM2D | pd.DatetimeIndex Dataset, DataArray, Grid2D or GeometryFM2D to interpolate to required
**kwargs Any additional kwargs are passed to interpolation method {}

Examples

>>> ds = mikeio.read("HD.dfsu")
>>> ds2 = mikeio.read("wind.dfs2")
>>> dsi = ds.interp_like(ds2)
>>> dsi.to_dfs("HD_gridded.dfs2")
>>> dse = ds.interp_like(ds2, extrapolate=True)
>>> dst = ds.interp_like(ds2.time)

Returns

Name Type Description
Dataset Interpolated Dataset

interp_time

Dataset.interp_time(
    dt=None,
    *,
    freq=None,
    method='linear',
    extrapolate=True,
    fill_value=np.nan,
)

Temporal interpolation.

Wrapper of scipy.interpolate.interp1d.

Parameters

Name Type Description Default
dt float | pd.DatetimeIndex | 'Dataset' | DataArray | None output timestep in seconds or discrete time instances given as a pd.DatetimeIndex (typically from another Dataset ds2.time) None
freq str | None pandas frequency None
method str Specifies the kind of interpolation as a string (‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, ‘next’, where ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point) or as an integer specifying the order of the spline interpolator to use. Default is ‘linear’. 'linear'
extrapolate bool Default True. If False, a ValueError is raised whenever interpolation is attempted on a time outside the original time range (i.e. where extrapolation would be necessary). If True, out-of-bounds values are assigned fill_value True
fill_value float Default NaN. This value will be used to fill in for points outside of the time range. np.nan

Returns

Name Type Description
Dataset

Examples

ds = mikeio.read("../data/HD2D.dfsu")
ds
<mikeio.Dataset>
dims: (time:9, element:884)
time: 1985-08-06 07:00:00 - 1985-08-07 03:00:00 (9 records)
geometry: Dfsu2D (884 elements, 529 nodes)
items:
  0:  Surface elevation <Surface Elevation> (meter)
  1:  U velocity <u velocity component> (meter per sec)
  2:  V velocity <v velocity component> (meter per sec)
  3:  Current speed <Current Speed> (meter per sec)
ds.interp_time(dt=1800)
<mikeio.Dataset>
dims: (time:41, element:884)
time: 1985-08-06 07:00:00 - 1985-08-07 03:00:00 (41 records)
geometry: Dfsu2D (884 elements, 529 nodes)
items:
  0:  Surface elevation <Surface Elevation> (meter)
  1:  U velocity <u velocity component> (meter per sec)
  2:  V velocity <v velocity component> (meter per sec)
  3:  Current speed <Current Speed> (meter per sec)
ds.interp_time(freq='2h')
<mikeio.Dataset>
dims: (time:11, element:884)
time: 1985-08-06 07:00:00 - 1985-08-07 03:00:00 (11 records)
geometry: Dfsu2D (884 elements, 529 nodes)
items:
  0:  Surface elevation <Surface Elevation> (meter)
  1:  U velocity <u velocity component> (meter per sec)
  2:  V velocity <v velocity component> (meter per sec)
  3:  Current speed <Current Speed> (meter per sec)

isel

Dataset.isel(idx=None, axis=0, **kwargs)

Return a new Dataset whose data is given by integer indexing along the specified dimension(s).

The spatial parameters available depend on the dims (i.e. geometry) of the Dataset:

  • Grid1D: x
  • Grid2D: x, y
  • Grid3D: x, y, z
  • GeometryFM: element

Parameters

Name Type Description Default
idx int | Sequence[int] | slice | None Index, or indices, along the specified dimension(s) None
axis int | str axis number or “time”, by default 0 0
time int time index, by default None required
x int x index, by default None required
y int y index, by default None required
z int z index, by default None required
element int element index (or indices) to be selected, only for flexible mesh (dfsu) data, by default None required
**kwargs Any Not used {}

Returns

Name Type Description
Dataset dataset with subset

Examples

>>> ds = mikeio.read("europe_wind_long_lat.dfs2")
>>> ds.isel(time=-1)
>>> ds.isel(x=slice(10,20), y=slice(40,60))
>>> ds.isel(y=34)
>>> ds = mikeio.read("tests/testdata/HD2D.dfsu")
>>> ds2 = ds.isel(time=[0,1,2])
>>> ds3 = ds2.isel(element=[100,200])

max

Dataset.max(axis=0, **kwargs)

Max value along an axis.

Parameters

Name Type Description Default
axis int | str axis number or “time”, “space” or “items”, by default 0 0
**kwargs Any additional arguments passed to the function {}

Returns

Name Type Description
Dataset dataset with max values

See Also

nanmax : Max values with NaN values removed

mean

Dataset.mean(axis=0, **kwargs)

Mean value along an axis.

Parameters

Name Type Description Default
axis int | str axis number or “time”, “space” or “items”, by default 0 0
**kwargs Any additional arguments passed to the function {}

Returns

Name Type Description
Dataset dataset with mean values

See Also

nanmean : Mean values with NaN values removed
average : Weighted average
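
Examples

A minimal sketch of a temporal mean followed by a spatial mean:

>>> ds_tmean = ds.mean(axis="time")    # mean over time, geometry preserved
>>> ds_smean = ds.mean(axis="space")   # time series of spatial means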

merge

Dataset.merge(datasets)

Merge Datasets along the item dimension.

Parameters

Name Type Description Default
datasets Sequence['Dataset'] list of Datasets to merge required

Returns

Name Type Description
Dataset merged dataset
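
Examples

A minimal sketch; the Datasets must share time and geometry and have distinct item names (here obtained by reading different items from the same file):

>>> ds1 = mikeio.read("HD2D.dfsu", items=[0])
>>> ds2 = mikeio.read("HD2D.dfsu", items=[1])
>>> merged = mikeio.Dataset.merge([ds1, ds2])
>>> merged.n_items
2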

min

Dataset.min(axis=0, **kwargs)

Min value along an axis.

Parameters

Name Type Description Default
axis int | str axis number or “time”, “space” or “items”, by default 0 0
**kwargs Any additional arguments passed to the function {}

Returns

Name Type Description
Dataset dataset with min values

See Also

nanmin : Min values with NaN values removed

nanmax

Dataset.nanmax(axis=0, **kwargs)

Max value along an axis (NaN removed).

Parameters

Name Type Description Default
axis int | str axis number or “time”, “space” or “items”, by default 0 0
**kwargs Any additional arguments passed to the function {}

See Also

max : Max values

Returns

Name Type Description
Dataset dataset with max values

nanmean

Dataset.nanmean(axis=0, **kwargs)

Mean value along an axis (NaN removed).

Parameters

Name Type Description Default
axis int | str axis number or “time”, “space” or “items”, by default 0 0
**kwargs Any additional arguments passed to the function {}

Returns

Name Type Description
Dataset dataset with mean values

nanmin

Dataset.nanmin(axis=0, **kwargs)

Min value along an axis (NaN removed).

Parameters

Name Type Description Default
axis int | str axis number or “time”, “space” or “items”, by default 0 0
**kwargs Any additional arguments passed to the function {}

Returns

Name Type Description
Dataset dataset with min values

nanquantile

Dataset.nanquantile(q, *, axis=0, **kwargs)

Compute the q-th quantile of the data along the specified axis, while ignoring nan values.

Wrapping np.nanquantile

Parameters

Name Type Description Default
q float | Sequence[float] Quantile or sequence of quantiles to compute, which must be between 0 and 1 inclusive. required
axis int | str axis number or “time”, “space” or “items”, by default 0 0
**kwargs Any additional arguments passed to the function {}

Examples

>>> ds.nanquantile(q=[0.25,0.75])
>>> ds.nanquantile(q=0.5)
>>> ds.nanquantile(q=[0.01,0.5,0.99], axis="space")

Returns

Name Type Description
Dataset dataset with quantile values

nanstd

Dataset.nanstd(axis=0, **kwargs)

Standard deviation along an axis (NaN removed).

Parameters

Name Type Description Default
axis int | str axis number or “time”, “space” or “items”, by default 0 0
**kwargs Any additional arguments passed to the function {}

Returns

Name Type Description
Dataset dataset with standard deviation values

See Also

std : Standard deviation

ptp

Dataset.ptp(axis=0, **kwargs)

Range (max - min), a.k.a. peak to peak, along an axis.

Parameters

Name Type Description Default
axis int | str axis number or “time”, “space” or “items”, by default 0 0

Returns

Name Type Description
Dataset dataset with peak to peak values

quantile

Dataset.quantile(q, *, axis=0, **kwargs)

Compute the q-th quantile of the data along the specified axis.

Wrapping np.quantile

Parameters

Name Type Description Default
q float | Sequence[float] Quantile or sequence of quantiles to compute, which must be between 0 and 1 inclusive. required
axis int | str axis number or “time”, “space” or “items”, by default 0 0
**kwargs Any additional arguments passed to the function {}

Returns

Name Type Description
Dataset dataset with quantile values

Examples

>>> ds.quantile(q=[0.25,0.75])
>>> ds.quantile(q=0.5)
>>> ds.quantile(q=[0.01,0.5,0.99], axis="space")

See Also

nanquantile : quantile with NaN values ignored

remove

Dataset.remove(key)

Remove DataArray from Dataset.

Parameters

Name Type Description Default
key (int, str) index or name of DataArray to be removed from Dataset required

See also

pop
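
Examples

A minimal sketch; the DataArray can be identified by name or by position:

>>> ds.remove("Surface elevation")   # by item name
>>> ds.remove(0)                     # or by position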

rename

Dataset.rename(mapper, inplace=False)

Rename items (DataArrays) in Dataset.

Parameters

Name Type Description Default
mapper Mapping[str, str] dictionary (or similar) mapping from old to new names required
inplace bool Whether to rename in the original dataset (True) or return a new dataset (False), by default False False

Returns

Name Type Description
Dataset

Examples

>>> ds = mikeio.read("tide1.dfs1")
>>> newds = ds.rename({"Level":"Surface Elevation"})
>>> ds.rename({"Level":"Surface Elevation"}, inplace=True)

sel

Dataset.sel(**kwargs)

Return a new Dataset whose data is given by selecting index labels along the specified dimension(s).

In contrast to Dataset.isel, indexers for this method should use labels instead of integers.

The spatial parameters available depend on the geometry of the Dataset:

  • Grid1D: x
  • Grid2D: x, y, coords, area
  • Grid3D: [not yet implemented! use isel instead]
  • GeometryFM: (x,y), coords, area
  • GeometryFMLayered: (x,y,z), coords, area, layers

Parameters

Name Type Description Default
time (str, pd.DatetimeIndex or Dataset) time labels e.g. "2018-01" or slice("2018-1-1", "2019-1-1"), by default None required
x float x-coordinate of point to be selected, by default None required
y float y-coordinate of point to be selected, by default None required
z float z-coordinate of point to be selected, by default None required
coords np.array(float, float) As an alternative to specifying x, y and z individually, the argument coords can be used instead. (x,y)- or (x,y,z)-coordinates of point to be selected, by default None required
area (float, float, float, float) Bounding box of coordinates (left lower and right upper) to be selected, by default None required
layers int or str or list layer(s) to be selected: “top”, “bottom” or layer number from bottom 0,1,2,… or from the top -1,-2,… or as list of these; only for layered dfsu, by default None required
**kwargs Any Not used {}

Returns

Name Type Description
Dataset new Dataset with selected data

See Also

isel : Select data using integer indexing

Examples

>>> ds = mikeio.read("random.dfs1")
>>> ds.sel(time=slice(None, "2012-1-1 00:02"))
>>> ds.sel(x=100)
>>> ds = mikeio.read("oresund_sigma_z.dfsu")
>>> ds.sel(time="1997-09-15")
>>> ds.sel(x=340000, y=6160000, z=-3)
>>> ds.sel(area=(340000, 6160000, 350000, 6170000))
>>> ds.sel(layers="bottom")

squeeze

Dataset.squeeze()

Remove axes of length 1.

Returns

Name Type Description
Dataset
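
Examples

A minimal sketch; a time-invariant dfs2 (as in the example at the top of this page) has a time axis of length 1, which squeeze removes:

>>> ds = mikeio.read("../data/europe_wind_long_lat.dfs2")   # dims: (time:1, y:101, x:221)
>>> ds2 = ds.squeeze()                                       # dims: (y:101, x:221)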

std

Dataset.std(axis=0, **kwargs)

Standard deviation along an axis.

Parameters

Name Type Description Default
axis int | str axis number or “time”, “space” or “items”, by default 0 0
**kwargs Any additional arguments passed to the function {}

Returns

Name Type Description
Dataset dataset with standard deviation values

See Also

nanstd : Standard deviation with NaN values removed

to_dataframe

Dataset.to_dataframe(unit_in_name=False, round_time='ms')

Convert Dataset to a Pandas DataFrame.

Parameters

Name Type Description Default
unit_in_name bool include unit in column name, by default False False
round_time str | bool round time to, by default “ms”, use False to avoid rounding 'ms'

Returns

Name Type Description
pd.DataFrame
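
Examples

A minimal sketch; each item becomes a column and the time axis becomes the index (most useful after reducing to a point time series):

>>> ds = mikeio.read("tide1.dfs1").sel(x=100)
>>> df = ds.to_dataframe()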

to_dfs

Dataset.to_dfs(filename, **kwargs)

Write dataset to a new dfs file.

Parameters

Name Type Description Default
filename str | Path full path to the new dfs file required
**kwargs Any additional arguments passed to the writing function, e.g. dtype for dfs0 {}
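
Examples

A minimal sketch; the file extension should match the geometry of the Dataset (the output file name is a placeholder):

>>> ds = mikeio.read("HD2D.dfsu", time=[0, 1])
>>> ds.to_dfs("HD2D_subset.dfsu")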

to_numpy

Dataset.to_numpy()

Stack data to a single ndarray with shape (n_items, n_timesteps, …).

Returns

Name Type Description
np.ndarray
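
Examples

A minimal sketch, using the dfsu example from interp_time above (4 items, 9 time steps, 884 elements):

>>> ds = mikeio.read("../data/HD2D.dfsu")
>>> X = ds.to_numpy()
>>> X.shape
(4, 9, 884)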

to_pandas

Dataset.to_pandas(**kwargs)

Convert Dataset to a Pandas DataFrame.

to_xarray

Dataset.to_xarray()

Export to xarray.Dataset.
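
Examples

A minimal sketch; the returned xarray.Dataset can then use the xarray API, e.g. writing to netCDF (the output file name is a placeholder):

>>> xds = ds.to_xarray()
>>> xds.to_netcdf("HD2D.nc")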