Dataset

Dataset(self, data, time=None, items=None, geometry=None, zn=None, dims=None, validate=True, dt=1.0)

Dataset containing one or more DataArrays with common geometry and time

Most often obtained by reading a dfs file, but it can also be created from a sequence or dictionary of DataArrays. The mikeio.Dataset is inspired by and similar to the xarray.Dataset.

The Dataset is primarily a container for one or more DataArrays all having the same time and geometry (and shape, dims, etc.). For convenience, the Dataset provides direct access to these common properties.

Selecting Items

Selecting a specific item “itemA” (at position 0) from a Dataset ds can be done with:

  • ds[[“itemA”]] - returns a new Dataset with “itemA”
  • ds[“itemA”] - returns the “itemA” DataArray
  • ds[[0]] - returns a new Dataset with “itemA”
  • ds[0] - returns the “itemA” DataArray
  • ds.itemA - returns the “itemA” DataArray
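
The double-bracket vs single-bracket convention mirrors pandas indexing; a small pandas sketch of the same semantics (an analogy, not mikeio itself):

```python
import pandas as pd

# Same indexing convention as pandas DataFrames:
# double brackets keep the container type, single brackets
# return the contained object.
df = pd.DataFrame({"itemA": [1, 2]})
sub = df[["itemA"]]  # DataFrame (cf. ds[["itemA"]] -> Dataset)
col = df["itemA"]    # Series    (cf. ds["itemA"]  -> DataArray)
```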

Examples

import mikeio
mikeio.read("../data/europe_wind_long_lat.dfs2")
<mikeio.Dataset>
dims: (time:1, y:101, x:221)
time: 2012-01-01 00:00:00 (time-invariant)
geometry: Grid2D (ny=101, nx=221)
items:
  0:  Mean Sea Level Pressure <Air Pressure> (hectopascal)
  1:  Wind x-comp (10m) <Wind Velocity> (meter per sec)
  2:  Wind y-comp (10m) <Wind Velocity> (meter per sec)

Attributes

Name Description
deletevalue File delete value
dims Named array dimensions of each DataArray
end_time Last time instance (as datetime)
geometry Geometry of each DataArray
is_equidistant Is Dataset equidistant in time?
items ItemInfo for each of the DataArrays as a list
n_elements Number of spatial elements/points
n_items Number of items/DataArrays, equivalent to len()
n_timesteps Number of time steps
names Name of each of the DataArrays as a list
ndim Number of array dimensions of each DataArray
shape Shape of each DataArray
start_time First time instance (as datetime)
time Time axis
timestep Time step in seconds if equidistant (and at least two time instances); otherwise None

Methods

Name Description
aggregate Aggregate along an axis
average Compute the weighted average along the specified axis.
concat Concatenate Datasets along the time axis
copy Returns a copy of this dataset.
create_data_array Create a new DataArray with the same time and geometry as the dataset
describe Generate descriptive statistics by wrapping :py:meth:pandas.DataFrame.describe
dropna Remove time steps where all items are NaN
extract_track Extract data along a moving track
flipud Flip data upside down (on first non-time axis)
insert Insert DataArray in a specific position
interp Interpolate data in time and space
interp_like Interpolate in space (and in time) to other geometry (and time axis)
interp_time Temporal interpolation
isel Return a new Dataset with data given by integer indexing
max Max value along an axis
mean Mean value along an axis
merge Merge Datasets along the item dimension
min Min value along an axis
nanmax Max value along an axis (NaN removed)
nanmean Mean value along an axis (NaN removed)
nanmin Min value along an axis (NaN removed)
nanquantile Compute the q-th quantile of the data along the specified axis, while ignoring nan values.
nanstd Standard deviation along an axis (NaN removed)
ptp Range (max - min) a.k.a Peak to Peak along an axis
quantile Compute the q-th quantile of the data along the specified axis.
remove Remove DataArray from Dataset
rename Rename items (DataArrays) in Dataset
sel Return a new Dataset with data given by label indexing
squeeze Remove axes of length 1
std Standard deviation along an axis
to_dataframe Convert Dataset to a Pandas DataFrame
to_dfs Write dataset to a new dfs file
to_numpy Stack data to a single ndarray with shape (n_items, n_timesteps, …)
to_pandas Convert Dataset to a Pandas DataFrame
to_xarray Export to xarray.Dataset

aggregate

Dataset.aggregate(axis=0, func=np.nanmean, **kwargs)

Aggregate along an axis

Parameters

Name Type Description Default
axis int | str axis number or “time”, “space” or “items”, by default 0 0
func Callable default np.nanmean np.nanmean

Returns

Type Description
Dataset dataset with aggregated values
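
Conceptually, `aggregate` applies the reduction function along the chosen axis of each item's array. A plain-NumPy sketch of the semantics (not the mikeio implementation):

```python
import numpy as np

# A (time, space) array: 2 time steps, 2 points.
data = np.array([[1.0, 2.0],
                 [3.0, np.nan]])

# aggregate(axis=0, func=np.nanmean) reduces the time axis:
time_mean = np.nanmean(data, axis=0)   # one value per point
# aggregate(axis="space") reduces the spatial axis instead:
space_mean = np.nanmean(data, axis=1)  # one value per time step
```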

average

Dataset.average(weights, axis=0, **kwargs)

Compute the weighted average along the specified axis.

Wraps :py:meth:numpy.average

Parameters

Name Type Description Default
weights weights to average over required
axis axis number or “time”, “space” or “items”, by default 0 0

Returns

Type Description
Dataset dataset with weighted average values

See Also

nanmean : Mean values with NaN values removed
aggregate : Aggregate along an axis

Examples

>>> dfs = Dfsu("HD2D.dfsu")
>>> ds = dfs.read(["Current speed"])
>>> area = dfs.get_element_area()
>>> ds2 = ds.average(axis="space", weights=area)
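
The weighted spatial average above wraps `np.average`; the underlying arithmetic, sketched with hypothetical element areas as weights:

```python
import numpy as np

# (time, element) values and per-element weights (e.g. element areas).
values = np.array([[1.0, 3.0],
                   [2.0, 4.0]])
area = np.array([0.25, 0.75])  # hypothetical areas

# Weighted average over the space axis, per time step:
weighted = np.average(values, axis=1, weights=area)
```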

concat

Dataset.concat(datasets, keep='last')

Concatenate Datasets along the time axis

Parameters

Name Type Description Default
datasets Sequence[‘Dataset’] Datasets to be concatenated required
keep Literal[‘last’, ‘first’] which values to keep in case of overlap, by default ‘last’ 'last'

Returns

Type Description
Dataset concatenated dataset

Examples

>>> import mikeio
>>> ds1 = mikeio.read("HD2D.dfsu", time=[0,1])
>>> ds2 = mikeio.read("HD2D.dfsu", time=[2,3])
>>> ds1.n_timesteps
2
>>> ds3 = Dataset.concat([ds1,ds2])
>>> ds3.n_timesteps
4

copy

Dataset.copy()

Returns a copy of this dataset.

create_data_array

Dataset.create_data_array(data, item=None)

Create a new DataArray with the same time and geometry as the dataset

Examples

>>> ds = mikeio.read("file.dfsu")
>>> values = np.zeros(ds.Temperature.shape)
>>> da = ds.create_data_array(values)
>>> da_name = ds.create_data_array(values,"Foo")
>>> da_eum = ds.create_data_array(values, item=mikeio.ItemInfo("TS", mikeio.EUMType.Temperature))

describe

Dataset.describe(**kwargs)

Generate descriptive statistics by wrapping :py:meth:pandas.DataFrame.describe

dropna

Dataset.dropna()

Remove time steps where all items are NaN
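
The masking behind `dropna` can be sketched in plain NumPy: a time step is kept if at least one value in it is not NaN (a conceptual analogue, not the mikeio implementation):

```python
import numpy as np

# (time, space) array: the middle time step is all-NaN.
data = np.array([[1.0, np.nan],
                 [np.nan, np.nan],
                 [3.0, 4.0]])

keep = ~np.isnan(data).all(axis=1)  # True where any value survives
cleaned = data[keep]                # all-NaN time steps removed
```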

extract_track

Dataset.extract_track(track, method='nearest', dtype=np.float32)

Extract data along a moving track

Parameters

Name Type Description Default
track pd.DataFrame | str Either a DataFrame with DatetimeIndex and (x, y) of track points as the first two columns, or the filename of a csv or dfs0 file containing t, x, y. The x, y coordinates must be in the same coordinate system as the dfsu. required
method Literal[‘nearest’, ‘inverse_distance’] Spatial interpolation method (‘nearest’ or ‘inverse_distance’) default=‘nearest’ 'nearest'

Returns

Type Description
Dataset A dataset with data dimension t. The first two items will be the x- and y-coordinates of the track.

flipud

Dataset.flipud()

Flip data upside down (on first non-time axis)

insert

Dataset.insert(key, value)

Insert DataArray in a specific position

Parameters

Name Type Description Default
key int index in Dataset where DataArray should be inserted required
value DataArray DataArray to be inserted, must conform with existing DataArrays and must have a unique item name required

interp

Dataset.interp(time=None, x=None, y=None, z=None, n_nearest=3, **kwargs)

Interpolate data in time and space

This method currently has limited functionality for spatial interpolation. It will be extended in the future.

The spatial parameters available depend on the geometry of the Dataset:

  • Grid1D: x
  • Grid2D: x, y
  • Grid3D: [not yet implemented!]
  • GeometryFM: (x,y)
  • GeometryFMLayered: (x,y) [surface point will be returned!]

Parameters

Name Type Description Default
time (float, pd.DatetimeIndex or Dataset) timestep in seconds or discrete time instances given by pd.DatetimeIndex (typically from another Dataset da2.time), by default None (=don’t interp in time) None
x float x-coordinate of point to be interpolated to, by default None None
y float y-coordinate of point to be interpolated to, by default None None
n_nearest int When using IDW interpolation, how many nearest points should be used, by default: 3 3

Returns

Type Description
Dataset new Dataset with interped data

See Also

sel : Select data using label indexing
interp_like : Interp to the time/space of another Dataset
interp_time : Interp in the time direction only

Examples

>>> ds = mikeio.read("random.dfs1")
>>> ds.interp(time=3600)
>>> ds.interp(x=110)
>>> ds = mikeio.read("HD2D.dfsu")
>>> ds.interp(x=340000, y=6160000)
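
Temporal interpolation of a single point can be sketched with `np.interp` (a conceptual analogue of `interp(time=...)`, not the mikeio implementation):

```python
import numpy as np

# Values known at t = 0, 3600, 7200 seconds ...
t_known = np.array([0.0, 3600.0, 7200.0])
v_known = np.array([0.0, 1.0, 0.5])

# ... linearly interpolated to a finer 1800 s step:
t_new = np.arange(0.0, 7200.0 + 1.0, 1800.0)
v_new = np.interp(t_new, t_known, v_known)
```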

interp_like

Dataset.interp_like(other, **kwargs)

Interpolate in space (and in time) to other geometry (and time axis)

Note: currently only supports interpolation from dfsu-2d to dfs2 or other dfsu-2d Datasets

Parameters

Name Type Description Default
other ‘Dataset’ | DataArray | Grid2D | GeometryFM2D | pd.DatetimeIndex required
kwargs Any {}

Examples

>>> ds = mikeio.read("HD.dfsu")
>>> ds2 = mikeio.read("wind.dfs2")
>>> dsi = ds.interp_like(ds2)
>>> dsi.to_dfs("HD_gridded.dfs2")
>>> dse = ds.interp_like(ds2, extrapolate=True)
>>> dst = ds.interp_like(ds2.time)

Returns

Type Description
Dataset Interpolated Dataset

interp_time

Dataset.interp_time(dt=None, *, freq=None, method='linear', extrapolate=True, fill_value=np.nan)

Temporal interpolation

Wrapper of :py:class:scipy.interpolate.interp1d

Parameters

Name Type Description Default
dt float | pd.DatetimeIndex | ‘Dataset’ | DataArray | None output timestep in seconds or discrete time instances given as a pd.DatetimeIndex (typically from another Dataset ds2.time) None
freq str | None pandas frequency None
method str Specifies the kind of interpolation as a string (‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, ‘next’, where ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point) or as an integer specifying the order of the spline interpolator to use. Default is ‘linear’. 'linear'
extrapolate bool Default True. If False, a ValueError is raised any time interpolation is attempted on a value outside of the range of x (where extrapolation is necessary). If True, out of bounds values are assigned fill_value True
fill_value float Default NaN. This value will be used to fill in for points outside of the time range. np.nan

Returns

Type Description
Dataset

Examples

>>> ds = mikeio.read("tests/testdata/HD2D.dfsu")
>>> ds
<mikeio.Dataset>
Dimensions: (9, 884)
Time: 1985-08-06 07:00:00 - 1985-08-07 03:00:00
Items:
0:  Surface elevation <Surface Elevation> (meter)
1:  U velocity <u velocity component> (meter per sec)
2:  V velocity <v velocity component> (meter per sec)
3:  Current speed <Current Speed> (meter per sec)
>>> dsi = ds.interp_time(dt=1800)
>>> dsi
<mikeio.Dataset>
Dimensions: (41, 884)
Time: 1985-08-06 07:00:00 - 1985-08-07 03:00:00
Items:
0:  Surface elevation <Surface Elevation> (meter)
1:  U velocity <u velocity component> (meter per sec)
2:  V velocity <v velocity component> (meter per sec)
3:  Current speed <Current Speed> (meter per sec)
>>> dsi = ds.interp_time(freq='2H')

isel

Dataset.isel(idx=None, axis=0, **kwargs)

Return a new Dataset whose data is given by integer indexing along the specified dimension(s).

The spatial parameters available depend on the dims (i.e. geometry) of the Dataset:

  • Grid1D: x
  • Grid2D: x, y
  • Grid3D: x, y, z
  • GeometryFM: element

Parameters

Name Type Description Default
idx int | Sequence[int] | slice | None None
axis int | str axis number or “time”, by default 0 0
time int time index, by default None required
x int x index, by default None required
y int y index, by default None required
z int z index, by default None required
element int element index to be selected (flexible mesh geometries only), by default None required

Returns

Type Description
Dataset dataset with subset

Examples

>>> ds = mikeio.read("europe_wind_long_lat.dfs2")
>>> ds.isel(time=-1)
>>> ds.isel(x=slice(10,20), y=slice(40,60))
>>> ds.isel(y=34)
>>> ds = mikeio.read("tests/testdata/HD2D.dfsu")
>>> ds2 = ds.isel(time=[0,1,2])
>>> ds3 = ds2.isel(elements=[100,200])

max

Dataset.max(axis=0, **kwargs)

Max value along an axis

Parameters

Name Type Description Default
axis int | str axis number or “time”, “space” or “items”, by default 0 0

Returns

Type Description
Dataset dataset with max values

See Also

nanmax : Max values with NaN values removed

mean

Dataset.mean(axis=0, **kwargs)

Mean value along an axis

Parameters

Name Type Description Default
axis int | str axis number or “time”, “space” or “items”, by default 0 0

Returns

Type Description
Dataset dataset with mean values

See Also

nanmean : Mean values with NaN values removed
average : Weighted average

merge

Dataset.merge(datasets)

Merge Datasets along the item dimension

Parameters

Name Type Description Default
datasets Sequence[‘Dataset’] required

Returns

Type Description
Dataset merged dataset

min

Dataset.min(axis=0, **kwargs)

Min value along an axis

Parameters

Name Type Description Default
axis int | str axis number or “time”, “space” or “items”, by default 0 0

Returns

Type Description
Dataset dataset with min values

See Also

nanmin : Min values with NaN values removed

nanmax

Dataset.nanmax(axis=0, **kwargs)

Max value along an axis (NaN removed)

Parameters

Name Type Description Default
axis int | str axis number or “time”, “space” or “items”, by default 0 0

See Also

max : Max values

Returns

Type Description
Dataset dataset with max values

nanmean

Dataset.nanmean(axis=0, **kwargs)

Mean value along an axis (NaN removed)

Parameters

Name Type Description Default
axis int | str axis number or “time”, “space” or “items”, by default 0 0

Returns

Type Description
Dataset dataset with mean values

nanmin

Dataset.nanmin(axis=0, **kwargs)

Min value along an axis (NaN removed)

Parameters

Name Type Description Default
axis int | str axis number or “time”, “space” or “items”, by default 0 0

Returns

Type Description
Dataset dataset with min values

nanquantile

Dataset.nanquantile(q, *, axis=0, **kwargs)

Compute the q-th quantile of the data along the specified axis, while ignoring nan values.

Wrapping np.nanquantile

Parameters

Name Type Description Default
q float | Sequence[float] Quantile or sequence of quantiles to compute, which must be between 0 and 1 inclusive. required
axis int | str axis number or “time”, “space” or “items”, by default 0 0

Examples

>>> ds.nanquantile(q=[0.25,0.75])
>>> ds.nanquantile(q=0.5)
>>> ds.nanquantile(q=[0.01,0.5,0.99], axis="space")
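
The difference from `quantile` is visible in the wrapped NumPy functions themselves:

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, np.nan])

# np.nanquantile ignores the NaN -> median of [1, 2, 3]:
med_nan = np.nanquantile(data, 0.5)
# np.quantile propagates it -> nan:
med = np.quantile(data, 0.5)
```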

Returns

Type Description
Dataset dataset with quantile values

nanstd

Dataset.nanstd(axis=0, **kwargs)

Standard deviation along an axis (NaN removed)

Parameters

Name Type Description Default
axis int | str axis number or “time”, “space” or “items”, by default 0 0

Returns

Type Description
Dataset dataset with standard deviation values

See Also

std : Standard deviation

ptp

Dataset.ptp(axis=0, **kwargs)

Range (max - min) a.k.a Peak to Peak along an axis

Parameters

Name Type Description Default
axis int | str axis number or “time”, “space” or “items”, by default 0 0

Returns

Type Description
Dataset dataset with peak to peak values
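
The peak-to-peak reduction is simply max minus min along the axis; a plain-NumPy sketch:

```python
import numpy as np

# (time, space) array: 2 time steps, 2 points.
data = np.array([[1.0, 5.0],
                 [4.0, 2.0]])

# ptp over the time axis: per-point range over time.
rng_time = data.max(axis=0) - data.min(axis=0)
```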

quantile

Dataset.quantile(q, *, axis=0, **kwargs)

Compute the q-th quantile of the data along the specified axis.

Wrapping np.quantile

Parameters

Name Type Description Default
q float | Sequence[float] Quantile or sequence of quantiles to compute, which must be between 0 and 1 inclusive. required
axis int | str axis number or “time”, “space” or “items”, by default 0 0

Returns

Type Description
Dataset dataset with quantile values

Examples

>>> ds.quantile(q=[0.25,0.75])
>>> ds.quantile(q=0.5)
>>> ds.quantile(q=[0.01,0.5,0.99], axis="space")

See Also

nanquantile : quantile with NaN values ignored

remove

Dataset.remove(key)

Remove DataArray from Dataset

Parameters

Name Type Description Default
key (int, str) index or name of DataArray to be removed from Dataset required

See Also

pop

rename

Dataset.rename(mapper, inplace=False)

Rename items (DataArrays) in Dataset

Parameters

Name Type Description Default
mapper Mapping[str, str] dictionary (or similar) mapping from old to new names required
inplace bool Should the renaming be done in the original dataset(=True) or return a new(=False)?, by default False False

Returns

Type Description
Dataset

Examples

>>> ds = mikeio.read("tide1.dfs1")
>>> newds = ds.rename({"Level":"Surface Elevation"})
>>> ds.rename({"Level":"Surface Elevation"}, inplace=True)

sel

Dataset.sel(**kwargs)

Return a new Dataset whose data is given by selecting index labels along the specified dimension(s).

In contrast to Dataset.isel, indexers for this method should use labels instead of integers.

The spatial parameters available depend on the geometry of the Dataset:

  • Grid1D: x
  • Grid2D: x, y, coords, area
  • Grid3D: [not yet implemented! use isel instead]
  • GeometryFM: (x,y), coords, area
  • GeometryFMLayered: (x,y,z), coords, area, layers

Parameters

Name Type Description Default
time (str, pd.DatetimeIndex or Dataset) time labels e.g. “2018-01” or slice(“2018-1-1”,“2019-1-1”), by default None required
x float x-coordinate of point to be selected, by default None required
y float y-coordinate of point to be selected, by default None required
z float z-coordinate of point to be selected, by default None required
coords np.array(float, float) As an alternative to specifying x, y and z individually, the argument coords can be used instead. (x,y)- or (x,y,z)-coordinates of point to be selected, by default None required
area (float, float, float, float) Bounding box of coordinates (left lower and right upper) to be selected, by default None required
layers int or str or list layer(s) to be selected: “top”, “bottom” or layer number from bottom 0,1,2,… or from the top -1,-2,… or as list of these; only for layered dfsu, by default None required

Returns

Type Description
Dataset new Dataset with selected data

See Also

isel : Select data using integer indexing

Examples

>>> ds = mikeio.read("random.dfs1")
>>> ds.sel(time=slice(None, "2012-1-1 00:02"))
>>> ds.sel(x=100)
>>> ds = mikeio.read("oresund_sigma_z.dfsu")
>>> ds.sel(time="1997-09-15")
>>> ds.sel(x=340000, y=6160000, z=-3)
>>> ds.sel(area=(340000, 6160000, 350000, 6170000))
>>> ds.sel(layers="bottom")

squeeze

Dataset.squeeze()

Remove axes of length 1

Returns

Type Description
Dataset
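
The array operation behind `squeeze` is `np.squeeze`: a singleton axis, e.g. the time axis of a time-invariant (1, y, x) field, is dropped:

```python
import numpy as np

# Single time step on a (1, ny, nx) grid ...
data = np.zeros((1, 101, 221))
# ... squeeze drops the length-1 axis:
squeezed = np.squeeze(data)
```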

std

Dataset.std(axis=0, **kwargs)

Standard deviation along an axis

Parameters

Name Type Description Default
axis int | str axis number or “time”, “space” or “items”, by default 0 0

Returns

Type Description
Dataset dataset with standard deviation values

See Also

nanstd : Standard deviation with NaN values removed

to_dataframe

Dataset.to_dataframe(unit_in_name=False, round_time='ms')

Convert Dataset to a Pandas DataFrame

Parameters

Name Type Description Default
unit_in_name bool include unit in column name, by default False False
round_time str | bool round time to, by default “ms”, use False to avoid rounding 'ms'

Returns

Type Description
pd.DataFrame

to_dfs

Dataset.to_dfs(filename, **kwargs)

Write dataset to a new dfs file

Parameters

Name Type Description Default
filename str | Path full path to the new dfs file required
dtype Dfs0 only: set the dfs data type of the written data to e.g. np.float64, by default: DfsSimpleType.Float (=np.float32) required

to_numpy

Dataset.to_numpy()

Stack data to a single ndarray with shape (n_items, n_timesteps, …)

Returns

Type Description
np.ndarray
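
The stacking can be sketched with `np.stack`: each item's (n_timesteps, ...) array becomes one slice of the leading items axis:

```python
import numpy as np

# Two items, each with shape (n_timesteps, n_points) = (10, 5):
item_a = np.zeros((10, 5))
item_b = np.ones((10, 5))

# to_numpy stacks them to (n_items, n_timesteps, ...):
stacked = np.stack([item_a, item_b])
```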

to_pandas

Dataset.to_pandas(**kwargs)

Convert Dataset to a Pandas DataFrame

to_xarray

Dataset.to_xarray()

Export to xarray.Dataset