Output statistics#

After running your MIKE simulation, you will often want to compute different kinds of summary statistics of your data, both for your own understanding and for communicating your results.

Examples of statistics

  • Min, max, mean, standard deviation

  • Quantiles/percentiles (e.g. median, interquartile range, return period etc)

  • Probability/frequency of exceedance

Types of aggregations

  • Total - aggregate all data to a single number

  • Temporal - aggregate all time steps; for a dfsu 2d, the result would be a map

  • Spatial - aggregate all elements to a single value per time step

  • Others: monthly, by layer, spatial bin, sub-domain, etc. (a resampling sketch is shown below)
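
A sketch of one such "other" aggregation, assuming the example file used in the sections below: the Dataset is first reduced over space and the resulting time series is then binned with pandas resample (the file only spans a few days, so daily bins are used here; monthly bins would use "MS").

import mikeio

ds = mikeio.read("data/NorthSea_HD_and_windspeed.dfsu")
df = ds.mean(axis="space").to_dataframe()  # time series of spatial means
df_daily = df.resample("D").mean()         # one value per calendar day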

Ways of calculating

The sections below show three ways of obtaining statistics: with the in-memory Dataset methods, with the generic submodule (which works on files of any size), and with custom functions.

import mikeio
import mikeio.generic as generic

Dataset#

For smaller dfs files (maybe up to 2 GB) it can be convenient to read the data into memory before doing aggregations. The mikeio.Dataset class has several methods for aggregating data along an axis. See the Generic section below for larger-than-memory data.

ds = mikeio.read("data/NorthSea_HD_and_windspeed.dfsu")
ds
<mikeio.Dataset>
dims: (time:67, element:958)
time: 2017-10-27 00:00:00 - 2017-10-29 18:00:00 (67 records)
geometry: Dfsu2D (958 elements, 570 nodes)
items:
  0:  Surface elevation <Surface Elevation> (meter)
  1:  Wind speed <Wind speed> (meter per sec)

Temporal aggregations: mean#

The default is to aggregate along the time axis - the output will therefore be a map.

dsm  = ds.mean()
mean_ws = dsm["Wind speed"]
mean_ws.shape
(958,)
mean_ws.plot(title="Mean wind speed");

Spatial aggregations#

The Dataset aggregation methods (e.g. mean) take an axis argument. If we pass the spatial axis (or the string "space"), the result will be a time series of spatially aggregated values.

Note

Note that the spatial aggregations here ignore element areas! Only Dataset.average takes a weights argument.

df = ds.mean(axis="space").to_dataframe()
df.head()
Surface elevation Wind speed
2017-10-27 00:00:00 0.256916 10.234554
2017-10-27 01:00:00 0.274964 10.264292
2017-10-27 02:00:00 0.287414 10.531686
2017-10-27 03:00:00 0.290940 10.794677
2017-10-27 04:00:00 0.287235 10.858319
df['Wind speed'].plot();

Dataset has other methods for calculating typical statistics, e.g. max, quantile…

ds[["Wind speed"]].max(axis="space").to_dataframe().plot(title="Max wind speed");
ds[["Wind speed"]].quantile(q=[0.1,0.5,0.9],axis="space").to_dataframe().plot();

Remember that the element area is not taken into account in these spatial aggregations! Only Dataset.average supports weighted averages:

df = ds[["Wind speed"]].mean(axis="space").to_dataframe()
df.columns = ["Simple mean"]
area=ds.geometry.get_element_area()
df['Weighted'] = ds[["Wind speed"]].average(axis="space", weights=area).to_dataframe()
df.plot(title="Mean wind speed (simple vs weighted by element area)");

Quantiles to file#

dsq  = ds.quantile(q=[0.1,0.5,0.9])
dsq
<mikeio.Dataset>
dims: (element:958)
time: 2017-10-27 00:00:00 (time-invariant)
geometry: Dfsu2D (958 elements, 570 nodes)
items:
  0:  Quantile 0.1, Surface elevation <Surface Elevation> (meter)
  1:  Quantile 0.5, Surface elevation <Surface Elevation> (meter)
  2:  Quantile 0.9, Surface elevation <Surface Elevation> (meter)
  3:  Quantile 0.1, Wind speed <Wind speed> (meter per sec)
  4:  Quantile 0.5, Wind speed <Wind speed> (meter per sec)
  5:  Quantile 0.9, Wind speed <Wind speed> (meter per sec)

Write to a new dfsu file

dsq.to_dfs("NorthSea_Quantiles.dfsu")

Total#

Aggregating over all data (both time and space) can be done from the Dataset in a few ways:

  • ds.describe() - gives summary statistics similar to pandas df.describe()

  • using axis=None in ds.mean(), ds.min(), etc.

  • using NumPy functions on the underlying data, e.g. ds["Wind speed"].values.mean()

ds.describe()
Surface elevation Wind speed
count 64186.000000 64186.000000
mean 0.449857 12.772705
std 0.651157 3.694293
min -2.347003 1.190171
25% 0.057831 10.376003
50% 0.466257 12.653086
75% 0.849586 14.885848
max 3.756879 26.213045
ds.min(axis=None).to_dataframe()
Surface elevation Wind speed
2017-10-27 -2.347003 1.190171
ds["Wind speed"].values.min()
1.1901706

Generic#

The mikeio.generic submodule can produce common temporal statistics on any dfs file (of any size). The output will be a new dfs file. Currently, generic has these methods for calculating statistics:

  • avg_time()

  • quantile()

generic.avg_time("data/NorthSea_HD_and_windspeed.dfsu", "NorthSea_avg.dfsu")

ds = mikeio.read("NorthSea_avg.dfsu", items="Wind speed")
ds
<mikeio.Dataset>
dims: (time:1, element:958)
time: 2017-10-27 00:00:00 (time-invariant)
geometry: Dfsu2D (958 elements, 570 nodes)
items:
  0:  Wind speed <Wind speed> (meter per sec)
ds["Wind speed"].plot(title="Mean wind speed");
generic.quantile("data/NorthSea_HD_and_windspeed.dfsu", "NorthSea_Quantiles2.dfsu", q=[0.1, 0.5, 0.9])
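
The resulting file can be read back like any other dfsu file to inspect the items; a quick check, assuming the output was written to the current working directory:

dsq2 = mikeio.read("NorthSea_Quantiles2.dfsu")
dsq2.items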

Custom#

ds = mikeio.read("data/NorthSea_HD_and_windspeed.dfsu")

Dataset.aggregate#

With aggregate we can get Dataset statistics with our “own” function, e.g. standard deviation:

import numpy as np
dsa = ds.aggregate(func=np.std)
dsa["Wind speed"].plot(label="Std [m/s]");
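
Any NumPy-style reduction that accepts an axis keyword can be used in the same way. A sketch (assuming, as for np.std above, that aggregate forwards the axis argument to the supplied function) computing an interquartile-range map of the wind speed:

def iqr(x, axis=0):
    # interquartile range: 75th minus 25th percentile along the given axis
    return np.percentile(x, 75, axis=axis) - np.percentile(x, 25, axis=axis)

ds_iqr = ds.aggregate(func=iqr)
ds_iqr["Wind speed"].plot(label="IQR [m/s]");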

Exceedance probability#

Let's find out how often the wind speed exceeds 12 m/s in our simulation.

import matplotlib.pyplot as plt
nt = ds.n_timesteps
# empirical probability of exceedance for the sorted values: (nt-1)/nt, ..., 1/nt, 0
one_to_zero = 1. - np.arange(1., nt + 1.)/nt
val = ds["Wind speed"].isel(element=0).values
plt.plot(ds.time, val);
plt.plot(ds.time[val>12], val[val>12],'.r');
plt.axhline(y=12,color='r')
plt.ylabel('Wind speed [m/s]')
plt.title('How often is the wind speed above 12m/s (element 0)?');
plt.plot(np.sort(val), one_to_zero);
plt.xlabel('Wind speed [m/s]')
plt.ylabel('Probability of exceedance [-]')
plt.axvline(x=12,color='r')
plt.title('Wind speed exceedance in element 0');
# Create empty DataArray 
item=mikeio.ItemInfo(mikeio.EUMType.Probability)
data = np.full(shape=(1,ds.geometry.n_elements), fill_value=np.nan)
dae = mikeio.DataArray(data=data, time="2017-10-27", item=item, geometry=ds.geometry)
threshold = 12
for j in range(ds.n_elements):
    # this is a naive and slow way of calculating this!
    dat = ds["Wind speed"][:,j].values
    dae[0,j] = np.interp(threshold, np.sort(dat), one_to_zero)
dae100 = dae*100   # fraction -> percent
dae100.plot(title="Wind speed exceeding 12 m/s", 
    label="Time of Exceedance [%]", cmap="YlOrRd");
total_hours = (ds.time[-1]-ds.time[0]).total_seconds()/3600
dae_hours = dae*total_hours
dae_hours.plot(title="Wind speed exceeding 12 m/s", 
    label="Exceedance [Hours]", cmap="YlOrRd");
_images/e497f120e84948b8e981468809e34ec980bad5fbec2fc3e0e54dbfd6890d3092.png
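
The per-element loop used to fill dae above can also be written as a single vectorized operation on the underlying NumPy array, reusing threshold and item from above. A sketch that computes the empirical fraction of time steps exceeding the threshold directly (not identical to the interpolated estimate, but much faster):

frac = (ds["Wind speed"].values > threshold).mean(axis=0)  # fraction of time steps per element
dae_fast = mikeio.DataArray(data=frac[np.newaxis, :], time="2017-10-27",
                            item=item, geometry=ds.geometry)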