Output statistics#

After running your MIKE simulation, you will often want to compute different kinds of summary statistics of your data, both for your own understanding and for communicating your results.

Examples of statistics

  • Min, max, mean, standard deviation

  • Quantiles/percentiles (e.g. median, interquartile range, return period etc)

  • Probability/frequency of exceedance

Types of aggregations

  • Total - aggregate all data to a single number

  • Temporal - aggregate all time steps; for a dfsu 2d, the result would be a map

  • Spatial - aggregate all elements to a single value per time step

  • Others: monthly, by layer, spatial bin, sub-domain, etc. (a resampling sketch is shown below)
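
A sketch of one such "other" aggregation, assuming the example file used in the sections below: the Dataset is first reduced over space and the resulting time series is then binned with pandas resample (the file only spans a few days, so daily bins are used here; monthly bins would use "MS").

import mikeio

ds = mikeio.read("data/NorthSea_HD_and_windspeed.dfsu")
df = ds.mean(axis="space").to_dataframe()  # time series of spatial means
df_daily = df.resample("D").mean()         # one value per calendar day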

Ways of calculating

The sections below show three ways of obtaining statistics: with the in-memory Dataset methods, with the generic submodule (which works on files of any size), and with custom functions.

import mikeio
import mikeio.generic as generic

Dataset#

For smaller dfs files (maybe up to 2 GB) it can be convenient to read the data into memory before doing aggregations. The mikeio.Dataset class has several methods for aggregating data along an axis. See the Generic section below for larger-than-memory data.

ds = mikeio.read("data/NorthSea_HD_and_windspeed.dfsu")
ds
<mikeio.Dataset>
dims: (time:67, element:958)
time: 2017-10-27 00:00:00 - 2017-10-29 18:00:00 (67 records)
geometry: Dfsu2D (958 elements, 570 nodes)
items:
  0:  Surface elevation <Surface Elevation> (meter)
  1:  Wind speed <Wind speed> (meter per sec)

Temporal aggregations: mean#

The default is to aggregate along the time axis - the output will therefore be a map.

dsm  = ds.mean()
mean_ws = dsm["Wind speed"]
mean_ws.shape
(958,)
mean_ws.plot(title="Mean wind speed");

Spatial aggregations#

The Dataset aggregation methods (e.g. mean) take an axis argument. If we pass the spatial axis (or the string "space"), the result will be a time series of spatially aggregated values.

Note

Note that the spatial aggregations here ignore element areas! Only Dataset.average takes a weights argument.

df = ds.mean(axis="space").to_dataframe()
df.head()
Surface elevation Wind speed
2017-10-27 00:00:00 0.256916 10.234554
2017-10-27 01:00:00 0.274964 10.264292
2017-10-27 02:00:00 0.287414 10.531686
2017-10-27 03:00:00 0.290940 10.794677
2017-10-27 04:00:00 0.287235 10.858319
df['Wind speed'].plot();

Dataset has other methods for calculating typical statistics, e.g. max, quantile…

ds[["Wind speed"]].max(axis="space").to_dataframe().plot(title="Max wind speed");
ds[["Wind speed"]].quantile(q=[0.1,0.5,0.9],axis="space").to_dataframe().plot();

Remember that the element area is not taken into account in these spatial aggregations! Only Dataset.average supports weighted averages:

df = ds[["Wind speed"]].mean(axis="space").to_dataframe()
df.columns = ["Simple mean"]
area=ds.geometry.get_element_area()
df['Weighted'] = ds[["Wind speed"]].average(axis="space", weights=area).to_dataframe()
df.plot(title="Mean wind speed (simple vs weighted by element area)");

Quantiles to file#

dsq  = ds.quantile(q=[0.1,0.5,0.9])
dsq
<mikeio.Dataset>
dims: (element:958)
time: 2017-10-27 00:00:00 (time-invariant)
geometry: Dfsu2D (958 elements, 570 nodes)
items:
  0:  Quantile 0.1, Surface elevation <Surface Elevation> (meter)
  1:  Quantile 0.5, Surface elevation <Surface Elevation> (meter)
  2:  Quantile 0.9, Surface elevation <Surface Elevation> (meter)
  3:  Quantile 0.1, Wind speed <Wind speed> (meter per sec)
  4:  Quantile 0.5, Wind speed <Wind speed> (meter per sec)
  5:  Quantile 0.9, Wind speed <Wind speed> (meter per sec)

Write to a new dfsu file

dsq.to_dfs("NorthSea_Quantiles.dfsu")

Total#

Aggregating over all data (both time and space) can be done from the Dataset in a few ways:

  • ds.describe() - gives summary statistics similar to pandas df.describe()

  • using axis=None in ds.mean(), ds.min(), etc.

  • using NumPy functions on the underlying data, e.g. ds["Wind speed"].values.mean()

ds.describe()
Surface elevation Wind speed
count 64186.000000 64186.000000
mean 0.449857 12.772705
std 0.651157 3.694293
min -2.347003 1.190171
25% 0.057831 10.376003
50% 0.466257 12.653086
75% 0.849586 14.885848
max 3.756879 26.213045
ds.min(axis=None).to_dataframe()
Surface elevation Wind speed
2017-10-27 -2.347003 1.190171
ds["Wind speed"].values.min()
1.1901706

Generic#

The mikeio.generic submodule can produce common temporal statistics on any dfs file (of any size). The output will be a new dfs file. Currently, generic has these methods for calculating statistics:

  • avg_time()

  • quantile()

generic.avg_time("data/NorthSea_HD_and_windspeed.dfsu", "NorthSea_avg.dfsu")

ds = mikeio.read("NorthSea_avg.dfsu", items="Wind speed")
ds
<mikeio.Dataset>
dims: (time:1, element:958)
time: 2017-10-27 00:00:00 (time-invariant)
geometry: Dfsu2D (958 elements, 570 nodes)
items:
  0:  Wind speed <Wind speed> (meter per sec)
ds["Wind speed"].plot(title="Mean wind speed");
generic.quantile("data/NorthSea_HD_and_windspeed.dfsu", "NorthSea_Quantiles2.dfsu", q=[0.1, 0.5, 0.9])
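
The resulting file can be read back like any other dfsu file to inspect the items; a quick check, assuming the output was written to the current working directory:

dsq2 = mikeio.read("NorthSea_Quantiles2.dfsu")
dsq2.items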

Custom#

ds = mikeio.read("data/NorthSea_HD_and_windspeed.dfsu")

Dataset.aggregate#

With aggregate we can get Dataset statistics with our “own” function, e.g. standard deviation:

import numpy as np
dsa = ds.aggregate(func=np.std)
dsa["Wind speed"].plot(label="Std [m/s]");
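
Any NumPy-style reduction that accepts an axis keyword can be used in the same way. A sketch (assuming, as for np.std above, that aggregate forwards the axis argument to the supplied function) computing an interquartile-range map of the wind speed:

def iqr(x, axis=0):
    # interquartile range: 75th minus 25th percentile along the given axis
    return np.percentile(x, 75, axis=axis) - np.percentile(x, 25, axis=axis)

ds_iqr = ds.aggregate(func=iqr)
ds_iqr["Wind speed"].plot(label="IQR [m/s]");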

Exceedance probability#

Let's find out how often the wind speed exceeds 12 m/s in our simulation.

import matplotlib.pyplot as plt
nt = ds.n_timesteps
# empirical probability of exceedance for the sorted values: (nt-1)/nt, ..., 1/nt, 0
one_to_zero = 1. - np.arange(1., nt + 1.)/nt
val = ds["Wind speed"].isel(element=0).values
plt.plot(ds.time, val);
plt.plot(ds.time[val>12], val[val>12],'.r');
plt.axhline(y=12,color='r')
plt.ylabel('Wind speed [m/s]')
plt.title('How often is the wind speed above 12m/s (element 0)?');
plt.plot(np.sort(val), one_to_zero);
plt.xlabel('Wind speed [m/s]')
plt.ylabel('Probability of exceedance [-]')
plt.axvline(x=12,color='r')
plt.title('Wind speed exceedance in element 0');
# Create empty DataArray 
item=mikeio.ItemInfo(mikeio.EUMType.Probability)
data = np.full(shape=(1,ds.geometry.n_elements), fill_value=np.nan)
dae = mikeio.DataArray(data=data, time="2017-10-27", item=item, geometry=ds.geometry)
threshold = 12
for j in range(ds.n_elements):
    # this is a naive and slow way of calculating this!
    dat = ds["Wind speed"][:,j].values
    dae[0,j] = np.interp(threshold, np.sort(dat), one_to_zero)
dae100 = dae*100   # fraction -> percent
dae100.plot(title="Wind speed exceeding 12 m/s", 
    label="Time of Exceedance [%]", cmap="YlOrRd");
total_hours = (ds.time[-1]-ds.time[0]).total_seconds()/3600
dae_hours = dae*total_hours
dae_hours.plot(title="Wind speed exceeding 12 m/s", 
    label="Exceedance [Hours]", cmap="YlOrRd");
_images/e497f120e84948b8e981468809e34ec980bad5fbec2fc3e0e54dbfd6890d3092.png
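
The per-element loop used to fill dae above can also be written as a single vectorized operation on the underlying NumPy array, reusing threshold and item from above. A sketch that computes the empirical fraction of time steps exceeding the threshold directly (not identical to the interpolated estimate, but much faster):

frac = (ds["Wind speed"].values > threshold).mean(axis=0)  # fraction of time steps per element
dae_fast = mikeio.DataArray(data=frac[np.newaxis, :], time="2017-10-27",
                            item=item, geometry=ds.geometry)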