Pre-matched data with auxiliary data

import modelskill as ms
import numpy as np
import pandas as pd
import mikeio

fn = "../data/SW/eur_matched.dfs0"
mikeio.read(fn)

<mikeio.Dataset>
dims: (time:67)
time: 2017-10-27 00:00:00 - 2017-10-29 18:00:00 (67 records)
geometry: GeometryUndefined()
items:
  0:  Hm0, model <Significant wave height> (meter)
  1:  Hm0, obs <Significant wave height> (meter)
  2:  Wind speed <Wind speed> (meter per sec)
  3:  Wind Direction <Wind Direction> (degree)

The function from_matched() takes a dataframe, a dfs0 or a mikeio.Dataset of already matched data and returns a Comparer object.

cmp = ms.from_matched(fn, obs_item=1, mod_items=0, aux_items=[2,3])
cmp.aux_names

['Wind speed', 'Wind Direction']

# NOTE: we rename data_vars to avoid spaces in names
cmp = cmp.rename({"Wind speed": "wind_speed", "Wind Direction": "wind_dir"})

cmp.aux_names

['wind_speed', 'wind_dir']

cmp

<Comparer>
Quantity: Significant wave height [m]
Observation: Hm0, obs, n_points=67
Model(s):
0: Hm0, model
 Auxiliary: wind_speed
 Auxiliary: wind_dir

cmp.skill()

	n	bias	rmse	urmse	mae	cc	si	r2
observation
Hm0, obs	67	0.052239	0.22824	0.222181	0.174851	0.968321	0.085898	0.929767

cmp.plot.scatter(quantiles=0, figsize=(6,6));
cmp.plot.timeseries();

Filter

Filter on auxiliary data using query() or where(). Below, we consider only wave data when the wind speed is above 15 m/s.

cmp.query("wind_speed > 15.0")

<Comparer>
Quantity: Significant wave height [m]
Observation: Hm0, obs, n_points=19
Model(s):
0: Hm0, model
 Auxiliary: wind_speed
 Auxiliary: wind_dir

cmp2 = cmp.where(cmp.data.wind_speed>15.0)
cmp2

<Comparer>
Quantity: Significant wave height [m]
Observation: Hm0, obs, n_points=19
Model(s):
0: Hm0, model
 Auxiliary: wind_speed
 Auxiliary: wind_dir

# notice that the model data is kept, but the observations are filtered
cmp2.plot.timeseries();

More auxiliary data can be added, e.g. as derived data from the original data.

cmp.data["residual"] = cmp.data["Hm0, model"] - cmp.data["Observation"]

large_residuals = np.abs(cmp.data.residual)>0.1
cmp3 = cmp.where(large_residuals)
cmp3.plot.scatter(figsize=(6,6));
cmp3.plot.timeseries();

cmp3.data.data_vars

Data variables:
    Observation  (time) float64 320B 0.92 1.03 1.24 1.34 ... 3.46 3.37 3.24 3.23
    Hm0, model   (time) float64 320B 1.43 1.655 1.789 ... 3.634 3.531 3.473
    wind_speed   (time) float64 320B 9.754 11.06 11.42 10.93 ... 13.3 13.3 13.54
    wind_dir     (time) float64 320B 327.4 331.5 333.3 ... 343.0 340.8 343.6
    residual     (time) float64 320B 0.5101 0.6253 0.5495 ... 0.2907 0.2427

cmp3.data.Observation.values

array([0.92000002, 1.02999997, 1.24000001, 1.34000003, 1.54999995,
       1.65999997, 1.79999995, 2.1500001 , 2.20000005, 2.1500001 ,
       2.1500001 , 2.08999991, 2.01999998, 2.02999997, 1.88999999,
       1.76999998, 2.1099999 , 2.27999997, 2.31999993, 2.77999997,
       2.72000003, 2.61999989, 2.79999995, 2.91000009, 2.96000004,
       3.31999993, 2.86999989, 3.3599999 , 4.13000011, 4.01000023,
       3.97000003, 3.8900001 , 4.17999983, 3.63000011, 3.79999995,
       3.47000003, 3.46000004, 3.36999989, 3.24000001, 3.23000002])

Aggregate

Let’s split the data based on wind direction sector and aggregate the skill calculation of the significant wave height predition for each sector.

Note: in this short example wind direction is between 274 and 353 degrees

df = cmp.data.wind_dir.to_dataframe()
windsectors = pd.cut(df.wind_dir,
                                [255, 285, 315, 345, 360],
                                labels=["W", "WNW", "NNW", "N"])
cmp.data["windsector"] = windsectors.astype(str)

cmp.skill(by="windsector")

	observation	n	bias	rmse	urmse	mae	cc	si	r2
windsector
NNW	Hm0, obs	28	0.115715	0.285428	0.260920	0.230681	0.969837	0.103408	0.927645
N	Hm0, obs	7	0.070214	0.252445	0.242484	0.222582	0.991219	0.082961	0.859219
WNW	Hm0, obs	15	-0.044628	0.141796	0.134590	0.107524	0.984303	0.049652	0.965368
W	Hm0, obs	17	0.025762	0.164749	0.162723	0.122650	0.962449	0.066609	0.903978

cmp.skill(by="windsector").rmse.plot.bar(title="Hm0 RMSE by wind sector");

cmp.where(cmp.data.windsector=="W").plot.timeseries();