Selecting data

The primary way to filter data in ModelSkill is the sel() method, which is accessible on most ModelSkill data structures. It is a wrapper around xarray.Dataset.sel and selects data based on time, location and/or variable. It returns a new data structure of the same type containing the selected data.

TimeSeries data

Point and track timeseries data, for both observations and model results, are stored in TimeSeries objects, which use xarray.Dataset as their data container. The sel() method selects data based on time and returns a new TimeSeries object with the selected data.

import modelskill as ms
o = ms.observation("../data/obs.nc", item="waterlevel", gtype='point')
o_1month = o.sel(time=slice("2018-01-01", "2018-02-01"))
o_1month

<PointObservation>: obs
Location: nan, nan
Time: 2018-01-01 00:00:00 - 2018-02-01 23:00:00
Quantity:  []
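
Note that the selected range ends at 2018-02-01 23:00, not at midnight: with label-based time slices both endpoints are included, and a date-only string covers the whole day. The sketch below reproduces this behaviour with plain pandas rather than ModelSkill; the hourly series `series` is a hypothetical stand-in for the observation.

```python
import pandas as pd

# Hypothetical hourly series standing in for the observation (not ModelSkill data)
idx = pd.date_range("2017-12-25", "2018-02-10", freq=pd.Timedelta(hours=1))
series = pd.Series(range(len(idx)), index=idx)

# Label-based slicing includes both endpoints; "2018-02-01" covers the full day
one_month = series.loc["2018-01-01":"2018-02-01"]
first, last = one_month.index[0], one_month.index[-1]
print(first, "-", last)
```

To stop at midnight instead, give the end of the slice as a full timestamp such as "2018-02-01 00:00".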

Comparer objects

The Comparer and ComparerCollection objects hold matched data from observations and model results, enabling you to evaluate model performance effectively. These objects provide intuitive methods to filter and query data based on time, model, quantity, or spatial criteria.

The primary methods for filtering the data are:

sel(): Use for structured selections based on time, model, or spatial boundaries.
where(): Use for conditional filtering based on logical criteria.
query(): Use for flexible, expression-based filtering in a pandas-like style.

Observation and model data

o = ms.observation("../data/SW/HKNA_Hm0.dfs0", item=0,
                    x=4.2420, y=52.6887,
                    name="HKNA")
m1 = ms.model_result("../data/SW/HKZN_local_2017_DutchCoast.dfsu", 
                      item="Sign. Wave Height",
                      name="m1")
m2 = ms.model_result("../data/SW/CMEMS_DutchCoast_2017-10-28.nc", 
                      item="VHM0",
                      name="m2")

cmp = ms.match(o, [m1, m2])

`sel()` method

The sel method allows you to select data based on specific criteria such as time, model name, or spatial area. It returns a new Comparer object with the selected data. This method is highly versatile and supports multiple selection parameters, which can be combined.

Syntax: Comparer.sel(model=None, time=None, area=None)

Parameter	Type	Description	Default
`model`	str, int, or list	Model name or index. Selects specific models.	None
`time`	str, datetime, or slice	Specific time or range for selection.	None
`area`	list of float or Polygon	Bounding box [x0, y0, x1, y1] or a polygon area filter.	None

Example 1: Selecting data by time

cmp_12hrs = cmp.sel(time=slice('2017-10-28', '2017-10-28 12:00'))
cmp_12hrs

<Comparer>
Quantity: Significant wave height [m]
Observation: HKNA, n_points=66
Model(s):
0: m1
1: m2

This selects data within the specified time range.

Example 2: Selecting a specific model

cmp_m1 = cmp.sel(model='m1')
cmp_m1

<Comparer>
Quantity: Significant wave height [m]
Observation: HKNA, n_points=120
Model(s):
0: m1

This filters the data to include only the model named “m1”.

Example 3: Selecting a spatial area

cmp_area = cmp.sel(area=[4.0, 52.5, 5.0, 53.0])

This filters the data within the bounding box defined by [x0, y0, x1, y1].
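
Conceptually, a bounding box keeps the points whose coordinates fall between the two corners. The following is a minimal pure-Python sketch of that test, not ModelSkill's implementation: the helper `in_bbox` is hypothetical, and both the inclusive boundaries and the reading of (x0, y0) as the lower-left corner are assumptions.

```python
from typing import List, Tuple


def in_bbox(x: float, y: float, bbox: List[float]) -> bool:
    """Return True if (x, y) lies inside bbox = [x0, y0, x1, y1]."""
    x0, y0, x1, y1 = bbox
    # Inclusive boundaries are an assumption of this sketch
    return x0 <= x <= x1 and y0 <= y <= y1


bbox = [4.0, 52.5, 5.0, 53.0]

# Station HKNA from the example above, plus two hypothetical points outside the box
points: List[Tuple[float, float]] = [(4.2420, 52.6887), (3.5, 52.7), (4.5, 53.4)]
inside = [p for p in points if in_bbox(p[0], p[1], bbox)]
print(inside)
```

For a point observation such as HKNA the result is all or nothing; an area filter is more useful for track data or for a collection with several stations.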

`where()` method

The where method is used to filter data conditionally. It works similarly to xarray’s where method and returns a new Comparer object with values satisfying a given condition. Other values will be masked (set to NaN).

Syntax: Comparer.where(cond)

Parameter	Type	Description
`cond`	bool, np.ndarray, or xr.DataArray	Condition to filter values (True or False).

Example 4: Filtering data conditionally

cmp.where(cmp.data.Observation > 3)

<Comparer>
Quantity: Significant wave height [m]
Observation: HKNA, n_points=52
Model(s):
0: m1
1: m2

This keeps only the rows where the observed value is greater than 3.

Example 5: Multiple conditions

cmp.where((cmp.data.m1 < 2.9) & (cmp.data.Observation > 3))

<Comparer>
Quantity: Significant wave height [m]
Observation: HKNA, n_points=8
Model(s):
0: m1
1: m2

This filters the data to include rows where m1 < 2.9 and Observation > 3.0.

`query()` method

The query method uses a pandas.DataFrame.query-style syntax to filter data based on string-based expressions. It provides a flexible way to apply complex filters using column names and logical operators.

Syntax: Comparer.query(query)

Parameter	Type	Description
`query`	str	Query string for filtering data.

Example 6: Querying data

cmp.query("Observation > 3.0 and m1 < 2.9")

<Comparer>
Quantity: Significant wave height [m]
Observation: HKNA, n_points=8
Model(s):
0: m1
1: m2

This filters the data where Observation is greater than 3.0 and m1 is less than 2.9.
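
Examples 5 and 6 express the same condition in two ways, which is why both report n_points=8. The equivalence can be illustrated with plain pandas; the sketch below does not use ModelSkill, and the values in `df` are hypothetical, with column names chosen to mirror the Comparer above.

```python
import pandas as pd

# Hypothetical matched values; column names mirror the Comparer example
df = pd.DataFrame(
    {
        "Observation": [2.5, 3.1, 3.4, 3.8, 2.9],
        "m1": [2.4, 2.8, 3.3, 2.7, 2.6],
    }
)

# Boolean-mask form, analogous to the where() condition in Example 5
by_mask = df[(df["m1"] < 2.9) & (df["Observation"] > 3)]

# String-expression form, analogous to query() in Example 6
by_query = df.query("Observation > 3.0 and m1 < 2.9")

print(by_mask.equals(by_query))
```

The string form is often easier to read and to build programmatically, while the boolean form allows arbitrary array expressions.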

Skill objects

The skill() and mean_skill() methods compare observation and model result data using different metrics (e.g. root mean square error) and return the resulting skill scores as a SkillTable object. A SkillTable wraps a pandas.DataFrame and organizes the skill scores for further filtering, visualization, or analysis.

The resulting SkillTable object provides several methods to facilitate filtering and formatting:

sel(): Select specific models or observations.
query(): Apply flexible conditions with pandas-like queries.

sk = cmp.skill(metrics=["rmse", "mae", "si"])
sk

		n	rmse	mae	si
model	observation
m1	HKNA	120	0.190451	0.155128	0.060252
m2	HKNA	120	0.574975	0.525915	0.080212
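
For orientation, rmse and mae are standard error statistics computed from the model-minus-observation residuals. The sketch below uses the textbook definitions on hypothetical values; it is not ModelSkill's implementation, which may treat missing values or weights differently.

```python
import math

# Hypothetical matched observation and model values (not from the example data)
obs = [1.0, 2.0, 3.0, 4.0]
mod = [1.5, 2.0, 2.0, 4.5]

residuals = [m - o for m, o in zip(mod, obs)]  # model minus observation
n = len(residuals)

rmse = math.sqrt(sum(r * r for r in residuals) / n)  # root mean square error
mae = sum(abs(r) for r in residuals) / n             # mean absolute error
print(n, rmse, mae)
```

Because rmse squares the residuals before averaging, it penalizes large errors more heavily than mae, so rmse is never smaller than mae.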

Example 7: Select model

sk.sel(model='m1')

	model	observation	n	rmse	mae	si
0	m1	HKNA	120	0.190451	0.155128	0.060252

Here, sk contains skill scores for all models, and sk.sel(model='m1') returns only the rows for model “m1”. Observations can be selected in the same way.

Example 8: Querying skill scores

sk_high_rmse = sk.query("rmse > 0.3")
sk_high_rmse

		n	rmse	mae	si
model	observation
m2	HKNA	120	0.574975	0.525915	0.080212

This filters the SkillTable to include only rows where the root mean square error (RMSE) exceeds 0.3.
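
Since a SkillTable wraps a pandas.DataFrame, the same expression can be tried on a plain DataFrame. The sketch below rebuilds the table above by hand; the values are copied from the output, while the exact index layout of the underlying DataFrame is an assumption.

```python
import pandas as pd

# Values copied from the skill table above; the DataFrame layout is an assumption
index = pd.MultiIndex.from_tuples(
    [("m1", "HKNA"), ("m2", "HKNA")], names=["model", "observation"]
)
df = pd.DataFrame(
    {
        "n": [120, 120],
        "rmse": [0.190451, 0.574975],
        "mae": [0.155128, 0.525915],
        "si": [0.060252, 0.080212],
    },
    index=index,
)

# Keep only rows whose RMSE exceeds 0.3
high_rmse = df.query("rmse > 0.3")
models = list(high_rmse.index.get_level_values("model"))
print(models)
```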

Example 9: Accessing and visualizing specific metrics

sk_rmse = sk.rmse
sk_rmse

		rmse
model	observation
m1	HKNA	0.190451
m2	HKNA	0.574975

sk_rmse.plot.bar(figsize=(5,3))

The rmse attribute directly accesses the RMSE column from the SkillTable, which can then be plotted or analyzed further.
