Skill Statistics

The Comparer and ComparerCollection objects in ModelSkill allow for detailed performance evaluation of models through skill metrics. These metrics are organized in a SkillTable object, which can be further filtered, analyzed, and visualized to gain insights into model accuracy and reliability.

Construct comparer from observation and model data

import modelskill as ms

# Point observations; x and y give the station position
o1 = ms.observation("../data/SW/HKNA_Hm0.dfs0", item=0,
                    x=4.2420, y=52.6887,
                    name="HKNA")
o2 = ms.observation("../data/SW/eur_Hm0.dfs0", item=0,
                    x=3.2760, y=51.9990,
                    name="EPL")

# Model results to be evaluated against the observations
m1 = ms.model_result("../data/SW/HKZN_local_2017_DutchCoast.dfsu",
                      item="Sign. Wave Height",
                      name="m1")
m2 = ms.model_result("../data/SW/CMEMS_DutchCoast_2017-10-28.nc",
                      item="VHM0",
                      name="m2")

# Match every observation with every model result
cc = ms.match([o1, o2], [m1, m2])
cc

<ComparerCollection>
Comparers:
0: HKNA - Significant wave height [m]
1: EPL - Significant wave height [m]

Generating a SkillTable

The skill() and mean_skill() methods each return a SkillTable of skill metrics computed from the matched observations and model results. The skill() method supports grouping and aggregation through the by parameter.

Syntax: ComparerCollection.skill(by=None, metrics=None)

Parameter	Type	Description	Default
`by`	str or list	Group by column names, temporal bins (e.g., `freq:M` for monthly), or attributes (using an `attrs:` prefix). With `None`, results are grouped by model and observation, as in Example 1.	None
`metrics`	list	List of metrics (e.g., `rmse`, `bias`). Default uses predefined metrics.	None

Example 1: Generating a SkillTable

sk = cc.skill(metrics=["bias", "rmse", "si"])
sk

		n	bias	rmse	si
model	observation
m1	HKNA	120	-0.076142	0.190451	0.060252
m1	EPL	22	-0.190022	0.226535	0.049538
m2	HKNA	120	-0.525915	0.574975	0.080212
m2	EPL	22	-0.428523	0.457555	0.064425

This generates a SkillTable containing metrics for all observations and models.
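To make the three metrics concrete, here is a minimal plain-Python sketch of how bias, RMSE, and scatter index are commonly defined. The formulas are assumptions based on common usage (bias as the mean of model minus observation; scatter index as the bias-corrected RMSE divided by the mean absolute observation) and may differ in detail from ModelSkill's own implementation. The sample values are made up for illustration.

```python
import math

def bias(obs, mod):
    """Mean error, model minus observation (negative: model underestimates)."""
    return sum(m - o for o, m in zip(obs, mod)) / len(obs)

def rmse(obs, mod):
    """Root mean squared error."""
    return math.sqrt(sum((m - o) ** 2 for o, m in zip(obs, mod)) / len(obs))

def scatter_index(obs, mod):
    """Bias-corrected RMSE divided by the mean absolute observation (assumed definition)."""
    b = bias(obs, mod)
    urmse = math.sqrt(sum((m - o - b) ** 2 for o, m in zip(obs, mod)) / len(obs))
    return urmse / (sum(abs(o) for o in obs) / len(obs))

# Made-up matched values, for illustration only
obs = [1.0, 2.0, 3.0, 4.0]
mod = [0.5, 2.0, 2.5, 3.0]

print("bias:", bias(obs, mod))
print("rmse:", rmse(obs, mod))
print("si:  ", scatter_index(obs, mod))
```

With these definitions, the squared RMSE is the sum of the squared bias and the squared bias-corrected RMSE, which is why a model can have a large RMSE but a small scatter index when its error is mostly a constant offset.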

Example 2: Grouping skill scores

sk_by6hr = cc.skill(by=['model','freq:6h'], 
                    metrics=["bias", "mae"]
                    )
sk_by6hr

		n	bias	mae
model	time
m1	2017-10-28 00:00:00	35	-0.077149	0.105698
	2017-10-28 06:00:00	42	-0.155072	0.190004
	2017-10-28 12:00:00	42	-0.093342	0.174382
	2017-10-28 18:00:00	23	-0.007996	0.172298
m2	2017-10-28 00:00:00	35	-0.235906	0.235906
	2017-10-28 06:00:00	42	-0.518031	0.518031
	2017-10-28 12:00:00	42	-0.660042	0.660042
	2017-10-28 18:00:00	23	-0.643544	0.643544

Here, skill scores are grouped into 6-hour bins (freq:6h). Grouping by month or year works the same way, making it possible to analyze how performance changes over time.
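The idea behind temporal binning can be sketched in plain Python: each timestamp is floored to the start of its bin, and the metrics are computed per bin. This is only an illustration of the concept, not ModelSkill's implementation; the helper floor_to_bin and the sample values are hypothetical, and bins are assumed to be counted from midnight.

```python
from collections import defaultdict
from datetime import datetime

def floor_to_bin(t, hours=6):
    """Start of the fixed-width bin (counted from midnight) that contains t."""
    return t.replace(hour=t.hour - t.hour % hours, minute=0, second=0, microsecond=0)

# Hypothetical matched samples: (time, observation, model)
samples = [
    (datetime(2017, 10, 28, 1, 0), 2.0, 1.8),
    (datetime(2017, 10, 28, 4, 30), 2.2, 2.1),
    (datetime(2017, 10, 28, 7, 0), 2.5, 2.6),
    (datetime(2017, 10, 28, 11, 0), 2.7, 2.2),
]

# Collect residuals (model minus observation) per bin
residuals = defaultdict(list)
for t, obs, mod in samples:
    residuals[floor_to_bin(t)].append(mod - obs)

# One row of metrics per bin, ordered by bin start
table = {
    start: {
        "n": len(r),
        "bias": sum(r) / len(r),
        "mae": sum(abs(e) for e in r) / len(r),
    }
    for start, r in sorted(residuals.items())
}

for start, row in table.items():
    print(start, row)
```

In the first bin every residual is negative, so mae equals the magnitude of bias. The m2 rows in Example 2 show the same pattern, which indicates that m2 underestimates the observations at every matched time step. In the second bin the residuals have mixed signs, so mae is larger than the magnitude of bias.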

Filtering a SkillTable

The SkillTable object supports several methods for filtering and refining data. The sel() method allows selection of specific models or observations, while the query() method enables flexible condition-based filtering.

Example 3: Selecting a specific model

sk_m1 = sk.sel(model='m1')
sk_m1

	model	n	bias	rmse	si
observation
HKNA	m1	120	-0.076142	0.190451	0.060252
EPL	m1	22	-0.190022	0.226535	0.049538

This filters the SkillTable to include only results for the model named “m1”; the resulting table is indexed by observation alone. See more about filtering on the Selecting data page.
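Condition-based filtering with query() is not demonstrated above. The plain-Python sketch below illustrates the two kinds of filtering on the values from Example 1: keeping rows that satisfy a condition, and keeping rows with a given label. The threshold of 0.3 is a hypothetical choice, and the assumption that query() accepts a pandas-style expression such as "rmse > 0.3" should be checked against the ModelSkill API reference.

```python
# Rows of the SkillTable from Example 1
columns = ("model", "observation", "n", "bias", "rmse", "si")
rows = [
    ("m1", "HKNA", 120, -0.076142, 0.190451, 0.060252),
    ("m1", "EPL", 22, -0.190022, 0.226535, 0.049538),
    ("m2", "HKNA", 120, -0.525915, 0.574975, 0.080212),
    ("m2", "EPL", 22, -0.428523, 0.457555, 0.064425),
]
records = [dict(zip(columns, r)) for r in rows]

# Condition-based filter: keep rows whose rmse exceeds a hypothetical threshold
threshold = 0.3
high_rmse = [r for r in records if r["rmse"] > threshold]

# Label-based selection, analogous to sk.sel(model="m1")
only_m1 = [r for r in records if r["model"] == "m1"]

for r in high_rmse:
    print(r["model"], r["observation"], r["rmse"])
```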

Sorting a SkillTable

The SkillTable supports sorting by index or values, and swapping levels in a MultiIndex to reorganize the data.

Example 4: Sorting by index

sk_sorted = sk.sort_index()
sk_sorted

		n	bias	rmse	si
model	observation
m1	EPL	22	-0.190022	0.226535	0.049538
m1	HKNA	120	-0.076142	0.190451	0.060252
m2	EPL	22	-0.428523	0.457555	0.064425
m2	HKNA	120	-0.525915	0.574975	0.080212

This sorts the SkillTable by its index levels: first by model, then by observation.

Example 5: Sorting by a specific index level

sk_sorted_obs = sk.sort_index(level="observation")
sk_sorted_obs

		n	bias	rmse	si
model	observation
m1	EPL	22	-0.190022	0.226535	0.049538
m2	EPL	22	-0.428523	0.457555	0.064425
m1	HKNA	120	-0.076142	0.190451	0.060252
m2	HKNA	120	-0.525915	0.574975	0.080212

Here, the table is sorted specifically by the observation level in the index.

Example 6: Sorting by values

sk_sorted_values = sk.sort_values("rmse")
sk_sorted_values

		n	bias	rmse	si
model	observation
m1	HKNA	120	-0.076142	0.190451	0.060252
m1	EPL	22	-0.190022	0.226535	0.049538
m2	EPL	22	-0.428523	0.457555	0.064425
m2	HKNA	120	-0.525915	0.574975	0.080212

This sorts the table by the rmse column in ascending order.

Example 7: Sorting by multiple values

sk_sorted_multi = sk.sort_values(["n", "rmse"], ascending=[True, False])
sk_sorted_multi

		n	bias	rmse	si
model	observation
m2	EPL	22	-0.428523	0.457555	0.064425
m1	EPL	22	-0.190022	0.226535	0.049538
m2	HKNA	120	-0.525915	0.574975	0.080212
m1	HKNA	120	-0.076142	0.190451	0.060252

Here, the table is sorted first by column n (ascending) and then by rmse (descending).
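The two-key ordering can be reproduced in plain Python as a cross-check. This is a sketch of the sorting logic only, using the n and rmse values from Example 1; it is not how SkillTable sorts internally.

```python
# (model, observation, n, rmse) taken from the table in Example 1
rows = [
    ("m1", "HKNA", 120, 0.190451),
    ("m1", "EPL", 22, 0.226535),
    ("m2", "HKNA", 120, 0.574975),
    ("m2", "EPL", 22, 0.457555),
]

# Sort by n ascending; within equal n, sort by rmse descending.
# Negating a numeric key reverses its order inside a single sort.
ordered = sorted(rows, key=lambda r: (r[2], -r[3]))

for model, observation, n, rmse in ordered:
    print(model, observation, n, rmse)
```

The second key only matters where the first key ties: here the two EPL rows (n = 22) and the two HKNA rows (n = 120) are each ordered by decreasing rmse.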

Example 8: Swapping index levels

sk_swapped = sk.swaplevel("model", "observation").sort_index()
sk_swapped

		n	bias	rmse	si
observation	model
EPL	m1	22	-0.190022	0.226535	0.049538
EPL	m2	22	-0.428523	0.457555	0.064425
HKNA	m1	120	-0.076142	0.190451	0.060252
HKNA	m2	120	-0.525915	0.574975	0.080212

This swaps the model and observation levels in the MultiIndex and sorts the resulting table.

Rounding and Formatting

The round() method can be used to round all skill values to a specified number of decimal places, making the table more readable.

Example 9: Rounding skill values

sk.round(decimals=2)

		n	bias	rmse	si
model	observation
m1	HKNA	120	-0.08	0.19	0.06
m1	EPL	22	-0.19	0.23	0.05
m2	HKNA	120	-0.53	0.57	0.08
m2	EPL	22	-0.43	0.46	0.06

This rounds all values in the SkillTable to two decimal places.

Visualizing Skill Metrics

The SkillTable integrates table styling and plotting capabilities, allowing you to quickly visualize skill metrics.

Example 10: Styling the SkillTable

sk.style()

		n	bias	rmse	si
model	observation
m1	HKNA	120	-0.076	0.190	0.060
m1	EPL	22	-0.190	0.227	0.050
m2	HKNA	120	-0.526	0.575	0.080
m2	EPL	22	-0.429	0.458	0.064

The style() method applies color-based styling to the table, making it easier to identify high and low values.

Individual metrics can be accessed as columns and plotted using pandas-style plotting.

Example 11: Plotting a bar chart for RMSE

sk.rmse.plot.bar(figsize=(5,3))

This creates a bar chart showing RMSE values for each model-observation pair.

Example 12: Line plot

sk_by3hr = cc.skill(by=['model','freq:3h'])
sk_by3hr.rmse.plot.line(title="RMSE in 3 hour groups")

This generates a line plot showing how RMSE varies across the 3-hour time bins.

Example 13: Bar chart

sk.rmse.plot.bar()

This creates the same bar chart as in Example 11, here with the default figure size.

Example 14: Horizontal bar chart

sk.rmse.plot.barh()

This generates a horizontal bar chart for RMSE values.

Example 15: Colored grid

sk.rmse.plot.grid()

This produces a colored grid of the RMSE values for each model-observation pair, which can help identify patterns.

Exporting a SkillTable

For further analysis, the SkillTable can be converted to a standard pandas.DataFrame or a GeoDataFrame for spatial data.

Example 16: Converting to DataFrame

df = sk.to_dataframe()
df

		n	bias	rmse	si
model	observation
m1	HKNA	120	-0.076142	0.190451	0.060252
m1	EPL	22	-0.190022	0.226535	0.049538
m2	HKNA	120	-0.525915	0.574975	0.080212
m2	EPL	22	-0.428523	0.457555	0.064425

This converts the SkillTable into a pandas.DataFrame for additional processing.

Example 17: Converting to GeoDataFrame

gdf = sk.to_geodataframe()

This converts the table to a GeoDataFrame, enabling spatial analysis of model performance.

Summary of Key Methods

The SkillTable object provides tools to filter, sort, format, visualize, and export skill metrics:

sel(): Select specific models or observations.
query(): Apply flexible condition-based filtering.
sort_index(): Sort the table by index levels.
sort_values(): Sort the table by specific metric values.
swaplevel(): Swap levels in the MultiIndex for reorganization.
round(): Round skill values to improve readability.
plot.line(): Generate a line plot for skill metrics.
plot.bar(): Visualize metrics as a bar chart.
plot.barh(): Create a horizontal bar chart.
plot.grid(): Display a colored grid of skill metrics.
style(): Apply color-based formatting for easy interpretation.
to_dataframe(): Export to pandas.DataFrame.
to_geodataframe(): Export to GeoDataFrame for spatial analysis.

By combining these methods, you can analyze model performance in detail, identify trends, and communicate results effectively.