Skill Statistics

The Comparer and ComparerCollection objects in ModelSkill allow for detailed performance evaluation of models through skill metrics. These metrics are organized in a SkillTable object, which can be further filtered, analyzed, and visualized to gain insights into model accuracy and reliability.

Construct comparer from observation and model data
import modelskill as ms

# Point observations of significant wave height at two stations
o1 = ms.observation("../data/SW/HKNA_Hm0.dfs0", item=0,
                    x=4.2420, y=52.6887,
                    name="HKNA")
o2 = ms.observation("../data/SW/eur_Hm0.dfs0", item=0,
                    x=3.2760, y=51.9990,
                    name="EPL")

# Two model results to be compared against the observations
m1 = ms.model_result("../data/SW/HKZN_local_2017_DutchCoast.dfsu",
                     item="Sign. Wave Height",
                     name="m1")
m2 = ms.model_result("../data/SW/CMEMS_DutchCoast_2017-10-28.nc",
                     item="VHM0",
                     name="m2")

# Match the observations with the model results
cc = ms.match([o1, o2], [m1, m2])
cc
<ComparerCollection>
Comparers:
0: HKNA - Significant wave height [m]
1: EPL - Significant wave height [m]

Generating a SkillTable

The skill and mean_skill methods generate a SkillTable by comparing observations and model results using various skill metrics. The skill method supports grouping and aggregation through the by parameter.

Syntax: ComparerCollection.skill(by=None, metrics=None)

Parameter | Type | Description | Default
by | str or list | Group by column names, temporal bins (e.g., freq:M for monthly), or attributes like attrs. | None
metrics | list | List of metrics (e.g., rmse, bias). Default uses predefined metrics. | None
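
To make the metric names concrete, the sketch below computes bias, rmse, mae and si for a handful of made-up values using only the standard library. The formulas are the common textbook definitions and are assumed, not confirmed here, to match ModelSkill's implementation; in particular, si is taken to be the bias-corrected RMSE divided by the mean absolute observation. The function name basic_metrics and the sample values are hypothetical.

```python
import math


def basic_metrics(obs, mod):
    """Return n, bias, rmse, mae and si for paired observation/model values."""
    n = len(obs)
    residuals = [m - o for o, m in zip(obs, mod)]
    bias = sum(residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mae = sum(abs(r) for r in residuals) / n
    # Assumed definition of the scatter index: RMSE of the bias-corrected
    # residuals, normalized by the mean absolute observation.
    urmse = math.sqrt(sum((r - bias) ** 2 for r in residuals) / n)
    si = urmse / (sum(abs(o) for o in obs) / n)
    return {"n": n, "bias": bias, "rmse": rmse, "mae": mae, "si": si}


# Hypothetical sample values, not taken from the data files above
obs = [1.0, 2.0, 3.0, 4.0]
mod = [1.5, 2.5, 2.5, 4.5]
scores = basic_metrics(obs, mod)
print(scores)
```

A negative bias means the model underestimates on average, while rmse and mae measure the size of the error regardless of sign.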

Example 1: Generating a SkillTable

sk = cc.skill(metrics=["bias", "rmse", "si"])
sk
                     n      bias      rmse        si
model observation
m1    HKNA          120 -0.076142  0.190451  0.060252
      EPL            22 -0.190022  0.226535  0.049538
m2    HKNA          120 -0.525915  0.574975  0.080212
      EPL            22 -0.428523  0.457555  0.064425

This generates a SkillTable containing metrics for all observations and models.
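
The mean_skill method mentioned above is not demonstrated in this section. As a rough illustration of what averaging skill across observations means, the sketch below takes the rmse values from Example 1 and averages them per model, once with equal weight per observation and once weighted by the number of points n. Which weighting mean_skill applies by default is not stated here, so both variants are assumptions, and the helper name average_rmse is hypothetical.

```python
# rmse and n per (model, observation), copied from the table in Example 1
skill_rows = {
    ("m1", "HKNA"): {"n": 120, "rmse": 0.190451},
    ("m1", "EPL"): {"n": 22, "rmse": 0.226535},
    ("m2", "HKNA"): {"n": 120, "rmse": 0.574975},
    ("m2", "EPL"): {"n": 22, "rmse": 0.457555},
}


def average_rmse(rows, model, weight_by_points=False):
    """Average rmse over all observations for one model."""
    selected = [v for (m, _), v in rows.items() if m == model]
    weights = [v["n"] if weight_by_points else 1 for v in selected]
    total = sum(w * v["rmse"] for w, v in zip(weights, selected))
    return total / sum(weights)


for model in ("m1", "m2"):
    equal = average_rmse(skill_rows, model)
    weighted = average_rmse(skill_rows, model, weight_by_points=True)
    print(f"{model}: equal={equal:.6f} points-weighted={weighted:.6f}")
```

Weighting by n pulls the average towards HKNA, which has many more matched points than EPL.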

Example 2: Grouping skill scores

sk_by6hr = cc.skill(by=['model','freq:6h'], 
                    metrics=["bias", "mae"]
                    )
sk_by6hr
                            n      bias       mae
model time
m1    2017-10-28 00:00:00  35 -0.077149  0.105698
      2017-10-28 06:00:00  42 -0.155072  0.190004
      2017-10-28 12:00:00  42 -0.093342  0.174382
      2017-10-28 18:00:00  23 -0.007996  0.172298
m2    2017-10-28 00:00:00  35 -0.235906  0.235906
      2017-10-28 06:00:00  42 -0.518031  0.518031
      2017-10-28 12:00:00  42 -0.660042  0.660042
      2017-10-28 18:00:00  23 -0.643544  0.643544

Here, skill scores are grouped by model and by 6-hour bins (freq:6h). Other frequencies, such as monthly or yearly bins, work the same way, making it possible to analyze how performance changes over time.
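
Conceptually, a freq: entry in by assigns each matched time step to a time bin and computes the metrics per bin. The sketch below mimics this for 6-hour bins with the standard library only. It illustrates the idea and is not ModelSkill's implementation; the function name bin_start and the sample residuals are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bin_start(t, hours=6):
    """Floor a timestamp to the start of its N-hour bin (N must divide 24)."""
    return t.replace(hour=t.hour - t.hour % hours, minute=0, second=0, microsecond=0)


# Hypothetical hourly residuals (model minus observation) for half a day
start = datetime(2017, 10, 28)
times = [start + timedelta(hours=h) for h in range(12)]
residuals = [-0.1] * 6 + [-0.3, -0.1] * 3

# Collect the residuals that fall into each bin
groups = defaultdict(list)
for t, r in zip(times, residuals):
    groups[bin_start(t)].append(r)

# One row per bin: number of points, bias and mean absolute error
table = {
    b: {
        "n": len(rs),
        "bias": sum(rs) / len(rs),
        "mae": sum(abs(r) for r in rs) / len(rs),
    }
    for b, rs in sorted(groups.items())
}
for b, row in table.items():
    print(b, row)
```

When every residual in a bin has the same sign, mae equals the absolute value of bias, which is the pattern seen for m2 in the table above.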

Filtering a SkillTable

The SkillTable object supports several methods for filtering and refining data. The sel() method allows selection of specific models or observations, while the query() method enables flexible condition-based filtering.

Example 3: Selecting a specific model

sk_m1 = sk.sel(model='m1')
sk_m1
            model    n      bias      rmse        si
observation
HKNA           m1  120 -0.076142  0.190451  0.060252
EPL            m1   22 -0.190022  0.226535  0.049538

This filters the SkillTable to include results for the model named “m1”. See more about filtering on the Selecting data page.
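
The query() method is mentioned above but not demonstrated. Assuming it accepts the same expression strings as pandas.DataFrame.query, the idea can be tried on a plain DataFrame holding the values from Example 1. The pandas-only sketch below does this and should not be read as the SkillTable API itself.

```python
import pandas as pd

# Values copied from the SkillTable in Example 1
index = pd.MultiIndex.from_tuples(
    [("m1", "HKNA"), ("m1", "EPL"), ("m2", "HKNA"), ("m2", "EPL")],
    names=["model", "observation"],
)
df = pd.DataFrame(
    {
        "n": [120, 22, 120, 22],
        "bias": [-0.076142, -0.190022, -0.525915, -0.428523],
        "rmse": [0.190451, 0.226535, 0.574975, 0.457555],
    },
    index=index,
)

# Condition on a metric column
high_rmse = df.query("rmse > 0.3")

# In pandas, conditions can also refer to named index levels
hkna_large_bias = df.query("observation == 'HKNA' and bias < -0.1")

print(high_rmse)
print(hkna_large_bias)
```

With a SkillTable, the corresponding call would presumably look like sk.query("rmse > 0.3").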

Sorting a SkillTable

The SkillTable supports sorting by index or values, and swapping levels in a MultiIndex to reorganize the data.

Example 4: Sorting by index

sk_sorted = sk.sort_index()
sk_sorted
                     n      bias      rmse        si
model observation
m1    EPL            22 -0.190022  0.226535  0.049538
      HKNA          120 -0.076142  0.190451  0.060252
m2    EPL            22 -0.428523  0.457555  0.064425
      HKNA          120 -0.525915  0.574975  0.080212

This sorts the SkillTable by its index levels.

Example 5: Sorting by a specific index level

sk_sorted_obs = sk.sort_index(level="observation")
sk_sorted_obs
                     n      bias      rmse        si
model observation
m1    EPL            22 -0.190022  0.226535  0.049538
m2    EPL            22 -0.428523  0.457555  0.064425
m1    HKNA          120 -0.076142  0.190451  0.060252
m2    HKNA          120 -0.525915  0.574975  0.080212

Here, the table is sorted specifically by the observation level in the index.

Example 6: Sorting by values

sk_sorted_values = sk.sort_values("rmse")
sk_sorted_values
                     n      bias      rmse        si
model observation
m1    HKNA          120 -0.076142  0.190451  0.060252
      EPL            22 -0.190022  0.226535  0.049538
m2    EPL            22 -0.428523  0.457555  0.064425
      HKNA          120 -0.525915  0.574975  0.080212

This sorts the table by the rmse column in ascending order.

Example 7: Sorting by multiple values

sk_sorted_multi = sk.sort_values(["n", "rmse"], ascending=[True, False])
sk_sorted_multi
                     n      bias      rmse        si
model observation
m2    EPL            22 -0.428523  0.457555  0.064425
m1    EPL            22 -0.190022  0.226535  0.049538
m2    HKNA          120 -0.525915  0.574975  0.080212
m1    HKNA          120 -0.076142  0.190451  0.060252

Here, the table is sorted first by column n (ascending) and then by rmse (descending).

Example 8: Swapping index levels

sk_swapped = sk.swaplevel("model", "observation").sort_index()
sk_swapped
                     n      bias      rmse        si
observation model
EPL         m1       22 -0.190022  0.226535  0.049538
            m2       22 -0.428523  0.457555  0.064425
HKNA        m1      120 -0.076142  0.190451  0.060252
            m2      120 -0.525915  0.574975  0.080212

This swaps the model and observation levels in the MultiIndex and sorts the resulting table.

Rounding and Formatting

The round() method can be used to round all skill values to a specified number of decimal places, making the table more readable.

Example 9: Rounding skill values

sk.round(decimals=2)
                     n   bias  rmse    si
model observation
m1    HKNA          120  -0.08  0.19  0.06
      EPL            22  -0.19  0.23  0.05
m2    HKNA          120  -0.53  0.57  0.08
      EPL            22  -0.43  0.46  0.06

This rounds all values in the SkillTable to two decimal places.

Visualizing Skill Metrics

The SkillTable integrates table styling and plotting capabilities, allowing you to quickly visualize skill metrics.

Example 10: Styling the SkillTable

sk.style()
                     n   bias  rmse    si
model observation
m1    HKNA          120 -0.076 0.190 0.060
      EPL            22 -0.190 0.227 0.050
m2    HKNA          120 -0.526 0.575 0.080
      EPL            22 -0.429 0.458 0.064

The style() method applies color-based styling to the table, making it easier to identify high and low values.

Individual metrics can be accessed as columns and plotted using pandas-style plotting.

Example 11: Plotting a bar chart for RMSE

sk.rmse.plot.bar(figsize=(5,3))

This creates a bar chart showing RMSE values for each model-observation pair.

Example 12: Line plot

sk_by3hr = cc.skill(by=['model','freq:3h'])
sk_by3hr.rmse.plot.line(title="RMSE in 3 hour groups")

This generates a line plot of the RMSE values over the 3-hour time bins in the index.

Example 13: Bar chart

sk.rmse.plot.bar()

This draws the same RMSE bar chart as Example 11, here with the default figure size.

Example 14: Horizontal bar chart

sk.rmse.plot.barh()

This generates a horizontal bar chart for RMSE values.

Example 15: Colored grid

sk.rmse.plot.grid()

This produces a colored grid of the RMSE values, which can help identify patterns across models and observations.
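
The data behind such a grid can be thought of as a matrix with one row per observation and one column per model. A pandas-only sketch, using the rmse values from Example 1, shows that layout. How plot.grid() arranges and colors the cells internally is not covered here, and the orientation below is an assumption.

```python
import pandas as pd

# rmse values copied from Example 1
rmse = pd.Series(
    [0.190451, 0.226535, 0.574975, 0.457555],
    index=pd.MultiIndex.from_tuples(
        [("m1", "HKNA"), ("m1", "EPL"), ("m2", "HKNA"), ("m2", "EPL")],
        names=["model", "observation"],
    ),
    name="rmse",
)

# Pivot the model level into columns: rows are observations, columns are models
matrix = rmse.unstack("model")
print(matrix)

# The model with the lowest rmse for each observation
best_model = matrix.idxmin(axis=1)
print(best_model)
```

Reading across a row compares models at one station; reading down a column compares stations for one model.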

Exporting a SkillTable

For further analysis, the SkillTable can be converted to a standard pandas.DataFrame or a GeoDataFrame for spatial data.

Example 16: Converting to DataFrame

df = sk.to_dataframe()
df
                     n      bias      rmse        si
model observation
m1    HKNA          120 -0.076142  0.190451  0.060252
      EPL            22 -0.190022  0.226535  0.049538
m2    HKNA          120 -0.525915  0.574975  0.080212
      EPL            22 -0.428523  0.457555  0.064425

This converts the SkillTable into a pandas.DataFrame for additional processing.
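
Once the table is a DataFrame, the usual pandas tools apply. The sketch below starts from a DataFrame with the values shown above (standing in for the result of to_dataframe()), flattens the MultiIndex into ordinary columns, and writes the result as CSV text. The variable names flat, buffer and csv_text are illustrative.

```python
import io

import pandas as pd

# Stand-in for df = sk.to_dataframe(), with values copied from Example 16
df = pd.DataFrame(
    {
        "n": [120, 22, 120, 22],
        "bias": [-0.076142, -0.190022, -0.525915, -0.428523],
        "rmse": [0.190451, 0.226535, 0.574975, 0.457555],
        "si": [0.060252, 0.049538, 0.080212, 0.064425],
    },
    index=pd.MultiIndex.from_tuples(
        [("m1", "HKNA"), ("m1", "EPL"), ("m2", "HKNA"), ("m2", "EPL")],
        names=["model", "observation"],
    ),
)

# Turn the index levels into ordinary columns
flat = df.reset_index()

# Write to an in-memory buffer; pass a file path instead to save to disk
buffer = io.StringIO()
flat.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```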

Example 17: Converting to GeoDataFrame

gdf = sk.to_geodataframe()

This converts the table to a GeoDataFrame, enabling spatial analysis of model performance.

Summary of Key Methods

The SkillTable object provides tools to filter, format, and visualize skill metrics efficiently:

  • sel(): Select specific models or observations.
  • query(): Apply flexible condition-based filtering.
  • sort_index(): Sort the table by index levels.
  • sort_values(): Sort the table by specific metric values.
  • swaplevel(): Swap levels in the MultiIndex for reorganization.
  • round(): Round skill values to improve readability.
  • plot.line(): Generate a line plot for skill metrics.
  • plot.bar(): Visualize metrics as a bar chart.
  • plot.barh(): Create a horizontal bar chart.
  • plot.grid(): Display a colored grid of skill metrics.
  • style(): Apply color-based formatting for easy interpretation.
  • to_dataframe(): Export to pandas.DataFrame.
  • to_geodataframe(): Export to GeoDataFrame for spatial analysis.

By combining these methods, you can analyze model performance in detail, identify trends, and communicate results effectively.