ComparerCollection
ComparerCollection(comparers)
Collection of comparers.
The ComparerCollection is one of the main objects of the modelskill package. It is a collection of Comparer objects and is created either by the match() function, by passing a list of Comparers to the ComparerCollection constructor, or by reading a config file using the from_config() function.
NOTE: In case of multiple model results with different time coverage, only the overlapping time period (the intersection) will be used!
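The intersection rule can be illustrated with plain datetimes (a sketch, not a modelskill call; the dates are made up):

```python
from datetime import datetime

# Two model results with different time coverage: only the
# overlapping period (the intersection) is used for matching.
mr1_span = (datetime(2017, 10, 27), datetime(2017, 10, 30))
mr2_span = (datetime(2017, 10, 28), datetime(2017, 11, 2))

overlap_start = max(mr1_span[0], mr2_span[0])
overlap_end = min(mr1_span[1], mr2_span[1])
print(overlap_start, "to", overlap_end)
```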
Main functionality:
- selecting/filtering data
- skill assessment
  - skill()
  - mean_skill()
  - gridded_skill() (for track observations)
- plotting
- load/save/export data
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| comparers | list of Comparer | list of uniquely named comparers | required |
Examples
>>> import modelskill as ms
>>> mr = ms.DfsuModelResult("Oresund2D.dfsu", item=0)
>>> o1 = ms.PointObservation("klagshamn.dfs0", item=0, x=366844, y=6154291, name="Klagshamn")
>>> o2 = ms.PointObservation("drogden.dfs0", item=0, x=355568.0, y=6156863.0)
>>> cmp1 = ms.match(o1, mr) # Comparer
>>> cmp2 = ms.match(o2, mr) # Comparer
>>> ccA = ms.ComparerCollection([cmp1, cmp2])
>>> ccB = ms.match(obs=[o1, o2], mod=mr)
>>> sk = ccB.skill()
>>> ccB["Klagshamn"].plot.timeseries()
Attributes
| Name | Description |
|---|---|
| plot | Plot using the ComparerCollectionPlotter |
Methods
| Name | Description |
|---|---|
| skill | Aggregated skill assessment of model(s) |
| mean_skill | Weighted mean of skills |
| gridded_skill | Skill assessment of model(s) on a regular spatial grid. |
| score | Weighted mean score of model(s) over all observations |
| rename | Rename observation, model or auxiliary data variables |
| sel | Select data based on model, time and/or area. |
| query | Select data based on a query. |
| save | Save the ComparerCollection to a zip file. |
| load | Load a ComparerCollection from a zip file. |
skill
ComparerCollection.skill(by=None, metrics=None, observed=False)
Aggregated skill assessment of model(s)
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| by | str or List[str] | group by, by default ["model", "observation"] - by column name - by temporal bin of the DateTimeIndex via the freq-argument (using pandas pd.Grouper(freq)), e.g.: 'freq:M' = monthly, 'freq:D' = daily - by the dt accessor of the DateTimeIndex (e.g. 'dt.month') using the syntax 'dt:month'. The dt-argument differs from the freq-argument in that it gives month-of-year rather than month-of-data. - by attributes stored in the cc.data.attrs container, e.g.: 'attrs:obs_provider' = group by observation provider, or 'attrs:gtype' = group by geometry type (track or point) | None |
| metrics | list | list of modelskill.metrics (or str), by default modelskill.options.metrics.list | None |
| observed | bool | This only applies if any of the groupers are Categoricals. - True: only show observed values for categorical groupers. - False: show all values for categorical groupers. | False |
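The difference between the freq- and dt-groupers can be seen with plain pandas (synthetic data, not a modelskill call; "MS" is the pandas month-start alias used here for the monthly bins):

```python
import pandas as pd

# Synthetic 'bias' values: two Januaries one year apart
idx = pd.to_datetime(["2017-01-05", "2017-01-20", "2018-01-03"])
df = pd.DataFrame({"bias": [0.1, 0.3, -0.2]}, index=idx)

# freq-style grouping: by calendar month of the data,
# so Jan 2017 and Jan 2018 end up in different bins
by_freq = df.groupby(pd.Grouper(freq="MS"))["bias"].mean()

# dt-style grouping: by month-of-year, so both Januaries are pooled
by_dt = df.groupby(df.index.month)["bias"].mean()

print(by_dt.index.tolist())  # a single month-of-year bin: [1]
```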
Returns
| Name | Type | Description |
|---|---|---|
| SkillTable | skill assessment as a SkillTable object |
See also
sel a method for filtering/selecting data
Examples
>>> import modelskill as ms
>>> cc = ms.match([HKNA,EPL,c2], mr)
>>> cc.skill().round(2)
n bias rmse urmse mae cc si r2
observation
HKNA 385 -0.20 0.35 0.29 0.25 0.97 0.09 0.99
EPL 66 -0.08 0.22 0.20 0.18 0.97 0.07 0.99
c2 113 -0.00 0.35 0.35 0.29 0.97 0.12 0.99
>>> cc.sel(observation='c2', start='2017-10-28').skill().round(2)
n bias rmse urmse mae cc si r2
observation
c2 41 0.33 0.41 0.25 0.36 0.96 0.06 0.99
>>> cc.skill(by='freq:D').round(2)
n bias rmse urmse mae cc si r2
2017-10-27 239 -0.15 0.25 0.21 0.20 0.72 0.10 0.98
2017-10-28 162 -0.07 0.19 0.18 0.16 0.96 0.06 1.00
2017-10-29 163 -0.21 0.52 0.47 0.42 0.79 0.11 0.99
mean_skill
ComparerCollection.mean_skill(weights=None, metrics=None)
Weighted mean of skills
First, the skill is calculated per observation; the weighted mean of these skills is then computed.
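The two-step logic can be sketched in plain Python (hypothetical per-observation RMSE values; the weighting options mirror those of the weights parameter below, but this is an illustration, not the actual implementation):

```python
def weighted_mean_skill(skills, weights=None):
    """Weighted mean of per-observation skill values (a sketch).

    skills  : dict mapping observation name -> metric value
    weights : None or "equal" (all equal), a list with one weight per
              observation, or a dict name -> weight (others default to 1.0)
    """
    names = list(skills)
    if weights is None or weights == "equal":
        w = {n: 1.0 for n in names}
    elif isinstance(weights, dict):
        w = {n: weights.get(n, 1.0) for n in names}
    else:
        w = dict(zip(names, weights))
    return sum(skills[n] * w[n] for n in names) / sum(w.values())

# Hypothetical per-observation RMSE values
rmse = {"HKNA": 0.35, "EPL": 0.22, "c2": 0.35}
print(round(weighted_mean_skill(rmse), 4))               # equal weights
print(round(weighted_mean_skill(rmse, {"EPL": 2.0}), 4)) # EPL counts double
```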
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| weights | str or List(float) or Dict(str, float) | weighting of observations, by default None - None: use the observations' weight attribute (if assigned, else "equal") - "equal": give all observations equal weight - "points": give all points equal weight - list of weights, e.g. [0.3, 0.3, 0.4], one per observation - dictionary of observations with special weights; others will be set to 1.0 | None |
| metrics | list | list of modelskill.metrics, by default modelskill.options.metrics.list | None |
Returns
| Name | Type | Description |
|---|---|---|
| SkillTable | mean skill assessment as a SkillTable object |
See also
skill skill assessment per observation
Examples
>>> import modelskill as ms
>>> cc = ms.match([HKNA,EPL,c2], mod=HKZN_local)
>>> cc.mean_skill().round(2)
n bias rmse urmse mae cc si r2
HKZN_local 564 -0.09 0.31 0.28 0.24 0.97 0.09 0.99
>>> sk = cc.mean_skill(weights="equal")
>>> sk = cc.mean_skill(weights="points")
>>> sk = cc.mean_skill(weights={"EPL": 2.0}) # more weight on EPL, others=1.0
gridded_skill
ComparerCollection.gridded_skill(
bins=5,
binsize=None,
by=None,
metrics=None,
n_min=None,
**kwargs,
)
Skill assessment of model(s) on a regular spatial grid.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| bins | int | criteria to bin x and y by, passed as the bins argument to pd.cut(), by default 5; a tuple defines different bins for x and y, e.g. bins=(5, [2, 3, 5]) | 5 |
| binsize | float | bin size for the x and y dimensions; overrides bins. Creates bins with reference to round(mean(x)) and round(mean(y)) | None |
| by | (str, List[str]) | group by, by default [“model”, “observation”] - by column name - by temporal bin of the DateTimeIndex via the freq-argument (using pandas pd.Grouper(freq)), e.g.: ‘freq:M’ = monthly; ‘freq:D’ daily - by the dt accessor of the DateTimeIndex (e.g. ‘dt.month’) using the syntax ‘dt:month’. The dt-argument is different from the freq-argument in that it gives month-of-year rather than month-of-data. | None |
| metrics | list | list of modelskill.metrics, by default modelskill.options.metrics.list | None |
| n_min | int | minimum number of observations in a grid cell; cells with fewer observations get a score of np.nan | None |
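One way such binsize-based edges anchored at round(mean(x)) could be constructed is sketched below (illustrative only; the actual edge placement in modelskill may differ):

```python
import math

def make_bin_edges(values, binsize):
    """Possible construction of bin edges anchored at round(mean(values))."""
    ref = round(sum(values) / len(values))  # reference point
    lo, hi = min(values), max(values)
    # step outward from the reference so every value is covered
    start = ref - math.ceil((ref - lo) / binsize) * binsize
    n = math.ceil((hi - start) / binsize)
    return [start + i * binsize for i in range(n + 1)]

x = [-0.4, 1.5, 3.5, 5.5, 7.5]         # made-up track x-coordinates
print(make_bin_edges(x, binsize=2.0))  # edges anchored at round(mean(x)) = 4
```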
Returns
| Name | Type | Description |
|---|---|---|
| SkillGrid | skill assessment as a SkillGrid object |
See also
skill a method for aggregated skill assessment
Examples
>>> import modelskill as ms
>>> cc = ms.match([HKNA,EPL,c2], mr) # with satellite track measurements
>>> gs = cc.gridded_skill(metrics='bias')
>>> gs.data
<xarray.Dataset>
Dimensions: (x: 5, y: 5)
Coordinates:
observation 'alti'
* x (x) float64 -0.436 1.543 3.517 5.492 7.466
* y (y) float64 50.6 51.66 52.7 53.75 54.8
Data variables:
n (x, y) int32 3 0 0 14 37 17 50 36 72 ... 0 0 15 20 0 0 0 28 76
bias (x, y) float64 -0.02626 nan nan ... nan 0.06785 -0.1143
>>> gs = cc.gridded_skill(binsize=0.5)
>>> gs.data.coords
Coordinates:
observation 'alti'
* x (x) float64 -1.5 -0.5 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5
* y (y) float64 51.5 52.5 53.5 54.5 55.5 56.5
score
ComparerCollection.score(metric=mtr.rmse, weights=None)
Weighted mean score of model(s) over all observations
A wrapper around mean_skill() using a single metric.
NOTE: will take a simple mean over different quantities!
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| weights | str or List(float) or Dict(str, float) | weighting of observations, by default None - None: use the observations' weight attribute (if assigned, else "equal") - "equal": give all observations equal weight - "points": give all points equal weight - list of weights, e.g. [0.3, 0.3, 0.4], one per observation - dictionary of observations with special weights; others will be set to 1.0 | None |
| metric | str or callable | a single metric from modelskill.metrics, by default rmse | mtr.rmse |
Returns
| Name | Type | Description |
|---|---|---|
| Dict[str, float] | mean of skills score as a single number (for each model) |
See also
skill skill assessment per observation
mean_skill weighted mean of skills
mean_skill_points skill assessment pooling all observation points together
Examples
>>> import modelskill as ms
>>> cc = ms.match([o1, o2], mod)
>>> cc.score()
{'mod': 0.30681206}
>>> cc.score(weights=[0.1,0.1,0.8])
{'mod': 0.3383011631797379}
>>> cc.score(weights='points', metric="mape")
{'mod': 8.414442957854142}
rename
ComparerCollection.rename(mapping)
Rename observation, model or auxiliary data variables
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| mapping | dict | mapping of old names to new names | required |
Returns
| Name | Type | Description |
|---|---|---|
| ComparerCollection | The renamed ComparerCollection. |
Examples
>>> cc = ms.match([o1, o2], [mr1, mr2])
>>> cc.mod_names
['mr1', 'mr2']
>>> cc2 = cc.rename({'mr1': 'model1'})
>>> cc2.mod_names
['model1', 'mr2']
sel
ComparerCollection.sel(
model=None,
observation=None,
quantity=None,
start=None,
end=None,
time=None,
area=None,
**kwargs,
)
Select data based on model, time and/or area.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| model | str or int or list of str or list of int | Model name or index. If None, all models are selected. | None |
| observation | str or int or list of str or list of int | Observation name or index. If None, all observations are selected. | None |
| quantity | str or int or list of str or list of int | Quantity name or index. If None, all quantities are selected. | None |
| start | str or datetime | Start time. If None, all times are selected. | None |
| end | str or datetime | End time. If None, all times are selected. | None |
| time | str or datetime | Time. If None, all times are selected. | None |
| area | list of float | bbox: [x0, y0, x1, y1] or Polygon. If None, all areas are selected. | None |
| **kwargs | Any | Filtering by comparer attrs, similar to xarray.Dataset.filter_by_attrs, e.g. sel(gtype='track') or sel(obs_provider='CMEMS') if at least one comparer has an entry obs_provider with value CMEMS in its attrs container. Multiple kwargs are combined with logical AND. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| ComparerCollection | New ComparerCollection with selected data. |
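The attrs-based kwargs filtering semantics (logical AND over all given kwargs) can be sketched with plain dicts (the comparers here are stand-ins, not real Comparer objects):

```python
def filter_by_attrs(comparers, **kwargs):
    """Sketch of sel(**kwargs): keep comparers whose attrs match ALL kwargs."""
    return [c for c in comparers
            if all(c.get("attrs", {}).get(k) == v for k, v in kwargs.items())]

cmps = [
    {"name": "HKNA", "attrs": {"gtype": "point", "obs_provider": "CMEMS"}},
    {"name": "c2",   "attrs": {"gtype": "track", "obs_provider": "CMEMS"}},
    {"name": "EPL",  "attrs": {"gtype": "point"}},
]
# one kwarg: matches every comparer with that attr value
print([c["name"] for c in filter_by_attrs(cmps, gtype="point")])
# two kwargs: both must match (logical AND)
print([c["name"] for c in filter_by_attrs(cmps, gtype="point", obs_provider="CMEMS")])
```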
query
ComparerCollection.query(query)
Select data based on a query.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Query string. See pandas.DataFrame.query() for details. | required |
Returns
| Name | Type | Description |
|---|---|---|
| ComparerCollection | New ComparerCollection with selected data. |
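Since the query string follows pandas.DataFrame.query() syntax, a plain DataFrame shows the idea (column names here are illustrative; the actual columns depend on the comparer data):

```python
import pandas as pd

# Stand-in data; real comparer columns depend on your observations/models
df = pd.DataFrame({"Observation": [2.1, 3.5, 4.0],
                   "Model": [2.0, 3.6, 3.8]})
above = df.query("Observation > 3.0")
print(len(above))  # 2 rows satisfy the condition
```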
save
ComparerCollection.save(filename)
Save the ComparerCollection to a zip file.
Each comparer is stored as a netcdf file in the zip file.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| filename | str or Path | Filename of the zip file. | required |
Examples
>>> cc = ms.match(obs, mod)
>>> cc.save("my_comparer_collection.msk")
load
ComparerCollection.load(filename)
Load a ComparerCollection from a zip file.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| filename | str or Path | Filename of the zip file. | required |
Returns
| Name | Type | Description |
|---|---|---|
| ComparerCollection | The loaded ComparerCollection. |
Examples
>>> cc = ms.match(obs, mod)
>>> cc.save("my_comparer_collection.msk")
>>> cc2 = ms.ComparerCollection.load("my_comparer_collection.msk")