ComparerCollection
ComparerCollection(comparers)
Collection of comparers.
The ComparerCollection is one of the main objects of the modelskill package. It is a collection of Comparer objects and is created either by the match() function, by passing a list of Comparers to the ComparerCollection constructor, or by reading a config file using the from_config() function.
NOTE: In case of multiple model results with different time coverage, only the overlapping time period (the intersection) will be used!
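The intersection rule can be illustrated with plain datetimes (a sketch, not a modelskill call; the dates are made up):

```python
from datetime import datetime

# Two model results with different time coverage: only the
# overlapping period (the intersection) is used for matching.
mr1_span = (datetime(2017, 10, 27), datetime(2017, 10, 30))
mr2_span = (datetime(2017, 10, 28), datetime(2017, 11, 2))

overlap_start = max(mr1_span[0], mr2_span[0])
overlap_end = min(mr1_span[1], mr2_span[1])
print(overlap_start, "to", overlap_end)
```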
Main functionality:
- selecting/filtering data
- skill assessment
  - skill()
  - mean_skill()
  - gridded_skill() (for track observations)
- plotting
- load/save/export data
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| comparers | list of Comparer | list of uniquely named comparers | required |
Examples
>>> import modelskill as ms
>>> mr = ms.DfsuModelResult("Oresund2D.dfsu", item=0)
>>> o1 = ms.PointObservation("klagshamn.dfs0", item=0, x=366844, y=6154291, name="Klagshamn")
>>> o2 = ms.PointObservation("drogden.dfs0", item=0, x=355568.0, y=6156863.0)
>>> cmp1 = ms.match(o1, mr) # Comparer
>>> cmp2 = ms.match(o2, mr) # Comparer
>>> ccA = ms.ComparerCollection([cmp1, cmp2])
>>> ccB = ms.match(obs=[o1, o2], mod=mr)
>>> sk = ccB.skill()
>>> ccB["Klagshamn"].plot.timeseries()
Attributes
| Name | Description |
|---|---|
| plot | Plot using the ComparerCollectionPlotter |
Methods
| Name | Description |
|---|---|
| skill | Aggregated skill assessment of model(s) |
| mean_skill | Weighted mean of skills |
| gridded_skill | Skill assessment of model(s) on a regular spatial grid. |
| score | Weighted mean score of model(s) over all observations |
| rename | Rename observation, model or auxiliary data variables |
| sel | Select data based on model, time and/or area. |
| query | Select data based on a query. |
| save | Save the ComparerCollection to a zip file. |
| load | Load a ComparerCollection from a zip file. |
skill
ComparerCollection.skill(by=None, metrics=None, observed=False)
Aggregated skill assessment of model(s)
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| by | str or List[str] | group by, by default ["model", "observation"] - by column name - by temporal bin of the DateTimeIndex via the freq-argument (using pandas pd.Grouper(freq)), e.g.: 'freq:M' = monthly, 'freq:D' = daily - by the dt accessor of the DateTimeIndex (e.g. 'dt.month') using the syntax 'dt:month'. The dt-argument differs from the freq-argument in that it gives month-of-year rather than month-of-data. - by attributes stored in the cc.data.attrs container, e.g.: 'attrs:obs_provider' = group by observation provider, or 'attrs:gtype' = group by geometry type (track or point) | None |
| metrics | list | list of modelskill.metrics (or str), by default modelskill.options.metrics.list | None |
| observed | bool | This only applies if any of the groupers are Categoricals. - True: only show observed values for categorical groupers. - False: show all values for categorical groupers. | False |
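The difference between the freq- and dt-groupers can be seen with plain pandas (synthetic data, not a modelskill call; "MS" is the pandas month-start alias used here for the monthly bins):

```python
import pandas as pd

# Synthetic 'bias' values: two Januaries one year apart
idx = pd.to_datetime(["2017-01-05", "2017-01-20", "2018-01-03"])
df = pd.DataFrame({"bias": [0.1, 0.3, -0.2]}, index=idx)

# freq-style grouping: by calendar month of the data,
# so Jan 2017 and Jan 2018 end up in different bins
by_freq = df.groupby(pd.Grouper(freq="MS"))["bias"].mean()

# dt-style grouping: by month-of-year, so both Januaries are pooled
by_dt = df.groupby(df.index.month)["bias"].mean()

print(by_dt.index.tolist())  # a single month-of-year bin: [1]
```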
Returns
| Name | Type | Description |
|---|---|---|
| SkillTable | skill assessment as a SkillTable object |
See also
sel a method for filtering/selecting data
Examples
>>> import modelskill as ms
>>> cc = ms.match([HKNA,EPL,c2], mr)
>>> cc.skill().round(2)
n bias rmse urmse mae cc si r2
observation
HKNA 385 -0.20 0.35 0.29 0.25 0.97 0.09 0.99
EPL 66 -0.08 0.22 0.20 0.18 0.97 0.07 0.99
c2 113 -0.00 0.35 0.35 0.29 0.97 0.12 0.99
>>> cc.sel(observation='c2', start='2017-10-28').skill().round(2)
n bias rmse urmse mae cc si r2
observation
c2 41 0.33 0.41 0.25 0.36 0.96 0.06 0.99
>>> cc.skill(by='freq:D').round(2)
n bias rmse urmse mae cc si r2
2017-10-27 239 -0.15 0.25 0.21 0.20 0.72 0.10 0.98
2017-10-28 162 -0.07 0.19 0.18 0.16 0.96 0.06 1.00
2017-10-29 163 -0.21 0.52 0.47 0.42 0.79 0.11 0.99
mean_skill
ComparerCollection.mean_skill(weights=None, metrics=None)
Weighted mean of skills
First, the skill is calculated per observation; the weighted mean of these skills is then computed.
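The two-step logic can be sketched in plain Python (hypothetical per-observation RMSE values; the weighting options mirror those of the weights parameter below, but this is an illustration, not the actual implementation):

```python
def weighted_mean_skill(skills, weights=None):
    """Weighted mean of per-observation skill values (a sketch).

    skills  : dict mapping observation name -> metric value
    weights : None or "equal" (all equal), a list with one weight per
              observation, or a dict name -> weight (others default to 1.0)
    """
    names = list(skills)
    if weights is None or weights == "equal":
        w = {n: 1.0 for n in names}
    elif isinstance(weights, dict):
        w = {n: weights.get(n, 1.0) for n in names}
    else:
        w = dict(zip(names, weights))
    return sum(skills[n] * w[n] for n in names) / sum(w.values())

# Hypothetical per-observation RMSE values
rmse = {"HKNA": 0.35, "EPL": 0.22, "c2": 0.35}
print(round(weighted_mean_skill(rmse), 4))               # equal weights
print(round(weighted_mean_skill(rmse, {"EPL": 2.0}), 4)) # EPL counts double
```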
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| weights | str or List(float) or Dict(str, float) | weighting of observations, by default None - None: use the observations' weight attribute (if assigned, else "equal") - "equal": give all observations equal weight - "points": give all points equal weight - list of weights, e.g. [0.3, 0.3, 0.4], one per observation - dictionary of observations with special weights; others will be set to 1.0 | None |
| metrics | list | list of modelskill.metrics, by default modelskill.options.metrics.list | None |
Returns
| Name | Type | Description |
|---|---|---|
| SkillTable | mean skill assessment as a SkillTable object |
See also
skill skill assessment per observation
Examples
>>> import modelskill as ms
>>> cc = ms.match([HKNA,EPL,c2], mod=HKZN_local)
>>> cc.mean_skill().round(2)
n bias rmse urmse mae cc si r2
HKZN_local 564 -0.09 0.31 0.28 0.24 0.97 0.09 0.99
>>> sk = cc.mean_skill(weights="equal")
>>> sk = cc.mean_skill(weights="points")
>>> sk = cc.mean_skill(weights={"EPL": 2.0}) # more weight on EPL, others=1.0
gridded_skill
ComparerCollection.gridded_skill(
bins=5,
binsize=None,
by=None,
metrics=None,
n_min=None,
**kwargs,
)
Skill assessment of model(s) on a regular spatial grid.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| bins | int | criteria to bin x and y by, passed as the bins argument to pd.cut(), by default 5; a tuple defines different bins for x and y, e.g. bins=(5, [2, 3, 5]) | 5 |
| binsize | float | bin size for the x and y dimensions; overrides bins. Creates bins with reference to round(mean(x)) and round(mean(y)) | None |
| by | (str, List[str]) | group by, by default [“model”, “observation”] - by column name - by temporal bin of the DateTimeIndex via the freq-argument (using pandas pd.Grouper(freq)), e.g.: ‘freq:M’ = monthly; ‘freq:D’ daily - by the dt accessor of the DateTimeIndex (e.g. ‘dt.month’) using the syntax ‘dt:month’. The dt-argument is different from the freq-argument in that it gives month-of-year rather than month-of-data. | None |
| metrics | list | list of modelskill.metrics, by default modelskill.options.metrics.list | None |
| n_min | int | minimum number of observations in a grid cell; cells with fewer observations get a score of np.nan | None |
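One way such binsize-based edges anchored at round(mean(x)) could be constructed is sketched below (illustrative only; the actual edge placement in modelskill may differ):

```python
import math

def make_bin_edges(values, binsize):
    """Possible construction of bin edges anchored at round(mean(values))."""
    ref = round(sum(values) / len(values))  # reference point
    lo, hi = min(values), max(values)
    # step outward from the reference so every value is covered
    start = ref - math.ceil((ref - lo) / binsize) * binsize
    n = math.ceil((hi - start) / binsize)
    return [start + i * binsize for i in range(n + 1)]

x = [-0.4, 1.5, 3.5, 5.5, 7.5]         # made-up track x-coordinates
print(make_bin_edges(x, binsize=2.0))  # edges anchored at round(mean(x)) = 4
```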
Returns
| Name | Type | Description |
|---|---|---|
| SkillGrid | skill assessment as a SkillGrid object |
See also
skill a method for aggregated skill assessment
Examples
>>> import modelskill as ms
>>> cc = ms.match([HKNA,EPL,c2], mr) # with satellite track measurements
>>> gs = cc.gridded_skill(metrics='bias')
>>> gs.data
<xarray.Dataset>
Dimensions: (x: 5, y: 5)
Coordinates:
observation 'alti'
* x (x) float64 -0.436 1.543 3.517 5.492 7.466
* y (y) float64 50.6 51.66 52.7 53.75 54.8
Data variables:
n (x, y) int32 3 0 0 14 37 17 50 36 72 ... 0 0 15 20 0 0 0 28 76
bias (x, y) float64 -0.02626 nan nan ... nan 0.06785 -0.1143
>>> gs = cc.gridded_skill(binsize=0.5)
>>> gs.data.coords
Coordinates:
observation 'alti'
* x (x) float64 -1.5 -0.5 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5
* y (y) float64 51.5 52.5 53.5 54.5 55.5 56.5
score
ComparerCollection.score(metric=mtr.rmse, weights=None)
Weighted mean score of model(s) over all observations
A wrapper around mean_skill() using a single metric.
NOTE: will take a simple mean over different quantities!
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| weights | str or List(float) or Dict(str, float) | weighting of observations, by default None - None: use the observations' weight attribute (if assigned, else "equal") - "equal": give all observations equal weight - "points": give all points equal weight - list of weights, e.g. [0.3, 0.3, 0.4], one per observation - dictionary of observations with special weights; others will be set to 1.0 | None |
| metric | str or callable | a single metric from modelskill.metrics, by default rmse | mtr.rmse |
Returns
| Name | Type | Description |
|---|---|---|
| Dict[str, float] | mean of skills score as a single number (for each model) |
See also
skill skill assessment per observation
mean_skill weighted mean of skills
mean_skill_points skill assessment pooling all observation points together
Examples
>>> import modelskill as ms
>>> cc = ms.match([o1, o2], mod)
>>> cc.score()
{'mod': 0.30681206}
>>> cc.score(weights=[0.1,0.1,0.8])
{'mod': 0.3383011631797379}
>>> cc.score(weights='points', metric="mape")
{'mod': 8.414442957854142}
rename
ComparerCollection.rename(mapping)
Rename observation, model or auxiliary data variables
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| mapping | dict | mapping of old names to new names | required |
Returns
| Name | Type | Description |
|---|---|---|
| ComparerCollection | The renamed ComparerCollection. |
Examples
>>> cc = ms.match([o1, o2], [mr1, mr2])
>>> cc.mod_names
['mr1', 'mr2']
>>> cc2 = cc.rename({'mr1': 'model1'})
>>> cc2.mod_names
['model1', 'mr2']
sel
ComparerCollection.sel(
model=None,
observation=None,
quantity=None,
start=None,
end=None,
time=None,
area=None,
**kwargs,
)
Select data based on model, time and/or area.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| model | str or int or list of str or list of int | Model name or index. If None, all models are selected. | None |
| observation | str or int or list of str or list of int | Observation name or index. If None, all observations are selected. | None |
| quantity | str or int or list of str or list of int | Quantity name or index. If None, all quantities are selected. | None |
| start | str or datetime | Start time. If None, all times are selected. | None |
| end | str or datetime | End time. If None, all times are selected. | None |
| time | str or datetime | Time. If None, all times are selected. | None |
| area | list of float | bbox: [x0, y0, x1, y1] or Polygon. If None, all areas are selected. | None |
| **kwargs | Any | Filtering by comparer attrs, similar to xarray.Dataset.filter_by_attrs, e.g. sel(gtype='track') or sel(obs_provider='CMEMS') if at least one comparer has an entry obs_provider with value CMEMS in its attrs container. Multiple kwargs are combined with logical AND. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| ComparerCollection | New ComparerCollection with selected data. |
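The attrs-based kwargs filtering semantics (logical AND over all given kwargs) can be sketched with plain dicts (the comparers here are stand-ins, not real Comparer objects):

```python
def filter_by_attrs(comparers, **kwargs):
    """Sketch of sel(**kwargs): keep comparers whose attrs match ALL kwargs."""
    return [c for c in comparers
            if all(c.get("attrs", {}).get(k) == v for k, v in kwargs.items())]

cmps = [
    {"name": "HKNA", "attrs": {"gtype": "point", "obs_provider": "CMEMS"}},
    {"name": "c2",   "attrs": {"gtype": "track", "obs_provider": "CMEMS"}},
    {"name": "EPL",  "attrs": {"gtype": "point"}},
]
# one kwarg: matches every comparer with that attr value
print([c["name"] for c in filter_by_attrs(cmps, gtype="point")])
# two kwargs: both must match (logical AND)
print([c["name"] for c in filter_by_attrs(cmps, gtype="point", obs_provider="CMEMS")])
```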
query
ComparerCollection.query(query)
Select data based on a query.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Query string. See pandas.DataFrame.query() for details. | required |
Returns
| Name | Type | Description |
|---|---|---|
| ComparerCollection | New ComparerCollection with selected data. |
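Since the query string follows pandas.DataFrame.query() syntax, a plain DataFrame shows the idea (column names here are illustrative; the actual columns depend on the comparer data):

```python
import pandas as pd

# Stand-in data; real comparer columns depend on your observations/models
df = pd.DataFrame({"Observation": [2.1, 3.5, 4.0],
                   "Model": [2.0, 3.6, 3.8]})
above = df.query("Observation > 3.0")
print(len(above))  # 2 rows satisfy the condition
```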
save
ComparerCollection.save(filename)
Save the ComparerCollection to a zip file.
Each comparer is stored as a netcdf file in the zip file.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| filename | str or Path | Filename of the zip file. | required |
Examples
>>> cc = ms.match(obs, mod)
>>> cc.save("my_comparer_collection.msk")
load
ComparerCollection.load(filename)
Load a ComparerCollection from a zip file.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| filename | str or Path | Filename of the zip file. | required |
Returns
| Name | Type | Description |
|---|---|---|
| ComparerCollection | The loaded ComparerCollection. |
Examples
>>> cc = ms.match(obs, mod)
>>> cc.save("my_comparer_collection.msk")
>>> cc2 = ms.ComparerCollection.load("my_comparer_collection.msk")