The ComparerCollection is one of the main objects of the modelskill package. It is a collection of Comparer objects and is created either by the match() function, by passing a list of Comparers to the ComparerCollection constructor, or by reading a config file using the from_config() function.
NOTE: If multiple model results have different time coverage, only the overlapping time period (the intersection) will be used!
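The overlap rule in the note can be pictured with plain pandas. This is only a sketch of the idea, assuming two hypothetical daily model time axes; it does not reproduce how modelskill performs the matching internally.

```python
import pandas as pd

# Hypothetical time coverage of two model results (daily time steps)
t_mod1 = pd.date_range("2017-10-20", "2017-10-29", freq="D")
t_mod2 = pd.date_range("2017-10-27", "2017-11-05", freq="D")

# Overlapping period: latest start to earliest end
start = max(t_mod1[0], t_mod2[0])
end = min(t_mod1[-1], t_mod2[-1])

# Time steps of the first model that fall inside the overlap
t_common = t_mod1[(t_mod1 >= start) & (t_mod1 <= end)]

print(start.date(), end.date(), len(t_common))
```

Anything outside the period from `start` to `end` would not take part in the comparison.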
Main functionality:
selecting/filtering data
__getitem__() - get a single Comparer, e.g., cc[0] or cc['obs1']
group by, by default ["model", "observation"]
- by column name
- by temporal bin of the DateTimeIndex via the freq-argument (using pandas pd.Grouper(freq)), e.g.: 'freq:M' = monthly; 'freq:D' = daily
- by the dt accessor of the DateTimeIndex (e.g. 'dt.month') using the syntax 'dt:month'. The dt-argument is different from the freq-argument in that it gives month-of-year rather than month-of-data.
- by attributes, stored in the cc.data.attrs container, e.g.: 'attrs:obs_provider' = group by observation provider or 'attrs:gtype' = group by geometry type (track or point)
None
metrics : list, default None
    list of modelskill.metrics (or str), by default modelskill.options.metrics.list
observed : bool, default False
    This only applies if any of the groupers are Categoricals.
    - True: only show observed values for categorical groupers.
    - False: show all values for categorical groupers.
Returns
SkillTable
    skill assessment as a SkillTable object
See also
sel a method for filtering/selecting data
Examples
>>> import modelskill as ms
>>> cc = ms.match([HKNA,EPL,c2], mr)
>>> cc.skill().round(2)
               n  bias  rmse  urmse   mae    cc    si    r2
observation
HKNA         385 -0.20  0.35   0.29  0.25  0.97  0.09  0.99
EPL           66 -0.08  0.22   0.20  0.18  0.97  0.07  0.99
c2           113 -0.00  0.35   0.35  0.29  0.97  0.12  0.99

>>> cc.sel(observation='c2', start='2017-10-28').skill().round(2)
              n  bias  rmse  urmse   mae    cc    si    r2
observation
c2           41  0.33  0.41   0.25  0.36  0.96  0.06  0.99

>>> cc.skill(by='freq:D').round(2)
              n  bias  rmse  urmse   mae    cc    si    r2
2017-10-27  239 -0.15  0.25   0.21  0.20  0.72  0.10  0.98
2017-10-28  162 -0.07  0.19   0.18  0.16  0.96  0.06  1.00
2017-10-29  163 -0.21  0.52   0.47  0.42  0.79  0.11  0.99
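The difference between the freq-argument and the dt-argument (month-of-data versus month-of-year) can be seen with plain pandas. The sketch below assumes a hypothetical two-year daily series; it uses the month-start alias "MS" because the plain "M" alias is deprecated in recent pandas versions, and it only illustrates the two grouping ideas rather than the library's own code path.

```python
import numpy as np
import pandas as pd

# Hypothetical daily residuals (model minus observation) spanning two years
idx = pd.date_range("2017-01-01", "2018-12-31", freq="D")
res = pd.Series(np.ones(len(idx)), index=idx)

# freq-style grouping: one bin per calendar month of the data
by_freq = res.groupby(pd.Grouper(freq="MS")).count()

# dt-style grouping: one bin per month of the year (January 2017 and
# January 2018 end up in the same bin)
by_dt = res.groupby(res.index.month).count()

print(len(by_freq))  # 24
print(len(by_dt))    # 12
```

With two years of data the freq-style grouping yields 24 bins, while the dt-style grouping yields 12 bins that each pool the same month from both years.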
First, the skill is calculated per observation; the weighted mean of the skills is then found.
Parameters
weights : str or List(float) or Dict(str, float), default None
    weighting of observations, by default None
    - None: use the observations' weight attribute (if assigned, else "equal")
    - "equal": give all observations equal weight
    - "points": give all points equal weight
    - list of weights, e.g. [0.3, 0.3, 0.4], one per observation
    - dictionary of observations with special weights; others will be set to 1.0
metrics : list, default None
    list of modelskill.metrics, by default modelskill.options.metrics.list
Returns
SkillTable
    mean skill assessment as a SkillTable object
See also
skill skill assessment per observation
Examples
>>> import modelskill as ms
>>> cc = ms.match([HKNA,EPL,c2], mod=HKZN_local)
>>> cc.mean_skill().round(2)
              n  bias  rmse  urmse   mae    cc    si    r2
HKZN_local  564 -0.09  0.31   0.28  0.24  0.97  0.09  0.99
>>> sk = cc.mean_skill(weights="equal")
>>> sk = cc.mean_skill(weights="points")
>>> sk = cc.mean_skill(weights={"EPL": 2.0}) # more weight on EPL, others=1.0
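To make the weighting options concrete, here is a minimal sketch of a weighted mean over per-observation rmse values. The numbers are rounded illustrative values for three observations, and `resolve_weights` is a hypothetical helper written for this example, not a modelskill function; the None case is treated as "equal" on the assumption that no weight attribute was assigned.

```python
import numpy as np

# Illustrative per-observation values (rounded)
names = ["HKNA", "EPL", "c2"]
n = np.array([385, 66, 113])         # number of points per observation
rmse = np.array([0.35, 0.22, 0.35])  # rmse per observation


def resolve_weights(weights, names, n):
    """Hypothetical helper: turn a weights argument into a float array."""
    if weights is None or (isinstance(weights, str) and weights == "equal"):
        return np.ones(len(names))
    if isinstance(weights, str) and weights == "points":
        return n.astype(float)
    if isinstance(weights, dict):
        # observations not listed in the dictionary get weight 1.0
        return np.array([weights.get(name, 1.0) for name in names])
    return np.asarray(weights, dtype=float)


mean_equal = np.average(rmse, weights=resolve_weights("equal", names, n))
mean_points = np.average(rmse, weights=resolve_weights("points", names, n))
mean_dict = np.average(rmse, weights=resolve_weights({"EPL": 2.0}, names, n))

print(round(mean_equal, 3), round(mean_points, 3), round(mean_dict, 3))
```

With "points", observations with many data points dominate the mean; with a dictionary, only the named observations depart from a weight of 1.0.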
Skill assessment of model(s) on a regular spatial grid.
Parameters
bins : int, default 5
    criteria to bin x and y by, passed as the bins argument to pd.cut(), default 5; to define different bins for x and y, pass a tuple, e.g.: bins = 5, bins = (5, [2, 3, 5])
binsize : float, default None
    bin size for x and y dimension, overrides bins; creates bins with reference to round(mean(x)), round(mean(y))
by : (str, List[str]), default None
    group by, by default ["model", "observation"]
    - by column name
    - by temporal bin of the DateTimeIndex via the freq-argument (using pandas pd.Grouper(freq)), e.g.: 'freq:M' = monthly; 'freq:D' = daily
    - by the dt accessor of the DateTimeIndex (e.g. 'dt.month') using the syntax 'dt:month'. The dt-argument is different from the freq-argument in that it gives month-of-year rather than month-of-data.
metrics : list, default None
    list of modelskill.metrics, by default modelskill.options.metrics.list
n_min : int, default None
    minimum number of observations in a grid cell; cells with fewer observations get a score of np.nan
Returns
SkillGrid
    skill assessment as a SkillGrid object
See also
skill a method for aggregated skill assessment
Examples
>>> import modelskill as ms
>>> cc = ms.match([HKNA,EPL,c2], mr) # with satellite track measurements
>>> gs = cc.gridded_skill(metrics='bias')
>>> gs.data
<xarray.Dataset>
Dimensions:      (x: 5, y: 5)
Coordinates:
    observation   'alti'
  * x            (x) float64 -0.436 1.543 3.517 5.492 7.466
  * y            (y) float64 50.6 51.66 52.7 53.75 54.8
Data variables:
    n            (x, y) int32 3 0 0 14 37 17 50 36 72 ... 0 0 15 20 0 0 0 28 76
    bias         (x, y) float64 -0.02626 nan nan ... nan 0.06785 -0.1143

>>> gs = cc.gridded_skill(binsize=0.5)
>>> gs.data.coords
Coordinates:
    observation   'alti'
  * x            (x) float64 -1.5 -0.5 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5
  * y            (y) float64 51.5 52.5 53.5 54.5 55.5 56.5
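The binning behind a gridded assessment can be sketched with pd.cut() and a groupby. The example below assumes synthetic point data with hypothetical column names (x, y, obs, mod) and computes the bias per cell, masking cells with fewer than n_min points; it illustrates the general technique only and is not the library's implementation.

```python
import numpy as np
import pandas as pd

# Synthetic matched points (hypothetical column names)
rng = np.random.default_rng(0)
n_points = 200
df = pd.DataFrame({
    "x": rng.uniform(0.0, 8.0, n_points),
    "y": rng.uniform(50.0, 55.0, n_points),
    "obs": rng.normal(2.0, 0.5, n_points),
})
df["mod"] = df["obs"] + rng.normal(0.1, 0.2, n_points)
df["res"] = df["mod"] - df["obs"]  # residual, model minus observation

# Bin x and y into 5 equal-width intervals each (bins=5)
df["xbin"] = pd.cut(df["x"], bins=5)
df["ybin"] = pd.cut(df["y"], bins=5)

# observed=False keeps empty cells so the result is a full 5 x 5 grid
cells = df.groupby(["xbin", "ybin"], observed=False)["res"]
n = cells.count()
bias = cells.mean()

# Cells with fewer than n_min points get NaN
n_min = 5
bias = bias.where(n >= n_min)

print(bias.unstack("ybin").round(2))
```

Passing a list of edges instead of an integer to pd.cut() gives unequal bins, which corresponds to the tuple form of the bins argument.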
Weighted mean score of model(s) over all observations
Wrapping mean_skill() with a single metric.
NOTE: will take simple mean over different quantities!
Parameters
weights : str or List(float) or Dict(str, float), default None
    weighting of observations, by default None
    - None: use the observations' weight attribute (if assigned, else "equal")
    - "equal": give all observations equal weight
    - "points": give all points equal weight
    - list of weights, e.g. [0.3, 0.3, 0.4], one per observation
    - dictionary of observations with special weights; others will be set to 1.0
metric : list, default mtr.rmse
    a single metric from modelskill.metrics, by default rmse
Returns
Dict[str, float]
    mean of skill scores as a single number (for each model)
See also
skill skill assessment per observation
mean_skill weighted mean of skills assessment
mean_skill_points skill assessment pooling all observation points together
Examples
>>> import modelskill as ms
>>> cc = ms.match([o1, o2], mod)
>>> cc.score()
{'mod': 0.30681206}
>>> cc.score(weights=[0.1,0.1,0.8])
{'mod': 0.3383011631797379}
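The "single metric, then weighted mean" idea can be sketched in a few lines of NumPy. The data, the `rmse` function and the `score_sketch` helper below are all hypothetical stand-ins written for this illustration; they assume one model named "mod" and are not the library's own code.

```python
import numpy as np


def rmse(obs, mod):
    """Root mean square error between observed and modelled values."""
    diff = np.asarray(mod, dtype=float) - np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def score_sketch(matched, metric=rmse, weights=None):
    """Hypothetical helper: metric per observation, then weighted mean."""
    per_obs = np.array([metric(obs, mod) for obs, mod in matched.values()])
    if weights is None:
        weights = np.ones(len(per_obs))  # equal weights
    return {"mod": float(np.average(per_obs, weights=weights))}


# Hypothetical matched data: observation name -> (observed, modelled)
matched = {
    "o1": ([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]),
    "o2": ([0.5, 1.5], [1.5, 2.5]),
}

print(score_sketch(matched))
print(score_sketch(matched, weights=[0.2, 0.8]))
```

The result is one number per model, which makes it convenient as an objective for calibration or for quick comparisons between runs.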
Model name or index. If None, all models are selected.
None
observation : str or int or list of str or list of int, default None
    Observation name or index. If None, all observations are selected.
quantity : str or int or list of str or list of int, default None
    Quantity name or index. If None, all quantities are selected.
start : str or datetime, default None
    Start time. If None, all times are selected.
end : str or datetime, default None
    End time. If None, all times are selected.
time : str or datetime, default None
    Time. If None, all times are selected.
area : list of float, default None
    bbox: [x0, y0, x1, y1] or Polygon. If None, all areas are selected.
**kwargs : Any, default {}
    Filtering by comparer attrs similar to xarray.Dataset.filter_by_attrs, e.g. sel(gtype='track') or sel(obs_provider='CMEMS') if at least one comparer has an entry obs_provider with value CMEMS in its attrs container. Multiple kwargs are combined with logical AND.
Returns
ComparerCollection
    New ComparerCollection with selected data.
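The observation, start/end and area selections described above correspond to familiar pandas filtering steps. The following sketch assumes a hypothetical table of matched points with columns x, y and observation; it mimics the semantics of the arguments and is not the library's implementation.

```python
import pandas as pd

# Hypothetical matched points: time index, position and observation name
df = pd.DataFrame(
    {
        "x": [1.0, 3.5, 6.0, 2.0],
        "y": [51.0, 52.5, 54.0, 53.0],
        "observation": ["c2", "c2", "EPL", "c2"],
    },
    index=pd.to_datetime(
        ["2017-10-27", "2017-10-28", "2017-10-28", "2017-10-29"]
    ),
)

start, end = "2017-10-28", "2017-10-29"
x0, y0, x1, y1 = 0.0, 50.0, 4.0, 54.0  # bbox: [x0, y0, x1, y1]

# Select by observation name, then by time range (both ends inclusive)
subset = df[df["observation"] == "c2"].loc[start:end]

# Keep only points inside the bounding box
in_box = subset["x"].between(x0, x1) & subset["y"].between(y0, y1)
subset = subset[in_box]

print(subset)
```

Each criterion narrows the selection further, so combining several arguments acts as a logical AND.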
query
ComparerCollection.query(query)
Select data based on a query.
Parameters
query : str, required
    Query string. See pandas.DataFrame.query() for details.
Returns
ComparerCollection
    New ComparerCollection with selected data.
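Since the query string follows pandas.DataFrame.query(), a small pandas example shows the syntax. The column names used here ("Observation" and "mod") are assumptions for illustration; the columns actually available depend on the data in the comparers.

```python
import pandas as pd

# Hypothetical matched values for one observation and one model
df = pd.DataFrame({
    "Observation": [0.5, 2.1, 3.4],
    "mod": [0.7, 2.0, 3.9],
})

# Keep rows where the observed value exceeds 2.0 and the model is below 3.5
subset = df.query("Observation > 2.0 and mod < 3.5")

print(subset)
```

Only the second row satisfies both conditions, so the result contains a single row.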
filter_by_attrs
ComparerCollection.filter_by_attrs(**kwargs)
Filter by comparer attrs similar to xarray.Dataset.filter_by_attrs
Parameters
**kwargs : Any, default {}
    Filtering by comparer attrs similar to xarray.Dataset.filter_by_attrs, e.g. sel(gtype='track') or sel(obs_provider='CMEMS') if at least one comparer has an entry obs_provider with value CMEMS in its attrs container. Multiple kwargs are combined with logical AND.
Returns
ComparerCollection
    New ComparerCollection with selected data.
Examples
>>> cc = ms.match([HKNA, EPL, alti], mr)
>>> cc.filter_by_attrs(gtype='track')
<ComparerCollection>
Comparer: alti
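The AND-combination of keyword filters can be sketched with plain dictionaries. In the example below each comparer is represented only by a hypothetical attrs dictionary, and `filter_by_attrs_sketch` is an illustrative helper, not the library method.

```python
def filter_by_attrs_sketch(comparers, **kwargs):
    """Keep entries whose attrs match every keyword (logical AND)."""
    return {
        name: attrs
        for name, attrs in comparers.items()
        if all(attrs.get(key) == value for key, value in kwargs.items())
    }


# Hypothetical attrs containers, keyed by comparer name
comparers = {
    "HKNA": {"gtype": "point", "obs_provider": "CMEMS"},
    "EPL": {"gtype": "point"},
    "alti": {"gtype": "track", "obs_provider": "CMEMS"},
}

tracks = filter_by_attrs_sketch(comparers, gtype="track")
cmems_points = filter_by_attrs_sketch(
    comparers, gtype="point", obs_provider="CMEMS"
)

print(list(tracks))
print(list(cmems_points))
```

A comparer that lacks one of the requested attributes, like EPL in the second call, is simply left out.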
save
ComparerCollection.save(filename)
Save the ComparerCollection to a zip file.
Each comparer is stored as a netcdf file in the zip file.
Parameters
filename : str or Path, required
    Filename of the zip file.
Examples
>>> cc = ms.match(obs, mod)
>>> cc.save("my_comparer_collection.msk")
load
ComparerCollection.load(filename)
Load a ComparerCollection from a zip file.
Parameters
filename : str or Path, required
    Filename of the zip file.
Returns
ComparerCollection
    The loaded ComparerCollection.
Examples
>>> cc = ms.match(obs, mod)
>>> cc.save("my_comparer_collection.msk")
>>> cc2 = ms.ComparerCollection.load("my_comparer_collection.msk")
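Because the saved file is an ordinary zip archive, its members can be listed with the standard library. The sketch below builds a stand-in archive with placeholder bytes instead of real netCDF content; the member names (one `.nc` file per comparer) are an assumption for illustration and may differ from what save() actually writes.

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "my_comparer_collection.msk"

    # Stand-in for save(): one member file per comparer (hypothetical names)
    with zipfile.ZipFile(archive, "w") as zf:
        for name in ["HKNA", "EPL"]:
            zf.writestr(f"{name}.nc", b"placeholder for netCDF bytes")

    # Inspect the archive: list the stored member files
    with zipfile.ZipFile(archive) as zf:
        members = zf.namelist()

print(members)
```

The same `zipfile.ZipFile(...).namelist()` call can be pointed at a real saved collection to see which comparers it contains.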