Matching

A Comparer/ComparerCollection can be created in one of the following ways:

match() - match observations and model results
from_matched() - create a Comparer from matched data
from_config() - create a ComparerCollection from a config file

modelskill.match

match(obs, mod, *, obs_item=None, mod_item=None, gtype=None, max_model_gap=None, spatial_method=None)

Match observation and model result data in space and time

NOTE: If multiple model results have different time coverage, only the overlapping time period (the intersection) is used!

NOTE: In case of multiple observations, multiple models can only be matched if they are all of SpatialField type, e.g. DfsuModelResult or GridModelResult.
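
The first note can be illustrated with a minimal, standard-library sketch of how an overlapping period is found. The helper `overlapping_period` and the two coverage ranges are hypothetical and only illustrate the intersection rule; this is not the library's implementation.

```python
from datetime import datetime
from typing import Optional, Sequence, Tuple

TimeRange = Tuple[datetime, datetime]


def overlapping_period(coverages: Sequence[TimeRange]) -> Optional[TimeRange]:
    """Return the intersection of several (start, end) ranges, or None if empty."""
    start = max(c[0] for c in coverages)  # latest start among all ranges
    end = min(c[1] for c in coverages)  # earliest end among all ranges
    return (start, end) if start <= end else None


# Hypothetical time coverage of two model results
model_a = (datetime(2020, 1, 1), datetime(2020, 1, 10))
model_b = (datetime(2020, 1, 5), datetime(2020, 1, 20))

period = overlapping_period([model_a, model_b])
print(period)
```

Under this rule, only data between the latest start and the earliest end would take part in the comparison.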

Parameters:

Name	Type	Description	Default
`obs`	`(str, Path, DataFrame, Observation, Sequence[Observation])`	Observation(s) to be compared	required
`mod`	`(str, Path, DataFrame, ModelResult, Sequence[ModelResult])`	Model result(s) to be compared	required
`obs_item`	`(int, str)`	Observation item if obs is a file/dataframe, by default None	`None`
`mod_item`	`(int, str)`	Model item if mod is a file/dataframe, by default None	`None`
`gtype`	`(str, optional)`	Geometry type of the model result (if mod is a file/dataframe). If not specified, it will be guessed.	`None`
`max_model_gap`	`(float, optional)`	Maximum time gap (s) in the model result (e.g. for event-based model results), by default None	`None`
`spatial_method`	`(str, optional)`	Spatial interpolation/selection method, used for Dfsu- and GridModelResult only. For DfsuModelResult, one of 'contained' (=isel), 'nearest' or 'inverse_distance' (with 5 nearest points); if None, 'inverse_distance' is used. For GridModelResult, the value is passed to xarray.interp() as the method argument; if None, 'linear' is used.	`None`
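
To give an idea of what 'inverse_distance' (with 5 nearest points) means, here is a minimal, standard-library sketch of inverse-distance weighting. The function name, the sample points and the exponent `power=2.0` are assumptions made for illustration (the text above does not state the exponent); this is not the code modelskill uses.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def inverse_distance_interp(
    target: Point,
    points: Sequence[Point],
    values: Sequence[float],
    n_nearest: int = 5,
    power: float = 2.0,  # assumption: the exponent is not given in the source
) -> float:
    """Interpolate a value at `target` from the n nearest points."""
    dists = [math.hypot(x - target[0], y - target[1]) for x, y in points]
    nearest = sorted(range(len(points)), key=lambda i: dists[i])[:n_nearest]

    # An exact hit has zero distance; return that value directly.
    for i in nearest:
        if dists[i] == 0.0:
            return values[i]

    weights = [1.0 / dists[i] ** power for i in nearest]
    return sum(w * values[i] for w, i in zip(weights, nearest)) / sum(weights)


# Hypothetical element centres and values around an observation at (0, 0)
centres = [(1.0, 0.0), (-1.0, 0.0), (0.0, 2.0), (0.0, -2.0), (3.0, 0.0), (10.0, 10.0)]
vals = [1.0, 3.0, 2.0, 2.0, 5.0, 100.0]

result = inverse_distance_interp((0.0, 0.0), centres, vals)
print(result)
```

The sixth point is far away and is left out because only the five nearest points contribute; closer points get larger weights.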

Returns:

Type	Description
`Comparer`	In case of a single observation
`ComparerCollection`	In case of multiple observations

See Also

from_matched() - create a Comparer from observation and model results that are already matched

Source code in modelskill/matching.py

def match(
    obs,
    mod,
    *,
    obs_item=None,
    mod_item=None,
    gtype=None,
    max_model_gap=None,
    spatial_method: Optional[str] = None,
):
    """Match observation and model result data in space and time

    NOTE: In case of multiple model results with different time coverage,
    only the _overlapping_ time period will be used! (intersection)

    NOTE: In case of multiple observations, multiple models can _only_
    be matched if they are _all_ of SpatialField type, e.g. DfsuModelResult
    or GridModelResult.

    Parameters
    ----------
    obs : (str, Path, pd.DataFrame, Observation, Sequence[Observation])
        Observation(s) to be compared
    mod : (str, Path, pd.DataFrame, ModelResult, Sequence[ModelResult])
        Model result(s) to be compared
    obs_item : int or str, optional
        observation item if obs is a file/dataframe, by default None
    mod_item : (int, str), optional
        model item if mod is a file/dataframe, by default None
    gtype : (str, optional)
        Geometry type of the model result (if mod is a file/dataframe).
        If not specified, it will be guessed.
    max_model_gap : (float, optional)
        Maximum time gap (s) in the model result (e.g. for event-based
        model results), by default None
    spatial_method : str, optional
        For Dfsu- and GridModelResult, spatial interpolation/selection method.

        - For DfsuModelResult, one of: 'contained' (=isel), 'nearest',
        'inverse_distance' (with 5 nearest points), by default "inverse_distance".
        - For GridModelResult, passed to xarray.interp() as method argument,
        by default 'linear'.

    Returns
    -------
    Comparer
        In case of a single observation
    ComparerCollection
        In case of multiple observations

    See Also
    --------
    [from_matched][modelskill.from_matched]
        Create a Comparer from observation and model results that are already matched
    """
    if isinstance(obs, get_args(ObsInputType)):
        return _single_obs_compare(
            obs,
            mod,
            obs_item=obs_item,
            mod_item=mod_item,
            gtype=gtype,
            max_model_gap=max_model_gap,
            spatial_method=spatial_method,
        )

    if isinstance(obs, Collection):
        assert all(isinstance(o, get_args(ObsInputType)) for o in obs)
    else:
        raise TypeError(
            f"Obs is not the correct type: it is {type(obs)}. Check the order of the arguments (obs, mod)."
        )

    if len(obs) > 1 and isinstance(mod, Collection) and len(mod) > 1:
        if not all(isinstance(m, (DfsuModelResult, GridModelResult)) for m in mod):
            raise ValueError(
                """
                In case of multiple observations, multiple models can _only_ 
                be matched if they are _all_ of SpatialField type, e.g. DfsuModelResult 
                or GridModelResult. 

                If you want to match multiple point observations with multiple point model results,
                please match one observation at a time and then create a collection of these 
                using modelskill.ComparerCollection(cmp_list) afterwards. The same applies to track data.
                """
            )

    clist = [
        _single_obs_compare(
            o,
            mod,
            obs_item=obs_item,
            mod_item=mod_item,
            gtype=gtype,
            max_model_gap=max_model_gap,
            spatial_method=spatial_method,
        )
        for o in obs
    ]

    return ComparerCollection(clist)
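
The workaround named in the error message (match one observation at a time, then collect the results) can be sketched as below. The helper `match_one_at_a_time` and the stand-in `fake_match` are hypothetical; the stand-in only exists so the sketch runs without modelskill or any data files.

```python
from typing import Any, Callable, Iterable, List, Tuple


def match_one_at_a_time(
    pairs: Iterable[Tuple[Any, Any]],
    match_fn: Callable[[Any, Any], Any],
) -> List[Any]:
    """Call match_fn once per (observation, model results) pair."""
    return [match_fn(obs, mod) for obs, mod in pairs]


# Stand-in for modelskill.match (hypothetical, for illustration only).
def fake_match(obs: str, mod: List[str]) -> str:
    return f"Comparer({obs} vs {', '.join(mod)})"


cmp_list = match_one_at_a_time(
    [("stn_a", ["local_a", "global_a"]), ("stn_b", ["local_b", "global_b"])],
    fake_match,
)
print(cmp_list)
```

With modelskill, `match_fn` would be `modelskill.match`, and the resulting list would be wrapped with `modelskill.ComparerCollection(cmp_list)`, as the error message suggests.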

modelskill.from_matched

from_matched(data, *, obs_item=0, mod_items=None, aux_items=None, quantity=None, name=None, weight=1.0, x=None, y=None, z=None, x_item=None, y_item=None)

Create a Comparer from observation and model results that are already matched (aligned)

Parameters:

Name	Type	Description	Default
`data`	`[DataFrame, str, Path, Dfs0, Dataset]`	DataFrame (or object that can be converted to a DataFrame e.g. dfs0) with columns obs_item, mod_items, aux_items	required
`obs_item`	`[str, int]`	Name or index of observation item, by default first item	`0`
`mod_items`	`Iterable[str \| int]`	Names or indices of model items, if None all remaining columns are model items, by default None	`None`
`aux_items`	`Iterable[str \| int]`	Names or indices of auxiliary items, by default None	`None`
`quantity`	`Quantity`	Quantity of the observation and model results, by default Quantity(name="Undefined", unit="Undefined")	`None`
`name`	`str`	Name of the comparer, by default None (will be set to obs_item)	`None`
`weight`	`float`	Weight assigned to the comparer (passed on to Comparer.from_matched_data), by default 1.0	`1.0`
`x`	`float`	x-coordinate of observation, by default None	`None`
`y`	`float`	y-coordinate of observation, by default None	`None`
`z`	`float`	z-coordinate of observation, by default None	`None`
`x_item`	`str \| int \| None`	Name of x item, only relevant for track data	`None`
`y_item`	`str \| int \| None`	Name of y item, only relevant for track data	`None`
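
The item rules in the table (an item is given by name or by index, and all remaining columns become model items when mod_items is None) can be sketched in plain Python. The helpers `split_items` and `_to_name` are hypothetical and only mirror the documented behaviour; they are not part of modelskill.

```python
from typing import Iterable, List, Optional, Sequence, Tuple, Union

Item = Union[str, int]


def _to_name(columns: Sequence[str], item: Item) -> str:
    """Resolve a column name or positional index to a column name."""
    if isinstance(item, int):
        return columns[item]
    if item not in columns:
        raise KeyError(f"Item {item!r} not found in {list(columns)}")
    return item


def split_items(
    columns: Sequence[str],
    obs_item: Item = 0,
    mod_items: Optional[Iterable[Item]] = None,
    aux_items: Optional[Iterable[Item]] = None,
) -> Tuple[str, List[str], List[str]]:
    """Partition columns into (observation, model items, auxiliary items)."""
    obs = _to_name(columns, obs_item)
    aux = [_to_name(columns, a) for a in (aux_items or [])]
    if mod_items is None:
        # Documented default: every remaining column is a model item.
        mod = [c for c in columns if c != obs and c not in aux]
    else:
        mod = [_to_name(columns, m) for m in mod_items]
    return obs, mod, aux


cols = ["stn_a", "local", "global", "nonsense"]
print(split_items(cols, obs_item="stn_a"))
print(split_items(cols, obs_item=0, mod_items=["local", "global"]))
```

Listing mod_items explicitly is the way to keep unrelated columns (such as 'nonsense' here) out of the comparison.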

Examples:

>>> import pandas as pd
>>> import modelskill as ms
>>> df = pd.DataFrame({'stn_a': [1,2,3], 'local': [1.1,2.1,3.1]}, index=pd.date_range('2010-01-01', periods=3))
>>> cmp = ms.from_matched(df, obs_item='stn_a') # remaining columns are model results
>>> cmp
<Comparer>
Quantity: Undefined [Undefined]
Observation: stn_a, n_points=3
 Model: local, rmse=0.100
>>> df = pd.DataFrame({'stn_a': [1,2,3], 'local': [1.1,2.1,3.1], 'global': [1.2,2.2,3.2], 'nonsense':[1,2,3]}, index=pd.date_range('2010-01-01', periods=3))
>>> cmp = ms.from_matched(df, obs_item='stn_a', mod_items=['local', 'global'])
>>> cmp
<Comparer>
Quantity: Undefined [Undefined]
Observation: stn_a, n_points=3
    Model: local, rmse=0.100
    Model: global, rmse=0.200
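
The rmse values shown in the output above can be checked by hand. The sketch below uses the standard root-mean-square error formula on the same numbers; the helper `rmse` is written here for illustration and is not imported from modelskill.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], mod: Sequence[float]) -> float:
    """Root-mean-square error between paired observation and model values."""
    return math.sqrt(sum((m - o) ** 2 for o, m in zip(obs, mod)) / len(obs))


obs = [1, 2, 3]
local = [1.1, 2.1, 3.1]  # constant offset of 0.1 from the observations
global_ = [1.2, 2.2, 3.2]  # constant offset of 0.2 from the observations

print(f"local:  rmse={rmse(obs, local):.3f}")
print(f"global: rmse={rmse(obs, global_):.3f}")
```

Because each model differs from the observations by a constant offset, the rmse equals that offset.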

Source code in modelskill/matching.py

def from_matched(
    data: Union[str, Path, pd.DataFrame, mikeio.Dfs0, mikeio.Dataset],
    *,
    obs_item: str | int | None = 0,
    mod_items: Optional[Iterable[str | int]] = None,
    aux_items: Optional[Iterable[str | int]] = None,
    quantity: Optional[Quantity] = None,
    name: Optional[str] = None,
    weight: float = 1.0,
    x: Optional[float] = None,
    y: Optional[float] = None,
    z: Optional[float] = None,
    x_item: str | int | None = None,
    y_item: str | int | None = None,
) -> Comparer:
    """Create a Comparer from observation and model results that are already matched (aligned)

    Parameters
    ----------
    data : [pd.DataFrame, str, Path, mikeio.Dfs0, mikeio.Dataset]
        DataFrame (or object that can be converted to a DataFrame e.g. dfs0)
        with columns obs_item, mod_items, aux_items
    obs_item : [str, int], optional
        Name or index of observation item, by default first item
    mod_items : Iterable[str | int], optional
        Names or indices of model items, if None all remaining columns are model items, by default None
    aux_items : Iterable[str | int], optional
        Names or indices of auxiliary items, by default None
    quantity : Quantity, optional
        Quantity of the observation and model results, by default Quantity(name="Undefined", unit="Undefined")
    name : str, optional
        Name of the comparer, by default None (will be set to obs_item)
    x : float, optional
        x-coordinate of observation, by default None
    y : float, optional
        y-coordinate of observation, by default None
    z : float, optional
        z-coordinate of observation, by default None
    x_item: [str, int], optional
        Name of x item, only relevant for track data
    y_item: [str, int], optional
        Name of y item, only relevant for track data

    Examples
    --------
    >>> import pandas as pd
    >>> import modelskill as ms
    >>> df = pd.DataFrame({'stn_a': [1,2,3], 'local': [1.1,2.1,3.1]}, index=pd.date_range('2010-01-01', periods=3))
    >>> cmp = ms.from_matched(df, obs_item='stn_a') # remaining columns are model results
    >>> cmp
    <Comparer>
    Quantity: Undefined [Undefined]
    Observation: stn_a, n_points=3
     Model: local, rmse=0.100
    >>> df = pd.DataFrame({'stn_a': [1,2,3], 'local': [1.1,2.1,3.1], 'global': [1.2,2.2,3.2], 'nonsense':[1,2,3]}, index=pd.date_range('2010-01-01', periods=3))
    >>> cmp = ms.from_matched(df, obs_item='stn_a', mod_items=['local', 'global'])
    >>> cmp
    <Comparer>
    Quantity: Undefined [Undefined]
    Observation: stn_a, n_points=3
        Model: local, rmse=0.100
        Model: global, rmse=0.200

    """
    # pre-process if dfs0, or mikeio.Dataset
    if isinstance(data, (str, Path)):
        if Path(data).suffix != ".dfs0":
            raise ValueError(f"File must be a dfs0 file, not {Path(data).suffix}")
        data = mikeio.read(data)  # now mikeio.Dataset
    elif isinstance(data, mikeio.Dfs0):
        data = data.read()  # now mikeio.Dataset
    if isinstance(data, mikeio.Dataset):
        assert len(data.shape) == 1, "Only 0-dimensional data are supported"
        if quantity is None:
            quantity = Quantity.from_mikeio_iteminfo(data[obs_item].item)
        data = data.to_dataframe()

    cmp = Comparer.from_matched_data(
        data,
        obs_item=obs_item,
        mod_items=mod_items,
        aux_items=aux_items,
        name=name,
        weight=weight,
        x=x,
        y=y,
        z=z,
        x_item=x_item,
        y_item=y_item,
        quantity=quantity,
    )

    return cmp

modelskill.from_config

from_config(conf, *, relative_path=True)

Load ComparerCollection from a config file (or dict)

Parameters:

Name	Type	Description	Default
`conf`	`Union[str, Path, dict]`	path to config file or dict with configuration	required
`relative_path`	`bool`	True: file paths are relative to the configuration file; False: file paths are used as given (absolute, or relative to the current directory), by default True	`True`

Returns:

Type	Description
`ComparerCollection`	A ComparerCollection object from the given configuration

Examples:

>>> import modelskill as ms
>>> cc = ms.from_config('Oresund.yml')
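
When conf is a dict, its "modelresults" section maps each name to a dict with a "filename" and optional "item" and "include" keys. The following standard-library sketch shows how relative_path and include affect which files are used. The helper `resolve_modelresult_paths`, the config path and all file names are hypothetical; the sketch only mirrors the path handling in from_config and does not load any data.

```python
from pathlib import Path
from typing import Dict, Union


def resolve_modelresult_paths(
    conf: dict,
    config_file: Union[str, Path, None] = None,
    relative_path: bool = True,
) -> Dict[str, Path]:
    """Return {name: file path} for the model results that are included."""
    # A dict configuration has no file location, so "." is the base directory.
    dirname = Path(config_file).parent if config_file is not None else Path(".")
    paths = {}
    for name, mr in conf["modelresults"].items():
        if not mr.get("include", True):  # skipped when include is falsy
            continue
        fp = Path(mr["filename"])
        paths[name] = dirname / fp if relative_path else fp
    return paths


# Hypothetical configuration; names and file names are illustrative only.
conf = {
    "modelresults": {
        "local": {"filename": "data/local.dfsu", "item": 0},
        "old_run": {"filename": "data/old.dfsu", "include": False},
    },
    "observations": {},
}

print(resolve_modelresult_paths(conf, "projects/oresund/conf.yml"))
print(resolve_modelresult_paths(conf, "projects/oresund/conf.yml", relative_path=False))
```

With relative_path=True the file names are joined to the directory of the configuration file; with False they are used exactly as written.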

Source code in modelskill/configuration.py

def from_config(
    conf: Union[dict, str, Path], *, relative_path=True
) -> ComparerCollection:
    """Load ComparerCollection from a config file (or dict)

    Parameters
    ----------
    conf : Union[str, Path, dict]
        path to config file or dict with configuration
    relative_path: bool, optional
        True: file paths are relative to the configuration file,
        False: file paths are used as given (absolute, or relative
        to the current directory), by default True

    Returns
    -------
    ComparerCollection
        A ComparerCollection object from the given configuration

    Examples
    --------
    >>> import modelskill as ms
    >>> cc = ms.from_config('Oresund.yml')
    """
    if isinstance(conf, (str, Path)):
        p = Path(conf)
        ext = p.suffix
        dirname = Path(str(p.parents[0]))
        if (ext == ".yml") or (ext == ".yaml") or (ext == ".conf"):
            conf = _yaml_to_dict(p)
        elif "xls" in ext:
            conf = _excel_to_dict(p)
        else:
            raise ValueError(
                "Filename extension not supported! Use .yml, .yaml, .conf or .xlsx"
            )
    else:
        dirname = Path(".")

    assert isinstance(conf, dict)
    modelresults = []
    for name, mr_dict in conf["modelresults"].items():
        if not mr_dict.get("include", True):
            continue
        fp = Path(mr_dict["filename"])
        if relative_path:
            fp = dirname / fp

        item = mr_dict.get("item")
        mr = model_result(fp, name=name, item=item)
        modelresults.append(mr)

    observations = []
    for name, data in conf["observations"].items():
        if data.pop("include", True):
            data["name"] = name
            observations.append(_obs_from_dict(name, data, dirname, relative_path))

    return match(obs=observations, mod=modelresults)