Preparing Data

ModelSkill requires data in Observation and ModelResult objects. These objects are inputs for ModelSkill’s Comparer, which matches data and assesses skill. This section covers PointObservation and PointModelResult for comparing time series at specific points.

Observations

A PointObservation represents measured data, often a time series from one sensor. Each object handles one point and one variable. For API details, see the PointObservation documentation.

Key parameters for PointObservation:

Parameter   Description
name        A unique identifier (e.g., “Gauge_A_WaterLevel”). Useful for distinguishing observations and labeling plots.
data        The data source: a dfs0 file path, MIKE IO Dataset, or Pandas DataFrame.
item        Specifies the data column (for Pandas DataFrame) or item (for MIKE IO Dataset or dfs0 path) from the source. Refer by name (string) or numerical index.
quantity    A modelskill.Quantity defining the variable name (e.g., “Water Level”) and unit (e.g., “m”). Essential if the data source (e.g., Pandas DataFrame) lacks this metadata. ModelSkill often infers this from dfs0 files with EUM information.

The quantity parameter (ms.Quantity(name="...", unit="...")) defines the data’s variable (e.g., “Water Level”, “Discharge”) and unit (e.g., “m”, “m^3/s”). ModelSkill uses this information for:

  1. Clear plot labeling.
  2. Compatibility checks between observations and model results.

ModelSkill often infers quantity from dfs0 files with EUM information. For other sources like Pandas DataFrames or CSV files, you must define quantity explicitly.

Consult the ModelSkill documentation on Quantity for details, including EUM handling and more examples.

ModelSkill examples often include x and y coordinates for PointObservation objects. ModelSkill uses these coordinates mainly to interpolate data from spatial model outputs (e.g., dfsu, dfs2 files) to the observation point. This is useful for comparing point observations to 2D or 3D model fields.

This module focuses on comparing time series already extracted for specific points (e.g., from a res1d node to dfs0, or a point sensor dfs0). Thus, we won’t use the x and y spatial interpolation capability extensively here.

From Dataset

First, read a dfs0 file into a MIKE IO Dataset.

import mikeio

ds_obs = mikeio.read("data/flow_meter_data.dfs0")
ds_obs
<mikeio.Dataset>
dims: (time:121)
time: 1994-08-07 16:35:00 - 1994-08-07 18:35:00 (121 records)
geometry: GeometryUndefined()
items:
  0:  116l1_observed <Discharge> (meter pow 3 per sec)
  1:  12l1_observed <Discharge> (meter pow 3 per sec)

Create a PointObservation from this Dataset, selecting one item.

import modelskill as ms

obs_116l1 = ms.PointObservation(
    data=ds_obs,
    item="116l1_observed",    # Selects one column/item
    name="116l1_Gauge",       # Descriptive name for this specific observation
)
obs_116l1
<PointObservation>: 116l1_Gauge
Location: nan, nan
Time: 1994-08-07 16:35:00 - 1994-08-07 18:35:00
Quantity: Discharge [m^3/s]

A PointObservation has useful attributes and methods. Plot to verify:

obs_116l1.plot()

From dfs0 file

Alternatively, create a PointObservation using the dfs0 file path directly:

obs_116l1_from_file = ms.PointObservation(
    data="data/flow_meter_data.dfs0",
    item="116l1_observed",
    name="116l1_Gauge",
)
obs_116l1_from_file.to_dataframe().head()
116l1_Gauge
time
1994-08-07 16:35:00 -0.014113
1994-08-07 16:36:00 0.043355
1994-08-07 16:37:00 -0.129244
1994-08-07 16:38:00 -0.052462
1994-08-07 16:39:00 -0.051976

From Pandas DataFrame

First, prepare a Pandas DataFrame.

import pandas as pd

df_obs_csv = pd.read_csv("data/flow_meter_data.csv", index_col="time", parse_dates=True)
df_obs_csv.head()
116l1_observed 12l1_observed
time
1994-08-07 16:35:00 -0.014113 -0.095583
1994-08-07 16:36:00 0.043355 -0.058748
1994-08-07 16:37:00 -0.129244 0.040089
1994-08-07 16:38:00 -0.052462 0.068110
1994-08-07 16:39:00 -0.051976 -0.024882

Create a PointObservation from the DataFrame. Provide quantity explicitly, because DataFrames carry no EUM information.

obs_12l1_from_df = ms.PointObservation(
    data=df_obs_csv,
    item="12l1_observed",
    name="12l1_Gauge",
    quantity=ms.Quantity(name="Discharge", unit="m^3/s"),
)
obs_12l1_from_df.plot()

Note

Ensure DataFrames have a DatetimeIndex, as mentioned in previous modules.
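If a DataFrame was read without date parsing, the index can be converted afterwards. The sketch below is one way to do this with pandas; the in-memory CSV text is a hypothetical stand-in for a real file, reusing the first three values shown above.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """time,116l1_observed
1994-08-07 16:35:00,-0.014113
1994-08-07 16:36:00,0.043355
1994-08-07 16:37:00,-0.129244
"""

# Without parse_dates, the "time" column is read as plain strings
df_raw = pd.read_csv(io.StringIO(csv_text))

# Convert the column to timestamps and use it as the index
df_raw["time"] = pd.to_datetime(df_raw["time"])
df = df_raw.set_index("time")

# Check the index type before passing the DataFrame on
print(type(df.index).__name__)
print(isinstance(df.index, pd.DatetimeIndex))
```

Passing index_col="time" and parse_dates=True to pd.read_csv, as in the example above, achieves the same result in one step.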

Model Results

PointModelResult objects represent model simulation outputs. Each PointModelResult handles one variable from a specific model output point and represents a model simulation run. See the PointModelResult documentation for API details.

Key parameters for PointModelResult are similar to PointObservation:

Parameter   Description
name        Identifies the model simulation run (e.g., “MIKE_Plus_Scenario_A”).
data        The data source: a dfs0 file path, MIKE IO Dataset, or Pandas DataFrame.
item        Specifies the data column (for Pandas DataFrame) or item (for MIKE IO Dataset or dfs0 path) from the source.
quantity    A modelskill.Quantity. Crucial if metadata is missing (e.g., Pandas DataFrame). Often inferred from dfs0 files with EUM info.

The name parameter in PointModelResult identifies the overall model simulation, not a specific point. You may create several PointModelResult objects that all come from the same simulation but represent different output locations (e.g., water level at point A, discharge at point B). All these objects should share the same name (e.g., “Model_Run_Alpha”). This shared name signifies they originate from the same model execution. Later, when using ModelSkill’s Comparer, you will explicitly match each of these individual PointModelResult objects to its corresponding Observation object.

From Dataset

First, read a dfs0 file with model output into a MIKE IO Dataset.

ds_model_data = mikeio.read("data/model_results.dfs0")
ds_model_data
<mikeio.Dataset>
dims: (time:110)
time: 1994-08-07 16:35:00 - 1994-08-07 18:35:00 (110 non-equidistant records)
geometry: GeometryUndefined()
items:
  0:  reach:Discharge:116l1:37.651 <Discharge> (meter pow 3 per sec)
  1:  reach:Discharge:12l1:28.410 <Discharge> (meter pow 3 per sec)

Create the PointModelResult from the Dataset. The name parameter identifies the model simulation. The quantity is not passed here because ModelSkill infers it from the EUM information in the dfs0 file.

mod_116l1_dataset = ms.PointModelResult(
    data=ds_model_data,
    item="reach:Discharge:116l1:37.651",       # Item name from the dfs0
    name="MIKE+_RunA",                         # Model simulation identifier
)
mod_116l1_dataset
<PointModelResult>: MIKE+_RunA
Location: nan, nan
Time: 1994-08-07 16:35:00 - 1994-08-07 18:35:00
Quantity: Discharge [m^3/s]

As with observations, the PointModelResult object has useful attributes and methods. For example, plot to verify:

mod_116l1_dataset.plot()

From dfs0 file

Create a PointModelResult using the dfs0 file path directly.

mod_12l1_file = ms.PointModelResult(
    data="data/model_results.dfs0",
    item="reach:Discharge:12l1:28.410",
    name="MIKE+_RunA",                      # Same simulation as above, different location/item
)
mod_12l1_file.to_dataframe().head()
MIKE+_RunA
time
1994-08-07 16:35:00.000 0.000000
1994-08-07 16:36:01.870 -0.000004
1994-08-07 16:37:07.560 -0.000009
1994-08-07 16:38:55.828 -0.000004
1994-08-07 16:39:55.828 0.000006

From Pandas DataFrame

First, prepare a Pandas DataFrame with model data. This example reads a dfs0 file into a DataFrame.

df_model = mikeio.read("data/model_results.dfs0").to_dataframe()
df_model.head()
reach:Discharge:116l1:37.651 reach:Discharge:12l1:28.410
1994-08-07 16:35:00.000 0.000000 0.000000
1994-08-07 16:36:01.870 0.000007 -0.000004
1994-08-07 16:37:07.560 0.000022 -0.000009
1994-08-07 16:38:55.828 0.000043 -0.000004
1994-08-07 16:39:55.828 0.000054 0.000006

Create a PointModelResult from the DataFrame. Provide quantity explicitly, because DataFrames carry no EUM information.

mod_116l1_df = ms.PointModelResult(
    data=df_model,
    item="reach:Discharge:116l1:37.651",            # Column name in the DataFrame
    name="MIKE+_RunA",                                   # Identifies the overall model simulation
    quantity=ms.Quantity(name="Discharge", unit="m^3/s"),
)
mod_116l1_df.plot()

From res1d file

MIKE+ res1d files store results for an entire network. For point comparisons with PointObservation in ModelSkill, first extract the specific time series for the point(s) into an intermediate format (e.g., dfs0 file, Pandas DataFrame). This example extracts one model output point to a dfs0 file, then creates a PointModelResult.

First, extract model output (one point, one variable) to a dfs0 file.

import mikeio1d

res = mikeio1d.open("data/network.res1d")
res.reaches["116l1"]["37.651"].Discharge.to_dfs0("data/model_Q_116l1.dfs0")
ds = mikeio.read("data/model_Q_116l1.dfs0")
ds
<mikeio.Dataset>
dims: (time:110)
time: 1994-08-07 16:35:00 - 1994-08-07 18:35:00 (110 non-equidistant records)
geometry: GeometryUndefined()
items:
  0:  reach:Discharge:116l1:37.651 <Discharge> (meter pow 3 per sec)

Now, create a PointModelResult from the Dataset read from this new dfs0 file. Use the item name shown in the Dataset output above.

mod_116l1 = ms.PointModelResult(
    data=ds,
    item="reach:Discharge:116l1:37.651",
    name="MIKE+",
)
mod_116l1.plot() # Verify

Future versions of ModelSkill may allow creating a network result, instead of a point result. This would allow network results to automatically be matched with corresponding observations, eliminating the need to manually match individual model result points with observation points.

Best Practices

Consistent data organization and naming are key.

  • Organize Data: Structure observation and model result files (e.g., separate folders, clear names). This helps when programmatically accessing many files.
  • Descriptive Names: Use name in PointObservation and PointModelResult for clear identifiers (e.g., PointObservation(name="Flow_Gauge_West")). This aids in managing objects, improves plot clarity, and helps programmatic creation with many observations or runs.
  • Specify Units and Quantities: Always provide quantity for sources like DataFrames or CSVs; ModelSkill can often infer it from dfs0 files with EUM information. Correct metadata is crucial for comparisons, visualizations, and automated workflows. See the discussion of modelskill.Quantity earlier in this section and the official documentation.
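The first two practices can be sketched with the standard library alone. The folder layout (data/observations) and the mapping from item names to observation names are hypothetical; the item and gauge names are the ones used earlier in this section.

```python
from pathlib import Path

# Hypothetical folder that holds all observation files
OBS_DIR = Path("data/observations")

# Item name in the source file -> descriptive observation name
gauges = {
    "116l1_observed": "116l1_Gauge",
    "12l1_observed": "12l1_Gauge",
}

# Build one set of keyword arguments per observation
obs_kwargs = [
    {
        "data": (OBS_DIR / "flow_meter_data.dfs0").as_posix(),
        "item": item,
        "name": name,
    }
    for item, name in gauges.items()
]

# Names should be unique so plots and tables stay unambiguous
names = [kw["name"] for kw in obs_kwargs]
if len(set(names)) != len(names):
    raise ValueError("Observation names must be unique")

for kw in obs_kwargs:
    print(kw)
```

Each dictionary could then be unpacked into the constructor, for example ms.PointObservation(**kw), once the files exist at those paths.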

ModelSkill is versatile. This section focuses on point data, but the package also supports TrackObservation (data along a path) and, for model results, GridModelResult (gridded data). These are useful for different validation scenarios. See the official documentation for examples and use cases.