Final assignment#

The final exercise involves converting data from one or more providers.

Since this exercise is designed to prepare you for real project work, the information you need to solve it might be slightly incomplete or not provided in context. Use your best judgment!

Parts of this assignment can be solved in several ways. Use descriptive variable names and comments or descriptive text if necessary to clarify. The final solution should be clear to your colleagues and will be shared with some of your fellow students for review.

The data will be used for MIKE modelling and must be converted to Dfs with appropriate EUM types/units in order to be used by the MIKE software.

The data is provided as a zip file and a NetCDF file (in the data folder - see FA.3 below).

Inside the zip file, there are many timeseries (ASCII format) of discharge data from streams located across several regions (*.dat).
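One possible way to get at the *.dat files without unpacking the archive by hand is Python's standard zipfile module. The sketch below builds a tiny zip in memory so it can run on its own; the file name and contents in it are made up for illustration and do not reflect the real data.

```python
import io
import zipfile

import pandas as pd

# Build a small in-memory zip as a stand-in for the provided archive.
# The member names and contents here are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("s01_demo_region_river.dat", "time,discharge\n2018-02-01,1.0\n")
    zf.writestr("readme.txt", "not a timeseries")

# Open the archive, keep only the .dat members and read each into a DataFrame.
frames = {}
with zipfile.ZipFile(buffer) as zf:
    dat_names = [name for name in zf.namelist() if name.endswith(".dat")]
    for name in dat_names:
        with zf.open(name) as f:
            frames[name] = pd.read_csv(f, index_col=0, parse_dates=True)

print(dat_names)
```

With the real archive you would pass its path to zipfile.ZipFile instead of the in-memory buffer, and adjust the read_csv arguments to the actual file layout.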

Static data for each region is found in a separate file (region_info.csv).

Pandas read_csv is very powerful, but here are a few things to keep in mind:

Column separator e.g. comma (,)
Blank lines
Comments
Missing values
Date format
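As a rough illustration of how these points map onto read_csv arguments, here is a minimal sketch on an invented sample. The separator (;), the comment character (#), the missing-value marker (-999), the column names and the date format are all assumptions; inspect the real files before choosing values.

```python
import io

import pandas as pd

# Invented sample with a comment line, a blank line and a missing-value marker.
raw = """# hypothetical station file
time;discharge

2018-01-31 00:00;1.5
2018-02-01 00:00;-999
2018-02-02 00:00;2.5
"""

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",                # column separator
    comment="#",            # skip comment lines
    skip_blank_lines=True,  # ignore blank lines (this is the default)
    na_values=["-999"],     # treat this marker as missing
    index_col=0,
)

# Parse the index with an explicit date format rather than relying on inference.
df.index = pd.to_datetime(df.index, format="%Y-%m-%d %H:%M")

print(df)
```

Giving the date format explicitly avoids ambiguity between day-first and month-first dates.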

The MIKE engine cannot handle missing values (delete values), so fill in missing values with interpolated values.
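A minimal sketch of one way to fill gaps, using pandas time-weighted linear interpolation on an invented series. Whether linear interpolation suits the real discharge data is a judgment call left to you.

```python
import numpy as np
import pandas as pd

# Invented series with one gap; note the uneven time spacing.
index = pd.to_datetime(["2018-02-01", "2018-02-02", "2018-02-04"])
discharge = pd.Series([1.0, np.nan, 4.0], index=index, name="discharge")

# method="time" weights the interpolation by the actual time distance
# between neighbouring observations.
filled = discharge.interpolate(method="time")

print(filled)
print("missing values:", int(filled.isna().sum()))
```

Note that interpolate does not fill a gap at the very start of a series by default, so leading missing values may need separate handling.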

To save disk space, crop the timeseries to the simulation period, Feb 1 to June 30.
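Cropping can be sketched with label-based slicing on a DatetimeIndex. The year is not stated in the assignment text, so 2018 below is purely an assumption, as is the daily frequency of the invented data.

```python
import pandas as pd

# Invented daily data covering a full (assumed) year.
index = pd.date_range("2018-01-01", "2018-12-31", freq="D")
discharge = pd.DataFrame({"discharge": range(len(index))}, index=index)

# Slicing with date strings on a DatetimeIndex includes both end points.
cropped = discharge.loc["2018-02-01":"2018-06-30"]

print(cropped.index[0], cropped.index[-1], len(cropped))
```

It is usually safer to interpolate before cropping, so that gaps at the edges of the simulation period can use neighbouring values from outside it.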

FA.1 Convert all timeseries to Dfs0#

import os
import numpy as np
import pandas as pd
import mikeio

from mikeio import Dataset
from mikeio.eum import EUMType, ItemInfo, EUMUnit

# This is one way to find and filter filenames in a directory
# [x for x in os.listdir("datafolder") if "some_str" in x]

# This is useful!
# help(pd.read_csv)

# example of reading csv
# df = pd.read_csv("../data/oceandata.csv", comment='#', index_col=0, sep=',', parse_dates=True)

a) Convert all timeseries to dfs0 (remember that the notebook should be runnable for your peers, so put the files somewhere reasonable).

b) Read s15_east_novayork_river.dfs0, print the “header”, plot, and show that the number of missing values is 0.

FA.2 Add region specific info to normalize timeseries with surface area#

Each timeseries belongs to a region identified in the filename, e.g. s15_east_novayork_river.dat is located in the novayork region.

a) Convert all timeseries to dfs0 with specific discharge, by doing:

For each timeseries in the dataset:

Find out which region it belongs to (hint: the string method split() will be useful)
Divide the timeseries values by the surface area for the region (take into account units)
Create a dfs0 file with specific discharge (discharge / area) (like the one with discharge from FA.1)
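The first two steps above could be sketched as follows. Everything about the region table here is hypothetical: the column names (region, area_km2), the area value, and the units (discharge in m^3/s, area in km^2) are assumptions, so check region_info.csv and the timeseries files for the actual ones. The helper region_from_filename is also made up for this example and assumes the region is always the third underscore-separated token, as in s15_east_novayork_river.dat.

```python
import pandas as pd


def region_from_filename(filename: str) -> str:
    """Return the region token of a station file name (hypothetical helper)."""
    # "s15_east_novayork_river.dat" -> ["s15", "east", "novayork", "river.dat"]
    return filename.split("_")[2]


# Stand-in for region_info.csv; column names and the value are invented.
region_info = pd.DataFrame(
    {"region": ["novayork"], "area_km2": [250.0]}
).set_index("region")

# Invented discharge values, assumed to be in m^3/s.
discharge = pd.Series(
    [5.0, 10.0],
    index=pd.to_datetime(["2018-02-01", "2018-02-02"]),
    name="discharge",
)

region = region_from_filename("s15_east_novayork_river.dat")

# Convert km^2 to m^2 so that the result is in m/s.
area_m2 = region_info.loc[region, "area_km2"] * 1e6
specific_discharge = discharge / area_m2

print(region)
print(specific_discharge)
```

Writing the result to dfs0 then follows the same pattern as in FA.1, with an EUM type and unit that match specific discharge.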

b) Determine which station has the largest max specific discharge (in the simulation period).
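One way to approach this, assuming the specific discharge timeseries have already been cropped to the simulation period and collected in a dictionary keyed by station name, is to reduce each series to its maximum and then pick the largest. The station names and values below are invented.

```python
import pandas as pd

# Invented specific discharge series per station (already cropped).
index = pd.date_range("2018-02-01", periods=3, freq="D")
specific_discharge = {
    "station_a": pd.Series([1.0e-8, 3.0e-8, 2.0e-8], index=index),
    "station_b": pd.Series([2.5e-8, 2.0e-8, 1.0e-8], index=index),
}

# One maximum per station, then the label of the largest maximum.
max_per_station = pd.Series(
    {name: series.max() for name, series in specific_discharge.items()}
)
largest_station = max_per_station.idxmax()

print(max_per_station)
print("largest max specific discharge:", largest_station)
```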

Submission of solution#

Your solution to the above tasks is to be delivered as a single Jupyter notebook file. Please create a new notebook and name it final_assignment_teamxyz.ipynb, where xyz is your team number. It should be easy to understand and runnable by your instructors.

The solution will be reviewed by an instructor, who will provide feedback on both the correctness and clarity of your solution.

Please submit your team’s solution by email to campus@dhigroup.com.
