Course project: Time Series Data Cleaning

Imagine this: a colleague hands you a script, and your task is to make it usable for others in your organization. That means turning it into a proper package with a good structure, tests, and documentation, and giving it a design that is easy to extend and maintain in the future.

In this project, the script removes bad data from three different time series using three different algorithms: out-of-range, spikes, and flat-periods. Your colleague is not the best Python coder, so you will start by cleaning up the code with functions and then gradually improve its quality from there.
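To give a feel for the three algorithms, here is a minimal sketch of what each one might look like as a plain function. The colleague's actual script is not shown here, so the function names, the parameters (`lower`, `upper`, `max_jump`, `min_length`), and the choice to mark bad values as NaN are all assumptions made for illustration.

```python
import math


def remove_out_of_range(values, lower, upper):
    """Replace values outside [lower, upper] with NaN (hypothetical helper)."""
    return [v if lower <= v <= upper else math.nan for v in values]


def remove_spikes(values, max_jump):
    """Replace a value with NaN if it differs from BOTH neighbours by more
    than max_jump (one simple way to define a spike, assumed here)."""
    cleaned = list(values)
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        # Comparisons involving NaN are False, so NaN neighbours never flag a spike.
        if abs(cur - prev) > max_jump and abs(cur - nxt) > max_jump:
            cleaned[i] = math.nan
    return cleaned


def remove_flat_periods(values, min_length):
    """Replace runs of at least min_length identical consecutive values with NaN."""
    cleaned = list(values)
    start = 0  # index where the current run of equal values begins
    for i in range(1, len(values) + 1):
        # A run ends at the end of the list or when the value changes.
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_length:
                for j in range(start, i):
                    cleaned[j] = math.nan
            start = i
    return cleaned


# Made-up data: one spike (50.0), one flat period (1.3 x 4), one out-of-range value.
raw = [1.0, 1.2, 50.0, 1.1, 1.3, 1.3, 1.3, 1.3, -999.0, 1.4]
step1 = remove_out_of_range(raw, lower=-10.0, upper=100.0)
step2 = remove_spikes(step1, max_jump=10.0)
step3 = remove_flat_periods(step2, min_length=3)
print(step3)
```

Writing each algorithm as its own function with explicit arguments is the starting point for Modules 1 and 2; the same logic can later be moved into modules and classes without changing what it does.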

Module 1: GitHub and basic functions

  • 1.1 GitHub repo
  • 1.2 Functions

Module 2: Modules and classes

  • 2.1 Function arguments
  • 2.2 Modules
  • 2.3 Classes

Module 3: Installable package and pytest

  • 3.1 Installable package
  • 3.2 Pytest

Module 4: GitHub Actions and auto-formatting

  • 4.1 GitHub Actions
  • 4.2 Linting with ruff
  • 4.3 Formatting with ruff
  • 4.4 pyproject.toml

Module 5: Object-oriented design

  • 5.1 Type hints
  • 5.2 Data class
  • 5.3 Module-level function
  • 5.4 Composition or inheritance

Module 6: Documentation

  • 6.1 README
  • 6.2 Docstrings
  • 6.3 mkdocs

Module 7: Publishing

  • 7.1 License
  • 7.2 Publishing