Documentation

Why document your code?

  • Make it easier for others to use your code
  • Make it easier for you to use your code

Readme.md

  • A readme file is a text file that introduces and explains a project.
  • Always include a readme file in your project.
  • You can put readme files in any directory, and you can have more than one in a single project.

Requirements

  • Mention the requirements for your package
    • Operating system
    • Python version
    • Other non-Python dependencies, e.g. VC++ redistributables
  • Include information on how to install your package
    • pip install my_package
    • pip install https://github.com/DHI/{repo}/archive/main.zip

Notebooks

  • Jupyter notebooks are a great way to document your code
  • Good for prototyping
  • In a later stage, notebooks can be used to demonstrate how to use your code
  • Not a replacement for documentation for a professional package

Docstrings

"""K-means clustering."""

class KMeans(_BaseKMeans):
  """K-Means clustering.
  
  Parameters
  ----------
  n_clusters : int, default=8
      The number of clusters to form as well as the number of
      centroids to generate.

  Examples
  --------
  >>> X = np.array([[1, 2], [1, 4], [1, 0],
  ...               [10, 2], [10, 4], [10, 0]])
  >>> kmeans = KMeans(n_clusters=2, random_state=0, n_init="auto").fit(X)
  >>> kmeans.labels_
  array([1, 1, 1, 0, 0, 0], dtype=int32)

sklearn.KMeans

>>> from sklearn.cluster import KMeans
>>> help(KMeans)
class KMeans(_BaseKMeans)
 |  KMeans(n_clusters=8, *, init='k-means++', n_init='warn')
 |
 |  K-Means clustering.
 |
 |  Parameters
 |  ----------
 |  n_clusters : int, default=8

. . .

Docstring - Numpy format

def function_name(param1, param2, param3):
    """Short summary.
    
    Long description.
    
    Parameters
    ----------
    param1 : int
        Description of `param1`.
    param2 : str
        Description of `param2`.
    param3 : list of str
        Description of `param3`.
    
    Returns
    -------
    bool
        Description of return value.
    """
    pass

Type hints

From Python 3.6, type hints can be used in addition to the type in the docstring.

def remove_outlier(data:pd.DataFrame, column:str, threshold:float=3) -> pd.DataFrame:
    """Remove outliers from a dataframe.
    
    Parameters
    ----------
    data : pd.DataFrame
        Dataframe to remove outliers from.
    column : str
        Column to remove outliers from.
    threshold : float, optional
        Number of standard deviations to use as threshold, by default 3
    

doctest

Using code without documentation is hard, but using code with wrong documentation is even harder.

How can you make sure that the documentation is correct?

The answer is the doctest module built in to the Python standard library.

Tip

The extensive standard library is why Python is described as a language with “batteries included!”

Input, output examples in docstrings are run as tests.

def add(a, b):
    """Add two numbers.
    >>> add(1, 2)
    3
    >>> add(1, 3)
    5
    """
    return a + b
$ python -m doctest -v add.py
Failed example:
    add(1, 3)
Expected:
    5
Got:
    4
**********************************************************************
1 items had failures:
   1 of   2 in mod.add
***Test Failed*** 1 failures.

Documentation generators

  • Sphinx
  • mkdocs

mkdocs

  • Text is written in markdown
  • Easy to use
  • API documentation can be generated with mkdocstrings
  • The end result is a static website that can be hosted on e.g. GitHub pages

Configuration

mkdocs.yml
site_name: my_library

theme: "material" # or readthedocs, mkdocs, etc.

plugins:
- mkdocstrings:
    handlers:
      python:
        options:
          show_source: false # change if you want able to show source code
          heading_level: 2
          docstring_style: "numpy" # important!, since default is google

API docs

  1. install mkdocstrings

    $ pip install mkdocstrings[python]

  2. Install theme, e.g. material

    $ pip install mkdocs-material

  3. Add plugin to mkdocs.yml (see above)

  4. Create index.md in docs folder

  5. Run mkdocs serve to view locally

docs/index.md

# Reference

::: my_library.simulation

GitHub pages

  • Once you have a static website, you need to share it with the world
  • GitHub pages allows you to easily host a static website on GitHub
  • The website is available at https://dhi.github.io/<repository>/
  • The website can be created locally by manually editing html pages.
  • For use as documentation, it is easier to use a documentation generator like mkdocs.

GitHub pages

“Private” website

  • A GitHub repository can be made private
  • The website is still publicly available
  • In order to “hide” it from search engines, add a robots.txt file to the root of the website
  • This is not a secure way to hide a website, but it is a simple way to hide it from search engines.
robots.txt
User-agent: *
Disallow: /

Additional resources

  • https://realpython.com/python-project-documentation-with-mkdocs/

Summary

  • Documentation is important
  • Use a README file
  • Use docstrings
  • Use type hints
  • Use mkdocs to generate API documentation