Functions, classes and modules
Functions as black boxes
- A function is a black box that takes some input and produces some output.
- The input and output can be anything, including other functions.
- As long as the input and output are the same, the function body can be modified.
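Because inputs and outputs can themselves be functions, one black box can be built from another. A minimal sketch, where `make_limit_check`, `all_pass` and the `limit` parameter are hypothetical names chosen only for illustration:

```python
def make_limit_check(limit):
    """Return a function that checks whether a value is below `limit`."""
    def below_limit(value):
        return value < limit
    return below_limit  # a function as output


def all_pass(check, values):
    """Apply the function `check` to every value (a function as input)."""
    return all(check(v) for v in values)


height_ok = make_limit_check(2.0)
print(height_ok(1.5))                        # True
print(all_pass(height_ok, [0.5, 1.9, 2.5]))  # False, since 2.5 is not below 2.0
```

The caller of `all_pass` only needs to know that `check` takes a value and returns a boolean, not how it is implemented.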
def is_operable(height, period):
    return height < 2.0 and period < 6.0
These two functions behave the same, but their implementations differ.
def is_operable(height, period):
    model = load_fancy_ml_model()
    return model.predict(height, period)
Pure functions
A pure function returns the same output for the same input.
def f(x):
    return x**2
>>> f(2)
4
>>> f(2)
4
A non-pure function can return different outputs for the same input.
n = 0
def non_pure_function(x):
    global n
    n = n + 1
    return x + n
>>> non_pure_function(2)
3
>>> non_pure_function(2)
4
Side effects
A function can have side effects (besides returning a value).
def f_with_side_effect(x):
    with open("output.txt", "a") as f:
        f.write(str(x) + "\n")
    return x**2
The function takes x as input and returns the square of x, but it also appends x as a new line to a file. If you run the function a second time, the file will contain two lines.
Side effects
Pure functions without side effects are easier to reason about.
But sometimes side effects are necessary.
- Writing to a file
- Writing to a database
- Printing to the screen
- Creating a plot
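When a side effect is necessary, one option is to keep the calculation pure and put the side effect in a separate, small function. The sketch below assumes this split is acceptable for your code; `square` and `write_result` are hypothetical names, and an in-memory `io.StringIO` stands in for a real file.

```python
import io


def square(x):
    """Pure: the result depends only on x."""
    return x**2


def write_result(stream, value):
    """Impure: writes one line to `stream`, which is a side effect."""
    stream.write(f"{value}\n")


stream = io.StringIO()  # in-memory stand-in for a file opened with open(...)
for x in [2, 3]:
    write_result(stream, square(x))

print(stream.getvalue())  # two lines: 4 and 9
```

The pure part can be tested without touching the file system, and the impure part is short enough to check by eye.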
Modifying input arguments
def difficult_function(values):
    for i in range(len(values)):
        values[i] = min(0, values[i]) # 😟
    return values
>>> x = [1,2,-1]
>>> difficult_function(x)
[0, 0, -1]
>>> x
[0, 0, -1]
This function modifies the input list, which might come as a surprise. Python passes a reference to the list object rather than a copy, so the function can modify it in place.
Functions that don't modify the input arguments are easier to reason about.
def easier_function(values):
    l2 = list(values) # copy🤔
    for i in range(len(l2)):
        l2[i] = min(0, l2[i])
    return l2
>>> x = [1, 2, -1]
>>> easier_function(x)
[0, 0, -1]
>>> x
[1, 2, -1]
Just be aware that copying large datasets can be slow.
Positional arguments
def f(x, y):
    return x + y
>>> f(1, 2)
3
Keyword arguments
def f(x, y):
    return x + y
>>> f(x=1, y=2)
3
Positional arguments
Version 1
def is_operable(height, period):
    return height < 2.0 and period < 6.0
>>> is_operable(1.0, 3.0)
True
Version 2
def is_operable(period, height=0.0):
    # don't forget that the arguments are swapped 👍
    return height < 2.0 and period < 6.0
>>> is_operable(1.0, 3.0)
False 😟
The order of the arguments is swapped, since we want to make height an optional argument (more on that later). This silently breaks existing code that passes the arguments by position: the call still runs, but 3.0 is now interpreted as the height.
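Calling with keyword arguments protects the caller against this kind of reordering. A small sketch, where the suffixes `_v1` and `_v2` are added only so that both versions can exist side by side:

```python
def is_operable_v1(height, period):
    return height < 2.0 and period < 6.0


def is_operable_v2(period, height=0.0):
    return height < 2.0 and period < 6.0


# Positional calls depend on the order of the parameters...
print(is_operable_v1(1.0, 3.0))  # True
print(is_operable_v2(1.0, 3.0))  # False: 3.0 is now interpreted as height

# ...keyword calls do not.
print(is_operable_v1(height=1.0, period=3.0))  # True
print(is_operable_v2(height=1.0, period=3.0))  # True
```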
Keyword-only arguments
def f(*, x, y):
    return x + y
>>> f(1,2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: f() takes 0 positional arguments but 2 were given
Optional (default) arguments
def f(x, n=2):
    return x**n
>>> f(2)
4
>>> f(2, n=3)
8
Default values make it easy to call a function that has many arguments, since the caller only needs to pass the ones that differ from the defaults.
Mutable default arguments
Python’s default arguments are evaluated once when the function is defined, not each time the function is called.
def add_to_cart(x, cart=[]): # this line is evaluated only once 😮
    cart.append(x)
    return cart
>>> add_to_cart(1, cart=[2])
[2, 1]
>>> add_to_cart(1)
[1]
>>> add_to_cart(2)
[1, 2]
This differs from languages such as Ruby, where default arguments are evaluated on each call. It means that if you use a mutable default argument and mutate it, you have mutated that object for all future calls to the function as well.
How to use default (mutable) arguments
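A common idiom, shown here as a sketch and not necessarily the exact solution this slide originally presented, is to use `None` as the default and create the mutable object inside the function body:

```python
def add_to_cart(x, cart=None):
    # None is immutable, so it is safe to use as a default value.
    if cart is None:
        cart = []  # a new list is created on every call
    cart.append(x)
    return cart


print(add_to_cart(1))            # [1]
print(add_to_cart(2))            # [2], not [1, 2]
print(add_to_cart(1, cart=[2]))  # [2, 1]
```

Each call without an explicit cart now starts from an empty list, so calls no longer influence each other.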
Changing return types
Since Python is a dynamic language, the type of the returned variable is allowed to vary.
def foo(x):
    if x >= 0:
        return x
    else:
        return "x is negative"
But it is usually a bad idea, since you cannot tell from reading the calling code which type will be returned.
Changing return types
def is_operable(height, period):
    if height < 10:
        return height < 5.0 and period > 4.0
    else:
        return "No way!"
>>> if is_operable(height=12.0, period=5.0):
...     print("Go ahead!")
...
Go ahead!
Is this the result you expected?
A non-empty string or a non-zero value is considered “truthy” in Python!
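One way to avoid the surprise is to return the same type on every path. The sketch below assumes that "No way!" was meant as a refusal, so it returns False in that branch; the message "Stay in port." is a made-up placeholder.

```python
def is_operable(height, period):
    if height >= 10:
        return False  # previously: return "No way!"
    return height < 5.0 and period > 4.0


if is_operable(height=12.0, period=5.0):
    print("Go ahead!")
else:
    print("Stay in port.")  # this branch is taken now
```

If the caller needs to know why the answer is negative, raising an exception is another option that keeps the return type a plain bool.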
Type hints
Python is a dynamically typed language: the type of a variable is determined at runtime.
But we can add type hints to help the reader (and the code editor).
def is_operable(height: float, period: float) -> bool:
    ...
def clip(values: list[int], *, threshold: int = 0) -> list[int]:
    return [min(threshold, v) for v in values]
>>> x = [-1, 0, 2]
>>> clip(x)
[-1, 0, 0]
>>> x
[-1, 0, 2]
>>> clip(x, threshold=1)
[-1, 0, 1]
Type hints are just hints: they make the code easier to read and easier to use in your IDE, but Python does not enforce them at runtime.
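To see that hints are not enforced, consider this small sketch with a hypothetical function `double`. The call with a string runs without error, even though the hint says `int`:

```python
def double(x: int) -> int:
    return x * 2


print(double(2))     # 4
print(double("ab"))  # abab: no error at runtime, although "ab" is not an int

# The hints are stored on the function, but Python itself does not check them.
print(double.__annotations__)
```

A static type checker such as mypy, or the checker built into your editor, would flag the second call before the code is run.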
Classes
class WeirdToolbox:
    tools = [] # class variable ☹️
>>> t1 = WeirdToolbox()
>>> t1.tools.append("hammer")
>>> t1.tools
["hammer"]
>>> t2 = WeirdToolbox()
>>> t2.tools.append("screwdriver")
>>> t2.tools
["hammer", "screwdriver"]
Class variables are rarely what you want, since they are shared between all instances of the class.
Classes
class Toolbox:
    def __init__(self):
        self.tools = [] # instance variable 😃
>>> t1 = Toolbox()
>>> t1.tools.append("hammer")
>>> t1.tools
["hammer"]
>>> t2 = Toolbox()
>>> t2.tools.append("screwdriver")
>>> t2.tools
["screwdriver"]
Instance variables are created when the instance is created, and are unique to each instance.
Static methods
from datetime import date
class Interval:
    def __init__(self, start: date, end: date):
        self.start = start
        self.end = end
>>> dr = Interval(date(2020, 1, 1), date(2020, 1, 31))
>>> dr.start
datetime.date(2020, 1, 1)
>>> dr.end
datetime.date(2020, 1, 31)
Here is an example of a useful class, but it is a bit cumbersome to create an instance.
Static methods
from datetime import date
class Interval:
    def __init__(self, start: date, end: date):
        self.start = start
        self.end = end
    @staticmethod
    def from_string(date_string):
        start_str, end_str = date_string.split("|")
        start = date.fromisoformat(start_str)
        end = date.fromisoformat(end_str)
        return Interval(start, end)
>>> dr = Interval.from_string("2020-01-01|2020-01-31")
>>> dr
<__main__.Interval at 0x7fb99efcfb90>
Since we commonly use ISO formatted dates separated by a pipe, we can add a static method to create an instance from a string. This makes it easier to create an instance.
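A related option, shown here as an alternative sketch to the static method, is `@classmethod`. The method receives the class as `cls`, so the same constructor also works for subclasses. `Season` is a hypothetical subclass used only for illustration.

```python
from datetime import date


class Interval:
    def __init__(self, start: date, end: date):
        self.start = start
        self.end = end

    @classmethod
    def from_string(cls, date_string: str) -> "Interval":
        start_str, end_str = date_string.split("|")
        # cls is the class the method was called on, so subclasses work too
        return cls(date.fromisoformat(start_str), date.fromisoformat(end_str))


class Season(Interval):  # hypothetical subclass, for illustration only
    pass


s = Season.from_string("2020-01-01|2020-01-31")
print(type(s).__name__)  # Season
print(s.start)           # 2020-01-01
```

With the static method version, `Season.from_string(...)` would return a plain `Interval`, because the class name is hard-coded in the method body.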
Dataclasses
from dataclasses import dataclass
from datetime import date
@dataclass
class Interval:
    start: date
    end: date
    @staticmethod
    def from_string(date_string):
        start_str, end_str = date_string.split("|")
        start = date.fromisoformat(start_str)
        end = date.fromisoformat(end_str)
        return Interval(start, end)
>>> dr = Interval.from_string("2020-01-01|2020-01-31")
>>> dr
Interval(start=datetime.date(2020, 1, 1), end=datetime.date(2020, 1, 31))
Dataclasses were introduced in Python 3.7. They are a convenient way to create classes with a few attributes. The fields are instance variables, and the generated constructor takes one argument per field, in the order the fields are declared.
@dataclass
class Interval:
    start: date
    end: date
    def __str__(self):
        return f"{self.start} | {self.end}"
>>> dr = Interval(start=date(2020, 1, 1), end=date(2020, 1, 31))
>>> dr
Interval(start=datetime.date(2020, 1, 1), end=datetime.date(2020, 1, 31))
>>> print(dr)
2020-01-01 | 2020-01-31
To override the default string representation, we can add a __str__ method.
Equality
For a regular class, equality is by default based on object identity (in CPython, the memory address of the object).
class Interval:
    def __init__(self, start: date, end: date):
        self.start = start
        self.end = end
>>> dr1 = Interval(start=date(2020, 1, 1), end=date(2020, 1, 31))
>>> dr2 = Interval(start=date(2020, 1, 1), end=date(2020, 1, 31))
>>> dr1 == dr2
False
This is not very useful, since we want to compare the values of the attributes.
Equality
class Interval:
    def __init__(self, start: date, end: date):
        self.start = start
        self.end = end
    def __eq__(self, other):
        return self.start == other.start and self.end == other.end
>>> dr1 = Interval(start=date(2020, 1, 1), end=date(2020, 1, 31))
>>> dr2 = Interval(start=date(2020, 1, 1), end=date(2020, 1, 31))
>>> dr1 == dr2
True
We can override the __eq__ method to compare the values of the attributes.
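The `__eq__` method above raises an AttributeError when an Interval is compared with an object that has no `start` attribute. A slightly more defensive sketch returns `NotImplemented` for other types, which lets Python fall back to its default comparison:

```python
from datetime import date


class Interval:
    def __init__(self, start: date, end: date):
        self.start = start
        self.end = end

    def __eq__(self, other):
        if not isinstance(other, Interval):
            # Let Python try the other operand, then fall back to "not equal"
            return NotImplemented
        return self.start == other.start and self.end == other.end


dr1 = Interval(date(2020, 1, 1), date(2020, 1, 31))
dr2 = Interval(date(2020, 1, 1), date(2020, 1, 31))
print(dr1 == dr2)           # True
print(dr1 == "2020-01-01")  # False, instead of raising AttributeError
```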
For a dataclass, equality is based on the values of the fields.
from dataclasses import dataclass
from datetime import date
@dataclass
class Interval:
    start: date
    end: date
>>> dr1 = Interval(start=date(2020, 1, 1), end=date(2020, 1, 31))
>>> dr2 = Interval(start=date(2020, 1, 1), end=date(2020, 1, 31))
>>> dr1 == dr2
True
This is the default behavior for dataclasses.
Data classes
from dataclasses import dataclass, field
@dataclass
class Quantity:
    unit: str = field(compare=True)
    standard_name: str = field(compare=True)
    name: str = field(compare=False, default=None)
>>> t1 = Quantity(name="temp", unit="C", standard_name="air_temperature")
>>> t2 = Quantity(name="temperature", unit="C", standard_name="air_temperature")
>>> t1 == t2
True
>>> d1 = Quantity(unit="m", standard_name="depth")
>>> d1 == t2
False
Data classes
- Compact notation of fields with type hints
- Equality based on values of fields
- Useful string representation by default
- It is still a regular class
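Since a dataclass is still a regular class, the pitfall of shared mutable values applies to its fields as well. The sketch below rewrites the earlier Toolbox as a dataclass, which is an illustration and not part of the original example, and uses `field(default_factory=list)` so that every instance gets its own list:

```python
from dataclasses import dataclass, field


@dataclass
class Toolbox:
    # default_factory is called once per instance, so lists are not shared
    tools: list[str] = field(default_factory=list)


t1 = Toolbox()
t1.tools.append("hammer")
t2 = Toolbox()
t2.tools.append("screwdriver")
print(t1)  # Toolbox(tools=['hammer'])
print(t2)  # Toolbox(tools=['screwdriver'])
```

Writing `tools: list[str] = []` in a dataclass is rejected with a ValueError when the class is defined, which guards against the shared-list mistake.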
Modules
Modules are files containing Python code (functions, classes, constants) that belong together.
$ tree analytics/
analytics/
├── __init__.py
├── date.py
└── tools.py
The analytics package contains two modules:
- tools module
- date module
from datetime import date
from analytics.tools import is_operable
from analytics.tools import Toolbox, Tool
from analytics.date import Interval
tool = Tool(name="hammer")
dr = Interval(start=date(2020, 1, 1), end=date(2020, 1, 31))
is_operable(height=1.8, period=1.0)
Packages
- A package is a directory containing modules
- Each regular package in Python is a directory which MUST contain a special file called __init__.py
- The __init__.py can be empty, and it indicates that the directory that contains it is a Python package
- __init__.py can also execute initialization code
__init__.py
Example: mikeio/pfs/__init__.py:
from .pfsdocument import Pfs, PfsDocument
from .pfssection import PfsNonUniqueList, PfsSection
def read_pfs(filename, encoding="cp1252", unique_keywords=False):
    """Read a pfs file for further analysis/manipulation"""
    return PfsDocument(filename, encoding=encoding, unique_keywords=unique_keywords)
The imports in __init__.py let you separate the implementation into multiple files.
>>> mikeio.pfs.pfssection.PfsSection
<class 'mikeio.pfs.pfssection.PfsSection'>
>>> mikeio.pfs.PfsSection
<class 'mikeio.pfs.pfssection.PfsSection'>
PfsSection and PfsDocument are imported from the pfssection.py and pfsdocument.py modules into the mikeio.pfs namespace.
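The same re-export mechanism can be tried without installing anything. The sketch below builds a throwaway package in a temporary directory; `demo_pkg` and its contents are hypothetical and have nothing to do with mikeio itself.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create demo_pkg/tools.py and demo_pkg/__init__.py in a temporary directory
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "tools.py").write_text(
    "def is_operable(height, period):\n"
    "    return height < 2.0 and period < 6.0\n"
)
# The package's __init__.py re-exports the function from the submodule
(pkg / "__init__.py").write_text("from .tools import is_operable\n")

sys.path.insert(0, str(root))   # make the temporary directory importable
importlib.invalidate_caches()   # the files were created after Python started

import demo_pkg

print(demo_pkg.is_operable(1.0, 3.0))                        # True
print(demo_pkg.is_operable is demo_pkg.tools.is_operable)    # True
```

Both names refer to the same function object. The short name is a convenience for users of the package, while the implementation stays in its own file.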
Python naming conventions
By adhering to the naming conventions, your code will be easier to read for other Python developers.
- variables, functions and methods: lowercase_with_underscores
- classes: CamelCase
- constants: UPPERCASE_WITH_UNDERSCORES
Variables, function and method names
- Use lowercase characters
- Separate words with underscores
model_name = "NorthSeaModel"
n_epochs = 100
def my_function():
    pass
Constants
- Use all uppercase characters
GRAVITY = 9.81
AVOGADRO_CONSTANT = 6.02214076e23
SECONDS_IN_A_DAY = 86400
N_LEGS_PER_ANIMAL = {
    "human": 2,
    "dog": 4,
    "spider": 8,
}
Python will not prevent you from changing the value of a constant, but it is a convention to use all uppercase characters for constants.
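If you want tooling to help uphold the convention, one option is the `Final` annotation from the standard `typing` module. This is a sketch of an optional extra. Python still does not prevent reassignment at runtime, but a static type checker will report it. The function `weight` is a hypothetical example.

```python
from typing import Final

GRAVITY: Final = 9.81  # a type checker reports any later reassignment


def weight(mass_kg: float) -> float:
    """Weight in newtons for a mass given in kilograms."""
    return mass_kg * GRAVITY


print(weight(10.0))
```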
Classes
- Use CamelCase for the name of the class
- Use lowercase characters for the name of the methods
- Separate words with underscores
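The class conventions can be illustrated with a small sketch. `WaveForecast` and its method are hypothetical names chosen only to show the naming style.

```python
class WaveForecast:  # CamelCase for the class name
    def __init__(self, heights):
        self.heights = heights

    def max_height(self):  # lowercase method name, words separated by underscores
        return max(self.heights)


forecast = WaveForecast([0.8, 1.7, 1.2])
print(forecast.max_height())  # 1.7
```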
Summary
- Functions are black boxes that take input and produce output.
- Function arguments can be positional or keyword arguments.
- Pure functions are easier to reason about.
- Avoid mutable default arguments and modifying input arguments.
- Classes are useful for grouping related functions and data.
- Dataclasses are a convenient way to create classes with a few attributes.
- Modules are files containing Python code (functions, classes, constants) that belong together.
- Packages are directories containing modules.