API Reference

Generic

class tsod.RangeDetector(min_value=-inf, max_value=inf, quantiles=None)

Detect values outside a given range.

Parameters:
  • min_value (float, default=-np.inf) – Minimum value threshold.

  • max_value (float, default=np.inf) – Maximum value threshold.

  • quantiles (list of float, optional) – Lower and upper quantiles of the training data used to set the min and max thresholds during fit. If None (the default), [0.0, 1.0] is used, which corresponds to the absolute min and max values. Use values like [0.001, 0.999] to exclude extreme outliers.
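A minimal sketch of how quantile-based thresholds could be derived during fit and then applied, written with plain NumPy and pandas rather than tsod's own code. The helper names `fit_range` and `detect_outside` are hypothetical, and treating values that lie exactly on a threshold as normal is an assumption.

```python
import numpy as np
import pandas as pd


def fit_range(data: pd.Series, quantiles=(0.0, 1.0)) -> tuple[float, float]:
    """Hypothetical helper: derive (min, max) thresholds from quantiles of normal data."""
    low, high = np.nanquantile(data.to_numpy(dtype=float), quantiles)
    return float(low), float(high)


def detect_outside(data: pd.Series, min_value: float, max_value: float) -> pd.Series:
    """Flag values strictly below min_value or strictly above max_value."""
    return (data < min_value) | (data > max_value)


# 101 evenly spaced "normal" values 0..100, with the largest one corrupted
train = pd.Series(np.arange(101, dtype=float))
train.iloc[-1] = 1e6

full_min, full_max = fit_range(train)                 # absolute min and max
trim_min, trim_max = fit_range(train, (0.01, 0.99))   # extreme tails ignored

test = pd.Series([-5.0, 50.0, 500.0])
print(detect_outside(test, full_min, full_max).tolist())  # [True, False, False]
print(detect_outside(test, trim_min, trim_max).tolist())  # [True, False, True]
```

With the default quantiles the single corrupted training value stretches the upper threshold to 1e6, so 500.0 passes; with [0.01, 0.99] the upper threshold stays near 99 and 500.0 is flagged.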

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from tsod import RangeDetector
>>> normal_data = pd.Series(np.random.normal(size=100))
>>> abnormal_data = pd.Series(np.random.normal(size=100))
>>> abnormal_data[[2, 6, 15, 57, 60, 73]] = 5
>>> normal_data_with_some_outliers = pd.Series(np.random.normal(size=100))
>>> normal_data_with_some_outliers[[12, 13, 20, 90]] = 7
>>> # Fixed thresholds
>>> detector = RangeDetector(min_value=0.0, max_value=2.0)
>>> anomalies = detector.detect(abnormal_data)
>>> # Thresholds inferred from normal data; fit() returns the detector itself
>>> detector = RangeDetector().fit(normal_data)
>>> anomalies = detector.detect(abnormal_data)
>>> # Quantile-based thresholds, less sensitive to outliers in the training data
>>> detector = RangeDetector(quantiles=[0.001, 0.999])
>>> detector = detector.fit(normal_data_with_some_outliers)
>>> anomalies = detector.detect(abnormal_data)

detect(data: Series) -> Series

Detect anomalies.

Parameters:

data (pd.Series) – Time series data with possible anomalies.

Returns:

Boolean time series in which True marks an anomaly.

Return type:

pd.Series

fit(data: Series) -> Detector

Set detector parameters based on data.

Parameters:

data (pd.Series) – Normal time series data.

Returns:

Returns self for method chaining.

Return type:

Detector

save(path: str | Path) -> None

Save a detector for later use.

Parameters:

path (str or Path) – File path to save the detector to.

validate(data: Series | DataFrame) -> Series | DataFrame

Check that the input data has the correct format, and adjust it if necessary.

Parameters:

data (pd.Series or pd.DataFrame) – Input data to validate.

Returns:

Validated data.

Return type:

pd.Series or pd.DataFrame

Raises:

WrongInputDataTypeError – If data is not a pd.Series or pd.DataFrame.

class tsod.ConstantValueDetector(window_size: int = 3, threshold: float = 1e-07)

Detect contiguous periods of constant values within a window of a configurable number of points.

Such periods are commonly caused by sensor failures, where the sensor gets stuck at a constant level.

Parameters:
  • window_size (int, default=3) – Number of consecutive points to evaluate.

  • threshold (float, default=1e-7) – Maximum variation (max - min) within a window for its values to be considered constant.
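The windowed check described above can be sketched with pandas rolling windows. This is an illustrative reimplementation, not tsod's code: the name `constant_value_mask` is hypothetical, and both the strict `<` comparison and flagging every point of a qualifying window (rather than only its last point) are assumptions. With tsod itself the equivalent call would be `ConstantValueDetector(window_size=3).detect(series)`.

```python
import numpy as np
import pandas as pd


def constant_value_mask(data: pd.Series, window_size: int = 3, threshold: float = 1e-7) -> pd.Series:
    """Hypothetical sketch: flag every point of a window of nearly constant values."""
    rolling = data.rolling(window_size)
    # max - min of each trailing window; NaN (hence False) until a full window exists
    is_constant = (rolling.max() - rolling.min() < threshold).to_numpy()
    mask = np.zeros(len(data), dtype=bool)
    for end in np.flatnonzero(is_constant):
        # mark the whole window that ends at `end`, not just its last point
        mask[end - window_size + 1 : end + 1] = True
    return pd.Series(mask, index=data.index)


readings = pd.Series([1.0, 2.0, 5.0, 5.0, 5.0, 3.0, 4.0, 4.0])
print(constant_value_mask(readings).tolist())
# [False, False, True, True, True, False, False, False]
```

The trailing pair 4.0, 4.0 is not flagged because it spans only two points, fewer than window_size.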

detect(data: Series) -> Series

Detect anomalies.

Parameters:

data (pd.Series) – Time series data with possible anomalies.

Returns:

Boolean time series in which True marks an anomaly.

Return type:

pd.Series

fit(data: Series) -> Detector

Set detector parameters based on data.

Parameters:

data (pd.Series) – Normal time series data.

Returns:

Returns self for method chaining.

Return type:

Detector

save(path: str | Path) -> None

Save a detector for later use.

Parameters:

path (str or Path) – File path to save the detector to.

validate(data: Series | DataFrame) -> Series | DataFrame

Check that the input data has the correct format, and adjust it if necessary.

Parameters:

data (pd.Series or pd.DataFrame) – Input data to validate.

Returns:

Validated data.

Return type:

pd.Series or pd.DataFrame

Raises:

WrongInputDataTypeError – If data is not a pd.Series or pd.DataFrame.

class tsod.ConstantGradientDetector(window_size: int = 3)

Detect periods with a constant gradient, i.e. values lying on a straight line.

Typically caused by linear interpolation across a long gap in the data.

Parameters:

window_size (int, default=3) – Minimum number of consecutive points with a constant gradient for them to be flagged as an anomaly.
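A constant gradient means the steps between consecutive points are all equal, so one way to sketch the check is to run a constant-value test on `data.diff()`. This is not tsod's implementation: the function name `constant_gradient_mask` and its `tol` parameter are hypothetical (the documented class exposes only `window_size`).

```python
import numpy as np
import pandas as pd


def constant_gradient_mask(data: pd.Series, window_size: int = 3, tol: float = 1e-7) -> pd.Series:
    """Hypothetical sketch: flag points on a straight segment of at least window_size points."""
    steps = data.diff()                        # step between each point and the previous one
    rolling = steps.rolling(window_size - 1)   # window_size points -> window_size - 1 steps
    equal_steps = ((rolling.max() - rolling.min()) < tol).to_numpy()
    mask = np.zeros(len(data), dtype=bool)
    for end in np.flatnonzero(equal_steps):
        mask[end - window_size + 1 : end + 1] = True  # every point of the straight segment
    return pd.Series(mask, index=data.index)


# Points 2..5 (1, 2, 3, 4) look like a linear interpolation
readings = pd.Series([0.0, 3.0, 1.0, 2.0, 3.0, 4.0, 2.0])
print(constant_gradient_mask(readings).tolist())
# [False, False, True, True, True, True, False]
```

Note that a run of identical values also has a constant (zero) gradient, so it is flagged by this sketch as well.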

detect(data: Series) -> Series

Detect anomalies.

Parameters:

data (pd.Series) – Time series data with possible anomalies.

Returns:

Boolean time series in which True marks an anomaly.

Return type:

pd.Series

fit(data: Series) -> Detector

Set detector parameters based on data.

Parameters:

data (pd.Series) – Normal time series data.

Returns:

Returns self for method chaining.

Return type:

Detector

save(path: str | Path) -> None

Save a detector for later use.

Parameters:

path (str or Path) – File path to save the detector to.

validate(data: Series | DataFrame) -> Series | DataFrame

Check that the input data has the correct format, and adjust it if necessary.

Parameters:

data (pd.Series or pd.DataFrame) – Input data to validate.

Returns:

Validated data.

Return type:

pd.Series or pd.DataFrame

Raises:

WrongInputDataTypeError – If data is not a pd.Series or pd.DataFrame.

class tsod.GradientDetector(max_gradient=inf, direction='both')

Detect abrupt changes in time series data.

Requires data with a DatetimeIndex. Calculates rate of change per second.

Parameters:
  • max_gradient (float, default=np.inf) – Maximum rate of change per second.

  • direction ({'both', 'positive', 'negative'}, default='both') – Direction of change to detect. ‘positive’ detects only increases, ‘negative’ detects only decreases, ‘both’ detects changes in either direction.
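A minimal sketch of the per-second rate-of-change check, assuming the gradient for each point is measured from the previous point and the flag is placed on the later point of each pair. `gradient_mask` is a hypothetical name; it mirrors the documented `max_gradient` and `direction` parameters but is not tsod's implementation.

```python
import numpy as np
import pandas as pd


def gradient_mask(data: pd.Series, max_gradient: float = np.inf, direction: str = "both") -> pd.Series:
    """Hypothetical sketch: flag points whose rate of change per second exceeds max_gradient."""
    if not isinstance(data.index, pd.DatetimeIndex):
        raise TypeError("data must have a DatetimeIndex")
    # elapsed seconds since the previous timestamp (NaN for the first point)
    seconds = data.index.to_series().diff().dt.total_seconds()
    gradient = data.diff() / seconds  # units of data per second
    if direction == "positive":
        return gradient > max_gradient
    if direction == "negative":
        return gradient < -max_gradient
    if direction == "both":
        return gradient.abs() > max_gradient
    raise ValueError(f"unknown direction: {direction!r}")


idx = pd.to_datetime(["2024-01-01 00:00:00", "2024-01-01 00:00:10",
                      "2024-01-01 00:00:20", "2024-01-01 00:01:20"])
level = pd.Series([0.0, 1.0, 11.0, 12.0], index=idx)
print(gradient_mask(level, max_gradient=0.5).tolist())
# [False, False, True, False]
```

The rise of 10 within 10 seconds (1.0 per second) is flagged; restricting `direction` to "negative" would ignore it because it is an increase.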

detect(data: Series) -> Series

Detect anomalies.

Parameters:

data (pd.Series) – Time series data with possible anomalies.

Returns:

Boolean time series in which True marks an anomaly.

Return type:

pd.Series

fit(data: Series) -> Detector

Set detector parameters based on data.

Parameters:

data (pd.Series) – Normal time series data.

Returns:

Returns self for method chaining.

Return type:

Detector

save(path: str | Path) -> None

Save a detector for later use.

Parameters:

path (str or Path) – File path to save the detector to.

validate(data: Series | DataFrame) -> Series | DataFrame

Check that the input data has the correct format, and adjust it if necessary.

Parameters:

data (pd.Series or pd.DataFrame) – Input data to validate.

Returns:

Validated data.

Return type:

pd.Series or pd.DataFrame

Raises:

WrongInputDataTypeError – If data is not a pd.Series or pd.DataFrame.

class tsod.DiffDetector(max_diff=inf, direction='both')

Detect sudden shifts in data, irrespective of the time axis.

Parameters:
  • max_diff (float, default=np.inf) – Maximum change threshold between consecutive points.

  • direction ({'both', 'positive', 'negative'}, default='both') – Direction of change to detect. ‘positive’ detects only increases, ‘negative’ detects only decreases, ‘both’ detects changes in either direction.

See also

GradientDetector

Similar functionality but considers actual time between data points.
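For contrast with the time-aware gradient check, here is a sketch of a purely position-based difference check. `diff_mask` is a hypothetical helper mirroring the documented `max_diff` and `direction` parameters; flagging the later point of each pair is an assumption.

```python
import pandas as pd


def diff_mask(data: pd.Series, max_diff: float = float("inf"), direction: str = "both") -> pd.Series:
    """Hypothetical sketch: flag points whose change from the previous point exceeds max_diff."""
    step = data.diff()  # ignores the timestamps entirely
    if direction == "positive":
        return step > max_diff
    if direction == "negative":
        return step < -max_diff
    if direction == "both":
        return step.abs() > max_diff
    raise ValueError(f"unknown direction: {direction!r}")


stamps = pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:01",
                         "2024-01-01 00:02", "2024-01-01 06:00"])
level = pd.Series([0.0, 5.0, 5.5, 10.0], index=stamps)
print(diff_mask(level, max_diff=3.0).tolist())
# [False, True, False, True]
```

The last step is flagged even though it is spread over roughly six hours; a time-aware check like GradientDetector would measure only about 0.0002 per second for the same step.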

detect(data: Series) -> Series

Detect anomalies.

Parameters:

data (pd.Series) – Time series data with possible anomalies.

Returns:

Boolean time series in which True marks an anomaly.

Return type:

pd.Series

fit(data: Series) -> Detector

Set detector parameters based on data.

Parameters:

data (pd.Series) – Normal time series data.

Returns:

Returns self for method chaining.

Return type:

Detector

save(path: str | Path) -> None

Save a detector for later use.

Parameters:

path (str or Path) – File path to save the detector to.

validate(data: Series | DataFrame) -> Series | DataFrame

Check that the input data has the correct format, and adjust it if necessary.

Parameters:

data (pd.Series or pd.DataFrame) – Input data to validate.

Returns:

Validated data.

Return type:

pd.Series or pd.DataFrame

Raises:

WrongInputDataTypeError – If data is not a pd.Series or pd.DataFrame.

class tsod.CombinedDetector(detectors)

Combine detectors.

Combines several anomaly detection strategies into a single detector. A point is flagged as an anomaly if ANY of the constituent detectors flags it (OR logic).

Parameters:

detectors (list of Detector) – List of detector instances to combine.
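The OR logic amounts to combining the boolean outputs of the constituent detectors element-wise. The sketch below uses two hand-written stand-in checks instead of real tsod detectors; `combine_or` is a hypothetical helper, not part of tsod.

```python
import pandas as pd


def combine_or(masks: list[pd.Series]) -> pd.Series:
    """Hypothetical sketch: a point is anomalous if ANY of the boolean masks flags it."""
    return pd.concat(masks, axis=1).any(axis=1)


data = pd.Series([0.0, 0.2, 9.0, 0.1, 0.3, 3.0])
out_of_range = (data < -1.0) | (data > 5.0)  # stand-in for a range check
big_jump = data.diff().abs() > 2.0           # stand-in for a diff check
print(combine_or([out_of_range, big_jump]).tolist())
# [False, False, True, True, False, True]
```

The spike at position 2 is caught by both checks, while positions 3 and 5 are caught only by the jump check; OR logic keeps all three.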

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from tsod import CombinedDetector, DiffDetector, RangeDetector
>>> normal_data = pd.Series(np.random.normal(size=100))
>>> abnormal_data = pd.Series(np.random.normal(size=100))
>>> abnormal_data[[2, 6, 15, 57, 60, 73]] = 5
>>> anomaly_detector = CombinedDetector([RangeDetector(), DiffDetector()])
>>> anomaly_detector = anomaly_detector.fit(normal_data)  # fit() returns self
>>> detected_anomalies = anomaly_detector.detect(abnormal_data)

count(value) -> int

Return the number of occurrences of value among the combined detectors.

detect(data: Series) -> Series

Detect anomalies.

Parameters:

data (pd.Series) – Time series data with possible anomalies.

Returns:

Boolean time series in which True marks an anomaly.

Return type:

pd.Series

fit(data: Series) -> Detector

Set detector parameters based on data.

Parameters:

data (pd.Series) – Normal time series data.

Returns:

Returns self for method chaining.

Return type:

Detector

index(value[, start[, stop]]) -> int

Return the first index of value among the combined detectors. Raises ValueError if the value is not present.

The optional start and stop arguments restrict the search to positions start through stop - 1.

save(path: str | Path) -> None

Save a detector for later use.

Parameters:

path (str or Path) – File path to save the detector to.

validate(data: Series | DataFrame) -> Series | DataFrame

Check that the input data has the correct format, and adjust it if necessary.

Parameters:

data (pd.Series or pd.DataFrame) – Input data to validate.

Returns:

Validated data.

Return type:

pd.Series or pd.DataFrame

Raises:

WrongInputDataTypeError – If data is not a pd.Series or pd.DataFrame.

Hampel

class tsod.hampel.HampelDetector(window_size=5, threshold=3)

Hampel filter that operates on NumPy arrays, implemented with numba.

Parameters:
  • window_size (int, default=5) – Half-width of the window, counted in number of array elements: for element i, the window spans [(i - window_size):(i + window_size)]. Defining the window as a time span is not supported by this implementation.

  • threshold (float, default=3) – Number of standard deviations (n_sigmas) from the window median beyond which a value is marked as an outlier. A lower threshold narrows the band of values considered normal, so more points are flagged as outliers.
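The textbook Hampel rule compares each value with the median of its window and flags it when the distance exceeds threshold × σ, where σ is estimated as 1.4826 × the median absolute deviation (MAD). Below is a plain NumPy sketch of that rule, without numba; `hampel_mask` is a hypothetical name, and whether tsod uses exactly the 1.4826 factor and the inclusive window i - window_size .. i + window_size used here is an assumption.

```python
import numpy as np


def hampel_mask(x: np.ndarray, window_size: int = 5, threshold: float = 3.0) -> np.ndarray:
    """Hypothetical sketch: flag x[i] if it lies more than threshold * sigma from the window median."""
    k = 1.4826  # scales the MAD to a standard-deviation estimate for Gaussian data
    n = len(x)
    mask = np.zeros(n, dtype=bool)
    for i in range(n):
        # symmetric window, truncated at the array edges
        lo, hi = max(0, i - window_size), min(n, i + window_size + 1)
        window = x[lo:hi]
        median = np.median(window)
        sigma = k * np.median(np.abs(window - median))
        mask[i] = np.abs(x[i] - median) > threshold * sigma
    return mask


x = np.array([1.0, 1.1, 0.9, 1.0, 10.0, 1.05, 0.95, 1.0, 1.1, 0.9])
print(np.flatnonzero(hampel_mask(x, window_size=3)).tolist())  # [4]
```

Because both the centre (median) and the spread (MAD) are robust statistics, the spike at position 4 does not inflate the band it is tested against.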

detect(data: Series) -> Series

Detect anomalies.

Parameters:

data (pd.Series) – Time series data with possible anomalies.

Returns:

Boolean time series in which True marks an anomaly.

Return type:

pd.Series

fit(data: Series) -> Detector

Set detector parameters based on data.

Parameters:

data (pd.Series) – Normal time series data.

Returns:

Returns self for method chaining.

Return type:

Detector

save(path: str | Path) -> None

Save a detector for later use.

Parameters:

path (str or Path) – File path to save the detector to.

validate(data: Series | DataFrame) -> Series | DataFrame

Check that the input data has the correct format, and adjust it if necessary.

Parameters:

data (pd.Series or pd.DataFrame) – Input data to validate.

Returns:

Validated data.

Return type:

pd.Series or pd.DataFrame

Raises:

WrongInputDataTypeError – If data is not a pd.Series or pd.DataFrame.