API Reference#
Generic#
- class tsod.RangeDetector(min_value=-inf, max_value=inf, quantiles=None)#
Detect values outside range.
- Parameters:
min_value (float, default=-np.inf) – Minimum value threshold.
max_value (float, default=np.inf) – Maximum value threshold.
quantiles (list of float, optional) – Quantiles to use for determining min and max during fit. If not given, [0.0, 1.0] is used, which corresponds to the absolute min and max values. Use values like [0.001, 0.999] to exclude extreme outliers.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from tsod import RangeDetector
>>> normal_data = pd.Series(np.random.normal(size=100))
>>> abnormal_data = pd.Series(np.random.normal(size=100))
>>> abnormal_data[[2, 6, 15, 57, 60, 73]] = 5
>>> normal_data_with_some_outliers = pd.Series(np.random.normal(size=100))
>>> normal_data_with_some_outliers[[12, 13, 20, 90]] = 7

>>> detector = RangeDetector(min_value=0.0, max_value=2.0)
>>> anomalies = detector.detect(abnormal_data)

>>> detector = RangeDetector()
>>> detector.fit(normal_data)  # min, max inferred from normal data
>>> anomalies = detector.detect(abnormal_data)

>>> detector = RangeDetector(quantiles=[0.001, 0.999])
>>> detector.fit(normal_data_with_some_outliers)
>>> anomalies = detector.detect(abnormal_data)
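The quantile-based fit and the range check described above can be pictured with plain pandas. The following is a minimal sketch of the idea, not tsod's implementation: the helper names `fit_range` and `detect_out_of_range` are hypothetical, and the use of strict inequalities at the bounds is an assumption.

```python
import numpy as np
import pandas as pd


def fit_range(data: pd.Series, quantiles=(0.0, 1.0)):
    """Infer (min_value, max_value) from data using a lower and an upper quantile."""
    lower, upper = data.quantile(list(quantiles))
    return float(lower), float(upper)


def detect_out_of_range(data: pd.Series, min_value=-np.inf, max_value=np.inf) -> pd.Series:
    """Return a boolean Series that is True where a value lies outside the range.

    Assumption: values exactly on a bound are treated as normal.
    """
    return (data < min_value) | (data > max_value)


normal = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0])
# Quantiles (0.0, 1.0) reduce to the absolute min and max of the data.
min_value, max_value = fit_range(normal)
flags = detect_out_of_range(pd.Series([2.0, 5.0, -1.0]), min_value, max_value)
```

Passing narrower quantiles such as `(0.001, 0.999)` to the hypothetical `fit_range` would pull the bounds inward, so a few extreme values in the fitting data no longer define the range.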
- detect(data: Series) Series#
Detect anomalies.
- Parameters:
data (pd.Series) – Time series data with possible anomalies.
- Returns:
Time series with bools, True == anomaly.
- Return type:
pd.Series
- fit(data: Series) Detector#
Set detector parameters based on data.
- Parameters:
data (pd.Series) – Normal time series data.
- Returns:
Returns self for method chaining.
- Return type:
Detector
- save(path: str | Path) None#
Save a detector for later use.
- Parameters:
path (str or Path) – File path to save the detector to.
- validate(data: Series | DataFrame) Series | DataFrame#
Check that input data is in correct format and possibly adjust.
- Parameters:
data (pd.Series or pd.DataFrame) – Input data to validate.
- Returns:
Validated data.
- Return type:
pd.Series or pd.DataFrame
- Raises:
WrongInputDataTypeError – If data is not a pd.Series or pd.DataFrame.
- class tsod.ConstantValueDetector(window_size: int = 3, threshold: float = 1e-07)#
Detect contiguous periods of constant values within a configurable time window.
Commonly caused by a sensor failure in which the sensor gets stuck at a constant level.
- Parameters:
window_size (int, default=3) – Number of consecutive points to evaluate.
threshold (float, default=1e-7) – Maximum variation (max - min) within window to consider constant.
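As a rough illustration of how `window_size` and `threshold` interact, the sketch below flags every point that belongs to a window whose spread (max - min) is at most `threshold`. It is written with plain pandas under the assumption that all points of a constant window are flagged; the function name `detect_constant` is hypothetical and this is not the library's code.

```python
import pandas as pd


def detect_constant(data: pd.Series, window_size: int = 3, threshold: float = 1e-7) -> pd.Series:
    """Flag points that lie in a run of `window_size` nearly constant values."""
    rolling = data.rolling(window_size)
    # Spread of the window ending at each position; NaN for incomplete windows.
    spread = rolling.max() - rolling.min()
    window_is_constant = spread <= threshold  # comparisons with NaN give False
    flags = pd.Series(False, index=data.index)
    # A constant window ending at position i covers positions i-window_size+1 .. i,
    # so propagate the flag backwards over the whole window.
    for offset in range(window_size):
        flags = flags | window_is_constant.shift(-offset, fill_value=False)
    return flags


stuck = pd.Series([1.0, 2.0, 2.0, 2.0, 3.0, 4.0])
flags = detect_constant(stuck)
```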
- detect(data: Series) Series#
Detect anomalies.
- Parameters:
data (pd.Series) – Time series data with possible anomalies.
- Returns:
Time series with bools, True == anomaly.
- Return type:
pd.Series
- fit(data: Series) Detector#
Set detector parameters based on data.
- Parameters:
data (pd.Series) – Normal time series data.
- Returns:
Returns self for method chaining.
- Return type:
Detector
- save(path: str | Path) None#
Save a detector for later use.
- Parameters:
path (str or Path) – File path to save the detector to.
- validate(data: Series | DataFrame) Series | DataFrame#
Check that input data is in correct format and possibly adjust.
- Parameters:
data (pd.Series or pd.DataFrame) – Input data to validate.
- Returns:
Validated data.
- Return type:
pd.Series or pd.DataFrame
- Raises:
WrongInputDataTypeError – If data is not a pd.Series or pd.DataFrame.
- class tsod.ConstantGradientDetector(window_size: int = 3)#
Detect constant gradients.
Typically caused by linear interpolation over a long interval.
- Parameters:
window_size (int, default=3) – Minimum window size to consider as anomaly.
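One way to picture a constant gradient is a window in which all consecutive steps are equal, as happens when a gap has been filled by linear interpolation. The sketch below assumes evenly spaced samples and `window_size >= 3`, and ignores the time index; the name `detect_constant_gradient` and the tolerance `tol` are hypothetical, and the real detector may differ in detail.

```python
import numpy as np
import pandas as pd


def detect_constant_gradient(data: pd.Series, window_size: int = 3, tol: float = 1e-12) -> pd.Series:
    """Flag points inside any window whose consecutive steps are all (nearly) equal."""
    values = data.to_numpy(dtype=float)
    flags = np.zeros(len(values), dtype=bool)
    for start in range(len(values) - window_size + 1):
        # Steps between neighbouring points inside the window.
        steps = np.diff(values[start:start + window_size])
        if steps.max() - steps.min() <= tol:
            flags[start:start + window_size] = True
    return pd.Series(flags, index=data.index)


# The first four points lie on a straight line; the rest do not.
interpolated = pd.Series([0.0, 1.0, 2.0, 3.0, 5.0, 4.0, 7.0])
flags = detect_constant_gradient(interpolated)
```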
- detect(data: Series) Series#
Detect anomalies.
- Parameters:
data (pd.Series) – Time series data with possible anomalies.
- Returns:
Time series with bools, True == anomaly.
- Return type:
pd.Series
- fit(data: Series) Detector#
Set detector parameters based on data.
- Parameters:
data (pd.Series) – Normal time series data.
- Returns:
Returns self for method chaining.
- Return type:
Detector
- save(path: str | Path) None#
Save a detector for later use.
- Parameters:
path (str or Path) – File path to save the detector to.
- validate(data: Series | DataFrame) Series | DataFrame#
Check that input data is in correct format and possibly adjust.
- Parameters:
data (pd.Series or pd.DataFrame) – Input data to validate.
- Returns:
Validated data.
- Return type:
pd.Series or pd.DataFrame
- Raises:
WrongInputDataTypeError – If data is not a pd.Series or pd.DataFrame.
- class tsod.GradientDetector(max_gradient=inf, direction='both')#
Detect abrupt changes in time series data.
Requires data with a DatetimeIndex. Calculates rate of change per second.
- Parameters:
max_gradient (float, default=np.inf) – Maximum rate of change per second.
direction ({'both', 'positive', 'negative'}, default='both') – Direction of change to detect. ‘positive’ detects only increases, ‘negative’ detects only decreases, ‘both’ detects changes in either direction.
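The rate of change per second can be computed from a DatetimeIndex roughly as follows. This is a sketch under stated assumptions rather than tsod's implementation: the helper names are hypothetical, and flagging the point at the end of each steep step is an assumption.

```python
import numpy as np
import pandas as pd


def gradient_per_second(data: pd.Series) -> pd.Series:
    """Change in value divided by elapsed seconds between consecutive points."""
    dt_seconds = data.index.to_series().diff().dt.total_seconds()
    return data.diff() / dt_seconds


def detect_steep(data: pd.Series, max_gradient: float = np.inf, direction: str = "both") -> pd.Series:
    """Flag points reached by a step steeper than max_gradient (units per second)."""
    grad = gradient_per_second(data)
    if direction == "positive":
        return grad > max_gradient
    if direction == "negative":
        return grad < -max_gradient
    if direction == "both":
        return grad.abs() > max_gradient
    raise ValueError("direction must be 'both', 'positive' or 'negative'")


index = pd.to_datetime([
    "2024-01-01 00:00:00",
    "2024-01-01 00:00:10",
    "2024-01-01 00:00:20",
    "2024-01-01 00:00:30",
])
series = pd.Series([0.0, 1.0, 21.0, 1.0], index=index)
both = detect_steep(series, max_gradient=1.0)
rising = detect_steep(series, max_gradient=1.0, direction="positive")
falling = detect_steep(series, max_gradient=1.0, direction="negative")
```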
- detect(data: Series) Series#
Detect anomalies.
- Parameters:
data (pd.Series) – Time series data with possible anomalies.
- Returns:
Time series with bools, True == anomaly.
- Return type:
pd.Series
- fit(data: Series) Detector#
Set detector parameters based on data.
- Parameters:
data (pd.Series) – Normal time series data.
- Returns:
Returns self for method chaining.
- Return type:
Detector
- save(path: str | Path) None#
Save a detector for later use.
- Parameters:
path (str or Path) – File path to save the detector to.
- validate(data: Series | DataFrame) Series | DataFrame#
Check that input data is in correct format and possibly adjust.
- Parameters:
data (pd.Series or pd.DataFrame) – Input data to validate.
- Returns:
Validated data.
- Return type:
pd.Series or pd.DataFrame
- Raises:
WrongInputDataTypeError – If data is not a pd.Series or pd.DataFrame.
- class tsod.DiffDetector(max_diff=inf, direction='both')#
Detect sudden shifts in data, irrespective of time axis.
- Parameters:
max_diff (float, default=np.inf) – Maximum change threshold between consecutive points.
direction ({'both', 'positive', 'negative'}, default='both') – Direction of change to detect. ‘positive’ detects only increases, ‘negative’ detects only decreases, ‘both’ detects changes in either direction.
See also
GradientDetector – Similar functionality but considers actual time between data points.
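Because the time axis is ignored, the check reduces to comparing each value with its predecessor. A minimal pandas sketch, assuming the point after the jump is the one flagged (the function name `detect_jump` is hypothetical):

```python
import numpy as np
import pandas as pd


def detect_jump(data: pd.Series, max_diff: float = np.inf, direction: str = "both") -> pd.Series:
    """Flag points whose change from the previous point exceeds max_diff."""
    step = data.diff()  # first element is NaN and is never flagged
    if direction == "positive":
        return step > max_diff
    if direction == "negative":
        return step < -max_diff
    if direction == "both":
        return step.abs() > max_diff
    raise ValueError("direction must be 'both', 'positive' or 'negative'")


series = pd.Series([1.0, 1.5, 4.0, 3.5, 0.0])
either = detect_jump(series, max_diff=2.0)
increases = detect_jump(series, max_diff=2.0, direction="positive")
```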
- detect(data: Series) Series#
Detect anomalies.
- Parameters:
data (pd.Series) – Time series data with possible anomalies.
- Returns:
Time series with bools, True == anomaly.
- Return type:
pd.Series
- fit(data: Series) Detector#
Set detector parameters based on data.
- Parameters:
data (pd.Series) – Normal time series data.
- Returns:
Returns self for method chaining.
- Return type:
Detector
- save(path: str | Path) None#
Save a detector for later use.
- Parameters:
path (str or Path) – File path to save the detector to.
- validate(data: Series | DataFrame) Series | DataFrame#
Check that input data is in correct format and possibly adjust.
- Parameters:
data (pd.Series or pd.DataFrame) – Input data to validate.
- Returns:
Validated data.
- Return type:
pd.Series or pd.DataFrame
- Raises:
WrongInputDataTypeError – If data is not a pd.Series or pd.DataFrame.
- class tsod.CombinedDetector(detectors)#
Combine detectors.
It is possible to combine several anomaly detection strategies into a combined detector. Anomalies are detected if ANY of the constituent detectors flags an anomaly (OR logic).
- Parameters:
detectors (list of Detector) – List of detector instances to combine.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from tsod import CombinedDetector, DiffDetector, RangeDetector
>>> normal_data = pd.Series(np.random.normal(size=100))
>>> abnormal_data = pd.Series(np.random.normal(size=100))
>>> abnormal_data[[2, 6, 15, 57, 60, 73]] = 5

>>> anomaly_detector = CombinedDetector([RangeDetector(), DiffDetector()])
>>> anomaly_detector.fit(normal_data)
>>> detected_anomalies = anomaly_detector.detect(abnormal_data)
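The OR logic can be illustrated without the library by combining boolean Series element-wise. In this sketch the two conditions stand in for a range check and a diff check; the helper `combine_any` and the chosen limits are hypothetical.

```python
from functools import reduce
import operator

import pandas as pd


def combine_any(flag_series):
    """Element-wise OR over boolean Series: True if any input flags the point."""
    return reduce(operator.or_, flag_series)


data = pd.Series([0.5, 5.0, 0.4, 0.6, 3.0])
out_of_range = (data < 0.0) | (data > 4.0)  # stand-in for a range check
big_jump = data.diff().abs() > 2.0          # stand-in for a diff check
combined = combine_any([out_of_range, big_jump])
```

The last point (3.0) is within range but is still flagged, because the jump from 0.6 exceeds the diff limit.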
- count(value) int#
Return the number of occurrences of value.
- detect(data: Series) Series#
Detect anomalies.
- Parameters:
data (pd.Series) – Time series data with possible anomalies.
- Returns:
Time series with bools, True == anomaly.
- Return type:
pd.Series
- fit(data: Series) Detector#
Set detector parameters based on data.
- Parameters:
data (pd.Series) – Normal time series data.
- Returns:
Returns self for method chaining.
- Return type:
Detector
- index(value[, start[, stop]]) int#
Return the first index of value.
Raises ValueError if the value is not present.
Supporting start and stop arguments is optional, but recommended.
- save(path: str | Path) None#
Save a detector for later use.
- Parameters:
path (str or Path) – File path to save the detector to.
- validate(data: Series | DataFrame) Series | DataFrame#
Check that input data is in correct format and possibly adjust.
- Parameters:
data (pd.Series or pd.DataFrame) – Input data to validate.
- Returns:
Validated data.
- Return type:
pd.Series or pd.DataFrame
- Raises:
WrongInputDataTypeError – If data is not a pd.Series or pd.DataFrame.
Hampel#
- class tsod.hampel.HampelDetector(window_size=5, threshold=3)#
Hampel filter that operates on numpy arrays and is implemented with numba.
- Parameters:
window_size (int, default=5) – Half-width of the window, counted in number of array elements: the window spans [(i - window_size):(i + window_size)]. Specifying the window as a time span is not supported by this implementation.
threshold (float, default=3) – The threshold for marking an outlier, expressed as a number of sigmas (n_sigmas). A lower threshold narrows the band within which values are considered normal, so more values are flagged as outliers.
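For readers unfamiliar with the Hampel filter, the sketch below shows the usual median and MAD (median absolute deviation) formulation in plain numpy, without numba. It is not the library's code: the function name `hampel_flags` is hypothetical, the window is assumed to include both end points, the first and last `window_size` points are assumed to be left unflagged, and the scale factor 1.4826 is the standard constant that relates MAD to the standard deviation for Gaussian data.

```python
import numpy as np
import pandas as pd

MAD_TO_SIGMA = 1.4826  # makes MAD comparable to a standard deviation for Gaussian data


def hampel_flags(data: pd.Series, window_size: int = 5, threshold: float = 3.0) -> pd.Series:
    """Flag points that deviate from the local median by more than threshold sigmas."""
    values = data.to_numpy(dtype=float)
    flags = np.zeros(len(values), dtype=bool)
    # Assumption: points without a full window on both sides are not evaluated.
    for i in range(window_size, len(values) - window_size):
        window = values[i - window_size:i + window_size + 1]
        median = np.median(window)
        mad = np.median(np.abs(window - median))
        if abs(values[i] - median) > threshold * MAD_TO_SIGMA * mad:
            flags[i] = True
    return pd.Series(flags, index=data.index)


series = pd.Series([1.0, 1.1, 0.9, 1.0, 10.0, 1.05, 0.95, 1.0, 1.1])
flags = hampel_flags(series, window_size=2, threshold=3.0)
```

Because the median and MAD are barely affected by a single extreme value, the spike at position 4 is flagged while its neighbours are not.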
- detect(data: Series) Series#
Detect anomalies.
- Parameters:
data (pd.Series) – Time series data with possible anomalies.
- Returns:
Time series with bools, True == anomaly.
- Return type:
pd.Series
- fit(data: Series) Detector#
Set detector parameters based on data.
- Parameters:
data (pd.Series) – Normal time series data.
- Returns:
Returns self for method chaining.
- Return type:
Detector
- save(path: str | Path) None#
Save a detector for later use.
- Parameters:
path (str or Path) – File path to save the detector to.
- validate(data: Series | DataFrame) Series | DataFrame#
Check that input data is in correct format and possibly adjust.
- Parameters:
data (pd.Series or pd.DataFrame) – Input data to validate.
- Returns:
Validated data.
- Return type:
pd.Series or pd.DataFrame
- Raises:
WrongInputDataTypeError – If data is not a pd.Series or pd.DataFrame.