API Reference
Generic
- class tsod.RangeDetector(min_value=-inf, max_value=inf, quantiles=None)
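Detect values outside a range.
- Parameters
min_value (float) – Minimum value threshold.
max_value (float) – Maximum value threshold.
quantiles (list[2]) – Lower and upper quantiles of the normal data that fit() uses as the minimum and maximum thresholds. Default [0, 1], i.e. the minimum and maximum of the data.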
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from tsod import RangeDetector
>>> normal_data = pd.Series(np.random.normal(size=100))
>>> abnormal_data = pd.Series(np.random.normal(size=100))
>>> abnormal_data[[2, 6, 15, 57, 60, 73]] = 5
>>> normal_data_with_some_outliers = pd.Series(np.random.normal(size=100))
>>> normal_data_with_some_outliers[[12, 13, 20, 90]] = 7
>>> detector = RangeDetector(min_value=0.0, max_value=2.0)
>>> anomalies = detector.detect(abnormal_data)
>>> detector = RangeDetector()
>>> detector.fit(normal_data)  # min, max inferred from normal data
>>> anomalies = detector.detect(abnormal_data)
>>> detector = RangeDetector(quantiles=[0.001, 0.999])
>>> detector.fit(normal_data_with_some_outliers)
>>> anomalies = detector.detect(abnormal_data)
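The range check can be sketched in plain pandas/NumPy. This is a minimal sketch, not tsod's implementation: `fit_range` and `detect_range` are hypothetical helpers, and the sketch assumes that fitting takes the given quantiles of the NaN-free normal data as the minimum and maximum, and that values exactly on a threshold are not flagged.

```python
import numpy as np
import pandas as pd


def fit_range(data: pd.Series, quantiles=(0.0, 1.0)) -> tuple:
    """Hypothetical helper: derive (min, max) thresholds from quantiles of normal data."""
    lo, hi = np.quantile(data.dropna(), quantiles)  # NaNs are ignored
    return float(lo), float(hi)


def detect_range(data: pd.Series, min_value: float, max_value: float) -> pd.Series:
    """Hypothetical helper: True where a value lies strictly outside [min_value, max_value]."""
    return (data < min_value) | (data > max_value)


normal = pd.Series([0.1, 0.5, 0.9, 1.2, 0.3])
lo, hi = fit_range(normal)  # quantiles (0, 1) give the plain min and max: 0.1 and 1.2
flags = detect_range(pd.Series([0.0, 0.5, 1.5]), lo, hi)
print(flags.tolist())  # [True, False, True]
```

Quantiles such as (0.001, 0.999) keep a handful of outliers in the training data from stretching the thresholds, which is the point of the third example above.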
- detect(data: pandas.core.series.Series) -> pandas.core.series.Series
Detect anomalies.
- Parameters
data (pd.Series) – Time series data with possible anomalies.
- Returns
Time series of bools, where True marks an anomaly.
- Return type
pd.Series
- fit(data: pandas.core.series.Series)
Set detector parameters based on data.
- Parameters
data (pd.Series) – Normal time series data.
- save(path: Union[str, pathlib.Path]) -> None
Save a detector for later use.
- Parameters
path (str or Path) – Path of the file to save the detector to.
- validate(data: Union[pandas.core.series.Series, pandas.core.frame.DataFrame]) -> Union[pandas.core.series.Series, pandas.core.frame.DataFrame]
Check that the input data is in the correct format, adjusting it if necessary.
- class tsod.ConstantValueDetector(window_size: int = 3, threshold: float = 1e-07)
Detect constant values over a longer period.
This is commonly caused by a failing sensor that gets stuck at a constant level.
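The idea can be illustrated with a short sliding-window sketch. It is not tsod's code: `detect_constant` is a hypothetical helper, and it assumes a point is anomalous when it lies in any run of `window_size` consecutive samples whose spread (max minus min) is below `threshold`.

```python
import numpy as np
import pandas as pd


def detect_constant(data: pd.Series, window_size: int = 3, threshold: float = 1e-7) -> pd.Series:
    """Hypothetical helper: flag points inside any flat window of `window_size` samples."""
    values = data.to_numpy(dtype=float)
    flags = np.zeros(len(values), dtype=bool)
    # Slide a window over the series; a window is "flat" if max - min < threshold.
    for start in range(len(values) - window_size + 1):
        window = values[start:start + window_size]
        if window.max() - window.min() < threshold:  # a NaN in the window makes this False
            flags[start:start + window_size] = True
    return pd.Series(flags, index=data.index)


stuck = pd.Series([1.0, 2.0, 5.0, 5.0, 5.0, 3.0, 4.0])
print(detect_constant(stuck).tolist())
# [False, False, True, True, True, False, False]
```

Every sample of the flat run is flagged, not only the one where the run is first detected, because the whole window is marked.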
- detect(data: pandas.core.series.Series) -> pandas.core.series.Series
Detect anomalies.
- Parameters
data (pd.Series) – Time series data with possible anomalies.
- Returns
Time series of bools, where True marks an anomaly.
- Return type
pd.Series
- fit(data: pandas.core.series.Series)
Set detector parameters based on data.
- Parameters
data (pd.Series) – Normal time series data.
- save(path: Union[str, pathlib.Path]) -> None
Save a detector for later use.
- Parameters
path (str or Path) – Path of the file to save the detector to.
- validate(data: Union[pandas.core.series.Series, pandas.core.frame.DataFrame]) -> Union[pandas.core.series.Series, pandas.core.frame.DataFrame]
Check that the input data is in the correct format, adjusting it if necessary.
- class tsod.ConstantGradientDetector(window_size: int = 3)
Detect constant gradients.
This is typically caused by linear interpolation over a long interval.
- Parameters
window_size (int) – Minimum window size to consider as an anomaly, default 3.
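A constant gradient means equal steps between consecutive samples, i.e. points on a straight line. The sketch below is an illustration under stated assumptions, not tsod's implementation: `detect_constant_gradient` and its `tol` parameter are hypothetical, samples are treated as equally spaced, and a point is flagged when it lies in a window of `window_size` samples whose successive differences agree within `tol`.

```python
import numpy as np
import pandas as pd


def detect_constant_gradient(data: pd.Series, window_size: int = 3, tol: float = 1e-7) -> pd.Series:
    """Hypothetical helper: flag points inside any window of `window_size` samples
    whose successive differences are all equal (within `tol`), i.e. a straight line."""
    if window_size < 3:
        raise ValueError("window_size must be at least 3 to compare two differences")
    values = data.to_numpy(dtype=float)
    flags = np.zeros(len(values), dtype=bool)
    for start in range(len(values) - window_size + 1):
        steps = np.diff(values[start:start + window_size])  # per-sample changes in the window
        if steps.max() - steps.min() < tol:
            flags[start:start + window_size] = True
    return pd.Series(flags, index=data.index)


interpolated = pd.Series([0.0, 1.0, 2.0, 3.0, 10.0, 4.0, 7.0])
print(detect_constant_gradient(interpolated).tolist())
# [True, True, True, True, False, False, False]
```

Because a flat run has a constant (zero) gradient, this sketch also flags constant values.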
- detect(data: pandas.core.series.Series) -> pandas.core.series.Series
Detect anomalies.
- Parameters
data (pd.Series) – Time series data with possible anomalies.
- Returns
Time series of bools, where True marks an anomaly.
- Return type
pd.Series
- fit(data: pandas.core.series.Series)
Set detector parameters based on data.
- Parameters
data (pd.Series) – Normal time series data.
- save(path: Union[str, pathlib.Path]) -> None
Save a detector for later use.
- Parameters
path (str or Path) – Path of the file to save the detector to.
- validate(data: Union[pandas.core.series.Series, pandas.core.frame.DataFrame]) -> Union[pandas.core.series.Series, pandas.core.frame.DataFrame]
Check that the input data is in the correct format, adjusting it if necessary.
- class tsod.GradientDetector(max_gradient=inf, direction='both')
Detect abrupt changes.
- Parameters
max_gradient (float) – Maximum rate of change per second, default np.inf.
direction (str) – 'positive', 'negative' or 'both', default 'both'.
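Because the threshold is a rate per second, the time step between samples matters. A minimal sketch, assuming a DatetimeIndex, a gradient taken from each sample's predecessor, and 'positive'/'negative' restricting detection to rises/drops; `detect_gradient` is a hypothetical helper, not tsod's code.

```python
import numpy as np
import pandas as pd


def detect_gradient(data: pd.Series, max_gradient: float = np.inf, direction: str = "both") -> pd.Series:
    """Hypothetical helper: flag samples whose change per second since the previous
    sample exceeds max_gradient. Requires a DatetimeIndex."""
    seconds = data.index.to_series().diff().dt.total_seconds()  # time step to previous sample
    gradient = data.diff() / seconds  # first sample has no predecessor -> NaN -> not flagged
    if direction == "positive":
        return gradient > max_gradient
    if direction == "negative":
        return gradient < -max_gradient
    if direction == "both":
        return gradient.abs() > max_gradient
    raise ValueError("direction must be 'positive', 'negative' or 'both'")


idx = pd.date_range("2021-01-01", periods=4, freq="10s")
level = pd.Series([0.0, 1.0, 50.0, 0.0], index=idx)  # gradients: NaN, 0.1, 4.9, -5.0 per second
print(detect_gradient(level, max_gradient=1.0).tolist())  # [False, False, True, True]
```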
- detect(data: pandas.core.series.Series) -> pandas.core.series.Series
Detect anomalies.
- Parameters
data (pd.Series) – Time series data with possible anomalies.
- Returns
Time series of bools, where True marks an anomaly.
- Return type
pd.Series
- fit(data: pandas.core.series.Series)
Set detector parameters based on data.
- Parameters
data (pd.Series) – Normal time series data.
- save(path: Union[str, pathlib.Path]) -> None
Save a detector for later use.
- Parameters
path (str or Path) – Path of the file to save the detector to.
- validate(data: Union[pandas.core.series.Series, pandas.core.frame.DataFrame]) -> Union[pandas.core.series.Series, pandas.core.frame.DataFrame]
Check that the input data is in the correct format, adjusting it if necessary.
- class tsod.DiffDetector(max_diff=inf, direction='both')
Detect sudden shifts in the data, irrespective of the time axis.
- Parameters
max_diff (float) – Maximum allowed change between consecutive values.
direction (str) – 'positive', 'negative' or 'both', default 'both'.
See also
GradientDetector
Similar functionality, but takes the actual time between data points into account.
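For contrast with the gradient check, here is a sketch that ignores timestamps entirely. It assumes each value is compared with its predecessor and that 'positive'/'negative' restrict detection to rises/drops; `detect_diff` is a hypothetical helper, not tsod's implementation.

```python
import numpy as np
import pandas as pd


def detect_diff(data: pd.Series, max_diff: float = np.inf, direction: str = "both") -> pd.Series:
    """Hypothetical helper: flag samples whose change from the previous sample exceeds
    max_diff, regardless of how much time separates them."""
    diff = data.diff()  # first sample -> NaN -> not flagged
    if direction == "positive":
        return diff > max_diff
    if direction == "negative":
        return diff < -max_diff
    if direction == "both":
        return diff.abs() > max_diff
    raise ValueError("direction must be 'positive', 'negative' or 'both'")


# Irregular timestamps: the 1-second jump and the 1-hour jump are treated the same.
idx = pd.to_datetime(["2021-01-01 00:00:00", "2021-01-01 00:00:01", "2021-01-01 01:00:01"])
series = pd.Series([0.0, 3.0, 0.0], index=idx)
print(detect_diff(series, max_diff=1.0).tolist())  # [False, True, True]
```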
- detect(data: pandas.core.series.Series) -> pandas.core.series.Series
Detect anomalies.
- Parameters
data (pd.Series) – Time series data with possible anomalies.
- Returns
Time series of bools, where True marks an anomaly.
- Return type
pd.Series
- fit(data: pandas.core.series.Series)
Set detector parameters based on data.
- Parameters
data (pd.Series) – Normal time series data.
- save(path: Union[str, pathlib.Path]) -> None
Save a detector for later use.
- Parameters
path (str or Path) – Path of the file to save the detector to.
- validate(data: Union[pandas.core.series.Series, pandas.core.frame.DataFrame]) -> Union[pandas.core.series.Series, pandas.core.frame.DataFrame]
Check that the input data is in the correct format, adjusting it if necessary.
- class tsod.CombinedDetector(detectors)
Combine detectors.
Several anomaly detection strategies can be combined into a single detector.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from tsod import CombinedDetector, DiffDetector, RangeDetector
>>> normal_data = pd.Series(np.random.normal(size=100))
>>> abnormal_data = pd.Series(np.random.normal(size=100))
>>> abnormal_data[[2, 6, 15, 57, 60, 73]] = 5
>>> anomaly_detector = CombinedDetector([RangeDetector(), DiffDetector()])
>>> anomaly_detector.fit(normal_data)
>>> detected_anomalies = anomaly_detector.detect(abnormal_data)
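One plausible way to combine detectors is a logical OR: fit every member on the same normal data and flag a point if any member flags it. The sketch below assumes that rule; `AboveLimit`, `LargeJump` and `AnyOf` are hypothetical stand-ins for illustration, not tsod classes.

```python
import pandas as pd


class AboveLimit:
    """Hypothetical stand-in detector: flags values above a fixed limit."""

    def __init__(self, limit: float):
        self.limit = limit

    def fit(self, data: pd.Series) -> "AboveLimit":
        return self  # nothing to learn

    def detect(self, data: pd.Series) -> pd.Series:
        return data > self.limit


class LargeJump:
    """Hypothetical stand-in detector: flags absolute steps larger than max_jump."""

    def __init__(self, max_jump: float):
        self.max_jump = max_jump

    def fit(self, data: pd.Series) -> "LargeJump":
        return self  # nothing to learn

    def detect(self, data: pd.Series) -> pd.Series:
        return data.diff().abs() > self.max_jump


class AnyOf:
    """Sketch of a combined detector: fit every member, flag a point if any member flags it."""

    def __init__(self, detectors):
        self.detectors = list(detectors)

    def fit(self, data: pd.Series) -> "AnyOf":
        for detector in self.detectors:
            detector.fit(data)
        return self

    def detect(self, data: pd.Series) -> pd.Series:
        # One boolean column per member detector, then a row-wise logical OR.
        flags = pd.concat([d.detect(data) for d in self.detectors], axis=1)
        return flags.any(axis=1)


data = pd.Series([0.0, 0.2, 5.0, 5.1, 0.1])
combined = AnyOf([AboveLimit(4.0), LargeJump(1.0)]).fit(data)
print(combined.detect(data).tolist())  # [False, False, True, True, True]
```

Index 3 is caught only by the limit rule and index 4 only by the jump rule, which shows why combining strategies widens coverage.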
- count(value) -> int
Return the number of occurrences of value (inherited from collections.abc.Sequence).
- detect(data: pandas.core.series.Series) -> pandas.core.series.Series
Detect anomalies.
- Parameters
data (pd.Series) – Time series data with possible anomalies.
- Returns
Time series of bools, where True marks an anomaly.
- Return type
pd.Series
- fit(data: pandas.core.series.Series)
Set detector parameters based on data.
- Parameters
data (pd.Series) – Normal time series data.
- index(value[, start[, stop]]) -> int
Return the first index of value (inherited from collections.abc.Sequence).
Raises ValueError if the value is not present. The optional start and stop arguments restrict the search to that slice.
- save(path: Union[str, pathlib.Path]) -> None
Save a detector for later use.
- Parameters
path (str or Path) – Path of the file to save the detector to.
- validate(data: Union[pandas.core.series.Series, pandas.core.frame.DataFrame]) -> Union[pandas.core.series.Series, pandas.core.frame.DataFrame]
Check that the input data is in the correct format, adjusting it if necessary.
Hampel
- class tsod.hampel.HampelDetector(window_size=5, threshold=3)
Hampel filter that works on NumPy arrays, implemented with Numba.
- Parameters
window_size (int) – Half-width of the window, counted in array elements: the window for element i spans [(i - window_size):(i + window_size)]. Specifying the window as a time span is not supported by this implementation.
threshold (float) – Threshold for marking an outlier, as a number of standard deviations (n_sigmas), default 3. A lower threshold narrows the band of values considered normal, so more values are flagged as outliers.
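The classic Hampel rule compares each value with the median of its window and flags it when the deviation exceeds `threshold` times a robust standard deviation estimated from the median absolute deviation (MAD). The sketch below uses a plain NumPy loop instead of Numba; `hampel_flags` is a hypothetical helper, and the scale factor 1.4826 (which makes the MAD estimate the standard deviation of Gaussian data) and the truncated windows at the series edges are assumptions, not confirmed details of tsod.

```python
import numpy as np
import pandas as pd


def hampel_flags(data: pd.Series, window_size: int = 5, threshold: float = 3.0) -> pd.Series:
    """Hypothetical helper: classic Hampel outlier test on a centred window of
    2 * window_size + 1 elements (truncated at the edges of the series)."""
    k = 1.4826  # scales the MAD to estimate the standard deviation of Gaussian data
    values = data.to_numpy(dtype=float)
    n = len(values)
    flags = np.zeros(n, dtype=bool)
    for i in range(n):
        window = values[max(0, i - window_size):min(n, i + window_size + 1)]
        median = np.median(window)
        sigma = k * np.median(np.abs(window - median))  # robust spread estimate
        flags[i] = np.abs(values[i] - median) > threshold * sigma
    return pd.Series(flags, index=data.index)


spike = pd.Series([1.0, 1.1, 0.9, 1.0, 10.0, 1.0, 0.9, 1.1, 1.0])
print(hampel_flags(spike, window_size=2).tolist())
# [False, False, False, False, True, False, False, False, False]
```

Using the median and MAD instead of the mean and standard deviation keeps the single spike from inflating the spread estimate that is used to judge it.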
- detect(data: pandas.core.series.Series) -> pandas.core.series.Series
Detect anomalies.
- Parameters
data (pd.Series) – Time series data with possible anomalies.
- Returns
Time series of bools, where True marks an anomaly.
- Return type
pd.Series
- fit(data: pandas.core.series.Series)
Set detector parameters based on data.
- Parameters
data (pd.Series) – Normal time series data.
- save(path: Union[str, pathlib.Path]) -> None
Save a detector for later use.
- Parameters
path (str or Path) – Path of the file to save the detector to.
- validate(data: Union[pandas.core.series.Series, pandas.core.frame.DataFrame]) -> Union[pandas.core.series.Series, pandas.core.frame.DataFrame]
Check that the input data is in the correct format, adjusting it if necessary.