Best Anomaly Detection Libraries : Kats, ARUNDO-ADTK, PyOD, Luminaire

Facebook Kats

Kats is a toolkit for analyzing time series data: a lightweight, easy-to-use, and generalizable framework for time series analysis. Time series analysis is an essential part of data science and engineering work in industry, from understanding key statistics and characteristics, to detecting regressions and anomalies, to forecasting future trends. Kats aims to be a one-stop shop for time series analysis, including detection, forecasting, feature extraction/embedding, and multivariate analysis. Kats was released by Facebook's Infrastructure Data Science team and is available for download on PyPI.

* Site : https://facebookresearch.github.io/Kats/

* Install : pip install kats

Anomaly Detection Features

1. Changepoint Detection
1.1 CUSUMDetector
1.2 BOCPDetector
1.3 RobustStatDetector
1.4 Comparing the Changepoint Detectors

2. Outlier Detection
2.1 OutlierDetector
2.2 MultivariateAnomalyDetector

3. Trend Detection
3.1 MKDetector
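Kats's `CUSUMDetector` and `MKDetector` build on two classic statistics: the cumulative sum (CUSUM) of deviations from the mean for mean shifts, and the Mann-Kendall test for monotonic trends. The plain-Python sketch below illustrates only those two raw statistics; it is not Kats's API or implementation, and the function names are hypothetical. It assumes a single mean shift for CUSUM and skips the tie correction of the Mann-Kendall variance.

```python
from math import sqrt
from statistics import mean


def cusum_change_point(xs):
    """Estimate one mean-shift change point with the classic CUSUM statistic.

    S_k = sum_{i<=k} (x_i - mean(xs)); |S_k| peaks at the last point before
    the shift, so the new regime starts at argmax + 1.
    """
    mu = mean(xs)
    s, best_k, best_abs = 0.0, 0, -1.0
    for k, x in enumerate(xs):
        s += x - mu
        if abs(s) > best_abs:
            best_k, best_abs = k, abs(s)
    return best_k + 1


def mann_kendall_z(xs):
    """Mann-Kendall trend statistic Z (no tie correction).

    S counts increasing pairs minus decreasing pairs; |Z| > 1.96 suggests a
    monotonic trend at roughly the 5% level (two-sided).
    """
    n = len(xs)
    s = sum((xs[j] > xs[i]) - (xs[j] < xs[i])
            for i in range(n) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        return (s - 1) / sqrt(var_s)
    if s < 0:
        return (s + 1) / sqrt(var_s)
    return 0.0


print(cusum_change_point([1, 1, 1, 1, 5, 5, 5, 5]))  # 4
print(mann_kendall_z(list(range(10))) > 1.96)        # True
```

In Kats itself the series is first wrapped in a `TimeSeriesData` object and passed to a detector class, which layers significance testing and extra options on top of statistics like these.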

<Related Post>

Python Anomaly Detection Library : Kats

ARUNDO-ADTK

Anomaly Detection Toolkit (ADTK) is a Python package for unsupervised / rule-based time series anomaly detection.

As the nature of anomalies varies from case to case, no single model works universally for all anomaly detection problems. Properly choosing and combining detection algorithms (detectors), feature engineering methods (transformers), and ensemble methods (aggregators) is the key to building an effective anomaly detection model.

This package offers a set of common detectors, transformers and aggregators with unified APIs, as well as pipe classes that connect them together into a model. It also provides some functions to process and visualize time series and anomaly events.
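To make the detector / transformer / aggregator vocabulary concrete, here is a plain-Python sketch of one tiny pipeline: a transformer (first difference), a detector (fixed threshold band), and an aggregator (logical OR). The function names, thresholds, and sample series are hypothetical illustrations, not ADTK classes; ADTK provides its own classes for each role behind a unified API.

```python
def diff_transformer(xs):
    """Transformer: first difference, padded with 0.0 so the length is unchanged."""
    return [0.0] + [b - a for a, b in zip(xs, xs[1:])]


def threshold_detector(xs, low, high):
    """Detector: flag values outside the closed band [low, high]."""
    return [x < low or x > high for x in xs]


def or_aggregator(*flag_lists):
    """Aggregator: a point is anomalous if any detector flags it."""
    return [any(flags) for flags in zip(*flag_lists)]


# Hypothetical sample series with two bad readings (indices 3 and 6).
series = [10, 11, 10, 25, 11, 10, -3, 10]
raw_flags = threshold_detector(series, low=0, high=20)
jump_flags = threshold_detector(diff_transformer(series), low=-8, high=8)
combined = or_aggregator(raw_flags, jump_flags)
print([i for i, flag in enumerate(combined) if flag])  # [3, 4, 6, 7]
```

Combining the two detectors catches both the out-of-range values (indices 3 and 6) and the abrupt jumps back to normal (indices 4 and 7), which the raw-value detector alone misses.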

* Install : pip install adtk

* Site : https://arundo-adtk.readthedocs-hosted.com/en/stable/

Anomaly Detection Features

Outlier Detection

Spike Detection

Level Shift Detection

Pattern Change Detection

Seasonality Detection
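Spike and level-shift detection both compare a point with the values around it. The sketch below assumes a simplified rule that compares medians of sliding windows against a fixed threshold; ADTK's own detectors (such as `PersistAD` and `LevelShiftAD`) derive their thresholds from the data instead, so treat the function names and the `window` and `threshold` parameters here as hypothetical.

```python
from statistics import median


def spike_points(xs, window=3, threshold=3.0):
    """Flag t when xs[t] differs from the median of the previous `window` values."""
    return [t for t in range(window, len(xs))
            if abs(xs[t] - median(xs[t - window:t])) > threshold]


def level_shift_points(xs, window=3, threshold=2.0):
    """Flag t when the median of the `window` values starting at t differs
    from the median of the `window` values just before t."""
    flags = []
    for t in range(window, len(xs) - window + 1):
        before = median(xs[t - window:t])
        after = median(xs[t:t + window])
        if abs(after - before) > threshold:
            flags.append(t)
    return flags


spiky = [1, 1, 1, 1, 9, 1, 1]
shifted = [1, 1, 1, 1, 1, 6, 6, 6, 6, 6]
print(spike_points(spiky))            # [4]
print(level_shift_points(spiky))      # []
print(level_shift_points(shifted))    # [4, 5, 6]
```

Because both functions use medians, an isolated spike does not register as a level shift, while the sustained jump at index 5 is flagged at indices 4 to 6.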

<Related Post>

Anomaly Detection Python Library : Arundo-adtk

PyOD

PyOD is a comprehensive and scalable Python toolkit for detecting outlying objects in multivariate data. This exciting yet challenging field is commonly referred to as Outlier Detection or Anomaly Detection.

PyOD includes more than 30 detection algorithms, from the classical LOF (SIGMOD 2000) to the latest ECOD (TKDE 2022). Since 2017, PyOD [AZNL19] has been successfully used in numerous academic research projects and commercial products [AZHC+21, AZNHL19], with more than 6 million downloads. It is also well recognized by the machine learning community through various dedicated posts and tutorials, including Analytics Vidhya, Towards Data Science, KDnuggets, and awesome-machine-learning.

* Install : pip install pyod

* Site : https://pyod.readthedocs.io/en/latest/

Anomaly Detection Features

(i) Individual Detection Algorithms:

Type	Abbr	Algorithm	Ref
Probabilistic	ECOD	Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions	[22]
Probabilistic	ABOD	Angle-Based Outlier Detection	[17]
Probabilistic	FastABOD	Fast Angle-Based Outlier Detection using approximation	[17]
Probabilistic	COPOD	COPOD: Copula-Based Outlier Detection	[21]
Probabilistic	MAD	Median Absolute Deviation (MAD)	[14]
Probabilistic	SOS	Stochastic Outlier Selection	[15]
Linear Model	PCA	Principal Component Analysis (the sum of weighted projected distances to the eigenvector hyperplanes)	[32]
Linear Model	MCD	Minimum Covariance Determinant (use the Mahalanobis distances as the outlier scores)	[12] [29]
Linear Model	CD	Use Cook's distance for outlier detection	[9]
Linear Model	OCSVM	One-Class Support Vector Machines	[31]
Linear Model	LMDD	Deviation-based Outlier Detection (LMDD)	[6]
Proximity-Based	LOF	Local Outlier Factor	[7]
Proximity-Based	COF	Connectivity-Based Outlier Factor	[33]
Proximity-Based	(Incremental) COF	Memory Efficient Connectivity-Based Outlier Factor (slower but reduces storage complexity)	[33]
Proximity-Based	CBLOF	Clustering-Based Local Outlier Factor	[13]
Proximity-Based	LOCI	LOCI: Fast outlier detection using the local correlation integral	[25]
Proximity-Based	HBOS	Histogram-based Outlier Score	[10]
Proximity-Based	kNN	k Nearest Neighbors (use the distance to the kth nearest neighbor as the outlier score)	[28]
Proximity-Based	AvgKNN	Average kNN (use the average distance to k nearest neighbors as the outlier score)	[5]
Proximity-Based	MedKNN	Median kNN (use the median distance to k nearest neighbors as the outlier score)	[5]
Proximity-Based	SOD	Subspace Outlier Detection	[18]
Proximity-Based	ROD	Rotation-based Outlier Detection	[4]
Outlier Ensembles	IForest	Isolation Forest	[23]
Outlier Ensembles	FB	Feature Bagging	[19]
Outlier Ensembles	LSCP	LSCP: Locally Selective Combination of Parallel Outlier Ensembles	[36]
Outlier Ensembles	XGBOD	Extreme Boosting Based Outlier Detection (Supervised)	[35]
Outlier Ensembles	LODA	Lightweight On-line Detector of Anomalies	[26]
Outlier Ensembles	SUOD	SUOD: Accelerating Large-scale Unsupervised Heterogeneous Outlier Detection (Acceleration)	[37]
Neural Networks	AutoEncoder	Fully connected AutoEncoder (use reconstruction error as the outlier score)	[1] [Ch.3]
Neural Networks	VAE	Variational AutoEncoder (use reconstruction error as the outlier score)	[16]
Neural Networks	Beta-VAE	Variational AutoEncoder (customizable loss terms controlled by gamma and capacity)	[8]
Neural Networks	SO_GAAL	Single-Objective Generative Adversarial Active Learning	[24]
Neural Networks	MO_GAAL	Multiple-Objective Generative Adversarial Active Learning	[24]
Neural Networks	DeepSVDD	Deep One-Class Classification	[30]
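The three kNN variants in the table (kNN, AvgKNN, MedKNN) differ only in how the distances to the k nearest neighbours are reduced to a single score. Below is a brute-force plain-Python sketch, assuming Euclidean distance and small data (it is O(n²)); the `method` values loosely mirror the `method` option of PyOD's `KNN` class, but this is not PyOD's implementation and the function name is hypothetical.

```python
from math import dist
from statistics import mean, median


def knn_scores(points, k=2, method="largest"):
    """Outlier score of each point from its distances to its k nearest neighbours.

    method="largest": distance to the k-th neighbour (kNN)
    method="mean":    average of the k distances (AvgKNN)
    method="median":  median of the k distances (MedKNN)
    """
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    reducers = {"largest": lambda d: d[-1], "mean": mean, "median": median}
    if method not in reducers:
        raise ValueError(f"unknown method: {method}")
    scores = []
    for i, p in enumerate(points):
        # Sorted distances to every other point, truncated to the k closest.
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)[:k]
        scores.append(reducers[method](d))
    return scores


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_scores(pts, k=2)
print(scores.index(max(scores)))  # 4
```

PyOD turns such scores into binary labels by thresholding at its `contamination` fraction; with this sketch you would apply a comparable cut-off yourself.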

(ii) Outlier Ensembles & Outlier Detector Combination Frameworks:

Type	Abbr	Algorithm	Ref
Outlier Ensembles	FB	Feature Bagging	[19]
Outlier Ensembles	LSCP	LSCP: Locally Selective Combination of Parallel Outlier Ensembles	[36]
Outlier Ensembles	XGBOD	Extreme Boosting Based Outlier Detection (Supervised)	[35]
Outlier Ensembles	LODA	Lightweight On-line Detector of Anomalies	[26]
Outlier Ensembles	SUOD	SUOD: Accelerating Large-scale Unsupervised Heterogeneous Outlier Detection (Acceleration)	[37]
Combination	Average	Simple combination by averaging the scores	[2]
Combination	Weighted Average	Simple combination by averaging the scores with detector weights	[2]
Combination	Maximization	Simple combination by taking the maximum scores	[2]
Combination	AOM	Average of Maximum	[2]
Combination	MOA	Maximization of Average	[2]
Combination	Median	Simple combination by taking the median of the scores	[2]
Combination	Majority Vote	Simple combination by taking the majority vote of the labels (weights can be used)	[2]
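The combination rules in the table are simple functions of a score matrix. The sketch below first standardizes each detector's scores (a common precaution, since detectors score on different scales) and then applies the average, maximization, median, AOM, MOA, and majority-vote rules. Assigning detectors to AOM/MOA buckets round-robin is an assumption of this sketch; PyOD's own functions in `pyod.models.combination` may group detectors differently, and the function names here are hypothetical.

```python
from statistics import mean, median, pstdev


def standardize(scores):
    """Z-score one detector's raw scores so different detectors are comparable."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd if sd else 0.0 for s in scores]


def combine_scores(score_lists, method="average", n_buckets=2):
    """Combine outlier scores from several detectors (one list per detector)."""
    z = [standardize(s) for s in score_lists]
    rows = list(zip(*z))  # rows[i] holds every detector's score for point i
    if method == "average":
        return [mean(r) for r in rows]
    if method == "maximization":
        return [max(r) for r in rows]
    if method == "median":
        return [median(r) for r in rows]
    if method in ("aom", "moa"):
        if not 1 <= n_buckets <= len(z):
            raise ValueError("n_buckets must be between 1 and the number of detectors")
        # Assumption: detectors are assigned to buckets round-robin.
        buckets = [range(b, len(z), n_buckets) for b in range(n_buckets)]
        if method == "aom":  # Average of Maximum
            return [mean(max(r[j] for j in b) for b in buckets) for r in rows]
        # Maximization of Average
        return [max(mean(r[j] for j in b) for b in buckets) for r in rows]
    raise ValueError(f"unknown method: {method}")


def majority_vote(label_lists):
    """Label a point as an outlier (1) when more than half of the detectors do."""
    return [int(2 * sum(r) > len(r)) for r in zip(*label_lists)]


s = combine_scores([[1, 2, 10], [2, 1, 9]], method="average")
print(s.index(max(s)))                                  # 2
print(majority_vote([[1, 0, 1], [1, 0, 0], [0, 0, 1]]))  # [1, 0, 1]
```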

Luminaire

Luminaire is a Python package that provides ML-driven solutions for monitoring time series data. Luminaire provides several anomaly detection and forecasting capabilities that incorporate correlational and seasonal patterns in the data over time, as well as uncontrollable variations. Specifically, Luminaire is equipped with the following key features:

Generic Anomaly Detection: Luminaire is a generic anomaly detection tool containing several classes of time series models focused on catching irregular fluctuations in different kinds of time series data.
Fully Automatic: Luminaire performs optimizations over different sets of hyperparameters and several model classes to pick the optimal model for the time series under consideration. No model configuration is required from the user.
Supports Diverse Anomaly Detection Types: Luminaire supports different detection types:
- Outlier Detection
- Data Shift Detection
- Trend Change Detection
- Null Data Detection
- Density comparison for streaming data

* Install : pip install luminaire

* GitHub : https://github.com/zillow/luminaire/

* Site : https://zillow.github.io/luminaire/

Data Exploration and Profiling

Luminaire performs exploratory profiling on the data before proceeding to optimization and training. This step provides batch insights about the raw training data over a given time window and also enables automated decisions about data pre-processing during the optimization process. These tests and pre-processing steps include:

Checking for recent data shifts
Detecting recent trend changes
Stationarity adjustments
Imputation of missing data
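Two of these pre-processing steps are easy to illustrate. The sketch below assumes missing values are stored as `None`, fills interior gaps by linear interpolation (edge gaps copy the nearest observed value), and uses first differencing as one common stationarity adjustment. Luminaire makes these choices automatically during profiling and may do them differently; the function names are hypothetical.

```python
def impute_linear(xs):
    """Fill None gaps by linear interpolation between the nearest observed
    neighbours; leading/trailing gaps copy the nearest observed value."""
    known = [i for i, x in enumerate(xs) if x is not None]
    if not known:
        raise ValueError("series has no observed values")
    out = list(xs)
    for i, x in enumerate(xs):
        if x is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = xs[right]
        elif right is None:
            out[i] = xs[left]
        else:
            w = (i - left) / (right - left)
            out[i] = xs[left] + w * (xs[right] - xs[left])
    return out


def difference(xs):
    """First difference: removes a linear trend, a common stationarity step."""
    return [b - a for a, b in zip(xs, xs[1:])]


filled = impute_linear([None, 2, None, 6, None])
print(filled)              # [2, 2, 4.0, 6, 6]
print(difference(filled))  # [0, 2.0, 2.0, 0]
```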

Outlier Detection

Luminaire generates a model for a given time series based on its recent patterns. It implements several modeling techniques, ranging from ARIMA to filtering models and Fourier transforms, to learn different variational patterns in the data. Luminaire incorporates global characteristics while learning local patterns, which makes the learning process robust to local fluctuations and speeds up execution.
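As one example of the filtering-model family, here is a minimal local-level Kalman filter that flags points whose one-step-ahead prediction error exceeds `k` standard deviations of the innovation. The noise variances `q` and `r`, the cut-off `k`, and the function name are hypothetical fixed choices; Luminaire tunes its models automatically, and this is not its implementation.

```python
from math import sqrt


def kalman_outliers(ys, q=0.01, r=1.0, k=3.0):
    """Flag points whose one-step-ahead innovation exceeds k standard deviations.

    Local-level model: level_t = level_{t-1} + process noise (variance q),
    y_t = level_t + measurement noise (variance r).
    """
    level, p = ys[0], r  # initialise the state from the first observation
    flags = []
    for t in range(1, len(ys)):
        p_pred = p + q                  # predict: state uncertainty grows by q
        innovation = ys[t] - level      # prediction error for y_t
        s = p_pred + r                  # innovation variance
        if abs(innovation) > k * sqrt(s):
            flags.append(t)
            p = p_pred                  # skip the update so the outlier does not drag the level
            continue
        gain = p_pred / s               # Kalman gain
        level += gain * innovation
        p = (1 - gain) * p_pred
    return flags


print(kalman_outliers([10, 10, 10, 10, 10, 30, 10, 10]))  # [5]
```

Skipping the state update on flagged points keeps a single spike from pulling the level estimate away from normal values; the trade-off is that a genuine level shift may keep being flagged for a while.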

Configuration Optimization for Outlier Detection Models

Luminaire combines many techniques under the hood to find the optimal model for every time series. At its core, Hyperopt is used to optimize the global hyperparameters for a given time series. In addition, Luminaire identifies whether a time series shows exponential characteristics in its variational patterns, whether holidays affect the time series, and whether the time series shows a long-term correlational pattern or a Markovian one (depending on the last value only).
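The idea of searching hyperparameters separately for each series can be shown with a deliberately small stand-in: a grid search over the smoothing factor of simple exponential smoothing, scored by one-step-ahead squared error. This replaces Hyperopt's adaptive search (for example TPE) with an exhaustive grid; the grid and function names are hypothetical, and this is not Luminaire's optimizer.

```python
def one_step_sse(ys, alpha):
    """Sum of squared one-step-ahead errors of simple exponential smoothing."""
    level, sse = ys[0], 0.0
    for y in ys[1:]:
        sse += (y - level) ** 2                   # error of forecasting y with the current level
        level = alpha * y + (1 - alpha) * level   # smoothing update
    return sse


def pick_alpha(ys, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Exhaustive search over a small grid of smoothing factors."""
    return min(grid, key=lambda a: one_step_sse(ys, a))


print(pick_alpha(list(range(1, 11))))  # 0.9
```

On a steadily rising series the search picks the largest `alpha`, because heavier smoothing lags further behind the trend.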

Anomaly Detection for Streaming Data

Luminaire performs anomaly detection over streaming data by comparing the volume density of the incoming data stream with a preset baseline time series window. Luminaire is capable of tracking time series windows over different data frequencies and is autoconfigured to support most typical streaming use cases.
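One standard way to compare the distribution of an incoming window with a baseline window is the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical CDFs. The sketch below uses it as a stand-in for Luminaire's density comparison (an assumption; the exact measure Luminaire uses is not described here), with a hypothetical fixed `threshold` and hypothetical function names.

```python
from bisect import bisect_right


def ks_statistic(baseline, current):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of the baseline window and the current window."""
    a, b = sorted(baseline), sorted(current)
    # Both ECDFs only jump at sample points, so checking those points suffices.
    return max(abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
               for v in a + b)


def window_is_anomalous(baseline, current, threshold=0.5):
    """Hypothetical rule: flag the current window when the gap exceeds `threshold`."""
    return ks_statistic(baseline, current) > threshold


baseline = [5, 6, 5, 7, 6, 5]
similar = [5, 6, 6, 5, 7, 5]
shifted = [15, 16, 15, 17, 16, 15]
print(ks_statistic(baseline, similar))         # 0.0
print(window_is_anomalous(baseline, shifted))  # True
```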

source: https://unsplash.com/photos/-kcv5g5FO-o
