Introduction
Kats is a toolkit for analyzing time series data: a lightweight, easy-to-use, and generalizable framework for time series analysis. Time series analysis is an essential component of data science and engineering work in industry, from understanding key statistics and characteristics, to detecting regressions and anomalies, to forecasting future trends. Kats aims to provide a one-stop shop for time series analysis, including detection, forecasting, feature extraction/embedding, multivariate analysis, and more. Kats is released by Facebook's Infrastructure Data Science team and is available for download on PyPI.
Forecasting
Kats provides a full set of tools for forecasting that includes 10+ individual forecasting models, ensembling, a self-supervised learning (meta-learning) model, backtesting, hyperparameter tuning, and empirical prediction intervals.
Detection
Kats supports functionality to detect various patterns in time series data, including seasonality, outliers, change points, and slow trend changes.
TSFeatures
The time series feature (TSFeature) extraction module in Kats can produce 65 features with clear statistical definitions, which can be incorporated in most machine learning (ML) models, such as classification and regression.
Utilities
Kats also provides a set of useful utilities, such as time series
Detection with Kats
- Changepoint Detection
1.1 CUSUMDetector
1.2 BOCPDetector
1.3 RobustStatDetector
1.4 Comparing the Changepoint Detectors
- Outlier Detection
2.1 OutlierDetector
2.2 MultivariateAnomalyDetector
- Trend Detection
3.1 MKDetector
Source : https://github.com/facebookresearch/Kats/blob/main/tutorials/kats_202_detection.ipynb
Install and Download Sample Data
%%capture
# For Google Colab:
!pip install kats
!wget https://raw.githubusercontent.com/facebookresearch/Kats/main/kats/data/air_passengers.csv
!wget https://raw.githubusercontent.com/facebookresearch/Kats/main/kats/data/multivariate_anomaly_simulated_data.csv
1. Changepoint Detection
Changepoint detection tries to identify times when the probability distribution of a stochastic process or time series changes, e.g. the change of mean in a time series. It is one of the most popular detection tasks in time series analysis.
1.1 CUSUMDetector
CUSUM is a method to detect an up/down shift of means in a time series. Our implementation has two main steps:
- Locate the change point: This is an iterative process where we initialize a change point (in the middle of the time series) and compute the CUSUM time series based on this change point. The next changepoint is the location where the previous CUSUM time series is maximized (or minimized). This iteration continues until either 1) a stable changepoint is found or 2) the maximum number of iterations is exceeded.
- Test the change point for statistical significance: Conduct a log-likelihood ratio test to check whether the mean of the time series changes at the changepoint calculated in Step 1. The null hypothesis is that there is no change in mean.
By default, we report a detected changepoint if and only if we reject the null hypothesis in Step 2. If we want to see all the changepoints, we can use the return_all_changepoints parameter in CUSUMDetector and set it to True.
Here are a few additional points worth mentioning:
- We assume there is at most one increase change point and at most one decrease change point. You can use the change_directions argument in the detector to specify whether you are looking for an increase, a decrease, or both (default is both).
- We use a Gaussian distribution as the underlying model to calculate the CUSUM time series values and conduct the hypothesis test.
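The two steps above can be sketched in plain Python. This is a minimal illustration of the idea for a single upward shift, not the Kats source: the function names `locate_increase` and `llr_test` are hypothetical, the variances are simple maximum-likelihood estimates, and comparing the statistic against a chi-square distribution with 2 degrees of freedom is an assumption of this sketch.

```python
import math
import random
from itertools import accumulate
from statistics import fmean


def locate_increase(values, max_iter=10):
    """Step 1: iterate until the estimated changepoint stops moving."""
    n = len(values)
    cp = n // 2  # initial guess: the middle of the series
    for _ in range(max_iter):
        mu0 = fmean(values[: cp + 1])  # mean up to and including cp
        mu1 = fmean(values[cp + 1 :])  # mean after cp
        midpoint = (mu0 + mu1) / 2
        cusum = list(accumulate(v - midpoint for v in values))
        # For an increase, values before the shift sit below the midpoint,
        # so the CUSUM series reaches its minimum at the changepoint.
        nxt = min(range(n), key=cusum.__getitem__)
        nxt = max(1, min(nxt, n - 2))  # keep both segments non-empty
        if nxt == cp:
            break
        cp = nxt
    return cp


def llr_test(values, cp):
    """Step 2: Gaussian log-likelihood ratio test of 'no change in mean'."""
    before, after = values[: cp + 1], values[cp + 1 :]
    mu0, mu1, mu = fmean(before), fmean(after), fmean(values)
    n = len(values)
    # Maximum-likelihood variances under the null (one mean)
    # and under the alternative (two means, shared variance).
    var_null = sum((v - mu) ** 2 for v in values) / n
    var_alt = (
        sum((v - mu0) ** 2 for v in before) + sum((v - mu1) ** 2 for v in after)
    ) / n
    llr = n * math.log(var_null / var_alt)
    # Chi-square survival function with 2 degrees of freedom (assumed here).
    p_value = math.exp(-llr / 2)
    return llr, p_value


# Simulated series: the mean jumps from 1 to 2 after 30 points.
rng = random.Random(10)
series = [rng.gauss(1.0, 0.1) for _ in range(30)] + [
    rng.gauss(2.0, 0.1) for _ in range(30)
]
cp = locate_increase(series)
llr, p_value = llr_test(series, cp)
print("last index before the shift:", cp, "| significant:", p_value < 0.01)
```

Here `cp` is the last index of the "before" segment, so a shift that starts at position 30 should be located at or near index 29. A decrease can be handled the same way by taking the maximum of the CUSUM series instead of the minimum.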
The full set of parameters for the detector method in CUSUMDetector, all of which are optional and have default values, are as follows:
- threshold: float, significance level;
- max_iter: int, maximum iteration in finding the changepoint;
- delta_std_ratio: float, the mean delta has to be larger than this parameter times std of the data to be considered as a change;
- min_abs_change: int, minimal absolute delta between mu0 and mu1;
- start_point: int, the start idx of the changepoint, None means the middle of the time series;
- change_directions: list[str], a list containing either or both of 'increase' and 'decrease' to specify what type of change to detect;
- interest_window: list[int, int], a list containing the start and end of the interest window where we will look for a change point. Note that the llr will still be calculated using all data points;
- magnitude_quantile: float, the quantile for magnitude comparison; if None, the magnitude comparison is skipped;
- magnitude_ratio: float, comparable ratio;
- magnitude_comparable_day: float, maximal percentage of days that can have comparable magnitude to be considered as regression;
- return_all_changepoints: bool, return all the changepoints found, even the insignificant ones.
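To make the roles of delta_std_ratio and min_abs_change concrete, here is a small sketch of the two magnitude checks as they are described in the list above. The helper `passes_magnitude_checks` is hypothetical and is not part of Kats; in particular, which slice of the data the standard deviation is computed over is an assumption of this sketch (the whole series is used here).

```python
from statistics import fmean, pstdev


def passes_magnitude_checks(values, changepoint, delta_std_ratio, min_abs_change):
    """Return True if the mean shift at `changepoint` is large enough to report."""
    before = values[: changepoint + 1]
    after = values[changepoint + 1 :]
    delta = abs(fmean(after) - fmean(before))  # |mu1 - mu0|
    # Check 1: the shift must be large relative to the spread of the data.
    large_vs_noise = delta > delta_std_ratio * pstdev(values)
    # Check 2: the shift must also be large in absolute terms.
    large_in_absolute_terms = delta >= min_abs_change
    return large_vs_noise and large_in_absolute_terms


# A tiny series whose mean moves from 1.0 to 1.1 after index 4.
series = [1.0] * 5 + [1.1] * 5
small_ratio = passes_magnitude_checks(series, 4, delta_std_ratio=1.0, min_abs_change=0)
large_ratio = passes_magnitude_checks(series, 4, delta_std_ratio=3.0, min_abs_change=0)
needs_big_jump = passes_magnitude_checks(series, 4, delta_std_ratio=1.0, min_abs_change=0.5)
print(small_ratio, large_ratio, needs_big_jump)
```

Raising either parameter makes the detector more conservative: the same 0.1 shift passes with a loose ratio but is filtered out once the ratio or the absolute minimum is increased.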
CUSUMDetector - Basic Usage
from kats.consts import TimeSeriesData
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import logging
from importlib import reload  # the imp module is deprecated and was removed in Python 3.12
from datetime import datetime, timedelta
# import packages
from kats.detectors.cusum_detection import CUSUMDetector
# synthesize data with simulation
np.random.seed(10)
df_increase_decrease = pd.DataFrame(
    {
        'time': pd.date_range('2019-01-01', periods=60),
        'increase': np.concatenate([np.random.normal(1, 0.2, 30), np.random.normal(2, 0.2, 30)]),
        'decrease': np.concatenate([np.random.normal(1, 0.3, 50), np.random.normal(0.5, 0.3, 10)]),
    }
)
# build the Kats TimeSeriesData object from the 'time' and 'increase' columns
tsd = TimeSeriesData(df_increase_decrease.loc[:,['time','increase']])
# create a detector
detector = CUSUMDetector(tsd)
# detect change points (upward shifts only)
change_points = detector.detector(change_directions=["increase"])
# plot the detection result
detector.plot(change_points)
plt.xticks(rotation=45)
plt.show()
1.2 BOCPDetector
Bayesian Online Change Point Detection (BOCPD) is a method for detecting sudden changes in a time series that persist over time. Our implementation is faithful to "Bayesian Online Changepoint Detection" (Adams & MacKay, 2007). There are a couple of properties that distinguish BOCPD from the other change point detection methods supported in Kats:
- Online Model: This detection does not need to know the entire series a priori. It only needs to look a few steps ahead (specified by the lag parameter in the detector method) to make predictions, and it revises its predictions as new data arrives.
- Bayesian Model: The user can specify prior beliefs about the probability of a changepoint (using the changepoint_prior parameter in the detector method) and specify the parameters of the underlying probability model that generates the time series (using the model_parameters parameter in the detector method). Right now we support 3 different types of underlying probability models (specified using the model parameter in the detector method).
The basic idea of this detection method is to use Bayesian inference to decide whether the next point is improbable. This requires the user to specify (or use default values for) the probability of a change point and the underlying predictive model (UPM) that generates the incoming data points in the time series. Currently we support three different types of underlying models:
- Normal Distribution (unknown mean, known variance)
- Trend Change Distribution
- Poisson Process Model
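For intuition, the run-length recursion from Adams & MacKay can be sketched in a few lines for the first model (normal data with unknown mean and known variance). This is a generic illustration, not the Kats implementation: the function name and the arguments `hazard`, `prior_mean`, `prior_precision`, and `known_var` are names chosen for this sketch, with `hazard` playing the role of the prior probability of a changepoint at each step.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def bocpd_map_run_lengths(values, hazard, prior_mean, prior_precision, known_var):
    """Return the most probable run length after each observation."""
    run_probs = [1.0]  # P(run length = r); we start with r = 0
    # Posterior (mean, precision) of the segment mean, one entry per run length.
    params = [(prior_mean, prior_precision)]
    map_runs = []
    for x in values:
        # Predictive density of x under each current run length.
        preds = [normal_pdf(x, m, known_var + 1.0 / p) for m, p in params]
        joint = [rp * d for rp, d in zip(run_probs, preds)]
        # Either the run grows by one, or a changepoint resets it to zero.
        grown = [j * (1.0 - hazard) for j in joint]
        reset = sum(joint) * hazard
        new_probs = [reset] + grown
        total = sum(new_probs)
        run_probs = [q / total for q in new_probs]
        # Conjugate update for every run that absorbed x;
        # run length 0 restarts from the prior.
        params = [(prior_mean, prior_precision)] + [
            ((p * m + x / known_var) / (p + 1.0 / known_var), p + 1.0 / known_var)
            for m, p in params
        ]
        map_runs.append(max(range(len(run_probs)), key=run_probs.__getitem__))
    return map_runs


# Level shift from 0 to 5 at index 40. A deterministic wiggle stands in
# for noise so that the result is reproducible.
series = [0.3 * math.sin(i) for i in range(40)] + [
    5.0 + 0.3 * math.sin(i) for i in range(40, 80)
]
map_runs = bocpd_map_run_lengths(
    series, hazard=0.01, prior_mean=0.0, prior_precision=0.01, known_var=0.25
)
# Index at which the current (most probable) run started.
estimated_start = (len(series) - 1) - map_runs[-1] + 1
print("current run length:", map_runs[-1], "| run started at index:", estimated_start)
```

The most probable run length grows by one per observation while the data stay consistent with the current segment, and collapses when a point arrives that the current segment cannot explain. That collapse is what signals a changepoint.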
The full set of parameters for the detector method in BOCPDetector, all of which are optional and have default values, are as follows:
- model: This specifies the underlying probabilistic model (UPM) that generates the data within each segment. Currently, allowed models are:
  - NORMAL_KNOWN_MODEL: Normal model with variance known. Use this to find level shifts in normally distributed data.
  - TREND_CHANGE_MODEL: This model assumes each segment is generated from ordinary linear regression. Use this model to understand changes in slope, or trend, in a time series.
  - POISSON_PROCESS_MODEL: This assumes a Poisson generative model. Use this for count data, where most of the values are close to zero.
- model_parameters: Model Parameters correspond to specific parameters for a specific model. They are defined in the NormalKnownParameters, TrendChangeParameters, PoissonModelParameters classes.
- lag: integer referring to the lag in reporting the changepoint. We report the changepoint after seeing "lag" number of data points. Higher lag gives greater certainty that this is indeed a changepoint. Lower lag will detect the changepoint faster. This is the trade-off.
- changepoint_prior: This is a Bayesian algorithm. Hence, this parameter specifies the prior belief on the probability that a given point is a changepoint. For example, if you believe 10% of your data points are changepoints, you can set this to 0.1.
- threshold: We report the probability of observing a changepoint at each instant. The actual changepoints are the points whose probability exceeds this threshold.
- debug: This surfaces additional information, such as the plots of predicted means and variances, which allows the user to debug why changepoints were not properly detected.
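The interplay between threshold and lag can be illustrated with a toy helper. This is a hypothetical sketch of the behavior described in the list above, not a Kats function: it takes a list of per-point changepoint probabilities, keeps those above the threshold, and assumes a changepoint at index i can only be reported once the point at index i + lag has been observed.

```python
def report_changepoints(probabilities, threshold, lag):
    """Return (changepoint_index, earliest_report_index) pairs."""
    return [
        (i, i + lag)
        for i, p in enumerate(probabilities)
        # Keep points above the threshold that already have `lag`
        # later observations available to confirm them.
        if p > threshold and i + lag < len(probabilities)
    ]


probs = [0.01, 0.02, 0.90, 0.03, 0.60, 0.02]
fast = report_changepoints(probs, threshold=0.5, lag=1)
cautious = report_changepoints(probs, threshold=0.5, lag=2)
print(fast)
print(cautious)
```

With a lag of 1 both candidate points can already be reported. With a lag of 2 the later candidate is still waiting for more data, which is the certainty-versus-speed trade-off mentioned for the lag parameter.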
Note: When using the normal distribution for a UPM, there are two ways to pick the prior parameters:
- Use an empirical prior and estimate the parameters from the data (set attribute empirical=True in NormalKnownParameters)
- Specify the mean and precision of the distribution
BOCPDetector - Basic Usage
from kats.utils.simulator import Simulator
sim = Simulator(n=450, start='2020-01-01', freq='H')
ts_bocpd = sim.level_shift_sim(noise=0.05, seasonal_period=1)
# plot the simulated data
ts_bocpd.plot(cols=['value'])
Now we run BOCPD to find the change points in this simulated time series:
from kats.detectors.bocpd import BOCPDetector, BOCPDModelType, TrendChangeParameters
# Initialize the detector
detector = BOCPDetector(ts_bocpd)
changepoints = detector.detector(
    model=BOCPDModelType.NORMAL_KNOWN_MODEL  # this is the default choice
)
# Plot the data
detector.plot(changepoints)
plt.xticks(rotation=45)
plt.show()
kats API
kats
- kats.consts module
- kats.detectors package
- kats.detectors.bocpd module
- kats.detectors.bocpd_model module
- kats.detectors.changepoint_evaluator module
- kats.detectors.cusum_detection module
- kats.detectors.cusum_model module
- kats.detectors.detector module
- kats.detectors.detector_consts module
- kats.detectors.hourly_ratio_detection module
- kats.detectors.outlier module
- kats.detectors.prophet_detector module
- kats.detectors.residual_translation module
- kats.detectors.robust_stat_detection module
- kats.detectors.seasonality module
- kats.detectors.stat_sig_detector module
- kats.detectors.trend_mk module
- kats.detectors module
- kats.graphics package
- kats.models package
- kats.models.arima module
- kats.models.bayesian_var module
- kats.models.ensemble package
- kats.models.harmonic_regression module
- kats.models.holtwinters module
- kats.models.linear_model module
- kats.models.lstm module
- kats.models.metalearner package
- kats.models.model module
- kats.models.nowcasting package
- kats.models.prophet module
- kats.models.quadratic_model module
- kats.models.reconciliation package
- kats.models.sarima module
- kats.models.stlf module
- kats.models.theta module
- kats.models.var module
- kats.models module
- kats.tsfeatures package
- kats.utils package