Anomaly detection (or outlier detection) is a common problem in many industries such as finance (card fraud detection), cyber security (intrusion detection), manufacturing (fault detection) or medicine (anomalous ECG signal). In many of these applications, the training data collected take the form of time series. In this post, we will review the different anomaly detection approaches in time series data.
1. Classifying the outlier detection techniques
1.1. Input data
- Univariate time series: a single set of observations varying with time.
- Multivariate time series: two or more sets of observations recorded over the same period of time.
1.2. Outlier type
- Point outlier: when one specific point behaves unusually at a specific time compared to the other values.
(a) O1 and O2 are univariate point outliers.
(b) O1 and O2 are multivariate point outliers. O3 is a univariate point outlier in a multivariate time series.
- Subsequence outlier: when many consecutive points behave unusually
(a) O1 and O2 are univariate subsequence outliers.
(b) O1 and O2 are multivariate subsequence outliers. O3 is a univariate subsequence outlier in a multivariate time series.
- Outlier time series: when an entire time series is behaving unusually (only for multivariate time series)
Variable 4 is an outlier time series in a multivariate time series.
1.3. Nature of the method
- Univariate detection method: can only consider a single time-dependent variable (can be applied to univariate or multivariate time series)
- Multivariate detection method: can work with more than one time-dependent variable (can only be applied to multivariate time series)
2. Point outlier detection in univariate time series
The detection methods in this section can be characterised by two properties:
- Temporal method: the temporal order of the time series matters for the performance of the method (a non-temporal method gives the same results if the time series is shuffled).
- Streaming method: the method is able to decide whether a new incoming data point is an outlier as soon as it arrives.
2.1. Model-based techniques
A point x_t is flagged as an anomaly at time t if the distance between its observed value and its expected value exceeds a predefined threshold. These techniques fit a model to calculate the expected value, hence the name.
- Estimation model-based methods: if the expected value is calculated using past, current and future data.
- Prediction model-based methods: if the expected value is calculated using only the past data (suitable for detecting anomalies in streaming incoming data).
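As a minimal sketch of the estimation flavour, the expected value at time t can be taken as the median of a window centred on t, so that both past and future points contribute. The function name, the window half-width and the threshold below are illustrative assumptions, not values taken from any of the cited papers.

```python
from statistics import median


def median_estimation_outliers(series, half_window=2, threshold=3.0):
    """Flag x_t when |x_t - expected_t| > threshold.

    expected_t is the median of a window centred on t, so it uses past,
    current AND future points (an estimation model, not usable on a stream).
    """
    outliers = []
    for t, x_t in enumerate(series):
        lo = max(0, t - half_window)
        hi = min(len(series), t + half_window + 1)
        expected = median(series[lo:hi])  # expected value at time t
        if abs(x_t - expected) > threshold:
            outliers.append(t)
    return outliers


series = [10.0, 10.2, 9.9, 10.1, 25.0, 10.0, 9.8, 10.3, 10.1]
print(median_estimation_outliers(series))  # [4]
```

Because the window looks ahead of t, this sketch can only run once the neighbouring future points are available, which is exactly what separates it from a prediction model.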
|Paper||Estimation model-based methods|
|Basu and Meckesheimer 2007||Calculate the expected value with the median|
|Mehrang et al. 2015||Calculate the expected value with the Median Absolute Deviation|
|Dani et al. 2015||Calculate the expected value with the mean of time series segments|
|Chen et al. 2010||Model with B-splines or kernels|
|Carter and Streilein 2012||Model with the Exponentially Weighted Moving Average (EWMA)|
|Song et al. 2015, Zhang et al. 2016||Model with slope constraints|
|Mehrang et al. 2015||Assume normality in the data if outliers are removed|
|Reddy et al. 2017||Model with Gaussian Mixture Models (GMM)|
|Hochenbaum et al. 2017||Use a predictive model (STL decomposition), then analyse the residuals to identify the outliers|
|Akouemo and Povinelli 2014||Use a predictive model (ARIMA), then analyse the residuals to identify the outliers|
|Akouemo and Povinelli 2016||Use a predictive model (linear regression), then analyse the residuals to identify the outliers|
|Akouemo and Povinelli 2017||Use a predictive model (ANN), then analyse the residuals to identify the outliers|
|Paper||Prediction model-based methods|
|Munir et al. 2019||DeepAnT: CNN forecasting future values and compare with new data point|
|Hill and Minsker 2010||Use autoregressive model to calculate a confidence interval|
|Zhang et al. 2012||Use ARIMA model to calculate a confidence interval|
|Siffer et al. 2017||SPOT and DSPOT: detect anomalies with extreme value theory|
|Xu et al. (2016, 2017)||Use Student-t processes to compute the prediction interval and update the model incrementally for each newly arrived data point|
|Ahmad et al. 2017||Use the Hierarchical Temporal Memory (HTM) network to predict the new data point and update the model incrementally|
|Hundman et al. 2018||Use an LSTM to calculate the predicted value and compare it with the new data point.|
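The prediction flavour can be sketched with a deliberately naive forecaster: the mean of the last few accepted points stands in for the AR, ARIMA or neural predictors listed above, and the interval half-width is k times the standard deviation of those points. The class name, the window size and k are assumptions made for illustration only.

```python
from collections import deque
from statistics import mean, stdev


class StreamingDetector:
    """Flag a new point if it falls outside an interval built from past data only."""

    def __init__(self, window=10, k=3.0):
        self.history = deque(maxlen=window)  # most recent accepted points
        self.k = k

    def update(self, x):
        is_outlier = False
        if len(self.history) >= 3:  # need a few points before judging
            predicted = mean(self.history)  # naive one-step-ahead prediction
            spread = stdev(self.history)
            is_outlier = abs(x - predicted) > self.k * spread
        if not is_outlier:
            # Design choice: flagged points are kept out of the history
            # so that they do not widen future intervals.
            self.history.append(x)
        return is_outlier


detector = StreamingDetector(window=10, k=3.0)
stream = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 15.0, 10.1]
flags = [detector.update(x) for x in stream]
print(flags)
```

Each call to `update` only looks at points that arrived earlier, which is what makes this family suitable for streaming data.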
2.2. Density-based methods
In these methods, a data point Xt is flagged as an anomaly if fewer than k neighbors lie within a distance R of it, i.e. in the interval [Xt – R, Xt + R]. Since we are dealing with time series, the neighbors are counted inside a rolling window.
For example, for a sliding window of length = 11, a number of neighbors k = 3 and a threshold R = 0.5, the points S4 and I10 are not outliers but O13 is an outlier. However, when considering the window at time t = 17, the same data point (now labelled I13) is no longer an outlier.
|Paper||Density-based methods|
|Fassetti (2007, 2010)||Use a sliding window to detect outliers in streaming time series.|
|Ishimtsev et al. 2017||Use a sliding window to detect outliers in streaming time series.|
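The neighbor-counting rule can be sketched as follows. The function name and the toy series are assumptions; the defaults reuse the window length, k and R from the example above, but the data do not reproduce the original figure.

```python
def window_outliers(series, end, window=11, k=3, radius=0.5):
    """Return the indices flagged as outliers in the window ending at `end` (inclusive).

    A point is an outlier if fewer than k other points of the window
    lie within `radius` of its value.
    """
    start = max(0, end - window + 1)
    flagged = []
    for t in range(start, end + 1):
        neighbours = sum(
            1
            for s in range(start, end + 1)
            if s != t and abs(series[s] - series[t]) <= radius
        )
        if neighbours < k:
            flagged.append(t)
    return flagged


# Twelve points around 1.0, then the level shifts to about 5.0.
series = [1.0] * 12 + [5.0, 5.1, 4.9, 5.2, 5.0]
print(window_outliers(series, end=12))  # [12]: no neighbours yet at the new level
print(window_outliers(series, end=16))  # []: later points now support index 12
```

The same point changes status as the window slides, which mirrors the behaviour described for the point at t = 13.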
2.3. Histogramming
These methods consist of building a histogram representation of the time series, then detecting the points whose removal from the time series produces a histogram with lower error than the original.
Jagadish et al. 1999
Muthukrishnan et al. 2004
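A much simplified sketch of the idea is given below. It assumes fixed, equal-width buckets along the time axis, each summarised by its mean, and measures the histogram error as the sum of squared deviations. The cited papers do not necessarily use this bucketing, and all names are illustrative.

```python
def sse(values):
    """Sum of squared deviations of the values from their mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def histogram_error(series, bucket_size, skip=None):
    """Error of a fixed-bucket histogram, optionally ignoring the point at index `skip`."""
    total = 0.0
    for start in range(0, len(series), bucket_size):
        stop = min(start + bucket_size, len(series))
        bucket = [series[t] for t in range(start, stop) if t != skip]
        total += sse(bucket)
    return total


def best_deviant(series, bucket_size=4):
    """Return (index, gain) for the point whose removal lowers the error the most."""
    base = histogram_error(series, bucket_size)
    gains = [
        (base - histogram_error(series, bucket_size, skip=t), t)
        for t in range(len(series))
    ]
    gain, t = max(gains)
    return t, gain


series = [1, 1, 1, 1, 2, 2, 9, 2, 3, 3, 3, 3]
index, gain = best_deviant(series, bucket_size=4)
print(index, gain)
```

With fixed buckets, removing a point only changes the error of its own bucket, which keeps the sketch short; choosing bucket boundaries adaptively would change the search but not the principle.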
3. Point outlier detection in multivariate time series
3.1. Univariate techniques
Since a multivariate time series is composed of more than one time-dependent variable, a univariate analysis can be performed for each variable to detect univariate point outliers, without considering dependencies that may exist between the variables. The same methods discussed in section 2 can be applied.
However, ignoring the correlation dependencies in a multivariate time series leads to a loss of information. One solution consists of applying dimensionality reduction techniques to simplify the multivariate time series into a lower-dimensional representation. The new multivariate time series is composed of uncorrelated variables, and univariate detection techniques can then be applied to each of them.
|Paper||Dimensionality reduction for multivariate time series|
|Papadimitriou et al. 2005||Apply PCA to decorrelate the time series, then use an AR prediction model.|
|Galeano et al. 2006||Reduce the dimensionality with projection pursuit, then use univariate statistical tests on each projected univariate time series [Chen and Liu 1993; Fox 1972]|
|Baragona and Battaglia 2007||Use Independent Component Analysis (ICA), then calculate the mean and standard error on the reduced data to identify outliers|
|Lu et al. 2018||Reduce the input multivariate time series into a single time-dependent variable, then identify outliers by looking at the correlation between 2 adjacent data points|
|Shahriar et al. 2016||Reduce the input multivariate time series into a univariate series, then calculate the expected value to detect outliers|
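The reduce-then-detect pattern can be sketched with PCA followed by a univariate test on each component. The sketch assumes NumPy is available, and the univariate detector (a robust z-score based on the median and the MAD) is a stand-in chosen for brevity rather than the detector used in any of the papers above.

```python
import numpy as np


def pca_scores(X):
    """Project the rows of X (time x variables) onto its principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the centred data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T  # columns are uncorrelated component series


def univariate_outliers(x, threshold=3.0):
    """Indices whose robust z-score (median / MAD based) exceeds the threshold."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad == 0:
        return []
    z = np.abs(x - med) / (1.4826 * mad)
    return np.flatnonzero(z > threshold).tolist()


def pca_then_univariate(X, threshold=3.0):
    """Run the univariate detector on every principal-component series."""
    scores = pca_scores(X)
    return {j: univariate_outliers(scores[:, j], threshold) for j in range(scores.shape[1])}


# Two strongly correlated variables; at t = 30 the second one breaks the relation.
t = np.arange(50)
x1 = np.sin(t / 5)
x2 = 2 * x1 + 0.01 * np.cos(1.7 * t)
X = np.column_stack([x1, x2])
X[30, 1] += 3.0
result = pca_then_univariate(X)
print(result)
```

In this toy series the anomaly breaks the relation between the two variables, so it is expected to stand out on the second component rather than the first.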
3.2. Multivariate techniques
These methods take into account the dependencies between the variables of a multivariate time series.
The model-based multivariate methods estimate an expected value and compare it with the actual value. There are 2 main types:
- estimation model: use past, current and future values (typically, autoencoder)
- prediction model: only use past values (better suited to detecting outliers in a streaming fashion)
|Paper||Estimation model-based multivariate methods|
|Sakurada and Yairi 2014||Autoencoder is used to reconstruct the original time series but fails to reconstruct an outlier.|
|Kieu et al. 2018||Extract features with sliding windows before applying the autoencoder (to account for the temporal dependencies)|
|Su et al. 2019||Use a variational autoencoder + gated recurrent unit to reconstruct the expected value|
|Zhou et al. [2019, 2018b]||Use a non-parametric model to estimate the expected value.|
|Paper||Prediction model-based multivariate methods|
|Zhou et al. 2016||Contextual Hidden Markov Model (CHMM) is used to estimate the expected value incrementally.|
|Munir et al. 2019||DeepAnT: use a CNN prediction model to detect point outliers in multivariate time series|
Dissimilarity-based methods consist of measuring the dissimilarity between 2 multivariate points to identify outliers.
Cheng et al. [2008, 2009]
Li et al. 
Histogramming in the multivariate case is a generalisation of the univariate case: the aim is to detect the vectors that should be removed so that the compressed representation (histogram) of the remaining data is improved.
Muthukrishnan et al. 
4. Subsequence outliers
The aim is to identify a set of consecutive points that jointly behave unusually. This is more challenging than point outlier detection.
4.1. Univariate time series
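4.1.1. Discord detection
A discord is the subsequence that is most dissimilar to the rest of the time series, i.e. the one whose nearest non-overlapping neighbor is the farthest away. Comparing every subsequence with every other one (brute force) is expensive, and the HOT-SAX algorithm speeds up this search.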
|Paper||Discord detection for univariate subsequence outlier|
|Keogh et al. 2005, 2007||Compare each subsequence with the other (Brute-force + HOT-SAX)|
|Lin et al. 2005||Compare each subsequence with the other (Brute-force + HOT-SAX)|
|Bu et al. 2007||HOT-SAX|
|Fu et al. 2006||HOT-SAX|
|Li et al. 2013||HOT-SAX|
|Sanchez and Bustos 2014||HOT-SAX|
|Chau et al. 2018||HOT-SAX|
|Liu et al. ||HOT-SAX|
|Chau et al. ||HOT-SAX|
|Senin et al. ||variable-length discords|
|Keogh et al. 2001||Piecewise Aggregate Approximation (PAA)|
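The brute-force baseline that HOT-SAX accelerates can be sketched as below: every subsequence is compared with every non-overlapping subsequence, and the one whose nearest neighbor is farthest away is returned. The function names and the toy series are illustrative assumptions.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_discord(series, m):
    """Return (start, distance) of the length-m subsequence that is farthest
    from its nearest non-overlapping neighbour."""
    n = len(series) - m + 1
    best_start, best_dist = None, -1.0
    for i in range(n):
        nearest = math.inf
        for j in range(n):
            if abs(i - j) >= m:  # skip overlapping (trivial) matches
                d = euclidean(series[i:i + m], series[j:j + m])
                nearest = min(nearest, d)
        if nearest != math.inf and nearest > best_dist:
            best_start, best_dist = i, nearest
    return best_start, best_dist


# Periodic signal with one corrupted value at index 21.
series = [0, 1, 0, -1] * 10
series[21] = 5
start, dist = brute_force_discord(series, m=4)
print(start, dist)
```

The double loop makes the cost quadratic in the number of subsequences, which is why heuristic orderings such as the one used by HOT-SAX matter on long series.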
4.1.2. Dissimilarity-based detection
These techniques are based on the direct comparison of subsequences using a reference of normality.
The reference of normality can be either the same time series, an external time series or the previous subsequence.
|Paper||Dissimilarity-based detection for univariate subsequence outlier|
|Chen and Cook ||Same time series|
|Chen et al. ||Same time series|
|Izakian and Pedrycz ||Same time series|
|Silva et al. 2013||Same time series|
|Wang et al. ||Same time series|
|Ren et al. ||Same time series|
|Moonesinghe and Tan 2006||Same time series|
|Jones et al. ||External time series|
|Carrera et al. [2016, 2019]||External time series|
|Longoni et al. 2018||External time series|
|Wei et al. ||Previous subsequence|
|Kumar et al. ||Previous subsequence|
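A minimal sketch of the external-reference variant: each subsequence of the series under test is compared with all subsequences of a reference series assumed to be normal, and it is flagged when even its closest match is farther than a threshold. The Euclidean distance, the threshold and the function names are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dissimilarity_outliers(series, reference, m, threshold, step=1):
    """Return start indices of length-m subsequences of `series` whose nearest
    subsequence in `reference` is farther than `threshold`."""
    ref_subs = [reference[j:j + m] for j in range(len(reference) - m + 1)]
    flagged = []
    for i in range(0, len(series) - m + 1, step):
        candidate = series[i:i + m]
        nearest = min(euclidean(candidate, r) for r in ref_subs)
        if nearest > threshold:
            flagged.append(i)
    return flagged


reference = [0, 1, 0, -1] * 5  # external series assumed to be normal
series = [0, 1, 0, -1, 0, 1, 0, -1, 3, 3, 3, 3, 0, 1, 0, -1]
flagged = dissimilarity_outliers(series, reference, m=4, threshold=1.0, step=4)
print(flagged)  # [8]
```

Using the same time series or the previous subsequence as the reference only changes how `ref_subs` is built; the comparison step stays the same.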
4.1.3. Prediction Model-based
These methods intend to build a prediction model that captures the dynamics of the series using past data and thus make predictions of the future. Subsequences that are far from those predictions are flagged as outliers.
Munir et al.: use a CNN
4.1.4. Frequency-based
A subsequence is an outlier if it does not appear as frequently as expected.
|Paper||Frequency-based detection for univariate subsequence outlier|
|Keogh et al.|
|Rasheed and Alhajj|
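One way to make "as frequently as expected" concrete is sketched below. It assumes the series is first discretised into letters with fixed breakpoints, and that the expected count of each word is its count in a reference series scaled by the ratio of lengths. Both choices are assumptions for illustration, not the procedure of the papers listed above.

```python
from collections import Counter


def to_symbols(series, breakpoints=(-0.5, 0.5)):
    """Map each value to a letter: 'a' below the first breakpoint, then 'b', 'c', ..."""
    return "".join(chr(ord("a") + sum(x > b for b in breakpoints)) for x in series)


def word_counts(symbols, w):
    """Count every word of length w found with a sliding window."""
    return Counter(symbols[i:i + w] for i in range(len(symbols) - w + 1))


def surprising_words(test, reference, w=3, min_gap=2.0):
    """Return {word: (observed, expected)} for words whose observed count in
    `test` differs from the expected count by at least `min_gap`."""
    test_counts = word_counts(to_symbols(test), w)
    ref_counts = word_counts(to_symbols(reference), w)
    scale = (len(test) - w + 1) / (len(reference) - w + 1)
    surprising = {}
    for word in set(test_counts) | set(ref_counts):
        expected = ref_counts[word] * scale  # count we would expect in `test`
        observed = test_counts[word]
        if abs(observed - expected) >= min_gap:
            surprising[word] = (observed, expected)
    return surprising


reference = [0, 1, 0, -1] * 10
test = [0, 1, 0, -1] * 2 + [1, 1, 1, 1] + [0, 1, 0, -1] * 2
result = surprising_words(test, reference)
print(sorted(result))
```

Words can be surprising in both directions: the flat run produces a word never seen in the reference, while some regular words become rarer than expected because the run displaced them.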
4.1.5. Information Theory
The methods assume that a subsequence that occurs frequently is less surprising and thus carries less information than a rare subsequence. Therefore, the aim is to find infrequent but still repetitive subsequences with rare symbols, using the same time series as the reference of normality.