Anomaly detection for time series

Anomaly detection (or outlier detection) is a common problem in many industries such as finance (card fraud detection), cyber security (intrusion detection), manufacturing (fault detection) or medicine (anomalous ECG signal). In many of these applications, the training data collected take the form of time series. In this post, we will review the different anomaly detection approaches in time series data.

1. Classifying the outlier detection techniques

1.1. Input data

  • Univariate time series: only one set of observations varying with time.
  • Multivariate time series: composed of two or more sets of observations recorded over the same period of time.

1.2. Outlier type

  • Point outlier: when a single point behaves unusually at a specific time compared to the other values.

(a) O1 and O2 are univariate point outliers.
(b) O1 and O2 are multivariate point outliers. O3 is a univariate point outlier in a multivariate time series.

  • Subsequence outlier: when several consecutive points jointly behave unusually.

(a) O1 and O2 are univariate subsequence outliers.
(b) O1 and O2 are multivariate subsequence outliers. O3 is a univariate subsequence outlier in a multivariate time series.

  • Outlier time series: when an entire time series behaves unusually (only possible in multivariate time series).

Variable 4 is an outlier time series in a multivariate time series.

1.3. Nature of the method

  • Univariate detection method: can only consider a single time-dependent variable (can be applied to univariate or multivariate time series)
  • Multivariate detection method: can work with more than one time-dependent variable (can only be applied to multivariate time series)

2. Point outlier detection in univariate time series

  • Temporal methods: the temporal order of the time series matters for the result (non-temporal methods give the same results if the time series is shuffled).
  • Streaming methods: able to decide whether each new incoming data point is an outlier as soon as it arrives.

2.1. Model-based techniques

An anomaly is detected at time t if the distance between the observed value and its expected value is higher than a predefined threshold. These techniques fit a model to calculate the expected value, hence the name.

  • Estimation model-based methods: if the expected value is calculated using past, current and future data.
  • Prediction model-based methods: if the expected value is calculated using only the past data (suitable for detecting anomalies in streaming incoming data).

Estimation model-based methods:

  • Basu and Meckesheimer 2007: calculate the expected value with the median
  • Mehrang et al. 2015: calculate the expected value with the Median Absolute Deviation (MAD)
  • Dani et al. 2015: calculate the expected value with the mean of time series segments
  • Chen et al. 2010: model with B-splines or kernels
  • Carter and Streilein 2012: model with the Exponentially Weighted Moving Average (EWMA)
  • Song et al. 2015, Zhang et al. 2016: model with slope constraints
  • Mehrang et al. 2015: assume normality in the data once the outliers are removed
  • Reddy et al. 2017: model with Gaussian Mixture Models (GMM)
  • Hochenbaum et al. 2017: use a predictive model (STL decomposition), then analyse the residuals to identify the outliers
  • Akouemo and Povinelli 2014: use a predictive model (ARIMA), then analyse the residuals to identify the outliers
  • Akouemo and Povinelli 2016: use a predictive model (linear regression), then analyse the residuals to identify the outliers
  • Akouemo and Povinelli 2017: use a predictive model (ANN), then analyse the residuals to identify the outliers

Prediction model-based methods:

  • Munir et al. 2019: DeepAnT, a CNN forecasting future values that are compared with each new data point
  • Hill and Minsker 2010: use an autoregressive model to calculate a confidence interval
  • Zhang et al. 2012: use an ARIMA model to calculate a confidence interval
  • Siffer et al. 2017: SPOT and DSPOT, detect anomalies with extreme value theory
  • Xu et al. 2016, 2017: use Student-t processes to compute the prediction interval and update the model incrementally for each newly arrived data point
  • Ahmad et al. 2017: use a Hierarchical Temporal Memory (HTM) network to predict the new data point and update the model incrementally
  • Hundman et al. 2018: use an LSTM to calculate the predicted value and compare it with the new data point
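
To make the estimation model-based idea concrete, here is a minimal sketch (not taken from any of the papers above; the helper name `median_mad_outliers` and the thresholds are illustrative): the rolling median plays the role of the expected value, and a point is flagged when it deviates by more than a few median absolute deviations.

```python
import numpy as np

def median_mad_outliers(x, window=11, n_mads=3.0):
    """Estimation model-based sketch: flag x[t] when |x[t] - rolling median|
    exceeds n_mads times the rolling MAD. Uses past, current and future
    values, so it is not suitable for streaming detection."""
    x = np.asarray(x, dtype=float)
    half = window // 2
    flags = np.zeros(len(x), dtype=bool)
    for t in range(len(x)):
        seg = x[max(0, t - half): t + half + 1]
        med = np.median(seg)
        mad = np.median(np.abs(seg - med))
        thresh = n_mads * max(mad, 1e-9)  # guard against a zero MAD
        flags[t] = abs(x[t] - med) > thresh
    return flags

# A smooth series with one spike: only the spike is flagged.
series = np.sin(np.linspace(0, 6, 60))
series[30] += 5.0
print(np.where(median_mad_outliers(series))[0])  # -> [30]
```

The median and MAD are robust statistics, so the spike itself barely shifts the expected value computed in the windows that contain it.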

2.2. Density-based methods

In these methods, an anomaly is detected at a data point Xt if fewer than k neighbors lie within a distance R of it (this is computed inside a sliding window, since we are dealing with time series).

For example, with a sliding window of length 11, k = 3 neighbors and a threshold R = 0.5, the points S4 and I10 are not outliers but O13 is. However, when considering the window at time t = 17, the same point (now labelled I13) is no longer an outlier.
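
This rule can be sketched as follows (a hypothetical illustration, not a specific paper's algorithm; the helper name `density_outliers` and the demo parameters are invented). As in the example above, later windows overwrite earlier verdicts, so a point can stop being an outlier once newer data arrive.

```python
import numpy as np

def density_outliers(x, window=11, k=3, R=0.5):
    """Density-based sketch: inside each sliding window, a point is an
    outlier if fewer than k other points of the window lie within
    distance R of it. The most recent window containing a point
    determines its final status."""
    x = np.asarray(x, dtype=float)
    flags = np.zeros(len(x), dtype=bool)
    for start in range(len(x) - window + 1):
        seg = x[start:start + window]
        for i, v in enumerate(seg):
            neighbors = np.sum(np.abs(seg - v) <= R) - 1  # exclude the point itself
            flags[start + i] = neighbors < k
    return flags

# One isolated value among near-constant data is flagged.
x = [0.0, 0.1, 0.2, 3.0, 0.1, 0.0, 0.2, 0.1]
print(np.where(density_outliers(x, window=5, k=2))[0])  # -> [3]
```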

Density-based methods:

  • Angiulli and Fassetti 2007, 2010: use a sliding window to detect outliers in streaming time series
  • Ishimtsev et al. 2017: use a sliding window to detect outliers in streaming time series

2.3. Histogramming

These methods consist in building a histogram representation of the time series, then detecting the points whose removal produces a histogram with a lower error than the original one.

  • Jagadish et al. 1999
  • Muthukrishnan et al. 2004
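
A minimal sketch of the idea (the helpers `histogram_error` and `histogram_outliers`, the bucket count and the gain threshold are all illustrative, not from the cited papers): compress the series into a fixed number of equal-size buckets, each represented by its mean, and flag the points whose removal reduces the compression error the most.

```python
import numpy as np

def histogram_error(x, n_buckets):
    """Sum of squared errors when x is compressed into n_buckets
    equal-size buckets, each replaced by its mean."""
    bounds = np.linspace(0, len(x), n_buckets + 1, dtype=int)
    return sum(np.sum((x[a:b] - x[a:b].mean()) ** 2)
               for a, b in zip(bounds[:-1], bounds[1:]) if b > a)

def histogram_outliers(x, n_buckets=4, min_gain=1.0):
    """Flag points whose removal lowers the compression error by more
    than min_gain (a simplified, greedy take on the idea)."""
    x = np.asarray(x, dtype=float)
    base = histogram_error(x, n_buckets)
    gains = np.array([base - histogram_error(np.delete(x, i), n_buckets)
                      for i in range(len(x))])
    return gains > min_gain

# Removing the single aberrant value makes the histogram near-perfect.
x = np.ones(16)
x[5] = 10.0
print(np.where(histogram_outliers(x))[0])  # -> [5]
```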

3. Point outlier detection in multivariate time series

3.1. Univariate techniques

Since a multivariate time series is composed of more than one time-dependent variable, a univariate analysis can be performed for each variable to detect univariate point outliers, without considering dependencies that may exist between the variables. The same methods discussed in Section 2 can be applied.

However, ignoring the correlation dependencies in multivariate time series leads to a loss of information. One solution consists in applying dimensionality reduction techniques to simplify the multivariate time series into a lower-dimensional representation. The new multivariate time series is composed of uncorrelated variables, and univariate detection techniques can then be applied.

Dimensionality reduction for multivariate time series:

  • Papadimitriou et al. 2005: apply PCA to decorrelate the time series, then use an AR prediction model
  • Galeano et al. 2006: reduce the dimensionality with projection pursuit, then apply univariate statistical tests to each projected univariate time series [Chen and Liu 1993; Fox 1972]
  • Baragona and Battaglia 2007: use Independent Component Analysis (ICA), then calculate the mean and standard error on the reduced data to identify outliers
  • Lu et al. 2018: reduce the input multivariate time series to a single time-dependent variable, then identify outliers by looking at the correlation between two adjacent data points
  • Shahriar et al. 2016: reduce the input multivariate time series to a univariate series, then calculate the expected value to detect outliers
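
As an illustration of the PCA route (in the spirit of the approaches above, but not any specific paper's implementation; the helper `pca_pointwise_outliers` and the 3-sigma threshold are assumptions), the variables are decorrelated with PCA and a simple univariate z-score test is run on each component:

```python
import numpy as np

def pca_pointwise_outliers(X, n_sigma=3.0):
    """Decorrelate the variables with PCA (via SVD of the centred data),
    then apply a univariate z-score test to each resulting component.
    X has one row per time step and one column per variable."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T                    # uncorrelated projected series
    z = np.abs(scores) / scores.std(axis=0)
    return (z > n_sigma).any(axis=1)      # outlier in any component

# Two strongly correlated variables; the point at t = 50 breaks the
# correlation and is caught in the second (low-variance) component.
t = np.arange(100)
x1 = np.sin(0.3 * t)
X = np.column_stack([x1, 2 * x1 + 0.01 * np.cos(1.7 * t)])
X[50] += [1.0, -1.0]
print(np.where(pca_pointwise_outliers(X))[0])  # -> [50]
```

Note that the injected point is unremarkable in each variable taken alone; only the decorrelated representation exposes it.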

3.2. Multivariate techniques

These methods take into account the dependencies between the variables of a multivariate time series.

3.2.1. Model-based

The model-based multivariate methods estimate an expected value and compare it with the actual value. There are two main types:

  • estimation models: use past, current and future values (typically, autoencoders)
  • prediction models: use only past values (better suited to detecting outliers in a streaming fashion)

Estimation model-based multivariate methods:

  • Sakurada and Yairi 2014: an autoencoder is used to reconstruct the original time series; it fails to reconstruct outliers
  • Kieu et al. 2018: extract features with sliding windows before applying the autoencoder (to account for the temporal dependencies)
  • Su et al. 2019: use a variational autoencoder + gated recurrent unit to reconstruct the expected value
  • Zhou et al. 2018, 2019: use a non-parametric model to estimate the expected value

Prediction model-based multivariate methods:

  • Zhou et al. 2016: a Contextual Hidden Markov Model (CHMM) is used to estimate the expected value incrementally
  • Munir et al. 2019: DeepAnT, use a CNN prediction model to detect point outliers in multivariate time series
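
The reconstruction idea behind the autoencoder methods can be sketched with a linear stand-in: a rank-1 PCA projection acts as a minimal linear "autoencoder", and time steps whose reconstruction error is unusually large are flagged. The helper name and the mean + 3-sigma threshold below are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

def reconstruction_outliers(X, n_components=1, n_sigma=3.0):
    """Estimation model-based sketch: compress each time step with a
    linear 'autoencoder' (rank-n_components PCA) and flag time steps
    that reconstruct poorly."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components]          # encoder = decoder weights
    recon = (Xc @ W.T) @ W         # encode, then decode
    err = np.linalg.norm(Xc - recon, axis=1)
    return err > err.mean() + n_sigma * err.std()

# The point at t = 50 violates the correlation the 'autoencoder'
# learned, so its reconstruction error stands out.
t = np.arange(100)
x1 = np.sin(0.3 * t)
X = np.column_stack([x1, 2 * x1 + 0.01 * np.cos(1.7 * t)])
X[50] += [1.0, -1.0]
print(np.where(reconstruction_outliers(X))[0])  # -> [50]
```

A neural autoencoder generalises this by making the encoder and decoder non-linear, but the scoring principle (large reconstruction error means outlier) is the same.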

3.2.2. Dissimilarity-based

These methods consist in measuring the dissimilarity between two multivariate points to identify outliers.

  • Cheng et al. 2008, 2009
  • Li et al. 2009

3.2.3. Histogramming

It’s a generalisation of the univariate case, where the aim is to detect the vectors whose removal improves the compressed (histogram) representation of the remaining data.

  • Muthukrishnan et al. 2004

4. Subsequence outliers

The aim is to identify a set of consecutive points that jointly behave unusually. This is more challenging than point outlier detection.

4.1. Univariate time series

4.1.1. Discord detection

Discords are the subsequences that are maximally different from the rest of the subsequences in the series. They can be found by brute force or, more efficiently, with the HOT-SAX algorithm.

Discord detection for univariate subsequence outliers:

  • Keogh et al. 2005, 2007: compare each subsequence with the others (brute force + HOT-SAX)
  • Lin et al. 2005: compare each subsequence with the others (brute force + HOT-SAX)
  • Bu et al. 2007: HOT-SAX
  • Fu et al. 2006: HOT-SAX
  • Li et al. 2013: HOT-SAX
  • Sanchez and Bustos 2014: HOT-SAX
  • Chau et al. 2018: HOT-SAX
  • Buu and Anh 2011: HOT-SAX
  • Liu et al. 2009: HOT-SAX
  • Senin et al. 2015: variable-length discords
  • Keogh et al. 2001: Piecewise Aggregate Approximation (PAA)

4.1.2. Dissimilarity-based detection

These techniques are based on the direct comparison of subsequences using a reference of normality.

The reference of normality can be either the same time series, an external time series or the previous subsequence.

Dissimilarity-based detection for univariate subsequence outliers, grouped by reference of normality:

  • Same time series: Chen and Cook 2011; Chen et al. 2013; Izakian and Pedrycz 2013; Silva et al. 2013; Wang et al. 2018; Ren et al. 2017; Moonesinghe and Tan 2006
  • External time series: Jones et al. 2016; Carrera et al. 2016, 2019; Longoni et al. 2018
  • Previous subsequence: Wei et al. 2005; Kumar et al. 2005

4.1.3. Prediction Model-based

These methods build a prediction model that captures the dynamics of the series from past data and use it to forecast future values. Subsequences that are far from those predictions are flagged as outliers.

  • Munir et al. 2019: use a CNN
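
A tiny sketch of the same principle, with a linear AR model standing in for the CNN (the helper `ar_subsequence_score`, the order p and the demo data are all illustrative assumptions): fit the model on the history, roll it forward, and score a candidate subsequence by its distance to the forecast.

```python
import numpy as np

def ar_subsequence_score(history, subseq, p=3):
    """Prediction model-based sketch: fit a linear AR(p) model on the
    history by least squares, roll it forward len(subseq) steps, and
    return the distance between the forecast and the observed subsequence."""
    h = np.asarray(history, dtype=float)
    # design matrix: each row holds p lagged values, the target is the next value
    X = np.array([h[i:i + p] for i in range(len(h) - p)])
    y = h[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    buf = list(h[-p:])
    preds = []
    for _ in range(len(subseq)):
        nxt = float(np.dot(coef, buf[-p:]))
        preds.append(nxt)
        buf.append(nxt)
    return np.linalg.norm(np.asarray(subseq) - np.asarray(preds))

# A sinusoid is perfectly predictable by a linear AR model, so the true
# continuation scores near zero while a flat anomalous stretch scores high.
history = np.sin(0.3 * np.arange(100))
print(ar_subsequence_score(history, np.sin(0.3 * np.arange(100, 110))) <
      ar_subsequence_score(history, np.zeros(10)))  # -> True
```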

4.1.4. Frequency-based

A subsequence is an outlier if it does not appear as frequently as expected.

Frequency-based detection for univariate subsequence outliers:

  • Keogh et al. 2002
  • Rasheed and Alhajj 2014
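
A minimal sketch of the frequency idea (equal-width binning as a crude stand-in for a SAX discretisation; the helper `rare_subsequences` and its parameters are illustrative assumptions): discretise the series into symbols, form fixed-length words, and flag the subsequences whose word occurs less often than expected.

```python
import numpy as np
from collections import Counter

def rare_subsequences(x, m=4, n_bins=3, min_count=2):
    """Frequency-based sketch: map values to symbols with equal-width
    bins, slide a length-m window to form words, and flag subsequences
    whose word occurs fewer than min_count times."""
    x = np.asarray(x, dtype=float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)[1:-1]
    symbols = np.digitize(x, edges)
    words = [tuple(symbols[i:i + m]) for i in range(len(x) - m + 1)]
    counts = Counter(words)
    return [i for i, w in enumerate(words) if counts[w] < min_count]

# A repetitive pattern with one aberrant value: only the windows that
# touch it produce rare words.
x = np.tile([0.0, 1.0], 20)
x[20] = 5.0
print(rare_subsequences(x))  # -> [17, 18, 19, 20]
```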

4.1.5. Information Theory

These methods assume that a subsequence occurring frequently is less surprising, and thus carries less information, than a rare one. The aim is therefore to find infrequent but still repetitive subsequences with rare symbols, using the same time series as the reference of normality.

4.2. Multivariate time series