
anomaly detection

This article describes how to perform anomaly detection using Bayesian networks.

An anomaly detection walkthrough using Bayes Server™ is also available.

what is anomaly detection?

Anomaly detection, also known as outlier detection, is the process of identifying unusual data.

For example, anomaly detection can give advance warning of a mechanical component failing (system health monitoring), isolate components in a system which have failed (fault detection), warn financial institutions of fraudulent transactions (fraud detection), and detect unusual patterns for use in medical research. It is also often used as a pre-processing step to remove unusual data before building statistical models.

Anomaly detection can also be used to detect unusual time series. For example, an algorithmic trader may wish to know when a multivariate time series is abnormal, and use that knowledge to gain a competitive advantage.


Image 1: a multivariate time series model, showing a component degrading with time.


anomaly detection with Bayesian networks

In this article we explore the use of anomaly detection in intelligent systems, and how it can be performed with Bayesian networks.

Bayesian networks are well suited to anomaly detection because they can handle high-dimensional data, which humans find difficult to interpret. While some anomalies are clearly visible when individual variables are plotted, anomalies are often far more subtle and depend on the interaction of many variables.

Note that mixture models, which are often used for anomaly detection, can easily be represented as Bayesian networks, and can in fact be extended to build more complex models.
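To make the connection concrete, a mixture model can be viewed as a two-node Bayesian network: a discrete Cluster node with a continuous child X. The sketch below is a minimal illustration in plain Python, not Bayes Server code. The weights, means and standard deviations are hypothetical values chosen for the example.

```python
import math

def gaussian_log_pdf(x, mean, std):
    """Natural log of the univariate Gaussian density."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)

def mixture_log_pdf(x, weights, means, stds):
    """log p(x) = log sum_k P(Cluster=k) * N(x | mean_k, std_k)."""
    terms = [math.log(w) + gaussian_log_pdf(x, m, s)
             for w, m, s in zip(weights, means, stds)]
    top = max(terms)  # log-sum-exp trick keeps the sum numerically stable
    return top + math.log(sum(math.exp(t - top) for t in terms))

# Hypothetical parameters: two clusters of normal behaviour.
weights = [0.7, 0.3]   # P(Cluster)
means = [0.0, 5.0]     # mean of X for each cluster
stds = [1.0, 0.5]      # standard deviation of X for each cluster

typical = mixture_log_pdf(0.2, weights, means, stds)   # close to the first cluster
unusual = mixture_log_pdf(12.0, weights, means, stds)  # far from both clusters
```

Here the value far from both clusters receives a much lower log density than the value near a cluster centre, which is the quantity used for detection later in this article.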

Bayesian networks are essentially probability distributions, albeit sometimes complex ones. We can therefore use many of the techniques we have used for years with standard probability distributions, such as the probability density function (pdf). We can also use Bayesian networks for classification, and it is these two techniques that we use to perform different types of anomaly detection in this article.

Bayesian networks also have the following properties, useful for anomaly detection:

  • Support for both discrete and continuous variables
  • Support for high-dimensional models, which humans are bad at interpreting
  • Support for missing data (both during learning and prediction/anomaly detection)
  • Support for time series data and data which is not time related, all within the same model

creating a model

In order to perform anomaly detection with Bayesian networks, the first thing we need is a model. Often an anomaly model is built from data; however, there is no reason why experts could not manually specify the parameters for simple models. While there are no restrictions on the structure of Bayesian networks that can be used for anomaly detection, there are differences in how they are built.

Supervised

The supervised approach requires that your data set contains data which is labelled either normal or anomalous/unusual (note that there can be multiple labels within each category). If you have sufficient data in each category, you can then build a classification model.

Problems with this approach occur if:

  • There is insufficient data labelled anomalous.
  • It is too difficult to manually identify anomalous data, perhaps because the data is high dimensional, is a complex time series, or both.
  • It is too expensive to label cases manually, e.g. because of the cost of the experts required to categorise the data.
  • Anomalies tend to be different in nature each time they occur, and therefore past anomalies do not predict future anomalies well. In practice this is often the case, which is why semi-supervised and unsupervised alternatives exist.

Semi-supervised

The semi-supervised approach uses a data set containing only normal data. This is termed semi-supervised, since the anomalous data (if any) has been removed before learning. Once a model has been constructed, we can use the likelihood techniques described later to perform detection on unseen data.
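As a minimal sketch of this workflow, the code below fits a single Gaussian to normal-only data and flags unseen values whose log-likelihood falls below a threshold. A real model would typically be a larger Bayesian network. The data values are hypothetical, and using the lowest training score as the threshold is a simplifying assumption (a low percentile of the training scores is another option).

```python
import math
import statistics

def gaussian_log_pdf(x, mean, std):
    """Natural log of the univariate Gaussian density."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)

# Hypothetical training set containing only normal readings.
normal_data = [9.8, 10.1, 10.0, 9.9, 10.3, 10.2, 9.7, 10.0, 10.1, 9.9]

# "Learning": estimate the parameters from the normal data.
mean = statistics.fmean(normal_data)
std = statistics.stdev(normal_data)

# Assumed threshold: the lowest log-likelihood seen during training.
training_scores = [gaussian_log_pdf(x, mean, std) for x in normal_data]
threshold = min(training_scores)

def is_anomalous(x):
    """Flag unseen data that is less likely than anything seen in training."""
    return gaussian_log_pdf(x, mean, std) < threshold
```

For example, `is_anomalous(12.0)` flags a reading far outside the training range, while `is_anomalous(10.05)` does not.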

Unsupervised

Unsupervised techniques automatically build a model of the normal data from a data set that contains both normal and anomalous data. The process therefore involves automatically determining which data is anomalous, or which parts of a learnt model are anomalous, in order to exclude them or label them in the final model.


performing anomaly detection

classification models

If the resulting model is a classification model, we can perform anomaly detection by simply predicting which class unseen data belongs to (e.g. normal or anomalous) using standard inference in Bayesian networks. That is, we set evidence on our Bayesian network based on the unseen data, and then query the output variable (containing the labels normal and anomalous, for example). In fact we get a probability of membership for each label, which tells us how anomalous the data is.
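The following sketch illustrates setting evidence and querying the output variable on a small hand-specified network in plain Python. It is not Bayes Server code. The structure (Status → Vibration, Status → Temperature) and all probabilities are hypothetical. Because each evidence variable here depends only on Status, a variable with no evidence can simply be left out, which marginalises it exactly and shows how missing data is tolerated at prediction time.

```python
# Hypothetical network: Status -> Vibration, Status -> Temperature.
prior = {"normal": 0.95, "anomalous": 0.05}  # P(Status)
cpts = {
    "Vibration": {  # P(Vibration | Status)
        "normal": {"low": 0.9, "high": 0.1},
        "anomalous": {"low": 0.3, "high": 0.7},
    },
    "Temperature": {  # P(Temperature | Status)
        "normal": {"ok": 0.8, "hot": 0.2},
        "anomalous": {"ok": 0.4, "hot": 0.6},
    },
}

def query_status(evidence):
    """Return P(Status | evidence); variables without evidence are marginalised out."""
    joint = {}
    for label, p in prior.items():
        for variable, state in evidence.items():
            p *= cpts[variable][label][state]
        joint[label] = p
    total = sum(joint.values())  # P(evidence)
    return {label: p / total for label, p in joint.items()}

full = query_status({"Vibration": "high", "Temperature": "hot"})
partial = query_status({"Vibration": "high"})  # Temperature is missing
```

With these assumed numbers the probability of the anomalous label is 0.525 when both readings are available and roughly 0.27 when only the vibration reading is known.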


log likelihood

If the result of learning is a model that does not contain information about the anomalous data, we have a model which represents normal behaviour. We can use this model to see how likely it is that unseen data could have been generated by this model. This tells us how anomalous the unseen data is.

In order to do this, we first set evidence on the Bayesian network according to our unseen data, in the same way as we would using the classification approach. We then calculate the log-likelihood of the evidence entered.

The log-likelihood is simply the log of the probability density function (pdf) of the Bayesian network, evaluated at the evidence. It is no different to calculating the pdf of a Gaussian distribution and using this value to determine how unusual the data is, e.g. whether someone is unusually tall. The lower the value of the log-likelihood, the more unusual the data is.
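As an illustration, consider a hypothetical two-node network Load → Temperature in which both variables are Gaussian and the mean of Temperature depends linearly on Load. The log-likelihood of the evidence is the sum of the log of each node's density given its parents. The coefficients below are assumptions chosen for the sketch.

```python
import math

def gaussian_log_pdf(x, mean, std):
    """Natural log of the univariate Gaussian density."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)

def log_likelihood(load, temperature):
    """log p(load, temperature) = log p(load) + log p(temperature | load)."""
    log_p_load = gaussian_log_pdf(load, 50.0, 10.0)
    # Hypothetical linear dependence of Temperature on its parent, Load.
    log_p_temperature = gaussian_log_pdf(temperature, 20.0 + 0.5 * load, 2.0)
    return log_p_load + log_p_temperature

# Same load in both cases; only the second temperature contradicts the trend.
consistent = log_likelihood(load=60.0, temperature=50.0)
inconsistent = log_likelihood(load=60.0, temperature=40.0)
```

In the second case neither value is remarkable on its own, but the combination is unlikely under the model, so its log-likelihood is much lower. This is the kind of subtle, multi-variable anomaly that plotting individual variables would miss.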

We use the log value because the unlogged value may underflow, even when using double precision arithmetic. This is especially likely in large networks, networks with continuous variables, and networks for time series (Dynamic Bayesian networks).
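The underflow problem is easy to reproduce. The sketch below assumes, purely for illustration, 2000 pieces of evidence that each contribute a probability of 0.5. The direct product cannot be represented in double precision, while the sum of logs is unproblematic.

```python
import math

# Hypothetical: 2000 independent pieces of evidence, each with probability 0.5.
probabilities = [0.5] * 2000

# Multiplying directly: 0.5 ** 2000 is far below the smallest positive double.
product = 1.0
for p in probabilities:
    product *= p

# Summing logs instead keeps the value well within range.
log_likelihood = sum(math.log(p) for p in probabilities)
```

After the loop `product` is exactly `0.0`, so all information about how unlikely the evidence is has been lost, whereas `log_likelihood` is a finite negative number that can still be compared between cases.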

Although the base of the logarithm is not mandated, we use the natural logarithm. If the Bayesian network contains any distributions from the exponential family (such as Gaussians), the natural logarithm matches the exponential in their density functions.

An intelligent system can monitor the log likelihood value, in order to oversee the health of components in a system. The value may degrade over time, indicating a potential failure, and is therefore extremely useful in early warning systems. See Image 1 for an example.
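One simple way to monitor the value is sketched below: raise an alert when a moving average of the log-likelihood drops below a threshold. The readings, window size and threshold are hypothetical. In practice the threshold might be derived from log-likelihood values observed on normal data.

```python
def first_alert(log_likelihoods, threshold, window=3):
    """Return the index at which the moving average first drops below threshold, or None."""
    for end in range(window, len(log_likelihoods) + 1):
        recent = log_likelihoods[end - window:end]
        if sum(recent) / window < threshold:
            return end - 1  # index of the latest reading in the window
    return None

# Hypothetical log-likelihood readings from a component that degrades over time.
readings = [-5.1, -4.9, -5.3, -5.0, -6.2, -7.5, -9.1, -11.4, -14.0]
alert_index = first_alert(readings, threshold=-8.0)
```

Averaging over a short window is a design choice that makes the alert less sensitive to a single noisy reading, at the cost of a small delay.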


Bayes Server Ltd.  All rights Reserved. Copyright © 2012

Registered in England and Wales

Company number 6957059

VAT number 998440170

Secure