performing anomaly detection
classification models
If the resulting model is a classification model, we can perform anomaly detection
by predicting which class unseen data belongs to (e.g. normal or anomalous) using
standard inference in Bayesian networks. That is, we set evidence on our Bayesian
network based on the unseen data, and then query the output variable (containing
the labels normal and anomalous, for example). The query returns a probability of
membership for each label, which tells us how anomalous the data is.
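As a rough illustration of the classification approach, the sketch below applies
Bayes' rule by hand to a tiny two-node network (Label -> Reading). The network
structure, the variable names and all probability values are hypothetical, chosen
only for illustration; a real model would be learned from data and queried through
an inference engine.

```python
# Hypothetical two-node network: Label -> Reading.
# All probabilities below are made-up illustrative values.
prior = {"normal": 0.95, "anomalous": 0.05}
likelihood = {
    "normal": {"low": 0.70, "medium": 0.25, "high": 0.05},
    "anomalous": {"low": 0.10, "medium": 0.20, "high": 0.70},
}


def posterior(reading):
    """Set evidence on Reading and query Label: returns P(Label | Reading)."""
    # Joint probability of each label together with the observed reading.
    joint = {label: prior[label] * likelihood[label][reading] for label in prior}
    # Probability of the evidence, used to normalise the joint values.
    evidence = sum(joint.values())
    return {label: p / evidence for label, p in joint.items()}


result = posterior("high")
print(result)
```

The returned dictionary holds one probability per label, so the value for the
anomalous label can be read directly as a score for how anomalous the reading is.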
log likelihood
If the result of learning is a model that does not contain information about the
anomalous data, we have a model which represents normal behaviour. We can use this
model to see how likely it is that unseen data could have been generated by this
model. This tells us how anomalous the unseen data is.
In order to do this, we first set evidence on the Bayesian network according to
our unseen data, in the same way as we would using the classification approach.
We then calculate the log-likelihood of the evidence entered.
The log-likelihood is simply the log of the probability density function (pdf) of
the Bayesian network, evaluated at the evidence. It is no different from calculating
the pdf of a Gaussian distribution and using this value to determine how unusual the
data is, e.g. whether someone is unusually tall. The lower the value of the
log-likelihood, the more unusual the data is.
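The height example can be sketched with a single Gaussian standing in for the model
of normal behaviour. The mean and standard deviation used here (175 cm and 7 cm) are
assumed values for illustration only, and gaussian_log_pdf is a helper written for
this sketch rather than part of any library.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Natural log of the Gaussian pdf evaluated at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


# Assumed model of "normal" adult height in centimetres (illustrative values).
MEAN_HEIGHT = 175.0
STD_HEIGHT = 7.0

typical = gaussian_log_pdf(176.0, MEAN_HEIGHT, STD_HEIGHT)
very_tall = gaussian_log_pdf(205.0, MEAN_HEIGHT, STD_HEIGHT)

# The unusually tall person receives a much lower log-likelihood.
print(typical, very_tall)
```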
We use the log value because the raw probability may underflow, even when using
double precision arithmetic, especially in large networks, networks with continuous
variables, and networks for time series (dynamic Bayesian networks).
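The underflow problem is easy to reproduce. The sketch below multiplies many small
probabilities together, as might happen across the time steps of a long series, and
compares the result with summing their logs. The count and the probability value are
arbitrary illustrative choices.

```python
import math

# 1000 hypothetical per-step probabilities, each equal to 1e-5.
probabilities = [1e-5] * 1000

# Multiplying directly falls below the smallest representable double.
product = 1.0
for p in probabilities:
    product *= p

# Summing logs keeps the same information in a well-scaled number.
log_likelihood = sum(math.log(p) for p in probabilities)

print(product)         # underflows to 0.0
print(log_likelihood)  # a finite negative number
```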
Although the base of the logarithm used is not always mandated, we use the natural
logarithm. If the Bayesian network contains any exponential family distributions
(such as Gaussians), the natural logarithm matches the exponential in their density
functions.
An intelligent system can monitor the log-likelihood value in order to oversee
the health of components in a system. The value may degrade over time, indicating
a potential failure, which makes it useful in early warning systems.
See Image 1 for an example.
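One possible way to turn this into an early warning is sketched below. It assumes a
single Gaussian as the model of normal behaviour, simulated healthy readings, and an
alert threshold set at the 1st percentile of the log-likelihoods seen on healthy
data. All of these are illustrative choices, not a prescribed method.

```python
import math
import random
import statistics


def gaussian_log_pdf(x, mean, std):
    """Natural log of the Gaussian pdf evaluated at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


random.seed(0)

# Simulated readings from a healthy component (hypothetical sensor values).
healthy = [random.gauss(50.0, 2.0) for _ in range(1000)]
mean = statistics.fmean(healthy)
std = statistics.stdev(healthy)

# Threshold: the 1st percentile of log-likelihoods observed on healthy data.
healthy_ll = sorted(gaussian_log_pdf(x, mean, std) for x in healthy)
threshold = healthy_ll[int(0.01 * len(healthy_ll))]

# A component whose reading drifts steadily upwards over 60 time steps.
stream = [50.0 + 0.2 * t for t in range(60)]

# Record every time step whose log-likelihood falls below the threshold.
alerts = [t for t, x in enumerate(stream)
          if gaussian_log_pdf(x, mean, std) < threshold]
first_alert = alerts[0] if alerts else None
print(first_alert)
```

In practice the threshold trades off false alarms against detection delay, so it
would normally be tuned on held-out data rather than fixed in advance.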