From version 6.8, Bayes Server includes an additional algorithm for initializing Bayesian networks during parameter learning.

There are now three different ways a Bayesian network can be initialized during parameter learning.

Following initialization, parameter learning runs the expectation maximization (EM) algorithm. The new Clustering initialization algorithm typically requires fewer EM iterations to converge, and produces better solutions.
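Bayes Server does not document the internals of its Clustering initialization here, but the general idea behind clustering-based initialization can be sketched: cluster the data first, then use the clusters to seed the mixture parameters (weights, means, variances) that EM will refine. A minimal 1D sketch using a simple k-means pass (all names and parameters are illustrative, not Bayes Server's API):

```python
import random

def kmeans_seed(data, k, iters=20, seed=0):
    """Cluster 1D data first, then turn the clusters into initial
    mixture parameters (weights, means, variances) to seed EM."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)

    def assign(centers):
        # group each point with its nearest centre
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda c: abs(x - centers[c]))
            groups[nearest].append(x)
        return groups

    for _ in range(iters):
        groups = assign(centers)
        # move each centre to the mean of its points (keep empty centres put)
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    groups = assign(centers)

    weights = [len(g) / len(data) for g in groups]
    variances = [sum((x - c) ** 2 for x in g) / len(g) if g else 1.0
                 for c, g in zip(centers, groups)]
    # floor tiny variances so EM does not start from a degenerate component
    variances = [max(v, 1e-6) for v in variances]
    return weights, centers, variances
```

Seeding EM this way typically places the components near the true clusters from the start, which is why fewer iterations are needed than with random initialization.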


Visualization

An illustration of this new initialization algorithm is shown below. Although the data (shown in blue) is clearly fictitious, it demonstrates the power of this technique. The orange ellipses are a visualization of the Bayesian network model trained on this data.

Although the data contains only X and Y variables, we also use what is called a latent variable. Instead of just one ellipse, this lets the model have many, and the learning algorithm fits them for us, as shown below.
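A network over continuous X and Y with one discrete latent variable C is, in effect, a mixture model: each state of C contributes one Gaussian component, and each component is one ellipse in the plot. A minimal sketch of the resulting density, using made-up parameters and diagonal covariances for brevity:

```python
import math

# Hypothetical 2-state latent variable C; each state is one component.
# (weight, mean_x, mean_y, var_x, var_y) -- diagonal covariance for brevity
components = [
    (0.6, 0.0, 0.0, 1.0, 1.0),
    (0.4, 5.0, 5.0, 2.0, 0.5),
]

def density(x, y):
    """p(x, y) = sum over states c of P(C=c) * N(x, y | state c)."""
    total = 0.0
    for w, mx, my, vx, vy in components:
        nx = math.exp(-(x - mx) ** 2 / (2 * vx)) / math.sqrt(2 * math.pi * vx)
        ny = math.exp(-(y - my) ** 2 / (2 * vy)) / math.sqrt(2 * math.pi * vy)
        total += w * nx * ny
    return total
```

Summing out the latent variable in this way gives the multi-modal density that a single Gaussian (one ellipse) could never represent.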



Although the data in this example is only 2D, we often build models in higher dimensions, and the initialization algorithm works in the same way.


Choosing the number of clusters

A question often asked is how many clusters (ellipses) we should choose.

A common approach is to divide your data into a training set and a test set, then learn a number of models on the training data, varying the number of clusters over a range.

For each resulting model, calculate the log-likelihood of the model on the unseen test data, and choose the model with the highest value. This helps select a model that does not over-fit.
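The procedure above can be sketched end to end. Since Bayes Server's API is not shown here, the sketch fits a simple 1D Gaussian mixture with EM as a stand-in for a network with one discrete latent variable, and picks the cluster count with the highest held-out log-likelihood (all names and data are illustrative):

```python
import math
import random

def norm_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm(data, k, iters=50, seed=0):
    """Fit a 1D Gaussian mixture with EM (a stand-in for learning a
    Bayesian network whose discrete latent variable has k states)."""
    rng = random.Random(seed)
    means = rng.sample(data, k)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [w * norm_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300
            resp.append([pi / s for pi in p])
        # M-step: re-estimate weights, means and variances
        for c in range(k):
            rc = max(sum(r[c] for r in resp), 1e-12)
            weights[c] = rc / len(data)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / rc
            variances[c] = max(
                sum(r[c] * (x - means[c]) ** 2
                    for r, x in zip(resp, data)) / rc, 1e-6)
    return weights, means, variances

def log_likelihood(data, params):
    weights, means, variances = params
    return sum(
        math.log(sum(w * norm_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)) + 1e-300)
        for x in data)

# Two well-separated clusters; score each cluster count on unseen test data.
random.seed(42)
train = ([random.gauss(0.0, 1.0) for _ in range(150)]
         + [random.gauss(8.0, 1.0) for _ in range(150)])
test = ([random.gauss(0.0, 1.0) for _ in range(50)]
        + [random.gauss(8.0, 1.0) for _ in range(50)])
scores = {k: log_likelihood(test, fit_gmm(train, k)) for k in (1, 2, 3)}
best_k = max(scores, key=scores.get)
```

Because the log-likelihood is computed on data the model never saw, a cluster count that merely memorizes the training data is penalized rather than rewarded.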

It is also common to use an additional validation set, so that the cluster count chosen using the test data is subsequently evaluated on a further unseen data set.


Generative model

Bayesian networks are a form of generative model. This means that once we have trained a model, we can sample data from it. It is often useful to visualize these samples to better understand our model.

The image below shows some sample data generated from our model.
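Sampling from such a model is straightforward ancestral sampling: draw a state of the latent variable first, then draw the continuous values from that state's distribution. A sketch for a 1D mixture with hypothetical parameters (not Bayes Server's API):

```python
import random

def sample_mixture(weights, means, variances, n, seed=0):
    """Ancestral sampling from a mixture: draw the latent cluster first,
    then draw a value from that cluster's Gaussian."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # draw a cluster index in proportion to the mixture weights
        c = rng.choices(range(len(weights)), weights=weights)[0]
        samples.append(rng.gauss(means[c], variances[c] ** 0.5))
    return samples

# Hypothetical two-cluster model: roughly half the samples from each cluster.
data = sample_mixture([0.5, 0.5], [0.0, 100.0], [1.0, 1.0], 1000)
```

Plotting such samples next to the original data is a quick visual check that the model has captured the data's shape.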


More complicated models

The Bayesian network used for this example is shown below.

This model has only a single discrete latent variable; however, Bayesian networks can contain multiple latent variables (discrete or continuous).

We can even have layers of latent variables.


