# Cluster count

**Cluster Count** determines the number of clusters (states) for a discrete latent variable (cluster / mixture) in a Bayesian network.

The process uses cross validation to evaluate the log-likelihood for a series of different cluster counts.

## Opening

With a Bayesian network or Dynamic Bayesian network open that contains one or more discrete latent variables,
click the **Cluster Count** button on the **Data** tab of the main window toolbar.

## Cluster Count

To determine a suitable number of clusters, cross validation is used. The data is split randomly into a configurable number of partitions. For each partition p, a model is learned on the remaining data (data - p), and the log-likelihood is evaluated on the unseen data p. The log-likelihoods are then summed over all partitions, resulting in an overall score.
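As a rough formalisation of the score described above, it can be written as follows. The notation is an assumption introduced here for illustration: $P$ is the number of partitions, $D_p$ is partition p, and $M_K^{(-p)}$ is a model with $K$ clusters learned on all of the data except $D_p$.

$$
\mathrm{score}(K) = \sum_{p=1}^{P} \log P\left(D_p \mid M_K^{(-p)}\right)
$$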

This score is calculated for each configured cluster count, and the scores are plotted.
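The procedure above can be sketched in a few lines of Python. This is a minimal illustration and not the product's implementation: it assumes a one-dimensional Gaussian mixture as a stand-in for a Bayesian network with a discrete latent variable, and every function name (`fit_mixture`, `cross_validated_score`, and so on) is hypothetical. It uses only the Python standard library.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def component_log_terms(x, model):
    """Return log(weight_j * N(x | mean_j, var_j)) for every cluster j."""
    weights, means, variances = model
    return [math.log(w) + log_normal_pdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]


def log_sum_exp(terms):
    """Numerically stable log(sum(exp(t) for t in terms))."""
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def fit_mixture(data, k, iterations=100, min_var=1e-3):
    """Fit a k-cluster 1-D Gaussian mixture with plain EM (illustrative only)."""
    ordered = sorted(data)
    n = len(ordered)
    # Simple deterministic start: means placed at evenly spaced quantiles.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    centre = sum(data) / n
    spread = max(sum((x - centre) ** 2 for x in data) / n, min_var)
    model = ([1.0 / k] * k, means, [spread] * k)
    for _ in range(iterations):
        # E-step: responsibility of each cluster for each data point.
        resp = []
        for x in data:
            terms = component_log_terms(x, model)
            total = log_sum_exp(terms)
            resp.append([math.exp(t - total) for t in terms])
        # M-step: re-estimate weights, means and variances.
        weights, means, variances = [], [], []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # guard against empty clusters
            mean = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, data)) / nj
            weights.append(nj / n)
            means.append(mean)
            variances.append(max(var, min_var))
        model = (weights, means, variances)
    return model


def log_likelihood(data, model):
    """Total log-likelihood of the data under the mixture model."""
    return sum(log_sum_exp(component_log_terms(x, model)) for x in data)


def cross_validated_score(data, k, partitions=3, seed=0):
    """Sum of held-out log-likelihoods over all partitions for cluster count k."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # random split into partitions
    folds = [shuffled[i::partitions] for i in range(partitions)]
    score = 0.0
    for i, held_out in enumerate(folds):
        # Learn on (data - p), then evaluate on the unseen partition p.
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        score += log_likelihood(held_out, fit_mixture(training, k))
    return score


# Synthetic data drawn from two well-separated groups (made up for this sketch).
rng = random.Random(1)
data = ([rng.gauss(-5.0, 1.0) for _ in range(100)]
        + [rng.gauss(5.0, 1.0) for _ in range(100)])

scores = {k: cross_validated_score(data, k) for k in (1, 2, 3)}
for k, score in scores.items():
    print(f"clusters={k}  score={score:.1f}")
```

The same shuffle seed is reused for every cluster count so that each count is scored on identical partitions, which keeps the scores comparable. Including a count of 1 in the loop gives the baseline score for a model with no clustering.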

##### NOTE

A higher score is preferred, especially if it is part of a smooth curve. Any areas that exhibit volatility should usually be ignored.

Once a suitable number of clusters has been determined, close the **Cluster Count** window and update the number of states in the cluster variable.

A cluster count of 1 is included by default to test the hypothesis that the cluster variable is not required at all.