Structural learning

Structural learning is the process of using data to learn the links of a Bayesian network or Dynamic Bayesian network.

The data used for learning must not change during the learning process.

Bayes Server supports the following algorithms for structural learning:

  • PC
  • Search & Score
  • Hierarchical
  • Chow-Liu
  • Tree augmented Naive Bayes (TAN)

PC algorithm

A constraint-based algorithm, which uses marginal and conditional independence tests to determine the structure of the network.

This algorithm supports the following:

  • Learning with discrete and continuous variables, including hybrid networks with a mixture of discrete and continuous.
  • Learning both Bayesian networks and Dynamic Bayesian networks (e.g. learning from time series or sequence data).
  • Learning with missing data (discrete or continuous).

The algorithm does not currently support the automatic detection of latent variables.

Significance level

The significance level is used by the algorithm during its tests of independence and conditional independence to determine whether links should exist. A value of 0.05 or 0.01 is typical.
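As an illustration, a marginal independence test of the kind a constraint-based algorithm relies on can be sketched with a Pearson chi-square test on a contingency table. The counts and the hard-coded critical value below are illustrative assumptions, not Bayes Server's implementation:

```python
# Sketch of a marginal independence test between two discrete variables
# X and Y, of the kind used by constraint-based structural learning.
# The contingency table is hypothetical example data.

def chi_square_statistic(table):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Observed counts of X (rows) against Y (columns).
table = [[30, 10],
         [10, 30]]

CRITICAL_005 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05

stat = chi_square_statistic(table)       # → 20.0 for this table
independent = stat <= CRITICAL_005       # False: dependence detected
```

Because 20.0 exceeds the 0.05 critical value, the test rejects independence, which in a constraint-based algorithm would argue for keeping a link between X and Y.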

Max conditional

This controls the maximum number of conditioning variables considered when the algorithm performs conditional independence tests.

A value of 3 means that when testing whether X and Y are independent, conditioning sets of up to 3 other variables will be considered in the tests.

Max temporal

This is the maximum order that should be considered for temporal links in dynamic Bayesian networks.

A value of 3 means that temporal nodes will not have links of order greater than 3, i.e. links spanning up to 3 previous time steps.

Search & score algorithm

(since version 7.14)

The Search & Score algorithm performs a search of possible Bayesian network structures, and scores each to determine the best.

This algorithm currently supports the following:

  • Discrete variables.
  • Continuous variables.
  • Hybrid networks with both discrete and continuous variables.
  • Learning with missing data.
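As a minimal illustration of the search-and-score idea (not Bayes Server's search strategy or scoring function), the sketch below enumerates the three possible structures over two binary variables and picks the best by a BIC-style score:

```python
from math import log
from collections import Counter

# Toy search & score over two binary variables X and Y.
# Candidate structures: no edge, X -> Y, Y -> X.
# Score = maximum log-likelihood minus a BIC complexity penalty.
# The dataset is hypothetical example data.

data = [(0, 0)] * 40 + [(1, 1)] * 40 + [(0, 1)] * 10 + [(1, 0)] * 10

def log_likelihood_node(values, parents_values):
    """Maximum log-likelihood of one node given its parent values."""
    joint = Counter(zip(parents_values, values))
    parent = Counter(parents_values)
    return sum(c * log(c / parent[p]) for (p, v), c in joint.items())

def bic(structure, data):
    n = len(data)
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    no_parents = [()] * n
    if structure == "X Y":            # no edge: nodes independent
        ll = log_likelihood_node(xs, no_parents) + log_likelihood_node(ys, no_parents)
        k = 2                         # one free parameter per binary node
    elif structure == "X -> Y":
        ll = log_likelihood_node(xs, no_parents) + log_likelihood_node(ys, xs)
        k = 3                         # extra parameter for the conditional
    else:                             # "Y -> X"
        ll = log_likelihood_node(ys, no_parents) + log_likelihood_node(xs, ys)
        k = 3
    return ll - 0.5 * k * log(n)

best = max(["X Y", "X -> Y", "Y -> X"], key=lambda s: bic(s, data))
```

Since X and Y co-occur 80% of the time in this data, the connected structures score higher than the empty one, so the search keeps a link despite the complexity penalty.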

Hierarchical

(since version 7.20)

This algorithm creates a hierarchy of nodes, built layer by layer. Each layer consists of a number of groups, where each group contains similar variables as determined by the algorithm. The variables in a group are connected via a parent discrete latent variable. The algorithm therefore generates multiple latent variables, and the latent variables themselves can be clustered into groups in subsequent layers.

This algorithm currently supports the following:

  • Discrete variables.
  • Continuous variables.
  • Hybrid networks with both discrete and continuous variables.
  • Learning with missing data.

Chow-Liu algorithm

(since version 7.12)

Creates a Bayesian network whose structure is a tree. The tree is constructed as a maximum-weight spanning tree over a fully connected graph whose edges are weighted by a metric such as mutual information.
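The spanning-tree step can be sketched with Prim's algorithm. The mutual information values below are placeholders rather than quantities learned from data:

```python
# Maximum-weight spanning tree step at the heart of Chow-Liu,
# using Prim's algorithm over a matrix of pairwise edge weights.

def maximum_spanning_tree(weights):
    """Return tree edges (i, j) maximising total weight."""
    n = len(weights)
    in_tree = {0}      # grow the tree from node 0
    edges = []
    while len(in_tree) < n:
        # Pick the heaviest edge crossing from the tree to a new node.
        i, j, _ = max(
            ((i, j, weights[i][j])
             for i in in_tree for j in range(n) if j not in in_tree),
            key=lambda e: e[2],
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges

# Hypothetical pairwise mutual information between 4 variables (symmetric).
mi = [
    [0.0, 0.9, 0.2, 0.1],
    [0.9, 0.0, 0.6, 0.3],
    [0.2, 0.6, 0.0, 0.5],
    [0.1, 0.3, 0.5, 0.0],
]

tree_edges = maximum_spanning_tree(mi)  # → [(0, 1), (1, 2), (2, 3)]
```

For this matrix the chain 0-1-2-3 is selected, since those edges carry the highest mutual information; directing the edges away from a chosen root then yields the tree-structured network.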

This algorithm currently supports the following:

  • Discrete variables.
  • Continuous variables.
  • Learning with missing data.

Root

An optional argument which determines the root of the generated tree.

TAN algorithm

(since version 7.12)

The Tree Augmented Naive Bayes (TAN) algorithm builds a Bayesian network that is focused on a particular target T (e.g. for classification). The algorithm creates a Chow-Liu tree, but uses measures that are conditional on T (e.g. conditional mutual information). Additional links are then added from T to each node in the Chow-Liu tree.
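The conditional measure can be sketched as follows; the estimator and the sample records are illustrative, not Bayes Server's implementation:

```python
from math import log
from collections import Counter

# Conditional mutual information I(X; Y | T), the kind of conditional
# measure TAN can use in place of plain mutual information when
# weighting edges of its Chow-Liu tree.

def conditional_mutual_information(records):
    """I(X; Y | T) in nats, estimated from a list of (x, y, t) samples."""
    n = len(records)
    xyt = Counter(records)
    xt = Counter((x, t) for x, _, t in records)
    yt = Counter((y, t) for _, y, t in records)
    t_counts = Counter(t for _, _, t in records)
    cmi = 0.0
    for (x, y, t), c in xyt.items():
        p_xyt = c / n
        # p(x,y,t) * p(t) / (p(x,t) * p(y,t)), with the 1/n factors cancelled.
        cmi += p_xyt * log(p_xyt * t_counts[t] / (xt[(x, t)] * yt[(y, t)] / n))
    return cmi

# Hypothetical samples where X and Y are independent given T:
samples = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)]
conditional_mutual_information(samples)  # → 0.0
```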

This algorithm currently supports the following:

  • Discrete variables.
  • Continuous variables.
  • Learning with missing data.

Root

An optional argument which determines the root of the generated tree.

Target

Determines the target for the learning algorithm, for example the node which you are trying to predict.

Defining variables

In order to perform structural learning from a data set, you must first have defined your variables/nodes. If required, these can be determined automatically using the Add nodes from data feature.

Opening

With a Bayesian network or Dynamic Bayesian network open, the Structural learning window can be launched either by clicking the Add links from data menu item from the Link button on the Network tab of the main toolbar, or by clicking the Structural Learning button on the Data tab of the main window toolbar.

First the Data connection manager will be launched to choose a Data Connection followed by the Data tables and Data map windows in order to select tables, and map variables to columns.

Not all variables have to be mapped to data columns. Any variables that are not mapped are ignored during the learning process.

Learning

To start learning, click the Run button. When learning is complete, a message box is displayed.

The Cancel button stops the learning process without producing a new candidate network.

If required, link constraints can be defined before learning. The different types of link constraint are documented under BayesServer.Learning.Structure.LinkConstraintMethod in the Bayes Server API, and the link constraint failure modes under BayesServer.Learning.Structure.LinkConstraintFailureMode.