Structural learning is the process of using data to learn the links of a Bayesian network or Dynamic Bayesian network.
The data used for learning must not change during the learning process.
Bayes Server supports the following algorithms for structural learning:
You can chain algorithms together (e.g. Search & Score + Clustering). Simply finish the structural learning wizard multiple times, each time importing link constraints from a previous run.
(since version 8.4)
Automatically determines an appropriate discrete latent variable to add to the model, which can model hidden patterns. Similar to a cluster model or a mixture model. Supports networks with discrete and/or continuous variables, as well as missing data.
See our article on mixture models & clustering for more information.
This is an unsupervised algorithm, and in its most basic form is similar to a mixture model (a.k.a. cluster model). However this algorithm is more flexible, as an arbitrary structure can be mixed, and include both discrete and continuous variables. The appropriate number of clusters is automatically determined by the algorithm robustly using cross validation.
A constraint based algorithm, which uses marginal and conditional independence tests to determine the structure of the network.
This algorithm supports the following:
The algorithm does not currently support the automatic detection of latent variables.
An algorithm for fast recovery of sparse causal graphs | P Spirtes, C Glymour - Social science computer review, 1991
Causation, prediction, and search. Adaptive computation and machine learning | P Spirtes, C Glymour, R Scheines - 2000 - MIT Press, Cambridge
The significance level is used during tests for independence and conditional independence by the algorithm to determine whether links should exist or not. A value of 0.05 or 0.01 is typical.
This controls the number of conditional nodes that are considered when the algorithm performs conditional independence tests.
A value of 3 means that when testing whether X and Y are independent, up to 3 other conditional variables will be considered in the tests.
This is the maximum order that should be considered for temporal links in dynamic Bayesian networks.
A value of 3 means that temporal nodes should not have links greater than order 3, i.e. up to 3 previous time steps.
(since version 7.14)
The Search & Score algorithm performs a search of possible Bayesian network structures, and scores each to determine the best.
This algorithm currently supports the following:
A Bayesian method for the induction of probabilistic networks from data | GF Cooper, E Herskovits - Machine learning, 1992
A tutorial on learning with Bayesian networks | D Heckerman - Learning in graphical models, 1998
Optimal structure identification with greedy search | DM Chickering - Journal of machine learning research, 2002
(since version 7.20)
This algorithm, which is unique to Bayes Server, creates a hierarchy of nodes, built layer by layer. Each layer consists of a number of groups. Each group consists of similar variables as determined by the algorithm. Individual groups are connected via a parent discrete latent variable. Therefore the algorithm generates multiple latent variables, and latent variables themselves can be subsequently clustered in groups.
This algorithm currently supports the following:
An example output from the algorithm is shown below.
(since version 7.12)
Creates a Bayesian network which is a tree. The tree is constructed from a weighted spanning tree over a fully connected graph whose connections are weighted by a metric such as Mutual Information.
This algorithm currently supports the following:
An optional argument which determines the root of the generated tree.
(since version 7.12)
The Tree Augmented Naive Bayes algorithm (TAN) builds a Bayesian network that is focused on a particular target T (e.g. for classification). The algorithm creates a Chow-Liu tree, but using tests that are conditional on T (e.g. conditional mutual information). Then additional links are added from T to each node in the Chow-Liu tree.
This algorithm currently supports the following:
See the Chow-Liu algorithm.
An optional argument which determines the root of the generated tree.
Determines the target for the learning algorithm. For example, the node which you are trying to predict.
Chaining of algorithms provides extra flexibility. For example you could start with a Chow-Liu tree which can be constructed very quickly. You may then wish to use the Search & Score algorithm to augment the tree with some more complex structure.
You could also start with a Search & Score, then use the Clustering algorithm.
When chaining, be sure to:
In order to perform structural learning from a data set, you must have first defined your variables/nodes. If required, these can be automatically determined using the Add nodes from data feature.
With a Bayesian network or Dynamic Bayesian network open, the Structural learning window can be launched either by clicking the Add links from data menu item from the Link button on the Network tab of the main toolbar, or by clicking the Structural Learning button on the Data tab of the main window toolbar.
First the Data connection manager will be launched to choose a Data Connection followed by the Data tables and Data map windows in order to select tables, and map variables to columns.
Not all variables have to be mapped to data columns. Any variables that are not mapped are ignored during the learning process.
To start learning click the Run button. When learning is complete, a message box will be displayed.
The Cancel button, cancels the learning process, without producing a new candidate network.
If required link constraints can be defined before learning. The different types of link constraint are documented in
BayesServer.Learning.Structure.LinkConstraintMethod
in the Bayes Server API. The link constraint failure modes
are documented in BayesServer.Learning.Structure.LinkConstraintFailureMode
in the Bayes Server API.