Add nodes from data

Nodes can be added to a network based on values in a data source such as a database or spreadsheet.

Opening

With a Bayesian network or Dynamic Bayesian network open, either:

  • Click the Add nodes from data menu item in the Node drop-down, found on the Network tab of the main toolbar, in the Editing group.
  • Click the Add nodes button, found on the Data tab of the main toolbar, in the Network group.

This will launch the Data tables window.

In the Data tables window, select the data you wish to generate nodes from. For more information about selecting data, see the help for the Data tables window. Once the table(s) have been selected, click Ok. This launches the Data Map window, which allows you to map data to variables in the network.

Clicking Ok on the Data Map window will launch the Add nodes from data window, shown below.

[Image: Add nodes from data window]

Defining variables

Nodes are added to a network by defining variables. By default, each defined variable generates a new node; however, if two or more variable definitions share the same node name, a single node is generated containing all of those variables. Variables can also be added to existing nodes.
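The node-naming rule above can be illustrated with a short Python sketch that groups variable definitions by node name. This only models the rule as described; `VariableDefinition`, `Node`, `build_nodes` and the `existing` parameter are hypothetical names, not part of the product or its API.

```python
from dataclasses import dataclass, field


@dataclass
class VariableDefinition:
    # Hypothetical stand-in for one row of the variable definitions grid.
    variable_name: str
    node_name: str


@dataclass
class Node:
    # Hypothetical node record: a name plus the variables it contains.
    name: str
    variables: list = field(default_factory=list)


def build_nodes(definitions, existing=None):
    """Group variable definitions into nodes by node name.

    `existing` optionally maps names of nodes already in the network to Node
    objects; definitions that name one of them add variables to that node
    (the existing Node objects are updated in place).
    Returns a dict of node name -> Node, in first-seen order.
    """
    nodes = dict(existing or {})
    for d in definitions:
        if d.node_name not in nodes:
            # First definition with this node name: create a new node.
            nodes[d.node_name] = Node(d.node_name)
        nodes[d.node_name].variables.append(d.variable_name)
    return nodes


definitions = [
    VariableDefinition("Age", "Age"),
    VariableDefinition("X", "Position"),
    VariableDefinition("Y", "Position"),
]
for node in build_nodes(definitions).values():
    print(node.name, node.variables)
# Age ['Age']
# Position ['X', 'Y']
```

Two definitions sharing the node name "Position" produce one node with two variables, while "Age" gets a node of its own.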

To change the options for one or more variables, select the variables in the grid and change their options in the property grid in the right-hand pane. When multiple variables are selected, some properties are not shown.

To exclude certain variables, either uncheck the check box in the grid or change the IsEnabled option in the property grid.

To find help on a particular option, select it in the property grid, and the help will be displayed at the bottom of the property grid.

Discretization

It is sometimes useful to discretize continuous data, generating a discrete variable in which each state represents an interval of continuous values.

Discrete variables whose states are intervals can be created manually, but it is often more convenient to generate them from a data source. To generate a discretized variable, set the Discretization option in the property grid to one of the discretization algorithms, and update the DiscretizationOptions in the property grid if required.
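As a minimal Python sketch of what a discretized variable does, the code below maps a continuous value to the interval state that contains it. It assumes the intervals are defined by sorted inner boundaries, are closed on the left and open on the right, and that the first and last intervals are unbounded; these conventions, and the names `interval_labels` and `state_index`, are assumptions for illustration, not the software's documented behaviour.

```python
import bisect


def interval_labels(boundaries):
    """Hypothetical helper: labels for the interval states defined by sorted inner boundaries.

    Assumes left-closed, right-open intervals, with unbounded first and last intervals.
    """
    edges = [float("-inf")] + list(boundaries) + [float("inf")]
    labels = []
    for lo, hi in zip(edges, edges[1:]):
        left = "(" if lo == float("-inf") else "["
        labels.append(f"{left}{lo}, {hi})")
    return labels


def state_index(value, boundaries):
    """Hypothetical helper: index of the interval state that contains `value`."""
    # bisect_right sends a value equal to a boundary into the interval on its
    # right, matching the left-closed intervals assumed above.
    return bisect.bisect_right(boundaries, value)


boundaries = [10, 20]
labels = interval_labels(boundaries)
for v in [5, 10, 19.5, 25]:
    print(v, "->", labels[state_index(v, boundaries)])
# 5 -> (-inf, 10)
# 10 -> [10, 20)
# 19.5 -> [10, 20)
# 25 -> [20, inf)
```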

The following discretization algorithms are available:

  • Clustering - uses a probabilistic clustering algorithm to determine the intervals, based on cluster centers.
  • Equal frequencies - defines intervals such that each one contains a similar number of items from the data source.
  • Equal ranges - defines intervals by splitting the range of continuous values into intervals of equal width.
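The three algorithms can be sketched in Python as functions that return the inner interval boundaries for k intervals. This is not the software's own implementation: the function names and the parameter k are hypothetical. The clustering step uses plain 1-D k-means, a simpler, non-probabilistic substitute for the probabilistic clustering algorithm described above. Placing boundaries at the midpoints between neighbouring cluster centers is also an assumption.

```python
def _check(values, k):
    # Hypothetical input validation shared by the sketches below.
    if not values or k < 1:
        raise ValueError("need at least one value and k >= 1")


def equal_ranges(values, k):
    """Equal ranges: split [min, max] into k intervals of equal width."""
    _check(values, k)
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # Duplicate boundaries (e.g. when all values are equal) are collapsed.
    return sorted(set(lo + i * width for i in range(1, k)))


def equal_frequencies(values, k):
    """Equal frequencies: boundaries so each interval holds about len(values) / k items."""
    _check(values, k)
    s = sorted(values)
    n = len(s)
    # Ties can yield repeated boundaries; collapsing them gives fewer intervals.
    return sorted(set(s[i * n // k] for i in range(1, k)))


def kmeans_1d(values, k, iterations=100):
    """Plain 1-D k-means: a non-probabilistic stand-in for the clustering step."""
    _check(values, k)
    s = sorted(values)
    n = len(s)
    # Deterministic start: one center in the middle of each equal-frequency slice.
    centers = [s[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in s:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        new_centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def clustering(values, k):
    """Boundaries at the midpoints between neighbouring (sorted) cluster centers."""
    centers = kmeans_1d(values, k)
    return sorted(set((a + b) / 2 for a, b in zip(centers, centers[1:])))


data = [1, 2, 3, 10, 11, 12]
print(equal_ranges(data, 2))       # [6.5]
print(equal_frequencies(data, 2))  # [10]
print(clustering(data, 2))         # [6.5]
```

On this small, clearly bimodal data set, equal ranges and clustering agree, while equal frequencies places its boundary at the median so that each interval receives three of the six values.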