Add nodes from data

Introduction

Nodes can be added to a network based on values in a data source such as a database or spreadsheet.

Support

Discrete variables
Continuous variables
Discretization
Missing data
DBN variables

Note: Add nodes from data does not add links (these can be added manually or with Structural Learning), and it does not define any parameters (these can be set manually or with Parameter Learning).

Defining variables

Nodes are added to a network by defining variables.

To change the options for more than one variable at once, select the variables in the grid, and click Edit Selected.

By default, each defined variable will generate a new node, but if two or more variable definitions share the same node name, a single node will be generated containing multiple variables. It is also possible to add variables to existing nodes.
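The grouping rule described above can be illustrated with a small sketch. This is not the product's API: the function name `group_variables_by_node` and the `(variable name, node name)` tuple layout are hypothetical, chosen only to show how definitions that share a node name end up in a single node.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_variables_by_node(
    definitions: Iterable[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Group (variable_name, node_name) pairs by node name.

    Hypothetical helper for illustration only: each distinct node name
    yields one node, holding every variable defined with that name.
    """
    nodes: Dict[str, List[str]] = defaultdict(list)
    for variable_name, node_name in definitions:
        nodes[node_name].append(variable_name)
    return dict(nodes)


# "X" and "Y" share the node name "Position", so they form one
# multi-variable node; "Height" gets a node of its own.
definitions = [("Height", "Height"), ("X", "Position"), ("Y", "Position")]
nodes = group_variables_by_node(definitions)
```

Here `nodes` maps each node name to the variables it would contain, so a node with more than one entry corresponds to a multi-variable node.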

Discretization

It is sometimes useful to discretize continuous data, generating a discrete variable in which each state represents a continuous interval.

Discrete variables whose state values are intervals can be created manually, but it is often useful to generate them from a data source. To generate a discretized variable, change the Discretization option to use one of the discretization algorithms. In addition, update the DiscretizationOptions if required.

The following discretization algorithms are available:

Clustering - uses a probabilistic clustering algorithm to determine the intervals, based on cluster centers.
Equal frequencies - defines intervals such that each one represents a similar number of items from the data source.
Equal ranges - defines intervals by splitting the range of continuous values into equal amounts.
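To make the difference between the last two algorithms concrete, the following is a minimal, standard-library sketch of equal-range and equal-frequency interval selection. It is an illustration under simple assumptions, not the product's implementation: the function names are hypothetical, missing values are assumed to have been removed already, and the clustering algorithm is not shown.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def equal_ranges(values: Sequence[float], bins: int) -> List[Interval]:
    """Split [min, max] into `bins` intervals of equal width."""
    if bins < 1 or not values:
        raise ValueError("need at least one bin and one value")
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    # Use `hi` for the final edge to avoid floating-point drift.
    edges = [lo + i * width for i in range(bins)] + [hi]
    return list(zip(edges[:-1], edges[1:]))


def equal_frequencies(values: Sequence[float], bins: int) -> List[Interval]:
    """Choose edges so each interval covers a similar number of items."""
    if bins < 1 or not values:
        raise ValueError("need at least one bin and one value")
    ordered = sorted(values)
    n = len(ordered)
    # Interior edges are the values found at evenly spaced ranks.
    edges = [ordered[0]]
    edges += [ordered[(i * n) // bins] for i in range(1, bins)]
    edges.append(ordered[-1])
    return list(zip(edges[:-1], edges[1:]))


data = [1, 2, 3, 4, 10, 20, 30, 100]
range_intervals = equal_ranges(data, 4)
frequency_intervals = equal_frequencies(data, 4)
```

With this skewed sample, the equal-range intervals are each 24.75 wide, so most items fall in the first interval, whereas the equal-frequency edges (1, 3, 10, 30, 100) place two items in each interval when intervals are read as closed on the left and open on the right, with the last interval closed.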
