# Data sampling

## Introduction

Data Sampling is the process of generating data from a Bayesian network or Dynamic Bayesian network. Some uses for Data Sampling are:

- Generating data to understand and visualize a network.
- Generating test data.
- Understanding how the log-likelihood varies.

## Log Likelihood

When the **Log Likelihood** option is **true**, an additional column is added to the sampled data, which reports the log-likelihood of the generated sample.

[!IMPORTANT] When evidence is fixed, each case has an associated weight that can be output if necessary. The log-likelihood output, however, is unweighted; that is, it reflects the log-likelihood of the case as if it had a weight of 1.
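As a minimal sketch of what the log-likelihood column reports, the following Python samples from a hypothetical two-node network (`Rain` and `WetGrass`, with made-up probabilities) and records the log-likelihood of each sampled case. It does not use the Bayes Server API; every name and number here is an illustrative assumption.

```python
import math
import random

# Hypothetical two-node network Rain -> WetGrass; all probabilities are made up.
P_RAIN = {True: 0.2, False: 0.8}
P_WET_GIVEN_RAIN = {
    True: {True: 0.9, False: 0.1},
    False: {True: 0.1, False: 0.9},
}


def sample_case(rng):
    """Draw one case by forward sampling and attach its log-likelihood."""
    rain = rng.random() < P_RAIN[True]
    wet = rng.random() < P_WET_GIVEN_RAIN[rain][True]
    # Log-likelihood of the full case: log P(Rain) + log P(WetGrass | Rain).
    log_likelihood = math.log(P_RAIN[rain]) + math.log(P_WET_GIVEN_RAIN[rain][wet])
    return {"Rain": rain, "WetGrass": wet, "LogLikelihood": log_likelihood}


rng = random.Random(0)
rows = [sample_case(rng) for _ in range(5)]
for row in rows:
    print(row)
```

Each printed row carries the sampled values plus one extra column, mirroring the additional column described above.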

## Weight / Log(weight)

Bayes Server supports sampling data when certain evidence is fixed. When evidence is fixed, each sample has an associated weight, which is the likelihood of the fixed evidence for the current sample. This indicates how likely it is that the fixed evidence could have occurred with this sample.

To output the weight, ensure that either the **Weight** option or the **Log Weight** option is **true**.

Log weight is useful when the likelihood of the fixed evidence can be very small, such as when sampling from time series and sequences. The log weight is simply log(weight), but it is calculated in such a way that it does not suffer from underflow problems.
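The underflow problem can be illustrated with a short Python sketch. It assumes the weight of a sampled sequence is the product of per-time-step likelihoods of the fixed evidence; the 2000 steps and the value 0.1 are made-up numbers, and this is not a description of how Bayes Server computes the value internally.

```python
import math

# Hypothetical per-time-step likelihoods of the fixed evidence for one
# sampled sequence (made-up values for illustration).
step_likelihoods = [0.1] * 2000

# Naive weight: multiplying many small numbers underflows to 0.0.
weight = 1.0
for p in step_likelihoods:
    weight *= p

# Log weight: accumulating in log space stays finite.
log_weight = sum(math.log(p) for p in step_likelihoods)

print(weight)      # 0.0 because of floating-point underflow
print(log_weight)  # roughly -4605.17
```

Taking `math.log(weight)` after the fact would fail here, because the weight has already collapsed to zero; working in log space throughout avoids that.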

## Current Evidence

When the **Current Evidence** option is **true**, any evidence currently entered in the Bayesian network or Dynamic Bayesian network is used in the data sampling process.

## Missing Data

Specifying a value between 0 and 1 (inclusive) in the missing data **Probability** text box causes a proportion of values to be randomly set to missing (null/unobserved). Optionally, an additional minimum probability can be specified in the **Probability (Min)** text box. When set, the missing data probability for each case varies randomly between the two specified probabilities.

The missing data mechanism used is Missing Completely At Random (MCAR).
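The behaviour described above can be sketched in Python as follows. The function `apply_missing` and its parameters are hypothetical names, and drawing the per-case probability uniformly between the two bounds is an assumption; the text only states that it varies randomly.

```python
import random


def apply_missing(rows, probability, probability_min=None, rng=None):
    """Return copies of rows with values set to None completely at random."""
    rng = rng or random.Random()
    result = []
    for row in rows:
        if probability_min is None:
            p = probability
        else:
            # Assumption: the per-case probability is uniform between the bounds.
            p = rng.uniform(probability_min, probability)
        # MCAR: whether a value goes missing is independent of the data itself.
        result.append({k: (None if rng.random() < p else v) for k, v in row.items()})
    return result


data = [{"A": 1, "B": 2, "C": 3} for _ in range(4)]
for row in apply_missing(data, probability=0.5, probability_min=0.1, rng=random.Random(1)):
    print(row)
```

Because each value is dropped independently of both observed and unobserved values, the result is Missing Completely At Random by construction.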

## Export

Sampled data can be exported. This is useful, for example, for analyzing how the log-likelihood varies.

## DBN Sampling

When sampling from networks with DBN variables (Dynamic Bayesian networks), the sample count is the number of cases generated, not the number of sequence rows generated. Each case has its own sequence.
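The distinction between cases and sequence rows can be seen in a small Python sketch. It uses a hypothetical two-state process with made-up transition probabilities, and the way sequence lengths are chosen (`min_length`, `max_length`) is an assumption made purely for illustration.

```python
import random


def sample_sequences(sample_count, min_length, max_length, rng):
    """Generate `sample_count` cases, each with its own sequence of rows."""
    rows = []
    for case_id in range(sample_count):
        length = rng.randint(min_length, max_length)  # inclusive bounds
        state = rng.random() < 0.5
        for t in range(length):
            rows.append({"Case": case_id, "Time": t, "State": state})
            # Hypothetical transition: keep the current state with probability 0.8.
            if rng.random() >= 0.8:
                state = not state
    return rows


rows = sample_sequences(sample_count=3, min_length=2, max_length=5, rng=random.Random(0))
case_count = len({row["Case"] for row in rows})
print(case_count)  # 3 cases, matching the sample count
print(len(rows))   # total sequence rows, somewhere between 6 and 15
```

A sample count of 3 yields exactly 3 cases, while the number of rows depends on the length of each sequence.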