# Online learning

## Introduction

Online learning, also known as adaptation, allows the distributions in a Bayesian network to be updated incrementally, so that, for example, a network can adapt to an incoming stream of data.

This process differs from batch parameter learning, which iteratively learns from historic data; however, batch parameter learning can be used to train an initial network which is then adapted with online learning.

Currently, online learning supports discrete (non-temporal) nodes, missing data, noisy nodes and discrete latent nodes. It does not yet support continuous or temporal nodes, although they can still be included in the network.

## Experience tables

Online learning of discrete nodes requires the use of experience tables and, optionally, fading tables (discussed later).

Before online learning can be performed, each distribution that you wish to update must have an experience table associated with it. An experience table
can be added or edited in the same way as a standard probability distribution for a node; simply change the **Kind** drop-down to **Experience** and edit in the usual way.

An experience value is required for each parent combination of a node. The value reflects the amount of prior knowledge we have about the associated probabilities: for example, a value of 1000 says that we have considerable confidence in the associated probability values, whereas a value of 1 says that we have little confidence in them.

An experience value of zero indicates that no adaptation should be performed for that parent configuration.

Multiplying the experience value by the probability values for a node yields the parameters of a Dirichlet distribution, which is used during the online learning process. This is equivalent to a standard fully Bayesian approach.
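The relationship between experience values, probabilities and the Dirichlet distribution can be sketched as follows. This is an illustrative example of the underlying mathematics only, assuming hard (fully observed) evidence; the function names are hypothetical and do not correspond to a specific library API.

```python
# Illustrative sketch (hypothetical names, not a specific library API):
# how an experience value and a probability distribution combine into
# Dirichlet parameters, and how a single observed case updates them.

def dirichlet_params(probabilities, experience):
    """experience * probabilities gives the Dirichlet alpha parameters."""
    return [experience * p for p in probabilities]

def adapt(probabilities, experience, observed_state):
    """Update after observing one hard case for this parent configuration."""
    alphas = dirichlet_params(probabilities, experience)
    alphas[observed_state] += 1.0       # count the new observation
    new_experience = experience + 1.0   # total experience grows by one case
    new_probabilities = [a / new_experience for a in alphas]
    return new_probabilities, new_experience

# A two-state distribution with experience 10 (ten cases' worth of confidence).
probs, exp = adapt([0.7, 0.3], 10.0, observed_state=1)
# probs -> [7/11, 4/11], exp -> 11.0
```

Note how a larger experience value makes each new case shift the probabilities less, which matches the interpretation of experience as confidence in the current values.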

When a node has an experience table assigned, an **E** symbol appears in the node toolbar (an **F** for a fading table).

## Fading tables

In addition to an experience table, a fading table can optionally be added so that previous values are given less importance, i.e. the influence of previous knowledge gradually fades away.

A fading value in the interval (0, 1] is required for each parent combination. A value of 1 means no fading, while lower values (e.g. 0.99 or 0.9) apply progressively stronger fading.
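One common way to realize fading is to discount the accumulated experience by the fading factor before each update. The sketch below illustrates this idea under the same hard-evidence assumption as before; it is a hypothetical illustration, not a specific library API.

```python
# Illustrative sketch (hypothetical, not a specific library API): fading
# discounts prior experience before the new case is counted, so older
# knowledge gradually loses influence.

def adapt_with_fading(probabilities, experience, observed_state, fading=1.0):
    """Update one parent configuration; fading is in (0, 1], 1.0 = no fading."""
    faded_experience = experience * fading          # discount prior knowledge
    alphas = [faded_experience * p for p in probabilities]
    alphas[observed_state] += 1.0                   # count the new observation
    new_experience = faded_experience + 1.0
    return [a / new_experience for a in alphas], new_experience

# With fading 0.9, 100 prior cases count as only 90 before the new case.
probs, exp = adapt_with_fading([0.7, 0.3], 100.0, observed_state=0, fading=0.9)
# exp -> 91.0
```

With repeated updates the experience converges towards an upper bound of 1 / (1 - fading), so a fading value of 0.99 effectively limits the memory to roughly the last 100 cases.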

## Initial experience

The initial values for experience tables can be set in a number of ways. They can be set manually (or through code) to reflect the confidence in the associated probability values,
or they can be initialized during batch learning: simply enable the **SaveHyperParameters** option during parameter learning, and
experience tables will automatically be assigned to the learned nodes based on the training data.

## Algorithm

The algorithm employed for adaptation is based on **Sequential updating of conditional probabilities on directed graphical structures**,
David J. Spiegelhalter and Steffen L. Lauritzen, 1990.

Note that experience table values can sometimes decrease following adaptation. This happens when an event with a high prior probability is contradicted by the evidence used during adaptation.