In this article we describe a technique called **Virtual accuracy** or **soft accuracy**.

Simply put, it enables us to make use of the fact that Bayesian network models can perform better or worse under certain conditions.

This is useful when the overall accuracy of the model is not sufficient, but under certain (soft) conditions it performs very well. Soft here means that the conditions need not be hard, such as Country=USA, but can be soft, such as P(Country=USA, DayOfWeek=Saturday) = 0.7.
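To make the distinction concrete, here is a minimal sketch representing hard and soft conditions as probability distributions over a variable's states. The dictionary representation is purely illustrative and is not the Bayes Server evidence API.

```python
# Illustrative only: conditions represented as distributions over states.

# A hard condition fixes a single state with certainty.
hard_country = {"USA": 1.0, "UK": 0.0, "Other": 0.0}

# A soft condition spreads probability mass across states.
soft_country = {"USA": 0.7, "UK": 0.2, "Other": 0.1}

# Both are valid forms of evidence; each distribution sums to 1,
# but only the hard condition places all its mass on one state.
assert abs(sum(hard_country.values()) - 1.0) < 1e-9
assert abs(sum(soft_country.values()) - 1.0) < 1e-9
```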

For example, given a health data set, a model may only give us 80% accuracy overall, but for certain groups of people / under certain conditions, it may give us an accuracy of 95%.

When we talk about performance of a model, we mean the accuracy of the classification or regression problem on an unseen (test/validation) set of data.

With Bayes Server, the accuracy of a classification model (predicting a discrete variable) or regression model (predicting a continuous variable) can be measured in the usual ways.

For **classification** models we can use the overall **Accuracy metric**, i.e. the percentage of predictions that were correct, along with
a confusion matrix, which tells us where the model performs well and where it performs badly (but only in simple 'hard' terms such as Country=USA).

For **regression** problems we can use standard metrics such as the **Mean absolute error (MAE)**, the **Root mean squared error (RMSE)** or the **R squared statistic**.
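As a quick refresher, the metrics above can be sketched in a few lines. These are standard textbook definitions (not Bayes Server code), evaluated on a held-out test set.

```python
import math

def accuracy(actual, predicted):
    """Fraction of classifications that match the actual class."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

def mae(actual, predicted):
    """Mean absolute error for a regression target."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error for a regression target."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Tiny held-out set, for illustration.
acc = accuracy(["USA", "UK", "USA"], ["USA", "USA", "USA"])  # 2 of 3 correct
err_mae = mae([1.0, 2.0], [1.5, 2.5])
err_rmse = rmse([1.0, 2.0], [1.5, 2.5])
```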

Virtual/soft accuracy is a bit like a probabilistic version of a confusion matrix that also considers multiple variables at once, and crucially it can also be used on regression problems.

Virtual accuracy simply means that, when a model performs better under certain conditions (evidence scenarios), we can sometimes make use of this fact.
It could be a simple condition such as a particular **Country** or **Time of day**, or it could be a combination of **Country** and **Time of day**, and so on.

A more sophisticated approach however, is to let the algorithm automatically determine the conditions using one or more clusters (discrete latent variables).

As well as introducing one or more latent variables, we collect metrics for them after training, so that we know how each cluster (a state of a latent variable) performs.

Once a model is built, we can then use this to predict the probability of membership of each cluster given the current evidence. Given these probabilities we can merge the metrics associated with each cluster (weighting by the probability) to give us a sense of how performant the model is under these conditions.

Returning to the example of health data, we can then look at hundreds of thousands of patient records, and perform certain actions for those where the conditional performance of the model is high.

A cluster variable is simply a discrete variable with a number of states (clusters). The key difference to a normal variable is that it does not have any data mapped to it. It allows us to capture hidden patterns (features) in the data.

For more detail please see the article on latent variables.

In order to determine the metrics per cluster, which can be done as a post-training step, we use the following data and predictions based on the test data:

- The probability of membership of each cluster, i.e. P(Cluster1|evidence), P(Cluster2|evidence), etc.
- The actual target value (the value we are trying to predict)
- The predicted target value (the value the model predicts)

Note that calculating the probability for each cluster can be done in the same way as predicting any other discrete variable.

Then, for each cluster **c** in turn, we calculate the standard performance metrics, weighting each record by **P(c | evidence)**.
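The per-cluster weighting step can be sketched as follows, here using a weighted MAE. Each test record carries the model's cluster membership probabilities, the actual target and the predicted target. All names and values are illustrative, not the Bayes Server API.

```python
# Sketch: per-cluster weighted MAE on a test set.

def weighted_mae(actuals, predictions, weights):
    """MAE where each record contributes in proportion to its weight."""
    total_weight = sum(weights)
    return sum(w * abs(a - p)
               for a, p, w in zip(actuals, predictions, weights)) / total_weight

# Per record: (P(Cluster1|evidence), P(Cluster2|evidence)), actual, predicted.
records = [
    ((0.9, 0.1), 10.0, 10.5),
    ((0.2, 0.8), 20.0, 22.0),
    ((0.6, 0.4), 15.0, 15.2),
]

actuals = [r[1] for r in records]
preds = [r[2] for r in records]

# Metric for cluster 1, weighting each record by P(Cluster1 | evidence).
mae_c1 = weighted_mae(actuals, preds, [r[0][0] for r in records])

# Metric for cluster 2, weighting each record by P(Cluster2 | evidence).
mae_c2 = weighted_mae(actuals, preds, [r[0][1] for r in records])
```

In this toy data the second record has a large error and mostly belongs to cluster 2, so cluster 2's weighted MAE comes out worse, i.e. the model performs better under conditions associated with cluster 1.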

The Bayes Server API allows a weight to be given to each record when calculating either classification or regression metrics. For more information see the `RegressionStatistics` and `ConfusionMatrix` classes.

What we end up with is one or more metrics for each cluster. (We can of course also calculate the overall metrics for the model).

Note that Bayes Server also supports latent variables in Time Series models, and therefore the same approach can be used.

If you intend to ignore predictions on data that are deemed anomalous by the model, it is useful to also exclude anomalous test data when calculating both the overall accuracy metrics and the cluster-based metrics.

This can be done easily by excluding from the metric calculations any records whose log-likelihood (or cdf(log-likelihood)) falls under a certain threshold.
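The filtering step can be sketched as below. The threshold value and record fields are illustrative assumptions; in practice the log-likelihood per record comes from the model and the threshold is tuned to the data.

```python
# Sketch: exclude anomalous test records before computing metrics.
# Records with a log-likelihood below the threshold are treated as
# anomalies and skipped. Threshold and field names are illustrative.

LOG_LIKELIHOOD_THRESHOLD = -10.0  # assumed; tune per model and data set

records = [
    {"log_likelihood": -2.3, "actual": 1, "predicted": 1},
    {"log_likelihood": -15.8, "actual": 0, "predicted": 1},  # anomalous
    {"log_likelihood": -4.1, "actual": 0, "predicted": 0},
]

retained = [r for r in records
            if r["log_likelihood"] >= LOG_LIKELIHOOD_THRESHOLD]

# Overall and per-cluster metrics are then computed on `retained` only.
```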

Please see the article on anomaly detection for more information on how to calculate these anomaly scores.

When we come to using the model on new data, we can again calculate the probability of membership of each cluster **c** given the evidence which we will denote **P(c | evidence)**,
and then merge the metrics for each cluster weighted by **P(c | evidence)**.
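The merge step can be sketched as a probability-weighted average of the per-cluster metrics. The metric values and threshold below are made up for illustration; the article's point is only the weighting scheme.

```python
# Sketch: merge per-cluster metrics at prediction time, weighted by
# cluster membership probabilities for the current evidence.

# Per-cluster accuracy measured on the test set (post-training step).
cluster_accuracy = {"Cluster1": 0.95, "Cluster2": 0.72}

# Membership probabilities for the current evidence, P(c | evidence).
membership = {"Cluster1": 0.85, "Cluster2": 0.15}

# Virtual accuracy for this evidence scenario: sum over clusters of
# accuracy(c) * P(c | evidence).
virtual_accuracy = sum(cluster_accuracy[c] * membership[c]
                       for c in cluster_accuracy)

# Act only when the conditional performance is high enough.
THRESHOLD = 0.9  # assumed decision threshold
take_action = virtual_accuracy >= THRESHOLD
```

Here the evidence mostly implies membership of the better-performing cluster, so the merged (virtual) accuracy is close to that cluster's accuracy and clears the threshold.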

Bayes Server includes methods in the `CrossValidation` class to perform these combinations.

We then have metrics that are conditional on this particular scenario, and we can decide to take action only if they are sufficiently high.