Confusion matrix

A Confusion Matrix is used to evaluate the performance of a classification model.

A classification model is simply a model that is used to predict a discrete variable.

There is a cell in the matrix for each combination of actual and predicted values. Each cell displays a count: the number of records with that particular combination of actual and predicted values.
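As an illustration only (this is not the software's own implementation), the sketch below shows how such a matrix of counts could be built in Python from hypothetical lists of actual and predicted labels.

    from collections import Counter

    # Hypothetical actual and predicted labels for a binary classifier.
    actual    = ["Yes", "Yes", "No", "No", "Yes", "No", "No", "Yes"]
    predicted = ["Yes", "No",  "No", "Yes", "Yes", "No", "No", "Yes"]

    # Count each (actual, predicted) combination - one cell per combination.
    counts = Counter(zip(actual, predicted))

    labels = ["Yes", "No"]
    for a in labels:
        row = [counts[(a, p)] for p in labels]
        print(f"actual={a}: {row}")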

By changing the Display Value, a probability can be displayed instead.

Probability given actual

Probability given actual is the cell count divided by the row total (the total number of records with that actual value in the data).

For binary classifiers, the terms recall/sensitivity/true positive rate and specificity/true negative rate are sometimes used to refer to particular cells.
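Continuing the hypothetical sketch above, probability given actual corresponds to normalizing each row of the count matrix; with a "Yes"/"No" target, the diagonal entries of the row-normalized matrix are the recall/sensitivity and the specificity.

    # Row-normalize the counts: cell count / row total (probability given actual).
    # Assumes the 'counts' and 'labels' variables from the earlier sketch.
    for a in labels:
        row_total = sum(counts[(a, p)] for p in labels)
        probs = [counts[(a, p)] / row_total for p in labels]
        print(f"P(predicted | actual={a}) = {probs}")

    # With labels = ["Yes", "No"], P(predicted=Yes | actual=Yes) is the
    # recall / sensitivity / true positive rate, and
    # P(predicted=No | actual=No) is the specificity / true negative rate.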

Probability given predicted

Probability given predicted is the cell count divided by the column total (the total number of records with that predicted value).

For binary classifiers, the terms precision/positive predictive value are sometimes used to refer to particular cells.
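Similarly, probability given predicted corresponds to normalizing each column of the count matrix; in the hypothetical binary example, P(actual=Yes | predicted=Yes) is the precision/positive predictive value.

    # Column-normalize the counts: cell count / column total (probability given predicted).
    # Assumes the 'counts' and 'labels' variables from the earlier sketch.
    for p in labels:
        col_total = sum(counts[(a, p)] for a in labels)
        probs = [counts[(a, p)] / col_total for a in labels]
        print(f"P(actual | predicted={p}) = {probs}")

    # P(actual=Yes | predicted=Yes) is the precision / positive predictive value.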

An example confusion matrix is shown below.

[Image: an example confusion matrix]

The cells that lie on the diagonal from top left to bottom right represent correct predictions, while the off-diagonal cells represent incorrect predictions. The diagonal cells are surrounded by a black border for easy identification.
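Using the same hypothetical counts, the diagonal cells give the overall accuracy: the diagonal total divided by the grand total.

    # Correct predictions lie on the diagonal (actual == predicted).
    correct = sum(counts[(a, a)] for a in labels)
    total = sum(counts.values())
    print(f"accuracy = {correct / total}")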

A confusion matrix can be generated from the Statistics tab in the Batch Query window.

The actual value can be included in the output via an Information Column, defined in the Data Map window.

An alternative way of accessing the actual value is to map the predicted variable in the Data Map window, and use retracted evidence so that the prediction does not use the actual value.