Comparison queries compare one set of probabilities against another, and are often used to automatically derive insight from a network.
Often the absolute probabilities returned from a standard query are less useful than the change or ratio when compared to the overall population or a subset of interest.
For a more automated approach consider using Auto insight.
A comparison can be made against the network with no evidence set, or against base evidence. Base evidence (if required) can be set on the Comparison tab of the Query group on the main ribbon toolbar. When a comparison is made against no evidence, you are comparing the current evidence against the overall population. When you compare against base evidence, however, you are comparing against a particular subset of the population.
There are two types of comparison available. The first is a difference comparison (detects large patterns), which calculates the current probability minus the probability with no evidence or with the base evidence.
The second is a lift comparison (detects anomalous patterns), which calculates the ratio of the current probability to the probability with no evidence or with the base evidence.
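The two comparisons can be written compactly as follows. The notation is an assumption introduced here for illustration: e denotes the current evidence, b the base evidence, and x a state of a variable X. When no base evidence is set, P(X = x | b) is simply the probability with no evidence, P(X = x).

```latex
\begin{align*}
\text{difference}(x) &= P(X = x \mid e) - P(X = x \mid b) \\
\text{lift}(x)       &= \frac{P(X = x \mid e)}{P(X = x \mid b)}
\end{align*}
```

A difference of 0 or a lift of 1 means the evidence has not changed the probability of x. The lift is undefined when the baseline probability is 0.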
A difference comparison is very useful for discrete probability values because it takes the likelihood of that scenario (support) into account. A lift (ratio) comparison will often give large values when the overall probability is very low, so those results may not be of interest. A difference comparison, on the other hand, implicitly takes this support into account. For example, a probability that goes from 0.002 to 0.004 has a high lift value of 2 but a low difference of 0.002, whereas a probability that goes from 0.2 to 0.4 has the same high lift value of 2 but a high difference value of 0.2. It is usually the latter that is of more interest.
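The arithmetic in this example can be reproduced with a minimal Python sketch. The function names difference and lift are hypothetical helpers written for illustration, not part of any product API.

```python
def difference(current: float, base: float) -> float:
    """Current probability minus the baseline probability."""
    return current - base


def lift(current: float, base: float) -> float:
    """Ratio of the current probability to the baseline probability."""
    if base == 0.0:
        raise ZeroDivisionError("lift is undefined when the baseline is 0")
    return current / base


# (baseline, current) pairs taken from the example in the text.
rare = (0.002, 0.004)   # low-support scenario
common = (0.2, 0.4)     # high-support scenario

rare_lift = lift(rare[1], rare[0])
rare_diff = difference(rare[1], rare[0])
common_lift = lift(common[1], common[0])
common_diff = difference(common[1], common[0])

print(f"rare:   lift={rare_lift:.1f}  difference={rare_diff:.3f}")
print(f"common: lift={common_lift:.1f}  difference={common_diff:.3f}")
```

Both scenarios have the same lift, so lift alone cannot separate them. The difference is 100 times larger for the high-support scenario.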
Note that you can check the support in a dataset for a particular evidence scenario by clicking the Data Count button on the Data tab of the main ribbon toolbar.
As an example of using a comparison query, consider a classification model which has been learnt from data to predict the probability of a customer purchasing a product. To gain insight about what is different between those that purchase and those that do not, we can use a comparison query. First set the purchase variable to False, so that the network shows the probabilities for when a customer does not purchase. Then, capture this evidence as base evidence using the buttons on the Comparison tab of the Query group on the main ribbon toolbar. Finally, change the purchase variable to True, and change the Comparison mode to Difference. Any significant changes will be highlighted by colored arrows in the network.
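The workflow above can be imitated outside the user interface with a small Python sketch. Everything in it is an assumption made for illustration: the variables (Age, Visited store), their probabilities, and the 0.2 threshold are invented, and the dictionaries stand in for the marginals that the network would show under each evidence setting. It does not call any product API.

```python
# Hypothetical marginals with purchase = False (the base evidence).
base = {
    "Age": {"Under 30": 0.50, "30 to 50": 0.30, "Over 50": 0.20},
    "Visited store": {"True": 0.25, "False": 0.75},
}

# Hypothetical marginals with purchase = True (the current evidence).
current = {
    "Age": {"Under 30": 0.20, "30 to 50": 0.45, "Over 50": 0.35},
    "Visited store": {"True": 0.70, "False": 0.30},
}


def difference_comparison(current, base):
    """Per-state difference: current probability minus base probability."""
    return {
        variable: {state: p - base[variable][state] for state, p in states.items()}
        for variable, states in current.items()
    }


def largest_changes(differences, threshold):
    """States whose absolute difference meets the (assumed) threshold."""
    flagged = [
        (variable, state, delta)
        for variable, states in differences.items()
        for state, delta in states.items()
        if abs(delta) >= threshold
    ]
    return sorted(flagged, key=lambda item: abs(item[2]), reverse=True)


differences = difference_comparison(current, base)
for variable, state, delta in largest_changes(differences, threshold=0.2):
    direction = "up" if delta > 0 else "down"
    print(f"{variable} = {state}: {delta:+.2f} ({direction})")
```

The printed direction plays the role of the colored arrows: it marks the states whose probability differs most between purchasers and non-purchasers.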