Tutorial 6 - Anomaly detection

In this tutorial we will demonstrate how to use Bayesian networks to perform anomaly detection on un-seen data.

Anomaly detection is the process of detecting data which is considered unusual or represents fault conditions.

We will use a semi-supervised anomaly detection approach. This entails training a model with data that is considered 'normal'. The 'normal' model can then be used to test un-seen data to determine whether it is also considered 'normal', otherwise it is considered anomalous (unusual), and might represent a faulty system. A semi-supervised approach assumes that a model can be built on training data that is considered 'normal', i.e. we may have had to first remove 'abnormal' data from the training data set.
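The semi-supervised idea can be sketched in a few lines of Python. This is a minimal illustration, not the Bayes Server implementation: it assumes a single continuous variable modelled by one Gaussian, and the training values, the threshold margin and the function names (`fit_normal_model`, `log_density`, `is_anomalous`) are all hypothetical.

```python
import math
import statistics

def fit_normal_model(values):
    """Fit a 1-D Gaussian (mean, standard deviation) to data assumed to be 'normal'."""
    return statistics.fmean(values), statistics.stdev(values)

def log_density(x, mean, std):
    """Log of the Gaussian probability density at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))

def is_anomalous(x, model, threshold):
    """Flag x when its log density under the 'normal' model falls below the threshold."""
    mean, std = model
    return log_density(x, mean, std) < threshold

# Hypothetical 'normal' training readings (illustrative values only).
training = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
model = fit_normal_model(training)

# Assumption: the threshold is the lowest training score minus a safety margin.
threshold = min(log_density(v, *model) for v in training) - 1.0

print(is_anomalous(10.05, model, threshold))  # a reading close to the training data
print(is_anomalous(14.0, model, threshold))   # a reading far from the training data
```

No anomaly labels are needed at any point; the model only ever sees data regarded as 'normal'.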

Bayesian networks can also be used for supervised and unsupervised anomaly detection.

Supervised anomaly detection requires that each case (row) in the data is labeled in some way to identify it as normal or abnormal. There might be a single column with mutually exclusive states, some of which are normal, and some of which are abnormal. Alternatively there may be multiple columns each representing a particular known anomaly (fault), and the data flags each as either false or true. A classification model is then trained to predict abnormal states, and can then be used on un-seen data to predict the probability of each abnormal state. This approach may not be appropriate if there is insufficient data labeled anomalous, or if it is too expensive to label data, or anomalies of the same type are rare.

Unsupervised anomaly detection uses data that contains both 'normal' and 'abnormal' data. The training process will either need to build a model and automatically remove elements of the model which are deemed 'abnormal', or alternatively data considered anomalous can be automatically removed before training begins. In both cases Bayesian networks can be used to train the model, but an additional algorithm is required to determine which parts of the model or data should be discarded.

Anomaly detection network

The following concepts will be covered:

  • Parameter learning
  • Log likelihood prediction
  • Conflict measure
  • Batch queries

Bayes Server must be installed before starting this tutorial. An evaluation version can be downloaded from the Downloads page.

The 'normal' model

For this tutorial we will use a Bayesian network mixture model as our 'normal' model. We could use a variety of different types of model for our 'normal' model, but whichever type we use, as we are building a semi-supervised model, it will not include nodes/variables representing particular anomalies (faults).

The model we are using in this tutorial uses inputs that are continuous, however the same techniques can be used with discrete variables. Similarly, we could use a Dynamic Bayesian network to detect anomalous (unusual) time series.

Rather than creating and learning a model from scratch, we will re-use the mixture model created in Tutorial 2 - Mixture model.

This model was learned from data, which for the purposes of this tutorial, we will consider normal. Therefore we consider the model a summary of 'normal' behavior.

This model only contains two continuous inputs in order to keep this tutorial simple, however models with far more continuous and/or discrete inputs can be created.

Opening the model

  • Launch Bayes Server, and on the Start page click the network entitled 'Tutorial 2 - Mixture model' in the Sample networks pane.

If the Start page is not set to display on start up, or has been closed, click the Start page button, on the View tab, General group.

Using the 'normal' model to test un-seen data

We will use our 'normal' model to test whether un-seen data is anomalous. We know that the un-seen data is considered 'normal' for the first 300 points, after which the system degrades over the next 100 points. Since we do not have outputs in this model representing anomalies (faults), we will use the log-likelihood statistic and conflict measure to detect anomalous behavior.
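To make the log-likelihood statistic concrete, the following Python sketch computes the log-likelihood of a single (X, Y) case under a two-dimensional Gaussian mixture. The mixture weights, means and covariances are hypothetical values chosen for illustration; they are not the parameters learned in Tutorial 2, and the function names are assumptions rather than part of any Bayes Server API.

```python
import math

def gaussian2d_log_pdf(x, y, mean, cov):
    """Log density of a bivariate Gaussian with covariance ((sxx, sxy), (sxy, syy))."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x - mean[0], y - mean[1]
    # Quadratic form d^T cov^-1 d, using the closed-form inverse of a 2x2 matrix.
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * quad - math.log(2.0 * math.pi) - 0.5 * math.log(det)

def mixture_log_likelihood(x, y, components):
    """log sum_k w_k N((x, y) | mean_k, cov_k), computed with log-sum-exp for stability."""
    terms = [math.log(w) + gaussian2d_log_pdf(x, y, m, c) for w, m, c in components]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

# Hypothetical mixture: (weight, mean, covariance) per component. Illustrative only.
components = [
    (0.50, (1.0, 8.0), ((1.0, -0.5), (-0.5, 2.0))),
    (0.25, (2.0, 2.0), ((1.5, 0.0), (0.0, 1.5))),
    (0.25, (9.0, 4.0), ((2.0, 1.0), (1.0, 1.5))),
]

# Case 0 (from the first 300 points) and case 399 (from the degraded period).
ll_early = mixture_log_likelihood(10.64540994, 4.857540094, components)
ll_late = mixture_log_likelihood(7.447909015, 9.153926296, components)
print(ll_early, ll_late)
```

Under these assumed parameters the degraded case scores a much lower log-likelihood than the early case, which is the behavior the batch query is used to detect.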

The un-seen data was collected over a period of time, however in this tutorial we are not using a time series model. Although this means that we are monitoring each case (row) in isolation, we can still compare the predictions over time, to see if any significant changes have occurred. If we had data where the interaction between data in time was important we could use a time series model and exactly the same techniques presented here.
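One simple way to compare per-case predictions over time is to smooth them with a rolling mean and raise an alarm when the smoothed value crosses a threshold. The sketch below assumes this approach; the window size, threshold and example scores are hypothetical, and `rolling_mean` and `first_alarm` are illustrative names only.

```python
from collections import deque

def rolling_mean(values, window):
    """Yield the mean of the most recent `window` values, once the window is full."""
    buf = deque(maxlen=window)
    total = 0.0
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # this value is evicted by the append below
        buf.append(v)
        total += v
        if len(buf) == window:
            yield total / window

def first_alarm(scores, window, threshold):
    """Return the index of the first case whose rolling mean falls below the threshold."""
    for i, mean in enumerate(rolling_mean(scores, window)):
        if mean < threshold:
            return i + window - 1  # index of the last case in the window
    return None

# Hypothetical log-likelihood scores: stable at first, then steadily degrading.
scores = [-4.0] * 10 + [-4.0, -5.0, -7.0, -9.0, -12.0, -15.0]
print(first_alarm(scores, window=3, threshold=-6.0))
```

Smoothing trades detection delay for robustness: a larger window ignores isolated low scores but reacts later to a genuine change.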

Adding a data connection

Note: You can skip this step, and instead use the pre-installed Tutorial data connection (Walkthrough Data in earlier versions).

For convenience, we will use Microsoft Excel as the data source, however another database can be substituted.

Although Microsoft Excel is a convenient way of storing data, in practice we recommend using a database as the data source.

  • Select the data (including the header) in the un-seen data section and copy it to the clipboard (Ctrl+C).
  • Open Microsoft Excel and paste the data into a new Microsoft Excel spreadsheet (Ctrl+V).
  • Save the new spreadsheet.
  • In Bayes Server, click the Data Connections button on the Data tab, Data Sources group. This will launch the Data connection manager.
  • Click the New button on the toolbar. This will launch the Data connection editor.
  • In the list of data providers, select the appropriate Excel Driver for the version of Microsoft Excel you are using.
  • Next to the File Name text box, click the Ellipsis (...) button, and select the Microsoft Excel spreadsheet created in an earlier step.
  • Click the Test Connection button, to ensure the new data connection is working.
  • Click OK to add the new Data Connection.

Batch query

  • Click the Batch query button, on the Data tab, Learning group. This will launch the Data tables window.

  • In the Data Connection drop down, select the new Data Connection created in an earlier step, or the Tutorial data connection if you skipped that step. This should enable the Data drop down.

  • In the Data drop down, select the worksheet that contains the data. (If the data is on the first worksheet, select Sheet1$). If you are using the pre-installed Tutorial data connection, select Tutorial 6 - Anomaly.

  • Click the OK button. This will launch the Data map window.

  • In the Data map window, ensure that variable X has automatically been mapped to column X, and variable Y has automatically been mapped to column Y.

    The window should look like this:

    Anomaly detection data map

  • Click the OK button. This will launch the Batch query window.

  • In the query pane on the left hand side, ensure the following queries/information columns are checked.

    • LogLikelihood
    • Conflict
    • X
    • Y
  • Click the Start button on the Batch Query tab, Batch Query group. This outputs the predictions to the window.

    Instead of outputting to the window, you can also output the predictions to a database. This is essential if you are working with large datasets.

    The window should look like this:

    Anomaly detection batch query

Analysis

First we will chart the X and Y data to see whether a univariate analysis would have detected anything unusual. Then we will inspect the log-likelihood statistic and conflict measure, to see if our model detects anything unusual.
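A univariate analysis can be approximated by checking each variable against the range seen during normal operation. The sketch below uses a handful of rows copied from the un-seen data listed at the end of this tutorial (cases 0, 6, 9, 10, 108 and 201 as 'normal', cases 395 and 399 from the degraded period); the helper names are hypothetical.

```python
def value_range(values):
    """Return (minimum, maximum) of a sequence of numbers."""
    return min(values), max(values)

def outside_range(value, bounds):
    """True when value lies outside the closed interval given by bounds."""
    low, high = bounds
    return value < low or value > high

# (X, Y) rows taken from the first 300 cases.
normal_rows = [
    (10.64540994, 4.857540094),
    (1.677249743, 8.889583371),
    (4.02396812, -0.054061777),
    (0.449396841, 10.69805905),
    (12.23964811, 6.866708183),
    (-1.651761931, 10.92694267),
]

# (X, Y) rows taken from the degraded period (cases 395 and 399).
late_rows = [
    (8.775842519, 9.176664068),
    (7.447909015, 9.153926296),
]

x_bounds = value_range([x for x, _ in normal_rows])
y_bounds = value_range([y for _, y in normal_rows])

# One (X flagged, Y flagged) pair per late row.
flags = [(outside_range(x, x_bounds), outside_range(y, y_bounds)) for x, y in late_rows]
print(flags)
```

Neither late row is flagged on X or on Y alone, because each value is individually within the normal range; it is the combination of X and Y that is unusual.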

Charting the batch predictions

  • Select the Charts tab in the Batch query window.

  • In the X drop down, select the Case output column, and in the Y drop down, select the X output column.

    The window should look like this:

    Anomaly detection before plot X

  • Click the Plot button.

    The window should look like the image below. The plot of X does not show anything unusual.

    Anomaly detection data plot X

  • Now change the Y drop down to the Y output column.

  • Click the Plot button.

    The window should look like the image below. The plot of Y does not show anything unusual.

    Anomaly detection data plot Y

  • Now change the Y drop down to the LogLikelihood output column.

  • Click the Plot button.

    The window should look like this:

    Anomaly detection log likelihood plot

    The plot of LogLikelihood clearly shows the system degrading after the first 300 points.

    The LogLikelihood statistic tells us how likely it is that this 'normal' model could have generated the data. Therefore, the lower the log-likelihood, the more unusual the data is.

  • Now change the Y drop down to the Conflict output column.

  • Click the Plot button.

    The window should look like this:

    Anomaly detection conflict plot

    The plot of Conflict clearly shows the system degrading after the first 300 points.

    The Conflict measure tells us whether our data is in conflict, or contains rare cases. The more positive the value, the higher the conflict.

    Although there are odd points in the first 300 that have positive conflict values, it is only in the last 100 points that the conflict becomes consistently, and increasingly, positive.
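A conflict measure of this kind is commonly defined as the log of the product of the individual evidence likelihoods divided by the joint likelihood, i.e. log p(x) + log p(y) - log p(x, y) for two variables. The Python sketch below assumes that definition and applies it to a Gaussian mixture; the exact calculation used by Bayes Server may differ, and the mixture parameters are hypothetical values, not those learned in Tutorial 2.

```python
import math

def log_sum_exp(terms):
    """Numerically stable log(sum(exp(t) for t in terms))."""
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def log_normal_1d(v, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (v - mean) ** 2 / var - 0.5 * math.log(2.0 * math.pi * var)

def log_normal_2d(x, y, mean, cov):
    """Log density of a bivariate Gaussian with covariance ((sxx, sxy), (sxy, syy))."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x - mean[0], y - mean[1]
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * quad - math.log(2.0 * math.pi) - 0.5 * math.log(det)

def conflict(x, y, components):
    """Assumed definition: log p(x) + log p(y) - log p(x, y) under the mixture."""
    log_px = log_sum_exp([math.log(w) + log_normal_1d(x, m[0], c[0][0])
                          for w, m, c in components])
    log_py = log_sum_exp([math.log(w) + log_normal_1d(y, m[1], c[1][1])
                          for w, m, c in components])
    log_pxy = log_sum_exp([math.log(w) + log_normal_2d(x, y, m, c)
                           for w, m, c in components])
    return log_px + log_py - log_pxy

# Hypothetical mixture: (weight, mean, covariance) per component. Illustrative only.
components = [
    (0.50, (1.0, 8.0), ((1.0, -0.5), (-0.5, 2.0))),
    (0.25, (2.0, 2.0), ((1.5, 0.0), (0.0, 1.5))),
    (0.25, (9.0, 4.0), ((2.0, 1.0), (1.0, 1.5))),
]

# Case 0 (from the first 300 points) and case 399 (from the degraded period).
conflict_early = conflict(10.64540994, 4.857540094, components)
conflict_late = conflict(7.447909015, 9.153926296, components)
print(conflict_early, conflict_late)
```

With these assumed parameters the early case has negative conflict, because X and Y support each other, while the degraded case has positive conflict, because each value is plausible on its own but the pair is not.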

Un-seen data

Case X Y
0 10.64540994 4.857540094
1 1.858277161 6.887043079
2 4.363390391 0.332331369
3 3.097652905 1.222215446
4 11.34812194 5.26208
5 7.978567267 2.331910403
6 1.677249743 8.889583371
7 0.943218544 8.516974678
8 0.505818394 8.093009397
9 4.02396812 -0.054061777
10 0.449396841 10.69805905
11 1.925362784 2.225469612
12 9.382223795 4.095529836
13 1.192765728 2.187396285
14 9.495751156 4.76827038
15 9.640764766 4.612566397
16 -0.202980787 9.209244123
17 2.901222654 5.405022924
18 9.157318954 4.103350532
19 8.35168605 4.554919915
20 1.613560356 6.631370206
21 1.926994636 8.900422442
22 2.354454631 1.511882463
23 1.165810694 8.489134043
24 2.723255757 7.022676705
25 0.096239083 10.28379366
26 3.353236614 0.194562003
27 1.096443647 7.004293945
28 2.030138208 7.313298685
29 0.912812243 8.972726928
30 1.411757337 9.012333121
31 1.722380631 1.513168977
32 1.980622554 1.91874769
33 11.92938793 5.640683196
34 7.892630709 2.004474804
35 11.79899669 6.404348843
36 1.959364056 1.630039906
37 0.655941827 7.659455965
38 0.837634148 8.11473975
39 7.827741997 3.170079565
40 9.483674848 2.995045334
41 9.880186904 3.647213739
42 0.69473194 7.582355976
43 9.705443701 4.100001097
44 7.321568043 2.31994677
45 0.043212253 7.928441742
46 0.151648126 9.339952963
47 2.392860022 2.684013809
48 0.608115765 1.399542366
49 0.92509319 11.03247629
50 0.529211664 9.117501375
51 1.314429494 7.016502891
52 8.610005989 2.270471535
53 7.593098031 3.238838179
54 6.563351703 2.753616317
55 0.363676232 9.007170744
56 8.035121368 2.16715187
57 2.4461859 5.006935115
58 10.58988369 4.667900088
59 7.962888541 2.030267777
60 1.394271653 0.776434763
61 8.919205963 3.349914931
62 1.605839528 5.892637225
63 2.469012826 6.241039419
64 1.552044914 9.69524868
65 4.126945892 1.332079416
66 6.61255702 1.750765756
67 1.329355651 0.984670245
68 0.596772113 9.520527654
69 9.595726459 3.893562041
70 8.718174966 2.122822698
71 0.252294333 11.2012356
72 2.571312537 1.932566618
73 1.973336551 3.323971902
74 1.107875811 9.356666422
75 7.738861864 1.330291127
76 2.662068223 7.918250496
77 9.069725719 4.120378852
78 2.237524163 8.663450259
79 2.58268715 1.814570464
80 1.268099256 6.918092698
81 -0.077544644 9.111106523
82 0.991149289 7.7759675
83 1.508756683 6.940412064
84 1.497388477 8.21675735
85 7.621274204 2.57968742
86 1.541571333 8.403780026
87 0.832649678 5.305114876
88 2.596940071 1.63555079
89 1.197426834 9.896596279
90 0.970888215 6.93427585
91 3.14107854 1.733570738
92 0.497437604 3.702207414
93 9.378301347 3.678587253
94 9.246780414 3.167133013
95 8.137931726 2.168996672
96 7.567729836 3.147517874
97 2.056864103 6.652487898
98 3.214803367 1.406952645
99 7.958748891 1.958389643
100 -1.075995143 10.82860161
101 1.538955628 8.010719809
102 9.971018568 2.547467054
103 1.089814325 7.940758239
104 1.691678828 2.48796291
105 2.948813187 3.109591206
106 2.816940563 2.553912046
107 1.609333982 8.906279687
108 12.23964811 6.866708183
109 1.869289258 5.81465351
110 2.800866324 3.647243136
111 9.998691521 4.916832179
112 3.202558365 3.658850659
113 7.341090146 2.600030465
114 9.603216778 3.475474707
115 0.778843917 7.701408274
116 3.903579275 2.386858286
117 3.272121775 -1.258435029
118 0.845985811 9.356564283
119 2.773324516 2.018863181
120 0.363338327 9.030498427
121 7.274326176 2.00813589
122 -0.232168851 0.75819081
123 0.624916354 7.182665146
124 1.864412119 7.483043473
125 1.127949379 1.817249697
126 1.311489591 10.64571484
127 0.855615901 7.52396499
128 0.737370613 7.661567784
129 -0.335857018 8.661735704
130 8.26829903 1.994174804
131 0.885831539 8.005833852
132 2.263804514 0.900506919
133 1.595202236 2.383355554
134 1.379520226 2.322147721
135 9.340028941 4.569461252
136 1.998330026 8.405411261
137 1.126673493 7.131763168
138 2.237991119 0.656609729
139 8.819604725 3.327197586
140 0.322189736 7.6844953
141 -0.013693379 5.867934081
142 1.180110929 9.380480705
143 2.237607503 6.081749579
144 0.736478899 7.683126059
145 2.579401699 6.753979566
146 1.270744987 7.323909445
147 0.832539522 7.308522359
148 1.292116251 7.841174495
149 9.200021741 3.280475998
150 1.551819057 9.830378805
151 0.95086449 8.231677319
152 2.11604569 1.030092648
153 5.495471637 1.137089964
154 -0.293247271 11.4356719
155 1.256007275 9.723557168
156 2.901983324 5.33143096
157 1.306461299 8.646338282
158 1.920295264 6.514761298
159 0.775777107 6.898751755
160 2.719260458 3.635926882
161 0.376542258 7.421930822
162 0.786990031 8.623184327
163 2.351820829 7.413075799
164 0.596703574 7.777137756
165 2.497329079 6.145918616
166 1.595696746 8.50217729
167 -0.543557275 9.687661593
168 -0.542853582 11.69380576
169 1.066475291 5.709562535
170 2.724600136 1.558972522
171 -0.082802992 8.676117047
172 9.505312971 3.884967119
173 0.268929401 9.317344279
174 1.255575466 7.820306208
175 0.719663482 9.323286528
176 7.909681612 2.323222398
177 1.055741976 9.845901242
178 4.767966222 2.61595653
179 7.888250038 2.63230306
180 2.989967894 1.014369704
181 1.638966916 6.239366553
182 0.45542432 6.710778015
183 0.925448219 7.003734491
184 2.411990728 1.539395309
185 8.929192247 4.980155073
186 6.954073266 2.125896249
187 1.141388555 9.134112971
188 6.086453672 1.566920605
189 -0.318126096 8.15711759
190 6.559360731 1.948124721
191 1.890520612 8.095218634
192 1.24955614 1.557999307
193 -0.633157289 10.11871877
194 -0.386092211 9.259154783
195 0.610348525 6.993728509
196 2.535031274 5.436563241
197 6.242447415 1.152050056
198 2.355272493 2.067698872
199 2.81998214 0.790773154
200 2.033431733 5.87239168
201 -1.651761931 10.92694267
202 0.650029566 8.694205693
203 3.606998995 5.618223263
204 1.002140373 6.92133481
205 0.536741154 10.68590364
206 1.29503235 6.996163718
207 2.608149736 2.517216459
208 0.606934231 8.033801083
209 -0.718219308 10.54082274
210 -0.874950933 9.654032466
211 0.273919737 8.884371137
212 0.862794261 8.440642664
213 1.408589065 6.917102737
214 7.17691571 1.934553106
215 1.200790168 7.391868101
216 1.446536618 7.983936343
217 4.896930779 1.960534586
218 1.781408606 6.060907874
219 2.362435633 2.818039167
220 1.475417565 2.301899055
221 7.427753026 2.697835847
222 0.891413724 7.746049244
223 2.769172548 1.4431577
224 2.910288855 5.601222343
225 2.862908107 6.971001812
226 12.05568128 6.190829357
227 3.509702169 1.673486502
228 0.373637136 9.471195706
229 -1.013254253 10.7328429
230 0.880646607 8.861080285
231 1.035417222 5.240603652
232 6.803552901 1.705093738
233 0.521568828 8.222015565
234 0.754968525 9.00068519
235 1.7129982 2.586191629
236 1.842743804 3.515183048
237 1.852885512 7.745456182
238 0.577770129 7.942231251
239 -0.01391403 10.84838058
240 1.298816547 8.327264605
241 0.302121652 11.07528409
242 6.14303883 0.882478397
243 2.559447585 2.556342801
244 0.275726752 7.444550062
245 0.879843914 7.63369441
246 6.710736905 1.937334559
247 0.098810215 9.078195202
248 2.098059195 6.445373215
249 10.49109237 3.78897784
250 1.450842803 6.935154232
251 0.33588408 8.407537659
252 1.238650688 6.671935565
253 1.39091048 7.725009615
254 8.954515067 4.055689368
255 11.43437223 4.439451479
256 9.370357772 4.120589851
257 10.95634008 3.964461468
258 0.557760859 7.521220062
259 8.860850821 3.304827139
260 0.916080486 6.408236347
261 0.126688632 8.399328361
262 1.263868256 3.64280836
263 1.506898711 0.317020588
264 4.704812348 0.765968536
265 0.730890788 8.345649798
266 -0.040990611 7.994828415
267 8.182800242 2.732878682
268 6.78577077 2.892942852
269 10.50082191 5.209117087
270 2.526527533 6.297905053
271 -0.106393994 9.907648984
272 1.827748112 6.77108302
273 2.089124219 1.104873146
274 10.4987264 5.161995292
275 8.62909582 3.509240796
276 2.139818705 6.715004569
277 0.425476606 9.458753699
278 7.435250834 3.484451627
279 12.15104103 5.314880748
280 8.95104554 3.404385573
281 0.650193005 1.697323683
282 3.165748125 3.707275578
283 0.998283542 7.335502884
284 0.700419603 8.232888883
285 0.873021882 10.49573156
286 2.438188396 5.431050458
287 0.847034337 8.148952621
288 0.780857134 11.5805308
289 7.42175705 2.217436421
290 0.670452324 7.502979945
291 1.939699143 6.905916024
292 -0.697767471 11.04910087
293 7.002145059 1.82440975
294 3.258167532 0.463564807
295 0.641749886 9.37254631
296 1.185226841 9.265772224
297 0.61430004 7.109540187
298 1.059648768 8.43587822
299 2.955635137 0.764382252
300 0.156805409 7.563344037
301 0.087616219 7.285636163
302 0.73830223 7.34804542
303 0.753631596 7.3124003
304 0.983618953 7.169685027
305 0.75639077 6.561369523
306 0.551502777 6.925127873
307 -0.431655676 9.316552236
308 0.125032548 5.844987578
309 -0.277865234 6.657760211
310 1.158840121 7.194701548
311 -0.598360488 7.229489978
312 -0.070679163 6.602060764
313 0.665508499 8.785684458
314 0.462652312 9.884614264
315 0.056505797 9.321010107
316 1.536332165 1.074834051
317 0.86257069 8.405320107
318 -0.780804379 9.411595506
319 1.586996692 7.562378313
320 1.305142828 1.964356887
321 -1.114296795 9.134847198
322 1.045264843 7.790338246
323 1.693274972 1.925688747
324 5.171454829 0.501060887
325 2.391417828 1.031832531
326 -1.131994622 8.062215558
327 0.668818311 0.736556487
328 0.818161928 4.340210386
329 0.924711006 1.500104637
330 7.113252294 1.191369487
331 1.342753446 4.911789441
332 7.941032288 2.455485956
333 -0.765768891 8.814087239
334 1.816195943 2.338034299
335 1.569336588 7.638184442
336 0.085543486 0.66702795
337 2.482076451 2.810739913
338 5.312292352 0.706251411
339 0.991292394 1.850760013
340 1.427834232 3.922443169
341 8.155546445 3.894294639
342 -0.455936835 5.660022994
343 4.96328143 1.773834803
344 1.534756013 9.259153854
345 0.812983557 2.505816602
346 0.860409502 1.252146072
347 -0.806639846 10.20433647
348 2.34850316 3.56546559
349 0.808972094 2.618865297
350 -0.76181605 5.847198528
351 0.53919344 0.347872519
352 -0.005705813 10.51605313
353 1.704147015 9.622149824
354 0.66413236 9.509023027
355 2.254833193 5.264752806
356 1.093457602 3.009615072
357 0.750186042 2.550481794
358 5.346613582 1.534267031
359 -0.333714507 0.360582437
360 2.94887115 2.182081988
361 1.706749749 7.761840372
362 0.903390382 3.290189125
363 1.815469291 -0.137099227
364 -1.131537086 7.204582216
365 0.769406047 2.02462976
366 0.296007216 2.517187571
367 2.548346485 8.230203613
368 0.658624044 3.968060707
369 -0.418313565 2.057992932
370 -0.688140885 5.012772116
371 0.719254215 2.650471342
372 9.428564607 4.560967256
373 -0.822568284 5.850451469
374 -0.715554442 2.164789071
375 5.91352185 -0.613809277
376 -0.317493331 2.555026481
377 -1.074278562 2.552254768
378 3.719130478 8.108147012
379 -1.437330256 6.158768144
380 5.433904848 3.86114097
381 -1.632101001 4.354103664
382 7.311280169 4.744578561
383 11.80773502 7.932490763
384 6.615173231 5.156823316
385 10.41720521 0.931649284
386 6.401008567 5.293155766
387 8.502931068 -0.59712661
388 4.605961291 7.607629324
389 6.575575686 6.713645336
390 9.543863038 7.888222286
391 5.941710511 6.670267746
392 7.093602764 7.2401096
393 9.499239941 8.415576697
394 5.825933281 8.442488186
395 8.775842519 9.176664068
396 7.318978313 9.008043693
397 7.087939457 8.545364825
398 5.786261156 9.976768443
399 7.447909015 9.153926296
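If you prefer to load the table above programmatically rather than through Microsoft Excel, the whitespace-separated rows can be parsed with a few lines of Python. This is a minimal sketch; the function name `parse_cases` is hypothetical and the sample string contains only three of the rows above.

```python
def parse_cases(text):
    """Parse whitespace-separated 'Case X Y' rows into (case, x, y) tuples."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] == "Case":
            continue  # skip the header and any blank or malformed lines
        rows.append((int(parts[0]), float(parts[1]), float(parts[2])))
    return rows

# A short excerpt of the table; paste the full table here in practice.
sample = """Case X Y
0 10.64540994 4.857540094
1 1.858277161 6.887043079
399 7.447909015 9.153926296
"""

rows = parse_cases(sample)

# The tutorial states that the first 300 cases are 'normal' and the rest degrade.
normal = [r for r in rows if r[0] < 300]
degraded = [r for r in rows if r[0] >= 300]
print(len(normal), len(degraded))
```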