BAYESIAN NETWORK

Provided by: vaibhav6

Learn more at: https://www3.cs.stonybrook.edu


1
BAYESIAN NETWORK

Submitted By
Faisal Islam
Srinivasan Gopalan
Vaibhav Mittal
Vipin Makhija
Prof. Anita Wasilewska
State University of New York at Stony Brook

2
References

1. Jiawei Han, Data Mining: Concepts and Techniques, ISBN 1-55860-489-8, Morgan Kaufmann Publishers.
2. Stuart Russell, Peter Norvig, Artificial Intelligence: A Modern Approach, Pearson Education.
3. Kandasamy, Thilagavati, Gunavati, Probability, Statistics and Queueing Theory, Sultan Chand Publishers.
4. D. Heckerman, A Tutorial on Learning with Bayesian Networks, in Learning in Graphical Models, ed. M. I. Jordan, The MIT Press, 1998.
5. http://en.wikipedia.org/wiki/Bayesian_probability
6. http://www.construction.ualberta.ca/civ606/myFiles/Intro%20to%20Belief%20Network.pdf
7. http://www.murrayc.com/learning/AI/bbn.shtml
8. http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html
9. http://en.wikipedia.org/wiki/Bayesian_belief_network

3
CONTENTS

HISTORY
CONDITIONAL PROBABILITY
BAYES THEOREM
NAÏVE BAYES CLASSIFIER
BELIEF NETWORK
APPLICATION OF BAYESIAN NETWORK
PAPER ON CYBER CRIME DETECTION

4
HISTORY

Bayesian probability was named after the Reverend Thomas Bayes (1702-1761).
He proved a special case of what is now known as Bayes' theorem.
The term "Bayesian" came into use around the 1950s.
Pierre-Simon, Marquis de Laplace (1749-1827) independently proved a generalized version of Bayes' theorem.
http://en.wikipedia.org/wiki/Bayesian_probability

5
HISTORY (Cont.)

1950s: New knowledge in Artificial Intelligence
1958: Genetic Algorithms by Friedberg (Holland and Goldberg 1985)
1965: Fuzzy Logic by Zadeh at UC Berkeley
1970: Bayesian Belief Network at Stanford University (Judea Pearl 1988)
The ideas proposed above were not fully developed until later. BBNs became popular in the 1990s.
http://www.construction.ualberta.ca/civ606/myFiles/Intro%20to%20Belief%20Network.pdf

6
HISTORY (Cont.)

Current uses of Bayesian networks:
Microsoft's printer troubleshooter.
Diagnosing diseases (Mycin).
Predicting oil and stock prices.
Controlling the space shuttle.
Risk analysis: schedule and cost overruns.

7
CONDITIONAL PROBABILITY

Probability: how likely is it that an event will happen?
Sample space S
An element of S is an elementary event
An event A is a subset of S
P(A) ≥ 0
P(S) = 1
Events A and B
P(A|B): the probability that event A occurs given that event B has already occurred.
Example
There are 2 baskets. B1 has 2 red balls and 5 blue balls. B2 has 4 red balls and 3 blue balls. Find the probability of picking a red ball from basket 1.

8
CONDITIONAL PROBABILITY

The question above asks for P(red ball | basket 1).
Intuitively, the answer is the probability of a red ball within the sample space of basket 1 only.
So the answer is 2/7.
The equation to solve it is
P(A|B) = P(A∩B) / P(B), or equivalently P(A∩B) = P(A|B) P(B) (Product Rule)
P(A,B) = P(A) P(B) if A and B are independent
How do you solve P(basket 2 | red ball)?

9
BAYESIAN THEOREM

A special case of Bayes' theorem
P(A∩B) = P(B) × P(A|B)
P(B∩A) = P(A) × P(B|A)
Since P(A∩B) = P(B∩A),
P(B) × P(A|B) = P(A) × P(B|A)
⇒ P(A|B) = P(A) × P(B|A) / P(B)
10
BAYESIAN THEOREM

Solution to P(basket 2 | red ball)?
P(basket 2 | red ball) = P(b2) × P(r|b2) / P(r)
= (1/2) × (4/7) / (6/14)
= 2/3 ≈ 0.67
Here P(r) = P(b1) × P(r|b1) + P(b2) × P(r|b2) = (1/2)(2/7) + (1/2)(4/7) = 6/14.
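The basket calculations on slides 8 and 10 can be checked with exact fractions. This is a minimal sketch assuming, as the slide does with P(b2) = 1/2, that each basket is equally likely to be chosen; the dictionary layout and variable names are illustrative only.

```python
from fractions import Fraction

# Ball counts per basket from the example: (red, blue)
baskets = {"b1": (2, 5), "b2": (4, 3)}
prior = {name: Fraction(1, len(baskets)) for name in baskets}  # each basket equally likely

# Conditional probability within one basket: P(red | basket) = red / (red + blue)
p_red_given = {name: Fraction(red, red + blue) for name, (red, blue) in baskets.items()}

# Total probability: P(red) = sum over baskets of P(basket) * P(red | basket)
p_red = sum(prior[name] * p_red_given[name] for name in baskets)

# Bayes' theorem: P(basket | red) = P(basket) * P(red | basket) / P(red)
posterior = {name: prior[name] * p_red_given[name] / p_red for name in baskets}

print(p_red_given["b1"])  # 2/7
print(p_red)              # 3/7  (the same as 6/14)
print(posterior["b2"])    # 2/3
```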

11
BAYESIAN THEOREM

Example 2: A medical cancer diagnosis problem
There are 2 possible outcomes of a diagnosis: +ve and -ve. We know that 0.8% of the world population has cancer. The test gives a correct +ve result 98% of the time and a correct -ve result 97% of the time.
If a patient's test returns +ve, should we diagnose the patient as having cancer?

12
BAYESIAN THEOREM

P(cancer) = 0.008, P(¬cancer) = 0.992
P(+ve|cancer) = 0.98, P(-ve|cancer) = 0.02
P(+ve|¬cancer) = 0.03, P(-ve|¬cancer) = 0.97
Using Bayes' formula:
P(cancer|+ve) = P(+ve|cancer) × P(cancer) / P(+ve)
= 0.98 × 0.008 / P(+ve) = 0.0078 / P(+ve)
P(¬cancer|+ve) = P(+ve|¬cancer) × P(¬cancer) / P(+ve)
= 0.03 × 0.992 / P(+ve) = 0.0298 / P(+ve)
So, the patient most likely does not have cancer.
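The slide compares only the two numerators, since both share the denominator P(+ve). A short sketch that also computes P(+ve) by total probability shows the actual posterior; the variable names are illustrative.

```python
# Values from the slide
p_cancer = 0.008
p_pos_given_cancer = 0.98     # correct +ve rate
p_pos_given_no_cancer = 0.03  # 1 - 0.97, the false +ve rate

# Numerators of Bayes' formula for each hypothesis
score_cancer = p_pos_given_cancer * p_cancer              # about 0.0078
score_no_cancer = p_pos_given_no_cancer * (1 - p_cancer)  # about 0.0298

# P(+ve) by total probability; dividing by it normalises the two scores
p_pos = score_cancer + score_no_cancer
p_cancer_given_pos = score_cancer / p_pos

print(f"P(+ve) = {p_pos:.4f}")                        # P(+ve) = 0.0376
print(f"P(cancer | +ve) = {p_cancer_given_pos:.3f}")  # P(cancer | +ve) = 0.209
```

Even after a positive test, the posterior probability of cancer is only about 21%, because the prior of 0.8% is so small.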

13
BAYESIAN THEOREM

General Bayes' Theorem
Given events E1, E2, ..., En that are mutually disjoint, together make up the whole sample space S, and satisfy P(Ei) ≠ 0 (i = 1, 2, ..., n):
P(Ei|A) = P(Ei) × P(A|Ei) / Σ(j=1..n) P(Ej) × P(A|Ej),   i = 1, 2, ..., n

14
BAYESIAN THEOREM

Example
There are 3 boxes. B1 has 2 white, 3 black and 4 red balls. B2 has 3 white, 2 black and 2 red balls. B3 has 4 white, 1 black and 3 red balls. A box is chosen at random and 2 balls are drawn. One is white and the other is red. What is the probability that they came from the first box?

15
BAYESIAN THEOREM

Let E1, E2, E3 denote the events of choosing B1, B2, B3 respectively. Let A be the event that the 2 balls selected are white and red.
P(E1) = P(E2) = P(E3) = 1/3
P(A|E1) = 2C1 × 4C1 / 9C2 = 2/9
P(A|E2) = 3C1 × 2C1 / 7C2 = 2/7
P(A|E3) = 4C1 × 3C1 / 8C2 = 3/7

16
BAYESIAN THEOREM

P(E1|A) = P(E1) × P(A|E1) / Σ(i=1..3) P(Ei) × P(A|Ei) = 14/59 ≈ 0.23729
P(E2|A) = 18/59 ≈ 0.30508
P(E3|A) = 1 - (P(E1|A) + P(E2|A)) = 27/59 ≈ 0.45763
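The general form of Bayes' theorem can be applied mechanically to the three-box example. A minimal sketch using exact fractions and math.comb, assuming as the slide does that the two balls are drawn without replacement; the function name p_white_and_red is illustrative.

```python
from fractions import Fraction
from math import comb

# (white, black, red) ball counts in each box, from the example
boxes = [(2, 3, 4), (3, 2, 2), (4, 1, 3)]
prior = Fraction(1, len(boxes))  # a box is chosen at random

def p_white_and_red(white, black, red):
    """P(A | box): one white and one red among 2 balls drawn without replacement."""
    total = white + black + red
    return Fraction(comb(white, 1) * comb(red, 1), comb(total, 2))

likelihoods = [p_white_and_red(*box) for box in boxes]     # 2/9, 2/7, 3/7
evidence = sum(prior * lik for lik in likelihoods)          # denominator of Bayes' formula
posteriors = [prior * lik / evidence for lik in likelihoods]

for i, post in enumerate(posteriors, start=1):
    print(f"P(E{i} | A) = {post} = {float(post):.5f}")
# P(E1 | A) = 14/59 = 0.23729
# P(E2 | A) = 18/59 = 0.30508
# P(E3 | A) = 27/59 = 0.45763
```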

17
BAYESIAN CLASSIFICATION

Why use Bayesian classification?
Probabilistic learning: calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems.
Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.

18
BAYESIAN CLASSIFICATION

Probabilistic prediction: predict multiple hypotheses, weighted by their probabilities.
Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured.

19
NAÏVE BAYES CLASSIFIER

A simplifying assumption: attributes are conditionally independent given the class.
This greatly reduces the computation cost: only the class distribution needs to be counted.

20
NAÏVE BAYES CLASSIFIER

The probabilistic model of the NBC is to find the probability of a certain class given multiple events (attribute values) that are assumed to be conditionally independent given the class.
The naïve Bayes classifier applies to learning tasks where each instance x is described by a conjunction of attribute values and where the target function f(x) can take on any value from some finite set V. A set of training examples of the target function is provided, and a new instance is presented, described by the tuple of attribute values ⟨a1, a2, ..., an⟩. The learner is asked to predict the target value, or classification, for this new instance.

21
NAÏVE BAYES CLASSIFIER

Abstractly, the probability model for a classifier is a conditional model
P(C|F1, F2, ..., Fn)
over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1, ..., Fn.
Naïve Bayes Formula
P(C|F1, F2, ..., Fn) = P(C) × P(F1|C) × P(F2|C) × ... × P(Fn|C) / P(F1, F2, ..., Fn)
and the classifier predicts the class argmax_c P(c) × P(F1|c) × P(F2|c) × ... × P(Fn|c).
Since P(F1, F2, ..., Fn) is common to all classes, we do not need to evaluate the denominator for comparisons.

22
NAÏVE BAYES CLASSIFIER

Tennis-Example

23
NAÏVE BAYES CLASSIFIER

Problem
Use the training data above to classify the following instances:
(a) ⟨Outlook=sunny, Temperature=cool, Humidity=high, Wind=strong⟩
(b) ⟨Outlook=overcast, Temperature=cool, Humidity=high, Wind=strong⟩

24
NAÏVE BAYES CLASSIFIER

Answer to (a)
P(PlayTennis=yes) = 9/14 = 0.64
P(PlayTennis=no) = 5/14 = 0.36
P(Outlook=sunny|PlayTennis=yes) = 2/9 = 0.22
P(Outlook=sunny|PlayTennis=no) = 3/5 = 0.60
P(Temperature=cool|PlayTennis=yes) = 3/9 = 0.33
P(Temperature=cool|PlayTennis=no) = 1/5 = 0.20
P(Humidity=high|PlayTennis=yes) = 3/9 = 0.33
P(Humidity=high|PlayTennis=no) = 4/5 = 0.80
P(Wind=strong|PlayTennis=yes) = 3/9 = 0.33
P(Wind=strong|PlayTennis=no) = 3/5 = 0.60

25
NAÏVE BAYES CLASSIFIER

P(yes) × P(sunny|yes) × P(cool|yes) × P(high|yes) × P(strong|yes) = 0.0053
P(no) × P(sunny|no) × P(cool|no) × P(high|no) × P(strong|no) = 0.0206
So the class for this instance is no. We can normalize the probability:
0.0206 / (0.0206 + 0.0053) = 0.795
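The scoring for instance (a) can be reproduced from the counts on slide 24. A minimal sketch: the counts are read off the slides, and the dictionary layout and the function name naive_bayes_score are illustrative. It uses raw relative frequencies, with no smoothing, exactly as the slide does.

```python
from math import prod

# Counts from the slides: 9 "yes" days and 5 "no" days out of 14, and for each class
# the number of days matching each attribute value of instance (a).
class_counts = {"yes": 9, "no": 5}
match_counts = {
    "yes": {"Outlook=sunny": 2, "Temperature=cool": 3, "Humidity=high": 3, "Wind=strong": 3},
    "no":  {"Outlook=sunny": 3, "Temperature=cool": 1, "Humidity=high": 4, "Wind=strong": 3},
}
total = sum(class_counts.values())

def naive_bayes_score(label):
    """P(class) times the product of P(attribute value | class), as relative frequencies."""
    prior = class_counts[label] / total
    return prior * prod(c / class_counts[label] for c in match_counts[label].values())

scores = {label: naive_bayes_score(label) for label in class_counts}
best = max(scores, key=scores.get)
print({k: round(v, 4) for k, v in scores.items()})        # {'yes': 0.0053, 'no': 0.0206}
print(best, round(scores[best] / sum(scores.values()), 3))  # no 0.795
```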

26
NAÏVE BAYES CLASSIFIER

Answer to (b)
P(PlayTennis=yes) = 9/14 = 0.64
P(PlayTennis=no) = 5/14 = 0.36
P(Outlook=overcast|PlayTennis=yes) = 4/9 = 0.44
P(Outlook=overcast|PlayTennis=no) = 0/5 = 0
P(Temperature=cool|PlayTennis=yes) = 3/9 = 0.33
P(Temperature=cool|PlayTennis=no) = 1/5 = 0.20
P(Humidity=high|PlayTennis=yes) = 3/9 = 0.33
P(Humidity=high|PlayTennis=no) = 4/5 = 0.80
P(Wind=strong|PlayTennis=yes) = 3/9 = 0.33
P(Wind=strong|PlayTennis=no) = 3/5 = 0.60

27
NAÏVE BAYES CLASSIFIER

Estimating Probabilities
In the previous example, P(overcast|no) = 0, which causes
P(no) × P(overcast|no) × P(cool|no) × P(high|no) × P(strong|no) = 0.0
This causes problems when comparing, because a single zero factor wipes out all the other probabilities. We can avoid this difficulty by using the m-estimate.

28
NAÏVE BAYES CLASSIFIER

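M-Estimate Formula
(c + k) / (n + m), where c/n is the original probability used before, k = 1 and m is the equivalent sample size. In the values below, m is the number of possible values of the attribute (2 for PlayTennis, Humidity and Wind; 3 for Outlook and Temperature), which corresponds to a uniform prior p = 1/m with k = m × p = 1.
Using this method, our new probability values are given below: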

29
NAÏVE BAYES CLASSIFIER

New answer to (b)
P(PlayTennis=yes) = 10/16 = 0.63
P(PlayTennis=no) = 6/16 = 0.38
P(Outlook=overcast|PlayTennis=yes) = 5/12 = 0.42
P(Outlook=overcast|PlayTennis=no) = 1/8 = 0.13
P(Temperature=cool|PlayTennis=yes) = 4/12 = 0.33
P(Temperature=cool|PlayTennis=no) = 2/8 = 0.25
P(Humidity=high|PlayTennis=yes) = 4/11 = 0.36
P(Humidity=high|PlayTennis=no) = 5/7 = 0.71
P(Wind=strong|PlayTennis=yes) = 4/11 = 0.36
P(Wind=strong|PlayTennis=no) = 4/7 = 0.57

30
NAÏVE BAYES CLASSIFIER

P(yes) × P(overcast|yes) × P(cool|yes) × P(high|yes) × P(strong|yes) = 0.0115
P(no) × P(overcast|no) × P(cool|no) × P(high|no) × P(strong|no) = 0.0048
So the class of this instance is yes.
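The smoothed computation for instance (b) can be reproduced as follows. This sketch assumes, as the fractions on slide 29 imply, that k = 1 and m equals the number of distinct values of each attribute (2 for the class itself); the names m_estimate and smoothed_score are illustrative. It multiplies exact fractions, which is why the "no" score comes out slightly lower than a product of the rounded two-digit values would give.

```python
from math import prod

# Counts for instance (b), read off the slides. Each entry is
# (days in the class with this value, number of distinct values of the attribute).
class_counts = {"yes": 9, "no": 5}
num_classes = len(class_counts)
match_counts = {
    "yes": {"Outlook=overcast": (4, 3), "Temperature=cool": (3, 3),
            "Humidity=high": (3, 2), "Wind=strong": (3, 2)},
    "no":  {"Outlook=overcast": (0, 3), "Temperature=cool": (1, 3),
            "Humidity=high": (4, 2), "Wind=strong": (3, 2)},
}
total = sum(class_counts.values())

def m_estimate(c, n, m, k=1):
    """(c + k) / (n + m); with k = 1 and m = number of values this is add-one smoothing."""
    return (c + k) / (n + m)

def smoothed_score(label):
    prior = m_estimate(class_counts[label], total, num_classes)  # e.g. 10/16 for "yes"
    likelihood = prod(m_estimate(c, class_counts[label], m)
                      for c, m in match_counts[label].values())
    return prior * likelihood

scores = {label: smoothed_score(label) for label in class_counts}
print({k: round(v, 4) for k, v in scores.items()})  # {'yes': 0.0115, 'no': 0.0048}
print(max(scores, key=scores.get))                  # yes
```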

31
NAÏVE BAYES CLASSIFIER

The conditional probability values of all the attributes with respect to the class are pre-computed and stored on disk.
This saves the classifier from recomputing the conditional probabilities every time it runs.
The stored data can be reused to reduce the latency of the classifier.
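The idea of computing the tables once and reusing them can be sketched as below. This is a minimal sketch assuming JSON on disk as the storage format; train_tables, the "attr=value|class" key format, the file name nb_tables.json and the toy rows are all hypothetical, not taken from the presentation.

```python
import json
import os
import tempfile

def train_tables(rows, target):
    """Count P(attribute=value | class) once from the training rows (hypothetical helper)."""
    class_counts, value_counts = {}, {}
    for row in rows:
        label = row[target]
        class_counts[label] = class_counts.get(label, 0) + 1
        for attr, value in row.items():
            if attr != target:
                key = f"{attr}={value}|{label}"
                value_counts[key] = value_counts.get(key, 0) + 1
    conditional = {key: n / class_counts[key.rsplit("|", 1)[1]]
                   for key, n in value_counts.items()}
    return {"class_counts": class_counts, "conditional": conditional}

# Hypothetical toy rows, for illustration only
rows = [
    {"Outlook": "sunny", "Wind": "strong", "PlayTennis": "no"},
    {"Outlook": "overcast", "Wind": "weak", "PlayTennis": "yes"},
    {"Outlook": "sunny", "Wind": "weak", "PlayTennis": "yes"},
]

path = os.path.join(tempfile.gettempdir(), "nb_tables.json")  # hypothetical file name
with open(path, "w") as f:
    json.dump(train_tables(rows, "PlayTennis"), f)  # done once, after training

with open(path) as f:
    tables = json.load(f)                            # later runs just reload the tables
print(tables["conditional"]["Outlook=sunny|yes"])    # 0.5
```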

32
BAYESIAN BELIEF NETWORK

In the Naïve Bayes Classifier we make the assumption of class conditional independence: given the class label of a sample, the values of the attributes are conditionally independent of one another.
However, there can be dependencies between the values of attributes. To model them, we use a Bayesian Belief Network, which provides a joint conditional probability distribution.
A Bayesian network is a form of probabilistic graphical model. Specifically, a Bayesian network is a directed acyclic graph whose nodes represent variables and whose arcs represent dependence relations among the variables.

33
(No Transcript)
34
BAYESIAN BELIEF NETWORK

A Bayesian network is a representation of the joint distribution over all the variables represented by nodes in the graph. Let the variables be X1, ..., Xn.
Let Parents(A) be the parents of node A. Then the joint distribution for X1 through Xn is represented as the product of the probability distributions P(Xi|Parents(Xi)) for i = 1 to n:
P(X1, ..., Xn) = ∏(i=1..n) P(Xi|Parents(Xi))
If Xi has no parents, its probability distribution is said to be unconditional; otherwise it is conditional.

35
BAYESIAN BELIEF NETWORK
36
BAYESIAN BELIEF NETWORK

By the chain rule of probability, together with the conditional independencies encoded by the graph, the joint probability of all the nodes in the graph above is
P(C, S, R, W) = P(C) P(S|C) P(R|C) P(W|S,R)
W = Wet Grass, C = Cloudy, R = Rain, S = Sprinkler
Example: P(W ∩ ¬R ∩ S ∩ C)
= P(W|S,¬R) P(¬R|C) P(S|C) P(C)
= 0.9 × 0.2 × 0.1 × 0.5 = 0.009

37
BAYESIAN BELIEF NETWORK

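What is the probability of wet grass on a given day, P(W)?
P(W) is obtained by summing the joint distribution over the other variables:
P(W) = Σ(c,s,r) P(c) P(s|c) P(r|c) P(W|s,r) = 0.6471
A shortcut that multiplies the marginals,
P(W) ≈ P(W|S,R) P(S) P(R) + P(W|S,¬R) P(S) P(¬R) + P(W|¬S,R) P(¬S) P(R) + P(W|¬S,¬R) P(¬S) P(¬R),
with P(S) = P(S|C) P(C) + P(S|¬C) P(¬C) and P(R) = P(R|C) P(C) + P(R|¬C) P(¬C),
gives 0.5985. It treats S and R as independent, but both depend on C, so the result is only an approximation (here an underestimate).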
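The figure with the full CPTs is missing from the transcript. The sketch below assumes the remaining entries are those of the sprinkler example in reference 8 (P(S|¬C) = 0.5, P(R|C) = 0.8, P(R|¬C) = 0.2, P(W|S,R) = 0.99, P(W|¬S,R) = 0.9, P(W|¬S,¬R) = 0); with these values the product-of-marginals formula reproduces 0.5985 exactly, which suggests they match. It computes both the exact marginal, by enumerating C, S and R, and that shortcut, so the two can be compared.

```python
from itertools import product

# Sprinkler-network CPTs, giving P(variable = true | parents).
# Only P(C)=0.5, P(S|C)=0.1, P(¬R|C)=0.2 and P(W|S,¬R)=0.9 appear in the transcript;
# the other entries are ASSUMED from the sprinkler example in reference 8.
P_C = 0.5
P_S = {True: 0.1, False: 0.5}                      # keyed by the value of C
P_R = {True: 0.8, False: 0.2}                      # keyed by the value of C
P_W = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.9, (False, False): 0.0}    # keyed by (S, R)

def bern(p_true, value):
    """Probability of a boolean outcome, given P(true)."""
    return p_true if value else 1 - p_true

def joint(c, s, r, w):
    """P(C, S, R, W) = P(C) P(S|C) P(R|C) P(W|S,R)."""
    return bern(P_C, c) * bern(P_S[c], s) * bern(P_R[c], r) * bern(P_W[(s, r)], w)

# Exact marginal: sum the joint over C, S and R with W = true
p_w_exact = sum(joint(c, s, r, True) for c, s, r in product([True, False], repeat=3))

# Shortcut: multiply the marginals P(S) and P(R) as if S and R were independent
p_s = P_S[True] * P_C + P_S[False] * (1 - P_C)
p_r = P_R[True] * P_C + P_R[False] * (1 - P_C)
p_w_shortcut = sum(P_W[(s, r)] * bern(p_s, s) * bern(p_r, r)
                   for s, r in product([True, False], repeat=2))

print(round(joint(True, True, False, True), 4))  # 0.009
print(round(p_w_exact, 4))                       # 0.6471
print(round(p_w_shortcut, 4))                    # 0.5985
```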

38
Advantages of Bayesian Approach

Bayesian networks can readily handle
incomplete data sets.
Bayesian networks allow one to learn
about causal relationships
Bayesian networks readily facilitate use of
prior knowledge.

39
APPLICATIONS OF BAYESIAN NETWORKS
40
Sources/References

Naive Bayes Spam Filtering Using Word-Position-Based Attributes, http://www.ceas.cc/papers-2005/144.pdf, by Johan Hovold, Department of Computer Science, Lund University, Box 118, 221 00 Lund, Sweden. E-mail: johan.hovold.363@student.lu.se. Presented at CEAS 2005, Second Conference on Email and Anti-Spam, July 21-22, at Stanford University.
Tom Mitchell, Machine Learning, Tata McGraw-Hill.
A Bayesian Approach to Filtering Junk E-Mail, Mehran Sahami, Susan Dumais, David Heckerman, Eric Horvitz. Gates Building, Computer Science Department, Stanford University, Stanford, CA; Microsoft Research, Redmond, WA. {sdumais, heckerma, horvitz}@microsoft.com. Presented at AAAI Workshop on Learning for Text Categorization, July 1998, Madison, Wisconsin.

41
Problem
A real-world Bayesian network application: learning to classify text.
Instances are text documents.
We might wish to learn the target concept "electronic news articles that I find interesting", or "pages on the World Wide Web that discuss data mining topics".
In both cases, if a computer could learn the target concept accurately, it could automatically filter the large volume of online text documents to present only the most relevant documents to the user.

42
TECHNIQUE

Learning how to classify text, based on the naive Bayes classifier.
It is a probabilistic approach and is among the most effective algorithms currently known for learning to classify text documents.
The instance space X consists of all possible text documents.
We are given training examples of some unknown target function f(x), which can take on any value from some finite set V.
We will consider the target function classifying documents as interesting or uninteresting to a particular person, using the target values like and dislike to indicate these two classes.

43
Design issues

How to represent an arbitrary text document in terms of attribute values
How to estimate the probabilities required by the naive Bayes classifier

44
Approach

Our approach to representing arbitrary text documents is disturbingly simple: given a text document, such as this paragraph, we define an attribute for each word position in the document and define the value of that attribute to be the English word found in that position. Thus, the current paragraph would be described by 111 attribute values, corresponding to the 111 word positions. The value of the first attribute is the word "our", the value of the second attribute is the word "approach", and so on. Notice that long text documents will require a larger number of attributes than short documents. As we shall see, this will not cause us any trouble.
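A minimal sketch of this position-based representation, assuming tokens are obtained by lowercasing and splitting on whitespace (punctuation handling is left out); position_attributes is a hypothetical name.

```python
def position_attributes(document):
    """Map each word position (1-based) to the word found there."""
    words = document.lower().split()
    return {i: word for i, word in enumerate(words, start=1)}

attrs = position_attributes("Our approach to representing arbitrary text documents is disturbingly simple")
print(attrs[1], attrs[2], len(attrs))  # our approach 10
```

A longer document simply produces more entries, which is the point made at the end of the paragraph.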

45
ASSUMPTIONS

Assume we are given a set of 700 training documents that a friend has classified as dislike and another 300 she has classified as like.
We are now given a new document and asked to classify it.
Let us assume the new text document is the preceding paragraph.

We know P(like) = 0.3 and P(dislike) = 0.7 in the current example.
P(ai = wk|vj) (here we introduce wk to indicate the kth word in the English vocabulary)
Estimating the class conditional probabilities (e.g., P(a1 = "our"|dislike)) is more problematic because we must estimate one such probability term for each combination of text position, English word, and target value.
There are approximately 50,000 distinct words in the English vocabulary, 2 possible target values, and 111 text positions in the current example, so we must estimate 2 × 111 × 50,000 ≈ 10 million such terms from the training data.
We therefore make an assumption that reduces the number of probabilities that must be estimated.

We shall assume the probability of encountering a specific word wk (e.g., "chocolate") is independent of the specific word position being considered (e.g., a23 versus a95).
We estimate the entire set of probabilities P(a1 = wk|vj), P(a2 = wk|vj), ... by the single position-independent probability P(wk|vj).
The net effect is that we now require only 2 × 50,000 distinct terms of the form P(wk|vj).
We adopt the m-estimate, with uniform priors and with m equal to the size of the word vocabulary:
P(wk|vj) = (nk + 1) / (n + |Vocabulary|)
where n is the total number of word positions in all training examples whose target value is vj, nk is the number of times word wk is found among these n word positions, and |Vocabulary| is the total number of distinct words (and other tokens) found within the training data.
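The parameter counts and the smoothed word estimate can be checked directly. The 50,000-word vocabulary, 2 classes and 111 positions come from the text; word_probability is a hypothetical helper name, and the counts passed to it below are made up for illustration.

```python
vocabulary_size = 50_000  # approximate English vocabulary, from the text
num_classes = 2           # like, dislike
num_positions = 111       # word positions in the example paragraph

position_specific = num_classes * num_positions * vocabulary_size     # one term per position
position_independent = num_classes * vocabulary_size                  # after the assumption
print(position_specific, position_independent)  # 11100000 100000

def word_probability(n_k, n, vocabulary_size):
    """m-estimate with uniform prior 1/|Vocabulary| and m = |Vocabulary|."""
    return (n_k + 1) / (n + vocabulary_size)

# Hypothetical counts: a word never seen among 20,000 positions of one class
# still gets a small nonzero probability instead of zeroing out the product.
print(word_probability(0, 20_000, vocabulary_size) > 0)  # True
```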

48
Final Algorithm

LEARN_NAIVE_BAYES_TEXT(Examples, V)
Examples is a set of text documents along with their target values. V is the set of all possible target values. This function learns the probability terms P(wk|vj), describing the probability that a randomly drawn word from a document in class vj will be the English word wk. It also learns the class prior probabilities P(vj).
1. Collect all words, punctuation, and other tokens that occur in Examples
   Vocabulary ← the set of all distinct words and other tokens occurring in any text document from Examples
2. Calculate the required P(vj) and P(wk|vj) probability terms
   For each target value vj in V do
      docsj ← the subset of documents from Examples for which the target value is vj
      P(vj) ← |docsj| / |Examples|
      Textj ← a single document created by concatenating all members of docsj
      n ← total number of distinct word positions in Textj
      For each word wk in Vocabulary
         nk ← number of times word wk occurs in Textj
         P(wk|vj) ← (nk + 1) / (n + |Vocabulary|)

CLASSIFY_NAIVE_BAYES_TEXT(Doc)
Return the estimated target value for the document Doc. ai denotes the word found in the ith position within Doc.
   positions ← all word positions in Doc that contain tokens found in Vocabulary
   Return vNB, where
   vNB = argmax(vj ∈ V) P(vj) ∏(i ∈ positions) P(ai|vj)

During learning, the procedure LEARN_NAIVE_BAYES_TEXT examines all training documents to extract the vocabulary of all words and tokens that appear in the text, then counts their frequencies among the different target classes to obtain the necessary probability estimates. Later, given a new document to be classified, the procedure CLASSIFY_NAIVE_BAYES_TEXT uses these probability estimates to calculate vNB according to the equation above. Note that any words appearing in the new document that were not observed in the training set are simply ignored by CLASSIFY_NAIVE_BAYES_TEXT.
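The two procedures can be sketched in Python as below. This is an illustrative sketch, not the original implementation: documents are assumed to be pre-tokenized lists of words, every target value is assumed to have at least one training document (otherwise log(0) fails), the toy training set is hypothetical, and the classifier sums log probabilities instead of multiplying raw probabilities. That picks the same vNB but avoids floating-point underflow on long documents.

```python
import math
from collections import Counter

def learn_naive_bayes_text(examples, targets):
    """Sketch of LEARN_NAIVE_BAYES_TEXT. examples: list of (list_of_tokens, target_value)."""
    vocabulary = {tok for doc, _ in examples for tok in doc}
    priors, word_probs = {}, {}
    for v in targets:
        docs_v = [doc for doc, target in examples if target == v]
        priors[v] = len(docs_v) / len(examples)
        text_v = [tok for doc in docs_v for tok in doc]  # concatenation of docs_v
        n = len(text_v)                                   # word positions in Text_v
        counts = Counter(text_v)                          # Counter returns 0 for unseen words
        word_probs[v] = {w: (counts[w] + 1) / (n + len(vocabulary)) for w in vocabulary}
    return vocabulary, priors, word_probs

def classify_naive_bayes_text(doc, vocabulary, priors, word_probs):
    """Sketch of CLASSIFY_NAIVE_BAYES_TEXT; tokens outside the vocabulary are ignored."""
    positions = [tok for tok in doc if tok in vocabulary]

    def log_score(v):
        return math.log(priors[v]) + sum(math.log(word_probs[v][tok]) for tok in positions)

    return max(priors, key=log_score)

# Hypothetical toy training set
examples = [
    ("great match exciting goal".split(), "like"),
    ("boring tax report".split(), "dislike"),
    ("tax law report meeting".split(), "dislike"),
]
model = learn_naive_bayes_text(examples, ["like", "dislike"])
print(classify_naive_bayes_text("exciting goal tonight".split(), *model))  # like
```

Here "tonight" never occurs in training, so it is dropped from positions, matching the note that unseen words are ignored.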

50
Effectiveness of the Algorithm

Problem: classifying Usenet news articles.
Target classification for an article: the name of the Usenet newsgroup in which the article appeared.
In the experiment described by Joachims (1996), 20 electronic newsgroups were considered.
1,000 articles were collected from each newsgroup, forming a data set of 20,000 documents. The naive Bayes algorithm was then applied using two-thirds of these 20,000 documents as training examples, and performance was measured over the remaining third.
The 100 most frequent words were removed (these include words such as "the" and "of"), and any word occurring fewer than three times was also removed. The resulting vocabulary contained approximately 38,500 words.
The accuracy achieved by the program was 89%.

The 20 newsgroups used:
comp.graphics, comp.os.ms-windows.misc, comp.sys.ibm.pc.hardware, comp.sys.mac.hardware, comp.windows.x
misc.forsale, rec.autos, rec.motorcycles, rec.sport.baseball, rec.sport.hockey
alt.atheism, soc.religion.christian, talk.religion.misc, talk.politics.guns, talk.politics.mideast, talk.politics.misc
sci.crypt, sci.electronics, sci.med, sci.space
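The vocabulary pruning used in this experiment (drop the 100 most frequent words and any word seen fewer than three times) can be sketched with collections.Counter. The function name prune_vocabulary and the toy corpus are hypothetical; ties at the frequency cutoff are broken arbitrarily by most_common.

```python
from collections import Counter

def prune_vocabulary(token_lists, n_most_frequent=100, min_count=3):
    """Drop the n most frequent words and any word seen fewer than min_count times."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    too_common = {w for w, _ in counts.most_common(n_most_frequent)}
    return {w for w, c in counts.items() if c >= min_count and w not in too_common}

# Hypothetical toy corpus; because it is tiny, only the single most frequent word is dropped
docs = [["the", "goal", "was", "great"], ["the", "goal", "the", "match"], ["goal", "the", "was"]]
print(sorted(prune_vocabulary(docs, n_most_frequent=1, min_count=2)))  # ['goal', 'was']
```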
51
APPLICATIONS

A newsgroup posting service that learns to assign documents to the appropriate newsgroup.
The NEWSWEEDER system: a program for reading netnews that allows the user to rate articles as he or she reads them. NEWSWEEDER then uses these rated articles (i.e., its learned profile of user interests) to suggest the most highly rated new articles each day.
Naive Bayes Spam Filtering Using Word-Position-Based Attributes

52
Thank you!
53

Bayesian Learning Networks
Approach to
Cybercrime Detection

54
Bayesian Learning Networks Approach to Cybercrime Detection
N S ABOUZAKHAR, A GANI and G MANSON
The Centre for Mobile Communications Research (C4MCR), University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, UK
N.Abouzakhar@dcs.shef.ac.uk, A.Gani@dcs.shef.ac.uk, G.Manson@dcs.shef.ac.uk
M ABUITBEL and D KING
The Manchester School of Engineering, University of Manchester, IT Building, Room IT 109, Oxford Road, Manchester M13 9PL, UK
mostafa.abuitbel@stud.man.ac.uk, David.king@man.ac.uk
55

REFERENCES
1. David J. Marchette, Computer Intrusion Detection and Network Monitoring: A Statistical Viewpoint, 2001, Springer-Verlag New York, Inc., USA.
2. Heckerman, D. (1995), A Tutorial on Learning with Bayesian Networks, Technical Report MSR-TR-95-06, Microsoft Corporation.
3. Michael Berthold and David J. Hand, Intelligent Data Analysis: An Introduction, 1999, Springer, Italy.
4. http://www.ll.mit.edu/IST/ideval/data/data_index.html, accessed on 01/12/2002.
5. http://kdd.ics.uci.edu/, accessed on 01/12/2002.
6. Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2000, Morgan Kaufmann, USA.
7. http://www.bayesia.com, accessed on 20/12/2002.

56
Motivation behind the paper

The growing dependence of modern society on telecommunication and information networks.
The increase in the number of networks interconnected to the Internet has led to an increase in security threats and cyber crimes.

57
Structure of the paper

In order to detect distributed network attacks as early as possible, a probabilistic approach based on Bayesian networks, still under research and development, has been proposed.

58
Where can this model be utilized?

Learning agents that deploy a Bayesian network approach are considered a promising and useful tool for determining suspicious early events of Internet threats.

59
Before we look at the details given in the paper, let's understand what Bayesian networks are and how they are constructed.
60
Bayesian Networks

A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions.
Syntax:
a set of nodes, one per variable
a directed, acyclic graph (link ≈ "directly influences")
a conditional distribution for each node given its parents: P(Xi|Parents(Xi))
In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.

61
Some conventions.

Variables are depicted as nodes.
Arcs represent probabilistic dependence between variables.
Conditional probabilities encode the strength of dependencies.
Missing arcs imply conditional independence.

62
Semantics

The full joint distribution is defined as the product of the local conditional distributions:
P(X1, ..., Xn) = ∏(i=1..n) P(Xi|Parents(Xi))
e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
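A CPT-based sketch of this semantics: each node stores P(node = true) for every combination of parent values, and a full joint entry is the product of one lookup per node. The slide gives only the factorisation; the numbers below are ASSUMED from the burglary/alarm example in Russell and Norvig (reference 2), and the dictionary layout is illustrative.

```python
# Network structure: each node's parents (B = Burglary, E = Earthquake, A = Alarm,
# J = JohnCalls, M = MaryCalls)
parents = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}

# CPTs: P(node = true | parent values). Values ASSUMED from Russell & Norvig's example.
cpt = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001},
    "J": {(True,): 0.90, (False,): 0.05},
    "M": {(True,): 0.70, (False,): 0.01},
}

def joint(assignment):
    """Product over nodes of P(X_i = x_i | Parents(X_i)) for a full true/false assignment."""
    result = 1.0
    for node, pars in parents.items():
        p_true = cpt[node][tuple(assignment[p] for p in pars)]
        result *= p_true if assignment[node] else 1 - p_true
    return result

p = joint({"J": True, "M": True, "A": True, "B": False, "E": False})
print(f"{p:.6f}")  # 0.000628
```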

63
Example of Construction of a BN
64
Back to the discussion of the paper.
65
Description

This paper shows how a Bayesian network can probabilistically detect communication network attacks, allowing for generalization of Network Intrusion Detection Systems (NIDSs).

66
Goal

How well does our model detect or classify attacks and respond to them later on?
The system requires the estimation of two quantities:
the probability of detection (PD)
the probability of false alarm (PFA).
It is not possible to simultaneously achieve a PD of 1 and a PFA of 0.
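The tension between PD and PFA can be illustrated by thresholding a detector's attack probability. A minimal sketch using the standard definitions (PD = detected attacks / all attacks, PFA = false alarms / all normal records); the scores, labels, thresholds and the function name detection_rates are hypothetical, not results from the paper.

```python
def detection_rates(scores, labels, threshold):
    """Return (PD, PFA) when an alarm is raised for every score >= threshold."""
    alarms = [s >= threshold for s in scores]
    attacks = sum(labels)
    normals = len(labels) - attacks
    hits = sum(1 for a, l in zip(alarms, labels) if a and l)
    false_alarms = sum(1 for a, l in zip(alarms, labels) if a and not l)
    return hits / attacks, false_alarms / normals

# Hypothetical P(attack | record) scores and true labels (1 = attack, 0 = normal)
scores = [0.95, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for t in (0.9, 0.5, 0.25):
    pd, pfa = detection_rates(scores, labels, t)
    print(f"threshold={t}: PD={pd:.2f}, PFA={pfa:.2f}")
# threshold=0.9: PD=0.25, PFA=0.00
# threshold=0.5: PD=0.75, PFA=0.25
# threshold=0.25: PD=1.00, PFA=0.50
```

Lowering the threshold raises PD but also PFA; on this toy data no threshold gives PD = 1 with PFA = 0.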

67
Input DataSet

The 2000 DARPA Intrusion Detection Evaluation Program, prepared and managed by MIT Lincoln Labs, provided the necessary dataset.
Sample dataset

68
Construction of the network

The following figure shows the Bayesian
network that has been automatically
constructed by the learning algorithms of
BayesiaLab.
The target variable, activity_type, is directly
connected to the variables that heavily
contribute to its knowledge such as service
and protocol_type.

69
(No Transcript)
70
Data Gathering

MIT Lincoln Labs set up an environment to acquire several weeks of raw TCP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. The generated raw dataset contains a few million connection records.

71
Mapping the simple Bayesian Network that we saw
to the one used in the paper
72
Observation 1

As shown in the next figure, the most probable activity corresponds to a smurf attack (52.90%), an ecr_i (ECHO_REPLY) service (52.96%) and an icmp protocol (53.21%).

73
(No Transcript)
74
Observation 2

What would happen if the probability of receiving ICMP protocol packets is increased? Would the probability of having a smurf attack increase?
Setting the protocol to its ICMP value increases the probability of having a smurf attack from 52.90% to 99.37%.

75
(No Transcript)
76
Observation 3

Let's look at the problem from the opposite direction. If we set the probability of a portsweep attack to 100%, then the values of some associated variables inevitably vary.
We note from Figure 4 that the probabilities of the TCP protocol and the private service have increased from 38.10% to 97.49% and from 24.71% to 71.45% respectively. We can also notice an increase in the REJ and RSTR flags.

77
(No Transcript)
78
How do the previous examples work? PROPAGATION
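The transcript does not show how BayesiaLab propagates evidence, and the paper's network is not reproduced here. As a stand-in, exact inference by enumeration on the sprinkler network from the earlier slides shows the same effect as Observations 2 and 3: fixing the value of one variable changes the probabilities of the others. CPT entries beyond P(C)=0.5, P(S|C)=0.1, P(¬R|C)=0.2 and P(W|S,¬R)=0.9 are ASSUMED from the sprinkler example in reference 8. Enumeration is exponential in the number of variables; practical tools typically use more efficient exact or approximate algorithms.

```python
from itertools import product

# Sprinkler CPTs, giving P(variable = true | parents); most entries ASSUMED from reference 8
P_C = 0.5
P_S = {True: 0.1, False: 0.5}                      # keyed by C
P_R = {True: 0.8, False: 0.2}                      # keyed by C
P_W = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.9, (False, False): 0.0}    # keyed by (S, R)
NAMES = ("C", "S", "R", "W")

def bern(p_true, value):
    return p_true if value else 1 - p_true

def joint(c, s, r, w):
    return bern(P_C, c) * bern(P_S[c], s) * bern(P_R[c], r) * bern(P_W[(s, r)], w)

def posterior(query, evidence):
    """P(query = true | evidence), summing the joint over assignments consistent with the evidence."""
    num = den = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue  # skip assignments that contradict the evidence
        p = joint(*values)
        den += p
        if world[query]:
            num += p
    return num / den

print(round(posterior("R", {}), 4))           # 0.5     (no evidence)
print(round(posterior("R", {"W": True}), 4))  # 0.7079  (after observing wet grass)
print(round(posterior("S", {"W": True}), 4))  # 0.4298
```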
79
Benefits of the Bayesian Model

The benefit of using Bayesian IDSs is the ability to adjust the IDS's sensitivity.
This allows us to trade off between accuracy and sensitivity.
Furthermore, automatic detection of network anomalies by learning allows normal activities to be distinguished from abnormal ones.
Bayesian IDSs also allow network security analysts to see the amount of information each variable in the detection model contributes to the knowledge of the target node.