Title: Internet Traffic Classification Using Bayesian Analysis Techniques

1 Internet Traffic Classification Using Bayesian Analysis Techniques
Andrew W. Moore (Univ. of Cambridge), Denis Zuev (Univ. of Oxford)
ACM SIGMETRICS, Banff, Canada, June 2005
Date: 2005-11-02
Eric Joonmyung Kang (eliot_at_postech.ac.kr), DPNM Lab., Dept. of CSE, POSTECH
2 Presentation Outline
- Introduction
- Related Work
- Experimental Setup
- Machine Learned Classification
- Naïve Bayesian Classifier
- Naïve Bayes Kernel Estimation
- Discriminator selection and dimension reduction
- Experimental Results
- Evaluation
- Conclusion
- My thoughts
3 Introduction
- Accurate network traffic classification
- Fundamental to numerous network activities, from security monitoring to accounting, and from Quality of Service to providing operators with useful forecasts for long-term provisioning
- Classification schemes are difficult to operate correctly
- The knowledge commonly available to the network, i.e. packet headers, often does not contain sufficient information to allow for an accurate methodology
- The goal of this paper
- Using supervised Machine Learning to classify network traffic
- Demonstrating the high level of accuracy achievable with the Naïve Bayes estimator
- Demonstrating the improved accuracy of refined variants of this estimator
- Applying a powerful technique to a field different from those to which it has previously been applied
4 Related Work
- The field of traffic classification
- The use of well-known ports [1, 2]: an analysis of packet headers is used to identify traffic associated with a particular port, and thus with a particular application
- The relationship between the class of traffic and its observed statistical properties [6]: the distributions of flow bytes and flow packets for a number of specific applications
- Joint distributions (flow duration and the number of packets transferred) for identifying classes of traffic [7]
- Packet-size profiles of particular applications [8]
- Poisson-process modeling by Floyd and Paxson [9], describing a number of events caused directly by the user
- Classification of traffic flows for QoS applications [10]
- The field of Machine Learning
- Security (IDS, signature matching)
5 Experimental Setup (1)
- The experimental data
- Loss-limited, full-payload capture to disk, providing time-stamps with resolution better than 35 nanoseconds
- Data examined for several different periods in time from one site on the Internet (which hosts several biology-related facilities, collectively referred to as a Genome Campus)
- Object
- A traffic flow, represented as a flow of one or more packets between a given pair of hosts
- Discriminator
- A feature describing each object, used as input for classification
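As a concrete illustration of objects and discriminators, the sketch below derives a few simple per-flow features from packet records. The field names and the particular features are hypothetical choices for illustration, not the paper's actual discriminator set.

```python
# Hypothetical sketch: deriving simple per-flow discriminators from packet
# records. The chosen features are illustrative, not the paper's full set.
from statistics import mean

def flow_discriminators(packets):
    """packets: list of (timestamp_seconds, payload_bytes) for one flow."""
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    gaps = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times
    return {
        "packet_count": len(packets),
        "total_bytes": sum(sizes),
        "mean_packet_size": mean(sizes),
        "mean_interarrival": mean(gaps) if gaps else 0.0,
        "duration": times[-1] - times[0],
    }

d = flow_discriminators([(0.00, 40), (0.02, 1500), (0.05, 1500), (0.09, 40)])
```

Each such dictionary is one object; its values are the discriminators fed to the classifier.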
6 Experimental Setup (2)
- Traffic categories
- Each flow is mapped to only one category
7 Machine Learned Classification (1)
- Two methods for classifying data
- Deterministic classification (hard): assigns each data point to one of a set of mutually exclusive classes
- Probabilistic classification (soft): classifies data by assigning it probabilities of belonging to each class of interest
- This approach uses probabilistic classification
- Can identify similar characteristics of flows after their probabilistic class assignment
- The underlying statistical method is tractable, well documented, and well understood
- Robust to measurement error
- Allows for supervised training with pre-classified traffic
- Provides a mechanism for quantifying the class-assignment probabilities
- Ideally, available in a number of commonly available implementations
8 Machine Learned Classification (2)
- Analysis tools
- Naïve Bayes method: the simplest technique that could be applied to the problem under consideration
- Naïve Bayes method with kernel density estimation
- FCBF (Fast Correlation-Based Filter): a very promising method of feature selection and redundancy reduction
9 Naïve Bayesian Classifier (1)
- Bayes methods
- Consider a data sample x = (x_1, ..., x_n), which is a realization of X = {X_1, ..., X_n}, such that each random variable X_i is described by m attributes (referred to as discriminators)
- X_i = (d_1^(i), ..., d_m^(i)) is then a random vector
- Ex) for internet traffic, d_1^(i) may represent the mean inter-arrival time of packets in flow i
- The set of all known classes: C = {c_1, ..., c_k}
- The notation X_i = c_j stands for "the instance X_i belongs to the class c_j"
- Posterior probability: P(c_j | y) = P(c_j) f(y | c_j) / Σ_j P(c_j) f(y | c_j)
- Goal: estimate P(c_j | y) for each unobserved flow y
- Assumptions on f(y | c_j), such as the independence of the discriminators d_i and their standard Gaussian behavior
- The problem then reduces to estimating the parameters of the Gaussian distributions and the prior probabilities of the c_j's
10 Naïve Bayesian Classifier (2)
- m = 1, k = 2 (one discriminator and two classes)
- Training sample: flows whose classes c_1 and c_2 are already known
- Prior probabilities of the classes c_j estimated by their relative frequencies in the training data, P(c_j) = n_j / n
- The method assumes a Gaussian distribution for f(y | c_j) = N(μ_j, σ_j²)
- The distribution parameters μ_j and σ_j are estimated by the appropriate maximum likelihood estimators (the sample mean and sample standard deviation of class c_j)
- Given a flow y to be classified, its posterior probability of class membership is given by P(c_j | y) = P(c_j) f(y | c_j) / Σ_j P(c_j) f(y | c_j)
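The computation on this slide can be sketched as follows. This is a minimal illustration with made-up class names and discriminator values, not the paper's implementation.

```python
# Minimal sketch of Gaussian Naïve Bayes for m = 1 discriminator and two
# classes. Class names ("WWW", "MAIL") and sample values are made up.
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def train(samples):
    """samples: dict mapping class -> list of discriminator values."""
    n = sum(len(v) for v in samples.values())
    model = {}
    for c, xs in samples.items():
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs)  # ML estimator of variance
        model[c] = (len(xs) / n, mu, math.sqrt(var))    # (prior, mean, std dev)
    return model

def posterior(model, y):
    # P(c_j | y) = P(c_j) f(y | c_j) / sum over classes of the same product
    scores = {c: p * gaussian_pdf(y, mu, s) for c, (p, mu, s) in model.items()}
    total = sum(scores.values())
    return {c: v / total for c, v in scores.items()}

model = train({"WWW": [0.1, 0.2, 0.15], "MAIL": [1.0, 1.2, 0.9]})
post = posterior(model, 0.18)  # a flow with mean inter-arrival time 0.18
```

The posteriors sum to one, so the classifier can report a soft (probabilistic) assignment or pick the argmax for a hard one.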
11 Naïve Bayes Kernel Estimation (1)
- Kernel Estimation replaces the Gaussian assumption with kernel density estimation
- The estimate of the real density f(t | c_j) is given by f̂(t | c_j) = (1 / (n_j h)) Σ_i K((t − x_i) / h), summing over the training instances x_i of class c_j
- Kernel: any non-negative function K such that ∫ K(t) dt = 1 (e.g. the Gaussian kernel)
- The selection of the statistical bandwidth h plays an important role in the accuracy of the model
- h is chosen to minimize the Mean Integrated Squared Error (MISE)
- When computing f(t | c_j) to classify an unknown instance, Naïve Bayes has to evaluate the Gaussian distribution only once, whereas Kernel Estimation must perform n evaluations, given n training instances and m discriminators
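A minimal sketch of the kernel estimate above, assuming a Gaussian kernel and a hand-picked bandwidth h rather than one chosen by minimizing the MISE; the sample values are made up.

```python
# Sketch of the kernel density estimate: a sum of one kernel bump per
# training instance, normalized by n * h. Bandwidth h is fixed by hand here.
import math

def gaussian_kernel(u):
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def kde(t, samples, h):
    # One kernel evaluation per training instance: n evaluations in total,
    # versus a single Gaussian evaluation for plain Naïve Bayes.
    return sum(gaussian_kernel((t - x) / h) for x in samples) / (len(samples) * h)

density = kde(0.5, [0.2, 0.4, 0.6, 0.8], h=0.25)
```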
12 Naïve Bayes Kernel Estimation (2)
- An illustration of how the estimate of the distribution is constructed (figure)
13 Discriminator selection (1)
- Discriminator selection and dimension reduction
- Play a very important role in Machine Learning by acting as a preprocessing step, removing redundant and irrelevant discriminators
- What is meant by irrelevant and redundant discriminators?
- Irrelevant: carries no information about the different classes of interest
- Redundant: highly correlated with another discriminator
- Discriminator-selection approaches
- Filter: uses the characteristics of the training data to determine the relevance and importance of certain discriminators to the classification problem
- Wrapper: makes use of the results of a particular classifier to build the optimal set, by evaluating the results of the classifier on a test set for different combinations of discriminators
14 Discriminator selection (2)
- This paper uses the Fast Correlation-Based Filter (FCBF)
- Goodness of a discriminator: measured by its correlation with the class and with other good attributes
- Correlation measure used in FCBF: based on entropy, a measure of the uncertainty of a random variable
- Information gain: IG(X | Y) = H(X) − H(X | Y)
- Symmetrical uncertainty: SU(X, Y) = 2 · IG(X | Y) / (H(X) + H(Y))
- Two stages for selecting good discriminators
- Identifying the relevance of a discriminator to the class
- Identifying the redundancy of a discriminator with respect to other discriminators
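The entropy-based measures underlying FCBF can be sketched as below, on discrete toy data; the paper applies them to (discretized) flow discriminators, and the class labels here are made up.

```python
# Sketch of the entropy-based measures used by FCBF on discrete toy data.
import math
from collections import Counter

def entropy(xs):
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def cond_entropy(xs, ys):
    # H(X | Y): entropy of X within each value of Y, weighted by P(Y = y).
    n = len(xs)
    h = 0.0
    for y in set(ys):
        sub = [x for x, yy in zip(xs, ys) if yy == y]
        h += (len(sub) / n) * entropy(sub)
    return h

def symmetrical_uncertainty(xs, ys):
    ig = entropy(xs) - cond_entropy(xs, ys)  # information gain IG(X | Y)
    return 2 * ig / (entropy(xs) + entropy(ys))

# A discriminator that perfectly predicts the class has SU = 1;
# one carrying no class information has SU = 0.
su = symmetrical_uncertainty(["a", "a", "b", "b"], ["WWW", "WWW", "MAIL", "MAIL"])
```

FCBF keeps discriminators whose SU with the class exceeds a threshold (relevance), then drops any discriminator that another kept discriminator predicts better than the class does (redundancy).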
15 Experimental Results (1)
- Setup
- Analysis data
- Analysis tools: WEKA, written in Java, which allows tracking of its memory footprint
- Evaluation criteria
- Accuracy: the raw count of flows that were classified correctly, divided by the total number of flows
- Trust: a per-class measure and an indication of how much the classification can be trusted (the fraction of flows assigned to a class that truly belong to it)
- Accuracy by bytes: the raw count of correctly classified flow bytes, divided by the total number of bytes in all flows
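The three evaluation criteria can be sketched as below, on made-up flow records (the class names and byte counts are illustrative only).

```python
# Sketch of the three evaluation criteria: accuracy, per-class trust,
# and accuracy by bytes. All values are made up for illustration.
flows = [
    {"true": "WWW",  "pred": "WWW",  "bytes": 5000},
    {"true": "WWW",  "pred": "MAIL", "bytes": 300},
    {"true": "MAIL", "pred": "MAIL", "bytes": 1200},
    {"true": "BULK", "pred": "BULK", "bytes": 90000},
]

# Accuracy: correctly classified flows / total flows
accuracy = sum(f["true"] == f["pred"] for f in flows) / len(flows)

def trust(cls):
    # Of the flows classified as `cls`, the fraction that truly belong to it.
    predicted = [f for f in flows if f["pred"] == cls]
    return sum(f["true"] == cls for f in predicted) / len(predicted)

# Accuracy by bytes: correctly classified flow bytes / total bytes
accuracy_by_bytes = (
    sum(f["bytes"] for f in flows if f["true"] == f["pred"])
    / sum(f["bytes"] for f in flows)
)
```

Note how a few large flows dominate accuracy by bytes, which is why the paper reports it separately from per-flow accuracy.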
16 Experimental Results (2)
- Heuristic illustration of how the data blocks were obtained (figure): the line represents the instantaneous bandwidth requirements during the day, while the dark regions represent the data blocks used in the analysis of the accuracy of the Naïve Bayes method
17 Experimental Results (3)
- Average percentage of accurately classified flows by different methods (table)
- Average percentage of classified bytes by different methods (table)
- Number of optimal discriminators for each training set (table)
18 Experimental Results (4)
19 Evaluation
- Evaluated with a dataset from a later time
- Average percentage of accurately classified flows by different methods (table)
- Measure of belief that a certain class occurred, for the Naïve Bayes method after FCBF (table)
- Measure of belief that a certain class occurred, for Naïve Bayes with kernel density estimation after FCBF (table)
20 Conclusion (1)
- Identification of important discriminators
- Port (server)
- # of pushed data packets
- Initial window bytes
- Average segment size
- IP data bytes median
- Actual data packets
- Data bytes in the wire variance
- Minimum segment size
- RTT samples
- Pushed data packets
- Summary
- The application of supervised Machine Learning to classify network traffic by application
- In its most basic form, a Naïve Bayes classifier is able to provide 65% accuracy for data from the same period, and can achieve over 95% accuracy when combined with a number of simple refinements
- Naïve Bayes based upon kernel estimates, combined with the FCBF technique for discriminator reduction
21 Conclusion (2)
- Future work
- Apply more machine-learning techniques to the problem of network-traffic classification
- Test the spatial independence of the approach by applying models trained on one set of network traces to an entirely different location
- My thoughts
- Machine-learning techniques can be an interesting approach
- Static classification methods need to improve
- Mathematical approaches make it easier to find solutions to the traditional problems
- It is important to find good discriminators in order to classify correctly
- How about making and sharing well-trained classifiers?
- How scalable and flexible are they?
22 Q&A