1
Internet Traffic Classification Using Bayesian Analysis Techniques
Andrew W. Moore (Univ. of Cambridge), Denis Zuev (Univ. of Oxford)
ACM SIGMETRICS, Banff, Canada, June 2005
Date: 2005. 11. 02, Eric Joonmyung Kang (eliot@postech.ac.kr)
DPNM Lab., Dept. of CSE, POSTECH
2
Presentation Outline
  • Introduction
  • Related Work
  • Experimental Setup
  • Machine Learned Classification
  • Naïve Bayesian Classifier
  • Naïve Bayes Kernel Estimation
  • Discriminator selection and dimension reduction
  • Experimental Results
  • Evaluation
  • Conclusion
  • My thoughts

3
Introduction
  • Accurate network traffic classification is fundamental to
    numerous network activities
    • From security monitoring to accounting
    • From Quality of Service to providing operators with useful
      forecasts for long-term provisioning
  • Classification schemes are difficult to operate correctly
    • The knowledge commonly available to the network, i.e. the
      packet header, often does not contain sufficient information
      for an accurate methodology
  • The goals of this paper
    • Use supervised Machine Learning to classify network traffic
    • Show the high level of accuracy achievable with the Naïve
      Bayes estimator
    • Show the improved accuracy of refined variants of this
      estimator
    • Apply a powerful technique to a field different from those to
      which it has previously been applied

4
Related Work
  • The field of traffic classification
    • The use of well-known ports [12]
      • An analysis of packet headers is used to identify traffic
        associated with a particular port, and thus with a
        particular application
    • The relationship between the class of traffic and its
      observed statistical properties [6]
      • The distribution of flow bytes and flow packets for a
        number of specific applications
    • Joint distributions (flow duration and the number of packets
      transferred) for identifying classes of traffic [7]
    • Packet-size profiles of particular applications [8]
    • Poisson processes, by Floyd and Paxson [9]
      • Describe a number of events caused directly by the user
    • Classification of traffic flows for QoS applications [10]
  • The field of Machine Learning
    • Security (IDS, signature matching)

5
Experimental Setup (1)
  • The experimental data
    • Loss-limited, full-payload capture to disk, providing
      time-stamps with a resolution of better than 35 nanoseconds
    • Data examined for several different periods in time from one
      site on the Internet (a site hosting several Biology-related
      facilities, collectively referred to as a Genome Campus)
  • Object
    • A traffic flow, represented as a flow of one or more packets
      between a given pair of hosts
  • Discriminator
    • A feature describing each object, used as input for
      classification
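As a concrete sketch of the object/discriminator split, each flow object can be summarized into a handful of per-flow statistics. A minimal Python sketch, assuming a flow arrives as a list of (timestamp, size) packet tuples; the discriminator names below are illustrative, not the paper's exact feature set:

```python
from statistics import mean, variance

def flow_discriminators(packets):
    """Compute a few example discriminators for one traffic flow.

    `packets` is a list of (timestamp_seconds, size_bytes) tuples
    for one flow between a given pair of hosts.
    """
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    # gaps between consecutive packet arrivals
    gaps = [b - a for a, b in zip(times, times[1:])]
    return {
        "packet_count": len(packets),
        "flow_bytes": sum(sizes),
        "mean_inter_arrival": mean(gaps) if gaps else 0.0,
        "mean_packet_size": mean(sizes),
        "size_variance": variance(sizes) if len(sizes) > 1 else 0.0,
    }
```

Each such dictionary is one object; its values are the discriminators fed to the classifier.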

6
Experimental Setup (2)
  • Traffic categories
    • Each flow is mapped to only one category

7
Machine Learned Classification (1)
  • Two methods for classifying data
    • Deterministic classification (hard)
      • Assigns data points to one of several mutually exclusive
        classes
    • Probabilistic classification (soft)
      • Classifies data by assigning it probabilities of belonging
        to each class of interest
  • Why probabilistic classification is used for this approach
    • Can identify similar characteristics of flows after their
      probabilistic class assignment
    • The underlying statistical method is tractable, well
      documented, and well understood
    • Robust to measurement error
    • Allows for supervised training with pre-classified traffic
    • Provides a mechanism for quantifying the class-assignment
      probabilities
    • Ideally, available in a number of common implementations

8
Machine Learned Classification (2)
  • Analysis tools
    • Naïve Bayes method
      • The simplest technique that could be applied to the problem
        under consideration
    • Naïve Bayes method with kernel density estimation
    • FCBF (Fast Correlation-Based Filter)
      • A very promising method of feature selection and redundancy
        reduction

9
Naïve Bayesian Classifier (1)
  • Bayes methods
    • Consider a data sample x = (x_1, ..., x_n), a realization of
      X = (X_1, ..., X_n), such that each random variable X_i is
      described by m attributes (referred to as discriminators)
    • Each X_i is then a random vector X_i = (A_1, ..., A_m)
    • Ex) for Internet traffic, a discriminator may represent the
      mean inter-arrival time of packets in the flow
  • The set of all known classes C = {c_1, ..., c_k}
    • The notation X_i -> c_j stands for "the instance X_i belongs
      to the class c_j"
  • Posterior probability p(c_j | y) of class membership for an
    observed flow y
    • Goal: estimate p(c_j | y) for each class c_j
  • Assumptions on f(y | c_j), such as independence of the
    discriminators and their standard Gaussian behavior
    • The problem reduces to estimating the parameters of the
      Gaussian distributions and the prior probabilities of the
      c_j's

10
Naïve Bayesian Classifier (2)
  • m = 1, k = 2 (one discriminator and two classes c_1, c_2)
  • Training sample: pre-classified flows for each class
  • Prior probabilities p(c_j) of the classes estimated by their
    relative frequencies in the training sample
  • The method assumes a Gaussian distribution for f(x | c_j)
    • The distribution parameters (mean and variance) are estimated
      by the appropriate maximum-likelihood estimators
  • Given a flow x to be classified, its posterior probability of
    class membership is given by
    p(c_1 | x) = p(c_1) f(x | c_1) /
                 ( p(c_1) f(x | c_1) + p(c_2) f(x | c_2) )
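The one-discriminator posterior computation can be sketched in Python. The class names and Gaussian parameters below are invented for illustration; in practice the priors and parameters would come from maximum-likelihood estimates over the pre-classified training sample:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian density f(x | c) with mean mu and std deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (
        sigma * math.sqrt(2 * math.pi)
    )

def posterior(x, classes):
    """Posterior p(c | x) for a single discriminator value x.

    `classes` maps a class name to (prior, mu, sigma).  Bayes' rule:
    each class's prior times its class-conditional density,
    normalized over all classes.
    """
    joint = {c: p * gaussian_pdf(x, mu, s) for c, (p, mu, s) in classes.items()}
    total = sum(joint.values())
    return {c: v / total for c, v in joint.items()}
```

A flow is then assigned (hard classification) to the class with the highest posterior, or the full distribution is kept (soft classification).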

11
Naïve Bayes Kernel Estimation (1)
  • Kernel Estimation replaces the Gaussian assumption with kernel
    estimation methods
    • The estimate of the real density f(t | c_j) is given by
      f^(t | c_j) = (1 / (n_j h)) * sum_{x_i -> c_j} K((t - x_i) / h)
    • Kernel K: any non-negative function whose integral over the
      real line equals 1
  • The selection of the statistical bandwidth h plays an important
    role in the accuracy of the model
    • It is chosen to minimize the Mean Integrated Squared Error
      (MISE)
  • When computing f(t | c_j) to classify an unknown instance,
    Naïve Bayes has to evaluate the Gaussian distribution only
    once, whereas Kernel Estimation must perform n evaluations

Given n training instances and m discriminators
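A minimal sketch of the kernel estimate, assuming a Gaussian kernel; note the one-evaluation-per-training-instance cost that the slide contrasts against plain Naïve Bayes:

```python
import math

def kernel_density(t, samples, h):
    """Kernel estimate of a class-conditional density at point t.

    `samples` are the training instances for one class and `h` is
    the bandwidth.  One Gaussian bump is centered on each training
    instance, so the cost is linear in the number of samples.
    """
    def k(u):  # standard Gaussian kernel, integrates to 1
        return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

    return sum(k((t - x) / h) for x in samples) / (len(samples) * h)
```

Swapping this estimate in for the Gaussian density inside the posterior computation yields the kernel-estimation variant of the classifier.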
12
Naïve Bayes Kernel Estimation (2)
An illustration of how the estimate of the distribution is
constructed
13
Discriminator Selection (1)
  • Discriminator selection and dimension reduction
    • Play a very important role in Machine Learning as a
      preprocessing step, removing redundant and irrelevant
      discriminators
  • What is meant by irrelevant and redundant discriminators?
    • Irrelevant: a discriminator that carries no information about
      the different classes of interest
    • Redundant: a discriminator that is highly correlated with
      another discriminator
  • Discriminator selection approaches
    • Filter
      • Uses the characteristics of the training data to determine
        the relevance and importance of certain discriminators to
        the classification problem
    • Wrapper
      • Uses the results of a particular classifier to build the
        optimal set, by evaluating the results of the classifier on
        a test set for different combinations of discriminators

14
Discriminator selection2
  • In this paper, using Fast Correlation-Based
    Filter(FCBF)
  • Goodness of a discriminator
  • measured by its correlation with the class and
    other good attributes
  • Correlation measure used in FCBF
  • based on the entropy of a random variable a
    measure of uncertainty
  • Information gain
  • Symmetrical uncertainty
  • Two stage for selecting good discriminators
  • Identifying the relevance of a discriminator
  • Identifying the redundancy of a feature with
    respect to other discriminators
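The entropy-based correlation measure can be sketched directly from the information-gain and symmetrical-uncertainty definitions. A minimal Python sketch over discrete-valued discriminators; FCBF itself additionally thresholds and prunes discriminators by this score, which is not shown here:

```python
import math
from collections import Counter

def entropy(xs):
    """H(X) = -sum p(x) log2 p(x), over the empirical distribution."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def symmetrical_uncertainty(xs, ys):
    """SU(X, Y) = 2 * IG(X | Y) / (H(X) + H(Y)), in [0, 1].

    1 means the two variables fully determine each other;
    0 means they are independent.
    """
    hx, hy = entropy(xs), entropy(ys)
    # conditional entropy H(X | Y), weighted over the values of Y
    n = len(xs)
    h_x_given_y = 0.0
    for y, cy in Counter(ys).items():
        sub = [x for x, yy in zip(xs, ys) if yy == y]
        h_x_given_y += (cy / n) * entropy(sub)
    ig = hx - h_x_given_y  # information gain IG(X | Y)
    return 2 * ig / (hx + hy) if hx + hy else 0.0
```

Computed between a discriminator and the class labels it measures relevance; computed between two discriminators it measures redundancy.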

15
Experimental Results (1)
  • Setup
    • Analysis data
    • Analysis tools
      • WEKA, written in Java, which allows tracking of its memory
        footprint
  • Evaluation criteria
    • Accuracy
      • The raw count of flows that were classified correctly,
        divided by the total number of flows
    • Trust
      • A per-class measure and an indication of how much the
        classification can be trusted
    • Accuracy by bytes
      • The raw count of correctly classified flow bytes, divided
        by the total number of bytes in all flows
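The flow-level and byte-level criteria are straightforward counts. A minimal sketch (per-class trust is omitted), assuming each classified flow is recorded as a (true class, predicted class, byte count) tuple:

```python
def accuracy_metrics(flows):
    """Compute flow accuracy and byte accuracy.

    `flows` is a list of (true_class, predicted_class, byte_count)
    tuples, one per classified flow.
    """
    correct = sum(1 for t, p, _ in flows if t == p)
    correct_bytes = sum(b for t, p, b in flows if t == p)
    total_bytes = sum(b for _, _, b in flows)
    return {
        # correctly classified flows / all flows
        "accuracy": correct / len(flows),
        # bytes in correctly classified flows / all bytes
        "accuracy_by_bytes": correct_bytes / total_bytes,
    }
```

The two measures can diverge sharply when misclassified flows are few but large, which is why the paper reports both.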

16
Experimental Results (2)
Heuristic illustration of how data blocks were obtained. The line
represents the instantaneous bandwidth requirements during the day,
while the dark regions represent the data blocks used in the
analysis of the accuracy of the Naïve Bayes method.
17
Experimental Results (3)
Average percentage of accurately classified flows by different
methods
Average percentage of classified bytes by different methods
Number of optimal discriminators for each training set
18
Experimental Results (4)
19
Evaluation
Evaluated with a dataset from a later period in time.
Average percentage of accurately classified flows by different
methods.
Measure of belief that a certain class occurred, for the Naïve
Bayes method after FCBF.
Measure of belief that a certain class occurred, for the Naïve
Bayes method with kernel density estimation after FCBF.
20
Conclusion (1)
  • Identification of important discriminators
    • Port (server)
    • Number of pushed data packets
    • Initial window bytes
    • Average segment size
    • IP data bytes median
    • Actual data packets
    • Data bytes in the wire variance
    • Minimum segment size
    • RTT samples
    • Pushed data packets
  • Summary
    • The application of supervised Machine Learning to classify
      network traffic by application
    • In its most basic form, a Naïve Bayes classifier is able to
      provide 65% accuracy for data from the same period, and can
      achieve over 95% accuracy when combined with a number of
      simple refinements
    • Naïve Bayes based upon kernel estimates, combined with the
      FCBF technique for discriminator reduction

21
Conclusion (2)
  • Future work
    • Apply more machine-learning techniques to the problem of
      network-traffic classification
    • Test the spatial independence of the approach by applying
      models trained on one set of network traces to an entirely
      different location
  • My thoughts
    • Machine-learning techniques can be an interesting approach
    • Static classification methods need to improve
    • Mathematical approaches make it easier to find solutions to
      the traditional problems
    • It is important to find good discriminators in order to
      classify correctly
    • How about making and sharing well-trained classifiers?
    • How scalable and flexible would they be?

22
QnA