Title: Lightweight Application Classification for Network Management
1. Lightweight Application Classification for Network Management
- Hongbo Jiang, Case Western Reserve University
- Andrew W. Moore, University of Cambridge
- Zihui Ge, Adverplex Inc.
- Jia Wang, AT&T Labs - Research
- Shudong Jin, Case Western Reserve University
ACM SIGCOMM Workshop on Internet Network Management (INM), Kyoto, Japan, August 31, 2007
2. Why Do Network Traffic Classification?
- Network planning
- Traffic engineering
- Accounting and billing
- Security profiling
3. Our Contribution
- A lightweight application classification scheme based on NetFlow data
- Evaluation and sensitivity analysis
- Trivial features
- Derivative features
- Training-set size
- Packet sampling
4. Flow-level Traffic Classification
- Previous traffic classification methods use features derived from streams of packets
- Can achieve good accuracy (e.g., 95%)
- Have high complexity and cost
- Commonly available flow-level statistics
- (e.g., Cisco NetFlow, Juniper cflowd, Huawei NetStream)
- Sampling further reduces the cost
5. Probabilistic Method Example
[Figure: probabilistic classification example. Labels: Training Set; Pr = .97]
6. Our Approach (cont.)
- Features ranked by importance
- Use Symmetric Uncertainty (based on entropy)
- (See paper and references therein for details.)
- Ranking features allows for
- a sensitivity analysis, and
- the removal of irrelevant and redundant features.
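The ranking step can be illustrated with a minimal sketch of symmetric uncertainty, using the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). The toy data, the feature names, and the assumption that features are already discretized are hypothetical; the paper should be consulted for the exact procedure used.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant: nothing to share
    hxy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy      # I(X; Y)
    return 2.0 * mutual_info / (hx + hy)

# Hypothetical toy data: discretized feature values and class labels.
labels = ["web", "web", "mail", "mail", "ftp", "ftp"]
features = {
    "low_port": [80, 80, 25, 25, 21, 21],        # determines the class
    "tos": [0, 0, 0, 0, 0, 0],                   # constant, hence irrelevant
    "pkt_bin": ["s", "l", "s", "l", "s", "l"],   # independent of the class
}

# Rank features from most to least informative about the class.
ranking = sorted(
    features,
    key=lambda name: symmetric_uncertainty(features[name], labels),
    reverse=True,
)
```

Features scoring near zero against the class are candidates for removal as irrelevant; a feature that scores highly against another, already selected feature is a candidate for removal as redundant.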
7. Evaluation
- Dataset (not from AT&T!)
- Full-duplex 1 Gbps access link, 1000 researchers
- Data was hand-classified into a number of application classes, e.g., web-browsing, email, FTP, attack, P2P
- Focused on TCP/IP flows only
- 800,000 simplex TCP/IP application-level flows
- (97% of traffic by byte-volume)
- NetFlow generation
- Software simulation of Cisco NetFlow v5 engine
- Independent training and test sets
- Flows randomly assigned to each
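A minimal sketch of the random assignment of flows to independent training and test sets. The function name, the `train_fraction` parameter, and its default are assumptions for illustration; the slides do not state the split ratio.

```python
import random

def split_flows(flows, train_fraction=0.5, seed=0):
    """Randomly assign each flow to either the training or the test set."""
    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    shuffled = list(flows)         # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical usage with integer flow identifiers.
train_set, test_set = split_flows(range(10), train_fraction=0.5)
```

Because every flow lands in exactly one of the two sets, no flow used for training is also used for testing.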
8. Baseline and Derivative Features
- Features by category
- Baseline: srcIP/dstIP, srcPort/dstPort, ToS, sTime/eTime, tcpFlag, bytes, packets
- Derivative: duration, pktSize, byteRate, pktRate, tcpFxxx (syn/ack/fin/rst/psh/urg)
- Application: low port, high port
- Accuracy by feature set
- Baseline: 88.3%
- Derivative + Baseline: 89.1%
- Application + Baseline + Derivative: 91.4%
- Comparison: port-based 50-70%, packet-based 95%
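The derivative and application features can be computed from the baseline fields alone, as the following sketch suggests. The `FlowRecord` container, its field names, the handling of zero-duration flows, and the reading of low port and high port as the smaller and larger of the two port numbers are assumptions, not the authors' definitions.

```python
from dataclasses import dataclass

# TCP flag bit masks as laid out in the TCP header.
TCP_FLAGS = {"fin": 0x01, "syn": 0x02, "rst": 0x04,
             "psh": 0x08, "ack": 0x10, "urg": 0x20}

@dataclass
class FlowRecord:
    """Baseline fields of one flow record (hypothetical container)."""
    src_port: int
    dst_port: int
    s_time: float    # flow start time, in seconds
    e_time: float    # flow end time, in seconds
    tcp_flag: int    # cumulative OR of the TCP flags seen in the flow
    n_bytes: int
    n_packets: int

def derive_features(r: FlowRecord) -> dict:
    """Compute derivative and application features from baseline fields."""
    duration = r.e_time - r.s_time
    feats = {
        "duration": duration,
        "pktSize": r.n_bytes / r.n_packets if r.n_packets else 0.0,
        # Rates are undefined for zero-duration flows; 0.0 is used here.
        "byteRate": r.n_bytes / duration if duration > 0 else 0.0,
        "pktRate": r.n_packets / duration if duration > 0 else 0.0,
        "lowPort": min(r.src_port, r.dst_port),
        "highPort": max(r.src_port, r.dst_port),
    }
    # One binary feature per TCP flag (tcpFsyn, tcpFack, ...).
    for name, mask in TCP_FLAGS.items():
        feats["tcpF" + name] = int(bool(r.tcp_flag & mask))
    return feats

# Hypothetical web flow: flags 0x1B = SYN | ACK | PSH | FIN.
flow = FlowRecord(src_port=51234, dst_port=80, s_time=10.0, e_time=12.0,
                  tcp_flag=0x1B, n_bytes=6000, n_packets=10)
feats = derive_features(flow)
```

All of these are single arithmetic or bitwise operations per flow, which is why the derivative features add almost no processing cost.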
9. Highly Relevant Features
- Refer to specific privileged services and protocols
- Differentiate email and FTP from web-browsing
- Compact features
10. Reducing Feature Complexity
[Figure labels: Runtime 600x (s); Runtime 1x (s)]
Accuracy remains high even after removing
irrelevant and redundant features.
11. Reducing Training Set Size
More features may lead to noise (the training set is insufficiently representative)
12. Impact of Packet Sampling
- NetFlow characteristic
- Observed flow-count will decrease as sampling
rate decreases
- Packet sampling has little impact on accuracy
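The drop in observed flow count can be sketched under a simplifying assumption: if each packet is sampled independently with probability p, a flow of n packets is observed with probability 1 - (1 - p)^n, so short flows disappear first. The independent-sampling model, the function names, and the flow-size mix below are hypothetical; real NetFlow deployments may sample differently.

```python
import random

def detection_probability(n_packets: int, p: float) -> float:
    """Probability that at least one of n_packets is sampled at rate p."""
    return 1.0 - (1.0 - p) ** n_packets

def observed_flow_count(flow_sizes, p, seed=0):
    """Simulate per-packet sampling; count flows with >= 1 sampled packet."""
    rng = random.Random(seed)
    return sum(
        any(rng.random() < p for _ in range(size))
        for size in flow_sizes
    )

# Hypothetical mix: many short flows, few long ones.
flow_sizes = [1] * 700 + [10] * 200 + [1000] * 100

for p in (1.0, 0.1, 0.01):
    expected = sum(detection_probability(s, p) for s in flow_sizes)
    print(p, observed_flow_count(flow_sizes, p), round(expected, 1))
```

In this model the long flows survive sampling almost unchanged, while single-packet flows are seen only with probability p.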
13. Conclusion and Future Work
- Conclusion
- Application classification can be done with flow-level (NetFlow) information
- Trivially derived features improve accuracy
- Packet sampling has minimal impact
- Future work
- NetFlow v9??
- Other machine-learning methods?
14. Thanks