Title: Networking Research UofC


1
Networking Research (UofC)
  • Carey Williamson
  • iCORE Chair and NSERC/iCORE/TELUS Mobility
    Industrial Research Chair
  • Department of Computer Science
  • University of Calgary

2
Research Team
  • Faculty: Majid Ghaderi, Zongpeng Li, Mea Wang,
    Carey Williamson
  • Research Staff: Martin Arlitt, Jingxiang Luo,
    Terence Robinson, Hongxia Sun, Qian Wu
  • Students: Jean Cao, Marian Doerk, Phillipa Gill,
    Mingwei Gong, Ajay Gopinathan, Emir Halepovic,
    Andreas Hirt, Rohit Joshi, Ahmed Obied, Nadim
    Parvez, Partha Ramanujam, Tuan Vu, ...

3
Research Overview
  • Research area?
      • Wireless networks, Internet protocols, computer
        systems performance evaluation
  • Mission: Make the Internet go faster
  • Approach?
      • Experimental, simulation, analytical
  • Key challenges?
      • Citius, Altius, Fortius!
      • Performance, scalability, robustness

4
Experimental Facilities
  • Wireless Internet Performance Lab (UofC)
      • IEEE 802.11b wireless LAN
      • Sniffer Pro, AiroPeek wireless network analyzers
      • PCs, laptops, PDAs, wireless NICs, Web proxy
  • Experimental Laboratory for Internet Systems and
    Applications (UofC/UofS, CFI)
      • Geographically distributed Internet testbed
        between Calgary and Saskatoon
      • Clients, servers, notebooks, routers, switches,
        Web proxies, network analyzers, 802.11a/b
      • Fully operational since Spring 2004

5
Research Highlights
  • Network Traffic Measurements
      • Martin Arlitt et al.
  • Internet Traffic Classification
      • Jeff Erman, Anirban Mahanti, et al.
  • Wireless LAN Traffic Measurements
      • Aniket Mahanti, Martin Arlitt, et al.
  • Cellular Network Capacity Planning
      • Yujing Wu, Jingxiang Luo, Hongxia Sun

6
Network Traffic Measurement
  • Collect and analyze packet-level traces from a
    live network, using special equipment
  • Process traces, statistical analysis
  • Diagnose performance problems (network,
    protocol, application)

7
Network Traffic Measurement
  • Continuous monitoring of UofC traffic on the
    commercial Internet link (100 Mbps), recording
    TCP SYN/FIN/RST packet headers
  • 36 months of data and counting
  • Specific measurement studies to date:
      • TCP reset behaviour (Arlitt)
      • P2P traffic evolution (Madhukar)
      • Internet traffic classification (Erman)
      • Malicious network attacks (Obied)
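The monitoring above records only TCP connection-control packets. A minimal sketch of that filtering rule, assuming raw TCP headers are available as bytes (the slide does not say how capture was implemented), tests the SYN, FIN and RST bits in byte 13 of the TCP header as laid out in RFC 793. The function names are hypothetical; in libpcap-based tools, the capture filter `tcp[tcpflags] & (tcp-syn|tcp-fin|tcp-rst) != 0` expresses the same test.

```python
# Sketch: decide whether a TCP segment is a connection-control packet
# (SYN, FIN, or RST set) worth recording. Function names are hypothetical.

# Flag bits in byte 13 of the TCP header (RFC 793).
FIN = 0x01
SYN = 0x02
RST = 0x04
CONTROL_MASK = FIN | SYN | RST


def tcp_flags(tcp_header: bytes) -> int:
    """Return the flags byte from a raw TCP header (at least 20 bytes)."""
    if len(tcp_header) < 20:
        raise ValueError("TCP header must be at least 20 bytes")
    return tcp_header[13]


def should_record(tcp_header: bytes) -> bool:
    """True if any of SYN, FIN, or RST is set."""
    return bool(tcp_flags(tcp_header) & CONTROL_MASK)


def header_with_flags(flags: int) -> bytes:
    """Build a dummy 20-byte TCP header carrying only the given flags byte."""
    return bytes(13) + bytes([flags]) + bytes(6)


# Usage: SYN+ACK, FIN+ACK and RST+ACK are kept; ACK and PSH+ACK are not.
kept = [should_record(header_with_flags(f)) for f in (0x12, 0x11, 0x14, 0x10, 0x18)]
print(kept)
```

Recording only these packets captures connection setup and teardown while keeping trace volume far below that of full packet capture.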

8
TCP and HTTP Results
9
Semi-Supervised Network Traffic Classification
Jeffrey Erman, Anirban Mahanti, Martin Arlitt†, Ira Cohen†,
Carey Williamson
Department of Computer Science, University of Calgary
Department of Computer Science and Engineering, Indian
Institute of Technology (Delhi)
†Enterprise Systems Software Labs, HP Labs
Introduction
Identifying and categorizing network traffic by application type is challenging because of the continued evolution of applications, especially those designed to avoid detection. The diminished effectiveness of port-based identification and the overheads of deep packet inspection motivate us to propose a traffic classification methodology that relies only on flow statistics. Our proposed technique is a flexible mathematical framework that leverages both labelled and unlabelled flows. This semi-supervised approach to learning a network traffic classifier is a key contribution of this work.
Semi-Supervised Results
Labelling training feature vectors is one of the most time-consuming steps of the classification process.
Figure 1: Selective Labelling of Flows
In Figure 1, we test the hypothesis that if a few flows are labelled in each cluster, then we have a reasonable basis for creating the cluster-to-application mapping. With as few as two labels per cluster, we attain 94% flow accuracy.
Figure 2: Training with (Un)labelled Flows
Figure 2 shows the effect on the classifier's precision of using a fixed number of labelled flows and a varying number of unlabelled flows in the training data set. For a fixed number of labelled training flows, increasing the number of unlabelled flows increases the classifier's precision.
Retraining Detection
Although we found that our classifiers remained robust for extended periods of time, a mechanism for determining when a classifier needs updating is still required. We propose using the average distance of new flows to the centroid of their nearest cluster: a significant increase in this average distance indicates the need for an update.
Figure 5: Correlation Between Average Distance and Flow Accuracy
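The retraining-detection idea can be sketched as follows, assuming flows are already represented as numeric feature vectors and cluster centroids are available from training. The 1.5x threshold and all function names are hypothetical; the poster only calls for detecting a significant increase in the average distance.

```python
# Sketch of retraining detection: track the average distance from new
# flows to their nearest cluster centroid and flag a significant increase.
# The 1.5x threshold is a hypothetical choice.
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_distance(flow: Vector, centroids: List[Vector]) -> float:
    """Euclidean distance from a flow's feature vector to its nearest centroid."""
    return min(math.dist(flow, c) for c in centroids)


def average_distance(flows: List[Vector], centroids: List[Vector]) -> float:
    """Mean nearest-centroid distance over a batch of flows."""
    return sum(nearest_distance(f, centroids) for f in flows) / len(flows)


def needs_retraining(baseline: float, current: float, ratio: float = 1.5) -> bool:
    """Flag an update when the current average exceeds ratio * baseline."""
    return current > ratio * baseline


# Usage with toy two-dimensional feature vectors.
centroids = [(0.0, 0.0), (10.0, 10.0)]
baseline = average_distance([(1, 0), (0, 1), (10, 11), (9, 10)], centroids)
drifted = average_distance([(5, 5), (5, 4)], centroids)
print(baseline, needs_retraining(baseline, drifted))
```

The baseline would be measured on traffic from the training period; new flows that fall between the learned clusters raise the average distance and trigger the flag.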
Conclusions
  • Fast and accurate classifiers can be obtained
    by training with a small number of labelled flows
    mixed with a large number of unlabelled flows.
  • High flow and byte accuracy can be achieved for
    both offline and real-time classification.
  • Robust classifiers can be built that are immune
    to transient changes in network conditions.
  • Our approach can be integrated with solutions
    that collect flow statistics. We developed a
    prototype real-time classifier using Bro [4].

Real-Time Classification
Classification Framework
  • A fundamental challenge in the design of the
    real-time classification system is the need to
    classify a flow as soon as possible. Unlike
    offline classification, where all discriminating
    flow statistics are available a priori, in the
    real-time context we have only partial
    information on the flow statistics.
  • Our solution uses a layered classification system
    based on the idea of packet milestones.
  • A packet milestone is reached when the total
    number of packets a flow has sent or received
    reaches a specific value.
  • Each layer has an independent classifier.
  • Flow statistics are monitored in real time.
  • When a flow reaches a packet milestone, it is
    classified (or reclassified) by the corresponding
    layer.
  • This layered approach allows us to revise and
    potentially improve the classification of flows.
  • Figures 3 and 4 present example results using
    the April 13, 9 am trace we collected at the
    UofC. The classifier performs well, with byte
    accuracies typically in the 70% to 90% range.
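A minimal sketch of the layered, milestone-driven classifier described above. The milestone values (4, 8 and 16 packets), the single average-payload feature, and the threshold "models" standing in for trained per-layer classifiers are all hypothetical; the poster does not give these details.

```python
# Sketch of a layered real-time classifier driven by packet milestones.
# Milestone values and the toy per-layer models are hypothetical.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Features = List[float]
Classifier = Callable[[Features], str]

MILESTONES = [4, 8, 16]  # hypothetical packet counts at which layers fire


@dataclass
class FlowState:
    packets: int = 0  # packets sent or received so far
    payload_bytes: int = 0
    label: Optional[str] = None
    history: List[str] = field(default_factory=list)

    def features(self) -> Features:
        # Only partial statistics exist mid-flow; here, average payload size.
        return [self.payload_bytes / self.packets]


class LayeredClassifier:
    def __init__(self, layers: Dict[int, Classifier]):
        self.layers = layers  # milestone -> independent classifier
        self.flows: Dict[str, FlowState] = {}

    def on_packet(self, flow_id: str, payload_len: int) -> Optional[str]:
        """Update flow statistics; (re)classify when a milestone is reached."""
        state = self.flows.setdefault(flow_id, FlowState())
        state.packets += 1
        state.payload_bytes += payload_len
        layer = self.layers.get(state.packets)
        if layer is not None:
            state.label = layer(state.features())
            state.history.append(state.label)
        return state.label


def threshold_layer(cut: float) -> Classifier:
    """Toy stand-in for a trained per-layer model."""
    return lambda f: "P2P" if f[0] > cut else "HTTP"


# Usage: small packets first, then large ones; the label is revised at packet 8.
clf = LayeredClassifier({m: threshold_layer(600.0) for m in MILESTONES})
labels = [clf.on_packet("flow-1", n) for n in [100] * 4 + [1400] * 4]
print(labels)
```

Keeping one classifier per milestone lets each layer be trained on exactly the statistics that are available at that point in a flow's lifetime.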

[Figure: classification framework diagram. Step 1: Model Building; Step 2: Classification. Labels: Unlabelled Training Data, Labelled Training Data, Labelled Clusters, Unclassified Flows, Classified Flows.]
Figure 3: Performance of Real-time Classifier
References
[1] O. Chapelle, B. Schölkopf, and A. Zien, editors. Semi-Supervised Learning. MIT Press, Cambridge, MA, 2006.
[2] J. Erman, A. Mahanti, M. Arlitt, I. Cohen, and C. Williamson. Offline/Online Traffic Classification Using Semi-Supervised Learning. To appear in Proc. of IFIP Performance 2007.
[3] J. Erman, A. Mahanti, M. Arlitt, and C. Williamson. Identifying and Discriminating Between Web and Peer-to-Peer Traffic in the Network Core. In WWW'07, Banff, Canada, May 2007.
[4] V. Paxson. Bro: A System for Detecting Network Intruders in Real-Time. Computer Networks, 31(23-24):2435-2463, 1999.
  • A clustering algorithm partitions the training
    flows into disjoint groups, called clusters,
    based on similarity. The advantages are:
      • It builds natural clusters.
      • The number of training flows needed is small
        (e.g., 8000).
  • A cluster label is obtained using the labelled
    flows available in each cluster.
      • These labels can be obtained through a variety
        of means: (automated) payload analysis, port
        numbers, or expert knowledge.
      • Clusters with no labels can be left as unknown.
  • The classifier assigns each new unclassified flow
    to the nearest cluster using Euclidean distance.
    This is the maximum likelihood cluster
    assignment.
  • The label of the assigned cluster becomes the
    classification of the flow.
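The framework above (cluster the training flows, label the clusters, then assign new flows to the nearest cluster) can be sketched in plain Python. K-means is assumed as the clustering algorithm and a majority vote as the cluster-labelling rule; the poster names neither, and the toy data, the naive initialisation and the function names are hypothetical.

```python
# Sketch of the cluster-then-label framework. K-means and majority voting
# are assumptions; data and labels are toy values.
import math
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def nearest(x: Sequence[float], centroids: List[Vector]) -> int:
    """Index of the centroid closest to x in Euclidean distance."""
    return min(range(len(centroids)), key=lambda i: math.dist(x, centroids[i]))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[Vector]:
    """Partition all training flows (labelled or not) into k clusters."""
    centroids = [tuple(p) for p in points[:k]]  # naive init: first k points
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def label_clusters(centroids: List[Vector],
                   labelled: List[Tuple[Vector, str]]) -> List[str]:
    """Majority label of the labelled flows in each cluster, else 'unknown'."""
    votes = [Counter() for _ in centroids]
    for x, y in labelled:
        votes[nearest(x, centroids)][y] += 1
    return [v.most_common(1)[0][0] if v else "unknown" for v in votes]


def classify(x: Sequence[float], centroids: List[Vector], labels: List[str]) -> str:
    """A new flow takes the label of its nearest cluster."""
    return labels[nearest(x, centroids)]


# Usage with toy two-dimensional feature vectors; only three flows are labelled.
training = [(1, 1), (20, 20), (50, 1), (1, 2), (2, 1), (21, 20), (20, 21), (51, 2)]
labelled = [((1, 1), "HTTP"), ((2, 1), "HTTP"), ((20, 20), "P2P")]
centroids = kmeans(training, k=3)
labels = label_clusters(centroids, labelled)
predictions = [classify(x, centroids, labels) for x in [(3, 2), (19, 22), (49, 0)]]
print(labels, predictions)
```

The third cluster receives no labelled flows, so flows assigned to it stay "unknown", matching the poster's note that unlabelled clusters can be left as unknown.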

Typical byte accuracies in the 70% to 90% range.
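Flow accuracy and byte accuracy can diverge sharply because a few large flows often carry most of the bytes. The sketch below assumes the usual definitions (the fraction of flows, and the fraction of bytes, that are correctly classified), which the poster does not spell out; the record format and names are hypothetical.

```python
# Sketch of flow accuracy vs byte accuracy under assumed definitions.
from typing import List, Tuple

Record = Tuple[str, str, int]  # (true label, predicted label, bytes in flow)


def flow_accuracy(records: List[Record]) -> float:
    """Fraction of flows whose predicted label matches the true label."""
    return sum(1 for true, pred, _ in records if true == pred) / len(records)


def byte_accuracy(records: List[Record]) -> float:
    """Fraction of all bytes that belong to correctly classified flows."""
    total = sum(n for _, _, n in records)
    correct = sum(n for true, pred, n in records if true == pred)
    return correct / total


records = [
    ("HTTP", "HTTP", 1_000),
    ("HTTP", "HTTP", 1_000),
    ("P2P", "HTTP", 8_000),  # one misclassified mid-sized flow
    ("P2P", "P2P", 90_000),  # one large, correctly classified flow
]
print(flow_accuracy(records), byte_accuracy(records))
```

Here 3 of 4 flows are correct (flow accuracy 0.75), but the correctly classified flows carry 92,000 of 100,000 bytes (byte accuracy 0.92).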
Acknowledgements
This work was supported by the Natural Sciences
and Engineering Research Council (NSERC) of
Canada and Informatics Circle of Research
Excellence (iCORE) of the province of Alberta,
Canada.
Training Data: Training data can be a mix of labelled and unlabelled flows. Features include average packet size, number of packets, payload bytes, header bytes, etc.
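A sketch of computing the named features for one flow, assuming each packet is available as a (header length, payload length) pair and that average packet size counts header plus payload bytes; both are assumptions, as are the class and function names.

```python
# Sketch of per-flow feature extraction for the features named on the poster.
# Packet representation and exact feature definitions are assumptions.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FlowFeatures:
    num_packets: int
    payload_bytes: int
    header_bytes: int
    avg_packet_size: float  # assumed: (header + payload) bytes per packet

    def vector(self) -> Tuple[float, ...]:
        """Feature vector suitable for a distance-based clustering step."""
        return (float(self.num_packets), float(self.payload_bytes),
                float(self.header_bytes), self.avg_packet_size)


def extract_features(packets: List[Tuple[int, int]]) -> FlowFeatures:
    """packets: hypothetical (header_len, payload_len) pairs for one flow."""
    if not packets:
        raise ValueError("a flow needs at least one packet")
    header = sum(h for h, _ in packets)
    payload = sum(p for _, p in packets)
    return FlowFeatures(len(packets), payload, header,
                        (header + payload) / len(packets))


# Usage: two header-only packets followed by two data packets.
f = extract_features([(40, 0), (40, 0), (52, 1448), (52, 512)])
print(f.vector())
```

Because these features span very different ranges, they would typically be normalized or log-transformed before a Euclidean-distance clustering step; that step is omitted here.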
Figure 4: Byte Accuracy of Real-time Classifier
Full paper available at http://pages.cpsc.ucalgary.ca/erman/
10
Wireless-side Trace Collection
  • RFGrabbers were configured to scan channels 1, 6,
    and 11 to capture AirUC WLAN traffic in the b/g
    mode.
  • Over 6 weeks, RFGrabbers captured packets from 97
    APs at 9 locations, representing 20% of the UofC
    WLAN.

11
CDMA2000 EV-DO Downlink
[Diagram: downlink scheduling model. Labels: flow arrivals; schedule queue i at slot t; maximum feasible rate of queue j at slot t; propagation loss, shadowing, fast fading; realized throughput of queue j up to slot t; index of the scheduled queue at slot t.]
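The slide's symbols were lost, but its labels (maximum feasible rate of queue j at slot t, realized throughput of queue j up to slot t, index of the scheduled queue) match the proportional-fair rule commonly described for the EV-DO downlink: serve i(t) = argmax_j r_j(t) / R_j(t), then set R_j(t+1) = (1 - 1/t_c) R_j(t) + (1/t_c) r_j(t) for the served queue and R_j(t+1) = (1 - 1/t_c) R_j(t) for all others. The sketch below assumes that rule, which the slide itself does not confirm; the window t_c = 1000 slots and the initial throughput value are hypothetical.

```python
# Sketch of proportional-fair (PF) downlink scheduling, assumed to be the
# rule the slide's symbols described; tc is a hypothetical averaging window.
from typing import List


def pf_schedule(rates: List[List[float]], tc: float = 1000.0,
                r_init: float = 1e-3) -> List[int]:
    """rates[t][j] is the maximum feasible rate of queue j at slot t.
    Returns the index of the scheduled queue at each slot."""
    n = len(rates[0])
    realized = [r_init] * n  # smoothed realized throughput (positive, avoids /0)
    schedule = []
    for r in rates:
        # Serve the queue with the largest ratio of feasible to realized rate.
        i = max(range(n), key=lambda j: r[j] / realized[j])
        schedule.append(i)
        # Exponentially weighted update; only the served queue gets throughput.
        for j in range(n):
            served = r[j] if j == i else 0.0
            realized[j] = (1 - 1 / tc) * realized[j] + served / tc
    return schedule


# Usage: two queues with constant feasible rates 2.0 and 1.0 per slot.
slots = pf_schedule([[2.0, 1.0]] * 10_000)
share_of_queue0 = slots.count(0) / len(slots)
print(share_of_queue0)
```

With constant rates, the two queues end up with roughly equal shares of slots, which is the behaviour proportional fairness aims for in that case; with fading, each queue tends to be served when its feasible rate is high relative to its own average.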
12
Future Plans
  • More of the same!
  • P2P systems modeling and analysis
  • Wireless Internet measurement/modeling
  • WiMAX (IEEE 802.16)
  • QoS in CDMA2000 EV-DO
  • Wireless mesh networks?
  • Sensor networks?
  • Grid computing?
  • Network security?