Things about Trace Analysis - PowerPoint PPT Presentation

About This Presentation
Title:

Things about Trace Analysis

Description:

SNMP: Polling. USC raw trace. Wireless association (time start/stop ... We choose to represent summary of user association in each day by a single vector. ... – PowerPoint PPT presentation

Slides: 45
Provided by: WJ63
Learn more at: https://www.cise.ufl.edu

Transcript and Presenter's Notes

Title: Things about Trace Analysis


1
Things about Trace Analysis
  • Wei-jen Hsu
  • In class presentation for CIS6930
  • wjhsu_at_ufl.edu
  • (Advisor Ahmed Helmy)

2
Objective
  • More background knowledge related to trace-based
    study
  • Details about the trace format: an intro for one
    of the assignments
  • Share the experience in trace analysis

3
Why trace analysis?
  • Traces show realistically how the system
    works
  • Verification of established system
  • Diagnosis of system operation (identify faults)
  • Identifying design flaws
  • Large-scale properties (e.g. self-similar
    traffic)
  • Understand how a new system works
  • Provide domain knowledge for analysis work
  • Verifying an idea

4
Typical Work Flow for Trace Analysis
  • Build the system
  • Identify point(s) of trace collection and the
    methodology used
  • Obtain the data
  • Clean-up and sanity check
  • Analyze and post-process the data
  • Explain the results
  • Apply the results to further study or modify the
    existing system

5
WLAN Traces Study
  • It started around 2000
  • WLAN was new, and researchers wanted to understand how
    people used it (usage studies)
  • Surveys vs. traces
  • The studies by Tang and Baker ('00) and Kotz and Essien
    ('02) are pioneering examples
  • Statistics of usage (number of users, amount of
    traffic, etc.)

6
WLAN Traces Study
  • Mobility-related
  • MIT work (home location, prevalence, and
    persistence)
  • UCSD (PDA users)
  • WLAN mobility model (INFOCOM05, T-model,
    T-model)
  • Other user properties
  • Handoff
  • Pause time distribution

7
Trace Format
  • For association traces, the usual format is
    (Node_id, start_time, location, end_time)
  • But there are various ways to get there:
  • Syslog: event-based
  • SNMP: polling
  • USC raw trace
  • Wireless association (time, start/stop, switch port,
    MAC)
  • DHCP log (time, MAC, IP)
  • Traffic log

8
Trace Format Example
  • USC wireless association trace
  • (Time Start/Stop Switch_IP Switch_port MAC_of_node)
  • Mon Oct 10 01:16:52 Start 172.16.8.245 31005 03065f9c0ae
  • Mon Oct 10 01:17:00 Stop 172.16.8.245 21044 0e359964d1
  • Mon Oct 10 01:17:02 Start 172.16.8.245 31015 01124dfc03a
  • USC DHCP trace
  • (Time IP_of_node MAC_of_node)
  • Jan 27 00:21:19 207.151.229.50 018f310ea4c
  • Jan 27 00:21:20 207.151.232.184 018de33792
  • Jan 27 00:21:20 207.151.229.50 018f310ea4c
  • USC traffic trace
  • (Start_time End_time Destination_IP_port
    Source_IP_port protocol (TCP=6, UDP=17) ?
    Packet_number Data_size)
  • 0127.235942.925 0127.235944.905
    128.125.253.143 53 207.151.239.208 1795
    17 0 3 1368
  • 0127.235942.925 0127.235952.677
    63.236.56.237 80 207.151.239.208 3257
    6 2 4 192
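
As a rough illustration of how the raw association log could be turned into the (Node_id, start_time, location, end_time) form, here is a minimal Python sketch. It is not the tool used for the USC trace. The function names `parse_time` and `build_sessions`, the `year` parameter (the log lines carry no year), the rule of pairing a Stop with the latest Start on the same (MAC, switch IP, port), and the sample lines are all assumptions made for this example.

```python
from datetime import datetime

def parse_time(month, day, clock, year):
    # Accept both "011652" and "01:16:52" clock formats.
    stamp = f"{year} {month} {day} {clock.replace(':', '')}"
    return datetime.strptime(stamp, "%Y %b %d %H%M%S")

def build_sessions(lines, year=2005):
    """Pair Start/Stop events into (mac, start, location, end) tuples."""
    open_sessions = {}  # (mac, switch_ip, port) -> start time
    sessions = []
    for line in lines:
        fields = line.split()
        # Sanity check: skip records that do not have the expected shape.
        if len(fields) != 8 or fields[4] not in ("Start", "Stop"):
            continue
        _, month, day, clock, event, switch_ip, port, mac = fields
        when = parse_time(month, day, clock, year)
        key = (mac, switch_ip, port)
        if event == "Start":
            open_sessions[key] = when
        elif key in open_sessions:  # a Stop with no matching Start is dropped
            start = open_sessions.pop(key)
            sessions.append((mac, start, f"{switch_ip}:{port}", when))
    return sessions

# Made-up sample lines in the same layout as the association trace.
sample_lines = [
    "Mon Oct 10 01:16:52 Start 172.16.8.245 31005 aabbccddeeff",
    "Mon Oct 10 01:17:00 Stop 172.16.8.245 21044 112233445566",
    "Mon Oct 10 01:17:30 Stop 172.16.8.245 31005 aabbccddeeff",
]
sessions = build_sessions(sample_lines)
```

Dropping unmatched Stop events is one possible clean-up policy; a real analysis would also need to decide what to do with Start events that never see a Stop.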

9
Work with the Trace
  • An exercise
  • Does the Encounter-Relationship graph change
    with respect to time??
  • From WLAN traces,
  • We find encounters to measure inter-node
    relationship

Note: Is this a good assumption?
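
One possible way to extract encounters is sketched below in Python. It assumes an encounter means two different nodes associated with the same location during overlapping time intervals, which is exactly the assumption the note above questions. The function name `find_encounters` and the sample sessions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def find_encounters(sessions):
    """sessions: iterable of (node, start, location, end) with numeric times.

    Returns (node_a, node_b, location, overlap_start, overlap_end) tuples.
    """
    by_location = defaultdict(list)
    for node, start, location, end in sessions:
        by_location[location].append((node, start, end))

    encounters = []
    for location, visits in by_location.items():
        # Compare every pair of visits at the same location.
        for (a, a_start, a_end), (b, b_start, b_end) in combinations(visits, 2):
            overlap_start = max(a_start, b_start)
            overlap_end = min(a_end, b_end)
            if a != b and overlap_start < overlap_end:
                encounters.append((a, b, location, overlap_start, overlap_end))
    return encounters

# Made-up sessions: times are seconds from the start of the trace.
example_sessions = [
    ("n1", 0, "AP1", 100),
    ("n2", 50, "AP1", 150),
    ("n3", 120, "AP1", 200),
    ("n4", 0, "AP2", 300),
]
example_encounters = find_encounters(example_sessions)
```

The pairwise comparison is quadratic in the number of visits per location, which is fine for a sketch but may be slow on a full trace.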
10
Encounter distribution
  • How many other nodes does a node encounter?

Prob. (unique encounter fraction > x)
11
Encounter-Relationship graph
  • Imagine that there is a link connecting each pair
    of nodes that ever encounter each other.
    What does the graph look like?

But is the ER graph a connected graph? What are its
properties?
12
Encounter-Relationship graph
  • To our surprise, ER graphs are connected!!

Disconnected ratio (%)
13
Encounter-Relationship graph
  • What are the graph properties of the relationship
    graphs?

High clustering, as in a regular graph; low path length,
as in a random graph
14
Encounter-Relationship graph
  • Relationship graphs are small-world graphs
  • High clustering coefficient, low avg. path length

Normalized CC and PL
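
The two metrics can be computed with a few lines of Python. This is a minimal sketch that uses the standard definitions (fraction of linked neighbor pairs for the clustering coefficient, breadth-first-search hop counts for path length) and does not perform the normalization shown in the figure. The function names and the four-node example graph are made up, and treating nodes with fewer than two neighbors as having clustering coefficient 0 is an assumption.

```python
from collections import deque
from itertools import combinations

def build_graph(pairs):
    """Build an undirected graph {node: set(neighbors)} from encounter pairs."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def average_clustering(graph):
    total = 0.0
    for node, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            continue  # counted as 0 in the average
        links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
        total += links / (k * (k - 1) / 2)
    return total / len(graph)

def average_path_length(graph):
    """Mean hop count over all ordered pairs of mutually reachable nodes."""
    total, count = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count

# Made-up example: a triangle (a, b, c) with one extra node d attached to c.
er_graph = build_graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
cc = average_clustering(er_graph)
pl = average_path_length(er_graph)
```

Averaging path length over reachable pairs only is a design choice that avoids infinite distances when the graph is not fully connected.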
15
Work with the Trace
  • An exercise
  • Does the Encounter-Relationship graph change
    with respect to time??
  • Chop the trace into multiple segments
  • Analyze the average clustering coefficient and
    average path length of the resultant graph
  • How to deal with changing population?
  • Does the encounter duration matter?
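
A minimal Python sketch of the chopping step follows. It assigns each encounter to the segment that contains its start time and optionally ignores short encounters, which is one simple way to probe the duration question. The function name, the `min_duration` parameter, and the sample encounters are hypothetical; the encounter tuple layout (node_a, node_b, location, start, end) is an assumption of this example.

```python
from collections import defaultdict

def encounter_pairs_by_segment(encounters, segment_length, min_duration=0):
    """Group encounter pairs into fixed-length time segments.

    Returns {segment_index: set of (node_a, node_b)} with each pair sorted,
    so that (n1, n2) and (n2, n1) count as the same link.
    """
    segments = defaultdict(set)
    for a, b, _location, start, end in encounters:
        if end - start < min_duration:
            continue  # ignore encounters shorter than the cut-off
        index = int(start // segment_length)
        segments[index].add(tuple(sorted((a, b))))
    return dict(segments)

DAY = 86400  # seconds
sample_encounters = [
    ("n1", "n2", "AP1", 10, 50),
    ("n2", "n1", "AP3", 20, 25),
    ("n2", "n3", "AP1", DAY + 5, DAY + 500),
]
daily = encounter_pairs_by_segment(sample_encounters, DAY)
long_only = encounter_pairs_by_segment(sample_encounters, DAY, min_duration=100)
```

Because each segment's graph is built only from the pairs seen in that segment, the node population can differ from segment to segment, which is one facet of the changing-population question.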

16
Work with the Trace
  • Ask questions! What to look for from the trace?
  • Its importance
  • Its implication
  • Its potential usage
  • Its alternative solutions
  • Apply new techniques to look into the data
  • Find/Create interesting data sets

17
Lessons Learned
  • You need a lot of patience and care
  • Exceptions in the data
  • Flaws in your assumption
  • You need a lot of hard-drive space too!
  • You need good questions
  • For each question there are multiple ways to come
    up with an answer
  • New questions require new data sets and tools
  • You need to read a lot of papers

18
More Potential Direction
  • Mobility modeling/prediction
  • Data mining and clustering
  • Behavior-aware service/advertisements
  • Behavior-aware routing
  • Caveat: over-generalization from WLANs to
    futuristic networks (such as DTNs)?
  • Re-examine assumptions in earlier work

19
Related Skills
  • General programming (C/C++)
  • Perl/shell script/awk
  • Matrix manipulation (MATLAB)
  • Statistics software (R)
  • http://www.r-project.org/
  • Clustering/Machine learning
  • Principal component analysis/Singular value
    decomposition
  • http://www.cs.cmu.edu/~elaw/papers/pca.pdf
  • Data mining? Database analysis?

20
Good Online Resources
  • MobiLib
  • http://nile.cise.ufl.edu/MobiLib
  • Links to various traces, the USC trace, and some
    downloadable processing tools
  • CRAWDAD
  • http://crawdad.cs.dartmouth.edu/
  • Various traces download, related papers

21
References
  • [Stanford] D. Tang and M. Baker, Analysis of a
    Local-area Wireless Network
  • [Stanford2] D. Tang and M. Baker, Analysis of a
    Metropolitan-area Wireless Network
  • [Dartmouth] D. Kotz and K. Essien, Analysis of a
    Campus-wide Wireless Network
  • [Dartmouth2] T. Henderson, D. Kotz, and I.
    Abyzov, The Changing Usage of a Mature
    Campus-wide Wireless Network
  • [MIT/IBM] M. Balazinska and P. Castro,
    Characterizing Mobility and Network Usage in a
    Corporate Wireless Local-area Network

22
References
  • [UCSD] M. McNett and G. Voelker, Access and
    Mobility of Wireless PDA Users
  • [UCLA] X. Meng, S. Wong, Y. Yuan, and S. Lu,
    Characterizing Flows in Large Wireless Data
    Networks
  • [USC] D. Bhattacharjee, A. Rao, C. Shah, M. Shah,
    and A. Helmy, Empirical Modeling of Campus-wide
    Pedestrian Mobility: Observations on the USC
    Campus
  • [USC2] K. Merchant, W. Hsu, H. Shu, C. Hsu, and
    A. Helmy, Weighted Waypoint Mobility Model and
    Its Impacts on Ad Hoc Networks

23
References
  • [Dartmouth] M. Kim and D. Kotz, Methodology for
    Classifying Mobile Users and Access Points
  • [Dartmouth] L. Song, D. Kotz, R. Jain, and X. He,
    Evaluating location predictors with extensive
    Wi-Fi mobility data
  • [SIGCOMM01] A. Balachandran, G. Voelker, P. Bahl,
    and V. Rangan, Characterizing User Behavior and
    Network Performance in a Public Wireless LAN
  • [INFOCOM05] C. Tuduce and T. Gross, A Mobility
    Model Based on WLAN Traces and its Validation
  • [T-model] D. Lelescu, U. C. Kozat, R. Jain, and M.
    Balakrishnan, Model T++: an empirical joint
    space-time registration model
  • [T-model] R. Jain, D. Lelescu, and M. Balakrishnan,
    Model T: an empirical model for user
    registration patterns in a campus wireless LAN

24
More on Mobility Modeling
25
Mobility Observations from WLANs
  • Skewed location visiting preferences
  • Nodes spend 95% of their time at their top 5 preferred
    locations.
  • Heavily visited preferred spots
  • Periodical re-appearance
  • Nodes show up repeatedly at the same location
    after integer multiples of days.
  • Periodical daily/weekly schedules
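
The skewed-preference observation can be checked on any association trace with a short computation. The Python sketch below is one way to do it, assuming sessions in the (node, start, location, end) form with numeric times; the function name `top_k_time_fraction` and the sample data are made up for illustration.

```python
from collections import defaultdict

def top_k_time_fraction(sessions, k):
    """Fraction of each node's online time spent at its k most-used locations."""
    time_at = defaultdict(lambda: defaultdict(float))
    for node, start, location, end in sessions:
        time_at[node][location] += end - start

    result = {}
    for node, per_location in time_at.items():
        durations = sorted(per_location.values(), reverse=True)
        total = sum(durations)
        if total > 0:
            result[node] = sum(durations[:k]) / total
    return result

# Made-up node that spends 80, 15, and 5 time units at three locations.
visits = [
    ("n1", 0, "A", 80),
    ("n1", 80, "B", 95),
    ("n1", 95, "C", 100),
]
top1 = top_k_time_fraction(visits, 1)
top2 = top_k_time_fraction(visits, 2)
```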

26
Mobility Observations from WLANs
  • Problems of simple random models (random walk,
    random waypoint, random direction)
  • No preferred locations in spatial domain (uniform
    nodal distribution across space)
  • No structure in time domain (homogeneous behavior
    across time)
  • Nodes behave statistically identically to one
    another
  • Benefit: mathematical tractability
  • Can we improve realism without sacrificing mathematical
    tractability?

27
Time-variant Community Model
  • Skewed location visiting preferences
  • Create communities to be the preferred
    destination
  • Each node can have its own community
  • Periodical re-appearance
  • Create structure in time: periods
  • Nodes move with different parameters in different periods
  • Repetitive structure

28
Time-variant Community Model
  • Major trends of mobility characteristics
    preserved (extensions later)
  • In addition, mathematical tractability is retained

29
More on Matrix-based Analysis
30
Introduction
  • Widespread WLAN deployments create large-scale
    infrastructures.
  • Large numbers of users lead to large-scale
    management and design issues.
  • We need methods to quantify, summarize, and
    compare long-run trends (on the order of months)
    in individual user associations
  • Usage model / association model
  • Personalized services
  • Behavior aware ads / monetization
  • Behavior-aware routing protocols

31
Questions
  • Q1. How to quantify user association consistency?
  • (Challenge) What is a proper representation of
    user association, and how do we measure
    consistency?
  • Q2. How do we summarize long run user association
    patterns?
  • (Challenge) How to utilize existing data
    reduction techniques?
  • Q3. How to group users with similar association
    patterns?
  • (Challenge) How to quantify the similarity of
    user association patterns?
  • How to reduce computational complexity?
  • Contribution: generic methods to address these
    questions, empirically validated using USC and
    Dartmouth WLAN traces.

32
Representation of User Association Patterns
  • We represent the summary of a user's
    association on each day by a single vector.
  • For a given day d, the association vector of user i is
    an n-element vector $a = (a_1, \dots, a_n)$, where $a_j$ is
    the fraction of online time user i spends at
    $AP_j$ on day d.
  • The elements of the vector sum to 1.
  • Use the zero vector for off-line users.
  • The elements of the vector quantify the relative
    importance (or attraction) of each AP to the user.

Example association vector: (library, office, class) =
(0.2, 0.4, 0.4)
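
A minimal Python sketch of building one day's association vector is given below. The function name `association_vector` and the session layout (access point, start, end) are assumptions for this example; the sample durations are chosen to reproduce the (0.2, 0.4, 0.4) vector above.

```python
def association_vector(day_sessions, access_points):
    """Return the fraction of online time spent at each AP on one day.

    day_sessions: iterable of (ap, start, end) for one user on one day.
    access_points: ordered list of all AP names (defines the vector layout).
    """
    online = {ap: 0.0 for ap in access_points}
    for ap, start, end in day_sessions:
        online[ap] += end - start
    total = sum(online.values())
    if total == 0:
        return [0.0] * len(access_points)  # zero vector for off-line users
    return [online[ap] / total for ap in access_points]

aps = ["library", "office", "class"]
# Made-up day: 2 hours at the library, 4 at the office, 4 in class.
day = [("library", 0, 2), ("office", 2, 6), ("class", 6, 10)]
vector = association_vector(day, aps)
offline = association_vector([], aps)
```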
33
Q1. User Association Consistency
  • User i is consistent if its daily association
    vectors can be grouped into a few clusters (e.g.,
    fewer than 10% of the number of days).
  • Evaluation: use hierarchical clustering with the
    Manhattan (L1) distance measure
  • The distance between two vectors is at most 2.

34
Q1. User Association Consistency
  • Hierarchical Clustering
  • Start: each vector is a single-member cluster.
  • Recursion: the two closest clusters are merged.
  • End: stop when all remaining clusters are at distances
    larger than a threshold
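
The three steps above can be sketched in plain Python as follows. This is an illustrative, unoptimized version and not the code used in the study. It assumes the Manhattan (L1) distance between vectors and complete-link distance between clusters (the distance between the furthest members); the function names and the sample vectors are made up.

```python
def l1(u, v):
    """Manhattan distance; at most 2 for vectors whose elements sum to 1."""
    return sum(abs(x - y) for x, y in zip(u, v))

def complete_link(c1, c2):
    # Distance between clusters = distance between their furthest members.
    return max(l1(u, v) for u in c1 for v in c2)

def hierarchical_clusters(vectors, threshold):
    clusters = [[v] for v in vectors]  # start: one cluster per vector
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = complete_link(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # end: all remaining clusters are further apart
        clusters[i].extend(clusters[j])  # recursion: merge the closest pair
        del clusters[j]
    return clusters

# Made-up daily vectors: two "modes" plus one off-line day (zero vector).
days = [(1, 0, 0), (0.9, 0.1, 0), (0, 0, 1), (0, 0.1, 0.9), (0, 0, 0)]
modes = hierarchical_clusters(days, threshold=0.9)
```

With these sample vectors the two near-identical pairs merge, while the zero vector stays alone because its L1 distance to any vector that sums to 1 is exactly 1.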

35
Q1. User Association Consistency
Figure: distribution of the number of clusters under
cut-off threshold 0.9
80% of users show at most 9 clusters of behavior
modes during the 94-day trace
Complete link: the distance between clusters is the
distance between the furthest components in the
considered clusters
Observation: many users are multimodal, but with
far fewer association modes than the total number
of days in the trace period.
36
Q2. Summarizing user associations
  • Association matrix: concatenate a user's association
    vectors for all days into a matrix.
  • To summarize, perform SVD and store the top-k
    singular values/vectors.
  • What value of k do we need for a good
    representation of the matrix?
  • Captured matrix power
  • How large is the reconstruction error?
  • Matrix norms: $\|X - X_k\|_p / \|X\|_p$, where $X_k$ is the
    reconstruction from the top-k singular values/vectors
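
For reference, the two quality measures can be written out in terms of the singular values. The block below is a sketch under two assumptions that the slide does not state: that "captured power" means the share of squared singular values, and that the norm is the Frobenius norm (the slide leaves p unspecified). Here $\sigma_i$ are the singular values of X in decreasing order, r is the rank of X, and $u_i$, $v_i$ are the left and right singular vectors.

```latex
X = \sum_{i=1}^{r} \sigma_i u_i v_i^{T},
\qquad
X_k = \sum_{i=1}^{k} \sigma_i u_i v_i^{T}

\text{captured power}(k) = \frac{\sum_{i=1}^{k} \sigma_i^2}{\sum_{i=1}^{r} \sigma_i^2}

\frac{\|X - X_k\|_F}{\|X\|_F}
  = \sqrt{\frac{\sum_{i=k+1}^{r} \sigma_i^2}{\sum_{i=1}^{r} \sigma_i^2}}
  = \sqrt{1 - \text{captured power}(k)}
```

Under these assumptions the two questions on the slide are two views of the same quantity: capturing 90% of the power corresponds to a relative Frobenius error of about 0.32.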

37
Q2. Summarizing user associations
Only the top 6 singular vectors are needed to capture
at least 90% of the power for more than 95% of
association matrices
Reconstruction error of the low-rank
approximation is low (5 singular vectors give
error < 0.05)
Observation: although users are multi-modal, a
few major modes dominate their behavior
40
Q3. Similarity Metrics between Users
  • Naive method to compare the similarity between users i
    and j
  • Intuition: if for every daily association vector of
    i there is a similar association vector for
    j, then (i, j) have similar behavior.
  • Pick the association vector $a_i^d$ of user
    i on day d.
  • Find the association vector of user j, denoted by
    $a_j^d$, that is nearest to $a_i^d$.
  • Find the average of $|a_j^d - a_i^d|$ over all days d.
  • Drawback: expensive
  • $O(nd^2)$ for each pair
  • Lots of file reads for large datasets (reads raw
    data)
  • Need a faster method that reads summaries
41
Q3. Similarity Metrics between Users
  • Compare the similarity of the singular vectors
    obtained from SVD.
  • Similarity between users is determined by weighted
    inner products of singular vectors.
  • $w_i$: proportion of power captured by singular vector i
  • D(U,V) = 1 - Sim(U,V)
  • Are the 2 metrics similar?
  • Correlation coefficient of 0.911 for the studied users.
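
The slide does not show the formula for Sim(U,V). The Python sketch below assumes one plausible form, a double sum of weighted absolute inner products, Sim(U,V) = sum over i, j of w_ui * w_vj * |u_i . v_j|, and should be read as an illustration of the idea rather than the exact definition. The function names and the toy singular vectors and weights are made up.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def similarity(vectors_u, weights_u, vectors_v, weights_v):
    """Weighted sum of |inner products| between two users' singular vectors.

    Assumed form: Sim(U, V) = sum_i sum_j w_ui * w_vj * |u_i . v_j|.
    """
    total = 0.0
    for u, wu in zip(vectors_u, weights_u):
        for v, wv in zip(vectors_v, weights_v):
            total += wu * wv * abs(dot(u, v))
    return total

def distance(vectors_u, weights_u, vectors_v, weights_v):
    return 1 - similarity(vectors_u, weights_u, vectors_v, weights_v)

# Toy users over two access points; weights are each vector's share of power.
u_vectors, u_weights = [(1, 0), (0, 1)], [0.8, 0.2]
same_vectors, same_weights = [(1, 0), (0, 1)], [0.8, 0.2]
swapped_vectors, swapped_weights = [(0, 1), (1, 0)], [0.8, 0.2]

sim_same = similarity(u_vectors, u_weights, same_vectors, same_weights)
sim_swapped = similarity(u_vectors, u_weights, swapped_vectors, swapped_weights)
```

Only the stored top-k vectors and weights are read, so the comparison avoids going back to the raw daily data. Users whose dominant vectors point the same way score higher than users whose dominant vectors are orthogonal.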

42
Q3. Similarity Metrics between Users
  • Are we able to get clusters with similar users?
  • Compare the PDF/CDF for inter- and intra-cluster
    users (example: 200 clusters).

43
Q3. Similarity Metrics between Users
  • Take users in the same cluster, concatenate
    their association matrices, perform SVD, and find the
    power captured by the top-k singular vectors.
  • Also take random users, concatenate their
    singular vectors, and do the same.
  • There is a clear distinction between the 2
    clustering methods.

Straightforward: similarity decided based on
pair-wise comparison of association vectors
Feature-based: similarity decided based
on singular vectors
44
Q3. Similarity Metrics between Users
  • For all clusters, use a scatter plot to show the
    power captured by the top-4 singular vectors
    (distance-based cluster vs. random cluster).