Internet Traffic Demand and Traffic Matrix Estimation (Slide Transcript)
1
Internet Traffic Demand and Traffic Matrix Estimation
  • Challenges in directly measuring the traffic
    demand or traffic matrix
  • What granularity and time scale should a traffic
    demand matrix have?
  • Focus mainly on two studies representing two
    approaches
  • Partial (or sampled) measurement at
    ingress/egress points/links
  • Inference of the traffic matrix from link loads
    (aggregate SNMP link load measurements)
  • gravity model
  • tomogravity model
  • Readings: Please do the required readings

2
Traffic Demands
  • How to measure and model the traffic demands?
  • Know where the traffic is coming from and going
    to
  • Why do we care about traffic demands?
  • Traffic engineering utilizes traffic demand
    matrices in balancing traffic loads and managing
    network congestion
  • Support what-if questions about topology and
    routing changes
  • Handle the large fraction of traffic crossing
    multiple domains
  • Traffic demand matrices are critical inputs to
    network design, capacity planning, and business
    planning!
  • How to populate the demand model?
  • Typical measurements show only the impact of
    traffic demands
  • Active probing of delay, loss, and throughput
    between hosts
  • Passive monitoring of link utilization and packet
    loss
  • Need network-wide direct measurements of traffic
    demands
  • How to characterize the traffic dynamics?
  • User behavior, time-of-day effects, and new
    applications
  • Topology and routing changes within or outside
    your network

3
Traffic Demands
(Figure: traffic demands between a user site and a
web site across the big Internet)
4
Traffic Demands
Interdomain Traffic
5
Traffic Demands
6
Defining Traffic Demand Matrices
  • Granularity and time scale
  • Source/destination network prefix pairs,
    source/destination AS pairs
  • ingress/egress routers, or ingress/egress PoP
    pairs?
  • Finer-granularity traffic demands are likely
    unstable or fluctuate too widely!

7
Traffic Matrix (TM)
  • Point-to-Point Model
  • T = {T_ij}, where T_ij is the volume of traffic
    from an ingress point i to an egress point j over
    a given time interval
  • ingress/egress points: routers or PoPs
  • an ingress-egress pair is often referred to as an
    O-D pair
  • Point-to-Multipoint Model
  • Sometimes it may be difficult to determine egress
    points due to uncertainty in routing or route
    changes
  • Definition: V(in, out, t)
  • Entry link (in)
  • Set of possible exit links (out)
  • Time period (t)
  • Volume of traffic (V(in,out,t))
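Note: a point-to-point TM is easy to picture as a map from O-D pairs to
byte counts, and a point-to-multipoint entry is keyed by a set of possible
exit links. A minimal Python sketch follows; all names and numbers are
illustrative, since neither paper prescribes a data structure.

  # Minimal sketch of the two TM models above; names are illustrative.
  from collections import defaultdict

  tm_p2p = defaultdict(float)    # (ingress, egress) -> bytes in interval
  tm_p2mp = defaultdict(float)   # (in_link, frozenset(out_links)) -> bytes

  def add_flow(ingress, egress, nbytes):
      # Accumulate one measured flow into its O-D entry.
      tm_p2p[(ingress, egress)] += nbytes

  add_flow("PoP-NYC", "PoP-SFO", 1.2e9)
  add_flow("PoP-NYC", "PoP-SFO", 0.3e9)
  tm_p2mp[("in-1", frozenset({"out-3", "out-4"}))] += 4.0e8
  print(tm_p2p[("PoP-NYC", "PoP-SFO")])   # 1.5e9 bytes for this O-D pair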

8
Ideal Measurement Methodology
  • Measure traffic where it enters the network
  • Input link, destination address, bytes, and
    time
  • Determine where traffic can leave the network
  • Set of egress links associated with each network
    address (forwarding tables)
  • Compute traffic demands
  • Associate each measurement with a set of egress
    links
  • Even at the PoP level, direct measurement can
    be too expensive!
  • We either need to tap all ingress/egress links,
    or collect NetFlow records at all ingress/egress
    routers
  • May lead to reduced performance at routers
  • large amount of data vs. limited router disk
    space; exporting NetFlow records consumes
    bandwidth!
  • Either way, packet- or flow-level data must be
    mapped to ingress/egress points, and a lot of
    processing is needed to generate the TM!

9
Adapted Measurement Methodology Inter-domain
Focus
  • F01 Paper
  • Deriving traffic demands from NetFlow
    measurements on selected links
  • A large fraction of the traffic is interdomain
  • Interdomain traffic is easiest to capture
  • Large number of diverse access links to customers
  • Small number of high speed links to peers
  • Practical solution
  • Flow level measurements at peering links (both
    directions!)
  • Reachability information from all routers

10
Measuring Only at Peering Links
  • Why measure only at peering links?
  • Measurement support directly in the interface
    cards
  • Small number of routers (lower management
    overhead)
  • Less frequent changes/additions to the network
  • Smaller amount of measurement data
  • Why is this enough?
  • Large majority of traffic is interdomain
  • Measurement enabled in both directions (in and
    out)
  • Inference of ingress links for traffic from
    customers

11
Inbound and Outbound Flows on Peering Links
Note: the ideal methodology applies to inbound flows.
12
Full Classification of Traffic Types at Peering
Links
13
Identifying Where the Traffic Can Leave
  • Traffic flows
  • Each flow has a dest IP address (e.g.,
    12.34.156.5)
  • Each address belongs to a prefix (e.g.,
    12.34.156.0/24)
  • Forwarding tables
  • Each router has a table to forward a packet to
    the next hop
  • The forwarding table maps a prefix to a next-hop
    link
  • Process
  • Dump the forwarding table from each edge router
  • Identify entries where the next hop is an
    egress link
  • Identify the set of all egress links associated
    with a prefix
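Note: as a concrete illustration of this process, here is a minimal Python
sketch of the lookup step (the prefixes, egress-link names, and table below
are made up): a flow's destination address is mapped to the egress-link set
of its longest-matching prefix.

  # Sketch: destination address -> set of possible egress links, via
  # longest-prefix match over prefixes collected from forwarding tables.
  import ipaddress

  egress_sets = {
      ipaddress.ip_network("12.34.156.0/24"): {"peer-link-1", "peer-link-2"},
      ipaddress.ip_network("12.34.0.0/16"): {"peer-link-3"},
  }

  def egress_links(dst_ip):
      dst = ipaddress.ip_address(dst_ip)
      matches = [p for p in egress_sets if dst in p]
      if not matches:
          return set()
      best = max(matches, key=lambda p: p.prefixlen)  # most specific prefix
      return egress_sets[best]

  print(egress_links("12.34.156.5"))  # {'peer-link-1', 'peer-link-2'}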

14
Flows Leaving at Peer Links
  • Single-hop transit
  • Flow enters and leaves the network at the same
    router
  • Keep the single flow record measured at ingress
    point
  • Multi-hop transit
  • Flow measured twice as it enters and leaves the
    network
  • Avoid double counting by omitting second flow
    record
  • Discard flow record if source does not match a
    customer
  • Outbound
  • Flow measured only as it leaves the network
  • Keep flow record if source address matches a
    customer
  • Identify ingress link(s) that could have sent the
    traffic
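Note: a hedged sketch of this bookkeeping for records captured on the
outbound direction of a peering link; the record fields and helper function
are assumptions for illustration, not the paper's actual schema.

  # Sketch: decide what to do with a flow record captured on the
  # outbound side of a peering link. Field names are illustrative.
  import ipaddress

  def matches_customer(addr, customer_prefixes):
      a = ipaddress.ip_address(addr)
      return any(a in p for p in customer_prefixes)

  def classify_egress_record(flow, customer_prefixes):
      if matches_customer(flow["src_addr"], customer_prefixes):
          # Outbound: measured only here; keep it, then infer the set
          # of ingress links that could have sent it.
          return "keep_outbound"
      # Otherwise this is the second record of a transit flow already
      # measured at its ingress link: drop it to avoid double counting.
      return "drop_duplicate"

  prefixes = [ipaddress.ip_network("12.34.0.0/16")]
  print(classify_egress_record({"src_addr": "12.34.1.9"}, prefixes))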

15
Most Challenging Part: Inferring Ingress Links for
Outbound Flows
(Figure: example of an outbound traffic flow, measured
at the output side of a peering link, traveling from
customers toward its destination)
16
Computing the Demands
  • Data
  • Large, diverse, lossy
  • Collected at slightly different, overlapping time
    intervals, across the network.
  • Subject to network and operational dynamics;
    anomalies are explained and fixed via an
    understanding of these dynamics
  • Algorithms, details, and anecdotes in the paper!

17
Experience with Populating the Model
  • Largely successful
  • 98% of all traffic (bytes) associated with a set
    of egress links
  • 95-99% of traffic consistent with an OSPF
    simulator
  • Disambiguating outbound traffic
  • 67% of traffic associated with a single ingress
    link
  • 33% of traffic split across multiple ingress
    links (typically in the same city!)
  • Inbound and transit traffic (uses input
    measurement)
  • Results are good
  • Outbound traffic (uses input disambiguation)
  • Results are pretty good, for traffic engineering
    applications, but there are limitations
  • To improve results, we may want to measure at
    selected or sampled customer links, e.g., links
    to email, hosting, or data centers.

18
Proportion of Traffic in Top Demands (Log Scale)
Zipf-like distribution: a relatively small number
of heavy demands dominates.
19
Time-of-Day Effects (San Francisco)
Heavy demands at the same site may show different
time-of-day behavior.
20
Discussion
  • Distribution of traffic volume across demands
  • Small number of heavy demands (Zipf's Law!)
  • Optimize routing based on the heavy demands
  • Measure a small fraction of the traffic (sample)
  • Watch out for changes in load and egress links
  • Time-of-day fluctuations in traffic volumes
  • U.S. business, U.S. residential, International
    traffic
  • Depends on the time-of-day for human end-point(s)
  • Reoptimize the routes a few times a day (three?)
  • Stability?
  • No and Yes

21
TM Estimation Using Link Loads
  • M02 Paper: TM estimation using SNMP link
    loads
  • Available information
  • Link counts from SNMP data.
  • Routing information (link weights).
  • Additional topological information (peerings,
    access links).
  • Assumption on the distribution of demands.
  • TM estimation => using indirect measurements
    (here, link loads) and solving an inference
    problem!
  • Y = link load measurements, A = routing matrix
  • Given Y, solve for X, where Y = AX
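Note: a toy numerical sketch of this inference problem (topology and
numbers invented): with only three link counts and four O-D pairs, a
solution consistent with Y need not be the true X.

  # Sketch: Y = AX on a toy 3-link, 4-OD-pair network. The system is
  # under-determined, so link loads alone cannot pin down the TM.
  import numpy as np

  A = np.array([[1, 1, 0, 0],     # link 1 carries OD pairs 1 and 2
                [0, 1, 1, 0],     # link 2 carries OD pairs 2 and 3
                [0, 0, 1, 1]])    # link 3 carries OD pairs 3 and 4
  X_true = np.array([10., 20., 5., 15.])
  Y = A @ X_true                   # SNMP-style link loads

  X_hat = np.linalg.pinv(A) @ Y    # minimum-norm solution consistent with Y
  print(Y)                         # [30. 25. 20.]
  print(X_hat)                     # reproduces Y, but differs from X_true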

22
Terminology
  • c = n(n-1) origin-destination (OD) pairs.
  • X = traffic matrix (X_j = data transmitted by OD
    pair j).
  • Y = (y_1, y_2, ..., y_r) = vector of link counts.
  • A = r-by-c routing matrix (a_ij = 1 if link i
    belongs to the path associated with OD pair j).
  • Y = AX
  • r << c => infinitely many solutions!

23
Three Existing Techniques
  • Key issue: the linear system is under-constrained!
  • More (O(N^2)) unknowns (the X_ij's) than knowns
    (the Y_l's)
  • Linear Programming (LP) approach.
  • O. Goldschmidt - ISMA Workshop 2000
  • Bayesian estimation.
  • C. Tebaldi, M. West - J. of American Statistical
    Association, June 1998.
  • Expectation Maximization (EM) approach.
  • J. Cao, D. Davis, S. Vander Wiel, B. Yu - J. of
    American Statistical Association, 2000

24
Linear Programming
  • Objective: a linear function of the demands X (the
    exact expression was a figure on the original
    slide).
  • Constraints: the link-count equations Y = AX and
    non-negativity X >= 0.
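Note: since the slide's equations were figures, here is a minimal sketch of
one common LP variant (not necessarily Goldschmidt's exact formulation),
minimizing an assumed weighted sum of demands subject to the link counts.

  # Sketch of an LP-based estimate: minimize w . X  subject to
  # A X = Y and X >= 0. Weights and numbers are illustrative.
  import numpy as np
  from scipy.optimize import linprog

  A = np.array([[1, 1, 0, 0],
                [0, 1, 1, 0],
                [0, 0, 1, 1]], dtype=float)
  Y = np.array([30., 25., 20.])
  w = np.ones(4)                   # assumed objective weights

  res = linprog(c=w, A_eq=A, b_eq=Y, bounds=[(0, None)] * 4)
  print(res.x)   # an extreme point of the feasible region, typically sparse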

25
Statistical Approaches
26
Bayesian Approach
  • Assumes the X_j independently follow Poisson
    distributions with means λ_j.
  • λ = (λ_1, ..., λ_c) needs to be estimated (a
    prior is needed).
  • Conditioning on link counts: P(X, λ | Y).
  • Uses the Markov Chain Monte Carlo (MCMC) simulation
    method to get posterior distributions.
  • Ultimate goal: compute P(X | Y).
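Note: Tebaldi and West use an MCMC sampler tailored to the exact posterior;
purely as an illustration of the idea, here is a generic random-walk
Metropolis sketch over integer demands, with Poisson priors and a relaxed
Gaussian link-count likelihood standing in for exact conditioning (all
numbers assumed).

  # Illustration only (not Tebaldi & West's sampler): Metropolis over
  # integer OD demands X, Poisson(lam) priors, and a relaxed Gaussian
  # likelihood Y ~ N(AX, sigma^2 I) in place of exact link counts.
  import numpy as np
  from scipy.stats import norm, poisson

  rng = np.random.default_rng(0)
  A = np.array([[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]])
  Y = np.array([30., 25., 20.])
  lam = np.array([12., 15., 8., 12.])   # assumed prior means
  sigma = 1.0

  def log_post(X):
      return (poisson.logpmf(X, lam).sum()
              + norm.logpdf(Y, loc=A @ X, scale=sigma).sum())

  X = lam.astype(int)
  draws = []
  for _ in range(20000):
      prop = X.copy()
      j = rng.integers(len(X))
      prop[j] += rng.choice([-1, 1])    # symmetric random-walk proposal
      if prop[j] >= 0 and np.log(rng.random()) < log_post(prop) - log_post(X):
          X = prop
      draws.append(X)
  print(np.mean(draws[5000:], axis=0))  # posterior-mean estimate of X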

27
Expectation Maximization (EM)
  • Assumes the X_j are independently distributed
    Gaussians.
  • Y = AX implies Y is Gaussian with mean Aλ and
    covariance A Σ A^T (with λ, Σ the mean and
    covariance of X).
  • Requires a prior for initialization.
  • Incorporates multiple sets of link measurements.
  • Uses the EM algorithm to compute the MLE.

28
Comparison of Methodologies
  • Considers PoP-PoP traffic demands.
  • Two different topologies (4-node, 14-node).
  • Synthetic TMs. (constant, Poisson, Gaussian,
    Uniform, Bimodal)
  • Comparison criteria
  • Estimation errors yielded.
  • Sensitivity to prior.
  • Sensitivity to distribution assumptions.

29
4-node Topology
30
4-node Topology Results
31
14-node Topology
32
14-node Topology Results
33
Marginal Gains of Known Rows
34
New Directions
  • Lessons learned
  • Model assumptions do not reflect the true nature
    of traffic (multimodal behavior)
  • Dependence on priors
  • Link counts are not sufficient (generally, more
    data is available to network operators)
  • Proposed Solutions
  • Use choice models to incorporate additional
    information.
  • Generate a good prior solution.

35
New Statement of the Problem
  • X_ij = O_i · α_ij
  • O_i = outflow from node (PoP) i.
  • α_ij = fraction of O_i going to PoP j.
  • Equivalent problem: estimating the α_ij.
  • Solution via Discrete Choice Models (DCM).
  • User choices.
  • ISP choices.

36
Choice Models
  • Decision makers: PoPs.
  • Set of alternatives: egress PoPs.
  • Attributes of decision makers and alternatives:
    attractiveness (capacity, number of attached
    customers, peering links).
  • Utility maximization with random utility models.

37
Random Utility Model
  • U_ij = V_ij + ε_ij: utility of PoP i choosing to
    send a packet to PoP j.
  • Choice problem
  • Deterministic component: V_ij.
  • Random component: ε_ij; multinomial logit (mlogit)
    model used.
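Note: under the mlogit model the choice probabilities take the familiar
softmax form P(i -> j) = exp(V_ij) / sum_k exp(V_ik). A small sketch with
assumed attribute values and weights:

  # Sketch: multinomial-logit fractions for PoP i choosing egress PoPs.
  # Attribute values and weights are assumptions for illustration.
  import numpy as np

  def mlogit_probs(V):
      # P(j) = exp(V_j) / sum_k exp(V_k); subtract max for stability.
      e = np.exp(V - V.max())
      return e / e.sum()

  beta = np.array([0.5, 0.3])          # assumed attribute weights
  attrs = np.array([[1.0, 2.0],        # per-egress attractiveness
                    [4.0, 5.0],        # (e.g., capacity, customers)
                    [2.0, 1.0]])
  V = attrs @ beta                     # deterministic utilities V_ij
  print(mlogit_probs(V))               # estimated fractions α_ij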

38
Gravity Modeling
  • General formula: traffic between two nodes grows
    in proportion to the product of their total volumes
    (the equation was a figure on the original slide).
  • Simple gravity model: try to estimate the amount
    of traffic between edge links.
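Note: in the simple gravity model the traffic between edge links i and j is
taken proportional to the product of their totals, X_ij = X_i(in) ·
X_j(out) / X(total). A small sketch with made-up volumes:

  # Sketch of the simple gravity model: X_ij = inflow_i * outflow_j / total.
  # Volumes are illustrative.
  import numpy as np

  inflow = np.array([60., 40.])         # total bytes entering each edge link
  outflow = np.array([30., 50., 20.])   # total bytes leaving each edge link
  total = outflow.sum()                 # equals inflow.sum() here

  X = np.outer(inflow, outflow) / total
  print(X)            # rows sum to inflow, columns sum to outflow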

39
Results
  • Two different models (Model 1: attractiveness;
    Model 2: attractiveness + repulsion)

40
Further Improvement Tomogravity Model
  • Two-step modeling.
  • Gravity Model: an initial solution is obtained
    using edge link load data and ISP routing policy.
  • Tomographic Estimation: the initial solution is
    refined by applying quadratic programming to
    minimize the distance to the initial solution
    subject to the tomographic constraints (link
    counts).

41
Highlights
  • A router-to-router traffic matrix is computed
    instead of PoP-to-PoP.
  • Performance evaluation with real traffic
    matrices.
  • Tomogravity method (Gravity + Tomography)

42
Recall: Gravity Model
  • General formula: traffic between two nodes grows
    in proportion to the product of their total
    volumes.
  • Simple gravity model: try to estimate the amount
    of traffic between edge links.

43
Generalized Gravity Model
  • Four traffic categories
  • Transit
  • Outbound
  • Inbound
  • Internal
  • Peers: P1, P2, ...
  • Access links: a1, a2, ...
  • Peering links: p1, p2, ...

44
Generalized Gravity Model
45
Tomography
  • The solution should be consistent with the link
    counts (i.e., satisfy Y = AX).

46
Reducing the Computational Complexity
  • Hundreds of backbone routers, tens of thousands
    of unknowns.
  • Observations
  • Some elements of the BR-to-BR matrix are empty
    (multiple BRs in each PoP, shortest paths).
  • Topological equivalence (reduces the number of
    IGP simulations).

47
Quadratic Programming
  • Problem definition: find the TM closest (in the
    least-squares sense) to the gravity solution,
    subject to the link-count constraints (the
    equations were a figure on the original slide).
  • Use SVD (singular value decomposition) to solve
    the inverse problem.
  • Use Iterative Proportional Fitting (IPF) to
    ensure non-negativity.
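Note: a minimal sketch of this refinement step (toy numbers): the
least-squares-closest point to the gravity solution x_g that satisfies the
link counts is x_g + A⁺(y - A x_g), with A⁺ the SVD-based pseudo-inverse;
a simple clip stands in for the IPF step here.

  # Sketch: refine the gravity solution x_g to satisfy link counts y.
  # argmin ||x - x_g||  s.t.  A x = y  is  x_g + pinv(A) (y - A x_g),
  # where pinv is the SVD pseudo-inverse. Numbers are illustrative.
  import numpy as np

  A = np.array([[1, 1, 0, 0],
                [0, 1, 1, 0],
                [0, 0, 1, 1]], dtype=float)
  y = np.array([30., 25., 20.])
  x_g = np.array([14., 18., 6., 12.])   # gravity-model initial solution

  x = x_g + np.linalg.pinv(A) @ (y - A @ x_g)
  x = np.clip(x, 0.0, None)             # crude stand-in for the IPF step
  print(x, A @ x)                       # refined TM; A @ x ~= y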

48
Evaluation of Gravity Models
49
Performance of Proposed Algorithm
50
Comparison
51
Robustness
  • Measurement errors
  • x = At + e (link loads = routing matrix × TM,
    plus noise)
  • e ~ N(0, σ)