Title: Internet Traffic Demand and Traffic Matrix Estimation
1. Internet Traffic Demand and Traffic Matrix Estimation
- Challenges in directly measuring the traffic demand or traffic matrix
- Granularity and time scale of the traffic demand matrix?
- Focus mainly on two studies representing two approaches
  - Partial (or sampled) measurement at ingress/egress points/links
  - Inference of the traffic matrix based on link loads (aggregate SNMP link load measurements)
    - Gravity model
    - Tomogravity model
- Readings: please do the required readings
2. Traffic Demands
- How to measure and model the traffic demands?
  - Know where the traffic is coming from and going to
- Why do we care about traffic demands?
  - Traffic engineering uses traffic demand matrices to balance traffic loads and manage network congestion
  - Support what-if questions about topology and routing changes
  - Handle the large fraction of traffic crossing multiple domains
  - Traffic demand matrices are critical inputs to network design, capacity planning, and business planning!
- How to populate the demand model?
  - Typical measurements show only the impact of traffic demands
    - Active probing of delay, loss, and throughput between hosts
    - Passive monitoring of link utilization and packet loss
  - Need network-wide direct measurements of traffic demands
- How to characterize the traffic dynamics?
  - User behavior, time-of-day effects, and new applications
  - Topology and routing changes within or outside your network
3. Traffic Demands
[Figure: traffic demand flowing from a user site to a web site across the big Internet]
4. Traffic Demands
[Figure: interdomain traffic]
5. Traffic Demands
6. Defining Traffic Demand Matrices
- Granularity and time scale
  - Source/destination network prefix pairs, or source/destination AS pairs?
  - Ingress/egress router pairs, or ingress/egress PoP pairs?
- Finer-granularity traffic demands are likely unstable or fluctuate too widely!
7. Traffic Matrix (TM)
- Point-to-Point Model
  - T = {T_ij}, where T_ij is the volume of traffic from an ingress point i to an egress point j over a given time interval
  - Ingress/egress points: routers or PoPs
  - An ingress-egress pair is often referred to as an O-D pair
- Point-to-Multipoint Model
  - Sometimes it may be difficult to determine egress points due to uncertainty in routing or route changes
  - Definition: V(in, out, t) (see the data-structure sketch below)
    - Entry link (in)
    - Set of possible exit links (out)
    - Time period (t)
    - Volume of traffic (V(in, out, t))
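To make the two models concrete, here is a minimal Python sketch of how such demands might be stored; the keys, link names, and add_demand helper are illustrative assumptions, not from the papers:

```python
from collections import defaultdict

# Point-to-point TM: volume keyed by (ingress, egress) for one interval.
tm_p2p = defaultdict(float)
tm_p2p[("nyc", "sfo")] += 1.5e9   # bytes observed in this interval

# Point-to-multipoint demand V(in, out, t): the egress is a *set* of
# possible exit links, since routing may pick any of them.
tm_p2mp = defaultdict(float)

def add_demand(in_link, out_links, t, volume):
    """Accumulate volume for (entry link, set of exit links, time bin)."""
    tm_p2mp[(in_link, frozenset(out_links), t)] += volume

add_demand("cust-a1", {"peer-p1", "peer-p2"}, t=0, volume=4.2e8)
```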
8. Ideal Measurement Methodology
- Measure traffic where it enters the network
  - Input link, destination address, bytes, and time
- Determine where traffic can leave the network
  - Set of egress links associated with each network address (forwarding tables)
- Compute traffic demands
  - Associate each measurement with a set of egress links
- Even at the PoP level, direct measurement can be too expensive!
  - We either need to tap all ingress/egress links, or collect NetFlow records at all ingress/egress routers
  - May lead to reduced performance at routers
  - Large amount of data: router disk space is limited, and exporting NetFlow records consumes bandwidth!
  - With either packet-level or flow-level data, traffic must be mapped to ingress/egress points, and a lot of processing is needed to generate the TM!
9. Adapted Measurement Methodology: Inter-domain Focus
- F01 paper
  - Deriving traffic demands from NetFlow measurements on selected links
- A large fraction of the traffic is interdomain
- Interdomain traffic is the easiest to capture
  - Large number of diverse access links to customers
  - Small number of high-speed links to peers
- Practical solution
  - Flow-level measurements at peering links (both directions!)
  - Reachability information from all routers
10. Measuring Only at Peering Links
- Why measure only at peering links?
  - Measurement support directly in the interface cards
  - Small number of routers (lower management overhead)
  - Less frequent changes/additions to the network
  - Smaller amount of measurement data
- Why is this enough?
  - The large majority of traffic is interdomain
  - Measurement enabled in both directions (in and out)
  - Inference of ingress links for traffic from customers
11. Inbound and Outbound Flows on Peering Links
Note: the ideal methodology applies to inbound flows.
12. Full Classification of Traffic Types at Peering Links
13. Identifying Where the Traffic Can Leave
- Traffic flows
  - Each flow has a destination IP address (e.g., 12.34.156.5)
  - Each address belongs to a prefix (e.g., 12.34.156.0/24)
- Forwarding tables
  - Each router has a table used to forward a packet to a next hop
  - The forwarding table maps a prefix to a next-hop link
- Process (see the sketch after this list)
  - Dump the forwarding table from each edge router
  - Identify entries whose next hop is an egress link
  - Identify the set of all egress links associated with each prefix
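A minimal Python sketch of the resulting prefix-to-egress lookup; the table contents and link names are invented for illustration:

```python
import ipaddress

# Egress links per prefix, as extracted from edge routers' forwarding
# tables (entries whose next hop is an egress link). Toy data.
prefix_to_egress = {
    ipaddress.ip_network("12.34.156.0/24"): {"peer-p1"},
    ipaddress.ip_network("12.34.0.0/16"):   {"peer-p1", "peer-p2"},
}

def egress_links(dst_ip):
    """Longest-prefix match: return the egress link set for a flow's
    destination address, or None if no covering prefix is known."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [p for p in prefix_to_egress if addr in p]
    if not matches:
        return None
    best = max(matches, key=lambda p: p.prefixlen)
    return prefix_to_egress[best]

print(egress_links("12.34.156.5"))   # {'peer-p1'}  (the /24 wins)
```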
14. Flows Leaving at Peer Links
- Single-hop transit
  - Flow enters and leaves the network at the same router
  - Keep the single flow record measured at the ingress point
- Multi-hop transit
  - Flow is measured twice, as it enters and as it leaves the network
  - Avoid double counting by omitting the second flow record
  - Discard the flow record if the source does not match a customer
- Outbound
  - Flow is measured only as it leaves the network
  - Keep the flow record if the source address matches a customer
  - Identify the ingress link(s) that could have sent the traffic (a record-keeping sketch follows)
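A minimal sketch of this record-keeping logic; the flow-record fields (src_is_customer, measured_at_ingress) are simplified placeholders of our own, not the paper's implementation:

```python
from dataclasses import dataclass

@dataclass
class FlowRecord:
    src_is_customer: bool      # does the source address match a customer?
    measured_at_ingress: bool  # was this flow already captured entering?

def keep_record_at_peering_egress(rec: FlowRecord) -> bool:
    """Keep or drop a flow record captured as it leaves on a peering link."""
    if rec.measured_at_ingress:
        # Multi-hop transit: already counted when it entered at another
        # peering link; drop this second record to avoid double counting.
        return False
    # Outbound: seen only on exit; keep it if the source is a customer,
    # then separately infer which ingress link(s) could have sent it.
    return rec.src_is_customer
```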
15. Most Challenging Part: Inferring Ingress Links for Outbound Flows
[Figure: example of an outbound traffic flow measured at the peering-link output, traced back to the customer ingress links that could have sent it toward the destination]
16. Computing the Demands
- Data
  - Large, diverse, lossy
  - Collected at slightly different, overlapping time intervals across the network
  - Subject to network and operational dynamics; anomalies were explained and fixed through an understanding of these dynamics
- Algorithms, details, and anecdotes are in the paper!
17. Experience with Populating the Model
- Largely successful
  - 98% of all traffic (bytes) associated with a set of egress links
  - 95-99% of traffic consistent with an OSPF simulator
- Disambiguating outbound traffic
  - 67% of traffic associated with a single ingress link
  - 33% of traffic split across multiple ingress links (typically in the same city!)
- Inbound and transit traffic (uses input measurement)
  - Results are good
- Outbound traffic (uses input disambiguation)
  - Results are pretty good for traffic engineering applications, but there are limitations
  - To improve results, may want to measure at selected or sampled customer links, e.g., links to email, hosting, or data centers
18. Proportion of Traffic in Top Demands (Log Scale)
Zipf-like distribution: a relatively small number of heavy demands dominates.
19. Time-of-Day Effects (San Francisco)
Heavy demands at the same site may show different time-of-day behavior.
20. Discussion
- Distribution of traffic volume across demands
  - Small number of heavy demands (Zipf's Law!)
  - Optimize routing based on the heavy demands
  - Measure a small fraction of the traffic (sample)
  - Watch out for changes in load and egress links
- Time-of-day fluctuations in traffic volumes
  - U.S. business, U.S. residential, and international traffic
  - Depends on the time of day for the human end-point(s)
  - Reoptimize the routes a few times a day (three?)
- Stability?
  - No and yes
21. TM Estimation Using Link Loads
- M02 paper: TM estimation using SNMP link loads
- Available information
  - Link counts from SNMP data
  - Routing information (weights of links)
  - Additional topological information (peerings, access links)
  - Assumptions on the distribution of demands
- TM estimation ⇒ use indirect measurements (here, link loads) and solve an inference problem!
  - Y: link load measurements; A: routing matrix
  - Given Y, solve for X, where Y = AX
22. Terminology
- c = n(n-1) origin-destination (OD) pairs
- X: traffic matrix (X_j = data transmitted by OD pair j)
- Y = (y_1, y_2, ..., y_r): vector of link counts
- A: r-by-c routing matrix (a_ij = 1 if link i belongs to the path associated with OD pair j)
- Y = AX
- r << c ⇒ infinitely many solutions! (see the toy example below)
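A toy numpy example of why the system is underdetermined; the 3-node ring topology and demand values are invented for illustration:

```python
import numpy as np

# Toy ring network 1->2->3->1: n = 3 nodes, so c = n(n-1) = 6 OD pairs,
# but only r = 3 directed links are measured. a_ij = 1 if link i lies
# on the (here, unique) path of OD pair j.
# OD order: (1,2) (1,3) (2,1) (2,3) (3,1) (3,2)
A = np.array([
    [1, 1, 0, 0, 0, 1],   # link 1->2 carries (1,2), (1,3), (3,2)
    [0, 1, 1, 1, 0, 0],   # link 2->3 carries (1,3), (2,1), (2,3)
    [0, 0, 1, 0, 1, 1],   # link 3->1 carries (2,1), (3,1), (3,2)
], dtype=float)
x_true = np.array([5.0, 2.0, 1.0, 3.0, 4.0, 2.0])  # hidden OD demands
y = A @ x_true                                      # observed link counts

# r = 3 equations in c = 6 unknowns: infinitely many X satisfy Y = AX.
# lstsq returns just one of them (the minimum-norm solution).
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(A @ x_hat, y))   # True: consistent with link counts
print(np.allclose(x_hat, x_true))  # False: not the true traffic matrix
```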
23. Three Existing Techniques
- Key issue: the linear equations are under-constrained!
  - More unknowns (the N² X_ij's) than knowns (the Y_l's)
- Linear Programming (LP) approach
  - O. Goldschmidt, ISMA Workshop, 2000
- Bayesian estimation
  - C. Tebaldi and M. West, J. of the American Statistical Association, June 1998
- Expectation Maximization (EM) approach
  - J. Cao, D. Davis, S. Vander Wiel, and B. Yu, J. of the American Statistical Association, 2000
24. Linear Programming
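The LP formulation on this slide was an image that did not survive extraction. As a hedged reconstruction (the exact objective used by Goldschmidt may differ), the LP approach solves something of the form:

$$\max_{X \ge 0} \; \sum_{j} w_j X_j \quad \text{subject to} \quad AX = Y$$

i.e., among the non-negative demand vectors consistent with the link counts, pick the one maximizing a weighted sum of the OD flows.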
25. Statistical Approaches
26. Bayesian Approach
- Assumes each X_j follows a Poisson distribution with mean λ_j (independently distributed); λ_j needs to be estimated (a prior is needed)
- Conditioning on link counts: P(X, λ | Y) (spelled out below)
- Uses the Markov Chain Monte Carlo (MCMC) simulation method to get posterior distributions
- Ultimate goal: compute P(X | Y)
27. Expectation Maximization (EM)
- Assumes the X_j are independently distributed Gaussians
- Y = AX then determines the distribution of the link counts (see the sketch below)
- Requires a prior for initialization
- Incorporates multiple sets of link measurements
- Uses the EM algorithm to compute the MLE
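A sketch of what "Y = AX implies" here: if the demands are independent Gaussians whose variance scales with the mean (the power-law relation of Cao et al.; the exact form below is stated from memory and should be treated as an assumption), the link counts are Gaussian too:

$$X_j \sim \mathcal{N}\!\big(\lambda_j,\; \phi\,\lambda_j^{k}\big) \;\Rightarrow\; Y = AX \sim \mathcal{N}\!\big(A\lambda,\; \phi\,A\,\mathrm{diag}(\lambda^{k})\,A^{\mathsf T}\big)$$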
28. Comparison of Methodologies
- Considers PoP-to-PoP traffic demands
- Two different topologies (4-node, 14-node)
- Synthetic TMs (constant, Poisson, Gaussian, uniform, bimodal)
- Comparison criteria
  - Estimation errors yielded
  - Sensitivity to the prior
  - Sensitivity to distribution assumptions
29. 4-node Topology
30. 4-node Topology: Results
31. 14-node Topology
32. 14-node Topology: Results
33. Marginal Gains of Known Rows
34. New Directions
- Lessons learned
  - Model assumptions do not reflect the true nature of traffic (multimodal behavior)
  - Dependence on priors
  - Link counts alone are not sufficient (generally, more data is available to network operators)
- Proposed solutions
  - Use choice models to incorporate additional information
  - Generate a good prior solution
35. New Statement of the Problem
- X_ij = O_i · a_ij
  - O_i: outflow from node (PoP) i
  - a_ij: fraction of O_i going to PoP j
- Equivalent problem: estimating the a_ij
- Solution via Discrete Choice Models (DCM)
  - User choices
  - ISP choices
36. Choice Models
- Decision makers: PoPs
- Set of alternatives: egress PoPs
- Attributes of decision makers and alternatives: attractiveness (capacity, number of attached customers, peering links)
- Utility maximization with random utility models
37. Random Utility Model
- U_ij = V_ij + e_ij: utility of PoP i choosing to send a packet to PoP j
- Choice problem
- Deterministic component: V_ij
- Random component: e_ij; the multinomial logit (mlogit) model is used (formula below)
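Under the multinomial logit assumption (the standard form; the slide's own formula was an image), the probability that PoP i chooses egress PoP j is:

$$P(j \mid i) \;=\; \frac{e^{V_{ij}}}{\sum_{k} e^{V_{ik}}}$$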
38. Gravity Modeling
- General formula (reconstructed below)
- Simple gravity model: try to estimate the amount of traffic between edge links
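The general formula on the slide was an image. In this literature, the simple gravity model estimates the traffic from edge link i to edge link j as proportional to the product of the total volumes at the two links (our reconstruction, consistent with the tomogravity paper's notation):

$$X_{ij} \;=\; X_i^{\mathrm{in}} \cdot \frac{X_j^{\mathrm{out}}}{\sum_{k} X_k^{\mathrm{out}}}$$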
39. Results
- Two different models (Model 1: attractiveness; Model 2: attractiveness + repulsion)
40. Further Improvement: Tomogravity Model
- Two-step modeling
  - Gravity model: an initial solution is obtained using edge link load data and the ISP routing policy
  - Tomographic estimation: the initial solution is refined by applying quadratic programming to minimize the distance to the initial solution subject to the tomographic constraints (link counts); formalized below
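In symbols, with $t_g$ the gravity solution, $A$ the routing matrix, and $x$ the link counts, the refinement step solves:

$$\min_{t}\; \lVert t - t_g \rVert_2 \quad \text{subject to} \quad A\,t = x$$

(the paper actually uses a weighted variant of this norm, but this captures the structure).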
41. Highlights
- A router-to-router traffic matrix is computed instead of PoP-to-PoP
- Performance evaluation with real traffic matrices
- Tomogravity method (gravity + tomography)
42. Recall: Gravity Model
- General formula (as reconstructed under slide 38)
- Simple gravity model: try to estimate the amount of traffic between edge links
43. Generalized Gravity Model
- Four traffic categories
  - Transit
  - Outbound
  - Inbound
  - Internal
- Peers: P1, P2, ...
- Access links: a1, a2, ...
- Peering links: p1, p2, ...
44. Generalized Gravity Model
45. Tomography
- The solution should be consistent with the link counts.
46. Reducing the Computational Complexity
- Hundreds of backbone routers ⇒ tens of thousands of unknowns
- Observations
  - Some elements of the BR-to-BR matrix are empty (multiple BRs in each PoP, shortest paths)
  - Topological equivalence (reduces the number of IGP simulations)
47. Quadratic Programming
- Problem definition (see the numpy sketch below)
- Use SVD (singular value decomposition) to solve the inverse problem
- Use Iterative Proportional Fitting (IPF) to ensure non-negativity
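A minimal numpy sketch of the two steps, under simplifying assumptions: np.linalg.pinv computes the SVD-based pseudo-inverse, and the non-negativity loop below is a crude stand-in for real IPF, not the paper's exact algorithm:

```python
import numpy as np

def refine(A, x, t_gravity, n_ipf=100):
    """Refine a gravity prior t_gravity so that A @ t ~= x.

    Step 1 (SVD): the pseudo-inverse projection gives the solution of
    A t = x that is closest to t_gravity in Euclidean distance.
    Step 2 (non-negativity): clip negatives, then iteratively rescale
    demands toward the link counts (a simplified IPF-style loop).
    """
    t = t_gravity + np.linalg.pinv(A) @ (x - A @ t_gravity)  # SVD step
    for _ in range(n_ipf):
        t = np.clip(t, 0.0, None)               # enforce t >= 0
        y = A @ t
        scale = np.divide(x, y, out=np.ones_like(x), where=y > 0)
        # Rescale each demand by the average scale factor of the links
        # it traverses (simplified proportional fitting).
        per_demand = (A.T @ scale) / np.maximum(A.sum(axis=0), 1)
        t = t * per_demand
    return t
```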
48. Evaluation of Gravity Models
49. Performance of the Proposed Algorithm
50. Comparison
51. Robustness
- Measurement errors
  - x = At + e
  - e ~ N(0, σ)