Title: A fine-grained view of high-performance networking
1. A fine-grained view of high-performance networking
Stephen Casner, Cengiz Alaettinoglu, Chia-Chee Kuan
NANOG 22, May 2001
2. What this talk is about
- Measurements on a tier 1 US backbone
- jitter on test traffic
- routing protocol packet traces
- Analysis of anomalies we found
- Claim: the backbone can support delay-critical services
- jitter determines volume and latency
- some problems need to be fixed
3. What this talk is not about
- Which vendor has more or fewer bugs
- Which ISP provides better or worse service
This is a collaboration. We appreciate the assistance of the ISP and the vendor in investigating the unusual events we found.
4. State of the net
- Backbones perform very well
- For several weeks, we found 99.99% availability and jitter < 1 ms for 99.99% of packets sent
- TCP tolerates the occasional delays
- Routing strained but has adapted to growth
- Operators have a good macro view of this state
- link uptime
- ping latency
- router CPU utilization
5. Going from four 9s to five 9s
- Want tighter SLAs for VoIP, Virtual Wire, VPNs
- Need to understand what really happens
- on a fine timescale
- over long periods for rare events
6. Jitter Measurement
- Installed test hosts in POPs at SF and DC
- All services except ssh disabled for security
- Connected directly to core routers
- OC-48 links between POPs
- Continuous 1 Mb/s test traffic (pattern sketched below)
- Uniform random length over [64, 1500] bytes
- Exponential random interval (6 ms mean)
- Data collected for 15 periods of 5-7 days each
- Data retrieved over the net (takes 24 hours!)
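A minimal sketch of the test-traffic pattern described above, assuming a hypothetical receiver address. The real generator used fine-grained kernel timestamping to reach the 20 µs accuracy shown on the next slide, which a plain sleep loop cannot; this only illustrates the statistics.

```python
# Sketch of the test-traffic pattern: uniform random payload length over
# [64, 1500] bytes and exponentially distributed inter-packet gaps with a
# 6 ms mean. Mean size ~782 B * 8 bits / 0.006 s ~= 1.04 Mb/s, matching the
# 1 Mb/s average on the slide. Destination address is a placeholder.
import random
import socket
import time

DEST = ("192.0.2.1", 12345)   # hypothetical receiver address

def send_test_traffic(duration_s: float = 10.0) -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    end = time.monotonic() + duration_s
    seq = 0
    while time.monotonic() < end:
        length = random.randint(64, 1500)            # uniform packet length
        payload = seq.to_bytes(4, "big").ljust(length, b"\x00")
        sock.sendto(payload, DEST)
        seq += 1
        time.sleep(random.expovariate(1.0 / 0.006))  # exponential gap, 6 ms mean

if __name__ == "__main__":
    send_test_traffic()
```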
7. 20 µs accuracy timestamping
Packet format: [ IP | UDP | seqnum | Tx stamp | Rx stamp | data ] (packing sketch below)
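One plausible way to pack the measurement payload shown above. The field widths are assumptions; the slide only names the fields (sequence number, transmit stamp, receive stamp, data).

```python
# Assumed layout: 32-bit sequence number plus 64-bit transmit and receive
# timestamps, carried after the IP/UDP headers. The receiver overwrites the
# Rx stamp field when it logs the packet.
import struct

HEADER_FMT = "!Iqq"   # seqnum, tx timestamp (ns), rx timestamp (ns)

def pack_probe(seqnum: int, tx_ns: int, rx_ns: int = 0, data: bytes = b"") -> bytes:
    """Build the UDP payload; the rx timestamp is filled in by the receiver."""
    return struct.pack(HEADER_FMT, seqnum, tx_ns, rx_ns) + data

def unpack_probe(payload: bytes):
    seqnum, tx_ns, rx_ns = struct.unpack_from(HEADER_FMT, payload)
    return seqnum, tx_ns, rx_ns, payload[struct.calcsize(HEADER_FMT):]
```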
8. Offline Jitter Analysis
- Threshold filter on interarrival (relative) jitter for a quick full-week overview
- Scan each hour for packet loss and delay shifts
- For interesting hours, graph absolute jitter and zoom in
- NTP not used because adjustments glitch the clock
- Jitter analysis tool removes effects of clock skew and length variation (see the sketch below)
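A rough sketch of the analysis steps just listed, not the authors' tool. It assumes clock skew can be modeled as a linear trend and that the length effect scales with an assumed OC-48 serialization rate.

```python
# Compute per-packet (rx - tx) deltas, strip a linear clock-skew trend with a
# least-squares fit, and subtract a per-byte serialization term so length
# variation does not masquerade as jitter. Field names and the link-rate
# constant are assumptions.
import numpy as np

LINK_BPS = 2.488e9   # assumed OC-48 bottleneck rate for the length correction

def absolute_jitter(tx_s, rx_s, length_bytes):
    """All inputs are 1-D arrays indexed by sequence number; times in seconds."""
    tx = np.asarray(tx_s, dtype=float)
    rx = np.asarray(rx_s, dtype=float)
    delta = rx - tx                                   # offset + skew + queueing
    skew = np.polyfit(tx, delta, 1)                   # linear clock-skew model
    detrended = delta - np.polyval(skew, tx)
    serialization = 8.0 * np.asarray(length_bytes) / LINK_BPS
    return detrended - serialization                  # jitter relative to trend

def interarrival_jitter(jitter):
    """Relative (packet-to-packet) jitter used for the quick full-week scan."""
    return np.abs(np.diff(jitter))
```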
9. 99.99% clean
10. 99.99% clean
11. Better ARP implementation
- Do not flush the ARP cache entry when its timer expires (refresh logic sketched below)
- Send ARP request
- Continue using the ARP cache entry
- If no ARP response after N retries, flush entry
- Workaround: permanent ARP entries, or gratuitous ARP responses from the host, if accepted
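An illustrative sketch of the refresh behavior argued for above, not any vendor's implementation; the retry count N is an assumed value.

```python
# Keep resolving with the existing entry while re-ARPing in the background,
# and only flush after N unanswered retries.
MAX_RETRIES = 3   # assumed value of N

class ArpEntry:
    def __init__(self, mac):
        self.mac = mac
        self.retries = 0

class ArpCache:
    def __init__(self):
        self.entries = {}            # ip -> ArpEntry

    def on_timer_expired(self, ip, send_arp_request):
        entry = self.entries.get(ip)
        if entry is None:
            return
        if entry.retries >= MAX_RETRIES:
            del self.entries[ip]     # give up only after N silent retries
        else:
            entry.retries += 1
            send_arp_request(ip)     # refresh in the background;
                                     # the entry stays usable meanwhile

    def on_arp_reply(self, ip, mac):
        self.entries[ip] = ArpEntry(mac)   # a reply resets the retry count
```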
12. Packets with negative delay?
13. Jitter shift due to rerouting (figure annotations: 7.6 ms, 40 sec)
14. Constant baseline sawtooth (figure annotations: 500 µs, 2.3 seconds)
15. Mostly smooth, except...
16. A very large delay (figure annotations: 7 seconds!, 9 hours)
17. Rare but significant events
18. Outage followed by flood
19. Severe jitter and misordering
20. Transmit view of blender event
21. Data rate of blender event (figure annotations: 1172 packets lost, 1 Mb/s avg rate, 25 Mb/s, 14 seconds)
22. Slope shows deceleration
23. Slope shows deceleration
24. Monitor routing along with jitter
Diagram: test host (labels: tg, sk) attached via gig-ether to IP backbone routers (R); tcpdump captures IS-IS hellos into a packet trace file; the test host is a passive IS-IS peer and sends no routes; traceroute runs every 5 s. (Monitoring loop sketched below.)
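A rough sketch of the passive monitoring loop from the diagram above, assuming placeholder interface, target, and file names: capture IS-IS packets with tcpdump while running traceroute toward the far-end test host every 5 seconds.

```python
# Requires root for tcpdump. The "isis" pcap filter selects IS-IS packets;
# interface name, target address, and file names are placeholders, not values
# from the experiment.
import subprocess
import time

IFACE = "eth0"
TARGET = "192.0.2.2"

def monitor(duration_s: float = 3600.0) -> None:
    capture = subprocess.Popen(
        ["tcpdump", "-i", IFACE, "-w", "isis-trace.pcap", "isis"]
    )
    try:
        end = time.monotonic() + duration_s
        with open("traceroute.log", "a") as log:
            while time.monotonic() < end:
                out = subprocess.run(
                    ["traceroute", "-n", TARGET],
                    capture_output=True, text=True
                ).stdout
                log.write(f"{time.time():.3f}\n{out}\n")
                time.sleep(5)        # traceroute every 5 s, as on the slide
    finally:
        capture.terminate()

if __name__ == "__main__":
    monitor()
```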
25. A recent micro-blender
26. Routing loops cause blenders (figure annotations: TTL 16, TTL 30, TTL 60)
27. Why do loops happen?
- Link-State Routing Protocols 101
- Detect topology changes
- Flood link-state packets
- SPF algorithm to compute routes (sketched below)
- Route databases are consistent only within the propagation time
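To make the loop argument concrete, here is a minimal SPF (Dijkstra) sketch over a link-state database, with a hypothetical four-router topology showing how two routers holding different database versions can compute next hops that point at each other until flooding catches up.

```python
# Every router runs SPF on its own copy of the link-state database; divergent
# copies can yield mutually pointing next hops, i.e. a transient forwarding loop.
import heapq

def spf(lsdb, source):
    """lsdb: {router: {neighbor: cost}}. Returns {dest: next_hop from source}."""
    dist = {source: 0}
    prev = {source: None}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist[node]:
            continue                     # stale heap entry
        for nbr, link_cost in lsdb.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    next_hop = {}
    for dest in dist:                    # walk back to find the first hop
        if dest == source:
            continue
        hop = dest
        while prev[hop] != source:
            hop = prev[hop]
        next_hop[dest] = hop
    return next_hop

# Hypothetical topology: before B's "D link down" LSP reaches A, A still sends
# D-bound traffic to B, while B already routes D-bound traffic back via A.
old = {"A": {"B": 1, "C": 5}, "B": {"A": 1, "D": 1}, "C": {"A": 5, "D": 1}, "D": {"B": 1, "C": 1}}
new = {"A": {"B": 1, "C": 5}, "B": {"A": 1},         "C": {"A": 5, "D": 1}, "D": {"C": 1}}
print(spf(old, "A")["D"])   # A (stale view) forwards toward D via B
print(spf(new, "B")["D"])   # B (fresh view) forwards toward D via A -> loop
```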
28. Excess churn on lifetime 0 LSPs (graph: observed churn vs. genuine churn, averaged over 100 sec)
29. Long LSP propagation times
30. Explanation
- Route databases are not in sync because:
- Churn rate is high → many LSPs to flood
- Average rate 6.6 / second (as seen at test host)
- Peak rate 10 / second (as seen at test host)
- LSP rate control limits flooding
- 4 LSPs / second on each backbone link (see the arithmetic sketch after this list)
- SPF updates may also be delayed by rate limits
- Any topology change can result in a loop
- DC host link appears down due to LSP switching
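A back-of-the-envelope sketch of the mismatch described above, assuming a single paced link and sustained average churn: when LSPs arrive faster than the pacing rate, the flooding backlog and the extra propagation delay grow with time. It ignores bursts, duplicate suppression, and multi-path flooding.

```python
# Rates taken from the slide: ~6.6 LSPs/s average churn vs. a 4 LSPs/s
# per-link pacing limit.
ARRIVAL_RATE = 6.6    # LSPs/s observed at the test host (average)
PACING_RATE = 4.0     # LSPs/s allowed per backbone link

def backlog_after(seconds: float) -> float:
    """LSPs queued behind the pacing limit under sustained average churn."""
    return max(0.0, (ARRIVAL_RATE - PACING_RATE) * seconds)

def extra_propagation_delay(seconds_of_churn: float) -> float:
    """Added flooding delay (s) for an LSP arriving after that much churn."""
    return backlog_after(seconds_of_churn) / PACING_RATE

# e.g. after one minute of average churn: 156 LSPs queued, ~39 s of extra delay
print(backlog_after(60), extra_propagation_delay(60))
```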
31. Routing loop on another path?
32. A recent week (very boring)
Jitter measurement summary for the week: 69 million packets transmitted, zero packets lost, 100% of packets with jitter < 700 µs
33. Experiment conclusions
casner@packetdesign.com
- Backbone baseline jitter is < 1 ms
- Congestion is not the problem we need to solve!
- Many events > 1 ms can be eliminated
- ISPs building with 1/10G Ethernet should be concerned about ARP cache timeout
- ISPs need to revisit routing timer settings
- Operational emergencies led to high timer settings
- Software changes may have eliminated the need
- Protocol designers: more robust timers
- See talk from NANOG 20; next talk at a future NANOG