Title: Interview talk at various universities and labs
1. Flowlet Switching
Srikanth Kandula, Shan Sinha, Dina Katabi
2. ISPs Want to Split Traffic Across Multiple Paths
[Figure: traffic split 70 / 30 across two paths]
- Load balancing to remove hot spots
- Rebalance traffic when unpredictable events occur
(Outages, DoS, BGP reroutes, Flash Crowds, ...)
[Figure: unpredictable traffic arrives and traffic is rebalanced; the split changes from 70 / 30 to 30 / 70]
6.
- Much research on balancing and rebalancing load
- But implementation is hard, particularly with dynamic ratios
- Either sacrifice accuracy or reorder TCP packets
Problem
- Given the desired split ratios (possibly dynamic)
- Split traffic accurately, at the edge router, without reordering TCP packets
8. Existing Scheme 1: Packet-Based Splitting
- Assign packets to paths proportional to the desired ratios
- Reorders TCP packets, causing bad throughput
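A minimal sketch of packet-based splitting follows. The slides do not say how packets are matched to the ratios, so the deficit rule used here (send each packet on the path furthest below its desired byte share) is an assumption, and the name `split_packets` and the packet sizes are hypothetical.

```python
def split_packets(packet_sizes, ratios):
    """Assign each packet to the path furthest below its desired byte share."""
    sent = [0] * len(ratios)      # bytes sent on each path so far
    assignment = []               # path index chosen for each packet
    total = 0
    for size in packet_sizes:
        total += size
        # Deficit: bytes the path should have carried minus bytes it carried.
        deficits = [r * total - s for r, s in zip(ratios, sent)]
        path = max(range(len(ratios)), key=deficits.__getitem__)
        sent[path] += size
        assignment.append(path)
    return assignment, sent


# Ten equal-sized packets (hypothetical), desired split 70 / 30.
assignment, sent = split_packets([1500] * 10, [0.7, 0.3])
shares = [s / sum(sent) for s in sent]
```

The byte shares track the desired ratios closely, but consecutive packets of one TCP flow can land on different paths, which is where the reordering comes from.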
9. Existing Scheme 2: Flow-Based Splitting
- Assign TCP flows to each path proportional to the desired ratio
- Flows are not all equal: elephants and mice
- So, estimate the rate of each TCP flow
- But rates change with time
- Too complex
- Very inaccurate if desired ratios change
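The effect of elephants and mice can be illustrated with a small sketch. It assumes whole flows are assigned so that the number of flows per path matches the ratios; the function name `split_flows` and the flow sizes are made up for illustration.

```python
def split_flows(flow_bytes, ratios):
    """Assign whole flows so the number of flows per path matches the ratios."""
    flows_on = [0] * len(ratios)   # flows assigned to each path
    bytes_on = [0] * len(ratios)   # bytes those flows carry
    for n, size in enumerate(flow_bytes, start=1):
        # Pick the path furthest below its desired share of flows.
        path = max(range(len(ratios)),
                   key=lambda i: ratios[i] * n - flows_on[i])
        flows_on[path] += 1
        bytes_on[path] += size
    return flows_on, bytes_on


# One elephant followed by nine mice (hypothetical sizes in bytes).
flows_on, bytes_on = split_flows([1_000_000] + [1_000] * 9, [0.5, 0.5])
byte_shares = [b / sum(bytes_on) for b in bytes_on]
```

Each path receives five flows, yet the path holding the elephant carries almost all of the bytes, so the byte split is far from the desired 50 / 50.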
10. How to Split Traffic?
- Packet-Based
  - Accurate
  - Reorders TCP packets
  - Easily tracks dynamic ratios
- Flow-Based
  - Inaccurate
  - No packet reordering
  - Hard to track if ratios change
Can we combine the best of the two approaches?
11. This Talk
- Show how to send a single TCP flow down multiple paths without reordering
- Accurately split traffic even when desired ratios are dynamic
- Easy to implement
12. Flowlet Switching
[Figure: paths 1 and 2]
- If the previous packet from the flow has already left the merging point, the flow can be reassigned to a different path
13. Flowlet Switching
Given δ > D2 - D1, where D2 - D1 is the delay difference between the paths
Flowlets are bursts from the same flow separated by an idle time of at least δ; they can be switched independently!
[Figure label: Idle δ]
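The definition above can be sketched as a small routine that cuts one flow's packet arrival times into flowlets. The function name `split_into_flowlets`, the parameter `delta` (the minimum idle time between flowlets) and the arrival times are hypothetical.

```python
def split_into_flowlets(timestamps, delta):
    """Group one flow's packet arrival times into flowlets.

    A new flowlet starts whenever the gap since the previous packet
    is at least `delta` (same time unit as the timestamps).
    """
    flowlets = []
    for t in timestamps:
        if flowlets and t - flowlets[-1][-1] < delta:
            flowlets[-1].append(t)   # still inside the current burst
        else:
            flowlets.append([t])     # idle gap >= delta: new flowlet
    return flowlets


# Hypothetical arrival times in milliseconds for a single flow.
arrivals_ms = [0, 1, 2, 80, 81, 200]
flowlets = split_into_flowlets(arrivals_ms, delta=50)
# flowlets == [[0, 1, 2], [80, 81], [200]]
```

With a 50 ms separation, the six packets fall into three flowlets, each of which could be sent on a different path.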
15. Implementing Flowlet Switching is Simple
[Figure: hash of (SRCip, DSTip, SRCPort, DSTPort)]
- Router at the split point hashes the packet header
- If (Now - Last_Seen) > δ, the flow can change path
- Reassign path proportionally to the desired split ratios
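The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the class name `FlowletSwitch`, the use of CRC32 as the hash, the table size `num_slots`, and the rule of picking the path furthest below its desired byte share are all hypothetical choices.

```python
import zlib


class FlowletSwitch:
    """Sketch of a flowlet splitter backed by a fixed-size hash table."""

    def __init__(self, ratios, delta, num_slots=1024):
        self.ratios = ratios                   # desired split ratios
        self.delta = delta                     # flowlet timeout, in seconds
        self.num_slots = num_slots             # assumed table size
        self.last_seen = [None] * num_slots    # last packet time per slot
        self.path = [0] * num_slots            # current path per slot
        self.sent = [0] * len(ratios)          # bytes sent per path

    def _slot(self, src_ip, dst_ip, src_port, dst_port):
        # Hash the header fields into a table index (CRC32 is an assumption).
        key = f"{src_ip},{dst_ip},{src_port},{dst_port}".encode()
        return zlib.crc32(key) % self.num_slots

    def _most_underused_path(self):
        # Path whose byte count is furthest below its desired share.
        total = sum(self.sent)
        deficits = [r * total - s for r, s in zip(self.ratios, self.sent)]
        return max(range(len(deficits)), key=deficits.__getitem__)

    def route(self, now, src_ip, dst_ip, src_port, dst_port, size):
        slot = self._slot(src_ip, dst_ip, src_port, dst_port)
        last = self.last_seen[slot]
        # New flowlet: first packet, or idle for longer than delta.
        if last is None or now - last > self.delta:
            self.path[slot] = self._most_underused_path()
        self.last_seen[slot] = now
        self.sent[self.path[slot]] += size
        return self.path[slot]


switch = FlowletSwitch(ratios=[0.5, 0.5], delta=0.050)
flow = ("10.0.0.1", "10.0.0.2", 1234, 80)     # hypothetical flow
p1 = switch.route(0.000, *flow, size=1500)    # first packet picks a path
p2 = switch.route(0.001, *flow, size=1500)    # same burst keeps the path
p3 = switch.route(0.200, *flow, size=1500)    # idle > delta: may switch
```

Because the table is indexed by the hash rather than keyed by the full header, flows that collide in one slot share its state; in this sketch they would simply be treated as a single flow.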
16. Does it Really Work?
- Traces collected on a peering link, an edge link and two core links
- Split Vectors (3 paths)
  - Static (.3, .3, .4)
  - Dynamic sinusoidal with amplitude 60, period 20min [Akella04, Chuah02]
17. Is Flowlet Switching Accurate?
[Figure: error]
Flowlet switching is much more accurate than flow-based switching
19. Can Do Flowlet Switching without Per-Flow State
[Figure: average and maximum over many traces; x-axis: hash table entries (4, 16, 64, 256, 1024, 2048, 4096, 8192)]
Active flows: 50,000, but the router maintains a hash table of < 1000 entries (5KB).
20. Understanding Flowlets
21. But Where Do Flowlets Come From?
- Can't be just timeouts or short flows: most of the bytes are in the elephants
- Why can a large flow be broken into many small flowlets?
22. Flowlets exist because TCP is bursty at RTT and sub-RTT scales
- Well-known that TCP usually sends a window in one or a few bursts and waits for acks [Zhang91, Zhang03, Jiang04]
- Some reasons
  - Slow-start
  - Ack compression
  - Window is much smaller than delay-BW product
23. Flowlets exist because TCP is Bursty
Most flowlets have inter-arrivals less than an RTT, so most flowlets are sub-windows
24. Why is Flowlet Switching Accurate?
- 80% of bytes are in flowlets smaller than 10KB
- Assigning a flowlet to a path isn't a long commitment
25. Why Can Flowlets Track Dynamics?
Arrival rate of flows and flowlets (/sec)
Trace     Flows      Flowlets
Edge      143.16     1454.98
Peering   611.95     8661.43
Core1     3784.10    35287.04
Core2     111.33     2848.76
An order of magnitude more opportunities to rebalance!
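The "order of magnitude" claim can be checked with a short calculation on the arrival rates from the slide above. The sketch assumes the smaller number in each pair is the flow arrival rate and the larger one the flowlet arrival rate.

```python
# Arrival rates per second from the slide: (flows, flowlets).
# Assumption: the smaller value in each pair is the flow rate.
rates = {
    "Edge": (143.16, 1454.98),
    "Peering": (611.95, 8661.43),
    "Core1": (3784.10, 35287.04),
    "Core2": (111.33, 2848.76),
}

# How many flowlets arrive for every flow that arrives.
ratios = {trace: flowlets / flows
          for trace, (flows, flowlets) in rates.items()}

for trace, ratio in ratios.items():
    print(f"{trace}: {ratio:.1f}x more flowlet arrivals than flow arrivals")
```

Under that assumption, the ratios range from roughly 9x (Core1) to roughly 26x (Core2).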
26. Why Doesn't Flowlet Switching Need Per-Flow State?
[Figure: Flow 1, Flow 2 and Flow 3 over time, with the resulting active flowlets]
[Per-trace comparison (Edge, Peering, Core1, Core2); values not recovered]
The number of active flowlets is 2 orders of magnitude smaller than the number of flows, so a very small hash table suffices
32. Why is Flowlet Switching Possible?
- Why can a large flow be broken into many small flowlets?
  - TCP burstiness at small time scales
- Why is flowlet switching accurate?
  - Small commitment, many more chances to rebalance
- Why does flowlet switching not need per-flow state?
  - Few simultaneously active flowlets
33. Configuring Flowlet Switching
Flowlet separation > delay difference
But, how to find the delay difference?
- For our traces, which are a diverse collection of traffic within the continental US, 50ms is a good and safe choice!
- Our procedure is a constructive way to find δ
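The requirement that the flowlet separation exceed the delay difference can be checked with a short arrival-time calculation. This is a sketch under simplifying assumptions (fixed path delays, no queueing variation); the function name `reorders` and the delay values are hypothetical.

```python
def reorders(gap, d_old, d_new):
    """Return True if a packet overtakes its predecessor.

    The predecessor is sent at time 0 on a path with delay `d_old`;
    the next packet is sent `gap` later on a path with delay `d_new`.
    All values use the same time unit.
    """
    arrival_prev = 0 + d_old
    arrival_next = gap + d_new
    return arrival_next < arrival_prev


# Hypothetical path delays in milliseconds: delay difference is 40 ms.
d1, d2 = 20, 60
safe = reorders(gap=50, d_old=d2, d_new=d1)     # gap above 40 ms
unsafe = reorders(gap=10, d_old=d2, d_new=d1)   # gap below 40 ms
```

With a 50 ms gap the later packet arrives after its predecessor even when it moves from the slow path to the fast one; with a 10 ms gap it would arrive first.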
34. Flowlet Separation of 50ms is Good
Any flowlet timeout in [50, 100] ms yields highly accurate splits
35. Flowlet Separation of 50ms is Safe
[Figure: axis ticks from 0 to 1; plot not recovered]
Even if the delay difference >> 50ms, the probability of reordering is negligible compared to the drop rate in the Internet (about 1%)
36. Conclusion
- Harness TCP burstiness to split traffic at a finer resolution than a flow without reordering
- Flowlet Switching
- Splitting errors are a few percent
- Reordering probability is negligible compared to drop probability in the Internet
- Easy to implement
- Enable ISPs to do dynamic load balancing
37. More Information at http://nms.lcs.mit.edu/dina/texcp.html
Questions?
38. For 50ms, very few retransmissions are triggered even when the delay difference is severely underestimated
[Figure: recovered axis labels: Actual Delay Diff, Flowlet Timeout]
39. Flowlet Switching has Negligible Overhead
[Figure: error vs. hash table size (4, 16, 64, 256, 1024, 2048, 4096, 8192)]
Very, very cheap. Edge routers maintain a hash table of 2^10 entries (< 10KB).
40. Any flowlet timeout in [50, 100] ms yields highly accurate splits
[Figure: error vs. flowlet separation (msec)]
41. Flowlet Switching Tracks Dynamic Splits
For clarity, we show only one path!