Title: Data Center Transport Mechanisms
1- Data Center Transport Mechanisms
Balaji Prabhakar Departments of Electrical
Engineering and Computer Science Stanford
University
Joint work with Mohammad Alizadeh, Berk Atikoglu
and Abdul Kabbani, Stanford University Ashvin
Lakshmikantha, Broadcom Rong Pan, Cisco
Systems Mick Seaman, Chair, Security Group
Ex-Chair, Interworking Group, IEEE 802.1
2What are Data Centers?
- Large enterprise networks convergence of
- High speed LANs 10, 40, 100 Gbps Ethernet
- Storage networks Fibre Channel, Infiniband
- Related idea Cloud Computing
- Outgrowth of high-performance computing networks
with integrated storage and server virtualization
support - Driven by
- Economics One network, not many
- Low capex and opex
- Economics Server utilization
- Resource pooling, virtualization, server
migration, high-speed interconnect fabrics - Savings in power consumption
- Unified management of network of servers allows
server and job scheduling - Security
- Storage and processing of data within a single
autonomous domain
3Overview of a Data Center
N-Tiers of Servers (Web, App, Database)
Front End Networks (Security Load Balancing)
- Large networks of servers, storage arrays,
connected by a high-performance network - Origins
- Clusters of web servers
- Web hosting
- High performance computing Cloud computing
- Servers, storage
DataStorage
FC
Firewall
VPN
Disk and Tape
IDS
IP
Load Balancing
- Key drivers
- Convergence of Layer 2 neworks
- Swtiched Ethernet (LANs) and Storage Area
Networks (SANs) FCoE - Server virtualization
- VMs, VM migration
NAS File Storage
IB
Data Center
4Rest of the Talk
- A brief overview of the relevant congestion
control background - A description of the QCN algorithm and its
performance - The Averaging Principle A control-theoretic idea
underlying the QCN and BIC-TCP algorithms which
stabilizes them when loop delays increase very
useful for operating high-speed links with
shallow buffers---the situation in 10 Gbps
Ethernets
5Why do Congestion Control?
- Congestion
- Transient Due to random fluctuations in packet
arrival rate - Handled by buffering packets, pausing links (IEEE
802.1bb) - Sustained When link bandwidth suddenly drops or
when new flows arrive - Switches signal sources to reduce their sending
rate IEEE 802.1Qau - Congestion control algorithms aim to
- Deliver high throughput, maintain low
latencies/backlogs, be fair to all flows, be
simple to implement and easy to deploy - Congestion control in the Internet Rich history
of algorithm development, control-theoretic
analysis, deployment - Jacobson, Floyd et al, Kelly et al, Low et al,
Srikant et al, Misra et al, Katabi, Paganini, et
al
6A main issue Stability
- Stability of control loop
- Refers to the non-oscillatory behavior of
congestion control loops - If the switch buffers are short, oscillating
queues can overflow (and drop packets) or
underflow (lose utilization) - In either case, links cannot be fully utilized,
throughput is lost, flow transfers take longer
7TCP--RED A basic control loop
TCP Slow start Congestion avoidance Congestio
n avoidance AIMD No loss increase window by
1 Pkt loss cut window by half
8TCP Dynamics
Cwnd
Congestion Window Rate
Cwnd/2
Time
Congestion message recd
9TCP--RED Analytical model
10TCP--RED Analytical model
Users
Network
W window size RTT round trip time C link
capacity q queue length qa ave queue length
p drop probability
By V. Misra, W. Dong and D. Towsley at SIGCOMM
2000 Fluid model concept originated by F. Kelly,
A. Maullo and D. Tan at Jour. Oper. Res. Society,
1998
11TCP--RED Stability analysis
- Given the differential equations, in principle,
one can figure out whether the TCP--RED control
loop is stable - However, the differential equations are very
complicated - 3rd or 4th order, nonlinear, with delays
- There is no general theory, specific case
treatments exist - Linearize and analyze
- Linearize equations around the (unique) operating
point - Analyze resultant linear, delay-differential
equations using Nyquist or Bode theory - End result
- Design stable control loops
- Determine stability conditions (RTT limits,
number of users, etc) - Obtain control loop parameters gains, drop
functions,
12Instability of TCP--RED
- As the bandwidth-delay-product increases, the
TCP--RED control loop becomes unstable - Parameters 50 sources, link capacity 9000
pkts/sec, TCP--RED - Source S. Low et. al. Infocom 2002
13Feedback Stabilization
- Many congestion control algorithms developed for
high bandwidth-delay product environments - The two main types of feedback stabilization used
are -
- Determine lags (round trip times), apply the
correct gains for the loop to be stable (e.g.
FAST, XCP, RCP, HS-TCP) - Include higher order queue derivatives in the
congestion information fed back to the source
(e.g. REM/PI, XCP, RCP) - We shall see that BIC-TCP and QCN use a different
method which we call the Averaging Principle - BIC (or Binary Increase) TCP is due to Rhee et al
- It is the default congestion control algorithm in
Linux - No control theoretic analysis, until now
14- Quantized Congestion Notification (QCN)
- Congestion control for Ethernet
15Ethernet vs. the Internet
- Some significant differences
- No per-packet acks in Ethernet, unlike in the
Internet - Not possible to know round trip time or lags!
- So congestion must be signaled to the source by
switches - Algorithm not automatically self-clocked (like
TCP) - Links can be paused i.e. packets may not be
dropped - No sequence numbering of L2 packets
- Sources do not start transmission gently (like
TCP slow-start) they can potentially come on at
the full line rate of 10Gbps - Ethernet switch buffers are much smaller than
router buffers (100s of KBs vs 100s of MBs) - Most importantly, algorithm should be simple
enough to be implemented completely in hardware - Note QCN has Internet relatives---BIC-TCP at the
source and the REM/PI controllers
16Data Center Ethernet BridgingIEEE 802.1Qau
Standard
- A summary of standards effort
- Everybody should do it at least once
- Like proving limit theorems in Probability
- But, in this case, no more than once!?
- Intense, fun activity
- Broadcom, Brocade, Cisco, Fujitsu, HP, Huawei,
IBM, Intel, NEC, Nortel, - Conference calls every Thursday morning
- Meeting every 6 weeks (Interim and Plenary)
- Real-time engineering Tear and re-build
- Our algorithm was the 4th to be proposed
- It underwent 56 revisions because of being
subjected to constraints - Draft of standard 9 revs
-
17QCN Source Dynamics
BIC-TCP and QCN
18Stability AIMD vs QCN
AIMD
QCN
RTT 50 µs
RTT 300 µs
19Experiment Simulation Parameters
- Baseline scenario
- Output-queued switch
- OG hotspot hotspot severity 0.2Gbps, hotspot
duration 3.5sec - Vary RTT 100us to 1000us
0.95 G
0.95 G
NIC 1
0.2 G
NIC 2
201 source, RTT 100µs
Hardware
OMNET
211 source, RTT 1ms
Hardware
OMNET
228 sources, RTT 1ms
Hardware
OMNET
23Fluid Model for QCN
P F(Fb)
- Assume N flows pass through a single queue at a
switch. State variables are TRi(t), CRi(t), q(t),
p(t).
10
Fb
63
24Accuracy Equations vs ns2 sims
25QCN Notes
- The algorithm has been extensively tested in
deployment scenarios of interest - Esp. interoperability with link-level PAUSE and
TCP - All presentations and p-code are available at the
IEEE 802.1 website - http//www.ieee802.org/1/pages/dcbridges.html
- http//www.ieee802.org/1/files/public/docs2008/au-
rong-qcn-serial-haipseudo-code20rev2.0.pdf - The theoretical development is interesting, but
most notably because QCN and BIC-TCP display
strong stability in the face of increasing lags,
or, equivalently in high bandwidth-delay product
networks - While attempting to understand the unusually good
performance of these schemes, we uncovered a
method for improving the stability of any
congestion control scheme
26The Averaging Principle
27The Averaging Principle (AP)
- A source in a congestion control loop is
instructed by the network to decrease or increase
its sending rate (randomly) periodically
- AP a source obeys the network whenever
instructed to change rate, and then voluntarily
performs averaging as below
TR Target Rate CR Current Rate
28A Generic Control Example
- As an example, we consider the plant transfer
function - P(s) (s1)/(s31.6s20.8
s0.6)
29Step ResponseBasic AP, No Delay
30Step ResponseBasic AP, Delay 8 seconds
31Step Response Two-step AP, Delay 14 seconds
32Step Response Two-step AP, Delay 25 seconds
Two-step AP is even more stable than Basic AP
33Applying AP to RCP (Rate Control Protocol)RCP
due to Dukkipatti and McKeown
- Basic idea Network computes max-min flow rates
for each flow. - Rate computed every 10 msecs
- Flows send at their advertised rate
- Apply the AP to RCP
34AP-RCP Stability
RTT 60 msec
RTT 65 msec
35AP-RCP Stability contd
RTT 120 msec
RTT 130 msec
36AP-RCP Stability contd
RTT 230 msec
RTT 240 msec
37Understanding the AP
- As mentioned earlier, the two major flavors of
feedback compensation are - Determine lags, chose appropriate gains
- Feedback higher derivatives of state
- We prove that the AP is sense equivalent to both
of the above! - This is great because we dont need to change
network routers and switches - And the AP is really very easy to apply no
lag-dependent optimizations of gain parameters
needed
38AP Equivalence
Source does AP
Fb
Regular source
0.5 Fb 0.25 T dFb/dt
- Systems 1 and 2 are discrete-time models for an
AP enabled source, and a regular source
respectively. - Theorem Systems 1 and 2 are algebraically
equivalent. That is, given identical input
sequences, they produce identical output
sequences.
39AP vs Equivalent PD ControllerNo Delay
40AP vs PDDelay 8 seconds
41Conclusions
- We have seen the background, development and
analysis of a congestion control scheme for the
IEEE 802.1 Ethernet standard - The QCN algorithm is
- More stable with respect to control loop delays
- Requires much smaller buffers than TCP
- Easy to build in hardware
- The Averaging Principle is interesting were
exploring its use in nonlinear control systems