Data Center Transport Mechanisms - PowerPoint PPT Presentation

1
  • Data Center Transport Mechanisms

Balaji Prabhakar
Departments of Electrical Engineering and Computer Science, Stanford University
Joint work with:
Mohammad Alizadeh, Berk Atikoglu and Abdul Kabbani, Stanford University
Ashvin Lakshmikantha, Broadcom
Rong Pan, Cisco Systems
Mick Seaman, Chair, Security Group, and Ex-Chair, Interworking Group, IEEE 802.1
2
What are Data Centers?
  • Large enterprise networks: a convergence of
  • High-speed LANs: 10, 40, 100 Gbps Ethernet
  • Storage networks: Fibre Channel, InfiniBand
  • Related idea: cloud computing
  • Outgrowth of high-performance computing networks
    with integrated storage and server virtualization
    support
  • Driven by
  • Economics: one network, not many
  • Low capex and opex
  • Economics: server utilization
  • Resource pooling, virtualization, server
    migration, high-speed interconnect fabrics
  • Savings in power consumption
  • Unified management of the network of servers allows
    server and job scheduling
  • Security
  • Storage and processing of data within a single
    autonomous domain

3
Overview of a Data Center
[Figure: data center diagram showing N tiers of servers (Web, App, Database); front-end networks (security, load balancing) with firewall, VPN, IDS and IP load balancing; data storage over FC and IB, including NAS file storage and disk and tape]
  • Large networks of servers and storage arrays,
    connected by a high-performance network
  • Origins
  • Clusters of web servers
  • Web hosting
  • High-performance computing, cloud computing
  • Servers, storage
  • Key drivers
  • Convergence of Layer 2 networks
  • Switched Ethernet (LANs) and Storage Area
    Networks (SANs): FCoE
  • Server virtualization
  • VMs, VM migration
4
Rest of the Talk
  • A brief overview of the relevant congestion
    control background
  • A description of the QCN algorithm and its
    performance
  • The Averaging Principle: a control-theoretic idea
    underlying the QCN and BIC-TCP algorithms, which
    stabilizes them when loop delays increase; this is
    very useful for operating high-speed links with
    shallow buffers, the situation in 10 Gbps
    Ethernets

5
Why do Congestion Control?
  • Congestion
  • Transient: due to random fluctuations in the packet
    arrival rate
  • Handled by buffering packets and pausing links
    (IEEE 802.1Qbb)
  • Sustained: when link bandwidth suddenly drops or
    when new flows arrive
  • Switches signal sources to reduce their sending
    rate (IEEE 802.1Qau)
  • Congestion control algorithms aim to
  • Deliver high throughput, maintain low
    latencies/backlogs, be fair to all flows, and be
    simple to implement and easy to deploy
  • Congestion control in the Internet: a rich history
    of algorithm development, control-theoretic
    analysis, and deployment
  • Jacobson, Floyd et al., Kelly et al., Low et al.,
    Srikant et al., Misra et al., Katabi, Paganini, et
    al.

6
A Main Issue: Stability
  • Stability of control loop
  • Refers to the non-oscillatory behavior of
    congestion control loops
  • If the switch buffers are short, oscillating
    queues can overflow (and drop packets) or
    underflow (lose utilization)
  • In either case, links cannot be fully utilized,
    throughput is lost, flow transfers take longer

7
TCP-RED: A Basic Control Loop
[Figure: TCP window over time: slow start, then congestion avoidance]
Congestion avoidance (AIMD): no loss: increase the window by 1 packet per RTT; packet loss: cut the window by half
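The AIMD rule above can be sketched as a per-RTT window update. This is a simplified model that assumes the window is counted in packets and that losses arrive in a hypothetical, caller-supplied set of RTT rounds. Slow start, timeouts and byte counting are ignored, and the function names are illustrative.

```python
def aimd_update(cwnd: float, loss: bool, increase: float = 1.0,
                decrease: float = 0.5) -> float:
    """Return the window (packets) after one RTT of AIMD congestion avoidance."""
    if loss:
        # Multiplicative decrease: cut the window by half on a packet loss,
        # never going below one packet.
        return max(1.0, cwnd * decrease)
    # Additive increase: grow by one packet per loss-free RTT.
    return cwnd + increase


def sawtooth(initial: float, loss_rounds: set, rounds: int) -> list:
    """Window trace over `rounds` RTTs; `loss_rounds` lists the RTTs with a loss."""
    trace = [initial]
    for r in range(rounds):
        trace.append(aimd_update(trace[-1], r in loss_rounds))
    return trace


print(sawtooth(10.0, {3, 7}, 9))
# [10.0, 11.0, 12.0, 13.0, 6.5, 7.5, 8.5, 9.5, 4.75, 5.75]
```

The printed trace is the sawtooth pictured on the TCP Dynamics slide: a linear climb, then a halving at each loss.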
8
TCP Dynamics
[Figure: congestion window (rate) vs. time; the window Cwnd drops to Cwnd/2 each time a congestion message is received]
9
TCP-RED: Analytical Model
10
TCP-RED: Analytical Model
[Figure: feedback loop between Users (TCP sources) and the Network (RED queue)]
Variables: W = window size, RTT = round-trip time, C = link capacity, q = queue length, q_a = average queue length, p = drop probability
By V. Misra, W. Gong and D. Towsley, SIGCOMM 2000. The fluid model concept originated with F. Kelly, A. Maulloo and D. Tan, Journal of the Operational Research Society, 1998.
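The model's equations did not survive in this transcript. The form usually quoted from the Misra, Gong and Towsley paper is given below for reference. It is written in the slide's symbols, with RTT as R(t) and N users. T_p (propagation delay), K (a positive RED averaging gain set by RED's averaging weight and sampling interval) and p_RED (the RED drop profile) are our notation, not the slide's. Treat this as a sketch to check against the paper.

```latex
\begin{aligned}
\frac{dW(t)}{dt} &= \frac{1}{R(t)} - \frac{W(t)\,W\big(t-R(t)\big)}{2\,R\big(t-R(t)\big)}\,p\big(t-R(t)\big) \\
\frac{dq(t)}{dt} &= N\,\frac{W(t)}{R(t)} - C \qquad (q > 0) \\
R(t) &= \frac{q(t)}{C} + T_p \\
\frac{dq_a(t)}{dt} &= K\,\big(q(t) - q_a(t)\big), \qquad p(t) = p_{\mathrm{RED}}\big(q_a(t)\big)
\end{aligned}
```

The first line is additive increase (one packet per RTT) minus halving at the rate of delayed loss signals. The second is the queue's fluid balance. The last line is RED's low-pass averaging of the queue followed by its drop profile.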
11
TCP-RED: Stability Analysis
  • Given the differential equations, one can in
    principle determine whether the TCP-RED control
    loop is stable
  • However, the differential equations are very
    complicated
  • 3rd or 4th order, nonlinear, with delays
  • There is no general theory; only treatments of
    specific cases exist
  • Linearize and analyze
  • Linearize the equations around the (unique)
    operating point
  • Analyze the resulting linear delay-differential
    equations using Nyquist or Bode theory
  • End result
  • Design stable control loops
  • Determine stability conditions (RTT limits,
    number of users, etc.)
  • Obtain control loop parameters: gains, drop
    functions, etc.
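The sketch below illustrates the Bode step by computing a delay margin from a phase margin. A pure delay τ adds a phase lag of ω_c·τ at the gain-crossover frequency ω_c, so a unity-feedback loop stays stable for τ < PM/ω_c. The loop L(s) = k/(s(s + a)) is a hypothetical example, chosen because its gain falls monotonically. It is not the linearized TCP-RED model, which is higher order.

```python
import cmath
import math


def loop_gain(w: float, k: float, a: float) -> complex:
    """Frequency response of the hypothetical loop L(s) = k / (s (s + a)) at s = jw."""
    s = 1j * w
    return k / (s * (s + a))


def delay_margin(k: float, a: float, lo: float = 1e-6, hi: float = 1e6) -> float:
    """Largest extra loop delay (seconds) the unity-feedback loop tolerates.

    Finds the gain-crossover frequency wc where |L(jwc)| = 1 by bisection
    (|L| is strictly decreasing for this loop), then converts the phase
    margin into a delay margin, since a delay tau adds -wc * tau of phase.
    """
    for _ in range(200):
        mid = math.sqrt(lo * hi)          # bisect on a logarithmic scale
        if abs(loop_gain(mid, k, a)) > 1.0:
            lo = mid
        else:
            hi = mid
    wc = math.sqrt(lo * hi)
    phase = cmath.phase(loop_gain(wc, k, a))   # in (-pi, -pi/2) here
    phase_margin = math.pi + phase             # distance from -180 degrees
    return phase_margin / wc


print(round(delay_margin(1.0, 1.0), 2))  # about 1.15 s for k = a = 1
```

For k = a = 1 the crossover is at ω_c = sqrt((sqrt(5) - 1)/2) ≈ 0.786 rad/s, with a phase margin of about 52°. Raising k pushes ω_c up and shrinks the tolerable delay. This is the RTT limit that such an analysis produces.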

12
Instability of TCP-RED
  • As the bandwidth-delay product increases, the
    TCP-RED control loop becomes unstable
  • Parameters: 50 sources, link capacity 9000
    pkts/sec, TCP-RED
  • Source: S. Low et al., Infocom 2002

13
Feedback Stabilization
  • Many congestion control algorithms developed for
    high bandwidth-delay product environments
  • The two main types of feedback stabilization used
    are
  • Determine lags (round-trip times) and apply the
    correct gains for the loop to be stable (e.g.,
    FAST, XCP, RCP, HS-TCP)
  • Include higher-order queue derivatives in the
    congestion information fed back to the source
    (e.g., REM/PI, XCP, RCP)
  • We shall see that BIC-TCP and QCN use a different
    method, which we call the Averaging Principle
  • BIC-TCP (Binary Increase Congestion control) is
    due to Rhee et al.
  • It was the default congestion control algorithm
    in Linux (kernels 2.6.8 to 2.6.18); its successor,
    CUBIC, has been the default since 2.6.19
  • No control-theoretic analysis existed, until now

14
  • Quantized Congestion Notification (QCN)
  • Congestion control for Ethernet

15
Ethernet vs. the Internet
  • Some significant differences
  • No per-packet acks in Ethernet, unlike in the
    Internet
  • Not possible to know round-trip times or lags!
  • So congestion must be signaled to the source by
    switches
  • The algorithm is not automatically self-clocked
    (unlike TCP)
  • Links can be paused, i.e., packets may not be
    dropped
  • No sequence numbering of L2 packets
  • Sources do not start transmission gently (as with
    TCP slow start); they can potentially come on at
    the full line rate of 10 Gbps
  • Ethernet switch buffers are much smaller than
    router buffers (100s of KBs vs. 100s of MBs)
  • Most importantly, the algorithm should be simple
    enough to be implemented completely in hardware
  • Note: QCN has Internet relatives: BIC-TCP at the
    source and the REM/PI controllers

16
Data Center Ethernet Bridging: IEEE 802.1Qau Standard
  • A summary of the standards effort
  • Everybody should do it at least once
  • Like proving limit theorems in probability
  • But, in this case, no more than once!?
  • Intense, fun activity
  • Broadcom, Brocade, Cisco, Fujitsu, HP, Huawei,
    IBM, Intel, NEC, Nortel, ...
  • Conference calls every Thursday morning
  • Meetings every 6 weeks (interim and plenary)
  • Real-time engineering: tear down and rebuild
  • Our algorithm was the 4th to be proposed
  • It underwent 56 revisions as new constraints
    were imposed on it
  • Draft of the standard: 9 revisions

17
QCN Source Dynamics
[Figure: source rate dynamics of BIC-TCP and QCN]
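The QCN source (reaction point) dynamics can be sketched as a small rate-limiter state machine. On feedback, the source remembers its rate as the target (TR) and cuts the current rate (CR) in proportion to |Fb|. It then repeatedly halves the gap to TR, which is the binary-search step that relates QCN to BIC-TCP. After a few such cycles, it also raises TR by a fixed step. The gain GD = 1/128, the 5 fast-recovery cycles, R_AI = 5 Mbps and all names are illustrative assumptions. Only a byte-counter clock is modeled, with no timer or hyper-active increase.

```python
from dataclasses import dataclass


@dataclass
class QcnReactionPoint:
    """Simplified QCN source rate limiter (illustrative, not the 802.1Qau text).

    Assumed: gd = 1/128, so the largest 6-bit feedback |Fb| = 63 roughly
    halves the rate; 5 fast-recovery cycles; active-increase step r_ai (Mbps).
    """
    cr: float                      # current rate (Mbps)
    tr: float                      # target rate (Mbps)
    gd: float = 1 / 128
    fast_recovery_cycles: int = 5
    r_ai: float = 5.0
    cycles: int = 0                # cycles completed since the last feedback

    def on_congestion_message(self, fb: int) -> None:
        """Multiplicative decrease driven by the quantized feedback |Fb|."""
        self.tr = self.cr                      # remember the pre-decrease rate
        self.cr *= 1 - self.gd * abs(fb)
        self.cycles = 0

    def on_cycle_end(self) -> None:
        """Called each time the byte counter completes a cycle with no feedback."""
        self.cycles += 1
        if self.cycles > self.fast_recovery_cycles:
            self.tr += self.r_ai               # active increase: probe for bandwidth
        self.cr = (self.cr + self.tr) / 2      # move halfway toward the target


rp = QcnReactionPoint(cr=10000.0, tr=10000.0)
rp.on_congestion_message(32)                   # CR: 10000 -> 7500
for _ in range(5):
    rp.on_cycle_end()                          # 8750, 9375, ..., 9921.875
print(rp.cr, rp.tr)
```

The first five cycles recover most of the lost rate without any new information from the network. Only afterwards does the source probe above its old rate.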
18
Stability: AIMD vs. QCN
[Figure: AIMD and QCN compared at RTT = 50 µs and RTT = 300 µs]
19
Experiment and Simulation Parameters
  • Baseline scenario
  • Output-queued switch
  • OG hotspot: hotspot severity 0.2 Gbps, hotspot
    duration 3.5 sec
  • Vary RTT from 100 µs to 1000 µs
[Figure: test topology with NIC 1 and NIC 2; link rates labeled 0.95 G, 0.95 G and 0.2 G]
20
1 source, RTT 100 µs
[Figure: hardware measurements vs. OMNeT++ simulation]
21
1 source, RTT 1 ms
[Figure: hardware measurements vs. OMNeT++ simulation]
22
8 sources, RTT 1 ms
[Figure: hardware measurements vs. OMNeT++ simulation]
23
Fluid Model for QCN
  • Assume N flows pass through a single queue at a
    switch. The state variables are TR_i(t), CR_i(t),
    q(t) and p(t).
[Figure: the probability p = F(Fb) plotted against Fb, which ranges up to 63]
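The feedback Fb in the fluid model is computed at the switch queue (the congestion point). A minimal sketch of that calculation follows. The queue's offset from an equilibrium level is combined with its recent growth, so Fb carries both the queue and a derivative of it. Several details are assumptions for illustration: the weight w = 2, quantization to 6 bits against a maximum of (1 + 2w)·q_eq, and a linear 1% to 10% sampling curve standing in for F. All names are also illustrative.

```python
from dataclasses import dataclass


@dataclass
class QcnCongestionPoint:
    """Simplified QCN switch-side feedback computation (illustrative sketch)."""
    q_eq: float            # equilibrium (target) queue length, bytes
    w: float = 2.0         # weight on the queue-growth term
    q_old: float = 0.0     # queue length at the previous sample

    def feedback(self, q: float) -> int:
        """Return the quantized |Fb| (1..63) to send back, or 0 for no message."""
        q_off = q - self.q_eq             # offset from the equilibrium queue
        q_delta = q - self.q_old          # growth since the last sample
        self.q_old = q
        fb = -(q_off + self.w * q_delta)
        if fb >= 0:
            return 0                      # queue is not congested: stay silent
        fb_max = (1 + 2 * self.w) * self.q_eq
        return min(63, int(-fb / fb_max * 63))

    @staticmethod
    def sampling_probability(fb_quantized: int) -> float:
        """Hypothetical p = F(Fb): 1% at |Fb| = 0, rising linearly to 10% at 63."""
        return 0.01 + 0.09 * min(fb_quantized, 63) / 63


cp = QcnCongestionPoint(q_eq=26000.0)
print(cp.feedback(26000.0))   # queue at q_eq but still growing: nonzero feedback
print(cp.feedback(26000.0))   # queue at q_eq and steady: no message
```

Only negative Fb (queue above target or growing) produces a message. This matches the source side above, which only ever receives decrease signals and recovers on its own.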
24
Accuracy: Equations vs. ns-2 Simulations
25
QCN Notes
  • The algorithm has been extensively tested in
    deployment scenarios of interest
  • Especially interoperability with link-level PAUSE
    and TCP
  • All presentations and pseudo-code are available at
    the IEEE 802.1 website
  • http://www.ieee802.org/1/pages/dcbridges.html
  • http://www.ieee802.org/1/files/public/docs2008/au-rong-qcn-serial-haipseudo-code20rev2.0.pdf
  • The theoretical development is interesting, most
    notably because QCN and BIC-TCP display strong
    stability in the face of increasing lags or,
    equivalently, in high bandwidth-delay product
    networks
  • While attempting to understand the unusually good
    performance of these schemes, we uncovered a
    method for improving the stability of any
    congestion control scheme

26
The Averaging Principle
27
The Averaging Principle (AP)
  • A source in a congestion control loop is
    periodically instructed by the network to decrease
    or increase its sending rate (randomly)
  • AP: a source obeys the network whenever
    instructed to change rate, and then voluntarily
    performs averaging as shown below
[Figure: AP rate evolution; TR = Target Rate, CR = Current Rate]
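A minimal sketch of a source applying the basic AP follows. It rests on three assumptions: feedback arrives every T seconds; TR holds the rate in force just before the latest instruction, as in QCN's decrease step; and the voluntary averaging happens T/2 after each instruction. The function name and the (time, rate) trace format are hypothetical.

```python
def ap_rate_trace(initial_rate: float, feedback: list, period: float) -> list:
    """Piecewise-constant rate of a basic-AP source, as (time, rate) breakpoints.

    Assumed timing: Fb[n] arrives at t = n * period; the source saves TR = CR,
    obeys the network (CR += Fb[n]), and at t = n * period + period / 2
    averages CR = (CR + TR) / 2, provided no new instruction arrived meanwhile.
    """
    cr = initial_rate
    trace = []
    for n, fb in enumerate(feedback):
        t = n * period
        tr = cr                      # target rate: the rate before this instruction
        cr = cr + fb                 # obey the network immediately
        trace.append((t, cr))
        cr = (cr + tr) / 2           # voluntary averaging halfway to the next feedback
        trace.append((t + period / 2, cr))
    return trace


print(ap_rate_trace(100.0, [-20.0, 0.0, 8.0], 1.0))
# [(0.0, 80.0), (0.5, 90.0), (1.0, 90.0), (1.5, 90.0), (2.0, 98.0), (2.5, 94.0)]
```

Each instruction is followed in full at first, but only half of it persists after averaging. This damping is what the following slides exploit.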
28
A Generic Control Example
  • As an example, we consider the plant transfer
    function
  • $P(s) = \frac{s+1}{s^3 + 1.6s^2 + 0.8s + 0.6}$
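A minimal closed-loop simulation of this plant is sketched below. The slides do not say how the loop is closed, so this sketch assumes unity negative feedback with a pure delay on the measured output and a unit-step reference. It integrates with forward Euler and does not apply AP. With no delay, the output should settle at P(0)/(1 + P(0)) = (1/0.6)/(1 + 1/0.6) = 0.625.

```python
def step_response(delay: float, t_end: float = 100.0, dt: float = 1e-3) -> list:
    """Output y(t) of the loop: unit step -> P(s), with delayed unity feedback.

    P(s) = (s + 1) / (s^3 + 1.6 s^2 + 0.8 s + 0.6) in controllable canonical
    form: x1' = x2, x2' = x3, x3' = -0.6 x1 - 0.8 x2 - 1.6 x3 + u,
    y = x1 + x2. The input is u(t) = 1 - y(t - delay); forward Euler, step dt.
    """
    x1 = x2 = x3 = 0.0
    lag = round(delay / dt)              # delay expressed in integration steps
    ys = []
    for k in range(round(t_end / dt)):
        y = x1 + x2
        ys.append(y)
        y_delayed = ys[k - lag] if k >= lag else 0.0   # output seen by controller
        u = 1.0 - y_delayed
        dx1, dx2, dx3 = x2, x3, -0.6 * x1 - 0.8 * x2 - 1.6 * x3 + u
        x1, x2, x3 = x1 + dt * dx1, x2 + dt * dx2, x3 + dt * dx3
    return ys


print(round(step_response(0.0)[-1], 3))  # settles near 0.625
```

Passing a nonzero `delay` shows how the loop responds as the lag grows. Whether this baseline oscillates at the delays used on the following slides depends on the assumed unity-gain controller, which may differ from the one behind those plots.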

29
Step Response: Basic AP, No Delay
30
Step Response: Basic AP, Delay 8 seconds
31
Step Response: Two-Step AP, Delay 14 seconds
32
Step Response: Two-Step AP, Delay 25 seconds
Two-step AP is even more stable than Basic AP
33
Applying AP to RCP (Rate Control Protocol)
RCP is due to Dukkipati and McKeown
  • Basic idea: the network computes max-min fair
    rates for each flow
  • Rate computed every 10 msecs
  • Flows send at their advertised rate
  • Apply the AP to RCP
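The RCP rate computation can be sketched as a single router update of the advertised rate. The update uses the form commonly cited for RCP: it raises R in proportion to spare capacity and lowers it in proportion to the standing queue, scaled by the ratio of the update interval to the average RTT. The 10 ms interval comes from the slide. The gains alpha and beta, the clamping to [0, C] and the function name are assumptions for illustration. Applying AP would change how sources follow the advertised rate, which this sketch does not model.

```python
def rcp_rate_update(rate: float, capacity: float, arrival_rate: float,
                    queue: float, avg_rtt: float, interval: float = 0.01,
                    alpha: float = 0.1, beta: float = 1.0) -> float:
    """One RCP router update of the advertised per-flow rate R (sketch).

    R <- R * (1 + (T / d) * (alpha * (C - y) - beta * q / d) / C), with
    T = update interval, d = average RTT, C = capacity, y = measured arrival
    rate, q = queue size. Rates in Mb/s, queue in Mb, times in seconds.
    """
    spare = alpha * (capacity - arrival_rate)     # unused capacity pushes R up
    drain = beta * queue / avg_rtt                # standing queue pushes R down
    new_rate = rate * (1 + (interval / avg_rtt) * (spare - drain) / capacity)
    return min(capacity, max(0.0, new_rate))      # assumed guard: keep 0 <= R <= C


print(rcp_rate_update(100.0, 1000.0, 500.0, 0.0, 0.1))   # under-used link: R grows
print(rcp_rate_update(100.0, 1000.0, 1000.0, 50.0, 0.1)) # queue building: R shrinks
```

Because the correction is scaled by interval/avg_rtt, the update gain depends on the RTT estimate. This lag-dependent gain is the kind of tuning the Averaging Principle aims to make unnecessary.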

34
AP-RCP Stability
RTT 60 msec
RTT 65 msec
35
AP-RCP Stability (contd.)
RTT 120 msec
RTT 130 msec
36
AP-RCP Stability (contd.)
RTT 230 msec
RTT 240 msec
37
Understanding the AP
  • As mentioned earlier, the two major flavors of
    feedback compensation are
  • Determine lags, choose appropriate gains
  • Feed back higher derivatives of state
  • We prove that the AP is, in a sense, equivalent to
    both of the above!
  • This is great because we don't need to change
    network routers and switches
  • And the AP is very easy to apply: no
    lag-dependent optimization of gain parameters is
    needed

38
AP Equivalence
[Figure: System 1, a source that does AP, driven by Fb; System 2, a regular source driven by $0.5\,\mathrm{Fb} + 0.25\,T\,\frac{d\,\mathrm{Fb}}{dt}$]
  • Systems 1 and 2 are discrete-time models for an
    AP-enabled source and a regular source,
    respectively.
  • Theorem: Systems 1 and 2 are algebraically
    equivalent. That is, given identical input
    sequences, they produce identical output
    sequences.
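The equivalence can be checked numerically under one concrete reading of the two systems. This reading is our assumption, not the slide's definition. Feedback Fb[n] arrives every T seconds. The AP source averages at T/2, as in the basic AP. Each system's output is its average sending rate over each period. dFb/dt is approximated by the backward difference (Fb[n] - Fb[n-1])/T. The function names are hypothetical.

```python
def ap_source_avg_rates(r0: float, fb: list) -> list:
    """Average rate over each feedback period of a basic-AP source (sketch).

    At the start of period n the source saves TR = CR and applies CR += Fb[n];
    halfway through the period it sets CR = (CR + TR) / 2.
    """
    cr, out = r0, []
    for f in fb:
        tr = cr
        first_half = cr + f                   # rate while obeying the network
        cr = (first_half + tr) / 2            # voluntary averaging at T/2
        out.append((first_half + cr) / 2)     # mean of the two half-period rates
    return out


def regular_source_rates(r0: float, fb: list, period: float) -> list:
    """Rate over each period of a regular source fed 0.5 Fb + 0.25 T dFb/dt.

    dFb/dt is the backward difference (Fb[n] - Fb[n-1]) / T, with Fb[-1] = 0.
    """
    rate, prev, out = r0, 0.0, []
    for f in fb:
        rate += 0.5 * f + 0.25 * period * (f - prev) / period
        prev = f
        out.append(rate)
    return out


fb = [-20.0, 0.0, 8.0, -4.0]
print(ap_source_avg_rates(100.0, fb))         # [85.0, 90.0, 96.0, 91.0]
print(regular_source_rates(100.0, fb, 1.0))   # [85.0, 90.0, 96.0, 91.0]
```

Under this reading, the AP source's lasting rate moves by 0.5 Fb per period. The extra 0.25 Fb it carries during the period decays away, and that transient is exactly what the derivative term reproduces. This is why the AP behaves like a PD controller without any change at the switch.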

39
AP vs. Equivalent PD Controller: No Delay
40
AP vs. PD: Delay 8 seconds
41
Conclusions
  • We have seen the background, development and
    analysis of a congestion control scheme for the
    IEEE 802.1 Ethernet standard
  • The QCN algorithm is
  • More stable with respect to control loop delays
  • Requires much smaller buffers than TCP
  • Easy to build in hardware
  • The Averaging Principle is interesting; we're
    exploring its use in nonlinear control systems