Scheduling Algorithms for CIOQ Switches

Provided by: kareny
1
Scheduling Algorithms for CIOQ Switches
Prashanth Pappu (Advisor Dr. Jon Turner)
2
The Scheduling Problem
  • Need for Combined Input and Output Queuing
    (CIOQ).
  • Speedup (S) is the ratio of the speed of the switch
    fabric to that of the external links.
  • Need for making a scheduling decision.
  • Objective: closely approximate an ideal
    output-queued switch.
  • Work conservation: no loss of output link
    capacity.
  • Traffic isolation: traffic to different outputs
    does not interfere.
  • Minimize the required speedup.

3
Problem Context
[Figure: existing work plotted by implementation complexity (low to high) against performance under various traffic conditions (admissible traffic, uniform and non-uniform, through inadmissible traffic). One region is marked "?".]
  • Worst-case results: scheduling algorithms proven to be work
    conserving (with speedup 2).
  • Simulation results: heuristic algorithms with performance analysis
    using simulated traffic conditions.
  • Stability results: scheduling algorithms theoretically proven to be
    stable.
  • Methods to evaluate scheduling algorithms in
    inadmissible traffic conditions.
  • Design low complexity scheduling algorithms which
    have good performance under all traffic
    conditions.

4
Summary of Contributions
  • New method for evaluation of scheduling
    algorithms in inadmissible traffic conditions:
  • studying performance under targeted stress tests
  • a metric that measures lost link capacity: miss
    fraction
  • For crossbar-based switches: stress-resistant
    scheduling algorithms
  • Evaluation of well-known crossbar scheduling
    algorithms:
  • PIM (Anderson et al.), iSLIP (McKeown),
    APSARA (Giaccone et al.), LOOFA (Krishna et
    al.)
  • Design and analysis of improved algorithms:
  • Lowest Layer Selection (LLS-R and LLS-S): adding
    bias to PIM and iSLIP
  • SOLIF-A: improved weight metrics for APSARA

5
Summary of Contributions
  • A-LOOFA: a practical approximate version of LOOFA
  • fast maximal matching in hardware
  • approximate sorting
  • ensuring input fairness
  • For buffered multistage switches:
  • Distributed Scheduling (DS): a novel scalable
    mechanism for regulating the flow of traffic.
  • First provably work-conserving DS algorithms:
    BCCF, BLOOFA and OLA
  • Design of practical variants: single-iteration
    distributed scheduling algorithms (DBL and
    Distributed OLA)
  • Performance analysis of DS algorithms

6
Stress Test
  • How do we simulate extreme traffic conditions?
  • Adversarial approach in overloading outputs.
  • Stress (overload) various outputs with the
    objective of increasing misses.
  • New metric (instead of delay): miss fraction,
    $1 - N_A / N_I$.
  • Tests can be varied by changing number of
    participating inputs and phases.

7
Example of Stress Test
[Figure: queue length (0 to 14,000) versus time (0 to 80,000) for PIM with speedup 1.5 under a stress test with 3 inputs and 4 phases. Curves: VOQ_0, VOQ_1, VOQ_2, VOQ_3, OQ_0, OQ_1.]
Ideally, there should be no queuing for VOQ_3.
Phase transition times are determined so as to equalize VOQ lengths.
8
Least Occupied Output First (LOOFA)
[Figure: LOOFA matching example on a small switch, with numeric labels on the ports.]
  • Inputs make a single request for their least occupied
    output.
  • Outputs issue grants.
  • Unmatched inputs and outputs repeat until no new
    matches can be added.
  • LOOFA is work-conserving with speedup of 2.
  • This implies that an output link is idle only if no cell for
    that output is present in any queue (input or
    output).
  • Holds for any input selection rule.
  • Often requires many iterations to complete, so it is of
    little practical use!
  • Can we design a practical variant which, if not
    provably work-conserving, will perform well under
    stress tests?
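
The request/grant loop described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation. The names `loofa_match`, `voq` and `out_len` are hypothetical, and the input selection rule (lowest input index wins a contested grant) is an arbitrary choice made for the example.

```python
def loofa_match(voq, out_len):
    """Iterative LOOFA-style matching (illustrative sketch).

    voq[i][j]  -- number of cells queued at input i for output j
    out_len[j] -- current occupancy of output queue j
    Returns a dict mapping each matched input to its output.
    """
    n_in, n_out = len(voq), len(out_len)
    match = {}      # input -> output
    taken = set()   # outputs that are already matched
    while True:
        requests = {}  # output -> list of requesting inputs
        for i in range(n_in):
            if i in match:
                continue
            # Unmatched outputs for which input i holds at least one cell.
            cands = [j for j in range(n_out) if voq[i][j] > 0 and j not in taken]
            if cands:
                # Single request, sent to the least occupied candidate output.
                best = min(cands, key=lambda k: (out_len[k], k))
                requests.setdefault(best, []).append(i)
        if not requests:
            return match  # no new matches can be added: matching is maximal
        for j, inputs in requests.items():
            i = min(inputs)  # input selection rule: an assumption of this sketch
            match[i] = j
            taken.add(j)


# Both inputs first request output 0; input 1 is matched in a second iteration.
print(loofa_match([[1, 1], [1, 1]], [0, 1]))
```

The second iteration in the usage example is the cost the slide points at: contested grants leave inputs unmatched, so the loop may have to run many times before the matching is maximal.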

9
Approximate LOOFA
[Figure: matching array with inputs a to f; the outputs appear in the order a, e, b, d, f, c, with queue occupancies 0, 1, 3, 3, 4, 5.]
  • Hardware implementation of maximal matching.
  • Allows n iterations to complete quickly:
  • 4n gate delays
  • 6.4 ns for a 32-port switch at 50 ps per gate (less
    than 20 ns for a router with 10 Gb/s links)
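
The timing figure quoted above follows directly from the 4n gate-delay bound. The short check below reproduces the arithmetic; the function name and parameter names are made up for this example.

```python
def matching_delay_ns(n_ports, gate_delay_ps=50, gate_delays_per_port=4):
    """Time for the hardware matching, assuming 4n gate delays in total."""
    total_gate_delays = gate_delays_per_port * n_ports
    return total_gate_delays * gate_delay_ps / 1000  # picoseconds -> nanoseconds


# 4 * 32 = 128 gate delays at 50 ps each
print(matching_delay_ns(32))  # 6.4
```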

10
Maintaining Output Ordering
  • Not enough time to sort outputs by queue length.
  • But a full sort is not essential, since queue lengths change
    slowly: use an odd-even sort.

[Figure: array of outputs shown in its initial state and final state.]
  • Queue lengths change by at most 1.
  • Compare and swap adjacent elements.
  • Note that the entire column is swapped.
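
A single compare-and-swap pass of odd-even (transposition) sort can be sketched as below. The idea is that one cheap pass per scheduling cycle keeps the output order close to sorted when queue lengths drift slowly. This is an illustrative software model, not the hardware design. The names `odd_even_step`, `order` and `length` are hypothetical.

```python
def odd_even_step(order, length, phase):
    """One pass of odd-even transposition sort.

    order  -- current ordering of output indices
    length -- length[j] is the queue length of output j
    phase  -- 0 compares pairs (0,1),(2,3),...; 1 compares (1,2),(3,4),...
    Returns the new ordering; only adjacent elements are ever swapped.
    """
    order = list(order)
    for k in range(phase, len(order) - 1, 2):
        if length[order[k]] > length[order[k + 1]]:
            order[k], order[k + 1] = order[k + 1], order[k]
    return order


# Alternating the two phases n times fully sorts n elements, even from
# reverse order; a nearly sorted order is repaired in far fewer passes.
length = [5, 4, 3, 2, 1, 0]
order = list(range(6))
for step in range(6):
    order = odd_even_step(order, length, step % 2)
print(order)  # [5, 4, 3, 2, 1, 0]
```

Each pass only touches disjoint adjacent pairs, so all comparisons within a pass can run in parallel, which is what makes the scheme attractive in hardware.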
11
Fair Treatment of Inputs
  • For speedups < 2, unfair treatment of inputs
    becomes an issue.
  • Resolve by performing random row permutations.

[Figure: permutation network labeled "Perfect shuffle" and "Random settings".]
12
Performance of Approx. LOOFA
  • Summary:
  • Use of stress tests:
  • Insight to develop a practical variant.
  • Make qualitative claims about A-LOOFA
    performance.
  • In high-speed algorithms where sorting is the
    bottleneck step, approximate sorting is a good variation.
  • Choice of approximate sorting technique depends
    on context:
  • Odd-even sort is suited for slowly changing traffic
    conditions.
  • O(N) algorithms can still have efficient hardware
    implementations!
  • The constants can be greatly reduced.

13
Distributed Scheduling
[Figure: buffered multistage switch. Ports, each labeled with I, O, "Sched.", "Routing" and "TI", surround the switch fabric.]
Scheduler uses state information to pace VOQs to
avoid congesting the switch fabric.
Queue state information is sent out every update
period (T).
  • Highly scalable systems with inter-stage flow
    control.
  • But they lack mechanisms to ensure high throughput in
    extreme traffic conditions.
  • Distributed Scheduling (DS): a scalable
    mechanism to maintain throughput in extreme
    traffic conditions.

14
Distributed Scheduling
  • Three important features of the mechanism:
  • It is coarse-grained: scheduling decisions are
    made at pre-determined update periods (T).
  • It is distributed: ports make scheduling
    decisions independently and asynchronously.
  • It is non-iterative: low overhead due to exchange
    of information once per update period.
  • Also, low hardware complexity and execution time
    of algorithms at ports.
  • Question: What kind of performance guarantees (if
    any) can we provide with this mechanism?
  • Approach: Introduce each feature incrementally
    and evaluate the achievable throughput.
  • Question 1: What is the effect (on achievable
    throughput) of making a scheduling decision only
    at fixed time periods (T) in the switch?

15
T-CIOQ Switch
[Figure: timeline of the three phases: arrival (T), transfer (ST), departure (T).]
  • Three phases: arrival, transfer and departure.
  • Up to T cells can arrive (depart) in the arrival
    (departure) phase.
  • A scheduling decision is made every T time units
    to transfer a maximum of ST cells from an input
    or to an output (during the transfer phase).
  • Implications of these assumptions for real
    systems are discussed in the thesis.
  • T-work conserving: At the beginning of the
    departure phase, every output which has cells
    queued in the system has at least T cells in its
    output queue.
  • Question 1 (rephrased): Is there a scheduling
    algorithm that can keep a T-CIOQ switch T-work
    conserving?
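
The T-work-conserving condition can be written as a small predicate over the switch state at the start of a departure phase. The sketch below reads "has cells queued in the system" as "still has cells waiting at some input", which is an interpretation rather than something the slide states outright. The names `is_t_work_conserving`, `voq` and `out_len` are hypothetical.

```python
def is_t_work_conserving(voq, out_len, T):
    """Check the T-work-conserving condition at the start of a departure phase.

    voq[i][j]  -- cells still queued at input i for output j
    out_len[j] -- cells in output queue j
    An output with backlog left at the inputs must hold at least T cells,
    so that it can stay busy for the whole departure phase.
    """
    for j in range(len(out_len)):
        backlog_at_inputs = sum(row[j] for row in voq)
        if backlog_at_inputs > 0 and out_len[j] < T:
            return False
    return True


# Output 1 still has 2 cells at the inputs but only 1 cell in its output queue.
print(is_t_work_conserving([[0, 2], [1, 0]], [3, 1], T=3))  # False
```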

16
VOQ Ordering
  • All VOQs at an input are ordered.
  • Cells at the input are ordered according to
    ordering of VOQs.
  • Two different ordering criteria:
  • BLOOFA
  • BCCF
  • Example shows BLOOFA ordering.
  • Given a particular ordering, we want to
    construct maximal and ordered schedules.

[Figure: VOQ ordering example; the diagram includes the output queues.]
17
Maximal Ordered Schedule
[Figure: example of a maximal ordered schedule; the numeric labels of the diagram are not reproduced here.]
18
Work Conservation
  • Theorem 1: The BLOOFA scheduling algorithm is
    T-work conserving for a speedup of 2.
  • No output with cells queued at inputs can have
    fewer than T cells in its output queue at the
    beginning of the departure phase.
  • Theorem 2: The BCCF scheduling algorithm is
    T-work conserving for a speedup of 2.
  • Proof construction is similar to that of BLOOFA.
  • Question 1 (contd.): How do we find a maximal
    ordered schedule?

19
Maximal Schedule as a Blocking Flow
[Figure: bipartite graph of inputs and outputs with numeric labels.]
  • Dinic's algorithm:
  • Repeatedly search for s-t paths with no saturated
    edges and add as much flow as possible.
  • Modification: Preferentially select edges
    between inputs and outputs according to the VOQ
    ordering at the input.
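
In a network of the form source, inputs, outputs, target, every s-t path has the same length, so a blocking flow can be built greedily, one edge at a time. The sketch below does this and visits each input's VOQs in least-occupied-output-first order. Several things here are assumptions made for illustration: that the BLOOFA ordering is by output queue length, that inputs are processed in index order, and that every input and output has the same per-period capacity `cap` (standing in for ST). It is not claimed to reproduce the ordered schedules defined in the thesis.

```python
def greedy_blocking_flow(voq, out_len, cap):
    """Greedy blocking flow on source -> inputs -> outputs -> target.

    voq[i][j]  -- cells at input i for output j (capacity of edge i -> j)
    out_len[j] -- output queue length, used only to order the VOQs
    cap        -- cells an input may send / an output may receive per period
    Returns flow[i][j], the number of cells moved from input i to output j.
    """
    n_in, n_out = len(voq), len(out_len)
    in_left = [cap] * n_in    # residual capacity of source -> input edges
    out_left = [cap] * n_out  # residual capacity of output -> target edges
    flow = [[0] * n_out for _ in range(n_in)]
    for i in range(n_in):
        # Assumed VOQ ordering: least occupied output first, ties by index.
        for j in sorted(range(n_out), key=lambda k: (out_len[k], k)):
            x = min(in_left[i], out_left[j], voq[i][j])
            flow[i][j] = x
            in_left[i] -= x
            out_left[j] -= x
    return flow


print(greedy_blocking_flow([[3, 2], [4, 0]], [1, 0], cap=4))  # [[2, 2], [2, 0]]
```

After each edge is processed, the input, the output, or the VOQ itself is exhausted, and residual capacities never grow again, so no s-t path without a saturated edge remains. That is the blocking-flow (maximal schedule) property.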

[Figure: flow network from source to target with numeric edge labels.]
20
BLOOFA Example
[Figure sequence: BLOOFA example shown through the arrival, transfer and departure phases; the numeric labels of the diagrams are not reproduced here.]
21
Distributed, Iterative Schedulers
  • Question 2: Can we make these T-work conserving
    algorithms distributed?
  • Answer: Yes.
  • Outputs (inputs) send (receive) a maximum of n
    messages.
  • Inputs (outputs) send (receive) a maximum of 2n
    messages.
  • The algorithm is not guaranteed to run in O(n) time.
  • Question 3: Can we make these T-work conserving
    distributed scheduling algorithms non-iterative?
  • Answer: No.
  • But distributed, non-iterative schedulers can
    approximate the performance of T-work conserving
    schedulers.
  • Distributed BLOOFA.

22
Distributed BLOOFA
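Backlog Proportional Allocation: $hi(i,j) = S\,T\,B(i,j) / B(\cdot,j)$

[Figure: example switch state; the numeric queue labels of the diagram are not reproduced here.]

Input 2:
  • hi(2,0) = 30/8 → 4
  • hi(2,1) = 0
  • hi(2,2) = 12/7 → 1
  • hi(2,3) = 0

Input 3:
  • hi(3,0) = 0
  • hi(3,1) = 6/7 → 1
  • hi(3,2) = 12/7 → 2
  • hi(3,3) = 4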
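
The backlog-proportional rule on this slide appears to give each input a share of an output's per-period capacity (ST) in proportion to that input's backlog for the output. The sketch below computes those shares as exact fractions under that reading. The function name and the backlog matrix are invented for the example, and the rounding step that turns, say, 30/8 into 4 on the slide is deliberately left out because the rule behind it is not recoverable from the transcript.

```python
from fractions import Fraction


def backlog_proportional(B, st):
    """Exact backlog-proportional shares (illustrative sketch).

    B[i][j] -- backlog at input i for output j
    st      -- cells an output can accept per update period (S*T, an integer here)
    Returns hi[i][j] = st * B[i][j] / (total backlog for output j).
    """
    n_in, n_out = len(B), len(B[0])
    totals = [sum(B[i][j] for i in range(n_in)) for j in range(n_out)]
    return [
        [Fraction(st * B[i][j], totals[j]) if totals[j] else Fraction(0)
         for j in range(n_out)]
        for i in range(n_in)
    ]


# Hypothetical backlogs chosen so that two entries match fractions on the slide.
hi = backlog_proportional([[3, 5], [5, 2]], st=6)
print(hi[1][0], hi[1][1])  # 15/4 (= 30/8) and 12/7
```

Because the shares for one output always sum to `st`, each input can compute its own allocation from the broadcast backlog totals without any negotiation, which is the non-iterative property the slides emphasise.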
23
Performance on stress tests
[Figure: stress-test results for DBL and BLOOFA.]
  • 90 sets of stress tests (up to 15 inputs and 15
    phases).
  • Worst-case results for 2, 3, 4 and 5 inputs
    plotted.
  • Performance of DBL is comparable to BLOOFA,
    though DBL is not known to be provably T-work
    conserving.

24
Output Leveling Algorithm (OLA)
  • Question: Can we improve the performance of
    BLOOFA at smaller speedups?
  • Idea: Instead of giving shorter output queues
    greater priority, level the output queues!
  • Comprehensive study
  • Formulation: OLA produces schedules which are
    maximal and level.
  • Theorem 3: OLA is T-work conserving for a speedup
    of 2.
  • How do we find schedules which are maximal and
    level?
  • Using convex edge costs in a minimum-cost
    maximum-flow problem.
  • Less complex approximation - OLA.
  • Single-iteration, distributed version of OLA:
    distributed OLA.
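
To see why convex costs lead to level queues, consider only the output side. If adding a cell to a queue of length q costs more the larger q is, the cheapest next cell always goes to the currently shortest queue. The sketch below applies that greedy rule. It ignores all input-side constraints and is not the OLA algorithm itself, only an illustration of the leveling idea. The name `level_fill` is hypothetical.

```python
import heapq


def level_fill(out_len, cells):
    """Distribute `cells` cells so that output queues end up as level as possible.

    out_len -- current output queue lengths
    cells   -- number of cells to place (input constraints are ignored here)
    Each cell goes to the currently shortest queue, which is the cheapest
    choice under any increasing marginal (convex) cost.
    """
    heap = [(q, j) for j, q in enumerate(out_len)]
    heapq.heapify(heap)
    result = list(out_len)
    for _ in range(cells):
        q, j = heapq.heappop(heap)  # shortest queue, ties broken by index
        result[j] = q + 1
        heapq.heappush(heap, (q + 1, j))
    return result


print(level_fill([0, 3, 1], 4))  # [3, 3, 2]
```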

25
OLA Example
[Figure sequence: OLA example shown through the arrival, transfer and departure phases; the numeric labels of the diagrams are not reproduced here.]
26
Performance of Distributed OLA
[Figure: two result panels, Delta = 0.2 and Delta = 0.02.]
  • Distributed OLA requires same speedup as DBL to
    reduce miss fraction to 0.
  • Shows great improvement over DBL for smaller
    speedups.

27
Summary
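[Figure: the same landscape as on the Problem Context slide (implementation complexity, low to high, against performance under various traffic conditions, from admissible traffic, uniform and non-uniform, through inadmissible traffic), with regions for worst-case results, stability results, simulation results and stress resistant algorithms.]
  • Stress resistant algorithms: LLS-R, LLS-S, SOLIF-A,
    A-LOOFA, DBL, Distributed OLA
  • Worst-case results: BLOOFA, BCCF, OLA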
  • Proposed algorithms are of immediate utility to
    routers like Cisco's 12000 series (1.28 Tbps
    capacity, crossbar-based router) and CRS-1 (90
    Tbps capacity, buffered multi-stage router,
    released May 2004).

28
Acknowledgements
  • Dr Jon Turner
  • Members of the Committee
  • Dr John Lockwood
  • Dr Roger Chamberlain
  • Dr Dan Fuhrmann
  • Dr Sergey Gorinsky
  • ARLites.
  • Parents.

29
Thank you (The Halle Berry version)
Kayaks coffee shop (where I wrote most of my
thesis)
David for his politically correct humor
Ed for bringing his GPS device on our last float
trip. Which way Ed? <Pause> Downstream!
My younger brother (for not expecting me to gift
him an iPod)
Praveen for converting my pdf plots to eps on his
unix machine.
Tilman for convening the FOOLISH
symposium: Fragmentation Of Organic Life-forms In
Simulated Hallways
Anshul for adding to my vocabulary: Highly
Useless
Samphel for his spontaneity. (Movie in fifteen
minutes? You guys are nuts!)
Sumi for readily parting with her machine for
more important research.
Ralph for his politically incorrect humor
My elder brother (for gifting me an iPod without
me expecting it)
Praveen for converting my excel plots to pdf on
his mac.
Jai for buying headphones along with his music
composition wares.
Saigon Restaurant (for No. 34 with Tofu)
(Background music peppered with sounds of mike
being wrenched away) <Inaudible>
(While I'm at it) Sean @ food court (The wrap
guy)
Kurt Elling, Jamiroquai, OutKast
The CD burner and the Internet