iSLIP Switch Scheduler Ali Mohammad Zareh Bidoki April 2002 - PowerPoint PPT Presentation

Provided by: MEHR2
1
iSLIP Switch Scheduler
Ali Mohammad Zareh Bidoki
April 2002
2
Table of Contents
  • The Place of Buffers in Crossbar Switches
  • Examples of Fabrics
  • PIM
  • iSLIP (used in the Cisco 12000, 5 Gb/s router, and
    the Tiny Tera, 0.5 Tb/s)
  • RRM
  • WFA
  • PP_VOQ
  • Multicasting
  • A 2.5Tb/s Router

3
The Place of Buffers in a Crossbar
  • Output Buffer
  • Shared Buffer
  • Input buffer

4
Interconnects: Two basic techniques
  • Input Queueing: usually a non-blocking switch
    fabric (e.g. crossbar)
  • Output Queueing: usually a fast bus
5
Interconnects: Input Queueing with Crossbar
[Figure labels: Data In, Data Out, Arbiter, configuration]
6
Input Queueing: Head of Line Blocking
[Figure: delay versus load, load axis marked at 100]
7
Head of Line Blocking
8
Virtual Output Queuing
[Figure labels: Input port 1; Input queues (Port 1 queue, Port 2 queue, ..., Port n queue); Queue scheduler; To port 1, To port 2, ..., To port n]
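The per-output queues in the figure can be sketched as a small data structure. This is only an illustration: the class name InputPort and its methods are hypothetical and do not come from the slides.

```python
from collections import deque


class InputPort:
    """Hypothetical input port holding one FIFO (a VOQ) per output port."""

    def __init__(self, num_outputs):
        self.voq = [deque() for _ in range(num_outputs)]

    def enqueue(self, cell, output):
        # A cell is stored in the queue that belongs to its destination.
        self.voq[output].append(cell)

    def requests(self):
        # Outputs for which this input currently has a queued cell.
        return {j for j, q in enumerate(self.voq) if q}

    def dequeue(self, output):
        return self.voq[output].popleft()


port = InputPort(num_outputs=3)
port.enqueue("cell-A", output=0)
port.enqueue("cell-B", output=2)
# The cell for output 2 is not stuck behind the older cell for output 0.
first_served = port.dequeue(2)
```

With a single FIFO per input, "cell-B" would have had to wait for "cell-A"; separate queues are what remove head-of-line blocking.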
9
Input Queueing: Virtual Output Queues
10
Input Queueing: Virtual Output Queues
[Figure: delay versus load, load axis marked at 100]
11
Which is better?
  • Virtual output Queue (input queue).
  • Ideal Output queue.

12
Input Queueing: Virtual Output Queues
[Figure label: Arbiter]
13
VOQ
  • Arbiter
  • Input memory management

14
Problem Definition (bipartite)
15
Maximum or Maximal matching
16
Maximum or Maximal matching
  • Maximum matching
  • Maximizes instantaneous throughput
  • Can cause starvation
  • Time complexity is very high in hardware (O(N^3))
  • Maximal matching
  • Cannot add any connection to the current match
    without altering existing connections
  • More practical (e.g. WFA, PIM, iSLIP, DRR, RRM)
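The difference between the two kinds of match can be seen on a two-input example. The sketch below is illustrative only: greedy_maximal and maximum_matching are hypothetical helper names, and the maximum match is found with a simple augmenting-path search rather than any method named in the slides.

```python
def greedy_maximal(requests):
    """Scan inputs in order and take the first free output: a maximal match."""
    match, used = {}, set()
    for i, outs in enumerate(requests):
        for j in sorted(outs):
            if j not in used:
                match[i] = j
                used.add(j)
                break
    return match


def maximum_matching(requests):
    """Maximum-size match found with augmenting paths."""
    owner = {}  # output -> input currently matched to it

    def augment(i, seen):
        for j in sorted(requests[i]):
            if j in seen:
                continue
            seen.add(j)
            # Take a free output, or re-route the input that holds it.
            if j not in owner or augment(owner[j], seen):
                owner[j] = i
                return True
        return False

    for i in range(len(requests)):
        augment(i, set())
    return {i: j for j, i in owner.items()}


# requests[i] is the set of outputs input i has cells for.
requests = [{0, 1}, {0}]
maximal = greedy_maximal(requests)    # input 0 takes output 0, input 1 is blocked
maximum = maximum_matching(requests)  # input 0 is moved to output 1
```

The greedy result is maximal (nothing can be added without changing an existing connection) but has one connection, while the maximum match has two.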

17
Matching Algorithms
We will discuss three different matching algorithms:
1. PIM - Parallel Iterative Matching
2. RRM - Round-Robin Matching
3. iSLIP - Iterative Serial-Line IP (based on
PIM and RRM)
Each algorithm is evaluated by four
parameters: 1. Latency (throughput). 2. Starvation
freedom. 3. Speed. 4. Implementation.
18
PIM - Parallel Iterative Matching
The basic matching algorithm. Each iteration of
the algorithm follows these three steps:
1. Request - Each unmatched input sends a
request to every output for which it has a queued
cell.
2. Grant - If an unmatched output receives
any requests, it grants one by selecting a
request uniformly at random over all requests.
3. Accept - If an input receives a grant,
it accepts one by selecting an output randomly
among those that granted to this input.
When no new match can be found, the
algorithm stops.
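The three steps above can be sketched as follows. This is a minimal sketch under assumptions: the function name pim, its parameters, and the representation of requests as a list of sets are all hypothetical, and a software random generator stands in for the hardware random arbiters.

```python
import random


def pim(requests, num_outputs, rng, max_iters=None):
    """requests[i] = set of outputs input i has cells for. Returns input -> output."""
    match_in, match_out = {}, {}
    iters = 0
    while max_iters is None or iters < max_iters:
        iters += 1
        # Request: unmatched inputs request outputs; only unmatched outputs listen.
        reqs = {j: [] for j in range(num_outputs) if j not in match_out}
        for i, outs in enumerate(requests):
            if i not in match_in:
                for j in sorted(outs):
                    if j in reqs:
                        reqs[j].append(i)
        # Grant: each unmatched output picks one request uniformly at random.
        grants = {}  # input -> outputs that granted to it
        for j, inputs in reqs.items():
            if inputs:
                grants.setdefault(rng.choice(inputs), []).append(j)
        if not grants:
            break  # no new match can be found
        # Accept: each granted input picks one grant uniformly at random.
        for i, outs in grants.items():
            j = rng.choice(outs)
            match_in[i] = j
            match_out[j] = i
    return match_in


example_requests = [{0, 1}, {0}, {2}]
match = pim(example_requests, num_outputs=3, rng=random.Random(1))
```

Run without max_iters, the loop ends with a maximal match; passing max_iters=1 gives the single-iteration behaviour.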
19
PIM
  • Each iteration eliminates, on average, at least ¾
    of the remaining unresolved requests
  • Converges in O(log N) iterations on average
  • No input queue is starved of service
  • No memory or state is used
  • At the beginning of each cell time, the match
    begins over, independently of the matches that
    were made in previous cell times
  • PIM does not perform well for a single iteration:
    it limits the throughput to approximately 63%,
    only slightly higher than for a FIFO switch.
  • This is because the probability that an input
    will remain ungranted is ((N-1)/N)^N; hence as N
    increases, the throughput tends to 1 - 1/e (about 0.63)
  • Implementation is hard in hardware
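The 63% figure can be worked out as a short limit. The derivation assumes the worst case for a single iteration, in which every input requests every output and each of the N outputs grants uniformly at random:

```latex
\[
P(\text{input receives no grant})
  = \left(\frac{N-1}{N}\right)^{N}
  = \left(1-\frac{1}{N}\right)^{N},
\qquad
\lim_{N\to\infty}\left(1-\frac{1}{N}\right)^{N} = e^{-1} \approx 0.37 .
\]
\[
\text{throughput}
  = 1-\left(1-\frac{1}{N}\right)^{N}
  \;\longrightarrow\; 1-\frac{1}{e} \approx 0.63
  \quad (N\to\infty).
\]
```

For example, N = 4 gives 1 - (3/4)^4, about 0.68, already close to the limit.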

20
RRM Round-Robin Matching
1. Request - Each unmatched input sends a request
to every output for which it has a queued cell.
2. Grant - If an output receives any requests,
it chooses the one that appears next in a fixed,
round-robin schedule starting from the highest
priority element. The output notifies each input
whether or not its request was granted.
The pointer g_i to the highest priority
element of the round-robin schedule is
incremented (modulo N) to one location beyond the
granted input.
21
RRM Round-Robin Matching
1. Request - Each unmatched input sends a request
to every output for which it has a queued cell.
2. Grant - If an output receives any requests,
it chooses the one that appears next in a fixed,
round-robin schedule starting from the highest
priority element. The output notifies each input
whether or not its request was granted.
The pointer g_i to the highest priority
element of the round-robin schedule is
incremented (modulo N) to one location beyond the
granted input.
3. Accept - If an input receives a grant,
it accepts the one that appears next in a fixed,
round-robin schedule starting from the highest
priority element.
The pointer a_i to the highest
priority element of the round-robin schedule is
incremented (modulo N) to one location beyond the
accepted output.
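One RRM cell time can be sketched as below. The function name rrm_slot, the 0-based port numbering, and the list-of-sets request format are assumptions made for the illustration; the switch is taken to be N x N.

```python
def rrm_slot(requests, g, a):
    """One RRM iteration on an N x N switch.

    requests[i]: set of outputs input i has cells for (Step 1).
    g[j]: grant pointer of output j, a[i]: accept pointer of input i.
    Pointers are updated in place. Returns a dict input -> output.
    """
    n = len(requests)
    # Step 2, Grant: first requesting input at or after the pointer.
    grants = {}  # input -> outputs that granted to it
    for j in range(n):
        for k in range(n):
            i = (g[j] + k) % n
            if j in requests[i]:
                grants.setdefault(i, []).append(j)
                g[j] = (i + 1) % n  # RRM moves g whether or not the grant is accepted
                break
    # Step 3, Accept: first granting output at or after the pointer.
    match = {}
    for i, outs in grants.items():
        for k in range(n):
            j = (a[i] + k) % n
            if j in outs:
                match[i] = j
                a[i] = (j + 1) % n
                break
    return match


g, a = [0, 0], [0, 0]
m = rrm_slot([{0, 1}, {0, 1}], g, a)
```

In this 2x2 example both outputs grant input 0, input 0 accepts output 0, and both grant pointers still advance to input 1.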
23
RRM Round-Robin Matching
RRM is not starvation free. In the following
example, we assume there are always cells waiting
to be transferred. The destination is always the
same.
[Figure: grant pointers g1, g2, g3 and accept pointers a1, a2, a3, first cycle]
30
RRM Round-Robin Matching
RRM is not starvation free. In the following
example, we assume there are always cells waiting
to be transferred. The destination is always the
same.
[Figure: grant pointers g1, g2, g3 and accept pointers a1, a2, a3, second cycle]
At this point the sequence of events
repeats itself. Outputs 1 and 3 will always grant
input 1, while output 2 will always grant input 1
at the first iteration of the first cycle, but
input 1 will select output 1 indefinitely,
leaving output 2 to grant either input 2 or input
3. Thus the cell from input 1 to output 2 will
never be transferred.
The iSLIP algorithm was developed to solve this
starvation problem.
31
RRM
  • RRM overcomes two problems of PIM:
  • Complexity
  • Unfairness
  • The round-robin arbiters are much simpler and can
    perform faster than random arbiters.
  • The rotating priority aids the algorithm in
    assigning bandwidth equally and more fairly among
    requesting connections.
  • Its throughput is about 63%

32
2x2 switch with RRM algorithm under heavy load
  • Synchronization of output arbiters leads to a
    throughput of just 50%.
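The 50% figure can be reproduced with a small simulation. This sketch assumes that all four virtual output queues are always non-empty and that all pointers start at the same position; rrm_slot is a hypothetical helper written for this illustration.

```python
def rrm_slot(requests, g, a):
    """One RRM iteration; pointers g (grant) and a (accept) are updated in place."""
    n = len(requests)
    grants = {}
    for j in range(n):
        for k in range(n):
            i = (g[j] + k) % n
            if j in requests[i]:
                grants.setdefault(i, []).append(j)
                g[j] = (i + 1) % n  # moved even if the grant is not accepted
                break
    match = {}
    for i, outs in grants.items():
        for k in range(n):
            j = (a[i] + k) % n
            if j in outs:
                match[i] = j
                a[i] = (j + 1) % n
                break
    return match


n, slots = 2, 100
requests = [set(range(n)) for _ in range(n)]  # every queue always backlogged
g, a = [0] * n, [0] * n
served = sum(len(rrm_slot(requests, g, a)) for _ in range(slots))
throughput = served / (slots * n)
```

Because both grant pointers move together, both outputs keep granting the same input, so only one cell crosses the switch per cell time.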

33
Performance
34
Synchronization
35
iSLIP: Iterative Serial-Line IP
2. Grant - If an output receives any requests, it
chooses the one that appears next in a fixed,
round-robin schedule starting from the highest
priority element. The output notifies each input
whether or not its request was granted.
The pointer g_i to the highest priority
element of the round-robin schedule is
incremented (modulo N) to one location beyond the
granted input if and only if the grant is
accepted in Step 3 of the first iteration.
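The changed pointer rule can be sketched for a single iteration as follows. The function name islip_slot, the 0-based numbering, and the request format are assumptions for illustration, not part of the slides.

```python
def islip_slot(requests, g, a):
    """One single-iteration iSLIP cell time on an N x N switch.

    requests[i]: set of outputs input i has cells for.
    g[j] / a[i]: grant / accept pointers, updated in place.
    """
    n = len(requests)
    grants = {}  # input -> outputs that granted to it
    for j in range(n):
        for k in range(n):
            i = (g[j] + k) % n
            if j in requests[i]:
                grants.setdefault(i, []).append(j)
                break  # g[j] is not moved here
    match = {}
    for i, outs in grants.items():
        for k in range(n):
            j = (a[i] + k) % n
            if j in outs:
                match[i] = j
                a[i] = (j + 1) % n
                g[j] = (i + 1) % n  # moved only because the grant was accepted
                break
    return match


requests = [{0, 1}, {0, 1}]  # 2x2 switch, every queue backlogged
g, a = [0, 0], [0, 0]
sizes = [len(islip_slot(requests, g, a)) for _ in range(4)]
```

In this 2x2 case the output whose grant was refused keeps its pointer, so the two grant pointers separate after the first cell time and both outputs are matched from then on.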
37
iSLIP properties
  • Property 1. Lowest priority is given to the most
    recently made connection.
  • If input i successfully connects to output j,
    both a_i and g_j are updated and the connection
    from input i to output j becomes the lowest
    priority connection in the next cell time.
  • Property 2. No connection is starved. This is
    because an input will continue to request an
    output until it is successful. The output will
    serve at most N-1 other inputs first, waiting at most
    N cell times to be accepted by each input.
    Therefore, a requesting input is always served in
    less than N^2 cell times.
  • Property 3. Under heavy load, all queues with a
    common output have the same throughput. This is a
    consequence of Property 2: the output pointer
    moves to each requesting input in a fixed order,
    thus providing each with the same throughput.

38
iSLIP properties
  • Simple to implement in hardware
  • Starvation free
  • Its throughput is about 100%
  • It is fair
  • As the load increases, the number of synchronized
    arbiters decreases (see Figure), leading to a
    large match.
  • Under uniform 100% offered load the iSLIP
    arbiters adapt to a time-division multiplexing
    scheme.
  • It converges in O(1)
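The time-division multiplexing behaviour under full load can be observed in a small simulation. The sketch assumes a 4x4 switch, every queue always backlogged, and all pointers starting at position 0; islip_slot is a hypothetical helper for a single-iteration cell time.

```python
def islip_slot(requests, g, a):
    """One single-iteration iSLIP cell time; pointers are updated in place."""
    n = len(requests)
    grants = {}
    for j in range(n):
        for k in range(n):
            i = (g[j] + k) % n
            if j in requests[i]:
                grants.setdefault(i, []).append(j)
                break
    match = {}
    for i, outs in grants.items():
        for k in range(n):
            j = (a[i] + k) % n
            if j in outs:
                match[i] = j
                a[i] = (j + 1) % n
                g[j] = (i + 1) % n  # only accepted grants move the pointer
                break
    return match


n = 4
requests = [set(range(n)) for _ in range(n)]  # uniform, fully backlogged
g, a = [0] * n, [0] * n
history = [islip_slot(requests, g, a) for _ in range(8)]
sizes = [len(m) for m in history]
# Inputs served by output 0 once the arbiters have desynchronized.
served_by_output0 = [
    next(i for i, j in m.items() if j == 0) for m in history[3:7]
]
```

Under these assumptions the match grows by one connection per cell time until all four outputs are busy, after which each output visits the inputs in a fixed rotating order.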

39
Bursty Arrivals
40
Burstiness Reduction
  • Results indicate that iSLIP reduces the average
    burst length, and will tend to be more
    burst-reducing as the offered load increases.
  • This is because the probability of switching
    between multiple connections increases as the
    utilization increases.
  • As the load increases, the contention increases
    and bursts are interleaved at the output. In
    fact, if the offered load exceeds approximately
    70%, the average burst length drops to exactly
    one cell.

41
Burstiness Reduction
42
Multiple Iteration
  • The pointer g_i to the highest priority element of
    the round-robin schedule is incremented (modulo
    N) to one location beyond the granted input if
    and only if the grant is accepted in Step 3 of
    the first iteration.
  • Note that pointers g_i and a_i are only updated
    for matches found in the first iteration.
  • It converges in O(log N) iterations
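A multiple-iteration version can be sketched as below. The function name islip and its iterations parameter are hypothetical; later iterations consider only inputs and outputs that are still unmatched, and pointers move only for matches made in the first iteration.

```python
def islip(requests, g, a, iterations):
    """iSLIP with several iterations in one cell time (N x N switch)."""
    n = len(requests)
    match_in, match_out = {}, {}
    for it in range(iterations):
        grants = {}  # unmatched input -> unmatched outputs that granted to it
        for j in range(n):
            if j in match_out:
                continue
            for k in range(n):
                i = (g[j] + k) % n
                if i not in match_in and j in requests[i]:
                    grants.setdefault(i, []).append(j)
                    break
        if not grants:
            break  # nothing left to add
        for i, outs in grants.items():
            for k in range(n):
                j = (a[i] + k) % n
                if j in outs:
                    match_in[i], match_out[j] = j, i
                    if it == 0:  # pointers move only in the first iteration
                        a[i] = (j + 1) % n
                        g[j] = (i + 1) % n
                    break
    return match_in


requests = [{0, 1, 2}] * 3  # 3x3 switch, every queue backlogged
one_iter = islip(requests, [0, 0, 0], [0, 0, 0], iterations=1)
g, a = [0, 0, 0], [0, 0, 0]
three_iter = islip(requests, g, a, iterations=3)
```

With synchronized pointers a single iteration finds one connection, while three iterations fill the match; only the first-iteration connection changes the pointers.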

43
Multiple Iteration
44
All with 4 iterations
45
Implementation
46
Implementation (2N arbiters)
47
Implementation (N arbiters)
Each arbiter is used for both input and output
arbitration. In this case, each arbiter contains
two registers to hold pointers g_i and a_i.
48
Implementation
49
Priority in iSLIP
50
Why is iSLIP good for high speed?
  • Input buffers are separate
  • Separate scheduler for each input and output
  • Each works independently

51
Multicasting
  • Fanout splitting: higher throughput, but not as
    simple
  • No fanout splitting: easy, but low throughput
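The two multicast policies can be contrasted on one cell. The helper serve_multicast is hypothetical and only models which outputs receive a copy in one cell time; it is not the ESLIP scheduler.

```python
def serve_multicast(fanout, free_outputs, fanout_splitting):
    """Return (outputs served in this cell time, residual fanout still waiting)."""
    ready = fanout & free_outputs
    if fanout_splitting:
        # Deliver to whichever destinations are free; keep the rest for later.
        return ready, fanout - ready
    if ready == fanout:
        # Without splitting, the cell goes only when all destinations are free.
        return set(fanout), set()
    return set(), set(fanout)


cell_fanout = {0, 1, 2}  # the cell must reach outputs 0, 1 and 2
free = {0, 2}            # output 1 is busy in this cell time
split = serve_multicast(cell_fanout, free, fanout_splitting=True)
no_split = serve_multicast(cell_fanout, free, fanout_splitting=False)
```

With splitting, two copies leave immediately and only output 1 remains; without it, the whole cell waits and outputs 0 and 2 carry no copy of it in this cell time.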

52
Multicasting (ESLIP: combining unicast and
multicast, used in the Cisco 12000)
53
IP packet in iSLIP switch (2N^2 queues)
54
LCS Ingress Flow Control (2.5 Tb/s)
[Figure labels: Linecard, LCS, Switch Port, Switch Fabric, Switch Scheduler]
55
LCS Over Optical Fiber: 10 Gb/s Linecards
[Figure labels: 10 Gb/s Linecard, 10 Gb/s Switch Port, 2.5 Gb/s LVDS, 12 multimode fibers (shown twice), GENET Quad Serdes, LCS, Switch Fabric, Switch Scheduler]
56
2.56 Tb/s IP router
[Figure labels: Linecards, LCS, 1000 ft / 300 m, Port 1 to Port 256, 2.56 Tb/s switch core]
57
Switch core architecture
[Figure labels: Port 1, Port 256, Scheduler]