Enabling Class of Service for CIOQ Switches with Maximal Weighted Algorithms - PowerPoint PPT Presentation

Slides: 32

Provided by: sunda7

Learn more at: http://web.stanford.edu


Transcript and Presenter's Notes

Title: Enabling Class of Service for CIOQ Switches with Maximal Weighted Algorithms

1
Enabling Class of Service for CIOQ Switches with
Maximal Weighted Algorithms
Monday, March 17, 2014
Feng Wang (fwang_at_cs.stanford.edu) Siu Hong Yuen
(siuhong.yuen_at_stanford.edu)
2
Contents

Motivation
WFQ on OQ switches can provide service for
different classes.
Can we find maximal weight matching algorithms to
provide service for different classes for CIOQ
switches?
Bandwidth Metric
Simulation Environment
Algorithms used and their results
Intuition behind the result
Further work
Conclusion

3
Motivation

We know that by using WFQ, we can provide service
for different classes based on the priorities of
the classes for OQ switches.
However, OQ switches are impractical to implement
because of the high memory bandwidth and switch
fabric bandwidth they require.

4
Motivation

It has been shown that, with a speedup of 2 and a
stable marriage algorithm, CIOQ switches can
emulate OQ switches.
Can we find maximal matching algorithms that, at
a speedup of S, let a CIOQ switch provide the
same service for different classes as an OQ
switch with WFQ?

5
Contents

Motivation
Metric used
WFQ as an ideal algorithm
Using bandwidth as a quantitative metric
Simulation Environment
Algorithms used and their results
Intuition behind the result
Further work
Conclusion

6
Metric used

We used the WFQ algorithm implemented on OQ
switches as the ideal algorithm to provide
service for multiple classes.
Thus, in order to measure the effectiveness of
our algorithms, we need a quantitative metric to
compare our algorithms against the WFQ algorithm.

7
Metric used

The bandwidth metric measures whether the
distribution of bandwidth that our algorithm
produces is similar to that of WFQ.
During a time period T, we observe the
distribution of packets departing from the OQ
(using WFQ) and the CIOQ (using our algorithm).
Denote the number of class k packets departed
from output port j of the OQ as $X_{jk}$, and the
number of class k packets departed from output
port j of the CIOQ as $Y_{jk}$.

8
Metric used

For output port j:
Bandwidth used by class k for the OQ: $x_{jk} = X_{jk} / T$
Bandwidth used by class k for the CIOQ: $y_{jk} = Y_{jk} / T$
Bandwidth metric we use: BDiff (the equation is
not preserved in this transcript).
BDiff ranges from 0 to 1.
The closer BDiff is to 0, the closer we are to
emulating WFQ for OQ switches.
T is chosen as the time taken for the WFQ
algorithm to finish one round-robin cycle.
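
The per-class bandwidth measurement can be sketched in a few lines of Python. The transcript does not preserve the BDiff formula, so the distance function here is only a stand-in: it uses the total variation distance between the normalized class shares, chosen because it also ranges from 0 to 1. The names class_bandwidth and share_distance are hypothetical and are not taken from the slides.

```python
from collections import Counter

def class_bandwidth(departures, T):
    """Per-class bandwidth at one output port.

    departures: list of class ids, one per packet that left the port during T.
    Returns {class k: (number of class k departures) / T}.
    """
    counts = Counter(departures)
    return {k: n / T for k, n in counts.items()}

def share_distance(x, y):
    """Stand-in for BDiff (assumption, not the authors' formula).

    Total variation distance between the normalized class shares of x and y.
    It is 0 when the shares are identical and 1 when they do not overlap.
    """
    sx, sy = sum(x.values()), sum(y.values())
    if sx == 0 or sy == 0:
        return 0.0 if sx == sy else 1.0
    classes = set(x) | set(y)
    return 0.5 * sum(abs(x.get(k, 0) / sx - y.get(k, 0) / sy) for k in classes)

# One sampling window of T = 10 slots at a single output port.
oq_departures = [0] * 5 + [1] * 2 + [2] * 2 + [3]  # OQ with WFQ
cioq_departures = [0] * 10                          # a CIOQ that served only class 0
x = class_bandwidth(oq_departures, 10)
y = class_bandwidth(cioq_departures, 10)
print(share_distance(x, y))
```

Comparing shares rather than raw counts keeps the stand-in bounded even when the two switches send different numbers of packets in the window.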

9
Contents

Motivation
Metric used
Simulation Environment
Simulator
Switch configuration
Traffic
Sampling
Algorithms used and their results
Intuition behind the result
Further work
Conclusion

10
Simulation Environment

Simulator: SIM v2.35
Switch: 8x8, 4 classes of service with weights
5:2:2:1
Traffic models:
Bernoulli iid uniform
Bernoulli iid nonuniform overloaded traffic
Bursty uniform
Bursty nonuniform overloaded traffic
Same input traffic trace for OQ and CIOQ switches
Sample the distribution of packets for port 0
every 10 time slots

11
Contents

Motivation
Metric used
Simulation Environment
Algorithms used and their results
algo0 to algo4
Intuition behind the result
Further work
Conclusion

12
Algorithms

We came up with 5 maximal weight matching
algorithms that attempt to provide service for
multiple classes.
They are based on request-grant-accept phases
similar to iSLIP.
Each VOQij is split into P sub-queues; each
sub-queue stores the packets of one class.
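
One way to picture the per-class sub-queues is the minimal Python sketch below. The class name VOQ, its methods, and the choice to store only arrival slots are illustrative assumptions; the slides do not describe the simulator's data structures.

```python
from collections import deque

class VOQ:
    """VOQ_ij split into P FIFO sub-queues, one per class (classes 0..P-1)."""

    def __init__(self, num_classes):
        self.sub = [deque() for _ in range(num_classes)]

    def enqueue(self, cls, arrival_slot):
        # Only the arrival slot is stored; that is enough to compute waiting time.
        self.sub[cls].append(arrival_slot)

    def dequeue(self, cls):
        return self.sub[cls].popleft()

    def hol_wait(self, cls, now):
        """Slots the head-of-line packet of class cls has waited, or None if empty."""
        q = self.sub[cls]
        return now - q[0] if q else None

    def occupied_classes(self):
        return [k for k, q in enumerate(self.sub) if q]

def make_switch(n_ports, num_classes):
    """voqs[i][j] is the VOQ at input i holding packets for output j."""
    return [[VOQ(num_classes) for _ in range(n_ports)] for _ in range(n_ports)]

voqs = make_switch(8, 4)       # 8x8 switch, 4 classes, as in the simulation setup
voqs[0][3].enqueue(2, 5)       # class 2 packet for output 3 arrives at input 0 in slot 5
print(voqs[0][3].occupied_classes(), voqs[0][3].hol_wait(2, 9))
```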

13
algo0

algo0 is the most basic of the 5 algorithms, and
the subsequent algorithms build on it. algo0 is a
variation of PIM with support for different
priorities.
Request: For each output j that input i has a
packet for, it requests that output with weight
1.
Grant: If output j receives any requests, it
determines the request with the largest weight
(all the same in this case). If multiple requests
share the largest weight, ties are broken
randomly.
Accept: If input i receives any grants, it
determines the grant with the largest weight (all
the same in this case). If multiple grants share
the largest weight, ties are broken randomly.
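
A single request-grant-accept iteration with unit weights can be sketched as follows. This is a simplified illustration rather than the authors' code: the function name algo0_iteration and the dictionary-based interface are assumptions, and per-class handling is left out because every request carries weight 1.

```python
import random

def algo0_iteration(requests, rng=random):
    """One request-grant-accept iteration with unit weights.

    requests: dict mapping input port -> set of outputs it has a packet for.
    Returns a dict mapping each matched input to its output.
    """
    # Request: collect, per output, the inputs that asked for it.
    by_output = {}
    for i, outs in requests.items():
        for j in outs:
            by_output.setdefault(j, []).append(i)

    # Grant: all weights are equal, so each output picks a requester at random.
    grants = {}
    for j, inputs in by_output.items():
        chosen = rng.choice(sorted(inputs))
        grants.setdefault(chosen, []).append(j)

    # Accept: each input that received grants picks one at random.
    return {i: rng.choice(sorted(outs)) for i, outs in grants.items()}

requests = {0: {0, 1}, 1: {1}, 2: {1}}
match = algo0_iteration(requests, random.Random(1))
print(match)
```

Because each output grants one input and each input accepts one grant, the result is always a valid matching, though a single iteration need not be maximal.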

14
algo1

algo0 does not differentiate between different
requests, i.e. all requests are treated equally.
algo1 improves on that by associating a weight
with each request.
For each VOQij, we calculate $W_{ijk}$ = (weight
of class k) x (amount of time the class k packet
has waited at the HoL). Then we take the maximum
$W_{ijk}$ over all k classes for this VOQ and
assign it as $W_{ij}$, the weight of the request
from input i to output j.
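
The request weight can be computed as in the sketch below. The class weights 5, 2, 2, 1 come from the simulation setup slide, read here as one weight per class in order (an assumption); the function name request_weight and the use of None for an empty sub-queue are also illustrative.

```python
CLASS_WEIGHTS = [5, 2, 2, 1]  # assumed to map to classes 0..3 in this order

def request_weight(hol_wait, class_weights=CLASS_WEIGHTS):
    """Weight of the request from input i to output j.

    hol_wait[k] is the number of slots the head-of-line packet of class k has
    waited in this VOQ, or None if the class k sub-queue is empty.
    Returns (W_ij, k) for the class k that attains the maximum, or None if the
    VOQ holds no packets.
    """
    best = None
    for k, wait in enumerate(hol_wait):
        if wait is None:
            continue
        w_ijk = class_weights[k] * wait
        if best is None or w_ijk > best[0]:
            best = (w_ijk, k)
    return best

# Class 0 waited 1 slot, class 1 is empty, class 2 waited 4, class 3 waited 6.
print(request_weight([1, None, 4, 6]))  # (8, 2): 2 * 4 beats 5 * 1 and 1 * 6
```

Multiplying by waiting time lets a low-weight class eventually win once its packet has waited long enough.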

15
algo1

The rest of the algorithm is the same as algo0.
Request: For each output j that input i has a
packet for, it requests that output with weight
$W_{ij}$.
Grant: If output j receives any requests, it
determines the request with the largest weight.
If multiple requests share the largest weight,
ties are broken randomly.
Accept: If input i receives any grants, it
determines the grant with the largest weight. If
multiple grants share the largest weight, ties
are broken randomly.

16
algo2 and algo3

For algo0 and algo1, during the grant and accept
phases, ties are broken randomly.
This does not take into consideration which
request was granted/accepted previously.
algo2 and algo3 improve on algo0 and algo1 by
remembering previous matches in a similar way to
iSLIP.
For each output, we keep a pointer to the input
whose grant was last accepted, for every priority.
For each input, we keep a pointer to the last
accepted output, for every priority.

17
algo2

algo2 is algo0 with the pointer enhancement.
Request: For each output j that input i has a
packet for, it requests that output with weight
1.
Grant: If output j receives any requests, it
determines the request with the highest weight
(all the same in this case). Ties are broken
first by priority. If multiple requests are of
the same priority, we do the following: the
input last granted for this priority is least
preferred. We then grant the input that is most
preferred (in the round-robin definition).
Accept: If input i receives any grants, it
determines the grant with the highest weight (all
the same in this case). If there are ties, we
first select a priority to accept randomly. If
there are multiple grants with this priority, we
do the following: the output last accepted for
this priority is least preferred. We then accept
the output that is most preferred (in the
round-robin definition).
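
The round-robin preference can be sketched as a small helper: the port just after the last-served port is most preferred, and the last-served port itself is least preferred. The name round_robin_pick and its signature are assumptions made for illustration.

```python
def round_robin_pick(candidates, last, n_ports):
    """Pick the most preferred port among candidates in round-robin order.

    last is the port served most recently for this priority (None if none yet).
    Preference starts at last + 1 and wraps around, so last comes last.
    """
    if not candidates:
        return None
    start = 0 if last is None else (last + 1) % n_ports
    return min(candidates, key=lambda p: (p - start) % n_ports)

# Ports 1, 4 and 6 compete on an 8-port switch; port 4 was served last.
print(round_robin_pick({1, 4, 6}, last=4, n_ports=8))  # 6
```

A scheduler would keep one such pointer per priority at every output (for grants) and at every input (for accepts).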

18
algo3

algo3 is algo1 with the pointer enhancement.
Request: For each output j that input i has a
packet for, it requests that output with weight
$W_{ij}$.
Grant: If output j receives any requests, it
determines the request with the highest weight.
Ties are broken first by priority. If multiple
requests are of the same priority, we do the
following: the input last granted for this
priority is least preferred. We then grant the
input that is most preferred (in the round-robin
definition).
Accept: If input i receives any grants, it
determines the grant with the highest weight. If
there are ties, we first select a priority to
accept randomly. If there are multiple grants
with this priority, we do the following: the
output last accepted for this priority is least
preferred. We then accept the output that is most
preferred (in the round-robin definition).
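
The grant step with its three levels of tie-breaking (weight, then priority, then round-robin pointer) might look like the sketch below. Two points are assumptions: a smaller priority number is treated as the higher priority, and the pointer table is updated by the caller only after the grant is accepted. The function name algo3_grant is hypothetical.

```python
def algo3_grant(requests, last_granted, n_ports):
    """Grant decision at one output.

    requests: list of (input_port, weight, priority) tuples received.
    last_granted: dict priority -> input whose grant was last accepted.
    Returns (input_port, priority) or None if there are no requests.
    """
    if not requests:
        return None
    # 1. Keep only the requests with the highest weight.
    top = max(w for _, w, _ in requests)
    tied = [(i, p) for i, w, p in requests if w == top]
    # 2. Break ties by priority (assumption: smaller number = higher priority).
    prio = min(p for _, p in tied)
    inputs = [i for i, p in tied if p == prio]
    # 3. Break remaining ties round-robin: the last granted input is least preferred.
    last = last_granted.get(prio)
    start = 0 if last is None else (last + 1) % n_ports
    chosen = min(inputs, key=lambda i: (i - start) % n_ports)
    return chosen, prio

reqs = [(0, 8, 1), (3, 8, 1), (5, 8, 2), (6, 3, 0)]
print(algo3_grant(reqs, {1: 0}, n_ports=8))  # (3, 1)
```

In the example, inputs 0, 3 and 5 tie at weight 8, priority 1 wins over priority 2, and input 3 is chosen because input 0 was granted last.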

19
algo4

algo2 and algo3 rotate the pointer for the
preferred input port to grant and the preferred
output port to accept.
Instead of having a pointer that rotates
regularly, algo4 tries to rotate the preference
depending on the weight of each class. It
attempts to rotate the pointer similar to WFQ,
where the pointer stays at a particular preferred
port depending on the schedule determined by WFQ.

20
algo4

Request: For each output j that input i has a
packet for, it requests that output with a bitmap
showing which priorities have a packet.
Grant: Output j maintains a preferred priority,
which is updated in a way similar to WFQ for
accepted grants. Assume the preferred
priority for output j is k. Output j checks which
of the received requests have a packet with
priority k. If multiple inputs have priority k
packets, ties are broken randomly. If no input
has priority k packets, output j updates its
preferred priority to the next one.
Accept: Input i also maintains a preferred
priority, which is updated in a way similar to
WFQ for accepted requests. Assume the preferred
priority for input i is k. If input i receives
any grants, it finds the grant with priority k.
If multiple grants have priority k, ties are
broken randomly. If no grant has priority k, the
preferred priority is updated to the next one.
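
The grant phase with a preferred priority can be sketched as below. The slides only say the preferred priority is updated "in a way similar to WFQ", so the schedule here is an assumed stand-in: a weighted round-robin that stays on class k for weights[k] accepted matches before moving on. The names PreferredPriority and algo4_grant are hypothetical.

```python
import random

class PreferredPriority:
    """Preferred class at one port, rotated by a weighted round-robin (assumption)."""

    def __init__(self, weights):
        self.weights = list(weights)
        self.k = 0
        self.credit = self.weights[0]

    def advance(self):
        """Move to the next class and refill its credit."""
        self.k = (self.k + 1) % len(self.weights)
        self.credit = self.weights[self.k]

    def served(self):
        """Call once each time a match for the preferred class is accepted."""
        self.credit -= 1
        if self.credit == 0:
            self.advance()

def algo4_grant(bitmaps, pref, rng=random):
    """Grant decision at one output.

    bitmaps: dict input_port -> set of priorities with a packet for this output.
    Returns (input_port, priority) or None if no input has any packet.
    """
    for _ in range(len(pref.weights)):
        inputs = sorted(i for i, prios in bitmaps.items() if pref.k in prios)
        if inputs:
            return rng.choice(inputs), pref.k  # ties broken randomly
        pref.advance()  # no input has the preferred class: try the next one
    return None

pref = PreferredPriority([5, 2, 2, 1])
grant = algo4_grant({0: {1}, 2: {1, 3}}, pref, random.Random(0))
print(grant, pref.k)
```

The accept phase at each input would mirror this with its own PreferredPriority instance, choosing among received grants instead of requests.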

21
Result: Bernoulli iid uniform
22
Result: Bernoulli iid nonuniform
23
Result: Bursty uniform
24
Result: Bursty nonuniform
25
Results

In most cases, algo1 is better than algo0, and
algo3 is better than algo2.
algo3 is not always better than algo1.
algo4 is not always better than algo1 and algo3.
As speedup increases, the results of the
different algorithms converge.
For speedup > 4, BDiff = 0.

26
Content

Motivation
Bandwidth Metric
Simulation Environment
Algorithms used and their results
Intuition behind the result
Weight information is helpful
Size of matching is not helpful
WFQ on both input and output side is not helpful
Speedup for BDiff = 0
Further work
Conclusion

27
Intuition behind the result

Adding the weight information to the algorithms
helps the scheduler make better decisions when
serving different classes.
Compared with algo0 and algo1, algo2 and algo3
improve the size of the matching because they
desynchronize the grants to different ports.
However, we observed that algo2 and algo3 did not
improve the BDiff metric. So a larger matching
does not help in serving different classes.
Implementing WFQ on both the input and output
ports to select grants and accepts does not help
make better decisions. Intuitively, WFQ on the
output side may help make better decisions, but
we should perhaps use other criteria to break
ties on the input side.

28
Intuition behind the result

BDiff = 0 for speedup > 4. 4 is the number of
classes in our test, so perhaps BDiff = 0
whenever speedup > number of classes. However,
in a couple of tests with 5 classes, BDiff = 0
for speedup > 4 still held.

29
Content

Motivation
Bandwidth Metric
Simulation Environment
Algorithms used and their results
Intuition behind the result
Further work
Latency metric
Existence of a constant Speedup S for BDiff = 0?
Conclusion

30
Future work

Besides the bandwidth allocated to different
classes of service, latency is another metric
of how good an algorithm is. Define a latency
metric as how close the latency of packets of
different classes is to that of the OQ switch,
and measure it for the different algorithms.
Investigate further whether there exists a
constant speedup S at which a CIOQ switch can
emulate OQ WFQ in the service rate for different
classes. This needs more theoretical analysis.

31
Conclusion

We define a metric to evaluate the capability
of algorithms to provide class of service. The
metric is measured for different algorithms.
The results suggest that weight information
in selecting grants and accepts is helpful at
smaller speedups. As speedup increases, the
difference between algorithms becomes less
pronounced. So there is a trade-off between
algorithm simplicity and speedup.
Among all the algorithms we tried, algo1 is good
enough to provide a good service rate for
different classes. algo3 and algo4 do not
improve on algo1.
It is possible to find a maximal matching
algorithm with a certain speedup for a CIOQ
switch to emulate OQ WFQ for the service rate of
different classes.