Enabling Class of Service for CIOQ Switches with Maximal Weighted Algorithms


1
Enabling Class of Service for CIOQ Switches with
Maximal Weighted Algorithms
Monday, March 17, 2014
Feng Wang (fwang_at_cs.stanford.edu) Siu Hong Yuen
(siuhong.yuen_at_stanford.edu)
2
Contents
  • Motivation
  • WFQ on OQ switches can provide service for
    different classes.
  • Can we find maximal weight matching algorithms to
    provide service for different classes for CIOQ
    switches?
  • Bandwidth Metric
  • Simulation Environment
  • Algorithms used and their results
  • Intuition behind the result
  • Further work
  • Conclusion

3
Motivation
  • We know that by using WFQ, we can provide service
    for different classes based on the priorities of
    the classes for OQ switches.
  • However, OQ switches are impractical to implement
    because of the high memory bandwidth and fabric
    switch bandwidth required.

4
Motivation
  • It has been shown that, with a speedup of 2 and a
    stable marriage algorithm, CIOQ switches can
    emulate OQ switches.
  • Can we find maximal matching algorithms that let
    a CIOQ switch at a speedup of S provide the same
    service for different classes as an OQ switch
    with WFQ?

5
Contents
  • Motivation
  • Metric used
  • WFQ as an ideal algorithm
  • Using bandwidth as a quantitative metric
  • Simulation Environment
  • Algorithms used and their results
  • Intuition behind the result
  • Further work
  • Conclusion

6
Metric used
  • We used the WFQ algorithm implemented on OQ
    switches as the ideal algorithm to provide
    service for multiple classes.
  • Thus, in order to measure the effectiveness of
    our algorithms, we need a quantitative metric to
    compare our algorithms against the WFQ algorithm.

7
Metric used
  • The bandwidth metric measures whether the
    distribution of bandwidth that our algorithm
    produces is similar to that of WFQ.
  • During a time period T, we observe the
    distribution of packets departing from the OQ
    (using WFQ) and the CIOQ (using our algorithm).
  • Denote the number of class k packets departed
    from output port j of the OQ as Xjk, and the
    number of class k packets departed from output
    port j of the CIOQ as Yjk.

8
Metric used
  • For output port j:
    Bandwidth used by class k for the OQ: xjk = Xjk / T
    Bandwidth used by class k for the CIOQ: yjk = Yjk / T
  • Bandwidth metric we use: BDiff (the formula itself
    is not preserved in this transcript).
  • BDiff ranges from 0 to 1.
  • The closer BDiff is to 0, the closer we are to
    emulating WFQ for OQ switches.
  • T is chosen as the time taken for the WFQ
    algorithm to finish one round-robin cycle.
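The per-class bandwidths defined above are simple ratios, and can be sketched in a few lines. The transcript does not preserve the BDiff formula, so `bdiff_sketch` below is only a hypothetical stand-in: a normalized absolute difference chosen because it also lies between 0 and 1 and equals 0 when the two distributions match. The function names and the sample counts are assumptions for illustration.

```python
def bandwidth_shares(counts, T):
    """Per-class bandwidth at one output: packets departed in the window / T."""
    return [c / T for c in counts]


def bdiff_sketch(X, Y, T):
    """Hypothetical stand-in for BDiff (NOT the formula from the slides).

    Returns a value in [0, 1]: 0 when the OQ and CIOQ distributions are
    identical, 1 when they do not overlap at all.
    """
    x = bandwidth_shares(X, T)
    y = bandwidth_shares(Y, T)
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(x, y)) / total


# One observation window of T = 10 time slots at a single output port.
X = [5, 2, 2, 1]  # departures per class from the OQ with WFQ (made-up values)
Y = [4, 3, 2, 1]  # departures per class from the CIOQ (made-up values)
print(bandwidth_shares(X, 10))
print(bdiff_sketch(X, Y, 10))
```

Any other distance between the two distributions could be substituted for `bdiff_sketch` without changing how xjk and yjk are computed.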


9
Contents
  • Motivation
  • Metric used
  • Simulation Environment
  • Simulator
  • Switch configuration
  • Traffic
  • Sampling
  • Algorithms used and their results
  • Intuition behind the result
  • Further work
  • Conclusion

10
Simulation Environment
  • Simulator: SIM v2.35
  • Switch: 8x8, 4 classes of service with weights
    5:2:2:1
  • Traffic models:
  • Bernoulli iid uniform
  • Bernoulli iid nonuniform overloaded traffic
  • Bursty uniform
  • Bursty nonuniform overloaded traffic
  • Same input traffic trace for OQ and CIOQ switches
  • Sample the distribution of packets for port 0
    every 10 time slots
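As a rough illustration of the simplest traffic model in the list, the sketch below generates one time slot of Bernoulli iid uniform arrivals. It is not the SIM simulator's generator. The function name, the `load` parameter, and the choice to draw the class of service uniformly are all assumptions, since the slides do not say how classes are assigned to packets.

```python
import random


def bernoulli_uniform_arrivals(n_ports, n_classes, load, rng):
    """Arrivals for one time slot as (input, output, class) tuples.

    Each input independently receives a packet with probability `load`;
    the destination output is uniform over all ports. Drawing the class
    uniformly is an assumption made for this sketch.
    """
    arrivals = []
    for i in range(n_ports):
        if rng.random() < load:
            arrivals.append((i, rng.randrange(n_ports), rng.randrange(n_classes)))
    return arrivals


rng = random.Random(1)  # fixed seed so the same trace can be replayed
trace = [bernoulli_uniform_arrivals(8, 4, 0.9, rng) for _ in range(10)]
```

Seeding the generator and storing the trace is one way to feed the same input traffic to both the OQ and the CIOQ model.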

11
Contents
  • Motivation
  • Metric used
  • Simulation Environment
  • Algorithms used and their results
  • algo0 to algo4
  • Intuition behind the result
  • Further work
  • Conclusion

12
Algorithms
  • We came up with 5 maximal weight matching
    algorithms that attempt to provide service for
    multiple classes.
  • They are based on the request-grant-accept phases
    similar to iSLIP.
  • Each VOQij is split into P sub-queues; each
    sub-queue stores the packets of one class.
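One possible way to lay out the per-class sub-queues is a three-level nested list indexed by input, output, and class. This is only a sketch of the data structure; `make_voqs` and the packet label are hypothetical names.

```python
from collections import deque


def make_voqs(n_ports, n_classes):
    """voq[i][j][k] is the FIFO of class-k packets at input i bound for output j."""
    return [[[deque() for _ in range(n_classes)]
             for _ in range(n_ports)]
            for _ in range(n_ports)]


voq = make_voqs(n_ports=8, n_classes=4)
voq[0][3][1].append("pkt-a")  # class-1 packet from input 0 to output 3
hol = voq[0][3][1][0]         # head-of-line packet of that sub-queue
```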

13
algo0
  • algo0 is the most basic of the 5 algorithms; the
    subsequent algorithms build on it. algo0 is a
    variation of PIM with support for different
    priorities.
  • Request: For each output j that input i has a
    packet for, it requests that output with weight
    1.
  • Grant: If output j receives any requests, it
    determines the request with the largest weight
    (all the same in this case). If multiple requests
    share the largest weight, ties are broken
    randomly.
  • Accept: If input i receives any grants, it
    determines the grant with the largest weight (all
    the same in this case). If multiple grants share
    the largest weight, ties are broken randomly.
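The three phases above can be sketched as a single request-grant-accept pass. Because all weights are 1, "largest weight" reduces to a random choice. The function name and the boolean-matrix input are assumptions, and the sketch runs only one iteration, since the slides do not say how many iterations are used.

```python
import random


def algo0_iteration(has_packet, rng):
    """One request-grant-accept pass with unit weights.

    has_packet[i][j] is True if input i holds a packet for output j.
    Returns a dict mapping each matched input to its output.
    """
    n = len(has_packet)
    # Request: every input requests every output it has a packet for.
    requests = {j: [i for i in range(n) if has_packet[i][j]] for j in range(n)}
    # Grant: each output picks one requesting input; weights tie, so pick randomly.
    grants = {}
    for j, inputs in requests.items():
        if inputs:
            grants.setdefault(rng.choice(inputs), []).append(j)
    # Accept: each input picks one of the grants it received, again randomly.
    return {i: rng.choice(outputs) for i, outputs in grants.items()}


demand = [[True, True, False],
          [True, False, False],
          [False, False, True]]
match = algo0_iteration(demand, random.Random(0))
```

Each output grants at most one input and each input accepts at most one grant, so the result is always a valid (though not necessarily maximal) matching.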

14
algo1
  • algo0 does not differentiate between different
    requests, i.e. all requests are treated equally.
  • algo1 improves on that by associating a weight
    with each request.
  • For each VOQij, we calculate Wijk = (weight of
    class k) x (amount of time the class k packet has
    waited at the HoL). Then we take the maximum Wijk
    over all k classes for this VOQ and assign this
    as Wij, the weight of the request from input i to
    output j.
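The weight calculation can be illustrated with a short helper. The name `request_weight` and the convention of using `None` for an empty sub-queue are assumptions for this sketch, and the sample values are made up.

```python
def request_weight(class_weights, hol_wait):
    """Wij = max over classes k of class_weights[k] * hol_wait[k].

    hol_wait[k] is how long the head-of-line packet of class k has waited,
    or None if that sub-queue is empty. Returns None if the whole VOQ is empty.
    """
    candidates = [w * t for w, t in zip(class_weights, hol_wait) if t is not None]
    return max(candidates) if candidates else None


# Four classes; the class-2 sub-queue is empty.
w_ij = request_weight([5, 2, 2, 1], [1, 4, None, 6])  # max(5*1, 2*4, 1*6)
```

In this example the class with weight 2 wins because its packet has waited long enough to outweigh the higher-weight class.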

15
algo1
  • The rest of the algorithm is the same as algo0.
  • Request: For each output j that input i has a
    packet for, it requests that output with weight
    Wij.
  • Grant: If output j receives any requests, it
    determines the request with the largest weight.
    If multiple requests share the largest weight,
    ties are broken randomly.
  • Accept: If input i receives any grants, it
    determines the grant with the largest weight. If
    multiple grants share the largest weight, ties
    are broken randomly.

16
algo2 and algo3
  • For algo0 and algo1, during the grant and accept
    phases, ties are broken randomly.
  • This does not take into consideration which
    request was granted/accepted previously.
  • algo2 and algo3 improve on algo0 and algo1 by
    remembering previous matches in a similar way to
    iSLIP.
  • For each output, we keep a pointer to the input
    whose grant was last accepted, for every
    priority.
  • For each input, we keep a pointer to the last
    accepted output, for every priority.

17
algo2
  • algo2 is algo0 with the pointer enhancement.
  • Request: For each output j that input i has a
    packet for, it requests that output with weight
    1.
  • Grant: If output j receives any requests, it
    determines the request with the highest weight
    (all the same in this case). Ties are broken
    first by priority. If multiple requests are of
    the same priority, we do the following: the
    input last granted for this priority is least
    preferred, and we grant the input that is most
    preferred (in the round-robin definition).
  • Accept: If input i receives any grants, it
    determines the grant with the highest weight (all
    the same in this case). If there are ties, we
    first select a priority to accept randomly. If
    there are multiple grants with this priority, we
    do the following: the output last accepted for
    this priority is least preferred, and we accept
    the output that is most preferred (in the
    round-robin definition).
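The round-robin preference used for tie-breaking can be sketched as follows: the port served last for a priority is least preferred, and the next port in cyclic order is most preferred. The function name and the use of -1 as the initial pointer value are assumptions.

```python
def round_robin_pick(candidates, last_served, n_ports):
    """Pick the candidate port that follows `last_served` most closely in
    cyclic order, so that `last_served` itself is the least preferred.

    With last_served = -1 (nothing served yet), port 0 is most preferred.
    """
    return min(candidates, key=lambda p: (p - last_served - 1) % n_ports)


# One pointer per priority, as in the per-priority pointers described above.
last_granted = {0: 2, 1: -1}                    # priority -> port served last
choice = round_robin_pick([0, 2], last_granted[0], n_ports=4)
last_granted[0] = choice                        # update only once accepted
```

In iSLIP the pointer moves only when a grant is actually accepted; the update on the last line is shown unconditionally just to keep the example short.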

18
algo3
  • algo3 is algo1 with the pointer enhancement.
  • Request: For each output j that input i has a
    packet for, it requests that output with weight
    Wij.
  • Grant: If output j receives any requests, it
    determines the request with the highest weight.
    Ties are broken first by priority. If multiple
    requests are of the same priority, we do the
    following: the input last granted for this
    priority is least preferred, and we grant the
    input that is most preferred (in the round-robin
    definition).
  • Accept: If input i receives any grants, it
    determines the grant with the highest weight. If
    there are ties, we first select a priority to
    accept randomly. If there are multiple grants
    with this priority, we do the following: the
    output last accepted for this priority is least
    preferred, and we accept the output that is most
    preferred (in the round-robin definition).

19
algo4
  • algo2 and algo3 rotate the pointer for the
    preferred input port to grant and the preferred
    output port to accept.
  • Instead of having a pointer that rotates
    regularly, algo4 tries to rotate the preference
    depending on the weight of each class. It
    attempts to rotate the pointer similar to WFQ,
    where the pointer stays at a particular preferred
    port depending on the schedule determined by WFQ.

20
algo4
  • Request: For each output j that input i has a
    packet for, it requests that output with a bitmap
    showing which priorities have a packet.
  • Grant: Output j maintains a preferred priority,
    which is updated in a way similar to WFQ for each
    accepted grant. Assume the preferred priority for
    output j is k. Output j checks whether any
    received request has a packet with priority k. If
    multiple inputs have priority k packets, ties are
    broken randomly. If no input has a priority k
    packet, output j updates its preferred priority
    to the next one.
  • Accept: Input i also maintains a preferred
    priority, which is updated in a way similar to
    WFQ for each accepted request. Assume the
    preferred priority for input i is k. If input i
    receives any grants, it finds the grant with
    priority k. If multiple grants have priority k,
    ties are broken randomly. If no grant has
    priority k, the preferred priority is updated to
    the next one.
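The grant phase described above might look roughly like the sketch below. The slides do not spell out the "similar to WFQ" update, so `PreferredPriority` uses a plain weighted round-robin credit counter as a stand-in. The class, its methods, the use of a Python set in place of a bitmap, and the choice to keep searching later priorities within the same time slot are all assumptions.

```python
import random


class PreferredPriority:
    """Hypothetical weighted round-robin stand-in for the WFQ-like update:
    priority k stays preferred for weights[k] accepted grants, then rotates."""

    def __init__(self, weights):
        self.weights = weights
        self.k = 0
        self.credit = weights[0]

    def advance(self):
        """Move to the next priority and refill its credit."""
        self.k = (self.k + 1) % len(self.weights)
        self.credit = self.weights[self.k]

    def served(self):
        """Call when a grant for the preferred priority was accepted."""
        self.credit -= 1
        if self.credit == 0:
            self.advance()


def algo4_grant(request_bitmaps, pref, rng):
    """request_bitmaps maps input -> set of priorities it has for this output.
    Returns (input, priority), or None if there are no requests."""
    for _ in range(len(pref.weights)):
        k = pref.k
        inputs = sorted(i for i, classes in request_bitmaps.items() if k in classes)
        if inputs:
            return rng.choice(inputs), k  # ties among inputs broken randomly
        pref.advance()                    # nobody has priority k: try the next
    return None


pref = PreferredPriority([5, 2, 2, 1])
grant = algo4_grant({0: {1}, 3: {1, 2}}, pref, random.Random(0))
```

The accept phase at each input would mirror this with its own `PreferredPriority` instance, choosing among the grants it received.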

21
Result: Bernoulli iid uniform
22
Result: Bernoulli iid nonuniform
23
Result: Bursty uniform
24
Result: Bursty nonuniform
25
Results
  • In most cases, algo1 is better than algo0 and
    algo3 is better than algo2.
  • algo3 is not always better than algo1.
  • algo4 is not always better than algo1 and algo3.
  • As speedup increases, the results of the
    different algorithms converge.
  • For speedup > 4, BDiff = 0.

26
Content
  • Motivation
  • Bandwidth Metric
  • Simulation Environment
  • Algorithms used and their results
  • Intuition behind the result
  • Weight information is helpful
  • Size of matching is not helpful
  • WFQ on both input and output side is not helpful
  • Speedup for BDiff = 0
  • Further work
  • Conclusion

27
Intuition behind the result
  • Adding the weight information to the algorithms
    helps the scheduler make better decisions when
    serving different classes.
  • Compared with algo0 and algo1, algo2 and algo3
    improve the size of the matching because they
    desynchronize the grants to different ports.
    However, we observed that algo2 and algo3 did not
    improve the BDiff metric, so a larger matching
    does not help in serving different classes.
  • Implementing WFQ on both the input and output
    ports to select grants and accepts does not lead
    to better decisions. Intuitively, WFQ on the
    output side may help, but we should perhaps use
    other criteria to break ties on the input side.

28
Intuition behind the result
  • BDiff = 0 for speedup > 4, and 4 is the number of
    classes in our test. So perhaps BDiff = 0
    whenever speedup > number of classes. However, in
    a couple of tests with 5 classes, BDiff = 0 for
    speedup > 4 still held.

29
Content
  • Motivation
  • Bandwidth Metric
  • Simulation Environment
  • Algorithms used and their results
  • Intuition behind the result
  • Further work
  • Latency metric
  • Existence of a constant speedup S for BDiff = 0?
  • Conclusion

30
Future work
  • Besides the bandwidth allocated to different
    classes of service, latency is another measure
    of how good an algorithm is. Define a latency
    metric as how close the latency of packets of
    each class is to that of the OQ switch, and
    measure it for the different algorithms.
  • Investigate further whether there exists a
    constant speedup S at which a CIOQ switch can
    emulate OQ WFQ in the service rate for different
    classes. This needs more theoretical analysis.

31
Conclusion
  • We define a metric to evaluate the capability of
    algorithms to provide class of service, and
    measure it for different algorithms.
  • The results suggest that using weight information
    when selecting grants and accepts is helpful at
    smaller speedups. As speedup increases, the
    difference between algorithms becomes less
    obvious. So there is a trade-off between
    algorithm simplicity and speedup.
  • Among all the algorithms we tried, algo1 is good
    enough to provide a good service rate for
    different classes. algo3 and algo4 do not
    improve on algo1.
  • It is possible to find a maximal matching
    algorithm that, with a certain speedup, lets a
    CIOQ switch emulate OQ WFQ in the service rate of
    different classes.