O1TURN: Near-Optimal Worst-Case Throughput Routing for 2D-Mesh Networks

1
O1TURN: Near-Optimal Worst-Case Throughput Routing for 2D-Mesh Networks
  • DaeHo Seo, Akif Ali, WonTaek Lim
  • Nauman Rafique, Mithuna Thottethodi
  • School of Electrical and Computer Engineering
  • Purdue University

2
Motivation
  • New routing algorithm for 2D mesh networks: O1TURN
  • Why 2D mesh networks?
  • Important class of interconnection networks
  • Natural topology for on-chip networks
  • Many applications
  • Why yet another routing algorithm?

3
Routing Algorithm Objectives
  • Maximize throughput and minimize latency
  • O1TURN satisfies all design goals

IDEAL DOR ROMM VALIANT MIN-ADAPTIVE
Average-case throughput X X X
Worst-case throughput X X ?
Minimal number of network hops X X X X
Low-complexity router X X X
4
Challenges
  • Intuition: path flexibility, load balancing, and throughput are correlated
  • Prior results
  • Throughput: increasing path flexibility (Towles and Dally, SPAA 2002)
  • May not improve worst-case throughput, and may even decrease it
  • Likely to improve average-case throughput
  • Latency: increasing path flexibility may increase router complexity

IDEAL DOR ROMM VALIANT MIN-ADAPTIVE
Average-case throughput X X X
Worst-case throughput X X ?
Minimal number of network hops X X X X
Low-complexity router X X X
Number of paths ? 1 Θ(K^2) Θ(K^2) Θ(2^K)
5
Contributions
  • Develop a new routing algorithm, O1TURN
  • Throughput
  • Better than DOR / ROMM for worst-case throughput
  • Near-optimal worst-case throughput for 2D mesh
  • Captures most of the average-case throughput opportunity with limited path flexibility
  • O1TURN (with 2 paths) is as good as ROMM (with Θ(K^2) paths)
  • Latency
  • Router implementation for O1TURN
  • Complexity comparable to a simple DOR router
  • Key point
  • Partition the delay-critical circuitry
  • O1TURN is minimal: one goal trivially satisfied

6
Outline
  • Background of interconnection network
  • O1TURN routing algorithm
  • O1TURN router implementation
  • Simulation Results
  • Conclusion and Q&A

7
Outline
  • Background of interconnection network
  • O1TURN routing algorithm
  • O1TURN router implementation
  • Simulation Results
  • Conclusion and Q&A

8
Background
  • Packet-switched 2D mesh network
  • Each packet is routed independently
  • Terminology
  • Network radix: K in a K x K network (NOT the node degree)
  • Simplifying assumptions for this talk
  • One packet crosses a link in one cycle
  • Square mesh networks (K x K)
  • K is even (K = 2p)
  • Analytical method for throughput analysis
  • TD method: Towles and Dally, SPAA 2002
  • Worst-case throughput = (maximum channel load)^-1
  • Given a permutation and an (oblivious) routing algorithm
  • Find the maximum channel load
  • Given only an (oblivious) routing algorithm
  • Find the permutation that causes the maximum channel load
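The first TD-method step (a given permutation plus a given oblivious routing algorithm) can be sketched in Python. This is a minimal illustration, not the authors' tool. It assumes (row, col) node coordinates, one unit of traffic per source, and XY dimension-order routing as the example algorithm. The helper names (xy_route, dor, channel_loads, worst_case_throughput) are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Node = Tuple[int, int]          # (row, col) on a K x K mesh
Channel = Tuple[Node, Node]     # directed link (from_node, to_node)
Routes = Callable[[Node, Node], List[Tuple[float, List[Node]]]]


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Dimension-order (XY) route: correct the column first, then the row."""
    (r, c), (dr, dc) = src, dst
    path = [(r, c)]
    while c != dc:
        c += 1 if dc > c else -1
        path.append((r, c))
    while r != dr:
        r += 1 if dr > r else -1
        path.append((r, c))
    return path


def dor(src: Node, dst: Node) -> List[Tuple[float, List[Node]]]:
    """Deterministic routing: the XY path with probability 1."""
    return [(1.0, xy_route(src, dst))]


def channel_loads(perm: Dict[Node, Node], routes: Routes) -> Dict[Channel, float]:
    """Expected load per directed channel when each source injects one unit
    of traffic toward perm[source]."""
    load: Dict[Channel, float] = defaultdict(float)
    for src, dst in perm.items():
        for prob, path in routes(src, dst):
            for link in zip(path, path[1:]):
                load[link] += prob
    return load


def worst_case_throughput(perm: Dict[Node, Node], routes: Routes) -> float:
    """TD-style throughput for one permutation: 1 / (maximum channel load)."""
    return 1.0 / max(channel_loads(perm, routes).values())


K = 4
transpose = {(r, c): (c, r) for r in range(K) for c in range(K)}
print(max(channel_loads(transpose, dor).values()))  # 3.0
print(worst_case_throughput(transpose, dor))
```

For the transpose permutation on a 4 x 4 mesh, XY routing puts K - 1 = 3 packets on one channel, so this permutation alone limits DOR to 1/3 packets / node / cycle.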

9
TD-Method Example
Unit of worst-case throughput: packets / node / cycle
Traffic (Src -> Dst): A -> D, D -> A
  • Two routes per flow, each used with probability 0.5: A -> B -> D, A -> C -> D and D -> B -> A, D -> C -> A
  • Max channel load = 0.5
  • Worst-case throughput = 1 / 0.5 = 2
  • One route per flow: A -> B -> D and D -> C -> A
  • Max channel load = 1
  • Worst-case throughput = 1 / 1 = 1
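The two cases of this example can be reproduced with a few lines of Python. This is an illustrative sketch: paths are written as strings of node labels, each flow carries one unit of traffic split across its routes, and the assignment of routes to the two cases is an assumption based on the figure labels.

```python
from collections import defaultdict


def max_channel_load(flows):
    """flows: (probability, path) pairs; returns the largest load on any
    directed link, where a path like "ABD" uses links A->B and B->D."""
    load = defaultdict(float)
    for prob, path in flows:
        for link in zip(path, path[1:]):
            load[link] += prob
    return max(load.values())


# Case 1: A -> D and D -> A are each split evenly over two routes.
split = [
    (0.5, "ABD"), (0.5, "ACD"),  # A -> D
    (0.5, "DBA"), (0.5, "DCA"),  # D -> A
]
# Case 2: each flow takes a single route.
single = [(1.0, "ABD"), (1.0, "DCA")]

for name, flows in (("split", split), ("single", single)):
    m = max_channel_load(flows)
    print(name, "max load", m, "throughput", 1 / m)
# split max load 0.5 throughput 2.0
# single max load 1.0 throughput 1.0
```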
10
Outline
  • Background of interconnection network
  • O1TURN routing algorithm
  • O1TURN router implementation
  • Simulation Results
  • Conclusion and Q&A

11
O1TURN routing algorithm
  • Orthogonal 1-TURN routing
  • No U-turns -> "Orthogonal"
  • At most 1 turn -> "1TURN"
  • Use 2 routes
  • At most 2 minimal, 1-turn routes in a 2D mesh (XY, YX)
  • Two routing algorithms (XY routing, YX routing)
  • Chosen with equal probability
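The route choice described above can be sketched in Python. This is a minimal sketch, not the authors' router logic. It assumes (row, col) coordinates, treats "XY" as column-first, and assumes the XY/YX choice is made once per packet at the source. The helper names are hypothetical. count_turns is included only to check the "at most 1 turn" property.

```python
import random
from typing import List, Tuple

Node = Tuple[int, int]  # (row, col)


def walk(a: Node, b: Node) -> List[Node]:
    """Nodes visited moving in a straight line from a to b (excluding a);
    a and b must share a row or a column."""
    (r, c), (br, bc) = a, b
    steps = []
    while (r, c) != (br, bc):
        r += (br > r) - (br < r)  # step -1, 0 or +1 toward the target row
        c += (bc > c) - (bc < c)  # step -1, 0 or +1 toward the target column
        steps.append((r, c))
    return steps


def one_turn_route(src: Node, dst: Node, x_first: bool) -> List[Node]:
    """Minimal route with at most one turn: XY (column first) or YX (row first)."""
    corner = (src[0], dst[1]) if x_first else (dst[0], src[1])
    return [src] + walk(src, corner) + walk(corner, dst)


def o1turn_route(src: Node, dst: Node, rng: random.Random) -> Tuple[str, List[Node]]:
    """O1TURN: pick XY or YX with equal probability for this packet."""
    x_first = rng.random() < 0.5
    return ("XY" if x_first else "YX"), one_turn_route(src, dst, x_first)


def count_turns(path: List[Node]) -> int:
    """Number of direction changes along a path."""
    dirs = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(path, path[1:])]
    return sum(d1 != d2 for d1, d2 in zip(dirs, dirs[1:]))


rng = random.Random(7)
print(o1turn_route((0, 0), (2, 3), rng))
```

Because both candidate routes are minimal and turn only at the corner node, no U-turn is ever possible, which is where the "Orthogonal" part of the name comes from.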

12
O1TURN routing algorithm
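  • Claim: the maximum channel load of O1TURN is K / 2
  • Proof: two sources of load on a channel (e.g., an eastbound link with N nodes to its left in its row)
  • XY routing: the N source nodes on the left side of the channel, each using it with probability 0.5
  • YX routing: the K - N destination nodes on the right side of the channel, each using it with probability 0.5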

$\text{channel load} \le 0.5\,N + 0.5\,(K - N) = K/2$ (first term: XY routing; second term: YX routing)
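The K / 2 bound can be sanity-checked numerically. This sketch assumes the same (row, col) conventions as before, with hypothetical helper names. It computes the expected channel loads of O1TURN (each packet takes XY or YX with probability 0.5) for random permutations on a 4 x 4 mesh, and for a half-row shift that reaches the bound.

```python
import random
from collections import defaultdict


def walk(a, b):
    """Straight-line hops from a to b; a and b share a row or a column."""
    (r, c), (br, bc) = a, b
    steps = []
    while (r, c) != (br, bc):
        r += (br > r) - (br < r)
        c += (bc > c) - (bc < c)
        steps.append((r, c))
    return steps


def one_turn_route(src, dst, x_first):
    """XY route (column first) if x_first, else YX route (row first)."""
    corner = (src[0], dst[1]) if x_first else (dst[0], src[1])
    return [src] + walk(src, corner) + walk(corner, dst)


def o1turn_max_load(perm):
    """Largest expected channel load when every packet uses XY or YX w.p. 0.5."""
    load = defaultdict(float)
    for src, dst in perm.items():
        for x_first in (True, False):
            path = one_turn_route(src, dst, x_first)
            for link in zip(path, path[1:]):
                load[link] += 0.5
    return max(load.values(), default=0.0)


K = 4
nodes = [(r, c) for r in range(K) for c in range(K)]
rng = random.Random(2005)
worst_seen = 0.0
for _ in range(2000):
    shuffled = nodes[:]
    rng.shuffle(shuffled)
    worst_seen = max(worst_seen, o1turn_max_load(dict(zip(nodes, shuffled))))

# Half-row shift: the K/2 packets starting left of a row's middle link all
# cross it eastbound (XY and YX coincide within a row), so the bound is reached.
shift = {(r, c): (r, (c + K // 2) % K) for r, c in nodes}
print(worst_seen, o1turn_max_load(shift))
```

Random sampling is only a sanity check, not a proof. The counting argument on the slide is what bounds every permutation.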
13
Optimal Worst-Case Throughput
  • O1TURN maximum channel load = K / 2
  • Worst-case throughput = 2 / K by the TD method
  • Consider a permutation where 100% of packets cross the bisection
  • Throughput (X) is bounded when the bisection links saturate
  • $X \cdot (K^2 / 2) = K$: the K^2 / 2 nodes on each side send across K bisection links per direction
  • $X = 2 / K$ packets / node / cycle
  • When K is odd, O1TURN is within 1 / K^2 of optimal worst-case throughput

(Figure: K x K mesh)
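The arithmetic behind the optimality claim for even K can be checked exactly with rational numbers. This is a small sketch with hypothetical function names. It only covers even radix, where the bisection splits the mesh into two halves of K^2 / 2 nodes.

```python
from fractions import Fraction


def bisection_bound(K: int) -> Fraction:
    """Throughput X (packets/node/cycle) at which the bisection saturates when
    every packet crosses it: K^2/2 nodes per side share K links per direction."""
    if K % 2:
        raise ValueError("this simple bound assumes an even radix K")
    return Fraction(K) / Fraction(K * K, 2)


def o1turn_worst_case(K: int) -> Fraction:
    """TD-method worst-case throughput of O1TURN: 1 / (max channel load K/2)."""
    return 1 / Fraction(K, 2)


for K in (4, 8, 16):
    print(K, bisection_bound(K), o1turn_worst_case(K))  # both equal 2/K
```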
14
Worst-case Throughput Trends
  • Worst-case channel load as the network size changes
  • Normalized to optimal worst-case throughput
  • Worst-case throughput of DOR and ROMM degrades as K grows

Recall: normalized to optimal, O1TURN is 1 for even radix and (1 - 1 / K^2) for odd radix
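The trend for DOR can be reproduced for small meshes with the second TD-method step: given only the routing algorithm, find the permutation that maximizes a channel's load. For a deterministic algorithm such as XY routing, every (source, destination) pair either uses a channel or not. So the worst permutation for that channel is a maximum bipartite matching between such sources and destinations. This sketch uses Kuhn's augmenting-path algorithm for the matching, a choice made here for brevity. It assumes DOR means XY routing, uses hypothetical helper names, does not cover ROMM, and is not the tool used to draw the slide's plot.

```python
from collections import defaultdict


def xy_route(src, dst):
    """Dimension-order (XY) route: correct the column first, then the row."""
    (r, c), (dr, dc) = src, dst
    path = [(r, c)]
    while c != dc:
        c += 1 if dc > c else -1
        path.append((r, c))
    while r != dr:
        r += 1 if dr > r else -1
        path.append((r, c))
    return path


def max_matching(adj):
    """Maximum bipartite matching (Kuhn's augmenting paths); adj: src -> [dst]."""
    owner = {}  # destination -> source currently matched to it

    def augment(src, seen):
        for dst in adj[src]:
            if dst not in seen:
                seen.add(dst)
                if dst not in owner or augment(owner[dst], seen):
                    owner[dst] = src
                    return True
        return False

    return sum(augment(src, set()) for src in list(adj))


def dor_worst_case_load(K):
    """Max over channels of the largest set of (src, dst) pairs that can share
    that channel within a single permutation under XY routing."""
    nodes = [(r, c) for r in range(K) for c in range(K)]
    users = defaultdict(lambda: defaultdict(list))  # channel -> src -> [dst]
    for s in nodes:
        for d in nodes:
            path = xy_route(s, d)
            for link in zip(path, path[1:]):
                users[link][s].append(d)
    return max(max_matching(adj) for adj in users.values())


for K in (2, 4, 6):
    load = dor_worst_case_load(K)
    # Normalize to the optimal worst-case throughput 2/K (max channel load K/2).
    print(K, load, (K / 2) / load)
```

In this sketch the worst-case DOR load comes out as K - 1. So the normalized worst-case throughput (K / 2) / (K - 1) falls toward 1/2 as K grows, consistent with the degrading trend described for DOR.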
15
Average Case Analysis
  • Extension of the TD method (B. Towles et al., SPAA 2003)
  • Examine randomly chosen permutations
  • Harmonic mean of the per-permutation throughput (1 / maximum channel load)
  • 1M random permutations
  • O1TURN shows the same or better average-case throughput

Average-case throughput (relative to DOR = 1)
Mesh            DOR   ROMM    O1TURN
4 x 4 2D mesh   1     1.113   1.136
8 x 8 2D mesh   1     1.180   1.188
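The average-case metric can be sketched in Python as follows. The harmonic mean of per-permutation throughputs 1/L equals 1 / mean(L), so comparing summed maximum loads is enough. This is an illustrative sketch, not the authors' setup. It assumes DOR means XY routing, uses a small seeded sample instead of 1M permutations, and uses hypothetical helper names. Each sampled permutation is paired with its row/column-mirrored copy so that XY and YX see statistically identical traffic in the finite sample.

```python
import random
from collections import defaultdict


def walk(a, b):
    """Straight-line hops from a to b; a and b share a row or a column."""
    (r, c), (br, bc) = a, b
    steps = []
    while (r, c) != (br, bc):
        r += (br > r) - (br < r)
        c += (bc > c) - (bc < c)
        steps.append((r, c))
    return steps


def one_turn_route(src, dst, x_first):
    """XY route (column first) if x_first, else YX route (row first)."""
    corner = (src[0], dst[1]) if x_first else (dst[0], src[1])
    return [src] + walk(src, corner) + walk(corner, dst)


def max_load(perm, choices):
    """choices: (probability, x_first) pairs applied to every packet."""
    load = defaultdict(float)
    for src, dst in perm.items():
        for prob, x_first in choices:
            path = one_turn_route(src, dst, x_first)
            for link in zip(path, path[1:]):
                load[link] += prob
    return max(load.values(), default=0.0)


DOR = [(1.0, True)]                   # XY only
O1TURN = [(0.5, True), (0.5, False)]  # XY or YX, equally likely

K, SAMPLES = 4, 500
nodes = [(r, c) for r in range(K) for c in range(K)]
rng = random.Random(0)
perms = []
for _ in range(SAMPLES):
    shuffled = nodes[:]
    rng.shuffle(shuffled)
    perm = dict(zip(nodes, shuffled))
    # Mirrored copy: swap row and column of every source and destination.
    mirrored = {(c, r): (d[1], d[0]) for (r, c), d in perm.items()}
    perms += [perm, mirrored]

# Ratio of harmonic means = inverse ratio of summed maximum loads.
dor_total = sum(max_load(p, DOR) for p in perms)
o1turn_total = sum(max_load(p, O1TURN) for p in perms)
print("O1TURN / DOR:", dor_total / o1turn_total)
```

With the mirrored pairs, the summed XY and YX maximum loads are identical, so the printed ratio cannot fall below 1. Its exact value depends on the sample and is not expected to match the table above.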
16
O1TURN Summary
  • Near-optimal worst-case throughput
  • By the TD method
  • Optimal for even K
  • Approaches optimal for large, odd K
  • Average-case throughput
  • Better than DOR and comparable to ROMM
  • Minimal number of network hops
  • O1TURN is a minimal routing algorithm

17
Outline
  • Background of interconnection network
  • O1TURN routing algorithm
  • O1TURN router implementation
  • Simulation Results
  • Conclusion and Q&A

18
Base Router Implementation
  • Base router: pipelined virtual-channel router
  • 4 stages: routing, virtual-channel allocation, switch allocation, crossbar / physical-channel transfer
  • One control block controls all virtual channels
  • Critical stage: virtual-channel allocation

19
O1TURN Router Implementation
  • O1TURN Router
  • Separate the virtual channels into two virtual networks (VNs)
  • One VN for XY routing, the other for YX routing
  • Each VN is independently deadlock-free because it uses DOR
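The deadlock argument can be checked with the standard channel-dependency-graph criterion (Dally and Seitz): routing on a set of channels is deadlock-free if the graph of "holds link c1, then requests link c2" dependencies is acyclic. This sketch builds that graph for a 4 x 4 mesh under the simplifying assumption of one VC class per virtual network, with hypothetical helper names. It shows that XY alone and YX alone are acyclic, while letting XY and YX packets share the same VCs creates a cycle. That cycle is why the two route classes are kept in separate VNs.

```python
from collections import defaultdict


def walk(a, b):
    """Straight-line hops from a to b; a and b share a row or a column."""
    (r, c), (br, bc) = a, b
    steps = []
    while (r, c) != (br, bc):
        r += (br > r) - (br < r)
        c += (bc > c) - (bc < c)
        steps.append((r, c))
    return steps


def one_turn_route(src, dst, x_first):
    """XY route (column first) if x_first, else YX route (row first)."""
    corner = (src[0], dst[1]) if x_first else (dst[0], src[1])
    return [src] + walk(src, corner) + walk(corner, dst)


def dependency_graph(K, x_first_options):
    """Channel dependency graph for routes sharing one set of VCs: an edge
    c1 -> c2 means some route holds link c1 and then requests link c2."""
    nodes = [(r, c) for r in range(K) for c in range(K)]
    graph = defaultdict(set)
    for s in nodes:
        for d in nodes:
            for x_first in x_first_options:
                path = one_turn_route(s, d, x_first)
                links = list(zip(path, path[1:]))
                for c1, c2 in zip(links, links[1:]):
                    graph[c1].add(c2)
    return graph


def has_cycle(graph):
    """Recursive three-colour DFS; True if the dependency graph has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)  # every channel starts WHITE (0)

    def visit(u):
        colour[u] = GREY
        for v in graph.get(u, ()):
            if colour[v] == GREY or (colour[v] == WHITE and visit(v)):
                return True
        colour[u] = BLACK
        return False

    return any(colour[u] == WHITE and visit(u) for u in list(graph))


K = 4
print("XY only:", has_cycle(dependency_graph(K, [True])))                     # False
print("YX only:", has_cycle(dependency_graph(K, [False])))                    # False
print("XY + YX, shared VCs:", has_cycle(dependency_graph(K, [True, False])))  # True
```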

20
Delay Analysis
  • Existing router delay models for pipelined routers
  • Peh and Dally, HPCA 2001
  • Based on the logical effort method
  • I. Sutherland, B. Sproull, 1999
  • Delays in FO4 units
  • Complexity comparable to the DOR router

Stage delay (FO4)
VCs / PC   DOR VC alloc.   DOR SW alloc.   O1TURN VC alloc.   O1TURN SW alloc.
4          17              14              14                 14
8          20              16              17                 16
21
O1TURN Summary
  • Near-optimal worst-case throughput
  • Good average-case throughput
  • Minimal number of network hops
  • Low-complexity router implementation
  • Complexity comparable to a DOR router

IDEAL O1TURN
Average-case throughput X X
Worst-case throughput X X
Minimal number of network hops X X
Low-complexity router X X
22
Outline
  • Background of interconnection network
  • O1TURN routing algorithm
  • O1TURN router implementation
  • Simulation Results
  • Conclusion and Q&A

23
Evaluation Method
  • Modified Popnet network simulator (L. Shang, 2003)
  • 4 x 4 2D mesh (8 x 8 in the paper)
  • Full-duplex, bidirectional links
  • 8 VCs per PC
  • 5 flits per packet
  • 500K cycles
  • Synthetic traffic: uniform random, bit complement (BC), matrix transpose (MT), hot spot
  • Compared with existing routing algorithms
  • Oblivious routing algorithms (DOR, ROMM)
  • Adaptive routing algorithm (DUATO)
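Three of the synthetic patterns can be written as destination functions on a K x K mesh. This is a sketch under common textbook definitions, not the simulator's code. The assumptions are: uniform random excludes the source itself; bit complement flips every bit of the row-major node index (K a power of two), which equals (r, c) -> (K-1-r, K-1-c); and matrix transpose swaps row and column. The function names are hypothetical, and hot spot is left out because its exact parameters are not given here.

```python
import random
from typing import Optional, Tuple

Node = Tuple[int, int]  # (row, col) on a K x K mesh


def uniform_random(src: Node, K: int, rng: random.Random) -> Node:
    """Destination chosen uniformly among all nodes other than the source."""
    while True:
        dst = (rng.randrange(K), rng.randrange(K))
        if dst != src:
            return dst


def bit_complement(src: Node, K: int, rng: Optional[random.Random] = None) -> Node:
    """Complement every bit of the row-major node index (K a power of two):
    equivalent to (r, c) -> (K-1-r, K-1-c)."""
    r, c = src
    return (K - 1 - r, K - 1 - c)


def matrix_transpose(src: Node, K: int, rng: Optional[random.Random] = None) -> Node:
    """Swap row and column: (r, c) -> (c, r)."""
    r, c = src
    return (c, r)


PATTERNS = {"UR": uniform_random, "BC": bit_complement, "MT": matrix_transpose}

K = 4
rng = random.Random(1)
for name, pattern in PATTERNS.items():
    print(name, (0, 1), "->", pattern((0, 1), K, rng))
```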

24
Simulation Results
  • 4 x 4 2D mesh, uniform random traffic pattern

25
Simulation Results
  • 4 x 4 2D mesh, matrix transpose traffic pattern
  • One of the worst-case traffic patterns for DOR

26
Simulation Results
  • 4 x 4 2D mesh, bit complement traffic pattern
  • An already balanced traffic pattern
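The contrast between matrix transpose and bit complement can be made concrete with the TD-style maximum channel load on the 4 x 4 mesh. This is an analytic sketch, not the latency simulation shown in these results. It assumes DOR means XY routing and uses hypothetical helper names. The optimal maximum load for even K is K / 2 = 2, so a pattern that already reaches 2 under DOR leaves no room for O1TURN to improve.

```python
from collections import defaultdict


def walk(a, b):
    """Straight-line hops from a to b; a and b share a row or a column."""
    (r, c), (br, bc) = a, b
    steps = []
    while (r, c) != (br, bc):
        r += (br > r) - (br < r)
        c += (bc > c) - (bc < c)
        steps.append((r, c))
    return steps


def one_turn_route(src, dst, x_first):
    """XY route (column first) if x_first, else YX route (row first)."""
    corner = (src[0], dst[1]) if x_first else (dst[0], src[1])
    return [src] + walk(src, corner) + walk(corner, dst)


def max_load(perm, choices):
    """choices: (probability, x_first) pairs applied to every packet."""
    load = defaultdict(float)
    for src, dst in perm.items():
        for prob, x_first in choices:
            path = one_turn_route(src, dst, x_first)
            for link in zip(path, path[1:]):
                load[link] += prob
    return max(load.values(), default=0.0)


DOR = [(1.0, True)]                   # XY only
O1TURN = [(0.5, True), (0.5, False)]  # XY or YX, equally likely

K = 4
nodes = [(r, c) for r in range(K) for c in range(K)]
transpose = {(r, c): (c, r) for r, c in nodes}
complement = {(r, c): (K - 1 - r, K - 1 - c) for r, c in nodes}

for name, perm in (("MT", transpose), ("BC", complement)):
    print(name, "DOR:", max_load(perm, DOR), "O1TURN:", max_load(perm, O1TURN))
# MT DOR: 3.0 O1TURN: 1.5
# BC DOR: 2.0 O1TURN: 2.0
```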

27
Simulation Results
  • 4 x 4 2D mesh, hot spot traffic pattern
  • 2 nodes have 20% of the traffic

28
Simulation Results
  • Delay penalty of adaptive routing
  • How the complexity of the router implementation affects latency
  • Hot spot traffic pattern

29
Outline
  • Background of interconnection network
  • O1TURN routing algorithm
  • O1TURN router implementation
  • Simulation Results
  • Conclusion and Q&A

30
Related Work
  • Routing algorithms
  • Valiant: L. G. Valiant et al., ACM 1981
  • ROMM: T. Nesson et al., ACM 1995
  • DUATO: J. Duato et al., 1993
  • Partitioned router implementation
  • Mad Postman: Jesshope et al., ISCA 1989
  • PFNF: Upadhyay et al., 1997
  • Analysis methods
  • Worst-case: B. Towles et al., 2002
  • Throughput-centric: B. Towles et al., 2003
  • Delay model: L. S. Peh et al., HPCA 2001

31
Conclusion
  • Goals
  • Good average-case throughput
  • Good or optimal worst-case throughput
  • Minimal number of network hops
  • Low-complexity router implementation
  • O1TURN
  • Provides near-optimal worst-case throughput
  • Provides the same or better average-case throughput compared with existing routing algorithms
  • Minimal number of network hops
  • Simple router implementation comparable with a DOR router
  • Satisfies all performance goals

32
Q&A