Implementing DSP Algorithms with Networks on Chip

Provided by: nocsym

Transcript and Presenter's Notes

1
Implementing DSP Algorithms with Networks on Chip
2
Processing Elements and Networks on Chip
Dally01, Benini02
3
Problem
  • Mapping computations to processing elements
    (Hu03, Murali04)
  • Scheduling switch fabric for traffic patterns
    known at compile time

4
Different Mapping
5
Same Mapping, Different Schedule
6
Different Results
  • Performance
  • Latency
  • Throughput
  • Power
  • Reliability

7
Outline
  • Background: Switch Fabrics, BvN
  • Formulation: Graph, Feasibility
  • Mapping Heuristics
  • Scheduling Heuristics
  • Experiments

8
BvN Decomposition
  • Crossbar (Unbuffered, Input Queued)
  • Any connection from inputs to outputs is feasible
  • Decompose traffic matrix into sum of
    permutation matrices (Chang01)
9
BvN Decomposition
  • Optimum: Least Number of Matrices
    • Leads to fewest number of cycles
  • Lower Bound
    • Maximum of row and column sums
    • Bound is tight
    • Proof yields a polynomial time algorithm
  • Assumptions
    • Traffic known a priori and captured by a traffic
      matrix
    • Rearrangeable fabric can implement all
      permutations
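The lower bound and the decomposition above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not necessarily the algorithm of Chang01: it pads the integer traffic matrix with dummy packets until every row and column sum equals the bound, then peels off one perfect matching (one permutation) per cycle. The function names `lower_bound`, `pad_to_regular`, `perfect_matching` and `bvn_schedule` are hypothetical.

```python
def lower_bound(traffic):
    """Maximum over all row sums and column sums of the traffic matrix."""
    rows = [sum(r) for r in traffic]
    cols = [sum(c) for c in zip(*traffic)]
    return max(rows + cols)


def pad_to_regular(traffic):
    """Add dummy packets so that every row and column sums to the bound."""
    n, bound = len(traffic), lower_bound(traffic)
    padded = [row[:] for row in traffic]
    row_def = [bound - sum(r) for r in padded]
    col_def = [bound - sum(c) for c in zip(*padded)]
    i = j = 0
    while i < n and j < n:
        add = min(row_def[i], col_def[j])
        padded[i][j] += add
        row_def[i] -= add
        col_def[j] -= add
        if row_def[i] == 0:
            i += 1
        else:
            j += 1
    return padded


def perfect_matching(support):
    """Kuhn's augmenting-path matching; returns {row: column} or None."""
    n = len(support)
    match_col = [-1] * n  # match_col[j] is the row matched to column j

    def try_row(i, seen):
        for j in range(n):
            if support[i][j] and not seen[j]:
                seen[j] = True
                if match_col[j] == -1 or try_row(match_col[j], seen):
                    match_col[j] = i
                    return True
        return False

    for i in range(n):
        if not try_row(i, [False] * n):
            return None
    return {match_col[j]: j for j in range(n)}


def bvn_schedule(traffic):
    """One list of (source, destination) pairs per cycle."""
    n = len(traffic)
    padded = pad_to_regular(traffic)
    remaining = [row[:] for row in traffic]
    schedule = []
    for _ in range(lower_bound(traffic)):
        support = [[padded[i][j] > 0 for j in range(n)] for i in range(n)]
        # A regular nonnegative integer matrix always has a perfect matching.
        matching = perfect_matching(support)
        cycle = []
        for i, j in matching.items():
            padded[i][j] -= 1
            if remaining[i][j] > 0:  # real packet; otherwise a dummy one
                remaining[i][j] -= 1
                cycle.append((i, j))
        schedule.append(cycle)
    return schedule


traffic = [[0, 2, 1], [1, 0, 1], [2, 0, 0]]
schedule = bvn_schedule(traffic)
```

Because one unit is removed from every row and column in each cycle, the number of cycles equals the lower bound, which is why the bound is tight on a fabric that can realize every permutation.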

10
VLSI Implementation Issues
  • Rearrangeable fabrics not scalable
    • Excessive connection
    • Irregular placement and routing
  • Motivates study of simple topologies
    • Tree, mesh, torus, fat tree (Leighton91)
    • High quality schedules critical

11
Outline
  • Background: Switch Fabrics, BvN
  • Formulation: Graph, Feasibility
  • Mapping Heuristics
  • Scheduling Heuristics
  • Experiments

12
Switch Fabric and Traffic Matrix
  • Switch Fabrics
    • Links joined by programmable crosspoints
    • Simple graph representation
  • Traffic Matrix
    • Integer entries encoding number of packets from
      PE i to PE j

[Figure: example switch fabric graph with nodes labeled 1 to 6]
13
Feasibility 1
  • Feasible Matrix: Vertex Disjoint Path Set
    • Can be transferred in one cycle
  • Rearrangeable Fabrics: every permutation matrix
    is feasible
  • General Fabrics: see the example below

[Figure: example with two sets of labels, each running from 1 to 6]
14
Feasibility 2
[Figure: example fabric with nodes labeled 1 to 6]
The transfers 1→4 and 2→5 are not feasible together because any
paths chosen will share a vertex.
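A feasibility check of this kind can be sketched as a brute-force search for a vertex disjoint path set. The fabric in the original figure is not recoverable from the transcript, so the graph `FABRIC` below is a hypothetical six-node example chosen so that 1→4 and 2→5 must both cross nodes 3 and 6. The helper names `simple_paths` and `find_vdps` are also assumptions, and the search takes exponential time in the worst case.

```python
# Hypothetical fabric: 1 and 2 hang off node 3, 4 and 5 hang off node 6,
# and a single link joins 3 and 6.
FABRIC = {
    1: [3],
    2: [3],
    3: [1, 2, 6],
    4: [6],
    5: [6],
    6: [3, 4, 5],
}


def simple_paths(graph, src, dst, blocked):
    """Yield every simple path from src to dst that avoids blocked nodes."""
    stack = [(src, (src,))]
    while stack:
        node, path = stack.pop()
        if node == dst:
            yield path
            continue
        for nxt in graph[node]:
            if nxt not in blocked and nxt not in path:
                stack.append((nxt, path + (nxt,)))


def find_vdps(graph, pairs, blocked=frozenset()):
    """Return one path per (src, dst) pair with no shared vertex, or None."""
    if not pairs:
        return []
    (src, dst), rest = pairs[0], pairs[1:]
    if src in blocked or dst in blocked:
        return None
    for path in simple_paths(graph, src, dst, blocked):
        tail = find_vdps(graph, rest, blocked | set(path))
        if tail is not None:
            return [path] + tail
    return None


alone = find_vdps(FABRIC, [(1, 4)])              # a single path exists
together = find_vdps(FABRIC, [(1, 4), (2, 5)])   # None: paths must share 3 and 6
```

Each transfer is feasible on its own in this example, but the pair is not, which is the situation the slide describes.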
15
Schedule
  • A collection of feasible matrices that sum to the
    traffic matrix
  • Number of cycles = number of matrices
  • Optimum schedule has the least number of cycles
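The definition above can be checked mechanically. The sketch below only verifies that the matrices of a candidate schedule sum to the traffic matrix; whether each matrix is feasible depends on the fabric and is assumed to be checked separately. The function name `is_schedule_for` and the example data are hypothetical.

```python
def is_schedule_for(traffic, schedule):
    """True if the per-cycle matrices in schedule sum to the traffic matrix."""
    n = len(traffic)
    total = [[0] * n for _ in range(n)]
    for matrix in schedule:
        for i in range(n):
            for j in range(n):
                total[i][j] += matrix[i][j]
    return total == traffic


traffic = [[0, 2], [1, 0]]
schedule = [
    [[0, 1], [1, 0]],  # cycle 1: 0 -> 1 and 1 -> 0
    [[0, 1], [0, 0]],  # cycle 2: 0 -> 1
]
cycles = len(schedule)  # number of cycles equals number of matrices
```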

16
Optimum vs. Greedy
[Figure: greedy schedule compared with optimum schedule]
Same color packets are scheduled in the same cycle.
The greedy method takes one more cycle than the optimum.
17
Decision Problem
  • VDPS: Vertex Disjoint Path Set
    • NP-Complete (Garey79)
  • Given a graph representing the switch fabric, can
    a traffic matrix be scheduled in L cycles?
    • L = 1 case: Is the traffic matrix feasible?
      Equivalently, is there a VDPS for the
      traffic matrix?
    • Hardness of VDPS ⇒ hardness of scheduling on a
      general fabric

18
Outline
  • Background: Switch Fabrics, BvN
  • Formulation: Graph, Feasibility
  • Mapping Heuristics
  • Scheduling Heuristics
  • Experiments

19
Mapping and Scheduling
  • One-to-one mapping from DFG nodes to PEs done
    before scheduling actual traffic
  • Given a mapping, scheduling step generates the
    actual cycle by cycle scheme for communication

20
Why mapping heuristics?
  • Hard to evaluate a mapping
  • Inherently a deep combinatorial problem

21
Setup for Heuristic
  • Distance from u to v: measured with every
    edge in the graph having length 1
  • FOM (figure of merit): the quantity to minimize
    over all possible mappings
  • Finding the best mapping is still NP-hard
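The FOM formula itself did not survive in the transcript. The sketch below therefore assumes a common choice, the traffic-weighted hop distance summed over all source and destination pairs, and improves a mapping by pairwise exchanges that lower it. The assumed FOM, the function names `hop_distances`, `fom` and `improve_by_exchange`, and the line-shaped example fabric are all illustrative and may differ from what the authors used.

```python
from collections import deque
from itertools import combinations


def hop_distances(graph):
    """All-pairs distances by BFS, with every edge of length 1."""
    dist = {}
    for start in graph:
        seen = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[start] = seen
    return dist


def fom(traffic, mapping, dist):
    """ASSUMED figure of merit: sum of packets times hop distance."""
    n = len(traffic)
    return sum(
        traffic[a][b] * dist[mapping[a]][mapping[b]]
        for a in range(n)
        for b in range(n)
    )


def improve_by_exchange(traffic, mapping, dist):
    """Swap the PEs of two tasks whenever the swap lowers the FOM."""
    mapping = list(mapping)
    best = fom(traffic, mapping, dist)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(range(len(mapping)), 2):
            mapping[a], mapping[b] = mapping[b], mapping[a]
            candidate = fom(traffic, mapping, dist)
            if candidate < best:
                best, improved = candidate, True
            else:  # undo a swap that did not help
                mapping[a], mapping[b] = mapping[b], mapping[a]
    return mapping, best


# Hypothetical fabric: four PEs in a line, 0 - 1 - 2 - 3.
LINE = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
dist = hop_distances(LINE)
traffic = [[0, 0, 0, 5], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
initial = [0, 1, 2, 3]  # task k placed on PE initial[k]
start_fom = fom(traffic, initial, dist)
mapping, final_fom = improve_by_exchange(traffic, initial, dist)
```

An exchange search like this only reaches a local minimum, which is consistent with the slide's point that finding the best mapping remains NP-hard.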

22
Example Inputs
[Figure: initial mapping and traffic matrix for nodes 0 to 3]
23
Example Exchange
[Figure: exchange example on nodes 0 to 3; value shown: -6]
Source 0 originates 034311 packets.
Source 3 originates 03104 packets.
Intuitively, it is better to place source 0 closer to its destinations.
24
Example Result Series
[Figure: series of exchange results; values shown: -6, -2, -6, -4, -4, -7, -4]
25
Outline
  • Background: Switch Fabrics, BvN
  • Formulation: Graph, Feasibility
  • Mapping Heuristics
  • Scheduling Heuristics
  • Experiments

26
Congestion Metric
  • Design Criteria
    • Fast to calculate
    • Captures hot spots
  • Congestion on an edge
    • Defined from the row sum, the column sum, and
      the distance
    • Based on the current traffic matrix

27
Generate Schedule
  • Pick the source or destination with maximum
    packets flowing in or out
    • BvN and Tree
  • Avoid congested links
    • Good metric of congestion
    • Shortest path based on congestion, with fine-tuning
  • Keep adding paths to get a vertex disjoint path
    set
  • Record the VDPS in the schedule
  • Update traffic matrix, recalculate congestion
    values
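The loop described above can be sketched as follows. This is a simplified illustration, not the authors' heuristic: it orders transfers by the busiest source or destination and uses plain BFS shortest paths, with no congestion weighting or fine-tuning. The fabric `FABRIC`, the dictionary form of the traffic matrix, and the function names `shortest_free_path` and `greedy_schedule` are all assumptions.

```python
from collections import deque

# Hypothetical fabric: 1 and 2 hang off node 3, 4 and 5 hang off node 6.
FABRIC = {1: [3], 2: [3], 3: [1, 2, 6], 4: [6], 5: [6], 6: [3, 4, 5]}


def shortest_free_path(graph, src, dst, used):
    """BFS shortest path from src to dst that avoids vertices in used."""
    if src in used or dst in used:
        return None
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in graph[u]:
            if v not in used and v not in prev:
                prev[v] = u
                queue.append(v)
    return None


def greedy_schedule(graph, traffic):
    """traffic maps (src, dst) to a packet count; returns paths per cycle."""
    remaining = {pair: n for pair, n in traffic.items() if n > 0}
    schedule = []
    while remaining:
        out_load, in_load = {}, {}
        for (s, d), n in remaining.items():
            out_load[s] = out_load.get(s, 0) + n
            in_load[d] = in_load.get(d, 0) + n
        # Serve the busiest sources and destinations first.
        order = sorted(
            remaining, key=lambda p: -max(out_load[p[0]], in_load[p[1]])
        )
        used, cycle = set(), []
        for s, d in order:
            path = shortest_free_path(graph, s, d, used)
            if path is not None:  # keep adding vertex disjoint paths
                cycle.append(path)
                used.update(path)
        if not cycle:
            raise ValueError("some transfer has no path in the fabric")
        for path in cycle:  # update the traffic matrix
            pair = (path[0], path[-1])
            remaining[pair] -= 1
            if remaining[pair] == 0:
                del remaining[pair]
        schedule.append(cycle)  # record the VDPS for this cycle
    return schedule


two_cycles = greedy_schedule(FABRIC, {(1, 4): 1, (2, 5): 1})
one_cycle = greedy_schedule(FABRIC, {(1, 2): 1, (4, 5): 1})
```

In this example the transfers 1→4 and 2→5 both need nodes 3 and 6, so they land in separate cycles, while 1→2 and 4→5 fit in a single cycle.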

28
Outline
  • Background: Switch Fabrics, BvN
  • Formulation: Graph, Feasibility
  • Mapping Heuristics
  • Scheduling Heuristics
  • Experiments

29
LDPC
  • 96 coders and 48 checkers
  • 23x23 mesh
  • Spare horizontal and vertical tracks to help
    routing

[Chart: Number of Cycles]
30
Distance Inverted Congestion
[Chart: Number of Cycles]
31
Mapping
[Chart: Number of Cycles]
32
Discussion and Future Work
  • Explored mapping and scheduling for NoCs
  • Statically scheduled fabrics
  • Heuristics beating manual solutions, approaching
    optimum
  • One VDPS may take multiple clocks to finish
  • Fast networks are pipelined
  • Fixed time cycle not practical
  • Tweak heuristics to pack short transfers together

33
Thank You
  • Questions?

34
End