Title: Performance Analysis and Optimization of Latency Insensitive Systems
1. Performance Analysis and Optimization of Latency Insensitive Systems
- Luca P. Carloni
- Alberto L. Sangiovanni-Vincentelli
UC Berkeley
Design Automation Conference, Los Angeles, June 2000
2. Motivation: System-on-a-Chip Design
3. Sequential Modules and RTL Design
[Figure: sequential module; labels: Primary Inputs, Combinational Logic, State Register, Output Register, Primary Outputs]
4. Block Diagram of a MAC Circuit
RTL design separates functional specification from performance analysis.
5. Intra-Module Delay and Timing Constraints
Once all modules are composed, the overall system works correctly as long as it runs with a clock period Tclk >= max(T1, T2, T3, T4).
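The constraint above can be checked with a few lines of Python. This is only a sketch: the function names and the delay values are hypothetical and do not come from the slides.

```python
def min_clock_period(module_delays):
    """Smallest clock period that satisfies every module's internal timing constraint."""
    return max(module_delays)


def meets_timing(t_clk, module_delays):
    """True when the chosen clock period is at least the slowest module delay."""
    return t_clk >= min_clock_period(module_delays)


# Hypothetical worst-case intra-module delays T1..T4, in nanoseconds.
delays_ns = [2.1, 3.4, 2.8, 3.0]
print(min_clock_period(delays_ns))   # 3.4
print(meets_timing(3.0, delays_ns))  # False: the 3.4 ns module would fail
```

Note that this covers intra-module delays only; the inter-module wire delays are what the following slides address.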
6. Impact of Inter-Module Path Delays
7. DSM: Percentage of Reachable Die
- For a 0.06 micron process, a signal can reach only 5% of the die's length in a clock cycle [D. Matzke (TI), 1997]
- Cause: combination of high frequencies and slower wires
8. Need for a New Design Approach
- To relax time constraints during early phases of the design, when correct measures of the inter-module delay paths are not available
- To simplify the composition of sequential modules in pipeline mode
- To facilitate the insertion of extra pipeline stages between one module and the next, with the purpose of buffering those signals which propagate on long wires
9. Latency Insensitive Design (ICCAD '99)
10. Latency Insensitive Design (ICCAD '99)
[Figure label: RS]
11. Latency Insensitive Design (ICCAD '99)
[Figure labels: P2, P3, RS, RS]
12. Informative Events and Stalling Events
- Each relay station introduces 1 stalling event
- A module receiving a stalling event as input emits stalling events as outputs at the next cycle
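The two rules above can be illustrated on event streams, one list entry per clock cycle. This is a minimal sketch under stated assumptions: `STALL`, `relay_station`, and `module` are hypothetical names, the module is taken to be a stateless single-input function, and its first output is an assumed initial value.

```python
STALL = None  # marker for a stalling event; any other value is an informative event


def relay_station(stream):
    """Delay a stream by one cycle; the first output is a stalling event."""
    return [STALL] + list(stream[:-1])


def module(stream, initial_output, fn):
    """Emit an assumed initial output, then react to each input one cycle later.

    A stalling event on the input produces a stalling event on the output
    at the next cycle; an informative event produces fn(event).
    """
    out = [initial_output]
    for event in stream[:-1]:
        out.append(STALL if event is STALL else fn(event))
    return out


source = [1, 2, 3, 4]
after_rs = relay_station(source)
print(after_rs)                                # [None, 1, 2, 3]
print(module(after_rs, 0, lambda x: x * 10))   # [0, None, 10, 20]
```

Chaining two relay stations yields two leading stalling events, which matches the one-stall-per-relay-station rule.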
13. Advantages of LID Methodology
14. Robustness of LID Performance
[Figure: performance loss after relay station insertion]
The latency insensitive protocol leaves performance unaffected only if the design has no feedback path between the sequential modules.
15. Latency Insensitive Systems (LIS) Graph
- Capture the structure of a latency insensitive system without getting lost in the details of the logic inside each sequential module
- Focus on communication and synchronization properties, while neglecting the particular data items exchanged among the modules
- Model the system performance by enabling early exploration as well as late adjustments of the latency-throughput trade-offs
16. LIS-Graph for the MAC Circuit
[Figure: LIS-graph of the MAC circuit; vertex labels: MPY, Composite, and ten REG vertices]
17. Weight of LIS-Graph Arcs
The weight of an arc is equal to the number of relay stations on the corresponding channel.
18. Equivalence of LIS-Graphs
19. Progressive Trace of a LIS-Graph Arc
20. Behavior of a LIS-Graph
The notion of LIS-graph behavior captures the communication and synchronization properties of a latency insensitive system.
[Figure labels: S1, S2, S3, S4, S5]
21. Firing Semantics of a LIS-Graph
- Independence Rule: every vertex Vj fires the first informative event (number 1) on each outgoing arc Ai = (Vj, Vk). However, if arc Ai has weight w(Ai), the down-link vertex Vk will observe w(Ai) stalling events before seeing the first informative event from Vj.
- AND-Causality Rule: every vertex Vj fires the n-th informative event only after the (n-1)-th informative event has appeared on each arc entering Vj.
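The two firing rules can be turned into a small recurrence that computes the cycle at which each vertex fires each informative event. The timing model here is an assumption, not taken from the slides: event 1 fires at cycle 0, an event fired on an arc of weight w appears at the destination w cycles later, and a vertex fires its next event one cycle after all required events have appeared. The name `firing_times` is hypothetical.

```python
def firing_times(arcs, vertices, n_events):
    """Return {vertex: [firing cycle of event 1, event 2, ...]}.

    arcs maps (source, destination) to the arc weight (relay station count).
    """
    incoming = {v: [] for v in vertices}
    for (src, dst), weight in arcs.items():
        incoming[dst].append((src, weight))

    # Independence rule: every vertex fires event 1 right away (cycle 0).
    times = {v: [0] for v in vertices}
    for n in range(1, n_events):
        for v in vertices:
            if incoming[v]:
                # AND-causality rule: wait for the previous event on every
                # incoming arc; each arc delays it by its weight.
                t = max(times[src][n - 1] + w for src, w in incoming[v]) + 1
            else:
                # A vertex without inputs fires one event per cycle.
                t = times[v][n - 1] + 1
            times[v].append(t)
    return times


# Two modules in a feedback loop with one relay station on the arc A -> B.
loop = {("A", "B"): 1, ("B", "A"): 0}
print(firing_times(loop, ["A", "B"], 7)["A"])  # [0, 1, 3, 4, 6, 7, 9]
```

In this example vertex A fires two informative events every three cycles, which illustrates how a relay station on a feedback path lowers the sustained rate.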
22. Cycle Means and System Throughput
23. Computing the Maximum Cycle Mean
- Acyclic LIS-graph (pipelined system with no feedback)
  - Thp(G) = MCM(G) = 1
- Cyclic LIS-graph (1 strongly connected component (SCC))
  - all K cycles can be detected in O((V + A)(K + 1))
- Cyclic LIS-graph (more than 1 SCC)
  - use Tarjan's algorithm to detect all SCCs,
  - then derive the largest MCM among all SCCs
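The following sketch computes a maximum cycle mean by enumerating simple cycles. Two assumptions are made because the slides do not spell them out: the mean of a cycle C is taken to be (w(C) + |C|) / |C|, where w(C) is the total arc weight and |C| the number of vertices, and the throughput is taken to be its reciprocal. The enumeration is a plain depth-first search, not the algorithm that achieves the O((V + A)(K + 1)) bound, and all function names are hypothetical.

```python
from fractions import Fraction


def simple_cycles(arcs):
    """Yield each simple cycle once, as a list of vertices, using plain DFS."""
    succ = {}
    for (u, v) in arcs:
        succ.setdefault(u, []).append(v)
        succ.setdefault(v, [])
    order = {v: i for i, v in enumerate(sorted(succ))}

    def dfs(start, node, path, on_path):
        for nxt in succ[node]:
            if nxt == start:
                yield list(path)
            elif order[nxt] > order[start] and nxt not in on_path:
                # Only visit vertices ranked after the start vertex, so each
                # cycle is reported from its lowest-ranked vertex only.
                path.append(nxt)
                on_path.add(nxt)
                yield from dfs(start, nxt, path, on_path)
                path.pop()
                on_path.remove(nxt)

    for s in succ:
        yield from dfs(s, s, [s], {s})


def max_cycle_mean(arcs):
    """Largest assumed cycle mean (w(C) + |C|) / |C|; 1 for an acyclic graph."""
    best = Fraction(1)
    for cycle in simple_cycles(arcs):
        hops = zip(cycle, cycle[1:] + cycle[:1])
        weight = sum(arcs[hop] for hop in hops)
        best = max(best, Fraction(weight + len(cycle), len(cycle)))
    return best


graph = {("A", "B"): 1, ("B", "C"): 0, ("C", "A"): 0, ("B", "A"): 0}
mcm = max_cycle_mean(graph)
print(mcm, 1 / mcm)  # 3/2 2/3
```

Because every cycle is enumerated regardless of which SCC it belongs to, this sketch skips the SCC decomposition step; on larger graphs, restricting the search to each SCC first would avoid exploring arcs that cannot lie on any cycle.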
24. Recycling an Illegal LIS-Graph
- Annotated LIS-graph: each arc ai has a length l(ai) that corresponds to the smallest multiple of the clock period that is larger than the delay of the channel associated with the arc
- Illegal arc: iff w(ai) < l(ai) - 1
- Illegal LIS-graph: iff it contains an illegal arc
- Recycling operation: legalize a graph by increasing the weights of illegal arcs (i.e. adding relay stations to the corresponding channels)
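As an illustration of the definitions above, the length of an arc and the legality test can be written as follows. This is a sketch with hypothetical names and integer picosecond units chosen for the example; "larger than" is read as strictly larger, so a delay equal to one clock period gives length 2.

```python
def arc_length(delay_ps, clock_ps):
    """Smallest number of clock periods whose total duration exceeds the delay."""
    return delay_ps // clock_ps + 1


def is_illegal(weight, length):
    """An arc is illegal iff it carries fewer than length - 1 relay stations."""
    return weight < length - 1


clock_ps = 1000
print(arc_length(2300, clock_ps))                 # 3
print(is_illegal(1, arc_length(2300, clock_ps)))  # True: 2 relay stations needed
print(is_illegal(0, arc_length(800, clock_ps)))   # False: fits in one period
```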
25. Recycling: Legalization and Equalization
- Legalization: after deriving the annotated LIS-graph G, legalize it by augmenting the weight of each illegal arc ai by Δw(ai) = l(ai) - 1 - w(ai)
- Equalization: compute the maximum throughput Tk sustainable by each SCC Sk in the legalized graph G and equalize them by distributing Nk extra relay stations on the critical cycle Ck of Sk
- Key point: avoid being forced to augment the weights of cycles having small cardinality
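The legalization step can be sketched as below: every arc whose weight falls short of l(ai) - 1 is raised by the deficit, and legal arcs are left untouched. The function name, the dictionary layout, and the sample values are hypothetical; equalization is not covered here.

```python
def legalize(weights, lengths):
    """Return new arc weights with every illegal arc raised to length - 1.

    weights and lengths both map an arc (source, destination) to an integer.
    """
    legal = {}
    for arc, weight in weights.items():
        deficit = lengths[arc] - 1 - weight  # the increment for an illegal arc
        legal[arc] = weight + max(deficit, 0)
    return legal


weights = {("A", "B"): 0, ("B", "C"): 1, ("C", "A"): 0}
lengths = {("A", "B"): 3, ("B", "C"): 2, ("C", "A"): 1}
print(legalize(weights, lengths))
# {('A', 'B'): 2, ('B', 'C'): 1, ('C', 'A'): 0}
```

Only the arc from A to B was illegal in this example, so it alone gains relay stations.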
26. Case Study: MPEG-2 Video Encoder
[Figure: block diagram; blocks: Input, Preprocessing, Frame Memory, Motion Estimation, Motion Compensation, DCT, Quantizer (Q), IDCT, Frame Memory, Regulator, VLC Encoder, Buffer, Output]
27. LIS-graph of MPEG-2 Video Encoder
[Figure: LIS-graph with vertices S, V1 to V15, and T]
28. Detecting Cycles in MPEG-2 LIS-graph
[Figure: LIS-graph with vertices S, V1 to V15, and T; label: Cycles]
29. MPEG-2: Throughput Degradation
[Figure: plot; labels: Cycles, Cardinality (3, 4, 5, 8, 9, 10), Cycle Weight]
30. Moving Around the Latency - 1
[Figure: LIS-graph with vertices S, V1 to V15, and T; labels: Critical Cycle, thp(G)]
31. Moving Around the Latency - 2
[Figure: LIS-graph with vertices S, V1 to V15, and T; labels: Critical Cycle, thp(G)]
32. Practical Guidelines for LI Design
- All modules should put comparable timing constraints on the global clock
- Modules whose corresponding LIS-graph nodes belong to the same cycle should be kept close while deriving the final implementation
- Relay station insertion should be performed automatically, in a way similar to buffer insertion
33. Conclusions
- LIS-graphs are a formal model to analyze the properties of a latency insensitive system
- Recycling is a rigorous method
  - to capture latency variations of the communication channels
  - to compute exactly the final throughput of the system
- The MPEG-2 case study shows that the present work
  - enables the exploration of latency/throughput trade-offs at any stage of the design process,
  - facilitates the integration of pre-designed IP cores on a single chip.