Title: Exploiting Level Sensitive Latches in Wire Pipelining
1 Exploiting Level Sensitive Latches in Wire Pipelining
- Vikram Seth, Department of EE, Texas A&M University
- Min Zhao, Freescale Semiconductor Inc.
- Jiang Hu, Department of EE, Texas A&M University
2 Outline
- Technology trend and wire pipelining
- Previous works
- Advantages and challenges of using latches
- Concurrent repeater, flip-flop and latch insertion
- Experimental results
- Conclusion
3 Technology Trend
- Wire delay increase
- Clock period decrease
- Chip size increase
- Multiple clock cycles for long distance signal propagation
4 Wire Pipelining
[Figure: pipelined wire. Labels: Delay < 1 clock cycle; Repeater]
5 Design Implications
- Minimizing delay is inadequate
- Latency: number of clocked repeaters along a source-sink path
- Latency needs to be minimized or a latency constraint needs to be satisfied
- More power dissipation
- Clocked repeaters are generally larger than unclocked repeaters
- Extra load to clock network
- Greater vulnerability to variations
- Dependence on clock skew
6 Previous Work I: Concurrent Repeater and Flip-flop Insertion
- Cocchini, Hassoun-Alpert-Thiagarajan, ICCAD 02
- Given
- Steiner tree, candidate repeater locations on the tree
- Repeater library: repeaters and edge triggered flip-flops
- Clock skew
- MiLa: find repeater solution to Minimize the max Latency
- GiLa: find min cost repeater solution to satisfy Given Latency constraint at each sink
7 Previous Work II: Wave Pipelining
- L. Zhang, Y. Hu and C. C.-P. Chen, TAU 2004
- Wave pipelining
- Signals are allowed to propagate over multiple cycles without synchronous elements
- Advantages
- No setup time and skew overhead
- Weakness
- Complicated recovery circuits at receiver
8 Spectrum of Synchronization Effort
- Flip-flop based pipelining: strong synchronization; setup time overhead; skew overhead
- Latches: loose synchronization; avoidable setup time; tolerance to skew; no recovery circuit
- Wave pipelining: zero synchronization; recovery circuit overhead
9 Area and Power Advantages of Using Latches
- A flip-flop is usually composed of two latches
- Replacing each flip-flop with one latch may reduce
- Area
- Dynamic power
- Leakage power
- Load to clock network
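10 Timing Flexibility of Latches
[Figure: flip-flops FF1 and FF2 connected through a wire delay, with clock arrival times t1 and t2 from the clock source; timing diagram marks Depart at t1, Arrive at t2, and period T. Label: Delay < T]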
11 Advantage of Latches on Handling Blockages
[Figure: wire segments crossing blockages. Labels: Delay T; Delay T; negative edge triggered flip-flops]
12 Marginal Wave Pipelining
[Figure: flip-flops F1 and F2 with latch L between them, shown with the clock waveform. Segment labels: Delay T - Tp; Delay T + Tp]
Remark by Dr. Cocchini
- Sometimes there are two signals between L and F2
- Insert repeaters between L and F2, so that one signal may not overwrite the other
- Turn off during inactive clock level, to avoid signal loss
13 Advantages of Latches in Tree
[Figure: routing tree. Labels: Delay T (three branches); negative edge triggered flip-flops]
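14 Challenge: Tight Short Path Constraint
[Figure: flip-flops FF1 and FF2 connected through a wire delay, with clock arrival times t1 and t2; timing diagram marks Depart at t1, Arrive at t2, setup time ts, hold time th, and period T]
th < Delay < T - ts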
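The two-sided window on this slide can be checked numerically. The following is a minimal Python sketch, assuming zero clock skew (t1 = t2); the helper name `delay_in_window` and the numeric values are hypothetical and only illustrate the inequality th < Delay < T - ts.

```python
def delay_in_window(delay, period, t_setup, t_hold):
    """Return True when t_hold < delay < period - t_setup.

    Assumption: both flip-flops see the clock at the same time (no skew).
    The lower bound is the short path (hold) constraint, the upper bound
    is the long path (setup) constraint.
    """
    return t_hold < delay < period - t_setup


# Illustrative values only (arbitrary time units).
print(delay_in_window(5.0, 8.0, 0.5, 0.3))   # inside the window
print(delay_in_window(0.2, 8.0, 0.5, 0.3))   # too fast: short path violation
print(delay_in_window(7.8, 8.0, 0.5, 0.3))   # too slow: long path violation
```

The lower bound is what makes latch insertion delicate: a latch relaxes the upper bound through time borrowing but leaves the short path side exposed.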
15 Assumptions
- Flip-flop and latch based wire pipelining is applied in a flip-flop based circuit design
- Single phase clock for flip-flops and latches
- Flip-flops are negative edge triggered
- Latches are positive level sensitive
- Timing reference point is aligned with the falling edge of the clock signal
16 Problem Formulation
- Given
- Steiner tree, candidate repeater locations on the tree
- Repeater library: repeaters, flip-flops and latches
- MiLa: find repeater solution to Minimize the max Latency
- GiLa: find min cost repeater solution to satisfy Given Latency constraint at each sink
- Both long path and short path constraints are satisfied for flip-flops and latches
- Gate: RC switch model; wire: Elmore delay model
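The two delay models named in the formulation can be written in a few lines. This is a sketch under common textbook assumptions, not the authors' code: the wire is a single lumped pi segment (half of its capacitance at each end), and the gate is an output resistance driving its load, with an optional intrinsic delay. All function and parameter names are hypothetical.

```python
def wire_elmore_delay(r_wire, c_wire, c_load):
    """Elmore delay of one wire segment modeled as a lumped pi section."""
    return r_wire * (c_wire / 2.0 + c_load)


def gate_delay(r_out, c_load, intrinsic=0.0):
    """RC switch model: output resistance times downstream capacitance."""
    return intrinsic + r_out * c_load


def stage_delay(r_out, r_wire, c_wire, c_load):
    """Gate driving a wire segment that ends in a load capacitance."""
    return gate_delay(r_out, c_wire + c_load) + wire_elmore_delay(r_wire, c_wire, c_load)


# Illustrative values only (arbitrary units).
print(wire_elmore_delay(2.0, 4.0, 1.0))
print(stage_delay(3.0, 2.0, 4.0, 1.0))
```

Both models depend only on downstream capacitance, which is why a bottom-up propagation from sinks to driver can evaluate them incrementally.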
17 Algorithm Overview
[Figure: routing tree with a driver, two sinks, and candidate locations along the tree]
Candidate solutions are propagated from sinks toward the driver
18 Candidate Solutions
- Each candidate solution is associated with a node in the tree, and is characterized by
- c: downstream cap seen from the node
- r: required arrival time
- y: latency
- a: repeater assignment
- At each sink node j, initial candidate solution (cj, rj, 0, 0)
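One possible in-memory form of a candidate solution is sketched below in Python. The class name `Candidate`, the helper `sink_candidate`, and the use of a tuple of labels for the repeater assignment are assumptions made for illustration; the slides only name the four fields c, r, y and a.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Candidate:
    c: float                     # downstream capacitance seen from the node
    r: float                     # required arrival time
    y: int                       # latency: clocked repeaters downstream
    a: Tuple[str, ...] = ()      # repeater assignment; () plays the role of the slides' 0


def sink_candidate(c_sink: float, r_sink: float) -> Candidate:
    """Initial candidate at a sink: (cj, rj, 0, 0)."""
    return Candidate(c=c_sink, r=r_sink, y=0, a=())


# Illustrative values only.
s = sink_candidate(1.5, 8.0)
print(s)
```

Making the record immutable keeps each propagated solution independent, so inserting a repeater produces a new candidate rather than altering one that other branches still reference.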
19 REPEAT: Insert Buffer
(c1, r1, y1, 0)
- r1r = r1 - Rr c1
- If r1r < -Tp, drop this solution
- c1r = Cr
- y1r = y1
- Cr: repeater input capacitance
- Rr: repeater output resistance
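The buffer step can be sketched as a small Python function. It assumes the update on this slide reads r1r = r1 - Rr c1 (required arrival time reduced by the buffer's RC delay) and that a solution is dropped when the result falls below -Tp. The function name and the tuple return format are hypothetical.

```python
def insert_buffer(c1, r1, y1, R_r, C_r, T_p):
    """Propagate a candidate (c1, r1, y1) through an unclocked repeater.

    Returns (c, r, y) for the new candidate, or None when the solution is dropped.
    Assumption: r_new = r1 - R_r * c1, with no intrinsic buffer delay.
    """
    r_new = r1 - R_r * c1
    if r_new < -T_p:
        return None                  # too late even for later time borrowing
    return (C_r, r_new, y1)          # latency is unchanged by an unclocked repeater


# Illustrative values only.
print(insert_buffer(2.0, 8.0, 0, R_r=1.0, C_r=0.5, T_p=4.0))
print(insert_buffer(10.0, 5.0, 1, R_r=1.0, C_r=0.5, T_p=4.0))
```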
20 REPEAT: Insert Flip-flop
(c1, r1, y1, 0)
- If r1 - Rf c1 < 0, skip the REPEAT to enforce the long path constraint
- c1f = Cf
- r1f = T - tsetup
- y1f = y1 + 1
- Cf: flip-flop input capacitance
- Rf: flip-flop output resistance
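A Python sketch of the flip-flop step follows, assuming the test on this slide reads r1 - Rf c1 < 0 and the latency update reads y1f = y1 + 1. The function name and return format are hypothetical.

```python
def insert_flip_flop(c1, r1, y1, R_f, C_f, T, t_setup):
    """Propagate a candidate (c1, r1, y1) through an edge triggered flip-flop.

    Returns (c, r, y), or None when the insertion is skipped because the
    flip-flop output cannot meet the downstream required arrival time.
    """
    if r1 - R_f * c1 < 0:
        return None                       # long path constraint would be violated
    return (C_f, T - t_setup, y1 + 1)     # RAT restarts one cycle earlier; latency grows by 1


# Illustrative values only.
print(insert_flip_flop(2.0, 8.0, 0, R_f=1.0, C_f=0.5, T=8.0, t_setup=0.5))
print(insert_flip_flop(10.0, 8.0, 0, R_f=1.0, C_f=0.5, T=8.0, t_setup=0.5))
```

Unlike the buffer case, the new required arrival time does not depend on r1: any slack left downstream of a flip-flop is lost, which is the rigidity that latches avoid.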
21 REPEAT: Insert Latch
(c1, r1, y1, 0)
- r = r1 - RL c1
- If r < -Tp, quit (long path constraint)
- Use r to decide delay padding (short path constraint)
- c1L = CL
- r1L = min(T - tsetup, T + r)
- y1L = y1 + 1
- CL: latch input capacitance
- RL: latch output resistance
- r < -tsetup implies time borrowing
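The latch step differs from the flip-flop step in that a negative r is tolerated down to -Tp. The Python sketch below assumes the updates read r = r1 - RL c1 and r1L = min(T - tsetup, T + r), so that time borrowing occurs exactly when r < -tsetup. It leaves out the short path (delay padding) handling, and the function name and return format are hypothetical.

```python
def insert_latch(c1, r1, y1, R_L, C_L, T, T_p, t_setup):
    """Propagate a candidate (c1, r1, y1) through a level sensitive latch.

    Returns ((c, r, y), borrowed) or None when the long path constraint fails.
    `borrowed` is True when the latch must borrow time from the next cycle.
    Delay padding for the short path constraint is not modeled here.
    """
    r = r1 - R_L * c1
    if r < -T_p:
        return None                           # later than the transparent phase allows
    r_new = min(T - t_setup, T + r)           # T + r applies when time is borrowed
    borrowed = r < -t_setup
    return (C_L, r_new, y1 + 1), borrowed


# Illustrative values only.
print(insert_latch(2.0, 8.0, 0, R_L=1.0, C_L=0.5, T=8.0, T_p=4.0, t_setup=0.5))
print(insert_latch(2.0, 0.0, 1, R_L=1.0, C_L=0.5, T=8.0, T_p=4.0, t_setup=0.5))
print(insert_latch(2.0, -3.0, 1, R_L=1.0, C_L=0.5, T=8.0, T_p=4.0, t_setup=0.5))
```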
22 Delay Padding
- Short path violation can be fixed by delay padding
[Figure: extra load added to a wire for delay padding]
23 Uniform Delay Padding
- If delay > Tp + thold, there is no short path violation
- Pad delay if delay < Tp + thold when inserting a latch
- If r > T - tsetup - Tp - thold, pad delay of r - (T - tsetup - Tp - thold)
[Figure: time axis from 0 to T marking r, tsetup and Delay]
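The padding amount for the uniform scheme can be computed directly from r. This Python sketch assumes the downstream delay equals T - tsetup - r, so that the condition delay < Tp + thold is equivalent to r > T - tsetup - Tp - thold. The function name is hypothetical.

```python
def uniform_padding(r, T, T_p, t_setup, t_hold):
    """Delay to pad at latch insertion under uniform delay padding.

    Assumption: downstream delay = T - t_setup - r, so a short path violation
    is possible when r exceeds T - t_setup - T_p - t_hold.
    Returns 0.0 when no padding is needed.
    """
    threshold = T - t_setup - T_p - t_hold
    return max(0.0, r - threshold)


# Illustrative values only: threshold = 8 - 0.5 - 4 - 0.5 = 3.
print(uniform_padding(5.0, T=8.0, T_p=4.0, t_setup=0.5, t_hold=0.5))
print(uniform_padding(2.0, T=8.0, T_p=4.0, t_setup=0.5, t_hold=0.5))
```

With r = 5 the downstream delay is 2.5, below Tp + thold = 4.5, and padding by 2 brings it up to exactly 4.5.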
24 Pessimism of Uniform Delay Padding
[Figure: Latch A drives Latch B through a wire delay; timing diagram shows the clock, the arrival time at A, and an early arrival]
- Even if delay < Tp + thold, the short path constraint may be satisfied when the signal arrival time is late
- Uniform delay padding may cause unnecessary padding
- Arrival time is not known in bottom-up solution propagation!
25 Deferred Delay Padding
- Along with solution propagation
- Note potential delay padding
- Defer actual padding till arrival information is available
26 Additional Characteristics for Solutions
- Earliest required arrival time r_e
- r_e < RAT < r
- Large r_e implies more chance of actual padding
- Potential delay padding p associated with a specific short path
- Each solution is characterized by (c, r, y, a, r_e, p)
- At sink j, solution is (cj, rj, 0, 0, rj - T, 0)
27 JOIN Operation for Deferred Delay Padding
(r_e denotes the earliest required arrival time)
(cl, rl, yl, al, r_e,l, pl)
(cr, rr, yr, ar, r_e,r, pr)
- cjoin = cl + cr
- rjoin = min(rl, rr)
- yjoin = max(yl, yr)
- ajoin = al ∪ ar
- r_e,join = max(r_e,l, r_e,r)
- pjoin = pl ∪ pr
- If r_e,join > rjoin, pad delay
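The JOIN rules can be sketched in Python as follows. Here r_e stands for the earliest required arrival time, the two garbled operators are read as set unions, and the repeater assignment and potential paddings are stored as sets. The class name, field layout and path labels are hypothetical choices for illustration.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Solution(NamedTuple):
    c: float                          # downstream capacitance
    r: float                          # latest required arrival time
    y: int                            # latency
    a: FrozenSet[str]                 # repeater assignment
    r_e: float                        # earliest required arrival time
    p: FrozenSet[Tuple[str, float]]   # potential paddings: (short path label, amount)


def join(left: Solution, right: Solution) -> Tuple[Solution, bool]:
    """Merge two branch solutions; the flag reports whether padding is needed."""
    merged = Solution(
        c=left.c + right.c,               # capacitances add at the branch point
        r=min(left.r, right.r),           # tightest latest RAT
        y=max(left.y, right.y),           # worst latency
        a=left.a | right.a,
        r_e=max(left.r_e, right.r_e),     # tightest earliest RAT
        p=left.p | right.p,
    )
    return merged, merged.r_e > merged.r  # empty arrival window: pad delay


# Illustrative values only.
left = Solution(1.0, 8.0, 1, frozenset({"latch@b"}), 6.0, frozenset({("a-b", 2.0)}))
right = Solution(0.5, 5.0, 0, frozenset(), -3.0, frozenset())
merged, needs_padding = join(left, right)
print(merged, needs_padding)
```

In this example the left branch requires arrival no earlier than 6 while the right branch requires arrival no later than 5, so no single arrival time satisfies both and padding is flagged.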
28 Latch Insertion Revisited
(r_e denotes the earliest required arrival time)
(c1, r1, y1, 0, r_e,1, p1)
(c1L, r1L, y1L, a1L, r_e,1L, p1L)
- r = r1 - RL c1
- If r < -Tp, quit (long path constraint)
- Use r_e for deferred delay padding; decide
- Potential delay padding p
- Whether any part of p needs to be instantiated
- Update r_e
- c1L = CL
- r1L = min(T - tsetup, T + r)
- y1L = y1 + 1
- r < -tsetup implies time borrowing
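29 DDP: Generating Potential Delay Padding
- If -Tp < r_e < 0 (r_e: earliest required arrival time), generate p = r_e + Tp, propagate r_e through latch
- Neglect setup/hold time for simplicity
[Figure: nodes a, b, c in series with Dbc = 2; clock time axis marked 0, 4, 8]
At c: r = 8, r_e = 0, p = 0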
30 DDP: Small Earliest RAT
- If r_e < -Tp (r_e: earliest required arrival time), arrival time can be arbitrarily early without causing short path violation
- No padding, p = 0, r_e = 0 after latch insertion
[Figure: nodes a, b, c in series with Dbc = 2, Dab = 11; clock time axis marked 0, 4, 8]
At b: r = 8, r_e = 6, p = 2
No short path violation even without padding
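31 DDP: Medium Earliest RAT
- If -Tp < r_e < 0 and r_e + Tp < p (r_e: earliest required arrival time), update p, propagate r_e, no actual padding
[Figure: nodes a, b, c in series with Dbc = 2, Dab = 9; clock time axis marked 0, 4, 8, 16]
At b: r = 8, r_e = 6, p = 2
p = min(previous r_e + Tp, previous p), potential padding on (a, b)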
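32 DDP: Large Earliest RAT
- If r_e > p > 0 (r_e: earliest required arrival time), instantiate padding of amount p! Reset p and r_e
[Figure: nodes a, b, c in series with Dbc = 2, Dab = 3; clock time axis marked 0, 4, 8]
At b: r = 8, r_e = 6, p = 2
After WIRE ab: r = 5, r_e = 3, p = 2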
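The three DDP cases can be replayed on the slide numbers with a short Python sketch. Here r_e is the earliest required arrival time, and Tp = 4 is an assumption read off the time axis marks 0, 4, 8. Boundary values and the range 0 <= r_e <= p are not covered by the slides, so the sketch reports them as "unspecified". The function name and return format are hypothetical.

```python
def ddp_case(r_e, p, T_p):
    """Classify the earliest RAT r_e at latch insertion.

    Returns (case, new_p, instantiated_padding).
    p == 0 means no potential padding has been noted yet.
    Assumption: r_e == -T_p is treated as the medium case.
    """
    if r_e < -T_p:
        return ("small", 0.0, 0.0)            # arrival may be arbitrarily early
    if r_e < 0:
        candidate = r_e + T_p
        new_p = candidate if p == 0 else min(candidate, p)
        return ("medium", new_p, 0.0)         # note potential padding, defer it
    if r_e > p > 0:
        return ("large", 0.0, p)              # instantiate padding p, then reset p
    return ("unspecified", p, 0.0)            # not covered by the slides


# Sink c has r_e = 0; wire bc (Dbc = 2) gives r_e = -2 at b: potential padding of 2.
print(ddp_case(-2.0, 0.0, 4.0))
# After the latch at b: r_e = 6, p = 2. Wire ab then shifts r_e by Dab.
print(ddp_case(6.0 - 11.0, 2.0, 4.0))   # Dab = 11
print(ddp_case(6.0 - 9.0, 2.0, 4.0))    # Dab = 9
print(ddp_case(6.0 - 3.0, 2.0, 4.0))    # Dab = 3
```

The longer the wire ab, the earlier the signal may arrive without harm, so only the shortest wire (Dab = 3) turns the noted potential padding into actual padding.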
33 Experiment: MiLa without Blockages
[Figure: latency results]
sinks/net 117; Runtime 0.01-0.07 s
34 Experiment: MiLa with Blockages
[Figure: latency results. Annotation: No feasible solutions with only FF]
blockages 340
35 Experiment: GiLa without Blockages
[Figure: area results. Annotation: No feasible solutions with only FF]
36 Experiment: GiLa with Blockages
[Figure: area results. Annotation: No feasible solutions with only FF]
37 Conclusions
- Wire pipelining becomes a necessity and places greater demands on timing and power/area
- Advantages of using latches: area and timing flexibility
- Short path constraint can be solved
- Mixed latch and flip-flop
- Proper delay padding
- Timing advantage of latches can be traded for variation tolerance
38 Thank You!