Title: Multi-VDD FPGA Architecture
1. Multi-VDD FPGA Architecture
- CSE 598C Project
- Fall 2003
- Aman Gayasen, Ki-Yong Lee
2. FPGA Overview: Virtex-II Architecture
3. Power Consumption in FPGAs
- Orders of magnitude larger than in embedded processors, which are used extensively in mobile devices
- No embedded FPGA yet (but on its way)
- Static (leakage) power
  - Smallest Virtex-II: 200 mW
  - No standby modes
4. Dynamic Power Breakdown in Virtex-II
Source: Li Shang et al., FPGA 2002
5. Leakage Power Breakdown of a 90 nm CLB Array (Spartan-3)
Leakage in configuration SRAMs can be easily optimized using conventional techniques.
- Leakage in Virtex-II FPGAs
  - 25% of total power (average)
  - 200 mW for the smaller Virtex-II devices (0.13 µm)
  - 540 mW for the largest Virtex-II devices
6. Leakage Recap: Dependence on Vdd
- Sub-threshold leakage $\propto V_{dd}$
- DIBL $\propto \exp(V_{dd})$
- Gate leakage $\propto \exp(V_{dd})$
Lowering Vdd is therefore an effective way to reduce leakage.
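To make the exponential trend concrete, here is a minimal sketch under a simplified model. It assumes that off-state leakage is dominated by DIBL, so that I_off ∝ exp(η·Vdd/(n·v_T)), and that leakage power is Vdd·I_off. The DIBL coefficient η, slope factor n, thermal voltage v_T and the 1.2 V / 0.9 V supplies are hypothetical values, not measurements from this project.

```python
import math

def leakage_power_ratio(vdd_low, vdd_high, eta=0.1, n=1.5, v_t=0.026):
    """P_leak(vdd_low) / P_leak(vdd_high) under an assumed DIBL-dominated model.

    Assumes P_leak = Vdd * I_off with I_off proportional to exp(eta*Vdd/(n*v_t)).
    eta (DIBL coefficient), n (sub-threshold slope factor) and v_t (thermal
    voltage, volts) are illustrative defaults, not fitted device parameters.
    """
    current_ratio = math.exp(eta * (vdd_low - vdd_high) / (n * v_t))
    return (vdd_low / vdd_high) * current_ratio

ratio = leakage_power_ratio(0.9, 1.2)
```

With these assumed parameters the ratio comes out to roughly 0.35: a 25% supply reduction cuts leakage power by about 65%, because the linear Vdd factor and the exponential current term compound.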
7. Multi-Vdd Strategy
- Let the non-critical paths run at lower voltage
- Saves both dynamic and static power
- Can maintain performance while saving power
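A back-of-the-envelope sketch of the dynamic-power side, assuming the usual P = α·C·Vdd²·f model with the clock frequency unchanged (only non-critical paths are slowed). The fraction of switched capacitance moved to low Vdd and the supply values are hypothetical, and level-converter overhead is ignored.

```python
def dynamic_power_ratio(low_fraction, vdd_low, vdd_high):
    """Dynamic power relative to an all-high-Vdd design (P = alpha*C*Vdd^2*f).

    low_fraction: assumed fraction of switched capacitance (alpha*C) at vdd_low.
    The clock frequency f is unchanged, so it cancels out of the ratio.
    Level-converter power is not included in this sketch.
    """
    if not 0.0 <= low_fraction <= 1.0:
        raise ValueError("low_fraction must be between 0 and 1")
    return (1.0 - low_fraction) + low_fraction * (vdd_low / vdd_high) ** 2

ratio = dynamic_power_ratio(0.6, 0.9, 1.2)
```

Under these assumptions the ratio is about 0.74, i.e. roughly a 26% dynamic-power saving at the same Tclock.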
8. Issues in the Multi-Vdd Approach
- Optimal voltage assignment is an NP-complete problem.
- Level converters must be inserted wherever a low-Vdd gate drives a high-Vdd gate.
  - They consume power and introduce delay.
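The level-converter rule above can be sketched as a scan over the netlist's connections once a Vdd assignment exists. The edge-list representation and the "high"/"low" labels are hypothetical encodings chosen for illustration.

```python
def level_converter_sites(edges, vdd):
    """Return the (driver, sink) connections that need a level converter.

    edges: iterable of (driver, sink) pairs from the netlist.
    vdd:   dict mapping node -> "high" or "low" (illustrative encoding).
    Only a low-Vdd driver feeding a high-Vdd sink needs a converter;
    high-to-low connections are treated as safe in this sketch.
    """
    return [(u, v) for u, v in edges if vdd[u] == "low" and vdd[v] == "high"]

sites = level_converter_sites(
    [("a", "b"), ("b", "c"), ("c", "d")],
    {"a": "high", "b": "low", "c": "high", "d": "low"},
)
```

Here only the b-to-c connection needs a converter; each one adds the power and delay cost listed above, which is why an assignment that keeps low-Vdd regions contiguous is preferable.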
9. Multi-Vdd FPGA
- Expected to reduce both dynamic and static power consumption of logic as well as routing resources.
- May need redundant level converters, because different designs may need a different number and placement of level converters.
10. Versatile Place and Route (VPR): An FPGA Place-and-Route Tool
- Developed at the University of Toronto
- Open source
- Flexible
- Various architectural parameters can be specified as inputs
- Simple architecture
- Basic CLB design similar to Xilinx/Altera FPGAs
11. VPR: Example Architectural Parameters
- LUT size (number of inputs)
- Number of slices (subblocks) in a CLB
- Various parameters associated with the routing matrix
- Delay values for timing-driven routing
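As a rough illustration of how such parameters fit together, the sketch below groups a few of them into one record. The class and field names are hypothetical and are not VPR's architecture-file keywords. It assumes one K-input LUT per subblock, so the LUT configuration SRAM per CLB is N·2^K bits.

```python
from dataclasses import dataclass

@dataclass
class ClbParams:
    """Hypothetical container for a few VPR-style architectural parameters."""
    lut_size: int         # K, number of LUT inputs
    slices_per_clb: int   # N, subblocks per CLB (one K-LUT each, assumed)
    channel_width: int    # routing tracks per channel
    lut_delay_ps: float   # delay value used by timing-driven routing

    def lut_config_bits(self) -> int:
        # Each K-input LUT stores a 2**K-entry truth table in configuration SRAM.
        return self.slices_per_clb * 2 ** self.lut_size

arch = ClbParams(lut_size=4, slices_per_clb=4, channel_width=100, lut_delay_ps=300.0)
```

For K = 4 and N = 4 this gives 64 LUT configuration bits per CLB; such SRAM bits are the ones whose leakage slide 5 says can be optimized conventionally.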
12. Algorithm for Assigning Vdds (for Two Vdds)
- List the paths whose delays exceed the required clock period when they are operated at low Vdd.
- Assign a criticality attribute to each node: a measure of how many paths contain the node.
- Mark all nodes on the critical path as high Vdd.
- For every other path, traverse its nodes in decreasing order of criticality and mark them as high Vdd until the path delay is at most Tclock.
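The four steps above can be sketched as a greedy pass over an explicit path list. This is a minimal sketch under stated assumptions, not the project's implementation: paths are given as node lists (for example from a k-longest-paths search), low Vdd multiplies every node delay by a single assumed factor (1.2, matching the 20% figure used in the results), criticality counts only the violating paths, and ties are broken by position on the path.

```python
from collections import Counter

def assign_vdd(paths, delay_high, t_clock, slowdown=1.2):
    """Greedy two-Vdd assignment; returns the set of nodes kept at high Vdd.

    paths:      list of paths, each a list of node names
    delay_high: dict node -> delay at high Vdd
    slowdown:   assumed delay multiplier at low Vdd (1.2 = 20% slower)
    Nodes not returned (including nodes on no violating path) run at low Vdd.
    """
    def path_delay(path, high):
        return sum(delay_high[n] if n in high else delay_high[n] * slowdown
                   for n in path)

    # Step 1: paths that miss t_clock when every node runs at low Vdd.
    violating = [p for p in paths if path_delay(p, set()) > t_clock]
    high = set()
    if not violating:
        return high

    # Step 2: criticality = number of violating paths containing each node.
    criticality = Counter(n for p in violating for n in set(p))

    # Step 3: the critical (longest at high Vdd) path runs entirely at high Vdd.
    critical = max(violating, key=lambda p: path_delay(p, set(p)))
    high.update(critical)

    # Step 4: on every other path, raise nodes in decreasing criticality
    # until the path meets t_clock (or every node on it is already high).
    for p in violating:
        if p is critical:
            continue
        for n in sorted(dict.fromkeys(p), key=lambda n: -criticality[n]):
            if path_delay(p, high) <= t_clock:
                break
            high.add(n)
    return high

delays = {"a": 10, "b": 10, "c": 10, "d": 7, "e": 7, "f": 7, "g": 7}
paths = [["a", "b", "c"], ["d", "e", "f", "g"], ["a", "d"]]
high_nodes = assign_vdd(paths, delays, t_clock=30)
```

With Tclock = 30, the path a-b-c (30 at high Vdd) is critical and stays high; d-e-f-g (28 at high Vdd, about 33.6 at low Vdd) only needs d, e and f raised, so g stays at low Vdd. The path a-d never violates and adds no high-Vdd nodes of its own.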
13. Algorithm for Finding the k Longest Paths in a Timing Graph
- Generate the longest path P_1
- Prepare list_1, the ordered list of branch slacks of P_1
- Calculate next_delay(P_1)
- While fewer than k paths have been generated:
  - i ← the path with the longest next_delay
  - j ← the first branch point in list_i
  - Generate the next longest path P_{k+1} by branching out from the j-th node on path P_i
  - Prepare list_{k+1} and calculate next_delay(P_{k+1})
  - Update next_delay(P_i)
  - k ← k + 1
Source: Ju et al., DAC 1991
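For experimentation, the sketch below enumerates the k longest source-to-endpoint paths in an acyclic timing graph with node delays. It is a swapped-in best-first search in the same spirit (branching off partial paths guided by the best possible remaining delay), not Ju et al.'s exact list-of-branch-slacks procedure. It assumes that registers have already cut all loops, so the graph is a DAG. The function name and data layout are hypothetical.

```python
import heapq
import itertools

def k_longest_paths(succ, delay, k):
    """Return up to k (delay, path) pairs, longest first, in a DAG timing graph.

    succ:  dict node -> list of fan-out nodes
    delay: dict node -> node delay (every node must appear here)
    """
    indeg = {n: 0 for n in delay}
    for u in succ:
        for v in succ[u]:
            indeg[v] += 1
    sources = [n for n in delay if indeg[n] == 0]

    # Topological order (Kahn's algorithm); assumes the graph is acyclic.
    order, ready = [], list(sources)
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succ.get(u, []):
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)

    # to_sink[u]: longest delay from u (inclusive) to any timing endpoint.
    to_sink = {}
    for u in reversed(order):
        to_sink[u] = delay[u] + max((to_sink[v] for v in succ.get(u, [])), default=0)

    # Priority = delay so far + best possible completion (an exact upper bound).
    tie = itertools.count()
    heap = [(-to_sink[s], next(tie), 0, [s]) for s in sources]
    heapq.heapify(heap)
    result = []
    while heap and len(result) < k:
        _, _, prefix, path = heapq.heappop(heap)
        prefix += delay[path[-1]]
        fanout = succ.get(path[-1], [])
        if not fanout:  # endpoint reached: bound equals the actual path delay
            result.append((prefix, path))
            continue
        for v in fanout:
            heapq.heappush(heap, (-(prefix + to_sink[v]), next(tie), prefix, path + [v]))
    return result

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
node_delay = {"a": 1, "b": 2, "c": 3, "d": 1}
top_paths = k_longest_paths(graph, node_delay, k=2)
```

The quantity to_sink[u] - delay[u] - to_sink[v] is the slack of branching from u to a non-best fan-out v, which is the role branch slacks play in the slide's description. Because every queue entry carries an exact upper bound, complete paths leave the queue in decreasing delay order: here a-c-d (5) followed by a-b-d (4).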
14. Results
- Assuming that circuit delay increases by 20% at low Vdd, more than 60% of the nodes can be made low Vdd without increasing Tclock.
- (This does not mean that 60% of the circuit can run at low Vdd: a node in the timing graph does not always correspond to an independent circuit element.)
15. Circuit-Level Work
- 1. Logic Slice
- 2. D-flipflop
- 3. Level Converter
16. Logic Slice Design
- 65 nm, BSIM4 model
- K-input LUT: transmission-gate mux
- Edge-Triggered D-Flipflop
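Behaviorally, a K-input LUT built from transmission-gate muxes is a binary tree of 2:1 muxes that selects one of 2^K configuration bits, with one tree level per input. The sketch below models only that logic function (not delay or power). Its bit-ordering convention, with inputs[0] as the least significant select bit, is an assumption.

```python
def lut_eval(config_bits, inputs):
    """Evaluate a K-input LUT as a tree of 2:1 multiplexers.

    config_bits: list of 2**K SRAM bits; config_bits[i] is the output for the
                 input combination whose binary value is i (inputs[0] = LSB).
    inputs:      list of K input bits (0 or 1).
    """
    if len(config_bits) != 2 ** len(inputs):
        raise ValueError("a K-input LUT needs exactly 2**K configuration bits")
    level = list(config_bits)
    # Each mux level halves the candidates; it is steered by one input, LSB first.
    for sel in inputs:
        level = [level[2 * j + 1] if sel else level[2 * j]
                 for j in range(len(level) // 2)]
    return level[0]

xor_config = [0, 1, 1, 0]  # 2-input LUT programmed as XOR
```

For example, lut_eval(xor_config, [1, 0]) selects config bit 1 and returns 1. In the circuit, each list comprehension corresponds to one rank of transmission-gate muxes, so the signal passes through K mux stages from the SRAM bit to the output.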
17. Logic Slice
- Combinational logic delay T_comb (ps)
18. Logic Slice
- Sequential logic delay T_seq_in (ps)
19. Logic Slice
- Sequential logic delay T_seq_out (ps)
20. Logic Slice
- Dynamic power (µW)
- Frequency = 500 MHz
21. Logic Slice
22. D-Flipflop
- Dynamic power (µW) at 500 MHz
- Compared with 0.18 µm technology
23. D-Flipflop Comparison
(Chart axes: current I in pA and in nA)
24. Level Converter
- Design
- 65 nm, BSIM4 model
- Delay, dynamic power, leakage current
25. Level Converter
26. Level Converter
- Dynamic power (µW)
- Frequency = 500 MHz
27. Level Converter
28. Summary