Design Techniques for Power Reduction - PowerPoint PPT Presentation

Slides: 38
Provided by: bori99
Transcript and Presenter's Notes

1
Design Techniques for Power Reduction
  • Borivoje Nikolic
  • bora_at_eecs.berkeley.edu

2
Digital IC Challenges
  • Robustness
  • Device features scale faster than their tolerance
  • Environment impact: supply noise, coupling, ...
  • Tradeoff with power and performance
  • Power
  • Active power
  • Drain and gate leakage
  • Many known power reduction approaches
  • Trade-off performance for power savings
  • Affect robustness
  • Cost
  • NRE, masks, complexity
  • Good news: density and intrinsic delays improve

3
Power is a Problem
  • If we continue doing business as usual, both
    dynamic and leakage power will be a problem

chips are getting hot
and phones leaky!
  • Need to deliver maximum performance under power
    constraints

From S. Borkar, Intel
4
Outline
  • Know your enemy: power consumption in CMOS
  • Power and performance have to be jointly
    optimized
  • Reducing leakage
  • Robust design
  • BEE and INSECTA
  • Conclusions

5
Dynamic Power Consumption
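[Figure: CMOS gate schematic. A PMOS network (connected to Vdd) and an NMOS network share inputs A1 ... AN and drive the output Vout, which is loaded by the capacitance CL; the current iL is marked.]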
  • One half of the power from the supply is consumed
    in the pull-up network and one half is stored on
    CL
  • This also happens during glitching
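
The half-and-half split stated above can be checked by integrating over one full 0 to VDD output transition. This is a short textbook-style derivation, assuming an ideal constant supply VDD and a linear load capacitance CL; the labels E_supply, E_C, and E_PMOS are introduced here for illustration and do not appear on the slide.

```latex
\begin{align}
E_{\text{supply}} &= \int_0^{\infty} i_{V_{DD}}(t)\, V_{DD}\, dt
  = V_{DD} \int_0^{V_{DD}} C_L \, dv_{out} = C_L V_{DD}^2 \\
E_{C} &= \int_0^{\infty} i_{V_{DD}}(t)\, v_{out}(t)\, dt
  = \int_0^{V_{DD}} C_L\, v_{out}\, dv_{out} = \tfrac{1}{2}\, C_L V_{DD}^2 \\
E_{\text{PMOS}} &= E_{\text{supply}} - E_{C} = \tfrac{1}{2}\, C_L V_{DD}^2
\end{align}
```

Under these assumptions the result does not depend on the resistance of the pull-up network. The half stored on CL is later dissipated in the pull-down network when the output discharges.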

6
Transistor Leakage
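VDS = 1.2 V
[Figure: MOS transistor with gate (G), source (S), drain (D), and substrate (Sub) terminals, showing the capacitances Ci and Cd.]
Subthreshold slope: $S = \frac{kT}{q}\,\ln 10\,\left(1 + \frac{C_d}{C_i}\right)$
Drain leakage current is exponential with VGS.
Subthreshold slope is 70 mV/dec at room temperature.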
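
The slope expression S = (kT/q) ln10 (1 + Cd/Ci) can be evaluated numerically. The sketch below is only an illustration: the function name is hypothetical, and the ratio Cd/Ci = 0.176 is an assumed value chosen because it reproduces roughly the 70 mV/dec quoted on the slide at 300 K.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def subthreshold_slope_mv_per_dec(cd_over_ci, temp_k=300.0):
    """Return S = (kT/q) * ln(10) * (1 + Cd/Ci) in mV per decade."""
    thermal_voltage = K_BOLTZMANN * temp_k / Q_ELECTRON  # kT/q in volts
    return 1e3 * thermal_voltage * math.log(10) * (1.0 + cd_over_ci)


ideal = subthreshold_slope_mv_per_dec(0.0)      # Cd/Ci = 0: ideal limit
assumed = subthreshold_slope_mv_per_dec(0.176)  # assumed ratio, for illustration
print(f"ideal limit: {ideal:.1f} mV/dec, Cd/Ci = 0.176: {assumed:.1f} mV/dec")
```

With Cd/Ci = 0 the expression gives the ideal room-temperature limit of about 59.5 mV/dec; any depletion capacitance makes the slope worse (larger).
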
7
Transistor Leakage
3-10x in current technologies
  • Two effects
  • diffusion current (like a bipolar transistor)
  • exponential increase with VDS (DIBL)

8
Gate Tunneling
  • $I_{GD} \propto e^{-T_{ox}}\, e^{V_{gd}}$,
  • $I_{GS} \propto e^{-T_{ox}}\, e^{V_{gs}}$
  • Independent of the subthreshold leakage
  • Contributes to the total leakage
  • Modeled in BSIM4
  • Also in BSIM3v3 but foundries usually do not
    include it

9
Reducing active power
  • Downsizing transistors (CL)
  • Slows down logic
  • Lowering the supply voltage (VDD)
  • Slows down logic
  • Reducing swing slows down the succeeding stage
  • Reducing frequency (f)
  • Does not reduce energy
  • Reducing switching activity (a)
  • Logic restructuring
  • Reducing glitching
  • Balancing logic
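
The four knobs listed above (CL, VDD, f, a) map onto the standard first-order switching-power expression P = a · CL · VDD² · f. The slide does not state this formula, so the sketch below is an assumption-labeled illustration; the function names and all numeric values are hypothetical.

```python
def active_power(activity, c_load, vdd, freq):
    """First-order switching power: P = a * CL * VDD^2 * f (watts)."""
    return activity * c_load * vdd ** 2 * freq


def energy_per_cycle(activity, c_load, vdd):
    """Switching energy per clock cycle: E = a * CL * VDD^2 (joules)."""
    return activity * c_load * vdd ** 2


# Hypothetical operating point: a = 0.1, 1 nF total switched capacitance,
# VDD = 1.2 V, f = 1 GHz.
base_power = active_power(0.1, 1e-9, 1.2, 1e9)
half_freq_power = active_power(0.1, 1e-9, 1.2, 0.5e9)
base_energy = energy_per_cycle(0.1, 1e-9, 1.2)
low_vdd_energy = energy_per_cycle(0.1, 1e-9, 0.6)

print(f"power at f:      {base_power:.3f} W")
print(f"power at f/2:    {half_freq_power:.3f} W")
print(f"energy at VDD:   {base_energy:.3e} J")
print(f"energy at VDD/2: {low_vdd_energy:.3e} J")
```

In this model, halving f halves the power but leaves the energy per cycle untouched, which is the point of "Does not reduce energy". Halving VDD cuts the energy per cycle by a factor of four, at the cost of slower logic.
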

10
Reducing Active Power
  • Downsizing, lowering the supply on the critical
    path will lower the operating frequency
  • Downsize non-critical paths
  • Narrows down the path delay distribution
  • Increases impact of variations

11
Reducing Leakage
  • Using higher thresholds
  • Channel doping
  • Body biasing
  • Reduces drive current
  • Using stack effect
  • Stacked devices
  • Sleep transistors
  • Using longer transistors
  • Limited benefit
  • Increase in active current
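
The trade-off behind "using higher thresholds" and "reduces drive current" can be estimated to first order. The sketch below assumes that subthreshold leakage falls by one decade per S millivolts of threshold increase (with S = 70 mV/dec, the value quoted earlier in the deck) and that on-current follows the alpha-power law, Ion ∝ (VDD − Vth)^α. The exponent α = 1.3, the voltages, and the function names are assumptions for illustration, not values from the slides.

```python
def leakage_reduction(delta_vth_mv, slope_mv_per_dec=70.0):
    """Factor by which subthreshold leakage drops when Vth rises by delta_vth_mv."""
    return 10.0 ** (delta_vth_mv / slope_mv_per_dec)


def drive_current_ratio(vdd, vth, delta_vth, alpha=1.3):
    """Alpha-power-law estimate of on-current after / before raising Vth."""
    return ((vdd - vth - delta_vth) / (vdd - vth)) ** alpha


# Hypothetical example: VDD = 1.2 V, Vth raised from 0.3 V to 0.4 V.
print(f"leakage reduced by about {leakage_reduction(100.0):.1f}x")
print(f"drive current falls to about {drive_current_ratio(1.2, 0.3, 0.1):.2f} of nominal")
```

Under these assumptions a 100 mV threshold increase buys more than an order of magnitude in leakage for a drive-current loss of roughly 15%, which is why threshold selection is usually treated as a per-block design variable.
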

12
Power-Performance Optimization
[Figure: energy per operation (Energy/op) versus delay, with Emin, Emax, Dmin, and Dmax marked and the unoptimized design shown as a point.]
Maximize throughput for given energy,
or minimize energy for given throughput
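
The two dual formulations above can be illustrated on a discrete set of candidate designs, each described by a (delay, energy) pair. This is a minimal sketch, assuming a small hypothetical list of design points; the function names and numbers are made up for illustration and are not taken from the presentation.

```python
def pareto_front(designs):
    """Keep the (delay, energy) points that no other point dominates."""
    front = []
    for delay, energy in sorted(set(designs)):
        # Sorted by delay, so a point survives only if it lowers the energy.
        if not front or energy < front[-1][1]:
            front.append((delay, energy))
    return front


def min_energy_for_delay(designs, max_delay):
    """Minimize energy subject to a delay (throughput) constraint."""
    feasible = [p for p in designs if p[0] <= max_delay]
    return min(feasible, key=lambda p: p[1]) if feasible else None


def min_delay_for_energy(designs, max_energy):
    """Maximize throughput (minimize delay) subject to an energy budget."""
    feasible = [p for p in designs if p[1] <= max_energy]
    return min(feasible, key=lambda p: p[0]) if feasible else None


# Hypothetical normalized design points; (1.3, 0.6) plays the unoptimized design.
designs = [(1.0, 1.0), (1.1, 0.3), (1.3, 0.6), (1.5, 0.25), (2.0, 0.2)]
print(pareto_front(designs))
print(min_energy_for_delay(designs, 1.2))
print(min_delay_for_energy(designs, 0.25))
```

A design that lies off the Pareto front, such as the hypothetical (1.3, 0.6) point, can be improved in both energy and delay at once; points on the front can only trade one for the other.
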
13
Power-Performance Optimization
  • There are many sets of parameters to adjust
  • Tuning variables
  • Devices
  • Circuit (sizing, supply, threshold)
  • Logic style (std. cells, custom, ...)
  • Block topology (adder: CLA, CSA, ...)
  • Micro-architecture (parallel, pipelined)

14
Multi-Level Approach
  • Energy minimization subject to delay constraint
  • Optimal trade-off between energy and area

[Figure: optimization levels and the trade-off space used at each level]
Architecture: Energy-Area (Cost) Performance
Micro-Architecture: Energy-Performance
Circuit (Logic & FFs): Energy-Delay
D. Markovic
15
Sizing, Supply, Threshold Optimization
  • Transistor sizing can yield large power savings
    with small delay penalties
  • Gate sizing
  • Beta-ratio adjustments
  • Stack resizing
  • IBM EinsTuner
  • Supply voltage affects both active and leakage
    energy
  • Threshold voltages affect primarily the leakage

16
Optimization framework
  • Use for
  • Optimize datapath building blocks
  • Investigate the optimality of any given design
  • Use inside microarchitecture optimizer

R. Zlatanovici
17
Adders in Energy-Delay Space
Will demonstrate in 90nm
  • Sparse Radix-4 adder is the fastest
  • R. Zlatanovici, S. Kao

18
Scope of Circuit Level Optimization
  • By combining sizing, supply and threshold
    optimization block delay can be varied in the
    range 12
  • Limited effectiveness

[Figure: VDD, VTh, and sizing optimization for a 64-b adder example, plotting energy (relative to Enom) against delay (relative to Dnom). Marked points: nominal (Dmin, Enom) and sizing opt. (1.1 Dmin, 0.3 Enom).]
D. Markovic
19
Microarchitecture Optimization
  • Viterbi decoder ACS recursion: transforming from
    add-compare-select to compare-select-add

E. Yeo
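
One way to read the add-compare-select to compare-select-add transformation is as a retiming of the recursion: instead of storing path metrics and adding branch metrics at the start of each step, the registers hold the pre-added sums, so each step starts with the compare-select and ends with the add. The sketch below is a functional illustration under that assumption, using a fully connected trellis and hypothetical function names; it is not the circuit from the cited work, and it only checks that both orderings produce the same path metrics.

```python
def acs_step(pm, bm):
    """Add-compare-select: add branch metrics, then pick the minimum per state."""
    n = len(pm)
    return [min(pm[i] + bm[i][j] for i in range(n)) for j in range(n)]


def csa_step(sums, bm_next):
    """Compare-select-add on stored sums, where sums[i][j] = pm[i] + bm[i][j]."""
    n = len(sums)
    pm_new = [min(sums[i][j] for i in range(n)) for j in range(n)]  # compare-select
    return [[pm_new[j] + bm_next[j][k] for k in range(n)] for j in range(n)]  # add


def run_acs(pm0, branch_metrics):
    """Iterate the ACS recursion over a list of branch-metric matrices."""
    pm = list(pm0)
    for bm in branch_metrics:
        pm = acs_step(pm, bm)
    return pm


def run_csa(pm0, branch_metrics):
    """Iterate the retimed CSA recursion and return the final path metrics."""
    n = len(pm0)
    sums = [[pm0[i] + branch_metrics[0][i][j] for j in range(n)] for i in range(n)]
    for bm_next in branch_metrics[1:]:
        sums = csa_step(sums, bm_next)
    return [min(sums[i][j] for i in range(n)) for j in range(n)]


# Hypothetical 2-state example with three trellis steps.
bms = [[[1, 2], [3, 0]], [[2, 2], [0, 5]], [[4, 1], [1, 1]]]
print(run_acs([0, 0], bms), run_csa([0, 0], bms))
```

Both functions compute the same metrics; the difference is where the state boundary sits, which is what changes the critical path in hardware.
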
20
Sizing, Supply, Threshold Optimization
  • There exists an optimal supply and threshold for each
    function
  • At this optimum, ESw/ELk ≈ 2
  • Depends on logic depth, activity, function
  • Technology is not optimal for all blocks
  • Adjust during the design
  • Multiple supplies, thresholds
  • Variable throughput applications
  • Variable supplies, thresholds

21
Sizing, Supply, Threshold Optimization
Reference design: Dref (Vddmax, Vthref)
Large variation in optimal circuit parameters Vddopt, Vthopt, wopt
[Figure: plot with Vddmax, Vthmax, Vddmin, and Vthmin marked.]
Technology parameters (Vddmax, Vthref) are rarely optimal
22
Dynamic Sleep Transistor
Active mode
Noise on virtual supply
Logic block
23
Dynamic Sleep Transistor
Idle mode
Virtual supply collapse
M. Sheets
24
Design Variability
  • Power-performance optimization

[Figure: power versus performance, with points T, S, and F and the power, performance, and leakage constraints marked.]
25
Robust Optimization
  • Optimization with uncertain parameters (R.
    Zlatanovici)
  • Parameters are within an ellipsoid centered on
    the nominal values
  • Optimize the worst case
  • Optimization with stochastic parameters
  • Parameters are random variables with known
    distribution centered on the nominal values
  • Optimize for parametric yield in the power-delay
    space
  • Linear delay (logical effort based) models
    allow a convex formulation of the optimization
    with uncertain parameters
  • Bottom-up approach: get a handle on variations
    (K. Cao and Prof. Rabaey's group)
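
For a delay model that is linear in the uncertain parameters, the worst case over an ellipsoid has a closed form, which is one reason such formulations stay convex. The sketch below uses the standard robust linear programming result: if p = p0 + A·u with ||u|| ≤ 1, then the maximum of c·p is c·p0 + ||Aᵀc||. The function names, the shape matrix A, and the numbers are illustrative assumptions, not the formulation used in the cited work.

```python
import math


def delay_at(coeffs, nominal, shape, u):
    """Linear delay c . p evaluated at p = nominal + shape @ u."""
    n = len(nominal)
    p = [nominal[i] + sum(shape[i][k] * u[k] for k in range(len(u))) for i in range(n)]
    return sum(c * x for c, x in zip(coeffs, p))


def worst_case_delay(coeffs, nominal, shape):
    """Maximum of c . p over the ellipsoid p = nominal + shape @ u, ||u|| <= 1."""
    n = len(coeffs)
    nominal_delay = sum(c * p for c, p in zip(coeffs, nominal))
    # Entries of shape^T @ coeffs; their 2-norm is the worst-case margin.
    proj = [sum(shape[i][k] * coeffs[i] for i in range(n)) for k in range(len(shape[0]))]
    return nominal_delay + math.sqrt(sum(v * v for v in proj))


# Hypothetical two-parameter example.
coeffs = [1.0, 2.0]               # delay sensitivity to each parameter
nominal = [1.0, 1.0]              # nominal parameter values
shape = [[0.1, 0.0], [0.0, 0.2]]  # ellipsoid semi-axes

print(f"nominal delay:    {delay_at(coeffs, nominal, shape, [0.0, 0.0]):.4f}")
print(f"worst-case delay: {worst_case_delay(coeffs, nominal, shape):.4f}")
```

The worst-case term is a norm of a linear function of the coefficients, so constraining it is a second-order cone constraint rather than a combinatorial search over corners.
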

26
What's the Berkeley Emulation Engine?
  • A real-time FPGA-based hardware emulator, with
    speed up to 60 MHz
  • Emulation capacity of 10 Million ASIC
    gate-equivalents per module, corresponding to 600
    Gops (16-bit adds).
  • 2400 external parallel I/O providing 192 Gbps raw
    bandwidth.
  • Automated design flow from Simulink to FPGA
    emulation, integrated with INSECTA ASIC design
    flow.
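
The headline figures above can be cross-checked with simple arithmetic. The sketch below assumes that the raw bandwidth is spread evenly over all 2400 I/O pins and that the 600 Gops figure refers to operation at the 60 MHz maximum clock; both readings are assumptions, and the variable names are illustrative.

```python
io_pins = 2400            # external parallel I/O
raw_bandwidth_gbps = 192  # raw I/O bandwidth
clock_mhz = 60            # maximum emulation speed
peak_gops = 600           # 16-bit adds per second, in units of 1e9

# Bandwidth per pin, assuming an even split across all pins.
per_pin_mbps = raw_bandwidth_gbps * 1000 / io_pins

# Parallel 16-bit adds per clock cycle, assuming the peak rate is at 60 MHz.
adds_per_cycle = peak_gops * 1e9 / (clock_mhz * 1e6)

print(f"{per_pin_mbps:.0f} Mbps per pin")
print(f"{adds_per_cycle:.0f} parallel 16-bit adds per cycle")
```

Under these assumptions each pin carries 80 Mbps, and the 600 Gops figure corresponds to about 10,000 adders working in parallel at 60 MHz.
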

27
BEE Applications
  • Real-time hardware emulation
  • Novel Communication Systems with analog front-end
    hardware (MCMA, UWB, 60GHz)
  • Digital signal processing systems
  • Real-time control systems
  • Hardware acceleration
  • Large-scale communication/signal processing
    system simulation
  • Hardware-in-the-loop cosimulation with software
    system
  • Complex parallel computing algorithms

28
BEE Design Environment
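[Figure: block diagram of the BEE design environment. Components: servers, BEE processing unit, analog front-end, client PC. Links: network, Ethernet, LVDS/LVTTL. The BEE/Insecta design flow takes a Simulink MDL to an FPGA bit stream configuration file and an ASIC layout.]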
29
Design Flow: User's Perspective
Virtual Components
VHDL Netlist
30
Basic Blocks
FIFO
DPRAM
Shifter
VHDL
Concat
Enable
Const
ROM
RAM
Counter
Delay
Mux
Down
P to S
Convert
ReInt
S to P
Sync
Slice
Up Smp
Register
FPGA/ASIC Support
FPGA Support Only
Scale
Sin Cos
Shift
Thresh
31
Communication DSP Blocks
Puncture
Conv. Encoder
Depuncture
DDS
CIC
FIR
FFT
FPGA/ASIC Support
FPGA Support Only
32
MAP (BCJR) Decoder
  • Fully enclosed design
  • Uniform RNG input vector
  • Channel encoder
  • AWGN filter
  • Channel decoder
  • BER collection mechanism
  • Part of 3G Turbo Decoder

33
MAP Simulation
  • 10 MHz system clock
  • SNR 14 dB → -1 dB
  • 10^9 samples
  • < 30 minutes
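
The run time quoted above is consistent with a simple estimate. The sketch below reads the sample count as 10^9 and assumes one sample is processed per clock cycle, that 10^9 samples are collected at every SNR point, and that the sweep from 14 dB down to -1 dB uses 1 dB steps. None of these assumptions is stated on the slide, and the function name is hypothetical.

```python
def sweep_time_minutes(samples_per_point, clock_hz, snr_start_db, snr_stop_db, step_db=1):
    """Return (number of SNR points, total run time in minutes) for a BER sweep."""
    points = int(abs(snr_start_db - snr_stop_db) / step_db) + 1
    seconds = points * samples_per_point / clock_hz  # one sample per clock cycle
    return points, seconds / 60.0


points, minutes = sweep_time_minutes(1e9, 10e6, 14, -1)
print(f"{points} SNR points, about {minutes:.1f} minutes")
```

Under these assumptions the sweep takes 100 seconds per SNR point and about 27 minutes in total, in line with the "< 30 minutes" bullet.
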

34
ASIC Flow: INSECTA
  • Tcl/Tk code drives the flow
  • Same scripting language used by several EDA
    tools: First Encounter, NanoRoute, ModelSim,
    Synopsys
  • GUI controls technology selection, parameter
    selection, flow sequencing
  • A real push-button flow
  • Users can refine flow-generated scripts

35
ASIC Flow Details
  • PC Software
  • Matlab R13 (6.5)
  • Xilinx ISE
  • Xilinx SystemGenerator 2.2
  • BEE ISE
  • Xilinx ChipScope
  • Xilinx Parallel Cable
  • UNIX SW Versions
  • TCL/TK 8.3
  • Synopsys 2002.05
  • Cadence SoC Encounter 2.2 (NanoRoute)
  • ModelSim 5.6
  • Cadence SE (icfb 4.4.6)
  • Mentor Calibre

[Figure: design-flow chart. Each step is labeled with the tool that performs it, and optional design steps are marked.]
High-level design
Generate backend scripts (Insecta)
View hierarchy (Insecta)
Identify files and paths (Insecta)
Run floorplanning (First Encounter)
View logic schematic (DA)
Resolve design hierarchy (Insecta)
Backannotate netlist (DC)
Gate-level simulation (ModelSim)
Check hierarchy consistency (Insecta)
Run physical synthesis (DC/PSYN)
View floorplan (First Encounter)
Identify bad VHDL structures (Insecta)
Run signal integrity (First Encounter)
View routed design (NanoRoute)
Correct bad VHDL structures (Insecta)
Re-run physical synthesis (DC/PSYN)
View log files (Insecta)
View GDSII (pipo)
Generate synthesis scripts (Insecta)
Run route (NanoRoute)
Virtual component generation (MC)
Post process DFII (icfb)
Run (first) logic synthesis (DC)
36
4092-bit LDPC Decoder
  • 1.8 million transistors
  • 2.7 mm x 3.1 mm (10x smaller than a 1024-bit LDPC decoder)
  • 1 GHz (E. Yeo)
37
Conclusions
  • Power and energy are now primary design
    constraints
  • Variations do not scale as well as the feature
    sizes
  • Optimization has to be performed across all the
    levels of hierarchy
  • Using multiple/variable supplies and thresholds
    helps achieve optimality
  • BEE and INSECTA (in 0.13µm) are fully operational
  • LDPC chip taped out in May