Title: An Automated Design Flow for Low-Power, High-Throughput Dedicated Signal Processing Systems
1 An Automated Design Flow for Low-Power, High-Throughput Dedicated Signal Processing Systems
W. Rhett Davis, Ning Zhang, Kevin Camera, Dejan
Markovic, Tina Smilkstein, Nathan Chan, M. Josie
Ammer, Engling Yeo, Borivoje Nikolic, Robert W.
Brodersen
- http://bwrc.eecs.berkeley.edu
2 Energy-Efficiency of Architectures
[Figure: energy efficiency in MOPS/mW (or MIPS/mW), log scale from 0.1 to 1000, plotted against flexibility (coverage)]
- Dedicated HW (direct mapped): 100-1000 MOPS/mW
- Reconfigurable processor/logic, reconfiguration (???): potential of 10-100 MOPS/mW
- Embedded microprocessors: 0.1-1 MIPS/mW
3 Results in fully parallel solutions
Reducing supply voltage saves energy: E ∝ CV^2
Orders of magnitude lower efficiency even for an optimized processor architecture
(numbers taken from vendor-published benchmarks)
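As a rough illustration of the supply-voltage point, the sketch below assumes switching energy scales as C times V squared and compares two supply voltages. The 1.8 V reference and the 1 pF capacitance are arbitrary values chosen for illustration, not figures from the slides.

```python
def switching_energy(capacitance_f, vdd_v):
    """Energy per switching event under the assumed E = C * V^2 scaling."""
    return capacitance_f * vdd_v ** 2

C_LOAD = 1e-12           # 1 pF, arbitrary illustrative load
V_REFERENCE = 1.8        # assumed reference supply (not from the slides)
V_REDUCED = 1.0          # reduced supply

e_reference = switching_energy(C_LOAD, V_REFERENCE)
e_reduced = switching_energy(C_LOAD, V_REDUCED)
ratio = e_reduced / e_reference

# The capacitance cancels out, so only the voltage ratio matters.
print(f"energy at {V_REDUCED} V is {ratio:.0%} of energy at {V_REFERENCE} V")
```

Under these assumptions the reduced supply uses roughly a third of the switching energy per operation.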
4 Standard DSP-ASIC Design Flow
Problems
- Three translations of design data
- Requirements for re-verification at each stage
- Uncontrolled looping when pipeline stalls
Prohibitively Long Design Time for Direct Mapped
Architectures
5 Direct Mapping Design Flow
[Figure: flow diagram with blocks Algorithm/System, Simulation, Front-End, Back-End, Floorplan, RTL Libraries, Automated Flow, Mask Layout, Performance Estimates]
- Encourages iterations of layout
- Controls looping
- Reduces the flow to a single phase
- Depends on fast automation
6 Outline
- Chip-in-a-Day Flow
- User Perspective
- Design Decisions
- Estimation
- Automation
- Design Example
- Baseband Receiver
- Design Effort
- Problems
7 Capturing Design Decisions
- Categories
- Function - basic input-output behavior
- Signal - physical signals and types
- Circuit - transistors
- Floorplan - physical positions
Layout and performance estimates in a day
8 Performance Estimates
- Estimates vary in accuracy and execution time
- Different estimates should be available depending
on how many decisions have been made
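One way to picture estimates that depend on the decisions made so far is a small lookup keyed by decision category. This is only a sketch: the estimator names and their cost ranking are hypothetical, and only the four category names come from the slides.

```python
from dataclasses import dataclass

# Decision categories named in the deck.
CATEGORIES = ("function", "signal", "circuit", "floorplan")

@dataclass(frozen=True)
class Estimator:
    name: str             # hypothetical estimator name
    requires: frozenset   # decision categories that must already be made
    relative_cost: int    # hypothetical rank: higher = slower, more accurate

# Hypothetical catalogue, for illustration only.
ESTIMATORS = [
    Estimator("operation-count", frozenset({"function"}), 1),
    Estimator("switch-level-sim", frozenset({"function", "signal", "circuit"}), 2),
    Estimator("post-layout", frozenset(CATEGORIES), 3),
]

def available(decisions_made):
    """Estimators whose required decisions have all been made."""
    made = set(decisions_made)
    return [e for e in ESTIMATORS if e.requires <= made]

def most_detailed(decisions_made):
    """Most accurate estimator that can run now, or None if none can."""
    candidates = available(decisions_made)
    if not candidates:
        return None
    return max(candidates, key=lambda e: e.relative_cost)

print(most_detailed({"function", "signal"}).name)
```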
9 Push-Button Automation
- Automation similar to MAKE
- No decisions made after button is pressed
- No translation of design data
- No decisions expressed more than once
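The make-like behaviour can be sketched as a dependency walk that reruns only the stale steps. The stage names and timestamps below are hypothetical; the slides do not list the actual targets of the flow.

```python
def plan(target, rules, mtime):
    """Return the commands to rerun, in order, to bring `target` up to date.

    rules: target -> (list of sources, command name)
    mtime: file -> modification time; a missing key means "not built yet"
    """
    steps = []
    rebuilt = set()
    visited = set()

    def visit(node):
        if node in visited or node not in rules:
            return  # already handled, or a leaf input edited by the designer
        visited.add(node)
        sources, command = rules[node]
        for src in sources:
            visit(src)  # bring dependencies up to date first
        missing = node not in mtime
        stale = missing or any(
            src in rebuilt or mtime.get(src, 0) > mtime[node] for src in sources
        )
        if stale:
            steps.append(command)
            rebuilt.add(node)

    visit(target)
    return steps

# Hypothetical stages of an automated flow.
RULES = {
    "netlist": (["dataflow_graph"], "generate netlist"),
    "layout": (["netlist", "floorplan"], "place and route"),
    "estimates": (["layout"], "extract and estimate"),
}

# Only the floorplan changed since the last run (timestamp 7 is newest).
stamps = {"dataflow_graph": 1, "floorplan": 7, "netlist": 3, "layout": 4, "estimates": 6}
print(plan("estimates", RULES, stamps))
```

Because every step is derived from the rules, nothing is decided after the button is pressed: the designer edits inputs, and the plan follows from their timestamps.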
10 Example: Multiple Graphs
- Early versions inferred pads from ports
- Another copy of design was maintained to examine internal signals (decisions expressed twice)
- Today's version allows SimOnly ports
11 Choice of Computation Model
- Balance between conflicting needs
- Abstraction of Hardware (simulation speed)
- Clear Verification Strategy (detailed hardware)
12 Example: Dataflow Graph
- Discrete-Time (cycle accurate)
- Fixed-Point Types (bit true)
- No need for RTL simulation
- Embed macro choices
- Used Simulink(TM)
Multiply / Accumulate
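A cycle-accurate, bit-true model of a multiply/accumulate block can be sketched in a few lines. This is not the Simulink model from the slides; the accumulator width and wrap-on-overflow behaviour are assumptions made for illustration.

```python
def wrap_signed(value, bits):
    """Wrap an integer into the signed two's-complement range of `bits` bits."""
    mask = (1 << bits) - 1
    value &= mask
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value

class MAC:
    """Multiply/accumulate where one call to `step` models one clock cycle."""

    def __init__(self, acc_bits=16):
        self.acc_bits = acc_bits  # assumed accumulator word length
        self.acc = 0              # accumulator register state

    def step(self, a, b, reset=False):
        """Accumulate a*b, restarting from zero when `reset` is asserted."""
        base = 0 if reset else self.acc
        self.acc = wrap_signed(base + a * b, self.acc_bits)
        return self.acc

mac = MAC(acc_bits=8)
print(mac.step(10, 10))              # fits in 8 signed bits
print(mac.step(10, 10))              # overflows and wraps
print(mac.step(3, -2, reset=True))   # reset restarts the sum
```

Modelling the word length explicitly is what makes the simulation bit true: overflow appears in the model exactly where it would appear in hardware with the same widths.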
13 Hierarchical Dataflow Graphs
- Want the hardware hierarchy to match dataflow graph hierarchy
- Grouping needed to hide complexity
- Referencing needed to save design effort
14 Modeling Control Logic
- Extended finite state-machine editor
- Co-simulation with dataflow graph
- New software: Stateflow-VHDL translator
- No need for RTL
Address Generator / MAC Reset
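An extended finite state machine pairs a discrete state with data variables such as a counter. The sketch below shows a hypothetical address generator that also produces a MAC reset; its states, outputs, and timing are assumptions, since the slides name the block but do not specify its behaviour.

```python
class AddressGenerator:
    """Extended FSM: states IDLE and RUN plus an address counter.

    While running it emits addresses 0..length-1 and asserts mac_reset
    on the first address so the accumulator starts a fresh sum.
    """

    def __init__(self, length):
        self.length = length
        self.state = "IDLE"
        self.addr = 0

    def step(self, start):
        """Advance one clock cycle; return (addr, mac_reset, valid)."""
        if self.state == "IDLE":
            if start:
                self.state = "RUN"
                self.addr = 0
            return (0, False, False)
        # RUN: outputs depend on the counter value before it is updated
        out = (self.addr, self.addr == 0, True)
        if self.addr == self.length - 1:
            self.state = "IDLE"
        else:
            self.addr += 1
        return out

gen = AddressGenerator(length=3)
trace = [gen.step(start=(cycle == 0)) for cycle in range(5)]
for cycle, outputs in enumerate(trace):
    print(cycle, outputs)
```

Stepping the state machine once per cycle is what lets it be co-simulated with a cycle-accurate dataflow model.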
15 Floorplan Merged on Each Iteration
(Function, Signal, Circuit decisions)
(Floorplan decisions)
save the state
16 Outline
- Chip-in-a-Day Flow
- User Perspective
- Design Decisions
- Estimation
- Automation
- Design Example
- Baseband Receiver
- Design Effort
- Problems
17 TDMA Baseband Receiver
- DSSS TDMA w/ length-31 spreading code, 25 MHz chip rate
- 806 kHz symbol rate, w/ QPSK gives 1.6 Mb/s data rate
- 7-bit I/Q streams at 200 MHz, 8 parallel streams at 25 MHz
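The rates quoted for the receiver follow from the chip rate and the spreading length. The short check below reproduces them, taking 2 bits per QPSK symbol as the only input not printed on the slide.

```python
chip_rate_hz = 25e6        # chip rate from the slide
spreading_length = 31      # chips per symbol
bits_per_symbol = 2        # QPSK carries 2 bits per symbol

symbol_rate_hz = chip_rate_hz / spreading_length
data_rate_bps = symbol_rate_hz * bits_per_symbol

sample_rate_hz = 200e6     # input I/Q stream rate from the slide
parallel_streams = 8
per_stream_rate_hz = sample_rate_hz / parallel_streams

print(f"symbol rate: {symbol_rate_hz / 1e3:.0f} kHz")
print(f"data rate:   {data_rate_bps / 1e6:.1f} Mb/s")
print(f"per-stream:  {per_stream_rate_hz / 1e6:.0f} MHz")
```

Splitting the 200 MHz input into 8 parallel streams brings each stream down to the 25 MHz chip rate.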
18 Design Effort
- Spec. changes required no modification of datapath macros
- Routing began after 2 months
- No modification to dataflow graph from switch-level sims.
- Flow under development
- Reuse is crucial
19 Automation Statistics
- Assuming automated flow and libraries are
debugged, design time is little more than a day
20 Chip Layout Plot
- 600k transistors
- 0.18 µm
- 1.0 V
- 25 MHz
- 3.7 mm x 3.7 mm (w/ pads)
- 1.8 mm x 1.3 mm (core only)
- 21 mW
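A few derived figures can be read off these numbers. The calculation below uses only the values listed above; it assumes the 21 mW figure is the average power at the 25 MHz clock, which the slide does not state explicitly.

```python
power_w = 21e-3            # reported power
clock_hz = 25e6            # reported clock rate
core_mm = (1.8, 1.3)       # core dimensions
die_mm = (3.7, 3.7)        # die dimensions including pads

energy_per_cycle_j = power_w / clock_hz
core_area_mm2 = core_mm[0] * core_mm[1]
die_area_mm2 = die_mm[0] * die_mm[1]
core_fraction = core_area_mm2 / die_area_mm2

print(f"energy per clock cycle: {energy_per_cycle_j * 1e9:.2f} nJ")
print(f"core area: {core_area_mm2:.2f} mm^2 of {die_area_mm2:.2f} mm^2 die")
print(f"core share of die: {core_fraction:.0%}")
```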
21 Constant Shift Discrepancy
- Incorrect for right-shift and negative values only
- Breaks the verification methodology
- Solved by creating a new primitive
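The slide does not say which semantics differed. One common source of this kind of mismatch, assumed here purely for illustration, is that an arithmetic right shift rounds toward negative infinity while a model that truncates toward zero gives a different result for some negative values.

```python
def shift_right_arithmetic(x, n):
    """Two's-complement arithmetic shift: rounds toward negative infinity."""
    return x >> n

def shift_right_truncating(x, n):
    """Shift the magnitude and restore the sign: rounds toward zero."""
    magnitude = abs(x) >> n
    return -magnitude if x < 0 else magnitude

# Values for which the two interpretations of "shift right by 1" disagree.
mismatches = [
    x for x in range(-8, 9)
    if shift_right_arithmetic(x, 1) != shift_right_truncating(x, 1)
]
print(mismatches)
```

Under this assumption the two models agree for every non-negative input and for every left shift, which matches the narrow failure condition on the slide.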
22 Disabled Clock Discrepancy
- State is updated but output is held when disabled
- Solved by specifying certain cycles as don't care
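The effect can be reproduced with two toy register models. Both are hypothetical: one holds state and output while disabled, and the other keeps updating its state and holds only the output, as the first bullet describes. The slides do not say which model corresponds to which tool.

```python
class HoldEverything:
    """While disabled, both the internal state and the output are held."""

    def __init__(self):
        self.state = 0
        self.out = 0

    def step(self, d, enable):
        if enable:
            self.out = self.state   # output takes the previously stored value
            self.state = d
        return self.out

class HoldOutputOnly:
    """While disabled, the output is held but the state still captures d."""

    def __init__(self):
        self.state = 0
        self.out = 0

    def step(self, d, enable):
        if enable:
            self.out = self.state
        self.state = d              # state updates even when disabled
        return self.out

data = [1, 2, 3, 4]
enable = [True, False, True, True]

a, b = HoldEverything(), HoldOutputOnly()
trace_a = [a.step(d, e) for d, e in zip(data, enable)]
trace_b = [b.step(d, e) for d, e in zip(data, enable)]
differing = [i for i, (x, y) in enumerate(zip(trace_a, trace_b)) if x != y]
print(trace_a, trace_b, differing)
```

In this sketch the traces differ only in the cycle right after the clock is re-enabled, which is the kind of cycle that could be marked as don't care.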
23 Moving Beyond Cycle-Accuracy
- Using don't care windows could relax cycle-accuracy constraint
- Must be automated into verification process
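A trace comparison that honours don't-care windows might look like the sketch below. The rule used to derive the windows (each disabled cycle plus a fixed number of following cycles) is a hypothetical choice; the slides leave open how the windows would be generated automatically.

```python
def dont_care_cycles(enable, width=1):
    """Hypothetical rule: ignore each disabled cycle and the `width` cycles after it."""
    ignored = set()
    for cycle, enabled in enumerate(enable):
        if not enabled:
            for offset in range(width + 1):
                if cycle + offset < len(enable):
                    ignored.add(cycle + offset)
    return ignored

def compare_traces(reference, observed, ignored=()):
    """Return the cycles where the traces disagree outside the ignored set."""
    if len(reference) != len(observed):
        raise ValueError("traces must have the same length")
    skip = set(ignored)
    return [
        cycle
        for cycle, (ref, obs) in enumerate(zip(reference, observed))
        if cycle not in skip and ref != obs
    ]

enable = [True, False, True, True]
reference = [0, 0, 1, 3]   # illustrative output of a high-level model
observed = [0, 0, 2, 3]    # illustrative output of a lower-level simulation

strict = compare_traces(reference, observed)
relaxed = compare_traces(reference, observed, dont_care_cycles(enable))
print(strict, relaxed)
```

Deriving the ignored cycles from the enable signal, rather than listing them by hand, is one way the relaxation could be folded into an automated check.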
24 Conclusions
- Direct-mapped hardware is the most efficient use of silicon
- Design with dataflow graphs, not sequential code
- Don't translate design data, refine it
- Verification methodology determines level of abstraction
- Chip-in-a-Day design flows are feasible