Title: Interconnect Focus Center
1. Driver II: The Collaborative Node (A Design Driver for the IFC and the GSRC)
Anantha Chandrakasan, Rafael Reif (MIT)
Jan Rabaey (U.C. Berkeley), GSRC Liaison
2. Emerging Interconnect Issues
[Figures: power dissipation (T. Sakurai); interconnect delay (Alpha processors); power in interconnect. Horizontal axis: year, '95 through '10.]
- Integrated technology and system methodology necessary
- Technology (3-D, Optics, RF)
- Communication-aware Architecture and Circuits
- Predictive Interconnect Design Tools
3. Wireless Sensor Nodes: State-of-the-Art
Smart Dust (U.C. Berkeley)
μAMPS (MIT)
PicoRadio (U.C. Berkeley)
Supported by DARPA
Integrated self-contained system to sense,
process and communicate
4. Integrated System-on-a-Chip
Sensors (MEMS)
Energy chain (MEMS, passives, non-CMOS)
Computation/communication (CMOS)
Air interface (MEMS, passives, CMOS/Si-Ge)
- Requires integration of diverse process technologies in a compact form factor (< 1 cm³)
- Low MIPS requirement, but requires flexibility to adapt to time-varying scenarios
- Severe energy constraints (average power < 100 μW)
- Mixed-signal interconnect issues
5. 3-D Technology Integration
Tasks V, II, IV, and VI
- Research includes device and interconnect
integration, thermal and reliability issues,
performance modeling
Compact Interconnection of Heterogeneous
Technologies
6. Physical Design Tools for 3-D Integration
3-D Standard Cell Placement and Routing
ERNI 3-D Reliability Testing
3-D Magic
Tasks I and V
Design Tools Support 3-D Design
7. μ-Processor Power Breakdown
A. Sinha, DAC '01
Montanaro, JSSC '96
- High interconnect (clock, busses, memory lines) and control energy overhead for a primitive operation (e.g., an add)
- Computation energy efficiency is << 1
8. Power Variation on the XScale Processor
[Figure: energy per instruction (pJ, axis 0 to 900), grouped into ARM core instructions and coprocessor instructions. Instructions shown: ldr, orr, bic, clz, eor, adc, add, and, mla, mul, mia, mov, mvn, mar, mra, cmn, cmp, miatt, miabt, miatb, miaph, miabb, b (predicted), bl (predicted), mrs (SPSR), mrs (CPSR), b (unpredicted), bl (unpredicted), mla (accumulate).]
- Uses extensive clock gating
- Second level at units: 83 unique enables
- Third level at blocks (i.e., local clock buffers): 317 unique enables
9. Computational Fabrics
[Figure: energy per operation versus flexibility for direct mapped hardware, domain specific processors, embedded FPGA, TI DSP, and embedded processor (ARM). Energy values marked: 0.1 pJ/Op, 0.25 nJ/Op, 1 nJ/Op.]
- Key Questions
- How to interconnect energy-optimal points?
- What is the right granularity for flexible logic?
Courtesy of R. Brodersen and J. Rabaey (data from StrongARM and TI DSP)
10. Example of Interconnect-Centric Architecture
[Figure: array of six memory blocks (MEM) and six processing elements (PE).]
- Exploits locality of reference, low interconnect costs
- Overhead amortized over multiple functional units
11. Ultra Low Power FPGA Fabric
- Interconnect Architecture
- Granularity Selection
- Resource Utilization
- Interconnect Technology
- 3-D
- Optics
[Figure: FPGA fabric showing I/O blocks, channel memory block, logic block, programmable interconnect, and H-to-V and V-to-H PIMs.]
- Interconnect Circuits
- Bus Coding
- Charge Recycling
- Low Swing
[Figure: breakdown by component (values sum to 100): Channels 68, Clock/Control network 17, Clusters 13, DC 1, IO 1.]
In collaboration with Cypress
Task I
12. Power Scalable Design
[Figures: energy versus input precision (labels: E16x16, Esystem, Ep, Eperfect); scenario distribution (probability versus input precision).]
13. DSM Energy Model (3-bit Bus)
[Figures: number of transitions of cost E versus normalized energy E, for the standard model and for the sub-micron model.]
Minimizing the transition activity is not the
right approach to minimize power
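To make the point above concrete, here is a minimal sketch of a coupled-bus energy calculation. It is not taken from the slides. It assumes a simple lumped model in which every line has unit capacitance to ground and a coupling capacitance `lam` to each adjacent line (`lam` is a hypothetical parameter, the ratio of inter-wire to ground capacitance), with voltages normalized to the supply. Under those assumptions the energy drawn from the supply in one transition is V_new^T · C · (V_new - V_old). Setting `lam = 0` recovers the standard model, where the cost is simply the number of rising lines.

```python
def capacitance_matrix(n, lam):
    """Normalized capacitance matrix of an n-line bus (assumed lumped model)."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n]
        # Diagonal: ground capacitance plus coupling to each neighbour.
        c[i][i] = 1.0 + lam * len(neighbours)
        # Off-diagonal: negative coupling capacitance to adjacent lines.
        for j in neighbours:
            c[i][j] = -lam
    return c


def transition_energy(old, new, lam):
    """Energy drawn from the supply when the bus switches from old to new."""
    n = len(old)
    c = capacitance_matrix(n, lam)
    delta = [b - a for a, b in zip(old, new)]
    # Charge delivered by each driver during the transition.
    charge = [sum(c[i][j] * delta[j] for j in range(n)) for i in range(n)]
    # Only drivers that end at the supply voltage draw energy from it.
    return sum(v * q for v, q in zip(new, charge))


if __name__ == "__main__":
    for lam in (0.0, 3.0):
        same = transition_energy((0, 0, 0), (1, 1, 1), lam)
        opposite = transition_energy((0, 1, 0), (1, 0, 1), lam)
        print(f"lam={lam}: 000->111 costs {same}, 010->101 costs {opposite}")
```

Both transitions toggle all three lines. With `lam = 0` they cost 3 and 2 units; with `lam = 3` they cost 3 and 14 units, because adjacent lines switching in opposite directions must charge the coupling capacitance through twice the swing. Counting transitions alone misses this.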
14. Transition Pattern Coding
Sotiriadis, ESSCIRC '00
Task I
- Coding can be combined with charge recycling to
further reduce energy
15. Low-Swing Signaling
Common Level Converter
Differential Circuit
- Low swing signaling used for energy reduction
- Explore practical limitations and efficient schemes for low swing signaling
- Analyze sensitivity to process variations
Task II
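As a rough illustration of why a reduced swing saves energy, the sketch below uses the first-order textbook expressions for charging a wire of capacitance C through a swing V_swing: C · V_dd · V_swing when the charge is drawn from the full V_dd rail, and C · V_swing² when a dedicated supply at the swing voltage is used. The 1 pF wire and the 1.2 V and 0.3 V values are illustrative assumptions, not figures from the slides, and the energy of the level converter and receiver is ignored.

```python
def energy_from_vdd(c_wire, vdd, v_swing):
    """Energy per rising transition when the charge is drawn from the Vdd rail."""
    return c_wire * vdd * v_swing


def energy_from_low_supply(c_wire, v_swing):
    """Energy per rising transition with a dedicated supply at the swing voltage."""
    return c_wire * v_swing ** 2


if __name__ == "__main__":
    c_wire = 1e-12   # 1 pF wire capacitance (assumed)
    vdd = 1.2        # full-rail supply in volts (assumed)
    v_swing = 0.3    # reduced swing in volts (assumed)

    full_swing = energy_from_vdd(c_wire, vdd, vdd)
    low_from_vdd = energy_from_vdd(c_wire, vdd, v_swing)
    low_from_own = energy_from_low_supply(c_wire, v_swing)

    print(f"full swing:              {full_swing:.3e} J")
    print(f"low swing, from Vdd:     {low_from_vdd:.3e} J")
    print(f"low swing, own supply:   {low_from_own:.3e} J")
```

In this model the saving is linear in the swing when charge comes from V_dd and quadratic when a separate low-voltage supply is available, which is one reason the choice of scheme matters.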
16. Energy Efficient Networks-on-a-Chip (GSRC)
- Orthogonalizes function and communication
- Allows for the choice of the most-efficient
interconnect medium, exploiting locality,
latency, and data properties
Behavior Captured as CFSMs
- Protocol Processor
- 1.3 M transistors in 0.18 μm CMOS
- 17.5 mm² core size
- 12 mW average power
- 1.2 V supply (core)
- 12.5 MHz f_clock
[Figure labels: Mapping and Communication Refinement; Silicon Implementation. Block diagram: CPU (Xtensa), Cache, Protocol, Physical Layer, UI Contr, SDRAM Contr, Flash Contr, Silicon Backplane, Sm0 (snoop), R/F chip, Buttons/display; ports: Debug, Voice, Flash, SDRAM, Snoop.]
GSRC, in cooperation with DARPA PAC/C
17. Power-Networks-on-a-Chip
[Figure: battery, active power network, and three loads.]
Chip-Supervisor manages power-up of units
In collaboration with GSRC
18. Wavelet Sparsification for Substrate Coupling
Contact layout
Table of larger Examples
Task II
19. Summary
- Tight coupling between the Interconnect Focus Center and GSRC through the Collaborative Node design driver
- A design driver for mixed-technology integration and energy consumption
- Communication-centric optimization is not just about wire engineering; it requires an integrated system-level methodology