NoC Design and Implementation in 65nm Technology

Provided by: nocsym
1
NoC Design and Implementation in 65nm Technology
Antonio Pullini (1), Federico Angiolini (2), Paolo Meloni (3),
David Atienza (4,5), Srinivasan Murali (6), Luigi Raffo (3),
Giovanni De Micheli (4), Luca Benini (2)
(1) DAUIN, Politecnico di Torino, Italy
(2) DEIS, University of Bologna, Italy
(3) DIEE, University of Cagliari, Italy
(4) LSI, EPFL, Switzerland
(5) DACYA, Complutense University, Spain
(6) CSL, Stanford University, USA
2
Bringing NoCs to Success
Software Services: Mapping, QoS, middleware...
CAD Tools
Architecture: Packeting, buffering, flow control...
Physical Implementation: Synchronization, wires, power...
  • All these items are key opportunities and
    challenges
  • Strict interaction/feedback mandatory!...

3
NoC Physical Implementation
4
A Typical ASIC Back-End Flow
Design Space Exploration
RTL Coding
Logic Synthesis
Placement
Routing
  • How does this affect NoCs?
    • Locally: synthesis, placement & routing of NoC IP blocks?
    • Globally: constraints on NoC links?

5
Placement Strategies
  • Virtual flat
    • Placer is able to work on whole design
    • Theoretically optimal results
    • Impractical for chip-sized designs (e.g. NoCs)
  • Soft macro
    • Fences define areas where placer can operate
    • If fine grain, fastest possible placement
    • If fine grain, too much designer effort
  • Identify tradeoff among effectiveness, performance and
    designer effort

6
xpipes Placement Approach
  • Floorplan: mix of
    • hard macros for IP cores (may or may not allow
      over-the-cell routing)
    • soft macros for NoC blocks

7
Wireload Models and 65nm
  • Logic synthesis tools do not know about placement yet
    • Thus, loads and timing are uncertain!
  • Wireload models
    • Quite inaccurate. At 130nm [TCAD07]: 6% to 23% off
      from actual achievable post-placement timing
    • In 65nm, problem is dramatically worse
    • No timing closure after placement (-50% frequency,
      huge runtimes...)
  • Traditional logic synthesis tools (e.g. Synopsys
    Design Compiler) insufficient

8
Placement-Aware Synthesis
  • Synopsys Physical Compiler flow
    RTL → Quick logic synthesis → Initial Netlist → Placement →
    Initial Placed Netlist → Thorough logic synthesis →
    Final Placed Netlist

Observation 1: Use placement-aware tools to get
accurate estimations of design speed and to reach
timing closure. Traditional logic synthesis tools
may not be suitable.
9
Area Modeling
  • In our experiments, placement & routing is
    extremely sensitive to soft macro area
    • Fences too tight: flow fails
    • Fences too wide: tool produces bad results
  • Solution: accurate component area models
    • Involves work since area depends on architectural
      parameters (cardinality, buffering...)

Observation 2: Thorough characterization of the
components may be key to the convergence of the
flow for a whole topology.
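
As a rough illustration of Observation 2, the sketch below sizes a soft-macro fence from a per-component area model. The linear model form, its coefficients, the 75% target utilization and all function names are hypothetical placeholders, not values from this work; a real model would be fitted to post-layout characterization data. Reading "6 buffers" as six slots per output port is likewise an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class SwitchConfig:
    inputs: int         # switch input ports
    outputs: int        # switch output ports
    flit_width: int     # bits per flit
    buffer_depth: int   # buffer slots per output port (assumed meaning)


def switch_area_um2(cfg, k_xbar=2.0, k_buf=6.0, k_fixed=1500.0):
    """Hypothetical area model (um^2). The coefficients are placeholders
    that would have to be fitted to characterization runs."""
    crossbar = k_xbar * cfg.inputs * cfg.outputs * cfg.flit_width
    buffers = k_buf * cfg.outputs * cfg.buffer_depth * cfg.flit_width
    return k_fixed + crossbar + buffers


def fence_side_um(cell_area_um2, utilization=0.75):
    """Side of a square fence that yields the requested utilization.
    Too high a utilization risks a failing flow; too low wastes area."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return math.sqrt(cell_area_um2 / utilization)


cfg = SwitchConfig(inputs=6, outputs=6, flit_width=38, buffer_depth=6)
area = switch_area_um2(cfg)
print(f"estimated cell area: {area:.0f} um^2")
print(f"fence side at 75% utilization: {fence_side_um(area):.1f} um")
```

The point of the split is that the area model captures the architectural parameters (cardinality, buffering), while the utilization target captures how tight the fence is allowed to be.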
10
Technology Scaling on Modules
6x6 switch, 38 bits, 6 buffers
  • Within modules, scaling looks great
    • +25% frequency
    • -46% area
    • -52% power
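
Taking the three figures above as +25% frequency, -46% area and -52% power, a few derived ratios follow by simple arithmetic. The sketch assumes these are relative changes for the same switch between two technology nodes, with power measured at each node's own maximum frequency; neither assumption is stated on the slide.

```python
# Relative changes read off the slide (interpreted as percentages).
freq_change = +0.25
area_change = -0.46
power_change = -0.52

freq_ratio = 1.0 + freq_change     # 1.25x clock frequency
area_ratio = 1.0 + area_change     # 0.54x silicon area
power_ratio = 1.0 + power_change   # 0.48x power

# Energy per clock cycle scales as power / frequency.
energy_per_cycle_ratio = power_ratio / freq_ratio

# Power density scales as power / area.
power_density_ratio = power_ratio / area_ratio

print(f"energy per cycle: {energy_per_cycle_ratio:.3f}x")
print(f"power density:    {power_density_ratio:.3f}x")
```

Under these assumptions, energy per cycle falls to roughly 0.38x and power density to roughly 0.89x of the older node's values.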

11
65nm Degrees of Freedom
Observation 3: There is no such thing as "a 65nm
library". Power/performance degrees of freedom
span across one order of magnitude. It is the
designer's (or the tools') responsibility to pick
the right cells.
  • Libraries differ in gate design, VT, VDD...

12
Link Design Constraints
  • Power to drive a 38-bit (plus flow control)
    unidirectional link
    [Plot; series: 65nm lowest power, 65nm power/performance]

Observation 4: Long links (unless custom
designed) become either infeasible, or too
power-hungry. Keep them segmented.
13
Link Repeaters/Relay Stations
  • Wire segmentation by topology design
    • Put more switches, closer
    • Adds a lot of overhead
  • Wire segmentation by repeater insertion
    • Flops/relay stations to break links
    • Details are strictly related to flow control
    • Could force design iterations for QoS
      provisioning!
  • Need for awareness in high-level CAD tooling!
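
To make the repeater-insertion option concrete, the sketch below counts how many relay stations a link needs so that no wire segment exceeds a given length, and how many cycles the link then takes to traverse. The maximum segment length is a hypothetical input that would come from link characterization, and the one-cycle-per-segment latency is also an assumption that depends on the flow control scheme.

```python
import math


def relay_stations_needed(link_length_mm, max_segment_mm):
    """Minimum number of relay stations so every segment is at most
    max_segment_mm long (0 if the link already fits in one segment)."""
    if link_length_mm < 0 or max_segment_mm <= 0:
        raise ValueError("link length must be >= 0 and segment length > 0")
    segments = max(1, math.ceil(link_length_mm / max_segment_mm))
    return segments - 1


def link_latency_cycles(link_length_mm, max_segment_mm):
    """Cycles to traverse the link, assuming one cycle per segment."""
    return relay_stations_needed(link_length_mm, max_segment_mm) + 1


# Hypothetical 1.5 mm segment limit, for illustration only.
for length in (0.8, 1.5, 4.0):
    n = relay_stations_needed(length, max_segment_mm=1.5)
    cycles = link_latency_cycles(length, 1.5)
    print(f"{length} mm link: {n} relay station(s), {cycles} cycle(s)")
```

Because each relay station adds latency, a high-level tool that is unaware of the segment limit may pick a topology whose links later need pipelining, which changes the latency it originally assumed.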

14
Technology Scaling on Topologies
  • Three designs for max frequency
    • 90 nm, 1 mm² cores
    • 65 nm, 1 mm² cores
    • 65 nm, 0.4 mm² cores
15
Mesh Scaling
  • Links
    • Always short (<1.2 mm) → non-pipelined
  • However
    • 90 nm, 1 mm²: 3.1 mW
    • 65 nm, 1 mm²: 3.6 mW (tightest fit → more
      buffering)
    • 65 nm, 0.4 mm²: 2.2 mW
  • Power shifting from switches/NIs to links
    (buffering)

16
Updates to the xpipes Design Flow
17
A Complete NoC Design Flow
[Flow diagram labels]
Application
Codesign, Simulation
User objectives: power, hop delay
Constraints: area, power, hop delay, wire length
NoC component library
FPGA Emulation
Input traffic model
IP Core models
Constraint graph, Comm graph
Topology Synthesis (includes Floorplanner, NoC Router)
Platform Generation (xpipesCompiler)
Synthesis
Placement & Routing
System specs
SystemC code
To fab
NoC Area models
RTL Architectural Simulation
NoC Power models
SunFloor
Floorplanning specifications
Area, power characterization
18
Example Layout
  • Floorplan is automatically generated
  • Black areas: IP cores
  • Colored areas: NoC
  • Over-the-cell routing allowed in this example

19
Studies on Task Graphs
20
dVOPD Application
21
Technology Scaling: dVOPD
  • Low Power libraries cannot meet BW requirements
    of the application
  • Best topology features high-radix switches
  • Best topology does not change when moving to 65nm
  • But power improves a lot...

22
Complexity Scaling: 65nm HP
  • Switch frequency must go up
    • Back-end cannot support switches of arbitrary
      cardinality
    • Therefore, more, smaller switches are
      instantiated
  • Link frequency/length must go up
    • Pipelined links get instantiated
    • Only 500 MHz would be achievable otherwise

23
Communication Efficiency
  • Much more efficiency when...
    • Moving from 90nm to 65nm (~100%)
    • Moving to low-frequency designs (~50%)
    • Moving to low-power libraries (~100%)

24
Conclusions and Future Work
  • NoCs in 65nm are feasible and perform well
  • 65nm design presents opportunities and challenges
  • Tool flows are key
  • Block pre-characterization is important
  • More degrees of freedom for block implementation
  • Pay attention to long links
  • A complete flow to map applications onto NoCs
    with back-end awareness
  • In depth study on high-performance vs. low-power
  • Leakage studies
  • Alternate link designs and optimizations

25
Questions Welcome!
Thank You