Title: DATE: Smart Interconnects for HMP SoCs
1DATE Smart Interconnects for HMP SoCs
- Efficient Data Flow for Multimedia-Intensive
Heterogeneous MultiProcessor SoCs
Jeff Haight, Dir. Tech. Mrktng Sonics,
Inc. 650-605-6171 jhaight_at_sonicsinc.com
2Who is Sonics
- Established in 1996
- Headquartered in Mountain View, CA
- Field offices: London, Munich, Nice, Seoul, Tokyo
- Development center: Yerevan
- Senior management team
- Customer base of industry-leaders in
- Wireless and Communications
- Digital Consumer
- Office Automation
- Over 200 million chips shipped using Sonics
SMART Interconnect Solutions
3Using the Old, Adding the New
Core 1
AXI for Seamless Connections
Core 2
µP
Core N
AHB Legacy Support
Intelligent Internal Interconnect
SMART Interconnects
OCP Maximizes Flexibility
AHB Cores
Core
DSP Core
APB Legacy Support
I/O
Memory
SoCs
Circa 2006
Adding Intelligent Data Flow Services
4Multicore Mobile Handset Example
[Figure: handset SoC block diagram. Blocks: CPU Tile, 2D/3D Graphics Tile, MPEG4 Codec Tile, DSP Tile, MP3, USB 2.0, DMA, LCD Controller, Camera Interface, Flash Controller, Embedded SRAM, SDRAM Controller. Interconnects: two SMX and one S3220. Ports are marked I, T, and P.]
5Mobile Handset Example
[Figure: handset SoC block diagram. Blocks: CPU Tile, 2D/3D Graphics Tile, MPEG4 Codec Tile, DSP Tile, MP3, USB 2.0, DMA, LCD Controller, Camera Interface, Flash Controller, Embedded SRAM, SDRAM Controller. Interconnects: two SMX and one S3220. Ports are marked I, T, and P.]
6Physical Implementation
- Cores: 18 (10 initiator agents, 8 target agents)
- Process: TSMC 90nm (CLN90G-HiVt) Low Power Process
- SMX gate count: 369K
- Die area: 8x8 sq mm
- SMX cell area: 1.7 sq mm
- Frequency: 250 MHz
- Features: 1 XB, 1 SL, 2 PPs, fully connected
- Benchmark for wireless cell phone chip
7The Requirements.
- Multimedia traffic demands high bandwidth
- Congestion avoidance is critical, especially for shared memory access: high-efficiency DRAM scheduling
- Predictability in performance
- Low power and leakage management
- Access protection support for Digital Rights Management
- Low latency requirements and guaranteed throughput: high QoS
- Scalability
- Fast time to market, IP reuse, and rapid feature-set evolution
8Data Flow Design Challenges: Typical Bus-Style Offerings Address Few of the Real Issues
Perf. Verification
Virtual Prototyping
Parallel IP Creation
Arch. Modeling
Bus Generator
Design Re-use
SW Development
Variable Clock Freq.
Timing Closure
Voltage Isolation
On-Chip Bus
Complex Memory Hierarchies
Power Management
Error Management
Signal Integrity
Access Security
High Peripheral Count
Data Width Conversion
Distributed Processing
Mixed Endianness
Guaranteed BW QoS
Protocol Conversion
Pipelining
9SoC Design Reality.
10Perceptions of the Problem
11IP Core Integration is THE SoC challenge
- Problem: Hardware performance bottlenecks
- Cause: Processors blocked from intercommunication or memory access
- Problem: Competitive chip power, performance, and area
- Cause: Non-optimal interconnect forces design compromises
- Problem: Missed time-to-market windows
- Cause: Long verification tails and re-engineering times because problems are found right before tape-out
- Problem: Increasing software development dependency
- Cause: Software has to work around architecture issues
12Drivers of SoC Design Economics
More engineers for longer project schedules
Cost
The primary source for these trends is
increasingly complex requirements for SoC
interconnectivity
Time To Market
Feature Set
More features require more engineers for longer
project cycles
Longer schedules escalate feature demand
13 SoC Design Complexity Circa 2003-2005
SoC Design Complexity Circa 2005-2007
Sonics Offers SMART Interconnect Solutions with Comprehensive Data Flow Services
- Advanced features, available early when architecting product lines, lower development costs
- Minimal re-engineering of product derivatives via IP core and interconnect decoupling reduces time-to-market gaps
- Consistent IP block and subsystem sharing reduces probability of chip respins
Interconnects today are more than just wires
In-house interconnect design for most SoCs
Heterogeneous multi-processing exponentially
increases Architectural complexity of interconnect
14Outsourcing Has The Largest Impact
Outsourcing improves productivity > 25....
15And Outsourcing is > 5 Times LESS Expensive
[Chart: IP Acquisition Costs. Y axis: Percent of Silicon Cost (0 to 35). X axis: years 1998 to 2010. Series: 3rd Party IP Cost, Internally Developed IP Cost.]
Significant cost advantage over in-house
interconnect
16The Intelligence is in the Agents
- Agents provide
- Protocol conversion
- Agent adapts to IP core
- Decoupling of IP cores from fabric
- Provide local, isolated environment
- Data flow services
- Agent data flow services
- QoS-based arbitration
- Power management
- Access security
- Error management
- Burst, width, and command conversion
[Figure: layered diagram, top to bottom: initiator sockets (I), Initiator Agents (IA), Fabric, Target Agents (TA), target sockets (T).]
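The width conversion listed among the agent services can be illustrated with a small sketch. This is not Sonics code: the function name `split_write`, the little-endian beat order, and the (address, data) tuple layout are all assumptions made for illustration. It shows a hypothetical agent splitting one wide write from an initiator into several narrower beats for a target with a narrower data path.

```python
def split_write(addr, data, src_bytes, dst_bytes):
    """Split one src_bytes-wide write into dst_bytes-wide beats.

    Assumption: little-endian ordering, so the lowest address
    receives the least significant bytes of `data`.
    """
    if src_bytes % dst_bytes != 0:
        raise ValueError("source width must be a multiple of target width")
    mask = (1 << (8 * dst_bytes)) - 1
    beats = []
    for i in range(src_bytes // dst_bytes):
        beat_addr = addr + i * dst_bytes
        beat_data = (data >> (8 * dst_bytes * i)) & mask
        beats.append((beat_addr, beat_data))
    return beats


# A 64-bit write arriving at a 32-bit target becomes two 32-bit beats.
for beat_addr, beat_data in split_write(0x1000, 0x1122334455667788, 8, 4):
    print(hex(beat_addr), hex(beat_data))
```

Because the conversion lives in the agent, neither the initiator core nor the target core needs to know the other's data width, which is the decoupling the slide describes.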
17Network-based SoC Example Actively Decoupled
- Separation
- Abstraction
- Optimization
- Independence
DRAM Controller
DMA
CPU
Network
18SonicsMX Basic Architecture
- Hybrid topologies
- Full / partial cross-bar
- Shared bus
- Fully split (dual) request / response
- Pipelined, multi-threaded, non-blocking fabric
- Distributed QoS arbiter
- Spans cycle, frequency, and data width boundaries
- Supports flexible thread merging tree topologies
[Figure: two SMX fabrics with blocks labeled CPU, DSP, GFX, ROM, SRAM, Flash Ctl., and DRAM Ctl., attached through sockets marked I and T.]
19Sonics Delivers Time-to-Market
Today (Typical)
Architectural Definition
Logic Design Verification
Physical Design
Fab, Assy, Test
First Design
Derivative Design(s)
12 to 18 months of time and engineering savings!
- How is this possible?
- Socket-Based Design Methodology
- Highly Configurable Interconnect IP
20QoS-based Arbitration
- Initiator data flow threads mapped to target threads by SMX fabric
- E.g., 40 data flows sharing 8 DRAM threads in a digital video system
- Data flows sharing a target thread arbitrated using bandwidth weighting
- Independent threads assigned to QoS level (maintained throughout SMX)
- Non-blocking, multi-threaded fabric and target interfaces allow
- Higher-priority requests to interleave with, and respond before, others
- Guaranteed BW threads to minimize buffering / receive latency guarantees
- Optimum DRAM efficiency
Thread QoS Level | Bandwidth Allocation? | QoS Model
Priority         | Yes                   | Low latency while within BW allocation, best-effort otherwise
Bandwidth        | Yes                   | Guaranteed BW while within BW allocation, best-effort otherwise
Best-effort      | No                    | N/A
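The three-level QoS model in the table can be sketched as a toy arbiter. This is a simplified illustration, not the SMX arbiter: the `Thread` class, the per-window grant counters, and the tie-break rule are assumptions, and the sketch ignores pipelining and the bandwidth weighting among flows that share a target thread. It only shows the rule from the table: priority and bandwidth threads are favored while within their allocation and compete as best-effort once they exceed it.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    qos: str             # "priority", "bandwidth", or "best-effort"
    allocation: int = 0  # grants allowed per window (hypothetical unit)
    used: int = 0        # grants consumed in the current window
    pending: int = 0     # outstanding requests

    def within_allocation(self):
        return self.qos != "best-effort" and self.used < self.allocation


def rank(thread):
    """Lower rank wins. Threads over their allocation drop to best-effort."""
    if thread.within_allocation():
        return 0 if thread.qos == "priority" else 1
    return 2


def arbitrate(threads):
    """Grant one pending request; return the winning thread or None."""
    ready = [t for t in threads if t.pending > 0]
    if not ready:
        return None
    # Assumed tie-break: within a rank, favor the thread with fewest grants so far.
    winner = min(ready, key=lambda t: (rank(t), t.used))
    winner.pending -= 1
    winner.used += 1
    return winner


threads = [
    Thread("cpu", "priority", allocation=2, pending=3),
    Thread("video", "bandwidth", allocation=3, pending=3),
    Thread("dma", "best-effort", pending=2),
]
order = []
while (granted := arbitrate(threads)) is not None:
    order.append(granted.name)
print(order)
# ['cpu', 'cpu', 'video', 'video', 'video', 'dma', 'dma', 'cpu']
```

In this run the third CPU request exceeds the CPU thread's allocation, so it is served as best-effort traffic after the bandwidth thread has received its guaranteed share.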
21 Design Flow With Smart Interconnects
Today (Typical)
Architectural Definition
Logic Design Verification
Physical Design
Fab, Assy, Test
First Design
Derivative Design(s)
12 to 18 months of time and engineering savings!
- How is this possible?
- Socket-Based Design Methodology
- Highly Configurable Interconnect IP
22Access Security
- Optional multi-region firewall
- Per-target, re-programmable
- Layered architecture supports rich set of security domains with variable region sizes
- Access permissions determined per role and access type
- Flexible security error caching and reporting
        I_0   I_1   I_N
PR0     R     RW    R
PR1     X     W     R
PR7     X     X     R

role_1  role_2  role_32
Y       Y       N
N       Y       N
N       Y       Y

[Figure: firewall block diagram in front of the TARGET CORE. Labels: MAddr, MAddrSpace; L0 to L3 CAMs, each with valid and permissions outputs; priority; role permissions; MReqInfo; role; read permissions; write permissions; group ROM; Init thread ID; group; roleOK; read OK; write OK; MCmd; Access OK.]
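The per-target firewall check can be sketched in software as follows. This is an illustrative model only, not the hardware described above: the `Region` class, the role names, the rule that a higher level overrides a lower one on overlap, and the deny-by-default behavior are all assumptions. It shows the general idea of matching an address to a protection region and then checking the requester's role against that region's read or write permissions.

```python
from dataclasses import dataclass


@dataclass
class Region:
    base: int
    size: int
    level: int        # assumption: higher level wins when regions overlap
    read_roles: set   # roles allowed to read
    write_roles: set  # roles allowed to write

    def contains(self, addr):
        return self.base <= addr < self.base + self.size


def access_ok(regions, addr, role, cmd):
    """Return True if `role` may perform `cmd` ("read" or "write") at `addr`."""
    if cmd not in ("read", "write"):
        raise ValueError("cmd must be 'read' or 'write'")
    hits = [r for r in regions if r.contains(addr)]
    if not hits:
        return False  # assumption: no matching region means deny
    region = max(hits, key=lambda r: r.level)
    allowed = region.read_roles if cmd == "read" else region.write_roles
    return role in allowed


# Hypothetical setup: a supervisor-only default region with one shared window.
regions = [
    Region(0x0000, 0x10000, 0, {"cpu_supervisor"}, {"cpu_supervisor"}),
    Region(0x4000, 0x1000, 1,
           {"cpu_user", "cpu_supervisor", "dma"}, {"cpu_supervisor", "dma"}),
]
print(access_ok(regions, 0x4800, "cpu_user", "read"))   # user may read the window
print(access_ok(regions, 0x4800, "cpu_user", "write"))  # but may not write it
print(access_ok(regions, 0x0100, "dma", "read"))        # default region is supervisor-only
```

Keeping the permission tables in a re-programmable structure next to the target, rather than in each initiator, matches the per-target, role-based scheme the bullets describe.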
23Data Flow Services Security Management
- Optional multi-region firewall
- Per-target, re-programmable
- Layered architecture supports secure update of permissions and sizes
- Access permissions determined per role and access type
- Flexible security error caching and reporting
CPU in user mode fails
CPU in supervisor, unsecure OK
DMA OK
Any initiator, RW
CPU in user, RO
CPU in supervisor
CPU in supervisor, non-secure
Default region CPU in supervisor, secure
24Power Management
- Simplifies design of the Application Power Manager (APM)
- Active status indication configurable on a per-socket basis
- Allows target-specific power management
- Supports interconnect chaining: Unit Power Managers can simply OR active flags for all incoming signaling
- Request/OK handshake provided by the interconnect
- Handshake sequence
- APM makes request
- Interconnect blocks new transactions (continuing existing transactions)
- Interconnect drains
- Interconnect indicates OK
- APM removes clock or voltage, as appropriate
[Figure: power management block diagram. Labels: APM; two Unit Pwr Mgr blocks; Active, Down_req, and Down_ok signals; initiators I0 to I2; targets T0 to T2; IA and TA agents.]
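The handshake sequence above can be modeled as a small state sketch. The class and method names are hypothetical and the model is deliberately minimal; only the signal names Active, Down_req, and Down_ok come from the slide. It shows the four steps in order: the power manager raises the request, new transactions are refused, in-flight transactions drain, and only then is OK indicated.

```python
class InterconnectPowerPort:
    """Toy model of the Down_req / Down_ok handshake for one interconnect."""

    def __init__(self):
        self.in_flight = 0     # transactions accepted but not yet completed
        self.down_req = False  # driven by the power manager

    @property
    def active(self):
        # Active flag: asserted while any transaction is outstanding.
        return self.in_flight > 0

    @property
    def down_ok(self):
        # OK is indicated only after a request and a complete drain.
        return self.down_req and self.in_flight == 0

    def issue(self):
        """Try to start a transaction; refused once power-down is requested."""
        if self.down_req:
            return False
        self.in_flight += 1
        return True

    def complete(self):
        """Finish one in-flight transaction (existing traffic continues)."""
        if self.in_flight > 0:
            self.in_flight -= 1


def unit_active(ports):
    """Unit power manager: OR the active flags of all incoming ports."""
    return any(p.active for p in ports)


port = InterconnectPowerPort()
port.issue()
port.issue()
port.down_req = True           # 1. APM makes request
print(port.issue())            # 2. new transactions are blocked
port.complete()
port.complete()                # 3. interconnect drains
print(port.down_ok)            # 4. interconnect indicates OK
```

Once `down_ok` is true, the power manager in this model could remove the clock or voltage without stranding a transaction.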
26SMART Interconnect Approach Addresses the Total Global Interconnect Challenge
Perf. Verification
Virtual Prototyping
Parallel IP Creation
Methodology Automation
Arch. Modeling
Design Re-use
SW Development
Variable Clock Freq.
Scalable Fabrics
Timing Closure
Voltage Isolation
Power Management
Complex Memory Hierarchies
Intelligent Agents
Error Management
Signal Integrity
Access Security
High Peripheral Count
Data Width Conversion
Distributed Processing
Mixed Endianness
Guaranteed BW QoS
Protocol Conversion
Pipelining
28Continuous Integration
- In traditional bus-based designs, integration is performed once, at the end of the logic design phase
- Architecture, µArchitecture, and logic nearly frozen
- Labor-intensive and error-prone
- In SMART Interconnect-based design, integration is performed continuously
- Validate choices
- Explore implications at lower levels
- Cope with (inevitable) specification changes
- Allow optimization at any time, at any level
- SystemC modeling capabilities allow different levels of abstraction, rapid architectural exploration, simulation, and concurrent software development
29Key Schedule and Resource Differentiation
- Use of SMART Interconnects cuts design time
- Conventional design is serial and iterative
- Sonics structured approach is parallel and predictable
- Key differences
- Decoupling / Complete socket for IP cores
- Modeling of Communications / tradeoffs
- Predictable physical implementation
- Quality of Service Guarantees
- Automation of integration
- Architectural investigations based on real
process technology data
30Thank You
Nokia
Sony
Hughes Network Systems
Over 200 million Sonics enabled chips shipped
Cisco
Samsung
Dell
Toshiba