Title: Introduction to Multiprocessor System-on-Chip
1Introduction to Multiprocessor System-on-Chip
- Prof. Jan Madsen
- Informatics and Mathematical Modeling
- Technical University of Denmark
- Richard Petersens Plads, Building 321
- DK2800 Lyngby, Denmark
2Embedded systems
3Embedded systems
- Systems which use a computer to perform a specific function, but are neither used nor perceived as a computer
- They are embedded within larger electronic devices
- They repeatedly carry out a particular function
- Often completely unrecognized by the device's user
4Embedded systems design
Separated validations
Prototype realization
5Principles of Codesign
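void UnitControl() {
  up = down = 0; open = 1;
  while (1) {
    while (req == floor);   /* wait until another floor is requested */
    open = 0;
    if (req > floor) up = 1; else down = 1;
    while (req != floor);   /* travel until the requested floor is reached */
    open = 1; delay(10);
  }
}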
6Overview
- Technology
- Processors
- IC fabric
- Codesign for speed-up
- component execution timing (SW and HW)
- Building sub-system
- Hardware/software partitioning
- Building system
- System-level issues of codesign
7Software
- Elements of computation
- Store data
- Transform data
- Move data
8Processor
(Figure: code sketch of a function: func, if ... then ... else ..., for ...)
- Architecture components
- Processing elements transform data
- Memories store data
- Interconnect moves data
9Processor General Purpose
(Figure: code sketch of a function: func, if ... then ... else ..., for ...)
- Availability
- Low cost (mass production)
- Simple design flow
- High flexibility
10Processor General Purpose - example
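(Figure: the function sketch (func, if ... then ... else ..., for ...) mapped onto a general-purpose processor. Labels: inst mem, controller, datapath, data mem, ir, cu, pc, func, reg, /-, x, Ai, p1. Execution takes 5 cycles.)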
11Processor Custom (ASIC)
(Figure: code sketch of a function: func, if ... then ... else ..., for ...)
- High performance
- Low power
- Complex design flow
- No flexibility
12Processor Custom (ASIC) example
(Figure: the function sketch (func, if ... then ... else ..., for ...) mapped onto a custom processor. Labels: controller, datapath, cu, mem, Ai, /-, x, p1. Execution takes 1 cycle.)
13Processor Semicustom (ASIP)
(Figure: code sketch of a function: func, if ... then ... else ..., for ...)
- Customized datapath: 16, 8 or 4 bit
- Optimized for a particular class of programs
- MACC
- Simple design flow
- High flexibility
14Processor Semicustom - example
(Figure: the function sketch (func, if ... then ... else ..., for ...) mapped onto a semicustom processor. Labels: inst mem, controller, datapath, data mem, ir, cu, pc, func, Ai, reg, /-, x, p1. Execution takes 2 cycles.)
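The three processor examples execute the same operation in 5, 1 and 2 cycles. As a minimal sketch of how such cycle counts translate into execution time and speed-up, the helpers below can be used; the function names and the clock frequency in the usage note are assumptions made for illustration and do not come from the slides.

```c
#include <assert.h>

/* Execution time in nanoseconds for `cycles` clock cycles at `freq_mhz` MHz.
 * One cycle at f MHz lasts 1000/f ns. (Hypothetical helper.) */
double exec_time_ns(unsigned cycles, double freq_mhz)
{
    assert(freq_mhz > 0.0);
    return cycles * 1000.0 / freq_mhz;
}

/* Speed-up of implementation B over implementation A (ratio of their times). */
double speedup(double time_a_ns, double time_b_ns)
{
    assert(time_b_ns > 0.0);
    return time_a_ns / time_b_ns;
}
```

If all three processors were clocked at an assumed 100 MHz, the general-purpose version would take 50 ns, the semicustom one 20 ns and the custom one 10 ns, so the custom processor would be 5 times faster than the general-purpose one for this operation. In practice the clock rates differ between implementations, which is why both arguments are kept separate.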
15IC fabrics
- An IC is an interconnection of transistors following one of several possible styles (fabrics)
- The fabric defines how and when transistors are composed
- the material of processors
- IC fabrics differ in terms of customizability and generality
16IC fabrics Custom
- Exact implementation of processor components
- High NRE cost (mask set: 1M)
17IC fabrics Semicustom
- Several semicustom fabrics
- Library of standard cells
- Cell arrays (sea-of-gates)
- Most processing steps are pre-manufactured (high volume)
18IC fabrics Programmable
- Set of interconnected modules
- Set of modules programmed to implement different components
- FPGA
- Programmable logic modules, storage and interconnect
19Chips Implementing IC fabric
20Hardware/software codesign?
- Many possible mappings
- Processor may not exist yet!
- Exploring the design space
- Need to estimate
21Hardware/Software Codesign
- Optimizing
- Timing (high performance, hard deadlines)
- Area (cost)
- Power consumption
- Flexibility
- Reliability
- ...
- We will focus on timing
22Processing element timing
- Execution path
- Control data dependent
- Input data dependent
- Function implementation
- Component architecture
- Compiler or synthesis
23Formal execution path timing analysis
(Figure: control flow of the function split into basic blocks: b1 at the if, b2 for the then branch, b3 for the else branch, b4 for the for loop)
24Formal execution path timing analysis
b2
then ...
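As a minimal sketch of execution path timing over the basic blocks b1 to b4, the functions below compute the shortest and longest path through the function. They assume that b2 and b3 are the two branches of the if, that b4 is the loop body, that each block has a fixed cycle count, and that the loop iteration bounds are known; the struct, the function names and all numbers are hypothetical.

```c
#include <assert.h>

/* Cycle counts of the basic blocks (illustrative values only). */
struct block_times {
    unsigned b1;   /* code up to and including the branch decision */
    unsigned b2;   /* "then" branch */
    unsigned b3;   /* "else" branch */
    unsigned b4;   /* one iteration of the loop body */
};

static unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }
static unsigned max_u(unsigned a, unsigned b) { return a > b ? a : b; }

/* Shortest path: cheaper branch and the fewest loop iterations. */
unsigned best_case(const struct block_times *t, unsigned min_iter)
{
    assert(t != 0);
    return t->b1 + min_u(t->b2, t->b3) + min_iter * t->b4;
}

/* Longest path: costlier branch and the most loop iterations. */
unsigned worst_case(const struct block_times *t, unsigned max_iter)
{
    assert(t != 0);
    return t->b1 + max_u(t->b2, t->b3) + max_iter * t->b4;
}
```

With block times of 4, 10, 6 and 3 cycles and between 1 and 8 loop iterations, the path time lies between 13 and 38 cycles. Which path is actually taken depends on the control and input data.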
25Memory models
- Access time
- Control overhead
- Burst access (packets)
- Cache
- hit/miss time overhead
- Based on execution history
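A common way to fold cache hit/miss overhead into a memory model is the average access time, hit time plus miss rate times miss penalty. The sketch below shows that formula next to a pessimistic bound that assumes every access misses. The function names and the numbers in the usage note are assumptions for illustration.

```c
#include <assert.h>

/* Average memory access time: hit_time + miss_rate * miss_penalty.
 * miss_rate is a fraction between 0 and 1 taken from the execution history. */
double avg_access_time(double hit_time, double miss_rate, double miss_penalty)
{
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return hit_time + miss_rate * miss_penalty;
}

/* Pessimistic bound when the execution history is unknown: every access misses. */
double worst_access_time(double hit_time, double miss_penalty)
{
    assert(hit_time >= 0.0 && miss_penalty >= 0.0);
    return hit_time + miss_penalty;
}
```

With a 1-cycle hit time, a 20-cycle miss penalty and a 25% miss rate, the average is 6 cycles while the pessimistic bound is 21 cycles.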
26Advanced architectures
- Modern high-performance processors include architectural features which complicate timing analysis
- Dynamic instruction scheduling
- Speculative execution
- Though fast, this makes
- the processor very power hungry
- tight bounds on timing very difficult to obtain
- Computation less predictable
- Issues which are important for embedded systems
27Building sub-systems
(Figure: code sketch of a function: func, if ... then ... else ..., for ...)
- Initial codesign problem
- Hardware/software partitioning
- the LYCOS cosynthesis tool
- Automatic partitioning from C (subset) and VHDL (single process)
- Developed at DTU
28Hardware/Software partitioning
(Figure: the basic blocks b1 to b4 of the function (func, if ... then ... else ..., for ...) are each assigned to either the CPU or the ASIC)
29Architectural choices
- Which processor should be selected and how fast should it be?
- Which ASIC technology should be chosen and how fast should the ASIC be?
- How large an ASIC can we afford and which functions should it execute?
- How should the processor and ASIC communicate?
30Partitioning Model
- Determines granularity and simplifying assumptions w.r.t. communication, HW sharing, etc.
31Estimation
SW
HW
32Process communication
b1
if ...
b2
b4
else send(...) receive(...)...
then ...
for ... ..
b3
33Solving the Partitioning Problem
(Figure: six blocks, numbered 1 to 6, each to be placed in SW or HW)
Just try all combinations...
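A minimal sketch of trying all combinations is shown below. It assumes a simplified partitioning model that is not taken from the slides: blocks execute one after another, communication costs nothing, and hardware areas add up. The struct, the function name and the numbers are hypothetical.

```c
#include <assert.h>
#include <limits.h>

struct block {
    unsigned sw_time;   /* execution time in software */
    unsigned hw_time;   /* execution time in hardware */
    unsigned hw_area;   /* area needed if moved to hardware */
};

/* Try every HW/SW assignment of n blocks (bit i of the mask set = block i in HW).
 * Returns the mask with the smallest total time whose HW area fits the budget
 * and stores that time in *best_time. Cost: 2^n evaluations. */
unsigned best_partition(const struct block *b, unsigned n,
                        unsigned area_budget, unsigned *best_time)
{
    unsigned best_mask = 0;
    unsigned mask;
    assert(n < sizeof(unsigned) * CHAR_BIT);
    *best_time = UINT_MAX;
    for (mask = 0; mask < (1u << n); mask++) {
        unsigned time = 0, area = 0, i;
        for (i = 0; i < n; i++) {
            if (mask & (1u << i)) {
                time += b[i].hw_time;
                area += b[i].hw_area;
            } else {
                time += b[i].sw_time;
            }
        }
        /* The all-software mask 0 always fits, so a result is always found. */
        if (area <= area_budget && time < *best_time) {
            *best_time = time;
            best_mask = mask;
        }
    }
    return best_mask;
}
```

Six blocks give only 64 combinations, but the count doubles with every added block, so 30 blocks already give more than a billion.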
34Solving the Partitioning Problem
Parallel execution, non-additive areas
Interleaved communication, additive areas
No communication, interleaved exec., additive areas
Knapsack stuffing
Large-scale linear/nonlinear integer programming
Heuristics needed!
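One simple heuristic for the knapsack-style case (no communication, additive areas) is a greedy choice by time saved per unit of area. The sketch below illustrates that idea only; it is not the algorithm used by LYCOS, it is not guaranteed to find the optimum, and the struct, the function name and the numbers are hypothetical.

```c
#include <assert.h>

struct block {
    unsigned sw_time;   /* execution time in software */
    unsigned hw_time;   /* execution time in hardware */
    unsigned hw_area;   /* area needed if moved to hardware */
};

/* Greedy knapsack heuristic: repeatedly move the block with the best time
 * saving per unit of area to hardware, as long as it still fits.
 * in_hw[i] is set to 1 for blocks placed in hardware. Returns the total time. */
unsigned greedy_partition(const struct block *b, unsigned n,
                          unsigned area_budget, int *in_hw)
{
    unsigned total = 0, area_left = area_budget, i;
    assert(b != 0 && in_hw != 0);
    for (i = 0; i < n; i++) {          /* start with everything in software */
        in_hw[i] = 0;
        total += b[i].sw_time;
    }
    for (;;) {
        int pick = -1;
        for (i = 0; i < n; i++) {
            unsigned long lhs, rhs;
            if (in_hw[i] || b[i].hw_area > area_left ||
                b[i].hw_time >= b[i].sw_time)
                continue;              /* already moved, too big, or no gain */
            if (pick < 0) { pick = (int)i; continue; }
            /* compare gain/area ratios by cross-multiplication */
            lhs = (unsigned long)(b[i].sw_time - b[i].hw_time) * b[pick].hw_area;
            rhs = (unsigned long)(b[pick].sw_time - b[pick].hw_time) * b[i].hw_area;
            if (lhs > rhs) pick = (int)i;
        }
        if (pick < 0) break;           /* nothing else fits or helps */
        in_hw[pick] = 1;
        area_left -= b[pick].hw_area;
        total -= b[pick].sw_time - b[pick].hw_time;
    }
    return total;
}
```

The work grows roughly with the square of the number of blocks instead of exponentially, at the price of possibly missing the best partition.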
35LYCOS Design Flow
(Figure: LYCOS design flow. Labels: Specification (Functional, Require); Translate; CDFG; Analysis with SW, HW and Comm. estimation models; Partitioning; CDFG split into SW, HW and Comm.; Synthesis for each of the three; Assembler; Netlist; SW/HW)
36Building Systems
- Platform architectures are heterogeneous
- Different processing element types
- Different interconnection networks and communication protocols
- Different memory types
- Different scheduling and synchronization strategies
37Managing HW platform complexity
- Development of APIs to hide complexity from the application programmer and improve portability
- Specialized RTOS to control resource sharing and interfaces
→ Complex multi-level HW/SW architecture
38Software architecture
(Figure: software architecture on a platform with pe1, mem, Cache, Bus and ce1. Software layers: application, RTOS-APIs, RTOS, drivers. Memory regions are marked private or shared.)
39Platform design challenges
- Integration
- Design process integration
- Heterogeneous component and language integration
- Design space exploration and optimization
- Verification
40Complex run-time interdependencies
PE
PE
CoP
- Run-time dependencies of independent components via communication
- Influence on timing and power
- Need to handle resource sharing
- Process/task scheduling
- Communication scheduling
- Scheduling strategies (static, dynamic, time or priority driven)
41Interdependency example
- Complex non-functional interdependencies
- Periodic task executing on PE
- Task writes to bus at the end of each periodic execution
Short execution time → high bus load
Long execution time → low bus load
A local decision on improving performance may impact the global system performance
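One possible reading of this example can be sketched as follows, under assumptions that are not stated on the slide: the task is activated strictly periodically, runs without preemption, finishes within its period, and issues one bus write when it finishes. The function name and the numbers are hypothetical.

```c
#include <assert.h>

/* Job k is activated at k * period and writes to the bus when it finishes,
 * i.e. at k * period + exec_time[k]. Returns the smallest distance between
 * two consecutive bus writes. A distance below `period` means the writes
 * bunch together, so the bus sees a transient load peak. */
unsigned min_write_distance(unsigned period, const unsigned *exec_time, unsigned n)
{
    unsigned k, min_dist;
    assert(n >= 2);
    for (k = 0; k < n; k++)
        assert(exec_time[k] <= period);   /* each job finishes within its period */
    min_dist = period + exec_time[1] - exec_time[0];
    for (k = 1; k + 1 < n; k++) {
        unsigned d = period + exec_time[k + 1] - exec_time[k];
        if (d < min_dist)
            min_dist = d;
    }
    return min_dist;
}
```

With a period of 10 and execution times 8, 2, 2, the first two writes are only 4 time units apart, while constant execution times keep them a full period apart. Under these assumptions, shortening some executions of the task makes its bus traffic burstier even though the task itself got faster.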
42System-on-Chip challenge
43Network-on-Chip
- Multi-hop
- Segmented communication
- Concurrency
- Multiple simultaneous communications
44Network-on-Chip
- Multi-hop
- Segmented communication
- Concurrency
- Multiple simultaneous communications
- Sharing
- Quasi-simultaneous resource usage
- Multiple communication events occupying some or
all resources in an interleaved fashion
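To make multi-hop communication concrete, the sketch below counts hops under an assumed topology that the slides do not specify: a 2D mesh in which packets travel along X first and then along Y (dimension-ordered routing). The struct and the function name are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Position of a router in the assumed 2D mesh. */
struct node {
    int x;
    int y;
};

/* Number of router-to-router hops between two mesh nodes when a packet
 * first travels along X and then along Y. */
int mesh_hops(struct node src, struct node dst)
{
    assert(src.x >= 0 && src.y >= 0 && dst.x >= 0 && dst.y >= 0);
    return abs(dst.x - src.x) + abs(dst.y - src.y);
}
```

A transfer from node (0, 0) to node (3, 2) crosses 5 links. Each link is a separate segment, so transfers whose routes share no link can proceed at the same time, and transfers that do share a link have to use it in an interleaved fashion.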
45System-on-Chip design
46New design paradigm ...
47thank you!