Processor Architectures and Program Mapping - PowerPoint PPT Presentation

Title: Processor Architectures and Program Mapping


1
Processor Architectures and Program Mapping
  • Application domain specific processors
  • (ADSP or ASIP)
  • 5kk10
  • TU/e
  • Henk Corporaal
  • Jef van Meerbergen
  • Bart Mesman

2
Application domain specific processors (ADSP or
ASIP)
[Figure: spectrum of processor types, from programmable CPU and programmable DSP through application domain specific processor to application specific processor, trading flexibility against efficiency.]
3
Application domain specific processors (ADSP or
ASIP)
  • takes a well defined application domain as a
    starting point
  • exploits characteristics of the domain
    (computation kernels)
  • still programmable within the domain
  • e.g. MPEG2 coding (uses the 8x8 DCT transform), DECT,
    GSM etc.

  • performance: clock speed, ILP, tuning to the domain
  • flexible development (new applications)
  • cost effective (high volume)
  • problems
  • specification
  • design time and effort: manual design takes a large
    effort => synthesized cores
4
www.adelantetech.com
5
Outline
  • design process
  • retargetable code generation (problem statement)
  • ADSP/VLIW architectures (Mistral 2 / A|RT
    designer)
  • instructive demo (Adelante)
  • application examples
  • low power aspects (Mistral 2 / A|RT designer)
  • discussion
  • conclusion

6
Design process
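[Figure: design-flow diagram. A processor model (e.g. VLIW with shared RFs), its instance parameters and the application(s) feed HW design and SW (code generation). HW design yields estimations of nsec/cycle, area and power/instr; SW code generation yields estimations of cycles/alg and occupation. Yes/no decisions on these estimations loop back or lead on to "go to phase 2".]
3 phases: 1. exploration; 2. HW design (layout), processing; 3. design of application SW
Fast, accurate and early feedback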
7
Problem statement
A compiler is retargetable if it can generate
code for a new processor architecture
specified in a machine description file.
A guarded register transfer pattern (GRTP) is a
register transfer pattern (RTP) together with the
control bits of the instruction word that
control the RTP.
Example: an RTP on a, b and c with control bits instr = xxxx0101.
GRTPs contain all inter-RT-conflict information.
Instruction set extraction (ISE) is the process
of generating all possible GRTPs for a specific
processor.
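
The statement that GRTPs carry all inter-RT-conflict information can be illustrated with a small sketch. It assumes, as a simplification not spelled out on the slide, that a guard is a string over '0', '1' and 'x' (don't care) and that two GRTPs can share one instruction word exactly when no bit position demands both 0 and 1. The class name, the function name and the example RTPs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GRTP:
    """A register transfer pattern plus the instruction bits that guard it."""
    rtp: str    # textual description of the register transfer (illustrative)
    guard: str  # one character per instruction bit: '0', '1' or 'x'


def compatible(a: GRTP, b: GRTP) -> bool:
    """True if both patterns can be encoded in the same instruction word.

    Assumption: a conflict exists only where one guard needs '0'
    and the other needs '1' at the same bit position.
    """
    if len(a.guard) != len(b.guard):
        raise ValueError("guards must cover the same instruction word")
    return all(p == q or "x" in (p, q) for p, q in zip(a.guard, b.guard))


# Hypothetical patterns for an 8-bit instruction word.
add = GRTP("a := b + c", "xxxx0101")
load = GRTP("r := RAM[p]", "10xxxxxx")
sub = GRTP("a := b - c", "xxxx0111")

print(compatible(add, load))  # disjoint fields
print(compatible(add, sub))   # both claim bit 6 with different values
```

In this toy encoding the add and the load use disjoint fields and can be issued together, while the add and the subtract disagree on one bit and cannot.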
8
Problem statement
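[Figure: code-generation flow. The algorithm spec passes through a front end (FE) to a CDFG; the processor spec (instance) passes through ISE to GRTPs; both feed code generation, which produces machine code.]
In ch. 4 the ISE step is part of the code generator.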
9
Example: simple processor (Leupers)
[Figure: datapath of a simple processor with Inp, RAM, PC, IM, REG and outp, steered by fields of the instruction word I.]
10
Example: simple processor (Leupers)
11
ASIP/VLIW architectures
A|RT designer template as an example (a set of
rules, a model)
  • Differences with the VLIW processors of ch. 4
  • 1. Parallel FUs
  • ASUs = complex application-specific FUs (beyond
    subword parallelism)
  • e.g. biquad, median, DCT etc.
  • larger grain size, more heterogeneous, more
    pipelines
  • 2. Register files
  • many register files (>5 vs. 1 or 2)
  • limited ports (3 vs. 15)
  • limited size (<16 vs. 128)
  • 3. Issue slots
  • all in parallel vs. 5

12
[Figure: architecture template with register files RF1 to RF8, functional units FU1 to FU4, flags, instruction registers IR1 to IR4, instruction memory and control.]
13
ASIP/VLIW architectures
  • Additional characteristics of the A|RT designer
    template
  • interconnect network: busses and input
    multiplexers
  • mux control is part of the instruction
  • control can change every clock cycle
  • network can be incomplete
  • busses can be merged
  • memories are modeled as FUs
  • separate data in and data out
  • 2 inputs (data in and address) and 1 output
  • each FU can generate one or more flags
  • instruction format (per issue slot)
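
One way to picture an incomplete interconnect network is as two small tables: which bus each FU output drives, and which busses each register-file input multiplexer can select. The sketch below borrows the unit names from the example slide (RF1 to RF4, ALU, MAC, bus1, bus2), but the connectivity itself is hypothetical and chosen only to show that not every FU result can reach every register file.

```python
# Hypothetical connectivity, not taken from the slides.
FU_TO_BUS = {"ALU": "bus1", "MAC": "bus2"}

# Busses selectable by the input multiplexer of each register file.
RF_MUX_INPUTS = {
    "RF1": {"bus1"},
    "RF2": {"bus1", "bus2"},
    "RF3": {"bus2"},
    "RF4": {"bus1", "bus2"},
}


def can_route(fu: str, rf: str) -> bool:
    """True if the result of `fu` can be written to `rf` in one transfer."""
    return FU_TO_BUS[fu] in RF_MUX_INPUTS[rf]


def reachable_rfs(fu: str) -> list:
    """All register files that can receive the result of `fu`."""
    return sorted(rf for rf in RF_MUX_INPUTS if can_route(fu, rf))


print(reachable_rfs("ALU"))
print(can_route("MAC", "RF1"))
```

A code generator for such a template has to consult this kind of table before binding a value to a register file, because a missing connection forces either a different binding or an extra move.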

14
ASIP/VLIW architectures example
[Figure: example datapath with register files RF1 to RF4, an ALU and a MAC, connected by bus1 and bus2.]
15
ASIP/VLIW architectures example
16
ASIP/VLIW architectures: design flow
[Figure: design-flow diagram. Algorithm spec, datapath synthesis, RTs, controller synthesis, then estimations of area, power and timing; on "no" the pragmas are changed and the flow is repeated, on "yes" the flow ends.]
Example pragmas: assign ( ab, ALU, fu_alu1) assign ( a_, ALU,
fu_alu2) assign ( __, ALU, fu_alu3)
Example RT: RF1 x RF2 y, RF3 z ALU ADD Inmux bus2
VLIW makes relatively simple code selection possible
17
ASIP/VLIW architectures list scheduling
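[Figure: list scheduling. A priority computation orders the candidate list; each candidate operation is either scheduled or rejected on a conflict. Example resources: IPB, MULT, ALU, OPB.]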
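
A minimal sketch of resource-constrained list scheduling, under assumptions the slide does not state: every operation takes one cycle, the dependence graph is acyclic, and the priority is the longest path to a sink. The function and operation names are illustrative, and the resource names are borrowed from the slide.

```python
def list_schedule(ops, deps, resources):
    """Assign a start cycle to every operation.

    ops:       {operation name: resource type it needs}
    deps:      {operation name: [predecessor names]}
    resources: {resource type: number of units}
    """
    for name, fu in ops.items():
        if resources.get(fu, 0) < 1:
            raise ValueError(f"no {fu} unit available for {name}")

    succs = {name: [] for name in ops}
    for name, preds in deps.items():
        for pred in preds:
            succs[pred].append(name)

    prio = {}

    def height(name):
        # Priority: length of the longest path from this operation to a sink.
        if name not in prio:
            prio[name] = 1 + max((height(s) for s in succs[name]), default=0)
        return prio[name]

    for name in ops:
        height(name)

    schedule, cycle = {}, 0
    while len(schedule) < len(ops):
        free = dict(resources)
        # Candidates: unscheduled operations whose predecessors finished earlier.
        candidates = [n for n in ops if n not in schedule
                      and all(p in schedule and schedule[p] < cycle
                              for p in deps.get(n, []))]
        for n in sorted(candidates, key=lambda n: (-prio[n], n)):
            if free[ops[n]] > 0:      # no resource conflict: schedule it
                free[ops[n]] -= 1
                schedule[n] = cycle
        cycle += 1
    return schedule


ops = {"in1": "IPB", "in2": "IPB", "m1": "MULT", "m2": "MULT",
       "a1": "ALU", "out": "OPB"}
deps = {"m1": ["in1"], "m2": ["in2"], "a1": ["m1", "m2"], "out": ["a1"]}
units = {"IPB": 1, "MULT": 1, "ALU": 1, "OPB": 1}
print(list_schedule(ops, deps, units))
```

With a single IPB and a single MULT the two input reads and the two multiplications are serialized, so the example needs five cycles; adding a second unit of either type in `units` shows how the schedule tightens.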
18
ASIP/VLIW architectures feedback
19
Outline
  • design process
  • retargetable code generation (problem statement)
  • ASIP/VLIW architectures (Mistral 2 / A|RT
    designer)
  • instructive demo (Adelante)
  • application examples
  • low power aspects (Mistral 2 / A|RT designer)
  • discussion
  • conclusion

20
Application examples adaptive filter
Minimizes the difference between x and e
(reference signal)
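[Figure: adaptive filter with input x, output y, coefficients c0, c1, ..., c63 and a control unit; a subtractor combines the filter output with the signals e and r.]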
  • Many applications are possible
  • echo cancelling for TV
  • e = flyback signal (known without echoes)
  • automatic equalization of cables in data
    transmission
  • acoustic echo cancelling

21
Application examples adaptive filter
[Figure: adaptive filter (coefficients c0, c1, ..., c63, control unit) in a speaker and microphone setup. Labels: speech, x, speaker, microphone, speech + noise, e, noise, y, r.]
22
Application examples adaptive filter
[Figure: adaptive filter (coefficients c0, c1, ..., c63, control unit) in a hearing aid. Labels: noise (e.g. radio), x, y, r, speech + noise, e, speech.]
23
Application examples adaptive filter
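[Figure: 64-tap adaptive filter structure. A delay line of Z-1 elements holds x[n], x[n-1], ..., x[n-i], ..., x[n-63]; each tap i has a coefficient ci and an update block Ai driven by t[n]; the partial sums S0[n], S1[n], ..., Si[n], ..., S63[n] form the estimate ê[n]; a subtractor combines ê[n] with e[n] and r[n], and a multiplication by mu followed by a Z-1 delay produces t[n].]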
24
Application examples adaptive filter
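[Figure: coefficient update block Ai. Inputs x[n-i] and t[n]; the new coefficient Ci[n] is derived from the previous value Ci[n-1], held in a Z-1 delay.]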
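
The structure on these slides resembles a standard LMS adaptive FIR filter, so a short sketch may help. It assumes the textbook LMS update c_i[n] = c_i[n-1] + t * x[n-i] with t = mu * r, where r is the residual between the reference e and the filter output; the slides only show the block diagram, and the function name, argument names and the small two-tap example are illustrative. For simplicity the sketch uses the current residual, whereas the diagram appears to delay t by one sample.

```python
def lms_step(x_hist, coeffs, e_ref, mu):
    """One sample of an LMS adaptive FIR filter (assumed textbook form).

    x_hist: [x[n], x[n-1], ..., x[n-N+1]], most recent sample first
    coeffs: [c0, c1, ..., c(N-1)]
    e_ref:  reference sample e[n]
    mu:     step size
    Returns (estimate, residual, updated coefficients).
    """
    estimate = sum(c * x for c, x in zip(coeffs, x_hist))
    residual = e_ref - estimate
    t = mu * residual  # shared scaling term for all taps
    new_coeffs = [c + t * x for c, x in zip(coeffs, x_hist)]
    return estimate, residual, new_coeffs


# Identify a hypothetical 2-tap system e[n] = 0.5*x[n] - 0.25*x[n-1].
samples = [1.0, -1.0, 0.5, 2.0, -0.5, 1.5, -2.0, 0.25] * 50
coeffs, prev = [0.0, 0.0], 0.0
for x in samples:
    e_ref = 0.5 * x - 0.25 * prev
    _, _, coeffs = lms_step([x, prev], coeffs, e_ref, mu=0.05)
    prev = x
print(coeffs)
```

The slides use 64 taps (c0 to c63); the same function covers that case by passing 64-element lists.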
25
Application examples adaptive filter
[Figure: dataflow graph for one filter tap. Signals: sumi, t, r, x_at_i, ci_at_1, w, sumi1.]
26
Application examples adaptive filter
implementation 1
[Figure: datapath with RAM, ALU, ROM, MULT and ACU, connected by bus1 and bus2.]
266 clock cycles, 1.1 mm2
27
Application examples adaptive filter
implementation 2
[Figure: datapath with RAM, ALU, ROM and ACU (no MULT), connected by bus1 and bus2.]
2250 clock cycles, 0.7 mm2
28
Application examples adaptive filter
implementation 3
[Figure: datapath with RAM1, RAM2, ROM, ACU1, ACU2, ALU and MULT.]
202 clock cycles, 1.4 mm2
29
[Figure: plot of clock cycles (axis marked 1000 and 2000) against area in mm2 (axis marked 1 and 2).]
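
The cycle and area figures quoted for the three implementations can be compared with a few lines of arithmetic. The sketch below checks which design points are Pareto-optimal (no other point is at least as good in both cycles and area) and computes a simple area x cycles product; the product is an illustrative figure of merit chosen here, not one used on the slides.

```python
# (clock cycles, area in mm2) as quoted on the implementation slides.
IMPLS = {
    "implementation 1": (266, 1.1),
    "implementation 2": (2250, 0.7),
    "implementation 3": (202, 1.4),
}


def dominates(a, b):
    """True if point a is no worse than b in every metric and differs from b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def pareto_front(points):
    """Names of the points that no other point dominates."""
    return sorted(name for name, p in points.items()
                  if not any(dominates(q, p) for q in points.values()))


def area_time(points):
    """Area x cycles product per design point (smaller is better)."""
    return {name: cycles * area for name, (cycles, area) in points.items()}


print(pareto_front(IMPLS))
print(area_time(IMPLS))
```

All three points are on the Pareto front, which is what makes the exploration a genuine trade-off: dropping the multiplier saves area at roughly a tenfold cost in cycles, while the second RAM and ACU buy a modest speed-up for more area.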
30
Outline
  • design process
  • retargetable code generation (problem statement)
  • ADSP/VLIW architectures (Mistral 2 / A|RT
    designer)
  • instructive demo (Adelante)
  • application examples
  • low power aspects (Mistral 2 / A|RT designer)
  • discussion
  • conclusion

31
Low power aspects
  • Estimation

[Figure: Mistral2 combines the architecture with an estimation database to produce estimates of area, speed and power.]
32
GSM viterbi decoder default solution
EXU         ACTIV    AREA    POWER
alu_1          96    3469    46196
romctrl_1      48      39      259
acu_1          26     327     1209
ipb_1           5     131      105
opb_1          23    1804     5801
ctrl                 9821   135035
total               15591   188605
cycle count: 13750
  • controller responsible for 70% of power
    consumption
  • maximum resource-sharing
  • heavy decision-making main loop with 16
    metrics-computations per iteration
  • EXU-numbers include Registers for local storage

33
GSM viterbi decoder no loop-folding
EXU         ACTIV    AREA    POWER
alu_1          92    3411    45073
romctrl_1      45      39      255
acu_1          25     294     1087
ipb_1           5     107       86
opb_1          22    1661     5340
ctrl                 4919    70087
total               10431   121928
cycle count: 14247
  • area down by 33%
  • power down by 35%
  • next step: reduce the number of program steps
    with a second ALU

34
GSM viterbi decoder 2 ALUs
EXU         ACTIV    AREA    POWER
alu_1          69    1797    12248
alu_2          65    1393     8916
romctrl_1      67      39      255
acu_1          37     294     1087
ipb_1           8     149      119
opb_1          33    2136     6871
ctrl                 8957    87235
total               14766   116731
cycle count: 9739
  • cycle count down 30%
  • area up 42%
  • power down by 5%
  • next step: introduce an ASU to reduce the ALU load

35
GSM viterbi decoder 1 x ACS-ASU
func ACS ( M1, M2, d ) MS, MS8
begin
  MS  = if ( M1+d > M2-d ) -> ( M1+d ) || ( M2-d ) fi;
  MS8 = if ( M1-d > M2+d ) -> ( M1-d ) || ( M2+d ) fi;
end

EXU         ACTIV    AREA    POWER
alu_1          20     261      105
acs_asu_1      83    2382     3816
or_asu_1       10     611      122
romctrl_1      16      65       21
acu_1          36     294      205
ipb_1          20     107       43
opb_1          11     163       35
ctrl                 1864     3597
total                5747     7944
cycle count: 1930
  • cycle count down 5X
  • power down 20X !
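
The add-compare-select (ACS) operation that the ASU implements can be sketched in a few lines. The sketch assumes the operators that went missing from the slide's listing are additions, so that the first output selects the larger of M1+d and M2-d and the second the larger of M1-d and M2+d. The Python function name and the sample values are illustrative.

```python
def acs(m1, m2, d):
    """Add-compare-select butterfly for two path metrics.

    Assumed reading of the slide's listing:
      MS  = max(M1 + d, M2 - d)
      MS8 = max(M1 - d, M2 + d)
    Returns (MS, MS8).
    """
    ms = m1 + d if m1 + d > m2 - d else m2 - d
    ms8 = m1 - d if m1 - d > m2 + d else m2 + d
    return ms, ms8


print(acs(10, 7, 2))
```

On a plain ALU each of the two outputs costs two additions, a comparison and a selection, which fits the slide's point that folding the whole butterfly into one ASU operation relieves the ALU and shortens the main loop.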

36
GSM viterbi decoder 4 x ACS-ASU
EXU          ACTIV    AREA    POWER
alu_1           94     243       97
acs_asu_1       95    1041      420
acs_asu_2       95    1041      420
acs_asu_3       95    1041      420
acs_asu_4       95    1041      420
split_asu_1     47      90       18
or_asu_1        47     592      118
romctrl_1       28      48        6
acu_1           98     212       85
ipb_1           23      60        6
opb_1           50     369       80
ctrl                  1306      555
total                 7084     2645
cycle count: 425
  • cycle count down another 5X
  • area up 23%
  • power down another 3X !

37
GSM viterbi example summary
Mistral2
72x !
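
The step-by-step gains can be recomputed from the totals in the five tables. The sketch below assumes that the unlabeled number under each table is the cycle count, which is consistent with the "cycle count down" bullets; the tuple layout and function name are illustrative.

```python
# (name, cycle count, total area, total power) from the five design points.
STEPS = [
    ("default", 13750, 15591, 188605),
    ("no loop-folding", 14247, 10431, 121928),
    ("2 ALUs", 9739, 14766, 116731),
    ("1 x ACS-ASU", 1930, 5747, 7944),
    ("4 x ACS-ASU", 425, 7084, 2645),
]


def improvement(before, after):
    """Ratio before/after for each metric (greater than 1 means improved)."""
    return {
        "cycles": before[1] / after[1],
        "area": before[2] / after[2],
        "power": before[3] / after[3],
    }


for prev, cur in zip(STEPS, STEPS[1:]):
    ratios = improvement(prev, cur)
    print(cur[0], {k: round(v, 2) for k, v in ratios.items()})

overall = improvement(STEPS[0], STEPS[-1])
print("overall", {k: round(v, 1) for k, v in overall.items()})
```

From the default solution to the 4 x ACS-ASU design the power total drops by a factor of about 71, close to the 72x headline, while the cycle count drops by a factor of about 32.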
38
Discussion phase 3
[Figure: design flow revisited. In the exploration phase the processor model and the application(s) go through HW design and SW (code generation) with yes/no feedback loops; after "Freeze processor model" only the application(s) and SW (code generation) are iterated.]
Application software development: constraint-driven
compilation
39
Discussion problems with VLIWs
code size and instruction bandwidth
  • code compaction: reduce code size after
    scheduling
  • possible compaction ratio?
  • e.g. p0 = 0.9 and p1 = 0.1
  • information content (entropy): -Σ pi log2
    pi = 0.47
  • maximum compression factor ≈ 2
  • control parallelism during scheduling: switch
    between
  • different processor models (10% of code, 90% of
    runtime)
  • architecture
  • reduce number of control bits for operand
    addresses
  • e.g. 128 reg (TM) -> 28 bits/issue slot for
    addresses only
  • => use stacks and fifos
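
Both numbers in this list follow from short calculations. The sketch below computes the entropy of a two-symbol source with probabilities 0.9 and 0.1, the resulting bound on the compression factor, and the address bits per issue slot for a 128-entry register file. The count of four register addresses per issue slot is an assumption made here because it reproduces the 28 bits quoted; the slide does not break the number down.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits per symbol: -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def address_bits(num_regs, addresses_per_slot):
    """Bits needed for the register addresses of one issue slot."""
    return addresses_per_slot * math.ceil(math.log2(num_regs))


h = entropy_bits([0.9, 0.1])
print(round(h, 2))       # entropy per instruction bit
print(round(1 / h, 2))   # upper bound on the compression factor
print(address_bits(128, 4))
```

Each uncompressed bit carries about 0.47 bits of information, so an ideal coder could shrink the code by a factor of a little over 2 at best. On the address side, 7 bits per register address add up quickly, which motivates stacks and FIFOs that need no explicit address.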

40
[Figure: VLIW architecture with register files RF1 to RF4, functional units FU1 to FU4, flags, instruction registers IR1 to IR4, instruction memory and control.]
41
Discussion clustered VLIW architectures
[Figure: clustered VLIW with register files RF1 to RF4 and functional units FU1 to FU4.]
42
Conclusions
  • ASIPs provide efficient solutions for
    well-defined application domains (2 orders of
    magnitude higher efficiency).
  • The methodology is interesting for IP creation.
  • The key problem is retargetable compilation.
  • A (distributed) VLIW model is a good compromise
    between HW and SW.
  • Although an automatic process can generate a
    default solution, the process usually is
    interactive and iterative for efficiency reasons.
    The key is fast and accurate feedback.

43
Imagine assignment
  • For the coming 3 weeks
  • Install the tools (VisualC package will be sent
    by mail)
  • Read the beginners guide
  • Experiment with the compiler on a few examples
  • http://www.ics.ele.tue.nl/hfatemi/5kk10/
  • Further information on Imagine
  • www.cva.stanford.edu/projects/imagine/