Title: Fine Grained Application Source Code Profiling for ASIP Design


1
Fine Grained Application Source Code Profiling
for ASIP Design
  • Kingshuk Karuri, Mohammad Al Faruque, Stefan
    Kraemer, Rainer Leupers, Gerd Ascheid,
    Heinrich Meyr

Institute for Integrated Signal Processing
Systems, RWTH Aachen University, Germany
2
Organization
  • Introduction
  • µ-Profiling for ASIP Design
  • µ-Profiler at work: MP3 ASIP Case Study
  • Performance and accuracy
  • Conclusions

3
Introduction
  • Mapping an embedded application to an
    architecture
  • Two major goals:
  • Flexibility (Programmability)
  • Efficiency (MIPS/Watt)

[Figure: implementation options for an application. Labels: C-Compiler, Behavioral Synthesis, "?"; targets: ARM/MIPS, ASIC, ASIP; axes: Flexibility, Efficiency]
4
Pre-architecture Exploration
[Figure: design flow] Algorithm design (Matlab, SPW, ...) → C code generation or implementation → ? → initial processor architecture → architecture optimization (e.g. LISATek)
Extensive APPLICATION PROFILING is required
5
Related Work: Fine-Grained Profiling
  • High-level SW performance estimation:
  • P. Giusto, G. Martin et al., Reliable Estimation
    of Execution Time of Embedded Software, DATE 2001
  • L. Lavagno, J. R. Bammi et al., Software
    Performance Estimation Strategies in a
    System-level Design Tool, CODES 2000
  • Focus on memory and communication design:
  • L. Cai, A. Gerstlauer et al., Retargetable
    Profiling for Rapid, Early System-level Design
    Space Exploration, DAC 2004
  • M. Ravasi, M. Mattavelli, High-level Algorithmic
    Complexity Evaluation for System Design, Journal
    of Systems Architecture, no. 48, Elsevier, 2003
  • So far no dedicated profilers for ASIP design

6
µ-Profiling Approach
7
Profiling for ASIP Design
  • Traditional application code profiling:
  • Goal: optimization of the computationally
    intensive areas of a given application
  • Used to identify application hot-spots that are
    manually optimized later
  • Usually done at C source code level or assembly
    level
  • Profiling for ASIP design:
  • Goal: optimization of a target architecture for
    an already optimized application source code
  • Identification of application characteristics
    useful for micro-architecture and ISA design

8
Profiling at Assembly Level
[Figure: design flow] Algorithm design (Matlab, SPW, ...) → C code generation or implementation → initial processor architecture → architecture optimization, with profiling at the assembly level once an initial processor architecture exists
  • highly accurate
  • machine-specific
  • needs an initial architecture
  • slow (1000x native C)

9
Profiling at C Source Code Level
[Figure: design flow] Algorithm design (Matlab, SPW, ...) → C code generation or implementation → initial architecture → architecture optimization, with profiling at the source level (e.g. gprof/gcov) on the C code
  • fast
  • can be done on host machine
  • only reports per-C-line data
  • C operator level information unavailable
  • cannot capture effects of code optimization

10
µ-Profiling Approach
  • Source-level profiling is too coarse-grained:
  • Only at C function or C source line granularity
  • No capture of hidden operations, e.g. address
    arithmetic
  • Cannot capture the effects of compiler
    optimizations
  • Potentially misleading profiling results
  • Assembly-level profiling is too target-specific:
  • Needs an initial target architecture to generate
    an ISS
  • Comparatively slow
  • New approach: profile at the intermediate
    representation (IR) level
  • All C operators, data and control flow are
    explicit
  • High-level code optimizations can be performed

11
Profiling at IR Level
C source:
    int p, flag; float b, a[20];
    p = 5;
    f = (flag) ? b : *(a + p * 2);
IR with explicit operations and control flow:
    p = 5;
    if (flag) goto LL1;
    t1 = (char *)a;
    t2 = p * 2;
    t3 = t2 * sizeof(float);
    t4 = t3 + t1;
    t5 = (float *)t4;
    t6 = *t5;
    goto LL2;
    LL1: t4 = b;
    LL2: f = t4;
Optimization annotations: t2 = 10; t3 = 10 * 4; replaced by t4 = t1 + 40
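The lowering on this slide can be mimicked in plain C to show that the explicit form computes the same value as the one-line source expression. This is only an illustrative sketch: the function names `select_source` and `select_ir` are hypothetical, the temporaries follow the slide's t1..t6 naming, and, as an assumption made here for type correctness, the selected value is carried in `t6` on both paths.

```c
#include <assert.h>
#include <stddef.h>

/* Source-level form of the slide's expression (hypothetical wrapper). */
float select_source(int flag, float b, const float a[20]) {
    int p = 5;
    return flag ? b : *(a + p * 2);
}

/* Hand-lowered form: every cast, multiply, add, load and branch is explicit,
 * including the "hidden" address arithmetic behind a[p * 2]. */
float select_ir(int flag, float b, const float a[20]) {
    int p;
    const char *t1;
    int t2;
    size_t t3;
    const char *t4;
    const float *t5;
    float t6;

    assert(a != NULL);
    p = 5;
    if (flag) goto LL1;
    t1 = (const char *)a;              /* base address as a byte pointer */
    t2 = p * 2;                        /* index computation */
    t3 = (size_t)t2 * sizeof(float);   /* scale index to a byte offset */
    t4 = t1 + t3;                      /* address arithmetic */
    t5 = (const float *)t4;            /* back to an element pointer */
    t6 = *t5;                          /* memory load */
    goto LL2;
LL1:
    t6 = b;
LL2:
    return t6;
}
```

Counting operations on the second function rather than the first is what exposes the multiply, add and load that a per-line profile would hide.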
12
µ-Profiler Tool Architecture
[Figure: tool flow] C Source Code → C Front End → 3 Address IR (Optimizations) → Code Instrumenter → Compiler → Object Code → Linker (with Profiler Library) → a.out on host machine
Instrumented statement example: IncStatementExecCount() IncOperatorUseCount() x = a <op> b
Profiler Library functions: IncrStatementExecCount() IncOperatorUseCount()
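A minimal sketch of what the instrumented code and the profiler library might look like on the host. The slide only names the two calls (and spells the first one both as IncStatementExecCount and IncrStatementExecCount; the latter is picked here arbitrarily). Everything else is an assumption for illustration: the parameters, the `OpKind` enum, the counter arrays and the `dot_instrumented` example function are not taken from the tool.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical profiler library state: one counter per statement id
 * and one per operator kind. */
enum { MAX_STATEMENTS = 64 };
typedef enum { OP_ADD, OP_MUL, OP_LOAD, OP_KIND_COUNT } OpKind;

static unsigned long stmt_exec_count[MAX_STATEMENTS];
static unsigned long op_use_count[OP_KIND_COUNT];

/* Assumed signature: the slide shows only the function name. */
void IncrStatementExecCount(size_t stmt_id) {
    assert(stmt_id < MAX_STATEMENTS);
    stmt_exec_count[stmt_id]++;
}

/* Assumed signature: the slide shows only the function name. */
void IncOperatorUseCount(OpKind op) {
    assert((int)op >= 0 && (int)op < OP_KIND_COUNT);
    op_use_count[op]++;
}

/* What a code instrumenter could emit for: acc = acc + x[i] * y[i]; */
int dot_instrumented(const int *x, const int *y, int n) {
    int acc = 0;
    for (int i = 0; i < n; i++) {
        IncrStatementExecCount(0);     /* statement 0 executed once more */
        IncOperatorUseCount(OP_LOAD);  /* load x[i] */
        IncOperatorUseCount(OP_LOAD);  /* load y[i] */
        IncOperatorUseCount(OP_MUL);
        IncOperatorUseCount(OP_ADD);
        acc = acc + x[i] * y[i];
    }
    return acc;
}
```

Because the counters live in an ordinary library linked into a.out, the instrumented program runs natively on the host and needs no instruction set simulator.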
13
µ-Profiler User Interface
14
Using the µ-Profiler
15
Case Study: MP3 Decoder ASIP
  • Goal:
  • Use µ-Profiler to tailor an initial processor
    architecture for a given application
  • Target application:
  • Publicly available MP3 decoder ANSI C source code
  • Initial target architecture:
  • CoWare LISATek RISC (LT RISC)
  • Simple fixed-point RISC template architecture
    with 32-bit instruction words

16
Initial Estimates
  • Initial µ-Profiling:
  • Coarse estimation
  • No FPU in initial architecture
  • SW FPU emulation 100x slower than HW FPU
  • Real-time constraint for the MP3 standard: 38 frames
    per second
  • > 60 M cycles per MP3 frame
  • > 2 GHz clock frequency @ 192 kbps
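The clock estimate on this slide follows from multiplying the per-frame cycle count by the frame rate. The small sketch below only reproduces that arithmetic; the function name `required_clock_hz` is hypothetical.

```c
#include <assert.h>

/* Clock frequency in Hz needed to decode frames_per_second frames
 * when each frame costs cycles_per_frame cycles. */
unsigned long long required_clock_hz(unsigned long long cycles_per_frame,
                                     unsigned long long frames_per_second) {
    assert(frames_per_second > 0);
    return cycles_per_frame * frames_per_second;
}
```

With the slide's lower bound of 60,000,000 cycles per frame and 38 frames per second, this gives 2,280,000,000 Hz, consistent with the "> 2 GHz" figure.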

17
Hints from µ-Profiling
  • Need a HW FPU
  • Pay with significant area overhead?
  • ints are in range [-7012, 17664]
  • Reduce int width from 32-bit to 16-bit
  • Migrate from 32-bit to 16-bit integer ALU
  • Significant amount of area reduction
  • Use the extra area for HW FPU
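The step from an observed value range to a 16-bit ALU can be sketched as follows. This is not the tool's implementation: `ValueRange`, `range_init`, `range_record` and `signed_bits_needed` are hypothetical names, and two's-complement integers are assumed.

```c
#include <assert.h>
#include <limits.h>

/* Running minimum and maximum of the values an integer variable takes. */
typedef struct {
    long long min;
    long long max;
} ValueRange;

void range_init(ValueRange *r) {
    r->min = LLONG_MAX;
    r->max = LLONG_MIN;
}

/* Called (conceptually) each time an instrumented integer result is produced. */
void range_record(ValueRange *r, long long v) {
    if (v < r->min) r->min = v;
    if (v > r->max) r->max = v;
}

/* Smallest two's-complement width whose range
 * [-2^(bits-1), 2^(bits-1) - 1] contains [min, max]. */
int signed_bits_needed(long long min, long long max) {
    assert(min <= max);
    for (int bits = 1; bits < 64; bits++) {
        long long hi = (1LL << (bits - 1)) - 1;
        long long lo = -hi - 1;
        if (min >= lo && max <= hi) return bits;
    }
    return 64;
}
```

For the range reported on this slide, [-7012, 17664], the function returns 16: the upper bound 17664 exceeds the 15-bit maximum of 16383 but fits below the 16-bit maximum of 32767.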

18
More Hints from µ-Profiling
  • Almost all integer comparisons are ">"
  • Leave out others from ISA
  • > 98% of int immediates occupy less than 8 bits
  • Reduce immediate instruction field from 16 to 8
    bits
  • Few far jumps
  • Reduce jump field from 20 to 16 bits
  • Reduce instruction word size to 24 bits
  • Significant amount of code size reduction
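A hint such as the immediate-field one above amounts to a coverage count over the immediates seen during profiling. The sketch below shows one way to compute such a percentage. The names `fits_signed` and `coverage_percent` are hypothetical, and treating immediates as signed two's-complement values is an assumption; the slide does not say how field occupancy was measured.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if value fits in a two's-complement field of `bits` bits. */
int fits_signed(long value, int bits) {
    assert(bits >= 1 && bits <= 31);
    long hi = (1L << (bits - 1)) - 1;
    long lo = -hi - 1;
    return value >= lo && value <= hi;
}

/* Percentage of the observed immediates that fit into a field of `bits` bits. */
double coverage_percent(const long *imm, size_t n, int bits) {
    assert(imm != NULL && n > 0);
    size_t fit = 0;
    for (size_t i = 0; i < n; i++) {
        if (fits_signed(imm[i], bits)) fit++;
    }
    return 100.0 * (double)fit / (double)n;
}
```

A designer could evaluate `coverage_percent` for several candidate field widths and pick the narrowest one whose coverage is acceptable, handling the rare wide immediates with a longer instruction sequence.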

19
Final MP3 ASIP Architecture
  • Hardware synthesis (gate-level):
  • 0.18 µm CMOS library
  • Target clock frequency that safely meets real-time
    constraints: 25 MHz
  • Net effect:
  • 300x cycle count reduction
  • No area increase
  • Considerable code size reduction
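The real-time check behind the 25 MHz figure can be written as a per-frame cycle budget. This is a sketch of the arithmetic only; `cycle_budget_per_frame` and `meets_deadline` are hypothetical names, and the 38 frames per second rate is taken from the Initial Estimates slide.

```c
#include <assert.h>

/* Cycles available for one frame at a given clock and frame rate. */
unsigned long cycle_budget_per_frame(unsigned long clock_hz,
                                     unsigned long frames_per_second) {
    assert(frames_per_second > 0);
    return clock_hz / frames_per_second;
}

/* Returns 1 if a frame costing cycles_per_frame fits into the budget. */
int meets_deadline(unsigned long cycles_per_frame,
                   unsigned long clock_hz,
                   unsigned long frames_per_second) {
    return cycles_per_frame <= cycle_budget_per_frame(clock_hz, frames_per_second);
}
```

At 25 MHz and 38 frames per second the budget is 657,894 cycles per frame. If the 300x reduction were applied to the 60 M cycle lower bound (an assumption, since the slides do not state the baseline), a frame would cost about 200,000 cycles and fit comfortably.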

20
Performance and accuracy
21
Speed: µ-Profiler, gcc, and ISS
[Bar chart: Relative Execution Time. Bars: gcc, Basic profiling, Basic + dynamic value range profiling, Basic + dynamic value range + trace generation, MIPS ISS. Values shown: 1x, 3.1x, 18.9x, 95x, 788.5x]
22
Accuracy: MIPS ISS vs. µ-Profiler
  • Average deviation without optimizations: 36%
  • Average deviation with optimizations: 23%

23
Cycle Count Estimates: LT RISC ISS vs. µ-Profiler
  • Average deviation without optimizations: 27%
  • Average deviation with optimizations: 11%
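The slides do not define "average deviation". One plausible reading, assumed here, is the mean absolute relative error of the profiler's estimates against the ISS reference counts across benchmarks. The function name `average_deviation_percent` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of |estimate - reference| / reference over n benchmarks, in percent.
 * Assumes every reference value is positive. */
double average_deviation_percent(const double *reference,
                                 const double *estimate,
                                 size_t n) {
    assert(reference != NULL && estimate != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(reference[i] > 0.0);
        double diff = estimate[i] - reference[i];
        if (diff < 0.0) diff = -diff;   /* absolute difference */
        sum += diff / reference[i];     /* relative to the ISS count */
    }
    return 100.0 * sum / (double)n;
}
```

Because each benchmark's error is normalized by its own reference count, long-running and short-running benchmarks contribute equally to the average.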

24
Conclusions
25
Conclusions
  • State-of-the-art: ASIP ISA and micro-architecture
    exploration tools
  • Pre-architecture exploration tools can make
    ASIP design even more efficient
  • µ-Profiler can help designers make early
    design decisions on the initial ASIP architecture
  • Future work:
  • Accurate cost estimation of profiler hints
  • Automatic translation to ADL models
  • More case studies with diverse applications and
    architectures

26
Thank you