Title: Multigranularity Sampling for Heterogeneous Concurrent Applications
1. Multi-granularity Sampling for Heterogeneous Concurrent Applications
- Melhem Tawk, Univ. of Valenciennes, France
- Khaled Z. Ibrahim, IRISA/INRIA, France
- Smail Niar, INRIA, France
2. Motivations (1/2)
- Simulation of embedded systems is vital for meeting time-to-market constraints
- Implementation alternatives need to be evaluated before a physical realization
- Objective: Design Space Exploration (DSE) over all possible configurations
- Find the best MPSoC configuration
- Perf./power/cost estimates or tradeoffs
3. Motivations (2/2)
- But with increasing circuit complexity (VLSI), the use of conventional "cycle/bit accurate" tools is problematic
- Simulation time increases
- 1 sec on the real system corresponds to several hours of simulation
- Simulation time increases by 10x when the number of processors increases by 16x
4. Outline
- DSE for MPSoC design
- A survey on techniques for simulation acceleration
- Adaptive Sampling (AS) for concurrent applications
- Our solution
- Multi-Granularity Sampling (MGS) approach
- Reduction of checkpointing disk storage with MGS
- Experimental results
- Conclusion
5. DSE for MPSoC Design (1/2)
- Platform reconfiguration/adaptation for new applications
- MMU, I/D caches, NoC, instruction set, ...
- High performance platforms include more than 30 parameters
- 3 obstacles:
- High number of parameters to explore: > 10^9 configurations
- Growing number of applications, each needing several data sets for testing
- Applications are much larger: > 1 G instructions
6. DSE for MPSoC Design (2/2)
- Aim: explore the maximum number of solutions in a short time.
- 2 issues:
- Reduce the number of solutions to explore: meta-heuristics (Tabu search, etc.)
- Reduce the time associated with configuration evaluation (fitness calculation)
7. Approaches for Simulation Acceleration
- Statistical simulation: generate a synthetic program with a smaller number of instructions but the same profile.
- Analytical modeling: perf./power consumption is approximated analytically (math. models).
- Higher-level models: architectural details are hidden, "Transaction-Level Modeling" (TLM).
- Sampling:
- 1 or more samples (or intervals) of the application are chosen.
- Samples: a smaller instruction count that represents the whole application behavior.
8. How Sampling Works
- Decompose the application into intervals of the same size
- Intervals that execute similar sets of basic blocks belong to one phase
- 1 sample is needed for each phase
- Application performance is estimated from a representative sequence of phases
- Example: 3 phases, so 3 samples are sufficient
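The steps on this slide can be sketched in Python. This is a minimal illustration, not the authors' tool: it assumes each interval is summarized by a basic-block execution count vector, and it groups intervals with a simple greedy threshold on Manhattan distance. The names `classify_phases`, `estimate_cpi`, the threshold value, and the per-phase CPI numbers are all hypothetical.

```python
from typing import Dict, List


def normalize(bbv: List[float]) -> List[float]:
    """Scale a basic-block count vector so that its entries sum to 1."""
    total = sum(bbv)
    return [x / total for x in bbv] if total else list(bbv)


def manhattan(a: List[float], b: List[float]) -> float:
    """Manhattan distance between two vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify_phases(bbvs: List[List[float]], threshold: float) -> List[int]:
    """Greedy grouping (an assumption): an interval joins the first phase
    whose representative vector lies within `threshold`, else opens a new phase."""
    representatives: List[List[float]] = []
    phase_ids: List[int] = []
    for bbv in bbvs:
        v = normalize(bbv)
        for pid, rep in enumerate(representatives):
            if manhattan(v, rep) <= threshold:
                phase_ids.append(pid)
                break
        else:
            representatives.append(v)
            phase_ids.append(len(representatives) - 1)
    return phase_ids


def estimate_cpi(phase_ids: List[int], sample_cpi: Dict[int, float]) -> float:
    """Estimate whole-run CPI from one simulated sample per phase.
    Intervals have equal size, so every interval gets equal weight."""
    return sum(sample_cpi[p] for p in phase_ids) / len(phase_ids)


# Five equal-size intervals over three basic blocks (made-up counts).
bbvs = [[10, 0, 0], [9, 1, 0], [0, 10, 0], [0, 0, 10], [10, 0, 0]]
phase_ids = classify_phases(bbvs, threshold=0.5)
# One detailed sample per phase gives its CPI (hypothetical values).
estimated = estimate_cpi(phase_ids, {0: 1.0, 1: 2.0, 2: 4.0})
print(phase_ids, estimated)
```

Here five intervals collapse into three phases, so only three samples need detailed simulation.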
9. Adaptive Sampling (AS) Approach for Multi-processors
- Sample phase string overlap
- Skipping repeated overlaps
- Using simulation barriers to discretize overlaps
- if ΔC < Threshold: synchronization barrier
- else: continue simulation
ΔC: estimated number of cycles to terminate b3 in P1
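The barrier rule above can be written as a small Python sketch. The slide does not say how ΔC is estimated; the version below assumes it is the remaining instruction count of the current phase times that phase's CPI, which is only one plausible choice. The function names and numbers are hypothetical.

```python
def remaining_cycles(remaining_instructions: int, phase_cpi: float) -> float:
    """Assumed estimate of delta-C: cycles still needed to finish the phase."""
    return remaining_instructions * phase_cpi


def should_insert_barrier(delta_c: float, threshold: float) -> bool:
    """Rule from the slide: barrier if delta-C < threshold, else keep simulating."""
    return delta_c < threshold


# A phase on one processor has 1000 instructions left at CPI 1.5.
delta_c = remaining_cycles(1000, 1.5)
print(should_insert_barrier(delta_c, threshold=2000.0))  # barrier is inserted
print(should_insert_barrier(delta_c, threshold=1000.0))  # simulation continues
```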
10. AS Problem in Concurrent Heterogeneous Applications
- Heterogeneity: different behavior of the concurrent applications
- Phase string length increases
- Conservatism in the phase string matching process
- Detecting repeated overlaps is less likely to occur
11. Proposed Approach
- All possible granularities (interval sizes) are analyzed
- Overlaps contain 1 phase per processor
- Accuracy is maintained and detection of repeated overlaps is simplified
12. MGS (Multi-granularity Sampling) (1/2)
- First step: phase matrix creation
- Decompose the application into intervals of order-1 granularity.
- Use the starting points of order-1 intervals to form intervals of coarser granularity.
- Each granularity order has a specified interval size
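A possible reading of the phase matrix construction, sketched in Python. Rows are order-1 starting points and columns are granularity orders. The slide does not define how a coarse interval gets its phase ID, so this sketch assumes two coarse intervals share a phase when they cover the same sequence of order-1 phases. `build_phase_matrix` and the interval sizes are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def build_phase_matrix(order1_phases: List[int],
                       sizes: List[int]) -> List[List[Optional[int]]]:
    """order1_phases[i] is the phase ID of the i-th order-1 interval.
    sizes[k] is the interval size of order k+1, counted in order-1 intervals.
    Entry [start][k] is the phase ID of the order-(k+1) interval that begins
    at order-1 starting point `start`, or None if it would pass the end."""
    n = len(order1_phases)
    ids_per_order: List[Dict[Tuple[int, ...], int]] = [dict() for _ in sizes]
    matrix: List[List[Optional[int]]] = []
    for start in range(n):
        row: List[Optional[int]] = []
        for order, size in enumerate(sizes):
            if start + size > n:
                row.append(None)
                continue
            # Assumed signature: the sequence of order-1 phases covered.
            signature = tuple(order1_phases[start:start + size])
            ids = ids_per_order[order]
            row.append(ids.setdefault(signature, len(ids)))
        matrix.append(row)
    return matrix


# Five order-1 intervals alternating between two phases; three orders.
phase_matrix = build_phase_matrix([0, 1, 0, 1, 0], sizes=[1, 2, 4])
for row in phase_matrix:
    print(row)
```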
13. MGS (Multi-granularity Sampling) (2/2)
- Second step: multi-phase cluster (MPC) generation
- The discretization of overlapping phases from AS is adopted
- Phases are identified using the starting point of the MPC and the number of simulated instructions
- Skipping repeated MPCs
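Skipping repeated MPCs amounts to simulating each distinct cluster once and reusing its result afterwards. The Python sketch below assumes an MPC can be keyed by a tuple holding one phase label per processor, and that a callback `simulate` returns the cycle count of a detailed simulation. Both are placeholders, not part of the presented tool.

```python
from typing import Callable, Dict, Hashable, List, Tuple

MPC = Tuple[Hashable, ...]  # one phase label per processor (assumed encoding)


def run_with_mpc_skipping(mpcs: List[MPC],
                          simulate: Callable[[MPC], float]) -> Tuple[float, int]:
    """Simulate the first occurrence of each MPC in detail and reuse the
    stored cycle count for every repeat. Returns (total cycles, detailed runs)."""
    cache: Dict[MPC, float] = {}
    total_cycles = 0.0
    detailed_runs = 0
    for mpc in mpcs:
        if mpc not in cache:
            cache[mpc] = simulate(mpc)
            detailed_runs += 1
        total_cycles += cache[mpc]
    return total_cycles, detailed_runs


# Two processors; hypothetical cycle counts per distinct MPC.
costs = {("A1", "B3"): 100.0, ("A2", "B3"): 250.0}
sequence = [("A1", "B3"), ("A2", "B3"), ("A1", "B3"), ("A1", "B3")]
total, runs = run_with_mpc_skipping(sequence, lambda mpc: costs[mpc])
print(total, runs)
```

In this made-up sequence only two of the four MPCs are simulated in detail.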
14. Experimental Platform
- MPARM: ARM7 (up to 12 cores), SystemC, interconnection, D-cache 4 KB, I-cache 8 KB.
15. Simulation Acceleration
- AS: low acceleration
- MGS: high acceleration, up to 60x
- AS: low acceleration due to the different behavior of the applications
16. Phase String Length
- Rijndael and gsm differ in behavior
- Rijndael has a high miss rate
- 20 gsm phases overlap one rijndael phase
17. IPC Estimation Error
18. Checkpointing
- After skipping repeated samples, checkpoints of system states are required
- System states:
- Micro-architecture state: cache and branch predictor contents
- Architecture state: shared memory and register data values
- Generated once for all the DSE
- Checkpointing every interval is costly in terms of storage space
19. Checkpointing Storage and MGS
- The MGS phase matrix reveals many similar rows
- Exploit this similarity by storing representative checkpoints
- One checkpoint for each group of similar rows
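The grouping idea can be sketched in Python as follows. The slide does not define "similar", so this sketch assumes the simplest criterion: two rows are similar when they are identical. The checkpoint kept for a group is the one at its first starting point. `representative_checkpoints` and the example matrix are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def representative_checkpoints(
        phase_matrix: List[List[Optional[int]]]) -> Dict[int, List[int]]:
    """Group starting points whose phase-matrix rows are identical (assumed
    similarity criterion). Maps the starting point whose checkpoint is stored
    to the list of all starting points that share it."""
    groups: Dict[Tuple[Optional[int], ...], List[int]] = {}
    for start, row in enumerate(phase_matrix):
        groups.setdefault(tuple(row), []).append(start)
    return {members[0]: members for members in groups.values()}


# Five starting points, two granularity orders (made-up phase IDs).
matrix = [[0, 0], [1, 1], [0, 0], [1, 1], [0, 2]]
stored = representative_checkpoints(matrix)
print(stored)
print(f"{len(stored)} checkpoints stored instead of {len(matrix)}")
```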
20. Conclusion
- Performance estimation through simulation for concurrent heterogeneous applications
- Adaptive Sampling: the phase string length increases, giving a low acceleration factor
- Proposal: Multigranularity Sampling (MGS)
- One phase per processor
- Each phase granularity can be different
21. Conclusion
- MGS increases simulation speedup, up to 60x, with error < 10%
- An IPC correction formula is devised: error is reduced by up to 90%
- A technique based on MGS is presented to reduce checkpoint disk storage
- MGS can be applied to DSE in a wide range of embedded systems
22. End