Multigranularity Sampling for Heterogeneous Concurrent Applications


1
Multi-granularity Sampling for Heterogeneous
Concurrent Applications
  • Melhem Tawk, Univ. of Valenciennes, France
  • Khaled Z. Ibrahim, IRISA/INRIA, France
  • Smail Niar, INRIA, France

2
Motivations (1/2)
  • Simulation of embedded systems is vital for
    time-to-market
  • Implementation alternatives need to be evaluated
    before physical realization
  • Objective: Design Space Exploration (DSE) over all
    possible configurations
  • Find the best MPSoC configuration
  • Performance/power/cost estimates and trade-offs

3
Motivations (2/2)
  • But with increasing circuit (VLSI) complexity, use
    of conventional "cycle/bit accurate" tools is
    problematic
  • → Increase in simulation time
  • 1 s on the real system → several hours of
    simulation
  • Simulation time increases by 10x when the number
    of processors increases by 16x

4
Outline
  • DSE for MPSoC design
  • A survey on techniques for simulation
    acceleration
  • Adaptive Sampling (AS) for concurrent applications
  • Our solution
  • Multi-Granularity Sampling (MGS) approach
  • Reduction of checkpointing disk storage with MGS
  • Experimental results
  • Conclusion

5
DSE for MPSoC Design (1/2)
  • Platform reconfiguration/adaptation for new
    applications
  • MMU, ID, NoC, instruction set, ...
  • High-performance platforms include more than 30
    parameters
  • 3 obstacles:
  • High number of parameters to explore: > 10^9
    configurations
  • Growing number of applications, each with several
    data sets to test
  • Applications are much larger: > 1 G instructions
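The figure of more than 10^9 configurations follows from counting. The size of the design space is the product of the number of values each parameter can take. The sketch below assumes a hypothetical platform with 30 parameters and only two values per parameter. Real platforms usually allow more values, so under this assumption the result is a lower bound.

```python
from math import prod

def design_space_size(values_per_parameter):
    """Number of distinct configurations: product of the per-parameter value counts."""
    return prod(values_per_parameter)

# Hypothetical platform: 30 parameters, each restricted to 2 values.
size = design_space_size([2] * 30)
print(size)  # 1073741824 (= 2**30), already more than 10**9
```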

6
DSE for MPSoC Design (2/2)
  • Aim: explore as many solutions as possible in a
    short time
  • 2 issues:
  • Reduce the number of solutions to explore:
    meta-heuristics (Tabu search, etc.)
  • Reduce the time needed to evaluate a
    configuration (fitness calculation)
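The two issues correspond to the two factors of a simple cost model: total exploration time equals the number of configurations evaluated multiplied by the time needed to evaluate one of them. All numbers below are hypothetical. The only link to this talk is the 60x factor, which echoes the best-case speedup reported in the conclusion.

```python
def dse_hours(n_configs, hours_per_eval, speedup=1.0):
    """Exploration time = configurations evaluated x (simulation time / speedup)."""
    return n_configs * hours_per_eval / speedup

# Hypothetical: a meta-heuristic limits the search to 1000 configurations,
# each needing 6 hours of cycle-accurate simulation.
baseline = dse_hours(1000, 6)            # 6000.0 hours
with_sampling = dse_hours(1000, 6, 60)   # 100.0 hours with a 60x simulation speedup
print(baseline, with_sampling)
```

Meta-heuristics shrink the first factor and sampling shrinks the second, so the two savings multiply.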

7
Approaches for Simulation Acceleration
  • Statistical simulation: generate a synthetic
    program with fewer instructions but the same
    profile
  • Analytical modeling: performance/power
    consumption is approximated analytically
    (mathematical models)
  • Higher-level models: architectural details are
    hidden, e.g. Transaction-Level Modeling (TLM)
  • Sampling
  • 1 or more samples (or intervals) of the
    application are chosen
  • Samples have a smaller instruction count but
    represent the whole application's behavior

8
How Sampling Works
  • Decompose the application into intervals of the
    same size
  • Intervals containing similar basic blocks belong
    to the same phase
  • 1 sample is needed for each phase
  • Application performance is estimated from a
    representative sequence of phases

3 phases → 3 samples are sufficient
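The steps above can be sketched for a single processor as follows. This is a minimal illustration under stated assumptions, not the authors' tool:

- the trace is a sequence of basic-block ids;
- every interval is assumed to have the same instruction count, with each trace entry counted as one unit;
- similarity is measured as the Manhattan distance between normalized basic-block vectors, compared against a hypothetical threshold (a SimPoint-style choice assumed here).

Because all intervals have the same size, the whole-run CPI is the phase-frequency-weighted average of the sample CPIs.

```python
from collections import Counter

def basic_block_vector(interval):
    """Normalized execution frequency of each basic-block id in one interval."""
    counts = Counter(interval)
    total = sum(counts.values())
    return {bb: n / total for bb, n in counts.items()}

def manhattan(u, v):
    """Manhattan distance between two sparse basic-block vectors."""
    return sum(abs(u.get(k, 0.0) - v.get(k, 0.0)) for k in set(u) | set(v))

def classify_phases(bb_trace, interval_size, threshold=0.5):
    """Cut the trace into equal-size intervals and assign each to a phase.

    An interval joins the first phase whose representative vector lies within
    `threshold`; otherwise it opens a new phase and becomes that phase's sample.
    Returns the phase string and {phase id: index of its sample interval}.
    """
    intervals = [bb_trace[i:i + interval_size]
                 for i in range(0, len(bb_trace), interval_size)]
    representatives = []          # (phase id, vector, sample interval index)
    phase_string = []
    for idx, interval in enumerate(intervals):
        vec = basic_block_vector(interval)
        for pid, rep, _ in representatives:
            if manhattan(vec, rep) <= threshold:
                phase_string.append(pid)
                break
        else:                     # no similar phase found: open a new one
            pid = len(representatives)
            representatives.append((pid, vec, idx))
            phase_string.append(pid)
    return phase_string, {pid: idx for pid, _, idx in representatives}

def estimate_cpi(phase_string, sample_cpi):
    """Whole-run CPI: sample CPIs weighted by how often each phase occurs
    (valid because all intervals have the same instruction count)."""
    counts = Counter(phase_string)
    n = len(phase_string)
    return sum(counts[p] / n * sample_cpi[p] for p in counts)

phases, samples = classify_phases("AAAABBBBAAAACCCC", 4)
print(phases, samples)                                 # [0, 1, 0, 2] {0: 0, 1: 1, 2: 3}
print(estimate_cpi(phases, {0: 1.0, 1: 2.0, 2: 3.0}))  # 1.75
```

As in the figure, 3 phases need only 3 simulated samples. The sample CPIs in the usage line are hypothetical, and the corresponding IPC estimate is 1 / CPI.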
9
Adaptive Sampling (AS) Approach for
Multi-processors
  • Sample phase-string overlaps
  • Skip repeated overlaps
  • Use simulation barriers to discretize overlaps
  • If ΔC < Threshold
  • → Synchronization barrier
  • Else
  • → Continue simulation

ΔC: estimated number of cycles for P1 to finish phase b3
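The following is a minimal sketch of the barrier rule above, under stated assumptions. The slide does not say how ΔC is estimated. Here it is assumed, hypothetically, to be the number of instructions a processor still has to execute in its current phase multiplied by that phase's CPI. The names `ProcState`, `delta_c` and `at_phase_end` are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class ProcState:
    phase: str               # id of the phase the processor is currently in
    instructions_left: int   # instructions remaining until that phase ends
    cpi: float               # CPI assumed for that phase (hypothetical estimate)

def delta_c(proc: ProcState) -> float:
    """Estimated cycles (ΔC) until `proc` finishes its current phase."""
    return proc.instructions_left * proc.cpi

def at_phase_end(procs, finished, threshold):
    """Called when processor `finished` reaches a phase boundary.

    For every other processor: if it is close to the end of its own phase
    (ΔC < threshold), a synchronization barrier is used so that both phases
    end together; otherwise simulation simply continues.
    """
    return {i: ("barrier" if delta_c(p) < threshold else "continue")
            for i, p in enumerate(procs) if i != finished}

procs = [ProcState("a3", 0, 1.0),        # P0 has just finished its phase
         ProcState("b3", 200, 1.5),      # P1: ΔC = 300 cycles
         ProcState("c1", 50_000, 1.2)]   # P2: far from its phase boundary
print(at_phase_end(procs, finished=0, threshold=1_000))
# {1: 'barrier', 2: 'continue'}
```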
10
AS Problem in Concurrent Heterogeneous
Applications
  • Heterogeneity: the concurrent applications behave
    differently
  • Phase-string length increases
  • Phase-string matching is conservative

Repeated overlaps are therefore less likely to be
detected
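A toy model reproduces this effect. All data are hypothetical. Each of two processors is described by a sequence of (phase, length in cycles) pairs.

1. For every phase of the slower-changing processor, collect the phases of the other processor that overlap it in time.
2. Count how many of these overlaps repeat one already seen, and could therefore be skipped.

With matching phase lengths, most overlaps repeat. When about 20 short phases overlap each long one, as this talk reports for gsm and Rijndael, the overlap strings become long and rarely repeat.

```python
from itertools import accumulate

def spans(phases):
    """[(phase, length), ...] -> [(phase, start, end), ...] in cycles."""
    ends = list(accumulate(length for _, length in phases))
    starts = [0] + ends[:-1]
    return [(p, s, e) for (p, _), s, e in zip(phases, starts, ends)]

def overlap_strings(slow, fast):
    """For each phase of `slow`, the tuple of `fast` phases overlapping it in time."""
    fast_spans = spans(fast)
    return [(p, tuple(q for q, fs, fe in fast_spans if fs < e and fe > s))
            for p, s, e in spans(slow)]

def repeated_fraction(overlaps):
    """Fraction of overlaps identical to an earlier one (i.e. skippable)."""
    seen, repeats = set(), 0
    for ov in overlaps:
        if ov in seen:
            repeats += 1
        seen.add(ov)
    return repeats / len(overlaps)

slow = [("R0", 100)] * 4                          # one long phase, 4 times
similar = [("G0", 100)] * 4                       # same phase lengths
hetero = [(f"G{i % 3}", 5) for i in range(80)]    # 20 short phases per long one

print(repeated_fraction(overlap_strings(slow, similar)))  # 0.75
print(repeated_fraction(overlap_strings(slow, hetero)))   # 0.25
print(len(overlap_strings(slow, hetero)[0][1]))           # 20
```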
11
Proposed Approach
  • All possible granularities (interval sizes) are
    analyzed
  • Each overlap contains 1 phase per processor
  • Accuracy is maintained and detection of repeated
    overlaps is simplified

12
MGS (Multi-granularity Sampling) (1/2)
  • First step: phase matrix creation
  • Decompose the application into intervals of
    order-1 granularity
  • Use the starting points of order-1 intervals to
    form intervals of coarser granularity
  • Each granularity order has a specified interval
    size
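Phase-matrix creation can be sketched under two illustrative assumptions:

- The order-k interval size is taken to be k times the order-1 size. The slides only say that each order has its own size.
- An interval's phase is identified by a hypothetical exact-match signature: its basic-block frequencies rounded to one decimal.

Row i of the matrix corresponds to the i-th order-1 starting point. Each column holds the phase of the interval of that granularity order starting there.

```python
from collections import Counter

def signature(interval, precision=1):
    """Hypothetical exact-match phase signature: rounded basic-block frequencies."""
    counts = Counter(interval)
    return tuple(sorted((bb, round(n / len(interval), precision))
                        for bb, n in counts.items()))

def phase_matrix(bb_trace, base_size, orders):
    """Row i: i-th order-1 starting point. Column c: granularity orders[c],
    whose interval size is assumed to be orders[c] * base_size. Entry: phase id
    (numbered separately per order), or None if the interval overruns the trace."""
    tables = [{} for _ in orders]            # signature -> phase id, per order
    matrix = []
    for i in range(len(bb_trace) // base_size):
        start = i * base_size
        row = []
        for col, k in enumerate(orders):
            end = start + k * base_size
            if end > len(bb_trace):
                row.append(None)
                continue
            sig = signature(bb_trace[start:end])
            row.append(tables[col].setdefault(sig, len(tables[col])))
        matrix.append(row)
    return matrix

print(phase_matrix("AAAABBBBAAAABBBB", base_size=4, orders=[1, 2]))
# [[0, 0], [1, 0], [0, 0], [1, None]]
```

At order 1 the trace alternates between two phases. At order 2, every complete interval falls into a single phase, which illustrates how a coarser granularity can merge alternating fine-grained phases.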

13
MGS (Multi-granularity Sampling) (2/2)
  • Second step: multi-phase cluster (MPC) generation
  • The AS discretization of overlapping phases is
    adopted
  • Phases are identified using the starting point of
    the MPC and the number of simulated instructions
  • Skip repeated MPCs
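One possible reading of this step, sketched with hypothetical names:

- An MPC is keyed by one (granularity order, phase) pair per processor.
- Each processor's phase is read from its own phase matrix, at that processor's starting row.
- The column is that of the smallest order whose interval covers the instructions the processor executed up to the barrier. This column-selection rule is an assumption.

A repeated key means the MPC was already simulated, so its measured cycle count is reused instead of simulating it again.

```python
import math

def mpc_key(matrices, starts, executed, base_size, orders):
    """One (order, phase) pair per processor identifies the MPC.
    Raises StopIteration if no order covers the executed instructions."""
    key = []
    for p, matrix in enumerate(matrices):
        needed = math.ceil(executed[p] / base_size)   # order-1 intervals covered
        col = next(c for c, k in enumerate(orders) if k >= needed)
        key.append((orders[col], matrix[starts[p]][col]))
    return tuple(key)

class MPCCache:
    """Simulate each distinct MPC once; later occurrences are skipped."""

    def __init__(self):
        self.cycles = {}
        self.simulated = 0
        self.skipped = 0

    def run(self, key, simulate):
        if key in self.cycles:
            self.skipped += 1
        else:
            self.cycles[key] = simulate()   # detailed simulation (stand-in callable)
            self.simulated += 1
        return self.cycles[key]

# Hypothetical phase matrices for two processors (orders 1 and 2, base size 4).
m0 = [[0, 0], [1, 0], [0, 0], [1, None]]
m1 = [[0, 0], [0, 0], [0, 0], [0, None]]
k1 = mpc_key([m0, m1], starts=[0, 0], executed=[4, 8], base_size=4, orders=[1, 2])
k2 = mpc_key([m0, m1], starts=[2, 2], executed=[4, 8], base_size=4, orders=[1, 2])

cache = MPCCache()
cache.run(k1, lambda: 1200)   # simulated
cache.run(k2, lambda: 999)    # k2 == k1: skipped, the stored 1200 cycles are reused
```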

14
Experimental Platform
  • MPARM: ARM7 cores (up to 12), SystemC,
    interconnection, D-cache 4 KB, I-cache 8 KB

15
Simulation Acceleration
  • AS: low acceleration
  • MGS: high acceleration, up to 60x
  • AS's low acceleration is due to the applications'
    different behavior

16
Phase-String Length
  • Rijndael and gsm differ in behavior
  • Rijndael has a high miss rate
  • 20 gsm phases overlap one Rijndael phase

17
IPC Estimation Error
18
Checkpointing
  • After skipping repeated samples, checkpoints of
    system states are required
  • System states:
  • Micro-architectural state and branch predictor
    contents
  • Architectural state: shared memory and register
    values
  • Generated once for the whole DSE
  • Checkpointing every interval is costly in
    storage space
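The system state listed above could be captured in a record like the one below. The field names and the byte-string representation are hypothetical. The point is that storage grows linearly with the number of checkpoints, which is why checkpointing every interval is costly.

```python
from dataclasses import dataclass

@dataclass
class Checkpoint:
    """Hypothetical system-state checkpoint taken at an interval boundary."""
    start_point: int          # order-1 interval index where simulation resumes
    microarch_state: bytes    # micro-architectural state (e.g. cache contents)
    branch_predictor: bytes   # branch predictor contents
    shared_memory: bytes      # architectural state: shared memory image
    registers: bytes          # architectural state: register values

    def size(self) -> int:
        return (len(self.microarch_state) + len(self.branch_predictor)
                + len(self.shared_memory) + len(self.registers))

def total_storage(checkpoints) -> int:
    """Bytes needed to store every checkpoint."""
    return sum(c.size() for c in checkpoints)

# One checkpoint per interval: storage = (number of intervals) x (checkpoint size).
per_interval = [Checkpoint(i, bytes(10), bytes(2), bytes(100), bytes(8)) for i in range(5)]
print(total_storage(per_interval))   # 600
```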

19
Checkpointing Storage and MGS
  • The MGS phase matrix reveals many similar rows
  • Exploit this similarity by storing representative
    checkpoints
  • One checkpoint for each group of similar rows
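A minimal sketch of this grouping follows. It treats rows as similar only when they are identical, which is a simplifying assumption; a distance threshold could be used instead. Rows are keyed by their tuple of phase ids. Only the first starting point of each group keeps a checkpoint, and the other members of the group are mapped to it.

```python
def representative_checkpoints(matrix):
    """Map each stored checkpoint (first row of a group) to the rows it stands for."""
    groups = {}
    for start, row in enumerate(matrix):
        groups.setdefault(tuple(row), []).append(start)
    return {members[0]: members for members in groups.values()}

matrix = [[0, 0], [1, 0], [0, 0], [1, None]]   # hypothetical phase matrix
reps = representative_checkpoints(matrix)
print(reps)                                    # {0: [0, 2], 1: [1], 3: [3]}
print(len(reps), "checkpoints instead of", len(matrix))   # 3 checkpoints instead of 4
```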

20
Conclusion
  • Performance estimation through simulation for
    concurrent heterogeneous applications
  • Adaptive Sampling: long phase strings → low
    acceleration factor
  • Proposal: Multigranularity Sampling (MGS)
  • One phase per processor
  • Each phase granularity can be different

21
Conclusion
  • MGS increases simulation speedup (up to 60x)
    with error < 10%
  • An IPC correction formula is devised: error is
    reduced by up to 90%
  • A technique based on MGS is presented to reduce
    checkpoint disk storage
  • MGS can be applied to DSE for a wide range of
    embedded systems

22
End
  • Thank you
