Static Identification of Delinquent Loads - PowerPoint PPT Presentation

Slides: 38
Provided by: eecgTo

Transcript and Presenter's Notes

1
Static Identification of Delinquent Loads
  • V.M. Panait
  • Sasturkar
  • W.-F. Fong

2
Agenda
  • Introduction
  • Related Work
  • Delinquent Loads
  • Framework
  • Address Patterns, Decision Criteria
  • The heuristic: types of classes, computing the
    weights, final classes
  • Results

3
Introduction
  • Cache: one of the major current bottlenecks in
    performance
  • One approach: prefetch, but prefetch what? Can't
    prefetch everything
  • A few loads are really bad: the delinquent loads
  • This paper: classification of address patterns in
    the load instructions

4
Introduction
  • Done after code generation, but before runtime
  • Singled out 10% of all loads, causing over 90% of
    the misses in 18 SPEC benchmarks
  • Gets even better combined with basic block
    profiling: 1.3% of loads covering over 80% of the
    misses

5
Related Work
  • BDH method: classifies loads based on the
    following criteria
  • Region of memory accessed by the load: S (stack),
    H (heap) or G (global)
  • Kind of reference: loading a scalar (S), element
    of an array (A) or field of a structure (F)
  • Type of reference: (P)ointer or (N)ot
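
A minimal sketch of how the three BDH criteria combine into a three-letter class label. The function name and the dictionary keys are hypothetical, and the letter F for a structure field is an assumption inferred from class names such as HFN and HFP on the next slide.

```python
# Hypothetical encoding of the three BDH criteria (region, kind, type).
REGIONS = {"stack": "S", "heap": "H", "global": "G"}
KINDS = {"scalar": "S", "array": "A", "field": "F"}  # "F" is assumed


def bdh_class(region: str, kind: str, is_pointer: bool) -> str:
    """Return the three-letter BDH class label for one load."""
    return REGIONS[region] + KINDS[kind] + ("P" if is_pointer else "N")


print(bdh_class("heap", "field", True))      # HFP
print(bdh_class("global", "array", False))   # GAN
```

The label is just the concatenation of the three answers, so every load falls into exactly one class.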

6
Related Work
  • Some classes account for most misses: GAN, HSN,
    HFN, HAN, HFP, HAP
  • The OKN method: 3 simple heuristics
  • Use of a pointer dereference
  • Use of a strided reference
  • None of the above
  • This paper is much more precise than both of the
    above methods

7
Delinquent Loads
  • Why not stores too? Write buffers are apparently
    good enough
  • Why not do it in hardware? They do, but:
  • Need additional specialized hardware
  • Complex decisions (fast) <-> complex hardware
  • Memory profiling: not always practical

8
Delinquent Loads Profiling
9
Framework
  • Assembly code -> address patterns for each load
    instruction -> placement of the load instruction
    in a class
  • Classes and weights -> heuristic function
  • If the value of the heuristic is greater than a
    delinquency threshold, the instruction is
    classified as possibly delinquent

10
Address Patterns
  • Address pattern: summary of how the source
    address of the load instruction is computed
  • Uses CFG and DF analysis (reaching definitions):
    one address pattern for each control path
    reaching the load
  • Only uses basic registers (BRs): gp, sp, regparam,
    regret

11
The Decision Criteria
  • Classes are derived from these criteria
  • H1: Register usage in an address pattern (usage
    of BRs)
  • H2: Type of operations used in address
    computation (arithmetic, logic)
  • H3: Maximum level of dereferencing

12
The Decision Criteria
  • H4: Recurrence (iterative walk through memory)
  • H5: Execution frequency based on BB profiling;
    classifies loads as:
  • Rarely executed (used here as negative)
  • Seldom executed (likewise negative)
  • Fairly often executed (not used here)
  • In a program hotspot

13
Decision Criteria and Classes
  • Each criterion results in a set of classes
  • Class: a set of address patterns with a certain
    property
  • Too many classes can result, so only some are
    considered, and some of those are also
    aggregated into one class

14
Decision Criteria and Classes
  • H1-based classes: enumerations of the number of
    occurrences of each of the 4 BRs in an address
    pattern
  • H2-based classes: address patterns with
    multiplications and shift operations
  • H3-based classes: as many as there are levels
    of dereferencing in the address patterns
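
As an illustration of how the H1 to H3 properties could be read off an address pattern, here is a small sketch. The tuple-based pattern representation and all function names are assumptions made for this example; the slides do not show the actual data structure. Dereference levels are counted inside the address computation only, which is also an assumption.

```python
from collections import Counter

BASIC_REGISTERS = ("gp", "sp", "regparam", "regret")

# Hypothetical pattern encoding:
#   ("reg", name) | ("const", value) | ("deref", sub) | ("op", symbol, left, right)


def br_counts(p):
    """H1: count occurrences of each basic register in the pattern."""
    tag = p[0]
    if tag == "reg":
        return Counter([p[1]]) if p[1] in BASIC_REGISTERS else Counter()
    if tag == "const":
        return Counter()
    if tag == "deref":
        return br_counts(p[1])
    return br_counts(p[2]) + br_counts(p[3])


def uses_mul_or_shift(p):
    """H2: does the address computation multiply or shift?"""
    tag = p[0]
    if tag in ("reg", "const"):
        return False
    if tag == "deref":
        return uses_mul_or_shift(p[1])
    return (p[1] in ("*", "<<", ">>")
            or uses_mul_or_shift(p[2]) or uses_mul_or_shift(p[3]))


def max_deref_level(p):
    """H3: deepest nesting of memory dereferences in the pattern."""
    tag = p[0]
    if tag in ("reg", "const"):
        return 0
    if tag == "deref":
        return 1 + max_deref_level(p[1])
    return max(max_deref_level(p[2]), max_deref_level(p[3]))


# Address of a[i] with a and i both kept on the stack:
#   mem[sp + 16] + (mem[sp + 20] << 2)
pattern = ("op", "+",
           ("deref", ("op", "+", ("reg", "sp"), ("const", 16))),
           ("op", "<<",
            ("deref", ("op", "+", ("reg", "sp"), ("const", 20))),
            ("const", 2)))

print(br_counts(pattern)["sp"], uses_mul_or_shift(pattern), max_deref_level(pattern))
```

For this pattern, sp occurs twice, a shift is present, and the deepest dereference nesting is one level.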

15
Decision Criteria and Classes
  • H4-based classes: two classes (address pattern
    involves recurrence or not)
  • H5-based classes: three classes (rarely, seldom
    and program hotspot)

16
Experimental Setup
  • SimpleScalar toolkit: cache simulator (for cache
    hits and misses), compiler, objdump
  • Procedure: Fortran -> C code (via f2c) -> MIPS
    executable (via C2MIPS compiler) -> disassembled
    code (via objdump)
  • Reconstruction of CFG and DF analysis

17
Experimental Setup
  • 2 stages: learning/training and experimental
    (actual)
  • Stage 1: get full memory profiling data on a
    subset of SPEC benchmarks, use it to compute
    weights for each class
  • Stage 2: use the heuristic thus obtained on a new
    subset of benchmarks

18
The Heuristic Types of Classes
  • Three types of classes
  • Positive (loads in it are likely delinquent)
  • Negative (loads in it are likely not delinquent)
  • Neutral
  • Positive classes have positive weights, negative
    ones have negative weights, neutral classes have
    a weight of zero

19
The Heuristic Terminology
  • The miss probability of class F in benchmark j
  • The amount of misses accounted for by members of
    class F in benchmark j

20
The Heuristic Terminology
  • mj(F,C): likelihood that an instruction of class F
    in benchmark j causes a cache miss
  • However, if that instruction is only executed
    once, it won't be a delinquent load
  • nj(F,C): proportion of the total number of
    misses that members of F account for
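
The two indices can be sketched from per-load profile counts. The exact formulas are not preserved in this transcript, so the definitions below (misses per execution for mj, share of all misses for nj) are assumptions that follow the wording of the bullets; the function name and record fields are hypothetical.

```python
def class_indices(loads, in_class):
    """Return (m, n) for one class in one benchmark.

    loads: list of dicts with 'executions' and 'misses' counts per load.
    in_class: predicate telling whether a load belongs to the class.
    Assumed definitions:
      m = misses of class members / executions of class members
      n = misses of class members / misses of all loads
    """
    members = [ld for ld in loads if in_class(ld)]
    class_misses = sum(ld["misses"] for ld in members)
    class_execs = sum(ld["executions"] for ld in members)
    total_misses = sum(ld["misses"] for ld in loads)
    m = class_misses / class_execs if class_execs else 0.0
    n = class_misses / total_misses if total_misses else 0.0
    return m, n


loads = [
    {"executions": 1000, "misses": 500, "cls": "F"},
    {"executions": 1000, "misses": 0, "cls": "F"},
    {"executions": 2000, "misses": 500, "cls": "other"},
]
print(class_indices(loads, lambda ld: ld["cls"] == "F"))  # (0.25, 0.5)
```

Keeping both numbers matters for the reason given above: a class can miss on almost every execution and still contribute little to the total if its loads rarely run.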

21
The Heuristic Terminology
  • Strength index: r = mj / nj
  • A benchmark j is irrelevant to a class F if both
    indices mj and nj are below certain thresholds.
    Otherwise it is relevant.
  • Positive class: r > 5 for all benchmarks
  • Negative class: nj < 0.5 for all benchmarks
  • Neutral class: r < 5 for at least 1 benchmark
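
A hedged sketch of the three-way decision described above. The thresholds 5 and 0.5 are copied from the slide as-is (their units are not stated here), and checking the positive condition before the negative one is an assumption, since the slide does not say which rule wins when both hold. The function name is hypothetical.

```python
def class_type(indices, r_threshold=5.0, n_threshold=0.5):
    """Classify a class as 'positive', 'negative' or 'neutral'.

    indices: list of (m_j, n_j) pairs, one per relevant benchmark.
    Assumption: the positive test takes precedence over the negative test.
    """
    if not indices:
        return "neutral"
    # Strength index r = m_j / n_j; undefined when n_j is zero.
    ratios = [m / n for m, n in indices if n > 0]
    if len(ratios) == len(indices) and all(r > r_threshold for r in ratios):
        return "positive"
    if all(n < n_threshold for _, n in indices):
        return "negative"
    return "neutral"


print(class_type([(0.8, 0.1), (0.9, 0.1)]))    # positive
print(class_type([(0.01, 0.2), (0.02, 0.1)]))  # negative
print(class_type([(0.8, 0.1), (0.6, 0.6)]))    # neutral
```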

22
Computing the Weights
  • Form classes according to the five decision
    criteria
  • Compute mj, nj for each class
  • Weight of class Fk

23
Computing the Weights
  • This is the formula for positive classes only
  • Only relevant benchmarks are included in the
    formula
  • |.| is the cardinality of that set, i.e. the
    number of benchmarks relevant to that class

24
Aggregate Classes
  • AG1: both gp and sp are used, 1 time each (comes
    from H1)
  • AG2: only sp is used, 2 times (H1)
  • AG3: either multiplications or shifts are used
    (H2)
  • AG4: one level of dereferencing (H3)
  • AG5: two levels of dereferencing (H3)
  • AG6: three levels of dereferencing (H3)

25
Aggregate Classes
  • AG7: address patterns containing a recurrence
    (H4)
  • AG8: loads with low frequency of execution
    (100 < f < 1000) (H5)
  • AG9: loads with fairly low frequency of execution
    (f < 100 times) (H5)
  • Weight formula for negative classes: negated mean
    of positive weights
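
The rule for negative classes is simple enough to show directly. This is a minimal sketch: the function name is hypothetical and the weight values are made up for illustration, not taken from the presentation.

```python
from statistics import mean


def negative_class_weight(positive_weights):
    """Weight shared by negative classes: minus the mean positive weight."""
    return -mean(positive_weights)


# Illustrative positive weights only; the real values come from training.
print(negative_class_weight([2.0, 4.0, 6.0]))  # -4.0
```

One consequence of this rule is that a single negative class cancels out, on average, one positive class in the sum.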

26
The Heuristic Function
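  • 1 if the value of the heuristic for the load is
    greater than the delinquency threshold
  • 0 otherwise
  • A value of 1 means the load is classified as
    possibly delinquent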
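
A minimal sketch of the decision, assuming the heuristic value of a load is the sum of the weights of the classes it belongs to (the slide's formula is not preserved in this transcript). The weights, the threshold and the function names are illustrative assumptions.

```python
def heuristic_value(load_classes, weights):
    """Assumed scoring: sum the weights of all classes the load is in."""
    return sum(weights.get(c, 0.0) for c in load_classes)


def is_delinquent(load_classes, weights, threshold):
    """Return 1 if the score exceeds the delinquency threshold, else 0."""
    return 1 if heuristic_value(load_classes, weights) > threshold else 0


# Made-up weights: two positive classes and one negative class.
weights = {"AG4": 2.0, "AG7": 3.0, "AG9": -2.5}
print(is_delinquent({"AG4", "AG7"}, weights, threshold=4.0))  # 1
print(is_delinquent({"AG4", "AG9"}, weights, threshold=4.0))  # 0
```

Classes missing from the weight table contribute zero, which matches the treatment of neutral classes.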

27
Precision and Coverage
  • Precision of a heuristic scheme H, ?(H): the
    number of loads that scheme H identifies as
    delinquent, as a percentage of all loads (the
    lower, i.e., closer to the real one, the better)
  • Coverage of a heuristic scheme H, ?(H): the
    percentage of cache misses caused by loads
    identified as delinquent by scheme H (the closer
    to 100%, the better)
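
Both metrics can be sketched from a per-load miss profile. Expressing precision as a percentage of all loads is an assumption, chosen to match figures such as "10% of all loads" quoted elsewhere in the deck; the function and variable names are hypothetical.

```python
def precision_and_coverage(miss_counts, flagged):
    """Return (precision %, coverage %) of a set of flagged loads.

    miss_counts: {load_id: number of cache misses caused by that load}
    flagged: set of load ids the scheme identifies as delinquent
    """
    total_misses = sum(miss_counts.values())
    precision = 100.0 * len(flagged) / len(miss_counts)
    coverage = 100.0 * sum(miss_counts[i] for i in flagged) / total_misses
    return precision, coverage


# Ten loads; one of them causes 910 of the 1000 misses.
miss_counts = {f"L{i}": 10 for i in range(10)}
miss_counts["L0"] = 910
print(precision_and_coverage(miss_counts, {"L0"}))  # (10.0, 91.0)
```

Flagging every load would give 100% coverage but also 100% precision, which is why the two numbers are reported together.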

28
Results on different inputs
29
Results when varying cache associativity
30
Results when varying cache size
31
Performance on new benchmarks
32
Performance summary
33
Performance of OKN and BDH
34
Performance with various ?
35
Combination with BB profiling
  • Use the heuristic to sharpen the set returned by
    BB profiling
  • Also add loads that are not in the hotspots
  • ? is the percentage of the highest scoring loads
    detected by our method but not by profiling that
    we consider to be delinquent
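
One possible reading of the combination step, sketched under stated assumptions: "sharpen" is taken to mean keeping only the hotspot loads that the heuristic also flags, and the percentage parameter (named `extra_percent` here, a hypothetical name) selects the top-scoring flagged loads outside the hotspots.

```python
def combine_with_profiling(scores, hotspot_loads, threshold, extra_percent):
    """Combine heuristic scores with basic block profiling.

    scores: {load_id: heuristic value}
    hotspot_loads: set of load ids located in profiled hotspots
    """
    # Sharpen: keep hotspot loads whose score exceeds the threshold.
    sharpened = {ld for ld in hotspot_loads if scores[ld] > threshold}
    # Loads flagged by the heuristic but not found by profiling,
    # ordered from highest to lowest score.
    outside = sorted(
        (ld for ld in scores
         if ld not in hotspot_loads and scores[ld] > threshold),
        key=lambda ld: scores[ld],
        reverse=True,
    )
    keep = int(len(outside) * extra_percent / 100)
    return sharpened | set(outside[:keep])


scores = {"a": 9, "b": 1, "c": 8, "d": 7, "e": 6, "f": 0}
result = combine_with_profiling(scores, {"a", "b"}, threshold=5, extra_percent=50)
print(sorted(result))  # ['a', 'c']
```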

36
Combination with BB profiling
37
Conclusions
  • The static scheme for identifying delinquent
    loads has a precision of 10% and coverage of over
    90% over 18 benchmarks
  • More precise than related work, with similar
    coverage
  • Immune to variation of framework parameters (e.g.
    cache size, associativity, input)