Phase Detection - PowerPoint PPT Presentation

1
Phase Detection
  • Jonathan Winter
  • Casey Smith
  • CS 612
  • 04/05/05

2
Motivation
  • Large-scale phases exist (order of millions of
    instructions)
  • For many programs, if we look at any interesting
    metric (cache misses, IPC, etc.), we see
    repeating behavior
  • Call the regions with similar behavior phases
  • Knowledge of phase-based behavior can be used for
    adaptive optimization
  • Current hardware doesn't exploit phase behaviors
  • For instance:
  • A region of execution may only need a small
    cache: save power/increase performance by
    shrinking it
  • A region of execution may benefit from data
    structure reorganization

3
Basic Methodology
  • Identify phase boundaries
  • Classify phases
  • Determine what optimizations to perform for each
    phase
  • When can each step be performed?
  • Run time, compile time, offline

4
Overview
  • We'll focus on two papers on phase detection:
  • Sherwood, Sair, and Calder, "Phase Tracking and
    Prediction," ISCA 2003
  • Shen, Zhong, and Ding, "Locality Phase
    Prediction," ASPLOS 2004

5
Sherwood et al. 2003
  • Classifies the behavior of a program into phases
    based on code execution
  • Finds strong correlations between code execution
    phases and important performance and energy
    metrics
  • Simulates hardware for real-time detection and
    prediction of phases
  • Demonstrates usefulness through a variety of
    optimization techniques made possible by phase
    detection

6
Definition of a Phase
  • Previously (stemming from Denning 1972), a phase
    was defined as an interval of execution where a
    measured program metric stayed relatively
    constant.
  • Sherwood et al. consider all sections of code
    with similar values for the program metric to be
    part of the same phase even if the intervals are
    spread out over the course of the program's
    execution.

7
Key Program Metrics
  • Instructions per cycle (IPC), energy, branch
    prediction accuracy, data cache misses,
    instruction cache misses, L2 cache misses are all
    vital statistics for optimizing speed and power
    consumption

8
Single Unified Metric
  • Goal: find a single metric that
  • Uniquely distinguishes phases
  • Guides optimization and policy decisions
  • Need some section of code on which to measure
    this metric: pick 10M-instruction intervals
  • Much longer time span than typical architectural
    techniques handle
  • Long enough to capture large-scale behavior
  • Short enough to capture detailed phase behavior
  • Size of an OS timeslice

9
Metric for Classification
  • Based on Basic Blocks
  • A basic block is a section of code with one
    entry point and one exit point
  • Basic Block Vector
  • Count the number of times each basic block is
    executed in the 10M interval
  • Entries in the vector are the product of the
    number of times each basic block is executed and
    the block length (BB1·L1, BB2·L2, BB3·L3, …)
  • This vector is a signature of the phase which
    correlates well with other metrics of interest:
    IPC, cache misses, etc.
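In code, building the vector amounts to accumulating executions × block length per block. A minimal sketch in Python, assuming a hypothetical trace of (block id, block length) pairs, one per basic-block execution in the interval (both names are illustrative):

```python
# Sketch of building a Basic Block Vector (BBV) for one interval.
# Input: a trace of (block_id, block_length) pairs, one entry per
# basic-block execution, covering one ~10M-instruction interval.

def basic_block_vector(trace):
    """Return {block_id: executions * block_length} for one interval."""
    bbv = {}
    for block_id, block_length in trace:
        bbv[block_id] = bbv.get(block_id, 0) + block_length
    return bbv

# Example: block 0 (4 instructions) runs 3 times, block 1 (2 instrs) once.
trace = [(0, 4), (0, 4), (1, 2), (0, 4)]
print(basic_block_vector(trace))  # {0: 12, 1: 2}
```

The entries weight each block by how many instructions it contributes, which is exactly the biasing toward frequently executed code described above.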

10
Advantages of BBVs
  • Independent of architectural measures and thus
    unaffected by optimizations
  • Weighting biases the signatures to more
    frequently executed instructions
  • Creates unique signatures for intervals which
    execute the same code but in different
    proportions

11
Hardware Implementation
  • Don't want to store and examine the whole
    vector: compress it to a 32-entry vector
    (footprint)
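The compression can be sketched by hashing each block id into one of 32 buckets and accumulating the weights; the paper uses a random-projection style hash in hardware, for which Python's built-in `hash` stands in here:

```python
# Sketch of compressing a BBV into a 32-entry footprint. Each basic
# block's weight lands in one of 32 buckets chosen by a hash of its id
# (the hardware uses a random-projection hash; hash() stands in here).

FOOTPRINT_SIZE = 32

def footprint(bbv):
    buckets = [0] * FOOTPRINT_SIZE
    for block_id, weight in bbv.items():
        buckets[hash(block_id) % FOOTPRINT_SIZE] += weight
    return buckets
```

Collisions merge a few blocks into one bucket, but the total instruction weight is preserved, which is what the later Manhattan-distance comparison relies on.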

12
Visualization of the Footprints
Footprints for different intervals of gzip
13
What do we do with our footprint?
  • Store a small sample of representative footprints
    as phase signatures
  • Compare the current footprint to previously
    stored footprints
  • If we have a close enough match, we classify them
    as the same phase
  • If not, we store the new footprint as the
    representative member of a new phase

14
Comparing Footprints
  • To save space, only store the top 6 bits of each
    entry in the 32-vector
  • Counters were saturating 24-bit counters
  • The smallest value that the maximum entry could
    have would occur if all 10M instructions were
    distributed evenly across the 32 entries
  • In this case, keeping the top six bits means
    that a counter value of 10M/32 maps to a value
    of 1
  • Distance between footprints is defined as the
    Manhattan distance: the sum of the absolute
    differences between corresponding entries in the
    two vectors
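Both steps are small enough to sketch directly; this assumes the 24-bit saturating counters and 6-bit truncation described above:

```python
# Sketch: keep the top 6 bits of each saturating 24-bit counter, then
# compare quantized footprints by Manhattan distance.

def quantize(fp, counter_bits=24, kept_bits=6):
    shift = counter_bits - kept_bits        # discard the low 18 bits
    limit = 2**counter_bits - 1             # counters saturate at 24 bits
    return [min(v, limit) >> shift for v in fp]

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

# 10M instructions spread evenly over 32 entries -> quantized value 1.
print(quantize([10_000_000 // 32]))  # [1]
```

With the threshold from the next slide, two footprints with `manhattan(quantize(a), quantize(b))` below 220 would be classified as the same phase.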

15
Finding a Match
  • If the Manhattan distance is less than a
    threshold, two footprints are classified as being
    in the same phase
  • The threshold is determined by comparing false
    positives/false negatives against an offline
    oracle tool
  • A threshold of 220 was chosen

16
Opportunity
  • These classification methods are oversimplified
  • Opportunity to apply better machine learning
    techniques

17
Within Phase Homogeneity
  • Within a phase, architectural metrics have nearly
    constant values (this is what we were aiming for)

18
Phase Prediction
  • Once we've been through an interval, we can
    identify the phase easily
  • But we want to know what phase we're going to go
    to next
  • We need to know what phase we will be in before
    the interval starts in order to perform useful
    optimizations (such as changing the cache size)

19
Simple Prediction
  • We could just predict that the next phase would
    be the same as the current phase
  • The program tends to change phases more slowly
    than our 10M intervals, so this actually gives
    reasonable accuracy
  • However, we can do better
  • Note: standard hardware prediction techniques
    have not been tried (branch prediction, memory
    disambiguation, etc.)

20
Markov Model Predictor
  • Phase changes depend on the set of previous
    phases and the duration of their execution
  • Phases tend to last many intervals, therefore
    studying recent previous history doesn't provide
    more information than the current state
  • Need to encode how long we've been in the current
    state
  • Predict the length of phase to be the same length
    it was previously
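A minimal sketch of such a run-length-encoding Markov predictor, assuming the state is simply (current phase, intervals spent in it) and the table remembers which phase followed that state last time (class and method names are illustrative):

```python
# Sketch of an RLE Markov phase predictor: the state is the pair
# (current phase, run length so far); the table records which phase
# followed that state the last time it was seen.

class RLEMarkovPredictor:
    def __init__(self):
        self.table = {}       # (phase, run_length) -> phase seen next
        self.current = None
        self.run_length = 0

    def predict(self):
        # Default to "stay in the current phase" when the state is unseen.
        return self.table.get((self.current, self.run_length), self.current)

    def update(self, observed_phase):
        if observed_phase == self.current:
            self.run_length += 1
        else:
            # Record that this (phase, run length) state ended here.
            self.table[(self.current, self.run_length)] = observed_phase
            self.current = observed_phase
            self.run_length = 1
```

After seeing phase A last for three intervals and then switch to B, the predictor will again forecast B the next time it has been in A for three intervals.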

21
Run Length Encoding
22
Opportunity
  • RLE Markov model is overly simple
  • Better prediction techniques exist
  • Make use of the order of previous states rather
    than just the length of the current state

23
Prediction Accuracy
24
Applications
  • Frequent Value Locality
  • Certain data values form bulk of loads
  • Compress to save energy
  • Specialize code segments to common values
  • Dynamic cache size adaptation
  • Shrink cache size to save energy
  • Dynamic processor width adaptation
  • Fetch/Decode/Issue fewer instructions per cycle
    when IPC will be low anyway

25
Frequent Value Locality
26
Cache Size Adaptation
27
Processor Width Adaptation
28
Summary of BBV method
  • Divide program into 10M instruction intervals
  • Characterize each interval by footprint
    approximation to basic block vector
  • Classify intervals as phases based on footprint
  • Predict future phases based on RLE Markov
    predictor
  • Use information about phases to improve frequent
    value locality and optimize cache size and
    processor width for performance/energy

29
Bottom Line
  • Classifying phases based on the frequency of
    executed basic blocks is effective at
    partitioning the program into regions of
    homogeneous architectural behavior
  • Significant energy savings with small performance
    degradation can be achieved by applying phase
    specific optimizations.

30
Shen et al. 2004
  • Defines phases in a totally different way
  • Phases have variable lengths (not 10M intervals)
  • Detects phases by finding likely phase boundaries
  • Uses offline analysis of programs on test inputs
    to predict behavior on other inputs

31
Metric of Interest
  • For optimizing cache size, what we really care
    about is the locality of reference
  • Measure the locality directly, and classify
    phases based on that
  • Independent of optimizations performed: the
    phases recovered are independent of the hardware
    the program runs on

32
Reuse Distance
  • Define the reuse distance as the number of
    distinct data elements (locations in memory)
    touched between two consecutive references to the
    same element.
  • Define the reuse distance at the second reference
  • Example: for the reference string a b c b b a c,
    the distances are – – – 1 0 2 2
  • Also called LRU stack distance
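The definition translates directly into code. A minimal (quadratic-time) sketch; production tools use an LRU stack or tree to make this fast, but the semantics are the same:

```python
# Sketch: reuse distance = number of distinct elements touched between
# two consecutive references to the same element (None on first use).

def reuse_distances(trace):
    last_seen = {}   # element -> index of its previous reference
    out = []
    for i, x in enumerate(trace):
        if x in last_seen:
            # Count distinct elements strictly between the two references.
            out.append(len(set(trace[last_seen[x] + 1 : i])))
        else:
            out.append(None)
        last_seen[x] = i
    return out

print(reuse_distances("abcbbac"))  # [None, None, None, 1, 0, 2, 2]
```

This reproduces the slide's example: the second reference to b skips over one distinct element (c), the third skips none, and the later a and c each skip two.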

33
Overview
  • Simulate a test run and record reuse distance
    throughout the program
  • Use this to separate the program into phases
  • Insert phase markers into binary code
  • Predict when phase changes will occur
  • Use information about phases to adjust cache size
    or other hardware parameters

34
New Definition of Phase
  • Here, a phase is a unit of repeating behavior,
    rather than a unit of nearly uniform behavior
  • A phase change is an abrupt change in the data
    reuse pattern

35
Reuse Trace
36
Why Offline Analysis?
  • Compilers cannot fully analyze data locality in
    programs with indirect referencing or dynamic
    structures
  • Hardware methods like the one presented earlier
    require many severe approximations for real-time
    analysis
  • Solution: take the method offline and analyze
    program behavior on test inputs

37
Phase Detection Process
  1. Record reuse trace
  2. Perform signal processing techniques to extract
    useful information from the trace
  3. Use the extracted information to find good places
    for phase transitions

38
1) Record Reuse Trace
  • Nontrivial programs access data locations so many
    times that an actual full trace would be
    overwhelming
  • Just sample a representative set of memory
    locations/reuse distances
  • Threshold to reduce trace size and remove
    irrelevant data
  • Throw out short reuse distances (below a chosen
    threshold)
  • Throw out references to nearby memory locations

39
2) Signal Processing
  • Use wavelet filtering to find abrupt changes in
    reuse distance for each recorded memory location
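The idea can be illustrated with a one-level Haar transform, the simplest wavelet: its detail coefficients are half-differences of adjacent samples, so they are near zero where the signal is flat and large where it jumps. This is only a sketch of the principle; the paper applies wavelet filtering per memory location and at more than one scale:

```python
# Sketch: one-level (unnormalized) Haar wavelet detail coefficients.
# Large-magnitude coefficients mark abrupt changes in a reuse-distance
# signal; flat stretches produce coefficients near zero.

def haar_details(signal):
    """Half-differences of adjacent sample pairs."""
    return [(signal[i] - signal[i + 1]) / 2
            for i in range(0, len(signal) - 1, 2)]

# Flat, then a jump: only the pair straddling the jump stands out.
sig = [4, 4, 4, 20, 20, 20]
print(haar_details(sig))  # [0.0, -8.0, 0.0]
```

Thresholding the absolute coefficient values then yields candidate positions of abrupt reuse-distance changes.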

40
3) Phase Partitioning
  • Now we have points representing locations of
    abrupt changes in reuse distance for individual
    memory locations
  • Want to divide the list with two things in mind
  • Maximize phase length
  • Minimize repetitions of memory locations within a
    phase (no multiple abrupt changes)
  • Example: abcdeefabdfccabef
  • Partitioned: abcde  efabdfc  cabef
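A simplified greedy stand-in for this partitioning: cut whenever a memory location repeats within the current phase. The paper's algorithm also balances phase length, so it may tolerate a repetition to get longer phases, which is why its partition of the example (abcde, efabdfc, cabef) differs from the strict greedy one below:

```python
# Simplified greedy partitioning sketch: start a new phase whenever a
# memory location repeats within the current one. The paper's method
# additionally maximizes phase length, so it can keep an occasional
# repetition and produce fewer, longer phases.

def greedy_partition(trace):
    phases, current = [], ""
    for loc in trace:
        if loc in current:        # repetition -> cut here
            phases.append(current)
            current = loc
        else:
            current += loc
    if current:
        phases.append(current)
    return phases

print(greedy_partition("abcdeefabdfccabef"))
# ['abcde', 'efabd', 'fc', 'cabef']
```

Merging short neighboring segments (like 'fc' above) back into adjacent ones is one way the length objective could be approximated.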

41
Missing Link
  • So now we have locations of phase transitions.
  • How do we detect which regions are the same
    phase? The paper doesn't say.
  • Missing section in the paper?
  • Assume we can somehow classify the regions into
    phases

42
Phase Markers
  • We know how often a phase occurs and
    approximately where its boundaries are
  • Goal: find markers that tell us when we're
    entering a particular phase
  • For each phase, look for basic blocks that occur
    once near each of its beginning boundaries, and
    only near the beginnings of its boundaries.
  • Use that basic block as a marker to tell when the
    program enters that phase
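The marker condition can be sketched as a simple check over instruction positions: a block qualifies if it executes exactly once near each beginning boundary of the phase and nowhere else. The function name, position representation, and tolerance are all illustrative assumptions, not the paper's exact procedure:

```python
# Sketch of the marker test. Positions are instruction counts; a basic
# block is a marker for a phase if it occurs exactly once near each of
# the phase's beginning boundaries and has no other occurrences.
# The tolerance value is an illustrative assumption.

def is_marker(occurrences, phase_starts, tol=100):
    matched = set()
    for start in phase_starts:
        hits = [p for p in occurrences if abs(p - start) <= tol]
        if len(hits) != 1:        # must fire exactly once per boundary
            return False
        matched.add(hits[0])
    # No occurrences away from the phase beginnings.
    return len(matched) == len(occurrences)
```

Scanning all candidate blocks with this test would select the ones usable as phase-entry markers.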

43
Using Phases
  • Now we know what basic blocks signal phase entry
    points
  • Run the program with new input
  • When we enter a phase for the first time, we
    record how long it lasts and its locality
    properties
  • Assume that these properties will hold for all
    subsequent executions of the same phase

44
Phase Prediction Performance
45
Negative Examples
  • Not all programs have phases of repeating
    behavior that can be identified from test runs

46
Applications
  • Adaptive Cache Resizing
  • Potential performance increase
  • Potential power savings
  • Memory Remapping
  • Reorder data in memory to speed up execution

47
Adaptive Cache Resizing
  • Shrink cache without increasing miss ratio
  • Phases have repeating behavior, not uniform
    behavior
  • Divide phases into 10K intervals
  • The first couple of times we execute a phase,
    experiment with cache sizes on its intervals
  • Apply those cache sizes to subsequent executions
    of the phase
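The explore-then-reuse policy above can be sketched as follows; `miss_ratio` is a hypothetical measurement hook, and the 5% tolerance mirrors the miss-increase budget on the following slides:

```python
# Sketch of per-phase cache-size selection: explore candidate sizes the
# first time a phase is seen, then reuse the chosen size on every
# subsequent execution. miss_ratio(size) is a hypothetical hook that
# measures the miss ratio of one 10K-instruction interval at that size.

best_size = {}   # phase id -> chosen cache size (e.g. in KB)

def cache_size_for(phase, candidate_sizes, miss_ratio):
    if phase not in best_size:
        # Exploration: smallest size whose miss ratio stays within
        # 5% of the largest cache's.
        baseline = miss_ratio(max(candidate_sizes))
        for size in sorted(candidate_sizes):
            if miss_ratio(size) <= baseline * 1.05:
                best_size[phase] = size
                break
    return best_size[phase]
```

Later executions of the same phase pay no exploration cost, which is what makes the repeating-behavior definition of a phase pay off here.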

48
Cache Size Reductions
49
Cache Size Reductions with 5% Miss Increase
50
Memory Remapping
  • Reorder data in memory to speed up execution
  • For example, we might interleave arrays that tend
    to be accessed together.
  • Options
  • Analyze whole program to find array affinities
  • Analyze by phase and reorganize data during
    execution (should take into account the cost of
    remapping, but the authors don't)

51
Memory Remapping
52
Summary of the locality-based method
  • Record a sampled version of the reuse distance
    trace on test input
  • Process the trace
  • Find phase boundaries
  • Find basic block markers for each phase
  • Run the program on new data.
  • When we see a new phase marker, record how long
    it lasts and experiment with optimization
    parameters for 10K intervals
  • Assume subsequent executions of the phase will
    have the same length and locality profile, so we
    can use the determined optimization parameters

53
Bottom Line
  • Many programs have long repeating patterns of
    data reuse separated by abrupt changes
  • These repeating patterns can be detected by
    analyzing the reuse trace
  • Characterizing these patterns can lead to
    significant energy savings and performance
    enhancement through cache resizing and memory
    remapping

54
Overall Conclusions
  • Many programs exhibit large-scale phase behavior
    which can be classified and predicted
  • Characterization of the phases can lead to energy
    savings and performance enhancement through cache
    resizing and other techniques
  • But there is no thorough analysis of just how
    much power is saved
  • Some of this can be done at compile time
    (identifying many phase markers), but interval
    type analysis and phase characterization must be
    done at runtime

55
Opportunities
  • More intelligent classification
  • More sophisticated prediction
  • Account for the cost of changing the cache size
    in the energy/performance analysis
  • Compare results of phase-based adjustments to
    actual optimal adjustments
  • Examine potential for using compilers for
    different parts of the analysis