Title: Shared Resource Access Attributes for High-Level Contention Models
1. Shared Resource Access Attributes for High-Level Contention Models
Alex Bobrek (1), JoAnn M. Paul (2), Donald E. Thomas (1)
(1) ECE Department, Carnegie Mellon University, Pittsburgh, PA 15213 USA
(2) ECE Department, Virginia Tech, Arlington, VA 22203 USA
2. Motivation
- Current trends in embedded system design
  - multiple heterogeneous processors
  - e.g. IBM Cell, Philips Nexperia
- Design challenges
  - large design space
  - design element interactions
  - concurrent software
  - shared resources
- Must capture performance impacts of contention for shared resources
3. Access Attribute Contention Modeling
- Summarize the impact of accesses on the resulting contention
- Introduce 3 access attributes
  - Average Requested Utilization (?)
  - Access Balance (B)
  - Thread Concurrency (T)
- Access attribute uses
  - Extract contention behavior through sampling, infer future contention
  - Direct manipulation (e.g. RTL parameters like propagation time)
4. Design Methodology
- MESH: Modeling Environment for Software and Hardware
  - Performance modeling of resource sharing in heterogeneous concurrent systems at a high level
- Identify promising design neighborhoods at the MESH level, then develop the design further at the cycle-accurate level
5. Problem: S.R. Modeling Overhead
- Shared resource (S.R.) accesses force simulators to interleave design element modeling at the shared resource access rate
- Abstraction level cannot be raised
6. Solution: Access Attribute Modeling
- Consider the impact of multiple S.R. accesses by summarizing them through access attributes
- Statistically sample cycle-accurate simulation
  - Extract application-specific contention behavior through access attributes
- Train a regression model based on the sampled data
- Predict future contention at a high level using the trained model
7. Back-Annotated Execution-Based Simulation
- Application computation, control flow: execute fast natively on the host
- Delay of individual basic blocks: emulate target system performance through annotations
- Annotations inserted manually or by a tool profiling the performance of target processors
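
The back-annotation idea above can be sketched as follows. This is a minimal illustration, not the MESH implementation: the `AnnotatedBlock` type, the `run_annotated` helper, and the cycle counts are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AnnotatedBlock:
    """A basic block: host-native work plus an annotated target delay."""
    work: Callable[[], None]  # executes natively on the host
    target_cycles: int        # hypothetical annotation: delay on the target


def run_annotated(blocks: List[AnnotatedBlock]) -> int:
    """Run each block natively and accumulate the annotated target cycles."""
    simulated_cycles = 0
    for block in blocks:
        block.work()                             # fast native execution
        simulated_cycles += block.target_cycles  # emulated target timing
    return simulated_cycles


# Made-up program: two blocks with made-up annotations.
results = []
program = [
    AnnotatedBlock(lambda: results.append(sum(range(10))), target_cycles=120),
    AnnotatedBlock(lambda: results.append(max(3, 7)), target_cycles=45),
]
total_cycles = run_annotated(program)
```

The functional result comes from the host, while the timing comes only from the annotations, which is why the annotations must reflect the target processor.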
8. Statistical S.R. Contention Modeling
- Skip multiple S.R. accesses at a time, add penalties after execution of a block is completed
- Access attribute-based statistical model estimates contention for larger blocks of execution
- Enables high-level modeling of systems with frequent S.R. accesses
- Contention modeling error <1%, 40X speedup over cycle-accurate (CA) simulation
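
A minimal sketch of adding the penalty once per block rather than per access. It assumes the statistical model returns a penalty rate (delay per unit of execution time); the function name and the numbers are hypothetical.

```python
def block_time_with_contention(block_cycles: float, penalty_rate: float) -> float:
    """Return the block's time after adding one aggregate contention penalty.

    block_cycles: contention-free duration of the block (from annotations).
    penalty_rate: assumed model output, delay per unit of execution time.
    """
    if block_cycles < 0 or penalty_rate < 0:
        raise ValueError("block_cycles and penalty_rate must be non-negative")
    # One penalty for the whole block instead of one per S.R. access.
    return block_cycles + penalty_rate * block_cycles


# Made-up values: a 1000-cycle block with a predicted rate of 0.25.
adjusted = block_time_with_contention(1000, 0.25)
```

Because individual accesses are never simulated, the simulator only synchronizes at block boundaries.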
9. Related Work Highlights
- Transaction Level Models
  - Presented work fits between functional and timing models
- Queuing Theory Models
  - Traditionally used for large network analysis
  - Exponential interarrival distribution does not hold
- Statistical Computer Simulation
  - Uses statistical sampling to help abstract simulation detail
  - Does not parameterize contention behavior
  - Designed to capture average system performance (i.e. IPC)
10. Assumptions
- Focus on shared memory accesses
- In-order execution on all cores
- Processors stall on contention
- Do not model caches
- Workloads
  - Single thread applications
  - Multiple applications execute concurrently
  - Data communicated only on application boundaries
- Focus on resource contention, not data value contention
11. Collecting Access Information
- Each annotation slice contains an S.R. utilization value
12. Average Requested Utilization, ?
- Sum of average requested utilizations for each thread
- Captures the overall request level for the shared resource
Slice utilization ?i,j
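
The definition above (sum over threads of each thread's average requested utilization) can be sketched as follows. The layout `slice_util[i][j]`, read as the utilization requested by thread i in slice j, is an assumption about the indices, and the sample values are made up.

```python
from typing import Sequence


def average_requested_utilization(slice_util: Sequence[Sequence[float]]) -> float:
    """Sum of the per-thread average slice utilizations.

    slice_util[i][j] is assumed to be the utilization requested by
    thread i during annotation slice j.
    """
    return sum(sum(thread) / len(thread) for thread in slice_util)


# Made-up data: 3 threads, 2 slices each.
example_util = [
    [0.2, 0.4],  # thread 0 averages 0.30
    [0.1, 0.1],  # thread 1 averages 0.10
    [0.0, 0.3],  # thread 2 averages 0.15
]
total_requested = average_requested_utilization(example_util)
```

Note that the sum can exceed 1.0 when threads together request more than the resource can serve.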
13. Access Balance, B
- Quantifies how much the requested utilization of each thread varies with regard to the average utilization
- Values of B near 0 indicate a balanced distribution of accesses among threads
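
The slide does not give the formula for B. As a stand-in only, the sketch below uses the population standard deviation of the per-thread average utilizations, which shares the stated property that values near 0 mean balanced accesses. The function name, the data layout, and the choice of measure are all assumptions.

```python
from statistics import mean, pstdev
from typing import Sequence


def access_balance(slice_util: Sequence[Sequence[float]]) -> float:
    """Spread of per-thread average utilization around their mean.

    ASSUMPTION: population standard deviation is used as a stand-in;
    the original definition of B is not stated in the slides.
    """
    per_thread_avg = [mean(thread) for thread in slice_util]
    return pstdev(per_thread_avg)


# Made-up data: two threads, two slices each.
balanced = access_balance([[0.2, 0.2], [0.2, 0.2]])  # equal requests
skewed = access_balance([[0.4, 0.4], [0.0, 0.0]])    # one thread dominates
```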
14. Number of Active Threads, T
- Indicates the average number of threads making any kind of S.R. access during a period of time
- The value of T can never be higher than the number of currently running threads.
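
One way to compute T from per-slice utilization values, as a sketch: count the threads with nonzero utilization in each slice and average the counts. Treating "any kind of access" as utilization greater than zero, and the `slice_util[i][j]` layout, are assumptions.

```python
from typing import Sequence


def average_active_threads(slice_util: Sequence[Sequence[float]]) -> float:
    """Average, over slices, of the number of threads accessing the S.R.

    slice_util[i][j] is assumed to be thread i's utilization in slice j;
    a thread counts as active in a slice if that value is above zero.
    """
    num_slices = len(slice_util[0])
    active_per_slice = [
        sum(1 for thread in slice_util if thread[j] > 0.0)
        for j in range(num_slices)
    ]
    return sum(active_per_slice) / num_slices


# Made-up data: 3 threads, 3 slices; nobody accesses the S.R. in slice 2.
thread_util = [
    [0.2, 0.4, 0.0],
    [0.1, 0.0, 0.0],
    [0.0, 0.3, 0.0],
]
t_value = average_active_threads(thread_util)
```

By construction the result is bounded by the number of rows, matching the statement that T never exceeds the number of running threads.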
15. Selection of Access Attributes
- Initially considered 10-15 different S.R. access statistics, such as
  - Per-thread utilization
  - Average interarrival time and interarrival variance
  - Alignment of access bursts
- Attribute selection criteria
  - Attributes with high p-values were discarded (null hypothesis testing)
  - Looked at goodness of fit (R² value)
  - Considered the computational complexity of collecting the attribute during simulation
  - Chose final number of attributes to avoid the curse of dimensionality
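
For reference, the goodness-of-fit criterion can be computed with the standard coefficient of determination, R² = 1 - SS_res / SS_tot. The sketch below shows only that formula; the observed and predicted values are made up, and the p-value tests are not shown.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0.0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only.
fit_quality = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```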
16. The Training Process
- Memory access trace collected via CA simulator for ARM
- Simulate concurrent behavior through tracesim
  - Custom trace simulator
  - Samples annotation blocks
- Statistical language R used to create a non-parametric regression model from the sampled values
17. Determining Penalties
- Contention is determined by applying a non-parametric multiple regression to the collected access statistics
- The resulting delay is given in units of delay per unit time (DPT)
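
The slides do not say which non-parametric estimator was fitted in R. As one illustrative possibility, the sketch below uses a Gaussian-kernel Nadaraya-Watson estimator in plain Python to map the three access attributes to a DPT value. The function name, the bandwidth, and every training sample are made up for the example.

```python
import math
from typing import List, Sequence, Tuple

# ((utilization, balance, active threads), observed DPT)
Sample = Tuple[Sequence[float], float]


def predict_dpt(samples: List[Sample], query: Sequence[float],
                bandwidth: float = 0.5) -> float:
    """Kernel-weighted average of the sampled DPT values near `query`.

    ASSUMPTION: Nadaraya-Watson regression stands in for the unspecified
    non-parametric model of the original work.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for attrs, dpt in samples:
        dist_sq = sum((a - q) ** 2 for a, q in zip(attrs, query))
        weight = math.exp(-dist_sq / (2.0 * bandwidth ** 2))
        weighted_sum += weight * dpt
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("query is too far from all training samples")
    return weighted_sum / weight_total


# Made-up training samples, not measured data.
training = [
    ((0.2, 0.1, 2.0), 0.01),
    ((0.6, 0.1, 3.0), 0.08),
    ((0.9, 0.3, 4.0), 0.20),
]
estimate = predict_dpt(training, (0.6, 0.1, 3.0), bandwidth=0.2)
```

In practice the attributes would need to be scaled to comparable ranges before computing distances, since T spans a much wider range than the other two.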
18. Benchmarks
- Chosen from MiBench and SPEC2000
- Each benchmark evaluated by
  - Avg. memory utilization
  - Mem. access coefficient of variation
- Selected a subset of 7 benchmarks, executed concurrently 2, 3, 4, or 5 at a time
  - 112 different test scenarios
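
The figure of 112 scenarios is consistent with taking every unordered group of 2, 3, 4, or 5 benchmarks out of the 7, assuming that is how the scenarios were enumerated. A quick check:

```python
from math import comb

# Number of ways to choose k of the 7 benchmarks, for each group size.
scenario_counts = {k: comb(7, k) for k in (2, 3, 4, 5)}
total_scenarios = sum(scenario_counts.values())  # 21 + 35 + 35 + 21
```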
19. Model Accuracy and Speedup
- Higher concurrency reduces variance within the system
  - Tighter confidence intervals
- Higher concurrency increases the number of HW annotations present
  - Speedup of MESH decreases
20. Conclusions
- Introduced access attributes
  - Average Requested Utilization (?)
  - Access Balance (B)
  - Thread Concurrency (T)
- Contention modeling error <1%, ±3% 95% CI
- 40X speedup over CA simulation (i.e. 50X slowdown compared to native hardware execution)
- In most cases, 10 mil. cycles of CA training necessary
- Future work
  - Reduce reliance on frequent CA-level training during the design exploration process
  - How can access attributes be directly manipulated?
21. Relationship Between Attributes
- Contention increases when
  - Requested utilization is increased
  - B value is decreased (accesses are more balanced among threads)
- Covariates also depend on each other
  - Balance has higher impact for lower utilizations