The Impact of Performance Asymmetry in Multicore Architectures


Title: The Impact of Performance Asymmetry in Multicore Architectures


1
The Impact of Performance Asymmetry in
Multicore Architectures
  • Saisanthosh Balakrishnan
  • Ravi Rajwar
  • Michael Upton
  • Konrad Lai

32nd Annual International Symposium on Computer Architecture
2
Performance asymmetry
... a difference in the compute power of processors
  • Architectural differences
  • Micro-architectural parameters
  • Other
  • - Heat, thermal throttling
  • Why do we need asymmetry now?
  • - CMP / many cores as commodity systems
  • - Run a variety of workloads
  • - Good serial performance and high throughput
  • - Optimal energy consumption

Assume an asymmetric multicore system
3
Asymmetry and MT workloads
4
The problems
  • Programmers: don't reason about asymmetry
  • Characteristics of threads: partitioning, synchronization barriers, interference, lifetime
  • Scheduling of threads: OS kernel, library, application, DB/web servers, managed runtime systems (Java, .NET)
5
Contributions
  • Asymmetry negatively affects applications
  • - Studied many workloads on real hardware
  • - Observed unpredictable workload behavior
  • This can be fixed by
  • - Evaluating how work is partitioned across threads
  • - Scheduling threads with awareness of asymmetry

6
Outline
  • Asymmetry and Performance
  • Evaluation Methodology
  • Asymmetric Configurations
  • Workloads and Results

7
Evaluation methodology
Asymmetry in real hardware
  • Intel 4-way 3-GHz Xeon
  • Different cores run at different frequencies
  • Software controlled
Benefits
  • Long real-time runs (no simulations)
  • Workloads are set up according to specs
  • Representative of other forms of asymmetry
  • - Communication
  • - Micro-architecture, etc.
8
Configurations
Symmetric: all fast, all slow
Asymmetric: 1 slow, 2 slow, 3 slow
F = full frequency
S = one-eighth of full frequency (in talk and paper)
S = one-fourth of full frequency (in paper)
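The five configurations can be enumerated for the 4-way machine as a small sketch. It assumes, purely for illustration, that a core's compute power is proportional to its clock frequency, so the aggregate figure is an upper bound rather than a measured result. The function name `configurations` and its parameters are hypothetical and do not come from the talk.

```python
from fractions import Fraction


def configurations(n_cores=4, slow_speed=Fraction(1, 8)):
    """Map each configuration name to (per-core speeds, aggregate capacity).

    Speeds are relative to one fast core (F = 1). The default slow speed is
    the one-eighth setting; pass Fraction(1, 4) for the one-fourth setting.
    Assumption: compute power scales linearly with frequency.
    """
    configs = {}
    for n_slow in range(n_cores + 1):
        speeds = [Fraction(1)] * (n_cores - n_slow) + [slow_speed] * n_slow
        if n_slow == 0:
            name = "all fast"
        elif n_slow == n_cores:
            name = "all slow"
        else:
            name = f"{n_slow} slow"
        configs[name] = (speeds, sum(speeds))
    return configs


if __name__ == "__main__":
    for name, (speeds, total) in configurations().items():
        print(f"{name:9s} aggregate = {float(total):.3f} x one fast core")
```

Under that linear assumption, the "all slow" machine at one-eighth frequency has half the capacity of a single fast core, which gives a sense of how wide the range between the two symmetric endpoints is.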
9
Studying impact
[Figure: two plots of the perf. metric. Scalability: across configurations (all slow, 3 slow, 2 slow, 1 slow, all fast). Stability: across many runs of the same asymmetric configuration.]
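The slide does not give numeric definitions for the two properties, so the following is only one possible way to quantify them. It treats stability as a low coefficient of variation across repeated runs of the same configuration, and scalability as a perf. metric that never drops as slow cores are replaced by fast ones. Both function names and both definitions are assumptions made for this sketch.

```python
from statistics import mean, pstdev

# Configurations ordered from least to most compute power.
CONFIG_ORDER = ("all slow", "3 slow", "2 slow", "1 slow", "all fast")


def coefficient_of_variation(runs):
    """Spread of the perf. metric across repeated runs, relative to its mean.

    A value near 0 means the runs agree (stable); larger values mean the
    same configuration produced noticeably different results.
    """
    return pstdev(runs) / mean(runs)


def scales_with_compute(perf_by_config, order=CONFIG_ORDER):
    """True if the perf. metric never decreases as compute power is added."""
    values = [perf_by_config[name] for name in order]
    return all(a <= b for a, b in zip(values, values[1:]))
```

A workload could pass one check and fail the other, which is why the talk reports the two properties separately.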
10
Workloads evaluated
  • SPECjbb
  • SPECjAppServer
  • Apache
  • Zeus
  • TPC-H
  • SPECOMP
  • H.264
  • PMake

Middle-tier business apps: throughput parallel
Webservers: throughput parallel
Task-based parallelization
Embarrassingly parallel
11
Impact of asymmetry
[Table: each workload rated on whether it is scalable and stable under asymmetry]
Workloads
  • SPECjbb
  • SPECjAppServer
  • Apache
  • Zeus
  • TPC-H
  • SPECOMP
  • H.264
  • PMake

12
Workloads
Managed runtime system (BEA JRockit, Sun HotSpot)
Windows 2003 and Linux
2 GCs: parallel and gen. concurrent
Only minor GC
Up to 20 threads
Minimal communication
13
SPECjbb
Stability (JRockit/Gencon GC) on 2 slow, 4 runs
  • Problem: interference from runtime system (JVM, GC)

14
Workloads
Webserver on Linux
Thread-based vs. event-based model
ApacheBench: raw perf. with static page
Light and heavy loads
15
Apache
Scalability, Stability (light load)
  • Problem: under light load, threads can land on fast or slow processors
  • No issues under heavy load
  • Fixes: kernel scheduler or shorter lifetime of threads
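The kernel-scheduler fix can be illustrated with a toy placement routine that moves a thread off a slow core whenever a faster core sits idle. This is a minimal sketch of the idea, not the scheduler used in the study. The function name, the thread-to-core dictionary, and the relative speed list are all hypothetical.

```python
def migrate_to_faster_idle_cores(core_speeds, thread_to_core):
    """Return a new thread->core map after migrating threads to faster idle cores.

    core_speeds[i] is the relative speed of core i (e.g. 1.0 fast, 0.125 slow).
    thread_to_core maps a thread id to the core it currently occupies,
    with at most one thread per core.
    """
    placement = dict(thread_to_core)
    busy = set(placement.values())
    # Idle cores, fastest first.
    idle = sorted(
        (c for c in range(len(core_speeds)) if c not in busy),
        key=lambda c: core_speeds[c],
        reverse=True,
    )
    # Visit threads on the slowest cores first: they gain the most.
    for thread in sorted(placement, key=lambda t: core_speeds[placement[t]]):
        if not idle:
            break
        current = placement[thread]
        best = idle[0]
        if core_speeds[best] <= core_speeds[current]:
            break  # no remaining thread can improve either
        placement[thread] = best
        idle.pop(0)
        idle.append(current)  # the vacated core is now idle
        idle.sort(key=lambda c: core_speeds[c], reverse=True)
    return placement
```

Under heavy load every core is busy, so the routine changes nothing, which matches the observation that the problem only appears under light load.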

16
Zeus
Scalability, Stability
  • Unpredictable under both heavy and light loads
  • Superior perf. on symmetric configs.
  • Problem: aggressive application-level scheduling

17
Workloads
OMP: scientific app., loop-based parallelization; Intel Fortran, OpenMP on Linux
H.264: media encoding; OpenMP on Windows 2003
PMake: parallel make of the Linux kernel
18
SPECOMP
Scalability
  • OpenMP schedules tasks assuming processors of equal performance
  • Problem: fast processors are held up by slow ones
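The effect of a barrier on an asymmetric machine can be shown with a toy simulation. It compares an up-front split that gives each processor an equal share of the tasks against on-demand assignment, where each task goes to whichever processor frees up first. This is an illustrative model only, not the OpenMP runtime. The function names and the relative speeds are assumptions for the sketch.

```python
import heapq


def static_makespan(task_costs, core_speeds):
    """Up-front split: tasks are dealt round-robin, then a barrier waits for all."""
    finish = [0.0] * len(core_speeds)
    for i, cost in enumerate(task_costs):
        core = i % len(core_speeds)
        finish[core] += cost / core_speeds[core]
    # The barrier releases only when the slowest processor is done.
    return max(finish)


def on_demand_makespan(task_costs, core_speeds):
    """On-demand: each task goes to the processor that becomes free first."""
    heap = [(0.0, core) for core in range(len(core_speeds))]
    heapq.heapify(heap)
    end = 0.0
    for cost in task_costs:
        free_at, core = heapq.heappop(heap)
        done = free_at + cost / core_speeds[core]
        end = max(end, done)
        heapq.heappush(heap, (done, core))
    return end


if __name__ == "__main__":
    tasks = [1.0] * 64                # 64 equal-cost loop iterations
    speeds = [1.0, 1.0, 1.0, 0.125]   # "1 slow" at one-eighth frequency
    print("static:   ", static_makespan(tasks, speeds))
    print("on-demand:", on_demand_makespan(tasks, speeds))
```

With these inputs the up-front split takes 128 time units because three fast processors idle at the barrier while the slow one works through its 16 tasks. On-demand assignment finishes in 24, since the slow processor only picks up a task when it is actually free.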

19
H.264 and PMake
H.264
  • H.264 slows down significantly with 1 slow proc.
  • Speeds up with 1 fast proc.

20
Impact of asymmetry
  • SPECjbb
  • - Interference from runtime system. Garbage collector dependent. Concurrent GC causes more problems.
  • - Fix: migrate tasks from slow to fast core if one is free. Inspect runtime software and interference between threads (GC).
  • SPECjAppServer
  • - Robust, multi-tier application. Feedback tunes the workload. Very responsive to interference, small heaps, etc.
  • Apache
  • - Thread serves many requests to reduce overheads. Problems with light load: threads can map to a fast or slow proc.
  • - Fix: migrate tasks from slow to fast core if one is free. Or handle few requests and recycle threads (high overhead, low perf.).
  • Zeus
  • - Superior perf. in symmetric system. Unpredictable on asymm. with heavy and light loads. Independent application scheduling.
  • - Fix: reconsider application scheduling.
  • TPC-H
  • - Query parallelization not aware of asymm. Intra-query parallelization worsens stability.
  • - Fix: approximate an application change by reducing the degree of parallelization. Fix application scheduler. Consider asymm. in query optimization engine.
  • SPECOMP, H.264
  • - OpenMP-based parallelization with sync. barriers. Fast cores held by slow.
  • - Fix: assign tasks on demand instead of up front. Make OpenMP understand asymm.
  • PMake
  • - Robust application. Heavy utilization. Threads well-balanced and abundant.
  • - Multi-programming with several tasks.
21
Conclusions
  • Asymmetric systems
  • - Good for energy and performance
  • - But can introduce unpredictability
  • Software needs to understand asymmetry
  • - Evaluate applications' work partitioning
  • - Scheduling of tasks. Mostly no other changes.
  • - Maybe feedback-based
  • Suitable asymmetry
  • - Many slow, few fast processors

22
Questions?