4.1.2. Run time - PowerPoint PPT Presentation

Description: Basic mechanisms for thread handling and communication; reducing overhead and latency. Crosscuts: performance analysis, job scheduling.

Slides: 8
Provided by: PeterB9
Learn more at: https://exascale.org

Transcript and Presenter's Notes



Slide 1: 4.1.2. Run time
  • Technology drivers
    - Scale; variance (uncertainty in the characterization of applications and of resource availability); heterogeneity of resources; hierarchical structure of systems and applications; latency
  • Alternative R&D strategies
    - Flat vs. hierarchical
  • Recommended research agenda
    - Heterogeneity; data transfer; scheduling
    - Hierarchical (multiple levels) vs. flat? Hybrid (interaction between levels)?
    - Asynchrony
    - Run-time dependence analysis; JIT compilation; interaction with the compiler
    - Scheduling: dynamic, predictive
    - Basic mechanisms for thread handling and communication; reduce overhead and latency (interaction with architecture)
    - Optimize usage of the communication infrastructure: routes, mapping, overlap of communication and computation
    - Scheduling for parallel efficiency: computation time, load balance, granularity control, malleability
    - Scheduling for memory efficiency: locality handling
    - Shared address space
    - Memory management
    - Application/area-specific run times
  • Crosscutting considerations

Slide 2: Heterogeneity

Key challenges:
  • Support execution of the same program on different heterogeneous platforms
  • Optimize resource utilization and execution time
  • Handle the different granularities supported by platforms

Summary of research direction:
  • Unified/transparent accelerator run-time models
  • Address heterogeneity of nodes and interconnects in a cluster
  • Scheduling for latency tolerance and bandwidth minimization
  • Adaptive granularity

Potential impact on software component:
  • Hide the specificities of accelerators from the programmer

Potential impact on usability, capability, and breadth of community:
  • Broaden the portability of programs
Slide 3: Load balance

Key challenges:
  • General-purpose self-tuned run times: detect imbalance and reallocate resources (cores, storage, DVFS, bandwidth, ...) within and across levels
  • Application-specific load-balancing run times
  • Minimize the impact of temporary resource shortages (OS noise, external urgent needs, ...)

Summary of research direction:
  • Adapt to variability in time and space (across processes) of applications and systems
  • Optimize resource utilization; reduce execution time

Potential impact on software component:
  • Drastically reduce the effort needed to ensure efficient resource utilization, letting programmers focus on functionality
  • Use only resources that can be exploited profitably
  • Maximize the ratio of achieved performance to power

Potential impact on usability, capability, and breadth of community:
  • Self-tuned runtimes (5 years)
  • Crosscut: performance analysis, job scheduling
Slide 4: Flat model

Key challenges:
  • Keep memory requirements small and constant
  • Thread-based MPI (rank per thread)
  • Introduction of high levels of asynchrony: MPI collectives, APGAS, data-flow, ...
  • Adapt communication subsystems (routing, mapping, RDMA, ...) to application characteristics
  • Improve the performance of basic process-management and synchronization mechanisms

Summary of research direction:
  • Resource requirements (computing power, memory, network) of the runtime implementation
  • Overcome limitations deriving from globally synchronizing calls (barriers, collectives, ...)
  • Optimize usage of communication resources

Potential impact on software component:
  • MPI: leverage current applications (2-5 years)

Potential impact on usability, capability, and breadth of community:
  • Increased scalability
Slide 5: Hierarchical/hybrid

Key challenges:
  • Hierarchical integration of runtimes (MPI+PGAS, MPI+threads+accelerator, MPI+accelerator, PGAS+accelerator, ...)
  • Modularity, reusability, library compatibility
  • Dimensioning of processes/threads; scheduling; mapping to nodes
  • Memory placement and thread affinity

Summary of research direction:
  • Match between model semantics at the different levels
  • Match the platform structure; efficient usage of resources
  • Constrain the size of name spaces

Potential impact on software component:
  • Better match to hardware (e.g., shared memory within a node)
  • Interaction with load balance and job scheduling

Potential impact on usability, capability, and breadth of community:
  • Enable a smooth migration path
  • Improved performance (5 years)
Slide 6: 4.1.2. Run time (roadmap timeline)

[Timeline chart, 2010-2019: research milestones for dynamic memory association, resilience, scheduling for locality, asynchrony/overlap across the hierarchy, load balance, memory efficiency, heterogeneity, power management, and job scheduling.]
Slide 7: 4.1.2. Run time - what and why

What (milestones along the 2010-2019 timeline):
  • Run times assume responsibility for matching algorithm characteristics and demands to the available resources, optimizing their usage
  • Run times performing dynamic memory association (work arrays, renaming, ...) and tolerating functional noise
  • Run times tolerating an injection rate of 10 errors/hour
  • Demonstration that automatic locality-aware scheduling can achieve a factor of 5x in highly NUMA memory hierarchies
  • A general-purpose run time automatically achieving load balance, optimized network usage, power minimization, malleability, and tolerance to performance noise on heterogeneous systems
  • Demonstrate that asynchrony can deliver 3x strong scalability for both flat and hybrid systems
  • Run the same source on 2 different heterogeneous systems; do it for a couple of kernels and for real applications

Why:
  • So that we can finally rest
  • Machines will fail more than we do; the alternative will be to use current machines
  • Dynamicity: decoupling the algorithm from the resources
  • A target, if we want to get there
  • By this time EVERYBODY will be fed up with writing the same application again and again
  • Fighting variance is a lost battle; learn to live with it

[Timeline axis: 2010-2019]