Scaling to New Heights - PowerPoint PPT Presentation


1
Scaling to New Heights
  • Retrospective
  • IEEE/ACM SC2002 Conference
  • Baltimore, MD

2
Introduction
  • More than 80 researchers from universities,
    research centers, and corporations around the
    country attended the first "Scaling to New
    Heights" workshop, May 20 and 21, at PSC.
  • Sponsored by the NSF leading-edge centers (NCSA,
    PSC, SDSC) together with the Center for
    Computational Sciences (ORNL) and NERSC, the
    workshop included a poster session, invited and
    contributed talks, and a panel.
  • Participants examined issues involved in adapting
    and developing research software to effectively
    exploit systems composed of thousands of
    processors.
  • The following slides represent a collection of
    ideas from the workshop.

3
Basic Concepts
  • All application components must scale
  • Control granularity: virtualize
  • Incorporate latency tolerance
  • Reduce dependency on synchronization
  • Maintain per-process load: facilitate balance
  • Only new aspect is the degree to which these
    things matter

4
Issues and Remedies
  • Granularity
  • Latencies
  • Synchronization
  • Load Balancing
  • Heterogeneous Considerations

5
Granularity
  • Define problem in terms of a large number of
    small objects independent of the process count
  • Object design considerations
  • Caching and other local effects
  • Communication-to-computation ratio
  • Control granularity through virtualization
  • Maintain per-process load level
  • Manage comms within virtual blocks, e.g. Converse
  • Facilitate dynamic load balancing
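
The communication-to-computation ratio and the effect of virtualization can be illustrated with a small calculation. This is only a sketch under stated assumptions: a cubic 3D domain cut into n x n x n blocks, nearest-neighbour exchange of one-cell-thick faces, and costs proportional to cell counts. The function names are hypothetical and do not come from the workshop.

```python
def comm_to_comp_ratio(n: int) -> float:
    """Face cells exchanged (6 * n^2) divided by cells computed (n^3) for one block."""
    return (6 * n * n) / (n ** 3)


def blocks_per_process(domain: int, block: int, processes: int) -> float:
    """Average number of virtual blocks owned by each process (assumed cubic domain)."""
    per_side = domain // block
    return per_side ** 3 / processes


if __name__ == "__main__":
    # Smaller blocks give the system more objects to place on each process,
    # at the price of a higher communication-to-computation ratio.
    for block in (64, 32, 16, 8):
        print(block, comm_to_comp_ratio(block), blocks_per_process(256, block, 512))
```

The block size is chosen independently of the process count, so the same decomposition can be remapped when the number of processes changes.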

6
Latencies
  • Network
  • Latency reduction lags improvement in flop rates;
    it is much easier to grow bandwidth
  • Overlap communications and computations; pipeline
    larger messages
  • Don't wait: speculate!
  • Software Overheads
  • Can be more significant than network delays
  • NUMA architectures
  • Scalable designs must accommodate latencies
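
Overlapping communication with computation can be sketched as follows. This is a minimal, single-machine illustration using only the Python standard library: `fetch` is a hypothetical stand-in for receiving a message (it just sleeps), and `compute` is a stand-in for local work. A real code would use non-blocking message passing instead of a thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(i, delay=0.02):
    """Stand-in for receiving chunk i: sleep to mimic network latency."""
    time.sleep(delay)
    return [i] * 4


def compute(chunk):
    """Stand-in for local work on a received chunk."""
    return sum(x * x for x in chunk)


def pipelined(n_chunks):
    """Request chunk i+1 before computing on chunk i, so the wait overlaps the work."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch, 0)
        for i in range(n_chunks):
            chunk = pending.result()                 # wait only for what is needed now
            if i + 1 < n_chunks:
                pending = pool.submit(fetch, i + 1)  # start the next transfer early
            results.append(compute(chunk))
    return results
```

The results are identical to a fetch-then-compute loop; only the waiting time is hidden behind the computation.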

7
Synchronization
  • Cost increases with the process count
  • Synchronization doesn't scale well
  • Latencies come into play here too
  • Distributed resources exacerbate problems
  • Heterogeneity is another significant obstacle
  • Regular communication patterns are often
    characterized by many synchronizations
  • Best suited to homogeneous co-located clusters
  • Transition to asynchronous models?
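
One way to picture an asynchronous model is message-driven execution, where an object runs as soon as its own inputs arrive rather than at a global barrier. The sketch below is a single-threaded, in-process illustration only. The `Cell`, `Sink`, and `run` names are hypothetical and are not the API of any runtime mentioned at the workshop.

```python
from collections import deque


class Cell:
    """A data-driven object: it fires when all of its inputs have arrived."""

    def __init__(self, n_inputs):
        self.needed = n_inputs
        self.received = []

    def on_message(self, value, send):
        self.received.append(value)
        if len(self.received) == self.needed:
            send("sink", sum(self.received))  # forward the result, no barrier involved


class Sink:
    """Collects results in the order they become available."""

    def __init__(self):
        self.values = []

    def on_message(self, value, send):
        self.values.append(value)


def run(objects, initial_messages):
    """Scheduler loop: deliver queued messages until none remain."""
    queue = deque(initial_messages)

    def send(target, value):
        queue.append((target, value))

    while queue:
        target, value = queue.popleft()
        objects[target].on_message(value, send)


if __name__ == "__main__":
    objs = {"a": Cell(2), "b": Cell(1), "sink": Sink()}
    run(objs, [("a", 1), ("b", 10), ("a", 2)])
    print(objs["sink"].values)  # "b" finishes before "a" because its input arrived first
```

Here object "b" completes without waiting for "a", which is the behaviour a barrier-based schedule would prevent.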

8
Load Balancing
  • Static load balancing
  • Reduces to granularity problem
  • Differences between processors and network
    segments are determined a priori
  • Dynamic process management requires distributed
    monitoring capabilities
  • Must be scalable
  • System maps objects to processes
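
The idea that the system, not the programmer, maps objects to processes can be sketched with a simple greedy heuristic. This is an assumption-laden illustration: object loads and relative process speeds are taken as already measured, the function name `map_objects` is hypothetical, and the mapping is computed centrally for brevity, whereas the slide calls for monitoring that is itself distributed and scalable.

```python
def map_objects(loads, speeds):
    """Greedy mapping: heaviest object first, onto the process that would finish it earliest.

    loads  -- dict of object name -> measured work units
    speeds -- list of relative process speeds (work units per unit time)
    Returns (mapping of object -> process index, per-process finish times).
    """
    finish = [0.0] * len(speeds)
    mapping = {}
    # Sort by descending load; break ties by name so the result is deterministic.
    for obj, load in sorted(loads.items(), key=lambda kv: (-kv[1], kv[0])):
        best = min(range(len(speeds)),
                   key=lambda p: (finish[p] + load / speeds[p], p))
        finish[best] += load / speeds[best]
        mapping[obj] = best
    return mapping, finish
```

With equal speeds this reduces to the classic longest-processing-time heuristic; unequal speeds let faster processes take proportionally more objects.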

9
Heterogeneous Considerations
  • Similar but different processors or network
    components configured within a single cluster
  • Different clock rates, NICs, etc.
  • Distinct processors, networking segments, and
    operating systems operating at a distance
  • Grid resources
  • Elevates significance of dynamic load balancing;
    data-driven objects are immediately adaptable

10
Poor Scalability?

11
Good Scalability?

12
Performance Comparison

13
Tools
  • Automated algorithm selection and performance
    tuning by empirical means, e.g. ATLAS
  • Generate space of algorithms and search for
    fastest implementations by running them
  • Scalability prediction, e.g. PMaC Lab
  • Develop performance models (machine profiles,
    application signatures) and trending patterns
  • Identify/fix bottlenecks; choose new methods?
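
Selecting an algorithm by empirical means can be sketched as: write down a set of candidate implementations, run each on representative data, and keep the fastest. The example below is a toy illustration of that search loop only. The three summation candidates and the `pick_fastest` helper are hypothetical and say nothing about how ATLAS itself is built.

```python
import timeit


def sum_loop(xs):
    """Candidate 1: explicit Python loop."""
    total = 0
    for x in xs:
        total += x
    return total


def sum_builtin(xs):
    """Candidate 2: the built-in sum."""
    return sum(xs)


def sum_blocked(xs, block=64):
    """Candidate 3: sum fixed-size blocks, then combine the partial sums."""
    return sum(sum(xs[i:i + block]) for i in range(0, len(xs), block))


def pick_fastest(candidates, data, repeat=3, number=20):
    """Time every candidate on the same data; return the fastest name and all timings."""
    timings = {}
    for name, fn in candidates.items():
        runs = timeit.repeat(lambda: fn(data), repeat=repeat, number=number)
        timings[name] = min(runs)  # best of several runs damps timing noise
    return min(timings, key=timings.get), timings


CANDIDATES = {"loop": sum_loop, "builtin": sum_builtin, "blocked": sum_blocked}
```

Which candidate wins depends on the machine and the data size, which is the reason for measuring rather than predicting.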

14
Case Study: NAMD Scalable Molecular Dynamics
  • Three-dimensional object-oriented code
  • Message-driven execution capability
  • Fixed problem sizes determined by biomolecular
    structures
  • Embedded PME electrostatics processor
  • Asynchronous communications

19
Case Study: Summary
  • As more processes are used to solve the given
    fixed-size problems, benchmark times decrease to
    a few milliseconds
  • PME communication times and operating system
    loads are significant in this range
  • Scaling to many thousands of processes is almost
    certainly achievable now given a large enough
    problem
  • 700 atoms/process x 3,000 processes = 2.1M atoms
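
The problem-size arithmetic in the last bullet can be checked directly. This sketch assumes only the figure quoted on the slide (700 atoms per process); the helper name `atoms_required` is hypothetical.

```python
def atoms_required(atoms_per_process: int, processes: int) -> int:
    """Total problem size needed to keep a fixed number of atoms on every process."""
    return atoms_per_process * processes


if __name__ == "__main__":
    # At 700 atoms per process, the required system size grows linearly with process count.
    for processes in (1_000, 3_000, 10_000):
        print(processes, atoms_required(700, processes))
```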

20
Contacts and References
  • David O'Neal, oneal@ncsa.uiuc.edu
  • John Urbanic, urbanic@psc.edu
  • Sergiu Sanielevici, sergiu@psc.edu
  • Workshop materials:
    www.psc.edu/training/scaling/workshop.html

21
Topics for Discussion
  • How should large, scalable computational science
    problems be posed?
  • Should existing algorithms and codes be modified
    or should new ones be developed?
  • Should agencies explicitly fund collaborations to
    develop industrial-strength, efficient, scalable
    codes?
  • What should cyber-infrastructure builders and
    operators do to help scientists develop and run
    good applications?