Scalable Scientific Applications Characteristics - PowerPoint PPT Presentation

About This Presentation

Title: Scalable Scientific Applications: Characteristics & Future Directions

Description: Scalable Scientific Applications: Characteristics & Future Directions. Douglas B. Kothe, with Richard Barrett, Ricky Kendall, Bronson Messer, and Trey White.


Transcript and Presenter's Notes

Title: Scalable Scientific Applications Characteristics


1
Scalable Scientific Applications: Characteristics & Future Directions
  • Douglas B. Kothe
  • with Richard Barrett, Ricky Kendall, Bronson
    Messer, and Trey White
  • Leadership Computing Facility
  • National Center for Computational Sciences
  • Oak Ridge National Laboratory

2
Science Teams Have Specific PF Objectives
Application Area | Science Driver | Science Objective | Impact
Combustion (S3D) | Predictive engineering: engine design simulation tool for new engine design | Understanding flame stabilization in lifted autoigniting diesel fuel jets relevant to low-temperature combustion for engine design at realistic operating conditions | Potential for 50% increase in efficiency and 20% savings in petroleum consumption with lower-emission, leaner-burning engines
Fusion (GTC) | Understand and quantify physics and properties of ITER scaling and H-mode confinement | Strongly coupled and consistent wall-to-edge-to-core modeling of ITER plasmas to attain a realistic assessment of ignition margins | ITER design and operation
Chemistry (MADNESS) | Computational catalysis | Describe large systems accurately with modern hybrid and meta density functional theory functionals | Generate quantitative catalytic reaction rates and guide small-system calibration
Nanoscale Science (DCA) | Material-specific understanding of high-temperature superconductivity theory | Understand the quantitative differences in the transition temperatures of high-temperature superconductors | Macroscopic quantum effect at elevated temperatures (>150 K); new materials for power transmission and oxide electronics
Climate (POP) | Accurate representation of ocean circulation | Fully coupled eddy-resolving ocean and sea ice model to reduce the coupled model biases where ice and deep-water parameters are governed by the accurate representation of current systems | Reduce current uncertainties in the coupled ocean-sea ice system model
Geoscience (PFLOTRAN) | Perform multiscale, multiphase, multicomponent modeling of a 3-D field CO2 injection scenario | Include oil phase and four-phase liquid-gas-aqueous-oil system to describe dissipation of the supercritical CO2 phase and escape of CO2 to the surface | Demonstrate viability of and potential for sequestration of anthropogenic CO2 in deep geologic formations
Astrophysics (CHIMERA) | Understand the core-collapse supernova mechanism for a range of progenitor star masses | Perform core-collapse simulations with sophisticated spectral neutrino transport, detailed nuclear burning, and general relativistic gravity | Understand the origin of many elements in the Periodic Table and the creation of neutron stars and black holes
3
Application Requirements at the PF
  • Application categories analyzed
  • Science motivation and impact
  • Science quality and productivity
  • Application models, algorithms, software
  • Application footprint on platform
  • Data management and analysis
  • Early access science-at-scale scenarios
  • Results
  • A 100-page Application Requirements Document was
    published in July 2007
  • New methods for categorizing platforms and
    application attributes devised and utilized in
    analysis guiding tactical infrastructure
    purchase and deployment
  • But still too qualitative! More work to do.

4
Application Codes in 2008: An Incomplete List
  • Astrophysics
  • CHIMERA, GenASiS, 3DHFEOS, Hahndol, SNe, MPA-FT,
    SEDONA, MAESTRO, AstroGK
  • Biology
  • NAMD, LAMMPS
  • Chemistry
  • CPMD, CP2K, MADNESS, NWChem, Parsec, Quantum
    Espresso, RMG, GAMESS
  • Nuclear Physics
  • ANGFMC, MFDn, NUCCOR, HFODD
  • Engineering
  • Fasel, S3D, Raptor, MFIX, Truchas, BCFD, CFL3D,
    OVERFLOW, MDOPT
  • High Energy Physics
  • CPS, Chroma, MILC
  • Fusion
  • AORSA, GYRO, GTC, XGC
  • Materials Science
  • VASP, LS3DF, DCA, QMCPACK, RMG, WL-LSMS,
    WL-AMBER, QMC
  • Accelerator Physics
  • Omega3P, T3P
  • Atomic Physics
  • TDCC, RMPS, TDL
  • Space Physics
  • Pogorelov
  • Climate and Geosciences
  • MITgcm, PFLOTRAN, POP, CCSM (CAM, CICE, CLM,
    POP)
  • Computer Science (Tools)
  • Active Harmony, IPM, KOJAK, mpiP, PAPI, PMaC,
    Sca/LAPACK, SvPablo, TAU

5
Apps Teams Are Reasonably Adept at Using our
Current Systems
Is the "field of dreams" approach inadequate
(too little, too late)? What is effective
utilization? Scaling? Percent of peak (Jacobi vs.
MG)? Current SC apps range from 2-70% of peak;
what's the goal? Remember, we improve what we
measure, so let's have the right metrics and
measures. My $0.02: science and engineering
achievements on these systems are the legacy.
6
Science Workload: Job Sizes and Resource Usage
of Key Applications
Code | 2007 Resource Utilization (M core-hours) | Projected 2008 Resource Utilization (M core-hours) | Typical Job Size in 2006-2007 (K cores) | Anticipated Job Size in 2008 (K cores)
CHIMERA | 2 (under development) | 16 | 0.25 (under development) | >10
GTC | 8 | 7 | 8 | 12
S3D | 6.5 | 18 | 8-12 | >15
POP | 4.8 | 4.7 | 4 | 8
MADNESS | 1 (under development) | 4 | 0.25 (under development) | >8
DCA | N/A (under development) | 3-8 | N/A (under development) | 4-16 (w/o disorder), >40 (with disorder)
PFLOTRAN | 0.37 (under development) | >2 | 1-2 (under development) | >10
AORSA | 0.61 | 1 | 15-20 | >20
7
Preparing for the Exascale: Long-Term Science
Drivers and Requirements
  • We have recently surveyed, analyzed, and
    documented the science drivers and application
    requirements envisioned for exascale leadership
    systems in the 2020 timeframe
  • These studies help to
  • Provide a roadmap for the ORNL Leadership
    Computing Facility
  • Uncover application needs and requirements
  • Focus our efforts on those disruptive
    technologies and research areas in need of our
    and the HPC community's attention

8
What Will an EF System Look Like?
  • All projections are daunting
  • Based on projections of existing technology both
    with and without disruptive technologies
  • Assumed to arrive in 2016-2020 timeframe
  • Example 1
  • 115K nodes @ 10 TF per node, 50-100 PB, optical
    interconnect, 150-200 GB/s injection B/W per
    node, 50 MW
  • Examples 2-4 (DOE Townhall report)

www.er.doe.gov/ASCR/ProgramDocuments/TownHall.pdf
9
Science Prospects and Benefits with High End
Computing (EF?) in the Next Decade
Opportunity | Key application areas | Goal and benefit
Materials science | Nanoscale science, manufacturing, and material lifecycles, response, and failure | Design, characterize, and manufacture materials, down to the nanoscale, tailored and optimized for specific applications
Earth science | Weather, carbon management, climate change mitigation and adaptation, environment | Understand the complex biogeochemical cycles that underpin global ecosystems and control the sustainability of life on Earth
Energy assurance | Fossil, fusion, combustion, nuclear fuel cycle, chemical catalysis, renewables (wind, solar, hydro), bioenergy, energy efficiency, power grid, transportation, buildings | Attain, without costly disruption, the energy required by the United States in guaranteed and economically viable ways to satisfy residential, commercial, and transportation requirements
Fundamental science | High energy physics, nuclear physics, astrophysics, accelerator physics | Decipher and comprehend the core laws governing the Universe and unravel its origins
Biology and medicine | Proteomics, drug design, systems biology | Understand connections from individual proteins through whole cells into ecosystems and environments
National security | Disaster management, homeland security, defense systems, public policy | Analyze, design, stress-test, and optimize critical systems such as communications, homeland security, and defense systems; understand and uncover human behavioral systems underlying asymmetric operation environments
Engineering design | Industrial and manufacturing processes | Design, deploy, and operate safe and economical structures, machines, processes, and systems with reduced concept-to-deployment time
10
Science Case: Climate
Mitigation: Evaluate strategies and inform policy
decisions for climate stabilization (100-1000
year simulations). Adaptation: Decadal forecasts
of regional impacts; prepare for committed climate
change (10-100 year simulations).
  • 250 TF
  • Mitigation: Initial simulations with dynamic
    carbon cycle and limited chemistry
  • Adaptation: Decadal simulations with
    high-resolution ocean (1/10°)
  • 1 PF
  • Mitigation: Full chemistry, carbon/nitrogen/sulfur
    cycles, ice-sheet model, multiple ensembles
  • Adaptation: High-resolution atmosphere (1/4°),
    land, and sea ice, as well as ocean
  • Sustained PF
  • Mitigation: Increased resolution, longer
    simulations, more ensembles for reliable
    projections; coupling with socio-economic and
    biodiversity models
  • Adaptation: Limited cloud-resolving simulations,
    large-scale data assimilation
  • 1 EF
  • Mitigation: Multi-century ensemble projections
    for detailed comparisons of mitigation strategies
  • Adaptation: Full cloud-resolving simulations,
    decadal forecasts of regional impacts and
    extreme-event statistics

Resolve clouds, forecast weather and extreme
events, and provide quantitative mitigation strategies
11
Barriers in Ultrascale Climate Simulation: Attacking
the Fourth Dimension (Parallel in Time)
  • Problem
  • Climate models use explicit time stepping
  • Time step must go down as resolution goes up
  • Time stepping is serial
  • Single-process performance is stagnating
  • More parallel processes do not help!
  • Possible complementary solutions
  • Implicit time stepping
  • High-order in time
  • Fast bases: curvelets and multi-wavelets
  • Parareal (parallel in time; see the sketch at the
    end of this slide's notes)
  • Progress
  • Implicit version of HOMME for global
    shallow-water equations: 10x speedup for
    steady-state test case
  • High-order single-step time integration
  • Single-cycle multi-grid linear solver for 1D
  • Pure advection with curvelets and multi-wavelets
  • Near-term plans
  • Scale, tune, and precondition implicit HOMME
  • Single-cycle multi-grid linear solver for 2D
  • Parareal for Burgers (1D nonlinear)

Ref: Trey White (ORNL)
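
The Parareal idea referenced above can be made concrete with a small sketch. The following Python example is a minimal illustration using a scalar test ODE and forward-Euler coarse/fine propagators; the model, step counts, and slice counts are my own assumptions, not the HOMME or Burgers implementations. In a real code the fine solves over the time slices would run concurrently, one per processor group.

    import numpy as np

    def f(u):
        return -u  # hypothetical right-hand side standing in for the model physics

    def coarse(u0, t0, t1):
        # Coarse propagator G: one cheap forward-Euler step over the whole time slice
        return u0 + (t1 - t0) * f(u0)

    def fine(u0, t0, t1, nsub=100):
        # Fine propagator F: many small steps (accurate but expensive)
        u, dt = u0, (t1 - t0) / nsub
        for _ in range(nsub):
            u = u + dt * f(u)
        return u

    def parareal(u0, T=1.0, nslices=8, niter=4):
        t = np.linspace(0.0, T, nslices + 1)
        U = np.empty(nslices + 1)
        U[0] = u0
        for n in range(nslices):             # serial coarse sweep for the initial guess
            U[n + 1] = coarse(U[n], t[n], t[n + 1])
        for _ in range(niter):
            # The fine solves below are independent across slices -- this is the
            # parallel-in-time step (written serially here for clarity)
            F = [fine(U[n], t[n], t[n + 1]) for n in range(nslices)]
            G_old = [coarse(U[n], t[n], t[n + 1]) for n in range(nslices)]
            for n in range(nslices):         # serial correction sweep
                U[n + 1] = coarse(U[n], t[n], t[n + 1]) + F[n] - G_old[n]
        return t, U

    t, U = parareal(1.0)
    print(abs(U - np.exp(-t)).max())         # error vs. exact solution of du/dt = -u

A few Parareal iterations drive the cheap coarse prediction toward the fine serial solution, which is what makes the time dimension a source of extra parallelism.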
12
Science Case: Astrophysics
  • 250 TF
  • The interplay of several important phenomena:
    hydrodynamic instabilities, role of nuclear
    burning, neutrino transport
  • 1 PF
  • Determine the nature of the core-collapse
    supernova explosion mechanism
  • Fully integrated, 3D neutrino radiation
    hydrodynamics simulations with nuclear burning
  • Sustained PF
  • Detailed nucleosynthesis (element production)
    from core-collapse SNe
  • Large nuclear network capable of isotopic
    prediction (along with energy production)
  • 1 EF
  • Precision prediction of complete observable set
    from core-collapse SNe nucleosynthesis,
    gravitational waves, neutrino signatures, light
    output
  • Tests general relativity and provides information
    about the dense-matter equation of state, along
    with detailed knowledge of stellar evolution
  • Full 3D Boltzmann neutrino transport, 3D MHD/RHD,
    nuclear burning

Explanation and prediction of core-collapse SNe
will put general relativity, the dense-matter EOS,
and stellar evolution theories to the test
13
Requirements Gathering
  • Consult literature and existing documentation
  • Construct a survey eliciting speculative
    requirements for scientific applications on HPC
    platforms in 2010-2020
  • Pass the survey to leading computational
    scientists in a broad range of scientific domains
  • Analyze and validate the survey results (hard)
  • Make informed decisions and take action

14
Survey Questions
  • What are some possible science drivers and urgent
    problems that would require Leadership Computing
    in 2010-2020?
  • What are some looming computational challenges
    that will need resolution in 2010-2020?
  • What are some science objectives and outcomes
    that Leadership Computing could enable in
    2010-2020?
  • What are some improvement goals for
    science-simulation fidelity that Leadership
    Computing could enable in 2010-2020?
  • What are some possible changes in physical model
    attributes for Leadership-Computing applications
    in 2010-2020?
  • What major software-development projects could
    occur in your application area in 2010-2020?
  • What major algorithm changes could occur for your
    applications in 2010-2020?
  • What libraries and development tools may need to
    be developed or significantly improved for
    Leadership Computing in 2010-2020?
  • How might system-attribute priorities change for
    Leadership Computing for your application?
  • In what ways might or should your workflow in
    2010-2020 be different from today?
  • Are there any disruptive technologies that might
    affect your applications?

15
(No Transcript)
16
(No Transcript)
17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
Findings in Models and Algorithms
  • The seven algorithm types are scattered broadly
    among science domains, with no one particular
    algorithm being ubiquitous and no one algorithm
    going unused.
  • Structured grids and dense linear algebra
    continue to dominate, but other algorithm
    categories will become more common.
  • Compared to the Seven Dwarfs for current
    applications, we project a significant increase
    in Monte Carlo and increases in unstructured
    grids, sparse linear algebra, and particle
    methods, as well as a relative decrease in FFTs
  • These projections reflect the expectation of
    much-greater parallelism in architectures and the
    resulting need for very high scalability
  • Load balancing, scalable sparse solver, and
    random number generator algorithms will be more
    important.
  • Some important algorithms are not captured in the
    Seven Dwarfs
  • Categories expected by application scientists to
    be of growing importance in 2010-2020 include
    adaptive mesh refinement, implicit nonlinear
    systems, data assimilation, agent-based methods,
    parameter continuation, and optimization

21
Findings in Software
  • Hero developer mode is fatalistic
  • Does not scale and no single person can
    adequately understand breadth and depth of issues
  • Only accomplished by computer scientists,
    algorithm developers, application developers, and
    end-user scientists working together in a tightly
    integrated manner
  • Must develop a means of interfacing the
    heterogeneous computer, the developer, and the
    end-user scientist
  • Must raise the level of abstraction
  • The current approach, based on low-level constructs,
    places constraints on performance and over-constrains
    the compiler and runtime system
  • Raising abstraction level allows for increased
    algorithm experimentation, incorporation of
    intent in data structures, flexible memory
    organization, inclusion of fault tolerance
    constructs
  • Enables exploration of power-aware algorithms
  • Freedom from heroic software efforts having to be
    the norm

22
Findings in Software
  • Application development and maintenance tools and
    practices need to fundamentally change
  • Productivity improvement is an important metric
    and guide for tool and software choices
  • Fault tolerance and V&V software components must
    be used to improve reliability and robustness of
    application software
  • Knowledge discovery techniques and tools should
    be explored to help with bug detection,
    simulation steering, and data feature extraction
    and correlation
  • A holistic view of application data (from input
    to archival) is needed to most effectively
    deliver tools for the end-to-end workflow
    performed by scientists

23
(No Transcript)
24
(No Transcript)
25
(No Transcript)
26
(No Transcript)
27
(No Transcript)
28
Applications Analyzed
  • CHIMERA
  • Astrophysics: core-collapse supernova explosion
    mechanism
  • S3D
  • Turbulent combustion: lifted flame stabilization
    in diesel and gas turbine engines
  • GTC
  • Fusion: analyze and validate CTEM and ETG core
    turbulence
  • POP
  • Global ocean circulation: eddy-resolved flow with
    biogeochemistry
  • DCA
  • High-temperature superconductivity: effect of
    charge and spin inhomogeneities in the Hubbard
    model superconducting state
  • MADNESS
  • Chemistry: neutron and x-ray spectra of cuprates;
    dynamics of few-electron systems; metal oxide
    surfaces in catalytic processes
  • PFLOTRAN
  • Reactive flows in porous media: uranium migration
    and CO2 sequestration in subsurface geologic
    formations

29
Application Requirements and Workload Reinforce
a Balanced-System Assertion
Distribution in this space depends upon the
applications and the problem being simulated for
a given application
  • Applications analyzed represent almost one half
    of our 2008 allocation
  • A broad range of compute/communicate workloads
    must be supported
  • Depends upon science, application within that
    science, and problem tackled by application
  • Application requirements call for breadth in
    models, algorithms, software, and scaling type
  • Physical models
  • coupled continuum conservation laws, radiation
    transport, many-body Schrödinger, plasma physics,
    Maxwell's equations, turbulence
  • Numerical algorithms
  • Each of the 7 dwarfs is required
  • Software implementation
  • All popular languages are required
  • Science drivers
  • Strong scaling (time to solution)
  • Weak scaling (bigger problem)
  • Application readiness action plans are in place
    and being followed

[Figure: compute vs. communication workload mix (0-100 on each axis) for CHIMERA, POP, GTC, MADNESS, S3D, PFLOTRAN, and DCA]
30
Resource Utilization by Science Applications:
Science Dictates the Requirements
31
Example: PF Performance Observations and
Readiness Plan for Some of Our Key Apps
Code | Science Scaling Needs | Performance Observations and Readiness Plan
S3D | larger problem | Compute-bound with minimal communication overhead; reduce memory contention with hybrid parallelism; increase cache reuse (see the cache-blocking sketch after this table)
GTC | larger problem | Compute-bound with minimal communication overhead; use radial domain decomposition to eliminate cross-core collective calls; reduce the size of the problem per core and get better cache reuse; increase SSE factor
DCA | solution time | Heavily compute-bound, benefitting from Level-3 BLAS routines (DGEMM, ZGEMM); very good use of SSE (50%) with no changes; include disorder model for an additional level of parallelism (10x need for more processors); multithreaded linear algebra will allow additional parallelism at a lower level
MADNESS | solution time | Fully asynchronous algorithm with communications hidden by the model; nicely positioned to exploit Gemini; good SSE factor but still room for improvement
POP | solution time | Sizeable communication component; reduce memory contention time and increase SSE factor; minimize synchronous behavior; better cache blocking; new physics (biogeochemistry) increases compute fraction
CHIMERA | larger problem | Communication dominated by collectives; production-level physics increases compute fraction; reasonable SSE factor but room for improvement; 20% raw speedup from Gemini w/o enhancements
PFLOTRAN | solution time | Communication dominated by collectives; poor SSE factor, some room for improvement; additional phases and chemical species will reduce memory contention (natural block structure of the Jacobian enables more efficient use of the memory hierarchy)
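
Aside on the "increase cache reuse" and "better cache blocking" items above: these amount to tiling loops so the working set stays resident in cache. The production codes are Fortran/C; the NumPy tiling below is only a conceptual stand-in with an illustrative tile size, not the applications' actual kernels.

    import numpy as np

    def blocked_matmul(A, B, tile=64):
        # C = A @ B computed tile-by-tile so each tile of A, B, and C stays in cache
        n, k = A.shape
        m = B.shape[1]
        C = np.zeros((n, m))
        for i0 in range(0, n, tile):
            for j0 in range(0, m, tile):
                for k0 in range(0, k, tile):
                    C[i0:i0 + tile, j0:j0 + tile] += (
                        A[i0:i0 + tile, k0:k0 + tile] @ B[k0:k0 + tile, j0:j0 + tile]
                    )
        return C

    A = np.random.rand(256, 300)
    B = np.random.rand(300, 128)
    assert np.allclose(blocked_matmul(A, B), A @ B)  # tiling changes the order, not the result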
32
Accelerating Development Readiness
  • Automated diagnostics
  • Drivers performance analysis, application
    verification, S/W debugging, H/W-fault detection
    and correction, failure prediction and avoidance,
    system tuning, and requirements analysis
  • Hardware latency
  • Won't see improvement nearly as much as flop
    rate, parallelism, or B/W in coming years
  • Hierarchical algorithms
  • Applications will require algorithms aware of the
    system hierarchy (compute/memory)
  • In addition to hybrid data parallelism and
    file-based checkpointing, algorithms may need to
    include dynamic decisions between recomputing and
    storing, fine-scale task-data hybrid parallelism,
    and in-memory checkpointing (see the sketch below)
  • Parallel programming models
  • Improved programming models needed to allow
    developer to identify an arbitrary number of
    levels of parallelism and map them onto hardware
    hierarchies at runtime
  • Models continue to be coupled into larger models,
    driving the need for arbitrary hierarchies of
    task and data parallelism
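
As a companion to the "in-memory checkpointing" item above, here is a minimal Python sketch; the fault model, stencil update, and refresh interval are illustrative assumptions, not the applications' actual scheme. The solver keeps its last known-good state in RAM and rolls back to it when a bad step is detected, instead of re-reading a file-based checkpoint.

    import numpy as np

    rng = np.random.default_rng(0)

    def advance(u, dt=1e-3):
        # Stand-in for one cheap explicit update of the solution field
        return u + dt * (np.roll(u, 1) - 2 * u + np.roll(u, -1))

    def faulty_step(u, fault_prob=0.05):
        # Occasionally corrupt the state to mimic a soft error / failed step
        u = advance(u)
        if rng.random() < fault_prob:
            u[rng.integers(u.size)] = np.nan
        return u

    u = np.sin(np.linspace(0, 2 * np.pi, 256))
    checkpoint = u.copy()                    # in-memory checkpoint (no disk I/O)
    for step in range(1, 1001):
        u = faulty_step(u)
        if not np.isfinite(u).all():         # fault detected: roll back and retry
            u = checkpoint.copy()
            continue
        if step % 50 == 0:                   # refresh the checkpoint periodically
            checkpoint = u.copy()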

33
Accelerating Development Readiness
  • Solver technology and innovative solution
    techniques
  • Global communication operations across 10^6-10^8
    processors will be prohibitively expensive;
    solvers will have to eliminate global
    communication where feasible and mitigate its
    effects where it cannot be avoided. Research on
    more effective local preconditioners will become
    a very high priority (a block-Jacobi sketch
    follows this list)
  • If increases in memory B/W continue to lag the
    number of cores added to each socket, further
    research is needed into ways to effectively trade
    flops for memory loads/stores
  • Accelerated time integration
  • Are we ignoring the time dimension along which to
    exploit parallelism? (Example: climate)
  • Model coupling
  • Coupled models require effective methods to
    implement, verify, and validate the couplings,
    which can occur across wide spatial and temporal
    scales. The coupling requirements drive the need
    for robust methods for downscaling, upscaling,
    and coupled nonlinear solving
  • Evaluation of the accuracy and importance of
    couplings drives the need for methods for
    validation, uncertainty analysis, and sensitivity
    analysis of these complex models
  • Maintaining current libraries
  • Reliance of current HPC applications on libraries
    will grow
  • Libraries must perform as HPC systems grow in
    parallelism and complexity
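
To illustrate the "local preconditioners" point referenced above, here is a minimal block-Jacobi example in Python; the test matrix, block size, and dense block inverses are illustrative assumptions. Applying this preconditioner touches only local (per-block) data, so it requires no global communication, which is exactly the property the bullet calls for.

    import numpy as np

    def block_jacobi_factors(A, block):
        # Pre-factor the diagonal blocks of A (dense inverse here for clarity)
        n = A.shape[0]
        return [np.linalg.inv(A[i:i + block, i:i + block]) for i in range(0, n, block)]

    def apply_preconditioner(factors, r, block):
        # z = M^{-1} r applied block-by-block; each block touches only local data
        z = np.empty_like(r)
        for k, Binv in enumerate(factors):
            i = k * block
            z[i:i + block] = Binv @ r[i:i + block]
        return z

    # Small 1D Laplacian test problem
    n, block = 64, 8
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    r = np.ones(n)
    factors = block_jacobi_factors(A, block)
    z = apply_preconditioner(factors, r, block)  # no reductions, no neighbor exchange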

34
PF Survey Findings (with some opinion)
  • A rigorous, evolving application requirements
    process pays dividends
  • Needs to be quantitative: apps cannot lie with
    performance analysis
  • Algorithm development is evolutionary
  • Can we break this mold?
  • Example: explore new parallel dimensions (time,
    energy)
  • Hybrid/multi-level programming models virtually
    nonexistent
  • No algorithm sweet spots (no one size fits all)
  • But algorithm footprints share characteristics
  • V&V and SQA are not in good standing
  • Ramifications for compute systems as well as the
    application results generated
  • No one is really clamoring for new languages
  • MPI until the water gets too hot (frog analogy)
  • App lifetimes are >3-5x machine lifetimes
  • Refactoring a way of life
  • Fault tolerance via defensive checkpointing is the
    de facto standard
  • Won't this eventually bite us? Artificially
    drives I/O demands
  • Weak or strong scaling, or both (no winner)
  • Data analytics paradigm must change
  • The middleware layer is surprisingly stable and
    agnostic across apps (and should expand!)

35
Summary Recommendations: EF Survey
  • We are in danger of failing because of a software
    crisis unless concerted investments are
    undertaken to close the H/W-S/W gap
  • H/W has gotten way ahead of the S/W (same ole
    same ole?)
  • Structured grids and dense linear algebra
    continue to dominate, but
  • Increase projected for Monte Carlo algorithms,
    unstructured grids, sparse linear algebra, and
    particle methods (relative decrease in FFTs)
  • Increasing importance for AMR, implicit nonlinear
    systems, data assimilation, agent-based methods,
    parameter continuation, and optimization
  • Priority of computing system attributes
  • Increased priority: interconnect bandwidth, memory
    bandwidth, mean time to interrupt, memory
    latency, and interconnect latency
  • Reflects desire to increase computational
    efficiency to use peak flops
  • Decreased priority: disk latency, archival storage
    capacity, disk bandwidth, wide area network
    bandwidth, and local storage capacity
  • Reflects expectation that computational efficiency
    will not increase
  • Per-core requirements relatively static, while
    aggregate requirements will grow with the system

36
Summary Recommendations: EF Survey
  • System software must possess more stability,
    reliability, and fault tolerance during
    application execution
  • New fault tolerance paradigms must be developed
    and integrated into applications
  • Job management and efficient scheduling of those
    resources will be a major obstacle faced by
    computing centers
  • Systems must be much better science producers
  • Strong software engineering practices must be
    applied to systems to ensure good end-to-end
    productivity
  • Data analytics must empower scientists to ask
    what-if questions, providing S/W and H/W
    infrastructure capable of answering these
    questions in a timely fashion (Google desktop)
  • Strong data management will become an absolute at
    the exascale
  • Just like H/W requires disruptive technologies
    for acceleration of natural evolutionary paths,
    so too will algorithm, software, and physical
    model development efforts need disruptive
    technologies (invest now!)

37
Fusion Simulation Project: Where to Find 12
Orders in 10 Years?
  • 1.5 orders: increased processor speed and
    efficiency
  • 1.5 orders: increased concurrency
  • 1 order: higher-order discretizations
  • Same accuracy can be achieved with many fewer
    elements
  • 1 order: flux-surface-following gridding
  • Less resolution required along than across field
    lines
  • 4 orders: adaptive gridding
  • Zones requiring refinement are <1% of ITER volume,
    and resolution requirements away from them are
    10^2 less severe
  • 3 orders: implicit solvers
  • Mode growth time is 9 orders longer than the
    Alfvén-limited CFL time step (the contributions
    are tallied in the sketch below)
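
A quick tally (my own sanity check, not from the slide itself) confirming that the contributions listed above add up to the 12 orders of magnitude sought:

    # Orders-of-magnitude contributions as listed on the slide
    contributions = {
        "processor speed and efficiency": 1.5,
        "increased concurrency": 1.5,
        "higher-order discretizations": 1.0,
        "flux-surface-following gridding": 1.0,
        "adaptive gridding": 4.0,
        "implicit solvers": 3.0,
    }
    print(sum(contributions.values()))  # 12.0 orders of magnitude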

38
A View from Berkeley (John Shalf)
  • Need better benchmarks and better performance
    models
  • For reliable extrapolated code requirements
  • Power is driving daunting concurrency
  • Scalable programming models
  • Need to exploit hierarchical machine architecture
  • Hybrid processors
  • More concurrency; need a more generalized approach
  • Apps must deal with platform reliability
  • Don't forget autotuning
  • Shows value of good compilers and associated R&D
  • Fast, robust I/O is hard
  • Scaling and concurrency are outstripping our
    ability to do rigorous V&V
  • Application code complexity has outgrown
    available tools
  • Frameworks and community codes can work but with
    certain rules of engagement

ASCAC Fusion Simulation Project Review panel
presentation (4/30/08)
39
Questions?
Doug Kothe (kothe@ornl.gov)