Title: THE PANEL


1
THE PANEL
  • Thomas Sterling
  • California Institute of Technology
  • NASA Jet Propulsion Laboratory
  • February 23, 1999

2
2nd Conference on Enabling Technologies for
Peta(fl)ops Computing
  • Thomas Sterling
  • California Institute of Technology
  • NASA Jet Propulsion Laboratory
  • February 16, 1999

3
(No Transcript)
4
Why We Were There
  • 5 years later, a 2nd look at a daunting prospect
  • 6 in-depth workshops
  • 8 point design studies sponsored by NSF
  • 2-year study sponsored by DARPA, NSA, and NASA
  • Teraflops
  • ASCI
  • PITAC
  • Contending views of an uncertain future in HEC

5
Goals
  • Conduct an open forum on Petaflops computing
  • Examine our new understanding
      • issues
      • opportunities
      • challenges
      • directions
  • Present results from important research
  • Expose contending viewpoints on alternatives
  • Frank discussions about the future of HEC
    research
  • Better define this as an interdisciplinary
    pursuit
  • Establish the inter-relationship of academia,
    industry, and government

6
Information Packet
  • The BAG
  • Enabling Technologies for Petaflops Computing,
    MIT Press 1995
  • PAWS/PetaSoft Proceedings
  • PAL Notes/Proceedings
  • POWR Workshop Proceedings
  • Petaflops II Conference Information Packet
  • Petaflops I Group Photograph
  • HTMT Technical Note
  • Peanut M&Ms

7
What are the major findings from this conference?
  • Pflops are needed, as soon as possible
      • 2010 or before
  • Enough chips can be glued together
      • not sure about power or cost, but that's only
        money (yours)
      • no insight on efficiency
  • Algorithms can play an important role in
    recovering from the sins of architecture
  • Adaptive control critical to effective resource
    management and task allocation/scheduling
  • Little Federal will/vision to support systems
    architecture research

8
Open Issues?
  • Efficiency
  • How much bandwidth is needed and where
  • Latency management strategy - roles for
      • language
      • compiler
      • runtime system
      • architecture
  • True operational requirements of Pflops apps
  • New relationship between compiler and runtime
  • Impact of exotic strategies
  • How to fund research in systems architecture

9
Major Obstacles and Areas to be Explored?
  • Pflops Applications requirements/characteristics
      • bandwidth, locality, granularity, access patterns,
        ...
  • Adaptive resource management policies and
    mechanisms
  • Bandwidth
  • Programming model
  • Funding
      • sufficient
      • long term
      • multi-agency

10
Recommendations for Rapid Deployment of Effective
Pflops?
  • No more workshops/conferences, until
  • Point design studies in Pflops scaled
      • applications
      • algorithms
      • system software methodologies
  • The Other Path - explore it, exploit it
  • Sponsor Pflops systems architecture and software
    research
  • We need to build
  • Strong focused academic/industry/govt consortium

11
Observations
  • Longest/shortest 5 years of my professional life
  • So much done, so much to do
  • Genesis of a wonderful interdisciplinary
    community
  • Failure of federal will, abrogation of
    responsibility
  • ASCI is good
  • ASCI is lonely
  • HTMT needs intellectual consideration by the
    community
  • Completion of an exciting and important process

12
What are the Major Obstacles for sustained
Petaflops-scale Performance for Real Apps?
  • Getting a Petaflops computer

13
INTEGRATED SMP - WDM
DRAM - 4 GBYTES - HIGHLY INTERLEAVED
MULTI-LAMBDA AON
CROSSBAR
coherence
640 GBYTES/SEC
2nd LEVEL CACHE 96 MBYTES
64 bytes wide
160 GBYTES/SEC
VLIW/RISC CORE 24 GFLOPS 6 GHz
...
14
COTS PetaFlop System
128 die/box 4 CPU/die
(Diagram: Multi-Die Multi-Processor boxes numbered 1 to 64 around a central ALL-OPTICAL SWITCH, with I/O; 10 meters, 50 NS Delay)
15
COTS PetaFlop System
  • 8192 Dies (4 CPU/die-minimum)
  • Each Die is 120 GFlops
  • 1 PetaFlop Peak
  • Power 8192 x 200 Watts = 1.6 MegaWatts
  • Extra Main Memory >3 MegaWatts (512 TBytes)
  • 15.36 TFlops/Rack (128 die)
  • 30 KWatts/Rack - thus 64 racks - 30 inch
  • Common System I/O
  • 2 Level Main Memory
  • Optical Interconnect
  • OC768 Channels (40 GHz)
  • 128 Channels per Die (DWDM) - 5.12 THz
  • ALL Optical Switching
  • Bisection Bandwidth of 50 TBytes/sec
  • 15 TFlops/rack x 0.1 bytes/flop/sec x 32 racks
  • Rack Bandwidth - 15 TFlops x 0.1 bytes/flop = 1.5 TBytes/sec (12 THz)
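
The arithmetic on this slide can be checked with a short sketch. It uses only the slide's own numbers, and it assumes that the ".1" in the last two bullets means 0.1 bytes per flop and that the bisection figure counts half of the 64 racks. The function and key names are illustrative, not from the presentation.

```python
def cots_summary(dies=8192, gflops_per_die=120, watts_per_die=200,
                 dies_per_rack=128, channels_per_die=128,
                 channel_ghz=40, bytes_per_flop=0.1):
    """Recompute the slide's headline figures from its stated inputs.

    bytes_per_flop=0.1 is an assumed reading of the slide's ".1".
    """
    racks = dies // dies_per_rack
    rack_tflops = dies_per_rack * gflops_per_die / 1e3
    rack_tbytes_per_s = rack_tflops * bytes_per_flop
    return {
        "peak_pflops": dies * gflops_per_die / 1e6,
        "cpu_power_mw": dies * watts_per_die / 1e6,
        "racks": racks,
        "rack_tflops": rack_tflops,
        # Aggregate DWDM rate per die, in the slide's "THz" units.
        "die_optical_thz": channels_per_die * channel_ghz / 1e3,
        "rack_tbytes_per_s": rack_tbytes_per_s,
        # 8 bits per byte turns TBytes/sec into the slide's 12 "THz".
        "rack_tbits_per_s": rack_tbytes_per_s * 8,
        # Assumption: bisection traffic is carried by half of the racks.
        "bisection_tbytes_per_s": rack_tbytes_per_s * (racks // 2),
    }


if __name__ == "__main__":
    for name, value in cots_summary().items():
        print(f"{name}: {value:.4g}")
```

Without rounding, the peak comes to about 0.98 PFlops and the bisection bandwidth to about 49 TBytes/sec, which matches the slide's rounded "1 PetaFlop" and "50 TBytes/sec".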

16
What are the Major Obstacles for sustained
Petaflops-scale Performance for Real Apps?
  • Getting a Petaflops computer
  • Getting Petaflops Apps

17
Applications Areas for Petaflops
  • Materials simulations between microscale and
    macroscale (bulk materials)
  • Coupled electro-mechanical simulations of
    nano-scale structures (micromachines)
  • Full plant optimization for complex processes
    (chemical, manufacturing problems)
  • High-resolution reacting flow problems
    (combustion, chemical mixing, multiphase flow)
  • High-realism immersive virtual reality based on real-time
    radiosity modeling and complex scenes
  • Time dependent simulations of complex
    biomolecules (membranes, synthesis and dna)
  • Multidisciplinary optimization problems combining
    structures, fluids and geometry
  • Modeling of integrated earth systems (ocean,
    atmosphere, bio-geosphere)
  • Improved 4d/6d data assimilation applied to
    remote sensing and environmental models
  • Computational cosmology (particle models,
    astrophysical fluids and radiation transport)
  • Computational testing and simulation to replace
    weapons testing (stockpile stewardship)
  • Simulation of plasma fusion devices for
    controlled fusion (to optimize future reactors)
  • Design of new chemical compounds and synthesis
    pathways (environmental safety and cost
    improvements)
  • Comprehensive modeling of groundwater and oil
    reservoirs (contamination and management)
  • Modeling of complex transportation, communication
    and economic systems

18
Rational Drug Design
Nanotechnology
Tomographic Reconstruction
Phylogenetic Trees
Biomolecular Dynamics
Neural Networks
Crystallography
Fracture Mechanics
MRI Imaging
Reservoir Modelling
Molecular Modelling
Biosphere/Geosphere
Diffraction Inversion Problems
Distribution Networks
Chemical Dynamics
Atomic Scattering
Electrical Grids
Flow in Porous Media
Pipeline Flows
Data Assimilation
Signal Processing
Condensed Matter Electronic Structure
Plasma Processing
Chemical Reactors
Cloud Physics
Electronic Structure
Boilers
Combustion
Actinide Chemistry
Radiation
Fourier Methods
Graph Theoretic
CVD
Quantum Chemistry
Reaction-Diffusion
Chemical Reactors
Cosmology
Transport
n-body
Astrophysics
Multiphase Flow
Manufacturing Systems
CFD
Basic Algorithms & Numerical Methods
Discrete Events
PDE
Weather and Climate
Air Traffic Control
Military Logistics
Structural Mechanics
Seismic Processing
Population Genetics
Monte Carlo
ODE
Multibody Dynamics
Geophysical Fluids
VLSI Design
Transportation Systems
Aerodynamics
Raster Graphics
Economics
Fields
Orbital Mechanics
Nuclear Structure
Ecosystems
QCD
Pattern Matching
Symbolic Processing
Neutron Transport
Economics Models
Genome Processing
Virtual Reality
Astrophysics
Cryptography
Electromagnetics
Computer Vision
Virtual Prototypes
Intelligent Search
Multimedia Collaboration Tools
Computer Algebra
Databases
Magnet Design
Computational Steering
Scientific Visualization
Data Mining
Automated Deduction
Number Theory
CAD
Intelligent Agents
19
Bodega Bay Applications Workshop
  • Artificial Intelligence
  • Astrophysics
  • Climate
  • Computational Biology
  • Computational Chemistry
  • Computational Physics
  • Cryptography
  • Digital Libraries and Multimedia
  • Dynamical Systems
  • Economics
  • Computational Electromagnetics
  • Electronic Device Simulation
  • Fluid Dynamics
  • Geophysics
  • Graph Theory
  • Mathematics and Logic
  • Medicine
  • Multidisciplinary Problems
  • Optimization
  • Particle-in-cell models
  • Real-time/Time critical
  • Signal Processing
  • Shock physics
  • Structural mechanics
  • Vision and Geometric Computing

20
What are the Major Obstacles for sustained
Petaflops-scale Performance for Real Apps?
  • Getting a Petaflops computer
  • Getting Petaflops Apps
  • Getting Efficiency

21
Getting Efficiency
  • Overhead
      • work to manage program concurrency and resource
        parallelism
      • imposes upper bounds on scalability/granularity
  • Latency
      • distance in time (cycles) of service access
        requests
      • duration of waiting by operation sequence
  • Contention
      • time to service from shared resources
  • Starvation
      • sufficient concurrency to fill available
        resources
      • balance of workloads to engage all resources
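
The claim that overhead bounds scalability and granularity can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not a model from the presentation: the work divides evenly into independent tasks, one per processor, and each task pays a fixed management overhead. All names and numbers are hypothetical.

```python
def speedup(total_work: float, tasks: int, overhead_per_task: float) -> float:
    """Speedup over serial execution in the toy model.

    Assumes `tasks` equal-sized tasks run in parallel, one per processor,
    and each pays `overhead_per_task` cycles of management work.
    """
    if tasks < 1:
        raise ValueError("tasks must be at least 1")
    parallel_time = total_work / tasks + overhead_per_task
    return total_work / parallel_time


def efficiency(total_work: float, tasks: int, overhead_per_task: float) -> float:
    """Fraction of processor time spent on useful work.

    Equals grain / (grain + overhead), where grain = total_work / tasks.
    """
    return speedup(total_work, tasks, overhead_per_task) / tasks


if __name__ == "__main__":
    work, overhead = 1_000_000.0, 1_000.0
    for tasks in (10, 1_000, 100_000):
        print(tasks, speedup(work, tasks, overhead),
              efficiency(work, tasks, overhead))
```

In this model the speedup never exceeds total_work / overhead_per_task however many tasks are used, and efficiency falls to one half once the grain size shrinks to the overhead.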

22
(No Transcript)
23
(Diagram: SIDE VIEW; labels: 4 K 50 W, 77 K, Fiber/Wire Interconnects; dimensions 0.3 m, 0.5 m, 1 m, 1.4 m, 3 m)
24
HTMT Percolation Model
CRYOGENIC AREA
DMA to CRAM
Split-Phase Synchronization to SRAM
done
start
C-Buffer
I-Queue
A-Queue
Parcel Invocation & Termination
Parcel Assembly & Disassembly
Parcel Dispatcher & Dispenser
Re-Use
T-Queue
D-Queue
Run Time System
SRAM-PIM
DMA to DRAM-PIM
25
Getting Efficiency
  • Contention
      • hardware for bandwidth, logic throughput,
        hardware arbitration
  • Latency
      • multithreaded processor with hardware context
        switching
      • percolation for proactive prestaging of
        executables
      • PIM-DRAM and PIM-SRAM provide smart data-oriented
        mechanisms
  • Overhead
      • hardware context switching
      • in PIM smart synchronization and context
        management
      • proactive percolation performed in PIM
  • Starvation
      • dynamic load balancing
      • high speed processor for reduced parallelism
      • expose/exploit fine grain parallelism
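
How multithreading with cheap context switches hides latency can be sketched with a standard textbook-style utilization model. This is not the HTMT design itself: it assumes every thread alternates a fixed run length with a fixed memory latency and that switching costs a fixed number of cycles (zero standing in for hardware context switching). The parameter names and values are hypothetical.

```python
def utilization(threads: int, run: float, latency: float,
                switch: float = 0.0) -> float:
    """Processor utilization in a simple multithreading model.

    Each thread computes for `run` cycles, then waits `latency` cycles.
    Below saturation, utilization grows linearly with the thread count;
    above it, only the switch cost keeps the processor from being busy.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    linear = threads * run / (run + switch + latency)
    saturated = run / (run + switch)
    return min(linear, saturated)


if __name__ == "__main__":
    # Zero switch cost models hardware context switching.
    for n in (1, 5, 10, 20):
        print(n, utilization(n, run=10, latency=90))
    # The same workload with a 10-cycle software switch saturates lower.
    print(utilization(100, run=10, latency=90, switch=10))
```

With free switches, utilization is limited only by having enough threads to cover the latency. This is one way to read the slide's pairing of hardware context switching with exposing fine grain parallelism.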

26
Petaflops I Group Photograph