Title: A New Era for Computational Science
1 A New Era for Computational Science
- NPACI Parallel Computing Institute
- August 28, 2000
- Sid Karin
- Director, NPACI/SDSC skarin_at_sdsc.edu
2 SDSC
A National Laboratory for Computational Science and Engineering
Leading-Edge Site for NPACI
3 Continuing Evolution
[Diagram, timeline from 1985 to 2000. Labels: SDSC; NPACI; Resources; Education, Outreach, and Training; Technology and applications thrusts; Enabling technologies; Applications; Individuals; Partners]
4 A Distributed National Laboratory for Computational Science and Engineering
5 NPACI is a Highly Leveraged National Partnership of Partnerships
- 46 institutions
- 20 states
- 4 countries
- 5 national labs
- Many projects
- Vendors and industry
- Government agencies
6 Accelerate Scientific Discovery
Mission
- Through the development and implementation of computational and computer science techniques
- By creating a ubiquitous, continuous, and pervasive national infrastructure: the grid
7 Changing How Science is Done
Vision
- Collect data from digital libraries, laboratories, and observation
- Analyze the data with models run on the grid
- Visualize and share data over the Web
- Publish results in a digital library
8 Embracing the Scientific Community
Goals: Fulfilling the Mission
- Capability Computing
  - Provide compute and information resources of exceptional capability
- Discovery Environments
  - Develop and deploy novel, integrated, easy-to-use computational environments
- Computational Literacy
  - Extend the excitement, benefits, and opportunities of computational science
9 Partnership Organizing Principle: Thrusts
- Computational Literacy: EOT
- Discovery Environments: APPLICATIONS (Molecular Science, Neuroscience, Earth Systems Science, Engineering)
- Discovery Environments: TECHNOLOGIES (Metasystems, Programming Tools and Environments, Data-Intensive Computing, Interaction Environments)
- Capability Computing: RESOURCES
10 Projects Meld Applications and Technology
[Diagram labels: Brain databases; Data-Intensive Computing, Neuroscience; Metasystems and Parallel Tools, Engineering]
11 Leadership Team
- Susan Graham, UC Berkeley: Chief Computer Scientist (graham_at_cs.berkeley.edu)
- Peter Taylor, SDSC: Chief Applications Scientist (taylor_at_sdsc.edu)
- Wayne Pfeiffer, SDSC: Deputy Director (pfeiffer_at_sdsc.edu)
- Greg Moses, U Wisconsin: Education, Outreach, and Training Leader (moses_at_engr.wisc.edu)
- Sid Karin, SDSC: Director (skarin_at_sdsc.edu)
- Peter Arzberger, SDSC: Executive Director (parzberg_at_sdsc.edu)
- Paul Messina, Caltech: Chief Architect (on leave)
12 NPACI Executive Committee
- Andrew Grimshaw, U Virginia: Metasystems
- Joel Saltz, U Maryland: Programming Tools and Environments
- Reagan Moore, SDSC: Data-Intensive Computing
- Arthur Olson, TSRI: Interaction Environments
- William Martin, U Michigan: Resource Representative
- Russ Altman, Stanford U: Molecular Science
- Mark Ellisman, UCSD: Neuroscience
- Bernard Minster, UCSD (SIO): Earth Systems Science
- Tinsley Oden, U Texas (TICAM): Engineering
- James Pool, Caltech: Resource Representative
- Aron Kuppermann, Caltech: User Representative
13 NPACI Oversight
- Institutional Oversight Board (IOB)
- External Visiting Committee (EVC)
- Director's Advisory Committee (DAC)
- Users Advisory Committee (UAC)
- Executive Committee
- Leadership Team
- Resource Partner Representatives
- Applications Thrust Leaders
- Technologies Thrust Leaders
14 Budget Balance
[Chart labels: SDSC; Partners]
15 Complementary roles of five compute resource sites
- Leading-edge site (SDSC)
  - Very high-performance resources
  - IBM SP teraflops system
- Mid-range sites (U Texas, U Michigan)
  - Smaller systems compatible with LES
  - Support for applications with limited scalability, large-memory jobs, application development, OS testing, and education
- Alternate architecture research systems
  - Caltech, UC Berkeley, SDSC
  - Support for leading-edge applications, thrusts, and evaluation
16 Leading-Edge Site Supercomputer Roadmap
1 TFLOPS IBM SP, 1999
17 NPACI's balanced complement of high-end resources for 2000
- Compute resources (SDSC + 4 partners)
  - IBM SP Teraflops system at SDSC
  - Complementary systems at partner sites
- Data resources (SDSC + 10 partners)
  - >180 TB mass store at SDSC
  - >100 GB data sets at partner sites
- Network resources (SDSC + all partners)
  - >100 Mbps access to compute and data resources
  - Communications backbone for metacomputing
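To give a feel for how the data and network figures on this slide relate, the following is a rough back-of-the-envelope sketch rather than a measured result. It takes the slide's lower bounds (100 GB data set, 180 TB mass store, 100 Mbps access) as exact values and assumes decimal units, a fully utilized link, and no protocol overhead; the function name `transfer_seconds` is purely illustrative.

```python
# Rough transfer-time estimate from the slide's figures.
# Assumptions (not from the slide): decimal units, link fully
# utilized, no protocol overhead.

GB = 10**9      # bytes in a gigabyte (decimal)
TB = 10**12     # bytes in a terabyte (decimal)
MBPS = 10**6    # bits per second in one megabit per second


def transfer_seconds(size_bytes: int, link_bits_per_second: int) -> float:
    """Ideal time to move size_bytes over a link of the given rate."""
    return size_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    one_dataset = transfer_seconds(100 * GB, 100 * MBPS)
    mass_store = transfer_seconds(180 * TB, 100 * MBPS)
    print(f"100 GB at 100 Mbps: {one_dataset / 3600:.1f} hours")
    print(f"180 TB at 100 Mbps: {mass_store / 86400:.1f} days")
```

Under these assumptions a single 100 GB data set takes a couple of hours to move, while the full mass store would take months, which is one way to read the slide's emphasis on a communications backbone alongside the storage.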
18 IBM Selected as First NPACI Teraflops Vendor
- Strong commitment to high end by IBM
- Technology being developed through ASCI
- SDSC has largest system in US academia
- Growing partnership with IBM
19 1st Teraflops System for US Academia
Nov 1999
- 1 TFLOPs IBM SP
- 144 8-processor compute nodes
- 12 2-processor service nodes
- 1,176 Power3 processors at 222 MHz
- >640 GB memory (4 GB/node), upgrade to >1 TB later
- 6.8 TB switch-attached disk storage
- Largest SP with 8-way nodes
- High-performance access to HPSS
- Trailblazer switch interconnect with subsequent
upgrade
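The processor count and the "1 TFLOPs" label on this slide can be cross-checked with simple arithmetic. The sketch below uses the node counts and clock rate from the slide; the figure of 4 floating-point operations per cycle per Power3 processor (two fused multiply-add units) is an assumption that does not appear on the slide, and the function names are illustrative.

```python
# Cross-check of the slide's processor count and peak rating.
# Node counts and clock rate are taken from the slide.
COMPUTE_NODES = 144
CPUS_PER_COMPUTE_NODE = 8
SERVICE_NODES = 12
CPUS_PER_SERVICE_NODE = 2
CLOCK_HZ = 222e6

# Assumption (not on the slide): each Power3 processor can complete
# 4 floating-point operations per cycle (two fused multiply-add units).
FLOPS_PER_CYCLE = 4


def total_processors() -> int:
    """Compute-node processors plus service-node processors."""
    return (COMPUTE_NODES * CPUS_PER_COMPUTE_NODE
            + SERVICE_NODES * CPUS_PER_SERVICE_NODE)


def peak_tflops(processors: int) -> float:
    """Theoretical peak in TFLOPS for the given processor count."""
    return processors * CLOCK_HZ * FLOPS_PER_CYCLE / 1e12


if __name__ == "__main__":
    compute_cpus = COMPUTE_NODES * CPUS_PER_COMPUTE_NODE
    print("total processors:", total_processors())
    print(f"peak on compute nodes: {peak_tflops(compute_cpus):.3f} TFLOPS")
```

With these inputs the total comes to 1,176 processors, matching the slide, and the 1,152 compute-node processors alone give a theoretical peak just over 1 TFLOPS. This is a peak figure only; sustained application performance would be lower.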
20 Current Large SP Allocations
- Fundamental Physics
  - T. Kinoshita, Cornell University
  - R. Sugar, UC Santa Barbara
- Ab initio Biochemistry
  - H. Scheraga, Cornell University
  - A. McCammon, UC San Diego
  - M. Klein, Univ. of Pennsylvania
  - M. Gordon, Iowa State University
- Biomedicine
  - A. Garfinkel, UCLA
  - B. Pettitt, University of Houston
- Materials Science
  - F. Abraham, IBM Almaden
  - J. Kim, Ohio State University
- Fluid Dynamics
  - K. Gubbins, Cornell University
  - J. Kim, UCLA
  - G. Karniadakis, Brown University
- Astrophysics
  - P. Hauschildt, Univ. of Georgia
  - J. Raeder, UCLA
  - M. Ashour-Abdalla, UCLA
21 NPACI alpha projects
- Bioinformatics Infrastructure for Large-Scale Analyses
- Protein Folding in a Distributed Computing Environment
- Telescience for Advanced Tomography Applications
- Multi-Component Models for Energy and the Environment
- Scalable Visualization Toolkits for Bays to Brains
22 Bioinformatics Infrastructure for Large-Scale Analyses
- Next-generation tools for accessing, manipulating, and analyzing biological data
- Russ Altman, Stanford University
- Reagan Moore, SDSC
- Analysis of Protein Data Bank, GenBank and other databases
- Accelerate key discoveries for health and medicine
23 Protein Folding in a Distributed Computing Environment
- Simulating protein movement governing reactions within cells
- Andrew Grimshaw, U Virginia
- Charles Brooks, The Scripps Research Institute
- Bernard Pailthorpe, UCSD/SDSC
- Computationally intensive
- Distributed computing power from Legion
24 Telescience for Advanced Tomography Applications
- Integrates remote instrumentation, distributed computing, federated databases, image archives, and visualization tools
- Mark Ellisman, UCSD
- Fran Berman, UCSD
- Carl Kesselman, USC
- 3-D tomographic reconstruction of biological
specimens
25 Multi-Component Modeling for Energy and the Environment
- Simulating contaminant movement through ecosystems
- Leaders: Joel Saltz, U Maryland and Johns Hopkins U; Mary Wheeler, U Texas
- Will assist environmental cleanup efforts and strategies
- Engineering and environmental models linked through metasystems and data manipulation tools
26 Scalable Visualization Toolkits
- Vast data collections and large-scale simulations require scalable visualization tools
- Art Olson, The Scripps Research Institute
- Bernard Pailthorpe, SDSC/UCSD
- Art Toga, UCLA
- Carl Wunsch, MIT
- 3-D reconstruction, time-dependent modeling
27 Examples of Additional Projects
- NPACI and SDSC activities
28 MICE: Transparent Supercomputing
- Molecular Interactive Collaborative Environment
- Gallery allows researchers and students to search for, visualize, and manipulate molecular structures
- Integrates key SDSC technological strengths
- Biological databases
- Transparent supercomputing
- Web-based Virtual Reality Modeling Language
29 The Protein Data Bank
- World's single scientific resource for depositing and searching protein structures
- Protein structure data growing exponentially
- 10,500 structures in PDB today
- 20,000 by the year 2001
- Vital to the advancement of biological sciences
- Working towards a digital continuum from primary data to final scientific publication
- Capture of primary data from high-energy synchrotrons (e.g. Stanford Linear Accelerator Center) requires 50 Mbps network bandwidth
1CD3: The PDB's 10,000th structure.
30 New Mode of Visualization
- Network-accessible TeleManufacturing
- 3-D hardcopy for visualization
- Used by many disciplines
- Molecules to Hurricanes
- Death Valley to Venus
- Riemann Zeta Function to Ozone Hole
31 Digital Galaxy
- Collaboration with Hayden Planetarium
- American Museum of Natural History
- Support from NASA
- Linking SDSC's mass storage to Hayden Planetarium requires 155 Mbps
- MPIRE Galaxy Renderer
- Scalable volume visualization
- Linked to database of astronomical objects
- Produces translucent, filament-like objects
- An artificial nebula, modeled after a planetary
nebula
32 The Digital Sky
- Billions of objects can be detected with optical, infrared, and radio telescopes
- Tens of terabytes of image and catalog data
- Digital Sky federating four sky surveys to allow multi-wavelength studies across the data sets
- DPOSS, 2MASS, NVSS, FIRST
- Tom Prince, Caltech, leading federation effort
- Uses MIX, SDSC SRB, and NPACI mass storage systems
A globular cluster from the DPOSS archive. Such
clusters provide a minimum age for the universe.
Image by Thomas Handley, Caltech.
33 Looking out for San Diego's Regional Ecology
- Unique partnership
- 31 federal, state, regional, and local agencies
- John Helly, et al., SDSC
- Combines technologies and multi-agency data
- Sensing, analysis, VRML
- Physical, chemical, and biological data
- Web-based tool for science and public policy
34 AMICO: The Art of Managing Art
- Art Museum Image Consortium (AMICO)
- 28 art museums working toward educational use of digital multimedia
- Launch of the AMICO Library includes more than 50,000 works of art
- AMICO, CDL, SDSC
- XML information mediation
- SDSC SRB data management
- Links between images, scholarly research,
educational material
35 Mapping the Net's Terra Incognita
Nature Web Matters, 1/7/99; Science, 10/16/98
36 This is Only the Beginning...
[Chart labels: YOU ARE HERE; axis: TIME]