Title: Building a National Cyberinfrastructure: Exploring the Landscape
- Miriam Heller, Ph.D.
- National Science Foundation
- Office of Cyberinfrastructure
NSF EPSCoR National Workshop on Cyberinfrastructure, Nashville, TN, May 11, 2006
Slide 2: Topics
- CyberInfrastructure (CI) at NSF: Then and Now
- Strategic Planning: Setting Directions
- OCI Investments: Now and Later
- Closing Remarks
Slide 3: NSF Cyberinfrastructure Investments
(Figure: NSF CI investment programs, including ANIR)
Slide 4: CI Genealogy Movement
(Figure: program genealogy, including HPCC and PACI)
Courtesy of D.E. Atkins
Slide 5: CI Genealogy Movement
- Atkins report: blue-ribbon panel, chaired by Daniel E. Atkins
- Called for a national-level, integrated system of hardware, software, data resources and services
- New infrastructure to enable new paradigms of science and engineering research and education with increased efficiency
www.nsf.gov/od/oci/reports/toc.jsp
Courtesy of D.E. Atkins
Slide 6: CI Genealogy Movement
Courtesy of D.E. Atkins
Slide 7: Office of CyberInfrastructure (OCI)
- Debra Crawford, Office Director (Acting); José Muñoz, Deputy Office Director
- Staff: Judy Hayden, Priscilla Bezdek, Mary Daley, Irene Lombardo, Allison Smith
- Program Directors: Kevin Thompson, Guy Almes, Fillia Makedon, Doug Gatchell, Miriam Heller, Steve Meacham, Chris Greer, Frank Scioli, Vittal Rao
- Portfolio (org chart): ETF GIG; resource providers (ANL, IU, PU, ORNL, TACC, NCSA, SDSC, PSC); NCSA and SDSC core; MRI; REU Sites; EIN; IRNC; Condor; NMI integration and development; OptIPuter; CI-TEAM; EPSCoR; GriPhyN; DISUN; CCG; nanoHUB; HPC acquisition; STI; cybersecurity; data and data tools; POCs for ENG, SBE, and BIO
Slide 8: Cyberinfrastructure Budgets
- NSF FY2006 CI budget (chart): OCI 25%, research directorates 75%
- OCI FY2006 budget: $127M (chart: ETF and core, with a $56M segment)
- FY2007 request: $182.42M
Slide 9: Burgeoning Number of CI Systems
Slide 10: Cyberinfrastructure (CI) Governance
- OCI focuses on provisioning production-quality CI to enable 21st-century research and education breakthroughs
- CISE remains focused on its basic CS research and education mission
- CyberInfrastructure Council (CIC)
  - NSF ADs and ODs, chaired by Dr. Bement (NSF Director)
  - CIC responsible for shared stewardship and ownership of NSF's cyberinfrastructure portfolio
- Advisory Committee for NSF's CI activities and portfolio
- Cyberinfrastructure User Advisory Committee (CUAC)
Slide 11: NSF Will Play a Leadership Role
- "NSF will play a leadership role in the development and support of a comprehensive cyberinfrastructure essential to 21st century advances in science and engineering research and education."
- "NSF is the only agency within the U.S. government that funds research and education across all disciplines of science and engineering. ... Thus, it is strategically placed to leverage, coordinate and transition cyberinfrastructure advances in one field to all fields of research."
- From "NSF Cyberinfrastructure Vision for 21st Century Discovery"
Slide 12: CI Vision: 4 Interrelated Perspectives
- High Performance Computing
- Data, Data Analysis and Visualization
- Virtual Organizations
- Learning and Workforce Development
http://www.nsf.gov/od/oci/ci_v5.pdf
Slide 13: Cyberinfrastructure Today: the Overarching Challenge is "Distributed"
- Distributed resources
  - Compute engines (e.g. TeraGrid, Condor CPU farms, SETI@home)
  - Pooled research funds (e.g. international collaborations)
- Distributed data
  - Massive, distributed over real and socio-metric spaces
  - Dynamic generation at the periphery of the internet (sensor-nets, RFID, ...)
- Distributed users
  - Extensive, networked teams spanning the globe
- Yet all must work as an integrated, holistic system
Slide 14: Science-Driven Cyberinfrastructure
- Trade-offs
  - Interconnect fabric
  - Processing power
  - Memory
  - I/O
Courtesy of NERSC
Slide 15: Computing: One Size Doesn't Fit All
Courtesy of SDSC
Slide 16: HPC Computation: TeraGrid
- Provides
  - Unified user environment to support high-capability, production-quality cyberinfrastructure services for science and engineering research
  - New S&E opportunities using new ways to distribute resources and services
- Integrates grid services, incl.
  - HPC
  - Data collections
  - Visualization servers
  - Portals
- Distributed, open architecture
- GIG responsible for
  - SW integration (incl. CTSS)
  - Base infrastructure (security, networking, and operations)
  - User support
  - Community engagement (e.g. Science Gateways)
- 8 RPs: PSC, TACC, NCSA, SDSC, ORNL, Indiana, Purdue, Chicago/ANL
- Other institutions participate as sub-awardees of the GIG
Slide 17: Grand-Challenge Problem: Human Arterial Tree
Wave propagation in a model of the arterial circulation (data on 55 main arteries from J.J. Wang and K. Parker, 1997)
Courtesy of G. Karniadakis
Slide 18: Second Parallel TeraGrid Paradigm: Multiscale Simulation of Arterial Tree
Courtesy of G. Karniadakis
Slide 19: Arterial-Tree Cross-Site Performance (Homogeneous Network)
(Plots: fixed problem size and fixed workload scaling)
- Three arteries, 4 million DOFs per artery
- 1 CPU/node on ANL; 2 CPUs/node on NCSA/SDSC
- No slow-down; full scalability
Courtesy of G. Karniadakis
Slide 20: Arterial-Tree Cross-Site Performance (Heterogeneous Network)
- PSC connects to TeraGrid via application gateway
- Two arteries per site
- PSC processor: 2 GF vs. 6 GF IA-64
Courtesy of G. Karniadakis
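The "full scalability" claim on these benchmark slides is simply parallel efficiency staying at 1.0 as processors are added for a fixed problem size. A minimal sketch of that check (the timings below are illustrative, not the actual arterial-tree data):

```python
# Strong-scaling check: compare measured runtimes against ideal speedup
# for a fixed problem size. Timings are hypothetical placeholders.

def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup relative to the baseline (serial) run."""
    return t_serial / t_parallel

def efficiency(t_serial: float, t_parallel: float, n_procs: int) -> float:
    """Parallel efficiency: 1.0 means full scalability, no slow-down."""
    return speedup(t_serial, t_parallel) / n_procs

# Illustrative timings (seconds) for 1, 2, 4 processors on a fixed workload.
timings = {1: 100.0, 2: 50.0, 4: 25.0}
base = timings[1]
for p, t in timings.items():
    print(p, efficiency(base, t, p))  # efficiency 1.0 at every count
```

An efficiency below 1.0 at higher processor counts would indicate communication or load-imbalance overhead, which is exactly what the heterogeneous-network slide probes.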
Slide 21: Data, Data Analysis and Visualization
- Data curation issues
  - Avoid creation of data mortuaries
  - Catalyze development of S&E data collections that are open, extensible, evolvable
  - Challenges of petascale
  - Substantial issues in policy and sustainable economics
- Support development of a new generation of tools and services to facilitate transforming data into knowledge
  - Data mining
  - Integration
  - Analysis
  - Visualization
Slide 22: Large Hadron Collider (LHC) Experiment
- 27 km tunnel spanning Switzerland and France
(Figure: LHC @ CERN, with detectors CMS, TOTEM, ALICE, LHCb, ATLAS)
- Search for
  - Origin of mass
  - New fundamental forces
  - Supersymmetry
  - Other new particles
- 2007?
Courtesy of P. Avery
Slide 23: LHC: Petascale Global Science
- Complexity: millions of individual detector channels
- Scale: PetaOps (CPU), 100s of petabytes (data)
- Distribution: global distribution of people and resources
- CMS example (2007): 5000 physicists, 250 institutes, 60 countries
- BaBar/D0 example (2004): 700 physicists, 100 institutes, 35 countries
Courtesy of P. Avery
Slide 24: LHC: Beyond Moore's Law
(Chart: LHC CPU requirements vs. Moore's Law (2000))
Courtesy of P. Avery
Slide 25: LHC Global Data Grid (2007)
- CMS Experiment: 5000 physicists, 60 countries
- 10s of petabytes/yr by 2008; 1000 petabytes in < 10 yrs?
- Tiered architecture (diagram):
  - Online System → CERN Computer Center (Tier 0) at 150-1500 MB/s
  - Tier 0 → Tier 1 at 10-40 Gb/s
  - Tier 1 → Tier 2 at > 10 Gb/s
  - Tier 2 → Tier 3 at 2.5-10 Gb/s
  - Tier 3 → Tier 4 (physics caches, PCs)
Courtesy of P. Avery
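A quick sanity check on these tier bandwidths: the sustained network rate needed to move a given yearly data volume. A small sketch using only the figures on this slide (the 10 PB/yr estimate for 2008):

```python
# Back-of-envelope: average rate (Gb/s) needed to ship a yearly data
# volume continuously. Inputs come from the tier diagram above.

SECONDS_PER_YEAR = 365 * 24 * 3600

def sustained_gbps(petabytes_per_year: float) -> float:
    """Sustained rate in Gb/s to move the yearly volume around the clock."""
    bits = petabytes_per_year * 1e15 * 8  # 1 PB = 1e15 bytes, 8 bits/byte
    return bits / SECONDS_PER_YEAR / 1e9

# 10 PB/yr needs roughly 2.5 Gb/s sustained, comfortably within the
# 10-40 Gb/s Tier 0 -> Tier 1 links shown in the diagram.
print(round(sustained_gbps(10), 2))
```

The headroom between the ~2.5 Gb/s average and the provisioned 10-40 Gb/s links is what absorbs bursts and reprocessing campaigns.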
Slide 26: Grid3: A National Grid Infrastructure
- October 2003 - July 2005
- 32 sites, 3,500 CPUs; universities and 4 national labs
- Sites in US, Korea, Brazil, Taiwan
- Applications in HEP, LIGO, SDSS, genomics, fMRI, CS
www.ivdgl.org/grid3
Courtesy of P. Avery
Slide 27: Common Middleware: Virtual Data Toolkit (VDT)
- Build-and-test pipeline (diagram): sources (CVS) → NMI build and test on a Condor pool spanning 22 operating systems → binaries → packaging and patching (Pacman cache, RPMs, GPT source bundles) → tested releases
- Many contributors
- VDT: package, test, deploy, support, upgrade, troubleshoot
Courtesy of P. Avery
Slide 28: Berkeley Open Infrastructure for Network Computing
- BOINC lets you donate computing power to scientific research projects such as:
  - SETI@home: search for extra-terrestrial intelligence
  - Rosetta@home: help researchers develop cures for human diseases
  - Einstein@home: search for gravitational signals emitted by pulsars
  - Quantum Monte Carlo at Home: study the structure and reactivity of molecules using quantum chemistry
  - folding@home: advance our knowledge of human disease (requires 5.2.1 or greater)
  - Predictor@home: investigate protein-related diseases
  - Cell Computing: biomedical research (Japanese; requires nonstandard client software)
  - SZTAKI Desktop Grid: search for generalized binary number systems
  - Climateprediction.net, BBC Climate Change Experiment, and Seasonal Attribution Project: study climate change
  - LHC@home: improve the design of the CERN LHC particle accelerator
Slide 29: NSF Middleware Initiative (NMI)
- Program to design, develop, test, deploy, and sustain a set of reusable and expandable middleware functions that benefit many science and engineering applications in a networked environment
- Define open-source, open-architecture standards for on-line (international) collaboration and resource sharing that are sustainable, scalable, and securable
- Examples include:
  - Community-wide access to experimental data on the Grid
  - Authorized resource access across multiple campuses using common tools
  - Web-based portals that provide a common interface to wide-ranging Grid-enabled computation resources
  - Grid access to instrumentation such as accelerators and telescopes
Slide 30: Examples of NMI-Funded Activities
- From 2001-2004, funded > 40 development and integration awards
- Integration award highlights include the NMI Grids Center (e.g. Build and Test), Campus Middleware Services (e.g. Shibboleth), and nanoHUB
- Condor: mature distributed computing system installed on 1000s of CPU pools and 10s of 1000s of CPUs
- GridChem: open-source Java application that launches and monitors computational chemistry calculations (Gaussian03, GAMESS, NWChem, Molpro, Q-Chem, ACES) on CCG supercomputers remotely
- nanoHUB: extends NSF Network for Computational Nanotechnology applications, e.g. NEMO3D, nanoMOS, to a distributed environment over TeraGrid, U Wisconsin, and other grid assets using In-VIGO, Condor-G, etc.
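Condor jobs like those above are driven by submit description files; a generic sketch follows (the executable, file names, and queue count are hypothetical, not taken from any NMI award):

```
# Hypothetical Condor submit description file: run 10 instances of a
# simulation executable across a pool of machines.
universe   = vanilla
executable = simulate
arguments  = --input data.$(Process)
output     = simulate.$(Process).out
error      = simulate.$(Process).err
log        = simulate.log
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
queue 10
```

Submitted with `condor_submit`, the `$(Process)` macro expands to 0-9, giving each queued job its own input and output files while sharing one log.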
Slide 31: Virtual Organizations
- CI relaxes/reduces constraints of time and distance
  - Geographical
  - Institutional
  - Disciplinary
- Flattens the world
- Functionally complete VOs
  - One-stop shopping
  - Via tailorable portals/gateways
Slide 32: Virtual Organizations
- Catalyze the development and sustainability of world-class virtual organizations for S&E research and education
  - Secure, efficient, reliable, accessible, usable, ...
- Support the development of CI resources, services and tools for creation and operation of VOs (national and international)
- Sustainable framework of resources supported by many stakeholders
  - Federal, state, academia, public, commercial, non-profit
- Culture of sharing
Slide 33: nanoHUB: A Science Gateway
(Diagram: the nanoHUB virtual organization linking research, education, and collaboration over enabling CI; web presence, Rappture middleware, single ZIP package)
Slide 34: Everything, Everywhere
"Tiny computers that constantly monitor ecosystems, buildings and even human bodies could turn science on its head. Declan Butler investigates."
Slide 35: Cyberinfrastructure Tipping Point: Information Flow Reversal
Courtesy of S. Kim
Slide 36: CyberInfrastructure Evolution: Meeting the Challenge of Distributed Systems
- CI is evolving rapidly to meet the challenge of distributed resources, data and users.
- Middleware is key to managing extensive, complex distributed entities.
- New linkages are needed between the cyber and physical worlds to extend the magic of the "search, analyze and act" paradigm beyond digital files.
Slide 37: DDDAS (Dynamic Data Driven Applications Systems) Paradigm
- Challenges: dynamic compilers, application composition, algorithms, computing systems support
Slide 38: International Research Network Connections (IRNC)
- Awards
  - TransPAC2 (U.S. - Japan and beyond)
  - GLORIAD (U.S. - China - Russia - Korea)
  - TransLight/PacificWave (U.S. - Australia)
  - TransLight/StarLight (U.S. - Europe)
  - WHREN (U.S. - Latin America)
Slide 39: Learning and Our 21st Century CI Workforce: CI-TEAM Demonstration Projects
- Input: 70 projects / 101 proposals / 17 (24%) collaborative projects
- Outcomes
  - Invested $2.67M in awards for projects up to $250K total over 1-2 years
  - 15.7% success rate: 11 Demonstration Projects (14 proposals) across BIO, CISE, EHR, ENG, GEO, MPS disciplines
- Broadening participation for CI workforce development
  - Alvarez (FIU): CyberBridges
  - Crasta (VaTech): Project-Centric Bioinformatics
  - Fortson (Adler): CI-Enabled 21st c. Astronomy Training for HS Science Teachers
  - Fox (IU): Bringing Minority Serving Institution Faculty into CI e-Science Communities
  - Gordon (OSU): Leveraging CI to Scale Up a Computational Science U/G Curriculum
  - Panoff (Shodor): Pathways to CyberInfrastructure through Computational Science
  - Takai (SUNY Stony Brook): High School Distributed Search for Cosmic Rays (MARIACHI)
- Developing and implementing CI resources for CI workforce development
  - DiGiano (SRI): Cybercollaboration Between Scientists and Software Developers
  - Figueiredo (UFl): In-VIGO/Condor-G Middleware for Coastal and Estuarine Science CI Training
  - Regli (Drexel): CI for Creation and Use of Multi-Disciplinary Engineering Models
  - Simpson (PSU): CI-Based Engineering Repositories for Undergraduates (CIBER-U)
Slide 40: FY05 CI-TEAM CIBER-U Demonstration Project: Community Databases for Research and Education
Simpson (PSU), Regli (Drexel), Stone (UMo), Lewis (SUNY Buffalo)
- National Design Repository (NDR)
  - Digital library of over 55,000 CAD models and assemblies
  - Serves over 1,000 users per month
- Digital library in use for research will prepare engineering undergraduates for collaborative design and expose HS students to CAD/CAE
- Implement CIBER-U in 7 U/G engineering design courses to teach access, storage, search, and reuse of CAD models and data from NDR
- Enhance and use collaboration tools in NDR
- Assess educational experience, learning, and practical knowledge impact
- 1700 undergraduate students will participate in CIBER-U
- Prepare workforce for the distributed, technology-mediated environment preferred by automotive and aerospace industries today
Slide 41: Spiral Design
(Diagram: spiral design process for the CI Vision document: community input and the NSB call to action feeding chapters on HPC, Data/Viz, and VO and LWD; "we are here" marker; NSF directorates and offices share stewardship of CI activities and of OCI's role and structure)
- Final version to be released Summer 2006
Slide 42: OCI Investment Highlights
- Midrange (Track 2) HPC acquisition ($30M)
- Leadership-Class High-Performance Computing System (Track 1) acquisition ($50M)
- Data- and Collaboration-Intensive Software Services ($25.7M)
  - Conduct applied research and development
  - Perform scalability/reliability tests to explore tool viability
  - Develop, harden and maintain software tools and services
  - Provide software interoperability
- CI Training, Education, Advancement and Mentoring ($10M)
Slide 43: Acquisition Strategy
(Chart: science and engineering capability, logarithmic scale, vs. fiscal year FY06-FY10)
- Track 1 system(s)
- Track 2 systems
- Typical university HPC systems
Slide 44: HPC Acquisition Activities
- HPC acquisition will be driven by the needs of the S&E community
- RFI held for interested Resource Providers and HPC vendors on 9 Sep 2005
- First in a series of HPC S&E requirements workshops held 20-21 Sep 2005
  - Generated Application Benchmark Questionnaire
  - Attended by 77 scientists and engineers
- Proposal review nearly complete
Slide 45: HPC Acquisition: Track 1
- Increased funding will support the first phase of a petascale system acquisition
- Over four years, NSF anticipates investing $200M
- Acquisition is critical to NSF's multi-year plan to deploy and support a world-class HPC environment
- Collaborating with sister agencies with a stake in HPC (DARPA, HPCMOD, DOE/OS, NNSA, NIH)
Slide 46: Cyberinfrastructure Training, Education, Advancement, and Mentoring (CI-TEAM) FY06
- Prepare a cyber-savvy workforce for discovery, innovation, and learning in and across all areas of science and engineering.
- Exploit cyberinfrastructure to cross digital, disciplinary, institutional, and geographic divides and foster inclusion, with emphasis on traditionally underrepresented groups.
- Focus on workforce development activities; < 50% tool development.
- FY06 program funds: $10M for two types of awards
  - Demonstration Projects (like the FY05 projects: exploratory, limited in scope and scale, with potential to expand to implementation scale; up to $250,000)
  - Implementation Projects (larger in scope or scale; draw on prior experience; deliver sustainable learning and workforce development activities that complement ongoing NSF investment in cyberinfrastructure; up to $1,000,000)
New CI-TEAM Solicitation due June 5, 2006
Slide 47: OCI as a Broker of Informed Mutual Self-Interest
- Borromean ring synergy: iterative, participatory design and collateral learning.
- Three symmetric, interlocking rings, no two of which are interlinked. Removing one destroys the synergy.
Slide 48: Alignment of Stakeholders Towards Achieving Strategic Goals
(Figure: stakeholders, including K-20)
Slide 49: Concluding Remarks
- NSF has taken a leadership role in CI and is working to define the vision and future directions
- Successful past investments position CI for the revolution
- Achieving the goal of provisioning CI for 21st-century breakthrough science and engineering research and education depends on successful investment in the development and deployment of useful, appropriate, usable, used, and sustainable CI resources, tools, and services, complemented by investment in a cyber-savvy workforce
- Need PIs to
  - Identify and advise NSF and others on CI needs
  - Track growing CI use and performance
  - Demonstrate and share breakthrough research and education
Slide 50: Closing Remarks
- Leadership computing: massive data generation at the periphery of the net, but middleware filters streaming to the core
- End-to-end data solutions / fusing research and learning
- Middleware (and not the CPU) lies at the heart of the emerging cyberinfrastructure paradigm
- Dynamic data and the new (bidirectional) flow from the core to the periphery: it is not just an issue of massive, distributed systems
- New dynamic compilers and application composition
- Architecturally specific / robust algorithms
Slide 51: Thank You!
- Miriam Heller, Ph.D.
- Program Director
- Office of Cyberinfrastructure
- Suite 1145
- National Science Foundation
- Tel: +1.703.292.7025; Email: mheller@nsf.gov
Slide 52: nanoHUB Enhanced Research
Slide 53: nanoHUB Enhanced Computation
- Remote access to simulators and compute power
(Diagram: nanoHUB.org web site → internet → nanoHUB infrastructure and NMI cluster; remote desktop via VNC)
Slide 54: nanoHUB Enhanced Collaboration
(Diagram: single sign-on, auto-registration to a generic course roster, assessment hosting, score reporting)
- Integration code for the nanoHUB single sign-on is being tested
- Potential to host learning modules in Sakai
- Ability to report scores from assessments in learning modules is key
- Conduct research studies based on assessment data
Slide 55: Other Middleware Funding
- OCI made a major award in middleware in November 2005 to Foster/Kesselman: "Community Driven Improvement of Globus Software", a $13.3M award over 5 years
- Ongoing funding to Virtual Data Toolkit (VDT) middleware via OCI and MPS OSG activities, including
  - DiSUN: a 5-year, $12M award for computational, storage, and middleware resources at four Tier-2 sites
  - GriPhyN (ITR), which led to iVDGL, VDT, and VDS
- Ongoing funding to VDT middleware via TeraGrid as part of the CTSS
Slide 56: Principles of RFID