Title: Building the TeraGrid
1. Building the TeraGrid
- NPACI All-Hands Meeting
- March 7, 2002
- Carl Kesselman
- Chief Software Architect
- NPACI
- carl_at_isi.edu
2. What Are We Building?
- A big computer: yes
- That would be the "tera" part
- A fast network.
- Good for getting data in and out
- But what about the grid bit?
- Why are we bothering?
3. Computational Science is a Team Sport
- Collaborations within projects
- PI, co-investigators, students
- National and international collaborations
- Southern California Earthquake Center (SCEC)
- NPACI !!
- Collaborations across disciplines
- Collaboration across resources
- E.g., data federation (National Virtual Observatory, BIRN)
4. TeraGrid Application Exemplars
- Traditional supercomputing made simpler
- remote access to data archives and computers
- Distributed data archive access and correlation
- Remote rendering and visualization
- Remote sensor and instrument coupling
5. TeraGrid Means New Science
Understanding the Functional Organization of the Cricket Sensory System (Gwen Jacobs)
[Figure labels: Stimulus Generation; Data Acquisition; Grid Middleware; Brain Simulation; Signal Analysis]
6. The Grid Problem
- Resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations
7. Putting the Grid in TeraGrid
- Grid software defines DTF as a system
- Heterogeneity and distribution should be an advantage!
- Grid infrastructure
- Intersite security, scheduling, discovery, resource coordination, resource management
- TeraGrid poses unique challenges
- Issues of scale and performance
- Work at NPACI (and Alliance) has positioned us well to address these problems
8. Layered Grid Architecture (By Analogy to Internet Architecture)
[Figure: layered architecture diagram; surviving label: Application]
9. TeraGrid is Not an Island
- Four DTF sites form a core
- Must form an integrated facility
- End-user sites are important components
- Strong need to integrate into other high-end and specialized facilities
- Experimental facilities, PSC, HEP Tier II sites, campus facilities
10. NSF Middleware Initiative
- NSF-funded project to build national middleware infrastructure
- USC/ISI, SDSC, U. Wisc., ANL, NCSA, I2
- Software Integration (NMI Software Releases)
- Interoperability
- Testing
- Install, Configure, Manage
- University Campus Infrastructure Integration
- Campus Authentication / GSI
- Enterprise Directories / GSI and MDS
- Use NMI as the TeraGrid baseline
- Specialize for aspects unique to TeraGrid (e.g., visualization resources)
11. NMI-R1 Software Components
- Globus Toolkit
- Condor-G
- Network Weather Service
- KX.509 / KCA
- Certificate Profile Maker
- Pubcookie
- Grid Packaging Tools
12. NPACI-Specific TeraGrid Software
- Applications via alpha projects
- Already layered on NPACI technologies
- Need to refocus NPACI Grid technologies for TeraGrid
- Build on NMI foundation
- Contribute to NMI foundation
13. Legion-G Architecture
[Figure: Legion-G architecture diagram. Labels: Grid-enabled DFS; CHARMM; LegionFS; Tools for Easily Creating and Invoking Grid Services; Legion GUIs for P-Space Studies; CCA; SRB; Legion-G Grid-Enabled Object Model; NWS; Legion-G Object Process Instantiation; Integration with LegionFS; Security in and across Legion-G Objects; Information Services for Legion-G Objects; GSI; GRAM; GRIP; GridFTP]
14. Immediate Term: Legion-G and the DTF
- Transition existing NPACI Legion use and infrastructure to Legion-G
- Persistent infrastructure and tools for scientific discovery
- Protein Folding on the Grid: CHARMM portal and Legion-G support (with Charles Brooks III, TSRI)
- Exposure of PDB via LegionFS
- Work with scientists to use Legion-G for collaboration in the DTF
15. DataCutter
- Purpose: specialized components for processing data
- Based on Active Disks research (Acharya, Uysal, Saltz, ASPLOS 98)
- filters: logical unit of computation
  - a filter reads data from input buffers, computes/filters/aggregates the data, then writes data to output buffers
  - a filter can only carry out subsetting and commutative/associative data aggregations
  - subsetting is implemented by (among other things) multilevel hierarchical indexes based on the R-tree indexing method
  - filter computations are pipelined
- streams: how filters communicate
  - unidirectional buffer pipes
  - application developer specifies connectivity between filters
- copies: because of the way filters are defined, each filter's computations can be carried out by a system-defined number of transparent copies
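The filter, stream, and copy ideas on this slide can be illustrated with a small Python sketch. It is not the DataCutter API: `run_filter`, `subset`, `partial_sum`, `run_pipeline`, the `END` sentinel, and the use of `queue.Queue` as a stand-in for a unidirectional stream are all assumptions made for illustration. It shows why restricting filters to subsetting and commutative/associative aggregation lets several transparent copies share one input stream without changing the result.

```python
import queue
import threading

END = object()  # end-of-stream marker (an assumption of this sketch)


def run_filter(work, inbox, outbox, n_upstream=1):
    """Generic filter loop: read a buffer, process it, write the result.

    n_upstream is the number of END markers to wait for, i.e. how many
    producers feed this filter's input stream.
    """
    remaining = n_upstream
    while remaining:
        buf = inbox.get()
        if buf is END:
            remaining -= 1
            continue
        outbox.put(work(buf))
    outbox.put(END)


def subset(buf):
    """Subsetting filter: keep only the even values of a buffer."""
    return [x for x in buf if x % 2 == 0]


def partial_sum(buf):
    """Aggregation filter: addition is commutative and associative."""
    return sum(buf)


def run_pipeline(buffers, n_copies=2):
    """source -> n_copies x subset -> partial_sum -> sink."""
    source, mid, sink = queue.Queue(), queue.Queue(), queue.Queue()

    # Transparent copies of the subsetting filter share one input stream.
    threads = [
        threading.Thread(target=run_filter, args=(subset, source, mid))
        for _ in range(n_copies)
    ]
    # A single aggregation filter waits for every upstream copy to finish.
    threads.append(
        threading.Thread(target=run_filter,
                         args=(partial_sum, mid, sink, n_copies))
    )
    for t in threads:
        t.start()

    for buf in buffers:
        source.put(buf)
    for _ in range(n_copies):
        source.put(END)  # one END per subsetting copy

    total = 0
    while True:
        item = sink.get()
        if item is END:
            break
        total += item  # arrival order does not affect the sum

    for t in threads:
        t.join()
    return total
```

Because the final aggregation is order-independent, calling `run_pipeline` with one copy or several returns the same total; only the degree of parallelism changes.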
16. TeraGrid Data Processing Infrastructure: DataCutter/Globus/NWS
- DataCutter
  - filters subset, filter, and aggregate data
- Network Weather Service
  - provides network and processor performance information used in placing filter copies and determining which data replicas to use
- Globus
  - used to run filter code, track the location of data replicas, maintain updated NWS performance information, and record the location of active filter copies
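As a rough illustration of how performance forecasts could guide the choice of data replica, the sketch below picks the replica with the lowest estimated transfer time. It does not call NWS or Globus: the function names, the site names, and the forecast numbers are hypothetical, and the cost model (latency plus size divided by bandwidth) is a simplifying assumption rather than the method used in the project.

```python
def estimated_transfer_seconds(size_mb, bandwidth_mbps, latency_s):
    """Simple cost model (an assumption): latency plus size over bandwidth.

    size_mb is in megabytes, bandwidth_mbps in megabits per second.
    """
    return latency_s + (size_mb * 8.0) / bandwidth_mbps


def pick_replica(size_mb, forecasts):
    """Return the site whose forecast gives the lowest estimated time.

    forecasts maps a site name to (bandwidth_mbps, latency_s); in a real
    deployment these values would come from a forecasting service.
    """
    return min(
        forecasts,
        key=lambda site: estimated_transfer_seconds(size_mb, *forecasts[site]),
    )


# Hypothetical forecasts for two sites holding the same data set.
example_forecasts = {
    "site_a": (100.0, 0.05),  # modest bandwidth, low latency
    "site_b": (400.0, 0.50),  # high bandwidth, high latency
}
```

With these made-up numbers, a small request favors the low-latency site and a large one favors the high-bandwidth site, which is why up-to-date forecasts matter when the choice is made.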
17. Storage Resource Broker
- Current
  - Authentication
    - GSI certificate mapped to local user
    - local user account authenticated to SRB
  - Access
    - GSI-FTP callbacks mapped to SRB client calls
- Future
  - Authentication
    - Pass GSI certificate directly to SRB
  - Access
    - GSI-FTP callbacks mapped to SRB server calls
    - Support for creating/specifying containers, replication, subsetting
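The "GSI certificate mapped to local user" step can be pictured as a lookup from a certificate subject name to a local account, in the style of a Globus grid-mapfile (a quoted subject followed by an account name). The sketch below is illustrative only: the subject names and accounts are invented, and `parse_gridmap` and `local_user_for` are hypothetical helpers, not part of SRB or the Globus Toolkit.

```python
import shlex


def parse_gridmap(text):
    """Parse lines of the form: "<certificate subject>" <local account>."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subject, account = shlex.split(line)[:2]
        mapping[subject] = account
    return mapping


def local_user_for(subject, mapping):
    """Return the mapped local account, or None if the subject is unknown."""
    return mapping.get(subject)


# Invented entries for illustration.
EXAMPLE_GRIDMAP = '''
# subject name -> local account
"/O=Grid/OU=Example/CN=Jane Doe" jdoe
"/O=Grid/OU=Example/CN=John Roe" jroe
'''
```

In the current scheme on the slide, the account returned by such a lookup is what then authenticates to SRB; the future scheme removes this indirection by passing the certificate to SRB directly.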
18. NWS as a Grid Information Provider
[Figure labels: GIIS; GRIP; GRRP; GIIS; GRIS; sensor data]
19. It's Also a Client
[Figure labels: NWS; Sensors; NWS Process Registrations; Forecasters; GIIS or GRIS; Reporters; Sensor Data; Persistent State]
20. Summary
- TeraGrid is an opportunity for broadening the scope of computational science
- Infrastructure must enable applications
- TeraGrid is about a system
- Grid middleware is a critical part of the infrastructure
- Integration into the global Grid environment is essential
- NPACI technology can:
  - Provide value added
  - Provide application transition
- A lot of work in the next three years
  - Logistics and development