Title: Introduction to eInfrastructure
1Introduction to eInfrastructure
- Jennifer M. Schopf
- UK National eScience Centre
- Argonne National Lab
2Talk Outline
- Definition of Grids, eInfrastructure, and eResearch
- JISC plans
- Globus Toolkit
- Provider of basic infrastructure
- Focus on data tools
- OMII (Open Middleware Infrastructure Institute)
- UK repository and distribution of eResearch tools
3What is a Grid?
- Many definitions, with many differences, especially between academics and industry
- Both use the buzzword to get funding
- My definition
- Resource sharing
- Coordinated problem solving
- Dynamic, multi-institutional virtual orgs
4Resource Sharing
- Resources can be anything:
- Computers
- Storage/repositories
- Sensors and Networks
- People and software
- Local control of the resources, and local policies for their use
- Sharing is always conditional
- Issues of trust, policy
- Negotiation and payment
5Coordinated Problem Solving
- Beyond client-server
- Client-server defines a small set of well-understood interactions as the only ones that can take place
- Actions in this space can include
- Distributed data analysis
- Computation and visualization of results
- Collaboration
6Virtual Organization (VO) Concept
- VO for each application or workload
- Carve out and configure resources for a
particular use and set of users
7Dynamic, Multi-institutional Virtual Organizations
- Crossing administrative domains
- No one has full control over the resources
- Local policy, not global
- Different local policy on different sites
- Community overlays on classic organizational structures
- Large or small, static or dynamic
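The idea that a virtual organization spans sites which each keep their own policy can be sketched in a few lines of Python. This is an illustration only: the `Site` class, the site and VO names, and the CPU limit are all hypothetical and do not come from any Grid toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One administrative domain with its own, locally controlled policy."""
    name: str
    allowed_vos: set = field(default_factory=set)  # VOs this site agrees to serve
    max_cpus_per_job: int = 0                       # a local usage limit

    def permits(self, vo: str, cpus: int) -> bool:
        # The decision is made locally; there is no global authority.
        return vo in self.allowed_vos and cpus <= self.max_cpus_per_job


def usable_sites(sites, vo, cpus):
    """Return the names of the sites whose local policy accepts this request."""
    return [site.name for site in sites if site.permits(vo, cpus)]


# Hypothetical sites, each with a different local policy.
sites = [
    Site("site-a", {"physics-vo", "bio-vo"}, max_cpus_per_job=64),
    Site("site-b", {"physics-vo"}, max_cpus_per_job=8),
    Site("site-c", {"bio-vo"}, max_cpus_per_job=256),
]
print(usable_sites(sites, "physics-vo", 16))
```

The same request gets a different answer at each site, which is the "different local policy on different sites" point above.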
8What is eScience or eResearch?
- Use of distributed resources, in a coordinated way, across multiple administrative domains to do science or further your research
- Classic eScience
- Use compute and data resources at many sites to run large-scale simulations for a physics or biology application
- Today's Use Cases
- Replicate data across multiple sites to increase reliability, redundancy and performance
- Use one common interface to access a variety of data resources at multiple sites
- Look at a number of available resources to select the one that best suits the application's needs at this time
9What is eInfrastructure?
- A framework (political, technological and administrative) for the easy and cost-effective shared use of distributed electronic resources across a geographical area
- The combination of research infrastructure, grid, and broadband technology projects
- Anything that enables eScience or collaborative research: distributed, persistent, reliable, accessible services
- Broader than Grids: includes things like digital libraries, networking, etc.
- Current Grid-based eInfrastructure model
10How does JISC define it?
- Similar to NSF's cyberinfrastructure work (CIGrids)
- Tony Hey (JCSR chair) says
- A national eInfrastructure to support collaborative and multidisciplinary research and innovation is the joint responsibility of RCUK (OST) and JISC (HEFCs)
- 2006 eInfrastructure/Grid initiatives continue building advanced Grid-empowered infrastructures
- Production-quality, ready-to-use SW
- Environments dynamically adaptable to user needs
11Malcolm Read has said
- E-infrastructure includes
- Networks (internet, light paths)
- Computers (workstations, servers, HPC)
- Access controls (security, AAA)
- Middleware (metadata)
- Finding tools (portals, search engines)
- Digital libraries (bibliographic, text, images, sound)
- Research data (national and scientific databases, individual data)
12JISC funding for eInfrastructure
- July 27, 2005 press release for additional funds
- http://www.jisc.ac.uk/index.cfm?name=news_spendingreview
- Continued development of JANET
- Further digitisation of major scholarly collections
- Enhancement to e-learning programmes (e-assessment, e-portfolios, e-learning tools)
- Development of the e-infrastructure
- Including development of collaborative environments
- Development of a shared infrastructure to support use of institutional repositories
13Much Still To Be Defined
- I've been told 11M specifically for eInfrastructure
- Starting in April 2006, 2 years of funding
- Programme manager being hired
- OST roadmap is the basis (due by March, no draft available yet)
- Areas are (no mapping to funding amount)
- 1 Middleware/AA/DRM
- 2 Networks and Computer Power (Hardware)
- 3 Preservation and Curation
- 4 Search and Navigation
- 5 Data and Information Creation
- 6 Virtual Research Communities
14JISC cont.
- When this is better formulated, it will be broadcast widely
- There's a JCSR meeting in mid-February where some of it should be solidified
15Questions on Definitions or JISC?
16Two Common eInfrastructure Approaches in the UK
- Globus Toolkit
- Open Middleware Infrastructure Institute (OMII)
17What functionality is needed to use a Grid?
- Basics
- Run a job
- Transfer a file
- Find out what's going on (service and job monitoring)
- All done securely
- Higher-level
- Replication
- Higher level data movement
- Workflow-scheduling
18Globus Toolkit Was Created To Help Applications
- The Globus Toolkit consists of collections of solutions to problems that frequently come up when trying to build collaborative distributed applications
- Heterogeneity
- Focus on simplifying heterogeneity for application developers
- Working towards more vertical solutions
- Standards
- Capitalize on and encourage use of existing standards (IETF, W3C, OASIS, GGF)
- Reference implementations of new/proposed standards in these organizations
- Open source, open contribution model
19Globus is an Hourglass
- Local sites have their own policies, installs: heterogeneity!
- Queuing systems, monitors, network protocols, etc.
- Globus unifies
- Build on Web services
- Use WS-RF, WS-Notification to represent/access state
- Common management abstractions and interfaces
[Hourglass diagram labels: Higher-Level Services and Users; Standard GT4 Interfaces; Local heterogeneity]
20Globus Toolkit: Open Source Grid Infrastructure
Globus Toolkit v4 www.globus.org
[Component diagram. Components: Data Replication; Replica Location; Grid Telecontrol Protocol; Data Access & Integration; Community Scheduling Framework; Python Runtime; Reliable File Transfer; C Runtime; Workspace Management; Authentication & Authorization; Grid Resource Allocation & Management; Java Runtime. Column headings: Data Mgmt; Execution Mgmt; Info Services]
21GT4 Web Services Core
- Supports both GT (GRAM, RFT, Delegation, etc.) and user-developed services
- Redesign to enhance scalability, modularity, performance, usability
- Leverages existing WS standards
- WS-I Basic Profile: WSDL, SOAP, etc.
- WS-Security, WS-Addressing
- Adds support for emerging WS standards
- WS-Resource Framework, WS-Notification
- Java, Python, C hosting environments
- Java is standard Apache
22WSRF and WS-Notification
- Naming and bindings (basis for virtualization)
- Every resource can be uniquely referenced and has one or more associated services for interacting with it
- Lifecycle (basis for resilient state management)
- Resources created by services following a factory pattern
- Resources destroyed immediately or on a schedule
- Information model (basis for monitoring and discovery)
- Resource properties associated with resources
- Operations for querying and setting this info
- Asynchronous notification of changes to properties
- Service groups (basis for registries and collective services)
- Group membership rules and membership management
- Base fault type
- The definition of WSRF means that the Grid and Web services communities can move forward on a common base
23Why Not Just Use XML/SOAP?
- WSRF and WS-N are just XML and SOAP
- WSRF and WS-N are just Web services
- Benefits of following the specs
- These patterns represent best practices that have been learned in many Grid applications
- There is a community behind them
- Why reinvent the wheel?
- Standards facilitate interoperability
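The patterns listed above (uniquely referenced resources, factory creation, scheduled destruction, resource properties, and notification of property changes) can be shown in a small in-process Python sketch. It models the patterns only: there is no XML or SOAP here, and every class and method name is hypothetical rather than part of WSRF, WS-Notification, or the Globus Toolkit.

```python
import itertools


class Resource:
    """A stateful resource with properties, subscribers, and a lifetime."""

    def __init__(self, key, properties, termination_time=None):
        self.key = key                            # unique reference
        self.properties = dict(properties)        # resource properties
        self.termination_time = termination_time  # scheduled destruction
        self._subscribers = []

    def get_property(self, name):
        return self.properties[name]

    def set_property(self, name, value):
        self.properties[name] = value
        # Notify every subscriber that a property changed.
        for callback in self._subscribers:
            callback(self.key, name, value)

    def subscribe(self, callback):
        self._subscribers.append(callback)


class CounterFactory:
    """Creates resources (factory pattern) and manages their lifecycle."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.resources = {}

    def create(self, termination_time=None):
        key = f"counter-{next(self._ids)}"
        self.resources[key] = Resource(key, {"value": 0}, termination_time)
        return key

    def destroy(self, key):
        # Immediate destruction.
        del self.resources[key]

    def sweep(self, now):
        # Scheduled destruction: remove resources whose lifetime has passed.
        expired = [key for key, res in self.resources.items()
                   if res.termination_time is not None
                   and res.termination_time <= now]
        for key in expired:
            self.destroy(key)
        return expired


factory = CounterFactory()
key = factory.create(termination_time=100)
events = []
factory.resources[key].subscribe(lambda k, n, v: events.append((k, n, v)))
factory.resources[key].set_property("value", 5)
print(events)
print(factory.sweep(now=150))
```

In the real specifications these operations are Web service messages; the sketch only shows why a shared vocabulary for state, lifetime, and notification saves each application from inventing its own.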
24Basic Globus Security Mechanisms
- Grid-wide identities implemented as PKI certificates
- Transport-level and message-level authentication
- Ability to delegate credentials to agents
- Ability to map between Grid and local identities
- Local security administration and enforcement
- Single sign-on support implemented as proxies
- A plug-in framework for authorization decisions
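Mapping between Grid and local identities can be pictured as a lookup from a certificate subject name to a local account. The Python sketch below parses a small text table modelled on the grid-mapfile convention (a quoted certificate subject followed by one or more local account names). The subject names, account names, and function names are invented for illustration, and the parser is a simplification, not the toolkit's own code.

```python
import shlex

# Hypothetical entries: quoted certificate subject, then local account(s).
EXAMPLE_GRIDMAP = '''
# certificate subject                local account(s)
"/O=Grid/OU=Example/CN=Jane Doe" jdoe
"/O=Grid/OU=Example/CN=Sam Roe" sroe,griduser
'''


def parse_gridmap(text):
    """Return {subject: [local accounts]} from grid-mapfile style text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subject, accounts = shlex.split(line)  # shlex honours the quotes
        mapping[subject] = accounts.split(",")
    return mapping


def local_account(mapping, subject):
    """Return the first local account for a subject, or None if unmapped."""
    accounts = mapping.get(subject)
    return accounts[0] if accounts else None


gridmap = parse_gridmap(EXAMPLE_GRIDMAP)
print(local_account(gridmap, "/O=Grid/OU=Example/CN=Jane Doe"))
print(local_account(gridmap, "/O=Grid/OU=Example/CN=Unknown User"))
```

Because the table lives at the site, the local administrator decides who is mapped and to which account, which is the "local security administration and enforcement" point above.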
25The Challenge of Grid Resource Management
- Enabling secure, controlled remote access to heterogeneous computational resources and management of remote computation
- Authentication and authorization
- Resource discovery and characterization
- Reservation and allocation
- Computation monitoring and control
- Addressed by a set of protocols and services
- GRAM protocol as a basic building block
- Resource brokering and co-allocation services
- GSI for security, MDS for discovery
26GT4 Execution Management (GRAM)
- Common WS interface to schedulers
- Unix, Condor, LSF, PBS, SGE, ...
- More generally: interface for process execution management
- Lay down execution environment
- Stage data
- Monitor and manage lifecycle
- Kill it, clean up
- A basis for application-driven provisioning
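The lifecycle steps above (stage data, run, monitor, kill, clean up) are naturally described as a state machine. The following Python sketch is an assumption-laden illustration: the state names and allowed transitions are chosen to mirror the bullets and are not taken from the GRAM specification, and the `Job` class is hypothetical.

```python
from enum import Enum


class JobState(Enum):
    UNSUBMITTED = "unsubmitted"
    STAGE_IN = "stage-in"
    PENDING = "pending"
    ACTIVE = "active"
    STAGE_OUT = "stage-out"
    CLEAN_UP = "clean-up"
    DONE = "done"
    FAILED = "failed"


S = JobState
# Illustrative transition table: every running state may jump to clean-up.
ALLOWED = {
    S.UNSUBMITTED: {S.STAGE_IN},
    S.STAGE_IN: {S.PENDING, S.CLEAN_UP},
    S.PENDING: {S.ACTIVE, S.CLEAN_UP},
    S.ACTIVE: {S.STAGE_OUT, S.CLEAN_UP},
    S.STAGE_OUT: {S.CLEAN_UP},
    S.CLEAN_UP: {S.DONE, S.FAILED},
    S.DONE: set(),
    S.FAILED: set(),
}


class Job:
    def __init__(self, executable):
        self.executable = executable
        self.state = S.UNSUBMITTED
        self.history = [self.state]  # what a monitor would observe

    def advance(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        self.state = new_state
        self.history.append(new_state)

    def run_to_completion(self):
        for state in (S.STAGE_IN, S.PENDING, S.ACTIVE,
                      S.STAGE_OUT, S.CLEAN_UP, S.DONE):
            self.advance(state)

    def kill(self):
        # "Kill it, clean up": a killed job still passes through clean-up.
        self.advance(S.CLEAN_UP)
        self.advance(S.FAILED)


job = Job("/bin/hostname")
job.run_to_completion()
print([state.value for state in job.history])
```

Making clean-up a mandatory state before either terminal state is one way to guarantee that staged files and scratch space are released whether the job finishes or is killed.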
27A Model Architecture for Data Grids
[Figure: model data grid architecture. Labels: Attribute Specification; Metadata Catalog; Logical Collection and Logical File Name; Replica Location Service; Multiple Locations; Replica Selection; Performance Information and Predictions; Selected Replica; GridFTP Control Channel; Disk Cache; Tape Library; Disk Array; Disk Cache; Replica Locations 1, 2, 3]
28GT4 Data Functions
- Find your data: Replica Location Service
- Managing 40M files in production settings
- Move/access your data
- GridFTP, Reliable File Transfer (RFT)
- High-performance striped data movement
- Couple data and execution management
- GRAM uses GridFTP and RFT for staging
- Access databases through standard Grid interfaces: OGSA-DAI
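Finding data through a replica location service comes down to mapping one logical file name to several physical copies and then picking one. The Python sketch below illustrates that idea under stated assumptions: the `ReplicaCatalog` class, the host names, the logical name, and the predicted bandwidth figures are all hypothetical, and selecting by predicted bandwidth is just one possible policy.

```python
from urllib.parse import urlparse


class ReplicaCatalog:
    """Maps a logical file name (LFN) to its physical file names (PFNs)."""

    def __init__(self):
        self._replicas = {}

    def register(self, lfn, pfn):
        self._replicas.setdefault(lfn, []).append(pfn)

    def lookup(self, lfn):
        return list(self._replicas.get(lfn, []))


def select_replica(pfns, predicted_mbps):
    """Pick the replica whose host has the highest predicted bandwidth."""
    if not pfns:
        raise LookupError("no replica registered")
    return max(pfns,
               key=lambda pfn: predicted_mbps.get(urlparse(pfn).hostname, 0.0))


catalog = ReplicaCatalog()
lfn = "lfn://climate/run42.nc"  # hypothetical logical name
catalog.register(lfn, "gsiftp://disk.site-a.example.org/data/run42.nc")
catalog.register(lfn, "gsiftp://tape.site-b.example.org/archive/run42.nc")
catalog.register(lfn, "gsiftp://cache.site-c.example.org/tmp/run42.nc")

# Hypothetical performance predictions, in Mb/s, keyed by host.
predictions = {
    "disk.site-a.example.org": 400.0,
    "tape.site-b.example.org": 40.0,
    "cache.site-c.example.org": 900.0,
}
print(select_replica(catalog.lookup(lfn), predictions))
```

Keeping lookup and selection separate means the catalog stays a simple index while the selection policy can change per application.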
29GridFTP in GT4
- Basic file transfer support, and memory-to-memory copies
- High-performance, secure, reliable data transfer
- Optimized for high-bandwidth wide-area networks
- FTP with well-defined extensions
- Uses basic Grid security (control and data channels)
- Multiple data channels for parallel transfers
- Partial file transfers
- Third-party (direct server-to-server) transfers
- Performance tuning
- Greatly improves performance over most FTP implementations
- On the TeraGrid network, achieved 27 Gb/s on a 30 Gb/s link (90% utilization) with 32 nodes
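Parallel and partial transfers both rest on the same idea: a file is treated as a set of byte ranges that can be moved independently. The Python sketch below shows only that partitioning step plus the utilization arithmetic from the TeraGrid figure; it does not implement the GridFTP protocol, and the function names are hypothetical.

```python
def split_ranges(size, channels):
    """Split [0, size) into at most `channels` contiguous (offset, length) pieces."""
    if size < 0 or channels < 1:
        raise ValueError("size must be >= 0 and channels >= 1")
    base, extra = divmod(size, channels)
    ranges = []
    offset = 0
    for index in range(channels):
        # The first `extra` pieces carry one additional byte each.
        length = base + (1 if index < extra else 0)
        if length == 0:
            break  # fewer bytes than channels
        ranges.append((offset, length))
        offset += length
    return ranges


def utilization(achieved_gbps, capacity_gbps):
    """Fraction of the link capacity that the transfer achieved."""
    return achieved_gbps / capacity_gbps


print(split_ranges(10, 4))
print(f"{utilization(27, 30):.0%}")
```

Each (offset, length) pair could be handed to its own data channel, and a single pair on its own corresponds to a partial file transfer.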
30Reliable File Transfer: Third Party Transfer
- Fire-and-forget transfer
- Web services interface
- Many files and directories
- Integrated failure recovery
[Diagram labels: RFT Client, SOAP Messages, RFT Service, GridFTP Server, GridFTP Server]
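Integrated failure recovery for a batch of transfers can be pictured as a service that retries each file and records its status, so the client does not have to watch the transfer. The Python sketch below is a simplified, synchronous illustration: `run_transfers`, `make_flaky_copy`, the URLs, and the retry limit are hypothetical, and the copy function only simulates failures rather than moving data.

```python
def run_transfers(pairs, copy, max_attempts=3):
    """Try each (source, destination) pair until it succeeds or attempts run out."""
    status = {}
    for src, dst in pairs:
        for attempt in range(1, max_attempts + 1):
            try:
                copy(src, dst)
                status[(src, dst)] = ("done", attempt)
                break
            except OSError:
                status[(src, dst)] = ("failed", attempt)
    return status


def make_flaky_copy(failures_before_success):
    """Build a stand-in copy function that fails a set number of times per source."""
    remaining = dict(failures_before_success)
    copied = []

    def copy(src, dst):
        if remaining.get(src, 0) > 0:
            remaining[src] -= 1
            raise OSError(f"simulated network failure for {src}")
        copied.append((src, dst))

    return copy, copied


# Hypothetical third-party transfers between two servers.
pairs = [
    ("gsiftp://a.example.org/f1", "gsiftp://b.example.org/f1"),
    ("gsiftp://a.example.org/f2", "gsiftp://b.example.org/f2"),
    ("gsiftp://a.example.org/f3", "gsiftp://b.example.org/f3"),
]
copy, copied = make_flaky_copy({"gsiftp://a.example.org/f1": 1,
                                "gsiftp://a.example.org/f2": 5})
status = run_transfers(pairs, copy)
for pair, result in status.items():
    print(pair[0], result)
```

A real service would persist this status so that recovery survives a restart of the service itself; the sketch keeps it in memory for brevity.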
- Data access
- Relational and XML databases, semi-structured files
- Data integration
- Multiple data delivery mechanisms, data translation
- Extensible and efficient framework
- Request documents contain multiple tasks
- A task = execution of an activity
- Group work to enable efficient operation
- Extensible set of activities
- More than 30 predefined, framework for writing your own
- Moves computation to data
- Pipelined and streaming evaluation
- Concurrent task evaluation
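A request made of several tasks, evaluated as a pipeline over streamed data, maps well onto Python generators. The sketch below is an analogy only: the activity names, the request structure, and the sample rows are hypothetical and are not the OGSA-DAI request document format or API.

```python
def select_rows(rows, predicate):
    """Activity: pass through only the rows that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, columns):
    """Activity: keep only the named columns."""
    for row in rows:
        yield {column: row[column] for column in columns}


def to_csv(rows, columns):
    """Activity: translate rows into CSV lines for delivery."""
    yield ",".join(columns)
    for row in rows:
        yield ",".join(str(row[column]) for column in columns)


def run_request(data, tasks):
    """Chain the activities so each row streams through the whole pipeline."""
    stream = iter(data)
    for activity, kwargs in tasks:
        stream = activity(stream, **kwargs)
    return list(stream)


# Hypothetical data held at the data resource.
table = [
    {"id": 1, "species": "oak", "height_m": 21.5},
    {"id": 2, "species": "birch", "height_m": 14.0},
    {"id": 3, "species": "oak", "height_m": 18.2},
]
request = [
    (select_rows, {"predicate": lambda row: row["species"] == "oak"}),
    (project, {"columns": ["id", "height_m"]}),
    (to_csv, {"columns": ["id", "height_m"]}),
]
print(run_request(table, request))
```

Because the whole request runs where the table lives, only the final CSV lines need to travel, which is the sense in which computation moves to the data.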
32Monitoring and Discovery System (MDS4)
- Grid-level monitoring system used most often for resource selection
- Aid user/agent to identify host(s) on which to run an application
- Uses standard interfaces to provide publishing of data, discovery, and data access, including subscription/notification
- WS-ResourceProperties, WS-BaseNotification, WS-ServiceGroup
- Functions as an hourglass to provide a common interface to lower-level monitoring tools
33MDS4 Components
- Information providers
- Basic data sources: queue data, cluster data, etc.
- Can be from web services, executables, files
- Index Service
- Caching registry of data
- Trigger Service
- Warnings when conditions are met
- WebMDS
- Visualization of data
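How the components fit together (information providers feed a caching index, and a trigger raises warnings when a condition holds) can be sketched in Python. Everything here is hypothetical: the class, the provider, the time-to-live value, and the warning text are invented for illustration and do not reflect the MDS4 interfaces.

```python
class IndexService:
    """A caching registry: asks a provider again only when its entry is stale."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._providers = {}  # name -> zero-argument callable returning data
        self._cache = {}      # name -> (timestamp, data)

    def register(self, name, provider):
        self._providers[name] = provider

    def query(self, name, now):
        cached = self._cache.get(name)
        if cached is None or now - cached[0] >= self.ttl:
            cached = (now, self._providers[name]())
            self._cache[name] = cached
        return cached[1]


def check_triggers(index, triggers, now):
    """Return the warning message of every trigger whose condition is met."""
    warnings = []
    for name, condition, message in triggers:
        if condition(index.query(name, now)):
            warnings.append(message)
    return warnings


calls = []


def queue_provider():
    # Hypothetical provider: reports more queued jobs each time it is polled.
    calls.append(1)
    return {"queued_jobs": 40 * len(calls)}


index = IndexService(ttl=60)
index.register("cluster-a-queue", queue_provider)
triggers = [("cluster-a-queue",
             lambda data: data["queued_jobs"] > 50,
             "cluster-a queue is backing up")]

print(check_triggers(index, triggers, now=0))    # first poll: 40 jobs
print(check_triggers(index, triggers, now=30))   # served from cache
print(check_triggers(index, triggers, now=90))   # stale, polled again: 80 jobs
```

The cache keeps frequent queries from hammering the underlying monitoring tools, at the cost of data that can be up to one time-to-live old.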
35Tested Platforms
- Debian
- Fedora Core
- FreeBSD
- Red Hat
- Sun Solaris
- SGI Altix (IA64 running Red Hat)
- SuSE Linux
- Tru64 Unix
- Apple MacOS X (no binaries)
- Windows: Java components only
- List of binaries and known platform-specific install bugs at
- http://www.globus.org/toolkit/docs/4.0/admin/docbook/ch03.html
36Many Tools Build on, or Can Contribute to, GT4-Based Grids
- Condor-G, DAGMan
- Nimrod-G
- Ninf-G
- Open Grid Computing Env.
- Commodity Grid Toolkit
- GriPhyN Virtual Data System
- Virtual Data Toolkit
- GridXpert Synergy
- Platform Globus Toolkit
- Sun Grid Engine
- PBS scheduler
- LSF scheduler
- GridBus
- TeraGrid CTSS
- IBM Grid Toolbox
37Any questions about Globus?
38Open Middleware Infrastructure Institute
To be a leading provider of reliable, interoperable and open-source Grid middleware components, services and tools to support advanced Grid-enabled solutions in academia and industry.
- Formed at the University of Southampton (2004)
- Focus on an easy-to-install e-Infrastructure solution
- Utilise existing software standards
- Expanding with new partners in 2006
- OGSA-DAI team at Edinburgh
- myGrid team at Manchester
Slides compliments of Steven Newhouse
39OMII Functions
- Provide a software repository of Grid components and tools from e-science projects
- Re-engineer software, harden it, and provide support for components sourced from the community
- Contract the development of missing software components necessary in grid middleware (managed programme)
- Provide an integrated grid middleware release of the sourced software components
40The Managed Programme Distribution and Repository
- OGSA-DAI (Data Access service)
- GridSAM (Job Submission and Monitoring service)
- Grimoires (Registry service based on UDDI)
- GeodiseLab (Matlab and Jython environments)
- FINS (Notification services using WS-Eventing)
- BPEL (Workflow service)
- MANGO (Managing workflows with BPEL)
- FIRMS (Reliable messaging)
- eInfrastructure has many definitions, but basically it's Grid computing
- JISC has funding for this but hasn't yet defined where it will be spent
- Globus Toolkit provides many basic tools, and is incorporated in many projects, especially those focused on data movement
- In the UK, OMII is another useful source of eInfrastructure software
42Additional Information
- Contact
- Jennifer M. Schopf
- jms_at_mcs.anl.gov
- http://www.mcs.anl.gov/jms
- Globus Alliance
- http://www.globus.org
- Information about OMII
- http://www.omii.ac.uk
- s.newhouse_at_omii.ac.uk