Title: Introduction to eInfrastructure


1
Introduction to eInfrastructure
  • Jennifer M. Schopf
  • UK National eScience Centre
  • Argonne National Lab

2
Talk Outline
  • Definition of Grids, eInfrastructure, and
    eResearch
  • JISC plans
  • Globus Toolkit
  • Provider of basic infrastructure
  • Focus on data tools
  • OMII Open Middleware Infrastructure
  • UK repository and distribution of eResearch tools

3
What is a Grid?
  • Many definitions, with many differences, especially
    between academia and industry
  • Both use the buzzword to get funding
  • My definition:
  • Resource sharing
  • Coordinated problem solving
  • Dynamic, multi-institutional virtual orgs

4
Resource Sharing
  • Resources can be anything:
  • Computers
  • Storage/repositories
  • Sensors and Networks
  • People and software
  • Local Control of the resources, and local
    policies for their use
  • Sharing is always conditional
  • Issues of trust, policy
  • Negotiation and payment

5
Coordinated Problem Solving
  • Beyond client-server
  • Client-server defines a small set of
    well-understood interactions as the only ones
    that can take place
  • Actions in this space can include:
  • Distributed data analysis
  • Computation and visualization of results
  • Collaboration

6
Virtual Organization (VO) Concept
  • VO for each application or workload
  • Carve out and configure resources for a
    particular use and set of users

7
Dynamic, Multi-institutional Virtual Organizations
  • Crossing administrative domains
  • No one has full control over the resources
  • Local policy not global
  • Different local policy on different sites
  • Community overlays on classic organizational
    structures
  • Large or small, static or dynamic
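
The point that policy is local rather than global can be sketched as follows. This is only an illustration: the site names, VO names and policy rules are invented for the example and do not come from any Grid toolkit.

```python
# Each site keeps its own policy; there is no global rule set.
# Site names, VO names and limits are hypothetical.
site_policies = {
    "site-a": lambda user, cpus: user["vo"] == "physics-vo" and cpus <= 32,
    "site-b": lambda user, cpus: user["vo"] in {"physics-vo", "bio-vo"},
}

def sites_accepting(user, cpus):
    """Return the sites whose local policy admits this request."""
    return sorted(
        site for site, allows in site_policies.items() if allows(user, cpus)
    )

alice = {"name": "alice", "vo": "physics-vo"}
bob = {"name": "bob", "vo": "bio-vo"}

print(sites_accepting(alice, 16))
print(sites_accepting(alice, 64))
print(sites_accepting(bob, 8))
```

The same user gets a different answer at each site, which is why sharing is always conditional.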

8
What is eScience or eResearch?
  • Use of distributed resources, in a coordinated
    way, across multiple administrative domains to do
    science or further your research
  • Classic eScience
  • Use compute and data resources at many sites to
    run large scale simulations for a physics or
    biology application
  • Today's use cases
  • Replicate data across multiple sites to increase
    reliability, redundancy and performance
  • Use one common interface to access a variety of
    data resources at multiple sites
  • Look at a number of available resources to select
    the one that best suits the application needs at
    this time
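
As a rough illustration of the last use case, the sketch below picks the best-suited resource from a list of candidates. The Resource fields, the site names and the selection rule (shortest estimated wait among sites with enough free CPUs) are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    site: str          # hypothetical site name
    free_cpus: int     # CPUs currently idle
    queue_wait_s: int  # estimated queue wait in seconds

def select_resource(resources, cpus_needed):
    """Return the eligible resource with the shortest wait, or None."""
    eligible = [r for r in resources if r.free_cpus >= cpus_needed]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r.queue_wait_s)

candidates = [
    Resource("site-a", free_cpus=16, queue_wait_s=600),
    Resource("site-b", free_cpus=64, queue_wait_s=120),
    Resource("site-c", free_cpus=128, queue_wait_s=900),
]
best = select_resource(candidates, cpus_needed=32)
print(best.site)
```

Because the inputs change over time, the same call can return a different site a few minutes later.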

9
What is eInfrastructure?
  • A framework (political, technological and
    administrative) for the easy and cost-effective
    shared use of distributed electronic resources
    across a geographical area
  • The combination of research infrastructure,
    grid, and broadband technologies projects
  • Anything that enables eScience and collaborative
    research: distributed, persistent, reliable,
    accessible services
  • Broader than Grids: includes things like
    digital libraries, networking, etc.
  • Current Grid-based eInfrastructure model

10
How does JISC define it?
  • Similar to NSF's cyberinfrastructure work
    (CIGrids)
  • Tony Hey (JCSR chair) says:
  • A national eInfrastructure to support
    collaborative and multidisciplinary research and
    innovation is the joint responsibility of RCUK
    (OST) and JISC (HEFCs)
  • 2006 eInfrastructureGrid initiatives continue
    building advanced Grid-empowered infrastructures
  • Production-quality, ready-to-use software
  • Environments dynamically adaptable to user needs

11
Malcolm Read has said
  • E-infrastructure includes
  • Networks (internet, light paths)
  • Computers (workstations, servers, HPC)
  • Access controls (security, AAA)
  • Middleware (metadata)
  • Finding tools (portals, search engines)
  • Digital libraries (bibliographic, text, images,
    sound)
  • Research data (national and scientific databases,
    individual data)

12
JISC funding for eInfrastructure
  • July 27, 2005 press release for additional funds
  • http://www.jisc.ac.uk/index.cfm?name=news_spendingreview
  • Continued development of JANET
  • Further digitisation of major scholarly
    collections
  • Enhancement of e-learning programmes (e-assessment,
    e-portfolios, e-learning tools)
  • Development of the e-infrastructure
  • Including development of collaborative environments
  • Development of a shared infrastructure to support
    use of institutional repositories

13
Much Still To Be Defined
  • I've been told 11M specifically for
    eInfrastructure
  • Starting in April 2006, 2 years of funding
  • Programme manager being hired
  • OST roadmap is basis (due by March, no draft
    available yet)
  • Areas are (no mapping to funding amount):
  • 1. Middleware/AA/DRM
  • 2. Networks and Computer Power (Hardware)
  • 3. Preservation and Curation
  • 4. Search and Navigation
  • 5. Data and Information Creation
  • 6. Virtual Research Communities

14
JISC cont.
  • When this is better formulated, it will be
    broadcast widely
  • There's a JCSR meeting in mid-February where some
    of it should be solidified

15
Questions on Definitions or JISC?
16
Two Common eInfrastructure Approaches in the UK
  • Globus Toolkit
  • Open Middleware Infrastructure Institute (OMII)
    release

17
What functionality is needed to use a Grid?
  • Basics
  • Run a job
  • Transfer a file
  • Find out what's going on (service and job
    monitoring)
  • All done securely
  • Higher-level
  • Replication
  • Higher level data movement
  • Workflow-scheduling

18
Globus Toolkit Was Created To Help Applications
  • The Globus Toolkit consists of collections of
    solutions to problems that frequently come up
    when trying to build collaborative distributed
    applications
  • Heterogeneity
  • Focus on simplifying heterogeneity for
    application developers
  • Working towards more vertical solutions
  • Standards
  • Capitalize on and encourage use of existing
    standards (IETF, W3C, OASIS, GGF)
  • Reference implementations of new/proposed
    standards in these organizations
  • Open source, open contribution model

19
Globus is an Hour Glass
  • Local sites have their own policies and installs:
    heterogeneity!
  • Queuing systems, monitors, network protocols, etc.
  • Globus unifies
  • Build on Web services
  • Use WS-RF, WS-Notification to represent/access
    state
  • Common management abstractions & interfaces

[Hourglass diagram, top to bottom: Higher-Level Services and Users; Standard GT4 Interfaces; Local heterogeneity]
20
Globus Toolkit: Open Source Grid Infrastructure
Globus Toolkit v4 (www.globus.org)
[Component diagram, grouped by area]
  • Security: Authentication & Authorization,
    Delegation, Community Authorization,
    Credential Mgmt
  • Data Mgmt: GridFTP, Reliable File Transfer,
    Replica Location, Data Replication,
    Data Access & Integration
  • Execution Mgmt: Grid Resource Allocation &
    Management, Community Scheduling Framework,
    Workspace Management, Grid Telecontrol Protocol
  • Info Services: Index, Trigger, WebMDS
  • Common Runtime: Java Runtime, C Runtime,
    Python Runtime
21
GT4 Web Services Core
  • Supports both GT (GRAM, RFT, Delegation, etc.) &
    user-developed services
  • Redesign to enhance scalability, modularity,
    performance, usability
  • Leverages existing WS standards
  • WS-I Basic Profile: WSDL, SOAP, etc.
  • WS-Security, WS-Addressing
  • Adds support for emerging WS standards
  • WS-Resource Framework, WS-Notification
  • Java, Python, C hosting environments
  • Java is standard Apache

22
WSRF & WS-Notification
  • Naming and bindings (basis for virtualization)
  • Every resource can be uniquely referenced and has
    one or more associated services for interacting
  • Lifecycle (basis for resilient state management)
  • Resources created by services following a factory
    pattern
  • Resource destroyed immediately or at a scheduled time
  • Information model (basis for monitoring &
    discovery)
  • Resource properties associated with resources
  • Operations for querying and setting this info
  • Asynchronous notification of changes to
    properties
  • Service groups (basis for registries & collective
    services)
  • Group membership rules and membership management
  • Base fault type
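
The patterns above (unique naming, factory creation, resource properties, change notification, destruction) can be modelled in a few lines. This is an in-memory sketch only: the class and method names are invented for illustration and are not the WSRF or GT4 API, and scheduled destruction is not modelled.

```python
import itertools

class Resource:
    """A stateful resource with properties and change subscribers."""
    def __init__(self, key, properties):
        self.key = key
        self.properties = dict(properties)
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def set_property(self, name, value):
        self.properties[name] = value
        # Notify every subscriber of the change.
        for notify in self.subscribers:
            notify(self.key, name, value)

class Factory:
    """Creates resources and hands back a unique key for each one."""
    def __init__(self):
        self._ids = itertools.count(1)
        self.resources = {}

    def create(self, **properties):
        key = f"res-{next(self._ids)}"
        self.resources[key] = Resource(key, properties)
        return key

    def destroy(self, key):
        del self.resources[key]

factory = Factory()
key = factory.create(state="Pending")
events = []
factory.resources[key].subscribe(lambda k, n, v: events.append((k, n, v)))
factory.resources[key].set_property("state", "Active")
print(events)
factory.destroy(key)
```

In the real specifications the key travels inside an endpoint reference and notifications are delivered as messages, not as direct callbacks.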

23
WSRF vs XML/SOAP
  • The definition of WSRF means that the Grid and
    Web services communities can move forward on a
    common base
  • Why Not Just Use XML/SOAP?
  • WSRF and WS-N are just XML and SOAP
  • WSRF and WS-N are just Web services
  • Benefits of following the specs
  • These patterns represent best practices that have
    been learned in many Grid applications
  • There is a community behind them
  • Why reinvent the wheel?
  • Standards facilitate interoperability

24
Basic Globus Security Mechanisms
  • Grid-wide identities implemented as PKI
    certificates
  • Transport-level and message-level authentication
  • Ability to delegate credentials to agents
  • Ability to map between Grid & local identities
  • Local security administration & enforcement
  • Single sign-on support implemented as proxies
  • A plug-in framework for authorization decisions
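
One common way the Grid-to-local identity mapping is expressed in Globus deployments is a grid-mapfile, where each line pairs a quoted certificate subject with a local account. The parser below is a simplified sketch: the subjects and account names are made up, and it takes only the first account when several are listed.

```python
import shlex

def parse_gridmap(text):
    """Map each certificate subject (DN) to its first local account."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # shlex keeps the quoted DN together despite embedded spaces.
        dn, local_users = shlex.split(line)
        mapping[dn] = local_users.split(",")[0]
    return mapping

example = '''
# certificate subject -> local account (hypothetical entries)
"/O=Grid/OU=Example/CN=Jane Doe" jdoe
"/O=Grid/OU=Example/CN=John Smith" jsmith
'''
gridmap = parse_gridmap(example)
print(gridmap["/O=Grid/OU=Example/CN=Jane Doe"])
```

Keeping this file under local control is what lets each site administer and enforce its own security policy.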

25
The Challenge of Grid Resource Management
  • Enabling secure, controlled remote access to
    heterogeneous computational resources and
    management of remote computation
  • Authentication and authorization
  • Resource discovery & characterization
  • Reservation and allocation
  • Computation monitoring and control
  • Addressed by a set of protocols & services
  • GRAM protocol as a basic building block
  • Resource brokering & co-allocation services
  • GSI for security, MDS for discovery

26
GT4 Execution Management (GRAM)
  • Common WS interface to schedulers
  • Unix, Condor, LSF, PBS, SGE, ...
  • More generally: interface for process execution
    management
  • Lay down execution environment
  • Stage data
  • Monitor & manage lifecycle
  • Kill it, clean up
  • A basis for application-driven provisioning
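
The lifecycle in the bullets above (stage data, run, monitor, kill, clean up) can be pictured as a small state machine. The state names and transitions here are assumptions chosen to mirror the slide, not the states defined by GRAM itself.

```python
class Job:
    # Hypothetical state names, ordered as the bullets describe.
    ORDER = ["unsubmitted", "stage_in", "pending", "active",
             "stage_out", "cleanup", "done"]
    TERMINAL = {"done", "failed"}

    def __init__(self):
        self.state = "unsubmitted"
        self.history = [self.state]

    def _move(self, state):
        self.state = state
        self.history.append(state)

    def advance(self):
        """Move to the next state in the normal lifecycle."""
        if self.state in self.TERMINAL:
            raise RuntimeError("job already finished")
        self._move(self.ORDER[self.ORDER.index(self.state) + 1])

    def kill(self):
        """Cancel the job: clean up, then mark it failed."""
        if self.state in self.TERMINAL:
            raise RuntimeError("job already finished")
        if self.state != "cleanup":
            self._move("cleanup")
        self._move("failed")

job = Job()
for _ in range(3):
    job.advance()  # unsubmitted -> stage_in -> pending -> active
job.kill()
print(job.history)
```

A client that monitors the job only needs to watch the state value, whichever local scheduler runs it.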

27
A Model Architecture for Data Grids
[Diagram: the Application sends an Attribute Specification to the Metadata Catalog and gets back a Logical Collection and Logical File Name; the Replica Location Service returns Multiple Locations; Replica Selection uses Performance Information & Predictions from MDS and NWS to pick the Selected Replica; data then moves over the GridFTP Control Channel and GridFTP Data Channel. Storage shown at Replica Locations 1-3: Disk Cache, Tape Library, Disk Array, Disk Cache]
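
The replica-selection step in this architecture can be sketched as a lookup followed by a ranking. Everything concrete here is an assumption for illustration: the logical file name, the host names and the predicted bandwidth figures stand in for what a replica catalog and a forecasting service would supply.

```python
from urllib.parse import urlparse

# Hypothetical catalog: logical file name -> physical replicas.
replica_catalog = {
    "lfn://example/run42.dat": [
        "gsiftp://storage1.example.org/data/run42.dat",
        "gsiftp://storage2.example.org/data/run42.dat",
    ],
}

# Hypothetical bandwidth predictions per host, in Mb/s.
predicted_mbps = {
    "storage1.example.org": 80.0,
    "storage2.example.org": 450.0,
}

def select_replica(lfn):
    """Return the replica on the host with the best predicted bandwidth."""
    replicas = replica_catalog[lfn]
    return max(
        replicas,
        key=lambda url: predicted_mbps.get(urlparse(url).hostname, 0.0),
    )

print(select_replica("lfn://example/run42.dat"))
```
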
28
GT4 Data Functions
  • Find your data: Replica Location Service
  • Managing 40M files in production settings
  • Move/access your data
  • GridFTP, Reliable File Transfer (RFT)
  • High-performance striped data movement
  • Couple data & execution management
  • GRAM uses GridFTP and RFT for staging
  • Access databases through standard Grid
    interfaces: OGSA-DAI

29
GridFTP in GT4
  • Basic file transfer support, and memory-to-memory
    copies
  • High-performance, secure, reliable data transfer
  • Optimized for high-bandwidth wide-area networks
  • FTP with well-defined extensions
  • Uses basic Grid security (control and data
    channels)
  • Multiple data channels for parallel transfers
  • Partial file transfers
  • Third-party (direct server-to-server) transfers
  • Performance tuning
  • Greatly improved performance over most FTP
    implementations
  • On the TeraGrid network, achieved 27 Gb/s on a
    30 Gb/s link (90% utilization) with 32 nodes
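
To make parallel and partial transfers concrete, the sketch below divides a file into contiguous byte ranges, one per data channel, and checks the utilization figure quoted above. The even split is a simplifying assumption; a real GridFTP server decides for itself how blocks are placed on channels.

```python
def split_ranges(file_size, channels):
    """Divide [0, file_size) into contiguous (offset, length) ranges."""
    base, extra = divmod(file_size, channels)
    ranges, offset = [], 0
    for i in range(channels):
        # The first `extra` channels carry one additional byte.
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges

print(split_ranges(10, 4))  # [(0, 3), (3, 3), (6, 2), (8, 2)]

# Utilization from the slide: 27 Gb/s achieved on a 30 Gb/s link.
utilization = 27 / 30
print(f"{utilization:.0%}")  # 90%
```

Each range could be fetched as a partial file transfer, which is also how an interrupted transfer can resume without starting over.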

30
Reliable File Transfer: Third-Party Transfer
  • Fire-and-forget transfer
  • Web services interface
  • Many files & directories
  • Integrated failure recovery

[Diagram: an RFT Client sends SOAP Messages to the RFT Service and optionally receives Notifications; the RFT Service drives a transfer between two GridFTP Servers]
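
The idea of fire-and-forget transfer with integrated failure recovery can be sketched as a retry loop that records the outcome of each source/destination pair. This is not the RFT interface: the function names, the retry limit and the simulated failure are all assumptions for the example.

```python
def run_transfers(pairs, transfer, max_attempts=3):
    """Try each (source, destination) pair up to max_attempts times."""
    status = {}
    for src, dst in pairs:
        for attempt in range(1, max_attempts + 1):
            try:
                transfer(src, dst)
                status[(src, dst)] = ("done", attempt)
                break
            except OSError:
                status[(src, dst)] = ("failed", attempt)
    return status

# Simulated transfer: b.dat fails twice before succeeding.
calls = {}
def flaky_transfer(src, dst):
    calls[src] = calls.get(src, 0) + 1
    if src.endswith("b.dat") and calls[src] < 3:
        raise OSError("simulated network failure")

pairs = [
    ("gsiftp://host1.example.org/a.dat", "gsiftp://host2.example.org/a.dat"),
    ("gsiftp://host1.example.org/b.dat", "gsiftp://host2.example.org/b.dat"),
]
result = run_transfers(pairs, flaky_transfer)
for pair, outcome in result.items():
    print(pair[0], outcome)
```

A service doing this on the user's behalf would also persist the status table, so that a restart of the service does not lose the request.
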
31
OGSA-DAI
  • Data access
  • Relational & XML databases, semi-structured files
  • Data integration
  • Multiple data delivery mechanisms, data
    translation
  • Extensible & efficient framework
  • Request documents contain multiple tasks
  • A task = execution of an activity
  • Group work to enable efficient operation
  • Extensible set of activities
  • > 30 predefined, framework for writing your own
  • Moves computation to data
  • Pipelined and streaming evaluation
  • Concurrent task evaluation
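
A request that chains several activities, with rows streaming from one to the next, can be imitated with Python generators. This is a sketch of the pipelining idea only: the activity names are invented, sqlite3 stands in for the data resource, and none of it is the OGSA-DAI API.

```python
import csv
import io
import sqlite3

def sql_query(conn, sql):
    """Activity 1: stream rows from the database one at a time."""
    yield from conn.execute(sql)

def to_csv(rows):
    """Activity 2: translate each row to a CSV line as it arrives."""
    for row in rows:
        buf = io.StringIO()
        csv.writer(buf).writerow(row)
        yield buf.getvalue().strip()

def deliver(lines):
    """Activity 3: collect the result for delivery to the client."""
    return list(lines)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)", [(1, "alpha"), (2, "beta")]
)

# The "request document": three tasks wired into one pipeline.
output = deliver(
    to_csv(sql_query(conn, "SELECT id, name FROM samples ORDER BY id"))
)
print(output)  # ['1,alpha', '2,beta']
```

Running the whole chain next to the database is what moving computation to the data means here: only the final result crosses the network.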

32
Monitoring and Discovery System (MDS4)
  • Grid-level monitoring system used most often for
    resource selection
  • Aid user/agent to identify host(s) on which to
    run an application
  • Uses standard interfaces to provide publishing of
    data, discovery, and data access, including
    subscription/notification
  • WS-ResourceProperties, WS-BaseNotification,
    WS-ServiceGroup
  • Functions as an hourglass to provide a common
    interface to lower-level monitoring tools

33
MDS4 Components
  • Information providers
  • Basic data sources: queue data, cluster data,
    etc.
  • Can be from web services, executables, files
  • Index Service
  • Caching registry of data
  • Trigger Service
  • Warnings when conditions are met
  • WebMDS
  • Visualization of data
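
How these components fit together can be sketched with an index that caches values from information providers and a trigger check that raises warnings. The class, the provider names, the thresholds and the cache lifetime are all assumptions for illustration, not the MDS4 interfaces.

```python
import time

class Index:
    """Caching registry: asks a provider only when its entry is stale."""
    def __init__(self, providers, ttl_s, clock=time.monotonic):
        self.providers = providers
        self.ttl_s = ttl_s
        self.clock = clock
        self._cache = {}  # name -> (timestamp, value)

    def get(self, name):
        now = self.clock()
        cached = self._cache.get(name)
        if cached is None or now - cached[0] > self.ttl_s:
            cached = (now, self.providers[name]())
            self._cache[name] = cached
        return cached[1]

def check_triggers(index, triggers):
    """Return the message of every trigger whose condition holds."""
    return [msg for name, cond, msg in triggers if cond(index.get(name))]

provider_calls = {"n": 0}
def queue_length():
    provider_calls["n"] += 1
    return 120

index = Index(
    {"queue_length": queue_length, "free_disk_gb": lambda: 500},
    ttl_s=60.0,
)
triggers = [
    ("queue_length", lambda v: v > 100, "queue is long"),
    ("free_disk_gb", lambda v: v < 10, "disk nearly full"),
]
warnings = check_triggers(index, triggers)
print(warnings)
index.get("queue_length")  # served from the cache
print(provider_calls["n"])
```

Providers can be anything that returns a value, which matches the bullet that data may come from web services, executables or files.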

35
Tested Platforms
  • Debian
  • Fedora Core
  • FreeBSD
  • HP/UX
  • IBM AIX
  • Red Hat
  • Sun Solaris
  • SGI Altix (IA64 running Red Hat)
  • SuSE Linux
  • Tru64 Unix
  • Apple MacOS X (no binaries)
  • Windows (Java components only)
  • List of binaries and known platform-specific
    install bugs at
  • http://www.globus.org/toolkit/docs/4.0/admin/docbook/ch03.html

36
Many Tools Build on, or Can Contribute to, GT4-Based Grids
  • Condor-G, DAGman
  • MPICH-G2
  • GRMS
  • Nimrod-G
  • Ninf-G
  • Open Grid Computing Env.
  • Commodity Grid Toolkit
  • GriPhyN Virtual Data System
  • Virtual Data Toolkit
  • GridXpert Synergy
  • Platform Globus Toolkit
  • VOMS
  • PERMIS
  • GT4IDE
  • Sun Grid Engine
  • PBS scheduler
  • LSF scheduler
  • GridBus
  • TeraGrid CTSS
  • NEES
  • IBM Grid Toolbox

37
Any questions about Globus?
38
Open Middleware Infrastructure Institute
To be a leading provider of reliable,
interoperable and open-source Grid middleware
components, services and tools to support advanced
Grid-enabled solutions in academia and industry.
  • Formed at the University of Southampton (2004)
  • Focus on an easy to install e-Infrastructure
    solution
  • Utilise existing software standards
  • Expanding with new partners in 2006
  • OGSA-DAI team at Edinburgh
  • myGrid team at Manchester

Slides compliments of Steven Newhouse
39
OMII Functions
  • Provide a software repository of Grid components
    and tools from e-science projects
  • Re-engineer software, harden it, and provide
    support for components sourced from the community
  • Contract the development of missing software
    components necessary in grid middleware (managed
    programme)
  • Provide an integrated grid middleware release of
    the sourced software components

40
The Managed Programme Distribution and Repository
  • OGSA-DAI (Data Access service)
  • GridSAM (Job Submission & Monitoring service)
  • Grimoires (Registry service based on UDDI)
  • GeodiseLab (Matlab & Jython environments)
  • FINS (Notification services using WS-Eventing)
  • BPEL (Workflow service)
  • MANGO (Managing workflows with BPEL)
  • FIRMS (Reliable messaging)

41
So
  • eInfrastructure has many definitions, but
    basically it's Grid computing
  • JISC has funding for this but hasn't yet
    defined where it will be spent
  • Globus Toolkit provides many basic tools, and is
    incorporated in many projects, esp those focused
    on data movement
  • In the UK, OMII is another useful source of
    eInfrastructure software

42
Additional Information
  • Contact
  • Jennifer M. Schopf
  • jms_at_mcs.anl.gov
  • http://www.mcs.anl.gov/jms
  • Globus Alliance
  • http://www.globus.org
  • Information about OMII
  • http://www.omii.ac.uk
  • s.newhouse_at_omii.ac.uk