Title: NeSC Review


1
EPCC Sun Data and Compute Grids
Geoff Cawood, Terry Sloan
Edinburgh Parallel Computing Centre (EPCC)
Telephone: +44 131 650 5155
Email: t.sloan_at_epcc.ed.ac.uk
  • NeSC Review
  • 18 March 2004

2
Overview
  • Description and Aims
  • Project Status
  • Technical Achievements
  • Dissemination/Exploitation
  • Future Plans

3
Description and Aims
4
Project Goal
  • Develop a fully Globus-enabled compute and data
    scheduler based around Grid Engine, Globus and a
    wide variety of data technologies
  • Partners
  • Sun Microsystems
  • National e-Science Centre represented by EPCC
  • Timescales
  • 23 (+2) months duration
  • Due to project staff involvement in ODDGenes
  • Start Feb 2002, end Feb 2004
  • Grid Engine
  • open source distributed resource management (DRM)
    system
  • Globus integration enables sharing of resources
    amongst collaborating enterprises

5
Project Scenario
  • If enterprises A and B could expose some of their
    machines to each other across the internet
  • Both A and B could enjoy throughput efficiency
    improvements
  • Large gains when one enterprise is busy and the
    other is idle
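The scenario above can be illustrated with a little arithmetic. The following is only a sketch under simplifying assumptions that are not in the slides: every job takes one time unit, both enterprises own the same number of machines, and sending a job to the other site costs nothing. The function names are invented for the illustration.

```python
import math


def makespan(jobs, machines):
    """Time units needed to finish `jobs` unit-length jobs on `machines` machines."""
    return math.ceil(jobs / machines)


def isolated_makespan(jobs_a, jobs_b, machines_each):
    """Each enterprise runs only its own jobs on its own machines."""
    return max(makespan(jobs_a, machines_each), makespan(jobs_b, machines_each))


def shared_makespan(jobs_a, jobs_b, machines_each):
    """Both enterprises expose their machines to each other (idealised pooling)."""
    return makespan(jobs_a + jobs_b, 2 * machines_each)


if __name__ == "__main__":
    # A is busy, B is idle: pooling halves the completion time.
    print(isolated_makespan(100, 0, 10), shared_makespan(100, 0, 10))
    # Both are equally busy: pooling brings no gain.
    print(isolated_makespan(100, 100, 10), shared_makespan(100, 100, 10))
```

Under these assumptions the gain is largest when the load is most uneven, which is the point the slide makes. Real gains would be lower once data transfer and scheduling latency are counted.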

6
Functional Aims
  • What does the project goal mean in practice?
  • Identify five key functional aims
  • 1. Job scheduling across Globus to remote Grid
    Engines
  • 2. File transfer between local client site and
    remote jobs
  • 3. File transfer between any site and remote jobs
  • 4. Allow 'datagrid aware' jobs to work remotely
  • 5. Data-aware job scheduling
  • Derived from questioning existing Grid Engine
    users during Requirements WP

7
Project Status
8
Workpackages
  • WP 1 Analysis of existing Grid components
  • WP 1.1 UML analysis of core Globus 2.0
  • WP 1.2 UML analysis of Grid Engine
  • WP 1.3 UML analysis of other Globus 2.0
  • WP 1.4 Globus toolkit V3.0 Investigations
  • WP 1.5 Data Technologies Investigations
  • WP 2 Requirements Capture & Analysis
  • WP 3 Prototype Development
  • WP 4 Hierarchical Scheduler Design
  • WP 5 Hierarchical Scheduler Development

9
Deliverables
  • All WPs finished
  • Deliverables available from project public web
    site
  • http://www.epcc.ed.ac.uk/sungrid
  • Or from the Grid Engine community web site (for
    software)
  • http://gridengine.sunsource.net/
  • WP 3 Prototype Development (FINISHED)
  • D3.1 Prototype Development Requirements
  • D3.2 Prototype Development Design
  • D3.3 Prototype Development Test plan
  • D3.4 Prototype Development TOG software
  • D3.6 Prototype Development How-To
  • WP 4 Hierarchical Scheduler Design (FINISHED)
  • D4.1 JOSH Functional Specification
  • D4.2 JOSH Systems Design
  • WP 5 Hierarchical Scheduler Development
    (FINISHED)
  • JOSH User Guide
  • JOSH Software
  • JOSH Client Install Guide
  • JOSH Server Install Guide
  • JOSH Known Problems & Solutions
  • WP 1 Analysis of existing Grid components
    (FINISHED)
  • D1.1 Analysis of Globus Toolkit V2.0
  • D1.2 Grid Engine UML Analysis
  • D1.3 Globus Toolkit 2.0 GRAM Client API Functions
  • D1.4 Globus 3.0 Features and Use
  • D1.5.2 Datagrids In Practice
  • D1.5.3 GridFTP
  • D1.5.4 OGSA-DAI
  • D1.5.5 Storage Resource Broker (SRB)
  • WP 2 Requirements Capture & Analysis (FINISHED)
  • D2.1 Use cases and requirements
  • D2.2 Questionnaire Report

10
Technical Achievements
"From Sun's perspective, the SunDCG project has
been tremendously successful.  Together, EPCC and
Sun have produced very high quality software and
documents, providing real added value to Sun's
Grid Engine suite and addressing some of the key
issues in robust and usable Grid
middleware." Fritz Ferstl, Sun Microsystems
11
TOG (Transfer-queue Over Globus)
[Diagram: Site A (User A, Grid Engine) and Site B (User B, Grid Engine), elements labelled a to h, a transfer queue, and Globus 2.2.x]
  • WP 3 deliverable prototype compute scheduler
  • Integrates GE and Globus 2.2.x/2.4 (Software
    library)
  • Supply GE execution methods (starter method etc.)
    to implement a 'transfer queue' which sends jobs
    over Globus to a remote GE
  • GE complexes used for configuration
  • Globus GSI for security, GRAM for interaction
    with remote GE
  • GASS for small data transfer, GridFTP for large
    datasets
  • Written in Java - Globus functionality accessed
    through the Java CoG Kit

12
TOG Software
  • Functionality
  • 1. Job scheduling across Globus to remote Grid
    Engines
  • 2. File transfer between local client site and
    remote jobs
  • Add special comments to job script to specify set
    of files to transfer between local site and
    remote site
  • 4. Allow 'datagrid aware' jobs to work remotely
  • Use of Globus GRAM ensures proxy certificate is
    present in remote environment
  • Absent
  • 3. File transfer between any site and remote jobs
  • Files are transferred between remote site and
    local site only
  • 5. Data-aware job scheduling
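As an illustration of the "special comments" idea above, here is a minimal sketch of how a transfer queue might read file-staging directives out of a job script. The slides do not give TOG's actual comment syntax, so the `TOG_STAGE_IN` / `TOG_STAGE_OUT` directives, the `TransferPlan` type and the function name are all hypothetical. TOG itself is written in Java; Python is used here only for brevity.

```python
import re
from dataclasses import dataclass, field

# Hypothetical directive syntax invented for this sketch; the real TOG
# comment format is not given in the slides.
DIRECTIVE = re.compile(r"^#\s*TOG_(STAGE_IN|STAGE_OUT)\s+(\S+)\s*$")


@dataclass
class TransferPlan:
    stage_in: list = field(default_factory=list)   # local site -> remote site
    stage_out: list = field(default_factory=list)  # remote site -> local site


def parse_transfer_directives(script_text):
    """Collect the files named in staging comments of a job script."""
    plan = TransferPlan()
    for line in script_text.splitlines():
        match = DIRECTIVE.match(line.strip())
        if match is None:
            continue  # ordinary script line or unrelated comment
        kind, path = match.groups()
        if kind == "STAGE_IN":
            plan.stage_in.append(path)
        else:
            plan.stage_out.append(path)
    return plan


EXAMPLE_SCRIPT = """#!/bin/sh
# TOG_STAGE_IN input.dat
# TOG_STAGE_OUT results.txt
./simulate input.dat > results.txt
"""

if __name__ == "__main__":
    print(parse_transfer_directives(EXAMPLE_SCRIPT))
```

Because the directives are ordinary shell comments, the same script still runs unchanged on a local Grid Engine queue, which fits the "existing Grid Engine interface" point made for TOG.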

13
TOG Software
  • Pros
  • Simple approach
  • Usability
  • Existing Grid Engine interface
  • Only addition is Globus certificate for
    authentication/authorisation
  • Remote administrators still have full control
    over their resources
  • Cons
  • Low quality scheduling decisions
  • State of remote resource: is it fully loaded?
  • Ignores data transfer costs
  • Scales poorly - one local transfer queue for each
    remote queue
  • Manual set-up
  • Configuring the transfer queue with same
    properties as remote queue
  • Java virtual machine invocation per job submission

14
JOSH (JOb Scheduling Hierarchically)
  • Developing JOSH software
  • Address the shortcomings of TOG
  • Incorporate Globus 3 and grid services
  • WP 5 deliverable - compute/data scheduler
  • Adds a new 'hierarchical' scheduler above Grid
    Engine
  • hiersched submit_ge
  • Takes GE job script as input (embellished with
    data requirements)
  • Queries grid services at each compute site to
    find best match and submits job
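The "query each site, pick the best match" step can be sketched as follows. This is not the JOSH algorithm, which the slides do not describe. The `SiteReport` fields, the scoring formula and the `load_weight` parameter are assumptions made for illustration; only the ideas of a 'can run' check and of taking data location into account come from the presentation.

```python
from dataclasses import dataclass


@dataclass
class SiteReport:
    """What a compute site's grid service is assumed to report (hypothetical)."""
    name: str
    can_run: bool            # stand-in for Grid Engine's 'can run' check
    load: float              # fraction of slots busy, 0.0 to 1.0
    data_transfer_mb: float  # input data that would have to be moved to the site


def choose_site(reports, load_weight=100.0):
    """Pick the site with the lowest combined data-transfer and load cost.

    The cost function is an invented example, not taken from JOSH.
    Returns None when no site can run the job.
    """
    candidates = [r for r in reports if r.can_run]
    if not candidates:
        return None
    return min(candidates,
               key=lambda r: r.data_transfer_mb + load_weight * r.load)


if __name__ == "__main__":
    reports = [
        SiteReport("A", can_run=True, load=0.2, data_transfer_mb=800.0),
        SiteReport("B", can_run=True, load=0.5, data_transfer_mb=0.0),
        SiteReport("C", can_run=False, load=0.0, data_transfer_mb=0.0),
    ]
    best = choose_site(reports)
    print(best.name if best else "no site can run the job")
```

With these numbers site B is chosen even though it is busier than A, because the data is already there. A load-only policy would have picked A and paid the transfer cost.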

15
JOSH
  • Pros
  • Satisfies the 5 functionality goals
  • Fulfills the project goal
  • Remote administrators still have full control
    over their GEs
  • Makes use of existing GE functionality, e.g. 'can
    run'
  • Cons
  • Latency in decision making
  • Not so much 'scheduling' as 'choosing'
  • Grid Engine specific solution

16
Dissemination/Exploitation
17
Presentations
  • Ernst & Young, WestInfo Services, Strategy
    Performance Associates, SingTel Optus, Executive
    Briefing Centre, Curtin Business School, Curtin
    University of Technology, Perth, Australia,
    February 24th and 26th, 2004
  • Curtin Business School Information Systems
    Seminar, Curtin University of Technology, Perth,
    Australia, February 20th 2004
  • GlobusWORLD 2004, San Francisco, USA, January
    22nd, 2004
  • White Rose Grid, EPCC Sun Data & Compute Grids,
    UCL Workshop, York University, November 11th,
    2003
  • Sun HPC Consortium, Phoenix, USA, November 2003
  • Open Issues in Grid Scheduling, National
    e-Science Centre, Edinburgh, UK, October 21st
    2003
  • 2nd Grid Engine Workshop, Regensburg, Germany,
    September 22-24 2003
  • SunLabs Europe, Edinburgh, September 1st, 2003
  • Sun HPC Consortium, Grid and Portal Computing
    SIG, Heidelberg, Germany, June 21st 2003
  • Resource Management and Scheduling for the Grid,
    National e-Science Centre, Edinburgh, UK,
    February 13th 2003
  • Sun HPC Consortium, Grid and Portal Computing
    SIG, Baltimore, USA, November 15th 2002
  • EPCC Sun Data and Compute Grids / White Rose
    Computational Grid Meeting, EPCC, Edinburgh, UK,
    November 7th 2002
  • Sun HPC Consortium, Grid and Portal Computing
    SIG, Glasgow, UK, July 18th 2002
  • Grid Engine Workshop, Regensburg, Germany, April
    22-24 2002

18
Software Take-up
  • Transfer-queue Over Globus (TOG) takeup includes
  • ODD-Genes
  • Uses SunDCG TOG and OGSA-DAI to demonstrate a
    scientific use for the grid (bioinformatics),
    presented at
  • UK All Hands Meeting 2003 in Sept 2003
  • Supercomputing 2003 in Nov 2003 on Sun, UK
    e-Science and Globus Alliance booths
  • Poster/Demo at Globusworld 2
  • Numerous visitors to Edinburgh University
  • INWA
  • Uses Sun DCG TOG, OGSA-DAI and FirstDIG browser
    to demonstrate data mining of commercial bank and
    telco data over the grid with Curtin Business
    School, Perth Australia
  • Liverpool University's ULGrid
  • Using Sun DCG TOG to enable users to access
    resources from various departments
  • Raytheon Inc (USA)
  • Use SunDCG TOG in grid evaluations
  • Sun Singapore

19
Software Take-up
  • Job Scheduling Hierarchically (JOSH) known
    interest includes
  • White Rose Grid
  • Raytheon Inc.
  • Academic Technology Services at UCLA
  • School of Pharmaceutical Sciences at the
    University of Nottingham
  • Texas Advanced Computing Center
  • Forecast Systems Laboratory of NOAA

20
Downloads
  • 10,300 document downloads between Feb 27th 2003
    and Feb 26th 2004
  • No specific figures on TOG/JOSH software
    downloads
  • Hosted at Grid Engine community web site
  • Figures are not available
  • BUT from EPCC web site
  • > 400 TOG Requirements document
  • > 400 JOSH Functional Specification
  • > 300 JOSH Systems Design
  • JOSH documents only available since Feb 3rd
    2004
  • Community Scheduler Framework
  • Does not have data aware scheduling
  • Platform have asked if they could get the JOSH
    algorithms included
  • So LOTS of interest in JOSH

21
Future Plans
22
Future Plans
  • Effort budget ran out in February 2004
  • Sun will integrate TOG/JOSH into Grid Engine
    source from March 2004
  • Open Source development via Grid Engine community
    web site
  • If funds made available
  • WS-RF update
  • Access to other DRMs, e.g. LoadLeveler, LSF
  • WS-Agreement compliance, JSDL
  • Further functionality
  • All are straightforward due to good design in JOSH

"I just recommended TOG and JOSH as a starting
point for a partner who wants to build Grid
middleware for nuclear plants." Fritz Ferstl, Sun
Microsystems
23
Demo