NeSC Workshop on Resource Management and Scheduling for the Grid, 13th February 2003


1
EPCC Sun Data and Compute Grids Project
Using Sun Grid Engine and Globus to Schedule Jobs
Across a Combination of Local and Remote Machines
Terry Sloan
Edinburgh Parallel Computing Centre (EPCC)
Telephone: +44 131 650 5155
Email: t.sloan_at_epcc.ed.ac.uk
2
Overview
  • The Project
  • Why do it?
  • Project Scenario
  • Project Goal
  • How?
  • Project Achievements
  • The Compute Scheduler
  • The Compute Data Scheduler

3
The Project
4
The Project
  • Develop a Globus-enabled compute and data
    scheduler
  • Based on Grid Engine, Globus and a variety of
    data technologies

5
The Project (cont)
  • Partners
      • Sun Microsystems
      • National e-Science Centre, represented by EPCC
  • Timescales
      • 23 months
      • Start: Feb 2002
      • End: Dec 2003
      • Feb 2003: Project Month 13 (PM13)

6
Why do it?
7
Why do it?
  • Grid Engine: over 20,000 downloads (Nov 2002)
      • Distributed Resource Management tool
      • Schedules activities across networked
        resources
  • Sun classifies 3 levels of Grid
      • Cluster Grid: a single team or project and
        their associated resources
      • Enterprise Grid: multiple teams and projects
        but within a single organisation, facilitating
        collaboration across the enterprise
      • Global Grid: linked Cluster and Enterprise
        grids, providing collaboration amongst
        organisations
  • Grid Engine meets the first two levels but by
    itself does not meet the third

8
Why do it? (cont)
  • Globus Toolkit
      • A Grid API for connecting distributed compute
        and instrument resources
  • Integration with Globus allows Grid Engine to
    meet level 3
      • Collaboration amongst enterprises
  • Most integration efforts use Globus to submit
    work to Grid Engine
  • This project tackles the opposite problem: to
    engineer Grid Engine on top of Globus

9
Why do it? (cont)
  • Grid Engine is concerned with compute resources
  • Extend it to work with popular data and service
    access protocols (e.g. OGSA-DAI)

10
Project Scenario
11
Project Scenario
  • Two collaborating enterprises A and B both have
    some machines
  • Both enterprises run Grid Engine to schedule jobs
  • Local demand for machines is variable
  • Sometimes it exceeds supply
  • Other times machines lie idle

[Figure: enterprises A and B, each with its own users, Users (A) and Users (B)]
12
Project Scenario (cont)
[Figure: enterprises A and B with Users (A) and Users (B)]
13
The Project Goal
14
Project Goal
  • Final goal
      • Develop a scheduler based on Grid Engine to
        schedule jobs across a combination of local
        and remote machines
      • Enable jobs to access necessary data sources
      • Use Globus as the Grid API to provide secure
        communications and transfer
  • Development criteria
      • Industrial strength
      • Application of software engineering techniques
      • Use of industry standard design and analysis
        tools
      • Migration to OGSA-compliant Globus 3

15
How?
16
Workpackages
  • WP 1: Analysis of existing Grid components
      • WP 1.1: UML analysis of core Globus 2.0
      • WP 1.2: UML analysis of Grid Engine
      • WP 1.3: UML analysis of other Globus 2.0
      • WP 1.4: UML analysis of Globus 3.0
      • WP 1.5: Exploration of data technologies
  • WP 2: Requirements Capture and Analysis
  • WP 3: Prototype Compute Scheduler
  • WP 4: Compute/Data Scheduler Design
  • WP 5: Compute/Data Scheduler Development

17
The Project Team
  • Project Personnel
      • Terry Sloan: Project leader
      • Geoff Cawood: Project architect
      • Ratna Abrol: Engineering
      • Thomas Seed: Engineering
      • Ali Anjomshoaa: Globus 2 Analysis
      • Paul Graham: Requirements Capture and Analysis
      • Amy Krause: Technical reviewer
  • Project Review Board
      • Fritz Ferstl (Sun Microsystems GmbH)
      • John Barr (Sun Microsystems Ltd)
      • Steven Newhouse (London e-Science Centre)
      • Neil Chue Hong (EPCC)

18
Achievements
19
Achievements
  • Publications
      • D1.1: Analysis of Globus Toolkit V2.0
      • D1.2: Grid Engine UML Analysis
      • D2.1: Use cases and requirements
      • D2.2: Questionnaire Report
      • D3.1: Prototype Development Requirements
  • Software
      • Transfer-queue Over Globus (TOG)

20
Transfer-queue Over Globus (TOG) - A Compute Scheduler
21
Transfer-queue Over Globus (TOG)
[Figure: sites A and B, each running Grid Engine for its own user (User A, User B); machines labelled a to d and e to h; Globus 2 linking the two sites]
  • Integrates Grid Engine and Globus 2 to access
    remote resources
  • GE execution methods provide job submission and
    control
  • GE job context stores job-specific information,
    e.g. the job handle
  • Globus GSI for security
  • Globus GRAM enables interaction with remote
    resource
  • GASS for small data transfer, GridFTP for large
    datasets
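
The last bullet above splits data movement between GASS and GridFTP by size. A minimal sketch of that selection rule follows, assuming a size threshold; the slides give no cut-off, so the 10 MiB limit and the function name `choose_transfer_mechanism` are hypothetical, and the function only names a mechanism rather than calling any Globus API.

```python
# Hypothetical cut-off: the slides only distinguish "small" from "large".
SMALL_TRANSFER_LIMIT_BYTES = 10 * 1024 * 1024  # 10 MiB, an assumed value


def choose_transfer_mechanism(size_bytes: int,
                              limit: int = SMALL_TRANSFER_LIMIT_BYTES) -> str:
    """Return "GASS" for small transfers and "GridFTP" for large ones."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "GASS" if size_bytes <= limit else "GridFTP"


if __name__ == "__main__":
    # A 4 KiB script versus a 2 GiB dataset.
    for size in (4 * 1024, 2 * 1024 ** 3):
        print(size, choose_transfer_mechanism(size))
```

In practice the threshold would probably be a per-site configuration value rather than a constant.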

22
TOG (cont)
  • Current Status
      • Secure job submission functionality
        implemented and tested
          • Staging of input data and executables and
            transfer of output
      • Secure job control functionality implemented
        and tested
          • Suspend, Resume, Terminate
      • Basic scheduling functionality implemented
        and tested
          • Schedules jobs to remote resources when
            local resources are full
  • Testing
      • Integrated successfully within Grid Engine
        test suite
      • Tested through firewalls
  • TOG software available upon request
      • Contact: sungrid_at_epcc.ed.ac.uk
  • Generally available via web site soon
      • www.epcc.ed.ac.uk/sungrid
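
The basic scheduling behaviour described on this slide (jobs go to remote resources only when local resources are full) can be sketched as follows. This is an illustration under assumptions, not TOG's implementation: the `Queue` class, its slot counting, and `place_job` are hypothetical names, and a real transfer queue would forward the job through Globus rather than increment a counter.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Queue:
    """Hypothetical stand-in for a Grid Engine queue."""
    name: str
    slots: int
    running: int = 0
    remote: bool = False  # True for a queue that forwards jobs to another site

    def has_free_slot(self) -> bool:
        return self.running < self.slots


def place_job(queues: List[Queue]) -> Optional[Queue]:
    """Prefer local queues; fall back to remote ones only when local is full."""
    # sorted() is stable and False sorts before True, so local queues come first.
    for queue in sorted(queues, key=lambda q: q.remote):
        if queue.has_free_slot():
            queue.running += 1
            return queue
    return None  # nothing free anywhere: the job stays pending


if __name__ == "__main__":
    queues = [Queue("remote-b", slots=1, remote=True), Queue("local-a", slots=1)]
    for job in ("job1", "job2", "job3"):
        placed = place_job(queues)
        print(job, "->", placed.name if placed else "pending")
```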

23
TOG (cont)
  • Pros
      • Simple approach
      • Usability: existing Grid Engine interface,
        users only need to be aware of Globus
        certificates
      • Remote administrators still have full control
        of their resources

24
TOG (cont)
  • Cons
      • Low quality scheduling decisions (?)
          • May be a time-lag in getting query results
            back from remote resource
          • Incorporating data transfer costs into
            scheduling
      • Mirror queues for remote resources
          • Possible set-up overhead
      • Globus 2 vs. Globus 3
      • Grid Engine specific solution
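
The slide raises incorporating data transfer costs into scheduling as an open issue. One way such a cost could enter a placement decision is sketched below, assuming a simple model in which estimated completion time is queue wait plus transfer time plus run time. The model, the function name `estimated_completion_s`, and all figures are hypothetical and not taken from the project.

```python
def estimated_completion_s(wait_s: float, run_s: float,
                           data_bytes: int = 0,
                           bandwidth_bytes_per_s: float = 0.0) -> float:
    """Estimate seconds until a job finishes on a given resource."""
    if data_bytes == 0:
        transfer_s = 0.0  # local jobs need no staging
    elif bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive when data must move")
    else:
        transfer_s = data_bytes / bandwidth_bytes_per_s
    return wait_s + transfer_s + run_s


if __name__ == "__main__":
    # Local site is busy (10 minute wait); remote site is idle but needs staging.
    local = estimated_completion_s(wait_s=600, run_s=100)
    remote = estimated_completion_s(wait_s=0, run_s=100,
                                    data_bytes=10**9,
                                    bandwidth_bytes_per_s=10**7)
    print("run remotely" if remote < local else "run locally")
```

With these assumed numbers the remote site is the better choice; staging ten times as much data over the same link would reverse the decision.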

25
The Compute Data Scheduler
26
Current status
  • Considering two possible routes
      • Extend TOG
          • Migrate to Globus 3
          • Incorporate OGSA-DAI
      • Hierarchical Scheduler
          • Overcome limitations
          • Global Grid vision

27
1. Extend compute scheduler
  • Compute Grid
  • Data Grid

[Figure: three GE instances and four Globus instances spanning the Compute Grid and Data Grid; Globus hides ODBC, JDBC, XMLDB etc.]
28
2. Hierarchical Scheduler
  • Unified Interface
  • Grid Scalability

[Figure: scheduler hierarchy with levels Scotland, Edinburgh and EPCC, Grid Engine instances beneath, and the same interface at every level]
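
The "unified interface" idea on this slide, where every level of the hierarchy accepts jobs in the same way, can be sketched with a composite structure. This is an illustrative sketch only: the class names, the slot-based leaf, and the first-fit delegation policy are assumptions, and "Other Edinburgh site" is a made-up sibling added to show delegation.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class Scheduler(ABC):
    """The single interface shared by every level of the hierarchy."""

    @abstractmethod
    def submit(self, job: str) -> Optional[str]:
        """Return the name of the leaf that accepted the job, or None."""


class LeafScheduler(Scheduler):
    """Hypothetical stand-in for one Grid Engine installation."""

    def __init__(self, name: str, slots: int) -> None:
        self.name = name
        self.free = slots

    def submit(self, job: str) -> Optional[str]:
        if self.free > 0:
            self.free -= 1
            return self.name
        return None


class ParentScheduler(Scheduler):
    """Delegates to child schedulers, which may be leaves or other parents."""

    def __init__(self, name: str, children: List[Scheduler]) -> None:
        self.name = name
        self.children = children

    def submit(self, job: str) -> Optional[str]:
        for child in self.children:  # first-fit, an assumed policy
            placed = child.submit(job)
            if placed is not None:
                return placed
        return None


if __name__ == "__main__":
    epcc = LeafScheduler("EPCC", slots=1)
    other = LeafScheduler("Other Edinburgh site", slots=1)
    scotland = ParentScheduler("Scotland",
                               [ParentScheduler("Edinburgh", [epcc, other])])
    for job in ("job1", "job2", "job3"):
        print(job, "->", scotland.submit(job))
```

Because parents and leaves share `submit`, a new level can be added above "Scotland" without changing any caller, which is the scalability property the slide points at.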
29
Conclusions
  • Before proceeding
      • Examine Globus 3 Analysis
      • Examine Data Technologies, i.e. OGSA-DAI, etc.
  • Informed decision on whether to
      • Extend Compute Scheduler, or
      • Build Hierarchical Scheduler or some sub-set
        of this
  • Delivery in December 2003