1
Grid Scheduling through Service-Level Agreement
  • Karl Czajkowski
  • The Globus Project
  • http://www.globus.org/

2
Overview
  • Introduction to Grid Environments
  • The Resource Management Problem
  • Cross-domain applications
  • Resource owner goals vs. application goals
  • An Open Architecture to Manage Resources
  • Service-Level Agreement (SLA)
  • GRAM and Managed Services
  • Related and Ongoing Work

3
Grid Resource Environment
[Figure: many resources (R) and dispersed users joined by a network, with groupings labeled VO-A and VO-B]
  • Distributed users and resources
  • Variable resource status
  • Variable grouping and connectivity
  • Decentralized scheduling/policy

4
Social/Policy Conflicts
  • Application Goals
  • User deadlines and availability goals
  • Applications need coordinated resources
  • Localized Resource Owner Goals
  • Policies towards users
  • Optimization goals
  • Community Goals Emerge As
  • An aggregate user/application?
  • A virtual resource? Both!

5
Data-Intensive Example
  • Concurrent resource requirements
  • Large scale storage, computing, network, graphics
  • Datapath involves autonomous domains

6
Early Co-Allocation in Grids
  • SF-Express (1997-8)
  • Real-time simulation
  • 12 supercomputers, 1400 processors
  • Required advance reservation
  • Brokered by telephone!
  • Globus DUROC software to sync startup
  • Over 45 minutes to recover from failure
  • In use today in MPICH-G2 (MPI library)

7
Traditional Scheduling
  • Closed-System Model
  • Presumption of global owner/authority
  • Sandboxed applications with no interactions
  • Toss job over the fence and wait
  • Utilization as Primary Metric
  • Deep batch queues allow tighter packing
  • No incentives for matching user schedule
  • Sub-cultures Counter Site Policies
  • Users learn tricks for gaming their site

8
An Open Negotiation Model
  • Resources in a Global Context
  • Advertisement and negotiation
  • Normalized remote client interface
  • Resource maintains autonomy
  • Users or Agents Bridge Resources
  • Drive task submission and provisioning
  • Coordinate acts across domains
  • Community-based Mediation
  • Coordination for collective interest

9
Community Scheduling Example
  • Individual users
  • Require service
  • Have application goals
  • Community schedulers
  • Broker service
  • Aggregate scheduling
  • Individual resources
  • Provide service
  • Have policy autonomy
  • Serve above clients

10
Negotiation Phases
  • Discovery
  • What resources are relevant to interest?
  • Finds service providers
  • Monitoring
  • What's happening to them now?
  • Compare service providers
  • Service-Level Agreement
  • Will they provide what I need?
  • The core Resource Management problem
  • Process can iterate due to adaptation
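
The three phases above can be sketched as a small pipeline. This is only an illustration: the advertisement records, the field names, and the functions `discover`, `monitor`, `request_sla`, and `acquire` are all hypothetical, and a real Grid would query an information service rather than a hard-coded list.

```python
# Hypothetical advertisement records (assumption: not from the talk).
ADVERTISED = [
    {"name": "cluster-a", "kind": "compute", "free_cpus": 16},
    {"name": "cluster-b", "kind": "compute", "free_cpus": 64},
    {"name": "store-c", "kind": "storage", "free_cpus": 0},
]

def discover(kind):
    """Discovery: which providers are relevant to the request?"""
    return [r for r in ADVERTISED if r["kind"] == kind]

def monitor(providers):
    """Monitoring: compare providers by current status (most free CPUs first)."""
    return sorted(providers, key=lambda r: r["free_cpus"], reverse=True)

def request_sla(provider, cpus):
    """SLA: ask one provider for a commitment; None means it refused."""
    if provider["free_cpus"] >= cpus:
        return {"provider": provider["name"], "cpus": cpus}
    return None

def acquire(kind, cpus):
    """Run the phases in order, moving to the next candidate on refusal."""
    for provider in monitor(discover(kind)):
        sla = request_sla(provider, cpus)
        if sla is not None:
            return sla
    return None
```

The loop in `acquire` is a much simplified stand-in for the iteration the slide mentions: when one provider refuses, the client adapts by trying the next one.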

11
Service-Level Agreement
  • Three kinds of SLA
  • Task submission (do something)
  • Resource reservation (pre-agreement)
  • Lazy task/resource Binding (apply reservation)
  • Simple protocol for negotiating SLAs
  • Basic 2-party negotiation
  • Support for basic offer/accept pattern
  • Optional counter-offer patterns
  • Variable commitment phase for stricter promises
  • Client may maintain multiple 2-party SLAs
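
A minimal sketch of the offer/accept pattern with an optional counter-offer might look like the following. The `Offer` fields, the `Provider` class, and its capacity rule are assumptions made for illustration; they are not the GRAM protocol messages.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Offer:
    cpus: int
    hours: int

class Provider:
    """Hypothetical provider that grants at most `max_cpus` per agreement."""
    def __init__(self, max_cpus: int):
        self.max_cpus = max_cpus

    def respond(self, offer: Offer) -> Tuple[str, Optional[Offer]]:
        if offer.cpus <= self.max_cpus:
            return ("accept", offer)
        # Counter-offer: same duration, but only what the provider can give.
        return ("counter", Offer(cpus=self.max_cpus, hours=offer.hours))

def negotiate(provider: Provider, offer: Offer, min_cpus: int,
              max_rounds: int = 3) -> Optional[Offer]:
    """Two-party negotiation; returns the agreed terms or None."""
    for _ in range(max_rounds):
        verdict, terms = provider.respond(offer)
        if verdict == "accept":
            return terms
        if terms is None or terms.cpus < min_cpus:
            return None          # counter-offer is unacceptable to the client
        offer = terms            # client adopts the counter-offer and resubmits
    return None
```

Because each negotiation involves only one client and one provider, a client that needs several resources would simply run `negotiate` once per provider and hold the resulting agreements side by side.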

12
Many Types of Service
  • Must support service heterogeneity
  • Resources
  • Hardware: disks, CPU, memory, networks, display
  • Logical: accounts, services
  • Capabilities: space, throughput
  • Tasks
  • Data: stored file, data read/write
  • Compute: execution, suspended/swapped job
  • SLAs bear embedded term languages
  • Isolate domain-specific details
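
One way to picture an embedded term language is a generic SLA envelope whose terms are opaque to the generic code and interpreted only by a domain-specific handler. The sketch below assumes a hypothetical `SLA` record and a `VALIDATORS` registry; the domain names and term keys are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class SLA:
    kind: str               # generic part: "task" or "resource"
    domain: str             # names the embedded term language
    terms: Dict[str, Any]   # opaque to generic SLA handling

# Hypothetical registry of domain-specific term checkers.
VALIDATORS: Dict[str, Callable[[Dict[str, Any]], bool]] = {
    "compute": lambda t: t.get("cpus", 0) > 0,
    "storage": lambda t: t.get("space_gb", 0) > 0,
}

def validate(sla: SLA) -> bool:
    """Generic code only dispatches; the domain handler reads the terms."""
    check = VALIDATORS.get(sla.domain)
    return check is not None and check(sla.terms)
```

Supporting a new kind of service then means registering one more handler, without touching the generic negotiation code.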

13
Domain Extension: File Transfer
  • Single goal
  • Reliable deadline transfer
  • Specialized scheduler
  • Brokers basic services
  • Synthesizes new service
  • Fault-handling logic
  • Distributed resources
  • Storage space
  • Storage bandwidth
  • Network bandwidth
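
As a rough illustration of how a specialized transfer scheduler could combine the three basic resources, the sketch below checks whether a deadline is feasible. The function names, the units (1 GB taken as 1000 MB), and the simple bottleneck model are assumptions, not part of the talk.

```python
def transfer_seconds(size_gb, storage_mb_s, network_mb_s):
    """Estimated duration: the slower of storage and network is the bottleneck."""
    rate = min(storage_mb_s, network_mb_s)
    return size_gb * 1000 / rate

def can_meet_deadline(size_gb, free_space_gb, storage_mb_s,
                      network_mb_s, deadline_s):
    """Feasible only if the file fits and the estimated time is within the deadline."""
    if free_space_gb < size_gb:
        return False
    return transfer_seconds(size_gb, storage_mb_s, network_mb_s) <= deadline_s
```

A broker along these lines would request storage space, storage bandwidth, and network bandwidth from the underlying providers only after this check passes, and would retry or renegotiate if one of them failed.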

14
Technical Challenges
  • Complex Security Requirements
  • Global Scalability
  • Similar ideals to Internet
  • Interoperable infrastructure
  • Policy-configurable for social needs
  • Permanence or Evolve in Place
  • Cannot take World off-line for service
  • Over time upgrade, extend, adapt
  • Accept heterogeneity

15
GRAM Architecture
[Figure: GRAM architecture diagram. Labels: Application, Planner, Information Service, Discover, Monitor, Domain-specific SLA, SLA implementation, Concrete SLA, Incremental SLAs, three GRAM2 instances, Local resource managers, Job, CPU, Disk]

16
WS-Agreement
  • New standardization effort
  • Generalizes GRAM ideas
  • Service-oriented architecture
  • Resource becomes Service Provider
  • Tasks become Negotiated Services
  • SLAs presented as Agreement services
  • Still supports extensible domain terms

17
WS-Agreement Entities

18
WS-Agreement Adds Management

19
Virtualized Providers

20
Agreement-based Jobs
  • Agreement represents queue entry
  • Commitment with job parameters etc.
  • Agreement Provider
  • i.e. Job scheduler/Queuing system
  • Management interface to service provider
  • Service Provider
  • i.e. scheduled resource (compute nodes)
  • Service is the Job computation

21
Advance Reservation for Jobs
  • Schedule-based commitment of service
  • Requires schedule-based SLA terms
  • Optional Pre-Agreement (RSLA)
  • Agreement to facilitate future Job Agreement
  • Characterizes virtual resource needed for Job
  • May not need full job terms
  • Job Agreement almost as usual
  • May exploit Pre-Agreement
  • Reference existing promise of resource schedule
  • May get schedule commitment in one shot
  • Directly include schedule terms
  • (Can think of as atomic advance reserve/claim)
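
A schedule-based commitment can be sketched as an admission check against a node schedule. The `NodeSchedule` class below is hypothetical, and time is represented as plain numbers with half-open intervals, which is an assumption made to keep the example short.

```python
class NodeSchedule:
    """Hypothetical schedule for a pool of identical nodes."""
    def __init__(self, total_nodes):
        self.total_nodes = total_nodes
        self.reservations = []   # (start, end, nodes), end exclusive

    def peak_usage(self, start, end):
        """Largest number of reserved nodes at any instant in [start, end)."""
        # Usage only rises at reservation start times, so checking the window
        # start plus every reservation start inside the window is sufficient.
        points = {start} | {s for s, e, n in self.reservations if start <= s < end}
        return max(
            sum(n for s, e, n in self.reservations if s <= p < e)
            for p in points
        )

    def reserve(self, start, end, nodes):
        """Grant a reservation only if capacity holds for the whole window."""
        if self.peak_usage(start, end) + nodes > self.total_nodes:
            return None
        rsla = (start, end, nodes)
        self.reservations.append(rsla)
        return rsla
```

In this sketch the returned tuple plays the role of the pre-agreement that a later job agreement could reference; the one-shot variant would perform the same check while creating the job agreement itself.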

22
Need for Complex Description
  • 128 physical nodes
  • Physical topology
  • Interconnect
  • RAM, disk size
  • Subject of RSLA
  • Single MPI job
  • Subject of TSLA
  • May reference RSLAs
  • Quality requirements
  • Real-time parameters
  • CPU, disk performance
  • Subject of BSLA

23
MDS Resource Models (History)

24
Future Models
  • Service behavioral descriptions
  • Unified service term model
  • Capture user/application requirements
  • Capture provider capabilities
  • Core meta-language
  • Facilitates planner/decision designs
  • Extends with domain concepts
  • Extensible negotiability mark-up
  • Capture range of negotiability for variable terms
  • Capture importance of terms (required/optional)
  • Capture cost of options (fees/penalties)
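
The negotiability mark-up described above could be modeled, very roughly, as terms that carry a range, an importance flag, and a cost. The `Term` fields and the `evaluate` function are hypothetical and only meant to show how a planner might use such annotations.

```python
from dataclasses import dataclass

@dataclass
class Term:
    name: str
    low: float        # range of negotiability
    high: float
    required: bool    # importance of the term
    unit_cost: float  # fee per unit granted

def evaluate(terms, offer):
    """Return the total cost of an offer, or None if a required term is violated."""
    cost = 0.0
    for t in terms:
        value = offer.get(t.name)
        in_range = value is not None and t.low <= value <= t.high
        if not in_range:
            if t.required:
                return None
            continue  # optional term left unsatisfied: not charged
        cost += value * t.unit_cost
    return cost
```

With such a function a planner can compare alternative offers by cost and discard those that break a required term, without knowing anything about the domain the terms come from.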

25
SLA Types in Depth
  • Resource SLA (RSLA), i.e. reservation
  • A promise of resource availability
  • Client must utilize promise in subsequent SLAs
  • Task SLA (TSLA), i.e. execution
  • A promise to perform a task
  • Complex task requirements
  • May reference an RSLA (implicit binding)
  • Binding SLA (BSLA), i.e. claim
  • Binds a resource capability to a TSLA
  • May reference an RSLA (otherwise obtained implicitly)
  • May be created lazily to provision the task
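
The relationship between the three SLA types can be sketched as plain records. The class fields and the `bind` helper are assumptions for illustration; in particular, the rule that a claim may not exceed the referenced reservation is an inference, not something the slide states.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RSLA:
    """Reservation: a promise of resource availability."""
    resource: str
    amount: int

@dataclass
class TSLA:
    """Execution: a promise to perform a task; may name a reservation."""
    task: str
    rsla: Optional[RSLA] = None

@dataclass
class BSLA:
    """Claim: binds a resource capability to a task."""
    tsla: TSLA
    amount: int
    rsla: Optional[RSLA] = None

def bind(tsla: TSLA, amount: int, rsla: Optional[RSLA] = None) -> BSLA:
    """Create a BSLA, falling back to the TSLA's own reservation if none is given."""
    source = rsla if rsla is not None else tsla.rsla
    if source is not None and amount > source.amount:
        raise ValueError("claim exceeds the reserved amount")
    # source may still be None: the provider then obtains the resource implicitly.
    return BSLA(tsla=tsla, amount=amount, rsla=source)
```

Calling `bind` late, only when the task is about to run, corresponds to the lazy creation mentioned in the last bullet.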

26
Resource Lifecycle
  • S0: Start with no SLAs
  • S1: Create SLAs
  • TSLA or RSLA
  • S2: Bind task/resource
  • Explicit BSLA
  • Implicit provider schedule
  • S3: Active task
  • Resource consumption
  • Backtrack to S0
  • On task completion
  • On expiration
  • On failure
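
The lifecycle above reads naturally as a small state machine. In the sketch below the event names (`create_sla`, `bind`, `start`, `complete`, `expire`, `fail`) are invented, and allowing the backtrack events from every state is an assumption.

```python
from enum import Enum

class State(Enum):
    S0 = "no SLAs"
    S1 = "SLAs created"
    S2 = "task/resource bound"
    S3 = "task active"

# Forward transitions; event names are hypothetical.
TRANSITIONS = {
    (State.S0, "create_sla"): State.S1,   # TSLA or RSLA
    (State.S1, "bind"): State.S2,         # explicit BSLA or implicit schedule
    (State.S2, "start"): State.S3,        # resource consumption begins
}
# Assumption: completion, expiration, and failure return to S0 from any state.
BACKTRACK_EVENTS = {"complete", "expire", "fail"}

def step(state, event):
    """Return the next state, or raise ValueError for a disallowed event."""
    if event in BACKTRACK_EVENTS:
        return State.S0
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.name}") from None
```

A run through `create_sla`, `bind`, `start`, and `complete` visits S1, S2, and S3 and ends back at S0, matching the cycle on the slide.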

27
Incremental Negotiation
  • RSLA: reserve resources for future use
  • TSLA: submit task to scheduler
  • BSLA: bind reservation to task
  • Resources change state due to SLAs and scheduler decisions

28
Linking SLAs for Complex Case
[Figure: linked SLAs laid out along a time axis. TSLA1: account tmpuser1. RSLA1: 50 GB in /scratch filesystem. BSLA1: 30 GB for /scratch/tmpuser1/foo/ files. TSLA2: complex job. Other labels: TSLA3, TSLA4, RSLA2 (Net), Stage in, Stage out, BSLA2]
  • Dependent SLAs nest intrinsically
  • BSLA2 defined in terms of RSLA2 and TSLA4
  • Chained SLAs simplify negotiation
  • Optionally link destruction/reclamation
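
Nesting and linked destruction can be sketched as a dependency graph in which destroying one SLA also destroys every SLA defined in terms of it. The `SLANode` class is hypothetical, and the cascade here is unconditional for brevity even though the slide describes linking as optional.

```python
class SLANode:
    """Hypothetical SLA record that knows which SLAs depend on it."""
    def __init__(self, name, parents=()):
        self.name = name
        self.children = []
        self.alive = True
        for p in parents:
            p.children.append(self)

    def destroy(self):
        """Destroy this SLA and, recursively, every SLA defined in terms of it."""
        if not self.alive:
            return []
        self.alive = False
        destroyed = [self.name]
        for child in self.children:
            destroyed.extend(child.destroy())
        return destroyed
```

Following the slide, BSLA2 would be created with `parents=(rsla2, tsla4)`, so reclaiming RSLA2 also tears down BSLA2 while TSLA4 stays in place.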

29
Related Work
  • Academic Contemporaries
  • Condor Matchmaking
  • Economy-based Scheduling
  • Work-flow Planning
  • Commercial Scheduler Examples
  • Many examples for traditional sites
  • Several generalized for the enterprise
  • Platform Computing
  • LSF scaled to lots of jobs
  • MultiCluster for site-to-site resource sharing
  • IBM eWLM
  • Goal-based provisioning of transactional flows

30
Condor Matchmaking
  • At heart a scheduling algorithm
  • Heuristics for pairing job with resource
  • Match symmetric Classified Ads
  • Great for bulk/commodity matching
  • Closed system view
  • Subsumes resource through lease
  • Sandboxed job environment
  • Favor vertical integration over generality
  • Tuned high-throughput system
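
The idea of matching symmetric classified ads can be illustrated with a toy matcher in which both the job and the machine state requirements about each other. This is not Condor's ClassAd language or API: ads are plain dictionaries, requirements and rank are Python callables, and the greedy pairing is an assumption made for the example.

```python
def matches(job, machine):
    """Symmetric match: each ad's requirements must hold for the other ad."""
    return job["requirements"](machine) and machine["requirements"](job)

def matchmake(jobs, machines):
    """Greedy pairing: each job takes its highest-ranked free matching machine."""
    free = list(machines)
    pairs = []
    for job in jobs:
        candidates = [m for m in free if matches(job, m)]
        if candidates:
            best = max(candidates, key=job["rank"])
            free.remove(best)
            pairs.append((job["name"], best["name"]))
    return pairs
```

The symmetry is the point of the sketch: a machine can refuse a job just as a job can refuse a machine, which is how resource owner policy enters the match.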

31
Condor on GRAM
  • Condor already uses GRAM two ways
  • GRAM treats Condor as local scheduler
  • Condor uses GRAM to access resource
  • Condor maps to SLA architecture
  • Advertise resource ClassAd
  • Submit job ClassAd (as TSLA)
  • Matchmaker is a Community Scheduler
  • Need SLA scalability to be practical

32
Future Work
  • SLA interaction with policy
  • SLA negotiation subject to policy
  • One SLA affects another, e.g. RSLA subdivision
  • One client more important than another
  • SLA implemented by low-level policies
  • Domain-specific SLA maps to resource SLAs
  • Resource SLAs map to resource control mechanisms
  • Resource characterization
  • Advertisement of resource options, cost
  • Interoperable capability languages

33
Conclusion
  • Generic SLA management
  • Compositional for complex scenarios
  • Extensible for unique requirements
  • Requires work on Grid service modeling
  • To describe jobs, resource requirements, etc.
  • Enhancement to proven architectures
  • Encompasses GRAM+GARA
  • Evolution of the Globus Toolkit RM
  • GRAM evolving since 1997
  • WS-Agreement standard in progress