GRUBER: A Grid Resource Usage SLA Broker

1
GRUBER: A Grid Resource Usage SLA Broker
Ian Foster, Argonne National Laboratory and The
University of Chicago
  • Catalin L. Dumitrescu
  • The University of Chicago

2
Introduction
  • Large distributed Grid systems pose new
    challenges
  • Overwhelming resource characteristics
  • Complex workload characteristics
  • Complex interactions and resource allocations
  • Automated resource discovery and usage SLA
    enforcement represent important elements

3
Talk Outline / Part I
  • Part I
  • Introduction
  • Our Approach GRUBER
  • Motivating Scenarios
  • Architecture
  • Part II
  • Evaluation Metrics
  • Experimental Results
  • Conclusions and Questions

4
Our Approach: GRUBER
  • GRUBER is an architecture and toolkit for resource
    usage service level agreement (SLA) specification
    and enforcement in a grid environment
  • GT3- and GT4-based implementations
  • Able to handle as many clients (submission hosts)
    as the performance of the GTx container permits

5
A Bit of History
  • Started in the context of Grid3 as a monitoring
    engine
  • Evolved into a simple site recommendation engine
  • Additional capabilities were added later, such
    as:
      • Enforcement components
      • Complex usage SLAs and specification interfaces

6
GRUBER Novelty
  • Handles:
      • Sites with RMs
      • VOs and groups
      • Submission hosts
  • Models usage allocations (SLAs) at several levels
  • Capacity:
      • to collect monitoring metrics from a grid
      • to make various decisions based on this
        information
      • to enforce complex SLAs by various means

7
Environment Overview

8
Environment Details
  • Targets environments with a large number of
    resources, resource owners, and VOs, where usage
    SLAs are required to manage resource utilization
  • A few examples are:
      • Grid3
      • OSG
      • TeraGrid
      • DataGrid

9
Research Problems
  • How are usage SLAs handled in grid
    environments?
  • What is gained by taking such usage SLAs into
    account?

10
Motivating Scenario
  • Controlled resource sharing is important because
    each participant wants to ensure that its goals
    are achieved
  • Three dimensions in the usage policy space:
      • resource providers (sites, VOs, groups)
      • resource consumers (VOs, groups, users)
      • time
  • Provider policies make resources available to
    consumers for specified time periods.

11
Main Players and Elements
  • Owners want convenient and flexible mechanisms
    for expressing the policies that determine how
    many resources are allocated to different
    purposes
  • User and group jobs are the main parties
    interested in the resources provided by sites
  • Algorithms and policies capture how jobs are
    assigned to host machines

12
Problem Domain
  • A grid consists of:
      • a set of resource provider sites: each
        contains a number of processors and some
        amount of disk space
      • a three-level hierarchy of users, groups, and
        VOs: each user is a member of exactly one
        group, and each group is a member of exactly
        one VO
      • a set of submit hosts and jobs, with each job
        specified by four attributes: VO, Group,
        Required-Processor-Time, Required-Disk-Space
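
The problem domain above can be sketched as a small data model. This is only an illustration: the class names, field names, and units are assumptions made for the example and are not taken from GRUBER itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """A resource provider site (field names and units are assumptions)."""
    name: str
    processors: int
    disk_gb: float


@dataclass(frozen=True)
class Job:
    """A job described by the four attributes listed on the slide."""
    vo: str
    group: str
    required_processor_time: float  # hours, assumed unit
    required_disk_space: float      # gigabytes, assumed unit


class Hierarchy:
    """Users -> groups -> VOs, with exactly one parent at each level."""

    def __init__(self):
        self._group_of_user = {}
        self._vo_of_group = {}

    def add_group(self, group, vo):
        # A group may belong to exactly one VO.
        if self._vo_of_group.get(group, vo) != vo:
            raise ValueError(f"group {group!r} already belongs to another VO")
        self._vo_of_group[group] = vo

    def add_user(self, user, group):
        # A user may belong to exactly one (already registered) group.
        if group not in self._vo_of_group:
            raise KeyError(f"unknown group {group!r}")
        if self._group_of_user.get(user, group) != group:
            raise ValueError(f"user {user!r} already belongs to another group")
        self._group_of_user[user] = group

    def vo_of_user(self, user):
        return self._vo_of_group[self._group_of_user[user]]
```

Rejecting a second parent at registration time is one simple way to keep the "exactly one group, exactly one VO" constraint from the slide.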

13
Problem Domain (cont.)
  • A grid consists of (cont.):
  • Usage SLAs
      • A site policy statement defines site usage
        SLAs by specifying the number of processors
        and amount of disk space that sites make
        available to different VOs
      • A VO policy statement defines VO usage SLAs by
        specifying the fraction of the VO's total
        processor and disk resources (i.e., the
        aggregate of contributions to that VO from all
        sites) that the VO makes available to
        different groups
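
A minimal sketch of how the two policy levels compose, assuming site policies are given as absolute processor and disk amounts per VO and VO policies as fractions per group. The function names, dictionary layout, and sample numbers are hypothetical.

```python
def vo_totals(site_policies):
    """Aggregate what all sites make available to each VO.

    site_policies: {site: {vo: (processors, disk_gb)}}  (assumed layout)
    returns:       {vo: (total_processors, total_disk_gb)}
    """
    totals = {}
    for per_vo in site_policies.values():
        for vo, (cpus, disk) in per_vo.items():
            total_cpus, total_disk = totals.get(vo, (0, 0.0))
            totals[vo] = (total_cpus + cpus, total_disk + disk)
    return totals


def group_allocations(vo_total, vo_policy):
    """Split a VO's aggregate resources among its groups.

    vo_total:  (total_processors, total_disk_gb)
    vo_policy: {group: fraction of the VO's total resources}
    """
    cpus, disk = vo_total
    return {group: (cpus * f, disk * f) for group, f in vo_policy.items()}


# Illustrative numbers only.
sites = {
    "siteA": {"VO0": (100, 500.0)},
    "siteB": {"VO0": (60, 300.0), "VO1": (40, 200.0)},
}
totals = vo_totals(sites)
shares = group_allocations(totals["VO0"], {"g1": 0.25, "g2": 0.75})
```

With these sample values, VO0 aggregates 160 processors and 800 GB across the two sites, and group g1 receives a quarter of each.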

14
GRUBER Architecture
  • Engine: implements various algorithms for
    detecting available resources and maintains a
    generic view of resource utilization in the grid
  • Site monitoring component: one of the data
    providers for the GRUBER engine
  • Site selectors: tools that communicate with the
    GRUBER engine and answer the question "Which is
    the best site at which I can run this job?"
  • Queue manager: a complex GRUBER client that must
    reside on a submitting host

15
GRUBER Picture

16
GRUBER Engine
  • If there are fewer waiting jobs at a site than
    available CPUs, GRUBER assumes the job will start
    right away, provided an extensible usage policy is
    in place
  • If there are more waiting jobs than available
    CPUs, or if an extensible SLA is not in place,
    GRUBER considers the VO's allocation:
      • if the VO is under its allocation, GRUBER
        assumes that a new job can be started (in a
        time that depends on the local resource
        manager type)
      • if the VO is over its allocation, GRUBER
        assumes that a new job cannot be started (the
        running time is unknown for the jobs already
        running)
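
The decision rules on this slide can be sketched as a single function. This is an illustration under assumptions, not the engine's actual code: the names are hypothetical, and the boundary cases the slide leaves open (equal numbers of waiting jobs and CPUs, usage exactly at the allocation) are resolved conservatively here.

```python
from enum import Enum


class Estimate(Enum):
    STARTS_IMMEDIATELY = "starts right away"
    CAN_START = "can start, delay depends on the local resource manager"
    CANNOT_START = "cannot start"


def estimate_start(waiting_jobs, available_cpus, extensible_policy,
                   vo_usage, vo_allocation):
    """Estimate whether a new job of a VO can start at a site.

    vo_usage and vo_allocation must share a unit (e.g. CPUs or percent).
    """
    # Rule 1: idle capacity and an extensible usage policy in place.
    if waiting_jobs < available_cpus and extensible_policy:
        return Estimate.STARTS_IMMEDIATELY
    # Rule 2: otherwise fall back to the VO's allocation.
    if vo_usage < vo_allocation:
        return Estimate.CAN_START
    # Assumption: usage equal to the allocation counts as "not under".
    return Estimate.CANNOT_START
```

For example, a site with 2 waiting jobs, 10 free CPUs, and an extensible policy yields STARTS_IMMEDIATELY regardless of the VO's current usage.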

17
GRUBER QM/SiteSel
  • The queue manager (QM) is responsible for
    determining how many jobs per VO or VO group can
    be scheduled at a given moment in time and when
    to release them
  • Job assignment and enforcement components are
    part of GRUBER
  • The site selector component answers "Where is
    best to run next?", while the queue manager
    answers "How many jobs should group G_m of VO V_n
    be allowed to run?" and "When to start these
    jobs?"
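
The two questions above can be illustrated with a short sketch. The slides do not give the algorithms, so both functions are assumptions: the site selector uses a "least used" rule (the policy named on the results slide), and the queue manager simply releases jobs up to an allowed limit. All names are hypothetical.

```python
def select_least_used_site(utilization):
    """Answer "Where is best to run next?" with a least-used rule.

    utilization: {site: fraction of the site currently in use}
    Ties are broken by site name so the result is deterministic.
    """
    if not utilization:
        raise ValueError("no candidate sites")
    return min(sorted(utilization), key=utilization.get)


def jobs_to_release(allowed_running, currently_running, queued):
    """Answer "How many jobs should the group be allowed to run?"

    Releases queued jobs until the group reaches its allowed limit,
    and never returns a negative count.
    """
    return max(0, min(queued, allowed_running - currently_running))
```

A real queue manager would also decide when to release the jobs; that timing question is left out of this sketch.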

18
Disk Space Considerations
  • Introduces additional complexities
  • A file that has been staged to a site cannot be
    delayed; it can only be deleted. Yet deleting a
    file that has been staged for a job can result in
    livelock if a job's files are repeatedly deleted
    before the job runs
  • So far, we have considered a UNIX quota-like
    approach

19
Usage SLA Language
  • Based on Maui's semantics and WS-Agreement syntax
  • Allocations are made for processor time,
    permanent storage, or network bandwidth
    resources, and there are at least two levels of
    resource assignment: to a VO, by a resource
    owner, and to a VO user or group, by a VO
  • e.g., VO0 15.5, VO1 10.0, VO2 5.0-.
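
As an illustration of the shorthand in the example above, the following sketch parses comma-separated entries of the form "name percent" with an optional trailing "+" or "-". Reading "+" as a lower limit and "-" as an upper limit follows Maui's fair-share target notation and is an assumption here; the real usage SLA documents use WS-Agreement syntax, which this sketch does not attempt to handle.

```python
import re
from typing import NamedTuple


class Allocation(NamedTuple):
    consumer: str
    percent: float
    kind: str  # "target", "floor", or "ceiling" (assumed interpretation)


# name, whitespace, number, optional "+" or "-" suffix
_ENTRY = re.compile(r"^\s*(\S+)\s+(\d+(?:\.\d+)?)([+-]?)\s*$")
_KIND = {"": "target", "+": "floor", "-": "ceiling"}


def parse_allocations(text):
    """Parse e.g. "VO0 15.5, VO1 10.0, VO2 5.0-" into Allocation tuples."""
    result = []
    for entry in text.split(","):
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError(f"cannot parse entry: {entry!r}")
        name, value, suffix = match.groups()
        result.append(Allocation(name, float(value), _KIND[suffix]))
    return result
```

Applied to the slide's example, the last entry (VO2 at 5.0 with a trailing "-") is read as an upper limit and the other two as plain targets.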

20
Screenshot: Site Selection

21
Screenshot: VO Usage SLA

22
Screenshot: VO Verifier

23
Talk Outline / Part II
  • Part I
  • Introduction
  • Our Approach GRUBER
  • Motivating Scenarios
  • Architecture
  • Part II
  • Evaluation Metrics
  • Experimental Results
  • Conclusions and Questions

24
Evaluation Metrics
  • Comp: percentage of jobs completed successfully
  • Replan: number of re-planning operations
  • Time: total execution time for the workload
  • Util: average resource utilization
      • $Util = \frac{\sum_{i=1}^{N} ET_i}{cpus \cdot \Delta t} \times 100.00$
  • Delay: average time per job
      • $Delay = \frac{\sum_{i=1}^{N} DT_i}{jobs}$
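
The Util and Delay metrics can be computed as in the sketch below. The slide does not define its symbols, so the readings used here are assumptions: ET_i is taken as the execution time of job i, DT_i as the delay of job i, and the time term in the Util denominator as the length of the measurement interval.

```python
def utilization(execution_times, cpus, interval):
    """Util: consumed CPU time as a percentage of available CPU time.

    execution_times and interval must use the same time unit.
    """
    return sum(execution_times) / (cpus * interval) * 100.0


def average_delay(delay_times):
    """Delay: mean delay per job."""
    return sum(delay_times) / len(delay_times)


# Illustrative numbers only: two jobs on two CPUs over one hour.
util = utilization([3600.0, 1800.0], cpus=2, interval=3600.0)
delay = average_delay([10.0, 20.0, 30.0])
```

Here the two jobs consume 5400 of the 7200 available CPU-seconds, which corresponds to a utilization of 75 percent.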

25
Experimental Settings
  • A single job type in all experiments: the
    sequence analysis program BLAST
  • A single BLAST job has:
      • an execution time of about an hour
      • about 10-33 kilobytes of input reads
      • about 0.7-1.5 megabytes of output
  • Various configurations:
      • 1x1K: 1000 independent BLAST jobs
      • 4x1K: the 1x1K workload is run in parallel
        from four hosts
      • each job can be re-planned at most four times

26
Experimental Environment
  • All experiments on Grid3 (December 2004)
  • Comprises around 30 sites across the U.S., of
    which we used 15
  • Each site is autonomous and managed by different
    local resource managers, such as Condor, PBS, and
    LSF
  • Each site enforces different usage policies which
    are collected by our site SLA observation point
    and used in scheduling workloads

27
Results
Least Used Site Assignment Policy

28
4x1K Completion vs. Time

29
Results: Variance

30
SiteSel Comparisons

31
Related Work
  • Fair share scheduling strategies developed for
    mainframes
  • SHARP
  • SPHINX
  • CREMONA

32
Conclusions about GRUBER
  • The experiments we performed with several task
    assignment policies showed GRUBER's initial
    performance in scheduling jobs
  • GRUBER is an architecture and toolkit for
    resource usage SLA specification and enforcement
    in a grid-like environment
  • Open problems:
      • over-subscribed local resources, in the sense
        of a local policy that states that 40% of the
        local CPU power is available to VO1 and 80%
        is available to VO2
      • hierarchical grouping and allocation of
        resources based on policy
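
The over-subscription example can be made concrete with a small check, assuming the two figures in the policy are percentages of local CPU power. The function name is hypothetical, and the sketch only detects over-subscription; how to resolve it is the open problem.

```python
def oversubscription(shares):
    """Return by how many percentage points the shares exceed 100.

    shares: {vo: percent of local CPU power promised to that VO}
    Returns 0.0 when the site is not over-subscribed.
    """
    return max(0.0, sum(shares.values()) - 100.0)


# The policy from the slide: 40 for VO1 and 80 for VO2.
excess = oversubscription({"VO1": 40.0, "VO2": 80.0})
```

For the slide's policy the promised shares add up to 120, so the site is over-subscribed by 20 percentage points.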

33
Addressed Questions
  • How are usage SLAs handled in grid
    environments?
  • What is gained by taking such usage SLAs into
    account?

34
Thanks
  • Questions?