Resource Specification Prediction Model

1
Resource Specification Prediction Model
  • Richard Huang (ryhuang_at_cs.ucsd.edu), joint work
    with Henri Casanova and Andrew Chien

2
Introduction
  • Advances in networking technology
  • 10-40 Gbps aggregate bandwidth
  • Optical fibres
  • More resources mean bigger problems can be solved
  • Increasing deployment of clusters
  • Decreasing hardware prices
  • More choices in cluster vendors
  • Increasing availability of open source cluster
    management tools (e.g. ROCKS)
  • Large-scale distributed environments (LSDEs) can
    be used to run large-scale loosely synchronous
    apps such as scientific workflows

3
Running Applications on LSDEs
  • One key challenge in running scientific workflows
    is resource selection

4
What's the problem?
  • Different resource selection systems are out
    there (such as VGES)
  • How does one go about writing the resource
    specification?
  • We don't know of any other work that addresses
    this problem.

5
Solution depends on
  • Application (DAG) characteristics
  • Type of scheduling algorithm employed
  • Types of resources available

6
Assumptions
  • Resources are plentiful
  • Therefore we can pick the right-sized resource
    collection (RC)
  • Resources are dedicated or have advance
    reservations, OR
  • Underlying middleware (such as VGES) can take
    care of interfacing with batch queue systems
  • Bandwidth is reasonably plentiful
  • We don't deal with network contention
  • Performance models are available for the
    applications, so we know task runtimes

7
Resource Specification Prediction Model
  • Empirical model: takes DAG characteristics and an
    optional utility function as input
  • Heuristic Prediction Model: predicts the best
    scheduling heuristic to use
  • Size Prediction Model: predicts the optimal
    resource collection (RC) size

8
Strategy in Formulating Prediction Model
  • Determine relevant DAG characteristics
  • Define what the best RC should be
  • Execute reference scheduling heuristic on an
    observation set of DAG configurations while
    varying relevant DAG characteristics
  • Derive model from the observation set results
    that predicts the best RC size

9
Relevant DAG Characteristics
  • DAG size
  • Communication-to-computation ratio (CCR)
  • Amount of parallelism
  • Regularity among tasks from different DAG levels
  • Other possible characteristics
  • DAG height and average number of tasks per level
    (subsumed by above characteristics)

10
Define best RC size
  • Take an application and run it on different
    numbers of hosts
  • Best RC size is where increasing the number of
    hosts does not improve performance (knee value)

11
Observation Set
  • Instantiate 10 random DAGs for each DAG
    configuration
  • Vary number of tasks per level randomly while
    maintaining parallelism, regularity, and size
  • Idea is to run scheduling heuristics on each DAG
    configuration and try to see if we can detect
    some trends

12
Size Prediction Model Formulation
  • For better tractability, at first consider only
    parallelism (α) and regularity (β) for each
    (size, CCR) pair
  • Knee size seems to approximately double for every
    0.1 increase in α
  • Knee size decreases with increase in regularity
  • Based on tables similar to the one on the right,
    hypothesize that size prediction can be modeled as
    2^(aα + bβ + c)
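The hypothesized form can be checked against the two observations in the bullets. The sketch below writes the parallelism parameter as `alpha` and the regularity parameter as `beta` (symbol names assumed here) and uses made-up coefficients. If the knee doubles for every 0.1 increase in parallelism, the coefficient on parallelism must be about 10, since 2^(a × 0.1) = 2 implies a = 10. A knee that shrinks as regularity grows implies a negative coefficient on regularity.

```python
def predicted_rc_size(alpha, beta, a, b, c):
    """Evaluate the hypothesized model 2^(a*alpha + b*beta + c)."""
    return 2 ** (a * alpha + b * beta + c)


# Hypothetical coefficients, chosen only to match the two stated trends:
# a = 10 gives doubling per 0.1 of parallelism, b < 0 shrinks the knee
# as regularity increases. They are not fitted values from the slides.
a, b, c = 10.0, -2.0, 1.0

low = predicted_rc_size(0.5, 0.5, a, b, c)
high = predicted_rc_size(0.6, 0.5, a, b, c)
more_regular = predicted_rc_size(0.5, 0.9, a, b, c)

print(high / low)          # ratio for a 0.1 increase in parallelism
print(more_regular < low)  # higher regularity gives a smaller predicted size
```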

[Table: sample observation set knee values]

13
Size Prediction Model Formulation
  • We need to solve for a, b, and c for each (size,
    CCR) pair
  • Use linear regression to do a planar fit of the
    logarithm of the knee value
  • Interpolate between different DAG sizes and CCR
    values
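A minimal sketch of the planar fit follows. It assumes the logarithm is base 2 (matching the power-of-two model form) and that each observation is a (parallelism, regularity, knee) triple for one fixed (size, CCR) pair. The fit is an ordinary least-squares solve of the normal equations, written with the standard library only. The observations are synthetic, generated from a known plane so the recovered coefficients can be checked, and are not data from the presentation. Interpolation across DAG sizes and CCR values is not shown.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_plane(observations):
    """Least-squares fit of log2(knee) = a*alpha + b*beta + c.

    observations: list of (alpha, beta, knee) tuples for one (size, CCR) pair.
    Returns the tuple (a, b, c).
    """
    rows = [(alpha, beta, 1.0) for alpha, beta, _ in observations]
    targets = [math.log2(knee) for _, _, knee in observations]
    # Normal equations: (X^T X) theta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return tuple(solve_linear_system(xtx, xty))


# Synthetic observations generated from the plane a=10, b=-2, c=1.
points = [(0.5, 0.2), (0.6, 0.2), (0.5, 0.8), (0.7, 0.5), (0.6, 0.8)]
observations = [(al, be, 2 ** (10 * al - 2 * be + 1)) for al, be in points]

a, b, c = fit_plane(observations)
print(round(a, 6), round(b, 6), round(c, 6))
```

Fitting the logarithm rather than the knee value itself is what turns the exponential model into a plane, so a plain linear regression is enough.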

14
Model Validation
  • Two workloads
  • randomly generated DAGs (range of different DAG
    characteristics)
  • Montage DAGs
  • Performance metric: application turnaround time
    (scheduling time + makespan)
  • Cost: derived from the Amazon Elastic Compute
    Cloud price of $0.10 per hour for a 1.7 GHz
    processor
  • Use brute force method to calculate optimal size

15
Randomly generated DAGs
  • Peak performance degradation was 15%, but on
    average, most were below 1%
  • Prediction model predicted smaller RC sizes
    (reduces costs)

16
Comparison with using DAG width
  • Using DAG width to try to maximize parallelism
    costs a lot more!
  • Similar performance degradation as our model for
    smaller DAGs
  • As DAG size increases, performance degrades
    rapidly as scheduling times become larger because
    of larger RC sizes

17
Performance Cost Tradeoff
  • User can specify optional utility function
  • For example: 1% performance degradation for every
    10% in cost
  • Knee threshold of 2 provides best utility in
    this example
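One possible reading of such a utility function is sketched below: every 10% of cost saved is worth 1% of performance degradation, so the utility of a candidate is its cost saving divided by 10, minus its degradation. This interpretation, the function names, and every number in `candidates` are assumptions made for illustration. They are invented values, not results from the presentation.

```python
def utility(degradation_pct, cost_saving_pct, saving_per_degradation=10.0):
    """Net benefit, in percentage points of performance.

    A positive value means the cost saving more than pays for the
    performance lost, under the assumed exchange rate.
    """
    return cost_saving_pct / saving_per_degradation - degradation_pct


def best_threshold(candidates):
    """Pick the knee threshold whose (degradation, saving) pair scores highest."""
    return max(candidates, key=lambda t: utility(*candidates[t]))


# Invented values: knee threshold -> (degradation %, cost saving %).
candidates = {
    0.5: (0.2, 5.0),
    1.0: (0.5, 12.0),
    2.0: (1.0, 25.0),
    5.0: (4.0, 40.0),
}

print(best_threshold(candidates))
```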

18
Montage
  • Different thresholds did not degrade performance
    too much
  • Better savings at higher thresholds (using fewer
    hosts)

19
Sensitivity Analysis
  • Previous results all based on homogeneous clock
    rates and reference scheduling heuristic
  • Should look at how the model reacts to:
  • Different levels of clock rate heterogeneity
  • Different scheduling heuristics

20
Impact of Clock Rate Heterogeneity

21
Impact of Scheduling Heuristics

22
Summary
  • We devised an empirical model to predict a good
    RC size
  • We have shown that our model leads to good
    application performance, often at reduced cost
    compared with the optimal RC size
  • Our model maintains good performance over a range
    of DAG configurations, a range of resource
    heterogeneity, and different scheduling
    heuristics

23
Future Work
  • Heuristic Prediction Model to predict which
    heuristic to use given input DAG and optional
    utility function
  • How do we degrade gracefully when the resource
    selection system cannot return the desired
    resource collection?
  • Translate output of our model into input to
    different resource selection systems