Cluster Resource Management: A Scalable Approach

Slides: 37
Provided by: sjordan


1
Cluster Resource Management: A Scalable Approach
  • Ning Li and Jordan Parker
  • CS 736 Class Project

2
Outline
  • Introduction
  • A Scalable Approach: Hierarchy
  • Results
  • Conclusions
  • Questions

3
Why Study Resource Management?
  • Clusters have become increasingly popular for
    large parallel computing.
  • Web Servers
  • Clusters are becoming increasingly large, on the
    order of thousands of nodes.
  • Clusters are providing multiple services.
  • Hard to evaluate
  • Bad is easy to determine
  • Good is much harder

4
Resource Management Example
  • 4th Node Services only B
  • Poor Management
  • Ideal

Overall: A 37.5, B 62.5

5
Clustering Goals
  • Scalability
  • Reliability
  • High Performance
  • Affordability

6
Related Work
  • Proportional-Share
  • Cluster Reserves

7
Related Work: Approach Differences
  • Our goal: to provide a scalable solution for
    resource management.
  • Other work focused primarily on achieving good
    management.
  • This often meant one manager for all the nodes.
  • Clearly, this could become a scalability bottleneck.
  • Effectiveness: other solutions are probably better
    for smaller clusters; we hope to be better for
    large (>1000 nodes) clusters.

8
Outline
  • Introduction
  • A Scalable Approach: Hierarchy
  • Results
  • Conclusions
  • Questions

9
Hierarchy: A Scalable Approach
  • Hierarchical Management
  • Nodes service jobs
  • Managers facilitate resource management
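
The split between nodes and managers can be pictured as a tree. The following is a minimal sketch, assuming that leaf nodes service jobs and that managers only aggregate the usage reports of their children. The class names `Node` and `Manager`, the `report` method, and the usage figures are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Node:
    """Leaf of the hierarchy: services jobs and reports per-class usage."""
    name: str
    usage: Dict[str, float] = field(default_factory=dict)

    def report(self) -> Dict[str, float]:
        return dict(self.usage)


@dataclass
class Manager:
    """Interior vertex: services no jobs, only aggregates child reports."""
    name: str
    children: List[Union["Manager", Node]] = field(default_factory=list)

    def report(self) -> Dict[str, float]:
        total: Dict[str, float] = {}
        for child in self.children:
            for service, used in child.report().items():
                total[service] = total.get(service, 0.0) + used
        return total


# Four nodes under two first-level managers and one second-level manager
# (made-up usage numbers, in units of one node's capacity).
m1 = Manager("m1", [Node("n1", {"A": 0.6, "B": 0.4}), Node("n2", {"A": 0.5})])
m2 = Manager("m2", [Node("n3", {"B": 1.0}), Node("n4", {"B": 0.9})])
root = Manager("root", [m1, m2])
cluster_usage = root.report()
```

Each manager sees only as many reports as it has children, which is the property the hierarchy relies on for scalability.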

10
Banking Algorithm
  • Goal
  • Determine best allocation given previous usage
  • Primitives
  • Tickets
  • Bank accounts
  • Deposit / withdraw tickets
  • 6 Steps

11
Banking Algorithm
  • Step 1: For each service class on each node
  • Deposit unused tickets
  • Step 2: For each service class on each node
  • Reallocate service class
  • Full utilization: Allocation = usage + k
  • Under utilization: Allocation = usage - k
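
Steps 1 and 2 can be sketched for a single service class on a single node. This is only one reading of the slide: the operators in the two reallocation rules are partly lost in the transcript, so the sketch assumes "allocation = usage + k" under full utilization and "allocation = usage - k" otherwise. The function name, the integer ticket counts, and the clamp at zero are assumptions for illustration.

```python
from typing import Tuple


def deposit_and_reallocate(allocation: int, usage: int, k: int) -> Tuple[int, int]:
    """Run steps 1 and 2 for one service class on one node.

    Returns (new_allocation, tickets_deposited).
    """
    # Step 1: tickets the class did not use go into the bank.
    deposited = max(allocation - usage, 0)
    # Step 2: full utilization grows the allocation by k,
    # under utilization shrinks it to k below the observed usage.
    if usage >= allocation:
        new_allocation = usage + k
    else:
        new_allocation = max(usage - k, 0)  # clamp at zero (assumption)
    return new_allocation, deposited


full = deposit_and_reallocate(allocation=60, usage=60, k=5)
under = deposit_and_reallocate(allocation=30, usage=10, k=5)
```

Here `full` is `(65, 0)` and `under` is `(5, 20)`: the fully used class asks for more, while the idle class deposits its 20 unused tickets.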

12
Banking Algorithm Cont.
  • Step 3: For each service class
  • Compare total allocation to desired
  • Subtract from over-allocated
  • Add to needy under-allocated
  • Step 4: For each service class
  • Deposit / Withdraw
  • If still over-allocated: withdraw
  • If still under-allocated: deposit
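
The slide does not say how the subtraction and addition in step 3 are spread over nodes. The sketch below covers step 3 only and makes two labeled assumptions: an over-allocated class is trimmed in proportion to each node's allocation, and an under-allocated class has its shortfall split evenly among the nodes flagged as needy. The function name `rebalance` and its parameters are hypothetical.

```python
from typing import List


def rebalance(allocations: List[float], needy: List[bool], desired: float) -> List[float]:
    """Step 3 for one service class across the nodes of one manager."""
    total = sum(allocations)
    if total > desired and total > 0:
        # Over-allocated: scale every node down proportionally (assumption).
        scale = desired / total
        return [a * scale for a in allocations]
    needy_count = sum(needy)
    if total < desired and needy_count:
        # Under-allocated: split the shortfall evenly among needy nodes (assumption).
        bonus = (desired - total) / needy_count
        return [a + bonus if n else a for a, n in zip(allocations, needy)]
    return list(allocations)


trimmed = rebalance([65.0, 65.0, 5.0], [True, True, False], desired=120.0)
boosted = rebalance([40.0, 10.0], [True, False], desired=60.0)
```

In the second call the class is 10 tickets short, and the single needy node receives all of them, giving `[50.0, 10.0]`.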

13
Banking Algorithm Cont.
  • Step 5:
  • Withdraw and allocate
  • Reward the needy nodes
  • Step 6:
  • Done, clear the bank accounts
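
Steps 5 and 6 can be sketched as a payout followed by a reset. The slides do not give the payout rule, so this example assumes the bank balance is split evenly among the needy nodes using integer division, with any remainder discarded when the account is cleared. The function name and the ticket values are hypothetical.

```python
from typing import Dict, List, Tuple


def reward_needy(bank: int, allocations: Dict[str, int],
                 needy: List[str]) -> Tuple[Dict[str, int], int]:
    """Return (updated allocations, bank balance after clearing)."""
    updated = dict(allocations)
    if needy:
        # Step 5: withdraw the balance and hand an equal share to each needy node.
        share = bank // len(needy)
        for node in needy:
            updated[node] += share
    # Step 6: the account is cleared regardless of any remainder.
    return updated, 0


new_allocations, balance = reward_needy(
    bank=20,
    allocations={"n1": 65, "n2": 5, "n3": 65},
    needy=["n1", "n3"],
)
```

With 20 banked tickets and two needy nodes, each receives 10, and the next round starts from an empty account.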

14
Reliability
  • Bottom-up Manager Replacement

[Figure: manager and node hierarchy illustrating bottom-up manager replacement]

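One way to picture bottom-up replacement is as a cascade of promotions. The sketch below assumes that a failed manager is replaced by its first child, and that the promoted child's old position is then refilled from below in the same way. The slides do not spell out the protocol, so the function `replace_manager`, the choice of the first child, and the example tree are all assumptions.

```python
from typing import Dict, List, Optional


def replace_manager(children: Dict[int, List[int]], failed: int) -> Optional[int]:
    """Promote a child into the failed manager's position, cascading downward.

    `children` maps each manager id to its ordered child ids and is updated
    in place. Returns the id now occupying the failed position, or None if
    the failed vertex had no children.
    """
    kids = children.pop(failed, [])
    if not kids:
        return None
    promoted, rest = kids[0], kids[1:]
    # The promoted child's old position is refilled from the level below it.
    successor = replace_manager(children, promoted)
    if successor is not None:
        rest = [successor] + rest
    children[promoted] = rest
    # Re-point the failed manager's parent, if it has one, at the replacement.
    for siblings in children.values():
        if failed in siblings:
            siblings[siblings.index(failed)] = promoted
    return promoted


# Hypothetical tree: manager 1 on top, managers 2-4 below, nodes 5-12 as leaves.
tree = {1: [2, 3, 4], 2: [5, 6, 7], 3: [8, 9, 10], 4: [11, 12]}
new_top = replace_manager(tree, failed=1)
```

After the call, 2 sits in the top position and 5 has moved up to manage 6 and 7, so every vacancy is filled from the level beneath it.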
15
Outline
  • Introduction
  • A Scalable Approach: Hierarchy
  • Results
  • Conclusions
  • Questions

16
Results
Cluster Nodes  Managers (1st/2nd Level)  Reporting (1st/2nd Level)  Workloads    Class 2 Constraints  Tests
4              2/1                       1/1                        Steady, Dyn  1                    1, 1
                                         1/5                        Steady, Dyn
100            10/1                      1/1                        Steady, Dyn  1-30                 2, 3
                                         1/5                        Steady, Dyn                       4, 4
900            30/1                      1/1                        Steady, Dyn  1-300                5, 5
                                         1/5                        Steady, Dyn

17
Implementation Details
  • Simulations via the NS network simulator
  • Low-bandwidth 10 Mb/s communication network
  • UDP for lower server overhead
  • Assumptions
  • Node-level resource management works ideally

18
Test 1 Overview
  • 4 nodes, 3 services, 60/30/10 allocation
  • 4th node receives all of the 3rd class's requests
  • Steady Workload
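
For this setup, one allocation that meets the cluster-wide 60/30/10 target can be worked out directly. The sketch assumes every node has equal capacity, that the pinned class is served only on its designated nodes, and that the other classes split each node's remaining capacity in proportion to their shares. The function name and the class labels are hypothetical, and other allocations can satisfy the same target.

```python
from fractions import Fraction
from typing import Dict, List, Set


def target_allocation(n_nodes: int, shares: Dict[str, Fraction],
                      pinned_class: str, pinned_nodes: Set[int]) -> List[Dict[str, Fraction]]:
    """Per-node fractions of capacity that add up to the cluster-wide shares."""
    # Capacity the pinned class needs, spread over the nodes that can serve it.
    per_pinned_node = shares[pinned_class] * n_nodes / len(pinned_nodes)
    others = {c: s for c, s in shares.items() if c != pinned_class}
    other_sum = sum(others.values())
    result = []
    for node in range(1, n_nodes + 1):
        used = per_pinned_node if node in pinned_nodes else Fraction(0)
        row = {pinned_class: used}
        for c, s in others.items():
            # Remaining capacity is split in proportion to the other shares.
            row[c] = (1 - used) * s / other_sum
        result.append(row)
    return result


shares = {"class1": Fraction(60, 100), "class2": Fraction(30, 100), "class3": Fraction(10, 100)}
alloc = target_allocation(4, shares, pinned_class="class3", pinned_nodes={4})
```

Under these assumptions, nodes 1 to 3 run class 1 and class 2 at 2/3 and 1/3, while node 4 gives 2/5 to class 3, 2/5 to class 1, and 1/5 to class 2. Summed over the cluster this gives 2.4, 1.2, and 0.4 nodes' worth of capacity, which is 60/30/10 of four nodes.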

19
Test 1 Data
20
Test 2 Overview
  • 100 nodes, 3 services, 60/30/10 allocation
  • Nodes 1-30 receive all of the 3rd class's requests
  • Steady Workload

21
Test 2 Data
22
Test 3 Overview
  • 100 nodes, 3 services, 60/30/10 allocation
  • Nodes 1-30 receive all of the 3rd class's requests
  • Dynamic Workload

23
Test 3 Data
24
Test 4 Overview
  • 100 nodes, 3 services, 60/30/10 allocation
  • Nodes 1-30 receive all of the 3rd class's requests
  • Steady workload
  • Reporting: 1/5
  • Nodes report every 0.3 seconds
  • Managers report every 1.5 seconds
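
The effect of the 1/5 reporting ratio on message load can be estimated with simple arithmetic. The sketch assumes the 100 nodes are spread evenly over the 10 first-level managers listed for this cluster size in the results, and it compares against a hypothetical flat design in which every node reports to a single manager. The function name is illustrative.

```python
from typing import Tuple


def reports_per_second(n_nodes: int, n_managers: int,
                       node_period: float, manager_period: float) -> Tuple[float, float, float]:
    """Return (per first-level manager, top-level manager, flat single manager)."""
    per_first_level = (n_nodes / n_managers) / node_period
    top_level = n_managers / manager_period
    flat = n_nodes / node_period
    return per_first_level, top_level, flat


first_level, top, flat = reports_per_second(100, 10, node_period=0.3, manager_period=1.5)
```

Each first-level manager handles roughly 33 reports per second and the top-level manager under 7, compared with roughly 333 per second for a single flat manager.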

25
Test 4 Data
26
Test 5 Overview
  • 900 nodes, 3 services, 60/30/10 allocation
  • Nodes 1-300 receive all of the 3rd class's requests
  • Steady Workload

27
Test 5 Data
28
Outline
  • Introduction
  • A Scalable Approach: Hierarchy
  • Results
  • Conclusions
  • Questions

29
Conclusions
  • Benefits of a hierarchy
  • Scalable
  • Reliable
  • Geographic Applications
  • Implemented a new management scheme: Banking
  • Comparable Results
  • Improved Scalability

30
Conclusions
  • Clusters are sensitive to small policy changes
  • Clusters are built for specific workloads
  • Their performance is important and small changes
    have significant impact
  • No scheme is universally applicable
  • Future Work
  • Real system implementation
  • Real Workloads
  • Real node level resource management
  • More steady performance

31
Outline
  • Introduction
  • A Scalable Approach: Hierarchy
  • Results
  • Conclusions
  • Questions

32
Questions
33
Related Work: Proportional-Share
  • Stride Scheduling
  • Ticket based and similar to lottery
  • Scale
  • Randomly query k nodes to find best allocation
  • Different Application
  • Condor-like resource allocation/applications
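
For readers unfamiliar with stride scheduling, the following is a minimal single-machine sketch of the basic algorithm: each client's stride is inversely proportional to its tickets, and the client with the smallest pass value runs next. It is not the cluster-scale variant from the related work, and the constant `STRIDE1`, the function name, and the ticket values are illustrative choices.

```python
import heapq
from collections import Counter
from typing import Dict, List

STRIDE1 = 1 << 20  # large constant so integer strides keep enough precision


def stride_schedule(tickets: Dict[str, int], quanta: int) -> List[str]:
    """Return the order in which clients receive `quanta` time slices."""
    # Heap entries are (pass value, client name, stride); passes start at one stride.
    heap = [(STRIDE1 // t, name, STRIDE1 // t) for name, t in sorted(tickets.items())]
    heapq.heapify(heap)
    order = []
    for _ in range(quanta):
        pass_value, name, stride = heapq.heappop(heap)
        order.append(name)
        # Advance the client that just ran by its stride.
        heapq.heappush(heap, (pass_value + stride, name, stride))
    return order


order = stride_schedule({"A": 3, "B": 2, "C": 1}, quanta=6)
counts = Counter(order)
```

Over six quanta the clients run three, two, and one times respectively, matching their 3:2:1 ticket ratio. Unlike lottery scheduling, the result is deterministic.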

34
Related Work: Cluster Reserves
  • Resource Container Schedulers
  • Constrained Optimization Algorithm
  • Scale
  • Centralized single manager

35
Hierarchical Cluster Reserves: Version 1
  • Modify Cluster Reserves optimization algorithm
  • Use it when manager manages nodes
  • AND when a level n+1 manager manages level n
    managers.

36
Hierarchical Cluster Reserves: Version 2
  • Cluster Reserves optimization algorithm
  • Use it when manager manages nodes
  • Don't use it for upper-level managers
  • Modify the manager-to-manager reporting
  • Lie to the algorithm