Title: Cluster Resource Management: A Scalable Approach
1Cluster Resource Management: A Scalable Approach
- Ning Li and Jordan Parker
- CS 736 Class Project
2Outline
- Introduction
- A Scalable Approach: Hierarchy
- Results
- Conclusions
- Questions
3Why Study Resource Management?
- Clusters have become increasingly popular for large-scale parallel computing.
- Web servers
- Clusters are growing larger, on the order of thousands of nodes.
- Clusters are providing multiple services.
- Resource management is hard to evaluate
- Bad management is easy to identify
- Good management is much harder to identify
4Resource Management Example
- The 4th node services only B
- Poor Management
- Ideal
- Overall: A 37.5, B 62.5
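One way to arrive at the overall figures above can be sketched as follows. This is an assumption about the missing diagram, not something the slide states: the numbers are read as percentages, the four nodes are taken to have equal capacity, and nodes 1-3 are assumed to split their capacity evenly between services A and B while the 4th node services only B. The function name `overall_share` is hypothetical.

```python
def overall_share(per_node_shares):
    """Average each service's share across equally sized nodes."""
    services = per_node_shares[0].keys()
    n = len(per_node_shares)
    return {s: sum(node[s] for node in per_node_shares) / n for s in services}

# Assumed per-node splits (percent): nodes 1-3 split evenly, node 4 serves only B.
local_split = [{"A": 50.0, "B": 50.0}] * 3 + [{"A": 0.0, "B": 100.0}]
shares = overall_share(local_split)
print(shares)  # {'A': 37.5, 'B': 62.5}
```

Under these assumptions, each node looks balanced locally, yet the cluster-wide split is skewed toward B. Correcting for that kind of skew is the job of cluster-level resource management.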
5Clustering Goals
- Scalability
- Reliability
- High Performance
- Affordability
6Related Work
- Proportional-Share
- Cluster Reserves
7Related Work: Approach Differences
- Our goal: to provide a scalable solution for resource management.
- Other work focused primarily on achieving good management
- This often meant one manager for all the nodes
- Clearly this could become a scalability bottleneck
- Effectiveness: other solutions are probably better for smaller clusters; we hope to be better for large (>1000 nodes) clusters.
9Hierarchy: A Scalable Approach
- Hierarchical Management
- Nodes service jobs
- Managers facilitate resource management
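The hierarchy described above can be sketched as a small tree. This is a minimal illustration, assuming a two-level layout with one top-level manager, as in the 4-node (2/1), 100-node (10/1), and 900-node (30/1) configurations listed in the results. The class names, the function `build_two_level`, and the round-robin assignment of nodes to managers are hypothetical choices, not taken from the slides.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: int  # a node services jobs

@dataclass
class Manager:
    manager_id: int
    children: list = field(default_factory=list)  # nodes or lower-level managers

def build_two_level(num_nodes, num_first_level):
    """Build one top-level manager over first-level managers over nodes."""
    first_level = [Manager(i + 1) for i in range(num_first_level)]
    for n in range(num_nodes):
        # Assumption: nodes are spread round-robin across first-level managers.
        first_level[n % num_first_level].children.append(Node(n))
    return Manager(0, children=first_level)

root = build_two_level(900, 30)
fan_outs = [len(m.children) for m in root.children]
print(len(root.children), max(fan_outs))  # 30 30
```

With 900 nodes and 30 first-level managers, no manager in this sketch handles more than 30 children, whereas a single centralized manager would handle all 900.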
10Banking Algorithm
- Goal
- Determine best allocation given previous usage
- Primitives
- Tickets
- Bank accounts
- Deposit / withdraw tickets
- 6 Steps
11Banking Algorithm
- Step 1: For each service class on each node
- Deposit unused tickets
- Step 2: For each service class on each node
- Reallocate service class
- Full utilization: Allocation = usage + k
- Under utilization: Allocation = usage - k
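Steps 1 and 2 above can be sketched as follows. This is a minimal illustration under several assumptions that the slides do not confirm: "unused tickets" is taken to mean allocation minus usage, there is one bank account per service class, the full-utilization rule is read as usage plus k, and allocations are clamped at zero. The names `ClassState` and `deposit_and_reallocate` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ClassState:
    allocation: int  # tickets allocated to a service class on one node
    usage: int       # tickets that class actually used in the last interval

def deposit_and_reallocate(nodes, k):
    """nodes: list of {service_name: ClassState}. Returns the bank accounts."""
    bank = {}
    # Step 1: deposit each class's unused tickets into its bank account.
    for node in nodes:
        for service, state in node.items():
            unused = max(state.allocation - state.usage, 0)
            bank[service] = bank.get(service, 0) + unused
    # Step 2: reallocate each class on each node based on its usage.
    for node in nodes:
        for state in node.values():
            if state.usage >= state.allocation:
                state.allocation = state.usage + k          # full utilization
            else:
                state.allocation = max(state.usage - k, 0)  # under utilization
    return bank

nodes = [{"A": ClassState(allocation=10, usage=10),
          "B": ClassState(allocation=10, usage=4)}]
bank = deposit_and_reallocate(nodes, k=2)
print(bank)  # {'A': 0, 'B': 6}
```

The under-utilization branch follows the slide literally, so a class that used 4 of its 10 tickets is reallocated 2 when k is 2.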
12Banking Algorithm Cont.
- Step 3: For each service class
- Compare total allocation to desired
- Subtract from over-allocated
- Add to needy under-allocated
- Step 4: For each service class
- Deposit / Withdraw
- If still over-allocated: withdraw
- If still under-allocated: deposit
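The comparison in Step 3 can be sketched as a transfer from over-allocated classes to under-allocated ones. The slides do not say how the amounts are divided, so the proportional split here is an assumption. The function name `rebalance` and the sample ticket totals are hypothetical. Step 4 is not modeled.

```python
def rebalance(total_alloc, desired):
    """Move tickets from over-allocated classes to under-allocated ones.

    total_alloc and desired map service class -> cluster-wide ticket count.
    Assumption: transfers are split in proportion to each surplus or deficit.
    """
    surplus = {c: total_alloc[c] - desired[c]
               for c in total_alloc if total_alloc[c] > desired[c]}
    deficit = {c: desired[c] - total_alloc[c]
               for c in total_alloc if total_alloc[c] < desired[c]}
    movable = min(sum(surplus.values()), sum(deficit.values()))
    result = dict(total_alloc)
    if movable == 0:
        return result  # nothing to take, or nobody to give to
    total_surplus = sum(surplus.values())
    total_deficit = sum(deficit.values())
    for c, s in surplus.items():
        result[c] -= movable * s / total_surplus  # subtract from over-allocated
    for c, d in deficit.items():
        result[c] += movable * d / total_deficit  # add to under-allocated
    return result

desired = {"class1": 60, "class2": 30, "class3": 10}
balanced = rebalance({"class1": 50, "class2": 40, "class3": 10}, desired)
partial = rebalance({"class1": 50, "class2": 35, "class3": 10}, desired)
```

In the second call only 5 tickets can be moved, so class1 ends at 55 and remains under-allocated after this step.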
13Banking Algorithm Cont.
- Step 5:
- Withdraw and allocate
- Reward the needy nodes
- Step 6:
- Done; clear the bank accounts
14Reliability
- Bottom-up Manager Replacement
[Figure: manager hierarchy diagram with nodes numbered 1-12, illustrating bottom-up manager replacement]
16Results
Cluster Nodes | Managers 1st/2nd Level | Reporting 1st/2nd Level | Workloads | Workloads | Class 2 Constraints | Tests | Tests
4             | 2/1                    | 1/1                     | Steady    | Dyn       | 1                   | 1     | 1
              |                        | 1/5                     | Steady    | Dyn       |                     |       |
100           | 10/1                   | 1/1                     | Steady    | Dyn       | 1-30                | 2     | 3
              |                        | 1/5                     | Steady    | Dyn       |                     | 4     | 4
900           | 30/1                   | 1/1                     | Steady    | Dyn       | 1-300               | 5     | 5
              |                        | 1/5                     | Steady    | Dyn       |                     |       |
17Implementation Details
- Simulations via the NS network simulator
- Low-bandwidth 10 Mb/s communication network
- UDP for lower server overhead
- Assumptions
- Node-level resource management works ideally
18Test 1 Overview
- 4 nodes, 3 services, 60/30/10 allocation
- The 4th node receives all of the 3rd class's requests
- Steady workload
19Test 1 Data
20Test 2 Overview
- 100 nodes, 3 services, 60/30/10 allocation
- Nodes 1-30 receive all of the 3rd class's requests
- Steady workload
21Test 2 Data
22Test 3 Overview
- 100 nodes, 3 services, 60/30/10 allocation
- Nodes 1-30 receive all of the 3rd class's requests
- Dynamic workload
23Test 3 Data
24Test 4 Overview
- 100 nodes, 3 services, 60/30/10 allocation
- Nodes 1-30 receive all of the 3rd class's requests
- Steady workload
- Reporting 1/5
- Nodes report every 0.3 seconds
- Managers report every 1.5 seconds
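The effect of the 1/5 reporting ratio on message load can be estimated with a short calculation. This is a rough sketch, assuming the 100 nodes are spread evenly over the 10 first-level managers listed for this cluster size and that each report is a single message. The function name `report_rates` is hypothetical.

```python
def report_rates(num_nodes, num_managers, node_interval_s, manager_interval_s):
    """Return (reports/s arriving at each first-level manager,
    reports/s arriving at the top-level manager)."""
    per_first_level = (num_nodes / num_managers) / node_interval_s
    at_top_level = num_managers / manager_interval_s
    return per_first_level, at_top_level

per_first_level, at_top_level = report_rates(100, 10, 0.3, 1.5)
print(round(per_first_level, 1), round(at_top_level, 1))
```

Under these assumptions each first-level manager receives roughly 33 reports per second, while the top-level manager receives fewer than 7.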
25Test 4 Data
26Test 5 Overview
- 900 nodes, 3 services, 60/30/10 allocation
- Nodes 1-300 receive all of the 3rd class's requests
- Steady workload
27Test 5 Data
29Conclusions
- Benefits of a hierarchy
- Scalable
- Reliable
- Geographic Applications
- Implemented a new management scheme: Banking
- Comparable Results
- Improved Scalability
30Conclusions
- Clusters are sensitive to small policy changes
- Clusters are built for specific workloads
- Their performance is important, and small changes have significant impact
- No scheme is universally applicable
- Future Work
- Real system implementation
- Real Workloads
- Real node-level resource management
- More steady performance
32Questions
33Related Work: Proportional-Share
- Stride Scheduling
- Ticket based and similar to lottery
- Scale
- Randomly query k nodes to find best allocation
- Different Application
- Condor-like resource allocation/applications
34Related Work: Cluster Reserves
- Resource Container Schedulers
- Constrained Optimization Algorithm
- Scale
- Centralized single manager
35Hierarchical Cluster Reserves: Version 1
- Modify Cluster Reserves optimization algorithm
- Use it when a manager manages nodes
- AND when a level n+1 manager manages level n managers.
36Hierarchical Cluster Reserves: Version 2
- Cluster Reserves optimization algorithm
- Use it when a manager manages nodes
- Don't use it for upper-level managers
- Modify the manager-to-manager reporting
- Lie to the algorithm