Title: Supporting Load Balancing for Distributed Data-Intensive Applications
Supporting Load Balancing for Distributed Data-Intensive Applications
Leonid Glimcher, Vignesh Ravi, and Gagan Agrawal
Department of Computer Science and Engineering
The Ohio State University, Columbus, Ohio 43210
Outline
- Introduction
- Motivation
- FREERIDE-G Processing Structure
- Run-time Load Balancing System
- Experimental Results
- Conclusions
Introduction
- Growing abundance of data
  - Sensors, scientific simulations, and business transactions
- Data analysis
  - Translates raw data into knowledge
- Grid/cloud computing
  - Enables distributed processing
Motivation
[Figure: grid/cloud environment connecting a middleware user, data nodes, and compute nodes]
- Resources are geographically distributed
  - Data nodes
  - Compute nodes
  - Middleware user
- Remote data analysis is important
- Heterogeneity of resources
  - Differences in network bandwidth
  - Differences in compute power
FREERIDE-G Processing Structure (FRamework for Rapid Implementation of Datamining Engines in Grid)
While (not done)
    forall (data instances d)
        (I, d') = process(d)
        R(I) = R(I) op d'
- A MapReduce-like system
- Remote data analysis
- Middleware API (see the sketch below):
  - Process
  - Reduce
  - Global Combine
- R(I) above is the Reduction Object
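
To make the API concrete, here is a minimal C++ sketch of how an application such as k-means might plug into a process/reduce/global-combine interface. The names (KMeansReduction, ReductionObject, globalCombine) are illustrative assumptions, not the actual FREERIDE-G API, and points are one-dimensional for brevity.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Hypothetical reduction object for k-means: per-cluster
    // coordinate sums and point counts.
    struct ReductionObject {
        std::vector<double> sum;
        std::vector<long>   count;
    };

    class KMeansReduction {
    public:
        explicit KMeansReduction(std::vector<double> centers)
            : centers_(std::move(centers)) {}

        // process(d): map a data instance to a key I
        // (here, the index of the nearest cluster center).
        std::size_t process(double d) const {
            std::size_t best = 0;
            for (std::size_t i = 1; i < centers_.size(); ++i)
                if (std::fabs(d - centers_[i]) < std::fabs(d - centers_[best]))
                    best = i;
            return best;
        }

        // reduce: R(I) = R(I) op d, applied per data instance.
        void reduce(ReductionObject& R, std::size_t I, double d) const {
            R.sum[I]   += d;
            R.count[I] += 1;
        }

        // global combine: merge the reduction objects produced
        // on different compute nodes into one.
        static void globalCombine(ReductionObject& into,
                                  const ReductionObject& from) {
            for (std::size_t i = 0; i < into.sum.size(); ++i) {
                into.sum[i]   += from.sum[i];
                into.count[i] += from.count[i];
            }
        }

    private:
        std::vector<double> centers_;
    };

In this reading, the middleware drives the While/forall loop, calling process and reduce on each chunk and globalCombine across compute nodes; the updated centers (sum[i] / count[i]) feed the next iteration.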
A Real-time Grid/Cloud Scenario
[Figure: data hosted on nodes A and B is delivered to compute nodes C and D for processing]
Run-time Load Balancing
- Two factors of load imbalance:
  - Computational factor, w1
  - Remote data transfer (wait time), w2
- Case 1: w1 > w2
- Case 2: w2 > w1
- We use a weighted sum to account for both components (see the sketch below)
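
Read concretely, the model combines the two factors as a weighted sum; a minimal sketch, where the function name and parameters are assumptions for illustration:

    // Hypothetical combined cost of one chunk-to-node assignment,
    // following the slide's two-factor model: w1 weights the
    // computational load, w2 weights the data-transfer wait time.
    double combinedCost(double computeCost, double transferCost,
                        double w1, double w2) {
        return w1 * computeCost + w2 * transferCost;
    }

When w1 > w2 (Case 1) the assignment favors balancing compute load; when w2 > w1 (Case 2) it favors minimizing data-transfer wait time.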
Dynamic Load Balancing Algorithm
Input: bandwidth matrix, weights W1 and W2
For every chunk Ci:
    For every compute node Pj:
        Calculate data transfer cost Tc
        Calculate compute cost Cc
        Total cost = W1 * Cc + W2 * Tc
        If Total cost < Min:
            Update Min
            Assign Ci to Pj
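
A minimal C++ sketch of this greedy selection, assuming per-chunk sizes, a bandwidth matrix, and a running estimate of work already queued per node; these data structures, and the use of chunk size as a stand-in for per-chunk work, are illustrative assumptions rather than FREERIDE-G internals:

    #include <cstddef>
    #include <limits>
    #include <vector>

    // Pick the compute node Pj that minimizes the weighted total
    // cost W1*Cc + W2*Tc for chunk Ci, per the pseudocode above.
    std::size_t assignChunk(
            std::size_t i,
            const std::vector<std::vector<double>>& bandwidth, // bandwidth[i][j]: host of Ci -> node Pj
            const std::vector<double>& chunkSize,              // size of each chunk
            std::vector<double>& pendingWork,                  // work already assigned per node
            double W1, double W2) {
        double minCost = std::numeric_limits<double>::max();
        std::size_t best = 0;
        for (std::size_t j = 0; j < pendingWork.size(); ++j) {
            double Tc = chunkSize[i] / bandwidth[i][j]; // data transfer cost
            double Cc = pendingWork[j] + chunkSize[i];  // compute cost if Ci lands on Pj
            double total = W1 * Cc + W2 * Tc;           // Total cost = W1*Cc + W2*Tc
            if (total < minCost) {                      // If Total cost < Min
                minCost = total;                        //   update Min
                best = j;                               //   candidate: assign Ci to Pj
            }
        }
        pendingWork[best] += chunkSize[i];              // record the assignment
        return best;
    }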
Experimental Setup
- Settings:
  - Organizational grid
  - Wide area network (WAN)
- Goals are to evaluate:
  - Scalability
  - Dynamic load balancing overhead
  - Adaptability to compute-bound, I/O-bound, and WAN scenarios
- Applications:
  - K-means
  - Vortex detection
Scalability and Overhead of Dynamic Balancing
- Vortex detection
- 14.8 GB data
- Organizational setting
- Bandwidth: 50 MB/s and 100 MB/s
- 31% benefit
- Overhead within 10%
Model Adaptability: Compute-Bound Scenario
[Chart: ideal case vs. dynamic case, broken down into compute and data transfer time]
- K-means clustering
- 25.6 GB data
- Bandwidth: 50 MB/s and 200 MB/s
- Best result: the 75-25 combination, skewed towards the workload component
- Initial (unbalanced) overhead: 57% over balanced
- Dynamic overhead: 5% over balanced
Model Adaptability: I/O-Bound Scenario
- K-means clustering
- 25.6 GB data
- Bandwidth: 15 MB/s and 60 MB/s
- Best result: the 25-75 combination, skewed towards the data transfer component
- Initial (unbalanced) overhead: 40% over balanced
- Dynamic overhead: 4% over balanced
Model Adaptability: WAN Setting
- Vortex detection
- 14.6 GB data
- Best result: the 25-75 combination yields the lowest overhead (favoring the data delivery component)
- Unbalanced configuration: 20% overhead over balanced
- Our approach: overhead reduced to 8%
Conclusions
- Dynamic load balancing solution for grid environments
- Both workload and data transfer factors are important
- Scalability is good and overheads are within 10%
- Adaptable to compute-bound, I/O-bound, and WAN settings
Thank You!
Questions?
Contacts:
- Leonid Glimcher - glimcher@cse.ohio-state.edu
- Vignesh Ravi - raviv@cse.ohio-state.edu
- Gagan Agrawal - agrawal@cse.ohio-state.edu
Setup 1: Organizational Grid
- Data hosted on Opteron 250s (repository cluster, bmi-ri)
- Processed on Opteron 254s (compute cluster, cse-ri)
- The two clusters are connected through two 10 Gb optical fibers
- Both clusters are within the same city (0.5 miles apart)
- Evaluating:
  - Scalability
  - Adaptability
  - Integration overhead
Setup 2: WAN
- Data repository:
  - Opteron 250s (repository cluster, OSU)
  - Opteron 258s (repository cluster, Kent St)
- Processed on Opteron 254s (compute cluster, OSU)
- No dedicated link between processing and repository clusters
- Evaluating:
  - Scalability
  - Adaptability
FREERIDE-G System Design
[Figure: FREERIDE-G system design]