Supporting Load Balancing for Distributed Data-Intensive Applications

Transcript and Presenter's Notes
1
Supporting Load Balancing for Distributed Data-Intensive Applications
Leonid Glimcher, Vignesh Ravi, and Gagan Agrawal
Department of Computer Science and Engineering
The Ohio State University, Columbus, Ohio 43210
2
Outline
  • Introduction
  • Motivation
  • FREERIDE-G Processing Structure
  • Run-time Load Balancing System
  • Experimental Results
  • Conclusions

3
Introduction
  • Growing abundance of data
    • Sensors, scientific simulations, and business transactions
  • Data analysis
    • Translates raw data into knowledge
  • Grid/Cloud computing
    • Enables distributed processing

4
Motivation
(Diagram: a Grid/Cloud environment with data nodes, compute nodes, and a middleware user)
  • Resources are geographically distributed
    • Data nodes
    • Compute nodes
    • Middleware user
  • Remote data analysis is important
  • Heterogeneity of resources
    • Differences in network bandwidth
    • Differences in compute power
5
FREERIDE-G Processing Structure (FRamework for Rapid Implementation of Datamining Engines in Grid)

While (not done) {
  forall (data instances d) {
    (i, d') = process(d)
    R(i) = R(i) op d'
  }
}

  • A Map-Reduce-like system
  • Remote data analysis
  • Middleware API
    • Process
    • Reduce
    • Global Combine
  • Results accumulate in a Reduction Object
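As a concrete illustration, here is a minimal Python sketch of this generalized reduction structure; the names ReductionObject and run are hypothetical stand-ins, since the real FREERIDE-G middleware exposes a C++ API.

  # Minimal sketch of the generalized reduction loop (assumed names, not
  # the actual FREERIDE-G API). process(d) emits (i, d') pairs that are
  # folded into the reduction object with a commutative, associative op.

  class ReductionObject:
      def __init__(self, op):
          self.op = op    # commutative, associative combine operator
          self.r = {}     # i -> R(i)

      def reduce(self, i, value):
          # R(i) = R(i) op d'
          self.r[i] = self.op(self.r[i], value) if i in self.r else value

      def global_combine(self, other):
          # merge reduction objects computed on different nodes
          for i, value in other.r.items():
              self.reduce(i, value)

  def run(chunks, process, robj):
      # the forall loop from the slide: iteration order does not matter
      for chunk in chunks:
          for d in chunk:
              for i, value in process(d):
                  robj.reduce(i, value)
      return robj

Because op is commutative and associative, chunks can be processed in any order on any node and the per-node reduction objects merged with global_combine, which is what makes the dynamic chunk re-assignment on the following slides safe.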
6
A Real-time Grid/Cloud Scenario
(Diagram: data repositories A, B, C, D delivering chunks to compute nodes)
7
Run-time Load Balancing
  • Two factors of load imbalance
    • Computational factor, w1
    • Remote data transfer (wait time), w2
  • Case 1: w1 > w2
  • Case 2: w2 > w1
  • We use a weighted sum to account for both components
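As a worked example with made-up numbers (the 75-25 split mirrors the compute-bound result reported later): for a chunk with compute cost Cc = 10 s and transfer cost Tc = 4 s,

  cost = w1 * Cc + w2 * Tc = 0.75 * 10 + 0.25 * 4 = 8.5

while a 25-75 weighting scores the same chunk at 0.25 * 10 + 0.75 * 4 = 5.5, so the chosen weights decide which factor dominates chunk assignment.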

8
Dynamic Load Balancing Algorithm
Input: bandwidth matrix; weights W1, W2
For every chunk Ci:
  For every compute node Pj:
    Calculate data transfer cost, Tc
    Calculate compute cost, Cc
    Total cost = W1 * Cc + W2 * Tc
    If Total cost < Min:
      Update Min
      Assign Ci to Pj
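A minimal runnable sketch of this greedy assignment in Python; the size-based cost estimates, node speeds, and bandwidth matrix below are assumptions for illustration (FREERIDE-G works with measured compute and transfer costs rather than estimates from chunk sizes).

  # Greedy chunk-to-node assignment sketch (assumed cost models, not the
  # actual FREERIDE-G implementation).

  def assign_chunks(chunk_sizes, node_speed, bandwidth, w1, w2):
      pending = [0.0] * len(node_speed)     # cost already queued per node
      assignment = []
      for i, size in enumerate(chunk_sizes):
          best_j, best_cost = None, float("inf")
          for j, speed in enumerate(node_speed):
              cc = size / speed             # estimated compute cost of Ci on Pj
              tc = size / bandwidth[i][j]   # estimated transfer (wait) cost
              total = w1 * cc + w2 * tc + pending[j]
              if total < best_cost:         # keep the cheapest node seen so far
                  best_j, best_cost = j, total
          assignment.append(best_j)
          pending[best_j] = best_cost       # Pj now has chunk Ci queued
      return assignment

  # Example: 3 chunks, 2 compute nodes, compute-bound weighting (75-25)
  print(assign_chunks(
      chunk_sizes=[4e9, 4e9, 8e9],          # bytes per chunk
      node_speed=[1.0, 2.0],                # relative compute speeds
      bandwidth=[[50e6, 100e6]] * 3,        # bytes/sec from Ci's repository to Pj
      w1=0.75, w2=0.25))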
9
Experimental Setup
  • Settings
    • Organizational Grid
    • Wide Area Network (WAN)
  • Goals are to evaluate
    • Scalability
    • Dynamic load balancing overhead
    • Adaptability to compute-bound, I/O-bound, and WAN scenarios
  • Applications (a sketch of K-means in the reduction API follows below)
    • K-means clustering
    • Vortex detection
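To make the application side concrete, one K-means iteration can be phrased in the generalized-reduction sketch from slide 5; make_process and combine are hypothetical helpers, and the real application codes are C++.

  # One K-means iteration expressed as process()/reduce() over data chunks
  # (illustrative only; reuses the ReductionObject and run sketch above).

  import math

  def make_process(centroids):
      # emit (nearest-centroid index, (point, 1)); carrying counts lets
      # the reduction average points without holding them all in memory
      def process(point):
          i = min(range(len(centroids)),
                  key=lambda k: math.dist(point, centroids[k]))
          return [(i, (list(point), 1))]
      return process

  def combine(a, b):
      # per-cluster coordinate-wise sum plus point count
      (pa, na), (pb, nb) = a, b
      return ([x + y for x, y in zip(pa, pb)], na + nb)

  # One iteration over two chunks of 1-D points:
  chunks = [[(0.0,), (1.0,)], [(9.0,), (10.0,)]]
  robj = run(chunks, make_process([(0.5,), (9.5,)]), ReductionObject(combine))
  print([[x / n for x in s] for s, n in robj.r.values()])   # -> [[0.5], [9.5]]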

10
Scalability and Overhead of Dynamic Balancing
  • Vortex detection
    • 14.8 GB data
    • Organizational setting
  • Bandwidth
    • 50 MB/sec
    • 100 MB/sec
  • 31% benefit from balancing
  • Overhead within 10%

11
Model Adaptability: Compute-Bound Scenario
(Chart: ideal case vs. dynamic case, execution time split into compute and data transfer)
  • K-means clustering
    • 25.6 GB data
  • Bandwidth
    • 50 MB/sec
    • 200 MB/sec
  • Best result: 75-25 combination
    • Skewed towards the workload component
  • Initial (unbalanced) overhead: 57% over balanced
  • Dynamic overhead: 5% over balanced
12
Model Adaptability: I/O-Bound Scenario
  • K-means clustering
    • 25.6 GB data
  • Bandwidth
    • 15 MB/sec
    • 60 MB/sec
  • Best result: 25-75 combination
    • Skewed towards the data transfer component
  • Initial (unbalanced) overhead: 40% over balanced
  • Dynamic overhead: 4% over balanced

13
Model Adaptability: WAN Setting
  • Vortex detection
    • 14.6 GB data
  • Best result: 25-75 combination results in the lowest overhead (favoring the data delivery component)
  • Unbalanced configuration: 20% overhead over balanced
  • Our approach: overhead reduced to 8%

14
Conclusions
  • Dynamic load balancing solution for grid environments
  • Both workload and data transfer factors are important
  • Scalability is good and overheads are within 10%
  • Adaptable to compute-bound, I/O-bound, and WAN settings

15
  • Thank You!
  • Questions?
  • Contacts
    • Leonid Glimcher - glimcher@cse.ohio-state.edu
    • Vignesh Ravi - raviv@cse.ohio-state.edu
    • Gagan Agrawal - agrawal@cse.ohio-state.edu

16
Setup 1: Organizational Grid
(Diagram: repository cluster (bmi-ri) linked to compute cluster (cse-ri))
  • Data hosted on Opteron 250s
  • Processed on Opteron 254s
  • The two clusters are connected through two 10 Gb optical fiber links
  • Both clusters are within the same city (0.5 mile apart)
  • Evaluating
    • Scalability
    • Adaptability
    • Integration overhead
17
Setup 2: WAN
(Diagram: repository clusters at Kent State and OSU, compute cluster at OSU)
  • Data repository
    • Opteron 250s (OSU)
    • Opteron 258s (Kent State)
  • Processed on Opteron 254s
  • No dedicated link between the processing and repository clusters
  • Evaluating
    • Scalability
    • Adaptability
18
FREERIDE-G System Design
(Diagram: FREERIDE-G system design)