Title: Cloud Computing Test Bed: Hadoop as a Service Case Study
1. Cloud Computing Test Bed: Hadoop as a Service Case Study
- Thomas Sandholm, Kevin Lai, Hewlett-Packard Laboratories, Palo Alto
2. Outline
- Part I: Cloud Computing Test Bed
  - What is the cloud computing test bed?
  - Technical benefits
  - Infrastructure stack
- Part II: Hadoop as a Service
  - Why a Hadoop service?
  - Differentiated Hadoop services
  - Economic job optimization
  - Experiment results
  - Demo
3. Cloud Computing Test Bed
4. What is the cloud computing test bed?
- University of Illinois, Urbana-Champaign
- Karlsruhe University, Germany
- Intel Research, Pittsburgh
- Yahoo! Research, Sunnyvale
- HP Labs, Palo Alto and Bristol
- Infocomm Development Authority, Singapore
- 1,000-4,000 cores per site
5. Goal of the cloud computing test bed
- Promote collaborative cloud computing research among industry, academia, and government on cloud computing software and on data-center management infrastructure and hardware, at Internet scale
- Availability by end of 2008
6. Technical benefits
- Allow federated cluster experiments and benchmarks
- Allow innovation at all levels of the cloud computing infrastructure stack
- Commitment to openness in sharing software, tools, and best practices
- Collection of usage statistics
- Example research areas: QoS, resource allocation, multi-tenancy, virtualization, management automation, scalability, distributed security, storage management
7. Infrastructure stack (layered, top to bottom)
- Application Services (e.g. Mahout, Pig)
- Infrastructure Services (e.g. Hadoop, NFS)
- Virtual Resource Set (e.g. Tycoon, Eucalyptus)
- Physical Resource Set (e.g. Tashi, Emulab)
- Hardware (e.g. computers, network, storage)
Diagram annotations: interfaces between the upper layers are optional; the Physical Resource Set interface is mandatory; the upper layers are recommendations.
8. Physical Resource Set (PRS)
- Thin layer with an open-source reference implementation
- Includes clear policies on:
  - whom to admit
  - how to arbitrate among competing requests
  - what resource capacity may be requested, and over what time frames
- VLAN-isolated mini-datacenter
- Operations: reset, reboot, power up, power down, get status
- Bias towards large and short experiments
- Cross-site coordination required, e.g. for accounting
9. Hadoop as a Service
10. Why a Hadoop service?
- Simplify installation, setup, and provisioning
- Research questions:
  - How to support multi-tenancy with QoS differentiation?
  - How to optimize workflows across users with fluctuating capacity requirements?
- Key features:
  - On-demand creation
  - Dynamic resource flexing
11. Differentiated Hadoop services
- Problem:
  - More important jobs should preempt less important jobs
  - Time-critical jobs need to meet deadlines
  - Test jobs need no stringent QoS guarantees
  - How to get users to truthfully reveal their resource requirements?
12. Differentiated Hadoop services (continued)
- Approach:
  - Market-based resource allocator, Tycoon (http://tycoon.hpl.hp.com)
  - Continuous bidding (of spending rates) for resource capacity
  - Users can evaluate and select providers based on cost/benefit metrics
  - Gives users an incentive to be judicious about capacity requests and submission times
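Under continuous bidding, a natural model is that each user's share of a node is proportional to their spending rate. The sketch below illustrates that proportional-share rule with made-up numbers; it is a simplified model, not the actual Tycoon implementation.

```python
def proportional_share(bids, capacity=1.0):
    """Split a node's capacity among users in proportion to their
    spending rates (a simplified market-allocation model)."""
    total = sum(bids.values())
    return {user: capacity * bid / total for user, bid in bids.items()}

# A user bidding 3x as much as a competitor gets 3x the capacity.
shares = proportional_share({"alice": 3.0, "bob": 1.0})
# shares == {"alice": 0.75, "bob": 0.25}
```

This is what gives users the incentive to reveal requirements truthfully: raising your spending rate raises your share, but at real cost.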
13. Economic job optimization
- Assumption: not all subtasks need maximum capacity at all times
- Approach: automatically rescale capacity as needed to optimize the cost/benefit ratio of the workflow as a whole
- Opportunity: application scalability profiles are not perfectly linear
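To see why a non-linear scalability profile creates the opportunity: with sublinear speedup, each added node buys less time than the last, so past some point it costs more than it saves. A toy model with assumed numbers (speedup n**alpha, illustrative value and price constants, not measured data):

```python
def net_benefit(n, alpha=0.7, value_per_speedup=2.0, price_per_node=1.0):
    """Value of the achieved speedup minus the cost of renting n nodes,
    for a sublinear speedup curve s(n) = n**alpha (toy parameters)."""
    return value_per_speedup * n**alpha - price_per_node * n

# The cost/benefit optimum is at a small node count, not the maximum.
best = max(range(1, 101), key=net_benefit)
```

An economic optimizer rescales each workflow phase toward this optimum instead of holding maximum capacity throughout.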
14. Optimization strategies
- Priority
  - Problem: some tasks/nodes are more performance-critical than others
  - Strategy: declare the relative priority of mappers and reducers, or of critical nodes, and split the budget accordingly (e.g. a funding boost for the master)
- Data reduction
  - Problem: early phases of the workflow are more data-intensive
  - Strategy: use decaying spending rates when bidding
- Bottleneck
  - Problem: during map/reduce synchronization, some nodes may become bottlenecks
  - Strategy: redistribute funds to active bottlenecks
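The data-reduction strategy above could be realized with a bid schedule that starts high for the data-intensive early phases and tapers off. A minimal sketch, with an assumed exponential decay shape and illustrative parameters:

```python
import math

def decaying_rate(initial_rate, decay, t):
    """Spending rate at time t: bid high early, when the workflow is
    most data-intensive, then taper off (illustrative decay shape)."""
    return initial_rate * math.exp(-decay * t)

# Bid schedule for the first few time steps of a workflow.
schedule = [decaying_rate(10.0, 0.5, t) for t in range(4)]
```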
15. Optimization strategies (continued)
- Best response
  - Problem: when other users place competing bids, the optimal configuration/allocation may change
  - Strategy: continuously compute game-theoretic best-response bids to maximize utility
- Risk
  - Problem: some users are more risk-averse than others (can tolerate fewer fluctuations)
  - Strategy: bid on nodes based on their predicted guarantee of delivering a QoS level
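For the best-response strategy, the textbook proportional-share market admits a closed form: if a user values full capacity at V and the other users' bids sum to B, the utility V*b/(b+B) - b is maximized at b* = sqrt(V*B) - B. This is a standard model, not necessarily the exact Tycoon mechanism:

```python
import math

def utility(bid, value, others_bids):
    """Value of the proportional share won, minus the spending rate paid."""
    return value * bid / (bid + others_bids) - bid

def best_response(value, others_bids):
    """Closed-form maximizer of utility(): b* = sqrt(value * B) - B,
    floored at zero when competing for the node is not worthwhile."""
    return max(0.0, math.sqrt(value * others_bids) - others_bids)

b = best_response(value=16.0, others_bids=4.0)  # sqrt(16 * 4) - 4 = 4.0
```

Recomputing b* as others' bids move is what "continuously find best-response bids" amounts to in this model.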
16. Experiment setup
- 2 competing Hadoop users
- GridMix on 40 nodes
- 25 GiB input data
- 6-13 MapReduce tasks
- 7-12 minutes per job
- 30-job sequential workflow per user
17. Experiment results
- 10-12% performance improvement
18. Experiment results (continued)
- 45% efficiency improvement
19. Lessons learned
- What worked well:
  - Security sweet spot: OpenID + AWS REST auth + Diffie-Hellman
  - MapReduce is also useful for CPU-bound apps, but completion-time prediction and efficient failover become trickier
  - Streaming + Python is a powerful combination
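The Streaming + Python combination works because Hadoop Streaming drives any executable through stdin/stdout. A minimal word-count sketch of the two stages (the sort between them is done by the framework in a real job; here it is simulated with `sorted`):

```python
from itertools import groupby

def mapper(lines):
    """Streaming mapper: emit tab-separated (word, 1) pairs."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_pairs):
    """Streaming reducer: sum counts per key; the framework delivers
    mapper output grouped (sorted) by key."""
    keyed = (pair.split("\t") for pair in sorted_pairs)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

counts = list(reducer(sorted(mapper(["to be or not to be"]))))
# counts == ["be\t2", "not\t1", "or\t1", "to\t2"]
```

In production the same script would read `sys.stdin` and be passed via the `-mapper` and `-reducer` options of the Streaming jar.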
20. Lessons learned (continued)
- Enhancement opportunities:
  - Python MapReduce utility library with standard tasks (joins/filters, aggregation, merge)
  - Lightweight piping
  - Lightweight merging
  - O(1) instead of O(n) master startup
  - Expose web monitoring via REST APIs/custom formats (XML/RSS)
  - Suspend/resume/checkpoint for CPU-bound apps
  - Built-in media split/join
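One piece of the proposed utility library might be a generic key-based fold usable as a Streaming reducer for the standard aggregation tasks. The function and its name below are hypothetical, a sketch of the idea rather than an existing API:

```python
from itertools import groupby

def aggregate(sorted_lines, combine, initial):
    """Fold tab-separated values per key (sum, min, list-merge, ...),
    assuming input sorted by key as a Streaming reducer receives it.
    Hypothetical utility-library helper, not a shipped Hadoop API."""
    keyed = (line.split("\t", 1) for line in sorted_lines)
    for key, group in groupby(keyed, key=lambda kv: kv[0]):
        acc = initial
        for _, value in group:
            acc = combine(acc, value)
        yield key, acc

totals = dict(aggregate(["a\t1", "a\t2", "b\t5"],
                        combine=lambda acc, v: acc + int(v), initial=0))
# totals == {"a": 3, "b": 5}
```

Joins and filters would follow the same pattern, parameterized over the combine step, which is what would make such a library small but reusable.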
21. Hadoop as a Service: leveraging the test-bed setup
- How do users react to economic feedback?
- How do markets bridge cross-site trust domains?
- Which stock-market tools come into play when trading volume increases?
- What effects do macroeconomic policies such as taxes have on usage?
22. More info
- Community web site: http://cloudtestbed.org
- Questions: global-cloud-research-tesbed-info_at_external.groups.hp.com
- Update at ApacheCon in November
- Thanks to John Wilkes, HP Labs