Title: Cloud Computing Test Bed: Hadoop as a Service Case Study
1. Cloud Computing Test Bed: Hadoop as a Service Case Study
- Thomas Sandholm, Kevin Lai, Hewlett-Packard Laboratories, Palo Alto
2. Outline
- Part I: Cloud Computing Test Bed
  - What is the cloud computing test bed?
  - Technical benefits
  - Infrastructure stack
- Part II: Hadoop as a Service
  - Why a Hadoop service?
  - Differentiated Hadoop services
  - Economic job optimization
  - Experiment results
  - Demo
3. Cloud Computing Test Bed
4. What is the cloud computing test bed?
- University of Illinois, Urbana-Champaign
- Karlsruhe University, Germany
- Intel Research, Pittsburgh
- Yahoo! Research, Sunnyvale
- HP Labs, Palo Alto and Bristol
- Infocomm Development Authority, Singapore
- 1,000-4,000 cores per site
5. Goal of the cloud computing test bed
- Promote collaborative cloud computing research among industry, academia, and government on cloud computing software and on data-center management infrastructure and hardware, at Internet scale
- Availability by end of 2008
6. Technical benefits
- Allow federated cluster experiments and benchmarks
- Allow innovation at all levels of the cloud computing infrastructure stack
- Commitment to openness in sharing software, tools, and best practices
- Collection of usage statistics
- Example research areas: QoS, resource allocation, multi-tenancy, virtualization, management automation, scalability, distributed security, storage management
7. Infrastructure stack (layered, top to bottom)
- Application Services (e.g. Mahout, Pig)
- Infrastructure Services (e.g. Hadoop, NFS)
- Virtual Resource Set (e.g. Tycoon, Eucalyptus)
- Physical Resource Set (e.g. Tashi, Emulab)
- Hardware (e.g. computers, network, storage)
Diagram annotations: interfaces between the upper layers are optional; the Physical Resource Set interface is mandatory; the upper layers are recommendations.
8. Physical Resource Set (PRS)
- Thin layer with an open-source reference implementation
- Includes clear policies on:
  - whom to admit
  - how to arbitrate among competing requests
  - what resource capacity may be requested, and over what time frames
- VLAN-isolated mini-datacenter
- Operations: reset, reboot, power up, power down, get status
- Bias towards large and short experiments
- Cross-site coordination required, e.g. for accounting
9. Hadoop as a Service
10. Why a Hadoop service?
- Simplify installation, setup, and provisioning
- Research questions:
  - How to support multi-tenancy with QoS differentiation?
  - How to optimize workflows across users with fluctuating capacity requirements?
- Key features:
  - On-demand creation
  - Dynamic resource flexing
11. Differentiated Hadoop services
- Problem:
  - More important jobs should preempt less important jobs
  - Time-critical jobs need to meet deadlines
  - Test jobs need no stringent QoS guarantees
  - How to get users to truthfully reveal their resource requirements?
12. Differentiated Hadoop services (continued)
- Approach:
  - Market-based resource allocator, Tycoon (http://tycoon.hpl.hp.com)
  - Continuous bidding (of spending rates) for resource capacity
  - Users can evaluate and select providers based on cost/benefit metrics
  - Gives users an incentive to be judicious about capacity requests and submission times
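Under continuous bidding, a natural model is that each user's share of a node is proportional to their spending rate. The sketch below illustrates that proportional-share rule with made-up numbers; it is a simplified model, not the actual Tycoon implementation.

```python
def proportional_share(bids, capacity=1.0):
    """Split a node's capacity among users in proportion to their
    spending rates (a simplified market-allocation model)."""
    total = sum(bids.values())
    return {user: capacity * bid / total for user, bid in bids.items()}

# A user bidding 3x as much as a competitor gets 3x the capacity.
shares = proportional_share({"alice": 3.0, "bob": 1.0})
# shares == {"alice": 0.75, "bob": 0.25}
```

This is what gives users the incentive to reveal requirements truthfully: raising your spending rate raises your share, but at real cost.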
13. Economic job optimization
- Assumption: not all subtasks need maximum capacity at all times
- Approach: automatically rescale capacity as needed to optimize the cost/benefit ratio of the workflow as a whole
- Opportunity: application scalability profiles are not perfectly linear
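To see why a non-linear scalability profile creates the opportunity: with sublinear speedup, each added node buys less time than the last, so past some point it costs more than it saves. A toy model with assumed numbers (speedup n**alpha, illustrative value and price constants, not measured data):

```python
def net_benefit(n, alpha=0.7, value_per_speedup=2.0, price_per_node=1.0):
    """Value of the achieved speedup minus the cost of renting n nodes,
    for a sublinear speedup curve s(n) = n**alpha (toy parameters)."""
    return value_per_speedup * n**alpha - price_per_node * n

# The cost/benefit optimum is at a small node count, not the maximum.
best = max(range(1, 101), key=net_benefit)
```

An economic optimizer rescales each workflow phase toward this optimum instead of holding maximum capacity throughout.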
14. Optimization strategies
- Priority
  - Problem: some tasks/nodes are more performance-critical than others
  - Strategy: declare the relative priority of mappers and reducers, or of critical nodes, and split the budget accordingly (e.g. a funding boost for the master)
- Data reduction
  - Problem: early phases of the workflow are more data-intensive
  - Strategy: use decaying spending rates when bidding
- Bottleneck
  - Problem: during map/reduce synchronization, some nodes may become bottlenecks
  - Strategy: redistribute funds to active bottlenecks
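The data-reduction strategy above could be realized with a bid schedule that starts high for the data-intensive early phases and tapers off. A minimal sketch, with an assumed exponential decay shape and illustrative parameters:

```python
import math

def decaying_rate(initial_rate, decay, t):
    """Spending rate at time t: bid high early, when the workflow is
    most data-intensive, then taper off (illustrative decay shape)."""
    return initial_rate * math.exp(-decay * t)

# Bid schedule for the first few time steps of a workflow.
schedule = [decaying_rate(10.0, 0.5, t) for t in range(4)]
```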
15. Optimization strategies (continued)
- Best response
  - Problem: when other users place competing bids, the optimal configuration/allocation may change
  - Strategy: continuously compute game-theoretic best-response bids to maximize utility
- Risk
  - Problem: some users are more risk-averse than others (can tolerate fewer fluctuations)
  - Strategy: bid on nodes based on their predicted guarantee of delivering a QoS level
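For the best-response strategy, the textbook proportional-share market admits a closed form: if a user values full capacity at V and the other users' bids sum to B, the utility V*b/(b+B) - b is maximized at b* = sqrt(V*B) - B. This is a standard model, not necessarily the exact Tycoon mechanism:

```python
import math

def utility(bid, value, others_bids):
    """Value of the proportional share won, minus the spending rate paid."""
    return value * bid / (bid + others_bids) - bid

def best_response(value, others_bids):
    """Closed-form maximizer of utility(): b* = sqrt(value * B) - B,
    floored at zero when competing for the node is not worthwhile."""
    return max(0.0, math.sqrt(value * others_bids) - others_bids)

b = best_response(value=16.0, others_bids=4.0)  # sqrt(16 * 4) - 4 = 4.0
```

Recomputing b* as others' bids move is what "continuously find best-response bids" amounts to in this model.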
16. Experiment setup
- 2 competing Hadoop users
- GridMix on 40 nodes
- 25 GiB input data
- 6-13 MapReduce tasks
- 7-12 minutes per job
- 30-job sequential workflow per user
17. Experiment results
- 10-12% performance improvement
18. Experiment results (continued)
- 45% efficiency improvement
19. Lessons learned
- What worked well:
  - Security sweet spot: OpenID + AWS REST auth + Diffie-Hellman
  - MapReduce is also useful for CPU-bound apps, but completion-time prediction and efficient failover become trickier
  - Streaming + Python is a powerful combination
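The Streaming + Python combination works because Hadoop Streaming drives any executable through stdin/stdout. A minimal word-count sketch of the two stages (the sort between them is done by the framework in a real job; here it is simulated with `sorted`):

```python
from itertools import groupby

def mapper(lines):
    """Streaming mapper: emit tab-separated (word, 1) pairs."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_pairs):
    """Streaming reducer: sum counts per key; the framework delivers
    mapper output grouped (sorted) by key."""
    keyed = (pair.split("\t") for pair in sorted_pairs)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

counts = list(reducer(sorted(mapper(["to be or not to be"]))))
# counts == ["be\t2", "not\t1", "or\t1", "to\t2"]
```

In production the same script would read `sys.stdin` and be passed via the `-mapper` and `-reducer` options of the Streaming jar.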
20. Lessons learned (continued)
- Enhancement opportunities:
  - Python MapReduce utility library with standard tasks (joins/filters, aggregation, merge)
  - Lightweight piping
  - Lightweight merging
  - O(1) instead of O(n) master startup
  - Expose web monitoring via REST APIs/custom formats (XML/RSS)
  - Suspend/resume/checkpoint for CPU-bound apps
  - Built-in media split/join
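One piece of the proposed utility library might be a generic key-based fold usable as a Streaming reducer for the standard aggregation tasks. The function and its name below are hypothetical, a sketch of the idea rather than an existing API:

```python
from itertools import groupby

def aggregate(sorted_lines, combine, initial):
    """Fold tab-separated values per key (sum, min, list-merge, ...),
    assuming input sorted by key as a Streaming reducer receives it.
    Hypothetical utility-library helper, not a shipped Hadoop API."""
    keyed = (line.split("\t", 1) for line in sorted_lines)
    for key, group in groupby(keyed, key=lambda kv: kv[0]):
        acc = initial
        for _, value in group:
            acc = combine(acc, value)
        yield key, acc

totals = dict(aggregate(["a\t1", "a\t2", "b\t5"],
                        combine=lambda acc, v: acc + int(v), initial=0))
# totals == {"a": 3, "b": 5}
```

Joins and filters would follow the same pattern, parameterized over the combine step, which is what would make such a library small but reusable.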
21. Hadoop as a Service: leveraging the test-bed setup
- How do users react to economic feedback?
- How do markets bridge cross-site trust domains?
- Which stock-market tools come into play when trading volume increases?
- What effects do macroeconomic policies such as taxes have on usage?
22. More info
- Community web site: http://cloudtestbed.org
- Questions: global-cloud-research-tesbed-info_at_external.groups.hp.com
- Update at ApacheCon in November
- Thanks to John Wilkes, HP Labs