Cloud Computing Test Bed: Hadoop as a Service Case Study


1
Cloud Computing Test Bed: Hadoop as a Service Case Study
  • Thomas Sandholm and Kevin Lai, Hewlett-Packard
    Laboratories, Palo Alto

2
Outline
  • Part I: Cloud Computing Test Bed
    • What is the cloud computing test bed?
    • Technical benefits
    • Infrastructure stack
  • Part II: Hadoop as a Service
    • Why a Hadoop service?
    • Differentiated Hadoop services
    • Economic job optimization
    • Experiment results
    • Demo

3
Cloud Computing Test Bed
4
What is the cloud computing test bed?
  • University of Illinois at Urbana-Champaign
  • Karlsruhe University, Germany
  • Intel Research, Pittsburgh
  • Yahoo! Research, Sunnyvale
  • HP Labs, Palo Alto, Bristol
  • Infocomm Development Authority, Singapore
  • 1000-4000 cores per site
5
Goal of the cloud computing test bed
  • Promote collaborative cloud computing research among
    industry, academia and government on cloud computing
    software, data-center management infrastructure and
    hardware at Internet scale
  • Availability by end of 2008

6
Technical benefits
  • Allow federated cluster experiments/benchmarks
  • Allow innovation at all levels of the cloud
    computing infrastructure stack
  • Commitment to openness in sharing software,
    tools, best practices
  • Collection of usage statistics
  • Example research areas:
    • QoS, resource allocation, multi-tenancy,
      virtualization, management automation,
      scalability, distributed security, storage
      management

7
Infrastructure stack
Application Services (examples: Mahout, Pig)
Infrastructure Services (examples: Hadoop, NFS)
Optional Interfaces
Virtual Resource Set (examples: Tycoon, Eucalyptus)
Physical Resource Set (examples: Tashi, Emulab)
Mandatory Interface
Hardware (examples: computers, network, storage)
Recommendations
8
Physical Resource Set (PRS)
  • Thin layer with open source reference
    implementation
  • Includes clear policies on
    • who to admit
    • how to arbitrate among competing requests
    • what resource capacity may be requested over what
      time frames
  • VLAN-isolated mini-datacenter
  • Operations: reset, reboot, power up, power down,
    get status
  • Bias towards large and short experiments
  • Site coordination required, e.g. accounting

9
Hadoop as a Service
10
Why a Hadoop service?
  • Simplify installation, setup and provisioning
  • Research questions
    • How to support multi-tenancy with QoS
      differentiation
    • How to optimize workflows across users with
      fluctuating capacity requirements
  • Key features
    • On-demand creation
    • Dynamic resource flexing

11
Differentiated Hadoop services
  • Problem
    • More important jobs should preempt less important
      jobs
    • Time-critical jobs need to meet deadlines
    • Test jobs need no stringent QoS guarantees
    • How to get users to truthfully reveal their
      resource requirements?

12
Differentiated Hadoop services continued
  • Approach
    • Market-based resource allocator, Tycoon
      (http://tycoon.hpl.hp.com)
    • Continuous bidding (of spending rates) for
      resource capacity
    • Users can evaluate and select providers based on
      cost/benefit metrics
    • Give users an incentive to be judicious about
      capacity requests and submission time
13
Economic job optimization
  • Assumption
    • Not all subtasks need maximum capacity at all
      times
  • Approach
    • Automatically rescale the capacity as needed to
      optimize the cost/benefit ratio of the workflow
      as a whole
  • Opportunity
    • Application scalability profile not perfectly
      linear
14
Optimization strategies
  • Priority
    • Problem: Some tasks/nodes are more performance
      critical than others
    • Solution: Declare relative priority of mappers and
      reducers or critical nodes and split the budget
      accordingly (e.g. master funding boost)
  • Data Reduction
    • Problem: Early phases of the workflow are more data
      intensive
    • Solution: Use decaying spending rates when bidding
  • Bottleneck
    • Problem: During map/reduce synchronization some
      nodes may be bottlenecks
    • Solution: Redistribute funds to active bottlenecks
15
Optimization strategies continued
  • Best Response
    • Problem: When other users place competing bids, the
      optimal configuration/allocation might change
    • Solution: Continuously find game-theoretic best
      response bids to maximize utility
  • Risk
    • Problem: Some users are more risk averse than others
      (can tolerate fewer fluctuations)
    • Solution: Bid on nodes based on the predicted
      guarantee to deliver a QoS level
16
Experiment setup
  • 2 Competing Hadoop users
  • GridMix on 40 nodes
  • 25 GiB input data
  • 6-13 MapReduce Tasks
  • 7-12 minutes/job
  • 30-job sequential workflow/user

17
Experiment results
  • 10-12% performance improvement
18
Experiment results continued
  • 45% efficiency improvement

19
Lessons learned
  • What worked well
    • Security sweet spot: OpenID + AWS REST Auth +
      Diffie-Hellman
    • MapReduce is also useful for CPU-bound apps, but
      completion time prediction and efficient failover
      become trickier
    • Streaming + Python: a powerful combination

20
Lessons learned continued
  • Enhancement opportunities
    • Python MapReduce utility library with standard
      tasks (joins/filters, aggregation, merge)
    • Lightweight piping
    • Lightweight merging
    • O(1) instead of O(n) master startup
    • Expose Web monitoring with REST APIs/custom
      formats (XML/RSS)
    • Suspend/resume/checkpoint for CPU-bound apps
    • Built-in media split/join
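
To give a feel for what a Python MapReduce utility library with standard tasks might contain, here is a hedged sketch of filter and aggregation helpers in the style of a streaming job, where records are tab-separated text lines. The helper names are hypothetical, and the in-memory sort stands in for the shuffle that the framework would perform between the map and reduce phases.

```python
from itertools import groupby
from operator import itemgetter


def map_records(lines, key_index, value_index, sep="\t"):
    """Mapper: turn delimited text lines into (key, numeric value) pairs."""
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        yield fields[key_index], float(fields[value_index])


def filter_records(pairs, predicate):
    """Filter: keep only the pairs for which predicate(key, value) is true."""
    return ((k, v) for k, v in pairs if predicate(k, v))


def reduce_sum(pairs):
    """Reducer: sum the values per key (sorting emulates the shuffle phase)."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


# Hypothetical input: user<TAB>bytes.
lines = ["a\t1\n", "b\t2\n", "a\t3\n", "c\t-5\n"]
pairs = map_records(lines, key_index=0, value_index=1)
positive = filter_records(pairs, lambda key, value: value >= 0)
totals = list(reduce_sum(positive))
```

In an actual streaming job the mapper and reducer would read from standard input and write tab-separated lines to standard output; the generator-based helpers above keep that line-oriented shape while remaining easy to test locally.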

21
Hadoop as a Service: leveraging the test bed setup
  • How do users react to economic feedback?
  • How do markets bridge cross-site trust domains?
  • Which stock market tools come into play when
    trading volume increases?
  • What effects do macroeconomic policies such as
    taxes have on usage?

22
More info
  • Community web site: http://cloudtestbed.org
  • Questions:
    global-cloud-research-tesbed-info_at_external.groups.hp.com
  • Update at ApacheCon in November
  • Thanks to: John Wilkes, HP Labs