Programming In Hadoop - PowerPoint PPT Presentation

Provided by: owenom

Transcript and Presenter's Notes

Title: Programming In Hadoop


1
Cloud Computing @ Yahoo!
Dekel Tankel, Director, Product Management,
Yahoo! Cloud Computing
dekel@yahoo-inc.com
IGT, June 2009
2
What we'll cover today
  • Why Cloud?
  • Scale and Abstraction; Quality and Agility
  • Yahoo!'s unique footprint
  • Yahoo!'s Cloud Strategy
  • Overview of the Yahoo! Cloud vision and portfolio
  • Deep dive on Horizontal and Functional Cloud
    Services
  • The Yahoo! Open Strategy
  • Marrying Yahoo!'s Open Strategy, its platforms
    and ethic with external Cloud services

3
Why Cloud? Benefits for Yahoo!
  • Higher Agility and Stability while maintaining
    Scale
  • Abstraction
  • Enable developers to focus on their
    applications, not infrastructure
  • Accelerating innovation
  • Adding new features and products at an ever
    faster rate
  • Increasing Scale and Availability
  • More robustly, more globally, more completely,
    for a given budget

Cloud is pushing up the Operational Excellence
Curve
4
Yahoo!'s Unique Cloud: Unprecedented Scale
  • Massive user base and engagement
  • 500M unique users per month
  • Hundreds of petabytes of storage
  • Hundreds of billions of objects
  • Hundreds of thousands of requests/sec
  • Global
  • Tens of globally distributed data centers
  • Serving each region at low latencies
  • Challenging Users
  • Rapidly extracting value from voluminous data
  • Downtime is not an option (outages cost
    millions)
  • Variable usage patterns

5
Yahoo! Cloud Services
ROI Innovation
Y!OS, BOSS, YQL, APT, Analytics,
Storage, Batch, Edge Serving,
6
Yahoo! Cloud Services Focus on PaaS offerings
ROI Innovation
SaaS
PaaS
IaaS
7
From Infrastructure to Shareholder Benefit
  • Horizontal Cloud
  • Focus on open source and collaborative R&D with
    industry, academia and government
  • Functional Cloud
  • Focus on developing "open strategy" frameworks,
    tools and services for developers (at Yahoo! and
    beyond)
  • Combined Together
  • Leverage our unique scale, assets and data to
    drive disruptive innovations in the market and
    expand Yahoo!'s competitive differentiation

8
Yahoo! Cloud Strategy in Action: The Front Page
Case Study
  • Horizontal Cloud: Storage and Hadoop
  • Analyze extremely large content data sets
  • Functional Cloud: Content Optimization
  • Rate content items based on various parameters
  • Applications: Yahoo!'s Front Page
  • Display highly rated items to the right users
  • Benefit consumers and advertisers and grow
    Yahoo!'s revenue

9
Yahoo! Cloud Strategy in Action: The Inquisitor
Case Study
  • Horizontal Cloud: Hadoop
  • Analyze large search-index data sets
  • Functional Cloud: BOSS
  • Expose the data in a structured, open,
    flexible and cloud-like way
  • Applications: iPhone™ Inquisitor
  • Leverage BOSS to provide an innovative consumer
    experience
  • Benefit consumers and grow Yahoo!'s revenue

10
Horizontal Cloud Services
ROI Innovation
11
Horizontal Cloud Services
  • Optimized for Yahoo!-scale
  • Yahoo!-internal focus
  • Data processing and serving environments
  • Drive faster innovation and agility
  • Shorter product development cycles
  • Reduce labor and costs for infrastructure
  • Multi-year effort
  • Strategic investment across the company

12
Horizontal Cloud Services: Conceptual View
Simple APIs
Operational Storage: Structured, unstructured
Batch Storage and Processing: Hadoop, Pig
Edge Content Services: Caching, Proxies
Online Serving: Web, Data
ID and Account Management
Security and Authentication
Metering, Billing
Monitoring and QoS
Provisioning and Virtualization (Xen)
Shared Infrastructure
Common Approaches to QA, Production
Engineering, Performance Engineering, Datacenter
Management, and Optimization
13
Horizontal Cloud Services: Use Cases
Search Index
Content Optimization
Machine Learning (e.g. Spam filters)
Ads Optimization
Attachment Storage
Image/Video Storage and Delivery
14
Yahoo! Distribution of Hadoop
  • Hadoop in a nutshell
  • Open source distributed file system and parallel
    execution environment to process massive amounts
    of data
  • Started in 2005, became top-level Apache project
    in 2008
  • Simple Design for Horizontal Scaling on commodity
    HW
  • Yahoo! Distribution of Hadoop
  • Source distribution of Yahoo!'s implementation of
    Hadoop (based entirely on code found in
    Apache Hadoop)
  • Tested and deployed at Yahoo!'s massive scale
  • Benefit the larger ecosystem, increase pace of
    innovation
  • http://developer.yahoo.com/hadoop
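
The slide describes Hadoop as a distributed file system plus a parallel execution environment. As a rough, single-process illustration of the map, shuffle and reduce phases that such an environment runs in parallel across many machines, the following Python sketch counts words. It does not use Hadoop itself, and the function names are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as a framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    """Run the three phases in order on an in-memory list of records."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    sample = ["hadoop stores data", "hadoop processes data"]
    print(word_count(sample))
```

In a real cluster each phase would be spread over many nodes, and the grouping step would move data between them; here everything runs in one process so the data flow is easy to follow.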

15
Yahoo! runs the largest Hadoop Clusters in the
World
  • 25,000 nodes
  • Clusters of up to 4,000 nodes
  • 4 Tiers of clusters
  • Development and Testing, POCs, Science and
    Research, Production
  • Terasort Benchmarks
  • 62 seconds to sort one terabyte (run on 1,500
    nodes)
  • 16.25 hours to sort one petabyte (run on 3,700
    nodes)
  • Webmap application
  • 490 TB shuffling
  • 280 TB output
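
The two Terasort figures above can be turned into throughput numbers with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes), which the slide does not specify; the helper name is hypothetical.

```python
TB = 10**12  # assumption: decimal terabyte
PB = 10**15  # assumption: decimal petabyte


def sort_throughput(total_bytes, seconds, nodes):
    """Return (aggregate GB/s, per-node MB/s) for one benchmark run."""
    aggregate = total_bytes / seconds  # bytes sorted per second, cluster-wide
    return aggregate / 1e9, aggregate / nodes / 1e6


# Figures taken from the slide: 62 s on 1,500 nodes; 16.25 h on 3,700 nodes.
terabyte_run = sort_throughput(1 * TB, 62, 1500)
petabyte_run = sort_throughput(1 * PB, 16.25 * 3600, 3700)

if __name__ == "__main__":
    for name, (gb_per_s, mb_per_s) in [("1 TB", terabyte_run),
                                       ("1 PB", petabyte_run)]:
        print(f"{name}: {gb_per_s:.2f} GB/s aggregate, "
              f"{mb_per_s:.2f} MB/s per node")
```

Under these assumptions the terabyte run works out to roughly 16.1 GB/s across the cluster (about 10.8 MB/s per node), and the petabyte run to roughly 17.1 GB/s (about 4.6 MB/s per node). These are rates of data sorted, not raw disk or network rates.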

16
Case Study - Search Assist
  • Database for Search Assist is built using
    Hadoop.
  • 3 years of log data, 20 steps of MapReduce
  • Leverage Hadoop's scalability, load balancing and
    resiliency
  • Simplified access, flexibility for rapid
    innovation (from C to Python)
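
The slide does not say how the Search Assist steps were written. One common way to express a single MapReduce step in Python is a mapper and a reducer that exchange tab-separated text lines, with a sort by key in between. The sketch below simulates one such step locally. The log format (timestamp, tab, query) and all function names are assumptions made for illustration, not details from the presentation.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Emit 'query<TAB>1' per log line; assumes the query is the 2nd field."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            query = fields[1].strip().lower()
            if query:
                yield f"{query}\t1"


def reducer(sorted_lines):
    """Sum the counts per query; input must already be sorted by key."""
    parsed = (line.split("\t") for line in sorted_lines)
    for query, group in groupby(parsed, key=itemgetter(0)):
        yield f"{query}\t{sum(int(count) for _, count in group)}"


def run_step(lines):
    """Simulate one step locally: map, sort by key, then reduce."""
    mapped = sorted(mapper(lines), key=lambda line: line.split("\t")[0])
    return list(reducer(mapped))


if __name__ == "__main__":
    # Hypothetical log lines in the assumed 'timestamp<TAB>query' format.
    logs = ["t1\tHadoop", "t2\tpig", "t3\thadoop "]
    for row in run_step(logs):
        print(row)
```

A multi-step pipeline like the 20 steps mentioned above would feed the output lines of one step into the mapper of the next.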

17
Functional Cloud Services
ROI Innovation
18
Functional Cloud Services
  • Provides functional capabilities for applications
  • Help developers to accomplish integrated web
    experiences in a faster and easier way
  • Provides common set of functional building
    blocks
  • Powered by the horizontal cloud services
  • Abstracts infrastructure services from the
    Application
  • E.g. Storage, Compute, Serving, Robustness and
    Scalability
  • Self-Served, Global, Managed, Elastic and Metered

19
Functional Cloud Services: YQL and BOSS

Yahoo! Query Language (YQL)
A single endpoint service that enables developers
to query, filter and combine data across Yahoo!
and beyond
http://developer.yahoo.com/yql/console/
Build your Own Search Service (BOSS)
Providing Yahoo! Search infrastructure and
technology to developers and companies to help
them build their own search experiences
http://developer.yahoo.com/search/boss/
20
Build your Own Search Service (BOSS)
  • Yahoo!'s open search web services platform
  • Serving hundreds of millions of users across the
    Web.
  • Goal: foster innovation in the search industry
  • Build and launch web-scale search products that
    utilize the entire Yahoo! Search index.
  • Access to Yahoo!'s investments in crawling and
    indexing, ranking and relevancy algorithms

21
Yahoo! Query Language (YQL)
  • Single endpoint service to query, filter and
    combine data across Yahoo! and beyond
  • The Internet API
  • SQL-like SELECT syntax for getting the right data
  • Quickly discover available data sources and
    structure
  • Combine data from a single web browser
  • Easy-to-use Console
  • http://developer.yahoo.com/yql/console/
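
To make the "single endpoint" and "SQL-like SELECT syntax" points concrete, the sketch below builds a request URL that carries a SELECT statement as a query parameter. The endpoint address, the `q` and `format` parameter names, and the table name are assumptions that do not appear in the slides; the code only constructs the URL and makes no network call.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: a public endpoint address, not stated in the slides.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def build_yql_url(query, fmt="json"):
    """Return a GET URL that carries the statement in the 'q' parameter."""
    return f"{YQL_ENDPOINT}?{urlencode({'q': query, 'format': fmt})}"


# Hypothetical table and filter, chosen only to show the SELECT shape.
example_query = 'select * from geo.places where text = "sunnyvale, ca"'
example_url = build_yql_url(example_query)

if __name__ == "__main__":
    print(example_url)
    # Round-trip check: the statement can be recovered from the URL.
    print(parse_qs(urlsplit(example_url).query)["q"][0])
```

The design point this illustrates is that every data source is addressed through one URL, with the statement text selecting the source and the filter.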

22
Y!OS and Cloud
23
Yahoo! Open Strategy (Y!OS) Goals
24
Y!OS and Cloud Strategy
CLOUD SERVICES
25
Open Collaborations around the globe
  • M45 - Yahoo!'s supercomputing cluster
  • 4,000 cores, 3 TB RAM, 1.5 PB disks, 27
    teraflops!
  • Operational since November 2007, 4 major
    universities
  • Focus on highly parallel computing
  • Open Cirrus with HP and Intel
  • A global, multi-data center, open source test bed
  • Target to advance cloud computing research and
    education
  • Simulates a real-life, Internet-scale environment
  • 9 Global sites, more than 50 research projects

26
Questions?
Dekel Tankel, Director, Product Management,
Yahoo! Cloud Computing
dekel@yahoo-inc.com