1
Benchmarking MapReduce-Style Parallel Computing
Randal E. Bryant, Carnegie Mellon University
http://www.cs.cmu.edu/~bryant
2
Programming with MapReduce
  • Background
  • Developed at Google for aggregating web data
  • Dean & Ghemawat, "MapReduce: Simplified Data
    Processing on Large Clusters," OSDI 2004
  • Strengths
  • Easy way to write scalable parallel programs
  • Powerful programming model
  • Beyond web search applications
  • Runtime system automatically handles many of the
    challenges of parallel programming
  • Scheduling, load balancing, fault tolerance

3
Overall Execution Model
  • General Form
  • Input
  • Large set of files
  • Compute
  • Aggregate information
  • Output
  • Files containing aggregations
  • Example: Word Count Index
  • Input
  • 10^10 cached web pages
  • Stored on cluster of 1000 machines, each with own
    local disk
  • Compute
  • Index of words with occurrence counts
  • Output
  • File containing count for each word

4
MapReduce Programming
  • Map
  • Function generating keyword/value pairs from
    input file
  • E.g., word/count for each word in document
  • Reduce
  • Function aggregating values for single keyword
  • E.g., sum word counts
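
A minimal sketch of the two user-supplied functions for the word-count example, written as plain Python for illustration; the names map_fn and reduce_fn are assumptions, not the API from the paper:

    def map_fn(filename, contents):
        # Emit a (word, 1) pair for every word in the input document.
        for word in contents.split():
            yield (word, 1)

    def reduce_fn(word, counts):
        # Sum all of the counts emitted for a single word.
        yield (word, sum(counts))

The runtime, not the programmer, is responsible for routing every pair with the same keyword to the same reduce invocation.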

5
MapReduce Implementation
  • (Somewhat naïve implementation)
  • Map
  • Spawn mapping task for each input file
  • Execute on processor local to file
  • Generate file for each keyword/value
  • Shuffle
  • Redistribute files by hashing keywords: K → Ph(K)
  • Reduce
  • Spawn reduce task for each keyword
  • On processor to which keyword hashes: Ph(K)
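
The shuffle step can be pictured as a hash partitioner that sends each keyword to one of P reduce processors; this is only a sketch of the idea with assumed names, not the actual implementation:

    import hashlib

    def partition(keyword, num_processors):
        # Route keyword K to processor h(K) mod P, so every value for
        # the same keyword ends up at the same reduce task.
        h = int(hashlib.md5(keyword.encode("utf-8")).hexdigest(), 16)
        return h % num_processors

    partition("hello", 1000)   # always maps to the same processor in [0, 999]

A stable hash such as MD5 (rather than a per-process salted hash) is used so that every machine agrees on which processor owns a given keyword.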

6
Appealing Features
  • Ease of Programming
  • Programmer provides only two functions
  • Express in terms of computation over data, not
    detailed execution on system
  • Robustness
  • Tolerant to failures of disks, processors,
    network
  • Source files stored redundantly
  • Runtime monitor detects and reexecutes failed
    tasks
  • Dynamic scheduling automatically adapts to
    resource limitations

7
Tolerating Failures
  • Dean & Ghemawat, OSDI 2004
  • Sorting 10 million 100-byte records with 1800
    processors
  • Proactively restart delayed computations to
    achieve better performance and fault tolerance
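
The proactive-restart idea (often called speculative execution) can be sketched as a scheduler loop that launches a backup copy of any task running far longer than the tasks that have already finished; the threshold and helper functions here are illustrative assumptions, not the scheduler from the paper:

    import time

    def run_with_backups(tasks, launch, is_done, slowdown=2.0):
        # launch(t) starts a copy of task t; is_done(t) reports completion.
        start = {t: time.time() for t in tasks}
        finish = {}
        backups = set()
        for t in tasks:
            launch(t)
        while len(finish) < len(tasks):
            now = time.time()
            for t in tasks:
                if t not in finish and is_done(t):
                    finish[t] = now - start[t]      # record task runtime
            if finish:
                median = sorted(finish.values())[len(finish) // 2]
                for t in tasks:
                    if (t not in finish and t not in backups
                            and now - start[t] > slowdown * median):
                        launch(t)                   # speculative backup copy
                        backups.add(t)
            time.sleep(1)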

8
Our Data-Driven World
  • Science
  • Databases from astronomy, genomics, natural
    languages, seismic modeling, ...
  • Humanities
  • Scanned books, historic documents, ...
  • Commerce
  • Corporate sales, stock market transactions,
    census, airline traffic, ...
  • Entertainment
  • Internet images, Hollywood movies, MP3 files, ...
  • Medicine
  • MRI & CT scans, patient records, ...

9
Big Data Computing Beyond Web Search
  • Application Domains
  • Rely on large, ever-changing data sets
  • Collecting & maintaining data is a major effort
  • Computational Requirements
  • Extract information from large volumes of raw
    data
  • Hypothesis
  • Can apply MapReduce-style computation to many
    other application domains
  • Give it a Try!
  • Hadoop: Open source implementation of parallel
    file system & MapReduce
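
Before standing up a cluster, the whole map-shuffle-reduce pipeline can be simulated on a single machine, which is a cheap way to test the two user functions from the earlier sketch; everything here is illustrative rather than part of Hadoop or the Google runtime:

    from collections import defaultdict

    def local_mapreduce(inputs, map_fn, reduce_fn):
        # Single-machine simulation of the execution model: run every map
        # task, group intermediate pairs by keyword (the shuffle), then
        # run one reduce task per keyword.
        groups = defaultdict(list)
        for name, contents in inputs.items():
            for key, value in map_fn(name, contents):
                groups[key].append(value)
        return dict(pair for key, values in groups.items()
                    for pair in reduce_fn(key, values))

    docs = {"a.txt": "the cat sat", "b.txt": "the dog sat"}
    # With map_fn/reduce_fn from the word-count sketch above:
    # local_mapreduce(docs, map_fn, reduce_fn) -> {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}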

10
Q1: Workload Characteristics
  • Hardware
  • 1000s of nodes
  • Each with processor(s), disk(s), network
    interface
  • High-speed, local network using commodity
    technology
  • E.g., gigabit ethernet with switches
  • Data Organization
  • Distributed file system providing uniform name
    space and redundant storage
  • Computation
  • Each task executed as separate process with file
    I/O
  • Rely on file system for data transfer

11
Q2: Hardware/Software Challenges
  • Performance Issues
  • Disk bandwidth limitations
  • ⇒ 3.6 hours to read data from 1 TB disk
  • Data transfer across network
  • Process file I/O overhead
  • Runtime Issues
  • Detecting and mitigating effects of failed
    components
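
The 3.6-hour figure follows from simple arithmetic, assuming a sustained sequential read rate of roughly 75-80 MB/s, typical for commodity disks of that era:

    terabyte = 1e12                       # bytes on the disk
    bandwidth = 77e6                      # assumed sustained read rate, bytes/s
    hours = terabyte / bandwidth / 3600   # ≈ 3.6 hours to scan the whole disk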

12
Q3: Benchmarking Challenges
  • Generalizing Results
  • Beyond specific data set & cluster configuration
  • Performance depends on many different factors
  • Can we predict how program will scale?
  • Identifying Bottlenecks
  • Many interacting parts to system
  • Evaluating Robustness
  • Creating realistic failure modes
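
One simple way to frame the scaling question is a back-of-the-envelope cost model with per-node map and reduce work plus an all-to-all shuffle; the model and constants below are assumptions for illustration, not a validated predictor:

    def predicted_runtime(data_bytes, nodes,
                          disk_bw=75e6, net_bw=100e6, startup=30.0):
        # Toy model: maps read the data from local disks in parallel, the
        # shuffle moves it across the network, reduces write the results;
        # startup is a fixed per-job scheduling overhead in seconds.
        map_time = data_bytes / (nodes * disk_bw)
        shuffle_time = data_bytes / (nodes * net_bw)
        reduce_time = data_bytes / (nodes * disk_bw)
        return startup + map_time + shuffle_time + reduce_time

    predicted_runtime(1e12, 1000)   # seconds for 1 TB on 1,000 nodes

Even a crude model like this makes explicit which factor (disk, network, or fixed startup overhead) dominates as the cluster grows, which is where benchmarking effort should concentrate.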

13
Q4: University Contributions
  • Currently, industry is ahead of universities
  • Dealing with massive data sets
  • Computing at very large scale
  • Developing new programming/runtime approaches
  • Google, Yahoo!, Microsoft
  • University Role
  • More open and systematic inquiry
  • Apply to noncommercial problems
  • Extend and improve programming model and
    notations
  • Expose students to emerging styles of computing

14
Background Information
  • Data-Intensive Supercomputing: The case for
    DISC
  • Tech Report CMU-CS-07-128
  • Available from http://www.cs.cmu.edu/~bryant