Title: Cloud Computing Systems
1. Cloud Computing Systems
Hadoop, HDFS, and Microsoft Cloud Computing Technologies
Hong Kong University of Science and Technology
Oct. 3, 2011
2. The Microsoft Cloud
Application Services
Software Services
Platform Services
Infrastructure Services
3. Application Patterns
[Architecture diagram: a grid / parallel computing application. Users reach it through clients (web browser, mobile browser, Silverlight application, WPF application). The private cloud holds the enterprise application, enterprise web service, enterprise data, and enterprise identity. Public services provide the application service, data service, table / blob / queue storage services, storage service, identity service, service bus, access control service, and workflow service, holding user data, application data, and reference data.]
4. Hadoop History
- Started in 2005 by Doug Cutting
- Yahoo! became the primary contributor in 2006
- Scaled it to 4000 node clusters in 2009
- Yahoo! deployed large-scale science clusters in 2007
- Many users today:
- Amazon/A9
- Facebook
- Google
- IBM
- Joost
- Last.fm
- New York Times
- PowerSet
- Veoh
5. Hadoop at Facebook
- Production cluster: 8000 cores, 1000 machines, 32 GB RAM per machine (July 2009)
- 4 SATA disks of 1 TB each per machine
- 2-level network hierarchy, 40 machines per rack
- Total cluster size is 2 PB (projected to be 12 PB in Q3 2009)
- Another test cluster has 800 cores, 16 GB RAM each
- Source: Dhruba Borthakur
6. Hadoop Motivation
- Need a general infrastructure for fault-tolerant, data-parallel distributed processing
- Open-source MapReduce
- Apache License
- Workloads are expected to be IO-bound, not CPU-bound
7. First, a File System Is Needed: HDFS
- Very large distributed file system running on commodity hardware
- Replicated
- Detects failures and recovers from them
- Optimized for batch processing
- High aggregate bandwidth, locality-aware
- User-space FS; runs on heterogeneous OSes
8. HDFS
[Diagram, read path: (1) the client sends a filename to the NameNode; (2) the NameNode returns block IDs and DataNode locations; (3) the client reads data directly from DataNodes in the cluster.]
- NameNode: manages metadata
- DataNode: manages file data; maps a block ID to a physical location on disk
- Secondary NameNode: fault tolerance; periodically merges the transaction log
9. HDFS
- Provides a single namespace for the entire cluster
- Files, directories, and their hierarchy
- Files are broken up into large blocks
- Typically 128 MB block size
- Each block is replicated on multiple DataNodes
- Metadata in memory
- Metadata: names of files (including directories), the list of blocks for each file, the list of DataNodes for each block, and file attributes (e.g., creation time, replication factor)
- High performance (high throughput, low latency)
- A transaction log records file creations, file deletions, etc.
- Data coherency emphasizes the append operation
- Client can
- find the location of blocks
- access data directly from a DataNode
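As a rough illustration of the block-splitting and replication idea above, the following Java sketch splits a file into 128 MB blocks and assigns each block to several DataNodes. This is a toy model with made-up names, not HDFS's actual (rack-aware) placement policy.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of HDFS block placement: a file is split into fixed-size
// blocks, and each block is assigned to `replication` distinct DataNodes.
public class BlockPlacement {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // 128 MB

    // Returns, for each block of the file, the list of DataNode ids,
    // chosen round-robin here (real HDFS is rack-aware; this sketch is not).
    static List<List<Integer>> place(long fileSize, int dataNodes, int replication) {
        int numBlocks = (int) ((fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE);
        List<List<Integer>> placement = new ArrayList<>();
        for (int b = 0; b < numBlocks; b++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++)
                replicas.add((b + r) % dataNodes); // distinct nodes when replication <= dataNodes
            placement.add(replicas);
        }
        return placement;
    }

    public static void main(String[] args) {
        // A 300 MB file on a 5-node cluster with 3-way replication
        List<List<Integer>> p = place(300L * 1024 * 1024, 5, 3);
        System.out.println(p.size() + " blocks: " + p);
        // prints: 3 blocks: [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
    }
}
```

Losing one DataNode in this model leaves two live replicas of every block, which is why the NameNode can re-replicate in the background rather than fail the read.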
11. Hadoop Design
- Hadoop Core
- Distributed file system: distributes data
- Map/Reduce: distributes logic (processing)
- Written in Java
- Runs on Linux, Mac OS X, Windows, and Solaris
- Fault tolerance
- In a large cluster, failure is the norm
- Hadoop re-executes failed tasks
- Locality
- Map and Reduce tasks in Hadoop query HDFS for the locations of data
- Map tasks are scheduled close to their inputs when possible
12. Hadoop Ecosystem
- Hadoop Core
- Distributed File System
- MapReduce Framework
- Pig (initiated by Yahoo!)
- Parallel Programming Language and Runtime
- HBase (initiated by Powerset)
- Table storage for semi-structured data
- Zookeeper (initiated by Yahoo!)
- Coordinating distributed systems
- Storm
- Hive (initiated by Facebook)
- SQL-like query language and storage
13. Word Count Example
- Read text files and count how often words occur
- The input is a collection of text files
- The output is a text file
- each line: word, tab, count
- Map: produce pairs of (word, count)
- Reduce: for each word, sum up the counts
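The Map and Reduce phases just described can be imitated in plain Java, with no Hadoop machinery, to show the (word, count) pair flow. Class and method names here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory imitation of word count's Map and Reduce phases.
public class WordCountModel {
    // Map phase: emit one (word, 1) pair per token of each input line.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines)
            for (String word : line.split("\\s+"))
                if (!word.isEmpty())
                    pairs.add(Map.entry(word, 1));
        return pairs;
    }

    // Shuffle + Reduce phase: group the pairs by word and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs)
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("to be or not", "to be");
        Map<String, Integer> counts = reduce(map(docs));
        System.out.println(counts.get("to") + " " + counts.get("be") + " " + counts.get("or"));
        // prints: 2 2 1
    }
}
```

In real Hadoop the grouping step is the framework's shuffle; here a single HashMap stands in for it.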
14. WordCount Overview
public class WordCount {
14   public static class Map extends MapReduceBase implements Mapper ... {
17
18     public void map ...
26   }
27
28   public static class Reduce extends MapReduceBase implements Reducer ... {
29
30     public void reduce ...
37   }
38
39   public static void main(String[] args) throws Exception {
40     JobConf conf = new JobConf(WordCount.class);
41     ...
53     FileInputFormat.setInputPaths(conf, new Path(args[0]));
54     FileOutputFormat.setOutputPath(conf, new Path(args[1]));
55
56     JobClient.runJob(conf);
57   }
}
15. WordCount Mapper
14 public static class Map extends MapReduceBase
     implements Mapper<LongWritable, Text, Text, IntWritable> {
15   private final static IntWritable one = new IntWritable(1);
16   private Text word = new Text();
17
18   public void map(LongWritable key, Text value,
                     OutputCollector<Text, IntWritable> output,
                     Reporter reporter) throws IOException {
19     String line = value.toString();
20     StringTokenizer tokenizer = new StringTokenizer(line);
21     while (tokenizer.hasMoreTokens()) {
22       word.set(tokenizer.nextToken());
23       output.collect(word, one);
24     }
25   }
26 }
16. WordCount Reducer
28 public static class Reduce extends MapReduceBase
     implements Reducer<Text, IntWritable, Text, IntWritable> {
29
30   public void reduce(Text key, Iterator<IntWritable> values,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
31     int sum = 0;
32     while (values.hasNext()) {
33       sum += values.next().get();
34     }
35     output.collect(key, new IntWritable(sum));
36   }
37 }
17. Invocation of WordCount
- /usr/local/bin/hadoop dfs -mkdir <hdfs-dir>
- /usr/local/bin/hadoop dfs -copyFromLocal <local-dir> <hdfs-dir>
- /usr/local/bin/hadoop jar hadoop-*-examples.jar wordcount -m <maps> -r <reducers> <in-dir> <out-dir>
18. Example Hadoop Application: Search Assist
- The database for Search Assist is built using Hadoop
- 3 years of log data, 20 steps of MapReduce

                  Before Hadoop   After Hadoop
Time              26 days         20 minutes
Language          C++             Python
Development Time  2-3 weeks       2-3 days
19. Large Hadoop Jobs
                    2008                                  2009
Webmap              70 hours runtime, 300 TB shuffling,   73 hours runtime, 490 TB shuffling,
                    200 TB output, 1480 nodes             280 TB output, 2500 nodes
Sort benchmarks     1 TB sorted in 209 seconds,           1 TB sorted in 62 seconds, 1500 nodes;
(Jim Gray contest)  900 nodes                             1 PB sorted in 16.25 hours, 3700 nodes
Largest cluster     2000 nodes, 6 PB raw disk,            4000 nodes, 16 PB raw disk,
                    16 TB of RAM, 16K CPUs                64 TB of RAM, 32K CPUs (40% faster CPUs too)
Source: Eric Baldeschwieler, Yahoo!
20. Data Warehousing at Facebook
[Diagram, data pipeline: web servers → Scribe servers → network storage → Hadoop cluster → Oracle RAC / MySQL]
- 15 TB of uncompressed data ingested per day
- 55 TB of compressed data scanned per day
- 3200 jobs on the production cluster per day
- 80M compute minutes per day
- Source: Dhruba Borthakur
21. But all these are data analytics applications.
Can the approach extend to general computation?
How do we construct a simple, generic, and automatic parallelization engine for the cloud?
Let's look at an example...
22. The Tomasulo Algorithm
- Designed initially for the IBM 360/91
- Out-of-order execution
- Its descendants include the Alpha 21264, HP 8000, MIPS 10000, Pentium III, PowerPC 604, ...
23. Three Stages of the Tomasulo Algorithm
- 1. Issue: get an instruction from a queue
- Record the instruction's information in the processor's internal control, and rename registers
- 2. Execute: operate on operands (EX)
- When all operands are ready, execute; otherwise, watch the Common Data Bus (CDB) for the result
- 3. Write result: finish execution (WB)
- Write the result to the CDB. All awaiting units receive the result.
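The three stages above can be sketched as a toy Java simulation: each "reservation station" holds an operation whose operands are named by tags, results are broadcast on a map standing in for the CDB, and a station executes as soon as both operands have been captured. All names are hypothetical; this is not any real processor's implementation.

```java
import java.util.*;

// Toy sketch of Tomasulo-style data-driven execution.
public class TomasuloSketch {
    static class Station {
        final String tag, op1, op2;  // result tag and operand tags
        Integer v1, v2;              // operand values captured from the CDB
        Station(String tag, String op1, String op2) {
            this.tag = tag; this.op1 = op1; this.op2 = op2;
        }
    }

    // Runs all stations to completion; cdb initially holds register values.
    // Returns the order in which results were written to the CDB.
    static List<String> run(List<Station> stations, Map<String, Integer> cdb) {
        List<String> order = new ArrayList<>();
        List<Station> waiting = new ArrayList<>(stations);
        while (!waiting.isEmpty()) {
            for (Station s : waiting) {              // watch the CDB for operands
                if (s.v1 == null) s.v1 = cdb.get(s.op1);
                if (s.v2 == null) s.v2 = cdb.get(s.op2);
            }
            Iterator<Station> it = waiting.iterator();
            while (it.hasNext()) {
                Station s = it.next();
                if (s.v1 != null && s.v2 != null) {  // all operands ready: execute
                    cdb.put(s.tag, s.v1 + s.v2);     // broadcast the result
                    order.add(s.tag);
                    it.remove();
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // r3 = r1 + r2; r4 = r3 + r1  (the second add must wait for r3)
        Map<String, Integer> cdb = new HashMap<>(Map.of("r1", 5, "r2", 7));
        List<String> order = run(List.of(
            new Station("r3", "r1", "r2"), new Station("r4", "r3", "r1")), cdb);
        System.out.println(order + " r4=" + cdb.get("r4"));
        // prints: [r3, r4] r4=17
    }
}
```

Note how the second station issues immediately but executes only after r3 appears on the bus, which is exactly the "data drives logic" behavior the next slides describe.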
24. Tomasulo Organization
25. How Does Tomasulo Exploit Parallelism?
- Naming and renaming
- Keep track of data dependences and resolve conflicts by renaming registers
- Reservation stations
- Record instructions' control information and the values of operands. Data has versions.
- In Tomasulo, data drives logic: when data is ready, execute!
26. Dryad
- Distributed/parallel execution
- Improves throughput, not latency
- Automatic management of scheduling, distribution, fault tolerance, and parallelization!
- Computations are expressed as a DAG
- Directed Acyclic Graph: vertices are computations, edges are communication channels
- Each vertex has several input and output edges
27. Why Use a Dataflow Graph?
- A general abstraction of computation
- The programmer may not have to know how to construct the graph
- SQL-like queries: LINQ
Can all computation be represented by a finite graph?
28. Yet Another WordCount, in Dryad
[Diagram: a dataflow graph of Count (word, n), Distribute (word, n), MergeSort (word, n), and Count (word, n) vertices.]
29. Organization
30. Job as a DAG (Directed Acyclic Graph)
[Diagram: inputs flow through processing vertices connected by channels (file, pipe, shared memory) to outputs.]
31. Scheduling at the JM (Job Manager)
- A vertex can run on any computer once all its inputs are ready
- Prefers executing a vertex near its inputs (locality)
- Fault tolerance
- If a task fails, run it again
- If a task's inputs are gone, run upstream vertices again (recursively)
- If a task is slow, run another copy elsewhere and use the output from the faster computation
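The "if a task fails, run it again" rule above can be sketched as a simple retry helper in Java. The names are made up; re-execution is safe only because the tasks are deterministic.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Sketch of the job manager's retry rule for a failed vertex/task.
public class RetryTask {
    // Runs the task, retrying up to maxAttempts times on failure.
    static <T> T runWithRetry(Supplier<T> task, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try { return task.get(); }
            catch (RuntimeException e) { last = e; } // log and reschedule elsewhere
        }
        throw last; // if inputs are gone, upstream vertices would be re-run instead
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // A task that fails twice (e.g., its machine died) and then succeeds.
        String result = runWithRetry(() -> {
            if (calls.incrementAndGet() < 3) throw new RuntimeException("lost output");
            return "partition-0 done";
        }, 5);
        System.out.println(result + " after " + calls.get() + " attempts");
        // prints: partition-0 done after 3 attempts
    }
}
```

The recursive case on the slide (re-running upstream vertices when a task's inputs are lost) would wrap this same helper around each producer in the DAG.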
32. Distributed Data-Parallel Computing
- Research problem: how to write distributed data-parallel programs for a compute cluster?
- The DryadLINQ programming model
- Sequential, single-machine programming abstraction
- Same program runs on single-core, multi-core, or cluster
- Familiar programming languages
- Familiar development environment
33. LINQ
- LINQ: a language for relational queries
- Language INtegrated Query
- More general than distributed SQL
- Inherits the flexible C# type system and libraries
- Available in Visual Studio products
- A set of operators to manipulate datasets in .NET
- Supports traditional relational operators
- Select, Join, GroupBy, Aggregate, etc.
- Integrated into .NET
- Programs can call operators
- Operators can invoke arbitrary .NET functions
- Data model
- Data elements are strongly typed .NET objects
- More expressive than SQL tables
Is SQL Turing-complete? Is LINQ?
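For readers more familiar with Java than C#, LINQ's Where/GroupBy/Aggregate operators have close analogues in Java streams. The sketch below mirrors a "select word, count(*) ... group by word" query; it is illustrative only, since LINQ itself is a C#/.NET feature.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Relational operators embedded in the host language, stream-style.
public class LinqStyle {
    static Map<String, Long> wordCounts(List<String> words) {
        return words.stream()
            .filter(w -> !w.isEmpty())             // Where
            .collect(Collectors.groupingBy(        // GroupBy + Aggregate
                w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts =
            wordCounts(List.of("to", "be", "or", "not", "to", "be"));
        System.out.println(counts.get("to") + " " + counts.get("or"));
        // prints: 2 1
    }
}
```

As in LINQ, the operators compose lazily over strongly typed elements, and the lambdas can call arbitrary host-language functions, which is exactly what makes the model more general than SQL.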
34. LINQ + Dryad = DryadLINQ
Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key);

var results = from c in collection
              where IsLegal(c.key)
              select new { Hash(c.key), c.value };

[Diagram: the query compiles into vertex code and a query plan (a Dryad job); the data collection's partitions each run the vertex code to produce the partitioned results.]
35. DryadLINQ System Architecture
[Diagram: on the client, a .NET program invokes a query expression over input tables; DryadLINQ compiles it into a distributed query plan and vertex code and hands the plan to Dryad for execution on the cluster. ToTable materializes the output table, and foreach fetches results back from the output tables into .NET objects.]
36. Yet Yet Another Word Count
Count word frequency in a set of documents:

var docs = ...;  // a collection of documents
var words = docs.SelectMany(doc => doc.words);
var groups = words.GroupBy(word => word);
var counts = groups.Select(g => new WordCount(g.Key, g.Count()));
37. Word Count in DryadLINQ
Count word frequency in a set of documents:

var docs = DryadLinq.GetTable<Doc>("file://docs.txt");
var words = docs.SelectMany(doc => doc.words);
var groups = words.GroupBy(word => word);
var counts = groups.Select(g => new WordCount(g.Key, g.Count()));
counts.ToDryadTable("counts.txt");
38. Distributed Execution of Word Count
[Diagram: DryadLINQ translates the LINQ expression into a Dryad execution graph: IN → SM (SelectMany) → GB (GroupBy) → S (Select) → OUT]
39. DryadLINQ Design
- An optimizing compiler generates the distributed execution plan
- Static optimizations: pipelining, eager aggregation, etc.
- Dynamic optimizations: data-dependent partitioning, dynamic aggregation, etc.
- Automatic code generation and distribution by DryadLINQ and Dryad
- Generates the vertex code that runs on vertices, channel serialization code, and callback code for runtime optimizations
- Automatically distributed to cluster machines
40. Summary
- The DAG dataflow graph is a powerful computation model
- Language integration enables programmers to easily use DAG-based computation
- Decoupling of Dryad and DryadLINQ
- Dryad: execution engine (given a DAG, schedules tasks and handles fault tolerance)
- DryadLINQ: programming language and tools (given a query, generates the DAG)
41. Development
- Works with any LINQ-enabled language
- C#, VB, F#, IronPython, ...
- Works with multiple storage systems
- NTFS, SQL, Windows Azure, Cosmos DFS
- Released within Microsoft and used on a variety of applications
- External academic release announced at PDC
- DryadLINQ in source, Dryad in binary
- UW, UCSD, Indiana, ETH, Cambridge, ...
42. Advantages of DAG over MapReduce
- Dependence is naturally specified
- MapReduce: a complex job runs >1 MR stages
- Tasking overhead
- Reduce tasks of each stage write to replicated storage
- Dryad: each job is represented with a DAG
- intermediate vertices write to local files
- Dryad provides a more flexible and general framework
- E.g., multiple types of input/output
43. DryadLINQ in the Software Stack
[Diagram, layered stack: applications (image processing, machine learning, graph analysis, data mining, other applications) sit on DryadLINQ and other languages; Dryad runs over storage (Cosmos DFS, SQL Servers, Azure Platform, CIFS/NTFS) and cluster services, all on Windows Server machines.]