Title: Parallel and Distributed Programming Models and Languages
1. Parallel and Distributed Programming Models and Languages
- 15-740/18-740 Computer Architecture
- In-Class Discussion
- Dong Zhou
- Kun Li
- Mike Ralph
2. Why distributed computations?
- Buzzword: "Big Data"
- Take sorting as an example
- Amount of data that can be sorted in 60 seconds
- One computer can read 60 MB/sec from one disk, i.e. only ~3.6 GB in 60 seconds
- 2012 world record
- Flat Datacenter Storage by Ed Nightingale et al.
- 1470 GB
- 256 heterogeneous nodes, 1033 disks
- Google indexes 100 billion web pages
3. Solution: use many nodes
- Grid computing
- Hundreds of supercomputers connected by high-speed networks
- Cluster computing
- Thousands or tens of thousands of PCs connected by high-speed LANs
- 1000 nodes potentially give 1000x speedup
4. Distributed computations are difficult to program
- Sending data to/from nodes
- Coordinating among nodes
- Recovering from node failure
- Optimizing for locality
- Debugging
5. MapReduce
- A programming model for large-scale computations
- Process large amounts of input, produce output
- No side-effects or persistent state
- MapReduce is implemented as a runtime library
- Automatic parallelization
- Load balancing
- Locality optimization
- Handling of machine failures
6. MapReduce design
- Input data is partitioned into M splits
- Map: extract information from each split
- Each map produces R partitions
- Shuffle and sort
- Bring the M pieces of each partition to the same reducer
- Reduce: aggregate, summarize, filter, or transform
- Output is in R result files
7. More specifically
- Programmer specifies two methods
- map(k, v) → <k', v'>
- reduce(k', <v'>) → <k'', v''>
- All v' with same k' are reduced together
- Usually also specify
- partition(k', total partitions) → partition for k'
- often a simple hash of the key
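A minimal Python sketch of such a partition function, assuming string keys (crc32 is chosen here only because it is deterministic across runs; real implementations use their own hash):

```python
import zlib

def partition(key: str, total_partitions: int) -> int:
    # Hash the intermediate key k' and take it modulo R, so every
    # occurrence of the same key lands in the same reduce partition.
    return zlib.crc32(key.encode("utf-8")) % total_partitions
```

Because the mapping depends only on the key, all values for a given k' are guaranteed to reach the same reducer.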
8. Runtime
9. MapReduce is widely applicable
- Distributed grep
- Distributed clustering
- Web link graph reversal
- Detecting approx. duplicate web pages
10. Dryad
- Similar goals as MapReduce
- Focus on throughput, not latency
- Automatic management of scheduling, distribution, and fault tolerance
- Computations expressed as a graph
- Vertices are computations
- Edges are communication channels
- Each vertex has several input and output edges
11. Why use a dataflow graph?
- Many programs can be represented as a distributed dataflow graph
- The programmer may not have to know this
- SQL-like queries (LINQ)
- Dryad will run them for you
12. Runtime
- Vertices (V) run arbitrary app code
- Vertices exchange data through
- files, TCP pipes, etc.
- Vertices report status to the Job Manager (JM)
- Daemon process (D)
- executes vertices
- Job Manager (JM) consults the name server (NS)
- to discover available machines
- JM maintains the job graph and schedules vertices
13. Job Directed Acyclic Graph
[Figure: job DAG with inputs feeding processing vertices, connected by channels (file, pipe, shared memory), producing outputs]
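A minimal single-process sketch (not Dryad's actual runtime) of what executing such a job DAG means: vertices run arbitrary code, edges are the channels between them, and a vertex runs once all of its inputs are available.

```python
from collections import defaultdict, deque

def run_dag(vertices, edges, source_data):
    """vertices: {name: fn(list_of_input_values) -> value} (None for sources)
    edges: list of (src, dst) channels
    source_data: {name: value} for the input vertices"""
    preds, succs, indeg = defaultdict(list), defaultdict(list), defaultdict(int)
    for src, dst in edges:
        preds[dst].append(src)
        succs[src].append(dst)
        indeg[dst] += 1
    values = dict(source_data)
    ready = deque(v for v in vertices if indeg[v] == 0)  # topological order
    while ready:
        v = ready.popleft()
        if v not in values:  # source vertices already carry their data
            values[v] = vertices[v]([values[p] for p in preds[v]])
        for s in succs[v]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return values

# Tiny job: source "a" fans out to "b" and "c", which join at "d".
verts = {"a": None,
         "b": lambda xs: xs[0] * 2,
         "c": lambda xs: xs[0] + 1,
         "d": lambda xs: sum(xs)}
result = run_dag(verts, [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")],
                 {"a": 3})
# result["d"] == 10  (3*2 + 3+1)
```

In the real system the Job Manager does this scheduling across machines, and the channels carry data between processes rather than in-memory values.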
14. Advantages of DAG over MapReduce
- Big jobs are more efficient with Dryad
- MapReduce: big jobs run as > 1 MR stages
- Reducers of each stage write to replicated storage
- Output of reduce: 2 network copies, 3 disks
- Dryad: each job is represented with a DAG
- Intermediate vertices write to local files
15. Pig Latin
- High-level procedural abstraction of MapReduce
- Contains SQL-like primitives
- Example
- good_urls = FILTER urls BY pagerank > 0.2;
- groups = GROUP good_urls BY category;
- big_groups = FILTER groups BY COUNT(good_urls) > 10^6;
- output = FOREACH big_groups GENERATE category, AVG(good_urls.pagerank);
- Plus user-defined functions (UDFs)
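A rough single-process Python equivalent of that dataflow, using hypothetical (url, category, pagerank) tuples and a toy group-size threshold of 1 instead of 10^6; in Pig, each of these steps compiles into MapReduce jobs:

```python
from collections import defaultdict

urls = [("a.com", "news", 0.5), ("b.com", "news", 0.3),
        ("c.com", "sports", 0.1)]

# FILTER urls BY pagerank > 0.2
good_urls = [u for u in urls if u[2] > 0.2]

# GROUP good_urls BY category
groups = defaultdict(list)
for u in good_urls:
    groups[u[1]].append(u)

# FILTER groups BY COUNT(good_urls) > ...  (toy threshold of 1)
big_groups = {c: us for c, us in groups.items() if len(us) > 1}

# FOREACH big_groups GENERATE category, AVG(good_urls.pagerank)
output = {c: sum(u[2] for u in us) / len(us) for c, us in big_groups.items()}
# output maps "news" to the average pagerank ~0.4
```

The point of Pig Latin is that the programmer writes the four relational steps and never sees the map and reduce functions they compile to.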
16. Value
- Reduces development time
- Procedural vs. declarative
- Overhead/performance costs worthwhile?
17. Green-Marl
- High-level graph analysis language/compiler
- Uses basic data types and graph primitives
- Built-in graph functions
- BFS, RBFS, DFS
- Uses domain-specific optimizations
- Both non-architecture- and architecture-specific
- Compiler translates Green-Marl to other high-level languages (e.g. C)
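Green-Marl builds traversals such as BFS directly into the language; to make the built-in concrete, here is a plain Python sketch of what a BFS primitive computes (the hop distance of every node from a root), not Green-Marl syntax:

```python
from collections import deque

def bfs_levels(adj, root):
    """adj: {node: [neighbors]}; returns {node: hops_from_root}."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in level:  # visit each node exactly once
                level[v] = level[u] + 1
                queue.append(v)
    return level

# bfs_levels({0: [1, 2], 1: [3], 2: [3]}, 0) == {0: 0, 1: 1, 2: 1, 3: 2}
```

Because the traversal is a language primitive rather than user code, the Green-Marl compiler is free to parallelize it and apply the domain-specific optimizations listed above.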
18. Tradeoffs
- Achieves speedup over hand-tuned parallel equivalents
- Tested only on a single workstation
- Only works with graph representations
- Difficulty representing certain data sets and computations
- Domain-specific vs. general-purpose languages
- Future work for more architectures, user-defined data structures
19. Questions and Discussion
20. Example: count word frequencies in web pages
- Input is files with one doc per record
- Map parses document into words
- key: document URL
- value: document contents
- Input record: ("doc1", "to be or not to be")
- Output of map: ("to", "1"), ("be", "1"), ("or", "1"), ("not", "1"), ("to", "1"), ("be", "1")
21. Example: count word frequencies in web pages
- Reduce computes the sum for each key
- Input of reduce (grouped by key) → output:
- key "be": values "1", "1" → "2"
- key "not": values "1" → "1"
- key "or": values "1" → "1"
- key "to": values "1", "1" → "2"
- Output of reduce saved: ("to", "2"), ("be", "2"), ("or", "1"), ("not", "1")
22. Example: Pseudo-code
- Map(String input_key, String input_value)
- // input_key: document name
- // input_value: document contents
- for each word w in input_value:
- EmitIntermediate(w, "1")
- Reduce(String key, Iterator intermediate_values)
- // key: a word, same for input and output
- // intermediate_values: a list of counts
- int result = 0
- for each v in intermediate_values:
- result += ParseInt(v)
- Emit(AsString(result))
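The pseudocode above can be made runnable; a small Python sketch that simulates the three phases (map, shuffle/group-by-key, reduce) in a single process:

```python
from collections import defaultdict

def map_fn(input_key, input_value):
    # input_key: document name; input_value: document contents
    return [(w, 1) for w in input_value.split()]

def reduce_fn(key, intermediate_values):
    # key: a word; intermediate_values: the list of counts for that word
    return sum(intermediate_values)

def mapreduce(docs):
    grouped = defaultdict(list)
    for k, v in docs.items():
        for k2, v2 in map_fn(k, v):   # map phase
            grouped[k2].append(v2)    # shuffle: group values by intermediate key
    return {k: reduce_fn(k, vs) for k, vs in grouped.items()}  # reduce phase

counts = mapreduce({"doc1": "to be or not to be"})
# counts == {"to": 2, "be": 2, "or": 1, "not": 1}
```

The real runtime runs map and reduce on many machines and writes the R result files to disk, but the data movement is exactly this group-by-key.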