A Map-Reduce System with an Alternate API for Multi-Core Environments

1
A Map-Reduce System with an Alternate API for
Multi-Core Environments
  • Presented by Wei Jiang

May 17, 2014
2
Outline
  • Background
  • MapReduce
  • Generalized Reduction
  • System Design and Implementation
  • Experiments
  • Related Work
  • Conclusions

3
Background
  • We have evaluated FREERIDE and Hadoop MapReduce
    on a set of applications
  • Phoenix is an implementation of MapReduce for
    shared-memory systems; it is written in C and has
    a small code base
  • We also want to make FREERIDE smaller and more
    lightweight

4
Google's MapReduce Engine
5
Phoenix implementation
  • It is based on the same principles as Google's
    MapReduce but targets shared-memory systems
  • It consists of a simple API that is visible to
    application programmers
  • It provides an efficient runtime that handles
    parallelization, resource management, and fault
    recovery

6
Phoenix runtime
7
Generalized Reduction
  • Processing structures
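To make the idea concrete, here is a minimal sketch of generalized reduction in C. It is our own illustration, not code from MATE or FREERIDE: the type `reduction_object` and the functions `process_element` and `generalized_reduction` are hypothetical, and a bucket histogram stands in for a real application. Each input element is folded directly into the reduction object, so no intermediate (key, value) pairs are emitted, sorted, or grouped as they would be between Map and Reduce.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration of generalized reduction: each element is
 * processed and immediately folded into a reduction object, so no
 * intermediate (key, value) pairs are materialized or grouped. */
#define NUM_BUCKETS 4

typedef struct {
    long counts[NUM_BUCKETS];   /* the reduction object: one slot per key */
} reduction_object;

/* Process one element: compute its key and update the object in place.
 * The update (+=) is associative and commutative, which is what allows
 * per-thread copies to be combined later in any order. */
static void process_element(reduction_object *obj, int value)
{
    int key = value % NUM_BUCKETS;   /* assumes value >= 0 */
    obj->counts[key] += 1;
}

/* Sequential driver: one pass over the input, one update per element. */
static void generalized_reduction(const int *data, size_t n,
                                  reduction_object *obj)
{
    memset(obj, 0, sizeof *obj);
    for (size_t i = 0; i < n; i++)
        process_element(obj, data[i]);
}
```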

8
A Case Study: Apriori
9
A Case Study: Apriori
10
System Design and Implementation
  • Basic dataflow of MATE (MapReduce with AlternaTE
    API)
  • Data structures to communicate between the user
    code and the runtime
  • Three sets of functions in MATE
  • An example of how to write a user application

11
MATE runtime dataflow
  • Basic one-stage dataflow
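A rough sketch of a one-stage dataflow, assuming POSIX threads: the input is split, each thread reduces its split into a private copy of the reduction object, and a combination step merges the copies. All names (`worker_arg`, `reduction_worker`, `run_one_stage`) are hypothetical, and the split is static for brevity, whereas the MATE runtime is described as assigning splits dynamically.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical one-stage dataflow: split -> reduction into per-thread
 * clones -> combination. Not MATE code. */
#define NUM_THREADS 4
#define NUM_BUCKETS 4

typedef struct { long counts[NUM_BUCKETS]; } reduction_object;

typedef struct {
    const int *data;          /* this thread's split */
    size_t n;                 /* number of elements in the split */
    reduction_object clone;   /* private copy, so no locking while reducing */
} worker_arg;

/* Reduction phase: fold every element of the split into the clone. */
static void *reduction_worker(void *p)
{
    worker_arg *w = p;
    for (size_t i = 0; i < w->n; i++)
        w->clone.counts[w->data[i] % NUM_BUCKETS] += 1; /* assumes data >= 0 */
    return NULL;
}

/* Runs all phases; returns 0 on success, -1 if a thread could not start
 * (in which case *out holds only the splits that were processed). */
static int run_one_stage(const int *data, size_t n, reduction_object *out)
{
    pthread_t tids[NUM_THREADS];
    worker_arg args[NUM_THREADS];
    size_t chunk = (n + NUM_THREADS - 1) / NUM_THREADS;
    int created = 0;

    /* Split phase (static, equal-sized chunks) plus reduction phase. */
    for (int t = 0; t < NUM_THREADS; t++) {
        size_t begin = (size_t)t * chunk < n ? (size_t)t * chunk : n;
        size_t end = begin + chunk < n ? begin + chunk : n;
        args[t] = (worker_arg){ .data = data + begin, .n = end - begin };
        if (pthread_create(&tids[t], NULL, reduction_worker, &args[t]) != 0)
            break;
        created++;
    }

    /* Combination phase: wait for each thread, then merge its clone. */
    *out = (reduction_object){ {0} };
    for (int t = 0; t < created; t++) {
        pthread_join(tids[t], NULL);
        for (int b = 0; b < NUM_BUCKETS; b++)
            out->counts[b] += args[t].clone.counts[b];
    }
    return created == NUM_THREADS ? 0 : -1;
}
```

Because each thread owns its copy, the reduction phase needs no locks; the cost is one object per thread plus the final merge.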

12
Data structures-(1)
  • scheduler_args_t: basic fields

Field          Description
Input_data     Input data pointer
Data_size      Input dataset size
Data_type      Input data type
Stage_num      Computation stage number
Splitter       Pointer to the splitter function
Reduction      Pointer to the reduction function
Finalize       Pointer to the finalize function
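One possible C layout for the basic fields, as a sketch only: the field names follow the table (lower-cased as C identifiers), while every type, the contents of `reduction_args_t`, the placeholder `example_reduction`, and the helper `scheduler_args_valid` are assumptions rather than MATE's actual definitions. The function-pointer types mirror the user-defined signatures listed under Functions-(3).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-split argument block (layout assumed). */
typedef struct {
    void  *data;     /* start of the split */
    size_t length;   /* number of units in the split */
} reduction_args_t;

typedef int  (*splitter_t)(void *, int, reduction_args_t *);
typedef void (*reduction_t)(reduction_args_t *);
typedef void (*finalize_t)(void *);

/* Hypothetical layout of the basic scheduler_args_t fields. */
typedef struct {
    void       *input_data;  /* input data pointer */
    size_t      data_size;   /* input dataset size (bytes assumed) */
    int         data_type;   /* input data type (encoding assumed) */
    int         stage_num;   /* computation stage number */
    splitter_t  splitter;    /* optional (O) */
    reduction_t reduction;   /* required (R) */
    finalize_t  finalize;    /* optional (O) */
} scheduler_args_t;

/* Placeholder user reduction so the sketch can be exercised. */
static void example_reduction(reduction_args_t *args) { (void)args; }

/* Hypothetical pre-scheduling check: input and a reduction function
 * are the only hard requirements in this sketch. */
static int scheduler_args_valid(const scheduler_args_t *a)
{
    return a != NULL && a->input_data != NULL && a->data_size > 0
        && a->reduction != NULL;
}
```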
13
Data structures-(2)
  • scheduler_args_t: optional fields for performance
    tuning

Field                   Description
Unit_size               Number of bytes for one element
L1_cache_size           Size of the L1 data cache, in bytes
Model                   Shared-memory parallelization model
Num_reduction_workers   Maximum number of reduction worker threads
Num_procs               Maximum number of processor cores used
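The optional fields suggest how split sizes could be tuned. The sketch below is a hypothetical heuristic, not MATE's documented policy: it sizes each split so that its elements fit in the L1 data cache. `split_units` and `num_splits` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heuristic: size each split so its elements fit in the
 * L1 data cache. Returns the split size in units (elements). */
static size_t split_units(size_t unit_size, size_t l1_cache_size)
{
    if (unit_size == 0 || l1_cache_size < unit_size)
        return 1;                       /* fall back to one unit per split */
    return l1_cache_size / unit_size;   /* whole units that fit in L1 */
}

/* Number of splits needed to cover the dataset, rounding up. */
static size_t num_splits(size_t data_size, size_t unit_size,
                         size_t l1_cache_size)
{
    size_t units = unit_size ? data_size / unit_size : 0;
    size_t per_split = split_units(unit_size, l1_cache_size);
    return (units + per_split - 1) / per_split;
}
```

For 3-dimensional float points (`unit_size` = 12) and a 32 KB L1 data cache, each split would hold 2730 points. A real runtime would likely reserve part of the cache for the reduction object itself, so treating the whole L1 capacity as available for input is optimistic.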
14
Functions-(1)
  • Transparent to users

Function                                                      R/O (R = required, O = optional)
static inline void schedule_tasks(thread_wrapper_arg_t *)     R
static void *combination_worker(void *)                       R
static int array_splitter(void *, int, reduction_args_t *)    R
void clone_reduction_object(int num)                          R
static inline int isCpuAvailable(unsigned long, int)          R
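Dynamic data partitioning (listed under Implementation Considerations) can be sketched with a function that has `array_splitter`'s parameter shape. The meaning of each parameter (shared state, requested number of units, output descriptor), the return value, and the `splitter_state` and `reduction_args_t` layouts are all assumptions; C11 atomics stand in for whatever synchronization the runtime actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical split descriptor handed to a reduction worker. */
typedef struct {
    char  *data;    /* first unit of the split */
    size_t units;   /* number of units in the split */
} reduction_args_t;

/* Hypothetical shared splitter state: one cursor for all workers. */
typedef struct {
    char         *base;        /* input_data */
    size_t        total_units; /* data_size / unit_size */
    size_t        unit_size;
    atomic_size_t next;        /* index of the next unassigned unit */
} splitter_state;

/* Each call claims up to req_units units with a single atomic
 * fetch-add, so faster threads simply come back for more work.
 * Returns the number of units assigned, or 0 once input is exhausted. */
static int array_splitter(void *state, int req_units, reduction_args_t *out)
{
    splitter_state *s = state;
    if (req_units <= 0)
        return 0;
    size_t start = atomic_fetch_add(&s->next, (size_t)req_units);
    if (start >= s->total_units)
        return 0;
    size_t n = s->total_units - start;
    if (n > (size_t)req_units)
        n = (size_t)req_units;
    out->data = s->base + start * s->unit_size;
    out->units = n;
    return (int)n;
}
```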
15
Functions-(2)
  • APIs provided by the runtime

Function                                                            R/O
int mate_init(scheduler_args_t args)                                R
int mate_scheduler(void *args)                                      R
int mate_finalize(void *args)                                       O
void reduction_object_pre_init()                                    R
int reduction_object_alloc(int size) (returns the object id)        R
void reduction_object_post_init()                                   R
void accumulate/maximal/minimal(int id, int offset, void *value)    O
void reuse_reduction_object()                                       O
void get_intermediate_result(int iter, int id, int offset)          O
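A toy, single-threaded stand-in for the reduction-object calls above shows how the pieces fit together: sizes are declared between `reduction_object_pre_init()` and `reduction_object_post_init()`, `reduction_object_alloc()` returns an id, and updates address a slot by (id, offset). This is not the MATE runtime. It assumes every slot holds a `double`, keeps all objects in one flat pool, omits per-thread cloning, and uses a hypothetical reader `reduction_object_value()` in place of `get_intermediate_result()`. `maximal` and `minimal` would follow the same pattern as `accumulate` with a different update.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the reduction-object API (assumptions: double slots,
 * one flat pool, single thread, no cloning or combination). */
#define MAX_OBJECTS 16

static double *pool;                     /* every object, back to back */
static size_t  pool_len;                 /* total number of slots */
static size_t  obj_start[MAX_OBJECTS];   /* first slot of each object */
static size_t  obj_size[MAX_OBJECTS];    /* slots per object */
static int     num_objects;

/* Start a fresh declaration phase, dropping any previous objects. */
void reduction_object_pre_init(void)
{
    free(pool);
    pool = NULL;
    pool_len = 0;
    num_objects = 0;
}

/* Reserve `size` slots; returns the object id, or -1 on bad size/full. */
int reduction_object_alloc(int size)
{
    if (size <= 0 || num_objects == MAX_OBJECTS)
        return -1;
    obj_start[num_objects] = pool_len;
    obj_size[num_objects] = (size_t)size;
    pool_len += (size_t)size;
    return num_objects++;
}

/* All sizes are known now: allocate the pool with every slot at 0.0. */
void reduction_object_post_init(void)
{
    pool = calloc(pool_len ? pool_len : 1, sizeof *pool);
    assert(pool != NULL);
}

/* Bounds-checked address of slot (id, offset). */
static double *slot_at(int id, int offset)
{
    assert(id >= 0 && id < num_objects);
    assert(offset >= 0 && (size_t)offset < obj_size[id]);
    return &pool[obj_start[id] + (size_t)offset];
}

/* Add *value (assumed to point to a double) to slot (id, offset). */
void accumulate(int id, int offset, void *value)
{
    *slot_at(id, offset) += *(const double *)value;
}

/* Zero every slot so the same objects serve the next iteration. */
void reuse_reduction_object(void)
{
    memset(pool, 0, pool_len * sizeof *pool);
}

/* Hypothetical reader standing in for get_intermediate_result(). */
double reduction_object_value(int id, int offset)
{
    return *slot_at(id, offset);
}
```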
16
Functions-(3)
  • APIs defined by the user

Function                                              R/O
int (*splitter_t)(void *, int, reduction_args_t *)    O
void (*reduction_t)(reduction_args_t *)               R
void (*combination_t)(void)                           O
void (*finalize_t)(void *)                            O
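A sketch of user-supplied functions matching the `reduction_t` and `finalize_t` shapes above. The contents of `reduction_args_t` (the split plus a pointer to the thread's reduction object) and the (sum, count) object layout are assumptions; the example computes a mean, and `mean_reduction` and `mean_finalize` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical argument block: the split to process plus a pointer to
 * this thread's reduction object (both layouts are assumptions). */
typedef struct {
    const double *data;
    size_t        n;
    double       *robj;   /* robj[0] = running sum, robj[1] = count */
} reduction_args_t;

typedef void (*reduction_t)(reduction_args_t *);
typedef void (*finalize_t)(void *);

/* User reduction: fold every element of the split into the object. */
static void mean_reduction(reduction_args_t *args)
{
    for (size_t i = 0; i < args->n; i++) {
        args->robj[0] += args->data[i];
        args->robj[1] += 1.0;
    }
}

/* User finalize: turn the combined (sum, count) into a mean, in place. */
static void mean_finalize(void *robj)
{
    double *r = robj;
    r[0] = r[1] > 0.0 ? r[0] / r[1] : 0.0;
}
```

Calling the reduction once per split and the finalize function once at the end reflects the R/O column: only the reduction function is mandatory.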
17
Implementation Considerations
  • Data partitioning: dynamically assigns splits to
    worker threads
  • Buffer management: two temporary buffers, one for
    reduction objects and the other for combination
    results
  • Fault tolerance: re-executes failed tasks;
    checkpointing may be a better solution
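Re-executing a failed task is only safe if the task's partial updates are discarded first; otherwise some elements would be counted twice in the combined result. The sketch below assumes each task reduces into a private clone (in the spirit of `clone_reduction_object`) that can simply be zeroed before a retry. `run_with_reexecution`, `MAX_ATTEMPTS`, `demo_ctx`, and the deliberately flaky demo task are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fault tolerance by re-execution: the clone is wiped
 * before every attempt so partial updates are never combined twice. */
#define MAX_ATTEMPTS 3

typedef int (*task_fn)(void *ctx, double *clone, size_t len); /* 0 = ok */

/* Returns the 1-based attempt that succeeded, or -1 if all failed. */
static int run_with_reexecution(task_fn task, void *ctx,
                                double *clone, size_t len)
{
    for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
        memset(clone, 0, len * sizeof *clone);   /* discard partial results */
        if (task(ctx, clone, len) == 0)
            return attempt;
    }
    return -1;
}

/* Demo task: sums ctx->data but "crashes" halfway through while
 * failures_left > 0, leaving a partial sum in the clone. */
typedef struct { const double *data; size_t n; int failures_left; } demo_ctx;

static int flaky_sum_task(void *p, double *clone, size_t len)
{
    demo_ctx *c = p;
    (void)len;
    for (size_t i = 0; i < c->n; i++) {
        if (c->failures_left > 0 && i == c->n / 2) {
            c->failures_left--;
            return -1;                  /* simulated failure mid-task */
        }
        clone[0] += c->data[i];
    }
    return 0;
}
```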

18
What is in the user code?
  • Implements the necessary functions, such as
    reduction, splitter, and finalize
  • Generates the input dataset
  • Sets up the fields in scheduler_args_t
  • Initializes the middleware and declares the
    reduction object(s)
  • Executes reduction tasks by calling
    mate_scheduler (in one or more passes)
  • Optionally does some finalization work

19
K-means user code
  • int main(int argc, char **argv)
    • parse_args()
    • generate_points()
    • generate_means()
    • mate_init(scheduler_args_t)
    • reduction_object_pre_init()
    • while (needed) reduction_object_alloc(size)
    • reduction_object_post_init()
    • while (not finished)
      • mate_scheduler()
      • update_means()
      • reuse_reduction_object()
      • process_next_iteration()
    • mate_finalize()
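The loop above can be sketched as a self-contained, single-threaded C program that performs the same computation without the MATE runtime. The reduction object holds, for each of the k clusters, the three coordinate sums and a point count. The reduction step assigns each point to its nearest mean and updates that cluster's slots, `update_means` divides sums by counts, and zeroing the object plays the role of `reuse_reduction_object()`. Function names, the `tol`/`max_iter` convergence test, and the fixed `MAX_K` bound are assumptions for illustration.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <string.h>

#define DIM 3      /* 3-dimensional points, as in the experiments */
#define MAX_K 16   /* assumed upper bound on k for this sketch */

/* Reduction: fold each point into its nearest cluster's slots.
 * robj has k * (DIM + 1) entries: DIM coordinate sums, then a count. */
static void kmeans_reduction(const double *pts, size_t n,
                             const double *means, int k, double *robj)
{
    for (size_t i = 0; i < n; i++) {
        const double *p = pts + i * DIM;
        int best = 0;
        double best_d = DBL_MAX;
        for (int c = 0; c < k; c++) {
            double d = 0.0;
            for (int j = 0; j < DIM; j++) {
                double diff = p[j] - means[c * DIM + j];
                d += diff * diff;
            }
            if (d < best_d) { best_d = d; best = c; }
        }
        double *slot = robj + best * (DIM + 1);
        for (int j = 0; j < DIM; j++)
            slot[j] += p[j];   /* accumulate coordinates */
        slot[DIM] += 1.0;      /* accumulate count */
    }
}

/* update_means: mean = sum / count; empty clusters keep their mean.
 * Returns the largest coordinate change, used as the stopping test. */
static double update_means(double *means, const double *robj, int k)
{
    double max_shift = 0.0;
    for (int c = 0; c < k; c++) {
        const double *slot = robj + c * (DIM + 1);
        if (slot[DIM] == 0.0)
            continue;
        for (int j = 0; j < DIM; j++) {
            double m = slot[j] / slot[DIM];
            double old = means[c * DIM + j];
            double shift = m > old ? m - old : old - m;
            if (shift > max_shift)
                max_shift = shift;
            means[c * DIM + j] = m;
        }
    }
    return max_shift;
}

/* Driver: iterate until no mean moves more than tol, or max_iter.
 * Returns the number of iterations performed. */
static int kmeans(const double *pts, size_t n, double *means, int k,
                  double tol, int max_iter)
{
    double robj[MAX_K * (DIM + 1)];
    assert(k > 0 && k <= MAX_K);
    int iter = 0;
    while (iter < max_iter) {
        memset(robj, 0, sizeof robj);   /* reuse_reduction_object() analogue */
        kmeans_reduction(pts, n, means, k, robj);
        iter++;
        if (update_means(means, robj, k) <= tol)
            break;
    }
    return iter;
}
```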

20
Experiments: K-means
  • K-means: 400 MB of 3-dimensional points, k = 100,
    on one WCI node with 8 cores

21
Experiments: K-means
  • K-means: 400 MB of 3-dimensional points, k = 100,
    on one AMD node with 16 cores

22
Experiments: PCA
  • PCA: 8000 × 1024 matrix, on one WCI node with 8
    cores

23
Experiments: PCA
  • PCA: 8000 × 1024 matrix, on one AMD node with 16
    cores

24
Experiments: Apriori
  • Apriori: 1,000,000 transactions, 3% support, on
    one WCI node with 8 cores

25
Experiments: Apriori
  • Apriori: 1,000,000 transactions, 3% support, on
    one AMD node with 16 cores

26
Related Work
  • Work that improves MapReduce's API or
    implementations
  • Work that evaluates MapReduce across different
    platforms and application domains
  • Academia: CGL-MapReduce, Mars, MITHRA, Phoenix,
    Disco
  • Industry: Facebook (Hive), Yahoo! (Pig Latin,
    Map-Reduce-Merge), Google (Sawzall), Microsoft
    (Dryad)

27
Conclusions
  • MapReduce is simple and robust in expressing
    parallelism
  • Its two-stage computation style may cause
    performance losses for some subclasses of
    applications in data-intensive computing
  • MATE provides an alternate API that is based on
    generalized reduction
  • This variation can reduce the overheads of data
    management and of communication between Map and
    Reduce

28
Questions?