Intel - PowerPoint PPT Presentation

Author: Intel Corporation. Last modified by: Nikolay Kurtov. Created: 2/3/2006 6:16:10 PM. Presentation format: On-screen Show.



1
Intel Concurrent Collections for C++: a model
for parallel programming
  • Nikolay Kurtov
  • email nikolay.kurtov_at_intel.com
  • Software and Services Group
  • October 23, 2008

2
Agenda
  • Existing parallel programming models
  • Key concepts, Blackscholes example
  • Performance results

3
Parallel programming is important
  • Number of multi-core machines is growing
  • Developers want to fully exploit architecture
    capabilities
  • But parallel programming is hard
  • Users must reason about parallelism
  • Thread synchronization
  • Embedded in serial languages
  • Data Overwriting
  • Arbitrary Serialization
  • Tuning Performance
  • Depends on a platform

4
Parallel Programming Models
  • Improve productivity of programming
  • Hide low-level details
  • Provide high-level abstractions
  • The following models are very popular
  • OpenMP
  • Cilk
  • Intel Threading Building Blocks

5
OpenMP
  • Perfect for Data-parallel algorithms
  • The basics are easy to apply
  • #pragma omp parallel for
  • for (int i = 0; i < N; i++) doSomething(i);
  • Advanced usage is complicated and error-prone
  • Requires compiler support
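The loop on this slide can be sketched as a complete function. This is a minimal illustration, not code from the presentation: `square` is a hypothetical stand-in for `doSomething`, and `squares` is an assumed wrapper name. The pragma takes effect only when the compiler has OpenMP enabled (for example `-fopenmp` with GCC); without it the loop simply runs serially and produces the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for doSomething(i) from the slide.
static int square(int i) { return i * i; }

// Fills out[i] = square(i) for i in [0, n). Each iteration writes a distinct
// element, so the iterations are independent and safe to run in parallel.
std::vector<int> squares(int n) {
    assert(n >= 0);
    std::vector<int> out(static_cast<std::size_t>(n));
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        out[static_cast<std::size_t>(i)] = square(i);
    }
    return out;
}
```

Calling `squares(4)` yields the vector `{0, 1, 4, 9}` whether or not OpenMP is enabled, which is what makes this form of data parallelism easy to apply.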

6
Cilk
  • The programmer identifies elements that can
    safely be executed in parallel
  • int fibonacci(int n) {
  •   if (n < 2) return n;
  •   int x = cilk_spawn fibonacci(n - 1);
  •   int y = cilk_spawn fibonacci(n - 2);
  •   cilk_sync;
  •   return (x + y);
  • }
  • Explicit spawning of tasks and synchronization
    with barriers
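The Cilk keywords need a Cilk-enabled compiler, so the same fork-join shape is sketched here with standard C++ instead: `std::async` plays the role of `cilk_spawn` and `future::get` the role of `cilk_sync`. This is a swapped-in technique, not Cilk itself. The `spawn_depth` parameter is an assumption added for the sketch to bound the number of threads created, and only one branch is spawned while the other runs in the calling thread.

```cpp
#include <cassert>
#include <future>

// Fork-join Fibonacci using std::async in place of cilk_spawn/cilk_sync.
// spawn_depth (hypothetical parameter) limits how many levels fork new tasks.
long fib(int n, int spawn_depth = 3) {
    assert(n >= 0);
    if (n < 2) return n;
    if (spawn_depth <= 0) {
        // Below the cutoff, recurse serially to avoid creating more threads.
        return fib(n - 1, 0) + fib(n - 2, 0);
    }
    // "Spawn": run one branch asynchronously, compute the other one here.
    std::future<long> x =
        std::async(std::launch::async, fib, n - 1, spawn_depth - 1);
    long y = fib(n - 2, spawn_depth - 1);
    // "Sync": get() blocks until the spawned branch has finished.
    return x.get() + y;
}
```

As in the Cilk version, the programmer decides explicitly where tasks are created and where they are joined.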

7
Intel Threading Building Blocks
  • Implemented as a C++ library
  • Requires an excellent knowledge of C++
  • Provides excellent high-level abstractions
  • Provides basic parallel algorithms
  • parallel_for
  • parallel_sort
  • parallel_while
  • parallel_reduce
  • parallel_do
  • parallel_scan

8
Existing models - summary
  • The programmer explicitly expresses parallelism
  • Provide an imperative algorithm description
  • Many low-level questions must be solved by the
    programmer
  • Good control over performance

9
Agenda
  • Existing parallel programming models
  • Key concepts, Blackscholes example
  • Performance results

10
Ideal Parallel programming model
  • The application problem
  • Serial code
  • Semantic correctness
  • Intel Concurrent Collections
  • Architecture
  • Actual parallelism
  • Load balancing
  • Distribution among processors

Domain Expert (person): only domain knowledge, no
tuning knowledge
Tuning Expert (person, runtime, static
analysis): no domain knowledge, only tuning
knowledge
11
How people think about their application
Blackscholes: a data-parallel application that solves
an equation independently for each parameter set
Parameters -> Solve -> Result
  • What are the high-level operations?
  • What are the chunks of data?
  • What are the producer/consumer relationships?
  • What are the inputs and outputs?
12
Intel Concurrent Collections Key Concepts
  • Step: a single high-level operation
  • Item: a single data element
  • Tag: an identifier of a step or an item
  • Inputs/Outputs: items or tags produced or
    consumed by the environment

13
Textual Graph Representation
  • // Declarations
  • <SolveTags: int n>;
  • [OptionData Parameters: int n];
  • [float Result: int n];
  • // Step prescription
  • <SolveTags> :: (Solve);
  • // Step execution
  • [Parameters] -> (Solve) -> [Result];
  • // Input from the environment:
  • // initialize all tags and data
  • env -> <SolveTags>, [Parameters];
  • // Output to the environment
  • [Result] -> env;

14
Graph definition Translator
  • Translates a graph definition into a declaration
    of a class
  • A generated class contains properly named item
    collections, tag collections and step collections
  • Generates a coding hints file: a template for
    step definitions
  • Checks correctness of a graph

class blackscholes_graph_t : public Graph_t {
public:
    ItemCollection_t<OptionData> Parameters;
    ItemCollection_t<float> Result;
    TagCollection_t SolveTags;
    StepCollection_t SolveStepCollection;
    ...
};
15
Tags
  • Item identifiers
  • Items are stored in a graph in an item
    collection
  • Put: stores an item and associates it with a tag
  • Get: accesses an item by its tag
  • Items are immutable
  • Step identifiers
  • Steps are prescribed by tags
  • Put: stores a tag and instantiates the prescribed steps
  • The same tag is passed to each instantiated step
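The Put/Get behaviour of an item collection described on this slide can be sketched with an ordinary map. This is a toy under stated assumptions, not the Intel Concurrent Collections implementation: the class name `ToyItemCollection`, the `Has` helper, the use of plain `int` tags, and the choice to throw on a repeated Put are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>

// Hypothetical single-assignment item collection: each tag maps to exactly
// one item, and an item cannot be changed once it has been put.
template <typename Item>
class ToyItemCollection {
public:
    // Put stores an item under a tag. A second Put for the same tag is
    // rejected, which is how this sketch keeps items immutable.
    void Put(int tag, const Item& item) {
        bool inserted = items_.emplace(tag, item).second;
        if (!inserted) throw std::logic_error("item already put for this tag");
    }

    // Get returns a copy of the item for a tag, or throws if it is absent.
    Item Get(int tag) const {
        auto it = items_.find(tag);
        if (it == items_.end()) throw std::out_of_range("no item for this tag");
        return it->second;
    }

    // Reports whether an item has been put for this tag.
    bool Has(int tag) const { return items_.count(tag) != 0; }

private:
    std::map<int, Item> items_;
};
```

Because every tag is written at most once, any step that reads a given tag sees the same value regardless of execution order, which is the property behind the determinism claimed later in the talk.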

16
Specifying Computation
  StepReturnValue_t Solve(
      Blackscholes_graph_t& graph,
      const Tag_t& step_tag) {
    // Read the input item identified by this step's tag
    OptionData data =
        graph.Parameters.Get(step_tag);
    float result = solveEquation(data);
    // Publish the result under the same tag
    graph.Result.Put(step_tag, result);
    return CNC_Success;
  }

17
Using the graph in your C++ application
  Blackscholes_graph_t my_graph;
  for (int i = 0; i < N; i++) {
    my_graph.SolveTags.Put(Tag_t(i));
    my_graph.Parameters.Put(Tag_t(i), data[i]);
  }
  my_graph.run();
  for (int i = 0; i < N; i++) {
    float result = my_graph.Result.Get(Tag_t(i));
    std::cout << result << std::endl;
  }
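The put-tags, run, get-results flow on this slide can be tried without the Intel runtime using a small serial stand-in. Everything here is an assumption made for illustration: the `OptionData` field names, the `ToyGraph` and `priceAll` names, the use of `double` rather than `float`, and the serial `run()`. The body of `solveEquation` is the standard Black-Scholes formula for a European call, which the slides do not spell out.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <vector>

// Assumed layout of one parameter set; the slides do not show the fields.
struct OptionData { double spot, strike, rate, volatility, years; };

// Standard Black-Scholes price of a European call option.
double solveEquation(const OptionData& o) {
    double sd = o.volatility * std::sqrt(o.years);
    double d1 = (std::log(o.spot / o.strike) +
                 (o.rate + 0.5 * o.volatility * o.volatility) * o.years) / sd;
    double d2 = d1 - sd;
    // Cumulative distribution function of the standard normal distribution.
    auto N = [](double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); };
    return o.spot * N(d1) - o.strike * std::exp(-o.rate * o.years) * N(d2);
}

// Toy serial graph: one Solve step instance per tag in SolveTags.
struct ToyGraph {
    std::map<int, OptionData> Parameters;  // item collection
    std::map<int, double> Result;          // item collection
    std::vector<int> SolveTags;            // tag collection

    void run() {
        for (int tag : SolveTags) {
            // Same shape as the Solve step: Get the input, then Put the output.
            Result[tag] = solveEquation(Parameters.at(tag));
        }
    }
};

// Mirrors the slide: put tags and parameters, run, then read the results.
std::vector<double> priceAll(const std::vector<OptionData>& data) {
    ToyGraph g;
    const int n = static_cast<int>(data.size());
    for (int i = 0; i < n; i++) {
        g.SolveTags.push_back(i);
        g.Parameters[i] = data[static_cast<std::size_t>(i)];
    }
    g.run();
    std::vector<double> out;
    for (int i = 0; i < n; i++) out.push_back(g.Result.at(i));
    return out;
}
```

Each tag is solved independently of the others, so a real runtime is free to execute the step instances in any order or in parallel.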

18
Steps Rescheduling
[Diagram: an image-processing graph with tag collections
Image Tag (k) and Block Tag (i, j), steps Split (k) and
Process (i, j), and item collections Image (k), Block (i, j)
and Result (i, j)]
A step may begin execution before its input items
are available. It will be rescheduled and started
again from the beginning when the corresponding
item is added to the collection.
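One way to picture rescheduling is a toy scheduler in which a failed Get aborts the step, parks it under the missing tag, and a later Put for that tag moves it back to the ready queue. This is a hypothetical serial sketch, not the actual runtime: `ToyRuntime`, `ItemMissing`, `demoStarts`, and the use of an exception to abort a step are all assumptions.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <vector>

// Thrown by Get when the requested item has not been put yet.
struct ItemMissing { int tag; };

struct ToyRuntime {
    using Step = std::function<void()>;
    std::map<int, int> items;                 // tag -> item
    std::deque<Step> ready;                   // steps that can be started
    std::map<int, std::vector<Step>> parked;  // steps waiting for a tag
    int starts = 0;                           // how many times a step was started

    int Get(int tag) const {
        auto it = items.find(tag);
        if (it == items.end()) throw ItemMissing{tag};
        return it->second;
    }

    // Storing an item wakes every step parked on its tag.
    void Put(int tag, int value) {
        items[tag] = value;
        auto it = parked.find(tag);
        if (it != parked.end()) {
            for (const Step& s : it->second) ready.push_back(s);
            parked.erase(it);
        }
    }

    void run() {
        while (!ready.empty()) {
            Step step = ready.front();
            ready.pop_front();
            ++starts;
            try {
                step();  // always starts from the beginning
            } catch (const ItemMissing& m) {
                parked[m.tag].push_back(step);  // wait for the missing item
            }
        }
    }
};

// The consumer is queued before the producer, so its first start fails.
int demoStarts() {
    ToyRuntime rt;
    rt.ready.push_back([&rt] { rt.Put(2, rt.Get(1) * 10); });  // needs item 1
    rt.ready.push_back([&rt] { rt.Put(1, 4); });               // makes item 1
    rt.run();
    assert(rt.Get(2) == 40);
    return rt.starts;
}
```

In `demoStarts` the consumer is started twice and the producer once. Restarting from the beginning is harmless here only because the consumer calls Get before it calls Put and has no other side effects.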
19
Constraints required by the application
  1. Steps have no side-effects
  2. Steps call Gets before any Puts
  3. Steps call Gets before allocating any memory

20
Benefits from using Intel Concurrent Collections
  • Improves programming productivity
  • Only serial code
  • No knowledge of parallel technologies required
  • Determinism
  • Race-free
  • Portability
  • Scalability
  • Expert-tuning system

21
Summary: how to write an application using Intel
Concurrent Collections
  1. Draw the algorithm on a chalkboard
  2. Define data structures
  3. Represent the algorithm in the textual notation
  4. Implement high-level operations in C++
  5. Instantiate a graph and run it

22
Agenda
  • Existing parallel programming models
  • Key concepts, Blackscholes example
  • Performance results

23
Blackscholes benchmark
  • Calculations for a single set of parameters take
    fewer than 500 CPU instructions
  • Steps should be grouped to reduce the overhead
    and improve cache locality
  • Automatic grain selection is an area for future
    research
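Grouping can be sketched as choosing one tag per block of parameter sets rather than one tag per set, so that each step instance amortizes its scheduling overhead over many calculations. The helper below is a hypothetical illustration: `makeBlocks` and the `grain` parameter are assumed names, and the grain size is picked by hand, since the slide lists automatic selection as future research.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Splits item indices [0, n) into half-open ranges [begin, end) of at most
// `grain` items. Each range would be handled by one step instance.
std::vector<std::pair<int, int>> makeBlocks(int n, int grain) {
    assert(n >= 0 && grain > 0);
    std::vector<std::pair<int, int>> blocks;
    for (int begin = 0; begin < n; begin += grain) {
        blocks.emplace_back(begin, std::min(begin + grain, n));
    }
    return blocks;
}
```

With `n = 10` and `grain = 4` this produces the ranges [0, 4), [4, 8) and [8, 10), so three step instances replace ten.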

24
Dedup benchmark
  • Algorithm is a pipeline
  • The last pipeline stage is serial
  • The Step Priorities feature makes Dedup run 1.4
    times faster

25
Possible model improvements
  • Memory management
  • Garbage collection
  • Automatic grain selection
  • Streaming data input

26
Getting More Information
  • Intel Concurrent Collections for C/C++
    on WhatIf.intel.com
  • http://software.intel.com/en-us/articles/intel-concurrent-collections-for-cc

27
  • Questions & Answers

28
  • Thank you!
