Title: Intel
1Intel Concurrent Collections for C++ - a model for parallel programming
- Nikolay Kurtov
- email nikolay.kurtov_at_intel.com
- Software and Services Group
- October 23, 2008
2Agenda
- Existing parallel programming models
- Key concepts, Blackscholes example
- Performance results
3Parallel programming is important
- Number of multi-core machines is growing
- Developers want to fully exploit architecture capabilities
- But parallel programming is hard
- Users must reason about parallelism
- Thread synchronization
- Embedded in serial languages
- Data Overwriting
- Arbitrary Serialization
- Tuning Performance
- Depends on a platform
4Parallel Programming Models
- Improve productivity of programming
- Hide low-level details
- Provide high-level abstractions
- The following models are very popular
- OpenMP
- Cilk
- Intel Threading Building Blocks
5OpenMP
- Perfect for Data-parallel algorithms
- Basics are easy to apply
- #pragma omp parallel for
- for (int i = 0; i < N; i++) doSomething(i);
- Advanced usage is complicated and error-prone
- Requires compiler support
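A slightly fuller sketch of the OpenMP loop on this slide. The body of `doSomething` is a hypothetical stand-in (it squares its index) and `computeAll` is an illustrative name; neither comes from the talk. The code assumes the compiler is invoked with an OpenMP flag such as `-fopenmp`. Without it the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the slide's doSomething(i): squares its index.
static double doSomething(int i) { return static_cast<double>(i) * i; }

// Each iteration writes only out[i], so iterations are independent and
// the loop is safe to split across threads.
std::vector<double> computeAll(int n) {
    assert(n >= 0);
    std::vector<double> out(n);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        out[i] = doSomething(i);
    }
    return out;
}
```

The loop stays data-parallel only because no iteration reads what another iteration writes; adding a shared accumulator would require a reduction clause or explicit synchronization.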
6Cilk
- The programmer identifies elements that can safely be executed in parallel
- int fib(int n) {
-   if (n < 2) return n;
-   int x = cilk_spawn fib(n-1);
-   int y = cilk_spawn fib(n-2);
-   cilk_sync;
-   return (x + y);
- }
- Explicit spawning of tasks and synchronization
with barriers
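Cilk keywords need a Cilk-enabled compiler, so the sketch below shows the same spawn/sync shape with standard C++ `std::async` instead. This is an analogue, not Cilk: `fibParallel`, `fibSerial`, and the `depth` cutoff are assumptions made for illustration, and some toolchains need a thread library flag such as `-pthread` to link it.

```cpp
#include <cassert>
#include <future>

// Serial Fibonacci, used below the cutoff so that a thread is not
// created for every tiny subproblem.
long fibSerial(int n) { return n < 2 ? n : fibSerial(n - 1) + fibSerial(n - 2); }

// Spawn/sync analogue: one branch runs as an asynchronous task ("spawn"),
// the other runs in the current thread, and get() plays the role of the sync.
long fibParallel(int n, int depth = 3) {
    assert(n >= 0);
    if (n < 2 || depth == 0) return fibSerial(n);
    std::future<long> x =
        std::async(std::launch::async, fibParallel, n - 1, depth - 1);
    long y = fibParallel(n - 2, depth - 1);
    return x.get() + y;  // wait for the spawned branch, then combine
}
```

As on the slide, the programmer still decides explicitly where tasks are spawned and where they are joined.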
7Intel Threading Building Blocks
- Implemented as a C++ library
- Requires an excellent knowledge of C++
- Provides excellent high-level abstractions
- Provides basic parallel algorithms
- parallel_for
- parallel_sort
- parallel_while
- parallel_reduce
- parallel_do
- parallel_scan
8Existing models - summary
- The programmer explicitly expresses parallelism
- Provide an imperative algorithm description
- Many low-level questions are solved by the programmer
- Good control over performance
9Agenda
- Existing parallel programming models
- Key concepts, Blackscholes example
- Performance results
10Ideal Parallel programming model
- The application problem
- Serial code
- Semantic correctness
- Intel Concurrent Collections
- Architecture
- Actual parallelism
- Load balancing
- Distribution among processors
- Domain Expert (person): only domain knowledge, no tuning knowledge
- Tuning Expert (person, runtime, static analysis): no domain knowledge, only tuning knowledge
11How people think about their application
- Blackscholes: a data-parallel application
- Solves an equation independently for each parameter set
- Diagram: Parameters -> Solve -> Result
- What are the high-level operations?
- What are the chunks of data?
- What are the producer/consumer relationships?
- What are the inputs and outputs?
12Intel Concurrent Collections Key Concepts
- Step: a single high-level operation
- Item: a single data element
- Tag: an identifier of a step or an item
- Inputs/Outputs: items or tags produced or consumed by the environment
13Textual Graph Representation
- // Declarations
- <SolveTags: int n>;
- [OptionData Parameters: int n];
- [float Result: int n];
- // Step prescription
- <SolveTags> :: (Solve);
- // Step execution
- [Parameters] -> (Solve) -> [Result];
- // Input from the environment
- // initialize all tags and data
- env -> <SolveTags>, [Parameters];
- // Output to the environment
- [Result] -> env;
14Graph definition Translator
- Translates a graph definition into a declaration of a class
- A generated class contains properly named item collections, tag collections and step collections
- Generates a coding hints file: a template for step definitions
- Checks correctness of a graph
class blackscholes_graph_t : public Graph_t {
public:
    ItemCollection_t<OptionData> Parameters;
    ItemCollection_t<float> Result;
    TagCollection_t SolveTags;
    StepCollection_t SolveStepCollection;
    ...
};
15Tags
- Item identifiers
- Items are stored in a graph in an item collection
- Put stores an item and associates it with a tag
- Get accesses items by a tag
- Items are immutable
- Step identifiers
- Steps are prescribed by tags
- Put stores a tag, instantiates prescribed steps
- The same tag is passed to each instantiated step
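The Put/Get and immutability rules for items can be illustrated with a toy, single-threaded model. `ToyItemCollection` and `toyItemDemo` are hypothetical names and this is not how the library implements `ItemCollection_t`; the sketch only assumes integer tags and write-once items.

```cpp
#include <cassert>
#include <map>

// Hypothetical toy model of an item collection: items are keyed by an
// integer tag and can be written once (single assignment), never updated.
template <typename Item>
class ToyItemCollection {
public:
    // Put stores an item under a tag; returns false if the tag is already
    // taken, which keeps stored items immutable.
    bool Put(int tag, const Item& item) {
        return items_.emplace(tag, item).second;
    }
    // Get copies the item into `out` and returns true, or returns false
    // if no item has been produced for this tag yet.
    bool Get(int tag, Item& out) const {
        auto it = items_.find(tag);
        if (it == items_.end()) return false;
        out = it->second;
        return true;
    }
private:
    std::map<int, Item> items_;
};

// Usage: the second Put under tag 7 is rejected, so the item stays 1.5.
inline float toyItemDemo() {
    ToyItemCollection<float> results;
    results.Put(7, 1.5f);
    bool overwritten = results.Put(7, 2.5f);
    assert(!overwritten);
    (void)overwritten;
    float value = 0.0f;
    results.Get(7, value);
    return value;
}
```

Because an item never changes once it is stored, every step that reads a given tag sees the same value regardless of execution order.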
16Specifying Computation
- StepReturnValue_t Solve(
-     Blackscholes_graph_t& graph,
-     const Tag_t& step_tag)
- {
-   OptionData data = graph.Parameters.Get(step_tag);
-   float result = solveEquation(data);
-   graph.Result.Put(step_tag, result);
-   return CNC_Success;
- }
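The slides do not show `solveEquation` or the layout of `OptionData`. As a minimal sketch, assuming the step prices a European call with the closed-form Black-Scholes formula, it could look like the code below. The field names are assumptions, and the sketch uses `double` where the slide's step stores a `float`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical layout of the slide's OptionData; the field names are assumptions.
struct OptionData {
    double spot;        // current price of the underlying, S
    double strike;      // strike price, K
    double rate;        // risk-free interest rate, r (per year)
    double volatility;  // volatility, sigma
    double time;        // time to expiry, T (years)
};

// Standard normal cumulative distribution function.
static double normalCdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

// Black-Scholes price of a European call option.
double solveEquation(const OptionData& o) {
    assert(o.spot > 0 && o.strike > 0 && o.volatility > 0 && o.time > 0);
    double sigmaSqrtT = o.volatility * std::sqrt(o.time);
    double d1 = (std::log(o.spot / o.strike)
                 + (o.rate + 0.5 * o.volatility * o.volatility) * o.time)
                / sigmaSqrtT;
    double d2 = d1 - sigmaSqrtT;
    return o.spot * normalCdf(d1)
           - o.strike * std::exp(-o.rate * o.time) * normalCdf(d2);
}
```

The function reads only its argument and writes nothing shared, which is what lets each parameter set be solved independently.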
17Using the graph in your C++ application
- Blackscholes_graph_t my_graph;
- for (int i = 0; i < N; i++) {
-   my_graph.SolveTags.Put(Tag_t(i));
-   my_graph.Parameters.Put(Tag_t(i), data[i]);
- }
- my_graph.run();
- for (int i = 0; i < N; i++) {
-   float result = my_graph.Result.Get(Tag_t(i));
-   std::cout << result << std::endl;
- }
18Steps Rescheduling
Diagram (tag collections, steps, item collections): Image Tag k, Split k, Image k; Block Tag i, j, Process i, j, Block i, j, Result i, j
- A step may begin execution before its input items are available
- It will be rescheduled and started again from the beginning when the corresponding item is added to the collection
19Constraints required by the application
- Steps have no side-effects
- Steps call Gets before any Puts
- Steps call Gets before allocating any memory
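Step rescheduling, and the reason Gets are expected to come before Puts, can be sketched with a toy single-threaded scheduler. Everything here (`ToyGraph`, `doubleStep`, the `waiting` map) is hypothetical and is not the runtime's actual mechanism; it only models a step that starts early, fails its Get, is parked, and is restarted from the beginning once the item arrives.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <vector>

// Toy single-threaded model of step rescheduling (all names hypothetical).
struct ToyGraph {
    std::map<int, int> input, output;         // item collections keyed by tag
    std::deque<int> ready;                    // prescribed step instances
    std::map<int, std::vector<int>> waiting;  // input tag -> parked step tags
    int attempts = 0;                         // how many times a step body started

    // Putting a tag instantiates the prescribed step.
    void putTag(int tag) { ready.push_back(tag); }

    // Putting an item wakes every step parked on that tag.
    void putInput(int tag, int value) {
        bool fresh = input.emplace(tag, value).second;
        assert(fresh && "items are single-assignment");
        (void)fresh;
        for (int step : waiting[tag]) ready.push_back(step);
        waiting.erase(tag);
    }

    // Step body: the Get comes first, so an early exit leaves no partial
    // results behind and the step can simply be restarted later.
    bool doubleStep(int tag) {
        ++attempts;
        auto it = input.find(tag);
        if (it == input.end()) return false;  // Get failed: item not there yet
        output[tag] = 2 * it->second;         // Put happens only after the Get
        return true;
    }

    // Run ready steps; a step whose Get failed is parked until its item arrives.
    void run() {
        while (!ready.empty()) {
            int tag = ready.front();
            ready.pop_front();
            if (!doubleStep(tag)) waiting[tag].push_back(tag);
        }
    }
};
```

If a step performed a Put or had another side effect before a failing Get, the restart would repeat that effect, which is what the constraints on this slide rule out.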
20Benefits from using Intel Concurrent Collections
- Improves programming productivity
- Only serial code
- No knowledge of parallel technologies required
- Determinism
- Race-free
- Portability
- Scalability
- Expert-tuning system
21Summary: How to write an application using Intel Concurrent Collections?
- Draw the algorithm on a chalkboard
- Define Data structures
- Represent the algorithm in the textual notation
- Implement high-level operations in C++
- Instantiate a Graph and run it
22Agenda
- Existing parallel programming models
- Key concepts, Blackscholes example
- Performance results
23Blackscholes benchmark
- Calculations for a single set of parameters are less than 500 CPU instructions
- Steps should be grouped to reduce the overhead and improve cache locality
- Automatic grain selection is an area for future research
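One way to group steps by hand is to let each tag name a block of consecutive options instead of a single option, so that one step instance solves `grain` parameter sets in a loop. The helpers below are a sketch of that indexing only; `blockCount`, `blockRange`, and the `grain` parameter are illustrative names, and a good grain size would have to be found by measurement.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Number of block tags needed to cover n options with `grain` options per step.
int blockCount(int n, int grain) {
    assert(n >= 0 && grain > 0);
    return (n + grain - 1) / grain;
}

// Half-open index range [begin, end) of the options handled by one block tag.
// The last block may be shorter than `grain`.
std::pair<int, int> blockRange(int blockTag, int n, int grain) {
    assert(blockTag >= 0 && blockTag < blockCount(n, grain));
    int begin = blockTag * grain;
    return {begin, std::min(begin + grain, n)};
}
```

With this scheme the environment would put `blockCount(N, grain)` tags instead of `N`, and each step would iterate over its own range, touching neighbouring parameter sets together.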
24Dedup benchmark
- Algorithm is a pipeline
- The last pipeline stage is serial
- The Step Priorities feature makes Dedup run 1.4 times faster
25Possible model improvements
- Memory management
- Garbage collection
- Automatic grain selection
- Streaming data input
26Getting More Information
- Intel Concurrent Collections for C/C++
- on WhatIf.intel.com
- http://software.intel.com/en-us/articles/intel-concurrent-collections-for-cc