Title: Parallel Image Processing
1. Parallel Image Processing
Programming and Architecture
IST PhD Lunch Seminar
Wouter Caarls
Quantitative Imaging Group
2. Why Parallel?
- Processing time
  - Smaller timesteps, more scales, faster response times
- Memory
  - Larger images, more dimensions
- Energy consumption
  - More applications, smaller devices
3. Data parallelism
- Many image processing operations have locality of reference (segmentation, filtering, distance transforms, etc.)
- Data parallelism: different parts of an image can be processed concurrently
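A minimal sketch of data parallelism, assuming a grayscale image stored as a list of rows and a 3x3 mean filter with clamped borders (both illustrative choices, not taken from the slides). The image is cut into horizontal strips that are filtered concurrently; because the filter only looks one row up and down, each strip needs just a one-row halo from its neighbours. All function names are hypothetical. CPython threads share the global interpreter lock, so this shows the decomposition rather than a real speedup for pure-Python code.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical helpers for illustration; the image is a list of rows of numbers.

def mean3x3(img, y, x):
    """3x3 mean around (y, x); coordinates outside the image are clamped to the border."""
    h, w = len(img), len(img[0])
    total = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy = min(max(y + dy, 0), h - 1)
            xx = min(max(x + dx, 0), w - 1)
            total += img[yy][xx]
    return total / 9.0

def filter_rows(img, y0, y1):
    """Filter output rows y0..y1-1. Reads input rows y0-1..y1 (the one-row halo),
    which in shared memory are simply read from the common input image."""
    return [[mean3x3(img, y, x) for x in range(len(img[0]))] for y in range(y0, y1)]

def parallel_filter(img, n_strips=4):
    """Split the output into horizontal strips and filter them concurrently."""
    h = len(img)
    bounds = [(i * h // n_strips, (i + 1) * h // n_strips) for i in range(n_strips)]
    with ThreadPoolExecutor(max_workers=n_strips) as pool:
        strips = list(pool.map(lambda b: filter_rows(img, b[0], b[1]), bounds))
    # Concatenate the strips in their original order.
    return [row for strip in strips for row in strip]
```

Because every output pixel is computed by the same function whichever strip it falls in, `parallel_filter(img)` returns the same result as `filter_rows(img, 0, len(img))`. On a distributed-memory machine each strip would instead need its own copy of the halo rows.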
4. Task farm parallelism
- An application consists of many different operations
- Some of these operations are independent (scale spaces, parameter sweeps, noise realizations, etc.)
- Task farm parallelism: independent operations are handed out to different processors
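A task farm can be sketched as a worker pool that receives independent tasks and gives each one to whichever worker is free. Here the independent operations are a hypothetical parameter sweep: counting the pixels above several thresholds. `count_above` and `task_farm` are illustrative names; for CPU-bound pure-Python tasks a `ProcessPoolExecutor` would be needed for an actual speedup, while threads keep the sketch simple.

```python
from concurrent.futures import ThreadPoolExecutor

def count_above(img, threshold):
    """One independent task (hypothetical): count pixels brighter than threshold."""
    return sum(1 for row in img for v in row if v > threshold)

def task_farm(img, thresholds, workers=4):
    """Submit one task per threshold; idle workers pick up the next pending task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {t: pool.submit(count_above, img, t) for t in thresholds}
        # Collect results; the tasks never communicate with each other.
        return {t: f.result() for t, f in futures.items()}
```

For example, `task_farm([[0, 50, 100], [150, 200, 250]], [0, 100, 200])` returns `{0: 5, 100: 3, 200: 1}`.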
5. Pipeline parallelism
- An image processing algorithm consists of consecutive stages
- If multiple objects are to be processed, they may be in different stages at the same time
- Pipeline parallelism: each stage runs concurrently and passes its results on to the next
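A pipeline can be sketched as one thread per stage, connected by FIFO queues, so that while one frame is being binarized the next can already be normalized. Frames are simplified to flat lists of pixel values; the `normalize` and `binarize` stages, the `run_pipeline` helper and the `None` end-of-stream marker are assumptions made for illustration.

```python
import queue
import threading

END = None  # hypothetical end-of-stream marker

def stage(fn, inbox, outbox):
    """Apply fn to every item arriving on inbox and forward the results to outbox."""
    while True:
        item = inbox.get()
        if item is END:
            outbox.put(END)  # propagate end of stream to the next stage
            return
        outbox.put(fn(item))

def run_pipeline(frames, stages):
    """Run each stage in its own thread; stage i reads queue i and writes queue i+1."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
               for i, fn in enumerate(stages)]
    for t in threads:
        t.start()
    for frame in frames:
        queues[0].put(frame)
    queues[0].put(END)
    results = []
    while (item := queues[-1].get()) is not END:
        results.append(item)
    for t in threads:
        t.join()
    return results

def normalize(frame):
    """Stretch pixel values to the range [0, 1]."""
    lo, hi = min(frame), max(frame)
    span = (hi - lo) or 1  # avoid division by zero for constant frames
    return [(v - lo) / span for v in frame]

def binarize(frame):
    """Threshold normalized values at 0.5."""
    return [1 if v >= 0.5 else 0 for v in frame]
```

`run_pipeline([[0, 5, 10], [2, 4, 6, 8]], [normalize, binarize])` returns `[[0, 1, 1], [0, 0, 1, 1]]`, in input order because each stage is a single thread reading a FIFO queue. The queues are unbounded here; a real implementation would bound them so a fast producer cannot run arbitrarily far ahead.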
6. Parallel hardware architectures: fine grained
- Irregular
  - Superscalar (most modern microprocessors)
  - VLIW (DSPs)
- Regular
  - Vector (supercomputers, MMX)
  - SIMD (graphics processors)
- Custom
  - FPGA
7. Parallel hardware architectures: coarse grained
- Homogeneous
  - Multi-core, SMP
  - Cluster
- Heterogeneous
  - Embedded systems
  - Grid
8. Obstacles
- Programming
  - Synchronization, bookkeeping
  - Different systems, languages, optimization strategies
- Choosing an architecture
  - The program must be analyzed before it is written
  - Additional requirements or unexpected performance may require a rewrite
9. Architecture-independent parallel programming
- Data parallelism
  - Separate the synchronization pattern from the computation
  - The library provides the pattern, the user provides the computation
- Task farm and pipeline parallelism
  - Operations do not work on images, but on streams
  - A sequence of operation calls does not imply an execution order; it defines a stream graph
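The separation of synchronization pattern and computation can be illustrated with a higher-order function: a hypothetical `pixel_skeleton` plays the library's role and handles partitioning and thread management, while the user supplies only a per-pixel function. This is a sketch under those assumptions, not the API of the library described in the talk.

```python
from concurrent.futures import ThreadPoolExecutor

def pixel_skeleton(fn, img, workers=4):
    """Library side (hypothetical): apply fn to every pixel, one row per task.
    Partitioning, scheduling and joining are hidden from the caller."""
    def do_row(row):
        return [fn(v) for v in row]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(do_row, img))

# User side: only the computation for a single pixel, no threads or locks.
def invert(v):
    return 255 - v
```

`pixel_skeleton(invert, [[0, 10], [200, 255]])` gives `[[255, 245], [55, 0]]`. Since `invert` knows nothing about threads, the same function could in principle be handed to a sequential or SIMD implementation of the same skeleton.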
10. Algorithmic Skeletons
11. Example skeletons
- Pixel
- Neighbourhood
- Recursive neighbourhood
- Stack
- Filter
- Associative reduction
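Of the skeletons listed, the associative reduction is straightforward to sketch: because the operator is associative, the pixels can be reduced in independent chunks and the partial results combined afterwards. `reduce_skeleton` and its parameters are hypothetical. The chunk results are combined in their original order, so the operator must be associative but need not be commutative.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def reduce_skeleton(op, values, identity, n_chunks=4):
    """Reduce values with an associative op: chunks in parallel, then the partials.
    identity must be the neutral element of op (e.g. 0 for +, -inf for max)."""
    n = len(values)
    chunks = [values[i * n // n_chunks:(i + 1) * n // n_chunks] for i in range(n_chunks)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(lambda c: reduce(op, c, identity), chunks))
    # pool.map preserves chunk order, so non-commutative ops are combined correctly.
    return reduce(op, partials, identity)
```

For example, `reduce_skeleton(max, [3, 9, 2, 7, 5], float("-inf"))` returns 9, and `reduce_skeleton(lambda a, b: a + b, list(range(101)), 0)` returns 5050.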
12. Constructing stream graphs
- By program (dynamic)
  - capture(orig)
  - normalize(orig, norm)
  - dx(orig, x_der, 1.0)
  - dy(orig, y_der, 1.0)
  - direction(x_der, y_der, dir)
  - display(dir)
- Visually (static)
  [Figure: stream graph with nodes capture, normalize, dx, dy, direction, display]
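To illustrate how a sequence of calls can define a graph rather than an order, the sketch below records each call together with the streams it reads and writes, then groups operations into levels whose members do not depend on one another. `StreamGraph`, `call` and `levels` are hypothetical names, and the scalar `1.0` arguments of `dx` and `dy` are left out because they do not affect the graph. Applied to the calls on this slide, it finds that `normalize`, `dx` and `dy` could run concurrently.

```python
from collections import defaultdict

class StreamGraph:
    """Hypothetical recorder: operation calls become graph nodes instead of executing."""

    def __init__(self):
        self.nodes = []      # list of (operation name, set of node indices it depends on)
        self.producer = {}   # stream name -> index of the node that last wrote it

    def call(self, op, inputs, outputs):
        deps = {self.producer[s] for s in inputs if s in self.producer}
        self.nodes.append((op, deps))
        for s in outputs:
            self.producer[s] = len(self.nodes) - 1

    def levels(self):
        """Group operations by dependency depth; operations in one level are independent."""
        depth = []
        for _, deps in self.nodes:
            depth.append(1 + max((depth[d] for d in deps), default=-1))
        grouped = defaultdict(list)
        for (op, _), d in zip(self.nodes, depth):
            grouped[d].append(op)
        return [grouped[k] for k in sorted(grouped)]

g = StreamGraph()
g.call("capture", [], ["orig"])
g.call("normalize", ["orig"], ["norm"])
g.call("dx", ["orig"], ["x_der"])
g.call("dy", ["orig"], ["y_der"])
g.call("direction", ["x_der", "y_der"], ["dir"])
g.call("display", ["dir"], [])
print(g.levels())  # [['capture'], ['normalize', 'dx', 'dy'], ['direction'], ['display']]
```

This sketch only tracks read-after-write dependencies between streams; a stream written twice would need extra bookkeeping.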
13. Mapping stream graphs to processors
14. Dealing with heterogeneous tasks
15. Dealing with interconnect
16. Dealing with dependencies
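Slides 13 to 16 survive only as titles, so the mapping method they presented cannot be recovered. As a generic illustration (not the author's method), a greedy earliest-finish-time list scheduler maps the stream graph of slide 12 onto processors while respecting dependencies and per-processor execution times. The processor names `cpu` and `simd` and all execution times are made up, and interconnect cost is ignored.

```python
def map_to_processors(tasks, deps, cost):
    """Greedy earliest-finish-time list scheduling (a generic heuristic).
    tasks: task names in an order that respects dependencies
    deps:  task -> list of tasks whose results it needs
    cost:  task -> {processor: execution time}; a processor that is missing
           cannot run the task, which models heterogeneous tasks
    Interconnect (communication) cost is ignored in this sketch."""
    proc_free = {}   # processor -> time at which it becomes idle
    finish = {}      # task -> finish time
    placement = {}   # task -> chosen processor
    for t in tasks:
        ready = max((finish[d] for d in deps.get(t, [])), default=0)
        best_proc, best_end = None, None
        for p, c in cost[t].items():
            end = max(ready, proc_free.get(p, 0)) + c
            if best_end is None or end < best_end:
                best_proc, best_end = p, end
        placement[t], finish[t] = best_proc, best_end
        proc_free[best_proc] = best_end
    return placement, max(finish.values())

# Hypothetical processors and execution times, for illustration only.
tasks = ["capture", "normalize", "dx", "dy", "direction", "display"]
deps = {"normalize": ["capture"], "dx": ["capture"], "dy": ["capture"],
        "direction": ["dx", "dy"], "display": ["direction"]}
cost = {"capture": {"cpu": 1}, "normalize": {"cpu": 2, "simd": 1},
        "dx": {"cpu": 4, "simd": 1}, "dy": {"cpu": 4, "simd": 1},
        "direction": {"cpu": 2}, "display": {"cpu": 1}}
placement, makespan = map_to_processors(tasks, deps, cost)
```

With these made-up numbers `normalize`, `dx` and `dy` land on `simd`, the other tasks on `cpu`, and the makespan is 7 time units. Each processor runs its tasks one after another in assignment order.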
17. Choosing an architecture automatically
- An architecture-independent program allows automatic analysis after it is written, but before an architecture is chosen
- Given a set of constraints, an architecture can then be chosen automatically by optimizing a cost function
- The tradeoff between cost, power and performance must still be made by the designer
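A constrained single-objective choice can be sketched as: discard candidate architectures that violate the constraints, then minimize one cost function over the rest. The `Candidate` fields, the metric values (imagined as coming from analyzing the program) and the constraint limits below are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    cost: float     # price, arbitrary units (hypothetical)
    power: float    # watts (hypothetical)
    latency: float  # ms per frame, as estimated by program analysis (hypothetical)

def choose(candidates, max_power, max_latency):
    """Minimize cost subject to power and latency constraints; None if nothing fits."""
    feasible = [c for c in candidates if c.power <= max_power and c.latency <= max_latency]
    return min(feasible, key=lambda c: c.cost) if feasible else None

# Made-up candidates for illustration only.
candidates = [
    Candidate("1 core", 10, 2, 80),
    Candidate("4 cores", 25, 6, 25),
    Candidate("core+FPGA", 40, 4, 10),
    Candidate("GPU", 60, 30, 5),
]
```

`choose(candidates, max_power=10, max_latency=30)` picks `"4 cores"`; tightening the latency limit to 20 ms makes `"core+FPGA"` the cheapest feasible option. Which constraints and which objective to use remains the designer's decision.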
18. Design Space Exploration
[Figure: design space exploration diagram with elements Program, Architecture, Analyze, Metrics, Explore]
19. Search strategy: constrained single objective
20. Search strategy: multiobjective tradeoff iteration
21. Search strategy: Strength Pareto
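The title "Strength Pareto" most likely refers to the Strength Pareto Evolutionary Algorithm (SPEA). The slide content is lost, so the following only sketches a SPEA2-style raw fitness assignment, not the full evolutionary search or the author's setup: the strength of a design is how many others it dominates, and its raw fitness is the summed strength of the designs that dominate it. Non-dominated designs (the Pareto front) get raw fitness 0. The (cost, latency) points are hypothetical, and both objectives are minimized.

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def strength_fitness(points):
    """SPEA2-style raw fitness: S(i) = number of points i dominates,
    R(i) = sum of S(j) over all j that dominate i; R == 0 means non-dominated."""
    S = [sum(dominates(p, q) for q in points) for p in points]
    return [sum(S[j] for j, q in enumerate(points) if dominates(q, p)) for p in points]

# Hypothetical (cost, latency) pairs for six candidate architectures.
points = [(10, 80), (25, 25), (40, 10), (60, 5), (50, 20), (45, 15)]
fitness = strength_fitness(points)
front = [p for p, r in zip(points, fitness) if r == 0]
```

Here the first four points form the Pareto front; `(45, 15)` is dominated by `(40, 10)` and `(50, 20)` by both of those, so the latter is penalized more heavily.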
22. Conclusions
- Architecture-independent programming allows:
  - Parallel programming without bookkeeping
  - Targeting heterogeneous systems
  - Choosing the most appropriate architecture automatically
- http://www.qi.tnw.tudelft.nl/wcaarls/smartcam
23. Overview
- Parallelism in image processing
- Parallel hardware architectures
- Architecture-independent parallel programming
  - Algorithmic skeletons
  - Stream programming
- Choosing an appropriate architecture
  - Design Space Exploration
24. Exploiting parallelism: fine grained, irregular
- Superscalar
  - Dataflow-driven instruction dispatch and reordering
  - Most modern microprocessors
  - Parallelism extracted automatically by the processor
- Very Long Instruction Word
  - Multiple instructions per word
  - DSPs, Itanium
  - Parallelism extracted automatically by the compiler
25. Exploiting parallelism: fine grained, regular
- Vector instructions
  - Supercomputers
  - MMX/SSEx
  - Special instructions/datatypes
- Single Instruction Multiple Data
  - Graphics processors
  - Special languages
26. Exploiting parallelism: coarse grained
- Multiprocessing
  - Multiple processors/cores sharing a memory
  - Shared-memory threading libraries (pthread, OpenMP)
- Clusters
  - Relatively loosely coupled systems connected by a network
  - Message-passing libraries (MPI)
- Heterogeneous systems
  - Exploit differences in algorithmic requirements
  - Multiple paradigms in a single application
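The difference between the shared-memory and message-passing models can be sketched inside one process. In the first, threads update a common variable under a lock, in the style of pthread or OpenMP code; in the second, workers update no shared state and only send their results, as an MPI program would with send/receive. Python threads and `queue.Queue` serve as stand-ins here; no real MPI calls are shown, and the function names are hypothetical.

```python
import queue
import threading

def shared_memory_sum(chunks):
    """Shared-memory style: workers add into one shared total, protected by a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # without the lock, concurrent updates could be lost
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

def message_passing_sum(chunks):
    """Message-passing style: each worker sends its partial result as a message,
    and a single receiver combines them."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))  # stands in for a 'send' to the root process

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    result = sum(mailbox.get() for _ in chunks)  # stands in for 'receive' at the root
    for t in threads:
        t.join()
    return result
```

Both return 21 for `[[1, 2, 3], [4, 5], [6]]`. The first only works where all workers can see the same memory; the second maps naturally onto networked cluster nodes.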