Title: What is a Parallel Algorithm?
1 Introduction
- What is a Parallel Algorithm?
- Why Parallel Algorithms?
- Evolution and Convergence of Parallel Algorithms
- Fundamental Design Issues
2 What is a Parallel Algorithm?
- An algorithm that uses multiple components of hardware and software
- Levels of parallelism (a minimal data-parallel sketch follows this slide):
  - Circuits -- VLSI arithmetic
  - Functional units -- instruction level
  - Processors
  - Processes
  - Tasks (jobs)
  - Data
- Closely related to parallel architectures and computation models
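To make the "Data" level concrete, here is a minimal data-parallel sketch (mine, not from the slides) that splits a summation across worker processes using Python's standard multiprocessing module; the worker count and chunking scheme are illustrative assumptions.

```python
# Minimal data-parallel sketch: split a summation across worker processes.
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker reduces its own slice of the data independently.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    workers = 4                              # illustrative worker count
    step = len(data) // workers
    chunks = [data[i * step:(i + 1) * step] for i in range(workers)]
    with Pool(workers) as pool:
        total = sum(pool.map(partial_sum, chunks))  # combine partial results
    assert total == sum(data)                # same answer as the serial sum
    print(total)
```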
3 What is a Parallel Architecture?
- A parallel computer is a collection of processing elements that cooperate to solve problems fast
- Issues
  - Resources
    - how large a collection?
    - how powerful are the processing elements?
  - Communication and Synchronization
    - how do the elements cooperate and communicate?
    - how are data transmitted between processors?
    - what are the abstractions and primitives for cooperation?
  - Performance and Scalability
    - how does it all translate into performance?
    - how does it scale?
4 Inevitability of Parallel Computing
- Application demands: demand for computing cycles
  - Scientific computing: CFD, biology, chemistry, physics, ...
  - General-purpose computing: video, graphics, CAD, databases, TP, ...
- Technology Trends
  - Number of transistors on chip growing rapidly
  - Clock rates expected to go up only slowly
- Architecture Trends
  - Instruction-level parallelism valuable but limited
  - Coarser-level parallelism, as in multiprocessors (MPs), the most viable approach
- Economics
- Current trends
  - Today's microprocessors have multiprocessor support
  - Servers and workstations becoming MPs: Sun, SGI, DEC, Compaq, ...
  - Tomorrow's microprocessors are multiprocessors
5 Driving Forces of Parallel Architectures
- Application Trends
  - Applications provide wish lists (Grand Challenge Problems)
  - Weather simulation, data mining, N-body
  - Some problems require teraflops (10^12) to petaflops (10^15)
- VLSI Technology Trends
  - Smaller minimum feature size (transistor size)
  - Increasing die size -> transistor count doubles every 3 years
- Architectural Trends
  - Trends in computer performance
  - Annual growth rate of 1.35x before 1985, 1.85x after 1986 (RISC)
- Programming Model Convergence
- Economics
6 Application Trends
- Transition to parallel computing has occurred for scientific and engineering computing
- Rapid progress in commercial computing
  - Databases and transaction processing as well as financial applications
  - Usually smaller-scale, but large-scale systems also used
- Desktops also use multithreaded programs, which are a lot like parallel programs
- Demand for improving throughput on sequential workloads
  - Greatest use of small-scale multiprocessors
- Solid application demand exists and will increase
7 General Technology Trends
- Microprocessor performance increases 50-100% per year
- Transistor count doubles every 3 years
- DRAM size quadruples every 3 years
- Huge investment per generation is carried by the huge commodity market
- It is not that single-processor performance is plateauing, but that parallelism is a natural way to improve it
[Figure: Integer and FP performance of commercial microprocessors, 1987-1992 (Sun 4/260, MIPS M/120, MIPS M2000, IBM RS6000/540, HP 9000/750, DEC Alpha)]
8 Technology: A Closer Look
- Basic advance is decreasing feature size (λ)
  - Circuits become either faster or lower in power
  - Die size is growing too
- Clock rate improves roughly in proportion to the improvement in λ
- Number of transistors improves like λ^2 (or faster)
- Performance > 100x per decade: clock rate is about 10x, the rest comes from transistor count (see the sketch after this list)
- How to use more transistors?
  - Parallelism in processing
    - multiple operations per cycle reduces CPI
  - Locality in data access
    - avoids latency and reduces CPI
    - also improves processor utilization
  - Both need resources, so there is a tradeoff
- Fundamental issue is resource distribution, as in uniprocessors
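As a sanity check on the ">100x per decade" bullet, here is a small back-of-the-envelope sketch (mine, not from the slides); it uses only the trend figures quoted above.

```python
# Back-of-the-envelope check of the ">100x per decade" performance claim,
# using the trend figures quoted in the slides above.
clock_gain = 10.0                    # clock rate improves roughly 10x per decade
doublings = 10 / 3                   # transistor count doubles every 3 years
transistor_gain = 2 ** doublings     # ~10.1x more transistors per decade

# If the extra transistors are converted into parallelism (lowering CPI),
# the combined gain is roughly the product of the two factors.
performance_gain = clock_gain * transistor_gain
print(f"transistors: {transistor_gain:.1f}x, performance: {performance_gain:.0f}x")
# transistors: 10.1x, performance: 101x  -> consistent with >100x per decade
```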
9 Economics
- Commodity microprocessors are not only fast but CHEAP
  - Development cost is tens of millions of dollars ($5-100 million typical)
  - BUT, many more are sold compared to supercomputers
  - Crucial to take advantage of the investment and use the commodity building block
  - Exotic parallel architectures are no more than special-purpose machines
- Multiprocessors are being pushed by software vendors (e.g. databases) as well as hardware vendors
- Standardization by Intel makes small, bus-based SMPs a commodity
- Desktop: a few smaller processors versus one larger one?
  - Multiprocessor on a chip
10Challenge of Parallel Processing
Micro-processors, Memory are cheap. Connect
bunches of them together. Use either
aynchonously or syncronously. Example
each borad 100 of 10MFLOP micro-processor
each system 100 Board
Speed 100GFLOP Reason Cache
Coherancy Extra Design Time for Bus,
interface Compiler may not find
parallelism easily Communication and I/O
complexity Economics Amdahl's Law
Writing Parallel Programs not easy
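The peak-rate arithmetic above multiplies out as follows (an illustrative check, using only the numbers on this slide).

```python
# Verify the slide's peak-performance arithmetic.
mflops_per_processor = 10      # each microprocessor delivers 10 MFLOPS
processors_per_board = 100
boards_per_system = 100

peak_mflops = mflops_per_processor * processors_per_board * boards_per_system
print(f"peak = {peak_mflops / 1e3:.0f} GFLOPS")  # 100,000 MFLOPS = 100 GFLOPS
```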
11 Amdahl's Law
- Speedup S is bounded by

  S <= 1 / (f + (1-f)/p)

  where p is the number of PEs and f is the fraction of the work that is serial.
- Example: if 10% of a program must be executed sequentially, the maximum speedup is 10
- To achieve S = 80 with p = 100, 99.75% of the code must be parallel
- How to overcome the law? (see the sketch after this list)
  - Increase the problem size: scaled speedup
  - Develop parallel algorithms that maximize parallelism and minimize sequential operations
- How to reduce communication overhead?
  - Increase communication bandwidth
  - Decrease communication latency
  - Communication latency hiding
  - Increase granularity (Figure 8.4 of Hennessy's book)
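To make the numbers above concrete, here is a minimal sketch (mine, not from the slides) that evaluates Amdahl's bound for both examples, plus Gustafson's standard scaled-speedup formula as one reading of "increase problem size"; the function names are my own.

```python
# Amdahl's Law: speedup with p PEs when a fraction f of the work is serial.
def amdahl_speedup(f: float, p: int) -> float:
    return 1.0 / (f + (1.0 - f) / p)

# Gustafson's scaled speedup: grow the problem so the parallel part dominates.
# (A standard formula; the slides only name the idea "scaled speedup".)
def gustafson_speedup(f: float, p: int) -> float:
    return p - f * (p - 1)

# Example 1: 10% serial -> speedup never exceeds 10, however many PEs we add.
print(amdahl_speedup(0.10, 10**9))      # ~10.0

# Example 2: with p = 100, S = 80 needs ~99.75% of the code to be parallel.
print(amdahl_speedup(0.0025, 100))      # ~80.2

# Scaled speedup for the same 0.25% serial fraction on 100 PEs.
print(gustafson_speedup(0.0025, 100))   # ~99.75
```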