Title: Lecture 1: Introduction
Why Parallel Programming?
- Faster Computation
  - Solve compute-intensive problems faster
    - Make infeasible problems feasible
    - Reduce design time
  - Solve larger problems in the same amount of time
    - Improve answer precision
    - Reduce design time
Definitions
- Parallel Computing: the use of parallel computers to reduce the time needed to solve a single computational problem
- Parallel Computer: a multi-processor computer system supporting parallel programming
- Multicomputer: a parallel computer constructed from multiple computers and an interconnecting network
- Centralized Multiprocessor (Symmetric Multiprocessor, SMP): all CPUs share access to a single global memory
- Parallel Programming: programming in a language that supports concurrency explicitly
Parallel Programming Libraries
- MPI (Message Passing Interface) (a minimal example follows this list)
  - Standard specification for message-passing libraries
  - Libraries available on virtually all parallel computers
  - Free libraries also available for networks of workstations or commodity clusters
- OpenMP
  - Application programming interface (API) for shared-memory systems
  - Supports high-performance parallel programming of symmetric multiprocessors
- Hybrid MPI/OpenMP
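
As a concrete taste of the MPI style, here is a minimal "hello world" sketch in C. MPI_Init, MPI_Comm_rank, MPI_Comm_size, and MPI_Finalize are standard MPI calls; the compiler and launcher names (mpicc, mpirun) vary by installation.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                 /* start the MPI runtime          */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id (0..size-1)  */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes      */

    printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();                         /* shut the runtime down          */
    return 0;
}
```

Launched with something like mpirun -np 4 ./hello, each of the four processes prints its own rank.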
Classical Science vs. Modern Methods
[Figure: the classical scientific method cycles through Nature, Observation, Theory, and Physical Experimentation; modern methods replace physical experimentation with simulation, e.g., testing car models, simulating earthquakes and volcanoes, protein folding, and climate change.]
Evolution of Supercomputers
- 1906: Lee De Forest invents the electronic valve (vacuum tube)
- 1936: Z1 (Konrad Zuse), used for calculations for the Henschel Aircraft Company
- 1943: Church-Turing Thesis (Alan Turing and Alonzo Church)
  "I think there is a world market for maybe five computers." - Thomas Watson, chairman of IBM
- 1944: Harvard Mark I (Howard Aiken and Grace Hopper), used for gunnery and ballistic calculations
- 1946: ENIAC I (John Mauchly and J. Presper Eckert), used for writing artillery-firing tables
- 1947: First transistor (William B. Shockley, John Bardeen, and Walter H. Brattain); magnetic drum storage
- 1949-52: EDVAC (von Neumann); first magnetic tape
  "Computers in the future may weigh no more than 1.5 tons." - Popular Mechanics
- 1950: Alan Turing's test of machine intelligence
- 1954: IBM 650, the first mass-produced computer
  "I have traveled the length and breadth of this country and talked with the best people, and I can assure you that data processing is a fad that won't last out the year." - The editor in charge of business books for Prentice Hall
- 1958: Integrated circuit (Jack Kilby and Robert Noyce)
- 1965: CDC 6600 (Seymour Cray), the first supercomputer
- 1970: Unix (Dennis Ritchie and Kenneth Thompson)
- 1971: First microprocessor, developed by Intel
- 1976: Cray-1 (Seymour Cray), the first commercially developed supercomputer
  "There is no reason anyone would want a computer in their home." - Ken Olson, Digital Equipment Corp.
- 1978: Intel 8086; first PC; first video game
- 1981: Cosmic Cube (Charles Seitz and Geoffrey Fox)
- 1985: Microsoft Windows released
- 1986: Connection Machine (Thinking Machines Corporation); parallel processing introduced
- 1989: World Wide Web (Tim Berners-Lee)
  "Windows NT addresses 2 gigabytes of RAM, which is more than any application will ever need." - Microsoft
- 1994: Beowulf (Thomas Sterling and Don Becker), NASA's Goddard Space Flight Center
- 1997-2000: ASCI Red (Intel), ASCI Blue Pacific and ASCI White (IBM)
- 2002: Earth Simulator (NASDA, JAERI, and JAMSTEC)
- 2005: Blue Gene (IBM), used for magnetohydrodynamics (MHD) and ITER (nuclear fusion) simulations
60 Years of Speed Increases
One Billion Times Faster!
CPUs: 1 Million Times Faster
- Moore's Law (1965): the number of transistors on an integrated circuit (and with it, computing power) doubles every 24 months
- Faster clock speeds
- Greater system concurrency
  - Multiple functional units
  - Concurrent instruction execution
  - Speculative instruction execution
Systems: 1 Billion Times Faster
- Processors are 1 million (10^6) times faster
- Combine thousands of processors
- Parallel computer
  - Multiple processors
  - Supports parallel programming
- Parallel computing: using a parallel computer to execute a program faster
Copy-cat Strategy
- Microprocessor
  - 1% of the speed of a supercomputer
  - 0.1% of the cost of a supercomputer
- Parallel computer with 1000 microprocessors
  - 10x the speed of a supercomputer
  - Same cost as a supercomputer
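
The arithmetic behind the strategy: 1000 microprocessors x 1% of a supercomputer's speed = 10x the speed, while 1000 x 0.1% of a supercomputer's cost = 100% of the cost, i.e., the same price.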
Why Didn't Everybody Buy One?
- Supercomputer ≠ Σ CPUs
  - Computation rate ≠ throughput
  - Inadequate I/O
- Software
  - Inadequate operating systems
  - Inadequate programming environments
Beowulf Concept
- NASA (Sterling and Becker)
- Commodity processors
- Commodity interconnect
- Linux operating system
- Message Passing Interface (MPI) library
- High performance per dollar for certain applications
Concurrency Leads to Parallelism
- Identify concurrent operations
- Data dependency graphs
  - Directed graph
  - Vertices = tasks
  - Edges = dependencies
Three Examples (1): Product of n Numbers
- Divide the numbers among k processors
- Compute sequential products of the elements within each processor
- Multiply together the results from the processors (see the sketch below)
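
A minimal sketch of this scheme with OpenMP in C (the array contents and size are arbitrary illustration choices; the reduction clause is what combines the per-thread partial products):

```c
#include <stdio.h>
#include <omp.h>

int main(void)
{
    double a[] = {1.5, 2.0, 4.0, 0.5, 3.0, 2.0, 1.0, 8.0};
    int n = sizeof a / sizeof a[0];
    double product = 1.0;

    /* Each thread multiplies its share of the elements into a private
       partial product; the reduction combines the partials at the end. */
    #pragma omp parallel for reduction(*:product)
    for (int i = 0; i < n; i++)
        product *= a[i];

    printf("product = %g\n", product);   /* 1.5*2*4*0.5*3*2*1*8 = 288 */
    return 0;
}
```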
Three Examples (2): Roots of a Quadratic
- Find the roots of ax^2 + bx + c = 0:
  1. v1 = 2a
  2. v2 = 4a
  3. v3 = b*b
  4. v4 = v2*c
  5. v5 = v3 - v4
  6. v6 = sqrt(v5)
  7. v7 = v6 - b
  8. v8 = -v6 - b
- The roots are then v7/v1 and v8/v1; steps 1-4 are mutually independent, as are steps 7 and 8
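
The same steps as a C sketch (input values chosen arbitrarily); the comments mark which assignments are independent and could execute concurrently:

```c
#include <stdio.h>
#include <math.h>

int main(void)
{
    double a = 1.0, b = -5.0, c = 6.0;   /* roots should be 3 and 2 */

    /* Steps 1-4 depend only on the inputs, so all four
       could run in parallel. */
    double v1 = 2 * a;
    double v2 = 4 * a;
    double v3 = b * b;
    double v4 = v2 * c;

    double v5 = v3 - v4;      /* needs v3 and v4 */
    double v6 = sqrt(v5);     /* needs v5        */

    /* Steps 7 and 8 both need only v6, so they are again
       independent of each other. */
    double v7 =  v6 - b;
    double v8 = -v6 - b;

    printf("roots: %g %g\n", v7 / v1, v8 / v1);
    return 0;
}
```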
Three Examples (3): Pipelined Factorials
- For k = 2 to n
  - Calculate (k-1)! (done by the previous units)
  - Send the output to the next unit
  - Multiply it by k to obtain k!
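
A sequential sketch of the pipeline idea in C (a real pipeline would place each stage on its own processor, e.g., as MPI ranks passing the running product along; here one loop plays every stage in turn):

```c
#include <stdio.h>

int main(void)
{
    int n = 10;
    unsigned long long fact = 1;   /* output of the first (k = 1) stage */

    /* Stage k receives (k-1)! from its predecessor, multiplies by k,
       and hands k! to its successor. */
    for (int k = 2; k <= n; k++)
        fact *= k;

    printf("%d! = %llu\n", n, fact);   /* 10! = 3628800 */
    return 0;
}
```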
Different Forms of Parallelism
- Data Parallelism
  - Independent tasks apply the same operation to different elements of a data set
- Functional Parallelism
  - Independent tasks apply different operations to different data elements
- Pipelining
  - Divide a computation into several stages (the first two forms are contrasted in the sketch below)
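
A sketch contrasting the first two forms with OpenMP (the array names and the operations are made up for illustration): a parallel for expresses data parallelism, while sections expresses functional parallelism.

```c
#include <math.h>
#include <omp.h>

#define N 1000
double a[N], b[N], mean, peak;

void example(void)
{
    /* Data parallelism: every thread applies the SAME operation
       (square root) to a different slice of the array. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        b[i] = sqrt(a[i]);

    /* Functional parallelism: two DIFFERENT operations run at
       the same time on the same data. */
    #pragma omp parallel sections
    {
        #pragma omp section
        {   /* one thread computes the mean */
            double s = 0;
            for (int i = 0; i < N; i++) s += a[i];
            mean = s / N;
        }
        #pragma omp section
        {   /* another thread finds the maximum */
            double m = a[0];
            for (int i = 1; i < N; i++) if (a[i] > m) m = a[i];
            peak = m;
        }
    }
}
```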
Example: Data Clustering
- Data mining: looking for meaningful patterns in large data sets
- Data clustering: organizing a data set into clusters of similar items
- Data clustering can speed the retrieval of related items
Document Vectors
[Figure: documents plotted as vectors in a two-dimensional term space with axes "Moon" and "Rocket"; titles such as "The Geology of Moon Rocks", "The Story of Apollo 11", "A Biography of Jules Verne", and "Alice in Wonderland" land at different points depending on how often each term appears.]
Document Clustering
Data Clustering
- Input: N documents
- For each of the N documents, generate a document vector
- Choose K initial cluster centers
- Repeat
  - For each document, find the closest center and compute the performance function
  - Adjust the K centers to improve the value of the performance function
- Output: K centers (a sketch of this loop follows)
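
A compact C sketch of the clustering loop, assuming the document vectors have already been built and using squared Euclidean distance as the performance function (the vectors, K, and the fixed iteration count are all arbitrary illustration choices):

```c
#include <stdio.h>

#define N 6          /* documents        */
#define D 2          /* terms per vector */
#define K 2          /* clusters         */

double doc[N][D] = { {9,1}, {8,2}, {7,1},    /* "moon" documents   */
                     {1,8}, {2,9}, {1,7} };  /* "rocket" documents */

int main(void)
{
    double center[K][D] = { {9,1}, {1,8} };  /* K initial centers  */
    int owner[N];

    for (int iter = 0; iter < 10; iter++) {
        /* For each document, find the closest center. */
        for (int i = 0; i < N; i++) {
            double best = 1e30;
            for (int k = 0; k < K; k++) {
                double d = 0;
                for (int j = 0; j < D; j++) {
                    double diff = doc[i][j] - center[k][j];
                    d += diff * diff;
                }
                if (d < best) { best = d; owner[i] = k; }
            }
        }
        /* Adjust each center to the mean of its documents. */
        for (int k = 0; k < K; k++) {
            double sum[D] = {0};
            int cnt = 0;
            for (int i = 0; i < N; i++)
                if (owner[i] == k) {
                    for (int j = 0; j < D; j++) sum[j] += doc[i][j];
                    cnt++;
                }
            if (cnt)
                for (int j = 0; j < D; j++) center[k][j] = sum[j] / cnt;
        }
    }
    for (int k = 0; k < K; k++)
        printf("center %d: (%g, %g)\n", k, center[k][0], center[k][1]);
    return 0;
}
```

The per-document distance searches are independent of one another, which is exactly the data parallelism the "Opportunities for Parallelism" slide points out.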
Data Dependence Graph
[Figure: data dependence graph for clustering. Nodes "Input document 1..N" feed "Build document vector 1..N"; these, together with the database, feed "Choose cluster center 1..K"; then "Find closest center to vector 1..N / calculate function" nodes feed "Adjust centers", which loops back; the final node is "Output clusters".]
Opportunities for Parallelism
- Data Parallelism
  - Inputting documents
  - Generating document vectors
  - Picking initial values of cluster centers
  - Finding the closest center to each vector
- Functional Parallelism
  - Generating document vectors and picking initial cluster centers are different operations that can proceed concurrently
Programming Parallel Computers
- Extend compilers: translate sequential programs into parallel programs
- Extend languages: add parallel operations
- Add a parallel language layer on top of a sequential language
- Define a totally new parallel language and compiler system
Extend Compilers
- Parallelizing compiler
  - Detects parallelism in a sequential program
  - Produces a parallel executable program
- Focus has been on making Fortran programs parallel
Extend Compilers
- Can leverage millions of lines of existing serial programs
- Saves time and labor
- Requires no retraining of programmers
- Sequential programming is easier than parallel programming
- Parallelism may be irretrievably lost when programs are written in sequential languages
- Performance of parallelizing compilers on a broad range of applications is still up in the air
Extend Language
- Add functions to a sequential language that (as in the sketch after this list):
  - Create and terminate processes
  - Synchronize processes
  - Allow processes to communicate
  - Distinguish between public data and private data
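
MPI is exactly this kind of function-library extension of C. A minimal sketch of two processes communicating (MPI_Send, MPI_Recv, and the other calls shown are standard MPI; the message content is arbitrary):

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, value;

    MPI_Init(&argc, &argv);               /* processes are created by the runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                       /* rank 0's private data */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Blocks until the message arrives, so the communication
           doubles as synchronization. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

Run with at least two processes (e.g., mpirun -np 2); each rank executes the same program but takes a different branch.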
Extend Language
- Easiest, quickest, and least expensive approach
- Allows existing compiler technology to be leveraged
- New libraries can be ready soon after new parallel computers are available
- Flexibility for code development
- Lack of compiler support to catch errors
- Easy to write programs that are difficult to debug
- Extensions may not be sufficient to express all the parallelism in the application
- Takes longer to compile
Add a Parallel Programming Layer
- Lower layer (sequential part)
  - Core of the computation
  - Each process manipulates its portion of the data to produce its portion of the result
- Upper layer (parallel part)
  - Creation and synchronization of processes
  - Partitioning of data among processes
- Compiler support is needed to translate the two-layered program
- A few research prototypes have been built on these principles
Create a Parallel Language
- Develop a parallel language from scratch
  - Supports sequential and parallel execution
  - Automatic communication and synchronization
- Add parallel constructs to an existing language
  - Manipulation of multidimensional arrays
  - Compiler directives to specify data mapping
- Examples: Fortran 90, High Performance Fortran, C*
New Parallel Languages
- Allow the programmer to communicate parallelism to the compiler
- Improve the probability that the executable will achieve high performance
- Require development of new compilers
- New languages may not become standards
- Programmer resistance
Current Status
- The low-level approach is most popular
  - Augment an existing language with low-level parallel constructs
  - MPI and OpenMP are examples
- Advantages of the low-level approach
  - Efficiency
  - Portability
- Disadvantage: more difficult to program and debug
Websites: History of Computing
- http://inventors.about.com/library/blcoindex.htm?once=true
- http://trillian.randomstuff.org.uk/~stephen/history/timeline-INDEX.html