1
SIMD and Associative Computing
  • Computational Models and Algorithms

2
Associative Computing Topics
  • Introduction
  • References
  • SIMD Computing Architecture
  • Motivation for the MASC Model
  • The MASC and ASC Models
  • A Language Designed for the ASC Model
  • List of Algorithms and Programs Designed for ASC
  • ASC Algorithm Examples
  • ASC Version of Prim's MST Algorithm

3
Comment on Slides Included
  • Some of these slides will be covered only lightly
    or else left for students to read.
  • The emphasis here is to provide an introduction
    to material covered, not a deep understanding.
  • Inclusion of these slides will provide a better
    survey of this material.
  • This material is a useful background for the Air
    Traffic Control example and projects we expect to
    use in this course.

4
Associative Computing References
  • Note: The KSU papers below are available on the
    website http://www.cs.kent.edu/parallel/
  • (Click on the link to papers.)
  • Maher Atwah, Johnnie Baker, and Selim Akl, "An
    Associative Implementation of Classical Convex
    Hull Algorithms," Proc. of the IASTED International
    Conference on Parallel and Distributed Computing
    and Systems, 1996, pp. 435-438.
  • Mingxian Jin, Johnnie Baker, and Kenneth Batcher,
    "Timings for Associative Operations on the MASC
    Model," Proc. of the 15th International Parallel
    and Distributed Processing Symposium (Workshop
    on Massively Parallel Processing), San Francisco,
    April 2001.

5
Associative Computing References
  1. Jerry Potter, Johnnie Baker, Stephen Scott,
    Arvind Bansal, Chokchai Leangsuksun, and Chandra
    Asthagiri, "An Associative Computing Paradigm,"
    Special Issue on Associative Processing, IEEE
    Computer, 27(11):19-25, Nov. 1994. (Note: MASC
    is called ASC in this article.)
  2. Jerry Potter, Associative Computing - A
    Programming Paradigm for Massively Parallel
    Computers, Plenum Publishing Company, 1992.

6
SIMD slides from Chapter 2
7
Alternate Names for SIMDs
  • Recall that all active processors of a true SIMD
    computer must simultaneously access the same
    memory location.
  • The value in the i-th processor can be viewed as
    the i-th component of a vector.
  • SIMD machines are sometimes called vector
    computers [Jordan et al.] or processor arrays
    [Quinn 94, 04] based on their ability to execute
    vector and matrix operations efficiently.

8
SIMD Architecture
  • Has only one control unit.
  • Scientific applications have data parallelism

9
Data/instruction Storage
  • Front end computer
  • Also called the control unit
  • Holds and runs program
  • Data manipulated sequentially
  • Processor array
  • Data manipulated in parallel

10
Processor Array Performance
  • Performance = work done per time unit
  • Performance of processor array
  • Speed of processing elements
  • Utilization of processing elements

11
Performance Example 1
  • 1024 processors
  • Each adds a pair of integers in 1 μsec (1
    microsecond, i.e., 10^-6 seconds).
  • What is the performance when adding two
    1024-element vectors (one per processor)?
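  • Worked answer: all 1024 additions finish together
    in 1 μsec, so performance = 1024 operations /
    10^-6 s ≈ 1.02 × 10^9 operations per second.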

12
Performance Example 2
  • 512 processors
  • Each adds two integers in 1 μsec
  • What is the performance when adding two vectors
    of length 600?
  • Since 600 > 512, 88 processors must each add two
    pairs of integers.
  • The other 424 processors add only a single pair
    of integers.
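  • Worked answer: the vector addition now takes two
    lockstep phases, i.e., 2 μsec, so performance =
    600 operations / (2 × 10^-6 s) = 3 × 10^8
    operations per second, well below the peak of
    5.12 × 10^8.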

13
Example of a 2-D Processor Interconnection
Network in a Processor Array
Each VLSI chip has 16 processing elements. Each
PE can simultaneously send a value to a neighbor.
PE = processor element
14
SIMD Execution Style
  • The traditional (SIMD, vector, processor array)
    execution style (Quinn 94, pg 62; Quinn 2004,
    pgs 37-43):
  • The sequential processor that broadcasts the
    commands to the rest of the processors is called
    the front end or control unit (or sometimes
    host).
  • The front end is a general purpose CPU that
    stores the program and the data that is not
    manipulated in parallel.
  • The front end normally executes the sequential
    portions of the program.
  • Each processing element has a local memory that
    cannot be directly accessed by the control unit
    or other processing elements.

15
SIMD Execution Style
  • Collectively, the individual memories of the
    processing elements (PEs) store the (vector) data
    that is processed in parallel.
  • Called the parallel memory
  • When the front end encounters an instruction
    whose operand is a vector, it issues a command to
    the PEs to perform the instruction in parallel.
  • Although the PEs execute in parallel, some units
    can be allowed to skip any particular
    instruction.

16
Masking in Processor Arrays
  • All the processors work in lockstep except those
    that are masked out (by setting their mask register).
  • The conditional if-then-else works differently on
    processor arrays than in the sequential version
    (a sketch follows this slide):
  • Every active processor tests to see if its data
    meets the negation of the boolean condition.
  • If it does, it sets its mask bit so those
    processors will not participate in the operation
    initially.
  • Next, the unmasked processors execute the THEN
    part.
  • Afterwards, the mask bits (for the original set of
    active processors) are flipped and the newly
    unmasked processors perform the ELSE part.
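A minimal sketch of this masked if-then-else, written in plain
C++ rather than a SIMD language; the PE count, data, and the
then/else bodies are made up for illustration:

```cpp
#include <array>
#include <cstdio>

int main() {
    constexpr int NUM_PES = 8;
    std::array<int, NUM_PES> data = {3, 7, 1, 9, 4, 8, 2, 6};
    std::array<bool, NUM_PES> mask{};            // true = masked out

    // Step 1: each PE tests the NEGATION of COND (here COND: data > 4)
    // and masks itself out if the negation holds.
    for (int pe = 0; pe < NUM_PES; ++pe) mask[pe] = !(data[pe] > 4);

    // Step 2: the unmasked PEs execute the THEN part in lockstep.
    for (int pe = 0; pe < NUM_PES; ++pe)
        if (!mask[pe]) data[pe] -= 4;            // A: then-part

    // Step 3: flip the mask bits; the newly unmasked PEs run the ELSE part.
    for (int pe = 0; pe < NUM_PES; ++pe) mask[pe] = !mask[pe];
    for (int pe = 0; pe < NUM_PES; ++pe)
        if (!mask[pe]) data[pe] += 10;           // B: else-part

    for (int v : data) std::printf("%d ", v);    // 13 3 11 5 14 4 12 2
    std::printf("\n");
}
```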

17
if (COND) then A else B
18
if (COND) then A else B
19
if (COND) then A else B
20
SIMD Machines
  • An early SIMD computer designed for vector and
    matrix processing was the Illiac IV computer.
  • Initial development at the University of Illinois,
    1965-70.
  • Moved to NASA Ames; completed in 1972 but not
    fully functional until 1976.
  • See Jordan et al., pg 7, and Wikipedia.
  • The MPP, DAP, the Connection Machines CM-1 and
    CM-2, and the MasPar MP-1 and MP-2 are examples of
    SIMD computers.
  • See Akl, pgs 8-12, and Quinn 94.
  • The CRAY-1 and the Cyber-205 use pipelined
    arithmetic units to support vector operations and
    are sometimes called pipelined SIMDs.
  • (See Jordan et al., pg 7; Quinn 94, pgs 61-62; and
    Quinn 2004, pg 37.)

21
SIMD Machines
  • Quinn 1994, pg 63-67 discusses the CM-2
    Connection Machine (with 64K PEs) and a smaller
    updated CM-200.
  • Our Professor Batcher was the chief architect for
    the STARAN and the MPP (Massively Parallel
    Processor) and an advisor for the ASPRO
  • ASPRO is a small second generation STARAN used by
    the Navy in surveillance planes.
  • Professor Batcher is best known architecturally
    for the MPP, which belongs to the Smithsonian
    Institution and is currently displayed at a D.C.
    airport.

22
Today's SIMDs
  • Many SIMDs are being embedded in sequential
    machines.
  • Others are being built as part of hybrid
    architectures.
  • Others are being built as special purpose
    machines, although some of them could be classified
    as general purpose.
  • Much of the recent work with SIMD architectures
    is proprietary.
  • Often the fact that a parallel computer is SIMD
    is not mentioned by the company building it.

23
ClearSpeed's Inexpensive SIMD
  • ClearSpeed is producing a COTS (commodity off the
    shelf) SIMD Board
  • Not a traditional SIMD, as the hardware doesn't
    synchronize every step.
  • PEs are full CPUs.
  • The hardware design supports efficient
    synchronization.
  • This machine is programmed like a SIMD.
  • The U.S. Navy has observed that these machines
    process radar an order of magnitude faster than
    others.
  • There is quite a bit of information about this at
    www.clearspeed.com and www.wscape.com

24
Special Purpose SIMDs in the Bioinformatics Arena
  • Paracel
  • Acquired by Celera Genomics in 2000
  • Products include the sequence supercomputer
    GeneMatcher, which has a high throughput sequence
    analysis capability
  • Supports over a million processors
  • GeneMatcher was used by Celera in their race with
    the U.S. government to complete the sequencing of
    the human genome
  • TimeLogic, Inc
  • Has DeCypher, a reconfigurable SIMD

25
Advantages of SIMDs
  • Reference: Roosta, pg 10
  • Less hardware than MIMDs, as they have only one
    control unit.
  • Control units are complex, so having only one
    saves significant hardware.
  • Less memory needed than MIMD:
  • Only one copy of the instructions needs to be
    stored.
  • This allows more data to be stored in memory.
  • Less startup time in communicating between PEs.

26
Advantages of SIMDs (cont)
  • A single instruction stream and the synchronization
    of PEs make SIMD applications easier to program,
    understand, and debug.
  • Similar to sequential programming
  • Control flow operations and scalar operations can
    be executed on the control unit while PEs are
    executing other instructions.
  • MIMD architectures require explicit
    synchronization primitives, which create a
    substantial amount of additional overhead.

27
Advantages of SIMDs (cont)
  • During a communication operation between PEs:
  • PEs send data to a neighboring PE in parallel and
    in lock step.
  • There is no need to create a header with routing
    information, as routing is determined by the
    program steps.
  • The entire communication operation is executed
    synchronously.
  • SIMDs are deterministic and have much more
    predictable running times.
  • One can normally compute a tight (worst case) upper
    bound on the time for communication operations.
  • SIMD hardware is less complex, since no message
    decoder is needed in the PEs.
  • MIMDs need a message decoder in each PE.

28
SIMD Shortcomings (with some rebuttals)
  • Claims are from our textbook, i.e., Quinn 2004.
  • Similar statements are found in Grama, et al.
  • Claim 1: Not all problems are data-parallel.
  • While true, most problems seem to have a data
    parallel solution.
  • In Fox, et al., the observation was made in
    their study of large parallel applications at
    national labs that most were data parallel by
    nature, but often had points where significant
    branching occurred.

29
SIMD Shortcomings (with some rebuttals)
  • Claim 2: Speed drops for conditionally executed
    branches.
  • MIMD processors can execute multiple branches
    concurrently.
  • For an if-then-else statement with execution
    times for the "then" and "else" parts being
    roughly equal, about ½ of the SIMD processors are
    idle during its execution.
  • With additional branching, the average number of
    inactive processors can become even higher.
  • With SIMDs, only one of these branches can be
    executed at a time.
  • This reason justifies the study of multiple SIMDs
    (or MSIMDs).

30
SIMD Shortcomings (with some rebuttals)
  • Claim 2 (cont.): Speed drops for conditionally
    executed code.
  • In Fox, et al., the observation was made that
    for the real applications surveyed, the MAXIMUM
    number of active branches at any point in time
    was about 8.
  • The cost of the extremely simple processors used
    in a SIMD is extremely low.
  • Programmers used to worry about full utilization
    of memory but stopped after memory cost
    became insignificant overall.

31
SIMD Shortcomings (with some rebuttals)
  • Claim 3: SIMDs don't adapt to multiple users well.
  • This is true to some degree for all parallel
    computers.
  • If usage of a parallel processor is dedicated to
    an important problem, it is probably best not to
    risk compromising its performance by sharing it.
  • This reason also justifies the study of multiple
    SIMDs (or MSIMDs).
  • The SIMD architecture has not received the attention
    that MIMD has received and can greatly benefit
    from further research.

32
SIMD Shortcomings (with some rebuttals)
  • Claim 4: SIMDs do not scale down well to starter
    systems that are affordable.
  • This point is arguable, and its truth is likely
    to vary rapidly over time.
  • ClearSpeed currently sells a very economical SIMD
    board that plugs into a PC.

33
SIMD Shortcomings (with some rebuttals)
  • Claim 5: SIMDs require customized VLSI for their
    processors, while the expense of control units in
    PCs has dropped.
  • Reliance on COTS (commodity off-the-shelf) parts
    has dropped the price of MIMDs.
  • The expense of PCs (with control units) has dropped
    significantly.
  • However, reliance on COTS has fueled the success
    of the low-level parallelism provided by clusters
    and restricted new innovative parallel
    architecture research for well over a decade.

34
SIMD Shortcomings (with some rebuttals)
  • Claim 5 (cont.)
  • There is strong evidence that the period of
    continual dramatic increases in the speed of PCs
    and clusters is ending.
  • Continued rapid increases in parallel performance
    will be necessary in the future in order to solve
    important problems that are beyond our current
    capabilities.
  • Additionally, with the appearance of very
    economical COTS SIMDs, this claim no longer
    appears to be relevant.

35
Slides from Associative Computing Part 1
36
Associative Computers
  • Associative Computer: a SIMD computer with a few
    additional features supported in hardware.
  • These additional features can be supported (less
    efficiently) in traditional SIMDs in software.
  • The name "associative" comes from the machine's
    ability to locate items in the memory of its PEs by
    content rather than by location.

37
Associative Models
  • The ASC model (for ASsociative Computing) gives a
    list of the properties assumed for an associative
    computer.
  • The MASC (for Multiple ASC) Model
  • Supports multiple SIMD (or MSIMD) computation.
  • Allows model to have more than one Instruction
    Stream (IS)
  • The IS corresponds to the control unit of a SIMD.
  • ASC is the MASC model with only one IS.
  • The one IS version of the MASC model is
    sufficiently important to have its own name.

38
ASC & MASC are KSU Models
  • Several professors and their graduate students at
    Kent State University have worked on these models.
  • The STARAN and the ASPRO fully support the ASC
    model in hardware. The MPP supports ASC, partly
    in hardware and partly in software.
  • Prof. Batcher was chief architect or consultant
  • He received both the Eckert-Mauchly Award and the
    Seymour Cray Computer Engineering Award
  • Dr. Potter developed a language for ASC
  • Dr. Baker works on algorithms for models and
    architectures to support models
  • Dr. Walker is working with a hardware design to
    support the ASC and MASC models.
  • Dr. Batcher and Dr. Potter are currently not
    actively working on ASC/MASC models but still
    provide advice.

39
Motivation
  • The STARAN Computer (Goodyear Aerospace, early
    1970s) and later the ASPRO provided an
    architectural model for associative computing
    embodied in the ASC model.
  • STARAN was built to support Air Traffic Control.
  • ASPRO was built to support Air Defense Systems.
  • ASC extends the data parallel programming style
    to a complete computational model.
  • ASC provides a practical model that supports
    massive parallelism.
  • MASC provides a hybrid data-parallel, control
    parallel model that supports associative
    programming.
  • Descriptions of these models allow them to be
    compared to other parallel models

40
The ASC Model
[Diagram: the ASC model. An IS (instruction stream) is
connected through a broadcast/reduction network to an
array of cells; each cell is a PE paired with its own
local memory, and the cells are linked by a PE (cell)
network.]
41
Basic Properties of ASC
  • Instruction Stream
  • The IS has a copy of the program and can
    broadcast instructions to cells in unit time
  • Cell Properties
  • Each cell consists of a PE and its local memory
  • All cells listen to the IS
  • A cell can be active, inactive, or idle
  • Inactive cells listen but do not execute IS
    commands until reactivated
  • Idle cells contain no essential data and are
    available for reassignment
  • Active cells execute IS commands synchronously

42
Basic Properties of ASC
  • Responder Processing
  • The IS can detect whether a data test is satisfied
    by any of its cells (the responders) in constant
    time (i.e., the any-responders property).
  • The IS can select an arbitrary responder in
    constant time (i.e., the pick-one property).

43
Basic Properties of ASC
  • Constant Time Global Operations (across PEs)
  • Logical OR and AND of binary values
  • Maximum and minimum of numbers
  • Associative searches
  • Communications
  • There are at least two real or virtual networks
  • PE communications (or cell) network
  • IS broadcast/reduction network (which could be
    implemented as two separate networks); a sketch of
    these global operations, and of the responder
    operations on the previous slide, follows below.
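A minimal C++ sketch of the constant-time operations,
simulated sequentially (in hardware a resolver/reduction
network does each in constant time); the PE values and
responder bits are made up for illustration:

```cpp
#include <algorithm>
#include <array>
#include <cstdio>

int main() {
    std::array<int, 6> value = {14, 3, 27, 3, 9, 27};  // one value per PE
    std::array<bool, 6> responder = {false, true, false, true, true, false};

    // Any-responders: a global OR over the responder bits.
    bool any = std::any_of(responder.begin(), responder.end(),
                           [](bool b) { return b; });

    // Pick-one: select one arbitrary responder (here simply the first).
    int picked = -1;
    for (int i = 0; i < 6 && picked < 0; ++i)
        if (responder[i]) picked = i;

    // Global maximum and minimum over all PE values.
    int mx = *std::max_element(value.begin(), value.end());
    int mn = *std::min_element(value.begin(), value.end());

    std::printf("any=%d picked=PE%d max=%d min=%d\n", any, picked, mx, mn);
}
```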

44
Basic Properties of ASC
  • The PE communications network is normally
    supported by an interconnection network
  • E.g., a 2D mesh
  • The broadcast/reduction network(s) are normally
    supported by a broadcast and a reduction network
    (sometimes combined).
  • See posted paper by Jin, Baker, Batcher (listed
    in associative references)
  • Control Features
  • PEs and the IS and the networks all operate
    synchronously, using the same clock

45
Non-SIMD Properties of ASC
  • Observation: the ASC properties that are unusual
    for SIMDs are the constant time operations:
  • Constant time responder processing
  • Any-responders?
  • Pick-one
  • Constant time global operations
  • Logical OR and AND of binary values
  • Maximum and minimum value of numbers
  • Associative Searches
  • These timings are justified by implementations
    using a resolver in the paper by Jin, Baker,
    Batcher (listed in associative references and
    posted).

46
Typical Data Structure for ASC Model
[Table: one row per cell, pairing a busy-idle bit
(1 = busy) with that cell's record fields.]
Make, Color, etc. are fields the programmer
establishes. Various data types are supported.
Some examples will show string data, but strings are
not supported in the ASC simulator.
47
The Associative Search
The IS asks for all cars that are red and on the
lot. PE1 and PE7 respond by setting a mask bit in
their PEs. (A sketch follows.)
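A minimal C++ sketch of this associative search; the
records are made up for illustration, arranged so that
PE1 and PE7 respond as on the slide:

```cpp
#include <array>
#include <cstdio>
#include <string>

struct CarRecord { std::string color; bool on_lot; };

int main() {
    // One record per PE's local memory.
    std::array<CarRecord, 8> pe = {{
        {"blue", true}, {"red", true}, {"red", false}, {"green", true},
        {"white", true}, {"blue", false}, {"red", false}, {"red", true}}};
    std::array<bool, 8> responder{};

    // The IS broadcasts the query; every PE tests its own memory in parallel.
    for (int i = 0; i < 8; ++i)
        responder[i] = (pe[i].color == "red" && pe[i].on_lot);

    for (int i = 0; i < 8; ++i)
        if (responder[i]) std::printf("PE%d responds\n", i);  // PE1, PE7
}
```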
48
MASC Model
  • Basic Components
  • An array of cells, each consisting of a PE and
    its local memory
  • A PE interconnection network between the cells
  • One or more Instruction Streams (ISs)
  • An IS network
  • MASC is an MSIMD model that supports
  • both data and control parallelism
  • associative programming

49
MASC Basic Properties
  • Each cell can listen to only one IS
  • Cells can switch ISs in unit time, based on the
    results of a data test.
  • Each IS and the cells listening to it follow
    rules of the ASC model.
  • Control Features
  • The PEs, ISs, and networks all operate
    synchronously, using the same clock
  • Restricted job control parallelism is used to
    coordinate the interaction of the multiple ISs.

50
Characteristics of Associative Programming
  • Consistent use of the style of programming called
    data-parallel programming
  • Consistent use of global associative searching
    and responder processing
  • Usually, frequent use of the constant time global
    reduction operations AND, OR, MAX, MIN
  • Broadcast of data using IS bus allows the use of
    the PE network to be restricted to parallel data
    movement.

51
Characteristics of Associative Programming
  • Tabular representation of data (think 2D arrays)
  • Use of searching instead of sorting
  • Use of searching instead of pointers
  • Use of searching instead of the ordering provided
    by linked lists, stacks, queues
  • Promotes a highly intuitive programming style
    that yields high productivity
  • Uses structure codes (i.e., numeric
    representation) to represent data structures such
    as trees, graphs, embedded lists, and matrices.
  • Examples of the above are given in:
  • Ref: the Nov. 1994 IEEE Computer article in the
    references
  • Also, see the Associative Computing book by
    Potter.

52
Languages Designed for the ASC
  • Professor Potter has created several languages
    for the ASC model.
  • The most important of these is called ASC, a
    C-like language designed for the ASC model.
  • ACE is a higher level language than ASC that uses
    natural language syntax, e.g., plurals and pronouns.
  • Language References:
  • ASC Primer: a copy is available on the parallel lab
    website www.cs.kent.edu/parallel/
  • The Associative Computing book by Potter (in the
    references); some features in this book were never
    fully implemented in the ASC compiler.

53
Algorithms and Programs Implemented in ASC
  • A wide range of algorithms has been implemented in
    ASC without the use of the PE network:
  • Graph Algorithms
  • minimal spanning tree
  • shortest path
  • connected components
  • Computational Geometry Algorithms
  • convex hull algorithms (Jarvis March, Quickhull,
    Graham Scan, etc.)
  • Dynamic hull algorithms

54
ASC Algorithms and Programs(not requiring PE
network)
  • String Matching Algorithms
  • all exact substring matches
  • all exact matches with don't-care (i.e., wild
    card) characters
  • Algorithms for NP-complete problems
  • traveling salesperson
  • 2-D knapsack.
  • Database Management Software
  • associative database
  • relational database

55
ASC Algorithms and Programs (not requiring a PE
network)
  • A Two-Pass Compiler for ASC (not the one we will
    be using). This compiler uses ASC parallelism:
  • first pass
  • optimization phase
  • Two Rule-Based Inference Engines for AI:
  • An Expert System OPS-5 interpreter
  • PPL (a Parallel Production Language interpreter)
  • A Context Sensitive Language Interpreter
  • (OPS-5 variables force context sensitivity)
  • An associative PROLOG interpreter

56
Associative Algorithms Programs (using a
network)
  • There are numerous associative programs that use
    a PE network:
  • 2-D Knapsack ASC Algorithm using a 1-D mesh
  • Image processing algorithms using a 1-D mesh
  • FFT (Fast Fourier Transform) using 1-D nearest
    neighbor and Flip networks
  • Matrix Multiplication using a 1-D mesh
  • An Air Traffic Control Program (using a Flip
    network connecting PEs to memory)
  • Demonstrated using live data at Knoxville in the
    mid-70s.
  • All but the first were developed in assembler at
    Goodyear Aerospace.

57
Example 1: An ASC Algorithm for MST
  • A graph has nodes labeled by some identifying
    letter or number and arcs, which are directional
    and have weights associated with them.
  • Such a graph could represent a map where the
    nodes are cities and the arc weights give the
    mileage between two cities.
[Figure: a five-node example graph with nodes A-E and
weighted arcs; the weights shown are 3, 5, 2, 4, and 5.]
58
The MST Problem
  • The MST problem assumes the weights are positive
    and the graph is connected, and it seeks to find
    the minimal spanning tree,
  • i.e., a subgraph that is a tree[1], that includes
    all nodes (i.e., it spans), and
  • where the sum of the weights on the arcs of the
    subgraph is the smallest possible (i.e., it is
    minimal).
  • Note: the solution may not be unique.
  • [1] A tree is a set of points called vertices, and
    pairs of distinct vertices called edges, such
    that (1) there is a sequence of edges called a
    path from any vertex to any other, and (2) there
    are no circuits, that is, no paths starting from
    a vertex and returning to the same vertex.

59
Recalling Prim's Sequential MST Algorithm
  • The next 12 slides are included to recall Prim's
    sequential MST algorithm.
  • These slides are reference slides for students
    and will not be covered in class.

60
An Example (Prim's Sequential MST Algorithm)
[Figure: a nine-node example graph, nodes A-I, with 15
weighted edges (weights 2, 4, 7, 3, 6, 5, 1, 2, 3, 2,
6, 4, 8, 2, 1); the same graph is redrawn, with evolving
colors, on each of the following slides.]
As we will see, the algorithm is simple. The ASC
program is quite easy to write. A SISD solution
is a bit messy because of the data structures
needed to hold the data for the problem.
61
An Example: Step 0
We will maintain three sets of nodes whose
membership will change during the run. The first,
V1, will be nodes selected to be in the tree. The
second, V2, will be candidates at the current
step to be added to V1. The third, V3, will be
nodes not considered yet.
62
An Example: Step 0
V1 nodes will be in red with their selected edges
being in red also. V2 nodes will be in light blue
with their candidate edges in light blue also. V3
nodes and edges will remain white.
63
An Example: Step 1
Select an arbitrary node to place in V1, say A.
Put into V2 all nodes adjacent to A.
64
An Example: Step 2
Choose the candidate edge with the smallest weight and
put its node, B, into V1. Mark that edge in red also.
Retain the other edge-node combinations in the "to be
considered" list.
65
An Example: Step 3
Add all the nodes adjacent to B to the "to be
considered" list. However, note that AG has weight 3
and BG has weight 6, so there is no sense in including
BG in the list.
66
An Example: Step 4
Take the candidate (light blue) node whose edge has the
smallest weight and add it to V1. Note that the nodes
and edges in red are forming a subgraph which is a tree.
67
An Example: Step 5
Update the candidate nodes and edges by including all
that are incident to the nodes now in V1 (colored red).
68
An Example: Step 6
Select I, as its edge is minimal. Mark the node and
edge in red.
69
An Example: Step 7
Add the new candidate edges. Note that IF has weight 5
while AF has weight 7; thus, we drop AF from
consideration at this time.
70
An Example: after several more passes, C is added and
we have
Note that when CH is added, GH is dropped, as CH has
less weight. Candidate edge BC is also dropped, since
it would form a back edge between two nodes already in
the MST. When there are no more nodes to be considered,
i.e., no more in V3, we obtain the final solution.
71
An Example: the final solution
The subgraph is clearly a tree: it has no cycles and is
connected. The tree spans, i.e., all nodes are
included. While not obvious, it can be shown that this
algorithm always produces a minimal spanning tree. The
algorithm is known as Prim's Algorithm for MST.
72
An ASC MST Algorithm vs. the Sequential Prim's MST
Algorithm
  • First, think about how you would write the
    program in C or C++.
  • The usual solution uses some way of maintaining
    the sets as lists using pointers or references.
  • See solutions to MST in the Algorithms texts by
    Baase, et al. listed in the posted references.
  • In ASC, pointers and references are not even
    supported, as they are not needed and their use is
    likely to result in inefficient SIMD algorithms.
  • The ASC algorithm given here basically follows
    the preceding outline provided for Prim's MST,
    using pseudo-code based on the ASC language.
  • A pointer to the ASC manual will be posted on the
    course web site.
  • The ASC pseudo-code used for algorithms will
    require using only a few ASC language commands.

73
ASC-MST Algorithm Preliminaries
  • Next, a data-structure level presentation of
    Prim's algorithm for the MST is given.
  • The data structure used is illustrated in the
    upcoming slides.
  • This example is from the paper "An Associative
    Computing Paradigm," listed in the references
    and on the class website under the online
    references.
  • There are two types of variables for the ASC
    model, namely
  • the parallel variables (i.e., ones for the PEs)
  • the scalar variables (i.e., the ones used by the
    IS).
  • Scalar variables are essentially global
    variables.
  • Each scalar variable can be replaced by a parallel
    variable (a vector holding the scalar value in
    every entry, with one entry stored in each PE).

74
ASC-MST Algorithm Preliminaries (cont.)
  • In order to distinguish between them here, the
    parallel variable names end with a $ symbol.
  • This convention is optional and not part of the
    ASC language.
  • Each step in this algorithm takes constant time.
  • One MST edge is selected during each pass through
    the loop in this algorithm.
  • Since a spanning tree has n-1 edges, the running
    time of this algorithm is O(n) and its cost is
    O(n²).
  • Recall, cost is (running time) × (number of
    processors).
  • Since the sequential running time of Prim's MST
    algorithm is O(n²) and is time optimal, this
    parallel implementation is cost optimal.

75
Graph used for Data Structure
  • Figure 6 in Potter, Baker, et. al.

76
MST Algorithm Data Structure for Figure 6 (Data
Structure Before Execution)
Data Structure for MST Algorithm
77
Shorter Version of Algorithm ASC-MST-PRIM(root)
  • Initialize candidate$ to "waiting"
  • If there are any finite values in root's field,
  • set candidate$ to "yes"
  • set parent$ to root
  • set current_best$ to the values in root's
    field
  • set root's candidate field to "no"
  • Loop while some candidate$ contains "yes"
  • for them
  • restrict mask$ to mindex(current_best$)
  • set next_node to a node identified in the
    preceding step
  • set its candidate$ to "no"
  • if the value in their next_node's field is
    less than current_best$, then
  • set current_best$ to the value in
    next_node's field
  • set parent$ to next_node
  • if candidate$ is "waiting" and the value in
    its next_node's field is finite
  • set candidate$ to "yes"
  • set parent$ to next_node
  • set current_best$ to the value in
    next_node's field
78
Comments on ASC-MST Algorithm
  • The three preceding slides are Figure 6 in
    Potter, Baker, et al., IEEE Computer, Nov. 1994.
  • The preceding slide gives a compact,
    data-structures level pseudo-code description of
    this algorithm.
  • The pseudo-code illustrates Potter's use of
    pronouns (e.g., "them", "its") and possessive
    nouns.
  • The mindex function returns the index of a
    processor holding the minimal value.
  • This MST pseudo-code is much shorter and simpler
    than data-structure level sequential MST
    pseudo-codes,
  • e.g., see one of Baase's textbooks in the website
    references.
  • The algorithm given in Baase's books is identical
    to this parallel algorithm, except it is for a
    sequential computer.
  • Next, a more detailed explanation of the
    algorithm in the preceding slide is given.

79
Algorithm ASC-MST-PRIM (a more detailed
presentation)
  • Initially assign any node to root.
  • All processors set
  • candidate$ to "waiting"
  • current_best$ to ∞
  • the candidate$ field for the root node to "no"
  • All processors whose distance d from their node
    to the root node is finite do:
  • Set their candidate$ field to "yes"
  • Set their parent$ field to root.
  • Set current_best$ = d.

80
Algorithm ASC-MST-PRIM (cont. 2/3)
  • While the candidate$ field of some processor is
    "yes",
  • Restrict the active processors to those whose
    candidate$ field is "yes" and (for these
    processors) do:
  • Compute the minimum value x of current_best$.
  • Restrict the active processors to those with
    current_best$ = x and do:
  • pick an active processor, say node y.
  • Set the candidate$ value of node y to "no".
  • Set the scalar variable next_node to y.

81
Algorithm ASC-MST-PRIM (cont. 3/3)
  • If the value z in the next_node column of a
    processor is less than its current_best$ value,
    then
  • Set current_best$ to z.
  • Set parent$ to next_node.
  • For all processors, if candidate$ is "waiting"
    and the distance of its node from next_node (= y)
    is finite, then
  • Set candidate$ to "yes".
  • Set current_best$ to the distance of its node
    from y.
  • Set parent$ to y.
  • (A runnable sketch of the whole algorithm follows.)
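A minimal C++ sketch (not ASC) of this data-parallel
algorithm: each array slot plays one PE/node, the loops
over all slots stand in for the parallel steps, and the
global-min plus pick-one (mindex) step is a plain scan.
The adjacency matrix is made up for illustration:

```cpp
#include <array>
#include <cstdio>
#include <limits>

constexpr int N = 5;
constexpr int INF = std::numeric_limits<int>::max();
enum State { WAITING, YES, NO };

int main() {
    // w[i][j] = weight of edge (i, j); INF marks "no edge".
    int w[N][N] = {{INF, 2, INF, 3, INF}, {2, INF, 4, INF, 5},
                   {INF, 4, INF, INF, 1}, {3, INF, INF, INF, 6},
                   {INF, 5, 1, 6, INF}};
    std::array<State, N> cand;  cand.fill(WAITING);  // candidate$
    std::array<int, N>  best;   best.fill(INF);      // current_best$
    std::array<int, N>  parent; parent.fill(-1);     // parent$

    const int root = 0;
    cand[root] = NO;
    for (int i = 0; i < N; ++i)                      // "all PEs, in parallel"
        if (w[i][root] != INF) {
            cand[i] = YES; parent[i] = root; best[i] = w[i][root];
        }

    for (;;) {
        // Global min + pick-one: constant time on the real machine.
        int next = -1, x = INF;
        for (int i = 0; i < N; ++i)
            if (cand[i] == YES && best[i] < x) { x = best[i]; next = i; }
        if (next == -1) break;                       // no candidates left
        cand[next] = NO;
        std::printf("MST edge (%d,%d), weight %d\n", parent[next], next, x);
        for (int i = 0; i < N; ++i) {                // all PEs update in O(1)
            if (w[i][next] == INF) continue;
            if (cand[i] == YES && w[i][next] < best[i]) {
                best[i] = w[i][next]; parent[i] = next;
            } else if (cand[i] == WAITING) {
                cand[i] = YES; best[i] = w[i][next]; parent[i] = next;
            }
        }
    }
}
```

Each pass through the outer loop emits one MST edge, so
n-1 passes of constant "parallel" work give the O(n)
running time claimed on the earlier slide.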

82
Trace of 1st Pass of MST Algorithm for Figure 6
83
ASC Quickhull Algorithm
  • A Second ASC Algorithm Example

84
Quickhull Algorithm for ASC
  • Reference:
  • Atwah, Baker, and Akl, "An Associative
    Implementation of Classical Convex Hull
    Algorithms"
  • Review of the sequential Quickhull algorithm:
  • It suffices to find the upper convex hull of the
    points that are on or above the line we.
  • Select point h so that the area of triangle weh
    is maximal.
  • Proceed recursively with the sets of points on or
    above the lines wh and he.

[Figure: points w (west) and e (east) on a line, with
point h above it and the splitting lines wh and he.]
85
Previous Illustration
86
Example for Data Structure
87
Data Structure for Preceding Example
88
Algorithm Assumptions
  • Basic algorithms exist for the following problems
    in Euclidean geometry for the plane:
  • Determine whether a third point lies on, above,
    or below the line determined by two other points.
  • Compute the area of a triangle determined by
    three points.
  • Standard Assumption:
  • Three arbitrary points do not all lie on the same
    line.
  • Reference: Introduction to Algorithms by Cormen,
    Leiserson, Rivest (& Stein), McGraw Hill,
    chapter on Computational Geometry.

89
ASC Quickhull Algorithm (Upper Convex Hull)
  • ASC-Quickhull(planar-point-set)
  • Initialize ctr = 1, area$ = 0, hull$ = 0.
  • Find the PE with the minimal x-coord and let w
    be its point.
  • Set its hull$ value to 1.
  • Find the PE with the maximal x-coord and let e be
    its point.
  • Set its hull$ to 1.
  • All PEs set their left-pt$ to w and right-pt$ to e.
  • If the point for a PE lies above the line we,
  • Then set its job$ value to 1.
  • Else set its job$ value to 0.

90
ASC Quickhull Algorithm (cont)
  • Loop while the parallel variable job$ contains a
    nonzero value:
  • The IS makes its active cells those with a maximal
    job$ value.
  • Each (active) PE computes and stores the area of
    the triangle (left-pt$, right-pt$, point$) in area$.
  • Find the PE with the maximal area$ and let h be
    its point.
  • Set its hull$ value to 1.
  • Each PE whose point is above the line (left-pt$, h)
  • sets its job$ value to ctr
  • sets its right-pt$ to h
  • Each PE whose point is above the line (h, right-pt$)
  • sets its job$ to ctr
  • sets its left-pt$ to h
  • Each PE with job$ < ctr - 2 sets its job$ value
    to 0.
  • (A recursive sketch of the same idea follows.)
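A minimal C++ sketch (not ASC) of the same upper-hull
computation; it uses the classical recursive formulation
instead of the slide's ctr/job$ bookkeeping, and the
constant-time global max operations become plain scans.
The point set is made up for illustration:

```cpp
#include <cstdio>
#include <vector>

struct Pt { double x, y; };

// Twice the signed area of triangle (a, b, c); > 0 iff c is above line ab.
double cross(Pt a, Pt b, Pt c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

void upperHull(const std::vector<Pt>& pts, std::vector<bool>& hull,
               Pt w, Pt e) {
    // Find h maximizing the triangle area over points above line (w, e);
    // on the real machine this is one constant-time global max.
    int hi = -1; double best = 0;
    for (size_t i = 0; i < pts.size(); ++i) {
        double a = cross(w, e, pts[i]);
        if (a > best) { best = a; hi = int(i); }
    }
    if (hi < 0) return;                  // nothing above: w-e is a hull edge
    hull[hi] = true;
    upperHull(pts, hull, w, pts[hi]);    // points above line (w, h)
    upperHull(pts, hull, pts[hi], e);    // points above line (h, e)
}

int main() {
    std::vector<Pt> pts = {{0,0},{1,3},{2,1},{3,4},{5,2},{6,0}};  // made up
    std::vector<bool> hull(pts.size(), false);
    int wi = 0, ei = 0;                  // min / max x-coord (global ops)
    for (size_t i = 1; i < pts.size(); ++i) {
        if (pts[i].x < pts[wi].x) wi = int(i);
        if (pts[i].x > pts[ei].x) ei = int(i);
    }
    hull[wi] = hull[ei] = true;
    upperHull(pts, hull, pts[wi], pts[ei]);
    for (size_t i = 0; i < pts.size(); ++i)
        if (hull[i]) std::printf("(%g,%g) ", pts[i].x, pts[i].y);
    std::printf("\n");
}
```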

91
Highest Job Order Assigned to Points Above Lines
92
Order that Triangles are Computed
93
Performance of ASC-Quickhull
  • Average Case
  • Assume either of the following
  • For some integer k > 1, on average 1/k of the
    points above each line being processed are
    eliminated each round.
  • For example, consider k = 3, as one of three
    different areas is eliminated each round.
  • O(lg n) points are on the convex hull.
  • For randomly generated points, the number of
    convex hull points is very close to lg n.

94
Performance of ASC-Quickhull (cont)
  • Either of the above assumptions implies the
    average running time is O(lg n).
  • For example, each pass through the algorithm loop
    produces one convex hull point.
  • The average cost is O(n lg n).
  • Worst Case:
  • Running time is O(n).
  • Cost is O(n²).
  • Recall: the definition of cost is
  • Cost = (running time) × (number of processors).