Title: An Associative Program for the MST Problem
- Part 2 of Associative Computing
2Overview
- In this set of slides, we will explore an alternate associative algorithm for the minimal spanning tree (MST) problem.
- Only slides with light blue titles will be covered in class.
- The other slides are reference slides so that students can obtain an overview of the ASC language.
- As mentioned earlier, Professor Potter developed an associative programming language called ASC and a simulator for this language.
- ASC has also been implemented on 3-4 SIMD computers.
- We will treat the ASC code for the MST included here as a detailed pseudocode description of this algorithm.
- The goal of this set of slides is to prepare students to write a Cn (ClearSpeed) program for this algorithm.
3Content Covered in Light Blue
- References
- The MST example and background
- Variables and Data Types
- Operator Notation
- Input and Output
- Mask Control Statements
- Loop Control Statements
- Accessing Values in Parallel Variables
- Performance Monitor
- Subroutines and other topics
- Basic Program Structure
- Software Location Execution Procedures
- An Online ASC Program Data File to Execute
- ASC Code for the MST algorithm for a directed graph
- The Shortest Path homework problem
4References
- ASC Primer by Professor Jerry Potter is the primary reference for basic ASC
- A copy is posted on the lab website under software
- Lab website is at www.cs.kent.edu/parallel/
- Associative Computing book by Jerry Potter has a lot of additional information about the ASC language.
- Both references use a directed-graph version of the Minimal Spanning Tree as an important example.
5Features of Potter's MST Algorithm
- Both versions of MST are based on Prim's sequential MST algorithm
- In most algorithm books (e.g., see Baase, et al. in references)
- A drawback of Potter's version is that it requires 1 PE for each graph edge, which in the worst case is n^2 - n = Θ(n^2)
- Unlike the earlier MST algorithm, this is not an optimal cost parallel algorithm
- An advantage is that it works for undirected graphs
- The earlier MST algorithm covered might possibly be extended to work for directed graphs.
- Uses less memory for most graphs than the earlier algorithm
- True especially for sparse graphs
- Often will require a total of only O(n) memory locations, since the memory required for each PE is a small constant.
- In the worst case, at most O(n^2) memory locations are needed.
- The earlier algorithm always requires Θ(n^2) memory locations, as each PE stores a row of the adjacency matrix.
6Representing the Graph in a Procedural Language
- We need to find edges that are incident to a node of the graph. What kind of data structure could be used to make this easy? Typically there are two choices.
- An adjacency matrix
- Label the rows and columns with the node names.
- Put the weight w in row i and column j if node i is joined to node j by an edge of weight w.
- Doing this, we would represent the graph in the problem as follows ...
7Graph Example for MST
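8Adjacency Matrix For Preceding Graph
     A  B  C  D  E  F  G  H  I
A    -  2  -  -  -  7  3  -  -
B    2  -  4  -  -  -  6  -  -
C    -  4  -  2  -  -  -  2  -
D    -  -  2  -  1  -  -  8  -
E    -  -  -  1  -  6  -  -  2
F    7  -  -  -  6  -  -  -  5
G    3  6  -  -  -  -  -  3  1
H    -  -  2  8  -  -  3  -  4
I    -  -  -  -  2  5  1  4  -
(A dash marks a pair of nodes with no edge between them.)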
9An Alternative Useful Representation for
Sequential Algorithms
- Another possibility is to use adjacency lists, which allow some additional flexibility for this problem in representing the rest of the data, namely the sets V1, V2, and V3.
- It is in this type of representation that we see pointers or references play a role.
- Off of each node we link all of the nodes that are adjacent to it, keeping them in increasing order by label.
10Adjacency Lists for the Graph in the Problem
A -> (B, 2) -> (F, 7) -> (G, 3)
B -> (A, 2) -> (C, 4) -> (G, 6)
C, D, E, F, G, H, I: etc.
G, H, and I will have 4 entries; all others have 3. In each list, the nodes are in increasing order by node label. Note that if the node label ordering is clear, the A, B, ... need not be stored.
11Adding the Other Information Needed While Finding the Solution
- Consider one of the states during the run, right after the segment AG is selected
[Figure: the state at this point, drawn over nodes A, B, C, F, G, H, and I, with the sets V1 and V2 marked]
How will this data be maintained?
12- Cont.
- We need to know
- Which set each node is in.
- What each node's parent is in the tree formed below by both collections.
- A list of the candidate nodes.
13A Typical Data Structure Used For This Problem Is Shown
V2 elements are linked via the yellow entries, with V2lnk the head and null the tail: I -> H -> C -> F. Light blue boxes appeared in earlier states, but are no longer in use. Red entries say what set the node is in. Green entries give the parent of the node and orange entries give edge weights. The adjacency lists are not shown, but are linked off to the right.
V2lnk = I
Node   Link   Weight   Parent   Set
A      -      -        -        1
B      A      2        A        1
C      F      4        B        2
D      -      -        -        3
E      -      -        -        3
F      null   7        A        2
G      F      3        A        1
H      C      3        G        2
I      H      1        G        2
14I Is Now Selected and We Update
I is now in V1, so change its set value to 1. Look at the nodes adjacent to I (E, F, G, H) and add them to V2 if they are in V3. E is added ...
V2lnk = I
Node   Link   Weight   Parent   Set
A      -      -        -        1
B      A      2        A        1
C      F      4        B        2
D      -      -        -        3
E      -      -        -        3
F      null   7        A        2
G      F      3        A        1
H      C      3        G        2
I      H      1        G        1
15 E was Just Added to V2
Store I's link H in E's position and E in V2lnk. This makes I's entry unreachable. So V2 is now E -> H -> C -> F. Now we have to add relevant edges for I to any node in V2.
V2lnk = E
Node   Link   Weight   Parent   Set
A      -      -        -        1
B      A      2        A        1
C      F      4        B        2
D      -      -        -        3
E      H      -        -        2
F      null   7        A        2
G      F      3        A        1
H      C      3        G        2
I      H      1        G        1
16 Add Relevant Edges For I to V2 Nodes
Walk I's adjacency list: E -> F -> G -> H. E was just added, so select EI with weight 2. wgt(FI) = 5 < wgt(FA) = 7, so drop FA and add FI (see black blocks). G is in V1, so don't add GI. wgt(HI) = 4 > wgt(HG) = 3, so no change. This is now ready for the next round.
V2lnk = E
Node   Link   Weight   Parent   Set
A      -      -        -        1
B      A      2        A        1
C      F      4        B        2
D      -      -        -        3
E      H      2        I        2
F      null   5        I        2
G      F      3        A        1
H      C      3        G        2
I      H      1        G        1
17Complexity Analysis (Time and Space) for Prim's Sequential Algorithm
- Assume
- the preceding data structure is used.
- The number of nodes is n
- The number of edges is m
- Space used is 4n plus the space for adjacency lists.
- The adjacency lists are Θ(m), which in the worst case is Θ(n^2)
- This data structure sacrifices space for time.
- Time is Θ(n^2) in the worst case.
- The adjacency list of each node is traversed only once, when it is added to the tree. The total work of comparing weights and updating the chart during all of these traversals is Θ(m).
- There are n-1 rounds, as one tree node is selected each round.
- Walking the V2 list to find the minimum could require n-1 steps the first round, n-2 the second, etc., for a max of Θ(n^2) steps
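The rounds analyzed above can be sketched in Python. This is a minimal illustration, not the slide's linked data structure: a dictionary named `best` stands in for the V2 list, the edge list is transcribed from the example graph, and the graph is assumed to be connected. The names `EDGES`, `prim`, and `best` are chosen here for illustration only.

```python
# Undirected weighted edges of the example graph (nodes A..I).
EDGES = [
    ("A", "B", 2), ("A", "F", 7), ("A", "G", 3), ("B", "C", 4), ("B", "G", 6),
    ("C", "D", 2), ("C", "H", 2), ("D", "E", 1), ("D", "H", 8), ("E", "F", 6),
    ("E", "I", 2), ("F", "I", 5), ("G", "H", 3), ("G", "I", 1), ("H", "I", 4),
]


def prim(edges, start):
    """Prim's algorithm with a linear scan of the candidate set V2.

    Assumes the graph is connected. Returns tree edges as
    (parent, node, weight) in the order they are selected.
    """
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))

    in_tree = {start}   # V1
    best = {}           # V2: candidate node -> (weight, parent)
    tree = []
    current = start
    while len(in_tree) < len(adj):
        # Walk the adjacency list of the node just added: Theta(m) in total.
        for v, w in adj[current]:
            if v not in in_tree and (v not in best or w < best[v][0]):
                best[v] = (w, current)
        # Scan V2 for the minimum: up to n-1 steps per round.
        current = min(best, key=lambda node: best[node][0])
        w, parent = best.pop(current)
        in_tree.add(current)
        tree.append((parent, current, w))
    return tree


tree = prim(EDGES, "A")
total_weight = sum(w for _, _, w in tree)
```

Starting from A, the first three selections are AB, AG, and GI, which matches the states walked through on the preceding slides.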
18Alternate ASC Implementation of Prim's Algorithm using this Approach
- After setting up a data structure for the problem, we now need to code it by manipulating each state as we did on the preceding slides.
- The ASC model provides an easier approach.
- Recall that ASC does NOT support pointers or references.
- The associative searching replaces the need for these.
- Recall that we collectively think of the PE processor memories as a rectangular structure consisting of multiple records.
- We will next introduce basic features of the ASC language in order to implement this algorithm.
19Structuring the MST Data for ASC
- There are 15 bidirectional edges in the graph, or 30 directed edges in total.
- Each directed edge will have a head and a tail.
- So, the bidirectional edge AB will be represented twice: once as having head A and tail B, and once as having head B and tail A.
- We will use 30 processors, and in each PE's memory we will store an edge representation as
  head | tail | weight | state
- State is 0, 1, 2, or 3 and will be explained shortly.
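As a rough illustration of this layout, the sketch below expands the 15 undirected edges into 30 directed records, one per simulated PE. The class name `EdgeRecord`, the function `to_records`, and the initial state value 0 are assumptions made for this example; they are not part of ASC.

```python
from dataclasses import dataclass


@dataclass
class EdgeRecord:
    """One PE's record: a single directed edge."""
    tail: str
    head: str
    weight: int
    state: int = 0  # 0, 1, 2, or 3; starting at 0 is an assumption


UNDIRECTED = [
    ("A", "B", 2), ("A", "F", 7), ("A", "G", 3), ("B", "C", 4), ("B", "G", 6),
    ("C", "D", 2), ("C", "H", 2), ("D", "E", 1), ("D", "H", 8), ("E", "F", 6),
    ("E", "I", 2), ("F", "I", 5), ("G", "H", 3), ("G", "I", 1), ("H", "I", 4),
]


def to_records(undirected):
    """Store each bidirectional edge twice, once in each direction."""
    records = []
    for u, v, w in undirected:
        records.append(EdgeRecord(tail=u, head=v, weight=w))
        records.append(EdgeRecord(tail=v, head=u, weight=w))
    return records


records = to_records(UNDIRECTED)
```

Each record needs only a small constant amount of memory, which is why the total memory grows with the number of edges rather than with n^2.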
20ASC Data Types and Variables
- ASC has nine data types
- int (i.e., integer), real, hex (i.e., base 16), oct (i.e., base 8), bin (i.e., binary), card (i.e., cardinal), char (i.e., character), logical, index.
- Card is used for unsigned integer data.
- Variables can either be scalar or parallel.
21ASC Parallel Variables
- Parallel variables reside in the memory of individual processors.
- Consequently, tail, head, weight, and state will be parallel variables.
- In ASC, parallel variables are declared using an array-like notation, with $ as the index
- char parallel tail[$], head[$];
- int parallel weight[$], state[$];
22ASC Scalar and Index Variables
- Scalar variables in ASC reside in the IS (i.e., the front end computer), not in the PEs' memories.
- They are declared as
- char scalar node;
- Index variables in ASC are used to manipulate the index (i.e., choice of an individual processor) of a field. For example,
- graph[xx]
- They are declared as
- index parallel xx[$];
- They occupy 1 bit of space per processor
23Logical Variables and Constants
- Logical variables in ASC are boolean variables. They can be scalar or parallel.
- ASC does not formally distinguish between the index parallel and logical parallel variables
- The correct type should be selected, based on usage.
- If you prefer to work with the words TRUE and FALSE, you can define logical constants by
- deflog (TRUE, 1);
- deflog (FALSE, 0);
- Constant scalars can be defined by
- define (identifier, value);
24Logical Parallel Variables needed for MST
- These are defined as follows
- logical parallel nxtnod[$], graph[$], result[$];
- The use of these will become clear in later slides.
- For the moment, recognize they are just bit variables, one for each PE.
25Array Dimensions
- A parallel variable can have up to 3 dimensions
- First dimension is $, the parallel dimension
- The array numbering is zero-based, so the declaration
- int parallel A[$,2];
- creates the following 1-dimensional variables
- A[$,0], A[$,1], A[$,2]
26Mixed Mode Operations
- Mixed mode operations are supported and their result has the natural mode. For example, given declarations
- int scalar a, b, c;
- int parallel p[$], q[$], r[$], t[$,4];
- index parallel x[$], y[$];
- then
- c = a + b is a scalar integer
- q[$] = a + p[$] is a parallel integer variable
- a + p[x] is an integer value
- r[$] = t[x,2] + 3*p[$] is a parallel integer variable
- x[$] = p[$] .eq. r[$] is an index parallel variable
- More examples are given on pages 9-10 of the ASC Primer
27The Memory Layout for MST
- As with most programming languages, the order of the declarations determines the order in which the variables are laid out in memory.
- To illustrate, suppose we declare for MST
- char parallel tail[$], head[$];
- int parallel weight[$], state[$];
- int scalar node;
- index parallel xx[$];
- logical parallel nxtnod[$], graph[$], result[$];
- The layout in the memories is given on the next slide
- Integers default to the word size of the machine, so ours would be 32 bits.
28The Memory Layout for MST
Fields in each PE's memory, in order: tail | head | weight | state | xx | nxtnod | graph | result
PEs are numbered 0, 1, 2, 3, 4, ..., p-1, p
The last 4 are bit fields. The last 3 are named nxtnod, graph, and result.
29Operator Notation
- Relational and Logical Operators
- Original syntax came from FORTRAN and the examples in the ASC Primer use that syntax.
- However, the more modern syntax is supported
  .lt.   <       .not.   !
  .gt.   >       .or.    ||
  .le.   <=      .and.   &&
  .ge.   >=      .xor.   --
  .eq.   ==
  .ne.   !=
- Arithmetic Operators
- addition +
- multiplication *
- division /
30Parallel Input in ASC
- Input for parallel variables can be interactive or from a data file in ASC.
- We will run in a command window, so file input will be handled by redirection
- If you are not familiar with command window handling or Linux (Unix), this will be shown.
- In either case, the data is entered in columns just like it will appear in the read command.
- Do not use tabs.
- THE LAST LINE MUST BE A BLANK LINE!
31Parallel read and Associate Command
- The format of the Parallel read statement is
- read parvar1, parvar2,... in <logical parallel var>
- The command only works with parallel variables, not scalars.
- Input variables must be associated with a logical parallel variable before the read statement.
- The logical variable is used to indicate which PEs were used on input.
- After the read statement, the logical parallel variable will be true (i.e., 1) for all processors holding input values.
32Parallel Input in ASC
- The associate command and the read command for MST would be
- associate head[$], tail[$], weight[$], state[$] with graph[$];
- read tail[$], head[$], weight[$] in graph[$];
- Blanks can be used rather than commas, as indicated by the MST example on pg 35 of the Primer.
- Commenting Code
- /* This is the way to comment code in ASC */
33Input of Graph
- Suppose we were just entering the data for AB, AG, AF, BA, BC, and BG.
- Order is not important, but the data file would look like
A B 2
A G 5
A F 9
B A 2
B C 4
B G 6
(blank line)
and memory would look like
tail   head   weight   graph
A      B      2        1
A      G      5        1
A      F      9        1
B      A      2        1
B      C      4        1
B      G      6        1
0      0      0        0
...
34Scalar variable input
- Static input can be handled in the code.
- Also, define or deflog statements can be used to handle static input.
- Dynamic input is currently not supported directly, but can be accomplished as follows
- Reserve a parallel variable dummy (of desired type) for input.
- Reserve a parallel index variable used.
- Values to be stored in scalar variables are first read into dummy using a parallel read and then transferred using get or next to the appropriate scalar variable.
- Example
- read dummy[$] in used[$];
- get x in used[$]
- scalar-variable = dummy[x];
- endget x;
35Input Summary
- Scalar input is not directly supported.
- Scalars can be set as constants or can be set during execution using various commands.
- We will see this shortly
- We will be able to output scalar variables
- This will also be handy for debugging purposes.
- The main problem on input is to remember to include the blank line at the end.
- I suggest always printing your input data initially so you see it is going in properly.
36Parallel Variable Output
- Format for parallel print statement is
- print parvar1, parvar2,... in <logical parallel var>
- Again, variables to be displayed must be associated with a logical parallel variable first.
- You can use the same association as for the read command
- associate tail[$], head[$], weight[$] with graph[$];
- read tail[$], head[$], weight[$] in graph[$];
- print tail[$], head[$], weight[$] in graph[$];
- You can use a logical parallel variable that has been set with another statement, like an IF statement, to control which PEs will output data.
37MST Example
- Suppose state holds information about whether a node is in V1, V2, etc.
- Then, you could set up an association by
- if (state[$] == 1) then result[$] = TRUE; endif;
- You can print with this association as follows
- print tail[$], head[$], weight[$] in result[$];
- Only those records where state == 1 would be printed.
38Output Using msg
- The msg command
- Used to display user text messages.
- Used to display values of scalar variables.
- Used to display a dump of the parallel variables.
- The entire parallel variable contents are printed
- Status of active responders or association variables is ignored
- Format: msg string list
- msg The answers are max BBX B
- See pages 13-14 of the ASC Primer
39Assignment Statements
- Assignment can be made with compatible expressions using the equal sign with
- scalar variables
- parallel variables
- logical parallel variables
- The data types normally have to be the same on both sides of the assignment symbol, i.e., don't mix scalar and parallel variables.
- A few special cases are covered on the next slide
40Some Assignment Statement Special Cases
- Declarations for Examples
- int scalar k;
- int parallel b[$];
- index parallel xx[$];
- If xx is an index variable with a 1 in at least one of its components, then the following is valid
- k = aa[xx] + 5;
- Here, the component of aa used is one where xx is 1.
- While selection is arbitrary (e.g., pick-one), this implementation selects the smallest index where xx is 1.
- The assignment of integer arithmetic expressions to integer parallel variables is supported.
- b[xx] = 3 + 5;
- This statement assigns an 8 to the xx component of b.
- The component selected is identified by the first 1 in xx.
- See pg 9-10 of Primer for more examples.
41Example: aa[$] = b[$] + c[$]; (1)
- Before
  mask   aa   b    c
  1      2    3    4
  1      3    5    3
  0      2    4    -3
  0      6    4    1
  1      2    -3   -6
- After
  mask   aa   b    c
  1      7    3    4
  1      8    5    3
  0      2    4    -3
  0      6    4    1
  1      -9   -3   -6
(1) Note: As an article, "a" is a reserved word in ASC and so it can't be used as a variable name. (See ASC Primer, pgs 29-30 and 39.)
42Setscope Mask Control Statement
- Format
- setscope <logical parallel variable>
- body
- endsetscope;
- Resets the parallel mask register
- setscope jumps out of the current mask setting to the new mask given by its logical parallel variable.
- One use is to reactivate currently inactive processors.
- Also allows an immediate return to a previously calculated mask, such as an association.
- Is an unstructured command, like go-to, and jumps from the current environment to a new environment.
- Use sparingly
- endsetscope resets the mask to the preceding setting.
43Example
logical parallel used[$];
...
used[$] = aa[$] == 5;
setscope used
  tail[$] = 100;
endsetscope;
- Before setscope
  mask   aa   used   tail
  1      5    1      7
  1      22   0      6
  1      5    1      9
  0      41   0      7
- After setscope (the mask is now used)
  aa   mask   tail
  5    1      100
  22   0      6
  5    1      100
  41   0      7
- After endsetscope
  aa   mask   tail
  5    1      100
  22   1      6
  5    1      100
  41   0      7
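The mask switch in this example can be imitated in Python. This is only a simulation sketch: the `Machine` class and its `setscope` and `assign` methods are hypothetical names introduced here, with parallel variables modeled as plain lists.

```python
from contextlib import contextmanager


class Machine:
    """Toy model of the PE array: just the current activity mask."""

    def __init__(self, mask):
        self.mask = list(mask)

    @contextmanager
    def setscope(self, new_mask):
        saved = self.mask            # remember the preceding mask
        self.mask = list(new_mask)   # jump to the new mask
        try:
            yield
        finally:
            self.mask = saved        # endsetscope restores the old mask

    def assign(self, field, value):
        """Parallel assignment: only active PEs store the value."""
        for i, active in enumerate(self.mask):
            if active:
                field[i] = value


aa = [5, 22, 5, 41]
tail = [7, 6, 9, 7]
machine = Machine([1, 1, 1, 0])
used = [int(v == 5) for v in aa]

with machine.setscope(used):
    mask_inside = list(machine.mask)
    machine.assign(tail, 100)
```

After the `with` block, `tail` holds 100 only in the two PEs where `used` was 1, and the mask is back to its earlier setting.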
44The Scalar IF Statement
- The scalar IF is similar to what you have used before, i.e., a branching statement with the else part optional.
- Example
- int scalar k
- ...
- if k 5 then sum 0
- else b sum
- endif
45The Parallel IF Mask Control Statement
- Looks like the scalar IF, except that instead of a scalar logical expression, a parallel logical expression is encountered.
- Format
- if <logical parallel expression> then
- <body of then>
- else
- <body of else>
- endif;
- Although it looks similar, the execution is considerably different.
- The parallel version normally executes both bodies, each for the appropriate processors
- Useful as a parallel search control statement
46Operation Steps of Parallel IF
1. Save the mask bits of the processors that are currently active.
2. Broadcast code to the active processors to calculate the IF boolean expression.
3. If the boolean expression is true for an active processor, set its individual cell mask bit to TRUE; otherwise set its mask bit to FALSE.
4. Broadcast code for the then portion of the IF statement and execute it on the (TRUE) responders.
5. Complement the mask bits for the processors that were active at step 1. Ones originally FALSE remain FALSE.
6. Broadcast code for the else portion of the IF statement and execute it on the active processors.
7. Reset the mask to the original mask at step 1.
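The steps above can be traced with a small Python simulation. The function `parallel_if` and its callback arguments are hypothetical helpers written for this sketch; a real SIMD machine broadcasts one instruction stream rather than looping over PEs.

```python
def parallel_if(mask, cond, then_fn, else_fn=None):
    """Simulate the parallel IF on a list-based mask.

    cond, then_fn, and else_fn each take a PE index.
    Returns (then_mask, else_mask) for inspection.
    """
    saved = list(mask)                                        # save mask
    # Evaluate the condition on active PEs only, before any body runs.
    then_mask = [bool(m and cond(i)) for i, m in enumerate(saved)]
    for i, on in enumerate(then_mask):                        # then part
        if on:
            then_fn(i)
    # Complement, but PEs that were inactive originally stay inactive.
    else_mask = [bool(m and not t) for m, t in zip(saved, then_mask)]
    if else_fn is not None:                                   # else part
        for i, on in enumerate(else_mask):
            if on:
                else_fn(i)
    # The caller's mask was never modified, so it is already "reset".
    return then_mask, else_mask


b = [1, 7, 2, 1, 1]
mask = [1, 1, 1, 1, 0]


def set_two(i):
    b[i] = 2


def set_minus_one(i):
    b[i] = -1


then_mask, else_mask = parallel_if(mask, lambda i: b[i] == 1,
                                   set_two, set_minus_one)
```

Note that the last PE keeps b = 1 even though it satisfies the condition, because it was inactive when the IF was entered.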
47Example
if (b[$] == 1) then b[$] = 2; else b[$] = -1; endif;
- Before
  b    mask
  1    1
  7    1
  2    1
  1    1
  1    0
- After
  b    then mask   else mask
  2    1           0
  -1   0           1
  -1   0           1
  2    1           0
  1    0           0
48IF (ELSE-NOT-ANY) Format
- if <logical parallel expression> then
- body of if
- elsenany
- body of elsenany
- endif;
- Note that this is an if statement with an embedded ELSENANY clause.
- Either the responders to the if execute the if-body, or else all active processors execute the elsenany-body.
- While this extension is occasionally useful, one could get by with just the any command
- The any command is covered in the next construct.
49 The IF-ELSENANY Mask Control Statement
- Only one part of this IF statement is executed.
- Useful as a parallel search control statement
- Steps
- Evaluate the conditional statement.
- If there are one or more active responders, execute the then block.
- If there are no active responders, the ELSE-NOT-ANY (ELSENANY) block is executed.
- When executing the ELSENANY part, the original mask is used, i.e., the one prior to the IF-ELSENANY statement.
50Example
if aa[$] > 1 && aa[$] < 4 then      /* sets mask */
  if b[$] == 12 then c[$] = 1;      /* search for b == 12 */
  elsenany c[$] = 9;
  endif;                            /* action if no b is 12 */
endif;
- Before
  aa   b    c
  1    17   0
  2    13   0
  2    8    0
  3    12   0
  2    9    0
  4    67   0
  0    0    0
  0    12   0
- After
  mask1   mask2   aa   b    c
  0       0       1    17   0
  1       0       2    13   0
  1       0       2    8    0
  1       1       3    12   1
  1       0       2    9    0
  0       0       4    67   0
  0       0       0    0    0
  0       0       0    12   0
- Recall: the then body uses the mask set by the search (mask2)
51Example
if aa[$] > 1 && aa[$] < 4 then      /* sets mask */
  if b[$] == 12 then c[$] = 1;      /* search for b == 12 */
  elsenany c[$] = 9;
  endif;                            /* action if no b is 12 */
endif;
- Before
  aa   b    c
  1    17   0
  2    13   0
  2    8    0
  3    4    0
  2    9    0
  4    67   0
  0    0    0
  0    12   0
- After
  mask1   mask2   aa   b    c
  0       0       1    17   0
  1       0       2    13   9
  1       0       2    8    9
  1       0       3    4    9
  1       0       2    9    9
  0       0       4    67   0
  0       0       0    0    0
  0       0       0    12   0
- Recall: the elsenany body uses the original mask (mask1)
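Both outcomes of the IF-ELSENANY example can be reproduced with a short Python simulation. The helper `if_elsenany` and the wrapper `run_example` are hypothetical names for this sketch; the outer parallel IF is modeled by computing its mask directly.

```python
def if_elsenany(mask, cond, then_fn, elsenany_fn):
    """Run then_fn on responders, or elsenany_fn on every active PE
    when there are no responders at all. Returns the responder list."""
    responders = [i for i, m in enumerate(mask) if m and cond(i)]
    if responders:
        for i in responders:
            then_fn(i)
    else:
        for i, m in enumerate(mask):
            if m:
                elsenany_fn(i)       # original mask is used here
    return responders


def run_example(aa, b):
    """Outer IF sets the mask (1 < aa < 4); inner IF searches for b == 12."""
    c = [0] * len(aa)
    mask1 = [1 < a < 4 for a in aa]

    def mark_found(i):
        c[i] = 1

    def mark_missing(i):
        c[i] = 9

    if_elsenany(mask1, lambda i: b[i] == 12, mark_found, mark_missing)
    return c


aa = [1, 2, 2, 3, 2, 4, 0, 0]
c_found = run_example(aa, [17, 13, 8, 12, 9, 67, 0, 12])
c_missing = run_example(aa, [17, 13, 8, 4, 9, 67, 0, 12])
```

In the first run one active PE holds b = 12, so only that PE gets c = 1. In the second run no active PE responds, so every PE in the outer mask gets c = 9. The b = 12 in the last PE never counts, since that PE is outside the outer mask.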
52The ANY Mask Control Statement
- Format
- any <logical parallel expression>
- body
- elsenany
- body
- endany;
- ANY is the primary construct used in ASC to support the AnyResponders associative property
- The body of ANY is executed by all active processors if any data item satisfies the conditional statement.
- The ELSENANY provides a sometimes useful but non-essential extension of the ANY command.
53The ANY Statement
- Used to search for data items that satisfy the conditional expression.
- There must be at least one responder for the body statement to be performed.
- If there are no responders, the ANY statement does nothing unless an ELSENANY is used.
- The mask used to execute the ANY body is the original mask prior to the ANY statement.
- Consequently, all active processors are affected if the conditional expression of the ANY evaluates to TRUE.
- If there are no responders, then the body of ELSENANY is executed by all active processors.
54Example
if aa[$] > 7 then      /* set mask */
  any aa[$] == 10
    b[$] = 11;
  endany;
endif;
- Before
  mask   aa   b
  1      3    0
  0      9    0
  1      16   0
  1      10   0
  1      8    0
  0      0    0
  1      0    0
- After
  mask   aa   b
  0      3    0
  0      9    0
  1      16   11
  1      10   11
  1      8    11
  0      0    0
  0      0    0
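A Python simulation of this ANY example follows. The function `any_stmt` is a hypothetical stand-in for the ASC construct, and the outer parallel IF is modeled by building its mask by hand.

```python
def any_stmt(mask, cond, body, elsenany=None):
    """If any active PE satisfies cond, run body on ALL active PEs;
    otherwise run elsenany (if given) on all active PEs."""
    active = [i for i, m in enumerate(mask) if m]
    if any(cond(i) for i in active):
        for i in active:
            body(i)
    elif elsenany is not None:
        for i in active:
            elsenany(i)


outer_mask = [1, 0, 1, 1, 1, 0, 1]
aa = [3, 9, 16, 10, 8, 0, 0]
b = [0] * 7

# Outer "if aa > 7" narrows the mask to the active PEs with aa > 7.
inner_mask = [int(bool(m) and a > 7) for m, a in zip(outer_mask, aa)]


def set_eleven(i):
    b[i] = 11


any_stmt(inner_mask, lambda i: aa[i] == 10, set_eleven)
```

One PE with aa = 10 is enough to trigger the body, and the body then runs on all three PEs in the narrowed mask, not just on the responder.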
55The Loop Control Statements
- Loop controlled by either a scalar test or a parallel test
- LOOP-UNTIL statement
- Conditional is evaluated every iteration
- Loop controlled by a parallel test
- Parallel FOR-Loop
- Conditional is evaluated only once
- Parallel While-Loop
- Conditional is evaluated every iteration
- The FOR and WHILE loop statements are the ones normally used.
- LOOP-UNTIL is included mostly for completeness.
56The LOOP-UNTIL Statement
- Similar to REPEAT UNTIL loops in other languages.
- However, it is more flexible since the UNTIL conditional test can appear anywhere in the body of the loop.
- Format
- first
- initialization
- loop
- body1
- until (logical scalar expression) or
- (logical parallel expression) or
- (NANY logical parallel expression)
- body 2
- endloop;
- Parallel exit conditions
- The UNTIL exits when responder(s) are detected
- With NANY, the UNTIL exits when a no-responder condition occurs
- body 2 represents statements executed if the UNTIL condition is not satisfied.
57Example
first
  i = 0;
loop
  if aa[$] == i then b[$] = b[$] + 2; endif;
  i = i + 1;
  until i > 4
endloop;
- Before
  mask   aa   b
  1      0    3
  1      3    4
  1      0    1
  0      1    3
  1      1    5
  1      4    6
  1      5    2
- After i = 0
  mask   mask1   aa   b
  1      1       0    5
  1      0       3    4
  1      1       0    3
  0      0       1    3
  1      0       1    5
  1      0       4    6
  1      0       5    2
60Example
first
  i = 0;
loop
  if aa[$] == i then b[$] = b[$] + 2; endif;
  i = i + 1;
  until i > 4
endloop;
- Before
  mask   aa   b
  1      0    3
  1      3    4
  1      0    1
  0      1    3
  1      1    5
  1      4    6
  1      5    2
- After (b is shown following the passes with i = 0, 1, 3, and 4; the pass with i = 2 changes nothing)
  mask   aa   i=0   i=1   i=3   i=4
  1      0    5     5     5     5
  1      3    4     4     6     6
  1      0    3     3     3     3
  0      1    3     3     3     3
  1      1    5     7     7     7
  1      4    6     6     6     8
  1      5    2     2     2     2
Note: The example is to illustrate only; it could be done more easily.
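The whole LOOP-UNTIL example can be checked with a few lines of Python. This is a sequential simulation sketch; the function name `loop_until_example` is made up for this illustration.

```python
def loop_until_example(mask, aa, b):
    """Add 2 to b in every active PE whose aa equals i, for i = 0..4."""
    b = list(b)
    i = 0                                   # "first" part
    while True:                             # "loop"
        for pe, active in enumerate(mask):  # parallel IF over active PEs
            if active and aa[pe] == i:
                b[pe] += 2
        i += 1
        if i > 4:                           # "until i > 4"
            break
    return b


mask = [1, 1, 1, 0, 1, 1, 1]
aa = [0, 3, 0, 1, 1, 4, 5]
b_before = [3, 4, 1, 3, 5, 6, 2]
b_after = loop_until_example(mask, aa, b_before)
```

The PE with aa = 5 is never touched because the loop exits once i exceeds 4, and the masked-out PE with aa = 1 is skipped on the i = 1 pass.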
61The Parallel FOR-LOOP
- FOR is used for looping and retrieving
- Used when a process must be repeated for each cell that satisfies a certain condition.
- It is similar to the sequential FOR, but the conditional logical expression must be a parallel one.
- Initially, the conditional expression is evaluated and the active responders are stored in an index variable.
62The Parallel FOR-LOOP (cont)
- The top responder is processed during each pass through the FOR-loop until no responders remain.
- The contents of the index variable are updated at the bottom of the loop (i.e., the top 1 is changed to 0)
- The index variable is used to walk through the responders and to retrieve each responder's records.
- The conditional expression is never re-evaluated.
63Example
sum = 0;
for xx in tail[$] != 999      /* evaluates and stores in xx */
  sum = sum + value[xx];
endfor xx;
  tail   xx   value
  3      1    10      1st time: sum = 0 + 10 = 10
  5      1    20      2nd time: sum = 10 + 20 = 30
  999    0    30
  6      1    40      3rd time: sum = 30 + 40 = 70
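The FOR example can be mirrored in Python with a generator that evaluates the condition once and then walks the responders from the top. The name `parallel_for` is a hypothetical helper for this sketch, and all PEs are assumed to be active.

```python
def parallel_for(mask, cond):
    """Evaluate cond once, store the responders in xx, then yield
    responder indices from the top; cond is never re-evaluated."""
    xx = [bool(m and cond(i)) for i, m in enumerate(mask)]
    for i, on in enumerate(xx):
        if on:
            yield i


tail = [3, 5, 999, 6]
value = [10, 20, 30, 40]
mask = [1, 1, 1, 1]   # assumption: every PE is active

running = []
total = 0
for pe in parallel_for(mask, lambda i: tail[i] != 999):
    total += value[pe]
    running.append(total)
```

Because the responder set is fixed up front, changing `tail` inside the loop body would not change which PEs are visited.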
64The Parallel WHILE Loop
- Similar to the LOOP-UNTIL loop, except that it re-evaluates the conditional expression before each iteration.
- Format
- while <parallel index var> in <parallel logical expression>
- body
- endwhile <parallel index var>;
- The iteration terminates when there are no responders to the parallel logical expression.
- Note that the number of responders can increase, decrease, or remain the same during a run.
- Unlike the FOR loop, this loop can be infinite.
65The Parallel WHILE Loop
- Unlike the FOR statement, this construct re-evaluates the logical conditional statement prior to each execution of the body of the while.
- The bit array resulting from the evaluation of the conditional statement is assigned to the index parallel variable on each pass.
- The index parallel array is available for use within the body of the loop and can be changed within the body.
- The iteration is terminated when the conditional statement is tested and there are no responders.
- That is, all zeros in the index parallel variable.
- See ASC Primer pg 21-22 for more information
66Example
sumit = 0;
while xx in (aa[$] == 2)
  sumit = sumit + b[xx];
  if (c[xx] == 1) then
    if (aa[$] == 2) then aa[$] = 5; endif;
  else
    aa[xx] = 7;
  endif;
  msg "In loop, sumit is " sumit;
  print aa[$], c[$] in active;
endwhile xx;
- Before
  aa   b    c
  1    17   0
  2    13   0
  2    8    1
  3    11   1
  2    9    0
  4    67   0
- After 1st loop
  In loop, sumit is 13
  DUMP OF ASSOCIATION ACTIVE FOLLOWS
  AA,C,
  1 0
  7 0
  2 1
  3 1
  2 0
  4 0
- After 2nd loop
  In loop, sumit is 21
  DUMP OF ASSOCIATION ACTIVE FOLLOWS
  AA,C,
  1 0
  7 0
  5 1
  3 1
  5 0
  4 0
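The two passes of this WHILE example can be reproduced in Python. In this sketch all PEs are assumed active, the top responder is taken to be the lowest index, and the function name `run_while` and the `trace` list are inventions for illustration.

```python
def run_while(aa, b, c):
    """Simulate: while xx in (aa == 2) ... endwhile xx.

    Returns (sumit, final aa, trace), where trace records
    (sumit, aa) at the end of each pass.
    """
    aa = list(aa)
    sumit = 0
    trace = []
    while True:
        # The condition is re-evaluated before every iteration.
        responders = [i for i, v in enumerate(aa) if v == 2]
        if not responders:
            break
        xx = responders[0]                 # top responder
        sumit += b[xx]
        if c[xx] == 1:
            # Parallel IF: every PE with aa == 2 is set to 5.
            aa = [5 if v == 2 else v for v in aa]
        else:
            aa[xx] = 7                     # only the selected PE changes
        trace.append((sumit, list(aa)))
    return sumit, aa, trace


sumit, aa_final, trace = run_while(
    aa=[1, 2, 2, 3, 2, 4],
    b=[17, 13, 8, 11, 9, 67],
    c=[0, 0, 1, 1, 0, 0],
)
```

The loop stops after two passes because the second pass overwrites every remaining 2 in `aa`, leaving no responders for the third test.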
67When is Conditional Tested in Loops?
- UNTIL loops evaluate the test condition each time the UNTIL statement is encountered.
- WHILE loops have the test condition reevaluated before each iteration.
- The FOR loop evaluates the conditional expression initially and stores the resulting active responders in an index variable. This index variable is then used to retrieve items successively.
68Special Commands to Obtain Parallel Variable
Values
- Special Commands
- Get Statement
- Next Statement
- Minimum and maximum values
- These commands are needed to implement some of the associative functions.
- In particular, get and next allow the programmer to select an active responder for further processing.
- get and next implement the PickOne property.
69GET Statement
- Used to access a specific field in the memory of an active processor.
- Format
- get <parallel index var> in <parallel logical expression>
- body
- elsenany
- body
- The parallel logical expression is evaluated and its value assigned to the parallel index variable.
- The parallel index variable will identify the first active responder (if one exists) that satisfies the conditional test
- The first active responder executes the commands in the GET body.
- If there are no responders, the GET body is not executed.
- If GET contains an ELSENANY statement, its body is executed by all active processors when GET has no responders.
70Example
get xx in tail[$] == 1
  val[xx] = 0;
endget xx;
- Before
  tail   val
  10     100
  1      90
  2      77
  1      83
- After
  tail   val
  10     100
  1      0
  2      77
  1      83
71The NEXT Statement
- Similar to the GET statement, except NEXT deactivates the responder accessed each time it is called.
- Format
- next <parallel index var> in <parallel logical expression>
- body
- elsenany
- body
- Unlike GET, two successive calls to NEXT are expected to select two distinct PEs and association records.
- NEXT is almost always used within a looping statement to walk through the selected PEs to do something in each.
72Example
int parallel aa[$], b[$];
logical parallel used[$];
index parallel xx[$];
used[$] = aa[$] == 4;
next xx in used
  b[xx] = -1;
endnext xx;
- Before
  aa   used   b
  1    0      2
  4    1      2
  4    1      2
  19   0      2
  4    1      2
- After
  aa   used   b
  1    0      2
  4    0      -1
  4    1      2
  19   0      2
  4    1      2
Caution: next xx in aa[$] == 4 is not allowed. A logical variable such as used must be involved, and its top 1 is changed.
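The difference between GET and NEXT can be sketched in Python. The functions `get_responder` and `next_responder` are hypothetical helpers; they return the selected PE index, or None when there is no responder, which is where an ELSENANY body would run.

```python
def get_responder(used):
    """GET: pick the first responder; the logical variable is unchanged."""
    for i, bit in enumerate(used):
        if bit:
            return i
    return None


def next_responder(used):
    """NEXT: pick the first responder and clear its bit, so a second
    call selects a different PE."""
    i = get_responder(used)
    if i is not None:
        used[i] = 0
    return i


aa = [1, 4, 4, 19, 4]
b = [2, 2, 2, 2, 2]
used = [int(v == 4) for v in aa]

same_twice = (get_responder(used), get_responder(used))

xx = next_responder(used)
if xx is not None:
    b[xx] = -1

following = next_responder(list(used))   # peek using a copy of used
```

Two GET calls return the same PE, while after one NEXT the top 1 in `used` is gone and the following call moves on to the next responder.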
73Example (see next slide for results)
main tryout
int scalar k;
int parallel aa[$], b[$], c[$];
logical parallel used[$], active[$];
index parallel xx[$];
associate aa[$], b[$], c[$] with active[$];
read aa[$], b[$], c[$] in active[$];
print aa[$], b[$], c[$] in active[$];    /* to see input */
/* Tryout of assignment statements */
b[$] = aa[$] + 5;
c[$] = 3 + 5;
used[$] = aa[$] == 5;                    /* selects all processors with 5 in aa field */
next xx in used                          /* selects the top processor in used */
  k = b[xx] + 2;                         /* could do this and the next line in one line; */
  c[xx] = k;                             /* done this way to show a scalar can be set */
endnext xx;
print aa[$], b[$], c[$] in active[$];
74Example (results)
b[$] = aa[$] + 5;
c[$] = 3 + 5;
used[$] = aa[$] == 5;
next xx in used
  k = b[xx] + 2;
  c[xx] = k;
endnext xx;
- Before
  DUMP OF ASSOCIATION ACTIVE FOLLOWS
  AA,B,C,
  1    2    3
  2    3    4
  5    6    7
  8    9    10
  11   12   13
  5    1    2
  5    2    1
- After (arrows show PEs in used; xx is the first PE)
  DUMP OF ASSOCIATION ACTIVE FOLLOWS
  AA,B,C,
  1    6    8
  2    7    8
  5    10   12    <- in used (xx)
  8    13   8
  11   16   8
  5    10   8     <- in used
  5    10   8     <- in used
75Printing Scalars, Text Messages, and Dumping the Entire Parallel Array for a Field
Format: msg string list
Example: msg "The values are " aa[$], k;
The values are
PE 0     0 1 2 5 8 11 5 5 0 0 0 0 0 0 0 0
PE 16    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PE 32    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PE 48    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
...
PE288    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PE304    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12
76MAXVAL and MINVAL Functions and Other Functions
- MAXVAL (MINVAL) returns the maximum (minimum) value among active responders.
- if (tail[$] != 1) then k = maxval(weight[$]);
- endif;
- MAXDEX (MINDEX) returns the index of an entry where the maximum (minimum) value of the specified item occurs among the active responders.
- Recall: With an associative SIMD, the above are constant time functions as they are supported in hardware.
- With a SIMD that is not associative, they can still be performed, but they are not constant time functions and their timings depend upon the interconnection network.
- There are several variations of the above functions in the ASC Primer, e.g., finding the nth smallest value.
- The function COUNT() returns the number of active responders. It can be useful in debugging.
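The behavior of these functions over the active responders can be sketched in Python. The helper names below mirror the ASC functions but are plain sequential stand-ins written for this illustration, so they run in linear time rather than constant time. The data reuses the six-edge input example, followed by two unused PEs holding zeros.

```python
def count(mask):
    """Number of active responders."""
    return sum(1 for m in mask if m)


def minval(values, mask):
    """Minimum value among active responders, or None if there are none."""
    active = [v for v, m in zip(values, mask) if m]
    return min(active) if active else None


def maxval(values, mask):
    """Maximum value among active responders, or None if there are none."""
    active = [v for v, m in zip(values, mask) if m]
    return max(active) if active else None


def mindex(values, mask):
    """Index of the first active PE holding the minimum, or None."""
    best = None
    for i, (v, m) in enumerate(zip(values, mask)):
        if m and (best is None or v < values[best]):
            best = i
    return best


tail = ["A", "A", "A", "B", "B", "B", "", ""]
weight = [2, 5, 9, 2, 4, 6, 0, 0]
graph = [1, 1, 1, 1, 1, 1, 0, 0]      # PEs that actually hold input
everyone = [1] * len(weight)

node = tail[mindex(weight, graph)]    # tail of a minimum-weight edge
```

Restricting the search to `graph` matters: with every PE active, the zeros in the unused PEs would be picked as the minimum.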
77Dynamic Storage Allocation
- allocate is used to identify a processor whose association record is currently unused.
- Will be used to store a new association record
- Creates a parallel index that points to the processor selected
- release is used to de-allocate storage of specified records in an association
- Can release a single record or multiple records simultaneously.
- Example
- char parallel node[$], parent[$];
- logical parallel tree[$];
- index parallel x[$];
- associate node[$], level[$], parent[$] with tree[$];
- ......
- allocate x in tree
- node[x] = 'B';
- endallocate x;
- release parent[$] .eq. 'A' from tree;
78Performance Monitor
- Keeps track of the number of scalar and parallel operations.
- It is turned on and off using the PERFORM statement
- perform = 1;
- perform = 0;
- The number of scalar and parallel operations can be printed using the MSG command
- msg "Number of parallel and scalar operations are" PA_PERFORM SC_PERFORM;
- The ASC Monitor is important for evaluation and comparison of various ASC algorithms and software.
- It can also be used to determine or estimate running time.
- See pg 30-31 of ASC Primer for more information
79Additional Features
- Restricted subroutine capability is currently available
- See call and include on pg 25-27 of ASC Primer.
- ASC has a rather simplistic subroutine capability.
- While not difficult, the subroutine details will not be covered in slides.
- Assignment will not require use of subroutines.
- Use of personal pronouns and articles in ASC makes code easier to read and shorter.
- See page 29 of ASC Primer.
- Again, the details are not covered in slides.
80Basic Program Structure
- Main program_name
- Constants
- Variables
- Associations
- Body
- End
81Software
- Compiler and Emulator
- DOS/Windows, UNIX (Linux)
- WaveTracer
- Connection Machine
- http://www.cs.kent.edu/parallel/ and look under software
- Use any text editor.
- Careful on moving files between DOS and UNIX!
Anyprog.asc -> ASC Compiler (-e -wt -cm) -> Anyprog.iob -> ASC Emulator (-e -wt -cm) -> Standard I/O, File I/O
82Simple ASC Program
- Example
- Consider an ASC Program that computes the area of various simple shapes (circle, rectangle, triangle).
- Here is an example: shapes.asc
- Here is the data: shapes.dat
- Here is the output: shapes.out
- NOTE: The above links are only active during the slide show.
83Software
- To compile the previous program
- asc1.exe -e shapes.asc
- To execute your program
- asc2.exe -e shapes.iob
- asc2.exe -e shapes.iob < shapes.dat
- asc2.exe -e shapes.iob < shapes.dat > shapes.out
- Commands are executed in Windows from a command window.
- See the CMD command-line Environment document at http://www.cs.kent.edu/jbaker/PDC-F07/references/CMD_Commands.doc
- UNIX (Linux) commands can be executed from the command-line prompt.
- Don't forget to change the mode of the compiler and emulator to be executable using the chmod command.
84MST Program Example in ASC Primer
- View ASC code as pseudocode and consider how to
create equivalent Cn code for the ClearSpeed Board
85The Graph and Its Data File
Data file (each triple is tail, head, weight):
1 2 2   1 6 7   1 7 3
2 1 2   2 3 4   2 7 6
3 2 4   3 8 2   3 4 2
4 5 1   4 8 8   4 3 2
5 4 1   5 9 2   5 6 6
6 1 7   6 9 5   6 5 6
7 2 6   7 1 3   7 9 1   7 8 3
8 7 3   8 9 4   8 4 8   8 3 2
9 7 1   9 6 5   9 8 4   9 5 2
[Figure: the weighted graph on nodes 1-9 described by this data file.]
86Header and declarations
   /* The ASC Minimum Spanning Tree - with slight modifications from ASC PRIMER */
   main mst
   /* Note: Vertices were encoded as integers */
   deflog (TRUE, 1);
   deflog (FALSE, 0);
   char parallel tail[$], head[$];
   int parallel weight[$], state[$];
   char scalar node;
   index parallel xx[$];
   logical parallel nxtnod[$], graph[$], result[$];
87Obtain input
   associate head[$], tail[$], weight[$], state[$] with graph[$];
   read tail[$], head[$], weight[$] in graph[$];
   /* Mark the active PEs for the next command (otherwise the zeros in the fields where data wasn't read in would be used). Find a tail whose weight is minimal. */
   setscope graph[$]
      node = tail[mindex(weight[$])];
   endsetscope;
Because of the layout of the data file, we would find the first PE containing the minimal weight (which is 1) to be the PE holding 4 5 1. So node would be set to 4.
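The effect of setscope on the minimum search can be checked with a small sequential emulation. This is a Python sketch under stated assumptions, not ASC: the helper `mindex` is a hypothetical stand-in for the ASC function of the same name, `NUM_PES` is a made-up machine size, and unused PEs are assumed to hold zeros as the comment in the code says.

```python
# (tail, head, weight) triples in data-file order; one triple per emulated PE.
EDGES = [
    (1, 2, 2), (1, 6, 7), (1, 7, 3), (2, 1, 2), (2, 3, 4), (2, 7, 6),
    (3, 2, 4), (3, 8, 2), (3, 4, 2), (4, 5, 1), (4, 8, 8), (4, 3, 2),
    (5, 4, 1), (5, 9, 2), (5, 6, 6), (6, 1, 7), (6, 9, 5), (6, 5, 6),
    (7, 2, 6), (7, 1, 3), (7, 9, 1), (7, 8, 3), (8, 7, 3), (8, 9, 4),
    (8, 4, 8), (8, 3, 2), (9, 7, 1), (9, 6, 5), (9, 8, 4), (9, 5, 2),
]
NUM_PES = 40  # hypothetical machine size, larger than the number of edges
PAD = [0] * (NUM_PES - len(EDGES))  # PEs that received no data hold zeros

tail = [t for t, _, _ in EDGES] + PAD
head = [h for _, h, _ in EDGES] + PAD
weight = [w for _, _, w in EDGES] + PAD
graph = [True] * len(EDGES) + [False] * len(PAD)  # PEs that hold a record


def mindex(values, mask):
    """Index of the first active PE holding the minimum value."""
    active = [pe for pe, on in enumerate(mask) if on]
    return min(active, key=lambda pe: values[pe])  # min keeps the first tie


# With the scope restricted to graph, the first weight-1 record is found.
node = tail[mindex(weight, graph)]

# Without the scope, the search lands on a zero-filled PE instead.
unscoped = mindex(weight, [True] * NUM_PES)
```

Here `node` ends up as 4, matching the slide, while `unscoped` points at a PE that holds no edge at all.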
88The Graph and Its Data File
[Figure: the graph and its data file, identical to slide 85.]
89Continued
   /* Mark as being in set V2 all edges that have tails equal to node, i.e. 4 */
   if (node == tail[$]) then state[$] = 2;
   else state[$] = 3;
   endif;
This would mark the following edges as having a state of 2, i.e. they are in V2: 4 5 1, 4 8 8, 4 3 2.
90The Graph and Its Data File
[Figure: the graph and its data file, identical to slide 85.]
91Continued
   while xx in (state[$] == 2)
      if (state[$] == 2) then nxtnod[$] = mindex(weight[$]); endif;
      node = head[nxtnod[$]];
In loop 0: The only edges with the state of 2 are 4 5 1, 4 8 8 and 4 3 2, so the first one is selected and node is set to 5.
      state[nxtnod[$]] = 1;
The edge 4 5 receives a state of 1.
92The Graph and Its Data File
[Figure: the graph and its data file, identical to slide 85.]
93Continued
      if (head[$] == node && state[$] != 1) then state[$] = 0; endif;
We no longer want edges with a head of 5, so we throw those out of consideration by setting their states to 0. This would be edges 6 5 and 9 5 in the data file.
94The Graph and Its Data File
[Figure: the graph and its data file, as on slide 85.]
Green entries are edges thrown out.
95Continued
      if (state[$] == 3 && node == tail[$]) then state[$] = 2; endif;
The edges turned to a state of 2 are then 5 4, 5 9 and 5 6. Recall these are possible candidates for the next round. Do we want 5 4? Isn't 4 5 already in? Solving the problem by using a picture didn't run into this problem because once 4 5 was in, 5 4 was eliminated from consideration automatically. So we need to correct this. When an edge like X Y is included, we need to set the state of Y X to 0 to keep it out of further consideration. Is anything else needed?
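The first pass through the loop can be replayed sequentially to confirm the states quoted on these slides. The following Python sketch emulates the ASC code as written (it does not include any correction); the helper `edges_with` and the use of list positions as PE numbers are assumptions made for illustration.

```python
# (tail, head, weight) triples in data-file order; list position = PE number.
EDGES = [
    (1, 2, 2), (1, 6, 7), (1, 7, 3), (2, 1, 2), (2, 3, 4), (2, 7, 6),
    (3, 2, 4), (3, 8, 2), (3, 4, 2), (4, 5, 1), (4, 8, 8), (4, 3, 2),
    (5, 4, 1), (5, 9, 2), (5, 6, 6), (6, 1, 7), (6, 9, 5), (6, 5, 6),
    (7, 2, 6), (7, 1, 3), (7, 9, 1), (7, 8, 3), (8, 7, 3), (8, 9, 4),
    (8, 4, 8), (8, 3, 2), (9, 7, 1), (9, 6, 5), (9, 8, 4), (9, 5, 2),
]
tail, head, weight = (list(col) for col in zip(*EDGES))
pes = range(len(EDGES))

# Start-up: node is the tail of the first minimal-weight edge, then mark states.
node = tail[min(pes, key=lambda pe: weight[pe])]
state = [2 if tail[pe] == node else 3 for pe in pes]

# Loop body, first pass: pick the first minimal-weight candidate (state 2).
nxtnod = min((pe for pe in pes if state[pe] == 2), key=lambda pe: weight[pe])
node = head[nxtnod]
state[nxtnod] = 1  # the chosen edge joins the tree

# Throw out the other edges that lead into the new node.
for pe in pes:
    if head[pe] == node and state[pe] != 1:
        state[pe] = 0

# Edges leaving the new node become candidates.
for pe in pes:
    if state[pe] == 3 and tail[pe] == node:
        state[pe] = 2


def edges_with(value):
    """(tail, head) pairs of all edges currently in the given state."""
    return [(tail[pe], head[pe]) for pe in pes if state[pe] == value]
```

After this pass `edges_with(2)` contains (5, 4) alongside (4, 8), (4, 3), (5, 9) and (5, 6), which is exactly the situation the slide questions.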
96Correctly Implement the MST Algorithm in Cn
- The algorithm as coded selects D (node 4) first, while we selected A (node 1).
- The business at the beginning to select a minimal weight edge and use one of its nodes as the starting point was to avoid the need to assign a character to a variable.
- Since we are using integer nodes, we could eliminate
- setscope graph[$]
- node = tail[mindex(weight[$])];
- endsetscope;
- and just set node to 1, i.e.
- node = 1;
- This might help you see what is going on.
- Try to trace the MST algorithm with this change and correct it. (Homework)
97Shortest Path Problem for Graphs
- The minimal spanning tree algorithm by Prim is called a greedy algorithm.
- Greedy algorithms are usually applied to optimization problems, i.e. a set of configurations is searched to find one that minimizes or maximizes some objective function defined on these configurations.
- The approach is to proceed with a sequence of choices.
- The sequence starts from some well-understood starting configuration.
- Then we iteratively make choices that are locally best from among those currently possible.
- This approach does not always lead to an optimal solution, but if it does, the problem is said to possess the greedy-choice property.
98The Greedy-choice Property.
- This property says a globally optimal configuration can be reached by a series of locally optimal choices, i.e. choices that are best from among the possibilities available at the time.
- This allows us to avoid the exponential running time that would result if, for example, we had to generate all spanning trees in a graph and then find the minimal one.
- Many other problems are known to have the greedy-choice property.
- However, you need to be careful. Sometimes just a slight change in the wording of the problem turns it into a problem that doesn't have the greedy-choice property. In fact, a slight change can produce an NP-complete problem.
99Some Problems Known to Have the Greedy-choice
Property
- (Minimal Spanning Tree) just discussed
- (Shortest Path) Find the shortest path between two nodes on a connected, weighted graph where the weights are positive and represent distances.
- (Fractional Knapsack) Given a set of n items, such that each item i has a positive value $b_i$ and a positive weight $w_i$, find a maximum-value subset whose total weight does not exceed a given weight W, provided we can take fractional amounts of the items.
- Think of this as a knapsack being filled so as not to exceed the weight you can carry. Each item has a benefit to you, but it can be split up into fractional parts, as is possible with granola bars, popcorn, water, etc.
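For the fractional knapsack, the usual greedy rule is to take items in decreasing order of value per unit weight. The following Python sketch illustrates that rule; the function name `fractional_knapsack` and the sample items are made up for illustration and do not come from the slides.

```python
def fractional_knapsack(items, capacity):
    """Greedy fractional knapsack.

    items    -- list of (value, weight) pairs, all positive
    capacity -- the weight limit W
    Returns (total value, fraction taken of each item).
    """
    # Consider items in order of decreasing value per unit weight.
    order = sorted(range(len(items)),
                   key=lambda i: items[i][0] / items[i][1],
                   reverse=True)
    fractions = [0.0] * len(items)
    total = 0.0
    remaining = capacity
    for i in order:
        if remaining <= 0:
            break
        value, weight = items[i]
        take = min(weight, remaining)   # whole item, or whatever still fits
        fractions[i] = take / weight
        total += value * fractions[i]
        remaining -= take
    return total, fractions


# Hypothetical data: (value, weight) pairs and a weight limit of 50.
sample_items = [(60, 10), (100, 20), (120, 30)]
best_value, taken = fractional_knapsack(sample_items, 50)
```

With this sample the first two items are taken whole and two thirds of the third, for a total value of about 240.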
100However, The Wording is Delicate
- The Fractional Knapsack Problem is one that must be carefully stated. If, for the n items, you only allow an item to be taken or rejected, you have the 0-1 Knapsack Problem, which is known to be NP-complete and does not have the greedy-choice property.
- The 0-1 problem has a pseudo-polynomial algorithm, i.e. one that runs in O(nW) time, where W is the weight limit. So the running time depends not just on the input size of the problem, n, but also on the magnitude of a number appearing in the problem statement.
- In fact, if $W = 2^n$, then the pseudo-polynomial algorithm for this problem is as bad as the brute-force method of trying all combinations.
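The contrast between the two knapsack variants can be seen on a tiny instance. The Python sketch below pairs a standard O(nW) dynamic program with the value-per-weight greedy rule applied to whole items; both function names and the sample data are hypothetical, and integer weights are assumed.

```python
def knapsack_01(items, capacity):
    """Best total value for the 0-1 knapsack in O(n * W) time.

    items    -- list of (value, weight) pairs with positive integer weights
    capacity -- the integer weight limit W
    """
    best = [0] * (capacity + 1)  # best[w] = best value within weight w
    for value, weight in items:
        # Go downwards so that each item is used at most once.
        for w in range(capacity, weight - 1, -1):
            best[w] = max(best[w], best[w - weight] + value)
    return best[capacity]


def greedy_by_ratio_01(items, capacity):
    """Greedy rule from the fractional problem, but taking whole items only."""
    total, remaining = 0, capacity
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if weight <= remaining:
            total += value
            remaining -= weight
    return total


# Hypothetical data: (value, weight) pairs and a weight limit of 50.
sample_items = [(60, 10), (100, 20), (120, 30)]
optimal = knapsack_01(sample_items, 50)
greedy = greedy_by_ratio_01(sample_items, 50)
```

On this instance the greedy rule reaches 160 while the optimum is 220, and the table in `knapsack_01` has W + 1 entries, which is why its cost grows with the magnitude of W rather than with n alone.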
101Some Problems with the Greedy-choice Property
- (Task Scheduling Problem) We are given a set T of n tasks such that each task i has a start time $s_i$ and a finish time $f_i$, where $s_i < f_i$.
- Task i must start at time $s_i$ and it is guaranteed to be finished by time $f_i$.
- Each task has to be performed on a machine and each machine can execute only one task at a time.
- Two tasks i and j are non-conflicting if $f_i \le s_j$ or $f_j \le s_i$.
- Two tasks can be scheduled to be executed on the same machine only if they are non-conflicting.
- What is the minimum number of machines needed to schedule all the tasks?
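One common greedy approach to this problem handles tasks in order of start time and reuses a machine whenever one has already become free. The Python sketch below illustrates it; the function name `min_machines` and the sample tasks are assumptions, and a task is allowed to start at the exact moment another finishes, in line with the non-conflicting condition above.

```python
import heapq


def min_machines(tasks):
    """Minimum number of machines for tasks given as (start, finish) pairs."""
    finish_times = []  # min-heap: finish time of the last task on each machine
    for start, finish in sorted(tasks):
        if finish_times and finish_times[0] <= start:
            # The machine that frees up earliest is idle: reuse it.
            heapq.heapreplace(finish_times, finish)
        else:
            # Every machine is still busy: bring in a new one.
            heapq.heappush(finish_times, finish)
    return len(finish_times)


# Hypothetical task set, (start, finish) pairs.
sample_tasks = [(1, 3), (1, 4), (2, 5), (3, 7), (4, 7), (6, 9), (7, 8)]
machines_needed = min_machines(sample_tasks)
```

For the sample, three tasks overlap shortly after time 2, so at least three machines are needed, and the greedy schedule uses exactly three.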
102A Greedy-choice Algorithm for the Shortest Path
Problem
- Given a connected graph with positive weights and two nodes, s, the start node, and d, the destination node, find a shortest path from s to d.
- A greedy-choice algorithm is due to Dijkstra.
- Unlike the MST algorithm, more must be considered than just the minimum weight on an edge leading out of a node.
- It is easy to find examples where that approach won't work for this problem.
- Try to find one. (Exercise)
103Dijkstra's Sequential Algorithm for the Shortest Path Problem
- Let S be the set of nodes already explored and V all the nodes in the graph.
- For each u in S, we store a distance value d(u), which will be defined below.
- Initially, only s, the starting point, is in S and d(s) = 0.
- While S doesn't include dp, the destination point,
- Select a node v not in S with at least one edge from S for which the following is minimal:
- $d'(v) = \min \{ d(u) + wgt(u,v) \}$
- Here, the min is taken over all edges e(u,v) with $u \in S$ and $v \notin S$, and wgt(u,v) is the weight of edge e.
- Add v to S and define $d(v) = d'(v)$.
- Stop when dp, the destination point, is placed in S.
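The steps above translate almost line by line into code. The following Python sketch follows the slide's formulation directly (it rescans all edges leaving S on every round rather than using a priority queue); the function name `shortest_distance` and the adjacency-dictionary representation are assumptions. The sample edge weights were picked to agree with the distance values quoted in the example slide that follows; the edges s-a and s-b are inferred from d(a) = 1 and d(b) = 2.

```python
def shortest_distance(graph, s, dp):
    """Length of a shortest path from s to dp.

    graph -- dict mapping each node to a dict {neighbor: positive weight}
    """
    d = {s: 0}  # d(u) for every node u already placed in S
    while dp not in d:
        best_v, best_val = None, None
        # Minimise d(u) + wgt(u, v) over all edges with u in S and v not in S.
        for u, du in d.items():
            for v, wgt in graph[u].items():
                if v in d:
                    continue
                candidate = du + wgt
                if best_val is None or candidate < best_val:
                    best_v, best_val = v, candidate
        if best_v is None:
            raise ValueError("destination is not reachable from the start node")
        d[best_v] = best_val  # add v to S and fix d(v) = d'(v)
    return d[dp]


# Hypothetical undirected graph; each edge is listed in both directions.
sample_edges = [("s", "a", 1), ("s", "b", 2), ("s", "x", 4), ("a", "c", 3),
                ("a", "x", 2), ("b", "x", 2), ("b", "e", 3)]
sample_graph = {}
for u, v, w in sample_edges:
    sample_graph.setdefault(u, {})[v] = w
    sample_graph.setdefault(v, {})[u] = w

dist_to_x = shortest_distance(sample_graph, "s", "x")
dist_to_e = shortest_distance(sample_graph, "s", "e")
```

Rescanning every edge each round keeps the sketch close to the slide; practical implementations usually keep the candidate values in a priority queue instead.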
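104Example of the Greedy-choice (only part of the graph is shown)
$d(a) = 1, \quad d(b) = 2, \quad d(s) = 0$
Choose the minimal from
$d'(c) = d(a) + 3 = 4$
$d'(x) = \min\{ d(a) + 2,\ d(s) + 4,\ d(b) + 2 \} = 3$
$d'(e) = d(b) + 3 = 5$
[Figure: part of a weighted graph on the nodes s, a, b, c, x and e, with the nodes already in the set S marked.]
Therefore, let d(x) = 3 and put x in S.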
105Shortest Path Homework
- More information about this assignment will be posted on the homework section of the course webpage.
- You should first complete your homework for the MST.