Title: PRIORITY QUEUES
PRIORITY QUEUES AND SORTING METHODS FOR PARALLEL SIMULATION
- Authors
- Miltos D. Grammatikakis, Stefan Liesche
- Citation
- IEEE Transactions on Software Engineering, Volume 26, No. 5, May 2000
Overview
- Introduction
- Distributed Priority Queues
- Concurrent Priority Queues
- Dynamically Balanced Concurrent Priority Queues
- Experiments
- Conclusions
- Future Work
Introduction
- A Review of Sequential Priority Queues (Heap Implementation)
- Previous Parallel Implementations
- Goal of the Paper
- Applications
Sequential Priority Queues
- Implementation Using Heaps
- Fast, Ideally Suited
Min-Heap Properties
- The Value Stored at Any Node Is Always Less Than or Equal to the Values Stored at Both of Its Child Nodes
- All Levels Except the Last Have to Be Completely Filled (the Last Level Is Filled from Left to Right)
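Assuming the usual array representation of a heap (not shown on the slides), where the node at index i has children at 2i + 1 and 2i + 2, storing the tree level by level keeps it complete by construction, so only the order property needs checking. A minimal sketch; `is_min_heap` is a hypothetical helper:

```python
def is_min_heap(a):
    """Check the min-heap order property of a heap stored in a 0-indexed list.

    Assumed layout: the children of a[i] are a[2*i + 1] and a[2*i + 2].
    Completeness is implied by the contiguous layout itself.
    """
    n = len(a)
    for i in range(n):
        for c in (2 * i + 1, 2 * i + 2):
            if c < n and a[i] > a[c]:
                return False  # a parent is larger than one of its children
    return True
```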
An Example: Inserting into a Min-Heap
- Start at a New Leaf Node, to Maintain the Binary Tree's Completeness
- Percolate the New Element Up to Its Appropriate Position, to Maintain the Heap Order Property
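The two insertion steps can be sketched on the same assumed 0-indexed array layout. This is the textbook sift-up, not code from the paper, and `heap_insert` is a hypothetical name:

```python
def heap_insert(heap, item):
    """Insert item into a 0-indexed list-based min-heap, in place."""
    heap.append(item)              # new leaf: keeps the tree complete
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[i] >= heap[parent]:
            break                  # heap order property holds again
        # Percolate up one level by swapping with the parent.
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent
```

Inserting 13, 10, 17, 12 in that order leaves the list as [10, 12, 17, 13]: 12 starts at a new leaf under 13 and swaps with it once.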
[Figures: the new key 12 is placed at a new leaf of the min-heap and percolated up, one swap with its parent at a time, until the heap order property holds]
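An Example: Deleting from a Min-Heap
- Remove the Minimum Element at the Root and Detach the Rightmost Element of the Bottom Level
- If the Detached Element Is No Larger Than Both Children of the Root, Insert It at the Root; Deletion Is Complete
- Otherwise, Perform Heapification: Move the Hole at the Root Down to Its Smaller Child, Level by Level, Until the Detached Element Fits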
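A matching sketch of deletion, under the same assumptions (0-indexed list layout, hypothetical name `heap_delete_min`). Rather than swapping repeatedly, it moves a hole down from the root toward the smaller child and writes the detached last element once, which gives the same final arrangement as the swap-based description:

```python
def heap_delete_min(heap):
    """Remove and return the minimum of a non-empty 0-indexed list-based min-heap."""
    minimum = heap[0]
    last = heap.pop()                # detach the rightmost element of the bottom level
    if not heap:
        return minimum               # the heap held a single element
    n = len(heap)
    hole = 0                         # the root position is now a hole
    while True:
        child = 2 * hole + 1
        if child >= n:
            break                    # the hole is at a leaf
        if child + 1 < n and heap[child + 1] < heap[child]:
            child += 1               # choose the smaller child
        if last <= heap[child]:
            break                    # the last element fits here
        heap[hole] = heap[child]     # move the smaller child up; the hole moves down
        hole = child
    heap[hole] = last
    return minimum
```

When the detached element is already no larger than both children of the root, the loop stops immediately and it is written at the root, matching the first case above.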
[Figures: deleting the highest-priority (minimum) element from a min-heap: the root is removed, the last element is detached and checked against the root's children, the hole moves down to the smaller child at each level, and the last element is placed where it fits]
Previous Implementations of PQs
- Sequential Priority Queues
- Parallel Priority Queues
- Distributed Priority Queues (DPQ)
- Concurrent Priority Queues (CPQ)
Goal of the Paper
- To Propose a New Concurrent Data Structure Based on Distributed Circular Sorted Lists, Known as a Balanced Concurrent Priority Queue (BCPQ)
- To Compare Previous Implementations with This Structure on a Common Parallel Platform
Some Applications
- Parallel Algorithms for Decision Making in Finance or Games
- Kruskal's Graph Algorithm in Network Design
- Dijkstra's Algorithm for the Shortest Path Problem
- Heuristics for NP-Complete Problems
- Processing Signs Based on Relative Significance in Pattern Recognition
- Job and Event Schedulers in Operating Systems
Previous Parallel PQ Implementations
- Distributed Priority Queue (DPQ): O(log P log N)
- Concurrent Priority Queue (CPQ): O(log N/P)
Distributed Priority Queue (DPQ)
- A Distributed Data Structure (Heap)
- Elements Stored at Each Processor Have Higher Priorities Than Those Stored at Their Children
- Centralised Control
- ADT Operations Start at the Root Processor
- Step for Generating a Round-Robin Sequence
- Each Node Consists of Splay Trees of Items
The DPQ Structure as a Binary Heap
[Figure: processors arranged as the nodes of a binary heap, each node holding splay trees of items]
DPQ Insert Operation
- Send the Item to the Root Processor
- Transfer Control to the Root Processor
- The Root Computes the Position P of the Future Host Processor
- The Root Selects the Appropriate Path and Initiates the Insertion Along the Path
- Heap Property Maintained
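The slides do not say how the root selects the path. One common choice, assumed here, is to number heap nodes 1, 2, 3, ... level by level, so the path from the root to the next free position follows from repeatedly halving its index, and the item is then inserted top-down along that path. This sequential sketch keeps one item per node instead of splay trees per processor and ignores message passing; `path_to` and `top_down_insert` are hypothetical names:

```python
def path_to(position):
    """Node indices (1-indexed heap numbering) from the root down to position."""
    path = []
    while position >= 1:
        path.append(position)
        position //= 2
    return path[::-1]

def top_down_insert(heap, item):
    """Insert item top-down into a 1-indexed list-based min-heap (heap[0] unused).

    The future host position is the next free leaf. At each node on the path
    the smaller value stays and the larger one is carried further down.
    """
    target = len(heap)             # next free slot in 1-indexed numbering
    heap.append(None)
    carried = item
    for node in path_to(target):
        if node == target:
            heap[node] = carried   # reached the new leaf
        elif carried < heap[node]:
            heap[node], carried = carried, heap[node]  # keep smaller, carry larger
```

Starting from `h = [None]` and inserting 13, 10, 17, 12 gives the same heap as bottom-up insertion, `[None, 10, 12, 17, 13]`. Nodes off the path only ever receive smaller values, so the heap property is preserved at every step.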
DPQ DeleteMin Operation
- The Root Immediately Sends Back Its Minimum-Value Item
- It Gets the Next-Priority Item from a Child Processor
- It Performs Reheapification
- If the DPQ Is Empty:
- The Number of the Requesting Processor Is Stored in the Waiting List
- The Item of the Next Incoming Insert Query Is Sent Immediately to the First Waiting Processor
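The waiting-list rule can be modelled in a single process as below. This is only an illustration: Python's `heapq` stands in for the distributed heap and its reheapification, message passing is replaced by appending to a `delivered` list, and `RootProcessor` is a hypothetical class:

```python
import heapq
from collections import deque

class RootProcessor:
    """Hypothetical single-process model of the DPQ root handling requests."""

    def __init__(self):
        self.items = []          # stands in for the distributed heap
        self.waiting = deque()   # ids of processors that asked while the queue was empty
        self.delivered = []      # (processor id, item) pairs "sent" so far

    def delete_min(self, requester):
        if self.items:
            # Send back the minimum at once; heapq performs the reheapification.
            self.delivered.append((requester, heapq.heappop(self.items)))
        else:
            self.waiting.append(requester)   # remember who is waiting

    def insert(self, item):
        if self.waiting:
            # The next incoming item goes straight to the first waiting processor.
            self.delivered.append((self.waiting.popleft(), item))
        else:
            heapq.heappush(self.items, item)
```

If processors 3 and 5 request a DeleteMin on an empty queue and then 42 and 7 are inserted, 42 goes to processor 3 and 7 to processor 5, in arrival order, without entering the heap.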
DPQ MPI Implementation
- http://members.aol.com/liesche/darbeit.zip
Concurrent Priority Queue (CPQ)
- Binary Heap in Uniform Shared Memory
- A Mutual Exclusion Lock on Each Node of the Heap
- A Mutual Exclusion Lock on the Variable Holding the Number of Items in the Heap
- Items Are Tagged Empty, Valid, or Transient (pid)
- Consecutive Inserts and Deletes Traverse Disjoint Paths to the Root
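The slides do not say how disjoint paths are obtained. One well-known scheme, used in the lock-per-node heap of Hunt, Michael, Parthasarathy and Scott (1996), picks the slot for the i-th item by reversing the bits of i's offset within its level. Whether the paper's CPQ uses exactly this scheme is an assumption here; `bit_reversed_position` and `path_to_root` are hypothetical helpers:

```python
def bit_reversed_position(i):
    """Heap slot (1-indexed) used for the i-th item under bit reversal."""
    level = i.bit_length() - 1            # depth of the i-th slot (root = 0)
    if level == 0:
        return 1
    offset = i - (1 << level)             # 0-based position within that level
    reversed_offset = int(format(offset, f"0{level}b")[::-1], 2)
    return (1 << level) + reversed_offset

def path_to_root(node):
    """Nodes visited from node up to, but excluding, the shared root."""
    path = []
    while node > 1:
        path.append(node)
        node //= 2
    return path
```

For the 8th to 15th items the slots are 8, 12, 10, 14, 9, 13, 11, 15, so consecutive inserts alternate between the two halves of the tree. For example, the paths from 8 and 12 share only the root.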
The CPQ Data Structure
[Figure: heap nodes, each holding a tag (t) and a data item (d), with node positions labeled in binary]
CPQ Insert Operation
- The New Item Is Stored in a Free Node
- It Is Moved Up Level by Level
- During Each Move:
- The Parent and Child Nodes Are Locked
- The Item Priorities Are Compared
- If the Child Has Higher Priority, the Items Are Swapped
- Both Locks Are Released
- The New Item Climbs Up Until the Heap Condition Is Restored
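A minimal sketch of this insertion protocol with Python threads, under stated assumptions: insert-only use (the concurrent DeleteMin and the Empty-tag handling it needs are left out), a fixed capacity, new leaves taken in plain level order rather than bit-reversed, and each climbing item tagged with its inserter's id as a stand-in for Transient (pid). `LockedHeap` and its fields are hypothetical names. Locks are always taken parent before child, a single global order that rules out deadlock:

```python
import threading
import time

AVAILABLE = "AVAILABLE"  # settled item; otherwise the tag is the inserting thread's pid

class LockedHeap:
    """Insert-only sketch of a 1-indexed min-heap with one lock per node."""

    def __init__(self, capacity):
        self.data = [None] * (capacity + 1)
        self.tag = [None] * (capacity + 1)       # None means the slot is empty
        self.locks = [threading.Lock() for _ in range(capacity + 1)]
        self.size_lock = threading.Lock()        # guards the item count
        self.size = 0

    def insert(self, pid, item):
        # Claim a free leaf and lock it before releasing the count lock,
        # so no later insert can inspect the slot before it is filled.
        with self.size_lock:
            self.size += 1
            i = self.size
            self.locks[i].acquire()
        self.data[i], self.tag[i] = item, pid    # transient: owned by pid
        self.locks[i].release()

        while i > 1:
            parent = i // 2
            retry = False
            with self.locks[parent], self.locks[i]:   # parent first, then child
                if self.tag[parent] == AVAILABLE:
                    if self.data[i] < self.data[parent]:
                        # Child has higher priority: swap items and tags upward.
                        self.data[i], self.data[parent] = self.data[parent], self.data[i]
                        self.tag[i], self.tag[parent] = self.tag[parent], self.tag[i]
                        next_i = parent
                    else:
                        self.tag[i] = AVAILABLE       # heap condition restored
                        next_i = 0
                else:
                    retry = True      # parent is another climbing item; try again
                    next_i = i
            i = next_i                # both locks are released at this point
            if retry:
                time.sleep(0)         # yield so the other inserter can progress
        if i == 1:
            with self.locks[1]:
                self.tag[1] = AVAILABLE
```

An item only waits on a parent that is itself still climbing, and the climbing item nearest the root can always move or settle, so every insert finishes.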
CPQ DeleteMin Operation
- The Item Stored at the Last Heap Leaf (Last) Is Exchanged with the Item at the Root; the Corresponding Nodes Are Locked
- The Last Node Is Tagged Empty and Its Lock Released
- The Root Node Remains Locked
- Heapification Proceeds with the Node Holding Last Kept Locked
- When Last Stops Moving, DeleteMin Stops and the Lock on Last Is Released
CPQ ShMem Implementation
- Details: http://members.aol.com/liesche/darbeit.zip
Dynamically Balanced Concurrent Priority Queue (BCPQ)
- Each Processor Stores Part of the List in a Circular Queue
- Single Lock
- Processor 1 Stores the Items with the Highest Priority
- Subsequent Items Are Stored at Processors 2, 3, 4, ..., P
- Load Balancing
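The slides do not describe the load-balancing policy. As one simple, hypothetical possibility, the sketch below models the P local queues as plain ascending Python lists (ignoring the circular buffers and locks) and redistributes items so that list sizes differ by at most one while preserving the global order, so the first list still holds the highest-priority items. `rebalance` is a hypothetical name:

```python
def rebalance(lists):
    """Redistribute items so list sizes differ by at most one, keeping global order.

    lists[0] holds the highest-priority (smallest) items, lists[1] the next, etc.
    """
    everything = [x for lst in lists for x in lst]   # already globally sorted
    p = len(lists)
    base, extra = divmod(len(everything), p)
    out, start = [], 0
    for k in range(p):
        size = base + (1 if k < extra else 0)        # first `extra` lists get one more
        out.append(everything[start:start + size])
        start += size
    return out
```

A real BCPQ would move only boundary items between neighbouring processors rather than rebuilding every list; the sketch only shows the target state.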
The BCPQ Data Structure
[Figure: processors Pr 1, Pr 2, ..., Pr P, each holding a lock and a circular queue of sorted items with head and tail pointers over slots 1 to N]
BCPQ Insert Operation
- Binary Search for the Local List into Which the New Item Is to Be Inserted:
- Lock List m
- Compare Priorities
- Release the Lock
- Continue Until the Target List Is Found
- Second Binary Search for the Target Position within the Target List
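The two binary searches can be sketched sequentially, assuming the local lists are ascending Python lists (a smaller value means higher priority), that `lists[0]` corresponds to processor 1, and that every list is non-empty, so the largest items increase from list to list. Locking is indicated only by comments, and `bcpq_insert` is a hypothetical name:

```python
from bisect import insort

def bcpq_insert(lists, item):
    """Insert item into a sequence of non-empty ascending lists; lists[0] holds the smallest.

    First binary search: find the first list whose largest item is >= item
    (or the last list if there is none). Second binary search: find the
    position inside that list.
    """
    lo, hi = 0, len(lists) - 1
    while lo < hi:
        m = (lo + hi) // 2
        # A real BCPQ would lock list m, read its boundary item, then release the lock.
        if lists[m][-1] >= item:
            hi = m
        else:
            lo = m + 1
    insort(lists[lo], item)   # second binary search within the target list
```

The non-empty assumption matters: an empty list in the middle would break the ordering the first search relies on, so a real implementation would have to handle empty queues specially.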
BCPQ DeleteMin Operation
- Lock List 0
- Return the Minimum Item
- Release the Lock
- If the List Is Empty, Go to the Next List
- Continue Until the Minimum Item Is Found
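A minimal sketch of the DeleteMin scan, assuming each local list is a `collections.deque` (standing in for the circular queue) guarded by its own `threading.Lock`. `LocalList` and `bcpq_delete_min` are hypothetical names. Under concurrent inserts, a list already passed as empty could receive a new item, so this sketch shows only the scanning order described above:

```python
import threading
from collections import deque

class LocalList:
    """One processor's part of the queue: an ascending deque with its own lock."""

    def __init__(self, items=()):
        self.lock = threading.Lock()
        self.items = deque(sorted(items))

def bcpq_delete_min(lists):
    """Return the smallest item, scanning from list 0; None if every list is empty."""
    for local in lists:
        with local.lock:                      # lock the list, take its head, release
            if local.items:
                return local.items.popleft()
        # This list was empty: go on to the next one.
    return None
```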
BCPQ
- O(N/P) Complexity
- After Optimization 5 faster than CPQ
Experiments
- Run on a Cray T3E-900 System with 32 Processing Elements
- Involved Parallel Simulations of Single Buffers, a 64×64 Packet Switch (with Multicasting), and Symmetric Networks
Conclusions
- CPQ and BCPQ
- Easy to Use
- 5-10 Times Faster Than DPQ
- BCPQ
- Performance Comparable to CPQ
- More Efficient Than CPQ in Assigning Memory
Conclusions (cont.)
- Reduced Sorting Overheads
- Minimized Interprocessor Communication
- Optimized Scalar Processing
- Achieved Good Scalability and Efficiency
Extensions and Future Work
- Further Reduce Communication Overhead
- Transform the BCPQ to Be Lock-Free
- Further Improve the BCPQ to Achieve Platform-Wide Portability Using MPI-2
Questions?