Non-blocking Data Structures for High-Performance Computing (Presentation Transcript)

1
Non-blocking Data Structures for High-Performance
Computing
  • Håkan Sundell, PhD

2
Outline
  • Shared Memory
  • Synchronization Methods
  • Memory Management
  • Shared Data Structures
  • Dictionary
  • Performance
  • Conclusions

3
Shared Memory
[Diagram: several CPUs, each with its own cache, connected to one shared Memory]
- Uniform Memory Access (UMA)
[Diagram: groups of CPUs, each group with its own cache bus and local Memory]
- Non-Uniform Memory Access (NUMA)
4
Synchronization
  • Shared data structures need synchronization!
  • Accesses and updates must be coordinated to establish consistency.

[Diagram: processes P1, P2 and P3 accessing a shared data structure]
5
Hardware Synchronization Primitives
  • Consensus number 1:
  • Atomic Read/Write
  • Consensus number 2:
  • Atomic Test-And-Set (TAS), Fetch-And-Add (FAA), Swap
  • Consensus number ∞:
  • Atomic Compare-And-Swap (CAS)
  • Atomic Load-Linked/Store-Conditional
(A hedged C11 sketch of read/write, TAS, FAA and CAS follows the diagram note below.)

[Diagram: atomic Read, Write and read-modify-write (M ← f(M, …)) operations on a shared memory word]
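A minimal C11 sketch (using <stdatomic.h>) of the atomic read/write, TAS, FAA and CAS primitives listed above; LL/SC has no direct C11 equivalent. The variables `shared` and `flag` are illustrative assumptions, not from the slides.

    #include <stdatomic.h>
    #include <stdbool.h>

    static _Atomic int shared = 0;               /* illustrative shared word   */
    static atomic_flag flag   = ATOMIC_FLAG_INIT;

    void primitives_demo(void)
    {
        int v = atomic_load(&shared);            /* atomic read                */
        atomic_store(&shared, v + 1);            /* atomic write               */

        bool was_set = atomic_flag_test_and_set(&flag);   /* TAS               */
        (void)was_set;

        atomic_fetch_add(&shared, 1);            /* FAA                        */

        int expected = 5;                        /* CAS: if shared == 5, write */
        atomic_compare_exchange_strong(&shared,  /* 7; otherwise `expected`    */
                                       &expected, 7);  /* receives the current */
                                                 /* value                      */
    }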
6
Mutual Exclusion
  • Access to shared data will be atomic because of the lock
  • Reduced parallelism by definition
  • Blocking; danger of priority inversion and deadlocks
  • Solutions exist, but with high overhead, especially for multi-processor systems

[Diagram: processes P1, P2 and P3 waiting on a lock protecting shared data]
7
Non-blocking Synchronization
  • Perform operations/changes using atomic primitives
  • Lock-Free Synchronization
  • Optimistic approach
  • Retries until it succeeds (see the retry-loop sketch below)
  • Guarantees progress of at least one operation
  • Wait-Free Synchronization
  • Always finishes in a finite number of its own steps
  • Coordination with all participants
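A minimal C11 sketch of the lock-free optimistic pattern described above: read, compute locally, CAS, retry on interference. The shared counter and the update function are illustrative assumptions.

    #include <stdatomic.h>

    static _Atomic long counter = 0;        /* illustrative shared variable    */

    /* Lock-free update: a failed CAS means some other thread's CAS succeeded,
     * so at least one operation always makes progress.                        */
    long lock_free_update(long delta)
    {
        long old, next;
        do {
            old  = atomic_load(&counter);   /* optimistic read                 */
            next = old + delta;             /* compute the new value locally   */
        } while (!atomic_compare_exchange_weak(&counter, &old, next));
        return next;                        /* retry until the CAS succeeds    */
    }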

8
Memory Management
  • Dynamic data structures need dynamic memory
    management
  • Concurrent D.S. need concurrent M.M.!

9
Concurrent Memory Management
  • Concurrent Memory Allocation
  • i.e. malloc/free functionality
  • Concurrent Garbage Collection
  • Questions (among many)
  • When to re-use memory?
  • How to de-reference pointers safely?

[Diagram: processes P1, P2 and P3 sharing dynamically allocated memory]
10
Lock-Free Memory Management
  • Memory Allocation
  • Valois 1995: fixed block size, fixed purpose
  • Michael 2004, Gidenstam et al. 2004: any size, any purpose
  • Garbage Collection
  • Valois 1995, Detlefs et al. 2001: reference counting
  • Michael 2002, Herlihy et al. 2002: hazard pointers
  • Gidenstam, Papatriantafilou, Sundell and Tsigas 2005: hazard pointers + reference counting

11
Lock-Free Reference Counting
  • De-referencing links
  • 1. Read the link contents, i.e. a pointer.
  • 2. Increment (FAA) the reference count on the corresponding object.
  • What if the link is changed between steps 1 and 2?
  • Solution by Detlefs et al.:
  • Use a DCAS for step 2 that operates on two arbitrary memory words; retry if the link has changed between steps 1 and 2.
  • Solution by Valois et al.:
  • The reference count field is present indefinitely. Decrement the reference count and retry if the link has changed after step 2 (a hedged sketch of this approach follows below).
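A minimal C11 sketch of the Valois-style solution, under the stated assumption that every node keeps a reference-count field that remains valid indefinitely; type and field names are illustrative, not the authors' code.

    #include <stdatomic.h>
    #include <stddef.h>

    typedef struct node {
        _Atomic(struct node *) next;     /* the "link"                          */
        _Atomic int            refcnt;   /* count field, never reused for       */
    } node_t;                            /* other data                          */

    /* Returns a node on which we now hold a reference, or NULL. */
    node_t *safe_deref(_Atomic(node_t *) *link)
    {
        for (;;) {
            node_t *p = atomic_load(link);        /* step 1: read the link      */
            if (p == NULL)
                return NULL;
            atomic_fetch_add(&p->refcnt, 1);      /* step 2: FAA the count      */
            if (atomic_load(link) == p)           /* link still the same?       */
                return p;                         /* safe to use p              */
            atomic_fetch_sub(&p->refcnt, 1);      /* changed: undo and retry    */
        }
    }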

12
Lock-Free Hazard Pointers (Michael 2002)
  • De-referencing links
  • 1. Read the link contents, i.e. a pointer.
  • 2. Set a hazard pointer to the read pointer value.
  • 3. Read the link contents again; if it is not the same as in step 1, restart from step 1 (see the sketch below).
  • Deletion
  • After a node is deleted from the data structure, put it on a local list.
  • When the local list reaches a certain size, scan all hazard pointers globally and reclaim the memory of all nodes whose addresses do not match the scan.
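A minimal C11 sketch of the de-referencing steps above, assuming one hazard-pointer slot per thread (`my_hp`); in a full scheme these slots live in a global array so the reclamation scan can read them all. This is an illustrative reconstruction, not Michael's actual code.

    #include <stdatomic.h>
    #include <stddef.h>

    typedef struct node node_t;

    /* This thread's hazard-pointer slot. */
    static _Thread_local _Atomic(node_t *) my_hp = NULL;

    node_t *hp_deref(_Atomic(node_t *) *link)
    {
        for (;;) {
            node_t *p = atomic_load(link);     /* 1. read the link             */
            atomic_store(&my_hp, p);           /* 2. publish the hazard ptr    */
            if (atomic_load(link) == p)        /* 3. re-read: unchanged?       */
                return p;                      /* p is now protected           */
            /* the link changed between steps 1 and 2: restart                 */
        }
    }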

13
Lock-Free Memory Allocation
  • Solution (lock-free): IBM freelists
  • Create a linked list of the free nodes; allocate/reclaim using CAS
  • Needs some mechanism to avoid the ABA problem (see the sketch after the diagram note below).

[Diagram: a Head pointer to a linked list of free blocks Mem 1 … Mem n; Allocate pops a block from the head, Reclaim pushes a used block (Used 1) back]
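A minimal C11 sketch of such a CAS-based freelist. As the slide notes, the pop side still needs an ABA-avoidance mechanism (e.g. version tags or hazard pointers), which is only pointed out in a comment here; names are illustrative assumptions.

    #include <stdatomic.h>
    #include <stddef.h>

    typedef struct block {
        _Atomic(struct block *) next;
    } block_t;

    static _Atomic(block_t *) freelist_head = NULL;

    void reclaim(block_t *b)                        /* push a freed block      */
    {
        block_t *old = atomic_load(&freelist_head);
        do {
            atomic_store(&b->next, old);
        } while (!atomic_compare_exchange_weak(&freelist_head, &old, b));
    }

    block_t *allocate(void)                         /* pop a free block        */
    {
        block_t *old = atomic_load(&freelist_head);
        while (old != NULL) {
            block_t *next = atomic_load(&old->next);
            /* ABA hazard: between these loads and the CAS, `old` could be
             * popped, reused and pushed back; a tagged head pointer or hazard
             * pointers are needed in a complete implementation.               */
            if (atomic_compare_exchange_weak(&freelist_head, &old, next))
                return old;
        }
        return NULL;                                /* freelist is empty       */
    }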
14
Shared Data Structure: Dictionaries (Sets)
  • Fundamental data structure
  • Works on a set of <key, value> pairs
  • Three basic operations (a small interface sketch follows):
  • Insert(k,v): adds a new item
  • v = FindKey(k): finds the item <k,v>
  • v = DeleteKey(k): finds and removes the item <k,v>
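The three operations written out as a C interface, purely for illustration; the type names and the `dict_` prefix are assumptions, not the authors' API.

    typedef unsigned long dict_key_t;          /* illustrative key type        */
    typedef void *        dict_value_t;        /* illustrative value type      */
    typedef struct dictionary dictionary_t;    /* opaque dictionary handle     */

    /* Insert(k,v): adds a new item; returns 0 on success.                     */
    int          dict_insert(dictionary_t *d, dict_key_t k, dict_value_t v);
    /* v = FindKey(k): returns the value of the item <k,v>, or NULL.           */
    dict_value_t dict_find_key(dictionary_t *d, dict_key_t k);
    /* v = DeleteKey(k): removes and returns the item <k,v>, or NULL.          */
    dict_value_t dict_delete_key(dictionary_t *d, dict_key_t k);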

15
Randomized Algorithm: Skip Lists
  • William Pugh, "Skip Lists: A Probabilistic Alternative to Balanced Trees", 1990
  • Layers of ordered lists with different densities achieve a tree-like behavior (see the search sketch below)
  • Time complexity O(log₂ N), probabilistic!

[Diagram: a skip list between Head and Tail sentinels, with layers 1-7 of ordered lists of decreasing density]
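A minimal sequential sketch of how a skip-list search exploits the layered lists (the classic Pugh-style search, not the lock-free version); structure and names are illustrative assumptions.

    #include <stddef.h>

    #define MAX_LEVEL 10

    typedef struct slnode {
        long           key;
        void          *value;
        struct slnode *next[MAX_LEVEL];   /* next[i] = successor on level i    */
    } slnode_t;

    /* `head` is a sentinel whose next[] pointers exist on every level. */
    void *skiplist_find(slnode_t *head, long key)
    {
        slnode_t *x = head;
        for (int i = MAX_LEVEL - 1; i >= 0; i--)      /* start at the top,     */
            while (x->next[i] != NULL && x->next[i]->key < key)
                x = x->next[i];                       /* move right per level  */
        x = x->next[0];                               /* candidate node        */
        return (x != NULL && x->key == key) ? x->value : NULL;
    }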
16
New Lock-Free Concurrent Skip List
  • Define node state to depend on the insertion
    status at lowest level as well as a deletion flag
  • Insert from lowest level going upwards
  • Set deletion flag. Delete from highest level
    going downwards

[Diagram: nodes 1-7 on the lowest level, each with a deletion flag D; a node p is inserted from level 1 upwards to level 3, and deleted from level 3 downwards to level 1]
17
Overlapping operations on shared data
  • Example: two concurrent Insert operations, Insert 2 and Insert 3, between nodes 1 and 4; which of 2 or 3 gets inserted?
  • Solution: the Compare-And-Swap atomic primitive

CAS(p: pointer to word, old: word, new: word): boolean
  atomic do
    if *p = old then
      *p := new
      return true
    else
      return false

[Diagram: node 1 linked to node 4, with one thread attempting Insert 2 and another attempting Insert 3 between them]
18
Concurrent Insert vs. Delete operations
  • Example: a concurrent Delete and Insert operating on adjacent nodes of the same list
  • Problem: both nodes are deleted!
  • Solution (Harris et al.): use bit 0 of the next pointer to mark the deletion status (a hedged sketch of such pointer marking follows below)

[Diagram: panels a)-c) with nodes 1, 2, 3 and 4, showing how an overlapping Delete and Insert can otherwise remove both nodes]
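A minimal C11 sketch of marking bit 0 of a next pointer to record deletion status, assuming nodes are at least 2-byte aligned so bit 0 of a valid pointer is always zero; names are illustrative, not Harris's actual code.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    typedef struct lnode {
        long               key;
        _Atomic(uintptr_t) next;       /* pointer value; bit 0 = deletion mark */
    } lnode_t;

    static inline bool      is_marked(uintptr_t p) { return (p & 1u) != 0; }
    static inline uintptr_t with_mark(uintptr_t p) { return p | 1u; }
    /* Helper for traversal: strip the mark to follow the real pointer.        */
    static inline lnode_t  *as_ptr(uintptr_t p) { return (lnode_t *)(p & ~(uintptr_t)1); }

    /* First phase of a deletion: atomically set the mark bit so that any
     * concurrent Insert whose CAS targets this next pointer must fail and
     * retry.                                                                  */
    bool logically_delete(lnode_t *n)
    {
        uintptr_t next = atomic_load(&n->next);
        while (!is_marked(next)) {
            if (atomic_compare_exchange_weak(&n->next, &next, with_mark(next)))
                return true;           /* we set the mark                      */
        }
        return false;                  /* another thread already marked it     */
    }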
19
Helping Scheme
  • Threads need to traverse safely
  • Need to remove marked-to-be-deleted nodes while traversing: Help!
  • Find the previous node, finish the deletion, and continue traversing from the previous node

[Diagram: a traversal over nodes 1, 2 and 4 encounters a node marked for deletion; the traversing thread helps complete the removal and continues from the previous node]
20
Lock-Free Skip List - Techniques Summary
  • The Skip List is treated as layers of ordered
    lists
  • Uses CAS atomic primitive
  • Lock-Free memory management
  • IBM Freelists
  • Reference counting (Valois, Michael & Scott)
  • Helping scheme
  • Back-Off strategy
  • All together proved to be linearizable

21
Lock-Free Skip List publications
  • First publications in literature
  • H. Sundell and P. Tsigas, Fast and Lock-Free
    Concurrent Priority Queues for Multi-thread
    Systems, IPDPS 2003
  • H. Sundell and P. Tsigas, Scalable and Lock-Free
    Concurrent Dictionaries, SAC 2004
  • Later publications
  • M. Fomitchev and E. Ruppert, Lock-free linked
    lists and skip lists, PODC 2004
  • K. Fraser, Practical lock-freedom, PhD thesis,
    2004

22
New Lock-Free Skip List!
  • The thread that fulfils the deletion of a node
    removes the next pointer when finished.
  • Allows other threads to traverse through even
    marked next pointers.
  • If not possible to traverse forward, go back to
    the remembered position on previous (upper)
    levels.
  • Helps deletions-in-progress only when absolutely
    necessary.
  • Works with a modified version of Michael's Hazard Pointer memory management!

23
Correctness
  • Linearizability (Herlihy 1991)
  • For an implementation to be linearizable, for every concurrent execution there should exist an equivalent sequential execution that respects the partial order of the operations in the concurrent execution

24
Correctness
  • Define precise sequential semantics
  • Define abstract state and its interpretation
  • Show that state is atomically updated
  • Define linearizability points
  • Show that operations take effect atomically at
    these points with respect to sequential semantics
  • Creates a total order using the linearizability
    points that respects the partial order
  • The algorithm is linearizable

25
Memory Consistency and Out-Of-Order execution
  • Memory models on actual multiprocessor architectures: Relaxed Memory Order (RMO), etc.
  • Must insert special machine instructions (memory barriers) to enforce stronger memory consistency models! (A hedged fence example follows the diagram note below.)

[Timing diagram: over time t, threads Ti, Tj and Tk perform writes W(x,0), W(x,1), W(y,0), W(y,1) and reads R(x), R(y), observing each other's writes in different orders under a relaxed memory model]
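A minimal C11 sketch of inserting memory barriers, using the classic two-thread write-then-read pattern suggested by the diagram; the variables and the outcome comment follow the C11 memory model and are illustrative, not taken from the slides.

    #include <stdatomic.h>

    static _Atomic int x = 0, y = 0;

    int thread_i(void)
    {
        atomic_store_explicit(&x, 1, memory_order_relaxed);    /* W(x,1)       */
        atomic_thread_fence(memory_order_seq_cst);             /* full barrier */
        return atomic_load_explicit(&y, memory_order_relaxed); /* R(y)         */
    }

    int thread_j(void)
    {
        atomic_store_explicit(&y, 1, memory_order_relaxed);    /* W(y,1)       */
        atomic_thread_fence(memory_order_seq_cst);             /* full barrier */
        return atomic_load_explicit(&x, memory_order_relaxed); /* R(x)         */
    }
    /* With the fences, thread_i() and thread_j() cannot both return 0;
     * without them, a relaxed architecture may reorder the accesses.          */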
26
Experiments
  • Experiments with 1-32 threads performed on a Sun Fire 15K with 48 CPUs.
  • Each thread performs 20000 operations, of which the first 50-10000 operations (in total) are Inserts; the remaining operations are randomly and equally distributed over Insert, FindKey and DeleteKey.
  • Fixed skip list maximum level of 10.
  • Compared with implementations of other skip-list-based dictionaries and a singly linked list by Michael, using the same scenarios.
  • Execution time averaged over 10 experiments.

27-34
(No Transcript)
35
Multi-Word Compare-And-Swap
  • Operations (the intended sequential semantics are sketched below):
  • bool CASN(int *p1, int o1, int n1, ...)
  • int Read(int *p)
  • Standard algorithmic approach:
  • 1. Try to acquire a lock on all positions of interest.
  • 2. If a position is already taken, help the corresponding operation.
  • 3. If all are taken and all values match, change the status of the operation.
  • 4. Remove the locks and possibly write the new values.
  • My approach:
  • Wait-free memory management (IPDPS 2005)
  • Lock stealing and lock hand-over
  • Allow un-sorted pointers
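A minimal sketch of what CASN must do, written as a sequential specification in C; this only states the intended atomic semantics and is not a lock-free implementation (that is what the locking/helping steps above provide). Names are illustrative assumptions.

    #include <stdbool.h>

    /* Multi-word compare-and-swap over n positions p[0..n-1]. */
    bool CASN_spec(int n, int *p[], const int old[], const int new_vals[])
    {
        /* --- everything below must appear to happen atomically ---           */
        for (int i = 0; i < n; i++)
            if (*p[i] != old[i])        /* any mismatch: fail, change nothing   */
                return false;
        for (int i = 0; i < n; i++)
            *p[i] = new_vals[i];        /* all matched: write all new values    */
        return true;
    }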

36-47
(No Transcript)
48
Lock-Free Deque
  • Practical algorithms in literature
  • Michael 2003, CAS-based lock-free algorithm for
    shared deques, Euro-Par 2003
  • Sundell and Tsigas, Lock-Free and Practical
    Doubly Linked List-Based Deques using Single-Word
    Compare-And-Swap, OPODIS 2004
  • Approach
  • Apply the new memory management to the lock-free deque

49-50
(No Transcript)
51
Conclusions
  • Work performed at EPCC
  • Improved algorithm of lock-free skip list
  • Improved Michael's hazard pointer algorithm
  • Experiments comparing with other recent
    dictionary algorithms
  • New implementation of CASN.
  • Experiments comparing with other recent CASN
    algorithms.
  • Experiments comparing a lock-free deque algorithm
    using different memory management techniques.
  • Future work
  • Implement new lock-free/wait-free dynamic data structures. More experiments.

52
Questions?
  • Contact Information
  • Address: Håkan Sundell, Computing Science, Chalmers University of Technology
  • Email: phs@cs.chalmers.se
  • Web: http://www.cs.chalmers.se/phs