Adaptive Locks: Combining Transactions and Locks for Efficient Concurrency

About This Presentation

Slides: 30

Provided by: ashis7

Learn more at: http://www.cse.msu.edu

Transcript and Presenter's Notes

1
Adaptive Locks: Combining Transactions and Locks
for Efficient Concurrency

Takayuki Usui et al.

2
Introduction

Computing is increasingly multiprocessor oriented.
Explicit multithreading is the most direct way
to program a parallel system (monitor-style
programming).
Flip side:
Interference between threads.
Hard-to-detect conditions such as deadlocks and
races.
Fine-grained critical sections are hard to get
right, and coarse-grained critical sections reduce
concurrency.

3
Alternatives

Transactional Memory.
Advantages
Higher level programming model. No need to know
which locks to acquire.
No need for fine-grained delineation of critical
sections.
Disadvantages
Livelocks, slower progress.
High Overhead.

4
Idea

Try to combine the advantages of locks and
transactional memory.
How do the authors propose we do that?
Adaptive Locks

5
What are adaptive locks?

Synchronization mechanism combining locks and
transactions.
Programmer can specify critical sections which
are executed as either mutex locks or atomically
as transactions.

6
How?

atomic (l1) {
  code
}
is equivalent to
atomic { code } when executing in transactional mode, or
lock(l1); code; unlock(l1); when executing in mutex mode.
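
As a rough illustration of the mutex-mode reading only, the sketch below spells out lock(l1); code; unlock(l1); in standard C11. The spinlock type sketch_lock_t, its helpers, the array s, and insert_mutex_mode are hypothetical names chosen for this example, not the authors' API. The transactional-mode version would need an STM runtime and is not shown.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the lock l1: a minimal C11 spinlock. */
typedef struct { atomic_flag held; } sketch_lock_t;

static void sketch_lock(sketch_lock_t *l) {
    /* Spin until the flag was previously clear, meaning we took the lock. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
    }
}

static void sketch_unlock(sketch_lock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical shared set protected by l1. */
#define SET_CAPACITY 16
static sketch_lock_t l1 = { ATOMIC_FLAG_INIT };
static int s[SET_CAPACITY];
static int size = 0;

/* Mutex-mode reading of: atomic (l1) { s[size] = val; size++; } */
static void insert_mutex_mode(int val) {
    sketch_lock(&l1);
    if (size < SET_CAPACITY) {   /* silently drop inserts once the set is full */
        s[size] = val;
        size++;
    }
    sketch_unlock(&l1);
}
```

In transactional mode the same body would run without taking l1 and would be retried on conflict.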

7
How do we decide whether it should run as a
transaction or as a mutex lock?

Let us throw out some terminology.
Nominal contention.
Actual contention.
Transactional overhead.

8
Nominal Contention

Thread 1 calls s.insert(10) and acquires the lock.
Thread 2 calls s.insert(20), cannot acquire the lock, and must wait.
Nominal contention: 1

public synchronized void insert(val) { s[size] = val; size++; }

9
Actual Contention

Thread 1 starts first and executes atomic { s.insert(10) }.
Thread 2 tries to execute atomic { s.insert(20) } simultaneously and aborts.
Actual contention: 1

// Thread 1 starts: s[0] = 10
// Thread 2 tries at the same time and aborts.

10
Transactional Overhead.

How much overhead is incurred when the critical
section executes in transactional mode versus
mutex mode.

11
How are these terms helpful?

The authors use these concepts to dynamically
calculate which mode the critical section should
be executed in.
Wait... are locks and transactions
interchangeable?
No, they are not. But we will discuss how, under
certain high-level correctness criteria, this can
be handled.

12
Contributions of this paper.

Efficient and effective implementation of
adaptive locks.
Trading some accuracy to make it faster and
reduce overhead.
Define conditions under which transaction and
mutex locks exhibit equivalent behavior.
Evaluate adaptive locks with micro and macro
benchmarks.

13
Programming with adaptive locks

Adaptive locks introduce syntax for labeled
atomic sections.
al_t lock1;
atomic (lock1) {
  // critical section
}

14
Some rules for using adaptive locks

Programmer has the burden to make sure that if
all the instances of atomic(lock1) are replaced
by mutex blocks (mutex mode) then the program is
still correct.
Programmer also has the burden to make sure that
if all the critical sections are executed as
transactions (transactional mode) then the
program still runs correctly.

15
More rules ..

All critical sections associated with the same
lock should execute in the same mode.
Mode of nested adaptive lock should be the same
as that of the surrounding lock.
Mode switching can be done either for
correctness (for example, I/O operations require
mutex mode) or for performance.

16
Cost benefit analysis

Remember the terms that we talked about before
Nominal Contention
Actual Contention
Transactional Overhead
The authors use these terms to come up with the
decision making logic.

17
And the winner is

a · o > c
Here a is actual contention, o is transactional
overhead, and c is nominal contention.
If this inequality holds, then mutex mode is
preferable.
All of these factors are computed dynamically and
separately for each lock.
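
The decision rule can be sketched as a small predicate. This is only an illustration: the struct lock_stats_t, its field types, and the function name prefer_mutex_mode are assumptions made for the example, and c is taken to be nominal contention, following the terms introduced earlier.

```c
#include <stdbool.h>

/* Per-lock statistics. The struct and its field types are hypothetical. */
typedef struct {
    double a; /* actual contention */
    double o; /* transactional overhead */
    double c; /* nominal contention (assumed meaning of c) */
} lock_stats_t;

/* Returns true when mutex mode is preferable, i.e. when a * o > c. */
static bool prefer_mutex_mode(const lock_stats_t *st) {
    return st->a * st->o > st->c;
}
```

For example, with a = 2, o = 3, and c = 5 the product 6 exceeds 5, so mutex mode would be chosen; with a = 1, o = 2, and c = 5 transactional mode would be chosen.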

18
Implementation and Optimizations

Extension of the C language.
The compiler translates each critical section into
two object-code versions: one for mutex mode and
one for transactional mode.
Adaptive locks replace regular lock acquisition.
The adaptive lock state is packed into a memory
word.

19
What is contained in the state?

Number of threads executing in transactional mode: thrdsInStmMode
Whether the lock is in mutex mode: mutexMode
Whether the mutex lock is held: lockHeld
Whether we are currently in the process of changing modes: transition
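
One way to pack these four fields into a single word is sketched below. The accessor names follow the ones used in the acquire routine, but the bit layout, the al_state_t typedef, and the AL_* macros are assumptions made for this example; the slides do not give the actual layout.

```c
#include <stdint.h>

/* Assumed layout (not taken from the slides):
 *   bit 0: mutexMode, bit 1: lockHeld, bit 2: transition,
 *   bits 3 and up: thrdsInStmMode. */
typedef uintptr_t al_state_t;

#define AL_MUTEX_MODE  ((al_state_t)1 << 0)
#define AL_LOCK_HELD   ((al_state_t)1 << 1)
#define AL_TRANSITION  ((al_state_t)1 << 2)
#define AL_FLAG_MASK   ((al_state_t)7)
#define AL_COUNT_SHIFT 3

/* Set or clear one flag bit, leaving every other field unchanged. */
static al_state_t setFlag(al_state_t st, al_state_t flag, int on) {
    return on ? (st | flag) : (st & ~flag);
}

static int mutexMode(al_state_t st)  { return (st & AL_MUTEX_MODE) != 0; }
static int lockHeld(al_state_t st)   { return (st & AL_LOCK_HELD) != 0; }
static int transition(al_state_t st) { return (st & AL_TRANSITION) != 0; }
static al_state_t thrdsInStmMode(al_state_t st) { return st >> AL_COUNT_SHIFT; }

static al_state_t setMutexMode(al_state_t st, int on)  { return setFlag(st, AL_MUTEX_MODE, on); }
static al_state_t setLockHeld(al_state_t st, int on)   { return setFlag(st, AL_LOCK_HELD, on); }
static al_state_t setTransition(al_state_t st, int on) { return setFlag(st, AL_TRANSITION, on); }

/* Replace the thread count while keeping the three flag bits. */
static al_state_t setThrdsInStmMode(al_state_t st, al_state_t n) {
    return (st & AL_FLAG_MASK) | (n << AL_COUNT_SHIFT);
}
```

Because the whole state fits in one word, a single compare-and-swap can change the mode, the held flag, and the thread count together.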

20
Acquire is the main routine

int acquire(al_t* lock) {
  int spins = 0;
  int useTransact = 0;
  INC(lock->thdsBlocked);
  while (1) {
    intptr_t prev, next;
    prev = lock->state;
    if (transition(prev) == 0) {
      /* No mode change in progress: pick a mode from the statistics. */
      if ((useTransact = transactMode(lock, spins))) {
        if (lockHeld(prev) == 0) {
          next = setMutexMode(prev, 0);
          next = setThrdsInStmMode(next, thrdsInStmMode(next) + 1);
          if (CAS(&lock->state, prev, next) == prev) break;
        } else {
          /* Mutex holder still inside: request a switch to transactional mode. */
          next = setMutexMode(prev, 0);
          next = setTransition(next, 1);
          CAS(&lock->state, prev, next);
        }
      } else {
        if (lockHeld(prev) == 0 && thrdsInStmMode(prev) == 0) {
          next = setMutexMode(prev, 1);
          next = setLockHeld(next, 1);
          if (CAS(&lock->state, prev, next) == prev) break;
        } else if (mutexMode(prev) == 0) {
          /* Transactions still running: request a switch to mutex mode. */
          next = setMutexMode(prev, 1);
          next = setTransition(next, 1);
          CAS(&lock->state, prev, next);
        }
      }
    } else {
      /* A mode change is in progress: complete it once the old mode drains. */
      if (mutexMode(prev) == 0) {
        if (lockHeld(prev) == 0) {
          useTransact = 1;
          next = setThrdsInStmMode(prev, thrdsInStmMode(prev) + 1);
          next = setTransition(next, 0);
          if (CAS(&lock->state, prev, next) == prev) break;
        }
      } else if (lockHeld(prev) == 0 && thrdsInStmMode(prev) == 0) {
        useTransact = 0;
        next = setLockHeld(prev, 1);
        next = setTransition(next, 0);
        if (CAS(&lock->state, prev, next) == prev) break;
      }
    }
    /* spins is assumed to be incremented here, once per failed attempt. */
    if (spin_thrld < ++spins) Yield();
  } /* end while(1) */
  DEC(lock->thdsBlocked);
  return useTransact;
}

21
Performance Optimizations

Threads need to update variables that keep counts
and calculate the various statistics for adaptive
reasoning.
Remember a (actual contention).
Instead of updating it atomically every time,
threads use regular (non-atomic) writes to it. A
shared update then changes the global value.
This can give rise to write-write races, but the
authors argue that sporadic inaccuracies in the
statistics are not significant.
Note also that inaccuracies in the statistics do
not cause incorrect program execution; they only
cause the other mode to be chosen for executing
the critical sections.
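
The trade-off between cheap, occasionally lossy updates and exact atomic updates can be sketched as follows. This is not the authors' code: the counter aborts_seen and the function names are hypothetical, and relaxed C11 atomic loads and stores are swapped in for the plain writes described above so that the example stays within defined behavior in C.

```c
#include <stdatomic.h>

/* Hypothetical per-lock statistic: aborts observed, feeding the estimate of a. */
static _Atomic unsigned long aborts_seen;

/* Cheap update: a relaxed load followed by a relaxed store, not an atomic
 * read-modify-write. Two threads may read the same old value, so an
 * increment can occasionally be lost. */
static void note_abort_cheap(void) {
    unsigned long v = atomic_load_explicit(&aborts_seen, memory_order_relaxed);
    atomic_store_explicit(&aborts_seen, v + 1, memory_order_relaxed);
}

/* Exact but more expensive alternative: an atomic fetch-and-add. */
static void note_abort_exact(void) {
    atomic_fetch_add_explicit(&aborts_seen, 1, memory_order_relaxed);
}

static unsigned long aborts_read(void) {
    return atomic_load_explicit(&aborts_seen, memory_order_relaxed);
}
```

A lost increment only skews the statistic slightly, which at worst changes which mode is chosen.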

22
Performance Optimizations contd ..

The atomic increment and decrement of the variable
lock->thdsBlocked are also avoided where possible.
They are performed only if the thread actually
spins; otherwise they are skipped. This differs
from the acquire code shown earlier.

23
Performance Optimizations contd ..
Before (counter always updated):
  int acquire(al_t* lock) {
    int spins = 0; ...
    INC(lock->thdsBlocked);
    while (1) { ...
      /* try to acquire, break if successful */
      if (spin_thrld < ++spins) Yield();
    }
    DEC(lock->thdsBlocked); ...
  }
After (counter updated only when spinning):
  int acquire(al_t* lock) {
    int spins = 0; ...
    while (1) { ...
      /* try to acquire, break if successful */
      if (spins == 0) INC(lock->thdsBlocked);
      if (spin_thrld < ++spins) Yield();
    }
    if (0 < spins) DEC(lock->thdsBlocked); ...
  }

24
Performance Optimizations contd ..

o (transactional overhead) depends on shared
memory updates.
To keep the estimate of o realistic but
inexpensive, it is calculated at regular intervals.
The number of memory accesses made by the sampled
transaction is noted and multiplied by a static
estimate of how much each access would cost.
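
A minimal sketch of such an estimate is given below, assuming the static cost is charged per memory access. The constants COST_PER_ACCESS and SAMPLE_INTERVAL, the struct txn_sample_t, and both function names are hypothetical values and names picked for illustration, not figures from the paper.

```c
#include <stdbool.h>

/* Hypothetical constants: static cost per shared-memory access (arbitrary
 * units) and how often a transaction is sampled. */
#define COST_PER_ACCESS 4UL
#define SAMPLE_INTERVAL 100UL

typedef struct {
    unsigned long accesses; /* shared-memory accesses in the sampled transaction */
} txn_sample_t;

/* Only every SAMPLE_INTERVAL-th transaction pays for the bookkeeping. */
static bool should_sample(unsigned long txn_count) {
    return txn_count % SAMPLE_INTERVAL == 0;
}

/* Estimated overhead o = observed access count * static per-access cost. */
static unsigned long estimate_overhead(const txn_sample_t *sample) {
    return sample->accesses * COST_PER_ACCESS;
}
```

Sampling keeps the bookkeeping off the common path, and the static multiplier avoids timing each access.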

25
Reality Check ..

But hey, is interchanging locks and transactions
legal? Are they equivalent?
Answer: No, they are not equivalent.
To be more specific, it depends on the type of
STM system. Under TL2, the STM used by the
authors, locks and transactions can behave
differently when they are used interchangeably.

26
No more boring bullets. We are not MBA students

Thread 2 commits but does not yet copy the value to memory.
Thread 1 commits, and it removes the first item.
Thread 2 eventually updates the value.
By that time, r1 and r2 will see stale values.

27
So how can we fix this?

We can make a simple observation from this:
there should be a lock for every shared
memory location.
Every access to these locations should be done
with the lock held.
This is the standard lockset well-formedness
criterion for multithreaded programs.
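
The criterion can be illustrated with a tiny checker. This is a sketch under assumptions: locks are numbered 0 to 31, a thread's held locks are tracked as a bitmask, and the names lockset_t, guarded_int_t, and access_is_well_formed are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Set of locks currently held by a thread: bit i set means lock i is held.
 * Lock indices are assumed to be in the range 0..31. */
typedef uint32_t lockset_t;

/* A shared location together with the index of the lock that guards it. */
typedef struct {
    int value;
    unsigned guard;
} guarded_int_t;

/* An access is well formed only if the guarding lock is in the held set. */
static bool access_is_well_formed(const guarded_int_t *loc, lockset_t held) {
    return ((held >> loc->guard) & 1u) != 0;
}
```

A location guarded by lock 3 passes the check only when bit 3 of the held set is set; holding a different lock, or none, fails it.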

28
Some results

Tested with micro and macro benchmarks.
Tested with red-black trees (STM), splay trees
(mutex locks), and fine-grained hash tables;
adaptive locks were as good as the better
concurrency mechanism.
Tested with STAMP (Stanford Transactional
Applications for Multi-Processing).

29
Questions?
