1
Optimistic Intra-Transaction Parallelism on Chip-Multiprocessors
  • Chris Colohan¹, Anastassia Ailamaki¹,
  • J. Gregory Steffan² and Todd C. Mowry¹,³
  • ¹Carnegie Mellon University
  • ²University of Toronto
  • ³Intel Research Pittsburgh

2
Chip Multiprocessors are Here!
AMD Opteron
IBM Power 5
Intel Yonah
  • 2 cores now, soon will have 4, 8, 16, or 32
  • Multiple threads per core
  • How do we best use them?

3
Multi-Core Enhances Throughput
(Figure: many users connected to a multi-core database server)
Cores can run concurrent transactions and improve throughput.
4
Multi-Core Enhances Throughput
(Figure: many users connected to a multi-core database server)
Can multiple cores improve transaction latency?
5
Parallelizing transactions
SELECT cust_info FROM customer
UPDATE district WITH order_id
INSERT order_id INTO new_order
foreach (item)
    GET quantity FROM stock
    quantity--
    UPDATE stock WITH quantity
    INSERT item INTO order_line
  • Intra-query parallelism
  • Used for long-running queries (decision support)
  • Does not work for short queries
  • Short queries dominate in commercial workloads

6
Parallelizing transactions
SELECT cust_info FROM customer
UPDATE district WITH order_id
INSERT order_id INTO new_order
foreach (item)
    GET quantity FROM stock
    quantity--
    UPDATE stock WITH quantity
    INSERT item INTO order_line
  • Intra-transaction parallelism
  • Each thread spans multiple queries
  • Hard to add to existing systems!
  • Need to change interface, add latches and locks,
    worry about correctness of parallel execution

7
Parallelizing transactions
SELECT cust_info FROM customer
UPDATE district WITH order_id
INSERT order_id INTO new_order
foreach (item)
    GET quantity FROM stock
    quantity--
    UPDATE stock WITH quantity
    INSERT item INTO order_line
  • Intra-transaction parallelism
  • Breaks transaction into threads
  • Hard to add to existing systems!
  • Need to change interface, add latches and locks,
    worry about correctness of parallel execution

Thread Level Speculation (TLS) makes parallelization easier.
8
Thread Level Speculation (TLS)
(Figure: operations p and q executed sequentially vs. as parallel speculative epochs)
9
Thread Level Speculation (TLS)
  • Use epochs
  • Detect violations
  • Restart to recover
  • Buffer state
  • Worst case: sequential
  • Best case: fully parallel

(Figure: sequential vs. parallel execution of epoch 1 and epoch 2; a write of p in epoch 1 causes a violation and restart (R2) of epoch 2)
Data dependences limit performance.
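A minimal C sketch (not from the talk) of the kind of loop TLS targets, showing the cross-epoch dependence: the stock/quantity names mirror the New Order example used later, and all identifiers here are hypothetical.

    /* Each loop iteration becomes one speculative epoch. */
    struct stock_entry { int quantity; };

    void process_order(struct stock_entry *stock, const int *item_id, int n_items)
    {
        for (int i = 0; i < n_items; i++) {
            int q = stock[item_id[i]].quantity;  /* speculative read                 */
            stock[item_id[i]].quantity = q - 1;  /* write: violates a later epoch's
                                                    read only if two iterations use
                                                    the same item                    */
        }
    }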
10
A Coordinated Effort
  • Transactions: TPC-C
  • DBMS: BerkeleyDB
  • Hardware: simulated machine
11
A Coordinated Effort
  • Transaction programmer: choose epoch boundaries
  • DBMS programmer: remove performance bottlenecks
  • Hardware developer: add TLS support to the architecture
12
So what's new?
  • Intra-transaction parallelism
  • Without changing the transactions
  • With minor changes to the DBMS
  • Without having to worry about locking
  • Without introducing concurrency bugs
  • With good performance
  • Halve transaction latency on four cores

13
Related Work
  • Optimistic Concurrency Control (Kung 82)
  • Sagas (Garcia-Molina & Salem 87)
  • Transaction chopping (Shasha 95)

14
Outline
  • Introduction
  • Related work
  • Dividing transactions into epochs
  • Removing bottlenecks in the DBMS
  • Results
  • Conclusions

15
Case Study: New Order (TPC-C)
GET cust_info FROM customer
UPDATE district WITH order_id
INSERT order_id INTO new_order
foreach (item)
    GET quantity FROM stock WHERE i_id = item
    UPDATE stock WITH quantity-1 WHERE i_id = item
    INSERT item INTO order_line
  • Only dependence is the quantity field
  • Very unlikely to occur (1/100,000)

16
Case Study: New Order (TPC-C)
GET cust_info FROM customer
UPDATE district WITH order_id
INSERT order_id INTO new_order
foreach (item)
    GET quantity FROM stock WHERE i_id = item
    UPDATE stock WITH quantity-1 WHERE i_id = item
    INSERT item INTO order_line

GET cust_info FROM customer
UPDATE district WITH order_id
INSERT order_id INTO new_order
TLS_foreach (item)
    GET quantity FROM stock WHERE i_id = item
    UPDATE stock WITH quantity-1 WHERE i_id = item
    INSERT item INTO order_line
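A rough sketch of what the TLS_foreach construct could expand to; the tls_spawn_epoch/tls_wait_epochs runtime calls are hypothetical, not the actual API from the talk:

    typedef struct order_item order_item_t;
    void process_order_line(void *item);                  /* runs one loop body       */
    void tls_spawn_epoch(void (*fn)(void *), void *arg);  /* hypothetical TLS runtime */
    void tls_wait_epochs(void);                           /* hypothetical TLS runtime */

    void tls_foreach_items(order_item_t *items, int n)
    {
        for (int i = 0; i < n; i++)
            tls_spawn_epoch(process_order_line, &items[i]); /* one epoch per item */
        tls_wait_epochs();  /* epochs commit in original loop order; a violated
                               epoch is rolled back and re-executed by the hardware */
    }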
17
Outline
  • Introduction
  • Related work
  • Dividing transactions into epochs
  • Removing bottlenecks in the DBMS
  • Results
  • Conclusions

18
Dependences in DBMS
19
Dependences in DBMS
  • Dependences serialize execution!
  • Performance tuning:
    • Profile execution
    • Remove bottleneck dependence
    • Repeat

20
Buffer Pool Management
(Figure: a CPU calls get_page(5) and then put_page(5); the page's reference count in the buffer pool goes from 1 back to 0)
21
Buffer Pool Management
(Figure: two epochs interleave get_page(5)/put_page(5) calls on the buffer pool, creating a dependence on the page's reference count)
TLS ensures the first epoch gets the page first. Who cares?
22
Buffer Pool Management
  • Escape speculation
  • Invoke operation
  • Store undo function
  • Resume speculation

(Figure: the epochs' get_page(5) and put_page(5) calls are performed outside of speculation on the buffer pool)
23
Buffer Pool Management
(Figure: an epoch's put_page(5) interleaved with other epochs' get_page(5) calls on the buffer pool)
Not undoable!
24
Buffer Pool Management
(Figure: epochs call get_page(5) on the buffer pool; the final put_page(5) is deferred)
  • Delay put_page until end of epoch (sketched below)
  • Avoid dependence
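A minimal sketch of the delayed release, assuming a hypothetical tls_defer_until_commit() hook; the talk describes the change only at this level:

    typedef struct page page_t;
    void put_page(page_t *page);                                 /* the real release  */
    void tls_defer_until_commit(void (*fn)(void *), void *arg);  /* hypothetical hook */

    static void put_page_cb(void *p) { put_page((page_t *)p); }

    /* Instead of decrementing the page's reference count while speculative
       (a dependence with any later epoch that touches the same page), queue
       the release to run once this epoch commits. */
    void put_page_wrapper(page_t *page)
    {
        tls_defer_until_commit(put_page_cb, page);
    }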

25
Removing Bottleneck Dependences
  • We introduce three techniques:
  • Delay operations until non-speculative
    • Mutex and lock acquire and release
    • Buffer pool, memory, and cursor release
    • Log sequence number assignment
  • Escape speculation
    • Buffer pool, memory, and cursor allocation
  • Traditional parallelization
    • Memory allocation, cursor pool, error checks, false sharing

26
Outline
  • Introduction
  • Related work
  • Dividing transactions into epochs
  • Removing bottlenecks in the DBMS
  • Results
  • Conclusions

27
Experimental Setup
  • Detailed simulation
  • Superscalar, out-of-order, 128-entry reorder buffer
  • Memory hierarchy modeled in detail
  • TPC-C transactions on BerkeleyDB
  • In-core database
  • Single user
  • Single warehouse
  • Measure interval of 100 transactions
  • Measuring latency, not throughput

28
Optimizing the DBMS: New Order
(Chart: execution time, normalized to the sequential baseline, showing a 26% improvement; annotations: other CPUs not helping, can't optimize much more, cache misses increase)
29
Optimizing the DBMS: New Order
(Chart: execution time, normalized to the sequential baseline, after each optimization step)
This process took me 30 days and <1,200 lines of code.
30
Other TPC-C Transactions
(Chart: execution time, normalized, for New Order, Delivery, Stock Level, Payment, and Order Status; time broken down into Idle CPU, Failed, Cache Miss, and Busy)
31
Conclusions
  • A new form of parallelism for databases
  • Tool for attacking transaction latency
  • Intra-transaction parallelism
  • Without major changes to DBMS
  • TLS can be applied to more than transactions
  • Halve transaction latency by using 4 CPUs

32
Any questions?
  • For more information, see
  • www.colohan.com

33
Backup Slides Follow
34
TPC-C Transactions on 2 CPUs
(Chart: execution time, normalized, for New Order, Delivery, Stock Level, Payment, and Order Status on 2 CPUs; time broken down into Idle CPU, Failed, Cache Miss, and Busy)
35
LATCHES
36
Latches
  • Mutual exclusion between transactions
  • Cause violations between epochs
  • Read-test-write cycle → RAW
  • Not needed between epochs
  • TLS already provides mutual exclusion!

37
Latches: Aggressive Acquire
(Diagram: aggressive latch acquire across epochs)
    Acquire, latch_cnt++, work, latch_cnt--
    latch_cnt++, work, (enqueue release)
    latch_cnt++, work, (enqueue release)
    Commit, work, latch_cnt--
    Commit, work, latch_cnt--, Release
38
Latches: Lazy Acquire
(Diagram: lazy latch acquire across epochs)
    Acquire, work, Release
    (enqueue acquire), work, (enqueue release)
    (enqueue acquire), work, (enqueue release)
    Acquire, Commit, work, Release
    Acquire, Commit, work, Release
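A hedged C sketch of lazy acquire; tls_is_speculative() and tls_defer_until_commit() are assumed names, not the paper's API. While speculative, an epoch skips the shared latch entirely and enqueues the acquire and release to be replayed at commit, since TLS already keeps the epochs mutually exclusive:

    typedef struct latch latch_t;
    void latch_acquire(latch_t *l);
    void latch_release(latch_t *l);
    int  tls_is_speculative(void);                               /* assumed */
    void tls_defer_until_commit(void (*fn)(void *), void *arg);  /* assumed */

    static void acquire_cb(void *l) { latch_acquire((latch_t *)l); }
    static void release_cb(void *l) { latch_release((latch_t *)l); }

    void latch_acquire_lazy(latch_t *l)
    {
        if (tls_is_speculative())
            tls_defer_until_commit(acquire_cb, l);  /* replayed when the epoch commits */
        else
            latch_acquire(l);                       /* non-speculative: take it now    */
    }

    void latch_release_lazy(latch_t *l)
    {
        if (tls_is_speculative())
            tls_defer_until_commit(release_cb, l);
        else
            latch_release(l);
    }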
39
HARDWARE
40
TLS in Database Systems
  • Large epochs
  • More dependences: must tolerate them
  • More state: bigger buffers

(Figure: comparing non-database TLS with TLS in database systems)
41
Feedback Loop
for (...) do_work()
42
Violations Feedback
(Figure: sequential vs. parallel execution; a write of p causes a violation and restart (R2) of a later epoch)
43
Eliminating Violations
(Figure: addresses 0x0FD8 → 0xFD20 and 0x0FC0 → 0xFC18)
44
Tolerating Violations: Sub-epochs
(Figure: an epoch divided into sub-epochs; a violation rewinds only to the start of the offending sub-epoch rather than the whole epoch)
45
Sub-epochs
  • Started periodically by hardware
    • How many?
    • When to start?
  • Hardware implementation
    • Just like epochs
    • Use more epoch contexts
    • No need to check violations between sub-epochs within an epoch

(Figure: sub-epochs within an epoch)
46
Old TLS Design
(Figure: four CPUs, each with a private write-back L1 cache, above a shared L2 and the rest of the memory system)
  • Buffer speculative state in the write-back L1 cache
  • Restart by invalidating speculative lines
  • Detect violations through invalidations
  • Problems
  • L1 cache not large enough
  • Later epochs only get values on commit

  • Rest of the system only sees committed data
47
New Cache Design
(Figure: four CPUs with private L1 caches, two L2 caches, and the rest of the memory system)
  • Speculative writes immediately visible to L2 (and later epochs)
  • Restart by invalidating speculative lines
  • Buffer speculative and non-speculative state for all epochs in L2
  • Detect violations at lookup time
  • Invalidation coherence between L2 caches
48
New Features
(Figure: the cache hierarchy, with the new features highlighted)
  • Speculative state in L1 and L2 cache
  • Cache line replication (versions)
  • Data dependence tracking within the cache
  • Speculative victim cache
49
Scaling
(Chart: execution time, normalized, as the number of CPUs scales)
50
Evaluating a 4-CPU system
(Chart: execution time, normalized, on a 4-CPU system; bars for Sequential, No Speculation, No Sub-epoch, Baseline, and TLS Seq; annotations: original benchmark run on 1 CPU, parallelized benchmark run on 1 CPU, without sub-epoch support, parallel execution, ignore violations (Amdahl's Law limit))
51
Sub-epochs: How many? How big?
  • Supporting more sub-epochs is better
  • Spacing depends on location of violations
  • Even spacing is good enough

52
Query Execution
  • Actions taken by a query
  • Bring pages into buffer pool
  • Acquire and release latches and locks
  • Allocate/free memory
  • Allocate/free and use cursors
  • Use B-trees
  • Generate log entries

These generate violations.
53
Applying TLS
  1. Parallelize loop
  2. Run benchmark
  3. Remove bottleneck
  4. Go to 2

54
Outline
Transaction Programmer
DBMS Programmer
Hardware Developer
55
TLS Execution
(Figures, slides 55 to 59: step-by-step animation of TLS execution; epochs issue operations p, q, s, and t, and a write of p by an earlier epoch violates a later epoch's speculative read, restarting it at R2)
60
Replication
(Figure: a cache line holding speculative changes from two different epochs)
Can't invalidate a line if it contains two epochs' changes.
61
Replication
(Figure: the cache line is replicated so that each epoch's changes go to a separate version)
62
Replication
(Figure: replicated cache lines, one version per epoch)
  • Makes epochs independent
  • Enables sub-epochs

63
Sub-epochs
(Figure: an epoch divided into sub-epochs 1a, 1b, 1c, and 1d, each buffering its own speculative state)
  • Uses more epoch contexts
  • Detection/buffering/rewind is free
  • More replication
  • Speculative victim cache

64
get_page() wrapper
page_t *get_page_wrapper(pageid_t id)
{
    static tls_mutex mut;
    page_t *ret;

    tls_escape_speculation();           /* leave speculation before the real call  */
    check_get_arguments(id);            /* validate possibly-speculative arguments */
    tls_acquire_mutex(&mut);
    ret = get_page(id);                 /* perform the operation non-speculatively */
    tls_release_mutex(&mut);
    tls_on_violation(put, ret);         /* register undo: put the page back if the
                                           epoch is violated and restarts          */
    tls_resume_speculation();
    return ret;
}

→ Wraps get_page()
65
get_page() wrapper
(same get_page_wrapper() code as on the previous slide)
→ No violations while calling get_page()
66
get_page() wrapper
(same get_page_wrapper() code as on the previous slides)
→ May get bad input data from the speculative thread!
67
get_page() wrapper
(same get_page_wrapper() code as on the previous slides)
→ Only one epoch per transaction at a time
68
get_page() wrapper
(same get_page_wrapper() code as on the previous slides)
→ How to undo get_page()
69
get_page() wrapper
  • Isolated: undoing this operation does not cause cascading aborts
  • Undoable: an easy way to return the system to its initial state
  • Can also be used for: cursor management, malloc() (see the sketch below)
(same get_page_wrapper() code as on the previous slides)
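For example, the same escape-speculation pattern sketched for malloc(); the wrapper name is hypothetical, the tls_* calls follow the get_page_wrapper code above, and free() is registered as the undo action:

    #include <stdlib.h>

    void *malloc_wrapper(size_t size)
    {
        static tls_mutex mut;
        void *ret;

        tls_escape_speculation();
        tls_acquire_mutex(&mut);
        ret = malloc(size);             /* performed non-speculatively           */
        tls_release_mutex(&mut);
        tls_on_violation(free, ret);    /* undo: free the block if this epoch is
                                           violated and restarts                 */
        tls_resume_speculation();
        return ret;
    }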

70
Sequential B-tree Inserts
(Figure: items 1 through 4 inserted sequentially into a B-tree leaf page, filling its free slots)