Title: Optimized Transaction Time Versioning Inside a Database Engine
1. Optimized Transaction Time Versioning Inside a Database Engine
- Intern: Feifei Li, Boston University
- Mentor: David Lomet, MSR
2. Transaction Time Support
- Provide access to prior states of a database
  - Auditing the database
  - Querying the historical data
  - Mining the pattern of changes to a database
- General approaches
  - Build it outside the database engine
  - Build it inside the database engine
3. Overview of a Versioned Database
[Figure: a database page with a Page Header, a Dynamic Slot Array (slots 0 and 1), and version chains for Record A (A.0, A.1) and Record B (B.0, B.1, B.2).]
4. Key Challenges
- Timestamping
  - Eager timestamping vs. lazy timestamping
  - A record takes its transaction's commit timestamp
  - Recovering timestamping information when the system crashes
- Indexing both current versions and historical versions simultaneously
  - Storage utilization
  - Query efficiency
5. Talk Outline
- Even lazier timestamping
- Deferred-key-split policy in the TSB tree
- Auditing the database
7. Lazy Timestamping
- When do we timestamp the records affected by a transaction?
  - Maintain a list of updated records and timestamp them when the transaction commits → may incur additional I/Os
  - Instead, timestamp records lazily, when they are later accessed by other queries, updates, page reads, and page writes
- Where does the timestamping information come from?
8. Volatile Timestamp Table (VTT) and Persistent Timestamp Table (PTT)
- The VTT lives in main memory, with columns (TID, Ttime, Refcnt); the PTT lives on disk, with columns (TID, Ttime)
- Writing the commit time to the PTT ensures that we can recover the timestamping information if the system crashes (the VTT is gone!)
- Example: transaction 23 begins with VTT entry (23, NA, 0). Inserting record A stamps it TID.23 and bumps the entry to (23, NA, 1); inserting record B bumps it to (23, NA, 2). When the transaction commits, the VTT entry becomes (23, 178432, 2) and (23, 178432) is written to the PTT.
9. Timestamping the Record
- Example: transaction 45 begins with VTT entry (45, NA, 0). Updating record A resolves A's stamp TID.23 to the commit time 178432 found in the VTT, decrementing entry 23's Refcnt from 2 to 1 (record B stays stamped TID.23, since it is not accessed). Inserting record C stamps it TID.45 and bumps entry 45 to (45, NA, 1). Updating record D resolves its stamp TID.88 to 342234 via the on-disk PTT entry (88, 342234). When transaction 45 commits at 923121, its VTT entry becomes (45, 923121, 1) and (45, 923121) is written to the PTT.
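The two-table protocol above can be condensed into a small sketch. This is illustrative only: the class, method names, and list-based entries are my own, not the engine's actual data structures.

```python
NA = None  # commit time not yet known

class TimestampTables:
    """Minimal sketch of lazy timestamping with a volatile (VTT) and a
    persistent (PTT) timestamp table. Hypothetical names throughout."""

    def __init__(self):
        self.vtt = {}  # main memory: TID -> [commit_time, refcnt]
        self.ptt = {}  # disk:        TID -> commit_time

    def begin(self, tid):
        self.vtt[tid] = [NA, 0]

    def touch(self, tid):
        """An insert/update stamps a record with TID.tid and bumps Refcnt."""
        self.vtt[tid][1] += 1

    def commit(self, tid, commit_time):
        self.vtt[tid][0] = commit_time
        self.ptt[tid] = commit_time  # persisted, so a crash loses nothing

    def resolve(self, tid):
        """A later access to a record stamped TID.tid replaces the TID with
        the real commit time and decrements Refcnt; at zero, the VTT entry
        is no longer needed and is dropped."""
        entry = self.vtt.get(tid)
        if entry is None or entry[0] is NA:
            return None  # transaction still active: leave the TID stamp
        entry[1] -= 1
        if entry[1] == 0:
            del self.vtt[tid]
        return entry[0]
```

Replaying the slide's example: transaction 23 inserts A and B (Refcnt reaches 2), commits at 178432; resolving A's stamp leaves the entry with Refcnt 1, and resolving B's stamp drains it from the VTT.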
10. The Checkpointing Process
- The log is divided by LSN(P), LSN(U), and the end of log (EOL):
  - All log records before LSN(P) have been removed from the log, so it is impossible to recover information earlier than LSN(P).
  - The dirty pages between LSN(P) and LSN(U) were all flushed to disk prior to the current checkpoint.
  - The current checkpoint may not have finished yet, and log records with LSNs between LSN(U) and EOL are not guaranteed to be stable.
[Figure: the log timeline across the (k-2)th, (k-1)th, and kth checkpoints, marked with LSN(P), LSN(U), and EOL.]
11. Garbage Collection
12. Let's Be Even Lazier
- Don't write an entry to the PTT when a transaction commits
- Instead, piggyback the timestamping information on the commit log record, so that we can still recover it if necessary
- Batch-update entries from the VTT to the PTT at each checkpoint
- Why is this better?
  - A batch update in one transaction is faster than writing to the PTT on a per-transaction basis
  - Many entries have their Refcnt down to zero by checkpoint time → fewer writes to the PTT
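Under the assumptions of the earlier sketch (a dict-based VTT/PTT; function name illustrative), the checkpoint-time batch migration might look like this:

```python
def checkpoint_flush(vtt, ptt):
    """Batch-migrate committed VTT entries to the PTT at checkpoint time.
    Entries whose Refcnt already dropped to zero (every record they stamp
    was resolved in the meantime) never touch the PTT at all.
    vtt: TID -> [commit_time, refcnt]; ptt: TID -> commit_time.
    Illustrative sketch, not the engine's actual code."""
    writes = 0
    for tid in list(vtt):
        commit_time, refcnt = vtt[tid]
        if commit_time is None:
            continue                 # still active: its commit log record covers recovery
        if refcnt == 0:
            del vtt[tid]             # fully resolved: needs no persistent entry at all
        else:
            ptt[tid] = commit_time   # still referenced: persist in one batch write
            writes += 1
    return writes
```

Counting `writes` shows the claimed saving: entries fully resolved before the checkpoint cost zero PTT writes, while the classic scheme would have written every one of them at commit.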
13. The New Story
- Example: transaction 23 begins, inserts record A (stamped TID.23), and commits at 178432; its VTT entry goes (23, NA, 0) → (23, NA, 1) → (23, 178432, 1), but nothing is written to the PTT at commit. Transaction 76 then begins, updates A (resolving its stamp to 178432 and dropping entry 23's Refcnt to 0) and inserts record B (stamped TID.76), then commits at 287544, giving VTT entry (76, 287544, 1). At the checkpoint, entry 23 has Refcnt 0 and is simply discarded, while (76, 287544) is batch-written to the on-disk PTT.
14. Be Careful When Updating the VTT and PTT at the Checkpoint
16. Improvement
- Each record is 200 bytes
- The database is initialized with 5,000 records
- Generated workloads contain up to 10,000 transactions
- Each transaction is an insert or an update (the latter to a record newly inserted by another transaction)
- One checkpoint every 500 transactions
- Cost metrics
  - Execution time
  - Number of writes to the PTT
  - Number of batched updates
17. Execution Time
- Audit mode: always keep everything in the PTT
18. Number of Writes to the PTT
19. Batched Update Analysis
20. Talk Outline
- Even lazier timestamping
- Deferred-key-split policy in the TSB tree
- Auditing the database
21. Time-Split B (TSB) Tree
- Indexes both current-version pages and historical-version pages simultaneously
- Time split
  - Create a new page; the historical records in the current page are pushed into the new page
- Key split
  - Proceeds as a normal B-tree key split
- When should we time-split, and when should we key-split?
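The split decision can be illustrated with a small sketch. The record encoding, the 0.67 threshold, and the function name are assumptions for illustration, not the actual TSB-tree policy values:

```python
def choose_split(records, key_split_threshold=0.67):
    """Hypothetical split policy for a full TSB-tree page: if the fraction
    of live (current-version) records exceeds the key-split threshold,
    key-split like a B-tree; otherwise time-split, pushing the historical
    versions to a new historical page.
    records: list of (key, version, end_time), end_time None for current versions."""
    live = sum(1 for _key, _ver, end in records if end is None)
    frac_current = live / len(records)
    return "key split" if frac_current >= key_split_threshold else "time split"
```

The intuition: a page dominated by history gains little from a key split (both halves would still be mostly dead versions), so a time split drains the history first; a page dominated by current versions splits like an ordinary B-tree page.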
22. What Happens Now
- Inserting record C finds the current page full; a time split pushes the historical versions into a new historical page, leaving the current versions (A.1, B.2) and room for C in the current page.
- But what if the current page also exceeds the key-split threshold?
23. Why Do We Need a Key-Split Threshold?
- Waiting until the page is full and then key-splitting leads to too many time splits, and hence lots of replicated records among the historical versions
- What is the best value for the key-split threshold?
  - Too high → overall utilization drops
  - Too low → current-version utilization is reduced
  - We must find a balance
24. Could We Do Better?
- A key split that immediately follows a time split leaves two pages with utilization 0.5 × thresh_ksplit
- If the new pages are not filled up quickly, storage utilization is wasted for no good reason
- A fix
  - Defer the key split until the next time the page requires one
  - Simulate the outcome as if the key split had been performed on the previous occasion
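A minimal sketch of the deferred-key-split bookkeeping, assuming the threshold test from before; the `Page` fields, the 0.67 value, and the median choice of split key are illustrative assumptions:

```python
class Page:
    """Toy current page for sketching the deferred-key-split policy."""
    def __init__(self, ksplit_thresh=0.67):
        self.keys = []                  # keys of the current-version records
        self.ksplit_thresh = ksplit_thresh
        self.deferred_split_key = None  # remembered from a previous overflow

def on_page_full(page, current_fraction):
    """Decide the split action when an insert finds the page full.
    current_fraction is the share of the page holding current versions."""
    if current_fraction < page.ksplit_thresh:
        return "time split"  # mostly history: a time split alone suffices
    if page.deferred_split_key is None:
        # First overflow above the threshold: time-split now, but only
        # *remember* where a key split would have divided the page.
        page.deferred_split_key = sorted(page.keys)[len(page.keys) // 2]
        return "time split (key split deferred)"
    # The page qualified on the previous occasion and has filled again:
    # key-split now, using the key-split value remembered from that occasion.
    key = page.deferred_split_key
    page.deferred_split_key = None
    return f"key split at {key}"
```

This captures the slide's point: the expensive key split only happens once the page has proven it refills quickly, and the split key is the one that would have been used on the earlier occasion.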
25. Deferring the Key Split
- As before, inserting record C finds the current page full and a time split is performed; but even though the page exceeds the key-split threshold, we still insert the record and only remember the key-split value.
- When the page fills again (e.g., after an update to record D), we key-split if the page had already satisfied the key-split requirement last time, using the key-split value from the last occasion when a key split should have happened.
26. Analytical Result
- We can show the following bound [equation not preserved], where in is the insertion ratio, up is the update ratio, and cr is the compression ratio.
27. The Goal of Our Design
- Ensure that for any particular version, the version utilization is kept above a specified threshold value.
28. Experiment
- 50,000 transactions
- Each transaction inserts or updates a record
- Varying the insert/update ratio in the workload
- Each record is 200 bytes
- Use the delta-compression technique to compress the historical versions (as they share a lot of common bits with the newer version)
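The slides do not specify the delta-compression scheme; as one simple possibility exploiting the shared bits, a byte-wise delta against the newer version could be sketched like this (function names and the fixed-size-record assumption are mine):

```python
def delta_encode(old: bytes, new: bytes):
    """Compress a historical version by storing only the bytes where it
    differs from the newer version it shares a page with (assumes
    equal-length, fixed-size records). Returns (offset, old_byte) pairs."""
    assert len(old) == len(new)
    return [(i, o) for i, (o, n) in enumerate(zip(old, new)) if o != n]

def delta_decode(new: bytes, delta):
    """Rebuild the old version from the new version plus its delta."""
    buf = bytearray(new)
    for i, o in delta:
        buf[i] = o
    return bytes(buf)
```

For 200-byte records where an update touches only a few fields, the delta is a handful of pairs instead of a full copy, which is what drives the multi-version utilization gains measured below.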
29. Single-Version Current Utilization (SVCU)
30. Multi-Version Utilization (MVU)
31. Talk Outline
- Even lazier timestamping
- Deferred-key-split policy in the TSB tree
- Auditing the database
32. Auditing a Database
- Transaction versioning support enables checking any prior state of a database
- Store the user id in the PTT for each transaction entry
  - The user id is taken from the current session the transaction belongs to
- Any change to the database is then traceable
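With the user id stored per PTT entry, tracing a change reduces to a lookup from a record version's timestamp back to the transaction that wrote it. A hedged sketch, with a hypothetical row schema:

```python
def who_changed(ptt_rows, record_timestamp):
    """Return the user id responsible for the version stamped with
    record_timestamp, by finding the PTT entry with that commit time.
    ptt_rows: iterable of (tid, commit_time, user_id) tuples; this
    extended schema is illustrative, not the engine's actual layout."""
    for _tid, commit_time, user_id in ptt_rows:
        if commit_time == record_timestamp:
            return user_id
    return None  # no matching transaction: timestamp unknown to the PTT
```

Because every committed transaction that still stamps some record has a PTT entry, every surviving version can be traced to a user this way.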
33. Conclusion
- Transaction versioning support inside the database engine is one step closer to being truly practical
- What other interesting applications become possible with transaction versioning support?
34. Thanks!