Title: Optimized Transaction Time Versioning Inside a Database Engine
1. Optimized Transaction Time Versioning Inside a Database Engine
- Intern: Feifei Li, Boston University
- Mentor: David Lomet, MSR
2. Transaction Time Support
- Provide access to prior states of a database
  - Auditing the database
  - Querying the historical data
  - Mining the pattern of changes to a database
- General approaches
  - Build it outside the database engine
  - Build it inside the database engine
3. Overview of a Versioned Database
[Figure: a database page with a Page Header, a Dynamic Slot Array (slots 0 and 1), and version chains for Record A (A.0, A.1) and Record B (B.0, B.1, B.2).]
4. Key Challenges
- Timestamping
  - Eager timestamping vs. lazy timestamping
  - A record takes its transaction's commit timestamp
  - Recovering timestamping information when the system crashes
- Indexing both current versions and historical versions simultaneously
  - Storage utilization
  - Query efficiency
5. Talk Outline
- Even lazier timestamping
- Deferred-key-split policy in the TSB tree
- Auditing the database
7. Lazy Timestamping
- When do we timestamp the records affected by a transaction?
  - Maintain a list of updated records and timestamp them when the transaction commits → may incur additional I/Os
  - Instead, timestamp records lazily, when they are later accessed by other queries, updates, page reads, and page writes
- Where does the timestamping information come from?
8. Volatile Timestamp Table (VTT) and Persistent Timestamp Table (PTT)
- The VTT lives in main memory, with columns (TID, Ttime, Refcnt); the PTT lives on disk, with columns (TID, Ttime)
- Writing the commit time to the PTT ensures that we can recover the timestamping information if the system crashes (the VTT is gone!)
- Example: transaction 23 begins with VTT entry (23, NA, 0). Inserting record A stamps it TID.23 and bumps the entry to (23, NA, 1); inserting record B bumps it to (23, NA, 2). When the transaction commits, the VTT entry becomes (23, 178432, 2) and (23, 178432) is written to the PTT.
9. Timestamping the Record
- Example: transaction 45 begins with VTT entry (45, NA, 0). Updating record A resolves A's stamp TID.23 to the commit time 178432 found in the VTT, decrementing entry 23's Refcnt from 2 to 1 (record B stays stamped TID.23, since it is not accessed). Inserting record C stamps it TID.45 and bumps entry 45 to (45, NA, 1). Updating record D resolves its stamp TID.88 to 342234 via the on-disk PTT entry (88, 342234). When transaction 45 commits at 923121, its VTT entry becomes (45, 923121, 1) and (45, 923121) is written to the PTT.
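The two-table protocol above can be condensed into a small sketch. This is illustrative only: the class, method names, and list-based entries are my own, not the engine's actual data structures.

```python
NA = None  # commit time not yet known

class TimestampTables:
    """Minimal sketch of lazy timestamping with a volatile (VTT) and a
    persistent (PTT) timestamp table. Hypothetical names throughout."""

    def __init__(self):
        self.vtt = {}  # main memory: TID -> [commit_time, refcnt]
        self.ptt = {}  # disk:        TID -> commit_time

    def begin(self, tid):
        self.vtt[tid] = [NA, 0]

    def touch(self, tid):
        """An insert/update stamps a record with TID.tid and bumps Refcnt."""
        self.vtt[tid][1] += 1

    def commit(self, tid, commit_time):
        self.vtt[tid][0] = commit_time
        self.ptt[tid] = commit_time  # persisted, so a crash loses nothing

    def resolve(self, tid):
        """A later access to a record stamped TID.tid replaces the TID with
        the real commit time and decrements Refcnt; at zero, the VTT entry
        is no longer needed and is dropped."""
        entry = self.vtt.get(tid)
        if entry is None or entry[0] is NA:
            return None  # transaction still active: leave the TID stamp
        entry[1] -= 1
        if entry[1] == 0:
            del self.vtt[tid]
        return entry[0]
```

Replaying the slide's example: transaction 23 inserts A and B (Refcnt reaches 2), commits at 178432; resolving A's stamp leaves the entry with Refcnt 1, and resolving B's stamp drains it from the VTT.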
10. The Checkpointing Process
- The log is divided by LSN(P), LSN(U), and the end of log (EOL):
  - All log records before LSN(P) have been removed from the log, so it is impossible to recover information earlier than LSN(P).
  - The dirty pages between LSN(P) and LSN(U) were all flushed to disk prior to the current checkpoint.
  - The current checkpoint may not have finished yet, and log records with LSNs between LSN(U) and EOL are not guaranteed to be stable.
[Figure: the log timeline across the (k-2)th, (k-1)th, and kth checkpoints, marked with LSN(P), LSN(U), and EOL.]
11. Garbage Collection
12. Let's Be Even Lazier
- Don't write an entry to the PTT when a transaction commits
- Instead, piggyback the timestamping information on the commit log record, so that we can still recover it if necessary
- Batch-update entries from the VTT to the PTT at each checkpoint
- Why is this better?
  - A batch update in one transaction is faster than writing to the PTT on a per-transaction basis
  - Many entries have their Refcnt down to zero by checkpoint time → fewer writes to the PTT
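Under the assumptions of the earlier sketch (a dict-based VTT/PTT; function name illustrative), the checkpoint-time batch migration might look like this:

```python
def checkpoint_flush(vtt, ptt):
    """Batch-migrate committed VTT entries to the PTT at checkpoint time.
    Entries whose Refcnt already dropped to zero (every record they stamp
    was resolved in the meantime) never touch the PTT at all.
    vtt: TID -> [commit_time, refcnt]; ptt: TID -> commit_time.
    Illustrative sketch, not the engine's actual code."""
    writes = 0
    for tid in list(vtt):
        commit_time, refcnt = vtt[tid]
        if commit_time is None:
            continue                 # still active: its commit log record covers recovery
        if refcnt == 0:
            del vtt[tid]             # fully resolved: needs no persistent entry at all
        else:
            ptt[tid] = commit_time   # still referenced: persist in one batch write
            writes += 1
    return writes
```

Counting `writes` shows the claimed saving: entries fully resolved before the checkpoint cost zero PTT writes, while the classic scheme would have written every one of them at commit.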
13. The New Story
- Example: transaction 23 begins, inserts record A (stamped TID.23), and commits at 178432; its VTT entry goes (23, NA, 0) → (23, NA, 1) → (23, 178432, 1), but nothing is written to the PTT at commit. Transaction 76 then begins, updates A (resolving its stamp to 178432 and dropping entry 23's Refcnt to 0) and inserts record B (stamped TID.76), then commits at 287544, giving VTT entry (76, 287544, 1). At the checkpoint, entry 23 has Refcnt 0 and is simply discarded, while (76, 287544) is batch-written to the on-disk PTT.
14. Be Careful When Updating the VTT and PTT at the Checkpoint
16. Improvement
- Each record is 200 bytes
- The database is initialized with 5,000 records
- Generated workloads contain up to 10,000 transactions
- Each transaction is an insert or an update (the latter to a record newly inserted by another transaction)
- One checkpoint every 500 transactions
- Cost metrics
  - Execution time
  - Number of writes to the PTT
  - Number of batched updates
17. Execution Time
- Audit mode: always keep everything in the PTT
18. Number of Writes to the PTT
19. Batched Update Analysis
20. Talk Outline
- Even lazier timestamping
- Deferred-key-split policy in the TSB tree
- Auditing the database
21. Time-Split B (TSB) Tree
- Indexes both current-version pages and historical-version pages simultaneously
- Time split
  - Create a new page; the historical records in the current page are pushed into the new page
- Key split
  - Proceeds as a normal B-tree key split
- When should we time-split, and when should we key-split?
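The split decision can be illustrated with a small sketch. The record encoding, the 0.67 threshold, and the function name are assumptions for illustration, not the actual TSB-tree policy values:

```python
def choose_split(records, key_split_threshold=0.67):
    """Hypothetical split policy for a full TSB-tree page: if the fraction
    of live (current-version) records exceeds the key-split threshold,
    key-split like a B-tree; otherwise time-split, pushing the historical
    versions to a new historical page.
    records: list of (key, version, end_time), end_time None for current versions."""
    live = sum(1 for _key, _ver, end in records if end is None)
    frac_current = live / len(records)
    return "key split" if frac_current >= key_split_threshold else "time split"
```

The intuition: a page dominated by history gains little from a key split (both halves would still be mostly dead versions), so a time split drains the history first; a page dominated by current versions splits like an ordinary B-tree page.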
22. What Happens Now
- Inserting record C finds the current page full; a time split pushes the historical versions into a new historical page, leaving the current versions (A.1, B.2) and room for C in the current page.
- But what if the current page also exceeds the key-split threshold?
23. Why Do We Need a Key-Split Threshold?
- Waiting until the page is full and then key-splitting leads to too many time splits, and hence lots of replicated records among the historical versions
- What is the best value for the key-split threshold?
  - Too high → overall utilization drops
  - Too low → current-version utilization is reduced
  - We must find a balance
24. Could We Do Better?
- A key split that immediately follows a time split leaves two pages with utilization 0.5 × thresh_ksplit
- If the new pages are not filled up quickly, storage utilization is wasted for no good reason
- A fix
  - Defer the key split until the next time the page requires one
  - Simulate the outcome as if the key split had been performed on the previous occasion
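A minimal sketch of the deferred-key-split bookkeeping, assuming the threshold test from before; the `Page` fields, the 0.67 value, and the median choice of split key are illustrative assumptions:

```python
class Page:
    """Toy current page for sketching the deferred-key-split policy."""
    def __init__(self, ksplit_thresh=0.67):
        self.keys = []                  # keys of the current-version records
        self.ksplit_thresh = ksplit_thresh
        self.deferred_split_key = None  # remembered from a previous overflow

def on_page_full(page, current_fraction):
    """Decide the split action when an insert finds the page full.
    current_fraction is the share of the page holding current versions."""
    if current_fraction < page.ksplit_thresh:
        return "time split"  # mostly history: a time split alone suffices
    if page.deferred_split_key is None:
        # First overflow above the threshold: time-split now, but only
        # *remember* where a key split would have divided the page.
        page.deferred_split_key = sorted(page.keys)[len(page.keys) // 2]
        return "time split (key split deferred)"
    # The page qualified on the previous occasion and has filled again:
    # key-split now, using the key-split value remembered from that occasion.
    key = page.deferred_split_key
    page.deferred_split_key = None
    return f"key split at {key}"
```

This captures the slide's point: the expensive key split only happens once the page has proven it refills quickly, and the split key is the one that would have been used on the earlier occasion.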
25. Deferring the Key Split
- As before, inserting record C finds the current page full and a time split is performed; but even though the page exceeds the key-split threshold, we still insert the record and only remember the key-split value.
- When the page fills again (e.g., after an update to record D), we key-split if the page had already satisfied the key-split requirement last time, using the key-split value from the last occasion when a key split should have happened.
26. Analytical Result
- We can show the following bound [equation not preserved], where in is the insertion ratio, up is the update ratio, and cr is the compression ratio.
27. The Goal of Our Design
- Ensure that for any particular version, the version utilization is kept above a specified threshold value.
28. Experiment
- 50,000 transactions
- Each transaction inserts or updates a record
- Varying the insert/update ratio in the workload
- Each record is 200 bytes
- Use the delta-compression technique to compress the historical versions (as they share a lot of common bits with the newer version)
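The slides do not specify the delta-compression scheme; as one simple possibility exploiting the shared bits, a byte-wise delta against the newer version could be sketched like this (function names and the fixed-size-record assumption are mine):

```python
def delta_encode(old: bytes, new: bytes):
    """Compress a historical version by storing only the bytes where it
    differs from the newer version it shares a page with (assumes
    equal-length, fixed-size records). Returns (offset, old_byte) pairs."""
    assert len(old) == len(new)
    return [(i, o) for i, (o, n) in enumerate(zip(old, new)) if o != n]

def delta_decode(new: bytes, delta):
    """Rebuild the old version from the new version plus its delta."""
    buf = bytearray(new)
    for i, o in delta:
        buf[i] = o
    return bytes(buf)
```

For 200-byte records where an update touches only a few fields, the delta is a handful of pairs instead of a full copy, which is what drives the multi-version utilization gains measured below.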
29. Single-Version Current Utilization (SVCU)
30. Multi-Version Utilization (MVU)
31. Talk Outline
- Even lazier timestamping
- Deferred-key-split policy in the TSB tree
- Auditing the database
32. Auditing a Database
- Transaction versioning support enables checking any prior state of a database
- Store the user id in the PTT for each transaction entry
  - The user id is taken from the current session the transaction belongs to
- Any change to the database is then traceable
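With the user id stored per PTT entry, tracing a change reduces to a lookup from a record version's timestamp back to the transaction that wrote it. A hedged sketch, with a hypothetical row schema:

```python
def who_changed(ptt_rows, record_timestamp):
    """Return the user id responsible for the version stamped with
    record_timestamp, by finding the PTT entry with that commit time.
    ptt_rows: iterable of (tid, commit_time, user_id) tuples; this
    extended schema is illustrative, not the engine's actual layout."""
    for _tid, commit_time, user_id in ptt_rows:
        if commit_time == record_timestamp:
            return user_id
    return None  # no matching transaction: timestamp unknown to the PTT
```

Because every committed transaction that still stamps some record has a PTT entry, every surviving version can be traced to a user this way.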
33. Conclusion
- Transaction versioning support inside the database engine is one step closer to being truly practical
- What other interesting applications become possible with transaction versioning support?
34. Thanks!