Title: An old problem: Time travel in databases
Skippy: Enabling Long-Lived Snapshots of the Long-Lived Past
Combat Skew with Skippy
Key Idea
For faster scan, create higher-level logs of FEMs
with fewer repeated mappings
Motivation
Skippy Level 1
- Divide mapLog into equal-sized chunks called nodes
- Copy each FEM in a mapLog node into Skippy Level 1
- At the end of each node, record an up-link that points to the next position in Skippy Level 1 where a mapping will be stored
- To construct Skippy Level N, recursively apply the same procedure to the previous Skippy Level
- When scanning, follow up-links to Skippy Levels (a Skippy scan)
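The construction and scan steps above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the Berkeley DB implementation: the names `build_level`, `build_skippy`, `maplog_scan`, and `skippy_scan` are hypothetical, a mapping is modeled as a `(page, location)` tuple, and a trailing partial node is treated like a full one.

```python
def build_level(log, node_size):
    """Copy the FEMs of each node of `log` into a new upper level.

    Returns (upper_level, uplinks); uplinks[i] is the position in the
    upper level where the next mapping will be stored once node i ends.
    """
    upper, uplinks = [], []
    for node_start in range(0, len(log), node_size):
        seen = set()
        for page, loc in log[node_start:node_start + node_size]:
            if page not in seen:          # first-encountered within this node
                seen.add(page)
                upper.append((page, loc))
        uplinks.append(len(upper))
    return upper, uplinks


def build_skippy(maplog, node_size, levels):
    """Return ([mapLog, Level 1, ..., Level N], up-links per level)."""
    logs, links = [list(maplog)], []
    for _ in range(levels):
        upper, uplinks = build_level(logs[-1], node_size)
        logs.append(upper)
        links.append(uplinks)
    return logs, links


def maplog_scan(maplog, start):
    """Baseline: scan the whole mapLog from `start`, keeping FEMs."""
    spt = {}
    for page, loc in maplog[start:]:
        spt.setdefault(page, loc)
    return spt


def skippy_scan(logs, links, node_size, start):
    """Scan to the end of the current node, then follow the up-link."""
    spt, pos, visited = {}, start, 0
    for level, log in enumerate(logs):
        if pos >= len(log):               # nothing stored past this point
            break
        top = level == len(logs) - 1
        node = pos // node_size
        end = len(log) if top else min((node + 1) * node_size, len(log))
        for page, loc in log[pos:end]:
            spt.setdefault(page, loc)
            visited += 1
        if not top:
            pos = links[level][node]
    return spt, visited


# Hypothetical mapLog with repeated mappings for P1 and P2.
demo_log = [("P1", "a1"), ("P1", "a2"), ("P2", "a3"), ("P1", "a4"),
            ("P1", "a5"), ("P1", "a6"), ("P2", "a7"), ("P1", "a8"),
            ("P3", "a9"), ("P3", "a10")]
demo_logs, demo_links = build_skippy(demo_log, node_size=4, levels=2)
demo_spt, demo_visited = skippy_scan(demo_logs, demo_links, 4, start=0)
```

In this toy run the Skippy scan reads fewer mappings than a full mapLog scan from the same start, because repeated mappings inside each node are skipped at the higher levels.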
- An old problem: time travel in databases
- Retaining past state at logical record level (ImmortalDB, Postgres) changes arrangement of current state
- File system-level approaches block transactions to get consistency (VSS)
- A new solution: Split Snapshots
- Integrates with page cache and transaction manager to provide disk page-level snapshots
- Application declares transactionally-consistent snapshots at any time with any frequency
- Snapshots are retained incrementally using copy-on-write (COW), without reorganizing database
- All applications and access methods run unmodified on persistent, on-line snapshots
- Achieve good performance in the same manner as the database: leverage db recovery to defer snapshot writes
- A new problem: how to index copy-on-write split snapshots?
Figure: mapLog with mappings for pages P1 to P3 retained for snapshots Snap 1 to Snap 6, with the Start position marked. Solid arrows denote pointers; dotted arrows indicate copying.
Indexing Split Snapshots
A Skippy scan that begins at Start(X) constructs the same SPT for snapshot X as a mapLog scan
Analysis
- Each snapshot needs its own page table (an SPT) which points to current-state and COW'd pages
- Order of Events
- Snapshot 1 declared
- Page 1 modified
- Snapshot 2 declared
- Page 1 modified again
- Page 2 modified
Expected cost to build SPT factors in
- acceleration
- cost to read sequentially at each level
- cost of disk seeks between each level
Figure: SPT1 pointing to pages P1 to P3 among the database and snapshot pages.
Plot shows time to build SPT versus the number of
Skippy levels for various skews
- P2 is shared by SPT1 and SPT2
- P3 has not been modified, so SPT1 and SPT2 point to P3 in the database
Experimental Evaluation
Implemented in Berkeley DB (BDB)
- For efficiency, delay writing mapLog and Skippies to disk until checkpoint
- For safety, leverage existing BDB recovery for Skippy and snapshot pages
Cost of Scanning to Construct SPT
Updating SPTs on disk would be costly, since one
COW may change the pointers in multiple SPTs
- Setup
- 100M database
- 50K node (holds 2560 mappings, which is 1/10th the number of database pages)
- 10,000 rpm disk
- Conclusions
- Skippy could counteract 80/20 skew in 3 levels
- 99/1 skew has a hot section much smaller than the node size, so one level is enough
Accessing Snapshots with mapLog
- Instead of maintaining many SPTs, append mappings to snapshot pages into a log, the mapLog (inexpensive to write)
- Ordering invariant: mappings retained for snapshot X are written into mapLog before mappings retained for snapshot X+1
- Construct SPT for snapshot X by scanning for first-encountered mappings (FEMs)
- Any page for which a mapping is not found in mapLog is still in the database (i.e., has not been COW'd yet)
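As a rough illustration of the mapLog approach, the toy model below replays the order of events listed under Indexing Split Snapshots (Snapshot 1 declared, Page 1 modified, Snapshot 2 declared, Page 1 modified again, Page 2 modified). The class `SplitSnapshotStore` and its method names are hypothetical, and it assumes one copy-on-write per page per snapshot interval; it is a sketch of the idea, not the storage manager's code.

```python
class SplitSnapshotStore:
    """Toy model: a dict is the database, a list is the snapshot page area."""

    def __init__(self, pages):
        self.database = dict(pages)   # current state: page id -> contents
        self.snapshot_pages = []      # COW'd pre-states, append-only
        self.maplog = []              # (page id, slot in snapshot_pages)
        self.start = {}               # snapshot id -> Start(X) in mapLog
        self.copied = set()           # pages COW'd since the last declaration

    def declare_snapshot(self):
        snap_id = len(self.start) + 1
        self.start[snap_id] = len(self.maplog)
        self.copied.clear()
        return snap_id

    def write(self, page, contents):
        # First update after a declaration: save the pre-state and append
        # its mapping, so mappings for snapshot X precede those for X+1.
        if self.start and page not in self.copied:
            self.snapshot_pages.append(self.database[page])
            self.maplog.append((page, len(self.snapshot_pages) - 1))
            self.copied.add(page)
        self.database[page] = contents

    def build_spt(self, snap_id):
        # Scan from Start(X), keeping only first-encountered mappings.
        spt = {}
        for page, slot in self.maplog[self.start[snap_id]:]:
            spt.setdefault(page, slot)
        return spt

    def read(self, snap_id, page):
        spt = self.build_spt(snap_id)
        if page in spt:
            return self.snapshot_pages[spt[page]]
        return self.database[page]    # not COW'd yet: still in the database


store = SplitSnapshotStore({"P1": "p1-v0", "P2": "p2-v0", "P3": "p3-v0"})
snap1 = store.declare_snapshot()
store.write("P1", "p1-v1")
snap2 = store.declare_snapshot()
store.write("P1", "p1-v2")
store.write("P2", "p2-v1")
spt1 = store.build_spt(snap1)
spt2 = store.build_spt(snap2)
```

Here `spt1` and `spt2` hold different mappings for P1 but the same mapping for P2, and neither contains P3, so reads of P3 fall through to the database.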
Impact of Taking Snapshots
- Can we create split snapshots with a Skippy index efficiently?
- Plot shows time to complete a single-threaded updating workload of 100,000 transactions in a 66M database with each of 50/50, 80/20, and 99/1 skews
- Skippy contains 5 levels (including mapLog level)
- We can retain a snapshot after every transaction for a 68 penalty
Impact of Skew
- Let overwrite cycle length L be the number of page updates required to overwrite the entire database of N pages
- Overwrite cycle length determines the number of mappings that must be scanned to construct an SPT
- For a uniformly random workload, L ≈ N ln N (by the coupon collector's waiting time problem)
- Skew in the update workload lengthens the overwrite cycle by introducing many more repeated mappings
- For example, skew of 80/20 (80% of updates to 20% of pages) increases L by a factor of 4
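The effect of skew on the overwrite cycle can be checked with a small Monte Carlo simulation. The sketch below is illustrative only: the function names, the database size of 500 pages, the trial count, and the two-region model of skew (a hot region and a cold region, each updated uniformly) are assumptions, not the poster's experimental setup. Note that the exact expectation for the uniform case is N times the N-th harmonic number, which is slightly above N ln N.

```python
import math
import random


def overwrite_cycle_length(n_pages, hot_fraction, hot_update_share, rng):
    """Count page updates until every page has been updated at least once.

    A share `hot_update_share` of updates goes to the first `hot_fraction`
    of pages; the rest goes to the remaining pages (requires hot_fraction < 1).
    """
    hot_pages = max(1, round(n_pages * hot_fraction))
    untouched = set(range(n_pages))
    updates = 0
    while untouched:
        if rng.random() < hot_update_share:
            page = rng.randrange(hot_pages)
        else:
            page = rng.randrange(hot_pages, n_pages)
        untouched.discard(page)
        updates += 1
    return updates


def mean_cycle_length(n_pages, hot_fraction, hot_update_share,
                      trials=10, seed=0):
    rng = random.Random(seed)
    total = sum(
        overwrite_cycle_length(n_pages, hot_fraction, hot_update_share, rng)
        for _ in range(trials)
    )
    return total / trials


N = 500
uniform_L = mean_cycle_length(N, 0.5, 0.5)   # 50/50 is a uniform workload
skewed_L = mean_cycle_length(N, 0.2, 0.8)    # 80/20 skew
estimate = N * math.log(N)                   # N ln N
print(f"N ln N = {estimate:.0f}, uniform L = {uniform_L:.0f}, "
      f"80/20 L = {skewed_L:.0f}, ratio = {skewed_L / uniform_L:.1f}")
```

Under this model the cold pages dominate the cycle: each cold page is hit far less often than under a uniform workload, so many more updates (and repeated mappings for hot pages) accumulate before the whole database is overwritten.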
References
- Shaull, R., Shrira, L., and Xu, H. Skippy: a New Snapshot Indexing Method for Time Travel in the Storage Manager. SIGMOD 2008.
- Shrira, L., van Ingen, C., and Shaull, R. Time Travel in the Virtualized Past. SYSTOR 2007.
- Shrira, L., and Xu, H. Thresher: An Efficient Storage Manager for Copy-on-write Snapshots. USENIX 2006.
- Shrira, L., and Xu, H. Snap: Efficient Snapshots for Back-In-Time Execution. ICDE 2005.
Skew hurts!