Title: Transaction Time Indexing with Version Compression
1Transaction Time Indexing with Version Compression
- D. Lomet, M. Hong, R. Nehme and R. Zhang
- The 34th International Conference on Very Large
Data Bases
2Overview
- Immortal DB: a DBMS with several
improvements over existing systems.
- The paper focuses on the internal manipulation of
data rather than on more abstract issues such as
query-language design.
3Outline
- Dealing with compatibility.
- Dealing with full pages.
- Comparing splits.
- Compressing the data.
- Storage utilization.
- Experimental results.
4A good starting point
- To build a temporal database, a minimal
requirement is compatibility with traditional
databases.
- The authors come up with a really clever idea for
the compatibility issue: version chaining!
5Version chaining
- The traditional DB is obtained simply by
disabling all pointers!
[Figure: a version chain reached from the dynamic slot array. Labels: oldest version, Tom, 25K, Clothing (2000); previous version, Tom, 40K, Clothing (2005); most recent version.]
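A minimal sketch of version chaining, assuming each version stores its start time and a pointer to the next-older version. The names `Version` and `as_of` are hypothetical and not taken from the paper, and only the two tuples visible on the slide are used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    payload: str                      # e.g. "Tom, 40K, Clothing"
    start: int                        # time at which this version became current
    prev: Optional["Version"] = None  # pointer to the next-older version


def as_of(newest: Optional[Version], t: int) -> Optional[Version]:
    """Walk the chain from newest to oldest and return the version valid at time t."""
    v = newest
    while v is not None and v.start > t:
        v = v.prev
    return v


oldest = Version("Tom, 25K, Clothing", 2000)
newest = Version("Tom, 40K, Clothing", 2005, prev=oldest)

# Ignoring the prev pointers leaves just `newest`, i.e. the traditional view.
print(as_of(newest, 2003).payload)
```

A query about the present never follows a pointer, which is why disabling the pointers yields a traditional database.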
6A closer look
- In the implementation, version chains have to be
stored in memory pages and disk sectors.
[Figure: a page with a USED region holding the tuples below, a FREE region, and the dynamic slot array.]
John, 30K, Toys (1999)
Tom, 25K, Clothing (2000)
John, 35K, Shoes (2004)
Tom, 40K, Clothing (2005)
John, 40K, Shoes (2007)
7Adding information
- The tuples and the slots grow in opposite
directions, much like the heap and the stack.
John, 30K, Toys (1999)
Tom, 25K, Clothing (2000)
John, 35K, Shoes (2004)
Tom, 40K, Clothing (2005)
John, 40K, Shoes (2007)
Tom, 45K, Clothing (2009)
Tom, 45K, Hardware (2009)
Dynamic slot array
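The layout can be sketched as a slotted page. This is an illustrative model only: the class `SlottedPage`, the 4-byte slot cost and the page size are assumptions, and the slot array is kept as a Python list rather than being written into the tail of the buffer.

```python
class PageFull(Exception):
    """Raised when a record plus its slot no longer fits in the free space."""


class SlottedPage:
    def __init__(self, size: int = 128, slot_bytes: int = 4):
        self.buf = bytearray(size)
        self.slot_bytes = slot_bytes  # assumed space cost of one slot entry
        self.data_end = 0             # tuples grow upward from offset 0
        self.slots = []               # (offset, length); conceptually grows down from the page end

    def free_bytes(self) -> int:
        # Free space is the gap between the tuple area and the slot array.
        return len(self.buf) - self.data_end - len(self.slots) * self.slot_bytes

    def insert(self, record: bytes) -> int:
        if len(record) + self.slot_bytes > self.free_bytes():
            raise PageFull("the page must be split")
        offset = self.data_end
        self.buf[offset:offset + len(record)] = record
        self.data_end += len(record)
        self.slots.append((offset, len(record)))
        return len(self.slots) - 1    # slot number of the new tuple

    def read(self, slot_no: int) -> bytes:
        offset, length = self.slots[slot_no]
        return bytes(self.buf[offset:offset + length])
```

Each insertion shrinks the free gap from both ends, so the page is full exactly when the two regions meet.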
8Deletion
- The clever delete stub trick: a deletion adds a dummy version (the delete stub) instead of erasing the earlier versions.
John, 30K, Toys (1999)
Tom, 25K, Clothing (2000)
John, 35K, Shoes (2004)
Tom, 40K, Clothing (2005)
John, 40K, Shoes (2007)
I am a dummy node
Dynamic slot array
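A small sketch of how a delete stub could behave, assuming a record's versions are kept as a list sorted by start time. The marker `DELETE_STUB`, the function `as_of` and the deletion year 2009 are hypothetical choices for illustration.

```python
DELETE_STUB = None  # hypothetical marker: a version that carries no payload


def as_of(versions, t):
    """versions: list of (start_time, payload) sorted by start_time.
    Returns the payload valid at time t, or None if the record does not exist at t."""
    current = None
    for start, payload in versions:
        if start > t:
            break
        current = payload
    return current


tom = [(2000, "Tom, 25K, Clothing"), (2005, "Tom, 40K, Clothing")]
tom.append((2009, DELETE_STUB))  # deleting Tom appends a stub; nothing is erased

print(as_of(tom, 2007))  # the history before the deletion is still reachable
```

Because the stub is just another version, deletion needs no special case in the version chain.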
9The explosion
- A page finally becomes full or near full.
John, 30K, Toys (1999)
Tom, 25K, Clothing (2000)
John, 35K, Shoes (2004)
Tom, 40K, Clothing (2005)
John, 40K, Shoes (2007)
Tom, 40K, Clothing (2009)
Tom, 40K, Hardware (2009)
(further tuples, shown as xxx in the figure, fill the rest of the page)
Dynamic slot array
10Dealing with full pages
- In a traditional database, a full page is split
according to the key and then accessed via a
B-tree.
- Say a page is split into one with keys < 50 and
the other with keys > 50.
- This is called a key-split in this paper.
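A key-split can be sketched in a few lines. The function name `key_split` is hypothetical, and sending a key equal to the split value to the upper page is an assumption the slide leaves open.

```python
def key_split(page, split_key):
    """page: list of (key, payload) pairs. Returns (low_page, high_page)."""
    low = [(k, v) for k, v in page if k < split_key]
    high = [(k, v) for k, v in page if k >= split_key]  # assumption: the boundary key goes high
    return low, high


full_page = [(10, "a"), (49, "b"), (50, "c"), (90, "d")]
low_page, high_page = key_split(full_page, 50)
print([k for k, _ in low_page], [k for k, _ in high_page])
```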
11Key-splits work, but
- In a temporal database, the current data are
accessed most often.
- It'd be nice to time-split. Then put the
current page in fast memory and the history page
in secondary or tertiary storage!
- So most accesses hit in the fast memory!
- Much like caches in computer architecture!
12An implementation issue
- Immortal DB should not be built on top of an
existing DB!
- Otherwise, full pages are key-split anyway.
13Performing a time-split
- Say the clock is at t now.
[Figure: a time-split at t divides one page into a history page and a current page, each with its own dynamic slot array. Tuples shown: John, 30K, Toys (1999); John, 35K, Shoes (2004); John, 35K, Shoes (2004 to t-1); John, 40K, Shoes (NOW).]
14The problem of redundancy
- A query of John back in 2007 should logically go
to the history page!
[Figure: a time-split with redundancy, showing two pages with their dynamic slot arrays. Tuples shown: John, 30K, Toys (1999); John, 35K, Shoes (2004); John, 40K, Shoes (2007); John, 40K, Shoes (2007 to t-1). The version from 2007 is stored in both the history page and the current page.]
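A time-split and the redundancy it causes can be sketched as follows, assuming every version carries a closed lifetime [start, end] in whole years, with `end = None` meaning the version is still current. The function `time_split` and the end years 2003 and 2006 are illustrative assumptions, not values from the paper.

```python
from typing import List, Optional, Tuple

# (key, payload, start, end); end is None while the version is still current
Version = Tuple[str, str, int, Optional[int]]


def time_split(page: List[Version], t: int):
    """Split a page at time t into (history_page, current_page)."""
    history, current = [], []
    for key, payload, start, end in page:
        if end is not None and end < t:
            history.append((key, payload, start, end))    # ended before t: history only
        elif start >= t:
            current.append((key, payload, start, end))    # began at or after t: current only
        else:
            # Alive across t: stored in both pages, which is the redundancy.
            history.append((key, payload, start, t - 1))
            current.append((key, payload, start, end))
    return history, current


page = [
    ("John", "30K, Toys", 1999, 2003),
    ("John", "35K, Shoes", 2004, 2006),
    ("John", "40K, Shoes", 2007, None),
]
history_page, current_page = time_split(page, 2009)
```

With this layout a query about John in 2007 is answered from the history page, while queries about the present touch only the current page.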
15A tradeoff
- Neither splitting method is uniformly superior.
- So how often should we time-split? How about
key-splits?
16Index pages
- We have so far been looking at data pages.
- Recall from an OS course that main memory is
paged, with some pages containing page tables.
- The situation is much the same in Immortal DB: the
index pages play the role of the pages containing
page tables!
17Pages must be rectangles
- Because we do time- and key-splits!
- Each tuple has a key and a timestamp.
- Each page contains several tuples.
- A key-split separates a page by adding a
horizontal line.
- A time-split separates a page by adding a
vertical line (ignore redundancy for now).
[Figure: pages drawn as rectangles in a plane with axes Key and Time.]
18An index page
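[Figure: an index page. Rectangles labeled A to F tile the plane with axes Key and Time, and the entries A to F appear in the index page next to its dynamic slot array.]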
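The rectangle picture can be sketched as an index whose entries map key-time rectangles to data pages. Everything here is an assumption made for illustration: the class `Region`, the half-open ranges, the linear scan in `lookup` and the page names.

```python
from dataclasses import dataclass

INF = float("inf")


@dataclass(frozen=True)
class Region:
    page: str      # name of the data page covering this rectangle
    key_lo: float  # key range is [key_lo, key_hi)
    key_hi: float
    t_lo: float    # time range is [t_lo, t_hi)
    t_hi: float

    def contains(self, key, t) -> bool:
        return self.key_lo <= key < self.key_hi and self.t_lo <= t < self.t_hi

    def key_split(self, k, low_page, high_page):
        # Cut along the key axis: both halves keep the full time range.
        return (Region(low_page, self.key_lo, k, self.t_lo, self.t_hi),
                Region(high_page, k, self.key_hi, self.t_lo, self.t_hi))

    def time_split(self, t, history_page, current_page):
        # Cut along the time axis: both halves keep the full key range.
        return (Region(history_page, self.key_lo, self.key_hi, self.t_lo, t),
                Region(current_page, self.key_lo, self.key_hi, t, self.t_hi))


def lookup(index, key, t):
    """Return the page whose rectangle contains the point (key, t), if any."""
    for region in index:
        if region.contains(key, t):
            return region.page
    return None


root = Region("A", 0, 100, 0, INF)
hist, cur = root.time_split(2005, "A", "B")
low, high = cur.key_split(50, "B", "C")
index = [hist, low, high]
```

Since either kind of split cuts a rectangle into two rectangles, every index entry stays describable by one key range and one time range.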
19Change-log compression
- Adjacent versions of data are usually similar.
- So it's nice to record only their differences.
- All except the most recent data are compressed.
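One way to picture this is a field-level backward delta: the newest version is stored in full, and each older version keeps only the fields in which it differs from its successor. This is a simplified sketch, not the paper's byte-level encoding, and `compress` and `decompress` are hypothetical names.

```python
def compress(versions):
    """versions: list of dicts with identical fields, oldest first.
    Returns (newest_in_full, deltas), where deltas[i] holds only the fields
    in which versions[i] differs from versions[i + 1]."""
    newest = dict(versions[-1])
    deltas = []
    for older, newer in zip(versions, versions[1:]):
        deltas.append({f: v for f, v in older.items() if newer[f] != v})
    return newest, deltas


def decompress(newest, deltas):
    """Rebuild all versions, oldest first, by applying deltas backward from the newest."""
    out = [dict(newest)]
    for delta in reversed(deltas):
        out.append({**out[-1], **delta})
    out.reverse()
    return out


john = [
    {"name": "John", "salary": "30K", "dept": "Toys", "year": 1999},
    {"name": "John", "salary": "35K", "dept": "Shoes", "year": 2004},
    {"name": "John", "salary": "40K", "dept": "Shoes", "year": 2007},
]
newest, deltas = compress(john)
```

Reading the current version costs nothing extra, while an older version is rebuilt by applying deltas backward from the newest one.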
20Compression for performance
- The better the compression ratio, the less we worry
about redundancy caused by time-splits.
- Hence time-splits become a free lunch.
- By devoting a larger fraction of splits to
time-splits rather than key-splits, the benefit
of time-splits is amplified!
21Storage Utilization
- The paper considers two measurements.
- Single Version Current Utilization (SVCU): aside
from history pages, what fraction of a page is
devoted to current data?
- MultiVersion Total Utilization (MVTU): counting
duplicated data only once, what fraction of a
page is used?
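The two measurements can be sketched numerically, following the wording of the slide rather than the paper's exact definitions. The page model (lists of `(version_id, size_bytes, is_current)` tuples), the function names and the sizes are all assumptions.

```python
def svcu(current_pages, page_size):
    """Fraction of the current pages' space that holds current versions."""
    used = sum(size for page in current_pages
               for _, size, is_current in page if is_current)
    return used / (len(current_pages) * page_size)


def mvtu(all_pages, page_size):
    """Fraction of all pages' space that holds versions, each version counted once."""
    unique = {}
    for page in all_pages:
        for version_id, size, _ in page:
            unique[version_id] = size  # a copy made by a time-split is not counted twice
    return sum(unique.values()) / (len(all_pages) * page_size)


# The version j3 was alive at the time-split, so both pages hold a copy of it.
history = [("j1", 30, False), ("j2", 30, False), ("j3", 30, False)]
current = [("j3", 30, True), ("t1", 20, True)]

print(svcu([current], page_size=100))
print(mvtu([history, current], page_size=100))
```

Here the duplicated copy of `j3` occupies space in the history page without adding to MVTU.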
22The higher, the better SVCU
- A typical query goes only to current data.
- Higher SVCU means that a single page contains
more current data (except for history pages).
- So a higher SVCU increases the chance for a
typical query to be resolved within one page!
23The higher, the better MVTU
- A higher MVTU means less waste of space!
24Effects of splits
- Performing time-splits too often decreases MVTU
because of redundant data.
- Performing key-splits too often decreases SVCU
because current data get split into two pages.
- Again, we see a tradeoff between time- and
key-splits!
25Effect of insertion
- If most accesses are insertions (rather than
updates), then a page will be filled with current
data, so SVCU increases!
Jimmy, 30K, hardware (2009)
Tony, 40K, Accounting (2009)
Claire, 50K, Toy (2009)
Joe, 35K, Toy (2009)
Mary, 42K, advertise (2009)
Dynamic slot array
26Effect of update (1/2)
- If most accesses are updates, then most data will
be compressed, so MVTU increases.
27Effect of update (2/2)
- But updates do not help increase SVCU. Instead,
updates may cause key-splits, which decrease SVCU!
28Effect of compression
- Better compression certainly increases MVTU.
- It also increases SVCU!
- It decreases the need for splits, in particular
key-splits.
- So current data are less likely to become
scattered.
29Confirming with experiments (1/2)
30Confirming with experiments (2/2)
31Thank you!