Transaction Time Indexing with Version Compression

1
Transaction Time Indexing with Version Compression

D. Lomet, M. Hong, R. Nehme and R. Zhang
The 34th International Conference on Very Large
Data Bases

2
Overview

A DBMS called Immortal DB with several
improvements over existing systems.
The paper focuses on the internal manipulation of
data, rather than the more abstract issues like
query-language design.

3
Outline

Dealing with compatibility.
Dealing with full pages.
Comparing splits.
Compressing the data.
Storage utilization.
Experimental results.

4
A good starting point

To build a temporal database, a minimal
requirement is compatibility with traditional
databases.
The authors come up with a really clever idea for
the compatibility issue: version chaining!

5
Version chaining

The traditional DB is obtained simply by
disabling all pointers!

[Figure: a version chain with a dynamic slot
array; versions are labeled oldest, previous, and
most recent. Example tuples shown:]
Tom, 25K, Clothing (2000)
Tom, 40K, Clothing (2005)

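A minimal sketch of version chaining, assuming each slot points at the newest version and each version keeps a pointer to its predecessor. The names Version, Page, write, current and as_of are hypothetical and chosen for illustration; they are not taken from Immortal DB.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    data: tuple                        # (name, salary, department)
    start: int                         # time at which this version became current
    prev: Optional["Version"] = None   # pointer to the previous (older) version


class Page:
    """Hypothetical page: one slot per key, pointing at the newest version."""

    def __init__(self):
        self.slots = {}

    def write(self, key, data, start):
        # The new version becomes the head of the chain for this key.
        self.slots[key] = Version(data, start, self.slots.get(key))

    def current(self, key):
        # Ignoring the prev pointers gives the traditional single-version view.
        return self.slots[key].data

    def as_of(self, key, time):
        # Walk back along the chain until a version that existed at `time`.
        v = self.slots.get(key)
        while v is not None and v.start > time:
            v = v.prev
        return None if v is None else v.data


page = Page()
page.write("Tom", ("Tom", "25K", "Clothing"), 2000)
page.write("Tom", ("Tom", "40K", "Clothing"), 2005)
```

Here current("Tom") never follows a pointer, which is the sense in which disabling the pointers leaves a traditional database.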
6
A closer look

In an implementation, version chains have to be
stored in memory pages and disk sectors.

[Figure: a page with a USED region, a FREE
region, and a dynamic slot array. Tuples shown:]
John, 30K, Toys (1999)
Tom, 25K, Clothing (2000)
John, 35K, Shoes (2004)
Tom, 40K, Clothing (2005)
John, 40K, Shoes (2007)

7
Adding information

The tuples and the slots grow in opposite
directions, much like a heap and a stack.

John, 30K, Toys (1999)
Tom, 25K, Clothing (2000)
John, 35K, Shoes (2004)
Tom, 40K, Clothing (2005)
John, 40K, Shoes (2007)
Tom, 45K, Clothing (2009)
Tom, 45K, Hardware (2009)
Dynamic slot array
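The layout can be sketched as follows, assuming tuples are appended from the front of the page while slot entries are taken from the back, so the page is full when the two regions meet. SlottedPage, its sizes, and its method names are hypothetical, and for simplicity every insert consumes a new slot.

```python
class SlottedPage:
    """Hypothetical slotted page: tuples grow up, the slot array grows down."""

    def __init__(self, size=32, slot_bytes=2):
        self.size = size
        self.slot_bytes = slot_bytes
        self.tuple_end = 0   # next free byte for tuple data (grows upward)
        self.slots = []      # tuple offsets; conceptually stored at the page end

    def free_bytes(self):
        # Free space is whatever lies between the tuple area and the slot array.
        return self.size - self.tuple_end - len(self.slots) * self.slot_bytes

    def insert(self, payload: bytes):
        need = len(payload) + self.slot_bytes
        if need > self.free_bytes():
            return None      # the two regions would collide: page is full
        offset = self.tuple_end
        self.tuple_end += len(payload)
        self.slots.append(offset)
        return offset


p = SlottedPage(size=32, slot_bytes=2)
first = p.insert(b"a" * 10)
second = p.insert(b"b" * 10)
third = p.insert(b"c" * 10)   # does not fit any more
```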
8
Deletion

The clever delete stub trick.

John, 30K, Toys (1999)
Tom, 25K, Clothing (2000)
John, 35K, Shoes (2004)
Tom, 40K, Clothing (2005)
John, 40K, Shoes (2007)
I am a dummy node
Dynamic slot array
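A rough sketch of the delete stub idea, assuming a deletion is recorded as one more (dummy) version at the head of the chain instead of physically removing anything. VersionedRecord and the DELETED marker are hypothetical names.

```python
DELETED = object()   # hypothetical marker playing the role of the delete stub


class VersionedRecord:
    def __init__(self):
        self.versions = []   # list of (start_time, data), oldest first

    def write(self, time, data):
        self.versions.append((time, data))

    def delete(self, time):
        # Nothing is erased: the stub only records when the record ended.
        self.versions.append((time, DELETED))

    def as_of(self, time):
        result = None
        for start, data in self.versions:
            if start <= time:
                result = data
        return None if result is DELETED else result


tom = VersionedRecord()
tom.write(2000, ("Tom", "25K", "Clothing"))
tom.write(2005, ("Tom", "40K", "Clothing"))
tom.delete(2009)
```

Queries after the deletion time find nothing, while queries about earlier times still see the old versions.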
9
The explosion

A page finally becomes full or near full.

John, 30K, Toys (1999)
Tom, 25K, Clothing (2000)
John, 35K, Shoes (2004)
Tom, 40K, Clothing (2005)
John, 40K, Shoes (2007)
Tom, 40K, Clothing (2009)
Tom, 40K, Hardware (2009)
xxx
xxx
xxx
xxx
Dynamic slot array
10
Dealing with full pages

In a traditional database, a full page is split
according to the key and then accessed via a
B-tree.
Say a page is split into one with keys < 50 and
the other with keys > 50.
This is called a key-split in this paper.
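As a small illustration, a key-split can be sketched over a page modelled as a dictionary. The function name and the dictionary model are assumptions; the sketch sends a key equal to the split key to the upper page, a choice the slide does not spell out.

```python
def key_split(page, split_key):
    """page: dict mapping key -> record. Returns (low_page, high_page)."""
    low = {k: v for k, v in page.items() if k < split_key}
    high = {k: v for k, v in page.items() if k >= split_key}
    return low, high


full_page = {10: "a", 30: "b", 60: "c", 90: "d"}
low_page, high_page = key_split(full_page, 50)
```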

11
Key-splits work, but

In a temporal database, the current data are
accessed most often.
It'd be nice to time-split instead. Then put the
current page in fast memory and the history page
in secondary or tertiary storage!
So most accesses hit in the fast memory!
Much like caches in computer architecture!

12
An implementation issue

Immortal DB should not be built on top of an
existing DB!
Otherwise, full pages are key-split anyway.

13
Performing a time-split

Say the clock is at t now.

[Figure: a time-split at t, drawn as two pages,
each with its own dynamic slot array. Tuples
shown:]
John, 30K, Toys (1999)
John, 30K, Toys (1999)
John, 35K, Shoes (2004)
John, 35K, Shoes (2004)
John, 35K, Shoes (2004 to t-1)
John, 40K, Shoes (NOW)

14
The problem of redundancy

A query of John back in 2007 should logically go
to the history page!

[Figure: a time-split at t, drawn as two pages,
each with its own dynamic slot array. Tuples
shown:]
John, 30K, Toys (1999)
John, 30K, Toys (1999)
John, 35K, Shoes (2004)
John, 35K, Shoes (2004)
John, 40K, Shoes (2007)
John, 40K, Shoes (2007)
John, 40K, Shoes (2007 to t-1)

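The redundancy can be made concrete with a sketch of a time-split, assuming each version carries a half-open lifetime [start, end) and that a version still alive at the split time t is copied into both pages. The tuple layout and the function name are hypothetical.

```python
import math


def time_split(versions, t):
    """versions: list of (key, data, start, end); end is math.inf if current.

    Returns (history_page, current_page).
    """
    history = [v for v in versions if v[2] < t]   # existed before t
    current = [v for v in versions if v[3] > t]   # still alive at t
    return history, current


versions = [
    ("John", "30K, Toys", 1999, 2004),
    ("John", "35K, Shoes", 2004, 2007),
    ("John", "40K, Shoes", 2007, math.inf),
]
history, current = time_split(versions, 2009)

# Versions whose lifetime spans t are stored twice.
redundant = [v for v in history if v in current]
```

With the split at 2009, the 2007 version of John lands in both pages, so a query about John as of 2007 or 2008 can be answered from the history page alone.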
15
A tradeoff

Neither splitting method is uniformly superior.
So how often should we time-split? How about
key-splits?

16
Index pages

We have so far been looking at data pages.
Recall that in the OS course, the main memory is
paged, with some pages containing page-tables.
The situation is much the same in Immortal DB:
the index pages play the role of the pages
containing page-tables!

17
Pages must be rectangles

Because we do time- and key-splits!

Each tuple has a key and a timestamp.
Each page contains several tuples.
A key-split separates a page by adding a
horizontal line.
A time-split separates a page by adding a
vertical line (ignore redundancy for now).
[Figure: a page drawn as a rectangle, with Key on
the vertical axis and Time on the horizontal
axis.]

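The rectangle picture can be sketched directly, assuming each page covers a half-open range of keys and a half-open range of times. Region and its method names are hypothetical, and redundancy is ignored here just as on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    key_lo: int
    key_hi: int
    t_lo: int
    t_hi: int

    def contains(self, key, time):
        return self.key_lo <= key < self.key_hi and self.t_lo <= time < self.t_hi

    def key_split(self, k):
        # Horizontal line: same time range, key range cut at k.
        return (Region(self.key_lo, k, self.t_lo, self.t_hi),
                Region(k, self.key_hi, self.t_lo, self.t_hi))

    def time_split(self, t):
        # Vertical line: same key range, time range cut at t.
        return (Region(self.key_lo, self.key_hi, self.t_lo, t),
                Region(self.key_lo, self.key_hi, t, self.t_hi))


root = Region(0, 100, 1999, 2010)
hist, cur = root.time_split(2005)
low, high = cur.key_split(50)
pieces = [hist, low, high]
```

Whatever sequence of splits is applied, every piece stays a rectangle and each (key, time) point of the original page falls in exactly one piece.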
18
An index page
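
[Figure: an index page. The Key-Time plane is
divided into rectangles labeled A through F, and
the page has a dynamic slot array with entries
labeled A through F.]
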
19
Change-log compression

Adjacent versions of data are usually similar.
So it's nice to record only their differences.

All except the most recent data are compressed.
Dynamic slot array
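A minimal sketch of the idea, assuming field-level differences: the newest version is kept in full and each older version is stored only as the fields that differ from its successor. This granularity and the function names are assumptions made for illustration; the paper's actual encoding may differ.

```python
def compress_chain(versions):
    """versions: list of equal-length tuples, oldest first.

    Returns (newest, deltas), where deltas[i] maps a field index to the old
    value that turns version i+1 back into version i.
    """
    newest = versions[-1]
    deltas = []
    for older, newer in zip(versions, versions[1:]):
        deltas.append({i: o for i, (o, n) in enumerate(zip(older, newer)) if o != n})
    return newest, deltas


def decompress_chain(newest, deltas):
    versions = [newest]
    for delta in reversed(deltas):
        prev = list(versions[0])      # start from the next newer version
        for i, old in delta.items():
            prev[i] = old             # undo only the fields that changed
        versions.insert(0, tuple(prev))
    return versions


chain = [
    ("John", "30K", "Toys", 1999),
    ("John", "35K", "Shoes", 2004),
    ("John", "40K", "Shoes", 2007),
]
newest, deltas = compress_chain(chain)
```

Unchanged fields such as the name are stored once, in the newest version only.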
20
Compression for performance

The better the compression ratio, the less we
worry about redundancy caused by time-splits.
Hence time-splits become a free lunch.
By devoting a larger fraction of splits to
time-splits rather than key-splits, the benefit
of time-splits is amplified!

21
Storage Utilization

The paper considers two measurements.
Single Version Current Utilization (SVCU): leaving
aside history pages, what fraction of a page is
devoted to current data?
MultiVersion Total Utilization (MVTU): counting
duplicated data only once, what fraction of a
page is used?
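One way to read the two measurements as formulas, assuming fixed-size pages and sizes measured in bytes. The function and parameter names are hypothetical, and the numbers in the example are made up to show the arithmetic; they are not results from the paper.

```python
def svcu(current_data_bytes, n_current_pages, page_size):
    # Current data relative to the space of the current (non-history) pages.
    return current_data_bytes / (n_current_pages * page_size)


def mvtu(distinct_version_bytes, n_total_pages, page_size):
    # All versions, each counted once, relative to the space of all pages.
    return distinct_version_bytes / (n_total_pages * page_size)


PAGE = 8192
example_svcu = svcu(current_data_bytes=8192, n_current_pages=2, page_size=PAGE)
example_mvtu = mvtu(distinct_version_bytes=24576, n_total_pages=4, page_size=PAGE)
```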

22
The higher, the better: SVCU

A typical query goes only to current data.
Higher SVCU means that a single page contains
richer current data (except for history pages).
So a higher SVCU increases the chance for a
typical query to be resolved within one page!

23
The higher, the better: MVTU

A higher MVTU means less waste of space!

24
Effects of splits

Performing time-splits too often decreases MVTU
because of redundant data.
Performing key-splits too often decreases SVCU
because current data get split into two pages.
Again, we see a tradeoff between time- and
key-splits!

25
Effect of insertion

If most accesses are insertions (rather than
updates), then a page will be filled with current
data, so SVCU increases!

Jimmy, 30K, hardware (2009)
Tony, 40K, Accounting (2009)
Claire, 50K, Toy (2009)
Joe, 35K, Toy (2009)
Mary, 42K, advertise (2009)
Dynamic slot array
26
Effect of update (1/2)