Transaction Time Indexing with Version Compression - PowerPoint PPT Presentation

1
Transaction Time Indexing with Version Compression
  • D. Lomet, M. Hong, R. Nehme and R. Zhang
  • The 34th International Conference on Very Large
    Data Bases

2
Overview
  • A DBMS called Immortal DB with several
    improvements over existing systems.
  • The paper focuses on the internal manipulation of
    data, rather than the more abstract issues like
    query-language design.

3
Outline
  • Dealing with compatibility.
  • Dealing with full pages.
  • Comparing splits.
  • Compressing the data.
  • Storage utilization.
  • Experimental results.

4
A good starting point
  • To build a temporal database, a minimal
    requirement is the compatibility with traditional
    databases.
  • The authors come up with a really clever idea for
    compatibility issues: version chaining!

5
Version chaining
  • The traditional DB is obtained simply by
    disabling all pointers!

[Figure: a version chain for Tom, reached from the dynamic slot array. Labels: most recent version; previous version, Tom, 40K, Clothing (2005); oldest version, Tom, 25K, Clothing (2000).]

6
A closer look
  • In the implementation, version chains have to be
    stored in memory pages and disk sectors.

John, 30K, Toys (1999)
Tom, 25K, Clothing (2000)
John, 35K, Shoes (2004)
Tom, 40K, Clothing (2005)
USED
John, 40K, Shoes (2007)
FREE
Dynamic slot array
7
Adding information
  • The tuples and the slots grow in the opposite
    directions, much like heap and stack.

John, 30K, Toys (1999)
Tom, 25K, Clothing (2000)
John, 35K, Shoes (2004)
Tom, 40K, Clothing (2005)
John, 40K, Shoes (2007)
Tom, 45K, Clothing (2009)
Tom, 45K, Hardware (2009)
Dynamic slot array
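
The version chaining of slides 5 to 7 can be sketched in a few lines. This is a minimal illustration, not the Immortal DB page layout: the `Page` and `Version` classes are hypothetical, and a dictionary stands in for the dynamic slot array. Each slot points at the newest version of a key, and each version points back to the one it replaced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    key: str
    salary: str
    dept: str
    start: int                         # year this version became current
    prev: Optional["Version"] = None   # back-pointer to the older version


class Page:
    """Hypothetical page: a dict plays the role of the dynamic slot array."""

    def __init__(self):
        self.slots = {}  # key -> most recent Version

    def write(self, key, salary, dept, start):
        # The new version is chained in front of the previous one.
        self.slots[key] = Version(key, salary, dept, start, self.slots.get(key))

    def current(self, key):
        # Ignoring the prev pointers gives the ordinary, non-temporal view.
        return self.slots.get(key)

    def as_of(self, key, year):
        # Walk the chain until a version that started at or before `year`.
        v = self.slots.get(key)
        while v is not None and v.start > year:
            v = v.prev
        return v


page = Page()
page.write("Tom", "25K", "Clothing", 2000)
page.write("Tom", "40K", "Clothing", 2005)
page.write("Tom", "45K", "Clothing", 2009)
```

Here `page.current("Tom")` never follows a pointer, which is the sense in which the traditional database falls out for free, while `page.as_of("Tom", 2003)` walks back to the 2000 version.
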
8
Deletion
  • The clever delete stub trick.

John, 30K, Toys (1999)
Tom, 25K, Clothing (2000)
John, 35K, Shoes (2004)
Tom, 40K, Clothing (2005)
John, 40K, Shoes (2007)
I am a dummy node
Dynamic slot array
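
One way to picture the delete stub: deleting a record appends a marker version instead of erasing the chain, so older versions stay queryable. The sketch below is an assumption-laden illustration; the `DELETED` marker, the helper names, and the deletion year 2009 are all hypothetical.

```python
DELETED = object()  # hypothetical marker playing the role of the delete stub

chains = {}  # key -> list of (start_year, value), newest last


def write(key, year, value):
    chains.setdefault(key, []).append((year, value))


def delete(key, year):
    # Deletion appends a stub instead of erasing history.
    write(key, year, DELETED)


def as_of(key, year):
    # Scan from newest to oldest for the version valid in `year`.
    for start, value in reversed(chains.get(key, [])):
        if start <= year:
            return None if value is DELETED else value
    return None


write("Tom", 2000, ("25K", "Clothing"))
write("Tom", 2005, ("40K", "Clothing"))
delete("Tom", 2009)  # illustrative year, not taken from the slide
```

A query after the deletion sees no record, while a query as of 2007 still finds the 2005 version.
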
9
The explosion
  • A page finally becomes full or near full.

John, 30K, Toys (1999)
Tom, 25K, Clothing (2000)
John, 35K, Shoes (2004)
Tom, 40K, Clothing (2005)
John, 40K, Shoes (2007)
Tom, 40K, Clothing (2009)
Tom, 40K, Hardware (2009)
xxx
xxx
xxx
xxx
Dynamic slot array
10
Dealing with full pages
  • In a traditional database, a full page is split
    according to the key and then accessed via a
    B-tree.
  • Say a page is split into one with keys < 50 and
    the other with keys > 50.
  • This is called a key-split in this paper.
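
A key-split is the ordinary B-tree page split. As a minimal sketch, assuming a page is just a list of (key, record) pairs and that a record whose key equals the split key goes to the upper page (the slide does not say which side it lands on):

```python
def key_split(page, split_key):
    """Split (key, record) pairs into keys < split_key and keys >= split_key."""
    low = [(k, r) for k, r in page if k < split_key]
    high = [(k, r) for k, r in page if k >= split_key]
    return low, high


page = [(12, "a"), (47, "b"), (50, "c"), (88, "d")]
low, high = key_split(page, 50)
```
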

11
Key-splits work, but
  • In a temporal database, the current data are
    accessed most often.
  • It'd be nice to time-split instead, then put the
    current page in fast memory and the history page
    in secondary or tertiary storage!
  • So most accesses hit in the fast memory!
  • Much like caches in computer architecture!

12
An implementation issue
  • Immortal DB should not be built on top of an
    existing DB!
  • Otherwise, full pages are key-split anyway.

13
Performing a time-split
  • Say the clock is at t now.

[Figure: two pages, each with its own dynamic slot array, illustrating a time-split at t. Tuples shown: John, 30K, Toys (1999); John, 35K, Shoes (2004); John, 35K, Shoes (2004 to t-1); John, 40K, Shoes (NOW).]

14
The problem of redundancy
  • A query of John back in 2007 should logically go
    to the history page!

[Figure: two pages, each with its own dynamic slot array. Tuples shown: John, 30K, Toys (1999); John, 35K, Shoes (2004); John, 40K, Shoes (2007); John, 40K, Shoes (2007 to t-1).]

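
The time-split and the redundancy it creates can be sketched as follows. This is one common formulation, not necessarily the exact rule used in Immortal DB: each version carries a half-open lifetime [start, end), the history page receives every version that started before the split time t, and the current page receives every version still alive at t. A version whose lifetime spans t therefore lands in both pages. The tuple layout and the `NOW` constant are assumptions for illustration.

```python
import math

NOW = math.inf  # hypothetical end time for versions that are still current


def time_split(versions, t):
    """versions: list of (key, value, start, end) with lifetime [start, end)."""
    history = [v for v in versions if v[2] < t]  # started before t
    current = [v for v in versions if v[3] > t]  # still alive at t or later
    return history, current


versions = [
    ("John", "30K, Toys", 1999, 2004),
    ("John", "35K, Shoes", 2004, 2007),
    ("John", "40K, Shoes", 2007, NOW),
]
history, current = time_split(versions, 2009)
```

With t = 2009 the 2007 version is copied to both pages, so a query about John as of 2007 can be answered from the history page alone.
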
15
A tradeoff
  • Neither splitting method is uniformly superior.
  • So how often should we time-split? How about
    key-splits?

16
Index pages
  • We have so far been looking at data pages.
  • Recall that in the OS course, the main memory is
    paged, with some pages containing page-tables.
  • The situation is much the same in Immortal DB: the
    index pages play the role of the pages containing
    page tables!

17
Pages must be rectangles
  • Because we do time- and key-splits!

  • Each tuple has a key and a timestamp.
  • Each page contains several tuples.
  • A key-split separates a page by adding a
    horizontal line.
  • A time-split separates a page by adding a
    vertical line (ignore redundancy for now).

[Figure: pages drawn as rectangles in a plane with key on the vertical axis and time on the horizontal axis.]

18
An index page
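[Figure: the key-time plane partitioned into rectangular regions labeled A through F, alongside an index page listing the entries D, B, A, E, F, C above its dynamic slot array.]
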
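
Because every page covers a rectangle in the key-time plane, an index page can be thought of as a list of rectangles with a page reference each. The sketch below makes that concrete under stated assumptions: the `Region` class, the linear scan, and the three example rectangles are hypothetical and do not reproduce the layout in the figure.

```python
from dataclasses import dataclass

INF = float("inf")


@dataclass
class Region:
    page: str      # name of the child page this entry points to
    key_lo: float
    key_hi: float  # key range is half-open: [key_lo, key_hi)
    t_lo: float
    t_hi: float    # time range is half-open: [t_lo, t_hi)

    def contains(self, key, t):
        return self.key_lo <= key < self.key_hi and self.t_lo <= t < self.t_hi


def lookup(index, key, t):
    # Return the page whose rectangle covers the point (key, t), if any.
    for region in index:
        if region.contains(key, t):
            return region.page
    return None


# Keys below 50 were time-split at 2005; keys from 50 up were never time-split.
index = [
    Region("A", 0, 50, 0, 2005),
    Region("B", 0, 50, 2005, INF),
    Region("C", 50, 100, 0, INF),
]
```

A lookup for key 10 as of 2000 lands in page A, the same key as of 2009 lands in page B, and key 75 lands in page C at any time.
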
19
Change-log compression
  • Adjacent versions of data are usually similar.
  • So it's nice to record only their differences.

[Figure: a page and its dynamic slot array. All versions except the most recent are stored compressed.]

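
As a rough sketch of the idea, an older version can be stored as the set of fields in which it differs from the next newer version, so only the newest version is kept in full. The dictionary representation and the helper names are assumptions for illustration; the actual encoding in the paper is not described on the slide.

```python
def make_delta(newer, older):
    """Record only the fields of `older` that differ from `newer`."""
    return {field: value for field, value in older.items()
            if newer.get(field) != value}


def apply_delta(newer, delta):
    """Rebuild the older version from the newer one plus its delta."""
    return {**newer, **delta}


v2007 = {"name": "John", "salary": "40K", "dept": "Shoes"}
v2004 = {"name": "John", "salary": "35K", "dept": "Shoes"}
delta = make_delta(v2007, v2004)
```

Here the 2004 version shrinks to the single changed field, and `apply_delta(v2007, delta)` restores it. The sketch assumes both versions have the same set of fields.
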
20
Compression for performance
  • The better the compression ratio, the less we worry
    about redundancy caused by time-splits.
  • Hence time-splits become a free lunch.
  • By devoting a larger fraction of splits to
    time-splits rather than key-splits, the benefit
    of time-splits is amplified!

21
Storage Utilization
  • The paper considers two measurements.
  • Single Version Current Utilization (SVCU): aside
    from history pages, what fraction of a page is
    devoted to current data?
  • MultiVersion Total Utilization (MVTU): counting
    duplicated data only once, what fraction of a
    page is used?
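
One plausible reading of the two measurements, written as ratios over whole sets of pages. The denominators (current pages only for SVCU, all pages for MVTU) and the 8192-byte page size are assumptions drawn from the wording above, not definitions quoted from the paper.

```python
PAGE_SIZE = 8192  # bytes; an assumed page size for illustration


def svcu(current_bytes, n_current_pages, page_size=PAGE_SIZE):
    """Bytes of current versions over the space of the current pages."""
    return current_bytes / (n_current_pages * page_size)


def mvtu(distinct_version_bytes, n_total_pages, page_size=PAGE_SIZE):
    """Bytes of all versions, duplicates counted once, over all page space."""
    return distinct_version_bytes / (n_total_pages * page_size)


# Three current pages holding 12288 bytes of current data in total.
example_svcu = svcu(12288, 3)
# Four pages overall holding 24576 bytes of distinct versions.
example_mvtu = mvtu(24576, 4)
```

In this made-up example SVCU is 0.5 and MVTU is 0.75.
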

22
The higher, the better: SVCU
  • A typical query goes only to current data.
  • Higher SVCU means that a single page contains
    richer current data (except for history pages).
  • So a higher SVCU increases the chance for a
    typical query to be resolved within one page!

23
The higher, the better: MVTU
  • A higher MVTU means less waste of space!

24
Effects of splits
  • Performing time-splits too often decreases MVTU
    because of redundant data.
  • Performing key-splits too often decreases SVCU
    because current data get split into two pages.
  • Again, we see a tradeoff between time- and
    key-splits!
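
To make the tradeoff concrete, here is one hypothetical threshold policy, offered as an illustration rather than as the rule the paper uses: a full page is always time-split, and it is also key-split only when the current data left behind would still occupy more than a chosen fraction of the page. The function name and the 0.67 threshold are assumptions.

```python
def choose_split(current_bytes, page_size, threshold=0.67):
    """Return the splits to perform on a full page under a hypothetical policy."""
    if current_bytes / page_size > threshold:
        # Mostly current data: a time-split alone frees too little space.
        return ["time-split", "key-split"]
    # Mostly history: moving it out is enough.
    return ["time-split"]


mostly_current = choose_split(7000, 8192)
mostly_history = choose_split(2000, 8192)
```

Raising the threshold produces fewer key-splits, which favors SVCU at the cost of more frequent time-splits and therefore more redundant copies.
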

25
Effect of insertion
  • If most accesses are insertions (rather than
    updates), then a page will be filled with current
    data, so SVCU increases!

Jimmy, 30K, hardware (2009)
Tony, 40K, Accounting (2009)
Claire, 50K, Toy (2009)
Joe, 35K, Toy (2009)
Mary, 42K, advertise (2009)
Dynamic slot array
26
Effect of update (1/2)
  • If most accesses are updates, then most data will
    be compressed, so MVTU increases.

27
Effect of update (2/2)
  • But updates do not help increase SVCU. Instead,
    updates may cause key-splits, which decrease SVCU!

28
Effect of compression
  • Better compression certainly increases MVTU.
  • It also increases SVCU!
  • It decreases the need for splits, in particular
    key-splits.
  • So current data are less likely to become
    scattered.

29
Confirming with experiments (1/2)
30
Confirming with experiments (2/2)
31
Thank you!