Indexing transaction time databases - PowerPoint PPT Presentation

Indexing transaction time databases, by Tolga Bozkaya and Meral Ozsoyoglu, presented by Priyatham Pamu

Slides: 43
Provided by: pamu150
Learn more at: https://infolab.usc.edu

Transcript and Presenter's Notes

1
Indexing transaction time databases
  • Tolga Bozkaya, Meral Ozsoyoglu
  • presented by
  • Priyatham Pamu

2
Overview
  • The problem addressed
  • Indexing time intervals of temporal objects
    assuming a record based storage model with
    object-versioning.
  • Object-versioning
  • Different versions of the same time-varying
    entity are kept linked to each other (by means of
    time-invariant key attributes in relational
    model, by using pointers in object-oriented data
    model).

3
Overview
  • We consider a transaction time database with two
    states:
  • The past state contains the temporal data that
    was once valid but is not valid any more
    (i.e. historical data).
  • We suggest using an IB-tree, an AD-tree, or
    R-trees to index it, depending on the application
    requirements.
  • The current state contains only the data that is
    valid at the current time.
  • We suggest using a version of the B-tree
    structure, called the Modified B-tree, for
    indexing the current state.
  • The IB-tree structure can also be used to index
    valid time intervals effectively.

4
Existing Temporal index structures
  • Time-index
  • AP-tree (Append-only tree)
  • TP-index (Time polygon index)
  • SR-tree (Segment R-tree)
  • Snapshot Index
  • windows indexing scheme

5
Temporal data model
  • A temporal data model can support two types of
    time:
  • Transaction time data
  • Bounded by the time when the database was
    initiated and the current time point.
  • Data is never deleted from a transaction time
    database, and the new data arrives in an
    append-only fashion.
  • Valid time data
  • Validity of some temporal entity may span into
    future - No upper bound
  • Valid time databases are not append-only
    databases.

6
A Temporal Indexing Scheme
  • Assumptions
  • We assume a discrete time model in our scheme.
  • The time points are mapped to natural numbers
    starting from 0 (time when database is initiated)
    to current time point denoted by the variable
    now.
  • In a transaction time database, all object
    versions with transaction time intervals of the
    form [a, now] (a ≤ now) belong to the current
    state of the database, as they are currently
    valid.
  • All other versions, which have transaction time
    intervals of the form [a, b] (a ≤ b and b < now),
    belong to the past state of the database, as
    their validity ceased at some point in the past.
  • Transaction time databases are append-only
    databases where random deletions or insertions do
    not take place.

7
Temporal Indexing Scheme contd
  • We denote a temporal object version as a 2-tuple
    (I, V), where I is the transaction time interval
    of the object version V.
  • The operations defined on a transaction time
    database are:
  • Deletion: if the current version v of an object o
    is to be deleted, ([ts, now], v) is removed from
    the current state and ([ts, td], v) is inserted
    into the past state, where td is the time of
    deletion.
  • Insertion: we create the initial version v of an
    object o and insert ([ts, now], v) into the
    current state, where ts is the time of insertion.
  • Update: we create a new most recent version
    ([tu, now], v') of the object o. This new version
    is inserted into the current state and the old
    version ([ts, tu-1], v) is migrated to the past
    state.
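
The three operations above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `TransactionTimeStore`, the dictionary for the current state, and the plain list for the past state are all assumptions made for the example. Intervals are closed, and a current version stores only its start point because its finish point is always now.

```python
class TransactionTimeStore:
    """Toy two-state store (hypothetical names, illustration only)."""

    def __init__(self):
        self.current = {}  # object id -> (ts, version); finish point is implicitly `now`
        self.past = []     # append-only list of ((ts, te), object id, version)

    def insert(self, oid, version, t):
        # ([t, now], version) enters the current state.
        self.current[oid] = (t, version)

    def delete(self, oid, t):
        # ([ts, now], v) leaves the current state; ([ts, t], v) joins the past state.
        ts, version = self.current.pop(oid)
        self.past.append(((ts, t), oid, version))

    def update(self, oid, new_version, t):
        # The old version is closed at t - 1 and migrated; the new one starts at t.
        ts, old_version = self.current.pop(oid)
        self.past.append(((ts, t - 1), oid, old_version))
        self.current[oid] = (t, new_version)


store = TransactionTimeStore()
store.insert("o1", "v1", 0)
store.update("o1", "v2", 5)
store.delete("o1", 9)
```

After these calls the past state holds the intervals [0, 4] and [5, 9] for object o1, and the current state is empty. Note that the past state is only ever appended to, which is the property the past-state indexes rely on.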

8
Figure for current state and past state.
9
Indexing the current state
  • Features
  • Unlike the past-index, the current-index sees
    deletions as well as insertions.
  • Deletions are done when object versions migrate
    from the current state to the past state of the
    database.
  • Insertions are done when new object versions are
    inserted into the database.
  • For current-index, starting points of the
    transaction time intervals have to be indexed,
    because the finish points of all such intervals
    are the same (i.e. now). Hence we only need a 1-D
    index structure for indexing the current state.
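
To see why a 1-D index on starting points is enough, consider this small sketch. It uses a sorted Python list with `bisect` as a stand-in for the tree structure; the class `CurrentIndex` and its method names are assumptions for illustration. A current version [start, now] contains a time point t (with t not later than now) exactly when start is not later than t.

```python
import bisect


class CurrentIndex:
    """Sorted list of start points standing in for a 1-D index (hypothetical)."""

    def __init__(self):
        self.starts = []  # start points, kept sorted
        self.oids = []    # object ids, parallel to self.starts

    def insert(self, start, oid):
        i = bisect.bisect_right(self.starts, start)
        self.starts.insert(i, start)
        self.oids.insert(i, oid)

    def remove(self, oid):
        # Linear lookup is fine for a sketch; a real index would search by key.
        i = self.oids.index(oid)
        del self.starts[i]
        del self.oids[i]

    def alive_at(self, t):
        # Every current interval ends at `now`, so only start <= t matters.
        return self.oids[:bisect.bisect_right(self.starts, t)]


idx = CurrentIndex()
idx.insert(0, "a")
idx.insert(3, "b")
idx.insert(7, "c")
alive_before = idx.alive_at(5)
idx.remove("a")
alive_after = idx.alive_at(5)
```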

10
Indexing the current state..
  • Properties specific to the current state:
  • Insertions arrive ordered by starting time, while
    deletions are arbitrary.
  • We exploit this property to come up with a
    B-tree-like structure (the MB-tree) that is
    expected to have a higher storage utilization,
    which directly affects the height of the
    structure and hence the search efficiency.

11
MB-tree
  • MB-tree is simply a modified version of B-tree.
  • MB-tree
  • Insertions are done from the right end, and can
    handle deletions from anywhere in the tree.
  • The nodes along the rightmost path of the tree
    can have as few as one child, and the rightmost
    leaf can have as few as one key.
  • The deletion algorithm is the same as in the
    B-tree.
  • Depending on the distribution of the duration of
    the transaction time intervals, this structure
    may provide considerable improvement over the
    regular B-tree in both efficiency and storage.
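
The storage argument can be illustrated at the leaf level with a small simulation. This is a sketch under simplifying assumptions: it models only leaves, ignores deletions and internal nodes, and the function names are invented for the example. With keys arriving in increasing order, a conventional half-and-half split leaves every non-rightmost leaf about half full, whereas starting a fresh rightmost leaf (allowed to hold as few as one key) leaves earlier leaves full.

```python
def standard_split_leaves(n_keys, capacity):
    """Ordered inserts with a conventional split of the overflowing leaf."""
    leaves = [[]]
    for key in range(n_keys):
        leaves[-1].append(key)
        if len(leaves[-1]) > capacity:
            full = leaves.pop()
            mid = len(full) // 2
            leaves.extend([full[:mid], full[mid:]])
    return leaves


def rightmost_append_leaves(n_keys, capacity):
    """Ordered inserts where a full rightmost leaf is left intact
    and a new rightmost leaf is started with a single key."""
    leaves = [[]]
    for key in range(n_keys):
        if len(leaves[-1]) == capacity:
            leaves.append([])
        leaves[-1].append(key)
    return leaves


def utilization(leaves, capacity):
    return sum(len(leaf) for leaf in leaves) / (len(leaves) * capacity)


split_util = utilization(standard_split_leaves(1000, 10), 10)
append_util = utilization(rightmost_append_leaves(1000, 10), 10)
```

In this idealized insert-only setting the append strategy needs roughly half as many leaves. How much of that advantage survives in practice depends on the deletion pattern, as the last bullet notes.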

12
MB-tree
13
Indexing the past state
  • Three different indexing tree structures are
    proposed:
  • Interval B-trees
  • AD-trees
  • One- and two-dimensional R-trees
  • These index structures meet different
    requirements for different applications.
  • In the past state, we don't have any dynamic
    deletions other than vacuuming. It is an
    append-only database.
  • Since IB-trees are similar to Interval-trees,
    we discuss Interval-trees first.

14
Interval-trees
  • An Interval-tree is a balanced binary tree (AVL
    or Red-black tree, for example) that is augmented
    to support operations on a dynamic set of
    intervals.
  • A node x of an Interval-tree contains an interval
    (int[x]), and the key of x is the starting point
    of that interval.
  • An inorder tree walk of the data structure lists
    the intervals in sorted order by their starting
    points. In addition, each node contains the
    maximum finish point of the intervals stored in
    the subtree rooted at that node.
  • Insertions and deletions can be done in
    O(log_2 n) time.

15
A balanced interval tree figure
16
INTERVAL-SEARCH
  • INTERVAL-SEARCH(T, I)
  • (For a given interval I = [i_s, i_f], find an
    interval that intersects I in the interval-tree
    T. left[x] and right[x] stand for the left and
    right child of a node x.)
  • (1) x = root(T)
  • (2) while x != NIL and I does not intersect the
    interval int[x] do
  • (2.1) if left[x] != NIL and max[left[x]] >= i_s
    then x = left[x]
  • (2.2) else x = right[x]
  • (3) Return x if it is not NIL.
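
The pseudocode above can be sketched in Python as follows. This is an illustration rather than the authors' code: the `Node` class, its field names, and the hand-built example tree are assumptions, closed intervals are assumed, and balancing is omitted because the tree is built by hand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    start: int                      # key: starting point of int[x]
    finish: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    max_finish: int = 0             # max finish point in the subtree rooted here

    def __post_init__(self):
        self.max_finish = max(
            self.finish,
            self.left.max_finish if self.left else self.finish,
            self.right.max_finish if self.right else self.finish,
        )


def intersects(node, i_start, i_finish):
    return node.start <= i_finish and i_start <= node.finish


def interval_search(root, i_start, i_finish):
    x = root
    while x is not None and not intersects(x, i_start, i_finish):
        # Step 2.1: go left only if some interval there finishes late enough.
        if x.left is not None and x.left.max_finish >= i_start:
            x = x.left
        else:
            x = x.right
    return x  # None plays the role of NIL


# Hand-built tree, ordered by start point (children are created first).
root = Node(16, 21,
            left=Node(8, 9, left=Node(5, 8), right=Node(15, 23)),
            right=Node(25, 30, left=Node(17, 19), right=Node(26, 26)))

hit = interval_search(root, 22, 25)
miss = interval_search(root, 11, 14)
```

Here `hit` is the node holding [15, 23], reached by going left from the root because the left subtree's maximum finish point (23) is at least 22, and `miss` is `None` because no stored interval intersects [11, 14].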

17
Interval B-tree structure
  • IB-tree is a direct generalization of the
    Interval-tree to a multi-way B-tree structure.
    It is basically a B-tree on the starting points
    of intervals where each node is augmented with
    the same kind of information as binary
    Interval-trees.
  • Unlike in Interval-trees, internal nodes of
    IB-trees do not keep data intervals. All data
    intervals are kept in the leaf nodes.
  • Number of children is equal to the number of
    keys.
  • Refer to the figure for internal node structure.

18
Internal IB node structure figure.
19
INTERVAL-SEARCH for IB-tree
  • INTERVAL-SEARCH(N, I)
  • (For a given search interval I = [i_s, i_f], find
    an interval that intersects I in the Interval
    B-tree T. Here, N is a node of the Interval
    B-tree and the initial call is
    INTERVAL-SEARCH(root(T), I).)
  • (We assume that N has k children (if an internal
    node), or k data items (if a leaf node).)
  • (1) If N is a leaf node then check whether any
    of the intervals in N intersects I.
  • (2) else if N is an internal node then
  • (2.1) i = 1
  • (2.2) if I intersects [a_i, m_i] then
    INTERVAL-SEARCH(c_i, I)
  • (2.3) else if i < k then i = i + 1, goto 2.2.
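
A possible rendering of this search in Python is sketched below. The node layout is an assumption, since the internal node figure is not part of the transcript: each internal entry is taken to be a triple (a_i, m_i, c_i) holding the smallest starting point and the largest finish point found under child c_i. The class and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Interval = Tuple[int, int]  # closed interval (start, finish)


@dataclass
class Leaf:
    intervals: List[Interval]  # sorted by starting point


@dataclass
class Internal:
    entries: List[Tuple[int, int, Union["Internal", Leaf]]]  # (a_i, m_i, c_i)


def span(node):
    """Smallest start and largest finish under a node (assumed meaning of a_i, m_i)."""
    if isinstance(node, Leaf):
        return node.intervals[0][0], max(f for _, f in node.intervals)
    return node.entries[0][0], max(m for _, m, _ in node.entries)


def build_internal(children):
    return Internal([(*span(child), child) for child in children])


def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]


def interval_search(node, query):
    if isinstance(node, Leaf):                      # step 1
        for interval in node.intervals:
            if overlaps(interval, query):
                return interval
        return None
    for a, m, child in node.entries:                # steps 2.1 to 2.3
        if overlaps((a, m), query):
            return interval_search(child, query)    # descend into the first match
    return None


tree = build_internal([
    Leaf([(1, 2), (3, 12)]),
    Leaf([(5, 6), (7, 8)]),
    Leaf([(9, 10), (11, 20)]),
])
found = interval_search(tree, (13, 14))
```

Descending only into the first qualifying child is sufficient under these assumptions: the children are ordered by starting point, so if that subtree holds no intersecting interval, every interval in later children starts after the query ends.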

20
INTERVAL-SEARCH
  • Note that the INTERVAL-SEARCH algorithm returns
    one interval (if at least one exists) that
    intersects the given search interval.
  • To obtain all the intersecting intervals, either:
  • we use the links between the leaf nodes for a
    sequential search from that point on, which is
    efficient because the intervals in the leaf
    nodes are sorted by their starting points;
  • or we follow all the child pointers that satisfy
    the condition in step 2.2 of the algorithm.
21
Insertions and Deletions in IB-tree
  • Insertion and deletion operations for IB-trees
    are similar to those for B-trees, with the only
    exception of a little overhead to maintain the
    augmented information.
  • For every merge operation done during insertion
    or deletion, the maximum fields for all the nodes
    along the ancestral path have to be updated
    accordingly.
  • The complexity of insertion and deletion
    operations for IB-trees is still O(log_k n) (the
    same as for B-trees), where n is the number of
    leaf nodes and k is the average fanout of a node
    in the tree.

22
IB-tree figure.
23
Storage Utilization of IB-tree
  • The storage utilization of the IB-tree is similar
    to that of the B-tree.
  • Let the size of a node (leaf or internal) be M
    bytes.
  • Let each pointer take p bytes and each key value
    k bytes.
  • The maximum number of entries in a leaf node is
    P1 = M/(2k + p), where the tuple identifier takes
    p bytes.
  • The maximum number of entries in an internal node
    is Pi = M/(2k + p), which is the same as P1.
  • If there are N intervals to index, there will be
    N/(P1 ln 2) leaf nodes, as each node is shown to
    be ln 2 full on average.
  • Number of internal nodes = (N/(P1 ln 2)) (1/(P1 ln 2)
    + 1/(P1 ln 2)^2 + ...), with (h-1) terms,
  • which is approximately N/(P1 ln 2 (P1 ln 2 - 1)).
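
As a worked example of this estimate, the sketch below counts nodes level by level. The function name and the sample parameters are assumptions for illustration, the entry size of 2k + p bytes is how we read the flattened formula, and the ln 2 average fill factor is taken from the slide.

```python
import math


def ib_tree_node_counts(n_intervals, node_bytes, key_bytes, ptr_bytes):
    """Estimate leaf and internal node counts for an IB-tree (illustrative)."""
    # P1 = Pi = M / (2k + p): two key-sized fields plus one pointer per entry.
    entries = node_bytes // (2 * key_bytes + ptr_bytes)
    avg_fanout = entries * math.log(2)  # nodes assumed ln 2 full on average
    leaves = math.ceil(n_intervals / avg_fanout)
    internal = 0
    level = leaves
    while level > 1:  # add one level of internal nodes at a time, up to the root
        level = math.ceil(level / avg_fanout)
        internal += level
    return entries, leaves, internal


entries, leaves, internal = ib_tree_node_counts(
    n_intervals=1_000_000, node_bytes=2048, key_bytes=4, ptr_bytes=8)
```

With these sample parameters each node holds 128 entries, and the internal nodes make up only a small fraction of the total, since each level is smaller than the one below it by a factor of about P1 ln 2.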

24
Comparison with a regular B-tree
  • A regular B-tree with no augmentation
    information will take slightly fewer pages (for
    internal nodes), as Pi would be (M + k)/(k + p)
    (the number of children is one more than the
    number of keys). As the fanout increases, the
    difference between an IB-tree and a B-tree
    becomes insignificant.
  • Height comparison
  • h(IB-tree)/h(B-tree) = (ln Pi(B-tree) + ln ln 2)/
    (ln Pi(IB-tree) + ln ln 2)
  • which is very close to 1. For p = 8, k = 4,
    M = 2048, Pi(IB-tree) is 128 and Pi(B-tree)
    becomes 171. In this case, the height ratio is
    approximately 1.06.
  • Observation: the heights of the B-tree and
    IB-tree structures built on the same set of
    interval data will most likely be the same.
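
The numbers on this slide can be checked with a few lines of Python. The formulas are our reading of the flattened expressions, namely Pi(IB-tree) = M/(2k + p), Pi(B-tree) = (M + k)/(k + p), and a tree height proportional to 1/(ln Pi + ln ln 2); the function name is hypothetical.

```python
import math


def height_ratio(node_bytes, key_bytes, ptr_bytes):
    """Return (Pi for IB-tree, Pi for B-tree, h_IB / h_B) under the stated reading."""
    pi_ib = node_bytes // (2 * key_bytes + ptr_bytes)
    pi_b = (node_bytes + key_bytes) // (key_bytes + ptr_bytes)
    ln_ln2 = math.log(math.log(2))  # ln of the average fill factor ln 2
    ratio = (math.log(pi_b) + ln_ln2) / (math.log(pi_ib) + ln_ln2)
    return pi_ib, pi_b, ratio


pi_ib, pi_b, ratio = height_ratio(node_bytes=2048, key_bytes=4, ptr_bytes=8)
```

For p = 8, k = 4, M = 2048 this reproduces the fanouts 128 and 171 and a ratio of about 1.06. Because heights are whole numbers, a ratio this close to 1 usually means the two trees have the same height.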

25
AD-trees
  • We propose a one-dimensional AD*-tree structure
    for indexing transaction time intervals with
    respect to their finish points.
  • The AD*-tree structure is simply an augmented
    AD-tree.
  • The AD-tree is built on the finish points of the
    data intervals and is augmented with minimum
    starting point information to obtain the AD*
    structure.
  • Since AD*-trees and AD-trees are similar, we just
    discuss the features of the AD-tree.
  • Since the finish points of the insertions are
    ordered, we do the insertion at the rightmost
    node.

26
AD-tree properties
27
AD-tree with k = 4 after an insertion and a deletion
28
(No Transcript)
29
Deletion in AD-trees figure.
30
AD-tree vs MB-tree
  • Lemma 1: The minimum number of keys in an AD-tree
    of height h with order k (maximum fanout) is
    (k-1) k^(h-2) + 2, where k, h ≥ 2.
  • Lemma 2: The minimum number of keys in an
    MB-tree of height h with order k (maximum
    fanout) is floor(k/2) k^(h-2) + 2, where
    k, h ≥ 2.
  • Theorem: The worst-case density of an AD-tree of
    height h with order k is higher than that of an
    MB-tree of the same height and the same order
    (k ≥ 3).
  • Experimental results have shown that the height
    of the AD-tree increases more slowly than that of
    the MB-tree when they index the same set of keys
    and have the same parameters.
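
The comparison in the theorem can be tabulated directly. The sketch below assumes the two lemma bounds read as (k-1)·k^(h-2) + 2 for the AD-tree and floor(k/2)·k^(h-2) + 2 for the MB-tree; the operators were lost in the transcript, so this reading is a reconstruction, and the function names are hypothetical.

```python
def ad_min_keys(k, h):
    """Assumed Lemma 1 bound: minimum keys in an AD-tree of order k, height h."""
    return (k - 1) * k ** (h - 2) + 2


def mb_min_keys(k, h):
    """Assumed Lemma 2 bound: minimum keys in an MB-tree of order k, height h."""
    return (k // 2) * k ** (h - 2) + 2


# (order, height, AD-tree bound, MB-tree bound) for a few sample shapes
table = [(k, h, ad_min_keys(k, h), mb_min_keys(k, h))
         for k in (2, 3, 4, 8) for h in (2, 3, 4)]
```

Under this reading the two bounds coincide for k = 2, and the AD-tree bound is strictly larger from k = 3 onward because k - 1 exceeds floor(k/2), which matches the order condition in the theorem.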

31
R-trees
  • 1D R-tree
  • Each internal node entry contains a minimum
    bounding interval (MBI) and a child pointer.
  • The deletion, insertion, and search algorithms of
    the general R-trees are not changed.
  • 2D R-tree: each interval is mapped to a point in
    a 2D space whose dimensions are the starting
    point and the finish point.
  • In each internal node, each entry contains a
    child pointer and an MBR that encloses the 2D
    points indexed in the corresponding subtree.
  • Unlike the IB-tree, the R-tree neither assumes
    nor considers any ordering among the data
    intervals.
  • 2D R-trees require more storage and perform worse
    for common intersection queries.
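
The two R-tree views of an interval can be made concrete with a short sketch. It shows a minimum bounding interval for the 1D case and, for the 2D case, how an intersection query turns into a region test on (start, finish) points. Closed intervals are assumed, the function names are hypothetical, and no actual R-tree is built.

```python
def mbi(intervals):
    """Minimum bounding interval of a group of intervals (1D R-tree entry)."""
    return min(s for s, _ in intervals), max(f for _, f in intervals)


def in_intersection_region(point, query):
    """2D view: the interval mapped to point (start, finish) intersects
    query [qs, qf] exactly when start <= qf and finish >= qs."""
    start, finish = point
    qs, qf = query
    return start <= qf and finish >= qs


data = [(1, 2), (3, 12), (5, 6), (11, 20)]
bounding = mbi(data)
hits = [p for p in data if in_intersection_region(p, (6, 9))]
```

In the 2D space the qualifying points form a quadrant-shaped region bounded by start = qf and finish = qs, which is why an intersection search becomes a window query there.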

32
Temporal relationships between intervals
33
Querying the past state
  • Queries on intervals employ different temporal
    operators.
  • Depending on the querying requirement, different
    operators can be applied.
  • The operators after, met by, right overlaps, left
    covered by, right covered by, right covers,
    equals, and covered by are well supported by the
    IB-tree.
  • The operators before, meets, left overlaps, left
    covers, right covered by, equals, covered by, and
    left covered by are well supported by the
    AD-tree.
  • In the 1-D R-tree structure, every operator
    invokes either an intersection or an inclusion
    search, so all the operators are uniformly
    supported by R-trees.
  • For 2D R-trees, each of the operators corresponds
    to a 2D query region, as shown in the figure.
34
Temporal relationships in 2D space.
35
Experimental results. AD-trees
36
Experimental results
  • Fig a.
  • The MB-tree performs better than the B-tree when
    its height is smaller than the height of the
    B-tree. When they have the same height, the
    B-tree performs better than the MB-tree.
  • Fig b.
  • The B-tree outperforms the MB-tree when they have
    the same height.
  • The AD-tree outperforms both structures.
  • In terms of insertion performance, the AD-tree
    and the MB-tree are almost the same, but the
    B-tree performs worse than the other two.
  • When deletions are not strictly from the
    left-hand side, the MB-tree performs better than
    the B-tree in all categories.

37
Experimental results: IB-trees and R-trees
38
Experimental results: IB-trees and R-trees
  • Insertion and deletion in IB-trees are simpler
    and less costly than in R-trees.
  • In Fig a.
  • For 2D R-trees, this operation is a window query,
    so it is not very efficient.
  • For IB-trees, the starting points of all
    qualifying intervals fall in the query interval,
    so the IB-tree performs best.
  • In Fig b.
  • For 1D R-trees, all nodes whose MBIs include the
    finish point of the query interval have to be
    retrieved. This becomes costly as the MBI size
    increases.
  • The same holds for 2D R-trees.

39
Experimental results: IB-trees and R-trees
40
Experimental results: IB-trees and R-trees
  • In Fig a.
  • The 1D R-tree performed the best, with a 30-40%
    edge over the IB-tree and 30% over 2D R-trees.
  • The sequential search method and the range search
    method of the IB-tree performed very close to
    each other, with the sequential search method
    slightly better than the range search method.
  • In Fig b.
  • 1D R-tree performed the best.
  • Exponential distribution affects the way the
    augmented information in an IB-tree is utilized.
  • IB-tree range search performed better than the
    sequential search method.

41
Indexing valid time databases
  • For valid time data, we need to use indexing
    structures that support dynamic insertions and
    deletions.
  • IB-trees can also support indexing on valid time
    intervals, including intervals that span into
    the current time and into the future.
  • We make the following assumption:
  • Valid time intervals should have an absolute
    (fixed) starting time point, and the finish
    point of a valid time interval can be either an
    absolute time point or the current time variable
    now.
42
Indexing valid-time intervals