XML indexing: A(k) indices (PowerPoint presentation)

Slides: 29
Provided by: Rag47
Learn more at: https://cse.buffalo.edu
Date shown on slides: 10/6/09

1
XML indexing A(k) indices
  • Ragini Rahalkar
  • Roshith Rajagopal

2
Outline
  • Introduction
  • Motivation
  • Labeled graph and index graph
  • Bisimilarity and A(k) index
  • Construction of A(k) index
  • Query Evaluation
  • Approximate index handling
  • Implementation and testing
  • Summary

3
Introduction
  • Structural summaries
  • Evaluating path expressions
  • A(k) index
  • Indexing scheme for large graph-structured data
    such as XML
  • Not all structure is interesting
  • Structure along paths longer than k is not
    indexed precisely
  • Smaller and faster
  • Works for schemaless data
  • Competitive for arbitrary path expressions

4
Prior Schemes
  • 1-index (Milo and Suciu, 1999)
  • Based on an NFA rather than a DFA (smaller)
  • Splits graph nodes into equivalence classes based
    on incoming paths from the root
  • Computes refinements (approximations) of this
    equivalence:
  • similarity
  • bisimilarity

5
Limitations of Prior Work
  • Size
  • Every path is indexed, which is unnecessary
    (local similarity is not exploited)
  • The 1-index can itself be large
  • Designed to answer queries involving arbitrarily
    complex paths, but...
  • such paths may never show up in queries

6
Labeled Graph
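  • $G = (V_G, E_G, \mathit{root}, \Sigma_G, \mathit{label}, \mathit{oid}, \mathit{value})$
  • Node path and label path
  • Path expression: a regular expression over labels
  • It defines a regular language of label paths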
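The definitions above can be made concrete with a small sketch. It is only an illustration under several assumptions. The graph is dictionary-based, `oid` and `value` are omitted, and labels must not contain dots. A path expression is matched by running Python's `re.fullmatch` on the dot-joined label path. The names `LabeledGraph`, `label_path`, `is_node_path`, and `matches` are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class LabeledGraph:
    """Hypothetical sketch of G = (V_G, E_G, root, Sigma_G, label, ...); oid and value omitted."""
    root: int
    label: dict = field(default_factory=dict)     # node -> label (values form Sigma_G)
    children: dict = field(default_factory=dict)  # node -> list of child nodes (E_G)

    def add_node(self, v, lab):
        self.label[v] = lab
        self.children.setdefault(v, [])

    def add_edge(self, u, v):
        self.children[u].append(v)

    def is_node_path(self, node_path):
        # A node path is a sequence of nodes joined by consecutive edges.
        return all(b in self.children[a] for a, b in zip(node_path, node_path[1:]))

    def label_path(self, node_path):
        # The label path of v1.v2...vn is label(v1).label(v2)...label(vn).
        return ".".join(self.label[v] for v in node_path)

    def matches(self, regex, node_path):
        # A node path matches if its label path belongs to the regular language of regex.
        return self.is_node_path(node_path) and re.fullmatch(regex, self.label_path(node_path)) is not None


# Example graph: R -> A -> B -> C and R -> X -> B -> C.
g = LabeledGraph(root=0)
for v, lab in enumerate(["R", "A", "X", "B", "B", "C", "C"]):
    g.add_node(v, lab)
for u, v in [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6)]:
    g.add_edge(u, v)

print(g.label_path([0, 1, 3, 5]))                 # R.A.B.C
print(g.matches(r"R\.A(\.B)*\.C", [0, 1, 3, 5]))  # True
print(g.matches(r"R\.A(\.B)*\.C", [0, 2, 4, 6]))  # False
```

Encoding label paths as strings keeps the sketch short. A real evaluator would instead compile the expression into an automaton over labels.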

7
(No Transcript)
8
Index Graph I(G)
  • Extent of a node: the set of data nodes that an
    index node represents
  • Regular expression execution with I(G)
  • Safe extent mapping
  • Containment of results of path expressions
  • Precise index graph
  • The 1-index graph is never bigger than the data
    graph
  • It can be computed in $O(m \log n)$

9
Notion of Bisimilarity
  • A symmetric binary relation $\approx$ on nodes
  • For two nodes u and v, $u \approx v$ if
  • u and v have the same label, and
  • if u' is a parent of u, then there is a parent v'
    of v such that $u' \approx v'$, and vice versa
  • Objects 8 and 9 are bisimilar
  • Objects 21 and 23 are not bisimilar

10
The A(k) index
  • Exploits local similarity
  • Uses an equivalence-class partition of the data
    nodes
  • Grouping according to labels
  • Notion of false paths: label paths in the index
    graph that do not exist in the data graph
  • Classification by label (e.g., business and
    cultural)
  • Trades absolute precision for grouping similar
    data; the index size is controlled by the value
    of k

11
(No Transcript)
12
(No Transcript)
13
K-bisimilarity
  • Defined inductively:
  • for any two nodes u and v, $u \approx^0 v$ iff u
    and v have the same label
  • $u \approx^k v$ iff
  • $u \approx^{k-1} v$, and
  • for every parent u' of u there is a parent v' of
    v such that $u' \approx^{k-1} v'$, and vice versa
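The inductive definition translates almost directly into a signature-refinement loop. At each step, a node's new class is its old class paired with the set of its parents' old classes. Equal parent-class sets are exactly the "for every parent ..., and vice versa" condition. This is a minimal sketch that assumes the graph is given as `labels` and `parents` dictionaries. The helper names `k_bisim` and `classes` are hypothetical. This is not the split-based Compute_k_bisim procedure used for construction.

```python
def k_bisim(labels, parents, k):
    """Return node -> class key for the k-bisimulation (hypothetical helper)."""
    # Base case: u ~0 v iff u and v have the same label.
    block = dict(labels)
    for _ in range(k):
        # u ~i v iff u ~(i-1) v and both have the same *set* of ~(i-1) parent classes.
        signature = {v: (block[v], frozenset(block[p] for p in parents[v])) for v in labels}
        ids = {}
        refined = {v: ids.setdefault(signature[v], len(ids)) for v in labels}
        if len(ids) == len(set(block.values())):
            break  # nothing was split, so every larger k gives the same partition
        block = refined
    return block


def classes(block):
    """Group nodes by class key and return the classes as sorted lists."""
    groups = {}
    for v, key in block.items():
        groups.setdefault(key, []).append(v)
    return sorted(sorted(g) for g in groups.values())


labels = {0: "R", 1: "A", 2: "X", 3: "B", 4: "B", 5: "C", 6: "C"}
parents = {0: [], 1: [0], 2: [0], 3: [1], 4: [2], 5: [3], 6: [4]}
print(classes(k_bisim(labels, parents, 0)))  # [[0], [1], [2], [3, 4], [5, 6]]
print(classes(k_bisim(labels, parents, 1)))  # [[0], [1], [2], [3], [4], [5, 6]]
print(classes(k_bisim(labels, parents, 2)))  # [[0], [1], [2], [3], [4], [5], [6]]
```

In this example, nodes 5 and 6 (both labeled C) are 1-bisimilar but not 2-bisimilar. Their parents (3 and 4) only become distinguishable once their own parents A and X are taken into account.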

14
[Figure: data graph G and its A(0), A(1), and A(2) index graphs; for this example the A(2) index is the 1-index]

15
A(k) index properties
  • If nodes u and v are k-bisimilar, then the set of
    label paths of length k into them is the same.
  • The set of label paths of length k into an
    A(k)-index node is the set of label paths of
    length k into any node in its extent.
  • The A(k)-index is precise for any simple path
    expression of length less than or equal to k.
  • The A(k)-index is safe, i.e., its result on a
    path expression always contains the data-graph
    result for that query.
  • The (k+1)-bisimulation is either equal to or a
    refinement of the k-bisimulation.
  • Let v, x, y be three nodes such that the shortest
    paths from v to x and from v to y both contain
    more than k edges. If an edge from a node u to v
    is added or deleted, this update does not affect
    the k-bisimilarity relationship between x and y.
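The first property can be checked on a small graph by enumerating label paths backwards from a node. The sketch below is a hypothetical helper, not part of the slides. It assumes `labels`/`parents` dictionaries and counts path length in edges. It returns every incoming label path with at most k edges. Since the k-bisimulation refines every j-bisimulation for j ≤ k, these shorter paths also agree for k-bisimilar nodes.

```python
def incoming_label_paths(labels, parents, v, k):
    """Label paths with at most k edges that end at node v (hypothetical helper)."""
    frontier = {(v, (labels[v],))}  # (first node of the path, label path as a tuple)
    paths = {(labels[v],)}
    for _ in range(k):
        # Extend every path by one edge backwards through a parent of its first node.
        frontier = {(p, (labels[p],) + path) for node, path in frontier for p in parents[node]}
        paths.update(path for _, path in frontier)
    return paths


labels = {0: "R", 1: "A", 2: "X", 3: "B", 4: "B", 5: "C", 6: "C"}
parents = {0: [], 1: [0], 2: [0], 3: [1], 4: [2], 5: [3], 6: [4]}
# Nodes 5 and 6 are 1-bisimilar but not 2-bisimilar.
print(incoming_label_paths(labels, parents, 5, 1) == incoming_label_paths(labels, parents, 6, 1))  # True
print(incoming_label_paths(labels, parents, 5, 2) == incoming_label_paths(labels, parents, 6, 2))  # False
```

For k = 2 the sets differ because node 5 is reached by A.B.C and node 6 by X.B.C.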

16
A(k) index construction
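  • Partitioning: Compute_k_bisim
  • Notion of successor: Succ(X) is the set of nodes
    with a parent in X
  • Notion of stability: A is stable with respect to
    B if $A \subseteq \mathrm{Succ}(B)$ or
    $A \cap \mathrm{Succ}(B) = \emptyset$
  • Two sets of nodes A and B: partition A as
  • $A \cap \mathrm{Succ}(B)$
  • $A - \mathrm{Succ}(B)$
  • Computation of the (k+1)-bisimulation from the
    k-bisimulation
  • A copy of the k-bisimulation is split into
    equivalence classes until each class is stable
    with respect to every equivalence class of the
    k-bisimulation
  • Time $O(km)$, space $O(m)$, where m is the number
    of edges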

17
Compute_k_bisim(G,k)
  • Begin
    • Q and X are each a list of node-sets
    • Q ← partition of V_G by label
    • X ← a copy of Q
    • for i ← 1 to k do
      • foreach X1 in X do
        • compute Succ(X1)
        • foreach Q1 in Q do
          • replace Q1 by Q1 ∩ Succ(X1) and
            Q1 − Succ(X1), keeping only non-empty parts
      • if there was no split then
        • break
      • X ← a copy of Q
  • End
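A runnable Python version of Compute_k_bisim might look as follows. It assumes the data graph is given as `labels` and `children` dictionaries and represents node-sets as `frozenset`s. A split is recorded only when both halves are non-empty. Names beyond those in the pseudocode are hypothetical. This naive sketch recomputes Succ(X1) and rebuilds Q for every class. It therefore does not attain the $O(km)$ time bound.

```python
def compute_k_bisim(labels, children, k):
    """Split-based sketch of Compute_k_bisim(G, k); returns the classes as frozensets."""
    # Q <- partition of V_G by label
    by_label = {}
    for v, lab in labels.items():
        by_label.setdefault(lab, set()).add(v)
    Q = [frozenset(s) for s in by_label.values()]
    X = list(Q)  # X <- a copy of Q: the classes of the previous bisimulation
    for _ in range(k):
        split = False
        for X1 in X:
            # Succ(X1): every node with at least one parent in X1
            succ = {c for u in X1 for c in children.get(u, ())}
            new_Q = []
            for Q1 in Q:
                inside, outside = Q1 & succ, Q1 - succ
                if inside and outside:  # Q1 is not stable w.r.t. X1: split it
                    split = True
                    new_Q.extend([inside, outside])
                else:
                    new_Q.append(Q1)
            Q = new_Q
        if not split:
            break
        X = list(Q)
    return Q


labels = {0: "R", 1: "A", 2: "X", 3: "B", 4: "B", 5: "C", 6: "C"}
children = {0: [1, 2], 1: [3], 2: [4], 3: [5], 4: [6], 5: [], 6: []}
for k in (0, 1, 2):
    print(k, sorted(sorted(c) for c in compute_k_bisim(labels, children, k)))
# 0 [[0], [1], [2], [3, 4], [5, 6]]
# 1 [[0], [1], [2], [3], [4], [5, 6]]
# 2 [[0], [1], [2], [3], [4], [5], [6]]
```

Classes are split only against the previous round's classes (X), never against the partially refined Q. This is what makes round i produce the i-bisimulation.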

18
Compute_A(k)_index(G,k)
  • Begin
    • Compute_k_bisim(G, k)
    • foreach equivalence class in the k-bisimulation do
      • create an index node I
      • ext(I) ← data nodes in the equivalence class
    • foreach edge from u to v in G do
      • I_u ← index node containing u
      • I_v ← index node containing v
      • if there is no edge from I_u to I_v then
        • add an edge from I_u to I_v
  • End

19
Query Evaluation Schemes
  • The index is queried using regular path
    expressions.
  • Path expressions are of the form
  • P = Root.R, where R is a regular expression over
    labels
  • Query evaluation techniques
  • Forward evaluation
  • Backward evaluation

20
Query Evaluation Techniques
  • Forward Evaluation Strategy
  • Simulates an NFA for R on the index graph
  • The index graph is traversed breadth-first,
    making the corresponding NFA transitions
  • Backward Evaluation Strategy
  • Find nodes bearing the labels that can end a match
    of R
  • Evaluate R in reverse from these nodes
  • Intuition: the end of the expression is often
    more selective than its beginning, so processing
    is cheaper
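Forward evaluation can be sketched as a breadth-first search over (index node, NFA state) pairs. The sketch makes several assumptions:

* The NFA for R is written by hand as a transition dictionary (no regex-to-NFA compiler).
* The root's own label is not consumed, since P = Root.R.
* `"*"` is a hypothetical wildcard label.

Following the approximate-index slides, the answer is the union of the extents of the accepted index nodes. Backward evaluation is not shown, and the helper names are hypothetical.

```python
from collections import deque


def forward_eval(root, idx_label, idx_children, delta, start, accepting):
    """BFS simulation of an NFA for R over an index graph, for P = Root.R."""
    seen = {(root, start)}
    queue = deque(seen)
    accepted = set()
    while queue:
        node, q = queue.popleft()
        if q in accepting:
            accepted.add(node)
        for child in idx_children.get(node, ()):
            # Follow transitions on the child's label, plus the hypothetical wildcard "*".
            next_states = delta.get((q, idx_label[child]), set()) | delta.get((q, "*"), set())
            for q2 in next_states:
                if (child, q2) not in seen:
                    seen.add((child, q2))
                    queue.append((child, q2))
    return accepted


def result_set(accepted, ext):
    """Add the data nodes in ext(B), not B itself, for every accepted index node B."""
    return set().union(*(ext[b] for b in accepted))


# Hypothetical A(1) index of R->A->B->C, R->X->B->C (index node 5 has extent {5, 6}).
idx_label = {0: "R", 1: "A", 2: "X", 3: "B", 4: "B", 5: "C"}
idx_children = {0: {1, 2}, 1: {3}, 2: {4}, 3: {5}, 4: {5}, 5: set()}
ext = {0: {0}, 1: {1}, 2: {2}, 3: {3}, 4: {4}, 5: {5, 6}}

# NFA for R = A.B.C: states 0 -A-> 1 -B-> 2 -C-> 3, accepting {3}.
delta = {(0, "A"): {1}, (1, "B"): {2}, (2, "C"): {3}}
accepted = forward_eval(0, idx_label, idx_children, delta, 0, {3})
print(sorted(accepted), sorted(result_set(accepted, ext)))  # [5] [5, 6]
```

The index answers {5, 6}, but only node 5 lies on an A.B.C path in the data graph. The result is safe (a superset of the true answer), and node 6 is reached only through a false path.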

21
Approximate Index Graphs
  • When R is evaluated on the index graph, the nodes
    in ext(B), rather than the index node B itself,
    are added to the result set.
  • The A(k) index is safe:
  • the result set for R is a superset of the target
    set in the data graph.

22
Approximate Index Graphs
  • When an index node B is accepted along a path of
    length less than k in the A(k) index graph, every
    node in ext(B) must be in the target set of R
  • When an index node is accepted only along longer
    paths, its data nodes are initially added to a
    "maybe" set M instead of the result set.
  • Nodes in M are validated by executing the
    automaton in reverse on the data graph, beginning
    with each node in M
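Validating the maybe set can be sketched as a backward search over (data node, NFA state) pairs. A candidate is confirmed if the reversed automaton can walk from it, through parents, back to the root in the start state. The sketch makes several assumptions:

* `delta` maps (state, label) to a set of next states.
* The root's own label is not consumed.
* `"*"` is a hypothetical wildcard.
* For brevity, every candidate is treated as belonging to M.

The function name `validate` is hypothetical.

```python
def validate(maybe, labels, parents, root, delta, start, accepting):
    """Confirm candidates by running the NFA backwards on the data graph (hypothetical helper)."""
    # Reverse transitions: (q2, label) -> {q such that q2 is in delta[(q, label)]}
    rev = {}
    for (q, lab), targets in delta.items():
        for q2 in targets:
            rev.setdefault((q2, lab), set()).add(q)
    confirmed = set()
    for m in maybe:
        stack = [(m, qf) for qf in accepting]
        seen = set(stack)
        while stack:
            v, q2 = stack.pop()
            if v == root and q2 == start:
                confirmed.add(m)  # reached Root in the start state: a real match
                break
            # States from which reading label(v), or the wildcard, leads to q2
            prev_states = rev.get((q2, labels[v]), set()) | rev.get((q2, "*"), set())
            for p in parents[v]:
                for q in prev_states:
                    if (p, q) not in seen:
                        seen.add((p, q))
                        stack.append((p, q))
    return confirmed


labels = {0: "R", 1: "A", 2: "X", 3: "B", 4: "B", 5: "C", 6: "C"}
parents = {0: [], 1: [0], 2: [0], 3: [1], 4: [2], 5: [3], 6: [4]}
delta = {(0, "A"): {1}, (1, "B"): {2}, (2, "C"): {3}}  # NFA for R = A.B.C
print(validate({5, 6}, labels, parents, 0, delta, 0, {3}))  # {5}
```

Node 6 is rejected because walking back from it reads X where the reversed automaton needs A.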

23
Implementation Data Structures
  • Data Graph Representation
  • Element_HT
  • Hashtable of (NodeID, Element) pairs
  • Attribute_HT
  • Hashtable of (NodeID -1, IDREF Attribute) pairs
  • The key of this hashtable is the NodeID of the
    element that carries the attribute.
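One way to fill Element_HT and Attribute_HT is to parse the document with Python's standard `xml.etree.ElementTree` and number elements in document order. This is a sketch under the following assumptions:

* NodeIDs are assigned by a pre-order walk.
* The caller names which attributes are IDREFs, since there is no DTD check.
* If an element carries several listed attributes, only the last matching name in `idref_attrs` is kept.

`build_tables` and `idref_attrs` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def build_tables(xml_text, idref_attrs=("idref",)):
    """Fill hypothetical Element_HT / Attribute_HT dictionaries from an XML string."""
    root = ET.fromstring(xml_text)
    element_ht = {}    # Element_HT: NodeID -> Element
    attribute_ht = {}  # Attribute_HT: NodeID of the owning element -> IDREF value
    # Element.iter() walks the element and its descendants in document order.
    for node_id, elem in enumerate(root.iter()):
        element_ht[node_id] = elem
        for name in idref_attrs:
            if name in elem.attrib:
                attribute_ht[node_id] = elem.attrib[name]  # a later name overwrites an earlier one
    return element_ht, attribute_ht


doc = "<site><person id='p1'/><item seller='p1'/></site>"
elements, idrefs = build_tables(doc, idref_attrs=("seller",))
print([elements[i].tag for i in sorted(elements)], idrefs)  # ['site', 'person', 'item'] {2: 'p1'}
```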

24
Implementation
  • Index Tree
  • EqClass_HT - Hashtable
  • (EqClassID, Vector of NodeIDs in that EqClass)
  • Generated from Compute_k_bisim
  • Link Table
  • Linktable_HT - Hashtable
  • (EqClassID, Vector of Child EqClassIDs)
  • Generated from Compute_A(k)_index

25
Sample Results: Size of Index Graph vs. k

26
Summary
  • Generalization of the 1-index
  • The value of k controls the trade-off between the
    size of the index graph and its accuracy
  • Small values of k perform better than the 1-index
  • Future scope:
  • Use in schema extraction and query optimization

27
References
  • Raghav Kaushik, Pradeep Shenoy, Philip Bohannon,
    Ehud Gudes. Exploiting Local Similarity for
    Indexing Paths in Graph-Structured Data. ICDE 2002.
  • Tova Milo, Dan Suciu. Index Structures for Path
    Expressions. ICDT 1999.

28
  • THANK YOU