XML indexing Ak indices - PowerPoint PPT Presentation

About This Presentation

Title:

XML indexing Ak indices

Description:

Approximate index handling. Implementation and testing. Summary. 10/6/09 ... Go for refinements (approximations) similarity. bisimilarity. 10/6/09 ... Approximate ... – PowerPoint PPT presentation

Slides: 29

Provided by: Rag47

Learn more at: https://cse.buffalo.edu

Transcript and Presenter's Notes

1
XML indexing A(k) indices

- Ragini Rahalkar
- Roshith Rajagopal

2
Outline

Introduction
Motivation
Labeled graph and index graph
Bisimilarity and A(k) index
Construction of A(k) index
Query Evaluation
Approximate index handling
Implementation and testing
Summary

3
Introduction

Structural summaries
Evaluating path expressions
A(k) index
Indexing scheme for large graph data such as XML
Not all structure is interesting
Paths longer than k need not be indexed precisely
Smaller and faster
Works on schemaless data
Competitive for arbitrary path expressions

4
Prior Schemes

1-index (Milo, Suciu 1999)
NFA rather than DFA (smaller)
Splits graph nodes into equivalence classes based on incoming paths from the root
Go for refinements (approximations)
similarity
bisimilarity

5
Limitations of Prior Work

Size
Each and every path is indexed, which is not necessary (does not exploit local similarity)
1-index size can be big too!
Designed to answer queries involving arbitrarily complex paths, but such paths may never show up in queries

6
Labeled Graph

G = (V_G, E_G, root, Σ_G, label, oid, value)
Node path and label path
Path expression
Regular language
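
The labeled graph and the notion of a label path can be sketched in Python as follows. This is a minimal illustration only: the toy graph, the dictionary layout, and the names `label`, `children`, and `match_label_path` are assumptions made for the example, not the presenters' representation.

```python
# Hypothetical toy data graph: oid -> label, oid -> list of child oids.
label = {0: "ROOT", 1: "A", 2: "B", 3: "C", 4: "D", 5: "D"}
children = {0: [1], 1: [2, 3], 2: [4], 3: [5], 4: [], 5: []}


def match_label_path(start, labels):
    """Return the nodes reached from `start` by a node path whose
    label path equals `labels` (labels[0] is the label of `start`)."""
    if not labels or label[start] != labels[0]:
        return set()
    frontier = {start}
    for wanted in labels[1:]:
        # Follow one edge and keep only children carrying the next label.
        frontier = {c for n in frontier for c in children[n] if label[c] == wanted}
    return frontier


print(match_label_path(0, ["ROOT", "A", "B", "D"]))  # {4}
```

A general path expression denotes a regular language of such label paths; the sketch covers only the simplest case, a single fixed label path.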

8
Index Graph I(G)

Extent of a node
Regular expression execution with I(G)
Safe extent mapping
Containment of results of path expressions
Precise index graph
The 1-index graph is never bigger than the data graph
It can be computed in O(m log n) time

9
Notion of Bisimilarity

Symmetric, binary relation ≈_b
For two nodes u and v, u ≈_b v if
u and v have the same label
If u' is a parent of u, then there is a parent v' of v such that u' ≈_b v', and vice versa
Objects 8 and 9 are bisimilar
Objects 21 and 23 are not bisimilar

10
The A(k) index

Local similarity
Using an equivalence-class partition
Grouping according to labels
Notion of false paths
Classification by label: business and cultural
Trades absolute precision for grouping of similar data, so the index size varies with the value of k

13
k-bisimilarity

Defined inductively as
For any two nodes u and v, u ≈_0 v iff u and v have the same label
u ≈_k v iff
u ≈_(k-1) v, and
for every parent u' of u there is a parent v' of v such that u' ≈_(k-1) v', and vice versa
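
The inductive definition can be checked directly with a short recursive function. The following Python sketch is a naive transcription meant only to make the definition concrete; the function name `k_bisimilar`, the dictionaries `label` and `parents`, and the toy graph are assumptions for the example, and the recursion is exponential in k, so it is not how an index would be built.

```python
def k_bisimilar(u, v, k, label, parents):
    """Check u ~k v straight from the inductive definition."""
    if label[u] != label[v]:
        return False          # fails already at level 0
    if k == 0:
        return True
    if not k_bisimilar(u, v, k - 1, label, parents):
        return False
    # Every parent of u needs a (k-1)-bisimilar parent of v ...
    forth = all(
        any(k_bisimilar(pu, pv, k - 1, label, parents) for pv in parents[v])
        for pu in parents[u]
    )
    # ... and vice versa.
    back = all(
        any(k_bisimilar(pv, pu, k - 1, label, parents) for pu in parents[u])
        for pv in parents[v]
    )
    return forth and back


# Hypothetical graph: 1(A) -> 2(B) -> 4(D) and 1(A) -> 3(C) -> 5(D).
label = {1: "A", 2: "B", 3: "C", 4: "D", 5: "D"}
parents = {1: [], 2: [1], 3: [1], 4: [2], 5: [3]}

print(k_bisimilar(4, 5, 0, label, parents))  # True: both are labeled D
print(k_bisimilar(4, 5, 1, label, parents))  # False: parents are labeled B and C
```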

14
(Figure: an example data graph G with node labels A to E, shown with its A(0), A(1), and A(2) index graphs; the A(2)-index is marked as the 1-index.)

15
A(k) index properties

If nodes u and v are k-bisimilar, then the set of label-paths of length k into them is the same.
The set of label-paths of length k into an A(k)-index node is the set of label-paths of length k into any node in its extent.
The A(k)-index is precise for any simple path expression of length less than or equal to k.
The A(k)-index is safe, i.e., its result on a path expression always contains the graph result for that query.
The (k+1)-bisimulation is either equal to or is a refinement of the k-bisimulation.
Let v, x, y be three nodes such that the shortest path to x from v or to y from v contains more than k edges. If an edge is added or deleted going from a node u to v, this update does not affect the k-bisimilarity relationship between x and y.

16
A(k) index construction

Partitioning: compute_k_bisim
Notion of the successor of a node
Notion of stability
For two sets of nodes A and B, partition A as
A ∩ Succ(B)
A − Succ(B)
Computation of the (k+1)-bisimulation from the k-bisimulation
A copy of the k-bisimulation is divided into equivalence classes until they are stable with respect to the equivalence classes of the k-bisimulation
Time: O(km); space: O(m), where m is the number of edges

17
Compute_k_bisim(G,k)

Begin
  /* Q and X are each a list of node-sets */
  Q = partition V_G by label
  X = (a copy of) Q
  for i = 1 to k do
    foreach X1 in X do
      compute Succ(X1)
      foreach Q1 in Q do
        replace Q1 by Q1 ∩ Succ(X1) and Q1 − Succ(X1)
    if there was no split then
      break
    X = (a copy of) Q
End
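
The pseudocode above can be transcribed into Python roughly as follows. This is a sketch under stated assumptions: the graph is given as two dictionaries (`label` and `children`, both names chosen for the example), classes are represented as frozensets, and a class is only counted as split when both parts are non-empty.

```python
def compute_k_bisim(label, children, k):
    """Partition refinement following the Compute_k_bisim pseudocode.
    Returns the k-bisimulation classes as a list of frozensets."""
    # Q = partition V_G by label
    by_label = {}
    for node, lab in label.items():
        by_label.setdefault(lab, set()).add(node)
    q = [frozenset(s) for s in by_label.values()]

    for _ in range(k):
        x = list(q)                      # X = (a copy of) Q
        split_happened = False
        for x1 in x:
            succ = {c for n in x1 for c in children[n]}   # Succ(X1)
            refined = []
            for q1 in q:
                inside, outside = q1 & succ, q1 - succ
                if inside and outside:   # Q1 is not stable w.r.t. X1
                    refined += [inside, outside]
                    split_happened = True
                else:
                    refined.append(q1)
            q = refined
        if not split_happened:           # fixpoint reached before k rounds
            break
    return q


# Hypothetical graph: 1(A) -> 2(B) -> 4(D) and 1(A) -> 3(C) -> 5(D).
label = {1: "A", 2: "B", 3: "C", 4: "D", 5: "D"}
children = {1: [2, 3], 2: [4], 3: [5], 4: [], 5: []}

print(sorted(sorted(c) for c in compute_k_bisim(label, children, 0)))
# [[1], [2], [3], [4, 5]]
print(sorted(sorted(c) for c in compute_k_bisim(label, children, 1)))
# [[1], [2], [3], [4], [5]]
```

With k = 0 the two D nodes share a class; one refinement round separates them because their parents fall into different label classes.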

18
Compute_A(k)_index(G,k)

Begin
  Compute_k_bisim(G,k)
  foreach equiv. class in k-bisimulation do
    create an index node I
    ext[I] = data nodes in the equivalence class
  foreach edge from u to v in G do
    Iu = index node containing u
    Iv = index node containing v
    if there is no edge from Iu to Iv then
      add an edge from Iu to Iv
End
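
A possible Python rendering of the index-graph construction is sketched below. It assumes the equivalence classes have already been computed and are passed in as a list of node sets; the names `build_index`, `extent`, and `index_of`, and the integer index-node IDs, are illustrative choices, not part of the original slides.

```python
def build_index(classes, children):
    """Build index nodes and edges from equivalence classes.
    classes: list of node sets; children: oid -> list of child oids."""
    # One index node per class; ext[I] holds the data nodes of the class.
    extent = {i: set(c) for i, c in enumerate(classes)}
    index_of = {n: i for i, ext in extent.items() for n in ext}

    edges = set()
    for u, kids in children.items():
        for v in kids:
            # A set keeps each (Iu, Iv) edge at most once.
            edges.add((index_of[u], index_of[v]))
    return extent, edges


# Hypothetical input: the label partition (A(0)) of a five-node graph.
children = {1: [2, 3], 2: [4], 3: [5], 4: [], 5: []}
classes = [{1}, {2}, {3}, {4, 5}]

extent, edges = build_index(classes, children)
print(extent[3])       # {4, 5}
print(sorted(edges))   # [(0, 1), (0, 2), (1, 3), (2, 3)]
```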

19
Query Evaluation Schemes

The index is queried using regular path expressions.
Path expressions are of the form
P = Root.R
Query Evaluation Techniques
Forward Evaluation
Backward Evaluation

20
Query Evaluation Techniques

Forward Evaluation Strategy
Simulation of an NFA on the graph
Index graph traversed breadth first, making the corresponding transitions
Backward Evaluation Strategy
Find nodes bearing the final labels in R
R is evaluated in reverse from these nodes
Intuition: the end of the expression is more selective than the earlier parts, so processing is cheaper
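
The forward strategy can be illustrated with a breadth-first simulation over (node, automaton state) pairs. The sketch below is one plausible reading of the slide, not the presenters' code: the NFA encoding (`delta` mapping a state and a label to a set of next states), the function name `forward_eval`, and the toy graph are all assumptions.

```python
from collections import deque


def forward_eval(root, label, children, delta, start, accepting):
    """Simulate an NFA on a graph, breadth first.
    delta maps (state, label) -> set of next states."""
    result, seen, queue = set(), set(), deque()
    # The root node itself consumes the first label of the expression.
    for s in delta.get((start, label[root]), ()):
        seen.add((root, s))
        queue.append((root, s))
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            result.add(node)
        for child in children[node]:
            for nxt in delta.get((state, label[child]), ()):
                if (child, nxt) not in seen:
                    seen.add((child, nxt))
                    queue.append((child, nxt))
    return result


# Hypothetical graph and an NFA for the label path ROOT.A.B.D
label = {0: "ROOT", 1: "A", 2: "B", 3: "C", 4: "D", 5: "D"}
children = {0: [1], 1: [2, 3], 2: [4], 3: [5], 4: [], 5: []}
delta = {(0, "ROOT"): {1}, (1, "A"): {2}, (2, "B"): {3}, (3, "D"): {4}}

print(forward_eval(0, label, children, delta, start=0, accepting={4}))  # {4}
```

Run on an index graph instead of the data graph, the same routine would return index nodes, and the answer would be the union of their extents.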

21
Approximate Index Graphs

While evaluating R on the index graph, we add the nodes in Ext[B], rather than the index node B itself, to the result set.
The A(k) index is safe:
the result set for R is a superset of the target set in the data graph.

22
Approximate Index Graphs

When an index node B is accepted along a path of length < k in the A(k) index graph, every node in Ext[B] must be in the target set of R.
When an index node is accepted along a longer path, its data nodes are initially added to a maybe set M instead of the result set.
Nodes in M are validated by reverse execution of the automaton on the data graph, beginning with each node in M.
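
The result-set and maybe-set handling described above can be sketched as follows. To keep the example short it validates a fixed label path rather than running a general automaton in reverse, which is a simplification; the names `validate_backward` and `resolve`, the `(index_node, path_length)` input format, and the toy data are assumptions for illustration.

```python
def validate_backward(node, labels, label, parents):
    """Walk parent edges in the data graph and check that some node path
    ending at `node` carries the label path `labels`."""
    frontier = {node} if label[node] == labels[-1] else set()
    for wanted in reversed(labels[:-1]):
        frontier = {p for n in frontier for p in parents[n] if label[p] == wanted}
    return bool(frontier)


def resolve(accepted, extent, k, labels, label, parents):
    """accepted: (index_node, path_length) pairs produced by the index run."""
    result, maybe = set(), set()
    for index_node, length in accepted:
        if length < k:
            result |= extent[index_node]   # precise: accept the whole extent
        else:
            maybe |= extent[index_node]    # must be checked on the data graph
    result |= {n for n in maybe if validate_backward(n, labels, label, parents)}
    return result


# Hypothetical data graph and an A(0)-style index node 3 with extent {4, 5}.
label = {0: "ROOT", 1: "A", 2: "B", 3: "C", 4: "D", 5: "D"}
parents = {0: [], 1: [0], 2: [1], 3: [1], 4: [2], 5: [3]}
extent = {3: {4, 5}}

# The index accepted node 3 along a path of 3 edges, which exceeds k = 0.
print(resolve([(3, 3)], extent, 0, ["ROOT", "A", "B", "D"], label, parents))  # {4}
```

Node 5 is a false positive of the coarse index: it is labeled D but sits under C, so validation on the data graph removes it.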

23
Implementation Data Structures

Data Graph Representation
Element_HT: hashtable of (NodeID, Element) pairs
Attribute_HT: hashtable of (NodeID -1, IDREF Attribute) pairs
The key of this hashtable is the NodeID of the element that owns the attribute.

24
Implementation

Index Tree
EqClass_HT: hashtable of (EqClassID, Vector of NodeIDs in that EqClass) pairs
Generated from Compute_K_Bisim
Link Table
Linktable_HT: hashtable of (EqClassID, Vector of Child EqClassIDs) pairs
Generated from Compute_A(K)_index
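
As a rough illustration of these tables, the sketch below models them with Python dictionaries and lists standing in for hashtables and vectors. All IDs and contents are made-up values, the element table stores only a tag name, and the helper `extent_of_children` is hypothetical; the presenters' actual classes are not shown in the slides.

```python
# NodeID -> element (reduced here to its tag name)
element_ht = {1: "A", 2: "B", 3: "C", 4: "D", 5: "D"}
# EqClassID -> NodeIDs in that equivalence class
eqclass_ht = {0: [1], 1: [2], 2: [3], 3: [4, 5]}
# EqClassID -> child EqClassIDs (edges of the index graph)
linktable_ht = {0: [1, 2], 1: [3], 2: [3], 3: []}


def extent_of_children(eq_class_id):
    """Data nodes reachable in one index-graph step from the given class."""
    return sorted(
        node
        for child in linktable_ht[eq_class_id]
        for node in eqclass_ht[child]
    )


print(extent_of_children(0))  # [2, 3]
print(extent_of_children(1))  # [4, 5]
```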

25
Sample Results: Size of Index Graph vs. k
26
Summary

Generalization of the 1-index
The value of k trades off the size of the index graph against accuracy
Small values of k perform better than the 1-index
Future scope
Use in schema extraction and query optimization

27
References

Raghav Kaushik, Ehud Gudes, et al. Exploiting Local Similarity for Indexing Paths in Graph-Structured Data.
Milo, Suciu. Index Structures for Path Expressions. 1999.