XML Query Evaluation - PowerPoint PPT Presentation

Slides: 28
Provided by: rajasekark
Transcript and Presenter's Notes

1
XML Query Evaluation
Based on slides by Alan Halverson and Raghav Kaushik,
University of Wisconsin, Madison
2
Evaluating XML queries
  • Native XML database
  • Store and query XML in a native form
  • Tamino, eXcelon, Xindice
  • Niagara, Timber
  • XML-enabled databases
  • Databases with extensions for transferring data
    between XML documents and themselves
  • relational/object-oriented/
  • Focus on native XML databases

3
Native XML database
  • An XML document is stored in two formats in
    Niagara
  • As a DOM tree, making a walk through all elements
    in document order easy
  • Actual data, can reconstruct original document
  • Data Manager
  • As inverted lists, mapping terms to document
    locations
  • Index on tag name
  • Index Manager

4
Niagara System Architecture
5
Numbering Scheme
  • Assign to each element a start number, end
    number, and level
  • Allows for easy relationship testing
  • Example: B is a child element of A if and only
    if B.start > A.start, B.end < A.end, and
    B.level = A.level + 1
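
The numbering scheme can be sketched in a few lines of Python. This is a minimal illustration, not Niagara's implementation: the `Region` class and the helper names are hypothetical, and the numbers come from a single depth-first walk that increments one counter on entering and on leaving each element.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical record holding the numbers assigned to one element."""
    tag: str
    start: int
    end: int
    level: int


def number_elements(root):
    """Assign (start, end, level) to every element in one depth-first walk."""
    regions = []
    counter = 0

    def visit(elem, level):
        nonlocal counter
        counter += 1
        start = counter
        slot = len(regions)
        regions.append(None)  # reserve the slot so output stays in document order
        for child in elem:
            visit(child, level + 1)
        counter += 1
        regions[slot] = Region(elem.tag, start, counter, level)

    visit(root, 0)
    return regions


def is_descendant(b, a):
    """B lies strictly inside A's (start, end) interval."""
    return b.start > a.start and b.end < a.end


def is_child(b, a):
    """B is inside A and exactly one level deeper."""
    return is_descendant(b, a) and b.level == a.level + 1


doc = ET.fromstring("<A><B><C/></B><B/></A>")
a, b1, c, b2 = number_elements(doc)
```

With these numbers, `is_child(b1, a)` holds, while `c` is a descendant of `a` but not a child of it. Neither test needs to touch the tree itself.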

6
Data Manager
  • XKey: a (docid, elementid) pair

7
Data Manager detail
  • Two types of cursors
  • Child Axis (CA): given an XKey, enumerates its child
    elements, with optional filtering on a tag name
  • Descendant Axis (DA): given an XKey, enumerates all
    proper descendant elements of that XKey
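
The two cursor types can be approximated with Python generators. This is only a sketch: an in-memory `ElementTree` element stands in for an XKey, and the function names `child_axis` and `descendant_axis` are assumptions, not the Data Manager's actual interface.

```python
import xml.etree.ElementTree as ET


def child_axis(elem, tag=None):
    """Enumerate child elements, optionally keeping only those with a given tag."""
    for child in elem:
        if tag is None or child.tag == tag:
            yield child


def descendant_axis(elem):
    """Enumerate all proper descendants in document order."""
    for child in elem:
        yield child
        yield from descendant_axis(child)


root = ET.fromstring("<A><B><C/><D><C/></D></B><C/></A>")
b = root[0]  # the element the cursors are opened at

child_tags = [e.tag for e in child_axis(b)]
filtered_tags = [e.tag for e in child_axis(b, tag="C")]
descendant_tags = [e.tag for e in descendant_axis(b)]
```

Opened at B, the child cursor yields C and D, while the descendant cursor also reaches the C nested under D.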

8
Unnest Algorithm
  • Algorithm uses the Child and Descendant cursors
    provided by the Data Manager
  • The top picture shows in yellow the elements
    enumerated by a Child cursor opened at element B
  • The bottom picture shows the same for a Descendant
    cursor opened at B

9
Unnest for Path expression queries
10
Unnest for Path expression queries
  • Existential predicate can be handled as a
    separate state machine
  • Positional predicate can be handled by augmented
    state information
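
As a rough illustration of driving a path expression with the two cursors, the sketch below walks the steps one at a time. It is a simplification under stated assumptions: it is not the state machine from the slides, it ignores existential and positional predicates, and `unnest`, `child_axis`, and `descendant_axis` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def child_axis(elem, tag=None):
    """Stand-in for a Child Axis cursor with an optional tag filter."""
    for child in elem:
        if tag is None or child.tag == tag:
            yield child


def descendant_axis(elem):
    """Stand-in for a Descendant Axis cursor."""
    for child in elem:
        yield child
        yield from descendant_axis(child)


def unnest(context, steps):
    """Evaluate steps such as [("/", "B"), ("//", "C")] starting at `context`."""
    frontier = [context]
    for axis, tag in steps:
        seen, matched = set(), []
        for node in frontier:
            if axis == "/":
                cursor = child_axis(node, tag)
            else:
                cursor = (d for d in descendant_axis(node) if d.tag == tag)
            for hit in cursor:
                if id(hit) not in seen:  # nested contexts can reach a node twice
                    seen.add(id(hit))
                    matched.append(hit)
        frontier = matched
    return frontier


doc = ET.fromstring('<A><B><C id="x"/></B><B><D><C id="y"/></D></B></A>')
child_only = [c.get("id") for c in unnest(doc, [("/", "B"), ("/", "C")])]
with_descendants = [c.get("id") for c in unnest(doc, [("/", "B"), ("//", "C")])]
```

The duplicate check matters for expressions like //B//C when B elements nest, since the same C can then be reached from more than one context node.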

11
Index Manager
  • Path expression queries consist of a series of
    steps
  • Each step has a tag name and a relationship with
    the context node
  • Parent-child or ancestor-descendant
  • Build an inverted index on tagname
  • Associate information with each occurrence to
    check containment relationship

12
Index Manager detail
  • B-tree index keyed by (TermID, DocID)
  • Each leaf entry contains a 2nd-level index
    pointing to pages of postings
  • Posting: (DocID, Start, End, Level)
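
A toy version of the inverted lists is sketched below. It assumes tag names play the role of terms, and it uses a plain Python dictionary where the slides describe a B-tree with a second-level index. The `build_index` name and the in-memory layout are illustrative only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(docs):
    """Map each tag name to postings (docid, start, end, level), sorted on (docid, start)."""
    index = defaultdict(list)
    for docid in sorted(docs):
        counter = 0

        def visit(elem, level):
            nonlocal counter
            counter += 1
            start = counter
            for child in elem:
                visit(child, level + 1)
            counter += 1
            index[elem.tag].append((docid, start, counter, level))

        visit(ET.fromstring(docs[docid]), 0)
    for postings in index.values():
        postings.sort()  # tuple order gives (docid, start) order
    return index


docs = {1: "<A><B/><B/></A>", 2: "<A><B/></A>"}
index = build_index(docs)
```

Each posting carries enough information (start, end, level) to check containment between two occurrences without going back to the document.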

13
ZigZag Join Example
  • Each list is sorted on (document id, start)
  • Merge-based algorithm for evaluating path
    expression queries

Evaluate path A//B
A list: A1 A2 A3 A4
B list: B1 B2 B3 B4 B5 B6 B7
Output: B1 B5 B6 B7
14
ZigZag Join Algorithm
  • Secondary index on start number in the inverted
    lists
  • Utilize this index on the posting lists to skip
    forward over postings in one input list which
    are guaranteed not to match any future postings
    on the other list
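
The skipping idea can be sketched with binary search standing in for the secondary index on start numbers. This is an assumption-laden illustration rather than the Niagara algorithm: it reports only the B postings that have some A ancestor, it assumes A and B are different tags, and the start and end numbers in the example are invented so that the output matches the A//B example in the slides.

```python
from bisect import bisect_right


def zigzag_descendants(a_list, b_list):
    """Return B postings contained in some A posting.

    Both lists hold (docid, start, end, level) tuples sorted on (docid, start).
    """
    a_keys = [(p[0], p[1]) for p in a_list]
    b_keys = [(p[0], p[1]) for p in b_list]
    out = []
    i = j = 0
    while i < len(a_list) and j < len(b_list):
        a_doc, a_start, a_end, _ = a_list[i]
        b = b_list[j]
        if (b[0], b[1]) <= (a_doc, a_start):
            # b starts before a, and every earlier A has already ended:
            # skip all B postings up to a's start.
            j = bisect_right(b_keys, (a_doc, a_start), j)
        elif b[0] == a_doc and b[1] < a_end:
            # Intervals nest properly, so starting inside a means lying inside a.
            out.append(b)
            j += 1
        else:
            # a ends before b: skip a and every A nested inside it,
            # or jump to b's document if a belongs to an earlier one.
            target = (a_doc, a_end) if b[0] == a_doc else (b[0], 0)
            i = bisect_right(a_keys, target, i + 1)
    return out


# Hypothetical positions: B1 lies inside A1; B5, B6, and B7 lie inside A4.
A = [(1, 1, 10, 1), (1, 20, 30, 1), (1, 40, 50, 1), (1, 60, 90, 1)]
B = [(1, 2, 3, 2), (1, 12, 13, 2), (1, 14, 15, 2), (1, 16, 17, 2),
     (1, 61, 62, 2), (1, 70, 71, 2), (1, 80, 81, 2)]
result = zigzag_descendants(A, B)
```

In this run B2, B3, and B4 are jumped over in one step once the A cursor reaches A2, and A2 and A3 are each passed with a single comparison, which is the saving over a plain merge.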

15
ZigZag Example
Evaluate path A//B
A list: A1 A2 A3 A4
B list: B1 B2 B3 B4 B5 B6 B7
Output: B1 B5 B6 B7
16
Structure Indexes
  • Inverted lists completely answer single step path
    expression queries
  • Structure indexes can be used to answer more
    complex queries efficiently
  • Can be viewed as a summary of the data graph

17
Data Model Directed Graphs
[Figure: example data graph with element nodes numbered 1 to 18. Labels shown: article, title, citedby, section, p, figure, img. Text shown: "Data on the Web", "Introduction", "Graph Representation"]
18
Incoming Paths: Backward Direction
Example query: /article//figure/title
[Figure: index graph for the example data. Labels shown: article, citedby, section, title, p, figure, img. Extents shown: 1; 2; 3,4; 5; 6,10; 7,11; 8; 9,12; 13; 14; 15,17; 16,18]
19
Outgoing Paths: Forward Direction
Example query: //section/figure
[Figure: index graph for the example data. Labels shown: article, citedby, section, title, p, figure, img. Extents shown: 1; 2; 3,4; 5; 6,10; 7,11; 8; 9,12; 13; 14; 15,17; 16,18]
20
Covering
  • Index I
  • Path expression P
  • I covers P if executing P on I and taking the union
    of the resulting extents gives the same answer as
    executing P on the data

21
Path Index Definition
  • Quotient graph obtained from a partition of the
    element nodes
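
The quotient-graph definition, together with the covering test, can be tried out on a small graph. The sketch below is illustrative only: the node numbers, block names, and function names are invented, each block is assumed to contain nodes with a single label, and paths are restricted to child steps.

```python
from collections import defaultdict


def quotient_graph(labels, edges, block_of):
    """Collapse each block of the partition into one index node with an extent."""
    extents = defaultdict(set)
    for node, block in block_of.items():
        extents[block].add(node)
    index_edges = {(block_of[u], block_of[v]) for u, v in edges}
    block_label = {block_of[n]: labels[n] for n in labels}
    return dict(extents), index_edges, block_label


def eval_path(labels, edges, path):
    """Evaluate a child-step path such as ['article', 'section', 'title']."""
    current = {n for n, tag in labels.items() if tag == path[0]}
    for step in path[1:]:
        current = {v for u, v in edges if u in current and labels[v] == step}
    return current


def eval_on_index(labels, edges, block_of, path):
    """Run the path on the quotient graph and take the union of the extents."""
    extents, index_edges, block_label = quotient_graph(labels, edges, block_of)
    result = set()
    for block in eval_path(block_label, index_edges, path):
        result |= extents[block]
    return result


labels = {1: "article", 2: "section", 3: "section",
          4: "title", 5: "title", 6: "title"}
edges = [(1, 2), (1, 3), (2, 4), (3, 5), (1, 6)]
path = ["article", "section", "title"]

fine = {1: "a", 2: "s", 3: "s", 4: "st", 5: "st", 6: "at"}
coarse = {1: "a", 2: "s", 3: "s", 4: "t", 5: "t", 6: "t"}

on_data = eval_path(labels, edges, path)
on_fine = eval_on_index(labels, edges, fine, path)
on_coarse = eval_on_index(labels, edges, coarse, path)
```

The finer partition covers the path because it returns the same nodes as the data graph. The coarser one also returns the article's own title (node 6), so it does not cover this path.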

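[Figure: quotient graph of the example data, with each index node annotated by its extent. Labels shown: citedby, title. Extents shown: 1; 2; 3,4; 5; 6,10; 7,11; 8; 9,12; 13; 14; 15,17; 16,18]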
22
Challenge in Path Indexing
  • What is the partition we begin with?
  • How do we group together nodes into equivalence
    classes?
  • Requirements
  • Generic: make the index as widely applicable as
    possible
  • Small: make the index as small as possible (even the
    data graph is an index according to our
    definition!)

23
Proposal For Index Specification
  • Pick only a subset of tags S to index
  • Attach priority to tree edges over non-tree edges
  • Specify which non-tree (ref) edges R to index
  • Exploit local similarity
  • Tree depth

24
Local Similarity
  • section/section/figure: forward look-ahead
    of 2
  • section/figure: forward look-ahead of 1
  • section: forward look-ahead of 0
  • The idea is to be able to say that we only care
    about path expressions whose forward look-ahead is
    at most k
  • In other words, group nodes based on local
    structure
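
One way to group nodes by local structure is to refine a tag-only partition k times, each round splitting nodes whose children fall into different groups. The sketch below takes that approach (a k-step bisimilarity refinement). It is an assumption about how such grouping could be done, not the method from the slides, and the function name and example graph are hypothetical.

```python
from collections import defaultdict


def partition_by_lookahead(labels, edges, k):
    """Group nodes that look the same up to k steps along outgoing edges."""
    children = defaultdict(list)
    for u, v in edges:
        children[u].append(v)
    block = dict(labels)  # look-ahead 0: nodes are grouped by tag alone
    for _ in range(k):
        # A node's new group is its tag plus the set of its children's groups.
        block = {n: (labels[n], frozenset(block[c] for c in children[n]))
                 for n in labels}
    return block


# Sections 1 and 4 both contain a section, but only the one under 1 has a figure.
labels = {1: "section", 2: "section", 3: "figure",
          4: "section", 5: "section"}
edges = [(1, 2), (2, 3), (4, 5)]

k0 = partition_by_lookahead(labels, edges, 0)
k1 = partition_by_lookahead(labels, edges, 1)
k2 = partition_by_lookahead(labels, edges, 2)
```

With a look-ahead of 1, sections 1 and 4 still share a group because both have a section child. A look-ahead of 2 separates them, since only node 1 matches section/section/figure.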

25
Integrating Structure Indexes and Inverted Lists
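[Figure: the structure index for the example data (labels: article, citedby, section, title, p, figure, img; extents: 1; 2; 3,4; 5; 6,10; 7,11; 8; 9,12; 13; 14; 15,17; 16,18) shown together with inverted-list entries for the term XML; one entry is marked "E2 .."]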
26
Using the Modified Lists
Example query: article/citedby/XML
[Figure: the index graph (labels: article, citedby, section, title, p, figure, img) and the inverted list for the term XML, with entries marked "E4?" and "E4"]
27
Putting it all together
  • A path expression can be split into multiple
    parts
  • A different algorithm can be applied to each part
  • Optimization problem
  • Need good statistics and cost models