BLAS: An Efficient XPath Processing System


1
BLAS: An Efficient XPath Processing System

Zhimin Song
Advanced Database System
Professor Dr. Mengchi Liu

2
Outline

Introduction
BLAS System
Experimental Results
Conclusions

<ProteinDatabase>
  <ProteinEntry>
    <protein>
      <name>cytochrome c validated</name>
      <classification>
        <superfamily>cytochrome c</superfamily>
      </classification>
    </protein>
    <reference>
      <refinfo>
        <authors>
          <author>Evans, M.J.</author>
        </authors>
        <year>2001</year>
        <title>The human somatic cytochrome c gene</title>
      </refinfo>
    </reference>
  </ProteinEntry>
</ProteinDatabase>

4
Introduction

XML has a complex, tree-like structure (nodes).
Languages for querying XML are based on path
navigation (XPath [1]).
Given node → child node (child axis)
Given node → descendant node (descendant axis)

5
Introduction (cont.)

Some techniques have already been proposed to
improve XPath processing. For example, D-labeling
handles descendant axis traversal efficiently.
What about complex queries that also include child
axis steps and branches?
For these, the paper proposes P-labeling, which
optimizes an important class of queries called
suffix path queries.

6
BLAS (Bi-LAbeling based System)

Basic definitions
The labeling scheme (Index generator)
Query translator

Basic definitions
BLAS: a system for efficiently processing complex
queries, based on D-labeling and P-labeling.
BLAS deals with a subset of XPath queries
consisting of
Child axis navigation ( / )
Descendant axis navigation ( // )
Branches ( [..] )
The evaluation of a path expression P, written [[P]],
returns the set of nodes in an XML tree T that
are reachable by P starting from the root of T.
Since P can be evaluated to retrieve a set of XML
nodes, we use "path expression" and "query"
interchangeably.
P ⊆ Q if and only if [[P]] ⊆ [[Q]].
P ∩ Q = ∅ if and only if [[P]] ∩ [[Q]] = ∅.

Basic definitions (cont.)
Suffix path expression: a path expression P that
optionally begins with a descendant axis
step (//), followed by zero or more child axis
steps (/).
Example: //protein/name
Another one: /proteinDatabase/proteinEntry/protein/name
SP(n): the unique simple path P from the root to
the node n.
So evaluating a suffix path expression Q means
finding all the nodes n such that SP(n) ⊆ Q.
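
To make the condition SP(n) ⊆ Q concrete, here is a minimal sketch that tests it directly on path strings, without any labeling. The function name `matches_suffix_path` is made up for this illustration and is not part of BLAS; the sketch assumes both paths use only plain tag names.

```python
def matches_suffix_path(simple_path, query):
    """Return True if the root-to-node path SP(n) is contained in suffix path Q."""
    sp = simple_path.strip("/").split("/")
    if query.startswith("//"):
        # "//a/b" matches any node whose simple path ends with a/b.
        tail = query[2:].split("/")
        return sp[-len(tail):] == tail
    # "/a/b" (no leading //) must match the whole simple path.
    return sp == query.strip("/").split("/")


sp_name = "/proteinDatabase/proteinEntry/protein/name"
print(matches_suffix_path(sp_name, "//protein/name"))
print(matches_suffix_path(sp_name, "/protein/name"))
```

P-labeling replaces this string comparison with a single integer range test per node.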

9
Architecture of BLAS
10

The labeling scheme (Index generator)
D-labeling scheme: a triplet <d1, d2, d3> for each XML
node, with n.d1 < n.d2 and m.d1 < m.d2 for nodes n and m.
m is a descendant of n if and only if n.d1 < m.d1
and n.d2 > m.d2.
m is a child of n if and only if m is a
descendant of n and n.d3 + 1 = m.d3.
Let d1 and d2 for a node n be the positions of its
start tag and end tag.
d3 is set to the level of n in the XML tree,
which is the length of the path from the root to
n.
→ The D-label is represented as <start, end, level>
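
The D-label rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: positions are counted in tags rather than characters or words, the root is given level 1, and the function names `d_labels`, `is_descendant` and `is_child` are invented here.

```python
import xml.etree.ElementTree as ET


def d_labels(xml_text):
    """Return (tag, start, end, level) tuples in document order."""
    root = ET.fromstring(xml_text)
    counter = 0
    labels = []

    def visit(elem, level):
        nonlocal counter
        counter += 1                      # position of the start tag
        entry = [elem.tag, counter, None, level]
        labels.append(entry)
        for child in elem:
            visit(child, level + 1)
        counter += 1                      # position of the end tag
        entry[2] = counter

    visit(root, 1)                        # assumption: the root has level 1
    return [tuple(e) for e in labels]


def is_descendant(n, m):
    """m is a descendant of n iff n.start < m.start and n.end > m.end."""
    return n[1] < m[1] and n[2] > m[2]


def is_child(n, m):
    """m is a child of n iff it is a descendant one level below n."""
    return is_descendant(n, m) and n[3] + 1 == m[3]


a, b, c, d = d_labels("<a><b><c/></b><d/></a>")
print(a, b, c, d)
print(is_descendant(a, c), is_child(a, c), is_child(b, c))
```

Both tests are constant-time comparisons, which is what lets a relational engine evaluate the descendant axis as a join condition.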

Example using D-labeling

[Figure: XML tree of the protein example with nodes proteinDatabase, proteinEntry, protein, reference, superfamily, refinfo, year, author and title, values "cytochrome c", "Evans, M.J." and "2001", and two // edges]
Select pDB.start, pDB.end, refinfo.start, refinfo.end
From pDB, refinfo
Where pDB.start < refinfo.start and pDB.end > refinfo.end
12

P-labeling Scheme
It is also important to implement child axis
navigation efficiently,
e.g. /proteinDatabase/proteinEntry/protein/name
Target: improve the evaluation of the child axis (/).
Focus: suffix path queries,
e.g. //protein/name

Assign each node a number <p1>, and each suffix
path an interval <p1, p2>, such that
For any two suffix paths Q1 and Q2, Q1 is
contained in Q2 if
Q2.p1 ≤ Q1.p1 and Q1.p2 ≤ Q2.p2
A node n is contained in the suffix path Q if
Q.p1 ≤ SP(n).p1 ≤ Q.p2.
Let Q be a suffix path query. Then
[[Q]] = { n | Q.p1 ≤ n.plabel ≤ Q.p2 }, where
n.plabel = SP(n).p1

P-labeling Construction (algorithm)
Suppose that there are n distinct tags
(t1, t2, ..., tn).
Assign "/" a ratio r0 and each tag ti a ratio ri
such that
r0 + r1 + r2 + ... + rn = 1.
Let ri = 1/(n+1).
Define the domain of the numbers in a P-label to
be the integers in [0, m-1], where m is chosen such
that
m ≥ (n+1)^h, where h is the length of the longest path
in an XML tree.
The algorithm is as follows:
Path // is assigned an interval (P-label) of <0,
m-1>.
Partition the interval <0, m-1> in tag order,
in proportion to ratio ri for each path //ti
and ratio r0 for child axis navigation.
This means we allocate the interval <0, m*r0 - 1>
to / and <pi, p(i+1) - 1> to each //ti such that
(p(i+1) - pi)/m = ri and p1/m = r0
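
The partitioning can be sketched as repeated division of the current interval into n+1 equal slots, refining from the last tag of the path backwards. This is a simplified reading of the construction with uniform ratios ri = 1/(n+1); the function names, the three-tag alphabet and m = 4^4 are assumptions chosen so the numbers stay small.

```python
def plabel(path, tags, m):
    """Return the interval (p1, p2) of a suffix path such as '//a/b' or '/a/b'."""
    n = len(tags)
    steps = [s for s in path.split("/") if s]
    lo, size = 0, m                          # "//" owns the whole domain <0, m-1>
    for tag in reversed(steps):              # refine from the last tag backwards
        size //= n + 1                       # each slot gets the ratio 1/(n+1)
        lo += (tags.index(tag) + 1) * size   # slot 0 is reserved for "/"
    if not path.startswith("//"):            # absolute path: take the "/" slot
        size //= n + 1
    assert size >= 1, "m is too small for this path length"
    return lo, lo + size - 1


def contains(outer, inner):
    """Interval containment, i.e. inner path is contained in outer path."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


tags = ["name", "protein", "entry"]          # hypothetical tag order
m = 4 ** 4                                   # (n+1)^h with one extra level of headroom
q = plabel("//protein/name", tags, m)
node = plabel("/entry/protein/name", tags, m)[0]   # n.plabel = SP(n).p1
print(q, node, q[0] <= node <= q[1])
```

With these labels, a suffix path query becomes a range selection on a single integer column.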

P-labeling Construction (Example)

Query: //protein/name, with m = 10^12, 99 tags and ri = 0.01
16

Query translator: translates an input XPath query
into standard SQL.
Query decomposition
Splits the query into a set of suffix path
queries and records the ancestor-descendant
relationships between their results.
SQL generation
Computes each suffix path query's P-label and
generates a corresponding subquery in SQL.
SQL composition
The subqueries are combined into a single SQL
query based on D-labeling and the
ancestor-descendant relationship.
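
The SQL generation and composition steps might look roughly like the sketch below, run against an in-memory SQLite database. The table layout (`nodes` with `plabel`, `d_start`, `d_end`, `d_level`), the helper names and the P-label values are all made up for illustration; the slides do not give the actual schema.

```python
import sqlite3


def suffix_path_subquery(p1, p2):
    """SQL generation: one range selection on the P-label per suffix path."""
    return (
        "SELECT d_start, d_end, d_level FROM nodes "
        f"WHERE plabel BETWEEN {p1} AND {p2}"
    )


def compose_descendant(anc_sql, desc_sql):
    """SQL composition: join two subqueries with the D-label descendant test."""
    return (
        f"SELECT d.d_start FROM ({anc_sql}) AS a, ({desc_sql}) AS d "
        "WHERE a.d_start < d.d_start AND a.d_end > d.d_end "
        "ORDER BY d.d_start"
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (tag TEXT, plabel INTEGER, "
    "d_start INTEGER, d_end INTEGER, d_level INTEGER)"
)
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?, ?, ?)",
    [   # hypothetical P-labels; D-labels follow the nesting of the elements
        ("ProteinDatabase", 10, 1, 10, 1),
        ("ProteinEntry", 20, 2, 9, 2),
        ("reference", 30, 3, 8, 3),
        ("refinfo", 40, 4, 7, 4),
        ("year", 50, 5, 6, 5),
    ],
)
# /ProteinDatabase//year, decomposed into /ProteinDatabase and //year
sql = compose_descendant(suffix_path_subquery(10, 10), suffix_path_subquery(50, 59))
rows = conn.execute(sql).fetchall()
print(rows)
```

Each suffix path costs one range scan, and only the // boundaries between subqueries cost a join.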

Split algorithm
D-elimination (query tree Q)

p//q → p and //q
Depth-first traversal
Split p//q into p and //q
Invokes B-elimination if there are branches in Q.
Otherwise, it evaluates Q using P-labels.
Join intermediate results by their D-labels
[Figure: query tree over proteinDatabase, proteinEntry, protein, reference, refinfo, superfamily, year, title and author, with values "cytochrome c", "2001" and "Evans, M.J.", two // edges, and subqueries Q1, Q2 and Q3]
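
For a branch-free path, the rule p//q → p and //q amounts to cutting the path at every descendant axis step. The sketch below works on path strings rather than on a query tree, which is a simplification; `d_eliminate` is a name chosen for this example.

```python
def d_eliminate(path):
    """Split a branch-free path at each // into suffix path queries.

    Consecutive pieces are related as ancestor and descendant, so their
    results are later joined on D-labels.
    """
    pieces = path.split("//")
    result = []
    if pieces[0]:                      # leading part before the first //
        result.append(pieces[0])
    result += ["//" + p for p in pieces[1:]]
    return result


print(d_eliminate("/ProteinDatabase/ProteinEntry//refinfo/year"))
```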
18

B-elimination (query tree Q1)

p[q1, q2, ..., qi]/r → p, //q1, //q2, ..., //qi, //r
19
B-elimination (cont.)
[Figure: query tree over proteinDatabase, proteinEntry, reference, refinfo, year and title, with value "2001", four // edges, and subqueries Q4, Q5, Q7, Q8 and Q9]
20

Push up algorithm: optimizes the branch
elimination (B-elimination).

Since p/qi and p/r are more specific than //qi
and //r,
split p[q1, q2, ..., qi]/r → p, p/q1, p/q2,
..., p/qi, p/r
[Figure: subquery trees after push up, including Q4 and Q5, each starting at proteinDatabase/proteinEntry and reaching reference, refinfo, year ("2001"), protein or title]
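
The difference between plain B-elimination and push up can be seen by applying both rules to the same branching point. The sketch treats p, the branches qi and r as path strings; the function names and the sample query are assumptions for illustration only.

```python
def b_eliminate(p, branches, r):
    """p[q1,...,qi]/r -> p, //q1, ..., //qi, //r"""
    return [p] + ["//" + q for q in branches] + ["//" + r]


def push_up(p, branches, r):
    """p[q1,...,qi]/r -> p, p/q1, ..., p/qi, p/r (prefix pushed into each part)"""
    return [p] + [f"{p}/{q}" for q in branches] + [f"{p}/{r}"]


p = "/ProteinDatabase/ProteinEntry"
print(b_eliminate(p, ["protein"], "reference"))
print(push_up(p, ["protein"], "reference"))
```

The pushed-up subqueries carry the full prefix, so each one selects a narrower P-label interval and passes fewer nodes to the D-label joins.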
21

Unfold algorithm: a further optimization of
descendant-axis elimination (D-elimination).
An example follows:
Q2: /ProteinDatabase/ProteinEntry/protein//superfamily="cytochrome c"
Q21: /ProteinDatabase/ProteinEntry/protein/classification/superfamily="cytochrome c"

p//q → p/r1/q, p/r2/q, ..., p/ri/q
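
Unfolding needs to know which paths r1, ..., ri can connect p to q. The sketch below assumes that information is available as a simple, non-recursive map from each tag to its possible child tags; both the map and the function name `unfold` are hypothetical.

```python
def unfold(p, q, schema):
    """Expand p//q into child-axis-only paths using a tag -> child tags map."""
    start = p.rstrip("/").split("/")[-1]   # last tag of p
    target = q.split("/")[0]               # first tag of q
    results = []

    def walk(tag, trail):
        for child in schema.get(tag, []):
            if child == target:
                results.append("/".join([p] + trail + [q]))
            walk(child, trail + [child])   # assumes the schema has no cycles

    walk(start, [])
    return results


schema = {
    "protein": ["name", "classification"],
    "classification": ["superfamily"],
}
print(unfold("/ProteinDatabase/ProteinEntry/protein", "superfamily", schema))
```

Each unfolded path is itself a suffix path query, so it can be answered from P-labels alone without a D-label join.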
22
Experimental Results

Data sets
Query sets
Suffix path queries
Path queries
XPath queries
Query engine: RDBMS or file system

23
Query Execution Time
1 = suffix path query, 2 = path query, 3 = XPath query
A = Auction, P = Protein, S = Shakespeare
[Figure: Query time for the Shakespeare, Protein and Auction data sets]
24
Scalability
The performance of D-labeling, Split and Push up
for the suffix path query
25
Conclusion

P-labeling scheme is proposed to evaluate suffix
path queries efficiently.
BLAS combines P-labeling and D-labeling to
evaluate XPath queries.
BLAS is more efficient because the queries
translated from XPath queries require
fewer disk accesses
fewer joins
Experiments show the effectiveness of BLAS

[1] J. Clark and S. DeRose. XML Path Language
(XPath), November 1999. http://www.w3.org/TR/xpath.
[13] D. DeHaan, D. Toman, M. Consens, and M. T.
Ozsu. A comprehensive XQuery to SQL translation
using dynamic interval encoding. In Proceedings
of SIGMOD, 2001.
[26] J.-K. Min, M.-J. Park, and C.-W. Chung.
XPRESS: A queriable compression for XML data. In
Proceedings of SIGMOD, 2003.

27
Thank you!
Questions?
