BLAS: An Efficient XPath Processing System

1
BLAS: An Efficient XPath Processing System
  • Zhimin Song
  • Advanced Database System
  • Professor Dr. Mengchi Liu

2
Outline
  • Introduction
  • BLAS System
  • Experimental Results
  • Conclusions

3
  • <ProteinDatabase>
  • <ProteinEntry>
  • <protein>
  • <name>cytochrome c validated</name>
  • <classification>
  • <superfamily>cytochrome c</superfamily>
  • </classification>
  • </protein>
  • <reference>
  • <refinfo>
  • <authors>
  • <author>Evans, M.J.</author>
  • </authors>
  • <year>2001</year>
  • <title>The human somatic cytochrome c gene</title>
  • </refinfo>
  • </reference>
  • </ProteinEntry>
  • </ProteinDatabase>

4
Introduction
  • XML has a complex, tree-like structure (nodes).
  • Languages for querying XML are based on path
    navigation (XPath [1]).
  • Given node → child node (child axis)
  • Given node → descendant node (descendant axis)

5
Introduction (cont.)
  • Some techniques have already been proposed to
    improve XPath processing. For example, D-labeling
    efficiently handles descendant-axis traversal.
  • What about complex queries that include the child
    axis and branches?
  • For this case, the paper proposes P-labeling,
    which optimizes an important class of queries
    called suffix path queries.

6
BLAS (Bi-LAbeling based System)
  • Basic definitions
  • The labeling scheme (index generator)
  • Query translator

7
  • Basic definitions
  • BLAS: a system for efficiently processing complex
    queries, based on D-labeling and P-labeling.
  • BLAS deals with a subset of XPath queries
    consisting of
  • Child axis navigation ( / )
  • Descendant axis navigation ( // )
  • Branches ( [...] )
  • The evaluation of a path expression P, written
    [[P]], returns the set of nodes in an XML tree T
    that are reachable by P starting from the root of T.
  • Since P can be evaluated to retrieve a set of XML
    nodes, we use "path expression" and "query"
    interchangeably.
  • P ⊆ Q if and only if [[P]] ⊆ [[Q]].
  • P = Q if and only if [[P]] = [[Q]].

8
  • Basic definitions (cont.)
  • Suffix path expression: a path expression P that
    optionally begins with a descendant axis
    step (//), followed by zero or more child axis
    steps (/).
  • Example: //protein/name
  • Another one: /proteinDatabase/proteinEntry/protein/name
  • SP(n): the unique simple path P from the root to
    the node n.
  • So evaluating a suffix path expression Q means
    finding all the nodes n such that SP(n) ⊆ Q.
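
The definition above can be sketched directly, without any labeling, by computing SP(n) for every node and comparing it with the query as a string. This is only an illustrative baseline, not the BLAS method; the names `DOC`, `simple_paths`, `matches` and `evaluate` are hypothetical, and the sample document is a cut-down version of the protein example.

```python
import xml.etree.ElementTree as ET

# Small sample document (assumption: a reduced form of the protein example).
DOC = """<ProteinDatabase><ProteinEntry>
<protein><name>cytochrome c</name></protein>
<reference><refinfo><year>2001</year></refinfo></reference>
</ProteinEntry></ProteinDatabase>"""


def simple_paths(root):
    """Yield SP(n), the root-to-node path, for every element n."""
    stack = [("/" + root.tag, root)]
    while stack:
        path, node = stack.pop()
        yield path
        for child in node:
            stack.append((path + "/" + child.tag, child))


def matches(sp, query):
    """True if the simple path sp is contained in the suffix path query."""
    if query == "//":
        return True                      # '//' alone selects every node
    if query.startswith("//"):
        return sp.endswith(query[1:])    # compare against '/tag/.../tag'
    return sp == query                   # absolute path: exact match


def evaluate(root, query):
    """Return the sorted simple paths of all nodes selected by the query."""
    return sorted(sp for sp in simple_paths(root) if matches(sp, query))


root = ET.fromstring(DOC)
print(evaluate(root, "//protein/name"))
```

The point of P-labeling, introduced later in the slides, is to replace this string comparison with a numeric range test.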

9
Architecture of BLAS

10
  • The labeling scheme (index generator)
  • D-labeling scheme: each XML node is assigned a
    triplet <d1, d2, d3>; for nodes n (n.d1 < n.d2)
    and m (m.d1 < m.d2):
  • m is a descendant of n if and only if n.d1 < m.d1
    and n.d2 > m.d2.
  • m is a child of n if and only if m is a
    descendant of n and n.d3 + 1 = m.d3.
  • Let d1 and d2 for a node n be the positions of its
    start tag and end tag.
  • d3 is set to the level of n in the XML tree,
    i.e. the length of the path from the root to n.
  • → A D-label is represented as <start, end, level>.
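
A minimal sketch of how such <start, end, level> labels might be assigned and tested, assuming a running counter that advances once at each start tag and once at each end tag, and a root level of 1 (both assumptions; the slides do not fix the numbering). The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def d_labels(root):
    """Return {element: (start, end, level)} for every element under root."""
    labels = {}
    counter = 0

    def visit(node, level):
        nonlocal counter
        counter += 1              # position of the start tag
        start = counter
        for child in node:
            visit(child, level + 1)
        counter += 1              # position of the end tag
        labels[node] = (start, counter, level)

    visit(root, 1)
    return labels


def is_descendant(n, m):
    """m is a descendant of n iff n.start < m.start and n.end > m.end."""
    return n[0] < m[0] and n[1] > m[1]


def is_child(n, m):
    """m is a child of n iff it is a descendant and one level deeper."""
    return is_descendant(n, m) and n[2] + 1 == m[2]


root = ET.fromstring("<a><b><c/></b><d/></a>")
labels = d_labels(root)
by_tag = {node.tag: label for node, label in labels.items()}
print(by_tag)
```

With this numbering, both tests need only the two labels being compared, which is what lets a relational engine evaluate the descendant axis as a join.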

11
  • Example using D-labeling
  • (Figure: XML tree with nodes proteinDatabase, proteinEntry, protein, reference, superfamily, refinfo, year, author and title, and values "cytochrome c", "Evans, M.J." and "2001".)

SELECT pDB.start, pDB.end, refinfo.start, refinfo.end
FROM pDB, refinfo
WHERE pDB.start < refinfo.start AND pDB.end > refinfo.end
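
The descendant join from this slide can be tried out on an in-memory SQLite database. This is a sketch under assumptions: the label values are invented for illustration, and the column names are double-quoted so that `end` is always read as an identifier rather than as the SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE pDB ("start" INTEGER, "end" INTEGER)')
conn.execute('CREATE TABLE refinfo ("start" INTEGER, "end" INTEGER)')

# Made-up D-labels: one proteinDatabase node and two refinfo nodes,
# only the first of which lies inside the proteinDatabase interval.
conn.execute("INSERT INTO pDB VALUES (1, 100)")
conn.executemany("INSERT INTO refinfo VALUES (?, ?)", [(10, 20), (150, 160)])

rows = conn.execute(
    '''
    SELECT pDB."start", pDB."end", refinfo."start", refinfo."end"
    FROM pDB, refinfo
    WHERE pDB."start" < refinfo."start" AND pDB."end" > refinfo."end"
    '''
).fetchall()
print(rows)
```
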
12
  • P-labeling scheme
  • It is also important to implement child axis
    navigation efficiently.
  • e.g. /proteinDatabase/proteinEntry/protein/name
  • Target: improve the evaluation of the child axis ( / )
  • Focus on suffix path queries
  • e.g. //protein/name

13
  • Assign each node a number <p1>, and each suffix
    path an interval <p1, p2>, such that:
  • For any two suffix paths Q1 and Q2, Q1 is
    contained in Q2 if
  • Q2.p1 ≤ Q1.p1 and Q1.p2 ≤ Q2.p2
  • A node n is contained in the suffix path Q if
  • Q.p1 ≤ SP(n).p1 ≤ Q.p2.
  • Let Q be a suffix path query. Then
  • [[Q]] = { n | Q.p1 ≤ n.plabel ≤ Q.p2 }, where
    n.plabel = SP(n).p1

14
  • P-labeling construction (algorithm)
  • Suppose that there are n distinct tags
    (t1, t2, ..., tn).
  • Assign "/" a ratio r0 and each tag ti a ratio ri
    such that
  • r0 + r1 + r2 + ... + rn = 1.
  • Let ri = 1/(n+1).
  • Define the domain of the numbers in a P-label to
    be the integers in [0, m-1], where m is chosen such
    that
  • m ≥ (n+1)^h, where h is the length of the longest
    path in the XML tree.
  • The algorithm is as follows:
  • Path // is assigned the interval (P-label) <0,
    m-1>.
  • Partition the interval <0, m-1> in tag order, in
    proportion to the ratio ri for each path //ti
    and to the ratio r0 for the child axis "/".
  • That is, allocate the interval <0, m·r0 - 1>
    to / and <p_i, p_(i+1) - 1> to each ti, such that
    (p_(i+1) - p_i)/m = ri and p_1/m = r0.
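
The construction can be sketched with exact integer arithmetic. The sketch makes three assumptions that the slides do not spell out: the partitioning step is applied recursively inside each tag's interval, reading the path from its last tag back to its first; a path starting with a single "/" finally takes the first ("/") slice of its interval; and m is chosen as (n+1)^(h+1) so that even that last slice is non-empty. The name `p_label` and the four-tag alphabet are hypothetical.

```python
def p_label(path, tags, m):
    """Return the interval (p1, p2) of a suffix path such as '//protein/name'.

    tags is the ordered list of distinct tags; slot 0 of every partition is
    reserved for '/', and slot i (1-based) belongs to tags[i - 1].
    """
    n = len(tags)
    steps = [s for s in path.split("/") if s]
    lo, size = 0, m                          # start from '//' = <0, m-1>
    for tag in reversed(steps):              # last tag is partitioned first
        size //= n + 1                       # each slot gets ratio 1/(n+1)
        lo += (tags.index(tag) + 1) * size
    if not path.startswith("//"):            # rooted path: take the '/' slot
        size //= n + 1
    return lo, lo + size - 1


TAGS = ["ProteinDatabase", "ProteinEntry", "protein", "name"]
M = (len(TAGS) + 1) ** 5                     # h = 4, plus one level for '/'

query = p_label("//protein/name", TAGS, M)
node = p_label("/ProteinDatabase/ProteinEntry/protein/name", TAGS, M)[0]
other = p_label("/ProteinDatabase/ProteinEntry/name", TAGS, M)[0]

print(query, node, other)
print(query[0] <= node <= query[1])          # node is selected by the query
print(query[0] <= other <= query[1])         # this one is not
```

A node's P-label is the p1 of its rooted path, so checking whether a node answers a suffix path query reduces to one range comparison.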

15
  • P-labeling construction (example)

Query: //protein/name, m = 10^12, 99 tags, ri = 0.01

16
  • Query translator: translates an input XPath query
    into standard SQL.
  • Query decomposition
  • Splits the query into a set of suffix path
    queries and records the ancestor-descendant
    relationships between their results.
  • SQL generation
  • Computes each suffix path query's P-label and
    generates a corresponding subquery in SQL.
  • SQL composition
  • The subqueries are combined into a single SQL
    query based on D-labeling and the
    ancestor-descendant relationships.
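
The generation and composition steps might look roughly like the sketch below. It assumes a single hypothetical table `nodes(plabel, start, end, level)` and takes the P-label intervals as given inputs; the actual schema and SQL produced by BLAS are not shown in the slides, so every name and value here is illustrative.

```python
import sqlite3


def subquery_sql(interval):
    """SQL generation: one range selection per suffix path query."""
    p1, p2 = interval
    return f"SELECT * FROM nodes WHERE plabel >= {p1} AND plabel <= {p2}"


def compose_sql(ancestor, descendant):
    """SQL composition: join two subqueries on their D-labels."""
    return (
        "SELECT d.* FROM "
        f"({subquery_sql(ancestor)}) AS a, "
        f"({subquery_sql(descendant)}) AS d "
        'WHERE a."start" < d."start" AND a."end" > d."end"'
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE nodes (plabel INTEGER, "start" INTEGER, "end" INTEGER, level INTEGER)'
)
# Made-up rows: one ancestor candidate and two descendant candidates,
# only the first of which is nested inside the ancestor's interval.
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?, ?)",
    [(10, 1, 20, 2), (50, 5, 6, 3), (50, 30, 31, 3)],
)

sql = compose_sql((10, 10), (50, 59))
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```
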

17
  • Split algorithm
  • D-elimination (query tree Q): p//q → p and //q
  • Depth-first traversal of the query tree.
  • Split p//q into p and //q.
  • Invoke B-elimination if there are branches in Q;
    otherwise, evaluate Q using P-labels.
  • Join intermediate results by their D-labels.
  • (Figure: query tree with nodes proteinDatabase, proteinEntry, protein, reference, refinfo, superfamily, year, author and title, and values "cytochrome c", "2001" and "Evans, M.J.", showing the subqueries Q1, Q2 and Q3 around its // edges.)

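
The rule p//q → p and //q can be sketched for branch-free path strings as follows. The function name `d_eliminate` is hypothetical, and queries with branches are assumed to be handled separately.

```python
def d_eliminate(path):
    """Split a branch-free path at every interior '//' into suffix paths.

    Consecutive pieces stand in an ancestor-descendant relationship, so
    their results would later be joined on D-labels.
    """
    parts = []
    start = 0
    while True:
        # Look for the next '//' strictly after the current piece's start.
        cut = path.find("//", start + 1)
        if cut == -1:
            parts.append(path[start:])
            return parts
        parts.append(path[start:cut])
        start = cut


print(d_eliminate("/ProteinDatabase/ProteinEntry/protein//superfamily"))
```

Each returned piece is itself a suffix path, so it can be answered with a single P-label range selection.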
18
  • B-elimination (query tree Q1)

p[q1, q2, ..., qi]/r → p, //q1, //q2, ..., //qi, //r

19
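B-elimination (cont.)
  • (Figure: query tree decomposed by B-elimination into the subqueries Q4, Q5, Q7, Q8 and Q9; nodes shown are proteinDatabase, proteinEntry, reference, refinfo, year, title and the value "2001", with // edges on the split-off branches.)
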
20
  • Push up algorithm: optimizes branch
    elimination (B-elimination).
  • Since p/qi and p/r are more specific than //qi
    and //r, split
    p[q1, q2, ..., qi]/r → p, p/q1, p/q2, ..., p/qi, p/r
  • (Figure: the subqueries Q4 and Q5 and their branches after push up, each carrying its full prefix from proteinDatabase/proteinEntry down through reference, refinfo, year = "2001", protein and title.)
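
The difference between plain B-elimination and push up can be seen by applying both rewrite rules to the same branching query. This is a sketch on plain strings with hypothetical function names; it assumes p, the branches and r are themselves branch-free child-axis paths.

```python
def b_eliminate(p, branches, r):
    """p[q1,...,qi]/r -> p, //q1, ..., //qi, //r"""
    return [p] + ["//" + q for q in branches] + ["//" + r]


def push_up(p, branches, r):
    """p[q1,...,qi]/r -> p, p/q1, ..., p/qi, p/r (more specific subqueries)"""
    return [p] + [p + "/" + q for q in branches] + [p + "/" + r]


P = "/proteinDatabase/proteinEntry/reference/refinfo"

print(b_eliminate(P, ["year"], "title"))
print(push_up(P, ["year"], "title"))
```

The pushed-up subqueries are rooted paths, so each one maps to a much narrower P-label interval than its `//` counterpart and selects fewer candidate nodes before the D-label join.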
21
  • Unfold algorithm: a further optimization of
    descendant-axis elimination (D-elimination).
  • An example:
  • Q2: /ProteinDatabase/ProteinEntry/protein//superfamily = "cytochrome c"
  • Q21: /ProteinDatabase/ProteinEntry/protein/classification/superfamily = "cytochrome c"

p//q → p/r1/q, p/r2/q, ..., p/ri/q
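
One way to picture unfolding is to replace `//` with the concrete intermediate paths that are known to occur. The sketch below assumes the set of distinct root-to-node paths is available (for example from a schema or a path summary); where BLAS actually obtains this information is not stated in the slides, and the names `unfold` and `KNOWN_PATHS` are hypothetical.

```python
def unfold(p, q, known_paths):
    """Rewrite p//q as the known simple paths of the form p/.../q."""
    prefix, suffix = p + "/", "/" + q
    return sorted(
        sp
        for sp in known_paths
        if sp.startswith(prefix)
        and sp.endswith(suffix)
        # Guard against the prefix and suffix overlapping inside sp.
        and len(sp) >= len(prefix) + len(q)
    )


KNOWN_PATHS = {
    "/ProteinDatabase",
    "/ProteinDatabase/ProteinEntry",
    "/ProteinDatabase/ProteinEntry/protein",
    "/ProteinDatabase/ProteinEntry/protein/name",
    "/ProteinDatabase/ProteinEntry/protein/classification",
    "/ProteinDatabase/ProteinEntry/protein/classification/superfamily",
}

print(unfold("/ProteinDatabase/ProteinEntry/protein", "superfamily", KNOWN_PATHS))
```

Every unfolded path is a rooted simple path, so the descendant step disappears and no D-label join is needed for it.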
22
Experimental Results
  • Data sets
  • Query sets
  • Suffix path queries
  • Path queries
  • XPath queries
  • Query engine: RDBMS or file system

23
Query Execution Time
  • (Figure: query time for the Shakespeare, Protein and Auction data sets.)
  • Legend: 1 = suffix path query, 2 = path query, 3 = XPath query; A = Auction, P = Protein, S = Shakespeare.

24
Scalability
  • (Figure: the performance of D-labeling, Split and Push up for the suffix path query.)

25
Conclusion
  • P-labeling scheme is proposed to evaluate suffix
    path queries efficiently.
  • BLAS combines P-labeling and D-labeling to
    evaluate XPath queries.
  • BLAS is more efficient because the queries
    translated from XPath queries require
  • fewer disk accesses
  • fewer joins
  • Experiments show the effectiveness of BLAS

26
  • [1] J. Clark and S. DeRose. XML Path Language
    (XPath), November 1999. http://www.w3.org/TR/xpath.
  • [13] D. DeHaan, D. Toman, M. Consens, and M. T.
    Ozsu. A comprehensive XQuery to SQL translation
    using dynamic interval encoding. In Proceedings
    of SIGMOD, 2003.
  • [26] J.-K. Min, M.-J. Park, and C.-W. Chung.
    XPRESS: A queriable compression for XML data. In
    Proceedings of SIGMOD, 2003.

27
Thank you!
Questions?