Title: Transforming XPath Queries for Bottom-Up Query Processing
1 Transforming XPath Queries for Bottom-Up Query Processing
- Yoshiharu Ishikawa
- Takaaki Nagai
- Hiroyuki Kitagawa
- University of Tsukuba
- {ishikawa,kitagawa}@is.tsukuba.ac.jp
2 Presentation Overview
- Background
- Motivation and Our Approach
- The Proximal Nodes Model
- Query Translation
- Translation Example
- Related Work
- Conclusions and Future Work
3 Background
- XML: content-description language on the Web
- XPath
- pattern-based query language for XML
- extracts XML nodes based on the specified pattern
- has navigational semantics
- XSLT uses XPath for the node specification
- XQuery also uses XPath
4 XML Example
<itemlist>
  <item category="audio equipment">
    <catalog-info>
      <type>CD player</type>
      <manufacturer>Star Electronics</manufacturer>
      <catalog-no>CDP-R55N</catalog-no>
    </catalog-info>
    <sales-info>
      <prod-year>2001</prod-year>
      <price>125.00</price>
    </sales-info>
  </item>
  ...
</itemlist>
5 XPath Query
- Sample query Q: retrieve prices of CD players
/itemlist/item[@category = "audio equipment"][catalog-info/type = "CD player"]/sales-info/price
- XPath sentence
- contains location steps separated by "/"
- a location step has the format axis::node_test[predicate]...[predicate]
- location steps can be abbreviated
- e.g., /descendant::foo → //foo, /attribute::bar → @bar
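To make the navigational reading of query Q concrete, the following is a minimal sketch that walks the sample document from the root, one location step at a time. It uses Python's standard xml.etree.ElementTree; the function name query_q and the second item in the document are hypothetical additions for illustration, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# Sample document; the second <item> is a hypothetical addition for contrast.
DOC = """
<itemlist>
  <item category="audio equipment">
    <catalog-info>
      <type>CD player</type>
      <manufacturer>Star Electronics</manufacturer>
      <catalog-no>CDP-R55N</catalog-no>
    </catalog-info>
    <sales-info>
      <prod-year>2001</prod-year>
      <price>125.00</price>
    </sales-info>
  </item>
  <item category="audio equipment">
    <catalog-info><type>Radio</type></catalog-info>
    <sales-info><price>40.00</price></sales-info>
  </item>
</itemlist>
"""


def query_q(root):
    """Evaluate query Q step by step, starting from the root element."""
    if root.tag != "itemlist":                 # step: /itemlist
        return []
    prices = []
    for item in root.findall("item"):          # step: /item
        # predicate: [@category = "audio equipment"]
        if item.get("category") != "audio equipment":
            continue
        # predicate: [catalog-info/type = "CD player"]
        types = [t.text for t in item.findall("catalog-info/type")]
        if "CD player" not in types:
            continue
        # steps: /sales-info/price
        prices.extend(p.text for p in item.findall("sales-info/price"))
    return prices


prices = query_q(ET.fromstring(DOC))
print(prices)
```

Every item element is visited and tested here, including those that cannot match. That cost is what motivates the bottom-up alternative.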
7 XPath Semantics
- XPath assumes top-down query processing
- Not efficient for large XML databases
- Bottom-up processing is better in some cases
query: /article/authors[author = "Miller"]
[Figure: document trees of article, authors, and author nodes with leaf values "Smith", "White", "Chen", and "Miller", contrasting top-down and bottom-up traversal]
8 Bottom-Up Query Processing
- We can process the example query when
- we can determine the specified leaf elements (i.e., "Miller") with the help of an index, and
- we can select the parent for a specific author node.
- We do not need to access all the authors/author elements
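The two conditions above can be sketched as follows: a value index locates the matching leaf, and parent pointers lead upward from it. This is only an illustration under simple assumptions. The Node class, build_value_index, and authors_having are hypothetical names, and the index is a plain in-memory dictionary rather than any particular index structure.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based equality: each tree node is unique
class Node:
    name: str
    text: str = ""
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list = field(default_factory=list)

    def add(self, name, text=""):
        child = Node(name, text, parent=self)
        self.children.append(child)
        return child


def build_value_index(roots):
    """Map (element name, text value) to the nodes carrying that value."""
    index = defaultdict(list)
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node.text:
            index[(node.name, node.text)].append(node)
        stack.extend(node.children)
    return index


def authors_having(index, value):
    """Bottom-up evaluation of /article/authors[author = value]."""
    result = []
    for author in index.get(("author", value), []):
        authors = author.parent                  # climb one level
        if authors is None or authors.name != "authors":
            continue
        article = authors.parent                 # check the remaining path
        is_root_article = (
            article is not None
            and article.name == "article"
            and article.parent is None
        )
        if is_root_article and authors not in result:
            result.append(authors)
    return result


# Two small documents, loosely following the names on the previous slide.
article1 = Node("article")
authors1 = article1.add("authors")
authors1.add("author", "Smith")
authors1.add("author", "White")
article2 = Node("article")
authors2 = article2.add("authors")
authors2.add("author", "Chen")
authors2.add("author", "Miller")

index = build_value_index([article1, article2])
hits = authors_having(index, "Miller")
print([node.name for node in hits])
```

Once the index exists, the query touches only the one "Miller" leaf and its two ancestors; the other author elements are never read.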
9 Our Objective and Approach
- Our Objective
- Efficient bottom-up processing of XPath queries with the help of index structures
- Our Approach
- Use of the proximal nodes model as the underlying retrieval model
- The model enables bottom-up query evaluation
- Development of transformation rules from XPath queries to proximal nodes expressions
11 The Proximal Nodes Model (1)
- Proposed by Navarro and Baeza-Yates [7] as a structured document retrieval model
- Uses a bottom-up query processing approach
- XML data can be treated as nested nodes
- a node corresponds to an element or attribute in XML
- each node has an associated text region (called the segment); segments can take nested structure
- Expressive power and efficiency are well-balanced
- evaluation cost is almost O(n), where n is the number of nodes
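As an illustration of nested segments, the sketch below assigns each element a (start, end) character range over the concatenated text of the document in a single traversal. Using character offsets and half-open ranges is an assumption made for this example; the function names segments and contains are hypothetical, and attributes are omitted for brevity.

```python
import xml.etree.ElementTree as ET


def segments(root):
    """Return (name, start, end) for every element, in post-order.

    Offsets refer to the concatenated text content of the document,
    so the segment of a parent always encloses those of its children.
    """
    out = []
    pos = 0

    def visit(elem):
        nonlocal pos
        start = pos
        pos += len(elem.text or "")        # text before the first child
        for child in elem:
            visit(child)
            pos += len(child.tail or "")   # text following that child
        out.append((elem.tag, start, pos))

    visit(root)
    return out


def contains(outer, inner):
    """True if the segment of `outer` encloses the segment of `inner`."""
    return outer[1] <= inner[1] and inner[2] <= outer[2]


doc = ET.fromstring("<item><type>CD player</type><price>125.00</price></item>")
segs = segments(doc)
for name, start, end in segs:
    print(name, start, end)
```

Each element is visited once, so the segments of all n nodes are produced in a single pass; structural relationships such as containment then reduce to comparisons of two integers.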
12 The Proximal Nodes Model (2)
- The model consists of three components
- Text pattern matching language
- specifies pattern matching conditions
- implementation dependent
- returns a set of the matched nodes
- example: "ABC Corporation"
- Retrieval operators based on document structures
- return a set of nodes for a given element or attribute name
- example: chapter, price
- Operators to integrate partial retrieval results
- calculate the result node set from the given node sets
- efficient computation based on segment relationships
13 Proximal Nodes Operators
P and Q are nodes with associated segments
14 Example of Proximal Nodes Expression
- Example expression of proximal nodes model
item with (type same "CD player")
- Query processing steps
- 1. determine the node sets that correspond to the elements "item" and "type" using indexes
- 2. determine the node set that corresponds to the pattern "CD player" using an index
- 3. compute the result of "same" operator
- 4. compute the result of "with" operator
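The four processing steps can be traced with a small sketch. The operator semantics used here are an assumed reading, since the operator table itself is not reproduced in this transcript: "P with Q" keeps the nodes of P whose segment encloses the segment of some node in Q, and "P same Q" keeps the nodes of P whose segment coincides with that of some node in Q. The segments are hand-assigned and the dictionaries stand in for real indexes; all names are hypothetical.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    start: int   # segment start
    end: int     # segment end


def contains(outer, inner):
    return outer.start <= inner.start and inner.end <= outer.end


def op_with(P, Q):
    """Nodes of P that enclose at least one node of Q (assumed semantics)."""
    return {p for p in P if any(contains(p, q) for q in Q)}


def op_same(P, Q):
    """Nodes of P whose segment equals that of a node of Q (assumed semantics)."""
    spans = {(q.start, q.end) for q in Q}
    return {p for p in P if (p.start, p.end) in spans}


# Hand-built stand-ins for the indexes; segments are illustrative only.
element_index = {
    "item": {Node("item", 0, 15), Node("item", 15, 25)},
    "type": {Node("type", 0, 9), Node("type", 15, 20)},
}
text_index = {
    "CD player": {Node("#text", 0, 9)},
}

# Step 1: node sets for the elements "item" and "type".
items = element_index["item"]
types = element_index["type"]
# Step 2: node set for the pattern "CD player".
matches = text_index["CD player"]
# Step 3: the "same" operator.
cd_types = op_same(types, matches)
# Step 4: the "with" operator.
answer = op_with(items, cd_types)
print(answer)
```

The nested loops in op_with are written for readability and take time proportional to |P| times |Q|. The efficiency claimed for the model depends on exploiting the ordering and nesting of segments, which this sketch does not attempt.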
16 Translation Rules (1)
- Supports major XPath patterns
- Based on the XPath semantic description by Wadler [10]
- Use of denotational semantics
17 Translation Rules (2)
18 Translation Rules (3)
19 Auxiliary Functions
20 Simplification Using the Knowledge of Document Structure
- If we know the DTD of the target XML, we can derive simpler translation results
22 Translation Example
- Original query Q
/itemlist/item[@category = "audio equipment"][catalog-info/type = "CD player"]/sales-info/price
- Translation result
- t1 = item with (item with (category same "audio equipment"))
- t2 = catalog-info child t1
- t3 = t1 with (t1 with (((type child t2) child t2) same "CD player"))
- t4 = sales-info child t3
- ans = (((price child t4) child t4) child t3) child itemlist
23 Simplification of Query Plan (1)
- The translated result contains multiple applications of an operator
- We can delete redundant operators considering the operator semantics
- Example
- t1 = item with (item with (category same "audio equipment")) → item with (category same "audio equipment")
24 Simplification of Query Plan (2)
- If we can use the DTD information, we can further simplify the expressions
- Example
- t3 = t1 with ((type child (catalog-info child t1)) same "CD player") → t1 with ((type in t1) same "CD player")
- Simplified query plan for query Q
- t1 = item with (category same "audio equipment")
- ans = price in (t1 with ((type in t1) same "CD player"))
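The simplified plan can be checked on a small document with the sketch below. Several assumptions are made and should be read as such: segments are positions from a traversal counter rather than text offsets, attributes are indexed as nodes of their own, a text pattern matches the nodes whose value equals it exactly, "in" is the converse of "with", and containment is non-strict. The second and third items in the document and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    start: int
    end: int
    value: str


def index_document(root):
    """Build name and value indexes with nested, unique segments."""
    by_name, by_value = defaultdict(set), defaultdict(set)
    counter = 0

    def add(name, start, end, value):
        node = Node(name, start, end, value)
        by_name[name].add(node)
        if value:
            by_value[value].add(node)

    def visit(elem):
        nonlocal counter
        start = counter
        counter += 1
        for attr, val in elem.attrib.items():   # attributes become nodes
            add(attr, counter, counter + 1, val)
            counter += 2
        for child in elem:
            visit(child)
        add(elem.tag, start, counter, (elem.text or "").strip())
        counter += 1

    visit(root)
    return by_name, by_value


def contains(outer, inner):
    return outer.start <= inner.start and inner.end <= outer.end


def op_with(P, Q):
    return {p for p in P if any(contains(p, q) for q in Q)}


def op_in(P, Q):
    return {p for p in P if any(contains(q, p) for q in Q)}


def op_same(P, Q):
    spans = {(q.start, q.end) for q in Q}
    return {p for p in P if (p.start, p.end) in spans}


DOC = """
<itemlist>
  <item category="audio equipment">
    <catalog-info><type>CD player</type></catalog-info>
    <sales-info><price>125.00</price></sales-info>
  </item>
  <item category="audio equipment">
    <catalog-info><type>Radio</type></catalog-info>
    <sales-info><price>40.00</price></sales-info>
  </item>
  <item category="video equipment">
    <catalog-info><type>CD player</type></catalog-info>
    <sales-info><price>99.00</price></sales-info>
  </item>
</itemlist>
"""

by_name, by_value = index_document(ET.fromstring(DOC))

# t1 = item with (category same "audio equipment")
t1 = op_with(by_name["item"],
             op_same(by_name["category"], by_value["audio equipment"]))
# ans = price in (t1 with ((type in t1) same "CD player"))
ans = op_in(by_name["price"],
            op_with(t1, op_same(op_in(by_name["type"], t1),
                                by_value["CD player"])))
print(sorted(node.value for node in ans))
```

Under these assumptions, applying "with" a second time, as in item with t1, returns t1 unchanged because every segment encloses itself. This matches the redundancy removed in the first simplification step.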
26 Related Work
- Translation of XQL queries into proximal nodes expressions (Baeza-Yates and Navarro [2])
- Rewriting techniques for XQL queries (Wood [13])
- Use of document structure for the query optimization [3,11,12,13]
- Optimization of regular path expressions in the context of semistructured DBs [4,8]
28 Conclusions and Future Work
- Conclusions
- Bottom-up processing approach for XPath queries
- Support of major XPath query patterns
- Translation to proximal nodes expressions
- Simplification and optimization techniques
- Future work
- Support of more complete XPath semantics
- Application of hybrid approach (top-down and
bottom-up)