CoXML: A Cooperative XML Query Answering System

Learn more at: http://web.cs.ucla.edu
1
CoXML: A Cooperative XML Query Answering System
  • Shaorong Liu and Wesley W. Chu
  • APWeb/WAIM 2007

2
Motivation
  • XML has become the standard format for
    information representation and data exchange
  • XML schema is usually very complex
  • E.g., the XML schema for the IEEE Computer
    Society publications contains about 170 distinct
    tags and more than 1000 distinct paths
  • It is often unrealistic for users to fully
    understand a schema before asking queries
  • Exact query answering is inadequate and
    approximate query answering is more desirable!

3
Our Contribution: CoXML
4
Roadmap
  • Introduction
  • Background
  • CoXML
  • Related Work
  • Conclusion

5
XML Query Relaxation Types
  • Value relaxation: enlarging a value condition's
    search scope
  • Node relabel: changing the label of a node to a
    similar or more general label based on domain
    knowledge

[1] Tree Pattern Relaxation (S. Amer-Yahia et al., 2000)
6
XML Query Relaxation Types
  • Edge generalization: relaxing a / (parent-child)
    edge to a // (ancestor-descendant) edge
  • Node deletion: dropping a node from a query tree
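
To make the structural relaxation types concrete, here is a minimal Python sketch. The `QNode` representation and the function names are hypothetical, not CoXML's own data structures. It also assumes that when an internal node is deleted, its children are re-attached to the deleted node's parent through a // edge.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class QNode:
    label: str
    parent: Optional[str]  # id of the parent node, None for the root
    axis: str = "/"        # edge from the parent: "/" (child) or "//" (descendant)


Query = Dict[str, QNode]   # node id -> node


def relabel(q: Query, nid: str, new_label: str) -> Query:
    """Node relabel: give the node a similar or more general label."""
    out = dict(q)
    out[nid] = replace(q[nid], label=new_label)
    return out


def generalize_edge(q: Query, nid: str) -> Query:
    """Edge generalization: relax the edge above the node from / to //."""
    out = dict(q)
    out[nid] = replace(q[nid], axis="//")
    return out


def delete_node(q: Query, nid: str) -> Query:
    """Node deletion: drop the node; its children move up (assumed: via //)."""
    gone = q[nid]
    if gone.parent is None:
        raise ValueError("this sketch does not delete the root")
    out: Query = {}
    for key, node in q.items():
        if key == nid:
            continue
        if node.parent == nid:
            node = replace(node, parent=gone.parent, axis="//")
        out[key] = node
    return out


# Toy query: article[title][section/paragraph]
q = {
    "a": QNode("article", None),
    "t": QNode("title", "a"),
    "s": QNode("section", "a"),
    "p": QNode("paragraph", "s"),
}
relaxed = delete_node(generalize_edge(q, "t"), "s")
```

Each function returns a new query and leaves the input untouched, so several relaxed variants of the same query can be derived side by side.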

7
XML Relaxation Properties
  • Definition
  • Relaxation operation: an application of a
    relaxation type to a specific query node or edge
  • Lemma
  • Given a query tree with n applicable relaxation
    operations, there are potentially up to 2^n
    relaxed trees
  • Possible combinations
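
The 2^n bound in the lemma is the number of subsets of the n operations. The sketch below enumerates them; the operation names are placeholder strings, and the empty subset stands for the original, unrelaxed query. In practice some subsets are not meaningful (for example, generalizing an edge to a node that is also deleted), which is why the bound is "up to" 2^n.

```python
from itertools import combinations


def operation_subsets(ops):
    """Yield every subset of the given relaxation operations."""
    for k in range(len(ops) + 1):
        for subset in combinations(ops, k):
            yield frozenset(subset)


# Hypothetical operations on a small query tree.
ops = ["gen(e_article,title)", "del(title)", "del(section)"]
subsets = list(operation_subsets(ops))
print(len(subsets))  # 8, i.e. 2**3
```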

8
XML Query Relaxation Challenges
  • Query relaxation is often user-specific
  • Different users may have different approximate
    matching specifications for a given query tree
  • How to provide user-specific approximate query
    answering?
  • A query with n relaxation operations has
    potentially up to 2^n relaxed queries
  • How to systematically relax a query?
  • Query relaxation generates a set of approximate
    answers
  • How to effectively rank the returned approximate
    answers?

9
CoXML System Overview
[Figure: CoXML architecture. Components: Relaxation Engine, Ranking
Module, Relaxation Index Builder, XML Database Engine. Labeled flows:
query, relaxation language, relaxed query, exact answers, results,
ranked results.]
10
Roadmap
  • Introduction
  • Background
  • CoXML
  • Relaxation Language
  • Relaxation Index Structure
  • Ranking of Approximate Answers
  • Experimental Studies
  • Related Work
  • Conclusion

11
Relaxation Language
  • A relaxation-enabled query is a tuple (T, R, C,
    S)
  • T: tree-pattern query
  • R: relaxation constructs
  • E.g., delete/re-label a node, generalize an edge
  • C: relaxation controls
  • E.g., prefer/reject certain relaxation
    operations, use certain relaxation types, control
    relaxation orders, etc.
  • S: stop condition
  • E.g., the minimum number of approximate answers
    to be returned
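
The tuple (T, R, C, S) can be pictured as a small record. The sketch below is only an illustration: the class, its field names, and the way preferences are applied are assumptions, not the actual CoXML relaxation language syntax.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class RelaxationQuery:
    tree: str                                         # T: tree-pattern query
    constructs: List[str]                             # R: operations the user allows
    reject: Set[str] = field(default_factory=set)     # C: operations to avoid
    prefer: List[str] = field(default_factory=list)   # C: operations to try first
    min_answers: int = 10                             # S: stop condition

    def candidate_operations(self) -> List[str]:
        """Allowed operations, with preferred ones first."""
        allowed = [op for op in self.constructs if op not in self.reject]
        preferred = [op for op in self.prefer if op in allowed]
        rest = [op for op in allowed if op not in preferred]
        return preferred + rest

    def should_stop(self, n_answers: int) -> bool:
        """Stop relaxing once enough approximate answers are collected."""
        return n_answers >= self.min_answers


rq = RelaxationQuery(
    tree="//article[title][section]",
    constructs=["gen(e_article,title)", "del(title)", "del(section)"],
    reject={"del(title)"},
    prefer=["del(section)"],
    min_answers=20,
)
print(rq.candidate_operations())
```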

12
Relaxation Language Example
<inex_topic topic_id="267">
<castitle>//article//fm//atl[about(., "digital libraries")]</castitle>
<description>Articles containing "digital libraries" in their
title.</description>
<narrative>I'm interested in articles discussing Digital
Libraries as their main subject. Therefore I
require that the title of any relevant article
mentions "digital library" explicitly. Documents
that mention digital libraries only under the
bibliography are not relevant, as well as
documents that do not have the phrase "digital
library" in their title.</narrative>
</inex_topic>
13
How to Relax Queries?
  • Naïve approach
  • Generate all possible relaxed queries, then
    iteratively select the best relaxed query to
    derive approximate answers
  • Exhaustive, but not scalable
  • Observation
  • Many queries share the same (or similar) tree
    structures
  • Our approach: relaxation index structure
  • Consider the structure of a query tree T as a
    template
  • Build indexes on the relaxed trees of T
  • Use the index to guide the relaxations of any
    query with the same (or similar) tree structure
    as that of T

14
Relaxation Index Structure - XTAH
  • XTAH
  • A hierarchical multi-level labeled cluster of
    relaxed trees for a given query tree
  • Building an XTAH
  • Given a query structure template T, generate all
    possible relaxed trees
  • Each relaxed tree uses a unique set of
    relaxation operations
  • Cluster relaxed trees into groups based on
    relaxation operations and distances (similar to
    suffix-tree clustering)
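
The grouping idea can be illustrated with a much-simplified sketch. It is not the authors' clustering algorithm and does not use distances: each relaxed tree is identified by its set of operations, and trees are grouped by which operation types they use. The operation strings and the `op_type` helper are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def op_type(op: str) -> str:
    """'gen(e_a,t)' -> 'gen', 'del(t)' -> 'del'."""
    return op.split("(", 1)[0]


def build_groups(ops):
    """Group every non-empty operation set by the operation types it uses."""
    groups = defaultdict(list)
    for k in range(1, len(ops) + 1):
        for subset in combinations(ops, k):
            label = tuple(sorted({op_type(o) for o in subset}))
            groups[label].append(frozenset(subset))
    return dict(groups)


ops = ["gen(e_article,title)", "gen(e_article,section)", "del(title)"]
groups = build_groups(ops)
for label, trees in sorted(groups.items()):
    print(label, len(trees))
```

With these three operations the sketch yields three groups: trees that use only edge generalization, trees that use only node deletion, and trees that combine both.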

15
XTAH Example for Template Structure T
gen(e_{u,v}): relaxing the edge between u and v
del(u): deleting the node u
16
XTAH Properties
  • Each group consists of a set of relaxed trees
    derived from similar relaxation operations
  • The relaxed trees can be located efficiently
    based on the type of relaxation operation
  • A higher-level group in the XTAH represents less
    relaxation than a lower-level group
  • A query can be relaxed to different levels of
    granularity by traversing up and down the XTAH
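
One way to picture relaxing "to different levels of granularity" is a loop that tries less-relaxed variants first and stops once enough answers are found. The sketch below is an assumption-laden stand-in for XTAH traversal: it uses the number of applied operations as a proxy for the amount of relaxation, and `evaluate` is a hypothetical callback that returns the answers for a given operation set.

```python
from itertools import combinations


def relax_until(ops, evaluate, min_answers):
    """Evaluate operation sets level by level until enough answers exist.

    The stop condition is checked between levels, so every operation set
    at a level is evaluated once that level has been entered.
    """
    answers = set(evaluate(frozenset()))  # exact answers first
    applied = [frozenset()]
    for k in range(1, len(ops) + 1):
        if len(answers) >= min_answers:
            break
        for subset in combinations(ops, k):
            op_set = frozenset(subset)
            answers |= set(evaluate(op_set))
            applied.append(op_set)
    return answers, applied


# Toy evaluator: a lookup table standing in for the XML database engine.
toy = {
    frozenset(): {"d1"},
    frozenset({"gen"}): {"d1", "d2"},
    frozenset({"del"}): {"d3"},
    frozenset({"gen", "del"}): {"d4"},
}
answers, applied = relax_until(["gen", "del"], lambda s: toy.get(s, set()), 3)
print(sorted(answers), len(applied))
```

In this toy run the single-operation level already yields three answers, so the most relaxed variant (both operations) is never evaluated.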

17
Ranking of XML Approximate Answers
  • Content similarity: cont_sim(A, Q)
  • An extended vector space model [2]
  • Structure similarity: struct_dist(A, Q)
  • Use tree editing distance for measuring structure
    similarity
  • Propose a cost model that assigns operation cost
    based on relaxation semantics
  • Overall relevancy: sim(A, Q)
  • A ranking model combining both content similarity
    and structure distance

[Formula for sim(A, Q) not captured in the transcript; its weighting
parameter is a small constant between 0 and 1.]
[2] Configurable Indexing and Ranking for XML
Information Retrieval (S. Liu et al., 2004)
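
The ranking idea can be sketched as follows. The exact formula for sim(A, Q) is not in this transcript, so the combination used here (content similarity discounted exponentially by structure distance, with a constant `alpha` between 0 and 1) is an assumed form for illustration only. The operation costs are invented numbers, and structure distance is simplified to the summed cost of the applied relaxation operations.

```python
def struct_dist(applied_ops, cost):
    """Structure distance as the total cost of the applied operations."""
    return sum(cost[op] for op in applied_ops)


def sim(cont_sim, dist, alpha):
    """Assumed combination: discount content similarity by structure distance."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return (alpha ** dist) * cont_sim


# Uniform cost model: every operation costs the same.
uniform = {"gen(e_article,title)": 1.0, "del(section)": 1.0}
# Semantic cost model (made-up values): cheaper when the change is mild.
semantic = {"gen(e_article,title)": 0.2, "del(section)": 0.9}

ops = ["gen(e_article,title)"]
score_uniform = sim(0.8, struct_dist(ops, uniform), alpha=0.5)
score_semantic = sim(0.8, struct_dist(ops, semantic), alpha=0.5)
```

Under this assumed form, an exact structural match (distance 0) keeps its full content score, and a cheaper relaxation is penalized less than an expensive one.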
18
Experimental Studies
  • Experiment Setup
  • INEX (INitiative for the Evaluation of XML) 05
    test collection
  • Document collection
  • Query set
  • Gold standard
  • Evaluation Metrics
  • nxCG (normalized extended cumulative gain)
  • The official evaluation metric used in INEX 05
  • Given a rank i (i ≥ 1), nxCG_at_i, similar to
    precision_at_i, measures the relative gain users
    have accumulated up to rank i
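
A minimal sketch of nxCG_at_i: the gain accumulated by the system's ranking up to rank i, divided by the gain an ideal ranking would accumulate up to the same rank. It assumes per-element gain values are already available from the gold standard, and it ignores the overlap handling that the full INEX metric defines. The function and variable names are illustrative.

```python
def nxcg_at(i, run_gains, ideal_gains):
    """Normalized extended cumulative gain at rank i (simplified)."""
    if i < 1:
        raise ValueError("rank i must be at least 1")
    # Ideal ranking: all gold-standard gains in decreasing order.
    ideal = sum(sorted(ideal_gains, reverse=True)[:i])
    if ideal == 0:
        return 0.0
    return sum(run_gains[:i]) / ideal


run_gains = [1.0, 0.0, 0.5]    # gains of the returned answers, in rank order
ideal_gains = [1.0, 1.0, 0.5]  # gains of all relevant elements
print(nxcg_at(2, run_gains, ideal_gains))  # 0.5
```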

19
Retrieval performance improvements with semantic cost model
  • Query set all content-and-structure queries in
    INEX 05

[Table: nxCG_at_10 for each combination of weighting constant and cost model; values not captured in the transcript.]

Assigning different costs to relaxation operations
based on the similarity of the nodes being operated
on improves retrieval performance! nxCG_at_25
and nxCG_at_50 yield similar results.
20
Evaluation of Relaxation Control
  • Query topic 267
  • Result

Relaxation control enables the system to provide
answers with greater relevancy!
21
Related Work
  • Relaxation based on schema conversions [LC01,
    LMC01, LMC03]
  • Without structure relaxation
  • Native XML relaxation
  • Proposed structure relaxation types, e.g., [KS01,
    ACS02]
  • Used the relaxation types of [ACS02] in our work
  • Investigated efficient algorithms for deriving
    top-K answers based on relaxation types, e.g.,
    [Sch02, ACS02, ALP04, AKM05]
  • Without relaxation control

22
Conclusion
  • Cooperative XML (CoXML) query answering
  • Relaxation-enabled query language allows users to
    effectively express relaxed query conditions
    and to control the relaxation process
  • XTAH provides systematic query relaxation
    guidance
  • Used both content and structure similarity
    metrics for evaluating the relevancy of
    approximate answers
  • Evaluation studies with the INEX test collections
    validate the effectiveness of our methodology