XQuery and Information Retrieval

1
XQuery and Information Retrieval
  • Jayavel Shanmugasundaram
  • Cornell University
  • (Invited Expert, XQuery Full-Text Task Force)

2
Motivation
  • A key benefit of XML is its ability to represent
    a mix of structured and unstructured (text) data
  • Applications
      • Digital libraries
      • Content management
  • Many such XML repositories are already available
      • IEEE INEX collection
      • Library of Congress documents
      • Shakespeare's plays
      • SIGMOD, DBLP, ...

3
Example XML Document
<book isbn="3647593" year="1995">
  <author>Elina Rose</author>
  <title>...</title>
  <content>
    <para> The usability of software measures how well the
      software provides support for quickly achieving
      specified goals. </para>
    <para> The users must not only be well served but must
      feel well served. </para>
  </content>
</book>
4
Current Query Languages
  • Current XML query languages are mostly database
    languages
      • Examples: XQuery, XPath
  • They provide only rudimentary text/IR support
      • fn:contains(e, keywords)
      • Returns true iff element e contains the keywords
  • No support for complex IR queries
      • Distance predicates, stemming, scoring, ...

5
Example Queries
  • From the XQuery Full-Text Use Cases document
      • Find the titles of the books whose body contains
        the phrases "Usability" and "Web site" in that
        order, in the same paragraph, using stemming if
        necessary to match the tokens
      • Find the titles of the books whose body contains
        "Usability" and "testing" within a window of 3
        words, and return them in score order
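
The window condition in the second query can be illustrated with a small Python sketch. It is only an approximation of the idea, not the semantics defined by the working draft: the tokenizer and the helper names `tokenize` and `within_window` are assumptions made for this example.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def within_window(tokens, first, second, window):
    """Return True if both words occur inside some span of `window` consecutive tokens."""
    pos_first = [i for i, t in enumerate(tokens) if t == first]
    pos_second = [i for i, t in enumerate(tokens) if t == second]
    # Two positions fit in a window of n words when they are at most n - 1 apart.
    return any(abs(i - j) <= window - 1 for i in pos_first for j in pos_second)

body = "Usability and testing of web sites improves the usability of software"
tokens = tokenize(body)
print(within_window(tokens, "usability", "testing", 3))  # True
print(within_window(tokens, "testing", "software", 3))   # False
```

A plain keyword-containment test cannot express this condition, because it discards the word positions that the window check relies on.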

6
Why not use SQL/MM (or a variant)?
  • Key difference: no strict demarcation between
    structured and text data in XML
  • Can issue structured and text queries over the same
    data
      • Find books with year > 1995
      • Find books containing the keyword "1998"
  • Can embed structured queries in text queries
      • Find books that contain the keywords that occur
        in the titles of Richard Dawkins' books
  • Other important differences
      • XML/XQuery data model
      • Composability of full-text primitives

7
Outline
  • XQuery Full-Text Language
  • Research Challenges
  • Conclusion

8
XQuery Full-Text
  • Full-text search extension to XQuery
      • W3C Working Draft
  • Tightly integrated with the XQuery data model
      • Provides a well-defined model for reasoning about
        full-text operations and integration with XQuery
  • Composability
      • Fully composable full-text primitives, including
        Boolean connectives, distance predicates,
        stemming
      • Can embed XQuery Full-Text primitives in XQuery
        and vice versa
  • Flexible scoring construct

9
XQuery Full-Text Evolution
  • 2002: Quark Full-Text Language (Cornell)
  • 2003: IBM, Microsoft, Oracle proposals; TeXQuery (Cornell, AT&T)
  • 2004: XQuery Full-Text
  • 2005: XQuery Full-Text (Second Draft)
10
Design Goals
  • Should be able to specify the following
      • Context spec: defines the nodes over which full-text
        search is to be performed, e.g., book chapters
      • Return spec: defines the nodes that are to be
        returned to users, e.g., book titles
      • Search spec: defines the full-text search condition,
        e.g., and, or, proximity, stemming
      • Scoring spec: defines how results are to be
        scored, e.g., how keywords should be weighted

11
Syntax Overview
  • Two new XQuery constructs
  • FTContainsExpr
      • Expresses Boolean full-text search predicates
      • Seamlessly composes with other XQuery expressions
  • FTScoreClause
      • Extension to the FLWOR expression
      • Can score FTContainsExpr and other expressions

12
Outline
  • XQuery Full-Text Language
      • FTContainsExpr
      • FTScoreClause
  • Research Challenges
  • Conclusion

13
FTContainsExpr
  • Like other XQuery expressions
      • Takes sequences of items (nodes) as input
      • Produces a sequence of items (nodes) as output
  • Can seamlessly compose with other XQuery
    expressions

XQuery expression: evaluates to a sequence of items
14
FTContainsExpr
  • ContextExpr ftcontains FTSelection
      • ContextExpr (any XQuery expression) is the context
        spec
      • FTSelection is the search spec
      • Returns true iff at least one node in ContextExpr
        satisfies the FTSelection
  • Examples
      • //book ftcontains "Usability" && "testing"
        distance 5
      • //book[./content ftcontains "Usability" with
        stems]/title
      • //book ftcontains /article[author="Dawkins"]/title
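
The "true iff at least one node satisfies the selection" rule is existential quantification over the context nodes. A minimal Python sketch, assuming nodes are represented by their text content and the search spec by a predicate (both `ftcontains` and `has_word` are hypothetical names for this illustration, not part of any XQuery API):

```python
def ftcontains(context_nodes, selection):
    """True iff at least one context node satisfies the selection predicate."""
    return any(selection(node) for node in context_nodes)

def has_word(word):
    """Hypothetical search spec: case-insensitive whole-word match."""
    def predicate(node_text):
        return word.lower() in node_text.lower().split()
    return predicate

books = ["Usability testing for the Web", "Databases and XML"]

print(ftcontains(books, has_word("usability")))  # True
print(ftcontains(books, has_word("scoring")))    # False
print(ftcontains([], has_word("usability")))     # False
```

Because the result is an ordinary Boolean, it can be used anywhere a predicate is allowed, which is what lets the expression sit inside a path filter as in the second example on the slide.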

15
FTSelection
  • Encapsulates all full-text conditions in
    FTContainsExpr
  • Works in a new data model called AllMatch
      • Operates on positions within XML nodes (more
        fine-grained than the XQuery data model)
  • Fully composable, similar to the composition of
    relational (and XML) operators!

FTSelection: evaluates to an AllMatch
16
FTSelection Composability
  • "Usability"
  • /book[author="Dawkins"]/title
  • "Usability" && /book[author="Dawkins"]/title
  • ("Usability" && /book[author="Dawkins"]/title)
    same sentence
  • ("Usability" && /book[author="Dawkins"]/title)
    same sentence window 5
  • All of these evaluate to an AllMatch!
  • This allows arbitrary composition of full-text
    primitives
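
One way to see why a shared result type makes the primitives composable is a toy model in Python. This is a heavily simplified sketch, not the AllMatch model of the working draft: here an AllMatch is assumed to be a list of matches, each match a set of token positions, and sentence boundaries are ignored. The function names `word`, `ft_and`, `ft_or`, and `window` are invented for the example.

```python
from itertools import product

def word(tokens, term):
    """AllMatch for a single term: one match per occurrence position."""
    return [frozenset([i]) for i, t in enumerate(tokens) if t == term]

def ft_and(left, right):
    """Conjunction: pair every match on the left with every match on the right."""
    return [a | b for a, b in product(left, right)]

def ft_or(left, right):
    """Disjunction: a match from either side is a match."""
    return left + right

def window(all_match, size):
    """Keep only matches whose positions fit inside `size` consecutive tokens."""
    return [m for m in all_match if max(m) - min(m) + 1 <= size]

tokens = "usability testing is part of usability engineering".split()
both = ft_and(word(tokens, "usability"), word(tokens, "testing"))

print([sorted(m) for m in window(both, 5)])  # [[0, 1], [1, 5]]
print([sorted(m) for m in window(both, 3)])  # [[0, 1]]
```

Every operator takes AllMatch values and returns an AllMatch value, so the output of one can always be fed to another; a non-empty result corresponds to the Boolean true of the enclosing FTContainsExpr.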

17
FTContextModifier
  • Can be applied to any FTSelection to specify
    aspects such as stemming, thesauri, case, etc.
  • Fully composable with other context modifiers and
    FTSelections
  • Examples
      • "Usability" && "testing" with stems
      • "Usability" && "testing" with stems window 5
        without stop words
      • "Usability" && "testing" with stems window 5
        without stop words case insensitive
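
The effect of stacking modifiers can be sketched in Python as a normalization step applied to both the text and the search term. Everything here is an assumption for illustration: the stop-word list is arbitrary, `crude_stem` is a rough stand-in for a real stemmer, and the keyword arguments only mimic the modifier names on the slide.

```python
STOP_WORDS = {"the", "of", "a", "an", "and", "is"}  # illustrative list only

def crude_stem(token):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def normalize(text, with_stems=False, case_insensitive=False, without_stop_words=False):
    """Tokenize on whitespace, then apply each requested modifier in turn."""
    tokens = text.split()
    if case_insensitive:
        tokens = [t.lower() for t in tokens]
    if without_stop_words:
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    if with_stems:
        tokens = [crude_stem(t) for t in tokens]
    return tokens

def matches(text, term, **modifiers):
    """Apply the same modifiers to the text and the term, then compare tokens."""
    text_tokens = normalize(text, **modifiers)
    return all(t in text_tokens for t in normalize(term, **modifiers))

text = "Testing the usability of sites"
print(matches(text, "test"))                                          # False
print(matches(text, "test", with_stems=True, case_insensitive=True))  # True
print(matches(text, "Usability", case_insensitive=True))              # True
```

Each modifier is an independent switch, so any combination can be applied to the same selection.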

18
Outline
  • XQuery Full-Text Language
      • FTContainsExpr
      • FTScoreClause
  • Research Challenges
  • Conclusion

19
FTScoreClause
  • Two alternatives
      • Both are extensions to the FLWOR expression
  • Alternative 1
      • Scores Boolean XQuery expressions, including
        FTContainsExpr
      • Current working draft syntax
  • Alternative 2
      • Scores arbitrary XQuery expressions
      • Under discussion

20
Alternative 1
  • FOR
  • LET
  • SCORE $var AS Expr (Expr returns a Boolean)
  • WHERE
  • ORDER BY
  • RETURN
  • Example
      FOR $b in /pubs/book
      SCORE $s AS
        $b ftcontains "software" weight 0.8 &&
                      "testing" weight 0.2
      ORDER BY $s
      RETURN <result score="{$s}"> {$b} </result>
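
The slides do not say how weights turn into a score, so the following Python sketch simply assumes one possibility: add up the weights of the query terms that occur in a book's text and order the books by that sum. The `score` function and the sample data are hypothetical.

```python
def score(text, weighted_terms):
    """Sum the weights of the terms that occur in the text (presence only)."""
    tokens = set(text.lower().split())
    return sum(w for term, w in weighted_terms.items() if term in tokens)

books = {
    "b1": "software testing in practice",
    "b2": "software architecture",
    "b3": "testing gardens",
    "b4": "cooking for beginners",
}
weights = {"software": 0.8, "testing": 0.2}

# Highest score first, mirroring an ORDER BY on the score variable.
ranked = sorted(books, key=lambda b: score(books[b], weights), reverse=True)
print(ranked)  # ['b1', 'b2', 'b3', 'b4']
```

With these weights a book that mentions only "software" outranks one that mentions only "testing", which is the kind of control the weight syntax is meant to give the query author.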

In any order
21
Alternative 1
  • FOR
  • LET
  • SCORE $var AS Expr (Expr returns a Boolean)
  • WHERE
  • ORDER BY
  • RETURN
  • Example
      FOR $b in /pubs/book
      SCORE $s AS
        $b/price < 10.00
      ORDER BY $s
      RETURN <result score="{$s}"> {$b} </result>

In any order
22
Alternative 1: Analysis
  • Not powerful enough for some XML IR queries
      • Case study: XML INEX initiative
  • Want to relax /pubs/book (in addition to the
    full-text predicates)
      • Boolean scoring expressions are insufficient

/pubs/book[. ftcontains "Usability" && "testing"]
23
Alternative 2
In any order
  • FOR $v SCORE $s? AT $i? IN FUZZY Expr
  • LET
  • WHERE
  • ORDER BY
  • RETURN
  • Example
      FOR $b SCORE $s in
        /pub/book[. ftcontains "Usability" &&
                   "testing"]
      ORDER BY $s
      RETURN <result score="{$s}"> {$b} </result>

24
Alternative 2
In any order
  • FOR $v SCORE $s? AT $i? IN FUZZY Expr
  • LET
  • WHERE
  • ORDER BY
  • RETURN
  • Example
      FOR $b SCORE $s in FUZZY
        /pub/book[. ftcontains "Usability" &&
                   "testing"]
      ORDER BY $s
      RETURN <result score="{$s}"> {$b} </result>

25
Outline
  • XQuery Full-Text Language
  • Research Challenges
  • Conclusion

26
Challenge 1: System Architecture
[Figure: an Integration Layer, an XQuery Engine, and an IR Engine]
27
Challenge 1: System Architecture
[Figure: a single XQuery IR Engine]
28
Challenge 2: Structural Relaxation
  FOR $b SCORE $s in FUZZY
    /pub/book[. ftcontains "Usability" with stems]
  ORDER BY $s
  RETURN <result score="{$s}"> {$b} </result>
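
The slides leave open what relaxing a path such as /pub/book should mean. Purely as an illustration, the Python sketch below assumes one simple policy: a node whose path matches the query exactly scores 1.0, and a node that matches only when child steps are loosened to descendant steps scores lower, in proportion to the extra steps. The policy, the scoring formula, and the function names are all assumptions, not part of XQuery Full-Text.

```python
def is_subsequence(query_steps, node_path):
    """True if the query steps appear in node_path in order, gaps allowed."""
    remaining = iter(node_path)
    return all(step in remaining for step in query_steps)

def structural_score(query_steps, node_path):
    """1.0 for an exact path match, a smaller value for a relaxed match, else 0.0."""
    if node_path == query_steps:
        return 1.0
    if node_path and node_path[-1] == query_steps[-1] and is_subsequence(query_steps, node_path):
        # Penalize each extra step that the relaxation had to skip over.
        return len(query_steps) / len(node_path)
    return 0.0

query = ["pub", "book"]
print(structural_score(query, ["pub", "book"]))                     # 1.0
print(structural_score(query, ["pub", "series", "vol", "book"]))    # 0.5
print(structural_score(query, ["pub", "article"]))                  # 0.0
```

A structural score of this kind would then have to be combined with the full-text score, which is one reason a purely Boolean scoring expression falls short.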

29
Challenge 3: Search Over Views
LET $bookrevs := FOR $book IN //book
                 RETURN <bookrevs>
                          {$book}
                          <reviews>
                            {FOR $rev IN //review
                             WHERE $rev/bookid = $book/id
                             RETURN $rev}
                          </reviews>
                        </bookrevs>
FOR $bookrev IN $bookrevs
SCORE $score AS $bookrev ftcontains "Usability" with stems
ORDER BY $score
RETURN $bookrev
30
Outline
  • XQuery Full-Text Language
  • Research Challenges
  • Conclusion

31
Conclusion
  • Unified querying of structured data and text is
    one of the most promising benefits of XML
  • XQuery Full-Text is a language designed to achieve
    this goal
  • Many research challenges
      • System implementation
      • Scoring
      • Requirements of a new class of applications
  • Starting to see research prototypes
      • Quark (open-source software, Cornell)
      • GalaTeX (reference implementation, AT&T)