Querying XML Documents - PowerPoint PPT Presentation

1 / 49
About This Presentation
Title:

Querying XML Documents

Description:

Find recipes that use some ingredients in the refrigerator ... baked chicken. marinated chicken. small chickens, cut up. Herbes de Provence. dry white wine ... – PowerPoint PPT presentation

Number of Views:46
Avg rating:3.0/5.0
Slides: 50
Provided by: webNc9
Category:

less

Transcript and Presenter's Notes

Title: Querying XML Documents


1
Querying XML Documents
2
Objectives
  • How XML generalizes relational databases
  • The XQuery language
  • How XML may be supported in databases

3
Only Some Trees are Relations
  • They have height two
  • The root has an unbounded number of children
  • All nodes in the second layer (records) have a
    fixed number of child nodes (fields)
  • Trees are ordered, while both rows and columns of
    tables may be permuted without changing the
    meaning of the data

4
An XML tree
5
Relational Tables
6
Query Usage Scenarios
  • Data-oriented
  • to query XML data, like using SQL to query
    relational databases
  • Document-oriented
  • to retrieve parts of documents
  • to provide dynamic indexes
  • to perform context-sensitive searching
  • to generate new documents as combinations of
    existing documents
  • Programming
  • to automatically generate documentation, similar
    to the Javedoc tool
  • Hybrid
  • to retrieve information from hybrid data, such as
    patient records

7
XQuery Design Requirements
  • XML syntax and human-readable
  • Declarative
  • Namespace aware
  • Coordinate with XML Schema
  • Support simple and complex data types
  • Can combine multiple documents
  • Can transform and create XML trees

8
What is XQuery?
  • XQuery is the language for querying XML data
  • XQuery 1.0 is a strict superset of XPath 2.0
  • XQuery 1.0 and XPath 2.0 share the same data
    model and support the same functions and
    operators
  • XQuery has the extra expressive power to
  • join information from different sources and
  • generate new XML fragments
  • XQuery and XSLT are both domain-specific
    languages for combining and transforming XML data
    from multiple sources
  • XQuery is designed from scratch, XSLT is an
    intellectual descendant of CSS
  • Technically, they may emulate each other
  • XQuery is defined by the W3C
  • XQuery is supported by all the major database
    engines (IBM, Oracle, Microsoft, etc.)

9
XQuery Prolog
  • XQuery expressions are evaluated relatively to a
    context, which is explicitly provided by a prolog
  • Declarations specify various parameters for the
    XQuery processor, such as
  • xquery version "1.0"
  • declare xmlspace preserve
  • declare xmlspace strip
  • declare default element namespace URI
  • declare default function namespace URI
  • import schema at URI
  • declare namespace NCName URI

10
Implicit Declarations
  • Declarations implicitly defined in any XQuery
    implementation
  • declare namespace xml
  • "http//www.w3.org/XML/1998/namespace"
  • declare namespace xs
  • "http//www.w3.org/2001/XMLSchema"
  • declare namespace xsi
  • "http//www.w3.org/2001/XMLSchema-instance"
  • declare namespace fn
  • "http//www.w3.org/2005/11/xpath-functions"
  • declare namespace xdt
  • "http//www.w3.org/2005/11/xpath-datatypes"
  • declare namespace local
  • "http//www.w3.org/2005/11/xquery-local-functions"

11
XPath vs. XQuery
  • XPath expressions are also XQuery expressions
  • XPath expressions are required to be evaluated in
    a static context, which must be provided by the
    invoking application
  • The XQuery prolog gives the required static
    context
  • The initial context node, position, and size are
    undefined because an XQuery expression may work
    on multiple XML documents
  • Only axes child, descendant, parent, attribute,
    self, and descendant-of-self are required to be
    implemented in XQuery.

12
Datatype Expressions
  • Same atomic values as XPath 2.0
  • Also lots of primitive simple values
  • xsstring("XML is fun")
  • xsboolean("true")
  • xsdecimal("3.1415")
  • xsfloat("6.02214199E23")
  • xsdateTime("1999-05-31T132000-0500")
  • xstime("132000-0500")
  • xsdate("1999-05-31")
  • xsgYearMonth("1999-05")
  • xsgYear("1999")
  • xshexBinary("48656c6c6f0a")
  • xsbase64Binary("SGVsbG8K")
  • xsanyURI("http//www.brics.dk/ixwt/")
  • xsQName("rcprecipe")

13
XML Expressions
  • XQuery expressions may compute new XML nodes
  • Expressions may denote element, character data,
    comment, and processing instruction nodes
  • Each node is created with a unique node identity
  • Constructors may be either direct or computed

14
Direct Constructors
  • Uses the standard XML syntax
  • The expression
  • ltfoogtltbar/gtbazlt/foogt
  • evaluates to the given XML fragment
  • Note that nodes are created with unique identity
    and therefore ltfoo/gt is ltfoo/gt evaluates to
    false (operators is, ltlt, gtgt are used to compare
    nodes on identity and document order)

15
Same Namespace Declarations (1/3)
  • declare default element namespace
    "http//businesscard.org"
  • ltcardgt
  • ltnamegtJohn Doelt/namegt
  • lttitlegtCEO, Widget Inc.lt/titlegt
  • ltemailgtjohn.doe_at_widget.comlt/emailgt
  • ltphonegt(202) 555-1414lt/phonegt
  • ltlogo uri"widget.gif"/gt
  • lt/cardgt

16
Same Namespace Declarations (2/3)
  • declare namespace b "http//businesscard.org"
  • ltbcardgt
  • ltbnamegtJohn Doelt/bnamegt
  • ltbtitlegtCEO, Widget Inc.lt/btitlegt
  • ltbemailgtjohn.doe_at_widget.comlt/bemailgt
  • ltbphonegt(202) 555-1414lt/bphonegt
  • ltblogo uri"widget.gif"/gt
  • lt/bcardgt

17
Same Namespace Declarations (3/3)
  • ltcard xmlns"http//businesscard.org"gt
  • ltnamegtJohn Doelt/namegt
  • lttitlegtCEO, Widget Inc.lt/titlegt
  • ltemailgtjohn.doe_at_widget.comlt/emailgt
  • ltphonegt(202) 555-1414lt/phonegt
  • ltlogo uri"widget.gif"/gt
  • lt/cardgt

18
Enclosed Expressions
  • Expressions create an element named numbers which
    has a single character data node with value 1 2 3
    4 5 (if boundary-space is set to strip)
  • lt numbers gt1 2 3 4 5lt/ numbers gt
  • lt numbers gt1, 2, 3, 4, 5lt/ numbers gt
  • lt numbers gt1, "2", 3, 4, 5lt/ numbers gt
  • lt numbers gt1 to 5lt/ numbers gt
  • lt numbers gt1 11 " " "3" " " 4 to 5lt/
    numbers gt
  • Enclosed expressions are allowed inside attribute
    values
  • ltfoo bar"1 2 3 4 5"/gt
  • ltfoo bar"1, 2, 3, 4, 5"/gt
  • ltfoo bar"1 2 to 4 5"/gt

19
Explicit Constructors
  • The constant expression
  • ltcard xmlns"http//businesscard.org"gt
  • ltnamegtJohn Doelt/namegt
  • lttitlegtCEO, Widget Inc.lt/titlegt
  • ltemailgtjohn.doe_at_widget.comlt/emailgt
  • ltphonegt(202) 555-1414lt/phonegt
  • ltlogo uri"widget.gif"/gt
  • lt/cardgt
  • Can be written as
  • element card
  • namespace "http//businesscard.org" ,
  • element name text "John Doe" ,
  • element title text "CEO, Widget Inc." ,
  • element email text "john.doe_at_widget.com"
    ,
  • element phone text "(202) 555-1414" ,
  • element logo attribute uri "widget.gif"

20
Computed QNames
  • Qualified names can be replaced by enclosed
    expressions evaluating to equivalent strings
  • element "card"
  • namespace "http//businesscard.org" ,
  • element "name" text "John Doe" ,
  • element "title" text "CEO, Widget Inc."
    ,
  • element "email" text "john.doe_at_widget.co
    m" ,
  • element "phone" text "(202) 555-1414"
    ,
  • element "logo"
  • attribute "uri" "widget.gif"

21
Biliingual Business Cards
  • Controlled by a global variable lang
  • element if (lang"Danish") then "kort" else
    "card"
  • namespace "http//businesscard.org" ,
  • element if (lang"Danish") then "navn" else
    "name"
  • text "John Doe" ,
  • element if (lang"Danish") then "titel" else
    "title"
  • text "CEO, Widget Inc." ,
  • element "email"
  • text "john.doe_at_widget.inc" ,
  • element if (lang"Danish") then "telefon"
    else "phone"
  • text "(202) 456-1414" ,
  • element "logo"
  • attribute "uri" "widget.gif"

22
FLWOR Expressions
  • Used for general queries
  • ltdoublesgt
  • for s in fndoc("students.xml")//student
  • let m s/major
  • where fncount(m) ge 2
  • order by s/_at_id
  • return ltdoublegt
  • s/name/text()
  • lt/doublegt
  • lt/doublesgt

23
The Difference Between For and Let (1/4)
  • A FLWOR expression
  • for x in (1, 2, 3, 4)
  • let y ("a", "b", "c")
  • return (x, y)
  • Output
  • 1, a, b, c, 2, a, b, c, 3, a, b, c, 4, a, b, c

24
The Difference Between For and Let (2/4)
  • A FLWOR expression
  • let x in (1, 2, 3, 4)
  • for y ("a", "b", "c")
  • return (x, y)
  • Output
  • 1, 2, 3, 4, a, 1, 2, 3, 4, b, 1, 2, 3, 4, c

25
The Difference Between For and Let (3/4)
  • A FLWOR expression
  • for x in (1, 2, 3, 4)
  • for y in ("a", "b", "c")
  • return (x, y)
  • Output
  • 1, a, 1, b, 1, c, 2, a, 2, b, 2, c, 3, a, 3, b,
    3, c, 4, a, 4, b, 4, c

26
The Difference Between For and Let (4/4)
  • A FLWOR expression
  • let x (1, 2, 3, 4)
  • let y ("a", "b", "c")
  • return (x, y)
  • Output
  • 1, 2, 3, 4, a, b, c

27
An XML Example Document books.xml
  • lt?xml version"1.0" encoding"ISO-8859-1"?gt
  • ltbookstoregt
  • ltbook category"COOKING"gt
  • lttitle lang"en"gtEveryday Italianlt/titlegt
  • ltauthorgtGiada De Laurentiislt/authorgt
  • ltyeargt2005lt/yeargt
  • ltpricegt30.00lt/pricegt
  • lt/bookgt
  • ltbook category"CHILDREN"gt
  • lttitle lang"en"gtHarry Potterlt/titlegt
  • ltauthorgtJ K. Rowlinglt/authorgt
  • ltyeargt2005lt/yeargt
  • ltpricegt29.99lt/pricegt
  • lt/bookgt

28
An XML Example Document books.xml (cont.)
  • ltbook category"WEB"gt
  • lttitle lang"en"gtXQuery Kick Startlt/titlegt
  • ltauthorgtJames McGovernlt/authorgt
  • ltauthorgtPer Bothnerlt/authorgt
  • ltauthorgtKurt Caglelt/authorgt
  • ltauthorgtJames Linnlt/authorgt
  • ltauthorgtVaidyanathan Nagarajanlt/authorgt
  • ltyeargt2003lt/yeargt
  • ltpricegt49.99lt/pricegt
  • lt/bookgt
  • ltbook category"WEB"gt
  • lttitle lang"en"gtLearning XMLlt/titlegt
  • ltauthorgtErik T. Raylt/authorgt
  • ltyeargt2003lt/yeargt
  • ltpricegt39.95lt/pricegt
  • lt/bookgt
  • lt/bookstoregt

29
Path Expressions
  • The following path expression is used to select
    all the title elements in the "books.xml" file
  • doc("books.xml")/bookstore/book/title
  • The XQuery above will extract the following
  • lttitle lang"en"gtEveryday Italianlt/titlegt
  • lttitle lang"en"gtHarry Potterlt/titlegt
  • lttitle lang"en"gtXQuery Kick Startlt/titlegt
  • lttitle lang"en"gtLearning XMLlt/titlegt

30
Predicates
  • The following predicate is used to select all the
    book elements under the bookstore element that
    have a price element with a value that is less
    than 30
  • doc("books.xml")/bookstore/bookpricelt30
  • The XQuery above will extract the following
  • ltbook category"CHILDREN"gt
  • lttitle lang"en"gtHarry Potterlt/titlegt
  • ltauthorgtJ K. Rowlinglt/authorgt
  • ltyeargt2005lt/yeargt
  • ltpricegt29.99lt/pricegt
  • lt/bookgt

31
FLOWR Expressions
  • The following FLWOR expression will select all
    the title elements under the book elements that
    are under the bookstore element that have a price
    element with a value that is higher than 30
  • for x in doc("books.xml")/bookstore/book
  • where x/pricegt30
  • return x/title
  • The result will be
  • lttitle lang"en"gtXQuery Kick Startlt/titlegt
  • lttitle lang"en"gtLearning XMLlt/titlegt

32
Conditional Expressions
  • Notes on the "if-then-else" syntax parentheses
    around the if expression are required. else is
    required, but it can be just else ().
  • for x in doc("books.xml")/bookstore/book
  • return if (x/_at_category"CHILDREN")
  • then ltchildgtdata(x/title)lt/childgt
  • else ltadultgtdata(x/title)lt/adultgt
  • The result of the example above will be
  • ltadultgtEveryday Italianlt/adultgt
  • ltchildgtHarry Potterlt/childgt
  • ltadultgtLearning XMLlt/adultgt
  • ltadultgtXQuery Kick Startlt/adultgt

33
Comparisons
  • General comparisons , !, lt, lt, gt, gt
  • Value comparisons eq, ne, lt, le, gt, ge
  • bookstore//book/_at_q gt 10
  • The expression above returns true if any q
    attributes have values greater than 10.
  • bookstore//book/_at_q gt 10
  • The expression above returns true if there is
    only one q attribute returned by the expression,
    and its value is greater than 10. If more than
    one q is returned, an error occurs.

34
Selecting and Filtering Elements
  • The at keyword can be used to count the
    iteration
  • for x at i in doc("books.xml")/bookstore/book/t
    itle
  • return ltbookgti. data(x)lt/bookgt
  • Result
  • ltbookgt1. Everyday Italianlt/bookgt
  • ltbookgt2. Harry Potterlt/bookgt
  • ltbookgt3. XQuery Kick Startlt/bookgt
  • ltbookgt4. Learning XMLlt/bookgt
  • It is also allowed with more than one in
    expression in the for clause. Use comma to
    separate each in expression
  • for x in (10,20), y in (100,200)
  • return lttestgtxx and yylt/testgt
  • Result
  • lttestgtx10 and y100lt/testgt
  • lttestgtx10 and y200lt/testgt
  • lttestgtx20 and y100lt/testgt
  • lttestgtx20 and y200lt/testgt

35
Computing Joins
  • fridge.xml
  • ltfridgegt
  • ltstuffgteggslt/stuffgt
  • ltstuffgtolive oillt/stuffgt
  • ltstuffgtketchuplt/stuffgt
  • ltstuffgtunrecognizable moldy thinglt/stuffgt
  • lt/fridgegt
  • Find recipes that use some ingredients in the
    refrigerator
  • declare namespace rcp "http//www.brics.dk/ixwt
    /recipes"
  • for r in fndoc("recipes.xml")//rcprecipe
  • for i in r//rcpingredient/_at_name
  • for s in fndoc("fridge.xml")//stufftext()i
  • return fndistinct-values(r/rcptitle/text() )

36
Inverting a Relation
  • declare namespace rcp "http//www.brics.dk/ixwt/
    recipes"
  • ltingredientsgt
  • for i in distinct-values(
  • fndoc("recipes.xml")//rcpingredient/_at_name
  • )
  • return ltingredient name"i"gt
  • for r in fndoc("recipes.xml")//rcprecipe
  • where r//rcpingredient_at_namei
  • return lttitlegtr/rcptitle/text()lt/titlegt
  • lt/ingredientgt
  • lt/ingredientsgt

37
Sorting the Results
  • declare namespace rcp "http//www.brics.dk/ixwt/
    recipes"
  • ltingredientsgt
  • for i in distinct-values(fndoc("recipes.xml")
    //rcpingredient/_at_name)
  • order by i
  • return ltingredient name"i"gt
  • for r in fndoc("recipes.xml")//rcprecipe
  • where r//rcpingredient_at_namei
  • order by r/rcptitle/text()
  • return lttitlegtr/rcptitle/text()lt/titlegt
  • lt/ingredientgt
  • lt/ingredientsgt

38
A More Complicated Sorting
  • for s in document("students.xml")//student
  • order by
  • fncount(
  • s/results/resultfncontains(_at_grade,"A")
  • ) descending,
  • fncount(s/major) descending,
  • xsinteger(s/age/text()) ascending
  • return s/name/text()

39
Using Functions
  • declare function localgrade(g)
  • if (g"A") then 4.0 else if (g"A-") then 3.7
    else if (g"B") then 3.3
  • else if (g"B") then 3.0 else if (g"B-") then
    2.7 else if (g"C") then 2.3
  • else if (g"C") then 2.0 else if (g"C-") then
    1.7 else if (g"D") then 1.3
  • else if (g"D") then 1.0 else if (g"D-") then
    0.7 else 0
  • declare function localgpa(s)
  • fnavg(for g in s/results/result/_at_grade return
    localgrade(g))
  • ltgpasgt
  • for s in fndoc("students.xml")//student
  • return ltgpa id"s/_at_id" gpa"localgpa(s)
    "/gt
  • lt/gpasgt

40
Extend the Expressive Power
  • Using recursive function to generate an XML tree
    of a given height
  • declare function gen(n)
  • if (n eq 0) ltbar/gt
  • else ltfoogt gen(n - 1), gen(n - 1) lt/foogt
  • Compute the height of an XML tree
  • declare function localheight(x)
  • if (fnempty(x/)) then 1
  • else fnmax(for y in x/ return
    localheight(y))1

41
A Textual Outline
  • Cailles en Sarcophages
  • pastry
  • chilled unsalted butter
  • flour
  • salt
  • ice water
  • filling
  • baked chicken
  • marinated chicken
  • small chickens, cut up
  • Herbes de Provence
  • dry white wine
  • orange juice
  • minced garlic
  • truffle oil
  • ...

42
Computing Textual Outlines
  • declare namespace rcp "http//www.brics.dk/ixwt/
    recipes"
  • declare function localingredients(i,p)
  • fnstring-join(
  • for j in i/rcpingredient
  • return fnstring-join((p,j/_at_name,"x0A",
  • localingredients(j,fnconcat(p,"
    "))),""),"")
  • declare function localrecipes(r)
  • fnconcat(r/rcptitle/text(),"x0A",localingre
    dients(r," "))
  • fnstring-join(
  • for r in fndoc("recipes.xml")//rcprecipe5
  • return localrecipes(r),""
  • )

43
XML Databases
  • How can XML and databases be merged?
  • Several different approaches
  • extract XML views of relations
  • use SQL to generate XML
  • shred XML into relational databases

44
Automatic XML Views (1/2)
  • ltStudentsgt
  • ltrecord id"100026" name"Joe Average"
    age"21"/gt
  • ltrecord id"100078" name"Jack Doe" age"18"/gt
  • lt/Studentsgt
  • xmlelement(name, "Students",
  • select xmlelement(name,
  • "record",
  • xmlattributes(s.id, s.name, s.age))
  • from Students
  • )

45
Automatic XML Views (2/2)
  • ltStudentsgt
  • ltrecordgt
  • ltidgt100026lt/idgt
  • ltnamegtJoe Averagelt/namegt
  • ltagegt21lt/agegt
  • lt/recordgt
  • ltrecordgt
  • ltidgt100078lt/idgt
  • ltnamegtJack Doelt/namegt
  • ltagegt18lt/agegt
  • lt/recordgt
  • lt/Studentsgt
  • xmlelement(name, "Students",
  • select xmlelement(name,
  • "record",
  • xmlforest(s.id, s.name, s.age))
  • from Students
  • )

46
Storing XML Documents
  • Designing a specialized system for storing native
    XML data
  • Using a DBMS to store the whole XML documents as
    text fields
  • Using a DBMS to store the document contents as
    data elements
  • It must support the XMLs ordered data model

47
Using a DBMS Relational DTD
  • Schema-aware
  • An element that can occur at most once in its
    parent is stored as a column of the table
    representing its parent

48
Using a DBMS Edge
  • Schema-less
  • A single table is used to store the entire
    document
  • Each node is assigned an ID in depth first order

1
root
2
references
3
4
book
book

author
author
author
title
year
author

2000
Data on the Web
S. Abiteboul
D. Suciu
P. Buneman
49
Using a DBMS Edge (cont.)
Write a Comment
User Comments (0)
About PowerShow.com