Title: Module 3: XML Processing (XPath, XQuery, XUpdate)
Managing XML data
- Huge amounts of XML information, and growing
- We need to manage it
  - Store it efficiently
  - Verify its correctness
  - Filter, search, select, join, aggregate
  - Create new data
  - Update it
  - Take actions based on the existing data
- No conceptual organization comparable to that of relational databases (applications are too heterogeneous)
Frequent solutions to XML data management
- Map it to generic programming APIs
- Manually map it to non-generic APIs
- Automatically map it to non-generic structures
- Use XML extensions of existing languages
- Shredding for relational stores
- Native XML processing through XSLT and XQuery
1. Mapping to generic structures
- Represent the data
  - Original Unicode form, or
  - Some binary representation
- Store it
  - Directly on a file system, or
  - On a transacted file system (e.g. SleepyCat, or a relational database)
- Map the XML data to generic XML programmatic APIs
  - E.g. DOM, SAX, StAX (JSR 173), XMLReader
- Use the native programming languages (e.g. Java, C) to manipulate the data
- Re-serialize it at the end
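The parse, manipulate, re-serialize cycle above can be sketched with a generic DOM API. This is only an illustration in Python's standard `xml.dom.minidom`; the sample purchase order and the `add_line_item` helper are assumptions made for the example, not part of the slides.

```python
from xml.dom import minidom

# Hypothetical purchase order, used only for illustration.
SOURCE = "<purchaseOrder><lineItem>pen</lineItem><lineItem>ink</lineItem></purchaseOrder>"


def add_line_item(xml_text, value):
    """Parse into a generic DOM tree, modify it, and serialize it back."""
    doc = minidom.parseString(xml_text)            # map the text to generic DOM nodes
    item = doc.createElement("lineItem")           # manipulate through the generic API
    item.appendChild(doc.createTextNode(value))
    doc.documentElement.appendChild(item)
    return doc.documentElement.toxml()             # re-serialize at the end


result = add_line_item(SOURCE, "paper")
print(result)
```

Note that the application code only ever sees generic nodes (`createElement`, `appendChild`); nothing in the API knows what a purchase order is.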
1. Manual mapping to generic structures (example)
    <purchaseOrder>
      <lineItem>
        ..
      </lineItem>
      <lineItem>
        ..
      </lineItem>
    </purchaseOrder>
    <book>
      <author>..</author>
      <title>..</title>
      ..
    </book>
    class DomNode {
      public String getNodeName();
      public String getNodeValue();
      public void setNodeValue(nodeValue);
      public short getNodeType();
    }
Hard-coded mappings
2. Manual mapping to non-generic structures
    <purchaseOrder>
      <lineItem>
        ..
      </lineItem>
      <lineItem>
        ..
      </lineItem>
    </purchaseOrder>
    <book>
      <author>..</author>
      <title>..</title>
      ..
    </book>
    class PurchaseOrder {
      public List getLineItems();
      ..
    }
    class Book {
      public List getAuthor();
      public String getTitle();
    }
Hard-coded mappings
3. Automatic mapping to non-generic structures
    <type name="book-type">
      <sequence>
        <attribute name="year" type="xs:integer"/>
        <element name="title" type="xs:string"/>
        <sequence minOccurs="0">
          <element name="author" type="xs:string"/>
        </sequence>
      </sequence>
    </type>
    <element name="book" type="book-type"/>
    class Book-type {
      public integer getYear();
      public string getTitle();
      public List getAuthors();
      ..
    }
Automatic mapping, e.g. XMLBeans
4. XML extensions of existing procedural languages
- Examples
  - C-omega, ECMAScript, PHP extensions, Python extensions, etc.
- Most of them define
  - A way of importing XML data into their native type system
  - A rich API for XML data manipulation
  - A way of navigating/searching/querying the XML data via their extensions (XPath-based or XPath-inspired)
5. Native XML processing: XSLT and XQuery
- Most promising alternative for the future
- The only alternative such that
  - the data is modeled only once
  - it is well integrated with the XML Schema type system
  - it preserves the logical/physical data independence
  - the code deals with non-generic structures
- Data is stored
  - in plain file systems or in sophisticated data stores (e.g. XML extensions of relational stores)
- Currently an incomplete solution
  - No procedural logic
- Other disadvantages (hopefully temporary)
  - Perceived complexity
  - Perceived loss of performance
Why XQuery?
- Why a query language for XML?
  - Need to process XML data
  - Preserve logical/physical data independence
    - The semantics is described in terms of an abstract data model, independent of the physical data storage
  - Declarative programming
    - Such programs should describe the what, not the how
- Why a native query language? Why not SQL?
  - We need to deal with the specificities of XML (hierarchical, ordered, textual, potentially schema-less structure)
- Why another XML processing language? Why not XSLT?
  - The template nature of XSLT was not appealing to the database people. Not declarative enough.
What is XQuery?
- A programming language that can express arbitrary XML-to-XML data transformations
  - Logical/physical data independence
  - Declarative
  - High level
  - Side-effect free
  - Strongly typed language
- An expression language for XML
  - Commonalities with functional programming, imperative programming and query languages
- The "query" part might be a misnomer
XQuery Use Case Scenarios
- XML transformation language in Web Services
  - Large and very complex queries
  - Input message + external data sources
  - Small and medium size data sets (xK -> xM)
  - Transient and streaming data (no indexes)
  - With or without schema validation
- XML message brokers
  - Simple path expressions, single input message
  - Small data sets
  - Transient and streaming data (no indexes)
  - Mostly non schema validated data
- Semantic data verification
  - Mostly messages
  - Potentially complex (but small) queries
  - Streaming and multiquery optimization required
XQuery Usage Scenarios (ctd.)
- Data Integration
  - Complex but smaller queries (FLWORs, aggregates, constructors)
  - Large, persistent, external data repositories
  - Dynamic data (via Web Services invocations)
- Large volumes of centralized XML data
  - Logs and archives
  - Complex queries (statistics, analytics)
  - Mostly read only
- Large content repositories
  - Large volume of data (books, manuals, etc.)
  - With or without schema validation
  - Full text essential, update required
XQuery Usage Scenarios (ctd.)
- Large volumes of distributed textual data
  - XML search engines
  - High volume of data sources
  - Full text, semantic search crucial
- RSS data
  - High number of input data channels
  - Data is pushed, not pulled
  - Structure of the data very simple, each item bounded size
  - Aggregators using mostly full-text search
XQuery Implementations
- Open Source
  - Saxon (Michael Kay)
  - Galax (AT&T, Mary Fernandez)
- Commercial
  - BEA Systems (WebLogic Integration)
  - IBM, Microsoft, Oracle (with DB products)
  - Some freelancers
- Visit www.w3c.org/xquery
Roadmap for XQuery
- XML Data Model
- XML Type System
- XQuery Environment
- XQuery Expressions
- XUpdate
- Examples
XML Data Model
- Abstract (i.e. logical) data model for XML data
- Same role for XQuery as the relational data model for SQL
- Purely logical: no standard storage or access model (on purpose)
- XQuery is closed with respect to the Data Model
[Figure labels: XQuery, XPath 2.0, XSLT 2.0; XML Data Model; Infoset; PSVI]
XML Data Model life cycle
[Figure: life cycle of an XQuery Data Model instance. Labels: .xml, parse, validate (.xsd), XQuery Data Model, XPath 2.0, XQuery, XSLT 2.0, XQuery Data Model, serialize, .xml, application-dependent]
XML Data Model
Remember Lisp?
- Instance of the data model
  - a sequence composed of zero or more items
  - The empty sequence is often considered as the null value
- Items
  - nodes or atomic values
- Nodes
  - document, element, attribute, text, namespace, PI, comment
- Atomic values
  - Instances of all XML Schema atomic types
    - string, boolean, ID, IDREF, decimal, QName, URI, ...
  - untyped atomic values
- Typed (i.e. schema validated) and untyped (i.e. non schema validated) nodes and values
Sequences
- Can be heterogeneous (nodes and atomic values)
  - (<a/>, 3)
- Can contain duplicates (by value and by identity)
  - (1, 1, 1)
- Are not necessarily ordered in document order
- Nested sequences are automatically flattened
  - (1, 2, (3, 4)) = (1, 2, 3, 4)
- Single items and singleton sequences are the same
  - 1 = (1)
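The last two rules (flattening, and the identity of an item with its singleton sequence) can be mimicked in a few lines. This is a rough sketch, assuming that Python tuples stand in for sequence constructors and lists for data model instances; `flatten` and `as_sequence` are names invented for the illustration.

```python
def flatten(items):
    """Flatten arbitrarily nested tuples into one flat list of items."""
    flat = []
    for item in items:
        if isinstance(item, tuple):      # a nested sequence: splice its items in
            flat.extend(flatten(item))
        else:                            # a single item (node or atomic value)
            flat.append(item)
    return flat


def as_sequence(value):
    """Treat a single item and a singleton sequence as the same thing."""
    return flatten(value if isinstance(value, tuple) else (value,))


print(flatten((1, 2, (3, 4))))
print(as_sequence(1) == as_sequence((1,)))
```

Because flattening is unconditional, a sequence can never contain another sequence, which is the main difference from Lisp lists.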
Atomic values
- The values of the 19 atomic types available in XML Schema
  - E.g. xs:integer, xs:boolean, xs:date
- All the user-defined derived atomic types
  - E.g. myNS:ShoeSize
- xdt:untypedAtomic
- Atomic values carry their type together with the value
  - (8, myNS:ShoeSize) is not the same as (8, xs:integer)
XML nodes
- 7 types of nodes
  - document, element, attribute, text, namespace, PI, comment
- Every node has a unique node identifier
  - Scope of node identifier uniqueness is implementation dependent
- Nodes have children and an optional parent
  - conceptual tree
- Nodes are ordered based on the topological order in the tree (document order)
Node accessors
- base-uri: xs:anyURI?
- document-uri: xs:anyURI?
- node-kind: xs:string
- node-name: xs:QName?
- parent: node()?
- string-value: xs:string
- typed-value: xdt:anyAtomicType*
- type-name: xs:QName?
- children: node()*
- attributes: attribute()*
- namespaces: node()*
- nilled: xs:boolean?
Example of well-formed XML
    <book year="1967">
      <title>The politics of experience</title>
      <author>R.D. Laing</author>
    </book>
- 3 element nodes, 1 attribute node, 2 text nodes
- name(book element) = book, in no namespace
- In the absence of schema validation
  - type(book element) = xdt:untyped
  - type(author element) = xdt:untyped
  - type(year attribute) = xdt:untypedAtomic
  - typed-value(author element) = ("R.D. Laing", xdt:untypedAtomic)
  - typed-value(year attribute) = ("1967", xdt:untypedAtomic)
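The node counts quoted for this document can be checked with any DOM parser. The sketch below uses Python's `xml.dom.minidom`; it assumes the document is written without whitespace between tags, because whitespace-only text would otherwise be reported as additional text nodes. `count_nodes` is a helper invented for the illustration.

```python
from xml.dom import minidom

# The example document, written without whitespace between tags (an assumption
# made so that only the two meaningful text nodes exist).
XML = ('<book year="1967"><title>The politics of experience</title>'
       '<author>R.D. Laing</author></book>')


def count_nodes(node, counts):
    """Walk the tree and tally element, attribute and text nodes."""
    if node.nodeType == node.ELEMENT_NODE:
        counts["element"] += 1
        counts["attribute"] += len(node.attributes)
    elif node.nodeType == node.TEXT_NODE:
        counts["text"] += 1
    for child in node.childNodes:
        count_nodes(child, counts)
    return counts


counts = count_nodes(minidom.parseString(XML), {"element": 0, "attribute": 0, "text": 0})
print(counts)
```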
XML schema example
    <type name="book-type">
      <sequence>
        <attribute name="year" type="xs:integer"/>
        <element name="title" type="xs:string"/>
        <sequence minOccurs="0">
          <element name="author" type="xs:string"/>
        </sequence>
      </sequence>
    </type>
    <element name="book" type="book-type"/>
Schema validated XML data
    <book year="1967">
      <title>The politics of experience</title>
      <author>R.D. Laing</author>
    </book>
- After schema validation
  - type(book element) = uri:book-type
  - type(author element) = xs:string
  - type(year attribute) = xs:integer
  - typed-value(author element) = ("R.D. Laing", xs:string)
  - typed-value(year attribute) = (1967, xs:integer)
- Schema validation impacts the data model representation and therefore the XQuery semantics!
Lexical and binary aspect
- Every node holds (logically) redundant information
  - <a xsi:type="xs:integer">001</a>
  - dm:string-value() = "001" as xs:string
  - dm:typed-value()
    - "001" as an xdt:untypedAtomic before validation
    - 1 as an xs:integer after validation
- Implementations can store
  - The string value
    - Retrieve the typed value dynamically based on the type, every time it is needed
  - The typed value
    - Retrieve an acceptable lexical value for that type every time this is required
  - Both
- In case of unvalidated data the two are the same
XML queries
- An XQuery basic structure
  - a prolog + an expression
- Role of the prolog
  - Populate the context where the expression is compiled and evaluated
- Prolog contains
  - namespace definitions
  - schema imports
  - default element and function namespace
  - function definitions
  - collation declarations
  - function library imports
  - global and external variable definitions
  - etc.
XQuery expressions
- XQuery Expr := Constants | Variable | FunctionCalls | PathExpr |
    ComparisonExpr | ArithmeticExpr | LogicExpr |
    FLWRExpr | ConditionalExpr | QuantifiedExpr |
    TypeSwitchExpr | InstanceofExpr | CastExpr |
    UnionExpr | IntersectExceptExpr |
    ConstructorExpr | ValidateExpr
- Expressions can be nested with full generality!
- Functional programming heritage.
Constants
- XQuery grammar has built-in support for
  - Strings: "125.0" or '125.0'
  - Integers: 150
  - Decimal: 125.0
  - Double: 125.e2
- 19 other atomic types available via XML Schema
- Values can be constructed
  - with constructors in the F&O doc: fn:true(), fn:date("2002-5-20")
  - by casting
  - by schema validation
Variables
- $ + QName
- bound, not assigned
- XQuery does not allow variable assignment
- created by let, for, some/every, typeswitch expressions, function parameters
- example
    let $x := (1, 2, 3)
    return count($x)
- above scoping ends at conclusion of return expression
A built-in function sampler
- fn:document(xs:anyURI) => document?
- fn:empty(item*) => boolean
- fn:index-of(item*, item) => xs:unsignedInt?
- fn:distinct-values(item*) => item*
- fn:distinct-nodes(node*) => node*
- fn:union(node*, node*) => node*
- fn:except(node*, node*) => node*
- fn:string-length(xs:string?) => xs:integer?
- fn:contains(xs:string, xs:string) => xs:boolean
- fn:true() => xs:boolean
- fn:date(xs:string) => xs:date
- fn:add-date(xs:date, xs:duration) => xs:date
- See the Functions and Operators W3C specification
Constructing sequences
- (1, 2, 2, 3, 3, <a/>, <b/>)
- "," is the sequence concatenation operator
- Nested sequences are flattened
  - (1, 2, 2, (3, 3)) => (1, 2, 2, 3, 3)
- range expressions: (1 to 3) => (1, 2, 3)
Combining sequences
- Union, Intersect, Except
- Work only for sequences of nodes, not atomic values
- Eliminate duplicates and reorder to document order
  - $x := <a/>, $y := <b/>, $z := <c/>
  - ($x, $y) union ($y, $z) => (<a/>, <b/>, <c/>)
- The F&O specification provides other functions and operators, e.g. fn:distinct-values() and fn:distinct-nodes() are particularly useful
Arithmetic expressions
- 1 + 4, $a div 5
- 5 div 6, $b mod 10
- 1 - (4 * 8.5), -55.5
- <a>42</a> + 1, <a>baz</a> + 1
- validate { <a xsi:type="xs:integer">42</a> } + 1
- validate { <a xsi:type="xs:string">42</a> } + 1
- Apply the following rules
  - atomize all operands; if either operand is (), => ()
  - if an operand is untyped, cast to xs:double (if unable, => error)
  - if the operand types differ but can be promoted to a common type, do so (e.g. xs:integer can be promoted to xs:double)
  - if the operator is consistent with the types, apply it; the result is either an atomic value or an error
  - if the type is not consistent, throw a type exception
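The rule list above can be approximated for the `+` operator as follows. This is a simplified model, not an XQuery implementation: operands are assumed to be already atomized, `None` stands for the empty sequence, and the `Untyped` class and `xq_add` function are names invented for the illustration.

```python
class Untyped(str):
    """Hypothetical stand-in for an untyped (non-validated) atomic value."""


def xq_add(left, right):
    """Add two already-atomized operands; None models the empty sequence ()."""
    # If either operand is the empty sequence, the result is the empty sequence.
    if left is None or right is None:
        return None
    # Untyped operands are cast to double; float() raises ValueError when it cannot.
    operands = [float(v) if isinstance(v, Untyped) else v for v in (left, right)]
    # Operands that are still not numeric are a type error.
    for value in operands:
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"operand {value!r} is not numeric")
    # Python promotes int to float by itself, mirroring integer-to-double promotion.
    return operands[0] + operands[1]


print(xq_add(Untyped("42"), 1))   # untyped 42 is cast to a double first
print(xq_add(None, 1))            # the empty sequence propagates
```

With this model, an untyped `baz` fails at the cast, while a typed string `"42"` fails the type check, which corresponds to the two error cases in the examples on the slide.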
Atomization
- If every item in the input sequence is either an atomic value or a node whose typed value is a sequence of atomic values, then return them
- Otherwise, raise a type error.
- fn:data(node) extracts the typed value of a node.
- Often implicit
  - In arithmetic, comparisons, function calls, node constructors, sorting, etc.
Logical expressions
- expr1 and expr2
- expr1 or expr2
- fn:not() as a function
- return true, false
- Different from SQL
  - two-valued logic, not three-valued logic
- Different from imperative languages
  - and, or are commutative
- Rules
  - first compute the Effective Boolean Value (EBV) for each operand
    - if (), "", NaN, 0, then return false
    - if the operand is of type xs:boolean, return it
    - if the operand is a sequence whose first item is a node, return true
    - else raise an error
  - then use standard two-valued Boolean logic on the two EBVs as appropriate
- false and error => false or error! (non-deterministically)
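The boolean-value rules above can be written down as a small function. This is a sketch under stated assumptions: sequences are Python lists, the `Node` class is a placeholder invented for the illustration, and non-empty strings and non-zero numbers (which the slide leaves implicit) are treated as true, as in the XQuery 1.0 Recommendation.

```python
import math


class Node:
    """Hypothetical placeholder for an XML node."""


def effective_boolean_value(seq):
    """Boolean value of a sequence (a Python list), following the rules above."""
    if len(seq) == 0:
        return False                        # ()
    first = seq[0]
    if isinstance(first, Node):
        return True                         # first item is a node
    if len(seq) == 1:
        if isinstance(first, bool):
            return first                    # xs:boolean is returned as is
        if isinstance(first, str):
            return first != ""              # "" is false
        if isinstance(first, (int, float)):
            is_nan = isinstance(first, float) and math.isnan(first)
            return not (first == 0 or is_nan)
    raise TypeError("no effective boolean value")


def xq_and(left, right):
    """Two-valued 'and' over the boolean values of both operands."""
    return effective_boolean_value(left) and effective_boolean_value(right)


print(xq_and([Node()], [1]))
```

Python's `and` always evaluates left to right, so this sketch is deterministic; an XQuery engine is free to evaluate either operand first, which is where the "false or error" outcome on the slide comes from.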
Comparisons
Value and general comparisons
- <a>42</a> eq "42" => true
- <a>42</a> eq 42 => error
- <a>42</a> eq "42.0" => false
- <a>42</a> eq 42.0 => error
- <a>42</a> = 42 => true
- <a>42</a> = 42.0 => true
- <a>42</a> eq <b>42</b> => true
- <a>42</a> eq <b> 42</b> => false
- <a>baz</a> eq 42 => error
- () eq 42 => ()
- () = 42 => false
- (<a>42</a>, <b>43</b>) = 42.0 => true
- (<a>42</a>, <b>43</b>) = 42 => true
- ns:shoesize(5) eq ns:hatsize(5) => true
- (1,2) = (2,3) => true
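The last lines of the table rely on the existential meaning of the general comparison `=`: it is true as soon as one pair of items compares equal. A minimal sketch, assuming already-atomized operands held in Python lists and ignoring the type-casting rules; `general_equals` is a name invented for the illustration.

```python
def general_equals(left, right):
    """General comparison '=': true if any pair of items compares equal."""
    return any(l == r for l in left for r in right)


print(general_equals([1, 2], [2, 3]))   # shares the item 2
print(general_equals([], [42]))         # no pair exists at all
# '=' is not transitive under this definition:
print(general_equals([1, 2], [2, 3]), general_equals([2, 3], [3, 4]),
      general_equals([1, 2], [3, 4]))
```

The last line shows that the relation holds for the first two pairs of sequences but not for the outer pair, so the general comparison is not transitive.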
Conditional expressions
    if ( $book/@year < 1980 )
    then ns:WS(<old>{$x/title}</old>)
    else ns:WS(<new>{$x/title}</new>)
- Only one branch allowed to raise execution errors
- Impacts scheduling and parallelization
Path Expressions by Example
- Names of all family members (Navigation): /family/member/name (Projection)
- Names of four-year-olds: /family/member[@age = 4]/name (Selection)
- Name of the second eldest: /family/member[2]/name (Selection, Ranking)
- Names of members who have a hobby: /family/member[hobby]/name (Selection by Type)
- All names (of anything): //name (Transitive Closure, Recursion)
Path expressions
- Second order expression
  - expr1 / expr2
- Semantics
  - Evaluate expr1 => sequence of nodes
  - Bind . to each node in this sequence
  - Evaluate expr2 with this binding => sequence of nodes
  - Concatenate the partial sequences
  - Eliminate duplicates
  - Sort by document order
- Implicit iteration
- A standalone step is an expression
  - step = (axis, nodeTest) where
  - nodeTest = (node kind, node name, node type)
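The six semantic steps listed for `expr1 / expr2` can be traced in ordinary code. The sketch below uses Python's `xml.etree.ElementTree`; the sample document, the `path` helper and the use of `id()` as node identity are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET


def path(nodes, step, doc_order):
    """expr1 / expr2: evaluate `step` once per context node, then merge the results."""
    partial = []
    for context in nodes:                   # bind "." to each node produced by expr1
        partial.extend(step(context))       # evaluate expr2 and concatenate
    unique = {id(n): n for n in partial}    # eliminate duplicates by node identity
    return sorted(unique.values(), key=lambda n: doc_order[id(n)])  # document order


# Hypothetical sample document.
root = ET.fromstring("<bib><book><title>A</title></book><book><title>B</title></book></bib>")
doc_order = {id(n): i for i, n in enumerate(root.iter())}   # pre-order position of each node

books = root.findall("book")
# expr1 deliberately yields the books out of order, with one of them twice.
titles = path([books[1], books[0], books[0]], lambda n: n.findall("title"), doc_order)
print([t.text for t in titles])
```

The duplicate elimination and the final sort are what make path expressions costly, and they are the operations an optimizer tries to prove unnecessary.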
More on XPath expressions
- A stand-alone step is an expression
- Any kind of expression can be a step!
- Two syntaxes for steps: abbreviated or not
- Step in the non-abbreviated syntax
  - axis::nodeTest
- Axes control the navigation direction in the tree
  - attribute, child, descendant, descendant-or-self, parent, self
  - The other XPath 1.0 axes are optional
- Node test by
  - Name (e.g. publisher, myNS:publisher, *:publisher, myNS:*, *:*)
  - Kind of item (e.g. node(), comment(), text())
  - Type test (e.g. element(ns:PO, ns:PoType), attribute(*, xs:integer))
Long syntax of XPath
- document("bibliography.xml")/child::bib
- $x/child::bib/child::book/attribute::year
- $x/parent::*
- $x/child::*/descendant::comment()
- $x/child::element(*, ns:PoType)
- $x/attribute::attribute(*, xs:integer)
- $x/ancestor::document(schema-element(ns:PO))
- $x/(child::element(*, xs:date) | attribute::attribute(*, xs:date))
- $x/f(.)
XPath abbreviated syntax
- Axis can be missing
  - By default the child axis
  - $x/child::person -> $x/person
- Short-hands for common axes
  - Descendant-or-self
    - $x/descendant-or-self::*/child::comment() -> $x//comment()
  - Parent
    - $x/parent::* -> $x/..
  - Attribute
    - $x/attribute::year -> $x/@year
  - Self
    - $x/self::* -> $x/.
XPath filter predicates
- Syntax
  - expression1 [ expression2 ]
  - [ ] is an overloaded operator
- Filtering by position (if numeric value)
  - /book[3]
  - /book[3]/author[1]
  - /book[3]/author[1 to 2]
- Filtering by predicate
  - //book[author/firstname = "ronald"]
  - //book[@price < 25]
  - //book[count(author[@gender = "female"]) > 0]
- Classical XPath mistake
  - $x/a/b[1] means $x/a/(b[1]) and not ($x/a/b)[1]
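The classical mistake is easy to reproduce. The sketch below uses the XPath subset of Python's `xml.etree.ElementTree`, which applies a positional predicate per parent in the same way; the sample document is an assumption made for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the first <a> has two <b> children, the second has one.
x = ET.fromstring("<x><a><b>1</b><b>2</b></a><a><b>3</b></a></x>")

# $x/a/b[1]: the predicate applies per parent, so every <a> contributes its first <b>.
first_b_of_each_a = [b.text for b in x.findall("a/b[1]")]

# ($x/a/b)[1]: the predicate applies to the whole result, so a single node remains.
first_b_overall = [b.text for b in x.findall("a/b")[:1]]

print(first_b_of_each_a, first_b_overall)
```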
Simple iteration expression
- Syntax
    for variable in expression1
    return expression2
- Example
    for $x in document("bib.xml")/bib/book
    return $x/title
- Semantics
  - bind the variable to each root node of the forest returned by expression1
  - for each such binding evaluate expression2
  - concatenate the resulting sequences
  - nested sequences are automatically flattened
Static context
- XPath 1.0 compatibility mode
- Statically known namespaces
- Default element/type namespace
- Default function namespace
- In-scope schema definitions
- In-scope variables
- In scope function signatures
- Statically known collations
- Default collation
- Construction mode
- Ordering mode
- Boundary space policy
- Copy namespace mode
- Base URI
- Statically known documents and collections
- change XQuery expression semantics
- impact compilation
- can be set by the application or by prolog declarations
Dynamic context
- Values for external variables
- Values for the current item, current position and size
- Implementation for external functions
- Current date and time
- Implicit timezone
- Available documents and collections
XQuery processing
Local variable declaration
- Syntax
    let variable := expression1
    return expression2
- Example
    let $x := document("bib.xml")/bib/book
    return count($x)
- Semantics
  - bind the variable to the result of expression1
  - add this binding to the current environment
  - evaluate and return expression2
FLW(O)R expressions
- Syntactic sugar that combines FOR, LET, IF
- Example
    for $x in //bib/book                              /* similar to FROM in SQL */
    let $y := $x/author                               /* no analogy in SQL */
    where $x/title = "The politics of experience"     /* similar to WHERE in SQL */
    return count($y)                                  /* similar to SELECT in SQL */
This slide is not up-to-date; it omits ORDER BY.
FLWR expression semantics
- FLWR expression
    for $x in //bib/book
    let $y := $x/author
    where $x/title = "Ulysses"
    return count($y)
- Equivalent to
    for $x in //bib/book
    return (let $y := $x/author
            return
              if ($x/title = "Ulysses")
              then count($y)
              else ()
           )
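The equivalence above (a where clause is an if that returns the empty sequence in its else branch) can be checked on a small document. This is a sketch in Python with `xml.etree.ElementTree`; the sample data and the two function names are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
BIB = ET.fromstring(
    "<bib>"
    "<book><title>Ulysses</title><author>A1</author></book>"
    "<book><title>Other</title><author>A2</author><author>A3</author></book>"
    "</bib>"
)


def flwr(bib):
    """for / let / where / return, written as one filtering comprehension."""
    return [
        len(authors)                                   # return count($y)
        for book in bib.findall("book")                # for $x in //bib/book
        for authors in [book.findall("author")]        # let $y := $x/author
        if book.findtext("title") == "Ulysses"         # where $x/title = "Ulysses"
    ]


def desugared(bib):
    """The same query with the where clause turned into if/else ()."""
    result = []
    for book in bib.findall("book"):
        authors = book.findall("author")
        partial = [len(authors)] if book.findtext("title") == "Ulysses" else []
        result.extend(partial)                         # empty partial results vanish
    return result


print(flwr(BIB), desugared(BIB))
```

The rewrite works because concatenating an empty sequence adds nothing, so a rejected binding simply leaves no trace in the result.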
More FLWR expression examples
- Selections
    for $b in document("bib.xml")//book
    where $b/publisher = "Springer Verlag" and
          $b/@year = "1998"
    return $b/title
- Joins
    for $b in document("bib.xml")//book,
        $p in //publisher
    where $b/publisher = $p/name
    return ($b/title, $p/address)
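Read operationally, the join query is a nested loop over both bindings with the where clause as the join predicate. A rough sketch in Python with `xml.etree.ElementTree`, using invented sample data and a hypothetical `join_books_publishers` helper; a real engine would be free to pick a hash or index join instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: two books and one publisher record.
DOC = ET.fromstring(
    "<root>"
    "<book><title>T1</title><publisher>P1</publisher></book>"
    "<book><title>T2</title><publisher>P2</publisher></book>"
    "<publisher><name>P1</name><address>Addr1</address></publisher>"
    "</root>"
)


def join_books_publishers(doc):
    """Nested-loop join of //book with //publisher on publisher name."""
    result = []
    for b in doc.iter("book"):                       # for $b in //book
        for p in doc.iter("publisher"):              # $p in //publisher
            b_pub, p_name = b.findtext("publisher"), p.findtext("name")
            # An absent value never matches, like a comparison with ().
            if b_pub is not None and b_pub == p_name:
                result.append((b.findtext("title"), p.findtext("address")))
    return result


pairs = join_books_publishers(DOC)
print(pairs)
```

As in the query, `//publisher` also visits the `publisher` elements nested inside books; they have no `name` child, so the predicate rejects them.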
The O in FLW(O)R expressions
- Syntactic sugar that combines FOR, LET, IF
- Syntax
    for $x in //bib/book                  /* similar to FROM in SQL */
    let $y := $x/author                   /* no analogy in SQL */
    stable order by ( expr empty-handling? asc-vs-desc? collation? )   /* similar to ORDER BY in SQL */
    return count($y)                      /* similar to SELECT in SQL */
Node constructors
- Constructing new nodes
  - elements
  - attributes
  - documents
  - processing instructions
  - comments
  - text
- Side-effect operation
  - Affects optimization and expression rewriting
- Element constructors create local scopes for namespaces
  - Affects optimization and expression rewriting
Literal vs. evaluated element content
    <result>
      literal text content
    </result>

    <result>
      { $x/name } {-- evaluated content --}
    </result>

    <result>
      some content here { $x/name } and some more here
    </result>
- Braces "{}" are used to delineate evaluated content
- The same works for attributes
Nested scopes
    declare namespace ns = "uri1";
    for $x in fn:doc("uri")/ns:a
    where $x/ns:b eq 3
    return
      <result xmlns:ns="uri2">
        { for $x in fn:doc("uri")/ns:a
          return $x/ns:b }
      </result>
Local scopes impact optimization and rewriting!
Operators on datatypes
- expression instance of sequenceType
  - returns true if its first operand is an instance of the type named in its second operand
- expression castable as singleType
  - returns true if the first operand can be cast to the given single type
- expression cast as singleType
  - used to convert a value from one datatype to another
- expression treat as sequenceType
  - treats an expression as if its datatype is a subtype of its static type (down cast)
- typeswitch
  - case-like branching based on the type of an input expression
Schema validation
- Explicit syntax
  - validate [validation mode] { expression }
- Validation mode: strict or lax
- Semantics
  - Translate XML Data Model to Infoset
  - Apply XML Schema validation
  - Ignore identity constraints checks
  - Map resulting PSVI to a new XML Data Model instance
- It is not a side-effect operation
Ignoring order
- In the original application XML was totally ordered
  - XPath 1.0 preserves the document order through implicit expensive sorting operations
- In many cases the order is not semantically meaningful
  - The evaluation can be optimized if the order is not required
- ordered { expr } and unordered { expr }
- Affect path expressions, FLWR without order clause, union, intersect, except
- Leads to non-determinism
- Semantics of expressions is again context sensitive
    let $x := (//a)[1]
    return unordered { $x/b }

    unordered { (//a)[1]/b }
Functions in XQuery
- In-place XQuery functions
    declare function ns:foo($x as xs:integer) as element()
    { <a>{ $x + 1 }</a> };
  - Can be recursive and mutually recursive
- External functions
XQuery functions as database views
How to pass input data to a query?
- External variables (bound through an external API)
    declare variable $x as xs:integer external;
- Current item (bound through an external API)
    .
- External functions (bound through an external API)
    declare function ora:sql($x as xs:string) as node()* external;
- Specific built-in functions
    fn:doc(uri), fn:collection(uri)
XQuery optional features
- Schema import feature
- Static typing feature
- Full axis feature
- Module feature
Typed vs. untyped XML Data
- Untyped data (non XML Schema validated)
  - <a>3</a> eq 3
  - <a>3</a> eq "3"
- Typed data (after XML Schema validation)
  - <a xsi:type="xs:integer">3</a> eq 3
  - <a xsi:type="xs:string">3</a> eq 3
  - <a xsi:type="xs:integer">3</a> eq "3"
  - <a xsi:type="xs:string">3</a> eq "3"
XML data equivalence
- XQuery has multiple notions of data equality
  - =, eq, is, fn:deep-equal()
- Expected properties
  - Transitivity, reflexivity and symmetry
  - Necessary for grouping, indexing and hashing
- Additional property
  - if (data1 equal data2) then (f(data1) equal f(data2))
  - Necessary for memoization, caching
- None of the equality relationships above (except "is") satisfies those properties
  - The "is" relationship only applies to nodes
- Careful implementations are needed for indexes, hashing, caches
XQuery type system
- XQuery has a powerful (and complex!) type system
- XQuery types are imported from XML Schemas
- Every XML data model instance has a dynamic type
- Every XQuery expression has a static type
- Pessimistic static type inference
- The goal of the type system is to
  - detect errors in the queries statically
  - infer the type of the result of valid queries
  - ensure statically that the result of a given query is of a given (expected) type if the input dataset is guaranteed to be of a given type
XQuery type system components
- Atomic types
  - xdt:untypedAtomic
  - All 19 primitive XML Schema types
  - All user-defined atomic types
- Empty, None
- Type constructors (simplification!)
  - Elements: element name {type}
  - Attributes: attribute name {type}
  - Alternation: type1 | type2
  - Sequence: type1, type2
  - Repetition: type*
  - Interleaved product: type1 & type2
- type1 intersect type2 ?
- type1 subtype of type2 ?
- type1 equals type2 ?
SQL vs. XQuery
- XQuery has implicit operations
  - casts, exists, duplicate elimination, sorting, ...
  - Important for heterogeneous data
  - Important for queries if the schema is unknown
- XQuery has constructors
  - Important for transformations of messages (Info Hubs)
- XQuery can be used for documents
  - Important for natural-language processing, CMS
- XQuery is Turing-complete
  - Can be extended to be a full-fledged PL
- XQuery has formal semantics
  - Easier to implement, optimize, and teach (???)
- Give Company of Authors (implicit exists)
    for $a in <address>
                <name>Chamberlin</name>
                <company>IBM</company>
              </address> ...
    for $b in <article>
                <author>Chamberlin</author>
                <author>Florescu</author> ...
              </article>
    where $a//name = $b//author
    return $a//company
SQL vs. XQuery
- SELECT auttor FROM article
  - ERROR!
- article/auttor
  - () or ERROR!
Constructors / Transformation
- This is legal XQuery
    <book>
      <author>Chamberlin</author>
      <title>DB2 Universal Database</title>
    </book>
- This is also legal XQuery
    <book>
      <author>{ address[company = "IBM"]/name }
      </author>
      <title>DB2 Universal Database</title>
    </book>
- SQL needs DDL Operations (Administrator) for this!
Transformation
- Group Books by Author
    for $a in distinct-values(bib//author)
    let $t := bib//*[author = $a]//title
    return
      <bib>
        <author name="{$a}">{ $t }</author>
      </bib>
Transformation
- Group Books by Author
    <bib>
      <book>
        <author>Chamberlin</author>
        <title>DB2 Universal database</title>
      </book>
      ...
    </bib>
Transformation
- Group Books by Author
    <bib>
      <author name="Chamberlin">
        <title>DB2 Universal database</title>
        <title>Quilt: An XML Query...</title>
        ...
      </author>
      ...
    </bib>
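The grouping transformation shown on the preceding slides (distinct author values, then the titles of every book by that author) can be mirrored step by step outside XQuery. This is a sketch in Python with `xml.etree.ElementTree`; the placeholder authors and titles and the `group_by_author` helper are assumptions made for the illustration, and all groups are collected under a single `bib` element.

```python
import xml.etree.ElementTree as ET

# Hypothetical input with placeholder authors and titles.
BIB = ET.fromstring(
    "<bib>"
    "<book><author>A1</author><title>T1</title></book>"
    "<book><author>A1</author><author>A2</author><title>T2</title></book>"
    "</bib>"
)


def group_by_author(bib):
    """Regroup book titles under one <author name="..."> element per distinct author."""
    names = []
    for author in bib.iter("author"):                 # distinct-values(bib//author)
        if author.text not in names:
            names.append(author.text)
    out = ET.Element("bib")
    for name in names:                                # for $a in ...
        group = ET.SubElement(out, "author", name=name)
        for book in bib.iter("book"):                 # books having author = $a
            if any(a.text == name for a in book.findall("author")):
                for title in book.findall("title"):
                    ET.SubElement(group, "title").text = title.text
    return out


grouped = ET.tostring(group_by_author(BIB), encoding="unicode")
print(grouped)
```

A book with two authors shows up in both groups, which is the implicit existential comparison at work: a book qualifies as soon as any of its authors matches.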
Library modules (example)
Library module
    module namespace mod = "moduleURI";
    declare namespace ns = "URI1";
    declare variable $mod:zero as xs:integer := 0;
    declare function mod:add($x as xs:integer, $y as xs:integer)
      as xs:integer
    {
      $x + $y
    };
Importing module
    import module namespace ns = "moduleURI";
    ns:add(2, $ns:zero)
Some missing functionalities
- Standard semantics for Web services invocation
- Try-catch mechanism
- Window-based aggregates
- Group by
- Distinct by
- Eval() function
- Full text search
- Updates
- Integrity constraints / assertions
- Metadata introspection
XQuery Full Text Search Extension
- Complete specification
- Current W3C Working Draft
- Examples
    /book[@year = "2004" and ./title ftcontains "Expert"]

    for $book in /book[.//author ftcontains "Laing"]
    let $score := ft:score($book/title ftcontains "Web Site Usability")
    where $score > 0.8
    order by $score descending
    return $book/@number
A fraction of a real customer XQuery
let $wlc := document("tests/ebsample/data/ebSample.xml")
let $ctrlPackage := "foo.pkg"
let $wfPath := "test"
let $tp-list :=
  for $tp in $wlc/wlc/trading-partner
  return
    <trading-partner
      name="{$tp/@name}"
      business-id="{$tp/party-identifier/@business-id}"
      description="{$tp/@description}"
      notes="{$tp/@notes}"
      type="{$tp/@type}"
      email="{$tp/@email}"
      phone="{$tp/@phone}"
      fax="{$tp/@fax}"
      username="{$tp/@user-name}"
    >
      {
        for $tp-ad in $tp/address
        return $tp-ad
      }
      {
        for $eps in $wlc/extended-property-set
        where $tp/@extended-property-set-name eq $eps/@name
        return $eps
      }
      {
        for $client-cert in $tp/client-certificate
        return
          <client-certificate name="{$client-cert/@name}">
          </client-certificate>
      }
      {
        for $server-cert in $tp/server-certificate
        return
          <server-certificate name="{$server-cert/@name}">
          </server-certificate>
      }
      {
        for $sig-cert in $tp/signature-certificate
        return
          <signature-certificate name="{$sig-cert/@name}">
          </signature-certificate>
      }
      {
        for $enc-cert in $tp/encryption-certificate
        return
          <encryption-certificate name="{$enc-cert/@name}">
          </encryption-certificate>
      }
83 for eb-dc in
tp/delivery-channel for eb-de
in tp/document-exchange for
eb-tp in tp/transport where
eb-dc/_at_document-exchange-name eq eb-de/_at_name
and eb-dc/_at_transport-name
eq eb-tp/_at_name and
eb-de/_at_business-protocol-name eq "ebXML"
return ltebxml-binding
name"eb-dc/_at_name"
business-protocol-name"eb-de/_at_b
usiness-protocol-name"
business-protocol-version"eb-de/_at_protocol-versi
on" \
is-signature-required"eb-dc/_at_nonrepudiation-of-
origin"
is-receipt-signature-required"eb-dc/_at_nonrepudia
tion-of-receipt"
signature-certificate-name"eb-de/EBXML-binding/
_at_signature-certificate-n"
delivery-semantics"eb-de/EBXML-binding/_at_delive
ry-semantics"
if(xfempty(eb-de/EBXML-binding/_at_ttl))
then()
else attribute persist-duration
concat((eb-de/EBXML-binding/_at_ttl
div 1000), " seconds")
84 if( xfempty(eb-de/EBX
ML-binding/_at_retries))
then () else
eb-de/EBXML-binding/_at_retries
if(
xfempty(eb-de/EBXML-binding/_at_retry-interval))
then ()
else attribute retry-interval
concat((eb-de/EBXML-binding/_at_ret
ry-interval div 1000), " seconds")
lttransport
protocol"eb-tp/_at_protocol"
protocol-version"eb-tp/_at_protocol-ve
rsion"
endpoint"eb-tp/endpoint1/_at_uri"
gt
85 for ca in wlc/wlc/collaboration-agreement
for p1 in
ca/party1 for
p2 in ca/party2
for tp1 in wlc/wlc/trading-partner
for tp2 in
wlc/wlc/trading-partner
where p1/_at_delivery-channel-name eq
eb-dc/_at_name and
tp1/_at_name eq p1/_at_trading-partner-name
and tp2/_at_name eq
p2/_at_trading-partner-name
or p2/_at_delivery-channel-name eq
eb-dc/_at_name and
tp1/_at_name eq p1/_at_trading-partner-name
and tp2/_at_name eq
p2/_at_trading-partner-name
86 return
if (p1/_at_trading-partner-nametp/_at_name)
then
ltauthentication
client-partner-name"tp2/_at_name"
client-certificate-name"tp2/client-certificate/
_at_name"
client-authentication"
if(xfempty(tp2/client-certificate))
then
"NONE"
else "SSL_CERT_MUTUAL"
"
server-certificate-n
ame"
if(tp1/_at_type"REMOTE")
then
tp1/server-certificate/_at_name
else ""
"
server-authentication"
if(eb-tp/_at_protocol"htt
p")
then "NONE"
else "SSL_CERT"
"
87 gt
lt/authenticationgt
else
ltauthentication
client-partner-name"tp1/_at_na
me"
client-certificate-name"tp1/client-certifica
te/_at_name"
client-authentication"
if(xfempty(tp1/client-certificate))
then
"NONE"
else "SSL_CERT_MUTUAL"
"
server-certificate-n
ame"
if(tp2/_at_type"REMOTE")
then
tp2/server-certificate/_at_name
else ""
"
server-authentication"
if(eb-tp/_at_protocol"htt
p")
then "NONE"
else "SSL_CERT"
"
gt
lt/authenticationgt
88 </transport>
</ebxml-binding>
}
{-- RosettaNet Binding --}
{
for $eb-dc in $tp/delivery-channel
for $eb-de in $tp/document-exchange
for $eb-tp in $tp/transport
where $eb-dc/@document-exchange-name eq $eb-de/@name
  and $eb-dc/@transport-name eq $eb-tp/@name
  and $eb-de/@business-protocol-name eq "RosettaNet"
return
<rosettanet-binding
  name="{$eb-dc/@name}"
  business-protocol-name="{$eb-de/@business-protocol-name}"
  business-protocol-version="{$eb-de/@protocol-version}"
89 is-signature-required="{$eb-dc/@nonrepudiation-of-origin}"
  is-receipt-signature-required="{$eb-dc/@nonrepudiation-of-receipt}"
  signature-certificate-name="{$eb-de/RosettaNet-binding/@signature-certificate-name}"
  encryption-certificate-name="{$eb-de/RosettaNet-binding/@encryption-certificate-name}"
  cipher-algorithm="{$eb-de/RosettaNet-binding/@cipher-algorithm}"
  encryption-level="{
    if ($eb-de/RosettaNet-binding/@encryption-level = 0)
    then "NONE"
    else if ($eb-de/RosettaNet-binding/@encryption-level = 1)
    then "PAYLOAD"
    else "ENTIRE_PAYLOAD"
  }"
  {-- process-timeout="{$eb-de/RosettaNet-binding/@time-out}" --}
>
{
  if (xf:empty($eb-de/RosettaNet-binding/@retries))
  then ()
  else $eb-de/RosettaNet-binding/@retries
}
90 {
  if (xf:empty($eb-de/RosettaNet-binding/@retry-interval))
  then ()
  else attribute retry-interval {
    concat(($eb-de/RosettaNet-binding/@retry-interval div 1000), " seconds")
  }
}
{
  if (xf:empty($eb-de/RosettaNet-binding/@time-out))
  then ()
  else attribute process-timeout {
    concat(($eb-de/RosettaNet-binding/@time-out div 1000), " seconds")
  }
}
<transport
  protocol="{$eb-tp/@protocol}"
  protocol-version="{$eb-tp/@protocol-version}"
  endpoint="{$eb-tp/endpoint[1]/@uri}"
>
91 {
for $ca in $wlc/wlc/collaboration-agreement
for $p1 in $ca/party[1]
for $p2 in $ca/party[2]
for $tp1 in $wlc/wlc/trading-partner
for $tp2 in $wlc/wlc/trading-partner
where $p1/@delivery-channel-name eq $eb-dc/@name
  and $tp1/@name eq $p1/@trading-partner-name
  and $tp2/@name eq $p2/@trading-partner-name
  or $p2/@delivery-channel-name eq $eb-dc/@name
  and $tp1/@name eq $p1/@trading-partner-name
  and $tp2/@name eq $p2/@trading-partner-name
return
  if ($p1/@trading-partner-name = $tp/@name)
  then
92  <authentication
      client-partner-name="{$tp2/@name}"
      client-certificate-name="{$tp2/client-certificate/@name}"
      client-authentication="{
        if (xf:empty($tp2/client-certificate))
        then "NONE"
        else "SSL_CERT_MUTUAL"
      }"
      server-certificate-name="{
        if ($tp1/@type = "REMOTE")
        then $tp1/server-certificate/@name
        else ""
      }"
      server-authentication="{
        if ($eb-tp/@protocol = "http")
        then "NONE"
        else "SSL_CERT"
      }"
    >
    </authentication>
93 else
    <authentication
      client-partner-name="{$tp1/@name}"
      client-certificate-name="{$tp1/client-certificate/@name}"
      client-authentication="{
        if (xf:empty($tp1/client-certificate))
        then "NONE"
        else "SSL_CERT_MUTUAL"
      }"
      server-certificate-name="{
        if ($tp2/@type = "REMOTE")
        then $tp2/server-certificate/@name
        else ""
      }"
      server-authentication="{
        if ($eb-tp/@protocol = "http")
        then "NONE"
        else "SSL_CERT"
      }"
    >
    </authentication>
}
94 </transport>
</rosettanet-binding>
}
</trading-partner>
let $sv :=
for $cd in $wlc/wlc/conversation-definition
for $role in $cd/role
where xf:not(xf:empty($role/@wlpi-template) or $role/@wlpi-template = "")
  and $cd/@business-protocol-name = "ebXML"
  or $cd/@business-protocol-name = "RosettaNet"
return
<servicePair>
<service
  name="{xf:concat($wfPath, $role/@wlpi-template, '.jpd')}"
  description="{$role/@description}"
  note="{$role/@note}"
  service-type="WORKFLOW"
  business-protocol="{xf:upper-case($cd/@business-protocol-name)}"
>
95. . . (60 )
96XQuery Use Case Scenarios
- XML transformation language in Web Services
- Large and very complex queries
- Input message + external data sources
- Small and medium size data sets (xK -> xM)
- Transient and streaming data (no indexes)
- With or without schema validation
- XML message brokers
- Simple path expressions, single input message
- Small data sets
- Transient and streaming data (no indexes)
- Mostly non schema validated data
- Semantic data verification
- Mostly messages
- Potentially complex (but small) queries
- Streaming and multiquery optimization required
97XQuery Usage Scenarios (cont.)
- Data Integration
- Complex but smaller queries (FLWORs, aggregates, constructors)
- Large, persistent, external data repositories
- Dynamic data (via Web Services invocations)
- Large volumes of centralized XML data
- Logs and archives
- Complex queries (statistics, analytics)
- Mostly read only
- Large content repositories
- Large volume of data (books, manuals, etc)
- With or without schema validation
- Full text essential, update required
98XQuery Usage Scenarios (cont.)
- Large volumes of distributed textual data
- XML search engines
- High volume of data sources
- Full text, semantic search crucial
- RSS data
- High volume of input data channels
- Data is pushed, not pulled
- Structure of the data very simple, each item of bounded size
- Aggregators using mostly full-text search
99XQuery Usage Scenarios (cont.)
- Data Integration
- Complex but smaller queries (FLWORs, aggregates, constructors)
- Large, persistent, external data repositories
- Dynamic data (via Web Services invocations)
- Large volumes of centralized XML data
- Logs and archives
- Mostly read only
- Large volumes of distributed textual data
- XML data sources scattered on the Web
- BLOGS
- Lots (e.g. millions) of input data channels
- Data is pushed, not pulled
- Structure of the data very simple, each item of bounded size
- Aggregators using mostly full-text search
100Criteria for XQuery usages
- Type of queries (e.g. simple, complex, construction-intensive, full-text-search intensive)
- Volume of queries
- Native XML or virtual XML views of other forms of data
- XML Schema validated data or not
- Volume of data per query
- Number of data sources
- Transient data vs. persistent data
- Push data vs. pull data
- Typed vs. untyped data
- Read only data vs. updatable data
- Distributed vs. centralized data sets
- Data compressed/encrypted or not
- Target architectures
- Customer expectation
Each scenario requires different processing
techniques.
101XUpdate
102Update XQuery extension
- Activity in W3C is just beginning
- W3C Requirements document
- Use as transformation and as DB operation (side effect)
- Preserve Ids of affected nodes! (No node construction!)
- Tentative examples
- delete //book[@year < 1968]
- insert <author/> into //book[@ISBN = 34556]
- for $x in //book
  where $x/year < 2000 and $x/price > 100
  do replace value of $x/price with $x/price - 0.3 * $x/price
- if ($book/price > 200) then do rename $book as expensive-book
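The tentative delete, replace-value and rename operations above can be mimicked in a general-purpose language to make their intended effect concrete. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the sample `bib` data and the helper names are assumptions made for illustration, not part of the slides or of any W3C syntax. All three helpers modify nodes in place, which is one way to read "Preserve Ids of affected nodes".

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; not taken from the slides.
bib = ET.fromstring(
    "<bib>"
    "<book year='1960'><year>1960</year><price>50</price></book>"
    "<book year='1995'><year>1995</year><price>150</price></book>"
    "<book year='2005'><year>2005</year><price>250</price></book>"
    "</bib>"
)
newest = bib[2]  # kept to check that renaming preserves node identity


def delete_books_before(root, year):
    """Sketch of: delete //book[@year < year]."""
    for parent in list(root.iter()):
        for child in list(parent):
            attr = child.get("year")
            if child.tag == "book" and attr is not None and int(attr) < year:
                parent.remove(child)


def discount_old_expensive(root):
    """Sketch of the bulk update: replace value of $x/price."""
    for book in root.iter("book"):
        year = int(book.findtext("year"))
        price = float(book.findtext("price"))
        if year < 2000 and price > 100:
            # The existing price element is updated, not rebuilt.
            book.find("price").text = f"{price - 0.3 * price:g}"


def rename_expensive(root):
    """Sketch of the conditional update: rename book as expensive-book."""
    for book in list(root.iter("book")):
        if float(book.findtext("price")) > 200:
            book.tag = "expensive-book"  # same object, new name


delete_books_before(bib, 1968)
discount_old_expensive(bib)
rename_expensive(bib)
print(ET.tostring(bib, encoding="unicode"))
```

The renamed element is the same Python object as before the update, whereas building a new element and swapping it in would give the node a new identity.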
103Overview
- Insert
- Insert new XML instances
- Delete
- Delete XML instances
- Replace, Rename
- Replace/Rename XML Instances
- Empty Update
- No operation
- FLWUpdate
- bulk update (For-Loop)
- Conditional Update
- Conditional update (If)
104INSERT - Variant 1
- Insert a new element into a document
- insert UpdateContent into TargetNode
- UpdateContent: any sequence of items (nodes, values)
- TargetNode: exactly one document or element
- otherwise ERROR
- Specify whether to insert at the beginning or end
- as last: Content becomes last child of Target (default)
- as first: Content becomes first child of Target
- Nodes in Content assume a new Id.
- Whitespace, text: conventions as in element construction of XQuery
105INSERT Variant 1
- Insert new book at the end of the library
- insert <book> <title>Die wilde Wutz</title> </book>
  into document("www.uni-bib.de")//bib
- Insert new book at the beginning of the library
- insert <book> <title>Die wilde Wutz</title> </book>
  as first into document("www.uni-bib.de")//bib
- Insert new attribute into an element
- insert (attribute age { 13 }, <parents xsi:nil="true"/>)
  into document("ewm.de")//person[@name = "KD"]
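The "as first" / "as last" behaviour and the rule that inserted nodes get a new identity can be sketched with Python's standard `xml.etree.ElementTree`. This is only an illustration under assumptions: `insert_into` is a hypothetical helper, attributes are modelled as `(name, value)` tuples, and the sample documents are made up.

```python
import copy
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def insert_into(target, content, as_first=False):
    """Sketch of: insert content [as first] into target.

    `content` is a list of elements and/or (name, value) tuples; tuples
    stand in for attribute nodes and are set on the target itself.
    """
    position = 0
    for item in content:
        if isinstance(item, tuple):
            name, value = item
            target.set(name, str(value))
        else:
            node = copy.deepcopy(item)  # inserted node gets a new identity
            if as_first:
                target.insert(position, node)
                position += 1
            else:
                target.append(node)  # "as last" is the default


# Hypothetical library document.
bib = ET.fromstring("<bib><book><title>Old book</title></book></bib>")
new_book = ET.fromstring("<book><title>Die wilde Wutz</title></book>")

insert_into(bib, [new_book])                 # at the end
insert_into(bib, [new_book], as_first=True)  # at the beginning

# Attribute plus child element, as in the third example above.
person = ET.fromstring("<person name='KD'/>")
parents = ET.Element("parents", {f"{{{XSI}}}nil": "true"})
insert_into(person, [("age", 13), parents])

print([b.findtext("title") for b in bib])
print(ET.tostring(person, encoding="unicode"))
```

Copying the content before inserting it is a design choice that mirrors "Nodes in Content assume a new Id": the same source element can then be inserted more than once without aliasing.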
106INSERT - Variant 2
- Insert at a particular point in the document
- insert UpdateContent (after | before) TargetNode
- UpdateContent: no attributes allowed!
- TargetNode: one element, comment or PI.
- Otherwise ERROR
- Specify whether before or after target
- before vs. after
- Nodes in Content assume a new identity
- Whitespace, text: conventions as in element constructors of XQuery
107Insert - Variant 2
insert <author>Florescu</author> before
//article[title = "XL"]/author[. = "Grünhagen"]
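Positional insertion relative to a sibling can be sketched the same way. The example below uses Python's standard `xml.etree.ElementTree`; `insert_relative` and the sample article are assumptions made for illustration. ElementTree elements do not know their parent, so the helper searches for it from the root.

```python
import copy
import xml.etree.ElementTree as ET


def insert_relative(root, target, content, where="before"):
    """Sketch of: insert content (before | after) target."""
    if where not in ("before", "after"):
        raise ValueError("where must be 'before' or 'after'")
    for parent in root.iter():
        for index, child in enumerate(list(parent)):
            if child is target:
                offset = index if where == "before" else index + 1
                for i, item in enumerate(content):
                    # Copies keep the inserted nodes distinct from the source.
                    parent.insert(offset + i, copy.deepcopy(item))
                return
    raise ValueError("target has no parent below root")


# Hypothetical article matching the path in the example above.
article = ET.fromstring(
    "<article><title>XL</title><author>Grünhagen</author></article>"
)

# Select the target; exactly one node is expected, otherwise it is an error.
targets = [
    a for a in article.findall("author")
    if article.findtext("title") == "XL" and a.text == "Grünhagen"
]
if len(targets) != 1:
    raise ValueError("TargetNode must be exactly one node")

new_author = ET.Element("author")
new_author.text = "Florescu"
insert_relative(article, targets[0], [new_author], where="before")

print([a.text for a in article.findall("author")])
```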
108INSERT - Open Questions
- Insert in Schema-validated Documents?
- When and how to validate types?