Title: Positional Grouping in XQuery
1 Positional Grouping in XQuery
2 Definitions
- Grouping
  - a function that takes a sequence as input and produces a sequence of sequences as output
- Value-based Grouping
  - assignment of items to groups is based on properties of the item
- Positional Grouping
  - identification of structure based on patterns in the sequence of items
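The difference between the two kinds of grouping can be sketched in a few lines of Python. This is only an illustration: the `(kind, text)` tuples, the function names `group_by_value` and `group_by_position`, and the sample data are assumptions made here, not part of the slides.

```python
# Hypothetical sample data: (kind, text) pairs standing in for XML elements.
items = [("h2", "Morning"), ("p", "Got up"), ("p", "Made tea"),
         ("h2", "Afternoon"), ("p", "Had lunch")]

def group_by_value(seq, key):
    """Value-based grouping: an item's group depends only on key(item)."""
    groups = {}
    for item in seq:
        groups.setdefault(key(item), []).append(item)
    return list(groups.values())

def group_by_position(seq, starts_group):
    """Positional grouping: a new group starts wherever starts_group(item) holds."""
    groups = []
    for item in seq:
        if starts_group(item) or not groups:
            groups.append([])
        groups[-1].append(item)
    return groups

# Grouping by element kind ignores where the items sit in the sequence.
by_value = group_by_value(items, key=lambda it: it[0])

# Grouping by position keeps each heading together with the paragraphs after it.
by_position = group_by_position(items, starts_group=lambda it: it[0] == "h2")
```

Here `by_value` collects all headings into one group and all paragraphs into another, while `by_position` yields one group per heading, each followed by its own paragraphs.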
3 Requirements
- Based on use cases
  - Real-life problems gathered from xsl-list, 1999-2005
  - Some were captured in the XSLT 2.0 requirements (2001), others came later
- No attempt to define any theoretical notion of completeness, e.g. a class of grammars
4 Use Case 1: Headings and Paragraphs
Input:
<body>
  <h2>Morning</h2>
  <p>Got up</p>
  <p>Made tea</p>
  <h2>Afternoon</h2>
  <p>Had lunch</p>
  <p>Fed the cat</p>
  <p>Posted a letter</p>
</body>
Output:
<chapter>
  <section title="Morning">
    <para>Got up</para>
    <para>Made tea</para>
  </section>
  <section title="Afternoon">
    <para>Had lunch</para>
    <para>Fed the cat</para>
    <para>Posted a letter</para>
  </section>
</chapter>
5 Use Case 2: Adjacent Bullets
Input:
<p/>
<q/>
<bullet>one</bullet>
<bullet>two</bullet>
<x/>
<y/>
Output:
<p/>
<q/>
<list>
  <bullet>one</bullet>
  <bullet>two</bullet>
</list>
<x/>
<y/>
6 Use Case 3: Term Definition Lists
Input:
<dt>XML</dt>
<dd>Extensible Markup Language</dd>
<dt>XSLT</dt>
<dt>XSL Transformations</dt>
<dd>A language for transforming XML</dd>
<dd>A specification produced by W3C</dd>
Output:
<term>
  <dt>XML</dt>
  <dd>Extensible Markup Language</dd>
</term>
<term>
  <dt>XSLT</dt>
  <dt>XSL Transformations</dt>
  <dd>A language for transforming XML</dd>
  <dd>A specification produced by W3C</dd>
</term>
7 Use Case 4: Continuation Markers
Input:
<in cont="yes">One way to</in>
<in cont="yes"> understand positional grouping is</in>
<in> as an exercise in parsing.</in>
<in cont="yes">To get from a sequence of items</in>
<in cont="yes"> to a tree, we could use</in>
<in> some kind of grammar.</in>
Output:
<para>One way to understand positional grouping is as an exercise in parsing.</para>
<para>To get from a sequence of items to a tree, we could use some kind of grammar.</para>
8 Use Case 5: Page ranges
Input:
4, 6, 9, 11, 12, 13, 18, 20, 21
Output:
4, 6, 9, 11-13, 18, 20-21
9 Use Case 6: Arrange in rows
Input:
"Green", "Pink", "Lilac", "Turquoise", "Peach", "Opal", "Champagne"
Output:
<table>
  <tr>
    <td>Green</td>
    <td>Pink</td>
    <td>Lilac</td>
  </tr>
  <tr>
    <td>Turquoise</td>
    <td>Peach</td>
    <td>Opal</td>
  </tr>
  <tr>
    <td>Champagne</td>
    <td> </td>
    <td> </td>
  </tr>
</table>
10 Use Case 7: Level Numbers
Input:
<data>
  <gedcom level="0"/>
  <indi level="1"/>
  <name level="2"/>
  <first level="3">Michael</first>
  <last level="3">Kay</last>
  <email level="2">mike@saxonica.com</email>
  <indi level="1"/>
  <name level="2"/>
  <first level="3">Norm</first>
  <last level="3">Walsh</last>
  <email level="2">norm@nwalsh.com</email>
</data>
Output:
<gedcom>
  <indi>
    <name>
      <first>Michael</first>
      <last>Kay</last>
    </name>
    <email>mike@saxonica.com</email>
  </indi>
  <indi>
    <name>
      <first>Norm</first>
      <last>Walsh</last>
    </name>
    <email>norm@nwalsh.com</email>
  </indi>
</gedcom>
11 XQuery 1.0 Solutions
- Head/tail recursion
- Positional indexing
12 "Headings and Paragraphs" using head/tail recursion
declare function local:section($e as element(H2)) {
  <section>{
    local:nextPara($e/following-sibling::*[1][self::P])
  }</section>
};

declare function local:nextPara($p as element(P)?) {
  if ($p)
  then ($p, local:nextPara($p/following-sibling::*[1][self::P]))
  else ()
};

<out>{
  for $h in doc('doc.xml')//BODY/H2
  return local:section($h)
}</out>
13 "Term Definition Lists" using positional indexing
let $s := (for $e at $p in $input
           where $e[self::dt and
                    not(preceding-sibling::*[1][self::dt])]
           return $p,
           count($input) + 1)
for $i in 1 to count($s) - 1
return
  <term>{
    for $j in $s[$i] to $s[$i + 1] - 1
    return $input[$j]
  }</term>
14 XSLT 2.0 approach
- <xsl:for-each-group> handles both value-based and positional grouping
- But the two are largely distinct
- Three varieties of positional grouping
  - group-starting-with
  - group-ending-with
  - group-adjacent
15 Identifying Breaks
- All use cases have these properties
  - the input sequence is the concatenation of the output sequences
  - "breaks" depend only on
    - the item before the break
    - the item after the break
    - position (use case "arrange in rows" only)
16 Conceptual approach to solution
partition(
  $population as item()*,
  $break-function as function($after as item(),
                              $before as item(),
                              $position as xs:integer) as xs:boolean,
  $action as function($group as item()*) as item()*
) as item()*
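The signature above can be read as an ordinary higher-order function. The following Python sketch is one possible reading, not the implementation behind the slides. It assumes that `position` is the 1-based position of the item following the candidate break, that the break function is only consulted between two neighbouring items, and that results are returned as a list with one entry per group rather than a flattened sequence.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

def partition(population: Iterable[T],
              break_function: Callable[[T, T, int], bool],
              action: Callable[[List[T]], R]) -> List[R]:
    """Split population into consecutive groups and apply action to each.

    break_function(after, before, position) decides whether a break falls
    between two neighbours: `after` is the item the break comes after,
    `before` is the item it comes before, and `position` is the 1-based
    position of `before` (an assumption of this sketch).
    """
    results: List[R] = []
    group: List[T] = []
    for position, item in enumerate(population, start=1):
        if group and break_function(group[-1], item, position):
            results.append(action(group))
            group = []
        group.append(item)
    if group:
        results.append(action(group))
    return results

# Term definition lists, with element names as plain strings:
# break after a "dd" that is followed by a "dt".
tags = ["dt", "dd", "dt", "dt", "dd", "dd"]
terms = partition(
    tags,
    lambda after, before, position: after == "dd" and before == "dt",
    lambda group: group,
)
```

With the identity action, concatenating the groups in `terms` gives back `tags`, which is the property all the use cases share.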
17 Syntactic realisation
partition $g in population
  break after $a before $b at $p
  where condition
  return action
18 Solution to Use Case 1: Headings and Paragraphs
partition $section in body/*
  break before $e
  where $e[self::h2]
  return
    <section title="{$section/self::h2}">{
      for $p in $section/self::p
      return <para>{$p/node()}</para>
    }</section>
19 Solution to Use Case 2: Adjacent Bullets
partition $children in *
  break after $a before $b
  where not($a[self::bullet] and $b[self::bullet])
  return
    if ($children/self::bullet)
    then <list>{$children}</list>
    else $children
20 Use Case 3: Term Definition Lists
Input:
<dt>XML</dt>
<dd>Extensible Markup Language</dd>
<dt>XSLT</dt>
<dt>XSL Transformations</dt>
<dd>A language for transforming XML</dd>
<dd>A specification produced by W3C</dd>
Output:
<term>
  <dt>XML</dt>
  <dd>Extensible Markup Language</dd>
</term>
<term>
  <dt>XSLT</dt>
  <dt>XSL Transformations</dt>
  <dd>A language for transforming XML</dd>
  <dd>A specification produced by W3C</dd>
</term>
Solution:
partition $term in *
  break after $a before $b
  where ($a[self::dd] and $b[self::dt])
  return <term>{$term}</term>
21 Use Case 4: Continuation Markers
Input:
<in cont="yes">One way to</in>
<in cont="yes"> understand positional grouping is</in>
<in> as an exercise in parsing.</in>
<in cont="yes">To get from a sequence of items</in>
<in cont="yes"> to a tree, we could use</in>
<in> some kind of grammar.</in>
Output:
<para>One way to understand positional grouping is as an exercise in parsing.</para>
<para>To get from a sequence of items to a tree, we could use some kind of grammar.</para>
Solution:
partition $para in ./in
  break after $a
  where not($a/@cont = "yes")
  return <para>{$para/text()}</para>
22 Use Case 5: Page ranges
Input:
4, 6, 9, 11, 12, 13, 18, 20, 21
Output:
4, 6, 9, 11-13, 18, 20-21
Solution:
partition $range in $page-numbers
  break after $a before $b
  where ($b != $a + 1)
  return
    if (count($range) = 1)
    then $range
    else concat($range[1], "-", $range[last()])
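For comparison, the same break rule (a break between a and b whenever b is not a + 1) can be written directly in Python. This is a sketch only; the function name `page_ranges` is an assumption made for the example.

```python
def page_ranges(pages):
    """Collapse runs of consecutive page numbers into 'first-last' strings."""
    groups = []
    for page in pages:
        # Continue the current group only if this page follows the previous one;
        # otherwise a break falls between them and a new group starts.
        if groups and page == groups[-1][-1] + 1:
            groups[-1].append(page)
        else:
            groups.append([page])
    # The "action": a single page prints as itself, a run prints as first-last.
    return [str(g[0]) if len(g) == 1 else f"{g[0]}-{g[-1]}" for g in groups]

result = ", ".join(page_ranges([4, 6, 9, 11, 12, 13, 18, 20, 21]))
```

For the input on this slide, `result` is the string `4, 6, 9, 11-13, 18, 20-21`.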
23 Use Case 6: Arrange in rows
Input:
"Green", "Pink", "Lilac", "Turquoise", "Peach", "Opal", "Champagne"
Output:
<table>
  <tr>...</tr>
  <tr>...</tr>
  <tr>...</tr>
</table>
Solution:
partition $rows in $colours
  break at $p
  where (($p - 1) mod 3 = 0)
  return
    <tr>{
      for $i in $rows
      return <td>{$i}</td>
    }</tr>
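The position-only break rule can be sketched in Python as follows. The names `rows_of` and `width` are assumptions for this example, and positions are taken to be 1-based as in the condition above. Like the solution on this slide, the sketch does not pad the last row with empty cells.

```python
def rows_of(items, width):
    """Start a new row before each 1-based position p where (p - 1) % width == 0."""
    rows = []
    for p, item in enumerate(items, start=1):
        if (p - 1) % width == 0:
            rows.append([])
        rows[-1].append(item)
    return rows

colours = ["Green", "Pink", "Lilac", "Turquoise", "Peach", "Opal", "Champagne"]
rows = rows_of(colours, 3)

# Render each row as a <tr> of <td> cells, mirroring the "return" clause.
table = "<table>" + "".join(
    "<tr>" + "".join(f"<td>{c}</td>" for c in row) + "</tr>" for row in rows
) + "</table>"
```

The seven colours produce three rows, the last of which holds only `Champagne`.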
24 Use Case 7: Level Numbers
declare function f:group($items as element()*,
                         $level as xs:integer) as element()* {
  partition $group in $items
    break before $b
    where $b/@level = $level
    return
      element {$group[1]/node-name()} {
        $group[1]/text(),
        f:group(remove($group, 1), $level + 1)
      }
};

f:group(/data/*, 0)
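The recursion is easier to follow on plain data. The Python sketch below mirrors the idea under stated assumptions: each element is modelled as a hypothetical `(name, level, text)` tuple, the result is a nested `(name, text, children)` tuple, and the function name `group_by_level` is invented for the example.

```python
def group_by_level(items, level):
    """Nest a flat list of (name, level, text) records into (name, text, children) trees."""
    groups = []
    for item in items:
        # Break before every item whose level equals the current level.
        if item[1] == level or not groups:
            groups.append([])
        groups[-1].append(item)
    # The first item of each group becomes the parent; the remaining items
    # are grouped again one level deeper to form its children.
    return [(head[0], head[2], group_by_level(rest, level + 1))
            for head, *rest in groups]

flat = [
    ("gedcom", 0, ""),
    ("indi", 1, ""), ("name", 2, ""), ("first", 3, "Michael"), ("last", 3, "Kay"),
    ("indi", 1, ""), ("name", 2, ""), ("first", 3, "Norm"), ("last", 3, "Walsh"),
]
tree = group_by_level(flat, 0)
```

`tree` then contains a single `gedcom` node with two `indi` children, each holding a `name` node with `first` and `last` leaves.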
25 Performance
- Algorithm is intrinsically O(n)
- Memory usage
  - naive implementation uses memory proportional to size of largest group
  - smart implementation can be fully streamed
- Almost inevitably better than the XQuery 1.0 solutions
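The O(n) claim can be illustrated with a single-pass Python generator. This sketch corresponds to the naive variant in the list above: it buffers one group at a time, so memory is proportional to the largest group, and it is not the fully streamed version. The names `stream_groups` and `is_break`, and the convention that `position` is the 1-based position of the item after the break, are assumptions of the example.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def stream_groups(items: Iterable[T],
                  is_break: Callable[[T, T, int], bool]) -> Iterator[List[T]]:
    """Yield groups one at a time in a single pass over items."""
    group: List[T] = []
    for position, item in enumerate(items, start=1):
        # One break test per adjacent pair, so n items cost n - 1 tests.
        if group and is_break(group[-1], item, position):
            yield group
            group = []
        group.append(item)
    if group:
        yield group

calls = []

def not_consecutive(after, before, position):
    """Break test that also records each position at which it was consulted."""
    calls.append(position)
    return before != after + 1

# The input is a one-shot iterator, so it can only be read once, front to back.
groups = list(stream_groups(iter([1, 2, 3, 7, 8]), not_consecutive))
```

Each item is visited once and the break test runs once per adjacent pair, which is where the linear bound comes from.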
26 Conclusions
- Need exists for both value-based and positional grouping
- Positional grouping use-cases can be solved by identifying breaks in terms of (before, after, position)
- Conceptual approach based on higher-order functions, realised in concrete syntax