Title: Implementing Concurrent Markup in XML
1. <verse num="15"> <sentence> </verse> </sentence>
Implementing Concurrent Markup in XML
Patrick Durusau (pdurusau_at_emory.edu), Society of Biblical Literature
Matthew Brook O'Donnell (m.odonnell_at_roehampton.ac.uk), OpenText.org and University of Surrey Roehampton
2. Why Concurrent Hierarchies?
- Different Interpretations of Text
- Structures that do not properly nest in the XML
sense
- Complex textual traditions with multiple
witnesses and variants
- Recording physical layout of text and other
analysis
3. Overlapping Example
- Matthew 3:8: Bear fruit that befits repentance,
- Matthew 3:9: and do not presume to say to yourselves, "We have Abraham as our father"; for I tell you, God is able from these stones to raise up children of Abraham.
4. Matthew 3:8-9: First Choice
- <verse id="Matt.3.8">
- Bear fruit that befits repentance,
- </verse>
- <verse id="Matt.3.9">
- and do not presume to say to yourselves, "We have Abraham as our father"; for I tell you, God is able from these stones to raise up children of Abraham.
- </verse>
5. Matthew 3:8-9: Second Choice
- <sentence>
- Bear fruit that befits repentance, and do not presume to say to yourselves, "We have Abraham as our father"; for I tell you, God is able from these stones to raise up children of Abraham.
- </sentence>
6. Matthew 3:8-9: Verboten!
- <verse id="Matt.3.8">
- <sentence>
- Bear fruit that befits repentance,
- </verse>
- <verse id="Matt.3.9">
- and do not presume to say to yourselves, "We have Abraham as our father"; for I tell you, God is able from these stones to raise up children of Abraham.
- </verse>
- </sentence>
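Why the third encoding is forbidden can be checked mechanically: a conforming XML parser rejects it. The following is a minimal sketch using Python's standard-library xml.etree.ElementTree. The wrapping <text> root element, the shortened verse text, and the helper name is_well_formed are assumptions added for illustration; they are not part of the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# First choice: verses only, properly nested.
verses = (
    "<text>"
    '<verse id="Matt.3.8">Bear fruit that befits repentance,</verse>'
    '<verse id="Matt.3.9">and do not presume to say to yourselves</verse>'
    "</text>"
)

# Forbidden: <sentence> opens inside the first verse and closes after the second.
overlapping = (
    "<text>"
    '<verse id="Matt.3.8"><sentence>Bear fruit that befits repentance,</verse>'
    '<verse id="Matt.3.9">and do not presume to say to yourselves</verse>'
    "</sentence>"
    "</text>"
)

print(is_well_formed(verses))       # True
print(is_well_formed(overlapping))  # False
```

The parser fails at the first </verse>, because the element still open at that point is <sentence>.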
7. Design Principles, Part 1
- Capacity to represent all occurring or imaginable
kinds of structures
- Suitability for formal or mechanical validation
- Clear identity with the notations needed for
simpler cases
- Allow for conditional indexing and processing
8. Design Principles, Part 2
- Allow for extraction of well-formed subtrees and
documents
- Allow for query of the position of the element
between two or more hierarchies
- Use standard XML syntax and mechanisms
- Validation and processing must be possible with
standard XML software
- Can be used with existing documents encoded in
XML markup
9. Bottom-Up Virtual Hierarchies (BUVH)
- Membership of PCDATA in a particular hierarchy
- Record that information using XPath syntax
- Gather information from multiple document
instances into a base file
- Query membership in and across hierarchies with
BUVH
10. A Simple Example (1)
- Four separate (overlapping) hierarchies
11. A Simple Example (2)
12. A Simple Example (3)
13. A Simple Example (4)
14. A Simple Example (5)
15. A Simple Example (6)
- To encode these hierarchies in a single file, one view must be selected as the base hierarchy
- The other hierarchies must be inserted into this base hierarchy in a way that avoids overlapping elements
Page hierarchy:
<pages>
  <page id="p1">
    <line id="l1">This is</line>
    <line id="l2">text</line>
  </page>
  <page id="p2">
    <line id="l1">in a base</line>
    <line id="l2">file</line>
  </page>
</pages>
Page hierarchy with the text hierarchy inserted:
<pages>
  <text>
    <para id="p1">
      <page id="p1">
        <line id="l1">This is</line>
        <line id="l2">text</line>
      </page>
      <page id="p2">
        <line id="l1">in a base</line>
        <line id="l2">file</line>
      </page>
    </para>
  </text>
</pages>
16. A Simple Example (7)
- Create a common base file with divisions for atomic PCDATA
17. A Simple Example (7)
- For each atomic PCDATA element:
a. Locate it in each hierarchy
b. Construct an XML Membership XPath Expression describing its position within the hierarchy
c. Add a Tree Structure Position (TSP) attribute, recording the element's position in the hierarchy, to the element in the base file
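Steps a and b can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: it assumes that the atomic PCDATA units are whitespace-separated words, that each path step has the form name[position][@id='...'] with the position counted among same-named siblings, and that the final step is the word's position within its parent element. The function name membership_paths is hypothetical, and text that follows a child element (mixed content) is ignored for brevity.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def membership_paths(root):
    """Yield (word, path) for every whitespace-separated word in the tree."""

    def walk(elem, prefix):
        # Words directly inside this element get a trailing position step.
        for n, word in enumerate((elem.text or "").split(), start=1):
            yield word, f"{prefix}/{n}"
        seen = Counter()  # position among same-named siblings
        for child in elem:
            seen[child.tag] += 1
            step = f"{child.tag}[{seen[child.tag]}]"
            if "id" in child.attrib:
                step += f"[@id='{child.attrib['id']}']"
            yield from walk(child, f"{prefix}/{step}")

    yield from walk(root, f"/{root.tag}")


pages = ET.fromstring(
    "<pages>"
    "<page id='p1'><line id='l1'>This is</line><line id='l2'>text</line></page>"
    "<page id='p2'><line id='l1'>in a base</line><line id='l2'>file</line></page>"
    "</pages>"
)

for word, path in membership_paths(pages):
    print(word, path)
# First line printed: This /pages/page[1][@id='p1']/line[1][@id='l1']/1
```

Step c then amounts to writing each path into an attribute of the matching word element in the base file, one attribute per hierarchy.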
18. A Simple Example (7)
1. Page hierarchy
<pages>
  <page id="p1">
    <line id="l1">This is</line>
    <line id="l2">text</line>
  </page>
  <page id="p2">
    <line id="l1">in a base</line>
    <line id="l2">file</line>
  </page>
</pages>
2. Text hierarchy
<text>
  <para id="p1">This is text in a base file</para>
</text>
3. Linguistic hierarchy
<clauses>
  <clause id="c1">
    <subject>This</subject>
    <predicate>is</predicate>
    <complement>text</complement>
    <adjunct>in a base file</adjunct>
  </clause>
</clauses>
For each hierarchy in turn:
a. Locate in hierarchy
b. Construct XPath expression
c. Add TSP Attribute
XPath expressions as they are constructed:
/pages
/pages/page[1][@id="p1"]
/pages/page[1][@id="p1"]/line[1][@id="l1"]
/pages/page[1][@id="p1"]/line[1][@id="l1"]/1
/text
/text/para[1][@id="p1"]
/text/para[1][@id="p1"]/1
/clauses
/clauses/clause[1][@id="c1"]
/clauses/clause[1][@id="c1"]/subject[1]
/clauses/clause[1][@id="c1"]/subject[1]/1
Base file as the TSP attributes are added:
<baseFile> <w id="w1">This</w> <w id="w2">is</w> ...
<baseFile> <w id="w1" pg:pages="/pages/page[1][@id='p1']/line[1][@id='l1']/1">This</w>
<baseFile> <w id="w1" pg:pages="/pages/page[1][@id='p1']/line[1][@id='l1']/1" tx:text="/text/para[1][@id='p1']/1">This</w>
<baseFile> <w id="w1" pg:pages="/pages/page[1][@id='p1']/line[1][@id='l1']/1" tx:text="/text/para[1][@id='p1']/1" sn:clauses="/clauses/clause[1][@id='c1']/subject[1]/1">This</w>
19. A Simple Example (8)
<baseFile xmlns:sn="urn:clause" xmlns:tx="urn:text" xmlns:pg="urn:pages" xmlns:vr="urn:variants">
  <w id="w1"
     sn:clauses="/clauses/clause[1][@id='c1']/s[1]/1"
     tx:text="/text/para[1][@id='p1']/1"
     pg:pages="/pages/page[1][@id='p1']/line[1][@id='l1']/1">This</w>
  <w id="w2"
     sn:clauses="/clauses/clause[1][@id='c1']/p[1]/1"
     tx:text="/text/para[1][@id='p1']/2"
     pg:pages="/pages/page[1][@id='p1']/line[1][@id='l1']/2">is</w>
  <w id="w3"
     sn:clauses="/clauses/clause[1][@id='c1']/c[1]/1"
     tx:text="/text/para[1][@id='p1']/3"
     pg:pages="/pages/page[1][@id='p1']/line[2][@id='l2']/1"
     vr:variants="/variants/app[1][@id='tv1']/rdg[1][@wit='A'][@val='texs']">text</w>
20. A Simple Example (8)
  <w id="w4"
     sn:clauses="/clauses/clause[1][@id='c1']/a[1]/1"
     tx:text="/text/para[1][@id='p1']/4"
     pg:pages="/pages/page[2][@id='p2']/line[1][@id='l1']/1">in</w>
  <w id="w5"
     sn:clauses="/clauses/clause[1][@id='c1']/a[1]/2"
     tx:text="/text/para[1][@id='p1']/5"
     pg:pages="/pages/page[2][@id='p2']/line[1][@id='l1']/2"
     vr:variants="/variants/app[2][@id='tv2']/rdg[1][@wit='C'][@val='an']">a</w>
  <w id="w6"
     sn:clauses="/clauses/clause[1][@id='c1']/a[1]/3"
     tx:text="/text/para[1][@id='p1']/6"
     pg:pages="/pages/page[2][@id='p2']/line[1][@id='l1']/3">base</w>
  <w id="w7"
     sn:clauses="/clauses/clause[1][@id='c1']/a[1]/4"
     tx:text="/text/para[1][@id='p1']/7"
     pg:pages="/pages/page[2][@id='p2']/line[2][@id='l2']/1">file</w>
</baseFile>
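One of the design principles is that well-formed subtrees can be extracted again. The sketch below shows, under stated assumptions, how the page hierarchy might be rebuilt from the pg:pages attributes alone. It is not the authors' tooling: the function name rebuild and the regular expression STEP are hypothetical, the base file is cut down to its pg:pages attributes, every path step is assumed to have the form name[position] with an optional [@id='...'], and words are assumed to appear in the base file in the same order as in the hierarchy.

```python
import re
import xml.etree.ElementTree as ET

PG = "{urn:pages}pages"  # expanded name of the pg:pages attribute

BASE = """<baseFile xmlns:pg="urn:pages">
<w id="w1" pg:pages="/pages/page[1][@id='p1']/line[1][@id='l1']/1">This</w>
<w id="w2" pg:pages="/pages/page[1][@id='p1']/line[1][@id='l1']/2">is</w>
<w id="w3" pg:pages="/pages/page[1][@id='p1']/line[2][@id='l2']/1">text</w>
<w id="w4" pg:pages="/pages/page[2][@id='p2']/line[1][@id='l1']/1">in</w>
<w id="w5" pg:pages="/pages/page[2][@id='p2']/line[1][@id='l1']/2">a</w>
<w id="w6" pg:pages="/pages/page[2][@id='p2']/line[1][@id='l1']/3">base</w>
<w id="w7" pg:pages="/pages/page[2][@id='p2']/line[2][@id='l2']/1">file</w>
</baseFile>"""

# One path step: element name, sibling position, optional id predicate.
STEP = re.compile(r"(\w+)\[(\d+)\](?:\[@id='([^']*)'\])?")


def rebuild(base_root, attr):
    """Rebuild one hierarchy from the membership paths stored in `attr`."""
    root = None
    nodes = {}  # path prefix -> element already created for that prefix
    for w in base_root.iter("w"):
        path = w.get(attr)
        if path is None:
            continue  # this word takes no part in the hierarchy
        steps = path.strip("/").split("/")
        root_name, elem_steps = steps[0], steps[1:-1]  # last step = word position
        if root is None:
            root = ET.Element(root_name)
        parent, prefix = root, root_name
        for step in elem_steps:
            prefix += "/" + step
            if prefix not in nodes:
                name, _pos, ident = STEP.fullmatch(step).groups()
                child = ET.SubElement(parent, name)
                if ident is not None:
                    child.set("id", ident)
                nodes[prefix] = child
            parent = nodes[prefix]
        # Append the word to the text of its innermost containing element.
        parent.text = f"{parent.text} {w.text}" if parent.text else w.text
    return root


pages = rebuild(ET.fromstring(BASE), PG)
print(ET.tostring(pages, encoding="unicode"))
```

Run on the seven words above, this should print the page hierarchy on a single line. The vr:variants paths use a different step shape (extra predicates, no trailing word position), so they would need their own handling.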
21. A Simple Example (9)
- Queries across different hierarchies can be carried out using XPath expressions, e.g. using XSLT
- Example 1
- Locate words that have textual variants and are found on page 2
XPath query, built up step by step:
//w
  Every <w> element in the base file
//w[@vr:variants]
  that takes part in the variants hierarchy (i.e. it has a textual variant)
//w[@vr:variants][contains(@pg:pages,'p2')]
  and has a pg:pages attribute that contains the string p2, i.e. an id for the second page
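The same selection can be tried without an XSLT processor. The sketch below uses Python's standard-library xml.etree.ElementTree, whose XPath subset has no contains() function, so the two predicates are written as ordinary Python conditions instead. The base file is cut down to three words for brevity; the constant names VR and PG are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaced attributes are addressed as {namespace-uri}local-name.
VR = "{urn:variants}variants"
PG = "{urn:pages}pages"

base = ET.fromstring("""<baseFile xmlns:pg="urn:pages" xmlns:vr="urn:variants">
<w id="w3" pg:pages="/pages/page[1][@id='p1']/line[2][@id='l2']/1"
   vr:variants="/variants/app[1][@id='tv1']/rdg[1][@wit='A'][@val='texs']">text</w>
<w id="w4" pg:pages="/pages/page[2][@id='p2']/line[1][@id='l1']/1">in</w>
<w id="w5" pg:pages="/pages/page[2][@id='p2']/line[1][@id='l1']/2"
   vr:variants="/variants/app[2][@id='tv2']/rdg[1][@wit='C'][@val='an']">a</w>
</baseFile>""")

# //w[@vr:variants][contains(@pg:pages,'p2')]
hits = [
    w for w in base.iter("w")
    if w.get(VR) is not None      # takes part in the variants hierarchy
    and "p2" in w.get(PG, "")     # membership path mentions page id p2
]

print([(w.get("id"), w.text) for w in hits])  # [('w5', 'a')]
```

Word w3 has a variant but sits on page 1, and w4 is on page 2 but has no variant, so only w5 is selected.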
22. A Simple Example (9)
- Queries across different hierarchies can be carried out using XPath expressions, e.g. using XSLT
- Example 2
- Locate words in the first clause that do not occur on the first line of their page
XPath query, built up step by step:
//w
  Every <w> element in the base file
//w[contains(@sn:clauses,'clause[1]')]
  that is a child of a <clause> element that is the first child of the <clauses> element
//w[contains(@sn:clauses,'clause[1]')][not(contains(@pg:pages,'line[1]'))]
  and has a pg:pages attribute that does not contain the string line[1], i.e. is not in the first <line> child
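Substring tests such as contains(..., 'line[1]') would also match a differently named element whose name merely ends in "line". A slightly stricter variant, sketched here as an assumption rather than as part of the BUVH proposal, parses each membership path into element names and positions before comparing. The helper name positions and the regular expression are hypothetical, and the base file is reduced to four words.

```python
import re
import xml.etree.ElementTree as ET

SN = "{urn:clause}clauses"
PG = "{urn:pages}pages"

# Matches one path step of the form /name[position].
STEP = re.compile(r"/(\w+)\[(\d+)\]")


def positions(path):
    """Map each element name in a membership path to its sibling position."""
    return {name: int(pos) for name, pos in STEP.findall(path)}


base = ET.fromstring("""<baseFile xmlns:sn="urn:clause" xmlns:pg="urn:pages">
<w id="w2" sn:clauses="/clauses/clause[1][@id='c1']/p[1]/1"
   pg:pages="/pages/page[1][@id='p1']/line[1][@id='l1']/2">is</w>
<w id="w3" sn:clauses="/clauses/clause[1][@id='c1']/c[1]/1"
   pg:pages="/pages/page[1][@id='p1']/line[2][@id='l2']/1">text</w>
<w id="w6" sn:clauses="/clauses/clause[1][@id='c1']/a[1]/3"
   pg:pages="/pages/page[2][@id='p2']/line[1][@id='l1']/3">base</w>
<w id="w7" sn:clauses="/clauses/clause[1][@id='c1']/a[1]/4"
   pg:pages="/pages/page[2][@id='p2']/line[2][@id='l2']/1">file</w>
</baseFile>""")

hits = [
    w.get("id") for w in base.iter("w")
    if positions(w.get(SN, "")).get("clause") == 1   # in the first clause
    and positions(w.get(PG, "")).get("line") != 1    # not on a first line
]

print(hits)  # ['w3', 'w7']
```

Like the XPath query, this also selects words that have no pg:pages attribute at all, because a missing attribute cannot name a first line. The dictionary keeps one position per element name, so nested elements with the same name would need a list of steps instead.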
23. Summary: BUVH Approach
- Authoring of XML occurs within a single hierarchy
(any XML editor)
- Automatic construction of base file with any XSLT
processor
- Query with any XSLT processor
24. Future Plans
- Development of XSLT extensions to process the BUVH base file
- Base file format (possible use of Xalan's DTM format?)
- Testing of BUVH against more complex examples
- Use of XLink with BUVH for read-only or large
corpora
25. <partingThought> Markup is metadata about PCDATA </partingThought>