Title: Processing of structured documents
1. Processing of structured documents
- Spring 2002, Part 2
- Helena Ahonen-Myka
2. XML Namespaces
- An XML document may contain multiple markup vocabularies
  - reuse of existing markup, e.g. including HTML markup in some document type
- An XML namespace is a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names
3. Author A writes a document
    <?xml version="1.0"?>
    <references>
      <name>Macmillan</name>
      <link href="http://www.mcp.com"/>
      <name>ABC News</name>
      <link href="http://www.abcnews.com"/>
    </references>
4. Author B adds some rating.
    <?xml version="1.0"?>
    <references>
      <name>Macmillan</name>
      <link href="http://www.mcp.com"/>
      <rating>5 stars</rating>
      <name>ABC News</name>
      <link href="http://www.abcnews.com"/>
      <rating>3 stars</rating>
    </references>
5. Also Author C wants to add some rating...
    <?xml version="1.0"?>
    <references>
      <name>Macmillan</name>
      <link href="http://www.mcp.com"/>
      <rating>G</rating>
      <name>ABC News</name>
      <link href="http://www.abcnews.com"/>
      <rating>PG</rating>
    </references>
6. Author D would like to combine the documents...
    <?xml version="1.0"?>
    <references>
      <name>Macmillan</name>
      <link href="http://www.mcp.com"/>
      <rating>5 stars</rating>
      <rating>G</rating>
      <name>ABC News</name>
      <link href="http://www.abcnews.com"/>
      <rating>3 stars</rating>
      <rating>PG</rating>
    </references>
7. Which rating? -> different names
    <?xml version="1.0"?>
    <references>
      <name>Macmillan</name>
      <link href="http://www.mcp.com"/>
      <qa-rating>5 stars</qa-rating>
      <pa-rating>G</pa-rating>
      <name>ABC News</name>
      <link href="http://www.abcnews.com"/>
      <qa-rating>3 stars</qa-rating>
      <pa-rating>PG</pa-rating>
    </references>
8. Namespaces give a disciplined method for naming
    <?xml version="1.0"?>
    <references
        xmlns:qa="http://joker.com/2000/star-rating"
        xmlns:pa="http://penguin.xmli.com/2000/review"
        xmlns="http://pineapplesoft.com/1999/ref">
      <name>Macmillan</name>
      <link href="http://www.mcp.com"/>
      <qa:rating>5 stars</qa:rating>
      <pa:rating>G</pa:rating>
      ...
    </references>
9. Namespaces
- xmlns:qa="http://joker.com/2000/star-rating"
  - qa: the prefix
  - http://joker.com/2000/star-rating: the namespace
  - a unique name (the URI guarantees this); no need to retrieve anything from the address
- xmlns="http://pineapplesoft.com/1999/ref"
  - the default namespace
  - elements without prefixes belong to this namespace: references, name, link
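The prefix and default-namespace rules above can be observed with a namespace-aware parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the choice of Python and of this parser is an assumption made for illustration, since the slides do not prescribe a tool. ElementTree reports every element name in the form {namespace-URI}local-name, so the two rating elements remain distinct even though their local names are the same.

```python
import xml.etree.ElementTree as ET

# The combined document from the slides, with both rating vocabularies.
DOC = """<?xml version="1.0"?>
<references xmlns:qa="http://joker.com/2000/star-rating"
            xmlns:pa="http://penguin.xmli.com/2000/review"
            xmlns="http://pineapplesoft.com/1999/ref">
  <name>Macmillan</name>
  <link href="http://www.mcp.com"/>
  <qa:rating>5 stars</qa:rating>
  <pa:rating>G</pa:rating>
</references>"""


def qualified_tags(xml_text):
    """Return the expanded name of the root and of each of its children."""
    root = ET.fromstring(xml_text)
    return [root.tag] + [child.tag for child in root]


tags = qualified_tags(DOC)
for tag in tags:
    print(tag)
```

Unprefixed elements (references, name, link) come out in the default namespace, while the prefix itself (qa, pa) disappears: only the URI and the local name matter to the application.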
10. Namespaces
- qa:rating
  - a qualified name (QName)
- scoping
  - The namespace is valid for the element where it is declared and all the elements within its content
11. Scoping
    <?xml version="1.0"?>
    <ref:references xmlns:ref="http://pineapplesoft.com/1999/ref">
      <ref:name>Macmillan</ref:name>
      <ref:link href="http://www.mcp.com"/>
      <pa:rating xmlns:pa="http://penguin.xmli.com/2000/review">G</pa:rating>
      <ref:name>ABC News</ref:name>
      <ref:link href="http://www.abcnews.com"/>
      <qa:rating xmlns:qa="http://joker.com/2000/star-rating">3 stars</qa:rating>
    </ref:references>
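A small sketch of the scoping rule, again assuming Python's standard xml.etree.ElementTree as the parser (an illustrative choice, not something the slides specify). A prefix declared on one rating element is visible inside that element only; using the same prefix on a sibling, outside the declaring element, leaves the prefix unbound and the parser rejects the document.

```python
import xml.etree.ElementTree as ET

# The pa prefix is declared on the element that uses it: in scope.
IN_SCOPE = """<ref:references xmlns:ref="http://pineapplesoft.com/1999/ref">
  <pa:rating xmlns:pa="http://penguin.xmli.com/2000/review">G</pa:rating>
</ref:references>"""

# The second pa:rating is a sibling, outside the element that declared pa.
OUT_OF_SCOPE = """<ref:references xmlns:ref="http://pineapplesoft.com/1999/ref">
  <pa:rating xmlns:pa="http://penguin.xmli.com/2000/review">G</pa:rating>
  <pa:rating>PG</pa:rating>
</ref:references>"""


def parses(xml_text):
    """Return True if this namespace-aware parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


in_scope_ok = parses(IN_SCOPE)
out_of_scope_ok = parses(OUT_OF_SCOPE)
print(in_scope_ok, out_of_scope_ok)
```

Declaring xmlns:pa on the root element instead would put the prefix in scope for the whole document, as in the earlier combined example.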
12. Namespaces and DTD
- XML 1.0 DTDs are not namespace-aware
- all the elements and attributes that are in some namespace have to be declared using the corresponding prefix
- for elements with prefix pre
  - an attribute xmlns:pre has to be declared
13. Namespaces and DTD
    <?xml version="1.0"?>
    <!DOCTYPE ref:references [
    <!ELEMENT ref:references (ref:name, ref:link, (pa:rating | qa:rating))+>
    <!ATTLIST ref:references xmlns:ref CDATA #REQUIRED>
    <!ELEMENT ref:name (#PCDATA)>
    <!ELEMENT ref:link EMPTY>
    <!ATTLIST ref:link href CDATA #REQUIRED>
    <!ELEMENT pa:rating (#PCDATA)>
    <!ATTLIST pa:rating xmlns:pa CDATA #REQUIRED>
    <!ELEMENT qa:rating (#PCDATA)>
    <!ATTLIST qa:rating xmlns:qa CDATA #REQUIRED>
    ]>
14. DTD: external and internal subsets
- external and internal subsets make up the DTD; the internal subset has higher precedence
- syntax
    <!DOCTYPE root-type-name SYSTEM "ex.dtd" [
      <!-- external subset in file ex.dtd -->
      <!-- internal subset may come here -->
    ]>
- internal subset may declare new elements (with attributes) or new attributes for existing elements
- namespaces facilitate the control of name conflicts
15. Namespaces and XML Schema
- An XML Schema document contains declarations of namespaces that are used in the document
  - e.g. xmlns:xsd="http://www.w3.org/2001/XMLSchema" for the elements with special XML Schema semantics
- Target namespace: the definitions included in this schema define this namespace
  - targetNamespace="uri:mywork"
16. Namespaces and XML Schema
- In XML Schema, schema components from different target namespaces can be used together
- -> enables the schema validation of instance content defined across multiple namespaces
17. XML Information set
- An XML document's information set consists of a number of information items
- an information item is an abstract description of some part of an XML document
- mainly to be used in other specifications
- each information item has a set of associated named properties
18. XML Information set
- Tree structure provided by the processor (no special interface is specified)
- e.g. entities expanded to their replacement text, attributes with their default values
- properties: e.g. for each element, its child elements and attributes
19. Information items
- document information item
- element information items
- attribute information items
- processing instruction information items
- unexpanded entity reference information items
- character information items
20. Information items (cont.)
- comment information items
- document type declaration information item
- unparsed entity information items
- notation information items
- namespace information items
21. Example: document information item
- There is exactly one document information item in the information set
- all information items are accessible from the properties of the document information item, either directly or indirectly through the properties of other information items
22. Example: document information item
- Properties
- children
- document element
- notations
- unparsed entities
- base URI
- character encoding scheme
- standalone
- version
- all declarations processed
23. Example: element information items
- There is an element information item for each element appearing in the XML document
- one of the element information items is the value of the document element property of the document information item (root element)
- all other element information items are accessible recursively
24. Example: element information items
- An element information item has the following properties
  - namespace name
  - local name
  - prefix
  - children
  - attributes
  - namespace attributes
  - in-scope namespaces
  - base URI
  - parent
25. Example
    <?xml version="1.0"?>
    <msg:message doc:date="19990421"
        xmlns:doc="http://doc.example.org/namespaces/doc"
        xmlns:msg="http://message.example.org/">Phone home!</msg:message>
26. The information set for the sample document
- A document information item
- an element information item with namespace name http://message.example.org/, local name message, and prefix msg
27. The information set for the sample document (cont.)
- an attribute information item with the namespace name http://doc.example.org/namespaces/doc, local name date, prefix doc, and normalized value 19990421
- three namespace information items for the http://www.w3.org/XML/1998/namespace, http://doc.example.org/namespaces/doc, and http://message.example.org/ namespaces
28. The information set for the sample document (cont.)
- Two attribute information items for the namespace attributes
- eleven character information items for the character data
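Several of these items can be inspected through an ordinary parser API. The sketch below uses Python's standard xml.dom.minidom on the sample document; the DOM is only one possible view of the information set (the Infoset itself specifies no interface), and the variable names are illustrative.

```python
from xml.dom import minidom

# The sample document from the slides.
DOC = ('<?xml version="1.0"?>'
       '<msg:message doc:date="19990421" '
       'xmlns:doc="http://doc.example.org/namespaces/doc" '
       'xmlns:msg="http://message.example.org/">Phone home!</msg:message>')

dom = minidom.parseString(DOC)
elem = dom.documentElement  # corresponds to the document element property

# Element information item: namespace name, local name, prefix.
element_props = (elem.namespaceURI, elem.localName, elem.prefix)

# Attribute information item for doc:date.
date = elem.getAttributeNodeNS("http://doc.example.org/namespaces/doc", "date")
attribute_props = (date.namespaceURI, date.localName, date.prefix, date.value)

# The DOM exposes the two namespace declarations as attributes as well.
namespace_attrs = sorted(a.name for a in elem.attributes.values()
                         if a.name.startswith("xmlns"))

# One character information item per character of "Phone home!".
char_count = len(elem.firstChild.data)

print(element_props, attribute_props, namespace_attrs, char_count)
```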
29. XML 1.0 reporting requirements
- For instance
  - an XML processor must always provide all characters in a document that are not part of markup to the application
  - a validating XML processor must inform the application which of the character data in a document is white space appearing within element content
  - an XML processor must normalize line-ends to LF before passing them to the application
30. XML 1.0 reporting requirements (cont.)
- A validating XML processor must include the replacement text of an entity in place of an entity reference
- an XML processor must supply the default value of attributes declared in the DTD for a given element type but not appearing in the element's start-tag
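Three of these requirements can be observed directly. The sketch below assumes Python's standard xml.etree.ElementTree (a non-validating, expat-based parser) and a made-up note document with an internal DTD subset; the element, entity and attribute names are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: an internal entity, a defaulted attribute,
# and a CR-LF line end inside the character data.
DOC = ('<!DOCTYPE note [\n'
       '  <!ENTITY who "Alice Smith">\n'
       '  <!ATTLIST note lang CDATA "en">\n'
       ']>\n'
       '<note>first line\r\nsigned: &who;</note>')

note = ET.fromstring(DOC)

text = note.text         # line end normalized, entity reference replaced
lang = note.get("lang")  # default value supplied from the ATTLIST declaration

print(repr(text), lang)
```

The application never sees the CR, the entity reference, or the fact that lang was missing from the start-tag; it receives the reported values only.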
31. What is not in the information set?
- For instance,
  - the document type name
  - the difference between the two forms of an empty element: <foo/> and <foo></foo>
  - the order of attributes within a start-tag
  - white space within start-tags (other than significant white space in attribute values) and end-tags
  - the difference between CR, CR-LF, and LF line termination
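As a rough illustration, the sketch below parses pairs of documents that differ only in one of the details listed above and compares what the parser reports. It assumes Python's standard xml.etree.ElementTree; the view helper is hypothetical, and comparing attributes as a dict deliberately ignores their order.

```python
import xml.etree.ElementTree as ET


def view(xml_text):
    """Reduce an element to what is reported: name, attributes, text."""
    elem = ET.fromstring(xml_text)
    return elem.tag, dict(elem.attrib), elem.text or ""


# Two forms of an empty element.
empty_forms_equal = view("<foo/>") == view("<foo></foo>")

# Order of attributes within a start-tag.
attr_order_equal = view('<foo a="1" b="2"/>') == view('<foo b="2" a="1"/>')

# White space within start-tags and end-tags.
tag_space_equal = view('<foo   a="1"  ></foo  >') == view('<foo a="1"></foo>')

# CR-LF versus LF line termination.
line_ends_equal = view("<foo>x\r\ny</foo>") == view("<foo>x\ny</foo>")

print(empty_forms_equal, attr_order_equal, tag_space_equal, line_ends_equal)
```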
32. XML Schema
- DTDs have drawbacks
  - they can only define the element structure and attributes
  - they cannot define any database-like constraints for elements
    - Value (min, max, etc.)
    - Type (integer, string, etc.)
  - DTDs are not written in XML and cannot thus be processed with the same tools as XML documents, XSL(T), etc.
  - difficult to combine different vocabularies (namespaces)
- XML Schemas
  - are written in XML
  - avoid most of the DTD drawbacks
33. XML Schema
- XML Schema Part 1: Structures
  - Element structure definition as with DTD: elements, attributes, also enhanced ways to control structures
- XML Schema Part 2: Datatypes
  - Primitive datatypes (string, boolean, float, etc.)
  - Derived datatypes from primitive datatypes (time, recurringDate)
  - Constraining facets for each datatype (minLength, maxLength, pattern, precision, etc.)
- The following is based on
  - XML Schema Part 0: Primer (2.5.2001)
34. Reminder: DTD declarations
- <!ELEMENT name (fname, lname)>
- <!ELEMENT address (name, street, ((city, state, zipcode) | (zipcode, city)))>
- <!ELEMENT contact (address, phone, email?)>
- <!ELEMENT fname (#PCDATA)>
35. A sample document
    <?xml version="1.0"?>
    <purchaseOrder orderDate="1999-10-20">
      <shipTo country="US">
        <name>Alice Smith</name>
        <street>123 Maple Street</street>
        <city>Mill Valley</city>
        <state>CA</state>
        <zip>90952</zip>
      </shipTo>
36. Continues...
      <billTo country="US">
        <name>Robert Smith</name>
        <street>8 Oak Avenue</street>
        <city>Old Town</city>
        <state>PA</state>
        <zip>95819</zip>
      </billTo>
      <comment>Hurry, my lawn is going wild!</comment>
37. Continues...
      <items>
        <item partNum="872-AA">
          <productName>Lawnmower</productName>
          <quantity>1</quantity>
          <USPrice>148.95</USPrice>
          <comment>Confirm this is electric</comment>
        </item>
        <item partNum="926-AA">
          <productName>Baby Monitor</productName>
          <quantity>1</quantity>
          <USPrice>39.98</USPrice>
          <shipDate>1999-05-21</shipDate>
        </item>
      </items>
    </purchaseOrder>
38. DTD
    <!ELEMENT purchaseOrder (shipTo, billTo, comment?, items)>
    <!ATTLIST purchaseOrder orderDate CDATA #REQUIRED>
    <!ELEMENT shipTo (name, street, city, state, zip)>
    <!ATTLIST shipTo country CDATA #REQUIRED>
    <!ELEMENT billTo (name, street, city, state, zip)>
    <!ATTLIST billTo country CDATA #REQUIRED>
    <!ELEMENT comment (#PCDATA)>
    <!ELEMENT items (item*)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT street (#PCDATA)>
39. DTD continues
    <!ELEMENT city (#PCDATA)>
    <!ELEMENT state (#PCDATA)>
    <!ELEMENT zip (#PCDATA)>
    <!ELEMENT item (productName, quantity, USPrice, (comment | shipDate))>
    <!ATTLIST item partNum CDATA #REQUIRED>
    <!ELEMENT productName (#PCDATA)>
    <!ELEMENT quantity (#PCDATA)>
    <!ELEMENT USPrice (#PCDATA)>
    <!ELEMENT shipDate (#PCDATA)>
40. Complex and simple types
- Schema defines types for elements and attributes
- complex types allow elements in their content and may have attributes
- simple types cannot have element content and cannot have attributes
- elements can have complex or simple types, attributes can have simple types
41. XML Schema structure
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:annotation> </xsd:annotation>
      <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
      <xsd:element name="comment" type="xsd:string"/>
      <xsd:complexType name="PurchaseOrderType">
        <xsd:sequence> </xsd:sequence>
        <xsd:attribute name="orderDate" type="xsd:date"/>
      </xsd:complexType>
    </xsd:schema>
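Because a schema is itself an XML document, it can be read with the same tools as any instance document. A minimal sketch, assuming Python's standard xml.etree.ElementTree and a cut-down copy of the schema skeleton above (with an empty sequence as a placeholder), lists the top-level declarations:

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Cut-down schema skeleton; the empty xsd:sequence is a placeholder.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
  <xsd:element name="comment" type="xsd:string"/>
  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence/>
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>
</xsd:schema>"""

root = ET.fromstring(SCHEMA)

# Direct children of xsd:schema are the top-level declarations.
global_elements = {e.get("name"): e.get("type")
                   for e in root.findall(f"{{{XSD_NS}}}element")}
named_types = [t.get("name")
               for t in root.findall(f"{{{XSD_NS}}}complexType")]

print(global_elements, named_types)
```

This only reads the schema as data; actually validating an instance against it needs a schema processor, which is outside the Python standard library.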
42. USAddress type
    <xsd:complexType name="USAddress">
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="street" type="xsd:string"/>
        <xsd:element name="city" type="xsd:string"/>
        <xsd:element name="state" type="xsd:string"/>
        <xsd:element name="zip" type="xsd:decimal"/>
      </xsd:sequence>
      <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
    </xsd:complexType>
43. PurchaseOrderType
    <xsd:complexType name="PurchaseOrderType">
      <xsd:sequence>
        <xsd:element name="shipTo" type="USAddress"/>
        <xsd:element name="billTo" type="USAddress"/>
        <xsd:element ref="comment" minOccurs="0"/>
        <xsd:element name="items" type="Items"/>
      </xsd:sequence>
      <xsd:attribute name="orderDate" type="xsd:date"/>
    </xsd:complexType>
44. Shared types, references
- element declarations for shipTo and billTo associate different element names with the same complex type
- attribute declarations must reference simple types
- element comment declared on the top level of the schema (here reference only)
45. Occurrence constraints
- minOccurs, maxOccurs (both default to 1)
- minOccurs: minimum number of times an element may appear
  - element is optional, if minOccurs="0"
- maxOccurs: maximum number of times an element may appear
- attributes may appear once or not at all
46. Attributes: use, default and fixed (in attribute declarations)
- The attribute use is used in an attribute declaration to indicate whether the attribute is required, optional or prohibited
- a default value may be provided if use is set to optional
  - if the instance does not give the value, the default is used
47. Attributes: use, default and fixed (in attribute declarations)
- Attribute fixed
  - the value of the attribute is the value of fixed
    <xsd:attribute name="temp1" type="xsd:decimal" use="optional" default="37"/>
    <xsd:attribute name="temp2" type="xsd:decimal" use="optional" fixed="37"/>
    <xsd:attribute name="temp3" type="xsd:decimal" use="required" fixed="37"/>
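The interplay of default and fixed for an optional attribute can be summarized in a few lines. This is a simplified sketch in Python, not how a schema processor is implemented: the function name effective_value is hypothetical, values are compared as strings rather than in the value space of the type, and the required case (where an absent attribute is an error) is left out.

```python
def effective_value(given, default=None, fixed=None):
    """Value an application sees for an optional attribute.

    given is the value written in the instance, or None if the
    attribute is absent; default and fixed mirror the declaration.
    """
    if fixed is not None:
        # A fixed attribute may be omitted, but if present it must match.
        if given is not None and given != fixed:
            raise ValueError(f"value must be {fixed!r}, got {given!r}")
        return fixed
    if given is None:
        # Absent attribute: fall back to the declared default, if any.
        return default
    return given


print(effective_value(None, default="37"))
print(effective_value("38.5", default="37"))
print(effective_value(None, fixed="37"))
```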
48. Items
    <xsd:complexType name="Items">
      <xsd:sequence>
        <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="productName" type="xsd:string"/>
              <xsd:element name="quantity">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:positiveInteger">
                    <xsd:maxExclusive value="100"/>
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:element>
              <xsd:element name="USPrice" type="xsd:decimal"/>
              <xsd:element ref="comment" minOccurs="0"/>
              <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
            </xsd:sequence>
            <xsd:attribute name="partNum" type="Sku" use="required"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
49. Anonymous type definitions
- Schemas can be constructed by defining sets of named types such as PurchaseOrderType on the top level and then declaring elements such as purchaseOrder
- if a type is used only once, it is more compactly defined as an anonymous type
50. Anonymous type definitions
- You can define anonymous types by the lack of type in an element declaration and by the presence of an unnamed (simple or complex) type definition following the element name
- see the Items type definition
51. Global elements and attributes
- Global elements and attributes have declarations that appear as the children of the schema element
- global elements and attributes can be referenced in one or more declarations using the ref attribute
52. Global elements and attributes
- global elements can appear in the instance document in the place where they have been referenced, or at the top level of the document
- global declarations cannot contain references
- global declarations cannot contain occurrence constraints
53. Simple types
- Built-in types
  - e.g. string, integer, positiveInteger, decimal, float, boolean, time, date, recurringDay, uriReference, language, ID, IDREF
  - must have XML Schema namespace prefix
- derived types
  - derived from built-in and other derived types
  - by defining restrictions to the base type
  - each base type has a set of facets that can be used for restrictions
54. Facets
- XML Schema defines 15 facets
- e.g. string has facets length, minLength, maxLength, pattern, enumeration
- e.g. integer has facets pattern, enumeration, maxInclusive, maxExclusive, minInclusive, minExclusive, precision, scale
55. Defining a new type of integer
- New type whose range of values is between 10000 and 99999
    <xsd:simpleType name="myInteger">
      <xsd:restriction base="xsd:integer">
        <xsd:minInclusive value="10000"/>
        <xsd:maxInclusive value="99999"/>
      </xsd:restriction>
    </xsd:simpleType>
56. Patterns
    <xsd:simpleType name="Sku">
      <xsd:restriction base="xsd:string">
        <xsd:pattern value="\d{3}-[A-Z]{2}"/>
      </xsd:restriction>
    </xsd:simpleType>
- three digits followed by a hyphen followed by two upper-case ASCII letters
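To make the facets concrete, here is a rough emulation of the two restrictions in plain Python. It is a sketch of what the facets mean, not a schema validator, and the function names are made up. XML Schema patterns implicitly match the whole value, which is why fullmatch is used rather than search.

```python
import re

# Pattern facet of Sku: three digits, a hyphen, two upper-case ASCII letters.
SKU_PATTERN = re.compile(r"\d{3}-[A-Z]{2}")

# Lexical form of an integer: optional sign followed by ASCII digits.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def is_sku(text):
    """Pattern facet: the whole value must match the pattern."""
    return SKU_PATTERN.fullmatch(text) is not None


def is_my_integer(text):
    """minInclusive and maxInclusive facets of myInteger (10000..99999)."""
    token = text.strip()
    if INTEGER_LEXICAL.fullmatch(token) is None:
        return False
    return 10000 <= int(token) <= 99999


print(is_sku("872-AA"), is_sku("872-AAA"))
print(is_my_integer("10000"), is_my_integer("100000"))
```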
57. Enumeration facet
- Limits values to a set of distinct values
    <xsd:simpleType name="USState">
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="AK"/>
        <xsd:enumeration value="AL"/>
        <xsd:enumeration value="AR"/>
        <!-- and so on -->
      </xsd:restriction>
    </xsd:simpleType>
58. List types
- List types are comprised of sequences of simple types
    <xsd:element name="listOfMyInt" type="listOfMyIntType"/>
    <xsd:simpleType name="listOfMyIntType">
      <xsd:list itemType="myInteger"/>
    </xsd:simpleType>
- instance:
    <listOfMyInt>20003 15037 95977 95945</listOfMyInt>
59. Union types
- Type can be chosen from a set
    <xsd:element name="zips" type="zipUnion"/>
    <xsd:simpleType name="zipUnion">
      <xsd:union memberTypes="USState listOfMyIntType"/>
    </xsd:simpleType>
- instances:
    <zips>CA</zips>
    <zips>95630 95977 95945</zips>
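List and union types can be paraphrased as simple checks over the lexical value. The Python sketch below is illustrative only: the function names are hypothetical, and US_STATES is a small made-up subset standing in for the full USState enumeration.

```python
# Illustrative subset of the USState enumeration (assumption, not the full list).
US_STATES = {"AK", "AL", "AR", "CA", "PA"}


def is_my_integer(token):
    """Item type myInteger: an integer between 10000 and 99999."""
    return token.isascii() and token.isdigit() and 10000 <= int(token) <= 99999


def is_list_of_my_int(text):
    """List type: white-space separated items, each valid for the item type."""
    return all(is_my_integer(token) for token in text.split())


def is_zip_union(text):
    """Union type: the value is valid if any member type accepts it."""
    return text.strip() in US_STATES or is_list_of_my_int(text)


print(is_list_of_my_int("20003 15037 95977 95945"))
print(is_zip_union("CA"), is_zip_union("95630 95977 95945"), is_zip_union("ZZ"))
```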
60. Element content
- How to define attributes for elements with simple type content?
- In the instance: <internationalPrice currency="EUR">423.45</internationalPrice>
- in the sample schema, <xsd:element name="USPrice" type="xsd:decimal"/> comes close
- but simple types cannot have attributes
- -> a complex type has to be defined
61. Element content
- New complex type is derived from type decimal
    <xsd:element name="internationalPrice">
      <xsd:complexType>
        <xsd:simpleContent>
          <xsd:extension base="xsd:decimal">
            <xsd:attribute name="currency" type="xsd:string"/>
          </xsd:extension>
        </xsd:simpleContent>
      </xsd:complexType>
    </xsd:element>
62. Mixed content
- Element contains both character data and subelements
    <letterBody>
      <salutation>Dear Mr.<name>Robert Smith</name>.</salutation>
      Your order of <quantity>1</quantity>
      <productName>Baby Monitor</productName>
      shipped from our warehouse on
      <shipDate>1999-05-21</shipDate>
    </letterBody>
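How an application sees mixed content depends on the API. As one illustration, Python's standard xml.etree.ElementTree (an assumed tool, not one named in the slides) keeps the character data that follows a child element in that child's tail attribute. The letter is written on one line here so that the white space is predictable.

```python
import xml.etree.ElementTree as ET

LETTER = ("<letterBody>"
          "<salutation>Dear Mr.<name>Robert Smith</name>.</salutation>"
          "Your order of <quantity>1</quantity> "
          "<productName>Baby Monitor</productName>"
          " shipped from our warehouse on "
          "<shipDate>1999-05-21</shipDate>"
          "</letterBody>")

body = ET.fromstring(LETTER)

# Subelements of letterBody, in document order.
children = [child.tag for child in body]

# Character data between child elements is stored as the previous child's tail.
text_after_salutation = body.find("salutation").tail

# All character data of the letter, with the markup removed.
full_text = "".join(body.itertext())

print(children)
print(repr(text_after_salutation))
print(full_text)
```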
63. Mixed content
    <xsd:element name="letterBody">
      <xsd:complexType mixed="true">
        <xsd:sequence>
          <xsd:element name="salutation">
            <xsd:complexType mixed="true">
              <xsd:sequence>
                <xsd:element name="name" type="xsd:string"/>
              </xsd:sequence>
            </xsd:complexType>
          </xsd:element>
          <xsd:element name="quantity" type="xsd:positiveInteger"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
64. Empty content
- Assume we want the internationalPrice element to have both the unit of currency and the price as attribute values
  - <internationalPrice currency="EUR" value="423.45"/>
- i.e. the element has no content
- solution: no elements defined in the content model
65. Empty content
    <xsd:element name="internationalPrice">
      <xsd:complexType>
        <xsd:complexContent>
          <xsd:restriction base="xsd:anyType">
            <xsd:attribute name="currency" type="xsd:string"/>
            <xsd:attribute name="value" type="xsd:decimal"/>
          </xsd:restriction>
        </xsd:complexContent>
      </xsd:complexType>
    </xsd:element>
66. Shorthand for empty complex type
    <xsd:element name="internationalPrice">
      <xsd:complexType>
        <xsd:attribute name="currency" type="xsd:string"/>
        <xsd:attribute name="value" type="xsd:decimal"/>
      </xsd:complexType>
    </xsd:element>
67. anyType
- The anyType seen in the definition for an empty content model represents an abstraction which is the base type from which all simple and complex types are derived
- anyType does not constrain its content in any way
- can be used like other types
- is the default if no type is specified
  - <xsd:element name="anything"/>
68. Building content models
- <xsd:sequence>: fixed order
- <xsd:choice>: (1) choice of alternatives
- <xsd:group>: grouping (also named)
- <xsd:all>: no order specified
69. Nested choice and sequence groups
    <xsd:complexType name="PurchaseOrderType">
      <xsd:sequence>
        <xsd:choice>
          <xsd:group ref="shipAndBill"/>
          <xsd:element name="singleUSAddress" type="USAddress"/>
        </xsd:choice>
        <xsd:element name="items" type="Items"/>
      </xsd:sequence>
    </xsd:complexType>
70. Nested choice and sequence groups
    <xsd:group name="shipAndBill">
      <xsd:sequence>
        <xsd:element name="shipTo" type="USAddress"/>
        <xsd:element name="billTo" type="USAddress"/>
      </xsd:sequence>
    </xsd:group>
71. An all group
- An all group: all the elements in the group may appear once or not at all, and they may appear in any order
- limited to the top level of any content model
- has to be the only child at the top
- the group's children must all be individual elements (no groups), and no element in the content model may appear more than once
72. An all group
    <xsd:complexType name="PurchaseOrderType">
      <xsd:all>
        <xsd:element name="shipTo" type="USAddress"/>
        <xsd:element name="billTo" type="USAddress"/>
        <xsd:element ref="comment" minOccurs="0"/>
        <xsd:element name="items" type="Items"/>
      </xsd:all>
      <xsd:attribute name="orderDate" type="xsd:date"/>
    </xsd:complexType>
73. Attribute groups
- Also attribute definitions can be grouped and named
    <xsd:element name="item">
      <xsd:complexType>
        <xsd:sequence> </xsd:sequence>
        <xsd:attributeGroup ref="ItemDelivery"/>
      </xsd:complexType>
    </xsd:element>
    <xsd:attributeGroup name="ItemDelivery">
      <xsd:attribute name="partNum" type="Sku"/>
    </xsd:attributeGroup>
74. Namespaces and XML Schema
- An XML Schema document contains declarations of namespaces that are used in the document
  - e.g. xmlns:xsd="http://www.w3.org/2001/XMLSchema" for the elements with special XML Schema semantics
- Target namespace: the definitions included in this schema define this namespace
  - targetNamespace="uri:mywork"
75. Namespaces and XML Schema
- In XML Schema, schema components from different target namespaces can be used together
- -> enables the schema validation of instance content defined across multiple namespaces
76. Importing schema declarations
- Every top-level schema component is associated with a target namespace (or, explicitly, with none, if the target namespace is not defined)
- a component may refer to another component that is in a different namespace, using an import element
77. Import
    <schema xmlns="http://www.w3.org/2001/XMLSchema"
            xmlns:html="http://www.w3.org/1999/xhtml"
            targetNamespace="uri:mywork"
            xmlns:my="uri:mywork">
      <import namespace="http://www.w3.org/1999/xhtml"/>
      <complexType name="myType">
        <sequence>
          <element ref="html:p" minOccurs="0"/>
        </sequence>
      </complexType>
      <element name="myElt" type="my:myType"/>
    </schema>
78. Type libraries
- As XML schemas become more widespread, schema authors will want to create simple and complex types that can be shared and used as the basic building blocks for building new schemas
- XML Schemas already provide types that play this role: the simple types
- other examples: currency, units of measurement, business addresses
79. Example: currencies
    <schema targetNamespace="http://www.example.com/Currency"
            xmlns:c="http://www.example.com/Currency"
            xmlns="http://www.w3.org/2000/08/XMLSchema">
      <complexType name="Currency">
        <simpleContent>
          <extension base="decimal">
            <attribute name="name">
              <simpleType>
                <restriction base="string">
                  <enumeration value="AED"/>
                  <enumeration value="AFA"/>
                  <enumeration value="ALL"/>
80. Extending content models
- Mixed content models
  - an element can contain, in addition to subelements, also arbitrary character data
- import
  - an element can contain elements whose types are imported from external namespaces
  - e.g. this element may contain an HTML p element here
- more flexible way
  - any element, any attribute
81. Example
    <purchaseReport xmlns="http://www.example.com/Report">
      <regions> <!-- part sales by regions --> </regions>
      <parts> <!-- part descriptions --> </parts>
      <htmlExample>
        <table xmlns="http://www.w3.org/1999/xhtml" border="0" width="100%">
          <tr>
            <th align="left">Zip Code</th>
            <th align="left">Part Number</th>
            <th align="left">Quantity</th>
          </tr>
          <tr><td>95819</td><td> </td><td> </td></tr>
          <tr><td> </td><td>872-AAA</td><td>1</td></tr>
          ...
82. Including an HTML table
- To permit the appearance of HTML in the instance document we modify the report schema by declaring the content of the element htmlExample by the any element
- in general, an any element specifies that any well-formed XML is permissible in a type's content model
- in the example, we require the XML to belong to the namespace http://www.w3.org/1999/xhtml
- -> the XML should be XHTML
83. Schema declaration with any
    <element name="purchaseReport">
      <complexType>
        <sequence>
          <element name="regions" type="r:RegionsType"/>
          <element name="parts" type="r:PartsType"/>
          <element name="htmlExample">
            <complexType>
              <sequence>
                <any namespace="http://www.w3.org/1999/xhtml"
                     minOccurs="1" maxOccurs="unbounded"
                     processContents="skip"/>
              </sequence>
              ...
84. Schema validation
- The attribute processContents
  - skip: no validation
  - strict: an XML processor is obliged to obtain the schema associated with the required namespace and validate the HTML appearing within the htmlExample element
85. anyAttribute
    <element name="htmlExample">
      <complexType>
        <sequence>
          <any namespace="http://www.w3.org/1999/xhtml"
               minOccurs="1" maxOccurs="unbounded"
               processContents="skip"/>
        </sequence>
        <anyAttribute namespace="http://www.w3.org/1999/xhtml"/>
      </complexType>
    </element>