Title: CIS336 Website design, implementation and management (also Semester 2 of CIS219, CIS221 and IT226)
1. CIS336 Website design, implementation and management (also Semester 2 of CIS219, CIS221 and IT226)
- Lecture 5
- XML Schema
- (Based on Møller and Schwartzbach, 2006, pp. 113-159)
David Meredith d.meredith_at_gold.ac.uk www.titanmusic.com/teaching/cis336-2006-7.html
2. Problems with DTDs
- DTDs cannot constrain character data
  - e.g., cannot specify that (#PCDATA) must only be a valid integer representation
  - need more powerful datatype mechanism
- Attribute types are too limited
  - e.g., cannot specify that an attribute value must be an integer, a URI etc.
- Element and attribute definitions cannot depend on context
  - e.g., cannot specify that unit attribute is only allowed if amount attribute is present
- Character data cannot be combined with regular expression content model
  - i.e., mixed content always has form (#PCDATA | e1 | e2)*
  - cannot specify order in which character data may be interspersed with elements
- Element content model lacks "interleaving" operator that allows us to specify that an element may occur anywhere inside an element
  - e.g., cannot (easily) specify that comment element may occur anywhere in contents of recipe element
3. More problems with DTDs
- DTD provides very limited support for modularity, reuse and evolution of schemas
  - hard to write, maintain and read large DTD schemas
- ID/IDREF mechanism is too limited
  - sometimes want to specify a more restricted scope for an ID attribute than the whole instance document
  - also might want to use multiple attribute values or character data as keys rather than just a single attribute value
- DTDs do not support namespaces
4. XML Schema
- DTDs defined as part of the XML 1.0 specification (February 1998)
  - inherited from SGML
- Shortly afterwards, W3C initiated XML Schema project to deal with problems in DTDs
- XML Schema Requirements (1999) specifies that XML Schema should be
  - more expressive than XML DTD
  - a well-formed XML language
  - self-describing
    - i.e., it should be possible to describe the syntax of XML Schema using an XML Schema (since XML Schema is an XML language)
  - simple enough to implement with modest design and runtime resources (which limits expressiveness)
- XML Schema specification should be
  - defined quickly to prevent competing schema languages gaining a foothold
  - precise, concise, human-readable and illustrated with examples
5. XML Schema technical requirements
- XML Schema should
  - contain mechanism for constraining use of namespaces
  - allow creation of user-defined datatypes for describing character data and attribute values
  - enable inheritance for element, attribute and datatype definitions
  - support evolution of schemas
  - permit embedded structured documentation within schemas
6. XML Schema recommendation
- Official XML Schema specification published as W3C recommendation in 2001
  - in 2 parts
- XML Schema Part 1: Structures
  - Describes core XML Schema including, for example, element and attribute declarations
  - Most recent version: Second Edition, 28 October 2004
  - Available online at
    - http://www.w3.org/TR/xmlschema-1/
- XML Schema Part 2: Datatypes
  - Defines facilities for defining datatypes in XML Schema
  - Most recent version: Second Edition, 28 October 2004
  - Available online at
    - http://www.w3.org/TR/xmlschema-2/
- Does not satisfy all original requirements
  - not simple
- Partly remedied by XML Schema Part 0: Primer
  - Provides easily readable description of the XML Schema facilities
  - Most recent version: 28 October 2004
  - Available online at
    - http://www.w3.org/TR/xmlschema-0/
7. XML Schema overview
- Contains a sophisticated type system like those in common programming languages
  - Facilitates re-use and improves schema structure
- Four central constructs in XML Schema, all based on types, are as follows
- Simple type definition
  - Defines a family of Unicode text strings
  - Describes text without markup
- Complex type definition
  - Defines validity requirements for attributes, sub-elements and character data in an element of that type
  - Describes text which may contain markup
- Element declaration
  - Associates element name with either a simple or complex type
- Attribute declaration
  - Associates attribute name with simple type
  - Attribute values are always unstructured text
8. An example schema written in XML Schema
- Schema at left shows
  - one element declaration
    - student
  - two attribute declarations
    - id, score
  - one complex type definition
    - StudentType
  - one simple type definition
    - Score
- XML Schema elements identified by namespace http://www.w3.org/2001/XMLSchema
  - Namespace prefix ("xsd") is arbitrary but conventional
- Root element in XML Schema document is named schema
  - usually contains targetNamespace attribute
    - defines namespace being defined by the schema
  - also declare this namespace with a prefix so that we can refer to definitions within the schema
- Definitions create new types; declarations describe constituents of the instance document
- Definitions and declarations populate the target namespace
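The schema figure from this slide does not survive in the text. A minimal sketch of a schema with the constituents listed above might look as follows. The target namespace http://example.org/students, the prefix s and the 0 to 100 range for Score are assumptions made for illustration, not taken from the original figure.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:s="http://example.org/students"
            targetNamespace="http://example.org/students">

  <!-- element declaration: associates the name student with a complex type -->
  <xsd:element name="student" type="s:StudentType"/>

  <!-- attribute declarations: associate the names id and score with simple types -->
  <xsd:attribute name="id" type="xsd:string"/>
  <xsd:attribute name="score" type="s:Score"/>

  <!-- complex type definition: two required attributes, no content -->
  <xsd:complexType name="StudentType">
    <xsd:attribute ref="s:id" use="required"/>
    <xsd:attribute ref="s:score" use="required"/>
  </xsd:complexType>

  <!-- simple type definition: integers from 0 to 100 (assumed range) -->
  <xsd:simpleType name="Score">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="0"/>
      <xsd:maxInclusive value="100"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

The prefix s is bound to the same URI as targetNamespace, which is what allows s:StudentType, s:Score, s:id and s:score to refer to definitions and declarations made inside this schema.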
9. Syntax for element and attribute declarations
- Element declaration has form <element name="name" type="type"/>
  - associates simple or complex type, type, with the element named name
- Attribute declaration has form <attribute name="name" type="type"/>
  - associates simple type, type, with an attribute named name
10. Simple student instance document
- Can avoid use of prefixes in attribute names
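The instance document from this slide is not preserved. As a hedged sketch, assume a student schema with the hypothetical target namespace http://example.org/students in which id and score are declared locally inside the complex type (using name rather than ref). Local attribute declarations are unqualified by default, so the instance can write them without a prefix. The attribute values are invented.

```xml
<!-- id and score are local, unqualified attributes, so no prefix is needed on them -->
<s:student xmlns:s="http://example.org/students"
           id="19970233"
           score="97"/>
```

If id and score were instead declared globally (as direct children of the schema element), they would belong to the target namespace and would have to be written as s:id and s:score.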
11. Business card example
- Instance doc at top left in language defined at bottom left
- Assume we own the domain businesscard.org
  - so no-one else uses this namespace
- Can fix it so that there is no need for prefix in uri attribute
- Compare DTD
12. Connecting instance documents and schemas
- Instance document can refer to a schema using schemaLocation attribute from the namespace http://www.w3.org/2001/XMLSchema-instance
- Value of schemaLocation attribute has two parts, separated by whitespace
  - target namespace of schema
  - URI of schema document
- schemaLocation indicates that document is supposed to be valid with respect to the schema
- schemaLocation attributes may appear in any element
  - usually appear in root element
  - can also appear in another element to indicate that the schema applies to the subtree under that element
  - means XML languages can be combined at will
- schemaLocation attribute value is actually a sequence of "namespace URI" pairs
  - if more than one pair, all schemas apply independently
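A sketch of an instance document that points at its schema is given below. The namespace http://businesscard.org follows the business card example. The schema file name business_card.xsd, the child elements and their contents are assumptions for illustration.

```xml
<card xmlns="http://businesscard.org"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://businesscard.org business_card.xsd">
  <!-- first part of schemaLocation: target namespace; second part: URI of the schema document -->
  <name>John Doe</name>
  <email>john.doe@example.org</email>
</card>
```

Further "namespace URI" pairs can be appended to the same attribute value, separated by whitespace.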
13. More on schemaLocation
- All attributes defined in http://www.w3.org/2001/XMLSchema-instance are implicitly declared for all elements in instance document
- schemaLocation attributes are optional
  - make instance documents self-describing
- Applications require documents to be valid relative to schemas decided by application developers, not schemas decided by document authors
- XML Schema does not directly enforce a particular root element
  - e.g., an XML Schema definition of XHTML cannot express that the root element must be html
  - means that application must check root element as well as carrying out XML validation
14. Simple types
- Simple type or datatype is a set of Unicode strings with a particular semantic interpretation
  - e.g., decimal datatype is built-in XML Schema datatype which consists of all strings that represent decimal numbers (e.g., 3.1415)
    - 3.1415 is equal to 3.141500
    - 42 is less than 117
- XML Schema contains some primitive simple types with pre-defined meanings
- XML Schema also provides various mechanisms for deriving new types from existing ones
15. Simple Types (Datatypes): Primitive
- string: any Unicode string
- boolean: true, false, 1, 0
- decimal: 3.1415
- float: 6.02214199E23
- double: 42E970
- dateTime: 2004-09-26T16:29:00-05:00
- time: 16:29:00-05:00
- date: 2004-09-26
- hexBinary: 48656c6c6f0a
- base64Binary: SGVsbG8K
- anyURI: http://www.brics.dk/ixwt/
- QName: rcp:recipe, recipe
- ...
16. Some built-in derived simple types
- normalizedString
  - as string but whitespace facet is replace
- token
  - as string but whitespace facet is collapse
- language
  - "en", "da", "en-US", etc.
- NMTOKEN
  - e.g., "42", "my.form", "r103"
- NMTOKENS
  - e.g., "42 my.form r103"
- nonPositiveInteger
  - e.g., "-87", "0"
17A simple type element declaration
- ltelement name"serialnumber" type"nonNegativeInte
ger"/gt - assigns built-in primitive simple type,
nonNegativeInteger, to elements named
serialnumber - contents of a serialnumber element must match
nonNegativeInteger (possibly with surrounding
whitespace) - serialnumber element cannot contain child
elements or attributes
18. Deriving new simple types by restriction
- Restriction of a simple type defines a new type by restricting possible values of a base type
  - restriction performed on facets of base type (see table above left)
  - restriction may contain multiple constraining facets
- Facet restrictions operate at semantic not syntactic level
  - e.g., <totalDigits value="3"/> allows 123, 0123 and 0123.0 but not 1234 and 123.05
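As a hedged illustration of restriction by facets, the sketch below derives two new simple types. The type names score_type and three_digits and the 0 to 100 range are assumptions chosen for this example.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- two constraining facets in one restriction: integers from 0 to 100 -->
  <xsd:simpleType name="score_type">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="0"/>
      <xsd:maxInclusive value="100"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- totalDigits constrains the value, not the spelling: 0123.0 is the value 123 -->
  <xsd:simpleType name="three_digits">
    <xsd:restriction base="xsd:decimal">
      <xsd:totalDigits value="3"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```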
19. Deriving new simple types by restriction
- enumeration facet restricts values to a finite set of possibilities (see above left)
- pattern facet allows values to be constrained to satisfy regular expressions (see above right)
  - symbols that have a special meaning within regular expressions can be escaped by prefixing with a backslash (e.g., \)
- For most facets, restrictions may be changed in further derivations unless fixed="true" attribute is added to constraining facet
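The figures for this slide are missing, so the following is only a sketch of what enumeration and pattern restrictions can look like. The type names weekend_day and version_number and their values are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- enumeration: the value must be one of a finite set of strings -->
  <xsd:simpleType name="weekend_day">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="Saturday"/>
      <xsd:enumeration value="Sunday"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- pattern: digits, a literal dot (escaped with a backslash), then digits -->
  <xsd:simpleType name="version_number">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[0-9]+\.[0-9]+"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

Under these definitions "2.10" would match version_number while "2x10" would not, because the unescaped dot is not what the pattern asks for.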
20. Deriving simple types using list and union
- Use the list element inside a simpleType definition to define a whitespace-separated string of values of a particular type (see above left)
  - e.g., "23 4 56 -7" is of type integerlist
- Use union element inside a simpleType definition to specify that a value must be one of two or more types
  - e.g., "true" and "1.3" are both of type boolean_or_decimal
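The two type names used on this slide suggest definitions along the following lines. This is a reconstruction under that assumption rather than the original figure.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- list: whitespace-separated integers, e.g. "23 4 56 -7" -->
  <xsd:simpleType name="integerlist">
    <xsd:list itemType="xsd:integer"/>
  </xsd:simpleType>

  <!-- union: the value must be a valid boolean or a valid decimal -->
  <xsd:simpleType name="boolean_or_decimal">
    <xsd:union memberTypes="xsd:boolean xsd:decimal"/>
  </xsd:simpleType>
</xsd:schema>
```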
21. Complex types
- An element declaration may assign a complex type to an element name: <element name="card" type="b:card_type"/>
  - means that elements with the name card must satisfy all the requirements specified in the definition of the type card_type
  - complex type definition may specify attributes, child element types and ordering, and character data
- Complex type defined using XML Schema element, complexType
  - content of complexType element can be either complex or simple
22. Element reference
- Element reference takes the form <element ref="name"/>
  - name is the name of an element that has already been declared
- Note difference between an element element with a name attribute and one with a ref attribute!
23. sequence element
- Concatenation within the content of an element with a complex content model is expressed using the sequence element
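Complex types, element references and sequence can be seen working together in a small sketch. The namespace http://businesscard.org and the names card and card_type follow the business card example. The child elements name and title are assumptions.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:b="http://businesscard.org"
            targetNamespace="http://businesscard.org">

  <!-- element declarations (name attribute) -->
  <xsd:element name="card" type="b:card_type"/>
  <xsd:element name="name" type="xsd:string"/>
  <xsd:element name="title" type="xsd:string"/>

  <!-- complex type: a name element followed by a title element -->
  <xsd:complexType name="card_type">
    <xsd:sequence>
      <!-- element references (ref attribute) to the declarations above -->
      <xsd:element ref="b:name"/>
      <xsd:element ref="b:title"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```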
24. choice element
- Union (i.e., the '|' operator in a regular expression) corresponds to the choice element
- At left, each card element contains either an email element or zero or one phone elements but not both
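The figure referred to on this slide is not preserved. A content model matching its description (either an email element, or zero or one phone elements) could be sketched as follows. The leading name element is an assumption.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:b="http://businesscard.org"
            targetNamespace="http://businesscard.org">

  <xsd:element name="card" type="b:card_type"/>
  <xsd:element name="name" type="xsd:string"/>
  <xsd:element name="email" type="xsd:string"/>
  <xsd:element name="phone" type="xsd:string"/>

  <xsd:complexType name="card_type">
    <xsd:sequence>
      <xsd:element ref="b:name"/>
      <!-- exactly one branch is taken: email, or an optional phone -->
      <xsd:choice>
        <xsd:element ref="b:email"/>
        <xsd:element ref="b:phone" minOccurs="0"/>
      </xsd:choice>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```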
25. all element
- A content sequence matches an all expression if each constituent of the expression is matched somewhere in the content model and every element in the content model is matched by a constituent in the expression
- Essentially a variant of sequence in which order does not matter
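A minimal sketch of an all content model, again using the business card namespace with assumed child elements:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:b="http://businesscard.org"
            targetNamespace="http://businesscard.org">

  <xsd:element name="card" type="b:card_type"/>
  <xsd:element name="name" type="xsd:string"/>
  <xsd:element name="email" type="xsd:string"/>
  <xsd:element name="phone" type="xsd:string"/>

  <!-- name and email must each occur once, phone at most once, in any order -->
  <xsd:complexType name="card_type">
    <xsd:all>
      <xsd:element ref="b:name"/>
      <xsd:element ref="b:email"/>
      <xsd:element ref="b:phone" minOccurs="0"/>
    </xsd:all>
  </xsd:complexType>
</xsd:schema>
```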
26. any element
- The empty any element is a wildcard that matches any element
- Attribute namespace limits matching elements in various ways
  - whitespace-separated list of URIs
  - ##targetNamespace
  - ##local
    - empty namespace
  - ##any
  - ##other
    - any namespace except targetNamespace
27. any element
- Can be used to specify that a different language is used inside an element
  - e.g., XHTML used inside the info element in WidgetML (see above)
  - content must consist of one or more elements from the XHTML namespace
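The WidgetML figure is not reproduced in the text. The sketch below shows how an info element restricted to XHTML content might be declared. The WidgetML namespace URI http://example.org/widgetml and the type name info_type are placeholders, not the ones used in the original example.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:w="http://example.org/widgetml"
            targetNamespace="http://example.org/widgetml">

  <xsd:element name="info" type="w:info_type"/>

  <xsd:complexType name="info_type">
    <xsd:sequence>
      <!-- wildcard: one or more elements, all from the XHTML namespace -->
      <xsd:any namespace="http://www.w3.org/1999/xhtml"
               minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```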
28. Some restrictions
- all element may only contain element references
- sequence and choice elements cannot contain all elements
- complexType contents cannot consist of a single element or any declaration
  - need to wrap it in a sequence or choice element
29. Attribute references
- A complex type may optionally contain a number of attribute references of the form <attribute ref="name"/>
  - name is the name of the attribute that has been declared elsewhere
  - attribute reference must appear after the content model description of a complex type
  - attribute reference can contain an attribute named use which can take the values optional (default) or required
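A hedged sketch of attribute references inside a complex type follows. The attribute names id and lang are invented for this example.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:b="http://businesscard.org"
            targetNamespace="http://businesscard.org">

  <xsd:element name="card" type="b:card_type"/>
  <xsd:element name="name" type="xsd:string"/>

  <!-- attribute declarations made elsewhere in the schema -->
  <xsd:attribute name="id" type="xsd:string"/>
  <xsd:attribute name="lang" type="xsd:language"/>

  <xsd:complexType name="card_type">
    <xsd:sequence>
      <xsd:element ref="b:name"/>
    </xsd:sequence>
    <!-- attribute references come after the content model -->
    <xsd:attribute ref="b:id" use="required"/>
    <xsd:attribute ref="b:lang"/> <!-- use defaults to optional -->
  </xsd:complexType>
</xsd:schema>
```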
30. minOccurs and maxOccurs
- minOccurs and maxOccurs attributes can be used with
  - element, sequence, choice, all and any elements
- define possible cardinalities of the element
- values must be non-negative integers or, for maxOccurs, unbounded
- by default, minOccurs and maxOccurs are 1
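The cardinality attributes can be illustrated with a small sketch. The element names are assumptions in the style of the business card example.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:b="http://businesscard.org"
            targetNamespace="http://businesscard.org">

  <xsd:element name="card" type="b:card_type"/>
  <xsd:element name="name" type="xsd:string"/>
  <xsd:element name="email" type="xsd:string"/>
  <xsd:element name="phone" type="xsd:string"/>

  <xsd:complexType name="card_type">
    <xsd:sequence>
      <!-- defaults apply: exactly one name -->
      <xsd:element ref="b:name"/>
      <!-- zero or one email -->
      <xsd:element ref="b:email" minOccurs="0"/>
      <!-- any number of phone elements, including none -->
      <xsd:element ref="b:phone" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```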
31. mixed attribute
- complexType may optionally have an attribute, mixed="true"
  - means arbitrary character data is permitted anywhere in the content in addition to the elements declared in the content model
- Without mixed="true" attribute, only whitespace allowed between elements in content model
- Character data cannot be constrained if we also want to allow elements in the content
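A minimal sketch of mixed content is shown below. The namespace http://example.org/doc and the names para, em and para_type are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:d="http://example.org/doc"
            targetNamespace="http://example.org/doc">

  <xsd:element name="para" type="d:para_type"/>
  <xsd:element name="em" type="xsd:string"/>

  <!-- mixed="true": character data may appear before, between and after the em elements -->
  <xsd:complexType name="para_type" mixed="true">
    <xsd:sequence>
      <xsd:element ref="d:em" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

With this definition a para element such as `<d:para>Some <d:em>important</d:em> text</d:para>` would be acceptable, but the schema has no way to constrain what the surrounding character data says.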