CIS336 Website design, implementation and management (also Semester 2 of CIS219, CIS221 and IT226)

1
CIS336 Website design, implementation and
management (also Semester 2 of CIS219, CIS221 and
IT226)
  • Lecture 5
  • XML Schema
  • (Based on Møller and Schwartzbach, 2006,
    pp. 113-159)

David Meredith d.meredith@gold.ac.uk
www.titanmusic.com/teaching/cis336-2006-7.html
2
Problems with DTDs
  • DTDs cannot constrain character data
  • e.g., cannot specify that (#PCDATA) must only be
    a valid integer representation
  • need more powerful datatype mechanism
  • Attribute types are too limited
  • e.g., cannot specify that an attribute value must
    be an integer, a URI etc.
  • Element and attribute definitions cannot depend
    on context
  • e.g., cannot specify that unit attribute only
    allowed if amount attribute is present
  • Character data cannot be combined with regular
    expression content model
  • i.e., mixed content always has form
    (#PCDATA | e1 | e2)*
  • cannot specify order in which character data may
    be interspersed with elements
  • Element content model lacks an "interleaving"
    operator that allows us to specify that an
    element may occur anywhere inside the content of
    another element
  • e.g., cannot (easily) specify that a comment
    element may occur anywhere in the contents of a
    recipe element

3
More problems with DTDs
  • DTD provides very limited support for modularity,
    reuse and evolution of schemas
  • hard to write, maintain and read large DTD
    schemas
  • ID/IDREF mechanism is too limited
  • sometimes want to specify a more restricted scope
    for an ID attribute than the whole instance
    document
  • also might want to use multiple attribute values
    or character data as keys rather than just single
    attribute value
  • DTDs do not support namespaces

4
XML Schema
  • DTDs defined as part of the XML 1.0 specification
    (February 1998)
  • inherited from SGML
  • Shortly afterwards, W3C initiated XML Schema
    project to deal with problems in DTDs
  • XML Schema Requirements (1999) specifies that XML
    Schema should be
  • more expressive than XML DTD
  • a well-formed XML language
  • self-describing
  • i.e., it should be possible to describe the
    syntax of XML Schema using an XML Schema (since
    XML Schema is an XML language)
  • simple enough to implement with modest design and
    runtime resources (which limits expressiveness)
  • XML Schema specification should be
  • defined quickly to prevent competing schema
    languages gaining a foothold
  • precise, concise, human-readable and illustrated
    with examples

5
XML Schema technical requirements
  • XML Schema should
  • contain mechanism for constraining use of
    namespaces
  • allow creation of user-defined datatypes for
    describing character data and attribute values
  • enable inheritance for element, attribute and
    datatype definitions
  • support evolution of schemas
  • permit embedded structured documentation within
    schemas

6
XML Schema recommendation
  • Official XML Schema specification published as
    W3C recommendation in 2001
  • in 2 parts
  • XML Schema Part 1: Structures
  • Describes core XML Schema including, for example,
    element and attribute declarations
  • Most recent version: Second Edition, 28 October
    2004
  • Available online at
    http://www.w3.org/TR/xmlschema-1/
  • XML Schema Part 2: Datatypes
  • Defines facilities for defining datatypes in XML
    Schema
  • Most recent version: Second Edition, 28 October
    2004
  • Available online at
    http://www.w3.org/TR/xmlschema-2/
  • Does not satisfy all original requirements
  • not simple
  • Partly remedied by XML Schema Part 0: Primer
  • Provides easily readable description of the XML
    Schema facilities
  • Most recent version: 28 October 2004
  • Available online at
    http://www.w3.org/TR/xmlschema-0/

7
XML Schema overview
  • Contains a sophisticated type system like those
    in common programming languages
  • Facilitates re-use and improves schema structure
  • The four central constructs in XML Schema are all
    based on types:
  • Simple type definition
  • Defines a family of Unicode text strings
  • Describes text without markup
  • Complex type definition
  • Defines validity requirements for attributes,
    sub-elements and character data in an element of
    that type
  • Describes text which may contain markup
  • Element declaration
  • Associates element name with either a simple or
    complex type
  • Attribute declaration
  • Associates attribute name with simple type
  • Attribute values are always unstructured text

8
An example schema written in XML Schema
  • Schema at left shows
  • one element declaration
  • student
  • two attribute declarations
  • id, score
  • one complex type definition
  • StudentType
  • one simple type definition
  • Score
  • XML Schema elements identified by namespace
    http://www.w3.org/2001/XMLSchema
  • Namespace prefix ("xsd") is arbitrary but
    conventional
  • Root element in XML Schema document is named
    schema
  • usually contains targetNamespace attribute
  • defines namespace being defined by the schema
  • this namespace is usually also declared with a
    prefix so that definitions within the schema can
    be referred to
  • Definitions create new types; declarations
    describe constituents of the instance document
  • Definitions and declarations populate the target
    namespace
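The schema figure for this slide is not reproduced in this transcript. What follows is a minimal sketch of a schema with the same four constituents. The target namespace http://www.example.org/students and the 0 to 100 range for Score are hypothetical. Instead of binding the XML Schema namespace to xsd, the sketch makes it the default namespace, so built-in types such as string and integer need no prefix.

```xml
<!-- Sketch: the namespace URI and the Score range are hypothetical -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:s="http://www.example.org/students"
        targetNamespace="http://www.example.org/students">

  <!-- element declaration: student elements have type StudentType -->
  <element name="student" type="s:StudentType"/>

  <!-- attribute declarations (global, so they belong to the target namespace) -->
  <attribute name="id" type="string"/>
  <attribute name="score" type="s:Score"/>

  <!-- complex type definition: no child elements, two required attributes -->
  <complexType name="StudentType">
    <attribute ref="s:id" use="required"/>
    <attribute ref="s:score" use="required"/>
  </complexType>

  <!-- simple type definition: integers from 0 to 100 -->
  <simpleType name="Score">
    <restriction base="integer">
      <minInclusive value="0"/>
      <maxInclusive value="100"/>
    </restriction>
  </simpleType>
</schema>
```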

9
Syntax for element and attribute declarations
  • Element declaration has form
    <element name="name" type="type"/>
  • associates simple or complex type, type, with the
    element named name
  • Attribute declaration has form
    <attribute name="name" type="type"/>
  • associates simple type, type, with an attribute
    named name

10
Simple student instance document
  • Can avoid use of prefixes in attribute names

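Here is a minimal sketch of an instance document. It assumes a schema, not shown here, that declares student with global id and score attributes in the hypothetical namespace http://www.example.org/students. Globally declared attributes belong to the target namespace, so they must be prefixed. If the attributes are instead declared locally inside the complex type, they are unqualified by default, and the prefix can be dropped.

```xml
<!-- Global attribute declarations: attribute names must be qualified -->
<s:student xmlns:s="http://www.example.org/students"
           s:id="100026345" s:score="92"/>

<!-- With id and score declared locally in StudentType (hypothetical variant),
     the same data needs no attribute prefixes:
     <s:student xmlns:s="http://www.example.org/students"
                id="100026345" score="92"/> -->
```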
11
Business card example
  • Instance doc at top left in language defined at
    bottom left
  • Assume we own the domain businesscard.org
  • so no-one else uses this namespace
  • Can arrange things so that no prefix is needed on
    the uri attribute
  • Compare DTD
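The instance document and schema figures are not reproduced here. The following is a minimal sketch of a schema for such business cards, using the namespace http://businesscard.org from the slide. The child elements name, title, email, phone and logo are assumptions. The uri attribute is declared locally inside the logo type, and that is what lets it appear without a prefix in instance documents.

```xml
<!-- Sketch: element names other than card are hypothetical -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:b="http://businesscard.org"
        targetNamespace="http://businesscard.org">

  <element name="card" type="b:card_type"/>
  <element name="name" type="string"/>
  <element name="title" type="string"/>
  <element name="email" type="string"/>
  <element name="phone" type="string"/>
  <element name="logo" type="b:logo_type"/>

  <!-- children must appear in this order; phone and logo are optional -->
  <complexType name="card_type">
    <sequence>
      <element ref="b:name"/>
      <element ref="b:title"/>
      <element ref="b:email"/>
      <element ref="b:phone" minOccurs="0"/>
      <element ref="b:logo" minOccurs="0"/>
    </sequence>
  </complexType>

  <!-- local attribute declaration: uri is unqualified, so it needs no prefix -->
  <complexType name="logo_type">
    <attribute name="uri" type="anyURI" use="required"/>
  </complexType>
</schema>
```

Under this sketch, an instance can declare http://businesscard.org as its default namespace and write `<logo uri="widget.gif"/>` with no prefixes at all.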

12
Connecting instance documents and schemas
  • Instance document can refer to a schema using the
    schemaLocation attribute from the namespace
    http://www.w3.org/2001/XMLSchema-instance
  • Value of schemaLocation attribute has two parts,
    separated by whitespace
  • target namespace of schema
  • URI of schema document
  • schemaLocation indicates that document is
    supposed to be valid with respect to the schema
  • schemaLocation attributes may appear in any
    element
  • usually appear in root element
  • can also appear in another element to indicate
    that the schema applies to the subtree under that
    element
  • means XML languages can be combined at will
  • schemaLocation attribute value is actually
    sequence of "namespace URI" pairs
  • if more than one pair, all schemas apply
    independently
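Below is a minimal sketch of the root element of an instance document that points at its schema. The schema file name business_card.xsd and the element contents are hypothetical. The xsi prefix is conventional but, like any prefix, arbitrary.

```xml
<card xmlns="http://businesscard.org"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://businesscard.org business_card.xsd">
  <!-- schemaLocation pairs a target namespace with the URI of a schema document
       (business_card.xsd is a hypothetical file name); further pairs may follow,
       e.g. "ns1 uri1 ns2 uri2" -->
  <name>John Doe</name>
  <title>CEO, Widget Inc.</title>
  <email>john.doe@widget.inc</email>
</card>
```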

13
More on schemaLocation
  • All attributes defined in
    http://www.w3.org/2001/XMLSchema-instance are
    implicitly declared for all elements in the
    instance document
  • schemaLocation attributes are optional
  • make instance documents self-describing
  • Applications require documents to be valid
    relative to schemas decided by application
    developers, not schemas decided by document
    authors
  • XML Schema does not directly enforce a particular
    root element
  • e.g., an XML Schema definition of XHTML cannot
    express that the root element must be html
  • means that application must check root element as
    well as carrying out XML validation
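Here is a minimal sketch, assuming a schema for the http://businesscard.org namespace that declares name as a global element of type string (a hypothetical declaration). Any globally declared element may serve as the root, so the following document validates even though it is not a business card. An application that expects card must check the root element itself.

```xml
<!-- Valid if name is declared globally: any global element can be the root -->
<name xmlns="http://businesscard.org">John Doe</name>
```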

14
Simple types
  • Simple type or datatype is set of Unicode strings
    with a particular semantic interpretation
  • e.g., decimal datatype is built-in XML Schema
    datatype which consists of all strings that
    represent decimal numbers (e.g., 3.1415)
  • 3.1415 is equal to 3.141500
  • 42 is less than 117
  • XML Schema contains some primitive simple types
    with pre-defined meanings
  • XML Schema also provides various mechanisms for
    deriving new types from existing ones

15
Simple Types (Datatypes): Primitive
  • string: any Unicode string
  • boolean: true, false, 1, 0
  • decimal: 3.1415
  • float: 6.02214199E23
  • double: 42E970
  • dateTime: 2004-09-26T16:29:00-05:00
  • time: 16:29:00-05:00
  • date: 2004-09-26
  • hexBinary: 48656c6c6f0a
  • base64Binary: SGVsbG8K
  • anyURI: http://www.brics.dk/ixwt/
  • QName: rcp:recipe, recipe
  • ...

16
Some built-in derived simple types
  • normalizedString
  • as string but whitespace facet is replace
  • token
  • as string but whitespace facet is collapse
  • language
  • "en", "da", "en-US", etc.
  • NMTOKEN
  • e.g., "42", "my.form", "r103"
  • NMTOKENS
  • e.g., "42 my.form r103"
  • nonPositiveInteger
  • e.g., "-87", "0"

17
A simple type element declaration
  • <element name="serialnumber"
    type="nonNegativeInteger"/>
  • assigns built-in derived simple type,
    nonNegativeInteger, to elements named
    serialnumber
  • contents of a serialnumber element must match
    nonNegativeInteger (possibly with surrounding
    whitespace)
  • serialnumber element cannot contain child
    elements or attributes

18
Deriving new simple types by restriction
  • Restriction of a simple type defines a new type
    by restricting possible values of a base type
  • restriction performed on facets of base type (see
    table above left)
  • restriction may contain multiple constraining
    facets
  • Facet restrictions operate at semantic not
    syntactic level
  • e.g., <totalDigits value="3"/> allows 123, 0123
    and 0123.0 but not 1234 and 123.05
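The following is a minimal sketch of a restriction with two constraining facets. The type name small_decimal is hypothetical. The fragment is assumed to sit inside a schema element whose default namespace is the XML Schema namespace.

```xml
<!-- Hypothetical type: non-negative decimals with at most three digits in total.
     Assumes the XML Schema namespace is the default namespace. -->
<simpleType name="small_decimal">
  <restriction base="decimal">
    <!-- facets are checked against the value, not the lexical form:
         123, 0123 and 0123.0 pass; 1234 and 123.05 fail -->
    <totalDigits value="3"/>
    <minInclusive value="0"/>
  </restriction>
</simpleType>
```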

19
Deriving new simple types by restriction
  • enumeration facet restricts values to a finite
    set of possibilities (see above left)
  • pattern facet allows values to be constrained to
    satisfy regular expressions (see above right)
  • symbols that have a special meaning within
    regular expressions can be escaped by prefixing
    them with a backslash (\)
  • For most facets, restrictions may be changed in
    further derivations unless the fixed="true"
    attribute is added to the constraining facet
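The slide's figures are not reproduced here, so the following is a minimal sketch of the enumeration and pattern facets, plus a fixed facet. The type names, the country codes and the version-number pattern are all hypothetical. The default namespace is assumed to be the XML Schema namespace. A pattern must match the whole value, so no anchors are needed.

```xml
<!-- Hypothetical enumeration: only these three strings are allowed -->
<simpleType name="country_code">
  <restriction base="string">
    <enumeration value="DK"/>
    <enumeration value="UK"/>
    <enumeration value="US"/>
  </restriction>
</simpleType>

<!-- Hypothetical pattern: values such as "1.0" or "12.34" -->
<simpleType name="version_number">
  <restriction base="string">
    <!-- \d is a digit; \. escapes the dot so it matches a literal "." -->
    <pattern value="\d+\.\d+"/>
    <!-- fixed="true": types derived from this one cannot change maxLength -->
    <maxLength value="10" fixed="true"/>
  </restriction>
</simpleType>
```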

20
Deriving simple types using list and union
  • Use the list element inside a simpleType
    definition to define a whitespace separated
    string of values of a particular type (see above
    left)
  • e.g., "23 4 56 -7" is of type integerlist
  • Use union element inside a simpleType definition
    to specify that a value must be one of two or
    more types
  • e.g., "true" and "1.3" are both of type
    boolean_or_decimal
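The type names below come from the slide. Because the figure is not reproduced, the definitions themselves are a minimal sketch. They assume the default namespace is the XML Schema namespace, so itemType and memberTypes refer to built-in types.

```xml
<!-- whitespace-separated integers, e.g. "23 4 56 -7" -->
<simpleType name="integerlist">
  <list itemType="integer"/>
</simpleType>

<!-- either a boolean or a decimal, e.g. "true" or "1.3" -->
<simpleType name="boolean_or_decimal">
  <union memberTypes="boolean decimal"/>
</simpleType>
```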

21
Complex types
  • An element declaration may assign a complex type
    to an element name:
    <element name="card" type="b:card_type"/>
  • means that elements with the name card must
    satisfy all the requirements specified in the
    definition of the type card_type
  • complex type definition may specify attributes,
    child element types and ordering and character
    data
  • Complex type defined using XML Schema element,
    complexType
  • content of complexType element can be either
    complex or simple
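Here is a minimal sketch of the "simple" case: a complex type whose content is character data only, but which also carries an attribute. The type name price_type and the currency attribute are hypothetical. The XML Schema namespace is assumed to be the default namespace.

```xml
<!-- Hypothetical type with simple content: the text must be a decimal,
     and a currency attribute is allowed,
     e.g. <price currency="EUR">9.95</price> -->
<complexType name="price_type">
  <simpleContent>
    <extension base="decimal">
      <attribute name="currency" type="string"/>
    </extension>
  </simpleContent>
</complexType>
```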

22
Element reference
  • Element reference takes the form
    <element ref="name"/>
  • name is the name of an element declared
    elsewhere, at the top level of the schema
  • Note the difference between an element element
    with a name attribute (a declaration) and one
    with a ref attribute (a reference)!
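The following is a minimal sketch that contrasts the two forms. The names title, year and book_type are hypothetical. The fragment is assumed to sit inside a schema element that makes the XML Schema namespace the default and binds the hypothetical prefix b to the target namespace.

```xml
<!-- Assumes: default namespace = XML Schema; prefix b bound to the
     target namespace (hypothetical) -->

<!-- declaration: introduces a global element named title -->
<element name="title" type="string"/>

<complexType name="book_type">
  <sequence>
    <!-- reference: reuses the global title declaration -->
    <element ref="b:title"/>
    <!-- local declaration: year is declared only inside book_type -->
    <element name="year" type="gYear"/>
  </sequence>
</complexType>
```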

23
sequence element
  • Concatenation within the content of an element
    with a complex content model is expressed using
    the sequence element
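Below is a minimal sketch of concatenation with sequence. The type and element names are hypothetical, and the XML Schema namespace is assumed to be the default namespace.

```xml
<!-- Hypothetical type: children must appear as street, city, zip, in that
     order; corresponds to the regular expression  street city zip -->
<complexType name="address_type">
  <sequence>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
    <element name="zip" type="string"/>
  </sequence>
</complexType>
```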

24
choice element
  • Union (i.e., the '|' operator in a regular
    expression) corresponds to the choice element
  • At left, each card element contains either an
    email element or zero or 1 phone elements but not
    both
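The figure is not reproduced here, so this is a minimal sketch of how the choice described on this slide might be written. The other children (name, title) are assumptions. So is the prefix b, which is assumed to be bound to the target namespace, with the XML Schema namespace as the default.

```xml
<!-- Assumes: default namespace = XML Schema; prefix b bound to the
     target namespace; name and title are hypothetical -->
<complexType name="card_type">
  <sequence>
    <element ref="b:name"/>
    <element ref="b:title"/>
    <!-- either one email element, or zero or one phone elements -->
    <choice>
      <element ref="b:email"/>
      <element ref="b:phone" minOccurs="0"/>
    </choice>
  </sequence>
</complexType>
```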

25
all element
  • A content sequence matches an all expression if
    each constituent of the expression is matched
    somewhere in the content sequence and every
    element in the content sequence is matched by a
    constituent in the expression
  • Essentially variant of sequence in which order
    does not matter
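Here is a minimal sketch of an unordered content model using all. The element names and the prefix b are assumptions, with the XML Schema namespace as the default namespace. In XML Schema 1.0, each child of all may occur at most once (maxOccurs of 0 or 1).

```xml
<!-- Assumes: default namespace = XML Schema; prefix b bound to the
     target namespace; element names are hypothetical.
     name, title and email must each appear exactly once, phone at most
     once, in any order -->
<complexType name="card_type">
  <all>
    <element ref="b:name"/>
    <element ref="b:title"/>
    <element ref="b:email"/>
    <element ref="b:phone" minOccurs="0"/>
  </all>
</complexType>
```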

26
any element
  • The empty any element is a wildcard that matches
    any element
  • Its namespace attribute limits the matching
    elements in various ways
  • a whitespace-separated list of URIs, which may
    include ##targetNamespace and ##local (the empty
    namespace)
  • ##any: any namespace
  • ##other: any namespace except the target
    namespace

27
any element
  • Can be used to specify that a different language
    is used inside an element
  • e.g., XHTML used inside the info element in
    WidgetML (see above)
  • content must consist of one or more elements from
    the XHTML namespace

28
Some restrictions
  • all element may only contain element declarations
    and element references
  • sequence and choice elements cannot contain all
    elements
  • complexType contents cannot consist of a single
    element declaration or any wildcard
  • need to wrap it in a sequence or choice element
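Below is a minimal sketch of the last restriction. The names wrapper_type and item are hypothetical, and the XML Schema namespace is assumed to be the default namespace. The invalid form is kept inside a comment.

```xml
<!-- Not allowed: a lone element declaration directly inside complexType
<complexType name="wrapper_type">
  <element name="item" type="string"/>
</complexType>
-->

<!-- Allowed: the same declaration wrapped in a sequence -->
<complexType name="wrapper_type">
  <sequence>
    <element name="item" type="string"/>
  </sequence>
</complexType>
```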

29
Attribute references
  • A complex type may optionally contain a number of
    attribute references of the form
    <attribute ref="name"/>
  • name is the name of the attribute that has been
    declared elsewhere
  • attribute reference must appear after the content
    model description of a complex type
  • attribute reference can contain an attribute
    named use which can take the values optional
    (default) or required
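Here is a minimal sketch of a global attribute declaration and a reference to it placed after the content model. The names author, note_type and text are hypothetical. The fragment assumes the XML Schema namespace is the default and the prefix b is bound to the target namespace.

```xml
<!-- Assumes: default namespace = XML Schema; prefix b bound to the
     target namespace (hypothetical) -->
<attribute name="author" type="string"/>

<complexType name="note_type">
  <sequence>
    <element name="text" type="string"/>
  </sequence>
  <!-- attribute references come after the content model -->
  <attribute ref="b:author" use="required"/>
</complexType>
```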

30
minOccurs and maxOccurs
  • minOccurs and maxOccurs attributes can be used
    with
  • element, sequence, choice, all and any elements
  • define possible cardinalities of the element
  • values must be non-negative integers or, for
    maxOccurs, unbounded
  • by default, minOccurs and maxOccurs are 1
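The following is a minimal sketch of cardinality constraints. The type and element names are hypothetical, and the XML Schema namespace is assumed to be the default namespace.

```xml
<!-- Hypothetical type: one name, any number of phone elements, then one or
     two email elements (regular expression: name phone* email email?) -->
<complexType name="contact_type">
  <sequence>
    <element name="name" type="string"/>
    <element name="phone" type="string" minOccurs="0" maxOccurs="unbounded"/>
    <element name="email" type="string" minOccurs="1" maxOccurs="2"/>
  </sequence>
</complexType>
```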

31
mixed attribute
  • complexType may optionally have an attribute,
    mixed="true"
  • means arbitrary character data is permitted
    anywhere in the content in addition to the
    elements declared in the content model
  • Without the mixed="true" attribute, only
    whitespace is allowed between elements in the
    content model
  • Character data cannot be constrained if we also
    want to allow elements in the content
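Below is a minimal sketch of mixed content. The type and element names are hypothetical, and the XML Schema namespace is assumed to be the default namespace. The text between the child elements is unconstrained, which illustrates the last bullet above.

```xml
<!-- Hypothetical mixed type: text may be interleaved with em and strong,
     e.g. <para>Some <em>very</em> important <strong>text</strong></para> -->
<complexType name="para_type" mixed="true">
  <choice minOccurs="0" maxOccurs="unbounded">
    <element name="em" type="string"/>
    <element name="strong" type="string"/>
  </choice>
</complexType>
```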