Title: Module 3 XML Schema
Recapitulation (Module 2)
- XML as heir to the history of the Web
  - SGML, HTML, XHTML, XML
- XML key concepts
  - Documents, elements, attributes, text
  - Order, nested structure, textual information
  - Namespaces
- XML usage scenarios
  - Financial, medical, Web Services, blogs, etc.
  - Communication data, metadata, documents
- DTDs and the need for describing the structure of an XML file
- Next: XML Schemas
Limitations of DTDs
- DTDs describe only the grammar of the XML file, not the detailed structure and/or types
- This grammatical description has some obvious shortcomings:
  - We cannot express that a length element must contain a non-negative number (constraints on the type of the value of an element or attribute)
  - The unit element should only be allowed when amount is present (co-occurrence constraints)
  - The comment element should be allowed to appear anywhere (schema flexibility)
  - There is no subtyping / inheritance (reuse of definitions)
  - There are no composite keys (referential integrity)
Overview XML Schema
- ComplexTypes and SimpleTypes
  - ComplexTypes correspond to records
  - string is an example of a SimpleType
- Built-in and user-defined types
  - ComplexTypes are always user-defined (apart from the built-in anyType)
- Elements have complexTypes or simpleTypes; attributes have simpleTypes
- The root element of a document must have a global declaration
- (Almost) downward compatible with DTDs
- Schemas are XML documents (syntax)
- Namespaces etc. are part of XML Schemas
Example Schema
    <?xml version="1.0" ?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:element name="book" type="BookType"/>
      <xsd:complexType name="BookType">
        <xsd:sequence>
          <xsd:element name="title" type="xsd:string"/>
          <xsd:element name="author" type="PersonType"
                       minOccurs="1" maxOccurs="unbounded"/>
          <xsd:element name="publisher" type="xsd:anyType"/>
        </xsd:sequence>
      </xsd:complexType>
      <xsd:complexType name="PersonType">
        <xsd:sequence> ... </xsd:sequence>
      </xsd:complexType>
    </xsd:schema>
Example Schema
    <?xml version="1.0" ?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      ...
    </xsd:schema>
- Schema in a separate XML document
- The vocabulary of the schema language is defined in a special namespace; the prefix xsd is commonly used
- There is a schema for schemas (don't worry!)
- The schema element is always the root
Example Schema
    <xsd:element name="book" type="BookType"/>
- The element element is used to declare elements
  - name defines the name of the element
  - type defines the type of the element
- Declarations directly under schema are global
- Global element declarations are potential roots
- Example: book is the only global element, so the root element of a valid document must be a book.
- The type of a book is BookType (defined next).
Example Schema
    <xsd:complexType name="BookType">
      <xsd:sequence>
        ...
      </xsd:sequence>
    </xsd:complexType>
- User-defined complex type
- Defines a sequence of sub-elements
- The attribute name specifies the name of the type
- This type definition is global. The type can be used in any other definition.
Example Schema
    <xsd:sequence>
      <xsd:element name="title" type="xsd:string"/>
    </xsd:sequence>
- Local element declaration within a complex type
  - (title cannot be the root element of documents)
- name and type as before
- xsd:string is a built-in type of XML Schema
Example Schema
    <xsd:element name="author" type="PersonType"
                 minOccurs="1" maxOccurs="unbounded"/>
- Local element declaration
- PersonType is a user-defined type
- minOccurs, maxOccurs specify the cardinality of author elements in BookType
- Default: minOccurs="1", maxOccurs="1"
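Example Schema
    <xsd:complexType name="PersonType">
      <xsd:sequence>
        <xsd:element name="first" type="xsd:string"/>
        <xsd:element name="last" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
- Type definition for authors
- A named type (one with a name attribute) can only be defined globally, as a child of xsd:schema; it cannot be nested inside the sequence of BookType
- A type that should only be usable inside BookType has to be anonymous (see Anonymous Types)
- The same syntax as for BookType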
Example Schema
    <xsd:element name="publisher" type="xsd:anyType"/>
- Local element declaration
- Every book has exactly one publisher: minOccurs and maxOccurs are 1 by default
- anyType is a built-in type
- anyType allows any content
- anyType is the default type. Equivalent definition:
    <xsd:element name="publisher"/>
Valid Document
    <?xml version="1.0"?>
    <book>
      <title>Die Wilde Wutz</title>
      <author><first>D.</first>
              <last>K.</last></author>
      <publisher> Addison Wesley,
        <state>CA</state>, USA
      </publisher>
    </book>
Valid Document (annotated)
- The document above is valid because:
  - The root is book
  - There is exactly one title, of type string
  - The sub-elements are in the right order
  - There is at least one author, of type PersonType
  - There is one publisher, with arbitrary content
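The annotated checks can be mimicked by hand. The sketch below uses Python's standard xml.etree.ElementTree and hard-codes the BookType rules of the example schema; the function name is_valid_book is hypothetical, and a real validator (such as Xerces) derives these checks from the schema instead of hard-coding them.

```python
import xml.etree.ElementTree as ET

# The valid document from above (sub-element layout slightly compacted).
DOC = """<book>
  <title>Die Wilde Wutz</title>
  <author><first>D.</first><last>K.</last></author>
  <publisher> Addison Wesley, <state>CA</state>, USA </publisher>
</book>"""

def is_valid_book(xml_text: str) -> bool:
    """Hand-coded BookType rules; hypothetical, not derived from a schema."""
    root = ET.fromstring(xml_text)
    if root.tag != "book":                    # root must be the global element book
        return False
    children = list(root)
    names = [child.tag for child in children]
    # sequence: title, then author (1..unbounded), then publisher
    if len(names) < 3 or names[0] != "title" or names[-1] != "publisher":
        return False
    authors = children[1:-1]
    if any(a.tag != "author" for a in authors):
        return False
    if len(children[0]) != 0:                 # xsd:string: text only, no sub-elements
        return False
    for author in authors:                    # PersonType: first, then last
        if [c.tag for c in author] != ["first", "last"]:
            return False
    return True                               # publisher is xsd:anyType: anything goes

print(is_valid_book(DOC))  # True
```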
Schema Validation
- Conformance test
  - Result: true or false
- Infoset contribution
  - Annotate types
  - Set default values
  - Result: a new instance of the data model
- Tools: Xerces (Apache)
- Theory: graph simulation algorithms
- Validation is a posteriori and explicit, not implicit!
Global vs. Local Declarations
- Instances of global element declarations are potential root elements of documents
- Global declarations can be referenced:
    <xsd:schema xmlns:xsd="...">
      <xsd:element name="book" type="BookType"/>
      <xsd:element name="comment" type="xsd:string"/>
      <xsd:complexType name="BookType">
        ...
        <xsd:element ref="comment" minOccurs="0"/>
        ...
      </xsd:complexType>
    </xsd:schema>
- Constraints
  - ref is not allowed in global declarations
  - No minOccurs, maxOccurs in global declarations
Attribute Declarations
- Attributes may only have a SimpleType
  - SimpleTypes are, e.g., string (more later)
- Attribute declarations can be global
  - Reuse declarations with ref
- Compatible with attribute lists in DTDs
  - Default values possible
  - Required and optional attributes
  - Fixed attributes
  - (In addition, there are prohibited attributes)
Attribute Declarations
    <xsd:complexType name="BookType">
      <xsd:sequence> ... </xsd:sequence>
      <xsd:attribute name="isbn" type="xsd:string" use="required"/>
      <xsd:attribute name="price" type="xsd:decimal" use="optional"/>
      <xsd:attribute name="curr" type="xsd:string" fixed="EUR"/>
      <xsd:attribute name="index" type="xsd:IDREFS" default=""/>
    </xsd:complexType>
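The four kinds of attribute declarations above behave roughly as in the following sketch, which also shows the infoset contribution of supplying default and fixed values for absent attributes. The ATTRS table and check_attributes are hypothetical names, and type checking (xsd:decimal, xsd:IDREFS) is deliberately left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical table mirroring the BookType attribute declarations:
# attribute name -> (use, fixed value, default value)
ATTRS = {
    "isbn":  ("required", None, None),
    "price": ("optional", None, None),
    "curr":  ("optional", "EUR", None),
    "index": ("optional", None, ""),
}

def check_attributes(elem: ET.Element) -> dict:
    """Validate elem's attributes and return them with fixed/default values filled in.

    Raises ValueError for undeclared or missing required attributes and
    for a fixed attribute with a different value. Types are not checked.
    """
    result = dict(elem.attrib)
    for name in result:
        if name not in ATTRS:
            raise ValueError(f"undeclared attribute: {name}")
    for name, (use, fixed, default) in ATTRS.items():
        if name in result:
            if fixed is not None and result[name] != fixed:
                raise ValueError(f"attribute {name} must be {fixed!r}")
        elif use == "required":
            raise ValueError(f"missing required attribute: {name}")
        elif fixed is not None:
            result[name] = fixed      # fixed value is supplied when absent
        elif default is not None:
            result[name] = default    # default value is supplied when absent
    return result

print(check_attributes(ET.fromstring('<book isbn="123"/>')))
# {'isbn': '123', 'curr': 'EUR', 'index': ''}
```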
Anonymous Types
- PersonType need not be named:
    <xsd:complexType name="BookType">
      ...
      <xsd:element name="author">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="first" type="xsd:string"/>
            <xsd:element name="last" type="xsd:string"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      ...
    </xsd:complexType>
Simple Elements with Attributes
    <xsd:element name="price">
      <xsd:complexType>
        <xsd:simpleContent>
          <xsd:extension base="xsd:decimal">
            <xsd:attribute name="curr" type="xsd:string"/>
          </xsd:extension>
        </xsd:simpleContent>
      </xsd:complexType>
    </xsd:element>
- Valid instance:
    <price curr="USD">69.95</price>
Elements with Attributes, no Content
    <xsd:element name="price">
      <xsd:complexType>
        <xsd:attribute name="curr" type="xsd:string"/>
        <xsd:attribute name="val" type="xsd:decimal"/>
      </xsd:complexType>
    </xsd:element>
- Valid instance:
    <price curr="USD" val="69.95"/>
Pre-defined SimpleTypes
- Numeric values
  - integer, short, decimal, float, double, hexBinary, ...
- Dates, timestamps, periods
  - duration, dateTime, time, date, gMonth, ...
- Strings
  - string, NMTOKEN, NMTOKENS, normalizedString
- Others
  - QName, anyURI, ID, IDREFS, language, ENTITY, ...
- In summary, 44 pre-defined simple types
- Question: How many does SQL have?
Derived SimpleTypes
- Restrict the domain
    <xsd:simpleType name="MyInteger">
      <xsd:restriction base="xsd:integer">
        <xsd:minInclusive value="10000"/>
        <xsd:maxInclusive value="99999"/>
      </xsd:restriction>
    </xsd:simpleType>
- minInclusive, maxInclusive are facets
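A facet restricts the value space of the base type. As a sketch (the function name is hypothetical), checking a MyInteger value means first checking the xsd:integer lexical form and then the two range facets:

```python
import re

INTEGER = re.compile(r"[+-]?[0-9]+")   # lexical form of xsd:integer

def is_valid_my_integer(lexical: str) -> bool:
    """Check a value against MyInteger: xsd:integer restricted to 10000..99999."""
    text = lexical.strip(" \t\r\n")      # xsd:integer collapses surrounding whitespace
    if INTEGER.fullmatch(text) is None:
        return False                     # not in the base type at all
    return 10000 <= int(text) <= 99999   # minInclusive / maxInclusive facets

for v in ["12345", "9999", "100000", "12.5"]:
    print(v, is_valid_my_integer(v))
```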
Derived SimpleTypes
- Restriction by pattern matching
  - Currencies have three capital letters
    <xsd:simpleType name="Currency">
      <xsd:restriction base="xsd:string">
        <xsd:pattern value="[A-Z]{3}"/>
      </xsd:restriction>
    </xsd:simpleType>
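One detail worth knowing: XSD patterns are implicitly anchored, so the whole value must match. In Python this corresponds to re.fullmatch, not re.search. A minimal sketch, assuming Python's regex dialect agrees with the XSD dialect for this simple pattern (it does not in general; XSD, for instance, has its own escapes such as \i and \c):

```python
import re

CURRENCY = re.compile(r"[A-Z]{3}")

def is_valid_currency(value: str) -> bool:
    # XSD patterns must match the entire value, hence fullmatch
    return CURRENCY.fullmatch(value) is not None

print(is_valid_currency("EUR"))             # True
print(is_valid_currency("EURO"))            # False
print(CURRENCY.search("EURO") is not None)  # True: search would wrongly accept it
```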
Derived SimpleTypes
- Restriction by enumeration
    <xsd:simpleType name="Currency">
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="ATS"/>
        <xsd:enumeration value="EUR"/>
        <xsd:enumeration value="GBP"/>
        <xsd:enumeration value="USD"/>
      </xsd:restriction>
    </xsd:simpleType>
Derived SimpleTypes
- There are 15 different kinds of facets
  - e.g., minExclusive, totalDigits, ...
- Most built-in types are derived from other built-in types by restriction
  - e.g., integer is derived from decimal
  - There are only 19 base types (out of 44)
- Ref: Appendix B of the XML Schema Primer
List Types
- SimpleType for lists
- Built-in list types: IDREFS, NMTOKENS
- User-defined list types:
    <xsd:simpleType name="intList">
      <xsd:list itemType="xsd:integer"/>
    </xsd:simpleType>
- Items in instances are separated by whitespace
  - 5 -10 7 -20
- Facets for restrictions:
  - length, minLength, maxLength, enumeration
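How a processor reads an intList value can be sketched as follows: the value is split at XML whitespace, each item must be a valid xsd:integer, and the length facets then count items rather than characters. The function names are hypothetical.

```python
import re

XML_WS = re.compile(r"[ \t\r\n]+")     # the four XML whitespace characters
INTEGER = re.compile(r"[+-]?[0-9]+")   # lexical form of xsd:integer

def parse_int_list(lexical: str) -> list[int]:
    """Parse a value of the user-defined list type intList."""
    stripped = lexical.strip(" \t\r\n")
    items = XML_WS.split(stripped) if stripped else []
    for item in items:
        if not INTEGER.fullmatch(item):
            raise ValueError(f"item is not an xsd:integer: {item!r}")
    return [int(item) for item in items]

def length_ok(items: list, min_length: int = 0, max_length: float = float("inf")) -> bool:
    """minLength / maxLength on a list type count items, not characters."""
    return min_length <= len(items) <= max_length

print(parse_int_list("5 -10 7 -20"))  # [5, -10, 7, -20]
```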
Facets of List Types
    <xsd:simpleType name="Participants">
      <xsd:list itemType="xsd:string"/>
    </xsd:simpleType>
    <xsd:simpleType name="Medalists">
      <xsd:restriction base="Participants">
        <xsd:length value="3"/>
      </xsd:restriction>
    </xsd:simpleType>
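Because list items are separated by whitespace, an item of type xsd:string can never contain a space. This makes Medalists behave differently than one might expect, as this sketch (hypothetical helper names) shows: a three-word name counts as three items.

```python
import re

XML_WS = re.compile(r"[ \t\r\n]+")

def list_items(lexical: str) -> list[str]:
    """Split a list-type value into its items (XML whitespace separates items)."""
    stripped = lexical.strip(" \t\r\n")
    return XML_WS.split(stripped) if stripped else []

def is_valid_medalists(lexical: str) -> bool:
    # length="3" on a list type constrains the number of items
    return len(list_items(lexical)) == 3

print(is_valid_medalists("Anna Bob Carla"))              # True
print(is_valid_medalists("Anna Maria Smith Bob Carla"))  # False: five items
```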
Union Types
- Corresponds to the | in DTDs
  - (Variant records in Pascal or union in C)
- Valid instances are valid with respect to any of the member types
    <xsd:simpleType name="Potpurri">
      <xsd:union memberTypes="xsd:string intList"/>
    </xsd:simpleType>
- Valid instances, e.g.:
  - fünfzig 1 3 17 wunderbar 15
- Supported facets:
  - pattern, enumeration
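A union value is validated against the member types in the order given in memberTypes, and the first member type that accepts it determines its type. Since xsd:string comes first in Potpurri, every value ends up as a string. The sketch below models member types as parser functions (hypothetical names) that return None on failure.

```python
import re

XML_WS = re.compile(r"[ \t\r\n]+")
INTEGER = re.compile(r"[+-]?[0-9]+")

def parse_string(lexical):
    return lexical                     # (almost) every character sequence is an xsd:string

def parse_int_list(lexical):
    stripped = lexical.strip(" \t\r\n")
    items = XML_WS.split(stripped) if stripped else []
    if all(INTEGER.fullmatch(item) for item in items):
        return [int(item) for item in items]
    return None                        # not an intList value

def validate_union(lexical, member_types):
    """Return (member type name, value) for the first member type that accepts the value."""
    for name, parse in member_types:
        value = parse(lexical)
        if value is not None:
            return name, value
    return None                        # invalid for every member type

POTPURRI = [("xsd:string", parse_string), ("intList", parse_int_list)]
print(validate_union("1 3 17", POTPURRI))        # ('xsd:string', '1 3 17')
print(validate_union("1 3 17", POTPURRI[::-1]))  # ('intList', [1, 3, 17])
```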
Choice: Union in ComplexTypes
- A book has either an author or an editor
    <xsd:complexType name="Book">
      <xsd:sequence>
        <xsd:choice>
          <xsd:element name="author" type="Person" maxOccurs="unbounded"/>
          <xsd:element name="editor" type="Person"/>
        </xsd:choice>
      </xsd:sequence>
    </xsd:complexType>
Element Groups
- If the book has an editor, then the book also has a sponsor
    <xsd:complexType name="Book">
      <xsd:sequence>
        <xsd:choice>
          <xsd:element name="Author" type="Person" .../>
          <xsd:group ref="EditorSponsor"/>
        </xsd:choice>
      </xsd:sequence>
    </xsd:complexType>
    <xsd:group name="EditorSponsor">
      <xsd:sequence>
        <xsd:element name="Editor" type="Person"/>
        <xsd:element name="Sponsor" type="Org"/>
      </xsd:sequence>
    </xsd:group>
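Content models built from sequence, choice, and groups are regular expressions over the names of child elements, which is why validators can check them with finite automata. As a sketch, the Book model above can be written as a Python regular expression over space-terminated child names, assuming Author may repeat (the occurrence constraints of Author are elided with ...); the function name is hypothetical.

```python
import re

# Assumption: Author has maxOccurs="unbounded", as in the author/editor choice.
# Each child name is followed by one space so that names cannot run together.
BOOK_CONTENT = re.compile(r"(?:(?:Author )+|Editor Sponsor )")

def matches_book(children: list[str]) -> bool:
    word = "".join(name + " " for name in children)
    return BOOK_CONTENT.fullmatch(word) is not None

print(matches_book(["Editor", "Sponsor"]))  # True
print(matches_book(["Editor"]))             # False: an editor requires a sponsor
```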
Optional Element Groups
- All or nothing, unordered content
- PubInfo has name, year, and city (in any order), or nothing at all
    <xsd:complexType name="PubInfo">
      <xsd:all minOccurs="0">
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="year" type="xsd:string"/>
        <xsd:element name="ort" type="xsd:string"/>
      </xsd:all>
      <!-- attribute declarations -->
    </xsd:complexType>
- No other element declarations are allowed in the content model!
- maxOccurs must be 1
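The all group accepts its elements in any order, each at most once. A sketch of the PubInfo check, assuming the group is optional as a whole (minOccurs="0" on xsd:all, matching "or nothing at all"); the function name is hypothetical.

```python
from collections import Counter

PUBINFO_ELEMENTS = {"name", "year", "ort"}

def matches_pubinfo(children: list[str]) -> bool:
    """xsd:all with minOccurs="0": nothing, or name, year, ort once each in any order."""
    if not children:
        return True                    # the whole group is optional
    counts = Counter(children)
    return set(counts) == PUBINFO_ELEMENTS and all(n == 1 for n in counts.values())

print(matches_pubinfo(["year", "ort", "name"]))  # True: order does not matter
print(matches_pubinfo(["name", "year"]))         # False: all or nothing
```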
Attribute Groups
    <xsd:attributeGroup name="PriceInfo">
      <xsd:attribute name="curr" type="xsd:string"/>
      <xsd:attribute name="val" type="xsd:decimal"/>
    </xsd:attributeGroup>
    <xsd:complexType name="Book">
      ...
      <xsd:attributeGroup ref="PriceInfo"/>
    </xsd:complexType>
Definition of Keys
- Keys are defined as part of elements
- Special sub-element key
  - selector describes the context (e.g., emp in a DB)
  - field describes the key within the context
  - Several fields: composite keys
  - selector and fields are XPath expressions
- Validation
  - Evaluate selector -> sequence of nodes
  - Evaluate fields on the nodes -> set of tuples
  - Check that there are no duplicates in the set of tuples
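The three validation steps translate almost directly into code. This sketch (hypothetical function name) uses xml.etree.ElementTree, whose findall supports only a small XPath subset; fields are either @attribute or a child element name. Two simplifications: values are compared as strings, whereas XSD compares typed values, and a field that selects several nodes is not reported as an error.

```python
import xml.etree.ElementTree as ET

def check_key(context: ET.Element, selector: str, fields: list[str]) -> bool:
    """Sketch of xsd:key validation within one context element."""
    seen = set()
    for node in context.findall(selector):       # 1. selector -> sequence of nodes
        values = []
        for field in fields:                      # 2. fields -> one tuple per node
            if field.startswith("@"):
                value = node.get(field[1:])
            else:
                child = node.find(field)
                value = None if child is None else (child.text or "")
            if value is None:
                return False                      # every key field must be present
            values.append(value)
        key = tuple(values)
        if key in seen:                           # 3. no duplicate tuples
            return False
        seen.add(key)
    return True

BIB = '<bib><book isbn="1"/><book isbn="2"/></bib>'
print(check_key(ET.fromstring(BIB), "book", ["@isbn"]))  # True
```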
Definition of Keys
- isbn is the key of books in bib
    <element name="bib">
      <complexType>
        <sequence>
          <element name="book" maxOccurs="unbounded">
            <complexType>
              <sequence> ... </sequence>
              <attribute name="isbn" type="string"/>
            </complexType>
          </element>
        </sequence>
      </complexType>
      <key name="constraintX">
        <selector xpath="book"/>
        <field xpath="@isbn"/>
      </key>
    </element>
References (foreign keys)
- Part of the definition of an element
- Concept: selector and field(s)
  - selector determines the context of the foreign keys
  - field(s) specify the foreign key
  - refer gives the scope of the references (the key constraint referred to)
- Syntax, e.g., books referencing other books:
    <keyref name="constraintY" refer="constraintX">
      <selector xpath="book/references"/>
      <field xpath="@isbn"/>
    </keyref>
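A keyref check collects the key values first and then verifies that each referencing value occurs among them. A sketch restricted to single attribute fields (hypothetical function name), again using ElementTree's limited XPath; nodes without the referencing field are skipped, as they are not subject to the constraint.

```python
import xml.etree.ElementTree as ET

def check_keyref(context, key_selector, key_attr, ref_selector, ref_attr) -> bool:
    """Every referencing value must occur among the key values."""
    keys = {node.get(key_attr) for node in context.findall(key_selector)}
    keys.discard(None)
    for node in context.findall(ref_selector):
        value = node.get(ref_attr)
        if value is not None and value not in keys:
            return False               # dangling reference
    return True

BIB = '<bib><book isbn="1"/><book isbn="2"><references isbn="1"/></book></bib>'
print(check_keyref(ET.fromstring(BIB), "book", "isbn", "book/references", "isbn"))  # True
```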
UNIQUE Constraints
- Same concept as in SQL
- Uniqueness only: unlike key, the fields may be missing
- Syntax and concept almost the same as for keys
    <unique name="constraintZ">
      <selector xpath="book"/>
      <field xpath="title"/>
    </unique>
- Part of the definition of an element
Null Values
- "not there" vs. "unknown" (i.e., null)
- "empty" vs. "unknown"
- Concept: attribute nil with value true
- Only works for elements
- Schema definition: NULL ALLOWED
    <xsd:element name="publisher" type="PubInfo" nillable="true"/>
- Valid instance with content unknown:
    <publisher xsi:nil="true"/>
- xsi: namespace for predefined instance attributes
- publisher may have other attributes, but its content must be empty!
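A validator treats xsi:nil roughly as follows. The sketch (hypothetical function name) assumes it is told whether the element's declaration is nillable; ElementTree reports namespaced attributes in {namespace}local form.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
NIL = "{" + XSI + "}nil"

def is_nil(elem: ET.Element, nillable: bool) -> bool:
    """True for a valid nil element, False if elem is not marked nil.

    Raises ValueError if xsi:nil is used on a non-nillable element or
    if a nil element has content (other attributes are allowed).
    """
    if elem.get(NIL) not in ("true", "1"):   # xsd:boolean lexical forms for true
        return False
    if not nillable:
        raise ValueError("xsi:nil on an element that is not nillable")
    if len(elem) or elem.text:
        raise ValueError("a nil element must have empty content")
    return True

DOC = f'<publisher xmlns:xsi="{XSI}" xsi:nil="true" city="Boston"/>'
print(is_nil(ET.fromstring(DOC), nillable=True))  # True
```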
Derived Complex Types
- Two concepts of subtyping / inheritance
- Subtyping via extension
  - Add elements
  - Similar to inheritance in OO
- Subtyping via restriction
  - e.g., constrain the domains of the types used
  - Substitutability is preserved
- Further features
  - Constrain subtyping (final)
  - Abstract types
Subtyping via Extension
- A book is a publication
    <xsd:complexType name="Publication">
      <xsd:sequence>
        <xsd:element name="title" type="xsd:string"/>
        <xsd:element name="year" type="xsd:integer"/>
      </xsd:sequence>
    </xsd:complexType>

    <xsd:complexType name="Book">
      <xsd:complexContent>
        <xsd:extension base="Publication">
          <xsd:sequence>
            <xsd:element name="author" type="Person"/>
          </xsd:sequence>
        </xsd:extension>
      </xsd:complexContent>
    </xsd:complexType>
Subtyping by Extension
- A bib contains publications
    <xsd:element name="bib">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="pub" type="Publication" maxOccurs="unbounded"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
- pub elements may be books!
- Instances have an xsi:type attribute:
    <bib xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <pub xsi:type="Book">
        <title>Wilde Wutz</title><year>1984</year>
        <author>D.A.K.</author>
      </pub>
    </bib>
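With extension, the content model of the derived type is the base type's content followed by the added particles; xsi:type then tells the validator which model to apply to a pub element. A sketch with a hypothetical type table, assuming each element occurs exactly once:

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical type table: type name -> (base type, particles added by this type)
TYPES = {
    "Publication": (None, ["title", "year"]),
    "Book": ("Publication", ["author"]),
}

def content_model(type_name):
    """Extension appends the derived type's particles after the base type's."""
    base, own = TYPES[type_name]
    return (content_model(base) if base else []) + own

def check_pub(elem, declared="Publication"):
    type_name = elem.get(XSI_TYPE, declared)
    if type_name not in TYPES:
        return False
    t = type_name
    while t is not None and t != declared:   # walk up the derivation chain
        t = TYPES[t][0]
    if t is None:
        return False                          # xsi:type not derived from the declared type
    return [child.tag for child in elem] == content_model(type_name)

print(content_model("Book"))  # ['title', 'year', 'author']
```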
Subtyping via Restriction
- The following restrictions are allowed:
  - Instances of subtypes have default values
  - Instances of subtypes are fixed (i.e., constant)
  - Instances of subtypes have stronger types (e.g., string vs. anyType)
  - Instances of subtypes have mandatory fields which are optional in the supertype
  - Supertype.minOccurs <= Subtype.minOccurs and Supertype.maxOccurs >= Subtype.maxOccurs
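The occurrence rule is an interval-containment test: the subtype's range [minOccurs, maxOccurs] must lie within the supertype's range. A sketch using math.inf for unbounded (hypothetical function name):

```python
import math

def occurs_restricts(sup_min, sup_max, sub_min, sub_max) -> bool:
    """True if the subtype's occurrence range lies within the supertype's.

    Pass math.inf as the maximum for maxOccurs="unbounded".
    """
    return sup_min <= sub_min <= sub_max <= sup_max

print(occurs_restricts(0, 1, 0, 0))  # True: an optional element may be removed
print(occurs_restricts(1, 1, 0, 1))  # False: a mandatory element cannot become optional
```

For element a in the example that follows (0..1 in superType, 0..0 in subType) the test succeeds; b and c keep 1..1.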
Subtyping via Restriction
    <complexType name="superType">
      <sequence>
        <element name="a" type="string" minOccurs="0"/>
        <element name="b" type="anyType"/>
        <element name="c" type="decimal"/>
      </sequence>
    </complexType>
    <complexType name="subType">
      <complexContent>
        <restriction base="superType">
          <sequence>
            <element name="a" type="string" minOccurs="0" maxOccurs="0"/>
            <element name="b" type="string"/>
            <element name="c" type="decimal"/>
          </sequence>
        </restriction>
      </complexContent>
    </complexType>
Substitution Groups
- Elements which substitute a global element
- E.g., an editor is a person:
    <element name="person" type="string"/>
    <complexType name="Book">
      <sequence>
        <element ref="person"/>
        ...
      </sequence>
    </complexType>
    <element name="author" type="string" substitutionGroup="person"/>
    <element name="editor" type="string" substitutionGroup="person"/>
Abstract Elements and Types
- No instances exist
- Only instances of subtypes or substitutions exist
- person in Book must be an author or editor:
    <element name="person" type="string" abstract="true"/>
    <complexType name="Book">
      <sequence>
        <element ref="person"/>
        ...
      </sequence>
    </complexType>
    ...
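Which element names an <element ref="person"/> accepts can be computed from the substitution groups: the head itself (unless it is abstract) plus every element whose substitution chain leads to it. A sketch with a hypothetical declaration table; it ignores the requirement that member types derive from the head's type, as well as block and final.

```python
# Hypothetical declarations: element name -> (substitutionGroup head, abstract?)
ELEMENTS = {
    "person": (None, True),
    "author": ("person", False),
    "editor": ("person", False),
}

def allowed_for_ref(decls, head):
    """Element names that may appear where <element ref="head"/> is used."""
    allowed = set()
    for name, (_group, abstract) in decls.items():
        current = name
        while current is not None and current != head:  # follow the substitution chain
            current = decls[current][0]
        if current == head and not abstract:
            allowed.add(name)
    return allowed

print(sorted(allowed_for_ref(ELEMENTS, "person")))  # ['author', 'editor']
```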
Constrain Subtyping
- Corresponds to final in Java
- XML Schema is more clever! (?)
  - Constrain the kind of subtyping (extension, restriction, all)
  - Constrain the facets used
    <simpleType name="ZipCode">
      <restriction base="string">
        <length value="5" fixed="true"/>
      </restriction>
    </simpleType>
    <complexType name="Book" final="restriction">
      ...
    </complexType>
- You may subtype ZipCode, but all subtypes have length 5.
Constrain Substitutability
    <complexType name="Book" block="#all">
      ...
    </complexType>
- It is possible to define subtypes of Book
- So it is possible to reuse the structure of Book
- But instances of subtypes of Book are NOT books themselves: they cannot appear where a Book is expected.
- (Now things get really strange!)
Namespaces and XML Schema
- How do we declare the namespace of elements?
- targetNamespace for global elements
  - qualifies the names of root elements
- elementFormDefault
  - qualifies the names of local (sub-)elements
- attributeFormDefault
  - qualifies the names of attributes
Namespaces in the Schema Definition
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns:bo="http://www.Book.com"
                targetNamespace="http://www.Book.com">
      <xsd:element name="book" type="bo:BookType"/>
      <xsd:complexType name="BookType">
        ...
      </xsd:complexType>
    </xsd:schema>
- book and BookType are part of the target namespace
Namespaces in the Schema Definition
    <schema xmlns="http://www.w3.org/2001/XMLSchema"
            xmlns:bo="http://www.Book.com"
            targetNamespace="http://www.Book.com">
      <element name="book" type="bo:BookType"/>
      <complexType name="BookType">
        ...
      </complexType>
    </schema>
Namespaces in the Schema Definition
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns="http://www.Book.com"
                targetNamespace="http://www.Book.com">
      <xsd:element name="book" type="BookType"/>
      <xsd:complexType name="BookType">
        ...
      </xsd:complexType>
    </xsd:schema>
- Target namespace www.Book.com as the default namespace
Instances of www.Book.com
    <bo:book xmlns:bo="http://www.Book.com">
      ...
    </bo:book>
- Valid according to all three schemas!
Schema Location in the Instance
- Declare within an XML document where to find the schema that should validate the document
  - Declare the target namespace
  - Declare the URI of the schema
    <book xmlns="http://www.Book.com"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.Book.com
                              http://www.book.com/Book.xsd">
      ...
    </book>
- This is not enforced! Validation using other schemas is legal.
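xsi:schemaLocation holds whitespace-separated pairs of namespace name and schema URI. A sketch of how a processor might read this hint (hypothetical function name); as noted above, it is free to use another schema instead.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

def schema_locations(elem: ET.Element) -> dict:
    """Read xsi:schemaLocation as {namespace: schema URI}; it is only a hint."""
    tokens = elem.get("{" + XSI + "}schemaLocation", "").split()
    if len(tokens) % 2 != 0:
        raise ValueError("xsi:schemaLocation must contain namespace/URI pairs")
    return dict(zip(tokens[0::2], tokens[1::2]))

DOC = f"""<book xmlns="http://www.Book.com"
      xmlns:xsi="{XSI}"
      xsi:schemaLocation="http://www.Book.com
                          http://www.book.com/Book.xsd"/>"""
print(schema_locations(ET.fromstring(DOC)))
```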
Unqualified Locals
- Local declarations are not qualified
    <bo:book xmlns:bo="http://www.Book.com"
             price="69.95" curr="EUR">
      <title>Die wilde Wutz</title> ...
    </bo:book>
- Valid instance: globally qualified, locally not
- Even works within the schema language itself:
    <xsd:element name="..." type="..."/>
  (xsd:element is qualified; its attributes name and type are not)
- Full flexibility to control the use of namespaces
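The effect of unqualified locals is visible in how a namespace-aware parser names the nodes. ElementTree writes a qualified name as {namespace}local, so this sketch shows book in the target namespace while title and the attributes have no namespace:

```python
import xml.etree.ElementTree as ET

DOC = """<bo:book xmlns:bo="http://www.Book.com" price="69.95" curr="EUR">
  <title>Die wilde Wutz</title>
</bo:book>"""

root = ET.fromstring(DOC)
print(root.tag)             # {http://www.Book.com}book  (qualified global element)
print(root[0].tag)          # title  (unqualified local element)
print(sorted(root.attrib))  # ['curr', 'price']  (unqualified attributes)
```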
Qualified Sub-elements
    <schema xmlns="http://www.w3.org/2001/XMLSchema"
            xmlns:bo="http://www.Book.com"
            targetNamespace="http://www.Book.com"
            elementFormDefault="qualified">
      <element name="book" type="bo:BookType"/>
      <complexType name="BookType">
        <sequence>
          <element name="title" type="string"/>
          <element name="author">
            <complexType>
              <sequence>
                <element name="vname" type="string"/>
                <element name="nname" type="string"/>
              </sequence>
            </complexType>
          </element>
        </sequence>
      </complexType>
    </schema>
Valid Instances
    <bo:book xmlns:bo="http://www.Book.com">
      <bo:title>Die wilde Wutz</bo:title>
      <bo:author><bo:vname>D.</bo:vname>
                 <bo:nname>K.</bo:nname></bo:author>
    </bo:book>

    <book xmlns="http://www.Book.com">
      <title>Die wilde Wutz</title>
      <author><vname>D.</vname>
              <nname>K.</nname></author>
    </book>
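With elementFormDefault="qualified", the prefixed and the default-namespace instance are the same document as far as a namespace-aware processor is concerned. A sketch comparing the expanded element names with ElementTree (the helper name is hypothetical):

```python
import xml.etree.ElementTree as ET

PREFIXED = """<bo:book xmlns:bo="http://www.Book.com">
  <bo:title>Die wilde Wutz</bo:title>
  <bo:author><bo:vname>D.</bo:vname><bo:nname>K.</bo:nname></bo:author>
</bo:book>"""

DEFAULT_NS = """<book xmlns="http://www.Book.com">
  <title>Die wilde Wutz</title>
  <author><vname>D.</vname><nname>K.</nname></author>
</book>"""

def names(xml_text: str) -> list[str]:
    """Expanded names of all elements in document order."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]

print(names(PREFIXED) == names(DEFAULT_NS))  # True
```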
Qualified Attributes
- Enforce qualified attributes
  - attributeFormDefault="qualified" in the schema element
- Enforce that certain attributes must be qualified
    <attribute name="..." type="..." form="qualified"/>
- (Analogously, form="qualified" on an element declaration enforces that the sub-element must be qualified)
Composition of Schemas
- Construct libraries of schemas
- Include a schema
  - Parent and child have the same target namespace
  - Only the parent is used for validation
- Redefine = include + modify
  - Again, parent and child have the same target namespace
- Import individual types from a schema (with a different target namespace)
    <element ref="lib:impType"/>
Summary
- XML Schema is very powerful
  - simple types and complex types
  - many pre-defined types
  - many ways to derive and create new types
  - adopts database concepts (keys, foreign keys)
  - full control and flexibility
  - fully in line with namespaces and other XML standards
- XML Schema is too powerful?
  - too complicated, confusing?
  - difficult to implement
  - people use only a fraction anyway
- XML Schema is very different from what you know!
  - the devil is in the details
XML vs. OO
- Encapsulation
  - OO hides data
  - XML makes data explicit
- Type hierarchy
  - OO defines superset / subset relationships
  - XML shares structure; set relationships make no sense
- Data + behavior
  - OO packages them together
  - XML separates data from its interpretation
XML vs. Relational
- Structural differences
  - Tree vs. table
  - Heterogeneous vs. homogeneous
  - Optional vs. strict typing
  - Unnormalized vs. normalized data
- Some commonalities
  - Logical and physical data independence
  - Declarative semantics
  - Generic data model