Title: Writing and validating XML: DTDs and Schemas
1Writing and validating XML: DTDs and Schemas
- David Cornforth
- School of ITEE, UNSW@ADFA
- Some material from:
- Eric van der Vlist, Using W3C XML Schema
- http://www.xml.com/pub/a/2000/11/29/schemas/part1.html
2Outline
- What is XML?
- Validation
- DTDs
- XML schemas
3What is XML?
- A language for defining new languages with
- Nested, Structured Tagging
- With Attributes
- And Links
4XML Design Goals
- Easily usable over the Internet
- Must support a variety of applications
- Must be compatible with SGML
- Easy to write programs to process docs
- Minimum optional features
- Docs easily understood by non-programmers
- Design prepared quickly
- Exact and concise design
- Docs easy to create
- Terseness not important
- Source: Carey, New Perspectives on XML, Thompson, 2004.
5A comparison
- HTML
- <h2>Kind of Blue</h2>
- <h3>Miles Davis</h3>
- <ol>Tracks
- <li>So What (9:22)</li>
- <li>Blue in Green (5:37)</li>
- <li>All Blues (11:33)</li>
- </ol>
- XML
- <CDTITLE>Kind of Blue</CDTITLE>
- <ARTIST>Miles Davis</ARTIST>
- <CONTENTS>
- <TRACK>So What (9:22)</TRACK>
- <TRACK>Blue in Green (5:37)</TRACK>
- <TRACK>All Blues (11:33)</TRACK>
- </CONTENTS>
Source: Carey, New Perspectives on XML, Thompson, 2004.
6An Example
- <?xml version="1.0" ?>
- <book isbn="1234567890"> <!-- an attribute -->
-   <title>The Lord of the Rings</title> <!-- an element -->
-   <author>J R R Tolkien</author>
-   <person>
-     <name>Frodo Baggins</name>
-     <relative-of>Bilbo Baggins</relative-of>
-     <since>1960-01-01</since>
-     <description>thoughtful hobbit</description>
-   </person>
-   <person>
-     <name>Bilbo Baggins</name>
-     <since>1956-01-01</since>
-     <description>bold and brash hobbit</description>
-   </person>
- </book>
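To see the difference between an attribute and an element from a program's point of view, the example can be loaded with a parser. This is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the variable names are illustrative only and the choice of Python is an assumption, not part of the original slides.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<?xml version="1.0" ?>
<book isbn="1234567890">
  <title>The Lord of the Rings</title>
  <author>J R R Tolkien</author>
  <person>
    <name>Frodo Baggins</name>
    <relative-of>Bilbo Baggins</relative-of>
    <since>1960-01-01</since>
    <description>thoughtful hobbit</description>
  </person>
  <person>
    <name>Bilbo Baggins</name>
    <since>1956-01-01</since>
    <description>bold and brash hobbit</description>
  </person>
</book>"""

book = ET.fromstring(BOOK_XML)

# An attribute is read from the element that carries it.
isbn = book.get("isbn")

# An element is a child node; findtext returns its character content.
title = book.findtext("title")

# Repeated elements come back as a list, in document order.
names = [person.findtext("name") for person in book.findall("person")]

print(isbn, title, names)
```

The attribute lives on the `book` start tag, while `title` and `person` are nested elements that have to be looked up as children.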
7Rules for Elements
- Names are case-sensitive
- Names must begin with a letter or _
- Names cannot contain blanks
- Names cannot begin with xml
- Opening and closing tags must match
- Elements can be nested but not overlapped
- Empty elements are allowed: <thing />
- Each doc has a single root
- Source: Carey, New Perspectives on XML, Thompson, 2004.
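Several of these rules can be observed by handing small documents to a parser. The sketch below assumes Python's standard `xml.etree.ElementTree`; the helper `is_well_formed` and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

samples = {
    "matching tags": "<CD><TITLE>Kind of Blue</TITLE></CD>",
    "case mismatch": "<CD><TITLE>Kind of Blue</title></CD>",
    "overlapping": "<CD><TITLE>Kind of <ARTIST>Blue</TITLE></ARTIST></CD>",
    "empty element": "<CD><thing /></CD>",
    "two roots": "<CD></CD><CD></CD>",
    "name starts with digit": "<1CD></1CD>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, "->", "accepted" if ok else "rejected")
```

Only the "matching tags" and "empty element" samples are accepted; the others each break one of the rules listed above.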
8Rules for attributes
- Name is followed by an = and the value enclosed in quotes (double or single)
- e.g. <thing attribute="value">
- Must begin with letter or _
- No spaces allowed
- Cannot begin with xml
- Can only appear once in the same starting tag
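The attribute rules can be tried out in the same way. A small sketch, again assuming Python's standard `xml.etree.ElementTree`; `read_attributes` is a hypothetical helper, not a library function.

```python
import xml.etree.ElementTree as ET

def read_attributes(tag_text):
    """Return the attributes of a one-element document, or None if it is rejected."""
    try:
        return dict(ET.fromstring(tag_text).attrib)
    except ET.ParseError:
        return None

double_quoted = read_attributes('<thing attribute="value" />')
single_quoted = read_attributes("<thing attribute='value' />")
unquoted = read_attributes("<thing attribute=value />")
repeated = read_attributes('<thing attribute="a" attribute="b" />')
with_space = read_attributes('<thing my attribute="value" />')

print(double_quoted, single_quoted, unquoted, repeated, with_space)
```

The quoted forms parse to the same dictionary; the unquoted value, the repeated attribute, and the name containing a space are all rejected.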
9Special characters
- Use &#n; where n is the character's number
- Examples: &#60; for <, &#62; for >, &#38; for &
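A parser replaces each numeric reference with the character it stands for. The following is a minimal sketch with Python's standard `xml.etree.ElementTree`; the `char_ref` helper is an illustrative assumption that goes the other way, from character to reference.

```python
import xml.etree.ElementTree as ET

# Numeric references in the source text...
note = ET.fromstring("<note>pen &#62; sword &#38; 2 &#60; 3</note>")

# ...arrive in the program as ordinary characters.
decoded = note.text

def char_ref(ch):
    """Build the decimal character reference for a single character."""
    return "&#%d;" % ord(ch)

print(decoded)
print(char_ref("<"), char_ref(">"), char_ref("&"))
```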
10The CDATA section
- When you know you may use several of these characters, and don't want to use the reference every time
- Syntax
- <![CDATA[Some text here]]>
- Example
- <![CDATA[Tim and Janelle say Hi don't forget that the pen > the sword]]>
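To a program reading the document, a CDATA section and the equivalent text written with character references look the same. A short sketch, assuming Python's standard `xml.etree.ElementTree`; the `msg` element name and the sample text are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Special characters written literally inside a CDATA section.
with_cdata = ET.fromstring(
    "<msg><![CDATA[Tim & Janelle say Hi: the pen > the sword]]></msg>"
)

# The same text written with character references instead.
with_refs = ET.fromstring(
    "<msg>Tim &#38; Janelle say Hi: the pen &#62; the sword</msg>"
)

# Outside CDATA, a bare & is a syntax error.
try:
    ET.fromstring("<msg>Tim & Janelle</msg>")
    bare_ampersand_ok = True
except ET.ParseError:
    bare_ampersand_ok = False

print(with_cdata.text == with_refs.text, bare_ampersand_ok)
```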
11Validation
- To ensure strict conformity to a standard (unlike HTML)
- A well-formed document contains no syntax errors
- A valid document is well-formed and also obeys the definition given by a DTD or by a Schema
- A parser is required to test for both these qualities
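The two qualities can be told apart in a few lines of code. This sketch is a toy only: the `RULES` table is a hypothetical stand-in for a real DTD or Schema, and Python's standard `xml.etree.ElementTree` checks well-formedness but does not validate by itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table standing in for a DTD or Schema:
# element name -> exact sequence of child element names it must contain.
RULES = {"CD": ["TITLE", "ARTIST"]}

def classify(xml_text, rules=RULES):
    """Classify a document as not well-formed, well-formed only, or valid."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "not well-formed"
    for element in root.iter():
        expected = rules.get(element.tag)
        if expected is not None and [child.tag for child in element] != expected:
            return "well-formed but not valid"
    return "valid"

good = classify("<CD><TITLE>Kind of Blue</TITLE><ARTIST>Miles Davis</ARTIST></CD>")
missing_title = classify("<CD><ARTIST>Miles Davis</ARTIST></CD>")
broken = classify("<CD><TITLE>Kind of Blue</CD>")
print(good, "|", missing_title, "|", broken)
```

The second document has no syntax errors, so the parser accepts it, yet it breaks the rule for `CD`; that is the gap a validating parser closes.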
12How to validate?
- XMLSpy has a built-in validator
- Several web pages have a validator (e.g. w3schools.com)
- Download other GUI-based validators
- Download command-line based validators
13Document Type Definitions (DTDs)
- The original SGML mechanism for defining a class of documents
- Also apply to XML, as a subset of SGML
- More flexible (to handle the greater flexibility of SGML)
- But correspondingly more complex and verbose
- Not written in XML syntax themselves
- Don't properly support namespaces
- Don't support data typing
14DTD Example
<CD>
  <TITLE>Kind of Blue</TITLE>
  <ARTIST>Miles Davis</ARTIST>
  <CONTENTS>
    <TRACK length="9:22">So What</TRACK>
    <TRACK length="5:37">Blue in Green</TRACK>
    <TRACK length="11:33">All Blues</TRACK>
  </CONTENTS>
</CD>
- <!ELEMENT CD (TITLE, ARTIST, CONTENTS)>
- <!ELEMENT TITLE (#PCDATA)>
- <!ELEMENT ARTIST (#PCDATA)>
- <!ELEMENT CONTENTS (TRACK+)>
- <!ELEMENT TRACK (#PCDATA)>
- <!ATTLIST TRACK length CDATA #REQUIRED>
Source: Carey, New Perspectives on XML, Thompson, 2004.
15Multiple elements
- ? means zero or one
- * means zero or more
- + means one or more
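The three symbols behave like their regular-expression namesakes, which suggests one way to check them in code. The sketch below is an assumption-laden toy, not how a real validator works: it handles only a flat, comma-separated content model such as `(TITLE, ARTIST?, TRACK+)`, with no nested groups or choices, and the helper names are invented.

```python
import re
import xml.etree.ElementTree as ET

def model_to_regex(model):
    """Translate a flat DTD sequence like '(TITLE, ARTIST?, TRACK+)' into a regex."""
    parts = []
    for item in model.strip("() ").split(","):
        name, suffix = re.fullmatch(r"\s*([\w.\-]+)([?*+]?)\s*", item).groups()
        # Each child name is followed by a space so names cannot run together.
        parts.append("(?:%s )%s" % (re.escape(name), suffix))
    return re.compile("".join(parts))

def children_match(element, model):
    """Check the element's child names against the content model."""
    names = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(names) is not None

three_tracks = ET.fromstring("<CONTENTS><TRACK/><TRACK/><TRACK/></CONTENTS>")
no_tracks = ET.fromstring("<CONTENTS/>")

print(children_match(three_tracks, "(TRACK+)"))  # one or more
print(children_match(no_tracks, "(TRACK+)"))
print(children_match(no_tracks, "(TRACK*)"))     # zero or more
print(children_match(three_tracks, "(TRACK?)"))  # zero or one
```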
16Internal DTD
<?xml version="1.0"?>
<!DOCTYPE CD [
  <!ELEMENT CD (TITLE, ARTIST, CONTENTS)>
  <!ELEMENT TITLE (#PCDATA)>
  <!ELEMENT ARTIST (#PCDATA)>
  <!ELEMENT CONTENTS (TRACK+)>
  <!ELEMENT TRACK (#PCDATA)>
  <!ATTLIST TRACK length CDATA #REQUIRED>
]>
<CD>
  <TITLE>Kind of Blue</TITLE>
  <ARTIST>Miles Davis</ARTIST>
  <CONTENTS>
    <TRACK length="9:22">So What</TRACK>
    <TRACK length="5:37">Blue in Green</TRACK>
    <TRACK length="11:33">All Blues</TRACK>
  </CONTENTS>
</CD>
17External DTD
- <?xml version="1.0"?>
- <!DOCTYPE CD SYSTEM "jazzy.dtd">
- <CD>
-   <TITLE>Kind of Blue</TITLE>
-   <ARTIST>Miles Davis</ARTIST>
-   <CONTENTS>
-     <TRACK length="9:22">So What</TRACK>
-     <TRACK length="5:37">Blue in Green</TRACK>
-     <TRACK length="11:33">All Blues</TRACK>
-   </CONTENTS>
- </CD>
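A program can tell which kind of DTD a document declares by looking at its DOCTYPE. This is a rough sketch using Python's standard `xml.parsers.expat` module, which reports the DOCTYPE but does not validate against it or fetch the external file; `doctype_info` and the shortened sample documents are assumptions made for the example.

```python
import xml.parsers.expat

def doctype_info(xml_text):
    """Return (root name, system identifier, has internal subset), or None."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["info"] = (name, system_id, bool(has_internal_subset))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return found.get("info")

internal = (
    '<?xml version="1.0"?>'
    "<!DOCTYPE CD [<!ELEMENT CD (#PCDATA)>]>"
    "<CD>Kind of Blue</CD>"
)
external = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE CD SYSTEM "jazzy.dtd">'
    "<CD>Kind of Blue</CD>"
)

print(doctype_info(internal))
print(doctype_info(external))
```

The internal form carries its declarations inside the square brackets; the external form only names the file, here `jazzy.dtd`, that a validating parser would go on to read.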
18What is a schema?
- An alternative way to validate XML
- Based on XML itself, instead of using a different format
- The newer standard
19Schemas vs. DTDs
Source: Carey, New Perspectives on XML, Thompson, 2004.
20Defining a Schema
- Go through all the tagging elements of the document in turn
- Give an element definition for each
- Attribute definitions are required to follow the corresponding set of element definitions
- Note: a complex type with a sequence defines a structure with a fixed order
21Schema example
- <?xml version="1.0" ?>
- <schema xmlns="http://www.w3.org/2001/XMLSchema">
-   <element name="CD">
-     <complexType>
-       <sequence>
-         <element name="TITLE" type="string" />
-         <element name="ARTIST" type="string" />
-         <element name="CONTENTS">
-           <complexType>
-             <sequence>
-               <element name="TRACK" maxOccurs="unbounded">
-                 <complexType>
-                   <simpleContent>
-                     <extension base="string">
-                       <attribute name="length" type="string" />
-                     </extension>
-                   </simpleContent>
-                 </complexType>
-               </element>
-             </sequence>
-           </complexType>
-         </element>
-       </sequence>
-     </complexType>
-   </element>
- </schema>
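Because a schema is itself an XML document, it can be read with an ordinary XML parser. The sketch below assumes Python's standard `xml.etree.ElementTree` and assumes the closing tags that complete the schema; it only lists the declarations and does not perform validation.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

SCHEMA = """<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="CD">
    <complexType>
      <sequence>
        <element name="TITLE" type="string" />
        <element name="ARTIST" type="string" />
        <element name="CONTENTS">
          <complexType>
            <sequence>
              <element name="TRACK" maxOccurs="unbounded">
                <complexType>
                  <simpleContent>
                    <extension base="string">
                      <attribute name="length" type="string" />
                    </extension>
                  </simpleContent>
                </complexType>
              </element>
            </sequence>
          </complexType>
        </element>
      </sequence>
    </complexType>
  </element>
</schema>"""

root = ET.fromstring(SCHEMA)

# Every <element> declaration, in document order, with its type if it names one.
declared = [
    (el.get("name"), el.get("type", "anonymous complex type"))
    for el in root.iter("{%s}element" % XSD_NS)
]

# Every <attribute> declaration.
attributes = [at.get("name") for at in root.iter("{%s}attribute" % XSD_NS)]

print(declared)
print(attributes)
```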
22Repeated elements
- minOccurs="1"
- maxOccurs="unbounded"
- Easier than remembering ?, * or +
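The relationship between the two notations can be written down directly. This is an illustrative sketch; `occurs_ok` and `dtd_suffix` are hypothetical helpers, and the defaults of 1 for both bounds follow the XML Schema rule that an element occurs exactly once unless stated otherwise.

```python
def occurs_ok(count, min_occurs="1", max_occurs="1"):
    """Check a repetition count against minOccurs/maxOccurs attribute values."""
    if count < int(min_occurs):
        return False
    return max_occurs == "unbounded" or count <= int(max_occurs)

def dtd_suffix(min_occurs, max_occurs):
    """Return the DTD symbol with the same meaning, or None if there is none."""
    table = {
        ("1", "1"): "",
        ("0", "1"): "?",
        ("0", "unbounded"): "*",
        ("1", "unbounded"): "+",
    }
    return table.get((min_occurs, max_occurs))

print(occurs_ok(3, "1", "unbounded"))
print(occurs_ok(0, "1", "unbounded"))
print(dtd_suffix("1", "unbounded"))
print(dtd_suffix("2", "5"))
```

A range such as 2 to 5 has no single DTD symbol, which is one place where the schema attributes say more than the DTD notation can.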
23Changes to the instance document
- <?xml version="1.0"?>
- <CD xsi:noNamespaceSchemaLocation="Jazzy.xsd"
-   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
-   <TITLE>Kind of Blue</TITLE>
-   <ARTIST>Miles Davis</ARTIST>
-   <CONTENTS>
-     <TRACK length="9:22">So What</TRACK>
-     <TRACK length="5:37">Blue in Green</TRACK>
-     <TRACK length="11:33">All Blues</TRACK>
-   </CONTENTS>
- </CD>
Source: Carey, New Perspectives on XML, Thompson, 2004.
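The added attribute is a hint that tells a validator where to find the schema. A minimal sketch of reading that hint, assuming Python's standard `xml.etree.ElementTree`, which exposes a prefixed attribute under its full namespace name; the instance is shortened to one track for brevity.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

INSTANCE = """<?xml version="1.0"?>
<CD xsi:noNamespaceSchemaLocation="Jazzy.xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <TITLE>Kind of Blue</TITLE>
  <ARTIST>Miles Davis</ARTIST>
  <CONTENTS>
    <TRACK length="9:22">So What</TRACK>
  </CONTENTS>
</CD>"""

root = ET.fromstring(INSTANCE)

# The xsi: prefix is replaced by the namespace URI in braces.
schema_file = root.get("{%s}noNamespaceSchemaLocation" % XSI_NS)

print(root.tag, schema_file)
```

The document's own elements stay in no namespace, so `root.tag` is plain `CD`; only the hint attribute belongs to the XMLSchema-instance namespace.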
24Elements
- Corresponds to a tag in the XML
- Can be a built-in type
- <element name="TITLE" type="string" />
- Can be a user-defined type
- <element name="CD">
-   <complexType>
-     other stuff
- An element with simple content that has any attribute must use <complexType>
25<complexType>
- Must follow the pattern
- Opening compositor (sequence | choice | all)
- Element declarations
- Closing compositor
- Attribute declarations
26Structuring the Schema
- Notice how the example schema used a ridiculous amount of indentation
- This is nothing to what you can achieve if you set your mind to it!
- The above structure can be simplified
- Once an element has been defined, it can be referred to by a reference
- Effectively a form of variable name for an element
- BUT
27Structuring the schema
- First, we have to distinguish between our own defined types, and ones defined by http://www.w3.org/2001/XMLSchema
- This is done using a namespace (more in week 4)
- Method 1: we put our types into a namespace
- Instead of
- <schema xmlns="http://www.w3.org/2001/XMLSchema">
- We use
- <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
-   xmlns:jazz="http://ZITE8126/2006/SimpleExamples"
-   targetNamespace="http://ZITE8126/2006/SimpleExamples">
28Structured schema
- <xsd:element name="TITLE" type="xsd:string" />
- <xsd:element name="ARTIST" type="xsd:string" />
- <xsd:element name="CD" type="jazz:CD_type" />
- <xsd:element name="CONTENTS" type="jazz:CONTENTS_type" />
- <xsd:element name="TRACK" type="jazz:TRACK_type" />
- <xsd:complexType name="CD_type">
-   <xsd:sequence>
-     <xsd:element ref="jazz:TITLE" />
-     <xsd:element ref="jazz:ARTIST" />
-     <xsd:element ref="jazz:CONTENTS" />
-   </xsd:sequence>
- </xsd:complexType>
- <xsd:complexType name="CONTENTS_type">
-   <xsd:sequence>
-     <xsd:element ref="jazz:TRACK" maxOccurs="unbounded" />
-   </xsd:sequence>
- </xsd:complexType>
29Changes to the instance document
- <?xml version="1.0"?>
- <CD xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
-   xsi:schemaLocation="http://ZITE8126/2006/SimpleExamples Jazzy2.xsd"
-   xmlns="http://ZITE8126/2006/SimpleExamples">
-   <TITLE>Kind of Blue</TITLE>
-   <ARTIST>Miles Davis</ARTIST>
-   <CONTENTS>
-     <TRACK length="9:22">So What</TRACK>
-     <TRACK length="5:37">Blue in Green</TRACK>
-     <TRACK length="11:33">All Blues</TRACK>
-   </CONTENTS>
- </CD>
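Once the instance declares a default namespace, every element name is qualified by it, and the schema hint becomes a pair: namespace, then file. A sketch of how that looks to a program, assuming Python's standard `xml.etree.ElementTree`; the `jazz` prefix used in the lookup is chosen locally and need not appear in the document.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
JAZZ_NS = "http://ZITE8126/2006/SimpleExamples"

INSTANCE = """<?xml version="1.0"?>
<CD xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ZITE8126/2006/SimpleExamples Jazzy2.xsd"
    xmlns="http://ZITE8126/2006/SimpleExamples">
  <TITLE>Kind of Blue</TITLE>
  <ARTIST>Miles Davis</ARTIST>
  <CONTENTS>
    <TRACK length="9:22">So What</TRACK>
    <TRACK length="5:37">Blue in Green</TRACK>
    <TRACK length="11:33">All Blues</TRACK>
  </CONTENTS>
</CD>"""

root = ET.fromstring(INSTANCE)

# schemaLocation holds whitespace-separated (namespace, file) pairs.
pairs = root.get("{%s}schemaLocation" % XSI_NS).split()
locations = dict(zip(pairs[0::2], pairs[1::2]))

# Element lookups now need the namespace; attributes without a prefix do not.
tracks = root.findall("jazz:CONTENTS/jazz:TRACK", {"jazz": JAZZ_NS})
lengths = [track.get("length") for track in tracks]

print(root.tag)
print(locations)
print(lengths)
```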
30Questions?