Writing and validating XML: DTDs and Schemas

1
Writing and validating XML: DTDs and Schemas
  • David Cornforth
  • School of ITEE, UNSW@ADFA
  • Some material from
  • Eric van der Vlist, "Using W3C XML Schema"
  • http://www.xml.com/pub/a/2000/11/29/schemas/part1.html

2
Outline
  • What is XML?
  • Validation
  • DTDs
  • XML schemas

3
What is XML?
  • A language for defining new languages with
  • Nested, Structured Tagging
  • With Attributes
  • And Links

4
XML Design Goals
  • Easily usable over the Internet
  • Must support a variety of applications
  • Must be compatible with SGML
  • Easy to write programs to process docs
  • Minimum optional features
  • Docs easily understood by non-programmers
  • Design prepared quickly
  • Exact and concise design
  • Docs easy to create
  • Terseness not important
  • Source: Carey, New Perspectives on XML, Thompson, 2004.

5
A comparison
  • HTML
  • <h2>Kind of Blue</h2>
  • <h3>Miles Davis</h3>
  • <ol>Tracks
  • <li>So What (9:22)</li>
  • <li>Blue in Green (5:37)</li>
  • <li>All Blues (11:33)</li>
  • </ol>

  • XML
  • <CDTITLE>Kind of Blue</CDTITLE>
  • <ARTIST>Miles Davis</ARTIST>
  • <CONTENTS>
  • <TRACK>So What (9:22)</TRACK>
  • <TRACK>Blue in Green (5:37)</TRACK>
  • <TRACK>All Blues (11:33)</TRACK>
  • </CONTENTS>

Source: Carey, New Perspectives on XML, Thompson, 2004.
6
An Example
  • <?xml version="1.0" ?>
  • <book isbn="1234567890"> <!-- an attribute -->
  •   <title>The Lord of the Rings</title> <!-- an element -->
  •   <author>J R R Tolkien</author>
  • <person>
  •   <name>Frodo Baggins</name>
  •   <relative-of>Bilbo Baggins</relative-of>
  •   <since>1960-01-01</since>
  •   <description>thoughtful hobbit</description>
  • </person>
  • <person>
  •   <name>Bilbo Baggins</name>
  •   <since>1956-01-01</since>
  •   <description>bold and brash hobbit</description>
  • </person>
  • </book>
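To see how a program reads this nested structure, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree` module (an assumption; the slides do not name any processing library). The function name `summarize_book` is hypothetical. It reads the `isbn` attribute, the text of the `title` and `author` elements, and the optional `relative-of` element inside each `person`.

```python
import xml.etree.ElementTree as ET

# The book example from the slide (the declaration must be the very first line).
BOOK_XML = """<?xml version="1.0" ?>
<book isbn="1234567890">
  <title>The Lord of the Rings</title>
  <author>J R R Tolkien</author>
  <person>
    <name>Frodo Baggins</name>
    <relative-of>Bilbo Baggins</relative-of>
    <since>1960-01-01</since>
    <description>thoughtful hobbit</description>
  </person>
  <person>
    <name>Bilbo Baggins</name>
    <since>1956-01-01</since>
    <description>bold and brash hobbit</description>
  </person>
</book>"""


def summarize_book(xml_text: str) -> dict:
    """Pull the attribute, simple child elements and nested <person> data."""
    root = ET.fromstring(xml_text)
    people = root.findall("person")
    return {
        "isbn": root.get("isbn"),            # attribute of the root element
        "title": root.findtext("title"),     # text of a child element
        "author": root.findtext("author"),
        # <relative-of> is optional: findtext returns None when it is missing
        "people": {p.findtext("name"): p.findtext("relative-of") for p in people},
    }


if __name__ == "__main__":
    print(summarize_book(BOOK_XML))
```

Because `relative-of` appears only in Frodo's entry, `findtext` returns `None` for Bilbo; declaring whether that element is optional is the job of a DTD or schema.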

7
Rules for Elements
  • Names are case-sensitive
  • Names must begin with a letter or an underscore (_)
  • Names cannot contain blanks
  • Names cannot begin with "xml" (in any combination of upper and lower case)
  • Opening and closing tags must match
  • Elements can be nested but not overlapped
  • Empty elements are allowed: <thing />
  • Each document has a single root element
  • Source: Carey, New Perspectives on XML, Thompson, 2004.
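Most of the element rules above are errors a parser reports directly. As a minimal sketch, assuming Python's standard `xml.etree.ElementTree` (which checks well-formedness only), each rule can be tested by trying to parse a small document; the helper name `well_formedness_error` and the sample snippets are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def well_formedness_error(xml_text: str) -> Optional[str]:
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# Hypothetical test snippets, one per rule on the slide
SAMPLES = {
    "nested": "<a><b>text</b></a>",
    "overlapped": "<a><b>text</a></b>",
    "case mismatch": "<Title>Kind of Blue</title>",
    "two roots": "<a/><b/>",
    "empty element": "<thing />",
}

if __name__ == "__main__":
    for label, text in SAMPLES.items():
        print(f"{label:14} -> {well_formedness_error(text) or 'well-formed'}")
```

The overlapped, case-mismatched and two-root samples raise `ParseError`; the nested and empty-element samples parse cleanly.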

8
Rules for attributes
  • Followed by an = and the value enclosed in double
    quotes (single quotes are also allowed)
  • e.g. <thing attribute="value">
  • Must begin with a letter or _
  • No spaces allowed
  • Cannot begin with "xml"
  • Can only appear once in the same starting tag
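The naming rules on this slide and the previous one can be expressed as a small checker. This is a simplified sketch in Python, not the full rule from the XML specification: it accepts only ASCII letters, digits, `_`, `-` and `.`, and `is_acceptable_name` is a hypothetical helper name.

```python
import re

# Simplified, ASCII-only reading of the naming rules: a letter or underscore
# first, then letters, digits, '_', '-' or '.'. A blank can never match.
# (The XML spec also allows ':' and many non-ASCII characters; they are
# deliberately left out of this sketch.)
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_acceptable_name(name: str) -> bool:
    """Check an element or attribute name against the slides' rules."""
    if _NAME_PATTERN.fullmatch(name) is None:
        return False
    # Names starting with "xml" in any mix of case are reserved.
    return not name.lower().startswith("xml")


if __name__ == "__main__":
    for candidate in ["isbn", "relative-of", "_id", "2nd", "first name", "XmlData"]:
        print(candidate, is_acceptable_name(candidate))
```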

9
Special characters
  • Use &#n;, where n is the character's number (its decimal Unicode code point)
  • Examples
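As an illustration of numeric character references, the sketch below (assuming Python's standard `xml.etree.ElementTree` and `xml.sax.saxutils.escape`; the helper `decoded_text` is hypothetical) shows a parser turning `&#60;` and similar references back into characters. XML also predefines five named entities, `&lt;`, `&gt;`, `&amp;`, `&apos;` and `&quot;`, and the sample mixes some of them in.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def decoded_text(xml_text: str) -> str:
    """Parse a one-element document and return its text with references resolved."""
    return ET.fromstring(xml_text).text


# &#60; is '<', &#62; is '>', &#38; is '&' (decimal); &#x41; is 'A' (hex form)
SAMPLE = "<p>&#60;tag&#62; &#38; &#x41; &lt;&amp;&gt;</p>"

if __name__ == "__main__":
    print(decoded_text(SAMPLE))  # references come back as plain characters
    # When generating XML, escape() replaces &, < and > for you
    print(escape("the pen > the sword & Tim < Janelle"))
```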

10
The CDATA section
  • When you know you may use some of these
    characters, and don't want to use the reference
    every time
  • Syntax
  • <![CDATA[Some text here]]>
  • Example
  • <![CDATA[Tim and Janelle say Hi, don't forget
    that the pen > the sword]]>
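A CDATA section and character references are two spellings of the same text. This minimal sketch (Python standard library; the program-code snippet stored in the element and the helper name `element_text` are hypothetical) parses both forms and gets identical characters back.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet of program code stored as element text, written twice:
# once with entity references and once inside a CDATA section.
ESCAPED = "<code>if (a &lt; b &amp;&amp; c &gt; d) go();</code>"
CDATA = "<code><![CDATA[if (a < b && c > d) go();]]></code>"


def element_text(xml_text: str) -> str:
    """Return the text content the parser reports for the root element."""
    return ET.fromstring(xml_text).text


if __name__ == "__main__":
    # Both spellings reach the application as the same characters.
    print(element_text(ESCAPED))
    print(element_text(CDATA))
```

One restriction to remember: the sequence `]]>` cannot appear inside a CDATA section, because it ends the section.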

11
Validation
  • To ensure strict conformity to a standard (unlike
    HTML)
  • A well-formed document contains no syntax errors
  • A valid document is well-formed and also obeys
    the definition given by a DTD or by a Schema
  • A parser is required to test for both these
    qualities

12
How to validate?
  • XMLSpy has a built-in validator
  • Several web pages have a validator (e.g.
    w3schools.com)
  • Download other GUI-based validators
  • Download command-line based validators

13
Document Type Definitions (DTDs)
  • The original SGML mechanism for defining a class
    of documents
  • Also apply to XML, as a subset of SGML
  • More flexible (to handle the greater flexibility
    of SGML)
  • But correspondingly more complex and verbose
  • Not themselves written in XML syntax
  • Don't properly support namespaces
  • Don't support data typing

14
DTD Example
  • <CD>
  • <TITLE>Kind of Blue</TITLE>
  • <ARTIST>Miles Davis</ARTIST>
  • <CONTENTS>
  • <TRACK length="9:22">So What</TRACK>
  • <TRACK length="5:37">Blue in Green</TRACK>
  • <TRACK length="11:33">All Blues</TRACK>
  • </CONTENTS>
  • </CD>

  • <!ELEMENT CD (TITLE, ARTIST, CONTENTS)>
  • <!ELEMENT TITLE (#PCDATA)>
  • <!ELEMENT ARTIST (#PCDATA)>
  • <!ELEMENT CONTENTS (TRACK+)>
  • <!ELEMENT TRACK (#PCDATA)>
  • <!ATTLIST TRACK length CDATA #REQUIRED>

Source: Carey, New Perspectives on XML, Thompson, 2004.
15
Multiple elements
  • ? means zero or one
  • * means zero or more
  • + means one or more
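Because ?, * and + mean the same thing in regular expressions, a flat DTD content model can be checked by matching the sequence of child element names. This is a toy sketch in Python (helper names are hypothetical; choices with `|`, nested groups and `#PCDATA` are not handled), not how real validating parsers are built.

```python
import re
import xml.etree.ElementTree as ET


def content_model_regex(model: str) -> "re.Pattern[str]":
    """Compile a flat DTD sequence such as 'TITLE, ARTIST, CONTENTS' or 'TRACK+'.

    Each item becomes '(?:NAME )' followed by its indicator, which works
    because ?, * and + carry the same meaning in regular expressions.
    """
    parts = []
    for item in (piece.strip() for piece in model.split(",")):
        indicator = item[-1] if item[-1] in "?*+" else ""
        name = item[:-1] if indicator else item
        parts.append(f"(?:{re.escape(name)} ){indicator}")
    return re.compile("".join(parts))


def children_match(element: ET.Element, model: str) -> bool:
    """True if the element's child tags, in document order, satisfy the model."""
    child_names = "".join(child.tag + " " for child in element)
    return content_model_regex(model).fullmatch(child_names) is not None


if __name__ == "__main__":
    cd = ET.fromstring("<CD><TITLE/><ARTIST/><CONTENTS><TRACK/><TRACK/></CONTENTS></CD>")
    print(children_match(cd, "TITLE, ARTIST, CONTENTS"))
    print(children_match(cd.find("CONTENTS"), "TRACK+"))
```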

16
Internal DTD
  • <?xml version="1.0"?>
  • <!DOCTYPE CD [
  • <!ELEMENT CD (TITLE, ARTIST, CONTENTS)>
  • <!ELEMENT TITLE (#PCDATA)>
  • <!ELEMENT ARTIST (#PCDATA)>
  • <!ELEMENT CONTENTS (TRACK+)>
  • <!ELEMENT TRACK (#PCDATA)>
  • <!ATTLIST TRACK length CDATA #REQUIRED>
  • ]>
  • <CD>
  • <TITLE>Kind of Blue</TITLE>
  • <ARTIST>Miles Davis</ARTIST>
  • <CONTENTS>
  • <TRACK length="9:22">So What</TRACK>
  • <TRACK length="5:37">Blue in Green</TRACK>
  • <TRACK length="11:33">All Blues</TRACK>
  • </CONTENTS>
  • </CD>

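A non-validating parser reads an internal DTD but does not enforce it. Python's standard `xml.etree.ElementTree` is built on the non-validating expat parser, so in this sketch (the invalid test document and the `is_well_formed` helper are hypothetical) a document that omits `ARTIST` and the required `length` attribute still parses without complaint.

```python
import xml.etree.ElementTree as ET

# Well-formed, but it breaks its own internal DTD: ARTIST is missing and
# the #REQUIRED length attribute is absent (hypothetical test document).
DOC_WITH_INTERNAL_DTD = """<?xml version="1.0"?>
<!DOCTYPE CD [
<!ELEMENT CD (TITLE, ARTIST, CONTENTS)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT ARTIST (#PCDATA)>
<!ELEMENT CONTENTS (TRACK+)>
<!ELEMENT TRACK (#PCDATA)>
<!ATTLIST TRACK length CDATA #REQUIRED>
]>
<CD>
  <TITLE>Kind of Blue</TITLE>
  <CONTENTS>
    <TRACK>So What</TRACK>
  </CONTENTS>
</CD>"""


def is_well_formed(xml_text: str) -> bool:
    """True if the parser accepts the text; says nothing about validity."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(DOC_WITH_INTERNAL_DTD))
```

Checking validity needs a validating tool, such as those listed under "How to validate?", or a third-party library such as lxml with its `etree.DTD` class.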
17
External DTD
  • <?xml version="1.0"?>
  • <!DOCTYPE CD SYSTEM "jazzy.dtd">
  • ltCDgt
  • ltTITLEgtKind of Bluelt/TITLEgt
  • ltARTISTgtMiles Davislt/ARTISTgt
  • ltCONTENTSgt
  • <TRACK length="9:22">So What</TRACK>
  • <TRACK length="5:37">Blue in Green</TRACK>
  • <TRACK length="11:33">All Blues</TRACK>
  • lt/CONTENTSgt
  • lt/CDgt

18
What is a schema?
  • An alternative way to validate XML
  • Based on XML itself, instead of using a different
    format
  • The newer standard

19
Schemas vs. DTDs
Source: Carey, New Perspectives on XML, Thompson, 2004.
20
Defining a Schema
  • Go through all the tagging elements of the
    document in turn
  • Give an element definition for each
  • Attribute definitions are required to follow the
    corresponding set of element definitions
  • Note the use of <complexType> with <sequence> to
    define a structure with a fixed order

21
Schema example
  • <?xml version="1.0" ?>
  • <schema xmlns="http://www.w3.org/2001/XMLSchema">
  • <element name="CD">
  • <complexType>
  • <sequence>
  • <element name="TITLE" type="string" />
  • <element name="ARTIST" type="string" />
  • <element name="CONTENTS">
  • <complexType>
  • <sequence>
  • <element name="TRACK" maxOccurs="unbounded">
  • <complexType>
  • <simpleContent>
  • <extension base="string">
  • <attribute name="length" type="string" />
  • </extension>
  • </simpleContent>
  • </complexType>
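The slide stops partway through the schema. Since a schema is itself ordinary XML, any XML parser can read it. This sketch (Python's standard `xml.etree.ElementTree`; `declared_elements` is a hypothetical helper) completes the slide's schema with the closing tags it implies and lists each element declaration with its occurrence limits. It reads the schema only; it does not validate anything against it.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# The slide's schema, completed with the closing tags the slide leaves off.
JAZZY_XSD = """<?xml version="1.0" ?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="CD">
    <complexType>
      <sequence>
        <element name="TITLE" type="string" />
        <element name="ARTIST" type="string" />
        <element name="CONTENTS">
          <complexType>
            <sequence>
              <element name="TRACK" maxOccurs="unbounded">
                <complexType>
                  <simpleContent>
                    <extension base="string">
                      <attribute name="length" type="string" />
                    </extension>
                  </simpleContent>
                </complexType>
              </element>
            </sequence>
          </complexType>
        </element>
      </sequence>
    </complexType>
  </element>
</schema>"""


def declared_elements(xsd_text: str) -> list:
    """List (name, minOccurs, maxOccurs) for every <element>, in document order."""
    root = ET.fromstring(xsd_text)
    return [
        (el.get("name"), el.get("minOccurs", "1"), el.get("maxOccurs", "1"))
        for el in root.iter(XSD + "element")
    ]


if __name__ == "__main__":
    for declaration in declared_elements(JAZZY_XSD):
        print(declaration)
```

When `minOccurs` or `maxOccurs` is omitted, XML Schema treats it as 1, which is why only `TRACK` reports `unbounded`.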

22
Repeated elements
  • minOccurs="1"
  • maxOccurs="unbounded"
  • Easier than remembering ?, * or +
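The correspondence between DTD indicators and `minOccurs`/`maxOccurs` is small enough to write down as a table. A hypothetical Python helper, `occurs_declaration`, uses it to turn a DTD item such as `TRACK+` into an XML Schema element declaration (a sketch only; it emits the unprefixed element names used in the schema example).

```python
# DTD occurrence indicator -> (minOccurs, maxOccurs); XML Schema uses "1"
# for both attributes when they are omitted.
DTD_TO_XSD = {
    "": ("1", "1"),           # exactly one
    "?": ("0", "1"),          # zero or one
    "*": ("0", "unbounded"),  # zero or more
    "+": ("1", "unbounded"),  # one or more
}


def occurs_declaration(dtd_item: str) -> str:
    """Turn a DTD item like 'TRACK+' into an equivalent <element> declaration."""
    indicator = dtd_item[-1] if dtd_item[-1] in "?*+" else ""
    name = dtd_item[:-1] if indicator else dtd_item
    min_occurs, max_occurs = DTD_TO_XSD[indicator]
    attrs = f'name="{name}"'
    if min_occurs != "1":  # leave out attributes that equal the default
        attrs += f' minOccurs="{min_occurs}"'
    if max_occurs != "1":
        attrs += f' maxOccurs="{max_occurs}"'
    return f"<element {attrs} />"


if __name__ == "__main__":
    for item in ["TITLE", "NOTE?", "TRACK*", "TRACK+"]:
        print(item, "->", occurs_declaration(item))
```

For `TRACK+` this produces the same `maxOccurs="unbounded"` declaration used for `TRACK` in the schema example.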

23
Changes to the instance document
  • <?xml version="1.0"?>
  • <CD xsi:noNamespaceSchemaLocation="Jazzy.xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  • <TITLE>Kind of Blue</TITLE>
  • <ARTIST>Miles Davis</ARTIST>
  • <CONTENTS>
  • <TRACK length="9:22">So What</TRACK>
  • <TRACK length="5:37">Blue in Green</TRACK>
  • <TRACK length="11:33">All Blues</TRACK>
  • </CONTENTS>
  • </CD>
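`xsi:noNamespaceSchemaLocation` is only a hint to a validator about where to find the schema; a plain parser sees it as an ordinary namespaced attribute and never loads the file. This sketch (Python's standard `xml.etree.ElementTree`; `schema_hint` is a hypothetical helper) reads the hint using ElementTree's `{namespace-URI}name` notation.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XSI = "{http://www.w3.org/2001/XMLSchema-instance}"

# Shortened copy of the instance document from the slide
NO_NS_INSTANCE = """<?xml version="1.0"?>
<CD xsi:noNamespaceSchemaLocation="Jazzy.xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <TITLE>Kind of Blue</TITLE>
  <ARTIST>Miles Davis</ARTIST>
</CD>"""


def schema_hint(xml_text: str) -> Optional[str]:
    """Return the xsi:noNamespaceSchemaLocation value, or None if absent."""
    root = ET.fromstring(xml_text)
    # ElementTree expands the xsi: prefix to "{namespace-URI}localname"
    return root.get(XSI + "noNamespaceSchemaLocation")


if __name__ == "__main__":
    print(schema_hint(NO_NS_INSTANCE))
```

The elements themselves stay in no namespace here, so `find("TITLE")` works without any namespace mapping.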

Source: Carey, New Perspectives on XML, Thompson, 2004.
24
Elements
  • Corresponds to a tag in the XML
  • Can be a built-in type
  • <element name="TITLE" type="string" />
  • Can be a user-defined type
  • <element name="CD">
  • <complexType>
  • other stuff
  • An element of simple type that has any attributes
    must use <complexType> (with <simpleContent>)

25
<complexType>
  • Must follow the pattern
  • Compositor (<sequence>, <choice> or <all>)
  • Element declarations
  • Closing compositor tag
  • Attribute declarations

26
Structuring the Schema
  • Notice how the example schema used a ridiculous
    amount of indentation
  • This is nothing compared to what you can achieve
    if you set your mind to it!
  • The above structure can be simplified
  • Once an element has been defined, it can be
    referred to by a reference
  • Effectively a form of variable name for an
    element
  • BUT

27
Structuring the schema
  • First, we have to distinguish between our own
    defined types and the ones defined by
  • http://www.w3.org/2001/XMLSchema
  • This is done using a namespace (more in week 4)
  • Method 1: we put our types into a namespace
  • Instead of
  • <schema xmlns="http://www.w3.org/2001/XMLSchema">
  • We use
  • <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  •   xmlns:jazz="http://ZITE8126/2006/SimpleExamples"
  •   targetNamespace="http://ZITE8126/2006/SimpleExamples">

28
Structured schema
  • <element name="TITLE" type="string" />
  • <element name="ARTIST" type="string" />
  • <element name="CD" type="jazz:CD_type" />
  • <element name="CONTENTS" type="jazz:CONTENTS_type" />
  • <element name="TRACK" type="jazz:TRACK_type" />
  • <complexType name="CD_type">
  • <sequence>
  • <element ref="jazz:TITLE" />
  • <element ref="jazz:ARTIST" />
  • <element ref="jazz:CONTENTS" />
  • </sequence>
  • </complexType>
  • <complexType name="CONTENTS_type">
  • <sequence>
  • <element ref="jazz:TRACK" maxOccurs="unbounded" />
  • </sequence>

29
Changes to the instance document
  • <?xml version="1.0"?>
  • <CD xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  •   xsi:schemaLocation="http://ZITE8126/2006/SimpleExamples Jazzy2.xsd"
  •   xmlns="http://ZITE8126/2006/SimpleExamples">
  • <TITLE>Kind of Blue</TITLE>
  • <ARTIST>Miles Davis</ARTIST>
  • <CONTENTS>
  • <TRACK length="9:22">So What</TRACK>
  • <TRACK length="5:37">Blue in Green</TRACK>
  • <TRACK length="11:33">All Blues</TRACK>
  • </CONTENTS>
  • </CD>
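Declaring the default namespace on `CD` changes every element name in the document, which matters to any program that reads it. In this sketch (Python's standard `xml.etree.ElementTree`; the helper names and the `jazz` lookup prefix are hypothetical), unqualified lookups such as `find("TITLE")` no longer match, while the unprefixed `length` attribute stays in no namespace because default namespaces do not apply to attributes.

```python
import xml.etree.ElementTree as ET

JAZZ_NS = "http://ZITE8126/2006/SimpleExamples"
XSI = "{http://www.w3.org/2001/XMLSchema-instance}"

NS_INSTANCE = """<?xml version="1.0"?>
<CD xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ZITE8126/2006/SimpleExamples Jazzy2.xsd"
    xmlns="http://ZITE8126/2006/SimpleExamples">
  <TITLE>Kind of Blue</TITLE>
  <ARTIST>Miles Davis</ARTIST>
  <CONTENTS>
    <TRACK length="9:22">So What</TRACK>
    <TRACK length="5:37">Blue in Green</TRACK>
    <TRACK length="11:33">All Blues</TRACK>
  </CONTENTS>
</CD>"""


def track_lengths(xml_text: str) -> dict:
    """Map each track title to its length attribute."""
    root = ET.fromstring(xml_text)
    ns = {"jazz": JAZZ_NS}  # lookup prefix; need not match any prefix in the file
    return {t.text: t.get("length")
            for t in root.findall("jazz:CONTENTS/jazz:TRACK", ns)}


def schema_location_pairs(xml_text: str) -> list:
    """Split xsi:schemaLocation into (namespace URI, schema file) pairs."""
    tokens = ET.fromstring(xml_text).get(XSI + "schemaLocation", "").split()
    return list(zip(tokens[0::2], tokens[1::2]))


if __name__ == "__main__":
    root = ET.fromstring(NS_INSTANCE)
    print(root.tag)            # the namespace URI is now part of the name
    print(root.find("TITLE"))  # an unqualified lookup finds nothing
    print(track_lengths(NS_INSTANCE))
    print(schema_location_pairs(NS_INSTANCE))
```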

30
Questions?