XML Schema: An Intensive One-Day Tutorial

Reuters. Henry S. Thompson. XML Schema, London 1999-12-15.

1
XML Schema: An Intensive One-Day Tutorial
  • Henry S. Thompson
  • HCRC Language Technology Group
  • University of Edinburgh

2
Overview
  • What are schemata, anyway?
  • The nature of document structure
  • Schema as contract
  • Taking control of structure definition
  • XML Schema the activity
  • The W3C and its WGs
  • The Charter and Requirements
  • The state of play
  • The Draft RECs
  • A detailed walkthrough
  • Schemas and Layered Architecture

3
Terminology
  • Documents have structure
  • Document types
  • Document instances
  • Structure can be defined
  • Informally (D. S. D.)
  • SGML DTD
  • XML DTD
  • Schema using XML

4
Background
  • SGML DTDs for D. S. D
  • Sperberg-McQueen
  • Others
  • Considered for XML itself
  • MCF, then RDF, now DCD, by Bray et al.
  • XML-Data, two versions, now XML-Data reduced, by
    Layman et al., then Frankston and Thompson
  • SOX, from Veo Corp.
  • XSchema, from an ad-hoc group of designers

5
Document Structure
  • Two relations are constitutive
  • Part-of
  • Kind-of
  • Existing DSD mechanisms use Content Models to
    specify part-of relations
  • But they only specify kind-of relations
    implicitly or informally
  • Making kind-of relations explicit would make both
    understanding and maintenance easier

6
Taking Control of D. S. D.
  • Erik Naggum used to talk about SGML allowing
    users to take control of their data
  • XML allows the same move one level up, for
    developers
  • The starting point is much simpler
  • The architecture is congenial
  • The demand is there
  • We need to do this, to make the transition to
    validation easier

7
Why validate?
  • A D. S. D. is a contract between producers and
    consumers
  • It provides a guaranteed interface
  • Producers validate to ensure they are providing
    what they promised
  • Consumers validate to check up on producers
  • and to protect their applications
  • Application authors validate to simplify their
    task
  • Leave error detection and analysis to the
    validating parser

8
Reconstructing DTDs
  • The Schema DTD is expressed in vanilla XML
  • Top level elements for declaring
  • Elements :-)
  • Types
  • Notations
  • . . .
  • Subordinate element types for declaring
  • Attributes
  • Content models
  • . . .

9
An aside about terminology
  • SGML and XML 1.0 talk about element types
  • XML Schema to date has been more casual and just
    talked about elements
  • Meaning either an element in an instance
  • Or the abstraction which is described in a DTD or
    Schema
  • Further confused by XML Schema making extensive
    use of type
  • Also, schema means many different things to
    different people
  • I'll try always to say/write XML Schema. . .

10
A simple example
  • <!ELEMENT text (#PCDATA|emph|name)*>
  • <!ATTLIST text timestamp NMTOKEN
    #REQUIRED>
  • <element name="text">
  • <type content="mixed">
  • <element ref="emph"/>
  • <element ref="name"/>
  • <attribute name="timestamp"
    type="date" minOccurs="1"/>
    </type></element>
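
The declaration above can be read as a small checking procedure. What follows is a minimal sketch in Python using the standard library's xml.etree.ElementTree, not the behaviour of any real schema processor. The function name check_text is hypothetical, and the `date` type check is reduced to a presence check because the draft's lexical form for dates is not given in these slides.

```python
import xml.etree.ElementTree as ET

# From <element ref="emph"/> and <element ref="name"/> in the example.
ALLOWED_CHILDREN = {"emph", "name"}


def check_text(elem):
    """Return a list of problems found in a <text> element (empty list = OK)."""
    problems = []
    if elem.tag != "text":
        problems.append(f"expected <text>, got <{elem.tag}>")
    # timestamp is required (#REQUIRED in the DTD, minOccurs="1" in the schema).
    if "timestamp" not in elem.attrib:
        problems.append("missing required attribute 'timestamp'")
    # Mixed content: character data is unconstrained, child elements are limited.
    for child in elem:
        if child.tag not in ALLOWED_CHILDREN:
            problems.append(f"unexpected child <{child.tag}>")
    return problems


good = ET.fromstring(
    '<text timestamp="1999-12-15">Hello <emph>big</emph> <name>world</name></text>'
)
bad = ET.fromstring("<text>Hello <b>world</b></text>")
print(check_text(good))
print(check_text(bad))
```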

11
The Schema Architecture: Static
  • A document or an application or a user identifies
    a schema
  • Each is well-formed XML
  • The schema is valid w.r.t. the Schema DTD
  • The document is schema-valid w.r.t. the schema
  • The schema is schema-valid w.r.t. the schema for
    schemas

12
The Schema Architecture: Dynamic
  • An XML application (XSP) which schema-validates
  • Takes control because changing how schemata
    work means
  • changing the Schema DTD/schema for schemas
  • upgrading XSP accordingly
  • not changing XML itself

13
The W3C
  • XML Schema hopes to be a W3C Recommendation
  • The W3C is The World Wide Web Consortium, a
    voluntary association of companies and non-profit
    organisations. Membership costs serious money,
    confers voting rights. Complex procedures, with
    the Director (Tim Berners-Lee) holding all the
    high cards, but the big vendors (e.g. Microsoft,
    Adobe, Netscape) have a lot of power.

14
. . . and its WGs
  • The XML recommendation was written by the W3C's
    XML Working Group
  • Which split itself into pieces, of which one is
    the XML Schema WG
  • Chartered in the autumn of 1998
  • Requirements document out in February of 1999
  • Due to go to Last Call early in 2000

15
Requirements document
  • Full of good and hopeful requirements
  • DTDs and more
  • Support inheritance
  • Data-friendly
  • Good inventory of primitive datatypes

16
The state of play
  • Two component documents
  • Structures
  • Datatypes
  • Three public working drafts so far
  • May 1999
  • September 1999
  • November 1999
  • Further (near-final) PWD out December 1999
  • http://www.w3.org/TR/xmlschema-1/
  • contains pointers to previous drafts

17
The XML Schema worldview
  • Validity and well-formedness are XML 1.0 concepts
  • They are defined over character sequences
  • Namespace-compliant is a Namespace concept
  • It's defined over character sequences too
  • Schema-validity is the XML Schema concept
  • It is defined over XML document Infosets
  • So the whole XML Schema exercise is predicated on
    and layered on top of XML 1.0 well-formedness
    plus Namespaces
  • Because they are constitutive of the Infoset

18
What's the Infoset?
  • The XML 1.0 plus Namespaces abstract data model
  • Defines a modest number of information items
  • Element, attribute, namespace declaration, ...
  • Each has required and optional properties
  • Name, children, ...

19
What the Infoset isn't
  • It's not the DOM
  • Much higher level
  • It's not about implementation or interfacing at
    all
  • But you can think of it as a data structure if
    that helps
  • It's not an SGML property set/grove
  • But it's close
  • It doesn't have the entity problem
  • a mixed blessing, as we will see

20
The Schema and the Infoset
  • So crucially, schemas are about infosets, not
    character sequences
  • You could schema-validate a DOM tree you built by
    hand!
  • Using a schema which exists only as a DOM tree
    ditto
  • This simplifies things tremendously
  • but is hard to get your head around at first
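
A minimal sketch of the point about hand-built trees, assuming Python's xml.etree.ElementTree as a stand-in for a DOM or infoset (it is neither, strictly). The check has_required_shape is a hypothetical toy constraint, not a schema processor. It never looks at characters or angle brackets, only at the tree, so a tree built by hand and a tree parsed from text are treated alike.

```python
import xml.etree.ElementTree as ET


def has_required_shape(elem):
    """Toy constraint on an in-memory tree: <name> with forename, then surname."""
    return elem.tag == "name" and [c.tag for c in elem] == ["forename", "surname"]


# Built by hand: no character sequence is ever parsed.
by_hand = ET.Element("name")
ET.SubElement(by_hand, "forename").text = "Al"
ET.SubElement(by_hand, "surname").text = "Gore"

# Parsed from a character sequence.
parsed = ET.fromstring("<name><forename>Al</forename><surname>Gore</surname></name>")

print(has_required_shape(by_hand), has_required_shape(parsed))
```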

21
Basic XML Schema concepts
  • Syntax is not the Schema
  • Namespaces are fundamental
  • But a schema is not a namespace
  • Separation of tag from type
  • Simple and Complex types
  • Modular Schema construction
  • Powerful type construction
  • Local tag-type association
  • Powerful wildcards
  • Element equivalence classes
  • Extension mechanism
  • Documentation mechanism

22
Schema Walkthrough 1
  • A Toy Purchase Order schema

23
Types and Type Derivation
  • For purposes of discussion, consider only the
    content type aspects of types (attributes are
    analogous)
  • A content type definition (simple or complex)
    consists of a set of constraints on what's
    allowed as content.

24
Permissions and obligations
  • You can think of the type itself as the set of
    strings/EIIs its constraints allow. It's helpful
    to think of constraints as composed of
    obligations and permissions
  • (\d )?(\d{3}-)?\d{3}-\d{4}
  • regexp definition facet for US 'phone number
    type
  • the ? and the \d can be seen as permissions, the
    - and the {3} as obligations
  • 1 337-6818 and 207-422-6240 belong to this type
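
The membership claim can be checked mechanically. A minimal sketch, assuming the pattern reads (\d )?(\d{3}-)?\d{3}-\d{4} and that the facet requires the whole string to match (hence fullmatch); the helper name is_us_phone is hypothetical.

```python
import re

# Optional leading digit plus space, optional area code, then the 3-4 digit number.
US_PHONE = re.compile(r"(\d )?(\d{3}-)?\d{3}-\d{4}")


def is_us_phone(text):
    """True if the whole string belongs to the US phone number type."""
    return US_PHONE.fullmatch(text) is not None


for candidate in ["1 337-6818", "207-422-6240", "337 6818"]:
    print(candidate, is_us_phone(candidate))
```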

25
Complex types
  • (title?,forename,surname)
  • (shorthand for) content model for name
  • the ? can be seen as permission, the , and the
    'surname' as obligations (at the end of the day,
    each component involves both permission AND
    obligation, but the balance of impact is as
    suggested)

26
Complex types, cont'd
  • (title?,forename,surname)
  • <name> <forename>...</forename>
    <surname>...</surname> </name>
  • and
  • <name> <title>...</title>
    <surname>...</surname> </name>
  • are both members of this type

27
Restriction
  • A type definition may be a restriction of another
    type's definition if it reduces permissions,
    sometimes to the point of inducing obligations
  • \d[01]\d-\d{3}-\d{4} (a restriction of the US
    'phone number type (\d )?(\d{3}-)?\d{3}-\d{4})
  • The membership of this type, which includes
  • 207-422-6240 but not 1 337-6818
  • is a (proper) subset of the membership of the
    original type,
  • because by construction every member of the new
    type is a member of the original.
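
A small sketch of the subset relation, assuming the restricted pattern is \d[01]\d-\d{3}-\d{4} (an obligatory area code whose middle digit is 0 or 1) and the original is (\d )?(\d{3}-)?\d{3}-\d{4}. The sample strings other than the two from the slide are made up for illustration, and a finite sample can only illustrate the subset property, not prove it.

```python
import re

ORIGINAL = re.compile(r"(\d )?(\d{3}-)?\d{3}-\d{4}")
RESTRICTED = re.compile(r"\d[01]\d-\d{3}-\d{4}")

samples = ["207-422-6240", "1 337-6818", "337-6818", "415-555-0100", "999-555-0100"]

in_original = {s for s in samples if ORIGINAL.fullmatch(s)}
in_restricted = {s for s in samples if RESTRICTED.fullmatch(s)}

# Every sampled member of the restricted type is also a member of the original.
print(in_restricted <= in_original)
print(sorted(in_original - in_restricted))
```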

28
Restriction, cont'd
  • Similarly,
  • (forename,surname)
  • is a restriction of the original type definition
    for name
  • (title?,forename,surname)
  • and the same relation holds.

29
Restriction, cont'd
  • Note first that
  • (forename,surname)
  • <name> <forename>...</forename>
    <surname>...</surname> </name>
  • is a member of the new type, but
  • <name> <title>...</title>
    <surname>...</surname> </name>
  • is not.
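
One way to make the complex-type case concrete is to treat a content model as a regular expression over the sequence of child element names. This is only a sketch under that assumption, not how a schema processor is specified to work; child_sequence and the comma-terminated encoding are hypothetical, and the titled instance is an illustration rather than one taken from the slides.

```python
import re
import xml.etree.ElementTree as ET


def child_sequence(xml_text):
    """Encode child element names as one string, e.g. 'forename,surname,'."""
    return "".join(child.tag + "," for child in ET.fromstring(xml_text))


ORIGINAL = re.compile(r"(title,)?forename,surname,")  # (title?,forename,surname)
RESTRICTED = re.compile(r"forename,surname,")         # (forename,surname)

plain = "<name><forename>...</forename><surname>...</surname></name>"
titled = "<name><title>...</title><forename>...</forename><surname>...</surname></name>"

for label, doc in [("plain", plain), ("titled", titled)]:
    seq = child_sequence(doc)
    print(label, bool(ORIGINAL.fullmatch(seq)), bool(RESTRICTED.fullmatch(seq)))
```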

30
Extension
  • Now consider
  • (title?, forename, surname, genMark?)
  • This type extends the original type definition
    for name.
  • <name> <forename>Al</forename>
    <surname>Gore</surname> <genMark>Jr</genMark>
    </name>
  • is an instance of this new type, but not of the
    original.

31
Any
  • Finally note that the <any/> content model
    particle, in all of its forms, introduces
    particularly broad permissions into complex
    content types.

32
Where are we headed?
  • A number of design decisions can now be stated
  • Should we make it easy to construct type
    definitions which restrict or extend other type
    definitions, by specifying only the method of
    derivation and the differences between the source
    and derived type definitions?
  • The new proposal says 'yes', you do this by using
    the "source" and "derivedBy" attributes on your
    <type> or <datatype> element.

33
Datatype example
  • Consider the simple type case first
  • <datatype name='bodytemp'
    source='decimal'> <precision value='4'/>
    <scale value='1'/> <minInclusive
    value='97.0'/> <maxInclusive value='105.0'/>
    </datatype>

34
Derived type
  • <datatype name='healthyBodytemp'
    source='bodytemp'> <maxInclusive
    value='99.5'/> </datatype>
  • The healthyBodytemp type definition is defined by
    closing down the permitted range of bodytemp. We
    say it 'inherits' the other facets of bodytemp,
    so the 'effective type definition' of
    healthyBodytemp is

35
Effective type
  • <datatype name='healthyBodytemp'
    source='decimal'> <precision value='4'/>
    <scale value='1'/> <minInclusive
    value='97.0'/> <maxInclusive value='99.5'/>
    </datatype>
  • Since it doesn't in general make sense to extend
    one simple type by another, the "derivedBy"
    attribute is actually redundant for <datatype>.
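
The inheritance of facets can be sketched as a dictionary merge. This is an illustration only: derive and is_valid are hypothetical names, and the code assumes that precision means the total number of digits and scale the number of fraction digits, which is how the example reads but is not spelled out in these slides.

```python
from decimal import Decimal

BODYTEMP = {
    "precision": 4,
    "scale": 1,
    "minInclusive": Decimal("97.0"),
    "maxInclusive": Decimal("105.0"),
}


def derive(base_facets, **overrides):
    """Effective facets of a derived datatype: inherit everything, then override."""
    effective = dict(base_facets)
    effective.update(overrides)
    return effective


def is_valid(text, facets):
    """Check a finite decimal literal against the four facets used above."""
    value = Decimal(text)
    _sign, digits, exponent = value.as_tuple()
    fraction_digits = max(0, -exponent)
    return (
        len(digits) <= facets["precision"]
        and fraction_digits <= facets["scale"]
        and facets["minInclusive"] <= value <= facets["maxInclusive"]
    )


HEALTHY = derive(BODYTEMP, maxInclusive=Decimal("99.5"))
print(HEALTHY)
print(is_valid("101.2", BODYTEMP), is_valid("101.2", HEALTHY))
```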

36
Extension for complex types
  • The next simplest case is extension for complex
    types
  • <type name='name'> <element name='title'
    minOccurs='0'/> <element
    name='forename' minOccurs='0'
    maxOccurs='*'/> <element name='surname'/>
    </type>

37
Derived type
  • <type name='fullName' source='name'
    derivedBy='extension'> <element
    name='genMark' minOccurs='0'/>
    </type>

38
The effective type
  • <type name='fullName'> <element
    name='title' minOccurs='0'/>
    <element name='forename'
    minOccurs='0' maxOccurs='*'/>
    <element name='surname'/> <element
    name='genMark' minOccurs='0'/>
    </type>
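
Computing the effective type for an extension amounts to appending the new particles to the inherited ones. A minimal sketch, assuming each particle is a (name, minOccurs, maxOccurs) tuple, that unstated bounds default to 1, and that None stands for an unbounded maxOccurs; the function name extend is hypothetical.

```python
# (name, minOccurs, maxOccurs); None means unbounded.
NAME = [("title", 0, 1), ("forename", 0, None), ("surname", 1, 1)]


def extend(base, extra):
    """derivedBy='extension': the new particles follow the inherited ones."""
    return list(base) + list(extra)


FULL_NAME = extend(NAME, [("genMark", 0, 1)])
print(FULL_NAME)
```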

39
Restriction for complex types
  • Restriction for complex types is harder to handle
    syntactically, because of the significance of
    linear order in content models, but the semantics
    are completely parallel to the simple type case

40
Restriction example
  • <type name='simpleName' source='name'
    derivedBy='restriction'> <restrictions>
    <element name='title'
    maxOccurs='0'/> <element name='forename'
    minOccurs='1'/> </restrictions>
    </type>

41
Restriction and Inheritance
  • Just as in the <datatype> case, the content model
    aspects not mentioned are left alone, including
    the "maxOccurs='*'" on <forename> and the whole
    particle for <surname>, so the 'effective content
    model' of 'simpleName' is

42
Effective type
  • <type name='simpleName'> <element
    name='title' maxOccurs='0'
    minOccurs='0'/> <!-- i.e. forbidden -->
    <element name='forename'
    minOccurs='1' maxOccurs='*'/>
    <element name='surname'/> </type>
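
The effective content model for a restriction can be sketched as a per-particle override: bounds that the restriction mentions replace the inherited ones, everything else is left alone. As before, particles are assumed to be (name, minOccurs, maxOccurs) tuples with None for unbounded, and restrict is a hypothetical helper rather than anything defined by the draft.

```python
# (name, minOccurs, maxOccurs); None means unbounded.
NAME = [("title", 0, 1), ("forename", 0, None), ("surname", 1, 1)]


def restrict(base, changes):
    """derivedBy='restriction': override only the bounds that are mentioned."""
    effective = []
    for name, min_occurs, max_occurs in base:
        change = changes.get(name, {})
        effective.append(
            (name, change.get("min", min_occurs), change.get("max", max_occurs))
        )
    return effective


SIMPLE_NAME = restrict(NAME, {"title": {"max": 0}, "forename": {"min": 1}})
print(SIMPLE_NAME)
```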

43
Instances
  • Given all the example definitions above, all of
  • <name><title>Ms</title><surname>Steinem</surname>
    </name>
  • <name xsi:type='simpleName'>
    <forename>Harry</forename> <forename>S</forename>
    <surname>Truman</surname> </name>

44
Another instance
  • <name xsi:type='fullName'>
    <forename>Al</forename> <surname>Gore</surname>
    <genMark>Jr</genMark> </name>
  • all would be schema-valid per
  • <element name='name' type='name'/>
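
The three instances can be checked against the three effective content models with a toy validator. This is a sketch under several assumptions: the models are written out by hand as (name, min, max) tuples, the xsi prefix is bound to the 1999-era instance namespace, a greedy left-to-right match suffices because no name repeats within a model, and the check that an xsi:type is actually derived from the declared type is left out. schema_valid and matches are hypothetical names.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/1999/XMLSchema-instance"
XSI_TYPE = "{" + XSI + "}type"

# Effective content models; None means unbounded.
TYPES = {
    "name": [("title", 0, 1), ("forename", 0, None), ("surname", 1, 1)],
    "simpleName": [("title", 0, 0), ("forename", 1, None), ("surname", 1, 1)],
    "fullName": [("title", 0, 1), ("forename", 0, None), ("surname", 1, 1),
                 ("genMark", 0, 1)],
}


def matches(children, particles):
    """Greedy match of child names against a sequence of particles."""
    pos = 0
    for name, lo, hi in particles:
        count = 0
        while (pos < len(children) and children[pos] == name
               and (hi is None or count < hi)):
            pos += 1
            count += 1
        if count < lo:
            return False
    return pos == len(children)


def schema_valid(xml_text, declared_type="name"):
    """Use xsi:type if present, otherwise the declared type of the element."""
    elem = ET.fromstring(xml_text)
    type_name = elem.get(XSI_TYPE, declared_type)
    return matches([child.tag for child in elem], TYPES[type_name])


steinem = "<name><title>Ms</title><surname>Steinem</surname></name>"
truman = (f'<name xmlns:xsi="{XSI}" xsi:type="simpleName">'
          "<forename>Harry</forename><forename>S</forename>"
          "<surname>Truman</surname></name>")
gore = (f'<name xmlns:xsi="{XSI}" xsi:type="fullName">'
        "<forename>Al</forename><surname>Gore</surname>"
        "<genMark>Jr</genMark></name>")

print([schema_valid(doc) for doc in (steinem, truman, gore)])
```

Without the xsi:type attribute, the third instance would be checked against 'name' itself and rejected because of the trailing genMark.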

45
Connecting Instances and Schemas
  • Like I said
  • A schema is not a namespace
  • The connection cannot be made rigid
  • The draft identifies three layers, first is
  • schema-valid(EII,TypeName,ComponentSet)
  • The TypeName is a (namespaceURI,NCName) pair
  • The component set is made up of
    (namespaceURI,NCName,component) triples
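
The first layer can be pictured as a lookup over triples. A minimal data-structure sketch, in which the namespace URIs, the component bodies and the function name resolve are all hypothetical; the only point illustrated is that a component set is indexed by (namespaceURI, NCName) pairs and may hold components from more than one namespace.

```python
# Hypothetical namespace URIs, for illustration only.
PO_NS = "http://example.org/po"
ADDR_NS = "http://example.org/address"

# (namespaceURI, NCName, component) triples; components are placeholders here.
triples = [
    (PO_NS, "name", ["title?", "forename*", "surname"]),
    (PO_NS, "fullName", ["title?", "forename*", "surname", "genMark?"]),
    (ADDR_NS, "name", ["street", "city"]),
]

component_set = {(ns, ncname): component for ns, ncname, component in triples}


def resolve(type_name, components):
    """Look up a TypeName, i.e. a (namespaceURI, NCName) pair; None if absent."""
    return components.get(type_name)


print(resolve((PO_NS, "name"), component_set))
print(resolve((ADDR_NS, "name"), component_set))
```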

46
Other layers
  • Layer 2: transfer syntax
  • Layer 3: web connections

47
Schema Walkthrough 2
  • The Schema for Datatypes

48
Schema Walkthrough 3
  • The Schema for Schemas

49
Change of Gear
  • Let's look at the role of schemas in supporting
    the layered architecture which is emerging all
    around us

50
XML is ASCII for the 21st century
  • ASCII (ISO 646) solved a fundamental interchange
    problem for flat text documents
  • What bits encode what characters
  • (For a pretty parochial definition of
    'character')
  • UNICODE/ISO 10646 extends that solution to the
    whole world
  • XML thought it was doing the same for simple
    tree-structured documents
  • The emphasis in the XML design was on simplifying
    SGML to move it to the Web
  • XML didn't touch SGML's architectural vision
  • flexible linearisation/transfer syntax
  • for tree-structured documents with internal links

51
Just what is XML?
  • It's a markup language used for annotating text
  • It is concerned with logical structure
  • to identify sections, titles, section headers,
    chapters, paragraphs, ...
  • It is not concerned with appearance
  • you say 'this is a subtitle', not 'this is in
    bold, 14pt, centered'
  • you say 'this is an example', not 'this is in
    verbatim, indented by 5pts, ragged right'

52
Take Two: Just what is XML?
  • It's a markup language used for transferring data
  • It is concerned with data models
  • to convert between application-appropriate and
    transfer-appropriate forms
  • It is not concerned with human beings
  • It's produced and consumed by programs

53
XML as UI
  • A slogan of Adam Bosworth
  • I interpret it in two ways
  • At the client end
  • Use XML plus XSL as the basis for what the user
    sees on his/her screen
  • Use XLinks from a master document to pull
    together disparate sources of information
  • At the server end
  • Use XML as a uniform interface for any data
    source onto the web
  • Not just documents, but e.g. databases, process
    control information, stock quotes

54
Application data

55
Structured markup
  • <POORDERHDR><DATETIME qualifier="DOCUMENT">
    <YEAR>1996</YEAR> <MONTH>06</MONTH>
    <DAY>30</DAY> <HOUR>23</HOUR>
    <MINUTE>59</MINUTE> <SECOND>59</SECOND>
    <SUBSECOND>0000</SUBSECOND> <TIMEZONE>0100</TIMEZONE>
    </DATETIME> <OPERAMT qualifier="EXTENDED"
    type="T"> <VALUE>670000</VALUE>
    <NUMOFDEC>2</NUMOFDEC> <SIGN></SIGN>
    <CURRENCY>USD</CURRENCY>. . .

56
What just happened!?
  • The whole transfer syntax story just went meta,
    that's what happened!
  • XML has been a runaway success, on a much greater
    scale than its designers anticipated
  • Not for the reason they had hoped
  • Because separation of form from content is right
  • But for a reason they barely thought about
  • Data must travel the web
  • Tree structured documents are a useable transfer
    syntax for just about anything
  • So data-oriented web users think of XML as a
    transfer mechanism for their data

57
The Cambridge Communiqué
  • A W3C Note resulting from a meeting this August
    (http://www.w3.org/TR/schema-arch)
  • Signalled a widespread acceptance of layering
  • "XML has defined a transfer syntax for
    tree-structured documents"
  • "Many data-oriented applications are being
    defined which build their own data structures on
    top of an XML document layer, effectively using
    XML documents as a transfer mechanism for
    structured data"

58
The Communiqué, cont'd
  • Called for support in XML Schema for specifying
    mapping between the XML document data model (or
    XML Infoset) and application-specific data models
  • XML Schema is a W3C recommendation-in-progress
    for defining the structure of document families
  • A grammar for markup structure
  • E.g.
  • article -> title, subtitle?, section
  • or
  • POORDERHDR -> DATETIME, ORDERAMT

59
Mapping between layers
  • Fortunately, XML Schema is actually notated in
    XML itself
  • So there are elements defined for use in schemas
    to define. . .
  • Elements :-)
  • Attributes
  • Types
  • A type is a collection of constraints on element
    content and attribute values
  • A type may be either
  • simple, for constraining string values
  • complex, for constraining elements which contain
    other elements

60
Type definition example
  • <type name='personName'> <element name='title'
    minOccurs='0'/> <element
    name='forename' minOccurs='0'
    maxOccurs='*'/> <element name='surname'/>
    <attribute name='id'
    type='integer'/></type>
  • <element name='owner' type='personName'/>

61
Mapping between layers 2
  • We can think of this in two ways
  • In terms of an abstract data modelling language
  • Entity-Relation
  • UML
  • RDF
  • In concrete implementation terms
  • Tables and rows
  • Class instances and instance variables
  • The first is more portable
  • The second more immediately useful

62
Mapping between layers 3
  • Regardless of what approach we take, we need
  • A vocabulary of data model components
  • An attachment of that vocabulary to schema
    components
  • Sample vocabularies
  • entity, relationship, collection
  • table, row, column
  • instance, variable, list, dictionary
  • Where should attachment be specified?
  • In the schema
  • convenient
  • Outside it
  • modular

63
Specifying mapping in the schema
  • Probably reasonable if done in high-level (ER,
    UML) terms
  • See example infoset-xmpl.xml, infoset-uml.xsd

64
Specifying mapping outside
  • Requires some duplication of structural
    information
  • Encourages cross-language working
  • See example infoset-xmpl.xsl

65
Take-home message
  • The point at which idiosyncratic scripting takes
    over can be moved one layer up
  • Using public consensual declarative standards is
    a Good Thing
  • Interoperability makes things better for everyone

66
Overall Conclusion
  • "Schemas are coming. Start using them!"
  • Tim Berners-Lee, 1999-11-05