Writing Your Last DTD

1
Writing Your Last DTD?
  • Alex Brown, Griffin Brown Digital Publishing Ltd

2
Background
  • By DTD I mean simply the formal declarations as
    allowed by XML 1.x
  • A last DTD doesn't mean a last validation
    mechanism: the future is not well-formed
  • This presentation is in two parts:
    • Modelling
    • DTD-specific features

3
DTDs on the Wane?
  • Some say DTDs are on the way out, and have been
    saying this for a while
  • Some evidence of shift, mostly driven by new
    tools and new XML implementers
  • Rise of the pipelining model of validation (DSDL)
    likely. DTDs need to cooperate with other
    technologies
  • DTDs are not very complete instruments of
    validation

4
Part I - Modelling
5
Human-facing XML Models
  • XML can be seen as "just" a serialisation format,
    in which case the models need "just" to work
  • This presentation is also concerned with models that
    people experience (at some level)
  • People often look at raw markup, and experience
    content models through tools (e.g.
    syntax-directed editors)

6
Machine-facing XML Models
  • Desirable features:
    • Normalised
    • Machine efficient
    • Programmer efficient?
  • Techniques fairly easily borrowed from other
    disciplines (database schema design, type system
    design, etc.)

7
Machines vs People
  • Also known as data vs documents?
  • In reality few resources are at the extremes of
    this spectrum
  • Many resources mix data-like and document-like
    features
  • The challenge is in finding a balance and
    tolerating the mess

8
Data Normalisation
  • i.e., single items of data appear once
  • A really good idea for some data
  • E.g. link targets, database dumps

9
Mixed Content
  • Normalisation not a natural feature of human
    languages
  • <p>The cat sat on the mat</p>
  • not
  • <p>The cat <verb infinitive="to sit"
    tense="perfect"/> on the mat</p>
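
In DTD terms, a paragraph such as the cat sentence is mixed content: character data with optional inline elements interleaved. A minimal sketch follows; the em element is a hypothetical inline element added only to show the declaration syntax.

```xml
<?xml version="1.0"?>
<!DOCTYPE p [
  <!-- Mixed content: #PCDATA must come first, and the group must be
       starred, so order and number of inline elements are unconstrained. -->
  <!ELEMENT p  (#PCDATA | em)*>
  <!ELEMENT em (#PCDATA)>
]>
<p>The cat <em>sat</em> on the mat</p>
```

The declaration constrains which inline elements may occur, but leaves the sentence itself as natural language.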

10
When natural language is suitable
  • Don't be afraid to model mixed content (the "diamonds
    in the mud" approach)
  • e.g. bibliographic references
  • Sometimes the precision of human language cannot
    be modelled precisely
  • e.g. addresses

11
Type Hierarchies (1)
  • [Diagram] Credit-card with a @type attribute: Expiry, Number, Name
  • [Diagram] Credit-card with @type=SWITCH: Expiry, Number, Name, Issue Number
?
12
Type Hierarchies (2)
  • [Diagram] Credit-card is split into one element type per card: switch-card, visa-card (etc.)
  • switch-card: Expiry, Number, Name, Issue Number
  • visa-card: Expiry, Number, Name
  • (etc.): Expiry, Number, Name
13
Optional Elements?
  • "Optional" often doesn't mean optional; in
    practice it is used to mean "must exist" or "must
    not exist"
  • Consider making the choice explicit, e.g.
    (issue-number | no-issue-number)
  • Type-safe models are good for machine-facing
    data but require maintenance
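
The explicit choice can be written directly into the content model. This is a minimal sketch, assuming lower-case element names for the credit-card fields and placeholder values in the instance.

```xml
<?xml version="1.0"?>
<!DOCTYPE credit-card [
  <!-- Exactly one of issue-number or no-issue-number must appear,
       so a record with the field left out by mistake is invalid. -->
  <!ELEMENT credit-card (expiry, number, name,
                         (issue-number | no-issue-number))>
  <!ELEMENT expiry          (#PCDATA)>
  <!ELEMENT number          (#PCDATA)>
  <!ELEMENT name            (#PCDATA)>
  <!ELEMENT issue-number    (#PCDATA)>
  <!ELEMENT no-issue-number EMPTY>
]>
<credit-card>
  <expiry>2030-01</expiry>
  <number>0000 0000 0000 0000</number>
  <name>A. N. Other</name>
  <no-issue-number/>
</credit-card>
```

Compared with a plain issue-number?, the author has to state which case applies, at the cost of one extra element to maintain.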

14
Mega Markup
  • "Just Tag It"?
  • Models should have a justification (often a
    business justification)
  • Rich inline tagging in particular needs to be
    thought through (KM technologies are often better for
    enriching documents)

15
Part II - Practicalities
16
Documentation
  • DTDs are comparatively easy to document: content
    models are terse but expressive (people like
    them), e.g.
  • A DTD is not a .DTD, and documentation is
    costly!
  • Don't make the limits of the DTD the limits of
    your specification: DTDs rough out content
  • We need a graphical standard for representing
    models (not UML please)

17
Deployment
  • Deploy a normalised version of your DTD via a web
    server
  • Require that this authoritative version is used
    during data handovers
  • Consider requiring the use of PUBLIC identifiers
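
A document type declaration with a PUBLIC identifier might look like the sketch below. Both the public identifier and the URL are hypothetical values invented for illustration, as are the element names.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The public identifier names the DTD; the system identifier says
     where the authoritative copy is served. Both are hypothetical. -->
<!DOCTYPE article PUBLIC "-//Example Org//DTD Article 1.0//EN"
                         "http://www.example.org/dtd/article-1.0.dtd">
<article>
  <title>Handover sample</title>
</article>
```

A receiving system can then map the public identifier to a local copy (for instance through an XML catalog) while the web-served version remains the reference.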

18
Parameterisation
  • Parameter entities: macro-like features for use
    in DTDs
  • <!ENTITY % p.zz "(%p.el;)|(%p.tbl;)|(%p.lst.d;)|(%p.form;)">
  • More useful in development than in the mature phases
    of a DTD's lifetime.
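
To show how such a macro is used, here is a minimal sketch of an external DTD. The entity names follow the slide; what each one expands to (p, table, dl, form) and the section and title elements are assumptions. It is written as an external file because XML 1.0 does not allow parameter-entity references inside declarations in the internal subset.

```xml
<!-- sketch.dtd: an external DTD subset (hypothetical file name). -->
<!-- Each p.* entity names one block-level element; the element names
     on the right are assumptions made for this sketch. -->
<!ENTITY % p.el    "p">
<!ENTITY % p.tbl   "table">
<!ENTITY % p.lst.d "dl">
<!ENTITY % p.form  "form">

<!-- p.zz collects the alternatives into one reusable group. -->
<!ENTITY % p.zz "(%p.el;)|(%p.tbl;)|(%p.lst.d;)|(%p.form;)">

<!-- Any content model can now reuse the whole group by reference. -->
<!ELEMENT section (title, (%p.zz;)*)>
<!ELEMENT title   (#PCDATA)>
<!ELEMENT p       (#PCDATA)>
<!ELEMENT table   (#PCDATA)>
<!ELEMENT dl      (#PCDATA)>
<!ELEMENT form    (#PCDATA)>
```

Adding a new block-level element means editing p.zz once rather than every content model, which is why the technique pays off most while the model is still changing.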

19
Entities
  • Entity declarations are a DTD-only feature: not
    in W3C XML Schema or RELAX NG (but maybe in DSDL)
  • A good reason for sticking with DTDs, especially
    for character entities.
  • But, will make your data DTD-dependent
  • In publishing, losing entities has not proved a
    problem (surprisingly)
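
The DTD dependency is easy to see in a small case. In this sketch (the eacute entity and the p element are illustrative choices), the named entity only resolves because the declaration is present, whereas the numeric character reference needs no declaration at all.

```xml
<?xml version="1.0"?>
<!DOCTYPE p [
  <!ELEMENT p (#PCDATA)>
  <!-- Without this declaration, &eacute; cannot be resolved. -->
  <!ENTITY eacute "&#xE9;">
]>
<p>caf&eacute; and caf&#xE9; are the same text once parsed</p>
```

Writing the character directly, or as a numeric reference, keeps the data readable by tools that never see the DTD.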

20
Namespaces
  • DTDs and Namespaces are uneasy partners
  • Prefix inflexibility
  • Conventions and kludges, not standard
  • Buggy software (Microsoft parsers)
  • Avoid using Namespaces with DTDs whenever possible

21
But if you must
  • Do not use #FIXED or default attributes in the
    DTD (tools will complain)
  • Pre-pick your prefixes, and qualify the names of
    vocabularies within your DTD (e.g. m for MathML)
  • REQUIRE the xmlns attribute(s) on your root
    elements, and use an external tool to enforce
    this

22
Example
  • <!ELEMENT root (...)>
  • <!ATTLIST root
      xmlns   CDATA #REQUIRED
      xmlns:m CDATA #REQUIRED>
  • <root xmlns="http://myorg.com/ns/"
      xmlns:m="http://www.w3.org/1998/Math/MathML">

23
But if you must (2)
  • This works with tools, and means your namespaces
    work with/without the DTD being present
  • Don't get stressed: remember XSLT

24
Defaulting
  • DTDs provide the means to add items to the
    infoset: default attribute values
  • So do W3C XML Schemas; RELAX NG does not
  • Using defaulting makes your document depend on
    your DTD/Schema: do not use it (remember XSLT)

25
Example
  • <!ATTLIST para hide (yes|no) "no">
  • <!ATTLIST para hide (yes|no) #IMPLIED>
  • Make the value inferable, and document it
  • Again, remember XSLT
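
One reading of "remember XSLT" is that the inferred value can be made explicit by a transformation rather than by the DTD. The stylesheet below is a minimal XSLT 1.0 sketch under that assumption: it copies a document unchanged, except that a para with no hide attribute gets hide="no".

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute as it is. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- A para without a hide attribute gets the documented
       inferred value added explicitly. -->
  <xsl:template match="para[not(@hide)]">
    <xsl:copy>
      <xsl:attribute name="hide">no</xsl:attribute>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the rule lives in a stylesheet, the result is the same whether or not the parser read the DTD.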

26
Off-the-shelf standards
  • For XML: MathML, SVG, CALS or Exchange Tables,
    XHTML, etc.
  • Forget XLink: much pain, no gain
  • Remember there are standards for many things:
    country, language, date and time, latitude/longitude.
    Good DTDs leverage standards.
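
As a small illustration of leaning on existing standards for values, the sketch below uses xml:lang for language, an ISO 3166 two-letter code for country and an ISO 8601 date. The event element, the attribute names country and date, and the sample values are all assumptions.

```xml
<?xml version="1.0"?>
<!DOCTYPE event [
  <!ELEMENT event (#PCDATA)>
  <!-- The DTD only checks that these are name tokens; the value
       formats come from the external standards and need to be
       documented or checked by another tool. -->
  <!ATTLIST event
    xml:lang NMTOKEN #IMPLIED
    country  NMTOKEN #REQUIRED
    date     NMTOKEN #REQUIRED>
]>
<event xml:lang="en-GB" country="GB" date="2005-04-01">Conference opening</event>
```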

27
In Summary
  • Pick good models
  • Document your DTD and control its deployment
  • Use Namespaces defensively
  • Do not use entity (or notation) declarations
  • Do not use attribute defaulting
  • Use standards where possible

28
Thank You
  • Any Questions?
  • alexb@griffinbrown.co.uk
  • http://www.griffinbrown.co.uk/