Effective XML - PowerPoint PPT Presentation

About This Presentation
Title:

Effective XML

Description:

Item 19: Encode binary data using quoted printable and/or Base64. Quoted printable works well for mostly text. Base-64 for non-text data ... – PowerPoint PPT presentation

Slides: 46
Provided by: cafeco

Transcript and Presenter's Notes

Title: Effective XML


1
Effective XML
  • XML Developers Network of the Capital District
  • Elliotte Rusty Harold
  • elharo_at_metalab.unc.edu
  • http://www.cafeconleche.org/

2
Part I Syntax
3
Item 1 Include an XML declaration
  • <?xml version="1.0" encoding="UTF-8"?>
  • Optional, but treat as required
  • Specifies version, character set, and encoding
  • Very important for detecting encoding
  • Identifies XML when file and media type
    information is unavailable or unreliable
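The sketch below shows why the declaration matters for encoding detection. It is a simplified take on the non-normative autodetection table in Appendix F of the XML 1.0 specification and deliberately skips the UTF-32 and EBCDIC cases. `XmlEncodingSniffer` and `guess` are hypothetical names. A real parser already does this, and the declaration is what makes its guess reliable.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: simplified XML 1.0 Appendix F autodetection.
// UTF-32 and EBCDIC cases are intentionally omitted.
final class XmlEncodingSniffer {

    private static final Pattern ENCODING = Pattern.compile(
        "encoding\\s*=\\s*([\"'])([A-Za-z][A-Za-z0-9._-]*)\\1");

    static String guess(byte[] b) {
        // Byte order marks identify the encoding directly
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(b, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(b, 0xFF, 0xFE)) return "UTF-16LE";
        // "<?" in UTF-16 without a byte order mark
        if (startsWith(b, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE";
        if (startsWith(b, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE";
        // "<?xm" in an ASCII-compatible encoding: read the declaration itself
        if (startsWith(b, 0x3C, 0x3F, 0x78, 0x6D)) {
            String head = new String(b, 0, Math.min(b.length, 256), StandardCharsets.US_ASCII);
            int end = head.indexOf("?>");
            if (end > 0) {
                Matcher m = ENCODING.matcher(head.substring(0, end));
                if (m.find()) return m.group(2);
            }
        }
        // No byte order mark and no encoding declaration: XML requires UTF-8
        return "UTF-8";
    }

    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```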

4
Item 3 Stay with XML 1.0
  • XML 1.1
  • New name characters
  • C0 control characters
  • C1 control characters
  • NEL
  • Undeclare namespace prefixes
  • Incompatible with
  • Most XML parsers
  • W3C and RELAX NG schema languages
  • XOM, JDOM
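A program that should accept only XML 1.0 input can check the version through the DOM Level 3 method `Document.getXmlVersion()`. This minimal sketch assumes the JDK's built-in JAXP parser. `XmlVersionCheck` and `requireXml10` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper that rejects documents not declared as XML 1.0
final class XmlVersionCheck {

    static String versionOf(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getXmlVersion(); // version from the XML declaration
    }

    static void requireXml10(String xml) throws Exception {
        String version = versionOf(xml);
        if (!"1.0".equals(version)) {
            throw new IllegalArgumentException("Expected XML 1.0, got " + version);
        }
    }
}
```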

5
Part II Structure
6
The XML Stack
7
Item 14 Allow All XML syntax
  • CDATA sections
  • Entity references
  • Processing instructions
  • Comments
  • Numeric character references
  • Document type declarations
  • Different ways of representing the same core
    content, not different information
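The following sketch illustrates that these spellings carry the same content. It parses several variants with the JDK's DOM parser and prints the root element's text. `SameContent` and `rootText` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo: different syntax, identical text content
final class SameContent {

    static String rootText(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement()
            .getTextContent(); // excludes comments and processing instructions
    }

    public static void main(String[] args) throws Exception {
        String[] spellings = {
            "<a><![CDATA[x<y]]></a>",     // CDATA section
            "<a>x&lt;y</a>",              // predefined entity reference
            "<a>x&#60;y</a>",             // numeric character reference
            "<a>x<!-- note -->&lt;y</a>"  // comment in the middle
        };
        for (String s : spellings) {
            System.out.println(rootText(s)); // prints x<y each time
        }
    }
}
```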

8
Item 9 Distinguish text from markup
  • A DocBook element:
    <programlisting><![CDATA[<value>
      <double>28657</double></value>]]></programlisting>
  • The content is <value> <double>28657</double></value>,
    as text rather than markup
  • This is the same: <programlisting>&lt;value&gt;
      &lt;double&gt;28657&lt;/double&gt;
    &lt;/value&gt;</programlisting>
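A parser makes the distinction visible. In the CDATA and escaped forms, `programlisting` has no element children, only text. This sketch uses the JDK's DOM parser, and `TextOrMarkup` is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo: CDATA and escaped text produce no element nodes
final class TextOrMarkup {

    static int childElements(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
        NodeList children = root.getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) count++;
        }
        return count;
    }
}
```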

9
The reverse problem
  • Tools that create XML from strings
  • Tree-based editors like <Oxygen/> or XML Spy
  • WYSIWYG applications like OpenOffice Writer
  • Programming APIs such as DOM, JDOM, and XOM
  • The tool automatically escapes reserved
    characters like <, >, or &.
  • Just because something looks like an XML tag does
    not mean it is an XML tag.
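For example, a DOM text node that holds `<value>` serializes as escaped text. This minimal sketch uses the JDK's identity `Transformer`. `EscapingDemo` is a hypothetical name, and the exact serializer output may differ slightly between JAXP implementations.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo: text that looks like a tag is still just text
final class EscapingDemo {

    static String serialize(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element p = doc.createElement("p");
        p.appendChild(doc.createTextNode(text)); // character data, never markup
        doc.appendChild(p);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString(); // the serializer escapes < and &
    }
}
```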

10
Item 10 White space matters
  • Parsers report all white space in element
    content, including boundary white space
  • An xml:space attribute is for the client
    application only, not the parser
  • White space in attribute values is normalized
  • Parsers do not report white space in the prolog,
    the epilog, the document type declaration, or inside tags.
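These rules can be checked directly with the JDK's DOM parser. In this sketch, the boundary spaces around `<b/>` survive as text nodes, and the tab and newline inside the attribute value are normalized to spaces. `WhiteSpaceDemo` is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo of white space reporting and attribute normalization
final class WhiteSpaceDemo {

    static Element root(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        // Three children: text " x ", element <b/>, text " "
        System.out.println(root("<a> x <b/> </a>").getChildNodes().getLength());
        // Tab and newline in the attribute value become spaces
        System.out.println(root("<a t=\"x\ty\nz\"/>").getAttribute("t"));
    }
}
```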

11
Item 11 Make structure explicit through markup
  • Bad
    <Transaction>Withdrawal 2003 12 15 200.00</Transaction>
  • Better
    <Transaction type="withdrawal">
      <Date>2003-12-15</Date>
      <Amount>200.00</Amount>
    </Transaction>
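When the structure lives in the markup, a program reads each field from its own node instead of splitting a string. This sketch assumes the "Better" form above. `TransactionReader` and its nested `Transaction` class are hypothetical.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import java.time.LocalDate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical reader for the explicitly structured <Transaction>
final class TransactionReader {

    static final class Transaction {
        final String type;
        final LocalDate date;
        final BigDecimal amount;

        Transaction(String type, LocalDate date, BigDecimal amount) {
            this.type = type;
            this.date = date;
            this.amount = amount;
        }
    }

    static Transaction read(String xml) throws Exception {
        Element t = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
        // Each field has its own node: no tokenizing or guessing at positions
        String date = t.getElementsByTagName("Date").item(0).getTextContent().trim();
        String amount = t.getElementsByTagName("Amount").item(0).getTextContent().trim();
        return new Transaction(t.getAttribute("type"), LocalDate.parse(date), new BigDecimal(amount));
    }
}
```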

12
Item 12 Store metadata in attributes
  • Material the reader doesn't want to see
  • URLs
  • IDs
  • Styles
  • Revision dates
  • Author's name
  • No substructure
  • Revision tracking
  • Citations
  • No multiple elements

13
Item 13 Remember mixed content
  • Narrative documents
  • Record-like documents
  • The RSS problem
    <item>
      <title>Xerlin 1.3 released</title>
      <description>
        Xerlin 1.3, an open source XML Editor written in
        Java, has been released. Users can extend the
        application via custom editor interfaces for
        specific DTDs. New features in version 1.3 include
        XML Schema support, WebDAV capabilities, and
        various user interface enhancements. Java 1.2
        or later is required.
      </description>
      <link>http://www.cafeconleche.org/news2003April7</link>
    </item>

14
What you really want is this
<description>
  <p><a href="http://www.xerlin.org"><strong>Xerlin 1.3</strong></a>, an open
  source XML Editor written in Java, has been released. Users can extend the
  application via custom editor interfaces for specific DTDs. New features in
  version 1.3 include:</p>
  <ul>
    <li>XML Schema support</li>
    <li>WebDAV capabilities</li>
    <li>Various user interface enhancements</li>
  </ul>
  <p>Java 1.2 or later is required.</p>
</description>
15
What people do is this
<description>&lt;p&gt;&lt;a href="http://www.xerlin.org"&gt;&lt;strong&gt;Xerlin 1.3&lt;/strong&gt;&lt;/a&gt;, an
open source XML Editor written in Java, has been
released. Users can extend the application via
custom editor interfaces for specific DTDs. New
features in version 1.3 include:&lt;/p&gt;
&lt;ul&gt; &lt;li&gt;XML Schema support&lt;/li&gt;
&lt;li&gt;WebDAV capabilities&lt;/li&gt;
&lt;li&gt;Various user interface enhancements&lt;/li&gt;
&lt;/ul&gt; &lt;p&gt;Java 1.2 or later is
required.&lt;/p&gt; </description>
16
Item 16 Prefer URLs to unparsed entities and
notations
  • URLs are simple and well understood
  • Notations and unparsed entities are confusing and
    little used
  • URLs don't require the DTD to be read
  • Many APIs don't even support notations and
    unparsed entities

17
Part III Semantics
18
Item 17 Use processing instructions for
process-specific content
  • For a very particular, even local, process
  • Describes how a particular process acts on the
    data in the document
  • Does not describe or add to the content itself
  • A unit that can be treated in isolation
  • Content is not XML-like.
  • Applies to the entire document
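As an example of process-specific content, a program can read an `xml-stylesheet` processing instruction from the prolog without touching the document's content. This sketch uses the standard DOM `ProcessingInstruction` interface, and `PiReader` is a hypothetical name. The XML declaration is not a processing instruction, so it does not show up in the result.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

// Hypothetical helper: collect PIs that sit outside the root element
final class PiReader {

    static List<ProcessingInstruction> topLevelInstructions(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        List<ProcessingInstruction> result = new ArrayList<>();
        // Document-level children cover the prolog and the epilog
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                result.add((ProcessingInstruction) n);
            }
        }
        return result;
    }
}
```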

19
Processing instructions are not appropriate when
  • Content is closely related to the content of the
    document itself.
  • Structure extends beyond a single processing
    instruction
  • Needs to be validated.

20
Item 18 Include all information in instance
documents
  • Not all parsers read the DTD
  • Especially browsers
  • Beware
  • Default attribute values
  • Parsed entity references
  • XInclude
  • ID type dependence (XPath, DOM, etc.)
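To see why default attribute values are risky, this sketch reads the same attribute from two documents. In one the DTD supplies it, and in the other the instance does. `DtdDefaults` and the `currency` attribute are hypothetical. The JDK parser reads the internal subset, so both documents report `USD`. However, `Attr.getSpecified()` shows that the first value came from the DTD, so a tool that skips the DTD would not see it at all.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo: where did this attribute value come from?
final class DtdDefaults {

    private static Element root(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
    }

    static String attribute(String xml, String name) throws Exception {
        return root(xml).getAttribute(name); // "" if absent
    }

    // True when the value was filled in from an ATTLIST default
    static boolean fromDtd(String xml, String name) throws Exception {
        Attr a = root(xml).getAttributeNode(name);
        return a != null && !a.getSpecified();
    }
}
```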

21
Item 19 Encode binary data using quoted
printable and/or Base64
  • Quoted printable works well for mostly text
  • Base64 for non-text data
  • Can you link to the data with a URL instead?
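Here is a minimal sketch of embedding binary data with `java.util.Base64` (Java 8 and later). The `<data encoding="base64">` element and the `BinaryInXml` class are hypothetical. Java SE has no quoted-printable encoder, so only Base64 is shown.

```java
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: carry raw bytes as Base64 text in an element
final class BinaryInXml {

    static Document wrap(byte[] bytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element data = doc.createElement("data");
        data.setAttribute("encoding", "base64");
        data.appendChild(doc.createTextNode(Base64.getEncoder().encodeToString(bytes)));
        doc.appendChild(data);
        return doc;
    }

    // The MIME decoder skips characters outside the Base64 alphabet,
    // such as line breaks a pretty-printer might insert
    static byte[] unwrap(Document doc) {
        return Base64.getMimeDecoder().decode(doc.getDocumentElement().getTextContent());
    }
}
```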

22
Item 20-22 Use namespaces for modularity and
extensibility
  • Not hard: simple cases can use one default
    namespace
  • http URIs are normally preferred
  • DTD validation is tricky
  • Code to namespace URIs, not prefixes
  • Avoid namespace prefixes in element content and
    attribute values
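This sketch codes to namespace URIs rather than prefixes, using a namespace-aware DOM parser. The URI `http://www.example.com/ns/order` and the `NamespaceDemo` class are hypothetical. The parser finds every `item` in that namespace regardless of prefix, and it ignores the unqualified `item`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo: match on namespace URI + local name, never on prefix
final class NamespaceDemo {

    static final String NS = "http://www.example.com/ns/order"; // hypothetical URI

    static int countItems(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // JAXP factories are not namespace-aware by default
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagNameNS(NS, "item").getLength();
    }
}
```

In the test document below, `o:item`, `x:item`, and a default-namespace `item` all count, and the final `item` in no namespace does not.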

23
Item 23 Reuse XHTML for generic narrative content
24
Item 24 Choose the right schema language for the
job
  • DTDs
  • The W3C XML Schema Language
  • RELAX NG
  • Schematron

25
Item 25 Pretend there's no such thing as the PSVI
  • Post-Schema-Validation Infoset
  • Adds types like int and gYear to elements
  • Often not specific enough
  • Element/attribute names are types

26
Item 28 Use only what you need
  • You need
  • Well-formed XML 1.0
  • A parser
  • You probably need
  • Namespaces
  • You may not need
  • DTDs
  • Schemas
  • XInclude
  • WS-Kitchen-Sink
  • etc.

27
Item 29 Always use a parser
  • Can't use regular expressions
  • Detecting encoding
  • Comments and processing instructions that contain
    tags
  • CDATA sections
  • Unexpected placement of spaces and line breaks
    within tags
  • Default attribute values
  • Character and entity references
  • Malformed documents
  • Internal DTD Subset
  • Why not?
  • Unfamiliarity with parsers
  • Too slow
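The comment and CDATA pitfalls are easy to demonstrate. A regular expression that looks for `<b>...</b>` picks up a "tag" inside a comment or CDATA section, while a parser does not. `RegexVsParser` is a hypothetical name, and the regex is deliberately naive.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical comparison of a naive regex and a real parser
final class RegexVsParser {

    // Treats anything that looks like a tag as a tag
    static String firstByRegex(String xml) {
        Matcher m = Pattern.compile("<b>(.*?)</b>").matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    // The parser knows comments and CDATA sections are not elements
    static String firstByParser(String xml) throws Exception {
        NodeList bs = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getElementsByTagName("b");
        return bs.getLength() > 0 ? bs.item(0).getTextContent() : null;
    }
}
```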

28
Item 30 Layer Functionality
29
Item 31-33 Program to standard APIs
  • Easier to deploy in Java 1.4/1.5
  • Different implementations have different
    performance characteristics
  • SAX is fast
  • DOM interoperates
  • Semi-standard
  • JDOM
  • XOM
  • Bleeding edge
  • StAX
  • JAXB
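As an example of a standard API, this SAX handler counts elements as it streams through the document, without building a tree. The sketch uses JAXP's `SAXParserFactory`, and `SaxCounter` is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical streaming element counter built on standard SAX
final class SaxCounter {

    static int countElements(String xml) throws Exception {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++; // one call per start-tag (or empty-element tag)
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }
}
```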

30
Item 34 Read the complete DTD
  • Be conservative in what you generate, liberal in
    what you accept
  • Important content from DTD
  • Default attribute values
  • Namespace declarations
  • Entity references

31
Item 35 Navigate with XPath
  • More robust against unexpected structure
  • Allow optimization by engine
  • Easier to code: enhanced programmer productivity
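An XPath expression such as `sum(//Amount)` finds `Amount` elements at any depth, so the query keeps working if the document gains an extra level of nesting. This sketch uses `javax.xml.xpath`. `XPathDemo` and the `Statement`/`Group` elements in the test are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo: let the XPath engine do the tree walking
final class XPathDemo {

    static double total(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // "//Amount" matches at any depth; wrapper elements don't break it
        return (Double) xpath.evaluate("sum(//Amount)", doc, XPathConstants.NUMBER);
    }
}
```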

32
Item 36 Serialize XML with XML
33
Item 37 Validate inside your program with schemas
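Here is a sketch of in-program validation with the standard `javax.xml.validation` API. The one-element W3C XML Schema and the `SchemaCheck` class are hypothetical. The JDK ships only a W3C XML Schema `SchemaFactory`, so RELAX NG would need a third-party implementation.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical validator: <Amount> must contain a decimal number
final class SchemaCheck {

    private static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='Amount' type='xs:decimal'/>"
        + "</xs:schema>";

    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, validation errors are thrown
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```

In longer-lived code, the `Schema` would normally be compiled once and reused. `Schema` objects are thread-safe, whereas `Validator` objects are not.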
34
Part IV Implementation
35
Item 38 Write documents in Unicode
  • Prefer UTF-8
  • Smaller in English
  • ASCII compatible
  • Normalization
  • É, ü, ì and so forth
  • NFC
  • ICU
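The slide points to ICU. For a quick sketch, the JDK's own `java.text.Normalizer` also performs NFC, and `NfcDemo` is a hypothetical name. Two strings that both render as é compare unequal until they are normalized.

```java
import java.text.Normalizer;

// Hypothetical demo: compose 'e' + COMBINING ACUTE ACCENT into one code point
final class NfcDemo {

    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301"; // two code points
        String composed = "\u00E9";    // one code point: LATIN SMALL LETTER E WITH ACUTE
        System.out.println(decomposed.equals(composed));        // false
        System.out.println(toNfc(decomposed).equals(composed)); // true
    }
}
```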

36
Item 40 Avoid Vendor Lock-in. Beware of:
  • Opaque, binary data used in place of marked-up
    text
  • Over-abbreviated, non-obvious names like F17354 and
    grgyt
  • APIs that hide the XML
  • Products that focus on the "Infoset"
  • Alternate serializations of XML
  • Patented formats

37
Item 41 Hang on to your relational database
38
Item 42 Document Namespaces with RDDL
<!DOCTYPE html PUBLIC "-//XML-DEV//DTD XHTML RDDL 1.0//EN"
  "http://www.rddl.org/rddl-xhtml.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns:rddl="http://www.rddl.org/">
  <head>
    <title>MegaBank Statement Markup Language (MBSML)</title>
  </head>
  <body>
    <p>
      This is the XML namespace for the
      <a href="http://developer.megabank.com/xml/">MegaBank Statement
      Markup Language</a>.
    </p>
    <rddl:resource xlink:type="simple"
        xlink:href="http://developer.megabank.com/xml/spec.html"
        xlink:role="http://www.w3.org/TR/html4/"
        xlink:arcrole="http://www.rddl.org/purposes#normative-reference">
      <p>
        The <a href="http://developer.megabank.com/xml/spec.html">MegaBank
        Statement Markup Language Specification 1.0</a>
      </p>
    </rddl:resource>
  </body>
</html>
39
Item 43 Preprocess XSLT on the server side
40
Item 44 Serve XML+CSS to the client
  • Supported by
  • Safari
  • IE 5.0 and later
  • Mozilla
  • Netscape 6 and later
  • Konqueror
  • Opera
  • Firefox
  • Omniweb

41
Item 45 Pick the correct MIME type
  • application/xml
  • Not text/xml!
  • Don't use charset
  • application/mathml+xml
  • image/svg+xml
  • application/xslt+xml

42
Item 46 TagSoup Your HTML
43
Item 47 Catalog common resources
    <?xml version="1.0"?>
    <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
      <public publicId="-//OASIS//DTD DocBook XML V4.2//EN"
              uri="file:///opt/xml/docbook/docbookx.dtd"/>
    </catalog>

44
Item 50 Compress if space is a problem
// Needs java.io.*, java.util.zip.*, javax.xml.parsers.*,
// org.w3c.dom.Document, and org.apache.xml.serialize.* (Xerces)
// output: document is an existing org.w3c.dom.Document
OutputStream fout = new FileOutputStream("data.xml.gz");
OutputStream out = new GZIPOutputStream(fout);
OutputFormat format = new OutputFormat(document);
XMLSerializer output = new XMLSerializer(out, format);
output.serialize(document);
out.close(); // writes the gzip trailer
// input
InputStream fin = new FileInputStream("data.xml.gz");
InputStream in = new GZIPInputStream(fin);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = factory.newDocumentBuilder();
Document doc = parser.parse(in);
in.close();
// work with the document...
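The slide's code relies on the Xerces `org.apache.xml.serialize` classes, which are not part of the JDK. The sketch below is a JDK-only version of the same round trip that swaps in the standard identity `Transformer`. `GzipXml` is a hypothetical name. The target can be any `OutputStream`: a `FileOutputStream("data.xml.gz")` as on the slide, or an in-memory buffer.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical JDK-only gzip round trip for a DOM document
final class GzipXml {

    static void write(Document doc, OutputStream target) throws Exception {
        try (OutputStream out = new GZIPOutputStream(target)) {
            TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        } // closing the stream finishes the gzip trailer
    }

    static Document read(InputStream source) throws Exception {
        try (InputStream in = new GZIPInputStream(source)) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        }
    }
}
```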
45
To Learn More
  • This presentation: http://cafeconleche.org/slides/albany/effectivexml
  • Effective XML: 50 Specific Ways to Improve Your
    XML Documents
  • Elliotte Rusty Harold
  • Addison-Wesley, 2003
  • ISBN 0-321-15040-6
  • $44.99
  • http://cafeconleche.org/books/effectivexml