XML 101: A Technical Introduction to XML

1
XML 101: A Technical Introduction to XML
  • 20 November 2002
  • Bank of Montreal Database Users Group
  • Ian GRAHAM
  • IT Strategy, IBS, Technology and Solutions, BMO
    Financial Group
  • E: <ian.graham_at_bmo.com>
  • T: (416) 513.5656 / F: (416) 513.5590
  • To download this talk: http://www.utoronto.ca/ian/talks/

2
Presentation Outline
  • What is XML (basic introduction)
  • Defining language dialects and constraints
      • DTDs, namespaces, and schemas
  • XML processing
      • Parsers and parser interfaces; XML processing tools
  • XML databases
      • High-level issues, and references
  • XML messaging / web services
      • Why, and some issues/example
  • Conclusions

3
What is XML?
  • A base-level syntax
      • for encoding structured, text-based information
        (words, characters, ...)
  • A text-based syntax
      • XML is written using printable Unicode
        characters. Explicit binary data is not allowed
  • Supports extensible data formats
      • XML lets you define your own elements
        (essentially data types), within the constraints
        of the syntax rules
  • Designed as a universal format
      • The syntax rules ensure that all XML processing
        software MUST handle a given piece of XML data
        identically.
      • If you can read and process it, so can
        anybody else

4
XML: A Simple Example
XML declaration (this is XML); its encoding attribute flags the
character encoding used in the file

<?xml version="1.0" encoding="iso-8859-1"?>
<partorders xmlns="http://myco.org/Spec/partorders">
  <order ref="x23-2112-2342" date="25aug1999-12:34:23h">
    <desc> Gold sprocket grommets, with matching hamster </desc>
    <part number="23-23221-a12" />
    <quantity units="gross"> 12 </quantity>
    <deliveryDate date="27aug1999-12:00h" />
  </order>
  <order ref="x23-2112-2342" date="25aug1999-12:34:23h">
    . . . Order something else . . .
  </order>
</partorders>

In the original slide, XML tags and markup are shown in black and
encoded text data in blue
5
Example Revisited
<partorders xmlns="http://myco.org/Spec/partorders">
  <order ref="x23-2112-2342" date="25aug1999-12:34:23h">
    <desc> Gold sprocket grommets, with matching hamster </desc>
    <part number="23-23221-a12" />
    <quantity units="gross"> 12 </quantity>
    <deliveryDate date="27aug1999-12:00h" />
  </order>
  <order ref="x23-2112-2342" date="25aug1999-12:34:23h">
    . . . Order something else . . .
  </order>
</partorders>
6
XML Data Model - A Tree
<partorders xmlns="...">
  <order date="..." ref="...">
    <desc> ..text.. </desc>
    <part />
    <quantity />
    <delivery-date />
  </order>
  <order ref=".." .../>
</partorders>
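
The tree view can be made concrete with a short sketch. It uses Python's standard xml.etree.ElementTree module, which is a choice made here for illustration and not something the talk prescribes. The document is a cut-down version of the part-order example with the namespace left out to keep tag names short, and outline() is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A cut-down version of the part-order example; the values are illustrative.
PARTORDERS = """\
<partorders>
  <order ref="x23-2112-2342" date="25aug1999">
    <desc>Gold sprocket grommets</desc>
    <part number="23-23221-a12" />
    <quantity units="gross">12</quantity>
  </order>
</partorders>"""


def outline(element, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(PARTORDERS)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each element becomes a node whose children are the elements nested inside it; attributes and text hang off the node they belong to.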
7
XML Design goals
  • Simple but reliable
      • Strict syntax rules, to eliminate syntax errors
      • Syntax defines structure (hierarchically), and
        names structural parts (element names) -- it is
        self-describing data
  • Extensible and mixable
      • Can create your own language of tags/elements
      • Can mix one language with another, and still
        reliably separate / process the data
  • Designed for a distributed environment
      • Can have remote (webbed) data, and retrieve and
        use it reliably

8
XML Processing: The XML Parser
  • The parser must verify that the XML is
    syntactically correct
  • Such data is said to be well-formed
      • The minimal requirement to be XML
  • A parser MUST stop processing if the data isn't
    well-formed
      • E.g., stop processing and throw an exception to
        the XML-based application. The XML 1.0 spec
        requires this behaviour

[Diagram: XML data -> XML parser -> (parser interface) -> XML-based application]
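
As a minimal sketch of this behaviour, the snippet below feeds one well-formed and one malformed fragment to Python's standard ElementTree parser (chosen here only for illustration). is_well_formed() is a hypothetical helper, and the sample fragments are made up.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return (True, None) if text parses, else (False, error message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # The parser stops at the first well-formedness error.
        return False, str(err)
    return True, None


good = "<order><desc>grommets</desc></order>"
bad = "<order><desc>grommets</order>"   # <desc> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

The application only ever sees data from the first document; for the second it receives an exception and nothing else.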
9
Special Issues: Characters and Charsets
  • XML specification defines characters allowed as
    whitespace in tags: <element id = "23.112" />
      • You cannot use EBCDIC character NEL as
        whitespace
      • Must make sure to not do so!
  • What if you want to include characters not
    defined in the encoding charset (e.g., Greek
    characters in an ISO-Latin-1 document)?
      • Use character references. For example:
        &#9824; -- the spades character (♠), the
        9824th character in the Unicode character set
  • Also, a reminder that binary data is forbidden
      • must be encoded as printable characters (e.g.
        using Base64)
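
Both points can be sketched in a few lines of Python (standard library only; the choice of language is an assumption made for this example). The element names card and blob are made up for illustration.

```python
import base64
import xml.etree.ElementTree as ET

# &#9824; is a numeric character reference to Unicode code point 9824.
card = ET.fromstring("<card suit='&#9824;'>&#9824; ace</card>")
print(ord(card.get("suit")))        # code point of the resolved character

# Binary data has to be turned into printable characters first, e.g. Base64.
raw = bytes([0, 255, 16, 128])
blob = ET.Element("blob")           # "blob" is a made-up element name
blob.text = base64.b64encode(raw).decode("ascii")
xml_text = ET.tostring(blob, encoding="unicode")
print(xml_text)

# The receiver reverses the encoding after parsing.
decoded = base64.b64decode(ET.fromstring(xml_text).text)
```

The parser resolves the character reference before the application sees the data, whereas the Base64 step is the application's own responsibility on both ends.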

10
Parsers and DTDs
  • A DTD can define external parts (entities) to be
    included in the document
  • But ... what if the parser can't find the external
    parts (firewall?)?
  • That depends on the type: there are two types of
    XML parsers
      • one that MUST retrieve all parts
      • one that can ignore them (if it can't find them)

[Diagram: XML data and its DTD -> parser -> (parser interface) -> XML-based application]
11
Two types of XML parsers
  • Validating
      • Must retrieve all entities and process all of
        the DTD. Will stop processing and indicate a
        failure if it cannot
      • It must also test and verify other things in the
        DTD -- instructions that define syntactic
        document rules (allowed elements, attributes,
        etc.).
  • Non-validating (well-formed only)
      • Tries to retrieve all parts, but will cease
        processing the DTD content at the first part
        (entity) it can't find
      • But this is not an error -- the parser simply
        makes available the XML data (and the names of
        any unresolved parts) to the application.

Application behavior will depend on parser type.
Many parsers can operate in either mode (config).
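
To illustrate the non-validating case, here is a small sketch using Python's standard ElementTree parser, which is built on the non-validating expat parser. The note/to/from document is invented for this example: its internal DTD allows only <to> inside <note>, yet the body uses <from>.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset says <note> may contain only <to>,
# but the document body uses <from> instead.
DOC = """\
<!DOCTYPE note [
  <!ELEMENT note (to)>
  <!ELEMENT to (#PCDATA)>
]>
<note><from>not allowed by the DTD</from></note>"""

# A non-validating parser checks well-formedness only, so this succeeds.
root = ET.fromstring(DOC)
child_tags = [child.tag for child in root]
print(child_tags)
```

A validating parser would be expected to reject the same input, which is why application behaviour depends on the parser type.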
13
Defining constraints / languages
  • Two ways of doing so
      • XML Document Type Definition (DTD) -- Part of
        core XML spec.
      • XML Schema (often called XSD) -- New
        specification (2001), which allows for richer
        constraints on XML documents.
  • What DTDs and/or schemas specify
      • Allowed element and attribute names; hierarchical
        nesting rules; element content/type restrictions
  • Adding dialect specifications implies two classes
    of XML data
      • Well-formed: XML that is syntactically correct
      • Valid: XML that is well-formed and consistent
        with a specific DTD (or Schema)
  • Schemas are more powerful than DTDs
      • Often used for type validation, or for defining
        low-level type constraints (integer, varchar,
        datetime, etc.) and constraints on values.

14
DTD Example
<!DOCTYPE transfers [
  <!ELEMENT transfers (fundsTransfer) >
  <!ELEMENT fundsTransfer (from, to) >
  <!ATTLIST fundsTransfer date CDATA #REQUIRED>
  <!ELEMENT from (amount, transitID?, accountID,
                  acknowledgeReceipt ) >
  <!ATTLIST from type (intrabank|internal|other) #REQUIRED>
  <!ELEMENT amount (#PCDATA) >
  . . . Omitted DTD content . . .
  <!ELEMENT to EMPTY >
  <!ATTLIST to account CDATA #REQUIRED>
]>
<transfers>
  <fundsTransfer date="20010923T123434Z">
    . . . As with previous example . . .
15
XML Namespaces
  • Mechanism for identifying different spaces for
    XML names
      • That is, element or attribute names
  • This is a way of identifying different language
    dialects, consisting of names that have specific
    semantic (and processing) meanings.
      • For example, <key/> in one language (e.g. a
        security key) can be distinguished from <key/> in
        another language (a database key)
  • Mechanism uses a special xmlns attribute to
    define namespaces.
      • The namespace is a URL string
      • But the URL does not reference anything in
        particular (there may be nothing there!)
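
The <key/> example can be sketched with Python's standard ElementTree module (an illustrative choice, not part of the talk). Both namespace URIs and the record element are made up; as noted above, nothing needs to exist at those addresses.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs are made up for this sketch.
DOC = """\
<record xmlns:sec="http://example.org/security"
        xmlns:db="http://example.org/database">
  <sec:key>s3cr3t</sec:key>
  <db:key>42</db:key>
</record>"""

root = ET.fromstring(DOC)

# ElementTree exposes each name as "{namespace-uri}local-name".
tags = [child.tag for child in root]
print(tags)

# Look up each <key> by namespace URI; the prefixes used here are local aliases.
ns = {"s": "http://example.org/security", "d": "http://example.org/database"}
security_key = root.find("s:key", ns).text
database_key = root.find("d:key", ns).text
print(security_key, database_key)
```

The two elements share the local name key but stay distinct because the parser qualifies each with its namespace string.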

16
Mixing languages together
Namespaces let you do this relatively easily
<?xml version="1.0" encoding="utf-8" ?>
<html xmlns="http://www.w3.org/1999/xhtml1"
      xmlns:mt="http://www.w3.org/1998/mathml">
  <head>
    <title> Title of XHTML Document </title>
  </head><body>
    <div class="myDiv">
      <h1> Heading of Page </h1>
      <mt:mathml>
        <mt:title> ... MathML markup . . .
      </mt:mathml>
      <p> more html stuff goes here </p>
    </div>
  </body>
</html>

Default space is xhtml.
The mt: prefix indicates the mathml space (a different
language).
17
XML Schemas
  • A specification for defining XML validation rules
      • Specs: http://www.w3.org/XML/Schema
      • Best practice:
        http://www.xfront.com/BestPracticesHomepage.html
  • Uses pure XML (plus namespaces) to do this
  • More powerful than DTDs - can specify things like
    integer types, date strings, real numbers in a
    given range, etc.
  • Often used for type validation, or for relating
    database schemas to XML models
  • They don't, however, let you declare entities --
    those can only be declared in DTDs
  • The following slide shows the XML schema
    equivalent to our DTD

18
XML Schema version of our DTD (Portion)
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:element name="accountID" type="xs:string"/>
  <xs:element name="acknowledgeReceipt" type="xs:string"/>
  <xs:complexType name="amountType">
    <xs:simpleContent>
      <xs:restriction base="xs:string">
        <xs:attribute name="currency" use="required">
          <xs:simpleType>
            <xs:restriction base="xs:NMTOKEN">
              <xs:enumeration value="USD"/>
              . . . (some stuff omitted) . . .
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
      </xs:restriction>
    </xs:simpleContent>
  </xs:complexType>
  <xs:complexType name="fromType">
    <xs:sequence>
      <xs:element name="amount" type="amountType"/>
      <xs:element ref="transitID" minOccurs="0"/>
      <xs:element ref="accountID"/>
      <xs:element ref="acknowledgeReceipt"/>
    </xs:sequence>
    . . . And still more !!! . . .
20
XML Software
  • XML parsers
      • Read in XML data, check for syntactic (and
        possibly DTD/Schema) constraints, and make the
        data available to an application. There are three
        'generic' parser API styles (event-based,
        tree-based, and pull-based):
      • SAX: Simple API for XML (event-based)
      • DOM: Document Object Model (object/tree based)
      • JDOM: Java Document Object Model (object/tree
        based)
      • Pull: evolving API (new) (pull-based / object
        tree)
  • Lots of XML parsers and interface software
    available
      • Unix, Linux, Windows 2000/XP, z/OS, etc.
  • SAX-based parsers are fast (often as fast as you
    can stream data)
  • DOM is slower and more memory intensive (creates an
    in-memory version of the entire document)
  • Validating can be much slower than non-validating

21
Parser API: SAX
  • A) SAX: Simple API for XML
      • http://www.megginson.com/SAX/index.html
  • An event-based interface (a push parser API)
      • Parser reports events whenever it sees a
        tag/attribute/text node/unresolved external
        entity/other (driven by input stream)
      • Programmer attaches event handlers to handle
        the event
  • Advantages
      • Simple to use
      • Very fast (not doing very much before you get the
        tags and data)
      • Low memory footprint (doesn't read an XML
        document entirely into memory)
  • Disadvantages
      • Not doing very much for you -- you have to do
        everything yourself
      • Not useful if you have to dynamically modify the
        document once it's in memory (since you'll have
        to do all the work to put it in memory yourself!)
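
A minimal sketch of the event-handler style, using the xml.sax module from Python's standard library (the language is an assumption made for this example; SAX itself originated in Java). OrderHandler and the sample document are invented for illustration.

```python
import xml.sax


class OrderHandler(xml.sax.ContentHandler):
    """Collects part numbers and counts elements as events arrive."""

    def __init__(self):
        super().__init__()
        self.element_count = 0
        self.part_numbers = []

    def startElement(self, name, attrs):
        # Called by the parser for every start tag it sees.
        self.element_count += 1
        if name == "part":
            self.part_numbers.append(attrs.get("number"))


DOC = b"""<partorders>
  <order ref="a1"><part number="23-23221-a12"/></order>
  <order ref="a2"><part number="99-00001-b07"/></order>
</partorders>"""

handler = OrderHandler()
xml.sax.parseString(DOC, handler)
print(handler.element_count, handler.part_numbers)
```

Nothing is kept in memory except what the handler chooses to store, which is where both the low footprint and the extra programming effort come from.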

22
Parser API: DOM
  • B) DOM: Document Object Model
      • http://www.w3.org/DOM/
  • An object-based interface
      • Parser generates an in-memory tree corresponding
        to the document
      • DOM interface defines methods for accessing and
        modifying the tree
  • Advantages
      • Very useful for dynamic modification of, and
        access to, the tree
      • Useful for querying (i.e. looking for data) that
        depends on the tree structure, e.g.
        element.childNode("2").getAttributeValue("boobie")
      • Same interface for many programming languages
        (C, Java, ...)
  • Disadvantages
      • Can be slow (needs to produce the tree), and may
        need lots of memory
      • DOM programming interface is a bit awkward, not
        terribly object oriented
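
For comparison with the event-based style, here is a small sketch using xml.dom.minidom, the lightweight DOM implementation in Python's standard library (an illustrative choice). The document and the added note element are made up.

```python
from xml.dom import minidom

DOC = ('<order ref="a1"><part number="23-23221-a12"/>'
       '<quantity units="gross">12</quantity></order>')

dom = minidom.parseString(DOC)
order = dom.documentElement

# Navigate by position in the tree, then read an attribute.
first_child = order.childNodes[0]
part_number = first_child.getAttribute("number")

# Modify the in-memory tree: change an attribute and append a new element.
order.setAttribute("ref", "a1-revised")
note = dom.createElement("note")            # "note" is a made-up element
note.appendChild(dom.createTextNode("rush"))
order.appendChild(note)

updated = order.toxml()
print(part_number)
print(updated)
```

The whole document is held as objects, so reads and edits are simple method calls, at the cost of building the full tree first.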

23
DOM Parser Processing Model
24
Parser API: JDOM
  • B2) JDOM: Java Document Object Model
      • http://www.jdom.org
  • A Java-specific object-oriented interface
      • Parser generates an in-memory tree corresponding
        to the document
      • JDOM interface has methods for accessing and
        modifying the tree
  • Advantages
      • Very useful for dynamic modification of the tree
      • Useful for querying (i.e. looking for data) that
        depends on the tree structure
      • Much nicer object-oriented programming interface
        than DOM
  • Disadvantages
      • Can be slow (make that tree...), and can take up
        lots of memory
      • New, and not entirely cooked (but close)
      • Only works with Java

25
Parser API: Pull
  • C) Pull Interfaces
      • http://www.xmlpull.org/ (Java); there is also a
        .NET pull API
  • A pull-parser interface
      • API uses expressions / methods to pull specific
        chunks of XML data, or to iterate over the XML
      • Can be built on top of a DOM model
  • Advantages
      • Easier to write applications that need to read in
        and process XML data (easier model than a push
        API, in many cases)
      • Has proven a very popular component in the .NET
        toolkit
  • Disadvantages
      • Can be slow if you do lots of iteration over the
        XML input data
      • No common API across different languages
        (although xmlpull.org tries to be similar to the
        .NET API)
      • Not yet a real standard (still being worked on;
        not part of most commercial environments)
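
The pull style can be sketched as follows. This is not the xmlpull.org or .NET API; it uses ElementTree.iterparse from Python's standard library as a stand-in that follows the same idea, with the application driving the loop. The sample document is invented.

```python
import io
import xml.etree.ElementTree as ET

DOC = b"""<partorders>
  <order ref="a1"><quantity units="gross">12</quantity></order>
  <order ref="a2"><quantity units="gross">3</quantity></order>
</partorders>"""

total = 0
refs = []
# The application drives the loop and pulls one event at a time.
for event, elem in ET.iterparse(io.BytesIO(DOC), events=("end",)):
    if elem.tag == "quantity":
        total += int(elem.text)
    elif elem.tag == "order":
        refs.append(elem.get("ref"))
        elem.clear()   # release the finished subtree to keep memory low
print(refs, total)
```

Compared with a push API, the control flow stays in ordinary application code, with no handler classes or callbacks.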

26
XML Processing: XSLT
  • D) XSLT: eXtensible Stylesheet Language --
    Transformations
      • http://www.w3.org/TR/xslt
  • An XML language for processing/transforming XML
      • Does tree transformations -- takes XML and an
        XSLT style sheet as input, and produces a new XML
        document with a different structure
  • Advantages
      • Very useful for tree transformations -- much
        easier than DOM or SAX for this purpose
      • Can be used to query a document (XSLT pulls out
        the part you want)
  • Disadvantages
      • Can be slow for large documents or stylesheets
      • Can be difficult to debug stylesheets (poor error
        detection; much better if you use schemas)

27
XSLT processing model
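  • D) Processing model

[Diagram: "XML data in" and "XSLT style sheet in" each pass through an
XML parser (each with an associated schema), producing document objects
for the data and the style sheet; the XSLT processor takes these and
produces "data out (XML)".]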
28
XML Processing Toolkits
  • Lots of them
      • Java: JAXP (http://java.sun.com/xml/jaxp/faq.html),
        dom4j (http://www.dom4j.org)
      • .NET (part of .NET framework)
      • others
  • Provide DOM, SAX, (JDOM) interfaces, plus lots of
    other useful tools in a standardized way (loading
    parsers, performing XSLT transformations, etc.)
  • JAXP is standard Java, and thus integrated with
    WebSphere

30
XML and databases
  • So where do you stick XML data?
      • Inside a database!?!
      • But how to do this, and which database type to
        use?
      • RDBMS, ORDBMS, ODB, XML??
  • How you do so depends on the use cases you have
    for the data. Some good-to-ask questions are:
      • Am I talking about storing documents, or data?
      • Is the XML format integral to the application
        (e.g. XHTML, DocBook)?
      • How will the database be queried?
          • Queried by XML structure, or by standard SQL?
          • What parts of the document need to be queried?
          • Do I need a text index?
      • How will the data be used/retrieved?
          • Passed to XML processing tools (e.g. XSLT), or
            used at atomic simple type level?
  • The answers drive:
      • What database to choose, how to map XML to tables
        (O-R or table mappings), store as BLOB or broken
        up ...

31
XML and databases
  • Upcoming technologies
      • XML Query: a query language for querying XML
        datasets (and databases)
      • Uses XML schema for type casting, and validation
      • Info: http://www.w3.org/XML/Query
  • Useful XML Database references
      • http://www.xml.com/pub/a/2001/10/31/nativexmldb.html
        -- Introductory article
      • http://www.rpbourret.com/xml/XMLAndDatabases.htm
        -- XML and databases
      • http://www.rpbourret.com/xml/XMLDatabaseProds.htm
        -- Products list
      • http://www.xmldb.org/resources.html
        -- Docs / resource list

33
XML Messaging
  • Use XML as the format for sending messages
    between systems
  • Advantages
      • Common syntax; self-describing (easier to parse)
      • Can use common/existing transport mechanisms to
        move the XML data (HTTP, HTTPS, SMTP (email),
        MQ, IIOP/(CORBA), JMS, ...)
  • Requirements
      • Shared understanding of dialects for transport
        (requires a registry [namespaces!] for identifying
        dialects)
      • Shared acceptance of messaging contract
  • Disadvantages
      • Asynchronous transport: no guarantee of delivery,
        no guarantee that partner (external) shares
        acceptance of contract.
      • Messages will be much larger than binary (10x or
        more), although they can be compressed

34
Common messaging model
  • XML over HTTP
      • Use HTTP to transport XML messages

POST /path/to/interface.pl HTTP/1.1
Referer: http://www.foo.org/myClient.html
User-agent: db-server-olk
Accept-encoding: gzip
Accept-charset: iso-8859-1, utf-8, ucs
Content-type: application/xml; charset=utf-8
Content-length: 13221
. . .
<?xml version="1.0" encoding="utf-8" ?>
<message>
  . . . Markup in message . . .
</message>
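
A request like this can be assembled with Python's standard urllib.request module; the sketch below builds the request and inspects it without sending anything. The host and path are the placeholders from the slide, not a real service, and the one-line message body is made up.

```python
import urllib.request

# Host and path are the placeholders from the slide, not a real service.
URL = "http://www.foo.org/path/to/interface.pl"
BODY = ('<?xml version="1.0" encoding="utf-8" ?>'
        "<message>hello</message>").encode("utf-8")

request = urllib.request.Request(
    URL,
    data=BODY,
    method="POST",
    headers={
        "Content-Type": "application/xml; charset=utf-8",
        "Accept-Charset": "iso-8859-1, utf-8",
    },
)

# Inspect the request instead of sending it.
print(request.get_method(), request.full_url)
print(request.get_header("Content-type"))
# To actually send it: urllib.request.urlopen(request)
```

The XML travels as the HTTP message body, and the Content-type header tells the receiver both the media type and the character encoding.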

35
Some standards for message format
  • Define dialects designed to wrap remote
    invocation messages
  • XML-RPC: http://www.xmlrpc.com
      • Very simple way of encoding function/method call
        name, and passed parameters, in an XML message.
  • SOAP (Simple Object Access Protocol):
    http://www.soapware.org
      • More complex wrapper, which lets you specify
        schemas for interfaces, more complex rules for
        handling/proxying messages, etc. This is a core
        component of Microsoft's .NET strategy, and is
        integrated into more recent versions of WebSphere
        and other commercial packages. W3C activity (the
        W3C sets the SOAP spec) is outlined at
        http://www.w3.org/2000/xp/Group/
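
To show what the simpler of the two wrappers looks like on the wire, the sketch below uses the xmlrpc.client module from Python's standard library to encode a call as an XML-RPC message and decode it again. The method name inventory.reserve and its arguments are hypothetical.

```python
import xmlrpc.client

# "inventory.reserve" and its arguments are hypothetical.
payload = xmlrpc.client.dumps(("23-23221-a12", 12),
                              methodname="inventory.reserve")
print(payload)

# Round trip: the receiving side recovers the parameters and method name.
params, method = xmlrpc.client.loads(payload)
print(method, params)
```

The printed payload is an ordinary XML document containing a methodCall element, which is all that XML-RPC adds on top of the transport.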

36
XML Messaging Processing
  • XML as a universal format for data exchange

[Diagram: a Factory application, through its SOAP API / SOAP interface,
places an order (XML/EDI) with several Suppliers using SOAP over HTTP;
each Supplier sends its response (XML/EDI) back using SOAP over HTTP.
Transport: HTTP(S), SMTP, other ...]
37
Web Services Model
  • SOAP plus higher-level modeling for how services
    are advertised, exposed and found
  • Uses an XML dialect, WSDL (Web Services
    Description Language), to define a service
      • WSDL can use XML Schema to define how data is
        passed between a service provider and requestor
  • Uses an XML dialect, UDDI (Universal Description,
    Discovery and Integration), for
      • Describing services (high-level)
      • Discovering services (registry services,
        metadata)
      • UDDI defined using XML Schema
  • Core technology for application integration
      • Microsoft .NET
      • IBM WebSphere
      • Oracle
      • ... Many others

38
Web Services Code Development

[Diagram: an automated code generator takes the WSDL and XML schema and
produces a proxy (client side) and a skeleton (server side), which
exchange SOAP requests/responses over WS/SOAP. "Write the Application!"
covers the client code, the middle tier code (validation, business
logic, routing, logging, more), and the adapters to the product system
code (MECH).]
40
XML (and related) Specifications
[Chart: specifications grouped into XML Core, APIs, Style, Protocols,
Web Services, and Application areas, each marked as W3C rec, W3C draft,
industry std, or open std. Specifications shown: XML 1.0, XML names,
XML schema, XML query, XSLT, XHTML 1.0, DOM 1, DOM 2, SAX 1, SAX 2,
JDOM, JAXP, SOAP, XML-RPC, WSDL, UDDI.]
41
XML 101: A Technical Introduction to XML
  • The End.
  • Ian GRAHAM
  • IT Strategy, IBS, Technology and Solutions, BMO
    Financial Group
  • E: <ian.graham_at_bmo.com>
  • T: (416) 513.5656 / F: (416) 513.5590