XML FUNDAMENTALS

Slides: 77

Provided by: ser1170

1
XML FUNDAMENTALS
2
GETTING TO KNOW XML
3
What is XML?
4
What is XML?

In the same way that you define the field names
for a data structure, you are free to use any XML
tags that make sense for a given application.
Naturally, though, for multiple applications to
use the same XML data, they have to agree on the
tag names they intend to use.

XML is case-sensitive.
5
XML History

XML: eXtensible Markup Language
What is a markup language?
Markup is the set of codes, embedded in the
document text, that store the information
required for electronic processing, such as font
name, boldness or, in the case of XML, the
document structure.
A markup language is a methodology for encoding
data with some information.
Typically, it defines a set of tags, each of which
has some associated meaning.

6
Who developed XML?

XML is an activity of the World Wide Web
Consortium (W3C), http://www.w3c.org. The XML
development effort started in 1996.
A diverse group of markup language experts, from
industry to academia, developed a simplified
version of SGML (Standard Generalized Markup
Language) for the Web. In February 1998, the XML 1.0
specification became a W3C Recommendation.
XML 1.1 became a W3C Recommendation in February 2004.

7
Standard Generalized Markup Language
SGML extends generic coding. Furthermore, it is
an international standard published by the ISO
(International Organization for Standardization).
It is based on the early work done by Dr. Charles
Goldfarb of IBM.

SGML is similar to generic coding but with two
additional characteristics:
The markup describes the document's structure,
not the document's appearance.
The markup conforms to a model, which is similar
to a database schema. This means that it can be
processed by software or stored in a database.

8
Data Problems

Fundamental issues: How do I represent my
application data?
Performance (speed/time)
Persistence (short/long lived)
Mutability
Composition
Security (encryption/identity)
Open Information Management
Interpretation
Presentation
Interoperation
Portability
Interrogation

9
Why XML?

Unfortunately, there are things that HTML just
can't do for you. Fortunately, XML is growing
quickly to meet these needs.
Unfortunately, no matter how many new tags are
added, there will never be enough for all the
good ideas people keep having. Fortunately, XML
is a form of SGML, an ISO standard that allows
you to invent the tags you need, and declare them
so others can use them.
Unfortunately, the SGML standard is large, takes
time to learn, and doesn't have a starter kit.
Fortunately, XML is here.

10
Why XML?

Plain Text
any editor, readable, suitable for configuration
information
Data Identification
Self-describing markup style
Internationalized
Unicode-based (UTF-8 / UTF-16), XML as universal
data representation.
Inline Reusability
can integrate data from multiple sources into a
single document, modularization without using
linking.
Linkability
More powerful than HTML's: W3C XLink and XPointer
specifications
Easily Processed
well-formedness rules, validity checking, available
tools: parsers, transformers, browsers.
Hierarchical
Faster to access and rearrange each element.

11
XML as a Self-Describing Data Exchange Format

can be easily understood by our friend
can be parsed easily
contains its own structure (parse tree) in the
data
=> allows the application programmer to
rediscover schema and content/semantics (to
which extent???)
may include an explicit schema description
(e.g., DTD)
=> meta-language: definition of a language w.r.t.
which it is valid
allows separation of marked-up content from
presentation (=> style sheets)
many tools (and many more to come -- (re)use
code): parsers, validators, query languages,
storage, ...
standards (good for interoperation, integration,
etc.)
=> generic standards (XML, DTDs, XML Schema,
XPath, ...)
=> community/industry standards (specific markup
languages)

12
Key Features of XML

Extensibility
You define your own markup languages (tags) for
your own problem domain
Media and Presentation independence
The same data can be presented on different
communication media (browser, voice device) and
in different formats (WML, HTML): portability
Separation of contents from presentation
Clear separation between contents (data) and
presentation (data appearance)
Structure: Relationship and Hierarchical
Structure
Faster to access, easier to rearrange
Validation
XML data is constrained by a rule (or syntax)

13
What are XML applications?

XML is poised to play a prominent role as a data
interchange format in B2B Web applications such
as e-commerce, supply-chain management, workflow,
and application integration. Another use of XML
is for structured information management,
including information from databases. XML also
supports media-independent publishing, allowing
documents to be written once and published in
multiple media formats and devices. On the
client, XML can be used to create customized
views into data.

14
XML and Java Technology Relationship

XML and the Java technology are complementary.
Java technology provides the portable,
maintainable code to process portable, reusable
XML data. In addition, XML and Java technology
have a number of shared features that make them
the ideal pair for Web computing, including being
industry standards, platform-independent,
extensible, reusable, Web-centric, and
internationalized.
It's a match made in heaven
Java enables Portable Code
XML enables Portable Data
XML tools and programs are mostly written in the
Java programming language
Better API support on the Java platform than in any
other language

15
Benefits of Using Java Technology with XML

Java technology offers a substantial productivity
boost for software developers compared to
programming languages such as C or C++. In
addition, developers using the Java platform can
create sophisticated programs that are reusable
and maintainable compared to programs written
with scripting languages. Using XML and Java
together, developers can build sophisticated,
interoperable Web applications more quickly and
at a lower cost.

16
HTML

The most popular markup language
Defines a fixed set of tags
Designed for presentation of data
HTML documents are processed by an HTML processing
application (browser)
Easy to implement and author, e.g. small number of
tags, forgiving syntax checking
No formal validation
Does not support semantic search
Not suited to complex documents

17
HTML vs XML

XML is fast becoming the standard for data
interchange on the web.
Extensible Markup Language (XML) is closely
related to HTML, the original document
representation of the WWW. While HTML enables
the creation of Web pages that can be viewed on
any browser, XML adds tags to data so that it can
be processed by any application.
Using XML, companies can separate the business
rules from the content and structure of the data.
By focusing on exchanging data content and
structure, the trading partners are free to
implement their own business rules, which can be
quite distinct from one another.
Defining custom tags is like defining field names
for a data structure. Applications that share data
can agree upon the same XML tag names.

18
HTML vs XML
19
XML Standards

XML, DTD
XSL, XSLT, XPath
DOM, SAX
W3C XML Schema
Namespaces
XLink, XPointer
XHTML
XQL

20
Domain Specific XML Standards

Chemical - CML
2D Graphics - SVG
Math - MathML
Music - MusicML
Travel - OTA
Many more ...
http://xml.org/xmlorg_registry/index.shtml
FIXML

21
Core Java APIs for XML

JAXP: Parsing and Transforming
JAXB: High-level XML programming
JAXM: Messaging
JAXR: Registry APIs
JDOM: Java-optimized Parsing

22
E-Commerce Standards

ebXML
UDDI (Universal Description, Discovery and
Integration)
SOAP (Simple Object Access Protocol)
W3C XP (XML Protocol)
WSDL (Web Services Description Language)
S2ML (Security Services ML)
XAML (Transaction Authority ML)

23
XML COMPONENTS
24
XML Document Components

Processing Instruction
Elements and Attributes
Empty Tags
Comments
Special Characters
Entity References
CDATA
Whitespaces
Namespaces
XPath, XLink, XPointer

25
The XML Prolog: XML Declaration

The part of an XML document that precedes the XML
data. The prolog includes the declaration and an
optional DTD.
An XML file always starts with a prolog. The
minimal prolog contains a declaration that
identifies the document as an XML document:
<?xml version="1.0"?>
<?xml version="1.0" encoding="ISO-8859-1"
standalone="yes"?>
version: Identifies the version of the XML markup
language used in the data. This attribute is not
optional.
encoding: Identifies the character set used to
encode the data. "ISO-8859-1" is "Latin-1", the
Western European and English language character
set. (The default is compressed Unicode: UTF-8.)
standalone: Tells whether or not this document
references an external entity or an external data
type specification.
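A minimal sketch of how a parser uses the declaration, assuming Python's standard xml.etree.ElementTree module. The <greeting> element and its text are made up for illustration; only the declaration itself comes from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the <greeting> element and its text are made up.
text = ('<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>\n'
        '<greeting>caf\u00e9</greeting>')

# Store the document in the encoding that its declaration announces.
data = text.encode("iso-8859-1")

# The parser reads the encoding from the declaration and decodes the bytes.
root = ET.fromstring(data)
print(root.tag, root.text)
```

The accented character is a single Latin-1 byte in the file, yet the parser hands back ordinary Unicode text because the declaration names the encoding.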

26
Processing Instruction

PIs give commands or information to an
application that is processing the XML data.
<?target instructions?>
Here the target is the name of the application that is
expected to do the processing, and instructions
is a string of characters that embodies the
information or commands for the application to
process.
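As a rough illustration of the target/instructions split, the sketch below reads a processing instruction with Python's standard xml.dom.minidom. The xml-stylesheet target is a widely used PI; the style.xsl file name and the <root/> element are assumptions made up for this example.

```python
from xml.dom import minidom

# Hypothetical document: style.xsl and <root/> are made-up names.
doc = minidom.parseString(
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?><root/>'
)

# A PI that precedes the root element is a child of the document node.
pi = doc.firstChild
print(pi.target)  # the application the instruction is meant for
print(pi.data)    # the instruction string, passed through uninterpreted
```

The parser does not interpret the instruction string; it only hands it to whichever application recognizes the target.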

27
Elements

XML tags usually surround an identified object in
the data stream. A start-tag and an end-tag,
together with the data enclosed by them,
represent an element. The start-tag is delimited
using the < and > characters. The end-tag is
delimited by </ and >.
It is this ability for one tag to contain others
that gives XML its ability to represent
hierarchical data structures.

28
Elements

Every XML file defines exactly one element, known
as the root element. Any other elements in the
file are contained within that element.
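The sketch below, assuming Python's standard xml.etree.ElementTree, prints the element hierarchy under a single root and shows that a second top-level element is rejected. It uses a shortened version of the bibliography example that appears in these slides; the outline helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

doc = """<bibliography>
  <paper ID="object-fusion">
    <authors>
      <author>Y. Papakonstantinou</author>
      <author>S. Abiteboul</author>
    </authors>
    <title>Object Fusion in Mediator Systems</title>
  </paper>
</bibliography>"""

def outline(elem, depth=0):
    """Return the element names as an indented list, one line per element."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(doc)
print("\n".join(outline(root)))

# Two top-level elements are not a well-formed document.
try:
    ET.fromstring("<paper/><paper/>")
    second_root_rejected = False
except ET.ParseError:
    second_root_rejected = True
print(second_root_rejected)
```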

29
Attributes

Additional information included about an element
as part of the tag itself, within the tag's angle
brackets. It consists of an attribute name and an
attribute value. The attribute name precedes its
value, which is enclosed in quotes (" or ') and
separated from the name by an equals sign.
There must be at least one space between the
element name and the first attribute. Multiple
attributes are separated by spaces.
Since you could design a data structure like
<message> equally well using either attributes or
tags, it can take a considerable amount of
thought to figure out which design is best for
your purposes.
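To make the design choice concrete, here is a small sketch with two hypothetical <message> layouts, one using attributes and one using child elements, read with Python's standard xml.etree.ElementTree. The to and subject names and their values are assumptions for illustration, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical designs for the same data.
as_attributes = '<message to="you@example.com" subject="XML is cool"/>'
as_elements = ("<message><to>you@example.com</to>"
               "<subject>XML is cool</subject></message>")

a = ET.fromstring(as_attributes)
e = ET.fromstring(as_elements)

# Attributes are read by name from the element itself ...
print(a.get("subject"))
# ... while nested tags are read as child elements.
print(e.findtext("subject"))
```

Both layouts carry the same information; child elements can later grow structure of their own, while attributes stay flat strings.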

30
Elements and their Content
<bibliography>
  <paper ID="object-fusion">
    <authors>
      <author>Y.Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <booktitle>VLDB 96</booktitle>
  </paper>
</bibliography>
Diagram callouts: element, element type, element
content, empty element, character content
31
Element Attributes
<bibliography>
  <paper pid="object-fusion">
    <authors>
      <author>Y.Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <booktitle>VLDB 96</booktitle>
  </paper>
</bibliography>
Diagram callouts: attribute name, attribute value
32
Empty Tags

Sometimes, we might want to add a "flag" tag that
marks a message as important. A tag like that
doesn't enclose any content, so it's known as an
"empty tag". We can create an empty tag by ending
it with /> instead of >.
The empty tag saves you from having to code
<flag></flag> in order to have a well-formed
document.
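A quick check, assuming Python's standard xml.etree.ElementTree, that the short empty-tag form and the start-tag/end-tag pair produce the same parsed element. The surrounding <message> element is assumed for illustration.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<message><flag/></message>")
long_form = ET.fromstring("<message><flag></flag></message>")

for doc in (short_form, long_form):
    flag = doc.find("flag")
    # No text and no children in either spelling.
    print(flag.tag, flag.text, len(flag))
```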

33
Comments in XML Files

XML comments look just like HTML comments:
<!-- A comment -->
They do not appear in published output.

34
Handling Special Characters

In XML, an entity is an XML structure (or plain
text) that has a name. Referencing the entity by
name causes it to be inserted into the document
in place of the entity reference. To create an
entity reference, the entity name is surrounded
by an ampersand and a semicolon:
&entityName;
Predefined Entities
&lt; (<), &gt; (>), &amp; (&), &quot; ("), &apos; (')

35
Using Entity Reference in an XML Document

The problem with putting a line that contains a
left-angle bracket into an XML file directly is
that when the parser sees the left-angle bracket
(<), it starts looking for a tag name, which
throws off the parse. To get around that problem,
you put &lt; in the file instead of "<".
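The following sketch shows the round trip with Python's standard library: xml.sax.saxutils.escape replaces the special characters with entity references, and the parser turns them back into the original text. The sample line and the <code> element name are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

line = "if (a < b && b > c)"  # hypothetical text with special characters

# Putting the raw line into an element breaks the parse.
try:
    ET.fromstring("<code>" + line + "</code>")
    raw_parses = True
except ET.ParseError:
    raw_parses = False

# Escaping replaces &, < and > with entity references.
escaped = escape(line)
print(escaped)

# The parser expands the references again when it reads the document.
root = ET.fromstring("<code>" + escaped + "</code>")
print(root.text)
```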

XML File
XML Output
36
Handling Text with XML-Style Syntax

When you are handling large blocks of XML or HTML
that include many of the special characters, it
would be inconvenient to replace each of them
with the appropriate entity reference. For those
situations, you can use a CDATA section.
All white space in a CDATA section is
significant, and characters in it are not
interpreted as XML.
<![CDATA[ ............. ]]>
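A small sketch, assuming Python's standard xml.etree.ElementTree, showing that markup inside a CDATA section reaches the application as plain characters. The <example> element and its content are made up for illustration.

```python
import xml.etree.ElementTree as ET

doc = "<example><![CDATA[<b>bold</b> & more]]></example>"
root = ET.fromstring(doc)

# The angle brackets and the ampersand arrive as ordinary text,
# and no <b> child element is created.
print(root.text)
print(len(root))
```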

37
Handling Text with XML-Style Syntax
XML File
XML Output
38
Document Prolog Body Epilog
<?xml version="1.0"?>
<!-- comments and processing instructions -->
<!DOCTYPE sdsc_play_groups SYSTEM "http://localserver/spg.dtd">
<!-- comments and processing instructions -->
<sdsc_play_groups>
  <play_group ID="Data-issues">
    <member_groups>
      <group>Scientific Computing</group>
      <group>Data Intensive Computing</group>
      <group>Security Technologies</group>
    </member_groups>
    <charter source="XPG"/>
    <url>http://www.sdsc.edu/marciano/XML/xpg.html</url>
    <title>XML Play Group</title>
  </play_group>
</sdsc_play_groups>
<!-- comments and processing instructions -->
39
White Space

The XML specification normalizes different
line-ending conventions to a single convention
but preserves all other white space, except in
attribute values.
White Space and the XML Declaration: According
to the current XML 1.0 standard, white space is
not allowed before the XML declaration. If white
space appears before the XML declaration, it will
be treated as a processing instruction. The
information, particularly the encoding, may not
be used by the parser.
White Space in Element Content: XML parsers are
required to report all white space that appears
in element content within a document. For this
reason, the following three documents are
different to an XML parser.

40
White Space

White Space in Attributes: Although XML
processors preserve all white space in element
content, they frequently normalize it in
attribute values. Tabs, carriage returns, and
spaces are reported as single spaces. In certain
types of attributes, they trim white space that
comes before or after the main body of the value
and reduce white space within the value to single
spaces. If a DTD is available, this trimming
will be performed on all attributes that are not
of type CDATA. If there is no DTD, the parser
assumes that all attributes are of type CDATA.
For the above example, an XML parser reports
both attribute values as "this is a note.",
converting the line breaks to single spaces.
End of Line Handling: XML processors treat the
character sequence Carriage Return-Line Feed
(CRLF) like single CR or LF characters. All are
reported as a single LF character.
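The white space rules above can be observed with a short sketch, assuming Python's standard xml.etree.ElementTree (which uses the expat parser). The <note> and <a> elements and the text attribute are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

# The attribute value contains a line break; the content contains CRLF.
doc = '<note text="this is\na note.">line one\r\nline two</note>'
root = ET.fromstring(doc)

# Attribute values: the line break is reported as a single space.
print(repr(root.get("text")))

# Element content: CRLF is reported as a single LF.
print(repr(root.text))

# Other white space in element content is preserved as written.
padded = ET.fromstring("<a>  x  </a>")
print(repr(padded.text))
```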

41
Namespaces
By using XML namespaces, authors can qualify
element names uniquely on the Web and thus avoid
conflicts between elements that have the same
name. Associating a Uniform Resource Identifier
(URI) with a namespace is purely to ensure that
two elements with the same name can remain
unambiguous; it does not matter what, if
anything, the URI points to.
42
Identifying Vocabularies: XML Namespaces

My element may not be your element
geometry context: <element>line</element>
chemistry context: <element>oxygen</element>
SGML/XML context: ....
use XML namespaces to identify the vocabulary

43
XML Namespaces

mechanism for globally unique tag names
<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
  <h:head><h:title>Book Review</h:title></h:head>
  ...
  <xdc:bookreview>
    <xdc:title>XML: A Primer</xdc:title>
    ...
</h:html>
mix of different tag vocabularies without
confusion
namespaces only identify the vocabulary
additional mechanisms required for structure and
meaning of tags
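As a sketch of how an application tells the two title elements apart, the example below parses a completed version of the book review document with Python's standard xml.etree.ElementTree. The <h:body> wrapper and the closing tags fill the parts the slide elides and are assumptions; the namespace URIs are the ones shown on the slide.

```python
import xml.etree.ElementTree as ET

# The h:body element and the closing tags are assumed; the slide elides them.
doc = """<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
  <h:head><h:title>Book Review</h:title></h:head>
  <h:body>
    <xdc:bookreview>
      <xdc:title>XML: A Primer</xdc:title>
    </xdc:bookreview>
  </h:body>
</h:html>"""

root = ET.fromstring(doc)

# Prefixes are only shorthand; the parser qualifies names with the URI.
print(root.tag)

ns = {"h": "http://www.w3.org/HTML/1998/html4",
      "xdc": "http://www.xml.com/books"}
page_title = root.find("h:head/h:title", ns).text
book_title = root.find(".//xdc:bookreview/xdc:title", ns).text
print(page_title)
print(book_title)
```

The two title elements never collide because each is identified by its namespace URI plus its local name, not by the prefix the author happened to choose.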

44
XPath, XLink, and XPointer

XPath
a declarative language for locating nodes and
fragments in XML trees
used in both XPointer (for addressing) and XSL
(for pattern matching)
XLink
a generalization of the HTML link concept
higher abstraction level (intended for general
XML - not just hypertext)
more expressive power (multiple destinations,
special behaviours, out-of-line links, ...)
uses XPointer to locate resources
XPointer
an extension of XPath suited for linking
specifies connection between XPath expressions
and URIs

45
XML Path Language (XPath)

W3C Recommendation, Nov. 1999
for addressing parts within an XML document
(non-XML) syntax used for XSLT and XPointer
Find the root element (bookstore) of this
document:
/bookstore
Find all author elements anywhere within the
current document:
//author
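The two queries can be approximated with the limited XPath subset in Python's standard xml.etree.ElementTree. This is a sketch under assumptions: the bookstore content is made up, and because ElementTree evaluates paths relative to an element, //author is written as .//author from the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical bookstore document.
doc = """<bookstore>
  <book><title>A</title><author>Ann</author></book>
  <magazine><article><author>Bob</author></article></magazine>
</bookstore>"""

# /bookstore: the parse result is the root (document) element.
root = ET.fromstring(doc)
print(root.tag)

# //author: every author element at any depth, in document order.
authors = [a.text for a in root.findall(".//author")]
print(authors)
```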

46
XML Linking Language (XLink)

W3C Candidate Recommendation, July 2000
language for typed links between documents
extends the simple untyped href links in HTML
multidirectional links
any element can be the source (not just <a ... >
</a>)
link to arbitrary positions within a document
(via URIs and XPointer)
richer custom applications possible
xlink:type declaration: simple, extended,
locator, arc
optional "semantic attributes": role, arcrole,
title
Example

<author xmlns:xlink="... "
        xlink:href="....itmaven.com/peter.html"
        xlink:title="Peter's homepage"
        xlink:role="further info about the book author">
  Peter Pan Sr.
</author>
47
XML Pointer Language (XPointer)

W3C Candidate Recommendation, June 2000
for locating internal structures of XML documents
XLink URIs can include XPointer parts
extends HTML's named anchors
target doc: <a name="target"> ... </a>
source doc: <a href="#target">...</a>
... and select via XPath expressions
some extensions (points and ranges, ...)
Example
intro/14/3 ("intro" is an ID attribute value)
/1/2/5/14/3
xpointer(id("chap1")) xpointer(//*[@id="chap1"])

48
Four Common Errors
The XML syntax is very strict: elements must have
both a start and end tag, or they must use the
special empty element tag; attribute values must
be fully quoted; there can be only one top-level
element; and so on. A strict syntax was a design
goal for XML. The browser vendors asked for it.
HTML is very lenient, and HTML browsers accept
anything that looks vaguely like HTML. It might
have helped with the early adoption of HTML but
now it is a problem. Studies estimate that more
than 50% of the code in a browser deals with
errors or the sloppiness of HTML authors.
Consequently, an HTML browser is difficult to
write, which has slowed competition and makes
for mega-downloads. The four most common errors
in writing XML code are:

Forget End Tags
Forget that XML Is Case-Sensitive
Introduce Spaces in the Name of Element
Forget the Quotes for Attribute Value

49
Four Common Errors (cont)

Forget End Tags
<address> <street>34 Fountain Square Plaza
<region>OH</region> <postal-code>45202</postal-code>
<locality>Cincinnati</locality> <country>US </address>

Forget that XML Is Case-Sensitive
<tel>513-744-7098</tel> <TEL>513-744-7098</TEL>
<tel>513-744-7098</TEL>

Introduce Spaces in the Name of Element
<address book> <entry> <name>John Doe</name>
<email href="mailto:jdoe@emailaholic.com"/>
</entry> </address book>

Forget the Quotes for Attribute Value
<tel preferred=true>513-744-8889</tel>
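A rough way to see how a strict parser reacts to the four errors, assuming Python's standard xml.etree.ElementTree. The samples are shortened versions of the slide's examples, and the check helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def check(text):
    """Return None if the text is well-formed, else the parser's message."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return str(err)

samples = {
    "missing end tag": "<address><street>34 Fountain Square Plaza"
                       "<country>US</address>",
    "case mismatch": "<tel>513-744-7098</TEL>",
    "space in name": "<address book><entry/></address book>",
    "unquoted attribute": "<tel preferred=true>513-744-8889</tel>",
    "corrected": '<tel preferred="true">513-744-8889</tel>',
}

results = {name: check(text) for name, text in samples.items()}
for name, message in results.items():
    print(name, "->", message or "well-formed")
```

Unlike a lenient HTML browser, the parser stops at the first violation instead of guessing what the author meant.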
50
XML Document Map
XML Declaration
Processing Instruction
DOCTYPE Declaration
Comment
Root Element
Namespace
Element
Entity Reference
Start Tag
End Tag
CDATA Section
Attribute
Textual Content
51
DOCUMENT TYPE DEFINITION(DTD)
52
XML Document Types

Well-formed XML Document
Conforms to the basic XML syntax
Can be parsed without regard to the DTD
Valid XML Document
Well-formed
Conforms to its DTD

53
Document Type Definition (DTD)

First and most importantly, a DTD can define a
class of documents. Classes are very powerful
concepts in programming, because a class has an
expected structure, which means it
will have consistent behaviors and properties.
You should also be able to carry out certain
predefined operations on a class of documents.
If you use a DTD you can force a writer to
include certain elements. You can't force them
to put #PCDATA content in the element, but that's
another story.
If you are planning to display a document using a
style sheet, a DTD will ensure that you do not
include elements that do not have any display
instructions.
If you are planning to search the document, or
otherwise manipulate it using the DOM, a rigid
structure will simplify your coding, and speed up
the execution of your code, by a huge factor.

54
In Search of the Lost Structure and Semantics
How do I share structure and metadata/semantics
with my community?
How do I learn and use the element
structure of a document?
How to make all this automatable?
55
Adding Structure and Semantics

XML Document Type Definitions (DTDs)
define the structure of "allowed" documents
(i.e., valid wrt. a DTD)
≈ database schema
=> improve query formulation, execution, ...
XML Schema
defines structure and data types
allows developers to build their own libraries of
interchanged data types
XML Namespaces
identify your vocabulary

56
XML DTDs as Extended Context Free Grammars
XML DTD
<!ELEMENT bibliography (paper*)>
<!ELEMENT paper (authors, fullPaper?, title, booktitle)>
<!ELEMENT authors (author+)>
Grammar
lhs: element (name); rhs: regular expression
over elements and strings (#PCDATA)
57
Document Type Definitions (DTDs)
Define and Constrain Element Names and Structure
<!ELEMENT bibliography (paper*)>
<!ELEMENT paper (authors, fullPaper?, title, booktitle)>
<!ELEMENT authors (author+)>
<!ELEMENT author (#PCDATA)>
<!ATTLIST author age CDATA>
<!ELEMENT fullPaper EMPTY>
<!ELEMENT title (#PCDATA)>
<!ELEMENT booktitle (#PCDATA)>
Diagram callouts: Element Type Declaration,
Attribute List Declaration
58
Element Declarations
<!ELEMENT bibliography (paper*)>
  Sequence of 0 or more papers
<!ELEMENT paper (authors, fullPaper?, title, booktitle)>
  Authors followed by optional fullPaper, followed
  by title, followed by booktitle
<!ELEMENT authors (author+)>
  Sequence of 1 or more authors
<!ELEMENT author (#PCDATA)>
  Character content
<!ATTLIST author age CDATA>
<!ELEMENT fullPaper EMPTY>
<!ELEMENT title (#PCDATA)>
<!ELEMENT booktitle (#PCDATA)>
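Because each content model is a regular expression over child element names, the idea can be sketched with Python's re module. This is an illustration only, not a DTD validator: the CONTENT_MODELS table is a hand translation of three declarations from the slide, and children_match and conforms are hypothetical helper names.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models: each child name is followed by a space.
CONTENT_MODELS = {
    "bibliography": r"(paper )*",
    "paper": r"authors (fullPaper )?title booktitle ",
    "authors": r"(author )+",
}

def children_match(elem):
    """Check one element's child sequence against its content model."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        return True  # leaf types (#PCDATA, EMPTY) are not checked here
    names = "".join(child.tag + " " for child in elem)
    return re.fullmatch(model, names) is not None

def conforms(root):
    """True if every element in the tree matches its content model."""
    return all(children_match(e) for e in root.iter())

good = ET.fromstring(
    "<bibliography><paper><authors><author>A</author></authors>"
    "<title>T</title><booktitle>B</booktitle></paper></bibliography>"
)
bad = ET.fromstring(
    "<bibliography><paper><title>T</title><authors/>"
    "<booktitle>B</booktitle></paper></bibliography>"
)
print(conforms(good))
print(conforms(bad))
```

The second document fails twice over: title comes before authors, and authors has no author child although the model requires at least one.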
59
Element Content Declarations
60
Attribute Types (DTD)
Type             Meaning
ID               Token unique within the document
IDREF            Reference to an ID token
IDREFS           Reference to multiple ID tokens
ENTITY           External entity (image, video, ...)
ENTITIES         External entities
CDATA            Character data
NMTOKEN          Name token
NMTOKENS         Name tokens
NOTATION         Data other than XML
Enumeration      Choices
Conditional Sec  INCLUDE / IGNORE declarations
Attributes may be #REQUIRED or #IMPLIED (optional);
they can have default values, which may be
#FIXED
61
Attribute Types (DTD)
62
Attribute-Specification Parameters
63
Attribute Declarations
<!ELEMENT bibliography (paper*)>
<!ELEMENT paper (authors, fullPaper?, title, booktitle)>
<!ELEMENT authors (author+)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT fullPaper EMPTY>
<!ELEMENT title (#PCDATA)>
<!ELEMENT booktitle (#PCDATA)>
<!ATTLIST fullPaper source ENTITY #REQUIRED>
<!ATTLIST person pid ID>
<!ATTLIST author authorRef IDREF>
Pointer (IDREF) and target (ID) declarations
for intra-document pointers
64
XML Attribute
<person pid="joyce"> </person>
<bibliography>
  <paper pubid="wsa" role="publication">
    <authors>
      <author authorRef="joyce" age="???"> J. L. R. Colina </author>
    </authors>
    <fullPaper source="http://...confusion"/>
    <title>Object Confusion in a Deviator System </title>
    <related papers="deviation101 x_deviators"/>
  </paper>
</bibliography>
Diagram callouts: Object Identity Attribute,
CDATA (character data), IDREF intra-document
reference, Reference to external ENTITY
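An ID/IDREF pair works like a pointer within the document. The sketch below resolves authorRef values by hand with Python's standard xml.etree.ElementTree, which does not read DTD attribute types itself. The <root> wrapper and the text inside <person> are assumptions added so the example is a single well-formed document.

```python
import xml.etree.ElementTree as ET

# The <root> wrapper and the person's text are assumed for illustration.
doc = """<root>
  <person pid="joyce">J. L. R. Colina</person>
  <bibliography>
    <paper pubid="wsa">
      <authors><author authorRef="joyce"/></authors>
    </paper>
  </bibliography>
</root>"""

root = ET.fromstring(doc)

# Index the ID targets: pid value -> person element.
by_id = {p.get("pid"): p for p in root.iter("person")}

# Follow each IDREF pointer to its target element.
resolved = [(a.get("authorRef"), by_id[a.get("authorRef")].text)
            for a in root.iter("author")]
print(resolved)
```

A validating parser would additionally check that every pid is unique and that every authorRef points at an existing ID.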
65
XML Attribute
66
Uses of XML Entities

Physical partition
size, reuse, "modularity", ... (both XML docs and
DTDs)
Non-XML data
unparsed entities -> binary data
Non-standard characters
character entities
Shorthand for phrases and markup
=> effectively are macros

67
Types of Entities

Internal (to a doc) vs. External (-> use URI)
General (in XML doc) vs. Parameter (in DTD)
Parsed (XML) vs. Unparsed (non-XML)

68
Internal Text Entities
DTD: Internal Text Entity Declaration
<!ENTITY WWW "World Wide Web">
XML: Entity Reference
<p>We all use the &WWW;.</p>
Logically equivalent to actually appearing:
<p>We all use the World Wide Web.</p>
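The equivalence can be observed with Python's standard xml.etree.ElementTree, whose underlying expat parser expands internal entities declared in the document's internal DTD subset. Placing the declaration inside a DOCTYPE in the same document is an assumption made to keep this sketch self-contained.

```python
import xml.etree.ElementTree as ET

# The entity declaration sits in the internal DTD subset of the document.
doc = """<!DOCTYPE p [
  <!ENTITY WWW "World Wide Web">
]>
<p>We all use the &WWW;.</p>"""

root = ET.fromstring(doc)

# The application sees the replacement text, not the reference.
print(root.text)
```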
69
Entities: Physical Structure
A logical element can be split into multiple
physical entities
Mylife.xml
DTD...
<mylife>
Chap1.xml
<teen>yada yada </teen>
Chap2.xml
<adult>blah blah.. </adult>
</mylife>
70
External Text Entities
DTD: External Text Entity Declaration
<!ENTITY chap1 SYSTEM "http://...chap1.xml">
(the system identifier is a URL)
XML: Entity Reference
<mylife> &chap1; &chap2;</mylife>
Logically equivalent to inlining file contents:
<mylife> <teen>yada yada</teen> <adult> blah
blah</adult> </mylife>
71
Unparsed ("Binary") Entities
DTD
Declare external and unparsed entity:
<!ENTITY fusion SYSTEM "http://... fusion.ps"
NDATA ps>
Declare attribute type to be entity:
<!ATTLIST fullPaper source ENTITY #REQUIRED>
XML
Element with ENTITY attribute:
<fullPaper source="fusion"/>
NOTATION declaration (helper app):
<!NOTATION ps SYSTEM "ghostview.exe">
72
Pure XML Model (DTD)

Any DTD myDTD defines a language valid(myDTD)
valid(myDTD) = { docs D | D is valid wrt. myDTD }
<!ELEMENT A (B,C*)>
<!ELEMENT B (#PCDATA)>

Content ("container") model: A contains one B,
followed by any number of Cs
B is a leaf, contains actual data
<A> <B>foo</B> <C>bar</C> <C>lab</C> </A>
73
Data Modeling with DTDs

XML element types = "object types"
content model for children elements = "subobject
structure"
recursive types (container analogy!?)
<!ELEMENT A (B|C)> "an A can contain a B..."
<!ELEMENT B (A|C)> "... which contains an A!"
<!ELEMENT C (#PCDATA)>
found in doc world: document DIVision (generic
block-level container)
loose typing
<!ELEMENT A ANY> "so what's in the box,
please??"
no context-sensitive types
DTDs cannot distinguish between the publisher in
<journal> <publisher>... </publisher> </journal>
<website> <publisher> ... </publisher> </website>
=> renaming hack: <j_pub> and <w_pub>
=> DTD extensions (XML SCHEMA)

74
Where is the Data??

Actual data can go into leaf elements and/or
attributes
Common/good practice (!?)
XML element = container (object)
XML element type (tag) = container (object) type
XML attribute = properties of the container as a
whole ("metadata")
XML leaf elements contain actual data
Problems with DTDs
no data types
no specialization/extension of types
no "higher level" modeling (classes,
relationships, constraints, etc.)

75
Extending DTDs: Data Modeling Approaches

XML main stream: XML Schema
data types
user defined types, type extensions/restrictions
("subclassing")
cardinality constraints
XML side streams:
RELAX (REgular LAnguage description for XML), SOX
(Schema for Object-Oriented XML), Schematron, ...
alternative approach:
use well-established data modeling formalisms
like (E)ER, UML, ORM, OO models, ...
... and just encode them in XML!
e.g. UML: XMI (standardized, has much more => big),
UXF (UML eXchange Format)