Title: XML FUNDAMENTALS
1XML FUNDAMENTALS
2GETTING TO KNOW XML
3What is XML?
4What is XML?
- In the same way that you define the field names
for a data structure, you are free to use any XML
tags that make sense for a given application.
Naturally, though, for multiple applications to
use the same XML data, they have to agree on the
tag names they intend to use.
XML is case-sensitive.
5XML History
- XML: eXtensible Markup Language
- What is a markup language?
- Markup is the set of codes, embedded in the
document text, that store the information
required for electronic processing, such as font
name, boldness or, in the case of XML, the
document structure.
- A methodology for encoding data with some
information.
- Typically, it defines a set of tags, each of which
has some associated meaning.
6Who developed XML?
- XML is an activity of the World Wide Web
Consortium (W3C), http://www.w3c.org. The XML
development effort started in 1996.
- A diverse group of markup language experts, from
industry to academia, developed a simplified
version of SGML (Standard Generalized Markup
Language) for the Web. In February 1998, the XML 1.0
specification became a W3C Recommendation.
- XML 1.1 became a W3C Recommendation in February 2004.
7Standard Generalized Markup Language
SGML extends generic coding. Furthermore, it is
an international standard published by the ISO
(International Organization for Standardization).
It is based on the early work done by Dr. Charles
Goldfarb from IBM.
- SGML is similar to generic coding but with two
additional characteristics:
- The markup describes the document's structure,
not the document's appearance.
- The markup conforms to a model, which is similar
to a database schema. This means that it can be
processed by software or stored in a database.
8Data Problems
- Fundamental issue: How do I represent my
application data?
- Performance (speed/time)
- Persistence (short/long lived)
- Mutability
- Composition
- Security (encryption/identity)
- Open Information Management
- Interpretation
- Presentation
- Interoperation
- Portability
- Interrogation
9Why XML?
- Unfortunately, there are things that HTML just
can't do for you. Fortunately, XML is growing
quickly to meet these needs.
- Unfortunately, no matter how many new tags are
added, there will never be enough for all the
good ideas people keep having. Fortunately, XML
is a form of SGML, an ISO standard that allows
you to invent the tags you need, and declare them
so others can use them.
- Unfortunately, the SGML standard is large, takes
time to learn, and doesn't have a starter-kit.
Fortunately, XML is here.
10Why XML?
- Plain Text
- any editor, readable, for configuration
information
- Data Identification
- Self-described markup style
- Internationalized
- Unicode-based (UTF-8 / UTF-16), XML as universal
data representation.
- Inline Reusability
- can integrate data from multiple sources as a
single document, modularization without using
linking.
- Linkability
- More powerful than HTML's; W3C XLink and XPointer
specifications
- Easily Processed
- well-formed rules, validity checking, available
tools: parsers, transformers, browsers.
- Hierarchical
- Faster to access and rearrange each element.
11XML as a Self-Describing Data Exchange Format
- can be easily understood by our friend
- can be parsed easily
- contains its own structure (parse tree) in the
data
- => allows the application programmer to
rediscover schema and content/semantics (to
what extent?)
- may include an explicit schema description
(e.g., DTD)
- => meta-language: definition of a language w.r.t.
which it is valid
- allows separation of marked-up content from
presentation (=> style sheets)
- many tools (and many more to come -- (re)use
code): parsers, validators, query languages,
storage, ...
- standards (good for interoperation, integration,
etc.)
- => generic standards (XML, DTDs, XML Schema,
XPath, ...)
- => community/industry standards (specific markup
languages)
12Key Features of XML
- Extensibility
- You define your own markup languages (tags) for
your own problem domain
- Media and Presentation independence
- The same data can be presented on different
communication media (browser, voice device) and
in different formats (WML, HTML): portability
- Separation of contents from presentation
- Clear separation between contents (data) and
presentation (data appearance)
- Structure: Relationship and Hierarchical
Structure
- Faster to access, easier to rearrange
- Validation
- XML data is constrained by a Rule (or Syntax)
13What are XML applications?
- XML is poised to play a prominent role as a data
interchange format in B2B Web applications such
as e-commerce, supply-chain management, workflow,
and application integration. Another use of XML
is for structured information management,
including information from databases. XML also
supports media-independent publishing, allowing
documents to be written once and published in
multiple media formats and devices. On the
client, XML can be used to create customized
views into data.
14XML and Java Technology Relationship
- XML and Java technology are complementary.
Java technology provides the portable,
maintainable code to process portable, reusable
XML data. In addition, XML and Java technology
have a number of shared features that make them
the ideal pair for Web computing, including being
industry standards, platform-independent,
extensible, reusable, Web-centric, and
internationalized.
- It's a Match made in Heaven
- Java enables Portable Code
- XML enables Portable Data
- XML tools and programs are mostly written in the
Java programming language
- Better API support for the Java platform than for
any other language
15Benefits of Using Java Technology with XML
- Java technology offers a substantial productivity
boost for software developers compared to
programming languages such as C or C++. In
addition, developers using the Java platform can
create sophisticated programs that are reusable
and maintainable compared to programs written
with scripting languages. Using XML and Java
together, developers can build sophisticated,
interoperable Web applications more quickly and
at a lower cost.
16HTML
- The most popular markup language
- Defines a fixed set of tags
- Designed for presentation of data
- HTML documents are processed by an HTML processing
application (browser)
- Easy to implement and author, e.g. small number of
tags, forgiving syntax checking
- No formal validation
- Does not support semantic search
- Not suited to complex documents
17HTML vs XML
- XML is fast becoming the standard for data
interchange on the web.
- Extensible Markup Language (XML) is closely
related to HTML, the original document
representation of the WWW. While HTML enables
the creation of Web pages that can be viewed on
any browser, XML adds tags to data so that it can
be processed by any application.
- Using XML, companies can separate the business
rules from the content and structure of the data.
By focusing on exchanging data content and
structure, the trading partners are free to
implement their own business rules, which can be
quite distinct from one another.
- Custom tags are like field names defined for a
data structure. Applications that share data can
agree upon the same XML tag names.
18HTML vs XML
19XML Standards
- XML, DTD
- XSL, XSLT, XPath
- DOM, SAX
- W3C XML Schema
- Namespaces
- XLink, XPointer
- XHTML
- XQL
20Domain Specific XML Standards
- Chemical - CML
- 2D Graphics - SVG
- Math - MathML
- Music - MusicML
- Travel - OTA
- Many more ...
- http://xml.org/xmlorg_registry/index.shtml
- FIXML
21Core Java APIs for XML
- JAXP: Parsing and Transforming
- JAXB: High-level XML programming
- JAXM: Messaging
- JAXR: Registry APIs
- JDOM: Java-optimized Parsing
22E-Commerce Standards
- ebXML
- UDDI (Universal Description, Discovery and
Integration)
- SOAP (Simple Object Access Protocol)
- W3C XP (XML Protocol)
- WSDL (Web Services Description Language)
- S2ML (Security Services ML)
- XAML (Transaction Authority ML)
23XML COMPONENTS
24XML Document Components
- Processing Instruction
- Elements and Attributes
- Empty Tags
- Comments
- Special Characters
- Entity References
- CDATA
- Whitespaces
- Namespaces
- XPath, XLink, XPointer
25The XML Prolog XML Declaration
- The prolog is the part of an XML document that
precedes the XML data. The prolog includes the
declaration and an optional DTD.
- An XML file always starts with a prolog. The
minimal prolog contains a declaration that
identifies the document as an XML document:
- <?xml version="1.0"?>
- <?xml version="1.0" encoding="ISO-8859-1"
standalone="yes"?>
- version: Identifies the version of the XML markup
language used in the data. This attribute is not
optional.
- encoding: Identifies the character set used to
encode the data. "ISO-8859-1" is "Latin-1", the
Western European and English language character
set. (The default is compressed Unicode: UTF-8.)
- standalone: Tells whether or not this document
references an external entity or an external data
type specification.
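As a rough illustration of how the encoding attribute of the declaration matters, the following Python sketch encodes a small document as ISO-8859-1 and lets the parser pick the decoder from the declaration. The use of Python's standard xml.etree.ElementTree module and the element name greeting are assumptions made for this example; the slides do not prescribe a parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element name "greeting" is illustrative only.
text = (
    '<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>'
    "<greeting>caf\u00e9</greeting>"
)

# Encode the bytes the way the declaration promises (Latin-1).
raw = text.encode("iso-8859-1")

# The parser reads the encoding from the declaration and decodes accordingly.
root = ET.fromstring(raw)
decoded = root.text
```

If the bytes and the declared encoding disagree, the parser may either fail or silently produce wrong characters, which is why the declaration should match the actual encoding of the file.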
26Processing Instruction
- PIs give commands or information to an
application that is processing the XML data.
- <?target instructions?>
- The target is the name of the application that is
expected to do the processing, and instructions
is a string of characters that embodies the
information or commands for the application to
process.
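A minimal sketch of reading processing instructions, assuming Python's standard xml.dom.minidom module. The xml-stylesheet target is a commonly used PI; the href value is made up for illustration.

```python
from xml.dom import minidom

# Hypothetical document: the stylesheet file name is illustrative only.
doc = minidom.parseString(
    '<?xml-stylesheet type="text/css" href="style.css"?><note>hi</note>'
)

# Collect (target, instructions) pairs for every PI at the top level.
pis = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]
```

The parser does not interpret the instructions string; it simply hands target and instructions to the application, which decides what to do with them.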
27Elements
- XML tags usually surround an identified object in
the data stream. A start-tag and an end-tag,
together with the data enclosed by them,
represent an element. The start-tag is delimited
using the < and > characters. The end-tag is
delimited by </ and >.
- It is this ability for one tag to contain others
that gives XML its ability to represent
hierarchical data structures.
28Elements
- Every XML file defines exactly one top-level element, known
as the root element. Any other elements in the
file are contained within that element.
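The nesting of elements under a single root can be sketched as follows, assuming Python's standard xml.etree.ElementTree module. The message document and its tag names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with one root element and nested children.
doc = """<message>
  <to>you@example.com</to>
  <body><greeting>Hello</greeting></body>
</message>"""

root = ET.fromstring(doc)


def outline(element, depth=0):
    """Return (depth, tag) pairs in document order."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


tree_outline = outline(root)

# A second top-level element violates the single-root rule.
try:
    ET.fromstring("<a/><b/>")
    second_root_accepted = True
except ET.ParseError:
    second_root_accepted = False
```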
29Attributes
- Additional information about an element, included
as part of the tag itself, within the tag's angle
brackets. An attribute consists of an attribute
name and an attribute value. The attribute name
precedes its value, which is enclosed in quotes
(' or ") and separated from the name by an equals
sign.
- There must be at least one space between the
element name and the first attribute. Multiple
attributes are separated by spaces.
- Since you could design a data structure like
<message> equally well using either attributes or
tags, it can take a considerable amount of
thought to figure out which design is best for
your purposes.
30Elements and their Content
<bibliography>
  <paper ID="object-fusion">
    <authors>
      <author>Y. Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <booktitle>VLDB 96</booktitle>
  </paper>
</bibliography>
element: start tag, content, and end tag, e.g. <author>S. Abiteboul</author>
element type: the tag name, e.g. paper
element content: child elements, e.g. the author elements inside authors
empty element: <fullPaper source="fusion"/>
character content: text, e.g. VLDB 96
31Element Attributes
<bibliography>
  <paper pid="object-fusion">
    <authors>
      <author>Y. Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <booktitle>VLDB 96</booktitle>
  </paper>
</bibliography>
Attribute name: e.g. pid
Attribute value: e.g. "object-fusion"
32Empty Tags
- Sometimes, we might want to add a "flag" tag that
marks a message as important. A tag like that
doesn't enclose any content, so it's known as an
"empty tag". We can create an empty tag by ending
it with /> instead of >.
- The empty tag saves you from having to code
<flag></flag> in order to have a well-formed
document.
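A small check, assuming Python's standard xml.etree.ElementTree module, that the short empty-tag form and the start-tag/end-tag pair give the parser the same thing. The message wrapper element is hypothetical.

```python
import xml.etree.ElementTree as ET

# The same flag element, written in short and long form.
short_form = ET.fromstring("<message><flag/></message>").find("flag")
long_form = ET.fromstring("<message><flag></flag></message>").find("flag")


def is_empty(element):
    """True if the element has neither text nor child elements."""
    return element.text is None and len(element) == 0


both_empty = is_empty(short_form) and is_empty(long_form)
```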
33Comments in XML Files
- XML comments look just like HTML comments:
<!-- comment text -->
- Comments will not appear in published output.
34Handling Special Characters
- In XML, an entity is an XML structure (or plain
text) that has a name. Referencing the entity by
name causes it to be inserted into the document
in place of the entity reference. To create an
entity reference, the entity name is surrounded
by an ampersand and a semicolon: &entityName;
- Predefined Entities: &lt; (<), &gt; (>),
&amp; (&), &apos; ('), &quot; (")
35Using Entity Reference in an XML Document
- The problem with putting a line that contains a
left-angle bracket into an XML file directly is
that when the parser sees the left-angle bracket
(<), it starts looking for a tag name, which
throws off the parse. To get around that problem,
you put &lt; in the file, instead of "<".
[Figure: XML File and XML Output]
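The effect of entity references on special characters can be sketched in Python, assuming the standard xml.sax.saxutils.escape helper and xml.etree.ElementTree. The sample text and the code element name are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical text containing characters that are special in XML.
raw_text = "if (a < b && b > c)"

# escape() replaces &, < and > with the predefined entity references.
escaped = escape(raw_text)

# The escaped form parses, and the parser hands back the original text.
parsed_text = ET.fromstring("<code>" + escaped + "</code>").text

# The unescaped form is rejected because "<" starts markup.
try:
    ET.fromstring("<code>" + raw_text + "</code>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False
```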
36Handling Text with XML-Style Syntax
- When you are handling large blocks of XML or HTML
that include many of the special characters, it
would be inconvenient to replace each of them
with the appropriate entity reference. For those
situations, you can use a CDATA section.
- All white space in a CDATA section is
significant, and characters in it are not
interpreted as XML.
- <![CDATA[ ............. ]]>
37Handling Text with XML-Style Syntax
[Figure: XML File and XML Output]
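A minimal sketch of a CDATA section, assuming Python's standard xml.etree.ElementTree module. The example element and its content are hypothetical.

```python
import xml.etree.ElementTree as ET

# Markup-like text inside CDATA is passed through as plain characters.
doc = "<example><![CDATA[<b>bold</b> & more]]></example>"

root = ET.fromstring(doc)
content = root.text          # the literal text, including < and &
child_count = len(root)      # <b> was not parsed as a child element
```

Note that the CDATA section cannot itself contain the sequence ]]>, since that marks its end.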
38 Document Prolog Body Epilog
Prolog:
<?xml version="1.0"?>
<!-- comments and processing instructions -->
<!DOCTYPE sdsc_play_groups SYSTEM "http://localserver/spg.dtd">
<!-- comments and processing instructions -->
Body:
<sdsc_play_groups>
  <play_group ID="Data-issues">
    <member_groups>
      <group>Scientific Computing</group>
      <group>Data Intensive Computing</group>
      <group>Security Technologies</group>
    </member_groups>
    <charter source="XPG"/>
    <url>http://www.sdsc.edu/marciano/XML/xpg.html</url>
    <title>XML Play Group</title>
  </play_group>
</sdsc_play_groups>
Epilog:
<!-- comments and processing instructions -->
39White Space
- The XML specification normalizes different
line-ending conventions to a single convention
but preserves all other white space, except in
attribute values.
- White Space and the XML Declaration: According
to the current XML 1.0 standard, white space is
not allowed before the XML declaration. If white
space appears before the XML declaration, it will
be treated as a processing instruction. The
information, particularly the encoding, may not
be used by the parser.
- White Space in Element Content: XML parsers are
required to report all white space that appears
in element content within a document. For this
reason, documents that differ only in the white
space inside element content are different to an
XML parser.
40White Space
- White Space in Attributes: Although XML
processors preserve all white space in element
content, they frequently normalize it in
attribute values. Tabs, carriage returns, and
spaces are reported as single spaces. In certain
types of attributes, they trim white space that
comes before or after the main body of the value
and reduce white space within the value to single
spaces. If a DTD is available, this trimming
will be performed on all attributes that are not
of type CDATA. If there is no DTD, the parser
assumes that all attributes are of type CDATA.
- For example, an XML parser reports an attribute
value "this is a note." that is broken across
lines as "this is a note.", converting the line
breaks to single spaces.
- End of Line Handling: XML processors treat the
character sequence Carriage Return-Line Feed
(CRLF) like single CR or LF characters. All are
reported as a single LF character.
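The white space rules above can be observed with a short Python sketch, assuming the standard xml.etree.ElementTree module (which uses the expat parser). The note element and text attribute are hypothetical names.

```python
import xml.etree.ElementTree as ET

# A line break inside an attribute value is reported as a single space.
attr_doc = '<note text="this is\na note."/>'
attr_value = ET.fromstring(attr_doc).get("text")

# A CRLF pair in element content is reported as a single LF.
crlf_doc = "<note>line one\r\nline two</note>"
content = ET.fromstring(crlf_doc).text

# Other white space in element content is preserved as written.
padded = ET.fromstring("<note>  padded  </note>").text
```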
41Namespaces
By using XML namespaces, authors can qualify
element names uniquely on the Web and thus avoid
conflicts between elements that have the same
name. Associating a Uniform Resource Identifier
(URI) with a namespace is purely to ensure that
two elements with the same name can remain
unambiguous; it does not matter what, if
anything, the URI points to.
42Identifying Vocabularies XML Namespaces
- My element may not be your element
- geometry context: <element>line</element>
- chemistry context: <element>oxygen</element>
- SGML/XML context: ....
- use XML namespaces to identify the vocabulary
43XML Namespaces
- mechanism for globally unique tag names
<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
  <h:head><h:title>Book Review</h:title></h:head>
  ...
  <xdc:bookreview>
    <xdc:title>XML: A Primer</xdc:title>
    ...
</h:html>
- mix of different tag vocabularies without
confusion
- namespaces only identify the vocabulary;
additional mechanisms are required for structure
and meaning of tags
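A sketch of how a parser keeps two title elements apart by namespace, assuming Python's standard xml.etree.ElementTree module. The document is a completed variant of the book review fragment; the closing tags filled in here are assumptions.

```python
import xml.etree.ElementTree as ET

doc = """<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
  <h:head><h:title>Book Review</h:title></h:head>
  <xdc:bookreview>
    <xdc:title>XML: A Primer</xdc:title>
  </xdc:bookreview>
</h:html>"""

root = ET.fromstring(doc)

# Prefixes are local shorthand; the URI is what identifies the vocabulary.
ns = {
    "h": "http://www.w3.org/HTML/1998/html4",
    "xdc": "http://www.xml.com/books",
}

html_title = root.find("h:head/h:title", ns).text
book_title = root.find("xdc:bookreview/xdc:title", ns).text
```

ElementTree stores each name in the form {URI}localname, so the two title elements never collide even though their local names are identical.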
44XPath, XLink, and XPointer
- XPath
- a declarative language for locating nodes and
fragments in XML trees
- used in both XPointer (for addressing) and XSL
(for pattern matching)
- XLink
- a generalization of the HTML link concept
- higher abstraction level (intended for general
XML - not just hypertext)
- more expressive power (multiple destinations,
special behaviours, out-of-line links, ...)
- uses XPointer to locate resources
- XPointer
- an extension of XPath suited for linking
- specifies connection between XPath expressions
and URIs
45XML Path Language XPath
- W3C Recommendation, Nov. 1999
- for addressing parts within an XML document
- (non-XML) syntax used for XSLT and XPointer
- Find the root element (bookstore) of this
document: /bookstore
- Find all author elements anywhere within the
current document: //author
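The two path expressions can be approximated in Python, assuming the standard xml.etree.ElementTree module. It implements only a subset of XPath, so the relative form .//author stands in for //author here. The bookstore document is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical bookstore document with authors at different depths.
doc = """<bookstore>
  <book><title>XML Basics</title><author>A. Writer</author></book>
  <magazine><article><author>B. Reporter</author></article></magazine>
</bookstore>"""

root = ET.fromstring(doc)

# /bookstore: the root element of the document.
root_name = root.tag

# //author: every author element, at any depth, in document order.
authors = [a.text for a in root.findall(".//author")]
```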
46XML Linking Language (XLink)
- W3C Candidate Recommendation, July/2000
- language for typed links between documents
- extends the simple untyped href links in HTML
- multidirectional links
- any element can be the source (not just <a ... >
</a>)
- link to arbitrary positions within a document
(via URIs and XPointer)
- richer custom applications possible
- xlink:type declaration: simple, extended,
locator, arc
- optional "semantic attributes": role, arcrole,
title
- Example:
<author xmlns:xlink="... "
        xlink:href="....itmaven.com/peter.html"
        xlink:title="Peter's homepage"
        xlink:role="further info about the book author">
  Peter Pan Sr.
</author>
47 XML Pointer Language (XPointer)
- W3C Candidate Recommendation, June/2000
- for locating internal structures of XML documents
- XLink's URIs can include XPointer parts
- extends HTML's named anchors
- target doc: <a name="target"> ... </a>
- source doc: <a href="#target">...</a>
- ... and select via XPath expressions
- some extensions (points and ranges, ...)
- Example:
- intro/14/3 ("intro" is an ID attribute value)
- /1/2/5/14/3
- xpointer(id("chap1")) xpointer(//*[@id="chap1"])
48Four Common Errors
The XML syntax is very strict: elements must have
both a start and end tag, or they must use the
special empty element tag; attribute values must
be fully quoted; there can be only one top-level
element; and so on. A strict syntax was a design
goal for XML. The browser vendors asked for it.
HTML is very lenient, and HTML browsers accept
anything that looks vaguely like HTML. It might
have helped with the early adoption of HTML, but
now it is a problem. Studies estimate that more
than 50% of the code in a browser deals with
errors or the sloppiness of HTML authors.
Consequently, an HTML browser is difficult to
write, which has slowed competition and makes
for mega-downloads. The four most common errors
in writing XML code are:
- Forget End Tags
- Forget that XML Is Case-Sensitive
- Introduce Spaces in the Name of Element
- Forget the Quotes for Attribute Value
49Four Common Errors (cont)
- Forget End Tags
  <address>
    <street>34 Fountain Square Plaza
    <region>OH</region>
    <postal-code>45202</postal-code>
    <locality>Cincinnati</locality>
    <country>US
  </address>
- Forget that XML Is Case-Sensitive
  <tel>513-744-7098</tel> <TEL>513-744-7098</TEL>   (correct)
  <tel>513-744-7098</TEL>   (incorrect)
- Introduce Spaces in the Name of Element
  <address book>
    <entry>
      <name>John Doe</name>
      <email href="mailto:jdoe@emailaholic.com"/>
    </entry>
  </address book>
- Forget the Quotes for Attribute Value
  <tel preferred=true>513-744-8889</tel>   (incorrect)
  <tel preferred="true">513-744-8889</tel>   (correct)
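The four errors can be reproduced with a small well-formedness check, sketched here with Python's standard xml.etree.ElementTree module. The helper name is_well_formed and the shortened snippets are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# One shortened, deliberately malformed snippet per common error.
malformed = {
    "missing end tag": "<address><street>34 Fountain Square Plaza</address>",
    "case mismatch": "<tel>513-744-7098</TEL>",
    "space in name": "<address book><entry/></address book>",
    "unquoted attribute": "<tel preferred=true>513-744-8889</tel>",
}


def is_well_formed(text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


results = {label: is_well_formed(text) for label, text in malformed.items()}
corrected_ok = is_well_formed('<tel preferred="true">513-744-8889</tel>')
```

Unlike a lenient HTML browser, the parser stops at the first violation instead of guessing what the author meant.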
50XML Document Map
XML Declaration
Processing Instruction
DOCTYPE Declaration
Comment
Root Element
Namespace
Element
Entity Reference
Start Tag
End Tag
CDATA Section
Attribute
Textual Content
51DOCUMENT TYPE DEFINITION(DTD)
52XML Document Types
- Well-formed XML Document
- Conforms to the basic XML syntax
- Can be parsed without regard to the DTD
- Valid XML Document
- Well-formed
- Conforms to its DTD
53Document Type Definition (DTD)
- Firstly and most importantly, a DTD can define a
class of documents. Classes are very powerful
concepts in programming, because if you have a
class it has an expected structure, which means it
will have consistent behaviors and properties.
You should also be able to carry out certain
predefined operations on a class of documents.
- If you use a DTD you can force a writer to
include certain elements. You can't force them
to put #PCDATA content in the element, but that's
another story.
- If you are planning to display a document using a
style sheet, a DTD will ensure that you do not
include elements that do not have any display
instructions.
- If you are planning to search the document, or
otherwise manipulate it using the DOM, a rigid
structure will simplify your coding and can speed
up the execution of your code.
54In Search of the Lost Structure Semantics
How do I share structure and metadata/semantics
with my community?
How do I learn and use the element
structure of a document?
How to make all this automatable?
55Adding Structure and Semantics
- XML Document Type Definitions (DTDs)
- define the structure of "allowed" documents
(i.e., valid wrt. a DTD)
- ≈ database schema
- => improve query formulation, execution, ...
- XML Schema
- defines structure and data types
- allows developers to build their own libraries of
interchanged data types
- XML Namespaces
- identify your vocabulary
56XML DTDs as Extended Context Free Grammars
XML DTD
<!ELEMENT bibliography (paper)*>
<!ELEMENT paper (authors, fullPaper?, title, booktitle)>
<!ELEMENT authors (author)+>
Grammar
lhs = element (name); rhs = regular expression
over elements + strings (PCDATA)
57Document Type Definitions (DTDs)
Define and Constrain Element Names & Structure
Element Type Declarations:
<!ELEMENT bibliography (paper)*>
<!ELEMENT paper (authors, fullPaper?, title, booktitle)>
<!ELEMENT authors (author)+>
<!ELEMENT author (#PCDATA)>
<!ELEMENT fullPaper EMPTY>
<!ELEMENT title (#PCDATA)>
<!ELEMENT booktitle (#PCDATA)>
Attribute List Declaration:
<!ATTLIST author age CDATA #IMPLIED>
58Element Declarations
Sequence of 0 or more papers:
<!ELEMENT bibliography (paper)*>
Authors followed by optional fullPaper, followed
by title, followed by booktitle:
<!ELEMENT paper (authors, fullPaper?, title, booktitle)>
Sequence of 1 or more authors:
<!ELEMENT authors (author)+>
Character content:
<!ELEMENT author (#PCDATA)>
<!ATTLIST author age CDATA #IMPLIED>
<!ELEMENT fullPaper EMPTY>
<!ELEMENT title (#PCDATA)>
<!ELEMENT booktitle (#PCDATA)>
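Because each content model is a regular expression over child element names, a simplified structure check can be sketched with Python's re module. This is not a DTD validator: the regex encoding (child tag names, each followed by a space) is an assumption made here, and it ignores #PCDATA, EMPTY, and attributes.

```python
import re
import xml.etree.ElementTree as ET

# Content models rewritten as regular expressions over child tag names,
# each name followed by one space (an encoding chosen for this sketch).
CONTENT_MODELS = {
    "bibliography": r"(paper )*",
    "paper": r"authors (fullPaper )?title booktitle ",
    "authors": r"(author )+",
}


def check(element):
    """Return True if every element with a known model matches it."""
    model = CONTENT_MODELS.get(element.tag)
    if model is not None:
        children = "".join(child.tag + " " for child in element)
        if re.fullmatch(model, children) is None:
            return False
    return all(check(child) for child in element)


good = ET.fromstring(
    "<bibliography><paper><authors><author>A</author></authors>"
    "<title>T</title><booktitle>B</booktitle></paper></bibliography>"
)
# Same document, but authors has no author child (violates "1 or more").
bad = ET.fromstring(
    "<bibliography><paper><authors/>"
    "<title>T</title><booktitle>B</booktitle></paper></bibliography>"
)

good_ok = check(good)
bad_ok = check(bad)
```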
59Element Content Declarations
60Attribute Types (DTD)
Type             Meaning
ID               Token unique within the document
IDREF            Reference to an ID token
IDREFS           Reference to multiple ID tokens
ENTITY           External entity (image, video, ...)
ENTITIES         External entities
CDATA            Character data
NMTOKEN          Name token
NMTOKENS         Name tokens
NOTATION         Data other than XML
Enumeration      Choices
Conditional Sec  INCLUDE / IGNORE declarations
Attributes may be #REQUIRED or #IMPLIED (optional),
or can have default values, which may be
#FIXED.
61Attribute Types (DTD)
62Attribute-Specification Parameters
63Attribute Declarations
<!ELEMENT bibliography (paper)*>
<!ELEMENT paper (authors, fullPaper?, title, booktitle)>
<!ELEMENT authors (author)+>
<!ELEMENT author (#PCDATA)>
<!ELEMENT fullPaper EMPTY>
<!ELEMENT title (#PCDATA)>
<!ELEMENT booktitle (#PCDATA)>
<!ATTLIST fullPaper source ENTITY #REQUIRED>
<!ATTLIST person pid ID #IMPLIED>
<!ATTLIST author authorRef IDREF #IMPLIED>
Pointer (IDREF) and target (ID) declarations
for intra-document pointers
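Python's standard library does not validate against a DTD, so the ID/IDREF constraint can only be approximated by hand. The sketch below, assuming xml.etree.ElementTree, collects the pid values and reports authorRef values that point nowhere. The library wrapper element and the second author are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "library" is a made-up wrapper so that person
# and bibliography can share one root element.
doc = """<library>
  <person pid="joyce"/>
  <bibliography><paper><authors>
    <author authorRef="joyce">J. L. R. Colina</author>
    <author authorRef="nobody">Unknown</author>
  </authors></paper></bibliography>
</library>"""

root = ET.fromstring(doc)

# Targets: every pid value declared on a person element.
ids = {person.get("pid") for person in root.iter("person")}

# Pointers: authorRef values that do not match any declared pid.
dangling = [
    author.get("authorRef")
    for author in root.iter("author")
    if author.get("authorRef") not in ids
]
```

A validating parser would report the dangling reference itself, and would also reject duplicate ID values, which this sketch does not check.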
64XML Attribute
<person pid="joyce"> </person>
<bibliography>
  <paper pubid="wsa" role="publication">
    <authors>
      <author authorRef="joyce" age="???"> J. L. R. Colina </author>
    </authors>
    <fullPaper source="http://...confusion"/>
    <title>Object Confusion in a Deviator System</title>
    <related papers="deviation101 x_deviators"/>
  </paper>
</bibliography>
pid: Object Identity Attribute
age: CDATA (character data)
authorRef: IDREF, intra-document reference
source: Reference to external ENTITY
65XML Attribute
66 Uses of XML Entities
- Physical partition
- size, reuse, "modularity", ... (both XML docs
& DTDs)
- Non-XML data
- unparsed entities -> binary data
- Non-standard characters
- character entities
- Shorthand for phrases & markup
- => effectively are macros
67Types of Entities
- Internal (to a doc) vs. External (-> use URI)
- General (in XML doc) vs. Parameter (in DTD)
- Parsed (XML) vs. Unparsed (non-XML)
68Internal Text Entities
DTD
Internal Text Entity Declaration:
<!ENTITY WWW "World Wide Web">
XML
Entity Reference:
<p>We all use the &WWW;.</p>
Logically equivalent to actually appearing:
<p>We all use the World Wide Web.</p>
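The expansion of an internal text entity can be observed with a short Python sketch, assuming the standard xml.etree.ElementTree module. The declaration is placed in an internal DTD subset so that the example is a single self-contained document.

```python
import xml.etree.ElementTree as ET

# The entity is declared in the internal subset of the DOCTYPE.
doc = """<!DOCTYPE p [
  <!ENTITY WWW "World Wide Web">
]>
<p>We all use the &WWW;.</p>"""

# The parser replaces the reference with the declared text.
expanded = ET.fromstring(doc).text
```

Referencing an entity that was never declared is a well-formedness error, so the parser would reject the document instead of passing the reference through.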
69Entities Physical Structure
A logical element can be split into multiple
physical entities.
Mylife.xml
  DTD...
  <mylife>
    Chap1.xml:  <teen>yada yada</teen>
    Chap2.xml:  <adult>blah blah..</adult>
  </mylife>
70External Text Entities
DTD
External Text Entity Declaration (URL):
<!ENTITY chap1 SYSTEM "http://...chap1.xml">
XML
Entity Reference:
<mylife> &chap1; &chap2; </mylife>
Logically equivalent to inlining file contents:
<mylife> <teen>yada yada</teen> <adult>blah
blah</adult> </mylife>
71Unparsed ( "Binary") Entities
DTD
Declare external and unparsed entity:
<!ENTITY fusion SYSTEM "http://... fusion.ps" NDATA ps>
NOTATION declaration (helper app):
<!NOTATION ps SYSTEM "ghostview.exe">
Declare attribute type to be entity:
<!ATTLIST fullPaper source ENTITY #REQUIRED>
XML
Element with ENTITY attribute:
<fullPaper source="fusion"/>
72Pure XML Model (DTD)
- Any DTD myDTD defines a language valid(myDTD)
- valid(myDTD) = { docs D | D is valid wrt. myDTD }
- <!ELEMENT A (B,C*)>
- <!ELEMENT B (#PCDATA)>
Content ("container") model: A contains one B,
followed by any number of Cs
B is a leaf, contains actual data
<A> <B>foo</B> <C>bar</C> <C>lab</C> </A>
73Data Modeling with DTDs
- XML element types = "object types"
- content model for children elements = "subobject
structure"
- recursive types (container analogy!?)
- <!ELEMENT A (B|C)> "an A can contain a B..."
- <!ELEMENT B (A|C)> "... which contains an A!"
- <!ELEMENT C (#PCDATA)>
- found in doc world: document DIVision (generic
block-level container)
- loose typing
- <!ELEMENT A ANY> "so what's in the box,
please??"
- no context-sensitive types
- DTDs cannot distinguish between the publisher in
- <journal> <publisher> ... </publisher> </journal>
- <website> <publisher> ... </publisher> </website>
- => renaming hack: <j_pub> and <w_pub>
- => DTD extensions (XML Schema)
74Where is the Data??
- Actual data can go into leaf elements and/or
attributes
- Common/good practice (!?)
- XML element = container (object)
- XML element type (tag) = container (object) type
- XML attribute = properties of the container as a
whole ("metadata")
- XML leaf elements = contain actual data
- Problems with DTDs
- no data types
- no specialization/extension of types
- no "higher level" modeling (classes,
relationships, constraints, etc.)
75Extending DTDs Data Modeling Approaches
- XML main stream: XML Schema
- data types
- user defined types, type extensions/restrictions
("subclassing")
- cardinality constraints
- XML side streams
- RELAX (REgular LAnguage description for XML), SOX
(Schema for Object-Oriented XML), Schematron, ...
- alternative approach
- use well-established data modeling formalisms
like (E)ER, UML, ORM, OO models, ...
- ... and just encode them in XML!
- e.g. UML: XMI (standardized, has much more => big),
UXF (UML eXchange Format)
76How to use DTD?
- A DTD can be declared internally (local subset) or
in an external file.
- A data element contains only sub-elements, with
no intervening text.
- A document element is defined to include both
text and sub-elements.