XML FUNDAMENTALS - PowerPoint PPT Presentation

Provided by: ser1170
Transcript and Presenter's Notes

Title: XML FUNDAMENTALS


1
XML FUNDAMENTALS
2
GETTING TO KNOW XML
3
What is XML?
4
What is XML?
  • In the same way that you define the field names
    for a data structure, you are free to use any XML
    tags that make sense for a given application.
    Naturally, though, for multiple applications to
    use the same XML data, they have to agree on the
    tag names they intend to use.

XML is case-sensitive.
5
XML History
  • XML: eXtensible Markup Language
  • What is a markup language?
  • Markup is the set of codes, embedded in the
    document text, that store the information
    required for electronic processing, such as font
    name, boldness or, in the case of XML, the
    document structure.
  • A methodology for encoding data with some
    information.
  • Typically, it defines a set of tags, each of which
    has some associated meaning.

6
Who developed XML?
  • XML is an activity of the World Wide Web
    Consortium (W3C), http://www.w3c.org. The XML
    development effort started in 1996.
  • A diverse group of markup language experts, from
    industry to academia, developed a simplified
    version of SGML (Standard Generalized Markup
    Language) for the Web. In February 1998, the XML
    1.0 specification became a W3C Recommendation.
  • XML 1.1 became a W3C Recommendation in February
    2004.

7
Standard Generalized Markup Language
SGML extends generic coding. Furthermore, it is
an international standard published by the ISO
(International Organization for Standardization).
It is based on the early work done by Dr. Charles
Goldfarb of IBM.
  • SGML is similar to generic coding but with two
    additional characteristics:
      • The markup describes the document's structure,
        not the document's appearance.
      • The markup conforms to a model, which is similar
        to a database schema. This means that it can be
        processed by software or stored in a database.

8
Data Problems
  • Fundamental issue: How do I represent my
    application data?
  • Performance (speed/time)
  • Persistence (short/long lived)
  • Mutability
  • Composition
  • Security (encryption/identity)
  • Open Information Management
  • Interpretation
  • Presentation
  • Interoperation
  • Portability
  • Interrogation

9
Why XML?
  • Unfortunately, there are things that HTML just
    can't do for you. Fortunately, XML is growing
    quickly to meet these needs.
  • Unfortunately, no matter how many new tags are
    added, there will never be enough for all the
    good ideas people keep having. Fortunately, XML
    is a form of SGML, an ISO standard that allows
    you to invent the tags you need, and declare them
    so others can use them.
  • Unfortunately, the SGML standard is large, takes
    time to learn, and doesn't have a starter kit.
    Fortunately, XML is here.

10
Why XML?
  • Plain Text
      • any editor, readable, for configuration
        information
  • Data Identification
      • self-describing markup style
  • Internationalized
      • Unicode-based (UTF-8 / UTF-16), XML as universal
        data representation
  • Inline Reusability
      • can integrate data from multiple sources as a
        single document, modularization without using
        linking
  • Linkability
      • more powerful than HTML's; W3C XLink and XPointer
        specifications
  • Easily Processed
      • well-formedness rules, validity checking,
        available tools: parsers, transformers, browsers
  • Hierarchical
      • faster to access and rearrange each element

11
XML as a Self-Describing Data Exchange Format
  • can be easily understood by our friend
  • can be parsed easily
  • contains its own structure (parse tree) in the
    data
      • => allows the application programmer to
        rediscover schema and content/semantics (to
        which extent???)
  • may include an explicit schema description
    (e.g., DTD)
      • => meta-language: definition of a language
        w.r.t. which it is valid
  • allows separation of marked-up content from
    presentation (=> style sheets)
  • many tools (and many more to come -- (re)use
    code): parsers, validators, query languages,
    storage, ...
  • standards (good for interoperation, integration,
    etc.)
      • => generic standards (XML, DTDs, XML Schema,
        XPath, ...)
      • => community/industry standards (specific
        markup languages)

12
Key Features of XML
  • Extensibility
      • You define your own markup languages (tags) for
        your own problem domain
  • Media and Presentation Independence
      • The same data can be presented on different
        communication media (browser, voice device) and
        in different formats (WML, HTML): portability
  • Separation of Contents from Presentation
      • Clear separation between contents (data) and
        presentation (data appearance)
  • Structure: Relationship and Hierarchical
    Structure
      • Faster to access, easier to rearrange
  • Validation
      • XML data is constrained by a rule (or syntax)

13
What are XML applications?
  • XML is poised to play a prominent role as a data
    interchange format in B2B Web applications such
    as e-commerce, supply-chain management, workflow,
    and application integration. Another use of XML
    is for structured information management,
    including information from databases. XML also
    supports media-independent publishing, allowing
    documents to be written once and published in
    multiple media formats and devices. On the
    client, XML can be used to create customized
    views into data.

14
XML and Java Technology Relationship
  • XML and Java technology are complementary.
    Java technology provides the portable,
    maintainable code to process portable, reusable
    XML data. In addition, XML and Java technology
    have a number of shared features that make them
    the ideal pair for Web computing: both are
    industry standards, platform-independent,
    extensible, reusable, Web-centric, and
    internationalized.
  • It's a match made in heaven
  • Java enables portable code
  • XML enables portable data
  • XML tools and programs are mostly written in the
    Java programming language
  • Better API support for the Java platform than for
    any other language

15
Benefits of Using Java Technology with XML
  • Java technology offers a substantial productivity
    boost for software developers compared to
    programming languages such as C or C++. In
    addition, developers using the Java platform can
    create sophisticated programs that are reusable
    and maintainable compared to programs written
    with scripting languages. Using XML and Java
    together, developers can build sophisticated,
    interoperable Web applications more quickly and
    at a lower cost.

16
HTML
  • The most popular markup language
  • Defines a fixed set of tags
  • Designed for presentation of data
  • HTML documents are processed by an HTML processing
    application (browser)
  • Easy to implement and author, e.g., small number of
    tags, forgiving syntax checking
  • No formal validation
  • Does not support semantic search
  • Not suited to complex documents

17
HTML vs XML
  • Fast becoming the standard for data interchange
    on the web.
  • Extensible Markup Language (XML) is closely
    related to HTML, the original document
    representation of the WWW. While HTML enables
    the creation of Web pages that can be viewed on
    any browser, XML adds tags to data so that it can
    be processed by any application.
  • Using XML, companies can separate the business
    rules from the content and structure of the data.
    By focusing on exchanging data content and
    structure, the trading partners are free to
    implement their own business rules, which can be
    quite distinct from one another.
  • Custom tags: like defining field names for a data
    structure. Applications that share data must
    agree upon the same XML tag names.

18
HTML vs XML
19
XML Standards
  • XML, DTD
  • XSL, XSLT, XPath
  • DOM, SAX
  • W3C XML Schema
  • Namespaces
  • XLink, XPointer
  • XHTML
  • XQL

20
Domain Specific XML Standards
  • Chemical - CML
  • 2D Graphics - SVG
  • Math - MathML
  • Music - MusicML
  • Travel - OTA
  • Many more ...
  • http://xml.org/xmlorg_registry/index.shtml
  • FIXML

21
Core Java APIs for XML
  • JAXP: Parsing and Transforming
  • JAXB: High-level XML programming
  • JAXM: Messaging
  • JAXR: Registry APIs
  • JDOM: Java-optimized Parsing

22
E-Commerce Standards
  • ebXML
  • UDDI (Universal Description, Discovery and
    Integration)
  • SOAP (Simple Object Access Protocol)
  • W3C XP (XML Protocol)
  • WSDL (Web Services Description Language)
  • S2ML (Security Services ML)
  • XAML (Transaction Authority ML)

23
XML COMPONENTS
24
XML Document Components
  • Processing Instruction
  • Elements and Attributes
  • Empty Tags
  • Comments
  • Special Characters
  • Entity References
  • CDATA
  • Whitespaces
  • Namespaces
  • XPath, XLink, XPointer

25
The XML Prolog: XML Declaration
  • The prolog is the part of an XML document that
    precedes the XML data. It includes the
    declaration and an optional DTD.
  • An XML file always starts with a prolog. The
    minimal prolog contains a declaration that
    identifies the document as an XML document:
  • <?xml version="1.0"?>
  • <?xml version="1.0" encoding="ISO-8859-1"
    standalone="yes"?>
  • version: Identifies the version of the XML markup
    language used in the data. This attribute is not
    optional.
  • encoding: Identifies the character set used to
    encode the data. "ISO-8859-1" is "Latin-1", the
    Western European and English language character
    set. (The default is compressed Unicode: UTF-8.)
  • standalone: Tells whether or not this document
    references an external entity or an external data
    type specification.
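
As a minimal sketch of how a parser uses the declaration, the snippet below parses a byte string whose declaration names ISO-8859-1. It uses Python's standard xml.etree.ElementTree module; the note element and its content are made up for illustration and are not from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the declaration tells the parser how the bytes
# that follow are encoded.
raw = b'<?xml version="1.0" encoding="ISO-8859-1"?><note>caf\xe9</note>'

root = ET.fromstring(raw)

# The byte 0xE9 is decoded as Latin-1, giving the character U+00E9.
note_text = root.text
print(ascii(note_text))
```

Without the encoding attribute the parser would assume UTF-8, and the lone 0xE9 byte would not be valid input.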

26
Processing Instruction
  • PIs give commands or information to an
    application that is processing the XML data.
  • <?target instructions?>
  • The target is the name of the application that is
    expected to do the processing, and instructions
    is a string of characters that embodies the
    information or commands for the application to
    process.
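
A small sketch of how an application might read processing instructions, using Python's standard xml.dom.minidom. The xml-stylesheet instruction and the report element are illustrative assumptions, not part of the slides.

```python
from xml.dom import minidom

# Hypothetical document with one processing instruction before the root.
doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?>'
    '<report/>'
)

# Collect (target, instructions) pairs for every PI at document level.
pis = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]
print(pis)
```

The XML declaration itself is not reported as a processing instruction, even though it uses the same delimiters.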

27
Elements
  • XML tags usually surround an identified object in
    the data stream. A start-tag and an end-tag,
    together with the data enclosed by them,
    represent an element. The start-tag is delimited
    using the < and > characters. The end-tag is
    delimited by </ and >.
  • It is this ability for one tag to contain others
    that gives XML its ability to represent
    hierarchical data structures.

28
Elements
  • Every XML file defines exactly one element, known
    as the root element. Any other elements in the
    file are contained within that element.
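
The nesting of elements under a single root can be sketched with Python's standard xml.etree.ElementTree. The message document below is a made-up example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <message> is the root and contains the others.
root = ET.fromstring(
    "<message><to>you@example.com</to>"
    "<body><para>Hello</para></body></message>"
)

def outline(element, depth=0):
    """Return the element hierarchy as a list of indented tag names."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(root)
print("\n".join(tree_lines))

# A second top-level element is not allowed, so the parser rejects it.
try:
    ET.fromstring("<a/><b/>")
    second_root_accepted = True
except ET.ParseError:
    second_root_accepted = False
print(second_root_accepted)
```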

29
Attributes
  • An attribute is additional information about an
    element, included as part of the tag itself,
    within the tag's angle brackets. It consists of an
    attribute name and an attribute value. The name
    precedes the value, which is enclosed in quotes
    (" or ') and separated from the name by an equals
    sign.
  • There must be at least one space between the
    element name and the first attribute. Multiple
    attributes are separated by spaces.
  • Since you could design a data structure like
    <message> equally well using either attributes or
    tags, it can take a considerable amount of
    thought to figure out which design is best for
    your purposes.
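
To make the design choice concrete, here is a sketch of the same data written once with attributes and once with child elements, read back with Python's standard xml.etree.ElementTree. The to and subject names are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical designs for the same message data.
with_attributes = ET.fromstring(
    '<message to="you@example.com" subject="XML is cool"/>'
)
with_elements = ET.fromstring(
    "<message><to>you@example.com</to>"
    "<subject>XML is cool</subject></message>"
)

# Attribute values are read with get(); child element text with findtext().
subject_a = with_attributes.get("subject")
subject_e = with_elements.findtext("subject")
print(subject_a == subject_e)
```

Both forms carry the same information; only the element form can later grow substructure, such as markup inside the subject.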

30
Elements and their Content
element
element type
<bibliography>
  <paper ID="object-fusion">
    <authors>
      <author>Y.Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <booktitle>VLDB 96</booktitle>
  </paper>
</bibliography>
element content
empty element
character content
31
Element Attributes
Attribute name
<bibliography>
  <paper pid="object-fusion">
    <authors>
      <author>Y.Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <booktitle>VLDB 96</booktitle>
  </paper>
</bibliography>
Attribute Value
32
Empty Tags
  • Sometimes, we might want to add a "flag" tag that
    marks a message as important. A tag like that
    doesn't enclose any content, so it's known as an
    "empty tag". We can create an empty tag by ending
    it with /> instead of >.
  • The empty tag saves you from having to code
    <flag></flag> in order to have a well-formed
    document.
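
A quick sketch, using Python's standard xml.etree.ElementTree, suggesting that the empty-tag form and the start-tag/end-tag form parse to the same thing. The message wrapper element is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<message><flag/></message>")
long_form = ET.fromstring("<message><flag></flag></message>")

def describe(message):
    """Return (tag, text, number of children) for the flag element."""
    flag = message.find("flag")
    return (flag.tag, flag.text, len(flag))

print(describe(short_form) == describe(long_form))
```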

33
Comments in XML Files
  • XML comments look just like HTML comments:
    <!-- This is a comment -->
  • They do not appear in published output.

34
Handling Special Characters
  • In XML, an entity is an XML structure (or plain
    text) that has a name. Referencing the entity by
    name causes it to be inserted into the document
    in place of the entity reference. To create an
    entity reference, the entity name is surrounded
    by an ampersand and a semicolon:
  • &entityName;
  • Predefined Entities: &lt; (<), &gt; (>),
    &amp; (&), &apos; ('), &quot; (")
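
When text is generated by a program, the predefined entities are usually produced by a helper rather than typed by hand. A minimal sketch with Python's standard xml.sax.saxutils follows; the sample string is made up.

```python
from xml.sax.saxutils import escape, quoteattr, unescape

text = 'if (a < b && b > c) print "ok"'

# escape() replaces &, < and > with their entity references.
escaped = escape(text)
print(escaped)

# Quotes only matter inside attribute values; quoteattr() returns the
# value already wrapped in a quoting style that is safe for it.
attr = quoteattr('say "hi"')
print(attr)

# unescape() reverses escape().
restored = unescape(escaped)
print(restored == text)
```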

35
Using Entity Reference in an XML Document
  • The problem with putting that line into an XML
    file directly is that when the parser sees the
    left-angle bracket (<), it starts looking for a
    tag name, which throws off the parse. To get
    around that problem, you put &lt; in the file,
    instead of "<".

XML File
XML Output
36
Handling Text with XML-Style Syntax
  • When you are handling large blocks of XML or HTML
    that include many of the special characters, it
    would be inconvenient to replace each of them
    with the appropriate entity reference. For those
    situations, you can use a CDATA section.
  • All white space in a CDATA section is
    significant, and characters in it are not
    interpreted as XML.
  • <![CDATA[ ............. ]]>
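
A short sketch with Python's standard xml.etree.ElementTree, comparing a CDATA section with the equivalent entity-escaped text. The example element and its content are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# The CDATA section holds markup-like text verbatim.
cdata_doc = "<example><![CDATA[<b>bold</b> & more]]></example>"
cdata_text = ET.fromstring(cdata_doc).text
print(cdata_text)

# The same content written with entity references instead.
escaped_doc = "<example>&lt;b&gt;bold&lt;/b&gt; &amp; more</example>"
escaped_text = ET.fromstring(escaped_doc).text
print(cdata_text == escaped_text)
```

To the application both forms deliver the same character data; CDATA is a convenience for the author.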

37
Handling Text with XML-Style Syntax
XML File
XML Output
38
Document: Prolog, Body, Epilog
<?xml version="1.0"?>
<!-- comments and processing instructions -->
<!DOCTYPE sdsc_play_groups SYSTEM "http://localserver/spg.dtd">
<!-- comments and processing instructions -->
<sdsc_play_groups>
  <play_group ID="Data-issues">
    <member_groups>
      <group>Scientific Computing</group>
      <group>Data Intensive Computing</group>
      <group>Security Technologies</group>
    </member_groups>
    <charter source="XPG"/>
    <url>http://www.sdsc.edu/marciano/XML/xpg.html</url>
    <title>XML Play Group</title>
  </play_group>
</sdsc_play_groups>
<!-- comments and processing instructions -->
39
White Space
  • The XML specification normalizes different
    line-ending conventions to a single convention
    but preserves all other white space, except in
    attribute values.
  • White Space and the XML Declaration: According
    to the current XML 1.0 standard, white space is
    not allowed before the XML declaration. If white
    space appears before it, the declaration is no
    longer recognized as an XML declaration: a
    conforming parser reports an error, and a lenient
    one may treat it as a processing instruction, in
    which case its information, particularly the
    encoding, may not be used by the parser.
  • White Space in Element Content: XML parsers are
    required to report all white space that appears
    in element content within a document. For this
    reason, the following three documents are
    different to an XML parser.
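
The slide's three documents did not survive in this transcript. As a stand-in, here is a sketch with three made-up documents that differ only in white space inside the element, parsed with Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Three hypothetical documents that differ only in white space.
docs = [
    "<note>text</note>",
    "<note> text </note>",
    "<note>\n  text\n</note>",
]

texts = [ET.fromstring(d).text for d in docs]
for t in texts:
    print(repr(t))
```

The parser hands all of the white space to the application, which then decides whether it is significant.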

40
White Space
  • White Space in Attributes: Although XML
    processors preserve all white space in element
    content, they frequently normalize it in
    attribute values. Tabs, carriage returns, and
    spaces are reported as single spaces. In certain
    types of attributes, they trim white space that
    comes before or after the main body of the value
    and reduce white space within the value to single
    spaces. If a DTD is available, this trimming
    will be performed on all attributes that are not
    of type CDATA. If there is no DTD, the parser
    assumes that all attributes are of type CDATA.
  • For the above example, an XML parser reports
    both attribute values as "this is a note.",
    converting the line breaks to single spaces.
  • End of Line Handling: XML processors treat the
    character sequence Carriage Return-Line Feed
    (CRLF) like single CR or LF characters. All are
    reported as a single LF character.
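
Both normalizations can be observed with a sketch like the following, which uses Python's standard xml.etree.ElementTree. The note and lines elements are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A line break and a tab inside an attribute value.
note = ET.fromstring('<note text="this is\na\tnote."/>')
attr_value = note.get("text")
print(repr(attr_value))

# CRLF and a lone CR inside element content.
lines = ET.fromstring("<lines>one\r\ntwo\rthree</lines>")
content = lines.text
print(repr(content))
```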

41
Namespaces
By using XML namespaces, authors can qualify
element names uniquely on the Web and thus avoid
conflicts between elements that have the same
name. Associating a Uniform Resource Identifier
(URI) with a namespace is purely to ensure that
two elements with the same name can remain
unambiguous; it does not matter what, if
anything, the URI points to.
42
Identifying Vocabularies XML Namespaces
  • My element may not be your element
  • geometry context: <element>line</element>
  • chemistry context: <element>oxygen</element>
  • SGML/XML context: ....
  • use XML namespaces to identify the vocabulary

43
XML Namespaces
  • mechanism for globally unique tag names

<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
  <h:head><h:title>Book Review</h:title></h:head>
  ...
  <xdc:bookreview>
    <xdc:title>XML: A Primer</xdc:title>
  ...
</h:html>

  • mix of different tag vocabularies without
    confusion
  • namespaces only identify the vocabulary;
    additional mechanisms are required for the
    structure and meaning of tags
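
A sketch of how an application can tell the two title elements apart, using Python's standard xml.etree.ElementTree. The document follows the slide's example; the h:body wrapper and the closing xdc:bookreview tag are assumptions filled in so that the fragment parses.

```python
import xml.etree.ElementTree as ET

doc = """
<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
  <h:head><h:title>Book Review</h:title></h:head>
  <h:body>
    <xdc:bookreview>
      <xdc:title>XML: A Primer</xdc:title>
    </xdc:bookreview>
  </h:body>
</h:html>
"""

root = ET.fromstring(doc)

# Prefixes are local shorthand; the URI is what identifies the vocabulary.
ns = {
    "h": "http://www.w3.org/HTML/1998/html4",
    "xdc": "http://www.xml.com/books",
}

# ElementTree stores each name as {namespace-URI}local-name.
print(root.tag)

# Both vocabularies have a title; the prefix map tells them apart.
html_title = root.findtext("h:head/h:title", namespaces=ns)
book_title = root.findtext(".//xdc:title", namespaces=ns)
print(html_title, "/", book_title)
```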

44
XPath, XLink, and XPointer
  • XPath
      • a declarative language for locating nodes and
        fragments in XML trees
      • used in both XPointer (for addressing) and XSL
        (for pattern matching)
  • XLink
      • a generalization of the HTML link concept
      • higher abstraction level (intended for general
        XML - not just hypertext)
      • more expressive power (multiple destinations,
        special behaviours, out-of-line links, ...)
      • uses XPointer to locate resources
  • XPointer
      • an extension of XPath suited for linking
      • specifies connection between XPath expressions
        and URIs

45
XML Path Language XPath
  • W3C Recommendation, Nov. 1999
  • for addressing parts within an XML document
  • (non-XML) syntax used for XSLT and XPointer
  • Find the root element (bookstore) of this
    document:
      • /bookstore
  • Find all author elements anywhere within the
    current document:
      • //author
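
The two expressions can be tried with Python's standard xml.etree.ElementTree, which supports only a limited subset of XPath. The bookstore document below is a made-up example.

```python
import xml.etree.ElementTree as ET

# Hypothetical bookstore document.
root = ET.fromstring(
    "<bookstore>"
    "<book><title>A</title><author>Ann</author></book>"
    "<book><title>B</title><author>Bob</author><author>Cy</author></book>"
    "</bookstore>"
)

# /bookstore selects the root element, which fromstring() returns.
print(root.tag)

# //author selects all author elements anywhere in the document.
# ElementTree needs the relative form ".//author" when the search
# starts from an element.
authors = [a.text for a in root.findall(".//author")]
print(authors)
```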

46
XML Linking Language (XLink)
  • W3C Candidate Recommendation, July 2000
  • language for typed links between documents
  • extends the simple untyped href links in HTML
      • multidirectional links
      • any element can be the source (not just <a ...>
        </a>)
      • link to arbitrary positions within a document
        (via URIs and XPointer)
  • richer custom applications possible
      • xlink:type declaration: simple, extended,
        locator, arc
      • optional "semantic attributes": role, arcrole,
        title
  • Example

<author xmlns:xlink="... "
        xlink:href="....itmaven.com/peter.html"
        xlink:title="Peter's homepage"
        xlink:role="further info about the book author">
  Peter Pan Sr.
</author>
47
XML Pointer Language (XPointer)
  • W3C Candidate Recommendation, June 2000
  • for locating internal structures of XML documents
  • XLink URIs can include XPointer parts
  • extends HTML's named anchors
      • target doc: <a name="target"> ... </a>
      • source doc: <a href="#target">...</a>
  • ... and selects via XPath expressions
  • some extensions (points and ranges, ...)
  • Example
      • intro/14/3 ("intro" is an ID attribute value)
      • /1/2/5/14/3
      • xpointer(id("chap1"))
        xpointer(//*[@id="chap1"])

48
Four Common Errors
The XML syntax is very strict: elements must have
both a start and an end tag, or they must use the
special empty element tag; attribute values must
be fully quoted; there can be only one top-level
element; and so on. A strict syntax was a design
goal for XML. The browser vendors asked for it.
HTML is very lenient, and HTML browsers accept
anything that looks vaguely like HTML. It might
have helped with the early adoption of HTML but
now it is a problem. Studies estimate that more
than 50% of the code in a browser deals with
errors or the sloppiness of HTML authors.
Consequently, an HTML browser is difficult to
write, which has slowed competition and makes
for mega-downloads. The four most common errors
in writing XML code are:
  • Forget End Tags
  • Forget that XML Is Case-Sensitive
  • Introduce Spaces in the Name of Element
  • Forget the Quotes for Attribute Value

49
Four Common Errors (cont)
  • Forget End Tags

<address>
  <street>34 Fountain Square Plaza
  <region>OH</region>
  <postal-code>45202</postal-code>
  <locality>Cincinnati</locality>
  <country>US
</address>

  • Forget that XML Is Case-Sensitive

<tel>513-744-7098</tel>
<TEL>513-744-7098</TEL>
<tel>513-744-7098</TEL>

  • Introduce Spaces in the Name of Element

<address book>
  <entry>
    <name>John Doe</name>
    <email href="mailto:jdoe@emailaholic.com"/>
  </entry>
</address book>

  • Forget the Quotes for Attribute Value

<tel preferred=true>513-744-8889</tel>
<tel preferred=true>513-744-8889</tel>
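
A strict parser rejects each of these mistakes. The sketch below feeds one shortened snippet per error to Python's standard xml.etree.ElementTree; the snippets are adapted from the slide's examples, and the helper name is_well_formed is an assumption.

```python
import xml.etree.ElementTree as ET

# One snippet per error from the list above.
broken = {
    "missing end tag":
        "<address><street>34 Fountain Square Plaza</address>",
    "case mismatch":
        "<tel>513-744-7098</TEL>",
    "space in element name":
        "<address book></address book>",
    "unquoted attribute value":
        "<tel preferred=true>513-744-8889</tel>",
}

def is_well_formed(text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

results = {name: is_well_formed(text) for name, text in broken.items()}
for name, ok in results.items():
    print(name, "->", "accepted" if ok else "rejected")

# The corrected form of the last snippet parses without error.
fixed_ok = is_well_formed('<tel preferred="true">513-744-8889</tel>')
```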
50
XML Document Map
XML Declaration
Processing Instruction
DOCTYPE Declaration
Comment
Root Element
Namespace
Element
Entity Reference
Start Tag
End Tag
CDATA Section
Attribute
Textual Content
51
DOCUMENT TYPE DEFINITION (DTD)
52
XML Document Types
  • Well-formed XML Document
      • Conforms to the basic XML syntax
      • Can be parsed without regard to the DTD
  • Valid XML Document
      • Well-formed
      • Conforms to its DTD

53
Document Type Definition (DTD)
  • First and most importantly, a DTD can define a
    class of documents. Classes are very powerful
    concepts in programming, because a class has an
    expected structure, which means it will have
    consistent behaviors and properties. You should
    also be able to carry out certain predefined
    operations on a class of documents.
  • If you use a DTD you can force a writer to
    include certain elements. You can't force them
    to put #PCDATA content in the element, but that's
    another story.
  • If you are planning to display a document using a
    style sheet, a DTD will ensure that you do not
    include elements that do not have any display
    instructions.
  • If you are planning to search the document, or
    otherwise manipulate it using the DOM, a rigid
    structure will simplify your coding and speed up
    the execution of your code.

54
In Search of the Lost Structure & Semantics
How do I share structure and metadata/semantics
with my community?
How do I learn and use the element
structure of a document?
How do I make all this automatable?
55
Adding Structure and Semantics
  • XML Document Type Definitions (DTDs)
      • define the structure of "allowed" documents
        (i.e., valid wrt. a DTD)
      • ≈ database schema
      • => improve query formulation, execution, ...
  • XML Schema
      • defines structure and data types
      • allows developers to build their own libraries
        of interchanged data types
  • XML Namespaces
      • identify your vocabulary

56
XML DTDs as Extended Context Free Grammars
XML DTD
<!ELEMENT bibliography (paper*)>
<!ELEMENT paper (authors,fullPaper?,title,booktitle)>
<!ELEMENT authors (author+)>
Grammar
lhs = element (name); rhs = regular expression
over elements + strings (#PCDATA)
57
Document Type Definitions (DTDs)
Define and Constrain Element Names & Structure
<!ELEMENT bibliography (paper*)>
<!ELEMENT paper (authors, fullPaper?, title, booktitle)>
<!ELEMENT authors (author+)>
<!ELEMENT author (#PCDATA)>
<!ATTLIST author age CDATA #IMPLIED>
<!ELEMENT fullPaper EMPTY>
<!ELEMENT title (#PCDATA)>
<!ELEMENT booktitle (#PCDATA)>
Element Type Declaration
Attribute List Declaration
58
Element Declarations
Sequence of 0 or more papers
Authors followed by optional fullPaper, followed
by title, followed by booktitle
<!ELEMENT bibliography (paper*)>
<!ELEMENT paper (authors, fullPaper?, title, booktitle)>
<!ELEMENT authors (author+)>
<!ELEMENT author (#PCDATA)>
<!ATTLIST author age CDATA #IMPLIED>
<!ELEMENT fullPaper EMPTY>
<!ELEMENT title (#PCDATA)>
<!ELEMENT booktitle (#PCDATA)>
Sequence of 1 or more authors
Character content
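
Because each content model is a regular expression over child element names, it can be checked with an ordinary regex engine. The sketch below is not a DTD validator: the models are translated by hand into Python regular expressions (the CONTENT_MODELS table and is_valid helper are assumptions), and attributes and text content are ignored.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the content models: each child name is followed by
# one space, so "(paper )*" stands for zero or more paper children.
CONTENT_MODELS = {
    "bibliography": r"(paper )*",
    "paper": r"authors (fullPaper )?title booktitle ",
    "authors": r"(author )+",
    "author": r"",      # #PCDATA: no child elements
    "fullPaper": r"",   # EMPTY
    "title": r"",
    "booktitle": r"",
}

def is_valid(element):
    """Check an element and all its descendants against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return False  # undeclared element type
    children = "".join(child.tag + " " for child in element)
    if re.fullmatch(model, children) is None:
        return False
    return all(is_valid(child) for child in element)

good = ET.fromstring(
    "<bibliography><paper><authors><author>A</author></authors>"
    "<title>T</title><booktitle>B</booktitle></paper></bibliography>"
)
bad = ET.fromstring(
    "<bibliography><paper><authors><author>A</author></authors>"
    "<booktitle>B</booktitle></paper></bibliography>"
)
print(is_valid(good), is_valid(bad))
```

The second document is rejected because the paper element lacks the required title child.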
59
Element Content Declarations
60
Attribute Types (DTD)
Type             Meaning
ID               Token unique within the document
IDREF            Reference to an ID token
IDREFS           Reference to multiple ID tokens
ENTITY           External entity (image, video, ...)
ENTITIES         External entities
CDATA            Character data
NMTOKEN          Name token
NMTOKENS         Name tokens
NOTATION         Data other than XML
Enumeration      Choices
Conditional Sec  INCLUDE / IGNORE declarations
Attributes may be #REQUIRED or #IMPLIED (optional);
they can have default values, which may be
#FIXED.
61
Attribute Types (DTD)
62
Attribute-Specification Parameters
63
Attribute Declarations
<!ELEMENT bibliography (paper*)>
<!ELEMENT paper (authors, fullPaper?, title, booktitle)>
<!ELEMENT authors (author+)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT fullPaper EMPTY>
<!ELEMENT title (#PCDATA)>
<!ELEMENT booktitle (#PCDATA)>
<!ATTLIST fullPaper source ENTITY #REQUIRED>
<!ATTLIST person pid ID #IMPLIED>
<!ATTLIST author authorRef IDREF #IMPLIED>
Pointer (IDREF) and target (ID) declarations
for intra-document pointers
64
XML Attribute
<person pid="joyce"> </person>
<bibliography>
  <paper pubid="wsa" role="publication">
    <authors>
      <author authorRef="joyce" age="???"> J. L. R. Colina </author>
    </authors>
    <fullPaper source="http://...confusion"/>
    <title>Object Confusion in a Deviator System </title>
    <related papers="deviation101 x_deviators"/>
  </paper>
</bibliography>
Object Identity Attribute
CDATA (character data)
IDREF: intra-document reference
Reference to external ENTITY
65
XML Attribute
66
Uses of XML Entities
  • Physical partition
      • size, reuse, "modularity", ... (both XML docs &
        DTDs)
  • Non-XML data
      • unparsed entities → binary data
  • Non-standard characters
      • character entities
  • Shorthand for phrases & markup
      • => effectively are macros

67
Types of Entities
  • Internal (to a doc) vs. External (→ use URI)
  • General (in XML doc) vs. Parameter (in DTD)
  • Parsed (XML) vs. Unparsed (non-XML)

68
Internal Text Entities
DTD
Internal Text Entity Declaration
<!ENTITY WWW "World Wide Web">
XML
Entity Reference
<p>We all use the &WWW;.</p>
Logically equivalent to actually appearing
<p>We all use the World Wide Web.</p>
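
The expansion can be observed directly. This sketch places the slide's entity declaration in an internal DTD subset (the DOCTYPE wrapper is an assumption) and parses it with Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE p [
  <!ENTITY WWW "World Wide Web">
]>
<p>We all use the &WWW;.</p>"""

# The parser replaces the reference with the entity's replacement text.
expanded = ET.fromstring(doc).text
print(expanded)
```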
69
Entities Physical Structure
Mylife.xml
A logical element can be split into multiple
physical entities
DTD...
<mylife>
Chap1.xml
<teen>yada yada </teen>
Chap2.xml
<adult>blah blah.. </adult>
</mylife>
70
External Text Entities
DTD
External Text Entity Declaration
<!ENTITY chap1 SYSTEM "http://...chap1.xml">
URL
Entity Reference
XML
<mylife> &chap1; &chap2;</mylife>
Logically equivalent to inlining file contents
<mylife> <teen>yada yada</teen> <adult> blah
blah</adult> </mylife>
71
Unparsed ( "Binary") Entities
DTD
Declare external ... and unparsed entity
<!ENTITY fusion SYSTEM "http://... fusion.ps"
NDATA ps>
Declare attribute type to be entity
<!ATTLIST fullPaper source ENTITY #REQUIRED>
XML
Element with ENTITY attribute
<fullPaper source="fusion"/>
NOTATION declaration (helper app)
<!NOTATION ps SYSTEM "ghostview.exe">
72
Pure XML Model (DTD)
  • Any DTD myDTD defines a language valid(myDTD)
  • valid(myDTD) = { docs D | D is valid wrt. myDTD }
  • <!ELEMENT A (B,C*)>
  • <!ELEMENT B (#PCDATA)>

Content ("container") model: A contains one B,
followed by any number of Cs
B is a leaf, contains actual data
<A> <B>foo</B> <C>bar</C> <C>lab</C> </A>
73
Data Modeling with DTDs
  • XML element types ≈ "object types"
  • content model for children elements ≈ "subobject
    structure"
  • recursive types (container analogy!?)
      • <!ELEMENT A (B|C)> "an A can contain a B..."
      • <!ELEMENT B (A|C)> "... which contains an A!"
      • <!ELEMENT C (#PCDATA)>
      • found in doc world: document DIVision (generic
        block-level container)
  • loose typing
      • <!ELEMENT A ANY> "so what's in the box,
        please??"
  • no context-sensitive types
      • DTDs cannot distinguish between the publisher in
      • <journal> <publisher>... </publisher> </journal>
      • <website> <publisher> ... </publisher> </website>
      • => renaming hack: <j_pub> and <w_pub>
      • => DTD extensions (XML Schema)

74
Where is the Data??
  • Actual data can go into leaf elements and/or
    attributes
  • Common/good practice (!?)
      • XML element ≈ container (object)
      • XML element type (tag) ≈ container (object) type
      • XML attribute ≈ properties of the container as a
        whole ("metadata")
      • XML leaf elements contain actual data
  • Problems with DTDs
      • no data types
      • no specialization/extension of types
      • no "higher level" modeling (classes,
        relationships, constraints, etc.)

75
Extending DTDs Data Modeling Approaches
  • XML main stream: XML Schema
      • data types
      • user-defined types, type extensions/restrictions
        ("subclassing")
      • cardinality constraints
  • XML side streams
      • RELAX (REgular Language description for XML),
        SOX (Schema for Object-Oriented XML),
        Schematron, ...
  • alternative approach
      • use well-established data modeling formalisms
        like (E)ER, UML, ORM, OO models, ...
      • ... and just encode them in XML!
      • e.g., UML: XMI (standardized, has much more
        => big), UXF (UML eXchange Format)

76
How to use DTD?
  • A DTD can be declared internally (local subset)
    or in an external file.
  • A data element: it contains only sub-elements,
    with no intervening text.
  • A document element: it is defined to include both
    text and sub-elements.