XML Fundamentals - PowerPoint PPT Presentation

About This Presentation
Slides: 23
Provided by: ccNct


1
XML Fundamentals
2
XML Documents and XML Files
  • An XML document contains text, never binary data
  • An XML document can be opened with any program
    that knows how to read a text file
  • XML files are usually given a .xml file name
    extension
  • MIME media type: application/xml or text/xml

<person> Alan Turing </person>
3
Elements, Tags, and Character Data
  • The previous example is composed of a single
    element named person
  • Start-tag: <person>
  • End-tag: </person>
  • Everything between the start-tag and the end-tag
    is called content
  • Content holds the actual information
  • Whitespace is part of the content, though many
    applications will choose to ignore it
  • <person> and </person> are markup
  • Alan Turing and its surrounding whitespace are
    character data
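To see how a parser separates markup from character data, here is a minimal sketch in Python. The use of Python's standard-library xml.etree.ElementTree is an assumption of this example; the slides do not name a parser. It parses the person element and shows that the surrounding whitespace is kept as part of the content.

```python
import xml.etree.ElementTree as ET

# The single-element document from the previous slide
doc = "<person> Alan Turing </person>"
root = ET.fromstring(doc)

print(root.tag)                 # person: the element name taken from the tags
print(repr(root.text))          # ' Alan Turing ': character data, whitespace included
print(repr(root.text.strip()))  # 'Alan Turing': what many applications keep
```

The parser itself never drops the whitespace; ignoring it is a choice left to the application.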

4
Tag Syntax
  • Tag syntax is like that of HTML tags
  • Start-tags begin with < and end-tags begin with
    </
  • Both start-tags and end-tags are followed by
    the name of the element and are closed by >
  • You are allowed to make up new XML tags
  • Tag names generally reflect the type of content
    inside the element, not how that content will be
    formatted
  • XML is case sensitive
  • <Person> ≠ <PERSON> ≠ <person>
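Case sensitivity can be checked directly with a parser. This minimal Python sketch assumes the standard-library xml.etree.ElementTree; the helper name is_well_formed is our own, not part of any library. A start-tag and end-tag that differ only in case do not match.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the string parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Start- and end-tag names must match exactly, including case
print(is_well_formed("<person>Alan Turing</person>"))  # True
print(is_well_formed("<Person>Alan Turing</person>"))  # False: mismatched tag
print(is_well_formed("<PERSON>Alan Turing</PERSON>"))  # True, but a different element
```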

5
Empty Element
  • Empty element: an element that has no content
  • Often used only for the values of its attributes
  • Example
  • <email href="mailto:..."></email>
  • Shorthand notation
  • <email href="mailto:..." />
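The long and shorthand forms of an empty element produce the same element after parsing. This is a minimal Python sketch using the standard-library xml.etree.ElementTree (our choice of tool); the address someone@example.com is hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical address, used only for illustration
long_form = ET.fromstring('<email href="mailto:someone@example.com"></email>')
short_form = ET.fromstring('<email href="mailto:someone@example.com" />')

# Both forms yield the same element: same name, same attributes, no content
print(long_form.tag == short_form.tag)          # True
print(long_form.attrib == short_form.attrib)    # True
print(len(long_form), len(short_form))          # 0 0: no child elements
print(not long_form.text, not short_form.text)  # True True: no character data
print(short_form.get("href"))                   # mailto:someone@example.com
```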

6
XML Trees
  • Elements can contain other elements that in turn
    can contain text or elements and so on
  • Start and end tags must always be balanced, and
    children are always completely enclosed in their
    parents
  • <name><fname>Jack</fname><lname>Smith</name></lname>
  • Illegal
  • Relationships: parent, child, and sibling
  • Each element (except the root element) has
    exactly one parent element
  • An XML document is a tree of elements
  • Root (document) element: the first element in the
    document and the element that contains all other
    elements
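The tree relationships above can be inspected after parsing. This minimal Python sketch assumes the standard-library xml.etree.ElementTree; the parent_of dictionary is our own construction, since ElementTree elements do not store a link to their parent. It also shows that the overlapping version of the name example is rejected.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<name><fname>Jack</fname><lname>Smith</lname></name>")

# Map every child element to its (single) parent
parent_of = {child: parent for parent in root.iter() for child in parent}
fname = root.find("fname")

print(root.tag)               # name: the root element
print([c.tag for c in root])  # ['fname', 'lname']: children of name, siblings of each other
print(parent_of[fname].tag)   # name
print(root in parent_of)      # False: the root element has no parent

# Overlapping tags are not a tree, so the parser refuses them
try:
    ET.fromstring("<name><fname>Jack</fname><lname>Smith</name></lname>")
except ET.ParseError as err:
    print("not well-formed:", err)
```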

7
A Tree Diagram for Example 2-2
8
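Tree of the Address Book
  (Tree diagram: the root element addressbook has two entry
  children. One entry contains name, address, two tel, and
  email elements; the other contains name, tel, and email.
  The name element contains fname and lname; the address
  element contains street, region, postal-code, locality,
  and country.)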
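A tree like the address book can be printed as an indented outline by walking the elements recursively. This is a minimal Python sketch using the standard-library xml.etree.ElementTree; the address book contents below are hypothetical data that only follow the element names in the diagram, and the outline helper is our own.

```python
import xml.etree.ElementTree as ET

# Hypothetical address book; only the element names come from the diagram
ADDRESS_BOOK = """<addressbook>
  <entry>
    <name><fname>Jack</fname><lname>Smith</lname></name>
    <address>
      <street>1 Example Road</street>
      <region>Some Region</region>
      <postal-code>00000</postal-code>
      <locality>Some Town</locality>
      <country>Some Country</country>
    </address>
    <tel>555-0100</tel>
    <tel>555-0101</tel>
    <email>jack@example.com</email>
  </entry>
  <entry>
    <name>Jill Jones</name>
    <tel>555-0102</tel>
    <email>jill@example.com</email>
  </entry>
</addressbook>"""

def outline(elem, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(ADDRESS_BOOK)
print("\n".join(outline(root)))
```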
9
Mixed Content
  • The dichotomy between elements that contain only
    character data and elements that contain only
    child elements is common in data-oriented XML
    documents
  • Mixed content: some elements may contain both
    sub-elements and raw character data
  • Common in XML documents containing articles,
    essays, stories, books, novels, reports, and web
    pages (document-oriented applications)
  • Example 2-3
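Mixed content is still parsed into a tree; the text simply sits between the child elements. This minimal Python sketch assumes the standard-library xml.etree.ElementTree and a small hypothetical paragraph. ElementTree stores the text that follows a child in that child's tail attribute, which is a design detail of this library rather than of XML itself.

```python
import xml.etree.ElementTree as ET

# A small hypothetical paragraph mixing text and a child element
para = ET.fromstring("<p>This is <em>mixed</em> content.</p>")
em = para.find("em")

print(repr(para.text))           # 'This is ': text before the first child
print(repr(em.text))             # 'mixed': text inside the child
print(repr(em.tail))             # ' content.': text after the child, still inside <p>
print("".join(para.itertext()))  # This is mixed content.
```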

10
Attributes
  • Attach additional information to elements
  • An attribute is a name-value pair attached to an
    element's start-tag
  • One element can have more than one attribute
  • Name and value are separated by an equals sign (=)
    and optional whitespace
  • The attribute value is enclosed in double or
    single quotation marks
  • <tel preferred="true">03-5712121</tel>
  • Attribute order is not significant
  • Example 2-4

<person born="1912-06-23" died="1954-06-07">
Alan Turing </person>
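Parsers expose attributes as an unordered name-value mapping. This minimal Python sketch assumes the standard-library xml.etree.ElementTree; the second document is our own variant of the Alan Turing example, with the attributes reversed and single quotes used, to show that neither order nor quote style changes the result.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring('<person born="1912-06-23" died="1954-06-07">Alan Turing</person>')
# Same attributes, written in the other order and with single quotes
b = ET.fromstring("<person died='1954-06-07' born='1912-06-23'>Alan Turing</person>")

print(a.get("born"))         # 1912-06-23
print(a.attrib == b.attrib)  # True: attribute order and quote style do not matter
print(a.text)                # Alan Turing
```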
11
Attributes and Elements
  • When should one use child elements, and when
    attributes, to hold information?
  • Attributes are for metadata about the element,
    while elements are for the information itself
  • Each element may have no more than one attribute
    with a given name
  • The value of an attribute is simply a text string
    with limited structure
  • An element-based structure is a lot more flexible
    and extensible
  • If you are designing your own XML vocabulary, it
    is up to you to decide when to use which

12
XML Names
  • Rules for naming elements and attributes
  • Names may contain essentially any alphanumeric
    character, including non-English letters, numbers,
    and ideograms
  • Names may contain underscore (_), period (.), and
    hyphen (-)
  • XML names may not contain whitespace of any kind
  • All names beginning with the string xml (in any
    combination of case) are reserved for
    standardization in W3C XML-related specifications
  • Names must start with a letter, an ideogram, or an
    underscore (_)
  • There is no limit to the name length
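A quick way to try the naming rules is to ask a parser whether <name/> is well-formed. This is a rough Python sketch, assuming the standard-library xml.etree.ElementTree; is_legal_name is a hypothetical helper, not a full validator. It does not enforce the reservation of names starting with xml, and a crafted string such as a b="c" would slip through.

```python
import xml.etree.ElementTree as ET

def is_legal_name(name: str) -> bool:
    """Rough probe: does <name/> parse? Not a complete name validator."""
    try:
        ET.fromstring(f"<{name}/>")
        return True
    except ET.ParseError:
        return False

for name in ["address-book", "AddressBook", "_id", "v1.2", "名前",
             "2nd", "-x", "first name"]:
    print(f"{name!r}: {is_legal_name(name)}")
# The first five are accepted; "2nd" and "-x" start with a character
# that cannot begin a name, and "first name" contains whitespace.
```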

13
XML Names (Cont.)
  • HTML element names are often written in uppercase
  • XML element names are frequently written in
    lowercase
  • When a name consists of several words, the words
    are usually separated by a hyphen (-)
  • address-book
  • OR
  • The first letter of each word is capitalized,
    with no separator character
  • AddressBook

14
Entity References
  • What if the character data inside an element
    contains a < character?
  • Entity reference: when an application parses an
    XML document, it replaces the entity reference
    with the actual characters to which the entity
    reference refers
  • Entity references are markup
  • XML predefines 5 entity references; you can
    define more
  • &lt; the less-than sign (<)
  • &amp; the ampersand (&)
  • &gt; the greater-than sign (>)
  • &quot; the straight, double quotation mark (")
  • &apos; the straight single quote (')
  • Example 2-5
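Escaping and parsing are the two halves of entity references. This minimal Python sketch assumes the standard library: xml.sax.saxutils.escape always replaces &, < and >, and the extra mapping passed to it adds the two quote entities. Parsing the escaped text with xml.etree.ElementTree turns each reference back into the character it stands for.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if a < b && c > \"d\" then 'e'"
# escape() always handles &, < and >; quotes are added via the extra mapping
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)
# if a &lt; b &amp;&amp; c &gt; &quot;d&quot; then &apos;e&apos;

# The parser replaces each entity reference with the character it stands for
elem = ET.fromstring(f"<expr>{escaped}</expr>")
print(elem.text == raw)  # True
```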

15
CDATA Sections
  • What if your character data contain many <, &,
    ', or " characters?
  • Enclose the character data in a CDATA section
  • <![CDATA[ ... ]]>
  • Everything inside a CDATA section is treated as
    raw character data, not markup
  • The only thing that cannot appear in a CDATA
    section is the CDATA section end delimiter ]]>
  • Example 2-6
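Inside a CDATA section the special characters need no escaping. This minimal Python sketch assumes the standard-library xml.etree.ElementTree and a hypothetical code snippet as content; note that ElementTree does not remember where the CDATA section was, so after parsing its content is ordinary text.

```python
import xml.etree.ElementTree as ET

# Inside CDATA, <, & and quotes need no escaping
doc = '<code><![CDATA[if (a < b && c > d) { s = "x"; }]]></code>'
code = ET.fromstring(doc)
print(code.text)  # if (a < b && c > d) { s = "x"; }
```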

16
Comments
  • XML documents can be commented so that coauthors
    can leave notes for each other and themselves
  • Begin with <!-- and end with the first occurrence
    of -->
  • The double hyphen -- must not appear anywhere
    inside the comment until the closing -->
  • Comments may appear anywhere in the character
    data of a document
  • Comments may appear before or after the root
    element
  • Comments may not appear inside a tag or inside
    another comment
  • Comments are strictly for making the raw source
    code of an XML document more legible to human
    readers
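The comment rules can be checked with a parser. This minimal Python sketch assumes the standard-library xml.etree.ElementTree, whose default tree builder discards comments; the helper parses is our own. Comments before and after the root element are accepted, while a double hyphen inside a comment or a comment inside a tag is rejected.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the text is a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

doc = ("<!-- before the root --><root><!-- a note for coauthors -->"
       "<child/></root><!-- after -->")
root = ET.fromstring(doc)
print([c.tag for c in root])  # ['child']: the default tree builder drops comments

print(parses("<root><!-- a -- b --></root>"))  # False: -- inside a comment
print(parses("<root <!-- note --> ></root>"))  # False: comment inside a tag
```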

17
The XML Declaration
  • XML documents should (but do not have to) begin
    with an XML declaration
  • The XML declaration must be the first thing in
    the document
  • It must not be preceded by anything, not even
    comments or whitespace
  • Example 2-7
  • Besides the required version, an XML declaration
    may specify encoding and standalone
  • encoding: specifies the character set used in the
    XML document
  • Defaults to UTF-8 (or UTF-16 when a byte order
    mark is present)
  • standalone: if the value is "no", then an
    application may be required to read an external
    DTD to determine the proper values for parts of
    the document
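The declaration's position and encoding both affect parsing. This minimal Python sketch assumes the standard-library xml.etree.ElementTree fed with bytes, so that the parser reads the encoding from the declaration; the word café and the helper parses are our own illustrations.

```python
import xml.etree.ElementTree as ET

def parses(data: bytes) -> bool:
    """Return True if the bytes form a well-formed XML document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

# Encoding declared in the XML declaration; the bytes are Latin-1, not UTF-8
latin1 = '<?xml version="1.0" encoding="ISO-8859-1"?><word>café</word>'.encode("iso-8859-1")
print(ET.fromstring(latin1).text)  # café

print(parses(b'<?xml version="1.0" encoding="UTF-8"?><note/>'))  # True
print(parses(b'<note/>'))                                        # True: declaration is optional
print(parses(b' <?xml version="1.0"?><note/>'))                  # False: whitespace comes first
print(parses(b'<!-- c --><?xml version="1.0"?><note/>'))         # False: a comment comes first
```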

18
Rules for Well-Formed XML
  • Rule 1: Mandatory closing tags
  • The set of tags is unlimited, but all container
    tags must have end tags
  • Example of legal XML
  • <person><name> Hao-Ren Ke </name><title>
    Associate Professor </title><age> 25
    </age></person>
  • Rule 2: There must be exactly one root element
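Rules 1 and 2 can be tried against the slide's example. This minimal Python sketch assumes the standard-library xml.etree.ElementTree; the helper check and the two broken variants are our own. The exact wording of the error messages comes from the parser and may vary between Python versions.

```python
import xml.etree.ElementTree as ET

def check(text):
    """Return 'ok' or the parser's error message."""
    try:
        ET.fromstring(text)
        return "ok"
    except ET.ParseError as err:
        return f"error: {err}"

legal = ("<person><name> Hao-Ren Ke </name><title> Associate Professor </title>"
         "<age> 25 </age></person>")
missing_end_tag = "<person><name> Hao-Ren Ke </name><title> Associate Professor </person>"
two_roots = "<person>Hao-Ren Ke</person><person>Claven</person>"

for text in (legal, missing_end_tag, two_roots):
    print(check(text))
```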

19
Rules for Well-Formed XML (Cont.)
  • Rule 3: Proper element nesting
  • All tags must be nested correctly. Like HTML, XML
    can intermix tags and text, but tags may not
    overlap each other.
  • Legal XML
  • <person>Hao-Ren Ke is a <role>pioneer</role> for
    <service>Computerized Interlibrary Loan</service>
    in Taiwan</person>
  • Illegal XML
  • <person><name>Claven</name><keypoint><hd>XML
    provides a data bus</hd></person><more></more>
    </keypoint>
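Running both nesting examples through a parser shows the difference. This minimal Python sketch assumes the standard-library xml.etree.ElementTree; is_well_formed is our own helper. The legal example intermixes text and tags and still parses; the illegal one closes person before keypoint.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text is a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

legal = ("<person>Hao-Ren Ke is a <role>pioneer</role> for "
         "<service>Computerized Interlibrary Loan</service> in Taiwan</person>")
illegal = ("<person><name>Claven</name><keypoint><hd>XML provides a data bus</hd>"
           "</person><more></more></keypoint>")

print(is_well_formed(legal))    # True: text and tags intermix, but nothing overlaps
print(is_well_formed(illegal))  # False: </person> closes before </keypoint>

# The intermixed text survives parsing
print("".join(ET.fromstring(legal).itertext()))
# Hao-Ren Ke is a pioneer for Computerized Interlibrary Loan in Taiwan
```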

20
Rules for Well-Formed XML (Cont.)
  • Rule 4: Attribute values must be single- or
    double-quoted
  • Legal
  • <tag attribute="value">
  • <tag attribute='value'>
  • Illegal
  • <font size=6>
  • Rule 5: An element may not have two attributes
    with the same name
  • Rule 6: Comments and processing instructions may
    not appear inside tags
  • Rule 7: No unescaped < or & signs may occur in
    the character data of an element or in attribute
    values
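Rules 4 through 7 can each be checked with one small document. This minimal Python sketch assumes the standard-library xml.etree.ElementTree; is_well_formed and the test documents are our own, and the slide's tags are written as empty elements so that each string is a complete document.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text is a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

cases = {
    "Rule 4, double quotes":      '<tag attribute="value"/>',
    "Rule 4, single quotes":      "<tag attribute='value'/>",
    "Rule 4, unquoted":           "<font size=6></font>",
    "Rule 5, duplicate name":     '<tel preferred="true" preferred="false"/>',
    "Rule 6, comment in tag":     "<tag <!-- note --> />",
    "Rule 7, raw < in text":      "<p>a < b</p>",
    "Rule 7, raw & in attribute": '<a title="R&D"/>',
    "Rule 7, escaped":            '<p title="R&amp;D">a &lt; b</p>',
}
results = {label: is_well_formed(text) for label, text in cases.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
# Only the two quoted forms and the escaped form are well-formed.
```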

21
Rules for Well-Formed XML (Cont.)
  • Rule 8: Empty elements must be closed, either with
    an end-tag or with the abbreviated empty-element
    syntax
  • Legal
  • <BR />
  • <HR />
  • <TITLE></TITLE> is equivalent to <TITLE/>
  • Illegal (never closed)
  • <BR>
  • <HR>
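The Rule 8 examples can be checked inside a small hypothetical paragraph. This minimal Python sketch assumes the standard-library xml.etree.ElementTree; is_well_formed is our own helper. Serializing the two TITLE forms shows that the parser treats them as the same element.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text is a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<p>line one<BR />line two</p>"))     # True
print(is_well_formed("<p>line one<BR></BR>line two</p>"))  # True: same element
print(is_well_formed("<p>line one<BR>line two</p>"))       # False: <BR> is never closed

# <TITLE></TITLE> and <TITLE/> serialize identically after parsing
print(ET.tostring(ET.fromstring("<TITLE></TITLE>")) ==
      ET.tostring(ET.fromstring("<TITLE/>")))              # True
```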

22
Four Common Errors
  • Forget End Tags
  • Forget that XML is Case Sensitive
  • Introduce Spaces in Element Names
  • Forget the Quotes for Attribute Value
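A parser reports where each of these four errors occurs. This minimal Python sketch assumes the standard-library xml.etree.ElementTree, whose ParseError carries a position attribute holding (line, column); diagnose and the sample documents are our own. The exact positions depend on the parser, so none are listed here.

```python
import xml.etree.ElementTree as ET

def diagnose(text):
    """Return None if well-formed, else the (line, column) the parser reports."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return err.position

samples = {
    "forgotten end tag":  "<person><name>Claven</person>",
    "case mismatch":      "<Person>Claven</person>",
    "space in name":      "<first name>Claven</first name>",
    "unquoted attribute": "<font size=6>big</font>",
    "correct":            '<font size="6">big</font>',
}
for label, text in samples.items():
    print(label, diagnose(text))
```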