1
XML Basics
  • From Chapter 31 of The XML Handbook by Goldfarb
    and Prescod

2
Content
  • Syntactic Details
  • Prolog vs. Instance
  • The Logical Structure
  • Elements
  • Attributes
  • The Prolog
  • Markup Miscellany

3
Syntax
  • The combination of characters that make up an XML
    document
  • We are talking about where you can put angle
    brackets, quote marks, ampersands, and other
    characters and where you cannot!

4
Case-Sensitivity
  • XML is case-sensitive.
  • XML is not case-prejudiced.
  • You are free to create your own names and text
    in upper case, lower case, or a mix. XML simply
    treats names that differ in case as different names.
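
A minimal sketch of what case-sensitivity means in practice. It uses Python's standard xml.etree.ElementTree parser, which is a choice made here for illustration and is not mentioned in the slides; the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Start-tag and end-tag use the same case: accepted.
same_case = is_well_formed("<memo>hello</memo>")
# Upper-case names are just as legal as lower-case ones.
upper_case = is_well_formed("<MEMO>hello</MEMO>")
# Case differs, so the end-tag does not match the start-tag.
mixed_case = is_well_formed("<Memo>hello</memo>")
```

Either case is fine, as long as the start-tag and end-tag of an element agree exactly.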

5
Markup and Data
<?xml version="1.0"?>
<!DOCTYPE MEMO SYSTEM "memo.dtd">
<memo>
<from>
  <name>Paul Prescod</name>
  <email>paprescod@prescod.com</email>
</from>
<to>
  <name>Charles Goldfarb</name>
  <email>charles@sgmlsource.com</email>
</to>
<subject>Another Memo Example</subject>
<body>
<paragraph>...</paragraph>
</body>
</memo>
  • Markup: to be understood by the XML processor
  • Character data: to be understood by other human
    beings

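
The split between markup and character data can be sketched with a parser. The example below is an illustration only: it assumes Python's standard xml.etree.ElementTree module and uses a shortened copy of the memo with the prolog left out.

```python
import xml.etree.ElementTree as ET

MEMO = """<memo>
<from><name>Paul Prescod</name></from>
<to><name>Charles Goldfarb</name></to>
<subject>Another Memo Example</subject>
</memo>"""

root = ET.fromstring(MEMO)

# Element names come from the markup (start-tags and end-tags).
element_names = [elem.tag for elem in root.iter()]

# itertext() yields only the character data between the tags;
# white-space-only chunks are dropped here to keep the list readable.
character_data = [t.strip() for t in root.itertext() if t.strip()]
```

The parser consumes the tags and hands the application the text that a human reader would care about.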
6
Markup and Data
  • Spec. Reference 31-1
  • Markup takes the form of start-tags, end-tags,
    empty-element tags, entity references, character
    references, comments, CDATA section delimiters,
    document type declarations, and processing
    instructions.

7
White Space
  • The invisible characters:
  • space (Unicode/ASCII 32),
  • tab (Unicode/ASCII 9),
  • carriage return (Unicode/ASCII 13) and
  • line feed (Unicode/ASCII 10).
  • You may put as many of these characters as you
    want in any combination, when the XML
    specification says that white space is allowed at
    a particular point.

White Space
[3] S ::= (#x20 | #x9 | #xD | #xA)+

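
As a rough illustration, the white space production can be written as a regular expression. This is a sketch in Python, not part of the slides; the function name is_xml_white_space is hypothetical.

```python
import re

# Production [3]: one or more of space, tab, carriage return, line feed.
S = re.compile(r"[\x20\x09\x0D\x0A]+")

def is_xml_white_space(text):
    """Return True if text consists only of XML white space characters."""
    return S.fullmatch(text) is not None
```

Note that characters such as the form feed or the no-break space look blank but are not white space in the XML sense.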
8
White Space
  • White space outside of markup is always
    preserved in XML.
  • White space within markup may be preserved,
    ignored, or sometimes combined in weird and
    wonderful ways.
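
The three behaviours can be observed with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module, chosen here only for demonstration; the element and attribute names are made up.

```python
import xml.etree.ElementTree as ET

# Extra white space inside the tags, and around the text content.
elem = ET.fromstring("<p   a = '1'\n   b='2'  >  two  spaces  </p  >")

# Outside markup: the character data keeps every space.
content = elem.text

# Inside tags: white space between attributes is not reported at all.
attributes = elem.attrib

# Inside an attribute value: a literal line break is turned into a space.
normalized = ET.fromstring("<p a='x\ny'/>").get("a")
```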

9
Names and Name Tokens
  • When using XML, you will have to give things
    names.

A Name is a token beginning with a letter or one
of a few punctuation characters, and continuing
with letters, digits, hyphens, underscores,
colons, or full stops, together known as name
characters. Names beginning with the string
"xml", or any string which would match (('X'|'x')
('M'|'m') ('L'|'l')), are reserved for
standardization in this or future versions of
this specification.
10
Names and Name Tokens
A name token is any mixture of name characters.
Names and Tokens
[4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
[5] Name ::= (Letter | '_' | ':') (NameChar)*
[6] Names ::= Name (S Name)*
[7] Nmtoken ::= (NameChar)+
[8] Nmtokens ::= Nmtoken (S Nmtoken)*

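
The difference between a name and a name token can be sketched with regular expressions. The version below is a simplification that assumes ASCII only: it leaves out the non-ASCII letters, combining characters, and extenders that the real productions allow. The function names are hypothetical.

```python
import re

# ASCII-only approximation of NameChar (assumption: no Unicode letters,
# CombiningChar, or Extender).
NAME_CHAR = r"[A-Za-z0-9._:\-]"

# A name must start with a letter, underscore, or colon.
NAME = re.compile(r"[A-Za-z_:]" + NAME_CHAR + r"*")

# A name token may start with any name character.
NMTOKEN = re.compile(NAME_CHAR + r"+")

def is_name(s):
    return NAME.fullmatch(s) is not None

def is_nmtoken(s):
    return NMTOKEN.fullmatch(s) is not None

def is_reserved(s):
    """Names starting with 'xml' in any mix of case are reserved."""
    return s[:3].lower() == "xml"
```

For example, "1st" and "-x" are name tokens but not names, because a name cannot begin with a digit or a hyphen.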
11
Literal Strings
  • Literal strings allow users to use funny
    (non-name) characters within markup.

Literal data is any quoted string not containing
the quotation mark used as a delimiter for that
string. Literals are used for specifying the
content of internal entities (EntityValue), the
values of attributes (AttValue), and external
identifiers (SystemLiteral). Note that a
SystemLiteral can be parsed without scanning for
markup.
<REFERENCE URL="http://www.documents.com/document.xml">

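
A small sketch of how the choice of delimiter works for literal strings. It assumes Python's standard xml.sax.saxutils.quoteattr helper, which picks a quotation mark that does not clash with the value; the element name x and attribute name v are made up for the example.

```python
from xml.sax.saxutils import quoteattr
import xml.etree.ElementTree as ET

url = "http://www.documents.com/document.xml"

# No quotation marks in the value, so double quotes are used.
tag = "<REFERENCE URL=%s/>" % quoteattr(url)

# The value contains a double quote, so single quotes delimit it instead.
quoted = quoteattr('5" disk')
round_trip = ET.fromstring("<x v=%s/>" % quoted).get("v")
```

Characters such as the slash and colon in the URL are not name characters, which is why the value has to be written as a literal.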
12
Prolog vs. Instance
  • An XML document is broken up into two main parts:
    a prolog and a document instance.
  • The prolog provides information about the
    interpretation of the document instance, such as
    the version of XML and the document type to which
    it conforms.
  • The document instance, following the prolog,
    contains the actual document data organized as a
    hierarchy of elements.
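
The two parts can be inspected separately once a document is parsed. The sketch below assumes Python's standard xml.dom.minidom module, chosen here for illustration; the sample document is made up and memo.dtd is never actually read.

```python
from xml.dom import minidom

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE memo SYSTEM "memo.dtd">
<memo><subject>Another Memo Example</subject></memo>"""

doc = minidom.parseString(DOCUMENT)

# Information that comes from the prolog.
doctype_name = doc.doctype.name
system_id = doc.doctype.systemId

# The document instance starts at the root element.
root_name = doc.documentElement.tagName
```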

13
The Logical Structure
14
The Logical Structure
  • Experts refer to an element's real-world meaning
    as its semantics.
  • If you find yourself reading or writing markup
    and asking "But what does that mean?", then you
    are asking about semantics.

15
Elements
  • XML elements break down into two categories:
  • elements containing characters and
  • empty elements.

[39] element ::= EmptyElemTag | STag content ETag
[40] STag ::= '<' Name (S Attribute)* S? '>'
[41] Attribute ::= Name Eq AttValue
[42] ETag ::= '</' Name S? '>'
[43] content ::= (element | CharData | Reference | CDSect | PI | Comment)*
[44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>'
<title>This is the title</title>
<EMPTY-ELEMENT ATTR="ARRIVAL"/>

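
To illustrate the two categories, the sketch below parses an element with content and an empty element written in both permitted forms. It assumes Python's standard xml.etree.ElementTree module; the helper is_empty is hypothetical.

```python
import xml.etree.ElementTree as ET

title = ET.fromstring("<title>This is the title</title>")

# An empty-element tag ...
empty = ET.fromstring('<EMPTY-ELEMENT ATTR="ARRIVAL"/>')
# ... and a start-tag immediately followed by an end-tag.
also_empty = ET.fromstring('<EMPTY-ELEMENT ATTR="ARRIVAL"></EMPTY-ELEMENT>')

def is_empty(elem):
    """True if the element has neither child elements nor text."""
    return len(elem) == 0 and not elem.text
```

Both spellings of the empty element yield the same result for the application; an empty element can still carry attributes.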
16
Attributes
  • Attributes are a way of attaching characteristics
    or properties to elements of a document.
  • Attributes have semantics. They always mean
    something.

<person height="165cm">Dale Wick</person>
<person height="165cm" weight="161lb">Bill Bunn</person>
<FROM><NAME>Paul Prescod</NAME>
<EMAIL>papresco@prescod.com</EMAIL></FROM>
<FROM NAME="Paul Prescod" EMAIL="papresco@prescod.com"/>

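
The same information can be carried by subelements or by attributes, and an application reads the two forms differently. A minimal sketch, assuming Python's standard xml.etree.ElementTree module:

```python
import xml.etree.ElementTree as ET

# The sender described with subelements ...
as_elements = ET.fromstring(
    "<FROM><NAME>Paul Prescod</NAME>"
    "<EMAIL>papresco@prescod.com</EMAIL></FROM>"
)
# ... and the same sender described with attributes.
as_attributes = ET.fromstring(
    '<FROM NAME="Paul Prescod" EMAIL="papresco@prescod.com"/>'
)

name_from_child = as_elements.findtext("NAME")
name_from_attr = as_attributes.get("NAME")

# All attributes of an element are available as a dictionary.
person = ET.fromstring(
    '<person height="165cm" weight="161lb">Bill Bunn</person>'
)
properties = person.attrib
```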
17
The Prolog
  • XML documents should start with a prolog that
    describes the XML version, document type, and
    other characteristics of the document.
  • The prolog is made up of an XML declaration and
    a document type declaration (both optional).
  • The XML declaration must precede the document type
    declaration, if both are provided.

<?xml version="1.0"?>
<!DOCTYPE DOCBOOK SYSTEM "http://www.davenport.org/docbook">

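
The ordering rule and the optional parts of the prolog can be checked with a parser. This is a sketch assuming Python's standard xml.etree.ElementTree module; the helper parses and the empty DOCBOOK root element are made up for the example, and the DTD address is never fetched.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

XML_DECL = '<?xml version="1.0"?>'
DOCTYPE = '<!DOCTYPE DOCBOOK SYSTEM "http://www.davenport.org/docbook">'
ROOT = "<DOCBOOK/>"

# XML declaration first, then the document type declaration.
both = parses(XML_DECL + "\n" + DOCTYPE + "\n" + ROOT)
# Both parts of the prolog are optional.
no_prolog = parses(ROOT)
doctype_only = parses(DOCTYPE + "\n" + ROOT)
# The XML declaration may not come after the document type declaration.
wrong_order = parses(DOCTYPE + "\n" + XML_DECL + "\n" + ROOT)
```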
18
The Prolog
[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
[23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24] VersionInfo ::= S 'version' Eq (' VersionNum ' | " VersionNum ")
[25] Eq ::= S? '=' S?
[26] VersionNum ::= ([a-zA-Z0-9_.:] | '-')+
[27] Misc ::= Comment | PI | S
[80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'")
[81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*

19
Document Type Declaration
  • The document type declaration declares the
    document type that is in use in the document.
  • The document type declaration is the heart of the
    concept of structural validity, which makes
    applications based on XML robust and reliable.

20
Predefined Entities
  • Two ways to protect certain characters from
    being interpreted as markup:
  • predefined entities and
  • CDATA sections.
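
Both techniques can be sketched side by side. The example assumes Python's standard xml.sax.saxutils.escape and xml.etree.ElementTree; the element names code and t and the sample text are made up.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = "if (a < b && b > c)"

# Option 1: replace the special characters with predefined entities.
escaped = escape(raw)
via_entities = ET.fromstring("<code>%s</code>" % escaped).text

# Option 2: wrap the unmodified text in a CDATA section.
via_cdata = ET.fromstring("<code><![CDATA[%s]]></code>" % raw).text

# The five predefined entities and the characters they stand for.
all_five = ET.fromstring("<t>&lt;&gt;&amp;&apos;&quot;</t>").text
```

Either way the application receives the original characters; only the way they are written in the document differs.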

Predefined entities
21
Predefined Entities
22
CDATA Sections
23
Comments