Title: EXtensible Markup Language XML
1EXtensible Mark-up Language (XML)
- ASTINFO E-Information Services Development and
Administration Training
Oct. 22-26, 2001, Richmonde Hotel, Pasig City,
Manila, Philippines
Melvin H. Ambrosio, SRS I, Science and Technology
Information Institute
2Introduction to XML
3What is XML
- Stands for EXtensible Mark-up Language.
- Jon Bosak of Sun Microsystems is considered by many as the father of XML.
- XML is a markup language much like HTML.
- XML was designed to describe data.
- XML tags are not predefined in XML. You must define your own tags.
- XML uses a DTD (Document Type Definition) to describe the data.
- XML with a DTD is designed to be self-descriptive.
4XML does not DO anything
- XML was not designed to DO anything.
- XML was created as a way to structure, store and send information.
- The example here is just pure information wrapped in XML tags.
<note>
  <to>Gwen</to>
  <from>John</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
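Because the note only carries information, a program has to be written before anything happens with it. A minimal sketch in Python, assuming the standard-library xml.etree.ElementTree parser (the slides do not name a parser) and a hypothetical helper called summarize:

```python
import xml.etree.ElementTree as ET

# The note from the slide, held in a string for the sake of the sketch.
NOTE_XML = """
<note>
  <to>Gwen</to>
  <from>John</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
"""


def summarize(xml_text):
    """Build a one-line summary from the note's child elements."""
    note = ET.fromstring(xml_text)
    return "{} -> {}: {}".format(
        note.findtext("from"), note.findtext("to"), note.findtext("body")
    )


print(summarize(NOTE_XML))
```

The XML itself stays passive; all behavior lives in the program that reads it.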
5XML is a complement to HTML
- XML is not a replacement for HTML.
- XML will be used to describe data, while HTML will be used to format and display the same data.
- XML is a cross-platform, software and hardware independent tool for transmitting information.
6How can XML be Used?
- XML can Separate Data from HTML
- With XML, your data is stored outside your HTML.
- XML is used to Exchange Data
- With XML, data can be exchanged between incompatible systems.
- XML and B2B
- With XML, financial information can be exchanged over the internet.
- XML can be used to Share Data
- With XML, plain text files can be used to share data.
- XML can be used to Store Data
- With XML, plain text files can be used to store data.
- XML can make your Data more Useful
- With XML, your data is available to more users.
- XML can be used to Create new Languages
- XML is the mother of WML, the markup language used by WAP.
7General Structure of XML
8Logical Structure
- The Prolog: the first structural element in the XML document. It is divided into two basic components:
- XML declaration
- Document Type declaration
- The Document Element: comes directly after the prolog, and contains all the data in your XML document.
9Making an XML Declaration
- A simple XML declaration:
- <?xml version="1.0" encoding="" standalone="" ?>
- <?xml starts the XML declaration.
- version describes the XML version being used. 1.0 was the only version of XML at the time of this training.
- encoding allows authors to specify the character encoding that they are using. The most common is UTF-8.
- standalone allows the author to specify whether external markup declarations may exist. It can be equal to yes or no.
- ?> closes the XML declaration.
10Rules in XML
- You define your own tags.
- All XML elements must have a closing tag.
- E.g. <year>2001</year>
- XML tags are case sensitive.
- E.g. <year>2001</Year> - WRONG!
- <year>2001</year> - CORRECT!
- Empty elements can use the empty-element tag.
- E.g. <year/>
- Element names must begin with a letter or an
underscore (_) followed by letters, digits,
underscores, hyphens and periods. Spaces are not
allowed.
11Rules in XML
- All XML tags must be properly nested.
- All XML documents must have a root tag.
- E.g.
<root>
  <child>
    <subchild></subchild>
  </child>
</root>
- Attribute values must always be quoted.
- E.g. <?xml version="1.0"?>
- With XML, white space is preserved.
- With XML, CR / LF is converted to LF.
- Comments can be used like in HTML.
- E.g. <!-- Comments -->
12XML Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tyler</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS Records</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
</CATALOG>
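Once the catalog is in this shape, any XML-aware program can query it. A minimal sketch in Python, assuming the standard-library xml.etree.ElementTree module and a trimmed copy of the catalog; the function name cheapest_cd is hypothetical:

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the CD catalog (bytes, so the encoding declaration applies).
CATALOG_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
</CATALOG>
"""


def cheapest_cd(xml_bytes):
    """Return (title, price) of the CD with the lowest PRICE."""
    catalog = ET.fromstring(xml_bytes)
    best = min(catalog.findall("CD"), key=lambda cd: float(cd.findtext("PRICE")))
    return best.findtext("TITLE"), float(best.findtext("PRICE"))


print(cheapest_cd(CATALOG_XML))
```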
13XML Sample Output
14XML Elements
- XML Elements are Extensible
- XML Documents can be extended to carry more information.
- XML Elements have Relationships
- Elements are related as parents and children.
- Elements have Content
- Elements can have different content types.
- Element Naming
- XML Elements follow strict naming rules.
15XML Elements
- XML Elements are Extensible
<note>
  <to>Gwen</to>
  <from>John</from>
  <body>Don't forget me this weekend!</body>
</note>
<note>
  <to>Gwen</to>
  <from>John</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
An application that reads only <to>, <from> and <body> produces the same output for both versions:
MESSAGE
To: Gwen
From: John
Don't forget me this weekend!
16XML Elements
- XML Elements have Relationships
<book>
  <title>My First XML</title>
  <prod id="1234" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>
Book is the root element. Title and chapter are child elements of book. Book is the parent element of both title and chapter. Title and chapter are siblings (or sister elements) because they have the same parent.
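The parent, child and sibling relationships can be read straight off a parsed tree. A minimal sketch in Python, assuming the standard-library xml.etree.ElementTree module; that module keeps no parent pointers, so the parent_of dictionary built here is an illustrative helper, not part of any API:

```python
import xml.etree.ElementTree as ET

BOOK_XML = """
<book>
  <title>My First XML</title>
  <prod id="1234" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>
"""

book = ET.fromstring(BOOK_XML)

# Children of the root element, in document order.
child_names = [child.tag for child in book]

# Map every element to its parent by walking the whole tree once.
parent_of = {child: parent for parent in book.iter() for child in parent}

# Siblings of <title>: the other children of its parent.
title = book.find("title")
siblings_of_title = [c.tag for c in parent_of[title] if c is not title]

first_para = book.find("chapter/para")
print(book.tag, child_names, parent_of[first_para].tag)
```

The root element is the only one missing from parent_of, which matches the statement that book has no parent.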
17XML Elements
- Elements have Content
- An element can have element content, mixed content, simple content, or empty content. An element can also have attributes.
- E.g. In the book example, book has element content, because it contains other elements. Chapter has mixed content because it contains both text and other elements. Para has simple content (or text content) because it contains only text. Prod has empty content and it has attributes.
18Checking the Logical Structure
- Run the XML document you have created through a parser.
- The parser checks the document for two things:
- Well-formedness: well formed means that the document conforms to the syntax rules for XML.
- Validity: the document must be well formed and must conform to a DTD.
19Well formed XML
- It contains a root element.
- All other elements are children of the root element.
- All elements are correctly paired.
- The element name in the start-tag and the end-tag are exactly the same.
- Attribute names are used only once within the same element.
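These rules are exactly what a non-validating parser enforces. A minimal sketch in Python, assuming the standard-library xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are illustrative only:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


samples = {
    "correct": "<note><to>Gwen</to></note>",
    "case mismatch": "<note><to>Gwen</To></note>",
    "bad nesting": "<note><to>Gwen</note></to>",
    "duplicate attribute": '<note id="1" id="2"></note>',
    "two roots": "<note></note><note></note>",
}

for label, text in samples.items():
    print(label, is_well_formed(text))
```

Note that this only checks well-formedness; the parser used here does not validate against a DTD.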
20Well formed XML Document
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
21Valid XML
- Documents must be well formed.
- Documents must conform to the rules as defined in a Document Type Definition (DTD).
- No possibility to use a tag that's not defined in the DTD.
- Since a valid document is also well formed, there's no possibility for typos in the tags.
22Valid XML Document
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
23Physical Structure
- The actual content (text) of your XML instance.
- The content is contained in chunks of information called entities.
- Internal entities: defined completely within the document itself.
- External entities: not located within the main document, but draw their content from an external file or source.
- Entity references are used in XML instead of specific characters that would otherwise be interpreted as part of the markup:
- &amp; for &
- &lt; for <
- &gt; for >
- &quot; for "
- &apos; for '
24XML Attributes
- XML elements can have attributes.
- Attribute values must always be enclosed in quotes, but either single or double quotes can be used.
- Data can be stored in child elements or in attributes.
- Attributes are handy in HTML.
- Avoid using attributes in XML.
- Use child elements if the information feels like data.
25XML Attributes
- XML Elements can have attributes
- Attributes often provide information that is not part of the data.
- E.g. <file type="gif">computer.gif</file>
- File type is irrelevant to the data, but important to the software that wants to manipulate the element.
26XML Attributes
- Attribute values must always be enclosed in quotes.
- E.g. <person sex="female">
- <person sex='female'>
- Double quotes are the most common, but sometimes (if the attribute value itself contains quotes) it is necessary to use single quotes.
- E.g. <gangster name='Al "Scarface" Capone'>
27XML Attributes
- Data can be stored in child elements or in
attributes
sex is an attribute
<person sex="female">
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>
sex is a child element
<person>
  <sex>female</sex>
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>
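The two layouts carry the same information but are read differently by a program. A minimal sketch in Python, assuming the standard-library xml.etree.ElementTree module; the helper sex_of is hypothetical and simply tries the attribute first, then the child element:

```python
import xml.etree.ElementTree as ET

AS_ATTRIBUTE = """
<person sex="female">
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>
"""

AS_CHILD_ELEMENT = """
<person>
  <sex>female</sex>
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>
"""


def sex_of(person_xml):
    """Read 'sex' from the attribute if present, otherwise from the child element."""
    person = ET.fromstring(person_xml)
    value = person.get("sex")          # attribute lookup
    if value is None:
        value = person.findtext("sex")  # child element lookup
    return value


print(sex_of(AS_ATTRIBUTE), sex_of(AS_CHILD_ELEMENT))
```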
28XML Attributes
- Store data in child elements
date as attribute
<note date="09/20/2001">
  <to>Gwen</to>
  <from>John</from>
  <body>Don't forget me this weekend!</body>
</note>
date as element
<note>
  <date>09/20/2001</date>
  <to>Gwen</to>
  <from>John</from>
  <body>Don't forget me this weekend!</body>
</note>
date as expanded element
<note>
  <date>
    <month>09</month>
    <day>20</day>
    <year>2001</year>
  </date>
  <to>Gwen</to>
  <from>John</from>
  <body>Don't forget me this weekend!</body>
</note>
29Problems in using Attributes
- Attributes cannot contain multiple values (child elements can)
- Attributes are not easily expandable
- Attributes cannot describe structures (child elements can)
- Attributes are more difficult to manipulate by program code
- Attribute values are not easy to test against a DTD
30Exception to the Attribute rule
- Assign ID references to elements.
- Metadata (data about data) should be stored as
attributes, and data itself should be stored as
elements.
<messages>
  <note ID="501">
    <to>Gwen</to>
    <from>John</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
  </note>
  <note ID="502">
    <to>John</to>
    <from>Gwen</from>
    <heading>Re: Reminder</heading>
    <body>I will not!</body>
  </note>
</messages>
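An ID attribute earns its place because it lets a program pick out one note without looking at the note's data. A minimal sketch in Python, assuming the standard-library xml.etree.ElementTree module; the dictionary name notes_by_id is illustrative:

```python
import xml.etree.ElementTree as ET

MESSAGES_XML = """
<messages>
  <note ID="501">
    <to>Gwen</to>
    <from>John</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
  </note>
  <note ID="502">
    <to>John</to>
    <from>Gwen</from>
    <heading>Re: Reminder</heading>
    <body>I will not!</body>
  </note>
</messages>
"""

messages = ET.fromstring(MESSAGES_XML)

# Index the notes by their ID attribute (metadata), not by their content.
notes_by_id = {note.get("ID"): note for note in messages.findall("note")}

reply = notes_by_id["502"]
print(reply.findtext("from"), "wrote:", reply.findtext("body"))
```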
31XML Document Type Definition
32XML Document Type Definition
- A DTD defines the legal elements of an XML document.
- A DTD is placed inside the prolog of the document, directly after the XML declaration, but before the actual document data begins.
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE COLLECTION [
  <!ELEMENT COLLECTION (#PCDATA)>
]>
<COLLECTION>
  This is the outermost element of our example
</COLLECTION>
33DTD Syntax
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE COLLECTION [
  <!ELEMENT COLLECTION (#PCDATA)>
]>
<COLLECTION>
  This is the outermost element of our example
</COLLECTION>
- A DTD always starts with <!DOCTYPE and ends with ]>.
- Directly after the <!DOCTYPE comes the name of the root element, followed by a [.
- Between the two square brackets come all of the element and attribute declarations, including one for the root element.
34DTD Elements
- DTD elements use the following syntax:
- <!ELEMENT
- Followed by the name of the element
- Followed by a description of the element.
- E.g. <!ELEMENT note (#PCDATA)> or <!ELEMENT note ANY>
<!DOCTYPE note [
  <!ELEMENT note (to, from, heading, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
35DTD Data
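- The description (#PCDATA) stands for parsed character data, while ANY means all possible elements and parsed character data are allowed inside the tag.
- Parsed character data is text that will be parsed (interpreted) by the program that reads the XML document, so markup and entity references inside it are recognized.
- CDATA stands for character data, which is not parsed. In a DTD, CDATA is used as an attribute type; it is not a content description for elements.
- Inside a document, text can be kept from being parsed by wrapping it in a CDATA section: <![CDATA[ ... ]]>.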
36Sub Elements
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
This means that the element note has the sub elements to, from, heading, and body, in that order. Each sub element can contain character data.
37Sub Element Numbers and Choices
- , - sequence operators separate members of a sequence list, which requires sequential use of all members.
- | - choice operators separate members of a choice list, which requires use of one and only one member.
- (no symbol) - indicates a required occurrence.
- + - indicates a required and repeatable occurrence.
- * - indicates an optional and repeatable occurrence.
- ? - indicates an optional occurrence.
- E.g. <!ELEMENT animal (wing|leg|size)>
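The sequence, choice and occurrence indicators behave much like regular-expression operators applied to the list of child element names. The sketch below is a rough illustration in Python, not a DTD validator: the content models are translated to regular expressions by hand, the helper names are hypothetical, and the second model, (title, (author|editor)+, note*, year?), is invented for the example rather than taken from the slides.

```python
import re
import xml.etree.ElementTree as ET


def child_sequence(element):
    """Child element names in order, each followed by one space."""
    return "".join(child.tag + " " for child in element)


def matches_model(xml_text, model_regex):
    """Check the root's children against a hand-translated content model."""
    root = ET.fromstring(xml_text)
    return re.fullmatch(model_regex, child_sequence(root)) is not None


# (to,from,heading,body): a plain sequence, every member required once.
NOTE_MODEL = r"to from heading body "

# Hypothetical model (title, (author|editor)+, note*, year?):
# '|' is a choice, '+' required and repeatable, '*' optional and
# repeatable, '?' optional.
BOOK_MODEL = r"title (author |editor )+(note )*(year )?"

good_note = "<note><to/><from/><heading/><body/></note>"
bad_note = "<note><from/><to/><heading/><body/></note>"    # wrong order
good_book = "<book><title/><author/><editor/><year/></book>"
bad_book = "<book><title/><year/></book>"                   # no author or editor

for doc, model in [(good_note, NOTE_MODEL), (bad_note, NOTE_MODEL),
                   (good_book, BOOK_MODEL), (bad_book, BOOK_MODEL)]:
    print(matches_model(doc, model))
```

A real validating parser does this work from the DTD itself; the hand translation is only meant to make the meaning of each operator concrete.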
38Empty Elements
- Empty elements get the description EMPTY.
- <!ELEMENT separator EMPTY>
- This could define a separator line to be shown if
the XML document appears in a browser.
39A DTD Walkthrough
- We are dealing with a DTD of the type "collection". This means that the root element must be called <COLLECTION>.
- The element type declaration for the root element states that it may contain any number of RECORD elements, and nothing else.
- Each RECORD must contain information on exactly one ARTIST and one TITLE. Furthermore, it may contain information on YEAR, LABEL, TIME, RATING and COMMENT, but this is not required. In addition to this, each RECORD must contain at least one DISC.
- Except for DISC, all the other elements in RECORD must contain character data and not other elements.
- Each DISC must contain at least one TRACK.
- The TRACK element can contain one of the two elements NAME and TIME of the individual songs on the CD, but this is optional.
40Attributes and DTD
- Use the <!ATTLIST> tag for declaring attributes in a DTD.
- <YEAR function="release">1988</YEAR>
- Element YEAR has a function attribute, which has the value release.
- <!ATTLIST YEAR function CDATA "release">
- Ideally placed after the element declaration in a DTD.
- After the <!ATTLIST tag, specify the name of the element which contains an attribute. E.g. YEAR
- Specify the name of the attribute for the said element. E.g. function
- Specify the kind of attribute. Most common is CDATA, which means the attribute can contain text and nothing else.
- The last item deals with the value the attribute takes on if it has not been specified in the element. This is called the default value.
41Dealing with Default Values
- The problem with declaring a default value is that the author of a document may not always have a particular value that can be used as default. Instead of specifying a default value yourself, you can use certain keywords to do one of three things:
- Require the author to specify a value (any value)
- Allow the value to be omitted
- Force the use of a given value.
- This is done by using one of these keywords instead of the attribute value: #IMPLIED, #REQUIRED, or #FIXED (#FIXED is followed by the value to be forced).
- E.g. <!ATTLIST TRACK number CDATA #IMPLIED size CDATA #IMPLIED>
42Dealing with Default Values
- #REQUIRED: this keyword is used in situations where the author of a DTD wants to force the users to provide a value for a particular attribute.
- #IMPLIED: this keyword is used in situations where the author of the DTD wants to provide the users with the possibility of adding an attribute value, without forcing them to do so.
- #FIXED: this option is used when you want to provide a specific default value and you don't want it to be changed by anyone. It is also the least used keyword.
43Entities in XML
- An entity is an item that holds data, like a database record, an image file or a text document. General entity references are used to merge text into already existing documents and must obey certain rules:
- General Entity references must always begin with an ampersand ( & ).
- General Entity references must always end with a semicolon ( ; ).
- General Entity references are case-sensitive.
- General Entity references are composed of alphanumeric characters.
- General Entity references must be declared in a DTD, unless you are using the five pre-defined entity references.
- Pre-defined Entity References in XML:
- Ampersand ( & ) - &amp;
- Left tag ( < ) - &lt;
- Right tag ( > ) - &gt;
- Apostrophe ( ' ) - &apos;
- Quotation marks ( " ) - &quot;
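In practice the five predefined references are rarely typed by hand; a library inserts them when text is written and the parser resolves them when it is read. A minimal sketch in Python, assuming the standard-library xml.sax.saxutils.escape function and xml.etree.ElementTree; the helper to_xml_text and the sample sentence are illustrative:

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# escape() handles &, < and > by default; quotes are added explicitly here.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def to_xml_text(raw):
    """Replace the five special characters with their entity references."""
    return escape(raw, QUOTE_ENTITIES)


raw = "Tom & Jerry <\"cat\" vs 'mouse'>"
escaped = to_xml_text(raw)
print(escaped)

# The parser turns the references back into the original characters.
round_trip = ET.fromstring("<t>" + escaped + "</t>").text
print(round_trip == raw)
```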
44Entities in DTD
- To create your own entity references in a document, you will need to declare them in the DTD with the <!ENTITY> tag.
- E.g. <!ENTITY tp "Tom Petty and the Heartbreakers">
- We can now use the entity &tp; in our document.
- Entity references are used when you have repeated occurrences of one particular string of text.
- Inserting a pre-defined entity reference:
- <!ENTITY tp "Tom Petty &amp; the Heartbreakers">
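An entity declared in an internal DTD is expanded by the parser wherever it is referenced. A minimal sketch in Python, assuming the standard-library xml.etree.ElementTree module, which expands general entities declared in the internal subset; the RECORD and ARTIST element names are borrowed from the walkthrough and the document itself is made up for the example:

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE RECORD [
  <!ELEMENT RECORD (ARTIST)>
  <!ELEMENT ARTIST (#PCDATA)>
  <!ENTITY tp "Tom Petty &amp; the Heartbreakers">
]>
<RECORD>
  <ARTIST>&tp;</ARTIST>
</RECORD>
"""

record = ET.fromstring(DOC)

# &tp; is replaced by its declared text, and the nested &amp; becomes '&'.
artist = record.findtext("ARTIST")
print(artist)
```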
45Entities in DTD
- You can also use your own entity references in a DTD, but they are subject to a couple of restrictions:
- Unlike elements and attributes, the position of the entity declaration matters in XML. Entity references must be declared before they are used.
- General Entity references can not insert text that will be a part of the DTD and not the document content.
- You can not use general entity references to replace keywords like #PCDATA for example.
- If you want to use entity references to replace parts of the DTD itself, use parameter entity references.
46Entities in DTD
- Parameter entity references are very similar to General entity references, with these two very important distinctions:
- Parameter entity references begin with a percent sign ( % ).
- Parameter entity references can only occur inside the DTD.
- E.g. <!ENTITY % pc "(#PCDATA)">
- Instead of typing <!ELEMENT ARTIST (#PCDATA)>, we can now use <!ELEMENT ARTIST %pc;>
- One problem: parameter entity references are not allowed inside declarations in XML documents with an internal DTD. We need to separate the DTD from the rest of the XML document.
47Internal DTD
- A DTD can be included in the XML document itself.
- <?xml version="1.0"?>
- <!DOCTYPE name-of-root-element [
- Followed by the element definitions.
- Closed with ]>
48External DTD
- An external DTD is everything between the two square brackets in your internal DTD. Save this with a .dtd extension.
- After the XML declaration, the XML document uses the text:
- <!DOCTYPE name-of-root-element SYSTEM "address">
- <!DOCTYPE - starts the document type declaration
- name - names the document type being defined
- SYSTEM - indicates that a system identifier (address), which follows, must be read and resolved.
49XML with Internal DTD
<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE CATALOG [
  <!ELEMENT CATALOG (CD)>
  <!ELEMENT CD (TITLE, ARTIST, COUNTRY, COMPANY, PRICE, YEAR)>
  <!ENTITY cr "Columbia">
  <!ENTITY ar "Atlantic">
  <!ELEMENT TITLE (#PCDATA)>
  <!ELEMENT ARTIST (#PCDATA)>
  <!ELEMENT COUNTRY (#PCDATA)>
  <!ELEMENT COMPANY (#PCDATA)>
  <!ELEMENT PRICE (#PCDATA)>
  <!ELEMENT YEAR (#PCDATA)>
]>
<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>&cr;</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
</CATALOG>
50XML with External DTD
Cd_catalog.xml
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE CATALOG SYSTEM "catalog.dtd">
<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>&cr;</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
</CATALOG>
catalog.dtd
<!ELEMENT CATALOG (CD)>
<!ELEMENT CD (TITLE, ARTIST, COUNTRY, COMPANY, PRICE, YEAR)>
<!ENTITY cr "Columbia">
<!ENTITY ar "Atlantic">
<!ENTITY % pc "(#PCDATA)">
<!ELEMENT TITLE %pc;>
<!ELEMENT ARTIST %pc;>
<!ELEMENT COUNTRY %pc;>
<!ELEMENT COMPANY %pc;>
<!ELEMENT PRICE %pc;>
<!ELEMENT YEAR %pc;>
51Viewing XML
52XML in Internet Explorer 5.0
- Internet Explorer 5.0 has the following XML support:
- Viewing of XML documents
- Full support for W3C DTD standards
- XML embedded in HTML as Data Islands
- Binding XML data to HTML elements
- Transforming and displaying XML with XSL
- Displaying XML with CSS
- Access to the XML DOM
53XML in Internet Explorer 5.0
- IE 5.0 also has support for Behaviors
- Behaviors is a Microsoft-only technology
- Behaviors can separate scripts from an HTML page.
- Behaviors can store XML data on the client's disk.
54Viewing XML files with IE 5.0
- To view an XML document, click on a link, type the URL in the address bar, or double-click the XML file.
- IE displays the XML document with color coded root and child elements. A plus (+) or minus sign (-) to the left of the elements can be clicked to expand or collapse the element structure.
56Viewing an invalid XML File
- If an erroneous XML file is opened with IE, IE
will report the error.
<?xml version="1.0"?>
<note>
  <to>Tove</To>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
57XML Applications
58XML Examples (CD Catalog)
59XML Examples (Simple Food Menu)
60XML Application (CD Catalog)
Start with an XML document
61XML Application (CD Catalog)
Load the document into a Data Island
<xml id="xmldso" async="false" src="cd_catalog.xml">
</xml>
- In your HTML document add the preceding lines right after your <body> tag.
- With the example code above, the XML file cd_catalog.xml will be loaded into an invisible Data Island called xmldso.
- The async="false" attribute is added to the Data Island to make sure that all XML data is loaded before any other HTML processing takes place.
62XML Application (CD Catalog)
- Bind the Data Island to HTML tables using <span> or <div> elements.
- Add a data source attribute to the table.
- Add data field attributes to <span> elements inside the table data.
<TABLE>
<TBODY>
<TR><TD>Title</TD>
<TD><SPAN dataFld="TITLE" id="title" dataSrc="#xmldso"></SPAN></TD></TR>
<TR><TD>Artist</TD>
<TD><SPAN dataFld="ARTIST" id="artist" dataSrc="#xmldso"></SPAN></TD></TR>
<TR><TD>Year</TD>
<TD><SPAN dataFld="YEAR" id="year" dataSrc="#xmldso"></SPAN></TD></TR>
<TR><TD>Country</TD>
<TD><SPAN dataFld="COUNTRY" id="country" dataSrc="#xmldso"></SPAN></TD></TR>
<TR><TD>Company</TD>
<TD><SPAN dataFld="COMPANY" id="company" dataSrc="#xmldso"></SPAN></TD></TR>
<TR><TD>Price</TD>
<TD><SPAN dataFld="PRICE" id="price" dataSrc="#xmldso"></SPAN></TD></TR>
</TBODY>
</TABLE>
<P><INPUT onclick="moveprevious()" type="button" value="Previous CD">
<INPUT onclick="movenext()" type="button" value="Next CD">
</P></BODY></HTML>
63XML Application (CD Catalog)
- Add Navigational Scripts to your XML
- Create a script that calls the movenext() and moveprevious() methods of the data island.
<SCRIPT language="JavaScript">
// Advance only if the current record is not the last one.
function movenext()
{
  if (xmldso.recordset.absoluteposition < xmldso.recordset.recordcount)
  {
    xmldso.recordset.movenext()
  }
}
// Step back only if the current record is not the first one.
function moveprevious()
{
  if (xmldso.recordset.absoluteposition > 1)
  {
    xmldso.recordset.moveprevious()
  }
}
</SCRIPT>
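The two guard conditions keep the record pointer between 1 and the record count, so the buttons never step past either end of the catalog. The same bounds checks can be sketched outside the browser in Python; the RecordCursor class and its method names are hypothetical stand-ins for the data island's recordset, not an Internet Explorer API.

```python
class RecordCursor:
    """A 1-based cursor over a list of records with clamped navigation."""

    def __init__(self, records):
        self.records = list(records)
        # Position 1 is the first record; 0 means there are no records.
        self.absolute_position = 1 if self.records else 0

    @property
    def record_count(self):
        return len(self.records)

    def move_next(self):
        # Advance only if the current record is not the last one.
        if self.absolute_position < self.record_count:
            self.absolute_position += 1

    def move_previous(self):
        # Step back only if the current record is not the first one.
        if self.absolute_position > 1:
            self.absolute_position -= 1

    def current(self):
        if not self.records:
            raise IndexError("no records loaded")
        return self.records[self.absolute_position - 1]


cursor = RecordCursor(["Empire Burlesque", "Hide your heart"])
cursor.move_previous()   # already at the first record: stays put
cursor.move_next()
cursor.move_next()       # already at the last record: stays put
print(cursor.absolute_position, cursor.current())
```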
64XML Application (Web Output)
65References
- www.w3schools.com/
- www.hit.uib.no/vemund/xml/
- www.spiderpro.com/
66Thank you!!!