Title: FT228/3 Web Development
Introduction to XML
Introduction
- XML is an important technology used throughout web applications.
- XML stands for eXtensible Markup Language.
- XML is a markup language, much like HTML.
- XML was designed to describe data.
- XML tags are not predefined: you must define your own tags.
Introduction
- XML uses a Document Type Definition (DTD) or an XML Schema to describe the data.
- XML documents are human-readable.
- XML documents are saved with the .xml extension, e.g. note.xml.
Introduction
- The Web was created to publish information for people; "eyes-only" was the dominant design perspective. As a result, web content is:
- Hard to search
- Hard to process automatically
Introduction
- The Web is using XML to become a platform for information exchange between computers (and people).
- XML overcomes HTML's inherent limitations.
- XML enables the new business models of the network economy.
eXtensible Markup Language
- HTML has a fixed set of format-oriented tags. XML instead allows you to create whatever set of tags your type of information needs.
- This makes any XML instance self-describing and easy for both computers and people to understand.
- XML-encoded information is smart enough to support new classes of Web and e-commerce applications.
Why XML?
- Sample catalog entry in HTML:
<TITLE> Laptop Computer </TITLE>
<BODY>
<UL>
<LI> IBM Thinkpad 600E
<LI> 400 MHz
<LI> 64 Mb
<LI> 8 Gb
<LI> 4.1 pounds
<LI> 3200
</UL>
</BODY>
- How can a program parse the content, e.g. find the price?
- We need a more flexible mechanism than HTML to interpret content.
Why XML?
Sample catalog entry using XML:
<COMPUTER TYPE="Laptop">
  <MANUFACTURER>IBM</MANUFACTURER>
  <LINE>ThinkPad</LINE>
  <MODEL>600E</MODEL>
  <SPECIFICATIONS>
    <SPEED UNIT="MHz">400</SPEED>
    <MEMORY UNIT="MB">64</MEMORY>
    <DISK UNIT="GB">8</DISK>
    <WEIGHT UNIT="POUND">4.1</WEIGHT>
    <PRICE CURRENCY="USD">3200</PRICE>
  </SPECIFICATIONS>
</COMPUTER>
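In the HTML version the price is just an unlabelled list item. In the XML version it is tagged. The sketch below shows how a program can pull out the price and its currency by tag name. It uses Python's standard-library `xml.etree.ElementTree`, which the slides do not themselves prescribe, and `get_price` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# The XML catalog entry from the slide above, as a string.
CATALOG_ENTRY = """<COMPUTER TYPE="Laptop">
  <MANUFACTURER>IBM</MANUFACTURER>
  <LINE>ThinkPad</LINE>
  <MODEL>600E</MODEL>
  <SPECIFICATIONS>
    <SPEED UNIT="MHz">400</SPEED>
    <MEMORY UNIT="MB">64</MEMORY>
    <DISK UNIT="GB">8</DISK>
    <WEIGHT UNIT="POUND">4.1</WEIGHT>
    <PRICE CURRENCY="USD">3200</PRICE>
  </SPECIFICATIONS>
</COMPUTER>"""

def get_price(xml_text):
    """Return (amount, currency) for the <PRICE> of one catalog entry."""
    computer = ET.fromstring(xml_text)              # root element: <COMPUTER>
    price = computer.find("SPECIFICATIONS/PRICE")   # path relative to the root
    return float(price.text), price.get("CURRENCY")

amount, currency = get_price(CATALOG_ENTRY)
print(f"Price: {amount:.2f} {currency}")  # Price: 3200.00 USD
```

The program never has to guess which `<LI>` holds the price: the tag name says so.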
Smart processing using XML
- <COMPUTER> and <SPECIFICATIONS> provide logical containers for extracting and manipulating product information as a unit.
- Entries can be sorted by <MANUFACTURER>, <SPEED>, <WEIGHT>, <PRICE>, etc.
- Explicit identification of each part enables its automated processing, e.g. converting <PRICE> from USD to Euro, Yen, etc.
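The two operations above are sorting by a tagged field and converting `<PRICE>`. They can be sketched with Python's `xml.etree.ElementTree`, under these assumptions:
- The second catalog entry ("Acme") is invented test data.
- `USD_TO_EUR = 0.9` is a placeholder rate, not a real exchange rate.
- `sorted_by_price` and `convert_prices` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: a shortened IBM entry plus an invented "Acme" entry.
CATALOG = """<CATALOG>
  <COMPUTER TYPE="Laptop">
    <MANUFACTURER>IBM</MANUFACTURER>
    <SPECIFICATIONS><PRICE CURRENCY="USD">3200</PRICE></SPECIFICATIONS>
  </COMPUTER>
  <COMPUTER TYPE="Laptop">
    <MANUFACTURER>Acme</MANUFACTURER>
    <SPECIFICATIONS><PRICE CURRENCY="USD">2100</PRICE></SPECIFICATIONS>
  </COMPUTER>
</CATALOG>"""

USD_TO_EUR = 0.9  # placeholder rate for illustration only

def sorted_by_price(xml_text):
    """Return (manufacturer, price) pairs, cheapest first."""
    root = ET.fromstring(xml_text)
    rows = [(c.findtext("MANUFACTURER"), float(c.findtext("SPECIFICATIONS/PRICE")))
            for c in root.findall("COMPUTER")]
    return sorted(rows, key=lambda row: row[1])

def convert_prices(xml_text, rate, new_currency):
    """Convert every USD <PRICE> using `rate` and return the updated XML text."""
    root = ET.fromstring(xml_text)
    for price in root.iter("PRICE"):
        if price.get("CURRENCY") == "USD":  # only convert prices known to be in USD
            price.text = f"{float(price.text) * rate:.2f}"
            price.set("CURRENCY", new_currency)
    return ET.tostring(root, encoding="unicode")

print(sorted_by_price(CATALOG))  # [('Acme', 2100.0), ('IBM', 3200.0)]
converted = convert_prices(CATALOG, USD_TO_EUR, "EUR")
print(converted)
```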
Document exchange
- Using XML allows companies to exchange information that can be processed automatically, without human intervention, e.g.:
- Purchase orders
- Invoices
- Catalogues, etc.
Difference between HTML and XML?
- XML was designed to carry data.
- XML is not a replacement for HTML.
- XML and HTML were designed with different goals:
- XML was designed to describe data and to focus on what data is. HTML was designed to display data and to focus on how data looks.
- HTML is about displaying information, while XML is about describing information.
XML Describes Data
- XML was not designed to DO anything.
- XML is used to structure, store and send information, BUT it requires software to send it, receive it or display it.
- The following example is a note to Angela from Jimmy, stored as XML:
<note>
  <to>Angela</to>
  <from>Jimmy</from>
  <heading>Reminder</heading>
  <body>Get the paper!</body>
</note>
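XML does nothing on its own, so some program has to read the note before anyone sees it. A minimal sketch, assuming Python's `xml.etree.ElementTree`, follows. `read_note` and `display_note` are hypothetical helpers. The display format is an arbitrary choice; the XML itself does not dictate it.

```python
import xml.etree.ElementTree as ET

NOTE = """<note>
  <to>Angela</to>
  <from>Jimmy</from>
  <heading>Reminder</heading>
  <body>Get the paper!</body>
</note>"""

def read_note(xml_text):
    """Parse a <note> document into a dict of child tag -> text."""
    note = ET.fromstring(xml_text)
    return {child.tag: child.text for child in note}

def display_note(fields):
    """One possible way to display the data; other programs could lay it out differently."""
    return f"To: {fields['to']}\nFrom: {fields['from']}\n{fields['heading']}: {fields['body']}"

print(display_note(read_note(NOTE)))
```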
XML is free and extensible
- XML tags are not predefined. You must "invent" your own tags.
- The tags used to mark up HTML documents, and the structure of HTML documents, are predefined. The author of an HTML document can only use tags that are defined in the HTML standard (like <p>, <h1>, etc.).
- XML allows the author to define their own tags and their own document structure.
- The tags in the example above (like <to> and <from>) are not defined in any XML standard. These tags are "invented" by the author of the XML document.
XML Document Structure
- XML declaration
- A root element, which contains child elements, which can contain sub-child elements:
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
XML Document Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Angela</to>
  <from>Jimmy</from>
  <heading>Reminder</heading>
  <body>Get the paper!</body>
</note>
XML Syntax rules
- 1. All elements must have a closing tag.
- For example, in HTML it is okay to write:
<p> this is a paragraph
- HTML allows the closing tag to be left out.
- In XML, you must write:
<p> this is a paragraph </p>
- The XML declaration is the exception: it is not an element and has no closing tag.
XML Syntax rules
- 2. XML tags are case sensitive.
- Unlike HTML tags, XML tags are case sensitive: <letter> and <Letter> are different tags.
<letter> this is incorrect </Letter>
<letter> this is correct </letter>
XML Syntax rules
- 3. All elements must be properly nested.
<b><i> this is incorrect in XML </b></i>
(acceptable in HTML, but not in XML)
<b><i> this is correct in XML </i></b>
XML Syntax rules
- 4. All XML documents must have a root element.
- Every XML document must contain a single tag pair that defines a root element. All other elements must be inside this root element.
- All elements can have sub-elements (child elements).
- Sub-elements must be correctly nested within their parent element:
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
XML Syntax rules
- 5. Attribute values must always be quoted.
- XML elements can have attributes in name/value pairs, just like in HTML. In XML, the attribute value must always be quoted. Study the two XML documents below: the first one is incorrect, the second is correct.
Incorrect:
<?xml version ... ?>
<note date=12/11/2002>
  <to>Tove</to>
  <from>Jani</from>
</note>
Correct:
<?xml version ... ?>
<note date="12/11/2002">
  <to>Tove</to>
  <from>Jani</from>
</note>
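Each of the five rules can be checked by handing a snippet to a parser and seeing whether it is rejected. This sketch assumes Python's `xml.etree.ElementTree`, which raises `ParseError` for any text that is not well-formed. The `is_well_formed` helper is hypothetical, and the snippets are adapted from the examples above:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False if the parser reports a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# One (broken, corrected) pair per syntax rule.
EXAMPLES = {
    "1. closing tag":   ("<p>this is a paragraph", "<p>this is a paragraph</p>"),
    "2. case":          ("<letter>text</Letter>", "<letter>text</letter>"),
    "3. nesting":       ("<b><i>text</b></i>", "<b><i>text</i></b>"),
    "4. single root":   ("<to>Tove</to><from>Jani</from>",
                         "<note><to>Tove</to><from>Jani</from></note>"),
    "5. quoted values": ("<note date=12/11/2002><to>Tove</to></note>",
                         '<note date="12/11/2002"><to>Tove</to></note>'),
}

for rule, (bad, good) in EXAMPLES.items():
    print(f"{rule}: broken -> {is_well_formed(bad)}, fixed -> {is_well_formed(good)}")
```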
Child/parent elements
<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have tag</para>
    <para>Elements must be nested</para>
  </chapter>
</book>
Note: indentation helps readability.
Book is the root element. Title, prod, and chapter are child elements of book, and book is the parent element of title, prod, and chapter. Title, prod, and chapter are siblings (or sister elements) because they have the same parent.
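The same relationships can be explored programmatically. This sketch assumes Python's `xml.etree.ElementTree`. Its elements do not keep a link to their parent, so the code builds a child-to-parent map, a common idiom, to answer "who is my parent?" and "who are my siblings?":

```python
import xml.etree.ElementTree as ET

BOOK = """<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have tag</para>
    <para>Elements must be nested</para>
  </chapter>
</book>"""

root = ET.fromstring(BOOK)
print("Root element:", root.tag)                       # book
print("Children of book:", [child.tag for child in root])

# Map every element to its parent by walking all elements and their children.
parent_of = {child: parent for parent in root.iter() for child in parent}

first_para = root.find("chapter/para")
print("Parent of first <para>:", parent_of[first_para].tag)   # chapter

# Siblings are the other children of the same parent.
title = root.find("title")
siblings = [el.tag for el in parent_of[title] if el is not title]
print("Siblings of <title>:", siblings)
```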
Child elements vs attributes
1.
<note date="12/11/2002">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Hello!</body>
</note>
2.
<note>
  <date>
    <day>12</day>
    <month>11</month>
    <year>2002</year>
  </date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Hello!</body>
</note>
Both 1 and 2 contain exactly the same information: 1 uses an attribute for the date, while 2 uses child elements. Child elements are usually the better choice because they are easier to read and maintain.
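One practical difference appears when a program reads the date. With the attribute form, the program has to split the date string itself. With the child-element form, each part is exposed by name. The sketch below uses Python's `xml.etree.ElementTree`; the documents are shortened versions of 1 and 2, and the helper names are hypothetical:

```python
import xml.etree.ElementTree as ET

# Shortened version of 1: the date as an attribute.
WITH_ATTRIBUTE = '<note date="12/11/2002"><to>Tove</to><from>Jani</from></note>'

# Shortened version of 2: the date as child elements.
WITH_CHILDREN = """<note>
  <date><day>12</day><month>11</month><year>2002</year></date>
  <to>Tove</to><from>Jani</from>
</note>"""

def date_from_attribute(xml_text):
    """The attribute is one opaque string; the program must know its format."""
    day, month, year = ET.fromstring(xml_text).get("date").split("/")
    return int(day), int(month), int(year)

def date_from_children(xml_text):
    """Each part of the date is separately tagged."""
    date = ET.fromstring(xml_text).find("date")
    return tuple(int(date.findtext(part)) for part in ("day", "month", "year"))

print(date_from_attribute(WITH_ATTRIBUTE))  # (12, 11, 2002)
print(date_from_children(WITH_CHILDREN))    # (12, 11, 2002)
```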
Well formed XML documents
A well-formed XML document adheres to the syntax rules described above.
You can check whether an XML document is well-formed at http://www.w3schools.com/dom/dom_validate.asp. Alternatively, open the document in a web browser such as Internet Explorer: if the document is not well-formed, the browser shows an error instead of the document.
To manually create or write an XML document
- Figure out what the overall document is (-> root tag).
- Figure out the key fields of information (-> tag names).
- Figure out which information is data (-> tag contents).
To manually create or write an XML document
Example: write an XML document for the following address book.
Name            Telephone   Address
Jeremy Cannon   0098837     22, Marlboro Ct
Naomi Murphy    992887      39, Alma Road
Sheila Zheng    999287      Apt 4, Hyde Road
To manually create or write an XML document
- Figure out what the overall document is (-> root tag): an address book.
- Figure out the key fields of information (-> tag names): Name (which can be broken into first name and surname), Telephone, and Address (which can be broken into house number and street name).
- Figure out which information is data (-> tag contents): the actual values, e.g. Jeremy, Cannon, 0098837, 22, Marlboro Ct.
To manually create or write an XML document: example
<addressbook>
  <person>
    <name>
      <firstname>Jeremy</firstname>
      <surname>Cannon</surname>
    </name>
    <telephone>0098837</telephone>
    <address>
      <housenumber>22</housenumber>
      <street>Marlboro Ct</street>
    </address>
  </person>
  <person>
    <name>
      <firstname>Naomi</firstname> etc.
  </person>
</addressbook>
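An address book like this is often generated by a program rather than typed by hand. The sketch below turns the rows of the table into the same element layout, using Python's `xml.etree.ElementTree`. The `build_address_book` helper is hypothetical, and `ET.indent` needs Python 3.9 or later:

```python
import xml.etree.ElementTree as ET

# Rows from the address-book table: (first name, surname, telephone, house number, street)
PEOPLE = [
    ("Jeremy", "Cannon", "0098837", "22", "Marlboro Ct"),
    ("Naomi", "Murphy", "992887", "39", "Alma Road"),
    ("Sheila", "Zheng", "999287", "Apt 4", "Hyde Road"),
]

def build_address_book(rows):
    """Create <addressbook> with one <person> per row, mirroring the hand-written layout."""
    book = ET.Element("addressbook")
    for first, last, phone, house, street in rows:
        person = ET.SubElement(book, "person")
        name = ET.SubElement(person, "name")
        ET.SubElement(name, "firstname").text = first
        ET.SubElement(name, "surname").text = last
        ET.SubElement(person, "telephone").text = phone
        address = ET.SubElement(person, "address")
        ET.SubElement(address, "housenumber").text = house
        ET.SubElement(address, "street").text = street
    return book

book = build_address_book(PEOPLE)
ET.indent(book)  # add indentation for readability (Python 3.9+)
print(ET.tostring(book, encoding="unicode"))
```

Telephone numbers are kept as strings so that the leading zeros in 0098837 are not lost.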
To check your document
- Save it as an .xml file.
- Open it in a browser and see if any errors are produced, OR
- use a specialist XML validator for better error diagnosis, e.g. http://www.w3schools.com/dom/dom_validate.asp
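A script can perform the same check and report where the first error occurs. This sketch assumes Python's `xml.etree.ElementTree`, whose `ParseError` carries a `position` attribute holding the line and column of the error. `check_xml_file` is a hypothetical helper. The sample file deliberately contains a space inside an end tag (`</first name>`), a typical hand-typing slip.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def check_xml_file(path):
    """Return 'OK' if the file is well-formed, otherwise a message with the error location."""
    try:
        ET.parse(path)
    except ET.ParseError as err:
        line, column = err.position
        return f"Error at line {line}, column {column}: {err}"
    return "OK"

# Write a deliberately broken document to a temporary .xml file, check it, then clean up.
BROKEN = "<addressbook>\n<person>\n<firstname>Jeremy</first name>\n</person>\n</addressbook>\n"
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False, encoding="utf-8") as tmp:
    tmp.write(BROKEN)
result = check_xml_file(tmp.name)
os.remove(tmp.name)
print(result)
```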
Valid XML documents
- A valid XML document adheres to a definition of what it can contain. For example, you should not be able to put 13 in a <month>, and the document must contain only allowed elements.
- There are two ways to write this definition:
- Document Type Definition (DTD)
- XML Schema
A brief introduction to both follows.
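The month example above shows the difference between well-formed and valid. A parser accepts `<month>13</month>` without complaint, because the syntax is fine. The sketch below makes the distinction concrete using Python's `xml.etree.ElementTree`. `validity_errors` is a hypothetical hand-written rule standing in for what a DTD or XML Schema validator would enforce. In XML Schema, such a rule could be expressed as an `xs:integer` restricted with `xs:minInclusive` and `xs:maxInclusive`.

```python
import xml.etree.ElementTree as ET

# Well-formed (it parses), but not valid under our rule: there is no month 13.
NOTE = "<note><date><day>12</day><month>13</month><year>2002</year></date></note>"

def validity_errors(xml_text):
    """Hypothetical rule check: every <month> must hold an integer from 1 to 12."""
    errors = []
    root = ET.fromstring(xml_text)   # succeeds, so the document is well-formed
    for month in root.iter("month"):
        value = month.text or ""
        if not value.isdigit() or not 1 <= int(value) <= 12:
            errors.append(f"month must be an integer 1-12, got {value!r}")
    return errors

print(validity_errors(NOTE))  # ["month must be an integer 1-12, got '13'"]
```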
DTD Declaration
You usually specify the DTD for your XML document by providing a reference to it near the top of the XML document:
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
Syntax: <!DOCTYPE root-element SYSTEM "filename">
Document Type Definitions
The DTD (note.dtd) for the note document is:
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
It lists the document type (note), the valid elements, and the type of content each element can accept. #PCDATA means parsed character data, i.e. text. When the same declarations are embedded inside the XML document itself, they are wrapped in <!DOCTYPE note [ ... ]>.
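Python's standard library parses documents that reference a DTD, but it does not validate them against it. Third-party libraries such as lxml provide `lxml.etree.DTD` for that. To show what a validating parser checks, the following sketch hand-codes the rules of note.dtd. It is an illustration only, and `check_against_note_dtd` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# Hand-coded equivalent of note.dtd (not a real DTD processor):
#   <!ELEMENT note (to,from,heading,body)>  -> exactly these children, in this order
#   <!ELEMENT to (#PCDATA)> etc.            -> text only, no child elements
NOTE_CONTENT_MODEL = ["to", "from", "heading", "body"]

def check_against_note_dtd(xml_text):
    """Return a list of violations of the note content model (empty list = valid)."""
    root = ET.fromstring(xml_text)
    errors = []
    if root.tag != "note":
        errors.append(f"root element must be <note>, got <{root.tag}>")
    children = [child.tag for child in root]
    if children != NOTE_CONTENT_MODEL:
        errors.append(f"<note> children must be {NOTE_CONTENT_MODEL}, got {children}")
    for child in root:
        if len(child):  # a #PCDATA element may not contain other elements
            errors.append(f"<{child.tag}> may only contain text")
    return errors

GOOD = ("<note><to>Tove</to><from>Jani</from><heading>Reminder</heading>"
        "<body>Don't forget me this weekend!</body></note>")
BAD = "<note><from>Jani</from><to>Tove</to><body>Hi</body></note>"

print(check_against_note_dtd(GOOD))  # []
print(check_against_note_dtd(BAD))
```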
Why use a DTD?
With a DTD, each of your XML files can carry a description of its own format. Independent groups of people can agree to use a common DTD for interchanging data. Your application can use a standard DTD to verify that the data you receive from the outside world is valid.
XML Schema
- XML Schema is an XML-based alternative to DTD.
- An XML schema describes the structure of an XML document.
- XML Schemas will probably be used in most Web applications as a replacement for DTDs because:
- XML Schemas are supported by the W3C
- XML Schemas are richer and more useful than DTDs
- XML Schemas are written in XML
- XML Schemas support data types
- XML Schemas are extensible to future additions
- XML Schemas support namespaces
XML Schema
XML Schema is an XML-based alternative to DTD. An XML schema describes the structure of an XML document, and is itself written in XML syntax. The XML Schema for the document note.xml below is shown next.
<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
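XML Schema Example
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3schools.com"
           xmlns="http://www.w3schools.com"
           elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
- <xs:schema> is the root element.
- The <xs:element> declarations define the elements allowed in the .xml document.
- The schema declares a targetNamespace with elementFormDefault="qualified". The elements in note.xml must therefore be in that namespace (e.g. <note xmlns="http://www.w3schools.com">) for the document to validate against it.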
Uses of XML
- To exchange data between incompatible systems (just send an XML document, with an agreed definition of the tags).
- For B2B e-commerce, i.e. the exchange of business documents between businesses. XML is flexible enough to describe any logical text structure, e.g. a purchase order or an invoice.
- To store data in plain text files or in databases.
- To create new markup languages (i.e. languages that use tags): XML can be used to agree what the tags mean. Many markup languages based on XML have already been created, e.g. JSTL, WML, VoiceXML, XHTML.
Using an XML document
You need an XML parser to use, or parse out, the data held in an XML document.
XML Parsers
- An XML parser does the following:
- Retrieves and reads an XML document, i.e. parses the document to figure out what's in it.
- Ensures the document adheres to specific standards (e.g. Is it well-formed? Does it adhere to its DTD?).
- Makes the document contents available to your application.
XML Document parsers
- If your application is going to use XML documents, you could write your own parser.
- But it makes sense to use a pre-built parser.
- E.g. Java provides an XML parser API that can be used in any Java application that processes XML documents.
- This saves on development work.
XML Document Parsers
- Hundreds of parsers are available.
- Most parsers are based on one of two main interfaces:
- Tree based: Document Object Model (DOM)
- Event based: Simple API for XML (SAX)
XML Parsers: Tree based DOM interface
- Uses the Document Object Model (DOM).
- Tree-based interface (navigates through the document).
- Developed by the W3C.
- XML parsers that use DOM exist for Java, JavaScript, Perl and C.
Tree based DOM parser - example
- Object/Tree interface (DOM)
- Definition: the parser reads the XML document and creates an in-memory tree of data, i.e. an object model of the data.
- For example: given the sample XML document on the next slide, what kind of tree would be produced?
Tree based DOM parser - example
- Sample XML document:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE WEATHER SYSTEM "Weather.dtd">
<WEATHER>
  <CITY NAME="Hong Kong">
    <HI>87</HI>
    <LOW>78</LOW>
  </CITY>
</WEATHER>
Tree based DOM parser - example
[Figure: the DOM tree produced from the weather document above]
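A DOM parser builds this tree in memory. The sketch below uses Python's standard-library `xml.dom.minidom` to parse the weather document and walk the resulting tree. It skips the whitespace-only text nodes created by indentation. The DOCTYPE line is left out so that no Weather.dtd file is needed, and `tree_lines` is a hypothetical helper.

```python
from xml.dom import minidom

# The weather document (DOCTYPE omitted so no external Weather.dtd is required).
WEATHER = b"""<?xml version="1.0" encoding="UTF-8"?>
<WEATHER>
  <CITY NAME="Hong Kong">
    <HI>87</HI>
    <LOW>78</LOW>
  </CITY>
</WEATHER>"""

def tree_lines(node, depth=0):
    """Recursively describe element and text nodes, indented by their depth in the tree."""
    lines = []
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            attrs = "".join(f' {name}="{value}"' for name, value in child.attributes.items())
            lines.append("  " * depth + f"Element {child.tagName}{attrs}")
            lines.extend(tree_lines(child, depth + 1))
        elif child.nodeType == child.TEXT_NODE and child.data.strip():
            lines.append("  " * depth + f"Text {child.data.strip()!r}")
    return lines

doc = minidom.parseString(WEATHER)   # the whole document is now an in-memory tree
print("\n".join(tree_lines(doc)))
```

Because the whole tree is in memory, the program can revisit any node at any time, for example `doc.getElementsByTagName("LOW")[0]`, at the cost of holding the entire document.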
XML Parsers: Event based SAX parser
- Simple API for XML
- Event based
- Developed by volunteers on the XML-dev mailing list
- http://www.megginson.com/SAX/
Event based SAX parser
- Event-based parser
- Definition: the parser reads the XML document and generates an event for each item it encounters, such as the start of an element, character data, or the end of an element.
- SAX parsers don't create an in-memory object model of the document. It is up to the programmer to write the code that interprets the events.
- For example: given the same XML document, what kind of events would be produced?
Event based SAX parser example
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE WEATHER SYSTEM "Weather.dtd">
<WEATHER>
  <CITY NAME="Hong Kong">
    <HI>87</HI>
    <LOW>78</LOW>
  </CITY>
</WEATHER>
Event based SAX parser example
- Events generated:
- 1. Start of <WEATHER> element
- 2. Start of <CITY> element
- 3. Start of <HI> element
- 4. Character event: 87
- 5. End of </HI> element
- 6. Start of <LOW> element
- 7. Character event: 78
- 8. End of </LOW> element
- 9. End of </CITY> element
- 10. End of </WEATHER> element
Event based parsers
- Your application implements an event handler for each of these events.
- Each time an event occurs, the corresponding event handler is called.
- Your application intercepts these events and handles them in any way you want.
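The ten events listed earlier can be reproduced with Python's standard-library SAX interface, `xml.sax`. In this sketch, `EventLogger` is a hypothetical `ContentHandler` subclass whose event handlers simply record each event. Character data is buffered until the next tag, because a parser may deliver text in several chunks. The DOCTYPE line is again omitted.

```python
import xml.sax

WEATHER = b"""<?xml version="1.0" encoding="UTF-8"?>
<WEATHER>
  <CITY NAME="Hong Kong">
    <HI>87</HI>
    <LOW>78</LOW>
  </CITY>
</WEATHER>"""

class EventLogger(xml.sax.ContentHandler):
    """Event handlers that record each parsing event instead of building a tree."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._text = []   # character data collected since the last tag

    def _flush_text(self):
        text = "".join(self._text).strip()
        self._text = []
        if text:          # ignore whitespace used only for indentation
            self.events.append(f"Character event: {text}")

    def startElement(self, name, attrs):
        self._flush_text()
        self.events.append(f"Start of <{name}> element")

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        self._flush_text()
        self.events.append(f"End of </{name}> element")

handler = EventLogger()
xml.sax.parseString(WEATHER, handler)   # the parser calls the handlers as it reads
for number, event in enumerate(handler.events, start=1):
    print(f"{number}. {event}")
```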
Comparing tree based DOM parser with event based SAX parser
- Questions:
- Which parser is faster?
- Which parser is more efficient?
- Which parser is suitable for which type of XML document?
Comparing tree based DOM parser with event based SAX parser
Event based
- Faster
- Takes up much less memory
- But more complex to implement
- Good for large, machine-generated, structured documents, e.g. book contents. The repetitive nature of the tags allows event-handling code to be reused, which means less work for the programmer.
- Good where only parts of the document are needed at any one time (event-based parsers cannot skip around from one part of the document to another)
Tree based
- Slower
- Takes up more memory
- Simpler to use
- More suitable for documents that are less structured, with less repetition of tags
- More suitable where the program needs to move around the document a lot, and so needs easy access to the full document at all times
Comparing tree based DOM parser with event based SAX parser
- Performance and memory
- Therefore, when high performance and low memory use are the most important criteria, use an event-based parser.
- Examples:
- Java applets
- Palm Pilot applications
- Parsing huge data files
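For parsing huge data files, Python's standard library also offers `xml.etree.ElementTree.iterparse`. This is an incremental, event-driven parser with a pull-style API, not SAX itself, and it keeps memory use low if finished elements are discarded as they are processed. The sketch below assumes that approach. The second city is invented test data, and `city_highs` is a hypothetical helper.

```python
import io
import xml.etree.ElementTree as ET

def city_highs(source):
    """Stream through a weather file, keeping only the current <CITY> subtree in memory."""
    highs = {}
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)              # first event: start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "CITY":
            highs[elem.get("NAME")] = int(elem.findtext("HI"))
            root.clear()                 # drop finished CITY subtrees so they can be freed
    return highs

# Small in-memory stand-in for a large file; "Example City" is invented data.
DATA = b"""<WEATHER>
  <CITY NAME="Hong Kong"><HI>87</HI><LOW>78</LOW></CITY>
  <CITY NAME="Example City"><HI>70</HI><LOW>60</LOW></CITY>
</WEATHER>"""

print(city_highs(io.BytesIO(DATA)))  # {'Hong Kong': 87, 'Example City': 70}
```

With a real file, pass its path (or an open binary file) instead of the `BytesIO` object.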
Storing XML documents
- XML can be used for data storage, e.g. to store news headlines or business documents.
- Q: How do we store XML documents in a database?
Storing XML documents
- Choices:
- Keep them as XML files (filename.xml)
- Put them into a relational database and convert to/from XML format
- Use a native XML database
Storing XML documents
- Keep as XML files (filename.xml)
- Pros: fast for a small number of users; eliminates the overhead of database connections
- Cons: a large number of users -> concurrency issues; poor for high-volume read/write; security/visibility
Storing XML documents
- Put into a relational database and convert to/from XML format as needed
- Provides ACID support to ensure the integrity of access to the data
- Assumes the data can be put into tabular form (usually true of data used for transport)
- Poor for data that is not easily transformed into table-based structures, e.g. word processor documents
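Converting to and from a relational format can be sketched with Python's built-in `sqlite3` and `xml.etree.ElementTree` modules. The table layout is an assumption chosen for illustration: one `note` row per `<note>`, with one column per child element. It only works because each note has a flat, fixed set of fields, which is exactly the "data can be put into tabular form" assumption noted above. `xml_to_table` and `table_to_xml` are hypothetical helpers.

```python
import sqlite3
import xml.etree.ElementTree as ET

NOTES = """<notes>
  <note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Hello!</body></note>
  <note><to>Angela</to><from>Jimmy</from><heading>Reminder</heading><body>Get the paper!</body></note>
</notes>"""
FIELDS = ("to", "from", "heading", "body")

def xml_to_table(conn, xml_text):
    """Shred each <note> into one row of a 'note' table."""
    # "to" and "from" are SQL keywords in SQLite, so those column names are quoted.
    conn.execute('CREATE TABLE note ("to" TEXT, "from" TEXT, heading TEXT, body TEXT)')
    rows = [tuple(n.findtext(f) for f in FIELDS) for n in ET.fromstring(xml_text).iter("note")]
    conn.executemany("INSERT INTO note VALUES (?, ?, ?, ?)", rows)

def table_to_xml(conn):
    """Rebuild an XML document from the table rows."""
    root = ET.Element("notes")
    for row in conn.execute('SELECT "to", "from", heading, body FROM note ORDER BY rowid'):
        note = ET.SubElement(root, "note")
        for field, value in zip(FIELDS, row):
            ET.SubElement(note, field).text = value
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
xml_to_table(conn, NOTES)
print(conn.execute('SELECT "from", body FROM note ORDER BY rowid').fetchall())
print(table_to_xml(conn))
```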
Storing XML documents
- Store in a native XML database
- Native XML databases are designed especially to store XML documents.
- A native XML database treats XML documents and elements as the fundamental structures, rather than tables, records, and fields.
- Good for XML documents that are for human consumption, i.e. content (e.g. books, emails)
- Provides ACID support to ensure the integrity of access to the data
Storing XML documents
- Store in a native XML database (continued)
- Good when XML documents need to be returned as XML (but most applications need data returned in other formats)
- Query languages are evolving (e.g. XQuery), but there is no equivalent yet of SQL update/insert/delete
- New technology (e.g. the open-source database eXist)