Title: Why use XML
1Why use XML?
2Sun's slogan
- Java + XML = portable programs + portable data
3Markup
- Information added to data to enhance its meaning
- Identification of parts, boundaries and relationships of elements within documents
- Identification of attributes
Without markup, most documents appear as meaningful to machines as this document does to humans
4What's wrong with HTML?
<html>
  <head>
    <title>Martin's page</title>
  </head>
  <body bgcolor=#ffffff>
    <p align=center>
      Some text or other
    </p>
  </body>
</html>
- HTML is the most successful electronic publishing language ever invented, but ...
- ... HTML is for presentation, not content, limiting its applicability
5XML example
<?xml version="1.0" encoding="UTF-8"?>
<Curriculum>
  <Course Title="Z" Lect="Bogdanov">
    <Lecture>Props</Lecture>
    <Lecture>Predicates</Lecture>
    <Lecture>Sets</Lecture>
  </Course>
</Curriculum>
10XML example
- Encodes
  - boundaries
  - roles, eg course v lecture
  - positions
  - containment, eg lecture is part of course
  - attributes, eg title
<?xml version="1.0" encoding="UTF-8"?>
<Curriculum>
  <Course Title="Z" Lect="Bogdanov">
    <Lecture>Props</Lecture>
    <Lecture>Predicates</Lecture>
    <Lecture>Sets</Lecture>
  </Course>
</Curriculum>
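A minimal sketch of how a program can read back each kind of information listed above. It assumes Python's standard `xml.etree.ElementTree` module, which the slides do not mention; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0" encoding="UTF-8"?>
<Curriculum>
  <Course Title="Z" Lect="Bogdanov">
    <Lecture>Props</Lecture>
    <Lecture>Predicates</Lecture>
    <Lecture>Sets</Lecture>
  </Course>
</Curriculum>"""

# Parse from bytes so the encoding declaration is honoured.
root = ET.fromstring(DOC.encode("utf-8"))

# Roles: the tag name says what each element is (course v lecture).
course = root.find("Course")

# Boundaries and positions: each Lecture is a separate element,
# returned in document order.
lectures = [lecture.text for lecture in course.findall("Lecture")]

# Containment: the Course is a child of the Curriculum root.
course_is_inside_curriculum = course in list(root)

# Attributes: name/value pairs attached to the Course element.
attributes = dict(course.attrib)

print(root.tag, course.tag, lectures, attributes)
```

None of this relies on guessing from layout: the tags themselves carry the structure.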
11A brief history of markup
- GML: Generalized Markup Language
  - Developed in the 60s and 70s by IBM
  - Used for IBM technical manuals
- SGML: Standard Generalized Markup Language
  - 70s, 80s with ANSI standard in 1983
  - Flexible and very general, but difficult and costly
- HTML
  - Early 90s: compact markup for hypertext docs
  - Now seen as a step backwards
- XML
12XML is
- Simpler than SGML
- More flexible than HTML
- A restricted subset (application profile) of SGML
- Not a markup language, but a toolkit
  - however, it is common to refer to documents as being written in XML
- Surrounded by a family of technologies which extend its use (eg transformation)
13XML features
- Represents most kinds of information, unambiguously (unlike HTML)
- Easily customizable
- Supports internationalization through Unicode
- Allows validation of documents
- Easy to read by humans and machines
- Open standard, managed by W3C
14Applications of a better data format
- Better search engines
  - find all places selling X
- Customised data presentation from a single source
  - HTML
  - WML (wireless markup language)
  - PDF
- Reliable information exchange
  - CML (chemical structures)
  - VoxML (voice)
  - B2B transactions of all types
  - MathML
  - etc
15XHTML
Source: http://www.w3.org/TR/xhtml1/
16MathML example
- Can be used for display or calculations
- Source: http://www.dessci.com/support/tutorials/mathml/default.stm
17SVG (scalable vector graphics)
- SVG benefits
  - Zooming
  - Text stays text. Text in SVG images remains editable and searchable
  - Small file size
  - Display independence
  - Interactivity and intelligence
Source: http://www-106.ibm.com/developerworks/education/transforming-xml/xmltosvg
18VoiceXML
Source: http://www.w3.org/TR/voicexml20/#dml1.3.1
19DocBook
Source: http://nis-www.lanl.gov/rosalia/mydocs/docbook-intro/get-going.html
20Apache documentation
Source: http://jakarta.apache.org/ecs/index.html
21WML
Source: http://www.wap-uk.com/Developers/Tutorial.htm
22A closer look at XML documents
23Simple example
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Curriculum SYSTEM "Curric.DTD">
<Curriculum>
  <Course Title="Z" Lect="Kyrill Bogdanov">
    <Lecture>Propositions</Lecture>
    <Lecture>Predicates</Lecture>
    <Lecture>Sets</Lecture>
  </Course>
  <Course Title="UML" Lect="Marian Gheorge">
    <Lecture>Use Cases</Lecture>
    <Lecture>Class Diagrams</Lecture>
  </Course>
</Curriculum>
24Document prolog
- Indicates that this document is marked up in XML
- Format
    <?xml prop="value" ?>
- Param values must be quoted with single or double quotes (unlike HTML)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Curriculum SYSTEM "Curric.DTD">
<Curriculum>
  <Course Title="Z" Lect="Kyrill Bogdanov">
    <Lecture>Propositions</Lecture>
    <Lecture>Predicates</Lecture>
    <Lecture>Sets</Lecture>
  </Course>
  <Course Title="UML" Lect="Marian Gheorge">
    <Lecture>Use Cases</Lecture>
    <Lecture>Class Diagrams</Lecture>
  </Course>
</Curriculum>
25Document type declaration
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Curriculum SYSTEM "Curric.DTD">
<Curriculum>
  <Course Title="Z" Lect="Kyrill Bogdanov">
    <Lecture>Propositions</Lecture>
    <Lecture>Predicates</Lecture>
    <Lecture>Sets</Lecture>
  </Course>
  <Course Title="UML" Lect="Marian Gheorge">
    <Lecture>Use Cases</Lecture>
    <Lecture>Class Diagrams</Lecture>
  </Course>
</Curriculum>
- Specifies the root element against which the document is validated
- Format
    <!DOCTYPE root-element SYSTEM "dtd">
- DTD is optional (see later)
26Elements
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Curriculum SYSTEM "Curric.DTD">
<Curriculum>
  <Course Title="Z" Lect="Kyrill Bogdanov">
    <Lecture>Propositions</Lecture>
    <Lecture>Predicates</Lecture>
    <Lecture>Sets</Lecture>
  </Course>
  <Course Title="UML" Lect="Marian Gheorge">
    <Lecture>Use Cases</Lecture>
    <Lecture>Class Diagrams</Lecture>
  </Course>
</Curriculum>
- Building blocks of XML documents
- One root element
- Format
    <name att="value">
      content
    </name>
  Or
    <name att="val" />
- Non-empty elements must have a closing tag (unlike HTML)
27Elements
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Curriculum SYSTEM "Curric.DTD">
<Curriculum>
  <Course Title="Z" Lect="Kyrill Bogdanov">
    <Lecture>Propositions</Lecture>
    <Lecture>Predicates</Lecture>
    <Lecture>Sets</Lecture>
  </Course>
  <Course Title="UML" Lect="Marian Gheorge">
    <Lecture>Use Cases</Lecture>
    <Lecture>Class Diagrams</Lecture>
  </Course>
</Curriculum>
- Elements may contain other elements
- An element's start and end tags must reside within the same parent (ie boxes cannot overlap)
28Attributes
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Curriculum SYSTEM "Curric.DTD">
<Curriculum>
  <Course Title="Z" Lect="Kyrill Bogdanov">
    <Lecture>Propositions</Lecture>
    <Lecture>Predicates</Lecture>
    <Lecture>Sets</Lecture>
  </Course>
  <Course Title="UML" Lect="Marian Gheorge">
    <Lecture>Use Cases</Lecture>
    <Lecture>Class Diagrams</Lecture>
  </Course>
</Curriculum>
- Used to identify specific elements, or to elaborate elements
- Values must be quoted (single or double)
- Not always clear whether to use attributes or elements
29Attributes or elements?
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Curriculum SYSTEM "Curric.DTD">
<Curriculum>
  <Course Title="Z" Lect="Kyrill Bogdanov">
    <Lecture>Propositions</Lecture>
    <Lecture>Predicates</Lecture>
    <Lecture>Sets</Lecture>
  </Course>
  <Course Title="UML" Lect="Marian Gheorge">
    <Lecture>Use Cases</Lecture>
    <Lecture>Class Diagrams</Lecture>
  </Course>
</Curriculum>
- Attributes shouldn't really hold content
- Attribute order is ignored, whilst element order is significant
- Attribute values can be restricted (see DTDs later)
- Use attributes as unique references if needed
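The point about ordering can be checked directly. The sketch below assumes Python's standard `xml.etree.ElementTree` module (not something the slides use) and compares two hypothetical `Course` fragments that differ only in attribute order and lecture order.

```python
import xml.etree.ElementTree as ET

# Same attributes in a different order, same lectures in a different order.
first = ET.fromstring(
    '<Course Title="Z" Lect="Kyrill Bogdanov">'
    "<Lecture>Sets</Lecture><Lecture>Predicates</Lecture>"
    "</Course>"
)
second = ET.fromstring(
    '<Course Lect="Kyrill Bogdanov" Title="Z">'
    "<Lecture>Predicates</Lecture><Lecture>Sets</Lecture>"
    "</Course>"
)


def lecture_titles(course):
    """Return the lecture titles in document order."""
    return [lecture.text for lecture in course.findall("Lecture")]


# Attributes behave like an unordered mapping from name to value.
same_attributes = first.attrib == second.attrib

# Child elements form an ordered sequence, so swapping them changes the data.
same_lecture_order = lecture_titles(first) == lecture_titles(second)

print(same_attributes, same_lecture_order)
```

This is one practical reason to keep sequenced content (such as the lectures of a course) in elements rather than attributes.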
30An XML document is a tree
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Curriculum SYSTEM "Curric.DTD">
<Curriculum>
  <Course Title="Z" Lect="Kyrill Bogdanov">
    <Lecture>Propositions</Lecture>
    <Lecture>Predicates</Lecture>
    <Lecture>Sets</Lecture>
  </Course>
  <Course Title="UML" Lect="Marian Gheorge">
    <Lecture>Use Cases</Lecture>
    <Lecture>Class Diagrams</Lecture>
  </Course>
</Curriculum>
Tree:
Curriculum
  Course
    Lecture
    Lecture
    Lecture
  Course
    Lecture
    Lecture
31... with content at its leaves
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Curriculum SYSTEM "Curric.DTD">
<Curriculum>
  <Course Title="Z" Lect="Kyrill Bogdanov">
    <Lecture>Propositions</Lecture>
    <Lecture>Predicates</Lecture>
    <Lecture>Sets</Lecture>
  </Course>
  <Course Title="UML" Lect="Marian Gheorge">
    <Lecture>Use Cases</Lecture>
    <Lecture>Class Diagrams</Lecture>
  </Course>
</Curriculum>
Tree:
Curriculum
  Course
    Lecture: Propositions
    Lecture: Predicates
    Lecture: Sets
  Course
    Lecture: Use cases
    Lecture: Class diagrams
32... and attributes
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Curriculum SYSTEM "Curric.DTD">
<Curriculum>
  <Course Title="Z" Lect="Kyrill Bogdanov">
    <Lecture>Propositions</Lecture>
    <Lecture>Predicates</Lecture>
    <Lecture>Sets</Lecture>
  </Course>
  <Course Title="UML" Lect="Marian Gheorge">
    <Lecture>Use Cases</Lecture>
    <Lecture>Class Diagrams</Lecture>
  </Course>
</Curriculum>
Tree:
Curriculum
  Course (Title="Z", Lect="Kyrill Bogdanov")
    Lecture: Propositions
    Lecture: Predicates
    Lecture: Sets
  Course (Title="UML", Lect="Marian Gheorge")
    Lecture: Use cases
    Lecture: Class diagrams
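The tree view can be reproduced with a short recursive walk. This is a sketch only, assuming Python's standard `xml.etree.ElementTree` module; the `outline` helper is a hypothetical name, and the DOCTYPE line is left out so that no external `Curric.DTD` file is needed.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0" encoding="UTF-8"?>
<Curriculum>
  <Course Title="Z" Lect="Kyrill Bogdanov">
    <Lecture>Propositions</Lecture>
    <Lecture>Predicates</Lecture>
    <Lecture>Sets</Lecture>
  </Course>
  <Course Title="UML" Lect="Marian Gheorge">
    <Lecture>Use Cases</Lecture>
    <Lecture>Class Diagrams</Lecture>
  </Course>
</Curriculum>"""


def outline(element, depth=0):
    """Return one indented line per element: tag, attributes, leaf text."""
    attributes = "".join(f' {name}="{value}"' for name, value in element.attrib.items())
    text = (element.text or "").strip()  # whitespace-only text is layout, not content
    line = "  " * depth + element.tag + attributes
    if text:
        line += ": " + text
    lines = [line]
    for child in element:  # children are visited in document order
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(DOC.encode("utf-8"))
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each element becomes a node, attributes hang off their element, and text appears only at the leaves.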
33Well-formedness
- Element containing text or elements must have start and end tags
Good: <curriculum> <course>Z</course> <course>Java</course> </curriculum>
Bad: <curriculum> <course>Z <course>Java </curriculum>
34Well-formedness
- Element containing text or elements must have start and end tags
- An empty element's tag must close with />
Good: <graphic filename="icon.png"/>
Bad: <graphic filename="icon.png">
35Well-formedness
- Element containing text or elements must have start and end tags
- An empty element's tag must close with />
- Attribute values must be in quotes
Good: <course Title="Java"> <course Title='Z'>
Bad: <course Title=UML>
36Well-formedness
- Element containing text or elements must have start and end tags
- An empty element's tag must close with />
- Attribute values must be in quotes
- Elements must not overlap
Good: <curriculum> <course>Z</course> </curriculum>
Bad: <curriculum> <course>Z </curriculum> </course>
37Well-formedness
- Element containing text or elements must have start and end tags
- An empty element's tag must close with />
- Attribute values must be in quotes
- Elements must not overlap
- Markup chars must not appear in parsed content
Good: <equation>5 &lt; 2</equation>
Bad: <equation>5 < 2</equation>
38Well-formedness
- Element containing text or elements must have start and end tags
- An empty element's tag must close with />
- Attribute values must be in quotes
- Elements must not overlap
- Markup chars must not appear in parsed content
- Element names start with letters or _, and contain letters, numbers, -, . and _
Good: <curriculum> <_course> <time-slot> <book.chapter>
Bad: <1example> <the first> <book*chapter>
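Any conforming XML parser enforces these rules, so a well-formedness check amounts to attempting a parse. The sketch below assumes Python's standard `xml.etree.ElementTree` module; `is_well_formed` is a hypothetical helper name, and the sample strings follow the Good/Bad pairs on the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


good = [
    "<curriculum><course>Z</course><course>Java</course></curriculum>",
    '<graphic filename="icon.png"/>',
    "<course Title='Z'></course>",
    "<equation>5 &lt; 2</equation>",
    "<time-slot/>",
]
bad = [
    "<curriculum><course>Z<course>Java</curriculum>",  # missing end tags
    '<graphic filename="icon.png">',                   # empty element not closed
    "<course Title=UML></course>",                     # unquoted attribute value
    "<curriculum><course>Z</curriculum></course>",     # overlapping elements
    "<equation>5 < 2</equation>",                      # raw markup character
    "<1example/>",                                     # name starts with a digit
]

good_results = [is_well_formed(text) for text in good]
bad_results = [is_well_formed(text) for text in bad]
print(good_results, bad_results)
```

The parser reports only the first violation it meets, which is why the check returns a simple yes or no.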
39Why the rules?
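- Unlike HTML, an arbitrary XML document doesn't necessarily have a grammar
  - eg HTML knows that a <p> cannot contain another <p>, so the end tag is optional
- Without a grammar, a parser cannot infer missing tags, so the document itself must make its structure explicit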
40Pros and cons of adding a grammar
- CONS
  - More effort during development
  - Grammars need to be maintained
  - Can slow down processing
  - Need to learn a new syntax (although it is trivial for CS)
- PROS
  - Grammars enable documents to be validated
  - Enforce restrictions such as required fields, limited choices
  - Serve as a clear description of the syntax for users and developers
  - Can act as a standard, eg XHTML
  - Good for debugging
41Document Type Definition (DTD)
<!ELEMENT Curriculum (Course+)>
<!ELEMENT Course (Lecture+)>
<!ATTLIST Course
  Title CDATA #REQUIRED
  Lect CDATA #REQUIRED
>
<!ELEMENT Lecture (#PCDATA)>
- A DTD is a sequence of declarations
- Doesn't conform to XML syntax
- Easy to understand for CS
- The #PCDATA keyword stands for parsed character data and means that the textual content will be parsed to look for XML entities (see later)
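To make the intent of such a grammar concrete, the following is a hand-written check, not a DTD validator: Python's standard library does not validate against DTDs, so the rules are simply restated in code. It assumes a Curriculum must hold one or more Course elements, each with Title and Lect attributes and one or more Lecture children; `check_curriculum` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def check_curriculum(xml_text):
    """Return a list of rule violations; an empty list means all checks passed."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "Curriculum":
        problems.append(f"root is {root.tag}, expected Curriculum")
    courses = list(root)
    if not courses:
        problems.append("Curriculum has no Course")
    for course in courses:
        if course.tag != "Course":
            problems.append(f"unexpected element {course.tag} in Curriculum")
            continue
        # Mirrors the two required attributes.
        for required in ("Title", "Lect"):
            if required not in course.attrib:
                problems.append(f"Course is missing attribute {required}")
        children = list(course)
        if not children:
            problems.append("Course has no Lecture")
        for child in children:
            if child.tag != "Lecture":
                problems.append(f"unexpected element {child.tag} in Course")
    return problems


valid = (
    '<Curriculum><Course Title="Z" Lect="Kyrill Bogdanov">'
    "<Lecture>Sets</Lecture></Course></Curriculum>"
)
invalid = '<Curriculum><Course Title="Z"><Lecture>Sets</Lecture></Course></Curriculum>'

valid_problems = check_curriculum(valid)
invalid_problems = check_curriculum(invalid)
print(valid_problems, invalid_problems)
```

A real DTD expresses the same constraints declaratively in a few lines, which is the maintenance argument in favour of grammars.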
42Element definitions
<!ELEMENT article
  (title, subtitle?, author*, (para|table|list)+, biblio?)
>
43Attribute definitions
<!ATTLIST Course
  Title CDATA #REQUIRED
  Lect CDATA #REQUIRED
  Lect2 CDATA #IMPLIED
  Semester (first|second) "first"
>
44Entities
- DTDs can also contain entity definitions
- Simplest use is to substitute in any parsed text (#PCDATA), eg &uos;
- CDATA sections are not parsed, so entities inside them will not be substituted
<!ENTITY uos "The University of Sheffield">
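Entity substitution can be observed with an entity declared in the document's internal DTD subset. This sketch assumes Python's standard `xml.etree.ElementTree` module; the `Institution` and `Raw` element names are hypothetical and exist only for the demonstration.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Curriculum [
  <!ENTITY uos "The University of Sheffield">
]>
<Curriculum>
  <Institution>&uos;</Institution>
  <Raw><![CDATA[&uos;]]></Raw>
</Curriculum>"""

root = ET.fromstring(DOC.encode("utf-8"))

# Parsed character data: the reference is replaced by the entity's value.
parsed_text = root.find("Institution").text

# CDATA section: the same characters are kept literally.
literal_text = root.find("Raw").text

print(parsed_text)
print(literal_text)
```

Defining the name once and referring to it everywhere keeps repeated text consistent across a document.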
45Alternative to DTDs XML Schema
- XML Schema is a proposal to introduce a grammar-defining language which uses XML
- Adds better typing
  - Predefined types: byte, float, long, time, date, boolean, binary, language, uri-reference, ...
  - Boundaries on data values
  - Pattern-matching
46Overall summary
- Knowledge of XML is essential for all computer scientists
- Should lead to a better web with easier to find information
- Interoperability
- Impetus towards robust and open standards within industry sectors
- Supports internationalisation via Unicode
- Hardly ever need to write a parser again!
47Online documents
- XML Tutorials for Programmers
  - http://www-106.ibm.com/developerworks/education/tutorial-prog/abstract.html
  - (online XML parser -- requires registration)
- Transforming XML to PDF
  - http://www-106.ibm.com/developerworks/education/transforming-xml/xmltopdf
- Why XML?
  - http://www.w3.org/XML/1999/XML-in-10-points
- XSL for fun and diversion
  - http://www-106.ibm.com/developerworks/library/hands-on-xsl/
- Simplify XML programming with JDOM
  - http://www-106.ibm.com/developerworks/java/library/j-jdom/
- Easy Java/XML integration with JDOM, Part 1
  - http://www.javaworld.com/jw-05-2000/jw-0518-jdom.html
- Tip: Using JDOM and XSLT
  - http://www-106.ibm.com/developerworks/java/library/x-tipjdom.html
48Websites
- http://www.xml.org
- http://www.jdom.org
- Special thanks to Professor Martin Cooke,
m.cooke_at_dcs.shef.ac.uk for the primary creation
of these slides and their content.