Title: Introduction to XML
1. Introduction to XML
- Kostas Kontogiannis
- Evan Mamas
2. Outline
- Introduce XML, HTML and SGML
- Compare and contrast
  - XML vs. HTML
  - XML vs. SGML
- XML
  - Components, applications, industry
- Thoughts on XML
3. What is XML?
- eXtensible Markup Language
- Proper subset of SGML for web use
- Meta-language
  - Allows you to create your own markup languages
- Compromise between HTML and SGML
4. What is HTML?
- HyperText Markup Language
- Language to describe information for transmission over the web
- Uses tags to mark up the information
  - Tags are just a formatting tool
- Example
  - Markup: <H1> Hello, World </H1>
  - Displayed as a top-level heading: Hello, World
5. Why isn't HTML enough?
- Good enough for presenting text on the web
- Not accepted as an authoring or archival format
- Extensibility
  - The HTML standard changes continually
  - Uses tags for formatting
- Structure
  - Has no defined or definable structural rules
6. What is SGML?
- Standard Generalized Markup Language
- International Standard (ISO 8879) for over 10 years
- Language for specifying markup languages
- Describes only the formal properties and inter-relations of the components of a document
  - Document, entities, elements, attributes
7. Uses of SGML
- Formally structured documents
  - Technical manuals
- Exchange documents
  - Product documentation
- Data encoding
  - Interchange specification
- Long-term storage of information, independent of suppliers and of changes in hardware and software
8. SGML Example
- Memo (document instance; end tags are omitted where the DTD allows it)
    <to>All staff
    <from>Martin Bryan
    <date>5th November
    <subject>Cats and Dogs
    <text>Please remember to keep all cats and dogs indoors tonight.
- DTD (Document Type Definition)
    <!DOCTYPE memo [
    <!ELEMENT memo O O ((to & from & date & subject?), text) >
    <!ELEMENT text - O (para) >
    <!ELEMENT para O O (#PCDATA) >
    <!ELEMENT (to, from, date, subject) - O (#PCDATA) >
    ]>
9. Why isn't SGML enough?
- Specification is very long
- Contains many options not needed for web applications
- Time consuming and high cost
  - Expensive tools
  - Too much for small applications
- Bad reputation
10. XML vs. HTML
- Definitions of new tags and attributes are allowed
- Document structures can be nested to any level of complexity
- Structural validation is possible by describing the grammar (for example, in a DTD)
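To make the contrast concrete: HTML has a fixed tag set and browsers tolerate badly nested markup, while an XML parser accepts any tag names but rejects a document whose elements are not properly nested. The minimal sketch below uses Python's standard `xml.etree.ElementTree`; the `<library>` vocabulary and the helper names `depth` and `is_well_formed` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical user-defined vocabulary: none of these tags exist in HTML.
doc = """<library>
  <shelf>
    <book><title>XML Basics</title><chapter><section>Elements</section></chapter></book>
  </shelf>
</library>"""

def depth(elem):
    """Nesting depth of the tree rooted at elem (a leaf element has depth 1)."""
    return 1 + max((depth(child) for child in elem), default=0)

def is_well_formed(text):
    """True if the parser accepts text, False if it raises a ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

root = ET.fromstring(doc)
print(root.tag, depth(root))                       # library 5
print(is_well_formed("<b><i>bold italic</b></i>"))  # False: overlapping tags
```

A browser would usually render the overlapping `<b><i>...</b></i>` anyway; the XML parser refuses it, which is what makes structural checking possible.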
11. XML vs. SGML
- XML is the minimum required subset of SGML for web use
- Easier to implement and to create tools for
- A new attempt at structured markup languages, with a new face
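One of the SGML options XML drops is tag omission: the memo DTD on slide 8 lets `</to>` and `</from>` be left out, but an XML parser requires every end tag. A minimal sketch with Python's `xml.etree.ElementTree`, using a shortened, hypothetical version of that memo (the helper `parses_as_xml` is also invented for illustration):

```python
import xml.etree.ElementTree as ET

# SGML style: end tags for <to> and <from> omitted, as the memo DTD permits.
sgml_style = "<memo><to>All staff<from>Martin Bryan</memo>"
# XML style: the same content with every end tag written out.
xml_style = "<memo><to>All staff</to><from>Martin Bryan</from></memo>"

def parses_as_xml(text):
    """True if text is well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parses_as_xml(sgml_style))                      # False
print(parses_as_xml(xml_style))                       # True
print(ET.fromstring(xml_style).findtext("from"))      # Martin Bryan
```

Because an XML parser never needs the DTD to find where an element ends, it is much simpler to write than a full SGML parser.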
12. XML Components
- Extensible Stylesheet Language (XSL)
- Cascading Style Sheets, level 2 (CSS2)
- XML Document Object Model (DOM)
- XML Linking Language (XLL)
- XML Pointer Language (XPointer)
- XML Namespaces
- Synchronized Multimedia Integration Language (SMIL)
- Resource Description Framework (RDF)
- Mathematical Markup Language (MathML)
13. XML Components (cont.)
- Extensible Stylesheet Language (XSL)
  - Defines a way to present documents
  - Separates formatting from content
  - Works in two steps
    - Generate a result tree (by associating patterns with templates)
    - Interpret the result tree using a formatting vocabulary, defined as an XML namespace, to generate formatted output
  - Similar to DSSSL for SGML
14. XML Components (cont.)
- Cascading Style Sheets, level 2 (CSS2)
  - Defines a way to present documents
  - Similar to XSL, but less powerful
  - Supported by most browsers
- Example
    <HTML>
      <TITLE>Bach's home page</TITLE>
      <STYLE type="text/css">
        H1 { color: blue }
      </STYLE>
      <BODY>
        <H1>Bach's home page</H1>
        <P>Johann Sebastian Bach was a prolific composer.
      </BODY>
    </HTML>
15. XML Components (cont.)
- XML Document Object Model (DOM)
  - In-memory model for representing parsed XML documents
  - Designed to provide common structures in XML browsers
  - Intended to enable interoperable XML processing across browsers
  - Implemented by Internet Explorer and Netscape
16. XML Components (cont.)
- XML Linking Language (XLL)
  - Links by reference rather than by exact location
  - Provides hyperlinking elements
    - Simple links: like HTML links
    - Extended links
      - Multi-directional links
      - Links with multiple destinations
      - Placing content inline from a linked document
  - Uses the XML Pointer Language to address parts of documents
17. XML Components (cont.)
- XML Namespaces
  - A namespace is a vocabulary of element and attribute names
  - A qualified name has two parts
    - Namespace prefix (mapped to a Uniform Resource Identifier)
    - Local part
  - Allows use of names defined in other documents
  - Modularity and reuse of markup
  - Mechanism to establish name scope
18. XML Components (cont.)
- Synchronized Multimedia Integration Language (SMIL)
  - Language for describing interactive synchronized multimedia distributed on the web
  - Several components (images, video, audio) can be linked together to create a presentation on the web
- Resource Description Framework (RDF)
  - Abstract mechanism for defining simple relationships among web resources
- Mathematical Markup Language (MathML)
  - Language to describe mathematical expressions
19. XML DTD
- Defines the hierarchy of all user-defined elements (tags) in the XML document
- Declares the content model and attributes of each XML element
- Each XML document references a specific DTD file, against which its elements are validated
20. XML DTD
    <?xml version="1.0" encoding="UTF-8"?>
    <!-- DTD for a simple program: beginning of element declarations -->
    <!-- the root tag: Language -->
    <!ELEMENT Language (FileTag,Declaration,Function_Call)>
    <!ELEMENT FileTag (IncludeTag,SourceTag)>
    <!ELEMENT IncludeTag (#PCDATA)>
    <!ELEMENT SourceTag (#PCDATA)>
    <!ELEMENT Declaration (Type_Name,Identifier)>
    <!ELEMENT Type_Name (#PCDATA)>
    <!ELEMENT Identifier (#PCDATA)>
    <!ELEMENT Function_Call (Return_Type,Function_Name,Argument)>
    <!ELEMENT Return_Type (Return_Var)>
    <!ELEMENT Return_Var (#PCDATA)>
    <!ELEMENT Function_Name (#PCDATA)>
- Callout on the Language declaration: defines which other tags appear within the <Language> tag
- Callout on the IncludeTag declaration: defines the content type (#PCDATA, parsed character data) of the <IncludeTag> tag
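A sequence content model such as `(Return_Type,Function_Name,Argument)` behaves like a regular expression over the names of an element's children. Python's standard library does not validate against DTDs, so the sketch below only illustrates that idea: `CONTENT_MODELS` and `children_match` are hypothetical names, the models are translated by hand, and only the `,` (sequence) connector is covered. Real validation needs a validating parser.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models (hypothetical table, not a DTD parser):
# a DTD sequence "A,B,C" becomes the regex "A B C" over space-joined child names.
CONTENT_MODELS = {
    "Function_Call": r"Return_Type Function_Name Argument",  # (Return_Type,Function_Name,Argument)
    "Return_Type": r"Return_Var",                            # (Return_Var)
}

def children_match(elem):
    """True if elem's child element names match its content model (if one is recorded)."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        return True  # element not covered by this sketch
    names = " ".join(child.tag for child in elem)
    return re.fullmatch(model, names) is not None

good = ET.fromstring(
    "<Function_Call><Return_Type><Return_Var>student_profile</Return_Var></Return_Type>"
    "<Function_Name>elec_eng</Function_Name><Argument/></Function_Call>"
)
bad = ET.fromstring("<Function_Call><Function_Name>elec_eng</Function_Name></Function_Call>")

print(all(children_match(e) for e in good.iter()))  # True
print(children_match(bad))                          # False: Return_Type and Argument missing
```

The other DTD operators map naturally onto regex syntax (`|` to alternation, `?`, `*`, `+` to quantifiers), which is why content models can be checked with finite automata.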
21. XML Document (page 1 of 2)
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="studentXSL1.xsl"?>
    <!DOCTYPE Language SYSTEM "Student.dtd">
    <Language>
      <FileTag>
        <IncludeTag>#include stdio.h</IncludeTag>
      </FileTag>
      <FileTag>
        <IncludeTag>#include math.h</IncludeTag>
      </FileTag>
      <FileTag>
        <SourceTag>code statement3</SourceTag>
      </FileTag>
      <FileTag>
        <SourceTag>code statement2</SourceTag>
      </FileTag>
      <Declaration>
        <Type_Name>char</Type_Name>
- Callout on the xml-stylesheet line: links an XSL style sheet
- Callout on the DOCTYPE line: references the DTD document
22. XML Document (page 2 of 2)
      <Declaration>
        <Type_Name>int</Type_Name>
        <Identifier>numOfstudents</Identifier>
      </Declaration>
      <Declaration>
        <Type_Name>char</Type_Name>
        <Identifier>facultyName</Identifier>
      </Declaration>
      <Function_Call>
        <Return_Type>
          <Return_Var>student_profile</Return_Var>
        </Return_Type>
        <Function_Name>elec_eng</Function_Name>
        <Argument>
          <parameterName>name</parameterName>
        </Argument>
      </Function_Call>
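Once parsed, the document can be read by element name rather than by position in the text. The sketch below assumes a well-formed excerpt of the Student example (the processing instructions and DOCTYPE are left out so it does not depend on `Student.dtd`) and uses Python's standard `xml.etree.ElementTree`; `SOURCE` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Well-formed excerpt of the Student example (prolog and DOCTYPE omitted).
SOURCE = """<Language>
  <FileTag><IncludeTag>#include stdio.h</IncludeTag></FileTag>
  <FileTag><IncludeTag>#include math.h</IncludeTag></FileTag>
  <Declaration><Type_Name>int</Type_Name><Identifier>numOfstudents</Identifier></Declaration>
  <Declaration><Type_Name>char</Type_Name><Identifier>facultyName</Identifier></Declaration>
  <Function_Call>
    <Return_Type><Return_Var>student_profile</Return_Var></Return_Type>
    <Function_Name>elec_eng</Function_Name>
    <Argument><parameterName>name</parameterName></Argument>
  </Function_Call>
</Language>"""

root = ET.fromstring(SOURCE)

# Collect the include lines and the (type, identifier) pairs by element name.
includes = [f.findtext("IncludeTag") for f in root.findall("FileTag")]
declarations = [(d.findtext("Type_Name"), d.findtext("Identifier"))
                for d in root.findall("Declaration")]
call = root.find("Function_Call")

print(includes)      # ['#include stdio.h', '#include math.h']
print(declarations)  # [('int', 'numOfstudents'), ('char', 'facultyName')]
print(call.findtext("Function_Name"), call.findtext("Return_Type/Return_Var"))  # elec_eng student_profile
```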
23. XML Namespaces
- Latest milestone for W3C's XML technology (W3C Recommendation, 14 January 1999)
- W3C's definition of XML namespaces:
  - "XML namespaces provide a simple method for qualifying element and attribute names used in Extensible Markup Language documents by associating them with namespaces identified by URI references."
- Why use them?
  - To keep tag names meaningful and unique
- How do they solve the problem?
  - By adding context to XML tags through a prefix bound to a URI
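A qualified name is a prefix plus a local part, and the prefix only stands in for a URI. The sketch below shows two vocabularies that both define `title`; Python's `xml.etree.ElementTree` expands each name to `{URI}local-part`, so the two never collide. The HTML URI is the one used in the XSL example of these slides; the `http://example.org/books` URI and the `catalog` markup are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<catalog xmlns:html="http://www.w3.org/TR/REC-html40"
                  xmlns:bk="http://example.org/books">
  <html:title>Catalog page</html:title>
  <bk:title>Introduction to XML</bk:title>
</catalog>"""

root = ET.fromstring(DOC)

# The parser replaces each prefix with its URI: "{namespace-URI}local-part".
for child in root:
    print(child.tag)
# {http://www.w3.org/TR/REC-html40}title
# {http://example.org/books}title

# Queries match on the URI, so any prefix bound to the same URI works.
ns = {"b": "http://example.org/books"}
print(root.find("b:title", ns).text)  # Introduction to XML
```

Only the URI identifies the namespace; the prefix `bk` in the document and `b` in the query are local conveniences.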
24. XSL Document (Page 1 of 3)
    <?xml version="1.0"?>
    <DIV xmlns:xsl="http://www.w3.org/TR/WD-xsl">
    <html:html xmlns:html="http://www.w3.org/TR/REC-html40">
    <i>This page consists of XML, XSL, Namespace, HTML, and Java Applet</i>
    <html:head><html:title><H1>Sample C Code (hidden XML tag)</H1></html:title></html:head>
    <xsl:for-each select="Language">
      <TD STYLE="padding-left:1em">
        <DIV><xsl:value-of select="/"/></DIV>
        <html:font color="red">The above command prints out all contents within tags without any formatting, ordering, line breaks, etc.</html:font>
      </TD>
    </xsl:for-each>
    <xsl:for-each order-by="+ IncludeTag" select="Language/FileTag">
      <TD STYLE="padding-left:1em">
        <html:BR></html:BR>
        <DIV><html:BR><xsl:value-of select="IncludeTag"/></html:BR></DIV>
      </TD>
- Callout on the DIV line: namespace for XSL
- Callout on the html:html line: namespace for HTML
25. XSL Document (Page 2 of 3)
    <xsl:for-each order-by="+ SourceTag" select="Language/FileTag">
      <TD STYLE="padding-left:1em">
        <html:BR></html:BR>
        <DIV><xsl:value-of page-break-after="SourceTag" select="SourceTag"/></DIV>
      </TD>
    </xsl:for-each>
    <html:font color="red">End of SourceTag, ascending sort on SourceTag content</html:font>
    <html:BR></html:BR>
    <xsl:for-each order-by="+ Type_Name" select="Language/Declaration">
      <TD STYLE="padding-left:1em">
        <html:BR></html:BR>
        <DIV><html:BR><xsl:value-of select="Type_Name"/></html:BR></DIV>
        <DIV><html:BR><xsl:value-of select="Identifier"/></html:BR></DIV>
      </TD>
    </xsl:for-each>
    <html:font color="red">End of Declaration, ascending sort on Type_Name</html:font>
    <DIV></DIV>
26. XSL Document (Page 3 of 3)
    <xsl:for-each select="Language/Function_Call">
      <TD STYLE="padding-left:1em">
        <html:BR><DIV><xsl:value-of select="Return_Type"/></DIV></html:BR>
        <html:font color="red">End of Return_Type</html:font>
        <html:BR><DIV><xsl:value-of select="Function_Name"/></DIV></html:BR>
        <html:font color="red">End of Function_Name</html:font>
        <html:BR></html:BR>
        <html:BR><DIV><xsl:value-of select="Argument"/></DIV></html:BR>
        <html:font color="red">End of Argument</html:font>
        <html:BR></html:BR>
      </TD>
    </xsl:for-each>
    <html:BR></html:BR>
    <html:APPLET code="AgentAction.class" width="400" height="200"></html:APPLET>
    <html:BR></html:BR>
    </html:html>
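Each `xsl:for-each` loop above selects a set of nodes, optionally sorts them, and emits a fragment of HTML per node. As a rough analogue (not XSL, and not how a browser's XSL processor works internally), the Declaration loop, which selects `Language/Declaration` and, per its red note, sorts ascending on `Type_Name`, can be sketched in Python. `render_declarations` and `SOURCE` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from html import escape

SOURCE = """<Language>
  <Declaration><Type_Name>int</Type_Name><Identifier>numOfstudents</Identifier></Declaration>
  <Declaration><Type_Name>char</Type_Name><Identifier>facultyName</Identifier></Declaration>
</Language>"""

def render_declarations(language):
    """Select Language/Declaration, sort ascending on Type_Name, emit one TD per node."""
    decls = sorted(language.findall("Declaration"),
                   key=lambda d: d.findtext("Type_Name"))
    cells = []
    for d in decls:
        cells.append('<TD STYLE="padding-left:1em">'
                     f'<DIV>{escape(d.findtext("Type_Name"))}</DIV>'
                     f'<DIV>{escape(d.findtext("Identifier"))}</DIV></TD>')
    return "\n".join(cells)

print(render_declarations(ET.fromstring(SOURCE)))
# <TD STYLE="padding-left:1em"><DIV>char</DIV><DIV>facultyName</DIV></TD>
# <TD STYLE="padding-left:1em"><DIV>int</DIV><DIV>numOfstudents</DIV></TD>
```

The point of XSL is that this select/sort/emit logic is declared in the style sheet rather than written as program code, so the same XML can be restyled without touching a program.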
27. Applications that require XML
- Information exchange between heterogeneous databases
  - Health care example
- Distributed processing
  - Semiconductor industry example
- Multiple views of the same data
- Intelligent information agents
28. Using XML
- XML for storage
  - Compact syntax
  - Generalized and standardized
  - Product independent
- XML for searching
  - Use of content-specific markup enables robust searching
  - Search engines need to be XML-aware
  - Can use current SGML search engines
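Content-specific markup lets a search ask "which identifiers are declared as `char`?" instead of matching the string "char" anywhere in the text. A minimal sketch with the limited XPath subset in Python's `xml.etree.ElementTree`; the sample data is an excerpt modelled on the Student example.

```python
import xml.etree.ElementTree as ET

SOURCE = """<Language>
  <Declaration><Type_Name>int</Type_Name><Identifier>numOfstudents</Identifier></Declaration>
  <Declaration><Type_Name>char</Type_Name><Identifier>facultyName</Identifier></Declaration>
  <Function_Call><Function_Name>elec_eng</Function_Name></Function_Call>
</Language>"""

root = ET.fromstring(SOURCE)

# Declarations whose Type_Name child has the text "char".
char_vars = [d.findtext("Identifier")
             for d in root.findall("Declaration[Type_Name='char']")]
print(char_vars)  # ['facultyName']

# Every Function_Name element, at any depth.
print([e.text for e in root.iter("Function_Name")])  # ['elec_eng']
```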
29. What is DOM?
- A programming API for XML
  - Logical structure of documents
  - Access and manipulation of documents
30. What is DOM?
- As an object model, DOM identifies
  - The interfaces and objects used to represent a document
  - Their behaviours and attributes
  - The relationships and collaborations among interfaces and objects
31. What is DOM?
- Two major components of DOM Level 1
  - DOM Core: basic functionality for XML
  - DOM HTML: objects and methods specific to HTML
- Level 2
  - DOM CSS, DOM Events, DOM Filters and Iterators, DOM Range
32. Advantages of using DOM
- Easy to create, navigate, add to, and modify documents
- DOM abstraction avoids implementation dependencies
- DOM applications may use additional language bindings (DOM Level 1 defines Java and ECMAScript bindings)
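Python's standard `xml.dom.minidom` is one more language binding of the DOM interfaces, so the create / add / navigate / modify cycle listed above can be sketched with it. The element names follow the Student example; changing `int` to `long` is an invented edit for illustration.

```python
from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()
# createDocument(namespaceURI, qualifiedName, doctype) returns a Document with a root element.
doc = impl.createDocument(None, "Language", None)
root = doc.documentElement

# Add: build a Declaration subtree node by node and attach it to the root.
decl = doc.createElement("Declaration")
for tag, text in (("Type_Name", "int"), ("Identifier", "numOfstudents")):
    child = doc.createElement(tag)
    child.appendChild(doc.createTextNode(text))
    decl.appendChild(child)
root.appendChild(decl)

# Navigate and modify: find the Type_Name element and change its text node.
type_name = root.getElementsByTagName("Type_Name")[0]
type_name.firstChild.data = "long"

print(root.toxml())
# <Language><Declaration><Type_Name>long</Type_Name><Identifier>numOfstudents</Identifier></Declaration></Language>
```

The same calls (`createElement`, `appendChild`, `getElementsByTagName`) exist in the Java and ECMAScript bindings, which is the practical meaning of avoiding implementation dependencies.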
33. A Typical DOM Structure
    <condition_statement>
      <if_statement>
        <if_tag> if </if_tag>
        <expression_tag> (b == c) </expression_tag>
        <statement_tag> a = c </statement_tag>
      </if_statement>
    </condition_statement>
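The DOM turns markup like the `condition_statement` example into a tree of element and text nodes. The sketch below parses a version of that example with Python's `xml.dom.minidom` and walks the tree depth-first; the condition `(b == c)` and statement `a = c` are assumed contents, and `dump` is a hypothetical helper.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

SOURCE = ("<condition_statement><if_statement>"
          "<if_tag>if</if_tag>"
          "<expression_tag>(b == c)</expression_tag>"
          "<statement_tag>a = c</statement_tag>"
          "</if_statement></condition_statement>")

def dump(node, depth=0, out=None):
    """Depth-first walk: one indented line per element or non-blank text node."""
    if out is None:
        out = []
    if node.nodeType == Node.ELEMENT_NODE:
        out.append("  " * depth + node.tagName)
    elif node.nodeType == Node.TEXT_NODE and node.data.strip():
        out.append("  " * depth + repr(node.data.strip()))
    for child in node.childNodes:
        dump(child, depth + 1, out)
    return out

doc = parseString(SOURCE)
print("\n".join(dump(doc.documentElement)))
# condition_statement
#   if_statement
#     if_tag
#       'if'
#     expression_tag
#       '(b == c)'
#     statement_tag
#       'a = c'
```

The text inside each tag becomes a separate Text child node, one level below its element; that is why the leaves sit under `if_tag`, `expression_tag` and `statement_tag`.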
34. A Typical DOM Structure (2)
35. A Typical DOM Structure (3)
- DOM abstraction is a tree or forest structure
- Users have full flexibility to specify the structure
- Structural isomorphism: any two DOM implementations build the same structure model for the same document
36. Some Key Objects
- Node
  - A tree node of the document
  - Root node, parents and children
- Element (is a Node object)
  - An element of a document
  - Represents the contents between the start tag and the end tag
  - Carries the attributes defined by the DTD
37. Some Key Objects (2)
- Document
  - Root node of a document
- NodeIterator
  - Iterates over a set of nodes specified by a filter
- AttributeList
  - Collection of Attribute objects, indexed by attribute name
38. Some Key Objects (3)
- Attribute
  - Attribute of an Element object
- DocumentContext
  - Repository for metadata about a document
- DOM
  - Provides instance-independent document operations
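Several of these objects can be seen in Python's `xml.dom.minidom`, which follows the DOM Level 1 Recommendation. There the attribute collection is a `NamedNodeMap` (the role the AttributeList plays on these slides) and each attribute is an `Attr` node. The markup with `name` and `returns` attributes is hypothetical.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical markup with attributes, invented for this sketch.
doc = parseString('<Function_Call name="elec_eng" returns="student_profile">'
                  '<Argument>name</Argument></Function_Call>')

root = doc.documentElement                 # Element: the root element
print(doc.nodeType == Node.DOCUMENT_NODE)  # True: the Document is itself a Node
print(root.parentNode is doc)              # True: parent/child links form the tree

attrs = root.attributes                    # NamedNodeMap: attributes indexed by name
print(attrs.length)                        # 2
attr = attrs.getNamedItem("returns")       # Attr node
print(attr.name, attr.value)               # returns student_profile
print(root.getAttribute("name"))           # elec_eng
```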
39. Memory Management for DOM
- DOM APIs operate across a variety of memory management approaches
  - Language platforms that do not expose memory management to the user
  - Languages (Java) that provide constructors, with garbage collection
  - Languages (C/C++) that require explicit memory allocation
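Python sits with Java in the garbage-collected group, but a DOM tree is full of parent/child reference cycles. Python's `xml.dom.minidom` therefore offers `unlink()`, which breaks those internal references so the memory can be reclaimed sooner; a `with` block calls it automatically on exit. A minimal sketch using a fragment of the memo example:

```python
from xml.dom.minidom import parseString

# The with-statement calls doc.unlink() when the block exits, breaking the
# parent/child cycles; copy out any data you need before leaving the block.
with parseString("<memo><to>All staff</to></memo>") as doc:
    recipient = doc.documentElement.firstChild.firstChild.data

print(recipient)  # All staff
```

Even with a collector available, explicit release of this kind mainly matters for large documents.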
40. Resources/Quirks
- IE 5.0 and Navigator 5.0 implement different features
  - IE 5.0: XML/XSL
  - Navigator: XML/CSS
  - Navigator is to support RDF
- XML resources
  - http://www.swen.uwaterloo.ca/group1
41. Using XML (cont.)
- XML for presentation
  - Convert to HTML at the server
  - Use Java applications to render in the browser
    - Slow
  - Use XSL or CSS to render in the browser
    - Fast
42. XML in the industry
- Explosive growth of XML tools and specifications
  - Tools: JADE, MSXML, JUMBO, ...
  - Specifications: CDF, CFML, EDI
  - Browsers: IE, Netscape
43. Thoughts on XML
- Seems like a transition stage between HTML and SGML
  - Will we eventually end up using SGML?
- XML follows basic principles of software engineering
  - Higher abstraction layer
  - Reuse
  - Modularity
44. References
- XML.COM - A guide to XML
  - http://www.xml.com/xml/pub/w3j/s3.walsh.html
- XML.COM - The Road to XML: Adapting SGML to the Web
  - http://www.xml.com/xml/pub/w3j/s1.discussion.html
- The Computer Bulletin - The XML Files
  - http://www.bcs.org.uk/publicat/ebull/may98/xml.htm
- XML, Java, and the future of the Web
  - http://sunsite.unc.edu/pub/sun-info/standards/xml/why/xmlapps.htm
- XML: What is it?
  - http://iai.sgml.com/980106-01.asp
- Why do we need XML?
  - http://info.admin.kth.se/SGML/Konferenser/xml98sve/seminar.html
- An Introduction to the Standard Generalized Markup Language
  - http://www.personal.u-net.com/sgml/sgml.htm
- SGML101
  - http://www.uslynx.com/sgml101.htm