Title: ACE104 Lecture 2
1ACE104 Lecture 2
2XML in messaging
- Most modern languages have a method of representing structured data.
- Typical flow of events in an application:
  - Read data (file, db, socket)
  - Unmarshal into objects
  - Manipulate in program
  - Marshal back out (file, db, socket)
- Many language-specific technologies exist to reduce these steps: RMI, object serialization in any language, CORBA (actually somewhat language neutral), MPI, etc.
- XML provides a very appealing alternative that hits the sweet spot for many applications
3User-defined types in programming languages
- One view of XML is as a text-based,
programming-language-neutral way of representing
structured information. Compare
4Sample XML Schema
- In XML, a (common) datatype description is called an XML schema.
- DTD and Relax NG are other common alternatives
- Below uses schema just for illustration purposes
- Note that schema itself is written in XML
    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               elementFormDefault="qualified"
               attributeFormDefault="unqualified">
      <xs:element name="student">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="ssn" type="xs:string"/>
            <xs:element name="age" type="xs:integer"/>
            <xs:element name="gpa" type="xs:decimal"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
Ignore this For now
5Alternative schema
- In this example studentType is defined separately rather than anonymously
    <xs:schema>
      <xs:element name="student" type="studentType"/>
      <xs:complexType name="studentType">
        <xs:sequence>
          <xs:element name="name" type="xs:string"/>
          <xs:element name="ssn" type="xs:string"/>
          <xs:element name="age" type="xs:integer"/>
          <xs:element name="gpa" type="xs:decimal"/>
        </xs:sequence>
      </xs:complexType>
    </xs:schema>
new type defined separately
6Alternative DTD
- Can also use a DTD (Document Type Definition). This is much simpler than a schema but also much less powerful (notice the lack of types)
    <!DOCTYPE Student [
      <!-- Each XML file is stored in a document whose name is the same as the root node -->
      <!ELEMENT Student (name,ssn,age,gpa)>
      <!-- Student has four child elements -->
      <!ELEMENT name (#PCDATA)>
      <!-- name is parsed character data -->
      <!ELEMENT ssn (#PCDATA)>
      <!ELEMENT age (#PCDATA)>
      <!ELEMENT gpa (#PCDATA)>
    ]>
7Another alternative Relax NG
- Gaining in popularity
- Can be very simple to write and at the same time has many more features than DTD
- Still much less common than Schema
8Creating instances of types
In programming languages, we instantiate objects:
C:
    struct Student s1, s2;
    s1.name = "Andrew";
    s1.ssn = "123-45-6789";
Java:
    Student s = new Student();
    s.name = "Andrew";
    s.ssn = "123-45-6789";
    ...
Fortran:
    type(Student) :: s1
    s1%name = "Andrew"
    ...
9Creating XML documents
- XML is not a programming language!
- In XML we make a Student object in an XML file (Student.xml)
    <Student>
      <name>Andrew</name>
      <ssn>123-45-6789</ssn>
      <age>39</age>
      <gpa>2.0</gpa>
    </Student>
- Think of this as like a serialized object.
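A minimal Python sketch of the "serialized object" idea: read the Student document into an in-memory object, change it, and write it back out as XML. It assumes the standard library's xml.etree.ElementTree; the Student dataclass and the two helper functions are illustrative names, not part of the lecture.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

STUDENT_XML = """<Student>
  <name>Andrew</name>
  <ssn>123-45-6789</ssn>
  <age>39</age>
  <gpa>2.0</gpa>
</Student>"""


@dataclass
class Student:
    name: str
    ssn: str
    age: int
    gpa: float


def student_from_xml(text: str) -> Student:
    """Parse the XML text and build an in-memory Student object."""
    root = ET.fromstring(text)
    return Student(
        name=root.findtext("name"),
        ssn=root.findtext("ssn"),
        age=int(root.findtext("age")),
        gpa=float(root.findtext("gpa")),
    )


def student_to_xml(s: Student) -> str:
    """Write the Student object back out as XML text."""
    root = ET.Element("Student")
    for tag in ("name", "ssn", "age", "gpa"):
        ET.SubElement(root, tag).text = str(getattr(s, tag))
    return ET.tostring(root, encoding="unicode")


s1 = student_from_xml(STUDENT_XML)
s1.age += 1  # manipulate the object inside the program
print(student_to_xml(s1))
```

Note that the types (int, float) live in the program here; the XML file itself only carries text, which is one reason a schema is useful.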
10XML and Schema
- Note that there are two parts to what we did
  - Defining the structure layout
  - Defining an instance of the structure
- The first is done with an appropriate Schema or DTD.
- The second is the XML part
- Both can go in the same file, or an XML file can refer to an external Schema or DTD (typical)
- From this point on we use only Schema
- Exercise 1
11?
- Question: What can we do with such a file?
- Some answers:
  - Write corresponding Schema to define its content
  - Write XSL transformation to display
  - Parse into a programming language
12Exercise 1
13Exercise 1 Solution
<?xml version="1.0" encoding="UTF-8"?>
<cars>
  <car>
    <make>dodge</make>
    <model>ram</model>
    <color>red</color>
    <year>2004</year>
    <mileage>22000</mileage>
  </car>
  <car>
    <make>Ford</make>
    <model>Pinto</model>
    <color>white</color>
    <year>1980</year>
    <mileage>100000</mileage>
  </car>
</cars>
14Some sample XML documents
15Order / Whitespace
Note that element order is important, but whitespace used only for layout (line breaks and indentation between tags) is generally not. Run together like this, the document is the same as far as most XML applications are concerned:
<Article > <Headline>Direct Marketer Offended by Term 'Junk Mail' </Headline> <authors >
<author> Joe Garden</author> <author> Tim Harrod</author> </authors> <abstract>Dan
Spengler, CEO of the direct-mail-marketing firm Mailbox of Savings, took umbrage Monday at the
use of the term <it>junk mail</it> </abstract> <body type="url" >
http://www.theonion.com/archive/3-11-01.html </body> </Article>
16Molecule Example
- XML is extremely useful for standardizing data sharing within specialized domains. Below is a part of the Chemical Markup Language describing a water molecule and its constituents
    <?xml version="1.0" ?>
    <CML>
      <MOL TITLE="Water">
        <ATOMS>
          <ARRAY BUILTIN="ELSYM">H O H</ARRAY>
        </ATOMS>
        <BONDS>
          <ARRAY BUILTIN="ATID1">1 2</ARRAY>
          <ARRAY BUILTIN="ATID2">2 3</ARRAY>
          <ARRAY BUILTIN="ORDER">1 1</ARRAY>
        </BONDS>
      </MOL>
    </CML>
17Rooms example
- A typical example showing a few more XML features
    <?xml version="1.0" ?>
    <rooms>
      <room name="Red">
        <capacity>10</capacity>
        <equipmentList>
          <equipment>Projector</equipment>
        </equipmentList>
      </room>
      <room name="Green">
        <capacity>5</capacity>
        <equipmentList />
        <features>
          <feature>No Roof</feature>
        </features>
      </room>
    </rooms>
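The features in the rooms example (attributes, nested lists, an empty element) can be read back with a few lines of Python. This is only a sketch using the standard library's xml.etree.ElementTree; the summary dictionary is an illustrative structure, not something the lecture prescribes.

```python
import xml.etree.ElementTree as ET

ROOMS_XML = """<?xml version="1.0" ?>
<rooms>
  <room name="Red">
    <capacity>10</capacity>
    <equipmentList>
      <equipment>Projector</equipment>
    </equipmentList>
  </room>
  <room name="Green">
    <capacity>5</capacity>
    <equipmentList />
    <features>
      <feature>No Roof</feature>
    </features>
  </room>
</rooms>"""

root = ET.fromstring(ROOMS_XML)
summary = {}
for room in root.findall("room"):
    name = room.get("name")                     # attribute value
    capacity = int(room.findtext("capacity"))   # text of a child element
    # An empty <equipmentList /> simply yields an empty list here.
    equipment = [e.text for e in room.findall("equipmentList/equipment")]
    # A room without a <features> element also yields an empty list.
    features = [f.text for f in room.findall("features/feature")]
    summary[name] = (capacity, equipment, features)

print(summary)
```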
18Suggestion
- Try building each of those documents in an XML builder tool (XMLSpy, Oxygen, etc.) or at least an XML-aware editor.
- Note it is not required to create a schema to do this. Just create a new XML document and start building.
19Dissecting an XML Document
20Things that can appear in an XML document
- Elements: simple, complex, empty, or mixed content model; attributes.
- The XML declaration
- Processing instructions (PIs): <? ... ?>
  - Most common is <?xml-stylesheet ... ?>
  - <?xml-stylesheet type="text/css" href="mys.css"?>
- Comments: <!-- comment text -->
21Parts of an XML document
    <?xml version="1.0" ?>
    <CML>
      <MOL TITLE="Water">
        <ATOMS>
          <ARRAY BUILTIN="ELSYM">H O H</ARRAY>
        </ATOMS>
        <BONDS>
          <ARRAY BUILTIN="ATID1">1 2</ARRAY>
          <ARRAY BUILTIN="ATID2">2 3</ARRAY>
          <ARRAY BUILTIN="ORDER">1 1</ARRAY>
        </BONDS>
      </MOL>
    </CML>
- Declaration: <?xml version="1.0" ?>
- Tags: begin tags such as <ATOMS>, end tags such as </ATOMS>
- Attributes: TITLE, BUILTIN
- Attribute values: "Water", "ELSYM", "ATID1", "ATID2", "ORDER"
An XML element is everything from (including) the element's start tag to (including) the element's end tag.
22XML and Trees
- Tags give the structure of a document. They divide the document up into elements, starting at the topmost element, the root element. The stuff inside an element is its content; content can include other elements along with character data
Tree for the water molecule document (leaf values are character data):
    CML (root element)
      MOL
        ATOMS
          ARRAY: H O H
        BONDS
          ARRAY: 1 2
          ARRAY: 2 3
          ARRAY: 1 1
23XML and Trees
    <?xml version="1.0" ?>
    <CML>
      <MOL TITLE="Water">
        <ATOMS>
          <ARRAY BUILTIN="ELSYM">H O H</ARRAY>
        </ATOMS>
        <BONDS>
          <ARRAY BUILTIN="ATID1">1 2</ARRAY>
          <ARRAY BUILTIN="ATID2">2 3</ARRAY>
          <ARRAY BUILTIN="ORDER">1 1</ARRAY>
        </BONDS>
      </MOL>
    </CML>
Corresponding tree (leaves are data sections):
    CML (root element)
      MOL
        ATOMS
          ARRAY: H O H
        BONDS
          ARRAY: 1 2
          ARRAY: 2 3
          ARRAY: 1 1
24XML and Trees
Tree for the rooms document:
    rooms
      room
        capacity: 10
        equipmentList
          equipment: Projector
      room
        capacity: 5
        equipmentList
        features
          feature: No Roof
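The tree view of a document can be produced mechanically by walking the parsed elements. Below is a small Python sketch, assuming the standard library's xml.etree.ElementTree and using the water molecule document; the outline function is an illustrative helper, not part of the lecture.

```python
import xml.etree.ElementTree as ET

CML_XML = """<CML>
  <MOL TITLE="Water">
    <ATOMS>
      <ARRAY BUILTIN="ELSYM">H O H</ARRAY>
    </ATOMS>
    <BONDS>
      <ARRAY BUILTIN="ATID1">1 2</ARRAY>
      <ARRAY BUILTIN="ATID2">2 3</ARRAY>
      <ARRAY BUILTIN="ORDER">1 1</ARRAY>
    </BONDS>
  </MOL>
</CML>"""


def outline(elem, depth=0):
    """Return one line per element, indented by depth, with any leaf text."""
    text = (elem.text or "").strip()
    line = "  " * depth + elem.tag + (f": {text}" if text else "")
    lines = [line]
    for child in elem:  # iterating an element yields its child elements in order
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(CML_XML))
print("\n".join(tree_lines))
```

The same function works unchanged on the rooms document, since it relies only on the parent/child structure and not on any particular tag names.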
25More detail on elements
26Element relationships
- Book is the root element.
- Title, prod, and chapter are child elements of book.
- Book is the parent element of title, prod, and chapter.
- Title, prod, and chapter are siblings (or sister elements) because they have the same parent.
    <book>
      <title>My First XML</title>
      <prod id="33-657" media="paper"></prod>
      <chapter>Introduction to XML
        <para>What is HTML</para>
        <para>What is XML</para>
      </chapter>
      <chapter>XML Syntax
        <para>Elements must have a closing tag</para>
        <para>Elements must be properly nested</para>
      </chapter>
    </book>
27Well formed XML
28Well-formed vs Valid
- An XML document is said to be well-formed if it obeys the basic syntactic constraints of XML.
- This is different from a valid XML document, which (as we will see in more depth) properly matches a schema.
29Rules for Well-Formed XML
- An XML document is considered well-formed if it obeys the following rules:
- There must be one element that contains all others (root element)
- All tags must be balanced
  - <BOOK>...</BOOK>
  - <BOOK />
- Tags must be nested properly
  - <BOOK> <LINE> This is OK </LINE> </BOOK>
  - <LINE> <BOOK> This is </LINE> definitely NOT </BOOK> OK
- Element names are case-sensitive, so
  - <P>This is not ok, even though we do it all the time in HTML!</p>
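A quick way to see these rules in action is to hand small fragments to a parser and note which ones it rejects. The sketch below uses Python's standard xml.etree.ElementTree, which raises ParseError on input that is not well-formed; is_well_formed and the sample strings are illustrative, and the check says nothing about validity against a schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if the parser accepts the text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "balanced": "<BOOK><LINE>This is OK</LINE></BOOK>",
    "empty element": "<BOOK />",
    "bad nesting": "<LINE><BOOK>This is</LINE> definitely NOT</BOOK>",
    "case mismatch": "<P>not ok in XML</p>",
    "two roots": "<BOOK/><BOOK/>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```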
30More Rules for Well-Formed XML
- The attribute values in a tag must be in quotes
  - <ITEM CATEGORY="Home and Garden" Name="hoe-matic t500">
- Comments are allowed
  - <!-- They are done just as in HTML -->
- Should begin with an XML declaration
  - <?xml version="1.0" ?>
- Special characters must be escaped; the most common are < > & " '
  - <formula> x &lt; y+2x </formula>
  - <cd title="&quot; mmusic">
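Escaping is usually left to a library rather than done by hand. The following Python sketch uses xml.sax.saxutils.escape from the standard library; the formula and title strings are made-up sample data, and passing an extra entity map for the double quote is one possible way to make text safe inside a double-quoted attribute.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

formula = "x < y+2x"          # sample element text containing '<'
title = 'the "best" music'    # sample attribute value containing '"'

# Unescaped text breaks well-formedness: '<' starts what looks like a tag.
try:
    ET.fromstring("<formula>" + formula + "</formula>")
    raw_ok = True
except ET.ParseError:
    raw_ok = False

# escape() replaces &, < and > ; the extra map also replaces double quotes.
element = "<formula>" + escape(formula) + "</formula>"
escaped_title = escape(title, {'"': "&quot;"})
cd = '<cd title="' + escaped_title + '"/>'

print(raw_ok)
print(element)
print(cd)

# The parser turns the entities back into the original characters.
print(ET.fromstring(element).text)
print(ET.fromstring(cd).get("title"))
```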
31Naming Rules
- Naming rules for XML elements
  - Names may contain letters, numbers, and other characters
  - Names must not start with a number or punctuation character
  - Names must not start with the letters xml (or XML or Xml ..)
  - Names cannot contain spaces
- Any name can be used, no words are reserved, but the idea is to make names descriptive. Names with an underscore separator are typical
- Examples: <first_name>, <date_of_birth>, etc.
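The naming rules above can be approximated with a short check. This Python sketch is deliberately simplified and ASCII-only: the pattern and the is_acceptable_name helper are assumptions made for illustration, and the real XML name rules also admit colons and many non-ASCII letters.

```python
import re

# Simplified rule: start with a letter or underscore, then letters, digits,
# underscores, periods, or hyphens. No spaces anywhere.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_acceptable_name(name: str) -> bool:
    """Apply the slide's rules to a candidate element name."""
    if name.lower().startswith("xml"):
        return False  # names beginning with xml, XML, Xml, ... are off limits
    return NAME_RE.fullmatch(name) is not None


for candidate in ["first_name", "date_of_birth", "2nd_place", "first name", "XmlData"]:
    print(candidate, is_acceptable_name(candidate))
```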
32XML Tools
- XML can be created with any text editor
- Normally we use an XML-friendly editor
- e.g. XMLSpy
- nXML emacs extensions
- MSXML on Windows
- Oxygen
- Etc etc.
- To check and validate XML, use either these tools
and/or xmllint on Unix systems.
33Another View
- XML-as-data is one way to introduce XML
- Another is as a markup language similar to HTML.
- One typically says that HTML has a fixed tag set, whereas XML allows the definition of arbitrary tags
- This analogy is particularly useful when the goal is to use XML for text presentation -- that is, when most of our data fields contain text
- Note that mixed element/text fields are permissible in XML
34Article example
<Article>
  <Headline>Direct Marketer Offended by Term 'Junk Mail'</Headline>
  <authors>
    <author>Joe Garden</author>
    <author>Tim Harrod</author>
  </authors>
  <abstract>Dan Spengler, CEO of the direct-mail-marketing firm Mailbox of Savings, took umbrage Monday at the use of the term <it>junk mail</it>.</abstract>
  <body type="url">http://www.theonion.com/archive/3-11-01.html</body>
</Article>
35More uses of XML
- There is more!
- A very popular use of XML is as a base syntax for programming languages (the elements become program control structures)
- XSLT, BPEL, ant, etc. are good examples
- XML is ubiquitous, and a deep understanding of it is needed to be efficient and productive
- Many other current and potential uses -- up to the creativity of the programmer
36XML Schema
- There are many details of the schema specification to cover. It is extremely rich, flexible, and somewhat complex
- We will do this in detail next lecture
- Now we begin with a brief introduction
37XML Schema
- XML itself does not restrict what elements exist in a document.
- In a given application, you want to fix a vocabulary -- what elements make sense, what their types are, etc.
- Use a Schema to define an XML dialect
  - MusicXML, ChemXML, VoiceXML, ADXML, etc.
- Restrict documents to those tags.
- Schema can be used to validate a document -- i.e. to see if it obeys the rules of the dialect.
38 Schema determine
- What sort of elements can appear in the document.
- What elements MUST appear
- Which elements can appear as part of another element
- What attributes can appear or must appear
- What kind of values can/must be in an attribute.
39
<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book id="b0836217462" available="true">
    <isbn>0836217462</isbn>
    <title lang="en">Being a Dog is a Full-Time Job</title>
    <author id="CMS">
      <name>Charles Schulz</name>
      <born>1922-11-26</born>
      <dead>2000-02-12</dead>
    </author>
    <character id="PP">
      <name>Peppermint Patty</name>
      <born>1966-08-22</born>
      <qualification>bold, brash, and tomboyish</qualification>
    </character>
    <character id="Snoopy">
      <name>Snoopy</name>
      <born>1950-10-04</born>
      <qualification>extroverted beagle</qualification>
    </character>
    <character id="Schroeder">
      <name>Schroeder</name>
      <born>1951-05-30</born>
      <qualification>brought classical music to the Peanuts Strip</qualification>
    </character>
    <character id="Lucy">
      <name>Lucy</name>
      <born>1952-03-03</born>
      <qualification>bossy, crabby, and selfish</qualification>
    </character>
  </book>
</library>
- We start with a sample XML document and reverse engineer a schema as a simple example
- First identify the elements: author, book, born, character, dead, isbn, library, name, qualification, title
- Next categorize by content model
  - Empty: contains nothing
  - Simple: only text nodes
  - Complex: only sub-elements
  - Mixed: text nodes + sub-elements
- Note: content model independent
40Content models
- Simple content model: name, born, title, dead, isbn, qualification
- Complex content model: library, character, book, author
41Content Types
- We further distinguish between complex and simple content types
- Simple Type: An element with only text nodes and no child elements or attributes
- Complex Type: All other cases
- We also say (and require) that all attributes themselves have simple type
42Content Types
- Simple content type: name, born, dead, isbn, qualification
- Complex content type: library, character, book, author, title
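The two classifications can be computed directly from a parsed instance document. The Python sketch below works on a shortened copy of the library document and uses the standard library's xml.etree.ElementTree; the helper names are illustrative. It follows the definitions given on the slides: the content model looks only at text and child elements, while the content type also takes attributes into account.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """<library>
  <book id="b0836217462" available="true">
    <isbn>0836217462</isbn>
    <title lang="en">Being a Dog is a Full-Time Job</title>
    <author id="CMS">
      <name>Charles Schulz</name>
      <born>1922-11-26</born>
      <dead>2000-02-12</dead>
    </author>
    <character id="Snoopy">
      <name>Snoopy</name>
      <born>1950-10-04</born>
      <qualification>extroverted beagle</qualification>
    </character>
  </book>
</library>"""


def has_text(elem):
    """True if the element directly holds non-whitespace text."""
    parts = [elem.text] + [child.tail for child in elem]
    return any(p and p.strip() for p in parts)


def content_model(elem):
    has_children = len(elem) > 0
    if has_children and has_text(elem):
        return "mixed"
    if has_children:
        return "complex"
    return "simple" if has_text(elem) else "empty"


def content_type(elem):
    # Simple type: only text, with no child elements and no attributes.
    if content_model(elem) == "simple" and not elem.attrib:
        return "simple"
    return "complex"


models, types = {}, {}
for elem in ET.fromstring(LIBRARY_XML).iter():
    models[elem.tag] = content_model(elem)
    types[elem.tag] = content_type(elem)

for tag in sorted(models):
    print(f"{tag}: model={models[tag]}, type={types[tag]}")
```

The title element is the interesting case: its content model is simple, but its lang attribute gives it a complex type.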
43Exercise2 answer
- In the previous example <book>
- book has element content, because it contains other elements.
- Chapter has mixed content because it contains both text and other elements.
- Para has simple content (or text content) because it contains only text.
- Prod has empty content, because it contains neither text nor child elements (its attributes do not count as content)
44Building the schema
- Schema are XML documents
- They must contain a schema root element, as such:
    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://www.w3schools.com"
               xmlns="http://www.w3schools.com"
               elementFormDefault="qualified">
      ... ...
    </xs:schema>
- We will discuss details in a bit -- note that the part highlighted in yellow on the slide can be excluded for now.
45Flat schema for library
Start by defining all of the simple types (including attributes)
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name" type="xs:string"/>
  <xs:element name="qualification" type="xs:string"/>
  <xs:element name="born" type="xs:date"/>
  <xs:element name="dead" type="xs:date"/>
  <xs:element name="isbn" type="xs:string"/>
  <xs:attribute name="id" type="xs:ID"/>
  <xs:attribute name="available" type="xs:boolean"/>
  <xs:attribute name="lang" type="xs:language"/>
  ...
</xs:schema>
46Complex types with simple content
Now to complex types with simple content:
    <title lang="en">Being a Dog is ...</title>
    <xs:element name="title">
      <xs:complexType>
        <xs:simpleContent>
          <xs:extension base="xs:string">
            <xs:attribute ref="lang"/>
          </xs:extension>
        </xs:simpleContent>
      </xs:complexType>
    </xs:element>
The element named title has a complex type with simple content, obtained by extending the predefined datatype xs:string with the attribute named lang that is defined in this schema.
47Complex Types
All other types are complex types with complex content. For example:
    <xs:element name="library">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="book" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
    <xs:element name="author">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="name"/>
          <xs:element ref="born"/>
          <xs:element ref="dead" minOccurs="0"/>
        </xs:sequence>
        <xs:attribute ref="id"/>
      </xs:complexType>
    </xs:element>
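To get a feel for what xs:sequence with minOccurs and maxOccurs expresses, the constraint can be mimicked by hand. The Python sketch below is not a schema validator: it hard-codes the two sequences from this slide as (name, minOccurs, maxOccurs) tuples, with None standing for "unbounded" and both bounds defaulting to 1 as in XML Schema, and turns them into a regular expression over the child element names. All names in the code are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# (child name, minOccurs, maxOccurs); None stands for "unbounded".
AUTHOR_SEQUENCE = [("name", 1, 1), ("born", 1, 1), ("dead", 0, 1)]
LIBRARY_SEQUENCE = [("book", 1, None)]


def sequence_pattern(sequence):
    """Build a regex over space-terminated child names from a sequence spec."""
    parts = []
    for name, lo, hi in sequence:
        upper = "" if hi is None else str(hi)
        parts.append(f"(?:{re.escape(name)} ){{{lo},{upper}}}")
    return re.compile("".join(parts))


def children_match(elem, sequence):
    """True if the element's children appear in the order and counts required."""
    names = "".join(child.tag + " " for child in elem)
    return sequence_pattern(sequence).fullmatch(names) is not None


full = ET.fromstring(
    "<author><name>Charles Schulz</name><born>1922-11-26</born>"
    "<dead>2000-02-12</dead></author>"
)
no_dead = ET.fromstring("<author><name>A</name><born>1950-01-01</born></author>")
wrong_order = ET.fromstring("<author><born>1950-01-01</born><name>A</name></author>")
empty_library = ET.fromstring("<library/>")

print(children_match(full, AUTHOR_SEQUENCE))
print(children_match(no_dead, AUTHOR_SEQUENCE))
print(children_match(wrong_order, AUTHOR_SEQUENCE))
print(children_match(empty_library, LIBRARY_SEQUENCE))
```

A real validator also checks datatypes (xs:date, xs:ID, and so on) and attributes, which this sketch ignores.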
48
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name" type="xs:string"/>
  <xs:element name="qualification" type="xs:string"/>
  <xs:element name="born" type="xs:date"/>
  <xs:element name="dead" type="xs:date"/>
  <xs:element name="isbn" type="xs:string"/>
  <xs:attribute name="id" type="xs:ID"/>
  <xs:attribute name="available" type="xs:boolean"/>
  <xs:attribute name="lang" type="xs:language"/>
  <xs:element name="title">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute ref="lang"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" ref="book"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="author">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="dead" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute ref="id"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="isbn"/>
        <xs:element ref="title"/>
        <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute ref="available"/>
      <xs:attribute ref="id"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="character">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="qualification"/>
      </xs:sequence>
      <xs:attribute ref="id"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
49
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="isbn" type="xs:integer"/>
              <xs:element name="title">
                <xs:complexType>
                  <xs:simpleContent>
                    <xs:extension base="xs:string">
                      <xs:attribute name="lang" type="xs:language"/>
                    </xs:extension>
                  </xs:simpleContent>
                </xs:complexType>
              </xs:element>
              <xs:element name="author" minOccurs="0" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="name" type="xs:string"/>
                    <xs:element name="born" type="xs:date"/>
                    <xs:element name="dead" type="xs:date"/>
                  </xs:sequence>
                  <xs:attribute name="id" type="xs:ID"/>
                </xs:complexType>
              </xs:element>
              <xs:element name="character" minOccurs="0" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="name" type="xs:string"/>
                    <xs:element name="born" type="xs:date"/>
                    <xs:element name="qualification" type="xs:string"/>
                  </xs:sequence>
                  <xs:attribute name="id" type="xs:ID"/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute type="xs:ID" name="id"/>
            <xs:attribute name="available" type="xs:boolean"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Same schema but with everything defined locally!
50Next Lecture
- Even with this simple example there are many design issues to discuss
  - When is a flat layout better
  - When is a nested layout better
  - What are scoping rules
  - When to use ref vs. defining new type
- Schema in depth is topic of next lecture