XML - PowerPoint PPT Presentation

About This Presentation
Title:

XML

Slides: 35
Provided by: smlu

Transcript and Presenter's Notes


1
XML
  • eXtensible Markup Language
  • CC292
  • Simon M. Lucas

2
Overview
  • Brief History
  • Why XML?
  • What is XML?
  • Elements and attributes
  • DTDs
  • XHTML
  • XML processing
  • With Java
  • With JavaScript E4X

3
What is XML?
  • XML is a metadata language - a language for
    providing data about data
  • W3C Recommendation since February 1998 (XML 1.0)
  • It looks a bit like HTML, but with XML the tags
    are user-defined and therefore extensible
  • HTML marks up logical presentation
  • CSS specifies presentation style
  • XML marks up meaning (semantics)

4
Why XML?
  • Separates content from presentation
  • General - can be applied to anything
  • Adds value to semi-structured data
  • E.g. Product Catalogue
  • Enables an enterprise to mark up all its data
  • Using XML greatly simplifies encoding of data
  • (cf. ad hoc text representations)
  • Ubiquitous - everybody is using it!

5
Where does XML fit? 
  • Why not put everything in a relational or OO
    database?
  • XML is a global standard
  • offers better information transfer between
    different applications and enterprises than
    proprietary databases
  • XML is flexible and easily applied
  • (which also presents dangers - data does NOT
    become more valuable just because it is marked up
    in XML - the XML structures have to be well
    designed).

6
Data Centric or Document Centric?
  • Data centric
  • Used in web services
  • Communication between applications
  • Data export from databases
  • Document centric
  • To add meaning to semi-structured documents
  • E.g. content for web pages, lecture notes,
    product catalogues
  • Emerging XML databases such as Xindice
    http://xml.apache.org/xindice/ store XML directly
    (don't have to map to relational DB)

7
XML Basic Syntax
  • An XML document consists of a number of
    declarations followed by a tree of elements. 
  • Each element is delimited between begin and end
    tags.  
  • Each element may contain attributes
  • Elements may contain text or other elements (or a
    mixture of the two)
  • Attributes may only contain text

8
XML Element
  • Has a name
  • Has a begin tag <elementName>
  • Then text and/or child elements
  • Has an end tag </elementName>
  • E.g. <name> Simon </name>
  • Elements can also be empty
  • E.g. <person name="Simon" />

9
Well-Formed and Valid
  • Element tags must be properly nested
  • E.g. <a> <b> text </b> </a> is ok
  • But <a> <b> text </a> </b> is NOT
  • Attribute values must be enclosed in quotes
  • A document where all the tags are properly nested
    is well-formed
  • If a document is well-formed, and obeys the
    syntax rules of a specified DTD, then it is also
    Valid
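
The nesting rule above can be checked mechanically. The following is a minimal sketch, assuming the JDK's built-in DOM parser rather than any tool named in the slides; the class name WellFormedCheck and the method isWellFormed are hypothetical names chosen for illustration. It only tests well-formedness and makes no attempt at DTD validation.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the original slides.
class WellFormedCheck {
    /** Returns true if the text parses as XML; no DTD validation is attempted. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the text, e.g. badly nested tags
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a> <b> text </b> </a>")); // prints true
        System.out.println(isWellFormed("<a> <b> text </a> </b>")); // prints false
    }
}
```

The JDK parser may also write a "[Fatal Error]" diagnostic to standard error for the second input; that message is separate from the value returned here.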

10
Elements or Attributes
  • Information can either be stored in elements or
    attributes
  • Structured information is stored in elements
  • Primitive information (i.e. a single atomic value
    or list of values) can either be stored in an
    element or an attribute
  • Perhaps better to store primitives in attributes

11
XML Attributes
  • Element start tags may also contain attributes
  • An attribute consists of an attribute name
    followed by an attribute value
  • Attributes are only allowed in the start tags
  • E.g.
  • <person email="sml@essex.ac.uk">
  •   <name>Simon</name>
  • </person>

12
Document Type Definition (DTD)
  • Provides a concise way to specify the syntax of a
    given document type
  • Declares how the elements can include other
    elements
  • And the attributes allowed for each element
  • Special operators specify the order and
    cardinality of each item (see below)

13
DTD Symbols Elements

14
CDATA and PCDATA
  • CDATA: Character Data
  • Attributes declared with CDATA may contain any
    text characters
  • PCDATA: Parsed Character Data
  • Elements declared (#PCDATA) do not contain other
    elements
  • i.e. no other mark-up within them
  • In tree-terms, these are LEAF-nodes

15
DTD for Address Book Example
  • <!-- DTD for simple address book -->
  • <!ELEMENT AddressBook (Title, Person*)>
  • <!ELEMENT Title (#PCDATA)>
  • <!ELEMENT Person EMPTY>
  • <!ATTLIST Person name CDATA #REQUIRED>
  • <!ATTLIST Person email CDATA #IMPLIED>
  • Tip: Enter the Address Book DTD and XML as files
    in IntelliJ, then use the Tools -> Validate
    command to perform validation on the document.
  • Try to modify the DTD and/or XML document to make
    it invalid.
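
Validation against this DTD can also be tried without an IDE. The sketch below assumes the JDK's validating DOM parser and places the DTD in an internal subset, so no AddressBook.dtd file is needed. The class name DtdValidationSketch and the method validityErrors are hypothetical, and the `Person*` cardinality is an assumption (the operator is not legible in the transcript).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original slides.
class DtdValidationSketch {
    // Internal DTD subset; "Person*" is an assumed cardinality.
    static final String DOCTYPE =
            "<!DOCTYPE AddressBook [\n"
          + "<!ELEMENT AddressBook (Title, Person*)>\n"
          + "<!ELEMENT Title (#PCDATA)>\n"
          + "<!ELEMENT Person EMPTY>\n"
          + "<!ATTLIST Person name CDATA #REQUIRED>\n"
          + "<!ATTLIST Person email CDATA #IMPLIED>\n"
          + "]>\n";

    /** Parses with DTD validation switched on and returns the validity errors reported. */
    static List<String> validityErrors(String body) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // validate against the DOCTYPE
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage()); // validity errors are recoverable
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (Exception e) {
            throw new IllegalStateException(e); // not well-formed, or parser setup failed
        }
        return errors;
    }

    public static void main(String[] args) {
        String valid = "<AddressBook><Title>Simon's address book</Title>"
                + "<Person name=\"Simon\" email=\"sml@essex.ac.uk\"/>"
                + "<Person name=\"Anna\"/></AddressBook>";
        // Modified to be invalid: the required name attribute is missing.
        String invalid = "<AddressBook><Title>Simon's address book</Title>"
                + "<Person email=\"sml@essex.ac.uk\"/></AddressBook>";
        System.out.println(validityErrors(valid).isEmpty()); // prints true
        validityErrors(invalid).forEach(System.out::println);
    }
}
```

The exact wording of the error messages depends on the parser implementation, so the sketch prints them rather than relying on their text.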

16
Address Book XML
  • <!DOCTYPE AddressBook SYSTEM "AddressBook.dtd">
  • <AddressBook>
  •   <Title>Simon's address book</Title>
  •   <Person name="Simon" email="sml@essex.ac.uk" />
  •   <Person name="Anna" />
  • </AddressBook>

17
Alternative Address Book
  • What about this version?
  • <AddressBook>
  •   <Simon email="sml@essex.ac.uk" />
  •   <Anna email="thewife@gmail.com" />
  • </AddressBook>
  • Is it well formed?
  • Is it valid (with respect to previous DTD?)
  • Is it well designed?

18
Creating XML with JDOM (JDOM: Java API for XML)
  • // imports assume JDOM 1.x (org.jdom); JDOM 2 uses org.jdom2
  • import org.jdom.Element;
  • import org.jdom.output.Format;
  • import org.jdom.output.XMLOutputter;
  • public static void main(String[] args) throws Exception {
  •   Element root = new Element("AddressBook");
  •   Element title = new Element("Title");
  •   title.setText("Simon's address book");
  •   Element e1 = new Element("Person");
  •   Element e2 = new Element("Person");
  •   e1.setAttribute("name", "Simon");
  •   e1.setAttribute("email", "sml@essex.ac.uk");
  •   e2.setAttribute("name", "Anna");
  •   root.addContent(title);
  •   root.addContent(e1);
  •   root.addContent(e2);
  •   // pretty-print the element tree to standard output
  •   XMLOutputter out = new XMLOutputter(Format.getPrettyFormat());
  •   out.output(root, System.out);
  • }

19
Produced the following
  • <AddressBook>
  •   <Title>Simon's address book</Title>
  •   <Person name="Simon" email="sml@essex.ac.uk" />
  •   <Person name="Anna" />
  • </AddressBook>
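
For comparison, the same address book can be built without a third-party library. This is a sketch using the DOM and Transformer classes that ship with the JDK, not the JDOM API used in the slides. The class name AddressBookDomSketch and the helper person are hypothetical. Indentation and attribute order in the serialized text depend on the JDK's serializer, so no exact output is shown.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical JDK-only counterpart to the JDOM example.
class AddressBookDomSketch {
    /** Builds the address book tree and returns it serialized as text. */
    static String build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("AddressBook");
            doc.appendChild(root);

            Element title = doc.createElement("Title");
            title.setTextContent("Simon's address book");
            root.appendChild(title);
            root.appendChild(person(doc, "Simon", "sml@essex.ac.uk"));
            root.appendChild(person(doc, "Anna", null));

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Creates an empty Person element; email is optional and may be null. */
    static Element person(Document doc, String name, String email) {
        Element p = doc.createElement("Person");
        p.setAttribute("name", name);
        if (email != null) {
            p.setAttribute("email", email);
        }
        return p;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```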

20
Reading and Processing XML
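  • // imports assume JDOM 1.x (org.jdom); JDOM 2 uses org.jdom2
  • import java.io.FileInputStream;
  • import java.io.InputStream;
  • import java.util.List;
  • import org.jdom.Document;
  • import org.jdom.Element;
  • import org.jdom.input.SAXBuilder;
  • public static void main(String[] args) throws Exception {
  •   String infile = args[0];
  •   SAXBuilder builder = new SAXBuilder();
  •   InputStream is = new FileInputStream(infile);
  •   Document doc = builder.build(is);
  •   Element root = doc.getRootElement();
  •   // now print the names and emails in plain text
  •   // (getAttribute returns an Attribute object, or null when the element
  •   // has no such attribute; getAttributeValue("name") gives just the text)
  •   for (Element el : (List<Element>) root.getChildren())
  •     System.out.println(el.getAttribute("name"));
  • }
  • ----------- Produces --------------------------
  • null
  • [Attribute: name="Simon"]
  • [Attribute: name="Anna"]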
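
The output above shows a null for the Title element and Attribute objects rather than plain text. A minimal sketch of one way to get plain names and emails follows, assuming the JDK's DOM parser instead of JDOM and reading from a string instead of a file so that it is self-contained. The class name AddressBookReader and the method people are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical JDK-only reader, not part of the original slides.
class AddressBookReader {
    /** Returns one plain-text line per Person element: name, then email if present. */
    static List<String> people(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Select only Person elements, so Title is never visited.
            NodeList persons = doc.getElementsByTagName("Person");
            for (int i = 0; i < persons.getLength(); i++) {
                Element p = (Element) persons.item(i);
                String line = p.getAttribute("name");
                if (p.hasAttribute("email")) {
                    line += " <" + p.getAttribute("email") + ">";
                }
                lines.add(line);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String xml = "<AddressBook><Title>Simon's address book</Title>"
                + "<Person name=\"Simon\" email=\"sml@essex.ac.uk\"/>"
                + "<Person name=\"Anna\"/></AddressBook>";
        people(xml).forEach(System.out::println);
        // prints:
        // Simon <sml@essex.ac.uk>
        // Anna
    }
}
```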

21
XHTML
  • XHTML is a stricter version of HTML
  • Tags must have begin/end pairs
  • E.g. <p> </p> and not just <p>
  • Tags must be properly nested
  • Attribute values must be in string quotes
  • Document must have a single root element
  • MS Frontpage can apply XML formatting rules to
    comply with this
  • Then makes info very easy to manipulate

22
Example Benefit of XHTML
  • Web site construction
  • If all pages are in XHTML
  • Can be edited with WYSIWYG editor
  • And manipulated with JSP / JDOM / E4X
  • I use this method for web site construction
  • Example: http://cigames.org
  • This allows a single master page
  • To include selected parts of content pages
  • Less effort than adding <jsp:include> tags to
    each content page
  • Simple instance of MVC architecture

23
XML and JavaScript
  • Can use XML within JavaScript
  • JavaScript is also known as ECMAScript
  • Easiest to use E4X
  • ECMAScript for XML
  • Can treat XML fragments as native parts of the
    document
  • Supported in Rhino (JavaScript implemented in
    Java)
  • Hence can be executed on the server / stand-alone
  • But on server, does not have access to Browser
    DOM
  • And in Firefox (e.g. 2.1)
  • Enables concise generation of HTML

24
E4X Native XML
  • Fantastic!
  • Write XML mark-up
  • Then directly instantiate object models of the
    XML
  • And navigate using dot notation etc.
  • Following examples adapted from the
    e4x_example.js file that comes with the Rhino
    distribution
  • Can be executed on Server using Rhino
  • Or in a compatible web browser (e.g. Firefox)
  • Note that print is a utility method defined in
    the Shell program that comes with Rhino

25
Making an XML Structure
  • var order = <order>
  •   <customer>
  •     <firstname>John</firstname>
  •     <lastname>Doe</lastname>
  •   </customer>
  •   <item>
  •     <description>Big Screen Television</description>
  •     <price>1299.99</price>
  •     <quantity>1</quantity>
  •   </item>
  • </order>;

26
Accessing with . Notation
  • // Construct the full customer name
  • var name = order.customer.firstname + " " +
    order.customer.lastname;
  • // Calculate the total price
  • var total = order.item.price * order.item.quantity;

27
Construction with Expressions
  • Contents of curly braces are evaluated as
    expressions, e.g.
  • var tagname = "name";
  • var attributename = "id";
  • var attributevalue = 5;
  • var content = "Fred";
  • var x = <{tagname} {attributename}={attributevalue}>{content}</{tagname}>;
  • Exercise: write the XML that this produces (i.e.
    that x is bound to after executing the above).

28
Data Selection
  • var e = <employees>
  •   <employee id="1"><name>Joe</name><age>20</age></employee>
  •   <employee id="2"><name>Sue</name><age>30</age></employee>
  •   <employee id="3"><name>Simon</name><age>25</age></employee>
  • </employees>;
  • // get all the names in e
  • print("All the employee names are:\n" + e..name);
  • // employees with name Joe
  • print("The employee named Joe is:\n" +
    e.employee.(name == "Joe"));
  • // employees with id's 1 & 2
  • print("Employees with ids 1 & 2:\n" +
    e.employee.(@id == 1 || @id == 2));
  • // name of employee with id 1
  • print("Name of the employee with ID=1: " +
    e.employee.(@id == 1).name);

29
Produced
  • All the employee names are:
  • <name>Joe</name>
  • <name>Sue</name>
  • <name>Simon</name>
  • The employee named Joe is:
  • <employee id="1">
  •   <name>Joe</name>
  •   <age>20</age>
  • </employee>
  • Employees with ids 1 & 2:
  • <employee id="1">
  •   <name>Joe</name>
  •   <age>20</age>
  • </employee>
  • <employee id="2">
  •   <name>Sue</name>
  •   <age>30</age>
  • </employee>
  • Name of the employee with ID=1: Joe
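
Similar selections can be written in Java with XPath, which plays roughly the role that the E4X filter expressions play above. This is a sketch using the javax.xml.xpath package from the JDK. The class name EmployeeQueries and the method select are hypothetical, and the XPath expressions are one possible translation of the E4X queries, not the author's.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical XPath counterpart to the E4X selection examples.
class EmployeeQueries {
    static final String XML = "<employees>"
            + "<employee id=\"1\"><name>Joe</name><age>20</age></employee>"
            + "<employee id=\"2\"><name>Sue</name><age>30</age></employee>"
            + "<employee id=\"3\"><name>Simon</name><age>25</age></employee>"
            + "</employees>";

    /** Evaluates an XPath expression and returns the text of each selected node. */
    static List<String> select(String expression) {
        List<String> values = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes =
                    (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(select("//name"));                           // [Joe, Sue, Simon]
        System.out.println(select("//employee[name='Joe']/@id"));       // [1]
        System.out.println(select("//employee[@id=1 or @id=2]/name"));  // [Joe, Sue]
        System.out.println(select("//employee[@id=1]/name"));           // [Joe]
    }
}
```

Unlike the E4X version, select returns the text content of the matched nodes, not serialized XML fragments.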

30
Iteration
  • // calculate the average age of all employees
  • // based on previous employee data
  • var totalAge = 0.0;
  • var nEmps = 0.0;
  • for each (i in e.employee) { totalAge += 1.0 *
    i.age; nEmps++; }
  • print("Average age of all employees: " +
    totalAge / nEmps);
  • Produces:
  • Average age of all employees: 25
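
The same iteration can be sketched in Java over a DOM NodeList. This uses the JDK parser, not E4X; the class name AverageAge and the method averageAge are hypothetical, and the employee data is copied from the selection example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical JDK-only counterpart to the E4X iteration example.
class AverageAge {
    /** Returns the mean of all age elements (NaN if the document has none). */
    static double averageAge(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList ages = doc.getElementsByTagName("age");
            double total = 0.0;
            for (int i = 0; i < ages.getLength(); i++) {
                // element text is a string, so convert it before summing
                total += Double.parseDouble(ages.item(i).getTextContent().trim());
            }
            return total / ages.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<employees>"
                + "<employee id=\"1\"><name>Joe</name><age>20</age></employee>"
                + "<employee id=\"2\"><name>Sue</name><age>30</age></employee>"
                + "<employee id=\"3\"><name>Simon</name><age>25</age></employee>"
                + "</employees>";
        System.out.println("Average age of all employees: " + averageAge(xml));
        // prints: Average age of all employees: 25.0
    }
}
```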

31
Simplified HTML Generation (Tested in Firefox
Browser)
  • <script type="text/javascript">
  • function myFunc() {
  •   var el = document.getElementById('test');
  •   var table = <table><tr>
  •     <th>Celsius</th>
  •     <th>Fahrenheit</th></tr></table>;
  •   var from = 10;
  •   var to = 12;
  •   for (var i = from; i < to; i++) {
  •     table.tr += <tr> <td> {i} </td>
  •       <td> {i * 9.0/5 + 32} </td> </tr>;
  •   }
  •   el.innerHTML = table;
  • }
  • </script>
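
For readers without an E4X-capable browser, the same table can be produced on the server side with plain string building. This is a sketch only; the class name TemperatureTable and the method table are hypothetical, and treating the upper bound as exclusive is an assumption.

```java
// Hypothetical helper, not part of the original slides.
class TemperatureTable {
    /** Builds an HTML table of Celsius/Fahrenheit rows for from <= c < to. */
    static String table(int from, int to) {
        StringBuilder html = new StringBuilder(
                "<table><tr><th>Celsius</th><th>Fahrenheit</th></tr>");
        for (int c = from; c < to; c++) {
            double f = c * 9.0 / 5 + 32; // same conversion as the slide
            html.append("<tr><td>").append(c)
                .append("</td><td>").append(f)
                .append("</td></tr>");
        }
        return html.append("</table>").toString();
    }

    public static void main(String[] args) {
        System.out.println(table(10, 12));
    }
}
```

Only numbers are inserted into the markup here; text taken from users would need HTML escaping before being appended this way.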

32
Some Exciting XML Applications
  • Word processing
  • (e.g. Syntext Serna)
  • Web Application Programming
  • XForms
  • News feeds
  • RSS (Really Simple Syndication)
  • Mathematics
  • MathML

33
Summary
  • XML
  • Simple yet powerful
  • Will become more and more widespread
  • Java APIs such as JDOM allow easy XML processing
    in Java
  • E4X even easier!
  • Also see AJAX
  • Many challenges ahead
  • In order to get greatest benefit
  • Common standards are required
  • Is there a common XML standard for a delivery
    address yet? UK? Globally?

34
Exercise
  • Design an XML markup for a simple product
    catalogue
  • Each product has a name, price and manufacturer
  • Each manufacturer has a name and homepage-URL
  • Include sample XML & DTD in your solution