Introduction to XML and Related Technologies - PowerPoint PPT Presentation

Provided by: sep58
Title: Introduction to XML and Related Technologies


1
Introduction to XML and Related Technologies
  • Internet Engineering Course
  • University of Tehran
  • Sepand Ansari

2
History
  • The essence of markup languages
  • You may have faced problems where you need to add
    metadata or tags to a document to describe it.
  • Example: plain text vs. rich text documents
  • Settings files (e.g. Linux configuration files):

Section "Device"
    Identifier  "ATI Radeon Mobility M6"
    Driver      "radeon"
    VendorName  "ATI Radeon Mobility M6"
    BoardName   "Radeon Mobility M6 LY"
    ChipID      0x4c59
    VideoRam    32768
    BusID       "PCI:1:5:0"
    Option      "AGPMode" "4"
    Option      "noaccel"
EndSection
3
Before standardization
  • Several markup languages were developed, each
    with its own style.
  • Problems:
      • Incompatibility
      • No CASE tool could be developed for processing
        markup files.
      • Complexity and deceptions
  • Some examples: MS Office files, Linux settings
    files, business markup data files

4
SGML
  • Standard Generalized Markup Language (SGML) grew
    out of GML, developed at IBM in the 1960s by
    Charles Goldfarb, Edward Mosher and Raymond Lorie
    (whose surname initials also happen to be GML).
    SGML itself became an ISO standard in 1986.
  • It is a standard for creating new markup
    languages.
  • Its complexity has prevented its widespread
    application for small-scale, general-purpose use.
  • This complexity is a consequence of its generality,
  • and that much generality is not needed for most
    uses.

5
HTML
  • HyperText Markup Language (HTML) is a markup
    language designed for the creation of web pages
    and other information viewable in a browser.
  • Originally defined by Tim Berners-Lee as a highly
    simplified application of SGML.
  • It is now widely used together with the HTTP
    protocol.
  • It is used for the presentation of data (in
    browsers).

6
XML
  • SGML is too complex.
  • HTML is a simplified application of SGML in which
      • many unused features of SGML are eliminated;
      • it is well known and widely used.
  • So XML was born to
      • do what SGML was originally created to do,
      • but be as simple as HTML.

7
XML (cont.)
  • XML is a metalanguage
      • A language used to describe other languages,
        using markup tags that describe properties of
        the data
  • Designed to be structured
      • Strict rules about how data can be formatted
  • Designed to be extensible
      • You can define your own terms and markup

8
When is XML used?
  • XML aims to accomplish what HTML cannot, and to be
    simpler to use and implement than SGML.
  • In XML you can define your own tags
  • and create markup documents based on your tag
    declarations to describe your data.
  • These descriptions are used by an application
    to extract semantics from your data.

9
An example
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>James Linn</author>
    <price>49.99</price>
  </book>
</bookstore>

10
An odd example
<?xml version="1.0" encoding="UTF-8"?>
<Recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="3" unit="cups">Flour</ingredient>
  <ingredient amount="0.25" unit="ounce">Yeast</ingredient>
  <ingredient amount="1.5" unit="cups">Warm Water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <Instructions>
    <step>Mix all ingredients together</step>
    <step>Leave for one hour in warm room.</step>
    <step>Knead again, and then bake in the oven.</step>
  </Instructions>
</Recipe>

11
XML features
  • its simultaneously human- and machine-readable
    format
  • its support for Unicode, allowing almost any
    information in any human language to be
    communicated
  • its ability to represent the most general
    computer science data structures (records, lists
    and trees)
  • its self-documenting format that describes
    structure and field names as well as specific
    values
  • its strict syntax and parsing requirements that
    allow the necessary parsing algorithms to remain
    simple, efficient, and consistent.

12
XML Family
  • XML is not a subset of HTML, nor is HTML a subset
    of XML.
  • XML is more general than HTML, but it has some
    constraints (next slide) that HTML doesn't have.
  • If an HTML document satisfies those constraints,
    we have XHTML.

[Diagram: HTML, XML, SGML]
13
HTML vs. XML
[Comparison table: HTML | XML]
14
HTML vs. XML
[Comparison table: HTML | XML]
XHTML documents have all XML properties except
these two.
15
Working with XML
  • First, how to describe tags
      • DTD
      • XSD
  • How to parse XML files
      • SAX parsers
      • DOM parsers
      • XML binding tools
  • Transform XML files to (X)HTML or other XML
    types
      • XSLT
  • Address a point in an XML file
      • XPath
  • Query an XML file for specific data
      • XQuery

16
Working with XML
  • First, how to describe tags
      • DTD
      • XSD
  • We need a parser to extract the content of an XML
    file
      • SAX parsers
      • DOM parsers
      • XML binding tools
  • Transform XML files to (X)HTML or other XML
    types
      • XSLT
  • Address a point in an XML file
      • XPath
  • Query an XML file for specific data
      • XQuery

17
DTD
  • A Document Type Definition (DTD for short) is a
    set of declarations that conform to a particular
    markup syntax and that describe a class, or
    "type", of SGML or XML documents, in terms of
    constraints on the structure of those documents.
  • DTD criticisms:
      • No support for newer features of XML, most
        importantly namespaces.
      • Lack of expressivity: certain formal aspects of
        an XML document cannot be captured in a DTD.
      • Custom non-XML syntax to describe the schema,
        inherited from SGML.

18
Example of DTD and its sample XML
<!ELEMENT people_list (person*)>
<!ELEMENT person (name, birthdate?, gender?, socialsecuritynumber?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT birthdate (#PCDATA)>
<!ELEMENT gender (#PCDATA)>
<!ELEMENT socialsecuritynumber (#PCDATA)>

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE people_list SYSTEM "example.dtd">
<people_list>
  <person>
    <name>Fred Bloggs</name>
    <birthdate>27/11/2008</birthdate>
    <gender>Male</gender>
  </person>
</people_list>
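
As a rough illustration of how a DTD constrains documents, the sketch below validates small documents with the JDK's built-in JAXP parser. The class name DtdValidationSketch and the helper isValid are hypothetical, and the DTD is a shortened version of the one on this slide, embedded as an internal subset so that no external example.dtd file is needed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdValidationSketch {
    // Shortened DTD, written as an internal subset (an assumption made
    // so that the example is self-contained).
    static final String DTD =
        "<!DOCTYPE people_list [\n"
      + "<!ELEMENT people_list (person*)>\n"
      + "<!ELEMENT person (name, birthdate?)>\n"
      + "<!ELEMENT name (#PCDATA)>\n"
      + "<!ELEMENT birthdate (#PCDATA)>\n"
      + "]>\n";

    /** Returns true if the given document body is valid against DTD. */
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // switch on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity errors are recoverable by default, so rethrow them.
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXParseException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
            "<people_list><person><name>Fred Bloggs</name></person></people_list>"));
        System.out.println(isValid(
            "<people_list><person><gender>Male</gender></person></people_list>"));
    }
}
```

The second document is well-formed but uses an element the shortened DTD does not declare and omits the required name, so the validating parser rejects it.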
19
XSD
  • XML Schema Definition (XSD) was published as a
    W3C Recommendation in May 2001.
  • XSD files have the .xsd extension.
  • XSD solves problems that DTD has:
      • It supports namespaces
      • It is datatype-aware

20
Example of XSD and its sample XML
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="country">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="population" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

<country xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="country.xsd">
  <name>France</name>
  <population>59.7</population>
</country>
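
To make "datatype-aware" concrete, here is a minimal sketch that checks documents against the country schema using the javax.xml.validation API that ships with the JDK. The class name XsdValidationSketch and the helper isValid are illustrative assumptions; the schema is held in a string instead of a country.xsd file.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

public class XsdValidationSketch {
    // The country schema, inlined as a string for this sketch.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='country'><xs:complexType><xs:sequence>"
      + "<xs:element name='name' type='xs:string'/>"
      + "<xs:element name='population' type='xs:decimal'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Returns true if xml is valid against the schema above. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            // validate() throws a SAXException on the first validity error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
            "<country><name>France</name><population>59.7</population></country>"));
        // 'many' is not an xs:decimal, so a datatype-aware validator rejects it.
        System.out.println(isValid(
            "<country><name>France</name><population>many</population></country>"));
    }
}
```

A DTD could only say that population contains character data; the schema can additionally require that the text is a decimal number.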
21
Well-formed vs. valid XML file
  • A well-formed XML file obeys all the syntax rules
    of XML, including:
      • Proper nesting
      • Case-sensitive tag names
      • Quoted attribute values
  • An XML document that complies with a particular
    schema, in addition to being well-formed, is said
    to be valid.
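
A quick way to see the three well-formedness rules in action is to hand small documents to a non-validating parser. The following is only a sketch using the JDK's JAXP parser; the class name WellFormedSketch and the helper isWellFormed are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedSketch {
    /** Returns true if xml parses without a well-formedness error. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser reported a syntax violation
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b></b></a>")); // properly nested
        System.out.println(isWellFormed("<a><b></a></b>")); // improper nesting
        System.out.println(isWellFormed("<a></A>"));        // tag case mismatch
        System.out.println(isWellFormed("<a x=1/>"));       // unquoted attribute
    }
}
```

No schema is involved here: the parser only checks syntax, which is exactly the difference between well-formed and valid.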

23
SAX Parser
  • SAX: Simple API for XML
  • The parser creates events while reading the
    document
  • The parser calls methods (that you write) to deal
    with the events
  • Similar to an I/O stream: it goes in one direction
    only

24
Sample Document
<transaction>
  <account>89-344</account>
  <buy shares="100">
    <ticker exch="NASDAQ">WEBM</ticker>
  </buy>
  <sell shares="30">
    <ticker exch="NYSE">GE</ticker>
  </sell>
</transaction>

25
SAX Example
import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import org.apache.xerces.parsers.SAXParser;

public class Flour extends DefaultHandler {

    // Called by the parser for every start tag it reads.
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) {
        // React to <buy shares="..."> in the sample document.
        if (localName.equals("buy")) {
            String n = atts.getValue("", "shares");
            System.out.println("number of shares: " + n);
        }
    }

26
SAX Example (cont.)
    public static void main(String[] args) {
        Flour f = new Flour();
        SAXParser p = new SAXParser();
        p.setContentHandler(f);   // register our callbacks
        try {
            p.parse(args[0]);     // file name or URI of the document
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

27
Document as Events
<transaction>
  <account>89-344</account>
  <buy shares="100">
    <ticker exch="NASDAQ">WEBM</ticker>
  </buy>
  <sell shares="30">
    <ticker exch="NYSE">GE</ticker>
  </sell>
</transaction>

28
Advantages and Disadvantages
  • Advantages
      • Requires little memory
      • Fast
  • Disadvantages
      • Cannot read backwards
      • Does not support transformation of the
        document, such as cut and paste of fragments
      • Difficult to program

29
Programming using SAX is Difficult
  • In some cases, programming with SAX is difficult
  • How can we find, using a SAX parser, an element
    e1 with ancestor e2?
  • How can we find, using a SAX parser, elements e1
    that have a descendant element e2?
  • What about cases that are even more complex?
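
One common answer to the first question is to have the handler keep its own stack of open elements, because SAX itself remembers nothing. The sketch below is one possible approach, not the only one: AncestorHandler and find are hypothetical names, it uses the JDK's built-in SAX parser, and it assumes that target elements are not nested inside each other.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class AncestorHandler extends DefaultHandler {
    private final String target;    // e1: the element we are looking for
    private final String ancestor;  // e2: must be open when e1 starts
    private final Deque<String> open = new ArrayDeque<>(); // open elements
    private final StringBuilder text = new StringBuilder();
    private boolean capturing;
    final List<String> matches = new ArrayList<>(); // text of each matching e1

    AncestorHandler(String target, String ancestor) {
        this.target = target;
        this.ancestor = ancestor;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The stack holds exactly the ancestors of the element now starting.
        if (qName.equals(target) && open.contains(ancestor)) {
            capturing = true;
            text.setLength(0);
        }
        open.push(qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (capturing) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
        if (capturing && qName.equals(target)) {
            matches.add(text.toString());
            capturing = false;
        }
    }

    /** Returns the text of every target element that has the given ancestor. */
    static List<String> find(String xml, String target, String ancestor) {
        try {
            AncestorHandler handler = new AncestorHandler(target, ancestor);
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.matches;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<transaction><account>89-344</account>"
            + "<buy shares=\"100\"><ticker exch=\"NASDAQ\">WEBM</ticker></buy>"
            + "<sell shares=\"30\"><ticker exch=\"NYSE\">GE</ticker></sell>"
            + "</transaction>";
        System.out.println(find(xml, "ticker", "buy"));
    }
}
```

The second question (e1 with a descendant e2) is harder, because the handler must buffer e1 until its end tag before it knows the answer. That bookkeeping is the kind of difficulty this slide refers to.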

30
DOM Parser
  • DOM: Document Object Model
  • Parser creates a tree object out of the document
  • User accesses data by traversing the tree
  • The API allows for constructing, accessing and
    manipulating the structure and content of XML
    documents

31
Document as Tree
Methods like getRoot, getChildren, getAttributes, etc.

transaction
    account: 89-344
    buy (shares = 100)
        ticker (exch = NASDAQ): WEBM
    sell (shares = 30)
        ticker (exch = NYSE): GE
32
Node Navigation
  • Every node has a specific location in the tree
  • The Node interface specifies methods to find
    surrounding nodes:
      • Node getFirstChild()
      • Node getLastChild()
      • Node getNextSibling()
      • Node getPreviousSibling()
      • Node getParentNode()
      • NodeList getChildNodes()
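
The navigation methods can be tried out on the sample transaction document. This is a small sketch with the JDK's org.w3c.dom API; the class name DomNavigationSketch and its helpers parse and childNames are illustrative only, and the document is written without whitespace between tags so that no text nodes appear between the elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomNavigationSketch {
    static final String XML = "<transaction><account>89-344</account>"
        + "<buy shares=\"100\"><ticker exch=\"NASDAQ\">WEBM</ticker></buy>"
        + "<sell shares=\"30\"><ticker exch=\"NYSE\">GE</ticker></sell>"
        + "</transaction>";

    /** Parses an XML string into a DOM tree. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Walks the root's children with getFirstChild/getNextSibling. */
    static List<String> childNames(Document doc) {
        List<String> names = new ArrayList<>();
        Node root = doc.getDocumentElement();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                names.add(n.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(childNames(doc));
        Node sell = doc.getDocumentElement().getLastChild();
        System.out.println(sell.getPreviousSibling().getNodeName());
        System.out.println(sell.getParentNode().getNodeName());
    }
}
```

Unlike SAX, nothing stops the program from moving backwards (getPreviousSibling) or upwards (getParentNode) as often as it likes.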

33
Node Manipulation
  • Children of a node in a DOM tree can be
    manipulated - added, edited, deleted, moved,
    copied, etc.

Node removeChild(Node oldChild) throws DOMException
Node insertBefore(Node newChild, Node refChild) throws DOMException
Node appendChild(Node newChild) throws DOMException
Node replaceChild(Node newChild, Node oldChild) throws DOMException
Node cloneNode(boolean deep)
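
The five methods above can be exercised on a tiny tree. This is a hedged sketch with the JDK's DOM implementation; the class DomManipulationSketch, the helper edit and the element names (note, c) are made up for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomManipulationSketch {
    /** Applies a fixed series of edits and returns the root's child names. */
    static List<String> edit(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();

            Node first = root.getFirstChild();
            root.appendChild(first.cloneNode(true)); // copy: deep clone added at the end
            root.removeChild(first);                 // delete: drop the original
            root.insertBefore(doc.createElement("note"),
                              root.getFirstChild()); // add: new first child
            root.replaceChild(doc.createElement("c"),
                              root.getLastChild());  // edit: swap the last child

            List<String> names = new ArrayList<>();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                names.add(children.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Children go from a, b to note, b, c.
        System.out.println(edit("<t><a/><b/></t>"));
    }
}
```

New nodes are created through the owning Document (createElement), which is why the sketch keeps a reference to doc.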
34
Advantages and Disadvantages
  • Advantages
      • Natural and relatively easy to use
      • Can repeatedly traverse the tree
  • Disadvantages
      • High memory requirements: the whole document
        is kept in memory
      • Must parse the whole document and construct
        many objects before use

35
Which should we use? DOM vs. SAX
  • If your document is very large and you only need
    a few elements, use SAX.
  • If you need to manipulate (i.e., change) the XML,
    use DOM.
  • If you need to access the XML many times, use
    DOM (assuming the file is not too large).

36
XML data binding
  • With XML data binding, a Java object is
    automatically created from the data of an XML
    document.
  • JAXB is Sun Microsystems' specification for XML
    data binding.
  • The Java class can be created manually:
      • a mapping file is needed to tell the binding
        engine how to map XML elements and attributes
        to class properties.
  • Or the Java class can be created automatically by
    the JAXB compiler.

37
Advantages and disadvantages
  • Advantages
      • JAXB requires a DTD
          • Using JAXB ensures the validity of your XML
      • A JAXB parser is actually faster than a generic
        SAX parser
      • A tree created by JAXB is smaller than a DOM
        tree
      • It's much easier to use a JAXB tree for
        application-specific code
      • You can modify the tree and save it as XML
  • Disadvantages
      • JAXB requires a DTD
          • Hence, you cannot use JAXB to process generic
            XML (for example, if you are writing an XML
            editor or other tool)
      • You must do additional work up front to tell
        JAXB what kind of tree you want it to construct
          • But this more than pays for itself by
            simplifying your application
      • JAXB is new: Version 1.0 is due Q4 (fourth
        quarter) 2002

38
JAXB at a glance
39
Step 1: Create XML Schema
Demo.xsd
<xs:element name="Person" type="PersonType"/>

<xs:complexType name="PersonType">
  <xs:sequence>
    <xs:element name="Name" type="xs:string"/>
    <xs:element name="Address" type="AddressType"
                minOccurs="1" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="AddressType">
  <xs:sequence>
    <xs:element name="Street" type="xs:string"/>
    <xs:element name="Number" type="xs:unsignedInt"/>
  </xs:sequence>
</xs:complexType>
40
Step 2: Create XML Document
Demo.xml
<Person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="C:\JAXB Demo\demo.xsd">
  <Name>Sharon Krisher</Name>
  <Address>
    <Street>Iben Gevirol</Street>
    <Number>57</Number>
  </Address>
  <Address>
    <Street>Moshe Sharet</Street>
    <Number>89</Number>
  </Address>
</Person>
Check that your XML conforms to the schema.
41
Step 3: Run the binding compiler
  • %JWSDP_HOME%\jaxb\bin\xjc -p demo demo.xsd
  • A package named demo is created
    (in the directory demo)
  • The package contains (among other things):
      • interface AddressType
      • interface PersonType

42
AddressType and PersonType
public interface AddressType {
    long getNumber();
    void setNumber(long value);    // must be non-negative
    String getStreet();
    void setStreet(String value);  // must be non-null
}

public interface PersonType {
    String getName();
    void setName(String value);    // must be non-null
    /* List of AddressType */
    java.util.List getAddress();   // must contain at least one item
                                   // in Java 1.5: List<AddressType>
}
43
Step 4: Create Context
  • The context is the entry point to the API
  • It contains methods to create Marshaller,
    Unmarshaller and Validator instances

JAXBContext context = JAXBContext.newInstance("demo");

The package name is demo (recall: xjc -p demo demo.xsd)
44
Step 5: Unmarshal XML -> objects
Unmarshaller unmarshaller = context.createUnmarshaller();
// Enable validation of the XML according to the schema
// while unmarshalling
unmarshaller.setValidating(true);
PersonType person = (PersonType) unmarshaller.unmarshal(
    new FileInputStream("demo.xml"));
45
Step 6: Read
System.out.println("Person name=" + person.getName());
AddressType address = (AddressType) person.getAddress().get(0);
System.out.println("First Address: "
    + " Street=" + address.getStreet()
    + " Number=" + address.getNumber());
46
Step 7: Manipulate objects
// Update
person.setName("Yoav Zibin");
// Delete
List addressList = person.getAddress();
addressList.clear();
47
Step 8: Validate on demand
Validator validator = context.createValidator();
// Check that we have set Street and Number, and
// that Number is non-negative
validator.validate(newAddr);
// Check that we have set Name, and that Address
// contains at least one item
validator.validate(person);
48
Step 9: Marshal objects -> XML
Marshaller marshaller = context.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
marshaller.marshal(person, new FileOutputStream("output.xml"));

output.xml
<Person>
  <Name>Yoav Zibin</Name>
  <Address>
    <Street>Hanoter</Street>
    <Number>5</Number>
  </Address>
</Person>
49
JAXB compiler is smart enough
  • The DTD:
        <!ELEMENT book (title, author, chapter+)>
        <!ELEMENT title (#PCDATA)>
        <!ELEMENT author (#PCDATA)>
        <!ELEMENT chapter (#PCDATA)>
  • The schema:
        <xml-java-binding-schema>
          <element name="book" type="class" root="true"/>
        </xml-java-binding-schema>
  • The results:
        public Book();  // constructor
        public String getTitle();
        public void setTitle(String x);
        public String getAuthor();
        public void setAuthor(String x);
        public List getChapter();
        public void deleteChapter();
        public void emptyChapter();

Note 1: In these slides we only show the class
outline, but JAXB creates a complete class for you.
Note 2: JAXB constructs names based on yours,
with good capitalization style.
50
Some Implementations of JAXB specification
  • The JAXB specification is still in beta
    (version 0.8).
  • JaxMe, from the Apache Software Foundation, is an
    implementation of the JAXB specification.
  • Some other packages are not JAXB implementations
    but do the same task:
      • XMLBeans, from the Apache Software Foundation
      • Castor XML

52
XSLT
  • XSLT stands for Extensible Stylesheet Language
    Transformations
  • XSLT is used to transform XML documents into
    other kinds of documents: usually, but not
    necessarily, XHTML
  • XSLT uses two input files:
      • The XML document containing the actual data
      • The XSL document containing both the framework
        in which to insert the data and the XSLT
        commands to do so
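
The two-input model can be sketched with the javax.xml.transform API that ships with the JDK. Everything named here is an assumption made for illustration: the class XsltSketch, the helper transform, the cut-down bookstore document and the small stylesheet, which wraps each book title in an li element inside a ul.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSketch {
    // Input 1: the XML document containing the actual data.
    static final String XML =
        "<bookstore>"
      + "<book><title>Harry Potter</title></book>"
      + "<book><title>XQuery Kick Start</title></book>"
      + "</bookstore>";

    // Input 2: the XSL document, i.e. the output framework (ul/li)
    // plus the XSLT commands (for-each, value-of) that fill it in.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<ul><xsl:for-each select='bookstore/book'>"
      + "<li><xsl:value-of select='title'/></li>"
      + "</xsl:for-each></ul>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet xsl to the document xml and returns the result. */
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```

Swapping in a different stylesheet changes the presentation without touching the data document.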

53
An example
54
Cocoon framework
  • Apache Cocoon is an open source, web-based
    publishing framework written in Java.
  • It transforms XML documents to XML, WML or PDF
    using an XSL file.
  • Cocoon can be integrated with Tomcat.
  • It is used when
      • the source of data is in XML format
      • you want to completely separate data from
        presentation

56
XPath
  • XPath (XML Path Language) is a terse (non-XML)
    syntax for addressing portions of an XML
    document.
  • XPath further defines a library of standard
    functions for working with strings, numbers and
    Boolean expressions, as well as supporting a
    number of utility operators.
  • A typical XPath expression is a Location Path
    consisting of a string of element or attribute
    qualifiers separated by forward slashes ("/")

57
Some Examples
  • The root element: /*
  • All elements everywhere (implementations of this
    expression can be very slow): //*
  • All top-level elements (children of the root
    element): /*/*
  • The fifth child element under an element named
    "FOOB": FOOB/*[5]
  • The element FOOB whose BAZ attribute is "untrue":
    FOOB[@BAZ="untrue"]
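
Expressions of this kind can be tried with the javax.xml.xpath API in the JDK. The sketch assumes the standard XPath 1.0 spellings of the patterns on this slide (/*, //*, /*/*, a positional predicate [5] and an attribute test [@BAZ='untrue']); the class XPathSketch, the helper names and the small test document are invented for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathSketch {
    // Made-up document: two FOOB elements, the first with five children.
    static final String XML =
        "<root>"
      + "<FOOB BAZ=\"untrue\"><a/><b/><c/><d/><e/></FOOB>"
      + "<FOOB BAZ=\"true\"/>"
      + "</root>";

    /** Evaluates expr against XML and returns the names of the selected nodes. */
    static List<String> names(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeName());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(names("/*"));                        // root element
        System.out.println(names("//*").size());                // all elements
        System.out.println(names("/*/*"));                      // children of root
        System.out.println(names("/root/FOOB/*[5]"));           // fifth child of FOOB
        System.out.println(names("//FOOB[@BAZ='untrue']"));     // attribute test
    }
}
```

The relative paths on the slide (FOOB/...) are written here as absolute paths because the sketch evaluates them from the document root.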

58
XQuery
  • XQuery is a query language developed by the W3C
    and designed to query collections of XML data.
  • XQuery provides a mechanism to extract and
    manipulate data from XML documents or any data
    source that can be viewed as XML, such as
    relational databases or office documents.
  • It is semantically similar to SQL.
  • XQuery uses XPath syntax to address specific
    parts of an XML document.