Title: XML - An Introduction
XML - An Introduction
- The eXtensible Markup Language (XML) was developed by the World Wide Web Consortium (W3C), starting in 1996, to address limitations of HTML
- XML is a language similar to HTML, but more extensible
- It supports user-defined tags that allow both data and metadata (i.e. data about data) to be stored in a single document
- At the same time, presentation aspects remain decoupled from data representation
A Brief History
- HTML and XML are like children of the same parent, the Standard Generalized Markup Language (SGML)
- SGML was made an ISO standard in 1986
- SGML originated at IBM, which wanted a means of publishing document content in different ways
- The result of the standards process was a rich document markup language, allowing authors to separate logical content from its presentation
- SGML markup is a series of commands understood by another program
Why Another Markup Language?
- The question to be asked is: what's wrong with SGML or HTML?
- SGML is very large, powerful and complex
- SGML has been used in industry, for commercial purposes, for over a decade
- SGML is too complex to program for in a Web environment
Why Another Markup Language? (contd)
- HTML can be thought of as a small application of SGML, used on the Web
- HTML defines a very simple class of report-style documents, with section headings, paragraphs, lists, tables, illustrations, etc.
- It was one of the first computer languages that could be understood and used by the masses; it gave the Web to the common person
- HTML is said to be static: its tag set is fixed, so one can do only limited things with HTML
Advantages of XML over HTML
XML
- XML allows users to:
- Bring multiple files together to form compound documents
- Identify where illustrations are to be incorporated into text files, and the format used to encode each illustration
- Provide processing control information to supporting programs, such as document validators and browsers
- Add editorial comments to a file
XML Components
- XML is based on the concept of documents composed of a series of entities
- Each entity can contain one or more logical elements
- Each of these elements can have certain attributes (properties) that describe the way in which it is to be processed
XML - A Few Important Points
- Tag names are case sensitive
- Every opening tag must have a corresponding closing tag (an empty element may instead use a single self-closing tag, such as <Dept No="1"/>)
- A nested tag pair cannot overlap another tag pair
- Attribute values must appear within quotes
- Every document must have exactly one root element
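An XML parser enforces exactly these rules. The following small Python sketch uses the standard library's `xml.etree.ElementTree`. The sample snippets are invented for illustration, one per rule, and only the first two are meant to be well-formed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical snippets, one per rule.
SAMPLES = {
    "ok": '<note lang="en"><to>Ann</to></note>',
    "self-closing": '<Dept No="1"/>',           # empty element, also fine
    "case mismatch": "<Note></note>",          # tag names are case sensitive
    "unclosed tag": "<note><to>Ann</note>",    # <to> is never closed
    "overlapping tags": "<b><i>text</b></i>",  # nested pairs overlap
    "unquoted attribute": "<note lang=en/>",   # attribute value not quoted
    "two roots": "<a/><b/>",                   # no single root element
}

if __name__ == "__main__":
    for name, text in SAMPLES.items():
        print(f"{name:20} well-formed={is_well_formed(text)}")
```

Using the parser itself as the checker avoids re-implementing the rules by hand. The parser also reports the line and column of the first error via the `ParseError`, which this sketch discards.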
XML Editor
- XML documents are plain text documents
- Any simple text editor can be used as an XML editor
- For example, Windows users can use Notepad or WordPad
- Microsoft's XML editor: Microsoft XML Notepad
- Java-based XML editors
XML Document
- <exampledoc> - the root element of the document
- <eq> - a question and its associated answers
- <Question> - a question
- <A> - the first possible answer to a question
- <B> - the second possible answer to a question
- <C> - the third possible answer to a question
XML Document (contd)
    <?xml version="1.0"?>
    <exampledoc>
      <eq answer="a">
        <Question>
          In 1994, a man had an accident while robbing a pizza restaurant
          in Akron, Ohio, that resulted in his arrest. What happened to him?
        </Question>
XML Document (contd)
        <A>He slipped on a patch of grease on the floor and knocked himself out.</A>
        <B>He backed into a police car while attempting to drive off.</B>
        <C>He choked on a breadstick that he had grabbed as he was running out.</C>
      </eq>
    </exampledoc>
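Because the structure of the quiz is explicit, a program can read it without any knowledge of presentation. Below is a minimal Python sketch with `xml.etree.ElementTree`. It assumes that the `answer` attribute names the correct answer element in lower case, so `answer="a"` points to `<A>`. That convention is an interpretation of the example, not something the document states.

```python
import xml.etree.ElementTree as ET

QUIZ = """<?xml version="1.0"?>
<exampledoc>
  <eq answer="a">
    <Question>In 1994, a man had an accident while robbing a pizza restaurant
    in Akron, Ohio, that resulted in his arrest. What happened to him?</Question>
    <A>He slipped on a patch of grease on the floor and knocked himself out.</A>
    <B>He backed into a police car while attempting to drive off.</B>
    <C>He choked on a breadstick that he had grabbed as he was running out.</C>
  </eq>
</exampledoc>"""


def correct_answers(xml_text: str) -> list:
    """Return (question, correct answer) pairs for every <eq> element."""
    root = ET.fromstring(xml_text)
    pairs = []
    for eq in root.iter("eq"):
        # Collapse the line breaks and indentation inside the question text.
        question = " ".join(eq.findtext("Question").split())
        # Assumption: answer="a" refers to the <A> child, "b" to <B>, etc.
        key = eq.get("answer").upper()
        pairs.append((question, eq.findtext(key).strip()))
    return pairs


if __name__ == "__main__":
    for question, answer in correct_answers(QUIZ):
        print(question)
        print("->", answer)
```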
Viewing XML Document
- A style sheet is the best way to view an XML document
- A style sheet is a series of formatting descriptions that determine how elements are displayed on a web page
- In simple English, a style sheet controls how web page content looks in a web browser
A CSS Style Sheet for the exampledoc XML Document
- CSS rule for the eq tag:
    eq {
      display: block;
      width: 750px;
      padding: 10px;
      margin-bottom: 10px;
      border: 4px double black;
      background-color: silver;
    }
Style Sheets (contd)
- In the absence of a style sheet, Internet Explorer (or any other browser) just displays the XML code
- To attach the style sheet to the document, add the following line just after the XML declaration of the document:
    <?xml-stylesheet type="text/css" href="exampledoc.css"?>
Is XML a Database?
- XML and its surrounding technologies constitute a "database" only in the looser sense of the term, i.e. a database management system (DBMS)
- XML provides many of the things found in databases: storage (XML documents), schemas (DTDs, XML schema languages), query languages (XQuery, XPath, XQL, XML-QL, QUILT, etc.), programming interfaces (SAX, DOM, JDOM), etc.
Is XML a Database? (contd)
- But it lacks many of the things found in real databases: efficient storage, indexes, security, transactions and data integrity, multi-user access, triggers, and queries across multiple documents
- XML documents can be used as a database in environments with small amounts of data, few users, and modest performance requirements
- This approach fails in environments with many users, strict data integrity requirements, and the need for good performance
XML and Databases
- XML's proliferation raises the question of how data transferred in XML documents is to be read, stored and queried
- In other words, how do DBMSs handle XML documents?
- There are two ways to look at XML documents: data-centric and document-centric documents
Data-Centric Documents
- Data-centric documents use XML as a data transport
- Such documents are usually found in business-to-business applications
- Examples: buyer-supplier trading automation, sales orders, flight schedules, scientific data
- Data-centric documents have a regular structure
- The data originates both in the database (in which case we want to expose it as XML) and outside the database (in which case we want to store it in a database)
Example - Data-Centric Document
    <Employees>
      <Employee JobCode="A1">
        <Dept No="1"/>
        <EmpNo>1234</EmpNo>
        <FirstName>John</FirstName>
        <LastName>Doe</LastName>
        <HireDate>1998-02-11</HireDate>
      </Employee>
      <Employee JobCode="B3">
        <Dept No="2"/>
        <EmpNo>5678</EmpNo>
        <FirstName>Joy</FirstName>
        <LastName>Black</LastName>
        <HireDate>1998-03-09</HireDate>
      </Employee>
    </Employees>
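Every `<Employee>` element has the same shape, so such a document maps naturally onto fixed records. Below is a minimal Python sketch. The record keys (for example `DeptNo`) are names chosen for this illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date

EMPLOYEES = """<Employees>
  <Employee JobCode="A1">
    <Dept No="1"/>
    <EmpNo>1234</EmpNo>
    <FirstName>John</FirstName>
    <LastName>Doe</LastName>
    <HireDate>1998-02-11</HireDate>
  </Employee>
  <Employee JobCode="B3">
    <Dept No="2"/>
    <EmpNo>5678</EmpNo>
    <FirstName>Joy</FirstName>
    <LastName>Black</LastName>
    <HireDate>1998-03-09</HireDate>
  </Employee>
</Employees>"""


def employee_records(xml_text: str) -> list:
    """Flatten each <Employee> into a dict; every record gets the same keys."""
    records = []
    for emp in ET.fromstring(xml_text).findall("Employee"):
        records.append({
            "JobCode": emp.get("JobCode"),              # attribute
            "DeptNo": int(emp.find("Dept").get("No")),  # attribute of empty child
            "EmpNo": int(emp.findtext("EmpNo")),        # child element text
            "FirstName": emp.findtext("FirstName"),
            "LastName": emp.findtext("LastName"),
            "HireDate": date.fromisoformat(emp.findtext("HireDate")),
        })
    return records


if __name__ == "__main__":
    for record in employee_records(EMPLOYEES):
        print(record)
```

The regular structure is what makes this simple. A document-centric example, with mixed text and markup, would not fit into fixed keys like these.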
Data-Centric Documents (contd)
- To manage data-centric documents, both data extraction and data formatting services are needed
- Data extraction: receive XML documents from a network and extract structured data from them, to be stored in a DBMS
- To support data extraction, a mapping must be defined between XML documents and the DBMS data model
- The extracted data is stored in tables that follow a predefined schema (which is why this is called a structured representation)
- The original XML document's structure is not maintained in this case
Example - Data Extraction
    <clients>
      <row>
        <number>7369</number>
        <firstname>Paul</firstname>
        <lastname>Smith</lastname>
      </row>
      <row>
        <number>7000</number>
        <firstname>Steve</firstname>
        <lastname>Adam</lastname>
      </row>
    </clients>
- Extracted into table Clients:
    Number | First Name | Last Name
    7369   | Paul       | Smith
    7000   | Steve      | Adam
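The extraction step can be sketched with Python's standard `sqlite3` and `xml.etree.ElementTree` modules. The element-to-column mapping and the column types are assumptions made for this sketch. A real DBMS would use its own extraction tools.

```python
import sqlite3
import xml.etree.ElementTree as ET

CLIENTS_XML = """<clients>
  <row><number>7369</number><firstname>Paul</firstname><lastname>Smith</lastname></row>
  <row><number>7000</number><firstname>Steve</firstname><lastname>Adam</lastname></row>
</clients>"""


def extract_clients(conn: sqlite3.Connection, xml_text: str) -> None:
    """Map each <row> to a row of a predefined Clients table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Clients "
        "(Number INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)"
    )
    for row in ET.fromstring(xml_text).findall("row"):
        # The mapping: <number> -> Number, <firstname> -> FirstName, ...
        conn.execute(
            "INSERT INTO Clients (Number, FirstName, LastName) VALUES (?, ?, ?)",
            (
                int(row.findtext("number")),
                row.findtext("firstname").strip(),
                row.findtext("lastname").strip(),
            ),
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    extract_clients(conn, CLIENTS_XML)
    for record in conn.execute("SELECT * FROM Clients ORDER BY Number DESC"):
        print(record)
```

Only the column values survive in the table. The `<clients>`/`<row>` nesting is discarded, which is the loss of document structure noted for data extraction.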
Data-Centric Documents (contd)
- Data formatting: XML encoding software takes the result of a query expressed in a DBMS query language and encodes the resulting data in an XML document to be transferred over the network
- To support data formatting, a sort of reverse mapping with respect to data extraction must be implemented
- After a set of tuples is selected from the database with a query, data formatting services transform it into an XML document
Data Formatting - Data-Centric Document
- Query on table Clients:
    SELECT FirstName, LastName FROM Clients WHERE Number = 7369
- Resulting XML document:
    <clients>
      <row>
        <firstname>Paul</firstname>
        <lastname>Smith</lastname>
      </row>
    </clients>
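Data formatting is the reverse direction. The sketch below, again using Python's `sqlite3`, runs a query and wraps each result tuple in a `<row>` element. The helper name `format_query` is an assumption. So is the choice to lower-case column names, which matches `<firstname>` in the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def format_query(conn, sql, params=(), root_tag="clients", row_tag="row"):
    """Run `sql` and encode each result tuple as <row>, one child per column."""
    cursor = conn.execute(sql, params)
    # Element names come from the selected column names, lower-cased to
    # match the example document (<firstname>, <lastname>).
    columns = [desc[0].lower() for desc in cursor.description]
    root = ET.Element(root_tag)
    for values in cursor:
        row = ET.SubElement(root, row_tag)
        for name, value in zip(columns, values):
            ET.SubElement(row, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Clients (Number INTEGER, FirstName TEXT, LastName TEXT)")
    conn.executemany(
        "INSERT INTO Clients VALUES (?, ?, ?)",
        [(7369, "Paul", "Smith"), (7000, "Steve", "Adam")],
    )
    print(format_query(
        conn, "SELECT FirstName, LastName FROM Clients WHERE Number = ?", (7369,)))
```

With the two clients loaded, the call in the demo returns `<clients><row><firstname>Paul</firstname><lastname>Smith</lastname></row></clients>`, the same content as the XML document above without the indentation.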
Document-Centric Documents
- In this view, XML documents are application-relevant objects, i.e. new data objects to be stored and managed by a DBMS
- The meaning of an XML document depends on the document as a whole
- The structure is more irregular, and the data are heterogeneous
- Examples: books, email, advertisements
- Unlike data-centric documents, they usually do not originate in the database
Example - Document-Centric Document
    <Product>
      <Intro>
        The <ProductName>Turkey Wrench</ProductName> from
        <Developer>Full Fabrication Labs, Inc.</Developer> is
        <Summary>like a monkey wrench, but not as big.</Summary>
      </Intro>
      <Description>
        <Para>The turkey wrench, which comes in <i>both right- and
        left-handed versions (skyhook optional)</i>, is made of the
        <b>finest stainless steel</b>. The Ready-grip rubberized handle
        quickly adapts to your hands, even in the greasiest situations.
        Adjustment is possible through a variety of custom dials.</Para>
        <Para>You can:</Para>
        <list>
          <Item><Link URL="Order.html">Order your own turkey wrench</Link></Item>
        </list>
      </Description>
    </Product>
Document-Centric Documents (contd)
- This type of document requires a DBMS enhanced with new data types for representing XML data
- It also requires new capabilities for querying and managing the documents
- Two types of representation have been devised:
- Unstructured representation
- Hybrid representation
Document-Centric Documents (Unstructured)
- Unstructured representation: the whole document is kept as a single data field, either
- inside the DBMS, where it is managed by the DBMS, or
- outside the DBMS but linked to it; in this case the operating system manages it (e.g. as a file)
- For unstructured XML documents, DBMSs extend query languages with XML-based selection conditions
Example - Unstructured
- XML document:
    <clients>
      <row>
        <number>7369</number>
        <firstname>Paul</firstname>
        <lastname>Smith</lastname>
      </row>
      <row>
        <number>7000</number>
        <firstname>Steve</firstname>
        <lastname>Adam</lastname>
      </row>
    </clients>
- The entire document is stored as one value in a single column of a table:
    Id | XML Document
    10 | (the complete <clients> document above)
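The unstructured representation can be imitated with a single text column. In the sketch below, Python's `sqlite3` stores the whole document under Id 10. A Python function registered with `create_function` plays the role of an XML-based selection condition. The function name `xml_contains` is hypothetical; commercial DBMSs provide their own XML operators. The path syntax here is ElementTree's limited XPath subset.

```python
import sqlite3
import xml.etree.ElementTree as ET

CLIENTS_XML = (
    "<clients>"
    "<row><number>7369</number><firstname>Paul</firstname><lastname>Smith</lastname></row>"
    "<row><number>7000</number><firstname>Steve</firstname><lastname>Adam</lastname></row>"
    "</clients>"
)


def xml_contains(doc: str, path: str) -> int:
    """SQL-callable check: 1 if the ElementTree path matches a node in doc."""
    return int(ET.fromstring(doc).find(path) is not None)


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The whole document goes into one TEXT column (a CLOB-like field).
    conn.execute("CREATE TABLE XmlDocs (Id INTEGER PRIMARY KEY, Doc TEXT)")
    # Hypothetical function name; real DBMSs ship their own XML operators.
    conn.create_function("xml_contains", 2, xml_contains)
    return conn


if __name__ == "__main__":
    conn = open_store()
    conn.execute("INSERT INTO XmlDocs VALUES (?, ?)", (10, CLIENTS_XML))
    hits = conn.execute(
        "SELECT Id FROM XmlDocs WHERE xml_contains(Doc, ?)",
        (".//row[lastname='Smith']",),
    ).fetchall()
    print(hits)
```

The document comes back exactly as stored, but every selection condition has to re-parse it. This is the price of the unstructured representation.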
Document-Centric Documents (Hybrid)
- Hybrid representation: a combination of structured and unstructured types
- Useful when mixing types, such as structured information about a book combined with unstructured information consisting of the contents or chapters of the book
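A hybrid layout keeps the regular parts of a document in ordinary columns and the irregular parts as XML text. Below is a minimal sketch with Python's `sqlite3`. The book document, the table `Books` and its columns are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical book document, made up for this sketch.
BOOK_XML = """<book>
  <title>Sample Title</title>
  <author>Sample Author</author>
  <chapters>
    <chapter><heading>Intro</heading><p>Some <i>mixed</i> text.</p></chapter>
  </chapters>
</book>"""


def store_book(conn: sqlite3.Connection, book_id: int, xml_text: str) -> None:
    """Structured part -> ordinary columns; chapters -> one XML text column."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Books "
        "(Id INTEGER PRIMARY KEY, Title TEXT, Author TEXT, Chapters TEXT)"
    )
    book = ET.fromstring(xml_text)
    chapters = book.find("chapters")
    chapters.tail = None  # do not serialize the whitespace after </chapters>
    conn.execute(
        "INSERT INTO Books VALUES (?, ?, ?, ?)",
        (book_id, book.findtext("title"), book.findtext("author"),
         ET.tostring(chapters, encoding="unicode")),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_book(conn, 1, BOOK_XML)
    print(conn.execute("SELECT Title, Chapters FROM Books WHERE Author = ?",
                       ("Sample Author",)).fetchone())
```

Title and Author can be queried with plain SQL. The chapter content keeps its markup and can be parsed again when needed.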
Example - Hybrid
Commercial Support in Databases
- Oracle 8i
- Has an extended architecture with tools to manage XML documents
- Supports structured, unstructured and hybrid representations of XML documents
- The XML-SQL Utility supports data extraction and data formatting for data-centric documents
- Document-centric data is stored using CLOBs (character large objects)
Commercial Support in Databases (contd)
- IBM DB2
- The XML Extender provides features to store and manage XML documents
- It handles structured, unstructured as well as hybrid representations
- Data-centric documents are stored in a set of relational tables containing data extracted from XML documents
- The Extender supports storage and access methods to compose an XML document from existing data, or to decompose an XML document into relational data
Commercial Support in Databases (contd)
- DB2 (contd): document-centric documents are stored as XMLClob, XMLVarChar or XMLFile values
- Microsoft SQL Server
- Data-centric: the OpenXML function extracts data from an XML document so that it can be stored in a relational database
- Extending the SELECT-FROM-WHERE statement with the FOR XML clause formats query results as XML
- Permits construction of XDR schemas, which generate XML views of the database that can be queried with XPath
Data-Centric and Document-Centric
- In practice, the distinction between data-centric and document-centric documents is not always clear
- For example, an otherwise data-centric document, such as an invoice, might contain irregularly structured data, such as a part description
- An otherwise document-centric document, such as a user's manual, might contain regularly structured data (often metadata), such as an author's name and a revision date
Document Schema, Database Schema
- A schema is a set of rules that defines the structure of a document or database
- A database schema describes the overall structure of the database
- A document schema describes exactly which elements and attributes are available within a given markup language, along with the associations between attributes and elements and the relationships between elements
- The schema allows XML documents to be validated for accuracy
Document Schemas
- There are two different approaches for creating schemas for XML documents:
- Document Type Definition (DTD)
- XML Schema Definition (XSD)
- A DTD describes vital information about the structure of an XML document, i.e. it lists element types, attributes and their relationships to each other
- It sets out what names are to be used for the different types of element, where they may occur, and how they all fit together
Limitations of DTD
- Non-XML syntax
- No data-type facility
- Employs a closed data model, which does not allow much flexibility to extend markup languages
XML Schema
- XSDs are significant not only for defining XML structures but also for providing data type capabilities to XML
- XSDs are themselves written in XML
- Supports integrity constraints such as primary and foreign keys (xs:key and xs:keyref), etc.
- Represents an open-ended data model that allows custom markup languages to be extended and complex relationships between elements to be established
Mapping Document Schemas to Database Schemas
- Two mappings are commonly used: table-based mapping and object-relational mapping
- The data transfer software is built on top of this mapping. It can either:
- use an XML query language (such as XPath, XQuery, or a proprietary language), or
- simply transfer data according to the mapping (the XML equivalent of SELECT * FROM Table)
Table-Based Mapping
- Used by many of the middleware products that transfer data between an XML document and a relational database
- It models XML documents as a single table or a set of tables. That is, the structure of an XML document must be as follows:
    <database>
      <table>
        <row>
          <column1>...</column1>
          <column2>...</column2>
          ...
        </row>
        <row>
          ...
        </row>
        ...
      </table>
      <table>
        ...
      </table>
      ...
    </database>
Table-Based Mapping (contd)
- Advantages
- Simplicity, because it matches the structure of tables and result sets in relational databases
- Mainly useful for transferring data between databases
- Disadvantages
- Applies only to a limited set of XML documents
- It doesn't exploit XML's ability to represent hierarchies of data
- It doesn't preserve physical structure (e.g. the DTD)
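Because the table-based layout is fixed, a generic loader fits in a few lines. Below is a minimal Python sketch. It assumes that each child of `<database>` is named after its table and each child of `<row>` after its column; the generic names in the layout are placeholders. The sample data is hypothetical, and all values are loaded as text.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document in the table-based layout: <database>/<table>/<row>/<column>.
DB_XML = """<database>
  <Clients>
    <row><Number>7369</Number><FirstName>Paul</FirstName></row>
    <row><Number>7000</Number><FirstName>Steve</FirstName></row>
  </Clients>
</database>"""


def quote(name: str) -> str:
    """Quote an XML element name for use as an SQL identifier."""
    return '"' + name.replace('"', '""') + '"'


def load_table_based(conn: sqlite3.Connection, xml_text: str) -> None:
    """Create one table per child of <database>; one column per child of <row>."""
    for table in ET.fromstring(xml_text):
        rows = table.findall("row")
        if not rows:
            continue
        columns = [col.tag for col in rows[0]]  # first row defines the columns
        col_list = ", ".join(quote(c) for c in columns)
        conn.execute(f"CREATE TABLE {quote(table.tag)} ({col_list})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f"INSERT INTO {quote(table.tag)} ({col_list}) VALUES ({placeholders})",
            [[row.findtext(c) for c in columns] for row in rows],
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_table_based(conn, DB_XML)
    print(conn.execute("SELECT * FROM Clients ORDER BY rowid").fetchall())
```

Anything nested deeper than `<row>/<column>` is simply ignored. This illustrates why this mapping does not exploit XML hierarchies.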
Object-Relational Mapping
- The object-relational mapping is used by all XML-enabled relational databases and some middleware products
- It models the data in the XML document as a tree of objects that are specific to the data in the document
- Object-relational mapping is done in two steps:
- A document schema (DTD) is mapped to an object schema
- The object schema is mapped to a database schema
Object-Relational Mapping (contd)
- In this model, element types with attributes or child elements are generally modeled as classes, while simple content (PCDATA-only elements and attribute values) becomes scalar properties
- The model is then mapped to relational databases using traditional object-relational mapping techniques
- i.e. classes are mapped to tables, scalar properties are mapped to columns, and object-valued properties are mapped to primary key / foreign key pairs
Object-Relational Mapping (contd)
For example, consider the following XML document:
    <SalesOrder>
      <Number>1234</Number>
      <Customer>ABC Industries</Customer>
      <Date>29.10.00</Date>
      <Item Number="1">
        <Part>123</Part>
        <Quantity>12</Quantity>
        <Price>10.95</Price>
      </Item>
      <Item Number="2">
        <Part>456</Part>
        <Quantity>600</Quantity>
        <Price>3.99</Price>
      </Item>
    </SalesOrder>
Object-Relational Mapping (contd)
which maps to the following objects:
    object SalesOrder {
      number = 1234;
      customer = "ABC Industries";
      orderdate = 29.10.00;
      items = {ptrs to Item objects};
    }
    object Item {
      number = 1;
      part = 123;
      quantity = 12;
      price = 10.95;
    }
    object Item {
      number = 2;
      part = 456;
      quantity = 600;
      price = 3.99;
    }
Object-Relational Mapping (contd)
and then to rows in the following tables:
    SalesOrders
    Number  Customer        Date
    ------  --------------  --------
    1234    ABC Industries  29.10.00
    ...     ...             ...

    Items
    SONumber  Item  Part  Quantity  Price
    --------  ----  ----  --------  -----
    1234      1     123   12        10.95
    1234      2     456   600       3.99
    ...       ...   ...   ...       ...
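The two steps can be sketched in Python. Dataclasses stand in for the object schema and `sqlite3` for the relational database. The class, field and table definitions mirror the example. Type choices, such as keeping the date as text, are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

ORDER_XML = """<SalesOrder>
  <Number>1234</Number>
  <Customer>ABC Industries</Customer>
  <Date>29.10.00</Date>
  <Item Number="1"><Part>123</Part><Quantity>12</Quantity><Price>10.95</Price></Item>
  <Item Number="2"><Part>456</Part><Quantity>600</Quantity><Price>3.99</Price></Item>
</SalesOrder>"""


@dataclass
class Item:
    number: int
    part: int
    quantity: int
    price: float


@dataclass
class SalesOrder:
    number: int
    customer: str
    date: str  # kept as text: "29.10.00" is ambiguous without a known format
    items: List[Item] = field(default_factory=list)


def to_objects(xml_text: str) -> SalesOrder:
    """Step 1: complex elements become objects, simple content becomes scalars."""
    so = ET.fromstring(xml_text)
    order = SalesOrder(int(so.findtext("Number")), so.findtext("Customer"),
                       so.findtext("Date"))
    for it in so.findall("Item"):
        order.items.append(Item(int(it.get("Number")), int(it.findtext("Part")),
                                int(it.findtext("Quantity")), float(it.findtext("Price"))))
    return order


def to_tables(conn: sqlite3.Connection, order: SalesOrder) -> None:
    """Step 2: classes -> tables, scalars -> columns, items -> foreign key rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS SalesOrders "
                 "(Number INTEGER PRIMARY KEY, Customer TEXT, Date TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS Items "
                 "(SONumber INTEGER REFERENCES SalesOrders(Number), Item INTEGER, "
                 "Part INTEGER, Quantity INTEGER, Price REAL, PRIMARY KEY (SONumber, Item))")
    conn.execute("INSERT INTO SalesOrders VALUES (?, ?, ?)",
                 (order.number, order.customer, order.date))
    conn.executemany("INSERT INTO Items VALUES (?, ?, ?, ?, ?)",
                     [(order.number, i.number, i.part, i.quantity, i.price)
                      for i in order.items])
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    to_tables(conn, to_objects(ORDER_XML))
    print(conn.execute("SELECT * FROM Items ORDER BY Item").fetchall())
```

The `items` list, an object-valued property, disappears in step 2. It is replaced by the `SONumber` foreign key column in `Items`.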
Query Languages
- Short term: use XSLT, or integrate a limited number of transformations into mappings
- Long term: implement query languages that return XML
- Almost all XML query languages (including XQuery 1.0) are read-only, so different means are needed to insert, update, and delete data
- In the long term, XQuery will add these capabilities (the W3C XQuery Update Facility 1.0 did so in 2011)
Template-Based Query Languages
- Most of these languages rely on SELECT statements embedded in templates:
    <?xml version="1.0"?>
    <FlightInfo>
      <Introduction>The following flights have available seats:</Introduction>
      <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
      <Flight>
        <Airline>Airline</Airline>
        <FltNumber>FltNumber</FltNumber>
        ...
- Inside the <Flight> template, the column names stand for the values of each result row
Template-Based Query Languages (contd)
- The result of processing such a template might be:
    <?xml version="1.0"?>
    <FlightInfo>
      <Introduction>The following flights have available seats:</Introduction>
      <Flights>
        <Flight>
          <Airline>ACME</Airline>
          <FltNumber>123</FltNumber>
          <Depart>Dec 12, 1998 13:43</Depart>
          <Arrive>Dec 13, 1998 01:21</Arrive>
        </Flight>
        ...
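A template processor can be sketched as follows. To stay short, this simplified variant replaces each `<SelectStmt>` with a `<Flights>` element. That element holds one `<Flight>` per result row and one child per column. It does not interpret a per-row `<Flight>` template. The sketch uses Python's `sqlite3` and `xml.etree.ElementTree`. Run it only on trusted templates, because the embedded SQL is executed as-is.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Simplified template: the <SelectStmt> is replaced by a <Flights> element.
TEMPLATE = """<FlightInfo>
  <Introduction>The following flights have available seats:</Introduction>
  <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
</FlightInfo>"""


def process_template(conn: sqlite3.Connection, template_xml: str) -> str:
    """Replace each <SelectStmt> child of the root with its query results as XML."""
    root = ET.fromstring(template_xml)
    for index, child in enumerate(list(root)):
        if child.tag != "SelectStmt":
            continue
        cursor = conn.execute(child.text)  # only run templates you trust
        columns = [desc[0] for desc in cursor.description]
        flights = ET.Element("Flights")  # hard-coded wrapper names (assumption)
        for values in cursor:
            flight = ET.SubElement(flights, "Flight")
            for name, value in zip(columns, values):
                ET.SubElement(flight, name).text = str(value)
        flights.tail = child.tail  # keep the original layout whitespace
        root.remove(child)
        root.insert(index, flights)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Flights (Airline TEXT, FltNumber INTEGER, Depart TEXT, Arrive TEXT)")
    conn.execute("INSERT INTO Flights VALUES "
                 "('ACME', 123, 'Dec 12, 1998 13:43', 'Dec 13, 1998 01:21')")
    print(process_template(conn, TEMPLATE))
```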
SQL-Based Query Languages
- SQL-based query languages use modified SELECT statements, the results of which are transformed to XML
- The simplest of these uses nested SELECT statements, which are transformed directly to nested XML according to the object-relational mapping
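Python's `sqlite3` has no nested-SELECT-to-XML feature, so the sketch below emulates one. An outer SELECT produces one `<SalesOrder>` per order. An inner SELECT, correlated through the foreign key, produces its `<Item>` children. This follows the object-relational mapping of the sales order example. The table and column names are assumed to match that example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def orders_as_xml(conn: sqlite3.Connection) -> str:
    """Outer SELECT -> <SalesOrder> elements; inner SELECT -> nested <Item>s."""
    root = ET.Element("SalesOrders")
    orders = conn.execute("SELECT Number, Customer, Date FROM SalesOrders ORDER BY Number")
    for number, customer, date in orders:
        order = ET.SubElement(root, "SalesOrder")
        ET.SubElement(order, "Number").text = str(number)
        ET.SubElement(order, "Customer").text = customer
        ET.SubElement(order, "Date").text = date
        # Inner query, correlated with the outer row through the foreign key.
        items = conn.execute(
            "SELECT Item, Part, Quantity, Price FROM Items "
            "WHERE SONumber = ? ORDER BY Item", (number,))
        for item_no, part, quantity, price in items:
            item = ET.SubElement(order, "Item", Number=str(item_no))
            ET.SubElement(item, "Part").text = str(part)
            ET.SubElement(item, "Quantity").text = str(quantity)
            ET.SubElement(item, "Price").text = str(price)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SalesOrders (Number INTEGER PRIMARY KEY, Customer TEXT, Date TEXT)")
    conn.execute("CREATE TABLE Items (SONumber INTEGER, Item INTEGER, Part INTEGER, "
                 "Quantity INTEGER, Price REAL)")
    conn.execute("INSERT INTO SalesOrders VALUES (1234, 'ABC Industries', '29.10.00')")
    conn.executemany("INSERT INTO Items VALUES (?, ?, ?, ?, ?)",
                     [(1234, 1, 123, 12, 10.95), (1234, 2, 456, 600, 3.99)])
    print(orders_as_xml(conn))
```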
XML Query Languages
- Template-based and SQL-based query languages can only be used with relational databases
- XML query languages can be used over any XML document
- To use them with relational databases, the data in the database must be modeled as XML, thereby allowing queries over virtual XML documents
- There are different XML query languages, such as XQuery, XPath, etc.
XQuery - An Introduction
- XQuery is a functional language in which a query is represented as an expression
- An XQuery expression leverages the capabilities of XML by allowing both specification of what is being selected and designation of the output format
- There are several types of expressions used in XQuery, such as path expressions, element constructors, FLWR expressions (FLWOR in the final XQuery 1.0 specification), conditional expressions, etc.
XQuery
- Either a table-based mapping or an object-relational mapping can be used
- If a table-based mapping is used, each table is treated as a separate document, and joins between tables (documents) are specified in the query itself, as in SQL
- If an object-relational mapping is used, hierarchies of tables are treated as a single document, and joins are specified in the mapping
XPath - An Introduction
- XPath is a set of syntax rules for defining parts of an XML document
- XPath uses path expressions to identify nodes in an XML document
- These path expressions look very much like the paths you see when you work with a computer file system (e.g. /Employees/Employee/LastName)
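Python's `xml.etree.ElementTree` implements a limited subset of XPath 1.0: paths, `//`, `..`, and simple predicates. That is enough to show how path expressions select nodes. Below is a minimal sketch on a shortened version of the employees document. Full XPath needs a separate library, which is not shown here.

```python
import xml.etree.ElementTree as ET

EMPLOYEES = """<Employees>
  <Employee JobCode="A1"><Dept No="1"/><EmpNo>1234</EmpNo>
    <FirstName>John</FirstName><LastName>Doe</LastName></Employee>
  <Employee JobCode="B3"><Dept No="2"/><EmpNo>5678</EmpNo>
    <FirstName>Joy</FirstName><LastName>Black</LastName></Employee>
</Employees>"""

root = ET.fromstring(EMPLOYEES)  # context node: <Employees>

# A file-system-like path: children of <Employees>, then their <LastName>.
last_names = [e.text for e in root.findall("./Employee/LastName")]

# A predicate on an attribute value.
b3_numbers = [e.findtext("EmpNo") for e in root.findall(".//Employee[@JobCode='B3']")]

# '..' steps back up to the parent, much like "cd .." in a shell.
dept2_first_names = [e.findtext("FirstName")
                     for e in root.findall(".//Dept[@No='2']/..")]

if __name__ == "__main__":
    print(last_names, b3_numbers, dept2_first_names)
```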
XPath (contd)
- An object-relational mapping is needed to query across more than one table, because XPath does not support joins across documents
- If a table-based mapping is used, only one table can be queried at a time
Native XML Databases (NXD)
- A native XML database defines a logical model for an XML document (as opposed to the data in that document) and stores and retrieves documents according to that model
- The model must include at least elements, attributes, PCDATA and document order
- Examples of such models: the XPath data model, the XML Infoset, and the models implied by the DOM and the events in SAX
NXDs (contd)
- An NXD has an XML document as its fundamental unit of logical storage
- It is not required to use any particular underlying physical storage model
- An NXD does not necessarily store the XML in true native form (i.e. as text)
NXDs in Brief
- An NXD is specialized for storing XML data and stores all components of the XML model intact
- An NXD may not actually be a standalone database
- It does not represent a new low-level database model, and is not intended to replace existing databases
- It is simply a tool intended to assist the developer by providing robust storage and manipulation of XML documents
NXD Features
- XML storage
- NXDs store XML documents as a unit and create a model that is closely aligned with XML or with one of XML's related technologies, such as the Infoset or the DOM
- The model can include arbitrary levels of nesting and complexity
- This model is automatically mapped by the NXD onto the underlying storage mechanism
NXD Features (contd)
- Collections
- NXDs manage collections of documents, allowing you to query and manipulate those documents as a set
- Any XML document can be stored in a collection, regardless of its schema (schema-independent functionality)
- In the future, it is likely that W3C XML Schema will emerge as the schema language of choice for NXDs
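A collection can be pictured as a named set of parsed documents that are queried together. The `Collection` class below is a hypothetical in-memory toy written for illustration, not an NXD API. It accepts any well-formed document without schema checks and runs one ElementTree path over every document.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


class Collection:
    """Hypothetical in-memory stand-in for an NXD collection."""

    def __init__(self) -> None:
        self._docs: Dict[str, ET.Element] = {}

    def add(self, name: str, xml_text: str) -> None:
        # No schema check: any well-formed document is accepted.
        self._docs[name] = ET.fromstring(xml_text)

    def query(self, path: str) -> List[Tuple[str, str]]:
        """Run one ElementTree path over every document in the collection."""
        hits = []
        for name, root in self._docs.items():
            for node in root.iterfind(path):
                hits.append((name, (node.text or "").strip()))
        return hits


if __name__ == "__main__":
    shelf = Collection()
    shelf.add("emp.xml", "<Employees><Employee><LastName>Doe</LastName></Employee></Employees>")
    shelf.add("clients.xml", "<clients><row><lastname>Smith</lastname></row></clients>")
    shelf.add("note.xml", "<note><LastName>Black</LastName></note>")
    print(shelf.query(".//LastName"))
```

The three documents follow different structures, yet one query runs over all of them. This reflects the schema-independent behaviour described above. Because element names are case sensitive, `.//LastName` does not match `<lastname>`.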
NXD Features (contd)
- Queries
- XPath is the current NXD query language of choice
- In order to function as a database query language, XPath is extended slightly to allow queries across collections of documents
- XPath has several shortcomings, including lack of grouping, sorting, cross-document joins and support for data types
- These issues can be resolved by XSLT and XQuery
NXDs (contd)
- Native XML databases are databases designed especially to store XML documents
- Like other databases, they support features such as transactions, security, multi-user access, query languages, etc.
- They are mainly useful for storing document-centric documents
- NXDs support XML query languages that can execute complex queries that are not possible in plain SQL
- E.g. in NXDs, data can be retrieved based on structural information, which plain SQL cannot do
NXDs (contd)
- An NXD offers XML-specific capabilities, such as XML query languages, and is faster at retrieving whole documents
- NXDs can store semi-structured data, i.e. documents that do not have DTDs, while keeping retrieval fast
- NXDs can store and understand any XML document without prior configuration
- E.g. Web search engines, where no single DTD or set of DTDs applies to all documents
Application Areas of NXD
- Any application area that uses XML can use an NXD
- In general, NXDs excel at storing document-oriented data (e.g. XHTML or DocBook)
- If the data is represented as XML and is loosely structured ("fuzzy"), an NXD will probably be a good solution
- An NXD might not be the best tool for something like an accounting system, where the data is very well defined and rigid
Application Areas (contd)
- Corporate Information Portals
- Catalog Data
- Manufacturing Parts Database
- Medical Information Storage
- Document Management Systems
- B2B Transaction Logs
- Personalization Databases.
XML Programming Interfaces
- Programming interfaces give developers a consistent interface for working with XML documents. Four of the most popular and useful ones are:
- Document Object Model (DOM)
- Simple API for XML (SAX)
- JDOM
- Java API for XML Parsing (JAXP).
XML Parsers
- XML parsers are programs that read XML syntax and extract the required information from it
- There are two kinds of XML parsers:
- Non-validating, e.g. Lark, XP, HEX, etc.
- Validating (they check the document against its DTD), e.g. IBM's XML Parser for Java (which includes DOM and SAX), Oracle XML parser, XMLBooster, DXP, etc.
Relationship Between XML Documents, Parsers and Applications
[Figure: the XML document, together with an optional DTD, is read by the XML parser, which passes the information on to the application]
Document Object Model
- DOM was created by the W3C and is an official Recommendation of the consortium
- It is defined as a set of interfaces to the parsed version of an XML document
- DOM provides a rich set of functions that you can use to interpret and manipulate an XML document
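Python's standard library includes a DOM implementation, `xml.dom.minidom`. Below is a minimal sketch on a shortened version of the quiz document. It parses the whole document into a tree, reads and navigates nodes, changes the tree and serializes it again. The `options` attribute it adds is hypothetical. Mapping `answer="a"` to `<A>` is an assumed convention.

```python
from xml.dom import minidom

# The quiz document from the earlier slides, with shortened answer texts.
QUIZ = ('<exampledoc><eq answer="a">'
        '<Question>What happened to him?</Question>'
        '<A>He slipped.</A><B>He backed into a police car.</B>'
        '<C>He choked on a breadstick.</C>'
        '</eq></exampledoc>')

doc = minidom.parseString(QUIZ)  # the whole document becomes a tree in memory

eq = doc.getElementsByTagName("eq")[0]
key = eq.getAttribute("answer").upper()      # "a" -> "A" (assumed convention)
correct = eq.getElementsByTagName(key)[0].firstChild.data

# Navigate the tree: element children of <eq> other than the question.
options = [node.tagName for node in eq.childNodes
           if node.nodeType == node.ELEMENT_NODE and node.tagName != "Question"]

# Manipulate the tree, then serialize it again.
eq.setAttribute("options", str(len(options)))  # hypothetical attribute
updated = doc.documentElement.toxml()

if __name__ == "__main__":
    print(correct, options)
    print(updated)
```

Every element, attribute and text node here is an object held in memory at the same time. That is the source of the DOM memory cost.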
[Figure: the XML parser parses the XML document into a DOM tree; the application gets information from the DOM]
DOM Issues
- It requires a significant amount of memory
- The DOM creates objects that represent everything in the original document, including elements, text, attributes, and white space
- A DOM parser causes significant delays for large documents, because the whole tree must be built before the application can use it
The Simple API for XML (SAX)
- To get around the DOM issues, participants in the XML-DEV mailing list (led by David Megginson) created the SAX interface
- A SAX parser is event based
- A SAX parser doesn't build any objects representing the document; it simply delivers events to your application
- A SAX parser starts delivering events as soon as the parse begins, so the application can start generating results right away
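The event-driven style looks like this with Python's `xml.sax`. As the parser reads, it calls `startElement`, `characters` and `endElement` on a handler, and no document tree is built. Below is a minimal sketch on a shortened quiz document. The handler class and its fields are hypothetical names. The handler has to remember which element it is inside, because the events themselves carry no context.

```python
import xml.sax

QUIZ = (b'<exampledoc><eq answer="a">'
        b'<Question>What happened to him?</Question>'
        b'<A>He slipped.</A><B>He backed into a police car.</B>'
        b'<C>He choked on a breadstick.</C>'
        b'</eq></exampledoc>')


class AnswerHandler(xml.sax.ContentHandler):
    """Receives parse events; keeps only the state it needs, builds no tree."""

    ANSWER_TAGS = {"A", "B", "C"}

    def __init__(self):
        super().__init__()
        self.correct = None   # value of the answer attribute on <eq>
        self.answers = {}     # tag name -> answer text
        self._current = None  # answer element we are inside, if any
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "eq":
            self.correct = attrs.get("answer")
        elif name in self.ANSWER_TAGS:
            self._current, self._chunks = name, []

    def characters(self, content):
        # Text may be delivered in several pieces, so collect them all.
        if self._current is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == self._current:
            self.answers[name] = "".join(self._chunks)
            self._current = None


handler = AnswerHandler()
xml.sax.parseString(QUIZ, handler)

if __name__ == "__main__":
    print(handler.correct, handler.answers)
```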
[Figure: the XML parser parses the XML document and passes information to the application through its event handlers]
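SAX Issues
- SAX events are stateless: the application must keep track of context (such as the current element) itself
- SAX events are not permanent: anything the application does not save while handling an event is lost
- SAX is not controlled by a centrally managed organization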
JDOM
- Frustrated by the difficulty of doing certain tasks with the DOM and SAX models, Jason Hunter and Brett McLaughlin created the JDOM package
- JDOM is a Java-based, open-source project that attempts to follow the 80/20 rule (solve 80% of Java/XML problems with 20% of the effort)
- JDOM works with SAX and DOM parsers
- The main feature of JDOM is that it greatly reduces the amount of code needed
The Java API for XML Parsing (JAXP)
- There are still several things that DOM, SAX, and JDOM don't address, so Sun released JAXP, the Java API for XML Parsing
- JAXP provides common interfaces for processing XML documents using DOM, SAX, and XSLT
Which Interface Is Right for You?
- Will your application be written in Java?
- How will your application be deployed?
- Once you parse the XML document, will you need to access that data many times?
- Do you need just a few things from the XML source?
- Are you working on a machine with very little memory?
Applications of XML
- Several applications of XML are:
- Wireless Markup Language (WML): an XML application designed specifically to support wireless communication networks
- MathML: an XML application that supports mathematical and scientific markup for use on the Web
- Scalable Vector Graphics (SVG): an XML application used for describing two-dimensional graphics
Applications of XML (contd)
- Resource Description Framework (RDF): a framework for metadata that ensures interoperability between applications
- Synchronized Multimedia Integration Language (SMIL): enables independent multimedia objects to be integrated into a synchronized multimedia presentation
- Web services: SOAP, UDDI and WSDL are all XML-based technologies used to describe, publish and access Web services
- Other applications include VoiceML, VectorML, MusicML, etc.
Conclusion
- Even though current DBMSs support XML, several problems remain to be investigated:
- Development of clustering algorithms for persistent XML documents
- Extension of support for XML query languages in commercial databases
- Development of access control models to provide more secure content-based access to XML documents
- Development of ad hoc indexing structures for more efficient document access
- Flexible extraction and formatting mechanisms for data-centric architectures
- Architectural support for document-centric document management