XML - PowerPoint PPT Presentation


1
XML

Name: Niki Sardjono
Class: CS 157A
Instructor: Prof. S. M. Lee

2
Introduction

XML stands for Extensible Markup Language.
Its roots are in document management, and it is derived
from the Standard Generalized Markup Language (SGML).
XML can represent database data and other kinds
of structured data.

3
Background

XML has its roots in document markup languages.
Markup refers to anything in a document that is
not meant to be part of the printed output.
For this family of markup languages (HTML, SGML, and
XML), the markup takes the form of tags enclosed
in angle brackets, <>. Tags are always used in
pairs, <tag> and </tag>, marking the beginning and
end of the portion of the document to which the tag refers.
An example would be <title> Database </title>

Unlike HTML, however, XML does not prescribe the
set of tags allowed; tags can be specialized
as needed.
Compared to storing data in a database, XML can
be inefficient, since tag names are repeated
throughout the document. However, XML has
advantages when it is used to exchange data:
- The presence of tags makes the message
self-documenting (a schema does not need to be
consulted to understand the meaning of the text).
- The format of the document is not rigid.
- The XML format is widely accepted.
In a sense, XML is becoming the dominant
format for data exchange.

5
Structure of XML Data

The fundamental construct in an XML document is the
element (a pair of matching start- and end-tags
and all the text between them).
An XML document must have a single root element
that encompasses all other elements in the
document. Example: <account> <balance> </balance> </account>
Text is said to appear in the context of an
element if it appears between the start-tag and
end-tag of that element. Tags are properly
nested if every start-tag has a unique matching
end-tag that is in the context of the same parent
element.

Nested representations are widely used in XML
data interchange applications to avoid joins.
XML specifies the notion of an attribute.
Attributes are strings that do not contain markup,
and each attribute can appear only once in a given tag.

An example would be
<account acct-type="checking">
<account-number> A-120 </account-number>
<branch-name> Perryridge </branch-name>
<balance> 400 </balance>
</account>
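To make the element and attribute structure concrete, here is a minimal sketch that parses the account example with Python's standard xml.etree.ElementTree module. The variable names are illustrative only and are not part of the slides.

```python
import xml.etree.ElementTree as ET

# The account element from the slide, with one attribute and three subelements.
ACCOUNT_XML = """
<account acct-type="checking">
  <account-number> A-120 </account-number>
  <branch-name> Perryridge </branch-name>
  <balance> 400 </balance>
</account>
"""

account = ET.fromstring(ACCOUNT_XML)

# An attribute is a plain string attached to the start-tag.
acct_type = account.get("acct-type")

# Subelements are nested inside the parent; their text sits between the tags.
fields = {child.tag: child.text.strip() for child in account}

print(acct_type)
print(fields)
```

Note that the attribute is read from the tag itself, while the subelement values are read from nested elements; either form could carry the same information.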
A namespace mechanism has been introduced in XML
to allow organizations to specify globally unique
names to be used as element tags.
The idea is to prepend each tag or attribute with
a universal resource identifier (for example, a
Web address). Since using long identifiers would be
inconvenient, the namespace standard provides a
way to define an abbreviation for them.

Example: <bank xmlns:FB="http://www.FirstBank.com">
<FB:branch>
...
</FB:branch>
We can use a default namespace in the example above
by using xmlns instead of xmlns:FB in the root
element.
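As a rough illustration of how a parser treats the abbreviation, the sketch below uses Python's xml.etree.ElementTree, which expands each prefix into the full identifier. The branchname subelement and the variable names are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

# Prefixed form: FB is only an abbreviation for the full identifier.
PREFIXED_XML = """
<bank xmlns:FB="http://www.FirstBank.com">
  <FB:branch>
    <FB:branchname>Downtown</FB:branchname>
  </FB:branch>
</bank>
"""

# Default-namespace form: xmlns without a prefix applies to the root
# element and to every unprefixed element inside it.
DEFAULT_XML = '<bank xmlns="http://www.FirstBank.com"><branch/></bank>'

ns = {"FB": "http://www.FirstBank.com"}

root = ET.fromstring(PREFIXED_XML)
branch = root.find("FB:branch", ns)

default_root = ET.fromstring(DEFAULT_XML)

# ElementTree stores names as {identifier}localname.
print(root.tag)             # bank (no namespace: only FB: names are qualified)
print(branch.tag)           # {http://www.FirstBank.com}branch
print(default_root.tag)     # {http://www.FirstBank.com}bank
print(default_root[0].tag)  # {http://www.FirstBank.com}branch
```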

9
XML Document Schema: Document Type Definition

A DTD (Document Type Definition) is an optional
part of an XML document.
The main purpose of a DTD is
to constrain and type the information present in
the document. However, it only constrains the appearance
of subelements and attributes within an element.
A DTD is a list of rules for what pattern of
subelements may appear within an element.
The operators used are:
+ specifies one or more
| specifies or
* specifies zero or more
? specifies optional elements

Attributes can be declared with several types,
such as:
CDATA: character data.
ID: a unique identifier for the element.
IDREF: a reference to an element, using a
value that appears in the ID attribute of
some element in the document.
IDREFS: a list of such references.
Limitations of DTDs as a schema mechanism:
Individual text elements and attributes cannot be
further typed, which is quite problematic for
data processing and exchange applications.
It is difficult to use a DTD to specify unordered sets of
subelements.
ID and IDREF are untyped, so it is
impossible to specify the type of element to
which an IDREF or IDREFS attribute should refer.
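The ID and IDREFS behavior can be sketched in a few lines of Python. This is not DTD validation (xml.etree.ElementTree does not process DTDs); it only mimics the check by hand. The attribute names account-number, customer-id, and owners, and the two lookup tables, are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

BANK_XML = """
<bank-2>
  <account account-number="A-401" owners="C100 C102"/>
  <customer customer-id="C100"/>
  <customer customer-id="C102"/>
</bank-2>
"""

# Hypothetical stand-ins for DTD declarations: which attribute of each
# element plays the role of ID, and which plays the role of IDREFS.
ID_ATTRS = {"account": "account-number", "customer": "customer-id"}
IDREFS_ATTRS = {"account": "owners"}

root = ET.fromstring(BANK_XML)

# Collect every ID value, remembering what kind of element owns it.
ids = {}
for elem in root.iter():
    id_attr = ID_ATTRS.get(elem.tag)
    if id_attr and elem.get(id_attr) is not None:
        ids[elem.get(id_attr)] = elem.tag

# Each reference in an IDREFS list must match some ID in the document.
dangling = []
for elem in root.iter():
    refs_attr = IDREFS_ATTRS.get(elem.tag)
    if refs_attr:
        for ref in elem.get(refs_attr, "").split():
            if ref not in ids:
                dangling.append(ref)

print(ids)
print(dangling)
```

The check only asks whether a reference matches some ID. Nothing stops owners from pointing at an account instead of a customer, which is the typing limitation described above.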

11
XMLSchema

XMLSchema is a more sophisticated schema language
than DTD.
Benefits compared to DTD:
Allows user-defined types to be created.
Allows the text that appears in elements to be
constrained to specific types.
Allows types to be restricted to create
specialized types, for instance by specifying min
and max values.
Allows complex types to be extended by using a form
of inheritance.
Is a superset of DTDs.
Allows uniqueness and foreign key constraints.
It is integrated with namespaces to allow
different parts of a document to conform to
different schemas.
It is itself specified in XML syntax.
Its disadvantage is that XMLSchema is significantly
more complicated than DTDs.

12
Querying and Transformation

Querying and transformation are essential to
extract information from large bodies of XML
data, and to convert it between different representations
(schemas) in XML.
Several languages provide increasing degrees of
querying and transformation capabilities:
XPath is a language for path expressions, and is
a building block for the other two
languages.
XSLT is the transformation language (part of the XSL
style sheet system, used to control the
formatting of XML data into HTML or other formats). It can
also generate XML as output.
XQuery is the standard for querying XML data.
All of these languages use the tree model of XML
data, where nodes correspond to elements and
attributes.

13
XPath

A path expression in XPath is a sequence of
location steps separated by /. An example would
be /bank-2/customer/name/text()
It resembles a directory path: the
initial / denotes the root of the document, which sits above
the top-level tag. The expression is evaluated from left to right.
When an element name appears before the next /,
it refers to all the elements of the
specified name that are children of elements in
the current element set. Attributes can also be
accessed by using the character @. For example,
/bank-2/account/@account-number
returns the set of all values of account-number
attributes of account elements. IDREF links, however,
are not followed by default.
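The two path expressions above can be approximated with Python's xml.etree.ElementTree, which supports a limited subset of XPath. This is a sketch over made-up data; note that in ElementTree the parsed root is already the bank-2 element, so paths start at ./ rather than /bank-2/.

```python
import xml.etree.ElementTree as ET

# Illustrative data; the values are invented for the example.
BANK_XML = """
<bank-2>
  <account account-number="A-401"><balance>500</balance></account>
  <account account-number="A-402"><balance>300</balance></account>
  <customer><name>Joe</name></customer>
  <customer><name>Mary</name></customer>
</bank-2>
"""

root = ET.fromstring(BANK_XML)  # root is the bank-2 element

# Roughly /bank-2/customer/name/text(): each step selects children of the
# current element set, then we take the text of the final set.
names = [name.text for name in root.findall("./customer/name")]

# Roughly /bank-2/account/@account-number: attribute values of all accounts.
numbers = [acct.get("account-number") for acct in root.findall("./account")]

print(names)    # ['Joe', 'Mary']
print(numbers)  # ['A-401', 'A-402']
```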

XPath supports a number of other features:
Selection predicates may follow any step in a
path and are contained in square brackets. Example:
/bank-2/account[balance > 400]
It provides several functions that can be used as
part of predicates, including testing the position
of the current node in sibling order and counting
the number of matches. Example:
/bank-2/account[customer/count() > 2]
The function id("foo") returns the node (if any) with an
attribute of type ID and value "foo".
The operator | allows expression results to be
unioned. For example, /bank-2/account/id(@owner)
| /bank-2/loan/id(@borrower) returns
customers with either accounts or loans. However,
the | operator cannot be nested inside other
operators.
Multiple levels of nodes can be skipped by using //.
Each step need not select from the children of
the nodes in the current node set.
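Since xml.etree.ElementTree does not implement numeric predicates, id(), or the union operator, the sketch below emulates them in plain Python to show what each feature computes. The data, the customer-id attribute, and the helper follow() are assumptions introduced for illustration.

```python
import xml.etree.ElementTree as ET

BANK_XML = """
<bank-2>
  <account account-number="A-401" owner="C100 C102"><balance>500</balance></account>
  <account account-number="A-402" owner="C102"><balance>300</balance></account>
  <loan loan-number="L-17" borrower="C101"/>
  <customer customer-id="C100"><name>Joe</name></customer>
  <customer customer-id="C101"><name>Ann</name></customer>
  <customer customer-id="C102"><name>Mary</name></customer>
  <customer customer-id="C103"><name>Lee</name></customer>
</bank-2>
"""

root = ET.fromstring(BANK_XML)

# Predicate, as in /bank-2/account[balance > 400]: filter one step's result.
rich = [a.get("account-number") for a in root.findall("./account")
        if float(a.findtext("balance")) > 400]

# id(): look up elements by their ID attribute value.
by_id = {c.get("customer-id"): c for c in root.findall("./customer")}

def follow(elements, attr):
    """Return the set of ID values referenced by attr on the given elements."""
    return {ref for e in elements for ref in e.get(attr, "").split()}

# Union, as in /bank-2/account/id(@owner) | /bank-2/loan/id(@borrower).
holder_ids = (follow(root.findall("./account"), "owner")
              | follow(root.findall("./loan"), "borrower"))
holders = sorted(by_id[i].findtext("name") for i in holder_ids)

# //name: skip any number of levels; iter() walks all descendants.
all_names = [n.text for n in root.iter("name")]

print(rich)       # ['A-401']
print(holders)    # ['Ann', 'Joe', 'Mary']
print(all_names)  # ['Joe', 'Ann', 'Mary', 'Lee']
```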

15
XSLT

The XML Style Language (XSL) was originally designed
for generating HTML from XML. The language,
however, includes a general-purpose transformation
mechanism, called XSL Transformations (XSLT).
XSLT transformations are expressed as a series of
recursive rules, called templates.
Structural recursion is important in XSLT because
the data are tree-structured:
XSLT can apply
template rules recursively on subtrees.
XSLT has a feature called key, which is similar in
purpose to id() but can use attributes other than ID
attributes. Example:
<xsl:key name="acctno" match="account"
use="account-number"/> where name
distinguishes different keys, match specifies which nodes
the key applies to, and use specifies the expression
used as the value of the key.
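The sketch below imitates structural recursion and the key feature in Python rather than running real XSLT (the standard library has no XSLT processor). The rule table, function names, and output format are hypothetical; only the idea of matching a rule and recursing into subtrees comes from the slide.

```python
import xml.etree.ElementTree as ET

BANK_XML = """
<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <account><account-number>A-102</account-number><balance>400</balance></account>
</bank>
"""

def apply_templates(node):
    """Pick the rule matching this node; default to recursing into children."""
    rule = TEMPLATES.get(node.tag, children_rule)
    return rule(node)

def children_rule(node):
    """Default rule: produce nothing for the node itself, recurse on subtrees."""
    out = []
    for child in node:
        out.extend(apply_templates(child))
    return out

def account_rule(node):
    """Rule for account elements: emit one line of output per account."""
    return ["account " + node.findtext("account-number")
            + ": " + node.findtext("balance")]

# Analogous to a set of templates, each with a match pattern.
TEMPLATES = {"account": account_rule}

root = ET.fromstring(BANK_XML)
lines = apply_templates(root)

# Analogous to a key named acctno: index account nodes by the value of
# their account-number subelement, which is not an ID attribute.
acctno_key = {a.findtext("account-number"): a for a in root.iter("account")}

print(lines)
print(acctno_key["A-102"].findtext("balance"))
```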

16
XQuery

Developed by the World Wide Web Consortium (W3C).
Queries are organized into FLWR expressions, comprising for, let,
where, and return clauses.
for gives a series of variables that range over
the results of XPath expressions. When more than
one variable is specified, the result includes the
Cartesian product of the possible values the variables
can take.
let allows complicated expressions to be assigned
to variable names for simplicity of
representation.
where performs additional tests on the joined tuples
from the for section.
return allows the construction of results in XML.
Example: for $x in /bank-2/account
let $acctno := $x/@account-number
where $x/balance > 400
return <account-number> $acctno </account-number>
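The roles of the four clauses can be mirrored line by line in Python. This is only an analogy over invented data, not an XQuery engine; the comments mark which clause each line corresponds to.

```python
import xml.etree.ElementTree as ET

BANK_XML = """
<bank-2>
  <account account-number="A-401"><balance>500</balance></account>
  <account account-number="A-402"><balance>400</balance></account>
  <account account-number="A-403"><balance>900</balance></account>
</bank-2>
"""

root = ET.fromstring(BANK_XML)

results = []
for x in root.findall("./account"):          # for: x ranges over the accounts
    acctno = x.get("account-number")         # let: name a subexpression
    if float(x.findtext("balance")) > 400:   # where: test each binding
        out = ET.Element("account-number")   # return: construct result XML
        out.text = acctno
        results.append(out)

serialized = [ET.tostring(r, encoding="unicode") for r in results]
print(serialized)
```

The account with a balance of exactly 400 is excluded, because the test is a strict comparison.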

17
Application Program Interface

There are two standard APIs: DOM (Document Object
Model) and SAX (Simple API for XML).
DOM treats XML content as a tree and can be used to
access XML data stored in databases. XML
databases can also be built using DOM as their
primary interface for accessing and modifying
data. However, DOM does not support any form of
declarative querying.
SAX is an event model that provides a common
interface between parsers and applications.
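The difference between the two models can be seen with Python's standard xml.dom.minidom and xml.sax modules. In this sketch the class name TagCollector and the sample document are made up for illustration.

```python
import xml.dom.minidom
import xml.sax

ACCOUNT_XML = ("<account><account-number>A-120</account-number>"
               "<balance>400</balance></account>")

# DOM: the whole document becomes an in-memory tree that the program
# navigates and modifies by hand (no declarative query involved).
dom = xml.dom.minidom.parseString(ACCOUNT_XML)
balance_node = dom.getElementsByTagName("balance")[0]
balance_node.firstChild.data = "450"  # modify the text node in place
dom_balance = dom.getElementsByTagName("balance")[0].firstChild.data

# SAX: the parser calls back into the application as it meets each tag;
# no tree is built.
class TagCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

collector = TagCollector()
xml.sax.parseString(ACCOUNT_XML.encode("utf-8"), collector)

print(dom_balance)
print(collector.events)
```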

18
Storage of XML Data

Using a relational database:
If the XML data was generated from a relational
schema, the conversion process is
straightforward. If it was not, there are several
alternative approaches.
Store as string

Store each child element of the top-level
element as a string in a separate tuple in the
database. This is easy to use, but the database
system does not know the schema of the stored
elements. A partial solution to that problem
is to store different types of elements in
different relations, and also to store the values of
some critical elements as attributes of the
relation to enable indexing. The drawback of this
type of storage is that a large part of the XML
information is stored within strings.
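A minimal sketch of the string approach, using Python's built-in sqlite3 module with an in-memory database. The table and column names are assumptions; the account number is pulled out as a separate, indexed column in the way the partial solution describes.

```python
import sqlite3
import xml.etree.ElementTree as ET

BANK_XML = """<bank>
<account><account-number>A-101</account-number><balance>500</balance></account>
<account><account-number>A-102</account-number><balance>400</balance></account>
<customer><name>Joe</name></customer>
</bank>"""

conn = sqlite3.connect(":memory:")
# One relation per element type; account_number is the extracted
# "critical element" that makes indexed lookup possible.
conn.execute("CREATE TABLE account_elements (account_number TEXT, data TEXT)")
conn.execute("CREATE INDEX account_number_idx ON account_elements (account_number)")
conn.execute("CREATE TABLE customer_elements (data TEXT)")

# Each child of the top-level element is stored as one string per tuple.
for child in ET.fromstring(BANK_XML):
    text = ET.tostring(child, encoding="unicode").strip()
    if child.tag == "account":
        conn.execute("INSERT INTO account_elements VALUES (?, ?)",
                     (child.findtext("account-number"), text))
    elif child.tag == "customer":
        conn.execute("INSERT INTO customer_elements VALUES (?)", (text,))

# The database can find the tuple by account number, but anything else
# (here, the balance) must be recovered by re-parsing the stored string.
(stored,) = conn.execute(
    "SELECT data FROM account_elements WHERE account_number = ?",
    ("A-102",)).fetchone()
balance = ET.fromstring(stored).findtext("balance")
print(balance)
```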

Tree representation

Use a tree structure in which each element and attribute
in the XML data is given a unique identifier. The tuple
inserted for each node records its identifier (id),
type (attribute or element), the name of the
element or attribute (label), and the text value of the
element or attribute (value). The advantage is
that all XML information can be represented
directly in relational form, and many XML queries
can be translated into relational queries and
executed inside the database system. The drawback
is that each element gets broken up into
many pieces, so a large number of
joins is required to reassemble elements.
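The tree representation can be sketched with sqlite3 as well. The nodes relation follows the four fields listed above; the child relation that records parent links is an assumption added here so that the tree can be reassembled, and all names are illustrative.

```python
import itertools
import sqlite3
import xml.etree.ElementTree as ET

ACCOUNT_XML = ('<account acct-type="checking">'
               "<account-number>A-120</account-number>"
               "<balance>400</balance></account>")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, type TEXT, "
             "label TEXT, value TEXT)")
# Assumed second relation linking each node to its parent.
conn.execute("CREATE TABLE child (child_id INTEGER, parent_id INTEGER)")

ids = itertools.count(1)  # source of unique identifiers

def store(elem, parent_id=None):
    """Insert one element, its attributes, and its subtree as tuples."""
    node_id = next(ids)
    conn.execute("INSERT INTO nodes VALUES (?, 'element', ?, ?)",
                 (node_id, elem.tag, (elem.text or "").strip()))
    if parent_id is not None:
        conn.execute("INSERT INTO child VALUES (?, ?)", (node_id, parent_id))
    for name, value in elem.attrib.items():
        attr_id = next(ids)
        conn.execute("INSERT INTO nodes VALUES (?, 'attribute', ?, ?)",
                     (attr_id, name, value))
        conn.execute("INSERT INTO child VALUES (?, ?)", (attr_id, node_id))
    for sub in elem:
        store(sub, node_id)
    return node_id

store(ET.fromstring(ACCOUNT_XML))

# Reassembling even one level of the element already takes two joins.
rows = conn.execute("""
    SELECT c.label, c.value
    FROM nodes p
    JOIN child ON child.parent_id = p.id
    JOIN nodes c ON c.id = child.child_id
    WHERE p.label = 'account' AND c.type = 'element'
    ORDER BY c.id
""").fetchall()
print(rows)
```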
Map to relations

XML elements whose schema is known are mapped to
relations and attributes. Elements whose schema is unknown
are stored as strings or in the tree representation.
There are also nonrelational data stores:
Store in flat files

This approach lacks data isolation, integrity checks,
atomicity, concurrent access, and security.
Store in an XML database