MSc IT UFIE8K10M Data Management Prakash Chatterjee Room 3P16 prakash.chatterjee@uwe.ac.uk http://www.c

Slides: 18
Provided by: pc208
1
MSc IT UFIE8K-10-M Data Management
Prakash Chatterjee
Room 3P16
prakash.chatterjee@uwe.ac.uk
http://www.cems.uwe.ac.uk/pchatter/courses/msc/dm
  • Lecture 10: XML and XML Databases

2
Definition
"Extensible Markup Language, abbreviated XML, describes a class of data objects called XML documents and partially describes the behavior of computer programs which process them. XML is an application profile or restricted form of SGML, the Standard Generalized Markup Language [ISO 8879]. By construction, XML documents are conforming SGML documents."
(Extensible Markup Language (XML) 1.0 (Third Edition), W3C Recommendation 04 February 2004)
3
So what is it really?
A simple, open (non-proprietary) document syntax (markup) standard for text documents, used for electronic data exchange and storage. It is flexible and eXtensible (XML) because it allows users to create their own vocabularies (new markup languages): there is no fixed set of tags as in HTML or XHTML. XML documents contain only data delimited by tags, with no formatting instructions or style.
4
A little history
XML was developed by an XML Working Group formed under the auspices of the World Wide Web Consortium (W3C) in 1996. It is a subset of SGML (Standard Generalized Markup Language) and was originally designed to meet the challenges of large-scale electronic publishing. XML has since been adopted in fields as diverse as law, healthcare, insurance, multimedia, web publishing, EDI, telecommunications, aeronautics, engineering, software, hospitality, tourism, retail, stock trading and many more.
5
Design goals
The original design goals for XML were that:
  • it should be straightforwardly usable over the Internet;
  • it should support a wide variety of applications;
  • it should be compatible with SGML;
  • it should be easy to write programs which process XML documents;
  • the number of optional features in XML should be kept to the absolute minimum, ideally zero;
  • XML documents should be human-legible and reasonably clear;
  • the XML design should be prepared quickly;
  • the design of XML should be formal and concise;
  • XML documents should be easy to create;
  • terseness in XML markup should be of minimal importance.
6
Example XML document
    <?xml version="1.0" encoding="UTF-8"?>
    <patient nhs-no="7503557856">
      <name>
        <first>Joseph</first>
        <middle>Michael</middle>
        <last>Bloggs</last>
        <previous />
        <preferred>Joe</preferred>
      </name>
      <title>Mr</title>
      <address>
        <street>2 Gloucester Road</street>
        <street />
        <street />
        <city>Bristol</city>
        <county>Avon</county>
        <postcode>BS2 4QS</postcode>
      </address>
      <tel>
        <home>0117 9541054</home>
        <mobile>07710 234674</mobile>
      </tel>
      <email>joe.bloggs@email.com</email>
      <fax />
    </patient>
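As a minimal sketch, assuming Python 3 and its standard-library xml.etree.ElementTree module (neither is used in the slides), the example document can be parsed and its attribute and element values read back by path:

```python
import xml.etree.ElementTree as ET

# The patient document from the slide (with a matching </street> end-tag).
PATIENT_XML = """<?xml version="1.0" encoding="UTF-8"?>
<patient nhs-no="7503557856">
  <name>
    <first>Joseph</first>
    <middle>Michael</middle>
    <last>Bloggs</last>
    <previous />
    <preferred>Joe</preferred>
  </name>
  <title>Mr</title>
  <address>
    <street>2 Gloucester Road</street>
    <street />
    <street />
    <city>Bristol</city>
    <county>Avon</county>
    <postcode>BS2 4QS</postcode>
  </address>
  <tel>
    <home>0117 9541054</home>
    <mobile>07710 234674</mobile>
  </tel>
  <email>joe.bloggs@email.com</email>
  <fax />
</patient>"""

# Parse from bytes so the UTF-8 encoding declaration is honoured.
root = ET.fromstring(PATIENT_XML.encode("utf-8"))

nhs_no = root.get("nhs-no")            # attribute on the root element
first = root.findtext("name/first")    # text of a nested element
city = root.findtext("address/city")
# Empty elements such as <street /> have no text, so .text is None.
streets = [street.text for street in root.findall("address/street")]

print(root.tag, nhs_no, first, city)
print(streets)
```

The paths ("name/first", "address/city") simply follow the nesting of the document, which is what makes the hierarchical structure useful to programs.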
7
Other formats
Pipe-delimited:
nhs-no|first|middle|last|previous|preferred|...|email|fax
7503557856|Joseph|Michael|Bloggs||Joe|...|joe.bloggs@email.com|
Relational table:
Patient
nhs-no      | first  | middle
7503557856  | Joseph | Michael
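To make the contrast concrete, here is a hedged Python sketch (standard library only) that flattens a trimmed copy of the patient document into one pipe-delimited record. The COLUMNS list is a hypothetical mapping chosen for illustration, since the slide elides some columns with "...":

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the patient document (address and tel omitted).
PATIENT_XML = """<patient nhs-no="7503557856">
  <name>
    <first>Joseph</first>
    <middle>Michael</middle>
    <last>Bloggs</last>
    <previous />
    <preferred>Joe</preferred>
  </name>
  <email>joe.bloggs@email.com</email>
  <fax />
</patient>"""

# Hypothetical mapping from flat column name to element path
# (None means "take the attribute of the same name from <patient>").
COLUMNS = [
    ("nhs-no", None),
    ("first", "name/first"),
    ("middle", "name/middle"),
    ("last", "name/last"),
    ("previous", "name/previous"),
    ("preferred", "name/preferred"),
    ("email", "email"),
    ("fax", "fax"),
]


def to_pipe_delimited(xml_text):
    """Return a header line and one data line, fields joined by '|'."""
    root = ET.fromstring(xml_text)
    values = []
    for column, path in COLUMNS:
        if path is None:
            values.append(root.get(column, ""))
        else:
            # findtext returns "" for an empty element such as <fax />.
            values.append(root.findtext(path, default=""))
    header = "|".join(name for name, _ in COLUMNS)
    return header, "|".join(values)


header, row = to_pipe_delimited(PATIENT_XML)
print(header)
print(row)
```

Note that the flat record no longer says which fields belonged to <name>; that grouping exists only in the XML version.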
8
Tree view of example XML document
(all XML documents are hierarchical in structure)
[Figure: tree diagram of the patient document; the key distinguishes element, attribute and content nodes]
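Since the tree diagram itself is not reproduced here, a small Python sketch (standard-library ElementTree, run on a cut-down version of the document) can print the same hierarchy, labelling each node with one of the key's three kinds: element, attribute or content.

```python
import xml.etree.ElementTree as ET

# A cut-down version of the patient document.
SAMPLE = """<patient nhs-no="7503557856">
  <name><first>Joseph</first><last>Bloggs</last></name>
  <title>Mr</title>
</patient>"""


def tree_lines(elem, depth=0):
    """Yield one indented line per node: the element, its attributes,
    its own text content, then its child elements (recursively).
    Tail text (text after a child's end-tag) is ignored in this sketch."""
    indent = "  " * depth
    yield f"{indent}element: {elem.tag}"
    for name, value in elem.attrib.items():
        yield f"{indent}  attribute: {name} = {value}"
    text = (elem.text or "").strip()
    if text:
        yield f"{indent}  content: {text}"
    for child in elem:
        yield from tree_lines(child, depth + 1)


lines = list(tree_lines(ET.fromstring(SAMPLE)))
print("\n".join(lines))
```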
9
Well-formed XML documents (1)
Every XML document must be well-formed and must therefore adhere to the following rules (among others):
  • Every start-tag must have a matching end-tag.
  • Elements may nest but must not overlap.
    <name>Anna<em>Coffey</em></name> (correct)
    <name><em>Anna</name>Coffey</em> (incorrect: the elements overlap)
  • There must be exactly one root element.
  • Attribute values must be quoted.
  • An element must not have two attributes with the same name.
  • Comments and processing instructions may not appear inside tags.
  • No unescaped < or & signs may occur in the character data of an element.
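A conforming parser must reject any document that breaks these rules. The following Python sketch assumes the standard-library ElementTree parser (built on expat) and simply reports whether each small fragment, written here for illustration, is accepted:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Small illustrative fragments, roughly one per rule.
examples = {
    "proper nesting": "<name>Anna<em>Coffey</em></name>",
    "overlapping elements": "<name><em>Anna</name>Coffey</em>",
    "missing end-tag": "<name>Anna Coffey",
    "two root elements": "<name>Anna</name><name>Coffey</name>",
    "unquoted attribute value": "<patient nhs-no=7503557856/>",
    "duplicate attribute": '<patient id="1" id="2"/>',
    "comment inside a tag": "<name <!-- note --> >Anna</name>",
    "unescaped ampersand": "<name>Smith & Sons</name>",
    "escaped ampersand": "<name>Smith &amp; Sons</name>",
}

for label, text in examples.items():
    print(f"{label}: {is_well_formed(text)}")
```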

Note: an XML document may be well-formed but not valid. A valid document requires a declaration that identifies a Document Type Definition (DTD) or Schema that the document conforms to. This ensures that the document meets various grammar rules for each of its elements and attributes: their order and the values that are allowed. A validating parser can check the document to ensure these rules are met. We will look at XML Schemas in some detail in the next lecture.
10
Well-formed XML documents (2)
Element names are case sensitive: <NAME>, <name>, <Name> and <NaMe> are four different element types.
No white space is allowed in element names: <First Name> is not allowed, <First_Name> is OK.
Element names cannot start with the letters XML or xml (in any combination of case), which are reserved.
Element names must start with a letter or an underscore. They cannot start with a number, but numbers may be embedded within an element name: <2you> is not allowed, <me2you> is OK.
Attribute names are constrained by the same rules as element names.
Entity references are used to substitute specific characters. There are five predefined entities built into XML:
Entity    Char   Notes
&amp;     &      Do not use inside processing instructions.
&lt;      <      Use for < in normal text and inside attribute values.
&gt;      >      Use after ]] in normal text.
&quot;    "      Use inside attribute values quoted with ".
&apos;    '      Use inside attribute values quoted with '.
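A short Python sketch, again assuming only the standard library, checks a few of the naming rules against the parser and uses xml.sax.saxutils.escape and quoteattr to produce the predefined entity references. The sample strings are invented for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def parses(text):
    """Return True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Element-name rules from the slide, tested against the parser.
name_results = {
    "<name/>": parses("<name/>"),
    "<NAME></name>": parses("<NAME></name>"),    # different case: mismatched pair
    "<First Name/>": parses("<First Name/>"),    # a space ends the name
    "<First_Name/>": parses("<First_Name/>"),
    "<_id/>": parses("<_id/>"),
    "<2you/>": parses("<2you/>"),                # cannot start with a digit
    "<me2you/>": parses("<me2you/>"),
}

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
text = escape("Smith & Sons <Ltd>")
# quoteattr() wraps the value in quotes and uses &quot; when the value
# contains both kinds of quote character.
attr = quoteattr('Joe\'s "nickname"')

doc = f"<firm note={attr}>{text}</firm>"
root = ET.fromstring(doc)  # the parser turns references back into characters

# &apos; inside a single-quoted attribute value.
apos_value = ET.fromstring("<p a='O&apos;Brien'/>").get("a")

print(name_results)
print(doc)
print(root.text, "|", root.get("note"), "|", apos_value)
```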
11
XML Namespaces
  • Namespaces serve two functions in the XML specification:
      • To distinguish between elements and attributes from two different vocabularies with different meanings that might share the same name, and hence avoid naming collisions.
      • To group all the related elements and attributes from a single XML application together so that software can easily recognise them.

Consider the following fragments from two different documents: <name>Bernadette Coffey</name> and <name>Hegel in a Nutshell</name>. The first <name> element refers to the name of a person and the second to the name of a book. If we were to build a merged document (say, Bernadette's reading list) we would have a collision, since there would be two <name> elements with different meanings. Namespaces can distinguish between the two by using prefixes: <student:name>Bernadette Coffey</student:name> and <book:name>Hegel in a Nutshell</book:name>. Each prefix is bound to a uniform resource identifier (URI) that uniquely identifies the namespace, e.g. xmlns:student="http://www.uwe.ac.uk/CEMS/Students" and xmlns:book="http://www.uwe.ac.uk/Library/Books".

But don't confuse URIs with URLs. URLs are a subset of URIs that locate resources based on a network filename concept: a URL is a path to a file or resource on the Web. A URI used as a namespace is simply a unique name; nothing needs to exist at that address.
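The following Python sketch (standard-library ElementTree; the <readinglist> wrapper element is hypothetical, the two URIs are the ones above) shows that a namespace-aware parser identifies each element by its namespace URI plus its local name, so the two <name> elements no longer collide. The prefix used in a query need not match the one in the document; only the URI matters.

```python
import xml.etree.ElementTree as ET

# Hypothetical merged document; prefixes are bound with xmlns:prefix="URI".
READING_LIST = """<readinglist
    xmlns:student="http://www.uwe.ac.uk/CEMS/Students"
    xmlns:book="http://www.uwe.ac.uk/Library/Books">
  <student:name>Bernadette Coffey</student:name>
  <book:name>Hegel in a Nutshell</book:name>
</readinglist>"""

STUDENTS = "http://www.uwe.ac.uk/CEMS/Students"
BOOKS = "http://www.uwe.ac.uk/Library/Books"

root = ET.fromstring(READING_LIST)

# ElementTree stores each qualified name as "{namespace-URI}local-name".
tags = [child.tag for child in root]

# The query binds its own prefixes ("s", "b") to the same URIs, so the
# lookups succeed even though the document used "student" and "book".
ns = {"s": STUDENTS, "b": BOOKS}
person = root.findtext("s:name", namespaces=ns)
book = root.findtext("b:name", namespaces=ns)
# An unqualified "name" matches neither element.
unqualified = root.find("name")

print(tags)
print(person, "|", book, "|", unqualified)
```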
12
XML Applications (1)
XSLT: Extensible Stylesheet Language Transformations is an XML application for specifying rules which transform one XML document into another document. It uses template rules in the stylesheet to match patterns in the input document, and when a match is found it writes the template from the rule to the output tree.
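Running real XSLT needs an XSLT processor, which Python's standard library does not include, so the sketch below only imitates the idea in plain Python: a table of hypothetical "template rules" keyed by element name, applied recursively to the input tree to build a new output tree. It illustrates template matching; it is not XSLT, and unlike XSLT's built-in rules it silently drops any text that no rule handles.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document: a reading list of books.
SOURCE = """<readinglist>
  <book><title>Hegel in a Nutshell</title></book>
  <book><title>Data Management Notes</title></book>
</readinglist>"""


def apply_templates(elem):
    """Process each child of elem with the rule whose name matches its tag."""
    output = []
    for child in elem:
        rule = RULES.get(child.tag)
        if rule is not None:
            output.extend(rule(child))
        else:
            # No matching rule: emit nothing for this element itself and
            # carry on with its children (its text is dropped).
            output.extend(apply_templates(child))
    return output


def readinglist_rule(elem):
    """Template for <readinglist>: wrap the processed children in a <ul>."""
    ul = ET.Element("ul")
    ul.extend(apply_templates(elem))
    return [ul]


def book_rule(elem):
    """Template for <book>: output an <li> holding the book's title."""
    li = ET.Element("li")
    li.text = elem.findtext("title", default="")
    return [li]


# The "stylesheet": template rules keyed by the element name they match.
RULES = {"readinglist": readinglist_rule, "book": book_rule}


def transform(source_text):
    """Apply the rule matching the root element and serialise the result."""
    root = ET.fromstring(source_text)
    (result,) = RULES[root.tag](root)
    return ET.tostring(result, encoding="unicode")


print(transform(SOURCE))
```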
13
XML Applications (2)
XLink: the XML Linking Language. It defines how one document links to another. It is divided into two parts: XLink and XPointer (which identifies a particular part of a document, rather like anchors in HTML).
XPath: a non-XML language for identifying particular parts of an XML document. It is designed to be used in conjunction with Extensible Stylesheet Language Transformations (XSLT) and XPointer.
XForms: the W3C's name for a specification of Web forms that can be used with a wide variety of platforms, including desktop computers, handhelds, information appliances and even paper.
XQuery: a query language for extracting data from real or virtual XML documents, providing the needed interaction between the Web and databases.
SVG: Scalable Vector Graphics, an XML application which describes two-dimensional vector graphics for distribution and display over the Web, as an alternative to raster formats such as JPEG, GIF and PNG.
Other applications (and the list is growing rapidly) include XML Signature, XML Encryption, Web Services (SOAP, WSDL, UDDI), XML Key Management, Synchronized Multimedia Integration Language (SMIL), etc.
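Python's standard-library ElementTree implements only a limited subset of XPath syntax (full XPath and XQuery need a dedicated processor), but that subset is enough for a hedged sketch of path-based selection. The second patient record below is invented for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<patients>
  <patient nhs-no="7503557856">
    <name><first>Joseph</first><last>Bloggs</last></name>
    <address><city>Bristol</city></address>
  </patient>
  <patient nhs-no="1234567890">
    <name><first>Anna</first><last>Coffey</last></name>
    <address><city>Bath</city></address>
  </patient>
</patients>"""

root = ET.fromstring(DOC)

# Child steps, like the XPath location path patients/patient/name/last.
surnames = [e.text for e in root.findall("./patient/name/last")]

# ".//" selects matching descendants at any depth.
cities = [e.text for e in root.findall(".//city")]

# A predicate on an attribute value, like patient[@nhs-no='1234567890'].
first = root.findtext("./patient[@nhs-no='1234567890']/name/first")

# A positional predicate: the first <patient> child (positions start at 1).
first_patient = root.find("./patient[1]").get("nhs-no")

print(surnames, cities, first, first_patient)
```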
14
XML Vocabularies
XHTML: the Extensible HyperText Markup Language, which reproduces and extends HTML. An XHTML document conforms to all the rules required of a well-formed XML document and drops many of the weak features of HTML, e.g. the <font> tag.
WML: the Wireless Markup Language is a strict HTML-type vocabulary for use with wireless-enabled devices such as mobile phones, PDAs and pagers.
InkML: for representing digital ink data that is input with a pen.
MathML: for the inclusion of mathematical formulas in web pages and machine-to-machine communications.
CML: the Chemical Markup Language is an XML vocabulary for representing molecular and chemical information. A formula can be transformed into a graphic representation for display on a web page.
Other standardized vocabularies include the Banking Industry Technology Secretariat (BITS), Financial Exchange (IFX), Bank Internet Payment System (BIPS), Telecommunications Interchange Markup (TIM), Common Business Library (xCBL), Electronic Business XML Initiative (ebXML), Product Data Markup Language (PDML), Financial Information eXchange protocol (FIX), the Text Encoding Initiative (TEI) and hundreds of others.
15
Relational v. xml approach to data
16
Approaches to structuring xml (1)
  • Storing XML in VARCHAR or BLOBs
      • offers XPath/XQuery but not much else
  • Storing XML in shredded form
      • the XML document is decomposed according to specified rules into one or more relational tables and reconstructed on retrieval
      • Pros, and when to use it:
          • the XML schema is stable
          • XML is only used as a transfer format and document structure is not relevant
          • incoming XML data must be integrated with existing relational data
          • the structure of the XML documents is simple enough to allow easy mapping
          • query performance is more important than insert performance
      • Cons, and when to avoid it:
          • document structure is too complex to be mapped into tables
          • insert performance is important
          • document structure needs to be preserved
          • full retrieval of documents is frequent
          • the XML schema changes frequently or does not exist
          • data in the XML documents is sparse
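As a minimal sketch of shredding, assuming Python's standard-library sqlite3 and ElementTree modules and a hypothetical two-table mapping of our own (a patient table, plus a patient_street table because <street> can repeat), a document can be decomposed into rows, queried with ordinary SQL and rebuilt on retrieval:

```python
import sqlite3
import xml.etree.ElementTree as ET

# A trimmed copy of the patient document.
PATIENT_XML = """<patient nhs-no="7503557856">
  <name><first>Joseph</first><middle>Michael</middle><last>Bloggs</last></name>
  <address>
    <street>2 Gloucester Road</street>
    <city>Bristol</city>
    <postcode>BS2 4QS</postcode>
  </address>
</patient>"""

# Hypothetical mapping: one row per <patient>; each <street> becomes a row
# in its own table, with its position kept so the order can be restored.
SCHEMA = """
CREATE TABLE patient (nhs_no TEXT PRIMARY KEY, first_name TEXT,
                      middle_name TEXT, last_name TEXT, city TEXT, postcode TEXT);
CREATE TABLE patient_street (nhs_no TEXT, street_order INTEGER, street TEXT);
"""


def shred(conn, xml_text):
    """Decompose one patient document into rows of the two tables."""
    root = ET.fromstring(xml_text)
    nhs_no = root.get("nhs-no")
    conn.execute(
        "INSERT INTO patient VALUES (?, ?, ?, ?, ?, ?)",
        (nhs_no, root.findtext("name/first"), root.findtext("name/middle"),
         root.findtext("name/last"), root.findtext("address/city"),
         root.findtext("address/postcode")),
    )
    for order, street in enumerate(root.findall("address/street"), start=1):
        conn.execute("INSERT INTO patient_street VALUES (?, ?, ?)",
                     (nhs_no, order, street.text))


def reconstruct(conn, nhs_no):
    """Rebuild the patient document from its rows."""
    first, middle, last, city, postcode = conn.execute(
        "SELECT first_name, middle_name, last_name, city, postcode "
        "FROM patient WHERE nhs_no = ?", (nhs_no,)).fetchone()
    patient = ET.Element("patient", {"nhs-no": nhs_no})
    name = ET.SubElement(patient, "name")
    for tag, value in (("first", first), ("middle", middle), ("last", last)):
        ET.SubElement(name, tag).text = value
    address = ET.SubElement(patient, "address")
    streets = conn.execute(
        "SELECT street FROM patient_street WHERE nhs_no = ? ORDER BY street_order",
        (nhs_no,))
    for (street,) in streets:
        ET.SubElement(address, "street").text = street
    ET.SubElement(address, "city").text = city
    ET.SubElement(address, "postcode").text = postcode
    return ET.tostring(patient, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
shred(conn, PATIENT_XML)

# Once shredded, the data can be queried with ordinary SQL.
surname = conn.execute(
    "SELECT last_name FROM patient WHERE city = 'Bristol'").fetchone()[0]
rebuilt = reconstruct(conn, "7503557856")
print(surname)
print(rebuilt)
```

The rebuilt document loses the original whitespace, and any element the mapping does not cover would be lost entirely, which is why shredding suits stable, simple schemas where exact document structure does not matter.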

17
Approaches to structuring xml (2)
  • Native XML database
      • A native XML database is a system which processes and stores XML data using the XML data model. A true native XML database system uses trees of nodes as the fundamental storage and processing model.
  • Pros:
      • no mapping between data models
      • document structure and order are preserved
      • XQuery/XPath can be processed without translation to SQL
      • no parsing is required at query or update time
      • documents with or without schemas can be stored in the native store without the need to adjust for complex mappings
      • sub-document update is fast
  • When not to use it:
      • the document collection isn't order-centred
      • the XQueries that applications must run could easily be expressed in SQL
      • there is no need to construct XML documents that are different from the ones that were inserted into the database
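By way of contrast, here is a toy, in-memory Python sketch of the native idea: documents stay as trees of nodes (standard-library ElementTree elements), queries run directly on those trees, and an update changes one node in place. The ToyNativeStore class and the replacement phone number are hypothetical; a real native XML database adds persistence, indexing, transactions and full XQuery support.

```python
import xml.etree.ElementTree as ET


class ToyNativeStore:
    """Hypothetical in-memory stand-in for a native XML store: each document
    is kept as a tree of nodes (ElementTree elements), not as table rows."""

    def __init__(self):
        self._docs = {}  # document id -> root node of that document's tree

    def insert(self, doc_id, xml_text):
        # The text is parsed once, on insert; later operations use the tree.
        self._docs[doc_id] = ET.fromstring(xml_text)

    def query(self, path):
        """Evaluate an ElementTree path (a limited XPath subset) directly
        against every stored tree; nothing is translated to SQL."""
        return [(doc_id, node)
                for doc_id, root in self._docs.items()
                for node in root.iterfind(path)]

    def update_text(self, doc_id, path, new_text):
        """Sub-document update: change one node in place, leaving the rest
        of the tree untouched."""
        node = self._docs[doc_id].find(path)
        if node is None:
            raise KeyError(path)
        node.text = new_text

    def retrieve(self, doc_id):
        # Serialising the tree emits the nodes in their original document order.
        return ET.tostring(self._docs[doc_id], encoding="unicode")


store = ToyNativeStore()
store.insert("p1", '<patient nhs-no="7503557856"><name><first>Joseph</first>'
                   '<last>Bloggs</last></name><tel><mobile>07710 234674</mobile>'
                   '</tel></patient>')
store.update_text("p1", "tel/mobile", "07700 900123")  # hypothetical new number
hits = [node.text for _, node in store.query("name/last")]
print(hits)
print(store.retrieve("p1"))
```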