XML and Databases - PowerPoint PPT Presentation

Slides: 42
Provided by: amelie3
1
XML and Databases
  • 198541
  • Spring 2007

2
XML Motivation
3
XML Motivation
  • Huge amounts of unstructured data on the web:
    HTML documents
  • No structure information
  • Only format instructions (presentation)
  • Integration of data from different sources
  • Structural differences
  • Closely related to semistructured data

4
Semistructured Data
  • Integration of heterogeneous sources
  • Data sources with non-rigid structures
  • Biological data
  • Web data
  • Need for more structural information than plain
    text, but fewer constraints on structure than in
    relational data

5
Characteristics of Semistructured Data
  • Missing or additional tuples
  • Multiple attributes
  • Different types in different objects
  • Heterogeneous collection
  • Self-describing, irregular data with no a priori
    structure

6
HTML Document Example
  • <h1> Bibliography </h1>
  • <p> <i> Foundations of Databases </i>
  • Abiteboul, Hull, Vianu
  • <br> Addison Wesley, 1995
  • <p> <i> Data on the Web </i>
  • Abiteboul, Buneman, Suciu
  • <br> Morgan Kaufmann, 1999

[Figure annotations: type of information carried by each part of a book entry: title, authors, year]
7
The Idea Behind XML
  • Easily support information exchange between
    applications / computers
  • Reuse what worked in HTML
  • Human readable
  • Standard
  • Easy to generate and read
  • But allow arbitrary markup
  • Uniform language for semistructured data
  • Data Management

8
XML
9
XML
  • eXtensible Markup Language
  • Universal standard for documents and data
  • Defined by W3C
  • Set of emerging technologies
  • XLink, XPointer, XSchema, DOM, SAX, XPath,
    XQuery, ...

10
XML
  • XML provides a syntax, not semantics
  • XML defines the structure of a document, not how
    it is processed
  • Separate structural information from format
    instructions

11
XML Example
  • <bibliography>
  • <book> <title> Foundations </title>
  • <author> Abiteboul </author>
  • <author> Hull </author>
  • <author> Vianu </author>
  • <publisher> Addison Wesley
    </publisher>
  • <year> 1995 </year>
  • </book>
  • </bibliography>
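
As a minimal sketch of what an application can do with the document above, the following Python snippet parses it with the standard library's xml.etree.ElementTree and extracts the title, authors, and year. The function name summarize_books and the constant BIB_XML are illustrative names chosen here, not part of the slides.

```python
import xml.etree.ElementTree as ET

# The bibliography document from the slide, as a string.
BIB_XML = """
<bibliography>
  <book>
    <title> Foundations </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
</bibliography>
"""


def summarize_books(xml_text):
    """Return one dict per <book> element with its title, authors, and year."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):
        books.append({
            "title": book.findtext("title").strip(),
            "authors": [a.text.strip() for a in book.findall("author")],
            # XML itself carries only text; converting to int is our choice.
            "year": int(book.findtext("year")),
        })
    return books


if __name__ == "__main__":
    print(summarize_books(BIB_XML))
```

The tags tell the program which piece of text is a title and which is an author, which the HTML version of the same bibliography could not do.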

12
XML Terminology
  • Tags: book, title, author, ...
  • Start tag: <book>
  • End tag: </book>
  • Elements are nested
  • Empty element:
  • <reviews></reviews> can be abbreviated as <reviews/>
  • XML document: single root element
  • Well-formed XML document: matching tags

13
XML Attributes
  • Attributes are <name, value> pairs that
    characterize an element.
  • <book price="55" currency="USD">
  • <title> Foundations of Databases </title>
  • <author> Abiteboul </author>
  • <year> 1995 </year>
  • </book>
  • Attributes can define an oid (object identifier), but it is just syntax
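
The point that attributes are "just syntax" can be illustrated with a short Python sketch, assuming the standard library's xml.etree.ElementTree as the parser. The names BOOK_XML and read_price are hypothetical. The parser hands back every attribute value as a plain string; interpreting "55" as a number is left to the application.

```python
import xml.etree.ElementTree as ET

# A cut-down version of the slide's <book> element.
BOOK_XML = (
    '<book price="55" currency="USD">'
    "<title>Foundations of Databases</title>"
    "</book>"
)


def read_price(xml_text):
    """Return (amount, currency) read from the attributes of a <book> element."""
    book = ET.fromstring(xml_text)
    # Attribute values are always strings; the conversion to float is ours.
    return float(book.get("price")), book.get("currency")


if __name__ == "__main__":
    print(ET.fromstring(BOOK_XML).attrib)
    print(read_price(BOOK_XML))
```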

14
More XML
  • Text can be CDATA or PCDATA
  • Entity references: &amp;, &gt;, ...
  • Processing instructions: <?blink?>
  • Comments: <!-- comment text -->

15
Well Formed XML Documents
  • Elements must be properly nested
  • <book><title> Foundations of Databases
    </title></book>
  • But not:
  • <book><title> Foundations of Databases
    </book></title>
  • There must be a unique root element
  • Elements can be of
  • element content
  • or mixed content:
  • <title>This is <b>Mixed</b>Content</title>
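
The rules above can be checked mechanically. The sketch below, which assumes Python's standard xml.etree.ElementTree parser, treats a document as well formed exactly when the parser accepts it; is_well_formed is an illustrative helper name. It also shows how this particular API exposes mixed content through the text and tail fields.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text as a well-formed document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # Properly nested elements are accepted.
    print(is_well_formed("<book><title>Foundations of Databases</title></book>"))
    # Crossed end tags are rejected.
    print(is_well_formed("<book><title>Foundations of Databases</book></title>"))
    # Two top-level elements violate the unique-root rule.
    print(is_well_formed("<book/><book/>"))

    # Mixed content: text before <b> is title.text, text after </b> is b.tail.
    title = ET.fromstring("<title>This is <b>Mixed</b>Content</title>")
    print(repr(title.text), repr(title[0].text), repr(title[0].tail))
```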

16
XML Potential
  • Flexible enough to represent anything
  • Stock market, DNA, Music, Chemicals
  • Weather information
  • Wireless network configuration
  • Enables easy information exchange
  • Between companies
  • Within companies
  • Standard: everybody uses the same technology

17
XML Limitations
  • XML is only a syntax for documents
  • We need tools!
  • Editors and parsers
  • Programming APIs (for Java, C++, etc.)
  • Languages to manipulate XML (how many books?)
  • Schemas (What is a book like?)
  • Storage (What if you have a lot of XML?)
  • Transfer protocols (How do you exchange it?)
  • What about XML in Chinese?
  • How can XML fit into my phone?
  • Query processing?

18
XML Schema Language
19
DTDs: Document Type Definitions
  • Similar to a schema
  • Grammar describing constraints on document
    structure and content
  • XML Documents can be validated against a DTD

<!ELEMENT Book (title, author)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (name, address, age?)>
<!ATTLIST Book id ID #REQUIRED>
<!ATTLIST Book pub IDREF #IMPLIED>
20
Shortcomings of DTDs
  • Useful for documents, but not so good for data
  • No support for structural re-use
  • Object-oriented-like structures aren't supported
  • No support for data types
  • Can't do data validation
  • Can have a single key item (ID), but
  • No support for multi-attribute keys
  • No support for foreign keys (references to other
    keys)
  • No constraints on IDREFs (reference only a
    Section)

21
XSchema
  • In XML format
  • Includes primitive data types (integers, strings,
    dates, ...)
  • Supports value-based constraints (integers > 100)
  • Inheritance
  • Foreign keys

22
Example of XSchema
  • <schema version="1.0" xmlns="http://www.w3.org/1999/XMLSchema">
  • <element name="author" type="string" />
  • <element name="date" type="date" />
  • <element name="abstract">
  • <type>
  • </type>
  • </element>
  • <element name="paper">
  • <type>
  • <attribute name="keywords" type="string" />
  • <element ref="author" minOccurs="0"
    maxOccurs="*" />
  • <element ref="date" />
  • <element ref="abstract" minOccurs="0"
    maxOccurs="1" />
  • <element ref="body" />
  • </type>
  • </element>
  • </schema>

23
XML Storage
24
Storing XML Data
  • Different approaches
  • Storing as text
  • Using RDBMS
  • Using a native system
  • Tailored for XML (NATIX, Tamino, Ipedo, etc.)
  • Performance of the various approaches
  • depends on your application

25
Storing XML as Text
  • Simple
  • Easy to compress
  • No updates
  • Need to parse the document every time it is needed

26
Storing XML in RDBMS
  • Uses existing RDBMS techniques
  • Costly in space, takes time to reconstruct
    original document
  • Example techniques
  • Schema with 2 relations: tag and value
  • Schema with n relations: 1 per element name

27
Accessing and Querying XML Data
28
XML as a Tree: DOM
  • DOM: Document Object Model
  • Class hierarchy serving as an API to XML trees
  • Methods of those classes can be used to
    manipulate XML (e.g., Node.child, Node.name)
  • Can be used from Java, C++ to develop XML
    applications.
  • Each node has an identity (i.e., a unique
    identifier) in the whole document

29
XML as a DOM Tree
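  • Class hierarchy (node, element, attribute)

[Figure: DOM tree of the bibliography document. The root bibliography has two book children; one book is expanded into the children title (Foundations of Databases), author (Abiteboul), author (Hull), author (Vianu), publisher (Addison Wesley), and year (1995).]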
30
XML as a Stream: SAX
  • XML document as an event stream, e.g.:
  • Opening tag: book
  • Opening tag: title
  • Text: Foundations of databases
  • Closing tag: title
  • Opening tag: author
  • Etc.
  • SAX allows you to associate actions with those
    events to build applications
  • Very efficient since it corresponds to events
    during parsing, but not always sufficient.
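
The event stream above can be reproduced with Python's standard xml.sax module. In this sketch the handler class EventLogger and the function sax_events are illustrative names; startElement, endElement, and characters are the callbacks defined by xml.sax.ContentHandler.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Record each parsing event instead of building a tree."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("open", name))

    def endElement(self, name):
        self.events.append(("close", name))

    def characters(self, content):
        # A parser may deliver long text in several chunks; this sketch
        # simply records every non-blank chunk it receives.
        if content.strip():
            self.events.append(("text", content.strip()))


def sax_events(xml_text):
    """Parse xml_text and return the list of events seen by the handler."""
    handler = EventLogger()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


if __name__ == "__main__":
    doc = "<book><title>Foundations of databases</title><author>Abiteboul</author></book>"
    for event in sax_events(doc):
        print(event)
```

Nothing is kept in memory unless the handler stores it, which is why the approach is efficient but awkward for queries that need to look back at earlier parts of the document.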

31
XPath
  • Language for navigating in an XML document (seen
    as a tree)
  • One root node
  • Types of nodes: root, element, text, attribute,
    comment, ...
  • XPath expression defines navigation in the tree
    following axes: child, descendant, parent,
    ancestor, ...

32
XPath Examples
  • Find all the titles of all the books
  • //book/title
  • Find the title of all books written by Charles
    Dickens
  • //book[author="Charles Dickens"]/title
  • Find the title of the first section in the
    second chapter in Great Expectations
  • //book[title="Great Expectations"]/chapter[2]/section[1]/title
  • Find the title of all sections that come after
    the second chapter in Great Expectations
  • //book[title="Great Expectations"]/chapter[2]/following::section/title
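
The first three queries can be tried with Python's standard xml.etree.ElementTree, which implements a limited subset of XPath (child steps, //, value predicates, and position predicates, but not the following axis, so the last query is left out). The sample document and the helper texts are made up for this sketch, and the paths start with "." because ElementTree evaluates them relative to an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical library document used only for this illustration.
LIBRARY = """
<library>
  <book>
    <title>Great Expectations</title>
    <author>Charles Dickens</author>
    <chapter><title>Ch1</title>
      <section><title>S1.1</title></section>
    </chapter>
    <chapter><title>Ch2</title>
      <section><title>S2.1</title></section>
      <section><title>S2.2</title></section>
    </chapter>
  </book>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
  </book>
</library>
"""


def texts(root, path):
    """Evaluate an ElementTree path and return the text of each match."""
    return [elem.text for elem in root.findall(path)]


root = ET.fromstring(LIBRARY)

all_titles = texts(root, ".//book/title")
dickens_titles = texts(root, ".//book[author='Charles Dickens']/title")
first_section = texts(
    root, ".//book[title='Great Expectations']/chapter[2]/section[1]/title"
)

if __name__ == "__main__":
    print(all_titles, dickens_titles, first_section)
```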

33
Querying XML Data
  • Need for a language to query XML data
  • Should yield XML output
  • Should support standard query operations
  • No schema required
  • Several proposals for an XML query language: XML-QL,
    XQuery, ...

34
XQuery
  • XPath included in XQuery
  • FLWR expressions: for, let, where, return

FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title

Result: <title> abc </title> <title> def </title> <title> ghi </title>
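
To show what the query computes, here is a rough Python equivalent of the for/where/return steps, written against the standard xml.etree.ElementTree. It is not XQuery and not how an XQuery engine evaluates the expression; the in-memory BIB document stands in for bib.xml and its contents are invented for the sketch, as is the function name titles_after.

```python
import xml.etree.ElementTree as ET

# Invented stand-in for bib.xml.
BIB = """<bib>
  <book><title>abc</title><year>1999</year></book>
  <book><title>old</title><year>1995</year></book>
  <book><title>def</title><year>2001</year></book>
</bib>"""


def titles_after(xml_text, year):
    """Return the serialized <title> of every book published after `year`."""
    root = ET.fromstring(xml_text)
    return [
        ET.tostring(book.find("title"), encoding="unicode").strip()  # RETURN
        for book in root.findall("book")                             # FOR
        if int(book.findtext("year")) > year                         # WHERE
    ]


if __name__ == "__main__":
    print(titles_after(BIB, 1995))
```

As in the slide, the result is itself XML: a sequence of title elements rather than bare strings.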
35
How to process XML Queries?
  • Use indexes
  • Need to identify nodes
  • Need to know relations between nodes
  • Labeling Schemes
  • Dewey encoding
  • Prefix-Postfix encoding
  • TwigStack
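
As a sketch of one labeling scheme from the list, the snippet below assigns Dewey labels to the elements of a tree: the root gets (1), and each child's label is its parent's label extended by the child's position among its siblings. Relations between nodes can then be decided from the labels alone, without touching the document. The function names are illustrative, and the root label (1) is a convention assumed here.

```python
import xml.etree.ElementTree as ET


def dewey_labels(root):
    """Map each element to its Dewey label, a tuple such as (1, 2, 1)."""
    labels = {}

    def visit(elem, label):
        labels[elem] = label
        for position, child in enumerate(elem, start=1):
            visit(child, label + (position,))

    visit(root, (1,))
    return labels


def is_ancestor(a, b):
    """True if label a is a proper prefix of label b."""
    return len(a) < len(b) and b[:len(a)] == a


def is_parent(a, b):
    """True if label b extends label a by exactly one component."""
    return len(b) == len(a) + 1 and b[:-1] == a


if __name__ == "__main__":
    tree = ET.fromstring(
        "<bib><book><title/><author/></book><book><title/></book></bib>"
    )
    for elem, label in dewey_labels(tree).items():
        print(elem.tag, ".".join(map(str, label)))
```

An index that stores these labels next to each tag name lets a query processor test ancestor and parent relations with simple prefix comparisons.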

36
Web Services
37
What are Web Services
  • Programming interfaces for application to
    application communication on the Web
  • platform-independent,
  • language-independent
  • object model-independent
  • Possibility to activate methods on remote web
    servers (RPC)
  • 2 main applications
  • E-commerce
  • Access to remote data

38
XML and Web Services
  • Exchange of information between applications is in
    XML
  • Input and Result
  • Use of SOAP to generate messages
  • Descriptions of the web service functionality
    given in XML, according to the WSDL schema

Web Services standards use XML heavily
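
A minimal sketch of what such a message can look like, built with Python's standard xml.etree.ElementTree. The envelope namespace is the SOAP 1.1 one; the service namespace http://example.org/bookstore, the GetPrice operation, and the function build_request are hypothetical and stand in for whatever a real service's WSDL description would define. No message is sent anywhere.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/bookstore"            # hypothetical service namespace


def build_request(method, params):
    """Return a SOAP envelope (as a string) that calls `method` with `params`."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_request("GetPrice", {"title": "Foundations of Databases"}))
```

Both the input (the call and its parameters) and the result travel as XML documents of this shape, which is what makes the exchange independent of platform and programming language.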
39
Conclusions
  • XML: a very active area
  • Many research directions
  • Many applications
  • Standards not finalized yet
  • XQuery
  • XML Schema
  • Web Services

40
Some Important XML Standards
  • XSL/XSLT: presentation and transformation
    standards
  • RDF: resource description framework (meta-info
    such as ratings, categorizations, etc.)
  • XPath/XPointer/XLink: standard for linking to
    documents and elements within
  • Namespaces: for resolving name clashes
  • DOM: Document Object Model for manipulating XML
    documents
  • SAX: Simple API for XML parsing

41
References
  • XML
  • http://www.w3.org/XML/
  • Sudarshan S. Chawathe: Describing and
    Manipulating XML Data. IEEE Data Engineering
    Bulletin 22(3) (1999)
  • XML Standards
  • http://www.w3.org/ (XSL, XPath, XSchema, DOM)
  • Storing XML Data
  • Daniela Florescu, Donald Kossmann: Storing and
    Querying XML Data using an RDMBS. IEEE Data
    Engineering Bulletin 22(3) (1999)
  • Hartmut Liefke, Dan Suciu: XMill: An Efficient
    Compressor for XML Data. SIGMOD Conference 2000
  • XQuery
  • http://www.w3.org/TR/xquery/
  • Peter Fankhauser: XQuery Formal Semantics: State
    and Challenges. SIGMOD Record 30(3) (2001)
  • Web Services
  • http://www.w3.org/2002/ws/