Database Management Systems Session 10

Provided by: vincen85
Learn more at: https://cs.hofstra.edu

1
Database Management Systems Session 10
  • Instructor: Vinnie Costa, vcosta@optonline.net

2
Making A Difference
  • Apple Advertisement, 10/13: "It's unfolded
    before your eyes. The revolution that is iPod
    first took the music scene by storm. Further
    spiced things up with full-color photos. Added a
    full complement of podcasts to the mix. And now
    iPod has turned the world topsy-turvy once again
    with video, letting you carry up to 150 hours of
    video wherever you go. Imagine: With iPod, you
    can play the DJ one minute. Rock with the latest
    Madonna or U2 music videos the next. Then get
    lost with Lost or any of the other TV shows or
    short films now available for purchase and
    download from the iTunes Music Store."
  • The Long Tail is becoming reality!!!

3
Tim Bray - Co-inventor of XML
  • For more than 20 years, Tim Bray has been
    tackling projects as deep as the English Language
    (computerized Oxford English Dictionary, 1987),
    as wide as the Web (one of the first Internet
    search engines, 1994), and as tall as the meaning
    of data (XML, 1996). He invented XML with Jon
    Bosak.
  • "XML is used for banking transactions, for
    interchanging prices in condo developments and
    for exporting data from iTunes," he points out.
    "None of those things were remotely on our minds
    when we were building it."
  • http://en.wikipedia.org/wiki/Tim_Bray
  • http://www.tbray.org/ongoing/

4
Introduction to Semistructured Data and XML
  • Chapter 27

5
How the Web is Today
  • HTML documents
  • often generated by applications
  • consumed by humans only
  • easy access across platforms, across
    organizations
  • No application interoperability
  • HTML not understood by applications
  • screen scraping: brittle
  • Database technology: client-server
  • still vendor specific

6
New Universal Data Exchange Format: XML
  • A recommendation from the W3C
  • XML data
  • XML generated by applications
  • XML consumed by applications
  • Easy access across platforms, organizations

7
Paradigm Shift on the Web
  • From documents (HTML) to data (XML)
  • From information retrieval to data management
  • For databases, also a paradigm shift
  • from relational model to semistructured data
  • from data processing to data/query translation
  • from storage to transport

8
Semistructured Data
  • Origins
  • Integration of heterogeneous sources
  • Data sources with non-rigid structure
  • Biological data
  • Web data

9
The Semistructured Data Model
[Figure: Object Exchange Model (OEM) graph. The root Bib (oid 1) is a complex object whose paper and book objects (oids 12, 24, 29) are linked to one another by references edges. Other edge labels include author, title, year, http, publisher, page, firstname, lastname, first, and last. Leaves are atomic objects such as Serge, Abiteboul, Victor, Vianu, 1997, 122, and 133.]

10
Syntax for Semistructured Data
    Bib: &o1 { paper: &o12 { ... },
               book: &o24 { ... },
               paper: &o29
                 { author: &o52 "Abiteboul",
                   author: &o96 { firstname: &o243 "Victor",
                                  lastname: &o206 "Vianu" },
                   title: &o93 "Regular path queries with constraints",
                   references: &o12,
                   references: &o24,
                   pages: &o25 { first: &o64 122, last: &o92 133 }
                 }
             }

11
Syntax for Semistructured Data
  • May omit oids

    { paper: { author: "Abiteboul",
               author: { firstname: "Victor",
                         lastname: "Vianu" },
               title: "Regular path queries ...",
               page: { first: 122, last: 133 }
             }
    }

12
Characteristics of Semistructured Data
  • Missing or additional attributes
  • Multiple attributes
  • Different types in different objects
  • Heterogeneous collections

Self-describing, irregular data, no a priori
structure
13
Comparison with Relational Data
    { row: { name: "John", phone: 3634 },
      row: { name: "Sue", phone: 6343 },
      row: { name: "Dick", phone: 6363 }
    }
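
To make the comparison concrete, here is a small sketch that encodes a relational table as semistructured data and prints it in the label/value syntax. The function names `to_semistructured` and `render` are ours, chosen for illustration; the rows are the three from the slide.

```python
# The relation from the slide: a fixed schema plus three tuples.
columns = ("name", "phone")
rows = [("John", 3634), ("Sue", 6343), ("Dick", 6363)]


def to_semistructured(columns, rows):
    """Encode each tuple as a 'row' object whose edges are the column names."""
    return [("row", [(col, val) for col, val in zip(columns, row)])
            for row in rows]


def render(obj):
    """Print a (label, value) structure in the slide's brace syntax."""
    if isinstance(obj, list):
        inner = ", ".join(f"{label}: {render(value)}" for label, value in obj)
        return "{ " + inner + " }"
    return f'"{obj}"' if isinstance(obj, str) else str(obj)


encoded = to_semistructured(columns, rows)
text = render(encoded)
```

The schema (name, phone) is repeated inside every row, which is what "self-describing" costs in space and buys in flexibility.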

14
XML
  • A W3C standard to complement HTML
  • Origins: structured text (SGML)
  • Large-scale electronic publishing
  • Data exchange on the web
  • Motivation
  • HTML describes presentation
  • XML describes content
  • http://www.w3.org/TR/2000/REC-xml-20001006
    (XML 1.0, Second Edition, 10/2000)

15
From HTML to XML
HTML describes the presentation
16
HTML
  • Bibliography
  • Foundations of Databases
  • Abiteboul, Hull, Vianu

  • Addison Wesley, 1995
  • Data on the Web
  • Abiteboul, Buneman, Suciu

  • Morgan Kaufmann, 1999

17
XML
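    <bibliography>
      <book> <title> Foundations... </title>
             <author> Abiteboul </author>
             <author> Hull </author>
             <author> Vianu </author>
             <publisher> Addison Wesley </publisher>
             <year> 1995 </year>
      </book>
      ...
    </bibliography>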

XML describes the content
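
Because the markup names the content, a program can pick fields out directly. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the book record is marked up with bibliography, book, title, author, publisher, and year elements (the element names are our assumption about the slide's markup):

```python
import xml.etree.ElementTree as ET

# Assumed markup for the first bibliography entry.
doc = """
<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>
"""

root = ET.fromstring(doc)
book = root.find("book")
authors = [a.text for a in book.findall("author")]  # repeated elements
year = int(book.findtext("year"))                   # text content, typed by us
```

Doing the same against the HTML version would mean guessing which italic run is a title and which line is a publisher, which is the screen-scraping problem mentioned earlier.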
18
Why are we DBers interested?
  • It's data, stupid. That's us.
  • Proof by Google
  • database+XML: 1,940,000 pages.
  • Database issues
  • How are we going to model XML? (graphs).
  • How are we going to query XML? (XQuery)
  • How are we going to store XML? (in a relational
    database? object-oriented? native?)
  • How are we going to process XML efficiently?
    (many interesting research questions!)

19
Document Type Definitions (DTDs)
  • Sort of like a schema but not really.
  • Inherited from SGML DTD standard
  • BNF grammar establishing constraints on element
    structure and content
  • Definitions of entities

20
Shortcomings of DTDs
  • Useful for documents, but not so good for data
  • Element name and type are associated globally
  • No support for structural re-use
  • Object-oriented-like structures aren't supported
  • No support for data types
  • Can't do data validation
  • Can have a single key item (ID), but:
  • No support for multi-attribute keys
  • No support for foreign keys (references to other
    keys)
  • No constraints on IDREFs (e.g., cannot require that
    a reference point only to a Section)

21
XML Schema
  • In XML format
  • Element names and types associated locally
  • Includes primitive data types (integers, strings,
    dates, etc.)
  • Supports value-based constraints (integers > 100)
  • User-definable structured types
  • Inheritance (extension or restriction)
  • Foreign keys
  • Element-type reference constraints

22
Sample XML Schema
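  • [Schema listing not recoverable from this transcript. Surviving fragments: a namespace URI ending in "9/XMLSchema" and element declarations carrying maxOccurs attributes, one of them maxOccurs="1".]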

23
Important XML Standards
  • XSL/XSLT: presentation and transformation
    standards
  • RDF: Resource Description Framework (meta-info
    such as ratings, categorizations, etc.)
  • XPath/XPointer/XLink: standard for linking to
    documents and elements within
  • Namespaces: for resolving name clashes
  • DOM: Document Object Model for manipulating XML
    documents
  • SAX: Simple API for XML parsing
  • XQuery: query language
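
DOM and SAX differ mainly in who drives: with DOM the program walks a finished tree, with SAX the parser calls the program back as it reads. A short sketch of both using Python's standard library (`xml.dom.minidom` and `xml.sax`); the sample document and the `TitleHandler` class are made up for illustration.

```python
import xml.dom.minidom
import xml.sax

DOC = "<bib><book><title>abc</title></book><book><title>def</title></book></bib>"

# DOM: parse the whole document into a tree, then navigate it.
dom = xml.dom.minidom.parseString(DOC)
dom_titles = [t.firstChild.data for t in dom.getElementsByTagName("title")]


# SAX: receive callbacks while the parser streams through the document.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self.titles.append("")

    def characters(self, content):
        if self._in_title:  # text may arrive in several chunks
            self.titles[-1] += content

    def endElement(self, name):
        if name == "title":
            self._in_title = False


handler = TitleHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_titles = handler.titles
```

DOM is convenient for small documents that are edited in place; SAX keeps memory flat on large ones because nothing is retained unless the handler keeps it.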

24
XML Data Model (Graph)
  • Issues
  • Distinguish between attributes and
    sub-elements?
  • Should we conserve order?

25
XML Terminology
  • Tags: book, title, author, ...
  • start tag <book>, end tag </book>
  • Elements: <book>...</book>, <author>...</author>
  • elements can be nested
  • empty element: <author></author> (can be
    abbreviated <author/>)
  • XML document: has a single root element
  • Well-formed XML document: has matching tags
  • Valid XML document: conforms to a schema
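
Well-formedness is what any XML parser checks for free. A minimal sketch with Python's `xml.etree.ElementTree`; the helper name `is_well_formed` is ours, and the check says nothing about validity against a DTD or schema, which the standard library does not test.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if text parses as XML: one root element, properly nested tags."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = is_well_formed("<book><title>abc</title><author/></book>")
bad_nesting = is_well_formed("<book><title>abc</book></title>")
two_roots = is_well_formed("<book/><book/>")
```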

26
More XML: Attributes
  • Foundations of Databases
  • Abiteboul
  • 1995

Attributes are alternative ways to represent data
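
A sketch of the same fact stored both ways, read back with Python's `xml.etree.ElementTree`. The price attribute and its value 55 are placeholders of ours, since the attribute text did not survive in this transcript.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the same price as an attribute and as a sub-element.
as_attribute = ET.fromstring(
    '<book price="55"><title>Foundations of Databases</title></book>')
as_element = ET.fromstring(
    '<book><title>Foundations of Databases</title><price>55</price></book>')

price_a = as_attribute.get("price")       # attributes: unordered, at most one per name
price_e = as_element.findtext("price")    # sub-elements: ordered, may repeat or nest
```

Both reads return the string "55"; the choice between the two forms is a modeling decision, not a difference in expressible data.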
27
More XML: Oids and References
  • Jane
  • Mary
  • idref="o123 o555"
  • John
  • John

oids and references in XML are just syntax
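
"Just syntax" means the parser does not follow references for you; the application has to. A sketch in Python, assuming markup of our own invention: person elements with id attributes and a children element carrying the idref list. Only the names Jane, Mary, John and the value "o123 o555" come from the slide; the id o456 and all element names are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = """
<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name><children idref="o123 o555"/></person>
  <person id="o123"><name>John</name></person>
</people>
"""

root = ET.fromstring(doc)
# Build the oid index ourselves: id value -> element.
by_id = {p.get("id"): p for p in root.iter("person")}

mary = by_id["o456"]
child_ids = mary.find("children").get("idref").split()
child_names = [by_id[oid].findtext("name") for oid in child_ids]
```

With the index in place the tree behaves like the graph of the semistructured model: references become edges that can be followed in either document order or reference order.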
28
XQuery
  • Summary
  • FOR-LET-WHERE-ORDERBY-RETURN = FLWOR

FOR/LET clauses
  → list of tuples
WHERE clause
  → list of tuples
ORDERBY/RETURN clause
  → instance of the XQuery data model
29
XQuery
  • FOR $x IN expr -- binds $x to each value in the
    list expr
  • LET $x := expr -- binds $x to the entire list
    expr
  • Useful for common subexpressions and for
    aggregations

30
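FOR vs. LET
FOR $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns:
<result> <book>...</book> </result>
<result> <book>...</book> </result>
...

LET $x := document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns:
<result> <book>...</book> <book>...</book> ... </result>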
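
The difference can be imitated in Python: FOR behaves like a loop that runs RETURN once per book, LET like a single assignment of the whole list. This is only an analogy, assuming each RETURN wraps its binding in a result element; the `wrap` helper and the three-book sample are ours.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("<bib><book>a</book><book>b</book><book>c</book></bib>")
books = bib.findall("book")


def wrap(items):
    """Build <result>...</result> around the given elements."""
    result = ET.Element("result")
    result.extend(items)
    return result


# FOR $x IN .../book RETURN <result> $x </result>: one result per binding
for_results = [wrap([x]) for x in books]

# LET $x := .../book RETURN <result> $x </result>: one result for the whole list
let_results = [wrap(books)]
```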
31
Path Expressions
  • Abbreviated Syntax
  • /bib/paper[2]/author[1]
  • /bib//author
  • paper[author/lastname="Vianu"]
  • /bib/(paper|book)/title
  • Unabbreviated Syntax
  • child::bib/descendant::author
  • child::bib/descendant-or-self::node()/child::author
  • parent, self, descendant-or-self, attribute
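
Python's `xml.etree.ElementTree` implements a small subset of this path syntax, enough to try the abbreviated forms. In the sketch below the sample document is invented, and the two expressions the subset cannot express (a nested-path predicate and a union) are written as ordinary Python filters instead.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <paper><author><lastname>Abiteboul</lastname></author><title>t1</title></paper>
  <paper><author><lastname>Vianu</lastname></author><title>t2</title></paper>
  <book><author><lastname>Suciu</lastname></author><title>t3</title></book>
</bib>
""")

# /bib/paper[2]/author[1]: positions are 1-based, as in XPath
second_paper_author = bib.find("paper[2]/author[1]")

# /bib//author: every author at any depth below the root
all_authors = bib.findall(".//author")

# paper[author/lastname="Vianu"]: nested predicate written as a Python filter
vianu_papers = [p for p in bib.findall("paper")
                if any(l.text == "Vianu" for l in p.findall("author/lastname"))]

# /bib/(paper|book)/title: no union operator here, so filter on the tag
titles = [c.findtext("title") for c in bib if c.tag in ("paper", "book")]
```

Paths are relative to the element they are called on, so the leading /bib of the slide's expressions corresponds to starting from `bib` itself.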

32
XQuery
  • Find all book titles published after 1995

FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title
Result: <title> abc </title> <title> def </title>
<title> ghi </title>
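
For readers without an XQuery engine at hand, the same query can be approximated in Python. A sketch over an invented three-book document, with FOR as the iteration, WHERE as the filter, and RETURN as the projected expression:

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book><title>abc</title><year>1999</year></book>
  <book><title>old</title><year>1990</year></book>
  <book><title>def</title><year>2001</year></book>
</bib>
""")

titles = [x.findtext("title")                  # RETURN $x/title
          for x in bib.findall("book")         # FOR $x IN /bib/book
          if int(x.findtext("year")) > 1995]   # WHERE $x/year > 1995
```

The explicit `int(...)` is the one thing XQuery does implicitly here: element text is untyped until it is compared with a number.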
33
XQuery
  • For each author of a book by Morgan Kaufmann,
    list all books she published

FOR $a IN distinct(document("bib.xml")
                   /bib/book[publisher="Morgan Kaufmann"]/author)
RETURN $a,
       FOR $t IN /bib/book[author=$a]/title
       RETURN $t
distinct = a function that eliminates duplicates
34
XQuery
  • Result
  • Jones
  • abc
  • def
  • Smith
  • ghi
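
A Python approximation of this nested query, over sample data invented so that it reproduces the Jones/Smith result shown. The outer loop collects distinct Morgan Kaufmann authors; the inner comprehension lists every book by that author, whatever its publisher.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book><author>Jones</author><title>abc</title>
        <publisher>Morgan Kaufmann</publisher></book>
  <book><author>Jones</author><title>def</title>
        <publisher>Addison Wesley</publisher></book>
  <book><author>Smith</author><title>ghi</title>
        <publisher>Morgan Kaufmann</publisher></book>
</bib>
""")
books = bib.findall("book")

# distinct(/bib/book[publisher="Morgan Kaufmann"]/author), in document order
mk_authors = []
for b in books:
    if b.findtext("publisher") == "Morgan Kaufmann":
        for a in b.findall("author"):
            if a.text not in mk_authors:
                mk_authors.append(a.text)

# For each such author: /bib/book[author=$a]/title
result = {a: [b.findtext("title") for b in books
              if a in [x.text for x in b.findall("author")]]
          for a in mk_authors}
```

Jones keeps the title "def" even though that book is not from Morgan Kaufmann, because the inner query does not repeat the publisher condition.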

35
XQuery
FOR $p IN distinct(document("bib.xml")//publisher)
LET $b := document("bib.xml")//book[publisher = $p]
WHERE count($b) > 100
RETURN $p
count = an (aggregate) function that returns the
number of elements
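
The FOR/LET/count pattern is a group-and-count. A Python sketch using `collections.Counter`; the function name `big_publishers` and the sample data are ours, and the threshold is a parameter so the example can run on four books rather than hundreds.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def big_publishers(bib, threshold):
    """Publishers with more than `threshold` books, in sorted order."""
    # LET $b := //book[publisher = $p] ; count($b) for every distinct $p
    counts = Counter(b.findtext("publisher") for b in bib.iter("book"))
    return sorted(p for p, n in counts.items() if n > threshold)


bib = ET.fromstring(
    "<bib>"
    + "<book><publisher>P1</publisher></book>" * 3
    + "<book><publisher>P2</publisher></book>"
    + "</bib>")

found = big_publishers(bib, threshold=2)
none_found = big_publishers(bib, threshold=100)
```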
36
XQuery
  • Find books whose price is larger than average

LET $a := avg(document("bib.xml")/bib/book/price)
FOR $b IN document("bib.xml")/bib/book
WHERE $b/price > $a
RETURN $b
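
Here LET computes one value up front and FOR then iterates. The same shape in Python, over three invented books whose prices average 30:

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book><title>abc</title><price>10</price></book>
  <book><title>def</title><price>20</price></book>
  <book><title>ghi</title><price>60</price></book>
</bib>
""")
books = bib.findall("book")

# LET $a := avg(/bib/book/price): a single value, computed once
prices = [float(b.findtext("price")) for b in books]
a = sum(prices) / len(prices)

# FOR $b IN /bib/book WHERE $b/price > $a RETURN $b
expensive = [b.findtext("title") for b in books
             if float(b.findtext("price")) > a]
```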
37
FOR vs. LET
  • FOR
  • Binds node variables → iteration
  • LET
  • Binds collection variables → one value

38
Sorting in XQuery
FOR $p IN distinct(document("bib.xml")//publisher)
ORDERBY $p
RETURN $p/text(),
       FOR $b IN document("bib.xml")//book[publisher = $p]
       ORDERBY $b/price DESCENDING
       RETURN $b/title,
              $b/price
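
The two ORDERBY clauses correspond to two independent sorts: publishers ascending on the outside, each publisher's books by descending price on the inside. A Python sketch with invented data; `books_of` is a hypothetical helper standing in for the inner FOR.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book><publisher>B</publisher><title>t1</title><price>10</price></book>
  <book><publisher>A</publisher><title>t2</title><price>30</price></book>
  <book><publisher>A</publisher><title>t3</title><price>50</price></book>
</bib>
""")
books = bib.findall("book")


def books_of(p):
    """Inner FOR: books of publisher p, ORDERBY price DESCENDING."""
    mine = [b for b in books if b.findtext("publisher") == p]
    return sorted(mine, key=lambda b: float(b.findtext("price")), reverse=True)


# Outer FOR: distinct publishers, ORDERBY $p
publishers = sorted({b.findtext("publisher") for b in books})
listing = [(p, [(b.findtext("title"), float(b.findtext("price")))
                for b in books_of(p)])
           for p in publishers]
```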


39
If-Then-Else
FOR $h IN //holding
ORDERBY $h/title
RETURN $h/title,
       IF $h/@type = "Journal"
       THEN $h/editor
       ELSE $h/author
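
The conditional picks a different child depending on an attribute, which is how a query copes with heterogeneous elements. A Python sketch with two invented holdings; the library root element and all values are placeholders.

```python
import xml.etree.ElementTree as ET

lib = ET.fromstring("""
<library>
  <holding type="Journal"><title>B</title><editor>E1</editor></holding>
  <holding type="Book"><title>A</title><author>A1</author></holding>
</library>
""")

# FOR $h IN //holding ORDERBY $h/title
holdings = sorted(lib.iter("holding"), key=lambda h: h.findtext("title"))

# RETURN $h/title, IF $h/@type = "Journal" THEN $h/editor ELSE $h/author
rows = [(h.findtext("title"),
         h.findtext("editor") if h.get("type") == "Journal"
         else h.findtext("author"))
        for h in holdings]
```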

40
XML vs. Semistructured Data
  • Both described best by a graph
  • Both are schema-less, self-describing
  • XML is ordered, ssd is not
  • XML can mix text and elements
  • "Making Java easier to type and easier
    to type" (Phil Wadler)
  • XML has lots of other stuff: attributes,
    entities, processing instructions, comments

41
La commedia è finita! (The comedy is over!)
Good Luck! Make A Difference!!!