Models and languages for semistructured data - PowerPoint PPT Presentation


Slides: 49
Provided by: pajo

Transcript and Presenter's Notes

Title: Models and languages for semistructured data


1
Models and languages for semistructured data
  • Bridging documents and databases

2
Lectures
  • 1. Introduction to data models
  • 2. Query languages for relational databases
  • 3. Models and query languages for object
    databases
  • 4. Embedded query languages
  • 5. Models and query languages for semistructured
    data, XML
  • 6. Semantic Web, introduction
  • 7. Semantic Web, continued

3
Why do we like types?
  • Types facilitate understanding
  • Types enable compact representations
  • Types enable query optimisation
  • Types facilitate consistency enforcement

4
Background assumptions for typed data
  • Data stable over time
  • Organisational body to control data
  • Exercise: give an example of a context where
    these assumptions do not hold

5
Semistructured data
Semistructured data is schemaless and
self-describing. The data and the description of
the data are integrated.
6
An example
{name: {first: "John", last: "Smith"}, tel:
112233, email: "john_at_123.edu"}
7
Another example
{person: o1{name: "Eva", age: 40, child: o2},
 person: o2{name: "Abel", age: 20}}
An object identifier, such as o1, before a
structure, binds the object identifier to the
identity of that structure. The object identifier
can then be used to refer to the structure.
8
Terminology
  • The following is an ssd-expression:
  • o1{name: "Eva", age: 40, child: o2}
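
A minimal Python sketch of how such an expression could be held in memory. The encoding is an assumption made for illustration and is not taken from the slides: a structure is a list of (label, value) pairs, the hypothetical `objects` table binds identifiers to structures, and the hypothetical `Ref` class marks a reference to an identifier.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ref:
    """A reference to an object identifier (assumed helper, not from the slides)."""
    oid: str

# Identifier table: each oid is bound to the structure it precedes.
objects = {
    "o1": [("name", "Eva"), ("age", 40), ("child", Ref("o2"))],
    "o2": [("name", "Abel"), ("age", 20)],
}
db = [("person", Ref("o1")), ("person", Ref("o2"))]

def deref(value):
    """Follow a reference to the structure its identifier is bound to."""
    return objects[value.oid] if isinstance(value, Ref) else value

def children(value, label):
    """Return the dereferenced values reached from value by edges named label."""
    return [deref(v) for (l, v) in deref(value) if l == label]

eva = children(db, "person")[0]
child_names = [n for c in children(eva, "child") for n in children(c, "name")]
print(child_names)  # ['Abel']
```

Lists of pairs are used instead of dictionaries because a label may occur more than once in the same structure.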

9
A database
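[Figure: the example database as an edge-labelled graph. The root db has an edge biblio to a node with three children: a paper (n1) with authors Crick and Wallace, title DNA spiral and date 1956; a book (n2) with author Darwin, title Origin and date 1848; and a book (n3) with author Marx, title Kapital and date 1860.]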
10
Path expressions
  • A path expression is a sequence of labels
  • l1.l2. ... .ln
  • A path expression evaluates to a set of nodes
  • Path properties are specified by regular
    expressions on two levels: over the alphabet of
    labels and over the alphabet of characters that
    make up labels
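
The evaluation of a plain label sequence can be sketched in Python as follows. The nested (label, value) encoding of the example database and the function name `eval_path` are assumptions made for illustration; the result is kept as a list in document order rather than as a set.

```python
# Example database from the slides, encoded (by assumption) as nested pairs.
db = [("biblio", [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "DNA spiral"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
])]

def eval_path(root, path):
    """Return the nodes reached from root by following the labels in path."""
    nodes = [root]
    for label in path.split("."):
        # Keep only structured nodes, then step along edges with this label.
        nodes = [v for n in nodes if isinstance(n, list)
                 for (l, v) in n if l == label]
    return nodes

print(eval_path(db, "biblio.book.author"))   # ['Darwin', 'Marx']
print(eval_path(db, "biblio.paper.author"))  # ['Crick', 'Wallace']
```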

11
A path expression
biblio.book.author
[Figure: the example database graph from slide 9.]
12
A path expression
biblio.(book | paper).author
[Figure: the example database graph from slide 9.]
13
Examples of path expressions
  • biblio.book.author - authors of books
  • biblio.paper.author - authors of papers
  • biblio.(book | paper).author - authors of books
    or papers
  • biblio._.author - authors of anything
  • biblio._*.author - nodes at the ends of paths
    starting with biblio, ending with author, and
    having an arbitrary sequence of labels in between
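
One way to evaluate such patterns is to translate the path expression into an ordinary regular expression over label paths. The sketch below is only an illustration under stated assumptions: `_` stands for exactly one arbitrary label, the arbitrary-sequence wildcard is spelled `_*`, alternation is written with `|` and no spaces, and the helper names (`compile_path`, `label_paths`, `select`) are hypothetical.

```python
import re

# Example database, encoded (by assumption) as nested (label, value) pairs.
db = [("biblio", [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "DNA spiral"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
])]

def compile_path(expr):
    """Translate a path expression into a regex over '.'-terminated labels."""
    parts = []
    for step in expr.split("."):
        if step == "_*":                      # any sequence of labels
            parts.append(r"(?:[^.]+\.)*")
        elif step == "_":                     # exactly one arbitrary label
            parts.append(r"[^.]+\.")
        else:                                 # a label or an alternation
            parts.append("(?:" + step + r")\.")
    return re.compile("".join(parts))

def label_paths(node, prefix=""):
    """Yield (path, value) for every edge below node, e.g. 'biblio.book.'."""
    if isinstance(node, list):
        for label, value in node:
            path = prefix + label + "."
            yield path, value
            yield from label_paths(value, path)

def select(root, expr):
    """Return the values at the ends of all paths matching expr."""
    pattern = compile_path(expr)
    return [v for p, v in label_paths(root) if pattern.fullmatch(p)]

print(select(db, "biblio.(book|paper).author"))  # ['Crick', 'Wallace', 'Darwin', 'Marx']
print(select(db, "biblio._*.author"))            # ['Crick', 'Wallace', 'Darwin', 'Marx']
```

Terminating every label with a dot keeps the wildcard translation simple, since each step then consumes exactly one dot-terminated label.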

14
Example of a label pattern
  • ((b|B)ook | (a|A)uthor)(s)? - book, Book,
    author, Author, books, Books, authors, Authors
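
The label pattern can be tried out with Python's `re` module. This is a small sketch, assuming the alternation symbol in the pattern is `|`; the list of candidate labels is made up for illustration.

```python
import re

# The label pattern from the slide, as a regex over the characters of a label.
label_pattern = re.compile(r"((b|B)ook|(a|A)uthor)(s)?")

# Hypothetical candidate labels; fullmatch requires the whole label to match.
labels = ["book", "Books", "author", "Authors", "title", "BOOK"]
matching = [label for label in labels if label_pattern.fullmatch(label)]
print(matching)  # ['book', 'Books', 'author', 'Authors']
```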

15
An exercise
  • biblio._*.author.(s|S)ection
  • Which ones of the following paths match the path
    expression above?
  • 1. Biblio.author.Section
  • 2. Biblio.cat.rat.hat.author.section
  • 3. Biblio.author
  • 4. Biblio.cat.author.section.Section

16
A simple query
  • select author: X
  • from biblio.book.author X
  • Result:
  • {author: "Darwin", author: "Marx"}

17
A query with a condition
  • select row: X
  • from biblio._ X
  • where "Crick" in X.author
  • Result:
  • {row: {author: "Crick",
  • author: "Wallace",
  • date: 1956,
  • title: "The spiral DNA"}}
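
The select-from-where form reads much like a comprehension. A rough Python equivalent of the query above, assuming the children of biblio are encoded as (label, value) pairs and using a hypothetical helper `values`:

```python
# Children of biblio in the example database (encoding is an assumption).
biblio = [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "DNA spiral"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
]

def values(node, label):
    """Return all values in node that sit under the given label."""
    return [v for (l, v) in node if l == label]

# from biblio._ X : X ranges over every child of biblio, whatever its label.
# where "Crick" in X.author : keep X if one of its authors is Crick.
# select row: X : wrap each surviving X under the label row.
result = [("row", x) for (_label, x) in biblio if "Crick" in values(x, "author")]

print(len(result))                     # 1
print(values(result[0][1], "author"))  # ['Crick', 'Wallace']
```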

18
Two exercises
  • select row: {title: Y, date: Z}
  • from biblio.paper X, X.title Y, X.date Z
  • select row: {author: Y, date: Z}
  • from biblio.book X, X.author Y, X.date Z

19
A database
select row: {title: Y, date: Z}
from biblio.paper X, X.title Y, X.date Z
[Figure: the example database graph from slide 9.]
20
A database
[Figure: the example database graph from slide 9.]
21
Nested queries
  • select row: (select author: Y
  • from X.author Y)
  • from biblio.book X

22
Three exercises
  • Which authors have written a book or a paper in
    1992?
  • Which authors have written a book together with
    Jones?
  • Which authors have written both a book and a
    paper?

23
Expressing relations
r1:  a  b  c        r2:  b  d  e
     1  2  3             1  1  3
     3  2  2             3  4  2
     4  3  1             2  3  1
{r1: {row: {a: 1, b: 2, c: 2}, row: {a: 1, b: 2, c: 2},
      row: {a: 1, b: 2, c: 2}},
 r2: {row: {b: 1, d: 2, e: 2}, row: {b: 1, d: 2, e: 2},
      row: {b: 1, d: 2, e: 2}}}
24
Expressing relational joins
  • select a: A, d: D
  • from r1.row X,
  • r2.row Y,
  • X.a A, X.b B, Y.b B', Y.d D
  • where B = B'
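
The join can be mimicked with a nested comprehension. This is a sketch only: the row values below are illustrative assumptions, each row is held as a Python dict (labels are unique within a row), and the join condition is taken to be equality on the shared attribute b.

```python
# Illustrative rows (assumed values); each row maps a label to its value.
r1 = [{"a": 1, "b": 2, "c": 3}, {"a": 3, "b": 2, "c": 2}, {"a": 4, "b": 3, "c": 1}]
r2 = [{"b": 1, "d": 1, "e": 3}, {"b": 3, "d": 4, "e": 2}, {"b": 2, "d": 3, "e": 1}]

# from r1.row X, r2.row Y : iterate over all pairs of rows.
# where X.b equals Y.b    : keep the pairs that agree on b.
# select a: A, d: D       : project onto a (from r1) and d (from r2).
joined = [{"a": x["a"], "d": y["d"]} for x in r1 for y in r2 if x["b"] == y["b"]]

print(joined)  # [{'a': 1, 'd': 3}, {'a': 3, 'd': 3}, {'a': 4, 'd': 4}]
```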

25
Label variables
  • select L: X
  • from biblio._.L X
  • where matches(".*Shakespeare.*", X)
  • L is a label variable: it ranges over labels

[Figure: example database with two books under biblio: n2 with title Macbeth, date 1622 and author Shakespeare; n3 with title Best of Shakespeare, date 1992 and author Smith.]
26
Label variables
  • select L: X
  • from biblio._.L X
  • where matches(".*Shakespeare.*", X)
  • {author: "Shakespeare",
  • title: "Best of Shakespeare"}
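
A label variable simply means that the label itself becomes part of the binding. A Python sketch of the query above, assuming the pair encoding used for illustration and assuming the garbled pattern was the regular expression `.*Shakespeare.*`:

```python
import re

# The two-book example database (encoding is an assumption).
db = [("biblio", [
    ("book", [("title", "Macbeth"), ("date", 1622), ("author", "Shakespeare")]),
    ("book", [("title", "Best of Shakespeare"), ("date", 1992), ("author", "Smith")]),
])]

result = []
for label, biblio in db:
    if label != "biblio":
        continue
    for _any, publication in biblio:      # the wildcard step `_`
        for L, X in publication:          # L is bound to the label, X to the value
            if re.fullmatch(".*Shakespeare.*", str(X)):
                result.append((L, X))

print(result)  # [('author', 'Shakespeare'), ('title', 'Best of Shakespeare')]
```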

27
Turning labels into data
  • select publ: {type: L, author: A}
  • from biblio.L X, X.author A

{publ: {type: "paper", author: "Crick"},
 publ: {type: "paper", author: "Wallace"},
 publ: {type: "book", author: "Darwin"}}
28
An exercise
  • List all publications in 1992, their types, and
    titles.

29
Basic XML syntax
  • XML is a textual representation of data
  • An element is a piece of text bounded by matching tags
  • <name> John </name>

<name></name> can be abbreviated as <name/>
30
Basic XML syntax
  • Elements may contain subelements
  • <person>
  • <name> John </name>
  • <tel> 112233 </tel>
  • <email> john_at_123.edu </email>
  • </person>

31
XML attributes
  • An attribute is defined by a name-value pair
    within a tag
  • <price currency="dollar"> 500 </price>
  • <length unit="cm"> 25 </length>

32
XML attributes and elements
  • <product>
  • <name> widget </name>
  • <price> 10 </price>
  • </product>
  • <product price="10">
  • <name> widget </name>
  • </product>
  • <product name="widget" price="10"/>

33
XML and ssd-expressions
  • <person>
  • <name> John </name>
  • <tel> 112233 </tel>
  • <email> john_at_123.edu </email>
  • </person>

{person: {name: "John", tel: 112233, email:
"john_at_123.edu"}}
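
The correspondence between elements and label-value pairs can be sketched with Python's standard `xml.etree.ElementTree` module. The function name `to_ssd` and the pair encoding are assumptions for illustration; attributes are ignored and all leaf values stay strings.

```python
import xml.etree.ElementTree as ET

def to_ssd(elem):
    """Map an element to a (label, value) pair; leaves become stripped text."""
    kids = list(elem)
    if not kids:
        return (elem.tag, (elem.text or "").strip())
    return (elem.tag, [to_ssd(k) for k in kids])

xml_text = ("<person><name> John </name><tel> 112233 </tel>"
            "<email> john_at_123.edu </email></person>")
person = to_ssd(ET.fromstring(xml_text))
print(person)
# ('person', [('name', 'John'), ('tel', '112233'), ('email', 'john_at_123.edu')])
```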
34
XML references
  • <person id="p1">
  • <name> John </name>
  • <tel> 112233 </tel>
  • </person>
  • <person id="p2">
  • <name> Peter </name>
  • <tel> 998877 </tel>
  • <boss idref="p1"/>
  • </person>
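
Following a reference amounts to a lookup by identifier. A small Python sketch using `xml.etree.ElementTree`; the enclosing `<db>` element is an assumption added so that the two persons form one well-formed document.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""<db>
  <person id="p1"><name> John </name><tel> 112233 </tel></person>
  <person id="p2"><name> Peter </name><tel> 998877 </tel><boss idref="p1"/></person>
</db>""")

# Index every element that carries an id attribute.
by_id = {e.get("id"): e for e in doc.iter() if e.get("id") is not None}

peter = by_id["p2"]
boss = by_id[peter.find("boss").get("idref")]   # follow the reference
boss_name = boss.findtext("name").strip()
print(boss_name)  # John
```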

35
Document Type Definitions
  • <!DOCTYPE db [
  • <!ELEMENT db (person*)>
  • <!ELEMENT person (name, age, email)>
  • <!ELEMENT name (#PCDATA)>
  • <!ELEMENT age (#PCDATA)>
  • <!ELEMENT email (#PCDATA)>
  • ]>
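
An element declaration is essentially a regular expression over the names of child elements. The Python sketch below checks a document against hand-written content models; it is not a DTD validator. It assumes that db may contain any number of person elements, and the table `CONTENT_MODELS` and the function `conforms` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Content models as regexes over space-terminated child element names.
CONTENT_MODELS = {
    "db": r"(person )*",           # assumed: zero or more person elements
    "person": r"name age email ",  # name, then age, then email, in this order
}

def conforms(elem):
    """Check elem and its descendants against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:              # treated as a text-only (PCDATA) leaf
        return len(elem) == 0
    children = "".join(child.tag + " " for child in elem)
    return bool(re.fullmatch(model, children)) and all(conforms(c) for c in elem)

good = ET.fromstring(
    "<db><person><name>Eva</name><age>40</age><email>e</email></person></db>")
bad = ET.fromstring(
    "<db><person><name>Eva</name><email>e</email><age>40</age></person></db>")
print(conforms(good), conforms(bad))  # True False
```

The second document fails only because email precedes age, which shows how a sequence content model fixes the order of subelements.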

36
An exercise on DTDs as schemas
  • <db> <r1> <a> a1 </a> <b> b1 </b> </r1>
  • <r1> <a> a2 </a> <b> b2 </b> </r1>
  • <r2> <c> a1 </c> <d> b1 </d> </r2>
  • <r2> <c> c2 </c> <d> d2 </d> </r2>
  • <r3> <a> a1 </a> <c> b1 </c> </r3>
  • </db>
  • Write down a DTD for the data above!

37
Attributes in DTDs
  • <product>
  • <name language="Swedish" department="music">
  • trumpet </name>
  • <price currency="dollar"> 500 </price>
  • <length unit="cm"> 25 </length>
  • </product>

<!ATTLIST name language CDATA #REQUIRED
               department CDATA #IMPLIED>
<!ATTLIST price currency CDATA #REQUIRED>
<!ATTLIST length unit CDATA #REQUIRED>
38
Reference attributes in DTDs
  • <!DOCTYPE people [
  • <!ELEMENT people (person*)>
  • <!ELEMENT person (name)>
  • <!ELEMENT name (#PCDATA)>
  • <!ATTLIST person id ID #REQUIRED
  • boss IDREF #REQUIRED
  • friends IDREFS #IMPLIED>
  • ]>
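
The constraints expressed by ID, IDREF and IDREFS can be checked by hand. The following Python sketch hard-codes the three attribute declarations above instead of parsing the DTD; the function name `check_people` and the sample document (anna, bo, cy) are made up for illustration.

```python
import xml.etree.ElementTree as ET

def check_people(xml_text):
    """Return a list of violations of the id/boss/friends declarations."""
    persons = ET.fromstring(xml_text).findall("person")
    ids = [p.get("id") for p in persons]
    errors = []
    for p in persons:
        pid = p.get("id")
        if pid is None:                          # id is required
            errors.append("person without required id")
        elif ids.count(pid) > 1:                 # ID values must be unique
            errors.append(f"duplicate id {pid}")
        boss = p.get("boss")
        if boss is None:                         # boss is required
            errors.append(f"{pid}: required boss missing")
        elif boss not in ids:                    # IDREF must name one existing ID
            errors.append(f"{pid}: boss {boss!r} is not a declared id")
        for friend in (p.get("friends") or "").split():   # IDREFS: a list of IDs
            if friend not in ids:
                errors.append(f"{pid}: friend {friend!r} is not a declared id")
    return errors

sample = """<people>
  <person id="anna" boss="bo"><name>Anna</name></person>
  <person id="bo" boss="bo" friends="anna"><name>Bo</name></person>
  <person id="cy" friends="anna dan"><name>Cy</name></person>
</people>"""
errors = check_people(sample)
print(errors)
# ['cy: required boss missing', "cy: friend 'dan' is not a declared id"]
```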

39
An exercise
  • <people>
  • <person id="sven" boss="olle">
  • <name> Sven Svensson </name>
  • </person>
  • <person id="olle" friends="nils eva">
  • <name> Olle Olsson </name>
  • </person>
  • <person id="pelle" boss="nils eva">
  • <name> Per Persson </name>
  • </person>
  • </people>
  • Does this XML element conform to the previous
    DTD?

40
Limitations of DTDs as schemas
  • DTDs impose order
  • No base types
  • The types of IDREFs cannot be constrained

41
XSL - extensible stylesheet language
  • <bib> <book> <title> t1 </title>
  • <author> a1 </author>
  • <author> a2 </author>
  • </book>
  • <paper>
  • <title> t2 </title>
  • <author> a3 </author>
  • <author> a4 </author>
  • </paper>
  • <book> <title> t3 </title>
  • <author> a5 </author>
  • <author> a6 </author>
  • </book>
  • </bib>

42
Template rules and XSL patterns
  • <xsl:template>
  • <xsl:apply-templates/>
  • </xsl:template>
  • <xsl:template match="bib//title">
  • <result>
  • <xsl:value-of/>
  • </result>
  • </xsl:template>

<result> t1 </result> <result> t2 </result>
<result> t3 </result>
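
The effect of the two template rules (descend everywhere, emit a result element for each title below bib) can be imitated in Python. This is not an XSL processor, only a sketch of the same traversal with `xml.etree.ElementTree` on the bib document from the previous slide.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
  <book><title> t1 </title><author> a1 </author><author> a2 </author></book>
  <paper><title> t2 </title><author> a3 </author><author> a4 </author></paper>
  <book><title> t3 </title><author> a5 </author><author> a6 </author></book>
</bib>""")

# iter("title") visits every title descendant of bib in document order,
# which plays the role of the pattern bib//title.
results = [f"<result> {t.text.strip()} </result>" for t in bib.iter("title")]
print(" ".join(results))
# <result> t1 </result> <result> t2 </result> <result> t3 </result>
```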
43
Two exercises
  • select row: {title: Y, date: Z}
  • from biblio.paper X, X.title Y, X.date Z
  • {row: {title: "The spiral DNA",
  • date: 1956,
  • title: "Origin",
  • date: 1848,
  • title: "Kapital",
  • date: 1860}}
  • select row: {author: Y, date: Z}
  • from biblio.book X, X.author Y, X.date Z

44
Which authors have written a book or a paper in
1992?
select author: X
from biblio.(book | paper) Y, Y.author X
where Y.date = 1992
45
Which authors have written a book together with
Jones?
select author: X
from biblio.book Y, Y.author X
where "Jones" in Y.author
46
Which authors have written both a book and a
paper?
select author: A
from biblio.book B, biblio.paper P, B.author A
where B.author = P.author

select author: A1
from biblio.book B, biblio.paper P, B.author A1, P.author A2
where A1 = A2
47
List all publications in 1992, their types, and
titles.
select publ: {type: L, title: T}
from biblio.L X, X.title T
where X.date = 1992
48
  • <!DOCTYPE db [
  • <!ELEMENT db (r1*, r2*, r3*)>
  • <!ELEMENT r1 (a, b)>
  • <!ELEMENT r2 (c, d)>
  • <!ELEMENT r3 (a, c)>
  • <!ELEMENT a (#PCDATA)>
  • <!ELEMENT b (#PCDATA)>
  • <!ELEMENT c (#PCDATA)>
  • <!ELEMENT d (#PCDATA)>
  • ]>
  • <db> <r1> <a> a1 </a> <b> b1 </b> </r1>
  • <r1> <a> a2 </a> <b> b2 </b> </r1>
  • <r2> <c> a1 </c> <d> b1 </d> </r2>
  • <r2> <c> c2 </c> <d> d2 </d> </r2>
  • <r3> <a> a1 </a> <c> b1 </c> </r3>
  • </db>