Database Application Design - PowerPoint PPT Presentation

1 / 40
About This Presentation
Title:

Database Application Design

Description:

WHERE BOOK $b BOOK IN 'www.booklist.com/books.xml', AUTHOR $n /AUTHOR ... SI 760 (1) Classes 1-3 Introduction to the course and linguistic background ... – PowerPoint PPT presentation

Number of Views:38
Avg rating:3.0/5.0
Slides: 41
Provided by: dragomi3
Category:

less

Transcript and Presenter's Notes

Title: Database Application Design


1
Database Application Design
March 24, 2000
  • Handout 11

2
Course information
  • Instructor Dragomir R. Radev (radev_at_si.umich.edu)
  • Office 305A, West Hall
  • Phone (734) 615-5225
  • Office hours Thursdays 3-4 and Fridays 1-2
  • Course page http//www.si.umich.edu/radev/654w00
  • Class meets on Fridays, 230 - 530 PM, 311 WH

3
Web-based databases
4
Types of databases
  • Textual databases
  • Semi-structured databases

5
Indexing textual data
  • Inverted files
  • Boolean queries
  • Signature files
  • Signature S1 matches signature S2 if S2S1S2

6
XML-QL
7
XML-QL
Two slides from Johannes Gehrke, Cornell
University
ltIMG SRCxysq.gif ALT(xy)2gt
ltapplygt ltpower/gt ltapplygt ltplus/gt ltcigtxlt/cigt
ltcigtylt/cigt lt/applygt ltcngt2lt/cngt lt/applygt
WHERE ltBOOKgt ltNAMEgtltLASTgt1lt/LASTgtlt/NAME
gt lt/BOOKgt in www.booklist.com/books.xml CONSTR
UCT ltRESULTgt 1 lt/RESULTgt
8
XML-QL (continued)
WHERE ltBOOKgt b ltBOOKgt IN www.booklist.com/boo
ks.xml, ltAUTHORgt n lt/AUTHORgt ltPUBLISHEDgt
p lt/PUBLISHEDgt in e CONSTRUCT ltRESULTgt
ltPUBLISHEDgt p lt/PUBLISHEDgt WHERE
ltLASTgt l lt/LASTgt IN n CONSTRUCT ltLASTgt
l lt/LASTgt lt/RESULTgt
9
XML-QL (continued)
  • lt!ELEMENT book (author, title, publisher)gt
  • lt!ATTLIST book year CDATAgt
  • lt!ELEMENT article (author, title, year?,
    (shortversionlongversion))gt
  • lt!ATTLIST article type CDATAgt
  • lt!ELEMENT publisher (name, address)gt
  • lt!ELEMENT author (firstname?, lastname)gt

10
XML-QL (continued)
  • WHERE ltbookgt
  • ltpublishergtltnamegtAddison-Wesleylt/namegtlt/publisher
    gt
  • lttitlegt tlt/titlegt
  • ltauthorgt alt/authorgt
  • lt/bookgt IN "www.a.b.c/bib.xml"
  • CONSTRUCT a

11
XML-QL (continued)
  • WHERE ltbookgt
  • ltpublishergtltnamegtAddison-Wesleylt/gtlt/gt
  • lttitlegt tlt/gt
  • ltauthorgt alt/gt
  • lt/gt IN "www.a.b.c/bib.xml"
  • CONSTRUCT a

12
XML-QL (continued)
  • WHERE ltbookgt
  • ltpublishergtltnamegtAddison-Wesleylt/gtlt/gt
  • lttitlegt tlt/gt
  • ltauthorgt alt/gt
  • lt/gt IN "www.a.b.c/bib.xml"
  • CONSTRUCT ltresultgt
  • ltauthorgt alt/gt
  • lttitlegt tlt/gt
  • lt/gt

13
XML-QL (continued)
  • ltbibgt
  • ltbook year"1995"gt
  • lt!-- A good introductory text --gt
  • lttitlegt An Introduction to Database
    Systems lt/titlegt
  • ltauthorgt ltlastnamegt Date lt/lastnamegt
    lt/authorgt
  • ltpublishergt ltnamegt Addison-Wesley lt/name gt
    lt/publishergt
  • lt/bookgt
  • ltbook year"1998"gt
  • lttitlegt Foundation for Object/Relational
    Databases The Third Manifesto lt/titlegt
  • ltauthorgt ltlastnamegt Date lt/lastnamegt
    lt/authorgt
  • ltauthorgt ltlastnamegt Darwen lt/lastnamegt
    lt/authorgt
  • ltpublishergt ltnamegt Addison-Wesley lt/name gt
    lt/publishergt
  • lt/bookgt
  • lt/bibgt

14
XML-QL (continued)
  • ltresultgt
  • ltauthorgt ltlastnamegt Date lt/lastnamegt
    lt/authorgt
  • lttitlegt An Introduction to Database Systems
    lt/titlegt
  • lt/resultgt
  • ltresultgt
  • ltauthorgt ltlastnamegt Date lt/lastnamegt
    lt/authorgt
  • lttitlegt Foundation for Object/Relational
    Databases The Third Manifesto lt/titlegt
  • lt/resultgt
  • ltresultgt
  • ltauthorgt ltlastnamegt Darwen lt/lastnamegt
    lt/authorgt
  • lttitlegt Foundation for Object/Relational
    Databases The Third Manifesto lt/titlegt
  • lt/resultgt

15
XML-QL (continued)
  • WHERE ltbook gt plt/gt IN "www.a.b.c/bib.xml",
  • lttitle gt tlt/gt,
  • ltpublishergtltnamegtAddison-Wesleylt/gtgt IN p
  • CONSTRUCT ltresultgt
  • lttitlegt t lt/gt
  • WHERE ltauthorgt a lt/gt IN p
  • CONSTRUCT ltauthorgt alt/gt
  • lt/gt

16
XML-QL (continued)
  • ltresultgt
  • lttitlegt An Introduction to Database Systems
    lt/titlegt
  • ltauthorgt ltlastnamegt Date lt/lastnamegt
    lt/authorgt
  • lt/resultgt
  • ltresultgt
  • lttitlegt Foundation for Object/Relational
    Databases The Third Manifesto lt/titlegt
  • ltauthorgt ltlastnamegt Date lt/lastnamegt
    lt/authorgt
  • ltauthorgt ltlastnamegt Darwen lt/lastnamegt
    lt/authorgt
  • lt/resultgt

17
XML-QL (continued)
  • WHERE ltarticlegt
  • ltauthorgt
  • ltfirstnamegt f lt/gt // firstname f
  • ltlastnamegt l lt/gt // lastname l
  • lt/gt
  • lt/gt CONTENT_AS a IN "www.a.b.c/bib.xml"
  • ltbook yearygt
  • ltauthorgt
  • ltfirstnamegt f lt/gt // join on same
    firstname f
  • ltlastnamegt l lt/gt // join on same
    lastname l
  • lt/gt
  • lt/gt IN "www.a.b.c/bib.xml",
  • y gt 1995
  • CONSTRUCT ltarticlegt a lt/gt

18
XML-QL (continued)
19
XML-QL (continued)
  • lt!ATTLIST person ID ID REQUIREDgt
  • lt!ATTLIST article author IDREFS IMPLIEDgt

20
XML-QL (continued)
  • ltperson ID"o123"gt
  • ltfirstnamegtJohnlt/firstnamegt
  • ltlastnamegtSmithltlastnamegt
  • lt/persongt
  • ltperson ID"o234"gt
  • . . .
  • lt/persongt
  • ltarticle author"o123 o234"gt
  • lttitlegt ... lt/titlegt
  • ltyeargt 1995 lt/yeargt
  • lt/articlegt

21
XML-QL (continued)
22
XML-QL (continued)
  • WHERE ltarticlegtltauthorgtltlastnamegt nlt/gtlt/gtlt/gt IN
    "abc.xml

WHERE ltarticle authorigt lttitlegt lt/gt
ELEMENT_AS t lt/gt, ltperson IDigt
ltlastnamegt lt/gt ELEMENT_AS l
lt/gt CONSTRUCT ltresultgt t llt/gt
23
Scalar values
  • lttitlegtA Trip to lttitlepartgt the Moon
    lt/titlepartgtlt/titlegt NOT!

lttitlegtltCDATAgt A Trip to lt/CDATAgtlttitlepartgtltCDATA
gt the Moonlt/CDATAgtlt/titlepartgtlt/titlegt
YES
24
Tag variables
  • WHERE ltpgt
  • lttitlegt t lt/titlegt
  • ltyeargt1995lt/gt
  • ltegt Smith lt/gt
  • lt/gt IN "www.a.b.c/bib.xml",
  • e IN author, editor
  • CONSTRUCT ltpgt
  • lttitlegt t lt/titlegt
  • ltegt Smith lt/gt
  • lt/gt

25
Transforming data
  • lt!ELEMENT book (author, title, publisher)gt
  • lt!ATTLIST book year CDATAgt
  • lt!ELEMENT article (author, title, year?,
    (shortversionlongversion))gt
  • lt!ATTLIST article type CDATAgt
  • lt!ELEMENT publisher (name, address)gt
  • lt!ELEMENT author (firstname?, lastname)gt

lt!ELEMENT person (lastname, firstname,
address?, phone?, publicationtitle)gt
26
Transforming data (contd)
  • WHERE ltgt ltauthorgt ltfirstnamegt fn lt/gt
  • ltlastnamegt ln lt/gt
  • lt/gt
  • lttitlegt t lt/gt
  • lt/gt IN "www.a.b.c/bib.xml",
  • CONSTRUCT ltperson IDPersonID(fn, ln)gt
  • ltfirstnamegt fn lt/gt
  • ltlastnamegt ln lt/gt
  • ltpublicationtitlegt t lt/gt
  • lt/gt

27
Integrating data from different sources
  • WHERE ltpersongt
  • ltnamegtlt/gt ELEMENT_AS n
  • ltssngt ssnlt/gt
  • lt/gt IN "www.a.b.c/data.xml",
  • lttaxpayergt
  • ltssngt ssnlt/gt
  • ltincomegtlt/gt ELEMENT_AS i
  • lt/gt IN "www.irs.gov/taxpayers.xml"
  • CONSTRUCT ltresultgt n i lt/gt

28
Query blocks
  • WHERE ltegt lttitlegt t lt/gt
  • ltyeargt 1995 lt/gt lt/gt CONTENT_A p
  • IN "www.a.b.c/bib.xml"
  • CONSTRUCT ltresult IDResultID(p)gt lttitlegt t
    lt/gt lt/gt
  • WHERE e "journal-paper",
  • ltmonthgt m lt/gt IN p
  • CONSTRUCT ltresult IDResultID(p)gt ltmonthgt
    m lt/gt lt/gt
  • WHERE e "book",
  • ltpublishergtq lt/gt IN p
  • CONSTRUCT ltresult IDResultID(p)gt
    ltpublishergtq lt/gt lt/gt

29
WSQ
30
Web-supported queries
  • SIGMOD2000 (Goldman and Widom)
  • WebPages (SearchExp,T1,T2,,Tn,URL,Rank, Date)

SELECT NAME, COUNTFROM STATES, WEBCOUNTWHERE
NAME T1ORDER BY COUNT DESC
31
XHTML
32
Simple example
  • lt?xml version"1.0" encoding"UTF-8"?gt
  • lt!DOCTYPE html
  • PUBLIC "-//W3C//DTD XHTML
    1.0 Strict//EN"
  • "DTD/xhtml1-strict.dtd"gt
  • lthtml xmlns"http//www.w3.org/199
    9/xhtml" xmllang"en" lang"en"gt
  • ltheadgt
  • lttitlegtVirtual
    Librarylt/titlegt
  • lt/headgt
  • ltbodygt
  • ltpgtMoved to lta
    href"http//vlib.org/"gtvlib.orglt/agt.lt/pgt
  • lt/bodygt
  • lt/htmlgt

33
SI 760Language and information(Fall 2000)
34
SI 760 (1)
  • Classes 1-3 Introduction to the course and
    linguistic background
  • The study of language. Computational Linguistics
    and Psycholinguistics.
  • Classes 4-5 Elementary probability and statistics
  • Describing data. Measures of central tendency.
    The z score. Hypothesis testing.
  • Classes 6-8 Information theory
  • Entropy, joint entropy, conditional entropy.
    Relative entropy and mutual information. Chain
    rules.
  • Classes 9-10 Data compression and coding
  • Entropy rate. Language modeling. Examples of
    codes. Optimal codes. Huffman codes. Arithmetic
    coding. The entropy of English.

35
SI 760 (2)
  • Classes 11-12 Clustering
  • Cluster analysis. Clustering of terms according
    to semantic similarity. Distributional
    clustering.
  • Classes 13-14 Concordancing and collocations
  • Concordances. Collocations. Syntactic criteria
    for collocability.
  • Classes 15-16 Literary detective work
  • The statistical analysis of writing style.
    Decipherment and translation.
  • Classes 17-18 Information extraction
  • Message understanding. Trainable methods.

36
SI 760 (3)
  • Classes 19-20 Word sense disambiguation and
    lexical acquisition
  • Supervised disambiguation. Unsupervised
    disambiguation. Attachment ambiguity.
    Computational lexicography.
  • Classes 21-22 Part-of-speech tagging
  • Statistical taggers. Transformation-based
    learning of tags. Maximum entropy models.
    Weighted finite- state transducers.
  • Classes 23-24 Question answering
  • Semantic representation. Predictive annotation.

37
SI 760 (4)
  • Classes 25-26 Text summarization
  • Single-document summarization. Multi-document
    summarization. Language models. Maximal Marginal
    Relevance. Cross-document structure theory.
    Trainable methods. Text categorization.
  • Classes 27-28 (30) Other topics
  • Text alignment. Word alignment. Statistical
    machine translation. Discourse segmentation. Text
    categorization. Maximum entropy modeling.

38
SI 760 (5)
  • Manning and Schuetze. Foundations of Statistical
    Natural Language Processing. MIT Press. 1999.
  • Jurafsky and Martin. Speech and Language
    Processing. Prentice-Hall 2000.
  • Cover Thomas. Elements of Information Theory.
    John Wiley and Sons 1991.
  • Baeza-Yates and Ribeiro-Neto. Modern Information
    Retrieval. Addison-Wesley 1999.
  • Oakes. Statistics for Corpus Linguistics.
    Edinburgh University Press 1998.

39
Course URL
  • http//www.si.umich.edu/radev/760f00

40
Readings for next time
  • Web-based readings
  • Asilomar report
  • http//www.acm.org/sigmod/record/issues/9812/asilo
    mar.html
  • White paper on XML
  • http//www-db.stanford.edu/widom/xml-whitepaper.h
    tml
Write a Comment
User Comments (0)
About PowerShow.com