C20.0046: Database Management Systems Lecture - PowerPoint PPT Presentation

1 / 47
About This Presentation
Title:

C20.0046: Database Management Systems Lecture

Description:

http://wwwasdoc.web.cern.ch/wwwasdoc/WWW/publications/ictp99/ictp99N8059.ht ml ... Based on Quilt and XML-QL. FOR/LET... WHERE... RETURN... – PowerPoint PPT presentation

Number of Views:110
Avg rating:3.0/5.0
Slides: 48
Provided by: pagesSt
Category:

less

Transcript and Presenter's Notes

Title: C20.0046: Database Management Systems Lecture


1
C20.0046 Database Management SystemsLecture 24
  • Matthew P. Johnson
  • Stern School of Business, NYU
  • Spring, 2004

2
Agenda
  • Previously XML
  • Next
  • Finish XML related technologies
  • Hardware
  • Indices
  • Hw3 up soon
  • 1-minute responses
  • Grading

3
XML Applications/dialects
  • Copy from http//pages.stern.nyu.edu/mjohnson/db
    ms/eg/xml.txt
  • MathML Mathematical Markup Language
  • http//wwwasdoc.web.cern.ch/wwwasdoc/WWW/publicati
    ons/ictp99/ictp99N8059.html
  • ChemML Chemical Markup Language
  • X4ML XML for Merrill Lynch
  • XHMTL HTML retrofitted as an XML application
  • Validation http//pages.stern.nyu.edu/mjohnson/d
    bms/

4
XML Applications/dialects
  • VoiceXML
  • http//newmedia.purchase.edu/Jeanine/interfaces/r
    ps.xml
  • ATT Directory Assistance
  • http//phone.yahoo.com/

5
More XML Apps
  • FIXML
  • XML equiv. of FIX Financial Information eXchange
  • swiftML
  • XML equiv. of SWIFT Society for Worldwide
    Interbank Financial Telecommunications message
    format
  • Apaches Ant
  • Scripting language for Java build management
  • http//ant.apache.org/manual/using.html
  • Many more
  • http//www-106.ibm.com/developerworks/xml/library/
    x-stand4/

6
More XML Applications/Protocols
  • RSS Rich Site Summary/Really Simple Syndication
  • http//slate.msn.com/rss/
  • http//slashdot.org/index.rss
  • Screenshot
  • http//paulboutin.weblogger.com/pictures/viewer67
    3
  • More info http//slate.msn.com/id/2096660/

ltchannelgt lttitlegtmy channellt/titlegt ltitemgt
lttitlegtstory 1lt/titlegt ltlinkgtlt/linkgt lt/itemgt //
other items lt/channelgt
7
More XML Applications/Protocols
  • SOAP Simple Object Access Protocol
  • XML-based messaging format
  • Used by Google API http//www.google.com/apis/
  • Amazon API http//amazon.com/gp/aws/landing.html
  • Amazon light http//kokogiak.com/amazon/
  • Other examples http//www.wired.com/wired/archive
    /12.03/google.html?pg10topictopic_set
  • SOAP envelope with header and body
  • Request sales tax for total

ltSOAPEnvelope xmlnsSOAP"urnschemas-xmlsoap-or
gsoap.v1"gt ltSOAPHeadergtlt/SOAPHeadergt
ltSOAPBodygt ltGetSalesTaxgt
ltSalesTotalgt100lt/SalesTotalgt
ltGetSalesTaxgt lt/SOAPBodygt lt/SOAPEnvelopegt
8
More XML Applications/Protocols
  • lt?xml version"1.0" encoding"UTF-8"?gt
  • ltsoapEnvelope xmlnssoap"http//schemas.xmlsoap.
    org/soap/envelope/"gt
  • ltsoapBodygt
  • ltgsdoGoogleSearch xmlnsgs"urnGoogleSearch"
    gt
  • ltkeygt(key)slt/keygt
  • ltstartgt0lt/startgt
  • ltmaxResultsgt10lt/maxResultsgt
  • ltfiltergttruelt/filtergt
  • ltrestrict/gt
  • ltsafeSearchgtfalselt/safeSearchgt
  • ltlr/gt
  • lt/gsdoGoogleSearchgt
  • lt/soapBodygt
  • lt/soapEnvelopegt

9
RDF
  • RDF Resource Definition Framework
  • Describe info on web
  • Metadata for the web
  • Content, authors, relations to other content
  • Semantic web
  • See http//www.w3.org/DesignIssues/RDFnot.html

10
New topic Querying XML
  • XPath
  • Simple protocol for accessing node
  • Wont discuss
  • XQuery SQL of XML
  • XSLT sophisticated transformations

11
XQuery
  • XQuery FLWR expressions
  • Based on Quilt and XML-QL

FOR/LET... WHERE... RETURN...
FOR b IN document("bib.xml")//book WHERE
b/publisher "Morgan Kaufmann" AND b/year
"1998" RETURN b/title
12
XQuery
  • Find all book titles published after 1995

FOR x IN document("bib.xml")/bib/book WHERE
x/year gt 1995 RETURN x/title
Result lttitlegt abc lt/titlegt lttitlegt def
lt/titlegt lttitlegt ghi lt/titlegt
13
SQL and XQuery Side-by-side
Product(pid, name, maker)Company(cid, name, city)
Find all products made in Seattle
FOR r in document(db.xml)/db, x in
r/Product/row, y in r/Company/rowWHER
E x/maker/text()y/cid/text() and
y/city/text() SeattleRETURN x/name
SELECT x.nameFROM Product x, Company yWHERE
x.makery.cid and y.citySeattle
SQL
XQuery
14
SQL and XQuery Side-by-side
For each company with revenues lt 1M count the
products over 100
SELECT y.name, count()FROM Product x, Company
yWHERE x.price gt 100 and x.makery.cid and
y.revenue lt 1000000GROUP BY y.cid, y.name
FOR r in document(db.xml)/db, y in
r/Company/rowrevenue/text()lt1000000RETURN
ltproudCompanygt ltcompanyNamegt
y/name/text() lt/companyNamegt
ltnumberOfExpensiveProductsgt
count(r/Product/rowmaker/text()y/cid/text()p
rice/text()gt100) lt/numberOfExpensive
Productsgt lt/proudCompanygt
15
XSLT XST Transformations
  • Converts XML docs to other XML docs
  • Or to HTML, PDF, etc.
  • E.g. Have data in XML, want to display to all
    users
  • Users view web with IE, Netscape, Palm
  • Have XSLT convert to HTML that looks good on each
  • XSLT processor takes XML doc and XSL template for
    view

16
Querying XML with XQuery
  • FLWR expressions
  • Often much simpler than XSLT
  • XSLT v. XQuery
  • http//www.xmlportfolio.com/xquery.html

ltxsltransform version"1.0" xmlnsxsl"http//www
.w3.org/1999/XSL/Transform"gt ltxsltemplate
match"/"gt ltxslfor-each select"document('bib
.xml')//book"gt ltxslif test"publisher'Morg
an Kaufmann' and year'1998'"gt
ltxslcopy-of select"title"/gt lt/xslifgt
lt/xslfor-eachgt lt/xsltemplategt lt/xsltransformgt

FOR b IN document("bib.xml")//book WHERE
b/publisher "Morgan Kaufmann" AND b/year
"1998" RETURN b/title
17
Displaying XML with XSL/XSLT
  • XSL style sheet language for XML
  • As CSS is for HTML
  • Menu in XML
  • http//www.w3schools.com/xml/simple.xml
  • XSL file for displaying it
  • http//www.w3schools.com/xml/simple.xsl
  • XSL applied to the XML
  • http//www.w3schools.com/xml/simplexsl.xml
  • More info on Java with XSLT and Xpath
  • http//java.sun.com/webservices/docs/ea2/tutorial/
    doc/JAXPXSLT2.html

18
Why XML matters
  • Hugely popular
  • To millennium what Java was to mid-90s
  • Buzzword compliant
  • XML databases wont likely replace RDBMSs
    (remember OODBMSs?), but
  • Allows for comm. between DBMSs disparate
    architectures, tools, languages, etc.
  • Basis for Web Services
  • DBMS vendors are adding XML support
  • MS, Oracle, et al.

19
For more info
  • APIs SAX, JAXP
  • Editors XML Spy, MS XML Notepad
    http//www.webattack.com/get/xmlnotepad.shtml
  • Parsers Saxon, Xalan, MS XML Parser
  • Lecture drew on resources from
  • Nine-week course on XML
  • http//www.cs.rpi.edu/puninj/XMLJ/classes.html
  • W3C XML Tutorial
  • http//www.w3schools.com/xml/default.asp
  • http//www.cs.cornell.edu/courses/cs433/2001fa/Sli
    des/Xml,20XPath,2020Xslt.ppt

20
Next topic Hardware
  • Types of memory
  • Disks
  • Mergesort/TPMMS

21
What should a DBMS do?
  • Store large amounts of data
  • Process queries efficiently
  • Allow multiple users to access the database
    concurrently and safely.
  • Provide durability of the data.
  • How will we do all this?

22
Lets get physical
Query update
User/ Application
Query compiler/optimizer
Query execution plan
Record, index requests
Transaction commands
Execution engine
Index/record mgr.
  • Transaction manager
  • Concurrency control
  • Logging/recovery

Page commands
Buffer manager
Read/write pages
Storage manager
storage
23
Types of memory
Main Memory Disk
Tape
  • 5-10 MB/S
  • transmission rates
  • 100s GB storage
  • average time to
  • access a block
  • 10-15 msecs.
  • Need to consider
  • seek, rotation,
  • transfer times.
  • Keep records close
  • to each other.
  • 1.5 MB/S transfer rate
  • 280 GB typical
  • capacity
  • Only sequential access
  • Not for operational
  • data
  • Volatile
  • limited address
  • spaces
  • expensive
  • average access
  • time
  • 10-100 ns

Cache access time 10 nanos
24
Main Memory
  • Fastest, most expensive
  • Today O(1 GB) are common on PCs
  • Some databases could fit in memory
  • New industry trend Main Memory Database
  • But many cannot
  • RAM is volatile and small
  • Still need to store on disk

25
Secondary Storage
  • Disks
  • Slower, cheaper than main memory
  • Persistent!
  • Used with a main memory buffer

26
200 worth of disk space
27
Buffer Management in a DBMS
Page Requests from Higher Levels
BUFFER POOL
disk page
free frame
MAIN MEMORY
DISK
choice of frame dictated by replacement policy
  • Data must be in RAM for DBMS to operate on it!
  • Table of ltframe, pageidgt pairs is maintained.
  • LRU is not always good.

28
Buffer Manager
  • Why not just use the OS?
  • DBMS may be able to anticipate access patterns
  • Hence, may also be able to perform prefetching
  • DBMS needs the ability to force pages to disk.

29
Tertiary Storage
  • CDs, DVDs, jukeboxes
  • ROM
  • Tapes, tape silos
  • sequential access
  • Bi but very slow
  • long term archiving only

30
The Mechanics of Disk
  • Mechanical characteristics
  • Rotation speed (5400RPM)
  • Number of platters (1-30)
  • Number of tracks (lt10000)
  • Number of bytes/track(105)

Cylinder
Spindle
Disk head
Sector
Platters
31
Disk Access Characteristics
  • Disk latency time between when command is
    issued and when data is in memory
  • Disk latency seek time rotational latency
  • Seek time time for the head to reach cylinder
  • 10ms 40ms
  • Rotational latency time for the sector to
    rotate
  • Rotation time 10ms
  • Average latency 10ms/2
  • Transfer time typically 40MB/s
  • Disks read/write one block at a time (typically
    4kB)

32
A little CS
  • In main memory CPU time
  • Big O notation !
  • In databases time is dominated by I/O cost
  • Big O too, but for I/Os
  • Often big O becomes a constant
  • ? The I/O Model of Computation
  • Consequence need to redesign certain algorithms

33
Mergesort
  • Alg
  • E.g.
  • Complexity

34
Sorting
  • Problem sort 1 GB of data with 1MB of RAM.
  • Where we need this
  • Data requested in sorted order (ORDER BY)
  • Needed for grouping operations
  • First step in sort-merge join algorithm
  • Duplicate removal
  • Bulk loading of B-tree indexes.

35
Two-Way Merge-sort
  • Requires 3 Buffers in RAM
  • Pass 1 Read a page, sort it, write it.
  • Pass 2, 3, , etc. merge two runs, write them

36
Two-Way External Merge Sort
  • Assume block size is B 4Kb
  • Step 1 ? runs of length L 4Kb
  • Step 2 ? runs of length L 8Kb
  • Step 3 ? runs of length L 16Kb 23-1 4Kb
  • Step 9 ? runs of length L 1MB
  • Step 19 ? runs of length L 1GB (why?)

Need 19 iterations over the disk data to sort 1GB
37
Can we do better?
38
Large Two-Way External Merge Sort
  • We've got a meg!
  • Divide RAM into thirds
  • Read, write in blocks of 333kb
  • How much improvement?

39
Can we do better?
40
Cost Model for Our Analysis
  • B Block size ( 4KB)
  • M Size of main memory ( 1MB)
  • N Number of records in the file
  • R Size of one record

41
External Merge-Sort
  • Phase one load M bytes in memory, sort
  • Result SIZE/M lists of length M bytes (1MB)

42
Phase Two
  • Merge M/B 1 lists into a new list
  • M/B-1 1MB / 4kb -1 250
  • Result lists of size M (M/B 1) bytes
  • 249 1MB 250 MB

Input 1
. . .
. . .
Input 2
Output
. . . .
Input M/B
Disk
Disk
M bytes of main memory
43
Phase Three
  • Merge M/B 1 lists into a new list
  • Result lists of size M(M/B 1)2 bytes
  • 249 250 MB 62,500 MB 625 GB

44
Cost of External Merge Sort
  • Number of passes
  • How much data can we sort with 1MB RAM?
  • 1 pass ? 1MB
  • 2 passes ? 250MB (M/B 250)
  • 3 passes ? 625GB
  • Time
  • assume read/write block 10 ms .01 s
  • eac pass read, write all data
  • eac pass 2625GB/4kb.01s 21562500s
    226041m 2434 218 days 36 days

45
Cost of External Merge Sort
  • Number of passes
  • How much data can we sort with 10MB RAM (M/B
    2500)?
  • 1 pass ? 10MB
  • 2 passes ? 10MB 2500 25,000MB 25GB
  • 3 passes ? 2500 25GB 62,500GB

46
Cost of External Merge Sort
  • Number of passes
  • How much data can we sort with 100MB RAM (M/B
    25,000)?
  • 1 pass ? 100MB
  • 2 passes ? 100MB 25,000 2,500,000MB 2,500GB
    2.5TB
  • 3 passes ? 25,000 2.5TB 62,500TB 62.5PB

47
Next time
  • Next Indices
  • For next time reading from chapter 13 posted
    today
  • Hw3 up soon
  • Now one-minute responses
Write a Comment
User Comments (0)
About PowerShow.com