Title: C20.0046: Database Management Systems Lecture
1C20.0046 Database Management SystemsLecture 24
- Matthew P. Johnson
- Stern School of Business, NYU
- Spring, 2004
2Agenda
- Previously XML
- Next
- Finish XML related technologies
- Hardware
- Indices
- Hw3 up soon
- 1-minute responses
- Grading
3XML Applications/dialects
- Copy from http//pages.stern.nyu.edu/mjohnson/db
ms/eg/xml.txt - MathML Mathematical Markup Language
- http//wwwasdoc.web.cern.ch/wwwasdoc/WWW/publicati
ons/ictp99/ictp99N8059.html - ChemML Chemical Markup Language
- X4ML XML for Merrill Lynch
- XHMTL HTML retrofitted as an XML application
- Validation http//pages.stern.nyu.edu/mjohnson/d
bms/
4XML Applications/dialects
- VoiceXML
- http//newmedia.purchase.edu/Jeanine/interfaces/r
ps.xml - ATT Directory Assistance
- http//phone.yahoo.com/
5More XML Apps
- FIXML
- XML equiv. of FIX Financial Information eXchange
- swiftML
- XML equiv. of SWIFT Society for Worldwide
Interbank Financial Telecommunications message
format - Apaches Ant
- Scripting language for Java build management
- http//ant.apache.org/manual/using.html
- Many more
- http//www-106.ibm.com/developerworks/xml/library/
x-stand4/
6More XML Applications/Protocols
- RSS Rich Site Summary/Really Simple Syndication
- http//slate.msn.com/rss/
- http//slashdot.org/index.rss
- Screenshot
- http//paulboutin.weblogger.com/pictures/viewer67
3 - More info http//slate.msn.com/id/2096660/
ltchannelgt lttitlegtmy channellt/titlegt ltitemgt
lttitlegtstory 1lt/titlegt ltlinkgtlt/linkgt lt/itemgt //
other items lt/channelgt
7More XML Applications/Protocols
- SOAP Simple Object Access Protocol
- XML-based messaging format
- Used by Google API http//www.google.com/apis/
- Amazon API http//amazon.com/gp/aws/landing.html
- Amazon light http//kokogiak.com/amazon/
- Other examples http//www.wired.com/wired/archive
/12.03/google.html?pg10topictopic_set - SOAP envelope with header and body
- Request sales tax for total
ltSOAPEnvelope xmlnsSOAP"urnschemas-xmlsoap-or
gsoap.v1"gt ltSOAPHeadergtlt/SOAPHeadergt
ltSOAPBodygt ltGetSalesTaxgt
ltSalesTotalgt100lt/SalesTotalgt
ltGetSalesTaxgt lt/SOAPBodygt lt/SOAPEnvelopegt
8More XML Applications/Protocols
- lt?xml version"1.0" encoding"UTF-8"?gt
- ltsoapEnvelope xmlnssoap"http//schemas.xmlsoap.
org/soap/envelope/"gt - ltsoapBodygt
- ltgsdoGoogleSearch xmlnsgs"urnGoogleSearch"
gt - ltkeygt(key)slt/keygt
- ltstartgt0lt/startgt
- ltmaxResultsgt10lt/maxResultsgt
- ltfiltergttruelt/filtergt
- ltrestrict/gt
- ltsafeSearchgtfalselt/safeSearchgt
- ltlr/gt
- lt/gsdoGoogleSearchgt
- lt/soapBodygt
- lt/soapEnvelopegt
9RDF
- RDF Resource Definition Framework
- Describe info on web
- Metadata for the web
- Content, authors, relations to other content
- Semantic web
- See http//www.w3.org/DesignIssues/RDFnot.html
10New topic Querying XML
- XPath
- Simple protocol for accessing node
- Wont discuss
- XQuery SQL of XML
- XSLT sophisticated transformations
11XQuery
- XQuery FLWR expressions
- Based on Quilt and XML-QL
FOR/LET... WHERE... RETURN...
FOR b IN document("bib.xml")//book WHERE
b/publisher "Morgan Kaufmann" AND b/year
"1998" RETURN b/title
12XQuery
- Find all book titles published after 1995
FOR x IN document("bib.xml")/bib/book WHERE
x/year gt 1995 RETURN x/title
Result lttitlegt abc lt/titlegt lttitlegt def
lt/titlegt lttitlegt ghi lt/titlegt
13SQL and XQuery Side-by-side
Product(pid, name, maker)Company(cid, name, city)
Find all products made in Seattle
FOR r in document(db.xml)/db, x in
r/Product/row, y in r/Company/rowWHER
E x/maker/text()y/cid/text() and
y/city/text() SeattleRETURN x/name
SELECT x.nameFROM Product x, Company yWHERE
x.makery.cid and y.citySeattle
SQL
XQuery
14SQL and XQuery Side-by-side
For each company with revenues lt 1M count the
products over 100
SELECT y.name, count()FROM Product x, Company
yWHERE x.price gt 100 and x.makery.cid and
y.revenue lt 1000000GROUP BY y.cid, y.name
FOR r in document(db.xml)/db, y in
r/Company/rowrevenue/text()lt1000000RETURN
ltproudCompanygt ltcompanyNamegt
y/name/text() lt/companyNamegt
ltnumberOfExpensiveProductsgt
count(r/Product/rowmaker/text()y/cid/text()p
rice/text()gt100) lt/numberOfExpensive
Productsgt lt/proudCompanygt
15XSLT XST Transformations
- Converts XML docs to other XML docs
- Or to HTML, PDF, etc.
- E.g. Have data in XML, want to display to all
users - Users view web with IE, Netscape, Palm
- Have XSLT convert to HTML that looks good on each
- XSLT processor takes XML doc and XSL template for
view
16Querying XML with XQuery
- FLWR expressions
- Often much simpler than XSLT
- XSLT v. XQuery
- http//www.xmlportfolio.com/xquery.html
ltxsltransform version"1.0" xmlnsxsl"http//www
.w3.org/1999/XSL/Transform"gt ltxsltemplate
match"/"gt ltxslfor-each select"document('bib
.xml')//book"gt ltxslif test"publisher'Morg
an Kaufmann' and year'1998'"gt
ltxslcopy-of select"title"/gt lt/xslifgt
lt/xslfor-eachgt lt/xsltemplategt lt/xsltransformgt
FOR b IN document("bib.xml")//book WHERE
b/publisher "Morgan Kaufmann" AND b/year
"1998" RETURN b/title
17Displaying XML with XSL/XSLT
- XSL style sheet language for XML
- As CSS is for HTML
- Menu in XML
- http//www.w3schools.com/xml/simple.xml
- XSL file for displaying it
- http//www.w3schools.com/xml/simple.xsl
- XSL applied to the XML
- http//www.w3schools.com/xml/simplexsl.xml
- More info on Java with XSLT and Xpath
- http//java.sun.com/webservices/docs/ea2/tutorial/
doc/JAXPXSLT2.html
18Why XML matters
- Hugely popular
- To millennium what Java was to mid-90s
- Buzzword compliant
- XML databases wont likely replace RDBMSs
(remember OODBMSs?), but - Allows for comm. between DBMSs disparate
architectures, tools, languages, etc. - Basis for Web Services
- DBMS vendors are adding XML support
- MS, Oracle, et al.
19For more info
- APIs SAX, JAXP
- Editors XML Spy, MS XML Notepad
http//www.webattack.com/get/xmlnotepad.shtml - Parsers Saxon, Xalan, MS XML Parser
- Lecture drew on resources from
- Nine-week course on XML
- http//www.cs.rpi.edu/puninj/XMLJ/classes.html
- W3C XML Tutorial
- http//www.w3schools.com/xml/default.asp
- http//www.cs.cornell.edu/courses/cs433/2001fa/Sli
des/Xml,20XPath,2020Xslt.ppt
20Next topic Hardware
- Types of memory
- Disks
- Mergesort/TPMMS
21What should a DBMS do?
- Store large amounts of data
- Process queries efficiently
- Allow multiple users to access the database
concurrently and safely. - Provide durability of the data.
- How will we do all this?
22 Lets get physical
Query update
User/ Application
Query compiler/optimizer
Query execution plan
Record, index requests
Transaction commands
Execution engine
Index/record mgr.
- Transaction manager
- Concurrency control
- Logging/recovery
Page commands
Buffer manager
Read/write pages
Storage manager
storage
23Types of memory
Main Memory Disk
Tape
- 5-10 MB/S
- transmission rates
- 100s GB storage
- average time to
- access a block
- 10-15 msecs.
- Need to consider
- seek, rotation,
- transfer times.
- Keep records close
- to each other.
- 1.5 MB/S transfer rate
- 280 GB typical
- capacity
- Only sequential access
- Not for operational
- data
- Volatile
- limited address
- spaces
- expensive
- average access
- time
- 10-100 ns
Cache access time 10 nanos
24Main Memory
- Fastest, most expensive
- Today O(1 GB) are common on PCs
- Some databases could fit in memory
- New industry trend Main Memory Database
- But many cannot
- RAM is volatile and small
- Still need to store on disk
25Secondary Storage
- Disks
- Slower, cheaper than main memory
- Persistent!
- Used with a main memory buffer
26200 worth of disk space
27Buffer Management in a DBMS
Page Requests from Higher Levels
BUFFER POOL
disk page
free frame
MAIN MEMORY
DISK
choice of frame dictated by replacement policy
- Data must be in RAM for DBMS to operate on it!
- Table of ltframe, pageidgt pairs is maintained.
- LRU is not always good.
28Buffer Manager
- Why not just use the OS?
- DBMS may be able to anticipate access patterns
- Hence, may also be able to perform prefetching
- DBMS needs the ability to force pages to disk.
29Tertiary Storage
- CDs, DVDs, jukeboxes
- ROM
- Tapes, tape silos
- sequential access
- Bi but very slow
- long term archiving only
30The Mechanics of Disk
- Mechanical characteristics
- Rotation speed (5400RPM)
- Number of platters (1-30)
- Number of tracks (lt10000)
- Number of bytes/track(105)
Cylinder
Spindle
Disk head
Sector
Platters
31Disk Access Characteristics
- Disk latency time between when command is
issued and when data is in memory - Disk latency seek time rotational latency
- Seek time time for the head to reach cylinder
- 10ms 40ms
- Rotational latency time for the sector to
rotate - Rotation time 10ms
- Average latency 10ms/2
- Transfer time typically 40MB/s
- Disks read/write one block at a time (typically
4kB)
32A little CS
- In main memory CPU time
- Big O notation !
- In databases time is dominated by I/O cost
- Big O too, but for I/Os
- Often big O becomes a constant
- ? The I/O Model of Computation
- Consequence need to redesign certain algorithms
33Mergesort
34Sorting
- Problem sort 1 GB of data with 1MB of RAM.
- Where we need this
- Data requested in sorted order (ORDER BY)
- Needed for grouping operations
- First step in sort-merge join algorithm
- Duplicate removal
- Bulk loading of B-tree indexes.
35Two-Way Merge-sort
- Requires 3 Buffers in RAM
- Pass 1 Read a page, sort it, write it.
- Pass 2, 3, , etc. merge two runs, write them
36Two-Way External Merge Sort
- Assume block size is B 4Kb
- Step 1 ? runs of length L 4Kb
- Step 2 ? runs of length L 8Kb
- Step 3 ? runs of length L 16Kb 23-1 4Kb
-
- Step 9 ? runs of length L 1MB
-
- Step 19 ? runs of length L 1GB (why?)
Need 19 iterations over the disk data to sort 1GB
37Can we do better?
38Large Two-Way External Merge Sort
- We've got a meg!
- Divide RAM into thirds
- Read, write in blocks of 333kb
- How much improvement?
39Can we do better?
40Cost Model for Our Analysis
- B Block size ( 4KB)
- M Size of main memory ( 1MB)
- N Number of records in the file
- R Size of one record
41External Merge-Sort
- Phase one load M bytes in memory, sort
- Result SIZE/M lists of length M bytes (1MB)
42Phase Two
- Merge M/B 1 lists into a new list
- M/B-1 1MB / 4kb -1 250
- Result lists of size M (M/B 1) bytes
- 249 1MB 250 MB
Input 1
. . .
. . .
Input 2
Output
. . . .
Input M/B
Disk
Disk
M bytes of main memory
43Phase Three
- Merge M/B 1 lists into a new list
- Result lists of size M(M/B 1)2 bytes
- 249 250 MB 62,500 MB 625 GB
44Cost of External Merge Sort
- Number of passes
- How much data can we sort with 1MB RAM?
- 1 pass ? 1MB
- 2 passes ? 250MB (M/B 250)
- 3 passes ? 625GB
- Time
- assume read/write block 10 ms .01 s
- eac pass read, write all data
- eac pass 2625GB/4kb.01s 21562500s
226041m 2434 218 days 36 days
45Cost of External Merge Sort
- Number of passes
- How much data can we sort with 10MB RAM (M/B
2500)? - 1 pass ? 10MB
- 2 passes ? 10MB 2500 25,000MB 25GB
- 3 passes ? 2500 25GB 62,500GB
46Cost of External Merge Sort
- Number of passes
- How much data can we sort with 100MB RAM (M/B
25,000)? - 1 pass ? 100MB
- 2 passes ? 100MB 25,000 2,500,000MB 2,500GB
2.5TB - 3 passes ? 25,000 2.5TB 62,500TB 62.5PB
47Next time
- Next Indices
- For next time reading from chapter 13 posted
today - Hw3 up soon
- Now one-minute responses