Transcript and Presenter's Notes

Title: C20.0046: Database Management Systems Lecture

1
C20.0046 Database Management Systems: Lecture 25
  • Matthew P. Johnson
  • Stern School of Business, NYU
  • Spring, 2004

2
Agenda
  • Previously: Hardware, sorting
  • Next:
  • Indices
  • Failover/recovery
  • Data warehousing & mining
  • Websearch
  • Hw3 due Thursday
  • no extensions!
  • 1-minute responses
  • XML links up

3
Let's get physical
(Architecture diagram: the user/application sends queries and updates to the query compiler/optimizer, which hands a query execution plan to the execution engine. The execution engine issues record and index requests to the index/record manager and transaction commands to the transaction manager (concurrency control, logging/recovery). The index/record manager sends page commands to the buffer manager, which reads/writes pages through the storage manager to storage.)

4
Hardware/memory review
  • DBs won't fit in RAM
  • Disk access is O(100,000) times slower than RAM
  • RAM Model of Computation
  • A single op costs about the same as a single memory access
  • I/O Model of Computation
  • We read/write one block (4KB) at a time
  • Measure time in disk accesses
  • Ignore processor operations (O(100,000) times faster)
  • Regular Mergesort
  • Divide in half each time and recurse

5
Hardware/memory review
  • Big problem: how to sort 1GB with 1MB of RAM?
  • Can use regular mergesort, but must read/write all data 19 times
  • Soln: TPMMS (two-phase multiway merge sort, a.k.a. external merge sort); a code sketch follows slide 8
  • Sort data in 1MB chunks
  • Merge 249 of the chunks into a 249MB chunk
  • Merge 249 of the 249MB chunks
  • Each iteration:
  • New chunk size ≈ (RAM size/block size) × last chunk size

6
External Merge-Sort
  • Phase one: load 1MB into memory, sort
  • Result: SIZE/M lists of length M bytes (1MB)

7
Phase Two
  • Merge M/B − 1 lists into a new list
  • M/B − 1 = 1MB/4KB − 1 ≈ 250
  • Result: lists of size M × (M/B − 1) bytes
  • = 249 × 1MB ≈ 250 MB

(Diagram: M/B − 1 input buffers and one output buffer share the M bytes of main memory; sorted sublists are read from disk and the merged list is written back to disk.)
8
Phase Three
  • Merge M/B − 1 lists into a new list
  • Result: lists of size M × (M/B − 1)² bytes
  • ≈ 249 × 250 MB ≈ 62,500 MB ≈ 62.5 GB

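The two phases above can be sketched in a few lines of Python. This is a toy illustration rather than anything from the lecture: run_size stands in for the M bytes of available RAM, heapq.merge plays the role of the M/B − 1 input buffers feeding one output buffer, and rows are modeled as text lines.

    import heapq
    import os
    import tempfile

    def external_merge_sort(input_path, output_path, run_size=1_000_000):
        """Toy two-phase multiway merge sort over the lines of a text file."""
        runs = []

        # Phase 1: read the file in roughly RAM-sized chunks, sort each
        # chunk in memory, and write it out as a sorted run.
        with open(input_path) as f:
            while True:
                chunk = f.readlines(run_size)       # ~run_size bytes of lines
                if not chunk:
                    break
                chunk.sort()
                run = tempfile.NamedTemporaryFile("w+", delete=False)
                run.writelines(chunk)
                run.flush()
                run.seek(0)
                runs.append(run)

        # Phase 2: merge all runs in one pass (assumes the number of runs
        # is at most M/B - 1, i.e. one input buffer per run fits in RAM).
        with open(output_path, "w") as out:
            out.writelines(heapq.merge(*runs))

        for run in runs:
            run.close()
            os.unlink(run.name)

On the slide's numbers (M = 1MB, B = 4KB), phase 2 can merge roughly 250 runs at once, so files up to about 250MB sort in two passes; a further pass (slide 8) handles still larger files.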
9
Next topic: File organization 1
  • Heap files: an unordered list of rows
  • One damn row after another.
  • All-rows queries are easy:
  • SELECT * FROM T
  • Insert is easy: just add to the end
  • Unique/subset queries are hard:
  • Must test each row

10
File organization 2
  • Sorted file: sort rows on some field(s)
  • Since the datafile is likely to be large, must use an external sort like external merge sort
  • Equality and range selects are now easier:
  • Do binary search to find the first match
  • Walk through rows until one fails the test (see the sketch below)
  • Insert and delete are now hard:
  • Must move an average of half the rows forward or back
  • Possible solns:
  • Leave empty space
  • Use overflow pages

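A small Python sketch of "binary search to find the first match, then walk forward until a row fails the test". The row layout and field choice are invented for illustration, and the rows live in a list rather than in disk blocks.

    import bisect

    def range_select(rows, lo, hi):
        """rows are (key, payload) pairs sorted on key; return every row
        with lo <= key <= hi, as in a range select on a sorted file."""
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, lo)            # binary search: first key >= lo
        out = []
        while i < len(rows) and rows[i][0] <= hi:   # walk until the test fails
            out.append(rows[i])
            i += 1
        return out

    rows = [(1, "a"), (3, "b"), (5, "c"), (7, "d"), (9, "e")]
    print(range_select(rows, 3, 7))                 # [(3, 'b'), (5, 'c'), (7, 'd')]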
11
Modifications
  • Insert: file is unsorted ⇒ easy
  • File is sorted:
  • Is there space in the right block?
  • Then store it there
  • If all else fails, create an overflow block
  • Delete: free space in the block
  • May be able to eliminate an overflow block
  • If not, use a tombstone (null record)
  • Update: new record is shorter than prev. ⇒ easy
  • If it's longer, need to shift records, create overflow blocks

12
Overflow Blocks
(Diagram: Block n−1, Block n, Block n+1, with an overflow block chained from Block n.)
  • After a while the file starts being dominated by overflow blocks: time to reorganize

13
File organization 3
  • Datafile (un/sorted) + index
  • Speeds searches based on its fields
  • Any subset/list of the table's fields
  • these are called the search key
  • not to be confused with the table's keys/superkeys
  • Idea: trade disk space for disk time
  • also may cost processor/RAM time
  • Downsides:
  • Takes up more space
  • Must reflect changes in data

14
Classification of indices
  • Primary v. secondary
  • Clustered v. unclustered
  • Dense v. sparse
  • Index data structures
  • B-trees
  • Hash tables
  • More advanced types
  • Function-based indices
  • R-trees
  • Bitmap indices

15
Dense indices
  • Index has an entry for each row
  • NB: index entries are smaller than rows
  • ⇒ more index entries fit per block than rows do

16
Sparse indices
  • Why make sparse?
  • Fewer disk accesses
  • Binary search on a shorter list: log(shorter N) (see the sketch below)
  • Analogy: the thumb index in large dictionaries
  • Trade disk space for RAM space and comp. time
  • May fit in RAM

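A minimal sketch of a sparse-index lookup, with data blocks modeled as small Python lists of sorted (key, value) rows; the block layout and names are invented for illustration.

    import bisect

    def build_sparse_index(blocks):
        """One index entry per block: the smallest key stored in that block."""
        return [block[0][0] for block in blocks]

    def lookup(blocks, index, search_key):
        """Binary-search the (much shorter) index, then scan one block."""
        b = bisect.bisect_right(index, search_key) - 1   # last block whose first key <= search_key
        if b < 0:
            return None
        for key, value in blocks[b]:
            if key == search_key:
                return value
        return None

    # Blocks of (key, value) rows, sorted on key
    blocks = [[(1, "a"), (4, "b")], [(7, "c"), (9, "d")], [(12, "e"), (15, "f")]]
    index = build_sparse_index(blocks)       # [1, 7, 12]
    print(lookup(blocks, index, 9))          # 'd'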
17
Secondary/unclustered indices
  • To index attributes other than the primary key
  • Always dense (why?)

18
Clustered v. unclustered
  • Clustered means data and index are sorted the same way
  • Sorted on the fields the index is indexing
  • Each index entry is stored near its data entry
  • Sparse indices must be clustered
  • Unclustered indices must be dense
  • Clustered indices can reduce disk latency
  • Related data stored together ⇒ less far to go
  • Good for range queries

19
Primary v. secondary
  • Primary indexes
  • usually clustered
  • Only one per table
  • Use PRIMARY KEY
  • Secondary indexes
  • usually unclustered
  • many allowed per table
  • Use UNIQUE or CREATE INDEX

20
Partial key searches
  • Situation: index on fields a1, a2, a3; we search on fields ai, aj
  • When will this work?
  • i and j must be 1 and 2 (in either order)
  • Searched fields must be a prefix of the indexed fields (a sketch follows below)
  • E.g., lastname, firstname in a phone book
  • Index must be clustered

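The prefix rule can be illustrated with plain tuple comparison: rows sorted on (a1, a2, a3) are also sorted on (a1) and on (a1, a2), so equality on a prefix of the fields maps to one contiguous run of rows. A hedged sketch (the field values are made up):

    import bisect

    # Rows sorted on the compound key (lastname, firstname, city)
    rows = [
        ("Adams", "Ann", "Boston"),
        ("Adams", "Bob", "NYC"),
        ("Jones", "Ann", "LA"),
        ("Jones", "Cal", "NYC"),
        ("Smith", "Dee", "Boston"),
    ]

    def prefix_search(rows, prefix):
        """Find all rows whose leading fields equal `prefix`.
        Works because sorting on (a1, a2, a3) also sorts on any prefix."""
        i = bisect.bisect_left(rows, prefix)
        result = []
        while i < len(rows) and rows[i][:len(prefix)] == prefix:
            result.append(rows[i])
            i += 1
        return result

    print(prefix_search(rows, ("Adams",)))          # both Adams rows
    print(prefix_search(rows, ("Jones", "Cal")))    # one row
    # Searching on firstname alone (a2 without a1) cannot use this order:
    # rows with firstname "Ann" are scattered, not contiguous.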
21
New topic: Hash Tables
  • In the I/O model, hash tables are much like main-memory ones
  • Hash basics:
  • There are n buckets
  • A hash function f(k) maps a key k to 0, 1, …, n−1
  • Store in bucket f(k) a pointer to the record with key k
  • Differences for the I/O model/DBMS:
  • bucket size = 1 block
  • use overflow blocks when needed (a sketch follows below)

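A minimal sketch of the DBMS-style hash table described above: each bucket is one "block" holding at most a few (key, pointer) entries, and a full bucket chains to an overflow block. The class names and the capacity constant are invented for the example; the keys match the ones used on the next slides.

    BLOCK_CAPACITY = 5   # keys + pointers per block, as in the example slides

    class Bucket:
        """One primary block plus a chain of overflow blocks."""
        def __init__(self):
            self.entries = []        # (key, record_pointer) pairs
            self.overflow = None     # another Bucket, created on demand

        def insert(self, key, ptr):
            if len(self.entries) < BLOCK_CAPACITY:
                self.entries.append((key, ptr))
            else:
                if self.overflow is None:
                    self.overflow = Bucket()      # create an overflow block
                self.overflow.insert(key, ptr)

        def search(self, key):
            for k, ptr in self.entries:
                if k == key:
                    return ptr
            return self.overflow.search(key) if self.overflow else None

    class HashIndex:
        def __init__(self, n_buckets=10):
            self.buckets = [Bucket() for _ in range(n_buckets)]
            self.n = n_buckets

        def h(self, key):
            return key % self.n                   # h(x) = x mod n

        def insert(self, key, ptr):
            self.buckets[self.h(key)].insert(key, ptr)

        def search(self, key):                    # 1 disk access if no overflow
            return self.buckets[self.h(key)].search(key)

    idx = HashIndex()
    for k in (0, 25, 5, 83, 43, 99, 9, 82):
        idx.insert(k, f"rec{k}")
    print(idx.search(82))    # 'rec82' -- h(82) = 2, read bucket 2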
22
Example hash table
  • Assume 10 buckets, each storing 5 keys and pointers (only 2 shown)
  • h(0) = 0
  • h(25) = h(5) = 5
  • h(83) = h(43) = 3
  • h(99) = h(9) = 9

23
Hash table search
  • Search for 82
  • Compute h(82) = 2
  • Read bucket 2
  • 1 disk access

24
Hash table insertion
  • Place in corresponding bucket, if space
  • Insert 42

25
Hash table insertion
  • Create overflow block, if no space
  • Insert 91
  • More overflow blocks may be added as necessary

26
Hash table performance
  • Excellent if no overflow blocks
  • For in-memory indices, hash tables usually
    preferred
  • Performance degrades as the ratio #keys/(n × block size) increases

27
Hash functions
  • Lots of ideas for good functions, depending on the situation
  • One obvious idea: h(x) = x mod n
  • Every x is mapped to one of 0, 1, …, n−1
  • Roughly 1/nth of the x's are mapped to each bucket
  • Does this work for equality search?
  • Does this work for range search? (see the snippet below)
  • Does this work for partial-key search?
  • Good functions for hashing passwords?
  • What was the point of hashing in that case?

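A tiny illustration of why h(x) = x mod n answers the equality question but not the range question: an equality search touches one bucket, while a range of keys scatters across every bucket, so the hash index cannot narrow a range query at all.

    n = 10

    def h(x):
        return x % n

    # Equality search: read exactly one bucket.
    print(h(82))                                    # 2

    # Range search 40 <= x <= 49: the keys land in all n buckets,
    # so every bucket would have to be read anyway.
    print(sorted({h(x) for x in range(40, 50)}))    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]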
28
Extensible hash table
  • Number of buckets grows to prevent overflows (a simplified sketch follows below)
  • Also used for crypto, hashing passwords, etc.
  • And Java's HashMap and Object.hashCode()

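The slide only names the idea, so here is a deliberately simplified "grow and rehash" sketch: it just doubles the bucket count and rehashes everything when the table gets too full. Real extensible (or linear) hashing is finer-grained, using a directory of bucket pointers and splitting one bucket at a time, so treat this purely as an illustration of "the number of buckets grows to prevent overflows".

    class GrowingHashTable:
        """Doubles the number of buckets when the load gets high.
        (Real extensible hashing splits one bucket at a time via a directory.)"""
        MAX_LOAD = 0.75

        def __init__(self, n_buckets=4):
            self.buckets = [[] for _ in range(n_buckets)]
            self.count = 0

        def _bucket(self, key):
            return self.buckets[hash(key) % len(self.buckets)]

        def insert(self, key, value):
            if self.count / len(self.buckets) >= self.MAX_LOAD:
                self._grow()
            self._bucket(key).append((key, value))
            self.count += 1

        def _grow(self):
            old = self.buckets
            self.buckets = [[] for _ in range(2 * len(old))]
            for bucket in old:                       # rehash every entry
                for key, value in bucket:
                    self._bucket(key).append((key, value))

        def search(self, key):
            for k, v in self._bucket(key):
                if k == key:
                    return v
            return None

    t = GrowingHashTable()
    for i in range(100):
        t.insert(i, i * i)
    print(t.search(17))   # 289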
29
New topic: B-trees
  • Saw connected, rooted graphs before: XML graphs
  • Trees are connected, acyclic graphs
  • Saw rooted trees before:
  • XML docs
  • Directory structure on a hard drive
  • Organizational/management charts
  • B-trees are one kind of rooted tree

30
Twenty Questions
  • What am I thinking of?
  • Large space of possible choices
  • Can ask only yes/no questions
  • Each gives at most 1 bit
  • Strategy:
  • Ask questions that divide the search space in half
  • ⇒ gain a full bit from each question
  • log2(1,000,000) ≈ log2(2^20) = 20

31
BSTs
  • Very simple data structure in CS: BSTs
  • Binary Search Trees
  • Keep balanced
  • Each node: one item
  • Each node has two children
  • Left subtree: <
  • Right subtree: >
  • Can search, insert, delete in log time
  • log2(1MB) = log2(2^20) = 20

32
Search for DBMS
  • Big improvement: log2(1MB) = 20
  • Each op divides remaining range in half!
  • But recall: all that matters is disk accesses
  • 20 is better than 2^20, but...
  • Can we do better?

33
BSTs ⇒ B-trees
  • Like BSTs, except each node = one block
  • Branching factor is >> 2
  • Each access divides remaining range by, say, 300
  • B-trees = BSTs + blocks
  • B+ trees are a variant of B-trees
  • Data stored only in leaves
  • Leaves form a (sorted) linked list
  • Better supports range queries
  • Consequences:
  • Much shorter depth ⇒ many fewer disk reads
  • Must find element within node
  • Trades CPU/RAM time for disk time

34
B-tree search efficiency
  • With params:
  • block = 4KB
  • integer = 4 bytes,
  • pointer = 8 bytes
  • the largest n satisfying 4n + 8(n+1) < 4096 is n = 340
  • Each node has 170..340 keys
  • assume on avg it has (170+340)/2 = 255
  • Then (reproduced in the snippet below):
  • 255 rows ⇒ depth 1
  • 255^2 ≈ 64k rows ⇒ depth 2
  • 255^3 ≈ 16M rows ⇒ depth 3
  • 255^4 ≈ 4G rows ⇒ depth 4

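The fan-out and depth figures on this slide can be reproduced with a few lines of arithmetic, using the block, key, and pointer sizes given above.

    BLOCK = 4096    # bytes per block
    KEY = 4         # bytes per integer key
    PTR = 8         # bytes per pointer

    # Largest n with n keys and n+1 pointers per node:
    # 4n + 8(n+1) < 4096  =>  n <= (4096 - 8) / 12
    n = (BLOCK - PTR) // (KEY + PTR)
    print(n)                            # 340

    fanout = (n // 2 + n) // 2          # nodes are between half full and full
    print(fanout)                       # 255

    for depth in range(1, 5):
        print(depth, fanout ** depth)
    # 1 255
    # 2 65025          (~64k rows)
    # 3 16581375       (~16M rows)
    # 4 4228250625     (~4G rows)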
35
Next time
  • Next: Failover
  • For next time: reading online
  • Hw3 due next time
  • no extensions!
  • Now: one-minute responses