C20'0046: Database Management Systems Lecture - PowerPoint PPT Presentation

1 / 42
About This Presentation
Title:

C20'0046: Database Management Systems Lecture

Description:

Failure/recovery. Data warehousing & mining. Websearch. Hw3 due today. no extensions! ... Much shorter depth Many fewer disk reads. Must find element within node ... – PowerPoint PPT presentation

Number of Views:31
Avg rating:3.0/5.0
Slides: 43
Provided by: pagesSt
Category:

less

Transcript and Presenter's Notes

Title: C20'0046: Database Management Systems Lecture


1
C20.0046 Database Management SystemsLecture 26
  • Matthew P. Johnson
  • Stern School of Business, NYU
  • Spring, 2004

2
Agenda
  • Previously Indices
  • Next
  • Finish Indices, advanced indices
  • Failure/recovery
  • Data warehousing mining
  • Websearch
  • Hw3 due today
  • no extensions!
  • 1-minute responses
  • Review clustered, dense, primary, /tbl, syntax

3
Lets get physical
Query update
User/ Application
Query compiler/optimizer
Query execution plan
Record, index requests
Transaction commands
Execution engine
Index/record mgr.
  • Transaction manager
  • Concurrency control
  • Logging/recovery

Page commands
Buffer manager
Read/write pages
Storage manager
storage
4
BSTs
  • Very simple data structure in CS BSTs
  • Binary Search Trees
  • Keep balanced
  • Each node one item
  • Each node has two children
  • Left subtree lt
  • Right subtree gt
  • Can search, insert, delete in log time
  • log2(1MB 220) 20

5
Search for DBMS
  • Big improvement log2(1MB) 20
  • Each op divides remaining range in half!
  • But recall all that matters is disk accesses
  • 20 is better than 220 but
  • Can we do better?

6
BSTs ? B-trees
  • Like BSTs except each node one block
  • Branching factor is gtgt 2
  • Each access divides remaining range by, say, 300
  • B-trees BSTs blocks
  • B trees are a variant of B-trees
  • Data stored only in leaves
  • Leaves form a (sorted) linked list
  • Better supports range queries
  • Consequences
  • Much shorter depth ? Many fewer disk reads
  • Must find element within node
  • Trades CPU/RAM time for disk time

7
B Trees
  • Parameter n ? branching factor is n1
  • Largest number s.t. one block can contain n
    search-key values and n1 pointers
  • Each node (except root) has at least n/2 keys

Keys k lt 30
Keys 120ltklt240
Keys 240ltk
Keys 30ltklt120
Next leaf
40
50
60
8
Searching a B Tree
Select name From people Where age 25
  • Exact key values
  • Start at the root
  • If were in leaf, walk through its key values
  • If not, look at keys K1..Kn
  • If Ki lt K lt Ki1, look in child i
  • Range queries
  • As above
  • Then walk left until test fails

Select name From people Where 20 lt age and
age lt 30
9
B Tree Example
Find the key 40
n 4
40 ? 80
20 lt 40 ? 60
30 lt 40 ? 40
10
15
18
20
30
40
50
60
65
80
85
90
NB Leaf keys are sorted data pointed to is only
if clustered
10
Clustered unclustered B-trees

Data entries
Data entries
(Index File)
(Data file)
Data Records
Data Records
CLUSTERED
UNCLUSTERED
11
B trees, and, or
  • Assume index on a,b,c
  • Intuition phone book
  • WHERE a x and b y
  • WHERE b y and c z
  • WHERE a a and c z
  • WHERE a x or b y or c z

12
B trees and LIKE
  • Supports only hard-coded prefix LIKE checks
  • Intuition phone book
  • Select from T where a like xyz
  • Select from T where a like xyz
  • Select from T where a like xyzzyx

13
B-tree search efficiency
  • With params
  • block4k
  • integer 4b,
  • pointer 8b
  • the largest n satisfying 4n8(n1) lt 4096 is
    n340
  • Each node has 170..340 keys
  • assume on avg has (170340)/2255
  • Then
  • 255 rows ? depth 1
  • 2552 64k rows ? depth 2
  • 2553 16M rows ? depth 3
  • 2554 4G rows ? depth 4

14
B-trees in practice
  • Most DBMSs use B-trees for most indices
  • Default in MySQL
  • Default in Oracle
  • Speeds up
  • where clauses
  • Some like checks
  • Min or max functions
  • joins
  • Limitation fields used must
  • Be a prefix of indexed fields
  • Be ANDed together

15
Next topic Advanced types of indices
  • Spatial indices based on R-trees (R region)
  • Support multi-dimensional searches on geometry
    fields
  • 2-d not 1-d ranges
  • Oracle
  • MySQL

CREATE INDEX geology_rtree_idx ON
geology_tab(geometry) INDEXTYPE IS
MDSYS.SPATIAL_INDEX
CREATE TABLE geom (g GEOMETRY NOT NULL, SPATIAL
INDEX(g))
16
Advanced types of indices
  • Inverted indices for web doc search
  • First, think of each webpage as a tuple
  • One column for every possible word
  • True means the word appears on the page
  • Index on all columns
  • Now can search youre fired
  • ? select from T where youreT and firedT

17
Advanced types of indices
  • Can simplify somewhat
  • For each field index, delete False entries
  • True entries for each index become a bucket
  • Create inverted index
  • One entry for each search word
  • Search word entry points to corresponding bucket
  • Bucket points to pages with its word
  • Amazon

18
Advanced types of indices
  • Function-based indices
  • Speeds up WHERE upper(name)BUSH, etc.
  • Now supported in Oracle 8, not MySQL
  • Bitmap indices
  • Speeds up arbitrary combination of reqs
  • Not limited to prefixes or conjunctions
  • Now supported in Oracle 9, not MySQL

create index on T(my_soundex(name)) create index
on T(substr(DOB),4,5))
19
Bitmap indices
  • Assume table has n records
  • Assume F is a field with m different values
  • Bitmap index on F m length-n bitstrings
  • One bitstring for each value of F
  • Each one says which rows have that value for F
  • Example
  • n , mF , mG
  • Q find rows where
  • F50 or (F30 and GBaz)

20
Bitmap index search
  • Larger example (age,salary) of jewelry buyers
  • Bitmaps for age
  • 25100000001000, 30000000010000, 4501000000100,
    50001110000010, 60000000000001,
    70000001000000, 85000000100000
  • Bitmaps for salary
  • 60110000000000, 75001000000000,
    100000100000000, 110000001000000,
    120000010000000, 140000000100000,
    260000000010001, 275000000000010,
    350000000000100, 400000000001000

21
Bitmap index search
  • Query find buyers of age 45-55 with salary
    100-200
  • Age range 010000000100 (45) 001110000010 (50)
    011110000110
  • Bitwise or of Salary range 000111100000
  • AND together 011110000110 000111100000
    000110000000
  • What does this mean?

22
Bitmap index search
  • Once we have row numbers, then what?
  • Get rows with those numbers (How?)
  • Bitmap indices in Oracle
  • Best for low-cardinality fields
  • Boolean, enum, gender
  • ? lots of 0s in our bitmaps
  • Compress 000000100001 ? 6141
  • run-length encoding

CREATE BITMAP INDEX ON T(F,G)
23
New topic Recovery
24
System Failures
  • Each transaction has internal state
  • When system crashes, internal state is lost
  • Dont know which parts executed and which didnt
  • Remedy use a log
  • A file that records each action of each xact
  • Trail of breadcrumbs

25
Media Failures
  • Rule of thumb Pr(hard drive has head crash
    within 10 years) 50
  • Simpler rule of thumb Pr(hard drive has head
    crash within 1 years) 10
  • Serious problem
  • Soln different RAID strategies
  • RAID Redundant Arrays of Independent Disks

26
RAID levels
  • RAID level 1 each disk gets a mirror
  • RAID level 4 one disk is xor of all others
  • Each bit is sum mod 2 of corresponding bits
  • E.g.
  • Disk 1 11110000
  • Disk 2 10101010
  • Disk 3 00111000
  • Disk 4
  • How to recover?

27
Transactions
  • Transaction unit of code to be executed
    atomically
  • In ad-hoc SQL
  • one command one transaction
  • In embedded SQL
  • Transaction starts first SQL command issued
  • Transaction ends
  • COMMIT
  • ROLLBACK (abort)
  • Can turn off/on autocommit

28
Primitive operations of transactions
  • Each xact reads/writes rows or blocks elms
  • INPUT(X)
  • read element X to memory buffer
  • READ(X,t)
  • copy element X to transaction local variable t
  • WRITE(X,t)
  • copy transaction local variable t to element X
  • OUTPUT(X)
  • write element X to disk
  • LOG RECORD

29
Transaction example
  • Xact Transfer 100 from savings to checking
  • A A100
  • B B-100
  • READ(A,t)
  • t t100
  • WRITE(A,t)
  • READ(B,t)
  • t t-100
  • WRITE(B,t)

30
Transaction example
  • READ(A,t) t t100WRITE(A,t) READ(B,t) t
    t-100WRITE(B,t)

31
The log
  • An append-only file containing log records
  • Note multiple transactions run concurrently, log
    records are interleaved
  • After a system crash, use log to
  • Redo some transaction that didnt commit
  • Undo other transactions that didnt commit
  • Three kinds of logs undo, redo, undo/redo
  • Well discuss only Undo

32
Undo Logging
  • Log records
  • ltSTART Tgt
  • transaction T has begun
  • ltCOMMIT Tgt
  • T has committed
  • ltABORT Tgt
  • T has aborted
  • ltT,X,vgt
  • T has updated element X, and its old value was v

33
Undo-Logging Rules
  • U1 Changes logged (ltT,X,vgt) before being written
    to disk
  • U2 Commits logged (ltCOMMIT Tgt) after being
    written to disk
  • Results
  • May forget we did whole xact (and so wrongly
    undo)
  • Will never forget did partial xact (and so leave)
  • Log-change, change, log-change, change, Commit,
    log-commit

34
Undo-Logging e.g. (inputs omitted)
35
Recovery with Undo Log
  • After systems crash, run recovery manager
  • Decide for each xact T whether it was completed
  • Undo all modifications from incomplete xacts, in
    reverse order (why?) and abort each

ltSTART Tgt.ltCOMMIT Tgt ? yes ltSTART Tgt.ltABORT
Tgt ? yes ltSTART Tgt ? no
36
Recovery with Undo Log
  • Read log from the end cases
  • ltCOMMIT Tgt mark T as completed
  • ltABORT Tgt mark T as completed
  • ltT,X,vgt
  • ltSTART Tgt ignore

if T is not completed then write Xv to
disk else ignore
37
Recovery with Undo Log
ltT2,X2,v2gt ltSTART T5gt ltSTART
T4gt ltT1,X1,v1gt ltT5,X5,v5gt ltT4,X4,v4gt ltCOMMIT
T5gt ltT3,X3,v3gt ltT2,X2,v2gt
Start
Q Which updates areundone?
Crash!
38
Recovery with Undo Log
  • Note undo commands are idempotent
  • No harm done if we repeat them
  • Q What if system crashes during recovery?
  • How far back in the log do we go?
  • Dont go all the way back to the start
  • May be very large
  • Better idea use checkpointing

39
Checkpointing
  • Checkpoint the database periodically
  • Stop accepting new transactions
  • Wait until all current xacts complete
  • Flush log to disk
  • Write a ltCKPTgt log record, flush log
  • Resume accepting new xacts

40
Undo Recovery with Checkpointing
ltT1,X1,v1gt (all completed) ltCKPTgt ltSTART
T2gt ltSTART T3 ltSTART T5gt ltSTART
T4gt ltT4,X4,v4gt ltT5,X5,v5gt ltT4,X4,v4gt ltCOMMIT
T5gt ltT3,X3,v3gt ltT2,X2,v2gt
other xacts
During recovery, can stop at first ltCKPTgt
xacts T2,T3,T4,T5
41
Non-quiescent Checkpointing
  • Problem database must freeze during checkpoint
  • Would like to checkpoint while database is
    operational
  • Idea non-quiescent checkpointing
  • Quiescent quiet, still, at rest inactive

42
Next time
  • Next Data warehousing mining!
  • For next time reading online
  • Proj5 due next Thursday
  • no extensions!
  • Now one-minute responses
  • Relative weight warehousing, mining, websearch
  • Data mining techniques
  • NNs
  • GAs
  • kNN
  • Decision Trees
Write a Comment
User Comments (0)
About PowerShow.com