Title: Indexing Techniques
1. Indexing Techniques
2. The Problem
- What can we introduce to make search more efficient? Indices!
- What is an index?
3. Definitions
- Index: an auxiliary data structure to speed up record retrieval
- Search key: the field(s) of a table which is/are indexed
- Storage: index files that contain index records
- Each entry stores
  - the actual data record,
  - or a search key value k and a record ID: <k, rid>,
  - or a search key value k and a list of record IDs: <k, rid list>
- Types: ordered and unordered (hash) indices
4. Types of Ordered Indices (1/3)
- Assuming ordered data files
- Depending on which field is indexed:
  - Primary index: search key is the ordering key field
    - one pointer per page
  - Secondary index: search key is a non-ordering field
[Figure: a primary index next to a secondary index on the same file]
5. Types of Ordered Indices (2/3)
- Depending on the density of index records:
  - Dense index: an index record for each distinct search key value, i.e. every record
  - Sparse index: index records for only some search key values
    - the search key value of the first record in a page
    - a pointer to that page
[Figure: a dense index next to a sparse index]
6. Types of Ordered Indices (3/3)
- Ordering field is a nonkey (may have duplicates)
  - Clustered index
  - Unclustered index
[Figure: a clustered index next to an unclustered index]
7. Indices Exercise
- 2^15 records
- 128 bytes/record
- 2^10 bytes/page
- ordered file, equality search on the ordering field, unspanned organization (worked in the sketch below)
  - without an index
  - with a primary index
    - on a field of size 12 bytes
    - assume pointers are 4 bytes long
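One way to work the numbers, as a rough sketch: it assumes binary search over both the ordered file and the index, and counts cost in page reads; the variable names are illustrative.

```python
import math

# Exercise parameters from the slide
num_records = 2**15        # records in the file
record_size = 128          # bytes per record
page_size   = 2**10        # bytes per page
key_size    = 12           # bytes, the indexed field
ptr_size    = 4            # bytes, a page pointer

# Without an index: binary search over the ordered data file
bfr_data   = page_size // record_size            # 8 records/page (unspanned)
data_pages = math.ceil(num_records / bfr_data)   # 2^12 pages
cost_no_index = math.ceil(math.log2(data_pages)) # 12 page reads

# With a primary index: one <key, pointer> entry per data page
entry_size  = key_size + ptr_size                # 16 bytes
fan_out     = page_size // entry_size            # 2^6 = 64 entries/page
index_pages = math.ceil(data_pages / fan_out)    # 2^6 = 64 pages
cost_index  = math.ceil(math.log2(index_pages)) + 1  # 6 index reads + 1 data read

print(cost_no_index, cost_index)                 # 12 vs. 7 page accesses
```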
8. Multi-level Indices (1/2)
- If access using the first-level index is still expensive:
  - build a sparse index on the first-level index
  - a multi-level index
- Fan-out: the blocking factor of the index
[Figure: a second-level index built on top of a first-level index]
9. Multi-level Indices (2/2)
- 2^6 index records/page (fan-out)
- 2^15 index records
- 1st level: 2^9 pages
- 2nd level: 2^9 index records, 2^3 pages
- 3rd level: 2^3 index records, 1 page
- keep adding levels while 1 < 2^15 / (2^6)^t
- t = ceil(log_{2^6} 2^15) = 3
- in general, t = ceil(log_{fan-out}(#index records)) (see the sketch below)
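The same rule as a one-liner, sketched under the slide's notation (fan-out = the index blocking factor):

```python
import math

def index_levels(num_index_records: int, fan_out: int) -> int:
    # t = ceil(log_fan_out(#index records)): each extra level divides
    # the number of pages left to search by the fan-out.
    return math.ceil(math.log(num_index_records, fan_out))

print(index_levels(2**15, 2**6))  # 3 levels, as on the slide
```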
10. Dynamic Multi-level Indices
- So far we assumed indices are physically ordered files
  - expensive insertions and deletions
- Dynamic multi-level indices:
  - B-trees
  - B+ trees
11. Tree-structured Indices
- For each node: K1 < K2 < ... < K_{q-1}
- For each value X in the subtree pointed to by Pi:
  - K_{i-1} < X < K_i, for 1 < i < q
  - X < K_i, for i = 1
  - K_{q-1} < X, for i = q
[Figure: node layout <P1, K1, ..., P_i, K_i, ..., K_{q-1}, P_q>, with a value X shown in each subtree]
12. B-tree
- Problems: empty nodes, unbalanced trees
- Solution: B-trees
13. B-tree Definition
- Each node: <P1, <K1, Pr1>, P2, ..., <K_{q-1}, Pr_{q-1}>, Pq>
  - Pi: tree pointer, Ki: search value, Pri: data pointer
- For each node: K1 < K2 < ... < K_{q-1}
- For each value X in the subtree pointed to by Pi:
  - K_{i-1} < X < K_i, for 1 < i < q
  - X < K_i, for i = 1
  - K_{q-1} < X, for i = q
- Each node has at most q tree pointers
  - the B-tree is of order q
- Each node has at least ceil(q/2) tree pointers
  - except for the root
- An internal node with p tree pointers has p - 1 search values
- All leaves are at the same level
  - a balanced tree (see the node sketch below)
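As a rough illustration of the definition (not any particular implementation), a node can be modelled as three parallel lists, with the occupancy rules checked for an internal node:

```python
import math
from dataclasses import dataclass

@dataclass
class BTreeNode:
    # Interleaved layout <P1, <K1, Pr1>, P2, ..., <Kq-1, Prq-1>, Pq>,
    # stored here as three parallel lists (field names are illustrative)
    keys: list        # K1 < K2 < ... < K_{q-1}
    data_ptrs: list   # Pri: one data pointer per key
    children: list    # Pi: tree pointers, len(children) == len(keys) + 1

def check_internal(node: BTreeNode, q: int, is_root: bool) -> bool:
    """Occupancy rules for an internal node of a B-tree of order q:
    at most q tree pointers, at least ceil(q/2) except at the root,
    and p pointers always go with p - 1 search values."""
    p = len(node.children)
    lower = 2 if is_root else math.ceil(q / 2)
    return lower <= p <= q and len(node.keys) == p - 1
```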
14. B-tree Example
[Figure: a B-tree with root keys 5 and 8 and leaf keys 1, 3, 6, 7, 9, 12; the legend distinguishes tree pointers, data pointers, and null pointers (ø)]
15. B+ tree
- Most implementations of the B-tree are actually B+ trees
- Data pointers appear only in the leaves
  - more entries fit in internal nodes than in regular B-trees
  - fewer internal nodes
  - fewer levels
  - faster access
16. B+ tree Definition
- Internal nodes: <P1, K1, P2, ..., P_{q-1}, K_{q-1}, Pq>
- Leaf nodes: <<K1, Pr1>, <K2, Pr2>, ..., <K_{p-1}, Pr_{p-1}>, Pnext>
  - Pri points to a data record, or to a block of pointers to such records
  - p is the leaf order (see the sketch below)
[Figure: an internal node with keys 120, 150, 180 above a chain of leaves holding 120, 130, ..., 180, 200]
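A minimal sketch of the two node kinds, again as parallel lists; the class and field names are illustrative, not from any particular system:

```python
from dataclasses import dataclass

@dataclass
class InternalNode:
    # <P1, K1, P2, ..., P_{q-1}, K_{q-1}, Pq>: keys only guide the search
    keys: list
    children: list                    # len(children) == len(keys) + 1

@dataclass
class LeafNode:
    # <<K1, Pr1>, ..., <K_{p-1}, Pr_{p-1}>, Pnext>: data pointers live here
    keys: list
    data_ptrs: list                   # Pri: record, or block of record pointers
    next: "LeafNode | None" = None    # Pnext chains the leaves together
```

The Pnext chain is what makes range scans cheap: once the first qualifying leaf is found, the scan follows next pointers without revisiting internal nodes.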
17. B+ tree Search
- At each level, find the smallest Ki larger than the search key
- Follow the associated pointer Pi (see the sketch below)
[Figure: a search descending from root key 100 through internal key 30 to the leaves (30, 35) and (120, 130, 180, 200)]
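A sketch of this descent, reusing the InternalNode and LeafNode classes from the previous slide; the convention assumed here is that the pointer left of Ki leads to values strictly below Ki:

```python
def bplus_search(node, k):
    """Descend from the root: at each internal node, follow the pointer
    just left of the smallest Ki larger than the search key."""
    while isinstance(node, InternalNode):
        i = 0
        while i < len(node.keys) and k >= node.keys[i]:
            i += 1
        node = node.children[i]
    # node is now a leaf: return the data pointer for k, if present
    for key, ptr in zip(node.keys, node.data_ptrs):
        if key == k:
            return ptr
    return None
```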
18. B+ tree Insert
- Nodes may overflow or underflow
- Ignoring overflow and underflow:
- Inserting a data record with search key value k (see the sketch below):
  - find the leaf node
  - if k is found:
    - add the record to the file, creating an indirect block if there isn't one
    - add the record pointer to the indirect block
  - if k is not found:
    - add the data record to the file
    - insert the record pointer into the leaf node (keeping all search keys in order)
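A sketch of the two cases, again over the LeafNode class from slide 16; each data pointer is modelled here as a Python list standing in for an indirect block of record pointers:

```python
import bisect

def bplus_insert_simple(leaf, k, rid):
    """Insert ignoring overflow: grow k's indirect block if k exists,
    otherwise place a new <k, rid> entry in search key order."""
    i = bisect.bisect_left(leaf.keys, k)
    if i < len(leaf.keys) and leaf.keys[i] == k:
        leaf.data_ptrs[i].append(rid)      # k found: add to its block
    else:
        leaf.keys.insert(i, k)             # k not found: new index record,
        leaf.data_ptrs.insert(i, [rid])    # keeping all search keys in order
```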
19. B+ tree Delete
- Ignoring overflow and underflow (see the sketch below):
- Find the leaf node with search key value k
- Find the data record pointer; delete the record
- Delete the index record
  - and the indirect block, if any, once it is empty
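The matching delete, as a sketch under the same assumptions:

```python
def bplus_delete_simple(leaf, k, rid):
    """Delete ignoring underflow: remove the record pointer, then drop
    the index record once its indirect block is empty."""
    if k in leaf.keys:
        i = leaf.keys.index(k)
        leaf.data_ptrs[i].remove(rid)      # delete the record pointer
        if not leaf.data_ptrs[i]:          # indirect block now empty
            leaf.keys.pop(i)
            leaf.data_ptrs.pop(i)
```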
20. B+ tree Simple Insert
[Figure: key 42 inserted into a leaf with free space]
21. B+ tree Leaf Overflow (1/2)
22. B+ tree Leaf Overflow (2/2)
- The first ceil(n/2) entries stay in the existing node; the rest go to a new leaf node (see the sketch below)
- n = 3
[Figure: root 100; for k < 100, an internal node with keys 9, 30 over leaves (3, 5), (9, 11), (30, 35, 42); on the right, a leaf (120, 130, 180, 200)]
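A sketch of the split rule, reusing the LeafNode class from slide 16; the returned separator would be inserted into the parent, which may overflow in turn, as the next slides show:

```python
import math

def split_leaf(leaf):
    """Leaf overflow: the first ceil(n/2) entries stay put, the rest move
    to a new leaf, and the new leaf's first key is copied up as the
    separator for the parent."""
    n = len(leaf.keys)
    mid = math.ceil(n / 2)
    new_leaf = LeafNode(keys=leaf.keys[mid:],
                        data_ptrs=leaf.data_ptrs[mid:],
                        next=leaf.next)
    leaf.keys, leaf.data_ptrs = leaf.keys[:mid], leaf.data_ptrs[:mid]
    leaf.next = new_leaf                   # keep the leaf chain intact
    return new_leaf.keys[0], new_leaf      # <separator, pointer> for the parent
```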
23. B+ tree Internal Node Overflow (1/3)
[Figure: the tree of the previous slide with 210 inserted on the right side (keys 120, 130, 180, 200, 210)]
24. B+ tree Internal Node Overflow (2/3)
[Figure: inserting 205 overflows the rightmost leaf (keys 120, 130, 180, 200, 205, 210)]
25. B+ tree Internal Node Overflow (3/3)
[Figure: the split propagates upward; the internal level now holds keys 100 and 150, with keys 120, 130, 180, 200, 205, 210 spread over the resulting leaves]
26. B+ tree New Root (1/2)
[Figure: an internal node overflow reaches the root; the leaves below hold 120, 130, 180, 200, 205, 210]
27. B+ tree New Root (2/2)
[Figure: a new root is created, adding one level above the same keys]
28. Index Insert Exercise
[Figure: starting tree with an internal node holding keys 9 and 30 over the leaves (3, 5), (9, 11), (30, 35, 42)]
29. B+ tree Delete
- Simple delete case
- Underflow case (see the sketch below):
  - redistribute records
  - coalesce with siblings
  - update parents
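A sketch of the two underflow remedies for a leaf, reusing the classes from slide 16; it only tries the right sibling and takes minimum occupancy to be ceil(order/2) entries, both simplifying assumptions:

```python
import math

def handle_leaf_underflow(leaf, right_sibling, parent, sep_index, order):
    """On underflow: borrow an entry from the sibling if it can spare one
    (redistribution), else merge the two leaves (coalesce)."""
    min_keys = math.ceil(order / 2)
    if len(right_sibling.keys) > min_keys:
        # redistribute: move the sibling's smallest entry over
        leaf.keys.append(right_sibling.keys.pop(0))
        leaf.data_ptrs.append(right_sibling.data_ptrs.pop(0))
        parent.keys[sep_index] = right_sibling.keys[0]   # update the parent
    else:
        # coalesce: fold the sibling into the leaf and drop the separator
        leaf.keys.extend(right_sibling.keys)
        leaf.data_ptrs.extend(right_sibling.data_ptrs)
        leaf.next = right_sibling.next
        parent.keys.pop(sep_index)
        parent.children.pop(sep_index + 1)
```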
30. B+ tree Simple Delete (1/2)
[Figure: tree before the delete, with keys 120, 130, 180, 200, 205, 210, 215 in the leaves]
31. B+ tree Simple Delete (2/2)
[Figure: the tree after the simple delete; the leaves hold 100, 101, 120, 130, 180, 200, 205, 210, 215]
32. B+ tree Delete Redistribution (1/2)
[Figure: tree before redistribution; the leaves hold 100, 101, 120, 130, 180, 200, 205, 210, 215]
33. B+ tree Delete Redistribution (2/2)
- Redistribute entries
  - from the left or right sibling
[Figure: after redistribution the separators are 179 and 205; the leaves hold 100, 101, 120, 130, 150, 156, 179, 200, 205, 210]
34. B+ tree Delete Coalesce (1/4)
[Figure: tree with separators 179 and 205; the leaves hold 100, 101, 120, 130, 150, 156, 179, 200, 205, 210, 215]
35. B+ tree Delete Coalesce (2/4)
- Leaf updated
- No redistribution possible
  - coalesce with a sibling
[Figure: same tree with separators 179 and 205; the leaves hold 100, 120, 130, 150, 156, 179, 200, 205, 210, 215]
36. B+ tree Delete Coalesce (3/4)
- Leaf updated
- No redistribution possible
  - coalesce with a sibling
[Figure: the underflowing leaf is merged with its sibling]
37. B+ tree Delete Coalesce (4/4)
[Figure: after coalescing, a single separator 205 remains; the leaves hold 100, 120, 130, 150, 156, 179, 200, 205, 210, 215]
38. Hashing Techniques
39. Static Hashing (1/2)
- Store records in buckets with overflow chains
- Allocate a fixed number of buckets, M
- Problems:
  - M too small: long overflow chains; slow search, insert, and delete
40. Static Hashing (2/2)
- Problems:
  - M too large: wasted space, slow scans (see the sketch below)
[Figure: hash function h mapping keys into buckets, several of which are empty (null)]
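A toy model of the scheme, assuming integer keys, with each bucket a flat list standing in for a primary page plus its overflow chain:

```python
class StaticHashFile:
    """Static hashing sketch: M buckets, fixed forever at creation time."""
    def __init__(self, m):
        self.m = m
        self.buckets = [[] for _ in range(m)]

    def insert(self, key):
        # A real system would append to the bucket's last non-full page,
        # allocating overflow pages as the chain grows.
        self.buckets[key % self.m].append(key)

    def search(self, key):
        # With M too small, this scan is the long-overflow-chain problem;
        # with M too large, most buckets sit empty and full scans waste I/O.
        return key in self.buckets[key % self.m]
```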
41. Dynamic Hashing
- Split and coalesce buckets as the database grows and shrinks
- One scheme: Extendible Hashing
  - the hash function generates large values, e.g. 32 bits
  - use i bits, changing i as the database size changes
- On overflow, double the number of buckets
  - use i+1 bits of the hash function
  - but this is expensive: read all M pages and redistribute the records over 2M pages
  - solution: use a directory, and double only the directory
    - split just the bucket that overflowed
42. Extendible Hashing (1/4)
[Figure: key 18 with h(18) = 10010 routed through the directory]
43. Extendible Hashing (2/4)
[Figure: a directory of global depth 2 (00, 01, 10, 11) over buckets A = {16, 20} (LD 2), B (LD 1), C = {18} (LD 2), D = {3, 7} (LD 2); inserting 4 with h(4) = 00100 overflows bucket A]
44. Extendible Hashing (3/4)
[Figure: bucket A splits into A = {16} and A1 = {20, 4}, both with local depth 3, which exceeds the global depth of 2]
45. Extendible Hashing (4/4)
- Global Depth (GD), Local Depth (LD)
- If a bucket is full:
  - split the bucket
  - increment its LD
- If GD = LD:
  - increment GD
  - double the directory (see the sketch below)
[Figure: the directory doubled to global depth 3 (000-111); buckets A = {16}, B, C = {18}, D = {3, 7}, A1 = {20, 4} with their local depths]
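A compact sketch of the whole scheme under the assumptions of these slides: integer keys, the hash taken as the low-order bits of the key itself, and the directory indexed by the low GD bits. Names are illustrative:

```python
class ExtendibleHash:
    """Extendible hashing sketch: a directory of 2^GD slots points into
    shared buckets; each bucket tracks its own local depth LD."""

    def __init__(self, bucket_capacity=2):
        self.gd = 1
        self.cap = bucket_capacity
        self.dir = [{"ld": 1, "keys": []}, {"ld": 1, "keys": []}]

    def _bucket(self, key):
        return self.dir[key & ((1 << self.gd) - 1)]  # low GD bits of h(k)

    def insert(self, key):
        b = self._bucket(key)
        if len(b["keys"]) < self.cap:
            b["keys"].append(key)
            return
        if b["ld"] == self.gd:        # GD == LD: must double the directory
            self.dir = self.dir + self.dir
            self.gd += 1
        b["ld"] += 1                  # split only the overflowed bucket
        image = {"ld": b["ld"], "keys": []}
        pending = b["keys"] + [key]
        b["keys"] = []
        for i, slot in enumerate(self.dir):
            # slots whose index has the new bit set move to the split image
            if slot is b and (i >> (b["ld"] - 1)) & 1:
                self.dir[i] = image
        for k in pending:             # redistribute (may split again)
            self.insert(k)
```

Note the key point of the directory trick: doubling copies pointers only, so every bucket except the one that overflowed is left untouched.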
46. Extendible Hashing Delete
- If a deletion leaves a bucket empty:
  - merge it with its split image
- If every directory pointer points to the same bucket as its split image:
  - halve the directory
47. Extendible Hashing Summary
- Avoids overflow pages
- The directory can get large
- Key search requires just 2 page reads
- Space utilization fluctuates
  - 59%-90% for uniformly distributed records
48. Extendible Hashing Exercise
- Initially GD = LD = 1
- M = 2 buckets
- Hash function: h(k) = k mod 2^i
- Insert 14, 18, 22, 3, 9
- Delete 9, 22, 3 (the insert half is traced in the sketch below)
[Figure: initial state; the directory entries point to a bucket holding 12 and 8 and a bucket holding 5]
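A trace of the insert half of this exercise, reusing the ExtendibleHash sketch from slide 45 (bucket capacity 2; the deletes and directory halving are not modelled by that sketch):

```python
eh = ExtendibleHash(bucket_capacity=2)
for k in [12, 8, 5]:           # reproduce the initial state on the slide
    eh.insert(k)
for k in [14, 18, 22, 3, 9]:   # the inserts from the exercise
    eh.insert(k)

print("GD =", eh.gd)           # the global depth has grown to 3
seen = set()
for i, b in enumerate(eh.dir): # print each distinct bucket once
    if id(b) not in seen:
        seen.add(id(b))
        print(format(i, f"0{eh.gd}b"), "->", b["keys"], "LD =", b["ld"])
```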