Indexing Techniques - PowerPoint PPT Presentation

1 / 48
About This Presentation
Title:

Indexing Techniques

Description:

t = ceil(log26 215 ) = 3. t = ceil(logfo#index-records) Dynamic multi-level indices ... Each node at least ceil(q/2) tree pointers. except from root ... – PowerPoint PPT presentation

Number of Views:19
Avg rating:3.0/5.0
Slides: 49
Provided by: nikosriz
Category:

less

Transcript and Presenter's Notes

Title: Indexing Techniques


1
Indexing Techniques
2
The Problem
  • What can we introduce to make search more
    efficient?
  • Indices!
  • What is an index?



3
Definitions
  • Index an auxiliary data structure to speed up
    record retrieval
  • Search key the field/s of a table which is/are
    indexed
  • Storage index files that contain index records
  • Each entry storing
  • Actual data record
  • or, search key value k and record ID ltk,ridgt
  • or, search key value k and list of records IDs
    ltk,rid listgt
  • Types ordered and unordered (hash) indices

4
Types of Ordered Indices (1/3)
  • Assuming ordered data files
  • Depending on which field is indexed
  • Primary index search key is ordering key field
  • Pointer for each page
  • Secondary index search key is non ordering field

secondary
primary
5
Types of Ordered Indices (2/3)
  • Depending on the density of index records
  • Dense index an index record for each distinct
    search key value, ie every record
  • Sparse index index records for only some search
    key values
  • search key value for first record in page
  • pointer to page

dense
sparse
6
Types of Ordered Indices (3/3)
  • Ordering field is nonkey (may have duplicates)
  • Clustered index
  • Unclustered index

clustered
unclustered
7
Indices Exercise
  • 215 records
  • 128 bytes/record
  • 210 bytes/page
  • ordered file equality search on ordering field,
    unspanned organization
  • without an index
  • with a primary index
  • on field of size 12 bytes
  • assume pointer 4 bytes long

8
Multi-level Indices (1/2)
  • If access using first-level index is still
    expensive
  • Build a sparse index on the first-level index
  • Multi-level Index
  • Fan-out index blocking factor

first-level index
second-level index
9
Multi-level Indices (2/2)
  • 26 index records/page (fan-out)
  • 215 index records
  • 1st-level
  • 29 pages
  • 2nd-level
  • 29 index records
  • 23 pages
  • 3rd-level
  • 23 index records
  • 1 page
  • 1 lt 215 / (26)t
  • t ceil(log26 215 ) 3
  • t ceil(logfoindex-records)

10
Dynamic multi-level indices
  • So far assumed indices are physically ordered
    files
  • expensive insertions and deletions
  • Dynamic multi-level indices
  • B trees
  • B trees

11
Tree-structured Indices
  • For each node K1 lt K2 lt Kq-1
  • For each value X in subtree pointed to by Pi
  • Ki-1lt X lt Ki, 1ltiltq
  • X lt Ki, i1
  • Ki-1lt X, iq

P1
K1

Ki-1
Pi
Ki

Kq-1
Pq
X
X
X
12
B tree
  • Problems empty nodes, unbalanced trees
  • solution B trees

13
B tree Definition
  • Each node ltP1,ltK1, Pr1gt, P2,,ltKq-1, Prq-1gt, Pqgt
  • Pi tree pointer, Ki search value, Pri data
    pointer
  • For each node K1 lt K2 lt Kq-1
  • For each value X in subtree pointed to by Pi
  • Ki-1lt X lt Ki, 1ltiltq
  • X lt Ki, i1
  • Ki-1lt X, iq
  • Each node at most q pointers
  • B tree is order q
  • Each node at least ceil(q/2) tree pointers
  • except from root
  • Internal node with p pointers has p-1 values
  • All leaves at the same level
  • balanced tree

14
B tree Example
5
8
ø
1
ø
3
ø
ø
6
ø
7
ø
ø
9
ø
12
ø
tree pointer
data pointer
ø
null pointer
15
B tree
  • Most implementations of B tree are B tree
  • Data pointers only in leaves
  • more entries in internal nodes than regular B
    trees
  • less internal nodes
  • less levels
  • faster access

16
B tree Definition
  • Internal nodes ltP1,K1, P2,, Pq-1, Kq-1, Pqgt
  • Leaf nodes ltltK1, Pr1gt, ltK2, Pr2gt,,ltKp-1,
    Prp-1gt, Pnextgt
  • Pri points a data records or block of pointers of
    such records
  • leaf order

120
150
180
120
130
180
200
17
B tree Search
  • At each level, find smallest Ki larger than
    search key
  • Follow associated pointer Pi

100
30
30
35
120
130
180
200
18
B tree Insert
  • Nodes may overflow or underflow
  • Ignoring overflow or underflow
  • Inserting data record with with search key value
    k
  • find leaf node
  • if k found
  • add record to file, create indirect block if
    there isnt one
  • add record pointer to indirect block
  • if k not found
  • add data record to file
  • insert record pointer in leaf node (all search
    keys in order)

19
B tree Delete
  • Ignoring overflow or underflow
  • Find leaf node with search key value k
  • Find data record pointer, delete record
  • delete index record
  • and indirect block, if any, if empty

20
B tree Simple Insert
  • Insert 42

42
21
B tree Leaf Overflow (1/2)
  • Insert 9

22
B tree Leaf Overflow (2/2)
  • first ceil(n/2) in existing node, rest in new
    leaf node
  • n314

100
k lt 100
9
30
3
5
30
35
42
9
11
120
130
180
200
23
B tree Internal Node Overflow (1/3)
  • Insert 210, insert 205

100
k lt 100
9
30
3
5
30
35
42
9
11
120
130
180
200
210
24
B tree Internal Node Overflow (2/3)
  • Leaf Split

100
k lt 100
9
30
3
5
30
35
42
9
11
120
130
180
200
205
210
25
B tree Internal Node Overflow (3/3)
100
150
k lt 100
9
30
3
5
30
35
42
9
11
120
130
180
200
205
210
26
B tree New Root (1/2)
  • Insert 210, insert 205

120
130
180
200
205
210
27
B tree New Root (2/2)
120
130
180
200
205
210
28
Index Insert Exercise
  • Insert 8, 7, 41

9
30
3
5
30
35
42
9
11
29
B tree Delete
  • Simple delete case
  • Underflow case
  • redistribute records
  • coalesce with siblings
  • update parents

30
B tree Simple Delete (1/2)
  • Delete 110

120
130
180
200
205
210
215
31
B tree Simple Delete (2/2)
  • Leaf Updated

100
101
120
130
180
200
205
210
215
32
B tree Delete Redistribution (1/2)
  • Delete 180

100
101
120
130
180
200
205
210
215
33
B tree Delete Redistribution (2/2)
  • Redistribute entries
  • left or right sibling

179
205
100
101
120
130
150
156
179
200
205
210
34
B tree Delete Coalesce (1/4)
  • Delete 101

179
205
100
101
120
130
150
156
179
200
205
210
215
35
B tree Delete Coalesce (2/4)
  • Leaf updated
  • No redistribution
  • sibling coalesce

179
205
100
120
130
150
156
179
200
205
210
215
36
B tree Delete Coalesce (3/4)
  • Leaf updated
  • No redistribution
  • sibling coalesce

179
205
100
120
130
150
156
179
200
205
210
215
37
B tree Delete Coalesce (4/4)
  • Redistribution

205
100
120
130
150
156
179
200
205
210
215
38
Hashing Techniques
39
Static Hashing (1/2)
  • Store records in buckets with overflow chains
  • Allocate a fixed number of buckets M
  • Problems
  • small M
  • long overflow chains, slow search-delete-insert

40
Static Hashing (2/2)
  • Problems
  • large M
  • wasted space, slow scan

null
null
h
null
41
Dynamic Hashing
  • Splitting and coalescing buckets as the database
    grows-shrinks
  • One scheme Extendible Hashing
  • Hash function generates large values, eg 32 bits
  • use i bits, change i as database size changes
  • If overflow, double the number of buckets
  • use i1 bits of the hash function
  • but, expensive read all pages M and distribute
    records in 2M pages
  • solution use a directory and double the size of
    the directory
  • only split bucket that overflowed

42
Extendible Hashing (1/4)
h(18) 10010
18
43
Extendible Hashing (2/4)
2
16
20
A
2
00
2
01
1
B
10
h(4) 00100
2
11
18
C
2
3
7
D
44
Extendible Hashing (3/4)
3
16
A
2
00
2
01
1
B
10
2
11
18
C
2
3
7
D
3
20
4
A1
45
Extendible Hashing (4/4)
3
  • Global Depth
  • Local Depth
  • If bucket full
  • split bucket
  • increment LD
  • If GDLD
  • increment GD
  • double directory

16
A
3
000
2
001
1
B
010
2
011
18
C
100
101
2
110
3
7
D
3
111
20
4
A1
46
Extendible Hashing Delete
  • If deletion make bucket empty
  • merge with split image
  • If directory pointers point to same bucket as
    split image
  • directory halved

47
Extendible Hashing Summary
  • Avoids overflow pages
  • Directory can get large
  • Key search requires just 2 page reads
  • Space utilization fluctuates
  • 59-90 for uniformly distributed records

48
Extendible Hashing Exercise
  • Initially GD LD 1
  • M 2 buckets
  • Hash function h(k) k mod 2i
  • inserts 14, 18, 22, 3, 9
  • deletes 9, 22, 3

1
12
8
1
00
1
01
5
Write a Comment
User Comments (0)
About PowerShow.com