Data Storage and Indexing - PowerPoint PPT Presentation

1 / 76

About This Presentation

Title:

Data Storage and Indexing

Description:

for a while new old may coexist. Records from different relations may ... After a while the file starts being dominated by overflow blocks: time to reorganize ... – PowerPoint PPT presentation

Number of Views:23

Avg rating:3.0/5.0

Slides: 77

Provided by: csU70

Category:

more less

Transcript and Presenter's Notes

Title: Data Storage and Indexing

1
Data Storage and Indexing
Lecture 18
2
Outline

Last lecture
memory hierarchy
disks
buffer managers
This lecture
representing data
external sorting
indexing
types of indexes
B trees

3
The Memory Hierarchy
Main Memory Disk Cache

Processor Cache
access time 10 nanos

Volatile
256M-1G
expensive
Access time
10-100 nanoseconds

Disk
Tape

Persistent
2-10 GB storage
speed
Rate5-10 MB/S
Access time
10-15 msecs.

1.5 MB/S transfer rate
280 GB typical
capacity
Only sequential access
Not for operational
data

4
Main Memory

Fastest, most expensive
Today 256MB are common even on PCs
Many databases could fit in memory
New industry trend Main Memory Database
E.g TimesTen
Main issue is volatility

5
Secondary Storage

Disks
Slower, cheaper than main memory
Persistent !!!
The unit of disk I/O block
Typically 1 block 4k
Used with a main memory buffer

6
The Mechanics of Disk
Cylinder

Mechanical characteristics
Rotation speed (5400RPM)
Number of platers (1-30)
Number of tracks (lt10000)
Number of bytes/track(105)

Spindle
Disk head
Sector
Platters
7
Important Disk Access Characteristics

Disk latency time between when command is
issued and when data is in memory
Disk latency seek time rotational latency
Seek time time for the head to reach cylinder
10ms 40ms
Rotational latency time for the sector to
rotate
Rotation time 10ms
Average latency 10ms/2
Transfer time typically 5-10MB/s
Disks read/write one block at a time (typically
4kB)

8
RAIDs

Redundant Array of Independent Disks
Was inexpensive disks
Idea use more disks, increase reliability
Recall
Database recovery helps after a systems crash,
not after a disk crash
6 ways to use RAIDs. More important
Level 4 use N-1 data disks, plus one parity disk
Level 5 same, but alternate which disk is the
parity
Level 6 use Hamming codes instead of parity

9
Buffer Management in a DBMS
Page Requests from Higher Levels
BUFFER POOL
disk page
free frame
MAIN MEMORY
DISK
choice of frame dictated by replacement policy

Data must be in RAM for DBMS to operate on it!
Table of ltframe, pageidgt pairs is maintained

10
Buffer Manager

When a page is first requested, it is read in a
free frame
The DBMS typically requests it to be pinned
pin_count how many processes requested it
pinned
When the DBMS writes to it, the buffer manager
marks it dirty
When the DBMS doesnt need it any more,
un-pinned
pin_count is decremented
Typical replacement policy (always chooses among
un-pinned frames)
LRU
Clock
MRU

11
Buffer Manager
Why not use the Operating System for the
task?? - DBMS may be able to anticipate access
patterns - Hence, may also be able to perform
prefetching - DBMS needs the ability to force
pages to disk.
12
Representing Data Elements

Relational database elements
CREATE TABLE Product (
pid INT PRIMARY KEY,
name CHAR(20),
description VARCHAR(200),
maker CHAR(10) REFERENCES Company(name))
A tuple is represented as a record

13
Representing Data Elements

Representing objects
interface Company
attribute string name
relationship SetltProductgt makes
inverse Productmaker
An object is represented as a record plus object
identifier
What to do with repeating fields (e.g. makes)

14
Record Formats Fixed Length
F1
F2
F3
F4
L1
L2
L3
L4
Base address (B)
Address BL1L2

Information about field types same for all
records in a file stored in system catalogs.
Finding ith field requires scan of record.
Note the importance of schema information!

15
Record Header
To schema
length
F1
F2
F3
F4
L1
L2
L3
L4
header
timestamp

Need the header because
The schema may change
for a while newold may coexist
Records from different relations may coexist

16
Variable Length Records
Other header information
header
F1
F2
F3
F4
L1
L2
L3
L4
length
Place the fixed fields first F1, F2 Then the
variable length fields F3, F4 Null values take 2
bytes only Sometimes they take 0 bytes (when at
the end)
17
Records With Repeating Fields
Other header information
header
F1
F2
F3
L1
L2
L3
length
E.g. to represent one-many or many-many
relationships
18
Storing Records in Blocks

Blocks have fixed size (typically 4k)

BLOCK
R1
R2
R3
R4
19
Spanning Records Across Blocks

When records are very large
Or even medium size saves space in blocks

block header
block header
R1
R2
R3
R2
20
BLOB

Binary large objects
Supported by modern database systems
E.g. images, sounds, etc.
Storage attempt to cluster blocks together

21
Modifications Insertion

File is unsorted ( heap file)
add it to the end (easy ?)
File is sorted
Is there space in the right block ?
Yes we are lucky, store it there
Is there space in a neighboring block ?
Look 1-2 blocks to the left/right, shift records
If anything else fails, create overflow block

22
Overflow Blocks
Blockn-1
Blockn
Blockn1
Overflow

After a while the file starts being dominated by
overflow blocks time to reorganize

23
Modifications Deletions

Free space in block, shift records
Maybe be able to eliminate an overflow block
Can never really eliminate the record, because
others may point to it
Place a tombstone instead (a NULL record)

24
Modifications Updates

If new record is shorter than previous, easy ?
If it is longer, need to shift records, create
overflow blocks

25
We will skip logical and physical
addressespointer swizzlingin this lecture, pls
see book
26
Physical Addresses

Each block and each record have a physical
address that consists of
The host
The disk
The cylinder number
The track number
The block within the track
For records an offset in the block
sometimes this is in the blocks header

27
Logical Addresses

Logical address a string of bytes (10-16)
More flexible can blocks/records around
But need translation table

28
Main Memory Address

When the block is read in main memory, it
receives a main memory address
Buffer manager has another translation table

29
Optimization Pointer Swizzling

the process of replacing a physical/logical
pointer with a main memory pointer
Still need translation table, but subsequent
references are faster

30
Pointer Swizzling
Block 2
Block 1
Disk
read in memory
swizzled
Memory
unswizzled
31
Pointer Swizzling

Automatic when block is read in main memory,
swizzle all pointers in the block
On demand swizzle only when user requests
No swizzling always use translation table

32
Pointer Swizzling

When blocks return to disk pointers need
unswizzled
Danger someone else may point to this block
Pinned blocks we dont allow it to return to
disk
Keep a list of references to this block

33
The I/O Model of Computation

In main memory algorithms we care about CPU time
In databases time is dominated by I/O cost
Assumption cost is given only by I/O
Consequence need to redesign certain algorithms
Will illustrate here with sorting

34
Sorting

Illustrates the difference in algorithm design
when your data is not in main memory
Problem sort 1Gb of data with 1Mb of RAM.
Arises in many places in database systems
Data requested in sorted order (ORDER BY)
Needed for grouping operations
First step in sort-merge join algorithm
Duplicate removal
Bulk loading of B-tree indexes.

35
2-Way Merge-sortRequires 3 Buffers

Pass 1 Read a page, sort it, write it.
only one buffer page is used
Pass 2, 3, , etc.
three buffer pages used.

INPUT 1
OUTPUT
INPUT 2
Main memory buffers
Disk
Disk
36
Two-Way External Merge Sort

Each pass we read write each page in file.
N pages in the file gt the number of passes
So total cost is
Improvement start with larger runs
Sort 1GB with 1MB memory in 10 passes

Input file
6,2
2
3,4
9,4
8,7
5,6
3,1
PASS 0
1-page runs
1,3
2
3,4
5,6
2,6
4,9
7,8
PASS 1
4,7
1,3
2,3
2-page runs
8,9
5,6
2
4,6
PASS 2
2,3
4,4
1,2
4-page runs
6,7
3,5
6
8,9
PASS 3
1,2
2,3
3,4
8-page runs
4,5
6,6
7,8
9
37
Two-Way External Merge Sort
2
3
6
8
1
4
5
3
4
9
6
2
7
3
4
6
2
4
9
8
7
5
1
3
6
2
5
Input file
PASS 0
2
6
5
3
1
7
8
4
9
6
2
4
3
4
1-page runs
PASS 1
2-page runs
PASS 2
38
2
3
4
7
1
3
7
4
2
3
3
1
2-page runs
4
6
6
8
9
5
2
9
8
6
4
6
5
2
PASS 2
2
3
4
4
1
2
4-page runs
7
6
5
3
6
8
9
PASS 3
8-page runs
39
Can We Do Better ?

We have more main memory
Should use it to improve performance

40
Cost Model for Our Analysis

B Block size
M Size of main memory
N Number of records in the file
R Size of one record

41
External Merge-Sort

Phase one load M bytes in memory, sort
Result runs of length M/R records

M/R records
. . .
. . .
Disk
Disk
M bytes of main memory
42
Phase Two

Merge M/B 1 runs into a new run
Result runs have now M/R (M/B 1) records

Input 1
. . .
. . .
Input 2
Output
. . . .
Input M/B
Disk
Disk
M bytes of main memory
43
Phase Three

Merge M/B 1 runs into a new run
Result runs have now M/R (M/B 1)2 records

Input 1
. . .
. . .
Input 2
Output
. . . .
Input M/B
Disk
Disk
M bytes of main memory
44
Cost of External Merge Sort

Number of passes
Think differently
Given B 4KB, M 64MB, R 0.1KB
Pass 1 runs of length M/R 640000
Have now sorted runs of 640000 records
Pass 2 runs increase by a factor of M/B 1
16000
Have now sorted runs of 10,240,000,000 1010
records
Pass 3 runs increase by a factor of M/B 1
16000
Have now sorted runs of 1014 records
Nobody has so much data !
Can sort everything in 2 or 3 passes !

45
Indexes
46
Outline

Types of indexes
B trees
Hash tables (maybe)

47
Indexes

An index on a file speeds up selections on the
search key field(s)
Search key any subset of the fields of a
relation
Search key is not the same as key (minimal set of
fields that uniquely identify a record in a
relation).
Entries in an index (k, r), where
k the key
r the record OR record id OR record ids

48
Index Classification

Clustered/unclustered
Clustered records sorted in the key order
Unclustered no
Dense/sparse
Dense each record has an entry in the index
Sparse only some records have
Primary/secondary
Primary on the primary key
Secondary on any key
Some books interpret these differently
B tree / Hash table /

49
Clustered Index

NOT the same as clustered relation in book
File is sorted on the index attribute
Dense index sequence of (key,pointer) pairs
See Sections 13.1.2 13.1.3 in book

50
Clustered Index

Sparse index one key per data block

51
Clustered Index with Duplicate Keys

Dense index point to the first record with that
key

52
Clustered Index with Duplicate Keys

Sparse index pointer to lowest search key in
each block
Search for 20

53
Clustered Index with Duplicate Keys

Better pointer to lowest new search key in each
block
Search for 20

54
Unclustered Indexes

To index other attributes than primary key
Always dense (why ?)

55
Summary Clustered vs. Unclustered Index
Data entries
Data entries
(Index File)
(Data file)
Data Records
Data Records
CLUSTERED

UNCLUSTERED
56
Composite Search Keys

Composite Search Keys Search on a combination of
fields.
Equality query Every field value is equal to a
constant value. E.g. wrt ltsal,agegt index
age20 and sal 75
Range query Some field value is not a constant.
E.g.
age 20 or age20 and sal gt 10

Examples of composite key indexes using
lexicographic order.
11,80
11
12
12,10
name
age
sal
12,20
12
bob
10
12
13,75
13
cal
80
11
ltage, salgt
ltagegt
joe
12
20
sue
13
75
10,12
10
20
20,12
Data records sorted by name
75,13
75
80,11
80
ltsal, agegt
ltsalgt
Data entries in index sorted by ltsal,agegt
Data entries sorted by ltsalgt
57
B Trees

Search trees
Idea in B Trees
make 1 node 1 block
Idea in B Trees
Make leaves into a linked list (range queries are
easier)

58
B Trees Basics

Parameter d the degree
Each node has gt d and lt 2d keys (except root)
Each leaf has gtd and lt 2d keys

Keys k lt 30
Keys 120ltklt240
Keys 240ltk
Keys 30ltklt120
Next leaf
40
50
60
59
B Tree Example
d 2
10
15
18
20
30
40
50
60
65
80
85
90
60
B Tree Design

How large d ?
Example
Key size 4 bytes
Pointer size 8 bytes
Block size 4096 byes
2d x 4 (2d1) x 8 lt 4096
d 170

61
Searching a B Tree

Exact key values
Start at the root
Proceed down, to the leaf
Range queries
As above
Then sequential traversal

Select name From people Where age 25
Select name From people Where 20 lt age and
age lt 30
62
B Trees in Practice

Typical order 100. Typical fill-factor 67.
average fanout 133
Typical capacities
Height 4 1334 312,900,700 records
Height 3 1333 2,352,637 records
Can often hold top levels in buffer pool
Level 1 1 page 8 Kbytes
Level 2 133 pages 1 Mbyte
Level 3 17,689 pages 133 MBytes

63
Insertion in a B Tree

Insert (K, P)
Find leaf where K belongs, insert
If no overflow (2d keys or less), halt
If overflow (2d1 keys), split node, insert in
parent
If leaf, keep K3 too in right node
When root splits, new root has 1 key only

(K3, ) to parent
64
Insertion in a B Tree
Insert K19
10
15
18
20
30
40
50
60
65
80
85
90
65
Insertion in a B Tree
After insertion
10
15
18
20
30
40
50
60
65
80
85
90
19
66
Insertion in a B Tree
Now insert 25
10
15
18
20
30
40
50
60
65
80
85
90
19
67
Insertion in a B Tree
After insertion
10
15
18
20
25
30
40
60
65
80
85
90
19
50
68
Insertion in a B Tree
But now have to split !
10
15
18
20
25
30
40
60
65
80
85
90
19
50
69
Insertion in a B Tree
After the split
10
15
18
20
25
30
40
60
65
80
85
90
19
50
70
Deletion from a B Tree
Delete 30
10
15
18
20
25
30
40
60
65
80
85
90
19
50
71
Deletion from a B Tree
After deleting 30
May change to 40, or not
10
15
18
20
25
40
60
65
80
85
90
19
50
72
Deletion from a B Tree
Now delete 25
10
15
18
20
25
40
60
65
80
85
90
19
50
73
Deletion from a B Tree
After deleting 25 Need to rebalance Rotate
10
15
18
20
40
60
65
80
85
90
19
50
74
Deletion from a B Tree
Now delete 40
10
15
18
20
40
60
65
80
85
90
19
50
75
Deletion from a B Tree
After deleting 40 Rotation not possible Need to
merge nodes
10
15
18
20
60
65
80
85
90
19
50
76
Deletion from a B Tree
Final tree
10
15
18
20
60
65
80
85
90
19
50

Write a Comment

User Comments (0)