Data Storage and Indexing - PowerPoint PPT Presentation

1 / 76
About This Presentation
Title:

Data Storage and Indexing

Description:

for a while new old may coexist. Records from different relations may ... After a while the file starts being dominated by overflow blocks: time to reorganize ... – PowerPoint PPT presentation

Number of Views:23
Avg rating:3.0/5.0
Slides: 77
Provided by: csU70
Category:

less

Transcript and Presenter's Notes

Title: Data Storage and Indexing


1
Data Storage and Indexing
Lecture 18
2
Outline
  • Last lecture
  • memory hierarchy
  • disks
  • buffer managers
  • This lecture
  • representing data
  • external sorting
  • indexing
  • types of indexes
  • B trees

3
The Memory Hierarchy
Main Memory Disk Cache
  • Processor Cache
  • access time 10 nanos
  • Volatile
  • 256M-1G
  • expensive
  • Access time
  • 10-100 nanoseconds

Disk
Tape
  • Persistent
  • 2-10 GB storage
  • speed
  • Rate5-10 MB/S
  • Access time
  • 10-15 msecs.
  • 1.5 MB/S transfer rate
  • 280 GB typical
  • capacity
  • Only sequential access
  • Not for operational
  • data

4
Main Memory
  • Fastest, most expensive
  • Today 256MB are common even on PCs
  • Many databases could fit in memory
  • New industry trend Main Memory Database
  • E.g TimesTen
  • Main issue is volatility

5
Secondary Storage
  • Disks
  • Slower, cheaper than main memory
  • Persistent !!!
  • The unit of disk I/O block
  • Typically 1 block 4k
  • Used with a main memory buffer

6
The Mechanics of Disk
Cylinder
  • Mechanical characteristics
  • Rotation speed (5400RPM)
  • Number of platers (1-30)
  • Number of tracks (lt10000)
  • Number of bytes/track(105)

Spindle
Disk head
Sector
Platters
7
Important Disk Access Characteristics
  • Disk latency time between when command is
    issued and when data is in memory
  • Disk latency seek time rotational latency
  • Seek time time for the head to reach cylinder
  • 10ms 40ms
  • Rotational latency time for the sector to
    rotate
  • Rotation time 10ms
  • Average latency 10ms/2
  • Transfer time typically 5-10MB/s
  • Disks read/write one block at a time (typically
    4kB)

8
RAIDs
  • Redundant Array of Independent Disks
  • Was inexpensive disks
  • Idea use more disks, increase reliability
  • Recall
  • Database recovery helps after a systems crash,
    not after a disk crash
  • 6 ways to use RAIDs. More important
  • Level 4 use N-1 data disks, plus one parity disk
  • Level 5 same, but alternate which disk is the
    parity
  • Level 6 use Hamming codes instead of parity

9
Buffer Management in a DBMS
Page Requests from Higher Levels
BUFFER POOL
disk page
free frame
MAIN MEMORY
DISK
choice of frame dictated by replacement policy
  • Data must be in RAM for DBMS to operate on it!
  • Table of ltframe, pageidgt pairs is maintained

10
Buffer Manager
  • When a page is first requested, it is read in a
    free frame
  • The DBMS typically requests it to be pinned
  • pin_count how many processes requested it
    pinned
  • When the DBMS writes to it, the buffer manager
    marks it dirty
  • When the DBMS doesnt need it any more,
    un-pinned
  • pin_count is decremented
  • Typical replacement policy (always chooses among
    un-pinned frames)
  • LRU
  • Clock
  • MRU

11
Buffer Manager
Why not use the Operating System for the
task?? - DBMS may be able to anticipate access
patterns - Hence, may also be able to perform
prefetching - DBMS needs the ability to force
pages to disk.
12
Representing Data Elements
  • Relational database elements
  • CREATE TABLE Product (
  • pid INT PRIMARY KEY,
  • name CHAR(20),
  • description VARCHAR(200),
  • maker CHAR(10) REFERENCES Company(name))
  • A tuple is represented as a record

13
Representing Data Elements
  • Representing objects
  • interface Company
  • attribute string name
  • relationship SetltProductgt makes
  • inverse Productmaker
  • An object is represented as a record plus object
    identifier
  • What to do with repeating fields (e.g. makes)

14
Record Formats Fixed Length
F1
F2
F3
F4
L1
L2
L3
L4
Base address (B)
Address BL1L2
  • Information about field types same for all
    records in a file stored in system catalogs.
  • Finding ith field requires scan of record.
  • Note the importance of schema information!

15
Record Header
To schema
length
F1
F2
F3
F4
L1
L2
L3
L4
header
timestamp
  • Need the header because
  • The schema may change
  • for a while newold may coexist
  • Records from different relations may coexist

16
Variable Length Records
Other header information
header
F1
F2
F3
F4
L1
L2
L3
L4
length
Place the fixed fields first F1, F2 Then the
variable length fields F3, F4 Null values take 2
bytes only Sometimes they take 0 bytes (when at
the end)
17
Records With Repeating Fields
Other header information
header
F1
F2
F3
L1
L2
L3
length
E.g. to represent one-many or many-many
relationships
18
Storing Records in Blocks
  • Blocks have fixed size (typically 4k)

BLOCK
R1
R2
R3
R4
19
Spanning Records Across Blocks
  • When records are very large
  • Or even medium size saves space in blocks

block header
block header
R1
R2
R3
R2
20
BLOB
  • Binary large objects
  • Supported by modern database systems
  • E.g. images, sounds, etc.
  • Storage attempt to cluster blocks together

21
Modifications Insertion
  • File is unsorted ( heap file)
  • add it to the end (easy ?)
  • File is sorted
  • Is there space in the right block ?
  • Yes we are lucky, store it there
  • Is there space in a neighboring block ?
  • Look 1-2 blocks to the left/right, shift records
  • If anything else fails, create overflow block

22
Overflow Blocks
Blockn-1
Blockn
Blockn1
Overflow
  • After a while the file starts being dominated by
    overflow blocks time to reorganize

23
Modifications Deletions
  • Free space in block, shift records
  • Maybe be able to eliminate an overflow block
  • Can never really eliminate the record, because
    others may point to it
  • Place a tombstone instead (a NULL record)

24
Modifications Updates
  • If new record is shorter than previous, easy ?
  • If it is longer, need to shift records, create
    overflow blocks

25
We will skip logical and physical
addressespointer swizzlingin this lecture, pls
see book
26
Physical Addresses
  • Each block and each record have a physical
    address that consists of
  • The host
  • The disk
  • The cylinder number
  • The track number
  • The block within the track
  • For records an offset in the block
  • sometimes this is in the blocks header

27
Logical Addresses
  • Logical address a string of bytes (10-16)
  • More flexible can blocks/records around
  • But need translation table

28
Main Memory Address
  • When the block is read in main memory, it
    receives a main memory address
  • Buffer manager has another translation table

29
Optimization Pointer Swizzling
  • the process of replacing a physical/logical
    pointer with a main memory pointer
  • Still need translation table, but subsequent
    references are faster

30
Pointer Swizzling
Block 2
Block 1
Disk
read in memory
swizzled
Memory
unswizzled
31
Pointer Swizzling
  • Automatic when block is read in main memory,
    swizzle all pointers in the block
  • On demand swizzle only when user requests
  • No swizzling always use translation table

32
Pointer Swizzling
  • When blocks return to disk pointers need
    unswizzled
  • Danger someone else may point to this block
  • Pinned blocks we dont allow it to return to
    disk
  • Keep a list of references to this block

33
The I/O Model of Computation
  • In main memory algorithms we care about CPU time
  • In databases time is dominated by I/O cost
  • Assumption cost is given only by I/O
  • Consequence need to redesign certain algorithms
  • Will illustrate here with sorting

34
Sorting
  • Illustrates the difference in algorithm design
    when your data is not in main memory
  • Problem sort 1Gb of data with 1Mb of RAM.
  • Arises in many places in database systems
  • Data requested in sorted order (ORDER BY)
  • Needed for grouping operations
  • First step in sort-merge join algorithm
  • Duplicate removal
  • Bulk loading of B-tree indexes.

35
2-Way Merge-sortRequires 3 Buffers
  • Pass 1 Read a page, sort it, write it.
  • only one buffer page is used
  • Pass 2, 3, , etc.
  • three buffer pages used.

INPUT 1
OUTPUT
INPUT 2
Main memory buffers
Disk
Disk
36
Two-Way External Merge Sort
  • Each pass we read write each page in file.
  • N pages in the file gt the number of passes
  • So total cost is
  • Improvement start with larger runs
  • Sort 1GB with 1MB memory in 10 passes

Input file
6,2
2
3,4
9,4
8,7
5,6
3,1
PASS 0
1-page runs
1,3
2
3,4
5,6
2,6
4,9
7,8
PASS 1
4,7
1,3
2,3
2-page runs
8,9
5,6
2
4,6
PASS 2
2,3
4,4
1,2
4-page runs
6,7
3,5
6
8,9
PASS 3
1,2
2,3
3,4
8-page runs
4,5
6,6
7,8
9
37
Two-Way External Merge Sort
2
3
6
8
1
4
5
3
4
9
6
2
7
3
4
6
2
4
9
8
7
5
1
3
6
2
5
Input file
PASS 0
2
6
5
3
1
7
8
4
9
6
2
4
3
4
1-page runs
PASS 1
2-page runs
PASS 2
38
2
3
4
7
1
3
7
4
2
3
3
1
2-page runs
4
6
6
8
9
5
2
9
8
6
4
6
5
2
PASS 2
2
3
4
4
1
2
4-page runs
7
6
5
3
6
8
9
PASS 3
8-page runs
39
Can We Do Better ?
  • We have more main memory
  • Should use it to improve performance

40
Cost Model for Our Analysis
  • B Block size
  • M Size of main memory
  • N Number of records in the file
  • R Size of one record

41
External Merge-Sort
  • Phase one load M bytes in memory, sort
  • Result runs of length M/R records

M/R records
. . .
. . .
Disk
Disk
M bytes of main memory
42
Phase Two
  • Merge M/B 1 runs into a new run
  • Result runs have now M/R (M/B 1) records

Input 1
. . .
. . .
Input 2
Output
. . . .
Input M/B
Disk
Disk
M bytes of main memory
43
Phase Three
  • Merge M/B 1 runs into a new run
  • Result runs have now M/R (M/B 1)2 records

Input 1
. . .
. . .
Input 2
Output
. . . .
Input M/B
Disk
Disk
M bytes of main memory
44
Cost of External Merge Sort
  • Number of passes
  • Think differently
  • Given B 4KB, M 64MB, R 0.1KB
  • Pass 1 runs of length M/R 640000
  • Have now sorted runs of 640000 records
  • Pass 2 runs increase by a factor of M/B 1
    16000
  • Have now sorted runs of 10,240,000,000 1010
    records
  • Pass 3 runs increase by a factor of M/B 1
    16000
  • Have now sorted runs of 1014 records
  • Nobody has so much data !
  • Can sort everything in 2 or 3 passes !

45
Indexes
46
Outline
  • Types of indexes
  • B trees
  • Hash tables (maybe)

47
Indexes
  • An index on a file speeds up selections on the
    search key field(s)
  • Search key any subset of the fields of a
    relation
  • Search key is not the same as key (minimal set of
    fields that uniquely identify a record in a
    relation).
  • Entries in an index (k, r), where
  • k the key
  • r the record OR record id OR record ids

48
Index Classification
  • Clustered/unclustered
  • Clustered records sorted in the key order
  • Unclustered no
  • Dense/sparse
  • Dense each record has an entry in the index
  • Sparse only some records have
  • Primary/secondary
  • Primary on the primary key
  • Secondary on any key
  • Some books interpret these differently
  • B tree / Hash table /

49
Clustered Index
  • NOT the same as clustered relation in book
  • File is sorted on the index attribute
  • Dense index sequence of (key,pointer) pairs
  • See Sections 13.1.2 13.1.3 in book

50
Clustered Index
  • Sparse index one key per data block

51
Clustered Index with Duplicate Keys
  • Dense index point to the first record with that
    key

52
Clustered Index with Duplicate Keys
  • Sparse index pointer to lowest search key in
    each block
  • Search for 20

53
Clustered Index with Duplicate Keys
  • Better pointer to lowest new search key in each
    block
  • Search for 20

54
Unclustered Indexes
  • To index other attributes than primary key
  • Always dense (why ?)

55
Summary Clustered vs. Unclustered Index
Data entries
Data entries
(Index File)
(Data file)
Data Records
Data Records
CLUSTERED

UNCLUSTERED
56
Composite Search Keys
  • Composite Search Keys Search on a combination of
    fields.
  • Equality query Every field value is equal to a
    constant value. E.g. wrt ltsal,agegt index
  • age20 and sal 75
  • Range query Some field value is not a constant.
    E.g.
  • age 20 or age20 and sal gt 10

Examples of composite key indexes using
lexicographic order.
11,80
11
12
12,10
name
age
sal
12,20
12
bob
10
12
13,75
13
cal
80
11
ltage, salgt
ltagegt
joe
12
20
sue
13
75
10,12
10
20
20,12
Data records sorted by name
75,13
75
80,11
80
ltsal, agegt
ltsalgt
Data entries in index sorted by ltsal,agegt
Data entries sorted by ltsalgt
57
B Trees
  • Search trees
  • Idea in B Trees
  • make 1 node 1 block
  • Idea in B Trees
  • Make leaves into a linked list (range queries are
    easier)

58
B Trees Basics
  • Parameter d the degree
  • Each node has gt d and lt 2d keys (except root)
  • Each leaf has gtd and lt 2d keys

Keys k lt 30
Keys 120ltklt240
Keys 240ltk
Keys 30ltklt120
Next leaf
40
50
60
59
B Tree Example
d 2
10
15
18
20
30
40
50
60
65
80
85
90
60
B Tree Design
  • How large d ?
  • Example
  • Key size 4 bytes
  • Pointer size 8 bytes
  • Block size 4096 byes
  • 2d x 4 (2d1) x 8 lt 4096
  • d 170

61
Searching a B Tree
  • Exact key values
  • Start at the root
  • Proceed down, to the leaf
  • Range queries
  • As above
  • Then sequential traversal

Select name From people Where age 25
Select name From people Where 20 lt age and
age lt 30
62
B Trees in Practice
  • Typical order 100. Typical fill-factor 67.
  • average fanout 133
  • Typical capacities
  • Height 4 1334 312,900,700 records
  • Height 3 1333 2,352,637 records
  • Can often hold top levels in buffer pool
  • Level 1 1 page 8 Kbytes
  • Level 2 133 pages 1 Mbyte
  • Level 3 17,689 pages 133 MBytes

63
Insertion in a B Tree
  • Insert (K, P)
  • Find leaf where K belongs, insert
  • If no overflow (2d keys or less), halt
  • If overflow (2d1 keys), split node, insert in
    parent
  • If leaf, keep K3 too in right node
  • When root splits, new root has 1 key only

(K3, ) to parent
64
Insertion in a B Tree
Insert K19
10
15
18
20
30
40
50
60
65
80
85
90
65
Insertion in a B Tree
After insertion
10
15
18
20
30
40
50
60
65
80
85
90
19
66
Insertion in a B Tree
Now insert 25
10
15
18
20
30
40
50
60
65
80
85
90
19
67
Insertion in a B Tree
After insertion
10
15
18
20
25
30
40
60
65
80
85
90
19
50
68
Insertion in a B Tree
But now have to split !
10
15
18
20
25
30
40
60
65
80
85
90
19
50
69
Insertion in a B Tree
After the split
10
15
18
20
25
30
40
60
65
80
85
90
19
50
70
Deletion from a B Tree
Delete 30
10
15
18
20
25
30
40
60
65
80
85
90
19
50
71
Deletion from a B Tree
After deleting 30
May change to 40, or not
10
15
18
20
25
40
60
65
80
85
90
19
50
72
Deletion from a B Tree
Now delete 25
10
15
18
20
25
40
60
65
80
85
90
19
50
73
Deletion from a B Tree
After deleting 25 Need to rebalance Rotate
10
15
18
20
40
60
65
80
85
90
19
50
74
Deletion from a B Tree
Now delete 40
10
15
18
20
40
60
65
80
85
90
19
50
75
Deletion from a B Tree
After deleting 40 Rotation not possible Need to
merge nodes
10
15
18
20
60
65
80
85
90
19
50
76
Deletion from a B Tree
Final tree
10
15
18
20
60
65
80
85
90
19
50
Write a Comment
User Comments (0)
About PowerShow.com