Title: 13.1 & 13.2
SECONDARY STORAGE MANAGEMENT
202 Sanuja Dabade
13.1.1 Memory Hierarchy
- Devices for data storage have different capacities available
- Cost per byte to store data also varies
- Devices with the smallest capacity offer the fastest speed, at the highest cost per bit
Memory Hierarchy Diagram
- From fastest/smallest to slowest/largest: Cache, Main Memory, Disk (as virtual memory and file system), Tertiary Storage
- Programs and main-memory DBMSs work at the main-memory level; disk-based DBMSs span main memory and disk
13.1.1 Memory Hierarchy
- Cache
- Lowest level of the memory hierarchy, but the fastest
- Data items are copies of certain main-memory locations
- Sometimes values in the cache are changed, and the corresponding changes to main memory are delayed
- The machine looks in the cache for instructions as well as for the data those instructions use
- Holds a limited amount of data
13.1.1 Memory Hierarchy (cont.)
- In a single-processor computer, the data in main memory doesn't have to be updated immediately
- In a multiprocessor computer, the data is updated to main memory immediately; this is called write-through
Main Memory
- Everything that happens in the computer, i.e. instruction execution and data manipulation, works on information that is resident in main memory
- Main memories are random access: one can obtain any byte in the same amount of time
Secondary Storage
- Used to store data and programs when they are not being processed
- More permanent than main memory: data and programs are retained when the power is turned off
- E.g. magnetic disks, hard disks
13.2 SECONDARY STORAGE MANAGEMENT
201 Eilbroun Benjamin
9Tertiary Storage
- Holds data volumes in terabytes
- Used for databases much larger than what can be
stored on disk
13.1.2 Transfer of Data Between Levels
- Data moves between adjacent levels of the hierarchy
- At the secondary or tertiary levels, accessing the desired data or finding the desired place to store it takes a lot of time
- The disk is divided into blocks
- Entire blocks are moved to and from a region of main memory called a buffer
13.1.2 Transfer of Data Between Levels (cont.)
- A key technique for speeding up database operations is to arrange the data so that when one piece of data on a block is needed, it is likely that other data on the same block will be needed at the same time
- The same idea applies to the other hierarchy levels
13.1.3 Volatile and Non-Volatile Storage
- A volatile device forgets what data is stored on it when the power goes off
- A non-volatile device holds data even when the device is turned off
- All secondary and tertiary devices are non-volatile, while main memory is volatile
13.1.4 Virtual Memory
- Typical software executes in virtual memory
- The address space is typically 32 bits, i.e. 2^32 bytes or 4 GB
- Transfer between memory and disk is in terms of blocks
13.2.1 Mechanics of Disks
- The use of secondary storage is one of the important characteristics of a DBMS
- A disk consists of two moving pieces:
- 1. the disk assembly
- 2. the head assembly
- The disk assembly consists of one or more platters
- Platters rotate around a central spindle
- Bits are stored on the upper and lower surfaces of the platters
13.2.1 Mechanics of Disks (cont.)
- Each surface of a disk is divided into tracks
- A track is divided into sectors
- Sectors are segments of the circle, separated by gaps
- The tracks that are at a fixed radius from the center, across all surfaces, form one cylinder
13.2.2 Disk Controller
- One or more disks are controlled by a disk controller
- Disk controllers are capable of:
- Controlling the mechanical actuator that moves the head assembly
- Selecting a sector from among all those in the cylinder at which the heads are positioned
- Transferring bits between the desired sector and main memory
- Possibly buffering an entire track
13.2.3 Disk Access Characteristics
- Accessing (reading/writing) a block requires 3 steps:
- The disk controller positions the head assembly at the cylinder containing the track on which the block is located. This is the seek time.
- The disk controller waits while the first sector of the block moves under the head. This is the rotational latency.
- All the sectors and the gaps between them pass under the head while the disk controller reads or writes the data in these sectors. This is the transfer time.
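A minimal sketch of this three-component access model, in Python. The 6.46 ms average seek and 8.33 ms rotation figures appear in the examples later in this section; the 256-sectors-per-track and 16-sectors-per-block values are assumptions chosen only to roughly reproduce the ~11 ms per 16 KB block quoted for the Megatron 747.

def block_access_time(seek_ms, rotation_ms, sectors_per_track, sectors_per_block):
    """Seek time + average rotational latency + transfer time for one block."""
    rotational_latency = rotation_ms / 2                      # wait half a rotation on average
    transfer = rotation_ms * sectors_per_block / sectors_per_track
    return seek_ms + rotational_latency + transfer

# Roughly reproduces the ~11 ms figure quoted above (assumed geometry).
print(block_access_time(seek_ms=6.46, rotation_ms=8.33,
                        sectors_per_track=256, sectors_per_block=16))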
13.3 Accelerating Access to Secondary Storage
- Several approaches exist for accessing data in secondary storage more efficiently:
- Place blocks that are accessed together in the same cylinder.
- Divide the data among multiple disks.
- Mirror disks.
- Use disk-scheduling algorithms.
- Prefetch blocks into main memory.
- Scheduling latency: the added delay in accessing data caused by a disk-scheduling algorithm.
- Throughput: the number of disk accesses per second that the system can accommodate.
13.3.1 The I/O Model of Computation
- The number of block accesses (disk I/Os) is an important time approximation for an algorithm.
- This number should be minimized.
- Ex 13.3: You want an index on R that identifies the block on which the desired tuple appears, but not where on the block it resides.
- For the Megatron 747 (M747) example, it takes 11 ms to read a 16 KB block.
- A standard microprocessor can execute millions of instructions in 11 ms, making any delay in searching the block for the desired tuple negligible.
13.3.2 Organizing Data by Cylinders
- If we read all blocks on a single track or cylinder consecutively, we can neglect all but the first seek time and the first rotational latency.
- Ex 13.4: We request 1024 blocks of the M747.
- If the data is randomly distributed, the average latency is 10.76 ms by Ex 13.2, making the total latency about 11 s.
- If all blocks are stored consecutively on one cylinder:
- 6.46 ms + (8.33 ms × 16) = 139 ms
- (1 average seek) + (time per rotation) × (number of rotations)
13.3.3 Using Multiple Disks
- If we have n disks, read/write performance will increase by a factor of n.
- Striping: distributing a relation across multiple disks following this pattern:
- Disk 1 holds blocks R1, R1+n, R1+2n, ...
- Disk 2 holds blocks R2, R2+n, R2+2n, ...
- ...
- Disk n holds blocks Rn, R2n, R3n, ...
- Ex 13.5: We request 1024 blocks with n = 4.
- 6.46 ms + (8.33 ms × (16/4)) = 39.8 ms
- (1 average seek) + (time per rotation) × (number of rotations)
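A minimal sketch of the striping rule above: block i of a relation round-robins across the n disks. Disk numbering from 0 and the sample block numbers are illustrative assumptions.

def striped_location(block_number, n_disks):
    """Return (disk index, position on that disk) for a striped relation."""
    return block_number % n_disks, block_number // n_disks

# With n = 4 disks, blocks 0..7 land on disks 0,1,2,3,0,1,2,3.
for b in range(8):
    disk, local = striped_location(b, 4)
    print(f"block {b} -> disk {disk}, position {local}")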
13.3.4 Mirroring Disks
- Mirroring disks: having 2 or more disks hold identical copies of the data.
- Benefit 1: If n disks are mirrors of each other, the system can survive crashes of up to n-1 disks.
- Benefit 2: If we have n disks, read performance increases by a factor of n.
- Performance increases further by having the controller select, for each read, the disk whose head is closest to the desired data block.
13.3.5 Disk Scheduling and the Elevator Problem
- The disk controller can run this algorithm to select which of several pending requests to process first.
- Pseudocode (a runnable sketch follows below):
- requests := [] // all unprocessed data requests
- upon receiving a new data request:
-     requests.add(new request)
- while (requests is not empty):
-     move head to next location
-     if (head location holds data in requests):
-         retrieve the data
-         remove that request from requests
-     if (head reaches the end of the disk):
-         reverse head direction
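A runnable Python sketch of the pseudocode above. It assumes all requests are already pending when the sweep starts; in the worked example on the next slide new requests arrive while the head moves, which is why cylinder 16000 ends up being served on the return sweep there.

def elevator(head, pending, direction=1):
    """Serve cylinder requests by sweeping the head in one direction,
    reversing only when nothing remains ahead of it."""
    order = []
    pending = list(pending)
    while pending:
        ahead = [c for c in pending if (c - head) * direction >= 0]
        if not ahead:              # reached the end of the sweep: reverse
            direction = -direction
            continue
        nxt = min(ahead, key=lambda c: abs(c - head))
        order.append(nxt)
        pending.remove(nxt)
        head = nxt
    return order

print(elevator(0, [8000, 24000, 56000, 16000, 64000, 40000]))
# -> [8000, 16000, 24000, 40000, 56000, 64000] when all requests are known up front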
13.3.5 Disk Scheduling and the Elevator Problem (cont.)
Example timeline (the head starts at cylinder 0 and sweeps toward higher cylinders; new requests arrive while it moves):
- Time 0: requests arrive for cylinders 8000, 24000, and 56000.
- Time 4.3: data at 8000 retrieved; around time 10 a request for 16000 arrives.
- Time 13.6: data at 24000 retrieved; around time 20 a request for 64000 arrives.
- Time 26.9: data at 56000 retrieved; around time 30 a request for 40000 arrives.
- Time 34.2: data at 64000 retrieved; the head reverses direction.
- Time 45.5: data at 40000 retrieved.
- Time 56.8: data at 16000 retrieved.

Cylinder   Time retrieved
8000       4.3
24000      13.6
56000      26.9
64000      34.2
40000      45.5
16000      56.8
13.3.5 Disk Scheduling and the Elevator Problem (cont.)
Completion times for the same requests under the elevator algorithm and under FIFO:

Cylinder   Elevator   FIFO
8000       4.3        4.3
24000      13.6       13.6
56000      26.9       26.9
64000      34.2       59.5
40000      45.5       70.8
16000      56.8       42.2
13.3.6 Prefetching and Large-Scale Buffering
- If, at the application level, we can predict the order in which blocks will be requested, we can load them into main memory before they are needed.
13.4 DISK FAILURE WAYS AND THEIR MITIGATION
203 Priya Gangaraju
Ways in which disks can fail:
- Intermittent failure
- Media decay
- Write failure
- Disk crash
Intermittent Failures
- A read or write operation on a sector succeeds, but only after repeated tries.
- This is the most common form of failure.
- Parity checks can be used to detect this kind of failure.
Media Decay
- A more serious form of failure.
- One or more bits are permanently corrupted.
- It is impossible to read the sector correctly, even after many tries.
- The stable storage technique for organizing a disk is used to cope with this failure.
Write Failure
- An attempt to write a sector is not possible.
- An attempt to retrieve the previously written sector is unsuccessful.
- A possible reason: a power outage while the sector was being written.
- The stable storage technique can be used to cope with this.
33Disk Crash
- Most serious form of disk failure.
- Entire disk becomes unreadable, suddenly and
permanently. - RAID techniques can be used for coping with disk
crashes.
More on Intermittent Failures
- A read failure occurs when we try to read a sector but the correct contents of that sector are not delivered to the disk controller.
- If the controller has a way to tell whether the sector read is good or bad (checksums), it can re-issue the read request when bad data is read.
More on Intermittent Failures (cont.)
- The controller can attempt to write a sector, but the contents written to the sector are not what was intended.
- The only way to check for this kind of failure is to read the sector again.
- One way to perform the check is to read the sector back and compare it with the sector we intended to write.
More on Intermittent Failures (cont.)
- Instead of performing the complete comparison at the disk controller, a simpler way is to read the sector back and check whether a good sector was read.
- If it is a good sector, then the write was correct; otherwise the write was unsuccessful and must be repeated.
Checksums
- A technique used to determine the good/bad status of a sector.
- Each sector has some additional bits, called the checksum, that are set depending on the values of the data bits in that sector.
- If the checksum is not correct on reading, then there was an error in reading.
Checksums (cont.)
- There is a small chance that the block was not read correctly even if the checksum is proper.
- The probability of detecting an error can be increased by using more checksum bits.
Checksum Calculation
- The simplest checksum is based on the parity of all bits in the sector.
- If there is an odd number of 1s among a collection of bits, the bits are said to have odd parity; a parity bit of 1 is added.
- If there is an even number of 1s, the collection of bits is said to have even parity; a parity bit of 0 is added.
Checksum Calculation (cont.)
- The number of 1s among a collection of bits and their parity bit is always even.
- During a write operation, the disk controller calculates the parity bit and appends it to the sequence of bits written in the sector.
- Every sector then has even parity.
Examples
- The sequence of bits 01101000 has an odd number of 1s, so the parity bit is 1 and the sequence with its parity bit is 011010001.
- The sequence of bits 11101110 has even parity, as it has an even number of 1s, so with parity bit 0 the sequence becomes 111011100.
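A minimal Python sketch of the parity-bit checksum just described. Representing a sector as a string of 0s and 1s is an illustrative assumption.

def add_parity(bits):
    """Append a parity bit so the total number of 1s is even."""
    parity = bits.count("1") % 2          # 1 if the data has odd parity
    return bits + str(parity)

def parity_ok(bits_with_parity):
    """A stored sector is 'good' if its 1s (data plus parity bit) are even."""
    return bits_with_parity.count("1") % 2 == 0

print(add_parity("01101000"))   # 011010001, matching the first example
print(add_parity("11101110"))   # 111011100, matching the second example
print(parity_ok("011010001"))   # True
print(parity_ok("011010101"))   # False: a single flipped bit is detected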
Checksum Calculation (cont.)
- Any one-bit error in reading or writing results in a sequence of bits that has odd parity.
- The disk controller can count the number of 1s and thereby detect the error, since the sector no longer has even parity.
Odds
- There is a chance that more than one bit is corrupted and the error goes unnoticed.
- Increasing the number of parity bits increases the chance of detecting errors.
- In general, if there are n independent bits serving as the checksum, the chance of missing an error is only 1 in 2^n.
Stable Storage
- Checksums can detect an error but cannot correct it.
- Sometimes we overwrite the previous contents of a sector and yet cannot read the new contents correctly.
- To deal with these problems, a stable storage policy can be implemented on the disks.
Stable Storage (cont.)
- Sectors are paired, and each pair represents one sector-contents X.
- The left copy of the sector is denoted XL and the right copy XR.
Assumptions
- We assume that copies are written with a sufficient number of parity bits to reduce the chance that a bad sector looks good when the parity checks are considered.
- Also, if the read function returns a good value w for either XL or XR, then it is assumed that w is the true value of X.
Stable-Storage Writing Policy
- 1. Write the value of X into XL. Check that the value has good status, i.e., the parity-check bits are correct in the written copy; if not, repeat the write. If after a set number of write attempts we have not successfully written X into XL, assume that there is a media failure in this sector. A fix-up, such as substituting a spare sector for XL, must be adopted.
- 2. Repeat (1) for XR.
Stable-Storage Reading Policy
- The policy is to alternate trying to read XL and XR until a good value is returned.
- If a good value is not returned after a pre-chosen number of tries, then X is assumed to be truly unreadable.
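A minimal Python sketch of the two policies above. The FakeDisk class, the retry limit, and the sector names are illustrative assumptions standing in for a real disk controller; read_sector returning None stands for a failed parity check.

class FakeDisk:
    """Tiny in-memory stand-in for a disk controller (illustrative only)."""
    def __init__(self):
        self.sectors = {}
    def write_sector(self, sector, value):
        self.sectors[sector] = value
    def read_sector(self, sector):
        return self.sectors.get(sector)     # None = unreadable / bad parity

MAX_TRIES = 3   # assumed retry limit

def stable_write(disk, x_left, x_right, value):
    """Write XL first and verify it; only then write XR."""
    for sector in (x_left, x_right):
        for _ in range(MAX_TRIES):
            disk.write_sector(sector, value)
            if disk.read_sector(sector) == value:   # parity check passed
                break
        else:
            # Persistent failure: a fix-up such as substituting a spare
            # sector for this one would be needed here.
            raise IOError("media failure writing sector " + str(sector))

def stable_read(disk, x_left, x_right):
    """Alternate between XL and XR until a good value is returned."""
    for _ in range(MAX_TRIES):
        for sector in (x_left, x_right):
            value = disk.read_sector(sector)
            if value is not None:                   # good parity
                return value
    raise IOError("X is truly unreadable")

d = FakeDisk()
stable_write(d, "XL", "XR", "payload")
print(stable_read(d, "XL", "XR"))   # payload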
Error-Handling Capabilities
- Media failures:
- If, after storing X in sectors XL and XR, one of them undergoes a media failure and becomes permanently unreadable, we can read X from the other.
- If both sectors have failed, then X cannot be read.
- The probability of both failing is extremely small.
Error-Handling Capabilities (cont.)
- Write failure:
- When writing X, if there is a system failure (such as a power outage), the X in main memory is lost and the copy of X being written may be erroneous.
- Half of the sector may have been written with part of the new value of X, while the other half remains as it was.
Error-Handling Capabilities (cont.)
- The possible cases when the system becomes available again:
- The failure occurred while writing XL. Then XL is considered bad. Since XR was never changed, its status is good. We can copy XR into XL, restoring the old value of X.
- The failure occurred after XL was written. Then XL has good status, and XR, which still has the old value of X, has bad status. We can copy the new value of X from XL into XR.
Recovery from Disk Crashes
- To reduce data loss from disk crashes, schemes that involve redundancy, extending the idea of parity checks or duplicated sectors, can be applied.
- The term used for these strategies is RAID: Redundant Arrays of Independent Disks.
- In general, if the mean time to failure of a disk is n years, then in any given year 1/nth of the surviving disks fail.
Recovery from Disk Crashes (cont.)
- Each of the RAID schemes has data disks and redundant disks.
- Data disks are one or more disks that hold the data.
- Redundant disks are one or more disks that hold information completely determined by the contents of the data disks.
- When any one disk crashes, the other disks can be used to restore the failed disk, avoiding a permanent loss of information.
13.4 DISK FAILURES
204 Xiaqing He
1) Mirroring
- The simplest scheme for recovering from disk crashes.
- How does mirroring work?
- Make two or more copies of the data and save them on different disks.
- Benefits:
- The data survives if one disk fails.
- Having the same data on several disks lets several blocks be accessed at once.
1) Mirroring (cont.)
- With mirroring, when can data be lost?
- The only way data can be lost is if both disks crash.
- Probability estimate. Suppose:
- The mean time to failure of one disk is 10 years.
- The mean time until one of the two disks fails is therefore 5 years.
- Replacing a failed disk takes 3 hours, i.e. 1/2920 of a year.
- So:
- The probability that the mirror disk fails while the first is being replaced is 1/10 × 1/2920 = 1/29,200.
- The probability of data loss by mirroring in a given year is 1/5 × 1/29,200 = 1/146,000.
2) Parity Blocks
- Why change?
- Disadvantage of mirroring: it uses too many redundant disks.
- What's new?
- RAID level 4 uses only one redundant disk.
- How does this one redundant disk work?
- By the modulo-2 sum (XOR):
- The jth bit of the redundant disk is the modulo-2 sum of the jth bits of all the data disks.
- Example:
2) Parity Blocks (cont.) — Example
- Data disks:
- Disk 1: 11110000
- Disk 2: 10101010
- Disk 3: 00111000
- Redundant disk:
- Disk 4: 01100010
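A minimal Python sketch of this modulo-2 (XOR) parity computation, using the bit strings from the example above.

from functools import reduce

def parity_block(blocks):
    """jth bit of the redundant block = XOR of the jth bits of all data blocks."""
    return "".join(str(reduce(lambda a, b: a ^ b, (int(bits[j]) for bits in blocks)))
                   for j in range(len(blocks[0])))

data = ["11110000", "10101010", "00111000"]
print(parity_block(data))   # 01100010, matching Disk 4 above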
2) RAID 4 (cont.)
- Reading:
- Similar to reading blocks from any single disk.
- Writing:
- 1) Change the data disk.
- 2) Change the corresponding block of the redundant disk.
- Why?
- The redundant disk holds the parity checks for the corresponding blocks of all the data disks.
2) RAID 4 (cont.) — Writing
- For a total of N data disks:
- 1) Naive way:
- Read all N data disks and compute the modulo-2 sum of the corresponding blocks.
- Rewrite the redundant disk according to that modulo-2 sum.
- 2) Better way:
- Take the modulo-2 sum of the old and new versions of the data block that was rewritten.
- Flip the bits of the redundant block in exactly the positions where that modulo-2 sum has 1s.
2) RAID 4 (cont.) — Writing Example
- Data disks:
- Disk 1: 11110000
- Disk 2: 10101010 → 01100110
- Disk 3: 00111000
- To do:
- Modulo-2 sum of the old and new versions of disk 2: 11001100
- So we need to flip positions 1, 2, 5, and 6 of the redundant disk.
- Redundant disk:
- Disk 4: 01100010 → 10101110
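A minimal Python sketch of the "better way" above: the redundant block is updated by XOR-ing it with (old XOR new) of the rewritten data block, using the bit strings from this example.

def xor_bits(a, b):
    return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))

old_disk2, new_disk2 = "10101010", "01100110"
old_parity = "01100010"

change_mask = xor_bits(old_disk2, new_disk2)   # positions that flipped
new_parity = xor_bits(old_parity, change_mask)

print(change_mask)   # 11001100
print(new_parity)    # 10101110, matching the updated Disk 4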
2) RAID 4 (cont.) — Failure Recovery
- Redundant disk crash:
- Swap in a new disk and recompute its contents from all the data disks.
- One of the data disks crashes:
- Swap in a new disk.
- Recompute its data from the other disks, including the remaining data disks and the redundant disk.
- How to recompute? (It is the same rule, which is why this works.)
- Take the modulo-2 sum of all the corresponding bits of all the other disks.
3) An Improvement: RAID 5
- Why is an improvement needed?
- Shortcoming of RAID level 4: it suffers from a bottleneck (every update of a data disk must also read and write the single redundant disk).
- Principle of RAID level 5 (RAID 5):
- Treat each disk as the redundant disk for some of the blocks.
- Why is this feasible?
- The failure-recovery rule for a redundant disk and a data disk is the same:
- take the modulo-2 sum of all the corresponding bits of all the other disks.
- So there is no need to dedicate one disk as the redundant disk and the others as data disks.
3) RAID 5 (cont.)
- How do we decide which blocks of each disk treat that disk as the redundant disk?
- If there are n+1 disks, labeled 0 through n, then we can treat the ith cylinder of disk j as redundant if j is the remainder when i is divided by n+1.
- Example:
3) RAID 5 (cont.) — Example
- n = 3 (so there are 4 disks, labeled 0 through 3; see the sketch below):
- The first disk, labeled 0, is redundant for cylinders 4, 8, 12, ...
- The second disk, labeled 1, is redundant for cylinders 1, 5, 9, ...
- The third disk, labeled 2, is redundant for cylinders 2, 6, 10, ...
- The fourth disk, labeled 3, is redundant for cylinders 3, 7, 11, ...
- Suppose all 4 disks are equally likely to be written. For any one of the 4 disks, the probability that a given write involves it is 1/4 + (3/4 × 1/3) = 1/2.
- In general, for N = m disks: 1/m + ((m-1)/m × 1/(m-1)) = 2/m.
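A minimal Python sketch of the RAID 5 placement rule above: cylinder i's redundant (parity) copy lives on disk i mod (n+1). The disk count and cylinder range are the example's values.

def redundant_disk(cylinder, num_disks):
    return cylinder % num_disks

# With 4 disks (n = 3): disk 0 is redundant for cylinders 0, 4, 8, 12, ...
for disk in range(4):
    cyls = [c for c in range(13) if redundant_disk(c, 4) == disk]
    print("disk", disk, "is redundant for cylinders", cyls)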
4) Coping with Multiple Disk Crashes
- RAID 6:
- Can deal with any number of disk crashes if enough redundant disks are used.
- Example:
- A system of seven disks (four data disks, numbered 1-4, and three redundant disks, numbered 5-7).
- How is the 3×7 matrix set up?
- (Why 3 rows? Because there are 3 redundant disks.)
- 1) Every column is a distinct combination of three 0s and 1s, excluding the all-0s column.
- 2) The column of each redundant disk has a single 1.
- 3) The column of each data disk has at least two 1s.
4) Coping with Multiple Disk Crashes (cont.)
- Reading:
- Read from the data disks and ignore the redundant disks.
- Writing:
- Change the data disk.
- Change the corresponding bits of all the redundant disks whose rows have a 1 in that data disk's column.
4) Coping with Multiple Disk Crashes (cont.)
- In this system with 4 data disks and 3 redundant disks, how can up to 2 disk crashes be corrected?
- Suppose disks a and b have failed:
- Find some row r of the 3×7 matrix in which the columns for a and b differ (say a has 0 and b has 1 in row r).
- Compute the correct b by taking the modulo-2 sum of the corresponding bits from all the disks other than b that have 1 in row r.
- After recovering b, compute the correct a using all the other disks, which are now available.
- Example:
4) Coping with Multiple Disk Crashes (cont.) — Example
- The 3×7 matrix:

              data disks        redundant disks
disk number:  1  2  3  4        5  6  7
row 1:        1  1  1  0        1  0  0
row 2:        1  1  0  1        0  1  0
row 3:        1  0  1  1        0  0  1
4) Coping with Multiple Disk Crashes (cont.) — Example
- First block of each disk:
- 1) 11110000
- 2) 10101010
- 3) 00111000
- 4) 01000001
- 5) 01100010
- 6) 00011011
- 7) 10001001
4) Coping with Multiple Disk Crashes (cont.) — Example
- Two disks crash:
- 1) 11110000
- 2) ????????
- 3) 00111000
- 4) 01000001
- 5) ????????
- 6) 00011011
- 7) 10001001
4) Coping with Multiple Disk Crashes (cont.) — Example
- In the 3×7 matrix, row 2 is one where disks 2 and 5 have different values: disk 2's entry is 1 and disk 5's entry is 0.
- So compute the first block of disk 2 as the modulo-2 sum of the corresponding bits of disks 1, 4, and 6 (the other disks with 1 in row 2).
- Then compute the first block of disk 5 as the modulo-2 sum of the corresponding bits of disks 1, 2, and 3 (the other disks with 1 in row 1).
- 1) 11110000
- 2) ???????? → 10101010
- 3) 00111000
- 4) 01000001
- 5) ???????? → 01100010
- 6) 00011011
- 7) 10001001
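A minimal Python sketch of this two-crash recovery, using the 3×7 matrix and block contents of the example above. Disks are indexed 1-7 and 8-bit strings stand in for whole blocks; the function names are illustrative.

ROWS = {1: [1, 2, 3, 5], 2: [1, 2, 4, 6], 3: [1, 3, 4, 7]}   # disks with a 1 in each row

def xor_bits(a, b):
    return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))

def recover(blocks, failed_a, failed_b):
    # Pick a row where exactly one of the two failed disks appears.
    row = next(r for r, disks in ROWS.items()
               if (failed_a in disks) != (failed_b in disks))
    first = failed_a if failed_a in ROWS[row] else failed_b
    other = failed_b if first == failed_a else failed_a
    # Rebuild `first` from the surviving disks of that row.
    acc = "0" * 8
    for d in ROWS[row]:
        if d != first:
            acc = xor_bits(acc, blocks[d])
    blocks[first] = acc
    # Now rebuild the other failed disk from any row containing it.
    row2 = next(r for r, disks in ROWS.items() if other in disks)
    acc = "0" * 8
    for d in ROWS[row2]:
        if d != other:
            acc = xor_bits(acc, blocks[d])
    blocks[other] = acc
    return blocks

blocks = {1: "11110000", 2: None, 3: "00111000", 4: "01000001",
          5: None, 6: "00011011", 7: "10001001"}
print(recover(blocks, 2, 5))   # disk 2 -> 10101010, disk 5 -> 01100010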
13.5 ARRANGING DATA ON DISK
205 Meghna Jain
Data elements are represented as records, which are stored in consecutive bytes in the same disk block.
Basic layout technique for storing data: fixed-length records.
Allocation criterion: data should start at a word boundary.
A fixed-length record header contains:
1. A pointer to the record schema.
2. The length of the record.
3. Timestamps to indicate the last modification or last read.
Example
- CREATE TABLE employee(
-   name CHAR(30) PRIMARY KEY,
-   address VARCHAR(255),
-   gender CHAR(1),
-   birthdate DATE
- )
- The record should start at a word boundary and contains a header and four fields: name, address, gender, and birthdate.
Packing Fixed-Length Records into Blocks
- Records are stored in blocks on the disk and are moved into main memory when we need to access or update them.
- A block header is written first, followed by a series of records.
The block header contains the following information:
- Links to one or more other blocks that are part of a network of blocks.
- Information about the role played by this block in such a network.
- Information about which relation the tuples in this block belong to.
- A "directory" giving the offset of each record in the block.
- Timestamp(s) indicating the time of the block's last modification and/or access.
Example
- Along with the header, we pack as many records as fit into one block, as shown in the figure; the remaining space is unused.
13.6 REPRESENTING BLOCK AND RECORD ADDRESSES
206 Ramya Karri
INTRODUCTION
- Addresses of a block and a record:
- In main memory:
- The address of the block is the virtual-memory address of its first byte.
- The address of a record within the block is the virtual-memory address of the first byte of the record.
- In secondary memory: a sequence of bytes describes the location of the block in the overall system.
- This sequence of bytes includes the device ID for the disk, the cylinder number, and so on.
ADDRESSES IN CLIENT-SERVER SYSTEMS
- Addresses in the address space are represented in two ways:
- Physical addresses: byte strings that determine the place within the secondary storage system where the record can be found.
- Logical addresses: arbitrary strings of bytes of some fixed length.
- Physical address bits are used to indicate:
- The host to which the storage is attached
- An identifier for the disk
- The number of the cylinder
- The number of the track
- The offset of the beginning of the record
Addresses in Client-Server Systems (cont.)
- A map table relates logical addresses to physical addresses.

Logical | Physical
LOGICAL AND STRUCTURED ADDRESSES
- What is the purpose of a logical address?
- It gives more flexibility when we:
- Move the record around within the block
- Move the record to another block
- It also gives us options for deciding what to do when a record is deleted.

Record 4 | Record 3 | Record 2 | Record 1
POINTER SWIZZLING
- Having pointers is common in object-relational database systems.
- It is important to learn about the management of pointers.
- Every data item (block, record, etc.) has two addresses:
- A database address: its address on disk
- A memory address, if the item is currently in virtual memory
POINTER SWIZZLING (cont.)
- Translation table: maps database addresses to memory addresses.
- All addressable items in the database have entries in the map table, while only those items currently in memory are mentioned in the translation table.

Dbaddr | Mem-addr
POINTER SWIZZLING (cont.)
- A pointer consists of the following two fields:
- A bit indicating the type of address (database or memory)
- The database or memory address itself
- Example 13.17 (figure): Block 1 is in memory with a swizzled pointer; Block 2 remains on disk with an unswizzled pointer.
EXAMPLE 13.17 (cont.)
- Block 1 has a record with pointers to a second record on the same block and to a record on another block.
- If Block 1 is copied into memory:
- The first pointer, which points within Block 1, can be swizzled so it points directly to the memory address of the target record.
- Since Block 2 is not in memory, we cannot swizzle the second pointer.
POINTER SWIZZLING (cont.)
- Three approaches to swizzling:
- Automatic swizzling:
- As soon as a block is brought into memory, swizzle all relevant pointers.
- Swizzling on demand:
- Only swizzle a pointer if and when it is actually followed (see the sketch below).
- No swizzling:
- Pointers are never swizzled; they are always accessed using the database address.
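A minimal Python sketch of swizzling on demand with a translation table that maps database (disk) addresses to in-memory objects. The class and function names, and the fake load_block, are illustrative assumptions rather than any specific DBMS API.

class Pointer:
    def __init__(self, db_addr):
        self.db_addr = db_addr       # persistent (disk) address
        self.mem_ref = None          # set once swizzled

translation_table = {}               # db_addr -> in-memory object

def load_block(db_addr):
    """Stand-in for reading a block/record from disk into memory."""
    obj = {"db_addr": db_addr, "payload": "record at " + hex(db_addr)}
    translation_table[db_addr] = obj
    return obj

def follow(ptr):
    """Swizzle the pointer the first time it is followed."""
    if ptr.mem_ref is None:
        ptr.mem_ref = translation_table.get(ptr.db_addr) or load_block(ptr.db_addr)
    return ptr.mem_ref

p = Pointer(db_addr=0xB10C)
print(follow(p)["payload"])     # first use: loads the item and swizzles
print(p.mem_ref is follow(p))   # later uses go straight to memory -> True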
PROGRAMMER CONTROL OF SWIZZLING
- Unswizzling:
- When a block is moved from memory back to disk, all pointers must be converted back to database (disk) addresses.
- The translation table is used again for this.
- It is important to have an efficient data structure for the translation table.
PINNED RECORDS AND BLOCKS
- A block in memory is said to be pinned if it cannot be written back to disk safely.
- If block B1 has a swizzled pointer to an item in block B2, then B2 is pinned.
- To unpin a block, we must unswizzle any pointers to it:
- Keep in the translation table the places in memory that hold swizzled pointers to that item.
- Unswizzle those pointers (use the translation table to replace the memory addresses with database (disk) addresses).
13.7 VARIABLE-LENGTH DATA AND RECORDS
221 Eswara Satya Pavan Rajesh Pinapala
Example
Fig 1: A movie-star record with four fixed-length fields (name, address, gender, birth date) at byte offsets 0, 30, 286, and 287; the record is 297 bytes long.
Records with Variable-Length Fields
- An effective way to represent variable-length records is as follows:
- Fixed-length fields are kept ahead of the variable-length fields.
- The record header contains:
- The length of the record
- Pointers to the beginning of all variable-length fields except the first one
Records with Variable-Length Fields
Figure 2: A movie-star record with name and address implemented as variable-length character strings. The header holds the record length and a pointer to the start of the address; the fixed-length fields (birth date, gender) precede the variable-length name and address.
Records with Repeating Fields
- A record may contain a variable number of occurrences of a field F.
- All occurrences of field F are grouped together, and the record header contains a pointer to the first occurrence of F.
- L bytes are devoted to each instance of field F.
- Locating an occurrence of field F within the record (see the sketch below):
- Add to the offset of the first occurrence of F an integer multiple of L (0, L, 2L, 3L, and so on).
- We stop upon reaching the offset of the field following F.
Records with Repeating Fields
Figure 3: A record with a repeating group of references to movies. The header holds the record length, a pointer to the address field, and a pointer to the group of movie pointers, followed by the name, the address, and the pointers to movies.
Records with Repeating Fields
Figure 4: Storing variable-length fields separately from the record. The record keeps header information, the length of and a pointer to the name, the length of and a pointer to the address, and the number of and a pointer to the movie references; the name and address themselves are stored elsewhere.
Records with Repeating Fields
- Advantage:
- Keeping the record itself fixed-length allows records to be searched more efficiently, minimizes the overhead in block headers, and allows records to be moved within or among blocks with minimum effort.
- Disadvantage:
- Storing variable-length components on another block increases the number of disk I/Os needed to examine all components of a record.
Records with Repeating Fields
- A compromise strategy is to allocate a fixed portion of the record for the repeating fields.
- If the number of repeating fields is less than the allocated space, some space is unused.
- If the number of repeating fields is greater than the allocated space, the extra fields are stored in a different location, and
- a pointer to that location and a count of the additional occurrences are stored in the record.
Variable-Format Records
- Records that do not have a fixed schema.
- Variable-format records are represented by a sequence of tagged fields.
- Each tagged field contains:
- The attribute or field name
- The type of the field
- The length of the field
- The value of the field
- Why use tagged fields?
- Information-integration applications
- Records with a very flexible schema
Variable-Format Records
Fig 5: A record with tagged fields. First field: code N (name), string type S, length 14, value "Clint Eastwood". Second field: code R (restaurant owned), string type S, length 16, value "Hog's Breath Inn".
13.8 RECORD MODIFICATIONS
222 Neha Samant
Records That Do Not Fit in a Block
- When the length of a record is greater than the block size, the record is divided and placed into two or more blocks.
- The portion of the record in each block is referred to as a RECORD FRAGMENT.
- A record with two or more fragments is called a SPANNED RECORD.
- A record that does not cross a block boundary is called an UNSPANNED RECORD.
Spanned Records
- Spanned records require the following extra header information:
- A bit indicating whether the piece is a fragment or not
- Bits indicating whether it is the first or last fragment of a record
- Pointers to the next and/or previous fragment of the same record
Records That Do Not Fit in a Block
Figure 6: Storing spanned records across blocks. Block 1 holds record 1 and fragment 2-a of record 2; block 2 holds fragment 2-b of record 2 and record 3; each block and record has its own header.
BLOBs
- Large binary objects are called BLOBs, e.g. audio files and video files.
- Issues include the storage of BLOBs and the retrieval of BLOBs.
Modification Types
- Insertion
- Deletion
- Update
Insertion
- Insertion of records without order:
- Records can be placed in a block's free space or in a new block.
- Insertion of records in a fixed order:
- Space may be available in the block, or
- no space may be available in the block (insert outside the block).
- Complications: structured addresses, and pointers to a record from outside the block.
Insertion in Fixed Order
- Space available within the block:
- Use an offset table in the header of each block, with pointers to the location of each record in the block.
- The records are slid within the block and the pointers in the offset table are adjusted.
Insertion in Fixed Order (cont.)
- No space available within the block (insert outside the block):
- Find space on a nearby block:
- If no space is available on this block, look at the following block in the sorted order of blocks.
- If space is available there, move the highest records of block 1 to block 2 and slide the records around on both blocks.
- Create an overflow block:
- Records can be stored in an overflow block.
- Each block has a place in its header for a pointer to an overflow block.
- The overflow block can itself point to a second overflow block, and so on.
Deletion
- Recovering space after deletion:
- When using an offset table, the records can be slid around the block so that an unused region forms in the center, which can be reused.
- If we cannot slide records, an available-space list can be maintained in the block header.
- The list head goes in the block header, and the available regions hold the links of the list.
Deletion (cont.)
- Use of a tombstone:
- A tombstone is left in place of a deleted record so that pointers to the deleted record do not end up pointing at new records.
- The tombstone is permanent until the entire database is reconstructed.
- If pointers go to fixed locations from which the location of the record is found, then we put the tombstone in that fixed location (see the examples).
- Where a tombstone is placed depends on the nature of the record pointers.
- A map table is used to translate a logical record address to a physical address.
Deletion (cont.)
- Use of a tombstone:
- If we need to replace records by tombstones, place the bit that serves as the tombstone at the beginning of the record.
- This bit remains at the record location, and the subsequent bytes can be reused for another record.
- Record 1 can be replaced, but the tombstone remains; record 2 has no tombstone and can be seen when we follow a pointer to it.
Update
- Fixed-length update:
- No effect on the storage system, as the record occupies the same space as before the update.
- Variable-length update:
- The new value may be longer or shorter than the old one.
Update (cont.)
- Variable-length update (longer value):
- Stored on the same block:
- Slide records
- Create an overflow block
- Stored on another block:
- Move records around that block
- Create a new block for storing variable-length fields
Update (cont.)
- Variable-length update (shorter value):
- Handled like deletion:
- Recover the freed space
- Consolidate space
14.2 B-Trees and Bitmap Indexes
207 Maciej Kicinski
Structure
- A B-tree is a balanced tree, meaning that all paths from the root to a leaf have the same length.
- There is a parameter n associated with each B-tree block. Each block has space for n search keys and n+1 pointers.
- The root may have as few as 1 search key, but all other blocks must be at least half full.
Structure (cont.)
- A typical node has n search keys and n+1 pointers.
- A typical interior node has pointers to the nodes at the next level; it holds keys but no record values.
- A typical leaf has pointers that point to records.
Application
- The search key of the B-tree is the primary key for the data file, and typically:
- The data file is sorted by its primary key, or
- the data file is sorted by an attribute that is not a key, and this attribute is the search key for the B-tree.
Lookup
- At an interior node, choose the correct pointer to follow. This is done by comparing the search value with the keys.
- At a leaf node, choose the key that matches what you are looking for; its associated pointer leads to the data.
Insertion
- When inserting, choose the correct leaf node in which to put the pointer to the data.
- If the node is full, create a new node and split the keys between the two.
- Recursively move up: if a parent cannot accept a pointer to the new node because it is full, split it as well.
- This can end with creating a new root node, if the current root was full.
Deletion
- Perform a lookup to find the node containing the key, and delete it.
- If the node is no longer at least half full, either join it with an adjacent node and recursively delete upward, or move a key over from an adjacent node that is more than half full and adjust the pointer in the parent.
Efficiency
- B-trees allow lookup, insertion, and deletion of records using very few disk I/Os.
- Each level of a B-tree requires one read; you then follow the pointer from that read to the next level or to the final data block.
Efficiency (cont.)
- Three levels are usually sufficient for B-trees: with 255 pointers per block, 255^3 is about 16.6 million.
- You can reduce disk I/Os further by keeping a level of the B-tree in main memory. Keeping the root block with its 255 pointers reduces the reads to 2, and keeping the next level's 255 blocks in memory as well reduces the reads to 1.
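A quick Python check of the capacity arithmetic above, assuming 255 pointers per block.

pointers_per_block = 255
for levels in (1, 2, 3):
    print(levels, "level(s):", pointers_per_block ** levels, "record pointers")
# 3 levels -> 16,581,375, i.e. about 16.6 million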
14.7 B-Trees and Bitmap Indexes
221 Deepti Kundu
Definition
- A bitmap index for a field F is a collection of bit-vectors of length n, one for each possible value that may appear in the field F. [1]
What does that mean?
- Assume a relation R with:
- 2 attributes A and B.
- Attribute A of type Integer and B of type String.
- 6 records, numbered 1 through 6, as shown.

    A   B
1   30  foo
2   30  bar
3   40  baz
4   50  foo
5   40  bar
6   30  baz
Example Continued
- A bitmap index for attribute B (of the relation R above) is:

Value  Vector
foo    100100
bar    010010
baz    001001
Where does that take us?
- A bitmap index is a special kind of database index that uses bitmaps. [2]
- Bitmap indexes have traditionally been considered to work well for data such as gender, which has a small number of distinct values, e.g. male and female, but many occurrences of those values. [2]
A little more
- A bitmap index for attribute A of relation R is:
- A collection of bit-vectors.
- The number of bit-vectors = the number of distinct values of A in R.
- The length of each bit-vector = the cardinality of R.
- The bit-vector for value v has 1 in position i if the ith record has v in attribute A, and 0 there if not. [3]
- Records are allocated permanent numbers. [3]
- There is a mapping between record numbers and record addresses. [3]
Motivation for Bitmap Indexes
- Very efficient when used for partial-match queries. [3]
- They offer the advantage of buckets [2]:
- We can find tuples with several specified attributes without first retrieving all the records that matched each attribute separately.
- They can also help answer range queries. [3]
Another Example
- A sequence of pairs with values of two types:
- (5,d), (79,t), (4,d), (79,d), (5,t), (6,a)
- Bit-vectors for the first component:
- 5: 100010
- 79: 010100
- 4: 001000
- 6: 000001
- Bit-vectors for the second component:
- d: 101100
- t: 010010
- a: 000001
Example Continued
- (5,d), (79,t), (4,d), (79,d), (5,t), (6,a)
- Searching for items is easy: just AND the relevant bit-vectors together.
- To search for (5, d):
- 5: 100010
- d: 101100
- 100010 AND 101100 = 100000
- The location of the record has been found: only record 1 matches.
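A minimal Python sketch of this AND-based lookup, using the bit-vectors of the example above. The dictionary layout is an illustrative assumption.

vectors = {5: "100010", 79: "010100", 4: "001000", 6: "000001",
           "d": "101100", "t": "010010", "a": "000001"}

def matches(*keys):
    """Record numbers (1-based) whose bits survive ANDing the chosen vectors."""
    n = len(next(iter(vectors.values())))
    acc = (1 << n) - 1                      # start with all 1s
    for k in keys:
        acc &= int(vectors[k], 2)           # bitwise AND of the vectors
    return [i + 1 for i in range(n) if acc & (1 << (n - 1 - i))]

print(matches(5, "d"))   # [1]: only record 1 is (5, d)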
Compressed Bitmaps
- Assume:
- The number of records in R is n.
- Attribute A has m distinct values in R.
- The size of a bitmap index on attribute A is then m × n bits.
- If m is large, then the fraction of 1s in a bit-vector is only about 1/m.
- This is an opportunity to encode (compress) the vectors.
- A common encoding approach is called run-length encoding. [1]
Run-Length Encoding
- Represents runs:
- A run is a sequence of i 0s followed by a 1; it is represented by a suitable binary encoding of the integer i.
- A run of i 0s followed by a 1 is encoded by:
- First computing how many bits are needed to represent i, say k.
- Then representing the run by k-1 1s and a single 0, followed by the k bits that represent i in binary.
- The encoding of i = 1 is 01 (k = 1).
- The encoding of i = 0 is 00 (k = 1).
- We concatenate the codes for the runs together, and the resulting sequence of bits is the encoding of the entire bit-vector.
Understanding with an Example
- Let us decode the sequence 11101101001011.
- Starting at the beginning (left-most bit):
- First run: the first 0 is at position 4, so k = 4. The next 4 bits are 1101, so the first integer is i = 13.
- Second run: the remaining code is 001011. The first bit is 0, so k = 1; the next bit is 0, so i = 0.
- Last run: the remaining code is 1011. There is one 1 before the 0, so k = 2; the next 2 bits are 11, so i = 3.
- The run lengths are thus 13, 0, 3, and hence the bit-vector is:
- 0000000000000110001
Managing Bitmap Indexes
- 1) How do you find the bit-vector for a given value efficiently?
- 2) After selecting the results that match, how do you retrieve the corresponding records efficiently?
- 3) When the data changes, how do you alter the bitmap index?
1) Finding bit-vectors
- Think of each bit-vector as the value associated with a key, namely the attribute value. [1]
- Any secondary-storage indexing technique will then retrieve the bit-vector efficiently. [1]
- Create a secondary index with the attribute value as the search key [3]:
- B-tree
- Hash table
2) Finding records
- Create a secondary index with the record number as the search key. [3]
- Or, in other words:
- Once you learn that you need record k, you can use a secondary index with the kth position as the search key. [1]
3) Handling modifications
- Record numbers must remain fixed once assigned.
- Changes to the data file require changes to the bitmap index.
Deletion
- A tombstone replaces the deleted record.
- The corresponding bit in each bit-vector is set to 0.
Insertion
- The record is assigned the next record number.
- A bit of value 0 or 1 is appended to each bit-vector.
- If the new record contains a new value of the attribute, add one new bit-vector.
Modification
- Change the bit corresponding to the old value of the modified record to 0.
- Change the bit corresponding to the new value of the modified record to 1.
- If the new value is a new value of A, then insert a new bit-vector.
18.1 & 18.2 CONCURRENCY CONTROL
210 Chiu Luk
Concurrency Control
- Concurrency control in database management systems (DBMS) ensures that database transactions can be performed concurrently without violating the data integrity of the database.
- Executed transactions should follow the ACID rules. The DBMS must guarantee that only serializable (unless serializability is intentionally relaxed), recoverable schedules are generated.
- It also guarantees that committed transactions won't be lost, and that aborted (rolled back) transactions won't leave effects in the database.
Transaction ACID rules
- Atomicity: either the effects of all of a transaction's operations remain, or none do, when the transaction completes; to the outside world the transaction appears indivisible, atomic.
- Consistency: every transaction must leave the database in a consistent state.
- Isolation: transactions cannot interfere with each other. Providing isolation is the main goal of concurrency control.
- Durability: the effects of successful transactions must persist through crashes.
Serial and Serializable Schedules
- In the field of databases, a schedule is a list of actions (i.e. reading, writing, aborting, committing) from a set of transactions.
- In this example, schedule D is over the set of 3 transactions T1, T2, T3. The schedule describes the actions of the transactions as seen by the DBMS. T1 reads and writes object X, then T2 reads and writes object Y, and finally T3 reads and writes object Z. This is an example of a serial schedule, because the actions of the 3 transactions are not interleaved.
Serial and Serializable Schedules (cont.)
- A schedule that is equivalent to a serial schedule has the serializability property.
- In schedule E, the order in which the actions of the transactions are executed is not the same as in D, but in the end E gives the same result as D.
Serial Schedule: T1 precedes T2

T1                        T2
Read(A); A ← A+100
Write(A)
Read(B); B ← B+100
Write(B)
                          Read(A); A ← A×2
                          Write(A)
                          Read(B); B ← B×2
                          Write(B)
Serial Schedule: T2 precedes T1

T1                        T2
                          Read(A); A ← A×2
                          Write(A)
                          Read(B); B ← B×2
                          Write(B)
Read(A); A ← A+100
Write(A)
Read(B); B ← B+100
Write(B)
A serializable, but not serial, schedule

T1                        T2
Read(A); A ← A+100
Write(A)
                          Read(A); A ← A×2
                          Write(A)
Read(B); B ← B+100
Write(B)
                          Read(B); B ← B×2
                          Write(B)

r1(A); w1(A); r2(A); w2(A); r1(B); w1(B); r2(B); w2(B)
A nonserializable schedule

T1                        T2
Read(A); A ← A+100
Write(A)
                          Read(A); A ← A×2
                          Write(A)
                          Read(B); B ← B×2
                          Write(B)
Read(B); B ← B+100
Write(B)
A schedule that is serializable only because of the detailed behavior of the transactions

T1                        T2
Read(A); A ← A+100
Write(A)
                          Read(A); A ← A×1
                          Write(A)
                          Read(B); B ← B×1
                          Write(B)
Read(B); B ← B+100
Write(B)

- Regardless of the (consistent) initial state, the final state will be consistent.
Non-Conflicting Actions
- Two actions are non-conflicting if, whenever they occur consecutively in a schedule, swapping them does not affect the final state produced by the schedule. Otherwise, they are conflicting.
Conflicting Actions: General Rules
- Two actions of the same transaction always conflict, because their order is fixed:
- r1(A); w1(B)
- Two actions on the same database element conflict if at least one of them is a write:
- r1(A); w2(A)
- w1(A); w2(A)
Conflicting Actions
- Two or more actions are said to be in conflict if:
- The actions belong to different transactions.
- At least one of the actions is a write operation.
- The actions access the same object (read or write).
- The following set of actions is conflicting:
- T1: R(X), T2: W(X), T3: W(X)
- While the following sets of actions are not:
- T1: R(X), T2: R(X), T3: R(X)
- T1: R(X), T2: W(Y), T3: R(X)
Conflict Serializability
- We may take any schedule and make as many non-conflicting swaps as we wish,
- with the goal of turning the schedule into a serial schedule.
- If we can do so, then the original schedule is serializable, because its effect on the database state remains the same as we perform each of the non-conflicting swaps.
- A schedule is said to be conflict-serializable
when the schedule is conflict-equivalent to one
or more serial schedules. - Another definition for conflict-serializability
is that a schedule is conflict-serializable if
and only if there exists an acyclic precedence
graph/serializability graph for the schedule. - Which is conflict-equivalent to the serial
schedule ltT1,T2gt, but not ltT2,T1gt.
Conflict-equivalent / conflict-serializable
- Let Ai and Aj be consecutive non-conflicting actions that belong to different transactions. We can swap Ai and Aj without changing the result.
- Two schedules are conflict-equivalent if they can be turned one into the other by a sequence of non-conflicting swaps of adjacent actions.
- We call a schedule conflict-serializable if it is conflict-equivalent to a serial schedule (see the sketch below).
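A minimal Python sketch of testing conflict-serializability with the precedence-graph criterion mentioned above: add an edge Ti → Tj whenever an action of Ti conflicts with a later action of Tj, and check that the graph is acyclic. The schedule representation is an illustrative assumption.

def conflict_serializable(schedule):
    """schedule: list of (txn, action, element) triples, e.g. (1, 'r', 'A')."""
    nodes = {t for t, _, _ in schedule}
    edges = set()
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "w" in (ai, aj):
                edges.add((ti, tj))          # Ti must precede Tj
    # Repeatedly remove nodes with no incoming edge; acyclic iff all removable.
    while nodes:
        free = [n for n in nodes if not any(u in nodes and v == n for u, v in edges)]
        if not free:
            return False                     # a cycle remains
        nodes -= set(free)
    return True

# r1(A) w1(A) r2(A) w2(A) r1(B) w1(B) r2(B) w2(B): serializable as <T1, T2>
s = [(1, "r", "A"), (1, "w", "A"), (2, "r", "A"), (2, "w", "A"),
     (1, "r", "B"), (1, "w", "B"), (2, "r", "B"), (2, "w", "B")]
print(conflict_serializable(s))       # True

# T2 writes A before T1, but T1 writes B before T2: cycle, not serializable
bad = [(1, "r", "A"), (2, "w", "A"), (1, "w", "A"), (1, "w", "B"), (2, "w", "B")]
print(conflict_serializable(bad))     # False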
Conflict-serializable (swapping adjacent non-conflicting actions)
T1 T2
R(A)
W(A)
R(A)
R(B)
W(A)
W(B)
R(B)
W(B)
Conflict-serializable (cont.)
T1 T2
R(A)
W(A)
R(B)
R(A)
W(A)
W(B)
R(B)
W(B)
Conflict-serializable (cont.)
T1 T2
R(A)
W(A)
R(A)
R(B)
W(B)
W(A)
R(B)
W(B)
Can't swap this pair!
Conflict-serializable (cont.)
T1 T2
R(A)
W(A)
R(A)
W(B)
R(B)
W(A)
R(B)
W(B)
Serial Schedule
18.3 & 18.4 CONCURRENCY CONTROL
211 Donavon Norwood
INTRODUCTION
- Enforcing serializability by locks:
- Locks
- The locking scheduler
- Two-phase locking
- Locking systems with several lock modes:
- Shared and exclusive locks
- Compatibility matrices
- Upgrading/updating locks
- Incrementing locks
Locks
- Locking works as follows:
- A transaction issues a request.
- The scheduler checks the lock table.
- The scheduler generates a serializable schedule of actions.
Consistency of transactions
- Actions and locks must relate to each other:
- A transaction can read or write an element only if it holds a lock on it and has not yet released the lock.
- Unlocking every locked element is compulsory.
- Legality of schedules:
- No two transactions can hold a lock on the same element at the same time; one must release it before the other acquires it.
Locking scheduler
- Grants lock requests only if doing so yields a legal schedule.
- The lock table stores information about the current locks on the elements.
The locking scheduler (cont.)
- A legal schedule of consistent transactions that is, unfortunately, not serializable.
172Locking sched