Title: Storing Data: Disks and Files
1. Storing Data: Disks and Files
"Yea, from the table of my memory I'll wipe away
all trivial fond records." -- Shakespeare, Hamlet
2. Disks and Files
- DBMS stores information on (hard) disks.
- This has major implications for DBMS design!
  - READ: transfer data from disk to main memory (RAM).
  - WRITE: transfer data from RAM to disk.
  - Both are high-cost operations relative to in-memory operations, so they must be planned carefully!
3. Why Not Store Everything in Main Memory?
- Costs too much. $1000 will buy you either 128MB of RAM or 7.5GB of disk today.
- Main memory is volatile. We want data to be saved between runs. (Obviously!)
- Typical storage hierarchy:
  - Main memory (RAM) for currently used data.
  - Disk for the main database (secondary storage).
  - Tapes for archiving older versions of the data (tertiary storage).
4. Disks
- Secondary storage device of choice.
- Main advantage over tapes: random access vs. sequential.
- Data is stored and retrieved in units called disk blocks or pages.
- Unlike RAM, time to retrieve a disk page varies depending upon location on disk.
  - Therefore, relative placement of pages on disk has a major impact on DBMS performance!
5. Components of a Disk
[Figure: disk anatomy -- spindle, platters, disk heads on an arm assembly, tracks divided into sectors]
- The platters spin (say, 90 rps).
- The arm assembly is moved in or out to position a head on a desired track. Tracks under heads make a cylinder (imaginary!).
- Only one head reads/writes at any one time.
- Block size is a multiple of sector size (which is fixed).
6. Accessing a Disk Page
- Time to access (read/write) a disk block:
  - seek time (moving arms to position disk head on track)
  - rotational delay (waiting for block to rotate under head)
  - transfer time (actually moving data to/from disk surface)
- Seek time and rotational delay dominate.
  - Seek time varies from about 1 to 20 msec.
  - Rotational delay varies from 0 to 10 msec.
  - Transfer rate is about 1 msec per 4KB page.
- Key to lower I/O cost: reduce seek/rotation delays! Hardware vs. software solutions?
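A back-of-the-envelope model using the slide's ballpark figures (assumed midpoints, not real device specs) shows why reducing seek/rotation delays matters: sequential pages pay those delays once, random pages pay them on every access.

```python
# Rough disk access-time model: seek + rotational delay + transfer.
# The constants are the slide's ballpark numbers, taken as averages.
AVG_SEEK_MS = 10.0         # seek time: ~1-20 msec, midpoint
AVG_ROTATION_MS = 5.0      # rotational delay: 0-10 msec, average is half a rotation
TRANSFER_MS_PER_PAGE = 1.0 # transfer: ~1 msec per 4KB page

def access_time_ms(num_pages: int, sequential: bool) -> float:
    """Estimate I/O time; sequential pages pay seek + rotation only once."""
    if sequential:
        return AVG_SEEK_MS + AVG_ROTATION_MS + num_pages * TRANSFER_MS_PER_PAGE
    return num_pages * (AVG_SEEK_MS + AVG_ROTATION_MS + TRANSFER_MS_PER_PAGE)

print(access_time_ms(10, sequential=True))   # 25.0 ms
print(access_time_ms(10, sequential=False))  # 160.0 ms
```

Under this model a 10-page sequential read is over 6x cheaper than 10 random reads, which is exactly the win that sequential page placement and pre-fetching exploit.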
7. Arranging Pages on Disk
- "Next" block concept:
  - blocks on same track, followed by
  - blocks on same cylinder, followed by
  - blocks on adjacent cylinder.
- Blocks in a file should be arranged sequentially on disk (by "next"), to minimize seek and rotational delay.
- For a sequential scan, pre-fetching several pages at a time is a big win!
8. RAID
- Disk Array: arrangement of several disks that gives the abstraction of a single, large disk.
- Goals: increase performance and reliability.
- Two main techniques:
  - Data striping: data is partitioned; the size of a partition is called the striping unit. Partitions of equal size are distributed over several disks.
  - Redundancy: more disks -> more failures. Redundant information allows reconstruction of data if a disk fails.
9. RAID Levels
- Level 0: No redundancy.
- Level 1: Mirrored (two identical copies)
  - Each disk has a mirror image (check disk).
  - Parallel reads; a write involves two disks.
  - Maximum transfer rate = transfer rate of one disk.
- Level 0+1: Striping and Mirroring
  - Parallel reads; a write involves two disks.
  - Maximum transfer rate = aggregate bandwidth.
10. RAID Levels (Contd.)
- Level 2: Error-Correcting Codes
  - Striping unit: one bit.
  - Redundancy scheme: Hamming code.
  - Keeps more redundant information than necessary.
11. RAID Levels (Contd.)
- Level 3: Bit-Interleaved Parity
  - Striping unit: one bit. One check disk.
  - Each read and write request involves all disks; the disk array can process one request at a time.
- Level 4: Block-Interleaved Parity
  - Striping unit: one disk block. One check disk.
  - Parallel reads possible for small requests; large requests can utilize full bandwidth.
  - Writes involve the modified block and the check disk.
- Level 5: Block-Interleaved Distributed Parity
  - Similar to RAID Level 4, but parity blocks are distributed over all disks.
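The parity used by levels 3-5 is plain XOR, so a sketch fits in a few lines. This toy stripe of four data blocks is hypothetical (real RAID 5 rotates the parity block across disks), but it shows why one failed disk can always be rebuilt:

```python
# XOR parity over one stripe: parity = XOR of all data blocks.
# Losing any single block, XOR-ing the survivors with the parity recovers it.
def xor_blocks(*blocks: bytes) -> bytes:
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

stripe = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]   # 4 data blocks on 4 disks
parity = xor_blocks(*stripe)                    # stored on the check disk

# Disk holding stripe[2] fails: rebuild its block from the rest plus parity.
rebuilt = xor_blocks(stripe[0], stripe[1], stripe[3], parity)
assert rebuilt == b"CCCC"
```

Note the write cost the slides mention: updating one data block also requires recomputing and rewriting the parity block, which is why small writes are expensive in levels 3-5.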
12. Disk Space Management
- Lowest layer of DBMS software; manages space on disk.
- Supports the concept of a page as the unit of data:
  - size of page chosen as size of disk block,
  - pages stored as disk blocks,
  - so reading/writing a page can be done in one disk I/O.
- Higher levels call upon this layer to:
  - allocate/de-allocate a page,
  - read/write a page.
- A request for a sequence of pages should be satisfied by allocating the pages sequentially on disk! Higher levels don't need to know how this is done, or how free space is managed.
13. Practice: Chapter 7
- What is the most important difference between a disk and a tape?
  - Tapes are sequential devices; disks support direct access to a desired page.
- Explain the terms seek time, rotational delay, and transfer time.
  - Time to access (read/write) a disk block:
    - seek time: time to move the disk heads to the track on which the desired block is located.
    - rotational delay: time spent waiting for the block to rotate under the disk head; it is the time required for half a rotation on average, and is usually less than seek time.
    - transfer time: time to actually read/write the data once the head is positioned, i.e. the time for the disk to rotate over the block.
14. Practice: Chapter 7 (Cont.)
- Both disks and main memory support direct access to any desired location (page). On average, main memory accesses are faster, of course. What is the other important difference (from the perspective of the time required to access a desired page)?
  - The time to access a disk page is not constant; it depends on the location of the data. Accessing some data might be much faster than accessing other data. This differs from main memory, where access time is uniform for most computer systems.
- If you have a large file that is frequently scanned sequentially, explain how you would store the pages in the file on a disk.
  - The pages in the file should be stored sequentially on disk. We should put two logically adjacent pages as close as possible. In decreasing order of closeness, they could be on the same track, the same cylinder, or an adjacent cylinder.
16. Practice: Chapter 7 (Cont.)
- Consider a disk with sector size 512 bytes, 2,000 tracks per surface, 50 sectors per track, 5 double-sided platters, and average seek time of 10 msec.
- 1. What is the capacity of a track in bytes?
  - >> Bytes/track = bytes/sector x sectors/track = 512 x 50 = 25K
- 2. What is the capacity of each surface?
  - >> Bytes/surface = bytes/track x tracks/surface = 25K x 2,000 = 50,000K
- 3. What is the capacity of the disk?
  - >> Bytes/disk = bytes/surface x surfaces/disk = 50,000K x 10 = 500,000K
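These capacity figures can be checked mechanically. The sketch below assumes K = 1,024 bytes, which makes the slide's numbers exact (512 x 50 = 25,600 bytes = 25K):

```python
# Re-deriving the disk-capacity answers: 512-byte sectors, 50 sectors/track,
# 2,000 tracks/surface, 5 double-sided platters => 10 surfaces. K = 1,024 bytes.
K = 1024
bytes_per_track = 512 * 50                  # 25,600 bytes = 25K
bytes_per_surface = bytes_per_track * 2000  # 50,000K
bytes_per_disk = bytes_per_surface * 10     # 500,000K (roughly 488 MB)

assert bytes_per_track == 25 * K
assert bytes_per_surface == 50_000 * K
assert bytes_per_disk == 500_000 * K
```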
17. Practice: Chapter 7 (Cont.)
- Consider a disk with sector size 512 bytes, 2,000 tracks per surface, 50 sectors per track, 5 double-sided platters, and average seek time of 10 msec.
- 4. How many cylinders does the disk have?
  - >> The number of cylinders is the same as the number of tracks on each platter, which is 2,000.
- 5. Give examples of valid block sizes. Is 256 bytes a valid block size? 2,048? 51,200?
  - >> Block size should be a multiple of the sector size: 256 is not a valid block size, but 2,048 and 51,200 are.
18. Practice: Chapter 7 (Cont.)
- Consider a disk with sector size 512 bytes, 2,000 tracks per surface, 50 sectors per track, 5 double-sided platters, and average seek time of 10 msec.
- 6. If the disk platters rotate at 5,400 rpm (revolutions per minute), what is the maximum rotational delay?
  - >> If the disk platters rotate at 5,400 rpm, the time required for one rotation, which is the maximum rotational delay, is (1/5,400) x 60 = 0.011 seconds. The average rotational delay is half of the rotation time, 0.006 seconds.
19. Practice: Chapter 7 (Cont.)
- Consider a disk with sector size 512 bytes, 2,000 tracks per surface, 50 sectors per track, 5 double-sided platters, and average seek time of 10 msec.
- 7. Assuming that one track of data can be transferred per revolution, what is the transfer rate?
  - >> The capacity of a track is 25K bytes. Since one track of data can be transferred per revolution, the transfer rate is 25K / 0.011 = 2,250K bytes per second.
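Questions 6 and 7 can be verified the same way; carrying the exact rotation time (rather than the rounded 0.011 s) gives the 2,250K figure precisely:

```python
# Rotational delay and transfer rate for the 5,400 rpm disk.
rpm = 5400
max_rotational_delay_s = 60 / rpm                    # one full rotation: ~0.0111 s
avg_rotational_delay_s = max_rotational_delay_s / 2  # ~0.0056 s, rounds to 0.006

track_bytes = 512 * 50                               # 25K bytes per track
transfer_rate = track_bytes / max_rotational_delay_s # one track per revolution
print(round(transfer_rate / 1024))                   # 2250 (K bytes per second)
```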
20. Buffer Management in a DBMS
[Figure: buffer pool of frames in main memory; page requests arrive from higher levels; disk pages are read into free frames; choice of frame is dictated by the replacement policy]
- Data must be in RAM for the DBMS to operate on it!
- A table of <frame, pageid> pairs is maintained.
21. When a Page is Requested ...
- If the requested page is not in the pool:
  - Choose a frame for replacement.
  - If the frame is dirty, write it to disk.
  - Read the requested page into the chosen frame.
- Pin the page and return its address.
- This requires two variables per frame in the buffer manager:
  - pin_count (initially set to zero for each frame)
  - dirty (initially off for each frame)
- If requests can be predicted (e.g., sequential scans), pages can be pre-fetched several pages at a time!
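The request path above can be sketched in a few dozen lines. This is a minimal illustration, not a real DBMS API; the class and method names are hypothetical, and "disk" is just a dict of pages:

```python
# Toy buffer manager: frame table, pin_count, dirty bit, write-back on replacement.
class Frame:
    def __init__(self):
        self.page_id = None
        self.data = None
        self.pin_count = 0    # initially set to zero for each frame
        self.dirty = False    # initially off for each frame

class BufferManager:
    def __init__(self, num_frames, disk):
        self.frames = [Frame() for _ in range(num_frames)]
        self.disk = disk      # stand-in disk: page_id -> bytes
        self.table = {}       # the <frame, pageid> table: page_id -> frame

    def request_page(self, page_id):
        frame = self.table.get(page_id)
        if frame is None:                         # requested page not in pool
            frame = self._choose_victim()         # choose a frame for replacement
            if frame.dirty:                       # if frame is dirty, write it to disk
                self.disk[frame.page_id] = frame.data
            if frame.page_id is not None:
                del self.table[frame.page_id]
            frame.data = self.disk[page_id]       # read requested page into frame
            frame.page_id, frame.dirty = page_id, False
            self.table[page_id] = frame
        frame.pin_count += 1                      # pin the page and return it
        return frame

    def unpin(self, page_id, dirty):
        frame = self.table[page_id]
        frame.pin_count -= 1
        frame.dirty = frame.dirty or dirty        # requestor reports modification

    def _choose_victim(self):
        for f in self.frames:                     # simplest policy: first unpinned frame
            if f.pin_count == 0:
                return f
        raise RuntimeError("all pages pinned")
```

`_choose_victim` is where a real replacement policy (LRU, Clock, MRU) would plug in; the linear scan here is only the simplest placeholder.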
22. More on Buffer Management
- Requestor of a page must unpin it, and indicate whether the page has been modified:
  - the dirty bit is used for this.
- A page in the pool may be requested many times:
  - a pin count is used. A page is a candidate for replacement iff pin count = 0.
- CC & recovery may entail additional I/O when a frame is chosen for replacement. (Write-Ahead Log protocol; more later.)
23. Buffer Replacement Policy
- Frame is chosen for replacement by a replacement policy:
  - Least-recently-used (LRU), Clock, MRU, etc.
- Policy can have a big impact on # of I/Os; depends on the access pattern.
- Sequential flooding: nasty situation caused by LRU + repeated sequential scans.
  - # buffer frames < # pages in file means each page request causes an I/O. MRU is much better in this situation (but not in all situations, of course).
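Sequential flooding is easy to demonstrate with a small cache simulation (a sketch; the pool is modeled as an ordered dict tracking recency, not real frames):

```python
# Repeated sequential scans of an 11-page file through a 10-frame pool.
# Under LRU every single request misses; MRU misses only a few times per scan.
from collections import OrderedDict

def scan_misses(num_frames, num_pages, num_scans, policy):
    pool = OrderedDict()                       # page -> None; order tracks recency
    misses = 0
    for _ in range(num_scans):
        for page in range(num_pages):
            if page in pool:
                pool.move_to_end(page)         # hit: update recency
            else:
                misses += 1
                if len(pool) == num_frames:    # pool full: evict per policy
                    if policy == "LRU":
                        pool.popitem(last=False)   # evict least recently used
                    else:                          # "MRU"
                        pool.popitem(last=True)    # evict most recently used
                pool[page] = None
    return misses

print(scan_misses(10, 11, 5, "LRU"))   # 55: all 5 x 11 requests miss
print(scan_misses(10, 11, 5, "MRU"))   # far fewer misses
```

With one more page than there are frames, LRU evicts exactly the page the scan will need next, so every request is an I/O; MRU sacrifices one recently-read page and keeps the rest resident.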
24. DBMS vs. OS File System
- OS does disk space & buffer mgmt: why not let the OS manage these tasks?
- Differences in OS support: portability issues.
- DBMS can predict access patterns in typical DB operations,
  - because most page references are generated by higher-level operations (e.g. sequential scans or implementations of relational algebra operators),
  - and therefore it can adjust the replacement policy and pre-fetch pages based on those access patterns.
- Buffer management in a DBMS requires the ability to:
  - pin a page in the buffer pool, and force a page to disk (important for implementing CC & recovery).
25. Unordered (Heap) Files
- Simplest file structure; contains records in no particular order.
- Only guarantee: one can retrieve all records in the file by repeated requests for the next record.
- As the file grows and shrinks, disk pages are allocated and de-allocated.
- Every record in the file has a unique record id, rid, and every page in the file is of the same size.
- To support record-level operations, we must:
  - keep track of the pages in a file,
  - keep track of free space on pages,
  - keep track of the records on a page.
- Supported operations: create and destroy files; insert, delete, and get a record with a given rid; scan all records.
- There are many alternatives for keeping track of this.
26. Heap File Implemented as a List
[Figure: header page pointing to a doubly linked list of full pages and a doubly linked list of pages with free space]
- The <header page id, heap file name> pair must be stored someplace; the DBMS maintains a table of such pairs.
- Each page contains 2 pointers plus data.
27. Heap File Using a Page Directory
- The entry for a page can include the number of free bytes on the page.
- The directory is a collection of pages; a linked list implementation is just one alternative.
- The directory is much smaller than a linked list of all heap file pages!
28. Indexes
- A heap file allows us to retrieve records:
  - by specifying the rid, or
  - by scanning all records sequentially.
- Sometimes, we want to retrieve records by specifying the values in one or more fields, e.g.,
  - Find all students in the CS department
  - Find all students with a gpa > 3
  - Find all books by Asimov (-> index by Author)
  - Find "Foundation" (-> index by Title)
- Indexes are file structures that enable us to answer such value-based queries efficiently.
- Implementation: just another kind of file, containing records that "direct traffic" on requests for data records.
29. Indexes (Cont.)
- Each index has an associated search key:
  - a collection of one or more fields of the file of records for which we are building the index;
  - any subset of the fields can be a search key.
- We sometimes refer to the file of records as the indexed file.
- Each index is designed to speed up equality or range selections on its search key.
  - E.g., to build an index to improve the efficiency of queries about employees of a given age -> build the index on the age attribute of the employee dataset.
- Records stored in the index file (called entries, vs. data records) allow us to find data records with a given search key value:
  - <age, rid>, where rid identifies a data record.
30. Summary
- Disks provide cheap, non-volatile storage.
  - Unit of transfer from disk into main memory is called a block or page.
  - Blocks are arranged on tracks on several platters.
  - Random access, but cost depends on the location of the page on disk; important to arrange data sequentially to minimize seek and rotation delays.
- Buffer manager brings pages into RAM.
  - Page stays in RAM until released by requestor.
  - Written to disk when frame is chosen for replacement (which is sometime after the requestor releases the page).
  - Choice of frame to replace is based on the replacement policy.
  - Tries to pre-fetch several pages at a time.
31. Summary (Contd.)
- DBMS vs. OS file support:
  - DBMS needs features not found in many OSs, e.g., forcing a page to disk, controlling the order of page writes to disk, files spanning disks, ability to control pre-fetching and page replacement policy based on predictable access patterns, etc.
- Variable-length record format with a field offset directory offers support for direct access to the i-th field and null values.
- Slotted page format supports variable-length records and allows records to move on the page.
32. Summary (Contd.)
- File layer keeps track of pages in a file, and supports the abstraction of a collection of records.
- Pages with free space are identified using a linked list or directory structure (similar to how pages in a file are kept track of).
- Indexes support efficient retrieval of records based on the values in some fields.
- Catalog relations store information about relations, indexes, and views. (Information that is common to all records in a given collection.)
33. Practice: Chapter 7 (Cont.)
- Consider a disk with sector size 512 bytes, 2,000 tracks per surface, 50 sectors per track, 5 double-sided platters, and average seek time of 10 msec., and suppose a block size of 1,024 bytes is chosen. Suppose that a file containing 100,000 records of 100 bytes each is to be stored on such a disk, and that no record is allowed to span two blocks.
- How many records fit onto a block?
  - >> block size / bytes per record = 1,024 / 100 = 10. We can have at most 10 records in a block.
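Since no record may span two blocks, records per block is floor division, and the file-size follow-up (not asked on the slide, but implied by the setup) is a ceiling division:

```python
# Records per block and blocks needed for the 100,000-record file.
block_size, record_size, num_records = 1024, 100, 100_000
records_per_block = block_size // record_size         # floor: 10 records/block
wasted_per_block = block_size % record_size           # 24 bytes lost per block
blocks_needed = -(-num_records // records_per_block)  # ceiling division: 10,000
print(records_per_block, blocks_needed)               # 10 10000
```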
34. Chapter 7 Practice (Cont.)
- Explain what the buffer manager must do to process a read request for a page. What happens if the requested page is in the pool but not pinned?
- When a page is requested, the buffer manager does the following:
  - The buffer pool is checked to see if it contains the requested page. If the page is not in the pool, it is brought in as follows:
    - A frame is chosen for replacement, using the replacement policy.
    - If the frame chosen for replacement is dirty (has been written to), it is flushed (the page it contains is written out to disk).
    - The requested page is read into the frame chosen for replacement.
  - The requested page is pinned (the pin_count of its frame is incremented), and its address is returned to the requestor.
- Note that if the page is not pinned, it could be removed from the buffer pool even if it is actually needed in main memory.
35. Chapter 7 Practice (Cont.)
- When does a buffer manager write a page to disk?
  - If a page in the buffer pool is chosen to be replaced and this page is dirty, the buffer manager must write the page to disk; this is called flushing the page to disk.
  - Sometimes the buffer manager can also force a page to disk for recovery-related purposes (to ensure that log records corresponding to a modified page are written to disk before the modified page itself is written to disk).
36. Chapter 7 Practice (Cont.)
- What does it mean to say that a page is pinned in the buffer pool? Who is responsible for pinning pages? Who is responsible for unpinning pages?
  - Pinning a page means the pin_count of its frame is incremented. Pinning a page guarantees to higher-level DBMS software that the page will not be removed from the buffer pool by the buffer manager. That is, another file page will not be read into the frame containing this page until it is unpinned by this requestor.
  - It is the buffer manager's responsibility to pin a page.
  - It is the responsibility of the requestor of that page to tell the buffer manager to unpin the page.
37. Chapter 7 Practice (Cont.)
- When a page in the buffer pool is modified, how does the DBMS ensure that this change is propagated to disk? (Explain the role of the buffer manager as well as the modifier of the page.)
  - The modifier of the page tells the buffer manager that the page is modified by setting the dirty bit of the page.
  - The buffer manager flushes the page to disk when necessary.
38. Chapter 7 Practice (Cont.)
- What happens if there is a page request when all pages in the buffer pool are dirty?
  - If there are some unpinned pages, the buffer manager chooses one using the replacement policy, flushes this page, and then replaces it with the requested page.
  - If there are no unpinned pages, the buffer manager has to wait until an unpinned page is available (or signal an error condition to the page requestor).
39. Chapter 7 Practice (Cont.)
- What is sequential flooding of the buffer pool?
  - Some DB operations (e.g. certain implementations of the join relational algebra operator) require repeated sequential scans of a relation. Suppose that there are 10 frames available in the buffer pool, and the file to be scanned has 11 or more pages (i.e. at least one more than the number of available frames in the buffer pool). Using LRU, every scan of the file will result in reading in every page of the file! In this situation, called sequential flooding, LRU is the worst possible replacement strategy.
40. Chapter 7 Practice (Cont.)
- Name an important capability of a DBMS buffer manager that is not supported by a typical OS's buffer manager.
  - Pinning a page to prevent it from being replaced.
  - Ability to explicitly force a single page to disk.
- Explain the term prefetching. Why is it important?
  - Because most page references in a DBMS environment are made with a known reference pattern, the buffer manager can anticipate the next several page requests and fetch the corresponding pages into memory before the pages are requested. This is prefetching.
  - Benefits: 1) pages are available in the buffer pool when they are requested; 2) reading in a contiguous block of pages is much faster than reading the same pages at different times in response to distinct requests.