Files and Buffer Manager - PowerPoint PPT Presentation

About This Presentation

Slides: 40

Provided by: ResearchM53

Transcript and Presenter's Notes

1
Files and Buffer Manager
Chapter 15
2
Abstractions Provided by the File Manager

Device independence: The file manager turns the large variety of external storage devices, such as disks (with their different numbers of cylinders, tracks, arms, and read/write heads), RAM disks, tapes, and so on, into simple abstract data types.
Allocation independence: The file manager does its own space management for storing the data objects presented by the client. It may store the same objects in more than one place (replication).

3
Abstractions Provided by the File Manager

Address independence: Whereas objects in main memory are always accessed through their addresses, the file manager provides mechanisms for associative access. For example, the client can request access to all records with a specified value in some field of the record. Support for associative access comes in many flavors, from simple mechanisms for fast retrieval via the primary key up to the full expressive power of the SQL SELECT statement.

4
External Storage vs. Main Memory

Capacity: Main memory is usually limited to a size that is several orders of magnitude smaller than what large databases need.
Economics: External storage holds large volumes of data at reasonable cost.
Durability: Main memory is volatile. External storage devices such as magnetic or optical disks are inherently durable and therefore appropriate for storing persistent objects. After a crash, recovery starts with what is found in durable storage.

5
External Storage vs. Main Memory

Speed: External storage devices are several orders of magnitude slower than main memory. As a result, getting data from external storage to the CPU is more costly, both in latency and in pathlength, than loading data from main memory.
Functionality: Data cannot be processed directly on external storage; they can neither be compared nor modified there.

6
The Storage Pyramid
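[Figure: The storage pyramid. From top to bottom: electronic RAM and bulk storage (main memory); magnetic/optical disks (online external storage); automated archives such as optical disk jukeboxes and tape robots (near-line archive storage). Data range from current data at the top to stale data further down, and typical capacity grows toward the base.]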
7
Interfacing to External Memory: Read-Write Mapping
8
Interfacing to External Memory: File Mapping
9
Interfacing to External Memory: Single-Level Storage
10
Locality and Caching

The movement of data through the pyramid is guided by the principle of locality:
Locality of active data: Data that have been referenced recently will very likely be referenced again.
Locality of passive data: Data that have not been referenced recently will most likely not be referenced in the future.

11
Levels of Abstraction in a File and Database Manager
12
Operations of the Basic File System

STATUS create(filename, allocparmp)
STATUS delete(filename)
STATUS open(filename, ACCESSMODE, FILEID)
STATUS close(FILEID)
STATUS extend(FILEID, allocparmp)
STATUS read(FILEID, BLOCKID, BLOCKP)
STATUS readc(FILEID, BLOCKID, blockcount, BLOCKP)
STATUS write(FILEID, BLOCKID, BLOCKP)
STATUS writec(FILEID, BLOCKID, blockcount, BLOCKP)
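To illustrate the block-oriented interface above, here is a minimal C sketch of read, write, readc, and writec on top of POSIX pread/pwrite. It assumes a fixed block size of 4096 bytes, a file descriptor as the FILEID, and 0/-1 as STATUS values; the bfs_ prefix and demo_roundtrip are hypothetical names chosen to avoid clashing with POSIX read and write. The point it shows is that the basic file system treats a file as an array of equal-sized blocks, so a BLOCKID maps to a byte offset by a single multiplication.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCKSIZE 4096          /* assumed fixed block size in bytes */

typedef int  STATUS;            /* assumed: 0 = OK, -1 = error */
typedef int  FILEID;            /* here simply a POSIX file descriptor */
typedef long BLOCKID;           /* block number within the file */

/* readc: read blockcount consecutive blocks, starting at blockid. */
STATUS bfs_readc(FILEID f, BLOCKID blockid, int blockcount, void *blockp)
{
    assert(blockcount > 0);
    size_t len = (size_t)blockcount * BLOCKSIZE;
    off_t  off = (off_t)blockid * BLOCKSIZE;      /* block number -> byte offset */
    /* A short transfer (e.g. reading past end of file) counts as an error. */
    return pread(f, blockp, len, off) == (ssize_t)len ? 0 : -1;
}

/* writec: write blockcount consecutive blocks, starting at blockid. */
STATUS bfs_writec(FILEID f, BLOCKID blockid, int blockcount, const void *blockp)
{
    assert(blockcount > 0);
    size_t len = (size_t)blockcount * BLOCKSIZE;
    off_t  off = (off_t)blockid * BLOCKSIZE;
    return pwrite(f, blockp, len, off) == (ssize_t)len ? 0 : -1;
}

/* read and write are the single-block special cases. */
STATUS bfs_read(FILEID f, BLOCKID blockid, void *blockp)
{
    return bfs_readc(f, blockid, 1, blockp);
}

STATUS bfs_write(FILEID f, BLOCKID blockid, const void *blockp)
{
    return bfs_writec(f, blockid, 1, blockp);
}

/* Usage: write block 3 of a temporary file, read it back, and compare. */
int demo_roundtrip(void)
{
    static char out[BLOCKSIZE], in[BLOCKSIZE];
    FILE *tf = tmpfile();
    if (tf == NULL)
        return -1;
    FILEID f = fileno(tf);
    memset(out, 'x', sizeof out);
    int ok = bfs_write(f, 3, out) == 0
          && bfs_read(f, 3, in) == 0
          && memcmp(in, out, sizeof in) == 0
          && bfs_read(f, 10, in) == -1;   /* block 10 lies past end of file */
    fclose(tf);
    return ok ? 0 : -1;
}
```

A production version would loop on partial transfers instead of treating them as errors, and would keep the block size as a per-file parameter.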

13
Mapping Files To Disk
14
Issues in Managing Disk Space

Initial allocation: When a file is created, how many contiguous slots should be allocated to it?
Incremental expansion: If an existing file grows beyond the number of slots currently allocated, how many additional contiguous blocks should be assigned to that file?
Reorganization: When and how should the free space on the disk be reorganized?
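The first two questions can be made concrete with a small C sketch over a free-space bitmap of slots. First-fit allocation, the 64-slot disk, and the function names are assumptions for illustration, not the chapter's allocation scheme. The sketch also shows why reorganization becomes necessary: a request can fail even though the total number of free slots would suffice, because no free run is long enough.

```c
#include <assert.h>
#include <stdbool.h>

#define NSLOTS 64               /* hypothetical tiny disk of 64 slots */

static bool used[NSLOTS];       /* free-space bitmap: true = allocated */

/* Initial allocation (first fit): find the first run of n contiguous free
   slots, mark it allocated, and return its first slot, or -1 if none. */
int alloc_contiguous(int n)
{
    assert(n > 0);
    int run = 0;                            /* length of current free run */
    for (int s = 0; s < NSLOTS; s++) {
        run = used[s] ? 0 : run + 1;
        if (run == n) {
            int start = s - n + 1;
            for (int i = start; i <= s; i++)
                used[i] = true;
            return start;
        }
    }
    return -1;  /* free space may still exist, but not contiguously */
}

/* Incremental expansion: try to grow the extent [start, start + n) in place
   by extra slots. On failure the caller would need a further extent. */
bool extend_in_place(int start, int n, int extra)
{
    int end = start + n;
    if (end + extra > NSLOTS)
        return false;
    for (int i = end; i < end + extra; i++)
        if (used[i])
            return false;
    for (int i = end; i < end + extra; i++)
        used[i] = true;
    return true;
}

void free_slots(int start, int n)
{
    for (int i = start; i < start + n; i++)
        used[i] = false;
}
```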

15
Extent-Based Allocation
16
Buddy Systems
17
Simple Mapping of Relations to Disks
18
The Usual Way of Mapping Relations to Disks
19
Principles of the Database Buffer
20
Design Options for the Buffer Manager

Buffer per file: Each file has its own private buffer pool.
Buffer per page size: In systems with different page (and block) sizes, there is usually at least one buffer for each page size.
Buffer per file type: Some files, such as indices, are accessed in a significantly different way from other files. Therefore, some systems dedicate buffers to files depending on the access pattern and try to manage each of them in a way that is optimal for the respective file organization.

21
Logic of the Buffer Manager

Search in buffer: Check whether the requested page is in the buffer. If it is found, return the address F of its frame to the caller.
Find free frame: If the page is not in the buffer, find a frame that holds no valid page.
Determine replacement victim: If no such frame exists, determine a page that can be removed from the buffer (in order to reuse its frame).

22
Logic of the Buffer Manager

Write modified page: If the replacement victim has been changed, write it back to disk.
Establish frame address: Denote the start address of the frame as F.
Determine block address: Translate the requested PAGEID P into a FILEID and a block number, and read the block into the selected frame.
Return: Return the frame address F to the caller.
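Taken together, the steps of the buffer manager's logic can be sketched in C as a single buffer_fix function working against a simulated in-memory disk. Several details are hypothetical: the PAGEID encoding (file number times BLOCKS_PER_FILE plus block number), the round-robin choice of victim (which ignores fix counts for brevity), and the mark_modified helper through which a caller reports an update.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGESIZE        64      /* tiny pages, for illustration only */
#define NFRAMES         2
#define NFILES          2
#define BLOCKS_PER_FILE 8

typedef long PAGEID;
typedef int  FILEID;

static char disk[NFILES][BLOCKS_PER_FILE][PAGESIZE];  /* simulated disk */
static int  disk_writes;                               /* write-backs so far */

typedef struct { bool valid, modified; PAGEID pageid; } FRAMEHDR;
static FRAMEHDR hdr[NFRAMES];
static char     pool[NFRAMES][PAGESIZE];               /* the frames */
static int      hand;                                  /* round-robin victim */

/* Assumed PAGEID encoding: file * BLOCKS_PER_FILE + block number. */
static void translate(PAGEID p, FILEID *file, int *block)
{
    assert(p >= 0 && p < NFILES * BLOCKS_PER_FILE);
    *file  = (FILEID)(p / BLOCKS_PER_FILE);
    *block = (int)(p % BLOCKS_PER_FILE);
}

char *buffer_fix(PAGEID p)
{
    FILEID file;
    int    block;
    for (int i = 0; i < NFRAMES; i++)                  /* search in buffer */
        if (hdr[i].valid && hdr[i].pageid == p)
            return pool[i];
    int f = -1;
    for (int i = 0; i < NFRAMES && f < 0; i++)         /* find free frame */
        if (!hdr[i].valid)
            f = i;
    if (f < 0) {                                       /* replacement victim */
        f = hand;
        hand = (hand + 1) % NFRAMES;
    }
    if (hdr[f].valid && hdr[f].modified) {             /* write modified page */
        translate(hdr[f].pageid, &file, &block);
        memcpy(disk[file][block], pool[f], PAGESIZE);
        disk_writes++;
    }
    char *F = pool[f];                                 /* establish frame address */
    translate(p, &file, &block);                       /* determine block address */
    memcpy(F, disk[file][block], PAGESIZE);            /* read block into frame */
    hdr[f] = (FRAMEHDR){ .valid = true, .modified = false, .pageid = p };
    return F;                                          /* return F to the caller */
}

/* Hypothetical helper: the caller reports that it updated the page in F. */
void mark_modified(const char *F)
{
    for (int i = 0; i < NFRAMES; i++)
        if (pool[i] == F)
            hdr[i].modified = true;
}
```

For example, fixing page 3 (file 0, block 3), updating it, and then fixing two other pages forces page 3 out of the two-frame pool; only at that moment is it written back to the simulated disk.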

23
Synchronization in the Buffer
24
What the Buffer Manager Does for Synchronization

Sharing: Pages are made addressable to all processes that run the database code.
Semaphore protection: Each requestor gets the address of a semaphore protecting the page.
Durable storage: The access modules inform the buffer manager if their page access has resulted in an update of the page; the actual write operation, however, is issued by the buffer manager, probably at a time when the updating transaction is long gone.

25
The Interface to the Buffer Manager

typedef struct {
    PAGEID     pageid;    /* id of page in file */
    PAGEPTR    pageaddr;  /* base address in buffer */
    int        index;     /* record within page */
    semaphore *pagesem;   /* pointer to the page semaphore */
    Boolean    modified;  /* caller modified the page */
    Boolean    invalid;   /* destroyed page */
} BUFFER_ACC_CB, *BUFFER_ACC_CBP;  /* control block for buffer access */

26
The Need for Fix and Unfix
27
The Fix-Use-Unfix Protocol I

FIX: The client requests access to a page using the bufferfix interface.
USE: The client uses the page, and the pointer to the frame containing the page remains valid.
UNFIX: The client explicitly waives further usage of the frame pointer; that is, it tells the buffer manager that it no longer wants to use that page.
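The protocol can be sketched with a per-frame fix count: FIX increments it, UNFIX decrements it, and only frames whose count is zero are eligible for replacement, which is what keeps the frame pointer valid during USE. In this C sketch the names bufferfix and bufferunfix follow the slide, but their signatures, the three-frame pool, and the elided disk I/O are assumptions. UNFIX also carries the caller's report of whether it modified the page, so the actual write can be issued by the buffer manager later.

```c
#include <assert.h>
#include <stdbool.h>

#define NFRAMES 3               /* hypothetical, tiny buffer pool */

typedef long PAGEID;

typedef struct {
    bool   valid;
    PAGEID pageid;
    int    fixcount;            /* number of outstanding FIXes */
    bool   dirty;               /* an UNFIX reported an update */
} FRAME;

static FRAME frames[NFRAMES];

/* FIX: find page p or load it into an unfixed frame, then pin the frame.
   Returns the frame index, or -1 if every frame is fixed. Writing a dirty
   victim and reading p from disk are elided. */
int bufferfix(PAGEID p)
{
    int f = -1;
    for (int i = 0; i < NFRAMES && f < 0; i++)
        if (frames[i].valid && frames[i].pageid == p)
            f = i;
    if (f < 0) {
        for (int i = 0; i < NFRAMES && f < 0; i++)
            if (frames[i].fixcount == 0)        /* only unfixed frames */
                f = i;
        if (f < 0)
            return -1;
        frames[f] = (FRAME){ .valid = true, .pageid = p };
    }
    frames[f].fixcount++;       /* the frame pointer stays valid until UNFIX */
    return f;
}

/* UNFIX: the client waives further use of the frame and says whether it
   modified the page; the buffer manager writes the page at a later time. */
void bufferunfix(int f, bool modified)
{
    assert(f >= 0 && f < NFRAMES && frames[f].fixcount > 0);
    frames[f].fixcount--;
    if (modified)
        frames[f].dirty = true;
}
```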

28
The Fix-Use-Unfix Protocol II
29
Structure of the Buffer Manager
30
Logging and Recovery from the Buffer Manager's
Perspective I
Transaction   Remark
running       OK, old state in DB
running       OK, old state in DB
running       database corrupted
running       conflicting view on TA
committed     OK, read-only TA
committed     DB not in new state
committed     database corrupted
committed     OK, new state in DB
[The slide's Buffer and Database state columns are not recoverable from the transcript.]
31
Logging and Recovery from the Buffer Manager's
Perspective II
State of TA   State of page A in database   Result of recovery using operation log
aborted       old                           wrong tuple might be deleted
aborted       new                           inverse operation succeeds
committed     old                           operation succeeds
committed     new                           duplicate of tuple is inserted
32
The Log and Page LSNs
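Page LSNs address the anomalies in the second recovery table: if every page records the LSN of the last log record applied to it, recovery can tell whether an operation is already reflected on the page. The C sketch below assumes an insert-only operation log, a page reduced to a tuple counter, and that each page on disk reflects a prefix of its log records (write-ahead logging); all names are hypothetical. Redo is skipped when the page LSN is at least the record's LSN, which avoids the duplicate insert; undo is skipped when the page LSN is smaller, which avoids deleting the wrong tuple.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long LSN;

typedef struct {
    LSN page_lsn;               /* LSN of the last log record applied here */
    int ntuples;                /* stand-in for the page contents */
} PAGE;

typedef struct {
    LSN lsn;                    /* operation log record: insert one tuple */
} INSERT_REC;

/* Normal processing: perform the insert and stamp the page. */
void do_insert(PAGE *pg, const INSERT_REC *r)
{
    pg->ntuples++;
    pg->page_lsn = r->lsn;
}

/* Redo at restart: skip if the page already reflects this record. */
bool redo_insert(PAGE *pg, const INSERT_REC *r)
{
    if (pg->page_lsn >= r->lsn)
        return false;           /* redoing would insert a duplicate */
    do_insert(pg, r);
    return true;
}

/* Undo: skip if the insert never reached the page; otherwise remove the
   tuple and stamp the page with undo_lsn, the LSN of the log record that
   describes the undo itself (assumed to be written by the caller). */
bool undo_insert(PAGE *pg, const INSERT_REC *r, LSN undo_lsn)
{
    if (pg->page_lsn < r->lsn)
        return false;           /* undoing would delete the wrong tuple */
    assert(pg->ntuples > 0);
    pg->ntuples--;
    pg->page_lsn = undo_lsn;
    return true;
}
```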
33
Different Buffer Management Policies

Steal policy: When the buffer manager needs space, it can decide to replace dirty pages.
No-steal policy: Pages can be replaced only if they are clean.
Force policy: At the end of a transaction, all modified pages are forced to disk in a series of synchronous write operations.
No-force policy: No modified page is forced to disk at commit; instead, REDO log records are written to the log.
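The four policies reduce to two decisions in the buffer manager, sketched in C below: which frames may be replaced (steal vs. no-steal) and what happens to a transaction's dirty pages at commit (force vs. no-force). The single writer_tx field per frame, the counter, and the elided page and log writes are simplifying assumptions. As a design note, no-steal keeps uncommitted updates out of the database, and force guarantees that committed updates are on disk at commit; the price is buffer pressure and synchronous writes at commit, respectively.

```c
#include <assert.h>
#include <stdbool.h>

#define NFRAMES 4

typedef enum { STEAL, NO_STEAL } REPLACE_POLICY;
typedef enum { FORCE, NO_FORCE } COMMIT_POLICY;

typedef struct {
    bool valid;
    bool dirty;                 /* modified since last written to disk */
    int  fixcount;
    int  writer_tx;             /* assumed: transaction that modified it, 0 = none */
} FRAME;

static FRAME frames[NFRAMES];
static int   pages_forced;      /* synchronous page writes at commit */

/* Steal vs. no-steal: may frame i be chosen as a replacement victim? */
bool replaceable(int i, REPLACE_POLICY rp)
{
    assert(i >= 0 && i < NFRAMES);
    if (frames[i].fixcount > 0)
        return false;
    return rp == STEAL || !frames[i].dirty;   /* no-steal: clean pages only */
}

/* Force vs. no-force: what the buffer manager does when tx commits. */
void commit(int tx, COMMIT_POLICY cp)
{
    if (cp == NO_FORCE)
        return;                 /* no page writes; REDO log records (not shown) */
    for (int i = 0; i < NFRAMES; i++)
        if (frames[i].valid && frames[i].dirty && frames[i].writer_tx == tx) {
            pages_forced++;     /* synchronous write of the page (elided) */
            frames[i].dirty = false;
            frames[i].writer_tx = 0;
        }
}
```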

34
The Problem of Hotspot Pages
35
The Basic Checkpoint Algorithm

Quiesce: Delay all incoming update DML calls until all fixes with exclusive semaphores have been released.
Flush the buffer: Write all modified pages.
Log the checkpoint: Write a record to the log, saying that a checkpoint has been generated.
Resume normal operation: The bufferfix requests for updates that have been delayed in order to take the checkpoint can now be processed again.
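The four steps of the basic checkpoint can be sketched as a single-threaded C simulation. Delaying a request is modeled by try_fix_for_update returning false, and the page writes and the log are toy in-memory stand-ins; all names are hypothetical. The sketch makes the cost of this algorithm visible: no update can proceed from the start of quiescing until the flush has finished.

```c
#include <assert.h>
#include <stdbool.h>

#define NFRAMES        4
#define LOGMAX         16
#define LOG_CHECKPOINT 1        /* hypothetical log record type code */

static bool dirty[NFRAMES];     /* modified flags of the buffer frames */
static int  exclusive_fixes;    /* fixes holding exclusive semaphores */
static bool quiesced;           /* true while update fixes are delayed */
static int  log_rec[LOGMAX];    /* toy in-memory log of record types */
static int  log_len;

/* An update bufferfix; returns false if the request must be delayed. */
bool try_fix_for_update(void)
{
    if (quiesced)
        return false;
    exclusive_fixes++;
    return true;
}

/* Release an exclusive fix after updating the page in the given frame. */
void unfix_update(int frame)
{
    assert(exclusive_fixes > 0);
    exclusive_fixes--;
    dirty[frame] = true;
}

/* Returns false while quiescing is still waiting for exclusive fixes. */
bool basic_checkpoint(void)
{
    quiesced = true;                        /* 1. quiesce */
    if (exclusive_fixes > 0)
        return false;                       /*    call again later */
    for (int i = 0; i < NFRAMES; i++)       /* 2. flush the buffer */
        dirty[i] = false;                   /*    (page writes elided) */
    assert(log_len < LOGMAX);
    log_rec[log_len++] = LOG_CHECKPOINT;    /* 3. log the checkpoint */
    quiesced = false;                       /* 4. resume normal operation */
    return true;
}
```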

36
The Case for Indirect Checkpointing
37
The Indirect Checkpointing Algorithm

Record TOC: Log the list of PAGEIDs.
Compare with previous checkpoint: See whether any modified pages have not been replaced since the last checkpoint.
Force lazy pages: Schedule the writing of those pages during the next checkpoint interval.
Low-water mark: Find the LSN of the oldest still-volatile update and write it to the log.
Write the checkpoint-done record.
Resume normal operation.
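Two of the indirect-checkpoint steps lend themselves to a short C sketch: computing the low-water mark and finding the lazy pages. It assumes each dirty frame records rec_lsn, the LSN of its first update since the page was last written, and that the previous checkpoint's TOC stored (PAGEID, rec_lsn) pairs; a dirty page whose entry is unchanged has not been written since that checkpoint. These fields and names are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NFRAMES 4

typedef unsigned long LSN;

typedef struct {
    bool dirty;
    long pageid;
    LSN  rec_lsn;               /* first update since the page was last written */
} FRAME;

typedef struct { long pageid; LSN rec_lsn; } TOC_ENTRY;  /* previous ckpt */

static FRAME frames[NFRAMES];

/* Low-water mark: LSN of the oldest still-volatile update; returns 0 if
   no page is dirty (an assumed convention). */
LSN low_water_mark(void)
{
    LSN lwm = 0;
    for (int i = 0; i < NFRAMES; i++)
        if (frames[i].dirty && (lwm == 0 || frames[i].rec_lsn < lwm))
            lwm = frames[i].rec_lsn;
    return lwm;
}

/* Compare with the previous checkpoint: a dirty page whose (pageid, rec_lsn)
   is unchanged has not been written since then. Such lazy pages are marked
   in force_next, to be written during the next checkpoint interval. */
int find_lazy_pages(const TOC_ENTRY *prev, int nprev, bool force_next[])
{
    assert(nprev >= 0);
    int n = 0;
    for (int i = 0; i < NFRAMES; i++) {
        force_next[i] = false;
        if (!frames[i].dirty)
            continue;
        for (int j = 0; j < nprev; j++)
            if (prev[j].pageid == frames[i].pageid &&
                prev[j].rec_lsn == frames[i].rec_lsn) {
                force_next[i] = true;
                n++;
                break;
            }
    }
    return n;
}
```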

38
Further Possibilities for Optimization

Pre-flushing can be performed by an asynchronous process that scans the buffer for "old" modified pages and writes them. Writing is done under semaphore protection.
Pre-fetching can, among other things, be used to make restart more efficient. If page reads are logged, the most recent checkpoint plus the log can be used to prime the buffer pool, so that it looks almost exactly as it did at the moment of the crash.
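A pre-flush pass might look like the C sketch below. "Old" is interpreted here as dirty for at least age ticks of a logical clock, and a page whose semaphore is held exclusively is simply skipped until the next pass; both are assumptions, as are the names. Writing a page in the background keeps it in the buffer but makes it clean, so a later replacement of that page does not have to wait for a write.

```c
#include <assert.h>
#include <stdbool.h>

#define NFRAMES 4

typedef struct {
    bool          dirty;
    bool          excl_fixed;   /* page semaphore held exclusively */
    unsigned long dirty_since;  /* logical time of the first modification */
} FRAME;

static FRAME frames[NFRAMES];

/* One pass of a pre-flush process: write every page that has been dirty for
   at least `age` ticks. Pages whose semaphore is held exclusively are left
   for a later pass. Returns the number of pages written. */
int preflush_pass(unsigned long now, unsigned long age)
{
    int written = 0;
    for (int i = 0; i < NFRAMES; i++) {
        if (!frames[i].dirty || frames[i].excl_fixed)
            continue;
        assert(now >= frames[i].dirty_since);
        if (now - frames[i].dirty_since < age)
            continue;            /* not "old" yet */
        /* here the page would be written under shared semaphore protection */
        frames[i].dirty = false; /* the page stays in the buffer, now clean */
        written++;
    }
    return written;
}
```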

39
Further Possibilities for Optimization

Transaction scheduling and buffer management can take hints from the query optimizer, for example:
This relation will be scanned sequentially.
This is a sequential scan of the leaves of a B-tree.
This is the traversal of a B-tree, starting at the root.
This is a nested-loop join, where the inner relation is scanned in physically sequential order.
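One way a buffer manager might act on the first hint is to age pages touched by a sequential scan so that they are replaced first, instead of pushing out pages such as a B-tree root that will be needed again. The C sketch below assumes LRU replacement with logical timestamps; the HINT values and function names are hypothetical, and the slide does not prescribe this particular response to the hint.

```c
#include <assert.h>

#define NFRAMES 3

typedef enum { HINT_NONE, HINT_SEQ_SCAN } HINT;   /* hypothetical hints */

static unsigned long lastuse[NFRAMES];   /* LRU timestamps per frame */
static unsigned long tick;

/* On unfix, a page read by a sequential scan gets the oldest possible
   timestamp, so it is replaced first; other pages become most recent. */
void unfix_with_hint(int f, HINT h)
{
    assert(f >= 0 && f < NFRAMES);
    lastuse[f] = (h == HINT_SEQ_SCAN) ? 0 : ++tick;
}

/* Plain LRU victim selection over the timestamps. */
int lru_victim(void)
{
    int v = 0;
    for (int i = 1; i < NFRAMES; i++)
        if (lastuse[i] < lastuse[v])
            v = i;
    return v;
}
```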