Title: Files and Buffer Manager
1Files and Buffer Manager
Chapter 15
2Abstractions Provided by the File Manager
- Device independence: The file manager turns the
large variety of external storage devices, such
as disks (with their different numbers of
cylinders, tracks, arms, and read/write heads),
RAM disks, tapes, and so on, into simple abstract
data types.
- Allocation independence: The file manager does
its own space management for storing the data
objects presented by the client. It may store the
same objects in more than one place (replication).
3Abstractions Provided by the File Manager
- Address independence: Whereas objects in main
memory are always accessed through their
addresses, the file manager provides mechanisms
for associative access. Thus, for example, the
client can request access to all records with a
specified value in some field of the record.
Support for associative access comes in many
flavors, from simple mechanisms yielding fast
retrieval via the primary key up to the
expressive power of the SQL select statement.
4External Storage vs. Main Memory
- Capacity: Main memory is usually limited to a
size that is some orders of magnitude smaller
than what large databases need.
- Economics: External storage holds large volumes
of data at reasonable cost.
- Durability: Main memory is volatile. External
storage devices such as magnetic or optical disks
are inherently durable and therefore are
appropriate for storing persistent objects.
After a crash, recovery starts with what is found
in durable storage.
5External Storage vs. Main Memory
- Speed: External storage devices are some orders
of magnitude slower than main memory. As a
result, it is more costly, both in terms of
latency and in terms of path length, to get data
from external storage to the CPU than to load
data from main memory.
- Functionality: Data cannot be processed directly
on external storage; they can neither be compared
nor modified there.
6The Storage Pyramid
[Figure: storage pyramid. Levels, top to bottom:
main memory (electronic RAM and bulk storage);
online external storage (magnetic / optical disks);
near-line (archive) storage (automated archives,
e.g. optical disk jukeboxes, tape robots, etc.).
Other labels: current data, stale data, typical capacity.]
7Interfacing to External Memory: Read-Write Mapping
8Interfacing to External Memory: File Mapping
9Interfacing to External Memory: Single-Level
Storage
10Locality and Caching
- The movement of data through the pyramid is
guided by the principle of locality:
  - Locality of active data: Data that have recently
  been referenced will very likely be referenced
  again.
  - Locality of passive data: Data that have not been
  referenced recently will most likely not be
  referenced in the future.
11Levels of Abstraction in a File and Database
Manager
12Operations of the Basic File System
- STATUS create(filename, allocparmp)
- STATUS delete(filename)
- STATUS open(filename, ACCESSMODE, FILEID)
- STATUS close(FILEID)
- STATUS extend(FILEID, allocparmp)
- STATUS read(FILEID, BLOCKID, BLOCKP)
- STATUS readc(FILEID, BLOCKID, blockcount, BLOCKP)
- STATUS write(FILEID, BLOCKID, BLOCKP)
- STATUS writec(FILEID, BLOCKID, blockcount, BLOCKP)
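A minimal sketch of how the block-oriented calls above might sit on top of C stdio. It assumes fixed 512-byte blocks numbered from 0 and uses a FILE * as the file handle. The blk_ prefix, BLOCK_SIZE, and the ST_OK / ST_ERR status values are names chosen here for illustration (the prefix also avoids clashing with the POSIX read and write calls); the slides do not specify them.

```c
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 512            /* assumed block size in bytes */

typedef enum { ST_OK = 0, ST_ERR = 1 } STATUS;  /* assumed status codes */
typedef FILE *FILEID;             /* assumption: handle is a stdio stream */
typedef unsigned long BLOCKID;    /* assumption: blocks numbered from 0 */
typedef unsigned char *BLOCKP;    /* caller-supplied block buffer */

/* Read `blockcount` contiguous blocks, starting at block `first`. */
STATUS blk_readc(FILEID f, BLOCKID first, size_t blockcount, BLOCKP buf)
{
    if (fseek(f, (long)(first * BLOCK_SIZE), SEEK_SET) != 0)
        return ST_ERR;
    return fread(buf, BLOCK_SIZE, blockcount, f) == blockcount
               ? ST_OK : ST_ERR;
}

/* Write `blockcount` contiguous blocks, starting at block `first`. */
STATUS blk_writec(FILEID f, BLOCKID first, size_t blockcount, BLOCKP buf)
{
    if (fseek(f, (long)(first * BLOCK_SIZE), SEEK_SET) != 0)
        return ST_ERR;
    if (fwrite(buf, BLOCK_SIZE, blockcount, f) != blockcount)
        return ST_ERR;
    return fflush(f) == 0 ? ST_OK : ST_ERR;
}

/* Single-block variants are the chained calls with a count of one. */
STATUS blk_read(FILEID f, BLOCKID id, BLOCKP buf)
{
    return blk_readc(f, id, 1, buf);
}

STATUS blk_write(FILEID f, BLOCKID id, BLOCKP buf)
{
    return blk_writec(f, id, 1, buf);
}
```

The chained variants (readc, writec) move several contiguous blocks with one positioning step, which is where contiguous allocation pays off.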
13Mapping Files To Disk
14Issues in Managing Disk Space
- Initial allocation: When a file is created, how
many contiguous slots should be allocated to it?
- Incremental expansion: If an existing file grows
beyond the number of slots currently allocated,
how many additional contiguous blocks should be
assigned to that file?
- Reorganization: When and how should the free
space on the disk be reorganized?
15Extent-Based Allocation
16Buddy Systems
17Simple Mapping of Relations To Disks
18A Usual Way of Mapping of Relations To Disks
19Principles of the Database Buffer
20Design Options for the Buffer Manager
- Buffer per file: Each file has its own private
buffer pool.
- Buffer per page size: In systems with different
page (and block) sizes, there is usually at least
one buffer for each page size.
- Buffer per file type: There are files, such as
indices, that are accessed in a significantly
different way from other files. Therefore, some
systems dedicate buffers to files depending on
the access pattern and try to manage each of them
in a way that is optimal for the respective file
organization.
21Logic of the Buffer Manager
- Search in buffer: Check if the requested page is
in the buffer. If found, return the address F of
this frame to the caller.
- Find free frame: If the page is not in the
buffer, find a frame that holds no valid page.
- Determine replacement victim: If no such frame
exists, determine a page that can be removed from
the buffer (in order to reuse its frame).
22Logic of the Buffer Manager
- Write modified page: If the page to be replaced
has been changed, write it back to disk.
- Establish frame address: Denote the start address
of the frame as F.
- Determine block address: Translate the requested
PAGEID P into a FILEID and a block number. Read
the block into the frame selected.
- Return: Return the frame address F to the caller.
23Synchronization in the Buffer
24What the Buffer Manager Does for Synchronization
- Sharing: Pages are made addressable to all
processes that run the database code.
- Semaphore protection: Each requestor gets the
address of a semaphore protecting the page.
- Durable storage: The access modules inform the
buffer manager if their page access has resulted
in an update of the page; the actual write
operation, however, is issued by the buffer
manager, probably at a time when the update
transaction is long gone.
25The Interface to the Buffer Manager
typedef struct
{
    PAGEID     pageid;    /* id of page in file    */
    PAGEPTR    pageaddr;  /* base addr. in buffer  */
    int        index;     /* record within page    */
    semaphore *pagesem;   /* pointer to the sem.   */
    Boolean    modified;  /* caller modif. page    */
    Boolean    invalid;   /* destroyed page        */
} BUFFER_ACC_CB, *BUFFER_ACC_CBP;
/* control block for buffer access */
26The Need for Fix and Unfix
27The Fix-Use-Unfix Protocol I
- FIX: The client requests access to a page using
the bufferfix interface.
- USE: The client uses the page, and the pointer to
the frame containing the page will remain valid.
- UNFIX: The client explicitly waives further usage
of the frame pointer; that is, it tells the
buffer manager that it no longer wants to use
that page.
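One common way to realize this protocol is a fix counter per frame: a frame with a non-zero count is never chosen for replacement, so the client's pointer stays valid during USE. The sketch below assumes that design. The slides name only the bufferfix interface; the signatures, the bufferunfix name, and the FRAME layout here are illustrative assumptions, and the actual disk I/O is left out.

```c
#include <assert.h>
#include <stddef.h>

#define NFRAMES 2   /* assumed buffer pool size */

typedef int PAGEID;

typedef struct {
    PAGEID pageid;
    int    valid;
    int    fix_count;  /* number of clients currently using the frame */
} FRAME;

static FRAME pool[NFRAMES];

/* FIX: returns the frame holding page p, or NULL if every frame is fixed. */
FRAME *bufferfix(PAGEID p)
{
    FRAME *victim = NULL;
    for (int i = 0; i < NFRAMES; i++) {
        if (pool[i].valid && pool[i].pageid == p) {
            pool[i].fix_count++;        /* already buffered: fix it again */
            return &pool[i];
        }
        if (victim == NULL && pool[i].fix_count == 0)
            victim = &pool[i];          /* only unfixed frames are candidates */
    }
    if (victim == NULL)
        return NULL;
    /* A real buffer manager would write back the old page and read p here. */
    victim->pageid = p;
    victim->valid = 1;
    victim->fix_count = 1;
    return victim;
}

/* UNFIX: the client waives further use of the frame pointer. */
void bufferunfix(FRAME *f)
{
    assert(f->fix_count > 0);
    f->fix_count--;
}
```

A client that forgets to unfix keeps the frame out of the replacement pool indefinitely, which is why the unfix must be explicit.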
28The Fix-Use-Unfix Protocol II
29Structure of the Buffer Manager
30Logging and Recovery from the Buffer Manager's
Perspective I
Transaction | Buffer | Database | Remark
running     |        |          | OK: old state in DB
running     |        |          | OK: old state in DB
running     |        |          | database corrupted
running     |        |          | conflicting view on TA
committed   |        |          | OK: read-only TA
committed   |        |          | DB not in new state
committed   |        |          | database corrupted
committed   |        |          | OK: new state in DB
31Logging and Recovery from the Buffer Manager's
Perspective II
State of transaction TA | State of page A in database | Result of recovery using operation log
aborted                 | old                         | wrong tuple might be deleted
aborted                 | new                         | inverse operation succeeds
committed               | old                         | operation succeeds
committed               | new                         | duplicate of tuple is inserted
32The Log and Page LSNs
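The recovery cases on the previous slide go wrong when a logged operation is replayed against a page that already contains its effect. A standard remedy is to store in each page the LSN of the last update applied to it and to compare that with the LSN of the log record before redoing. The sketch below assumes this scheme; PAGE, LOGREC, and redo are hypothetical names, and a tuple counter stands in for real page contents.

```c
typedef unsigned long LSN;

typedef struct {
    LSN page_lsn;     /* LSN of the last update applied to this page */
    int tuple_count;  /* stand-in for the page contents */
} PAGE;

typedef struct {
    LSN lsn;          /* position of this record in the log */
    int delta;        /* +1 = insert a tuple, -1 = delete a tuple */
} LOGREC;

/* Redo a logged operation only if the page has not seen it yet.
   Returns 1 if the operation was applied, 0 if it was skipped. */
int redo(PAGE *pg, const LOGREC *r)
{
    if (pg->page_lsn >= r->lsn)
        return 0;               /* page already reflects this update */
    pg->tuple_count += r->delta;
    pg->page_lsn = r->lsn;      /* remember how far the page has got */
    return 1;
}
```

With the check in place, replaying an insert against the "new" page state is skipped instead of producing a duplicate tuple, and replaying it twice against the "old" state applies it once.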
33Different Buffer Management Policies
- Steal policy: When the buffer manager needs
space, it can decide to replace dirty pages.
- No-Steal policy: Pages can be replaced only if
they are clean.
- Force policy: At end of transaction, all modified
pages are forced to disk in a series of
synchronous write operations.
- No-Force policy: No modified page is forced
during commit. REDO log records are written to
the log.
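The four policies reduce to two independent switches, which the following sketch encodes as flags. POLICY and the function names are hypothetical. The two needs_* helpers state the usual consequence for crash recovery: stealing means uncommitted changes can reach disk (so UNDO information is needed), and not forcing means committed changes may be missing from disk (so REDO information is needed).

```c
typedef struct {
    int steal;  /* 1 = dirty pages may be replaced */
    int force;  /* 1 = modified pages are written at commit */
} POLICY;

/* May an unfixed page be chosen as replacement victim? */
int can_replace(const POLICY *p, int page_is_dirty)
{
    return !page_is_dirty || p->steal;
}

/* How many pages must be written synchronously when a transaction
   that modified `modified_pages` pages commits? */
int pages_forced_at_commit(const POLICY *p, int modified_pages)
{
    return p->force ? modified_pages : 0;
}

/* Log information needed for crash recovery under each policy. */
int needs_undo_log(const POLICY *p) { return p->steal; }
int needs_redo_log(const POLICY *p) { return !p->force; }
```

Steal combined with no-force gives the buffer manager the most freedom and therefore asks the most of the log.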
34The Problem of Hotspot Pages
35The Basic Checkpoint Algorithm
- Quiesce: Delay all incoming update DML calls
until all fixes with exclusive semaphores have
been released.
- Flush the buffer: Write all modified pages.
- Log the checkpoint: Write a record to the log,
saying that a checkpoint has been generated.
- Resume normal operation: The bufferfix requests
for updates that have been delayed in order to
take the checkpoint can now be processed again.
36The Case for Indirect Checkpointing
37The Indirect Checkpointing Algorithm
- Record TOC: Log the list of PAGEIDs.
- Compare with prev. ckpt: See if any modified
pages have not been replaced since the last ckpt.
- Force lazy pages: Schedule the writing of those
pages during the next checkpoint interval.
- Low-water mark: Find the LSN of the oldest
still-volatile update; write it to the log.
- Write "Checkpoint done" record
- Resume normal operation
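The low-water mark step can be sketched as a scan over the buffer's bookkeeping. The sketch assumes that each dirty frame records the LSN of its oldest update not yet written to disk; that field, FRAME_INFO, LSN_NONE, and low_water_mark are hypothetical names, not taken from the slides.

```c
typedef unsigned long LSN;

#define LSN_NONE 0UL  /* assumed marker: no still-volatile update exists */

typedef struct {
    int pageid;
    int dirty;             /* page has updates not yet on disk */
    LSN first_update_lsn;  /* oldest such update (assumed bookkeeping) */
} FRAME_INFO;

/* Return the LSN of the oldest update that exists only in the buffer,
   or LSN_NONE if no frame is dirty. */
LSN low_water_mark(const FRAME_INFO *frames, int n)
{
    LSN low = LSN_NONE;
    for (int i = 0; i < n; i++)
        if (frames[i].dirty &&
            (low == LSN_NONE || frames[i].first_update_lsn < low))
            low = frames[i].first_update_lsn;
    return low;
}
```

Under these assumptions, redo after a crash does not need to look at log records older than the logged low-water mark, since every earlier update is already on disk.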
38Further Possibilities for Optimization
- Pre-flushing can be performed by an asynchronous
process that scans the buffer for "old" modified
pages. Writing is done under semaphore
protection.
- Pre-fetching can, among other things, be used to
make restart more efficient. If page reads are
logged, one can use the most recent checkpoint plus
the log to prime the buffer pool, so that it looks
almost exactly as it did at the moment of the crash.
39Further Possibilities for Optimization
- Transaction scheduling and buffer management
can take hints from the query optimizer:
  - This relation will be scanned sequentially.
  - This is a sequential scan of the leaves of a
  B-tree.
  - This is the traversal of a B-tree, starting at
  the root.
  - This is a nested-loop join, where the inner
  relation is scanned in physically sequential
  order.