Title: Storage and File Organization
1Storage and File Organization
2General Overview - rel. model
- Relational model - SQL
- Formal commercial query languages
- Functional Dependencies
- Normalization
- Physical Design
- Indexing
3Review
- DBMSs store data on disk
- Disk characteristics
- 2 orders of magnitude slower than MM
- Unit of read/write operations a block/page
(multiple of sectors)? - Access time seek time rotation time
transfer time - Sequential I/O much faster than random I/O
- Database Systems try to minimize the overhead of
moving data from and to disk
4Review
- Methods to improve MM to/from disk transfers
- Improve the disk technology
- Use faster disks (more RPMs)?
- Parallelization (RAID) some redundancy
- Avoid unnecessary reads from disk
- Buffer management go to buffer (MM) instead of
disk - Good file organization
- Other (OS based improvements) disk scheduling
(elevator algorithm, batch writes, etc)?
5Buffer Management
- Keep pages in a part of memory (buffer), read
directly from there - What happens if you need to bring a new page into
buffer and buffer is full you have to evict one
page - Replacement policy
- LRU Least Recently Used (CLOCK)?
- MRU Most Recently Used
- Toss-immediate remove a page if you know that
you will not need it again - Pinning (needed in recovery, index based
processing,etc)? - Other DB specific RPs DBMIN, LRU-k, 2Q
6File Organization
- Basics
- A database is a collection of files, file is a
collection of records, record (tuple) is a
collection of fields (attributes)? - Files are stored on Disks (that use blocks to
read and write)? - Two important issues
- Representation of each record
- Grouping/Ordering of records and storage in
blocks
7File Organization
- Goal and considerations
- Compactness
- Overhead of insertion/deletion
- Retrieval speed sometime we prefer to bring more
tuples than necessary in MM and use CPU to filter
out the unnecessary ones!
8Record Representation
- Fixed-Length Records
- Example
- Account( acc-number char(10), branch-name
char(20), balance real) - Each record is 38 bytes.
- Store them sequentially, one after the other
- Record1 at position 0, record2 at position 38,
record3 at position 76 etc
Compactness (350 bytes)?
9Fixed-Length Records
- Simple approach
- Store record i starting from byte n ? (i 1),
where n is the size of each record. - Record access is simple but records may cross
blocks - Modification do not allow records to cross block
boundaries - Insertion of record i Add at the end
- Deletion of record i Two alternatives
- move records
- i 1, . . ., n to i, . . . , n 1
- record n to i
- do not move records, but link all free records
on a free list
10Free Lists
- 2nd approach FLR with Free Lists
- Store the address of the first deleted record in
the file header. - Use this first record to store the address of the
second deleted record, and so on - Can think of these stored addresses as pointers
since they point to the location of a record. - More space efficient representation reuse space
for normal attributes of free records to store
pointers. (No pointers stored in in-use
records.)?
Better handling ins/del
Less compact 420 bytes
11Variable-Length Records
- 3rd approach Variable-length records arise in
database systems in several ways - Storage of multiple record types in a file.
- Record types that allow variable lengths for one
or more fields. - Record types that allow repeating fields or
multivalued attribute. - Byte string representation
- Attach an end-of-record (?) control character to
the end of each record - Difficulty with deletion (leaves holes)?
- Difficulty with growth
4
?
?
?
R1
Field Count
R2
R3
12Variable-Length Records Slotted Page Structure
- 4th approach VLR-SP
- Slotted page header contains
- number of record entries
- end of free space in the block
- location and size of each record
- Records stored at the bottom of the page
- External tuple pointers point to record ptrs
rec-id ltpage-id, slotgt
13Rid (i,N)?
Page i
Rid (i,2)?
Rid (i,1)?
N
Pointer to start of free space
20
16
24
N . . . 2 1
slots
SLOT DIRECTORY
Insertion 1) Use FP to find space and insert
2) Find available ptr in the
directory (or create a new one)?
3) adjust FP and number of records
Deletion ?
14Variable-Length Records (Cont.)?
- Fixed-length representation
- reserved space
- pointers
- 5th approach Fixed Limit Records (for VLR)?
- Reserved space can use fixed-length records of
a known maximum length unused space in shorter
records filled with a null or end-of-record
symbol.
15Pointer Method
- 6th approach Pointer method
- Pointer method
- A variable-length record is represented by a list
of fixed-length records, chained together via
pointers. - Can be used even if the maximum record length is
not known
16Pointer Method (Cont.)?
- Disadvantage to pointer structure space is
wasted in all records except the first in a a
chain. - Solution is to allow two kinds of block in file
- Anchor block contains the first records of
chain - Overflow block contains records other than
those that are the first records of chairs.
17Ordering and Grouping records
- Issue 1
- In what order we place records in a block?
- Heap technique assign anywhere there is space
- Ordered technique maintain an order on some
attribute - So, we can use binary search if selection on this
attribute.
18Sequential File Organization
- Suitable for applications that require sequential
processing of the entire file - The records in the file are ordered by a
search-key
19Sequential File Organization (Cont.)?
- Deletion use pointer chains
- Insertion locate the position where the record
is to be inserted - if there is free space insert there
- if no free space, insert the record in an
overflow block - In either case, pointer chain must be updated
- Need to reorganize the file from time to time to
restore sequential order
20Clustering File Organization
- Simple file structure stores each relation in a
separate file - Can instead store several relations in one file
using a clustering file organization - E.g., clustering organization of customer and
depositor
SELECT account-number, customer-name FROM
depositor d, account a WHERE d.customer-name
a.customer-name
- good for queries involving depositor
customer, and for queries involving one single
customer and his accounts - bad for queries involving only customer
- results in variable size records
21File organization
- Issue 2 In which blocks should records be
placed - Many alternatives exist, each ideal for some
situation , and not so good in others - Heap files Add at the end of the file.Suitable
when typical access is a file scan retrieving all
records. - Sorted FilesKeep the pages ordered. Best if
records must be retrieved in some order, or only
a range of records is needed. - Hashed Files Good for equality selections.
Assign records to blocks according to their value
for some attribute
22Data Dictionary Storage
Data dictionary (also called system catalog)
stores metadata that is, data about data, such
as
- Information about relations
- names of relations
- names and types of attributes of each relation
- names and definitions of views
- integrity constraints
- User and accounting information, including
passwords - Statistical and descriptive data
- number of tuples in each relation
- Physical file organization information
- How relation is stored (sequential/hash/)?
- Physical location of relation
- operating system file name or
- disk addresses of blocks containing records of
the relation - Information about indices (Chapter 12)
23Data dictionary storage
- Stored as tables!!
- E-R diagram?
- Relations, attributes, domains
- Each relation has name, some attributes
- Each attribute has name, length and domain
- Also, views, integrity constraints, indices
- User info (authorizations etc)?
- statistics
24A-name
name
position
1
N
relation
has
attribute
domain
25Data Dictionary Storage (Cont.)?
- A possible catalog representation
-
Relation-metadata (relation-name,
number-of-attributes,
storage-organization, location)Attribute-
metadata (attribute-name, relation-name,
domain-type, position, length)? User-metadata
(user-name, encrypted-password,
group)? Index-metadata (index-name,
relation-name, index-type, index-attributes)? Vi
ew-metadata (view-name, definition)
26Large Objects
- Large objects binary large objects (blobs) and
character large objects (clobs)? - Examples include
- text documents
- graphical data such as images and computer aided
designs audio and video data - Large objects may need to be stored in a
contiguous sequence of bytes when brought into
memory. - If an object is bigger than a page, contiguous
pages of the buffer pool must be allocated to
store it. - May be preferable to disallow direct access to
data, and only allow access through a
file-system-like API, to remove need for
contiguous storage.
27Modifying Large Objects
- If the application requires insert/delete of
bytes from specified regions of an object - B-tree file organization (described later in
Chapter 12) can be modified to represent large
objects - Each leaf page of the tree stores between half
and 1 page worth of data from the object - Special-purpose application programs outside the
database are used to manipulate large objects - Text data treated as a byte string manipulated by
editors and formatters. - Graphical data and audio/video data is typically
created and displayed by separate application - checkout/checkin method for concurrency control
and creation of versions