ICS 214A: Database Management Systems - PowerPoint PPT Presentation

1 / 40

About This Presentation

Title:

ICS 214A: Database Management Systems

Description:

Hobby:chess # of var. fields. ICS214A. Notes 04. 12. Variations ... Each page header stores (besides other things): information about contiguous free space and ... – PowerPoint PPT presentation

Number of Views:24

Avg rating:3.0/5.0

Slides: 41

Provided by: che7

Category:

more less

Transcript and Presenter's Notes

Title: ICS 214A: Database Management Systems

1
ICS 214A Database Management Systems

Notes 04 Record-Oriented File Systems
Professor Chen Li

2
Overview

Provides a tuple read/write interface to files
Provides the abstraction of a file as a
collection of records on top of the basic file
system.
Records can be
fixed size or variable length
short, long, or very long
attributes can be fixed length or variable length
simple or complex (e.g., containing set-valued
attributes)
Operations supported
create, delete, open, close, alter, drop
read, insert, update, delete record
scan all records in a file

3
Record Manager Functionality

Storage allocation store tuples in file blocks
Tuple addressing give tuple an identifier (id)
and provide fast access via that id.
Enumeration fast enumeration of all tuples of a
relation.
Content addressing give fast accessible via
attribute values.
Maintenance update/delete a tuple and its
access paths.
Protection support for security encryption or
tuple-granularity access control.

4
Basic-Item Representations

Integer (short) 2 bytes
e.g., 35 ---gt 00000000 00100011
Real, floating point
Characters use ASCII coding
Boolean TRUE -gt 11111111, FALSE-gt00000000
Dates
Integer, days since Jan 1, 1900
8 characters, YYYYMMDD
7 characters, YYYYDDD (not YYMMDD! Why?)
Time
Integer seconds since midnight
Characters HHMMSSFF

5
String representation

String of characters
Null terminated
Length given
- Fixed length

3
6
Record Representation

Record a collection of data items
CREATE TABLE emp(e int, name char(10), dept
char(2))
Type of records
Fixed vs variable format
Fixed vs variable length

7
Fixed format

Format defined by a schema
fields
type of each field
order in record
meaning of each field
Example (fixed format and fixed length)
E, 2 byte integer
E.name, 10 char.
Dept, 2 byte code

55
s m i t h
02
Records
83
j o n e s
01
8
Variable format

Record itself contains format
Self Describing,
Example variable format and length

Fields Code identifying field as
E Integer type Code for Ename String
type Length of str.
9
Why Variable format?

Sparse records
Evolving formats
Repeating fields
Different books have different numbers of authors
A fixed schema might not be suitable
Pick the maximal number of authors?
Waste space

3
B_title Java
Author Tom
Author Jack
10
Variants between fixed - variable formats

Example 1 Include record head
record type
record length
time stamp
other stuff ...

1310
5
27
. . . .
timestamp
type
length
11
Example 2

Hybrid format
one part is fixed, other variable
E.g. All employees have E, name, dept. Other
fields vary.

25
Smith
Toy
2
retired
Hobbychess
of var fields
12
Variations in record internal organization
Field length

3
F3
10
F1
5
F2
12
total size
3
32
5
15
20
F1
F2
F3
0 1 2 3 4 5
15 20
offsets
13
Long Fields

Tuples containing long fields may not fit on a
page
Two approaches
Separate pages for long fields. Store long fields
on separate pages and keep pointer in the record.
Tuple Fragmentation fragment a tuple into
multiple pages and store tuple as a linked list
of parts.

14
Questions

We have seen examples for
Fixed format and length records
Variable format and length records
Does fixed format and variable length make sense?
Does variable format and fixed length make sense?
Other issues
Compression
within record, e.g. code selection
collection of records, e.g. find common patterns
Encryption

15
Next how to place records in blocks?
records

blocks ...
a file

assume fixed length blocks
16
Place records on disk outline

(1) Separating records
(2) Spanned vs. unspanned
(3) Mixed record types clustering
(4) Split records
(5) Sequencing
(6) Storing records in a page

17
(1) Separating records
R2
R1
R3

Approaches
Fixed size no need to separate
Variable length
special marker
give record lengths (or offsets)
within each record
in block header each record has a length field

18
(2) Spanned vs. Unspanned

Unspanned records must be within one block
block 1 block 2
...
Spanned records can be stored in different
blocks
block 1 block 2 ...

R1
R2
R3
R4
R5
R1
R2
R3 (a)
R3 (b)
R6
R5
R4
R7 (a)
19
Storing spanned records

need indication need indication
of a partial record of continuation
plus a pointer to rest ( from where?)

20
Comparisons

Unspanned
simpler,
but may waste space
Spanned
essential if record size gt block size

21
Example

106 records
each of size 2,050 bytes (fixed)
block size 4096 bytes
Therefore
Total space 4 x 109
Total wasted 2 x 109
Utilization 50

22
BLOBS

Large values for records
Images (GIF, JPG)
Movies (MPEG)
Signals (audio, radar,)
Called binary, large objects (BLOBS)
Storage
Sequence of blocks
Using consecutive blocks or linked-list of blocks
We might stripe data across multiple disks to
improve output bandwidth
Retrieval
Sequential scan
Random access need an indexing structure

23
(3) Mixed record types

Mixed - records of different types
e.g. EMPLOYEE, DEPT tuples allowed in same block
Why? Clustering
Records that are frequently accessed together
should be in the same block
Variation of clustering store related records
in the cylinder

EMP
e1
DEPT
d1
DEPT
d2
24
Clustering Example

Q1 select A, C_NAME, C_CITY,
from DEPOSIT, CUSTOMER
where DEPOSIT.C_NAME CUSTOMER.C.NAME
Q2 SELECT FROM CUSTOMER
If Q1 is frequent, clustering is good
But if Q2 is frequent clustering is bad

CUSTOMER,NAMESMITH
DEPOSIT,NAMESMITH
DEPOSIT,NAMESMITH
25
(4) Split records

Typically for hybrid format
Fixed fields in one block
Variable fields in another block

R1 (a)
R1 (b)
Blocks with variable fields.
R2 (a)
R2 (b)

Block with fixed fields.

R2 (c)
26
Question

What is difference between
splitting records, and
simply using two different record types?
Efficiency?

27
(5) Sequencing

Ordering records in file (and block) by some key
value
Why sequencing?
Typically to make it possible to efficiently read
records in order

28
Sequencing Options

(a) Next record physically contiguous
...
(b) Linked

Next (R1)
R1
R1
Next (R1)
29
Sequencing Options (cont)

(c) Overflow area
Records
in sequence
We will talk about indexing structures later!

R1
R2
R3
R4
R5
30
(6) Storing Records in Pages
Free space

Different structures for fixed-length records and
variable-length records
Fixed-length just the starting position
Variable-length need the length as well

31
Block header

May contain
- File ID (or RELATION or DB ID)
- This block ID
- Record directory
- Pointer to free space
- Type of block (e.g. contains recs type 4 is
overflow, )
- Pointer to other blocks like it
- Timestamp ...

32
Managing space within each page

Each page header stores (besides other things)
information about contiguous free space and
unused (possibly segmented) space on the page.
E.g., bitmap
Insertion, deletion, update, and reorganization
change the amounts of contiguous free space and
unused space

33
Operations on Pages

Read a record
Insert a new record
Delete an existing record
Update a specific record
Bulk load tuples into a relation
Reorganize a page
Except reading a record, all other operations can
cause changes to free space on the page.

34
Finding space to insert a tuple

If tuple fits in page contiguous free space
easy.
If tuple fits in page contiguous free space
unused (segmented) space
reorganize the page and compress
If tuple does not fit, we could split the record
and store the pieces in two pages, and leave
leave forwarding address on the first page.

35
Tuple Identification

Option 1 Relative byte address offset in the
file
OK for insert-then-read-only DBs
Disadvantages record can't easily grow, deleted
space not easily reclaimed.
Option 2 (indirect) page, index
Disadvantages expensive reorganization to fix
overflows

pageno
dir_index
3
7446
7446
this tuple
36
Indirection within a block

In the indirect option
Notice that the offset is the slot offset, not
the offset of the starting position of the record
Reason this indirection allows easy record
movements within a block
E.g., when reorganizing space within a page, we
can move the records and set the new offsets of
the header slots.
This change will not affect the rids.

37
Next pointer swizzling

Pointers in records
Records can have pointers pointing to other
records
Useful for object-oriented databases
How to use pointers efficiently in memory

38
Database address versus memory address
dbaddr
memory
database address
Memory address
Translation table

Database address
usually record physical address
Memory address
the address of the record in the memory, if it
has been fetched from the hard disk.
Records not fetched into memory dont have such
addr
The translation table does the mapping

39
Pointer swizzling
dbaddr
memory
Record A
Record B
In memory
Memory address
Swizzled
Translation table

Suppose record A has a pointer to record B
If B is memory, we need to go through the
translation table to find Bs address in memory
We can save this time by pointing from A to the
Bs memory address
Called pointer swizzling

40
Pointer swizzling (cont)
Record A
Record B
disk
unswizzled