Title: ICS 214A: Database Management Systems
1ICS 214A Database Management Systems
- Notes 04 Record-Oriented File Systems
- Professor Chen Li
2Overview
- Provides a tuple read/write interface to files
- Provides the abstraction of a file as a
collection of records on top of the basic file
system. - Records can be
- fixed size or variable length
- short, long, or very long
- attributes can be fixed length or variable length
- simple or complex (e.g., containing set-valued
attributes) - Operations supported
- create, delete, open, close, alter, drop
- read, insert, update, delete record
- scan all records in a file
3Record Manager Functionality
- Storage allocation store tuples in file blocks
- Tuple addressing give tuple an identifier (id)
and provide fast access via that id. - Enumeration fast enumeration of all tuples of a
relation. - Content addressing give fast accessible via
attribute values. - Maintenance update/delete a tuple and its
access paths. - Protection support for security encryption or
tuple-granularity access control.
4Basic-Item Representations
- Integer (short) 2 bytes
- e.g., 35 ---gt 00000000 00100011
- Real, floating point
- Characters use ASCII coding
- Boolean TRUE -gt 11111111, FALSE-gt00000000
- Dates
- Integer, days since Jan 1, 1900
- 8 characters, YYYYMMDD
- 7 characters, YYYYDDD (not YYMMDD! Why?)
- Time
- Integer seconds since midnight
- Characters HHMMSSFF
5String representation
- String of characters
- Null terminated
-
- Length given
-
- - Fixed length
3
6Record Representation
- Record a collection of data items
- CREATE TABLE emp(e int, name char(10), dept
char(2)) - Type of records
- Fixed vs variable format
- Fixed vs variable length
7Fixed format
- Format defined by a schema
- fields
- type of each field
- order in record
- meaning of each field
- Example (fixed format and fixed length)
- E, 2 byte integer
- E.name, 10 char.
- Dept, 2 byte code
55
s m i t h
02
Records
83
j o n e s
01
8Variable format
- Record itself contains format
- Self Describing,
- Example variable format and length
Fields Code identifying field as
E Integer type Code for Ename String
type Length of str.
9Why Variable format?
- Sparse records
- Evolving formats
- Repeating fields
- Different books have different numbers of authors
- A fixed schema might not be suitable
- Pick the maximal number of authors?
- Waste space
3
B_title Java
Author Tom
Author Jack
10Variants between fixed - variable formats
- Example 1 Include record head
- record type
- record length
- time stamp
- other stuff ...
1310
5
27
. . . .
timestamp
type
length
11Example 2
- Hybrid format
- one part is fixed, other variable
- E.g. All employees have E, name, dept. Other
fields vary.
25
Smith
Toy
2
retired
Hobbychess
of var fields
12Variations in record internal organization
Field length
3
F3
10
F1
5
F2
12
total size
3
32
5
15
20
F1
F2
F3
0 1 2 3 4 5
15 20
offsets
13Long Fields
- Tuples containing long fields may not fit on a
page - Two approaches
- Separate pages for long fields. Store long fields
on separate pages and keep pointer in the record. - Tuple Fragmentation fragment a tuple into
multiple pages and store tuple as a linked list
of parts.
14Questions
- We have seen examples for
- Fixed format and length records
- Variable format and length records
- Does fixed format and variable length make sense?
- Does variable format and fixed length make sense?
- Other issues
- Compression
- within record, e.g. code selection
- collection of records, e.g. find common patterns
- Encryption
15Next how to place records in blocks?
records
assume fixed length blocks
16Place records on disk outline
- (1) Separating records
- (2) Spanned vs. unspanned
- (3) Mixed record types clustering
- (4) Split records
- (5) Sequencing
- (6) Storing records in a page
17(1) Separating records
R2
R1
R3
- Approaches
- Fixed size no need to separate
- Variable length
- special marker
- give record lengths (or offsets)
- within each record
- in block header each record has a length field
18(2) Spanned vs. Unspanned
- Unspanned records must be within one block
- block 1 block 2
- ...
- Spanned records can be stored in different
blocks - block 1 block 2 ...
R1
R2
R3
R4
R5
R1
R2
R3 (a)
R3 (b)
R6
R5
R4
R7 (a)
19Storing spanned records
- need indication need indication
- of a partial record of continuation
- plus a pointer to rest ( from where?)
20Comparisons
- Unspanned
- simpler,
- but may waste space
- Spanned
- essential if record size gt block size
21Example
- 106 records
- each of size 2,050 bytes (fixed)
- block size 4096 bytes
- Therefore
- Total space 4 x 109
- Total wasted 2 x 109
- Utilization 50
22BLOBS
- Large values for records
- Images (GIF, JPG)
- Movies (MPEG)
- Signals (audio, radar,)
- Called binary, large objects (BLOBS)
- Storage
- Sequence of blocks
- Using consecutive blocks or linked-list of blocks
- We might stripe data across multiple disks to
improve output bandwidth - Retrieval
- Sequential scan
- Random access need an indexing structure
23(3) Mixed record types
- Mixed - records of different types
- e.g. EMPLOYEE, DEPT tuples allowed in same block
- Why? Clustering
- Records that are frequently accessed together
should be in the same block - Variation of clustering store related records
in the cylinder
EMP
e1
DEPT
d1
DEPT
d2
24Clustering Example
- Q1 select A, C_NAME, C_CITY,
- from DEPOSIT, CUSTOMER
- where DEPOSIT.C_NAME CUSTOMER.C.NAME
- Q2 SELECT FROM CUSTOMER
- If Q1 is frequent, clustering is good
- But if Q2 is frequent clustering is bad
CUSTOMER,NAMESMITH
DEPOSIT,NAMESMITH
DEPOSIT,NAMESMITH
25(4) Split records
- Typically for hybrid format
- Fixed fields in one block
- Variable fields in another block
R1 (a)
R1 (b)
Blocks with variable fields.
R2 (a)
R2 (b)
R2 (c)
26Question
- What is difference between
- splitting records, and
- simply using two different record types?
- Efficiency?
27(5) Sequencing
- Ordering records in file (and block) by some key
value - Why sequencing?
- Typically to make it possible to efficiently read
records in order
28Sequencing Options
- (a) Next record physically contiguous
- ...
- (b) Linked
Next (R1)
R1
R1
Next (R1)
29Sequencing Options (cont)
- (c) Overflow area
- Records
- in sequence
- We will talk about indexing structures later!
R1
R2
R3
R4
R5
30(6) Storing Records in Pages
Free space
- Different structures for fixed-length records and
variable-length records - Fixed-length just the starting position
- Variable-length need the length as well
31Block header
- May contain
- - File ID (or RELATION or DB ID)
- - This block ID
- - Record directory
- - Pointer to free space
- - Type of block (e.g. contains recs type 4 is
overflow, ) - - Pointer to other blocks like it
- - Timestamp ...
32Managing space within each page
- Each page header stores (besides other things)
- information about contiguous free space and
- unused (possibly segmented) space on the page.
E.g., bitmap - Insertion, deletion, update, and reorganization
change the amounts of contiguous free space and
unused space
33Operations on Pages
- Read a record
- Insert a new record
- Delete an existing record
- Update a specific record
- Bulk load tuples into a relation
- Reorganize a page
- Except reading a record, all other operations can
cause changes to free space on the page.
34Finding space to insert a tuple
- If tuple fits in page contiguous free space
easy. - If tuple fits in page contiguous free space
unused (segmented) space - reorganize the page and compress
- If tuple does not fit, we could split the record
and store the pieces in two pages, and leave
leave forwarding address on the first page.
35Tuple Identification
- Option 1 Relative byte address offset in the
file - OK for insert-then-read-only DBs
- Disadvantages record can't easily grow, deleted
space not easily reclaimed. - Option 2 (indirect) page, index
- Disadvantages expensive reorganization to fix
overflows
pageno
dir_index
3
7446
7446
this tuple
36Indirection within a block
- In the indirect option
- Notice that the offset is the slot offset, not
the offset of the starting position of the record - Reason this indirection allows easy record
movements within a block - E.g., when reorganizing space within a page, we
can move the records and set the new offsets of
the header slots. - This change will not affect the rids.
37Next pointer swizzling
- Pointers in records
- Records can have pointers pointing to other
records - Useful for object-oriented databases
- How to use pointers efficiently in memory
38Database address versus memory address
dbaddr
memory
database address
Memory address
Translation table
- Database address
- usually record physical address
- Memory address
- the address of the record in the memory, if it
has been fetched from the hard disk. - Records not fetched into memory dont have such
addr - The translation table does the mapping
39Pointer swizzling
dbaddr
memory
Record A
Record B
In memory
Memory address
Swizzled
Translation table
- Suppose record A has a pointer to record B
- If B is memory, we need to go through the
translation table to find Bs address in memory - We can save this time by pointing from A to the
Bs memory address - Called pointer swizzling
40Pointer swizzling (cont)
Record A
Record B
disk
unswizzled
- If record B is not in memory, then As pointer is
unswizzled - Thus we need a bit to indicate whether As
pointer is swizzled - This swizziling information needs to be
maintained when - Record A is read from or write to disk
- Record B is read from or write to disk