ICS 214A: Database Management Systems - PowerPoint PPT Presentation

1 / 40
About This Presentation
Title:

ICS 214A: Database Management Systems

Description:

Hobby:chess # of var. fields. ICS214A. Notes 04. 12. Variations ... Each page header stores (besides other things): information about contiguous free space and ... – PowerPoint PPT presentation

Number of Views:24
Avg rating:3.0/5.0
Slides: 41
Provided by: che7
Category:

less

Transcript and Presenter's Notes

Title: ICS 214A: Database Management Systems


1
ICS 214A Database Management Systems
  • Notes 04 Record-Oriented File Systems
  • Professor Chen Li

2
Overview
  • Provides a tuple read/write interface to files
  • Provides the abstraction of a file as a
    collection of records on top of the basic file
    system.
  • Records can be
  • fixed size or variable length
  • short, long, or very long
  • attributes can be fixed length or variable length
  • simple or complex (e.g., containing set-valued
    attributes)
  • Operations supported
  • create, delete, open, close, alter, drop
  • read, insert, update, delete record
  • scan all records in a file

3
Record Manager Functionality
  • Storage allocation store tuples in file blocks
  • Tuple addressing give tuple an identifier (id)
    and provide fast access via that id.
  • Enumeration fast enumeration of all tuples of a
    relation.
  • Content addressing give fast accessible via
    attribute values.
  • Maintenance update/delete a tuple and its
    access paths.
  • Protection support for security encryption or
    tuple-granularity access control.

4
Basic-Item Representations
  • Integer (short) 2 bytes
  • e.g., 35 ---gt 00000000 00100011
  • Real, floating point
  • Characters use ASCII coding
  • Boolean TRUE -gt 11111111, FALSE-gt00000000
  • Dates
  • Integer, days since Jan 1, 1900
  • 8 characters, YYYYMMDD
  • 7 characters, YYYYDDD (not YYMMDD! Why?)
  • Time
  • Integer seconds since midnight
  • Characters HHMMSSFF

5
String representation
  • String of characters
  • Null terminated
  • Length given
  • - Fixed length

3
6
Record Representation
  • Record a collection of data items
  • CREATE TABLE emp(e int, name char(10), dept
    char(2))
  • Type of records
  • Fixed vs variable format
  • Fixed vs variable length

7
Fixed format
  • Format defined by a schema
  • fields
  • type of each field
  • order in record
  • meaning of each field
  • Example (fixed format and fixed length)
  • E, 2 byte integer
  • E.name, 10 char.
  • Dept, 2 byte code

55
s m i t h
02
Records
83
j o n e s
01
8
Variable format
  • Record itself contains format
  • Self Describing,
  • Example variable format and length

Fields Code identifying field as
E Integer type Code for Ename String
type Length of str.
9
Why Variable format?
  • Sparse records
  • Evolving formats
  • Repeating fields
  • Different books have different numbers of authors
  • A fixed schema might not be suitable
  • Pick the maximal number of authors?
  • Waste space

3
B_title Java
Author Tom
Author Jack
10
Variants between fixed - variable formats
  • Example 1 Include record head
  • record type
  • record length
  • time stamp
  • other stuff ...

1310
5
27
. . . .
timestamp
type
length
11
Example 2
  • Hybrid format
  • one part is fixed, other variable
  • E.g. All employees have E, name, dept. Other
    fields vary.

25
Smith
Toy
2
retired
Hobbychess
of var fields
12
Variations in record internal organization
Field length

3
F3
10
F1
5
F2
12
total size
3
32
5
15
20
F1
F2
F3
0 1 2 3 4 5
15 20
offsets
13
Long Fields
  • Tuples containing long fields may not fit on a
    page
  • Two approaches
  • Separate pages for long fields. Store long fields
    on separate pages and keep pointer in the record.
  • Tuple Fragmentation fragment a tuple into
    multiple pages and store tuple as a linked list
    of parts.

14
Questions
  • We have seen examples for
  • Fixed format and length records
  • Variable format and length records
  • Does fixed format and variable length make sense?
  • Does variable format and fixed length make sense?
  • Other issues
  • Compression
  • within record, e.g. code selection
  • collection of records, e.g. find common patterns
  • Encryption

15
Next how to place records in blocks?
records
  • blocks ...
  • a file

assume fixed length blocks
16
Place records on disk outline
  • (1) Separating records
  • (2) Spanned vs. unspanned
  • (3) Mixed record types clustering
  • (4) Split records
  • (5) Sequencing
  • (6) Storing records in a page

17
(1) Separating records
R2
R1
R3
  • Approaches
  • Fixed size no need to separate
  • Variable length
  • special marker
  • give record lengths (or offsets)
  • within each record
  • in block header each record has a length field

18
(2) Spanned vs. Unspanned
  • Unspanned records must be within one block
  • block 1 block 2
  • ...
  • Spanned records can be stored in different
    blocks
  • block 1 block 2 ...

R1
R2
R3
R4
R5
R1
R2
R3 (a)
R3 (b)
R6
R5
R4
R7 (a)
19
Storing spanned records
  • need indication need indication
  • of a partial record of continuation
  • plus a pointer to rest ( from where?)

20
Comparisons
  • Unspanned
  • simpler,
  • but may waste space
  • Spanned
  • essential if record size gt block size

21
Example
  • 106 records
  • each of size 2,050 bytes (fixed)
  • block size 4096 bytes
  • Therefore
  • Total space 4 x 109
  • Total wasted 2 x 109
  • Utilization 50

22
BLOBS
  • Large values for records
  • Images (GIF, JPG)
  • Movies (MPEG)
  • Signals (audio, radar,)
  • Called binary, large objects (BLOBS)
  • Storage
  • Sequence of blocks
  • Using consecutive blocks or linked-list of blocks
  • We might stripe data across multiple disks to
    improve output bandwidth
  • Retrieval
  • Sequential scan
  • Random access need an indexing structure

23
(3) Mixed record types
  • Mixed - records of different types
  • e.g. EMPLOYEE, DEPT tuples allowed in same block
  • Why? Clustering
  • Records that are frequently accessed together
    should be in the same block
  • Variation of clustering store related records
    in the cylinder

EMP
e1
DEPT
d1
DEPT
d2
24
Clustering Example
  • Q1 select A, C_NAME, C_CITY,
  • from DEPOSIT, CUSTOMER
  • where DEPOSIT.C_NAME CUSTOMER.C.NAME
  • Q2 SELECT FROM CUSTOMER
  • If Q1 is frequent, clustering is good
  • But if Q2 is frequent clustering is bad

CUSTOMER,NAMESMITH
DEPOSIT,NAMESMITH
DEPOSIT,NAMESMITH
25
(4) Split records
  • Typically for hybrid format
  • Fixed fields in one block
  • Variable fields in another block

R1 (a)
R1 (b)
Blocks with variable fields.
R2 (a)
R2 (b)
  • Block with fixed fields.

R2 (c)
26
Question
  • What is difference between
  • splitting records, and
  • simply using two different record types?
  • Efficiency?

27
(5) Sequencing
  • Ordering records in file (and block) by some key
    value
  • Why sequencing?
  • Typically to make it possible to efficiently read
    records in order

28
Sequencing Options
  • (a) Next record physically contiguous
  • ...
  • (b) Linked

Next (R1)
R1
R1
Next (R1)
29
Sequencing Options (cont)
  • (c) Overflow area
  • Records
  • in sequence
  • We will talk about indexing structures later!

R1
R2
R3
R4
R5
30
(6) Storing Records in Pages
Free space
  • Different structures for fixed-length records and
    variable-length records
  • Fixed-length just the starting position
  • Variable-length need the length as well

31
Block header
  • May contain
  • - File ID (or RELATION or DB ID)
  • - This block ID
  • - Record directory
  • - Pointer to free space
  • - Type of block (e.g. contains recs type 4 is
    overflow, )
  • - Pointer to other blocks like it
  • - Timestamp ...

32
Managing space within each page
  • Each page header stores (besides other things)
  • information about contiguous free space and
  • unused (possibly segmented) space on the page.
    E.g., bitmap
  • Insertion, deletion, update, and reorganization
    change the amounts of contiguous free space and
    unused space

33
Operations on Pages
  • Read a record
  • Insert a new record
  • Delete an existing record
  • Update a specific record
  • Bulk load tuples into a relation
  • Reorganize a page
  • Except reading a record, all other operations can
    cause changes to free space on the page.

34
Finding space to insert a tuple
  • If tuple fits in page contiguous free space
    easy.
  • If tuple fits in page contiguous free space
    unused (segmented) space
  • reorganize the page and compress
  • If tuple does not fit, we could split the record
    and store the pieces in two pages, and leave
    leave forwarding address on the first page.

35
Tuple Identification
  • Option 1 Relative byte address offset in the
    file
  • OK for insert-then-read-only DBs
  • Disadvantages record can't easily grow, deleted
    space not easily reclaimed.
  • Option 2 (indirect) page, index
  • Disadvantages expensive reorganization to fix
    overflows

pageno
dir_index
3
7446
7446
this tuple
36
Indirection within a block
  • In the indirect option
  • Notice that the offset is the slot offset, not
    the offset of the starting position of the record
  • Reason this indirection allows easy record
    movements within a block
  • E.g., when reorganizing space within a page, we
    can move the records and set the new offsets of
    the header slots.
  • This change will not affect the rids.

37
Next pointer swizzling
  • Pointers in records
  • Records can have pointers pointing to other
    records
  • Useful for object-oriented databases
  • How to use pointers efficiently in memory

38
Database address versus memory address
dbaddr
memory
database address
Memory address
Translation table
  • Database address
  • usually record physical address
  • Memory address
  • the address of the record in the memory, if it
    has been fetched from the hard disk.
  • Records not fetched into memory dont have such
    addr
  • The translation table does the mapping

39
Pointer swizzling
dbaddr
memory
Record A
Record B
In memory
Memory address
Swizzled
Translation table
  • Suppose record A has a pointer to record B
  • If B is memory, we need to go through the
    translation table to find Bs address in memory
  • We can save this time by pointing from A to the
    Bs memory address
  • Called pointer swizzling

40
Pointer swizzling (cont)
Record A
Record B
disk
unswizzled
  • If record B is not in memory, then As pointer is
    unswizzled
  • Thus we need a bit to indicate whether As
    pointer is swizzled
  • This swizziling information needs to be
    maintained when
  • Record A is read from or write to disk
  • Record B is read from or write to disk
Write a Comment
User Comments (0)
About PowerShow.com