Structured Files - PowerPoint PPT Presentation

About This Presentation

Title:

Structured Files

Description:

Storage allocation: store tuples in file blocks. Tuple addressing: give tuple an id identifier ... Trailer. Page. Dir. Page. Head. Block. Head. checksum ... – PowerPoint PPT presentation

Number of Views:84

Avg rating:3.0/5.0

Slides: 49

Provided by: ResearchM53

Category:

more less

Transcript and Presenter's Notes

Title: Structured Files

1
Structured Files
Chapter 19
2
What The Record Manager Does

Storage allocation store tuples in file blocks
Tuple addressing give tuple an id identifier
provide fast access via that id.
Enumeration fast enumeration of all relations
tuples
Content addressing give fast accessible via
attribute values.
Maintenance update/delete a tuple and its
access paths.
Protection support for security
encrypt or tuple-granularity access
control.

3
Outline

Representing values
Representing records
Storing records in pages and across pages
Organizing records (entry, relative, key, hash)
Examples of fix/log/log logic.

4
Record Allocation in a Page

Recall
File is a collection of fixed-length pages
(blocks).
File and buffer managers map files to disc/RAM

slot on disk
block
page
Page Dir
Block Head
Block Trailer
Page Head
page body
checksum
5
Page Declares

typedef struct / global page numbers /
FILENO fileno /file where the page
lives /
uint pageno / page number within the
file /
PAGEID, PAGEIDP / /
typedef struct
PAGEID thatsme / identifies the page /
PAGE_TYPE page_type / see description
above /
OBJID object_id / internal id of the
relation,index,etc. /
LSN safe_up_to / page LSN for the WAL -
protocol /
PAGEID previous / often pages are members
of doubly /
PAGEID next / linked lists /
PAGE_STATE status / valid,in-doubt,copy of
something,etc/
int no_entries / entries in page dir
(see below) /
int unused / free bytes not in
freespace /
int freespace / contiguous free bytes
for data /
char stuff / will grow /
PAGE_HEADER, PAGE_PTR / /

6
Different uses of pages

Data Homogeneous record storage
Cluster like Data except many different record
types
Index (access path) hashed or B-tree
Free-space bitmap describes status of ³ 4,000
other pages.
Directory meta-data about this or other files

7
Page Directory Points to Records on Page
Page Header
1st Tuple
2nd
2nd Tuple
3rd Tuple
4thTuple
5th Tuple
Tuples are inserted in this direction
Page directory grows in this direction
2
1
3
4
5

Record id is File, Page, Directory_offset

8
Accessing a Record

Read by TID Lock record shared
locate page
Get semaphore shared
follow directory offset
copy tuple
Give semaphore
Insert by TID Lock record exclusive
locate page
Get semaphore exclusive
Find space
Insert
log insert (tid, new value).
update page LSN, header, directory,
Give semaphore

9
Accessing a Record

Delete by TID Lock record exclusive
locate page
Get semaphore exclusive
Add record to free space
Log delete (tid, old value).
update page LSN, header, directory,
Give semaphore
Update TID much like delete--insert

10
Finding Space for Insert / Update

If tuple fits in page contiguous free-space
easy.
If tuple fits in page free space reorganize
(compress)
Physiological logging makes this cheap.
If tuple does not fit then
leave forwarding address on page.
Optionally leave record prefix on page.
Segment record among several pages.

tid
11
Finding space within a file

Free space table
Summarizes status of many pages
(8KB page gt 64Kb gt 500MB of 8KB data pages)
Good for clustered contiguous allocation
bitmap should be transaction protected
If transaction aborts, page is freed again.
Alternatively, treat bitmap as a hint
Rebuild periodically.

p1
p2
p3
p4
p5
p6
p7
P19
P20
P21
.
.
.
.
f2
F19
21
f3
f4
f5
f6
f7
.
.
.

.....
.
.
.
.
.
.
f17
Free space directories
12
Finding space within a file

Free space cursor/list
Chain should be transaction protected
Else rebuild at restart
do not trust pointers
(free page may be allocated).

file catalog
chain of empty pages
.
.

empty_page_anchor
point_of_insert
page for next insert
13
Tuple Allocation - I
The first strategy maintains a pointer to the
current block for insert (CBI). When that
block fills up, an empty block is requested from
a system service, which then becomes the new
current block for insert.
CBI
where next?
And so on. This is the sequential insert
strategy.
Questions What happens, when the pointer arrives
at the last block? How do we reclaim space freed
by deleted tuples?
14
Incremental Space Expansion - I
When the list of empty blocks is exhausted, there
are two options to find space for new tuples. Let
us assume the following configuration
The first option is to let the CBI pointer
circulate over the set of allocated
blocks, assuming that space is released by
deleted tuples.
And so on. This works as long as enough space is
freed up by deleted tuples. If there are only few
gaps, finding space for a new tuple can become
very expensive, because many blocks have to be
probed sequentially.
The need to probe blocks that are completely
filled can be avoided by maintaining a an array
of bits that contains one bit per block
indicating whether a block is full
15
Naming Tuples (records)
Relative byte address file, offset in file
OK for insert-then-read-only DBs record
can't easily grow. deleted space not easily
reclaimed. Tuple Identifier file, page, index
The design shown below. Main
disadvantage expensive reorganization (fixing
overflows)
pageno
dir_index
fileid
nodeid
pageno
fileid
nodeid
dir_index
3
7446
7446
3
7446
7446
5127
this tuple
this tuple
pseudo -TID
16
Implementing Database Keys

Address record via directory
Address has a ID to allow for invalidation
ID never reused.
Pointer can be swizzled.
Popular with network OO DBs

17
Naming Tuples via Primary Key

Entry Sequenced, Relative primary key is
physical addr
Hash, B-tree primary key is content (primary
key)
Primary Key an alternative to DBkey
B-tree clusters related data
ProblemsB-tree access is slower than Hash.Hash
B-tree keys not fixed lengthbut neither is
node.db_key
Benefit key can grow to LARGE databasesGood
for distributed/partitioned data
Its religious.

18
Datatype Representation

E External representation ASCII, ISO Latin1,
Unicode,...
P Programming language representation
many PL/1, Cobol, C, all have different
VARCHAR
many type mismatches between P and F
interval, datetime, user,...
F File representation "native" types (e.g.
null values, ....).
Lots of mapping functions.
It would be great if F-1(F(x)) x for these
functions, but....
Called the impedance mismatch between DB and PL

19
Datatype Representations

P _ F Implies a special language
(all other languages are 2nd class)
E _ F Use characters for everything.
Problem E changes from country to country!
(all other languages are 2nd class)
No easy way out of this.
Unicode will help most of us and make E_F more
attractive

20
Representing Records
21
Representing Records

struct relations
Uint relation_no / internal id for the
relation /
char owner / user id of the creator /
long creation_date / date when it was
created /
PAGENO current_point_of_insert / free space
done via /
PAGENO empty_page_anchor / free space cursor
method /
Uint no_of_attributes /attributes in
relation /
Uint no_of_fixed_atts / fixed-length
attributes /
Uint no_of_var_atts / variable-length
attributes /
struct attributes p_attr / pointer to
the attributes array /
struct attributes / attributes array /
char attribute_name / external name of
the attribute /
Uint attribute_position / index of the field
in the tuple (1,2,...) /
char attribute_type / this encodes the SQL -
type definition /
Boolean var_length / is it variable_length
field ? /
Boolean nulls_allowed / can field assume NULL
value ? /
char default_value / value assumed if none
stored in tuple /
Uint field_length / maximum length of field
/
int accumulated_offset / explained later
/

22
Representing Records

Generic header
(rid, tid, fields)
all fixed length encoding
(fat records, fast-simple code
max lt page path length)
variable fields have length
(short records, slow code)
type-length-value
(simple slow code, easy reorg)
fixed ptrs to variables.
(compact, fast code)

general prefix to all tuple
representations
relation-id
number of fields in
tuple-id
the tuple or
actual tuple length
number of
fields
F2
F3
F4
F6
F5
F1
3
4
2
4
8
10
name
n
m
L
tuple
length
F
F
F
F
m
F
4
2
6
F
1
3
5
L
3
4
2
4
n
tuple
length
number of
F1
F2
F3
F4
F5
F6
fields
3
4
2
4
23
Representing Records (Reuter Recommends)
24
Some Details

Representing null values
missing field
special value
extra field
bitmap
Representing keys
efficient comparison is important
store "conditioned" key so simple byte-compare.
Flip integer sign (so negative sorts low)
Flip float so exponent first, mantissa second,
flipped signs
Compress varchars.
MANY refinements.
Want an order-preserving compression.

25
Fat Records (Longer Than a Page)
Record must fit on page. Long fields
segregated to separate page may be good in some
cases (Multi-media DBs) Overflow page
chains Segment record across pages
26
Obese records (Longer Than 10 Pages)

If record is super-large, then may want to index
into it quickly.
Obvious" design is standard tree.
Record is root of tree.
Grow levels when one fills.
Allows blob growth, update,...

27
Non-Normalized Relations
28
Structured File Definition
29
File Layouts

Unstructured
a sequence of bytes
Structured,
Entry Sequenced.
Records inserted at end
Records cannot grow
key is RBA (relative byte address)
Relative
fixed size record slots
records limited by that size
key is relative record number

30
Associative File Types

Hashed
Records addressed by key field(s)
bucket has list of records
overflow to other buckets
or to overflow pages.
Key Sequenced
Records addressed by keyfield(s)
Records in sorted order.
either sorting or b-tree or...

31
Parameters at Create

Database
Record type (fields)
Key
Organization Entry Sequenced, Relative, Hashed,
Key Sequenced
Block size (page size)
Extent size (storage area)
Partitioning (among discs or nodes) by key.
Attributes access control
allocation and archive strategy
transactional
lifetime, zero on free, and on and on ....

32
Parameters at Create

"Secondary" indices.
Primary key is....(e.g. customer number).
Secondary key is social security number
Non-Unique secondary key is Last_Name, First_name
Secondary indices can be unique or not
and hashed or Key Sequenced
index is like a table.
fields of index are
secondary key, primary key
So can define index on any
kind of base table

33
Secondary Index Example

Base table is key-sequenced on CustomerNumber.
Index table is key sequence on Name-CustomerNumber
.
Index can be a replica of the base table in
another order.Transaction recovery and locking
keeps them consistent.
Tuple management system
Maintains indices (insert, update, delete)
Navigates to base table via secondary index as
one request.

34
What happens when you open a relation?

Many files get opened. Read directory (catalog)
Partitions,
Indices

35
Once OPEN, Application can SCAN the relation

Scan is a row column subset
SELECT ltcolumn listgt
FROM lttablegt
WHERE ltpredicategt
With a specified start/stop key
AND ltkeygt BETWEEN ltlowgt AND lthighgt
In a specified order (supported by a secondary
index)
ASCENDING DESCENDING
A locking protocol Serializable Repeatable
Read Committed Read
Uncommited Read Skip Uncommitted
TIMEOUT ltsecondsgt

36
SCAN States
37
SCAN States How they change

On error, scan state does not change.
On open,
scan is before after the first last set
element
if scan is ascending descending
On fetch next
if not end of set at end of set
scan is at next before first after last
element
On insert
scan is at element
On delete
scan is at the missing element

38
SCAN States How they change

On update scan position is not affected.
if tuple moves (because ordering attributes
affected)
scan key position is unchanged
Can create Halloween problem (give everybody a
10 raise)
But scan enumerates entire set.

39
SCAN Data structure

enum
SCAN_STATE TOP, ON, BOTTOM, BETWEEN, NIL /
the 5 scan states /
enum
ISOLATION UNCOMMITTED_READ,..., SERIALIZABLE,
READ_PAST, BOUNCE
typedef struct
Uint scanid / handle for scan returned by
open_scan/
TRID owner / which transaction uses the
scan /
FILE fileid / handle of file the scan is
defined on /
char scan_key / specification of scan key
attribute(s) /
char start_key / lower bound of scan
range /
char stop_key / upper bound of scan range
/
char filter / qualifying predicate for all
tuples in scan /
ISOLATION isol_degree / locking policy
for tuples accessed /
SCAN_STATE scan_state / state of scan
pointer /
char scan_key / scan key the scan is
before, at, or after /
SCANCB, SCANCBP

40
Entry Sequenced File Insert

fix page descriptor page
find eof page
fix eof data page
if no space in page
lt see next slide for transaction to advance
pagegt
unfix descriptor page
add record to page (updating on-page directory)
generate log record (new value) and update page
LSN.
compute lock name of record (based on TID).
get lock on record
unfix data page.
To make this work, MUST be assured lock is
available
Otherwise page sem can (undetected)deadlock with
lock wait
So, UNDO of entry-sequence insert does not free
the space,
it just invalidates the record.

41
Entry Sequenced File Insert If EOF page or File
is Full
Top level transaction to extend file

Begin new transaction (will not abort if insert
aborts)
to extend file EOF page. (leaves insert
transaction)
unfix directory page
if file full, panic()
start a top-level transaction
fix the directory
advance the page eof updating directory and
freespace
log the changes
fix the data page
format it
log the change
unfix the directory and data page
commit the transaction resume insert
transaction
fix directory, fix eof, check to see that there
is room for the record.

42
Entry Sequenced Operations

Delete by RBA.
get record lock
(node,
file,
RBA) exclusive
if timeout, deadlock, error
return error
Fix page
Mark record invalid
Generate log record
Update page lsn
Unfix page.

Read by RBA. get record lock (node,
file, RBA) shared if timeout,
deadlock, error return error Fix page if
record valid copy to buffer Unfix page Return
record or null
Note both must test that RBA lt EOF. Update,
ReadNext, ... are similar.
43
Relative Files

Records fit in fixed-length slots
Operation on slots.
Separate transactions extend the file EOF
(allocate and format pages)

44
Relative Files

Read Insert Update Delete by key are all
easy
Insert "near" key works by
Plan A
look at page
Look at neighbor pages (left, right, left,
right,...)
Plan B
allocate overflow page for base page
Plan C
Look in free-space bit-map or byte (full) map.

45
Key Sequenced or Hashed Files

Key sequenced
is subject of next chapter.

46
File Clustering

Different record types kept in same page/file
For example Master and detail records of an
invoice.Detail records always accessed if master
is.
SituationMaster key InvoiceNoDetail key
InvoiceNo Foreign Key References Master
SequenceNo
TechniqueHash or Key sequence Master on
InvoiceNoHash or Key Sequence Detail on
InvoiceNoSequenceNo in same table.

47
Clustering different record types in a page