Structured Files - PowerPoint PPT Presentation

About This Presentation
Title:

Structured Files

Description:

Storage allocation: store tuples in file blocks. Tuple addressing: give tuple an id identifier ... Trailer. Page. Dir. Page. Head. Block. Head. checksum ... – PowerPoint PPT presentation

Number of Views:84
Avg rating:3.0/5.0
Slides: 49
Provided by: ResearchM53
Category:

less

Transcript and Presenter's Notes

Title: Structured Files


1
Structured Files
Chapter 19
2
What The Record Manager Does
  • Storage allocation store tuples in file blocks
  • Tuple addressing give tuple an id identifier
  • provide fast access via that id.
  • Enumeration fast enumeration of all relations
    tuples
  • Content addressing give fast accessible via
    attribute values.
  • Maintenance update/delete a tuple and its
    access paths.
  • Protection support for security
  • encrypt or tuple-granularity access
    control.

3
Outline
  • Representing values
  • Representing records
  • Storing records in pages and across pages
  • Organizing records (entry, relative, key, hash)
  • Examples of fix/log/log logic.

4
Record Allocation in a Page
  • Recall
  • File is a collection of fixed-length pages
    (blocks).
  • File and buffer managers map files to disc/RAM

slot on disk
block
page
Page Dir
Block Head
Block Trailer
Page Head
page body
checksum
5
Page Declares
  • typedef struct / global page numbers /
  • FILENO fileno /file where the page
    lives /
  • uint pageno / page number within the
    file /
  • PAGEID, PAGEIDP / /
  • typedef struct
  • PAGEID thatsme / identifies the page /
  • PAGE_TYPE page_type / see description
    above /
  • OBJID object_id / internal id of the
    relation,index,etc. /
  • LSN safe_up_to / page LSN for the WAL -
    protocol /
  • PAGEID previous / often pages are members
    of doubly /
  • PAGEID next / linked lists /
  • PAGE_STATE status / valid,in-doubt,copy of
    something,etc/
  • int no_entries / entries in page dir
    (see below) /
  • int unused / free bytes not in
    freespace /
  • int freespace / contiguous free bytes
    for data /
  • char stuff / will grow /
  • PAGE_HEADER, PAGE_PTR / /

6
Different uses of pages
  • Data Homogeneous record storage
  • Cluster like Data except many different record
    types
  • Index (access path) hashed or B-tree
  • Free-space bitmap describes status of ³ 4,000
    other pages.
  • Directory meta-data about this or other files

7
Page Directory Points to Records on Page
Page Header
1st Tuple
2nd
2nd Tuple
3rd Tuple
4thTuple
5th Tuple
Tuples are inserted in this direction
Page directory grows in this direction
2
1
3
4
5
  • Record id is File, Page, Directory_offset

8
Accessing a Record
  • Read by TID Lock record shared
  • locate page
  • Get semaphore shared
  • follow directory offset
  • copy tuple
  • Give semaphore
  • Insert by TID Lock record exclusive
  • locate page
  • Get semaphore exclusive
  • Find space
  • Insert
  • log insert (tid, new value).
  • update page LSN, header, directory,
  • Give semaphore

9
Accessing a Record
  • Delete by TID Lock record exclusive
  • locate page
  • Get semaphore exclusive
  • Add record to free space
  • Log delete (tid, old value).
  • update page LSN, header, directory,
  • Give semaphore
  • Update TID much like delete--insert

10
Finding Space for Insert / Update
  • If tuple fits in page contiguous free-space
    easy.
  • If tuple fits in page free space reorganize
    (compress)
  • Physiological logging makes this cheap.
  • If tuple does not fit then
  • leave forwarding address on page.
  • Optionally leave record prefix on page.
  • Segment record among several pages.

tid
11
Finding space within a file
  • Free space table
  • Summarizes status of many pages
  • (8KB page gt 64Kb gt 500MB of 8KB data pages)
  • Good for clustered contiguous allocation
  • bitmap should be transaction protected
  • If transaction aborts, page is freed again.
  • Alternatively, treat bitmap as a hint
  • Rebuild periodically.

p1
p2
p3
p4
p5
p6
p7
P19
P20
P21
.
.
.
.
f2
F19
21
f3
f4
f5
f6
f7
.
.
.


.....
.
.
.
.
.
.
f17
Free space directories
12
Finding space within a file
  • Free space cursor/list
  • Chain should be transaction protected
  • Else rebuild at restart
  • do not trust pointers
  • (free page may be allocated).

file catalog
chain of empty pages
.
.

empty_page_anchor
point_of_insert
page for next insert
13
Tuple Allocation - I
The first strategy maintains a pointer to the
current block for insert (CBI). When that
block fills up, an empty block is requested from
a system service, which then becomes the new
current block for insert.
CBI
where next?
And so on. This is the sequential insert
strategy.
Questions What happens, when the pointer arrives
at the last block? How do we reclaim space freed
by deleted tuples?
14
Incremental Space Expansion - I
When the list of empty blocks is exhausted, there
are two options to find space for new tuples. Let
us assume the following configuration
The first option is to let the CBI pointer
circulate over the set of allocated
blocks, assuming that space is released by
deleted tuples.
And so on. This works as long as enough space is
freed up by deleted tuples. If there are only few
gaps, finding space for a new tuple can become
very expensive, because many blocks have to be
probed sequentially.
The need to probe blocks that are completely
filled can be avoided by maintaining a an array
of bits that contains one bit per block
indicating whether a block is full
15
Naming Tuples (records)
Relative byte address file, offset in file
OK for insert-then-read-only DBs record
can't easily grow. deleted space not easily
reclaimed. Tuple Identifier file, page, index
The design shown below. Main
disadvantage expensive reorganization (fixing
overflows)
pageno
dir_index
fileid
nodeid
pageno
fileid
nodeid
dir_index
3
7446
7446
3
7446
7446
5127
this tuple
this tuple
pseudo -TID
16
Implementing Database Keys
  • Address record via directory
  • Address has a ID to allow for invalidation
  • ID never reused.
  • Pointer can be swizzled.
  • Popular with network OO DBs

17
Naming Tuples via Primary Key
  • Entry Sequenced, Relative primary key is
    physical addr
  • Hash, B-tree primary key is content (primary
    key)
  • Primary Key an alternative to DBkey
  • B-tree clusters related data
  • ProblemsB-tree access is slower than Hash.Hash
    B-tree keys not fixed lengthbut neither is
    node.db_key
  • Benefit key can grow to LARGE databasesGood
    for distributed/partitioned data
  • Its religious.

18
Datatype Representation
  • E External representation ASCII, ISO Latin1,
    Unicode,...
  • P Programming language representation
  • many PL/1, Cobol, C, all have different
    VARCHAR
  • many type mismatches between P and F
  • interval, datetime, user,...
  • F File representation "native" types (e.g.
    null values, ....).
  • Lots of mapping functions.
  • It would be great if F-1(F(x)) x for these
    functions, but....
  • Called the impedance mismatch between DB and PL

19
Datatype Representations
  • P _ F Implies a special language
  • (all other languages are 2nd class)
  • E _ F Use characters for everything.
  • Problem E changes from country to country!
  • (all other languages are 2nd class)
  • No easy way out of this.
  • Unicode will help most of us and make E_F more
    attractive

20
Representing Records
21
Representing Records
  • struct relations
  • Uint relation_no / internal id for the
    relation /
  • char owner / user id of the creator /
  • long creation_date / date when it was
    created /
  • PAGENO current_point_of_insert / free space
    done via /
  • PAGENO empty_page_anchor / free space cursor
    method /
  • Uint no_of_attributes /attributes in
    relation /
  • Uint no_of_fixed_atts / fixed-length
    attributes /
  • Uint no_of_var_atts / variable-length
    attributes /
  • struct attributes p_attr / pointer to
    the attributes array /
  • struct attributes / attributes array /
  • char attribute_name / external name of
    the attribute /
  • Uint attribute_position / index of the field
    in the tuple (1,2,...) /
  • char attribute_type / this encodes the SQL -
    type definition /
  • Boolean var_length / is it variable_length
    field ? /
  • Boolean nulls_allowed / can field assume NULL
    value ? /
  • char default_value / value assumed if none
    stored in tuple /
  • Uint field_length / maximum length of field
    /
  • int accumulated_offset / explained later
    /

22
Representing Records
  • Generic header
  • (rid, tid, fields)
  • all fixed length encoding
  • (fat records, fast-simple code
  • max lt page path length)
  • variable fields have length
  • (short records, slow code)
  • type-length-value
  • (simple slow code, easy reorg)
  • fixed ptrs to variables.
  • (compact, fast code)

general prefix to all tuple
representations
relation-id
number of fields in
tuple-id
the tuple or
actual tuple length
number of
fields
F2
F3
F4
F6
F5
F1
3
4
2
4
8
10
name
n
m
L
tuple
length
F
F
F
F
m
F
4
2
6
F
1
3
5
L
3
4
2
4
n
tuple
length
number of
F1
F2
F3
F4
F5
F6
fields
3
4
2
4
23
Representing Records (Reuter Recommends)
24
Some Details
  • Representing null values
  • missing field
  • special value
  • extra field
  • bitmap
  • Representing keys
  • efficient comparison is important
  • store "conditioned" key so simple byte-compare.
  • Flip integer sign (so negative sorts low)
  • Flip float so exponent first, mantissa second,
    flipped signs
  • Compress varchars.
  • MANY refinements.
  • Want an order-preserving compression.

25
Fat Records (Longer Than a Page)
Record must fit on page. Long fields
segregated to separate page may be good in some
cases (Multi-media DBs) Overflow page
chains Segment record across pages
26
Obese records (Longer Than 10 Pages)
  • If record is super-large, then may want to index
    into it quickly.
  • Obvious" design is standard tree.
  • Record is root of tree.
  • Grow levels when one fills.
  • Allows blob growth, update,...

27
Non-Normalized Relations
28
Structured File Definition
29
File Layouts
  • Unstructured
  • a sequence of bytes
  • Structured,
  • Entry Sequenced.
  • Records inserted at end
  • Records cannot grow
  • key is RBA (relative byte address)
  • Relative
  • fixed size record slots
  • records limited by that size
  • key is relative record number

30
Associative File Types
  • Hashed
  • Records addressed by key field(s)
  • bucket has list of records
  • overflow to other buckets
  • or to overflow pages.
  • Key Sequenced
  • Records addressed by keyfield(s)
  • Records in sorted order.
  • either sorting or b-tree or...

31
Parameters at Create
  • Database
  • Record type (fields)
  • Key
  • Organization Entry Sequenced, Relative, Hashed,
    Key Sequenced
  • Block size (page size)
  • Extent size (storage area)
  • Partitioning (among discs or nodes) by key.
  • Attributes access control
  • allocation and archive strategy
  • transactional
  • lifetime, zero on free, and on and on ....

32
Parameters at Create
  • "Secondary" indices.
  • Primary key is....(e.g. customer number).
  • Secondary key is social security number
  • Non-Unique secondary key is Last_Name, First_name
  • Secondary indices can be unique or not
    and hashed or Key Sequenced
  • index is like a table.
  • fields of index are
  • secondary key, primary key
  • So can define index on any
  • kind of base table

33
Secondary Index Example
  • Base table is key-sequenced on CustomerNumber.
  • Index table is key sequence on Name-CustomerNumber
    .
  • Index can be a replica of the base table in
    another order.Transaction recovery and locking
    keeps them consistent.
  • Tuple management system
  • Maintains indices (insert, update, delete)
  • Navigates to base table via secondary index as
    one request.

34
What happens when you open a relation?
  • Many files get opened. Read directory (catalog)
  • Partitions,
  • Indices

35
Once OPEN, Application can SCAN the relation
  • Scan is a row column subset
  • SELECT ltcolumn listgt
  • FROM lttablegt
  • WHERE ltpredicategt
  • With a specified start/stop key
  • AND ltkeygt BETWEEN ltlowgt AND lthighgt
  • In a specified order (supported by a secondary
    index)
  • ASCENDING DESCENDING
  • A locking protocol Serializable Repeatable
    Read Committed Read
  • Uncommited Read Skip Uncommitted
  • TIMEOUT ltsecondsgt

36
SCAN States
37
SCAN States How they change
  • On error, scan state does not change.
  • On open,
  • scan is before after the first last set
    element
  • if scan is ascending descending
  • On fetch next
  • if not end of set at end of set
  • scan is at next before first after last
    element
  • On insert
  • scan is at element
  • On delete
  • scan is at the missing element

38
SCAN States How they change
  • On update scan position is not affected.
  • if tuple moves (because ordering attributes
    affected)
  • scan key position is unchanged
  • Can create Halloween problem (give everybody a
    10 raise)
  • But scan enumerates entire set.

39
SCAN Data structure
  • enum
  • SCAN_STATE TOP, ON, BOTTOM, BETWEEN, NIL /
    the 5 scan states /
  • enum
  • ISOLATION UNCOMMITTED_READ,..., SERIALIZABLE,
    READ_PAST, BOUNCE
  • typedef struct
  • Uint scanid / handle for scan returned by
    open_scan/
  • TRID owner / which transaction uses the
    scan /
  • FILE fileid / handle of file the scan is
    defined on /
  • char scan_key / specification of scan key
    attribute(s) /
  • char start_key / lower bound of scan
    range /
  • char stop_key / upper bound of scan range
    /
  • char filter / qualifying predicate for all
    tuples in scan /
  • ISOLATION isol_degree / locking policy
    for tuples accessed /
  • SCAN_STATE scan_state / state of scan
    pointer /
  • char scan_key / scan key the scan is
    before, at, or after /
  • SCANCB, SCANCBP

40
Entry Sequenced File Insert
  • fix page descriptor page
  • find eof page
  • fix eof data page
  • if no space in page
  • lt see next slide for transaction to advance
    pagegt
  • unfix descriptor page
  • add record to page (updating on-page directory)
  • generate log record (new value) and update page
    LSN.
  • compute lock name of record (based on TID).
  • get lock on record
  • unfix data page.
  • To make this work, MUST be assured lock is
    available
  • Otherwise page sem can (undetected)deadlock with
    lock wait
  • So, UNDO of entry-sequence insert does not free
    the space,
  • it just invalidates the record.

41
Entry Sequenced File Insert If EOF page or File
is Full
Top level transaction to extend file
  • Begin new transaction (will not abort if insert
    aborts)
  • to extend file EOF page. (leaves insert
    transaction)
  • unfix directory page
  • if file full, panic()
  • start a top-level transaction
  • fix the directory
  • advance the page eof updating directory and
    freespace
  • log the changes
  • fix the data page
  • format it
  • log the change
  • unfix the directory and data page
  • commit the transaction resume insert
    transaction
  • fix directory, fix eof, check to see that there
    is room for the record.

42
Entry Sequenced Operations
  • Delete by RBA.
  • get record lock
  • (node,
  • file,
  • RBA) exclusive
  • if timeout, deadlock, error
  • return error
  • Fix page
  • Mark record invalid
  • Generate log record
  • Update page lsn
  • Unfix page.

Read by RBA. get record lock (node,
file, RBA) shared if timeout,
deadlock, error return error Fix page if
record valid copy to buffer Unfix page Return
record or null
Note both must test that RBA lt EOF. Update,
ReadNext, ... are similar.
43
Relative Files
  • Records fit in fixed-length slots
  • Operation on slots.
  • Separate transactions extend the file EOF
    (allocate and format pages)

44
Relative Files
  • Read Insert Update Delete by key are all
    easy
  • Insert "near" key works by
  • Plan A
  • look at page
  • Look at neighbor pages (left, right, left,
    right,...)
  • Plan B
  • allocate overflow page for base page
  • Plan C
  • Look in free-space bit-map or byte (full) map.

45
Key Sequenced or Hashed Files
  • Key sequenced
  • is subject of next chapter.

46
File Clustering
  • Different record types kept in same page/file
  • For example Master and detail records of an
    invoice.Detail records always accessed if master
    is.
  • SituationMaster key InvoiceNoDetail key
    InvoiceNo Foreign Key References Master
    SequenceNo
  • TechniqueHash or Key sequence Master on
    InvoiceNoHash or Key Sequence Detail on
    InvoiceNoSequenceNo in same table.

47
Clustering different record types in a page
  • One disc request gets the entire order.
  • Concept works for any storage hierarchy
  • Is natural for Hierarchical database systems.

48
Summary
  • Representing values
  • Representing records
  • storing records in pages and across pages
  • Organizing records (entry, relative, key, hash)
  • Examples of fix/log/log logic.
Write a Comment
User Comments (0)
About PowerShow.com