Title: Database management systems
1Database management systems
2Boyce-Codd Normal Form (BCNF)
- BCNF is a simpler form of 3NF that is more
restrictive - If a functional dependency X?A holds in R, then
either - A?X (trivial), or
- X is a superkey of R
- Each attribute is identified by nothing but the
key - Sometimes too restrictive, may not be dependency
preserving with regard to closure, for example - R has ABC, F includes A ?B,B ?C,C ?A, so
(FAB?FBC)? F
3Multi-Valued Dependencies (4NF)
- An employee can be assigned to any project and,
within those projects, to any activities - An employee can be assigned to the same
activities regardless of project assignments
(assignments to projects and activities are
independent) - A project or activity can have any number of
employees assigned to it
EN?P EN?A
4More - normal forms
- Join Dependences (5NF)
- A further generalization of MVDs
- All MVDs are JD, but not all JDs are MVDs
- For every JD ?R1,Rn, one of the following is
true - Ri R for some i, or
- The JD is implied by the set of those FDs over R
in which the left side is a key for R - If a relation schema is in 3NF and each of its
keys consists of a single attribute, it is also
in 5NF
55NF
If an employee works for a project, the employee
will be assigned to activities within that project
EN,P,A JD EN,P EN,P,A JD P,A EN,P,A JD
EN,A
6Physical Storage
- Managed by DBMS, OS or DBMS/OS
- Includes
- Primary Storage fast and expensive
- Secondary Storage cheap and slow
- The combination so must be optimized
- DBMS makes request to buffer manager need a
block is needed from disk. If block is already
in buffer, buffer manager passes address to
requestor. If the block isnt in buffer, space
is allocated (possibly removing some other
block).
7File operations
- Retrieval or update
- DBMS decomposes selects to basic operation, then
- Open allocates buffers, retrieves file header,
sets pointer - Find Searches for first true, transfers block
to buffer, sets file pointer - Read Copies record from buffer to program
variable - FindNext Searches for next true, transfers to
buffer - Close releases buffers, closes file
8DBMS Storage Hierarchy
- Page
- aka, block, data block, blocking unit, control
interval, row group - a storage location for rows of data, typically
- the same size across the storage medium
- multiples of 1024 bytes
- contain an integral number of rows
- rows in a page preferably come from the same
table - used for
- a minimal unit for disk i/o (but can read
multiple pages) - locking
- caching to a buffer pool
- housekeeping (includes header information)
- Extent
- a group of contiguous, stored pages (not for I/O
read groups are for that) - provides for allocation performance improvement
- File
- a group of contiguous extents
- Partition
- also a group of contiguous extent, but may be
part of a file or multiple files
9Record blocking
- Fixed length
- x Records/Block (per floor function)
- Unused space b (bfr R) bytes
- Variable length
- Spanning provides pointer at end of first block
to surface, track, block location of remainder of
record - b r/bfr blocks (per ceiling function)
- Slotted-page structure
- Block Header Records
10Records in a block
- How to store records in blocks?
- number of records r
- block size B
- record size R
- blocking factor bf number of records in a
block - Bf ?B/R? (spanned, unspanned)
- number of blocks needed b
- b ?r/bf?
11File header
- Contains description used for access of records
in the file - Disk addresses of blocks
- Record formats
- Field lengths
- Field order (fixed length)
- Field type
- Separators
- Codes (variable length)
12Physical Storage
- Goal minimize block transfers
- Heap file
- Ordered file
- Hashed file
- Indexes
- Hashed
- Trees
- Bitmaps
13- Physical Record Storage (Heap File)
Block 1
Block 2
14Heap Files
- Unordered
- Easy writes
- New records inserted at end of file
- Once a block is full, pointer set to new block
and rest of record is written - Requires a linear search for anything
- Large number/size of records decrease speed
- Deletes require write to buffer, mark record as
deleted, write back to disk with deleted space
left in place. - Modifications force periodic file reorganization
to recover disk space - Fixed-length fields contiguously allocated in
unspanned blocks improves search latency (i.e.,
i/bfr denotes block, i mod bfr denotes location
in block)
15Ordered Files
- One field (attribute) selected for ordering
- If a key field, data is key-sequenced
- Allows binary searches for faster retrieval
(always retrieves mid-page between upper and
lower limits until correct page is found), since
log2(B) blocks accessed - Inserts and deletes require ordering to be
maintained (may require writing all pages above
affected record) - Overflow (transaction) file will help to reduce
this problem - Typically only used if a primary index is applied
- No gain for non-ordered fields
- Typically requires indexed file access path
16Indexes
- Ordered indices values in sorted fashion
- Hash indices values distributed across
buckets by using a function - An index record consists of a value and pointers
to one or more records with that value. Can be - Dense every value group indexed
- Sparse only some values are indexed
- Include
- Compound indexes (values from more than one data
column) - Covering index (uses values in the index for the
SELECT clause) - Unique index
- Clustering indexes (stores similar data rows near
each other) - Bitmap indexes (assigns 1 if a value is true, 0
if false)
17B-Trees
- Well established as the most common structures
for indexes - Multi-level
- d is the order of the tree it is a measure of
the tree node capacity - Every node except the root contains m entries,
where d/2 lt m lt d - The root node contains 1 lt m lt d entries
- Non-leaf nodes with m index entries contain m1
pointers to children - Pointer Pi points to a subtree with K values such
that Ki-1 lt K lt Ki
18B-Trees (order 2)
- Query find all values with a pointer value of P
- If search value is lt SearchKey value, go left
otherwise, go right
19B Tree index/sequence sets
20B Trees datapage leaf
Sort order Anizy,Apach,Apensen,Ardwick,Arnham,Ath
ens
21B-trees performance impact
- A 4k page can many records per page
- ((4 b/pointer 4b/field)n, 4b/pointer) order
of 512 - Root 511 records
- Level 1 261,632 records
- Level 2 133,955,584 records
- Total 134,217,727 records
- Shallow is better
22B-trees performance impact
- 1000000 records of 300B (including header)
- Search key is a 4 byte int a pointer requires 4
bytes - 4KB blocks, no block header, random placement,
avg. retrieval time 5.6 ms - No time for memory reads
- 13.6 records/block 76924 blocks to store data
- 512 indexes/block 1954 blocks to store index
- No index
- (76924/2) 38462 block accesses (avg.)
- Time to find a record 38462 5.6 ms 215.4 s
- Indexed, binary search
- log(1954) 1 11 1 12 block accesses
(maximum)time to find a record 12 5.6 ms
67.2 ms - Indexing increased speed by 3205 times.
23Hash Files
- Records written in non-sequential order
- Hash function calculates address of the page
where record is stored - based on a one or more base fields (hash field)
- If a key field, called hash key
- Hash function creates even spread of records
across file - Folding applies math to different parts of the
has field (empID 0110 could become (01)10. 11
is address of disk page - Division-remainder uses mod 0110 mod 100. 10 is
address of the disk page