Database Management Systems CSE530a - PowerPoint PPT Presentation

1 / 41
About This Presentation
Title:

Database Management Systems CSE530a

Description:

Sandra. Lee. 009. HR. James. Johnson. 004. IT. Bill. Williams. 010 ... Division-remainder uses mod: 0110 mod 100. 10 is address of the disk page. Can use: ... – PowerPoint PPT presentation

Number of Views:54
Avg rating:3.0/5.0
Slides: 42
Provided by: thebutl
Category:

less

Transcript and Presenter's Notes

Title: Database Management Systems CSE530a


1
Database Management Systems CSE530a
2
Today
  • Database Design with a normalized relational
    model query considerations
  • Internal Level Physical Storage and Access

3
Impact of normalization on file organization
  • Optimized tables with little or no redundancy can
    be optimized for storage and retrieval
  • Common file organizations include
  • Heap
  • Hash
  • Indexed Sequential Access Method (ISAM)
  • B-tree
  • Clusters

4
Indices - Guidelines
  • Dont index small relations
  • Index a key that is not the primary key of the
    relation
  • Add a secondary index to foreign keys that are
    accessed on a regular basis
  • Add a secondary index to an attribute that is
    heavily used

5
Indices - Guidelines Cont.
  • Add secondary keys to attributes used frequently
    in
  • Selection or join criteria
  • ORDER BY
  • GROUP BY
  • Other sorting operations
  • UNION
  • DISTINCT
  • Add secondary indexes to attributes involved in
    aggregate functions
  • AVG, SUM, MIN, MAX, etc.

6
Indices - DONTs
  • Avoid indexing attributes that are frequently
    updated
  • Avoid indexing attributes used in queries that
    will return significant percentages of data (e.g.
    25 or more)
  • Avoid indexing attributes consisting of long
    character strings

7
File Organization Considerations
8
File Storage - Resources
  • Main Memory
  • Maximize when affordable
  • Swap between memory to disk
  • CPU
  • Prevent contention
  • Disk I/O
  • Bottlenecks
  • Even Distribution
  • Network
  • Collisions

9
Storage-Devices - Overview
  • Hierarchy of storage media according to speed and
    cost

Access time increase
Cost/bit increase
10
Physical Storage Basic Concepts
  • Managed by DBMS, OS or DBMS/OS
  • Includes
  • Primary Storage fast and expensive
  • Secondary Storage cheap and slow
  • The combination so must be optimized
  • DBMS makes request to buffer manager when a
    block is needed from disk. If block is already
    in buffer, buffer manager passes address to
    requestor. If the block isnt in buffer, space
    is allocated (possibly removing some other
    block).

11
Physical Storage Basic Concepts
  • Secondary storage must consider
  • Seek time
  • Rotational delay
  • Number of disk accesses
  • Arrangement of data on disk

12
RAID - Levels
  • Levels vary from 0 6
  • Each level has 4 disks worth of data
  • Extra disks used to store redundant information
    for failure recovery
  • P Error-correcting bits
  • C Second copy of data

13
File Storage - Concepts
  • Physical Record is the unit of transfer pages

14
DBMS /Host system intercommunication
15
File Storage - Analyze Transactions
  • Map all transaction paths to relations
  • Transaction/Relation cross-reference matrix
  • Depicts the transaction required and the
    relations accessed
  • Indicate the number of accesses over a time
    period
  • Determine frequency information
  • Transaction usage map
  • Average, maximum number of times and peak load
  • Operating pattern conflicts
  • Analyze data usage
  • The relations and attributes accessed by the
    transaction and access types
  • Attributes involved in the join of two or more
    relations
  • Transaction performance goals

16
Buffer Manager
  • Database system programs make requests to buffer
    manager when they need a block from disk.
  • If block is already in buffer, the buffer manager
    passes the address to the requestor.
  • If the block isnt in the buffer, the buffer
    manager allocates space in the buffer. Throwing
    out some other block if needed.

17
Buffer Manager Techniques
  • Buffer replacement strategy
  • When the buffer is full, a block must be removed
    before another can be read in.
  • Pinned blocks
  • For database to recover from crashes, need to
    restrict the times when a block may be written
    back to disk. A block that is not allowed to be
    written is pinned. Essential resilient database
    systems.

18
DBMS Storage Hierarchy example
Database - a group of tablespaces
Partition - also a group of contiguous extents,
but may be part of a file or multiple files -
Useful when there are a large number of users and
extents to allow parallelism
Tablespace - a file or group of files containing
data, so can have or share multiple extents -
Allows 1) Different users in different physical
areas, 2) Separation of user data from system
data, 3) Backups on only part of a database
File - a group of contiguous extents
Extent - a group of contiguous, stored pages (not
for I/O read groups are for that) - Provides
for allocation performance improvement
Page - aka, block, data block, blocking unit,
control interval, row group - A storage location
for rows of data, typically the same size across
the storage medium multiples of 1024 bytes are
common - Contains an integral number of rows -
Rows in a page preferably come from the same
table - Used for 1) a minimal unit for disk i/o
(but can read multiple pages), 2) locking, 3)
caching to a buffer pool, 4) housekeeping
(includes header information)
19
File operations
  • Disk Manager
  • Retrieve page from block
  • Replace page within page set
  • Add a new page to page set
  • Remove page from page set
  • File Manager
  • Retrieve stored record from file
  • Replace stored record within file
  • Add a store record to file and return RRN
  • Remove stored record from file
  • Create a new stored file
  • Destroy a stored file

20
File Manager
  • Component of OS
  • Communicates with the DBMS
  • Maps DBMS request into page set request
  • Each page has a unique page id
  • Page set, unique page set id - collection pages
  • Passes request to Disk Manager
  • Many OS do not provide File Manager facility -
    DBMS communicates with Disk Manager

21
Disk Manager
  • Component of OS
  • Responsible for basic I/O services
  • Maps logical page set to physical disk address
  • Advantage - device specific code isolation
  • New allocation - free space page set

22
File operations
  • Retrieval or update
  • DBMS/OS decomposes selects to basic operation,
    then
  • Open allocates buffers, retrieves file header,
    sets pointer
  • Find Searches for first true, transfers block
    to buffer, sets file pointer
  • Read Copies record from buffer to program
    variable
  • FindNext Searches for next true, transfers to
    buffer
  • Close releases buffers, closes file

DBMS
File Manager
Disk Manager
Data File
Record
Block
Disk I/O
23
Physical Storage Basic Concepts
  • Goal minimize block transfers, seek time, spin
    latency
  • Heap file (Unordered)
  • Ordered file
  • Hashed file
  • Indexes
  • Hashed
  • Trees
  • Bitmaps

24
File storage
Page ID
Page No Offset from foot of page
25
Record blocking
  • Fixed length
  • x Records/Block (per floor function)
  • Unused space b (bfr R) bytes
  • Variable length
  • Spanning provides pointer at end of first block
    to surface, track and block location of remainder
    of record
  • b r/bfr blocks (per ceiling function)
  • Slotted-page structure
  • Block Header Records

26
File header
  • Contains description used for access of records
    in the file
  • Disk addresses of blocks
  • Record formats
  • Field lengths
  • Field order (fixed length)
  • Field type
  • Separators
  • Codes (variable length)
  • Located by Relative Record Number (RRN)

27
Heap Files
  • Unordered
  • Easy writes
  • New records inserted at end of file
  • Once a block is full, pointer set to new block
    and rest of record is written
  • Requires a linear search for anything
  • Large number/size of records decrease speed
  • Deletes require write to buffer, mark record as
    deleted, write back to disk with deleted space
    left in place.
  • Modifications force periodic file reorganization
    to recover disk space
  • Fixed-length fields contiguously allocated in
    unspanned blocks improves search latency (i.e.,
    i/bfr denotes block, i mod bfr denotes location
    in block)

28
Physical Record Storage (Heap File)
29
Ordered Files
  • One field (attribute) selected for ordering
  • If a key field, data is key-sequenced
  • Allows binary searches for faster retrieval
    (always retrieves mid-page between upper and
    lower limits until correct page is found), since
    log2(B) blocks accessed
  • Inserts and deletes require ordering to be
    maintained (may require writing all pages above
    affected record)
  • Overflow (transaction) file will help to reduce
    this problem
  • Typically only used if a primary index is applied
  • No gain for non-ordered fields
  • Typically requires indexed file access path

30
File storage Ordered with linked lists
  • Assume one record/page (i.e., 5B/Fixed Block)
  • Then
  • Insert Dad
  • Delete Geode
  • Insert Pod

31
File storage Ordered with linked lists
  • Assume one record/page (i.e., 5B/Fixed Block)
  • Then
  • Insert Dad
  • Delete Geode
  • Insert Pod

32
Hash Files
  • Records written in non-sequential order
  • Hash function calculates address of the page
    where record is stored
  • based on a one or more base fields (hash field)
  • If a key field, called hash key
  • Hash function creates even spread of records
    across file
  • Folding applies math to different parts of the
    has field (empID 0110 could become (01)10. 11
    is address of disk page
  • Division-remainder uses mod 0110 mod 100. 10 is
    address of the disk page
  • Can use
  • Open addressing, Unchained overflow, Chained
    overflow, Multiple hashing, Dynamic hashing
  • Limitations
  • Useful for exact match
  • Poor for ranges, patterns, additional fields

33
Indexes
  • Ordered indices values in sorted fashion
  • Hash indices values distributed across
    buckets by using a function
  • An index record consists of a value and pointers
    to one or more records with that value. Can be
  • Dense every value group indexed
  • Sparse only some values are indexed
  • Include
  • Compound indexes (values from more than one data
    column)
  • Covering index (uses values in the index for the
    SELECT clause)
  • Unique index
  • Clustering indexes (stores similar data rows near
    each other)
  • Bitmap indexes (assigns 1 if a value is true, 0
    if false)

34
B-Trees
  • Well established as the most common structures
    for indexes
  • Multi-level
  • d is the order of the tree it is a measure of
    the tree node capacity
  • Every node except the root contains m entries,
    where d/2 lt m lt d
  • The root node contains 1 lt m lt d entries
  • Non-leaf nodes with m index entries contain m1
    pointers to children
  • Pointer Pi points to a subtree with K values such
    that Ki-1 lt K lt Ki

35
B-Trees (order 2) one algorithm
  • Query find all values with a pointer value of P
  • If search value is lt SearchKey value, go left
    otherwise, go right

36
B Tree index/sequence sets
37
B Trees datapage leaf
Sort order Anizy,Apach,Apensen,Ardwick,Arnham,Ath
ens
38
B Tree Example
  • Rules for this example
  • d is the order of the tree it is a measure of
    the capacity of child nodes
  • Every node except the root contains m entries,
    where d/2 lt m lt d
  • The root node contains 1 lt m lt d entries
  • Non-leaf nodes with m index entries contain
    between (m1)/2 and m1 pointers to children
  • Pointer Pi points to a subtree with K values such
    that Ki-1 lt K lt Ki
  • Search uses pointer to the right for greater than
    or equal to in non-leaf nodes, greater than in
    leaf nodes until equal to is found or not found

39
Records in a block
  • How to store records in blocks?
  • number of records r
  • block size B
  • record size R
  • blocking factor bf number of records in a
    block
  • Bf ?B/R? (spanned, unspanned)
  • number of blocks needed b
  • b ?r/bf?

40
B-trees performance impact
  • A 4k page can many records per page
  • ((4 b/pointer 4b/field)n, 4b/pointer) order
    of 512
  • Root 511 records
  • Level 1 261,632 records
  • Level 2 133,955,584 records
  • Total 134,217,727 records
  • Shallow is better

41
B-trees performance impact
  • 1,000,000 records of 300B (including header)
  • Search key is a 4 byte int a pointer requires 4
    bytes
  • 4KB blocks, no block header, random placement,
    avg. retrieval time 5.6 ms
  • No time for memory reads
  • 13.6 records/block 76924 blocks to store data
  • 512 indexes/block 1954 blocks to store index
  • No index
  • (76924/2) 38462 block accesses (avg.)
  • Time to find a record 38462 5.6 ms 215.4 s
  • Indexed, binary search
  • log(1954) 1 11 1 12 block accesses
    (maximum)time to find a record 12 5.6 ms
    67.2 ms
  • Indexing increased speed by 3205 times.
Write a Comment
User Comments (0)
About PowerShow.com