1
Dalí Main-memory Storage Manager
Tomasz Piech
2
Salvador Dalí - Persistence of Memory (1931)
3
Introduction
  • Dalí
  • Implemented at Bell Laboratories
  • Storage manager for persistent data
  • Architecture optimized for databases resident in
    main memory
  • Application: real-time billing and control of
    multimedia content delivery
  • High transaction rates, low latency

4
Introduction
  • Dalí Techniques
  • Direct access to data: direct pointers to
    information stored in the database give high
    performance
  • No interprocess communication: processes talk to
    the server only when connecting and disconnecting;
    concurrency control and logging are provided via
    shared memory
  • Fault-tolerant: advanced, multi-level transaction
    model; high-concurrency indexing and storage

5
Introduction
  • Dalí
  • Recovery from process failure in addition to
    system failure
  • Use of codewords and memory protection: integrity
    of data (discussed later)
  • Consistency of response time: key requirement
    for applications with memory-resident data
  • Designed for databases that fit into main memory
    (virtual memory will work, but not as well)

6
Overview of Presentation
  • Architecture
  • Storage
  • Transaction Management
  • Fault Tolerance
  • Concurrency Control
  • Collections and Indexing
  • Higher Level Interfaces

7
Architecture
8
Architecture
  • Database files: user data; one or more exist in
    a database
  • System database files: database-support data,
    such as locks and logs
  • Files opened by a process are directly mapped
    into its address space
  • mmap'd files or shared-memory segments are used
    to provide the mapping

9
Layers of Abstraction
Dalí architecture is organized to support the
toolkit approach
10
Layers of Abstraction
  • Toolkit approach
  • Logging can be turned off for data which need not
    be persistent
  • Locking can be turned off if data is private to a
    process
  • Multiple interface levels
  • Low-level components are exposed to user for
    optimization

11
Storage
12
Pointers and Offsets
  • Each process has a database-offset table
  • Specifies where in memory a file is mapped
  • Implemented as an array indexed by file id
  • Primary Dalí pointer (p)
  • Database-file local identifier plus offset within
    the file
  • To dereference, add the offset from p to the
    file's virtual memory address from the offset
    table
  • Secondary pointer
  • For an index held in one file, store just the
    offset, since the location of the file is known
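
The dereference step above can be sketched in a few lines. This is only an illustration under stated assumptions: the names DbPtr, offset_table, dereference and dereference_secondary are hypothetical, the base addresses are made up, and plain integers stand in for real virtual addresses.

```python
from typing import NamedTuple


class DbPtr(NamedTuple):
    """Primary pointer: (file id, offset within that file)."""
    file_id: int
    offset: int


# Per-process database-offset table, indexed by file id. Each entry is the
# virtual address at which this process mapped the file (made-up values).
offset_table = [0x10000000, 0x20000000, 0x7F000000]


def dereference(p: DbPtr) -> int:
    """Turn a primary pointer into a virtual address."""
    return offset_table[p.file_id] + p.offset


def dereference_secondary(file_base: int, offset: int) -> int:
    """Secondary pointer: the file is already known, so only an offset is stored."""
    return file_base + offset
```

Because each process has its own table, the same pointer stays valid even when two processes map the file at different addresses.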

13
Storage Allocation
  • Motivation
  • Control data should be stored separately from
    user data
  • Protects control data from stray pointers
  • Indirection should not exist at the lowest level
  • Indirection adds a level of latching for each
    data access and increases the path length of the
    dereference itself
  • Dalí exposes direct pointers to allocated data,
    providing time and space efficiency

14
Storage Allocation
  • Motivation
  • Large objects should be stored contiguously
  • Advantage is speed; recreating a file from
    smaller files takes away that advantage
  • Different recovery characteristics should be
    available for different regions of the database
  • Not all data needs to be recovered from a crash
  • Indexes can be rebuilt, etc.

15
Storage Allocation
  • Two levels of non-recovered data
  • Zeroed memory: remains allocated upon recovery,
    but its contents are zeroed
  • Transient memory: data is no longer allocated
    upon recovery

16
Segments and Chunks
  • Segment
  • Contiguous, page-aligned unit of allocation,
    arbitrarily large; database files are composed
    of segments
  • Chunk
  • A collection of segments

17
Segments and Chunks
18
Segments and Chunks
  • Allocators
  • Return standard Dalí pointers to allocated space
    within a chunk; indirection is not imposed at
    the storage-manager level
  • No record of allocated space is retained
  • 3 different allocators
  • Power-of-two: allocates buckets of size 2^i × m
  • Inline power-of-two: as above, but the free-space
    list uses the first few bytes of each free block

19
Segments and Chunks
  • Allocators (cont'd)
  • Coalescing allocator: merges adjacent free space;
    uses a free tree
  • Power-of-two and inline power-of-two: faster, but
    neither coalesces adjacent free space, which
    leads to fragmentation (thus fixed-size records
    only)
  • Coalescing: uses a free tree, based on the
    T-tree, to keep track of free space; logarithmic
    time for allocation and freeing
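
A minimal sketch of the power-of-two scheme, to make the bucket arithmetic concrete. It assumes m is a minimum block size (16 bytes here, chosen arbitrarily) and that bucket i holds blocks of 2^i × m bytes; the class and method names are hypothetical. In line with "no record of allocated space is retained", the caller passes the size back when freeing.

```python
class PowerOfTwoAllocator:
    """Toy allocator handing out offsets within one chunk."""

    def __init__(self, chunk_size: int, min_block: int = 16):
        self.m = min_block
        self.chunk_size = chunk_size
        self.next_free = 0      # start of the never-yet-allocated area
        self.free_lists = {}    # bucket index -> list of free offsets

    def bucket(self, size: int) -> int:
        """Smallest i such that 2**i * m >= size."""
        i = 0
        while (self.m << i) < size:
            i += 1
        return i

    def alloc(self, size: int) -> int:
        i = self.bucket(size)
        free = self.free_lists.get(i)
        if free:                      # reuse a freed block of this bucket
            return free.pop()
        block = self.m << i           # otherwise carve a new block
        if self.next_free + block > self.chunk_size:
            raise MemoryError("chunk exhausted")
        offset = self.next_free
        self.next_free += block
        return offset

    def free(self, offset: int, size: int) -> None:
        # No coalescing: the block simply returns to its bucket's list.
        self.free_lists.setdefault(self.bucket(size), []).append(offset)
```

A freed block is only ever reused for requests that round to the same bucket, which is why this style suits fixed-size records and fragments under mixed sizes.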

20
Page Table and Segment Headers
  • Segment header: associates info about a
    segment/chunk with a physical pointer
  • Allocated when a segment is added to a chunk
  • Can store additional info about data in the
    segment
  • Page table: maps pages to segment headers
  • Pre-allocated based on the maximum number of
    pages in the database

21
Transaction Management
  • Recovery
  • System Overview
  • Checkpointing

22
Transaction Management in Dalí
  • Transaction atomicity, isolation, and durability
    in Dalí
  • Regions - logically organized data
  • A tuple, an object or arbitrary data structure (a
    tree or a list)
  • Region lock - X or S lock that guards
    access/updates to a region

23
Multi-Level Recovery
  • Permits use of weaker operation locks in place of
    X/S region locks
  • Example: index management
  • An update to the index structure (e.g., an insert)
  • With physical undo, the undo description must
    remain valid until transaction commit, so region
    locks would have to be held that long
  • Result: an unacceptable level of concurrency

24
Multi-level Recovery
  • Replace low-level physical undo log records with
    higher-level logical undo log records
    (description at operation level)
  • For an insert, the logical-undo record replaces
    the physical-undo records by specifying that the
    inserted key must be deleted
  • Region locks can be released while less
    restrictive operation locks persist, giving a
    higher level of concurrency

25
Multi-level Recovery
  • An example of find and insert:
  • Releasing region locks would allow updates on the
    same region
  • Cascading aborts: rolling back the first
    operation physically would damage the effects of
    later actions
  • Only a compensating undo operation can be used to
    undo the operation

26
Multi-level Recovery Example
27
System Overview
  • Stored on disk
  • Two checkpoint images, Ckpt_A and Ckpt_B
  • cur_ckpt: anchor pointing to the most recent
    valid checkpoint image of the database
  • Single system log containing redo information,
    with its tail in memory
  • end_of_stable_log: pointer; all records prior to
    it have been flushed to the stable system log

28
System Overview
29
System Overview
  • Stored in the system database with each
    checkpoint
  • Active Transaction Table (ATT)
  • Stores separate redo and undo logs for each
    active transaction
  • dpt: dirty page table; records pages updated
    since the last checkpoint
  • ckpt_dpt: the dpt stored in a checkpoint

30
Transactions and Operations
  • Transaction: a list of operations
  • Each operation has a level Li associated with it
  • An operation at level Li can consist of
    operations at level L(i-1)
  • L0 operations are physical updates to regions
  • Pre-commit: the commit record enters the system
    log in memory
  • Commit: the commit record reaches stable storage

31
Logging Model
  • Updates generate physical undo and redo log
    records, appended to the Tx's undo and redo logs
    (in the ATT)
  • When a Tx pre-commits, its redo records are
    appended to the system log, and logical-undo
    descriptions are included in the operation-commit
    records in the system log
  • When an operation pre-commits, the undo log
    records for its sub-operations/updates are
    deleted from the Tx's undo log, and the
    operation's logical undo is appended to the Tx's
    undo log
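
The undo-log replacement at operation pre-commit can be sketched as follows. Everything here is a toy model with hypothetical names (Transaction, physical_update, precommit_operation): a dict stands in for the database, tuples stand in for log records, and the sketch assumes redo records are also moved to the system log at operation pre-commit, which the slide states explicitly only for Tx pre-commit.

```python
class Transaction:
    def __init__(self):
        self.undo_log = []   # local undo log (conceptually kept in the ATT)
        self.redo_log = []   # local redo log (conceptually kept in the ATT)

    def begin_operation(self) -> int:
        """Remember where this operation's undo records start."""
        return len(self.undo_log)

    def physical_update(self, store: dict, key, new_value) -> None:
        """L0 update: log the old value for undo, the new value for redo."""
        self.undo_log.append(("physical", key, store.get(key)))
        self.redo_log.append(("physical", key, new_value))
        store[key] = new_value

    def precommit_operation(self, mark: int, logical_undo, system_log: list) -> None:
        # Drop the physical undo records of the operation's updates ...
        del self.undo_log[mark:]
        # ... and replace them with one logical undo description.
        self.undo_log.append(("logical", logical_undo))
        # Assumption: redo records move to the system log here, followed by
        # an operation-commit record carrying the logical undo.
        system_log.extend(self.redo_log)
        system_log.append(("op-commit", logical_undo))
        self.redo_log.clear()
```

After an index insert that touched two regions, the undo log holds a single ("logical", ("delete", key)) record instead of two physical ones, so the region locks no longer need to be held.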

32
Logging Model
  • Locks released once Tx/operation pre-commits
  • System log flushed to disk when Tx commits
  • Dirty pages are marked in the dpt by the flushing
    procedure; no page latching

33
Ping-pong Checkpointing
  • Traditionally, systems implement write-ahead
    logging (WAL) for recovery; WAL cannot be
    enforced without latches
  • Latches increase access cost in main memory and
    interfere with normal processing
  • Solution: store two copies of the database image
    on disk; dirty pages are written to alternate
    checkpoints
  • Fuzzy checkpointing: no latches used, no
    interference with normal operations

34
Ping-pong Checkpointing
  • Checkpoints are allowed to be temporarily
    inconsistent: updates may be written out without
    their undo records
  • Redo and undo information from the ATT is then
    written out with the checkpoint, which brings it
    to a consistent state
  • If a failure occurs, the other checkpoint is
    still consistent and can be used for recovery

35
Ping-pong Checkpointing
  • A log flush is necessary at the end of
    checkpointing, before toggling cur_ckpt: a commit
    might take place before the ATT is written out,
    leaving no undo information if the system crashes
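
The ordering of a ping-pong checkpoint (write dirty pages to the other image, write the ATT, flush the log, then toggle cur_ckpt) can be sketched as below. This is a simplified model, not Dalí's code: dicts stand in for disk images, the PingPong class and its methods are hypothetical, and keeping one dirty set per image is an assumption made here so that every update eventually reaches both images.

```python
import copy


class PingPong:
    def __init__(self, pages: dict):
        # Two on-"disk" images, initially identical copies of the database.
        self.images = {
            "Ckpt_A": {"pages": dict(pages), "att": {}},
            "Ckpt_B": {"pages": dict(pages), "att": {}},
        }
        self.cur_ckpt = "Ckpt_A"   # anchor: most recent valid image
        self.dirty = {"Ckpt_A": set(), "Ckpt_B": set()}

    def note_update(self, page_id) -> None:
        """A page changed in memory: it is now stale in both images."""
        for pending in self.dirty.values():
            pending.add(page_id)

    def checkpoint(self, memory: dict, att: dict, flush_log) -> str:
        target = "Ckpt_B" if self.cur_ckpt == "Ckpt_A" else "Ckpt_A"
        image = self.images[target]
        # 1. Write dirty pages without latching (image may be inconsistent).
        for page_id in sorted(self.dirty[target]):
            image["pages"][page_id] = memory[page_id]
        self.dirty[target].clear()
        # 2. Write the ATT (undo/redo info) so the image can be made consistent.
        image["att"] = copy.deepcopy(att)
        # 3. Flush the log tail, and only then 4. toggle the anchor.
        flush_log()
        self.cur_ckpt = target
        return target
```

If a crash hits anywhere before the final assignment, cur_ckpt still names the previous image, which was left untouched by this checkpoint.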

36
Abort Processing
  • Upon abort, undo log records are undone by
    traversing the undo log sequentially from the end
  • A new physical-redo log record is created for
    every physical-undo record encountered
  • Similarly, for each logical-undo record, a
    compensation operation (the proxy) is executed
  • All of the proxy's undo log records are deleted
    when the proxy commits

37
Abort Processing
  • The commit record for a proxy is similar to
    compensation log records (CLRs) in ARIES
  • During recovery, the logical-undo log record is
    deleted from the Tx's undo log if a CLR is
    encountered, preventing it from being undone
    again
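
A rough sketch of the abort loop described on the last two slides. The record layouts, the abort function and the compensations table are all hypothetical, and a stored old value of None is taken to mean "the key did not exist", which is a convention of this sketch only.

```python
def abort(undo_log: list, store: dict, redo_log: list, compensations: dict) -> None:
    """Undo a transaction by walking its undo log from the end."""
    while undo_log:
        record = undo_log.pop()
        if record[0] == "physical":
            _, key, old_value = record
            if old_value is None:
                store.pop(key, None)       # key did not exist before
            else:
                store[key] = old_value
            # Every physical undo is itself logged as a physical redo.
            redo_log.append(("physical", key, old_value))
        else:
            _, (op_name, arg) = record
            # Logical undo: run the compensating operation (the proxy).
            compensations[op_name](store, arg, redo_log)
            # The proxy's commit record plays the role of a CLR.
            redo_log.append(("proxy-commit", op_name, arg))
```

During recovery, seeing the proxy-commit record for an operation signals that its logical undo has already been applied and must not be run a second time.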

38
Recovery
  • end_of_stable_log is where recovery begins
  • Initializes ATT and undo logs with copies from
    last checkpoint
  • Loads database image and sets dpt to zero
  • Applies all redo log records following
    begin-recovery-point
  • Then all active transactions are rolled back
  • First, all completed L0 operations must be rolled
    back, then L1, then L2, and so on

39
Post-commit Operations
  • Operations guaranteed to be carried out after
    commit of a transaction/operation, even if the
    system crashes
  • Some operations cannot be rolled back once
    performed (e.g., a deletion followed by
    allocation of the same space to a different
    operation)
  • Need to ensure high concurrency on the storage
    allocator, so locks cannot be held
  • Solution: perform these operations after the
    transaction commits (keep a post-commit log)

40
Fault Tolerance
  • Process Death and Its Detection

41
Fault Tolerance
  • Techniques that help cope with process failure
    scenarios

42
Process Death
  • Caused by an attempt to access invalid memory, or
    by an operator kill
  • Must return partially updated shared data to a
    consistent state
  • Abort any uncommitted transactions owned by that
    process
  • Cleanup server is primarily responsible for
    cleaning up dead processes

43
Process Death
  • Active Process Table (APT): keeps track of all
    processes in the system; scanned periodically to
    check whether any are dead
  • Low-level cleanup
  • A process registers any latch it acquires with
    the APT
  • If a latch is held by a dead process, the cleanup
    function for that latch is called
  • If the latch cannot be cleaned up, a system crash
    is simulated

44
Process Death
  • Cleaning up Transactions
  • A cleanup agent scans the Tx table and aborts any
    Tx running on behalf of the dead process, or
    executes post-commit actions for a committed Tx
  • Multiple cleanup agents are spawned if multiple
    processes have died

45
Protection from Application Errors
  • Memory protection
  • munprotect is called right before an update to a
    page, and mprotect after the Tx commits, to
    re-protect the pages
  • Codewords
  • Associate a logical parity word with each page of
    data
  • Erroneous writes update only the physical data,
    not the codeword; a crash is simulated if an
    error is found
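
One way to picture the codeword idea is an XOR parity over the 32-bit words of a page. This is a sketch under assumptions: the 64-byte page size, the XOR choice, and the names codeword, Page, update_word and audit are illustrative and not taken from Dalí.

```python
import struct

PAGE_SIZE = 64  # bytes; deliberately tiny for the example


def codeword(page) -> int:
    """XOR of all little-endian 32-bit words in the page."""
    cw = 0
    for (word,) in struct.iter_unpack("<I", page):
        cw ^= word
    return cw


class Page:
    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        self.cw = 0  # codeword of an all-zero page

    def update_word(self, index: int, value: int) -> None:
        """Legitimate update path: data and codeword change together."""
        offset = index * 4
        (old,) = struct.unpack_from("<I", self.data, offset)
        struct.pack_into("<I", self.data, offset, value)
        self.cw ^= old ^ value  # incremental parity maintenance

    def audit(self) -> bool:
        """True if the stored codeword still matches the page contents."""
        return codeword(self.data) == self.cw
```

A write that bypasses update_word, for example `page.data[20] ^= 0xFF`, leaves the codeword stale, so the next audit fails and the system can treat the page as corrupted.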

46
Concurrency Control
  • Implementation of Latches

47
Concurrency Control
  • Concurrency control facilities
  • Latches (low-level locks for mutual exclusion)
  • Queuing locks
  • Latch Implementation
  • Semaphores are too expensive because of
    system-call overhead
  • Implementation must complement cleanup server

48
Latch Implementation
49
Latch Implementation
  • Processes that wish to acquire a latch keep a
    pointer to that latch in their wants field
  • The cleanup-in-progress flag, when set to True,
    forbids processes from attempting to get the
    latch
  • The cleanup server waits for each process to set
    its wants field to null or to another latch, or
    to die
  • If a dead process is a registered owner of the
    latch, the cleanup function is called

50
Locking System
  • Lock header structure
  • Stores a pointer to a list of locks that have
    been requested (but not released) by transactions
  • Request times out if not granted in a certain
    amount of time
  • New lock modes can be added by defining their
    conflicts and covers relationships
  • covers: the holder of a lock of type A checks for
    conflicts when requesting a new lock of type B,
    unless A covers B
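
The conflicts/covers idea lends itself to a table-driven check. The sketch below uses only the familiar S and X modes; the CONFLICTS and COVERS tables and the can_grant function are hypothetical illustrations, not the Dalí lock manager's interface.

```python
# (held mode, requested mode) pairs that cannot coexist across transactions.
CONFLICTS = {("S", "X"), ("X", "S"), ("X", "X")}

# (held mode, requested mode) pairs where the held lock already implies
# the requested one, so no conflict check is needed.
COVERS = {("S", "S"), ("X", "S"), ("X", "X")}


def can_grant(requester: str, mode: str, granted: list) -> bool:
    """granted is a list of (transaction, mode) pairs on one lock header."""
    own_modes = [m for txn, m in granted if txn == requester]
    if any((held, mode) in COVERS for held in own_modes):
        return True  # covered by a lock the requester already holds
    # Otherwise check against locks held by other transactions.
    return not any(
        (held, mode) in CONFLICTS for txn, held in granted if txn != requester
    )
```

Adding a new lock mode then amounts to adding rows to the two tables rather than changing the granting logic.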

51
Collections and Indexing
  • Heap Files
  • Extendible Hashing

52
Collections and Indexing
  • Dalí provides higher-level interfaces for
    grouping related data items and for performing
    scans and associative access on items in a group
  • Heap file
  • Abstraction for handling a large number of
    fixed-length data items
  • Scans are supported through bitmaps in the
    segment header
  • Entries deleted from the heap are 0 in the bitmap
  • Bitmap mirrors the allocator's free list
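
A small sketch of how a bitmap can drive heap-file scans. The HeapSegment class is hypothetical: a Python list stands in for the fixed-length slots of one segment, and an integer is used as the bitmap that would live in the segment header.

```python
class HeapSegment:
    def __init__(self, n_slots: int):
        self.slots = [None] * n_slots
        self.bitmap = 0                               # bit i set: slot i is live
        self.free = list(range(n_slots - 1, -1, -1))  # free list, lowest slot last

    def insert(self, item) -> int:
        i = self.free.pop()
        self.slots[i] = item
        self.bitmap |= 1 << i
        return i

    def delete(self, i: int) -> None:
        self.bitmap &= ~(1 << i)   # deleted entries read as 0 in the bitmap
        self.slots[i] = None
        self.free.append(i)        # bitmap and free list change together

    def scan(self):
        """Yield live items in slot order, skipping free slots."""
        for i in range(len(self.slots)):
            if (self.bitmap >> i) & 1:
                yield self.slots[i]
```

The scan never consults the free list; it relies on the bitmap being kept in step with it on every insert and delete.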

53
Collections and Indexing
  • Extendible hashing
  • Similar to what was covered in CS 432
  • A utilization factor determines when to double
    the directory; this is more tolerant than a
    bucket-overflow trigger and avoids space and
    utilization problems

54
Extendible Hashing
55
T-tree indexes
  • Briefly: internal nodes, semi-leaf nodes, and
    leaf nodes
  • To search for a value, at each node check whether
    the key is bounded by the node's leftmost and
    rightmost key values. If so, the key is returned
    if it is contained in the node; otherwise,
    traverse the tree further down
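
The search rule above can be sketched as follows. TNode and search are hypothetical names, each node simply holds a sorted Python list of keys, and balancing, insertion and deletion are left out.

```python
from bisect import bisect_left


class TNode:
    def __init__(self, keys, left=None, right=None):
        self.keys = keys      # sorted, non-empty list of keys in this node
        self.left = left      # subtree with keys below keys[0]
        self.right = right    # subtree with keys above keys[-1]


def search(node, key) -> bool:
    while node is not None:
        if key < node.keys[0]:
            node = node.left              # below this node's range
        elif key > node.keys[-1]:
            node = node.right             # above this node's range
        else:
            # Bounded by the node: binary-search inside it and stop.
            i = bisect_left(node.keys, key)
            return node.keys[i] == key
    return False
```

For example, with a root holding [40, 50, 60] and children holding [10, 20, 30] and [70, 80], a search for 55 stops at the root without descending, because 55 is bounded by the root's keys but not present in it.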

56
Higher Level Interfaces
  • Two database management systems built on Dalí
  • Dalí Relational Manager
  • Main-memory ODE object-oriented database