Memory Constrained DBMS with Updates - PowerPoint PPT Presentation
1
Memory Constrained DBMS with Updates
  • Ashwini G. Rao
  • Guide
  • Prof. Krithi Ramamritham

2
Outline of the talk
  • Need for Handheld DBMS
  • New Issues in Implementation
  • Project Goals
  • Review of Existing Work
  • Compression in Storage
  • Transaction Management
  • Synchronization
  • Current Implementation Status
  • Conclusions and Future work

3
Handhelds
  • Small, Convenient, Carry anywhere
  • Powerful
  • E.g. Simputer- 206MHz, 32MB SDRAM, 24 MB Flash
    memory, LCD display, Smart card
  • Applications
  • Personal Info Management
  • E-diary
  • Enterprise Applications
  • Health-care, Micro-banking

4
Need for Handheld DBMS
  • Handheld applications
  • Volume of data is high
  • Simple and Complex Queries
  • select, project, aggregate
  • ACID properties of transactions
  • Require Data Privacy
  • Need Synchronization
  • Database management techniques are needed to
    meet the above requirements

5
New Issues in Implementation
  • Handheld DBMS vs. Disk DBMS
  • Handheld DB is Flash memory based
  • Flash read time is very small, unlike disk
  • Storage model should consider small memory and
    computation power
  • Transaction management and synchronization have
    to consider disconnections, mobility and
    communication cost
  • Handheld Operating Systems provide fewer
    facilities
  • E.g. no multi-threading support in PalmOS
  • Better security measures are required as
    handhelds are easily stolen, damaged and lost

6
Project Goals
  • Existing work
  • Storage models
  • Query processing optimization
  • Executor
  • My work
  • Compression in Storage
  • Transaction management
  • Synchronization

7
Existing Work Review
  • Storage Management
  • Aim at compactness in representation of data
  • Limited storage could preclude any additional
    index
  • Data model should try to incorporate some index
    information
  • Query Processing
  • Minimize writes to secondary storage
  • Efficient usage of limited main memory

8
Storage Management
  • Existing storage models
  • Flat Storage
  • Tuples are stored sequentially. Duplicates not
    eliminated
  • Pointer-based Domain Storage
  • Values partitioned into domains which are sets of
    unique values
  • Tuples reference the attribute value by means of
    pointers
  • One domain shared among multiple attributes
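The sharing described above can be sketched in a few lines; this is an illustrative Python mock-up (all names hypothetical), in which Python object references stand in for the ~4-byte pointers of domain storage:

```python
class Domain:
    """Shared pool of unique attribute values; tuples hold references into it.
    A sketch of pointer-based domain storage."""
    def __init__(self):
        self._pool = {}

    def intern(self, value):
        """Return the canonical shared copy of `value`, adding it if new."""
        if value not in self._pool:
            self._pool[value] = value
        return self._pool[value]

# One domain can be shared among multiple attributes (e.g. home/work city):
cities = Domain()
rows = [("alice", cities.intern("Mumbai"), cities.intern("Pune")),
        ("bob",   cities.intern("Mumbai"), cities.intern("Mumbai"))]
```

Each distinct value is stored once, however many tuples (or attributes) reference it.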

9
Storage Management (cont)
Flat Storage
Domain Storage

  • In Domain Storage, pointer of size p (typically 4
    bytes) points to the domain value. Can we further
    reduce the storage cost?

10
ID Based Storage
11
ID Based Storage
  • ID Storage
  • An identifier for each of the domain values
  • Store the smaller identifier instead of the
    pointer
  • Identifier is the positional value in the domain
    table. Use it as an offset into the domain table
  • D domain values can be distinguished by
    identifiers of ⌈log2(D)/8⌉ bytes
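The identifier-length formula above can be checked with a small helper (a hypothetical sketch, not from the talk):

```python
import math

def id_length_bytes(domain_size: int) -> int:
    """Bytes needed to distinguish `domain_size` values: ceil(log2(D) / 8)."""
    if domain_size <= 1:
        return 1
    bits = math.ceil(math.log2(domain_size))
    return max(1, math.ceil(bits / 8))
```

So up to 256 domain values fit in 1-byte identifiers, 257 values need 2 bytes, and a second boundary sits at 65,536 values.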

12
ID Storage (cont)
  • Extendable IDs are used: starting with 1-byte
    identifiers, the length of the identifier grows
    and shrinks depending on the number of domain
    values
  • To reduce reorganization of data, ID values are
    projected out from the rest of the relation and
    stored separately maintaining Positional
    Indexing.

13
ID Storage (cont)
  • Ping Pong Effect
  • At the boundaries, ID values are reorganized
    when the identifier length changes
  • Frequent insertions and deletions at the
    boundaries might result in a lot of
    reorganization
  • This phenomenon should be avoided
  • No deletion of domain values
  • The domain structure means a future insertion
    might reference the deleted value
  • Do not delete a domain value even if it is not
    referenced
  • Set a threshold for deletion of domain values
  • Delete only if the number of deletions exceeds a
    threshold
  • Increase the threshold when boundaries are being
    crossed to reduce the ping-pong effect
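A minimal sketch of the deferred-deletion idea, assuming a simplistic check for the 1-byte/2-byte boundary only (all names and constants are hypothetical):

```python
class DeferredDomain:
    """Domain with deferred deletion: values are only logically deleted, and
    the physical compaction that could shrink the ID length is postponed
    until enough values are dead. Near an ID-length boundary the threshold
    is raised to damp the ping-pong effect."""
    def __init__(self, threshold=32):
        self.values = []
        self.dead = set()
        self.threshold = threshold

    def live_count(self):
        return len(self.values) - len(self.dead)

    def effective_threshold(self):
        # Identifier length changes at 256 live values (1-byte/2-byte
        # boundary); raise the threshold when close to it.
        near_boundary = abs(self.live_count() - 256) < 8
        return self.threshold * 4 if near_boundary else self.threshold

    def delete(self, idx):
        self.dead.add(idx)                       # logical delete only
        if len(self.dead) > self.effective_threshold():
            self.compact()

    def compact(self):
        self.values = [v for i, v in enumerate(self.values)
                       if i not in self.dead]
        self.dead.clear()

d = DeferredDomain(threshold=2)
d.values = list("abcde")
d.delete(0); d.delete(1)      # still within the threshold: no reorganization
d.delete(2)                   # third dead value triggers compaction
```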

14
ID Storage (cont)
  • Primary Key-Foreign Key relationship
  • Primary key is a domain in itself
  • IDs for primary key values
  • Values present in child table are the
    corresponding primary key IDs
  • Projected foreign key column forms a Join Index

15
ID Storage (cont)
  • ID based Storage wins over Domain Storage when
    pointer size > ⌈log2(D)/8⌉
  • Relations in a small device do not have very
    high cardinality, so the above condition is true
    for most of the data
  • Advantages of ID storage
  • Considerable saving in storage cost.
  • Efficient join between parent table and child
    table
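A back-of-the-envelope comparison of per-column cost under the three models, ignoring structural overheads (a hypothetical helper, not from the talk):

```python
import math

def column_cost(num_tuples, domain_size, value_bytes, pointer_bytes=4):
    """Approximate storage cost in bytes of one column under flat storage,
    pointer-based domain storage, and ID-based storage."""
    bits = math.ceil(math.log2(max(domain_size, 2)))
    id_bytes = max(1, math.ceil(bits / 8))
    return {
        "flat":   num_tuples * value_bytes,
        "domain": num_tuples * pointer_bytes + domain_size * value_bytes,
        "id":     num_tuples * id_bytes + domain_size * value_bytes,
    }

# 10,000 tuples over a domain of 200 distinct 20-byte strings:
costs = column_cost(10_000, 200, 20)
```

With 200 domain values a 1-byte identifier suffices, so ID storage beats both 4-byte pointers and flat storage by a wide margin here.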

16
Query Processing
  • Considerations
  • Minimize writes to secondary storage
  • Use Main memory as write buffer
  • Need for Left-deep Query Plan
  • Reduce materialization in flash memory. If
    absolutely necessary use main memory
  • Bushy trees use materialization
  • Left deep tree is most suited for pipelined
    evaluation
  • Right operand in a left-deep tree is always a
    stored relation

17
Query Processing (cont)
  • Need for optimal memory allocation
  • Using nested loop algorithms for every operator
    ensures that the minimum amount of memory is
    used to execute the plan
  • Nested loop algorithms are inefficient
  • Different devices come with different memory
    sizes
  • Query plans should make efficient use of memory.
    Memory must be optimally allocated among all
    operators
  • Need to generate the best query execution plan
    depending on the available memory

18
Query Processing (cont)
  • Operator evaluation schemes
  • Different schemes for an operator
  • Schemes conform to left-deep tree query plan
  • All have different memory usage and cost
  • Cost of a scheme is the computation time

19
Query Processing (cont)
  • 2-Phase optimizer
  • Phase 1: The query is first optimized to get a
    query plan
  • Phase 2: Memory is divided among the operators
  • The scheme for every operator is determined in
    phase 1 and remains unchanged after phase 2;
    memory allocation in phase 2 is based on the
    cost functions of the schemes
  • Memory is assumed to be available for all the
    schemes, which may not be true for a resource
    constrained device
  • Traditional 2-phase optimization cannot be used

20
Query Processing (cont)
  • 1-Phase optimizer
  • Query optimizer is made memory cognizant
  • Modified optimizer takes into account division of
    memory among operators while choosing between
    plans
  • Ideally, 1-phase optimization should be done but
    the optimizer becomes complex.

21
Query Processing (cont)
  • Modified 2-phase optimizer
  • Optimal division of memory involves selecting
    the best scheme for every operator
  • Phase 1
  • Determine the optimal left-deep join order using
    dynamic programming approach
  • Phase 2
  • Divide memory among the operators
  • Choose the scheme for every operator depending on
    the memory allocated

22
Query Processing (cont)
  • Memory allocation algorithms
  • Exact memory allocation
  • Heuristic memory allocation
  • Conclusions
  • Response times are highest with minimum memory
    and lowest with maximum memory
  • The computing power of the handheld affects the
    response time in a big way
  • Heuristic memory allocation differed from the
    exact algorithm in only a few points

23
Compression in DB
  • Advantages
  • Saves space
  • Reduces read time and write time as less data is
    processed
  • Logging consumes less space and time
  • Disadvantages
  • CPU intensive
  • Competes with other CPU intensive DBMS tasks.
  • May slow down the DBMS

24
Compression in Disk DB
  • Main assumption
  • The high disk read time compensates for the
    extra time required for compression and
    decompression
  • E.g., suppose reading 10 blocks of data from
    the disk takes 10 ms, and compression plus
    decompression takes 5 ms. After compression the
    10 blocks occupy only 1 block.
  • Processing time with compression/decompression:
    (1 ms + 5 ms) = 6 ms, versus 10 ms without
  • Handheld DB is Flash memory based
  • Read time is very small, so the above assumption
    is no longer valid!
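The slide's arithmetic can be restated as a small helper; the flash read time of 0.01 ms/block is an illustrative assumption:

```python
def processing_time(blocks, read_ms_per_block, codec_ms, ratio):
    """Time to read the compressed data plus compression/decompression cost."""
    compressed_blocks = max(1, blocks // ratio)
    return compressed_blocks * read_ms_per_block + codec_ms

# Disk: 1 ms/block reads, 10:1 compression, 5 ms codec overhead.
disk_with = processing_time(10, 1.0, 5.0, 10)    # beats the 10 ms raw read
# Flash: reads assumed ~100x faster, so the codec overhead dominates.
flash_with = processing_time(10, 0.01, 5.0, 10)  # far worse than the raw read
```

On disk the compressed path wins (6 ms vs 10 ms); on flash it loses badly (about 5 ms vs 0.1 ms), which is exactly why the disk-DB assumption breaks down.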

25
Compression in Handhelds
  • Techniques can exploit high write time of flash
    memory
  • Logging
  • Compressed records consume less log space
  • Writing time is reduced
  • Decompression done when recovery is initiated
  • Highly beneficial if failures are rare
  • Saves communication cost when log records have to
    be sent over the network
  • E.g., Transaction management
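A sketch of compressed logging with zlib as the (assumed) codec; the record layout and names are hypothetical:

```python
import pickle
import zlib

def compress_log_record(record: dict) -> bytes:
    """Serialize and compress a log record before writing it to flash."""
    return zlib.compress(pickle.dumps(record))

def decompress_log_record(blob: bytes) -> dict:
    """Only needed when recovery is initiated (or on the remote server)."""
    return pickle.loads(zlib.decompress(blob))

record = {"txn": 42, "table": "address_book",
          "before": "old phone " * 20, "after": "new phone " * 20}
raw_len = len(pickle.dumps(record))
blob = compress_log_record(record)
restored = decompress_log_record(blob)
```

The compressed blob is what gets written to flash or shipped over the network; decompression cost is paid only on the rare recovery path.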

26
Compression in Handhelds (cont)
  • Data compression in Smart cards
  • Consider Handheld with Smart card support
  • Data stored in smart cards is accessed and
    updated
  • E.g., Personal database
  • Memory in smart cards is limited
  • Compression will save space
  • Data can be decompressed and processed in the
    handheld

27
Transaction Management
  • Ensure ACID properties of local and global
    transactions
  • Local transaction - Update address book entry in
    Simputer
  • Global transaction - Transfer money from a bank
    account to an epurse in a smart card attached to
    a Simputer
  • Issues
  • Frequent disconnections, resource constraints,
    mobility, loss or damage to handheld

28
Transaction Management (cont)
  • We will look into
  • Concurrency control
  • Atomicity
  • Local
  • Global
  • Consistency
  • Durability

29
Concurrency control
  • Concurrency in handhelds depends on
  • Multi-tasking support from the handheld OS
  • E.g., Linux in Simputer, PalmOS
  • User requirements
  • Several tasks may have to execute concurrently
  • E.g., A periodic synchronization task, address
    book access and an aggregation operation may run
    concurrently.
  • Strict 2PL, table level locks can be used
  • Small number of concurrent processes
  • Very few data conflicts
  • Table-level locking has small overhead and allows
    non-conflicting processes to continue execution
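A minimal sketch of table-granularity strict 2PL: locks are taken as tables are first touched and released only when the transaction ends (all names hypothetical):

```python
import threading

class TableLockManager:
    """Strict 2PL at table granularity: acquire on first touch,
    release everything only at commit/abort."""
    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def _lock_for(self, table):
        with self._guard:
            return self._locks.setdefault(table, threading.Lock())

    def acquire(self, txn, table):
        self._lock_for(table).acquire()   # blocks if another txn holds it
        txn.held.append(table)

    def release_all(self, txn):
        """Called only at commit or abort (the 'strict' part)."""
        for table in txn.held:
            self._locks[table].release()
        txn.held.clear()

class Txn:
    def __init__(self):
        self.held = []

mgr = TableLockManager()
txn = Txn()
mgr.acquire(txn, "address_book")
held_before = mgr._locks["address_book"].locked()
mgr.release_all(txn)
held_after = mgr._locks["address_book"].locked()
```

With only a handful of concurrent tasks and few conflicts, this coarse granularity keeps bookkeeping overhead minimal.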

30
Atomicity
  • Ensure the All or nothing property
  • Local atomicity
  • E.g., enter name, email, phone number in the
    address book of Simputer
  • Shadow based update vs. In place update
  • Global atomicity
  • E.g., In an epurse application the updates are
    made at the bank's server, the Simputer and the
    smart card
  • 2PC, optimizations to 2PC, 1PC

31
Local atomicity
  • Shadow based update
  • Advantages
  • No disk locality problem in handheld DB
  • Simplifies recovery
  • Disadvantages
  • Poorly adapted to Pointer based storage models
  • Cost increases with increase in size of flash
    memory
  • In place update
  • Uses WAL
  • Accommodates Pointer based storage models
  • Cost does not increase with size of flash memory
  • Buffer replacement policy is Steal
  • Dirty blocks can be written to Smart card storage
    to avoid Undo

32
Global atomicity
  • Two Phase Commit (2PC)
  • Most commonly used atomic commit protocol
  • Shortcomings in handheld scenario
  • Two rounds (voting and decision) of messages
    impose high communication overhead
  • Requires the handheld to be connected during the
    voting and decision phase
  • Large number of forced writes
  • Optimizations to 2PC
  • Presumed commit
  • Presumed abort
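The two message rounds can be illustrated with a toy coordinator that simply counts messages (purely a sketch; votes are passed in rather than obtained from real participants):

```python
def two_phase_commit(participants):
    """Coordinator-side sketch of 2PC. `participants` maps each name to its
    vote ('yes'/'no'); messages are counted to show the two-round overhead."""
    messages = 0
    # Phase 1: voting — prepare request out, vote reply back.
    votes = {}
    for name, vote in participants.items():
        messages += 2
        votes[name] = vote
    decision = "commit" if all(v == "yes" for v in votes.values()) else "abort"
    # Phase 2: decision — decision out, acknowledgement back.
    for name in participants:
        messages += 2
    return decision, messages

decision, msgs = two_phase_commit(
    {"bank": "yes", "simputer": "yes", "smartcard": "yes"})
```

Even with three well-behaved participants the coordinator exchanges a dozen messages, and the handheld must stay connected through both rounds, which motivates the 1PC variant on the next slide.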

33
Global atomicity (cont)
  • One Phase Commit (1PC)
  • Advantages
  • Only one round of messages- no voting phase
  • Handheld can disconnect as soon as log records
    are transferred to fixed server
  • Fewer forced writes
  • Transactions involving Smart card and Handheld
    can use 1PC
  • Disadvantages
  • Requires participants to enforce 2PL. Will work
    with weak levels of consistency under certain
    conditions. In heterogeneous environment it is
    difficult to control the local DBMS concurrency
    control policies.

34
Consistency and Durability
  • Consistency
  • Local consistency can be ensured by defining
    integrity constraints
  • Durability
  • Either the changes of the transaction or enough
    information about the changes is written to
    stable storage before the transaction commits
  • Network durability- transfer log records to a
    server on the fixed network.
  • 1PC ensures network durability
  • Pointer based logging
  • Extended ephemeral logging

35
Synchronization
  • Access data Anytime and Anywhere using the
    handheld
  • Mobile salesperson, Wireless warehouse
  • Problem: It is not possible to remain connected
    always
  • Solution: Replicate data in the handheld
  • Download a copy of the data into the handheld
    from the remote server and process it offline.
    Periodically merge the changes with the server

36
Synchronization - Issues
  • Data replication can lead to conflicts
  • Update-update, Update-delete, Unique key
    violation, Integrity constraint violation
  • Maintain global consistency between replicated
    copies
  • Strict consistency with Data partitioning
  • Strict consistency with Reservation protocols or
    Leases
  • Efficient when data is rarely shared
  • Weak consistency with Eventual consistency
  • Leases are restrictive when data is shared
    among many copies
  • Copies independently access and update data
  • Only tentative commits are possible
  • The actual commit happens when the transaction
    is executed at the server

37
Synchronization Issues (cont)
  • Application specific conflict detection and
    resolution
  • Maximum flexibility
  • Device, network and backend agnostic
  • XML, Unicode
  • Incremental maintenance
  • Save communication cost
  • Download parts of relations, i.e., views

38
Synchronization Existing Models
  • Publish Subscribe Model
  • Three tier
  • Enterprise applications
  • Independent updates
  • Eventual consistency
  • Conflict detection, resolution and merge
  • PC to Handheld Model
  • Two tier
  • Personal information

39
Publish Subscribe Model
  • Eventual consistency model
  • Merge replication in Win SQL CE, Oracle Lite
  • Publish Subscribe Process
  • Publication and article
  • Publishing
  • Subscribing
  • Subscription
  • Synchronization
  • Merging

40
Publish Subscribe Architecture
  • Application
  • SQL DB Engine
  • SQL Database
  • Client Agent
  • Server Agent
  • Merge Agent
  • Conflict Detection
  • Conflict Resolution
  • Replication Provider
  • SQL Server Database
  • Communication Link

41
Conflict Detection and Resolution
  • Conflict detection
  • Row level tracking
  • Associate RowID and Version with each row
  • RowID is used to uniquely identify each row
  • Version is used to check whether a given row
    has changed in the server
  • Conflict resolution
  • A conflict resolution procedure is invoked when a
    conflict is detected. The resolution procedure is
    created when the article is published
  • The output can be server wins or handheld wins;
    here the server always wins
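A sketch of version-based, server-wins merging (row layout and names are hypothetical):

```python
def merge_rows(server_rows, handheld_rows):
    """Row-level conflict detection: each row is keyed by RowID and carries
    (version, data). A handheld change conflicts if the server's version
    moved since the handheld's snapshot; conflicts resolve as server-wins."""
    merged, conflicts = dict(server_rows), []
    for rowid, (base_version, data) in handheld_rows.items():
        server_version, _ = server_rows.get(rowid, (base_version, None))
        if server_version == base_version:
            merged[rowid] = (base_version + 1, data)   # clean update
        else:
            conflicts.append(rowid)                    # server row kept as-is
    return merged, conflicts

server = {1: (3, "alice@new"), 2: (1, "bob@old")}
handheld = {1: (2, "alice@handheld"),   # server moved from v2 to v3: conflict
            2: (1, "bob@handheld")}     # versions match: clean update
merged, conflicts = merge_rows(server, handheld)
```

A real merge agent would also handle update-delete conflicts and constraint violations; only the version check is shown here.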

42
Row level tracking
(Diagram: steps 1 and 2 of row-level tracking)
43
Row level tracking (cont)
(Diagram: steps 3 and 4 of row-level tracking)
44
Current Implementation Status
  • Two Synchronization tools have been implemented
    for the Simputer
  • First Sync tool assumes that no updates are done
    in the handheld database
  • Second sync tool is based on Merge replication in
    Windows SQL CE. It allows independent updates in
    the handhelds.

45
Conclusions
  • Handheld DBMS techniques have to consider the
    resource constraints, mobility, frequent
    disconnections, and security aspects of the
    handheld
  • The techniques used for one component will
    influence the choice of the technique used in
    another component. There is a very strong
    interdependence between the components of the
    handheld DBMS
  • Techniques rejected for the disk environment may
    be explored in the handheld environment

46
Future work
  • Enhance the Sync tool
  • Transaction management component
  • Recovery management component
  • Concurrency control component
  • Performance analysis of existing compression
    techniques in handheld environment

47
References
51
  • Thank You

52
Query Processing (cont)
  • Benefit/Size of a scheme
  • Every scheme is characterized by a benefit/size
    ratio which represents its benefit per unit of
    memory allocated
  • The minimum scheme for an operator is the scheme
    that has maximum cost and minimum memory
  • Assume n schemes s1, s2, ..., sn implement an
    operator o
  • min(o) = smin such that for all i, 1 <= i <= n:
    Cost(si) <= Cost(smin) and
    Memory(si) >= Memory(smin)
  • smin is the minimum scheme for operator o
  • Benefit(si) = Cost(smin) - Cost(si)
  • Size(si) = Memory(si) - Memory(smin)
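The slides define Benefit and Size but not the allocation policy itself; one plausible sketch is a greedy pass that starts every operator at its minimum scheme and spends the remaining memory on the upgrades with the best benefit/size ratio (the greedy policy is my assumption, not from the talk):

```python
def allocate_memory(operators, budget):
    """operators: name -> list of (memory, cost) schemes, with the minimum
    scheme (least memory, highest cost) first. Returns the chosen scheme
    index per operator under the given memory budget."""
    chosen = {name: 0 for name in operators}            # start at minimum
    budget -= sum(schemes[0][0] for schemes in operators.values())
    # Rank candidate upgrades by benefit per unit of extra memory.
    upgrades = sorted(
        (((schemes[0][1] - c) / (m - schemes[0][0]), name, i)
         for name, schemes in operators.items()
         for i, (m, c) in enumerate(schemes[1:], start=1)),
        reverse=True)
    for _, name, i in upgrades:
        extra = operators[name][i][0] - operators[name][chosen[name]][0]
        if 0 < extra <= budget:
            budget -= extra
            chosen[name] = i
    return chosen

# Two operators, schemes as (memory, cost); 10 units of memory available.
plan = allocate_memory(
    {"join": [(1, 100), (4, 40), (10, 25)],
     "sort": [(1, 50), (5, 20)]},
    budget=10)
```

Here the join's first upgrade (ratio 20 per unit) is taken before the sort's (7.5 per unit), and the join's largest scheme no longer fits.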

53
Query Processing (cont)
  • Every operator is a collection of (size, benefit)
    points, n points for n schemes
  • Operator cost function is the collection of
    (cost, memory) points of its schemes