The DELite Project: Database Support for Embedded Lightweight Devices

Transcript and Presenter's Notes

1
The DELite Project: Database Support for
Embedded Lightweight Devices
  • Prof. Krithi Ramamritham

2
Outline of the talk
  • Need for small footprint DBMSs
  • New Issues in Implementation
  • Project Goals
  • Review of Existing Work
  • Current Implementation Status

3
Small DBMSs, e.g., for Handhelds
  • Small, Convenient, Carry anywhere
  • Powerful
  • E.g., Simputer: 206 MHz, 32 MB SDRAM, 24 MB flash
    memory, LCD display, smart card
  • Applications
  • Personal Info Management
  • E-diary
  • Enterprise Applications
  • Health-care, Micro-banking

4
Need for Handheld DBMS
  • Handheld applications
  • Volume of data is high
  • Simple and Complex Queries
  • select, project, aggregate
  • ACID properties of transactions
  • Require Data Privacy
  • Need Synchronization
  • Database management techniques are needed to
    meet the above requirements

5
New Issues in Implementation
  • Small DBMS vs. Disk DBMS
  • Handheld DB is Flash memory based
  • Flash read time is very small compared with disk
    read time
  • Storage model should consider small memory and
    computation power
  • Transaction management and synchronization have
    to consider disconnections, mobility and
    communication cost
  • Handheld operating systems provide fewer
    facilities
  • E.g. no multi-threading support in PalmOS
  • Better security measures are required as
    handhelds are easily stolen, damaged and lost

6
Project Goals
  • Existing work
  • Investigations of
  • Storage models
  • Query processing optimization
  • Executor
  • Proposed work
  • Compression in Storage
  • Transaction management
  • Synchronization

7
Existing Work Review
  • Storage Management
  • Aim at compactness in representation of data
  • Limited storage could preclude any additional
    index
  • Data model should try to incorporate some index
    information
  • Query Processing
  • Minimize writes to secondary storage
  • Efficient usage of limited main memory

8
Storage Management
  • Existing storage models
  • Flat Storage
  • Tuples are stored sequentially. Duplicates not
    eliminated
  • Pointer-based Domain Storage
  • Values partitioned into domains which are sets of
    unique values
  • Tuples reference the attribute value by means of
    pointers
  • One domain shared among multiple attributes

9
Storage Management (cont)
[Figure: Flat Storage and Domain Storage layouts]

  • In Domain Storage, pointer of size p (typically 4
    bytes) points to the domain value. Can we further
    reduce the storage cost?

10
ID Based Storage
11
ID Based Storage
  • ID Storage
  • An identifier for each of the domain values
  • Store the smaller identifier instead of the
    pointer
  • Identifier is the positional value in the domain
    table. Use it as an offset into the domain table
  • D domain values can be distinguished by
    identifiers of length $\lceil \log_2 D / 8 \rceil$ bytes.
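
The identifier scheme on this slide can be sketched in a few lines. This is a minimal illustration, not the DELite implementation: the `Domain` class and the `id_length_bytes` helper are hypothetical names. A value's ID is simply its position in the domain table, and the helper computes the smallest whole number of bytes that can distinguish D values.

```python
def id_length_bytes(num_values: int) -> int:
    """Smallest whole number of bytes able to distinguish num_values IDs."""
    if num_values < 1:
        raise ValueError("a domain needs at least one value")
    # IDs run from 0 to num_values - 1, so size the largest ID.
    bits = max(1, (num_values - 1).bit_length())
    return (bits + 7) // 8  # round up to whole bytes


class Domain:
    """Hypothetical domain table: a value's ID is its position in `values`."""

    def __init__(self):
        self.values = []   # domain table, indexed by ID
        self.ids = {}      # reverse map: value -> ID

    def encode(self, value):
        """Return the ID for value, appending it to the table if new."""
        if value not in self.ids:
            self.ids[value] = len(self.values)
            self.values.append(value)
        return self.ids[value]

    def decode(self, ident):
        """Use the ID as an offset into the domain table."""
        return self.values[ident]


cities = Domain()
column = [cities.encode(c) for c in ["Mumbai", "Pune", "Mumbai", "Delhi"]]
print(column, id_length_bytes(len(cities.values)))
```

With up to 256 distinct values a 1-byte ID is enough, compared with a typical 4-byte pointer.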

12
ID Storage (cont)
  • Extendable IDs are used. Length of the identifier
    grows and shrinks depending on the number of
    domain values
  • Identifiers start at 1 byte and are widened or
    narrowed as the domain grows or shrinks.
  • To reduce reorganization of data, ID values are
    projected out from the rest of the relation and
    stored separately maintaining Positional
    Indexing.

13
ID Storage (cont)
  • Ping-pong effect
  • At the boundaries, ID values are reorganized when
    the identifier length changes
  • Frequent insertions and deletions at the
    boundaries might result in a lot of
    reorganization
  • This phenomenon should be avoided
  • No deletion of domain values
  • Because of the domain structure, a future
    insertion might reference the deleted value
  • Do not delete a domain value even if it is not
    referenced
  • Setting a threshold for deletion of domain
    values
  • Delete only if the number of deletions exceeds a
    threshold
  • Increase the threshold when boundaries are being
    crossed to reduce the ping-pong effect
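
A small simulation can make the ping-pong effect concrete. The sketch below is an assumption-laden toy model rather than the policy used in DELite: every insert is treated as adding a new domain value, deletions are batched until they exceed a threshold, and the threshold is doubled each time the identifier width changes. The function names are hypothetical.

```python
def id_width(size: int) -> int:
    """Bytes needed for IDs of a domain holding `size` values."""
    return max(1, ((max(size, 1) - 1).bit_length() + 7) // 8)


def count_reorganizations(ops, start_size, threshold):
    """Count how often the ID column must be rewritten at a new width.

    threshold = 0 models eager deletion; a larger value batches deletions.
    """
    size = start_size   # entries physically present in the domain table
    pending = 0         # unreferenced entries not yet deleted
    reorgs = 0
    for op in ops:
        before = id_width(size)
        if op == "insert":
            size += 1
        elif op == "delete":
            pending += 1
            if pending > threshold:   # delete in one batch
                size -= pending
                pending = 0
        if id_width(size) != before:  # a width boundary was crossed
            reorgs += 1
            threshold *= 2            # make the next crossing less likely
    return reorgs


# Alternate inserts and deletes right at the 1-byte/2-byte boundary.
workload = ["insert", "delete"] * 50
eager = count_reorganizations(workload, start_size=256, threshold=0)
lazy = count_reorganizations(workload, start_size=256, threshold=8)
print(eager, lazy)
```

In this model eager deletion reorganizes on every operation, while the thresholded variant crosses the boundary only a handful of times, at the price of keeping some unreferenced domain values around.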

14
ID Storage (cont)
  • Primary Key-Foreign Key relationship
  • Primary key is a domain in itself
  • IDs for primary key values
  • Values present in child table are the
    corresponding primary key IDs
  • Projected foreign key column forms a Join Index

15
ID Storage (cont)
  • ID based Storage wins over Domain Storage when
    pointer size > $\lceil \log_2 D / 8 \rceil$ bytes
  • Relations in a small device do not have a very
    high cardinality.
  • The above condition holds for most of the data.
  • Advantages of ID storage
  • Considerable saving in storage cost.
  • Efficient join between parent table and child
    table
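
The comparison between the three storage models can be worked through numerically. The cost model below is a simplifying assumption, not a formula from the talk: it covers a single attribute with fixed-size values, charges the domain table once, and ignores all other overheads. The function names and the example sizes are hypothetical.

```python
def id_width_bytes(distinct: int) -> int:
    """Bytes needed for an ID that distinguishes `distinct` domain values."""
    return max(1, ((max(distinct, 1) - 1).bit_length() + 7) // 8)


def storage_costs(num_tuples, distinct, value_bytes, pointer_bytes=4):
    """Approximate bytes used to store one attribute under each model."""
    domain_table = distinct * value_bytes   # each unique value stored once
    return {
        "flat": num_tuples * value_bytes,                    # duplicates kept
        "domain": domain_table + num_tuples * pointer_bytes,
        "id": domain_table + num_tuples * id_width_bytes(distinct),
    }


# Example: 10,000 tuples, 200 distinct 20-byte values, 4-byte pointers.
print(storage_costs(num_tuples=10_000, distinct=200, value_bytes=20))
```

With 200 distinct values a 1-byte ID suffices, so the per-tuple reference shrinks from 4 bytes to 1 byte.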

16
Query Processing
  • Considerations
  • Minimize writes to secondary storage
  • Use Main memory as write buffer
  • Need for Left-deep Query Plan
  • Reduce materialization in flash memory. If
    absolutely necessary use main memory
  • Bushy trees use materialization
  • Left deep tree is most suited for pipelined
    evaluation
  • Right operand in a left-deep tree is always a
    stored relation
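
Pipelined evaluation of a left-deep plan can be sketched with Python generators. This is an illustrative toy, not the DELite executor: the relations, predicates, and function names are invented. Each join pulls tuples from its left input one at a time and rescans a stored relation on the right, so no intermediate result is materialized.

```python
def scan(relation):
    """Leaf operator: stream the tuples of a stored relation."""
    yield from relation


def nested_loop_join(left, right_relation, predicate):
    """Pipelined join: left is a tuple stream, right is a stored relation."""
    for l in left:
        for r in right_relation:       # right operand is rescanned in place
            if predicate(l, r):
                yield l + r            # hand the result straight to the parent


R = [(1, "a"), (2, "b")]
S = [(1, 10), (2, 20), (2, 21)]
T = [(10, "x"), (21, "y")]

# Left-deep plan ((R join S) join T), evaluated lazily tuple by tuple.
plan = nested_loop_join(
    nested_loop_join(scan(R), S, lambda l, r: l[0] == r[0]),
    T,
    lambda l, r: l[3] == r[0],
)
results = list(plan)
print(results)
```

A bushy plan would need to materialize at least one intermediate result, because a join whose right input is itself a join has nothing stored to rescan.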

17
Query Processing (cont)
  • Need for optimal memory allocation
  • Using nested loop algorithms for every operator
    ensures that the minimum amount of memory is
    used to execute the plan
  • Nested loop algorithms are inefficient
  • Different devices come with different memory
    sizes
  • Query plans should make efficient use of memory.
    Memory must be optimally allocated among all
    operators
  • Need to generate the best query execution plan
    depending on the available memory

18
Query Processing (cont)
  • Operator evaluation schemes
  • Different schemes for an operator
  • Schemes conform to left-deep tree query plan
  • All have different memory usage and cost
  • Cost of a scheme is the computation time

19
Query Processing (cont)
  • 2-Phase optimizer
  • Phase 1: The query is first optimized to get a
    query plan
  • Phase 2: Memory is divided among the operators
  • The scheme for every operator is determined in
    phase 1 and remains unchanged after phase 2.
    Memory allocation in phase 2 is based on the
    cost functions of the schemes
  • Memory is assumed to be available for all the
    schemes. This may not be true for a
    resource-constrained device
  • Traditional 2-phase optimization cannot be used
20
Query Processing (cont)
  • 1-Phase optimizer
  • Query optimizer is made memory cognizant
  • Modified optimizer takes into account division of
    memory among operators while choosing between
    plans
  • Ideally, 1-phase optimization should be done but
    the optimizer becomes complex.

21
Query Processing (cont)
  • Modified 2-phase optimizer
  • Optimal division of memory involves the decision
    of selecting the best scheme for every operator
  • Phase 1
  • Determine the optimal left-deep join order using
    dynamic programming approach
  • Phase 2
  • Divide memory among the operators
  • Choose the scheme for every operator depending on
    the memory allocated
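
Phase 2 can be pictured as picking one scheme per operator so that total memory stays within the budget and total cost is minimized. The sketch below is a guess at the general shape of the problem, not the algorithms from the talk: the operator names, schemes, and (memory, cost) figures are invented. It contrasts an exhaustive exact search with a greedy heuristic.

```python
from itertools import product

# Hypothetical schemes per operator: (scheme name, memory in KB, cost in ms).
OPERATORS = {
    "join_1": [("nested_loop", 1, 90), ("hash", 8, 20)],
    "join_2": [("nested_loop", 1, 60), ("hash", 6, 25)],
    "aggregate": [("scan", 1, 40), ("hash", 4, 15)],
}


def exact_allocation(operators, budget_kb):
    """Try every combination of schemes; return (cost, choice) or None."""
    names = list(operators)
    best = None
    for combo in product(*(operators[n] for n in names)):
        memory = sum(s[1] for s in combo)
        cost = sum(s[2] for s in combo)
        if memory <= budget_kb and (best is None or cost < best[0]):
            best = (cost, {n: s[0] for n, s in zip(names, combo)})
    return best


def greedy_allocation(operators, budget_kb):
    """Start from minimum-memory schemes, then repeatedly apply the upgrade
    that saves the most cost per extra KB while it still fits the budget."""
    choice = {n: min(s, key=lambda x: x[1]) for n, s in operators.items()}
    used = sum(s[1] for s in choice.values())
    if used > budget_kb:
        return None
    while True:
        best_gain, best_move = 0.0, None
        for name, schemes in operators.items():
            current = choice[name]
            for s in schemes:
                extra, saved = s[1] - current[1], current[2] - s[2]
                if extra > 0 and saved > 0 and used + extra <= budget_kb:
                    if saved / extra > best_gain:
                        best_gain, best_move = saved / extra, (name, s)
        if best_move is None:
            break
        name, s = best_move
        used += s[1] - choice[name][1]
        choice[name] = s
    cost = sum(s[2] for s in choice.values())
    return cost, {n: s[0] for n, s in choice.items()}


for budget in (12, 9):
    print(budget, exact_allocation(OPERATORS, budget), greedy_allocation(OPERATORS, budget))
```

With these made-up numbers the two approaches agree at a 12 KB budget and differ at 9 KB, where the greedy choice commits memory to the aggregate first and can no longer afford the better join scheme.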

22
Query Processing (cont)
  • Memory allocation algorithms
  • Exact memory allocation
  • Heuristic memory allocation
  • Conclusions
  • Response times are highest with minimum memory
    and lowest with maximum memory
  • Computing power of the handheld strongly affects
    the response time
  • Heuristic memory allocation differed from the
    exact algorithm at only a few points

23
Compression in DB
  • Advantages
  • Saves space
  • Reduces read time and write time as less data is
    processed
  • Logging consumes less space and time
  • Disadvantages
  • CPU intensive
  • Competes with other CPU intensive DBMS tasks.
  • May slow down the DBMS

24
Compression in Disk DB
  • Main assumption
  • The high disk read time compensates for the
    extra time required for compression and
    decompression
  • E.g., let the time taken to read 10 blocks of
    data from the disk be 10 ms, and let the time
    taken for compression and decompression be 5 ms.
    After compression, the 10 blocks occupy only 1
    block.
  • Processing time with compression/decompression:
  • (1 ms + 5 ms) = 6 ms, versus 10 ms uncompressed
  • Handheld DB is flash memory based
  • Read time is very low. The above assumption is no
    longer valid!
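
The break-even argument can be checked with a few lines of arithmetic. The disk figures below come from the slide's example. The flash read time of 0.05 ms per block is a purely hypothetical number chosen to show the effect, and the function names are illustrative.

```python
def plain_read_ms(blocks, ms_per_block):
    """Time to read the data uncompressed."""
    return blocks * ms_per_block


def compressed_read_ms(blocks, ratio, ms_per_block, codec_ms):
    """Time to read the compressed data plus compression/decompression time."""
    return (blocks / ratio) * ms_per_block + codec_ms


def compression_pays_off(blocks, ratio, ms_per_block, codec_ms):
    return compressed_read_ms(blocks, ratio, ms_per_block, codec_ms) < \
        plain_read_ms(blocks, ms_per_block)


# Disk example from the slide: 10 blocks in 10 ms, 10:1 compression, 5 ms codec.
print(compression_pays_off(blocks=10, ratio=10, ms_per_block=1.0, codec_ms=5.0))

# Flash with an assumed (hypothetical) read time of 0.05 ms per block.
print(compression_pays_off(blocks=10, ratio=10, ms_per_block=0.05, codec_ms=5.0))
```

On disk the 6 ms compressed path beats the 10 ms plain read. With the assumed flash timing the plain read takes about 0.5 ms, so the 5 ms of CPU work dominates and compression no longer pays for itself on read time alone.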

25
Transaction Management
  • Ensure ACID properties of local and global
    transactions
  • Local transaction - Update address book entry in
    Simputer
  • Global transaction - Transfer money from a bank
    account to an epurse in a smart card attached to
    a Simputer
  • Issues
  • Frequent disconnections, resource constraints,
    mobility, loss or damage to handheld

26
Synchronization
  • Access data Anytime and Anywhere using the
    handheld
  • Mobile salesperson, wireless warehouse
  • Problem: It is not always possible to remain
    connected
  • Solution: Replicate data in the handheld
  • Download a copy of the data into the handheld
    from the remote server and process it offline.
    Periodically merge the changes with the server

27
Synchronization: Issues
  • Data replication can lead to conflicts
  • Update-update, Update-delete, Unique key
    violation, Integrity constraint violation
  • Maintain global consistency between replicated
    copies
  • Strict consistency with Data partitioning
  • Strict consistency with Reservation protocols or
    Leases
  • Efficient when data is rarely shared
  • Weak consistency with Eventual consistency
  • Leases are restrictive when data is shared
    between many copies
  • Each copy accesses and updates data independently
  • Only tentative commits are possible on the
    handheld
  • The actual commit happens when the transaction is
    executed at the server
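
One way to picture tentative commits is as a log of operations that the server replays during synchronization, rejecting those that conflict with changes made elsewhere. The sketch below is an assumption, not the DELite sync protocol: it uses a per-record version number to detect update-update and update-delete conflicts, and all names and data are hypothetical.

```python
def replay_at_server(server, tentative_ops):
    """Replay handheld operations against the server copy.

    server:        dict mapping key -> (version, value)
    tentative_ops: list of (op, key, base_version, value), where base_version
                   is the version the handheld saw when it downloaded the data
    Returns (committed_keys, conflicts).
    """
    committed, conflicts = [], []
    for op, key, base_version, value in tentative_ops:
        current = server.get(key)
        if current is None:
            # The record was deleted at the server after the download.
            conflicts.append((key, "update-delete"))
        elif current[0] != base_version:
            # Another copy changed the record in the meantime.
            conflicts.append((key, "update-update"))
        elif op == "delete":
            del server[key]
            committed.append(key)
        else:
            server[key] = (base_version + 1, value)   # actual commit
            committed.append(key)
    return committed, conflicts


# Server state at sync time: "b" was updated and "c" was deleted by others.
server = {"a": (1, "x"), "b": (3, "y")}
handheld_log = [
    ("update", "a", 1, "x2"),
    ("update", "b", 2, "y2"),
    ("update", "c", 1, "z2"),
]
committed, conflicts = replay_at_server(server, handheld_log)
print(committed, conflicts)
```

How a conflict is resolved (retry, merge, or ask the user) is left open here. The point is only that a tentative commit becomes final once the server accepts it.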

28
Conclusions
  • Handheld DBMS techniques have to consider the
    resource constraints, mobility, frequent
    disconnections, and security aspects of the
    handheld
  • The techniques used for one component will
    influence the choice of the technique used in
    another component. There is a very strong
    interdependence between the components of the
    handheld DBMS
  • Techniques rejected for the disk environment may
    be explored in the handheld environment

29
Future work
  • Sync tool
  • Transaction management component
  • Recovery management component
  • Concurrency control component
  • Performance analysis of existing compression
    techniques in handheld environment
