The DELite Project: Database Support for Embedded Lightweight Devices

Slides: 34

Provided by: ashw152

1
The DELite Project: Database Support for
Embedded Lightweight Devices

Prof. Krithi Ramamritham

2
Outline of the talk

Need for small footprint DBMSs
New Issues in Implementation
Project Goals
Review of Existing Work
Current Implementation Status

3
Small DBMSs, e.g., for Handhelds

Small, Convenient, Carry anywhere
Powerful
E.g. Simputer: 206 MHz, 32 MB SDRAM, 24 MB Flash
memory, LCD display, smart card
Applications
Personal Info Management
E-diary
Enterprise Applications
Health-care, Micro-banking

4
Need for Handheld DBMS

Handheld applications
Volume of data is high
Simple and Complex Queries
select, project, aggregate
ACID properties of transactions
Require Data Privacy
Need Synchronization
Database management techniques are needed to
meet the above requirements

5
New Issues in Implementation

Small DBMS vs. Disk DBMS
Handheld DB is Flash memory based
Read time is very small compared to disk
Storage model should consider small memory and
computation power
Transaction management and synchronization have
to consider disconnections, mobility and
communication cost
Handheld operating systems provide fewer
facilities
E.g. no multi-threading support in PalmOS
Better security measures are required as
handhelds are easily stolen, damaged and lost

6
Project Goals

Existing work
Investigations of
Storage models
Query processing optimization
Executor
Proposed work
Compression in Storage
Transaction management
Synchronization

7
Existing Work Review

Storage Management
Aim at compactness in representation of data
Limited storage could preclude any additional
index
Data model should try to incorporate some index
information
Query Processing
Minimize writes to secondary storage
Efficient usage of limited main memory

8
Storage Management

Existing storage models
Flat Storage
Tuples are stored sequentially. Duplicates not
eliminated
Pointer-based Domain Storage
Values partitioned into domains which are sets of
unique values
Tuples reference the attribute value by means of
pointers
One domain shared among multiple attributes

9
Storage Management (cont)
Flat Storage
Domain Storage

In Domain Storage, pointer of size p (typically 4
bytes) points to the domain value. Can we further
reduce the storage cost?

10
ID Based Storage
11
ID Based Storage

ID Storage
An identifier for each of the domain values
Store the smaller identifier instead of the
pointer
Identifier is the positional value in the domain
table. Use it as an offset into the domain table
D domain values can be distinguished by
identifiers of length $\lceil \log_2 D / 8 \rceil$ bytes.
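
The ID scheme above can be illustrated with a minimal Python sketch. The function names (`id_width_bytes`, `encode_column`, `decode_column`), the little-endian packing, and the one-byte minimum are assumptions made for illustration; they are not taken from the DELite implementation.

```python
def id_width_bytes(num_values):
    """Smallest whole number of bytes that can distinguish num_values IDs."""
    if num_values <= 1:
        return 1  # assumption: always use at least one byte
    bits = (num_values - 1).bit_length()  # equals ceil(log2(num_values))
    return (bits + 7) // 8                # round bits up to whole bytes


def encode_column(values):
    """Replace each value by its position (ID) in a domain table."""
    domain = []  # domain table: the position in this list is the ID
    index = {}   # value -> ID, so the domain is not scanned on every insert
    ids = []
    for v in values:
        if v not in index:
            index[v] = len(domain)
            domain.append(v)
        ids.append(index[v])
    width = id_width_bytes(len(domain))
    packed = b"".join(i.to_bytes(width, "little") for i in ids)
    return domain, packed, width


def decode_column(domain, packed, width):
    """Use each stored ID as an offset into the domain table."""
    return [domain[int.from_bytes(packed[k:k + width], "little")]
            for k in range(0, len(packed), width)]


cities = ["Mumbai", "Pune", "Mumbai", "Delhi"]  # hypothetical column
domain, packed, width = encode_column(cities)
restored = decode_column(domain, packed, width)
```

With three distinct values, each reference costs one byte instead of a 4-byte pointer.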

12
ID Storage (cont)

Extendable IDs are used. The length of the
identifier depends on the number of domain
values
Starting with 1-byte identifiers, the length
grows or shrinks as domain values are added or
removed.
To reduce reorganization of data, ID values are
projected out from the rest of the relation and
stored separately maintaining Positional
Indexing.

13
ID Storage (cont)

Ping Pong Effect
At the boundaries, there is reorganization of ID
values
when the identifier length changes
Frequent insertions and deletions at the
boundaries might
result in a lot of reorganization
This phenomenon should be avoided
No deletion of domain values
A future insertion might reference the deleted
domain value
Do not delete a domain value even if it is not
referenced
Setting a threshold for deletion for domain
values
Delete only if number of deletions exceeds a
threshold
Increase the threshold when boundaries are being
crossed to reduce ping pong effect
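
A minimal sketch of the threshold idea, under stated assumptions: the class name `DomainPurgePolicy`, the starting threshold, and the doubling rule are all hypothetical. The slides only say that the threshold is increased when a boundary is crossed.

```python
class DomainPurgePolicy:
    """Decide when unreferenced domain values may actually be deleted."""

    def __init__(self, threshold=4):
        self.threshold = threshold
        self.pending = 0  # unreferenced domain values not yet removed

    def on_value_unreferenced(self):
        """Record one more unreferenced value; return True if a purge is due."""
        self.pending += 1
        return self.pending > self.threshold

    def on_purged(self, width_changed):
        """Reset after a purge; back off if the ID length changed."""
        self.pending = 0
        if width_changed:
            # Assumption: doubling is one simple way to raise the threshold
            # after a boundary crossing, so purges near it become rarer.
            self.threshold *= 2


policy = DomainPurgePolicy(threshold=2)
decisions = [policy.on_value_unreferenced() for _ in range(3)]
policy.on_purged(width_changed=True)
```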

14
ID Storage (cont)

Primary Key-Foreign Key relationship
Primary key is a domain in itself
IDs for primary key values
Values present in child table are the
corresponding primary key IDs
Projected foreign key column forms a Join Index
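
The join-index idea can be sketched as follows. The table contents and the function name `join_child_to_parent` are hypothetical; the only point illustrated is that a foreign key stored as a primary key ID acts as a direct offset into the parent table.

```python
# Parent table: the position of a row is the ID of its primary key value.
parent_rows = [("C001", "Asha"), ("C002", "Ravi"), ("C003", "Meena")]

# Child table with the foreign key column projected out and stored as IDs.
child_fk_ids = [0, 2, 0]
child_rest = [("A-10", 500), ("A-11", 1200), ("A-12", 75)]


def join_child_to_parent(parent_rows, child_fk_ids, child_rest):
    """Join by positional lookup; no searching or hashing is needed."""
    for fk_id, rest in zip(child_fk_ids, child_rest):
        yield parent_rows[fk_id] + rest


joined = list(join_child_to_parent(parent_rows, child_fk_ids, child_rest))
```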

15
ID Storage (cont)

ID based Storage wins over Domain Storage when
pointer size > $\lceil \log_2 D / 8 \rceil$
Relations in a small device do not have a very
high cardinality.
Above condition true for most of the data.
Advantages of ID storage
Considerable saving in storage cost.
Efficient join between parent table and child
table
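
As a worked example with hypothetical numbers (a relation of N = 10,000 tuples, one attribute with D = 200 domain values, and pointer size p = 4 bytes), the storage cost of the references alone compares as:

```latex
\begin{align*}
\text{ID length} &= \lceil \log_2 200 / 8 \rceil = \lceil 7.64 / 8 \rceil = 1 \text{ byte} \\
\text{Domain Storage references} &= N \cdot p = 10{,}000 \times 4 = 40{,}000 \text{ bytes} \\
\text{ID Storage references} &= N \cdot 1 = 10{,}000 \text{ bytes}
\end{align*}
```

The domain table itself is the same size in both models, so the saving comes entirely from the shorter references.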

16
Query Processing

Considerations
Minimize writes to secondary storage
Use Main memory as write buffer
Need for Left-deep Query Plan
Reduce materialization in flash memory. If
absolutely necessary use main memory
Bushy trees use materialization
Left deep tree is most suited for pipelined
evaluation
Right operand in a left-deep tree is always a
stored relation
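
A left-deep pipelined plan can be sketched with Python generators. The relations and the helper names `scan` and `nested_loop_join` are hypothetical; the sketch only shows that each join streams its left input and rescans a stored relation on the right, so no intermediate result is materialized.

```python
def scan(relation):
    """Stream the rows of a stored relation one at a time."""
    for row in relation:
        yield row


def nested_loop_join(left_stream, stored_relation, predicate):
    """Pipelined join: the left input is a stream, the right is stored."""
    for left_row in left_stream:
        for right_row in stored_relation:  # rescan per left row
            if predicate(left_row, right_row):
                yield left_row + right_row


customers = [(1, "Asha"), (2, "Ravi")]       # (customer_id, name)
accounts = [(10, 1), (11, 2), (12, 1)]       # (account_id, customer_id)
txns = [(100, 10, 500), (101, 12, 75)]       # (txn_id, account_id, amount)

# Left-deep tree: ((customers JOIN accounts) JOIN txns)
plan = nested_loop_join(
    nested_loop_join(scan(customers), accounts,
                     lambda c, a: c[0] == a[1]),
    txns,
    lambda ca, t: ca[2] == t[1],
)
results = list(plan)
```

Only one row per operator is held in memory at a time, which is why this shape suits a device with little main memory and costly flash writes.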

17
Query Processing (cont)

Need for optimal memory allocation
Using nested loop algorithms for every operator
ensures that the minimum amount of memory is used
to execute the plan
Nested loop algorithms are inefficient
Different devices come with different memory
sizes
Query plans should make efficient use of memory.
Memory must be optimally allocated among all
operators
Need to generate the best query execution plan
depending on the available memory

18
Query Processing (cont)

Operator evaluation schemes
Different schemes for an operator
Schemes conform to left-deep tree query plan
All have different memory usage and cost
Cost of a scheme is the computation time

19
Query Processing (cont)

2-Phase optimizer
Phase 1: Query is first optimized to get a query
plan
Phase 2: Memory is divided among the operators
The scheme for every operator is determined in
phase 1 and remains unchanged after phase 2.
Memory allocation in phase 2 is based on the cost
functions of the schemes
Memory is assumed to be available for all the
schemes. This may not be true for a
resource-constrained device
Traditional 2-phase optimization cannot be used

20
Query Processing (cont)

1-Phase optimizer
Query optimizer is made memory cognizant
Modified optimizer takes into account division of
memory among operators while choosing between
plans
Ideally, 1-phase optimization should be done but
the optimizer becomes complex.

21
Query Processing (cont)

Modified 2-phase optimizer
Optimal division of memory involves the decision
of selecting the best scheme for every operator
Phase 1
Determine the optimal left-deep join order using
dynamic programming approach
Phase 2
Divide memory among the operators
Choose the scheme for every operator depending on
the memory allocated
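
Phase 2 can be illustrated with a small sketch. The exhaustive search, the operator and scheme names, and all memory and cost figures below are assumptions for illustration; the slides do not specify how the exact or heuristic allocation algorithms work.

```python
from itertools import product


def allocate_memory(operators, budget):
    """Pick one scheme per operator, minimizing total cost within budget.

    operators maps an operator name to a list of
    (scheme_name, memory_needed, cost) tuples.
    Returns (chosen_schemes, total_cost), or None if nothing fits.
    """
    best_cost, best_choice = None, None
    for choice in product(*operators.values()):
        memory = sum(scheme[1] for scheme in choice)
        cost = sum(scheme[2] for scheme in choice)
        if memory <= budget and (best_cost is None or cost < best_cost):
            best_cost, best_choice = cost, choice
    if best_choice is None:
        return None
    return {op: s[0] for op, s in zip(operators, best_choice)}, best_cost


# Hypothetical schemes: (name, memory units, cost)
operators = {
    "join1": [("nested_loop", 1, 90), ("hash", 8, 20)],
    "join2": [("nested_loop", 1, 60), ("hash", 6, 15)],
    "aggregate": [("scan_based", 1, 40), ("hash", 4, 10)],
}

small = allocate_memory(operators, budget=3)
medium = allocate_memory(operators, budget=10)
large = allocate_memory(operators, budget=18)
```

With the smallest budget every operator falls back to its minimum-memory scheme; as the budget grows, memory goes to the operators where it reduces cost the most.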

22
Query Processing (cont)

Memory allocation algorithms
Exact memory allocation
Heuristic memory allocation
Conclusions
Response times are highest with minimum memory
and lowest with maximum memory
Computing power of the handheld significantly
affects the response time
Heuristic memory allocation differed from exact
algorithm in a few points only

23
Compression in DB

Advantages
Saves space
Reduces read time and write time as less data is
processed
Logging consumes less space and time
Disadvantages
CPU intensive
Competes with other CPU intensive DBMS tasks.
May slow down the DBMS

24
Compression in Disk DB

Main assumption
The high disk read time compensates for the
extra time required for compression and
decompression
E.g. Let time taken to read 10 blocks of data
from the disk be 10ms. Let the time taken for
compression and decompression be 5ms. After
compression 10 blocks occupy only 1 block.
Processing time with compression/decompression
= (1 ms + 5 ms) = 6 ms
Handheld DB is Flash memory based
Read time is very small. Above assumption is no
longer valid!!
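
The arithmetic can be repeated for both media in a few lines. The disk figures come from the example above; the flash read time of 0.1 ms per block is an assumed value chosen only to show the effect, and `processing_time_ms` is a hypothetical helper.

```python
def processing_time_ms(blocks, read_ms_per_block, codec_ms=0.0):
    """Read time for the given blocks plus any compression overhead."""
    return blocks * read_ms_per_block + codec_ms


# Disk: 1 ms per block (10 blocks in 10 ms), 5 ms codec, 10:1 compression.
disk_plain = processing_time_ms(10, 1.0)
disk_compressed = processing_time_ms(1, 1.0, codec_ms=5.0)

# Flash: assumed 0.1 ms per block, same codec cost and compression ratio.
flash_plain = processing_time_ms(10, 0.1)
flash_compressed = processing_time_ms(1, 0.1, codec_ms=5.0)
```

On disk, compression wins (6 ms against 10 ms). Under the assumed flash read time, the uncompressed read takes about 1 ms while the compressed path takes about 5.1 ms, so the codec cost dominates.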

25
Transaction Management

Ensure ACID properties of local and global
transactions
Local transaction - Update address book entry in
Simputer
Global transaction - Transfer money from a bank
account to an epurse in a smart card attached to
a Simputer
Issues
Frequent disconnections, resource constraints,
mobility, loss or damage to handheld

26
Synchronization

Access data Anytime and Anywhere using the
handheld
Mobile salesperson, wireless warehouse
Problem: Not possible to remain connected always
Solution: Replicate data in the handheld
Download a copy of the data into the handheld
from the remote server and process it offline.
Periodically merge the changes with the server

27
Synchronization: Issues

Data replication can lead to conflicts
Update-update, Update-delete, Unique key
violation, Integrity constraint violation
Maintain global consistency between replicated
copies
Strict consistency with Data partitioning
Strict consistency with Reservation protocols or
Leases
Efficient when data is rarely shared
Weak consistency with Eventual consistency
Leases are restrictive when data is shared
between many copies
Independently access and update data
only tentative commits possible
Actual commit when transaction is executed at
the server
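
Tentative commits and conflict detection at merge time can be sketched as below. The per-row version counter, the data layout, and the function name `merge_tentative` are assumptions for illustration; the slides do not describe how DELite detects conflicts.

```python
def merge_tentative(server, tentative_updates):
    """Apply handheld updates at the server, reporting conflicts.

    server maps key -> (version, value).
    Each tentative update is (key, base_version, new_value), where
    base_version is the version the handheld downloaded.
    """
    committed, conflicts = [], []
    for key, base_version, new_value in tentative_updates:
        if key not in server:
            # Row was deleted at the server after the handheld copied it.
            conflicts.append((key, "update-delete"))
        elif server[key][0] != base_version:
            # Row changed at the server since the handheld copied it.
            conflicts.append((key, "update-update"))
        else:
            # Actual commit happens here, at the server.
            server[key] = (base_version + 1, new_value)
            committed.append(key)
    return committed, conflicts


server = {"cust1": (3, "Pune"), "cust2": (5, "Delhi")}
updates = [
    ("cust1", 3, "Mumbai"),  # based on the current version
    ("cust2", 4, "Goa"),     # based on a stale version
    ("cust3", 1, "Agra"),    # row no longer exists at the server
]
committed, conflicts = merge_tentative(server, updates)
```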

28
Conclusions

Handheld DBMS techniques have to consider the
resource constraints, mobility, frequent
disconnections, and security aspects of the
handheld
The techniques used for one component will
influence the choice of the technique used in
another component. There is a very strong
interdependence between the components of the
handheld DBMS
Techniques rejected for the disk environment may
be explored in the handheld environment

29
Future work