Memory Constrained DBMS with Updates - PowerPoint PPT Presentation
1
Memory Constrained DBMS with Updates
  • Ashwini G. Rao
  • Guide
  • Prof. Krithi Ramamritham

2
Outline of the talk
  • Need for Handheld DBMS
  • New Issues in Implementation
  • Project Goals
  • Review of Existing Work
  • Compression in Storage
  • Transaction Management
  • Synchronization
  • Current Implementation Status
  • Conclusions and Future work

3
Handhelds
  • Small, Convenient, Carry anywhere
  • Powerful
  • E.g. Simputer- 206MHz, 32MB SDRAM, 24 MB Flash
    memory, LCD display, Smart card
  • Applications
  • Personal Info Management
  • E-diary
  • Enterprise Applications
  • Health-care, Micro-banking

4
Need for Handheld DBMS
  • Handheld applications
  • Volume of data is high
  • Simple and Complex Queries
  • select, project, aggregate
  • ACID properties of transactions
  • Require Data Privacy
  • Need Synchronization
  • Database management techniques are needed to
    meet the above requirements

5
New Issues in Implementation
  • Handheld DBMS vs. Disk DBMS
  • Handheld DB is Flash memory based
  • Flash read time is very small, unlike disk
  • Storage model should consider small memory and
    computation power
  • Transaction management and synchronization have
    to consider disconnections, mobility and
    communication cost
  • Handheld Operating Systems provide fewer
    facilities
  • E.g. no multi-threading support in PalmOS
  • Better security measures are required as
    handhelds are easily stolen, damaged and lost

6
Project Goals
  • Existing work
  • Storage models
  • Query processing optimization
  • Executor
  • My work
  • Compression in Storage
  • Transaction management
  • Synchronization

7
Existing Work Review
  • Storage Management
  • Aim at compactness in representation of data
  • Limited storage could preclude any additional
    index
  • Data model should try to incorporate some index
    information
  • Query Processing
  • Minimize writes to secondary storage
  • Efficient usage of limited main memory

8
Storage Management
  • Existing storage models
  • Flat Storage
  • Tuples are stored sequentially. Duplicates not
    eliminated
  • Pointer-based Domain Storage
  • Values partitioned into domains which are sets of
    unique values
  • Tuples reference the attribute value by means of
    pointers
  • One domain shared among multiple attributes
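The sharing described above can be sketched in a few lines; this is an illustrative Python mock-up (all names hypothetical), in which Python object references stand in for the ~4-byte pointers of domain storage:

```python
class Domain:
    """Shared pool of unique attribute values; tuples hold references into it.
    A sketch of pointer-based domain storage."""
    def __init__(self):
        self._pool = {}

    def intern(self, value):
        """Return the canonical shared copy of `value`, adding it if new."""
        if value not in self._pool:
            self._pool[value] = value
        return self._pool[value]

# One domain can be shared among multiple attributes (e.g. home/work city):
cities = Domain()
rows = [("alice", cities.intern("Mumbai"), cities.intern("Pune")),
        ("bob",   cities.intern("Mumbai"), cities.intern("Mumbai"))]
```

Each distinct value is stored once, however many tuples (or attributes) reference it.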

9
Storage Management (cont)
Flat Storage
Domain Storage

  • In Domain Storage, pointer of size p (typically 4
    bytes) points to the domain value. Can we further
    reduce the storage cost?

10
ID Based Storage
11
ID Based Storage
  • ID Storage
  • An identifier for each of the domain values
  • Store the smaller identifier instead of the
    pointer
  • Identifier is the positional value in the domain
    table. Use it as an offset into the domain table
  • D domain values can be distinguished by
    identifiers of ⌈log2(D)/8⌉ bytes
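The identifier-length formula above can be checked with a small helper (a hypothetical sketch, not from the talk):

```python
import math

def id_length_bytes(domain_size: int) -> int:
    """Bytes needed to distinguish `domain_size` values: ceil(log2(D) / 8)."""
    if domain_size <= 1:
        return 1
    bits = math.ceil(math.log2(domain_size))
    return max(1, math.ceil(bits / 8))
```

So up to 256 domain values fit in 1-byte identifiers, 257 values need 2 bytes, and a second boundary sits at 65,536 values.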

12
ID Storage (cont)
  • Extendable IDs are used: starting with 1-byte
    identifiers, the length of the identifier grows
    and shrinks depending on the number of domain
    values
  • To reduce reorganization of data, ID values are
    projected out from the rest of the relation and
    stored separately maintaining Positional
    Indexing.

13
ID Storage (cont)
  • Ping Pong Effect
  • At the boundaries, ID values are reorganized
    when the identifier length changes
  • Frequent insertions and deletions at the
    boundaries might result in a lot of
    reorganization
  • This phenomenon should be avoided
  • No deletion of domain values
  • The domain structure means a future insertion
    might reference the deleted value
  • Do not delete a domain value even if it is not
    referenced
  • Set a threshold for deletion of domain values
  • Delete only if the number of deletions exceeds a
    threshold
  • Increase the threshold when boundaries are being
    crossed to reduce the ping-pong effect
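A minimal sketch of the deferred-deletion idea, assuming a simplistic check for the 1-byte/2-byte boundary only (all names and constants are hypothetical):

```python
class DeferredDomain:
    """Domain with deferred deletion: values are only logically deleted, and
    the physical compaction that could shrink the ID length is postponed
    until enough values are dead. Near an ID-length boundary the threshold
    is raised to damp the ping-pong effect."""
    def __init__(self, threshold=32):
        self.values = []
        self.dead = set()
        self.threshold = threshold

    def live_count(self):
        return len(self.values) - len(self.dead)

    def effective_threshold(self):
        # Identifier length changes at 256 live values (1-byte/2-byte
        # boundary); raise the threshold when close to it.
        near_boundary = abs(self.live_count() - 256) < 8
        return self.threshold * 4 if near_boundary else self.threshold

    def delete(self, idx):
        self.dead.add(idx)                       # logical delete only
        if len(self.dead) > self.effective_threshold():
            self.compact()

    def compact(self):
        self.values = [v for i, v in enumerate(self.values)
                       if i not in self.dead]
        self.dead.clear()

d = DeferredDomain(threshold=2)
d.values = list("abcde")
d.delete(0); d.delete(1)      # still within the threshold: no reorganization
d.delete(2)                   # third dead value triggers compaction
```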

14
ID Storage (cont)
  • Primary Key-Foreign Key relationship
  • Primary key is a domain in itself
  • IDs for primary key values
  • Values present in child table are the
    corresponding primary key IDs
  • Projected foreign key column forms a Join Index

15
ID Storage (cont)
  • ID based Storage wins over Domain Storage when
    pointer size > ⌈log2(D)/8⌉
  • Relations in a small device do not have very
    high cardinality, so the above condition is true
    for most of the data
  • Advantages of ID storage
  • Considerable saving in storage cost.
  • Efficient join between parent table and child
    table
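A back-of-the-envelope comparison of per-column cost under the three models, ignoring structural overheads (a hypothetical helper, not from the talk):

```python
import math

def column_cost(num_tuples, domain_size, value_bytes, pointer_bytes=4):
    """Approximate storage cost in bytes of one column under flat storage,
    pointer-based domain storage, and ID-based storage."""
    bits = math.ceil(math.log2(max(domain_size, 2)))
    id_bytes = max(1, math.ceil(bits / 8))
    return {
        "flat":   num_tuples * value_bytes,
        "domain": num_tuples * pointer_bytes + domain_size * value_bytes,
        "id":     num_tuples * id_bytes + domain_size * value_bytes,
    }

# 10,000 tuples over a domain of 200 distinct 20-byte strings:
costs = column_cost(10_000, 200, 20)
```

With 200 domain values a 1-byte identifier suffices, so ID storage beats both 4-byte pointers and flat storage by a wide margin here.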

16
Query Processing
  • Considerations
  • Minimize writes to secondary storage
  • Use Main memory as write buffer
  • Need for Left-deep Query Plan
  • Reduce materialization in flash memory. If
    absolutely necessary use main memory
  • Bushy trees use materialization
  • Left deep tree is most suited for pipelined
    evaluation
  • Right operand in a left-deep tree is always a
    stored relation

17
Query Processing (cont)
  • Need for optimal memory allocation
  • Using nested loop algorithms for every operator
    ensures that the minimum amount of memory is
    used to execute the plan
  • Nested loop algorithms are inefficient
  • Different devices come with different memory
    sizes
  • Query plans should make efficient use of memory.
    Memory must be optimally allocated among all
    operators
  • Need to generate the best query execution plan
    depending on the available memory

18
Query Processing (cont)
  • Operator evaluation schemes
  • Different schemes for an operator
  • Schemes conform to left-deep tree query plan
  • All have different memory usage and cost
  • Cost of a scheme is the computation time

19
Query Processing (cont)
  • 2-Phase optimizer
  • Phase 1: The query is first optimized to get a
    query plan
  • Phase 2: Memory is divided among the operators
  • The scheme for every operator is determined in
    phase 1 and remains unchanged after phase 2;
    memory allocation in phase 2 is based on the
    cost functions of the schemes
  • Memory is assumed to be available for all the
    schemes, which may not be true for a resource
    constrained device
  • Traditional 2-phase optimization cannot be used

20
Query Processing (cont)
  • 1-Phase optimizer
  • Query optimizer is made memory cognizant
  • Modified optimizer takes into account division of
    memory among operators while choosing between
    plans
  • Ideally, 1-phase optimization should be done but
    the optimizer becomes complex.

21
Query Processing (cont)
  • Modified 2-phase optimizer
  • Optimal division of memory involves selecting
    the best scheme for every operator
  • Phase 1
  • Determine the optimal left-deep join order using
    dynamic programming approach
  • Phase 2
  • Divide memory among the operators
  • Choose the scheme for every operator depending on
    the memory allocated

22
Query Processing (cont)
  • Memory allocation algorithms
  • Exact memory allocation
  • Heuristic memory allocation
  • Conclusions
  • Response times are highest with minimum memory
    and lowest with maximum memory
  • The computing power of the handheld affects the
    response time in a big way
  • Heuristic memory allocation differed from the
    exact algorithm in only a few points

23
Compression in DB
  • Advantages
  • Saves space
  • Reduces read time and write time as less data is
    processed
  • Logging consumes less space and time
  • Disadvantages
  • CPU intensive
  • Competes with other CPU intensive DBMS tasks.
  • May slow down the DBMS

24
Compression in Disk DB
  • Main assumption
  • The high disk read time compensates for the
    extra time required for compression and
    decompression
  • E.g., suppose reading 10 blocks of data from
    the disk takes 10 ms, and compression plus
    decompression takes 5 ms. After compression the
    10 blocks occupy only 1 block.
  • Processing time with compression/decompression:
    (1 ms + 5 ms) = 6 ms, versus 10 ms without
  • Handheld DB is Flash memory based
  • Read time is very small, so the above assumption
    is no longer valid!
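The slide's arithmetic can be restated as a small helper; the flash read time of 0.01 ms/block is an illustrative assumption:

```python
def processing_time(blocks, read_ms_per_block, codec_ms, ratio):
    """Time to read the compressed data plus compression/decompression cost."""
    compressed_blocks = max(1, blocks // ratio)
    return compressed_blocks * read_ms_per_block + codec_ms

# Disk: 1 ms/block reads, 10:1 compression, 5 ms codec overhead.
disk_with = processing_time(10, 1.0, 5.0, 10)    # beats the 10 ms raw read
# Flash: reads assumed ~100x faster, so the codec overhead dominates.
flash_with = processing_time(10, 0.01, 5.0, 10)  # far worse than the raw read
```

On disk the compressed path wins (6 ms vs 10 ms); on flash it loses badly (about 5 ms vs 0.1 ms), which is exactly why the disk-DB assumption breaks down.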

25
Compression in Handhelds
  • Techniques can exploit high write time of flash
    memory
  • Logging
  • Compressed records consume less log space
  • Writing time is reduced
  • Decompression done when recovery is initiated
  • Highly beneficial if failures are rare
  • Saves communication cost when log records have to
    be sent over the network
  • E.g., Transaction management
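A sketch of compressed logging with zlib as the (assumed) codec; the record layout and names are hypothetical:

```python
import pickle
import zlib

def compress_log_record(record: dict) -> bytes:
    """Serialize and compress a log record before writing it to flash."""
    return zlib.compress(pickle.dumps(record))

def decompress_log_record(blob: bytes) -> dict:
    """Only needed when recovery is initiated (or on the remote server)."""
    return pickle.loads(zlib.decompress(blob))

record = {"txn": 42, "table": "address_book",
          "before": "old phone " * 20, "after": "new phone " * 20}
raw_len = len(pickle.dumps(record))
blob = compress_log_record(record)
restored = decompress_log_record(blob)
```

The compressed blob is what gets written to flash or shipped over the network; decompression cost is paid only on the rare recovery path.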

26
Compression in Handhelds (cont)
  • Data compression in Smart cards
  • Consider Handheld with Smart card support
  • Data stored in smart cards is accessed and
    updated
  • E.g., Personal database
  • Memory in smart cards is limited
  • Compression will save space
  • Data can be decompressed and processed in the
    handheld

27
Transaction Management
  • Ensure ACID properties of local and global
    transactions
  • Local transaction - Update address book entry in
    Simputer
  • Global transaction - Transfer money from a bank
    account to an epurse in a smart card attached to
    a Simputer
  • Issues
  • Frequent disconnections, resource constraints,
    mobility, loss or damage to handheld

28
Transaction Management (cont)
  • We will look into
  • Concurrency control
  • Atomicity
  • Local
  • Global
  • Consistency
  • Durability

29
Concurrency control
  • Concurrency in handhelds depends on
  • Multi-tasking support from the handheld OS
  • E.g., Linux in Simputer, PalmOS
  • User requirements
  • Several tasks may have to execute concurrently
  • E.g., A periodic synchronization task, address
    book access and an aggregation operation may run
    concurrently.
  • Strict 2PL, table level locks can be used
  • Small number of concurrent processes
  • Very few data conflicts
  • Table-level locking has small overhead and allows
    non-conflicting processes to continue execution
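A minimal sketch of table-granularity strict 2PL: locks are taken as tables are first touched and released only when the transaction ends (all names hypothetical):

```python
import threading

class TableLockManager:
    """Strict 2PL at table granularity: acquire on first touch,
    release everything only at commit/abort."""
    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def _lock_for(self, table):
        with self._guard:
            return self._locks.setdefault(table, threading.Lock())

    def acquire(self, txn, table):
        self._lock_for(table).acquire()   # blocks if another txn holds it
        txn.held.append(table)

    def release_all(self, txn):
        """Called only at commit or abort (the 'strict' part)."""
        for table in txn.held:
            self._locks[table].release()
        txn.held.clear()

class Txn:
    def __init__(self):
        self.held = []

mgr = TableLockManager()
txn = Txn()
mgr.acquire(txn, "address_book")
held_before = mgr._locks["address_book"].locked()
mgr.release_all(txn)
held_after = mgr._locks["address_book"].locked()
```

With only a handful of concurrent tasks and few conflicts, this coarse granularity keeps bookkeeping overhead minimal.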

30
Atomicity
  • Ensure the All or nothing property
  • Local atomicity
  • E.g., enter name, email, phone number in the
    address book of Simputer
  • Shadow based update vs. In place update
  • Global atomicity
  • E.g., In an epurse application the updates are
    made at the bank's server, the Simputer and the
    smart card
  • 2PC, optimizations to 2PC, 1PC

31
Local atomicity
  • Shadow based update
  • Advantages
  • No disk locality problem in handheld DB
  • Simplifies recovery
  • Disadvantages
  • Poorly adapted to Pointer based storage models
  • Cost increases with increase in size of flash
    memory
  • In place update
  • Uses WAL
  • Accommodates Pointer based storage models
  • Cost does not increase with size of flash memory
  • Buffer replacement policy is Steal
  • Dirty blocks can be written to Smart card storage
    to avoid Undo

32
Global atomicity
  • Two Phase Commit (2PC)
  • Most commonly used atomic commit protocol
  • Shortcomings in handheld scenario
  • Two rounds (voting and decision) of messages
    impose high communication overhead
  • Requires the handheld to be connected during the
    voting and decision phase
  • Large number of forced writes
  • Optimizations to 2PC
  • Presumed commit
  • Presumed abort
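The two message rounds can be illustrated with a toy coordinator that simply counts messages (purely a sketch; votes are passed in rather than obtained from real participants):

```python
def two_phase_commit(participants):
    """Coordinator-side sketch of 2PC. `participants` maps each name to its
    vote ('yes'/'no'); messages are counted to show the two-round overhead."""
    messages = 0
    # Phase 1: voting — prepare request out, vote reply back.
    votes = {}
    for name, vote in participants.items():
        messages += 2
        votes[name] = vote
    decision = "commit" if all(v == "yes" for v in votes.values()) else "abort"
    # Phase 2: decision — decision out, acknowledgement back.
    for name in participants:
        messages += 2
    return decision, messages

decision, msgs = two_phase_commit(
    {"bank": "yes", "simputer": "yes", "smartcard": "yes"})
```

Even with three well-behaved participants the coordinator exchanges a dozen messages, and the handheld must stay connected through both rounds, which motivates the 1PC variant on the next slide.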

33
Global atomicity (cont)
  • One Phase Commit (1PC)
  • Advantages
  • Only one round of messages- no voting phase
  • Handheld can disconnect as soon as log records
    are transferred to fixed server
  • Fewer forced writes
  • Transactions involving Smart card and Handheld
    can use 1PC
  • Disadvantages
  • Requires participants to enforce 2PL. Will work
    with weak levels of consistency under certain
    conditions. In heterogeneous environment it is
    difficult to control the local DBMS concurrency
    control policies.

34
Consistency and Durability
  • Consistency
  • Local consistency can be ensured by defining
    integrity constraints
  • Durability
  • Either the changes of the transaction or enough
    information about the changes is written to
    stable storage before the transaction commits
  • Network durability- transfer log records to a
    server on the fixed network.
  • 1PC ensures network durability
  • Pointer based logging
  • Extended ephemeral logging

35
Synchronization
  • Access data Anytime and Anywhere using the
    handheld
  • Mobile salesperson, Wireless warehouse
  • Problem: It is not possible to remain connected
    always
  • Solution: Replicate data in the handheld
  • Download a copy of the data into the handheld
    from the remote server and process it offline.
    Periodically merge the changes with the server

36
Synchronization - Issues
  • Data replication can lead to conflicts
  • Update-update, Update-delete, Unique key
    violation, Integrity constraint violation
  • Maintain global consistency between replicated
    copies
  • Strict consistency with Data partitioning
  • Strict consistency with Reservation protocols or
    Leases
  • Efficient when data is rarely shared
  • Weak consistency with Eventual consistency
  • Leases are restrictive when data is shared
    among many copies
  • Copies independently access and update data
  • Only tentative commits are possible
  • The actual commit happens when the transaction
    is executed at the server

37
Synchronization Issues (cont)
  • Application specific conflict detection and
    resolution
  • Maximum flexibility
  • Device, network and backend agnostic
  • XML, Unicode
  • Incremental maintenance
  • Save communication cost
  • Download parts of relations, i.e., views

38
Synchronization Existing Models
  • Publish Subscribe Model
  • Three tier
  • Enterprise applications
  • Independent updates
  • Eventual consistency
  • Conflict detection, resolution and merge
  • PC to Handheld Model
  • Two tier
  • Personal information

39
Publish Subscribe Model
  • Eventual consistency model
  • Merge replication in Win SQL CE, Oracle Lite
  • Publish Subscribe Process
  • Publication and article
  • Publishing
  • Subscribing
  • Subscription
  • Synchronization
  • Merging

40
Publish Subscribe Architecture
  • Application
  • SQL DB Engine
  • SQL Database
  • Client Agent
  • Server Agent
  • Merge Agent
  • Conflict Detection
  • Conflict Resolution
  • Replication Provider
  • SQL Server Database
  • Communication Link

41
Conflict Detection and Resolution
  • Conflict detection
  • Row level tracking
  • Associate RowID and Version with each row
  • RowID is used to uniquely identify each row
  • Version is used to check whether a given row
    has changed in the server
  • Conflict resolution
  • A conflict resolution procedure is invoked when a
    conflict is detected. The resolution procedure is
    created when the article is published
  • The output can be server wins or handheld wins;
    here the server always wins
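A sketch of version-based, server-wins merging (row layout and names are hypothetical):

```python
def merge_rows(server_rows, handheld_rows):
    """Row-level conflict detection: each row is keyed by RowID and carries
    (version, data). A handheld change conflicts if the server's version
    moved since the handheld's snapshot; conflicts resolve as server-wins."""
    merged, conflicts = dict(server_rows), []
    for rowid, (base_version, data) in handheld_rows.items():
        server_version, _ = server_rows.get(rowid, (base_version, None))
        if server_version == base_version:
            merged[rowid] = (base_version + 1, data)   # clean update
        else:
            conflicts.append(rowid)                    # server row kept as-is
    return merged, conflicts

server = {1: (3, "alice@new"), 2: (1, "bob@old")}
handheld = {1: (2, "alice@handheld"),   # server moved from v2 to v3: conflict
            2: (1, "bob@handheld")}     # versions match: clean update
merged, conflicts = merge_rows(server, handheld)
```

A real merge agent would also handle update-delete conflicts and constraint violations; only the version check is shown here.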

42
Row level tracking
(Diagram: steps 1 and 2 of row-level tracking)
43
Row level tracking (cont)
(Diagram: steps 3 and 4 of row-level tracking)
44
Current Implementation Status
  • Two Synchronization tools have been implemented
    for the Simputer
  • First Sync tool assumes that no updates are done
    in the handheld database
  • Second sync tool is based on Merge replication in
    Windows SQL CE. It allows independent updates in
    the handhelds.

45
Conclusions
  • Handheld DBMS techniques have to consider the
    resource constraints, mobility, frequent
    disconnections, and security aspects of the
    handheld
  • The techniques used for one component will
    influence the choice of the technique used in
    another component. There is a very strong
    interdependence between the components of the
    handheld DBMS
  • Techniques rejected for the disk environment may
    be explored in the handheld environment

46
Future work
  • Enhance the Sync tool
  • Transaction management component
  • Recovery management component
  • Concurrency control component
  • Performance analysis of existing compression
    techniques in handheld environment

47
References
51
  • Thank You

52
Query Processing (cont)
  • Benefit/Size of a scheme
  • Every scheme is characterized by a benefit/size
    ratio which represents its benefit per unit of
    memory allocated
  • The minimum scheme for an operator is the scheme
    that has maximum cost and minimum memory
  • Assume n schemes s1, s2, ..., sn implement an
    operator o
  • min(o) = smin such that for all i, 1 <= i <= n:
    Cost(si) <= Cost(smin) and
    Memory(si) >= Memory(smin)
  • smin is the minimum scheme for operator o
  • Benefit(si) = Cost(smin) - Cost(si)
  • Size(si) = Memory(si) - Memory(smin)
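The slides define Benefit and Size but not the allocation policy itself; one plausible sketch is a greedy pass that starts every operator at its minimum scheme and spends the remaining memory on the upgrades with the best benefit/size ratio (the greedy policy is my assumption, not from the talk):

```python
def allocate_memory(operators, budget):
    """operators: name -> list of (memory, cost) schemes, with the minimum
    scheme (least memory, highest cost) first. Returns the chosen scheme
    index per operator under the given memory budget."""
    chosen = {name: 0 for name in operators}            # start at minimum
    budget -= sum(schemes[0][0] for schemes in operators.values())
    # Rank candidate upgrades by benefit per unit of extra memory.
    upgrades = sorted(
        (((schemes[0][1] - c) / (m - schemes[0][0]), name, i)
         for name, schemes in operators.items()
         for i, (m, c) in enumerate(schemes[1:], start=1)),
        reverse=True)
    for _, name, i in upgrades:
        extra = operators[name][i][0] - operators[name][chosen[name]][0]
        if 0 < extra <= budget:
            budget -= extra
            chosen[name] = i
    return chosen

# Two operators, schemes as (memory, cost); 10 units of memory available.
plan = allocate_memory(
    {"join": [(1, 100), (4, 40), (10, 25)],
     "sort": [(1, 50), (5, 20)]},
    budget=10)
```

Here the join's first upgrade (ratio 20 per unit) is taken before the sort's (7.5 per unit), and the join's largest scheme no longer fits.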

53
Query Processing (cont)
  • Every operator is a collection of (size, benefit)
    points, n points for n schemes
  • Operator cost function is the collection of
    (cost, memory) points of its schemes