Title: PicoDBMS: Scaling down Database Techniques for the Smartcard
1PicoDBMS Scaling down DatabaseTechniques for
the Smartcard
- Authors Christophe Bobineau,
- Luc Bouganim,
- Philippe Pucheral,
- Patrick Valduriez
- Presented by ON Saitung
2Outline
- Smartcard Overview
- Motivation of DBMS on Smartcard
- Challenges of Scaling down DBMS for Smartcard
- PicoDBMS an elegant solution
- - Storage Model
- - Query Processing Model
- Conclusion
3Smartcard Products
4Smartcard Architecture
5Smartcard Overview
- Advantages
- - Portability (small size, light weight)
- - Security
- Physical view tamper proof
- Data view strong encrypted
- - Availability (anytime, anywhere, on any
terminal) - - Widely used
- banking, healthcare, insurance, etc.
- - Low cost (to be sold in large volumes)
6Smartcard Overview
- Disadvantages
- - Limited storage capacity (up to 128Kb EEPROM)
- - Very slow write speed in EEPROM (10ms/word)
- - Extremely small size of the RAM (i.e., 4KB)
- - Lack of autonomy
- i.e., no independent power supply, thus no
- asynchronous and disconnected processing
7Motivation of DBMS on Smartcard
- The Trend towards multi-application, versatility
8Motivation of DBMS on Smartcard
- Smartcard will become more powerful
- Moores law for processor and memory capacities
- To handle complex queries on a large volume of
data - To Keep data consistency
- i.e. to meet the transaction ACID properties
- The nature merit of DBMS
- Security concern
- unsuitable to implement DBMS outside smartcard
9Why should the traditional DBMS be scaled down?
- Recall the limitations on Smart Card
- - tiny RAM, little stable storage
- - costly write and lack of autonomy
- - high demand on security issues
- Traditional DBMS
- - Consume significant amounts of RAM
- - Use caching and asynchronous I/Os to reduce
- disk accesses
- - Complex algorithms
Conflict
Therefore, how to scale down database techniques
becomes the major problem!
10The existing solutions to scale down Database
techniques
Used on PDA, have a small footprint but still
use much RAM and stable memory
- - Sybase Adaptive Server Anywhere
- - Oracle 8i Lite
- - DB2 Everywhere
- - ISOLs SQLJava Machine DBMS
- - SCQL - standard for smart card database
language
Used on smart card, their optimization is limited
on simple select query
Therefore, further scaling down database
techniques is required!
11Database Management Requirements
Database management requirements of various
classes of smartcard applications
12Detailed study on Health Card Application
- The representative of personal folder
applications - The amount of data is significant (duplicates
exist) - Queries can be rather complex
- Mature access rights management are required
- Atomicity must be preserved.
- Durability is a must.
13Challenges of Scaling down DBMS for Smartcard
- How to minimize the data structures to fit on the
smart card? - How to minimize the RAM usage so that no overflow
happened? - How to minimize write operations to enhance
performance? - How to take advantage of the fast read operations
of EEPROM? - How to enforce ACID for transactions?
- How to minimize the algorithms complexity to
keep it secure?
14PicoDBMS Architecture
- Storage manager
- manages the storage of data indices
- Query manager
- processes query plans composed of select,
project, join and aggregates. - Transaction manager
- enforces the ACID properties.
- Access right manager
- provides access rights on base data and on
user-defined views.
15PicoDBMS Design Rules
- Compactness rule
- Minimize size of data indices
- RAM rule
- Minimize the RAM usage while executing
queries - Write rule
- Minimize writes in stable storage
- Read rule
- Take advantage of fast read of EEPROM
- Access rule
- Take advantage of direct access capability
of EEPROM - Security rule
- Never externalize protected data Simplify
algorithms
These rules are addressing the aforementioned
challenges!
16PicoDBMS Storage Model
- Existing storage models
- Storage cost evaluation
- - A combination of the existing storage models
- - Chooses the best storage model for each
attribute
17Flat Storage model
- Tuples are stored sequentially with attribute
- values embedded.
- Two drawbacks
- 1. Space consuming (cannot avoid value
duplicates) - 2. Inefficient (have to compute sequentially)
- Indexed FS
- solve the second drawback, but makes the first
- drawback worse.
Relation R
18Domain Storage model
- Factoring out values into a domain table for
- data compactness.
- Three advantages
- 1. Possible space saving
- 2. Amount of Writes can be possibly reduced
- 3. All tuples of all relations become
fixed-size - One disadvantage
- Inefficient (have to compute sequentially)
- Only use when
- 1. data size gt pointer size
- 2. duplicates exist
19Domain Storage Illustration
Relation S
Relation R
Domain value
Relation S
Relation R
20Join on Domain Storage
Relation R
Relation S
R.a
S.b
It naturally forms an unidirectional join
index from S.b to R.a
m
n
S.b foreign key referencing R.a
R.a primary key
e.g. select from S, R where S.b R.a
Cost O(m) v.s. O(mn)
21Ring Storage Model
- Address both index compactness and data
compactness. - Replacing Tuple to Value pointers with Value to
Tuple pointers - One pointer per domain value whatever the
cardinality is.
Relation S
attr
Relation S
Index on S.attr
attr
Domain
22Select on Ring Storage
Relation S
Relation S
attr
Index on S.attr
attr
Domain
e.g. Select from S where S.attr 1
23Join on Ring Storage
R.a
Relation R
Here, it naturally forms a bi-directional join
index between S.b and R.a
Relation S
S.b
m
n
S.b foreign key referencing R.a
R.a primary key
e.g. select from S, R where S.b R.a
Cost O(m2/(2n)) v.s. O(mn)
24Storage cost evaluation
- - The proposed storage model is a combination
of FS,DS,RS and Indexed FS. - - Cost evaluation helps to decide which model
to be adopted for storing each attributes. - - For attributes need indexing, the choice is
between RS and - Indexed FS
- - For attributes do not need indexing, the
choice is between FS and DS
25Parameters of Cost Model
- CardRel cardinality of the relation holding the
attribute - CardDom cardinality of the attribute domain
table - a Average length of attribute (bytes)
- p Pointer size (3 bytes..)
- S Selectivity factor of the attribute lt1
(redundancy of the attribute) SCardDom/CardRel
26Cost Formula
- Cost (FS) CardRela
- Cost (DS) CardRelpSCardRela
- Cost (Indexed FS) Cost (FS)SCardRelaCardRel
p - Cost (RS) Cost (DS)SCardRelp
- Cost (FS) Cost (DS) when S(a-p)/a - Cost
(Indexed FS) Cost (RS) when Sa/p
27FS vs. DS
- Cost (FS) Cost (DS) when S (a - p) /a
S
P 3
FS better
DS better
a
28Indexed FS vs. RS
- - Cost (Indexed FS) Cost (RS) when S a/p
P 3
S
FS better
RS better
a
29Traditional Query Processing
- Traditional Query Processing consumes large main
memory for storing - - Intermediate Results
- - Temporary data structures (e.g. hash
functions) - When no space to hold them, state-of-the-art
algorithms store them on disk to avoid memory
overflow
30Why not use these traditional algorithms?
- For a longer lifetime of stable memory, writes on
stable memory are forbidden. - Unable to estimate the required RAM size
- Those state-of-the-art algorithms are complex,
thus may bring security problems for smartcard.
A query processing technique which do not use any
RAM nor incur any writes in the stable memory is
proposed in PicoDBMS!
31Query Execution
- Query processing is done in two steps
- - Query execution plan (QEP) generation
- - Query engine executes this QEP
- Different shapes of QEP
- - Left deep tree
- - Right deep tree
- - Bushy tree
- - Extreme right deep tree
32Query Example (From Healthcare Database)
- Doctor (DocId,name,specialty,...)
- Prescription (VisId,DrugId,qty,...)
- Visit (VisId,DocId,date,diagnostics,...)
- Drug (DrugId,name,type)
- Query Q1 Who prescribes antibiotics in 1999
33Left deep tree
- Operators are executed sequentially - Each
intermediate result is stored in stable memory.
Query Q1 Who prescribes antibiotics in 1999
34Right deep tree
- Operators are executed in a pipelined fashion
- All left-operands must be stored in memory
Query Q1 Who prescribes antibiotics in 1999
35Bushy tree
- Both the intermediate results and left operands
need to be materialized.
Query Q1 Who prescribes antibiotics in 1999
36Extreme right tree
- Uses pipelining on every operator (including
select) - As left operands are always base
relations, they are already materialized in
stable memory. - Thus, no ram consumption and
no writes incur
Query Q1 Who prescribes antibiotics in 1999
37Iterator Model
- Pipeline execution can be easily achieved using
Iterator Model. - Each operator is an iterator that supports 3
procedure calls - 1. Open - prepare an operator for producing an
item - 2. next - to produce an item
- 3. Close - perform final clean up
- Starts from the root.
38Complex Query Execution
- Aggregate (e.g., count, min, max) - Sort -
Duplicate removal
- Rely on materializing intermediate results -
Thus, not suitable for smartcard
39Complex Query Execution without RAM
- If the incoming tuples are already grouped by
distinct values, aggregate and duplicate removal
can be done in pipeline. - Pipeline operators
are in order because tuples are consumed in their
arrival order.
Therefore, maintaining a right order at the
leaf of the execution tree allows pipelined
aggregation and duplicate removal!
40Example of Complex Query Execution
Query Q2 Number of antibiotics prescribed per
doctor in 1999
41Another Example
The output data should be grouped by type of
drugs rather than Drug.id, therefore, an
additional join is required between Drug and
Drug.type.
Query Q3 Number of prescription per type of
drug
42Performance Evaluation
- Evaluate whether PicoDBMS performance matches the
smartcard applications requirements - Evaluate which techniques are really necessary
- - ring indices
- - query optimization
-
43Performance Simple Query
For simple query,
Ring indices are necessary in medium and
large database!
Query optimization is useful for large
database!
Query Q1 Who prescribes antibiotics in 1999
44Performance Complex Query
For complex query,
Ring indices are necessary in medium and
large database!
Query optimization is useful for large
database!
Query Q4 Number of prescriptions per doctor
and type of drugs
45Conclusion
- Overall, a great contribution to industry
- - Ranked the best paper of VLDB 2000
- - Elegant ring-based storage model
- - Powerful query processing engine
- - Meets the requirements of smartcard
applications - Large potential for future deployment
46