CERIA Laboratory
Thesis presentation
Algebraic Signatures for Scalable Distributed Data Structures
Riad MOKADEM Riad.Mokadem_at_dauphine.fr
http://ceria.dauphine.fr/riadmokadem/riad.html
PLAN
1. Introduction
2. SDDS-2005
3. Algebraic Signatures
   - Backup Scheme, Concurrent Updates
4. Cumulative Algebraic Signatures
5. String Matching in SDDS-2005
6. Performance Measurements
7. Conclusion and Future Work
Introduction, SDDS-2005, Algebraic Signatures, Cumulative Algebraic Signatures, String Matching, Performance Measurement, Conclusion, Future Work
Facts, SDDS, SDDS RP, Objective
Facts
- New architectures → new data structures and file systems
  - Data in distributed RAM
  - Scalability
  - Parallel queries
- An SDDS is a new class of data structures
  - Specific to multicomputers, P2P networks and grids
  - For any application needing scalability and fast response time
SDDSs Family
SDDS RP Scheme
- Files are range-partitioned (RP)
  - Records are stored in distributed RAM
  - Record = (key, non-key field)
- Buckets split at the median key
  - As in a B-tree
- Clients are not synchronously informed about splits
  - A client may send a query to an incorrect server
- Servers forward incorrectly addressed queries and send back Image Adjustment Messages to adjust the client image
- Key search queries
- Range queries
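The client-image mechanism can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions, not the SDDS-2005 implementation: the class names `Bucket`, `RPFile` and `Client` are hypothetical, a bucket splits at its median key once it holds more than `capacity` records, and server-side forwarding is simulated by a lookup over all buckets rather than by server-to-server messages.

```python
import bisect


class Bucket:
    """A server bucket holding the records whose keys fall in [low, high)."""

    def __init__(self, low, high):
        self.low, self.high = low, high
        self.records = {}


class RPFile:
    """Hypothetical in-process model of an RP-style SDDS file (not SDDS-2005 code)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.buckets = [Bucket(float("-inf"), float("inf"))]

    def _split(self, b):
        # Split at the median key, as in a B-tree: the upper half moves to a new bucket.
        keys = sorted(b.records)
        median = keys[len(keys) // 2]
        new = Bucket(median, b.high)
        b.high = median
        for k in keys:
            if k >= median:
                new.records[k] = b.records.pop(k)
        self.buckets.append(new)

    def handle(self, addr, key, value=None):
        """Server side: serve or forward the request; return (result, IAM or None)."""
        b, iam = self.buckets[addr], None
        if not (b.low <= key < b.high):
            # Misaddressed: forward to the correct bucket and build an Image Adjustment Message.
            addr = next(i for i, c in enumerate(self.buckets) if c.low <= key < c.high)
            b = self.buckets[addr]
            iam = (b.low, addr)
        if value is None:
            return b.records.get(key), iam
        b.records[key] = value
        if len(b.records) > self.capacity:
            self._split(b)
        return None, iam


class Client:
    """Keeps a possibly outdated image: sorted (low bound, bucket address) pairs."""

    def __init__(self, f):
        self.f, self.image = f, [(float("-inf"), 0)]

    def _address(self, key):
        i = bisect.bisect_right([low for low, _ in self.image], key) - 1
        return self.image[i][1]

    def _call(self, key, value=None):
        result, iam = self.f.handle(self._address(key), key, value)
        if iam is not None and iam not in self.image:
            bisect.insort(self.image, iam)  # adjust the client image
        return result

    def insert(self, key, value):
        self._call(key, value)

    def search(self, key):
        return self._call(key)
```

A fresh client starts with a one-entry image; its first misaddressed request is forwarded, and the returned IAM adds the correct bucket to its image so that later requests for that key range go directly to the right server.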
Objective: New Capabilities for SDDS-2005
- Parallel store/restore of the file to/from disk storage
  - The SDDS backup scheme
- Concurrent access, useless update detection
  - Record update scheme
- Protection against incidental viewing of data in servers
  - Encoded data in buckets
- Scans (non-key parallel search)
  - Various string matches: prefix, string, longest common prefix, longest common string
SDDS-2005 Architecture, Internal Organization of Bucket SDDS-2005, Communication Server-Client, Demo Client Interface
SDDS-2005 Architecture
Multithreaded architecture
Internal Organization of an SDDS-2005 Bucket
- Index: a few KB up to a MB
- Data file: dozens of MB up to GBs
- Both are file-mapped structures
Communication Server - Client
(Figure: several SDDS-2005 clients send requests over the network to an SDDS-2005 server, whose threads process the client requests and return the responses.)
Demo: Client Interface
- Choice of search command
  - Search by key
  - Search by content
Algebraic Signatures, Backup/Restore in SDDS-2005, Update Scheme
Algebraic Signatures [ICDE04]
- Galois field $GF(2^f)$, $f \gg 1$
  - Each symbol has size f bits
  - f = 8 or f = 16 in SDDS-2005
- XOR is used for the + and - operations
- Antilog and log tables are used for × and /
- Using a primitive element $\alpha$
- $GF(2^8)$: ASCII strings; $GF(2^{16})$: Unicode strings
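The log/antilog approach can be sketched in a few lines for f = 8. The field polynomial 0x11D ($x^8 + x^4 + x^3 + x^2 + 1$) and the primitive element α = 2 are assumptions, chosen because they form a standard primitive pair; the slides do not say which polynomial SDDS-2005 uses.

```python
# Assumption: GF(2^8) built on x^8 + x^4 + x^3 + x^2 + 1 (0x11D), for which
# alpha = 2 is primitive; SDDS-2005's own polynomial is not stated here.
POLY = 0x11D

ANTILOG = [0] * 510   # alpha^i, stored twice so log sums need no modulo
LOG = [0] * 256       # LOG[alpha^i] = i (LOG[0] is unused)

x = 1
for i in range(255):
    ANTILOG[i] = ANTILOG[i + 255] = x
    LOG[x] = i
    x <<= 1             # multiply by alpha
    if x & 0x100:
        x ^= POLY       # reduce modulo the field polynomial


def gf_add(a, b):
    """Addition and subtraction are the same operation: XOR."""
    return a ^ b


def gf_mul(a, b):
    """a * b = antilog(log a + log b), with 0 handled separately."""
    if a == 0 or b == 0:
        return 0
    return ANTILOG[LOG[a] + LOG[b]]


def gf_div(a, b):
    """a / b = antilog(log a - log b)."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return ANTILOG[LOG[a] - LOG[b] + 255]
```

For f = 16 the same construction applies with tables of $2^{16} - 1$ entries.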
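Computing Algebraic Signatures
- 1-symbol signature
  - $\mathrm{Sign}_\alpha(P) = \sum_{i=1}^{n} p_i \alpha^i$
  - With $P = (p_1, p_2, \ldots, p_n)$ and $\vec{\alpha} = (\alpha, \alpha^2, \alpha^3, \ldots)$
- N-symbol signature
  - $\mathrm{Sign}_{\vec{\alpha}}(P) = (\mathrm{Sign}_{\alpha}(P), \mathrm{Sign}_{\alpha^2}(P), \ldots, \mathrm{Sign}_{\alpha^N}(P))$
  - Typical collision probability: $2^{-Nf}$
- In SDDS-2005: N = 1 or N = 2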
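A direct transcription of these formulas, as a sketch that assumes the same hypothetical GF(2^8) setup (polynomial 0x11D, α = 2) and one byte per symbol; `sign` and `sign_n` are illustrative names. The sum is computed with XOR, since addition in $GF(2^f)$ is XOR.

```python
# Sketch: 1-symbol and N-symbol algebraic signatures over GF(2^8).
# Assumption: primitive polynomial 0x11D with alpha = 2 (not necessarily SDDS-2005's choice).
POLY = 0x11D
EXP, LOG = [0] * 510, [0] * 256
x = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= POLY


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_pow(a, e):
    if e == 0:
        return 1
    return 0 if a == 0 else EXP[(LOG[a] * e) % 255]


def sign(page: bytes, beta: int) -> int:
    """1-symbol signature: XOR-sum of p_i * beta^i for i = 1..n."""
    s = 0
    for i, p in enumerate(page, start=1):
        s ^= gf_mul(p, gf_pow(beta, i))
    return s


def sign_n(page: bytes, n_symbols: int, alpha: int = 2) -> tuple:
    """N-symbol signature: (Sign_alpha, Sign_alpha^2, ..., Sign_alpha^N)."""
    return tuple(sign(page, gf_pow(alpha, k)) for k in range(1, n_symbols + 1))
```

With f = 8 each signature symbol takes 1 byte; with f = 16 and N = 2 the signature takes 4 bytes, the page-signature size used by the backup scheme. The detection guarantees assume pages shorter than $2^f - 1$ symbols, because $\alpha^i$ repeats with period $2^f - 1$.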
Backup Scheme [WDAS03]
- Need to back up the RAM file to disk
- Backup only
  - Protection against RAM failure
  - The file remains in RAM
- Eviction
  - RAM sharing among different SDDS files
- Restore
  - SDDS file loaded from disk into RAM
Backup Scheme [WDAS03]
- 2 mapped files: Data_file, Index_file
- Bucket paging
  - Signed data pages of 64 KB
  - Signed index pages of 256 B
- List of page signatures = bucket map
  - Also backed up on disk
- Page signature = algebraic signature
  - 2 symbols in $GF(2^{16})$, 4 B long
  - Much shorter than SHA-1 (20 B)
Parallel Backup on Disk Storage
- Write to disk only the parts (pages) changed since the last backup
(Figure: the client and the RAM buckets)
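The backup rule, writing only the pages whose signature changed, can be sketched as follows. Assumptions not taken from the slides: $GF(2^{16})$ built on $x^{16} + x^{12} + x^3 + x + 1$ (0x1100B) with α = 2, big-endian 16-bit symbols, and a Python dict standing in for the disk; `page_signature` and `backup` are hypothetical names. The 2-symbol signature is 4 bytes, as stated for SDDS-2005 page signatures.

```python
# Sketch: incremental backup driven by 2-symbol algebraic page signatures.
# Assumptions (not from the slides): GF(2^16) on x^16 + x^12 + x^3 + x + 1 (0x1100B)
# with alpha = 2, big-endian 16-bit symbols, and a dict standing in for the disk.
POLY, ORDER = 0x1100B, 0xFFFF
EXP, LOG = [0] * ORDER, [0] * (ORDER + 1)
x = 1
for i in range(ORDER):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x10000:
        x ^= POLY


def page_signature(page: bytes) -> tuple:
    """(sum p_i * alpha^i, sum p_i * alpha^(2i)) over the 16-bit symbols of the page."""
    if len(page) % 2:
        page += b"\x00"                   # pad to a whole number of symbols
    s1 = s2 = 0
    for i in range(1, len(page) // 2 + 1):
        p = int.from_bytes(page[2 * i - 2:2 * i], "big")
        if p:
            s1 ^= EXP[(LOG[p] + i) % ORDER]
            s2 ^= EXP[(LOG[p] + 2 * i) % ORDER]
    return s1, s2                         # 2 x 16 bits = 4 bytes


def backup(bucket: bytes, bucket_map: dict, disk: dict, page_size: int = 64 * 1024) -> list:
    """Write to 'disk' only the pages whose signature differs from the bucket map."""
    written = []
    for start in range(0, len(bucket), page_size):
        n = start // page_size
        page = bucket[start:start + page_size]
        sig = page_signature(page)
        if bucket_map.get(n) != sig:      # page changed since the last backup
            disk[n] = page
            bucket_map[n] = sig           # the bucket map is kept (and backed up) too
            written.append(n)
    return written
```

A changed page whose signature happens to collide would be skipped; with N = 2 and f = 16 the typical collision probability $2^{-Nf}$ is $2^{-32}$.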
Update Scheme [PDMST DEXA06]
- Normal update
  - Compare Signature_before and Signature_after of each record
  - Send the update only if these signatures differ
  - The client sends only the data that actually changed
- Blind update
  - No prior search of the record
Concurrency Management
- Record signatures are used as timestamps
- A client reads any record without waiting
- It sends back the Before_Signature for comparison with the stored one
- There is a conflict if these signatures differ
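The normal update and its concurrency check can be sketched as optimistic concurrency control with signatures in place of timestamps. Assumptions: 2-symbol signatures over GF(2^8) (polynomial 0x11D, α = 2), records shorter than 255 bytes, and the hypothetical names `Server` and `client_update`; blind updates are not modeled.

```python
# Sketch of signature-based updates with optimistic concurrency control.
# Assumptions: 2-symbol signatures in GF(2^8) (polynomial 0x11D, alpha = 2), records
# shorter than 255 bytes; Server, client_update and their methods are hypothetical names.
EXP, LOG = [0] * 510, [0] * 256
x = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = x
    LOG[x] = i
    x = (x << 1) ^ (0x11D if x & 0x80 else 0)


def sig(rec: bytes) -> tuple:
    """2-symbol algebraic signature: (sum p_i*alpha^i, sum p_i*alpha^(2i))."""
    s1 = s2 = 0
    for i, p in enumerate(rec, start=1):
        if p:
            s1 ^= EXP[(LOG[p] + i) % 255]
            s2 ^= EXP[(LOG[p] + 2 * i) % 255]
    return s1, s2


class Server:
    def __init__(self):
        self.records = {}                      # key -> (data, stored signature)

    def insert(self, key, data):
        self.records[key] = (data, sig(data))

    def read(self, key):
        return self.records[key][0]            # no lock, no wait

    def update(self, key, new, before_sig):
        if self.records[key][1] != before_sig:
            return "conflict"                  # another client changed the record meanwhile
        self.records[key] = (new, sig(new))
        return "ok"


def client_update(server, key, modify):
    before = server.read(key)
    after = modify(before)
    if sig(after) == sig(before):
        return "no change"                     # useless update detected: nothing is sent
    return server.update(key, after, sig(before))
```

A "conflict" result means the Before_Signature no longer matches the stored one, so the second of two concurrent writers is rejected instead of silently overwriting the first (a lost update).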
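Concurrent Blind Updates
(Figure: a client and servers s1, s2, ..., sk holding buckets of data in RAM)
- The client computes signature_before: (R1) → Sgn1 on s1, (R2) → Sgn2 on s2
- The server computes Sgn2' for the stored record (R2):
  - Sgn2' = Sgn2 → update (R2) → v2
  - Sgn2' ≠ Sgn2 → concurrent update by another client → no update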
Cumulative Algebraic Signatures, Computing Cumulative Signatures, Protection Against Incidental Viewing of Data, Encoding/Decoding
Cumulative Algebraic Signatures [VLDB-DBISP2P05]
- Encodes each symbol $p_i$ of the record $P = (p_1, p_2, \ldots, p_i, \ldots, p_n)$ with the signature of the prefix ending at $p_i$
- Protects against incidental data viewing on the servers
  - Decoding is necessary
- Speeds up string matching
Encoding/Decoding
(Figure: record structure = key data + non-key data)
- Encoding/decoding concerns only the non-key data
- Encoding/decoding is done in the clients (signatures are calculated in the clients)
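Encoding/Decoding
- Encoding: $P = (p_1, p_2, \ldots, p_n) \rightarrow P' = (p'_1, p'_2, \ldots, p'_n)$, with $p'_i = p'_{i-1} + p''_i = p'_{i-1} \oplus p''_i$ and $p''_i = p_i \alpha^i = \mathrm{antilog}(\log p_i + i)$, where $p'_0 = 0$
- Decoding: $P' = (p'_1, p'_2, \ldots, p'_n) \rightarrow P = (p_1, p_2, \ldots, p_n)$, with $p_i = p''_i / \alpha^i = \mathrm{antilog}(\log p''_i - i)$ and $p''_i = p'_i - p'_{i-1} = p'_i \oplus p'_{i-1}$
- Logarithms are taken modulo $2^f - 1$
(Figure: on insertion the client encodes P and sends it to servers Serv 1, Serv 2, Serv 3; the servers perform signature matching; the client decodes the records it receives.)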
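The encoding and decoding formulas translate almost line for line into code. This sketch assumes GF(2^8) with polynomial 0x11D and α = 2, positions counted from i = 1, and $p'_0 = 0$; `encode` and `decode` are illustrative names, not SDDS-2005 functions.

```python
# Sketch of cumulative-signature encoding/decoding of non-key data in GF(2^8).
# Assumptions: polynomial 0x11D, alpha = 2, positions start at i = 1, p'_0 = 0.
EXP, LOG = [0] * 510, [0] * 256
x = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = x
    LOG[x] = i
    x = (x << 1) ^ (0x11D if x & 0x80 else 0)


def encode(p: bytes) -> bytes:
    """p'_i = p'_(i-1) XOR p''_i, where p''_i = p_i * alpha^i = antilog(log p_i + i)."""
    out, acc = bytearray(), 0
    for i, pi in enumerate(p, start=1):
        if pi:                                   # p_i = 0 contributes nothing
            acc ^= EXP[(LOG[pi] + i) % 255]
        out.append(acc)
    return bytes(out)


def decode(e: bytes) -> bytes:
    """p''_i = p'_i XOR p'_(i-1); p_i = antilog(log p''_i - i)."""
    out, prev = bytearray(), 0
    for i, ei in enumerate(e, start=1):
        d = ei ^ prev
        out.append(EXP[(LOG[d] - i) % 255] if d else 0)
        prev = ei
    return bytes(out)
```

Each encoded byte $p'_i$ is the signature of the prefix $p_1 \ldots p_i$, so the last byte equals $\mathrm{Sign}(P)$; prefix search relies on exactly this property.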
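Protection Against Incidental Viewing of Data
Example: P = SOUTENANCE_RIAD_MOKADEM
- Encoding: $P' = (p'_1, p'_2, \ldots, p'_{23}) = (\alpha S,\ p'_1 + \alpha^2 O,\ \ldots,\ p'_{22} + \alpha^{23} M)$
- $\mathrm{Sign}(P) = p'_{23} = \alpha S + \alpha^2 O + \cdots + \alpha^{23} M$ (the encoded symbol at M)
- SDDS-2005: 1 symbol (1 B) per signature in $GF(2^8)$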
Our Approach, Prefix Search, Complete Search, Sequential Search, N-Gram Search, Longest Common Prefix, Longest Common String
String Search [LNCS05]
Previous work (string matching): Boyer-Moore, Karp-Rabin, Knuth-Morris-Pratt, Quick Search
- Cumulative signatures
- Search in non-key data
- Various string matches
  - Prefix, string, longest common prefix, longest common string
- For prefix and string search
  - The searched data itself is not sent (only its signature is sent)
  - Better confidentiality
  - Faster messaging
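Prefix Search
Example: search for the prefix S = "PARIS DAUPHINE"
- The client calculates $S_c = \mathrm{Sign}(S)$, the cumulative signature at E, the last symbol of S
- It sends only $S_c$ and the size 15 to the servers
- On each server, for each record $P_j$, the stored $\mathrm{Sign}(p_{15})$ is compared with $S_c$
  - $\mathrm{Sign}(p_{15}) \neq S_c$ → the prefix is not in $P_j$
  - $\mathrm{Sign}(p_{15}) = S_c$ → prefix found in $P_j$
- Collision resolution in the client
- Complexity: O(1) per record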
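Prefix search thus reduces to one comparison per record. The sketch reuses a hypothetical GF(2^8) encoding (polynomial 0x11D, α = 2); the bucket is a plain dict of encoded records, and the function names are illustrative. The client decodes the candidates it receives, which plays the role of collision resolution in the client.

```python
# Sketch: prefix search on cumulatively encoded records in GF(2^8).
# Assumptions: polynomial 0x11D, alpha = 2; function names are hypothetical.
EXP, LOG = [0] * 510, [0] * 256
x = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = x
    LOG[x] = i
    x = (x << 1) ^ (0x11D if x & 0x80 else 0)


def encode(p: bytes) -> bytes:
    """p'_i = p'_(i-1) XOR p_i * alpha^i (signature of the prefix ending at p_i)."""
    out, acc = bytearray(), 0
    for i, pi in enumerate(p, start=1):
        if pi:
            acc ^= EXP[(LOG[pi] + i) % 255]
        out.append(acc)
    return bytes(out)


def decode(e: bytes) -> bytes:
    """p_i = (p'_i XOR p'_(i-1)) / alpha^i."""
    out, prev = bytearray(), 0
    for i, ei in enumerate(e, start=1):
        d = ei ^ prev
        out.append(EXP[(LOG[d] - i) % 255] if d else 0)
        prev = ei
    return bytes(out)


def server_prefix_candidates(bucket: dict, sc: int, size: int) -> list:
    """O(1) per record: compare the stored symbol p'_size with S_c."""
    return [k for k, enc in bucket.items() if len(enc) >= size and enc[size - 1] == sc]


def client_prefix_search(bucket: dict, prefix: bytes) -> list:
    sc = encode(prefix)[-1]      # S_c = Sign(prefix); only S_c and its size are "sent"
    hits = server_prefix_candidates(bucket, sc, len(prefix))
    # Collision resolution in the client: decode candidates and check the real prefix.
    return [k for k in hits if decode(bucket[k]).startswith(prefix)]
```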
Complete Search
Full match
- The client sends the signature of the string S to search
- On the server: comparison with the algebraic signature Sg stored in the header of each record (test the 1st symbol, then the 2nd if they are equal)
- Sequential scan of the records
- Complexity: O(1) per record
Record structure: Lr | Es | K | Lc | Sg | Data
- Lr: length of record
- Es: pointer to next record
- K: key of record
- Lc: version
- Sg: signature of record
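The full-match test against the header signature can be sketched with a record type mirroring the Lr/Es/K/Lc/Sg fields. Assumptions: 2-symbol signatures over GF(2^8) (polynomial 0x11D, α = 2); the `Record` layout, `sign2` and `full_match` are illustrative, not the SDDS-2005 record format.

```python
from dataclasses import dataclass

# Sketch of complete (full-match) search against signatures kept in record headers.
# Assumptions: 2-symbol signatures over GF(2^8) (polynomial 0x11D, alpha = 2);
# the Record layout only mirrors the Lr/Es/K/Lc/Sg fields and is illustrative.
EXP, LOG = [0] * 510, [0] * 256
x = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = x
    LOG[x] = i
    x = (x << 1) ^ (0x11D if x & 0x80 else 0)


def sign2(s: bytes) -> tuple:
    """(Sign_alpha(S), Sign_alpha^2(S)) over GF(2^8)."""
    s1 = s2 = 0
    for i, p in enumerate(s, start=1):
        if p:
            s1 ^= EXP[(LOG[p] + i) % 255]
            s2 ^= EXP[(LOG[p] + 2 * i) % 255]
    return s1, s2


@dataclass
class Record:
    lr: int      # Lr: length of record
    es: int      # Es: pointer to next record (here an index, -1 for none)
    k: int       # K: key of record
    lc: int      # Lc: version
    sg: tuple    # Sg: signature of record
    data: bytes


def full_match(records, target):
    """Sequential scan: test the 1st signature symbol, then the 2nd only if equal."""
    return [r.k for r in records if r.sg[0] == target[0] and r.sg[1] == target[1]]
```

The short-circuiting `and` reads the second symbol only when the first one matches, which is what keeps the per-record cost constant.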
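Sequential Search
Example: search for the string S = "PARIS" in the record P = "UNIVERSITE PARIS DAUPHINE"
- The client sends $S_c = \mathrm{Sign}(\text{PARIS})$, the cumulative signature at the final S, and the size l = 5 to the servers
- Collision resolution on the client
- Complexity: O(n - l), where n is the size of record P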
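One way to run the sequential search directly on encoded records uses the identity $p'_{i+l} \oplus p'_i = \alpha^i \cdot \mathrm{Sign}(p_{i+1} \ldots p_{i+l})$, which follows from the encoding formulas, so each of the n - l + 1 windows costs one lookup and one comparison. Whether SDDS-2005 evaluates windows exactly this way is an assumption, as are the GF(2^8) parameters (0x11D, α = 2) and the function names.

```python
# Sketch: sequential substring search on cumulatively encoded records.
# Assumptions: GF(2^8), polynomial 0x11D, alpha = 2; function names are hypothetical.
EXP, LOG = [0] * 510, [0] * 256
x = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = x
    LOG[x] = i
    x = (x << 1) ^ (0x11D if x & 0x80 else 0)


def encode(p: bytes) -> bytes:
    out, acc = bytearray(), 0
    for i, pi in enumerate(p, start=1):
        if pi:
            acc ^= EXP[(LOG[pi] + i) % 255]
        out.append(acc)
    return bytes(out)


def decode(e: bytes) -> bytes:
    out, prev = bytearray(), 0
    for i, ei in enumerate(e, start=1):
        d = ei ^ prev
        out.append(EXP[(LOG[d] - i) % 255] if d else 0)
        prev = ei
    return bytes(out)


def server_substring_candidates(enc: bytes, sc: int, l: int) -> list:
    """Offsets i where p'_(i+l) XOR p'_i equals S_c * alpha^i (n - l + 1 windows)."""
    hits = []
    for i in range(len(enc) - l + 1):
        window = enc[i + l - 1] ^ (enc[i - 1] if i > 0 else 0)  # sum of p_j alpha^j, j = i+1..i+l
        shifted = EXP[(LOG[sc] + i) % 255] if sc else 0         # S_c * alpha^i
        if window == shifted:
            hits.append(i)
    return hits


def client_search(enc: bytes, s: bytes) -> list:
    sc = encode(s)[-1]               # S_c = Sign(S); only S_c and l = len(s) are "sent"
    hits = server_substring_candidates(enc, sc, len(s))
    text = decode(enc)               # collision resolution on the client
    return [i for i in hits if text[i:i + len(s)] == s]
```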
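String Search by n-Grams
Example: search by digrams (n = 2)
- Computing the n-gram table T from $S = s_1 \ldots s_l$: for $i = 1, \ldots, l-n$, store $\mathrm{Sign}(s_i \ldots s_{i+n-1})$ with its position
- The client sends S = "DAUPHINE" (size l = 8) to the server
- On the server:
  - "si" ≠ "ne" and "si" is not in T → jump l - 1 = 7 positions
  - ...
  - "up" ≠ "ne" and "up" is in T → jump 4 positions
  - S found after 6 comparisons and 5 shifts
- Complexity: $O\left(\frac{m-l}{l-n+1}\right)$, where m is the size of the record
T (n-gram meta table):
S(da) 1 0
S(au) 2 0
S(up) 3 0
S(ph) 4 0
S(hi) 5 0
S(in) 6 0
S(ne)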
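The jump rule of the example can be sketched as a Horspool-style scan over n-gram signatures. Assumptions: the same hypothetical GF(2^8) encoding (0x11D, α = 2); T maps each n-gram signature of S to its largest position j among 1..l - n; the shift is l - n + 1 - j, or l - n + 1 when the record n-gram is absent from T. This shift rule is a standard choice and may differ in detail from the one used in SDDS-2005.

```python
# Sketch: n-gram string search over cumulatively encoded records (Horspool-style shifts).
# Assumptions: GF(2^8), polynomial 0x11D, alpha = 2; names are hypothetical.
EXP, LOG = [0] * 510, [0] * 256
x = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = x
    LOG[x] = i
    x = (x << 1) ^ (0x11D if x & 0x80 else 0)


def encode(p: bytes) -> bytes:
    out, acc = bytearray(), 0
    for i, pi in enumerate(p, start=1):
        if pi:
            acc ^= EXP[(LOG[pi] + i) % 255]
        out.append(acc)
    return bytes(out)


def sub_sign(enc: bytes, start: int, length: int) -> int:
    """Sign(p_(start+1) .. p_(start+length)) = (p'_end XOR p'_start) / alpha^start."""
    v = enc[start + length - 1] ^ (enc[start - 1] if start > 0 else 0)
    return EXP[(LOG[v] - start) % 255] if v else 0


def build_table(s: bytes, n: int):
    """Client: T maps n-gram signature -> largest position j in 1..l-n (last n-gram excluded)."""
    es, l = encode(s), len(s)
    table = {sub_sign(es, j - 1, n): j for j in range(1, l - n + 1)}
    return table, sub_sign(es, l - n, n), sub_sign(es, 0, l)


def ngram_search(enc: bytes, table: dict, last: int, s_sig: int, l: int, n: int) -> list:
    """Server: returns candidate offsets (collisions possible, resolved afterwards)."""
    hits, k = [], 0
    while k + l <= len(enc):
        g = sub_sign(enc, k + l - n, n)          # record n-gram under the last n-gram of S
        if g == last and sub_sign(enc, k, l) == s_sig:
            hits.append(k)
        k += (l - n + 1) - table.get(g, 0)       # absent from T: jump l - n + 1
    return hits
```

For digrams the maximum jump l - n + 1 equals l - 1, the "jump 7 positions" of the DAUPHINE example. Because T keeps the largest position per signature, a signature collision can only shorten a jump, never skip an occurrence.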
Longest Common Prefix
Example
- The client sends the string S = "UNIVERSITE PARIS DAUPHINE" to servers S1 and S2
- On the servers, the prefix of S received is compared with each record P:
  - Record P in S1 = "UNIVERSITE DAUPHINE": equality at p1, p2, p4, p8; inequality at p16, p12; equality at p10, p11 → L = 11
  - Record P in S2 = "UNIVERSITE PARIS9 DAUPHINE": equality at p1, p2, p4, p8, p16; inequality at p25, p21, p19, p17 → L = 16
- S1 and S2 send L to the client; the client selects L = 16
- Collision resolution on the servers
- Complexity: best case O(1), worst case $O(\log_2(L' - L))$, where L and L' are the sizes of successive longest-prefix candidates
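The probe order 1, 2, 4, 8, 16, ... followed by bisection can be sketched as an exponential search over cumulative signatures: equal encoded symbols at position i mean equal signatures of the two i-prefixes. Assumptions: GF(2^8) with 0x11D and α = 2, a final server-side check on decoded data to resolve collisions, and hypothetical function names; the exact bisection points may differ from those in the example.

```python
# Sketch: longest common prefix via exponential probing + bisection on cumulative signatures.
# Assumptions: GF(2^8), polynomial 0x11D, alpha = 2; names are hypothetical.
EXP, LOG = [0] * 510, [0] * 256
x = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = x
    LOG[x] = i
    x = (x << 1) ^ (0x11D if x & 0x80 else 0)


def encode(p: bytes) -> bytes:
    out, acc = bytearray(), 0
    for i, pi in enumerate(p, start=1):
        if pi:
            acc ^= EXP[(LOG[pi] + i) % 255]
        out.append(acc)
    return bytes(out)


def decode(e: bytes) -> bytes:
    out, prev = bytearray(), 0
    for i, ei in enumerate(e, start=1):
        d = ei ^ prev
        out.append(EXP[(LOG[d] - i) % 255] if d else 0)
        prev = ei
    return bytes(out)


def longest_common_prefix(enc_p: bytes, enc_s: bytes) -> int:
    m = min(len(enc_p), len(enc_s))
    if m == 0:
        return 0

    def eq(i):                              # signatures of the two i-prefixes are equal
        return enc_p[i - 1] == enc_s[i - 1]

    lo, hi = 0, 1
    while hi < m and eq(hi):                # probe p1, p2, p4, p8, ...
        lo, hi = hi, hi * 2
    hi = min(hi, m)
    if eq(hi):
        lo = hi
    else:
        while hi - lo > 1:                  # invariant: eq(lo) or lo == 0; not eq(hi)
            mid = (lo + hi) // 2
            if eq(mid):
                lo = mid
            else:
                hi = mid
    # Collision resolution on the server: equal prefixes always have equal signatures,
    # so the true length is at most lo; step down until the decoded prefixes agree.
    p, s = decode(enc_p), decode(enc_s)
    while lo > 0 and p[:lo] != s[:lo]:
        lo -= 1
    return lo
```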
Longest Common String
Example
- The client sends the string S = "BIENVENUES LABORATOIRE CERIA DAUPHINE" to server S1 (records P and P' are in S1)
- On the server:
  - Record P = "LABORATOIRE RECHERCHE INFORMATIQUE" → L = 12
  - Record P' = "LABORATOIRE CERIA DAUPHINE UNIVERSITY FRANCE" → L = 27
  - Collision resolution on the server
- S1 sends L = 27 to the client
- Complexity per record: best case O(1/N), worst case O(n·l) (N: size of bucket, n: size of record, l: size of string)
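A simple way to obtain L, offered as a swapped-in technique rather than the method measured here, is a binary search on the length k in which each step indexes the signatures of all length-k substrings of S (Karp-Rabin style) and checks the record's windows against that index, verifying the plain text to resolve collisions on the server. It reuses the hypothetical GF(2^8) setup (0x11D, α = 2); all names are illustrative.

```python
# Sketch: longest common string via binary search on length + signature index.
# Swapped-in technique (Karp-Rabin style), not necessarily SDDS-2005's algorithm.
# Assumptions: GF(2^8), polynomial 0x11D, alpha = 2; names are hypothetical.
EXP, LOG = [0] * 510, [0] * 256
x = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = x
    LOG[x] = i
    x = (x << 1) ^ (0x11D if x & 0x80 else 0)


def encode(p: bytes) -> bytes:
    out, acc = bytearray(), 0
    for i, pi in enumerate(p, start=1):
        if pi:
            acc ^= EXP[(LOG[pi] + i) % 255]
        out.append(acc)
    return bytes(out)


def decode(e: bytes) -> bytes:
    out, prev = bytearray(), 0
    for i, ei in enumerate(e, start=1):
        d = ei ^ prev
        out.append(EXP[(LOG[d] - i) % 255] if d else 0)
        prev = ei
    return bytes(out)


def sub_sign(enc: bytes, start: int, length: int) -> int:
    """Position-independent signature of a substring, from cumulative symbols."""
    v = enc[start + length - 1] ^ (enc[start - 1] if start > 0 else 0)
    return EXP[(LOG[v] - start) % 255] if v else 0


def has_common(enc_p, enc_s, p, s, k):
    """True if P and S share a substring of length k (signature filter + exact check)."""
    index = {}
    for j in range(len(s) - k + 1):
        index.setdefault(sub_sign(enc_s, j, k), []).append(j)
    for i in range(len(p) - k + 1):
        for j in index.get(sub_sign(enc_p, i, k), []):
            if p[i:i + k] == s[j:j + k]:    # collision resolution on the server
                return True
    return False


def longest_common_string(enc_p: bytes, enc_s: bytes) -> int:
    p, s = decode(enc_p), decode(enc_s)
    lo, hi = 0, min(len(p), len(s))
    while lo < hi:                          # largest k with a common substring of length k
        mid = (lo + hi + 1) // 2
        if has_common(enc_p, enc_s, p, s, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

The binary search is valid because a common substring of length k implies one of length k - 1. For brevity the sketch decodes both strings up front, although only candidate windows actually need decoding.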
Hardware Configuration, File Storage, Update Analysis, Encoding/Decoding Data, String Search Experiments
Hardware Configuration
- Servers: 1.8 GHz P4
- Client: 800 MHz P3
- Name server: 500 MHz P3
- 1 Gbit/s Ethernet
- OS: Windows 2000 Server
File Storage Performance Analysis
(Figure: storage time (ms) vs. number of records, showing scalability)
SHA-1 vs. Algebraic Signatures
- SHA-1 signatures: 20 bytes
- Algebraic signatures: 4 bytes
Performance of the Backup Command
- 1st request: signature calculation (375 ms) + storage of all pages (4922 ms)
- 2nd request, no bucket change: 375 ms
- 3rd request, 1 page changed: 375 + 16 ms
Update Measurements
| Update | Update time, with change (ms) | Update time, no change (ms) |
| Normal update | 0.92 | 0.28 |
| Blind update | 0.74 | 0.20 |
- Avoids lost updates
Cost of Encoding/Decoding Data
- Encoding: 0.045 ms/KB; decoding: 0.042 ms/KB
- Insertion: 0.25 ms/KB; search: 0.28 ms/KB
- Benefits: protection against incidental viewing of data in servers, and string matching possibilities
Response Time for String Matching
String match:
| Record position | Record size (B) | String size (B) | Offset (B) | Time (ms) |
| 1 | 20 | 5 | 13 | 0.44 |
| 1 | 100 | 20 | 70 | 0.68 |
| 1 | 100 | 20 | 80 | 0.682 |
| 100 | 100 | 20 | 70 | 72.5 |
| 100 | 100 | 30 | 70 | 71.7 |
| 200 | 100 | 20 | 70 | 165 |
Prefix match:
| Record position | Record size (B) | Prefix size (B) | Time (ms) |
| 1 | 100 | 20 | 0.369 |
| 100 | 250 | 20 | 37.8 |
| 100 | 250 | 35 | 37.78 |
| 200 | 250 | 20 | 71.3 |
| 300 | 250 | 20 | 120.53 |
| 500 | 250 | 20 | 197.5 |
Key search: 0.27 ms/KB
Response Time for String Matching
Longest prefix match:
| Record position | Size of inserted data (B) | Size of last record (B) | Size of prefix to search (size of prefix found) (B) | Time to search (ms) |
| 1 | 50 | 50 | 25 (20) | 0.372 |
| 1/100 | 250 | 50 | 25 (20) | 43 |
| 49/100 | 250 | 50 | 25 (20) | 46.2 |
| 99/100 | 250 | 50 | 25 (20) | 47 |
Longest common string match:
| Record position | Size of inserted data (B) | Size of last record (prefix) (B) | Size of string to search (size of string found) (B) | Offset of string in record (B) | Time to search (ms) |
| 1 | 100 | 100 | 22 (20) | 70 | 0.62 |
| 100 | 100 | 22 | 10 (5) | 10 | 290 |
| 100 | 100 | 45 | 15 (10) | 10 | 470 |
| 100 | 120 | 45 | 15 (10) | 10 | 565 |
n-Gram Search Measurements
(Figure: search time (ms) vs. size of the searched string (bytes), digram search vs. cumulative search, search in 1 record of 300 bytes)
- Up to (l - 1) times faster than the cumulative search algorithm (l: size of the string to search)
Comparison (String Matching)
- Cumulative signatures reduce string matching times
Comparison (String Matching)
(Figure: search time (ms) vs. number of records for XOR, algebraic signatures, Karp-Rabin and cumulative signatures; string search for data < 32 B (left) and data > 32 B (right))
Gain of cumulative search
- Compared with previous algorithms (Karp-Rabin): saving of 5% for strings < 32 B, saving of 20% for strings > 32 B
- Compared with non-encoded data: saving of 30%
Example of Prefix Search
(Screenshot: dialog at the client, prefix search operation, and result received from the servers)
Conclusion, Future Work, Acknowledgements
Conclusion
- Algebraic signatures are an efficient basis for new SDDS-2005 capabilities (backup, updates)
- Cumulative algebraic signatures are efficient for protection against incidental viewing and for string search
- SDDS-2005 prototype
  - Up and running
  - Submitted to DBWorld
  - Free download at http://ceria.dauphine.fr
Future Work
- More on n-grams
- Alternative signature schemes (inverse signatures using Horner's scheme)
- Delta compression using cumulative signatures
- Protection against silent corruption
- Alternative GF multiplication methods (prefetch, Broder, tables of α)
- Collision resolution on the clients
- SDDS-2005 as part of a virtual repository of eGov documents (eGov Project)
Acknowledgements
- Work partly supported by:
  - CEE Project eGov
  - MS Research
  - CEE ICONS Project
  - IBM Almaden Research Center
Thank you for your attention
Riad MOKADEM Riad.Mokadem_at_dauphine.fr