Title: ICDE 2002
1Lecture 17 Executing SQL over Encrypted Data in
Database-Service-Provider Model Professor Chen Li
2Executing SQL over Encrypted Data in
Database-Service-Provider Model
- Hakan Hacigumus
- University of California, Irvine
- Bala Iyer
- IBM Silicon Valley Lab.
- Chen Li
- University of California, Irvine
- Sharad Mehrotra
- University of California, Irvine
- SIGMOD 2002, Madison, Wisconsin, USA
3What do we want to do?
Server
User Data
Encrypted User Database
Untrusted Administrator
- We want to store the data on a server
- But the problem is we do not trust the server
for sensitive information! - encrypt the data and store it
- but still be able to run queries over the
encrypted data - do most of the work at the server
- If the server is trusted, ICDE 2002
4Why is it important anyway?
Server
Encrypted User Database
Untrusted Administrator
(Untrusted) Application Service Provider
- Application Service Provider (ASP) Model for
Database - DB management transferred to service provider for
- backup, administration, restoration, space
management, upgrades etc. - use the database as a service provided by an
ASP - use SW, HW, human resources of ASP, instead of
your own
5Talk Outline
- System Architecture
- How to create Metadata Relational Encryption and
Storage Model - Query Decomposition and Relational Operators
- Query Decomposition Examples
- Experimental Results
- Conclusion
6System Architecture
Server Site
Client Site
Encrypted Results
Client Side Query
?
Server Side Query
Service Provider
Original Query
?
Actual Results
?
7Talk Outline
- System Architecture
- How to create Metadata Relational Encryption and
Storage Model - Query Decomposition and Relational Operators
- Query Decomposition Examples
- Experimental Results
- Conclusion
8Talk Outline
- System Architecture
- How to create Metadata Relational Encryption and
Storage Model - Query Decomposition and Relational Operators
- Query Decomposition Examples
- Experimental Results
- Conclusion
9Relational Encryption
etuple N_ID S_ID P_ID
fErf!Q!!vddfgtgtlt/ 50 1 10
F3wgfErf! 65 2 10
gfsdf343vltl 50 2 20
33wgfs! 65 2 20
NAME SALARY PID
John 50000 2
Marry 110000 2
James 95000 3
Lisa 105000 4
Server Site
- Store an encrypted string etuple for each
tuple in the original table - This is called row level encryption
- Any kind of encryption technique can be used
- Blowfish encryption algorithm is used for this
work - Create an index for each (or selected)
attribute(s) in the original table
10Building the IndexPartition and Identification
Functions
- Partition function divides domain values into
partitions (buckets) - Partition (R.A) 0,200, (200,400,
(400,600, (600,800, (800,1000 - partitioning function has an impact on
performance as well as privacy
11Mapping Functions
- Mapping function maps a value v in the domain of
attribute A to the id of the partition which
value v belongs to - e.g. MapR.A( 250 ) 7, MapR.A( 620 ) 1
12Storing Encrypted Data
- R lt A, B, C gt ? RS lt etuple, A_id, B_id,
C_id gt - etuple encrypt ( A B C )
- A_id MapR.A( A ), B_id MapR.B( B ), C_id
MapR.C( C ) -
Table EMPLOYEES
Table EMPLOYEE
Etuple N_ID S_ID P_ID
fErf!Q!!vddfgtgtlt/ 50 1 10
F3wgfErf! 65 2 10
gfsdf343vltl 50 2 20
33wgfs! 65 2 20
NAME SALARY PID
John 50000 2
Marry 110000 2
James 95000 3
Lisa 105000 4
13Talk Outline
- System Architecture
- How to create Metadata Relational Encryption and
Storage Model - Query Decomposition and Relational Operators
- Query Decomposition Examples
- Experimental Results
- Conclusion
14Talk Outline
- System Architecture
- How to create Metadata Relational Encryption and
Storage Model - Query Decomposition and Relational Operators
- Query Decomposition Examples
- Experimental Results
- Conclusion
15Mapping Conditions
- Q SELECT name, pname FROM emp, proj
- WHERE emp.pidproj.pid AND salary gt 100k
- Server stores attribute indices determined by
mapping functions - Client stores metadata and utilizes that to
translate the query - Conditions
- Condition ? Attribute op Value
- Condition ? Attribute op Attribute
- Condition ? (Condition ? Condition) (Condition
? Condition) - (not Condition)
16Mapping Conditions (2)
- Example
- Attribute Value
- Mapcond( A v ) ? AS MapA( v )
- Mapcond( A 250 ) ? AS 7
17Mapping Conditions (3)
- Attribute1 Attribute2
- Mapcond( A B ) ? ?N (AS identA( pk ) ? BS
identB( pl )) -
- where N is pk ? partition (A), pl ? partition
(B), pk ? pl ? ? -
Partitions A_id
0,100 2
(100,200 4
(200,300 3
Partitions B_id
0,200 9
(200,400 8
- C A B ? C (AS 2 ? BS 9)
- ? (AS 4 ? BS 9)
- ? (AS 3 ? BS 8)
18Talk Outline
- System Architecture
- How to create Metadata Relational Encryption and
Storage Model - Query Decomposition and Relational Operators
- Query Decomposition Examples
- Experimental Results
- Conclusion
19Relational Operators over Encrypted Relations
- Partition the computation of the operators across
client and server - Compute (possibly) superset of answers at the
server - Filter the answers at the client
- Objective minimize the work at the client and
process the answers as soon as they arrive
without requiring storage at the client - Operators studied
- Selection
- Join
- Grouping and Aggregation
- Sorting
- Duplicate Elimination
- Set Difference
- Union
- Projection
20Selection Operator
?c( R ) ?c( D (?SMapcond(c)( RS ) )
Example
21Join Operator
R c T ?c( D ( RS SMapcond(c) TS )
Example
C A B ? C (A_id 2 ? B_id 9) ?(A_id
4 ? B_id 9) ?(A_id 3 ? B_id 8)
Partitions A_id
0,100 2
(100,200 4
(200,300 3
Partitions B_id
0,200 9
(200,400 8
22Talk Outline
- System Architecture
- How to create Metadata Relational Encryption and
Storage Model - Query Decomposition and Relational Operators
- Query Decomposition Examples
- Experimental Results
- Conclusion
23Talk Outline
- System Architecture
- How to create Metadata Relational Encryption and
Storage Model - Query Decomposition and Relational Operators
- Query Decomposition Examples
- Experimental Results
- Conclusion
24Query Decomposition
- Q SELECT name, pname FROM emp, proj
- WHERE emp.pidproj.pid AND salary gt 100k
Client Query
25Query Decomposition (2)
Client Query
?name,pname
Client Query
?name,pname
e.pid p.pid
?salary gt100k
D
e.pid p.pid
?salary gt100k
D
D
E_PROJ
D
?s_id 1 v s_id 2
E_PROJ
E_EMP
E_EMP
Server Query
Server Query
26Query Decomposition (3)
Client Query
Client Query
?name,pname
?name,pname
?salary gt100k ? e.pid p.pid
e.pid p.pid
?salary gt100k
D
D
D
e.p_id p.p_id
?s_id 1 v s_id 2
E_PROJ
?s_id 1 v s_id 2
E_PROJ
E_EMP
E_EMP
Server Query
Server Query
27Query Decomposition (4)
Client Query
?name,pname
- Q SELECT name, pname FROM emp,
proj - WHERE emp.pidproj.pid AND salary gt
100k - QS SELECT e_emp.etuple, e_proj.etuple FROM
e_emp, e_proj - WHERE e.p_idp.p_id AND
s_id 1 OR s_id 2 - QC SELECT name, pname FROM temp
- WHERE emp.pidproj.pid AND
salary gt 100k
?salary gt100k ? e.pid p.pid
D
e.p_id p.p_id
E_PROJ
?s_id 1 v s_id 2
E_EMP
Server Query
28Talk Outline
- System Architecture
- How to create Metadata Relational Encryption and
Storage Model - Query Decomposition and Relational Operators
- Query Decomposition Examples
- Experimental Results
- Conclusion
29Talk Outline
- System Architecture
- How to create Metadata Relational Encryption and
Storage Model - Query Decomposition and Relational Operators
- Query Decomposition Examples
- Experimental Results
- Conclusion
30Experimental Evaluation
- Data
- TPC-H database, scale factor 0.1
- Queries
- Based on TPC-H Queries Q6 and Q3
- Partitioning Strategy
- Equi-depth histograms for the first set of
experiments - Equi-width histograms for the second set of
experiments
31Effect of Number of Buckets in Non-Join Query
- Client and communications costs decreases with
increasing number of buckets due to better
filtering at the server - Server cost doesnt decrease as much, table scan
remains best choice in the optimizer
32Effect of Number of Buckets in Non-Join Query
- Single Server Server is trusted and performs all
operations including decryption on site - Shows that proposed query execution protocol
doesnt introduce significant overhead
33Effect of Number of Buckets in Join Query
- Sharp decrease in query response time with
increase in the number of buckets due to better
filtering at the server - Client side query response time is greater than
server side query response time due to dominant
decryption cost on the query (second graph)
34Effect of Number of Buckets in Join Query
- Single Server Server is trusted and performs all
operations including decryption on site - Consistent with the previous results showing
proposed query execution protocol doesnt
introduce significant overhead
35Conclusion
- ASP model is a promising solution for enterprise
computing in Internet era - We studied data privacy problem
- in the context of ASP model
- when the ASP is not trusted
- Proposed solution
- encrypts data, creates coarse indexes and
stores the data at ASP - allows only data owner to decrypt the data
- With query decomposition
- most of query execution performed at ASP
- client only performs encryption/decryption,
filtering and continues to benefit from ASP model