Title: Efficient Secure Query Evaluation over Encrypted XML Databases
1Efficient Secure Query Evaluation over Encrypted
XML Databases
- Wendy Hui Wang
- Laks V.S. Lakshmanan
- University of British Columbia, Canada
2Outline
- Introduction
- Design of metadata
- Secure and efficient query processing
3Database-as-Service (DAS) Model
- Data owner
- Small business with a limited budget (e.g., an online art gallery owner)
- Owns a large XML database (e.g., a database containing information about paintings and customers)
- Cannot afford a suitable database server
- More cost-effective option: host the database on a third-party remote server
- E.g., the Caspio web database service provider
4Security Concerns in DAS Model
- Data owner
- Does NOT trust the server
- Protects the sensitive information in the database
- Individual XML elements with their content (structure of the subelements, data values, etc.)
- E.g., the customers' financial information
- Associations between data values
- E.g., a customer's name and the paintings he/she purchased
[Figure: FinancialAccount elements with values visa, mastercard, visa]
5Database-as-Service Model (Cont.)
- Data Owner
- Stores the encrypted database on the server
- Keeps decryption keys to himself
- Server
- Provides data storage and a query engine as services
- Doesn't have the decryption keys
6The Queries by Data Owner
- Sent remotely from the data owner's handheld devices
- The answers are a very small portion of the database
- E.g., the names of the paintings that Andy bid for
- The answers are post-processed on the handheld devices
- The devices are installed with a decryptor and a query engine
- Limited bandwidth
- Limited memory and processing power
7Naïve Method of Query Processing
- Returns the whole encrypted database to the client
- Disadvantages
- Expensive data transport, decryption, and query post-processing
- May exceed the computational capabilities of handheld devices
[Figure: the untrusted server ships the entire encrypted XML database to the client, where the XML decryptor and the query executor produce the answer of the query]
8Another Option for Query Processing
- Encrypts tags and data values in the database individually
- Tags and values in the query are encrypted the same way as in the database
- E.g., purchase[cname=Andy]/pname becomes purchase[cname=A]/pname
- Query processing is more efficient than with the naïve method
- But there is a security breach!
- E.g., the attacker knows Andy is the biggest customer of the art gallery. Then the encrypted customer value with the largest # of occurrences must correspond to Andy.
[Figure: two purchase elements, each with cname and pname children, shown before and after encryption; the cname value Andy is replaced by the ciphertext A]
9Our Goals
- Security
- Guarantee no leakage of sensitive information to the untrusted server/disk
- Efficient query evaluation
- The server returns ONLY the portions of the database that are relevant to the data owner's query
10Our Approach
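[Figure: system architecture. Client: query Q (e.g., //purchase[//cname=Andy]/pname) passes through the Query Translator, which produces the translated query Qs. Untrusted server: the Query Executor evaluates Qs against the metadata and the encrypted XML database (a purchases tree of purchase elements with cname and pname children) and returns the encryption blocks relevant to Q. Client: the XML Decryptor decrypts the blocks and the Query Executor computes the answer of query Q (Lily).]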
11Our Contributions
- Security constraints (see paper)
- Formal definition of attack model and security
- Construction of the secure encryption scheme (see paper)
- Finding an optimal secure encryption scheme is NP-hard
- Design of the metadata on the server
- Efficient and secure query processing
12Outline
- Introduction
- Design of metadata
- Structural index
- Value index
- Secure and efficient query processing
13Structural Index
- Purpose: efficient processing of tags and XPath predicates (/, //, siblings, etc.) in the query
- The interval index of the element
- Each element is assigned an interval (start, end)
- For parent u and child v, u.start < v.start < v.end < u.end
- The intervals of adjacent nodes don't overlap
- The structural index
- Index table entry: <(encrypted) tag, interval index>
- Encryption block table entry: <encryption block ID, interval index>
14Attacks on Structural Index
- By accessing the structural index T and an encrypted element E, the attacker constructs candidates for the original element such that
- they have the same structural index T
- the size of the encrypted candidate is the same as that of E
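[Figure: index table and encrypted element tree. The root α has interval [0, 1]; the nodes labeled β, γ, and δ (three δ nodes) carry the intervals [0.2, 0.25], [0.55, 0.75], [0.8, 0.82], [0.83, 0.84], and [0.85, 0.9].]
The # of such candidates is 1, i.e., the attacker can reconstruct the structure of the original element!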
15More Secure Structural Index
- Grouping on the intervals in the index table
- The intervals of the adjacent nodes with the same
tag and encrypted in the same block are grouped
together
[Tables: index table before grouping and index table after grouping]
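The grouping step can be sketched as a single pass over the index entries. This is a minimal sketch under stated assumptions: entries are sibling leaves listed in document order, the tuple layout `(tag, block_id, start, end)` is hypothetical, and the tags B, C, D and block ID 1 are made up; only the interval values come from the slides.

```python
def group_intervals(entries):
    """Merge runs of adjacent entries that share tag and encryption block.

    entries: list of (tag, block_id, start, end), assumed to be sibling
    leaves in document order. A merged run keeps the first start and the
    last end.
    """
    grouped = []
    for tag, block, start, end in entries:
        if grouped and grouped[-1][0] == tag and grouped[-1][1] == block:
            prev_tag, prev_block, prev_start, _ = grouped[-1]
            grouped[-1] = (prev_tag, prev_block, prev_start, end)
        else:
            grouped.append((tag, block, start, end))
    return grouped


before = [
    ("B", 1, 0.2, 0.25),
    ("C", 1, 0.55, 0.75),
    ("D", 1, 0.8, 0.82),
    ("D", 1, 0.83, 0.84),
    ("D", 1, 0.85, 0.9),
]
after = group_intervals(before)
```

Here the three adjacent D entries collapse into one interval from 0.8 to 0.9, so the index no longer reveals how many D leaves the block holds.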
16Security Example of Structural Index
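[Figure: original element A [0, 1] with five leaf nodes (B, C, D, D, D) covered by 3 grouped intervals, [0.2, 0.25], [0.55, 0.75], and [0.8, 0.9] (3 intervals on 5 leaf nodes). Candidate 1 and Candidate 2 are alternative elements under A [0, 1] whose five leaves, drawn from tags B, C, and D, fit the same three intervals. The slide also states the # of candidates.]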
17Technical Result of Security of Structural
Metadata
- We prove there exists a large number of candidate databases (including the true hosted database) such that
- for any query that is captured by any security constraint, only the true database returns a non-empty answer
- by looking at the structural index, the candidates are pairwise indistinguishable
18Related Work of Structural Index
- Efficient Tree Search in Encrypted XML Database [Brinkman et al. 2004]
- Stores a relational table containing structural information of the database on the server
- Compromises the security of structural information
- XML interval index schemes [Al-Khalifa et al. 2002, Chien et al. 2002, etc.]
- Focus only on efficiency; don't consider security
19Outline
- Design of metadata
- Structural index
- Value index
- Secure and efficient query processing
20Value Index
- Purpose: efficient processing of value-based constraints in the queries
- Every encrypted data value in the database is indexed in the format <(encrypted) value, block IDs>
- By accessing the value index, the attacker can count the # of occurrences of encrypted values
21Attacks on Value Index
- Attacker's aim: infer the mapping between plaintext values and the corresponding index entries, and consequently crack the associations between data values
- E.g., he wants to find out which paintings Andy has bought. The names of paintings are not encrypted, but the names of customers are.
- His prior knowledge: the # of occurrences of some data values in the original database
- E.g., from the newspaper, he knows Andy has bought 10 paintings from the art gallery for charity.
22Attacks on Value Index (Cont.)
- His approach: map the encrypted values to plaintext values based on their # of occurrences
- E.g., A is the only value in the index whose # of occurrences is 10. Then A must map to Andy. Consequently, the attacker finds out which paintings Andy has purchased.
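The frequency-based attack described on these two slides can be sketched as follows. The value index layout (ciphertext mapped to a list with one block ID per occurrence), the function name, and the sample data other than Andy's count of 10 are assumptions for illustration.

```python
from collections import defaultdict


def frequency_attack(value_index, known_counts):
    """Match ciphertexts to plaintexts by occurrence count.

    value_index: ciphertext -> list of block IDs, one per occurrence
                 (hypothetical layout).
    known_counts: plaintext -> # of occurrences known to the attacker.
    A plaintext is resolved only if exactly one ciphertext has its count.
    """
    by_count = defaultdict(list)
    for cipher, block_ids in value_index.items():
        by_count[len(block_ids)].append(cipher)

    mapping = {}
    for plain, n in known_counts.items():
        candidates = by_count.get(n, [])
        if len(candidates) == 1:
            mapping[candidates[0]] = plain
    return mapping


value_index = {
    "A": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],  # 10 occurrences
    "B": [1, 2, 3],
    "C": [4, 5, 6],
}
cracked = frequency_attack(value_index, {"Andy": 10, "Betty": 3})
```

In this sample, A is resolved to Andy because it is the only ciphertext with 10 occurrences, while Betty stays hidden because two ciphertexts share her count. That ambiguity is what the defense on the following slides tries to create for every value.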
23Our Solution
- Order-preserving encryption with splitting and scaling (OPESS)
- Order preserving: efficient query processing
- Splitting and scaling
- Purpose: change the frequency distribution of encrypted data values in the value index so that it differs from the frequency distribution of the original values
24Splitting
- Every plaintext value p is encrypted into multiple distinct ciphertext values v1, v2, ..., vn by using distinct keys, with Σ|vi| = |p|, where |x| is the # of occurrences of x
- ∀ encrypted value vi, |vi| ∈ {m-1, m, m+1}
- Order is preserved: encrypted values corresponding to different plaintext values never straddle each other
- The mapping between encrypted values and plaintext values is unique, i.e., splitting alone is not secure!
- E.g., for data values on attribute CustomerName
Plaintext value   # of occurrences   Encrypted value   # of occurrences
Andy              12                 KA                3
                                     KH                4
                                     KT                5
Betty             5                  SF                5
Carl              9                  WA                4
                                     WE                5
3 + 4 + 5 = 12, 5 = 5, 4 + 5 = 9
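One way to compute such a split is sketched below. It is an assumption-laden illustration rather than the paper's procedure: the function name `split_count` and the greedy distribution are made up, and m = 4 is inferred from the occurrence counts 3, 4, and 5 in the CustomerName example.

```python
def split_count(n: int, m: int) -> list:
    """Split n occurrences into parts whose sizes are all in {m-1, m, m+1}.

    Uses the fewest parts possible and raises ValueError when no split
    exists (e.g. n = 2 with m = 4).
    """
    lo, hi = m - 1, m + 1
    if n < 1 or lo < 1:
        raise ValueError("n must be >= 1 and m must be >= 2")
    k = -(-n // hi)  # ceil(n / hi): fewest parts that can hold n
    if k * lo > n:
        raise ValueError(f"{n} cannot be split into parts of size {lo}..{hi}")
    parts = [lo] * k
    extra = n - k * lo
    for i in range(k):
        add = min(hi - lo, extra)  # each part can grow by at most 2
        parts[i] += add
        extra -= add
    return parts


andy = split_count(12, 4)
betty = split_count(5, 4)
carl = split_count(9, 4)
```

For the slide's counts this yields parts of sizes 5, 4, 3 for Andy, 5 for Betty, and 5, 4 for Carl, which agrees with the table up to ordering. Each part would then be encrypted under its own key so that the resulting ciphertexts stay within the plaintext's order range.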
25Scaling
- Every encrypted value is replicated multiple times so that its # of occurrences is scaled up
- After scaling, the mapping between encrypted values and plaintext values is not unique!
- E.g., for data values on attribute CustomerName
Plaintext value   # of occurrences   Encrypted value   # of occurrences   Scaled to
Andy              12                 KA                3                  6
                                     KH                4                  6
                                     KT                5                  6
Betty             5                  SF                5                  6
Carl              9                  WA                4                  6
                                     WE                5                  6
To map 6 distinct ciphertext values to 3 distinct plaintext values, there is more than one possible mapping
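Scaling can be sketched as padding each value-index entry with replicas. This is a rough illustration only: the index layout, the function name, and the choice to replicate existing block IDs in round-robin order are all assumptions, and the block IDs are invented. Only the occurrence counts (3, 4, 5, 5, 4, 5 scaled to 6) follow the slide.

```python
def scale_index(value_index, target):
    """Replicate entries so every ciphertext has exactly `target` occurrences.

    value_index: ciphertext -> non-empty list of block IDs, one per
    occurrence (hypothetical layout). Replicas reuse existing block IDs,
    so no new blocks are introduced.
    """
    scaled = {}
    for cipher, block_ids in value_index.items():
        if not block_ids or len(block_ids) > target:
            raise ValueError(f"cannot scale {cipher!r} to {target}")
        missing = target - len(block_ids)
        replicas = [block_ids[i % len(block_ids)] for i in range(missing)]
        scaled[cipher] = block_ids + replicas
    return scaled


value_index = {
    "KA": [1, 2, 3],
    "KH": [1, 2, 4, 5],
    "KT": [2, 3, 5, 6, 7],
    "SF": [1, 3, 4, 6, 7],
    "WA": [2, 4, 6, 7],
    "WE": [1, 3, 5, 6, 7],
}
scaled = scale_index(value_index, 6)
```

After scaling, every ciphertext has the same # of occurrences, so occurrence counts alone no longer tell the attacker how the six ciphertexts partition among Andy, Betty, and Carl.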
26Technical Results of Security of Value Index
- We prove there exists a large number of candidate databases (including the true hosted database) such that
- for any query that is captured by any security constraint, only the true database returns a non-empty answer
- by looking at the value index, the candidates are pairwise indistinguishable
27Related Work of Value Index
- Efficient processing of queries on encrypted relational databases [Hacigumus et al. 2002]
- Index on the bucket ID, which represents the partition to which the unencrypted value belongs
- Does NOT consider an occurrence-based distribution model
- Order-preserving encryption for numeric data [Agrawal et al. 2004]
- Considers a DIFFERENT, histogram-based distribution model
- Balancing security and efficiency in untrusted relational DBMSs [Damiani et al. 2003]
- Proposes an indexing scheme based on direct encryption and hashing, and measures the information exposure
- For the same occurrence-based distribution model as ours, their probability of information exposure can be HIGH
- The encryption is NOT order-preserving
28Outline
- Introduction
- Design of metadata
- Secure and efficient query processing
29Example of Query Processing
[Figure: query processing example. Client: query Q, //purchase[cname=Andy]/pname, goes through the Query Translator, which produces the translated query Qs, //α[β ≥ KA AND β ≤ KT]/γ. Untrusted server: the structural part of Qs (//α, /β, /γ) is looked up in the structural index and the value constraint (β ≥ KA AND β ≤ KT) in the value index; each lookup yields block IDs (Block 1; Block 1, 2), and the results are joined to select the blocks of the encrypted XML database to return. Client: the XML Decryptor and the Query Executor compute the answer (Lily).]
30Technical Results of Security of Query Answering
- Let A be any query that is captured by the security constraints, and Bel(B(A)) be the attacker's belief probability that the hosted database satisfies A
- We prove that answering queries does not increase Bel(B(A))
31Experiments
Compared with the naïve method, our approach achieves > 80% savings!
32Conclusion
- We consider the problem of efficient and secure evaluation of XPath queries on encrypted XML databases
- We formally define the attack model and security (see paper)
- We propose
- The security constraints (see paper)
- The secure encryption scheme (see paper)
- The design of secure structural and value indexes
- The secure and efficient query evaluation
33Future Work
- More prior knowledge
- Tag distribution
- Query workload distribution
- Correlations between data values
- Updates on database
- Definition of security
- Secure encryption scheme
- How to design metadata
35Extra Slides
36Similar Application Scenario Untrusted Disk
- The attacker may install a Trojan on the disk where the databases are stored (maybe locally) and spy on the operations on the databases
- The disk is no longer trusted, which is similar to the untrusted server
37More Discussion on Security of Structural Index
- The attacker can still infer the structural relations (e.g., parent/child, siblings, etc.) between the nodes in the encrypted elements
- However, he cannot reconstruct the exact content of the original element
38Other Contributions Security Constraints
- Node type constraint
- For sensitive XML element with its content
- E.g., //customer//prescription
- Association type constraint
- For sensitive associations between data values
- E.g., //customer (/name, //purchase//mname)
39Security Definitions
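[Figure: the client (Query Translator, XML Decryptor, Query Executor) sends the translated query Qs for query Q to the untrusted server, whose Query Executor uses the metadata and the encrypted XML database to return a set of encryption blocks, from which the client computes the answer of query Q]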
40XML Encryption
- W3C standard
- Different encryption granularity
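[Figure: XML tree with root Info and a purchases subtree; each purchase has cname and pname children, with values such as Andy, Lily, Betty, and Last supper, illustrating different encryption granularities]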
41XML Encryption (Cont.)
- A tradeoff exists between encryption granularity and efficient query processing
- The next question is
- What is the optimal encryption scheme s.t.
- (1) it is secure, and
- (2) it facilitates query processing?
42Secure Encryption Scheme
- Encryption Scheme S
- Every security constraint is enforced
- ∀ node type constraint c, every node that c binds to is encrypted
- E.g., for the security constraint //customer//prescription, encrypt every prescription element
- ∀ association type constraint p(q1, q2), the nodes that bind to either p/q1 or p/q2 are encrypted
- E.g., for the security constraint //customer (/name, //purchase//mname), either //customer/name or //customer//purchase//mname is encrypted
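Selecting the nodes that a constraint forces into encryption can be sketched with the standard library's limited XPath support. The sample document, the wrapper element `db`, the helper names, and the `side` parameter are assumptions for illustration; choosing which side of an association constraint to encrypt optimally is the hard part and is not attempted here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample database; the wrapper element <db> is an assumption.
doc = ET.fromstring("""
<db>
  <customer>
    <name>Andy</name>
    <prescription>p1</prescription>
    <purchase><mname>m1</mname></purchase>
  </customer>
  <customer>
    <name>Betty</name>
    <purchase><mname>m2</mname></purchase>
  </customer>
</db>
""")


def nodes_for_node_constraint(root, path):
    """Nodes bound by a node type constraint such as //customer//prescription."""
    return root.findall("." + path)


def nodes_for_association_constraint(root, p, q1, q2, side=1):
    """Nodes bound by one side of an association constraint p(q1, q2).

    Encrypting either p/q1 or p/q2 enforces the constraint; `side` picks one.
    """
    return root.findall("." + p + (q1 if side == 1 else q2))


to_encrypt = nodes_for_node_constraint(doc, "//customer//prescription")
names = nodes_for_association_constraint(doc, "//customer", "/name", "//purchase//mname", side=1)
mnames = nodes_for_association_constraint(doc, "//customer", "/name", "//purchase//mname", side=2)
```

Either `names` or `mnames` is enough to hide the association; an encryption scheme would pick the side that keeps query processing cheapest.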
43Secure Encryption Scheme (Cont.)
- More protection
- Every leaf element containing data values is encrypted with an encryption decoy
- Effect: every encrypted value is unique, i.e., occurs only once
- E.g., original values (AIDS, AIDS, cold) are encrypted to (CCED, PACS, DAEE)
- Goal: defense against frequency-based attacks
44Secure Encryption Scheme (Cont.)
- Theorem: the encryption scheme is a secure encryption scheme
- Theorem: finding an optimal secure encryption scheme is NP-hard in the size of the security constraints
45Unsafety of Continuous Interval Index
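[Figure: original database with continuous intervals A [1, 10], B [2, 5], B [6, 9], C [3, 4], D [7, 8]. After grouping, the index shows A [1, 10], B [2, 9], C [3, 4], D [7, 8], with the original B [2, 5] and B [6, 9] indicated.]
The original structure is revealed by the gap!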
46Safety of Discontinuous Interval Index
[Figure: original database with discontinuous intervals A [0, 1], B [0.1, 0.4], B [0.5, 0.9], C [0.2, 0.25], D [0.55, 0.75]. A fake candidate: A [0, 1], B [0.1, 0.9], C [0.2, 0.25], D [0.55, 0.6], D [0.65, 0.75].]
47Splitting
- E.g., for data values on attribute Age
Plaintext value of Age   # of occurrences
18                       10
20                       5
30                       27
[Table columns for the ciphertext values of Age and their # of occurrences ki are missing]
ki ∈ {m-1, m, m+1}, m = 6
48Scaling
[Table: plaintext values of price 18, 20, 30 with # of occurrences 10, 5, 27; ciphertext values of price 101, 124, 189, 210, 312, 367, 371, 389 with their # of occurrences before and after scaling]
To map 8 distinct ciphertext values to 3 distinct plaintext values, there is more than one possible mapping
49Value Metadata (Cont.)
50Query Processing at Client
- The tags and values are encrypted
- E.g., original query: //customer[//zipcode=12500]//name
- customer → α, zipcode → β, name → γ
- Zipcode = 12500 → 4000 ≤ β and β ≤ 7000
- Translated query: //α[4000 ≤ β and β ≤ 7000]//γ
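The client-side translation can be sketched as two table lookups. The tag mapping and the ciphertext range for zipcode 12500 are taken from this slide; the table names, the function signature, and the output string format are assumptions for illustration.

```python
# Client-side secrets (values from the slide; the layout is an assumption).
TAG_MAP = {"customer": "α", "zipcode": "β", "name": "γ"}
# Plaintext value -> [lowest, highest] ciphertext it was split into.
VALUE_RANGES = {("zipcode", 12500): (4000, 7000)}


def translate(context_tag, pred_tag, value, result_tag):
    """Translate //context[//pred = value]//result into its encrypted form.

    An equality on a plaintext value becomes a range over the
    order-preserving ciphertexts that the value was split into.
    """
    lo, hi = VALUE_RANGES[(pred_tag, value)]
    ctx, pred, res = TAG_MAP[context_tag], TAG_MAP[pred_tag], TAG_MAP[result_tag]
    return f"//{ctx}[{lo} ≤ {pred} and {pred} ≤ {hi}]//{res}"


qs = translate("customer", "zipcode", 12500, "name")
```

Because the encryption is order preserving, the server can evaluate the range predicate directly on ciphertexts without learning which plaintext value was queried.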
51Query Processing at Server
(1) Structural constraints are evaluated on the structural index
A set of encryption block IDs Bs
(2) Value-based constraints (4000 ≤ β and β ≤ 7000) are evaluated on the value index
A set of encryption block IDs Bv
(3) The blocks corresponding to Bs ∩ Bv are returned to the client. Each returned block contains answers of the original query
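The three server-side steps can be sketched as two index lookups followed by a set intersection. All table layouts, function names, and sample entries below are assumptions for illustration; only the range 4000 to 7000 comes from the slides.

```python
def structural_blocks(index_table, block_table, enc_tag):
    """Bs: blocks whose interval contains an element interval recorded for enc_tag."""
    hits = index_table.get(enc_tag, [])
    return {
        bid
        for bid, (bs, be) in block_table.items()
        if any(bs <= s and e <= be for s, e in hits)
    }


def value_blocks(value_index, lo, hi):
    """Bv: blocks holding at least one ciphertext in the range [lo, hi]."""
    return {
        bid
        for cipher, bids in value_index.items()
        if lo <= cipher <= hi
        for bid in bids
    }


# Hypothetical server-side metadata.
index_table = {"γ": [(0.2, 0.25), (0.55, 0.75)]}   # encrypted tag -> intervals
block_table = {1: (0.1, 0.4), 2: (0.5, 0.9)}       # block ID -> interval
value_index = {4500: {1}, 6800: {1}, 9100: {2}}    # ciphertext -> block IDs

Bs = structural_blocks(index_table, block_table, "γ")
Bv = value_blocks(value_index, 4000, 7000)
returned = Bs & Bv
```

In this sample only block 1 satisfies both the structural and the value-based constraint, so it is the only block shipped to the client for decryption.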
52Experiments (Cont.)
- Effects of Various Secure Encryption Schemes
- The optimal encryption scheme always has the best query evaluation performance
- The query evaluation cost of the approximate scheme is around 1.1-1.3 times that of the optimal encryption scheme