How to compile searching software so that it is impossible to reverse-engineer.

About This Presentation

Slides: 39

Provided by: ucl110

Learn more at: http://web.cs.ucla.edu

Transcript and Presenter's Notes

1
How to compile searching software so that it is
impossible to reverse-engineer.
(Private Keyword Search on Streaming Data)
Rafail Ostrovsky, William Skeith
UCLA
(patent pending)
2
MOTIVATION: Problem 1.

Each hour, we wish to find out whether any of hundreds of
passenger lists contains a name from the Possible
Terrorists list and, if so, his or her itinerary.
The Possible Terrorists list is classified and
should not be revealed to airports.
Tantalizing question: can the airports help (and
do all the search work) if they are not allowed
to get the Possible Terrorists list?

PROBLEM 1: Is it possible to design mobile
software that can be transmitted to all airports
(potentially revealing this software to
the adversary through leaks) so that this software
collects ONLY the information needed, without
revealing what it is collecting at each node?
Non-triviality requirement: it must send back
only the needed information, not everything!
3
MOTIVATION: Problem 2.

Looking for malicious insiders' and/or terrorists'
communication:
(I) First, we must identify some signature
criteria (rules) for suspicious behavior;
typically, this is done by analysts.
(II) Second, we must detect which nodes/stations
transmit these signatures.
Here, we want to tackle part (II).

Public networks
PROBLEM 2: Is it possible to design software that
can capture all messages (and network locations)
that match a secret/classified set of rules?
Key challenge: the software must not reveal the
secret rules. Non-triviality requirement: the
software must send back only the locations and
messages that match the given rules, not
everything it sees.
4
What we want
Punch line: we can send executable code
publicly. (It won't reveal its secrets!)
5
Current Practice

Continuously transfer all data to a secure
environment.
After the data is transferred, filter it in the
classified environment and keep only a small fraction
of the documents.

6
Current practice

[Diagram: three data streams, D(1,1) → D(1,2) → D(1,3), D(2,1) → D(2,2) → D(2,3), and D(3,1) → D(3,2) → D(3,3), are all transferred into the Classified Environment, where a Filter decides which documents go to Storage.]
Filter rules are written by an analyst and are
classified!
The amount of data that must be transferred to a
classified environment is enormous!
7
Current Practice

Drawbacks
Communication
Processing
Cost and timeliness

8
How to improve performance?

Distribute the work to many locations on the network,
where you decide on the fly which data is
useful.
Seemingly an ideal solution, but:
Major problem:
It is not clear how to maintain security, which is the
focus of this technology.

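[Diagram: each of the three streams D(i,1) → D(i,2) → D(i,3) now passes through its own Filter with local encrypted Storage. Stream 1 stores E(D(1,2)) and E(D(1,3)), stream 2 stores E(D(2,2)), and stream 3 stores nothing. The Classified Environment only runs Decrypt on the stored items and ends up with Storage D(1,2), D(1,3), D(2,2).]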
10

Example Filters
Look for all documents that contain special
classified keywords (or strings or data items,
and/or do not contain some other data), selected
by an analyst.
Privacy
Must hide what rules are used to create the
filter.
Output must be encrypted.

11
More generally

We define the notion of Public Key Program
Obfuscation
Encrypted version of a program
Performs same functionality as un-obfuscated
program, but
Produces encrypted output
Impossible to reverse engineer
A little more formally

12
Public Key Program Obfuscation

Can compile any code into obfuscated code with
small storage.
Think of the Compiler as a mapping:
Source code → Smart Public-Key Encryption with
initial Encrypted Storage + Decryption Key.
Non-triviality: the sizes of the compiled program +
encrypted storage + encrypted output are not much
bigger than those of the uncompiled code.
Nothing about the program is revealed, given the
compiled code + storage.
Yet someone who has the decryption key can
recover the original output.

13
Privacy
14
Related Notions

PIR (Private Information Retrieval)
[CGKS, KO, CMS]
Keyword PIR [KO, CGN, FIPR]
Cryptographic counters [KMO]
Program Obfuscation [BGIRSVY]
Here the output is identical to that of the un-obfuscated
program, but in our case it is encrypted.
Public Key Program Obfuscation
A more general notion than PIR, with lots of
applications

15
What do we want?
[Diagram: the stream D(1,1) → D(1,2) → D(1,3) passes through the Filter, which keeps E(D(1,2)) and E(D(1,3)) in Storage.]
Two requirements:
Correctness: only matching
documents are saved, nothing else.
Efficiency: the decoding time is proportional to the length of the
buffer, not the size of the entire stream.
Conundrum: the Compiled Filter Code is not allowed to
have ANY branches (i.e., no "if-then-else"
statements). Only straight-line code is allowed!
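
To illustrate the no-branching rule, here is a minimal sketch on plain (unencrypted) Python integers. The function names are hypothetical and chosen only for this illustration. A branching filter reveals through its control flow whether a document matched; the straight-line version always performs the same arithmetic and lets a 0/1 match value decide whether the document contributes anything.

```python
def update_branching(slot: int, match: int, doc: int) -> int:
    """Conventional filter step: control flow depends on the secret match bit."""
    if match:
        return slot + doc
    return slot


def update_straight_line(slot: int, match: int, doc: int) -> int:
    """Same result with no branch: a non-match (match == 0) adds zero."""
    return slot + match * doc


slot = 7
for match in (0, 1):
    # Both versions agree on every input; only the second hides the decision.
    assert update_branching(slot, match, 42) == update_straight_line(slot, match, 42)
```

In the scheme of this talk the same idea is carried out on ciphertexts, so the party running the code never sees the match value itself.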
16
Simplifying Assumptions for this Talk

All keywords come from some poly-size dictionary
Truncate documents beyond a certain length

17
Sneak peek: the compiled code

Suppose we are looking for all documents that
contain some secret word from the Webster dictionary.
Here is how it looks to the adversary: for each
document, execute the same code, as follows.

18
Look up the encryptions of all words appearing in the
document and multiply them together. Take this
value and apply a fixed formula to it to get
the value g.
[Diagram: a Dictionary table pairs each word w1, w2, ..., wn with a ciphertext E(·) whose plaintext is hidden. The words of document D are looked up to compute g, and the result goes into a Small Output Buffer of (·,·,·) entries.]
19
How should a solution look?
20
This is matching document 2

This is a Non-matching document
This is matching document 1
This is matching document 3

This is a Non-matching document
This is a Non-matching document

21
How do we accomplish this?
22
Reminder: PKE

$\text{Key-generation}(1^k) \to (PK, SK)$
$E(PK, m, r) \to c$
$D(c, SK) \to m$
We will use PKE with additional properties.

23
Several Solutions based on Homomorphic Public-Key
Encryptions

For this talk: Paillier Encryption
Properties:
E(x) is probabilistic; in particular, it can encrypt
a single bit in many different ways, s.t. any
instance of E(0) and any instance of E(1) cannot
be distinguished.
Homomorphic, i.e., $E(x) \cdot E(y) = E(x+y)$

24
Using Paillier Encryption

$E(x) \cdot E(y) = E(x+y)$
Important to note:
$E(0)^c = E(0) \cdot E(0) \cdots E(0) = E(0+0+\cdots+0) = E(0)$
$E(1)^c = E(1) \cdot E(1) \cdots E(1) = E(1+1+\cdots+1) = E(c)$
Assume we can somehow compute an encrypted value
v, where we don't know what v stands for, but
$v = E(0)$ for un-interesting documents and $v = E(1)$
for interesting documents.
What's $v^c$? It is either $E(0)$ or $E(c)$, where we
don't know which one it is.
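
The identities on this slide can be checked with a toy Paillier implementation. This is a sketch under stated assumptions, not the authors' code: the generator is fixed to n + 1, the primes are two Mersenne primes picked only to keep the example fast (they are unsuitable for real use), and the function names are hypothetical. It needs Python 3.8+ for the modular inverse via `pow`.

```python
import math
import random

# Toy parameters (assumption): two Mersenne primes, insecure but convenient.
P = 2**89 - 1
Q = 2**107 - 1


def keygen(p=P, q=Q):
    """Return the public modulus n and the secret key (lam, mu)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to n + 1
    return n, (lam, mu)


def encrypt(n, m, rng=random):
    """Probabilistic encryption: (1 + n)^m * r^n mod n^2 for a random r."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, sk, c):
    """Recover m from a ciphertext c."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


n, sk = keygen()
n2 = n * n
c = 5
# E(x) * E(y) decrypts to x + y
assert decrypt(n, sk, encrypt(n, 20) * encrypt(n, 22) % n2) == 42
# E(0)^c decrypts to 0, E(1)^c decrypts to c
assert decrypt(n, sk, pow(encrypt(n, 0), c, n2)) == 0
assert decrypt(n, sk, pow(encrypt(n, 1), c, n2)) == c
```

Whoever raises v to the power c performs exactly the same operation in both cases and cannot tell which one occurred without the secret key.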

25
Dictionary:
w1      E(0)
w2      E(1)
w3      E(0)
w4      E(0)
w5      E(1)
. . .
wn-2    E(1)
wn-1    E(0)
wn      E(0)

For a document D:
$g = E(0)$ if there are no matching words; $g = E(c)$ if there are c matching words.
$g^D = E(0)$ if there are no matching words; $g^D = E(cD)$ if there are c matching words.
Thus, if we keep $g = E(c)$ and $g^D = E(cD)$, we can calculate D exactly.
The pair $(g, g^D)$ goes into the Output Buffer (shown holding E(0) in each of its 10 slots).
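
The following sketch walks through this slide end to end for a single document, without the output buffer. It is an illustration under assumptions, not the authors' implementation: it uses a toy Paillier scheme with generator n + 1 and two insecure Mersenne primes, the helper names are hypothetical, the five-word dictionary is made up, and a document is encoded as the integer value of its bytes (so it must be short enough that cD stays below n). Python 3.8+ is assumed.

```python
import math
import random

P, Q = 2**89 - 1, 2**107 - 1  # toy primes (assumption), insecure


def keygen(p=P, q=Q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    return n, (lam, pow(lam, -1, n))


def encrypt(n, m, rng=random):
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, sk, c):
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n


def build_table(n, dictionary, keywords):
    """Compile step (analyst): E(1) for secret keywords, E(0) for all other words."""
    return {w: encrypt(n, 1 if w in keywords else 0) for w in dictionary}


def filter_document(n, table, text):
    """Untrusted party: identical lookups and multiplications for every document."""
    n2 = n * n
    g = 1  # 1 is a trivial encryption of 0
    for w in sorted(set(text.split())):
        g = g * table[w] % n2
    d = int.from_bytes(text.encode(), "big")  # document as an integer D
    return g, pow(g, d, n2)


def decode(n, sk, g, g_d):
    """Key holder: recover D from E(c) and E(cD); None when c == 0."""
    c = decrypt(n, sk, g)
    if c == 0:
        return None
    d = decrypt(n, sk, g_d) // c
    return d.to_bytes((d.bit_length() + 7) // 8, "big").decode()


n, sk = keygen()
table = build_table(n, ["alpha", "bravo", "charlie", "delta", "echo"], {"bravo", "echo"})
hit = filter_document(n, table, "alpha bravo echo")       # c = 2 matching words
miss = filter_document(n, table, "alpha charlie delta")   # c = 0
assert decode(n, sk, *hit) == "alpha bravo echo"
assert decode(n, sk, *miss) is None
```

Only `decode`, which runs on the key holder's side, branches on c; `filter_document` never learns whether a document matched.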
26
Here's another matching document

Collisions cause two problems:
1. Good documents are destroyed
2. Non-existent documents could be fabricated

This is matching document 1
This is matching document 3

This is matching document 2
27

We'll make use of two combinatorial lemmas

28
(No Transcript)
29
Combinatorial Lemma 1

Claim: the color survival game succeeds with
probability $> 1 - \mathrm{neg}(\gamma)$
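
The slide for this lemma has no transcript, so the simulation below rests on an assumed reading of the game: each of m documents ("colors") throws gamma copies into uniformly random slots of a buffer, a copy survives if it is alone in its slot, and the game succeeds if every color keeps at least one surviving copy. The buffer size of 2 * gamma * m slots and all names are assumptions of this sketch, not taken from the slides.

```python
import random
from collections import Counter


def play_once(m, gamma, rng):
    """One round of the (assumed) color survival game; True if all colors survive."""
    slots = 2 * gamma * m
    throws = [(color, rng.randrange(slots)) for color in range(m) for _ in range(gamma)]
    load = Counter(slot for _, slot in throws)
    survivors = {color for color, slot in throws if load[slot] == 1}
    return len(survivors) == m


def success_rate(m, gamma, trials, seed=0):
    """Fraction of successful rounds over a number of independent trials."""
    rng = random.Random(seed)
    return sum(play_once(m, gamma, rng) for _ in range(trials)) / trials


few_copies = success_rate(m=10, gamma=1, trials=200)
many_copies = success_rate(m=10, gamma=8, trials=200)
print(few_copies, many_copies)
```

With a single copy per document most rounds lose at least one document, while a handful of copies makes failures rare, which is the qualitative behavior the claim describes.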

30
How to detect collisions?

Idea: append a highly structured (yet random)
short combinatorial object to the message, with
the property that if 2 or more of them collide,
the combinatorial property is destroyed.
→ can always detect collisions!

100001100010010100001010010

010001010001100001100001010
010100100100010001010001010

100100010111100100111010010
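
One way to read the four bit strings above, offered as an interpretation rather than something the slide states: each of the first three splits into triples containing exactly one 1, and the fourth equals their bitwise XOR, in which some triples become 111 and the structure is visibly broken. The sketch below checks that reading; the function names and the XOR collision model are assumptions.

```python
SLIDE_STRINGS = [
    "100001100010010100001010010",
    "010001010001100001100001010",
    "010100100100010001010001010",
]
COLLIDED = "100100010111100100111010010"


def is_well_formed(bits: str) -> bool:
    """True if the string splits into triples with exactly one 1 each."""
    return len(bits) % 3 == 0 and all(
        bits[i:i + 3].count("1") == 1 for i in range(0, len(bits), 3)
    )


def xor_all(strings):
    """Bitwise XOR of equal-length bit strings (the collision model assumed here)."""
    acc = [0] * len(strings[0])
    for s in strings:
        acc = [a ^ int(b) for a, b in zip(acc, s)]
    return "".join(map(str, acc))


assert all(is_well_formed(s) for s in SLIDE_STRINGS)
assert xor_all(SLIDE_STRINGS) == COLLIDED
assert not is_well_formed(COLLIDED)
```

A collision goes unnoticed only if every triple of the combined string happens to stay well formed, which becomes less likely as the string gets longer.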
32
Combinatorial Lemma 2
Claim: collisions are detected with
probability $> 1 - \exp(-k/3)$
33
We do the same for all documents!
34
For every document in the stream, do the same:
Look up the encryptions of all words appearing in the
document and multiply them together (= g).
Compute $g^D$ and $f(g)$.
Multiply $(g, g^D, f(g))$ into γ randomly chosen
locations of the Small Output Buffer.
[Diagram: a Dictionary table pairs each word w1, w2, ..., wn with a ciphertext E(·). The words of document D are looked up, and the triple $(g, g^D, f(g))$ is multiplied into a Small Output Buffer of (·,·,·) entries.]
35
Overflow: how to always collect at least m
items (with arbitrary overflow of matching
documents)

Idea: create a logarithmic (in stream size)
number of original buffers.
The first buffer is processed for every stream item.
The second buffer takes every item with probability ½.
The third buffer takes every item with (independent)
probability ¼.
The i-th buffer takes every item with probability $1/2^i$.
Key point: if the number of documents is > M, at least
one buffer will get O(M) matching documents!
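
A small simulation makes the key point concrete. It is a sketch with assumed names and parameters: buffers are numbered from 0 so that buffer i accepts each item with probability 1/2^i, the number of buffers is log2 of the stream size plus one, and only the matching documents are counted (non-matching items would contribute nothing to a buffer's contents).

```python
import math
import random


def matching_per_buffer(num_matching, stream_size, seed=0):
    """Count how many matching documents each sampled buffer receives."""
    rng = random.Random(seed)
    levels = math.ceil(math.log2(stream_size)) + 1
    counts = [0] * levels
    for _ in range(num_matching):
        for i in range(levels):
            # Buffer i accepts independently with probability 1/2^i;
            # the coin flip does not depend on any secret data.
            if rng.random() < 0.5 ** i:
                counts[i] += 1
    return counts


M = 50  # capacity each buffer is designed for (assumption)
counts = matching_per_buffer(num_matching=1000, stream_size=2**16)
print(counts)
```

Buffer 0 receives all 1000 matching documents and overflows, but the counts roughly halve from one buffer to the next, so some later buffer lands at or below M while still holding a constant fraction of M.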

36
Comparison of our work to Bethencourt, Song,
Waters 06

OS-05
Buffer size to store m items: O(m log m)
Efficiency: decoding time is proportional to the
buffer size.

BSW-06
Buffer size to store m items: O(m)
Efficiency: decoding time is proportional to the
length of the entire stream.

37
More from the paper that we don't have time to
discuss

Reducing program size below dictionary size
(using Φ-Hiding from [CMS])
Queries containing AND (using BGN machinery)
Eliminating negligible error (using perfect
hashing)
Scheme based on arbitrary homomorphic encryption
Extending to words not from dictionary (with
small error prob.)

38
Conclusions

We introduced private searching on streaming data.
More generally: public key program obfuscation,
which is more general than PIR or cryptographic counters.
Practical, efficient protocols.
Eat your cake and have it too: ensure that only
useful documents are collected.
Many possible extensions and lots of open
problems.
THANK YOU!
