How to compile searching software so that it is impossible to reverse-engineer.

Learn more at: http://web.cs.ucla.edu
1
How to compile searching software so that it is
impossible to reverse-engineer.
(Private Keyword Search on Streaming Data)
Rafail Ostrovsky, William Skeith (UCLA)
(patent pending)
2
MOTIVATION: Problem 1.
  • Each hour, we wish to find whether any of hundreds of
    passenger lists has a name from the Possible
    Terrorists list and, if so, his or her itinerary.
  • The Possible Terrorists list is classified and
    should not be revealed to airports.
  • Tantalizing question: can the airports help (and
    do all the search work) if they are not allowed
    to get the Possible Terrorists list?

PROBLEM 1: Is it possible to design mobile
software that can be transmitted to all airports
(including potentially revealing this software to
the adversary due to leaks) so that this software
collects ONLY the information needed, without
revealing what it is collecting at each node?
Non-triviality requirement: must send back
only the needed information, not everything!
3
MOTIVATION: Problem 2.
  • Looking for malicious insiders and/or terrorist
    communication.
  • (I) First, we must identify some signature
    criteria (rules) for suspicious behavior;
    typically, this is done by analysts.
  • (II) Second, we must detect which nodes/stations
    transmit these signatures.
  • Here, we want to tackle part (II).

Public networks
PROBLEM 2: Is it possible to design software that
can capture all messages (and network locations)
that match a secret/classified set of rules?
Key challenge: the software must not reveal the
secret rules. Non-triviality requirement: the
software must send back only the locations and
messages that match the given rules, not
everything it sees.
4
What we want
Punch line: we can send executable code
publicly (it won't reveal its secrets!).
5
Current Practice
  • Continuously transfer all data to a secure
    environment.
  • After data is transferred, filter in the
    classified environment, keep only a small fraction
    of documents.

6
Current practice
[Figure: three document streams, D(1,1) D(1,2) D(1,3),
D(2,1) D(2,2) D(2,3) and D(3,1) D(3,2) D(3,3), are all
transferred into the Classified Environment, where a Filter
writes the selected documents to Storage.]
  • Filter rules are written by an analyst and are
    classified!
  • Amount of data that must be transferred to a
    classified environment is enormous!
7
Current Practice
  • Drawbacks
  • Communication
  • Processing
  • Cost and timeliness

8
How to improve performance?
  • Distribute work to many locations on a network,
    where you decide on the fly which data is
    useful
  • Seemingly ideal solution, but there is a major problem:
  • It is not clear how to maintain security, which is the
    focus of this technology.

9
[Figure: a Filter with its own Storage runs on each of the
three streams D(i,1), D(i,2), D(i,3). The Storage at stream 1
holds E(D(1,2)) and E(D(1,3)); the Storage at stream 2 holds
E(D(2,2)). The Classified Environment applies Decrypt and
ends up with Storage D(1,2), D(1,3), D(2,2).]
10
  • Example Filters
  • Look for all documents that contain special
    classified keywords (or string or data-item
    and/or do not contain some other data), selected
    by an analyst.
  • Privacy
  • Must hide what rules are used to create the
    filter
  • Output must be encrypted

11
More generally
  • We define the notion of Public Key Program
    Obfuscation
  • Encrypted version of a program
  • Performs same functionality as un-obfuscated
    program, but
  • Produces encrypted output
  • Impossible to reverse engineer
  • A little more formally

12
Public Key Program Obfuscation
  • Can compile any code into an obfuscated code with
    small storage.
  • Think of the Compiler as a mapping:
  • Source code → "Smart" Public-Key Encryption with
    initial Encrypted Storage + Decryption Key.
  • Non-triviality: the sizes of the compiled program,
    encrypted storage and encrypted output are not much
    bigger, compared to the uncompiled code.
  • Nothing about the program is revealed, given the
    compiled code + storage.
  • Yet, someone who has the decryption key can
    recover the original output.

13
Privacy
14
Related Notions
  • PIR (Private Information Retrieval)
    [CGKS], [KO], [CMS]
  • Keyword PIR [KO], [CGN], [FIPR]
  • Cryptographic counters [KMO]
  • Program Obfuscation [BGIRSVY]
  • Here the output is identical to the un-obfuscated
    program, but in our case it is encrypted.
  • Public Key Program Obfuscation
  • A more general notion than PIR, with lots of
    applications

15
What do we want?
[Figure: a Filter reads the stream D(1,1), D(1,2), D(1,3)
and keeps Storage E(D(1,2)), E(D(1,3)).]
2 requirements:
  • Correctness: only matching documents are saved,
    nothing else.
  • Efficiency: the decoding is proportional to the
    length of the buffer, not the size of the entire stream.
Conundrum: Compiled Filter Code is not allowed to
have ANY branches (i.e. any "if then else"
executables). Only straight-line code is allowed!
16
Simplifying Assumptions for this Talk
  • All keywords come from some poly-size dictionary
  • Truncate documents beyond a certain length

17
Sneak peek: the compiled code
  • Suppose we are looking for all documents that
    contain some secret word from Webster's dictionary.
  • Here is how it looks to the adversary: for each
    document, execute the same code as follows.

18
Look up encryptions of all words appearing in the
document and multiply them together. Take this
value and apply a fixed formula to it to get
value g.
[Figure: the Dictionary lists the words w1, w2, ..., wn,
each next to a ciphertext E(·) whose content is hidden.
Document D is looked up in it to produce g, which is
written into a Small Output Buffer of ciphertext triples.]
19
How should a solution look?
20
[Figure: a stream in which matching documents 1, 2 and 3
are interleaved with three non-matching documents.]
21
How do we accomplish this?
22
Reminder: PKE
  • Key-generation(1^k) → (PK, SK)
  • E(PK, m, r) → c
  • D(c, SK) → m
  • We will use PKE with additional properties.

23
Several Solutions based on Homomorphic Public-Key
Encryptions
  • For this talk: Paillier Encryption
  • Properties:
  • E(x) is probabilistic; in particular, it can encrypt
    a single bit in many different ways, s.t. any
    instance of E(0) and any instance of E(1) cannot
    be distinguished.
  • Homomorphic, i.e., E(x) * E(y) = E(x + y)

24
Using Paillier Encryption
  • E(x) * E(y) = E(x + y)
  • Important to note:
  • E(0)^c = E(0) * E(0) * ... * E(0)
  • = E(0 + 0 + ... + 0) = E(0)
  • E(1)^c = E(1) * E(1) * ... * E(1)
  • = E(1 + 1 + ... + 1) = E(c)
  • Assume we can somehow compute an encrypted value
    v, where we don't know what v stands for, but
    v = E(0) for un-interesting documents and v = E(1)
    for interesting documents.
  • What's v^c? It is either E(0) or E(c), where we
    don't know which one it is.
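
The identities on this slide can be checked numerically. The following is a minimal sketch of textbook Paillier encryption with the common generator choice n + 1; the tiny fixed primes and the function names `keygen`, `encrypt` and `decrypt` are assumptions made for illustration and are not taken from the talk. It is insecure at this key size and is only meant to show that multiplying ciphertexts adds plaintexts.

```python
import math
import random

def keygen(p=1789, q=1861):
    """Toy Paillier keys from two small fixed primes (insecure; demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # inverse of lam mod n; valid for generator n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """E(PK, m, r): fresh randomness r makes repeated encryptions differ."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(n, sk, c):
    """D(c, SK): recover the plaintext m (mod n) from ciphertext c."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

n, sk = keygen()
n2 = n * n
x, y, c = 5, 7, 4

# E(x) * E(y) decrypts to x + y.
sum_ct = encrypt(n, x) * encrypt(n, y) % n2

# v^c is E(0) for an un-interesting document and E(c) for an interesting one.
v_boring = pow(encrypt(n, 0), c, n2)
v_match = pow(encrypt(n, 1), c, n2)

print(decrypt(n, sk, sum_ct), decrypt(n, sk, v_boring), decrypt(n, sk, v_match))
# prints: 12 0 4
```

Only the holder of `sk` can tell `v_boring` from `v_match`; the party doing the multiplications sees two random-looking numbers.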

25
[Figure: the Dictionary maps each word to a ciphertext, and
document D is looked up in it to produce the pair (g, g^D).]
Dictionary:
w1 E(0)
w2 E(1)
w3 E(0)
w4 E(0)
w5 E(1)
. . .
wn-2 E(1)
wn-1 E(0)
wn E(0)
g = E(0) if there are no matching words; g = E(c)
if there are c matching words.
g^D = E(0) if there are no matching words; g^D =
E(cD) if there are c matching words. Thus if we
keep g = E(c) and g^D = E(cD), we can calculate D
exactly.
Output Buffer (initially):
E(0) E(0) E(0) E(0) E(0) E(0) E(0) E(0) E(0) E(0)
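
A sketch of the per-document computation on slide 25, under stated assumptions: it repeats a compact toy Paillier (generator n + 1, tiny fixed primes, insecure) so that it runs on its own, and the names `compile_filter`, `process_document` and `decode`, the sample words, and the encoding of a document as one small integer are all hypothetical. Note that `process_document` runs the same multiplications for every document and never branches on whether a word matched.

```python
import math
import random

P, Q = 1789, 1861                      # toy primes, demo only
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)

def encrypt(m):
    """Toy Paillier encryption of m with fresh randomness."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2

def decrypt(c):
    """Toy Paillier decryption; needs the secret values LAM and MU."""
    return (pow(c, LAM, N2) - 1) // N * MU % N

def compile_filter(dictionary, secret_keywords):
    """Compiler side: E(1) for secret keywords, E(0) for every other word."""
    return {w: encrypt(1 if w in secret_keywords else 0) for w in dictionary}

def process_document(table, words, doc_value):
    """Public side: multiply the table entries, then raise to the document."""
    g = 1                              # 1 is a (trivial) encryption of 0
    for w in set(words):               # each distinct word counts once
        g = g * table[w] % N2
    return g, pow(g, doc_value, N2)    # (E(c), E(c * D))

def decode(g, g_d):
    """Key holder: recover D = (c * D) / c, or None when c = 0."""
    c = decrypt(g)
    if c == 0:
        return None
    return decrypt(g_d) * pow(c, -1, N) % N

dictionary = ["apple", "bomb", "cat", "flight", "plot", "zebra"]
table = compile_filter(dictionary, {"bomb", "plot"})

D = int.from_bytes(b"hi", "big")       # stand-in for a document, must be < N
match = decode(*process_document(table, ["cat", "bomb", "plot"], D))
no_match = decode(*process_document(table, ["apple", "zebra", "cat"], D))
print(match == D, no_match)
```

The division in `decode` is a multiplication by the inverse of c modulo n, which exists here because c is a small positive count.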
26
Here's another matching document
  • Collisions cause two problems:
  • 1. Good documents are destroyed
  • 2. Non-existent documents could be fabricated

This is matching document 1
This is matching document 3

This is matching document 2
27
  • We'll make use of two combinatorial lemmas

29
Combinatorial Lemma 1
  • Claim: the color survival game succeeds with
    probability > 1 - neg(γ)

30
How to detect collisions?
  • Idea: append a highly structured (yet random)
    short combinatorial object to the message, with
    the property that if 2 or more of them collide,
    the combinatorial property is destroyed.
  • → can always detect collisions!

31
100001100010010100001010010
010001010001100001100001010
010100100100010001010001010
100100010111100100111010010
32
Combinatorial Lemma 2
Claim: collisions are detected with
probability > 1 - exp(-k/3)
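
The slides do not spell out the combinatorial object, so the following is only a sketch under an assumption suggested by the bit strings on slide 31: the tag is split into triples, and each triple has exactly one nonzero position. Everything is modelled at the plaintext level, where multiplying ciphertexts in a buffer slot corresponds to adding the tags position by position. The function names and the number of triples are hypothetical.

```python
import random

def make_tag(triples, c, rng):
    """Random tag: in every triple, exactly one position holds c, the rest 0."""
    tag = []
    for _ in range(triples):
        triple = [0, 0, 0]
        triple[rng.randrange(3)] = c
        tag.extend(triple)
    return tag

def add_tags(a, b):
    """What a buffer slot holds after two documents land on it."""
    return [x + y for x, y in zip(a, b)]

def looks_like_single_document(tag):
    """The structure survives only if every triple has exactly one nonzero."""
    return all(
        sum(1 for v in tag[i:i + 3] if v != 0) == 1
        for i in range(0, len(tag), 3)
    )

rng = random.Random(0)
single = make_tag(9, 1, rng)

# Two hand-built tags that disagree in every triple: the sum is flagged.
a = [1, 0, 0] * 9
b = [0, 1, 0] * 9
collided = add_tags(a, b)

# The failure case: both tags chose the same position in every triple.
unlucky = add_tags(a, a)

print(looks_like_single_document(single),
      looks_like_single_document(collided),
      looks_like_single_document(unlucky))
# prints: True False True
```

In this simplified model two random tags escape detection only when they agree in every triple, which for t triples happens with probability (1/3)^t, so longer tags make undetected collisions rapidly less likely.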
33
We do the same for all documents!
34
For every document in the stream do the same:
  • Look up encryptions of all words appearing in the
    document and multiply them together (= g).
  • Compute g^D and f(g).
  • Multiply (g, g^D, f(g)) into γ randomly chosen
    locations of the buffer.
[Figure: the Dictionary lists the words w1, w2, ..., wn,
each next to a ciphertext E(·). Document D yields the
triple (g, g^D, f(g)), which is multiplied into a Small
Output Buffer of ciphertext triples.]
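
The buffer step can be sketched without any cryptography by modelling each ciphertext by its plaintext: multiplying a ciphertext into a slot becomes adding plaintexts modulo n. The prime modulus, the slot count, the parameter name `copies` (the number of randomly chosen locations per document) and the function names are assumptions for illustration, and the collision tag f(g) is left out for brevity.

```python
import random

MODULUS = 2_147_483_647   # stand-in for the Paillier modulus n (prime here)

def make_buffer(slots):
    """Every slot starts as the pair (0, 0), i.e. (E(0), E(0)) in the scheme."""
    return [[0, 0] for _ in range(slots)]

def process(buffer, c, doc_value, copies, rng):
    """Same steps for every document; a non-match just adds zeros."""
    pair = (c % MODULUS, c * doc_value % MODULUS)
    for i in rng.sample(range(len(buffer)), copies):
        buffer[i][0] = (buffer[i][0] + pair[0]) % MODULUS
        buffer[i][1] = (buffer[i][1] + pair[1]) % MODULUS

def decode(buffer):
    """Key holder: every nonzero slot yields D = (c * D) / c mod n."""
    found = set()
    for c, cd in buffer:
        if c != 0:
            found.add(cd * pow(c, -1, MODULUS) % MODULUS)
    return found

rng = random.Random(1)
buffer = make_buffer(slots=20)
stream = [(0, 111), (2, 4242), (0, 333)]   # (matching words c, document D)
for c, doc in stream:
    process(buffer, c, doc, copies=3, rng=rng)

print(decode(buffer))
# prints: {4242}
```

With a single matching document no two matches can collide, so every nonzero slot decodes to the same value. With several matches, slots hit by more than one document decode to garbage, which is why the scheme also stores f(g).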
35
Overflow: how to always collect at least m
items (with arbitrary overflow of matching
documents)
  • Idea: create a logarithmic (in stream size)
    number of original buffers.
  • First buffer is processed for every stream item
  • Second buffer takes every item with probability ½
  • Third buffer takes every item with (independent)
    probability ¼
  • ith buffer with probability 1/2^i
  • Key point: if the number of matching documents > m,
    at least one buffer will get O(m) matching documents!
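
A small simulation of this sampling idea, as a sketch only: buffer levels are counted from 0 here, so level 0 sees every item and level i sees each item with independent probability 1/2^i. The function name, the seed and the synthetic stream are assumptions for illustration; in the real scheme each routed item would then go through the straight-line filter of that buffer.

```python
import random

def route_to_buffers(stream, rng):
    """Return, per level i, the stream items that buffer i gets to process."""
    # About log2(stream size) + 1 levels, computed without floating point.
    levels = max(1, (len(stream) - 1).bit_length() + 1)
    buffers = [[] for _ in range(levels)]
    for item in stream:
        for i in range(levels):
            # The coin flip ignores the item's content, so it leaks nothing.
            if rng.random() < 0.5 ** i:
                buffers[i].append(item)
    return buffers

rng = random.Random(2006)
# Synthetic stream of (name, is_match) pairs: every 4th document matches.
stream = [("doc%d" % j, j % 4 == 0) for j in range(1024)]
buffers = route_to_buffers(stream, rng)

matches_per_level = [sum(1 for _, is_match in b if is_match) for b in buffers]
print(matches_per_level)
```

The printed counts roughly halve from one level to the next, so whatever the true number of matching documents is, some level receives a number of matches close to the buffer capacity.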

36
Comparison of our work to [Bethencourt, Song,
Waters 06]
  • [OS-05]
  • Buffer size to store m items: O(m log m)
  • Efficiency: decoding time is proportional to the
    buffer size.
  • [BSW-06]
  • Buffer size to store m items: O(m)
  • Efficiency: decoding time is proportional to the
    length of the entire stream.

37
More from the paper that we don't have time to
discuss
  • Reducing program size below dictionary size
    (using Φ-Hiding from [CMS])
  • Queries containing AND (using [BGN] machinery)
  • Eliminating negligible error (using perfect
    hashing)
  • Scheme based on arbitrary homomorphic encryption
  • Extending to words not from dictionary (with
    small error prob.)

38
Conclusions
  • We introduced Private searching on streaming data
  • More generally: Public key program obfuscation --
    more general than PIR, or cryptographic counters
  • Practical, efficient protocols
  • Eat your cake and have it too: ensure that only
    useful documents are collected.
  • Many possible extensions and lots of open
    problems
  • THANK YOU!