Title: Pebbling and Proofs of Work
The Complexity of Pebbling Graphs and Spam Fighting
Moni Naor, Weizmann Institute of Science
Based on
- Cynthia Dwork, Andrew Goldberg, Moni Naor: On Memory-Bound Functions for Fighting Spam
- Cynthia Dwork, Moni Naor, Hoeteck Wee: Pebbling and Proofs of Work
Principal techniques for spam-fighting
- FILTERING
  - text-based, trainable filters
- MAKING THE SENDER PAY
  - computation [Dwork Naor 92, Back 97, Abadi Burrows Manasse Wobber 03, DGN 03, DNW 05]
  - human attention [Naor 96, CAPTCHA]
  - micropayments
- NOTE: the techniques are complementary and reinforce each other!
Talk Plan
- The proofs of work approach
- DGN's memory-bound functions
- Generating a large random-looking table [DNW]
- Open problems: moderately hard functions
Pricing via processing [Dwork-Naor Crypto 92]
IDEA: If I don't know you, prove that you spent significant computational resources (say 10 secs of CPU time), just for me, and just for this message.
- automated for the user
- non-interactive, single-pass
- no need for a third party or payment infrastructure
Choosing the function f
- Message m, sender S, receiver R, and date and time d
- Hard to compute f(m,S,R,d): cannot be amortized; lots of work for the sender
- Should have a good understanding of the best methods for computing f
- Easy to check z = f(m,S,R,d): little work for the receiver
- Parameterized to scale with Moore's Law: easy to exponentially increase the computational cost while barely increasing the checking cost
- Example: computing a square root mod a prime vs. verifying it: x² ≡ y (mod P) (see the sketch below)
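To make the asymmetry concrete, here is a minimal Python sketch of the square-root example, under the simplifying assumption that P ≡ 3 (mod 4), so a root of y is y^((P+1)/4) mod P; the particular prime is chosen only for illustration. Extraction costs about log P modular multiplications, verification a single one.

    # A minimal sketch, assuming P ≡ 3 (mod 4); the prime below is illustrative.
    P = 2**127 - 1  # a Mersenne prime; 2^127 - 1 ≡ 3 (mod 4)

    def sqrt_mod(y: int, p: int = P) -> int:
        """Compute a square root of y mod p: the sender's work (~log p multiplications)."""
        assert p % 4 == 3
        return pow(y, (p + 1) // 4, p)

    def verify(x: int, y: int, p: int = P) -> bool:
        """Check x² ≡ y (mod p): the receiver's work, a single multiplication."""
        return (x * x) % p == y % p

    y = pow(12345, 2, P)   # a known quadratic residue
    x = sqrt_mod(y)
    assert verify(x, y)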
Which computational resource(s)?
- WANT: a resource that corresponds to the same computation time across machines
- computing cycles: high variance of CPU speeds among desktops (factors of 10-30)
- memory-bound approach [Abadi Burrows Manasse Wobber 03]: low variance in memory latencies (factors of 1-4)
GOAL: design a memory-bound proof-of-effort function which requires a large number of cache misses
Memory-bound model
- USER: a cache, and main memory that is large but slow
- SPAMMER: a cache of size at most ½ of the user's main memory, and main memory that may be very, very large
- charge accesses to main memory; must avoid exploitation of locality
- computation is free, except for hash function calls; watch out for low-space crypto attacks
Talk Plan
- The proofs of work approach
- DGN's memory-bound functions
- Generating a large random-looking table [DNW]
- Open problems: moderately hard functions
Path-following approach [DGN Crypto 03]
- PUBLIC: large random table T (2× the spammer's cache size)
- PARAMETERS: integer L, effort parameter e
- IDEA: a path is a sequence of L sequential accesses to T; the sender searches a collection of paths to find a good path
  - the collection depends on (m, S, R, d)
  - locations in T depend on the hash functions H0,…,H3
  - density of good paths: 1/2^e
- OUTPUT: (m, S, R, d) plus a description of a good path
- COMPLEXITY: sending costs O(2^e·L) memory accesses, verifying costs O(L) accesses
[Figure: the collection P of paths, each of length L; P depends on (m,S,R,d)]
Abstracted algorithm
- Sender and receiver share a large random table T.
- To send message m from sender S to receiver R at date/time d:
- Repeat trial for k = 1, 2, … until success
  - current state specified by an auxiliary table A; the thread is defined by (m,S,R,d,k)
  - Initialization: A ← H0(m,S,R,d,k)
  - Main loop: walk for L steps (L = path length)
    - c ← H1(A)
    - A ← H2(A, T[c])
  - Success if the last e bits of H3(A) are all 0
- Attach to (m,S,R,d) the successful trial number k and H3(A)
- Verification is straightforward given (m, S, R, d, k, H3(A)); a code sketch follows
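A minimal Python sketch of this abstracted algorithm, with SHA-256 standing in for the idealized hash functions H0,…,H3 and toy parameter values; this is an assumption for illustration only, since the real scheme uses much cheaper RC4-style functions and a table about twice the spammer's cache size.

    import hashlib, os

    TABLE_WORDS = 1 << 16   # toy table size
    T = [int.from_bytes(os.urandom(4), "big") for _ in range(TABLE_WORDS)]
    L = 64                  # walk length (toy value; the talk suggests L = 2048)
    e = 10                  # effort parameter: good paths have density 2^-e

    def H(tag: bytes, *vals) -> int:
        # one generic random-oracle stand-in, domain-separated by tag
        data = tag + b"|" + b"|".join(str(v).encode() for v in vals)
        return int.from_bytes(hashlib.sha256(data).digest(), "big")

    def walk(m, S, R, d, k):
        A = H(b"H0", m, S, R, d, k)           # Initialization: A <- H0(m,S,R,d,k)
        for _ in range(L):                    # Main loop: L accesses into T
            c = H(b"H1", A) % TABLE_WORDS     # c <- H1(A)
            A = H(b"H2", A, T[c])             # A <- H2(A, T[c])
        h = H(b"H3", A)
        return h if h % (1 << e) == 0 else None  # success: last e bits of H3(A) are 0

    def send(m, S, R, d):
        k = 1
        while True:                           # expected 2^e trials
            h = walk(m, S, R, d, k)
            if h is not None:
                return k, h                   # attach (k, H3(A)) to the message
            k += 1

    def verify(m, S, R, d, k, h):
        return walk(m, S, R, d, k) == h       # one walk: O(L) accesses

    k, h = send("msg", "alice@a", "bob@b", "2005-06-01")
    assert verify("msg", "alice@a", "bob@b", "2005-06-01", k, h)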
Animated algorithm: a single step in the loop
[Figure: one step of the walk; c ← H1(A), then A ← H2(A, T[c])]
Full specification
- E = (expected) factor by which the computation cost exceeds the verification cost = expected number of trials = 2^e, if H3 behaves as a random function
- L = length of walk
- Want, say, E·L·t ≈ 10 seconds, where t = memory latency ≈ 0.2 µs
- Reasonable choices: E = 24,000, L = 2048 (the arithmetic is worked out below)
- Also need: how large is A? A should not be very small
- Abstract algorithm:
  - Initialize: A ← H0(m,S,R,d,k)
  - Main loop: walk for L steps
    - c ← H1(A)
    - A ← H2(A, T[c])
  - Success if H3(A) = 0^(log E)
  - Trial repeated for k = 1, 2, …
  - Proof: (m,S,R,d,k,H3(A))
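With these choices, the sender's expected effort is E·L·t = 24,000 × 2048 × 0.2 µs ≈ 9.8 seconds, while verification is a single walk costing about L·t = 2048 × 0.2 µs ≈ 0.4 ms.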
Choosing the H's
- A theoretical approach: idealized random functions
  - provides a formal analysis showing that the amortized number of memory accesses is high
- A concrete approach: inspired by the RC4 stream cipher
  - very efficient: a few cycles per step
  - there is no time inside the inner loop to compute a complex function
  - A is not small and changes gradually
- Experimental results across different machines
Path-following approach [Dwork-Goldberg-Naor Crypto 03]
- Theorem: fix any spammer
  - whose cache size is smaller than half the size of T
  - assuming T is truly random
  - assuming H0,…,H3 are idealized hash functions
  - then the amortized number of memory accesses per successful message is Ω(2^e·L).
- Remarks
  - the lower bound holds for a spammer maximizing throughput across any collection of messages and recipients
  - idealized hash functions are modeled as random oracles
  - relies on the information-theoretic unpredictability of T
Why random oracles?
- Random Oracles 101: we can measure progress
  - we know which oracle calls must be made
  - we can see when they occur
  - the first occurrence of each such call is a progress call
  - example call sequence: 1 2 3 1 3 2 3 4 (the progress calls are the first occurrences of 1, 2, 3, and 4)
- Abstract algorithm (for reference):
  - Initialize: A ← H0(m,S,R,d,k)
  - Main loop: walk for L steps
    - c ← H1(A)
    - A ← H2(A, T[c])
  - Success if H3(A) = 0^(log E)
  - Trial repeated for k = 1, 2, …
  - Proof: (m,S,R,d,k,H3(A))
Proof highlights
- The use of an idealized hash function implies:
  - at any point in time, A is incompressible
  - the average number of oracle calls per success is Ω(E·L)
  - we can follow the progress of the algorithm
- Cast the problem as asymmetric communication complexity between memory and cache
  - only the cache has access to the functions H1 and H2
[Figure: cache and memory communicating]
Talk Plan
- The proofs of work approach
- DGN's memory-bound functions
- Generating a large random-looking table [DNW]
- Open problems
Using a succinct table [DNW 05]
- GOAL: use a table T with a succinct description
  - easy distribution of software (new users)
  - fast updates (over slow connections)
- PROBLEM: we lose information-theoretic unpredictability
  - a spammer can exploit the succinct description to avoid memory accesses
- IDEA: generate T using a memory-bound process
  - use time-space trade-offs for pebbling, studied extensively in the 1970s
- The user builds the table T once and for all
Pebbling a graph
- GIVEN: a directed acyclic graph
- RULES:
  - a pebble can be placed on an input node at any time
  - a pebble can be placed on any non-input node if all of its immediate parent nodes have pebbles
  - pebbles may be removed at any time
- GOAL: find a strategy to pebble all the outputs while using few pebbles and few moves (a small checker is sketched below)
[Figure: dag with input and output nodes]
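As a concrete rendering of the rules, here is a small Python checker for pebbling moves; the graph and strategy used in the usage example are illustrative, not from the talk.

    def pebble_game(parents, inputs, moves):
        """parents: node -> list of parent nodes; moves: list of ('put'|'remove', node).
        Returns the maximum number of pebbles simultaneously on the graph."""
        pebbled, peak = set(), 0
        for op, v in moves:
            if op == "put":
                # legal iff v is an input, or all immediate parents carry pebbles
                assert v in inputs or all(p in pebbled for p in parents.get(v, []))
                pebbled.add(v)
            else:
                pebbled.discard(v)  # pebbles may be removed at any time
            peak = max(peak, len(pebbled))
        return peak

    # pebbling the 3-node path a -> b -> c with only 2 pebbles
    parents = {"b": ["a"], "c": ["b"]}
    moves = [("put", "a"), ("put", "b"), ("remove", "a"), ("put", "c")]
    assert pebble_game(parents, {"a"}, moves) == 2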
What do we know about pebbling?
- Any graph can be pebbled using O(N/log N) pebbles [Valiant].
- There are graphs requiring Ω(N/log N) pebbles [PTC].
- Any graph of depth d can be pebbled using O(d) pebbles (constant degree).
- Tight tradeoffs: some shallow graphs require many (super-polynomially many) steps to pebble with few pebbles [LT].
- Some results about pebbling outputs hold even when it is possible to put the available pebbles in any initial configuration.
Succinctly generating T
- GIVEN: a directed acyclic graph with constant in-degree
- input node i is labeled H4(i)
- non-input node i is labeled H4(i, labels of parent nodes), e.g., Li = H4(i, Lj, Lk) for a node i with parents j and k
- entries of T = labels of the output nodes (a labeling sketch follows)
- OBSERVATION: a good pebbling strategy ⇒ a good spammer strategy
[Figure: node i with parents j and k; Li = H4(i, Lj, Lk)]
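A minimal Python sketch of the labeling process, with SHA-256 standing in for H4 (an assumption for illustration) and a toy four-node dag.

    import hashlib

    def H4(*vals) -> bytes:
        return hashlib.sha256(b"|".join(str(v).encode() for v in vals)).digest()

    def labels(parents, topo_order):
        """parents: node -> list of parents (empty for inputs); nodes in topological order."""
        lab = {}
        for i in topo_order:
            ps = parents.get(i, [])
            if not ps:
                lab[i] = H4(i)                          # input node: Li = H4(i)
            else:
                lab[i] = H4(i, *(lab[j] for j in ps))   # Li = H4(i, labels of parents)
        return lab

    # toy dag: inputs 0 and 1 feed node 2, which feeds output node 3
    parents = {2: [0, 1], 3: [2]}
    lab = labels(parents, [0, 1, 2, 3])
    T = [lab[3]]   # entries of T = labels of the output nodes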
Converting a spammer strategy to a pebbling
- EX POST FACTO PEBBLING: computed by offline inspection of the spammer's strategy
- PLACING A PEBBLE: place a pebble on node i if
  - H4 is used to compute Li = H4(i, Lj, Lk), and
  - Lj, Lk are the correct labels
- INITIAL PEBBLES: place an initial pebble on node j if
  - H4 is applied with Lj as an argument, and
  - Lj was not computed via H4
- REMOVING A PEBBLE: remove a pebble as soon as it is not needed anymore
- computing a label costs a hash function call, so a lower bound on moves ⇒ a lower bound on hash function calls
- using the cache costs memory fetches, so a lower bound on pebbles ⇒ a lower bound on memory accesses
- IDEA: limit the number of pebbles used by the spammer as a function of its cache size and of the bits it brings from memory
Constructing the dag
- CONSTRUCTION: dag D composed of D1 followed by D2
- D1 has the property that pebbling many outputs requires many pebbles
  - more than the cache and the pages brought from memory can supply
  - a stack of superconcentrators [Lengauer Tarjan 82]
- D2 is a fault-tolerant layered graph
  - even if a constant fraction of each layer is deleted, it still embeds a superconcentrator
  - a stack of expanders [Alon Chung 88, Upfal 92]
- A SUPERCONCENTRATOR is a dag with N inputs and N outputs such that any k inputs and k outputs are connected by k vertex-disjoint paths
[Figure: inputs of D feed D1, then D2, down to the outputs of D]
Using the dag
- CONSTRUCTION: dag D composed of D1 followed by D2, as above
- IDEA: fix any execution
  - let S = the set of mid-level nodes pebbled
  - if S is large, use the time-space trade-offs for D1
  - if S is small, use the fault-tolerant property of D2: delete the nodes whose labels are largely determined by S
The lower bound result
- Theorem: for the dag D, fix any spammer
  - whose cache size is smaller than half the size of T
  - assuming H0,…,H4 are idealized hash functions
  - that makes polynomially many hash function calls
  - then the amortized number of memory accesses per successful message is Ω(2^e·L).
- Remarks
  - the lower bound holds for a spammer maximizing throughput across any collection of messages and recipients
  - idealized hash functions are modeled as random oracles
What can we conclude from the lower bound?
- Shows that the design principles are sound
- Gives us a plausibility argument
- Tells us that if something goes wrong, we will know where to look
- But:
  - it is based on idealized random functions
  - how do we implement them? they might be computationally expensive
  - they are applied to all of A, and it might be computationally expensive simply to touch all of A
Talk Plan
- The proofs of work approach
- DGN's memory-bound functions
- Generating a large random-looking table [DNW]
- Open problems: moderately hard functions
Alternative construction based on sorting
- motivated by time-space trade-offs for sorting [Borodin Cook 82]
- easier to implement
- input node i is labeled Ti = H4(i, 1)
- at each round, sort the array, then apply H4 to its current values: Ti ← H4(i, Ti, round), sort, and repeat (sketched below)
- OPEN PROBLEM: prove a lower bound
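A minimal Python sketch of this construction, with SHA-256 standing in for H4 and toy sizes (both assumptions for illustration); the round index plays the role of the second argument in Ti = H4(i, Ti, round).

    import hashlib

    N, ROUNDS = 1 << 10, 4   # toy parameters

    def H4(*vals) -> int:
        data = b"|".join(str(v).encode() for v in vals)
        return int.from_bytes(hashlib.sha256(data).digest(), "big")

    T = [H4(i, 1) for i in range(N)]             # round 1: Ti = H4(i, 1)
    for r in range(2, ROUNDS + 1):
        T.sort()                                  # sort the whole array...
        T = [H4(i, T[i], r) for i in range(N)]    # ...then Ti = H4(i, Ti, round)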
More open problems
- WEAKER ASSUMPTIONS? no recourse to random oracles
  - use lower bounds for the cell-probe model and branching programs?
  - unlike most of cryptography, in this case there is a chance of coming up with an unconditional result
- Use the physical limitations of computation to form a reasonable lower bound on the spammer's effort
A theory of moderately hard functions?
- Key idea in cryptography: use the computational infeasibility of problems in order to obtain security.
- For many applications, moderate hardness is needed
  - current applications: abuse prevention, fairness, few-round zero-knowledge
- FURTHER WORK: develop a theory of moderately hard functions
Open problems: moderately hard functions
- Unifying assumption
  - In the intractable world, one-way functions are necessary and sufficient for many tasks. Is there a similar primitive when moderate hardness is needed?
- Precise model
  - The details of the computational model may matter; can the models be unified?
- Hardness amplification
  - Start with a somewhat hard problem and turn it into one that is harder.
- Hardness vs. randomness
  - Can we turn moderate hardness into moderate pseudorandomness? The standard transformation is not necessarily applicable here.
- Evidence for non-amortization
  - Is it possible to demonstrate that if a certain problem is not resilient to amortization, then a single instance can be solved much more quickly?
Open problems: moderately hard functions
- Immunity to parallel attacks
  - important for timed commitments
  - for timed commitments the power function was used; is there a good argument to show immunity against parallel attacks?
- Is it possible to reduce worst case to average case, i.e., to find a random self-reduction?
  - in the intractable world it is known that there are limitations on random self-reductions from NP-complete problems
  - is it possible to randomly reduce a P-complete problem to itself?
  - is it possible to use linear programming or lattice basis reduction for such purposes?
- New candidates for moderately hard functions
- Thank you
- Merci beaucoup
- תודה רבה
Backup: path-following approach [Dwork-Goldberg-Naor Crypto 03]
- PUBLIC: large random table T (2× the spammer's cache size)
- INPUT: message m, sender S, receiver R, date/time d
- PARAMETERS: integer L, effort parameter e
- IDEA: the sender searches paths of length L for a good path
  - a path is determined by the table T and the hash functions H0,…,H3
  - any path is good with probability 1/2^e
- OUTPUT: (m, S, R, d) plus a description of a good path
- COMPLEXITY: sender O(2^e·L) memory fetches, verification O(L) fetches
- MAIN RESULT: Ω(2^e·L) memory fetches are necessary
Backup: memory-bound model
- USER: a cache (small but fast; hits/misses) and main memory (large but slow; locality)
- SPAMMER: a cache of size at most ½ of the user's main memory, and main memory that may be very, very large
Backup: path-following approach [Dwork-Goldberg-Naor Crypto 03]
- PUBLIC: large random table T (2× the spammer's cache size)
- INPUT: message m, sender S, receiver R, date/time d
- the sender makes a sequence of random memory accesses into T
  - inherently sequential (hence "path-following")
  - sends a proof of having done so to the receiver
  - verification requires only a small number of accesses
  - the memory access pattern leads to many cache misses
Backup: path-following approach [Dwork-Goldberg-Naor Crypto 03]
- PUBLIC: large random table T (2× the spammer's cache size)
- INPUT: message m, sender S, receiver R, date/time d
- OUTPUT: attach to (m, S, R, d) the successful trial number k and H3(A)
- COMPLEXITY: sender Ω(2^e·L) memory fetches, verification O(L) fetches
- Repeat for k = 1, 2, …
  - Initialize: A ← H0(m,S,R,d,k)
  - Main loop: walk for L steps
    - c ← H1(A)
    - A ← H2(A, T[c])
  - Success: the last e bits of H3(A) are all 0
- SPAMMER: needs 2^e walks, and each walk requires at least L/2 fetches