Title: On-The-Fly Verification of Rateless Erasure Codes
1On-The-Fly Verification of Rateless Erasure Codes
- Max Krohn (MIT CSAIL)
- Michael Freedman and David Mazières (NYU)
2On-The-Fly Verification of Rateless Erasure Codes
Multicast Authentication Dead/Exhausted
3The Setting
- A large file F
- Linux ISO (650MB)
- H(F) is available
- signed by Publisher (RedHat)
- A handful of untrusted sources/mirrors S1,...,S8
4A Handful of Senders
5The Setting
- A large file F
- Linux ISO (650MB)
- H(F) is available
- signed by Publisher (RedHat)
- A handful of untrusted sources S1,...,S8
- Their aggregate BW is limited
- A slew of receivers R1,...,R1,000,000
- Version 81.3 just released! Want it Now!
6Three Desirable Properties
Clients Get Fast Downloads
Sources Can Multicast
Clients Can Verify Blocks On-the-Fly
7Receivers Get Fast, Verifiable Downloads
- The trusted publisher (RedHat)
- Splits up F into n blocks
- Hashes all blocks
- Signs all hashes (or hash tree)
- Receivers
- Download and verify hashes
- Download needed file blocks in parallel
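The publish-and-verify flow on this slide can be sketched as follows. This is a minimal illustration, assuming SHA-1 as the block hash and 16K blocks (both figures appear later in the talk) and leaving out the signature over the hash list. The names `split_blocks`, `publish`, and `verify_block` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 16 * 1024  # assumption: 16K blocks, the size used later in the talk


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Split the file F into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def publish(data: bytes) -> list[bytes]:
    """Publisher side: hash every block. Signing the hash list is elided."""
    return [hashlib.sha1(block).digest() for block in split_blocks(data)]


def verify_block(index: int, block: bytes, hashes: list[bytes]) -> bool:
    """Receiver side: check one downloaded block against the trusted hashes."""
    return hashlib.sha1(block).digest() == hashes[index]


if __name__ == "__main__":
    data = bytes(range(256)) * 200          # stand-in for the file F
    hashes = publish(data)                  # fetched once and verified via the signature
    for i, block in enumerate(split_blocks(data)):
        print(i, verify_block(i, block, hashes))
```

Because every block has its own hash, a receiver can fetch different blocks from different sources and reject a bad one immediately, without waiting for the whole file.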
8Everyone for Themselves
[Figure: sources S1-S4 and receivers R1-R13]
9Everyone For Themselves
Clients Get Fast Downloads
Sources Can Multicast
Clients Can Verify Blocks On-the-Fly
10Verifiable Multicast (BitTorrent)
[Figure: sources S1-S4 and receivers R1-R13]
11Verifiable Multicast (BitTorrent)
Clients Get Fast Downloads
Sources Can Multicast
Clients Can Verify Blocks On-the-Fly
12Multicast With Erasure Codes
- Sources erasure encode the file F
[Figure: file F (n blocks) encoded into a stream of blocks]
13Multicast With Erasure Codes
- Sources erasure encode the file F
- Receivers collect blocks and decode
[Figure: file F (n blocks) encoded into a stream of blocks; a receiver collects 1.03n blocks and decodes F (n blocks)]
14Multicast With Erasure Codes
[Figure: sources S1-S4 and receivers R1-R13]
15Multicast With Erasure Codes
- Bullet SOSP 2003
- SplitStream SOSP 2003
- Big Downloads IPTPS 2003
- Informed Content Delivery SIGCOMM 2002
16Receivers Cannot Verify Content
[Figure: sources S1-S4 sending blocks marked "?" to receiver R1]
17Receivers Cannot Verify Content
[Figure: sources S1-S4 sending blocks marked "?" to receiver R1]
18Multicast With Erasure Codes
Clients Get Fast Downloads
Sources Can Multicast
Clients Can Verify Blocks On-the-Fly
19Multicast With Erasure Codes
Clients Get Fast Downloads
Sources Can Multicast
Clients Can Verify Blocks On-the-Fly
20What is the Attack Goal?
- To corrupt the file.
- To waste bandwidth.
[Figure: sources S1-S3 and a receiver R]
21How To Attack?
- Send correct blocks but with skewed distributions.
- Distribution Attack
- Send incorrect blocks
- Pollution Attack
- Karlof et al. NDSS 04
[Figure: sources S1-S3 and a receiver R]
22Properties of a Solution to Pollution
- OK: Receivers can tell good from bad.
- Much better: Receivers can finger bad blocks as they arrive. (CONTRIBUTION)
[Figure: sources S1-S3 and a receiver R]
23Outline
- Introduction
- Review of LT Codes
- Strawman 1
- Strawman 2
- Efficiently Catching Bad Blocks as They Arrive
24LT-Codes (Luby, FOCS 2002)
[Figure: file F split into input blocks b1-b5]
n = 5 input blocks
25LT-Codes How To Encode
- Pick degree d1 from a pre-specified distribution. (d1 = 2)
- Select d1 input blocks uniformly at random. (Pick b1 and b4)
- Compute their sum.
- Output c1.
[Figure: encoded block c1 in E(F) above the input blocks b1-b5 of F]
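The encoding steps above can be sketched as follows. This is a minimal illustration, assuming the sum is a bytewise XOR (as in standard LT codes) and using a made-up toy degree distribution instead of the distribution a real LT code would use. The names `lt_encode_one` and `TOY_DEGREES` are hypothetical.

```python
import random

# Assumption: a toy (degree, probability) table, NOT a real LT degree distribution.
TOY_DEGREES = [(1, 0.2), (2, 0.5), (3, 0.3)]


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def lt_encode_one(blocks, degree_dist, rng):
    """Produce one encoded block plus the indices of the input blocks it sums."""
    degrees, weights = zip(*degree_dist)
    d = rng.choices(degrees, weights=weights, k=1)[0]   # pick the degree
    indices = sorted(rng.sample(range(len(blocks)), d)) # pick d distinct inputs
    c = bytes(len(blocks[0]))                           # all-zero block
    for i in indices:
        c = xor_blocks(c, blocks[i])                    # sum the chosen inputs
    return c, indices


if __name__ == "__main__":
    rng = random.Random(0)
    inputs = [bytes([i]) * 4 for i in range(1, 6)]      # b1..b5, 4 bytes each
    for _ in range(3):
        print(lt_encode_one(inputs, TOY_DEGREES, rng))
```

The encoder can keep calling `lt_encode_one` for as long as receivers want blocks, which is what makes the code rateless.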
26LT-Codes How To Encode (contd)
[Figure: encoded blocks c1-c7 in E(F) above the input blocks b1-b5 of F]
27How To Decode
[Figure, animated over slides 27-36: encoded blocks c1-c7 in E(F) linked to the input blocks b1-b5 of F. From slide 31 on, the label b5 appears among the encoded blocks; from slide 32 on, c4 is no longer shown; slides 35-36 also show the label b2.]
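The decoding walk-through can be sketched as a peeling decoder: an encoded block that covers a single unknown input block reveals that block, which is then subtracted from every other encoded block that includes it. This is a minimal illustration, assuming XOR as the sum and encoded blocks that arrive with the indices of their inputs. The name `lt_decode` is hypothetical.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def lt_decode(encoded, n):
    """Peeling decoder. `encoded` is a list of (block, input indices) pairs.

    Returns a dict {index: input block} with everything that could be recovered.
    """
    pending = [(bytearray(c), set(idx)) for c, idx in encoded]
    recovered = {}
    progress = True
    while progress and len(recovered) < n:
        progress = False
        for c, idx in pending:
            # Subtract every input block that is already known.
            for i in [i for i in idx if i in recovered]:
                c[:] = xor_blocks(c, recovered[i])
                idx.discard(i)
            # A block with one remaining input reveals that input directly.
            if len(idx) == 1:
                i = idx.pop()
                if i not in recovered:
                    recovered[i] = bytes(c)
                    progress = True
    return recovered


if __name__ == "__main__":
    b = [bytes([i]) * 4 for i in range(1, 6)]           # b1..b5 (indices 0..4)
    stream = [
        (b[4], [4]),                                    # degree 1: reveals b5
        (xor_blocks(b[1], b[4]), [1, 4]),               # then reveals b2
        (xor_blocks(b[0], b[1]), [0, 1]),
        (xor_blocks(b[2], b[3]), [2, 3]),
        (xor_blocks(b[0], b[2]), [0, 2]),
    ]
    print(lt_decode(stream, n=5) == dict(enumerate(b)))
```

In this usage example the first block recovered is b5 and the second is b2, the same order in which those labels appear in the slides.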
37Outline
- Introduction
- Review of LT Codes
- Strawman 1
- Simple Solution To Tell Good Blocks From Bad
- Strawman 2
- Efficiently Catching Bad Blocks as They Arrive
38Smart Decoder for LT-Codes
[Figure, animated over slides 38-47: the same graph of encoded blocks c1-c7 in E(F) and input blocks b1-b5 of F. From slide 43 on, the label b5 appears among the encoded blocks; slides 45 and 46 add an X mark.]
48Smart Decoder Problem
- Data collected from 50 random Online encodings of
a 10,000 block file.
49Outline
- Introduction
- Review of LT Codes
- Strawman 1
- Strawman 2
- Hashing/Signing Encoded Blocks
- Efficiently Catching the Bad as They Arrive
50Hashing/Signing Encoded Blocks
[Figure: file F (n blocks) encoded into e·n blocks]
- Trusted Publisher (RedHat)
- Picks e, computes e·n encoded blocks
- Hashes all encoded blocks
- Signs the hashes.
51Hashing/Signing Encoded Blocks
- Expansion factor e should be big to avoid duplicate blocks.
- e should be small to make crypto overhead acceptable.
- Our analysis shows there's no sweet spot.
52Hashing/Signing Encoded Blocks
- Expansion factor e should be big to avoid duplicate blocks.
- e should be small to make crypto overhead acceptable.
- Our analysis shows there's no sweet spot.
- e.g., best case bandwidth requirements 5
- e.g., generating hashes is very expensive as e gets large.
53Outline
- Introduction
- Review of LT Codes
- Strawman 1
- Strawman 2
- Efficiently Catching the Bad as They Arrive
54Best of Both Worlds
- Goal
- Crypto overhead of one hash for every block in the input file (Strawman 1)
- Verify blocks as they arrive (Strawman 2)
- Idea
- Distribute hashes of file blocks, and use them to verify encoded blocks.
- Need a better hash function.
55Insight Homomorphic Hashing
- Assume a function h exists such that
- h is homomorphic
- h is a CRHF (collision-resistant hash function)
56Homomorphic Hashing Intuition
[Figure, built up over slides 56-61: two columns headed "R knows" and "R wants proof that", relating an encoded block c to the input blocks b2 and b5. Slides 60 and 61 label the steps of the argument Property 1 and Property 2.]
62Homomorphic Hashing Protocol
- R receives the block c
- Compute h(c)
- If h(c) matches the value composed from the known hashes of the input blocks summed in c
- Accept block; mark as valid
- else
- Suspect sender of being bad guy, and switch.
63Homomorphic Hashing Protocol
- R receives the block c
- Compute h(c)
- If h(c) matches the value composed from the known hashes of the input blocks summed in c
- Accept block; mark as valid
- else
- Suspect sender of being bad guy, and switch.
- Can such an h possibly exist?
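The receiver's check can be sketched as follows. This is a toy illustration only: it assumes a discrete-log style hash h(x) = g^x mod p with tiny parameters (p = 23, q = 11, g = 2), treats each block as a single number modulo q, and assumes the block sum is addition modulo q. The names `h` and `check_block` are hypothetical, and parameters this small offer no security.

```python
# Toy parameters (assumption): G has order Q modulo P, since 2**11 % 23 == 1.
P, Q, G = 23, 11, 2


def h(x: int) -> int:
    """Toy homomorphic hash: h(x + y) == h(x) * h(y) (mod P)."""
    return pow(G, x % Q, P)


def check_block(c: int, indices, known_hashes) -> bool:
    """Accept c only if h(c) equals the product of the hashes of its inputs."""
    expected = 1
    for i in indices:
        expected = (expected * known_hashes[i]) % P
    return h(c) == expected


if __name__ == "__main__":
    inputs = [3, 7, 1, 9, 5]                   # b1..b5 as numbers modulo Q
    known = [h(b) for b in inputs]             # hashes obtained from the publisher
    good = (inputs[1] + inputs[4]) % Q         # honest encoding of b2 + b5
    print(check_block(good, [1, 4], known))    # accepted
    print(check_block(2, [1, 4], known))       # rejected: switch to another sender
```

The receiver never needs b2 or b5 themselves, only their hashes, so the check runs as soon as the encoded block arrives.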
64Homomorphic Hashing Related Work
- DLog-Based CRHF
- Pedersen Commitment CRYPTO 91
- Chaum et al. CRYPTO 91
- One-Way Accumulators
- Benaloh and de Mare EUROCRYPT 93
- Baric and Pfitzmann EUROCRYPT 97
- Incremental Hashing
- Bellare et al. CRYPTO 94
- Homomorphic Signatures
- Micali and Rivest RSA 02
- Johnson et al. RSA 02
65Mechanics of Homomorphic Hashing
- Discrete Log Hash
- Pick 1024-bit prime p and 256-bit prime q, where q divides (p-1)
- Pick 512 generators of order q from Z_p
- Write F as elements in Z_q
[Figure: F split into 16K blocks, each made of 256-bit fragments]
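The mechanics above can be sketched with toy numbers. This is a minimal illustration, assuming the hash of a block is the product of each generator raised to the matching fragment, modulo p. It uses p = 23, q = 11, and 4 generators instead of the 1024-bit p, 256-bit q, and 512 generators on the slide. The generators are hand-picked here, whereas a real scheme would choose them so that no one knows how they relate to each other. The names `dlog_hash` and `add_blocks` are hypothetical.

```python
P, Q = 23, 11                 # toy primes (assumption); Q divides P - 1
GENERATORS = [2, 3, 4, 6]     # toy elements of order Q modulo P


def dlog_hash(block):
    """Hash a block, given as one fragment (a number modulo Q) per generator."""
    if len(block) != len(GENERATORS):
        raise ValueError("block must have one fragment per generator")
    acc = 1
    for g, fragment in zip(GENERATORS, block):
        acc = (acc * pow(g, fragment % Q, P)) % P
    return acc


def add_blocks(a, b):
    """Sum two blocks fragment by fragment, modulo Q."""
    return [(x + y) % Q for x, y in zip(a, b)]


if __name__ == "__main__":
    b2 = [1, 5, 10, 3]
    b5 = [7, 7, 2, 9]
    c = add_blocks(b2, b5)
    # The hash of the sum equals the product of the hashes.
    print(dlog_hash(c) == (dlog_hash(b2) * dlog_hash(b5)) % P)
```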
66How to Encode (example)
Standard LT-Codes
Homomorphic Scheme
67How To DLog Hash
- Hashes are elements in Z_p (128 bytes big)
- Hash reduces 16K block by a factor of 128
68How To DLog Hash
- Hashes are elements in Z_p (128 bytes big)
- Hash reduces 16K block by a factor of 128
- 1% overhead
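These sizes follow from the parameters on the Mechanics slide, as the short calculation below shows. The variable names are illustrative only.

```python
P_BITS = 1024          # size of the prime p, so of one hash value
Q_BITS = 256           # size of one fragment
NUM_GENERATORS = 512   # fragments per block, one per generator

block_bytes = NUM_GENERATORS * Q_BITS // 8   # 512 fragments of 32 bytes
hash_bytes = P_BITS // 8                     # one element modulo p
reduction = block_bytes // hash_bytes        # how much smaller the hash is
overhead = hash_bytes / block_bytes          # hash bytes per block byte

if __name__ == "__main__":
    print(block_bytes, hash_bytes, reduction, f"{overhead:.2%}")
```

The block comes out at 16,384 bytes (16K), the hash at 128 bytes, and the overhead at just under 1%.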
69DLog-Hash Key Property
70DLog-Hash Key Property
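The property can be written out in one line. The notation here is an assumption made for illustration: a block b_i is a vector of m = 512 fragments b_{i,1}, ..., b_{i,m} modulo q, blocks are added fragment by fragment modulo q, and the hash is the product of the generators g_k raised to the fragments, modulo p.

```latex
\[
h(b_i + b_j)
  = \prod_{k=1}^{m} g_k^{\,b_{i,k} + b_{j,k}} \bmod p
  = \Bigl(\prod_{k=1}^{m} g_k^{\,b_{i,k}}\Bigr)
    \Bigl(\prod_{k=1}^{m} g_k^{\,b_{j,k}}\Bigr) \bmod p
  = h(b_i)\,h(b_j) \bmod p
\]
```

Reducing the exponents modulo q is harmless because every g_k has order q, so the first equality holds whether or not the fragment sum wraps around.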
71This Seems Really Expensive
Operation on a 16K Block    Throughput (kB/sec)
DLog Hash                   39
Arrival on 1.5Mbps DSL      190
SHA1 Hash                   57,600
72Key Optimizations
- Hash Generation
- Each publisher picks her own parameters and computes with 1 exponentiation (not 512)
- Hash Verification
- Receiver verifies hashes probabilistically and in batches.
- Bellare et al. EUROCRYPT 98
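Both optimizations can be sketched with toy numbers. The sketch rests on two assumptions that the slide does not spell out. First, the publisher derives every generator from one base g as g_k = g^(r_k) and keeps the r_k, so a hash needs a single exponentiation. Second, the receiver checks a random linear combination of a batch of blocks, in the spirit of the small-exponents test of Bellare et al.; the talk's exact batching procedure may differ. All names are hypothetical, and the toy sizes (p = 23, q = 11) give no security.

```python
import random

P, Q, G = 23, 11, 2   # toy parameters (assumption): G has order Q modulo P


def make_publisher_params(m, rng):
    """Publisher picks secret exponents r_k and publishes g_k = G**r_k mod P."""
    secrets = [rng.randrange(1, Q) for _ in range(m)]
    generators = [pow(G, r, P) for r in secrets]
    return secrets, generators


def hash_naive(block, generators):
    """One exponentiation per fragment (what receivers must do)."""
    acc = 1
    for g, x in zip(generators, block):
        acc = (acc * pow(g, x % Q, P)) % P
    return acc


def hash_with_secrets(block, secrets):
    """Publisher shortcut: fold the fragments first, then exponentiate once."""
    exponent = sum(r * x for r, x in zip(secrets, block)) % Q
    return pow(G, exponent, P)


def batch_verify(blocks, claimed_hashes, generators, rng):
    """Check many (block, hash) pairs with one full hash of a random combination."""
    coeffs = [rng.randrange(1, Q) for _ in blocks]
    combined = [sum(s * blk[k] for s, blk in zip(coeffs, blocks)) % Q
                for k in range(len(generators))]
    expected = 1
    for s, claimed in zip(coeffs, claimed_hashes):
        expected = (expected * pow(claimed, s, P)) % P
    return hash_naive(combined, generators) == expected


if __name__ == "__main__":
    rng = random.Random(1)
    secrets, gens = make_publisher_params(4, rng)
    blocks = [[rng.randrange(Q) for _ in range(4)] for _ in range(3)]
    hashes = [hash_with_secrets(b, secrets) for b in blocks]
    print(batch_verify(blocks, hashes, gens, rng))
```

With a modulus this small, a batch containing several corrupted blocks can slip through with noticeable probability; the 256-bit q on the Mechanics slide is what would make that chance negligible.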
73Much Better
Operation on a 16K Block    Throughput (MB/sec)
Naïve DLog Hash             0.038
Per-publisher Generation    11.210
Batch Verification          7.620
Arrival on 1.5 Mbps DSL     0.186
SHA1 Hash                   56.250
74Homomorphic Hashing Key Points
- Key Algebraic Feature
- Homomorphism: Receivers can compose hashes the way encoders sum file blocks.
- Can check encoded blocks as they arrive.
- Fast
- Can be optimized to achieve good generation and verification throughputs
- Provably Secure
- As hard as discrete log (SHA1/MD5 not needed)
75Conclusion
Clients Get Fast Downloads
Sources Can Multicast
Clients Can Verify Blocks On-the-Fly
76Thank you.