Title: HAIL (High-Availability and Integrity Layer) for Cloud Storage
1HAIL (High-Availability and Integrity Layer) for
Cloud Storage
- Kevin Bowers and Alina Oprea
- RSA Laboratories
- Joint work with Ari Juels
2Cloud storage
Cloud Storage Provider
Storage server
Web server
- Pros
- Lower cost
- Easier management
- Enables sharing and access from anywhere
- Cons
- Loss of control
- No guarantees of data availability
- Provider failures
Client
3Provider failures
Amazon S3 systems failure downs Web 2.0 sites:
Twitterers lose their faces, others just want
their data back (Computer World, July 21,
2008)
Customers Shrug Off S3 Service Failure: At about
7:30 EST this morning, S3, Amazon.com's online
storage service, went down. The 2-hour service
failure affected customers worldwide. (Wired,
Feb. 15, 2008)
Temporary unavailability
Spectacular Data Loss Drowns Sidekick Users
(October 10, 2009)
Loss of customer data spurs closure of online
storage service 'The Linkup' (Network World, Nov.
8, 2008)
Permanent data loss
How do we increase users' confidence in the cloud?
4Outline
- Proofs of Retrievability
- Constructions and practical aspects
- Limitations
- HAIL goals and adversarial model
- HAIL protocol design
- Encoding layer
- Decoding layer
- Challenge-response protocol
- Redistribution of shares in case of failures
- HAIL parameter tradeoffs
- Open problems
5PORs: Proofs of Retrievability
- Client outsources a file F to a remote storage
provider
- Client would like to ensure that her file F is
retrievable
- The simple approach: client periodically
downloads F
- This is resource-intensive!
- What about spot-checking instead?
- Sample a few file blocks periodically
- If file is not stored locally, need verification
mechanism (e.g., MACs for each file block)
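A minimal sketch of spot-checking with one MAC per file block. The block size, the use of HMAC-SHA256, and all function names are assumptions made for illustration; the slides only say that each block carries a MAC and that the client samples a few blocks.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 4096  # hypothetical block size in bytes


def split_blocks(data: bytes) -> list[bytes]:
    """Cut the file into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def tag_block(key: bytes, index: int, block: bytes) -> bytes:
    """MAC over (position, block) so that blocks cannot be swapped around."""
    message = index.to_bytes(8, "big") + block
    return hmac.new(key, message, hashlib.sha256).digest()


def outsource(key: bytes, data: bytes):
    """Client side: produce the blocks and tags that are handed to the provider."""
    blocks = split_blocks(data)
    tags = [tag_block(key, i, block) for i, block in enumerate(blocks)]
    return blocks, tags


def check_block(key: bytes, index: int, block: bytes, tag: bytes) -> bool:
    """Recompute the MAC for one returned block and compare in constant time."""
    return hmac.compare_digest(tag_block(key, index, block), tag)


def spot_check(key: bytes, blocks, tags, samples: int) -> bool:
    """Sample random positions and verify the returned (block, tag) pairs."""
    for _ in range(samples):
        i = secrets.randbelow(len(blocks))
        if not check_block(key, i, blocks[i], tags[i]):
            return False
    return True
```

The client keeps only the key; the provider stores both blocks and tags, so each check costs one block plus one tag of bandwidth.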
6Spot-checking
Cloud Storage Provider
F
B4
B7
B1
MAC_k(B4)
Client
k
7Spot-checking
Cloud Storage Provider
F
B4
B7
B1
Small corruptions go undetected
Client
k
8Error correcting code
Cloud Storage Provider
Corrects small corruption
F
Parity blocks
Client
k
9ECC + MAC
Cloud Storage Provider
F
B1
B4
B7
P1
Parity blocks
MACs over file and parity blocks
- Detects large corruption through spot-checking
- Corrects small corruption through ECC
Client
k
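To see why spot-checking catches only large corruption, the sketch below computes the chance that at least one sampled block is corrupted. It assumes blocks are sampled independently and uniformly, which the slides do not state; the function names are illustrative.

```python
import math


def detection_probability(corrupt_fraction: float, samples: int) -> float:
    """Chance that at least one of `samples` uniform samples hits a corrupted block."""
    return 1.0 - (1.0 - corrupt_fraction) ** samples


def samples_needed(corrupt_fraction: float, target: float) -> int:
    """Smallest number of samples whose detection probability reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - corrupt_fraction))
```

Under this assumption, a 1% corruption is detected with probability above 95% using about 300 samples, while a corruption touching one block in a million is almost never sampled. That gap is what the ECC is meant to cover.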
10Query aggregation
Cloud Storage Provider
F
Parity blocks
MACs over aggregation of blocks
Challenge
Response
k
Client
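One way to aggregate a challenge into a single response is a linear MAC in the style of the Shacham-Waters construction listed among the POR papers. The sketch below is that swapped-in technique, not necessarily the scheme on this slide: blocks are modelled as integers modulo a prime P, and the modulus, PRF, and names are assumptions.

```python
import hashlib
import hmac
import secrets

P = 2**61 - 1  # prime modulus; hypothetical choice of field


def prf(key: bytes, index: int) -> int:
    """Keyed pseudorandom field element for a block position."""
    digest = hmac.new(key, index.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % P


def tag_blocks(alpha: int, key: bytes, blocks: list[int]) -> list[int]:
    """Client: tag_i = PRF_key(i) + alpha * block_i (mod P)."""
    return [(prf(key, i) + alpha * m) % P for i, m in enumerate(blocks)]


def make_challenge(num_blocks: int, size: int):
    """Client: random positions, each paired with a random coefficient."""
    positions = secrets.SystemRandom().sample(range(num_blocks), size)
    return [(i, secrets.randbelow(P)) for i in positions]


def respond(blocks, tags, challenge):
    """Provider: one aggregated block and one aggregated tag."""
    mu = sum(c * blocks[i] for i, c in challenge) % P
    sigma = sum(c * tags[i] for i, c in challenge) % P
    return mu, sigma


def verify(alpha: int, key: bytes, challenge, mu: int, sigma: int) -> bool:
    """Client: the aggregated tag must match the aggregated block."""
    expected = (sum(c * prf(key, i) for i, c in challenge) + alpha * mu) % P
    return sigma == expected
```

The response is two field elements regardless of how many blocks are challenged, which is the bandwidth saving that aggregation is after.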
11Practical considerations
- Applying such an ECC to all of F is impractical
- Instead, we can stripe the ECC
- If adversary knows the stripe structure, she can
corrupt selectively
12Selective corruption
- Adversary targets a particular stripe
- File cannot be recovered
- The probability that the client detects the
corruption through sampling is small if stripes
are small
- Practical code parameters encode hundreds of
bytes at a time (e.g., Reed-Solomon (255, 223,
32))
13Adversarial codes hide ECC stripes
- Do secret, randomized partitioning of F into
stripes
- E.g., use secret key to generate pseudorandom
permutation and then choose stripes sequentially
- Encrypt and permute parity blocks
- The encoding is still systematic
- But adversary does not know where stripes are,
so adversary cannot feasibly target a stripe!
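A sketch of the secret partitioning step: block indices are ordered by a keyed pseudorandom value and stripes are then cut sequentially from that order. Sorting by HMAC output stands in for a pseudorandom permutation here; that choice and the function names are assumptions, and the encryption and permutation of parity blocks is left out.

```python
import hashlib
import hmac


def secret_permutation(key: bytes, num_blocks: int) -> list[int]:
    """Order block indices by a keyed pseudorandom rank."""
    def rank(i: int) -> bytes:
        return hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return sorted(range(num_blocks), key=rank)


def assign_stripes(key: bytes, num_blocks: int, stripe_len: int) -> list[list[int]]:
    """Cut the permuted index list into consecutive stripes."""
    perm = secret_permutation(key, num_blocks)
    return [perm[i:i + stripe_len] for i in range(0, num_blocks, stripe_len)]
```

Because the file blocks themselves stay in place, the encoding remains systematic; only the mapping from blocks to stripes depends on the key.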
14POR papers
- Proofs of Retrievability (PORs)
- Juels-Kaliski 2007
- Proofs of Data Possession (PDPs)
- Burns et al. 2007
- Erway et al. 2009
- Unlimited queries using homomorphic MACs
- Shacham-Waters 2008
- Ateniese, Kamara and Katz 2009
- Fully general query aggregation in PORs
- Bowers, Juels and Oprea 2009
- Dodis, Vadhan and Wichs 2009
15When PORs fail
Cloud Storage Provider
F
F
decoder
Challenge
Response
Unrecoverable
k
Client
17HAIL goals
- Resilience against cloud provider failure and
temporary unavailability
- Use multiple cloud providers to construct a
reliable cloud storage service out of unreliable
components
- RAID (Redundant Array of Inexpensive Disks) for
cloud storage under adversarial model
- Provide clients or third parties with auditing
capabilities
- Efficient proofs of file availability by
interacting with cloud providers
18RAID (Redundant Array of Inexpensive Disks)
Stripe
Data block
Parity block
Data block
Data block
X
B2
B3
P1 = B1 ⊕ B2 ⊕ B3
B1
B1 ⊕ B3 ⊕ P1
- Shift from monolithic, high-performance drives to
cheaper drives with redundancy
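The parity relation in the diagram can be sketched as a byte-wise XOR. The helper name and the sample block contents are illustrative.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


b1, b2, b3 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
p1 = xor_blocks(b1, b2, b3)         # parity block of the stripe
recovered = xor_blocks(b1, b3, p1)  # rebuild the block on the failed drive
```

Any single missing block of the stripe equals the XOR of the remaining blocks, so one drive failure per stripe is tolerated.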
19RAID in the Cloud
Provider A
Provider B
Provider C
Provider D
- Fuse together cheap cloud providers to provide
high-quality (reliable) abstraction
- E.g., Memopal offers $0.02 / GB / month storage
on a 5-year contract vs. Amazon at $0.15 / GB /
month
20But the cloud is adversarial!
Provider A
Provider B
Provider C
Provider D
- RAID designed for benign failures (drive crashes)
- Static adversaries are not realistic
- A mobile adversary moves from provider to
provider
- System failures and corruptions over time
- Corrupts a threshold of providers in each epoch
(b out of n)
21Mobile adversary
Provider A
Provider B
Provider C
Provider D
- Combination of proactive and reactive models
- Separate each server into code base and storage
base
- Code base of servers cleaned at beginning of
epoch (e.g., through reboot)
- At most b out of n servers have corrupted code in
each epoch
- Challenge-responses used for detection of failure
- Corrupted storage recovered when failure is
detected
22HAIL protocols
- File encoding
- Distribute a file across n storage providers
- Add redundancy to tolerate provider failures
- Small state stored locally by client (including
secret key)
- File decoding
- Recover original file by contacting a threshold
of providers
- Tolerate provider failures or unavailability
- Challenge-response protocol
- Executed a number of times per epoch
- Enables clients to perform integrity checks by
contacting a threshold of providers
- Detects failures early and enhances data
availability
- Share redistribution
- When failure detected, clients reconstruct shares
from redundancy encoded in other providers
24First idea: file replication with POR
Provider A
Provider C
Provider B
F
F
F
Parity
Parity
MACs
Parity
MACs
MACs
POR Response
POR Challenge
POR Challenge
POR Response
POR Challenge
POR Response
F
Client
25File replication with POR: Issues
F
MACs
- Compute different MACs per provider
- Large encoding overhead
- Large storage overhead due to replication
Client
26Use redundancy across servers
Provider A
Provider C
Provider B
F
F
F
Block i
Block i
Block i
F
Fi
Fi
Fi
Sample and check consistency across providers
Client
27Small-corruption attack
Provider A
Provider C
Provider B
F
F
F
Fi
Fi
Fi
File cannot be recovered after n/b epochs
The probability that the client samples the corrupted
block is low
Client
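The n/b figure can be reproduced with a small simulation: a mobile adversary corrupts the same block on b fresh providers per epoch until no intact replica is left. The sketch also gives the chance that a client sampling without replacement ever looks at that one block. The function names and the sampling model are assumptions for illustration.

```python
import math


def epochs_to_destroy(n: int, b: int) -> int:
    """Epochs until the targeted block is corrupted on all n replicas (b >= 1)."""
    corrupted = set()
    epochs = 0
    while len(corrupted) < n:
        # The adversary moves to up to b providers whose replica is still intact.
        targets = [p for p in range(n) if p not in corrupted][:b]
        corrupted.update(targets)
        epochs += 1
    return epochs


def hit_probability(num_blocks: int, samples: int) -> float:
    """Chance that one specific block is among `samples` distinct sampled blocks."""
    return samples / num_blocks
```

With n = 6 and b = 2 the block is gone after 3 epochs, and a client sampling 100 blocks out of a million per check has only a 0.01% chance of touching it each time.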
28Replication with server code
Provider A
Provider C
Provider B
F
F
F
Parity
Parity
Parity
- Still vulnerable to small-corruption attack, once
corruption exceeds the error correction rate of
server code
- Large storage overhead due to replication
Client
29Dispersal erasure code
Primary servers (k)
Secondary servers (n-k)
PA
PB
PC
PD
PE
128 bit
Stripe
F1
F2
F3
Dispersal code parity
Original file F
- File can be recovered from any k available
servers
- For encoding efficiency, use striping for 128-bit
blocks
F
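A sketch of a systematic (n, k) dispersal code for one stripe: the k data symbols are treated as values of a polynomial at points 1..k, and the parity symbols are its values at points k+1..n. The slide uses 128-bit blocks; this sketch works over a prime field instead to keep the arithmetic short, and that substitution and the names are assumptions.

```python
P = 2**61 - 1  # prime field modulus; stands in for the 128-bit symbols on the slide


def lagrange_eval(points, x: int) -> int:
    """Value at x of the unique polynomial of degree < len(points) through `points`."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (x - xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total


def encode_stripe(data: list[int], n: int) -> list[int]:
    """Systematic encoding: k data symbols followed by n - k parity symbols."""
    k = len(data)
    points = list(zip(range(1, k + 1), data))
    return list(data) + [lagrange_eval(points, x) for x in range(k + 1, n + 1)]


def decode_stripe(shares: dict[int, int], k: int) -> list[int]:
    """Recover the data from the symbols of any k servers (1-based server index)."""
    points = list(shares.items())[:k]
    return [lagrange_eval(points, x) for x in range(1, k + 1)]
```

For example, with three primary and two secondary servers, `decode_stripe({2: c[1], 4: c[3], 5: c[4]}, 3)` returns the original stripe for `c = encode_stripe(data, 5)`.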
30Two encoding layers
PA
PB
PC
PD
PE
Dispersal code parity
F1
F2
F3
Server code
- Dispersal code reduces storage overhead of
replication with similar availability guarantees
- Server code improves resilience to
small-corruption attack
31Checking for correct encoding
PA
PB
PC
PD
PE
Check that stripe is a codeword in dispersal code
Client
32Aggregation of stripes
PA
PB
PC
PD
PE
1
a
a^2
Check that linear combination of stripes is a
codeword
Client
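Because the dispersal code is linear, a combination of stripes with coefficients 1, a, a^2 is itself a codeword, so the client can check many stripes with one aggregated row. The sketch uses a toy linear code with three data and two parity symbols over a prime field; the parity coefficients, the field, and the names are assumptions and not the code used in HAIL.

```python
P = 2**61 - 1                          # prime field modulus (assumption)
PARITY_ROWS = [(1, 1, 1), (1, 2, 3)]   # toy linear code: 3 data + 2 parity symbols


def encode(data: list[int]) -> list[int]:
    """One stripe: data symbols followed by their linear parity symbols."""
    parity = [sum(c * d for c, d in zip(row, data)) % P for row in PARITY_ROWS]
    return [d % P for d in data] + parity


def is_codeword(word: list[int]) -> bool:
    """Re-encode the data part and compare with the whole word."""
    word = [w % P for w in word]
    return encode(word[:3]) == word


def server_response(fragment: list[int], a: int) -> int:
    """Server side: fold its own blocks with coefficients 1, a, a^2, ..."""
    acc, coeff = 0, 1
    for block in fragment:
        acc = (acc + coeff * block) % P
        coeff = coeff * a % P
    return acc


stripes = [encode([1, 2, 3]), encode([4, 5, 6]), encode([7, 8, 9])]
fragments = [list(column) for column in zip(*stripes)]  # one fragment per server
a = 5                                                   # challenge coefficient
row = [server_response(f, a) for f in fragments]
```

If any single server changes a challenged block, its folded response changes and the assembled row is no longer a codeword.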
33Comparison
File replication with POR
- - Large storage overhead due to replication
- - Redundant MACs for POR
- - Large encoding overhead
- - Verifiable by client only
- + Increased lifetime
HAIL: Two encoding layers (dispersal and server
code)
- + Optimal storage overhead for given availability
level
- + Uses cross-server redundancy for verifying
responses
- + Reasonable encoding overhead
- + Public verifiability
- - Limited lifetime
34Increase protocol lifetime
PA
PB
PC
PD
PE
F1
F2
F3
MAC
- Authenticate stripes with MACs
- One MAC per block
- Large storage overhead
- How can the MACs from multiple stripes be
aggregated?
35Integrity-protected dispersal code
PA
PB
PC
PD
PE
F1
F2
F3
PRF_k1(pos)
- Embed integrity information into parity blocks of
dispersal code
- Can check linear combination of MACs knowing only
linear combination of blocks
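A sketch of the idea of embedding integrity information in the parity blocks: each parity symbol is the ordinary parity plus a keyed pseudorandom mask that depends on the row position, and the client checks an aggregated row by recomputing the aggregated masks. The toy parity coefficients, the prime field, HMAC as the PRF, and all names are assumptions for illustration.

```python
import hashlib
import hmac

P = 2**61 - 1                          # prime field modulus (assumption)
PARITY_ROWS = [(1, 1, 1), (1, 2, 3)]   # toy linear code: 3 data + 2 parity symbols


def prf(key: bytes, pos: int) -> int:
    """Keyed pseudorandom field element for a row position."""
    digest = hmac.new(key, pos.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % P


def encode_row(data: list[int], pos: int, keys: list[bytes]) -> list[int]:
    """Parity symbol j = linear parity of the data + PRF under key j at `pos`."""
    parity = [
        (sum(c * d for c, d in zip(coeffs, data)) + prf(key, pos)) % P
        for coeffs, key in zip(PARITY_ROWS, keys)
    ]
    return [d % P for d in data] + parity


def fold(values, a: int) -> int:
    """Linear combination with coefficients 1, a, a^2, ..."""
    acc, coeff = 0, 1
    for v in values:
        acc = (acc + coeff * v) % P
        coeff = coeff * a % P
    return acc


def verify_aggregate(row, positions, a: int, keys) -> bool:
    """Client: check the aggregated parity against aggregated data plus masks."""
    data, parity = row[:3], row[3:]
    for coeffs, key, got in zip(PARITY_ROWS, keys, parity):
        mask = fold([prf(key, pos) for pos in positions], a)
        expected = (sum(c * d for c, d in zip(coeffs, data)) + mask) % P
        if got != expected:
            return False
    return True
```

Only a holder of the keys can produce or check the masked parity, so the same symbols serve as both redundancy and MAC without a separate tag per block.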
36HAIL protocols
- Encoding
- Two layers of error correction: dispersal code
and server code
- Integrity-protected dispersal code used to reduce
storage overhead
- Server code is an adversarial erasure code
- Decoding
- Reverse of encoding, using two layers of error
correction
- Tradeoffs
- Erasure dispersal code tolerates n-m-1 failures
per round, but decoding requires brute force in
case of errors (the positions of the erasures
are not known)
- Error-correcting dispersal code tolerates up to
b ≤ (n-m-1)/2 failures per round
37HAIL protocols, cont
- Challenge-response
- Executed a number of times in each round
- Challenge: a number of row positions
- Response: aggregated row
- Verification: response should be a codeword in
dispersal code and composite MAC should be valid
- Redistribution of shares
- Invoked when corruption of a fragment is detected
by challenge-response
- Reconstruction done by client and involves
downloading m correct file fragments
38HAIL availability
39Frequency of challenges
40Encoding Performance
- HAIL requires two levels of encoding
- Order is important!
41Encoding Security
- Security of the MAC depends on the size of the
finite field used to perform Reed-Solomon
encoding.
- Most Reed-Solomon codes are implemented over
bytes, or at most 4-byte words (typical integer
representation)
- 32-bit security is low from a cryptographic
viewpoint
- Operating over larger symbols is slow
- Larger encodings can be generated by combining
several smaller encodings
- Or, they can be implemented using extension
fields
- To speed up larger symbol encoding, need fast
operations in large Galois Fields
- Work with Jianqiang Luo and Lihao Xu at Wayne
State Univ.
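To illustrate what "operations in large Galois Fields" involve, here is a plain shift-and-add multiplication in GF(2^w). It is a generic textbook method, not the accelerated technique referred to on the slide; the two reduction polynomials are common choices and are assumptions here.

```python
def gf_mul(a: int, b: int, width: int, poly: int) -> int:
    """Multiply a and b in GF(2^width); `poly` includes the x^width term."""
    result = 0
    for _ in range(width):
        if b & 1:
            result ^= a      # add (XOR) the current multiple of a
        b >>= 1
        a <<= 1              # multiply a by x
        if a >> width:       # degree reached `width`: reduce modulo poly
            a ^= poly
    return result


GF8_POLY = 0x11B                  # x^8 + x^4 + x^3 + x + 1
GF128_POLY = (1 << 128) | 0x87    # x^128 + x^7 + x^2 + x + 1
```

The loop runs once per bit of the symbol, so a 128-bit multiplication done this way costs 16 times as many iterations as a byte multiplication, which is why faster large-field arithmetic matters for encoding throughput.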
42Encoding Throughput Improvement
43Decoding Throughput Improvement
44Accelerated Encoding Throughput
45Accelerated Decoding Throughput
46Effect of Placement on Throughput
47Summary
- HAIL is an extension of RAID into the cloud
- High availability and tolerance to adversarial
failures
- Low storage overhead due to integrity-protected
dispersal code
- Enables client-side integrity checks
- Low bandwidth for challenge-response due to
aggregation
- Papers
- K. Bowers, A. Juels, and A. Oprea. Proofs of
Retrievability: Theory and Implementation. ACM
CCSW '09.
- K. Bowers, A. Juels, and A. Oprea. HAIL: High
Availability and Integrity Layer for Cloud
Storage. ACM CCS '09.
- http://www.rsalabs.com/