Title: Cooperative backup on Social Network
1Cooperative backup on Social Network
- Nguyen Tran and Jinyang Li
2Motivation
- Backup is important.
- State-of-the-art solutions:
- Buy a second hard disk.
- Back up manually to a mobile disk or CDs.
- Sign up for online backup (10 bucks for 1GB/month).
- Manual backup is not good (it needs an additional hard disk, and you need to remember to do it).
- Important data needs long-distance separation between the original and the backup copy, e.g. Wall Street data centers.
- Idea: back up on a p2p network (utilize idle space, backup daemon, remoteness).
3Solution overview
- How do we make sure nodes holding data stay in the system?
- A malicious node gets data and leaves.
- Idea: back up on your real friends' node(s).
- Consequence: lose global space utilization but gain incentives.
- For a backup service, data safety > global space utilization.
4Model
(Figure: model diagram with two labeled parts, metadata and data.)
5Q1 efficient space allocation
- If I join with 100G to back up and 100G to contribute, can I back up all the data?
- Orkut: 2363 nodes, 78% space utilization
- Venus: 39783 nodes, 81% space utilization
- Over half of the nodes can back up all their data.
- Which buddy should you pick to further optimize global space efficiency?
- Buddy with min degree?
6Q2 space optimization w/ coding
- Q: If you only have 1G of idle space, can you store 5G worth of your friends' backups? Ans: yes!
(Figure: friends' data a1, a2, ..., an-1, an; you store A = a1 ⊕ a2 ⊕ ... ⊕ an.)
How about 2 friends crashing at the same time?
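The single-parity idea on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper name `xor_blocks` and the 4-byte sample blocks are assumptions, and it assumes every friend's backup has been padded to the same length.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# Hypothetical 4-byte backups belonging to three friends.
a1 = b"\x01\x02\x03\x04"
a2 = b"\x10\x20\x30\x40"
a3 = b"\xaa\xbb\xcc\xdd"

# You store only the XOR of all blocks, which is as large as one block.
parity = xor_blocks([a1, a2, a3])

# Friend 2 crashes. The surviving friends still hold a1 and a3,
# so XORing them out of the parity leaves a2.
recovered = xor_blocks([parity, a1, a3])
assert recovered == a2
```

With a single stored XOR, two simultaneous crashes leave only a1 ⊕ a2 (for example), which is why the following slides ask how many XORs are needed.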
7How much disk space do you need to store a1, a2, ..., an?
8How much disk space do you need to store a1, a2, ..., an?
9Definition
- Let S = {a1, a2, ..., an}.
- For T ⊆ S, let ⊕(T) denote the XOR of all elements in T.
- A solution X = {S1, S2, ..., Sk}, where each Si ⊆ S, means you store ⊕(S1), ⊕(S2), ..., ⊕(Sk) on your machine, i.e. F(n) = k.
- Of course, ∪ Si = S.
10- Lemma: X is a solution that tolerates 2 concurrent crashes iff ∀ p ≠ q ∈ 1..n, ∃ i ∈ 1..k such that Si contains either ap or aq but not both.
(Figure: for a1, a2, a3, storing a1 ⊕ a2 and a1 ⊕ a3, i.e. X = {{a1, a2}, {a1, a3}}, is good.)
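11- Lemma: X is a solution that tolerates 2 concurrent crashes iff ∀ p ≠ q ∈ 1..n, ∃ i ∈ 1..k such that Si contains either ap or aq but not both.
- Proof:
- (Only if) Suppose every Si contains both ap and aq or neither of them. Then XORing the stored values cannot recover ap or aq individually.
- (If) Take an Si that contains ap but not aq. For each element ai in Si \ {ap}, get ai from its owner (who has not crashed) and XOR it with ⊕(Si). In the end we get ap. Then getting aq is easy, because some Sj contains aq and every other element of Sj is now known. So X is a solution.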
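The lemma and the recovery procedure in the proof can be checked on small cases with a sketch like the one below. The function names `tolerates_two_crashes` and `recover` are hypothetical, data blocks are modeled as small integers, and items are referred to by their index 1..n.

```python
from itertools import combinations

def tolerates_two_crashes(X, n):
    """Lemma check: the sets cover 1..n and every pair p != q is split by some set."""
    covers = set().union(*X) == set(range(1, n + 1))
    separated = all(
        any((p in s) != (q in s) for s in X)
        for p, q in combinations(range(1, n + 1), 2)
    )
    return covers and separated

def recover(X, stored, known):
    """Repeatedly solve any set with exactly one unknown item, as in the proof."""
    known = dict(known)
    progress = True
    while progress:
        progress = False
        for s, x in zip(X, stored):
            missing = [i for i in s if i not in known]
            if len(missing) == 1:
                for i in s:
                    if i in known:
                        x ^= known[i]  # strip out every item we already have
                known[missing[0]] = x
                progress = True
    return known

# The example from the slides: store a1 XOR a2 and a1 XOR a3.
data = {1: 5, 2: 9, 3: 12}                 # hypothetical block contents
X = [{1, 2}, {1, 3}]
stored = [data[1] ^ data[2], data[1] ^ data[3]]

print(tolerates_two_crashes(X, 3))          # True
print(tolerates_two_crashes([{1, 2, 3}], 3))  # False: one XOR cannot split any pair
print(recover(X, stored, {3: data[3]}) == data)  # friends 1 and 2 crashed; prints True
```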
12How small is k? Our answer: O(log n)
- Solution construction: F(2n) = F(n) + 1
- Suppose there are 2n data items a1, a2, ..., an, an+1, ..., a2n to back up.
- Put {a1, a2, ..., an} into X.
- For every set in the solution for the n items a1, a2, ..., an, take its union with its isomorphic copy in {an+1, an+2, ..., a2n} and put the result into X.
13Example
n = 2, F(n) = k = 2
n = 4, F(n) = k = 3
n = 8, F(n) = k = 4
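The doubling construction can be sketched as below for n a power of two. The function name `build_solution` is hypothetical, items are referred to by their index 1..n, and the base case (a single item stored as is, so F(1) = 1) is an assumption chosen to be consistent with the example sizes above.

```python
def build_solution(n):
    """Doubling construction: returns a list of index sets whose XORs you store."""
    if n == 1:
        return [{1}]                      # assumed base case: F(1) = 1
    half = n // 2
    smaller = build_solution(half)        # solution for a1..a(n/2)
    X = [set(range(1, half + 1))]         # the whole first half as one set
    for s in smaller:
        # Union each smaller set with its copy shifted into the second half.
        X.append(s | {i + half for i in s})
    return X

for n in (2, 4, 8):
    print(n, len(build_solution(n)))      # prints "2 2", "4 3", "8 4"

print(build_solution(4))                  # [{1, 2}, {1, 3}, {1, 2, 3, 4}]
```

Under this base case the construction gives k = log2(n) + 1, which matches the three examples.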
14How much disk space do you need to store a1, a2, ..., an?
15My questions
- Was this result known before?
- Log(n) is a lower bound for tolerating 2 concurrent crashes.
- F(n) = ? for tolerating 3, 4, 5 concurrent crashes.
16Implementation Options
- 1 Backup at which granularity?
- Consolidate backup data into 1 log file
- Pros: hides file sizes, can recover older versions, incremental backup
- Cons: bad space and bandwidth efficiency
- Back up data at file granularity
- Pros: space and bandwidth efficiency
- Cons: reveals file sizes, subtle details about cutting big files, wise updates, ...
172 Wise transfer for updating file
- Problem: if two versions of a file differ only slightly, transferring the whole file again is expensive.
- Idea (rsync): only transfer the necessary bytes.
- Let A' be the updated file on node N and A the old version of the file kept by M.
- M:
- Cut A into fixed-size chunks and compute the hash of each chunk.
- Send all hashes h1, h2, ..., hn to N.
- N:
- Compute hashes of chunks of A' in sliding-window fashion.
- Compare with h1, h2, ..., hn to find the overlapping chunks.
- Send only the necessary bytes to M.
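The exchange between M and N can be sketched roughly as follows. This is a simplified illustration, not rsync itself: real rsync pairs a cheap rolling checksum with a strong hash, whereas this sketch recomputes a SHA-1 at every window position for clarity. The chunk size, the function names, and the sample strings are all assumptions.

```python
import hashlib

CHUNK = 4  # hypothetical chunk size, kept tiny for the demonstration

def chunk_hashes(old):
    """M's side: hash each fixed-size chunk of the old file."""
    return [hashlib.sha1(old[i:i + CHUNK]).digest()
            for i in range(0, len(old), CHUNK)]

def make_delta(new, hashes):
    """N's side: slide a window over the new file; emit chunk references or raw bytes."""
    index = {h: k for k, h in enumerate(hashes)}
    delta, i = [], 0
    while i < len(new):
        h = hashlib.sha1(new[i:i + CHUNK]).digest()
        if h in index:
            delta.append(("copy", index[h]))   # M already has this chunk
            i += CHUNK
        else:
            delta.append(("data", new[i:i + 1]))  # must be sent literally
            i += 1
    return delta

def apply_delta(old, delta):
    """M's side: rebuild the new file from its old chunks plus the literal bytes."""
    out = bytearray()
    for kind, value in delta:
        if kind == "copy":
            out += old[value * CHUNK:(value + 1) * CHUNK]
        else:
            out += value
    return bytes(out)

old = b"hello world, backup!"
new = b"hello brave world, backup!"
delta = make_delta(new, chunk_hashes(old))
assert apply_delta(old, delta) == new
literal = sum(len(v) for kind, v in delta if kind == "data")
print(literal, "of", len(new), "bytes sent literally")
```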
183 Cutting big file into small parts
- Problem: one friend doesn't have enough space for your big file, so you need to cut the big file into smaller parts. But how do you cut it so that later updates are easy?
- Fixed part size? No: if one byte is inserted into or deleted from the file, all the following parts are shifted, so you need to update all of the old parts.
- Idea (LBFS): use bit patterns in the file content to set the part boundaries rather than a fixed size. As a result, if one byte is inserted or deleted, only the part containing that byte changes.
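Content-defined boundaries can be sketched as below. LBFS itself uses Rabin fingerprints over a sliding window; this sketch swaps in `zlib.crc32` recomputed at each position as a simple stand-in, and the window size, divisor, and names are assumptions chosen for the demonstration.

```python
import random
import zlib

WINDOW = 8    # hypothetical number of trailing bytes that decide a boundary
DIVISOR = 16  # a boundary occurs when the window hash is 0 modulo this value

def split(data):
    """Cut data after every position whose trailing window hashes to 0 mod DIVISOR."""
    parts, start = [], 0
    for i in range(WINDOW, len(data) + 1):
        if zlib.crc32(data[i - WINDOW:i]) % DIVISOR == 0:
            parts.append(data[start:i])
            start = i
    if start < len(data):
        parts.append(data[start:])        # whatever is left after the last boundary
    return parts

rng = random.Random(0)
data = bytes(rng.randrange(256) for _ in range(2000))
edited = data[:1000] + b"X" + data[1000:]  # insert one byte in the middle

old_parts = set(split(data))
new_parts = split(edited)
changed = sum(1 for p in new_parts if p not in old_parts)
print(changed, "of", len(new_parts), "parts need to be re-sent")
```

Because a boundary depends only on the WINDOW bytes before it, an insertion can disturb at most the boundaries within WINDOW bytes after it, so only a handful of parts around the edit change while the rest keep their content.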
19Other issues
- 4 Trust but verify your friends.
- Check that the backup is still there.
- How to check whether friends contribute their fair share?
- 5 How to check whether the backup copy still exists if you and your friend are not online at the same time?
- Idea: ask other friends to help.
- 6 Sharing files among friends.
- Viewers automatically cache/back up the file.
- Backed-up data increases the availability of shared files.