Cooperative backup on Social Network - PowerPoint PPT Presentation

Slides: 21
Provided by: dinhngu
Learn more at: https://cs.nyu.edu
Transcript and Presenter's Notes

1
Cooperative backup on Social Network
  • Nguyen Tran and Jinyang Li

2
Motivation
  • Backup is important.
  • State-of-the-art solutions:
      • Buy a second hard disk
      • Manual backup to a mobile disk / CDs
      • Sign up for online backup (10 bucks for
        1GB/month)
  • Manual backup is not good (additional hard disk,
    need to remember)
  • Important data needs long-distance separation
    between the original and the backup copy, e.g.
    Wall Street data centers.
  • Idea: back up on a p2p network (utilize idle
    space, backup daemon, remoteness).

3
Solution overview
  • How to make sure nodes w/ data stay in the
    system?
      • A malicious node gets the data and leaves.
  • Idea: back up on your real friends' node(s).
  • Consequence: lose global space utilization but
    gain incentives.
  • For a backup service:
      • Data safety > global space utilization.

4
Model
[Figure: model diagram; labels: Meta data, Data]
5
Q1: efficient space allocation
  • If I join w/ 100G to back up and 100G to
    contribute, can I back up all the data?
      • Orkut: 2363 nodes, 78% space utilization
      • Venus: 39783 nodes, 81% space utilization
      • Over half of the nodes can back up all
        their data
  • Which buddy to pick to further optimize global
    space efficiency?
      • Buddy with min degree?

6
Q2: space optimization w/ coding
  • Q: If you only have 1G of idle space, can you
    store 5G worth of your friends' backups?
    Ans: yes!

[Figure: friends' data a1, a2, ..., an-1, an; node A stores a1 ⊕ a2 ⊕ ... ⊕ an]
What if 2 friends crash at the same time?
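
A minimal sketch of the single-crash case, assuming five friends whose backups are equal-length blocks (the block contents and the `xor_blocks` helper are made up for illustration). You keep only the XOR of all blocks, which is the size of one block, and rebuild a crashed friend's block from the survivors.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Hypothetical data: five friends' backups a1..a5, 4 bytes each.
friends = [bytes([i]) * 4 for i in range(1, 6)]

# What you store on your own disk: one block, not five.
parity = xor_blocks(friends)

# Friend 3 crashes. Fetch the four surviving blocks from their owners
# and XOR them with the stored parity to get a3 back.
survivors = friends[:2] + friends[3:]
recovered = xor_blocks([parity] + survivors)
```

With a single XOR, two simultaneous crashes leave only a3 ⊕ a4 (say) recoverable, not a3 and a4 individually, which is the question the following slides take up.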
7
How much disk space do you need to store a1, a2, ...,
an?
8
How much disk space do you need to store a1, a2, ...,
an?
9
Definition
  • Let S = {a1, a2, ..., an}.
  • For T ⊆ S, let ⊕(T) denote the XOR of all
    elements in T.
  • A solution X = {S1, S2, ..., Sk}, where each
    Si ⊆ S, means you store ⊕(S1), ⊕(S2), ..., ⊕(Sk)
    on your machine, i.e. F(n) = k.
  • Of course, ∪ Si = S.

10
  • Lemma: X is a solution that tolerates 2
    concurrent crashes iff ∀ p ≠ q ∈ 1..n, ∃ i ∈
    1..k such that Si contains either ap or aq but
    not both.

[Figure: friends' data a1, a2, a3; you store a1 ⊕ a2 and a1 ⊕ a3, i.e. X = {{a1, a2}, {a1, a3}}: good]
11
  • Lemma: X is a solution that tolerates 2
    concurrent crashes iff ∀ p ≠ q ∈ 1..n, ∃ i ∈
    1..k such that Si contains either ap or aq but
    not both.
  • Proof:
      • (Only if) Suppose every Si contains both ap
        and aq or neither of them; then XORing the
        stored values cannot yield ap or aq
        individually.
      • (If) Take an Si that contains ap but not aq.
        For each element ai in Si \ {ap}, get ai from
        its owner (who has not crashed) and XOR it
        with ⊕(Si). Finally, we get ap. Then getting
        aq is easy, i.e. X is a solution.
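
The lemma can be checked by brute force on small examples. The sketch below is illustrative only: data items are small integers indexed 1..n, and the function names `satisfies_lemma` and `recover` are made up here. `recover` follows the proof: it repeatedly finds a stored XOR with exactly one missing element and solves for it.

```python
from itertools import combinations


def xor_all(values):
    """XOR a sequence of integers (0 for an empty sequence)."""
    acc = 0
    for v in values:
        acc ^= v
    return acc


def satisfies_lemma(n, solution):
    """Sets cover 1..n and every pair p != q is split by some set."""
    covered = set().union(*solution) == set(range(1, n + 1))
    separated = all(
        any((p in s) != (q in s) for s in solution)
        for p, q in combinations(range(1, n + 1), 2)
    )
    return covered and separated


def recover(data, solution, crashed):
    """Rebuild crashed items from stored XORs plus surviving items.

    data maps index -> value; returns the full mapping, or None if
    some crashed item cannot be rebuilt.
    """
    stored = [xor_all(data[i] for i in s) for s in solution]
    known = {i: v for i, v in data.items() if i not in crashed}
    progress = True
    while progress and len(known) < len(data):
        progress = False
        for s, value in zip(solution, stored):
            missing = [i for i in s if i not in known]
            if len(missing) == 1:
                rest = xor_all(known[i] for i in s if i in known)
                known[missing[0]] = value ^ rest
                progress = True
    return known if len(known) == len(data) else None


data = {1: 0x11, 2: 0x22, 3: 0x33}
good = [{1, 2}, {1, 3}]   # the example X from the earlier slide
bad = [{1, 2}, {3}]       # a1 and a2 are never separated
```

Here `good` passes the lemma's condition and survives any two crashes, while `bad` covers S but cannot rebuild a1 and a2 when both crash.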

12
How small is k? Our ans: log(n)
  • Solution construction: F(2n) = F(n) + 1
  • Suppose there are 2n data items a1, a2, ..., an,
    an+1, ..., a2n to back up.
  • Put {a1, a2, ..., an} into X.
  • For every set in the solution for the n data
    items a1, a2, ..., an, take its union with its
    isomorphic counterpart in {an+1, ..., a2n} and
    put the result in X.

13
Example
n = 4, F(n) = k = 3
n = 2, F(n) = k = 2
n = 8, F(n) = k = 4
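
The doubling construction can be sketched as a short recursion. This is an illustration, not the authors' code: items are represented by their indices 1..n, n is assumed to be a power of two, and the base case X = {{a1}} for n = 1 is an assumption (the slides do not state it).

```python
from itertools import combinations


def build_solution(n):
    """Return X as a list of index sets, for n a power of two."""
    if n == 1:
        return [{1}]                      # assumed base case
    half = n // 2
    smaller = build_solution(half)
    # Step 1: put {a1, ..., a_half} into X.
    solution = [set(range(1, half + 1))]
    # Step 2: union each smaller set with its shifted copy in the
    # second half {a_half+1, ..., a_n}.
    for s in smaller:
        solution.append(s | {i + half for i in s})
    return solution


def separates_all_pairs(n, solution):
    """Lemma condition: every pair is split by at least one set."""
    return all(
        any((p in s) != (q in s) for s in solution)
        for p, q in combinations(range(1, n + 1), 2)
    )


sizes = [len(build_solution(n)) for n in (2, 4, 8)]   # [2, 3, 4]
```

Under that base case the recursion gives k = log2(n) + 1, matching the example values for n = 2, 4 and 8; for n = 4 it produces {a1, a2}, {a1, a3}, {a1, a2, a3, a4}.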
14
How much disk space do you need to store a1, a2, ...,
an?
15
My questions
  • Is this result already known?
  • Is log(n) a lower bound for tolerating 2
    concurrent crashes?
  • F(n) = ? for tolerating 3, 4, 5 concurrent
    crashes.

16
Implementation Options
  • 1. Back up at which granularity?
  • Consolidate backup data into 1 log file
      • Pros: hides file sizes, can recover older
        versions, incremental backup
      • Cons: bad space and bandwidth efficiency
  • Back up data at file granularity
      • Pros: space and bandwidth efficiency
      • Cons: reveals file sizes, subtle details
        about cutting big files, wise update, ...

17
2. Wise transfer for updating a file
  • Problem: if two versions of a file differ only
    slightly, transferring the whole file again is
    expensive.
  • Idea (rsync): only transfer the necessary bytes.
  • Let A' be the updated file on node N, and A the
    old version of the file kept by M.
  • M:
      • Cut A into fixed-size chunks and compute the
        hash of each chunk.
      • Send all hashes h1, h2, ..., hn to N.
  • N:
      • Compute hashes of chunks in A' in a
        sliding-window fashion.
      • Compare with h1, h2, ..., hn to find the
        overlapping chunks.
      • Send only the necessary bytes to M.
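
The exchange above can be sketched as follows. This is a simplified illustration, not rsync itself: real rsync pairs a cheap rolling checksum with a strong hash, whereas this sketch rehashes every window with SHA-256 for clarity. The chunk size and the function names are assumptions.

```python
import hashlib

CHUNK = 4  # assumed chunk size, tiny so the example is easy to follow


def chunk_hashes(old, size=CHUNK):
    """On M: hash each full fixed-size chunk of the old file A."""
    table = {}
    for idx in range(len(old) // size):
        digest = hashlib.sha256(old[idx * size:(idx + 1) * size]).digest()
        table.setdefault(digest, idx)
    return table


def make_delta(new, table, size=CHUNK):
    """On N: slide over A' and emit chunk references or literal bytes."""
    delta, literal, pos = [], bytearray(), 0
    while pos < len(new):
        window = new[pos:pos + size]
        idx = None
        if len(window) == size:
            idx = table.get(hashlib.sha256(window).digest())
        if idx is not None:
            if literal:                       # flush pending new bytes
                delta.append(("literal", bytes(literal)))
                literal.clear()
            delta.append(("copy", idx))       # M already has this chunk
            pos += size
        else:
            literal.append(new[pos])          # byte M does not have
            pos += 1
    if literal:
        delta.append(("literal", bytes(literal)))
    return delta


def apply_delta(old, delta, size=CHUNK):
    """On M: rebuild A' from the old file A and the received delta."""
    out = bytearray()
    for kind, value in delta:
        if kind == "copy":
            out += old[value * size:(value + 1) * size]
        else:
            out += value
    return bytes(out)


old = b"abcdefghijkl"
new = b"abcdXefghijkl"                        # one byte inserted
delta = make_delta(new, chunk_hashes(old))
```

For this input only the single inserted byte travels as a literal; the three unchanged chunks are sent as references.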

18
3. Cutting a big file into small parts
  • Problem: one friend doesn't have enough space
    for your big file, so you need to cut the big
    file into smaller parts. But how do you cut it
    so that later updates are easy?
  • Fixed part size? No: if one byte is inserted
    into or deleted from the file, all the parts
    after it are shifted. Hence, you need to update
    all of those old parts.
  • Idea (LBFS): use bit patterns in the file's
    content to set the part boundaries rather than a
    fixed size. As a result, if one byte is
    inserted/deleted, only the part containing that
    byte changes.
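
A rough sketch of content-defined cutting, under stated assumptions: LBFS uses Rabin fingerprints over a sliding window, while this illustration substitutes `zlib.crc32` recomputed at every position, and the window size and bit mask are made-up values. A boundary is declared wherever the hash of the last `WINDOW` bytes has its low bits equal to zero, so boundaries depend only on nearby content, not on absolute offsets.

```python
import zlib

WINDOW = 8    # bytes examined at each position (assumed value)
MASK = 0x3F   # cut when the low 6 bits of the hash are zero (assumed)


def cut_parts(data, window=WINDOW, mask=MASK):
    """Split data at positions chosen by the content itself."""
    parts = []
    start = 0
    for end in range(window, len(data) + 1):
        # Hash of the `window` bytes that end just before `end`.
        if zlib.crc32(data[end - window:end]) & mask == 0:
            parts.append(data[start:end])
            start = end
    if start < len(data):
        parts.append(data[start:])            # trailing part, no boundary
    return parts


# Hypothetical file contents, then the same file with one byte inserted.
data = bytes((i * 131 + (i >> 3) * 17) % 251 for i in range(3000))
edited = data[:1500] + b"!" + data[1500:]

old_parts = cut_parts(data)
new_parts = cut_parts(edited)
known = set(old_parts)
changed = [p for p in new_parts if p not in known]
```

Only parts whose boundaries fall within `WINDOW` bytes of the insertion can differ, so `changed` holds at most `WINDOW + 1` parts regardless of file size; everything else can be reused from the old backup.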

19
Other issues
  • 4. Trust but verify your friends.
      • Check that the backup is still there.
      • How to check that friends contribute the
        right share?
  • 5. How to check that the backup copy still
    exists if you and your friend are not online at
    the same time?
      • Idea: ask other friends to help.
  • 6. Sharing files among friends.
      • Viewers automatically cache/back up the file.
      • Backed-up data increases the availability of
        shared files.

20
  • The End