Rsync

Slides: 19
Provided by: davids236
Learn more at: http://www.cs.sjsu.edu

Transcript and Presenter's Notes

1
Rsync
  • Efficiently Synchronizing Files Using Hashing
  • By David Shao
  • For CS 265, Spring 2004

2
Problem
  • Want to synchronize with newer version of a file
    on a remote server
  • Want to minimize data sent over slow network link
  • Want to minimize (round-trip) communication
    latencies

3
Solution: Rsync
  • Open source software project
  • http://samba.anu.edu.au/rsync/
  • Command-line-driven server and client for
    Unix-like systems
  • Synchronizes directories as well as files
  • Andrew Tridgell's Ph.D. thesis

4
Overview of How Hashing Is Used
  • Can reduce amount of data sent if willing to live
    with a very small probability of inaccuracy
  • Several layers of hashing: a fast but less accurate
    hash and a slower but almost always accurate hash
    are both used

5
Ideal Case
  • Divide files into equal-sized blocks
  • Files are almost identical except for relatively
    few blocks
  • The receiver already has almost all of the data
    blocks it needs, but how can it know which ones?
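
A minimal sketch of the block division described above. The block size, the helper names, and the sample data are assumptions made for illustration, and MD5 stands in for the MD4 named later in the slides because Python's hashlib does not guarantee that MD4 is available.

```python
import hashlib

BLOCK_SIZE = 8  # hypothetical block size L, kept tiny so the example is readable


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into consecutive blocks of `size` bytes (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def block_hashes(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """One digest per block; MD5 is a stand-in for MD4 in this sketch."""
    return [hashlib.md5(block).digest() for block in split_blocks(data, size)]


old = b"AAAAAAAABBBBBBBBCCCCCCCC"  # receiver's version
new = b"AAAAAAAAXXXXXXXXCCCCCCCC"  # sender's version, one block changed

# Comparing digests block by block shows which blocks the receiver already has.
same = [a == b for a, b in zip(block_hashes(old), block_hashes(new))]
print(same)  # [True, False, True]
```

Only the digests need to cross the network to find out that the first and last blocks are already present on the receiver.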

[Diagram: Receiver and Sender]
6
Ideal Protocol
[Diagram: Receiver and Sender]
7
Sender Analyzes Own Blocks
8
Commands: COPY or ADD
  • COPY: If the receiver already has the data block,
    just tell it to copy the block.
  • ADD: If the receiver does not have a data block,
    send the block itself.
  • COPY is cheap, ADD is expensive
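
The two commands can be sketched as a tiny delta format. This is an illustration under stated assumptions, not rsync's wire protocol: the tuple encoding, the function names, and the block size are hypothetical, blocks are assumed to be aligned (the ideal case), and MD5 stands in for MD4.

```python
import hashlib

L = 4  # hypothetical block size


def digest(block: bytes) -> bytes:
    return hashlib.md5(block).digest()  # stand-in for MD4


def make_delta(new: bytes, receiver_hashes: dict[bytes, int]) -> list[tuple]:
    """Sender side: emit ("COPY", block_index) or ("ADD", raw_bytes) per aligned block."""
    delta = []
    for i in range(0, len(new), L):
        block = new[i:i + L]
        idx = receiver_hashes.get(digest(block))
        delta.append(("COPY", idx) if idx is not None else ("ADD", block))
    return delta


def apply_delta(old: bytes, delta: list[tuple]) -> bytes:
    """Receiver side: rebuild the new file from old blocks plus ADDed data."""
    out = bytearray()
    for op, arg in delta:
        out += old[arg * L:(arg + 1) * L] if op == "COPY" else arg
    return bytes(out)


old = b"aaaabbbbcccc"
new = b"aaaaZZZZcccc"
# The receiver sends one hash per block; the sender indexes them by digest.
hashes = {digest(old[i:i + L]): i // L for i in range(0, len(old), L)}
delta = make_delta(new, hashes)
print(delta)  # [('COPY', 0), ('ADD', b'ZZZZ'), ('COPY', 2)]
rebuilt = apply_delta(old, delta)
```

Only the ADD entry carries file data; each COPY carries just a block index.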

9
Advantage of Ideal
  • For each COPY, network traffic is reduced by a
    factor of approximately L / h, where L is the block
    size and h is the size of a hash of a block of size L
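
As a worked instance of this ratio, assume a block size of L = 1024 bytes (a hypothetical value, not taken from the slides) and a 16-byte hash, which is the size of an MD4 digest (128 bits):

```latex
\frac{L}{h} = \frac{1024\ \text{bytes}}{16\ \text{bytes}} = 64
```

Under these assumptions, a matched block costs roughly 1/64 of what sending its bytes would cost.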

10
Disadvantage of Ideal
  • Example: edit source code and delete a comment at
    the beginning
  • Blocks are no longer neatly aligned with the blocks
    of the old version
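
A small sketch of the alignment problem. The sample bytes and the block size are made up for illustration, and MD5 stands in for MD4. Deleting two bytes at the front shifts every later block boundary, so no aligned block hash survives even though almost all of the data is unchanged.

```python
import hashlib

L = 4  # hypothetical block size


def aligned_hashes(data: bytes) -> set[bytes]:
    """Hashes of the blocks starting at offsets 0, L, 2L, ... (MD5 stands in for MD4)."""
    return {hashlib.md5(data[i:i + L]).digest() for i in range(0, len(data), L)}


old = b"XXabcdefghijklmnop"  # receiver's copy; "XX" plays the deleted comment
new = old[2:]                # sender's copy after the deletion

shared = aligned_hashes(old) & aligned_hashes(new)
print(len(shared))  # 0
```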

11
Compute More Hashes
  • Sender needs to compute a hash at every byte
    position
  • More expensive: about L times more hashes computed
    by the sender
  • Use a weaker, faster hash to weed out most
    non-matching positions
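
A sketch of the brute-force version of this idea, before any weak hash is introduced: the sender hashes a window at every byte offset and looks each digest up among the receiver's block hashes. The data, names, and block size are assumptions for illustration, and MD5 stands in for MD4.

```python
import hashlib

L = 4  # hypothetical block size


def strong(block: bytes) -> bytes:
    return hashlib.md5(block).digest()  # stand-in for MD4


old = b"XXabcdefghijklmnop"  # receiver's copy
new = old[2:]                # sender's copy: two leading bytes deleted

# Receiver's aligned block hashes, mapped to block indices.
receiver = {strong(old[i:i + L]): i // L for i in range(0, len(old), L)}

found = {}           # offset in new -> block index in old
hashes_computed = 0
for off in range(len(new) - L + 1):
    hashes_computed += 1
    idx = receiver.get(strong(new[off:off + L]))
    if idx is not None:
        found[off] = idx

print(found)            # {2: 1, 6: 2, 10: 3}
print(hashes_computed)  # 13
```

Three of the receiver's blocks are found again at shifted offsets, at the price of 13 strong hashes instead of the 4 that aligned blocks would need.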

12
Ordinary Sum of Bytes
  • Rolling-type property: the sum of L bytes starting
    at position i+1 is almost the same as the sum
    starting at i.
  • Subtract the red byte, add the green byte; the
    yellow bytes stay the same
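
The rolling property can be checked directly. In this sketch the function names, sample data, and window length are assumptions; the update subtracts the byte leaving the window and adds the byte entering it.

```python
def window_sum(data: bytes, i: int, L: int) -> int:
    """Sum of the L bytes starting at position i, computed from scratch."""
    return sum(data[i:i + L])


def roll(prev: int, out_byte: int, in_byte: int) -> int:
    """Sum for the window at i+1, derived from the sum at i in constant time."""
    return prev - out_byte + in_byte


data = b"rolling checksum"
L = 5
s = window_sum(data, 0, L)
for i in range(len(data) - L):
    s = roll(s, data[i], data[i + L])       # drop data[i], take in data[i+L]
    assert s == window_sum(data, i + 1, L)  # matches the from-scratch sum
```

Each step costs one subtraction and one addition, regardless of L.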

13
Disadvantage of a Simple Sum
  • A simple sum is too symmetric
  • The sum of "All men are mortals" is the same as
    the sum of "All mortals are men"
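
The symmetry is easy to confirm: the two sentences contain exactly the same bytes in a different order, so a plain byte sum cannot tell them apart. The variable names are illustrative.

```python
s1 = b"All men are mortals"
s2 = b"All mortals are men"

same_bytes = sorted(s1) == sorted(s2)  # identical multiset of bytes
same_sum = sum(s1) == sum(s2)          # hence identical simple sum
print(same_bytes, same_sum)  # True True
```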

14
Weighted Sum
  • The first bytes have more weight than the tail
    ones (an arbitrary decision)
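
A sketch of one such weighting, assuming the byte at offset k in a block of length L gets weight L - k, so the first byte counts L times and the last byte once. The slides do not spell out the exact weights, so treat this choice as an assumption.

```python
def weighted_sum(block: bytes) -> int:
    """Position-dependent sum: the byte at offset k is multiplied by (L - k)."""
    L = len(block)
    return sum((L - k) * x for k, x in enumerate(block))


s1 = b"All men are mortals"
s2 = b"All mortals are men"

# Same bytes, different order: the plain sums agree, the weighted sums do not.
plain_equal = sum(s1) == sum(s2)
weighted_equal = weighted_sum(s1) == weighted_sum(s2)
print(plain_equal, weighted_equal)  # True False
```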

15
Reordering the i+1 Sum
  • The red part is to be subtracted and the green part
    is to be added. Yellow is the same.
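
The weighted sum keeps the rolling property if the plain sum is carried along with it. Assuming weights L - k as a concrete choice (an assumption, since the slides do not give the weights), write a(i) for the plain sum and b(i) for the weighted sum of the window at i. Then a(i+1) = a(i) - x[i] + x[i+L] and b(i+1) = b(i) - L*x[i] + a(i+1). The sketch below checks both updates against from-scratch computation.

```python
def sums(data: bytes, i: int, L: int) -> tuple[int, int]:
    """Plain sum a and weighted sum b of the window data[i:i+L], from scratch."""
    window = data[i:i + L]
    a = sum(window)
    b = sum((L - k) * x for k, x in enumerate(window))
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, L: int) -> tuple[int, int]:
    """Advance both sums by one position in constant time."""
    a_next = a - out_byte + in_byte    # drop the first byte, add the new last byte
    b_next = b - L * out_byte + a_next  # remove the weight-L term, bump the rest by one
    return a_next, b_next


data = b"All men are mortals"
L = 6
a, b = sums(data, 0, L)
for i in range(len(data) - L):
    a, b = roll(a, b, data[i], data[i + L], L)
    assert (a, b) == sums(data, i + 1, L)
```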

16
Further Enhancements
  • Compute separate (MD4) signature for entire file
  • Reconstruct new file using temporary storage so
    that the old version is never removed until a new
    one is known to be good
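
A minimal sketch of both enhancements together, under stated assumptions: the function name and file layout are hypothetical, MD5 stands in for the MD4 whole-file signature, and the rebuilt data is written to a temporary file in the same directory and renamed over the old file only after the signature matches.

```python
import hashlib
import os
import tempfile


def install_if_good(target: str, rebuilt: bytes, expected_digest: bytes) -> bool:
    """Write rebuilt data to a temp file; replace target only if the digest matches."""
    directory = os.path.dirname(os.path.abspath(target))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(rebuilt)
        if hashlib.md5(rebuilt).digest() != expected_digest:
            return False              # old version is left untouched
        os.replace(tmp_path, target)  # rename the verified copy over the old file
        return True
    finally:
        if os.path.exists(tmp_path):  # clean up a rejected temp file
            os.remove(tmp_path)


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "file.txt")
    with open(path, "wb") as f:
        f.write(b"old version")
    good = hashlib.md5(b"new version").digest()  # signature sent by the sender

    rejected = install_if_good(path, b"corrupted!!", good)
    with open(path, "rb") as f:
        after_reject = f.read()

    accepted = install_if_good(path, b"new version", good)
    with open(path, "rb") as f:
        after_accept = f.read()
```

A failed reconstruction leaves the old file in place, which is the property the slide asks for.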

17
Synchronizing Directories
  • Divide the receiving side into a separate receiver
    and generator

[Diagram: Receiver, Sender, and Generator]
18
Summary of Hashing Used
  • A weaker, easier-to-compute hash with the rolling
    property
  • Stronger hash (MD4) once most candidates have
    been weeded out
  • Signature over entire file as a separate check
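
The first two layers can be sketched together as follows. This is an illustration, not rsync's implementation: the names, the block size, the weights L - k, and the 16-bit truncation of the weak hash are assumptions, MD5 stands in for MD4, and the whole-file signature is left out.

```python
import hashlib

L = 4  # hypothetical block size


def weak(block: bytes) -> tuple[int, int]:
    """Weak hash: plain sum and weighted sum, each truncated to 16 bits."""
    a = sum(block)
    b = sum((len(block) - k) * x for k, x in enumerate(block))
    return a & 0xFFFF, b & 0xFFFF


def strong(block: bytes) -> bytes:
    return hashlib.md5(block).digest()  # stand-in for MD4


def find_matches(old: bytes, new: bytes) -> tuple[dict[int, int], int]:
    """Return {offset in new: block index in old} and the number of strong hashes used."""
    table: dict[tuple[int, int], list[tuple[bytes, int]]] = {}
    for i in range(0, len(old) - L + 1, L):
        block = old[i:i + L]
        table.setdefault(weak(block), []).append((strong(block), i // L))

    matches: dict[int, int] = {}
    strong_calls = 0
    if len(new) < L:
        return matches, strong_calls
    a, b = weak(new[:L])
    for off in range(len(new) - L + 1):
        candidates = table.get((a, b))
        if candidates:  # weak hash hit: only now pay for the strong hash
            strong_calls += 1
            s = strong(new[off:off + L])
            for digest, idx in candidates:
                if digest == s:
                    matches[off] = idx
                    break
        if off + L < len(new):  # roll the weak hash one byte forward
            out_byte, in_byte = new[off], new[off + L]
            a = (a - out_byte + in_byte) & 0xFFFF
            b = (b - L * out_byte + a) & 0xFFFF
    return matches, strong_calls


old = b"XXabcdefghijklmnop"
new = old[2:]
matches, strong_calls = find_matches(old, new)
print(matches)       # {2: 1, 6: 2, 10: 3}
print(strong_calls)  # 3
```

In this example the weak hash is evaluated at all 13 window positions, but the strong hash is computed only at the 3 positions where the weak hash already matched.
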