Title: Peer-to-peer archival data trading
1Peer-to-peer archival data trading
- Brian Cooper
- Joint work with Hector Garcia-Molina
- (and others)
- Stanford University
2Problem: fragile data
- Data easy to create, hard to preserve
- Broken tapes
- Human deletions
- Going out of business
3Replication-based preservation
5Motivation
- Several systems use replication
- Preserve digital collections
- SAV, others
- Archival part of digital library
- Individual organizations cooperate
- Not a lot of money to spend
6Goal
- Reliable replication of digital collections
- Given that
- Resources are limited
- Sites are autonomous
- Not all sites are equal
- Traditional methods
- Central control
- Random
- Replicate popular
- Metric
- Reliability
- Not necessarily efficiency
7Our solution
- Data trading
- I'll store a copy of your collection if you'll store a copy of mine
- Sites make local decisions
- Who to trade with
- How many copies to make
- How much space to provide
- Etc.
8Trading network
- A series of binary, peer-to-peer trading links
9Architecture
[Figure: architecture diagram. Labels: local archive, remote archive, users, service layer, reliability layer, SAV Archive, InfoMonitor, filesystem]
This architecture was developed with Arturo Crespo
10Overview
- Trading model
- Trading algorithm
- Optimizing (and simulating) trading
- Some results
- Some stuff we are still working on
16Trading model
- Archive site: an autonomous archiving provider
- Digital collection: a set of related digital materials
- Archival storage: stores locally and remotely owned digital collections
- Archiving client: deposits and retrieves materials
- Data reliability: probability that data is not lost
17Deeds
- A right to use space at another site
- Bookkeeping mechanism for trades
- Used, saved, split, or transferred
- Trading algorithm
- Sites trade deeds
- Sites exercise deeds to replicate collections
[Figure: sample deed. Deed for space: 623 gigabytes, Stanford University. For use by Library of Congress or for transfer.]
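The deed operations listed on this slide (used, saved, split, transferred) can be sketched as a small data structure. This is a minimal illustration, not the authors' implementation: the class name `Deed`, its fields, and its methods are hypothetical, and it assumes the sample deed is issued by Stanford University and held by the Library of Congress.

```python
from dataclasses import dataclass


@dataclass
class Deed:
    """Hypothetical bookkeeping record: a right to use space at another site."""
    issuer: str        # site that provides the space (assumed reading of the sample deed)
    holder: str        # site currently entitled to use the space
    gigabytes: int     # amount of space the deed still covers

    def split(self, amount):
        """Carve `amount` GB off into a new deed; this deed keeps the rest."""
        if not 0 < amount < self.gigabytes:
            raise ValueError("amount must be positive and smaller than the deed")
        self.gigabytes -= amount
        return Deed(self.issuer, self.holder, amount)

    def transfer(self, new_holder):
        """Hand the deed to another site."""
        self.holder = new_holder

    def use(self, collection_gb):
        """Exercise the deed to store a replica; return the space left over."""
        if collection_gb > self.gigabytes:
            raise ValueError("collection does not fit in the deeded space")
        self.gigabytes -= collection_gb
        return self.gigabytes


# A deed that is neither used nor transferred is simply "saved" for later.
deed = Deed(issuer="Stanford University", holder="Library of Congress", gigabytes=623)
part = deed.split(200)              # 200 GB deed split off, 423 GB stay in `deed`
part.transfer("Another archive")    # hypothetical third site
remaining = deed.use(300)           # replicate a 300 GB collection
```

Here the holder uses part of the deed, transfers another part, and keeps the remainder for future use.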
18Deed trading
19The challenge
21Alternative solutions
- Are there other ways besides trading?
22Other solutions: central control
23Other solutions: client-based
24Other solutions: random
25Why is trading good?
- High reliability
- Framework for replication
- Site autonomy
- Make local decisions
- No submission to external authority
- Fairness
- Contribute more, get more reliability
- Must contribute resources
26Decisions facing an archive
- Who to trade with
- How much to trade
- When to ask for a trade
- Providing space
- Advertising space
- Picking a number of copies
- Coping with varying site reliabilities
- What to do with acquired resources
- How to deliver other services
Many, many degrees of freedom!
27Our approach
- Define a basic trading protocol
- Deed trading
- Assume all sites follow same rules
- Basic system for trading
- Extend: not all sites are equal
- Some are more reliable or trusted
- Extend: sites have freedom to negotiate
- Bid trading
- Extend: some sites are malicious
- Ensure documents survive despite evildoers
- For each model, what policies are best?
28How do we evaluate policies?
- Trading simulator
- Generate scenario
- Simulate trading with different policies
- Evaluate reliability for each policy
- Compare each policy
29Simulation parameters
- Number of sites: 2 to 15
- Site reliability: 0.5 to 0.8
- Collections per site: 4 to 25
- Data per collection: 50 GB to 1000 GB
- Space per site: 2x data to 7x data
- Replication goal: 2 to 15 copies
- Scenarios per simulation: 200
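The "generate scenario" step of the simulator could look roughly like the sketch below, which draws each parameter from the ranges on this slide. The slide gives only the ranges, so uniform sampling, drawing values per site, and every name here (`generate_scenario`, the dictionary keys) are assumptions made for illustration.

```python
import random


def generate_scenario(rng):
    """Draw one hypothetical scenario from the parameter ranges on the slide."""
    sites = []
    for i in range(rng.randint(2, 15)):                    # number of sites
        # 4 to 25 collections per site, 50 to 1000 GB each
        collections = [rng.uniform(50, 1000) for _ in range(rng.randint(4, 25))]
        data_gb = sum(collections)
        sites.append({
            "name": f"site{i}",
            "reliability": rng.uniform(0.5, 0.8),          # site reliability
            "collections_gb": collections,
            "space_gb": data_gb * rng.uniform(2, 7),       # 2x to 7x the site's data
        })
    return {"sites": sites, "replication_goal": rng.randint(2, 15)}


rng = random.Random(0)   # fixed seed so a run can be repeated
scenarios = [generate_scenario(rng) for _ in range(200)]   # 200 scenarios per simulation
```

Each trading policy would then be simulated on the same list of scenarios so that the resulting reliabilities are comparable.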
30Reliability
- Site reliability
- Will a site fail?
- Example: 0.9 = 10% chance of failure
- Data reliability
- How safe is the data?
- Despite site failures
- Example: 320-year MTTF
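To connect the two notions, here is a simplified sketch of how data reliability can follow from site reliability. It assumes site failures are independent and that the data is lost only if every site holding a copy fails; the slides do not state this model, and the function name `data_reliability` is made up for the example.

```python
import math


def data_reliability(site_reliabilities):
    """Probability that at least one copy survives, assuming independent site failures."""
    p_all_copies_lost = math.prod(1 - p for p in site_reliabilities)
    return 1 - p_all_copies_lost


one_copy = data_reliability([0.9])                 # a single copy is as safe as its site
three_copies = data_reliability([0.9, 0.9, 0.9])   # roughly 0.999
mixed_sites = data_reliability([0.5, 0.8])         # sites of unequal reliability
```

Under these assumptions each extra copy multiplies the chance of loss by the failure probability of the site that holds it, which is why the choice of trading partners matters when sites are not equally reliable.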
31Basic trading approach
- How does trading work?
- Assuming all sites follow the rules
- Example: advertising policy
[Figure: sites A and B. "Let's trade. How much space do you have?"]
32Advertising policy
[Figure: three exchanges between sites A and B]
- "I have 120 GB" (120 GB)
- Space fractional policy: "I have 60 GB" (60 GB)
- Data proportional policy: "I have 40 GB" (40 GB data, 40 GB)
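The advertising policies named on this slide can be sketched as simple functions. The slide shows only the figures, so this is one possible reading: the site has 120 GB free and 40 GB of its own data, the space fractional policy advertises a fixed fraction of free space, and the data proportional policy advertises a multiple of the site's own data. The function names, the fraction 0.5, and the factor 1.0 are assumptions chosen to reproduce the numbers in the figure.

```python
def advertise_all(free_gb):
    """Baseline: advertise all free space."""
    return free_gb


def space_fractional(free_gb, fraction=0.5):
    """Advertise a fixed fraction of free space (fraction is an assumed value)."""
    return free_gb * fraction


def data_proportional(local_data_gb, free_gb, factor=1.0):
    """Advertise space proportional to the site's own data, capped by free space."""
    return min(free_gb, local_data_gb * factor)


free_gb = 120        # free space at the answering site, from the figure
local_data_gb = 40   # data owned by the answering site, from the figure

offers = {
    "all": advertise_all(free_gb),
    "space fractional": space_fractional(free_gb),
    "data proportional": data_proportional(local_data_gb, free_gb),
}
```

With these assumed settings the three offers come out to 120 GB, 60 GB, and 40 GB, matching the answers shown in the figure.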
33Result
34Extend: some sites > others
- May prefer certain sites
- More reliable
- Better reputation
- Part of same system
- Example: who to trade with?
[Figure: site A and three candidate partners, each marked "?"]
35Who to trade with?
36Extend: freedom to negotiate
[Figure: site A and the question "How much do I pay for 100 GB of your space?", with three amounts: 120 GB, 80 GB, 95 GB]
37Bid trading
- Questions
- When do I call auctions?
- How much do I bid?
- Can I take advantage of the system by being clever?
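A bid-trading round might be sketched as below. This is an illustration under assumptions, not the protocol from the talk: it supposes the site calling the auction wants 100 GB of remote space, that each bidder names the amount of space it wants in return, and that the lowest price wins. The bidder names B, C, and D are hypothetical; the three amounts are taken from the preceding slide.

```python
def pick_winner(bids):
    """Return (bidder, price) for the cheapest bid; assumes lowest price wins."""
    if not bids:
        raise ValueError("no bids received")
    return min(bids.items(), key=lambda item: item[1])


# Price, in GB of the caller's space, asked in exchange for 100 GB at the bidder.
bids = {"B": 120, "C": 80, "D": 95}
winner, price = pick_winner(bids)
```

The open questions on this slide (when to call an auction, how much to bid) correspond to the policies that would sit around this selection step.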
38Extend: some sites are malicious
- Secure services
- Publish: make copies to survive failures
- Search: find documents
- Retrieve: get a copy of a document
- Challenges
- Attacker may delete copy
- Attacker may provide fake search results
- Attacker may provide altered document
- Attacker may disrupt message routing
- Joint work with Mayank Bawa and Neil Daswani
39Current and future work
- Access
- Support searching over collections
- Distribute indexes via trading
- Prototype implementation
- Basic SAV architecture implemented
- Trading protocol/policies must be added
- Develop security techniques further
40Current and future work
- Other topics of interest
- Designing peer-to-peer primitives
- Building other p2p services
- Other ways of acquiring data
- How to archive active systems
- Semantic archiving
- Managing format obsolescence
- Finding data once it is archived
41Other parts of SAV project
- SAV data model
- Write-once objects
- Signature-based naming
- How to get objects into SAV
- InfoMonitor filesystem
- Other inputs (Web, DBMS, etc.)
- Modeling archival repositories
- Arturo Crespo
- Choose best components and design
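The write-once objects and signature-based naming mentioned in the SAV data model can be illustrated with a small content-addressed store. This sketch assumes the "signature" is a hash of the object's bytes; SHA-256 and the names `object_name` and `WriteOnceStore` are choices made for the example, not details taken from SAV.

```python
import hashlib


def object_name(content: bytes) -> str:
    """Derive an object's name from its content (assumed scheme: SHA-256 hex digest)."""
    return hashlib.sha256(content).hexdigest()


class WriteOnceStore:
    """Hypothetical store in which an object, once written, is never modified."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        name = object_name(content)
        # The same bytes always map to the same name, so nothing is overwritten.
        self._objects.setdefault(name, content)
        return name

    def get(self, name: str) -> bytes:
        content = self._objects[name]
        # The name doubles as an integrity check on the stored copy.
        if object_name(content) != name:
            raise ValueError("stored content does not match its name")
        return content


store = WriteOnceStore()
name = store.put(b"annual report 2001")
```

Because the name depends only on the content, any site holding a replica can check that its copy is intact without consulting the original owner.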
42Related work
- Peer-to-peer replication
- SAV, Intermemory, LOCKSS, OceanStore
- Fault tolerant systems
- RAID, mirrored disks, replicated databases
- Caching systems (Andrew, Coda)
- Deep storage (Tivoli)
- Barter/auction based systems
- ContractNet
- Distributed resource allocation
- File Allocation Problem
43Conclusion
- Important, exciting area
- Preservation critical
- Difficult to accomplish
- Many decisions are ad hoc today
- An effective framework is needed
- Scientific evaluation of decisions
- Trading networks replicate data
- Model for trading networks
- Trading algorithm
- Simulation results
44For more information
- cooperb_at_stanford.edu
- http://www-diglib.stanford.edu/