Title: Overview of LOCKSS
1. Overview of LOCKSS
2. Session Learning Objectives
- Provide an overview of the LOCKSS architecture.
- Describe the LOCKSS polling process.
- Describe how LOCKSS private networks differ.
- Provide a vocabulary of technical terms used frequently with LOCKSS networks.
3. Architectural Components
- Provider Sites (digital collections)
- LOCKSS nodes (aka peers)
- Plugins / Plugin Repository
- Cache Manager
- Title Database / Conspectus Database
4. Provider Sites
- Prepare a digital collection so that it is web accessible to the preservation nodes
- Expose a manifest web page for each collection, according to LOCKSS specifications
  - Grants permission for LOCKSS to crawl
  - Gives starting point for crawl
- Provide information sufficient to create a LOCKSS plugin for the collection (or else create the plugin themselves and deposit that plugin with the LOCKSS network)
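As a minimal sketch of what a manifest page provides, the following Python checks a page for a permission statement and collects its links as crawl starting points. The permission wording, the `inspect_manifest` helper, and the example.org URLs are assumptions made for illustration; this is not the LOCKSS crawler's own code.

```python
from html.parser import HTMLParser

# Assumed permission wording; confirm against the LOCKSS specifications.
PERMISSION_TEXT = (
    "LOCKSS system has permission to collect, preserve, and serve "
    "this Archival Unit"
)


class ManifestParser(HTMLParser):
    """Collects visible text and link targets from a manifest page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def inspect_manifest(html):
    """Return (permission_granted, start_urls) for a manifest page (hypothetical helper)."""
    parser = ManifestParser()
    parser.feed(html)
    parser.close()
    # Normalize whitespace so line wrapping in the HTML does not matter.
    text = " ".join(" ".join(parser.text_parts).split())
    return PERMISSION_TEXT.lower() in text.lower(), parser.links


sample = """<html><body>
<h1>Digital Collection 1 manifest</h1>
<p>LOCKSS system has permission to collect, preserve, and serve this
Archival Unit.</p>
<a href="http://example.org/dc1/au1/">AU 1</a>
</body></html>"""

granted, start_urls = inspect_manifest(sample)
print(granted, start_urls)
```

A page without the statement yields `False`, which a crawler would treat as "do not collect".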
5. LOCKSS Peer Nodes
- Data caches for harvested content
- Caches organized into archival units (AUs)
- Nodes can select which AUs to crawl and preserve
- There must be more than 6 copies of an AU in order for the polling process to work properly
6. Plugins / Plugin Repository
- Tell LOCKSS where, how and how often to crawl a provider site for AUs
- Plugins are Java based
- Distinct from core LOCKSS software
7. Cache Manager
- Distributed separately from LOCKSS
- Can remotely inspect and manage the caches on the
various peer nodes
8. Title / Conspectus Databases
- Title database on each node describes and manages which AUs to preserve on that node
- Conspectus Database designed for MetaArchive Project, provides more extensive metadata about the preserved digital collections, and feeds the Title database with entries
9. [Diagram: a Plugin Repository; two provider web sites, Digital Collection 1 (DC1) and Digital Collection 2 (DC2), each with a manifest page and archival units (labels: AU 1, AU 2, AU 3, Source Code, SQL Dump); and Private LOCKSS Network Nodes 1 to 9, labeled DC1 and/or DC2.]
10. The Polling Process
11. Polling Process resulting in landslide loss, AU repair
- Node 5 calls poll on AU 1 of Digital Collection 2 (DC2-AU1).
- Node 5 invites some recently encountered peers to vote. (Each node maintains a reference list of the recently encountered peers.) Those invited are the inner circle for this opinion poll.
- An affirmative PollChallenge response to the Invitation allows that inner circle node to participate in the poll.
- A Poll Effort Proof (PollProof) is cryptographically derived and sent in reply to affirmative voters' challenges.
- Invited nodes create a fresh SHA1 digest of the AU.
- Node 5 discovers new peers through the nomination process: Node 9 nominates 7 and 8.
- Nominated Nodes 7 and 8 belong to the outer circle and can be invited to subsequent voting rounds by Node 5.
- There is a landslide of valid, disagreeing votes against Node 5's SHA1 digest of DC2-AU1.
- Since agreeing votes are below threshold, Node 5 picks a random disagreeing voter from the inner circle and sends it an encrypted RepairRequest message; the repair is made.
- Once repair is completed, Node 5 immediately calls a new poll, which effectively verifies, or invalidates and corrects, the repair.
[Diagram: Nodes 1, 2, 4, 5, 7, 8 and 9, each holding DC2-AU1 and a SHA1 digest; message labels: Invitation, PollChallenge, PollProof, "Valid vote disagrees", "Valid vote agrees", "Encrypted RepairRequest message", "Repair made".]
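The digest-and-vote step can be sketched in Python as follows. This is an illustration under stated assumptions, not the LCAP implementation: the nonce (used here to make each digest "fresh"), the dictionary layout of an AU, the `au_digest` and `tally` helpers, and the example.org URLs are all hypothetical.

```python
import hashlib


def au_digest(au_files, nonce):
    """SHA-1 digest over a nonce and the AU's files in a fixed (sorted) order."""
    h = hashlib.sha1()
    h.update(nonce)  # assumed: a per-poll nonce so old digests cannot be replayed
    for url in sorted(au_files):
        h.update(url.encode("utf-8"))
        h.update(au_files[url])
    return h.hexdigest()


def tally(poller_digest, votes):
    """Split valid votes into agreeing and disagreeing voter lists."""
    agreeing = [peer for peer, d in votes.items() if d == poller_digest]
    disagreeing = [peer for peer, d in votes.items() if d != poller_digest]
    return agreeing, disagreeing


nonce = b"poll-0001"
good = {"http://example.org/dc2/au1/a.html": b"intact content"}
damaged = {"http://example.org/dc2/au1/a.html": b"int@ct content"}

# Node 5 (the poller) holds a damaged copy; the illustrative voters hold intact ones.
poller = au_digest(damaged, nonce)
votes = {node: au_digest(good, nonce) for node in (1, 2, 4, 9)}
agreeing, disagreeing = tally(poller, votes)
print(len(agreeing), len(disagreeing))
```

With every voter disagreeing, the poller would request a repair from one of the disagreeing voters and then poll again.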
12. Polling Refresh Timer
- A peer sets a refresh timer for a given AU to determine the interval between successive polls
- System parameter R is the mean for the possible random values generated for the refresh timer
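A refresh timer whose random values have mean R could be drawn as in the sketch below. The uniform distribution, the `spread` parameter, the `next_poll_interval` name, and the value 90 are assumptions chosen for illustration; the slides only state that R is the mean.

```python
import random


def next_poll_interval(mean_r, spread=0.5, rng=random):
    """Draw a random interval with expected value mean_r.

    Uniform on [mean_r * (1 - spread), mean_r * (1 + spread)]; both the
    distribution and the spread are assumptions, not LOCKSS settings.
    """
    return rng.uniform(mean_r * (1 - spread), mean_r * (1 + spread))


rng = random.Random(0)  # seeded only to make the example repeatable
samples = [next_poll_interval(90, rng=rng) for _ in range(10000)]
print(min(samples) >= 45, max(samples) <= 135)
```

Randomizing the interval keeps the timing of a peer's next poll on a given AU from being exactly predictable.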
13. System Parameter Quorum
- Q = number of valid inner circle votes required to conclude a poll successfully
- Q = 6 is the thoroughly tested value in use
- If votes < Q, poller invites additional peers, or else aborts the opinion poll
14. Polling Outcome - Landslide Win
- The poller considers its current copy to have integrity
- This is the only scenario in which an opinion poll concludes successfully
- The poller updates its reference list and then waits until the next polling period (determined by the refresh timer)
15. Reference List Update
- Happens only after a successful poll
- Poller removes the inner circle peers who had valid votes in the last opinion poll
- Culls peers it has not been able to contact for some time
- Adds outer circle peers whose votes were valid and eventually agreeing
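The three update rules above can be sketched as a single function. The function name, argument names, and node numbers are hypothetical, and real peers would be network identities rather than integers.

```python
def update_reference_list(reference_list, inner_valid_voters,
                          unreachable, outer_agreeing):
    """Return the reference list after a successful poll (illustrative only)."""
    # Remove inner circle peers with valid votes, and cull unreachable peers.
    updated = [
        peer for peer in reference_list
        if peer not in inner_valid_voters and peer not in unreachable
    ]
    # Add outer circle peers whose votes were valid and eventually agreeing.
    for peer in outer_agreeing:
        if peer not in updated:
            updated.append(peer)
    return updated


new_list = update_reference_list(
    reference_list=[1, 2, 3, 4, 6, 9],
    inner_valid_voters={1, 2, 4, 9},
    unreachable={3},
    outer_agreeing=[7, 8],
)
print(new_list)  # [6, 7, 8]
```

Rotating voters out after each poll means the next inner circle is drawn from a changed set of peers.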
16. Polling Outcome - Inconclusive
- D = maximum allowed minority votes
- If Agreeing Votes > D, and
- Agreeing Votes < Total valid votes - D,
- Then the poll is inconclusive and raises an alarm
- Human intervention needed to determine if nodes have been compromised
- Peers voting in agreement with a known bad copy are blacklisted if that peer node can't be identified or it won't cooperate
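Taken together, the quorum Q and the minority threshold D give a simple decision rule, sketched below. The function name, the outcome labels, and the default D = 1 are assumptions for illustration; only Q = 6 comes from the slides.

```python
def classify_poll(agreeing, total_valid, q=6, d=1):
    """Classify a poll from its count of valid votes (illustrative only).

    q: quorum of valid inner circle votes (6 per the slides).
    d: maximum allowed minority votes (1 is a hypothetical value).
    """
    if total_valid < q:
        # Poller would invite additional peers or abort the poll.
        return "no quorum"
    if agreeing > d and agreeing < total_valid - d:
        # Neither side is a small enough minority: raise an alarm.
        return "inconclusive"
    if agreeing <= d:
        # Poller's copy is overwhelmingly outvoted: repair, then re-poll.
        return "landslide loss"
    return "landslide win"


for agreeing, total in [(6, 6), (5, 6), (3, 6), (0, 6), (2, 4)]:
    print(agreeing, total, classify_poll(agreeing, total))
```

Only the "landslide win" branch corresponds to a poll that concludes successfully.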
17. Further Details on Polling Process
- Petros Maniatis, Mema Roussopoulos, TJ Giuli, David S. H. Rosenthal, Mary Baker, and Yanto Muliadi, "LOCKSS: A Peer-to-Peer Digital Preservation System", ACM Transactions on Computer Systems (TOCS). http://www.eecs.harvard.edu/mema/publications/TOCS2005.pdf
- See also LOCKSS related publications at http://www.lockss.org/lockss/Publications
18. The LOCKSS Private Network Difference
- More flexible (not appliance based)
- Can run on any operating system that supports Java
- LOCKSS Team maintains rpm packages for Linux installations
- Peer Node administrators have greater discretion configuring access, customizing functionality, e.g. altering system parameters
19. The LOCKSS Private Network Difference (cont.)
- Can extend LOCKSS core functionality with supplemental tools and methods to fit new use cases
- E.g. the MetaArchive Conspectus database
20. Vocabulary
- (Please refer to the workshop binder for
terminology and definitions)
21. Overview of LCAP version 3