Title: GDFR Pilot Discussion
1 GDFR Pilot Discussion
- The National Archives
- Washington DC
- July 10, 2008
2 Agenda
- Introductions (All)
- Purpose of meeting (Dale)
- Roles (Dale, Richard)
- Background/history (Stephen)
- GDFR Governance Workshop (Richard, Robert)
- Architecture (Stephen)
- Current state (Andrea)
- Relationship to PRONOM (Andrea)
- Issues and observations (Dale)
- Use cases (Andrea)
- Discussion of pilot (All)
- Review next steps from GDFR Governance Workshop Report (Richard, Robert)
- Outreach to other interested parties (All)
- Next steps (All)
3 Introductions
4 Purpose of the meeting
5 Roles
- Harvard: Dale Flecker
- NARA: Richard Steinbacher
6 Background/History
7 Background/History
- Format is the key piece of representation
information that permits preservation activities
to be focused on interpretable/renderable
content, not just opaque bit strings
[Figure: the opening bytes of a JPEG file shown as an opaque hex string (ffd8ffe000104a46494600010201...), contrasted with the same bytes interpreted as format structure: SOI, APP0 (JFIF 1.2), APP13 (IPTC), APP2 (ICC), DQT, SOF0 (183x512), DRI, DHT, SOS, ECS0, ...]
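As a rough illustration of the difference between an opaque bit string and interpretable content, the Python sketch below walks the marker segments at the start of a JPEG file. Its input is the first 24 bytes of the hex dump on this slide. The marker table is deliberately partial, and the walker assumes well-formed, length-prefixed segments before SOS; it is a sketch, not how GDFR or any particular identification tool parses JPEG.

```python
import struct

# Partial table: only the JPEG markers named on the slide.
MARKER_NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xE2: "APP2", 0xED: "APP13",
    0xDB: "DQT", 0xC0: "SOF0", 0xDD: "DRI", 0xC4: "DHT", 0xDA: "SOS",
}

def walk_jpeg_markers(data: bytes) -> list[tuple[str, int]]:
    """List (marker name, declared segment length) pairs up to SOS.

    Assumes well-formed, length-prefixed segments; stops early if the
    data is truncated before the next marker header.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    segments = [("SOI", 0)]  # SOI is a bare marker with no length field
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        code = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])  # big-endian
        segments.append((MARKER_NAMES.get(code, f"0x{code:02X}"), length))
        if code == 0xDA:  # entropy-coded image data follows SOS
            break
        pos += 2 + length  # length covers itself, not the 2 marker bytes
    return segments

def jfif_version(data: bytes) -> str:
    """Read the JFIF version from an APP0 segment directly after SOI."""
    if data[2:4] != b"\xff\xe0" or data[6:11] != b"JFIF\x00":
        raise ValueError("no JFIF APP0 segment after SOI")
    return f"{data[11]}.{data[12]:02d}"  # bytes 01 02 -> "1.02"

# First 24 bytes of the hex dump on the slide.
HEADER = bytes.fromhex("ffd8ffe000104a46494600010201008300830000ffed0fb0")
```

On those bytes the walker yields SOI, APP0 (declared length 16) and APP13 (declared length 4016, cut off here), and the JFIF version bytes 01 02 read as 1.02 (the slide's parse shows this as JFIF 1.2).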
8 Background/History
- Traditional methods of managing format
information, e.g. the IANA MIME registry, are
insufficiently descriptive and granular for
effective preservation planning and intervention
- The application/msword format is essentially
defined as anything produced by the Word
application
- TIFF 6.0, TIFF/IT, TIFF/EP, GeoTIFF → image/tiff
9 Background/History
- Two DLF-sponsored invitational workshops
- Univ. Pennsylvania, January 2003
- Washington, March 2003
- Two independent demonstration projects
- FRED, John Ockerbloom, Univ. Pennsylvania
- FOCUS, Joseph JaJa, Univ. Maryland
10 Background/History
- Evolving consensus on scope
- A forum for documenting normative definitions of
format syntax and semantics
- A common facility to pool and share scarce
technical expertise on a global basis
- A channel for the distribution of that expertise
to the international community of preservation
practitioners
- A foundation for additional value-added services
requiring detailed knowledge of digital formats
11 Background/History
- Peer-to-peer network of independent, but
cooperating registries
12 Background/History
- Harvard University Library (HUL) funded for 2
years by the Andrew W. Mellon Foundation
- Technical deliverables only; no funded
governance/policy activity
- Staffing and technical work subcontracted to OCLC
(July 2006)
13 NARA Governance Workshop
- Richard Steinbacher
- Robert Chadduck
14 Architecture
15 Architecture
- A generic distributed registry framework,
specialized for the GDFR application
- Based on well-known products and protocols
- Human and machine interfaces
- Full information content expressible in XML;
content can be re-instantiated from that XML expression
- Platform independence
- Globally fault tolerant
- Open source
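One way to read the "full information content expressible in XML" requirement is as a lossless round trip: serialize a record, parse it back, and get an identical record. The sketch below shows that property using Python's standard xml.etree.ElementTree. The FormatRecord class, its fields and the element names are hypothetical; they are not the GDFR schema or the PRONOM 4 data model.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class FormatRecord:
    """Hypothetical, heavily simplified format record."""
    identifier: str
    name: str
    version: str
    signatures: list[str] = field(default_factory=list)  # hex byte patterns

def to_xml(rec: FormatRecord) -> str:
    """Serialize every field, so nothing is lost on the way out."""
    root = ET.Element("format", identifier=rec.identifier)
    ET.SubElement(root, "name").text = rec.name
    ET.SubElement(root, "version").text = rec.version
    sigs = ET.SubElement(root, "signatures")
    for hex_sig in rec.signatures:
        ET.SubElement(sigs, "signature").text = hex_sig
    return ET.tostring(root, encoding="unicode")

def from_xml(text: str) -> FormatRecord:
    """Re-instantiate a record from its XML expression."""
    root = ET.fromstring(text)
    return FormatRecord(
        identifier=root.get("identifier"),
        name=root.findtext("name"),
        version=root.findtext("version"),
        signatures=[s.text for s in root.iterfind("signatures/signature")],
    )

rec = FormatRecord("example:jfif-1.02", "JPEG File Interchange Format",
                   "1.02", ["ffd8ffe0"])
```

The check that matters is from_xml(to_xml(rec)) == rec: if any field were dropped or flattened during serialization, the comparison would fail.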
16 Architecture
- Data model is an extension of PRONOM 4
17 Architecture
- Based on the OCLC IWSA/RFA framework
18 Architecture
- Java, Apache/Tomcat, Berkeley DB XML
- GNU LGPL license
- Including technology newly developed for the
project and pre-existing OCLC technology
19 Current state
20 Current state: schedule
- July 31, 2008
- Contract with OCLC ends
- GDFR source node at Harvard goes public in beta
mode
- August 2008 up to August 2010
- Harvard maintains GDFR software, website and
source node
21 Current state: GDFR Home website
- It moved!
- Old GDFR Home: http://www.formatregistry.org
- New GDFR Home: http://www.gdfr.info
- All existing GDFR docs migrated from the old GDFR
Home website
- Over the next month
- Updated documentation!
- Demo source node?
22 Current state: architecture
- Currently
- One GDFR source node
- Where all data additions and edits are performed
- Many GDFR mirror nodes
- Replicated data
- Future?
- Multiple GDFR source nodes?
- Multiple interoperable format registry source
nodes?
- Discoverable from GDFR Home website
- Each node has 2 interfaces
- For humans: user interface
- For machines: web service interface
23 Current state: GDFR source node
- Housed by Harvard for now
- http://www.formatregistry.org/registry
- Populated with test data: 2,000 formats from the
Magic database
- Need authorized account to add/edit data
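Assuming the "Magic database" here is a magic-number signature collection like the one behind the Unix file command, each entry essentially pairs a byte pattern at a known offset with a format name. A minimal Python sketch of that idea, using only a handful of well-known offset-0 signatures; real magic databases are far richer (arbitrary offsets, masks, nested tests), and the function name is hypothetical.

```python
# A few well-known leading-byte signatures, all at offset 0.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF 87a"),
    (b"GIF89a", "GIF 89a"),
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
]

def identify(data: bytes) -> list[str]:
    """Return every format whose signature matches the start of data."""
    return [name for magic, name in SIGNATURES if data.startswith(magic)]
```

Returning a list rather than a single name leaves room for overlapping signatures, which is exactly where finer-grained registry information (versions, profiles) becomes necessary.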
24 Current state: GDFR mirror nodes
- Test mirror nodes at OCLC and Harvard
- Anyone can run a mirror node
- Synchronize data with the source node
- Can brand your mirror node
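The source/mirror split described in these slides can be pictured as one writable node that keeps an ordered change log, plus read-only mirrors that pull whatever they have not yet seen. The Python sketch below models that in memory only. The sequence-numbered log and the class and method names are hypothetical; it says nothing about how GDFR's actual synchronization over its machine interface is implemented.

```python
from dataclasses import dataclass, field

@dataclass
class SourceNode:
    """Single writable node: every add/edit is appended to a change log."""
    records: dict[str, dict] = field(default_factory=dict)
    log: list[tuple[int, str, dict]] = field(default_factory=list)

    def put(self, record_id: str, record: dict) -> None:
        self.records[record_id] = record
        self.log.append((len(self.log) + 1, record_id, record))  # seq from 1

    def changes_since(self, seq: int) -> list[tuple[int, str, dict]]:
        return [entry for entry in self.log if entry[0] > seq]

@dataclass
class MirrorNode:
    """Read-only replica that tracks the last change it has applied."""
    records: dict[str, dict] = field(default_factory=dict)
    last_seq: int = 0

    def put(self, record_id: str, record: dict) -> None:
        raise PermissionError("edits are only accepted at the source node")

    def sync(self, source: SourceNode) -> None:
        # Replay unseen changes in order; later edits overwrite earlier ones.
        for seq, record_id, record in source.changes_since(self.last_seq):
            self.records[record_id] = record
            self.last_seq = seq
```

Because a mirror only remembers its last sequence number, repeated syncs are cheap and a mirror that was offline simply catches up on its next pull.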
25 Current state: mirror node set-up
- Dependencies
- Apache 2 (mod_rewrite, mod_jk, mod_perl2)
- Tomcat 5.5.x
- Berkeley DB XML 2.3.10
- Perl 5.8.x
- Java 1.5
- Installation and configuration: half a day
26 User interface
- Mirror node
- Search, browse, lookup/retrieve, export, manage
node
- Source node
- Same as mirror node
- Plus add, edit
- Sneak preview
27 Current state: machine interface
- Web services using SRU
- Can do everything supported by the human user
interface
- Except browsing
- Plus mirror-to-source node synchronization
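SRU requests are plain HTTP GETs whose parameters carry a CQL query. The Python sketch below builds a searchRetrieve URL; the parameter names follow the SRU 1.1 specification, while the endpoint URL and the CQL index name gdfr.name are hypothetical placeholders, since the actual GDFR service paths and indexes are not given here.

```python
from urllib.parse import urlencode

def sru_search_url(base_url: str, cql_query: str, max_records: int = 10,
                   start_record: int = 1, version: str = "1.1") -> str:
    """Build an SRU searchRetrieve request URL (parameters per SRU 1.1)."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,            # CQL query string, URL-encoded below
        "startRecord": start_record,   # 1-based position of the first record
        "maximumRecords": max_records,
    }
    return f"{base_url}?{urlencode(params)}"

# Hypothetical endpoint and index name, for illustration only.
url = sru_search_url("http://mirror.example.org/gdfr/sru", 'gdfr.name = "TIFF"')
```

Paging through a large result set is then a matter of repeating the request with startRecord advanced by maximumRecords.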
28 Relationship to PRONOM
29 Relationship to PRONOM: what's the problem?
- Two different format registries
- Overlapping but diverging data models
- No common format model
- No mechanism to exchange data
- PRONOM is in production, GDFR is not yet
- PRONOM has been publicly available for over 4
years and is used by some preservation
repositories
- Interoperates with DROID
- Basis for PLANETS projects
- How many format registries does the digital
preservation community need?
- Depends on how different they are
30 Relationship to PRONOM: core differences
- Who governs the registry and makes policy, scope
and enhancement decisions?
- PRONOM: TNA
- GDFR: community-based
- Who adds and edits format information?
- PRONOM: TNA (does take addition requests)
- GDFR: community-based
- Where is the format information physically
located?
- PRONOM: at TNA
- GDFR: replicated in different geographic
locations
31 Relationship to PRONOM: what's the solution?
- Recognize there is a problem: DONE
- Mutual willingness to resolve
- TNA desire to participate in a GDFR pilot
- Common web service API across the registries?
- PRONOM could become a GDFR node
- PRONOM and GDFR could each support a new web
service API
- Cross-walk PRONOM PUIDs and GDFR GFIDs?
- Use common format identification tools (DROID,
JHOVE, etc.) with either registry
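If PUIDs and GFIDs were cross-walked, the interesting cases would be the identifiers that do not map one-to-one. A small Python sketch that indexes PUID/GFID pairs in both directions and flags ambiguous identifiers; the function names and all identifier values are hypothetical placeholders, and the GFID syntax in particular is assumed.

```python
from collections import defaultdict

def build_crosswalk(pairs):
    """Index (PUID, GFID) pairs in both directions."""
    puid_to_gfid = defaultdict(set)
    gfid_to_puid = defaultdict(set)
    for puid, gfid in pairs:
        puid_to_gfid[puid].add(gfid)
        gfid_to_puid[gfid].add(puid)
    return puid_to_gfid, gfid_to_puid

def ambiguous(index):
    """Identifiers that map to more than one identifier in the other registry."""
    return {key: sorted(vals) for key, vals in index.items() if len(vals) > 1}

# Hypothetical pairs: two PRONOM entries correspond to one GDFR entry.
pairs = [("fmt/example-1", "gfid:example-1"),
         ("fmt/example-2", "gfid:example-1"),
         ("fmt/example-3", "gfid:example-3")]
puid_to_gfid, gfid_to_puid = build_crosswalk(pairs)
```

With such a table, a PUID reported by an identification tool like DROID could be looked up in puid_to_gfid to reach the corresponding GDFR record, while anything reported by ambiguous() would need a curatorial decision rather than an automatic mapping.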
32 Issues and Observations
33 Use cases
34 Use cases: 3 sets (see handout)
- Higher-level use cases submitted by many
institutions (early 2003)
- Lower-level use case model created for the
software design (2006-7)
- Use cases arising from informal talks and meetings
35 Key use cases discussed but not supported
- Determine duplicates
- Notifications/warnings
- Determine migration/emulation pathways
- Determine at-risk formats (machine-actionable
risk assessments)
- Support registry discovery of GDFR nodes
- Authentication of nodes and users (outside the
UI)
- Storage of local profiles separate from central
formats
- Synchronizations based on vetted or non-vetted
data
- Determine quality of format information
- Multiple source nodes
36 Use cases: common issues
- How evaluative should GDFR be?
- Neutral vs judgmental
- Are services in the scope of GDFR?
- Should GDFR provide services directly
(notifications, validation, etc.) or should GDFR
be a reference that can be used by external
services?
37 Discussion of pilot
38 Discussion of pilot
39 Discussion of pilot
40 Discussion of pilot
41 Discussion of pilot
42 Review next steps from the GDFR Governance Workshop Report
- Richard Steinbacher
- Robert Chadduck
43 Outreach to other interested parties
44 Next steps?