GDFR Pilot Discussion

1
GDFR Pilot Discussion
  • The National Archives
  • Washington DC
  • July 10, 2008

2
Agenda
  • Introductions (All)
  • Purpose of meeting (Dale)
  • Roles (Dale, Richard)
  • Background/history (Stephen)
  • GDFR Governance Workshop (Richard, Robert)
  • Architecture (Stephen)
  • Current state (Andrea)
  • Relationship to PRONOM (Andrea)
  • Issues and observations (Dale)
  • Use cases (Andrea)
  • Discussion of pilot (All)
  • Review next steps from GDFR Governance Workshop
    Report (Richard, Robert)
  • Outreach to other interested parties (All)
  • Next steps (All)

3
Introductions
  • All

4
Purpose of the meeting
  • Dale Flecker

5
Roles
  • Harvard: Dale Flecker
  • NARA: Richard Steinbacher

6
Background/History
  • Stephen Abrams

7
Background/History
  • Format is the key piece of representation
    information that permits preservation activities
    to be focused on interpretable/renderable
    content, not just opaque bit strings

[Figure: hex dump of a JPEG file (ffd8ffe000104a46494600010201008300830000ffed0fb0 ...) shown alongside its interpreted structure: SOI, APP0 JFIF 1.2, APP13 IPTC, APP2 ICC, DQT, SOF0 183x512, DRI, DHT, SOS, ECS0, ...]

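
The contrast on this slide, opaque bytes versus an interpreted structure, can be sketched in a few lines of Python. This is a minimal illustration rather than a full JPEG parser: the function name `list_segments` and the marker table are assumptions made for the example, and the sample bytes are only the first 24 bytes shown on the slide.

```python
import struct

# Names for a few JPEG markers (the byte that follows 0xFF).
MARKER_NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xE2: "APP2", 0xED: "APP13",
    0xDB: "DQT", 0xC0: "SOF0", 0xDD: "DRI", 0xC4: "DHT",
    0xDA: "SOS", 0xD9: "EOI",
}

def list_segments(data: bytes) -> list[str]:
    """Return the marker names found in order, stopping at SOS.

    After SOS the stream contains entropy-coded data, which this
    sketch does not attempt to walk.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream: missing SOI marker")
    names = ["SOI"]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        names.append(MARKER_NAMES.get(marker, f"0x{marker:02X}"))
        if marker == 0xDA:
            break
        # The segment length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length
    return names

# First 24 bytes of the hex dump on the slide (truncated sample).
sample = bytes.fromhex("ffd8ffe000104a46494600010201008300830000ffed0fb0")
segments = list_segments(sample)
```

Without knowledge of the format, `sample` is just 24 bytes; with it, the same bytes read as a start-of-image marker followed by APP0 and APP13 segments.
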
8
Background/History
  • Traditional methods of managing format
    information, e.g. the IANA MIME registry, are
    insufficiently descriptive and granular for
    effective preservation planning and intervention
  • The application/msword format is essentially
    defined as anything produced by the Word
    application
  • TIFF 6.0, TIFF/IT, TIFF/EP, GeoTIFF, ... → image/tiff

9
Background/History
  • Two DLF-sponsored invitational workshops
    • Univ. Pennsylvania, January 2003
    • Washington, March 2003
  • Two independent demonstration projects
    • FRED, John Ockerbloom, Univ. Pennsylvania
    • FOCUS, Joseph JaJa, Univ. Maryland

10
Background/History
  • Evolving consensus on scope
    • A forum for documenting normative definitions of
      format syntax and semantics
    • A common facility to pool and share scarce
      technical expertise on a global basis
    • A channel for the distribution of that expertise
      to the international community of preservation
      practitioners
    • A foundation for additional value-added services
      requiring detailed knowledge of digital formats

11
Background/History
  • Peer-to-peer network of independent, but
    cooperating registries

12
Background/History
  • Harvard University Library (HUL) funded for 2
    years by the Andrew W. Mellon Foundation
  • Technical deliverables only; no funded
    governance/policy activity
  • Staffing and technical work subcontracted to OCLC
    (July 2006)

13
NARA Governance Workshop
  • Richard Steinbacher
  • Robert Chadduck

14
Architecture
  • Stephen Abrams

15
Architecture
  • A generic distributed registry framework,
    specialized for the GDFR application
  • Based on well-known products and protocols
  • Human and machine interfaces
  • Full information content expressible in XML; a
    registry can be re-instantiated from that expression
  • Platform independence
  • Globally fault tolerant
  • Open source
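
The XML round-trip property listed above can be illustrated with a small sketch. The `FormatRecord` fields and the element names used here are hypothetical and are not taken from the GDFR data model; the point is only that serializing a record and parsing it back yields an equal record.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class FormatRecord:
    # Hypothetical fields, chosen only for illustration.
    identifier: str
    name: str
    version: str

def to_xml(record: FormatRecord) -> str:
    """Serialize a record to a small XML document."""
    root = ET.Element("format", attrib={"id": record.identifier})
    ET.SubElement(root, "name").text = record.name
    ET.SubElement(root, "version").text = record.version
    return ET.tostring(root, encoding="unicode")

def from_xml(text: str) -> FormatRecord:
    """Rebuild a record from the XML produced by to_xml."""
    root = ET.fromstring(text)
    return FormatRecord(
        identifier=root.attrib["id"],
        name=root.findtext("name", default=""),
        version=root.findtext("version", default=""),
    )

original = FormatRecord("example:0001", "TIFF", "6.0")
restored = from_xml(to_xml(original))
```

If every record can make this round trip, a node's full content can in principle be exported as XML and reloaded elsewhere.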

16
Architecture
  • Data model is an extension of PRONOM 4

17
Architecture
  • Based on the OCLC IWSA/RFA framework

18
Architecture
  • Java, Apache/Tomcat, Berkeley DB XML
  • GNU LGPL license
  • Including technology newly-developed for the
    project and pre-existing OCLC technology

19
Current state
  • Andrea Goethals

20
Current state: schedule
  • July 31, 2008
    • Contract with OCLC ends
    • GDFR source node at Harvard goes public in beta
      mode
  • August 2008 up to August 2010
    • Harvard maintains GDFR software, website and
      source node

21
Current state: GDFR Home website
  • It moved!
    • Old GDFR Home: http://www.formatregistry.org
    • New GDFR Home: http://www.gdfr.info
  • All existing GDFR docs migrated from the old GDFR
    Home website
  • Over the next month
    • Updated documentation!
    • Demo source node?

22
Current state: architecture
  • Currently
    • One GDFR source node
      • Where all data additions and edits are performed
    • Many GDFR mirror nodes
      • Replicated data
  • Future?
    • Multiple GDFR source nodes?
    • Multiple interoperable format registry source
      nodes?
  • Discoverable from GDFR Home website
  • Each node has 2 interfaces
    • For humans: user interface
    • For machines: web service interface
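
The single-source, many-mirror arrangement can be sketched as follows. This is a toy model under stated assumptions, not the GDFR implementation: the class names, the in-memory change log, and the sequence-number scheme are all hypothetical. All writes go to the source node, and each mirror pulls whatever changes it has not yet seen.

```python
class SourceNode:
    """The only node that accepts additions and edits."""

    def __init__(self):
        self.records = {}
        self.log = []  # ordered list of (sequence, key, value)

    def put(self, key, value):
        self.records[key] = value
        self.log.append((len(self.log) + 1, key, value))

    def changes_since(self, sequence):
        # Sequence numbers start at 1, so slicing at `sequence`
        # returns every change with a higher sequence number.
        return self.log[sequence:]


class MirrorNode:
    """A read-only replica that synchronizes with a source node."""

    def __init__(self, source):
        self.source = source
        self.records = {}
        self.last_sequence = 0

    def synchronize(self):
        changes = self.source.changes_since(self.last_sequence)
        for sequence, key, value in changes:
            self.records[key] = value
            self.last_sequence = sequence
        return len(changes)


source = SourceNode()
mirror = MirrorNode(source)
source.put("format-a", {"name": "TIFF", "version": "6.0"})
source.put("format-b", {"name": "JPEG", "version": "1.02"})
first_sync = mirror.synchronize()
source.put("format-a", {"name": "TIFF", "version": "6.0 (edited)"})
second_sync = mirror.synchronize()
```

Supporting multiple source nodes, listed above as a future possibility, would need conflict handling that a single ordered log like this one does not provide.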

23
Current state: GDFR source node
  • Housed by Harvard for now
  • http://www.formatregistry.org/registry
  • Populated with test data: 2000 formats from
    Magic database
  • Need authorized account to add/edit data

24
Current state: GDFR mirror nodes
  • Test mirror nodes at OCLC and Harvard
  • Anyone can run a mirror node
  • Synchronize data with the source node
  • Can brand your mirror node

25
Current state: mirror node set-up
  • Dependencies
    • Apache 2 (mod_rewrite, mod_jk, mod_perl2)
    • Tomcat 5.5.x
    • Berkeley DBXML 2.3.10
    • Perl 5.8.x
    • Java 1.5
  • Installation and configuration: half day

26
User interface
  • Mirror node
    • Search, browse, lookup/retrieve, export, manage
      node
  • Source node
    • Same as mirror node
    • Plus add, edit
  • Sneak preview

27
Current state: machine interface
  • Web services using SRU
  • Can do everything supported by the human user
    interface
    • Except browsing
    • Plus mirror-to-source node synchronization
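
As a rough illustration of what an SRU request looks like, the sketch below builds a searchRetrieve URL. The parameter names (`operation`, `version`, `query`, `maximumRecords`) come from the SRU protocol itself; the endpoint `http://registry.example.org/sru` and the `dc.title` index are assumptions, since the slides do not give the GDFR endpoint path or its supported CQL indexes. No network request is made.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_search_url(base_url, cql_query, maximum_records=10):
    """Build an SRU searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(maximum_records),
    }
    return f"{base_url}?{urlencode(params)}"

# Hypothetical endpoint and index name, for illustration only.
url = build_search_url(
    "http://registry.example.org/sru", 'dc.title = "TIFF"', 5
)

# Decode the query string again to confirm what would be sent.
sent = parse_qs(urlsplit(url).query)
```

A client would issue an HTTP GET on `url` and parse the XML response; which record schemas a GDFR node returns is not stated in the slides.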

28
Relationship to PRONOM
  • Andrea Goethals

29
Relationship to PRONOM: what's the problem?
  • Two different format registries
    • Overlapping but diverging data models
    • No common format model
    • No mechanism to exchange data
  • PRONOM is in production, GDFR is not yet
    • PRONOM has been publicly available for over 4
      years and is used by some preservation
      repositories
    • Interoperates with DROID
    • Basis for PLANETS project
  • How many format registries does the digital
    preservation community need?
    • Depends on how different they are

30
Relationship to PRONOM: core differences
  • Who governs the registry and makes policy, scope
    and enhancement decisions?
    • PRONOM: TNA
    • GDFR: community-based
  • Who adds and edits format information?
    • PRONOM: TNA (does take addition requests)
    • GDFR: community-based
  • Where is the format information physically
    located?
    • PRONOM: at TNA
    • GDFR: replicated in different geographic
      locations

31
Relationship to PRONOM: what's the solution?
  • Recognize there is a problem: DONE
  • Mutual willingness to resolve
    • TNA desire to participate in a GDFR pilot
  • Common web service API across the registries?
    • PRONOM could become a GDFR node
    • PRONOM and GDFR could each support a new web
      service API
  • Cross-walk PRONOM PUIDs and GDFR GFIDs?
  • Use common format identification tools (DROID,
    JHOVE, etc.) with either registry
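
One way to picture the PUID/GFID cross-walk is a two-way lookup table. The sketch below is hypothetical throughout: the `Crosswalk` class is invented for illustration, the identifier values are placeholders (the GFID syntax is not given in the slides), and it assumes a one-to-one mapping, which may not hold given the two registries' differing data models.

```python
class Crosswalk:
    """Two-way lookup between PUIDs and GFIDs (assumes one-to-one)."""

    def __init__(self, pairs):
        self._to_gfid = {}
        self._to_puid = {}
        for puid, gfid in pairs:
            if puid in self._to_gfid or gfid in self._to_puid:
                raise ValueError(f"duplicate mapping for {puid!r} / {gfid!r}")
            self._to_gfid[puid] = gfid
            self._to_puid[gfid] = puid

    def gfid_for(self, puid):
        """Return the GFID mapped to a PUID, or None if unmapped."""
        return self._to_gfid.get(puid)

    def puid_for(self, gfid):
        """Return the PUID mapped to a GFID, or None if unmapped."""
        return self._to_puid.get(gfid)


# Placeholder identifiers; they do not refer to real registry entries.
crosswalk = Crosswalk([
    ("fmt/1", "gfid:example-0001"),
    ("fmt/2", "gfid:example-0002"),
])
```

With such a table, an identification tool that reports a PUID could be used against a registry keyed by GFIDs, and vice versa.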

32
Issues and Observations
  • Dale Flecker

33
Use cases
  • Andrea Goethals

34
Use cases: 3 sets (see handout)
  • Higher-level use cases submitted by many
    institutions (early 2003)
  • Lower-level use case model created for the
    software design (2006-7)
  • Use cases arising from informal talks and meetings

35
Key use cases discussed but not supported
  • Determine duplicates
  • Notifications/warnings
  • Determine migration/emulation pathways
  • Determine at-risk formats (machine-actionable
    risk assessments)
  • Support the registry discovery of GDFR nodes
  • Authentication of nodes and users (outside the
    UI)
  • Storage of local profiles separate from central
    formats
  • Synchronizations based on vetted or non-vetted
    data
  • Determine quality of format information
  • Multiple source nodes

36
Use cases: common issues
  • How evaluative should GDFR be?
    • Neutral vs judgmental
  • Are services in the scope of GDFR?
    • Should GDFR provide services directly
      (notifications, validation, etc.) or should GDFR
      be a reference that can be used by external
      services?

37
Discussion of pilot
  • All

38
Discussion of pilot
  • Purposes

39
Discussion of pilot
  • Pilot use cases

40
Discussion of pilot
  • Process

41
Discussion of pilot
  • Participants

42
Review next steps from the GDFR Governance
Workshop Report
  • Richard Steinbacher
  • Robert Chadduck

43
Outreach to other interested parties
  • All

44
Next steps?
  • All
