Title: FEDORA
1. FEDORA
- Selecting and Implementing an Open Source Digital Repository
- Corey Keith
- ckeith_at_loc.gov
2. Introduction
- History
- FEDORA Overview
- Object-Oriented Principles
- LC's Requirements
- LC's Architecture
- Review
3. Pop Quiz
- XML
- OAIS
- METS
- FEDORA
- DSPACE
4. FEDORA History
- Continuing research project
- Cornell, 1997
- Prototype application
- University of Virginia
- Fedora 1.0
- Open source release, 2002
- Fedora 1.2
- Tomorrow!
5. Options, options, options
- Very few tools directly compete with each other
- Many tools can be used to accomplish similar behavior
- Many tools fulfill parts of the functionality needed for a repository
- Roll your own solution
6. Why Fedora?
- Repository architects and developers are excited
- Object-oriented approach to digital objects
- Open source project
- Funded development (and support)
- Java-based
- Runs on multiple hardware platforms
7. Flexible
- Integrates well with existing systems
- CGI Scripts
- Web Services
- Leaves most decisions to implementers
8. Extensible
- Again, no product can do it all
- Imaging, Audio, Transformations, Courseware
- Easy to add new functionality to objects
- Embraces web services
- Open APIs
- Access
- Management
9. Digital Object
- What is the definition of a digital object?
- Documents, such as articles, preprints, working papers, technical reports, conference papers
- Books
- Theses
- Data sets
- Computer programs
- Visualizations, simulations, and other models
- Multimedia publications
- Administrative records
- Published books
- Bibliographic datasets
- Images
- Audio files
- Video files
- Reformatted digital library collections
- Learning objects
- Web pages
list taken from the dspace.org website
10. Repository Architecture
- Objects
- Behavior Definitions
- Behavior Mechanisms
- API
- Management
- Access
11. Object Oriented
- A software design method that models the characteristics of abstract or real objects using classes and objects
- Proven techniques for software development
- Requirements gathering: use cases
- Developers speak to librarians and other stakeholders
- Facilitates reuse of functionality
- Design patterns
- Not hacking together Perl scripts to make an institutional repository
12. Object Oriented
- Data
- Metadata
- MODS: descriptive
- METS: structural
- MIX, etc.: technical
- Bit streams
- Actual files: JPG, TIF, WAV, MP3, TEI, EAD
- Methods (behaviors)
- Do stuff with the data
13. Object-Oriented Concepts
- Classes
- Objects of the same type belong to a class
- Interfaces
- A contract defining behaviors a class of objects will implement
- Encapsulation
- Behaviors operate on the data in an object
- Reflection
- Discover what interfaces and behaviors an object implements
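To make the four concepts above concrete, here is a minimal Java sketch. The names `Describable`, `PhotographObject`, and `ReflectionDemo` are hypothetical, invented for illustration; they are not part of FEDORA. The reflection step uses the standard `Class.getInterfaces()` call.

```java
import java.util.ArrayList;
import java.util.List;

// Interface: a contract of behaviors (hypothetical name, for illustration only).
interface Describable {
    String getTitle();
}

// Class: every PhotographObject instance is of the same type.
class PhotographObject implements Describable {
    // Encapsulation: the data is private; callers reach it only through behaviors.
    private final String title;

    PhotographObject(String title) {
        this.title = title;
    }

    @Override
    public String getTitle() {
        return title;
    }
}

class ReflectionDemo {
    // Reflection: ask an object at run time which interfaces its class directly implements.
    static List<String> interfacesOf(Object obj) {
        List<String> names = new ArrayList<>();
        for (Class<?> iface : obj.getClass().getInterfaces()) {
            names.add(iface.getSimpleName());
        }
        return names;
    }

    public static void main(String[] args) {
        Object obj = new PhotographObject("Main Reading Room");
        System.out.println(interfacesOf(obj)); // prints [Describable]
        if (obj instanceof Describable) {
            System.out.println(((Describable) obj).getTitle());
        }
    }
}
```

The same discovery question ("which behaviors does this object offer?") is what a repository has to answer before it can invoke anything on an object.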
14. Image Objects
- Two-File Image Object
- Data
- High-resolution version (TIF)
- Low-resolution version (JPG)
- MrSID File Image Object
- Data
- MrSID file
15. Basic Image Interface
- getHighResolutionTIF
- getLowResolutionJPG
16. Basic Image Interface Implementations
- Two-File Image Object
- getHighResolutionTIF
- Returns the high-resolution TIF
- getLowResolutionJPG
- Returns the low-resolution JPG
- MrSID Image Object
- getHighResolutionTIF
- Processes the MrSID file to return a high-resolution TIF of the image
- getLowResolutionJPG
- Processes the MrSID file to return a low-resolution JPG of the image
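The two implementations above can be sketched as one Java interface with two classes. This is an illustration, not FEDORA code: images are plain byte arrays for brevity, and `renderFromMrSID` is a hypothetical placeholder standing in for a real MrSID decoder.

```java
import java.nio.charset.StandardCharsets;

// The contract from the "Basic Image Interface" slide.
interface BasicImage {
    byte[] getHighResolutionTIF();
    byte[] getLowResolutionJPG();
}

// Two-file image object: both derivatives are stored, so it simply returns them.
class TwoFileImageObject implements BasicImage {
    private final byte[] tif;
    private final byte[] jpg;

    TwoFileImageObject(byte[] tif, byte[] jpg) {
        this.tif = tif;
        this.jpg = jpg;
    }

    public byte[] getHighResolutionTIF() { return tif; }
    public byte[] getLowResolutionJPG() { return jpg; }
}

// MrSID image object: only one master file is stored, so each behavior derives its output.
class MrSIDImageObject implements BasicImage {
    private final byte[] sid;

    MrSIDImageObject(byte[] sid) {
        this.sid = sid;
    }

    public byte[] getHighResolutionTIF() { return renderFromMrSID("TIF", "high"); }
    public byte[] getLowResolutionJPG() { return renderFromMrSID("JPG", "low"); }

    // Hypothetical placeholder for a real MrSID decoder; it only labels its output.
    private byte[] renderFromMrSID(String format, String resolution) {
        String label = format + "/" + resolution + " from " + sid.length + "-byte MrSID";
        return label.getBytes(StandardCharsets.UTF_8);
    }
}
```

Callers program against `BasicImage` only, so a display application never needs to know whether it holds a two-file object or a MrSID object.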
17. Sheet Music Object
- Data
- MODS metadata
- Images of the pages (Image Objects)
- TEI-encoded text of the lyrics (TEI Objects)
- Behaviors
- getPageImage(pageNumber)
- Invokes getLowResolutionJPG on the page's Image Object to return the image!
- getMODS
- getLyrics
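A sheet music object mostly holds references to other objects and delegates to their behaviors. The Java sketch below assumes hypothetical minimal interfaces `ImageObject` and `TeiObject`, and represents an image by a string (for example a file name) to keep it short; it is not FEDORA's API.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for the Basic Image Interface (hypothetical, simplified).
interface ImageObject {
    String getLowResolutionJPG();
}

// Minimal stand-in for a TEI object holding the lyrics (hypothetical).
interface TeiObject {
    String getText();
}

// Sheet music object: its data is MODS metadata plus references to other objects.
class SheetMusicObject {
    private final String mods;
    private final List<ImageObject> pages; // pages.get(0) is page 1
    private final TeiObject lyrics;

    SheetMusicObject(String mods, List<ImageObject> pages, TeiObject lyrics) {
        this.mods = mods;
        this.pages = new ArrayList<>(pages);
        this.lyrics = lyrics;
    }

    // Delegates to the page's own behavior instead of storing a second copy of the image.
    String getPageImage(int pageNumber) {
        if (pageNumber < 1 || pageNumber > pages.size()) {
            throw new IllegalArgumentException("No page " + pageNumber);
        }
        return pages.get(pageNumber - 1).getLowResolutionJPG();
    }

    String getMODS() { return mods; }

    String getLyrics() { return lyrics.getText(); }
}
```

Because each page is itself an image object, the same page images can be reused by any other object or application that understands the image interface.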
18. FEDORA's Interface Implementation
[Figure: Behavior Definition Object, Data Object, Behavior Mechanism Object. Graphics taken from presentations available at www.fedora.info.]
19. What is FEDORA?
- Plumbing
- Manages associations between objects and their interfaces
- Invokes behaviors from an interface to which an object subscribes
- Manages or references files
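The "plumbing" role can be illustrated with a toy in-memory model of behavior definitions, behavior mechanisms, and data objects. This is not FEDORA's implementation or API: every class and method name here (`Plumbing`, `invoke`, and so on) is hypothetical, and the sketch only shows the idea of recording associations and dispatching a behavior call through them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Behavior definition: an interface, i.e. a named set of method names.
class BehaviorDefinition {
    final String id;
    final Set<String> methods;
    BehaviorDefinition(String id, Set<String> methods) { this.id = id; this.methods = methods; }
}

// Behavior mechanism: one implementation of one behavior definition.
class BehaviorMechanism {
    final String id;
    final String bdefId;
    final Map<String, Function<DataObject, String>> impls;
    BehaviorMechanism(String id, String bdefId, Map<String, Function<DataObject, String>> impls) {
        this.id = id;
        this.bdefId = bdefId;
        this.impls = impls;
    }
}

// Data object: its datastreams plus the mechanism chosen for each interface it subscribes to.
class DataObject {
    final String pid;
    final Map<String, String> datastreams = new HashMap<>();
    final Map<String, String> bdefToBmech = new HashMap<>();
    DataObject(String pid) { this.pid = pid; }
}

// The plumbing: keeps the associations and dispatches behavior calls.
class Plumbing {
    private final Map<String, BehaviorDefinition> bdefs = new HashMap<>();
    private final Map<String, BehaviorMechanism> bmechs = new HashMap<>();
    private final Map<String, DataObject> objects = new HashMap<>();

    void add(BehaviorDefinition d) { bdefs.put(d.id, d); }
    void add(BehaviorMechanism m) { bmechs.put(m.id, m); }
    void add(DataObject o) { objects.put(o.pid, o); }

    String invoke(String pid, String bdefId, String method) {
        DataObject obj = objects.get(pid);
        if (obj == null) throw new IllegalArgumentException("Unknown object " + pid);
        String bmechId = obj.bdefToBmech.get(bdefId);
        if (bmechId == null) throw new IllegalArgumentException(pid + " does not subscribe to " + bdefId);
        BehaviorDefinition bdef = bdefs.get(bdefId);
        if (bdef == null || !bdef.methods.contains(method)) {
            throw new IllegalArgumentException(bdefId + " defines no method " + method);
        }
        BehaviorMechanism bmech = bmechs.get(bmechId);
        Function<DataObject, String> impl = bmech == null ? null : bmech.impls.get(method);
        if (impl == null) throw new IllegalStateException(bmechId + " does not implement " + method);
        return impl.apply(obj);
    }
}
```

A two-file image mechanism, for instance, would implement `getLowResolutionJPG` by returning the object's JPG datastream, while a MrSID mechanism would implement the same method by converting its single file; callers see only the behavior definition.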
20. What FEDORA currently does not do
- Provide a digital library in a box
- Requires integration and custom development
- Prescribe the right way to do things
- Implementers are free to choose
- Best practices are still being fleshed out
21. LC's Requirements
- Complex Digital Objects
- Structurally
- METS structMap
- Rich descriptive metadata
- Exploiting MODS features
- relatedItem
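To show what structural complexity looks like in practice, here is a hedged Java sketch that reads the page sequence from a METS structMap using only the JDK's DOM parser. The sample document is a hypothetical, heavily trimmed fragment (a real METS file would also carry file sections and `fptr` pointers), the `TYPE='page'` value is an assumed local convention, and `divLabels` is an illustrative helper rather than part of any METS toolkit.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class StructMapReader {
    static final String METS_NS = "http://www.loc.gov/METS/";

    // Returns the LABEL of every mets:div with the given TYPE, in document order.
    static List<String> divLabels(String metsXml, String type) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // required for getElementsByTagNameNS
            Document doc = f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(metsXml.getBytes(StandardCharsets.UTF_8)));
            NodeList divs = doc.getElementsByTagNameNS(METS_NS, "div");
            List<String> labels = new ArrayList<>();
            for (int i = 0; i < divs.getLength(); i++) {
                Element div = (Element) divs.item(i);
                if (type.equals(div.getAttribute("TYPE"))) {
                    labels.add(div.getAttribute("LABEL"));
                }
            }
            return labels;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not read METS document", e);
        }
    }

    public static void main(String[] args) {
        String mets = "<mets xmlns='http://www.loc.gov/METS/'><structMap>"
            + "<div TYPE='sheetMusic' LABEL='Song'>"
            + "<div TYPE='page' LABEL='Cover'/><div TYPE='page' LABEL='Page 1'/>"
            + "</div></structMap></mets>";
        System.out.println(divLabels(mets, "page")); // prints [Cover, Page 1]
    }
}
```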
22. Choosing Repository Software
- Fedora provides a foundation to build on
- LC is a member of the initial deployment team
- No other software is like FEDORA
- Except general-purpose programming languages
23. How LC is implementing FEDORA
- Types of Digital Objects
- Sheet Music
- Scores
- Sound Recordings
- Compact Discs
- Manuscripts
- Photographs
- Websites
- Collections
- Less emphasis
- Intellectual output of a university's research faculty
24. METS Profiles
- Correlate well with classes of objects
- Articulate
- Structure of an object
- Metadata requirements
- METS documents conforming to profiles are ingested into the repository
- Atomization
- Behavior association
25. Architecture
- Fedora (Repository)
- Cocoon (Application Layer)
[Figure: layered diagram showing user, web browser, Cocoon, Fedora Service APIs, Fedora Repository System]
26. SIP vs AIP
- Complex digital objects are atomized into small reusable objects upon ingest into FEDORA
- Sheet Music METS Profile (SIP)
- Sheet music object (AIP)
- Structural metadata encoded in METS
- Descriptive metadata encoded in MODS
- Image objects for each page (AIP)
- TIF and JPG files
- Technical metadata encoded in MIX
- TEI object for the lyrics (AIP)
- TEI file
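The atomization step can be sketched as splitting one submission into several stored objects. In this hypothetical Java sketch the PID scheme (`baseId.pageN`, `baseId.lyrics`), the `StoredObject` class, and the `Atomizer` helper are all invented for illustration, and the parent keeps only member PIDs rather than full METS and MODS records.

```java
import java.util.ArrayList;
import java.util.List;

// One object stored in the repository after ingest (hypothetical, simplified).
class StoredObject {
    final String pid;
    final String kind;
    final List<String> parts; // file names, or PIDs of member objects

    StoredObject(String pid, String kind, List<String> parts) {
        this.pid = pid;
        this.kind = kind;
        this.parts = parts;
    }
}

class Atomizer {
    // Splits one sheet-music SIP into reusable AIPs: one image object per page,
    // one TEI object for the lyrics, and a parent that refers to them by PID.
    static List<StoredObject> atomize(String baseId, List<String> pageFileStems, String teiFile) {
        List<StoredObject> out = new ArrayList<>();
        List<String> members = new ArrayList<>();
        for (int i = 0; i < pageFileStems.size(); i++) {
            String pid = baseId + ".page" + (i + 1);
            String stem = pageFileStems.get(i);
            out.add(new StoredObject(pid, "image", List.of(stem + ".tif", stem + ".jpg")));
            members.add(pid);
        }
        String teiPid = baseId + ".lyrics";
        out.add(new StoredObject(teiPid, "tei", List.of(teiFile)));
        members.add(teiPid);
        // The parent goes first; in practice it would also hold the METS structure and MODS.
        out.add(0, new StoredObject(baseId, "sheetMusic", members));
        return out;
    }
}
```

Keeping each page as its own object is what lets the same image objects answer the image interface for any application, independent of the sheet music record.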
27. Why this Architecture?
- Clean separation of concerns
- Logic: makes it go!
- Content: from FEDORA
- Style: web designers
- Object not bound to display
- Repository is for preservation of metadata and files, not markup (HTML)
- Markup accomplished in the Cocoon layer
- Leverage use of METS structural metadata
- Performance: Cocoon caching
28. User Interface Development
- Web Designers
- Relate to objects and behaviors
- Can develop in HTML for display
- XSLT
- Uses XML from repository to drive display
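As a rough illustration of XML from the repository driving the display, the JDK's built-in XSLT processor (`javax.xml.transform`) can apply a stylesheet to a MODS record. In the architecture described here this step runs inside Cocoon rather than hand-written Java; the stylesheet, the `XsltDisplay` class, and the sample record are hypothetical and heavily simplified.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltDisplay {
    // A web designer's stylesheet: turns a (simplified) MODS title into an HTML heading.
    // exclude-result-prefixes keeps the mods namespace declaration out of the HTML.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:mods='http://www.loc.gov/mods/v3' exclude-result-prefixes='mods'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/'>"
        + "<h1><xsl:value-of select='mods:mods/mods:titleInfo/mods:title'/></h1>"
        + "</xsl:template></xsl:stylesheet>";

    static String render(String xmlFromRepository) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xmlFromRepository)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }
}
```

Changing the look of the page means editing the stylesheet string (or file), not the code that fetches the record.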
30. Other Pieces of the Repository Puzzle
- Other open source tools
- Cocoon
- XML Publishing Framework
- Lucene
- Text Indexing and Search API
- Someone has to write software!
- Java to build Lucene indexes
- XSP searching
- More XSLT than you want to see
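The slide mentions writing Java to build Lucene indexes. Lucene's own API is not reproduced here; instead, this self-contained sketch shows the inverted-index idea such an index is built around, with a naive tokenizer (lowercase, split on non-word characters) standing in for a real Lucene analyzer. The `ToyIndex` class and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// A toy inverted index: term -> IDs of the records containing it.
// Not Lucene's API; it only illustrates the underlying data structure.
class ToyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    void add(String recordId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(recordId);
            }
        }
    }

    // Returns the IDs of records containing every query term (boolean AND).
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : query.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) continue;
            Set<String> hits = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        if (result == null) return Collections.emptySet();
        return result;
    }
}
```

A real Lucene index adds analyzers, relevance ranking, and on-disk storage, which is why the deck treats it as a separate tool rather than something to build by hand.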
31. Digital Object Production
- How are we building these digital objects?
- MySQL
- Cocoon
- XSLT
- Homegrown Java
- Technical metadata extraction
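For the technical metadata extraction step, a minimal sketch using only the JDK's `javax.imageio` might look like this. Which formats can be read depends on the ImageIO readers installed; the map keys are placeholders, not real MIX element names, so mapping them onto MIX would be a separate step. The `TechMetadata` class is hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.imageio.ImageIO;

class TechMetadata {
    // Reads basic technical properties from an image file's bytes.
    // Keys are illustrative only and are NOT MIX element names.
    static Map<String, Integer> extract(byte[] imageBytes) {
        BufferedImage img;
        try {
            img = ImageIO.read(new ByteArrayInputStream(imageBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (img == null) { // ImageIO.read returns null when no reader recognizes the data
            throw new IllegalArgumentException("No installed ImageIO reader recognizes this data");
        }
        Map<String, Integer> md = new LinkedHashMap<>();
        md.put("width", img.getWidth());
        md.put("height", img.getHeight());
        md.put("bitsPerPixel", img.getColorModel().getPixelSize());
        md.put("samplesPerPixel", img.getColorModel().getNumComponents());
        return md;
    }
}
```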
32. Cocoon
- XML Publishing Framework (Toolbox)
- Generate
- From files (or URLs)
- From databases
- From code (XSP, JSP, PHP)
- Transform
- XSLT
- Serialize
- XML, HTML, PDF, SVG, MIDI?
- Caching
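The generate, transform, serialize, and cache steps can be sketched in plain Java as three composed functions plus a result cache. This is only an analogy: in Cocoon itself pipelines are declared in the XML sitemap rather than coded like this, and the `Pipeline` class and its string-based stages are hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Illustrative generate -> transform -> serialize pipeline with a result cache.
// Mirrors the idea of a Cocoon pipeline; it is not Cocoon's API.
class Pipeline {
    private final Function<String, String> generator;  // source id -> XML
    private final UnaryOperator<String> transformer;   // XML -> XML
    private final Function<String, String> serializer; // XML -> output format
    private final Map<String, String> cache = new HashMap<>();
    private int generatorCalls = 0;

    Pipeline(Function<String, String> g, UnaryOperator<String> t, Function<String, String> s) {
        this.generator = g;
        this.transformer = t;
        this.serializer = s;
    }

    String process(String sourceId) {
        // Naive cache: assumes the source never changes. A production cache
        // would also need to detect when the underlying source has changed.
        return cache.computeIfAbsent(sourceId, id -> {
            generatorCalls++;
            return serializer.apply(transformer.apply(generator.apply(id)));
        });
    }

    int generatorCalls() { return generatorCalls; }
}
```

For example, `new Pipeline(id -> "<doc>" + id + "</doc>", xml -> xml.replace("doc>", "p>"), xml -> "<html>" + xml + "</html>")` turns `"a"` into `<html><p>a</p></html>`, and a second request for `"a"` is served from the cache without regenerating.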
33. XSLT
- Philosophy
- Get data into XML as early in the workflow as possible
- Flexibility
- Easy to change logic in XSLT
- No need to recompile
- Performance Issues
34. Resources Needed for FEDORA (Cheap)
- Hardware Requirements
- Minimal for experimentation
- Installs on Windows PC
- Packaged to get up and running quickly
- Demo set of objects
- Scales with hardware in a production environment
35. Resources Needed for FEDORA (Expensive)
- 1 or more developers
- 1: kick the tires
- or More Real production
- Application Architects
- Requirement Analysts
- Subject Matter Experts
- Articulate requirements
- Object Structure
- Descriptive Metadata
36. Summary
- Five Questions
- Who
- What
- When
- Why
- Where
37. Who
- Institutions with resources to do software development
- Unique requirements for digital library software
- Preexisting tools do not fit the need
- Need for integration of existing systems into one management infrastructure
38. What
- Digital library plumbing
- Very general purpose
- Use it to build almost any digital library application
39. When
- December 10th: Version 1.2
40. Why
- Robust set of tools to build YOUR repository
- High level of user support from the FEDORA development team
- Smart people working on hard problems
41. Where
42. Questions