Title: Introducing Pergamos
1. Introducing Pergamos
European FEDORA User Meeting, Copenhagen, 28 September 2005
A FEDORA-based Digital Library System utilizing Digital Object Prototypes
Kostas Saidis, saiko_at_di.uoa.gr
Libraries Computer Center, Department of Informatics and Telecommunications, University of Athens
2. Outline
- Motivation: The University of Athens (UoA) DL
- Digital Objects (DOs)
- DO Storage (FEDORA)
- DO Manipulation (DL Application Logic)
- Digital Object Prototypes
- Automatic DO Type Conformance
- Scope of Prototypes and Collection Management
- Implementation Details
- A Preview of Pergamos
- Discussion
3. The UoA DL Project
- Over 1 million objects originating from 8 disparate collections
  - Folklore notebooks, Ancient papyri, UoA Historical Archive, Byzantine music manuscripts, Theatrical photos and brochures, Informatics research papers and dissertations, Medical images, Press articles
- Heterogeneous material, in terms of content type, metadata, structure, and user requirements
- Mostly digitized material, requiring detailed cataloging
4. UoA DL Project Metadata
- Build a Web-based DL System to handle all material
- Centralized DL approach due to
  - Existing hardware infrastructure
  - Funding restrictions
  - Administration simplicity
- FEDORA is our DO Repository
5. UoA DL Project Metadata (Contd.)
- Small Team
  - 2.5 developers, 1 librarian, 1 manager
  - Requirements, Specifications, Development, Digitization and Cataloging Management
  - while everyday tasks keep running!
- Cataloging Personnel
  - Scholars and experts in each collection's domain (not librarians)
- Strict Schedule
  - First Collection deadline: early 2006
  - Project deadline: end of 2006
6. Motivation
- Simplify and speed up the cataloging process
- Provide effective Web-based cataloging interfaces
- Automate content ingestion
- Decrease development time
- Avoid custom coding for each content variation
- Elaborate on reusable and configurable DL modules
- Provide the means to treat content variations in
a unified manner
7. Digital Objects
- A Digital Object is a human-generated artifact
consisting of the digital content and related
information
8. FEDORA
- FEDORA Digital Object Model
  - Content Models, Datastreams, Behavior Definitions, Mechanisms and Disseminators
- FEDORA is a DO Repository
  - Focus on how each DO part is encoded and stored
  - Effectively handles issues related to storage, preservation and versioning, searching and indexing, interoperability
9. Traditional 2-tier Approach
10. DL Application Logic
- Cataloging, Workflows, Collection Building and Management, User Interfaces, etc.
- DL Modules manipulate DOs at a higher level of abstraction
  - Focus on the overall behavior of the DO (what are the DO parts and how do they behave)
- DOs reflect the underlying real-world objects: they behave according to their nature, their essence, their type
11. DO Typing Information
- Do we effectively capture, express and utilize
the nature (type) of DOs?
12. An Example: Theatrical Collection
- Albums containing photos of National Theater performances
- What is a Photo DO?
  - A digital image
  - stored in various formats (e.g. high quality, www quality, thumbnail)
  - accompanied by the metadata required for describing the picture
- What is an Album DO?
  - A container of Photo DOs accompanied by theatrical play metadata
13. A 2nd Example: Historical Archive
- University's Senate Session Proceedings > Folders > Sessions > Items
- What is an Item DO?
  - A digital image (capturing 1 or 2 pages)
  - stored in various formats (e.g. high quality, www quality, thumbnail)
- What is a Session DO?
  - A container of Item DOs and metadata
- What is a Folder DO?
  - A container of Session DOs and metadata
14. DO Typing Information
- FEDORA Content Models express DO Typing information
- Content Models are metadata attributes (e.g. photo, album) that we use as a guide
- Humans interpret Content Models, not the DL System
  - Manual resolution of DO Typing issues
15. Problems
- Catalogers carry out manual XML editing at a low level of abstraction, with overly technical, complex, and over-detailed semantics
- Developers generate ad-hoc, custom, and non-reusable implementations of DO types and variations of behavior
- DL modules exhibit limited evolution and configuration capabilities
16. DO Typing Information
- The DL System should resolve DO Typing issues automatically
  - (in a manner transparent to the DL Application Logic)
17. Automatic DO Type Conformance
- The designer specifies the various DO types
- and the DL System makes DOs conform to these type specifications automatically
- How?
18. By drawing on the notions of OO
19. The OO Viewpoint
- In the OO model an object is itself aware of its nature and behaves accordingly
- Objects are conceived as instances of a type, automatically conforming to the type's definitions and specifications
- OO types are separate entities (named either classes or prototypes)
20. Digital Object Prototypes
- A DO Prototype is a DO Type Specification, a separate entity that defines the DO's
  - Constitutional parts: metadata sets, files, structure, etc.
  - Private behaviors: DO internal operations such as serializations, validations, assignment of default values, content conversions, etc.
  - Public behaviors (behavior schemes): the DO external interface, consisting of high-level operations such as Detail View, Browse View, Edit View, etc.
21. OO Encapsulation
22. Photo Prototype Instances
23. DO Prototypes and Instances
- The designer carries out the definition of DO Prototypes; the DL System handles the rest
- DO Prototypes represent the realization of the Content Model notion in an OO fashion
- The process of generating a DO from a Prototype is called instantiation
  - The resulting object is an instance of the prototype
  - A DO instance automatically conforms to the Prototype's specifications
- Stored DOs vs DO instances
24. 3-tier DL Architecture
25. Digital Object Dictionary
- The runtime environment in which DO instances and Prototypes operate
- Instantiation of DOs based on the prototype specifications (private behaviors: load and parse XML, assign default values, etc.)
- Exposure of the public DO behaviors in a high-level, uniform API (for use by DL Modules)
- Serialization of the DO instance back to FEDORA (private behaviors: serialize data structures in XML, perform validations, etc.)
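The instantiate and serialize cycle listed on this slide can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the Pergamos implementation: `FieldSpec`, `instantiate`, `serialize`, and the `<metadata>` XML layout are hypothetical names chosen for illustration, and the sample title is made up.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class FieldSpec:
    """Hypothetical prototype entry describing one metadata field."""
    name: str
    mandatory: bool = False
    default: str = ""


def instantiate(specs, stored_xml=None):
    """Private behavior: assign default values, then load stored XML if any."""
    values = {spec.name: spec.default for spec in specs}
    if stored_xml is not None:
        for element in ET.fromstring(stored_xml):
            if element.tag in values:
                values[element.tag] = element.text or ""
    return values


def serialize(specs, values):
    """Private behavior: validate mandatory fields, then serialize to XML."""
    missing = [s.name for s in specs if s.mandatory and not values.get(s.name)]
    if missing:
        raise ValueError("missing mandatory fields: " + ", ".join(missing))
    root = ET.Element("metadata")
    for spec in specs:
        ET.SubElement(root, spec.name).text = values[spec.name]
    return ET.tostring(root, encoding="unicode")


photo_specs = [FieldSpec("title", mandatory=True), FieldSpec("format", default="tiff")]
photo = instantiate(photo_specs)            # new, empty instance with defaults applied
photo["title"] = "Opening night"            # made-up sample value
photo_xml = serialize(photo_specs, photo)   # XML that would go back to the repository
```

Because both steps read the same field specifications, a DL module never has to know how a field is defaulted, validated, or stored.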
26. Expression of DL Application Logic
- A DL Module performs the following steps
  - Acquire the DO Instance
    - do = dictionary.acquireObject(type)
    - do = dictionary.acquireObject("uoadl:1024")
  - Perform operations upon it
    - do.getMDSet("DC").getField("title")
    - dictionary.executeBehavior(do, "editView")
  - Store the DO in the repository
    - dictionary.saveObject(do)
- Cleaner, simpler, more effective
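The three steps above (acquire, operate, store) can be tried out against a small in-memory stand-in. The method names `acquireObject`, `executeBehavior`, `saveObject`, `getMDSet`, and `getField` come from the slide; everything else is an assumption made for this sketch, including `setField`, `registerBehavior`, the `demo:` identifiers, and the use of a Python dict in place of the FEDORA repository.

```python
class MDSet:
    """A metadata set: a plain mapping from field names to values."""
    def __init__(self):
        self._fields = {}

    def getField(self, name):
        return self._fields.get(name)

    def setField(self, name, value):      # hypothetical; not named on the slide
        self._fields[name] = value


class DigitalObject:
    def __init__(self, pid, type_name):
        self.pid = pid
        self.type_name = type_name
        self._md_sets = {"DC": MDSet()}

    def getMDSet(self, name):
        return self._md_sets[name]


class Dictionary:
    """In-memory stand-in for the DO Dictionary; a dict plays the repository."""
    def __init__(self):
        self._repository = {}
        self._behaviors = {}               # (type name, behavior name) -> callable
        self._next_id = 1

    def registerBehavior(self, type_name, behavior, function):   # hypothetical
        self._behaviors[(type_name, behavior)] = function

    def acquireObject(self, key):
        """Return the stored DO if key is a known identifier, else a new DO of type key."""
        if key in self._repository:
            return self._repository[key]
        pid = "demo:%d" % self._next_id
        self._next_id += 1
        return DigitalObject(pid, key)

    def executeBehavior(self, do, behavior):
        return self._behaviors[(do.type_name, behavior)](do)

    def saveObject(self, do):
        self._repository[do.pid] = do


dictionary = Dictionary()
dictionary.registerBehavior(
    "dl.theatre.photo", "editView",
    lambda do: "Edit: " + (do.getMDSet("DC").getField("title") or "(untitled)"),
)

do = dictionary.acquireObject("dl.theatre.photo")        # 1. acquire a new instance by type
do.getMDSet("DC").setField("title", "Act I")             # 2. perform operations upon it
edit_view = dictionary.executeBehavior(do, "editView")
dictionary.saveObject(do)                                # 3. store it
same_do = dictionary.acquireObject(do.pid)               # later: acquire by identifier
```

The module code at the bottom never touches XML or datastreams, which is the point the slide makes with "cleaner, simpler, more effective".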
27-30. 3-tier DL Architecture
(Diagram built up over four slides; labels added in order of appearance: Separation of Concerns, Storage, DO Typing and Instantiation, Composition of DO behaviors)
31. Pergamos
35. Scope of Prototypes
- Should we have global DO Types?
- Collection-pertinent types: a DO Prototype is defined in the context of a Collection
  - Support fine-grained definition of collection-specific kinds of material
- Hierarchical naming scheme for types
  - Theatrical Collection Photo: dl.theatre.photo
  - Medical Collection Photo: dl.medical.photo
  - Stored in the contentModel metadata attribute
  - Avoid type collisions
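A short sketch of how collection-qualified type names keep two "photo" types apart. The two type names are the ones given on the slide; `split_type_name`, `register_prototype`, and the placeholder specification contents are hypothetical and only illustrate the idea.

```python
def split_type_name(type_name):
    """Split a hierarchical type name into (collection path, local type)."""
    collection, separator, local_type = type_name.rpartition(".")
    if not separator or not collection or not local_type:
        raise ValueError("not a hierarchical type name: %r" % type_name)
    return collection, local_type


prototypes = {}   # fully qualified type name -> prototype specification


def register_prototype(type_name, specification):
    """Register a prototype under its qualified name; duplicates are rejected."""
    split_type_name(type_name)            # raises if the name is not qualified
    if type_name in prototypes:
        raise KeyError("type already defined: " + type_name)
    prototypes[type_name] = specification


# Placeholder specifications; the real contents are not shown in the slides.
register_prototype("dl.theatre.photo", {"note": "theatrical photo"})
register_prototype("dl.medical.photo", {"note": "medical photo"})
```

Both types share the local name `photo`, yet they occupy separate registry entries because the collection path is part of the key.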
36. Album Prototype Instances
39. Collection Management
- DL: Hierarchy of DO instances
  - Collections are also DOs
  - The DL itself is a DO, representing the super-collection (the collection of all the collections)
- Easily add new collections and sub-collections
- All content is modeled in a unified manner and can be characterized
- Allow the DL designer to work out the details of each collection independently, yet in a uniform manner
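The idea that the DL, its collections, and their contents are all DOs in one hierarchy can be sketched as a simple tree. The `Node` class, the `demo:` identifiers, and the type names other than `dl.theatre.photo` are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A DO in the hierarchy; collections and the DL itself are DOs too."""
    pid: str
    type_name: str
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child

    def walk(self, depth=0):
        """Yield (depth, node) pairs for this DO and everything below it."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


dl = Node("demo:dl", "dl")                                    # the super-collection
theatre = dl.add(Node("demo:theatre", "dl.theatre"))          # a collection
album = theatre.add(Node("demo:album1", "dl.theatre.album"))
album.add(Node("demo:photo1", "dl.theatre.photo"))
dl.add(Node("demo:medical", "dl.medical"))                    # a new collection is one more child

outline = [(depth, node.pid) for depth, node in dl.walk()]
```

Since every level uses the same node type, one traversal serves browsing, collection listing, and any per-collection processing.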
40. DL as a Hierarchy of DO Instances
43. Implementation Details
- DO Prototypes are
  - Specified in XML form
  - Stored in the TEMPLATE datastream of the appropriate Collection DO
  - Loaded, parsed, and interpreted by the DO Dictionary in its bootstrap procedure
  - Transparent to FEDORA
- DO Instances are supplied with the CONTAINER datastream, containing the PIDs of the DOs they contain
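A rough sketch of what the bootstrap step might look like. The slides do not show the actual XML schema, so the `<prototypes>`, `<prototype>`, `<child>`, `<container>`, and `<pid>` elements below are invented for illustration, as are the function names and `demo:` identifiers; only the TEMPLATE and CONTAINER datastream names come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEMPLATE datastream content (the real Pergamos schema is not shown).
TEMPLATE_XML = """
<prototypes>
  <prototype type="dl.theatre.album">
    <child type="dl.theatre.photo"/>
  </prototype>
  <prototype type="dl.theatre.photo"/>
</prototypes>
"""

# Hypothetical CONTAINER datastream content listing the PIDs of contained DOs.
CONTAINER_XML = "<container><pid>demo:photo1</pid><pid>demo:photo2</pid></container>"


def load_prototypes(template_xml):
    """Bootstrap step: map each type name to the child types it allows."""
    root = ET.fromstring(template_xml)
    return {
        prototype.get("type"): [child.get("type") for child in prototype.findall("child")]
        for prototype in root.findall("prototype")
    }


def contained_pids(container_xml):
    """Read the PIDs of the DOs that a container DO holds."""
    return [element.text for element in ET.fromstring(container_xml).findall("pid")]


prototypes = load_prototypes(TEMPLATE_XML)
pids = contained_pids(CONTAINER_XML)
```

Because the repository only sees two ordinary XML datastreams, the prototype mechanism stays transparent to it, as the slide notes.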
44. DO Prototypes in Detail
- MD Sets
  - Specification of each individual field (label, description, multi-value, mandatory, UI characteristics)
  - Serialization information (how to store it in FEDORA)
  - Field mappings (under development)
- Files: automatic conversions (tiff -> jpeg and thumb)
- Batch Import: automatically create DOs from zip bundles
- Structure: allowed children types
- Browsers: browse field
- Indices: e.g. subject catalog
- Behavior schemes: atomic DO elements
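Two of the items above, per-field specifications and allowed children types, lend themselves to a small sketch. `MDField`, `validate`, `can_contain`, the `actor` field, and the sample values are all hypothetical; the slide only names the kinds of attributes a field specification carries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MDField:
    """Hypothetical field specification taken from a prototype's MD Set."""
    name: str
    label: str
    mandatory: bool = False
    multi_value: bool = False


def validate(fields, record):
    """Return the problems found in record (field name -> list of values)."""
    problems = []
    for spec in fields:
        values = record.get(spec.name, [])
        if spec.mandatory and not values:
            problems.append(spec.label + " is mandatory")
        if not spec.multi_value and len(values) > 1:
            problems.append(spec.label + " accepts a single value")
    return problems


# Structure: which child types each type may contain (illustrative values).
ALLOWED_CHILDREN = {
    "dl.theatre.album": {"dl.theatre.photo"},
    "dl.theatre.photo": set(),
}


def can_contain(parent_type, child_type):
    return child_type in ALLOWED_CHILDREN.get(parent_type, set())


album_fields = [
    MDField("title", "Title", mandatory=True),
    MDField("actor", "Actor", multi_value=True),
]
problems = validate(album_fields, {"title": ["Play A", "Play B"], "actor": ["X", "Y"]})
```

The same field specifications could also drive the cataloging forms, since label, mandatory, and multi-value are exactly what a form generator needs to know.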
45. Discussion
46. Pergamos
- Historical Archive (production)
- Folklore Notebooks (testing)
- Theatrical Collection, Medical Images, and Byzantine music manuscripts (finalization of requirements and specifications)
- Undergoing development: the remaining collections are coming next
- Historical Archive will be published in early 2006
  - with a multi-lingual UI, hopefully!
47. Public DO Behaviors
FEDORA Behaviors | Behavior Schemes
Are defined in each DO separately | Are defined once and in one place (in the Prototype)
Operate on the datastreams | Operate on the atomic elements of a DO
Invoked directly on the DO | Invoked as in OO Dynamic Method Dispatch
Require the a priori existence of datastreams | Instantiation (empty DO)
Generic | Targeted on UI issues
Exposed as Web services | Web services will be of use after the DL has been built
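The "defined once in the Prototype, invoked as in dynamic method dispatch" column can be illustrated with a short sketch. The classes, the `execute_behavior` function, and the sample titles are hypothetical. The reading that a behavior scheme can run on a freshly instantiated, still empty DO is an interpretation of the "Instantiation (empty DO)" cell, not something the slide spells out.

```python
class Prototype:
    """A type specification holding behavior schemes shared by all its instances."""
    def __init__(self, type_name, behaviors):
        self.type_name = type_name
        self.behaviors = behaviors        # behavior name -> function taking a DO


class DigitalObject:
    def __init__(self, prototype, title=""):
        self.prototype = prototype
        self.title = title


def execute_behavior(do, name):
    """Look the behavior up on the DO's prototype at call time, then invoke it."""
    try:
        behavior = do.prototype.behaviors[name]
    except KeyError:
        raise LookupError("%s has no behavior %r" % (do.prototype.type_name, name)) from None
    return behavior(do)


photo_type = Prototype("dl.theatre.photo", {"browseView": lambda do: "[thumb] " + do.title})
album_type = Prototype("dl.theatre.album", {"browseView": lambda do: "Album: " + do.title})

empty_photo = DigitalObject(photo_type)                 # no content yet
demo_album = DigitalObject(album_type, "Demo album")    # made-up title
views = [execute_behavior(do, "browseView") for do in (empty_photo, demo_album)]
```

The caller asks for `browseView` without knowing the DO's type; the prototype attached to each instance decides which implementation runs.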
48. Future Work
- Fully implement the OO paradigm
  - OO Inheritance for DO Prototypes (e.g. the Notebook type derives from the Book type)
  - OO Polymorphism for DO instances (e.g. the DO uoadl:1234 is both a Notebook and a Book)
- Supply general-purpose linking capabilities that exceed structural relations (FEDORA Metadata for Object-to-Object Relationships?)
- Deliver on schedule
49. Conclusions
- If in doubt, use FEDORA
  - Flexible and Extensible (they mean it)
  - 1 year of Pergamos development, 2 months of testing, and 3 months of production use (Historical Archive) with no serious problems
  - Though, Sandy and Carl, I'd be grateful for some minutes of your time!!!
- DO Prototypes: a realization of Content Models in OO terms, implemented on top of FDOM to handle DO Typing issues automatically
- Detailed report on Pergamos to appear
50. Thank You
- Questions?
- Comments?
- For details
  - "On the Effective Manipulation of Digital Objects: A Prototype-based Instantiation Approach", Kostas Saidis, George Pyrounakis, Mara Nikolaidou, Proc. 9th European Conference on Research and Advanced Technology for Digital Libraries, ECDL 2005, Vienna, Austria, September 2005
  - email: saiko_at_di.uoa.gr