aDORe v1: Architectural Highlights

1
aDORe v1: Architectural Highlights
Herbert Van de Sompel
Digital Library Research & Prototyping Team
Research Library, Los Alamos National Laboratory
Acknowledgments: Luda Balakireva, Jeroen Bekaert, Ryan Chute, Patrick Hochstenbach, Xiaoming Liu, Damien Lujan
The aDORe effort was supported by an NDIIPP grant from the Library of Congress
2
Context
  • Fact
  • LANL Research Library stores a significant
    scholarly collection locally (A&I databases,
    journal articles, ...) and creates applications
    based on that collection.
  • Initial aDORe motivation
  • Undo tight integration between data and
    application
  • Uniform approach for ingesting, storing, and
    disseminating LANL RL data collections
  • Bigger picture
  • Allow for multiple, parallel applications on top
    of stored content
  • Create an environment that provides guarantees
    regarding long-term accessibility of stored
    content

3
aDORe characteristics
  • Standards-based
  • MPEG-21 Digital Item Declaration, the MPEG-21
    Digital Item Identification, URI, info URI,
    OAI-PMH, NISO OpenURL, SRU, Information
    Environment Service Registry, Internet Archive
    ARC file format, OAIS concepts, XML, XML Schema,
    XQuery.
  • Component-based, highly modular
  • Multiple content repositories, Identifier
    Locator, Service Registry, Format Registry,
    Semantic Registry, Harvesting front-end,
    Dissemination front-end
  • Protocol-based
  • Components expose (REST-based) Web services
  • All read services based on 4 standards:
    OAI-PMH, NISO OpenURL, SRU, XQuery.
  • Interaction between modules is protocol-driven.

4
aDORe characteristics
  • Scalable
  • Etc.

5
aDORe effort
  • aDORe is 2 things
  • A standards-based, repository federation
    architecture
  • Actual implementation of the architecture at LANL
    for local storage of digital assets
  • Prototype version was in production for 2 years!
  • Production version finalized June 2007.

6
aDORe overview
  • Representing Digital Objects
  • MPEG-21 DID / DIDL to represent Digital Objects
    using XML Packages
  • Identification of Digital Objects, datastreams,
    and XML Packages
  • Storing Digital Objects
  • Autonomous distributed repositories with OAI-PMH
    and OpenURL-based service interfaces
  • Locating Digital Objects, datastreams, and XML
    Packages
  • Identifier Locator
  • Registries
  • Service Registry: Locating service interfaces for
    autonomous distributed repositories
  • Format Registry: Sharing media type identifiers
    across autonomous distributed repositories
  • Semantic Registry: Sharing intellectual content
    type identifiers across autonomous distributed
    repositories
  • Providing federated access to the autonomous
    distributed repositories
  • OAI-PMH Federator: Harvesting XML Packages
  • OpenURL Resolver: Requesting services pertaining
    to Digital Objects, datastreams, and XML Packages

7
Representing Digital Objects
8
sample Digital Object
  • Create an XML-based surrogate for each Digital
    Object
  • Glues all components together in a single XML
    Package
  • Contains all required metadata (descriptive,
    technical, identifiers, ...) in the XML Package
  • Initial access format for all materials is the
    same (XML) irrespective of their native media
    type
  • Assign identifiers to the XML Package, the
    Digital Object, and the datastreams. Maintain
    original identifiers.

9
representing Digital Objects using MPEG-21 DID / DIDL
  • An XML Package is available for every Digital
    Object
  • The Package is an XML document compliant with the
    MPEG-21 Digital Item Declaration Language (DIDL):
    a DIDL document
  • The DIDL document typically contains
  • By-Value: descriptive metadata datastream,
    ingest/repository-related metadata
  • By-Reference: all constituent datastreams of the
    Digital Object
  • Creation of DIDL documents can be
  • static, at ingestion time, cf. for aDORe Archive
  • dynamic, via add-on capability to existing
    content management system, cf. Ghent University
    eRez add-ons
  • A new DIDL document is created when a new version
    of a previously ingested Digital Object is
    ingested (update is considered re-ingestion).
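
The by-value / by-reference split described above can be sketched with Python's standard XML library. This is a minimal illustration, not the aDORe DIDLTools API: the helper names, the sample metadata element, and the info:example identifier are assumptions made for the example. The element and attribute names (DIDL, Item, Descriptor, Statement, Component, Resource, mimeType, ref) and the namespace follow the MPEG-21 DIDL schema as commonly published.

```python
import xml.etree.ElementTree as ET

# MPEG-21 DIDL namespace (second edition of the schema).
DIDL_NS = "urn:mpeg:mpeg21:2002:02-DIDL-NS"
ET.register_namespace("didl", DIDL_NS)


def q(tag):
    """Return the namespace-qualified tag name."""
    return f"{{{DIDL_NS}}}{tag}"


def build_didl(metadata, datastreams):
    """Build a DIDL document.

    metadata    -- an Element embedded by value inside a Statement
    datastreams -- list of (mime_type, uri) pairs included by reference
    """
    didl = ET.Element(q("DIDL"))
    item = ET.SubElement(didl, q("Item"))

    # By-Value: the metadata travels inside the XML Package itself.
    descriptor = ET.SubElement(item, q("Descriptor"))
    statement = ET.SubElement(descriptor, q("Statement"), {"mimeType": "text/xml"})
    statement.append(metadata)

    # By-Reference: each datastream is only pointed at, not embedded.
    for mime_type, uri in datastreams:
        component = ET.SubElement(item, q("Component"))
        ET.SubElement(component, q("Resource"), {"mimeType": mime_type, "ref": uri})
    return didl


# Hypothetical sample object: one metadata record, one PDF datastream.
meta = ET.Element("title")
meta.text = "Sample article"
doc = build_didl(meta, [("application/pdf", "info:example/ds/1")])
didl_xml = ET.tostring(doc, encoding="unicode")
resources = list(doc.iter(q("Resource")))
```

Because an update counts as a re-ingestion, a sketch like this would be called again to produce a fresh document for each new version rather than editing an existing one.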

10
sample Digital Object
11
representing Digital Objects using MPEG-21 DID
12
Identification: digital objects, datastreams,
DIDL documents
13
aDORe DIDLTools
  • aDORe DIDLTools software is available from
    http://african.lanl.gov/aDORe/projects/DIDLTools/

14
The aDORe architecture
15
the aDORe architecture: 3 layers
  • Layer 1: the aDORe repositories
  • Networked systems that host digital object
    content and that make that content accessible by
    exposing core service interfaces.
  • In the LANL implementation: XMLtapes and ARCfiles
    (aDORe Archive)
  • Other Content Management Systems can be turned
    into an aDORe repository by implementing the core
    service interfaces.
  • Layer 2: the aDORe federation components
  • Networked systems that facilitate presenting the
    aDORe repositories as a single logical
    repository; these federation components expose
    core service interfaces to allow access to their
    content.
  • Federation components are: Identifier Locator,
    Service Registry, Format Registry, Semantic
    Registry
  • Layer 3: the aDORe front-ends
  • Networked systems that make digital object
    content hosted in the multitude of physical aDORe
    repositories accessible by exposing core service
    interfaces that present those aDORe repositories
    as a single logical repository
  • aDORe front-ends are: OAI-PMH Federator, OpenURL
    Resolver

17
The aDORe architecture
Layer 1: aDORe repositories
Hosting Digital Objects
Making hosted Digital Object content accessible
19
aDORe repositories
  • Networked systems that host digital object
    content and that have core service interfaces to
    facilitate access to that content.
  • Currently 2 types in the LANL implementation
  • XMLtapes: concatenating XML Packages
  • ARCfiles: concatenating datastreams
  • Combination of OAI-PMH and OpenURL-based core
    service interfaces
  • Generic XMLtape XQuery Resolver
  • Other Content Management Systems can be turned
    into an aDORe repository by implementing the core
    service interfaces.
  • Cf. Aleph
  • Cf. Ghent University eRez

20
aDORe Archive XMLtapes
XMLtape
oaipmh2 openurl-aDORe1 openurl-aDORe2 openurl-aDORe3
21
aDORe Archive XMLtape XQuery Resolver
XMLtape
openurl-aDORe7
22
aDORe Archive ARCfiles
ARCfile
openurl-aDORe3 openurl-aDORe4
23
The aDORe architecture
Layer 2: aDORe federation components
Facilitating the presentation of aDORe
repositories as a single logical repository
25
Identifier Locator
  • Stores all identifiers of aDORe repositories
    (DIDLDocumentIdentifier, digital object
    identifier, datastream identifier)
  • Loaded by retrieving identifiers from aDORe
    repositories using their "give me your
    identifiers" OpenURL service interface
  • Stores (identifier, repository identifier) pairs
  • 1 OpenURL-based service interface to the
    Identifier Locator

26
Identifier Locator
openurl-aDORe2
27
Service Registry
  • Stores information on all components of the aDORe
    environment, including
  • Identifier of the component,
  • Supported core services,
  • Location of the core service interfaces,
  • Other metadata about the component content
  • Components include
  • aDORe repositories
  • XMLtapes
  • ARCfiles
  • Federation components
  • Identifier Locator
  • Registries
  • aDORe Front-ends
  • OAI-PMH Federator
  • OpenURL Resolver

28
Registries: Service Registry
  • Lay-out follows the UK Information Environment
    Service Registry (IESR) specification
  • OAI-PMH, OpenURL and SRU service interfaces

29
Service Registry
30
Service Registry
oaipmh2 SRU openurl-aDORe6
31
openurl-aDORe1 openurl-aDORe2 openurl-aDORe6
32
Registries: Format Registry
  • Stores information on aDORe media types for
    datastreams.
  • Content
  • MIME media types
  • XML document types
  • Digital object profiles
  • OAI-PMH service interface

oaipmh2
33
Registries: Semantic Registry
  • Stores information on aDORe semantic content
    types for datastreams.
  • OAI-PMH service interface

oaipmh2
34
The aDORe architecture
Layer 3: aDORe front-ends
Presenting aDORe repositories as a single logical repository
36
Expose aDORe repositories as a single repository
  • Pretend that everything that was introduced so
    far is just 1 repository, not hundreds,
    thousands, ...
  • Name that repository aDORe1
  • Provide core service interfaces to aDORe1,
    similar to those available for the autonomous
    aDORe repositories: OAI-PMH, OpenURL, ...
  • Achieve this through the introduction of 2
    components
  • OAI-PMH Federator
  • OpenURL Resolver
  • These aDORe1-level core service interfaces are
    really the only ones that should be known to
    downstream applications

37
OAI-PMH Federator
  • Single point of access to harvest DIDLs from
    aDORe1
  • Interacts with Service Registry, Identifier
    Locator, and aDORe repositories to generate
    OAI-PMH responses
  • Supports DIDL, and can disseminate other compound
    object formats (e.g. METS, Atom, ...)
  • OAI-PMH Federator provides OAI-PMH interface to
    aDORe1

oaipmh2
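
A harvester talking to the OAI-PMH Federator would issue ordinary OAI-PMH requests. The sketch below builds a ListRecords request URL; `verb`, `metadataPrefix`, `from` and `until` are standard OAI-PMH request parameters, while the base URL and the `didl` metadata prefix are assumptions made for the example, not values taken from the LANL deployment.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix, from_date=None, until_date=None):
    """Build an OAI-PMH ListRecords request URL with optional date bounds."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical Federator endpoint and metadata prefix.
harvest_url = oai_list_records_url(
    "http://example.org/adore1/oai", "didl", from_date="2007-06-01"
)
```

The harvester only ever sees this one endpoint; which underlying repositories answer the request is the Federator's concern.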
38
OpenURL Resolver
  • Supports the core OpenURL services for aDORe1
    that are also available for the autonomous aDORe
    repositories
  • Interacts with Service Registry, Identifier
    Locator, and aDORe repositories to generate
    responses
  • OpenURL Resolver provides 3 core service
    interfaces to aDORe1
  • obtain the most recent DIDL for a specified
    identifier (DIDLDocumentIdentifier, digital
    object identifier, datastream identifier)
  • retrieve a list of all the locations (DIDLs in
    aDORe1) containing a specified identifier
  • retrieve a datastream corresponding with a
    specified identifier (datastream identifier)
  • OpenURL Resolver could support
  • Return all identifiers of aDORe1
  • XQuery

openurl-aDORe1 openurl-aDORe2 openurl-aDORe4
39
rft_id = identifier, svc_id = info:lanl-repo/svc/getDIDL
This is really 2 look-ups
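
The two look-ups can be sketched as: identifier to repository (Identifier Locator), then repository to service interface location (Service Registry), after which the request is forwarded as an OpenURL. This is a minimal sketch under stated assumptions: the two dictionaries stand in for the real federation components, the identifiers and base URL are made up, and the svc_id value reads the slide's service identifier as the info URI info:lanl-repo/svc/getDIDL. `url_ver=Z39.88-2004` is the standard OpenURL 1.0 version key.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Stand-ins for the federation components (hypothetical contents).
IDENTIFIER_LOCATOR = {"info:example/didl/42": "info:example/repo/xmltape-7"}
SERVICE_REGISTRY = {"info:example/repo/xmltape-7": "http://example.org/xmltape-7/openurl"}


def get_didl_request(identifier):
    """Resolve an identifier to the OpenURL request sent to its repository."""
    repository_id = IDENTIFIER_LOCATOR[identifier]   # look-up 1: which repository?
    base_url = SERVICE_REGISTRY[repository_id]       # look-up 2: where is its interface?
    query = urlencode({
        "url_ver": "Z39.88-2004",
        "rft_id": identifier,
        "svc_id": "info:lanl-repo/svc/getDIDL",
    })
    return f"{base_url}?{query}"


request_url = get_didl_request("info:example/didl/42")
request_params = parse_qs(urlsplit(request_url).query)
```

A downstream application only supplies rft_id and svc_id; both look-ups stay hidden behind the resolver.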
40
OpenURL Resolver (a bit more)
  • Single point of access to request services
    pertaining to single items from the aDORe
    repositories.
  • Powered by a rule engine that dynamically decides
    which services are available for a specified item
    based on properties of the item (format,
    semantics, collection, creation date, ...).
  • Interacts with Service Registry, Identifier
    Locator, aDORe repositories, rule engine, and
    transformation services to generate responses
  • Retrieve a list of all services pertaining to an
    item with specified identifier (DIDLDocumentIdentifier,
    content identifier, datastream identifier)

openurl-aDORe5
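
A rule engine of this kind can be pictured as a list of (condition, service) pairs evaluated against an item's properties. The sketch below is illustrative only: the rules, the property values, and the service names other than getDIDL are hypothetical and do not describe the rule engine used at LANL.

```python
# Each rule pairs a predicate over item properties with a service name.
# All rules and service names here are hypothetical examples.
RULES = [
    (lambda item: True, "getDIDL"),
    (lambda item: item.get("format") == "image/tiff", "getThumbnail"),
    (lambda item: item.get("semantics") == "article", "getCitation"),
]


def available_services(item):
    """Return the services whose rule matches the item's properties."""
    return [service for applies, service in RULES if applies(item)]


# A made-up item described by format and semantics.
services = available_services({"format": "image/tiff", "semantics": "photo"})
```

Adding a service then means adding a rule, without touching the repositories that hold the items.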
41
Select an OpenURL service request from the list
42
LANL aDORe implementation
43
LANL aDORe software
  • Largely based on off-the-shelf software
    components
  • Berkeley DB Java Edition
  • Heritrix toolkit
  • MySQL db
  • OCLC OAICat
  • OCLC OpenURL software
  • Ockam IESR service registry
  • aDORe Archive software (Layer 1: XMLtapes,
    ARCfiles) is available from
    http://african.lanl.gov/aDORe/projects/adoreArchive/
  • Plans to one way or another make the entire
    LANL aDORe solution (revised Layer 1, Layer 2,
    Layer 3) available.

44
LANL aDORe @ 2 Sep 2007
  • aDORe Archive
  • XMLtapes: 1,308
  • ARCfiles: 2,223
  • DIDL Documents: 45,444,113
  • ARCfile resources: 115,028,715
  • 4.4 TByte
  • Identifier Locator
  • Identifiers: 310,253,260

45
LANL aDORe hardware
46
LANL aDORe Performance
  • Ingestion
  • Preprocessing, Indexing, Registration: 12 DIDLs
    / second
  • System Specifications
  • Sunfire x4600 M2 Server
  • CPU: AMD 8218 dual-core 2.6 GHz (x 8)
  • RAM: 16 x 2 GB DDR2-667
  • Retrieval
  • Sub-10 ms retrieval times for individual modules
  • System Specifications
  • IBM Blade Center
  • Chassis: Model 86773XU
  • Blades: Model 885092U (x 14)
  • AMD 2.8 GHz (single core)
  • RAM: 8 GB PC3200 ECC DDR SDRAM
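
For a sense of scale, the ingestion rate can be combined with the DIDL document count reported for 2 Sep 2007 (45,444,113). The calculation below is a back-of-envelope sketch that assumes a single ingestion pipeline running continuously at the quoted 12 DIDLs per second; the slides do not say how the collection was actually loaded.

```python
DIDL_DOCUMENTS = 45_444_113   # DIDL documents reported for 2 Sep 2007
INGEST_RATE = 12              # DIDLs per second (quoted ingestion rate)
SECONDS_PER_DAY = 86_400

# Assumption: one pipeline, constant rate, no downtime.
ingest_seconds = DIDL_DOCUMENTS / INGEST_RATE
ingest_days = ingest_seconds / SECONDS_PER_DAY
print(f"{ingest_days:.1f} days")  # prints "43.8 days"
```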

47
aDORe Ingestion Overview
48
Conclusion
  • aDORe Archive
  • The file-based approach (XMLtape/ARCfile) is
    inherently simple, and reduces dependency on
    database systems.
  • The XMLtape approach is inspired by the ARC file
    format, but provides several additional
    attractive features
  • Off-the-shelf XML tools can be used to
    parse/validate an XMLtape
  • All Digital Object metadata can be stored in XML
    Package
  • The autonomy of the indexes allows retaining the
    files over time, while the indexes can be created
    using other techniques as technologies evolve.
  • Can throw all indexes out and just start from
    scratch.
  • Data integrity
  • XML Package contains a SHA1 digest for each
    datastream of the Digital Object represented by
    the XML Package
  • SHA1 digest for each XMLtape and ARCfile stored
    in XMLtape Registry, and ARCfile Registry,
    respectively
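
The integrity check implied by these digests can be sketched with Python's hashlib. The function names are illustrative; how aDORe itself computes, stores, and compares the digests is not described in the slides.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA1 digest of a datastream as a hex string."""
    return hashlib.sha1(data).hexdigest()


def verify(data: bytes, recorded_digest: str) -> bool:
    """Compare a datastream against the digest recorded at ingestion."""
    return sha1_hex(data) == recorded_digest.lower()


# At ingestion: record the digest alongside the datastream reference.
datastream = b"abc"
recorded = sha1_hex(datastream)

# Later: detect corruption by recomputing and comparing.
intact = verify(datastream, recorded)
corrupted = verify(b"abd", recorded)
```

The same comparison applies one level up, to whole XMLtape and ARCfile files checked against the digests held in their registries.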

49
Conclusion
  • aDORe
  • The protocol-based nature of the access increases
    the flexibility in light of evolving technologies
    through the introduction of a layer of
    abstraction.
  • Can throw whichever technology out and
    re-implement the same protocol interface using
    another technology.
  • The protocol-based nature of the solution allows
    a fully distributed implementation.
  • The component-based nature yields scalability.
  • The standard-based design allows the use of
    off-the-shelf tools.
  • A standard-based approach typically allows for a
    less painful migration (to a new standard).
  • All kinds of Content Management Systems can be
    aDORe-ized.