Title: Fedora and Pathways: Current and future with semantic web
1. Fedora and Pathways: Current and future with semantic web
- Mellon OS Retreat
- February 2006
Sandy Payette, Co-Director, Fedora Project; Researcher, Cornell Information Science
2. Motivations: Fedora and RDF
- A natural model for exposing the repository as a network of objects
  - Object-to-object relationships
  - Relationships to external entities
  - Query via graph traversal to discover related stuff
- Indexing based on a generalizable data model
  - Graph-based data model is a common reduction
  - Avoid fixed-schema problems and metadata mud wrestling
- Extensible enrichment of object descriptions
  - Keep overlaying statements from multiple ontologies
  - Organic evolution
- Powerful queries and inference for repository management
  - Transitive relationships among objects
  - Dependency analysis
  - Detection/extraction of sub-graphs (e.g., multi-object content models)
  - Provenance of disseminations
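As a rough illustration of querying transitive relationships by graph traversal, the sketch below follows arcs with one chosen predicate and returns every object reachable from a starting object. The PIDs and predicate names are illustrative assumptions, not taken from Fedora's ontology, and the triples live in a plain Python list rather than a triplestore.

```python
from collections import deque

# Illustrative triples (subject, predicate, object); PIDs and predicates are hypothetical.
TRIPLES = [
    ("demo:page-1", "isPartOf", "demo:chapter-1"),
    ("demo:chapter-1", "isPartOf", "demo:book-1"),
    ("demo:book-1", "isMemberOfCollection", "demo:collection-1"),
    ("demo:page-2", "isPartOf", "demo:chapter-1"),
]


def reachable(start, predicate, triples):
    """Breadth-first search over arcs labeled `predicate`.

    Returns the nodes reachable from `start`, i.e. the transitive closure
    of that predicate for this one node, in the order they are discovered.
    """
    # Adjacency list restricted to the chosen predicate.
    edges = {}
    for s, p, o in triples:
        if p == predicate:
            edges.setdefault(s, []).append(o)
    found = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:  # visit each node once, even if the graph has cycles
                seen.add(nxt)
                found.append(nxt)
                queue.append(nxt)
    return found


print(reachable("demo:page-1", "isPartOf", TRIPLES))  # ['demo:chapter-1', 'demo:book-1']
```

The `isMemberOfCollection` arc is not followed because the traversal is restricted to a single predicate; following several predicates would simply widen the filter when building `edges`.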
3. RDF in the Fedora Digital Object Model
[Figure: Fedora digital object showing Persistent ID (PID), Relations datastream (RDF/XML), Dublin Core, Audit Trail, Policy, other datastreams, and a Disseminator]
All digital objects are serialized in the Fedora repository's persistent store in the Fedora XML wrapper format (FOXML), and are optionally indexed in an RDF triplestore.
4. Fedora RDF Datastream
- Assert relationships from the Fedora ontology
  - Collection member
  - Whole/part
  - Equivalence
  - Description of
  - More
- Assert relationships/properties from other ontologies
5. Relationships Datastream
    <foxml:datastream ID="RELS-EXT" CONTROL_GROUP="X">
      <foxml:datastreamVersion ID="RELS-EXT.0" MIMETYPE="text/xml" LABEL="RDF">
        <foxml:xmlContent>
          <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" ...>
            <rdf:Description rdf:about="info:fedora/nsdl:100">
              <fedora:isMemberOfCollection rdf:resource="info:fedora/nsdl:math-49"/>
              <fedora:isMemberOfCollection rdf:resource="info:fedora/nsdl:physics-48"/>
              <nsdl:recommendedBy rdf:resource="info:fedora/nsdl:ExpertVoices-120"/>
              <nsdl:owner>Jane Doe</nsdl:owner>
            </rdf:Description>
          </rdf:RDF>
        </foxml:xmlContent>
      </foxml:datastreamVersion>
    </foxml:datastream>
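To show how the RELS-EXT assertions above turn into graph statements, here is a minimal sketch that extracts (subject, predicate, object) triples from the inner `rdf:RDF` content using the standard library's `xml.etree.ElementTree`. The slide elides the namespace declarations, so the sketch assumes `info:fedora/fedora-system:def/relations-external#` for the `fedora` prefix and uses a hypothetical placeholder URI for `nsdl`. It only handles flat `rdf:Description` blocks like this one and is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Inner RELS-EXT content from the slide. The fedora namespace URI is an
# assumption; the nsdl namespace URI is a hypothetical placeholder.
RELS_EXT = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:fedora="info:fedora/fedora-system:def/relations-external#"
    xmlns:nsdl="http://example.org/nsdl#">
  <rdf:Description rdf:about="info:fedora/nsdl:100">
    <fedora:isMemberOfCollection rdf:resource="info:fedora/nsdl:math-49"/>
    <fedora:isMemberOfCollection rdf:resource="info:fedora/nsdl:physics-48"/>
    <nsdl:recommendedBy rdf:resource="info:fedora/nsdl:ExpertVoices-120"/>
    <nsdl:owner>Jane Doe</nsdl:owner>
  </rdf:Description>
</rdf:RDF>"""


def rels_ext_triples(xml_text):
    """Return (subject, predicate, object) tuples from flat RDF/XML.

    Each child of an rdf:Description is either an rdf:resource reference
    (object is a URI) or a plain literal (object is its text).
    """
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for child in desc:
            # ElementTree tags look like "{namespace}local"; rejoin into a full URI.
            ns, local = child.tag[1:].split("}", 1)
            resource = child.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (child.text or "").strip()
            triples.append((subject, ns + local, obj))
    return triples


for triple in rels_ext_triples(RELS_EXT):
    print(triple)
```

Mixing `fedora:` and `nsdl:` predicates in one description is exactly the "overlaying statements from multiple ontologies" idea: both become ordinary arcs in the same graph.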
6. Network of Digital Objects in a Fedora Repository
7. Fedora Resource Index (RDF)
- NOT the core repository object storage; the RI is an optional index
- Automatic, incremental indexing
- Kowari triplestore
  - Scale tested to 180M triples
- Search/query the repository via the Fedora RI Query Interface
[Figure: Fedora Digital Object (RDF Datastream, DC Datastream, Fedora model properties) and the RDF-based Index of Repository]
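The RI's role as a separate, incrementally maintained index can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not Kowari and not the RI's actual query languages: triples are grouped by the PID of the object that asserted them, so re-indexing one object after a change replaces only that object's triples, and `match` treats `None` as a wildcard in a single (subject, predicate, object) pattern. The class name, method names, and the `fedora:`/`dc:` prefixed predicate strings are illustrative.

```python
class TripleIndex:
    """Toy in-memory index. Triples are grouped by the PID of the object
    whose datastreams produced them, so updates are per object."""

    def __init__(self):
        self._by_pid = {}

    def index_object(self, pid, triples):
        # Incremental update: replace this object's old triples with the new set.
        self._by_pid[pid] = set(triples)

    def remove_object(self, pid):
        self._by_pid.pop(pid, None)

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        pattern = (s, p, o)
        results = []
        for triples in self._by_pid.values():
            for triple in triples:
                if all(want is None or want == got for want, got in zip(pattern, triple)):
                    results.append(triple)
        return sorted(results)


ri = TripleIndex()
ri.index_object("nsdl:100", [
    ("info:fedora/nsdl:100", "fedora:isMemberOfCollection", "info:fedora/nsdl:math-49"),
    ("info:fedora/nsdl:100", "fedora:isMemberOfCollection", "info:fedora/nsdl:physics-48"),
    ("info:fedora/nsdl:100", "dc:title", "Example title"),
])

# Which objects are members of the math collection?
print(ri.match(p="fedora:isMemberOfCollection", o="info:fedora/nsdl:math-49"))
```

Keeping the index keyed by the asserting object mirrors the slide's point that the RI is derived from, and can be rebuilt from, the objects themselves rather than being the primary store.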
8. Fedora RDF Query Interface
9. RI as enabler for other services
- Services can query the RDF index
- Fedora OAI Provider (PROAI-based)
  - Queries
    - What has changed since dateTime?
    - Set information
  - Mapping between Fedora objects and OAI records
  - Takes advantage of dependency chains in the graph
    - Determine if a dissemination has changed if dependent datastreams have changed
    - Can evaluate down a chain of disseminations
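The dependency-chain idea can be sketched as a recursive check: a dissemination counts as changed since a given dateTime if any datastream it depends on, directly or through other disseminations, was modified after that time. The graph, PIDs, and timestamps below are hypothetical; in the setup the slide describes, these dependencies would presumably be read from the RI graph rather than from Python dictionaries.

```python
from datetime import datetime

# Hypothetical dependency graph: each dissemination lists the datastreams
# or other disseminations it is computed from.
DEPENDS_ON = {
    "demo:1/oai_dc": ["demo:1/DC"],
    "demo:1/thumbnail": ["demo:1/IMAGE"],
    "demo:1/html_view": ["demo:1/thumbnail", "demo:1/DC"],
}

# Hypothetical last-modified times for the underlying datastreams.
MODIFIED = {
    "demo:1/DC": datetime(2006, 1, 10),
    "demo:1/IMAGE": datetime(2006, 2, 1),
}


def changed_since(node, since, seen=None):
    """True if node, or anything it transitively depends on, changed after `since`."""
    if seen is None:
        seen = set()
    if node in seen:  # already checked (and found unchanged), or a cycle
        return False
    seen.add(node)
    modified = MODIFIED.get(node)
    if modified is not None and modified > since:
        return True
    # Evaluate down the chain of dependencies.
    return any(changed_since(dep, since, seen) for dep in DEPENDS_ON.get(node, []))


SINCE = datetime(2006, 1, 15)
print([d for d in DEPENDS_ON if changed_since(d, SINCE)])
# ['demo:1/thumbnail', 'demo:1/html_view']
```

`html_view` is reported even though none of its direct datastreams changed after the cutoff's DC date, because it depends on `thumbnail`, which depends on the modified `IMAGE` datastream.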
10. Fedora Implementations using RDF
- NSDL
- RightsCom
- Alfred Wegener Institute for Polar and Marine Research
- Tibetan Buddhist Resource Center
- ARROW
- Case Western Reserve
- UQ e-scholarship and Fez
- Harris
- More
11. NSDL-NDR Information Network Overlay
13. NSF Pathways (Cornell/LANL): Challenges, Phase 1
- Current situation
  - Heterogeneous repository systems
  - Heterogeneous object models (or no object model)
  - Multiple protocols and service APIs
  - Services lacking formal interface definitions
- Can these ever play nicely together?
- Need common abstractions
  - Ontology-based information model
  - Ontology-based service model
14. Pathways Vision: Interoperable Information Model
Most things can be represented as a graph of nodes and arcs.
Cornell University and Los Alamos National Lab, http://www.infosci.cornell.edu/pathways
15. Pathways Core Ontology
16. Core-1 Ontology: Article Example
17. Dynamic Service Matching
- The right services married to the right types of complex objects via ontologies (OWL, OWL-S)
- Use cases
  - Interoperable object presentation (e.g., journal overlay)
  - Interoperable content transformation
  - Migration/preservation processes
  - Curation services, other
- Work related to repositories
  - PANIC (Hunter)
  - Pathways InterDisseminator (Cornell/LANL)
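One simple reading of ontology-based service matching is subsumption: a service applies to an object if the object's type is the service's declared input type or a subclass of it. The sketch below assumes that reading. The class hierarchy, service names, and input types are hypothetical stand-ins for OWL class and OWL-S service descriptions, and real matchmaking over OWL-S is considerably richer.

```python
# Hypothetical class hierarchy (child -> parent), standing in for
# subclass assertions in an OWL ontology. Assumed to be acyclic.
SUBCLASS_OF = {
    "JournalArticle": "Article",
    "Article": "ComplexObject",
    "Image": "ComplexObject",
}

# Hypothetical services, each naming the most general object type it
# accepts, in the spirit of an OWL-S input declaration.
SERVICES = {
    "render_overlay": "Article",
    "make_thumbnail": "Image",
    "migrate_format": "ComplexObject",
}


def is_a(cls, ancestor):
    """Walk up the subclass chain; True if cls equals ancestor or descends from it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def matching_services(object_type):
    """Services whose declared input type is the object's type or one of its superclasses."""
    return sorted(name for name, accepts in SERVICES.items() if is_a(object_type, accepts))


print(matching_services("JournalArticle"))  # ['migrate_format', 'render_overlay']
```

A generic preservation service declared on the root type matches everything, while a presentation service declared on `Article` only matches article-like objects; that is the "right services married to right types" behavior in miniature.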
18. Building Block: Repository Integration
19. Summary: Near-term opportunities
- Metadata interoperability
  - Finding objects
- Complex object interoperability
  - Using objects
- Curation and lifecycle
  - Managing objects
20. Summary: Challenges
- Common baseline ontologies that communities rally around
- Ontology weaving
  - Assertions of equivalences are significant
- Standard protocols and query languages for exposing ontology-based views of digital objects across repositories and services
- Scalability and performance of triplestores
  - 200-300M triples currently reported in the field
  - Fedora testing (see tripletest on SourceForge)
  - So, when one triplestore gets too big, unified search over a federation?
    - Distributed graph query is a research topic
- Tools for users
  - To create knowledge assertions
  - To query, navigate, make sense
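Why equivalence assertions are significant in ontology weaving: each one merges two nodes, and merges chain transitively, so a single assertion can connect whole vocabularies. The sketch below computes the resulting equivalence classes with union-find, one common technique for this; the identifiers are hypothetical, and actual OWL equivalence reasoning involves much more than grouping identifiers.

```python
class Equivalences:
    """Union-find over identifiers. Each asserted equivalence merges two
    classes, so equivalence behaves as symmetric and transitive."""

    def __init__(self):
        self._parent = {}

    def _find(self, x):
        # Unseen identifiers start as their own class.
        self._parent.setdefault(x, x)
        while self._parent[x] != x:
            # Path halving keeps the trees shallow.
            self._parent[x] = self._parent[self._parent[x]]
            x = self._parent[x]
        return x

    def assert_same(self, a, b):
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self._parent[ra] = rb

    def same(self, a, b):
        return self._find(a) == self._find(b)


eq = Equivalences()
eq.assert_same("ontA:Author", "ontB:Creator")
eq.assert_same("ontB:Creator", "ontC:Maker")
print(eq.same("ontA:Author", "ontC:Maker"))  # True, although never asserted directly
```

The second assertion silently makes `ontA:Author` and `ontC:Maker` interchangeable, which is why careless equivalence statements can have wide effects on queries over a woven graph.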