Title: Long Term Preservation of Scientific Models using Semantic Web Services
1Long Term Preservation of Scientific Models
(using Semantic Web Services)
Jane Hunter, jane_at_dstc.edu.au
2PANIC Project
- Objectives
  - Address the long term preservation and accessibility of (composite) digital objects
- Partners
  - DSTC, UQ
  - NLA, APSR
3Objectives
- Provide an Integrated Preservation Framework which supports
- Large, heterogeneous, distributed collections
- Multiple formats, composite digital objects
- New emerging formats, software, recommendations
- New migration, emulation services
- Recommender services/decision support
- Flexible, Dynamic, Scalable, Extensible
- Sustainable - cost-effective, semi-automated
- Collaborative effort!
4Existing Tools
- JHOVE, NLNZ, PREMINT - Metadata tool
- OCLC's INFORM, Cornell's VRC, SHERPA - Risk assessment -> notification services
- GDFR, PRONOM, DCC-RIR - Format registries
- VersionTracker, IIPC - Software registries
- XENA, TOM - Conversion/migration services
- IBM's UVC (Universal Virtual Computer)
- Koninklijke Bibliotheek - Emulation services
6Steps
- Archival selection and capture of digital object(s) and preservation metadata
- Risk assessment and notification of potential obsolescence
  - New recommendations, formats, software versions
- Service Specification and Request
  - Emulation or Migration
  - Inputs/Outputs
  - Cost
  - Speed
  - Remote/Distributed/Local
  - Reliability
  - Lossiness
- Select, Compose, Invoke Preservation Service
- Record preservation events
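The criteria listed under Service Specification and Request could be captured in a small record before a service is selected. The following is only a sketch: the class and field names, the units and the 0.0 to 1.0 reliability scale are assumptions made for illustration and are not taken from the PANIC implementation. The TIFF 5.0 and JPEG2000 values come from the hypothetical example later in the deck.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    EMULATION = "emulation"
    MIGRATION = "migration"


class Location(Enum):
    REMOTE = "remote"
    DISTRIBUTED = "distributed"
    LOCAL = "local"


@dataclass(frozen=True)
class ServiceRequest:
    """Hypothetical record of the request criteria named on the Steps slide."""
    strategy: Strategy
    input_format: str
    output_format: str
    max_cost: float          # assumed: arbitrary currency units
    max_seconds: float       # assumed: upper bound on processing time per object
    location: Location
    min_reliability: float   # assumed scale: 0.0 (worst) to 1.0 (best)
    allow_lossy: bool

    def validate(self) -> None:
        """Reject values that fall outside the assumed ranges."""
        if not 0.0 <= self.min_reliability <= 1.0:
            raise ValueError("min_reliability must be between 0.0 and 1.0")
        if self.max_cost < 0 or self.max_seconds < 0:
            raise ValueError("max_cost and max_seconds must be non-negative")


request = ServiceRequest(
    strategy=Strategy.MIGRATION,
    input_format="TIFF 5.0",
    output_format="JPEG2000",
    max_cost=10.0,
    max_seconds=60.0,
    location=Location.REMOTE,
    min_reliability=0.9,
    allow_lossy=False,
)
request.validate()
```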
7PANIC Architecture
[Architecture diagram; labelled components: MySQL databases, Sesame RDF Store, Apache AXIS]
8Notification component
The obsolescence detector periodically compares the preservation metadata for each object with registries to determine when an object is at risk of obsolescence
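The comparison performed by the obsolescence detector might look roughly like the sketch below. It is an illustration under stated assumptions: the registry is reduced to a hypothetical in-memory dictionary mapping a format name to a "still supported" flag, formats missing from the registry are treated as at risk, and the periodic scheduling is left out. The object identifiers and the "XYZ 1.0" format are invented.

```python
from typing import Dict, List

# Hypothetical registry snapshot: format name -> True if still supported.
FORMAT_REGISTRY: Dict[str, bool] = {
    "TIFF 5.0": False,
    "JPEG2000": True,
}


def find_at_risk(metadata: Dict[str, str], registry: Dict[str, bool]) -> List[str]:
    """Return the IDs of objects whose recorded format is unsupported
    or unknown to the registry (unknown is assumed to mean at risk)."""
    return sorted(
        object_id
        for object_id, fmt in metadata.items()
        if not registry.get(fmt, False)
    )


# Hypothetical preservation metadata: object ID -> recorded format.
preservation_metadata = {
    "img-001": "TIFF 5.0",
    "img-002": "JPEG2000",
    "img-003": "XYZ 1.0",
}
at_risk = find_at_risk(preservation_metadata, FORMAT_REGISTRY)
print(at_risk)  # ['img-001', 'img-003']
```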
9Invocation component
- Service Discovery: provides a user interface so the collections manager can specify the type of preservation service they are looking for.
- Service Selection: presents the services retrieved by the Discovery Agent for selection.
- Service Invocation: invokes the chosen service and updates the preservation metadata where necessary.
10OWL-S Preservation Extensions
11Discovery component
- Discovery Agent - matches service request against OWL-S descriptions of Preservation Web services
- Returns a ranked list of Preservation Web services that match the request
Sesame RDF Store
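As a rough illustration of matching and ranking, the sketch below filters service descriptions on input and output format and orders the matches by reliability, then cost. Plain Python records stand in for the OWL-S descriptions held in the RDF store; the field names, the ranking rule and the service names are assumptions, not the Discovery Agent's actual matching logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical stand-in for an OWL-S service description."""
    name: str
    input_format: str
    output_format: str
    reliability: float  # assumed scale: 0.0 to 1.0
    cost: float         # assumed: arbitrary currency units


def rank_services(
    input_format: str,
    output_format: str,
    services: List[ServiceDescription],
) -> List[ServiceDescription]:
    """Keep services whose formats match the request, most reliable first,
    with cheaper services winning ties."""
    matches = [
        s for s in services
        if s.input_format == input_format and s.output_format == output_format
    ]
    return sorted(matches, key=lambda s: (-s.reliability, s.cost))


catalogue = [
    ServiceDescription("converter-a", "TIFF 5.0", "JPEG2000", 0.90, 5.0),
    ServiceDescription("converter-b", "TIFF 5.0", "JPEG2000", 0.99, 8.0),
    ServiceDescription("converter-c", "TIFF 5.0", "PNG", 0.95, 1.0),
]
ranked = rank_services("TIFF 5.0", "JPEG2000", catalogue)
print([s.name for s in ranked])  # ['converter-b', 'converter-a']
```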
12Provider component
- Provider Agent either
  - retrieves and invokes the preservation service locally, or
  - invokes the preservation service remotely
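The local-or-remote choice can be sketched as a simple dispatch. This is a simplification under assumptions: a locally available service is modelled as a Python callable, and the remote path is a stub function rather than a real Web service call. All names here are hypothetical.

```python
from typing import Callable, Dict

Converter = Callable[[bytes], bytes]
RemoteCall = Callable[[str, bytes], bytes]


def invoke(
    service_name: str,
    payload: bytes,
    local_services: Dict[str, Converter],
    remote_call: RemoteCall,
) -> bytes:
    """Run the service locally when a local copy exists, otherwise remotely."""
    converter = local_services.get(service_name)
    if converter is not None:
        return converter(payload)
    return remote_call(service_name, payload)


def fake_remote_call(service_name: str, payload: bytes) -> bytes:
    """Stub standing in for a remote Web service invocation."""
    return b"remote:" + payload


# Hypothetical locally installed converter.
local = {"tiff-to-jp2": lambda data: b"local:" + data}

print(invoke("tiff-to-jp2", b"image-bytes", local, fake_remote_call))
print(invoke("pdf-migrate", b"doc-bytes", local, fake_remote_call))
```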
13Hypothetical Example
- Russel Coight is an astronomer at the Australian Telescope National Facility (ATNF)
- Large collection of astronomy images in TIFF 5.0 format
- ImageViewer 1.0 used to view TIFF images
- New version of ImageViewer (2.0) no longer supports TIFF 5.0
- RLG recommends that TIFF format be replaced by JPEG2000 for archival
21eScience Workflow
[Figure: BPEL4WS workflow based on web services, spanning Organization A, Organization B and Organization C, with transitions labelled t1 to t8. Activities: Conduct Experiments, Capture Results, Capture Spectrometry data, Microscopic Images, MatLab Image Processing, Semantic Indexing, Statistical Analysis, Data Exploration / Hypothesis Testing, Initiate New Experiments]
22Scientific Discovery Process
- Inception of the idea
- Analysis of prior work
- Experimental design
- Capture the empirical/observational data
- Analyse, process, interpret and annotate the data
- Visualization of the data
- Formulation of a hypothesis, construction of conceptual and/or numerical models
- Verify and refine the model by comparison with new experimental data
- Document and publish the findings
23Components
- Prior work - pre-existing data, models, hypotheses or publications
- Objective, hypothesis
- Experimental data - instrumental conditions, settings and parametric ranges or constraints, assumptions
- Data sets - numerical data, survey/questionnaire data, images, video, audio, spectral data, real-time sensor data
- Formulae, rules, hypotheses, mathematical functions
- Conceptual models - axioms, models and metaphors
- Software
  - source code, executables, applets or links to web services
- Hardware - instruments and computers
- Visualizations - 2D, 3D imagery, graphs, tables, charts, diagrams, animations
- Textual - publications, reports, documentation, annotations, bibliographies, reviews
24Harmony ABC Model
25Extended ABC Ontology
[Diagram: extended ABC ontology. Labels: description, objectives, parameters, conditions, results; Experiment, Simulation, Processing; Data, Model, Design, Hypothesis, Mapping, Theory; Numerical, Textual, Image, Graphical, Audiovisual]
26Modelling Provenance
[Diagram: provenance modelled as states linked by events: State1, Event E1, State2, Event E2, State3. Labels repeated for each event: Type, Input, Output, Action, Context, Tool, Role, Agent, Date/Time, Place. Other labels: Experiment, Processing, Objectives, Model, Visualization, Scope, Conditions, MatLab]
Agents can be people or software, e.g., web services
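One way to picture the state and event chain in the provenance diagram is as a list of event records that can be walked backwards from a result. The sketch below is illustrative only: the record fields follow the diagram labels, but which type, tool and agent belong to E1 or E2 is an assumption, the agent names are invented, and the chain is assumed to contain no cycles.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical provenance record; field names follow the diagram labels."""
    name: str
    event_type: str
    input_state: str
    output_state: str
    agent: str                       # a person or software, e.g. a web service
    tool: Optional[str] = None
    date_time: Optional[str] = None
    place: Optional[str] = None


def lineage(events: List[Event], state: str) -> List[str]:
    """Walk backwards from a state and list the events that led to it,
    most recent first. Assumes each state is the output of at most one event."""
    by_output = {e.output_state: e for e in events}
    chain = []
    while state in by_output:
        event = by_output[state]
        chain.append(event.name)
        state = event.input_state
    return chain


events = [
    # Assumed assignment of types and tools to the two events in the diagram.
    Event("E1", "Experiment", "State1", "State2", agent="researcher"),
    Event("E2", "Processing", "State2", "State3", agent="analysis service", tool="MatLab"),
]
print(lineage(events, "State3"))  # ['E2', 'E1']
```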
27Scenario
Tools to enable construction, description and publishing of Scientific Model Packages
[Diagram labels: Private Workspace (Project1, Project2, Project3); Shared Workspace (Project1, Project3); RDF; SRB/MCAT; DSpace RDF DataStore; Distributed Databases]
- Institutional Repository
- Scholarly Publications
- Scientific Model Packages
- eLearning Objects
28Model to predict LE of CD-ROMs
[Diagram: RDF Package (Title, Creator, Description, Type) whose components are linked by the relationships image_of, analysis_of, derived_from and refers_to]
Average LE = (1/T) exp(A + B/T)
Each component has software, OS and hardware dependencies and interdependencies
Slattery, O., Lu, R., Zheng, J., Byers, F., Tang, X. "Stability Comparison of Recordable Optical Discs - A Study of Error Rates in Harsh Conditions," Journal of Research of the NIST, 109, 517-524, 2004
29Composite Objects
- Use RDF/XML to package metadata, component objects and relationships
  - METS, MPEG-21 DIDL, XFDU, IMS-CP
- Maintain preservation metadata for
  - Composite object
  - Atomic components
- Maintain index of file formats
- Monitor atomic objects first
  - JPEG -> JPEG-2000
  - PDF -> PDG
- Then check currency of composite objects
  - MPEG-21 DIDL V1 -> MPEG-21 DIDL V2
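The "atomic objects first, then the composite" ordering can be sketched as a depth-first walk over a package. This is a minimal illustration, not the project's code: the class, the function and the successor table are hypothetical, and the table uses only format pairs named on this slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DigitalObject:
    """Hypothetical object record: an atomic file or a composite package."""
    identifier: str
    fmt: str
    components: List["DigitalObject"] = field(default_factory=list)


def plan_migrations(
    obj: DigitalObject, successors: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """List (identifier, old format, new format) steps, visiting atomic
    components before the composite object that wraps them."""
    plan: List[Tuple[str, str, str]] = []
    for component in obj.components:
        plan.extend(plan_migrations(component, successors))
    target = successors.get(obj.fmt)
    if target is not None:
        plan.append((obj.identifier, obj.fmt, target))
    return plan


# Assumed successor table built from the format pairs on the slide.
successors = {
    "JPEG": "JPEG-2000",
    "MPEG-21 DIDL V1": "MPEG-21 DIDL V2",
}
package = DigitalObject(
    "package",
    "MPEG-21 DIDL V1",
    components=[DigitalObject("image", "JPEG"), DigitalObject("notes", "TXT")],
)
plan = plan_migrations(package, successors)
for step in plan:
    print(step)
```

Formats without an entry in the table (here the invented "TXT" component) are simply left out of the plan.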
30Future Directions
- Ongoing refinement and evaluation
  - within real archive/library (PANDORA)
- Integrate - GDFR, PRONOM, TOM, XENA, media longevity estimates, risk assessments
- Trusted services - quality ratings
- Grid Services - Web Services Resource Framework (WSRF)
- Composite services, composite objects
31Reference
- http://metadata.net/panic
- Contact: jane_at_dstc.edu.au