Title: Enhancing access to research data: the crystallography challenge
1 Enhancing access to research data: the challenge of crystallography
Rachel Heery, Monica Duke, Michael Day (UKOLN, University of Bath)
Leslie Carr, Simon Coles (University of Southampton)
JCDL 2005, June 7-11, Denver
UKOLN is supported by
www.bath.ac.uk
A centre of expertise in digital information management
2 Enhancing access to research data: overview
- Crystallography as an exemplar
- Impact of digital technologies on scientific research process
- Need new modes of data curation
- eBank project: applying digital library techniques to support data curation
- Next steps
3 Changes in scientific research process
- Increasing data volumes from eScience / Grid-enabled / cyber-infrastructure applications, big science
- Changing research methods: high-throughput technologies, automation, smart labs
- Potential for re-use of data, new inter-disciplinary research
- Different types of data (observational, experimental, computational) with different stewardship requirements
4 The data deluge: crystallography
5 Data overload: the publication bottleneck
[Figure values: 2,000,000; 25,000,000; 300,000]
6 Current Publishing Process
- Journal articles: aims, ideas, context, conclusions; only most significant data
- Raw underlying data required by peers not readily available
7 Context: existing data repositories
- National data archives
  - UK Data Archive, Arts and Humanities Data Service, US National Archives and Records Administration (NARA), Atlas Datastore
- Discipline specific archives
  - GenBank, Protein Data Bank
- Crystallography archives
  - Cambridge Crystallographic Data Centre (Cambridge Structural Database), Indiana University Molecular Structure Center (Crystal Data Server, Reciprocal Net), FIZ Karlsruhe (Inorganic crystals), Toth Information Systems (CRYSTMET)
- Journals require deposit of data to support articles
  - Typically deposit of summary data; partial coverage
8 Crystallography workflow
- Initialisation: mount new sample on diffractometer, set up data collection
- Collection: collect data
- Processing: process and correct images
- Solution: solve structures
- Refinement: refine structure
- CIF: produce CIF (Crystallographic Information File)
- Validation: chemical and crystallographic checks
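The ordered stages of this workflow could be modelled as an enumeration, for example to record how far a sample has progressed. This is a minimal sketch only: the names `Stage` and `next_stage` are hypothetical and are not taken from any eBank software.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """Workflow stages, numbered in the order the slide lists them."""
    INITIALISATION = 1
    COLLECTION = 2
    PROCESSING = 3
    SOLUTION = 4
    REFINEMENT = 5
    CIF = 6
    VALIDATION = 7


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None after validation."""
    if current is Stage.VALIDATION:
        return None
    return Stage(current + 1)


if __name__ == "__main__":
    # Walk a sample through every stage in order.
    stage: Optional[Stage] = Stage.INITIALISATION
    while stage is not None:
        print(stage.name)
        stage = next_stage(stage)
```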
9 eBank UK project overview
- JISC funded in 2003, now in Phase 2 to 2006
- Joint effort between crystallographers, computer scientists, digital library researchers
- Investigating contribution of existing digital library technologies to enable publication at source
- Partners have interest in dissemination of chemistry research data, open access, OAI, institutional repositories
http://www.ukoln.ac.uk/projects/ebank-uk/
10 eBank project team
- University of Bath, UKOLN
  - Michael Day, Monica Duke, Rachel Heery, Liz Lyon, Traugott Koch
- University of Southampton, School of Chemistry
  - Simon Coles, Jeremy Frey, Mike Hursthouse
- University of Southampton, School of Electronics and Computer Science
  - Leslie Carr, Chris Gutteridge
- University of Manchester, PSIgate
  - John Blunden-Ellis
11 eBank phase one achievements
- Gathered requirements from crystallographers
- Established pilot institutional repository for crystallography data at Southampton with web interface
- Developed a demonstrator aggregator service at UKOLN (CCDC exploring aggregation service)
- Developed appropriate schema
- Demonstrated a search interface as an embedded service at PSIgate portal
- Demonstrated an added value service linking research data to papers (one-off)
12 Institutional repositories: publication at source
- Institution establishes repository(s)
- Institution pro-actively supports deposit process
- OAI provides basis for interoperability
- Potential for added value services
- And/or international subject based archives?
13 Crystallography: good fit
- Crystallography has well defined data creation workflow
- Tradition of sharing using standard file format
  - Crystallographic Information File (CIF)
- What about other chemistry sub-disciplines? Other scientific disciplines?
14 Data Flow in eBank UK
[Diagram labels: Create; Data files; Metadata; Institutional repository; OAI-PMH; eBank aggregator; Index and Search]
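The harvesting step in this flow rests on OAI-PMH requests, which are plain HTTP queries carrying a `verb` and, for record harvesting, a `metadataPrefix`. The sketch below only builds such a request URL. The base URL is a made-up placeholder, since the repository's OAI endpoint is not given in the slides; `oai_dc` and `ebank_dc` are the format names that appear in the data model slide.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used for illustration only.
BASE_URL = "http://repository.example.org/oai"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for one metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # One request per metadata format an aggregator wants to harvest.
    print(list_records_url(BASE_URL))
    print(list_records_url(BASE_URL, "ebank_dc"))
```

A real harvester would also need to follow resumption tokens and handle errors, which this sketch leaves out.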
15 Southampton digital repository
http://ecrystals.chem.soton.ac.uk
16 Access to ALL underlying data
17 OAI-PMH: harvesting and aggregating
eBank aggregator at UKOLN: http://eprints-uk.rdn.ac.uk/ebank-demo/
Demonstrating potential for linking between data and journal article
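To illustrate what an aggregator does with a harvested response, the following sketch pulls the identifier, title and type out of an OAI-PMH `ListRecords` reply in the `oai_dc` format. The sample XML is invented and heavily trimmed, and `summarise` is a hypothetical helper, not code from the eBank aggregator.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response, reduced to the elements read below.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example crystal structure</dc:title>
          <dc:type>CrystalStructure</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def summarise(response_xml: str) -> List[Dict[str, str]]:
    """Extract identifier, title and type from each harvested record."""
    root = ET.fromstring(response_xml.strip())
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": record.findtext(".//dc:title", default="", namespaces=NS),
            "type": record.findtext(".//dc:type", default="", namespaces=NS),
        })
    return records


if __name__ == "__main__":
    for entry in summarise(SAMPLE_RESPONSE):
        print(entry)
```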
18 Embedded search service at PSIgate
PSIgate subject gateway service provider
19 Schema for records made available for harvesting
- Data holding (collection of files associated with experiment)
  - Qualified Dublin Core data elements plus additional chemical properties
    - Empirical formula
    - International Chemical Identifier (InChI)
    - Compound Class
- Individual data files
  - Separate records for stage status of each file
- Description set wrapped into one XML record using METS
- Research metadata/data as a complex object
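As a rough illustration of such a record, the sketch below serialises a data holding as Dublin Core elements plus the three chemical properties. It is not the eBank schema: the `CHEM_NS` namespace, the element names for the chemical properties, and the sample values are all assumptions, and the METS wrapping is omitted.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

DC_NS = "http://purl.org/dc/elements/1.1/"
# Hypothetical namespace; the real schema's namespace is not given in the slides.
CHEM_NS = "http://example.org/ebank-sketch/"


@dataclass
class DataHolding:
    """Descriptive metadata for one crystal structure data holding."""
    identifier: str
    title: str
    empirical_formula: str
    inchi: str
    compound_class: str


def to_xml(holding: DataHolding) -> ET.Element:
    """Serialise a holding as Dublin Core plus chemical property elements."""
    record = ET.Element("record")
    fields = [
        (DC_NS, "identifier", holding.identifier),
        (DC_NS, "title", holding.title),
        (DC_NS, "type", "CrystalStructure"),
        (CHEM_NS, "empiricalFormula", holding.empirical_formula),
        (CHEM_NS, "inchi", holding.inchi),
        (CHEM_NS, "compoundClass", holding.compound_class),
    ]
    for namespace, name, value in fields:
        ET.SubElement(record, f"{{{namespace}}}{name}").text = value
    return record


if __name__ == "__main__":
    # Illustrative values only.
    holding = DataHolding(
        identifier="http://repository.example.org/1",
        title="Example crystal structure",
        empirical_formula="H2O",
        inchi="InChI=1S/H2O/h1H2",
        compound_class="inorganic",
    )
    print(ET.tostring(to_xml(holding), encoding="unicode"))
```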
20 eBank data model
[Diagram labels]
- Objects: Dataset; Crystal structure (data holding); Crystal structure report (HTML); ebank_dc record (XML); Eprint jump-off page (HTML); Eprint manifestation (e.g. PDF); Eprint oai_dc record (XML)
- Services: Institutional repositories; eBank UK aggregator service; ePrint UK aggregator service; Other aggregators and services
- Actions: Deposit; Linking; Harvesting (OAI-PMH: oai_dc, ebank_dc)
- Properties: dc:identifier; dc:type CrystalStructure; dc:type Eprint and/or Text; dcterms:references; dcterms:isReferencedBy
Model input: Andy Powell, UKOLN.
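The linking in this model relies on `dcterms:references` and `dcterms:isReferencedBy`, which Dublin Core defines as inverse relations. The sketch below shows how an aggregator might derive one direction from the other. It assumes that eprint records carry the `references` links to dataset identifiers; the extracted slide does not say which record holds which property, and the identifiers are invented.

```python
from collections import defaultdict
from typing import Dict, List

# Invented records: each eprint lists the dataset identifiers it references.
EPRINT_REFERENCES: Dict[str, List[str]] = {
    "eprint:1": ["dataset:10", "dataset:11"],
    "eprint:2": ["dataset:10"],
}


def invert_references(references: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Derive isReferencedBy links (dataset to eprints) from references links."""
    referenced_by: Dict[str, List[str]] = defaultdict(list)
    for eprint_id, dataset_ids in references.items():
        for dataset_id in dataset_ids:
            referenced_by[dataset_id].append(eprint_id)
    return dict(referenced_by)


if __name__ == "__main__":
    for dataset_id, eprint_ids in invert_references(EPRINT_REFERENCES).items():
        print(dataset_id, "isReferencedBy", eprint_ids)
```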
21 Creating the metadata
- Potential to embed deposit and disseminate into workflow of chemist in automated way
22 Data Collection
23 eBank phase two work areas
- Sub-disciplines of chemistry and physical sciences
- Pursue generic data model
- Use of identifiers for citing datasets
- Subject approach to discovering research data
- Access to research data in teaching and learning context
- Liaise with other digital repository initiatives
24 For the future
- Who provides added value services?
  - Authority files, automated subject indexing, annotation, data mining, visualisation
- What are the preservation issues?
  - UK Digital Curation Centre http://www.dcc.ac.uk
  - National Science Board Draft report on long-lived data collections http://www.nsf.gov/nsb/meetings/2005/LLDDC_draftreport.pdf
- How to manage complex object descriptions within OAI
- Digital curation of research data presents new roles for scientists, computer scientists, data managers, data scientists
25 Thank you. Comments, questions?
http://www.ukoln.ac.uk/projects/ebank-uk/
Acknowledgement to all project partners for their contributions to this presentation.