Enhancing access to research data: the crystallography challenge

Slides: 26
Provided by: monicaduke

Transcript and Presenter's Notes


1
Enhancing access to research data: the challenge of crystallography
Rachel Heery, Monica Duke, Michael Day (UKOLN, University of Bath)
Leslie Carr, Simon Coles (University of Southampton)
UKOLN is supported by
JCDL 2005, June 7-11, Denver
www.bath.ac.uk
A centre of expertise in digital information management
2
Enhancing access to research data: overview
  • Crystallography as an exemplar
  • Impact of digital technologies on scientific
    research process
  • Need new modes of data curation
  • eBank project: applying digital library
    techniques to support data curation
  • Next steps

3
Changes in scientific research process
  • Increasing data volumes from eScience /
    Grid-enabled / cyber-infrastructure applications,
    big science
  • Changing research methods: high-throughput
    technologies, automation, smart labs
  • Potential for re-use of data, new
    inter-disciplinary research
  • Different types of data (observational,
    experimental, computational) with different
    stewardship requirements

4
The data deluge: crystallography
5
Data overload: the publication bottleneck
[Figure values: 2,000,000; 25,000,000; 300,000]
6
Current Publishing Process
  • Journal articles: aims, ideas, context,
    conclusions; only the most significant data
  • Raw underlying data required by peers is not
    readily available

7
Context: existing data repositories
  • National data archives
      • UK Data Archive, Arts and Humanities Data
        Service, US National Archives and Records
        Administration (NARA), Atlas Datastore
  • Discipline-specific archives
      • GenBank, Protein Data Bank
  • Crystallography archives
      • Cambridge Crystallographic Data Centre (Cambridge
        Structural Database), Indiana University
        Molecular Structure Center (Crystal Data Server,
        Reciprocal Net), FIZ Karlsruhe (Inorganic
        crystals), Toth Information Systems (CHRYSTMET)
  • Journals require deposit of data to support
    articles
      • Typically deposit of summary data; partial
        coverage

8
Crystallography workflow
  • Initialisation: mount new sample on
    diffractometer, set up data collection
  • Collection: collect data
  • Processing: process and correct images
  • Solution: solve structures
  • Refinement: refine structure
  • CIF: produce CIF (Crystallographic Information
    File)
  • Validation: chemical and crystallographic checks
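
The stages above form a fixed linear sequence, which is what makes it feasible to capture data and metadata at each step. A minimal Python sketch of that ordering follows; the names `Stage` and `next_stage` are illustrative assumptions and are not taken from the eBank software.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Workflow stages in the order listed on the slide (names are illustrative)."""
    INITIALISATION = 1
    COLLECTION = 2
    PROCESSING = 3
    SOLUTION = 4
    REFINEMENT = 5
    CIF = 6
    VALIDATION = 7


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None after the final stage."""
    ordered = list(Stage)  # Enum iteration preserves definition order
    position = ordered.index(stage)
    return ordered[position + 1] if position + 1 < len(ordered) else None
```

For example, `next_stage(Stage.REFINEMENT)` gives `Stage.CIF`, and `next_stage(Stage.VALIDATION)` gives `None`.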

9
eBank UK project overview
  • JISC funded in 2003; now in Phase 2, running to 2006
  • Joint effort between crystallographers, computer
    scientists, digital library researchers
  • Investigating contribution of existing digital
    library technologies to enable publication at
    source
  • Partners have interest in dissemination of
    chemistry research data, open access, OAI,
    institutional repositories
  • http://www.ukoln.ac.uk/projects/ebank-uk/

10
eBank project team
  • University of Bath, UKOLN
      • Michael Day, Monica Duke, Rachel Heery, Liz Lyon,
        Traugott Koch
  • University of Southampton, School of Chemistry
      • Simon Coles, Jeremy Frey, Mike Hursthouse
  • University of Southampton, School of Electronics
    and Computer Science
      • Leslie Carr, Chris Gutteridge
  • University of Manchester, PSIgate
      • John Blunden-Ellis

11
eBank phase one achievements
  • Gathered requirements from crystallographers
  • Established pilot institutional repository for
    crystallography data at Southampton with web
    interface
  • Developed a demonstrator aggregator service at
    UKOLN (CCDC exploring aggregation service)
  • Developed appropriate schema
  • Demonstrated a search interface as an embedded
    service at PSIgate portal
  • Demonstrated an added value service linking
    research data to papers (one-off)

12
Institutional repositories: publication at source
  • Institution establishes repository(s)
  • Institution pro-actively supports deposit process
  • OAI provides basis for interoperability
  • Potential for added value services
  • And/or international subject-based archives?

13
Crystallography: a good fit
  • Crystallography has a well-defined data creation
    workflow
  • Tradition of sharing using a standard file format
      • Crystallographic Information File (CIF)
  • What about other chemistry sub-disciplines? Other
    scientific disciplines?
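
Part of what makes CIF easy to share is that it is plain text built from tagged data items. The sketch below reads the simplest case, single-line `_tag value` items, and is not a full CIF parser: `loop_` tables and multi-line values are deliberately skipped. The function name `read_cif_items` and the sample values are made up for illustration.

```python
import shlex


def read_cif_items(text: str) -> dict:
    """Collect single-line `_tag value` data items from CIF text.

    Simplification: loop_ tables and multi-line (semicolon-delimited)
    values are skipped; real applications should use a full CIF parser.
    """
    items = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue  # skips data_ headers, comments, loop_ keywords, loop rows
        try:
            parts = shlex.split(line)  # handles quoted values such as 'C6 H6'
        except ValueError:
            continue  # unbalanced quotes: outside the scope of this sketch
        if len(parts) == 2:
            items[parts[0]] = parts[1]
    return items


# Hand-written sample; the values are illustrative, not measured data.
SAMPLE_CIF = """\
data_example
_chemical_formula_sum 'C6 H6'
_cell_length_a 7.460
_symmetry_space_group_name_H-M 'P b c a'
"""

items = read_cif_items(SAMPLE_CIF)
```

Here `items["_chemical_formula_sum"]` holds the string `C6 H6`; every value is kept as text, so numeric conversion is left to the caller.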

14
Data Flow in eBank UK
[Diagram labels: Create; Data files; Metadata; Institutional repository; OAI-PMH; eBank aggregator; Index and Search]
15
Southampton digital repository
http://ecrystals.chem.soton.ac.uk
16
Access to ALL underlying data
17
OAI-PMH harvesting and aggregating
eBank aggregator at UKOLN: http://eprints-uk.rdn.ac.uk/ebank-demo/
Demonstrating potential for linking between data and journal article
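
Harvesting with OAI-PMH comes down to issuing HTTP requests with a `verb` parameter and parsing the XML that comes back. The sketch below builds a `ListRecords` request and pulls record identifiers out of a response. The endpoint `http://repository.example.org/oai` and the sample response are hypothetical stand-ins, not the actual eBank services; only the `ebank_dc` metadata prefix is taken from the slides.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def record_identifiers(response_xml: str) -> list:
    """Extract the OAI identifier from each record header in a response."""
    root = ET.fromstring(response_xml)
    path = ".//oai:record/oai:header/oai:identifier"
    return [element.text for element in root.findall(path, OAI_NS)]


# Hypothetical endpoint and hand-written response, for illustration only.
url = list_records_url("http://repository.example.org/oai", "ebank_dc")

SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:repository.example.org:1</identifier></header></record>
    <record><header><identifier>oai:repository.example.org:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>
"""

ids = record_identifiers(SAMPLE_RESPONSE)
```

A real harvester would also fetch the URL and follow any `resumptionToken` in the response until the list is complete; both steps are omitted here to keep the example free of network access.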
18
Embedded search service at PSIgate
PSIgate subject gateway: service provider
19
Schema for records made available for harvesting
  • Data holding (collection of files associated with
    experiment)
      • Qualified Dublin Core data elements plus
        additional chemical properties
      • Empirical formula
      • International Chemical Identifier (InChI)
      • Compound class
  • Individual data files
      • Separate records for stage and status of each file
  • Description set wrapped into one XML record using
    METS
  • Research metadata/data as a complex object
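
As a rough illustration of the record described above, the sketch below assembles Dublin Core elements plus the three chemical properties into one XML element. The Dublin Core namespaces are the standard ones; the `CHEM` namespace, the wrapper element name `record`, the chemical element names and all sample values are assumptions made for this example, not the actual eBank schema. The METS wrapping is not shown.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set
DCTERMS = "http://purl.org/dc/terms/"      # Dublin Core terms
CHEM = "http://example.org/chem-terms/"    # hypothetical; not the eBank namespace


def build_record(identifier, formula, inchi, compound_class, eprint_uri):
    """Assemble a flat description of one data holding as an XML element."""
    record = ET.Element("record")  # wrapper element name is an assumption

    def add(namespace, name, value):
        # ElementTree expects qualified names in "{namespace}local" form.
        ET.SubElement(record, f"{{{namespace}}}{name}").text = value

    add(DC, "identifier", identifier)
    add(DC, "type", "CrystalStructure")
    add(DCTERMS, "isReferencedBy", eprint_uri)
    add(CHEM, "empiricalFormula", formula)
    add(CHEM, "inchi", inchi)
    add(CHEM, "compoundClass", compound_class)
    return record


# Illustrative values only.
record = build_record(
    identifier="http://repository.example.org/1",
    formula="C6 H6",
    inchi="InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
    compound_class="organic",
    eprint_uri="http://eprints.example.org/42",
)
xml_text = ET.tostring(record, encoding="unicode")
```

Keeping the chemical properties in their own namespace, separate from the Dublin Core elements, means a harvester that only understands Dublin Core can still read the general description and ignore the rest.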

20
eBank data model
[Diagram; recoverable labels grouped below]
  • Data side: Dataset (three shown), Crystal structure (data
    holding), Crystal structure report (HTML), ebank_dc record
    (XML); dc:type CrystalStructure; dc:identifier
  • Eprint side: Eprint jump-off page (HTML), Eprint manifestation
    (e.g. PDF), Eprint oai_dc record (XML); dc:type Eprint and/or
    Text; dc:identifier
  • Linking: dcterms:references, dcterms:isReferencedBy
  • Deposit into institutional repositories
  • Harvesting by OAI-PMH: ebank_dc (eBank UK aggregator service),
    oai_dc (ePrint UK aggregator service), oai_dc and ebank_dc
    (other aggregators and services)
Model input: Andy Powell, UKOLN.
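
The linking step in this model can be sketched as a small join over harvested records. The example assumes that an eprint points at a dataset with `dcterms:references` and that a dataset points back with `dcterms:isReferencedBy`; that direction, the `Record` class and the function name are assumptions for illustration, not the aggregator's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Record:
    identifier: str  # dc:identifier
    dc_type: str     # dc:type, e.g. "CrystalStructure" or "Eprint"
    references: List[str] = field(default_factory=list)        # dcterms:references
    is_referenced_by: List[str] = field(default_factory=list)  # dcterms:isReferencedBy


def dataset_eprint_links(records: List[Record]) -> List[Tuple[str, str]]:
    """Return sorted, de-duplicated (eprint, dataset) identifier pairs.

    A link counts if either side states it, provided both records
    were harvested and have the expected types.
    """
    by_id = {record.identifier: record for record in records}
    links = set()
    for record in records:
        if record.dc_type == "Eprint":
            for target in record.references:
                if target in by_id and by_id[target].dc_type == "CrystalStructure":
                    links.add((record.identifier, target))
        elif record.dc_type == "CrystalStructure":
            for source in record.is_referenced_by:
                if source in by_id and by_id[source].dc_type == "Eprint":
                    links.add((source, record.identifier))
    return sorted(links)


# Illustrative identifiers only.
harvested = [
    Record("eprint:1", "Eprint", references=["data:1"]),
    Record("data:1", "CrystalStructure", is_referenced_by=["eprint:1"]),
    Record("data:2", "CrystalStructure", is_referenced_by=["eprint:1"]),
    Record("data:3", "CrystalStructure", is_referenced_by=["eprint:99"]),
]
links = dataset_eprint_links(harvested)
```

In this sample, `data:3` produces no link because the eprint it names was never harvested, which is the kind of gap an aggregator has to tolerate.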
21
Creating the metadata
  • Potential to embed deposit and dissemination into
    the chemist's workflow in an automated way

22
Data Collection
23
eBank phase two work areas
  • Sub-disciplines of chemistry and physical
    sciences
  • Pursue generic data model
  • Use of identifiers for citing datasets
  • Subject approach to discovering research data
  • Access to research data in teaching and learning
    context
  • Liaise with other digital repository initiatives

24
For the future
  • Who provides added value services?
      • Authority files, automated subject indexing,
        annotation, data mining, visualisation
  • What are the preservation issues?
      • UK Digital Curation Centre: http://www.dcc.ac.uk
      • National Science Board draft report on long-lived
        data collections:
        http://www.nsf.gov/nsb/meetings/2005/LLDDC_draftreport.pdf
  • How to manage complex object descriptions within
    OAI?
  • Digital curation of research data presents new
    roles for scientists, computer scientists, data
    managers, data scientists

25
Thank you. Comments, questions?
http://www.ukoln.ac.uk/projects/ebank-uk/
Acknowledgement to all project partners for their contributions to this
presentation.