Title: Enabling Interaction and Quality in a Distributed Data DRIS
CRIS 2006, Bergen, Norway, May 11, 2006
- D. Scott Brandt, Associate Dean for Research
- Michael Witt, Senior Research Systems Administrator
- Purdue University Libraries
Background: Purdue University
- Nine colleges: Agriculture; Consumer and Family Sciences; Education; Engineering; Liberal Arts; Management; Pharmacy, Nursing and Health Sciences; Technology; Veterinary Medicine
- 73 departments, several of them cross-disciplinary (e.g., Agricultural and Biological Engineering)
Purdue University Libraries
- 2004 initiative for librarians (who are faculty) to collaborate with other faculty across campus, applying library science knowledge and expertise to various research data problems
- Collect, organize, describe, curate, archive, and disseminate data and information
Strategic directions
- University: interdisciplinary and collaborative endeavors, grounded in the strengths of academic disciplines
- Libraries: Libraries faculty are integrated into the campus research agenda
Areas of research collaboration
- Discovery Learning Center
- Earth and Atmospheric Science
- English
- IT at Purdue
- Mechanical Engineering Technology
- Regenstrief Center
- Agronomy
- Biology
- Cancer Center
- Center for the Environment
- Chemical Engineering
- Chemistry
- Cyber Center
Current areas of participation
- E. coli K-12 Model Organism Resource, NIH proposal (B. Wanner, Biology, PI; D. Scott Brandt, Libraries, Co-PI): create an archival process for the curated database; assist in applying ontologies for data representation and annotation
- An Expert System Multimedia Tutorial for Locating Technical Information, Purdue University TLT Digital Content grant (Megan Sapp, PI; Amy Van Epps and Michael Fosmire, Co-PIs; with Bruce Harding, Mechanical Engineering Technology): develop a tutorial for the MET102 course on using and applying standards
- URL-based Search Interface to the Distributed Institutional Repository, Purdue University Graduate School (Michael Witt, Libraries, PI; Darcy Bullock, Civil Engineering, Co-PI): develop a toolkit to deploy customized searching of dissertations by school, advisor, etc.
- AquaEcon Web Library: An Electronic Resource on Economics-Related Literature on Aquaculture, NOAA (K. Quagrainie, Agricultural Economics, PI; Hal Kirkwood, Libraries, Co-PI): build and populate the database
Progression towards CRIS
- Institutional repository (IR)
- Distributed institutional repository (DIR)
- Interactions related to the DIR, leading to CRIS-like applications
- Leverage the DIR for DRIS/CRIS
Distributed Institutional Repository
[Diagram components: e-prints, archival collections, data archive, native databases, grid resources; OAI data providers; OAI service provider; metadata repository; applications]
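The OAI layer of a DIR rests on harvesting: an OAI service provider asks each OAI data provider for records with the OAI-PMH `ListRecords` verb and keeps following `resumptionToken` values until the list is exhausted. A minimal Python sketch of that loop, assuming a hypothetical `fetch` callable (passed in so the example runs without a network; in practice it might wrap `urllib.request.urlopen`) that returns the XML body for a request URL. Error responses such as `noRecordsMatch` are not handled.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

# XML namespace fixed by the OAI-PMH 2.0 specification
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url: str, token: Optional[str] = None,
                     metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def harvest(base_url: str, fetch: Callable[[str], str]) -> Iterator[str]:
    """Yield record identifiers from one data provider, page by page."""
    token: Optional[str] = None
    while True:
        root = ET.fromstring(fetch(list_records_url(base_url, token)))
        for header in root.iter(OAI_NS + "header"):
            yield header.findtext(OAI_NS + "identifier")
        token_el = root.find(f".//{OAI_NS}resumptionToken")
        # An absent or empty resumptionToken marks the last page of the list
        if token_el is None or not (token_el.text or "").strip():
            break
        token = token_el.text.strip()
```

A service provider would run `harvest` once per data provider and merge the results into its metadata repository.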
A systems-based approach to Libraries supporting research: linear
[Diagram: a linear flow of inputs, experimentation, and outputs, shown with data repositories, document repositories, and the CRIS]
- Data repository: well-described data resulting from research processes is preserved and shared for repurposing
- CRIS: a current research information system links people engaged in research with funding and other resources, such as interdisciplinary collaborators
- Document repository: journal article pre-prints and post-prints, conference and working papers, dissertations, and other e-prints represent research outputs
A systems-based approach to Libraries supporting research: cyclical
[Diagram: a cycle linking the CRIS, the data repository, and the e-print repository]
An example application: SRU (Search/Retrieve via URL)
- Linking to electronic theses and dissertations (ETD)
- URL-based search interface to the DIR, running as a web service
- $16,000 Strategic Development Initiative award for a fellowship and a server
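With SRU, a search is just a URL: a `searchRetrieve` operation carrying a CQL query. A minimal Python sketch of building such a request for the kind of ETD search described above (by school, advisor, etc.); the endpoint `http://example.org/sru` and the index names `etd.school` and `etd.advisor` are hypothetical, not those of the Purdue service.

```python
import urllib.parse
from typing import Dict


def cql_query(clauses: Dict[str, str]) -> str:
    """Join index/value pairs into one CQL query combined with boolean AND.

    Values containing double quotes would need escaping; this sketch skips that.
    """
    return " and ".join(f'{index} = "{value}"' for index, value in clauses.items())


def sru_search_url(base_url: str, query: str, max_records: int = 10,
                   start_record: int = 1) -> str:
    """Build an SRU 1.1 searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,
        "startRecord": str(start_record),
        "maximumRecords": str(max_records),
        "recordSchema": "dc",  # ask for Dublin Core records
    }
    return base_url + "?" + urllib.parse.urlencode(params)


# Hypothetical endpoint and index names, for illustration only
url = sru_search_url("http://example.org/sru",
                     cql_query({"etd.school": "Engineering", "etd.advisor": "Smith"}))
print(url)
```

Because the whole request lives in the URL, a department web page can embed a customized dissertation search as a plain link.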
Getting to the datasets: SRB
- The Storage Resource Broker
- Developed by the San Diego Supercomputer Center
- Uniform access to heterogeneous, distributed storage
- Metadata catalog (MCAT) and preservation functionality
- TeraGrid, in collaboration with Information Technology at Purdue and the Rosen Center for Advanced Computing
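The core idea behind "uniform access" plus a metadata catalog is that clients address data by a logical name, while the catalog records where physical replicas live and what metadata describes them. The toy Python model below illustrates only that idea; it is not the SRB or MCAT API, and the class names, resource names, and paths are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Replica:
    resource: str       # name of a physical storage resource (hypothetical)
    physical_path: str  # where the bytes live on that resource


@dataclass
class Catalog:
    """Toy catalog: logical names map to replicas plus descriptive metadata."""
    replicas: Dict[str, List[Replica]] = field(default_factory=dict)
    metadata: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def register(self, logical: str, replica: Replica, **attrs: str) -> None:
        """Record one more replica of a logical object and merge its metadata."""
        self.replicas.setdefault(logical, []).append(replica)
        self.metadata.setdefault(logical, {}).update(attrs)

    def resolve(self, logical: str, available: Set[str]) -> Replica:
        """Return the first replica whose storage resource is currently available."""
        for replica in self.replicas.get(logical, []):
            if replica.resource in available:
                return replica
        raise LookupError(f"no available replica for {logical}")


catalog = Catalog()
name = "/zone/home/project/readings.csv"
catalog.register(name, Replica("disk-a", "/vol1/r.csv"), creator="lab")
catalog.register(name, Replica("tape-b", "/arch/r.csv"))
print(catalog.resolve(name, {"tape-b"}).physical_path)
```

The client never needs to know whether a replica sits on disk or tape; replication across resources is also what supports preservation.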
An example systems interaction
- OAISRB provides an OAI-PMH interface to the SRB
to expose metadata from resources on a data grid
to OAI service providers
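On the receiving side, an OAI service provider sees each grid resource exposed by OAISRB as an OAI-PMH `<record>` whose metadata is unqualified Dublin Core (`oai_dc`). A minimal Python sketch of flattening one such record into element/value lists; the namespaces are those defined by OAI-PMH 2.0 and Dublin Core, while the sample values in any test record are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespaces defined by OAI-PMH 2.0, the oai_dc schema, and Dublin Core
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
DC_PREFIX = "{" + NS["dc"] + "}"


def dc_fields(record_xml: str) -> Dict[str, List[str]]:
    """Collect the Dublin Core elements of one OAI-PMH <record> into lists of values."""
    record = ET.fromstring(record_xml)
    fields: Dict[str, List[str]] = {}
    container = record.find("oai:metadata/oai_dc:dc", NS)
    if container is None:  # e.g. a deleted record carries no metadata
        return fields
    for element in container:
        if element.tag.startswith(DC_PREFIX):
            name = element.tag[len(DC_PREFIX):]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields
```

Keeping repeated elements as lists matters: Dublin Core allows any element, such as `subject`, to occur more than once.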
Sample OAISRB config
# OAI Handler Base URL Format
OAIHandler.baseURL=http://128.210.126.231:8080/OAISRB/OAIHandler
# SRB Connection Parameters
SRB.HOST=orion.sdsc.edu
SRB.PORT=7620
SRB.USERNAME=mwitt
SRB.PASSWORD=nyah
SRB.HOMEDIRECTORY=/dspace/home/mwitt.purdue
SRB.MDASDOMAINNAME=purdue
SRB.DEFAULTSTORAGERESOURCE=dspace-fs1
SRB.MCATZONE=dspace
# SRB Collection Count and SRB Collection Names
SRB.root=/TGzone/home/lars.itap
SRB.maxcollections=1
SRB.collection1=LARSDATA
# Custom Parameters for SRB GRID
SRBRecordFactory.repositoryIdentifier=mwitt.purdue
Display.MaxListSize=50
# Custom Identify response values
Identify.repositoryName=SRB Data Grid
Identify.adminEmail=mailto:mwitt_at_purdue.edu
Identify.earliestDatestamp=2000-01-01T00:00:00Z
Identify.deletedRecord=no
# Crosswalk (in this example, FGDC-to-unqualified Dublin Core)
DC.Identifier=title
DC.Description=purpose
DC.Title=title
DC.Format=File Format
DC.Creator=address
DC.Subject=metprof
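The crosswalk at the end of the sample config can be read as "Dublin Core element = source field". A minimal Python sketch of applying it, assuming the entries take the `key=value` form of a Java properties file and that each `DC.*` value names the FGDC field feeding that element; the FGDC-style record values are hypothetical. The sketch also reports source fields that no mapping uses, which makes the loss in a crosswalk visible.

```python
from typing import Dict, List, Tuple


def parse_properties(text: str) -> Dict[str, str]:
    """Minimal key=value reader: skips blank lines and '#' comments (no escapes)."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def crosswalk(props: Dict[str, str],
              source: Dict[str, str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Map source fields to DC via DC.* entries; also list source fields left unmapped."""
    mapping = {k[len("DC."):]: v for k, v in props.items() if k.startswith("DC.")}
    dc: Dict[str, List[str]] = {}
    for dc_element, source_field in mapping.items():
        if source_field in source:
            dc.setdefault(dc_element, []).append(source[source_field])
    used = set(mapping.values())
    dropped = [name for name in source if name not in used]
    return dc, dropped


CROSSWALK = """\
# Crosswalk (in this example, FGDC-to-unqualified Dublin Core)
DC.Identifier=title
DC.Description=purpose
DC.Title=title
DC.Format=File Format
DC.Creator=address
DC.Subject=metprof
"""
# Hypothetical FGDC-style source record, keyed by field name
fgdc = {"title": "Stream sampling", "purpose": "Monitor nitrate levels",
        "metprof": "Water quality", "accconst": "None"}
dc, dropped = crosswalk(parse_properties(CROSSWALK), fgdc)
print(dc, dropped)
```

Note that this mapping sends `title` to both `DC.Identifier` and `DC.Title`, and any FGDC field without a `DC.*` entry (here `accconst`) never reaches the Dublin Core record.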
Metadata research
- A metadata librarian spent four months analyzing metadata needs and processes for several data sets
- Results included Dublin Core (DC) descriptions enhanced with thesaurus headings, and a basic crosswalk
- Also: creating metadata descriptions from scratch is too labor-intensive
Metadata: Water Quality
- A flat file with only system metadata
- Began with Dublin Core
- Enhanced subjects with the thesaurus from the NAL (US National Agricultural Library)
- Looked at DIF (Directory Interchange Format)
- Looked at a crosswalk with the FGDC (Federal Geographic Data Committee) format
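Enhancing subjects with a thesaurus amounts to replacing free-text keywords with the thesaurus's preferred terms. A minimal Python sketch, assuming a hypothetical lookup table from entry terms to preferred terms; a real workflow would load these relationships from the NAL thesaurus itself rather than from the made-up entries used in any example.

```python
from typing import Dict, List


def enhance_subjects(keywords: List[str], thesaurus: Dict[str, str]) -> List[str]:
    """Replace keywords with preferred thesaurus terms, keeping unmatched keywords.

    Matching is case-insensitive; duplicates are dropped while preserving order.
    """
    lookup = {entry.lower(): preferred for entry, preferred in thesaurus.items()}
    subjects: List[str] = []
    for keyword in keywords:
        cleaned = keyword.strip()
        term = lookup.get(cleaned.lower(), cleaned)
        if term not in subjects:
            subjects.append(term)
    return subjects
```

Keeping unmatched keywords, rather than discarding them, leaves a trail for a cataloger to review.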
Next steps: Metadata
- Articulate a metadata workflow that embeds metadata into the process
- Review automating all data
- Determine how and where to validate and automate descriptive metadata
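One place validation could sit is an automated check run on each description before it is exposed. A minimal Python sketch, assuming a hypothetical local policy: a fixed set of required Dublin Core elements, and dates restricted to the `YYYY-MM-DD` or `YYYY-MM-DDThh:mm:ssZ` forms that OAI-PMH uses for datestamps.

```python
import re
from typing import Dict, List

# Hypothetical local policy: elements every dataset description must carry
REQUIRED = ("title", "creator", "subject", "description", "date")
# Hypothetical local rule: dates in the two granularities OAI-PMH uses for datestamps
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}Z)?$")


def validate(record: Dict[str, List[str]]) -> List[str]:
    """Return problems found in one DC record; an empty list means it passes."""
    problems = [f"missing {name}" for name in REQUIRED
                if not any(v.strip() for v in record.get(name, []))]
    for value in record.get("date", []):
        if not DATE_RE.match(value):
            problems.append(f"bad date {value!r}")
    return problems
```

Returning a list of problems, rather than a yes/no answer, lets a periodic report show which records fail and why.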
Conclusions and Questions
- Use existing, native metadata whenever possible
- Automate and periodically assess processes to ensure quality
- Diminishing returns: we settled on discovery and collection-level metadata
- Crosswalks are useful but can truncate or distort the original meaning
- The importance of interactions, among people and among systems
- How do we implement CRIS/CWIS/DRIS in our environment?
- What is the role of the Libraries in this?
Takk (thank you)
- Michael Witt
- mwitt_at_purdue.edu
- D. Scott Brandt
- techman_at_purdue.edu