Title: Digital Repository Projects at the North Carolina State University Libraries
1 Digital Repository Projects at the North Carolina State University Libraries
- James Jackson Sanborn
- Jim Tuttle
- Open Repositories/DSpace User Group 07
2 Early Repository Planning
- Digital Repository Planning Committee
- What it wouldn't be (at least to start)
- Distributed community structure
- Open submission
- Institutional Repository
- What it would be (at least to start)
- Library-managed collections
- Building block for campus partnership
- Learning opportunity
3 Repository Building Blocks
- NCSU Electronic Theses and Dissertations
- Started 1997
- Mandatory since 2002
- Virginia Tech's ETD-db
- 3,000 ETDs
- NCSU Authors Database
- Started 1995
- Access Database/ColdFusion front-end
- 22,000 citations
4 Repository Building Blocks (cont'd)
- Technical Reports Print Collection
- Campus Institutes and Departments
- Massive fall-off in print distribution
- Special Collections Resource Center
- Digitized texts and photographs
- Campus Newsletters
- GIS Data
- Library managed/acquired data collection
- Homegrown data layer database/discovery tools
5 Repository Plan
- Target Research collections first
- Technical Reports
- ETDs
- Faculty Publications/Citations
- Treat each collection as its own project
- Actively pursue common technological solutions
6 Technical Reports
- DSpace Application
- Lightly Customized
- Library Harvested
- Local Cataloging/Metadata database
- Scripted Ingest Object Creation
- Batch Ingest
- Mix of ongoing submission by institute/departmental personnel and Library capture
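The "Scripted Ingest Object Creation" and "Batch Ingest" steps can be illustrated with a minimal sketch. It assumes the ingest objects follow DSpace's Simple Archive Format (one directory per item holding a `dublin_core.xml` file and a `contents` file listing the bitstreams). The function name `write_item` and the sample metadata are hypothetical, and copying the actual files into the item directory is left out.

```python
from pathlib import Path
from xml.etree import ElementTree as ET


def write_item(batch_dir, item_name, metadata, file_names):
    """Create one Simple Archive Format item directory (illustrative only).

    metadata:   list of (element, qualifier, value) Dublin Core triples;
                a qualifier of None is written as "none".
    file_names: names of the bitstream files that belong to the item.
    """
    item_dir = Path(batch_dir) / item_name
    item_dir.mkdir(parents=True, exist_ok=True)

    # dublin_core.xml: one <dcvalue> element per metadata triple.
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        dcvalue = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier or "none"
        )
        dcvalue.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # contents: one bitstream file name per line.
    (item_dir / "contents").write_text(
        "\n".join(file_names) + "\n", encoding="utf-8"
    )
    return item_dir
```

A script could call `write_item` once per catalog record and then hand the resulting batch directory to the DSpace batch importer.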
7 Tech Rep Screenshot
8 Technical Reports Item Detail
9 Electronic Theses and Dissertations
- Partnership with Graduate School
- Hybrid System: DSpace and ETD-db
- ETD-db submission/approval/management
- Direct database extract for DSpace Ingest Object creation
- Scheduled Batch Ingest process
- DSpace Considerations/Alterations
- Metadata Mapping
- Author Browse (exclude contributor.advisor)
- Various interface changes
10 ETD-db screenshot
11 ETD DSpace screenshot
12 Faculty Publications
- Built on Existing Author Database
- Rebuilt Authors DB from Access/ColdFusion to Oracle/PHP
- Re-modeled data
- Added Functionality
- OpenURL
- Vita-like citation display
- Full-text or submission links
- Full-text stored in DSpace
- Citation metadata and file exported by script
- DSpace Identifier currently manually entered
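As an illustration of the OpenURL functionality listed above, the sketch below builds an OpenURL 0.1 style link from a citation record. The resolver address `RESOLVER_BASE`, the function name, and the shape of the citation dictionary are assumptions made for this example, not details of the NCSU system.

```python
from urllib.parse import urlencode

# Hypothetical link-resolver address; a real deployment would use its own.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

# Common OpenURL 0.1 keys, emitted in a fixed order.
OPENURL_KEYS = (
    "genre", "aulast", "atitle", "title",
    "volume", "issue", "spage", "date", "issn",
)


def citation_to_openurl(citation, base=RESOLVER_BASE):
    """Return a resolver link containing only the fields the citation has."""
    params = [(key, str(citation[key])) for key in OPENURL_KEYS if citation.get(key)]
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    example = {"genre": "article", "aulast": "Smith", "atitle": "Soil Study", "date": 2005}
    print(citation_to_openurl(example))
```

Empty or missing fields are skipped, so a citation with incomplete metadata still yields a usable link.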
13 Faculty Publications Schematic
- [Diagram; labels recovered from the slide, in extraction order]
- Scholar; Submit Citations and/or Text; View full-text; SR Citations
- Web interface (PHP); DSpace Item Display; Web Submission Form
- PostgreSQL (metadata); DSpace Java/JSP (full-text only); Oracle Faculty Publications DB (citations)
- Handle IDs; File System (files); Access
- ISI, Ann. Reps, etc.; Add/Edit data; Cataloging and Coll. Mgt.
14 FacPubs Search Screen
15 FacPubs result screenshot
16 FacPubs Item screenshot
17 Repository Governance
- Internal
- Digital Repository Planning Committee
- Data Repository Architect
- External
- Faculty Repository Advisory Committee
- Partnerships with departments and institutes
18 NCGDAP Overview
- NDIIPP: National Digital Information Infrastructure and Preservation Program
- Collaboration with Library of Congress
- 1 of 8 three-year projects to study long-term (50 years) digital preservation
- Objective: engage existing state/federal geospatial data infrastructures in preservation
- Project approaches: Technical and Social
19 Repository Requirements
- Dim archive with possible future access
- minimal IR/access component
- Minimal repository imprint on data
- repository agnostic ingest and export
- Simple digital curation functions
- Periodic MD5 checksum validation
- Structured metadata index
- Expected archived-data exchange
- Leverage existing investments
- Free Software with active community
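The "Periodic MD5 checksum validation" requirement can be sketched as a manifest that is built at ingest time and re-checked on a schedule. This is a minimal illustration using only the Python standard library. The function names and the in-memory manifest format are assumptions, not the project's actual implementation.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): md5_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def validate(root, manifest):
    """Return (relative_path, problem) pairs for missing or altered files."""
    root = Path(root)
    problems = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            problems.append((relative_path, "missing"))
        elif md5_of(target) != expected:
            problems.append((relative_path, "checksum mismatch"))
    return problems
```

A scheduled job could store the manifest alongside the archive package and report any non-empty result from `validate`.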
20 Automation: Threat and format analysis, validation
- Python wrappers for the following:
- Anti-virus: ClamAV
- Compressed files (tar, zip, gzip, bzip)
- At-risk formats
- Executable files (magic numbers)
- JHOVE validation
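The magic-number checks above can be illustrated with a small classifier that inspects the leading bytes of a file. This is a sketch only. The signature table is a short, hypothetical selection, and the ClamAV and JHOVE steps are external tools that are not invoked here.

```python
# Leading-byte signatures for a few executable and compressed formats.
# Two-byte signatures such as "MZ" can match unrelated data, so a hit is
# a flag for closer inspection rather than a definitive identification.
SIGNATURES = [
    (b"\x7fELF", "executable (ELF)"),
    (b"MZ", "executable (DOS/Windows)"),
    (b"PK\x03\x04", "compressed (zip)"),
    (b"\x1f\x8b", "compressed (gzip)"),
    (b"BZh", "compressed (bzip2)"),
]

# tar headers carry the string "ustar" at byte offset 257.
TAR_MAGIC_OFFSET = 257


def classify_bytes(head):
    """Classify a file from its first bytes (at least 262 for tar detection)."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    if head[TAR_MAGIC_OFFSET:TAR_MAGIC_OFFSET + 5] == b"ustar":
        return "archive (tar)"
    return "unrecognized"


def classify_file(path):
    """Read the first 512 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return classify_bytes(handle.read(512))
```

Files flagged as compressed could then be unpacked and re-scanned, and files flagged as executable set aside for manual review.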
21 Automation: Archive package organization
- ESRI ArcGIS toolbar for selected formats
22 Automation: Archive package organization
- Rule-based Python logic
- filestem
- extension relationships (multi-file format validation)
- directory structure
- Manual intervention
- NOID assignment
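The filestem and extension-relationship rules can be sketched as follows, using the ESRI shapefile as the multi-file format: a `.shp` file needs `.shx` and `.dbf` companions that share its filestem. The rule table and function names are illustrative assumptions, and a real rule set would cover more formats.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Primary extension -> companion extensions that must share its filestem.
RULES = {
    ".shp": {".shx", ".dbf"},
}


def group_by_filestem(names):
    """Map each filestem (path without its last suffix) to the extensions seen."""
    groups = defaultdict(set)
    for name in names:
        path = PurePosixPath(name)
        groups[str(path.with_suffix(""))].add(path.suffix.lower())
    return dict(groups)


def missing_companions(names):
    """Return {filestem: [missing extensions]} for incomplete multi-file sets."""
    report = {}
    for stem, extensions in group_by_filestem(names).items():
        for primary, required in RULES.items():
            if primary in extensions:
                missing = sorted(required - extensions)
                if missing:
                    report[stem] = missing
    return report
```

Anything reported here would fall under the "Manual intervention" step before the package is assigned an identifier.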
23 Metadata: Seed file form
- 'Transfer set' metadata capture in 'Seed file'
- Communicates with DSpace backend, generates XML used to inform later scripts
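A seed file of this kind could look like the sketch below: transfer-set metadata is written to XML once and read back by later scripts. The element names (`seed`, `agency`, `collection_handle`, `transfer_date`) are hypothetical placeholders, and the communication with the DSpace backend is not shown.

```python
from xml.etree import ElementTree as ET


def build_seed(fields):
    """Serialize transfer-set metadata as a flat XML seed document.

    fields: dict mapping element names (valid XML names) to text values.
    """
    root = ET.Element("seed")
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def read_seed(xml_text):
    """Parse a seed document back into a dict for use by later scripts."""
    return {child.tag: child.text for child in ET.fromstring(xml_text)}


if __name__ == "__main__":
    seed_xml = build_seed({
        "agency": "Example County GIS",       # hypothetical values
        "collection_handle": "1234/56",
        "transfer_date": "2007-01-15",
    })
    print(read_seed(seed_xml))
```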
24 Metadata: Communities and Collections
- Search by type for 100 communities
- Facilitates creation and reduces errors
25 Curation Processing
- At-risk format migration, original retained
- Agency-specific XML templates in ArcCatalog with synchronization flags
- Provenance and curation metadata scripted
26 Source Metadata Translation
- Repository agnostic approach
- Spokes for each transformation
- Facilitates export from DSpace into other repositories
- Generate DSpace QDC, METS; populate Workflow database
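The hub-and-spoke idea can be sketched as a registry of transformation functions that all read the same neutral record. The record shape, the spoke names, and the output formats below are assumptions for illustration. A METS spoke would register in the same way and is omitted for brevity.

```python
from xml.etree import ElementTree as ET

# Registry of spokes: each maps the neutral hub record to one target format.
SPOKES = {}


def spoke(name):
    """Decorator that registers a transformation under the given name."""
    def register(func):
        SPOKES[name] = func
        return func
    return register


@spoke("dspace_qdc")
def to_dspace_qdc(record):
    """Render (element, qualifier, value) triples as DSpace qualified Dublin Core."""
    root = ET.Element("dublin_core")
    for element, qualifier, value in record:
        node = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier or "none"
        )
        node.text = value
    return ET.tostring(root, encoding="unicode")


@spoke("workflow_row")
def to_workflow_row(record):
    """Flatten the record into a dict suitable for a tracking-database row."""
    row = {}
    for element, qualifier, value in record:
        key = element if qualifier is None else f"{element}_{qualifier}"
        row.setdefault(key, value)  # keep the first value for repeated fields
    return row


def run_spokes(record, names=None):
    """Apply the selected spokes (all by default) to one hub record."""
    return {name: SPOKES[name](record) for name in (names or SPOKES)}
```

Because each spoke depends only on the hub record, exporting to another repository means adding one more registered function rather than changing the ingest pipeline.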
27 Extra-repository AIP management
- Workflow Management Database (WMD) populated as a spoke on the metadata/ingest hub
- External tracking of NOID, Handle, ISO keywords, other metadata for interaction with other systems
- Integrates with existing GIS Lookup tool
28 Repository Architecture Overview
- [Diagram; labels recovered from the slide]
- PostgreSQL: one shared username, separate database for each app
- Repository Tomcat instance
- Tomcat: DSpace Internal
- Faculty Publications: PHP/DSpace hybrid
- NDIIPP (DSpace)
- SCRC (DSpace)
- Repository (DSpace): Technical Reports, ETDs
- Collections (DSpace), SCRC: Course Catalogs, Green N Growing
- Asset Store / ATABeast (sub-directory for each DSpace app)
29 Upcoming Repository Related Projects
- Enhancements to current system
- XTF search interface
- Inter-archive exchange
- Digital Collections Repository
- Special Collections Research Center
- Other non-faculty collections
- Data Repository
- Scientific data
- Statistical resources
30 For More Information
- James Jackson Sanborn
- james_sanborn_at_ncsu.edu
- Jim Tuttle
- jim_tuttle_at_ncsu.edu