Digital Repository Projects at the North Carolina State University Libraries

Learn more at: https://www.lib.ncsu.edu
1
Digital Repository Projects at the North
Carolina State University Libraries
  • James Jackson Sanborn
  • Jim Tuttle
  • Open Repositories/DSpace User Group 07

2
Early Repository Planning
  • Digital Repository Planning Committee
  • What it wouldn't be (at least to start)
  • Distributed community structure
  • Open submission
  • Institutional Repository
  • What it would be (at least to start)
  • Library-managed collections
  • Building block for campus partnership
  • Learning opportunity

3
Repository Building Blocks
  • NCSU Electronic Theses and Dissertations
  • Started 1997
  • Mandatory since 2002
  • Virginia Tech's ETD-db
  • 3,000 ETDs
  • NCSU Authors Database
  • Started 1995
  • Access Database/ColdFusion front-end
  • 22,000 citations

4
Repository Building Blocks (cont'd)
  • Technical Reports Print Collection
  • Campus Institutes and Departments
  • Massive fall-off in print distribution
  • Special Collections Resource Center
  • Digitized texts and photographs
  • Campus Newsletters
  • GIS Data
  • Library managed/acquired data collection
  • Homegrown data layer database/discovery tools

5
Repository Plan
  • Target Research collections first
  • Technical Reports
  • ETDs
  • Faculty Publications/Citations
  • Treat each collection as its own project
  • Actively pursue common technological solutions

6
Technical Reports
  • DSpace Application
  • Lightly Customized
  • Library Harvested
  • Local Cataloging/Metadata database
  • Scripted Ingest Object Creation
  • Batch Ingest
  • Mix of ongoing submission by institute/departmental
    personnel and Library capture
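
The scripted ingest object creation step might look roughly like the sketch below. It assumes the batch is written in DSpace's Simple Archive Format, where each item is a directory holding a `dublin_core.xml` file, a `contents` file, and the bitstreams. The function name, the shape of the metadata tuples, and the directory names are illustrative assumptions, not details taken from the NCSU scripts.

```python
import shutil
from pathlib import Path
from xml.etree import ElementTree as ET


def write_saf_item(item_dir, metadata, files):
    """Write one Simple Archive Format item directory.

    metadata: list of (element, qualifier, value); qualifier may be None.
    files: paths of bitstreams to copy into the item directory.
    (Hypothetical helper; the real cataloging database schema is unknown.)
    """
    item_dir = Path(item_dir)
    item_dir.mkdir(parents=True, exist_ok=True)

    # dublin_core.xml holds one <dcvalue> per metadata value.
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        dcvalue = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier or "none"
        )
        dcvalue.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # The contents file lists one bitstream file name per line.
    names = []
    for src in files:
        src = Path(src)
        shutil.copy(src, item_dir / src.name)
        names.append(src.name)
    (item_dir / "contents").write_text("\n".join(names) + "\n", encoding="utf-8")
    return item_dir
```

A directory of such items can then be handed to the DSpace batch importer in a single run.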

7
Tech Rep Screenshot
8
Technical Reports Item Detail
9
Electronic Theses and Dissertations
  • Partnership with Graduate School
  • Hybrid System: DSpace and ETD-db
  • ETD-db submission/approval/management
  • Direct database extract for DSpace Ingest Object
    creation
  • Scheduled Batch Ingest process
  • DSpace Considerations/Alterations
  • Metadata Mapping
  • Author Browse (exclude contributor.advisor)
  • Various interface changes
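
The metadata mapping and the author-browse exclusion could be sketched as follows. The ETD-db column names in `FIELD_MAP` are hypothetical placeholders; the target names are standard DSpace qualified Dublin Core fields. This only illustrates the idea of keeping advisors out of the author browse and is not the actual DSpace alteration.

```python
# Hypothetical ETD-db column names mapped to qualified Dublin Core fields.
FIELD_MAP = {
    "title": "title",
    "author_name": "contributor.author",
    "advisor_name": "contributor.advisor",
    "abstract": "description.abstract",
    "degree_date": "date.issued",
}


def map_record(etd_row):
    """Translate one ETD-db row (a dict) into (dc_field, value) pairs."""
    pairs = []
    for column, dc_field in FIELD_MAP.items():
        value = etd_row.get(column)
        if not value:
            continue
        # Allow repeatable fields such as multiple advisors.
        values = value if isinstance(value, list) else [value]
        pairs.extend((dc_field, v) for v in values)
    return pairs


def author_browse_names(pairs, excluded=("contributor.advisor",)):
    """Names for an author browse: contributor.* fields minus excluded ones."""
    return sorted(
        {v for f, v in pairs if f.startswith("contributor.") and f not in excluded}
    )
```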

10
ETD-DB screenshot
11
ETD DSpace screenshot
12
Faculty Publications
  • Built on Existing Author Database
  • Rebuilt Authors DB from Access/ColdFusion to Oracle/PHP
  • Re-modeled data
  • Added Functionality
  • OpenURL
  • Vita-like citation display
  • Full-text or submission links
  • Full-text stored in DSpace
  • Citation metadata and file exported by script
  • DSpace Identifier currently manually entered
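
As an illustration of the OpenURL functionality, the sketch below builds an OpenURL 1.0 (Z39.88-2004) key/encoded-value link for a journal article citation. The resolver base URL and the citation dictionary keys are assumptions; the production system was written in PHP, and Python is used here only for illustration.

```python
from urllib.parse import urlencode

# Hypothetical citation keys mapped to OpenURL journal-format KEV keys.
KEY_MAP = {
    "article_title": "rft.atitle",
    "journal": "rft.jtitle",
    "author_last": "rft.aulast",
    "author_first": "rft.aufirst",
    "year": "rft.date",
    "volume": "rft.volume",
    "issue": "rft.issue",
    "start_page": "rft.spage",
    "issn": "rft.issn",
}


def journal_openurl(resolver_base, citation):
    """Return an OpenURL 1.0 KEV link for a journal article citation."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    for key, kev in KEY_MAP.items():
        value = citation.get(key)
        if value:  # skip fields the citation does not have
            params.append((kev, str(value)))
    return resolver_base + "?" + urlencode(params)
```

For example, `journal_openurl("https://resolver.example.edu/openurl", {"article_title": "Batch ingest", "year": 2007})` yields a link that an institutional link resolver could use to find full text.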

13
Faculty Publications Schematic
Scholar
Submit Citations and/or Text

View full-text
SR Citations
Web interface (PHP)
DSpace Item Display
Web Submission Form
PostgreSQL (metadata)
DSpace Java/JSP (full-text only)
Oracle Faculty Publications DB (citations)
Handle IDs
File System (files)
Access
ISI Ann. Reps Etc.
Add/Edit data
Cataloging and Coll. Mgt.
14
FacPubs Search Screen
15
FacPubs result screenshot
16
FacPubs Item screenshot
17
Repository Governance
  • Internal
  • Digital Repository Planning Committee
  • Data Repository Architect
  • External
  • Faculty Repository Advisory Committee
  • Partnerships with departments and institutes

18
NCGDAP Overview
  • NDIIPP: National Digital Information
    Infrastructure and Preservation Program
  • Collaboration with Library of Congress
  • One of eight three-year projects to study long-term
    (50 years) digital preservation
  • Objective: engage existing state/federal
    geospatial data infrastructures in preservation
  • Project approaches: Technical and Social

19
Repository Requirements
  • Dim archive with possible future access
  • minimal IR/access component
  • Minimal repository imprint on data
  • repository agnostic ingest and export
  • Simple digital curation functions
  • Periodic MD5 checksum validation
  • Structured metadata index
  • Expected archived-data exchange
  • Leverage existing investments
  • Free Software with active community
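
Periodic MD5 checksum validation can be sketched with the standard library alone. The manifest format used here (relative path mapped to expected hex digest) is an assumption for illustration; the slides do not say how the project stored its checksums.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_manifest(root, manifest):
    """Compare files under root against expected digests.

    manifest: dict of relative path -> expected MD5 hex digest (assumed format).
    Returns a dict of relative path -> "ok", "changed", or "missing".
    """
    report = {}
    for rel_path, expected in manifest.items():
        target = Path(root) / rel_path
        if not target.is_file():
            report[rel_path] = "missing"
        elif md5_of(target) != expected.lower():
            report[rel_path] = "changed"
        else:
            report[rel_path] = "ok"
    return report
```

Running such a check on a schedule and logging anything that is not "ok" gives a simple fixity audit.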

20
Automation: Threat and format analysis, validation
  • Python wrappers for the following
  • Anti-virus: ClamAV
  • Compressed files (tar, zip, gzip, bzip)
  • At-risk formats
  • Executable files (magic numbers)
  • JHOVE validation
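
The magic-number checks for executables and compressed files could look something like this minimal sketch. The signature table is a small, well-known subset chosen for illustration. It is not the project's actual list, and it does not cover the ClamAV or JHOVE wrappers.

```python
# (offset, magic bytes, label): a small illustrative subset of signatures.
SIGNATURES = [
    (0, b"\x7fELF", "executable (ELF)"),
    (0, b"MZ", "executable (DOS/Windows)"),
    (0, b"PK\x03\x04", "compressed (zip)"),
    (0, b"\x1f\x8b", "compressed (gzip)"),
    (0, b"BZh", "compressed (bzip2)"),
    (257, b"ustar", "archive (tar)"),
]


def classify(path):
    """Return a label if the file header matches a known signature, else None."""
    with open(path, "rb") as handle:
        header = handle.read(512)
    for offset, magic, label in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return label
    return None
```

Files that come back as executables can be flagged for review, and compressed files can be routed to an unpacking step before further analysis.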

21
Automation: Archive package organization
  • ESRI ArcGIS toolbar for selected formats

22
Automation: Archive package organization
  • Rule-based Python logic
  • filestem
  • extension relationships (multi-file format
    validation)
  • directory structure
  • Manual intervention
  • NOID assignment
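
A minimal sketch of the filestem and extension-relationship rules: group files by stem, then check that multi-file formats are complete. The only rule shown is the shapefile requirement of `.shp`, `.shx`, and `.dbf` files sharing one stem. The rule table and function names are assumptions, and the project's real rules are not given in the slides.

```python
from collections import defaultdict
from pathlib import Path

# Example rule only: if the anchor extension is present, all listed
# extensions must exist for the same file stem.
REQUIRED = {".shp": {".shp", ".shx", ".dbf"}}


def group_by_stem(filenames):
    """Map each file stem to the set of (lower-cased) extensions seen for it."""
    groups = defaultdict(set)
    for name in filenames:
        path = Path(name)
        groups[path.stem].add(path.suffix.lower())
    return dict(groups)


def missing_parts(filenames):
    """Return {stem: [missing extensions]} for incomplete multi-file datasets."""
    problems = {}
    for stem, extensions in group_by_stem(filenames).items():
        for anchor, required in REQUIRED.items():
            if anchor in extensions and not required <= extensions:
                problems[stem] = sorted(required - extensions)
    return problems
```

Stems reported by `missing_parts` are the cases that would fall through to manual intervention.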

23
Metadata: Seed file form
  • 'Transfer set' metadata capture in 'Seed file'
  • communicates with DSpace backend, generates XML
    used to inform later scripts

24
Metadata: Communities and Collections
  • Search by type for 100 communities
  • Facilitates creation and reduces errors

25
Curation Processing
  • At-risk format migration, original retained
  • Agency-specific XML templates in ArcCatalog with
    synchronization flags
  • Provenance and curation metadata scripted

26
Source Metadata Translation
  • Repository agnostic approach
  • Spokes for each transformation
  • Facilitates export from DSpace into other
    repositories
  • Generate DSpace QDC and METS; populate Workflow
    database
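
The hub-and-spoke idea can be illustrated with a small registry in which each spoke is a function that turns one neutral source record into one target representation. The record keys, the spoke names, and the output shapes below are all hypothetical; METS generation is omitted.

```python
# Registry of spokes: target name -> transformation function.
SPOKES = {}


def spoke(name):
    """Decorator that registers a transformation under a target name."""
    def register(func):
        SPOKES[name] = func
        return func
    return register


@spoke("dspace_qdc")
def to_qdc(record):
    """Qualified Dublin Core as (element, qualifier, value) tuples."""
    values = [("title", "none", record["title"]),
              ("identifier", "other", record["noid"])]
    values += [("subject", "none", kw) for kw in record.get("keywords", [])]
    return values


@spoke("workflow_row")
def to_workflow_row(record):
    """Flat row for an external tracking database (hypothetical columns)."""
    return {"noid": record["noid"],
            "title": record["title"],
            "keywords": "; ".join(record.get("keywords", []))}


def translate(record, targets=None):
    """Run the selected spokes (default: all) against one hub record."""
    names = targets or list(SPOKES)
    return {name: SPOKES[name](record) for name in names}
```

Adding an export target for another repository then means registering one more spoke, without touching the existing ones.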

27
Extra-repository AIP management
  • Workflow Management Database (WMD) populated as a
    spoke on the metadata/ingest hub
  • External tracking of NOID, Handle, ISO keywords,
    other metadata for interaction with other systems
  • Integrates with existing GIS Lookup tool

28
Repository Architecture Overview
PostgreSQL
One shared username. Separate database for each app
Repository Tomcat instance
Tomcat DSpace Internal
Faculty Publications PHP/DSpace hybrid
NDIIPP (DSpace)
SCRC (DSpace)
  • Repository (DSpace)
  • Technical Reports
  • ETDs

Collections (DSpace) SCRC
  • Course Catalogs
  • Green N Growing
Asset Store/ATABeast (sub-directory for each DSpace app)
29
Upcoming Repository Related Projects
  • Enhancements to current system
  • XTF search interface
  • Inter-archive exchange
  • Digital Collections Repository
  • Special Collections Research Center
  • Other non-faculty collections
  • Data Repository
  • Scientific data
  • Statistical resources

30
For More Information
  • James Jackson Sanborn
  • james_sanborn_at_ncsu.edu
  • Jim Tuttle
  • jim_tuttle_at_ncsu.edu