Title: Digital Archiving and Library Consortia: Issues and Technologies
1. Digital Archiving and Library Consortia: Issues and Technologies
- T.B. Rajashekar
- National Centre for Science Information, Indian Institute of Science, Bangalore 560 012 (E-mail: raja_at_ncsi.iisc.ernet.in)
2. Agenda
- Consider some current developments related to archiving
- Key issues in consortia level archiving
- Some technical possibilities for consortia level archiving (thinking aloud!)
3. Digital archiving
- Long-term storage, preservation and access to digital material
- Preservation options
- Technical preservation
- Same content, software, h/w: not viable
- Migration
- Migrate content to new platform
- Emulation
- Preserve content and emulate original environment
- Hybrid (Migration + Emulation)
- Ideal situation: XML and Unicode (another 5 years?)
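A minimal sketch of the migration option, assuming the legacy content is plain text in a known single-byte encoding (Latin-1 is an assumption chosen for illustration) and the target is Unicode (UTF-8). The function name `migrate_to_utf8` is hypothetical.

```python
def migrate_to_utf8(raw: bytes, source_encoding: str = "latin-1") -> bytes:
    """Re-encode legacy bytes as UTF-8 and confirm the text is unchanged."""
    text = raw.decode(source_encoding)   # interpret bytes in the legacy encoding
    migrated = text.encode("utf-8")      # write the same characters as UTF-8
    # Sanity check: the migrated copy must decode to the identical text.
    if migrated.decode("utf-8") != text:
        raise ValueError("migration changed the content")
    return migrated


legacy = "Revue d'études".encode("latin-1")
migrated = migrate_to_utf8(legacy)
print(migrated.decode("utf-8"))
```

Real migrations also have to handle unknown or mixed encodings and non-text formats; this only shows the basic decode and re-encode step.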
4. Technical criteria and guidelines
- DLF
- Minimum criteria for an archival repository of digital scholarly journals (May 2000)
- RLG
- Recommendations for preserving digital information
- OAIS
- Open Archival Information System (ISO Reference Model, ISO 14721:2002)
- CEDARS (JISC supported project, UK)
- Guide to digital preservation strategies (2002)
- ICOLC advisories w.r.t. licensing e-content
5. Library consortia and archiving
- Expectations of consortia
- Perpetual access to licensed content
- Uninterrupted access
- Concerns
- Interruptions or cancellation of access due to political, economic or technological reasons
- Move towards purchase model for e-content
6. ICOLC advisory to publishers
- Provide permanent access to licensed content
- Agree to and support consortia in developing local archiving solutions
7. Publishers' response
- Most publishers increasingly responsive to library archiving needs
- Key concern: loss of control over content
- Willing to support purchase model for e-content, if license compliance can be guaranteed
- Hand over archiving to libraries; reduce their investments; move away from print versions altogether
- Current status: many agree to provide content to fulfill archiving needs
- Free, at media cost, or for an additional fee
- Raw data
8. Archiving and consortia
- Key issues
- Perpetual access to bibliographic databases
- Perpetual access to e-journals
- Who does the archiving?
- Consortia, third party
- How do we preserve publishers' interests?
- Incorporate archiving terms in agreements
- How is the data acquired?
- How do we create the access architecture from this data?
- Are there software solutions?
9. Archiving and consortia: some technical considerations
- Assumption: archiving is done by the consortia
10. Content related issues
- Bibliographic databases
- E-Journals
11. Content for archiving: bibliographic databases
- Stand-alone CD-ROM with search engine
- Intranet version (web enabled search engine) (ERL, OVID)
- Is this an adequate long term solution?
- What should we get?
- Structured data in XML format
- Supporting authority files, thesauri?
- What about links to full text articles?
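To make "structured data in XML format" concrete, here is a small sketch of what one bibliographic record could look like and how it might be read for indexing. The record layout, element names and the example.org link are illustrative assumptions, not the export schema of any database vendor.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout; element names are illustrative assumptions.
RECORD_XML = """
<record id="rec-001">
  <title>Digital preservation in library consortia</title>
  <author>Example, A.</author>
  <author>Sample, B.</author>
  <source>Journal of Examples</source>
  <year>2002</year>
  <descriptor thesaurus="local">Digital archiving</descriptor>
  <fulltext href="http://example.org/articles/rec-001"/>
</record>
"""


def parse_record(xml_text: str) -> dict:
    """Turn one XML record into a plain dictionary for indexing."""
    root = ET.fromstring(xml_text.strip())
    link = root.find("fulltext")
    return {
        "id": root.get("id"),
        "title": root.findtext("title"),
        "authors": [a.text for a in root.findall("author")],
        "year": int(root.findtext("year")),
        "descriptors": [d.text for d in root.findall("descriptor")],
        "fulltext_url": link.get("href") if link is not None else None,
    }


record = parse_record(RECORD_XML)
print(record["title"], record["authors"])
```

Keeping thesaurus terms and full text links as explicit elements is what allows a local search engine to be rebuilt later without the vendor's software.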
12. Content for archiving: e-journals
- What should we get from publishers?
- Structured metadata in XML
- Full text content
- Formats? HTML, PDF, TIF, XML
- Issues: preservation, fulltext searching, format migration
- Citation links: standards used
- Key issue: can we create the publication architecture from the data?
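One way to read the key issue above: given flat article metadata, can the journal, volume, issue and table-of-contents structure be rebuilt locally? The sketch below assumes each record carries journal, volume, issue and first-page fields; the field names and sample records are hypothetical.

```python
from collections import defaultdict

# Hypothetical flat article metadata, as might be parsed from publisher XML.
articles = [
    {"journal": "Journal A", "volume": 12, "issue": 1, "first_page": 15, "title": "Second article"},
    {"journal": "Journal A", "volume": 12, "issue": 1, "first_page": 1, "title": "First article"},
    {"journal": "Journal A", "volume": 12, "issue": 2, "first_page": 101, "title": "Third article"},
]


def build_architecture(records):
    """Group flat records into journal -> volume -> issue -> titles in page order."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for rec in sorted(records, key=lambda r: r["first_page"]):
        tree[rec["journal"]][rec["volume"]][rec["issue"]].append(rec["title"])
    return tree


toc = build_architecture(articles)
for issue, titles in sorted(toc["Journal A"][12].items()):
    print(f"Vol. 12, issue {issue}: {titles}")
```

If the delivered metadata lacks any of these fields, the browse structure cannot be reconstructed, which is a reason to specify them in the archiving agreement.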
13. Software possibilities
14. Software solutions for archiving
- Dedicated archiving software
- Example: LOCKSS
- Open source DL software
- Examples: GSDL, EAS, DSpace
- Extend current intranet hosting solutions
- Examples: Science Server (Elsevier), OVID/ERL (Ovid), JCCC (Informatics)
15. LOCKSS
- Lots of Copies Keep Stuff Safe
- Software solution: Stanford University Libraries, http://lockss.stanford.edu/
- Beta 1: 1999-2002. Production version release by end of 2002
- Supported by NSF, Sun Microsystems, Mellon Foundation, etc.
- Open source software
- Tested by several libraries. Received endorsement from several publishers.
16. LOCKSS
- E-Journal archiving solution
- Implement purchase-and-own model for e-journals
- Protect interests of both publishers and libraries
- How does it work?
- Web crawler caches licensed content from publisher's site (permanent cache), using publisher specific plug-ins
- Cache available for local access when online content is no longer available
- Incorporates content management support for preservation and long term access
- Cache usage can be audited by publisher
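The crawl, cache, fall-back and audit steps listed above can be illustrated with a toy model. This is not LOCKSS code and does not reflect its internals; the class `ArchiveCache`, the `fetch` callable standing in for a publisher-specific plug-in, and the publisher.example URL are all assumptions made for the sketch.

```python
from typing import Callable, Dict, List, Optional


class ArchiveCache:
    """Toy permanent cache: keep crawled pages, serve them if the publisher fails."""

    def __init__(self, fetch: Callable[[str], Optional[bytes]]):
        self.fetch = fetch                  # stand-in for a publisher-specific plug-in
        self.store: Dict[str, bytes] = {}   # permanent cache, never evicted
        self.audit_log: List[tuple] = []    # (url, source) entries for publisher audit

    def crawl(self, urls):
        """Cache whatever content the publisher site currently returns."""
        for url in urls:
            content = self.fetch(url)
            if content is not None:
                self.store[url] = content

    def get(self, url: str) -> Optional[bytes]:
        """Prefer the live publisher copy; fall back to the local cache."""
        content = self.fetch(url)
        source = "publisher"
        if content is None:
            content = self.store.get(url)
            source = "cache" if content is not None else "missing"
        self.audit_log.append((url, source))
        return content


# Simulated publisher site (hypothetical URL) that later goes offline.
site = {"http://publisher.example/v1/a1": b"article one"}
cache = ArchiveCache(lambda url: site.get(url))
cache.crawl(list(site))
site.clear()  # access is cancelled or the site disappears
print(cache.get("http://publisher.example/v1/a1"))
```

The audit log records whether each request was served by the publisher or from the cache, which is the kind of information a publisher would need in order to audit cache usage.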
17. LOCKSS
- Requirements
- Inexpensive cache servers, with adequate storage
- Archive decision to be made at the time of subscription to enable archiving from day 1
- Publisher gives written consent for archiving and crawling
- LOCKSS tested with Science and BMJ in a few libraries
- Publishers can run LOCKSS to audit user cache
18. Open source DL software
- Examples: GSDL, EAS, DSpace
- Software for setting up and managing repositories of digital material
- Not tested for very large content
- Need to be adapted for archiving e-journals
- Crawler to gather e-journal content
- Batch content loading from backup content
- Access control, auditing and reporting
- Support for long term preservation of content
- Opportunity for integrating institutional document archives with licensed content
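As one possible ingredient of batch loading with preservation support, the sketch below inventories a directory of backup content and records a checksum per file so that later damage can be detected. Checksum manifests are an assumption here; the slides do not prescribe a mechanism, and the function names are hypothetical rather than part of GSDL, EAS or DSpace.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(source_dir: Path) -> dict:
    """Map each file's relative path to its SHA-256 digest."""
    manifest = {}
    for path in sorted(source_dir.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(source_dir).as_posix()] = digest
    return manifest


def verify(source_dir: Path, manifest: dict) -> list:
    """List files whose current checksum no longer matches the manifest."""
    current = build_manifest(source_dir)
    return [name for name, digest in manifest.items() if current.get(name) != digest]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "vol1").mkdir()
    article = root / "vol1" / "article1.xml"
    article.write_text("<article/>", encoding="utf-8")
    manifest = build_manifest(root)          # taken at load time
    article.write_text("<article>changed</article>", encoding="utf-8")
    damaged = verify(root, manifest)         # later integrity check
    print(damaged)
```

A manifest taken at load time gives the repository a baseline against which periodic integrity checks can be run.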
19. Extend current intranet hosting solutions
- E-Journal solutions (e.g. Science Server)
- Support for importing other publishers' content
- Access control and auditing
- Bibliographic database solutions (e.g. ERL)
- Full content capture, access control, fulltext search, auditing
- Consortia aggregation (e.g. JCCC)
- Full content capture, access control, fulltext search, auditing
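Access control and auditing recur in all three options above. A minimal sketch, assuming access is granted by member institution IP range (a common licensing arrangement, but an assumption here) and that every request is logged. The institution names and address ranges are hypothetical; the ranges are reserved documentation networks.

```python
import ipaddress
from datetime import datetime, timezone

# Hypothetical member institutions and their licensed address ranges.
MEMBER_NETWORKS = {
    "Institute A": ipaddress.ip_network("192.0.2.0/24"),
    "Institute B": ipaddress.ip_network("198.51.100.0/24"),
}

audit_log = []


def authorize(client_ip: str, resource: str):
    """Return the member institution for an address, or None, and log the request."""
    address = ipaddress.ip_address(client_ip)
    member = next(
        (name for name, net in MEMBER_NETWORKS.items() if address in net), None
    )
    audit_log.append(
        {
            "time": datetime.now(timezone.utc).isoformat(),
            "ip": client_ip,
            "member": member,
            "resource": resource,
            "granted": member is not None,
        }
    )
    return member


print(authorize("192.0.2.17", "journal-a/v12/i1"))
print(authorize("203.0.113.5", "journal-a/v12/i1"))
```

Logging denied requests alongside granted ones gives the consortium usage reports and gives publishers evidence of license compliance.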
20. [Diagram. Labels: Publisher; Raw Content; Online Content; Onsite Content; Preservation, Access Architecture; Cache, Access Architecture?; Preservation, Software Maintenance (LOCKSS)]
21. Conclusion
- Recognize need for local archiving
- Evolve clarity in technical and administrative criteria for archiving
- Archiving requires both preservation and access strategies
- Work with publishers and enforce archiving at agreement level
- Encourage move towards XML
- Carry out pilot implementations; identify technical and non-technical requirements
- Include archiving cost within consortia management budget
- Share experiences with other consortia
- Evolve short-term and long-term strategies
22. Thank You