VRC: Preservation Risk Management for Web Resources - PowerPoint PPT Presentation

1
VRC: Preservation Risk Management for Web Resources
Nancy Y. McGovern, Archiving Web Resources 2004
2
VRC Funding
  • Part of a 4(5)-year NSF-funded project
  • supported by the Digital Libraries Initiative,
    Phase 2 (Grant No. IIS-9905955, the Prism
    Project)
  • Also partially funded by a grant from
    The Andrew W. Mellon Foundation
  • Political Communications Web Archiving
    http://www.crl.edu/content/PolitWeb.htm

3
Current Team
  • Anne R. Kenney, Research Advisor
  • Nancy Y. McGovern, Project Manager
  • Richard Entlich, Sr. Researcher
  • William R. Kehoe, Technology Coordinator
  • Ellie Buckley, Digital Research Specialist

4
Research
  • "Preservation Risk Management for Web Resources:
    Virtual Remote Control in Cornell's Project
    Prism"
  • by Kenney, McGovern, et al., in D-Lib Magazine,
    January 2002
  • http://www.dlib.org/dlib/january02/kenney/01kenney.html
  • "Virtual Remote Control: Building a Preservation
    Risk Management Toolbox for Web Resources"
  • by McGovern, Kenney, et al., in D-Lib Magazine,
    April 2004
  • http://www.dlib.org/dlib/april04/mcgovern/04mcgovern.html

5
Virtual
  • because VRC develops models
  • to represent essential features of selected Web
    sites
  • that enable ongoing monitoring over time
  • to identify, respond to, and mitigate potential
    risks to the site integrity and longevity

6
Remote
  • because VRC is intended for use by cultural
    heritage institutions interested in the longevity
    of Web resources
  • that reside on remote servers
  • not owned or managed by the monitoring
    institution

7
Control
  • because at the most proactive end of the VRC
    approach, a monitoring organization may act to
    protect another organization's resources
  • by agreement or implicit consent
  • through notification and/or action

8
Types of Web Resources
  • Two types of initiatives for monitoring and/or
    capture of Web resources:
  • Web-based publications: the Web site as a means
  • All (or a subset) of a Web site, consisting of
    pages within a boundary defined by a URL, or a
    portion of one Web site: the Web site as an end (VRC)
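
The VRC scope is a set of pages within a boundary defined by a URL. As a minimal sketch, assuming "inside the boundary" means same scheme and host with a path at or below the boundary path (an illustrative rule, not one the project specifies), membership can be tested with Python's standard urllib.parse. The function name within_boundary is hypothetical.

```python
from urllib.parse import urlsplit


def within_boundary(url: str, boundary: str) -> bool:
    """Return True if `url` lies inside the site boundary given by `boundary`.

    Assumed rule (illustrative only): same scheme and host, and a path that
    equals the boundary path or sits beneath it.
    """
    u, b = urlsplit(url), urlsplit(boundary)
    if (u.scheme.lower(), u.netloc.lower()) != (b.scheme.lower(), b.netloc.lower()):
        return False
    path, root = u.path.rstrip("/"), b.path.rstrip("/")
    # "/campaign" and "/campaign/" name the same boundary root.
    if path == root:
        return True
    # Require a "/" after the root so "/campaignfinance" is not inside "/campaign".
    return path.startswith(root + "/")
```

Real boundaries may also need to handle redirects, alternate host names, or query-string pages; those are left out of this sketch.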

9
Risk Factors
  • Organizational Context
  • Combination of indicators
  • Monitoring (change/loss over time)
  • Triggers (events, organizational, upgrades)
  • Degradation of site management indicators
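
Monitoring for change and loss over time amounts to comparing successive captures of a site. A minimal sketch, assuming each capture is a hypothetical mapping from URL to raw page content, classifies URLs as lost, added, changed, or unchanged using a SHA-256 fingerprint (hashing lets an archive keep only fingerprints from earlier crawls). The names fingerprint and compare_snapshots are illustrative.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Content hash used to detect change between captures."""
    return hashlib.sha256(content).hexdigest()


def compare_snapshots(old: dict[str, bytes], new: dict[str, bytes]) -> dict[str, list[str]]:
    """Classify URLs as lost, added, changed, or unchanged between two captures.

    `old` and `new` map URL -> raw page content (hypothetical snapshot format).
    """
    report = {"lost": [], "added": [], "changed": [], "unchanged": []}
    for url in sorted(old.keys() | new.keys()):
        if url not in new:
            report["lost"].append(url)        # present before, gone now
        elif url not in old:
            report["added"].append(url)       # new since the last capture
        elif fingerprint(old[url]) != fingerprint(new[url]):
            report["changed"].append(url)     # same URL, different bytes
        else:
            report["unchanged"].append(url)
    return report
```

A byte-level hash flags any difference, including trivial ones such as rotating dates or counters; a practical change detector would likely normalize pages before hashing.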

10
VRC Stages
  • Identification
  • Analysis
  • Appraisal
  • Strategy
  • Detection
  • Response

11
Human-Tool Scenario
  • 1. Identification
  • Human: identify Web resources of interest
  • Toolbox: verify list, expand list
  • 2. Analysis
  • Toolbox: crawl sites, generate characterizations
  • Human: accept/revise characterizations
  • 3. Appraisal
  • Human: define/review attributes of value
  • Toolbox: support appraisal, capture results

12
Human-Tool Scenario
  • 4. Strategy
  • Human: develop/review strategies
  • Toolbox: plot appraisals, compile strategies
  • 5. Detection
  • Human: define risk parameters
  • Toolbox: identify/assess risks; propose responses
  • 6. Response
  • Toolbox: propose risk response based on rules;
    automatic response for some risk categories
  • Human: monitor automated responses; select
    response based on recommended actions
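
The Detection and Response steps pair human-defined risk parameters with rule-based responses, some of them automatic. One way to sketch that split, assuming a risk parameter is a simple threshold on a named numeric indicator, is shown below; the RiskRule type, the indicator names, and detect_and_respond are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RiskRule:
    """Hypothetical risk parameter: indicator, threshold, and response."""
    indicator: str           # illustrative indicator name, e.g. "pages_lost_ratio"
    threshold: float         # risk is flagged when the indicator exceeds this
    response: str            # recommended action
    automatic: bool = False  # True if the toolbox may act without a curator


def detect_and_respond(indicators: dict[str, float], rules: list[RiskRule]):
    """Split triggered rules into automatic actions and curator recommendations.

    Each result entry is (indicator, observed value, response).
    Indicators missing from `indicators` are simply not evaluated.
    """
    automatic, for_review = [], []
    for rule in rules:
        value = indicators.get(rule.indicator)
        if value is not None and value > rule.threshold:
            target = automatic if rule.automatic else for_review
            target.append((rule.indicator, value, rule.response))
    return automatic, for_review
```

For example, a rule that triggers an automatic capture when more than 10% of pages are lost yields one automatic action for an observed loss ratio of 0.2, while a broken-link ratio under its threshold produces nothing for the curator to review. Keeping the automatic flag on each rule mirrors the slide's split between automated responses and curator-selected ones.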

13
Risk Display Grid

14
Monitoring Layers

15
Web Crawling
  • traversing Web sites via links
  • a capability common to most tools, but with
    different purposes and results
  • the VRC toolkit needs more than just Web crawlers
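
Traversing a site via links is a graph traversal. The sketch below assumes an in-memory link table stands in for fetching and parsing pages, so it runs without network access; crawl and its in_scope predicate are illustrative names, and breadth-first order is one choice among several.

```python
from collections import deque


def crawl(start: str, links: dict[str, list[str]], in_scope) -> list[str]:
    """Breadth-first traversal of a site by following links.

    `links` maps each page to the URLs it links to (an in-memory stand-in
    for fetching and parsing pages); `in_scope` decides which URLs to follow.
    Returns pages in the order they were visited.
    """
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            # Follow each in-scope URL once; `seen` prevents revisiting cycles.
            if target not in seen and in_scope(target):
                seen.add(target)
                queue.append(target)
    return order
```

The in_scope hook is where a VRC-style boundary would plug in; everything else a crawler does (politeness delays, robots.txt handling, retries) is omitted here.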

16
VRC Toolbox
  • Identify tools for each stage (adopt, adapt,
    define, devise)
  • Leverage existing tools; apply them to longevity
  • Analyze steps, automated and manual
  • Formalize protocol
  • Provide a framework to map existing tools and plug
    gaps with new developments

17
VRC Toolkit
  • Development steps
  • extensive literature review
  • development of tool categories
  • definition of categories and test protocols
  • survey existing tools for evaluation
  • select representative tools for testing
  • highlight findings in category summaries

18
Tool Categories
  • Link checkers
  • Site monitors
  • Web crawlers
  • Site managers
  • Change detectors
  • Site mappers (includes visualization)
  • HTML validators
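
A link checker combines two steps: extracting links from a page and testing each target. A minimal sketch using the standard library's html.parser.HTMLParser, assuming the status codes come from a hypothetical earlier check (a real tool would issue HTTP requests), might look like this; LinkExtractor and broken_links are illustrative names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:  # value is None for a bare attribute
                self.links.append(urljoin(self.base_url, value))


def broken_links(base_url: str, html: str, status: dict[str, int]) -> list[str]:
    """Return links whose recorded HTTP status is 400 or above.

    `status` is a hypothetical URL -> status-code table; URLs absent from
    it are treated as broken (assumption: unchecked means unreachable).
    """
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return [url for url in parser.links if status.get(url, 404) >= 400]
```

Treating unchecked URLs as broken is a conservative choice for preservation monitoring; a tool aimed at site managers might instead report them separately as "unknown".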

19
(No Transcript)
20
(No Transcript)
21
(No Transcript)
22
VRC and Other Approaches
  • Risk management and records management
  • Passive (monitor) to active (capture)
  • Lifecycle support: selection to capture
  • Human (curator)/tool interaction
  • Structural and change models of resources
  • Promulgate preservation practices
  • Understand Web resources and risks

23
Characteristics and Benefits
  • Comprehensive tracking that includes archival and
    non-archival resources
  • Management of pre-agreement interests
  • Leverage tools beyond crawlers
  • Value of resource models for access and
    preservation
  • Overkill for simple capture

24
Access
  • is not the focus of the research, but
  • Presumes OAIS environment
  • Treats pages and sites as objects
  • View instances over time, related topical
    resources over time
  • Safety net of last best version
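
One reading of the "safety net of last best version" is: fall back to the most recent capture that passed validation. A sketch under that reading, where the Capture record, its valid flag, and last_best are hypothetical stand-ins for whatever integrity checks an archive actually applies:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Capture:
    taken: date   # when the capture was made
    valid: bool   # hypothetical flag: passed integrity/completeness checks


def last_best(captures: list[Capture]) -> Optional[Capture]:
    """Return the most recent valid capture, or None if none qualifies."""
    good = [c for c in captures if c.valid]
    return max(good, key=lambda c: c.taken) if good else None
```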

25
Preservation
  • Capture as active option
  • Records management-like control: monitor change
    at varying levels
  • Incremental capture and deconstruction
  • Format-level and page-type management

26
For more info on VRC
  • nm84_at_cornell.edu
  • http://irisresearch.library.cornell.edu/VRC/