The Web is a Mess: or How I Learned to Stop Worrying and Love Web Archiving

Description:

The Web is a Mess: or How I Learned to Stop Worrying and Love Web Archiving, by Lori Donovan, Internet Archive

Learn more at: https://connect.ala.org
Transcript and Presenter's Notes

1
The Web is a Mess: or How I Learned to Stop
Worrying and Love Web Archiving
Lori Donovan, Internet Archive
2
About Internet Archive
  • We are a Digital Library
  • Mission Statement: Universal access to all
    knowledge
  • Founded by Brewster Kahle in San Francisco,
    California in 1996
  • Largest publicly available web archive in
    existence
  • Officially designated a Library by the State of
    California in 2007

3
What is Web Archiving?
  • The goal of web archiving is to document changes
    to web resources over time, archive them and make
    them accessible.

4
What is a Web Archive?
  • A web archive is a collection of archived URLs
    grouped by theme, event, subject area, or web
    address.
  • A web archive contains as much as possible from
    the original resources. It is a priority to
    recreate the same experience a user would have
    had if they had visited the live site.

5
Why Web Archiving?
  • Billions of people around the world have grown
    accustomed to using the web as their primary
    resource to acquire information.
  • The web is a crucial part of our culture and our
    social fabric, and we don't want to lose any of
    it, so it is essential that we collect and
    preserve these digital resources and make them
    accessible in creative ways.
  • The availability of this digital information is
    taken for granted and it is a fallacy that if
    something is on the web it will be there forever.

6
Limited lifespan of a webpage
  • It is a fairly common misconception that
    content that exists on the web will remain there
    forever.
  • A report in Scientific American puts the
    lifespan of a webpage at 44 days.
  • A subsequent academic study in IEEE suggests 75
    days.
  • A Washington Post article indicates the number is
    100 days.
  • Over 95% of government information today is
    born-digital, but less than 50% is being
    maintained with an active preservation plan
    (State of the Federal Web Report).

7
Historically important events for researchers and
scholars
  • Much of the record of any historic event in
    today's world is born digital. And many items
    born in print are also available in digital form,
    or soon will be. To understand major world
    events, not only disasters but political
    upheavals, and to keep a record and a memory of
    them for survivors, for scholars, for
    policy-makers, and for a wider public, it is
    simply essential that we collect and preserve
    these digital resources and make them accessible
    in creative ways.
  • Andrew Gordon, Harvard University.

8
It's a requirement.
  • Records Retention policy. Several state and
    federal laws or policies require universities to
    maintain various statistics and reports.
  • Responsibility: preserve things like course
    information, course roster information and
    policy documents that now show up only as
    digital content.

9
The Role of Libraries
  • Libraries and archives have long collected
    information that serves scholars and the general
    public in understanding history, culture, and
    society.
  • So much of today's information is easily (and
    only) found on the world wide web -- web pages
    have replaced hard copy records and documents,
    blogs are today's diaries, and newspapers and
    socio-political commentary exist solely online.
  • As part of an effort to appropriately document
    and capture today's information for tomorrow's
    use, institutions must adopt a web archiving
    strategy.
  • However, for many institutions, the prospect of
    capturing and storing web pages, websites, or
    entire web domains is daunting.

10
About Archive-It
  • First deployed in February 2006
  • Web-based application allowing users to create,
    manage and preserve collections of digital
    content
  • Includes tools for selection and scoping,
    harvesting, cataloging with metadata, full text
    search, and QA
  • Ability to capture content using 10 different
    crawl frequencies
  • Archived content includes html, videos, audio,
    PDF, images, social networking sites, online
    newspapers
  • View archived content within 24 hours after a
    capture is complete
  • Annual subscription service, includes hosting,
    access and storage (primary and back-up)

11
Who Uses Archive-It?
205 partners around the world in 43 U.S. states
and 15 countries
12
  • How Partners Use Archive-It

13
Archive-It Use Cases
  • Essential part of a mandate to capture and
    preserve institutional memory and history.
    Construct a historical record of an
    institution's web presence over time.
  • Capture state/local agency publications that
    aren't being deposited in print form. Collect and
    aggregate state/local government websites and
    presence.
  • Capture websites that relate to
    historical/traditional collections and link them
    with existing collections around the same
    thematic focus.
  • Create a thematic/topical web archive on a
    specific subject or event, including different
    perspectives and social commentary (tweets,
    blogs, comments). Gather thematically-related
    resources of value to researchers and scholars
  • Support an electronic records system to meet
    record retention requirements.
  • Closure crawls

14
Stanford University/New York University: Islamic
and Middle Eastern Collection
  • Purpose: harvest and preserve Iranian blogs
  • Archiving 300 blogs written by and for Iran and
    the Iranian people
  • Includes coverage of 2009 Iranian elections and
    the current Middle East unrest

15
Stanford University/New York University: Islamic
and Middle Eastern Collection
17
University of Texas at Austin: LAGDA
  • Purpose: archive documents from 18 different
    countries, 300 government ministries/presidencies.
  • Content includes:
      • Full-text versions of official documents
      • Original video and audio recordings of key
        regional leaders
      • Thousands of annual and "state of the nation"
        reports
      • Specific collections for Latin American elections
        and political parties

18
University of Texas at Austin LANIC: Honduras
Presidential site, 2008 (before the Coup)
19
University of Texas at Austin LANIC: Honduras
Presidential site, 2009 (during the Coup)
20
University of Texas at Austin LANIC: Honduras
Presidential site (after the Coup)
21
Electronic Literature Organization
  • Purpose: archive born-digital literature, that is,
    works created explicitly for the computer.
  • ELO seeks to foster and promote the reading,
    writing, teaching, and understanding of
    literature as it develops in a digital
    environment
  • Content includes individual works, collections
    and journals, poems and stories

22
Electronic Literature Organization
23
Indiana University
  • Purpose: archive all university records to
    maintain strong electronic records systems
  • Main university website, 8 different campus
    websites and other organizations on campus:
    university culture, teacher blogs, student
    groups, and online publications

24
Indiana University: Main University website
25
Columbia University
  • Purposes:
      • Archive copies of its university web presence in
        order to meet required mandates
      • Archive websites on thematic/topical subjects

26
Columbia University Human Rights Collection
27
Columbia University: Avery Architectural & Fine
Arts Library
28
Columbia University Archives Collection
29
North Carolina State Archives & State Library
of North Carolina
  • Purpose: archive state agency websites and
    publications
  • Includes pages in a variety of formats: text,
    images, audio, video and social networking sites

30
North Carolina State Archives & State Library
of North Carolina
31
Access to Collections
  • Partners
      • Can view through private web application with
        login/password
  • General Public
      • Can view from Archive-It website
        http://www.archive-it.org/
      • Can view from organization's website from a
        landing page that links back to Archive-It hosted
        data
      • Host from organization's own servers
      • Restricted and private access options are
        available

32
What's next for Archive-It
  • Collaboration and Partnerships
  • Web application development
  • Continue to develop features and functionalities
    requested by partners
  • Enhance our preservation policy/access model
  • Integrate our data with partners' external
    services, systems and catalogs

33
Thank you!
Lori Donovan
Partner Specialist
lori_at_archive.org
Questions?