Australian web domain harvests 2005, 2006

About This Presentation

Slides: 8
Provided by: Help84

Transcript and Presenter's Notes


1
Australian web domain harvests 2005, 2006, 2007
2
Igor Ranitovic, Internet Archive engineer, with the
Petabox rack for the Australian domain harvest
3
PANDORA: Domain Harvesting
  • Australian domain harvest
  • .au domain, located on Australian servers
  • Internet Archive
  • 1st harvest, June/July 2005: 4 weeks, 185m files, 6.69 TB
  • 2nd harvest, Aug/Sept 2006: 5 weeks, 596m files, 19.04 TB
  • 3rd harvest, Aug/Sept 2007: 4 weeks, 516m files, 18.47 TB
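
To put the three harvests on a comparable footing, the figures above can be turned into an average file size and a crawl rate per week. This is only a rough sketch: it assumes "m" means millions and that a TB is 10**12 bytes, neither of which the slides state, and the names HARVESTS and summarise are made up for illustration.

```python
# Figures copied from the harvest list above.
# Assumptions of this sketch: "m" = millions, 1 TB = 10**12 bytes.
HARVESTS = [
    # (label, weeks, files, terabytes)
    ("1st harvest, June/July 2005", 4, 185e6, 6.69),
    ("2nd harvest, Aug/Sept 2006", 5, 596e6, 19.04),
    ("3rd harvest, Aug/Sept 2007", 4, 516e6, 18.47),
]


def summarise(weeks, files, terabytes):
    """Return (average file size in KB, files crawled per week)."""
    avg_kb = terabytes * 1e12 / files / 1e3
    files_per_week = files / weeks
    return avg_kb, files_per_week


if __name__ == "__main__":
    for label, weeks, files, terabytes in HARVESTS:
        avg_kb, per_week = summarise(weeks, files, terabytes)
        print(f"{label}: {avg_kb:.1f} KB per file, "
              f"{per_week / 1e6:.1f}m files per week")
```

Under these assumptions the average file size stays in the same range across all three harvests (roughly 30 to 40 KB), so the jump between 2005 and 2006 reflects more files collected rather than larger files.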

4
Comparative statistics: domain harvests and PANDORA

5
PANDORA: Domain Harvesting
6
PANDORA: Domain Harvesting
  • Some pros
      • Retains linkages and context
      • Large scale: more bytes for the buck
      • Less selectively discriminating
  • Some cons
      • High dependence on the crawler technology
      • Domain and geo-location bias (.au, geoIP)
      • Limitations in timeliness, quality assurance,
        scoping, site complexity, deep web
      • Legal and access issues to resolve

7
PANDORA: Australia's Web Archive
  • Enormous growth and volume of material
  • Everyone can be creators and publishers
  • Virtually instantaneous publication
  • Dynamic content and format
  • Multiplicity of formats
  • Technology dependent
  • Hyperlinked and interconnected
  • Highly accessible but hard to identify
  • Ephemeral
  • Interactivity, re-use, personalisation (web 2.0)