The Deep Web: An Introduction - PowerPoint PPT Presentation

Provided by: paulay

1
The Deep Web: An Introduction
  • May 12th 2004
  • Career Development Group West Country Division

2
  • Aims of the Course
  • Introduce some current concepts of the
    'invisible' or 'deep' web
  • Give some examples of Invisible Websites
  • Offer suggestions for keeping up with Deep Web
    resources (and others!)
  • Give examples of articles and tutorials for
    further reading and research

3
Deep or Invisible?
  • What is the difference between the deep and
    invisible web?
  • The terms are currently used interchangeably to
    refer to content inaccessible to the majority of
    search engines. 'Deep web' is probably more
    accurate: the information is not really
    invisible, just hidden from many of the surface
    search engines. Very occasionally it is called
    the Dark Web.

4
  • 1994: Jill Ellsworth used the phrase 'invisible
    webs' when referring to marketing and how a
    successful website needed to be as visible as
    possible at that time.
    http://www.brightplanet.com/deepcontent/deep_web_faq.asp
    (25.03.04)
  • July 2000: BrightPlanet was the first to mention
    the 'Deep Web'.
  • September 2000: Elinor Abreu referred to the
    Deep Web, giving flight information as an
    example.
    http://www.findarticles.com/cf_0/m0HWW/35_3/66672509/print.jhtml
    (25.03.04)
  • The Deep Web was estimated in the year 2000 to be
    500 times larger than the visible web.

5
  • 2000: Research done for BrightPlanet estimated
    that even leading search engines index, at best,
    approximately 16% of available Internet content.
    http://www.brightplanet.com/deepcontent/tutorials/DeepWeb/index.asp
  • Two of the websites already mentioned are 'deep
    web' sites.
  • To find the Abreu article it was necessary to use
    a Deep Web site (Findarticles.com).
  • The 'asp' at the end of an address is a giveaway:
    these websites are dynamically generated, not
    static.
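The address-based hint above can be sketched in a few lines of Python. This is an illustration only: the function name and the list of extensions are assumptions (the slide names only 'asp'; 'jhtml' appears in the Findarticles address), and an extension or query string is a hint, not proof, that a page is generated on request.

```python
import posixpath
from urllib.parse import urlparse

# Assumed, non-exhaustive list of extensions that usually indicate
# server-generated pages. Only ".asp" is named in the slides.
DYNAMIC_EXTENSIONS = {".asp", ".jhtml", ".php", ".jsp", ".cgi"}


def looks_dynamic(url: str) -> bool:
    """Return True if the address hints at a dynamically generated page."""
    parsed = urlparse(url)
    extension = posixpath.splitext(parsed.path)[1].lower()
    # A query string (?key=value) is a second common hint.
    return extension in DYNAMIC_EXTENSIONS or bool(parsed.query)


print(looks_dynamic("http://www.brightplanet.com/deepcontent/deep_web_faq.asp"))
print(looks_dynamic("http://www.example.org/about.html"))
```

The first address is flagged because of its '.asp' ending; the second, a plain '.html' page with no query string, is not.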

6
  • Traditional search engines locate web
    information in only two or three ways:
  • website owners submit details, or
  • a search engine 'spiders' documents (follows one
    hypertext link after another).
  • Spiders cannot access information in databases
    or on sites where the user must register to use
    the data.
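Why link-following misses database content can be shown with a toy spider. The sketch below is not how any real search engine is built: the miniature 'web' is a made-up dictionary, and the page names are hypothetical. It only shows that a page nobody links to (here a timetable result produced by a search form) is never reached.

```python
from collections import deque

# Hypothetical miniature web: each page maps to the pages it links to.
# "/timetable?from=Bath" exists on the server, but no page links to it;
# it is only produced when a user fills in the search form.
LINKS = {
    "/": ["/about", "/search-form"],
    "/about": ["/"],
    "/search-form": [],
    "/timetable?from=Bath": [],
}


def crawl(start: str) -> set:
    """Visit every page reachable from start by following links only."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in LINKS.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


indexed = crawl("/")
hidden = set(LINKS) - indexed
print("Indexed:", sorted(indexed))
print("Hidden: ", sorted(hidden))
```

The spider indexes the three linked pages and never sees the timetable result, which is the 'deep' part of this toy site.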

7
  • Other key BrightPlanet findings:
  • Visible web - 1 billion individual documents.
  • Invisible web - 550 billion.
  • Currently over 100,000 deep web sites (estimated)
    exist (probably many more now it's 2004!)
  • Growing faster than any other category of
    information on the Internet
  • Narrower in scope but deeper in content than the
    surface web
  • Over half of the content is in topic-specific
    databases
  • Over 95% of this content is free!

8
Characteristics of the Deep Web
  • Dynamic rather than static: search engines do not
    usually index
  • today's job vacancies (job site front page)
  • what time a train is due (train timetable lookup)
  • today's news headlines (newspaper front page)
  • Information is often hidden in databases or
    behind front ends where registration or access
    to a domain is required, e.g. BT's Directory
    Enquiries site or some university/company
    websites

9
  • Information may also be 'semi-concealed' in sites
    such as mailing lists, chatrooms or bulletin
    boards
  • Common 'deep web' sites include Lexis-Nexis, Wall
    Street Journal, and DIALOG, all fee-paying sites.
    However, there is a great deal of information
    available free of charge.
  • Many common file formats are not indexed by many
    search engines, e.g. PDFs, image files and sound
    files; nor is dynamically generated content.

10
Information on the Deep Web?
  • Topic databases: subject-specific, e.g. stock
    exchange company information/records, medical
    databases, patent records, business directories,
    satellite databases.
  • Travel databases: National Rail Network,
    National Express, British Airways.
  • Internal site: searchable databases for the
    internal pages of large sites that are
    dynamically created, e.g. the knowledge base on
    the Microsoft site and university intranets,
    sometimes accessible via passwords.

11
  • Many newspapers and magazines have searchable
    databases for current and archived articles,
    hidden from major search engines.
  • Library catalogues: including the British
    Library and most university libraries. The
    front door of the site is usually indexed, but
    the content of such sites is also 'hidden' from
    search engines.
  • Yellow and White Pages: the content of these
    people and business finders is also invisible to
    most search engines.

12
  • Jobs: job/resumé postings are usually hidden from
    search engines.
  • General search: searchable databases most often
    relevant to Internet search topics and
    information.
  • Mailing lists (like those on JISCmail) and
    weblogs (blogs): some information here is also
    hidden from major search engines.
  • Other sites on the Deep Web include commercial
    sites such as the auction site eBay and Amazon
    (now with its 'search inside the book' feature).

13
How do you find Deep Web resources?
  • Deep Web search engines, e.g. Turbo10, Fazzle
    (also a metasearch engine)
  • Specialised downloadable software such as
    Copernic or WebFerret (cut-down versions are
    free)
  • Gateways or portals such as completeplanet.com or
    invisibleweb.com
  • Search for a subject on a surface web search
    engine like Google, adding terms such as
    'database' or 'directory'
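The last tip, pairing a subject with 'database' or 'directory', can be sketched as a small query builder. The helper names are hypothetical, and the Google address with its `q` parameter is used only as a familiar example of a surface search engine.

```python
from urllib.parse import urlencode


def database_queries(subject: str, hints=("database", "directory")) -> list:
    """Pair the subject with each hint word, e.g. 'patents database'."""
    return [f"{subject} {hint}" for hint in hints]


def search_url(query: str, base: str = "https://www.google.com/search") -> str:
    """Build a search address; the base URL is an assumption for illustration."""
    return f"{base}?{urlencode({'q': query})}"


for query in database_queries("patents"):
    print(search_url(query))
```

Each printed address searches for the subject plus one hint word, which tends to surface the front doors of searchable databases rather than their contents.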

14
Metasearch engines
  • Another way to search the Internet is to use a
    metasearch engine, which searches several search
    engines at once, e.g.
  • Vivisimo, Dogpile, SurfWax
  • They often group results at the side or top of
    the screen and will usually tell you which search
    engine the results have come from
  • Some specialised Deep Web metasearch engines:
  • Turbo10, ez2www.com
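The idea of a metasearch engine, sending one query to several engines and reporting where each result came from, can be sketched as follows. The two 'engines' are hypothetical stand-ins that return fixed lists; no real search service or API is being called or described.

```python
from collections import defaultdict


# Hypothetical stand-ins for real search engines: each takes a query
# and returns a list of result addresses.
def engine_a(query: str) -> list:
    return ["http://example.org/a", "http://example.org/shared"]


def engine_b(query: str) -> list:
    return ["http://example.org/shared", "http://example.org/b"]


ENGINES = {"EngineA": engine_a, "EngineB": engine_b}


def metasearch(query: str) -> dict:
    """Query every engine and record which engines returned each result."""
    sources = defaultdict(list)
    for name, search in ENGINES.items():
        for url in search(query):
            sources[url].append(name)
    return dict(sources)


results = metasearch("train timetable")
for url, engines in results.items():
    print(url, "<-", ", ".join(engines))
```

Duplicate results are merged into one entry, and each entry lists the engines that returned it, mirroring the way metasearch engines label their results.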

15
The vanishing (and vanished) web
  • May include companies or organisations which have
    gone into liquidation or merged with others
  • May include university student pages
  • www.archive.org does have snapshots of some of
    these pages
  • Illustrates the best and worst of the web:
    information is up to date, but may vanish very
    quickly. Information earlier than c. 1995 is rare
    on the web, with some exceptions (Hansard, some
    medical information, e.g. British Medical
    Journal)
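Checking whether archive.org holds a snapshot of a vanished page can be sketched as below. This assumes the Wayback Machine availability endpoint at archive.org/wayback/available and its JSON layout as publicly documented; the function names are hypothetical, and the sample response is made up to show the expected shape, not real data. No network request is made here.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Wayback Machine availability service.
WAYBACK_API = "https://archive.org/wayback/available"


def availability_url(page_url: str, timestamp: str = "") -> str:
    """Build the lookup address; timestamp is an optional YYYYMMDD string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{WAYBACK_API}?{urlencode(params)}"


def closest_snapshot(response: dict):
    """Return the address of the closest snapshot, or None if there is none."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Illustrative response only: the values are invented for this sketch.
sample_response = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20040512000000/http://example.com/",
            "timestamp": "20040512000000",
            "status": "200",
        }
    }
}

print(availability_url("example.com", "20040512"))
print(closest_snapshot(sample_response))
print(closest_snapshot({"archived_snapshots": {}}))
```

In practice the first address would be fetched and its JSON reply passed to `closest_snapshot`; an empty `archived_snapshots` object means no copy was found.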

16
Latest Developments
  • More and more people are using Deep Web search
    engines
  • Pages are being added to surface search engines
    like Google
  • Google proves paid-for advertising highly
    profitable
  • Dec 03: Google added WorldCat to its database
  • Mar 04: Battle for Internet market share heats
    up. Yahoo announced non-commercial partnerships
    with National Public Radio, the Library of
    Congress and others, in a bid to start making
    the Deep Web more accessible

17
Structure of this course
  • The majority of the course is online.
  • Following an introduction to some Deep Web tools
    there is time to explore the links to your
    subject area.
  • At the end of the course we will recap the Deep
    Web very briefly and give some real-life examples
    of how Deep Web sites helped resolve queries.