The Invisible Web

1
The Invisible Web
  • Tefko Saracevic, PhD
  • Rutgers University
  • http://www.scils.rutgers.edu/~tefko
  • (also contains a list of sites relevant to the
    topic and this presentation)

2
What is the invisible Web?
  • Materials that general search engines cannot or
    WILL not include in their collections of Web pages
    (indexes)
  • You cannot find them through general search engines
  • Contains a vast amount of information
  • much of it authoritative and of high quality

3
Why do search engines miss material?
  • Size: the Web is huge; engines cannot cover it all
  • Economics: associated costs are high
  • also pay per crawl and rank
  • Technical: capabilities are still limited
  • Spam: eliminating the bad also loses the good
  • Restrictions: some sites do not let crawlers in
  • Deep structure: some sites are complex
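
The "Restrictions" point can be illustrated with a robots.txt file, one common way a site keeps crawlers out. The slide does not name a mechanism, so treating robots.txt as the restriction is an assumption here. The file contents, the "ExampleBot" agent name, and the example.org URLs are hypothetical. The sketch uses Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# "ExampleBot" and the example.org URLs are made-up names for this sketch.
public_ok = allowed(ROBOTS_TXT, "ExampleBot", "http://example.org/index.html")
private_ok = allowed(ROBOTS_TXT, "ExampleBot", "http://example.org/private/report.html")
print(public_ok, private_ok)
```

A crawler that honors these rules never fetches anything under /private/, so those pages stay out of its index even though they are on the public Web.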

4
Web size - who knows?
  • Estimated over 16 million web servers
  • Lawrence and Giles, 1999
  • But only a fraction is of direct search relevance
  • Domains of sites
  • 83% commercial, 6% scientific or educational,
    3% health
  • 2.5% personal, 2% societies, 1.5% government
  • about 1% each: community, religion
  • 1.5% pornographic
  • Web Characterization Project - OCLC
  • statistics, trends, reports, links; for 2001
    reports 8.5 million web sites
  • http://wcp.oclc.org/

5
Organization of sources
  • No standardization across sources
  • Major approaches in search engines
  • classification: many directory types used
  • statistical analyses of terms, links
  • Metatags in sources
  • to enable retrieval by fields
  • HTML: keywords, description
  • 34% of sites use them
  • Dublin Core: 0.3% of sites use it
  • Organization: a hindrance to retrieval
  • also faked contents to force retrieval
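
As a rough illustration of "retrieval by fields", the sketch below pulls the keywords and description metatags, plus one Dublin Core style field, out of a page using Python's standard html.parser module. The sample page, its field values, and the MetaTagParser class are invented for the example. It shows only how such fields can be read, not how any particular search engine used them.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        fields = dict(attrs)
        name = fields.get("name")
        content = fields.get("content")
        if name and content is not None:
            # Field names are case-insensitive in practice, so normalize them.
            self.meta[name.lower()] = content


# Hypothetical page, made up for this sketch.
PAGE = """<html><head>
<title>Sample</title>
<meta name="keywords" content="invisible web, search engines">
<meta name="description" content="Notes on sources that search engines miss.">
<meta name="DC.creator" content="Example Author">
</head><body></body></html>"""

parser = MetaTagParser()
parser.feed(PAGE)
for field, value in parser.meta.items():
    print(f"{field}: {value}")
```

Because the page author controls these fields, the same mechanism also allows the "faked contents" mentioned above: nothing forces the keywords to match what the page actually says.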

6
Sources and search engines
  • Indexed by search engines (publicly indexed)
  • by terms, selection, links, registration
  • Not publicly indexed
  • many domain sources will not be found, e.g., digital
    libraries, online journals, reference
  • many commercial sites will hardly be found
  • Differing approaches to inclusion/selection
  • mostly automatic; also generic source providers
  • increasingly, human evaluation and selection are added

7
Search engine coverage
  • No engine covers more than 16% of the WWW
  • Relative to the combined coverage of the top 11:
  • Northern Light 38.3%, Snap 37.1%, AltaVista 37.1%,
    HotBot 27.1%, MS 20.3%, Infoseek 19.2%, Google 18.6%,
    Yahoo 17.6%, Excite 13.5%, Lycos 5.9%, EuroSeek 5.2%
  • HotBot, MS, Snap, and Yahoo use Inktomi as search
    provider, but have different filtering and Inktomi
    databases
  • Northern Light has a special collection -
    documents not part of the publicly indexable web
  • Hard to discern and compare coverage
  • Many national search engines - own coverage
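
The two kinds of figures on this slide (share of the whole Web versus share of the 11 engines' combined coverage) can be tied together with a little arithmetic. The sketch below assumes that the 16% ceiling belongs to the engine with the highest relative coverage. The slide does not say this, so the derived numbers are rough estimates and not reported data.

```python
# Relative coverage (percent of the 11 engines' combined coverage), as listed above.
relative_coverage = {
    "Northern Light": 38.3, "Snap": 37.1, "AltaVista": 37.1, "HotBot": 27.1,
    "MS": 20.3, "Infoseek": 19.2, "Google": 18.6, "Yahoo": 17.6,
    "Excite": 13.5, "Lycos": 5.9, "EuroSeek": 5.2,
}

# Assumption (not stated on the slide): the 16% ceiling is the top engine's share.
TOP_ENGINE_SHARE_OF_WEB = 16.0  # percent of the whole Web

top_engine = max(relative_coverage, key=relative_coverage.get)

# If the top engine covers 16% of the Web and 38.3% of the combined index,
# the combined index covers 16 / 0.383 percent of the Web.
combined_share_of_web = TOP_ENGINE_SHARE_OF_WEB / (relative_coverage[top_engine] / 100)

# Scale every relative figure to an estimated share of the whole Web.
share_of_web = {
    name: round(rel / 100 * combined_share_of_web, 1)
    for name, rel in relative_coverage.items()
}

print(f"Combined coverage: about {combined_share_of_web:.1f}% of the Web")
for name, share in share_of_web.items():
    print(f"{name}: about {share}%")
```

Under that assumption the 11 engines together reach roughly 42% of the Web, which means that even searching all of them would miss more than half of it.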

8
Meta search engines
  • Search engines that cover other search engines; many
    around, e.g.:
  • All4one http://all4one.com/
  • four windows - good for comparison
  • CNET Search.com http://www.search.com/
  • meta engine of meta engines - customization
  • Search Engines Worldwide
    http://www.twics.com/takakuwa/search/search.html
  • 174 countries, over 1300 engines
  • More on the horizon, and they differ
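
One way to picture what a meta search engine does is to send the query to several engines and combine their ranked lists. The sketch below merges two hypothetical result lists by summing reciprocal ranks, so a URL returned by several engines, or ranked high by one, moves up. The engine names, the URLs, and the scoring rule are all assumptions for illustration. The sketch does not describe how All4one, Search.com, or any other listed service works.

```python
from collections import defaultdict


def merge_results(results_by_engine):
    """Merge ranked URL lists from several engines into one ranked list.

    Each URL scores 1/rank in every list where it appears; the scores are summed.
    Ties are broken alphabetically so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranked_urls in results_by_engine.values():
        for rank, url in enumerate(ranked_urls, start=1):
            scores[url] += 1.0 / rank
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical engines and results, made up for this sketch.
sample = {
    "engine_a": ["http://example.org/a", "http://example.org/b", "http://example.org/c"],
    "engine_b": ["http://example.org/b", "http://example.org/d"],
}

merged = merge_results(sample)
for position, url in enumerate(merged, start=1):
    print(position, url)
```

Duplicates collapse into one entry, which is the main practical gain. The merged list is still limited to what the underlying engines have indexed, so it does not reach the invisible Web by itself.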

9
Major source for invisible Web
  • Book
  • Chris Sherman and Gary Price (2001). The Invisible
    Web: Uncovering Information Sources Search Engines
    Can't See. Information Today.
  • Site
  • www.invisible-web.net

10
Specialized meta engines
  • Selective, with directories; large number of
    databases and search engines
  • Complete Planet http://completeplanet.com
  • Invisible Web http://invisibleweb.com
  • In the U.S., federal information via Government
    Printing Office Access http://www.gpo.gov/gpoaccess
  • Federal Bulletin Board (file libraries for
    download from many agencies)
    http://fedbbs.access.gpo.gov

11
Reference (expert) services
  • Reference services - several models
  • Q&A, directories, email answers, etc., e.g.:
  • Martindale's Reference Desk - comprehensive
  • http://www-sci.lib.uci.edu/martindale/Ref.html
  • Ask Jeeves! - most popular http://www.ask.com/
  • AskERIC - education questions, email answers
  • http://www.askeric.org/Qa/
  • Information Please - almanac-type questions
    http://www.infoplease.com/
  • Academic libraries developing reference models -
    new service area

12
Libraries as Web sources
  • Academic libraries providing open collections
    and services; models vary
  • Rutgers libraries - big, long-term effort
    http://www.libraries.rutgers.edu/
  • various sources and links involved
  • for domain information sources go to:
  • Electronic Reference Sources Subject Research
    Guides Social Sciences Law Library
    Information Science
  • University of California, Berkeley - a most
    elaborate effort together with Sun Corporation
    http://sunsite.berkeley.edu/

13
Virtual libraries on the Web
  • Libraries emerging only on the Web
  • More and more libraries and organizations involved
  • Examples of academic and public libraries
  • Virtual Library - Switzerland, US, UK, and other
    countries; oldest virtual library on the Web
  • http://vlib.org
  • Toronto Public Library
  • Internet Public Library, Michigan
  • http://www.ipl.org/

14
Domain sites
  • Many domain/issue specific sites
  • rich, often unique coverage and services
  • different approaches and requirements
  • Examples in health-related domains
  • Medscape - registration required
  • http://www.medscape.com/
  • Rxlist - The Internet Drug Index
  • http://www.rxlist.com/
  • Mayo Clinic HealthOasis http://www.mayohealth.org/

15
Societies, organizations, publishers
  • Great many rich sources for searching
  • differences in requirements, depth, richness
  • Examples from a variety of organizations
  • Assoc. for Computing Machinery http://www.acm.org/
  • Digital Library: subscription or registration
  • State Department http://www.state.gov/
  • about the U.S. and other countries
  • R.R. Bowker http://www.bowker.com/
  • Free Resources from Bowker and Library Resource
    Guide
  • Genealogy http://www.familysearch.org/

16
Language barriers on the Web
  • English still the major language
  • but declining, now slightly over 50%
  • Multilingual retrieval: search engines
  • Euroseek searches 40 languages
    http://www.euroseek.com/
  • All the Web: 45 languages http://www.alltheweb.com/
  • in both, a search in a given language covers
    primarily sources in that language

17
Language barriers: translations
  • A number of translation sites
  • machine-aided, i.e., plug in terms, phrases,
    sentences in one language, review in the other;
    but effectiveness???
  • Free Translations http://www.freetranslations.com
  • Babel Fish http://babelfish.altavista.com/tr
  • Travlang: great for travelers' phrases
    http://www.travlang.com

18
News sources about the Web: visible and invisible
  • The Virtual Acquisition Shelf & News Desk
    http://resourceshelf.blogspot.com/
  • Free Pint http://www.freepint.com/
  • ResearchBuzz. http://www.researchbuzz.com/index.shtml
  • Internet Resources Newsletter.
    http://www.hw.ac.uk/libwww/irn/
  • Search Engine Watch. http://www.searchenginewatch.com/

19
Sample of great sources for invisible Web
  • Direct Search.
    http://gwis2.circ.gwu.edu/gprice/direct.htm
  • eLibrary. http://ask.elibrary.com/
  • The Scout Report. http://scout.cs.wisc.edu/
  • Museum of online museums.
    http://www.coudal.com/archives/museum.html
  • Librarians' Index to the Internet.
    http://www.lii.org/
  • Profusion. http://www.profusion.com/
  • Research Index. http://www.researchindex.com/
  • Cybercafe Search Engine. http://www.cybercaptive.com

20
Needed for Web searching in general
  • Knowledge and competencies
  • variety of Web sources
  • their organization
  • search engines
  • Web search strategies
  • search dynamics, feedback
  • Keeping up, up, up
  • constant updates, changes, innovations
  • many are domain/subject specific

21
Needed for Web searching by professionals
  • Knowledge of SOURCES in area of interest
  • search engines not enough
  • not too helpful in finding these other sources;
    structure hard to discern
  • Evaluation of sources
  • a key professional skill!
  • standard criteria: quality, veracity, coverage,
    etc.
  • plus Web criteria
  • authority, accuracy, currency (timeliness),
    objectivity, coverage, persistence, usability

22
Competencies
  • Knowledge of users and use
  • Knowledge of searching
  • Use of technology
  • Adaptability, flexibility
  • Integration with other resources
  • Teaching others
  • Constant learning and updating
