Search Engines for Intranets

Slides: 34
Provided by: drtbraj

1
Search Engines for Intranets
  • Types of search engines
  • How search engines work
  • Features of search engines
  • Choosing the right search engine
  • Free and commercial search engines
  • Demo of ht://Dig and MG

2
Types of Search Engines
  • Internet search engines
  • - Crawl, index and search the entire Internet,
    e.g. AltaVista, Lycos, Infoseek
  • Intranet search engines
  • - Crawl and index internal web servers, or
    portions of these servers, to create a custom,
    searchable index of the documents and data housed
    on the servers, e.g. ht://Dig, SWISH
  • - Website indexing (e.g. a library website)
  • - Indexing textual databases (e.g. bibliographic
    and full-text files)

3
Internet search engines
4
Intranet search engine
5
Types of Search Engines
  • Intranet search engines differ from Internet
    search engines in the following ways
  • - They often index many document formats, such
    as PDF, word processing files, spreadsheets,
    databases and graphics
  • - Their indexing is usually deeper than that of
    their Internet counterparts

6
Types of Search Engines
  • Why should information professionals understand
    search engines?
  • - Knowledge of the indexing and searching process
    helps in implementing and evaluating intranet
    search engines
  • - They need to familiarise themselves with the
    products available and the issues surrounding
    their selection, implementation and use

7
Types of Search Engines
  • Why should information professionals understand
    search engines?
  • - In-depth knowledge of searching techniques,
    including controlled vocabulary, Boolean
    operators, proximity operators and relevance
    ranking, is necessary for evaluation
  • - Understanding of, and experience with, standard
    indexing practices and parameters helps
    ensure that the indexes built on a corporate
    intranet support accurate and efficient data
    retrieval
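
Boolean and proximity operators can be illustrated with a small positional index. The following is a minimal sketch, not the method of any product named in these slides; the sample memos and all function names are invented for illustration.

```python
import re
from collections import defaultdict

def build_positional_index(docs):
    """Map each term to {doc_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[term][doc_id].append(pos)
    return index

def docs_with(index, term):
    """Set of documents that contain the term."""
    return set(index.get(term.lower(), {}))

def boolean_and(index, a, b):
    return docs_with(index, a) & docs_with(index, b)

def boolean_or(index, a, b):
    return docs_with(index, a) | docs_with(index, b)

def boolean_not(index, a, b):
    """Documents containing a but not b."""
    return docs_with(index, a) - docs_with(index, b)

def near(index, a, b, distance):
    """Documents where a and b occur within `distance` words of each other."""
    hits = set()
    for doc_id in boolean_and(index, a, b):
        positions_a = index[a.lower()][doc_id]
        positions_b = index[b.lower()][doc_id]
        if any(abs(i - j) <= distance for i in positions_a for j in positions_b):
            hits.add(doc_id)
    return hits

# Invented sample documents.
docs = {
    "memo1": "Library budget report for the intranet search project",
    "memo2": "Search the catalogue before ordering; the budget is small",
    "memo3": "Intranet policy and staff handbook",
}
index = build_positional_index(docs)
print(boolean_and(index, "budget", "search"))
print(near(index, "intranet", "search", 1))
```

When evaluating a product, the same queries can be run against a known test collection to check that the Boolean and proximity operators behave as documented.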

8
How search engines work
  • Intranet search engines operate in a manner
    similar to information retrieval systems (Fig 1)
  • Components of a search engine: the gatherer, the
    indexer and the search engine (Fig 2)
  • Gatherer: the gatherer, or crawler, collects
    content descriptors from the document collection.
    For HTML files, it follows links to other pages
    within the site; the site is then said to be
    "spidered" or "crawled". For remote indexing,
    the gatherer returns to the site on a
    regular basis.
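
The gatherer's link-following behaviour can be sketched as a breadth-first crawl that stays within one site. This is a simplified illustration under stated assumptions: the three-page site, the host name intranet.example and the `fetch` callback are all made up so that the sketch runs without a network, and a real gatherer would also handle robots rules, errors and revisit schedules.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch):
    """Breadth-first crawl restricted to the host of start_url.

    `fetch` is a caller-supplied function mapping a URL to HTML text,
    or to None when the page does not exist.
    Returns {url: html} for every page reached.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            target, _fragment = urldefrag(urljoin(url, href))
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages

# A made-up three-page intranet held in memory.
SITE = {
    "http://intranet.example/":
        '<a href="/library.html">Library</a> '
        '<a href="http://www.example.org/">External</a>',
    "http://intranet.example/library.html":
        '<a href="opac.html#top">OPAC</a> <a href="/">Home</a>',
    "http://intranet.example/opac.html": "<p>Catalogue search</p>",
}
pages = crawl("http://intranet.example/", SITE.get)
print(sorted(pages))
```

The `seen` set is what keeps the crawl from looping when pages link back to each other, and the host check is what keeps it inside the intranet.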

9
Fig. 1
10
Fig 2
11
How search engines work
  • Indexer
  • - Everything the gatherer finds is passed to the
    indexer, which builds the second part of a
    search engine, the index. The index, also called
    the catalog, contains all the descriptors that
    the gatherer finds.
  • Search engine
  • - This is the program that sifts through the
    millions of descriptors recorded in the index to
    find matches to a search. Search engines also
    support free-text indexing and relevance ranking.
    The process is shown in Fig 3a and Fig 3b
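
A rough sketch of the index and the search step follows. It assumes a simple inverted index with TF-IDF weighting as the relevance measure; the slides do not say which ranking formula any particular engine uses, and the page names and texts are invented.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Return {term: {doc_id: term frequency}} for a {doc_id: text} collection."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index

def search(index, n_docs, query):
    """Rank documents by the summed TF-IDF weight of the query terms they contain."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        # Terms found in fewer documents get a higher weight.
        idf = math.log(n_docs / len(postings)) + 1.0
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Invented sample pages.
docs = {
    "a.html": "Intranet search engines index internal web servers for intranet users",
    "b.html": "The library catalogue is available on the intranet",
    "c.html": "Search tips: search by author, title or subject",
}
index = build_index(docs)
results = search(index, len(docs), "intranet search")
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

The index is built once by the indexer; each query then touches only the postings of its own terms, which is why searching stays fast even when the collection is large.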

12
Fig 3a
13
Fig 3b
14
Features of search engines
  • Technical functionality
  • Indexing features
  • Search features
  • Results display
  • Costs, licensing and registration requirements
  • Unique features (if any)

15
Features of search engines: Technical functionality

16
Features of search engines: Indexing features

17
Features of search engines: Indexing features

18
Features of search engines: Searching features

19
Features of search engines: Searching features

20
Features of search engines: Results display

21
Choosing the right search engine
  • Checklist of factors to consider when
    selecting a search engine
  • - Size of the website
  • - Technical expertise available (locally and/or
    from the supplier or developer)
  • - System platforms available
  • - Information sources and services to be supported
  • - Type and volume of the document collection (now
    and in the future)
  • - Indexing, search and display requirements

22
Choosing the right search engine
  • Checklist of factors to consider when
    selecting a search engine
  • - User community to be served
  • - Whether the need is to index web site pages or
    to index databases and document collections
    (text, bibliographic, DBMS, etc.)
  • - Support for the concept of a "record" by the
    search engine
  • - Support for structured fields and metadata
  • - Cost
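
What support for "records" and structured fields means in practice can be sketched as an index keyed by field and term. This is only an illustration; the two records, their field names and the function names are invented.

```python
import re
from collections import defaultdict

def build_field_index(records):
    """Index each record under (field, term) so a query can be limited to one field."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for field, value in record.items():
            for term in re.findall(r"[a-z0-9]+", str(value).lower()):
                index[(field, term)].add(record_id)
    return index

def search_field(index, field, term):
    """Records whose given field contains the term."""
    return sorted(index.get((field, term.lower()), set()))

def search_any(index, term):
    """Records containing the term in any field (free-text style search)."""
    term = term.lower()
    hits = set()
    for (_field, indexed_term), record_ids in index.items():
        if indexed_term == term:
            hits |= record_ids
    return sorted(hits)

# Invented bibliographic records.
records = {
    1: {"title": "Indexing with compression", "author": "Bell", "year": 1994},
    2: {"title": "The bell curve in usage statistics", "author": "Kumar", "year": 1997},
}
index = build_field_index(records)
print(search_field(index, "author", "bell"))
print(search_any(index, "bell"))
```

An engine that only indexes whole files would return both records for "bell"; a field-aware engine can restrict the match to the author field.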

23
Choosing the right search engine
  • Steps in the selection and procurement of search
    engines
  • - Conduct a needs analysis.
  • - Talk to other libraries
  • - Attend trade shows and talk to vendors
  • - Read the literature that reviews search
    engines.
  • - Compile a list of possible products.

24
Choosing the right search engine
  • Steps in the selection and procurement of search
    engines
  • - Compare the functionality of each product with
    the criteria you developed through the needs
    analysis
  • - Narrow your list down to three possible products.
  • - Spend additional time learning about each
    product.
  • - Invite the vendors in for demonstrations.
  • - Ask for references and follow up with each
    reference
  • - Select a product and implement it.
  • - Follow up with end users.
  • - Continue an ongoing review with end users.
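
One way to compare products against the needs-analysis criteria and narrow the list to three is a weighted scoring table. The sketch below is illustrative only: the criteria weights, product names and 0 to 5 ratings are invented and carry no judgement about real products.

```python
def score_products(weights, ratings):
    """Return (product, weighted total) pairs, highest total first."""
    totals = {}
    for product, product_ratings in ratings.items():
        totals[product] = sum(
            weight * product_ratings.get(criterion, 0)
            for criterion, weight in weights.items()
        )
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))

# Weights come from the needs analysis (higher means more important).
weights = {
    "indexing formats": 3,
    "search features": 3,
    "results display": 2,
    "cost": 2,
}

# Ratings on a 0 to 5 scale for four hypothetical products.
ratings = {
    "Product A": {"indexing formats": 4, "search features": 3, "results display": 3, "cost": 5},
    "Product B": {"indexing formats": 5, "search features": 4, "results display": 4, "cost": 2},
    "Product C": {"indexing formats": 2, "search features": 2, "results display": 3, "cost": 5},
    "Product D": {"indexing formats": 3, "search features": 5, "results display": 2, "cost": 3},
}

shortlist = score_products(weights, ratings)[:3]
for product, total in shortlist:
    print(product, total)
```

The scores only order the candidates; demonstrations and reference checks still decide between the three that remain.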

25
Choosing the right search engine
  • Some Suggestions
  • - Search system development or selection should
    be based primarily on local needs
  • - Consider freeware search engines if they meet
    your requirements.
  • - For large, highly developed intranet sites,
    consider commercial search engines
  • - Check whether the web server you are using
    supports indexing and search, and whether this is
    adequate for your needs.

26
Choosing the right search engine
  • IT professionals should make an effort to
    keep abreast of current web technologies
  • The features available within a tool should be
    used properly to get maximum benefit
  • Carefully consider the interrelations between the
    three major components: document resources, users
    and the search engine.

27
Free and commercial search engines
  • For bibliographic and textual databases
    (multi-record files)
  • - MG (Managing Gigabytes) (www.mds.rmit.edu.au/mg/)
  • - freeWAIS-sf (www.wsc.com/freeWAIS-sf/fwmain.html)
  • - Isearch (www.cnidr.org/ir/isearch.html)
  • - WWWISIS (www.bireme.br/wwwisis2.htm)

28
Free and commercial search engines
  • For HTML and text files (web site indexing and
    file/directory level indexing)
  • - SWISH-E (sunsite.berkeley.edu/SWISH-E/)
  • - ht://Dig (htdig.sdsu.edu/)
  • - Excite for Web Servers (www.excite.com/navigate/)
  • - WebGlimpse (glimpse.cs.arizona.edu/webglimpse/)
  • For structured/formatted data
  • - MySQL (www.tcx.se/)

29
Free and commercial search engines
  • Commercial search engines
  • - AltaVista (www.altavista.digital.com/)
  • - Fulcrum (www.fulcrum.com/)
  • - Infoseek (software.infoseek.com)
  • - Open Text (www.opentext.com/)
  • - Oracle (www.oracle.com/)
  • - PLS (www.pls.com/)
  • - Verity (www.verity.com/)

30
Search engines: Related sources
  • Boeri, Robert J. Intranet searching: A light at
    the end of the tunnel. EMedia Professional, June
    1998, pp. 63-69.
  • Esler, Sandra L. and Nelson, Michael L. NASA
    indexing benchmarks: evaluating text search
    engines. Journal of Network and Computer
    Applications, 20, 1997, pp. 339-353.
  • Hibbard, Justin. Applications: Straight Line to
    Relevant Data: Customized Content Should Slash
    Intranet Search Time. Information Week, November
    17, 1997.
  • Nance, Barry. Internal Search Engines Get You
    Where You Want To Go. Network Computing, October
    8, 1997.

31
Search engines: Related sources
  • Railsback, Kevin. Serving Up Quality
    Searches: Six Server-based Packages for Adding
    Search Capability to a Website. Internet
    Computing, February 16, 1998.
  • Sullivan, Danny. Search Engine Solutions for Your
    Site: Make Your Site Easy to Search with an
    Assortment of Features and Techniques.
    NetGuide, December 1, 1996.
  • Zorn, Peggy et al. Surfing corporate intranets:
    Search tools that control the undertow. Online,
    May/June 1997, pp. 30-51.

32
Intranet search engine: ht://Dig
  • Developed in 1995 at San Diego State University
    as a way to search the various web servers on
    the campus network.
  • The current release is htdig-3.1.3, available at
    htdig.sdsu.edu/files/htdig-3.1.3.tar.gz
  • The ht://Dig system is a complete World Wide Web
    indexing and searching system for a small domain
    or intranet.
  • It contains four program modules:
  • - htdig (retrieves HTML documents)
  • - htmerge (creates the document index and word
    database)
  • - htfuzzy (creates indexes for different "fuzzy"
    search algorithms)
  • - htsearch (the search engine)
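
The idea behind a "fuzzy" search index can be sketched with Soundex, a well-known phonetic code, used here only as a familiar example. The slides do not say which algorithms htfuzzy implements, and this is not ht://Dig's code; the vocabulary and function names are invented.

```python
from collections import defaultdict

# Standard Soundex digit groups; vowels, h, w and y carry no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit

def soundex(word):
    """Return the four-character Soundex code of a word, e.g. R163."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        digit = CODES.get(letter, "")
        if digit and digit != previous:
            code += digit
        # h and w do not separate letters that share a digit.
        if letter not in "hw":
            previous = digit
    return (code + "000")[:4]

def build_fuzzy_index(words):
    """Group the indexed vocabulary by Soundex code."""
    fuzzy_index = defaultdict(set)
    for word in words:
        fuzzy_index[soundex(word)].add(word.lower())
    return fuzzy_index

def expand(fuzzy_index, query_word):
    """Indexed words that sound like the query word."""
    return sorted(fuzzy_index.get(soundex(query_word), set()))

# Invented vocabulary taken from an imaginary word database.
vocabulary = ["catalogue", "catalog", "library", "librarian", "search"]
fuzzy_index = build_fuzzy_index(vocabulary)
print(expand(fuzzy_index, "catalogs"))
```

Building the fuzzy index ahead of time means the search program only has to look up one code per query word and then search for the expanded word list.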

33
Intranet search engine: MG
  • Developed in 1994 by Tim C. Bell (University of
    Canterbury), Alistair Moffat (University of
    Melbourne), Ian Witten (University of Waikato) and
    Justin Zobel (RMIT).
  • The current version is 1.2.1
  • MG is a collection of programs that use
    compression to provide economical storage and
    indexing for large collections of documents, as
    well as fast index construction and query
    processing.
  • It can be obtained via anonymous FTP from the
    Australian archive host munnari.oz.au
    (128.250.1.21), in the directory /pub/mg;
    the documentation is available at
    www.mds.rmit.edu.au/mg/
  • It consists of three program modules: Mgbuild
    (database creation), Mgquery (database search)
    and Mgmerge (database updating)
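
How compression can make an index economical to store may be sketched as follows. The example stores the gaps between sorted document numbers in variable-length bytes; this is a generic stand-in chosen for brevity and is not claimed to be the coding MG itself uses. The posting list is invented.

```python
def encode_number(number):
    """Encode a non-negative integer in 7-bit groups, low bits first.
    The high bit is set only on the final byte of each number."""
    out = bytearray()
    while number >= 128:
        out.append(number & 0x7F)
        number >>= 7
    out.append(number | 0x80)
    return bytes(out)

def compress_postings(doc_ids):
    """Store an ascending list of document numbers as encoded gaps."""
    out = bytearray()
    previous = 0
    for doc_id in doc_ids:
        out += encode_number(doc_id - previous)
        previous = doc_id
    return bytes(out)

def decompress_postings(data):
    """Rebuild the original document numbers from the encoded gaps."""
    doc_ids = []
    current = 0
    value = 0
    shift = 0
    for byte in data:
        if byte & 0x80:
            # Final byte of a gap: finish the value and add it to the running total.
            value |= (byte & 0x7F) << shift
            current += value
            doc_ids.append(current)
            value = 0
            shift = 0
        else:
            value |= byte << shift
            shift += 7
    return doc_ids

# Invented posting list: the documents in which one term occurs.
postings = [3, 7, 154, 159, 100002]
packed = compress_postings(postings)
print(len(packed), "bytes instead of", 4 * len(postings))
print(decompress_postings(packed))
```

Frequent terms occur in many documents, so their gaps are small and most of them fit in a single byte, which is where the saving comes from.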