INLS102 Week11: Info Retrieval in Practice: Understanding Search Engines

Provided by: ils2
Learn more at: http://ils.unc.edu

1
INLS102 Week11: Info Retrieval in Practice:
Understanding Search Engines
  • Date: 11/10/05
  • Instructor: Leo Cao
  • SILS, UNC-Chapel Hill

2
Search engines: basics
  • All Internet search engines do the following:
  • They search the Internet -- or select pieces of
    the Internet -- based on important words.
  • They keep an index of the words they find, and
    where they find them.
  • They allow users to look for words or
    combinations of words found in that index.
  • http://www.howstuffworks.com/search-engine.htm/printable
  • You are NOT searching the entire web; you are
    only searching the index of the web created by
    that particular search engine.
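The "index of words and where they were found" idea can be sketched in a few lines of Python. This is only an illustration of an inverted index, not how any particular search engine is implemented; the page URLs, the `build_index` and `lookup` names, and the whitespace tokenizing are all assumptions made for the example.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def lookup(index, *words):
    """Return the pages that contain every given word.

    Only the index is consulted; the pages themselves are never re-read.
    """
    matches = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*matches) if matches else set()


# Hypothetical pages standing in for documents a spider has visited.
pages = {
    "a.example/1": "search engines index the web",
    "a.example/2": "spiders crawl the web",
}
index = build_index(pages)

print(lookup(index, "web"))           # both pages
print(lookup(index, "crawl", "web"))  # only a.example/2
```

A word that was never indexed returns no pages, which mirrors the point above: a query only ever sees what is in the index.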

3
Types of search tools
  • Search engines (e.g., Google)
  • Built by computer robot programs
  • Subject directories (e.g., Yahoo)
  • Built by human selection
  • Organized into subject categories
  • Invisible web (or deep web)
  • Not all web pages are searchable
  • Some specialized databases and excluded pages
  • http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html

4
Search engines: mechanics
  • Spiders (software agent programs) crawl (read
    and index) the web to build and update the
    information for the search engine
  • More likely to index traffic-heavy sites
  • <meta> info generally gets more indexing priority
  • Any data linked on the WWW is fair game to be
    indexed
  • You can tell the visiting spider not to index
    particular pages by creating a robot exclusion
    file (robots.txt); see the short tutorial for more
    info
  • http://www.searchengineworld.com/robots/robots_tutorial.htm
  • The search engine uses ranking algorithms to
    order and filter your search results
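As a rough illustration of the robot exclusion file, the sketch below uses Python's standard `urllib.robotparser` module to check whether a well-behaved spider may visit a page. The robots.txt contents, the `ExampleSpider` agent name, the `may_index` helper, and the example.com URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: every spider is asked to stay out of /private/.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def may_index(url, agent="ExampleSpider"):
    """Return True if the rules above allow this agent to fetch the URL."""
    return parser.can_fetch(agent, url)


print(may_index("http://www.example.com/index.html"))
print(may_index("http://www.example.com/private/page.html"))
```

The file is a request, not an enforcement mechanism: it only works for spiders that choose to check it before crawling.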

5
Search engines: mechanics
  • http://www.howstuffworks.com/search-engine.htm/printable

6
Search engines: ranking
  • POPULARITY RANKING of search results
  • by how many other sites link to each page
  • Google's PageRank (their mathematical ranking
    algorithm)
  • RELEVANCY RANKING of search results
  • factors such as how often your terms occur in
    documents, whether they occur together as a
    phrase, and whether they are in the title or how
    near the top of the text they appear.
  • SUBJECT-BASED POPULARITY RANKING of search
    results
  • the links in pages on the same subject are used
    in ranking search results
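To make link-based popularity ranking concrete, here is a minimal sketch of the simplified, textbook form of PageRank: each page repeatedly shares its score among the pages it links to. It is not Google's production algorithm. The four-page link graph, the damping value of 0.85, and the fixed iteration count are assumptions chosen for illustration, and every linked page is assumed to appear as a key in `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's score evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score to all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: A, B, and D all link to C; C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph C ends up with the highest score because three pages link to it, and A comes second because its only inbound link is from the highly ranked C.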

7
Search engines: major types
  • Keyword based: Google, etc.
  • Directory based: Yahoo, etc.
  • Alternate approaches
  • Natural language: Ask Jeeves
  • Keyword refine: Teoma
  • Keyword relationship: KartOO
  • File-specific searches are starting up (Google
    image search, etc.)
  • Search engines: Google, AllTheWeb, Yahoo, MSN
    Search, Lycos, Ask Jeeves, AOL Search, Teoma,
    WiseNut, AltaVista, HotBot, Netscape Search

8
Effective search strategies
  • Analyze your topic to decide where to begin
  • Include more related terms
  • Phrase searching by enclosing terms in double
    quotes
  • AND/OR/NOT searching, capitalized
  • - excludes, + requires exact form of word
  • Limit results by language in Advanced Search
  • Field limiting
  • link:, site:, allintitle:, intitle:, allinurl:,
    inurl:, etc.
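The strategies above combine into a single query string. The helper below is a small sketch of that composition; `build_query` and its parameters are hypothetical names, and the operator spellings (double quotes for a phrase, a leading minus to exclude, `site:` and `intitle:` with a trailing colon) follow the Google-style syntax the slide lists. Other engines may differ.

```python
def build_query(terms=(), phrase=None, exclude=(), site=None, intitle=None):
    """Compose a search query string from common operators."""
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')                   # exact phrase
    parts.extend(f"-{word}" for word in exclude)      # excluded words
    if site:
        parts.append(f"site:{site}")                  # limit to one site
    if intitle:
        parts.append(f"intitle:{intitle}")            # word must be in title
    return " ".join(parts)


query = build_query(
    terms=["retrieval"],
    phrase="search engine",
    exclude=["spam"],
    site="ils.unc.edu",
)
print(query)  # retrieval "search engine" -spam site:ils.unc.edu
```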

9
Effective search strategies 2
  • Search engines are not the only way to search;
    people are also a resource
  • At the same time, search engines are very
    efficient for low-cost, quick searches
  • Search approach by need:
  • http://www.noodletools.com/debbie/literacies/information/5locate/adviceengine.html

10
General discussion
  • What makes a good search engine?
  • Any good search approach or strategies not
    covered today?
  • What do you think will be the next big thing in
    search?

11
Boolean logic (will be on final)
  • Three operators
  • AND
  • OR
  • NOT
  • If the dark center area is A AND B AND C,
  • what are the other regions?
  • A AND B NOT C ?
  • A OR B NOT C ?
  • http://library.albany.edu/internet/boolean.html
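One way to check your answers is to model the three circles as Python sets, where AND is intersection, OR is union, and NOT is set difference. The numbers in each set are arbitrary labels for the seven regions of a three-circle diagram, and reading "A OR B NOT C" as "(A OR B) NOT C" is an assumption about grouping; individual search engines may group operators differently.

```python
# Each number labels one region of a three-circle Venn diagram.
A = {1, 2, 4, 5}
B = {2, 3, 5, 6}
C = {4, 5, 6, 7}

center = A & B & C            # A AND B AND C   -> {5}
a_and_b_not_c = (A & B) - C   # A AND B NOT C   -> {2}
a_or_b_not_c = (A | B) - C    # (A OR B) NOT C  -> {1, 2, 3}

print(center, a_and_b_not_c, a_or_b_not_c)
```

AND narrows the result to the overlap, OR widens it to everything in either circle, and NOT removes whatever falls inside the excluded circle.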

12
Assignment 6
  • See assignments page

13
Group time
  • Extra time to work on your group work