Search Engine Interfaces - PowerPoint PPT Presentation

Slides: 13
Provided by: nichola72

Transcript and Presenter's Notes

Title: Search Engine Interfaces


1
Search Engine Interfaces
  • search engine modus operandi

2
The basics: what's a search engine?
  • Search engines are special websites that are
    designed to find information stored on other
    sites
  • Most have the following capabilities
  • Search the Internet based on important words
  • Keep an index of the words they find and where
    they were found
  • Allow users to look for words or combinations of
    words in that index

3
There's a lot of sites out there.
  • Indeed (thousands upon thousands nowadays)
  • The first search engine was Archie (archive
    without the v), which indexed FTP archives.
    Later, after the rise of Gopher, came
  • Veronica (Very Easy Rodent-Oriented Net-wide
    Index to Computerized Archives)
  • Jughead (Jonzy's Universal Gopher Hierarchy
    Excavation And Display)

4
There's a lot of sites out there.
  • Wandex - 1993 (first search engine for the Web)
  • WebCrawler - 1994 (let users search for any word
    in any page; revolutionary then, now standard)
  • Lycos - 1994 (Carnegie Mellon University)
  • Many others came after.
  • Excite, Infoseek, Inktomi, Northern Light,
    AltaVista, Yahoo!
  • Google launched in 1998 and rose to prominence
    around 2000 because of its innovative PageRank
    system

5
How does it work?
  • The pieces of a search engine
  • A spider or crawler
  • Software robots that go out and visit pages on
    the web and build lists of words that they find
    on each page
  • An index
  • The data (words) that are gathered are indexed
    (by a method determined by the particular search
    engine)
  • A search
  • Usually accompanied by Boolean logic
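
The three pieces above can be sketched in a few lines of Python. This is only an illustration, not how any particular engine is built: the PAGES dictionary stands in for what a spider would fetch, and the URLs, page texts, and function names are all made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for crawled pages: URL -> page text.
PAGES = {
    "http://example.com/a": "Gopher search tools: Veronica and Jughead",
    "http://example.com/b": "Web search engines crawl and index pages",
    "http://example.com/c": "Search the index with Boolean logic",
}

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """The index: map each word to the set of URLs where it was found."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search_and(index, *words):
    """Boolean AND: pages that contain every word."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()

def search_or(index, *words):
    """Boolean OR: pages that contain at least one of the words."""
    return set().union(*(index.get(w.lower(), set()) for w in words))

index = build_index(PAGES)
print(sorted(search_and(index, "search", "index")))
print(sorted(search_or(index, "gopher", "crawl")))
```

A real spider would fetch pages over the network and follow their links; here the fetching step is skipped so the index and the Boolean search stay in focus.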

6
Example: Google
  • Claim to fame: the PageRank system
  • Uses multiple spiders (initially 3 at once)
  • Spiders take note of
  • Words on the page and where they were found
  • The index consists of every significant word on
    each page
  • Google excludes the articles a, an, and the
  • Each page that is indexed is weighted according
    to the PageRank System (a link analysis algorithm
    to provide a numerical weight)
  • Searching
  • When a search is performed by a user, Google
    retrieves from its index all of the pages that
    contain those keywords and sorts them by their
    assigned PageRank
  • Ideally the first several sites listed will match
    your search criteria
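
As a rough sketch of the idea on this slide, the Python below computes a PageRank-style weight by simple iteration and uses it to sort keyword matches. The link graph, the tiny index, the damping value of 0.85, and all function names are assumptions made for the example; a production engine combines many more signals than this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping a page to the list of pages it links to.
    Returns a dict mapping each page to its numerical weight."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes an equal share of its rank to each page it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

ARTICLES = {"a", "an", "the"}  # excluded from the index, per the slide

def search(query, index, rank):
    """Pages containing every non-article keyword, highest weight first."""
    words = [w for w in query.lower().split() if w not in ARTICLES]
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits, key=lambda p: rank.get(p, 0.0), reverse=True)

# Hypothetical link graph and index.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
index = {"search": {"A", "B", "C"}, "engine": {"A", "C", "D"}}
rank = pagerank(links)
print(search("the search engine", index, rank))  # prints ['C', 'A']
```

Page C comes first because three pages link to it, while page A is linked to only by C.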

7
Example: Ask (formerly AskJeeves)
  • Claim to fame: the ExpertRank algorithm (formerly
    Teoma)
  • Uses multiple spiders
  • Spiders take note of
  • Words on the page and where they were found
    (same as Google)
  • The index consists of every significant word on
    each page
  • Uses link analysis like Google
  • Each page is then also analyzed to determine its
    popularity among pages that are considered
    experts on the topic of the search. This is
    called subject-specific popularity.
  • Searching - natural language search (or
    subject-specific search)
  • When a search is performed by a user, Ask goes
    and finds the keywords in its index, figures out
    the topics (known as clusters), the experts on
    those topics, and then finds the most popular
    results among those experts
  • This leads to a unique editorial flavor to
    searching (www.ask.com)
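
The subject-specific popularity idea can be illustrated loosely in Python. This is not the ExpertRank algorithm, whose details are not given in these slides. It assumes, purely for the example, that an "expert" is any page linking to at least two of the pages that match the query, and that a result's popularity is the number of experts linking to it. All page names are hypothetical.

```python
def subject_specific_popularity(candidates, links, min_expert_links=2):
    """candidates: set of pages matching the query keywords (one topic cluster).
    links: dict mapping a page to the set of pages it links to.
    Returns (experts, ranking, scores)."""
    # Assumed definition: an expert links to several pages within the cluster.
    experts = {
        page for page, targets in links.items()
        if len(targets & candidates) >= min_expert_links
    }
    # Popularity among experts: how many experts link to each candidate.
    scores = {
        c: sum(1 for e in experts if e != c and c in links[e])
        for c in candidates
    }
    # Highest score first; ties are broken alphabetically.
    ranking = sorted(candidates, key=lambda c: (-scores[c], c))
    return experts, ranking, scores

candidates = {"p1", "p2", "p3"}
links = {
    "hub1": {"p1", "p2"},
    "hub2": {"p1", "p3"},
    "blog": {"p3"},
}
experts, ranking, scores = subject_specific_popularity(candidates, links)
print(sorted(experts))  # prints ['hub1', 'hub2']
print(ranking)          # prints ['p1', 'p2', 'p3']
```

The link from "blog" does not count toward p3's score, because "blog" links to only one page in the cluster and so is not treated as an expert here.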

8
Notable others: AltaVista and Lycos
  • The AltaVista search engine indexes every word on
    the page - even insignificant articles such as
    a, an, and the.
  • The Lycos search engine is said to index around
    100 of the most frequently used words on the page
    as well as each word in the first 20 lines of
    text.
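
The two indexing strategies can be compared with a short Python sketch. The function names and the sample page are invented for the example, and the Lycos figures (100 words, 20 lines) are simply the ones quoted on this slide.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def altavista_style_terms(text):
    """Index every word on the page, including a, an, and the."""
    return set(tokenize(text))

def lycos_style_terms(text, top_n=100, first_lines=20):
    """Index the top_n most frequent words plus every word
    in the first first_lines lines of text."""
    frequent = {word for word, _ in Counter(tokenize(text)).most_common(top_n)}
    head = "\n".join(text.splitlines()[:first_lines])
    return frequent | set(tokenize(head))

page = "the cat sat\nthe dog ran\na bird flew"
print(sorted(altavista_style_terms(page)))
# Smaller limits than 100 and 20, so the difference shows on a tiny page.
print(sorted(lycos_style_terms(page, top_n=1, first_lines=1)))
```

With the reduced limits, the second call keeps only the most frequent word ("the") and the words on the first line, so words further down the page never reach the index.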

9
So many options
  • Google is the most used search engine on the
    Internet today. (Around 50% of queries go through
    it)
  • However, there are more efficient ways to search
  • Ask.com's subject-specific searching much better
    reflects the way the Web is set up (in
    subject-specific clusters). However, because of
    the complexity of its algorithm, the search
    results produced were inferior to those of
    competitors like Google's PageRank system
  • Only recently has Ask begun to cut into the
    search engine market share (it is still way
    behind Google, Yahoo, and MSN) by reducing how
    closely the keywords must match the results (from
    100% to about 95%). This yields more search
    results and puts Ask in a better position to
    compete for market share.

10
By the numbers.
  • Below: Popularity (as of 12/07)
  • Right: Timeline of major launches

11
Search engines of the future.
  • Two types of searching: Navigational Search and
    Research Search
  • Navigational search - the user uses the search
    engine as a tool to navigate to a particular
    intended document
  • Research search - the user provides the search
    engine with a phrase which is intended to denote
    an object about which the user is trying to
    gather/research information.
  • Rather than use ranking algorithms such as
    Google's PageRank to predict relevancy, Semantic
    Search uses semantics, the science of meaning in
    language, to produce highly relevant search
    results.
  • The goal is to deliver the information queried by
    a user rather than have a user sort through a
    list of loosely related keyword results.

12
Semantic Searching
  • Contingent upon correct semantic markup and
    searching over richly structured data (e.g. XML
    and RDF)
  • The goal is to deliver the information queried by
    the user rather than have a user sort through a
    list of loosely related keyword results.
  • Examples: www.hakia.com and www.PowerSet.com
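
To make "searching over richly structured data" concrete, here is a minimal Python sketch of querying RDF-style triples (subject, predicate, object). The triples reuse launch dates from the earlier slides; the predicate names and helper functions are assumptions made for the example and do not describe how Hakia or Powerset work.

```python
# RDF-style facts as (subject, predicate, object) triples.
TRIPLES = [
    ("Wandex", "type", "SearchEngine"),
    ("Wandex", "launched", "1993"),
    ("WebCrawler", "type", "SearchEngine"),
    ("WebCrawler", "launched", "1994"),
    ("Lycos", "type", "SearchEngine"),
    ("Lycos", "launched", "1994"),
]

def match(pattern, triples):
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for triple in triples:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple

def search_engines_launched_in(year, triples):
    """Answer 'which search engines launched in <year>?' directly."""
    engines = {s for s, _, _ in match((None, "type", "SearchEngine"), triples)}
    launched = {s for s, _, _ in match((None, "launched", year), triples)}
    return sorted(engines & launched)

print(search_engines_launched_in("1994", TRIPLES))  # prints ['Lycos', 'WebCrawler']
```

Because the data says what each value means, the query returns the answer itself rather than a list of pages that happen to contain the keywords.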