Search Engines: The players and the field - PowerPoint PPT Presentation

Slides: 28
Provided by: rb7128

Transcript and Presenter's Notes


1
Search Engines: The players and the field
  • The mechanics of a typical search.
  • The search engine wars.
  • Statistics from search engine logs.
  • The architecture of a search engine.
  • The query engine.

2
Mechanics of a typical search
3
Results and ads returned, ranked
4
Category of first result
5
Result for phrase query
6
Search on the Web
  • Corpus: the publicly accessible Web, static and
    dynamic
  • Goal: retrieve high-quality results relevant to
    the user's need (not docs!)
  • Need
    • Informational: want to learn about something
    • Navigational: want to go to that page
    • Transactional: want to do something
      (web-mediated)
      • Access a service
      • Downloads
      • Shop
    • Gray areas
      • Find a good hub
      • Exploratory search: "see what's there"

Example queries: Low hemoglobin; United Airlines; Car rental Finland; Abortion morality
7
Search Engines as Info Gatekeepers
  • Search engines are becoming the primary entry
    point for discovering web pages.
  • Ranking of web pages influences which pages
    users will view.
  • Exclusion of a site from search engines will
    cut off the site from its intended audience.
  • The privacy policy of a search engine is
    important.

Introna & Nissenbaum, "Defining the Web: The Politics of Search Engines"
Hindman et al., "Googlearchy: How a Few Heavily-Linked Sites Dominate Politics on the Web"
8
Search Engine Wars
  • The battle for domination of the web search space
    is heating up!
  • The competition is good news for users!
  • Crucial: advertising is combined with search
    results!
  • What if one of the search engines manages
    to dominate the space?

9
Yahoo!
  • Synonymous with the dot-com boom, probably the
    best known brand on the web.
  • Started off as a web directory service in
    1994; acquired leading search engine technology
    in 2003.
  • Has very strong advertising and e-commerce
    partners.

10
Lycos!
  • One of the pioneers of the field
  • Introduced innovations that inspired the
    creation of Google

11
Google
  • The verb "google" has become synonymous with
    searching for information on the web.
  • Has raised the bar on search quality
  • Has been the most popular search engine in the
    last few years.
  • Had a very successful IPO in August 2004.
  • Is innovative and dynamic.
  • Has restored the glamour that CS lost in the
    dot-com bust.

12
Live Search (was MSN Search)
  • Synonymous with PC software.
  • Remember its victory in the browser wars with
    Netscape.
  • Developed its own search engine technology only
    recently, officially launched in Feb. 2005.
  • May link web search into its next version of
    Windows.

13
Ask Jeeves
  • Specialises in natural language question
    answering.
  • Search driven by Teoma.

14
Cuil
  • The latest kid on the block
  • Claims to have indexed 120B pages!
  • So far, it does not rank!

15
Experiment with query syntax
  • Default is AND, e.g. computer chess is normally
    interpreted as computer AND chess, i.e. both
    keywords must be present in all hits.
  • +chess in a query means the user insists that
    chess be present in all hits.
  • computer OR chess means at least one of the
    keywords must be present in all hits.
  • "computer chess" (in quotes) means that the phrase
    computer chess must be present in all hits.
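
The operators listed on this slide can be illustrated with a small matcher. This is a minimal sketch under stated assumptions, not how any particular engine parses queries: the function names `tokenize` and `matches` are hypothetical, and the "+" prefix and double quotes are assumed to be the required-term and phrase markers.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches(query, document):
    """Return True if the document satisfies the query (simplified syntax).

    computer chess     -> implicit AND: every keyword must occur
    +chess             -> keyword is required
    computer OR chess  -> at least one keyword must occur
    "computer chess"   -> the words must occur consecutively
    """
    words = tokenize(document)
    # Every quoted phrase must appear as a run of consecutive words.
    for phrase in re.findall(r'"([^"]+)"', query):
        p = tokenize(phrase)
        if not any(words[i:i + len(p)] == p
                   for i in range(len(words) - len(p) + 1)):
            return False
    # Handle the remaining (unquoted) part of the query.
    parts = re.sub(r'"[^"]*"', " ", query).split()
    required = [t[1:].lower() for t in parts if t.startswith("+")]
    if any(t not in words for t in required):
        return False
    plain = [t.lower() for t in parts if not t.startswith("+") and t != "OR"]
    if "OR" in parts:
        return any(t in words for t in plain) if plain else True
    return all(t in words for t in plain)

d1 = "Computer chess is a popular topic"
d2 = "Chess on a computer"
d3 = "Computer science"
print(matches("computer chess", d2))     # both keywords present
print(matches('"computer chess"', d2))   # words present but not adjacent
print(matches("computer OR chess", d3))  # one keyword is enough
```

Trying the same query against several documents shows how the phrase form is stricter than the default AND, and OR is looser.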

16
Statistics from search engine logs
Statistic                              AltaVista (1998)  AlltheWeb (2002)  Excite (2001)
Average terms per query                2.35              2.30              2.60
Average queries per session            2.02              2.80              2.30
Average result pages viewed            1.39              1.55              1.70
Usage of advanced search features (%)  20.4              1.0               10.0
17
The most popular search keywords
AltaVista (1998)  AlltheWeb (2002)  Excite (2001)
sex               free              free
applet            sex               sex
porno             download          pictures
mp3               software          new
chat              uk                nude
18
Web search Users
  • Ill-defined queries
    • Short length
    • Imprecise terms
    • Sub-optimal syntax (80% of queries without an
      operator)
    • Low effort in defining queries
  • Wide variance in
    • Needs
    • Expectations
    • Knowledge
    • Bandwidth
  • Specific behavior
    • 85% look over one result screen only,
      mostly above the fold
    • 78% of queries are not modified
      (1 query/session)
    • Follow links: "the scent of information" ...

19
Query Distribution
Power law: few popular broad queries,
many rare specific queries
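
To make the power-law shape concrete, the sketch below assigns each query a share of traffic proportional to 1/rank^s (a Zipf-style distribution). The exponent `s = 1.0` and the count of 100,000 distinct queries are illustrative assumptions, not figures from the slides, and `zipf_shares` is a hypothetical helper name.

```python
def zipf_shares(num_queries, exponent=1.0):
    """Return the traffic share of each query rank under a Zipf-style law.

    The share of the query at rank r is proportional to 1 / r**exponent.
    Both parameters are assumed values chosen for illustration only.
    """
    weights = [1.0 / rank ** exponent for rank in range(1, num_queries + 1)]
    total = sum(weights)
    return [w / total for w in weights]

shares = zipf_shares(100_000)
head = sum(shares[:100])   # the 100 most popular queries
tail = sum(shares[100:])   # the remaining 99,900 rare queries
print(f"top 100 queries: {head:.1%} of traffic, long tail: {tail:.1%}")
```

Under these assumptions a tiny head of popular queries carries a large share of the traffic, while the many rare queries together still account for the rest.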
20
How far do people look for results?
(Source: iprospect.com, WhitePaper_2006_SearchEngineUserBehavior.pdf)
21
Architecture of a Search Engine
22
Rate of web content change
  • 720K pages from 270 popular sites sampled daily
    from Feb 17 to Jun 14, 1999 [Cho00]

Mathematically, what does this seem to be?
What does this suggest for crawling policy?
23
Diversity
  • Languages/Encodings
    • Hundreds of languages, W3C encodings: 55 (Jul 01)
      [W3C01]
    • Home pages (1997): English 82%, next 15
      languages 13% [Babe97]
    • Google (mid 2001): English 53%, JGCFSKRIP 30%
  • Document and query topic
  • Popular Query Topics (from 1 million Google
    queries, Apr 2000)

24
Search Index - Inverted File
Frequency
  • Also store position of word in web page
    (offset) and information on HTML structure.
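
An inverted file of this kind can be sketched in a few lines. The example below is a minimal illustration, assuming plain-text pages and a hypothetical helper `build_inverted_index`; it records word offsets (from which the frequency follows) and leaves out the HTML-structure information mentioned above.

```python
import re
from collections import defaultdict

def build_inverted_index(pages):
    """Map each word to {page_id: [offsets of the word in that page]}.

    The frequency of a word in a page is the length of its offset list.
    """
    index = defaultdict(dict)
    for page_id, text in pages.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for offset, word in enumerate(words):
            index[word].setdefault(page_id, []).append(offset)
    return index

# Hypothetical two-page corpus.
pages = {
    "p1": "chess is a game",
    "p2": "computer chess beats chess masters",
}
index = build_inverted_index(pages)

for page_id, offsets in index["chess"].items():
    print(page_id, "frequency:", len(offsets), "offsets:", offsets)
```

Keeping offsets rather than bare counts is what makes phrase queries answerable from the index alone, since adjacency can be checked without re-reading the page.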

25
The query engine
  • The interface between the search index, the
    user and the web.
  • Algorithmic details of commercial search engines
    are kept as trade secrets.
  • First step is retrieval of potential results from
    the index.
  • Second step is the ranking of the results based
    on their relevance to the query.
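
The two steps can be sketched as follows. Since commercial ranking algorithms are trade secrets, the scoring used here (summed term frequency) is only a stand-in chosen for illustration, and the names `build_index`, `retrieve`, and `rank` are hypothetical.

```python
from collections import Counter

def build_index(docs):
    """Build a tiny index: term -> {doc_id: term frequency}."""
    index = {}
    for doc_id, text in docs.items():
        for term, count in Counter(text.lower().split()).items():
            index.setdefault(term, {})[doc_id] = count
    return index

def retrieve(index, terms):
    """Step 1: collect the documents that contain every query term."""
    postings = [set(index.get(t, {})) for t in terms]
    return set.intersection(*postings) if postings else set()

def rank(index, terms, candidates):
    """Step 2: order candidates by summed term frequency (assumed score)."""
    scores = {doc: sum(index[t][doc] for t in terms) for doc in candidates}
    return sorted(scores, key=lambda doc: (-scores[doc], doc))

# Hypothetical documents.
docs = {
    "d1": "computer chess chess",
    "d2": "computer computer computer chess",
    "d3": "chess chess chess chess",
}
index = build_index(docs)
terms = ["computer", "chess"]
candidates = retrieve(index, terms)
print(rank(index, terms, candidates))
```

Separating retrieval from ranking means the scoring function can be swapped out without touching the index lookup.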

26
Portal User Interface
27
Crawling the Web
  • Mode of crawl: BFS
  • Frequency of crawl is important
  • robots.txt gives explicit directions on what
    not to crawl
  • Parallel machines crawl all the time
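
A breadth-first crawl that honours robots.txt can be sketched without touching the network. In the example below the link graph `WEB`, the rules in `ROBOTS_TXT`, the agent name `sketch-bot`, and the function `bfs_crawl` are all hypothetical; only `urllib.robotparser.RobotFileParser` comes from the Python standard library. Crawl frequency and parallel machines are not modelled.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory "web": URL -> outgoing links.
WEB = {
    "http://example.com/": ["http://example.com/a",
                            "http://example.com/private/x"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/"],
    "http://example.com/b": [],
    "http://example.com/private/x": ["http://example.com/secret"],
}
# Hypothetical robots.txt content for the site.
ROBOTS_TXT = ["User-agent: *", "Disallow: /private/"]

def bfs_crawl(seed, web, robots_lines, agent="sketch-bot"):
    """Visit pages breadth-first, skipping URLs that robots.txt disallows."""
    robots = RobotFileParser()
    robots.parse(robots_lines)
    seen, order = {seed}, []
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue  # disallowed: do not fetch, do not follow its links
        order.append(url)
        for link in web.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

crawled = bfs_crawl("http://example.com/", WEB, ROBOTS_TXT)
print(crawled)
```

The queue gives the breadth-first order, and the `seen` set keeps the crawler from revisiting a page that several others link to.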