Title: Things You Just Have to Know About Search Engines
1Things You Just Have to Know About Search Engines
- Ran Hock
- Online Strategies
- May 14, 2002
- InfoToday 2002
2Things You Just Have to Know About Search Engines
- 1 - No Search Engine Covers Everything
- 2 - Different Engines "Miss" and Find Different Things
- 3 - Large Numbers Aren't Necessarily Bad Searches
- 4 - All Search Engines Have Techniques That Allow You to Improve Results
3Things You Just Have to Know About Search Engines
- 5 - Metasearch engines are not "search engines"
- 6 - Google is great, but not the only one you should use.
- 7 - Some Things Change, Some Don't
41 - No Search Engine Covers Everything
- There are pages no engine covers: invisible pages
- Un-linked pages, database pages, password-protected sites, deep pages, etc.
- Different engines "miss" and find different things (Point 2)
52 - Different Engines Find and Miss Different
Things
- Each engine may find something others missed.
- Even 2nd tier engines find things missed by the top 3
- Consider the results of the following search on "erris head sailing"
62 - Different Engines Find and Miss Different
Things
72 - Different Engines Find and Miss
Different Things
- Of the 20 different records retrieved by all the engines, Google found (only) 14 (70%)
- Google missed 6 (30%)
- If you had searched Google, then just one more engine, your retrieval would have increased by 15%
- Even HotBot found 2 the other three engines missed.
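The arithmetic behind these figures can be sketched as below. This is only an illustration: the original comparison table is not in this transcript, so the record IDs, the names "EngineB" and "EngineC", and every assignment of records to engines are invented. Only the totals (20 records, 14 found by Google) and HotBot's 2 unique finds come from the slide.

```python
# Hypothetical result sets: record IDs 1..20 stand in for the 20 distinct
# records. "EngineB", "EngineC" and all record assignments are invented.
results = {
    "Google": set(range(1, 15)),       # 14 records, as on the slide
    "EngineB": {1, 2, 3, 15, 16, 17},
    "EngineC": {2, 4, 6, 15, 18},
    "HotBot": {1, 5, 19, 20},          # 19 and 20 are found by HotBot alone
}

def coverage(engine, results):
    """Share of all distinct records that one engine retrieved."""
    union = set().union(*results.values())
    return len(results[engine]) / len(union)

def unique_finds(engine, results):
    """Records found by this engine and by no other engine."""
    others = set().union(*(r for name, r in results.items() if name != engine))
    return results[engine] - others

def gain_from_adding(base, extra, results):
    """Number of new records gained by searching `extra` after `base`."""
    return len(results[extra] - results[base])

print(f"Google coverage: {coverage('Google', results):.0%}")
print("HotBot unique finds:", sorted(unique_finds("HotBot", results)))
print("New records from adding EngineB:", gain_from_adding("Google", "EngineB", results))
```

Set operations are used because the question on the slide is about distinct records: the union gives the total pool, and set difference gives what one engine adds to another.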
82 - Different Engines Find and Miss Different
Things - Why?
- Indexing "policies"
- What words and other items get indexed
- How those things are "parsed"
- Crawling differences
- Starting points
- Depth / Breadth of crawling etc.
- Spam policies
- Ranking
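How crawling differences alone can change what an engine finds is easy to see on a toy link graph. The sketch below is an assumption-laden illustration, not any engine's actual crawler: the page names and links are invented, and a plain breadth-first crawl with a depth limit stands in for "starting points" and "depth of crawling".

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
links = {
    "home": ["about", "products"],
    "about": ["team"],
    "products": ["widget"],
    "widget": ["widget-manual"],
    "team": [],
    "widget-manual": [],
    "orphan": ["widget-manual"],  # no page links to "orphan"
}

def crawl(start_pages, max_depth, links):
    """Breadth-first crawl from the seeds, following links up to max_depth hops."""
    seen = set(start_pages)
    queue = deque((page, 0) for page in start_pages)
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen

shallow = crawl(["home"], 1, links)       # same seed, shallow crawl
deep = crawl(["home"], 3, links)          # same seed, deeper crawl
other_seed = crawl(["orphan"], 1, links)  # different starting point

print(sorted(shallow))
print(sorted(deep))
print(sorted(other_seed))
```

The shallow crawl never reaches the deep page, and no crawl starting from "home" ever reaches the un-linked page, however deep it goes.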
93 - Large Numbers Aren't Necessarily Bad Searches
- Most common complaint
- You're not obligated
- All use some form of relevance ranking
- Relevance ranking does, to some degree at least, the same things we do to find the best items
- What relevance ranking uses
103 - Large Numbers Aren't Necessarily Bad Searches
- Relevance ranking uses some combination of
- Popularity
- Frequency of terms
- Weighting by field (e.g., Title counts more than Summary)
- Proximity of terms
- Weighting by size of the type
- Weighting according to the order in which the searcher entered terms
- Etc.
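A toy scorer shows how a few of these factors can combine. It is a minimal sketch under stated assumptions: the weights, the bonus value, and the two sample documents are invented, and only three factors from the list are modeled (frequency of terms, weighting by field, proximity of terms). Real engines do not publish their formulas.

```python
# Hypothetical weights: a Title match counts three times a Summary match.
FIELD_WEIGHTS = {"title": 3.0, "summary": 1.0}
PROXIMITY_BONUS = 2.0  # added when two consecutive query terms are adjacent

def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

def score(query, doc):
    """Toy relevance score: field-weighted term frequency plus a proximity bonus."""
    terms = tokenize(query)
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = tokenize(doc.get(field, ""))
        # Frequency of terms, weighted by the field they appear in.
        total += weight * sum(words.count(t) for t in terms)
        # Proximity: bonus for each pair of consecutive query terms
        # that also appear side by side in this field.
        for first, second in zip(terms, terms[1:]):
            if any(a == first and b == second for a, b in zip(words, words[1:])):
                total += PROXIMITY_BONUS
    return total

docs = [
    {"title": "Sailing holidays", "summary": "Boats near Erris Head and beyond"},
    {"title": "Erris Head sailing", "summary": "Local guide"},
]
query = "erris head sailing"
ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
for d in ranked:
    print(score(query, d), d["title"])
```

Both documents contain all three query terms, yet the one with the terms together in the title ranks first, which is roughly what a human scanning a result list would also prefer.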
113 - Large Numbers Aren't Necessarily Bad Searches
- Most search engines automatically enhance your search
- Automatic phrase identification
- Word variants (and/or truncation)
- Case sensitivity
- Analysis of documents in the database (links, term association, associative networks, cluster analysis, co-occurrence, etc.)
- Etc.
12Automatic Re-Write - AllTheWeb
134 - All Search Engines Provide Options for You to
Enhance Your Search
- Field Searching
- title
- URL
- date
- language
- etc.
- Boolean (yes, Boolean, which is neither
difficult nor bad)
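To back up the claim that Boolean is not difficult: each operator is just a set operation on the pages that match a term. The sketch below assumes a made-up three-page collection and a hypothetical helper `matching`; it does not reproduce the query syntax of any particular engine.

```python
# Hypothetical mini-collection with "url", "title" and "text" fields.
pages = [
    {"url": "example.org/sailing", "title": "Sailing in Mayo",
     "text": "erris head sailing club"},
    {"url": "example.org/geology", "title": "Geologic resources",
     "text": "geologic resources of worcester"},
    {"url": "example.com/boats", "title": "Boats for sale",
     "text": "sailing boats worcester"},
]

def matching(term, field="text"):
    """Indexes of pages whose given field contains the term (case-insensitive)."""
    term = term.lower()
    return {i for i, page in enumerate(pages) if term in page[field].lower()}

both = matching("sailing") & matching("worcester")     # sailing AND worcester
either = matching("erris") | matching("geologic")      # erris OR geologic
without = matching("sailing") - matching("worcester")  # sailing AND NOT worcester
in_title = matching("sailing", field="title")          # field search: title only
in_url = matching("example.org", field="url")          # field search: URL only

print(both, either, without, in_title, in_url)
```

AND narrows, OR broadens, NOT excludes, and a field search simply applies the same test to one part of the record instead of the whole page.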
144 - All Search Engines Provide Options for You to
Enhance Your Search
- How do you know about these options?
- Use the Advanced Search page
- Read the documentation
- ________________
154 - All Search Engines Provide Options for You to
Enhance Your Search
- Use the Advanced Search page
175 - Metasearch engines are not search engines
- Consider the following example of a search done
in individual engines, then in metasearch engines
18Search done for geologic resources worcester
195 - Metasearch engines are not search
engines
- Most don't search all of the largest engines
- Most don't give you more than 10 or 20 records from each engine
- Most don't convey your full query syntax to the target engines
- Most give paid sites first
- Client-side metasearch programs, e.g., Copernic and Bulls-Eye, do NOT have the above problems.
- Even online metasearch engines have occasional socially redeeming features (Vivisimo's clustering).
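The effect of the per-engine record limit can be sketched in a few lines. The engine names, record IDs, and the `metasearch` function are all hypothetical, and the limit of 2 is chosen only to keep the example small (the slide mentions 10 or 20).

```python
def metasearch(engine_results, per_engine_limit):
    """Merge the first `per_engine_limit` records from each engine, dropping duplicates."""
    merged = []
    for records in engine_results.values():
        for record in records[:per_engine_limit]:
            if record not in merged:
                merged.append(record)
    return merged

# Hypothetical ranked result lists (record IDs), best first.
engine_results = {
    "EngineA": ["r1", "r2", "r3", "r4", "r5"],
    "EngineB": ["r2", "r6", "r7", "r8"],
}

everything = metasearch(engine_results, per_engine_limit=10)  # limit never reached
truncated = metasearch(engine_results, per_engine_limit=2)    # tight limit
missed = [r for r in everything if r not in truncated]

print("All records:", everything)
print("With limit: ", truncated)
print("Missed:     ", missed)
```

With the tight limit, the merged list holds 3 of the 8 distinct records; searching each engine directly would have shown all of them.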
206 - Google is Great, But Not the Only One You
Should Use
- Points 1 and 2 - No search engine finds
everything and different engines find different
things
216 - Google is Great, But Not the Only One You
Should Use
- Great Because of
- Size
- Popularity-based ranking
- Unique content
- newsgroups
- PDFs and other file types
- largest image collection
- Dandy little features like addresses, definitions, etc.
- Pretty good search options
226 - Google is Great, But Not the Only One You
Should Use
- But Doesn't Have
- Everything
- Truncation and NEAR that AltaVista has
- As much news coverage as AllTheWeb
- As much currentness as AllTheWeb (maybe)
- Etc.
237 - Search Engines Change
- In some ways a lot, in other ways very little
247 - Search Engines Change
- Areas of little change
- For most engines: how they do basic things such as phrases, Boolean, truncation, field searching, etc.
257 - Search Engines Change
- Areas of frequent/considerable change
- Some come, some go
- Gone: Go/InfoSeek et al.
- Arrived: WiseNut, Teoma
- How things are arranged on the home page (esp. AltaVista)
- Partners (which directory they use, featured partners and tools, etc.)
- Added content, esp. content types (PDFs, newsgroups, etc. in Google)
26In Summary
- 1 - No Search Engine Covers Everything
- 2 - Different Engines "Miss" and Find Different Things
- 3 - Large Numbers Aren't Necessarily Bad Searches
- 4 - All Search Engines Have Techniques That Allow You to Improve Results
- 5 - Metasearch engines are not "search engines"
- 6 - Google is great, but not the only one you should use.
- 7 - Some Things Change, Some Don't
27- Ran Hock
- Online Strategies
- 1-800-871-4033
- www.onstrat.com
- ran_at_onstrat.com