Title: John Cox
Slide 1: The Search for Quality: productive Web searching
- John Cox
- James Hardiman Library
- NUI, Galway
Slide 2: The Problem
- 7.3 million new Web pages daily
- Quality varies, mainly due to ease of publication and lack of checks
- Quality is in the eye of the beholder
- Over-dependence on general search engines
- Simplistic use of search tools
Slide 3: Some Usage Findings
- NUI, Galway Library survey, March 2000
- Search engines cited by 79 out of 167 respondents
- Exclusively used for, eg Nazism, defamation law, hepatitis C
- Less than 50% satisfied
- Other surveys show very simplistic use
- 33% of users enter one word only
- A further 33% of users enter two words only
- UK survey indicates 80% of searchers waste some time
- US survey shows search rage within 12 minutes
Slide 4: Key Question
- How much better than users are information staff at finding high-quality information on the Web, and what leadership do we provide?
- 5 key actions needed
Slide 5: 5 Key Actions
- Get the best from the search engines
- Go vertical: subject-specific sources
- Take time to experiment, eg helper software
- Exploit the invisible Web
- Actively promote quality searching
Slide 6: 1. Get the Best from the Search Engines
- Understand how they work
- Know their limitations
- Use advanced features
- Search more than one
- Know when not to use them
Slide 7: Search Engine Components
- Crawler: follows links
- Indexer: builds database
- Query processor: lets us search
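The three components above can be sketched in a few lines of Python. This is a toy illustration only, not how any real search engine is built: the PAGES data and the function names crawl, build_index and query are hypothetical, and the "Web" is four made-up pages held in memory. It also shows why a page that nothing links to never reaches the index.

```python
from collections import defaultdict

# Hypothetical in-memory "Web": page name -> (text, outgoing links).
PAGES = {
    "home": ("library search guide", ["medline", "embase"]),
    "medline": ("medline medical database search", ["home"]),
    "embase": ("embase drug database", []),
    "orphan": ("unlinked report on hepatitis", []),  # no page links here
}

def crawl(start):
    """Crawler: follow links from a start page and return the pages reached."""
    seen, queue = [], [start]
    while queue:
        page = queue.pop(0)
        if page in seen or page not in PAGES:
            continue
        seen.append(page)
        queue.extend(PAGES[page][1])
    return seen

def build_index(pages):
    """Indexer: map each word to the set of crawled pages containing it."""
    index = defaultdict(set)
    for page in pages:
        for word in PAGES[page][0].split():
            index[word].add(page)
    return index

def query(index, *words):
    """Query processor: return pages containing every word (implicit AND)."""
    results = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*results)) if results else []

crawled = crawl("home")
index = build_index(crawled)
print(crawled)                      # the orphan page is never reached
print(query(index, "database"))     # pages containing "database"
print(query(index, "hepatitis"))    # empty: the only match was never crawled
```

In this sketch the "orphan" page holds the only mention of hepatitis, but because the crawler finds pages solely by following links, a search for that word returns nothing.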
Slide 8: Common Limitations
- Profit-oriented
- Paid entries listed at top
- Out of date
- Partial site indexing
- Technically must exclude many sites, eg
- Password-protected
- Registration needed
- Database-driven
- Hidden search facilities
Slide 9: Understanding Google
- Strengths
- Coverage
- Cached pages
- File types, eg PDF, .doc, .ppt
- Relevance: link popularity
- Beyond pages: images, newsgroups
- Weaknesses
- Poor Boolean support
- No truncation
- Limited date searching
- Invisible search facilities
- Two pages per site displayed by default
Slide 10: Google coverage
Slide 11: Google search modes
Basic
Advanced
Slide 12: Google file types
Slide 13: Google newsgroup search
Slide 14: Google cached pages 1
Slide 15: Google cached pages 2
Slide 16: Google Boolean limitations 1
Correct syntax: medline OR embase
Slide 17: Google Boolean limitations 2
Correct syntax: medline embase (or use Advanced Search)
Slide 18: Google no truncation
Use: clinton (tax OR taxes OR taxation)
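Since truncation is not available, each word variant has to be spelled out and joined with OR, as in the example above. A minimal sketch of building such a query string follows; the helper names or_group and build_query are hypothetical, and the code only assembles text to paste into a search box rather than calling any search engine.

```python
def or_group(variants):
    """Join word variants into a parenthesised OR group."""
    return "(" + " OR ".join(variants) + ")"

def build_query(required, variants):
    """Combine required terms with an OR group of spelled-out variants."""
    return " ".join(list(required) + [or_group(variants)])

print(build_query(["clinton"], ["tax", "taxes", "taxation"]))
# prints: clinton (tax OR taxes OR taxation)
```

The OR is kept in capitals, matching the syntax shown on the Boolean slides.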
Slide 19: Google few date limits
Slide 20: Google hidden features 1
Discovered at www.searchengineshowdown.com (buried in Google help)
Slide 21: Google hidden features 2
Partial URL v Specific Site Search: not possible on Advanced Search despite Domains limit
Slide 22: Other Search Engines
- Always worth searching more than one, eg
- All the Web (FAST)
- AltaVista
- Lycos/HotBot
- Northern Light (?)
- Overlap may be limited
- Different ranking criteria
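One way to see how limited the overlap is: run the same search on two engines and compare the result lists. The sketch below is an assumption-laden illustration, not a measurement. The result lists are made up, the function names overlap and merge are hypothetical, and the overlap figure used is the Jaccard index (shared results divided by all distinct results).

```python
def overlap(results_a, results_b):
    """Share of distinct results that both engines returned (Jaccard index)."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def merge(results_a, results_b):
    """Combine two result lists, keeping first-seen order and dropping repeats."""
    merged = []
    for url in list(results_a) + list(results_b):
        if url not in merged:
            merged.append(url)
    return merged

# Made-up result lists for illustration only.
engine_a = ["a.example/1", "a.example/2", "b.example/1", "c.example/1"]
engine_b = ["b.example/1", "d.example/1", "c.example/1", "e.example/1"]

print(round(overlap(engine_a, engine_b), 2))  # 2 shared out of 6 distinct
print(len(merge(engine_a, engine_b)))         # distinct results from both
```

A low overlap figure means the second engine is contributing results the first one missed, which is the case for searching more than one.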
Slide 23: 2. Go Vertical: specific tools
Slide 24: Horses for Courses 1
Slide 25: Horses for Courses 2
Slide 26: Horses for Courses 3
Slide 27: 3. Experimentation
- Try out add-on search software, eg
- BullsEye Pro
- Copernic
- Copernic Summariser
Slide 28: BullsEye Pro searching
Slide 29: BullsEye Pro Webliographies
Slide 30: Copernic
Slide 31: Copernic Summariser
Slide 32: 4. Explore the Invisible Web
- Material, often of high quality, that general search engines can't or won't index
- Unlinked pages
- Non-HTML file types, eg audio, video, PDF
- Authenticated sites
- Databases
- Much greater in size than visible Web
Slide 33: invisibleweb.com
Slide 34: invisible-web.net
Slide 35: WebData
Slide 36: Librarians' Index to the Internet
Slide 37: 5. Promote Quality Searching
- Old sources
- Old habits
- New media
Slide 38: Old Sources
Slide 39: Old Habits
Concept analysis
Search strategy formulation
Critical source selection
Patience
Flexibility
Critical appraisal of search hits
Slide 40: New Media
Library Web Site
E-newsletter
http://www.hw.ac.uk/libWWW/irn/irn.html
Weblog
Slide 41: Towards a Brighter Future
- Automatically-generated, accurate metadata
- Smarter search engines
- More quality-sensitive
- More penetrative
- XML structured data
Slide 42: References
- Sherman, Chris and Price, Gary. The invisible Web: uncovering information sources search engines can't see. Medford, N.J.: Information Today, 2001. ISBN 091096551X. (accompanying database at http://invisible-web.net)
- Search Engine Watch: http://www.searchenginewatch.com
- Search Engine Showdown: www.searchengineshowdown.com