Title: Searching the Web I
1Searching the Web I
- Last time (June 30th)
- HTML Links, Tables, Lists, and Fonts
- User-friendly Web page design
- This time
- How to find things on the Web
- Overview of search engines
- Use of META tags (time permitting)
2Coming Attractions
- Next week (July 14th)
- Collecting data via the Web: FORMS (Pomeroy, Link)
- Graphics formats (Kreuz)
- ?More HTML/Web stuff (Kreuz)?
- In two weeks (July 21st)
- Psychology resources on the Web (Whitten)
- Database software FileMaker (Durrence)
- Scanning or digital photography (Klettke)
3Reminders/Caveats
- Be sure to send me your handouts and PowerPoint files
- Don't use funky/weirdo fonts that I don't have
- Make sure presentations are big enough to be seen
4Finding Information on the Web
- Problematic because there is no card catalog (and everybody donates books)
- Four ways to find things
- (1) Hand-built directories
- (2) Search engines
- (3) Meta-search engines
- (4) Smart search engines
- As always, pros and cons with each
5Desirable Characteristics
- The ideal: a search returns all and only the information you're looking for
- The reality: most searches return huge numbers of matches, and none of these may be what you're looking for
- Signal detection metaphor: the best situation would be lots of hits, no misses, and few false alarms
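The signal detection metaphor can be made concrete with a small sketch. The function name `score_search` and the sample page labels are made up for illustration; precision and recall are the standard information-retrieval names for "few false alarms" and "few misses".

```python
def score_search(returned, relevant):
    """Score one search in signal-detection terms (illustrative only)."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant          # relevant pages the search found
    misses = relevant - returned        # relevant pages the search did not return
    false_alarms = returned - relevant  # returned pages that are not relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return {
        "hits": len(hits),
        "misses": len(misses),
        "false_alarms": len(false_alarms),
        "precision": precision,
        "recall": recall,
    }

# Hypothetical search: four pages returned, three pages actually relevant.
result = score_search(returned=["a", "b", "c", "d"], relevant=["a", "b", "e"])
print(result)
```

The ideal search on this slide would score 1.0 on both precision and recall.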
6(1) Hand-Built Directories
- Real people review and organize Web sites
- Pros
- Coherent directory
- Some include reviews or comments
- Cons
- Relatively small database
- Freshness problems
7Hand-built Directories II
- About.com (was The Mining Company)
- http://www.miningco.com/
- Looksmart
- http://www.looksmart.com/
- Magellan
- http://magellan.excite.com/
- Yahoo!
- http://www.yahoo.com/
8(2) Search Engines
- Use programs (called spiders, robots, worms, or trawlers) to periodically explore the Web
- They index a page's location, title, and a variable amount of text
- Results stored in huge online databases
- Can be searched by users using keywords or search terms
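A toy version of the index-then-search idea, as a rough sketch only. The two pages and their URLs are invented, and real engines store far more than a word-to-page table.

```python
from collections import defaultdict

# Hypothetical pages a spider might have fetched: URL -> (title, text)
pages = {
    "http://example.edu/a": ("Psychology Lab", "memory and language research"),
    "http://example.edu/b": ("Course Notes", "searching the web for research"),
}

def build_index(pages):
    """Map each word in a page's title and text to the URLs containing it."""
    index = defaultdict(set)
    for url, (title, text) in pages.items():
        for word in (title + " " + text).lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every keyword in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(pages)
one_word = sorted(search(index, "research"))
two_words = sorted(search(index, "web research"))
print(one_word)
print(two_words)
```

Adding a keyword can only shrink the result list, which is why extra search terms help narrow a search.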
9Search Engines II
- Pros
- Much larger indices than the hand-built directories
- Fresher links
- Cons
- No intelligence - O'Brien example
- No organization - you have to figure out whether a site is relevant
10Major Search Engines I
- AltaVista
- http://www.altavista.com/
- Excite
- http://www.excite.com/
- FAST Search
- http://www.alltheweb.com/
- HotBot (uses Inktomi)
- http://www.hotbot.com/
11Major Search Engines II
- Infoseek
- http://infoseek.go.com/
- Lycos
- http://www.lycos.com/
- Northern Light
- http://www.northernlight.com/
- WebCrawler
- http://www.webcrawler.com/
12Okay - Which is Best?
- In theory, the engine that indexes the most pages
- But many of these pages may be dead or not relevant
- Chart of size as of May 1st, 1999
13KEY: AV = AltaVista, NL = Northern Light, INK = Inktomi, FAST = FAST, EX = Excite, LY = Lycos, IS = Infoseek, WC = WebCrawler
14 (3) Meta-search Engines
- A query is sent to a number of search engines, and the results compiled
- Pros
- Avoids the idiosyncrasies of any one search engine
- Cons
- Can't do complicated Boolean searching
- Can take a little longer than using one search engine
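The "send one query to many engines and compile the results" step might look roughly like this. The two engines here are stand-in functions returning made-up URLs; a real meta-search engine would query each service over the network.

```python
def meta_search(query, engines):
    """Send one query to several engines and compile the results.

    Duplicate URLs are kept only once, credited to the first engine
    that returned them.
    """
    seen = set()
    combined = []
    for name, engine in engines.items():
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                combined.append((url, name))
    return combined

# Stand-in engines (hypothetical), each returning a fixed list of URLs.
def engine_a(query):
    return ["http://example.com/1", "http://example.com/2"]

def engine_b(query):
    return ["http://example.com/2", "http://example.com/3"]

results = meta_search("psychology", {"A": engine_a, "B": engine_b})
for url, source in results:
    print(source, url)
```

The engines are queried one after another in this sketch, which hints at why a meta-search can take a little longer than a single engine.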
15Two Major Meta-search Engines
- MetaCrawler (now part of Go2net)
- http://www.go2net.com/search.html
- Uses About.com, AltaVista, Excite, GoTo.com, Infoseek, LookSmart, Lycos, Thunderstone, WebCrawler, Yahoo!
- Dogpile
- http://www.dogpile.com/
- Uses About.com, AltaVista, Direct Hit, Dogpile Open Directory, GoTo.com, Infoseek, LookSmart, Lycos, Lycos' Top 5, Thunderstone, Yahoo!
16(4) Smart Search Engines
- Attempt to identify authorities (pages that many other pages point to) on a particular topic
- Pros
- Fewer false alarms
- Cons
- Fewer hits (but highly relevant ones)
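One simple way to picture "authorities" is to count how many other pages point to each page. This is a sketch of that idea only, with an invented link graph; it is not the ranking method of any particular search engine.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to
links = {
    "p1": ["hub", "p2"],
    "p2": ["hub"],
    "p3": ["hub", "p1"],
    "hub": [],
}

def rank_by_inbound_links(links):
    """Rank pages by how many *other* pages link to them."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):   # count each linking page only once
            if target != source:      # ignore links from a page to itself
                counts[target] += 1
    return counts.most_common()       # highest inbound count first

ranking = rank_by_inbound_links(links)
print(ranking[0])
```

Pages that nobody links to (like "p3" here) never appear in the ranking, which matches the "fewer hits" trade-off on this slide.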
17Smart Search Engines II
- Google!
- http://www.google.com/
- My current favorite
- Pros
- Includes cached pages (no more "file not found" messages!)
- Can perform backwards searches
- Cons
- Smallish database
18How do I Keep Up?
- Search Engine Watch
- http://www.searchenginewatch.com/
- Search Engine Update
- www.netins.net/showcase/phdss/search/engine/
19Other Approaches
- Stand-alone programs
- Copernic 99
- PC; Mac version in beta
- Build your own meta-search engine
- http://www.copernic.com/index.html
- Sherlock
- Mac only - part of OS 8.5
- Uses plug-ins for many Web sites
- Can also index the contents of hard drives
20Finding an E-mail Address
- Four popular directory sites
- Infospace
- http://www.infospace.com/
- Switchboard
- http://www.switchboard.com/
- WhoWhere?
- http://www.whowhere.lycos.com/
- Yahoo! People Search (was Four11)
- http://people.yahoo.com/
21Searching Newsgroups
- Deja.com (was DejaNews)
- http://www.deja.com/home_ps.shtml
- Archive of the entire Usenet, going back to March 1995
- Don't post a flame to a newsgroup - it will exist forever!
22Will the Spiders Find Me??
- Probably eventually.
- You can submit your URL to various search engines directly; for example:
- Submit It!
- http://www.submit-it.com/
- But unnecessary, if you're patient
23But the Spiders are Idiots!
- Search engines often return just the first few words on the Web page, whether relevant or not
- (Partial) solution: use special META tags
- Introduced by AltaVista and Infoseek, and possibly used by others now
- META tags are used between the <HEAD> and </HEAD> tags
24META Tag Syntax Example
- <META NAME="Description" CONTENT="Roger Kreuz, Associate Professor of Psychology at The University of Memphis.">
- <META NAME="Keywords" CONTENT="Roger Kreuz, Roger J. Kreuz, R. Kreuz, R. J. Kreuz, Kreuz">
- <META NAME="Author" CONTENT="Roger J. Kreuz">
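To see roughly what a spider could pull out of these tags, here is a sketch using Python's standard `html.parser` module. The class name `MetaTagReader` and the surrounding page text are made up for illustration; how any real search engine actually reads META tags is not specified here.

```python
from html.parser import HTMLParser

class MetaTagReader(HTMLParser):
    """Collect NAME/CONTENT pairs from META tags (illustrative sketch)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "meta":
            attrs = dict(attrs)
            name, content = attrs.get("name"), attrs.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

# A hypothetical page with the META tags placed inside <HEAD>...</HEAD>.
page = """<HTML><HEAD>
<TITLE>Roger Kreuz</TITLE>
<META NAME="Description" CONTENT="Roger Kreuz, Associate Professor of Psychology at The University of Memphis.">
<META NAME="Keywords" CONTENT="Roger Kreuz, Roger J. Kreuz, R. Kreuz, R. J. Kreuz, Kreuz">
<META NAME="Author" CONTENT="Roger J. Kreuz">
</HEAD><BODY>Welcome!</BODY></HTML>"""

reader = MetaTagReader()
reader.feed(page)
print(reader.meta["description"])
print(reader.meta["keywords"])
```

A search engine that honors the Description tag can show that sentence in its results instead of the first few words of the page body.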
25EEK! I Don't Want the Spiders to Find Me!
- Missing out on the whole point of the Web
- However, you can:
- 1) Build an intranet (behind a firewall)
- 2) Use a META tag to create a robot exclusion zone
- <META NAME="Robots" CONTENT="noindex, nofollow">
26Keeping the Spiders Away II
- 3) Include a robots.txt file in the top-level directory of your Web server
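A sketch of what a small robots.txt might contain and how a well-behaved spider would interpret it, using Python's standard `urllib.robotparser`. The `/private/` path, the spider name, and the example.edu URLs are invented for illustration; note that robots.txt is only a request, so badly behaved robots can ignore it.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: ask all robots to stay out of /private/.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# "AnySpider" is a made-up robot name; the "*" rule applies to it.
blocked = parser.can_fetch("AnySpider", "http://www.example.edu/private/notes.html")
allowed = parser.can_fetch("AnySpider", "http://www.example.edu/index.html")
print(blocked)  # False: the path falls under the Disallow rule
print(allowed)  # True: no rule excludes this page
```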
27Pet Peeves
- Some search engines (e.g., AltaVista) are now selling their search results
- (e.g., typing in "shoes" -> Nike)
- Many search engines are turning into portals
- Bottom line: the Web has grown up and turned very commercial