Title: The Invisible Web
1. The Invisible Web
- Gary Price, MLIS
- George Washington University
- Chris Sherman
- Associate Editor
- Search Engine Watch
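2. How Search Engines Work
[Diagram: a crawler fetches pages (URL1, URL2, URL3, URL4) from the Web and hands them to an indexer, which builds the search engine database. Your browser sends a query ("Eggs?") and the database returns scored matches ("Eggs."): Eggs - 90, Eggo - 81, Ego - 40, Huh? - 10. Example result: "All About Eggs" by S. I. Am.]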
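The crawler, indexer, database, and query steps on this slide can be sketched as a toy inverted index. This is only an illustration, not how any particular search engine works: the page texts, the `build_index` and `search` names, and the scoring by raw word count are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical pages a crawler might have fetched; the text is made up.
PAGES = {
    "URL1": "all about eggs eggs and more eggs",
    "URL2": "eggo waffles go well with eggs",
    "URL3": "the ego and the id",
    "URL4": "weather report for today",
}

def build_index(pages):
    """Indexer step: map each word to a count of occurrences per URL."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] += 1
    return index

def search(index, query):
    """Query step: return (url, score) pairs, highest score first."""
    hits = index.get(query.lower(), Counter())
    return hits.most_common()

index = build_index(PAGES)
results = search(index, "Eggs")
print(results)
```

Anything the crawler never fetched never reaches `build_index`, so no query can find it. That gap is the Invisible Web.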
3. What is the Invisible Web?
- Stuff that search engine crawlers (spiders) cannot -- or will not -- add to their databases
- 2 to 50 times larger than the visible Web
- Resources often much higher quality than the visible Web
4. What is the Invisible Web?
- Certain file formats (PDF, Flash, Office files, streaming media)
- Why? They aren't HTML text
- Most real-time data (stock quotes, weather, airline flight info)
- Why? Ephemeral, storage intensive
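The "they aren't HTML text" rule can be pictured as a filter that a crawler applies before indexing. The sketch below guesses the type from the file extension using Python's standard `mimetypes` module. A real crawler would more likely check the server's Content-Type header, and the `is_html` helper and the example.com URLs are hypothetical.

```python
import mimetypes

def is_html(url):
    """Guess from the file extension whether a URL points at HTML text."""
    guessed_type, _encoding = mimetypes.guess_type(url)
    return guessed_type == "text/html"

# Hypothetical URLs, for illustration only.
candidates = [
    "http://www.example.com/index.html",
    "http://www.example.com/report.pdf",
    "http://www.example.com/intro.swf",
    "http://www.example.com/budget.xls",
]

# An HTML-only crawler keeps the first list and never indexes the second.
indexable = [url for url in candidates if is_html(url)]
skipped = [url for url in candidates if not is_html(url)]
print(indexable)
print(skipped)
```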
5. What is the Invisible Web?
- Dynamically generated pages (CGI, JavaScript, ASP, or most pages with ? in the URL)
- Why? Spider traps
- Web-accessible databases
- Why? Spiders can't type
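A cautious spider might avoid traps with a rule like the one below: skip any URL that has a query string (the part after ?) or that ends in a script extension. This is a minimal sketch under stated assumptions, not any search engine's actual policy. The `looks_dynamic` helper, the extension list (taken from the examples on this slide), and the example.com URLs are all hypothetical.

```python
from urllib.parse import urlparse

# Script extensions named on the slide; a hypothetical, incomplete list.
DYNAMIC_EXTENSIONS = (".cgi", ".asp")

def looks_dynamic(url):
    """Return True if the URL looks dynamically generated."""
    parts = urlparse(url)
    has_query = bool(parts.query)  # anything after "?"
    is_script = parts.path.lower().endswith(DYNAMIC_EXTENSIONS)
    return has_query or is_script

# Hypothetical URLs, for illustration only.
urls = [
    "http://www.example.com/about.html",
    "http://www.example.com/search.cgi",
    "http://www.example.com/catalog?item=42",
]

to_crawl = [u for u in urls if not looks_dynamic(u)]
print(to_crawl)
```

The same rule that protects the spider from endless generated pages also hides every database record that is only reachable through a search form.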
6. Invisible Web Gateways
- Intelliseek
- http://www.invisibleweb.com
- http://beta.profusion.com
- Complete Planet
- http://www.completeplanet.com/
- Librarians' Index to the Internet
- http://www.lii.org
7. The Invisible Web & The Librarian
- The Need For Knowledge!
- Awareness that the Invisible Web (IW) Exists. Maybe the IW Holds the Content Your Users Can't Find! What is the cost in both wasted time/effort and total frustration?
- Let Others Know About the IW
- Awareness of The Synonyms
- Invisible Web
- Deep Web
- Hidden Web
- Let the Content be Your Calling Card. Focus Less on the Amount of IW Data
8. The Invisible Web & The Librarian
- Why is the IW Useful to the Librarian and the End User?
- Quality of Content (Authority)
- Deep Content on Subject Area (Comprehensiveness)
- Focused Databases (Limited Scope): Smaller Universe of Documents to Search (Maximize Precision/Recall)
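Precision and recall have standard definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The sketch below computes both. The document IDs and the two result lists are invented purely to show the arithmetic and are not measurements of any real tool.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a list of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical data: four documents actually answer the question.
relevant = {"d1", "d2", "d3", "d4"}

# A broad search returns many off-topic pages (x1..x6) along with two hits.
general_results = ["d1", "x1", "x2", "x3", "x4", "x5", "x6", "d2"]

# A focused database searches a smaller universe of documents.
focused_results = ["d1", "d2", "d3", "x1"]

general = precision_recall(general_results, relevant)
focused = precision_recall(focused_results, relevant)
print(general, focused)
```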
9. The Invisible Web & The Librarian
- Why is the IW Useful to the Librarian and the End User?
- Material Unavailable Elsewhere on the Web (Uniqueness)
- Many Options to Limit, Sort, Interact with the Data (Maximize Precision)
- Timeliness vs. Time Lag of General Search Tools (Currency)
10. The Invisible Web & The Librarian
- The IW, The Librarian, The Future
- What Happens If/When the General Search Tools Crawl IW Material? Good News? Bad News?
- General Search Tools May NOT Offer Many Interactive/Limiting Tools
- May Not be Updated/Refreshed as Frequently (time lag). Timeliness, making current info available, is one of the things the Net does well.
11. The Invisible Web & The Librarian
- The IW, The Librarian, The Future
- The Search Engine Business: Will IW Material be a Priority?
- Just One Dialog or SilverPlatter Database? NO, in Terms of Content!!!
- Yes, Common Interface, Syntax. Perhaps XML will Assist
12. The Invisible Web & The Librarian
- Challenges
- It's Not The Magic Bullet. It's a Tool
- We Still Need Traditional Online Databases
- Learning Curve, Sorry!
- Database Selection, When To Use the IW?
- Numerous Interfaces, Syntax
- A Non-Stop Flow of New Material
13. The Invisible Web & The Librarian
- Things To Do!
- Build Your Own Collections: Internet Resource Collection Development
- Mine Entire Sites. Often the IW Material Gets Little or No Notice In Reviews
- Create Links When Possible DIRECT to the Interface.
- Save the Time of the Web Researcher
- Keep Current
14. The Invisible Web & The Librarian
Types of IW Content in Librarian Terms
- Bibliographic: OPACs, Subject Bibs
- Non-Bibliographic: Full-Text, Numeric, Graphic, Directory, Real-Time
15. Future Trends
- Killer apps will lead the way
- Research Index (CiteSeer)
- Search engines will work harder to find Invisible Web content
- Inktomi (Index Connect, Ultraseek)
- WhizBang (wrappers)
- No matter what, there will always be a problem!
16. Coming Soon
Available July 2001, CyberAge Books
ISBN 0-910965-51-X, http://www.invisible-web.net
17. Invisible Web: Computer Science
- McAfee World Virus Map
- http://www.mcafee.com
- ResearchIndex
- http://www.researchindex.com
18. Invisible Web: Company Research
- European High-Tech Industry Database
- http://www.tornado-insider.com/radar/
- Kompass
- http://www.kompass.com
19. Invisible Web: Intellectual Property
- Delphion Intellectual Property Network
- http://www.delphion.com/
- ESP@CENET (European Patent Office) Patent Database
- http://ep.espacenet.com/
20. Invisible Web: Dictionaries & Languages
- EuroDicAutom
- http://eurodic.ip.lu
- Verbix
- http://www.verbix.com/index.html
21. Invisible Web: Art & Artists
- ADAM (Art, Design, Architecture & Media Information Gateway)
- http://adam.ac.uk/
- Artcyclopedia
- http://www.artcyclopedia.com/
22. Invisible Web: Real-Time Information
- Flight Tracker
- http://www.trip.com/ft/home/0,2096,1-1,00.shtml
- J-Track 3-D Satellite Locator
- http://liftoff.msfc.nasa.gov/realtime/JTrack/Spacecraft.html
23. Invisible Web: Maps and Driving Directions
- MapBlast
- http://www.mapblast.com
- Streetmap.co.uk
- http://www.streetmap.co.uk/
24. Invisible Web: Government Info
- Parline Database
- http://www.ipu.org
- United Nations Daily Press Briefings
- http://www.un.org/News/
25. Invisible Web: Health & Medicine
- Economics of Tobacco Control Database
- http://www1.worldbank.org/tobacco/database.asp
- International Digest of Health Legislation
- http://www.who.int
26. Invisible Web: News & Current Events
- Cold North Wind Newspaper Archive Project
- http://www.coldnorthwind.com
- Financial Times Global Archive
- http://www.globalarchive.ft.com
27. Invisible Web: Science
- Great Barrier Reef Online Image Catalogue
- http://www.gbrmpa.gov.au/corp_site/info_services/library/index.html
- Nuclear Explosions Database
- http://www.ausseis.gov.au/databases
28. Invisible Web: Transportation
- Equasis (Merchant Ships)
- http://www.equasis.org/
- World Aircraft Accident Summary (WAAS) Fatal Airline Accident Subset
- http://www.waasinfo.net/