Title: Internet search tools
Internet search tools & techniques
- Hilda Kruger, Ben Fouché & Jerall Toi
- www.knowlead.co.za
- Released under a Creative Commons license
- See http://creativecommons.org/licenses/by-nc-sa/2.5/
Introduction
- "Do you feel like you'll never keep up with all the great resources available on the Internet? I've got news for you: you won't. But that's okay. There are so many great resources on the Internet that you don't have to keep up with absolutely everything."
- (Tara Calishain, 2004, online. Available from: www.researchbuzz.com/sevenways.pdf)
Information trends... (OCLC, 2004. Available from: www.oclc.org/info/2004trends)
- The rapid unbundling of content from traditional containers such as books, journals and CDs → the information consumer is format agnostic
- Access provided on an as-needed basis to the information consumer → micro-payment for micro-content
- Content created, published and shared outside of the traditional structure of the library
The classic model of information retrieval
- "Essentially, a user, driven by an information need, constructs a query in some query language. The query is submitted to a system that selects from a collection of documents (corpus), those documents that match the query as indicated by certain matching rules. A query refinement process might be used to create new queries and/or to refine the results." (Broder, 2001)
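The retrieval loop Broder describes can be sketched in a few lines of Python. This is a minimal illustration only, assuming a toy in-memory corpus and a single matching rule (a document matches when it contains every query term, i.e. Boolean AND). The corpus and the helpers `tokenize`, `build_index` and `match_all` are hypothetical, not taken from any particular search system.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(corpus: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of document ids containing it (an inverted index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in corpus.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def match_all(index: dict[str, set[str]], query: str) -> set[str]:
    """Hypothetical matching rule: return documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# A toy corpus (hypothetical documents).
corpus = {
    "d1": "Web search engines crawl the Web",
    "d2": "Meta-search engines query other search engines",
    "d3": "Subject directories are compiled by humans",
}
index = build_index(corpus)
print(sorted(match_all(index, "search engines")))
# Query refinement: adding a term narrows the result set.
print(sorted(match_all(index, "meta search engines")))
```

The first query matches `d1` and `d2`; refining it with "meta" leaves only `d2`, which mirrors the refinement step in the quotation.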
Browsers: the tool used to explore the WWW
- Firefox
- http://www.mozilla.com/firefox/
- See also: Extensions & Add-ons
- Microsoft Internet Explorer
- http://www.microsoft.com/windows/ie/default.mspx
- Netscape
- http://browser.netscape.com/ns8/
- More information on Web browsers at: http://en.wikipedia.org/wiki/Web_browser
A Web browser is an application used to access information on the WWW.
Bookmarklets
- "Bookmarklets are a special kind of a bookmark (or favorite). A standard bookmark consists of two parts: a URL and a bookmark name. Instead of a standard URL, a bookmarklet uses JavaScript to become a type of mini-program. These brief programs can do a variety of actions such as providing a pop-up calculator, changing the display characteristics of the current page, or taking selected text on the page and passing it off to some search engine."
- (Notess, 2003, online. Available from: http://www.infotoday.com/online/jul03/OnTheNet.shtml)
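To make the idea concrete, the Python snippet below builds the URL of a simple bookmarklet that passes the currently selected text to a search engine. It is a sketch under assumptions: the JavaScript body and the Google search address (`https://www.google.com/search?q=`) are illustrative choices, and `make_bookmarklet` is a hypothetical helper. Saving the printed string as a bookmark's URL is what turns an ordinary bookmark into a bookmarklet.

```python
from urllib.parse import quote

# JavaScript to run when the bookmark is clicked: read the selected text on
# the current page and send it to a search engine (Google is an illustrative choice).
SEARCH_SELECTION_JS = (
    "(function(){"
    "var q=window.getSelection().toString();"
    "location.href='https://www.google.com/search?q='+encodeURIComponent(q);"
    "})();"
)

def make_bookmarklet(js_source: str) -> str:
    """Hypothetical helper: wrap JavaScript source in a javascript: URL."""
    # Browsers percent-decode a javascript: URL before running it, so unsafe
    # characters such as spaces are percent-encoded here; the listed
    # punctuation is left readable.
    return "javascript:" + quote(js_source, safe="(){};=+.:/?'")

print(make_bookmarklet(SEARCH_SELECTION_JS))
```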
Useful tools for information retrieval & personal information management
Useful tools: social software
Example: http://del.icio.us
del.icio.us is a social bookmarks manager. It allows you to easily add sites you like to your personal collection of links, to categorize those sites with keywords, and to share your collection not only between your own browsers and machines, but also with others.
Social bookmarking is an activity performed over a computer network that allows users to save and categorize a personal collection of bookmarks and share them with others. (http://en.wikipedia.org/wiki/Social_bookmarking)
NEW: http://www.google.com/bookmarks/
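At its core, a social bookmarks manager maps user-chosen keywords (tags) to links and records who saved each link. The sketch below is a hypothetical in-memory model of that idea. The `BookmarkStore` class and its methods are illustrative only and do not reflect how del.icio.us or Google Bookmarks actually work. A tag query returns the links filed under all the requested tags.

```python
from collections import defaultdict

class BookmarkStore:
    """Hypothetical in-memory tag index for a shared bookmark collection."""

    def __init__(self) -> None:
        self._by_tag: dict[str, set[str]] = defaultdict(set)
        self._owners: dict[str, set[str]] = defaultdict(set)

    def add(self, user: str, url: str, tags: list[str]) -> None:
        """Save a URL for a user and file it under each (lower-cased) tag."""
        self._owners[url].add(user)
        for tag in tags:
            self._by_tag[tag.lower()].add(url)

    def find(self, *tags: str) -> set[str]:
        """Return the URLs filed under every one of the given tags."""
        sets = [self._by_tag.get(t.lower(), set()) for t in tags]
        return set.intersection(*sets) if sets else set()

    def popularity(self, url: str) -> int:
        """How many users have saved this URL (the 'social' part)."""
        return len(self._owners.get(url, set()))

store = BookmarkStore()
store.add("alice", "http://www.pandia.com", ["search", "news"])
store.add("bob", "http://www.pandia.com", ["Search"])
store.add("bob", "http://searchenginewatch.com/", ["search", "industry"])
print(sorted(store.find("search")))
print(store.popularity("http://www.pandia.com"))
```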
Useful tools: search toolbars
- Google Toolbar: Internet Explorer & Firefox versions
- http://toolbar.google.com/
- http://toolbar.google.com/firefox/index.html
- Groowe Toolbar
- http://www.groowe.com/
- MSN Toolbar
- http://toolbar.msn.com
- Beware! Installs desktop search with the toolbar
A toolbar is a row, column, or block of onscreen buttons or icons that, when clicked, activate certain functions of the program.
Useful tools: desktop search tools
A desktop search program is a piece of software that lets you search your own hard drive, your emails and the web from the same search form. (www.pandia.com)
- Google Desktop
- http://desktop.google.com/
- MSN Desktop Search
- http://desktop.msn.com/
- Copernic Desktop Search
- http://www.copernic.com/en/products/desktop-search/index.html
- More information on desktop search tools at: http://en.wikipedia.org/wiki/Desktop_search
Useful tools: alerts
Google Alerts is a service offered by search engine company Google which notifies you (by email) about the latest web and news pages of your choice. (Wikipedia)
- Google Alerts
- http://www.google.com/alerts
- necessary to register with Google first
- Yahoo! News Alerts
- http://beta.alerts.yahoo.com
- necessary to register with Yahoo! first
Useful tools: personal archiving software
- Furl
- http://www.furl.net/index.jsp
- Google Notebook
- http://www.google.com/notebook
- See also: Bloglines Clippings
- See also: Firefox extension Scrapbook
- http://amb.vis.ne.jp/mozilla/scrapbook/index.php?lang=en
Creating a search strategy
Creating a search strategy: developing an information-seeking routine
Think!
- Identify a search topic → start with a natural language question
- Isolate keywords from your sentence
- Consider alternates for keywords
- Abbreviations? Acronyms? Phrases? Synonyms? Variant spellings? Equivalent terms? Multiple meanings? Broader terms? Narrower terms?
- Combine this information into a search query → syntax
- Evaluate search results
- (Friesen, online. Available from: http://www.learningspaces.org/n/searchaide/)
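The "consider alternates" and "combine into a search query" steps often follow one pattern: OR the alternates for each concept together, then AND the concepts. A minimal Python sketch, assuming a search tool that accepts `AND`/`OR`, parentheses and double-quoted phrases; `quote_term`, `build_query` and the example concepts are hypothetical.

```python
def quote_term(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term

def build_query(concepts: list[list[str]]) -> str:
    """Hypothetical helper: OR the alternates for each concept, then AND the concepts."""
    groups = []
    for alternates in concepts:
        terms = [quote_term(t) for t in alternates]
        groups.append(terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)

# Natural language question: "How do small businesses use social bookmarking?"
concepts = [
    ["small business", "SME", "small enterprise"],
    ["social bookmarking", "del.icio.us"],
]
print(build_query(concepts))
# ("small business" OR SME OR "small enterprise") AND ("social bookmarking" OR del.icio.us)
```

Grouping alternates by concept keeps each synonym list from accidentally broadening the other concepts.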
Creating a search strategy: using Web reference sources
- Thesaurus.com (see also Dictionary.com)
- http://thesaurus.reference.com/
- Web WordNet
- http://wordnet.princeton.edu/perl/webwn
- More information at: http://wordnet.princeton.edu/
- Acronyma
- http://www.acronyma.com/
- Wikipedia
- http://en.wikipedia.org/wiki/Main_Page
- See also: A9 Reference search
Web search engines: syntax
Syntax: the rules governing the construction of search expressions in search tools. (www.webliminal.com/internet-today/it-gloss.html)
Google quiz! How do you...
- Restrict your search results to those with all of your query words in the title?
- allintitle:
- Restrict your search results to those with your query words in the URL of the Web page?
- allinurl:
- Specify that you want to see a list of Web pages similar to a page you've specified?
- related:
- Restrict your results to Web sites in a specific domain?
- site:
Google quiz! How do you...
- Tell Google to search for synonyms of your query words?
- ~
- Specify that you want a list of Web pages that link to a specific Web page?
- link:
- Find a definition in Google?
- define:
- Restrict your results to those with a specific filetype, e.g. PDF?
- filetype:
- More information on Google search syntax at:
- http://www.google.com/help/basics.html
- http://www.google.com/help/refinesearch.html
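Because operators such as `site:` and `filetype:` are typed straight into the query, they can also be assembled programmatically. The sketch below builds a Google results URL from some words plus operators. It assumes the public results page takes the query in a `q` parameter (`https://www.google.com/search?q=...`). `google_url` is a hypothetical helper, and operator support changes over time, so check the help pages listed above.

```python
from urllib.parse import urlencode

def google_url(words: str, **operators: str) -> str:
    """Hypothetical helper: each keyword argument becomes an operator:value pair."""
    parts = [words] if words else []
    for name, value in operators.items():
        parts.append(f"{name}:{value}")
    # urlencode percent-encodes the query (":" becomes %3A, spaces become +).
    return "https://www.google.com/search?" + urlencode({"q": " ".join(parts)})

# Pages on one site about search tools, restricted to PDF files.
print(google_url("search tools", site="knowlead.co.za", filetype="pdf"))
# Pages whose titles contain both words.
print(google_url("", allintitle="invisible web"))
```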
More, more, more ...
- http://books.google.com/
- http://scholar.google.com/
- http://answers.google.com/answers/
- http://www.google.com/dirhp
- http://pages.google.com
- http://labs.google.com/
More Web search engines
- MSN Search
- http://search.msn.com/
- Yahoo
- http://www.yahoo.com/
- A9
- http://a9.com
- More information on search engines at:
- http://en.wikipedia.org/wiki/Search_engine
- http://computer.howstuffworks.com/search-engine.htm/printable
Meta-search engines
What are meta-search engines ... and why use meta-search engines?
- Meta-search engines do not crawl the web compiling their own searchable databases. Instead, they search the databases of multiple sets of individual search engines simultaneously, from a single site and using the same interface. Meta-searchers provide a quick way of finding out which engines are retrieving the best results for you in your search.
- More information on meta-search engines at: Bare Bones 101, online: http://www.sc.edu/beaufort/library/pages/bones/lesson2.shtml
- "Meta-search searches multiple search engines. Using more search engines means a better overall coverage of the Web." (MetaCrawler)
What makes a good meta-search engine?
- Accept complex queries: every search tool has its own syntax, so a meta-search engine should translate its own syntax into the syntax of every source it searches
- Integrate results and eliminate duplicates: many search engines' results will include the same website, so a meta-search engine should remove duplicates and rank higher the results that appear in more engines
Limitation: certain search engines do not allow meta-search engines to include them
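The second requirement (merging result lists, removing duplicates, and ranking higher the results that appear in more engines) can be sketched as follows. This is an illustrative algorithm only. It assumes each engine returns an ordered list of URLs. The scoring rule (number of engines first, then best position as a tie-breaker), the engine names and the `merge_results` helper are hypothetical, not how any of the meta-search engines listed here actually rank. Duplicate detection is deliberately crude.

```python
def normalize(url: str) -> str:
    """Crude duplicate key: ignore case, the scheme, 'www.' and a trailing slash."""
    url = url.lower().rstrip("/")
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url

def merge_results(results_by_engine: dict[str, list[str]]) -> list[str]:
    """Hypothetical merge: drop duplicates, favour URLs returned by more engines."""
    engines_seen: dict[str, set[str]] = {}
    best_rank: dict[str, int] = {}
    first_form: dict[str, str] = {}
    for engine, urls in results_by_engine.items():
        for rank, url in enumerate(urls, start=1):
            key = normalize(url)
            engines_seen.setdefault(key, set()).add(engine)
            best_rank[key] = min(rank, best_rank.get(key, rank))
            first_form.setdefault(key, url)  # keep the first spelling seen
    # More engines first; among equals, the best position in any single engine.
    ordered = sorted(engines_seen, key=lambda k: (-len(engines_seen[k]), best_rank[k]))
    return [first_form[k] for k in ordered]

results = {
    "engine_a": ["http://www.pandia.com/", "http://searchenginewatch.com/"],
    "engine_b": ["http://searchenginewatch.com", "http://www.researchbuzz.com"],
    "engine_c": ["http://pandia.com", "http://searchenginewatch.com/"],
}
for url in merge_results(results):
    print(url)
```

Here the site returned by all three engines rises to the top, even though it was only second in two of the lists.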
Examples of meta-search engines
- Clusty
- http://clusty.com
- Ixquick
- http://www.ixquick.com
- Jux2
- http://www.jux2.com/
- Mamma
- http://www.mamma.com
- Dogpile
- http://www.dogpile.com
Rollyo ("Roll Your Own"): http://rollyo.com/index.html
More information on meta-search engines at: http://en.wikipedia.org/wiki/Metasearch_engine
The living Web
The living web is composed of sites that update on a daily basis. (www.daypop.com/info/about.htm)
"Some parts of the web are finished, unchanging creations as polished and as fixed as books or posters. But many parts change all the time." (Bernstein, online. Available from: http://www.alistapart.com/articles/writeliving/)
What comprises the living Web?
- News sites bring up-to-the-minute developments, ranging from breaking news and sports scores to reports on specific industries, markets, and technical fields
- Weblogs (also called blogs), journals, and other personal sites provide a window on the interests and opinions of their creators
- Corporate weblogs, wikis, knowledge banks, community sites, and workgroup journals provide shared news and knowledge among co-workers and supply-chain stakeholders
- (Bernstein, online. Available from: http://www.alistapart.com/articles/writeliving/)
The living Web
- Newsmap
- http://www.marumushi.com/apps/newsmap
- 10x10
- http://tenbyten.org/10x10.html
- Wikipedia
- http://en.wikipedia.org/wiki/Main_Page
- Weblogs Compendium
- http://www.lights.com/weblogs/
- Blogger
- http://www.blogger.com/start
News sites
Wikis
Blogs
Finding blogs & newsfeeds
- Google Blog Search
- http://blogsearch.google.com/
- Technorati
- http://www.technorati.com/
- Feedster
- http://www.feedster.com/
- Ask Blogs & Feeds
- http://www.ask.com/?toolbls
More information on blogs at: http://en.wikipedia.org/wiki/Blog
Syndication & Aggregators
Example of an entry in a feed (Nottingham, 2005)
- Earth Invaded
- http://news.example.com/2004/12/17/invasion
- The earth was attacked by an invasion fleet from halfway across the galaxy; luckily, a fatal miscalculation of scale resulted in the entire armada being eaten by a small dog.
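Feed entries are XML, so a few lines of Python's standard library are enough to pull out the title, link and text of the example entry. This sketch assumes the entry is an RSS 2.0 `<item>` with `<title>`, `<link>` and `<description>` elements (an Atom `<entry>` uses different element names and a namespace). The markup wrapped around the example text is therefore a reconstruction under that assumption, and `parse_item` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The example entry, written as an RSS 2.0 <item> (an assumption about its format).
ITEM_XML = """
<item>
  <title>Earth Invaded</title>
  <link>http://news.example.com/2004/12/17/invasion</link>
  <description>The earth was attacked by an invasion fleet from halfway across
  the galaxy; luckily, a fatal miscalculation of scale resulted in the entire
  armada being eaten by a small dog.</description>
</item>
"""

def parse_item(xml_text: str) -> dict[str, str]:
    """Hypothetical helper: return the title, link and description of one <item>."""
    item = ET.fromstring(xml_text.strip())
    return {
        field: (item.findtext(field) or "").strip()
        for field in ("title", "link", "description")
    }

entry = parse_item(ITEM_XML)
print(entry["title"])
print(entry["link"])
```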
Syndication & Aggregators
- Syndication means that when you publish your blog, a machine-readable representation of it (a feed) is automatically generated, which other web sites and information aggregation tools can pick up and display
- Special pieces of software called Newsreaders (or Aggregators) can scan these feeds, automatically letting you know when the sites have updated
- Various Newsreaders/Aggregators are listed at:
- http://www.atomenabled.org/everyone/atomenabled/index.php?c=5
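What a newsreader does on each check can be sketched as: read the feed, compare its items against those already seen, and report only the new ones. The sketch below takes the feed text as a string so that it runs offline. In a real aggregator, the text would come from periodically downloading the feed's URL. It assumes the RSS 2.0 `<channel>`/`<item>` structure and uses each item's `<guid>` (falling back to `<link>`) as its identity. `new_items` and the two sample feeds are hypothetical.

```python
import xml.etree.ElementTree as ET

def new_items(feed_xml: str, seen: set[str]) -> list[str]:
    """Hypothetical helper: return titles of unseen items and remember their ids."""
    channel = ET.fromstring(feed_xml).find("channel")
    fresh: list[str] = []
    if channel is None:
        return fresh
    for item in channel.findall("item"):
        item_id = item.findtext("guid") or item.findtext("link") or ""
        if item_id and item_id not in seen:
            seen.add(item_id)
            fresh.append(item.findtext("title") or "(untitled)")
    return fresh

# Two made-up snapshots of the same feed, taken one day apart.
FEED_V1 = """<rss version="2.0"><channel><title>Example news</title>
<item><title>Earth Invaded</title><link>http://news.example.com/2004/12/17/invasion</link></item>
</channel></rss>"""

FEED_V2 = """<rss version="2.0"><channel><title>Example news</title>
<item><title>Dog Saves Earth</title><link>http://news.example.com/2004/12/18/dog</link></item>
<item><title>Earth Invaded</title><link>http://news.example.com/2004/12/17/invasion</link></item>
</channel></rss>"""

seen: set[str] = set()
print(new_items(FEED_V1, seen))  # first check: everything is new
print(new_items(FEED_V2, seen))  # second check: only the added item is reported
```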
Syndication & Aggregators
- Sage
- http://sage.mozdev.org/
- Bloglines
- http://www.bloglines.com/
- Google Reader
- http://www.google.com/reader/
- Firefox Live Bookmarks
Fee versus free
Free versus fee: On the Net versus Via the Net
- "The vast majority of workers seek free information on the Internet. But many important business sources are not available for free on the Web. And because searches on the Web cannot be aggregated, finding useful information is difficult and time consuming. The free information on the Internet actually comes at a substantial cost to the enterprise." (Factiva, 2002)
- "Content consumers will tolerate some costs for content they value, but that value is increasingly related to control over the content delivery options, filtering, personalization and convenience." (OCLC Marketing staff)
Requirements for an effective online information source (http://www.factiva.com/collateral/files/whitepaper_feevsfree_0504.pdf)
- Advanced search features
- Reliability and authority: information on which decisions are based must come from authoritative, reviewed and edited sources
- Updated and archived: must provide timely access to the most up-to-date information and extensive archives
- Aggregated: information sources should be aggregated and searchable within a single interface (Bates, 2004)
Requirements for an effective online information source (Bates, 2004: http://www.factiva.com/collateral/files/whitepaper_feevsfree_0504.pdf)
- Full selection of information, e.g. newswires, industry newsletters, daily business press, trade journals, industry analysts' reports, historical financials
- Ready-to-download information: information in a format that is easy to download/email/print
- Updating feature: electronic clipping services / alerts
- Auditable payment: within parameters of company purchasing processes (Bates, 2004)
Advanced search features
- Boolean operators: operators (AND, OR, NOT) that help you narrow or broaden your search
- Field searching, e.g. AU = author, KW = keywords, SU = subject, TI = title
- Truncation: cutting a query term back to its stem and adding a truncation symbol (often *, though the symbol varies by database) so that all endings are found, e.g. econom* will find economy, economics, economical, economist etc.
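Field searching and truncation combine naturally: the field code restricts which part of a record is searched, and a truncated term matches any word that begins with the stem. The sketch below assumes `*` as the truncation symbol and uses the two-letter field codes from the slide. The records and the `term_pattern`/`field_search` helpers are hypothetical; real databases differ in both symbols and field codes.

```python
import re

# Hypothetical bibliographic records keyed by field code.
RECORDS = [
    {"AU": "Author A", "TI": "The economics of information", "SU": "Information economics"},
    {"AU": "Author B", "TI": "An economist looks at the Web", "SU": "Internet"},
    {"AU": "Author C", "TI": "Searching the invisible Web", "SU": "Internet searching"},
]

def term_pattern(term: str) -> re.Pattern:
    """Compile a search term; a trailing * means 'any ending' (truncation)."""
    if term.endswith("*"):
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

def field_search(records: list[dict[str, str]], field: str, term: str) -> list[dict[str, str]]:
    """Hypothetical helper: records whose given field (AU, TI, SU...) matches the term."""
    pattern = term_pattern(term)
    return [r for r in records if pattern.search(r.get(field, ""))]

for record in field_search(RECORDS, "TI", "econom*"):
    print(record["TI"])
# Without the truncation symbol only the exact word matches (here: nothing).
print(len(field_search(RECORDS, "TI", "econom")))
```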
Advanced search features
- Wildcards: special characters used to represent either any single character or any number of characters, e.g. organi?ation will find organisation and organization
- Phrase searching: typically, when a phrase is enclosed by double quotation marks, the exact phrase is searched, e.g. "knowledge management"
- Limiters: let you narrow the focus of your search so that the information retrieved from the databases you search is limited according to the values you select
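Wildcard and phrase matching can both be expressed as regular expressions. The sketch assumes `?` stands for exactly one character, as in the `organi?ation` example (some tools use other symbols, or let `?` match zero or one character). `wildcard_to_regex` and `phrase_match` are hypothetical helpers.

```python
import re

def wildcard_to_regex(term: str) -> re.Pattern:
    """Hypothetical helper: ? stands for exactly one word character; match whole words."""
    body = "".join(r"\w" if ch == "?" else re.escape(ch) for ch in term)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)

def phrase_match(phrase: str, text: str) -> bool:
    """Hypothetical helper: True if the phrase's words occur together, in order."""
    words = [re.escape(w) for w in phrase.split()]
    return re.search(r"\b" + r"\s+".join(words) + r"\b", text, re.IGNORECASE) is not None

pattern = wildcard_to_regex("organi?ation")
print(pattern.findall("The organisation (US: organization) is an organ."))
print(phrase_match("knowledge management", "Tools for knowledge management"))
print(phrase_match("knowledge management", "management of knowledge"))
```

The wildcard finds both spellings but not the shorter word "organ"; the phrase search accepts the words only when they appear together in the quoted order.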
Advanced search features
Always look for a Search tips option
- Expanders: let you broaden the scope of your search. They do this by widening your search to include words related to your keywords or by including the actual text of the full-text results in your search
- Thesaurus: a controlled vocabulary of terms that assists in more effectively searching the database
- Proximity searches: use a proximity search to search for two or more words that occur within a specified number of words (or fewer) of each other in the databases, e.g. Near operator (N): N5 finds the words if they are within five words of one another, regardless of the order in which they appear
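A proximity operator such as N5 can be sketched by recording the word positions of each term and checking whether any pair of positions is close enough, in either order. This is a hypothetical illustration. Databases count distance differently (some count the words in between, others the difference in positions), so `near` here assumes "at most n words between the two terms".

```python
import re

def positions(words: list[str], term: str) -> list[int]:
    """Indexes at which the term occurs in the list of words."""
    return [i for i, w in enumerate(words) if w == term.lower()]

def near(text: str, a: str, b: str, n: int) -> bool:
    """Hypothetical Nn operator: a and b in either order, at most n words between them."""
    words = re.findall(r"\w+", text.lower())
    return any(
        abs(i - j) - 1 <= n
        for i in positions(words, a)
        for j in positions(words, b)
    )

text = "Reform of the income tax system was debated at length"
print(near(text, "tax", "reform", 5))     # 3 words between, order ignored
print(near(text, "reform", "length", 5))  # 8 words between: too far apart
```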
Open content: "good quality content is leaking out of its containers and making its way to the open Web" (OCLC, 2004)
- Open content, coined by analogy with "open source," (though technically it is actually share-alike) describes any kind of creative work including articles, pictures, audio, and video that is published in a format that explicitly allows the copying of the information. Content can be either in the public domain or under a license like the GNU Free Documentation License. "Open content" is also sometimes used to describe content that can be modified by anyone: there is no closed group like a commercial encyclopedia publisher responsible for all the editing.
- (Wikipedia, the free encyclopedia, online. Available from: http://en.wikipedia.org/wiki/Open_content)
Creative Commons Licenses
See: http://creativecommons.org/ and http://search.creativecommons.org/
Open content & open archiving initiatives
- Social Science Research Network
- http://www.ssrn.com/
- OAIster
- http://oaister.umdl.umich.edu/o/oaister/
- Networked Digital Library of Theses and Dissertations
- http://www.ndltd.org/
- Wikipedia
- http://en.wikipedia.org/wiki/Main_Page
- DARE
- http://www.darenet.nl/en/page/language.view/search.page
Internet subject directories / guides
Internet subject directories / guides
- A subject directory is a catalog of sites collected and organized by humans. (Flanagan, c2003)
- A website that categorises Web documents according to subjects, or categories and subcategories. (Behrens, 2000)
- Directories classify Web documents into an arbitrary subject classification scheme. (CompletePlanet)
- Involve human intervention in selecting and organising resources: they cover fewer resources but provide more focus and guidance for the topics they cover (ALA, 2004)
Example of selection criteria: The Internet Scout Project
- Content
- What is the scope of the content? Who is the intended audience? What is its purpose? Is it up to date? Is it accurate (as far as we can determine)?
- Authority
- Who is the author (this is crucial, in that we rarely select anonymous pages)? Is the author likely to be authoritative (as far as we can tell)?
- Presentation
- How is the site organized? Is it easy to navigate? Does it depend on graphics, and if so does the provider maintain a separate, text-only version?
Example of selection criteria: The Internet Scout Project
- Information maintenance
- Is the site "alive", i.e. is it maintained/updated on a regular basis (exception: designated archives)?
- Availability
- Do links at a site work? (We check the main page of each site for availability at least three times in the days before the Scout Report is released)
- Cost
- Free or fee?
- (http://scout.wisc.edu/Reports/selection.php)
Internet subject directories / guides
Best of the Web: subject guides
- Open Directory Project
- http://www.dmoz.org
- Infomine
- http://infomine.ucr.edu/
- Resource Discovery Network
- http://www.rdn.ac.uk/
- BUBL LINK Catalogue of Internet Resources
- http://bubl.ac.uk/
- Librarians' Internet Index
- http://lii.org/
- Yahoo! Directory
- http://dir.yahoo.com
The invisible Web / The opaque Web
"What's invisible today may become visible tomorrow." (Dennis O'Connor)
The invisible Web
- Also called the Deep Web or Hidden Web
- What is it? "The Visible Web is what you see in the results pages from general Web search engines. It's also what you see in almost all subject directories. The Invisible Web is what you cannot retrieve in the search results and other links contained in these types of tools." (UC Berkeley tutorial, 2004, online. Available from: http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html)
Types of invisibility
- Private Web: technically indexable pages that have deliberately been excluded from search engines by Web page designers → Robot Exclusion Protocol
- Proprietary Web: only accessible to people who have agreed to special terms in exchange for seeing the content → often involves fees to get access, e.g. fee-based commercial databases
- Truly Invisible Web: dynamically generated Web pages, created on the fly → contents don't exist until you search for the information
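The Robot Exclusion Protocol works through a plain-text `robots.txt` file at the root of a site. Well-behaved crawlers read it and skip the excluded paths, which is how those pages end up in the private Web. Python's standard `urllib.robotparser` module implements this check. The rules, the `ExampleBot` crawler name and the URLs below are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: keep one (made-up) crawler out of the whole site,
# and keep every other crawler out of /private/.
ROBOTS_TXT = """
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent, url in [
    ("Googlebot", "http://www.example.com/index.html"),
    ("Googlebot", "http://www.example.com/private/report.html"),
    ("ExampleBot", "http://www.example.com/index.html"),
]:
    # can_fetch() answers: may this crawler retrieve (and so index) this URL?
    print(agent, url, parser.can_fetch(agent, url))
```

A page blocked this way still exists and can be visited directly; it simply never enters the search engine's index.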
Searching the invisible Web
"Think of it as being able to reach the front doors of a bookstore, but not being able to look inside at the books" (Schlein, 2002)
- ProFusion
- http://www.profusion.com/index.htm
- CompletePlanet
- http://aip.completeplanet.com
- Your subject + database or archive or repository or ...
- E.g. intitle:law repository OR intitle:law database OR ...
Find more!
Keeping up to date with search tools & techniques
Good places to start
Join the mailing lists!
Subscribe to the blogs!
- http://www.pandia.com
- http://searchenginewatch.com/
- http://www.batesinfo.com/tip.html
- http://www.researchbuzz.com
- http://battellemedia.com/
- http://www.resourceshelf.com/
- Use your search skills!
- E.g. dmoz > Computers > Internet > Searching
Evaluating Web resources
Evaluating Web resources
- "The Internet is a mass of information (that is the nature of the beast): statistics, stories, pictures, research, and unfortunately, myths and lies. Everything is given 'equal billing'; there is no five-star rating system that tells us that information on Web site A is of more value or credence than Web site B. It is up to us as the consumers of information to reflect and assess what is right and true ... to develop information literacy." (Lewis, 2002, online. Available from: http://www.firstmonday.org/issues/issue7_8/lewis/index.html#l3)
Evaluating Web resources (Trit, 2002, online. Available from: http://www.library.auckland.ac.nz/instruct/evaluate.htm)
- Who is responsible for providing the information contained in a resource?
- If you can't find out who the creator of a resource is, this may, in itself, be a reason to reject it.
- Why has the information been published on the Internet?
- Motivation: advertising? Entertainment? Hoax?
- Where was the information published?
- Part of a Web site or a bigger resource? Logo? URL?
- When was the resource published or updated?
- Maintenance?
- How accurate is the content?
- How free from error is the information? Can the information be verified against other sources?
Citing Web resources
Citation tools
- Landmarks Citation Machine
- http://citationmachine.net/index.php
- 21st Century Information Fluency Project Portal
- http://21cif.imsa.edu/tools/citation/
- Style Sheets for Citing Resources available at:
- http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/Style.html
- How will you find more information on citing specific resources?
- Search! E.g. citation style guides
- Exercise: create a citation for the following Web resource using the Harvard method of referencing: http://www.firstmonday.org/issues/issue8_12/veale/index.html
Thank you!
- hk@knowlead.co.za, bf@knowlead.co.za