Presented by: Allen Brown IS/SE

1
Searching the Web, or: If there's so much out
there, why can't I find it?
  • Presented by: Allen Brown, IS/SE
  • Date: 2003-05-12

?
2
Outline - Searching the Web
  1. Information Cartography
  2. Visible and Invisible Web Information
  3. Information Finding Strategies
  4. Reference Tools, Pathfinders, Specialized
    Information Repositories, Subject Directories,
    and Search Engines
  5. Information Search Strategies
  6. Information Evaluation Strategies
  7. Information Finding Summary
  8. Search Engines and their Characteristics

?
3
Information Cartography
  • Imagine a physical map of an ocean basin
    - identifiable areas of the sea floor
    - large abyssal plain
    - many undulating hills above the plain
    - occasional higher elevations or plateaus
    - sparse atolls and seamounts
  • Imagine the Web
    - some information content identifiable by subject
    - vast amounts of very low value information
    - some good stuff distributed across many sites
    - occasional high-quality sites with both quality
      and quantity
    - sparse, stunningly useful sites (to die for)

?
4
Information Cartography - 2
Information issues: quality, completeness, location!
  • In searching for information we need to adjust
    the
    - breadth of search, to find all that is relevant
      in an ocean of information
    - quality level, to find only atolls of
      information quality
  • The aim is to find everything that is important
    and useful

?
5
Visible and Invisible Information
  • Visible: indexed by a search engine
  • Invisible: not indexed, but accessible

[Diagram: search engines 1-4 shown among databases (db 1, 2, 4, 6) and sites (3, 5, 7)]
?
6
Search Engines Won't Do It All!
  • According to a study reported in Nature (1), no
    search engine indexes more than 16% of the
    Web. Even though search engine databases are
    enormous, they cover very little of what's
    actually available on the Web.
  • (1) Steve Lawrence and C. Lee Giles. (July 8,
    1999). Accessibility of Information on the Web.
    Nature, 400, 107-109.

?
7
Information Finding Strategies
  • Identify starting points based on your question
  • What type of information do you need?
    - facts, statistics, government documents, scholarly
      articles, popular opinion, music, pictures,
      multimedia, news, ...
  • What form do you want the information in?
    - dictionary definition, encyclopedia entry,
      journal article, elementary school project, video
      file, audio file, ...
  • What type of site would offer this information?
    - academic, commercial, government, non-government
      organization
  • How much information do you need?
    - introduction, in-depth, references, ...

?
8
Information Finding
  • Reference Materials (often invisible)
    - dictionaries, thesauri, encyclopedias, newspapers
  • Information Pathfinders (sometimes invisible) /
    Portals / Vortals
    - subject specific, highly relevant, sometimes
      bizarre
    - usually high quality
    - managed by dedicated enthusiasts, possibly
      amateur
    - e.g., Web design, Perl, micro cars, Curta
      calculators, ...
  • Specialized Information Repositories (often
    invisible) / Portals
    - institution-based, sometimes obscure
    - usually high quality
    - managed by information professionals
    - e.g., government documents, archives, ...

?
9
Information Finding - 2
  • Subject Indices (often invisible, but this is
    changing)
    - subject-based
    - e.g., Yahoo
  • Search Engines and Search Brokers (visible Web)
    - e.g., Google, Alta Vista, Hot Bot, Lycos,
      Vivisimo, Dogpile

?
10
Reference Tools - Dictionaries
http://www.yourdictionary.com/
?
11
Reference Tools - Thesauri
http://www.visualthesaurus.com/index.jsp
?
12
Reference Tools - Encyclopedia
http://www.britannica.com/
?
13
Pathfinders
A pathfinder site provides an information map of
what is available within a fairly narrow area of
interest, usually compiled by domain experts.
These sites are often called vortals (vertical
portals).
?
14
Specialized Information Repositories - National
Library of Canada
A specialized information repository often
collects and catalogues relatively specific
information, usually compiled by information
experts. Some are considered to be vortals.
?
15
Subject Directories - www.yahoo.com
Subject directories are lists compiled by people.
They are organized in a hierarchy where each
subject includes a list of sub-topics. These
sites are often called portals - a one-site
starting location for general information
seeking.
?
16
Subject Directories
Subject lists are usually evaluated, but sites
are not presented in order of relevancy. In other
words, the best sites on a topic are not
necessarily listed first. Sites are compiled
through submission of URLs by site creators and
human evaluation and selection. One advantage
of subject directories is their browsability,
although this feature is only suitable for fairly
general topics. A disadvantage is their relatively
small size.
Other examples of subject directories:
  • Infomine: http://infomine.ucr.edu
  • Scout Report Signpost: http://www.signpost.org/signpost
?
17
Invisible Web Directories
Look at http://www.invisible-web.net/
?
18
Search Engines
Search engines use computer programs that
automatically collect web sites using "spiders"
or "robots". The sites are indexed and stored in
an index database. To query a search engine,
type topic keywords and Boolean connectors into a
search "box." The search engine scans its index
and returns links to websites containing the
specified keyword relationships. Size matters -
an advantage of using search engines is their
coverage (though size is relative), but this can
also be a disadvantage if relevance ranking is
poor.
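
The indexing and Boolean lookup described on this slide can be sketched in a few lines of Python. This is only an illustration, not how any particular engine works: the three pages, their URLs, and the function names are all invented for the example.

```python
from collections import defaultdict

# Hypothetical three-page "web"; URLs and text are invented for illustration.
PAGES = {
    "site1/dogs": "large male dog breeds",
    "site2/cats": "large cat breeds",
    "site3/pets": "small dog and small cat care",
}

def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search_and(index, *words):
    """Pages containing every keyword (narrows the result set)."""
    sets = [index.get(w, set()) for w in words]
    return set.intersection(*sets) if sets else set()

def search_or(index, *words):
    """Pages containing at least one keyword (widens the result set)."""
    return set().union(*(index.get(w, set()) for w in words))

def search_not(index, all_urls, word):
    """Pages that do not contain the keyword."""
    return set(all_urls) - index.get(word, set())

INDEX = build_index(PAGES)
print(sorted(search_and(INDEX, "large", "dog")))
print(sorted(search_or(INDEX, "large", "dog")))
print(sorted(search_not(INDEX, PAGES, "cat")))
```

Note how "and" shrinks the result set while "or" grows it; the query never touches the pages themselves, only the index built beforehand.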
?
19
Search Engines Operational Concepts
[Diagram: a query passes through query parsing, index lookup, and results ranking and management to produce query results; crawling, page contents extraction, and indexing build the index]
?
20
Search Engines - Does Size Matter?
?
21
Size
If you are looking for unusual or hard-to-find
information, you should try one or more of the search
engines with a large index to check more web
content. This improves the likelihood of finding
what you seek. However, for general searches or
when looking for information about popular
topics, a large index does not necessarily equal
better results. Also, large indexes may have
longer re-visit intervals.
?
22
Search Engines - Search Scoping and Ranking /
Results Management
  • It is essential to learn and apply each engine's
    specialized search formats to narrow results and
    to filter and push the most relevant pages to the
    top of the results list. Use Boolean operators,
    proximity connectors, stems, wildcards,
    sounds-like, media-type and metadata filters.
  • Result relevancy ranking also depends on the size
    of the search index and how the search engine
    interprets and uses your query.
  • Each engine determines result relevancy ranking
    in unique ways. Consult the help file of each
    engine to learn about these.
  • Some engines offer search refinement and
    conceptual clustering for better focus (tighter
    hit cluster) or greater accuracy / validity
    (centred on the right stuff).

?
23
Search Engines - Search Scoping
  • (+) expands the scope, (-) reduces the scope
  • Exact phrase (-): use quotes, e.g., "We hold these
    truths to be self-evident"
  • Boolean operators: and (-) (default), or (+)
    (caution!), not (-) (extreme caution!), e.g., large
    male dog; large or male or dog; not cat
  • Proximity connectors (-): near (depends on
    engine), e.g., spring near flower
  • Stemming and wildcards (+): e.g., swim → swim,
    swimming, swimmer, swimmers, swimmingly, ...
  • Sounds-like (+): e.g., table → cable, able, fable, ...
  • Media type (-): e.g., image, audio file, ...
  • Concept-based (+): e.g., synonym → thesaurus,
    antonym, homonym, ...
  • Metadata-based (-): in some systems
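
Three of the scoping operators listed above (exact phrase, proximity, and wildcard) can be sketched as follows. This is a minimal sketch under stated assumptions: the sample sentence, the three-word proximity window, and the function names are invented, and real engines define these operators in their own ways.

```python
import fnmatch

def tokens(text):
    """Lowercase words with surrounding punctuation stripped."""
    return [w.strip(".,!?\"'") for w in text.lower().split()]

def phrase_match(text, phrase):
    """True if the words of `phrase` appear consecutively in `text`."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def near_match(text, a, b, window=3):
    """True if `a` and `b` occur within `window` words of each other."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

def wildcard_match(text, pattern):
    """Words in `text` matching a shell-style pattern such as 'swim*'."""
    return sorted(set(fnmatch.filter(tokens(text), pattern)))

# Invented sample sentence used only to exercise the three functions.
DOC = "The swimmer went swimming when the first spring flower appeared."
print(phrase_match(DOC, "spring flower"))
print(near_match(DOC, "spring", "swimming"))
print(wildcard_match(DOC, "swim*"))
```

The phrase and proximity tests reduce the scope (fewer pages qualify), while the wildcard expands it (more word forms qualify).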

?
24
Search Engines - Ranking
  • Result relevancy ranking (usefulness) can be
    done according to two techniques (or some
    combination)
    - Conventional: using intra-page information
    - Relative: using extra-page information

?
25
Search Engines - Conventional Ranking
  • Conventional (intra-page)
    - frequency of words (number and density)
    - phrases (exact word sequences)
    - hierarchy (e.g., closer to the top of the
      document)
    - adjacency (proximity of words)
    - metadata (keywords provided by content owners)
    - font size and style (relative intra-page)
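
A few of the intra-page signals above (frequency, density, position in the document, adjacency) can be combined into a toy score. The weights, the two sample pages, and the function name are arbitrary assumptions made for illustration; no real engine's formula is implied.

```python
def intra_page_score(text, query_words):
    """Score one page from its own contents only (higher is better)."""
    words = text.lower().split()
    if not words:
        return 0.0
    query = [q.lower() for q in query_words]
    positions = [i for i, w in enumerate(words) if w in query]
    if not positions:
        return 0.0

    frequency = len(positions)                    # number of occurrences
    density = frequency / len(words)              # share of the page
    earliness = 1.0 - positions[0] / len(words)   # closer to the top is better
    # Adjacency: count query words immediately followed by another query word.
    adjacent = sum(1 for a, b in zip(positions, positions[1:]) if b - a == 1)

    # The weights below are arbitrary assumptions, chosen only for illustration.
    return frequency + 5.0 * density + 2.0 * earliness + 1.5 * adjacent

# Invented sample pages.
PAGES = {
    "a": "curta calculator repair and curta calculator cleaning",
    "b": "a long page about many things that mentions a calculator once",
}
QUERY = ["curta", "calculator"]
ranked = sorted(PAGES, key=lambda p: intra_page_score(PAGES[p], QUERY), reverse=True)
print(ranked)
```

Because every signal comes from the page itself, a page author can influence all of them, which is one reason engines also use extra-page information.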

[Example web page shown on the slide: an interview with Jack Christensen about cleaning and repairing Curta calculators (charges, spare parts, clearing rings). The text is cut off on the slide.]
?
26
Search Engines - Relative Ranking
  • Relative (extra-page)
    - popularity (page visits from the search engine)
    - citation (links pointing to the item)
    - relevance of the pages containing the links
      pointing to the item (!)
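
The citation idea above, including the point that links from relevant pages should count for more, can be sketched with a simplified PageRank-style iteration. The link graph, page names, damping value, and function name are assumptions for illustration; this is not presented as any engine's actual ranking method.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages so that links from high-scoring pages count more.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: a directory and two pages all cite "curta-faq".
LINKS = {
    "directory": ["curta-faq", "hobby-page"],
    "hobby-page": ["curta-faq"],
    "blog": ["curta-faq", "directory"],
    "curta-faq": [],
}
# Plain citation count: how many pages link to each page.
inbound = {p: sum(p in t for t in LINKS.values()) for p in LINKS}
ranks = link_rank(LINKS)
print(inbound)
print(max(ranks, key=ranks.get))
```

The plain inbound count treats every link equally; the iterative score additionally weights each link by the score of the page it comes from.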

[Diagram labels: Yahoo, Web Pages]
?
27
Search Engines Keys to Success
[Diagram label: World Wide Web]
  • Size → large index and / or several engines
  • Scoped query → wide net but appropriate sieve,
    carefully constructed for your needs
  • Ranked and manageable results → query
    construction and search engine features

?
28
Meta Search Engines
  • "Meta" search tools are able to search the index
    databases of multiple engines simultaneously,
    via a single interface.
  • Meta search tools don't really search metadata.
    They are simply brokers that reformulate a query
    and hand it off to a set of search engines, then
    combine the results.
  • Meta engines are very fast but they do not
    offer the same level of control over the
    relationship between keywords as do individual
    search engines.
  • Also, meta search engines may produce poor
    ranking of combined results.
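
The broker behaviour described above (hand the query to several engines, then combine the results) can be sketched as follows. The three "engines" are stand-in functions with invented results, and the reciprocal-rank merge is just one possible combining rule, chosen here as an assumption.

```python
# Each "engine" is a stand-in function returning a ranked list of URLs.
# The engines, URLs, and results are invented; no real search API is called.
def engine_a(query):
    return ["site1", "site2", "site3"]

def engine_b(query):
    return ["site2", "site4", "site1"]

def engine_c(query):
    return ["site5", "site2"]

def meta_search(query, engines):
    """Send the query to every engine and merge the ranked lists.

    Merging uses a simple reciprocal-rank sum: a result at position k
    (starting from 1) in one engine's list contributes 1/k to its score.
    """
    scores = {}
    for engine in engines:
        for position, url in enumerate(engine(query), start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / position
    # Sort by score, highest first; break ties alphabetically for stability.
    return sorted(scores, key=lambda url: (-scores[url], url))

print(meta_search("curta calculator", [engine_a, engine_b, engine_c]))
```

The merge sees only each engine's ordering, not why a page was ranked there, which is one way the combined ranking can come out poor.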

?
29
Search Engines
Examples of popular search engines include:
  • Google: http://www.google.com
  • Alta Vista: http://www.altavista.com
  • All the Web: http://www.alltheweb.com
  • Northern Light: http://www.northernlight.com
Also see the KartOO clustering visual engine:
http://www.kartoo.com/
For meta engines, try Vivisimo at
http://vivisimo.com/
?
30
Information Search Strategies
  • Think hard about what you are looking for!
  • Use a Reference Tool, if appropriate
  • Use a Pathfinder, if you know one
  • Use a Specialized Information Repository, if
    appropriate
  • Use Subject Indexes, if it is a common topic
  • Use several Search Engines, if needed, especially
    for the obscure or academic topic, but learn how
    they work
  • Use keywords - be narrow, and specific (and
    technical)
  • Use phrases - try synonyms or related concepts
  • Use Boolean connectors - but find out if / how
    the engine uses them
  • Use stemming and wildcards - but find out if /
    how the engine uses them
  • Use media-type filters or metadata, if appropriate

?
31
Information Search Tools - Use
[Diagram: search tools placed along two axes, depth vs. breadth and easy to use vs. hard to use well]
  • Pathfinder: focused content; pre-selected by
    domain experts
  • Search Engines and Meta-engines: obscure or
    academic; caveat emptor!
  • Subject Indexes: popular or common; pre-selected
    by interested people
  • Specialized Information Repository: related or
    themed; pre-selected by professionals; contains
    invisible content
  • Reference Tool: generic simple lookup; created by
    professionals; contains invisible content
?
32
Information Evaluation Strategies CARS
  • CARS checklist
    - http://library.queensu.ca./inforef/guides/evalchart.htm
  • Credibility
    - author credentials stated with email contact
    - evidence of quality control (site location)
  • Accuracy
    - timeliness
    - comprehensiveness
    - audience and purpose
  • Reasonableness
    - fairness
    - objectivity
    - consistency
    - world view
  • Support
    - source documentation or bibliography

?
33
Summary
  • There is much information on the Web, but it's
    not
    - all there
    - all good (or all bad)
    - always easy to locate
  • Use an information search strategy that
    - matches the information sought
    - uses the appropriate tools
    - uses them in the correct ways
  • Use an information evaluation strategy, e.g.,
    CARS methodology.
  • Choose and use search engines wisely, knowing
    their strengths, features, and their limitations.

?
34
How Do Search Engines Work?
  • Three activities occur:
  • 1. Crawling
    - fetch pages
    - compile URL list (a db)
    - re-visit pages
  • 2. Page harvesting
    - parse page
    - add to index db and establish ranking
  • 3. Responding to search requests
    - parse query
    - apply to index
    - present and rank results
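
The three activities above can be sketched end to end over a tiny in-memory "web". Everything here is an assumption made for illustration: the four pages, their links, the function names, and the word-count ranking are invented, and nothing is fetched over a network.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "home": ("welcome to the curta page", ["repair", "history"]),
    "repair": ("curta repair and cleaning", ["home"]),
    "history": ("history of the curta calculator", []),
    "orphan": ("unlinked page about curta parts", []),
}

def crawl(start):
    """1. Crawling: fetch pages breadth-first and compile the URL list."""
    seen, queue = [start], deque([start])
    while queue:
        _text, links = WEB[queue.popleft()]      # "fetch" the page
        for link in links:
            if link in WEB and link not in seen:
                seen.append(link)
                queue.append(link)
    return seen

def harvest(urls):
    """2. Page harvesting: parse each page and add its words to the index."""
    index = defaultdict(lambda: defaultdict(int))
    for url in urls:
        text, _links = WEB[url]
        for word in text.lower().split():
            index[word][url] += 1                # word count per page
    return index

def respond(index, query):
    """3. Responding: parse the query, apply it to the index, rank results."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    return sorted(scores, key=lambda url: (-scores[url], url))

urls = crawl("home")
index = harvest(urls)
print(urls)
print(respond(index, "curta repair"))
```

The "orphan" page is never reached because nothing links to it, so it never enters the index: a small instance of invisible content.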

?
35
Search Engines Operation
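[Diagram: the Crawler Robot fetches and re-visits pages and maintains the URL data base; the Harvester Robot fetches pages and builds the Index data base; the Query Processor takes a query and returns query results from the Index data base]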
?
36
Search Engine - Hardware
(not really)
?
37
How Do Search Engines Work?
  • See "The Anatomy of a Large-Scale Hypertextual
    Web Search Engine" at
    http://www-db.stanford.edu/~backrub/google.html

?
38
References
  • Information Search Strategies
    - <http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/FindInfo.html>
  • Information Evaluation Strategies
    - <http://www.vuw.ac.nz/~agsmith/evaln/evaln.htm>
  • Search Engines
    - <http://www.library.arizona.edu/search.htm>
    - <http://www.brightplanet.com/deepcontent/tutorials/search/index.asp>
    - <http://www.searchenginewatch.com/>
    - Susan Maze, David Moxley, Donna Smith.
      Authoritative Guide to Web Search Engines.
      Neal Schuman Pub, 1997. ISBN 1555703054

?