SEARCH ENGINES WITH SPIDER (CRAWLER) - PowerPoint Presentation

Slides: 11
Provided by: HP892

Transcript and Presenter's Notes



1
SEARCH ENGINES WITH SPIDER (CRAWLER)
  • Rabia KAHRAMAN
  • 07010038
  • BIOINFORMATICS

2
ABOUT SEARCH ENGINES!
  • A program that searches documents for specified
    keywords and returns a list of the documents
    where the keywords were found.
  • Although "search engine" really refers to a general
    class of programs, the term is often used
    specifically to describe systems like Alta Vista and
    Excite that let users search for documents on the
    World Wide Web.
  • Search engines rely on three basic mechanisms to
    produce a good search:
  • 1. spider,
  • 2. database,
  • 3. ordering (ranking) mechanism
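The definition above (a program that takes keywords and returns the documents containing them) can be sketched in a few lines of Python. This is a minimal illustration only: it assumes documents are plain strings keyed by hypothetical file names, splits text on whitespace, and treats a document as a match only if it contains every keyword. It scans every document on each query; the spider, database, and ordering mechanisms listed above exist precisely so that real engines do not have to work this way.

```python
def keyword_search(documents, keywords):
    """Return the names of documents that contain every keyword (case-insensitive)."""
    wanted = {k.lower() for k in keywords}
    matches = []
    for name, text in documents.items():
        words = set(text.lower().split())  # naive whitespace tokenization
        if wanted <= words:                 # all keywords present
            matches.append(name)
    return matches


# Hypothetical documents for illustration.
docs = {
    "a.html": "Spiders crawl the web",
    "b.html": "A database stores pages",
    "c.html": "The spider feeds the database",
}
print(keyword_search(docs, ["spider", "database"]))  # ['c.html']
```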

3
  • A search engine proper is a database plus the
    tools to generate that database and search it; a
    catalog is an organizational method and related
    database plus the tools for generating it. Some
    sites, however, try to be a
    complete front end for the Internet. They provide
    news, libraries, dictionaries, and other
    resources that are not just a search engine or a
    catalog, and some of these can be really useful.
    Yahoo!, for example, emphasizes cataloging, while
    others such as Alta Vista or Excite emphasize
    providing the largest search database. Some Web
    location services do not own any of their search
    engine technology; other services are their main
    focus. Companies such as Inktomi (named after a
    Native American word for spider) provide the search
    technology. These Web location services have put
    amazing power into every user's hands, making
    life much better for all of us... and it's
    all free, right?

4
WHAT IS SPIDER(CRAWLER)?
  • A program that automatically fetches Web pages.
    Spiders are used to feed pages to search engines.
    It's called a spider because it crawls over the
    Web. Another term for these programs is
    web crawler.
  • Because most Web pages contain links to other
    pages, a spider can start almost anywhere. As
    soon as it sees a link to another page, it goes
    off and fetches it. Large search engines, like
    Alta Vista, have many spiders working in
    parallel.
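The link-following behavior described above is essentially a breadth-first traversal of the Web's link graph. The sketch below assumes a tiny in-memory "Web" (the `WEB` dictionary and `fetch_links` helper are hypothetical stand-ins for downloading a page and extracting its links) so that it runs without network access.

```python
from collections import deque

# Hypothetical in-memory "Web": each URL maps to the links found on that page.
# A real spider would download each page over HTTP and parse out its links.
WEB = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}


def fetch_links(url):
    """Stand-in for fetching a page and returning the links it contains."""
    return WEB.get(url, [])


def crawl(start_url, max_pages=100):
    """Breadth-first crawl: fetch a page, then queue every link not seen before."""
    frontier = deque([start_url])  # pages waiting to be fetched
    seen = {start_url}             # URLs already queued, so none is fetched twice
    visited = []                   # order in which pages were fetched
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


print(crawl("http://example.com/b"))
```

Starting from `/b` still reaches every page, which matches the point on the slide: because pages link to one another, a spider can start almost anywhere. This sketch is single-threaded; engines that run many spiders in parallel would share the frontier and the `seen` set between them.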

5
Crawler-Based Search Engines
  • Crawler-based search engines, such as Google,
    create their listings automatically. They "crawl"
    or "spider" the web, then people search through
    what they have found.
  • If you change your web pages, crawler-based
    search engines eventually find these changes, and
    that can affect how you are listed. Page titles,
    body copy, and other page elements all play a role
    in that listing.
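The slides do not say how a crawler notices that a page has changed. One simple approach, assumed here purely for illustration, is to store a hash of each page's content and compare it on the next visit. The `ChangeDetector` class is hypothetical; it uses SHA-256 from Python's standard `hashlib`.

```python
import hashlib


def fingerprint(html):
    """Hash the page content so two versions can be compared cheaply."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


class ChangeDetector:
    """Remembers the last fingerprint seen for each URL (hypothetical helper)."""

    def __init__(self):
        self.last_seen = {}

    def has_changed(self, url, html):
        new = fingerprint(html)
        old = self.last_seen.get(url)
        self.last_seen[url] = new
        return old != new  # also True the first time a URL is seen


detector = ChangeDetector()
print(detector.has_changed("http://example.com/", "<title>Old</title>"))  # True
print(detector.has_changed("http://example.com/", "<title>Old</title>"))  # False
print(detector.has_changed("http://example.com/", "<title>New</title>"))  # True
```

Only pages whose fingerprint differs would need to be re-indexed, which is one way "eventually find these changes" can be made cheap.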

6
  • Crawler-based engines send crawlers, or spiders,
    out into cyberspace. These crawlers visit a Web
    site, read the information on the actual site,
    read the site's meta tags, and follow the
    links that the site connects to. The crawler
    sends all that information back to a central
    repository, where the data is indexed. The crawler
    will periodically return to the sites to check
    for any information that has changed; how often
    this happens is determined by the administrators
    of the search engine.
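Reading a page's meta tags and the links it points to can be sketched with Python's standard `html.parser.HTMLParser`. This is a minimal example, assuming the page HTML has already been downloaded; the `PageReader` class and `read_page` helper are hypothetical names, and relative links are resolved against the page URL with `urllib.parse.urljoin`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageReader(HTMLParser):
    """Collects <meta name=... content=...> tags and <a href=...> links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.meta = {}
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name, content = attrs.get("name"), attrs.get("content")
        if tag == "meta" and name and content is not None:
            self.meta[name.lower()] = content
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page's own URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))


def read_page(base_url, html):
    reader = PageReader(base_url)
    reader.feed(html)
    reader.close()
    return reader.meta, reader.links


html = """<html><head>
<meta name="description" content="Notes on web spiders">
<meta name="keywords" content="spider, crawler">
</head><body>
<a href="/index.html">Home</a> <a href="https://example.org/">Elsewhere</a>
</body></html>"""

meta, links = read_page("http://example.com/docs/page.html", html)
print(meta)   # {'description': 'Notes on web spiders', 'keywords': 'spider, crawler'}
print(links)  # ['http://example.com/index.html', 'https://example.org/']
```

The extracted links would feed the crawler's queue, and the meta tags and text would go to the central repository for indexing.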

7
  • Crawler-based search engines have three major
    elements. First is the spider, also called the
    crawler. The spider visits a web page, reads it,
    and then follows links to other pages within the
    site. This is what it means when someone refers
    to a site being "spidered" or "crawled." The
    spider returns to the site on a regular basis,
    such as every month or two, to look for changes.
  • Everything the spider finds goes into the second
    part of the search engine, the index. The index,
    sometimes called the catalog, is like a giant
    book containing a copy of every web page that the
    spider finds. If a web page changes, this
    book is updated with the new information.
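The slide describes the index as a book holding a copy of every page. Search engines commonly pair those copies with an inverted index, a map from each word to the pages that contain it, so queries do not have to scan every page. The sketch below assumes that design (the `Index` class is hypothetical) and shows how a changed page replaces its old entries.

```python
from collections import defaultdict


class Index:
    """Stores a copy of each page plus a word -> set-of-URLs lookup table."""

    def __init__(self):
        self.pages = {}                   # url -> latest copy of the page text
        self.postings = defaultdict(set)  # word -> urls containing that word

    def add_or_update(self, url, text):
        # If the page was indexed before, remove its old words first.
        old = self.pages.get(url)
        if old is not None:
            for word in set(old.lower().split()):
                self.postings[word].discard(url)
                if not self.postings[word]:
                    del self.postings[word]
        self.pages[url] = text
        for word in set(text.lower().split()):
            self.postings[word].add(url)

    def lookup(self, word):
        return sorted(self.postings.get(word.lower(), set()))


idx = Index()
idx.add_or_update("http://example.com/", "spiders crawl pages")
idx.add_or_update("http://example.com/a", "the index stores pages")
print(idx.lookup("pages"))  # ['http://example.com/', 'http://example.com/a']

# The page changes; the "book" is updated with the new information.
idx.add_or_update("http://example.com/", "spiders follow links")
print(idx.lookup("pages"))  # ['http://example.com/a']
print(idx.lookup("links"))  # ['http://example.com/']
```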

8
  • Sometimes it can take a while for new pages or
    changes that the spider finds to be added to the
    index. Thus, a web page may have been "spidered"
    but not yet "indexed." Until it is indexed
    (added to the index), it is not available to
    those searching with the search engine.
  • Search engine software is the third part of a
    search engine. This is the program that sifts
    through the millions of pages recorded in the
    index to find matches to a search and rank them
    in order of what it believes is most relevant.
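The slides say the search software ranks matches by relevance but not how. As one widely taught measure, chosen here only for illustration and not attributed to any particular engine, the sketch below scores pages with a simple TF-IDF sum: a query word counts more the more often it appears on a page and the fewer pages it appears on. The smoothed `log(1 + n/df)` weight and the page data are assumptions.

```python
import math
from collections import Counter


def rank(pages, query):
    """Score each page with a simple TF-IDF sum and return page names, best first."""
    terms = query.lower().split()
    n = len(pages)
    counts = {name: Counter(text.lower().split()) for name, text in pages.items()}
    # Document frequency: how many pages contain each query term.
    df = {t: sum(1 for c in counts.values() if t in c) for t in terms}
    scores = {}
    for name, c in counts.items():
        score = 0.0
        for t in terms:
            if c[t]:
                idf = math.log(1 + n / df[t])  # rarer terms weigh more
                score += c[t] * idf
        if score > 0:
            scores[name] = score
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda name: (-scores[name], name))


pages = {
    "a.html": "spider spider crawler",
    "b.html": "spider index",
    "c.html": "index ranking",
}
print(rank(pages, "spider"))          # ['a.html', 'b.html']
print(rank(pages, "spider ranking"))  # ['a.html', 'c.html', 'b.html']
```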

9
  • Why will the same search on different search
    engines produce different results? Part of the
    answer is that not all indices are exactly the
    same: it depends on what the spiders find or what
    humans submitted. More importantly, not every
    search engine uses the same algorithm to search
    through its index. The algorithm is what a search
    engine uses to determine how relevant the
    information in the index is to what the user is
    searching for.
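To see how the choice of algorithm alone can change results, the sketch below runs one query over the same two pages with two hypothetical relevance functions: one counts total occurrences of the query words, the other counts how many distinct query words a page contains. Neither is claimed to be what any real engine uses.

```python
def by_term_count(pages, query):
    """Algorithm A: total occurrences of the query words on each page."""
    terms = query.lower().split()
    return {url: sum(text.lower().split().count(t) for t in terms)
            for url, text in pages.items()}


def by_coverage(pages, query):
    """Algorithm B: how many distinct query words each page contains."""
    terms = set(query.lower().split())
    return {url: len(terms & set(text.lower().split()))
            for url, text in pages.items()}


def results(scores):
    """Matching pages, highest score first (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda url: (-scores[url], url))
    return [url for url in ranked if scores[url] > 0]


pages = {
    "a.html": "spider spider spider spider",
    "b.html": "spider crawler index",
}
query = "spider crawler index"
print(results(by_term_count(pages, query)))  # ['a.html', 'b.html']
print(results(by_coverage(pages, query)))    # ['b.html', 'a.html']
```

Same index, same query, opposite order: the difference comes entirely from how relevance is defined.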

10
  • All crawler-based search engines have the basic
    parts described above, but there are differences
    in how these parts are tuned. That is why the
    same search on different search engines often
    produces different results.
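Even when two engines share the same algorithm, tuning its parameters can reorder the results. The sketch below assumes a hypothetical scoring rule in which matches in a page title are multiplied by a `title_weight` parameter; two engines that differ only in that weight return the same pages in a different order.

```python
def score(page, query, title_weight):
    """Count query words in title and body; title matches get extra weight."""
    terms = query.lower().split()
    title = page["title"].lower().split()
    body = page["body"].lower().split()
    return sum(title_weight * title.count(t) + body.count(t) for t in terms)


def search(pages, query, title_weight):
    """Return matching URLs, best first, for a given tuning of title_weight."""
    ranked = sorted(pages, key=lambda url: (-score(pages[url], query, title_weight), url))
    return [url for url in ranked if score(pages[url], query, title_weight) > 0]


pages = {
    "a.html": {"title": "Web spiders", "body": "how a crawler works"},
    "b.html": {"title": "Notes", "body": "spiders spiders spiders and more spiders"},
}
print(search(pages, "spiders", title_weight=1))   # ['b.html', 'a.html']
print(search(pages, "spiders", title_weight=10))  # ['a.html', 'b.html']
```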