IDK0040 V - PowerPoint PPT Presentation

Slides: 14
Provided by: kumla5



1
IDK0040 Web Applications I (Võrgurakendused I): Building a Site
Publicising
  • Deniss Kumlander

2
How to make a site really public, or: search
engine algorithms
3
Targets
  • We are going to look at
  • how search services are built
  • how to ensure that your site is included in the
    search engines' databases
  • how to improve a site's chances of being selected
    by a search engine in response to a query string

4
High level description
[Diagram: a Crawler scans the web and fills a Database of URLs, Ranks, Relations etc.; the Searcher (query processor) answers queries from that database.]
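A toy version of the pipeline in the diagram, offered only as a sketch: the in-memory WEB dictionary, the page names and the helpers crawl, build_index and search are all hypothetical, and a real engine adds ranking, crawl politeness and distributed storage on top of this.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.html": ("search engines crawl the web", ["b.html", "c.html"]),
    "b.html": ("a crawler scans pages and follows links", ["a.html"]),
    "c.html": ("the index maps words to urls", []),
}


def crawl(seed):
    """Crawler: breadth-first walk over links, returning every URL reached."""
    seen, queue = {seed}, deque([seed])
    while queue:
        url = queue.popleft()
        for link in WEB.get(url, ("", []))[1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


def build_index(urls):
    """Database step: an inverted index mapping each word to the URLs containing it."""
    index = defaultdict(set)
    for url in urls:
        for word in WEB.get(url, ("", []))[0].split():
            index[word].add(url)
    return index


def search(index, query):
    """Searcher: URLs containing every query word; the live web is never touched."""
    hits = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*hits) if hits else set()


index = build_index(crawl("a.html"))
print(sorted(search(index, "crawler links")))  # ['b.html']
```

Note that search only consults the index built earlier, which is the point of the next slide: you query the engine's index, not the Internet.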
5
High level description
  • Notice that a search engine answers a query string
    either with a general-level description of the web
  • or with a private (personalised) view
  • In either case you are not searching the Internet
    itself, but an index created by the search engine
  • Simply storing one billion pages of 10 kbytes
    each (compressed) requires 10 TB, and another 10 TB
    for indexes
  • Moreover, a public search engine needs far more
    resources than that to calculate query results
    and to provide high availability.
  • Crawling 1B pages with 10 machines crawling at
    100 pages/second would take 1M seconds, or 11.6
    days on a very high capacity Internet connection.
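The storage and crawl-time figures above can be checked with a few lines of arithmetic (assuming decimal units: 1 kB = 1000 bytes, 1 TB = 10^12 bytes):

```python
PAGES = 1_000_000_000          # one billion pages
PAGE_SIZE_BYTES = 10 * 1000    # 10 kB per compressed page (decimal units assumed)
MACHINES = 10
PAGES_PER_SEC_PER_MACHINE = 100

storage_tb = PAGES * PAGE_SIZE_BYTES / 1e12                      # raw page storage in TB
crawl_seconds = PAGES / (MACHINES * PAGES_PER_SEC_PER_MACHINE)   # 1000 pages/s in total
crawl_days = crawl_seconds / 86_400                              # seconds per day

print(f"pages: {storage_tb:.0f} TB, plus about as much again for indexes")  # pages: 10 TB, ...
print(f"crawl: {crawl_seconds:.0f} s = {crawl_days:.1f} days")  # crawl: 1000000 s = 11.6 days
```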

6
Adding a site
  • Wait until a site that is already indexed links to
    yours; crawlers find new pages by following links
  • The Web is growing much faster than any
    present-technology search engine can possibly
    index (see distributed web crawling). In 2006,
    some users found that major search engines had become
    slower to index new web pages
  • Use the "Add URL" function that many search
    engines provide
  • A website developer has to be more proactive than
    ever before about getting listed by search
    engines and directories. In many cases, this
    means (unfortunately) that you have to pay a fee
    to get listed.

7
Searching process
  • Words are weighted by how common they are in the
    given language (X): the rarer the word, the more
    weight it carries
  • So "rude" is a more important word than "all"
  • Common words (and, or) are discarded unless you
    are searching for an exact phrase
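One common way to express "rarer words matter more" is inverse document frequency (IDF). The sketch below uses IDF as a stand-in for "prevalence in the language"; the tiny CORPUS, the STOP_WORDS list and the convention that a quoted query means an exact phrase are all assumptions for illustration, not any particular engine's rules.

```python
import math

# Hypothetical reference corpus standing in for "standard" language usage.
CORPUS = [
    "all of the people and all of the time",
    "the rude remark was not all",
    "and so on and so forth",
    "time and tide wait for no man",
]
STOP_WORDS = {"and", "or", "the", "of"}  # illustrative only


def idf(word):
    """Smoothed inverse document frequency: rarer words get higher weights."""
    docs_with_word = sum(1 for doc in CORPUS if word in doc.split())
    return math.log((1 + len(CORPUS)) / (1 + docs_with_word)) + 1


def query_terms(query):
    """Drop stop words unless the whole query is quoted as an exact phrase."""
    query = query.lower()
    if query.startswith('"') and query.endswith('"'):
        return query.strip('"').split()
    return [w for w in query.split() if w not in STOP_WORDS]


print(idf("rude") > idf("all"))        # True
print(query_terms("rude and all"))     # ['rude', 'all']
print(query_terms('"rude and all"'))   # ['rude', 'and', 'all']
```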

8
Standard indexing process
  • <title> is an important element to index by
  • <meta name="keywords">
  • <meta name="description"> (is not used by
    Google)
  • Headings
  • <h1>
  • <h2>
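To see which parts of a page the list above refers to, the sketch below pulls them out with Python's standard html.parser. The class name IndexableParts and the sample page are hypothetical, and the sketch says nothing about how heavily any real engine weighs each part.

```python
from html.parser import HTMLParser


class IndexableParts(HTMLParser):
    """Collects <title>, <meta name="keywords"/"description"> and <h1>/<h2> text."""

    def __init__(self):
        super().__init__()
        self.parts = {"title": [], "keywords": [], "description": [], "h1": [], "h2": []}
        self._current = None  # tracked tag whose text we are currently inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "meta" and name in ("keywords", "description"):
            self.parts[name].append(attrs.get("content") or "")
        elif tag in ("title", "h1", "h2"):
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current and data.strip():
            self.parts[self._current].append(data.strip())


page = """<html><head><title>Web Apps I</title>
<meta name="keywords" content="crawler, index">
<meta name="description" content="Course notes"></head>
<body><h1>Publicising</h1><h2>robots.txt</h2><p>body text</p></body></html>"""

p = IndexableParts()
p.feed(page)
print(p.parts)
```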

9
Preventing discovery
  • robots.txt
  • http://www.robotstxt.org/wc/norobots.html
  • Security restrictions
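A well-behaved crawler checks robots.txt before fetching a page. The sketch below uses Python's standard urllib.robotparser; the rules, the example.com URLs and the user agent "MyCrawler" are hypothetical. robots.txt is only a request that polite crawlers honour, which is why content that must stay private also needs real security restrictions (for example, authentication).

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt; the disallowed paths are hypothetical.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("https://example.com/index.html", "https://example.com/private/notes.html"):
    # can_fetch(user_agent, url) applies the parsed rules to the URL's path.
    print(url, parser.can_fetch("MyCrawler", url))
```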

10
Improving the site rank
  • According to Google:
  • Have other relevant sites link to yours.
  • Make sure all the sites that should know about
    your pages are aware your site is online.
  • Submit your site to relevant directories such as
    the Open Directory Project and Yahoo!, as well as
    to other industry-specific expert sites
  • Make a site with a clear hierarchy and text
    links. Every page should be reachable from at
    least one static text link.
  • Offer a site map to your users with links that
    point to the important parts of your site.
  • Create a useful, information-rich site, and write
    pages that clearly and accurately describe your
    content.
  • Think about the words users would type to find
    your pages, and make sure that your site actually
    includes those words within it.
  • Try to use text instead of images to display
    important names, content, or links.
  • Check that your HTML is valid and correctly formatted
  • If you decide to use dynamic pages (i.e., the URL
    contains a "?" character), be aware that not
    every search engine spider crawls dynamic pages
    as well as static pages.
  • Keep the links on a given page to a reasonable
    number (fewer than 100).
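Two of the checks above (the link count and dynamic URLs) are easy to automate. This is a minimal sketch: the LinkCollector class and the audit helper are hypothetical, and the 100-link threshold and the "?" test are taken directly from the slide.

```python
from html.parser import HTMLParser

MAX_LINKS = 100  # the slide's rule of thumb: fewer than 100 links per page


class LinkCollector(HTMLParser):
    """Gathers the href of every <a> element on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def audit(html):
    """Report the link count and any dynamic links (URLs containing '?')."""
    collector = LinkCollector()
    collector.feed(html)
    return {
        "link_count": len(collector.links),
        "too_many_links": len(collector.links) >= MAX_LINKS,
        "dynamic_links": [h for h in collector.links if "?" in h],
    }


report = audit('<a href="/about.html">About</a> <a href="/item.php?id=7">Item</a>')
print(report)
```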

11
Improving the site rank
  • Make use of the robots.txt file on your web
    server. This file tells crawlers which
    directories can or cannot be crawled. Visit
    http://www.robotstxt.org/wc/faq.html to learn how
    to instruct robots when they visit your site.
  • Use Google Sitemaps
  • Don't use "id" as a parameter in your URLs
  • Provide high-quality content on your pages,
    especially your homepage. This is the single most
    important thing to do. If your pages contain
    useful information, their content will attract
    many visitors and entice webmasters to link to
    your site
  • Links matter because they indicate how important
    your site is and what it is about
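A sitemap for search engines is usually an XML file in the format defined by the sitemaps.org protocol. The sketch below builds one with Python's xml.etree.ElementTree; the build_sitemap helper and the example.com URLs are hypothetical, and only the required loc element plus the optional lastmod element are shown.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML for (loc, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            # Optional last-modification date in W3C format, e.g. 2008-03-01.
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap = build_sitemap([
    ("https://example.com/", "2008-03-01"),
    ("https://example.com/contact.html", None),
])
print(sitemap)
```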

12
Improving the site rank: be honest
  • Basic principles:
  • Make pages for users, not for search engines.
    Don't deceive your users or present different
    content to search engines than you display to
    users, which is commonly referred to as
    "cloaking."
  • Avoid tricks intended to improve search engine
    rankings. A good rule of thumb is whether you'd
    feel comfortable explaining what you've done to a
    website that competes with you.
  • Don't participate in link schemes designed to
    increase your site's ranking or PageRank. In
    particular, avoid links to web spammers or "bad
    neighborhoods" on the web, as your own ranking
    may be affected adversely by those links.
  • Don't use unauthorized computer programs to
    submit pages, check rankings, etc.
  • Some specific guidelines:
  • Avoid hidden text or hidden links.
  • Don't send automated queries to Google.
  • Don't load pages with irrelevant words.
  • Don't create multiple pages, subdomains, or
    domains with substantially duplicate content.
  • Don't create pages that install viruses, trojans,
    or other badware.
  • If your site participates in an affiliate
    program, make sure that your site adds value.
    Provide unique and relevant content that gives
    users a reason to visit your site first.
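Why links carry so much weight (and why engines watch for link schemes) can be seen in the textbook PageRank formulation, sketched here as power iteration. This is the classic published idea, not Google's current ranking; the damping factor 0.85, the iteration count and the four-page link graph are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: page -> list of pages it links to. Returns page -> score (sums to 1)."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share ("random jump").
        new = {p: (1 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical site: nobody links to "spam", so it keeps only the baseline score.
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "spam": ["home"],
}
scores = pagerank(links)
print(sorted(scores, key=scores.get, reverse=True))
```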

13
Final notes
  • Keep the structure of the site static (stable URLs)
  • It can take a considerable time for search engines
    to revisit your site, especially if it is not
    top-ranked
  • Once other sites link to your pages, changing
    those URLs makes the links produce a "Page not
    found" error
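Before restructuring a site, it helps to list which old URLs would stop working. The sketch below is hypothetical throughout: the helper name, the URL lists and the optional redirect table (which might, for instance, be served as HTTP 301 redirects) are assumptions for illustration.

```python
def broken_after_restructure(old_urls, new_urls, redirects=None):
    """Return old URLs that other sites may still link to but that would now 404.

    redirects: optional dict old_url -> new_url (a hypothetical redirect table).
    """
    redirects = redirects or {}
    live = set(new_urls)
    # An old URL is safe if it still exists or redirects to a page that exists.
    return sorted(u for u in old_urls if u not in live and redirects.get(u) not in live)


old = ["/index.html", "/news.html", "/contact.html"]
new = ["/index.html", "/news/", "/contact/"]
print(broken_after_restructure(old, new))                             # ['/contact.html', '/news.html']
print(broken_after_restructure(old, new, {"/news.html": "/news/"}))   # ['/contact.html']
```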