White Hat Cloaking - PowerPoint PPT Presentation

1
White Hat Cloaking Six Practical Applications
  • Presented by Hamlet Batista

2
Why white hat cloaking?
  • Good vs bad cloaking is all about your
    intention
  • Always weigh the risks versus the rewards of
    cloaking
  • Ask permission or just don't call it cloaking!
  • Cloaking vs IP delivery

3
Crash course in white hat cloaking
Practical scenarios where good cloaking makes sense
  1. When to cloak?
  2. Practical scenarios and alternatives
  3. How do we cloak?
  4. How can cloaking be detected?
  5. Risks and next steps
4
When is it practical to cloak?
  • Content accessibility
  • Search unfriendly Content Management Systems
  • Rich media sites
  • Content behind forms
  • Membership sites
  • Free and paid content
  • Site structure improvements
  • Alternative to PageRank sculpting via nofollow
  • Geolocation/IP delivery
  • Multivariate testing

5
Practical scenario 1
Proprietary website management systems that are
not search-engine friendly
Regular users see
  • URLs with many dynamic parameters
  • URLs with session IDs
  • URLs with canonicalization issues
  • Missing titles and meta descriptions
Search engine robot sees
  • Search engine friendly URLs
  • URLs without session IDs
  • URLs with a consistent naming convention
  • Automatically generated titles and meta descriptions

6
Practical scenario 2
Sites built completely in Flash, Silverlight or
any other rich media technology
Search engine robot sees
  • A text representation of all graphical (images)
    elements
  • A text representation of all motion (video)
    elements
  • A text transcription of all audio in the rich
    media content

7
Practical scenario 3
Membership sites
Search users see
  • Snippets of premium content on the SERPs
  • When they land on the site they are faced with a
    registration form

Members see
  • The same content search engine robots see

8
Practical scenario 4
Sites requiring massive site structure changes
to improve index penetration
Regular users follow a link structure designed
for ease of navigation
Search engine robots follow a link structure
designed for ease of crawling and deeper index
penetration of the most important content
9
Practical scenario 5
Sites using geolocation technology
Regular users see
  • Content tailored to their geographical location
    and/or the user's language

Search engine robot sees
  • The same content consistently

10
Practical scenario 6
Split testing organic search landing pages
Each regular user sees
  • One of the content experiment alternatives

Search engine robot sees
  • The same content consistently

11
How do we cloak?
Cloaking is performed with a web server script or
module
Search robot detection
  • By HTTP user agent
  • By IP address
  • By HTTP cookie test
  • By JavaScript/CSS test
  • By DNS double check
  • By visitor behavior
  • By combining all the techniques
Content delivery
  • Presenting the equivalent of the inaccessible content to robots
  • Presenting the search-engine friendly content to robots
  • Presenting the content behind forms to robots

12
Robot detection by HTTP user agent
A very simple robot detection technique
Search robot HTTP request
66.249.66.1 - - [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
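The log line above shows a self-declared User-Agent header. A minimal sketch of this check in Python (the bot-name pattern is illustrative, not an exhaustive or current list):

```python
import re

# Illustrative substrings for the major crawlers of the era;
# a production list would need regular updates
BOT_PATTERN = re.compile(r"googlebot|slurp|msnbot", re.IGNORECASE)

def declares_itself_a_robot(user_agent):
    """Weak first-pass check: trust the self-declared User-Agent.
    Trivial to spoof, so never rely on it alone."""
    return bool(BOT_PATTERN.search(user_agent or ""))
```

Since anyone can send any User-Agent string, this test is only useful as a cheap first-pass filter before the stronger checks below.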
13
Robot detection by HTTP cookie test
Another simple robot detection technique, but
weaker
Search robot HTTP request
66.249.66.1 - - [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" (missing cookie info)
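The cookie test can be sketched as a small request classifier: a browser normally returns a cookie set on an earlier response, while most crawlers do not. The header dictionaries and the `seen=1` marker cookie here are hypothetical:

```python
def classify_by_cookie(request_headers, response_headers):
    """If the request carries no cookie, set one and mark the visitor
    as a possible robot; a real browser will usually send the cookie
    back on its next request, while a crawler usually will not."""
    if "Cookie" in request_headers:
        return "likely-browser"
    # Hypothetical marker cookie for the follow-up request
    response_headers["Set-Cookie"] = "seen=1; Path=/"
    return "possible-robot"
```

This is weaker than the other checks because first-time human visitors and cookie-blocking users also arrive without cookies.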
14
Robot detection by JavaScript/CSS test
Another option for robot detection
DHTML Content
HTML code:
<div id="header"><h1><a href="http://www.example.com" title="Example Site">Example site</a></h1></div>
The CSS code is pretty straightforward; it swaps out anything in the h1 tag in the header with an image.
CSS code:
/* CSS image replacement */
#header h1 { margin: 0; padding: 0; }
#header h1 a {
  display: block;
  padding: 150px 0 0 0;
  background: url(path to image) top right no-repeat;
  overflow: hidden;
  font-size: 1px;
  line-height: 1px;
  height: 0px !important;
  height /**/: 150px;
}
15
Robot detection by IP address
A more robust robot detection technique
Search robot HTTP request
66.249.66.1 - - [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
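A sketch of the IP-based check using Python's `ipaddress` module. The network range below is illustrative (it contains the 66.249.66.1 address from the log above); real crawler ranges change over time, which is exactly the maintenance burden this technique carries:

```python
import ipaddress

# Illustrative crawler network; a production list must be kept current
KNOWN_ROBOT_NETWORKS = [ipaddress.ip_network("66.249.64.0/19")]

def ip_is_known_robot(ip):
    """Match the client IP against a maintained list of crawler networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in KNOWN_ROBOT_NETWORKS)
```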
16
Robot detection by double DNS check
A more robust robot detection technique
Search robot HTTP request
  • nslookup 66.249.66.1
  • Name: crawl-66-249-66-1.googlebot.com
  • Address: 66.249.66.1
  • nslookup crawl-66-249-66-1.googlebot.com
  • Non-authoritative answer:
  • Name: crawl-66-249-66-1.googlebot.com
  • Address: 66.249.66.1
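The nslookup sequence above can be sketched as a function: reverse-resolve the IP, confirm the hostname is under a Google domain, then forward-resolve that hostname and confirm it maps back to the same IP. The lookups are parameterized so the logic can be exercised offline; by default they would use the standard `socket` calls:

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def verify_robot_by_double_dns(ip, reverse_lookup=None, forward_lookup=None):
    """Double DNS check: reverse lookup, domain check, forward lookup."""
    reverse_lookup = reverse_lookup or (lambda a: socket.gethostbyaddr(a)[0])
    forward_lookup = forward_lookup or socket.gethostbyname
    try:
        host = reverse_lookup(ip)
        # Forward-confirming defeats spoofed reverse DNS records
        return host.endswith(GOOGLE_SUFFIXES) and forward_lookup(host) == ip
    except OSError:
        return False
```

This is the most robust single check, since an impostor controls neither Google's reverse DNS zone nor the forward records for googlebot.com.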

17
Robot detection by visitor behavior
Robots differ substantially from regular users
when visiting a website
18
Combining the best of all techniques
Maintain a cache with a list of known search robots to reduce the number of verification attempts
  • User agent check: label as a robot anything that identifies itself as such
  • User behavior check: label as a possible robot any visitor with suspicious behavior
  • IP address check / double DNS check: confirm it is a robot by doing a double DNS check; also confirm suspected robots
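The combined flow above might be sketched like this, with the individual checks injected as functions (all names here are hypothetical):

```python
def classify_visitor(ip, user_agent, cache,
                     ua_says_robot, behavior_suspicious, double_dns_confirms):
    """Cache verdicts per IP; run the cheap user-agent and behavior
    checks first, and spend the expensive double DNS check only on
    self-declared or suspected robots."""
    if ip in cache:
        return cache[ip]  # known visitor, skip re-verification
    if ua_says_robot(user_agent) or behavior_suspicious(ip):
        verdict = "robot" if double_dns_confirms(ip) else "impostor"
    else:
        verdict = "human"
    cache[ip] = verdict
    return verdict
```

The cache is what keeps the expensive check affordable: each IP pays the DNS round-trips at most once.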
19
Clever cloaking detection
A clever detection technique is to check the
caches at the newest datacenters
  • IP-based detection techniques rely on an
    up-to-date list of robot IPs
  • Search engines change IPs on a regular basis
  • It is possible to identify those new IPs and
    check the cache

20
Risks of cloaking
Search engines do not officially accept any type of
cloaking
Survival tips
  • The safest way to cloak is to ask for permission
    from each of the search engines that you care
    about
  • Refer to it as IP delivery.
  • "Cloaking: Serving different content to users than
    to Googlebot. This is a violation of our
    webmaster guidelines. If the file that Googlebot
    sees is not identical to the file that a typical
    user sees, then you're in a high-risk category. A
    program such as md5sum or diff can compute a hash
    to verify that two different files are identical."
  • http://googlewebmastercentral.blogspot.com/2008/06/how-google-defines-ip-delivery.html
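Google's quoted md5sum test is easy to reproduce yourself before going live: hash the page served to the robot and the page served to a typical user and compare. A minimal sketch using Python's standard `hashlib`:

```python
import hashlib

def same_page(content_for_robot: bytes, content_for_user: bytes) -> bool:
    """Per the quoted guideline: if the two MD5 hashes differ,
    the robot and the user are seeing different files."""
    return (hashlib.md5(content_for_robot).hexdigest()
            == hashlib.md5(content_for_user).hexdigest())
```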

21
Next Steps
  • Make sure clients understand the risks/rewards of
    implementing white hat cloaking
  • More information and how to get started
  • How Google defines IP delivery, geolocation and cloaking: http://googlewebmastercentral.blogspot.com/2008/06/how-google-defines-ip-delivery.html
  • First Click Free: http://googlenewsblog.blogspot.com/2007/09/first-click-free.html
  • Good Cloaking, Evil Cloaking and Detection: http://searchengineland.com/070301-065358.php
  • YADAC: Yet Another Debate About Cloaking Happens Again: http://searchengineland.com/070304-231603.php
  • Cloaking is OK Says Google: http://blog.venture-skills.co.uk/2007/07/06/cloaking-is-ok-says-google/
  • Advanced Cloaking Technique: How to feed password-protected content to search engine spiders: http://hamletbatista.com/2007/09/03/advanced-cloaking-technique-how-to-feed-password-protected-content-to-search-engine-spiders/

22
  • Blog: http://hamletbatista.com
  • LinkedIn: http://www.linkedin.com/in/hamletbatista
  • Facebook: http://www.facebook.com/people/Hamlet_Batista/613808617
  • Twitter: http://twitter.com/hamletbatista
  • E-mail: hamlet@hamletbatista.com

Feel free to contact me. I would be happy to help.