Title: White Hat Cloaking
1. White Hat Cloaking: Six Practical Applications
- Presented by Hamlet Batista
2. Why white hat cloaking?
- Good vs. bad cloaking is all about your intention
- Always weigh the risks against the rewards of cloaking
- Ask permission, or just don't call it cloaking!
- Cloaking vs. IP delivery
3. Crash course in white hat cloaking
Practical scenarios where good cloaking makes sense
- When to cloak?
- Practical scenarios and alternatives
- How do we cloak?
- How can cloaking be detected?
- Risks and next steps
4. When is it practical to cloak?
- Content accessibility
- Search-unfriendly content management systems
- Rich media sites
- Content behind forms
- Membership sites
- Free and paid content
- Site structure improvements
- Alternative to PageRank (PR) sculpting via nofollow
- Geolocation/IP delivery
- Multivariate testing
5. Practical scenario 1
Proprietary website management systems that are not search-engine friendly
Regular users see:
- URLs with many dynamic parameters
- URLs with session IDs
- URLs with canonicalization issues
- Missing titles and meta descriptions
Search engine robot sees:
- Search-engine-friendly URLs
- URLs without session IDs
- URLs with a consistent naming convention
- Automatically generated titles and meta descriptions
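As a rough illustration of the robot-facing URLs, a server script could normalize each CMS URL before emitting it. The Python sketch below is minimal and assumes the session ID travels in a query parameter named `sid`, `sessionid` or `PHPSESSID` (hypothetical names; substitute whatever the CMS actually emits). It drops those parameters, lowercases the host and sorts the remaining parameters so that every page ends up with one consistent URL.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session-ID parameter names; adjust to the CMS in use.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def canonical_url(url: str) -> str:
    """Return a consistent, session-free version of a CMS URL."""
    parts = urlsplit(url)
    # Keep every query parameter except the session identifiers.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    # Sorting makes ?b=2&a=1 and ?a=1&b=2 collapse into the same URL.
    query.sort()
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path or "/",
        urlencode(query),
        "",  # fragments are never sent to the server, so drop them
    ))


print(canonical_url("HTTP://WWW.Example.com/catalog?item=42&PHPSESSID=abc123&cat=7"))
# → http://www.example.com/catalog?cat=7&item=42
```

Sorting the parameters is a design choice: it removes one common canonicalization issue (the same page reachable under several parameter orders), at the cost of URLs that may not match the order the CMS prints.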
6. Practical scenario 2
Sites built completely in Flash, Silverlight or any other rich media technology
Search engine robot sees:
- A text representation of all graphical (image) elements
- A text representation of all motion (video) elements
- A text transcription of all audio in the rich media content
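One way to keep the text representation in sync with the rich media is to generate it from a description of each element. The sketch below assumes a hypothetical manifest (a list of dicts with `type` and `text` keys) maintained next to the Flash or Silverlight assets; it renders an HTML page listing the text equivalent of each image, video and audio element, escaping the text with `html.escape`.

```python
from html import escape

# Hypothetical manifest: one entry per element of the rich media movie.
MEDIA = [
    {"type": "image", "text": "Product photo: blue running shoe"},
    {"type": "video", "text": "Demo video showing the shoe's cushioning"},
    {"type": "audio", "text": "Transcript: Welcome to our spring collection."},
]


def text_representation(title: str, elements: list[dict]) -> str:
    """Render an HTML page containing a text equivalent of every element."""
    labels = {"image": "Image", "video": "Video", "audio": "Audio"}
    items = "\n".join(
        f"  <li>{labels.get(e['type'], 'Media')}: {escape(e['text'])}</li>"
        for e in elements
    )
    return f"<h1>{escape(title)}</h1>\n<ul>\n{items}\n</ul>"


print(text_representation("Spring collection", MEDIA))
```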
7. Practical scenario 3
Membership sites
Search users see:
- Snippets of premium content on the SERPs
- When they land on the site, they are faced with a registration form
Members see:
- The same content search engine robots see
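The logic on this slide comes down to one branch in the page script. This is a minimal sketch, assuming two hypothetical flags computed elsewhere: `is_member` from the login session and `is_verified_robot` from the robot detection techniques described later in the deck. The 200-character teaser length is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Visitor:
    is_member: bool          # hypothetical: set from the login session
    is_verified_robot: bool  # hypothetical: set by robot detection


REGISTRATION_FORM = "<form action='/register'><!-- registration fields --></form>"


def render_article(body: str, visitor: Visitor, teaser_chars: int = 200) -> str:
    # Members see exactly what search engine robots see: the full article.
    if visitor.is_member or visitor.is_verified_robot:
        return body
    # Other visitors get a snippet followed by the registration form.
    return body[:teaser_chars] + "...\n" + REGISTRATION_FORM
```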
8. Practical scenario 4
Sites requiring massive site structure changes to improve index penetration
Regular users follow a link structure designed for ease of navigation.
Search engine robots follow a link structure designed for ease of crawling and deeper index penetration of the most important content.
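A sketch of how a page template might pick a different link block per visitor type, assuming a hypothetical `importance` score for each page (for example derived from sales or conversion data) and an `is_robot` flag from robot detection. Robots get a flat list of the highest-scoring pages so each is one click away; users keep the navigation menu.

```python
from typing import NamedTuple


class Page(NamedTuple):
    url: str
    importance: float  # hypothetical score; higher means more valuable


def navigation_links(pages, user_menu, is_robot, max_links=100):
    """Return the URLs to place in the page's navigation block."""
    if not is_robot:
        # Regular users keep the menu designed for ease of navigation.
        return list(user_menu)
    # Robots get a flat list of the most important pages, so those pages are
    # one click away from every page instead of buried several levels deep.
    ranked = sorted(pages, key=lambda p: p.importance, reverse=True)
    return [p.url for p in ranked[:max_links]]
```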
9. Practical scenario 5
Sites using geolocation technology
Regular users see:
- Content tailored to their geographical location and/or language
Search engine robot sees:
- The same content consistently
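A minimal sketch of the delivery rule, assuming a hypothetical in-memory IP-to-country table in place of a real geolocation database (the addresses come from the reserved documentation ranges) and an `is_robot` flag from the detection methods described later.

```python
# Hypothetical IP-to-country table standing in for a geolocation database.
GEO_TABLE = {"203.0.113.5": "FR", "198.51.100.7": "DE"}

LOCALIZED = {
    "FR": "Bonjour ! Livraison gratuite en France.",
    "DE": "Hallo! Kostenloser Versand in Deutschland.",
}
DEFAULT = "Hello! Free shipping worldwide."


def content_for(ip: str, is_robot: bool) -> str:
    if is_robot:
        # Robots always receive the same version, whatever IP they crawl from.
        return DEFAULT
    # Unknown locations fall back to the default version.
    country = GEO_TABLE.get(ip)
    return LOCALIZED.get(country, DEFAULT)
```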
10. Practical scenario 6
Split testing organic search landing pages
Each regular user sees:
- One of the content experiment alternatives
Search engine robot sees:
- The same content consistently
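A common way to keep each user in the same experiment arm is to hash a stable visitor identifier into a bucket. The sketch assumes a hypothetical visitor ID (for example from a first-party cookie), three hypothetical arms and an `is_robot` flag; robots always get the control version, as the slide requires.

```python
import hashlib

VARIANTS = ["control", "variant_b", "variant_c"]  # hypothetical experiment arms


def choose_variant(visitor_id: str, is_robot: bool, experiment: str = "landing-1") -> str:
    if is_robot:
        # The search robot sees the same content consistently.
        return VARIANTS[0]
    # Hashing experiment name + visitor ID gives a stable, evenly spread bucket,
    # so a returning user keeps seeing the same alternative.
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).digest()
    return VARIANTS[int.from_bytes(digest[:8], "big") % len(VARIANTS)]
```

Including the experiment name in the hash keeps assignments independent across experiments, so the same user is not always placed in the same arm position.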
11. How do we cloak?
Cloaking is performed with a web server script or module.
Search robot detection:
- By HTTP user agent
- By IP address
- By HTTP cookie test
- By JavaScript/CSS test
- By double DNS check
- By visitor behavior
- By combining all the techniques
Content delivery:
- Presenting the equivalent of the inaccessible content to robots
- Presenting the search-engine-friendly content to robots
- Presenting the content behind forms to robots
12. Robot detection by HTTP user agent
A very simple robot detection technique
Search robot HTTP request:
    66.249.66.1 - - [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
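A minimal Python sketch of the user agent check: it parses an Apache combined-format log line like the one above and looks for crawler tokens in the User-Agent field. The token list is illustrative and incomplete. Because clients set this header themselves, it is trivial to spoof, which is why the techniques on the following slides add stronger checks.

```python
import re

# Illustrative crawler tokens; real lists are longer and change over time.
ROBOT_TOKENS = re.compile(r"googlebot|bingbot|slurp|msnbot", re.IGNORECASE)

# Combined log format: host ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def is_robot_user_agent(user_agent: str) -> bool:
    return ROBOT_TOKENS.search(user_agent) is not None


line = ('66.249.66.1 - - [04/Mar/2008:00:20:56 -0500] '
        '"GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" '
        '200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"')
entry = LOG_LINE.match(line)
print(entry["ip"], is_robot_user_agent(entry["agent"]))  # 66.249.66.1 True
```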
13. Robot detection by HTTP cookie test
Another simple robot detection technique, but weaker
Search robot HTTP request:
    66.249.66.1 - - [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
Missing cookie info: the last field of the request is empty.
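A sketch of the cookie test, assuming a hypothetical test cookie named `ck_test`: the server offers the cookie on every response and flags a client as a possible robot once it has ignored the cookie on a configurable number of consecutive requests (two here, an arbitrary choice). Clients are keyed by IP for simplicity, which conflates users behind the same proxy. The test is weak because some people browse with cookies disabled, so treat the result as a hint.

```python
from __future__ import annotations

from http.cookies import SimpleCookie

TEST_COOKIE = "ck_test"  # hypothetical cookie name


class CookieTest:
    def __init__(self, min_requests: int = 2):
        self.min_requests = min_requests
        self.requests_without_cookie: dict[str, int] = {}

    def check(self, ip: str, cookie_header: str | None) -> tuple[bool, str | None]:
        """Return (possible_robot, Set-Cookie header value to send or None)."""
        cookies = SimpleCookie(cookie_header or "")
        if TEST_COOKIE in cookies:
            # The client returned the cookie, so it behaves like a browser.
            self.requests_without_cookie.pop(ip, None)
            return False, None
        count = self.requests_without_cookie.get(ip, 0) + 1
        self.requests_without_cookie[ip] = count
        # Keep offering the cookie; flag the client once it has ignored it
        # on several consecutive requests.
        return count >= self.min_requests, f"{TEST_COOKIE}=1; Path=/"
```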
14. Robot detection by JavaScript/CSS test
Another option for robot detection
DHTML content
HTML code:
    <div id="header"><h1><a href="http://www.example.com" title="Example Site">Example site</a></h1></div>
The CSS code is pretty straightforward: it swaps out anything in the h1 tag in the header with an image.
CSS code:
    /* CSS image replacement */
    #header h1 {margin: 0; padding: 0;}
    #header h1 a {
        display: block;
        padding: 150px 0 0 0;
        background: url(path to image) top right no-repeat;
        overflow: hidden;
        font-size: 1px;
        line-height: 1px;
        height: 0px !important;
        height /**/:150px;
    }
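On the detection side, the reasoning is that a browser rendering the page also requests the stylesheets and scripts it references, while a crawler that does not render pages fetches only the HTML. The sketch below assumes every page links a hypothetical probe stylesheet at `/probe.css` and flags clients that received HTML but did not request the probe within a time window (30 seconds by default, an arbitrary value).

```python
from __future__ import annotations

import time

PROBE_PATH = "/probe.css"  # hypothetical stylesheet linked from every page


class ResourceFetchTest:
    def __init__(self, window_seconds: float = 30.0):
        self.window = window_seconds
        self.pending: dict[str, float] = {}  # client -> time the HTML was served

    def saw_request(self, client: str, path: str, now: float | None = None) -> None:
        now = time.time() if now is None else now
        if path == PROBE_PATH:
            # A browser rendering the page fetched the stylesheet: clear the flag.
            self.pending.pop(client, None)
        elif client not in self.pending:
            # Remember when this client first got HTML without the probe.
            self.pending[client] = now

    def possible_robots(self, now: float | None = None) -> list[str]:
        """Clients that got HTML but never fetched the probe within the window."""
        now = time.time() if now is None else now
        return [c for c, t in self.pending.items() if now - t > self.window]
```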
15. Robot detection by IP address
A more robust robot detection technique
Search robot HTTP request:
    66.249.66.1 - - [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
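A sketch of the IP address check using Python's `ipaddress` module. The single range `66.249.64.0/19` is only an example that covers the address in the log line above; search engines change their IPs on a regular basis, so any hard-coded list like this has to be kept up to date.

```python
from __future__ import annotations

import ipaddress

# Example range only: it contains the Googlebot address from the log above,
# but real crawler ranges change and must be refreshed regularly.
KNOWN_ROBOT_NETWORKS = {
    "Googlebot": [ipaddress.ip_network("66.249.64.0/19")],
}


def robot_for_ip(ip: str) -> str | None:
    """Return the robot name whose known network contains ip, else None."""
    address = ipaddress.ip_address(ip)
    for name, networks in KNOWN_ROBOT_NETWORKS.items():
        if any(address in net for net in networks):
            return name
    return None
```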
16. Robot detection by double DNS check
A more robust robot detection technique
Search robot HTTP request:
Step 1: reverse lookup of the requesting IP address
    nslookup 66.249.66.1
    Name: crawl-66-249-66-1.googlebot.com
    Address: 66.249.66.1
Step 2: forward lookup of the returned host name, which must resolve back to the same IP address
    nslookup crawl-66-249-66-1.googlebot.com
    Non-authoritative answer:
    Name: crawl-66-249-66-1.googlebot.com
    Address: 66.249.66.1
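The two lookups above can be automated with the standard `socket` module. This sketch accepts a host name only if it ends in `.googlebot.com` or `.google.com` (check the search engine's current documentation for the accepted domains) and the forward lookup returns the original IP. The resolver functions are parameters so the check can be tested without network access.

```python
import socket

# Host name suffixes accepted for Googlebot in this sketch.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def verify_by_double_dns(ip, allowed_suffixes=GOOGLE_SUFFIXES,
                         reverse=socket.gethostbyaddr,
                         forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the domain, then forward-resolve the name."""
    try:
        host = reverse(ip)[0]             # e.g. crawl-66-249-66-1.googlebot.com
    except OSError:                       # socket.herror is a subclass of OSError
        return False
    if not host.lower().endswith(allowed_suffixes):
        return False                      # PTR record points outside the search engine
    try:
        addresses = forward(host)[2]      # every IPv4 address for that name
    except OSError:                       # socket.gaierror is a subclass of OSError
        return False
    # The forward lookup must lead back to the original IP: anyone can set a
    # PTR record for their own IP, but not A records under googlebot.com.
    return ip in addresses
```

Called as `verify_by_double_dns("66.249.66.1")`, it performs live DNS lookups, so results should be cached rather than checked on every request.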
17. Robot detection by visitor behavior
Robots differ substantially from regular users when visiting a website.
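The slide does not list the differences, so the signals below are assumptions chosen for illustration: requesting `/robots.txt`, never loading images, CSS or JavaScript, never sending a referrer, and requesting pages faster than a person plausibly clicks. The `Session` structure and the one-request-per-second threshold are hypothetical.

```python
from dataclasses import dataclass, field

ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif")


@dataclass
class Session:
    """Per-visitor request history (hypothetical structure)."""
    paths: list[str] = field(default_factory=list)
    timestamps: list[float] = field(default_factory=list)
    referers: list[str] = field(default_factory=list)


def suspicion_score(s: Session, max_rate: float = 1.0) -> int:
    """Count robot-like traits; thresholds are illustrative, not tuned."""
    score = 0
    if "/robots.txt" in s.paths:
        score += 1  # ordinary browsing does not request robots.txt
    if s.paths and not any(p.endswith(ASSET_SUFFIXES) for p in s.paths):
        score += 1  # only HTML, no page assets
    if s.referers and all(r in ("", "-") for r in s.referers):
        score += 1  # never arrives by following a link
    if len(s.timestamps) > 1:
        span = max(s.timestamps) - min(s.timestamps)
        # Requests per second across the whole session.
        if span == 0 or len(s.timestamps) / span > max_rate:
            score += 1  # faster than a person clicks
    return score
```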
18. Combining the best of all techniques
- IP address check: maintain a cache with a list of known search robots to reduce the number of verification attempts
- User agent check: label as a robot anything that identifies itself as such
- User behavior check: label as a possible robot any visitor with suspicious behavior
- Double DNS check: confirm it is a robot by doing a double DNS check; also confirm suspected robots this way
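A sketch of the combined flow, assuming the individual checks are passed in as functions (for example a user agent token match and a double DNS check). Verified robot IPs are cached so the DNS lookups run once per address; failed verifications are cached too. A production cache would also expire entries, since search engines change IPs.

```python
from typing import Callable


class RobotClassifier:
    def __init__(self, verify: Callable[[str], bool], is_robot_ua: Callable[[str], bool]):
        self.verify = verify              # e.g. a double DNS check
        self.is_robot_ua = is_robot_ua    # e.g. a user agent token match
        self.known_robots: set[str] = set()  # cache of verified robot IPs
        self.rejected: set[str] = set()      # cache of IPs that failed verification

    def classify(self, ip: str, user_agent: str, suspicious: bool) -> str:
        if ip in self.known_robots:
            return "robot"                # cache hit: no DNS lookup needed
        if ip in self.rejected:
            return "user"
        if self.is_robot_ua(user_agent) or suspicious:
            # Claimed and suspected robots are both confirmed before trusting them.
            if self.verify(ip):
                self.known_robots.add(ip)
                return "robot"
            self.rejected.add(ip)
        return "user"
```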
19. Clever cloaking detection
A clever detection technique is to check the caches at the newest datacenters:
- IP-based detection techniques rely on an up-to-date list of robot IPs
- Search engines change IPs on a regular basis
- It is possible to identify those new IPs and check the cache
20. Risks of cloaking
Search engines do not want to accept any type of cloaking.
Survival tips:
- The safest way to cloak is to ask for permission from each of the search engines that you care about
- Refer to it as IP delivery
- "Cloaking: Serving different content to users than to Googlebot. This is a violation of our webmaster guidelines. If the file that Googlebot sees is not identical to the file that a typical user sees, then you're in a high-risk category. A program such as md5sum or diff can compute a hash to verify that two different files are identical." (http://googlewebmastercentral.blogspot.com/2008/06/how-google-defines-ip-delivery.html)
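Google's suggestion to compare files can be scripted. This sketch hashes the HTML served to Googlebot and to a typical user with `hashlib.md5` (the same digest `md5sum` prints) and, when the hashes differ, prints a unified diff via `difflib`. How the two versions are captured is left out, and the sample strings are placeholders. Any per-request element, such as a timestamp or session token, makes the hashes differ even when the content is equivalent.

```python
import difflib
import hashlib


def compare_versions(robot_html: str, user_html: str) -> bool:
    """Return True if both versions are byte-for-byte identical."""
    robot_hash = hashlib.md5(robot_html.encode("utf-8")).hexdigest()
    user_hash = hashlib.md5(user_html.encode("utf-8")).hexdigest()
    print("robot:", robot_hash)
    print("user: ", user_hash)
    if robot_hash == user_hash:
        return True
    # Show exactly which lines differ, like running diff on the two files.
    diff = difflib.unified_diff(
        user_html.splitlines(), robot_html.splitlines(),
        fromfile="user", tofile="googlebot", lineterm="")
    print("\n".join(diff))
    return False


compare_versions("<h1>Shoes</h1>\n<p>Spring sale</p>",
                 "<h1>Shoes</h1>\n<p>Welcome back!</p>")
```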
21. Next steps
- Make sure clients understand the risks and rewards of implementing white hat cloaking
More information and how to get started:
- How Google defines IP delivery, geolocation and cloaking: http://googlewebmastercentral.blogspot.com/2008/06/how-google-defines-ip-delivery.html
- First Click Free: http://googlenewsblog.blogspot.com/2007/09/first-click-free.html
- Good Cloaking, Evil Cloaking and Detection: http://searchengineland.com/070301-065358.php
- YADAC: Yet Another Debate About Cloaking Happens Again: http://searchengineland.com/070304-231603.php
- Cloaking is OK Says Google: http://blog.venture-skills.co.uk/2007/07/06/cloaking-is-ok-says-google/
- Advanced Cloaking Technique: How to feed password-protected content to search engine spiders: http://hamletbatista.com/2007/09/03/advanced-cloaking-technique-how-to-feed-password-protected-content-to-search-engine-spiders/
22. Feel free to contact me
- Blog: http://hamletbatista.com
- LinkedIn: http://www.linkedin.com/in/hamletbatista
- Facebook: http://www.facebook.com/people/Hamlet_Batista/613808617
- Twitter: http://twitter.com/hamletbatista
- E-mail: hamlet_at_hamletbatista.com
I would be happy to help.