Bayesian Filtering AntiPhishing Toolbar Benefits presentation

1
Bayesian Filtering Anti-Phishing Toolbar Benefits

P. Likarish, E. Jung,
D. Dunbar, T. E. Hansen, and J.-P. Hourcade
12/04/07
presented by EJ Jung

2
Phishing
3
Why study phishing?

Identity Theft
One of the fastest-growing crimes
15 million Americans/year, 2.8 billion dollars

Gartner, Inc. 2007 press release. http://www.gartner.com/it/page.jsp?id=501912
March 2007 Phishing report. http://apwg.org
4
Phishing leads into malware
Phishing report: Trojans and keyloggers. http://apwg.org
5
Phishing and botnets into black market (Franklin et al., 2007)

6 months of IRC logs

6
and into national security threat

FBI director Robert Mueller says:
Younis Tsouli and his colleagues stole thousands
of credit card accounts through phishing schemes.
They ran up charges of more than $3 million for
items they thought fellow extremists might need,
from night vision goggles to GPS devices.
Botnets are the Swiss Army knives of hackers

7
Phishing attack
8
Anti-Phishing Tools

Client or server side?
server side protection is limited
server-client cooperation
hash of system
Client side is more common
web browser toolbar
password management

9
Early Efforts

Largely heuristics-based
Set of rules developed by experts
Still used by most anti-phishing tools
Examples
IE7 phishing filter
SpoofGuard

10
SpoofGuard

IE6 toolbar
Developed by Chou, Ledesma, Teraguchi, Boneh,
Mitchell at Stanford
Heuristics + whitelist

N. Chou, R. Ledesma, Y. Teraguchi, D. Boneh, and J. C. Mitchell. Client-side defense against web-based identity theft. In NDSS '04: Proceedings of the 11th Annual Network and Distributed System Security Symposium, February 2004.
11
Stateless Heuristics

URL check
Suspicious URLs: @, IP address, hex encoding
Image check
Hashed image database
Image hashing
Produces same hash for similar images
Link check
Fails if more than 1/4 of links fail URL check
Password check
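
The URL check and link check above can be sketched as follows. This is a minimal illustration, not SpoofGuard's actual code: the function names are hypothetical, and the exact patterns SpoofGuard treats as suspicious are assumed from the three cues named on the slide (an @ sign, an IP address as host, hex escapes in the host part).

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Assumption: "hex" on the slide means %XX escapes in the host part of the URL.
HEX_ESCAPE = re.compile(r"%[0-9a-fA-F]{2}")


def url_check_fails(url: str) -> bool:
    """Return True if the URL shows any of the three suspicious cues."""
    parts = urlsplit(url)
    netloc = parts.netloc
    host = parts.hostname or ""

    # Cue 1: an "@" in the authority part hides the real host after it.
    has_at = "@" in netloc

    # Cue 2: the host is a raw IP address instead of a domain name.
    try:
        ipaddress.ip_address(host)
        is_ip = True
    except ValueError:
        is_ip = False

    # Cue 3: hex escapes in the authority part obscure the host name.
    has_hex = bool(HEX_ESCAPE.search(netloc))

    return has_at or is_ip or has_hex


def link_check_fails(links, threshold=0.25):
    """Return True if more than `threshold` of the page's links fail the URL check."""
    if not links:
        return False
    failed = sum(url_check_fails(link) for link in links)
    return failed / len(links) > threshold
```

The 1/4 threshold comes from the slide; everything else is an assumption made for the example.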

12
Stateful Heuristics

Domain check
Hamming distance to known domains
Referrals
From email site?
May require DNS lookup
Image-domain association
Extension of hashed image heuristic
<image, URL> tuples
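
The domain check can be illustrated with a small Hamming-distance comparison. This is a sketch under stated assumptions: the function names, the `max_distance` cutoff of 1, and the rule that only equal-length names are compared are all hypothetical choices, not details taken from SpoofGuard.

```python
def hamming(a: str, b: str):
    """Hamming distance between equal-length strings, or None if lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def domain_check_fails(domain: str, known_domains, max_distance: int = 1) -> bool:
    """Flag a domain that is close to, but not identical to, a known domain.

    Assumption: a distance of 0 means the page is on the known domain itself,
    so only distances from 1 to max_distance are treated as suspicious.
    """
    domain = domain.lower()
    for known in known_domains:
        distance = hamming(domain, known.lower())
        if distance is not None and 0 < distance <= max_distance:
            return True
    return False
```

For example, `paypa1.com` differs from `paypal.com` in one position and would be flagged, while `paypal.com` itself would not.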

13
Scoring
TSS: Total Spoof Score
[equation not preserved in transcript]
Ex.: P1 = URL check (0 if page passes, 1 if it fails), w1 = 0.2
Source: N. Chou, R. Ledesma, Y. Teraguchi, D. Boneh, and J. C. Mitchell. Client-side defense against web-based identity theft. In NDSS '04: Proceedings of the 11th Annual Network and Distributed System Security Symposium, February 2004.
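
The slide's example (a check result P1 of 0 or 1 with weight w1 = 0.2) suggests a weighted combination of check results. The sketch below assumes a plain weighted sum; the slide's equation is not preserved, so the linear form, the check names, and every weight other than 0.2 for the URL check are assumptions made for illustration.

```python
def total_spoof_score(results: dict, weights: dict) -> float:
    """Weighted sum of check results (0 = page passes, 1 = page fails).

    Assumption: a simple linear form. The original formula may include
    further terms that this sketch does not model.
    """
    return sum(weights[name] * results[name] for name in results)


# Hypothetical weights, except "url" = 0.2, which is the slide's example.
weights = {"url": 0.2, "image": 0.5, "link": 0.3}
results = {"url": 1, "image": 0, "link": 1}
score = total_spoof_score(results, weights)
```

A page would then be reported as a likely spoof when the score exceeds a chosen threshold, which is also not given on the slide.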
14
Drawbacks to Heuristics

Difficult to develop accurate rules
Large number of false positives and negatives
Heuristics don't evolve; phishing sites do.

M. Sahami, S. Dumais, D. Heckerman, and E. Horvitz. A Bayesian approach to filtering junk e-mail. In AAAI Workshop on Learning for Text Categorization, July 1998.
Y. Zhang, J. I. Hong, and L. F. Cranor. CANTINA: a content-based approach to detecting phishing web sites. In WWW '07: Proceedings of the 16th international conference on World Wide Web, pages 639-648, New York, NY, USA, 2007. ACM Press.
15
Next: Blacklist/Whitelist

2004-current
Largely blacklist-based
rely on phishing site reports
still used by most anti-phishing tools
Examples
IE7 phishing filter
Firefox 2 phishing protection (Google Safe Browsing)
Netcraft Toolbar

Netcraft Ltd. http://toolbar.netcraft.com
16
Drawbacks to Blacklist/Whitelist

Need reliable and timely sources for reports
Window of vulnerability
after site launch, before being blacklisted
avg lifetime of a phishing site: 3 days
avg lifetime after being blacklisted: 22 hours
cost of undoing identity theft: priceless
Adapt classification methods
CANTINA, B-APT

Y. Zhang, J. I. Hong, and L. F. Cranor. CANTINA: a content-based approach to detecting phishing web sites. In WWW '07: Proceedings of the 16th international conference on World Wide Web, pages 639-648, New York, NY, USA, 2007. ACM Press.
17
CANTINA

Technique
TF-IDF + Robust Hyperlinks
Domain name
Heuristics

Y. Zhang, J. I. Hong, and L. F. Cranor. CANTINA: a content-based approach to detecting phishing web sites. In WWW '07: Proceedings of the 16th international conference on World Wide Web, pages 639-648, New York, NY, USA, 2007. ACM Press.

18
TF-IDF

Text classification technique
Information retrieval
Term Frequency-Inverse Document Frequency
Importance of a word in a document in a given
corpus
Document = website
Corpus = English language

19
TF-IDF, contd.

Source for equations: http://en.wikipedia.org/wiki/Tf-idf
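
The slide's equations are not preserved in this transcript. As a stand-in, the sketch below uses one common textbook form of TF-IDF: term frequency is the term's count divided by the document length, and inverse document frequency is the log of the corpus size over the number of documents containing the term. Other variants exist, and the function name and toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(document, corpus):
    """Score each term in `document` (a list of tokens) against `corpus`
    (a list of token lists that includes the document itself)."""
    counts = Counter(document)
    doc_sets = [set(d) for d in corpus]
    scores = {}
    for term, n in counts.items():
        tf = n / len(document)                       # term frequency
        df = sum(1 for d in doc_sets if term in d)   # document frequency
        idf = math.log(len(corpus) / df) if df else 0.0
        scores[term] = tf * idf
    return scores


# Toy corpus: "login" appears in one document, "bank" in two.
corpus = [["bank", "login", "account"], ["news", "weather"], ["bank", "news"]]
scores = tf_idf(corpus[0], corpus)
```

Terms that are frequent on the page but rare in the corpus score highest, which is why "login" outranks "bank" in the toy corpus.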

20
Robust Hyperlinks

Phelps and Wilensky
TF-IDF on all words on page
Lexical signature
5 words with highest TF-IDF scores
Almost uniquely identifies a page among 1,000,000,000 pages
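
Selecting a lexical signature from per-term TF-IDF scores can be sketched as below. The function name and the alphabetical tie-breaking rule are assumptions made for the example; only the signature size of 5 comes from the slide.

```python
def lexical_signature(scores: dict, size: int = 5):
    """Return the `size` terms with the highest TF-IDF scores.

    Ties are broken alphabetically so the result is deterministic
    (an assumption; the slide does not specify tie handling).
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:size]]


# Hypothetical scores for a page; common words score near zero.
example_scores = {
    "ebay": 0.9, "auction": 0.8, "bid": 0.5, "seller": 0.4,
    "feedback": 0.3, "and": 0.02, "the": 0.01,
}
signature = lexical_signature(example_scores)
```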

21
TF-IDF Hyperlinks in CANTINA

Calculate lexical signature
Google search on signature
If domain name is within top 30 hits, site is
legitimate
Otherwise, it is phishing
Results
94% true positives, 30% false positives
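
The decision rule on this slide can be sketched as follows. The `search` parameter is a hypothetical callable that takes a query string and returns a ranked list of result URLs; no real search API is assumed. Comparing exact host names is also a simplification, since the slides do not say how CANTINA matches domains.

```python
from urllib.parse import urlsplit


def is_legitimate(page_url, signature, search, top_n=30):
    """Return True if the page's host appears among the top search hits
    for its lexical signature, False (treated as phishing) otherwise."""
    page_host = (urlsplit(page_url).hostname or "").lower()
    if not page_host:
        return False
    hits = search(" ".join(signature))[:top_n]
    hit_hosts = {(urlsplit(hit).hostname or "").lower() for hit in hits}
    return page_host in hit_hosts


# Stand-in for a search engine, used only to exercise the function.
def fake_search(query):
    return ["https://www.ebay.com/", "https://en.wikipedia.org/wiki/EBay"]


sig = ["ebay", "auction", "bid", "seller", "feedback"]
```

With this stub, a page on `www.ebay.com` is judged legitimate, while a look-alike on another host is not.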

22
Improving on TF-IDF

Add domain name to Google search
97% t.p., 30% f.p.
TF-IDF + Zero-results-Means-Phishing + domain name
97% t.p., 10% f.p.

→ 67% t.p.
→ 10% f.p.
23
Adding heuristics to CANTINA

Heuristics from SpoofGuard and other sources
Trade-off
Reduces true positive accuracy
97% → 89% t.p.
Reduces false positive rate
10% → 1% f.p.

24
Drawbacks to CANTINA

Relies on outside sources for information
Google
Requires heuristics to reduce false positives
Reduces accuracy
Language-specific
Different corpus for each foreign language
Difficulties with East Asian languages
Unacceptable false positive rate
Misclassifications undermine user confidence in
tool

25
CANTINA vs. Netcraft

classification + heuristics vs. blacklist + heuristics
True positives
CANTINA: 97% (or 89%)
Netcraft toolbar: 97%
(SpoofGuard: 91%)
False positives
CANTINA: 6% (or 1%)
Netcraft toolbar: 0%
(SpoofGuard: 48%)

26
B-APT: Bayesian Anti-Phishing Toolbar

Firefox browser toolbar
will extend to other browsers
goals: detect, communicate, and educate
Bayesian filtering + whitelist
similar to spam filtering
different from spam filtering
phishing sites mirror legitimate sites
hard to find training set (inbox vs. blacklist
database)
comprehensive whitelist
Innovative UI
no known effective security indicators for warning user of phishing sites (Dhamija, 2006; Wu, 2007)

27
Bayesian classification

Bayes' law on conditional probability
Pros
easy to compute
training and tailoring
Cons
assumes independence among words
Bayesian poisoning
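
A naive Bayes text classifier of the kind described above can be sketched as below. This is a generic illustration with add-one (Laplace) smoothing, not B-APT's implementation: the class name, the labels, and the smoothing choice are all assumptions. It needs at least one training page per class.

```python
import math
from collections import Counter


class NaiveBayes:
    LABELS = ("phish", "legit")

    def __init__(self):
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, tokens, label):
        """Record the tokens of one training page under its label."""
        self.token_counts[label].update(tokens)
        self.doc_counts[label] += 1

    def log_score(self, tokens, label):
        """log P(label) + sum of log P(token | label), with add-one smoothing.

        Summing per-token terms is where the independence assumption enters.
        """
        counts = self.token_counts[label]
        vocab = set().union(*self.token_counts.values())
        total = sum(counts.values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for token in tokens:
            score += math.log((counts[token] + 1) / (total + len(vocab)))
        return score

    def classify(self, tokens):
        return max(self.LABELS, key=lambda label: self.log_score(tokens, label))


nb = NaiveBayes()
nb.train(["verify", "account", "password", "urgent"], "phish")
nb.train(["news", "weather", "sports", "account"], "legit")
```

Training is a matter of updating counts, which is what makes the approach easy to compute and to tailor with new examples.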

28
Implementation details

Training on phishing pages and legitimate pages
Phishtrack: HTML of phishing pages
1200 phishing sites, 160 unique sites
Alexa: top 500 most popular websites
same size in KB as the phishing sites (17k vs. 64k tokens)

http://www.dslreports.com/phishtrack
http://www.alexa.com/
29
B-APT detecting phishing sites
Anti-phishing tools tested on 60 phishing sites
30
B-APT detecting legitimate sites
Anti-phishing tools tested on 60 legitimate sites
31
Summary

Classification heuristics do well
B-APT has no false negatives, some false positives
working on communicating false positives
detect, communicate, and educate
Use of any toolbar is better than none
the lowest number was 42, for IE7
blacklist-based ones get better as time passes
(Zhang, 2007)
Beware of malware
Badware.org with Google
