Bayesian Filtering Anti-Phishing Toolbar Benefits

Transcript and Presenter's Notes


1
Bayesian Filtering Anti-Phishing Toolbar Benefits
  • P. Likarish, E. Jung,
  • D. Dunbar, T. E. Hansen, and J.-P. Hourcade
  • 12/04/07
  • presented by EJ Jung

2
Phishing
3
Why study phishing?
  • Identity Theft
  • One of the fastest-growing crimes
  • 15 million Americans/year, 2.8 billion dollars

Gartner, Inc. 2007 press release.
http://www.gartner.com/it/page.jsp?id=501912,
March 2007. Phishing report. http://apwg.org
4
Phishing leads into malware
Phishing report. Trojans and keyloggers.
http://apwg.org
5
Phishing and botnets into the black market (Franklin
et al., 2007)
  • 6 months of IRC logs

6
and into a national security threat
  • FBI Director Robert Mueller says:
  • Younis Tsouli and his colleagues stole thousands
    of credit card accounts through phishing schemes.
    They ran up charges of more than $3 million for
    items they thought fellow extremists might need,
    from night vision goggles to GPS devices.
  • Botnets are the Swiss Army knives of hackers

7
Phishing attack
8
Anti-Phishing Tools
  • Client or server side?
  • server side protection is limited
  • server-client cooperation
  • hash of system
  • Client side is more common
  • web browser toolbar
  • password management

9
Early Efforts
  • Largely heuristics-based
  • Set of rules developed by experts
  • Still used by most anti-phishing tools
  • Examples
  • IE7 phishing filter
  • SpoofGuard

10
SpoofGuard
  • IE6 toolbar
  • Developed by Chou, Ledesma, Teraguchi, Boneh,
    Mitchell at Stanford
  • Heuristics + whitelist

N. Chou, R. Ledesma, Y. Teraguchi, D. Boneh, and
J. C. Mitchell. Client-side defense against
web-based identity theft. In NDSS '04:
Proceedings of the 11th Annual Network and
Distributed System Security Symposium, February
2004.
11
Stateless Heuristics
  • URL check
  • Suspicious URLs: @, IP address, hex encoding
  • Image check
  • Hashed image database
  • Image hashing
  • Produces same hash for similar images
  • Link check
  • Fails if > ¼ of links fail URL check
  • Password check
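
The URL and link checks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SpoofGuard's actual code: the function names `url_check` and `link_check` are hypothetical, and the exact patterns SpoofGuard treats as suspicious are not given on the slide.

```python
import ipaddress
import re
from urllib.parse import urlsplit


def url_check(url: str) -> int:
    """Return 1 if the URL looks suspicious, 0 if it passes (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # '@' in the authority part can hide the real host behind a fake user name.
    if "@" in parts.netloc:
        return 1
    # A raw IP address used in place of a domain name.
    try:
        ipaddress.ip_address(host)
        return 1
    except ValueError:
        pass
    # Hex (percent) encoding in the host or path. Assumption: any %XX counts,
    # which will also flag some legitimate URLs.
    if re.search(r"%[0-9a-fA-F]{2}", parts.netloc + parts.path):
        return 1
    return 0


def link_check(links: list[str]) -> int:
    """Return 1 if more than a quarter of the page's links fail url_check."""
    if not links:
        return 0
    failed = sum(url_check(link) for link in links)
    return 1 if failed / len(links) > 0.25 else 0
```

Treating every percent-encoded byte as suspicious is deliberately crude; a real tool would need a narrower rule to avoid false positives.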

12
Stateful Heuristics
  • Domain check
  • Hamming distance to known domains
  • Referrals
  • From email site?
  • May require DNS lookup
  • Image-domain association
  • Extension of hashed image heuristic
  • <image, URL> tuples
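
As a rough illustration of the domain check, the sketch below flags a domain that is a near miss of a known one by Hamming distance. The threshold `max_distance` and the function names are assumptions made for this example; the slide does not say which cutoff SpoofGuard uses.

```python
from typing import Optional


def hamming_distance(a: str, b: str) -> Optional[int]:
    """Number of differing positions, or None when the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def domain_check(domain: str, known_domains: list[str], max_distance: int = 2) -> int:
    """Return 1 if the domain is close to, but not equal to, a known domain."""
    for known in known_domains:
        distance = hamming_distance(domain.lower(), known.lower())
        # Distance 0 is the known domain itself, so it passes.
        if distance is not None and 0 < distance <= max_distance:
            return 1
    return 0
```

Hamming distance only compares strings of equal length, so look-alikes that insert or drop a character (for example "paypall.com") would need an edit-distance measure instead.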

13
Scoring
TSS: Total Spoof Score
Ex.: P1 = URL check (0 if page passes, 1 if it
fails), w1 = 0.2
Source: N. Chou, R. Ledesma, Y. Teraguchi, D.
Boneh, and J. C. Mitchell. Client-side defense
against web-based identity theft. In NDSS '04:
Proceedings of the 11th Annual Network and
Distributed System Security Symposium, February
2004.
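
The scoring formula itself did not survive in this transcript. A minimal sketch, assuming the score is a plain weighted sum of the per-check outcomes P_i with weights w_i (the formula in the paper may contain further terms), could look like this. Only w1 = 0.2 comes from the slide; the other weight and the check names are hypothetical.

```python
def total_spoof_score(results: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted sum of check outcomes (0 = pass, 1 = fail); an assumed form of TSS."""
    return sum(weights[name] * outcome for name, outcome in results.items())


# 0.2 is the slide's example weight for the URL check; 0.5 is made up.
weights = {"url_check": 0.2, "link_check": 0.5}
results = {"url_check": 1, "link_check": 0}
score = total_spoof_score(results, weights)
```

A page would then be flagged when the score exceeds a threshold chosen by the user or the tool.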
14
Drawbacks to Heuristics
  • Difficult to develop accurate rules
  • Large number of false positives and negatives
  • Heuristics don't evolve; phishing sites do.

M. Sahami, S. Dumais, D. Heckerman, and E.
Horvitz. A Bayesian approach to filtering junk
e-mail. In AAAI Workshop on Learning for Text
Categorization, July 1998.
Y. Zhang, J. I. Hong, and L. F. Cranor. CANTINA:
a content-based approach to detecting phishing
web sites. In WWW '07: Proceedings of the 16th
international conference on World Wide Web, pages
639-648, New York, NY, USA, 2007. ACM Press.
15
Next: Blacklist/Whitelist
  • 2004-current
  • Largely blacklist-based
  • rely on phishing site reports
  • still used by most anti-phishing tools
  • Examples
  • IE7 phishing filter
  • Firefox 2 phishing protection (Google
    Safe Browsing)
  • Netcraft Toolbar

Netcraft Ltd. http://toolbar.netcraft.com
16
Drawbacks to Blacklist/Whitelist
  • Need reliable and timely sources for reports
  • Window of vulnerability
  • after site launch, before being blacklisted
  • avg lifetime of a phishing site: 3 days
  • avg lifetime after being blacklisted: 22 hours
  • cost of undoing identity theft: priceless
  • adapt classification methods
  • e.g., CANTINA, B-APT

Y. Zhang, J. I. Hong, and L. F. Cranor. CANTINA:
a content-based approach to detecting phishing
web sites. In WWW '07: Proceedings of the 16th
international conference on World Wide Web, pages
639-648, New York, NY, USA, 2007. ACM Press.
17
CANTINA
  • Technique
  • TF-IDF + Robust Hyperlinks
  • Domain name
  • Heuristics
  • Y. Zhang, J. I. Hong, and L. F. Cranor. CANTINA:
    a content-based approach to detecting phishing
    web sites. In WWW '07: Proceedings of the 16th
    international conference on World Wide Web, pages
    639-648, New York, NY, USA, 2007. ACM Press.

18
TF-IDF
  • Text classification technique
  • Information retrieval
  • Term Frequency-Inverse Document Frequency
  • Importance of a word in a document in a given
    corpus
  • Document: website
  • Corpus: English language

19
TF-IDF, cont'd.
  • Source for equations: http://en.wikipedia.org/wiki/Tf-idf
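
The slide's equations are not in the transcript. The sketch below uses one common textbook form, tf = (count of term in document) / (terms in document) and idf = log(N / number of documents containing the term); the slide may have shown a different variant. The function name `tf_idf` and the toy corpus are illustrative only.

```python
import math
from collections import Counter


def tf_idf(document: list[str], corpus: list[list[str]]) -> dict[str, float]:
    """Score each term of `document` against `corpus` (a list of tokenized documents)."""
    counts = Counter(document)
    total_terms = len(document)
    scores = {}
    for term, count in counts.items():
        tf = count / total_terms
        containing = sum(1 for doc in corpus if term in doc)
        # Assumption: if the document is not in the corpus, treat the term
        # as appearing in one document to avoid division by zero.
        idf = math.log(len(corpus) / max(1, containing))
        scores[term] = tf * idf
    return scores


corpus = [["bank", "login", "bank"], ["news", "login"], ["sports", "news"]]
scores = tf_idf(corpus[0], corpus)  # "bank" scores higher than "login"
```

Terms that are frequent in one page but rare across the corpus get the highest scores, which is what makes them useful as a page fingerprint.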

20
Robust Hyperlinks
  • Phelps and Wilensky
  • TF-IDF on all words on page
  • Lexical signature
  • 5 words with highest TF-IDF scores
  • Almost uniquely identifies a page among 1,000,000,000 pages

21
TF-IDF Hyperlinks in CANTINA
  • Calculate lexical signature
  • Google search on signature
  • If domain name is within top 30 hits, site is
    legitimate
  • Otherwise, it is phishing
  • Results
  • 94% true positives, 30% false positives
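
The decision rule above can be sketched as follows, assuming the TF-IDF scores are already computed and the search engine query happens elsewhere. Here `search_hits` is a hypothetical list of result URLs returned for the signature, and matching on the exact host name is a simplification of whatever domain comparison CANTINA performs.

```python
from urllib.parse import urlsplit


def lexical_signature(scores: dict[str, float], size: int = 5) -> list[str]:
    """The `size` terms with the highest TF-IDF scores (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:size]]


def is_legitimate(page_url: str, search_hits: list[str], top_n: int = 30) -> bool:
    """True if the page's host appears among the top `top_n` search hits."""
    page_host = (urlsplit(page_url).hostname or "").lower()
    hit_hosts = {(urlsplit(hit).hostname or "").lower() for hit in search_hits[:top_n]}
    return page_host != "" and page_host in hit_hosts


scores = {"ebay": 0.9, "bid": 0.5, "the": 0.0, "auction": 0.7,
          "seller": 0.6, "item": 0.4, "sign": 0.3}
signature = lexical_signature(scores)
```

The signature would be sent to the search engine as the query, and a page whose host is missing from the returned hits is labeled phishing.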

22
Improving on TF-IDF
  • Add domain name to Google search
  • 97% → 67% t.p.
  • 30% → 10% f.p.
  • TF-IDF + Zero results-Means-Phishing + domain
    name
  • 97% t.p., 10% f.p.
23
Adding heuristics to CANTINA
  • Heuristics from SpoofGuard and other sources
  • Trade-off
  • Reduces true positive rate
  • 97% → 89% t.p.
  • Reduces false positive rate
  • 10% → 1% f.p.

24
Drawbacks to CANTINA
  • Relies on outside sources for information
  • Google
  • Requires heuristics to reduce false positives
  • Reduces accuracy
  • Language-specific
  • Different corpus for each foreign language
  • Difficulties with East Asian languages
  • Unacceptable false positive rate
  • Misclassifications undermine user confidence in
    tool

25
CANTINA vs. Netcraft
  • classification + heuristics vs. blacklist + heuristics
  • True positives
  • CANTINA: 97% (or 89%)
  • Netcraft toolbar: 97%
  • (SpoofGuard: 91%)
  • False positives
  • CANTINA: 6% (or 1%)
  • Netcraft toolbar: 0%
  • (SpoofGuard: 48%)

26
B-APT: Bayesian Anti-Phishing Toolbar
  • Firefox browser toolbar
  • will extend to other browsers
  • goals: detect, communicate, and educate
  • Bayesian filtering + whitelist
  • similar to spam filtering
  • different from spam filtering
  • phishing sites mirror legitimate sites
  • hard to find training set (inbox vs. blacklist
    database)
  • comprehensive whitelist
  • Innovative UI
  • no known effective security indicators for
    warning users of phishing sites (Dhamija, 2006;
    Wu, 2007)

27
Bayesian classification
  • Bayes' law on conditional probability
  • Pros
  • easy to compute
  • training and tailoring
  • Cons
  • assumes independence among words
  • Bayesian poisoning
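
To make the points above concrete, here is a minimal naive Bayes token classifier. It is not B-APT's implementation, which the slides do not describe; the class name, the "phish"/"legit" labels, and the add-one (Laplace) smoothing are assumptions chosen for this sketch.

```python
import math
from collections import Counter


class NaiveBayes:
    """Two-class naive Bayes over tokens, treating words as independent."""

    LABELS = ("phish", "legit")

    def __init__(self) -> None:
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, tokens: list[str], label: str) -> None:
        self.token_counts[label].update(tokens)
        self.doc_counts[label] += 1

    def log_score(self, tokens: list[str], label: str) -> float:
        # Requires at least one training document per label (log of prior).
        counts = self.token_counts[label]
        vocabulary = set(self.token_counts["phish"]) | set(self.token_counts["legit"])
        total = sum(counts.values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for token in tokens:
            # Add-one smoothing so unseen tokens do not zero out the product.
            score += math.log((counts[token] + 1) / (total + len(vocabulary)))
        return score

    def classify(self, tokens: list[str]) -> str:
        return max(self.LABELS, key=lambda label: self.log_score(tokens, label))


nb = NaiveBayes()
nb.train(["verify", "account", "password", "urgent"], "phish")
nb.train(["news", "sports", "weather", "account"], "legit")
label = nb.classify(["verify", "password"])
```

Because each token contributes independently, an attacker can pad a page with innocuous words to shift the score, which is the Bayesian poisoning risk listed under Cons.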

28
Implementation details
  • Training on phishing pages and legitimate pages
  • PhishTrack: HTML of phishing pages
  • 1200 phishing sites, 160 unique sites
  • Alexa: top 500 most popular websites
  • same size in KB as the phishing sites (17k vs. 64k tokens)

http://www.dslreports.com/phishtrack
http://www.alexa.com/
29
B-APT detecting phishing sites
Anti-phishing tools tested on 60 phishing sites
30
B-APT detecting legitimate sites
Anti-phishing tools tested on 60 legitimate sites
31
Summary
  • Classification + heuristics do well
  • B-APT has no false negatives, some false positives
  • working on communicating false positives
  • detect, communicate, and educate
  • Use of any toolbar is better than none
  • the lowest number was 42, for IE7
  • blacklist-based ones get better as time passes
    (Zhang, 2007)
  • Beware of malware
  • Badware.org with Google