Measurement and Classification of Humans and Bots in Internet Chat - PowerPoint PPT Presentation

1
Measurement and Classification of Humans and Bots
in Internet Chat
  • By Steven Gianvecchio, Mengjun Xie, Zhenyu Wu,
    and Haining Wang
  • College of William and Mary

2
Outline
  • Background
  • Measurement
  • Classification System
  • Experimental Evaluation
  • Conclusion

3
Outline: Background

4
Bots
  • Bots - programs that automate human tasks
  • web bots automate browsing the web
  • chat bots automate online chat
  • can be harmful and/or helpful

5
Chat Bots vs. BotNets
  • BotNets: networks of compromised machines
      • some use chat systems (IRC) for C&C; others use P2P, HTTP, etc.
      • abuse various systems
  • Chat Bots: automated chat programs
      • some are helpful, e.g., chat loggers
      • can abuse chat systems and their users

6
The Chat Bot Problem
  • The problem: chat bots abuse chat services (e.g., AOL, Yahoo!, MSN) to
      • send spam
      • spread malicious software
      • mount phishing attacks
  • Our focus is on the Yahoo! chat system

7
A Typical Chat
  • Alice12 entered the room.
  • Alice12: Hi room.
  • Bob34: hi alice
  • Susie88: any guys want to let a cute girl move in with them! hehe
  • Alice12: Whats up?
  • Bob34: not much
  • Susie88: can you guys see me on my web-cam?? (its in my profile)

8
Yahoo! Chat
  • Yahoo! chat is a large commercial chat service
      • over 3,000 chat rooms

  [Figure: Yahoo! chat system diagram; labels include AUTH, CHAT, IM]

9
Yahoo! Chat
  • Yahoo! chat system
      • a client connects to a server
      • servers relay messages to/from clients

10
Outline: Measurement

11
Measurement
  • August-November 2007: we collect data
  • August 2007: Yahoo! adds a CAPTCHA
      • users must pass it to join a chat room
      • a protocol update prevents some third-party clients from accessing chat
  • October 2007: bots are back
      • some bots return before third-party clients do

12
Measurement
  • September and October 2007: very few chat bots
  • August and November 2007: many chat bots
  • 1,440 hours of chat logs
  • 147 chat logs
  • 21 chat rooms

13
Measurement
  • To create our dataset, we read and label the chat users as human, bot, or ambiguous
  • In total, we recognized 14 different types of chat bots, with
      • different triggering mechanisms
      • different text generation techniques

14
Triggering Mechanisms
  • Timer-Based
      • periodic timers, e.g., 40 seconds
      • random timers, e.g., 45-125 seconds
  • Response-Based
      • responds to other users
      • Sam77: Bob12, youre just full of questions, arent you?
      • Sam77: Bob12, lots of evidence for evolution can be found here http://
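The two timer-based triggers differ only in how the next inter-message delay is chosen. A minimal Python sketch, assuming the example parameters from this slide (a fixed 40-second period, and a delay drawn uniformly from 45-125 seconds); the function names are hypothetical, and the bots' actual code is unknown.

```python
import random
from typing import List, Optional


def periodic_delays(n: int, period: float = 40.0) -> List[float]:
    """Periodic timer: every inter-message delay is the same fixed period (seconds)."""
    return [period] * n


def random_delays(n: int, low: float = 45.0, high: float = 125.0,
                  rng: Optional[random.Random] = None) -> List[float]:
    """Random timer: each delay is drawn uniformly from [low, high] seconds."""
    rng = rng or random.Random()
    return [rng.uniform(low, high) for _ in range(n)]


def send_times(delays: List[float], start: float = 0.0) -> List[float]:
    """Turn inter-message delays into absolute send times (seconds since start)."""
    times: List[float] = []
    t = start
    for d in delays:
        t += d
        times.append(t)
    return times


if __name__ == "__main__":
    print(send_times(periodic_delays(3)))  # [40.0, 80.0, 120.0]
    rng = random.Random(7)
    print([round(d, 1) for d in random_delays(5, rng=rng)])
```

Either way the timing is driven by one simple, fixed distribution, which is the regularity the entropy tests later look for.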

15
Text Generation
  • Character Padding
      • Fiona88: anyone boredjn wanna chat?uklcss
  • Synonym Phrases
      • Marjorie99: Hi Babes! Marjorie Here! Inspect My Site
      • Marjorie99: Mmmm Folks! Im Marjorie! View My Webpage
  • Odd Line or Word Spacing
  • Message Replay
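Character padding and synonym phrases both vary the text while keeping the message's intent, so an exact-match filter sees a "new" message each time. A hypothetical Python sketch of both techniques: the padding rule (append one to six random lowercase letters to a couple of random words) and the phrase slots (built from the two Marjorie99 examples) are assumptions for illustration, not recovered bot code.

```python
import random
import string
from typing import Optional


def pad_characters(message: str, n_pads: int = 2, max_len: int = 6,
                   rng: Optional[random.Random] = None) -> str:
    """Character padding: glue short runs of random lowercase letters onto random words."""
    rng = rng or random.Random()
    words = message.split()
    for _ in range(n_pads):
        i = rng.randrange(len(words))
        junk = "".join(rng.choice(string.ascii_lowercase)
                       for _ in range(rng.randint(1, max_len)))
        words[i] += junk
    return " ".join(words)


# Hypothetical interchangeable phrases, one list per slot, modeled on the slide's examples.
SLOTS = [
    ["Hi Babes!", "Mmmm Folks!"],
    ["{name} Here!", "I'm {name}!"],
    ["Inspect My Site", "View My Webpage"],
]


def synonym_message(name: str, rng: Optional[random.Random] = None) -> str:
    """Synonym phrases: pick one phrase per slot and fill in the bot's name."""
    rng = rng or random.Random()
    return " ".join(rng.choice(options) for options in SLOTS).format(name=name)


if __name__ == "__main__":
    rng = random.Random(3)
    print(pad_characters("anyone bored wanna chat?", rng=rng))
    print(synonym_message("Marjorie", rng=rng))
```

With three two-way slots this template yields only eight distinct messages, which hints at why such bots remain low in entropy despite the variation.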

16
Types of Chat Bots
  • Periodic Bots: send messages based on periodic timers
  • Random Bots: send messages based on random timers
  • Responder Bots: respond to messages of other users
  • Replay Bots: replay messages of other users

17
  • Humans
      • inter-message delay: evidence of a heavy tail
      • message size: well fit by an Exponential distribution (λ = 0.034)
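"Well fit by Exponential" can be checked in two steps: estimate the rate, then measure how far the empirical distribution is from the fitted one. The slides do not say how the fit was done; the sketch below assumes the maximum-likelihood rate (λ̂ = 1 / sample mean, so λ = 0.034 corresponds to a mean message size of about 29.4 in the slide's size units) and uses a Kolmogorov-Smirnov distance as the goodness-of-fit measure. The sample here is synthetic, not the measured chat data.

```python
import math
import random
from typing import Sequence


def fit_exponential_rate(sizes: Sequence[float]) -> float:
    """Maximum-likelihood rate for an Exponential fit: lambda_hat = 1 / sample mean."""
    if not sizes:
        raise ValueError("need at least one observation")
    return 1.0 / (sum(sizes) / len(sizes))


def exponential_cdf(x: float, rate: float) -> float:
    """CDF of Exponential(rate): 1 - exp(-rate * x) for x >= 0, else 0."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


def ks_statistic(sizes: Sequence[float], rate: float) -> float:
    """Kolmogorov-Smirnov distance between the empirical CDF and the fitted CDF."""
    xs = sorted(sizes)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = exponential_cdf(x, rate)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic sizes drawn from Exponential(0.034); NOT the measured chat data.
    sizes = [rng.expovariate(0.034) for _ in range(5000)]
    rate = fit_exponential_rate(sizes)
    print(f"lambda_hat = {rate:.4f}, mean = {1 / rate:.1f}, KS = {ks_statistic(sizes, rate):.3f}")
```

The same two functions apply unchanged to the replay-bot sizes (λ = 0.028, mean about 35.7).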

18
  • Periodic Bots
      • inter-message delay: several clusters with high probabilities
      • message size: messages are built from templates; sizes approximate a normal distribution

19
  • Random Bots
      • inter-message delay: Equilikely distribution at 40, 64, and 88 seconds; Uniform distribution over 45-125 seconds
      • message size: messages selected from a small database
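A short calculation shows how little randomness these timers actually contain. Assuming delays are measured in whole seconds (the slides do not state the resolution), an Equilikely choice among {40, 64, 88} has Shannon entropy log2 3, about 1.585 bits; a Uniform delay over 45-125 seconds has 81 equally likely values, or log2 81, about 6.340 bits; and a single fixed 40-second timer has 0 bits. A minimal Python check:

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Sequence


def entropy_bits(probabilities: Iterable[float]) -> float:
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


def empirical_entropy_bits(samples: Sequence[Hashable]) -> float:
    """Plug-in entropy estimate from observed values (e.g., delays rounded to whole seconds)."""
    counts = Counter(samples)
    n = len(samples)
    return entropy_bits(c / n for c in counts.values())


if __name__ == "__main__":
    # Assumption: delays take whole-second values.
    three_values = entropy_bits([1 / 3] * 3)     # Equilikely over {40, 64, 88} s
    uniform_range = entropy_bits([1 / 81] * 81)  # Uniform over 45..125 s (81 values)
    fixed_period = entropy_bits([1.0])           # a single fixed 40 s timer
    print(round(three_values, 3), round(uniform_range, 3), fixed_period)
```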

20
  • Responder Bots
      • inter-message delay: human-like timing
      • message size: multiple templates of different lengths

21
  • Replay Bots
      • inter-message delay: a cluster with high probability (replay bots are periodic)
      • message size: human-like, well fit by an Exponential distribution (λ = 0.028)

22
Outline: Classification System

23
Classification System
  • Entropy Classifier
      • detects abnormal behavior
      • based on message sizes and inter-message delays
      • accurate but slow
  • Machine Learning Classifier
      • detects learned patterns
      • based on message content
      • fast, but must be trained

24
Entropy Classifier
  • Observation: chat bots are less complex than humans and, thus, lower in entropy
      • exploits the low entropy of chat bots
  • Corrected Conditional Entropy Test (CCE)
      • estimates higher-order entropy
  • Entropy Test (EN)
      • estimates first-order entropy
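The slides name the two tests without formulas. Under the usual definitions of these quantities in the entropy-estimation literature (an assumption; the slides do not give the exact estimator or binning), a sequence of inter-message delays or message sizes is first quantized into bins, and X_1, ..., X_m denotes a pattern of m consecutive binned values:

```latex
\begin{aligned}
\mathrm{EN}(X_1,\dots,X_m) &= -\sum_{x_1,\dots,x_m} P(x_1,\dots,x_m)\,\log_2 P(x_1,\dots,x_m) \\
\mathrm{CE}(X_m \mid X_1,\dots,X_{m-1}) &= \mathrm{EN}(X_1,\dots,X_m) - \mathrm{EN}(X_1,\dots,X_{m-1}) \\
\mathrm{CCE}(X_m \mid X_1,\dots,X_{m-1}) &= \mathrm{CE}(X_m \mid X_1,\dots,X_{m-1}) + \mathrm{perc}(X_m)\cdot\mathrm{EN}(X_1)
\end{aligned}
```

Here EN of the empty pattern is 0, so the "first-order entropy" of the EN test is EN(X_1). perc(X_m) is the fraction of length-m patterns that occur only once; it penalizes the estimate as long patterns become too sparse to trust, and the CCE estimate is the minimum over m. A sequence can have sizable EN(X_1) yet near-zero CCE if each value is predictable from the ones before it.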

25
Machine Learning Classifier
  • Observation: chat spam, like email spam, is a text classification problem
      • exploits the message content of chat bots
  • CRM114
      • a powerful text classification system
      • several built-in classifiers: HMM, KNN/Hyperspace, OSB, SVM, Winnow, etc.
      • we use OSB
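CRM114 is a standalone tool, and the slides do not show how it was configured, so the Python below does not use CRM114 at all. It is a hypothetical, minimal illustration of the idea behind OSB (orthogonal sparse bigrams): each token is paired with each of the few tokens before it, and the gap between them is kept as part of the feature. The five-token window and whitespace tokenization are assumptions, and the scorer is a plain Laplace-smoothed naive Bayes log-likelihood ratio swapped in for CRM114's own OSB combining rule. Features never seen in training are skipped, so a tiny, unbalanced corpus does not swamp the matches.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Feature = Tuple[str, int, str]  # (earlier token, distance, current token)


def tokenize(text: str) -> List[str]:
    # Assumption: whitespace-delimited tokens, case preserved.
    return re.findall(r"\S+", text)


def osb_features(tokens: List[str], window: int = 5) -> List[Feature]:
    """Pair each token with each of the previous window-1 tokens, recording the distance."""
    feats: List[Feature] = []
    for i, tok in enumerate(tokens):
        for dist in range(1, window):
            if i - dist < 0:
                break
            feats.append((tokens[i - dist], dist, tok))
    return feats


class OSBNaiveBayes:
    """Toy two-class scorer over OSB features (stand-in for CRM114's OSB classifier)."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = {"bot": Counter(), "human": Counter()}

    def train(self, label: str, text: str) -> None:
        self.counts[label].update(osb_features(tokenize(text)))

    def score(self, text: str) -> float:
        """Log-likelihood ratio over known features; positive means more bot-like."""
        bot, human = self.counts["bot"], self.counts["human"]
        vocab = len(set(bot) | set(human)) + 1
        n_bot, n_human = sum(bot.values()), sum(human.values())
        s = 0.0
        for f in osb_features(tokenize(text)):
            if f not in bot and f not in human:
                continue  # ignore features never seen in training
            p_bot = (bot[f] + 1) / (n_bot + vocab)
            p_human = (human[f] + 1) / (n_human + vocab)
            s += math.log(p_bot / p_human)
        return s

    def classify(self, text: str) -> str:
        return "bot" if self.score(text) > 0 else "human"


if __name__ == "__main__":
    clf = OSBNaiveBayes()
    clf.train("bot", "Hi Babes! Marjorie Here! Inspect My Site")
    clf.train("bot", "Mmmm Folks! I'm Marjorie! View My Webpage")
    for msg in ["hi alice", "not much", "Whats up?"]:
        clf.train("human", msg)
    print(clf.classify("Hi Babes! Marjorie Here! View My Webpage"))
```

Because OSB features are word pairs at fixed gaps, a synonym-phrase bot that recombines known phrases still matches many learned features even when the whole message is new.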

26
  • Hybrid Classification System
      • the entropy classifier builds and maintains the bot corpus
      • the machine learning classifier uses the bot and human corpora

27
Outline: Experimental Evaluation

28
Experimental Evaluation
  • Types of Chat Bots
      • Periodic Bots
      • Random Bots
      • Responder Bots
      • Replay Bots
  • Classifiers
      • entropy classifier: 100 messages
      • machine learning classifier: 25 messages

29
Experimental Evaluation
  • Classification Tests
      • Ent: entropy classifier
      • SupML: fully-supervised ML classifier, trained on AUG BOTS
      • SupMLre: fully-supervised ML classifier, retrained on NOV BOTS
      • EntML: entropy-trained ML classifier

30
  • Entropy Classifier
      • EN: entropy
      • CCE: corrected conditional entropy
      • (imd): inter-message delay
      • (ms): message size
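To make EN(imd), CCE(imd), EN(ms), and CCE(ms) concrete, here is a minimal Python sketch that quantizes a sequence and computes both quantities using the standard EN/CE/CCE definitions. Several choices are assumptions, not taken from the slides: five bins with edges at the quantiles of a reference sample, patterns up to length 5, and an exponential "human" reference that is synthetic, not the measured data. Bin lookup uses values rather than ranks, so identical delays always share a bin.

```python
import bisect
import math
import random
from collections import Counter
from typing import List, Sequence


def quantile_edges(reference: Sequence[float], n_bins: int = 5) -> List[float]:
    """Bin edges that split a reference sample into n_bins roughly equal-count bins."""
    xs = sorted(reference)
    return [xs[(len(xs) * k) // n_bins] for k in range(1, n_bins)]


def quantize(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Map each value to a bin index; equal values always land in the same bin."""
    return [bisect.bisect_right(edges, v) for v in values]


def _pattern_counts(symbols: Sequence[int], m: int) -> Counter:
    """Counts of every length-m window of consecutive symbols."""
    return Counter(tuple(symbols[i:i + m]) for i in range(len(symbols) - m + 1))


def _entropy(counts: Counter) -> float:
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def en(symbols: Sequence[int]) -> float:
    """EN: first-order (Shannon) entropy of the binned sequence, in bits."""
    return _entropy(_pattern_counts(symbols, 1))


def cce(symbols: Sequence[int], max_m: int = 5) -> float:
    """CCE: minimum over pattern length m of CE(m) + perc(m) * EN(1)."""
    en1 = en(symbols)
    prev_en, best = 0.0, float("inf")
    for m in range(1, min(max_m, len(symbols)) + 1):
        counts = _pattern_counts(symbols, m)
        en_m = _entropy(counts)
        # perc(m): share of length-m windows whose pattern occurs exactly once.
        perc = sum(1 for c in counts.values() if c == 1) / sum(counts.values())
        best = min(best, (en_m - prev_en) + perc * en1)
        prev_en = en_m
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic stand-in for human delays (NOT the measured data): exponential, mean 30 s.
    human_imd = [rng.expovariate(1 / 30) for _ in range(2000)]
    edges = quantile_edges(human_imd)
    examples = {
        "synthetic human": human_imd,
        "periodic bot (40 s)": [40.0] * 200,
        "alternating 40/88 s": [40.0, 88.0] * 100,
    }
    for name, seq in examples.items():
        s = quantize(seq, edges)
        print(f"{name}: EN={en(s):.3f} CCE={cce(s):.3f}")
```

The alternating sequence is the case that motivates CCE: its first-order EN is a full bit, but each delay is fully determined by the previous one, so its CCE is near zero while the synthetic human sequence stays high on both.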

31
  • EN(imd) and CCE(imd)
      • struggle against responder bots, whose timing is human-like
      • detect most other chat bots

32
  • EN(ms) and CCE(ms)
      • struggle against random and replay bots
      • detect most other chat bots

33
  • Overall
      • detects all chat bots
      • false positive rate of 0.01
      • 100 messages

34
  • Entropy and Machine Learning Classifiers
      • Ent: entropy classifier (from the last slide)
      • SupML: fully-supervised machine learning
      • SupMLre: SupML retrained
      • EntML: entropy-trained machine learning

35
  • Ent
      • overall results from the previous slide

36
  • SupML
      • has problems against the November bots
      • needs to be retrained for new bots
  • SupMLre
      • detects all bots

37
  • EntML
      • false positive rate of 0.0005 (Ent: 0.01)
      • 25 messages

38
Outline: Conclusion

39
Conclusion
  • Measurements
      • overall, chat bots are less complex than humans
      • some chat bots are more human-like
  • Classification System
      • exploits the benefits of both classifiers
      • quickly classifies known chat bots
      • accurately classifies unknown chat bots

40
Conclusion (cont.)
  • Future Work
      • investigate more advanced chat bots
      • explore applications of entropy to other forms of bots (e.g., web bots)
      • explore other applications of entropy (e.g., detecting covert timing channels)

41
Questions?
Thank You!