Title: Botnets and Spam
1Botnets and Spam
- Anirudh Ramachandran, Nick Feamster, David Dagon
- Georgia Tech
2Project History
- Initial goal: Construct spam filters based on network-level properties, rather than content
- Content-based properties are malleable
- Low cost to evasion: Spammers can easily alter content
- High admin cost: Filters must be continually updated
- Content-based filters are applied at the destination
- Too little, too late: Wasted network bandwidth, storage, etc.
- Discovery: One of the most telling network-level properties is botnet membership
3Network-level Spam Filtering
- Network-level properties are more fixed
- Hosting or upstream ISP (AS number)
- Location in the network
- IP address block
- Routes to destination
- Botnet membership
- Which properties are most useful for
distinguishing spam traffic from legitimate
email?
Very little (if anything) is known about these
characteristics!
4Spamming Techniques
- Mostly botnets, of course
- Other techniques, too
- We're trying to quantify this
- Coordination
- Characteristics
- How we're doing this
- Correlation with Bobax victims
- from Georgia Tech botnet sinkhole
- Correlation with routing data
5Outline
- Data Collection
- Characteristics of spamming bots
- IP address space
- OSes
- ASes
- Effectiveness of blacklists
- Cloaking techniques
- Other mitigation strategies
- DNSBL Counter-intelligence
- Network flow monitoring
6Collection
- Two domains instrumented with MailAvenger (both on same network)
- Sinkhole domain 1
- Continuous spam collection since Aug 2004
- No real email addresses---sink everything
- 10 million pieces of spam
- Sinkhole domain 2
- Recently registered domain (Nov 2005)
- Clean control domain posted at a few places
- Not much spam yet; perhaps we are being too conservative
- Monitoring BGP route advertisements from same network
- Also capturing traceroutes, DNSBL results, passive TCP host fingerprinting simultaneous with spam arrival
7Data Collection Setup
[Figure: data collection setup; diagram labels "Exchange 1" and "Exchange 2"]
8Mail Collection: MailAvenger
- Highly configurable SMTP server that collects
many useful statistics
9Distribution of Spam across IP Space
[Plot: axes labeled "Fraction" and "/24 prefix"]
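A distribution like the one plotted on this slide can be approximated by bucketing each sender address into its /24 prefix and normalizing the counts. The following is a minimal sketch, not the authors' tooling; the function name and the sample addresses (taken from documentation ranges) are illustrative assumptions.

```python
from collections import Counter
import ipaddress


def spam_fraction_by_slash24(sender_ips):
    """Map each /24 prefix to the fraction of messages sent from it.

    sender_ips holds one IPv4 string per received message.
    """
    counts = Counter(
        # strict=False masks the host bits, e.g. 192.0.2.10 -> 192.0.2.0/24
        str(ipaddress.ip_network(f"{ip}/24", strict=False))
        for ip in sender_ips
    )
    total = sum(counts.values())
    return {prefix: n / total for prefix, n in counts.items()}


# Hypothetical sender addresses, one entry per message.
sample_senders = ["192.0.2.10", "192.0.2.77", "198.51.100.5", "203.0.113.9"]
fractions = spam_fraction_by_slash24(sample_senders)
```

Sorting the resulting prefixes numerically would give the x-axis ordering of a plot over IP space.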
10Spam From Botnets
- Example: Bobax
- Approximate size: 100k bots
11Distribution Across Operating Systems
About 4% of known hosts are non-Windows. These
hosts are responsible for about 8% of received
spam.
12Distribution across ASes
Still about 40% of spam coming from the U.S.
13Effectiveness of IP-Based Blacklists
- More than half of client IPs appear less than
twice
[Plot: axes labeled "Fraction of clients" and "Number of appearances"]
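The statistic on this slide (the share of client IPs seen fewer than twice) could be computed from a log of sending addresses roughly as follows. This is a sketch under the assumption that the log is a flat list with one entry per received message; the function name and sample data are hypothetical.

```python
from collections import Counter


def fraction_seen_fewer_than(client_ips, k=2):
    """Fraction of distinct client IPs that appear fewer than k times."""
    appearances = Counter(client_ips)
    if not appearances:
        return 0.0
    rare = sum(1 for n in appearances.values() if n < k)
    return rare / len(appearances)


# Hypothetical log: one address per received message.
sample_log = ["192.0.2.1", "192.0.2.2", "192.0.2.2", "192.0.2.3"]
one_shot_fraction = fraction_seen_fewer_than(sample_log)
```

A high value matters for blacklisting because an address that never returns cannot be blocked on the strength of its own history at that domain.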
14Most Bot IP addresses do not return
65% of bots only send mail to a domain once over
18 months
[Plot: axes labeled "Percentage of bots" and "Lifetime (seconds)"]
Collaborative spam filtering seems to be helping
track bot IP addresses
15Most Bots Send Low Volumes of Spam
Most bot IP addresses send very little spam,
regardless of how long they have been spamming
[Plot: axes labeled "Amount of Spam" and "Lifetime (seconds)"]
16Mitigation: IP-Based Blacklisting
95% of bots listed in one or more blacklists
[Plot: axes labeled "Fraction of all spam received" and "Number of DNSBLs listing this spammer"]
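For readers unfamiliar with how a DNSBL is consulted: the IPv4 address is reversed octet by octet and prepended to the blacklist's zone, and the resulting name is looked up in DNS. The sketch below builds those names and counts how many zones list an address. The zone names are placeholders, and the `is_listed` callback is a hypothetical stand-in for a real resolver, so the example performs no network queries.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name used to check an IPv4 address against a DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def count_listings(ip, zones, is_listed):
    """Count the zones that list ip.

    is_listed(name) -> bool is supplied by the caller; a real implementation
    would resolve the name and treat an A-record answer as "listed".
    """
    return sum(1 for zone in zones if is_listed(dnsbl_query_name(ip, zone)))


# Placeholder zones and a stubbed lookup table instead of live DNS.
zones = ["dnsbl.example.org", "bl.example.net"]
stub_listed = {"99.2.0.192.dnsbl.example.org"}
listing_count = count_listings("192.0.2.99", zones, lambda name: name in stub_listed)
```

Counting listings per sender in this way yields the "number of DNSBLs listing this spammer" quantity used on the slide.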
17Are IP-Based Blacklists Enough?
- Mail Avenger is very aggressive
- Eight different blacklists
- Cloaking techniques complicate detection
- For example, what if a bot could change IP addresses and remain reachable?
- LAN agility
- BGP agility
18BGP Spectrum Agility
- Log IP addresses of SMTP relays
- Join with BGP route advertisements seen at
network where spam trap is co-located.
A small club of persistent players appears to be
using this technique.
Common short-lived prefixes and ASes:
Prefix        AS
61.0.0.0/8    4678
66.0.0.0/8    21562
82.0.0.0/8    8717
[Figure annotation: 10 minutes]
Somewhere between 1-10% of all spam (some clearly
intentional, others might be flapping)
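The join described on this slide (SMTP relay addresses against BGP route advertisements) might look like the sketch below. It assumes the BGP feed has already been reduced to announce/withdraw intervals per prefix; the `Announcement` record, the function name, the sample data, and the 600-second threshold (chosen only to mirror the "10 minutes" label on the slide) are all illustrative assumptions rather than the authors' method.

```python
from dataclasses import dataclass
import ipaddress


@dataclass
class Announcement:
    prefix: str        # e.g. "198.51.100.0/24"
    origin_as: int
    announced: float   # seconds since epoch
    withdrawn: float   # seconds since epoch


def spam_from_short_lived_routes(spam_log, announcements, max_lifetime=600.0):
    """Return (ip, prefix, origin_as) for each message that arrived while a
    short-lived route covering the sender's address was being announced."""
    short = [
        (ipaddress.ip_network(a.prefix), a)
        for a in announcements
        if a.withdrawn - a.announced <= max_lifetime
    ]
    hits = []
    for timestamp, ip in spam_log:
        addr = ipaddress.ip_address(ip)
        for net, a in short:
            if addr in net and a.announced <= timestamp <= a.withdrawn:
                hits.append((ip, a.prefix, a.origin_as))
    return hits


# Hypothetical data: (arrival time, sender IP) pairs and two route lifetimes.
spam_log = [(1000.0, "198.51.100.7"), (5000.0, "198.51.100.8"), (1000.0, "203.0.113.1")]
routes = [
    Announcement("198.51.100.0/24", 64500, 900.0, 1300.0),    # lives 400 s
    Announcement("203.0.113.0/24", 64501, 0.0, 100000.0),     # long-lived
]
agile_hits = spam_from_short_lived_routes(spam_log, routes)
```

A linear scan is adequate for a sketch; a real trace would call for a prefix trie or interval index.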
19Why Such Big Prefixes?
- Flexibility: Client IPs can be scattered throughout dark space within a large /8
- Same sender usually returns with different IP addresses
- Visibility: Route typically won't be filtered (nice and short)
20The Effectiveness of Blacklisting
95% of bots listed in one or more blacklists
80% listed on average
Only about half of the IPs spamming from
short-lived BGP routes are listed in any blacklist
Spam from IP-agile senders tends to be listed in
fewer blacklists
[Plot: axes labeled "Fraction of all spam received" and "Number of DNSBLs listing this spammer"]
21Mitigation: Counter-Intelligence
- Botmasters advertise spamming bots that are not listed in any blacklist.
- Insight: Someone must be looking up the bots!
- Can we fish out these DNSBL reconnaissance queries and identify subjects/targets as suspect?
22Legit Queries vs. Reconnaissance
- Legitimate queriers are also the targets of
queries
- Reconnaissance queriers are usually not queried
themselves
[Figure: Legit Mail Server A (mx.a.com) and Legit Mail Server B (mx.b.com) send email to each other ("email to mx.b.com", "email to mx.a.com") and query a DNS-based blacklist ("lookup mx.a.com", "lookup mx.b.com"); a reconnaissance host is also shown.]
23Measurement Approach
- Log Spamhaus queries
- Construct querier/queried graph
- Prune graph: only nodes in the Bobax trace
- Examine nodes with high out-degree
- Hypothesis: targets of nodes with high out-degree are likely bots
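The steps above can be sketched as a small graph computation. The pruning rule here (keep an edge when either endpoint is in the seed set of known bots) and the out-degree threshold are assumptions made for illustration; the slides do not give the exact rule or cutoff. Node names in the sample log are hypothetical.

```python
from collections import defaultdict


def find_suspects(query_log, seed_bots, min_out_degree=3):
    """Return (reconnaissance queriers, newly suspected hosts).

    query_log is an iterable of (querier, queried) pairs taken from DNSBL
    query logs; seed_bots is a set of already-known bot addresses.
    """
    # Prune: keep only edges that touch the seed set (assumed rule).
    pruned = [(q, t) for q, t in query_log if q in seed_bots or t in seed_bots]

    # Build the querier -> queried adjacency sets.
    out_edges = defaultdict(set)
    for querier, queried in pruned:
        out_edges[querier].add(queried)

    # Nodes with high out-degree are candidate reconnaissance hosts.
    recon = {q for q, targets in out_edges.items() if len(targets) >= min_out_degree}

    # Hypothesis from the slide: their targets are likely bots.
    suspects = set()
    for q in recon:
        suspects |= out_edges[q]
    return recon, suspects - seed_bots


# Hypothetical log: "r" looks up many hosts; m1 and m2 query each other.
sample_queries = [
    ("r", "b1"), ("r", "b2"), ("r", "x"),
    ("m1", "m2"), ("m2", "m1"), ("m1", "b1"),
]
recon_hosts, new_suspects = find_suspects(sample_queries, {"b1", "b2", "r"})
```

In this toy log the mutual lookups between `m1` and `m2` resemble legitimate mail servers, while `r` queries widely and surfaces `x` as a host not yet in the seed set.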
24Who's Doing the Lookups?
- The botmaster, on behalf of the bots
- The bots, on behalf of themselves
- The bots, on behalf of each other
[Figure labels: "Known Bobax drone!", "Spam Sinkhole"]
Implication: Use a seed set to bootstrap?
25Mitigation: Network Monitoring
- In-network filtering
- Requires the ability to detect botnets
- Question: Can we detect botnets by observing communication structure among hosts?
- Example: Migration between command and control hosts
New type of problem: essentially coupon collection
How good are current traffic sampling techniques at exposing these patterns?
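As a reminder of what the coupon-collection analogy implies, consider the textbook version of the problem. The mapping to this setting is an assumption: treat each of $n$ distinct host-to-host conversations as a coupon, and suppose every sample reveals one of them uniformly at random. The expected number of samples $T$ needed to observe all $n$ is then:

```latex
\mathbb{E}[T] \;=\; \sum_{i=1}^{n} \frac{n}{n-i+1}
            \;=\; n \sum_{k=1}^{n} \frac{1}{k}
            \;=\; n H_n \;\approx\; n \ln n + 0.577\,n
```

The $i$-th new conversation takes $n/(n-i+1)$ samples in expectation because $n-i+1$ of the $n$ are still unseen. Real traffic is far from uniform, so rarely used conversations would take longer still to appear.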
26Experimental Setup
27(Preliminary) Results
Feasible sampling rates
Conventional sampling techniques are not
well-suited to collecting conversations
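One way to see why sampling can miss conversations: under independent random packet sampling at rate p (an assumed model; the slides do not say which sampling schemes were evaluated), a conversation of k packets is observed at least once with probability 1 - (1 - p)^k. The sketch below evaluates that expression; the function name and the example numbers are illustrative.

```python
def conversation_detection_probability(packets, sampling_rate):
    """Probability that at least one packet of a conversation is sampled,
    assuming each packet is sampled independently with the given rate."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0 and 1")
    return 1.0 - (1.0 - sampling_rate) ** packets


# Hypothetical comparison at 1-in-100 sampling.
short_conversation = conversation_detection_probability(5, 0.01)
bulk_transfer = conversation_detection_probability(1000, 0.01)
```

Short exchanges, such as a handful of command-and-control packets, are much less likely to be captured than bulk transfers at the same rate.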
28Lessons
- Two critical pieces of the puzzle
- Botnet detection: Need better monitoring techniques
- Routing security
- Clean-slate wish list
- Better notions of identity
- More agile monitoring/sampling techniques