Title: Internet Intrusions: Global Characteristics and Prevalence
1Internet Intrusions: Global Characteristics and Prevalence
Using slides from Vinod Yegneswaran's presentation at SIGMETRICS 2003
2Overview
- Data Sources
- Intrusion Characteristics
- Port and source Distribution
- Projection to the global address space
- Implications of Shared Information
- Does information sharing help?
- How much information is needed?
3Goals
- This paper aims to
- Show the volume of intrusion attempts
- Show the distribution of intrusions
- In terms of both source and victim
- Show the impact of various scan types
- Expand findings to the global scope
4Data Sources
- To extend the findings to the global scope, the data must
- Come from many ASes
- Be spread both geographically and over the IP address space
5DSHIELD
- http://www.dshield.org (part of SANS Institute)
- Firewall / NIDS logs, 1600 networks
- BlackIce Defender, CISCO PIX Firewall, IP chains
- Snort, Zonealarm Pro, Portsentry
- 4 months (Aug 2001, May-July 2002)
- 60 million scans, 375K dest IPs per month
- 5 Class B, 45 Class C, many others
6DSHIELD Data
- Lowest common denominator approach
- simplicity, diversity, unbiased
- Pitfalls
- packet headers, active connection info
- flooding
- intentional, misconfiguration (broadcast, half-life)
- Spoofed sources
7DSHIELD
- Red dots represent participating ASes
- Grey lines demonstrate connectivity between ASes
- Dots closer to the center indicate ASes closer
to the internet backbone
8Worms
- Code-red I
- July 12, 2001, 2 phase attack, random propagation
- Code-red II
- Aug 4, 2001, local-random propagation
- Nimda
- Sep 18, 2001, local-random propagation
- SQL-snake
- May 2002, port 1433, random propagation
- Emails passwords and system info to ixltd_at_postone.com
9Scan Types
- Vertical Scan
- Multiple ports on 1 victim by 1 source
- Horizontal Scan
- 1 port on multiple victims by 1 source
- Coordinated Scans
- Multiple sources aimed at a /24 space
- Stealth Scans
- Horizontal or vertical
- Characterized by a very low frequency
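The vertical and horizontal definitions above can be sketched in code. This is a minimal illustration, not the method used in the paper: the record layout `(source_ip, dest_ip, dest_port)`, the function name `classify_scans`, and both thresholds are assumptions, since the slides give no exact cut-offs.

```python
from collections import defaultdict

# Hypothetical thresholds: the slides do not state exact cut-offs.
MIN_PORTS = 5     # distinct ports on one victim to call a scan vertical
MIN_VICTIMS = 5   # distinct victims on one port to call a scan horizontal


def classify_scans(records):
    """Label each source as vertical and/or horizontal scanner.

    records: iterable of (source_ip, dest_ip, dest_port) tuples.
    Returns a dict mapping source_ip to a set of labels.
    """
    ports_per_victim = defaultdict(set)   # (src, dst)  -> ports probed
    victims_per_port = defaultdict(set)   # (src, port) -> victims probed
    for src, dst, port in records:
        ports_per_victim[(src, dst)].add(port)
        victims_per_port[(src, port)].add(dst)

    labels = defaultdict(set)
    # Vertical: many ports on one victim from one source.
    for (src, _dst), ports in ports_per_victim.items():
        if len(ports) >= MIN_PORTS:
            labels[src].add("vertical")
    # Horizontal: one port on many victims from one source.
    for (src, _port), victims in victims_per_port.items():
        if len(victims) >= MIN_VICTIMS:
            labels[src].add("horizontal")
    return dict(labels)
```

Coordinated and stealth scans would need extra inputs (source grouping per /24 and timestamps), so they are left out of this sketch.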
10Intrusion Characteristics
- Port Distribution
- Monitor the destination port for intrusion attempts
- Source Distribution
- Look for trends in the source address associated with intrusions
- Group intrusions into port 80, port 1433, and non-worm scans
11Port Distribution
12Source Distribution
- Figure panels: port 80 (June 2002), port 1433 (June 2002), non-worm (June 2002)
13Persistence of Worm Activity
- 3 months of data, May-July 2002 (CDF)
- Half-life: 18 days (/24), 6 hours (/32)
14Date Characteristics
- Code-red I was still very much alive!
15Top Sources
- Mainly applies to non-worm scans
- Results will show that only a few sources are responsible for a significant amount of the scans
- Zipf Distribution
- Argument for a blacklist
16Top Sources
- Zipf distribution (power law)
- CDF (source IP rank vs num scans log-log scale)
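A Zipf-like distribution shows up as a roughly straight line when scan counts are plotted against source rank on log-log axes. The sketch below estimates that slope with an ordinary least-squares fit. The function name `zipf_slope` and the input format (one scan count per source) are assumptions for illustration, and the input needs at least two positive counts.

```python
import math


def zipf_slope(scan_counts):
    """Least-squares slope of log(count) against log(rank).

    scan_counts: one scan count per source, in any order.
    A slope near -1 is the classic Zipf case.
    """
    ranked = sorted((c for c in scan_counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for count in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```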
17Top Sources
- May 2002: scan volume overall vs top 100 sources
- Top 100 sources account for 50% of all scans in any month
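The share of scans attributable to the top sources can be computed directly from per-source counts. A minimal sketch, where `top_k_share` and its input format are assumptions for illustration:

```python
def top_k_share(scan_counts, k=100):
    """Fraction of all scans that come from the k most active sources.

    scan_counts: one scan count per source, in any order.
    """
    total = sum(scan_counts)
    if total == 0:
        return 0.0
    top = sorted(scan_counts, reverse=True)[:k]
    return sum(top) / total
```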
18Source Coordination
- Aug 2001: 8 of the top 20 sources display identical ON/OFF behavior
- Such clusters are common among the top 20 sources of all 4 months!
- All sources scan more than 5 distinct /16s.
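One simple way to look for identical ON/OFF behavior is to reduce each source to the set of time bins in which it was active and group sources with the same set. This is only a sketch under assumptions: the event format `(source_ip, unix_timestamp)`, the function name `on_off_clusters`, and the one-day bin size are not taken from the slides.

```python
from collections import defaultdict


def on_off_clusters(events, bin_seconds=86400):
    """Group sources whose active time bins are exactly the same.

    events: iterable of (source_ip, unix_timestamp) pairs.
    Returns a list of clusters, each a sorted list of source IPs.
    """
    active_bins = defaultdict(set)
    for src, ts in events:
        active_bins[src].add(int(ts // bin_seconds))

    by_pattern = defaultdict(list)
    for src, bins in active_bins.items():
        by_pattern[frozenset(bins)].append(src)

    # Keep only patterns shared by at least two sources.
    return [sorted(srcs) for srcs in by_pattern.values() if len(srcs) > 1]
```

Exact matching is strict. A real analysis would probably tolerate small differences between patterns.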
19Source Coordination
- May 2002: ON/OFF pattern (4 out of top 20 sources)
- Staggering behavior (identical attack or attack tool)
20Identification of Scan Types
- Still look at only non-worm scans
- Horizontal scans make up the majority of the scans
- More vertical scan episodes
- Surprisingly high number of coordinated scans
- Stealth scans occur much less frequently, but are usually vertical scans
21Scan Types: Number of Scans
22Scan Types: Number of Episodes
23Global Projections
- Question: How has the scanning trend changed over the past year?
- Must extend the data to the entire Internet
- Simply average the data and multiply by 2^32
- Possible because data comes from a broad range of sources
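The projection is plain arithmetic: average the observed scans per monitored address, then scale by the size of the IPv4 address space (2^32). A minimal sketch, with `project_global_scans` and its parameter names as illustrative assumptions:

```python
IPV4_ADDRESS_SPACE = 2 ** 32  # total number of IPv4 addresses


def project_global_scans(scans_observed, ips_monitored):
    """Scale the average scans per monitored IP to the whole IPv4 space."""
    if ips_monitored <= 0:
        raise ValueError("ips_monitored must be positive")
    return scans_observed / ips_monitored * IPV4_ADDRESS_SPACE
```

For example, 10 scans seen across one /16 (2^16 addresses) projects to 10 * 2^16 = 655,360 scans over the full space.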
24Projection of Port 80 Scans
- Port 80 scans show a decreasing trend
- biased by release of CR I/II
- May-July 2002 relatively steady with small upward slope
25Projection of Non-worm Scans
- Projection = (avg scans per IP) × (num IPs)
- similar projections for /24 and /16 aggregates
- 25B scans / day
26Implications of Shared Information
- Many have looked to pool resources
- Do not identify speed of attacks
- Can gain a view of trends in attacks, though
27Information Theoretic Approach
- Relative Entropy: measure of the distributional similarity between two variables
- Marginal Utility: amount of information gained by adding more samples
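Relative entropy (Kullback-Leibler divergence) between two discrete distributions p and q is the sum over x of p(x) * log2(p(x) / q(x)). The sketch below computes it for distributions stored as dicts. The function name and the dict representation are assumptions for illustration, not the paper's code.

```python
import math


def relative_entropy(p, q):
    """D(p || q) in bits for discrete distributions given as dicts.

    p, q: dicts mapping an outcome (e.g. a source IP or port) to its
    probability. Returns math.inf if p puts mass where q has none.
    """
    total = 0.0
    for key, p_x in p.items():
        if p_x == 0:
            continue  # terms with p(x) = 0 contribute nothing
        q_x = q.get(key, 0.0)
        if q_x == 0:
            return math.inf
        total += p_x * math.log2(p_x / q_x)
    return total
```

A value of 0 means the two distributions are identical. Larger values mean they differ more.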
28Information Theoretic Approach
- Goal: how much does adding intrusion logs improve the resolution of identifying worst offenders?
- Can be measured using marginal utility
- Number of experiments is the number of logs identified
29Evaluation of Marginal Utility Approach
- Use 100 /16s and 100 /24s from the total data sets
- Chosen at random
- Received promising results about the amount gained from adding more data sets
30Marginal Utility for Worst Offenders
- Random day, 100 random /16s and /24s
- Diminishing returns after 40 /16s and 50 /24s
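Diminishing returns can be illustrated with a simplified proxy: track what fraction of the overall worst offenders is already visible after each additional log is merged in. This is not the information-theoretic marginal utility used in the paper. The function names, the log format (one dict of source IP to scan count per network), and `top_n` are assumptions for illustration.

```python
from collections import Counter


def worst_offender_coverage(logs, top_n=10):
    """Fraction of the overall top-n sources found after each added log.

    logs: list of dicts mapping source_ip -> scan count, one per network.
    """
    full = Counter()
    for log in logs:
        full.update(log)
    target = {src for src, _ in full.most_common(top_n)}
    if not target:
        return []

    running = Counter()
    coverage = []
    for log in logs:
        running.update(log)  # merge in one more network's log
        found = {src for src, _ in running.most_common(top_n)}
        coverage.append(len(found & target) / len(target))
    return coverage


def marginal_gains(coverage):
    """Increase in coverage contributed by each added log."""
    previous = [0.0] + coverage[:-1]
    return [after - before for before, after in zip(previous, coverage)]
```

Once the gains fall to roughly zero, adding further logs contributes little, which is the pattern the slide describes.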
31Marginal Utility for Detecting Target Ports
- Random day, 100 random /16s and /24s
- Diminishing returns after 40 nodes.
32Conclusion
- A lot of scanning directed away from port 80
- 25B scans per day, 25% non port 80
- A set of worst offenders does exist who are responsible for a lot of the scanning
- Combining data from multiple sites gives more information
- Data from larger sites is more useful
33Backup for discussion
- Data bias
- Different platforms: BlackIce Defender, CISCO PIX, ZoneAlarm, Linux IPchains, Portsentry and Snort
- 1600 firewall/NIDS across geography and IP space
34Internet Intrusion vs. Scan
- Scanning is the most common and versatile type of intrusion
- Normally, before compromising a host, hackers need to scan it to find vulnerabilities
- From scans we can learn about hackers' attempts
35spoof bounce
- Up to now, not widely used
- We cannot track where the scan packet was sent from, but we can still track the receiver or sensor.
- Known existing tools: Idlescan
36projection of whole Internet
- Pretty rough but should work
- The set of provider networks is reasonably well distributed (both geographically and over the IP space)
- Using the routable IP space from the BGP table should be a better plan.
37Information sharing vs. privacy
- What is shared are scanning attempts, which may be malicious, so sharing them normally won't hurt people's privacy.
- We may also build BGP-like policy control into information sharing.
38scan episodes
- The scans sent by one attacker
39 100 /16s and 100 /24s
- DSHIELD data set: 5 Class B, 45 Class C, many others
- Here the 100 /16s are 100 /16 prefixes, although only 5 are full.
- Same thing for the 100 /24s
40Scan Speed
- Stealth scan
- Interval between scans should be less than 180 seconds.
- horizontal scans and vertical scans
- 1 hour is the upper bound
- Normal time interval is much less.
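An interval bound like the one above suggests a simple way to cut one source's scans into episodes: start a new episode whenever the gap since the previous scan exceeds the bound. A minimal sketch, assuming timestamps in seconds. The function name and the 3600-second default (the 1-hour upper bound) are illustrative choices.

```python
def split_into_episodes(timestamps, max_gap_seconds=3600):
    """Split one source's scan timestamps into episodes.

    A new episode starts when the gap to the previous scan is larger
    than max_gap_seconds. Returns a list of lists of timestamps.
    """
    episodes = []
    for ts in sorted(timestamps):
        if episodes and ts - episodes[-1][-1] <= max_gap_seconds:
            episodes[-1].append(ts)   # still within the same episode
        else:
            episodes.append([ts])     # gap too large: start a new one
    return episodes
```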
41Service Distribution of Scans