1
15-446 Networked Systems Practicum
  • Lecture 14 Worms/Viruses/Botnets

2
Outline
  • Worms
  • Worm Defense
  • Botnet/Viruses

3
What is a Computer Worm?
  • Self-replicating network program
    • Exploits vulnerabilities to infect remote machines
    • Victim machines continue to propagate the infection
  • Three main stages
    • Detect new targets
    • Attempt to infect new targets
    • Activate code on victim machine
  • Difference from a computer virus?
    • No human intervention required

4
Why Worry About Worms?
  • Speed
    • Much faster than viruses
    • Code Red v2: 14 hours for 359,000 victims
    • Slammer: 10 minutes for 75,000 victims
    • Faster than human reaction
  • Highly malicious payloads
    • DDoS or data corruption

5
Some Major Worms
Worm       Year  Strategy              Victims  Other Notes
Morris     1988  Topological scanning  6K       First major autonomous worm
Code Red   2001  Random scanning       300K     First recent "fast" worm
Nimda      2001  Local scanning        200K     Local subnet scanning; effective mix of techniques
Slammer    2003  Random scanning       >75K     Spread worldwide in 10 minutes
MyDoom     2004  Topological scanning  <15K     First zero-day worm
Conficker  2008  Random scanning       >15M?    Largest infection; capability of updates
6
Threat Model
  • Traditional
    • High-value targets
    • Insider threats
  • Worms and botnets
    • Automated attack of millions of targets
    • Value in aggregate, not individual systems
    • Threats: software vulnerabilities, naïve users

7
... and it's profitable
  • Botnets used for
    • Spam (and more spam)
    • Credit card theft
    • DDoS extortion
  • Flourishing exchange market
    • Spam proxying: 3-10 cents/host/week
    • 25k-host botnets: $40k - $130k/year
    • Also for stolen accounts, compromised machines,
      credit cards, identities, etc. (be worried)
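
The yearly range follows from the per-host weekly rate. A quick arithmetic check, assuming a 25,000-host botnet rented out for all 52 weeks of the year (the function name is illustrative, and integer cents are used to avoid rounding):

```python
def yearly_revenue_dollars(hosts: int, cents_per_host_week: int, weeks: int = 52) -> float:
    """Yearly spam-proxying income for a botnet, in dollars."""
    return hosts * cents_per_host_week * weeks / 100

low = yearly_revenue_dollars(25_000, 3)    # 39000.0
high = yearly_revenue_dollars(25_000, 10)  # 130000.0
print(low, high)
```

At 3 cents the result is $39,000 and at 10 cents it is $130,000, which matches the rounded range on the slide.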

8
Why is this problem hard?
  • Monoculture: little genetic diversity in hosts
  • Instantaneous transmission: almost entire
    network within 500 ms
  • Slow immune response: human scales (10x-1Mx
    slower!)
  • Poor hygiene: out-of-date / misconfigured
    systems, naïve users
  • Intelligent designer ... of pathogens
  • Near-anonymity

9
Code Red I v1
  • July 12th, 2001
  • Exploited a known vulnerability in Microsoft's
    Internet Information Server (IIS)
    • Buffer overflow in a rarely used URL decoding
      routine, published June 18th
  • 1st-19th of each month: attempts to spread
    • Random scanning of IP address space
    • 99 propagation threads; the 100th defaced pages on
      the server
  • Static random number generator seed
    • Every worm copy scans the same set of addresses
    • → Linear growth
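
Why a static seed limits growth can be seen with any seeded pseudo-random generator. The sketch below uses Python's `random` module as a stand-in (not the worm's actual generator) with arbitrary seed values: two generators started from the same seed emit the same address sequence, so additional infected hosts add no new coverage.

```python
import random

def scan_sequence(seed: int, count: int) -> list[int]:
    """First `count` pseudo-random 32-bit addresses generated from `seed`."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(count)]

# Two copies that share a static seed probe identical addresses.
copy_a = scan_sequence(seed=1234, count=5)
copy_b = scan_sequence(seed=1234, count=5)
same_targets = copy_a == copy_b  # True

# A copy seeded differently walks a different sequence.
copy_c = scan_sequence(seed=5678, count=5)
print(same_targets, copy_a != copy_c)
```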

10
Code Red I v1
  • 20th-28th of each month: attacks
    • DDoS attack against 198.137.240.91
      (www.whitehouse.gov)
  • Memory resident: rebooting the system removes
    the worm
    • However, the host could quickly be reinfected

11
Code Red I v2
  • July 19th, 2001
  • Largely the same codebase: same author?
  • Ends website defacements
  • Fixes random number generator seeding bug
    • Scanned address space grew exponentially
    • 359,000 hosts infected in 14 hours
    • Compromised almost all vulnerable IIS servers on
      the Internet

12
Analysis of Code Red I v2
  • Random Constant Spread model
  • Constants
    • N: total number of vulnerable machines
    • K: initial compromise rate, per hour
    • T: time at which the incident happens
  • Variables
    • a: proportion of vulnerable machines compromised
    • t: time in hours

13
Analysis of Code Red I v2
  • N: total number of vulnerable machines
  • K: initial compromise rate, per hour
  • T: time at which the incident happens
  • Variables
    • a: proportion of vulnerable machines compromised
    • t: time in hours

Logistic equation: rate of growth of an epidemic in
a finite system when all entities have an equal
likelihood of infecting any other entity
14
Code Red I v2 Plot
  • K = 1.8
  • T = 11.9

Hourly probe rate data for inbound port 80 at the
Chemical Abstracts Service during the initial
outbreak of Code Red I on July 19th, 2001.
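
The equation itself is not reproduced in this transcript. Assuming the standard logistic form da/dt = K·a·(1 − a), whose solution is a(t) = e^{K(t−T)} / (1 + e^{K(t−T)}), the fitted curve can be sketched as follows, with K and T taken from this slide (the function name is illustrative):

```python
import math

def fraction_infected(t: float, K: float = 1.8, T: float = 11.9) -> float:
    """Solution a(t) of da/dt = K*a*(1 - a), positioned so that a(T) = 0.5."""
    x = K * (t - T)
    return math.exp(x) / (1.0 + math.exp(x))

# Tabulate the curve every four hours over one day.
curve = [(hour, fraction_infected(hour)) for hour in range(0, 25, 4)]
for hour, a in curve:
    print(f"t={hour:2d}h  a={a:.4f}")
```

Under this form, half of the vulnerable population is compromised at t = T, and the curve flattens as a approaches 1 because fewer uninfected hosts remain to be found.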
15
Improvements: Localized scanning
  • Observation: density of vulnerable hosts in IP
    address space is not uniform
  • Idea: bias scanning towards local network
  • Used in Code Red II
    • P = 0.50: choose address from local class-A network
      (/8)
    • P = 0.38: choose address from local class-B network
      (/16)
    • P = 0.12: choose random address
  • Allows worm to spread more quickly
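
The effect of the probability mix can be modelled offline. The sketch below only draws numbers; the local address 10.1.2.3 and the function name are arbitrary choices for illustration. It measures how often a draw lands in the local /8.

```python
import random

def pick_address(local_ip: int, rng: random.Random) -> int:
    """Draw a 32-bit address using the 0.50 / 0.38 / 0.12 bias listed above."""
    r = rng.random()
    if r < 0.50:   # same /8: keep the top 8 bits, randomize the low 24
        return (local_ip & 0xFF000000) | rng.getrandbits(24)
    if r < 0.88:   # same /16: keep the top 16 bits, randomize the low 16
        return (local_ip & 0xFFFF0000) | rng.getrandbits(16)
    return rng.getrandbits(32)  # anywhere in the address space

rng = random.Random(0)
local = (10 << 24) | (1 << 16) | (2 << 8) | 3   # 10.1.2.3, arbitrary example
picks = [pick_address(local, rng) for _ in range(10_000)]
same_slash8 = sum((p >> 24) == 10 for p in picks) / len(picks)
print(f"share of draws inside the local /8: {same_slash8:.3f}")
```

Roughly 88% of draws stay inside the local /8, which is why a single infection inside a densely vulnerable network spreads there quickly.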

16
Code Red II (August 2001)
  • Began August 4th, 2001
  • Exploited Microsoft IIS web servers (buffer
    overflow)
  • Named Code Red II because
    • It contained a comment stating so. However, the
      codebase was new.
  • Infected IIS on Windows 2000 successfully
    but caused a system crash on Windows NT
  • Installed a root backdoor on the infected
    machine

17
Improvements: Multi-vector
  • Idea: use multiple propagation methods
    simultaneously
  • Example: Nimda
    • IIS vulnerability
    • Bulk e-mails
    • Open network shares
    • Defaced web pages
    • Code Red II backdoor

18
Better Worms: Hit-list Scanning
  • Worm takes a long time to get off the ground
  • Worm author collects a list of, say, 10,000
    vulnerable machines
  • Worm initially attempts to infect these hosts

19
How to build Hit-List
  • Stealthy randomized scan over a number of months
  • Distributed scanning via botnet
  • DNS searches: e.g., assemble domain list, search
    for IP address of mail server in MX records
  • Web crawling: spider similar to search engines
  • Public surveys: e.g., Netcraft
  • Listening for announcements: e.g., vulnerable IIS
    servers during Code Red I

20
Better Worms: Permutation scanning
  • Problem: many addresses are scanned multiple
    times
  • Idea: generate a random permutation of all IP
    addresses, scan in order
    • Hit-list hosts start at their own position in the
      permutation
    • When an infected host is found, restart at a
      random point
    • Can be combined with divide-and-conquer approach

21
Warhol Worm
  • Simulation shows that a worm employing the two previous
    techniques can attack 300,000 hosts in less than
    15 minutes
  • Conventional: 10 scans/sec
  • Fast scanning: 100 scans/sec
  • Warhol: 100 scans/sec, permutation scanning,
    and a 10,000-entry hit list

22
More on Warhol worm
23
Flash worms
  • A flash worm would start with a hit list that
    contains most/all vulnerable hosts
  • Realistic scenario
    • Complete scan takes 2h with an OC-12
    • Internet warfare?
  • Problem: size of the hit list
    • 9 million hosts → 36 MB
    • Compression works: 7.5 MB
    • Can be sent over a 256 kbps DSL link in 3 seconds
  • Extremely fast
    • Full infection in tens of seconds!

24
Surreptitious worms
  • Idea: hide worms in inconspicuous traffic to
    avoid detection
  • Leverage P2P systems?
  • High node degree
  • Lots of traffic to hide in
  • Proprietary protocols
  • Homogeneous software
  • Immense size (30,000,000 Kazaa downloads!)

25
Example Outbreak: SQL Slammer (2003)
  • Single, small UDP packet exploit (376 bytes)
  • First 1 min: classic random scanning
    • Number of infected hosts doubles every 8.5 sec
    • (In comparison: Code Red doubled in 40 min)
  • After 1 min, starts to saturate access bandwidth
    • Interferes with itself, so it slows down
    • By this point, was sending 20M pps
  • Peak of 55 million IP scans/sec at 3 min
  • 90% of Internet scanned in < 10 min
  • Infected 100k or more hosts

26
Stuxnet Worm
  • The first worm for control systems
  • Discovered in June 2010
  • Attacks SCADA systems using Siemens WinCC/PCS 7
    software
  • Not only spies on but also reprograms programmable
    logic controllers (PLCs)
  • Four zero-day attacks used
  • Infections include Iran (62K) and China (6M?)
  • Nation-wide support; cyberwarfare?

27
Prevention
  • Get rid of or permute vulnerabilities
    • (e.g., address space randomization)
    • Makes it harder to compromise
  • Block traffic (firewalls)
    • Only takes one vulnerable computer wandering
      between in & out, or multi-homed, etc.
  • Keep vulnerable hosts off network
    • Incomplete vulnerability databases and 0-day worms
  • Slow down scan rate
    • Allow hosts a limited number of new contacts/sec
    • Can slow worms down, but they do still spread
  • Quarantine
    • Detect worm, block it

28
Outline
  • Worms
  • Worm Defense
  • Botnet/Viruses

29
Context
  • Worm Detection
  • Scan detection
  • Honeypots
  • Host-based behavioral detection
  • Payload-based

30
Worm Countermeasures
  • Signature-based worm scan filtering
  • Vulnerable to polymorphic worms
  • Scan detection
  • High scanning activity to identify victims
  • Scanning with high failure rate compared to
    legitimate users (DNS)
  • TCP RST, ICMP Unreachable
  • Two dimensions: time, space
  • Rate limiting, rate halting
  • False positives (index crawler, NAT, etc.)
  • Disruption to legitimate services
  • Not applicable to UDP-based propagation
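
The failure-rate idea can be sketched as a per-source counter. This is a minimal illustration, not a production detector: the class name and both thresholds are hypothetical, and the caller is assumed to classify each attempt as failed when it sees a TCP RST or ICMP Unreachable.

```python
from collections import defaultdict

class FailureRateDetector:
    """Flag sources whose connection attempts mostly fail."""

    def __init__(self, min_attempts: int = 20, max_failure_ratio: float = 0.5):
        self.min_attempts = min_attempts            # hypothetical threshold
        self.max_failure_ratio = max_failure_ratio  # hypothetical threshold
        self.attempts = defaultdict(int)
        self.failures = defaultdict(int)

    def observe(self, src: str, succeeded: bool) -> None:
        """Record one attempt; RST / ICMP Unreachable count as failures."""
        self.attempts[src] += 1
        if not succeeded:
            self.failures[src] += 1

    def is_suspect(self, src: str) -> bool:
        n = self.attempts[src]
        if n < self.min_attempts:
            return False  # too few observations to judge
        return self.failures[src] / n > self.max_failure_ratio

det = FailureRateDetector()
for i in range(30):
    det.observe("scanner", succeeded=(i % 10 == 0))   # 3 of 30 succeed
    det.observe("browser", succeeded=(i % 10 != 0))   # 27 of 30 succeed
print(det.is_suspect("scanner"), det.is_suspect("browser"))
```

The minimum-attempts guard is one way to limit the false positives noted above; hosts behind a NAT or crawlers with many dead links would still need separate handling.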

31
Worm behavior
  • Content invariance
    • Limited polymorphism, e.g. encryption
    • Key portions are invariant, e.g. the decryption
      routine
  • Content prevalence
    • Invariant portions appear frequently
  • Address dispersion
    • Number of distinct infected hosts grows over time,
      reflecting different source and dest. addresses

32
Signature Inference
  • Content prevalence: Autograph, EarlyBird, etc.
    • Assumes some content invariance
    • Pretty reasonable for starters.
  • Goal: identify attack substrings
    • Maximize detection rate
    • Minimize false positive rate

33
Content Sifting
  • For each string w, maintain
    • prevalence(w): number of times it is found in the
      network traffic
    • sources(w): number of unique sources
      corresponding to it
    • destinations(w): number of unique destinations
      corresponding to it
  • If thresholds exceeded, then block(w)
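
The bookkeeping above can be written naively with exact counters and sets. This is a sketch of the idea only: the class name and threshold defaults are placeholders, and exact per-string state like this is precisely what becomes too expensive at line rate.

```python
from collections import defaultdict

class ContentSifter:
    """Exact (unscalable) version of the per-string counters."""

    def __init__(self, prevalence_min: int = 3, sources_min: int = 30,
                 destinations_min: int = 30):
        self.prevalence_min = prevalence_min        # placeholder thresholds
        self.sources_min = sources_min
        self.destinations_min = destinations_min
        self.prevalence = defaultdict(int)
        self.sources = defaultdict(set)
        self.destinations = defaultdict(set)

    def observe(self, w: bytes, src: str, dst: str) -> bool:
        """Update counters for string w; True means all thresholds are exceeded."""
        self.prevalence[w] += 1
        self.sources[w].add(src)
        self.destinations[w].add(dst)
        return (self.prevalence[w] > self.prevalence_min
                and len(self.sources[w]) > self.sources_min
                and len(self.destinations[w]) > self.destinations_min)

sifter = ContentSifter(prevalence_min=3, sources_min=2, destinations_min=2)
# The same string seen between four different host pairs trips the thresholds.
flags = [sifter.observe(b"invariant-bytes", f"10.0.0.{i}", f"10.0.1.{i}")
         for i in range(4)]
# A string repeated between one host pair is prevalent but not dispersed.
chatty = [sifter.observe(b"keepalive", "10.0.0.9", "10.0.1.9") for _ in range(10)]
print(flags, any(chatty))
```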

34
Issues
  • How to compute prevalence(w), sources(w) and
    destinations(w) efficiently?
  • Scalable
  • Low memory and CPU requirements
  • Real time deployment over a Gigabit link

35
Estimating Content Prevalence
  • Table[payload]
    • 1 GB table filled in 10 seconds
  • Table[hash[payload]]
    • 1 GB table filled in 4 minutes
  • Tracking millions of ants to track a few
    elephants
  • Collisions... false positives

36
Multistage Filters
stream memory
Array of counters
Hash(Pink)
37
Multistage Filters
packet memory
Array of counters
Hash(Green)
38
Multistage Filters
packet memory
Array of counters
Hash(Green)
39
Multistage Filters
packet memory
40
Multistage Filters
packet memory
Collisions are OK
41
Multistage Filters
Reached threshold
packet memory
packet1 1
Insert
42
Multistage Filters
packet memory
packet1 1
43
Multistage Filters
packet memory
packet1 1
packet2 1
44
Multistage Filters
packet memory
Stage 1
packet1 1
No false negatives! (guaranteed detection)
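
The animation on the preceding slides can be summarized in code. This is a minimal sketch under stated assumptions: the class name, stage count, table size and threshold are illustrative, and salted BLAKE2b from Python's `hashlib` stands in for whatever per-stage hash functions a real implementation would use.

```python
import hashlib

class MultistageFilter:
    """Several counter arrays, each indexed by an independent hash of the item."""

    def __init__(self, stages: int = 4, counters_per_stage: int = 1024,
                 threshold: int = 3):
        self.threshold = threshold
        self.size = counters_per_stage
        self.stages = [[0] * counters_per_stage for _ in range(stages)]
        self.memory = {}  # "packet memory": counts for items that passed every stage

    def _index(self, stage: int, item: bytes) -> int:
        # One hash per stage, derived by salting BLAKE2b with the stage number.
        digest = hashlib.blake2b(item, digest_size=8,
                                 salt=stage.to_bytes(2, "big")).digest()
        return int.from_bytes(digest, "big") % self.size

    def add(self, item: bytes) -> None:
        if item in self.memory:          # already tracked exactly
            self.memory[item] += 1
            return
        slots = [(s, self._index(s, item)) for s in range(len(self.stages))]
        for s, i in slots:
            self.stages[s][i] += 1
        # Insert only if every stage's counter reached the threshold.
        if all(self.stages[s][i] >= self.threshold for s, i in slots):
            self.memory[item] = 1

f = MultistageFilter(threshold=3)
for _ in range(5):
    f.add(b"frequent substring")
print(f.memory)
```

An item seen at least `threshold` times has raised each of its own counters that many times, so it always reaches the packet memory. Collisions can only let an infrequent item through (a false positive), and it would need to collide in every stage at once.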
45
Conservative Updates
Gray: all prior packets
46
Conservative Updates
47
Conservative Updates
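
The figures for these slides are not preserved in the transcript. Assuming the usual conservative-update rule for multistage filters (raise only the counters that currently hold the minimum, since the item's true count cannot exceed the smallest of its counters), a sketch for unit-sized increments looks like this; the function name is illustrative.

```python
def conservative_update(counters: list[int]) -> list[int]:
    """Given the counters one item hashes to (one per stage), increment only
    those equal to the current minimum and leave the larger ones unchanged."""
    low = min(counters)
    return [c + 1 if c == low else c for c in counters]

updated = conservative_update([2, 5, 2, 7])
print(updated)  # [3, 5, 3, 7]
```

A plain update would have produced [3, 6, 3, 8]. Leaving the larger counters alone keeps them from drifting upward, which reduces false positives without introducing false negatives.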
48
Value Sampling
  • The problem: s - β + 1 substrings (of length β) in
    an s-byte payload
  • Solution: sample
  • But: random sampling is not good enough
  • Trick: sample only those substrings for which the
    fingerprint matches a certain pattern
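
A sketch of the trick, with assumptions labeled: CRC-32 from Python's `zlib` stands in for the fingerprint function, and the substring length of 40 and the 6-bit mask (an expected 1-in-64 sample) are illustrative defaults. The function name is hypothetical.

```python
import zlib

def sampled_substrings(payload: bytes, length: int = 40, mask: int = 0x3F):
    """Yield (offset, substring) for substrings whose fingerprint has its
    low bits all zero."""
    for off in range(len(payload) - length + 1):
        w = payload[off:off + length]
        if (zlib.crc32(w) & mask) == 0:   # CRC-32 stands in for the fingerprint
            yield off, w

body = bytes(range(256)) * 4
in_packet_a = {w for _, w in sampled_substrings(body)}
in_packet_b = {w for _, w in sampled_substrings(b"\x00" * 7 + body)}
# Content sampled in one packet is sampled again when it reappears shifted.
print(in_packet_a <= in_packet_b)
```

Because the decision depends only on the substring's content, a given substring is either always sampled or never sampled, wherever it appears. Uniform random sampling would pick different substrings in each packet, and repeated content would rarely be counted twice.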

49
sources(w) & destinations(w)
  • Address dispersion
    • Counting distinct elements vs. repeating elements
    • Simple list or hash table is too expensive
  • Key idea: bitmaps
  • Trick: scaled bitmaps

50
Bitmap counting: direct bitmap
Set bits in the bitmap using hash of the flow ID
of incoming packets
HASH(green) = 10001001
51
Bitmap counting: direct bitmap
Different flows have different hash values
HASH(blue) = 00100100
52
Bitmap counting: direct bitmap
Packets from the same flow always hash to the
same bit
HASH(green) = 10001001
53
Bitmap counting: direct bitmap
Collisions OK, estimates compensate for them
HASH(violet) = 10010101
54
Bitmap counting: direct bitmap
HASH(orange) = 11110011
55
Bitmap counting: direct bitmap
HASH(pink) = 11100000
56
Bitmap counting: direct bitmap
As the bitmap fills up, estimates get inaccurate
HASH(yellow) = 01100011
57
Bitmap counting: direct bitmap
Solution: use more bits
HASH(green) = 10001001
58
Bitmap counting: direct bitmap
Solution: use more bits
Problem: memory scales with the number of flows
HASH(blue) = 00100100
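
A direct bitmap can be sketched in a few lines. The estimator used here, b·ln(b/z) for b bits of which z are still zero, is the standard correction for collisions; the class name, bitmap size and the use of BLAKE2b as the hash are assumptions for illustration.

```python
import hashlib
import math

class DirectBitmap:
    """Count distinct flows by setting one hashed bit per flow ID."""

    def __init__(self, bits: int = 1024):
        self.bits = bits
        self.bitmap = [False] * bits

    def add(self, flow_id: bytes) -> None:
        h = int.from_bytes(hashlib.blake2b(flow_id, digest_size=8).digest(), "big")
        self.bitmap[h % self.bits] = True

    def estimate(self) -> float:
        zeros = self.bitmap.count(False)
        if zeros == 0:
            return float("inf")  # bitmap saturated, no finite estimate
        return self.bits * math.log(self.bits / zeros)

bm = DirectBitmap(bits=1024)
for i in range(200):
    for _ in range(3):                 # repeated packets of a flow hit the same bit
        bm.add(f"flow-{i}".encode())
estimate = bm.estimate()
print(round(estimate))
```

Simply counting the set bits would undercount once collisions appear; the logarithmic correction compensates until the bitmap is nearly full, which is where the accuracy problem described on these slides begins.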
59
Bitmap counting: virtual bitmap
Solution: a) store only a portion of the bitmap
b) multiply estimate by scaling factor
60
Bitmap counting: virtual bitmap
HASH(pink) = 11100000
61
Bitmap counting: virtual bitmap
Problem: estimate inaccurate when few flows active
HASH(yellow) = 01100011
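
A sketch of the virtual bitmap under stated assumptions: flows hash into a large virtual space, only the first `bits` positions are physically stored, and the direct-bitmap estimate is multiplied by `hash_space / bits`. The class name, sizes and the BLAKE2b hash are illustrative.

```python
import hashlib
import math

class VirtualBitmap:
    """Store only a slice of a larger virtual bitmap and scale the estimate."""

    def __init__(self, bits: int = 512, hash_space: int = 4096):
        self.bits = bits
        self.hash_space = hash_space
        self.bitmap = [False] * bits

    def add(self, flow_id: bytes) -> None:
        h = int.from_bytes(hashlib.blake2b(flow_id, digest_size=8).digest(), "big")
        position = h % self.hash_space
        if position < self.bits:       # flows outside the stored slice are ignored
            self.bitmap[position] = True

    def estimate(self) -> float:
        zeros = self.bitmap.count(False)
        if zeros == 0:
            return float("inf")
        in_slice = self.bits * math.log(self.bits / zeros)
        return in_slice * (self.hash_space / self.bits)   # scaling factor

vb = VirtualBitmap(bits=512, hash_space=4096)
for i in range(2000):
    vb.add(f"flow-{i}".encode())
estimate = vb.estimate()
print(round(estimate))
```

With these sizes only about one flow in eight touches the stored slice, so 512 bits can cover thousands of flows. The cost is the weakness noted on the slide: with few active flows, hardly any land in the slice and the scaled estimate becomes noisy.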
62
Bitmap counting: multiple bitmaps
Solution: use many bitmaps, each accurate
for a different range
63
Bitmap counting: multiple bitmaps
HASH(pink) = 11100000
64
Bitmap counting: multiple bitmaps
HASH(yellow) = 01100011
65
Bitmap counting: multiple bitmaps
Use this bitmap to estimate number of flows
66
Bitmap counting: multiple bitmaps
Use this bitmap to estimate number of flows
67
Bitmap counting: multiresolution bitmap
Problem: must update up to three bitmaps
per packet
Solution: combine bitmaps into one
68
Bitmap counting: multiresolution bitmap
HASH(pink) = 11100000
69
Bitmap counting: multiresolution bitmap
HASH(yellow) = 01100011
70
Multiresolution Bitmaps
  • Still too expensive to scale
  • Scaled bitmap
    • Recycles the hash space when too many bits are set
    • Adjusts the scaling factor accordingly

71
Scaled Bitmap
  • Idea: subsample the range of hash space
  • How does it work?
    • Multiple bitmaps, each mapped to progressively
      smaller and smaller portions of the hash space
    • Bitmap recycled if necessary

Result: roughly 5 times less memory, plus an actual
estimate of address dispersion
72
Putting It Together
Address Dispersion Table: key | src cnt | dest cnt
Content Prevalence Table: key | cnt
73
Putting It Together
  • Sample frequency: 1/64
  • String length: 40
  • Use 4 hash functions to update prevalence table
  • Multistage filter reset every 60 seconds

74
Parameter Tuning
  • Prevalence threshold: 3
    • Very few signatures repeat
  • Address dispersion threshold
    • 30 sources and 30 destinations
    • Reset every few hours
    • Reduces the number of reported signatures down to
      25,000

75
Parameter Tuning
  • Tradeoff between speed and accuracy
  • Can detect Slammer in 1 second as opposed to 5
    seconds
    • With 100x more reported signatures

76
False Negatives in EB
  • False negatives
    • Very hard to prove...
    • EarlyBird detected all worm outbreaks reported on
      security lists over 8 months
    • EB detected all worms detected by Snort
      (signature-based IDS)
    • And some that weren't

77
False Positives in EB
  • Common protocol headers
    • HTTP, SMTP headers
    • p2p protocol headers
  • Non-worm epidemic activity
    • Spam
    • BitTorrent (!)
  • Solution
    • Small whitelist...

78
Outline
  • Worms
  • Worm Defense
  • Botnet/Viruses

79
... and it's profitable
  • Botnets used for
    • Spam (and more spam)
    • Credit card theft
    • DDoS extortion
  • Flourishing exchange market
    • Spam proxying: 3-10 cents/host/week
    • 25k-host botnets: $40k - $130k/year
    • Also for stolen accounts, compromised machines,
      credit cards, identities, etc. (be worried)

80
Botnet
  • A group of zombie computers under the remote
    control of an attacker via a command and control
    (C&C) server

81
Botnet Countermeasure
  • Detecting new botnets by using honeypots,
    analyzing spam pools, capturing group activities
    in DNS
  • Sinkholing or null-routing C&C server connections
    and cleaning zombies

82
Outline
  • Worms
  • Worm Defense
  • Botnet/Viruses

83
Malicious Code
  • Many types of malicious code
    • Virus, worm, botnet, spyware, spam, etc.
  • Who writes this and why?
    • Challenge (for fun)
    • Fame (for pride)
    • Business (for money)
      • Black markets for attacks (DDoS and spam) and
        info (credit cards, vulnerabilities)
    • Ideology (for activism)
      • Hacktivism, cyberterrorism, cyberwarfare

84
What is a Computer Virus?
  • Program that spreads itself by infecting
    (modifying) an executable file and making copies
    of itself

85
Components
  • Propagation mechanism
    • Sharing infected file with other computers
    • USB drive, email attachment, and shared folders
    • Executing infected file
    • → Infects other computers and spreads the infection
  • Trigger
    • Time/condition when payload is activated
  • Payload
    • Damage existing files
    • Extort sensitive information
    • Consume the computer's resources

86
Infected File
Before
1 Insert document in fax machine. (Program entry-point)
2 Dial the phone number.
3 Hit the SEND button on the fax.
4 Wait for completion. If a problem occurs, go back to step 1.
5 End task.
After
1 Skip to step 6.
2 Dial the phone number.
3 Hit the SEND button on the fax.
4 Wait for completion. If a problem occurs, go back to step 1.
5 End task.
6 VIRUS instructions
7 Insert document in fax machine and go to step 2.
Nachenberg, "Computer Virus-Antivirus Coevolution", CACM 1997
87
Propagation
  • Virus replicates when infected file is executed
  • Task is not entirely automated
  • User makes the first step
  • Virus copies malicious code to other files
  • Jump instruction to malicious code is added
  • Why are Windows-based viruses most prolific?
  • Largest population
  • Why write a virus if only a few people are
    infected?

88
Simple Virus
  program V :=
    goto main
    1234567
    subroutine infect-executable :=
      loop:
        file := get-random-executable-file
        if (first-line-of-file = 1234567) then goto loop
        else prepend V to file
    subroutine trigger-pulled
    subroutine do-damage
    main:
      infect-executable
      if trigger-pulled then do-damage
      goto next
    next:

Stallings, Chapter 7.2
89
Detection
  • Infected file has a larger size than the initial
    version of the file
  • Scanners record file lengths and search for
    changes
  • Virus can easily bypass detection through
    compression (packing)

[Figure: virus (V) and program (P) segments]
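
The length check can be sketched with the standard library. The function names are illustrative, and the demo works on a throwaway file: a baseline of sizes is recorded, and any file whose size later differs is reported.

```python
import os
import tempfile

def snapshot_sizes(paths):
    """Record the current size in bytes of each file."""
    return {p: os.path.getsize(p) for p in paths}

def changed_files(baseline, paths):
    """Files whose size differs from the recorded baseline (or that are new)."""
    return sorted(p for p in paths if os.path.getsize(p) != baseline.get(p))

# Demo on a temporary file: appending bytes changes the recorded length.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "program.bin")
    with open(target, "wb") as f:
        f.write(b"P" * 100)
    baseline = snapshot_sizes([target])
    unchanged = changed_files(baseline, [target])
    with open(target, "ab") as f:
        f.write(b"V" * 20)
    flagged = changed_files(baseline, [target]) == [target]
print(unchanged, flagged)
```

As the slide notes, this check fails when the infected file is compressed back to its original length, so it is at best a first line of detection.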
90
Detection (contd)
  • Virus signature
    • Structure and bit pattern that stay the same across
      copies, uniquely identifying a virus

New malicious code signatures, Symantec 2010
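
Signature matching in its simplest form is a byte-pattern search. In this sketch the signature name and bytes are invented for the demo, and the scan is a plain substring test rather than the optimized multi-pattern matching a real scanner would use.

```python
def scan(data: bytes, signatures: dict[str, bytes]) -> list[str]:
    """Names of all signatures whose byte pattern occurs in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)

# Hypothetical signature database; the pattern is made up for illustration.
SIGNATURES = {"demo-virus": bytes.fromhex("deadbeef01234567")}

clean = b"MZ" + bytes(64)
infected = SIGNATURES["demo-virus"] + clean   # pattern prepended to the file
print(scan(clean, SIGNATURES), scan(infected, SIGNATURES))
```

This only works while the pattern is identical in every copy, which is the assumption that encrypted and polymorphic viruses set out to break.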
91
More Advanced Viruses
  • Encrypted viruses
    • Use encryption to prevent signature-based
      detection
  • Polymorphic viruses
    • Change the virus code to defeat signature matching

92
Detection of Encrypted Virus
  • A different encryption key is generated for each
    new infection
  • Therefore, the encrypted virus body appears different
    in each infected file
  • Antivirus can no longer parse the virus body for
    virus signatures
  • Pattern matching is still possible
    • The decryption routine is identical in every copy

93
Detection of Polymorphic Viruses
  • Advanced encrypted virus
  • Formerly, constant decryption routine
  • Now, mutable decryption routine
    • Unique crypto code generated for each copy
  • No more signatures in code

94
Generic Decryption (GD) Technology
  • Signatures still present in decrypted code
  • Let the virus do the work for you
  • Emulate code in controlled environment
  • Periodically, scan virtual memory for virus
    signatures

95
GD Limitations
  • How long to emulate?
  • Emulator must include software versions and
    processor hardware
  • 80286 and 80486 CPUs may differ at the
    machine-language level