Title: DIRA: Automatic Detection, Identification, and Repair of Control-Hijacking Attacks
1. DIRA: Automatic Detection, Identification, and Repair of Control-Hijacking Attacks
- Alexey Smirnov and Tzi-cker Chiueh
- SUNY at Stony Brook
- {alexey, chiueh}@cs.sunysb.edu
- DEFCON 13
2. Outline of the Talk
- Introduction
- Related Work
- DIRA Architecture
- Attack Detection
- Attack Identification
- Attack Repair
- Performance Evaluation
- Conclusion
3. Introduction
- Buffer overflow attacks are the most common type of attack.
- A comprehensive protection strategy should consist of the following components:
  - Attack detection, to prevent the attack from causing damage
  - Attack identification, to feed the IDS with the attack signature
  - Attack repair, to allow the compromised application to continue its normal execution
- We propose a compile-time solution that provides all three components.
4. What is a Buffer Overflow Attack?
- Control-hijacking attacks work by overwriting a control pointer such as a return address or a function pointer.
- Buffer overflows are possible when the target buffer is shorter than the data that can be written into it.
- Unchecked calls to standard libc functions such as strcpy() or sprintf() are responsible for most buffer overflows.
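The overflow condition above (data longer than the target buffer) can be made concrete with a small sketch. The helper name checked_copy is hypothetical and is not part of DIRA or libc; it only shows the length comparison that strcpy() never performs.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst only if it fits, including the terminating NUL.
 * Returns 0 on success, -1 if the copy would overflow dst.
 * Hypothetical helper, shown only to illustrate the missing check. */
int checked_copy(char *dst, size_t dst_size, const char *src)
{
    size_t need = strlen(src) + 1;   /* bytes strcpy() would write */
    if (need > dst_size)
        return -1;                   /* strcpy() here would overflow dst */
    memcpy(dst, src, need);
    return 0;
}
```

With a 16-byte buffer, a short file name is copied and a long one is rejected instead of running past the end of the buffer.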
6. Attack Detection
- StackGuard places a canary word before the return address (RA) in the function prolog and checks it in the function epilog. The assumption is that an attacker must overwrite the canary word in order to overwrite the RA.
- RAD saves the original RA in a safe place in the function prolog and compares it with the value stored on the stack in the function epilog.
7. Approaches to Attack Identification
- Automatic ways to identify attacks (that is, to generate their signatures) are very important for confining worm epidemics.
- Previous systems either provided a single attacking packet or required a large pool of malicious network data.
- Toth and Kruegel look at network packet payloads and perform abstract code execution.
- TaintCheck uses the value of the compromised control pointer as the attack signature.
- Autograph extracts the most common subsequences from suspicious flows and reports them as signatures.
- Polygraph and Nemean use machine learning algorithms to derive common patterns from a large set of malicious flows.
8. Approaches to Attack Repair
- Program rollback and replay is used in software debugging. There are two approaches: (1) keep an execution history (Spyder), or (2) checkpoint the state periodically. Checkpointing is easy under Linux because of the copy-on-write fork() system call (RECAP and Flashback), but can be more difficult under other operating systems.
- Checkpointing relies on the OS rather than on the application.
- Shadow Honeypot runs two versions of the application (protected and non-protected) and dynamically switches between the two once an attack has been detected.
10. DIRA Approach
- DIRA is an extension to GCC 3.4.1. It uses memory updates logging to solve the three problems at the same time.
- The idea is to maintain a run-time log of all changes to the memory state of the program.
- Assignments such as a=b and libc function calls such as memcpy() change the memory state of the program.
- For each memory update DIRA stores its source address, destination address, length, and the pre-image (the destination's contents before the update).
11. DIRA Approach
- How to detect, identify, and repair an attack using the memory updates log?
  - To detect: compare the current RA with the one saved in the log.
  - To identify: trace the data that replaced the control pointer back to the point where it was read from the network.
  - To repair: restore the memory state using the pre-images stored in the log.
- At compile time, DIRA instruments the source code to perform logging and to check the correctness of control pointers.
- At run time, the logging code generates the memory updates log.
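The repair step (restoring memory from pre-images) can be sketched as an undo log that is replayed newest-first. This is a minimal illustration under stated assumptions, not DIRA's implementation: the names logged_write, rollback_to, and log_mark, the fixed 16-byte pre-image, and the 64-entry capacity are all invented for the sketch.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PREIMAGE 16   /* assumption: small fixed-size pre-images keep the sketch short */
#define MAX_UNDO     64   /* assumption: fixed log capacity */

/* One undo record: where the write went and what was there before. */
struct undo_rec {
    void  *write_addr;
    size_t len;
    unsigned char pre_image[MAX_PREIMAGE];
};

static struct undo_rec undo_log[MAX_UNDO];
static size_t undo_count;

/* Save the pre-image of dst, then perform the update. Returns 0 on success. */
int logged_write(void *dst, const void *src, size_t len)
{
    if (len > MAX_PREIMAGE || undo_count == MAX_UNDO)
        return -1;
    struct undo_rec *r = &undo_log[undo_count++];
    r->write_addr = dst;
    r->len = len;
    memcpy(r->pre_image, dst, len);   /* pre-image first */
    memcpy(dst, src, len);            /* then the real update */
    return 0;
}

/* Current log position, usable later as a rollback mark. */
size_t log_mark(void) { return undo_count; }

/* Undo every update made after log position mark, newest first. */
void rollback_to(size_t mark)
{
    while (undo_count > mark) {
        struct undo_rec *r = &undo_log[--undo_count];
        memcpy(r->write_addr, r->pre_image, r->len);
    }
}
```

Taking a mark before the attack data is read and calling rollback_to(mark) after detection returns every logged location to its pre-attack value.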
12. Memory Updates Logging
- The memory updates log is a circular buffer; each entry has four fields: read_addr, write_addr, len, data.
- DIRA logs the effect of each operation of the form X=Y, where X and Y are directly referenced variables, array references (a[i]), or de-referenced variables (*(a+1)).
  - read_addr is set to &Y,
  - write_addr is set to &X,
  - len is set to sizeof(Y),
  - data is set to the pre-image of X in DIR mode and is empty in other modes.
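The four-field record and the circular buffer might look like the following in C. This is a sketch only: the type and function names, the 4-entry capacity, and the 16-byte cap on the stored pre-image are assumptions made to keep the example small.

```c
#include <stddef.h>
#include <string.h>

#define LOG_ENTRIES 4      /* tiny capacity, chosen only so wrap-around is easy to see */
#define DATA_MAX    16     /* assumption: pre-images are truncated to 16 bytes here */

struct log_entry {
    const void *read_addr;        /* source of the update (&Y) */
    void       *write_addr;       /* destination of the update (&X) */
    size_t      len;              /* number of bytes updated */
    unsigned char data[DATA_MAX]; /* pre-image of the destination */
};

static struct log_entry mem_log[LOG_ENTRIES];
static size_t log_next;    /* total records written; slot = log_next % LOG_ENTRIES */

/* Append one record, overwriting the oldest one when the buffer is full.
 * Must be called before the update itself so that data holds the pre-image. */
struct log_entry *log_update(void *write_addr, const void *read_addr, size_t len)
{
    struct log_entry *e = &mem_log[log_next % LOG_ENTRIES];
    e->read_addr  = read_addr;
    e->write_addr = write_addr;
    e->len        = len;
    memcpy(e->data, write_addr, len < DATA_MAX ? len : DATA_MAX);
    log_next++;
    return e;
}
```

Because the buffer wraps, only the most recent LOG_ENTRIES updates can be traced or undone; a real log would be sized so that it covers at least one full request.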
13. Memory Updates Logging
- If the right-hand side is a complex expression, a log record is created for each variable in it.
- To handle updates performed by libc functions, DIRA proxies several of them: string manipulation functions, format string functions, and file and network I/O functions.
- The log is also used to store tags, special records that indicate a change in the program's run-time state:
  - a FUNCTION_ENTRY tag is inserted when a function is called;
  - a FUNCTION_EXIT tag is inserted before a function returns.
- Tags are used for signature generation and repair.
14. Memory Updates Logging Example
- At compile time
  - Source code: x=y+z
  - Instrumented code: (log(&x, &y, sizeof(y), x), (log(&x, &z, sizeof(z), x), x=y+z))
- At run time, log() adds two records to the memory updates log:
  - read_addr=&y, write_addr=&x, len=sizeof(y), data=x
  - read_addr=&z, write_addr=&x, len=sizeof(z), data=x
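The comma-expression form of the instrumented assignment can be tried out by hand. The sketch below assumes int operands and a tiny array-backed log; log_int, instrumented_add, and the record layout are hypothetical names, not what the DIRA compiler emits.

```c
#include <stddef.h>

/* Minimal record and log for the sketch; all names are hypothetical. */
struct rec {
    void       *write_addr;
    const void *read_addr;
    size_t      len;
    int         pre_image;
};

static struct rec records[8];
static size_t n_records;

/* Record one dependency of an int update. Returns 0 so that the call can
 * be used as an operand of the comma operator. */
static int log_int(void *write_addr, const void *read_addr, size_t len, int pre_image)
{
    if (n_records < 8) {
        struct rec r = { write_addr, read_addr, len, pre_image };
        records[n_records++] = r;
    }
    return 0;
}

/* Hand-instrumented equivalent of `x = y + z`: one record per right-hand-side
 * variable, then the original assignment as the value of the expression. */
int instrumented_add(int *x, const int *y, const int *z)
{
    return (log_int(x, y, sizeof *y, *x),
            (log_int(x, z, sizeof *z, *x),
             *x = *y + *z));
}
```

The comma operator evaluates its operands left to right, so both records capture the old value of x before the assignment overwrites it.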
15. Memory Updates Logging Example
- At compile time
  - Source code: strcpy(a,b)
  - Instrumented code: dira_strcpy(a,b)
- At run time
  - The proxy function dira_strcpy() adds a log record:
  - read_addr=b, write_addr=a, len=strlen(b)+1, data=a
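A proxy of this kind could be sketched as follows. The function is named proxy_strcpy to make clear that it is an illustration rather than DIRA's dira_strcpy(); the single-slot log and the 64-byte pre-image cap are assumptions.

```c
#include <stddef.h>
#include <string.h>

#define PRE_MAX 64   /* assumption: cap on how much pre-image the sketch keeps */

struct str_rec {
    const char *read_addr;
    char       *write_addr;
    size_t      len;
    char        pre_image[PRE_MAX];
};

static struct str_rec last_rec;   /* a real log would append; one slot is enough here */

/* Hypothetical proxy: record <src, dst, strlen(src)+1, old contents of dst>,
 * then perform the copy. Like strcpy(), it does not know the size of dst, so
 * it does not prevent an overflow; it only records enough to trace and undo it. */
char *proxy_strcpy(char *dst, const char *src)
{
    size_t len = strlen(src) + 1;
    last_rec.read_addr  = src;
    last_rec.write_addr = dst;
    last_rec.len        = len;
    memcpy(last_rec.pre_image, dst, len < PRE_MAX ? len : PRE_MAX);
    return strcpy(dst, src);
}
```

Replacing the call at compile time leaves the program's behavior unchanged; the only difference is the extra log record written before the copy.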
16. Attack Detection (D-mode)
- DIRA uses a RAD-like approach: code that saves the RA in a protected buffer is added to the function prolog, and the actual RA stored on the stack is compared with this value in the function epilog. Using a special buffer to store RAs is an optimization over storing them in the common memory updates log.
- DIRA can protect other control-sensitive data structures, such as the GOT and signal handler tables, in a similar fashion (not implemented yet).
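The prolog/epilog check can be simulated in portable C with a shadow stack of saved return addresses. Reading the real return address requires compiler support, so this sketch passes the address in as a plain integer; ra_save, ra_check, and the 128-entry depth are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SHADOW_DEPTH 128

static uintptr_t shadow[SHADOW_DEPTH];  /* stands in for the protected RA buffer */
static size_t shadow_top;

/* Prolog: remember the return address the function was entered with. */
int ra_save(uintptr_t ra)
{
    if (shadow_top == SHADOW_DEPTH)
        return -1;                 /* shadow buffer full */
    shadow[shadow_top++] = ra;
    return 0;
}

/* Epilog: returns 1 if the RA now on the stack still matches the saved copy,
 * 0 if it was modified (attack detected), -1 on an unbalanced call. */
int ra_check(uintptr_t ra_on_stack)
{
    if (shadow_top == 0)
        return -1;
    return shadow[--shadow_top] == ra_on_stack;
}
```

A mismatch in ra_check() is the point at which identification and repair would be triggered.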
17. Attack Identification
- The desired properties of an attack signature:
  - Context-aware (to reduce false positives)
  - Semantics-aware (to reduce false positives)
  - Provides a degree of flexibility within each packet (to reduce false negatives)
- DIRA's signatures consist of multiple packets; each packet is a regular expression. The length constraint limits the length of the attacking part of the last packet.
- The memory updates log is used to build attack signatures.
18. Attack Identification
- Two types of dependencies: data and control dependencies.
- A data dependency is created when one variable is assigned to another.
- A control dependency is created between variable X and variable Y if the value of Y depends on the value of X used in a conditional expression. Example:
    if (x > 0)
      y = 1;
    else
      y = 2;
- Why do we need control dependencies? Example: an FTP server attack involving authentication.
19. Vulnerable FTP Server Example
- A vulnerable FTP server (pseudo-code):
    char buf[16];
    is_auth = is_user = 0;   // user not authenticated initially
    while (1) {
      recv_packet(p);
      if (!strncmp(p, "QUIT", 4)) break;
      if (!strncmp(p, "USER", 4)) { is_user = 1; continue; }
      if (!strncmp(p, "PASS", 4) && is_user) { is_auth = 1; continue; }
      if (!is_auth) continue;   // authentication required
      if (!strncmp(p, "GET", 3)) {
        strcpy(buf, p+4);   // copy filename without a bounds check
        send_file(buf);
      }
    }
20. FTP Server Attack
- FTP server GET attack (3 packets):
  - USER alexey
  - PASS my_pass
  - GET very_long_file_name_that_will_overwrite_the_return_address
21. FTP Server Attack
- FTP server GET attack (3 packets):
  - USER alexey
  - PASS my_pass
  - GET very_long_file_name_that_will_overwrite_the_return_address
- Log records:
  - <DIRA_RECV, p, 11, "USER alexey">
  - <DIRA_STRNCMP, p, 4, NULL>
  - <DIRA_COND, p, 0, NULL>
  - <NULL, is_user, 4, is_user>
22. FTP Server Attack
- FTP server GET attack (3 packets):
  - USER alexey
  - PASS my_pass
  - GET very_long_file_name_that_will_overwrite_the_return_address
- Log records:
  - <DIRA_RECV, p, 11, "USER alexey">
  - <DIRA_STRNCMP, p, 4, NULL>
  - <DIRA_COND, p, 0, NULL>
  - <NULL, is_user, 4, is_user>
  - <DIRA_RECV, p, 12, "PASS my_pass">
  - <DIRA_STRNCMP, p, 4, NULL>
  - <DIRA_COND, p, 0, NULL>
  - <DIRA_COND, is_user, 0, NULL>
  - <NULL, is_auth, 4, is_auth>
23. FTP Server Attack
- FTP server GET attack (3 packets):
  - USER alexey
  - PASS my_pass
  - GET very_long_file_name_that_will_overwrite_the_return_address
- Log records (third packet):
  - <DIRA_RECV, p, 62, "GET ...">
  - <DIRA_COND, is_auth, 0, NULL>
  - <DIRA_STRNCMP, p, 3, NULL>
  - <DIRA_COND, p, 0, NULL>
  - <p+4, buf, strlen(p)-4+1, *(p+4)>
24. FTP Server Attack
- The return address (RA) is located after buf: RA=buf+17.
- <DIRA_RECV, p, 11, "USER alexey">
- <DIRA_STRNCMP, p, 4, NULL>
- <DIRA_COND, p, 0, NULL>
- <NULL, is_user, 4, is_user>
- <DIRA_RECV, p, 12, "PASS my_pass">
- <DIRA_STRNCMP, p, 4, NULL>
- <DIRA_COND, p, 0, NULL>
- <DIRA_COND, is_user, 0, NULL>
- <NULL, is_auth, 4, is_auth>
- <DIRA_RECV, p, 62, "GET ...">
- <DIRA_COND, is_auth, 0, NULL>
- <DIRA_STRNCMP, p, 3, NULL>
- <DIRA_COND, p, 0, NULL>
- <p+4, buf, strlen(p)-4+1, *(p+4)>
25. Identifying Attack Using Data Dependencies
- The return address (RA) is located after buf: RA=buf+17.
- <DIRA_RECV, p, 11, "USER alexey">
- <DIRA_STRNCMP, p, 4, NULL>
- <DIRA_COND, p, 0, NULL>
- <NULL, is_user, 4, is_user>
- <DIRA_RECV, p, 12, "PASS my_pass">
- <DIRA_STRNCMP, p, 4, NULL>
- <DIRA_COND, p, 0, NULL>
- <DIRA_COND, is_user, 0, NULL>
- <NULL, is_auth, 4, is_auth>
- <DIRA_RECV, p, 62, "GET ...">
- <DIRA_COND, is_auth, 0, NULL>
- <DIRA_STRNCMP, p, 3, NULL>
- <DIRA_COND, p, 0, NULL>
- <p+4, buf, strlen(p)-4+1, *(p+4)>
26. Identifying More Packets Using Control Dependencies
- The return address (RA) is located after buf: RA=buf+17.
- <DIRA_RECV, p, 11, "USER alexey">
- <DIRA_STRNCMP, p, 4, NULL>
- <DIRA_COND, p, 0, NULL>
- <NULL, is_user, 4, is_user>
- <DIRA_RECV, p, 12, "PASS my_pass">
- <DIRA_STRNCMP, p, 4, NULL>
- <DIRA_COND, p, 0, NULL>
- <DIRA_COND, is_user, 0, NULL>
- <NULL, is_auth, 4, is_auth>
- <DIRA_RECV, p, 62, "GET ...">
- <DIRA_COND, is_auth, 0, NULL>
- <DIRA_STRNCMP, p, 3, NULL>
- <DIRA_COND, p, 0, NULL>
- <p+4, buf, strlen(p)-4+1, *(p+4)>
31. Definition of Control Dependencies
- Whenever variable X can prevent control flow from reaching variable Y, a control dependency is created between X and Y.
- stmt1 and stmt2 are always dependent.
- Control dependencies are also created for for and while loops. START_SCOPE and END_SCOPE tags are used to store control dependencies in the memory updates log.
32. Representing Packets as Regular Expressions
- For each byte of the attacking packet, DIRA determines whether the program looked at it or not. For example, applying strcmp() to packet bytes turns them into looked-at bytes; bytes that are blindly copied with strcpy() remain not-looked-at. Initially all bytes are not-looked-at.
- DIRA traverses the log forward from the point where the packets were received and records all packet bytes that were looked at.
- When it outputs the bytes, a looked-at byte is output as is, and a not-looked-at byte is output as ?.
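The looked-at rule can be sketched with one flag per packet byte. The helper names mark_looked_at and build_pattern are hypothetical; in this sketch the caller decides which byte ranges were inspected, whereas DIRA derives them from the log.

```c
#include <stddef.h>
#include <string.h>

/* Mark bytes [off, off+n) of a packet as looked at, e.g. after a
 * strncmp() over those bytes was found in the log. */
void mark_looked_at(unsigned char *looked, size_t pkt_len, size_t off, size_t n)
{
    for (size_t i = off; i < off + n && i < pkt_len; i++)
        looked[i] = 1;
}

/* Write the signature pattern for one packet: looked-at bytes verbatim,
 * every other byte as '?'. out must have room for pkt_len + 1 bytes. */
void build_pattern(const char *pkt, const unsigned char *looked,
                   size_t pkt_len, char *out)
{
    for (size_t i = 0; i < pkt_len; i++)
        out[i] = looked[i] ? pkt[i] : '?';
    out[pkt_len] = '\0';
}
```

For the 11-byte packet "USER alexey" with only the first four bytes compared by strncmp(), the pattern is USER followed by seven ? characters.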
33. Building Regular Expressions
- <DIRA_RECV, p, 11, "USER alexey">
- <DIRA_STRNCMP, p, 4, NULL>
- <DIRA_COND, p, 0, NULL>
- <NULL, is_user, 4, is_user>
- <DIRA_RECV, p, 12, "PASS my_pass">
- <DIRA_STRNCMP, p, 4, NULL>
- <DIRA_COND, p, 0, NULL>
- <DIRA_COND, is_user, 0, NULL>
- <NULL, is_auth, 4, is_auth>
- <DIRA_RECV, p, 62, "GET ...">
- <DIRA_COND, is_auth, 0, NULL>
- <DIRA_STRNCMP, p, 3, NULL>
- <DIRA_COND, p, 0, NULL>
- <p+4, buf, strlen(p)-4+1, *(p+4)>
36. Length Constraint Generation
- The length constraint limits the attacking part of the packet by specifying the terminating character and its maximum offset in any benign packet.
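One way to read such a constraint is as a check that the terminating character appears early enough in the field. The sketch below is an interpretation, not DIRA's matcher: the struct layout (start offset, maximum offset, terminator) and the function name violates_constraint are assumptions.

```c
#include <stddef.h>

/* Hypothetical encoding of a length constraint: the field starts at `start`,
 * and in a benign packet `terminator` occurs at a field offset <= max_offset. */
struct length_constraint {
    size_t start;
    size_t max_offset;
    char   terminator;
};

/* Returns 1 if the packet violates the constraint (the field runs past
 * max_offset without a terminator), 0 if it looks benign. */
int violates_constraint(const char *pkt, size_t pkt_len,
                        const struct length_constraint *c)
{
    for (size_t i = c->start; i < pkt_len && i - c->start <= c->max_offset; i++)
        if (pkt[i] == c->terminator)
            return 0;                           /* terminated early enough */
    return pkt_len > c->start + c->max_offset;  /* field ran past the limit */
}
```

A packet that ends before the limit is treated as benign here even without a terminator; whether that matches the intended semantics is an open assumption.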
37. DIRA's Signature File Format
- N: number of packets
- L_i: length of the i-th packet
- Regular expression of the packet. Possible characters are shown on the right.
- The length constraint is specified for the last attacking packet.
38. Complete Signature for FTP Attack
- 3 (number of packets)
- 11 (1st packet length)
- USER???????
- 12 (2nd packet length)
- PASS????????
- 62 (3rd packet length)
- GET???...???
- 4 17 \0 (length constraint)
39. Attack Recovery (DIR-mode)
- Main goal: bring the program back to the state it was in before the attack packet(s) were received.
  - How to restore the pre-attack state?
  - From which point to continue execution?
- Program restart points can only be at the beginning of a function because only global updates are logged in DIR mode (for performance reasons).
- The proper function is the least common dynamic ancestor of the function in which the attack was detected and the function in which the data was read in.
40. Choosing the Restart Point
- depth is a loop invariant: it is the relative depth of the current function with respect to the greatest dynamic ancestor seen so far.
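The algorithm this slide refers to is not reproduced in the transcript, so the following is a reconstruction under assumptions rather than DIRA's code. It scans the log backward from the detection point, uses FUNCTION_ENTRY and FUNCTION_EXIT tags to maintain depth, and returns the first function entry at depth 0 that precedes the record where the attack data was read. The record layout, func_id field, and function name are hypothetical.

```c
#include <stddef.h>

enum tag { FUNCTION_ENTRY, FUNCTION_EXIT, OTHER };

struct log_rec {
    enum tag tag;
    int func_id;   /* hypothetical id of the function an ENTRY/EXIT tag belongs to */
};

/* recs[recv_idx] is the record where the attack data was read;
 * recs[detect_idx] is the last record before the attack was detected.
 * Returns the func_id of the restart function, or -1 if its entry tag
 * is no longer in the log. */
int find_restart_function(const struct log_rec *recs,
                          size_t recv_idx, size_t detect_idx)
{
    int depth = 0;   /* depth of the scan position below the top-most ancestor so far */

    for (size_t i = detect_idx + 1; i-- > 0; ) {
        if (recs[i].tag == FUNCTION_EXIT) {
            depth++;                     /* stepping back into a call that had returned */
        } else if (recs[i].tag == FUNCTION_ENTRY) {
            if (depth > 0)
                depth--;                 /* stepping back out of that completed call */
            else if (i < recv_idx)
                return recs[i].func_id;  /* frame that encloses both read and detection */
            /* otherwise: entry of the current ancestor, so its caller takes over */
        }
    }
    return -1;
}
```

An entry tag seen at depth 0 belongs to a function that was still active at detection time; the first such entry located before the read is the least common dynamic ancestor of the reading and the detecting function.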
41. Choosing the Restart Point
- When all updates are tracked, it is possible to resume execution from the middle of a function.
- No system support is required for restarting: longjmp() and setjmp() are used. A setjmp() call is inserted before the call to each function that can be a potential restart point (so that the arguments are pushed again).
- DIRA inserts the first local update tag when it encounters such an update after a function call.
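The setjmp()/longjmp() restart can be sketched in standard C. Everything here is illustrative: handle_request stands in for a function that detects an attack in its epilog, attacks_left is a made-up knob that simulates attacks, and the memory rollback that would precede the jump is omitted.

```c
#include <setjmp.h>

static jmp_buf restart_point;
static int attacks_left;          /* hypothetical: how many calls will "detect an attack" */

/* Stand-in for a request handler whose epilog check fails under attack. */
static int handle_request(int request)
{
    if (attacks_left > 0) {
        attacks_left--;
        /* a real system would first roll memory back using the log */
        longjmp(restart_point, 1);   /* abandon this call, resume at setjmp() */
    }
    return request * 2;
}

/* Returns the handler result; *restarts counts how often the call was redone. */
int serve(int request, int *restarts)
{
    volatile int n = 0;              /* volatile: modified between setjmp and longjmp */
    if (setjmp(restart_point) != 0)
        n++;                         /* arrived here via longjmp() */
    int result = handle_request(request);   /* the arguments are pushed again */
    *restarts = n;
    return result;
}
```

Placing setjmp() immediately before the call means that execution resumes with a fresh call frame for the restart function, which is why no operating system support is needed.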
43. DIRA Evaluation
- Programs tested:
  - ghttpd 1.4 (have exploit)
  - drcatd 0.5.0 (have exploit)
  - named 8.1 (have exploit)
  - qpopper 4.0.4
  - proftpd 1.2.9
- Two goals: measure the run-time overhead and the quality of automatically generated signatures.
- Configuration: server machine (P-4M 1.7GHz, 512 MB RAM), two clients (Athlon 1.7GHz, 512 MB RAM).
- Used exploit programs from securiteam.com and insecure.org.
44. Run-time Overhead
- The following two graphs show the run-time overhead for programs compiled in DIR-mode.
45. Signature Generation
- Signatures were produced for all programs that we had exploits for.
  - The ghttpd signature specifies the length constraint using a terminating character.
  - The named signature specifies the maximum value of the length field.
  - The drcatd signature has three packets in it: login, password, and the attacking packet.
46. Is Recovery Really Useful?
- Recovery incurs significant overhead. Is it really better than just terminating the application? Yes, because:
  - Terminating a single-threaded program disconnects all clients.
  - The same tradeoff exists in the case of source-code checking tools: using them requires an investment of developers' time, and we can always use StackGuard instead to protect the programs.
48. Conclusion
- DIRA solves the problems of attack detection, identification, and repair in a unified way.
- It produces accurate multi-packet signatures from a single attack instance.
- Dynamic slicing of the memory updates log is the underlying technique.
- The same technique can be used for automatic patch generation, which is our future work.
49. Questions? http://www.ecsl.cs.sunysb.edu/dira