Title: Enhancing Security of Real-World Systems with a Better Understanding of Threats
1Enhancing Security of Real-World Systems with a
Better Understanding of Threats
- Shuo Chen
- Ph.D. Candidate in Computer Science
- Center for Reliable and High Performance Computing
- University of Illinois at Urbana-Champaign
2My Dissertation
- Security Threat Analysis and Mitigations in Real-World Systems
  - How do errors in hardware and software impose security threats on real-world systems? (Common characteristics?)
  - How effective are current defense techniques? (Substantial deficiencies?)
  - How can we build better defenses?
- Analysis-centric research approach
  - Study hardware memory errors and their impact on system security
  - Software vulnerabilities reported in Bugtraq and CERT databases; source code of vulnerable applications
  - Current attack methods and defense techniques
- Analysis results motivate the development of new defense techniques.
- Many areas related to my dissertation
3I as a System Hacker/Builder
- Summer 2001, Avaya Labs, Basking Ridge, NJ
  - Port Libsafe to Windows NT/2000.
- Summer 2002, Bell Labs, Holmdel, NJ
  - Detection of network denial of service attacks
  - Hack FreeBSD TCP/IP, network card drivers
- Summer 2003, Microsoft Research, Redmond, WA
  - Audit-enhanced authentication in Kerberos
  - NTOS security subsystem, Kerberos, LSA, NTDLL
- Summer 2004, Microsoft Research, Redmond, WA
  - A tracing technique to identify the dependencies of Windows applications on Administrator privileges
  - NTOS security subsystem, access/privilege checking, application interactions with NTOS
4Outline
- Analyses: Analyzing and Identifying Security Threats on Real-World Systems
  - Security compromises due to HW/SW memory corruptions
  - A type of memory corruption attack currently believed to be rare is a realistic threat.
  - Deficiencies of current defense techniques
- Solutions: New Defense Techniques Towards a Better Security Protection
  - A common characteristic of memory corruption attacks: pointer taintedness
  - A theorem proving based program analysis
  - A runtime detection technique
5Analyzing and Identifying Security Threats on
Real-World Systems
6Threat of Hardware Memory Errors
- Due to hardware memory errors, users can log in with arbitrary passwords.
  [Figure: attacker connecting to a network server (FTP and SSH)]
- Due to hardware memory errors, packets can penetrate firewalls.
  [Figure: attacker reaching the target host through a firewall (IPChains and Netfilter)]
- Emulate random hardware memory errors
- A stochastic model to estimate such threats in real environments
- Motivate other researchers to conduct physical fault injections
  - Java type system subverted due to random hardware memory errors.
7Threat of Software Vulnerabilities
- CERT Advisories: 66 vulnerabilities are low-level memory errors in software.
- Widely exploited by attackers, worms and viruses.
8State Machine Model: WU-FTP Server Attack
[Figure: state machine of the attack. Recoverable labels: repeat FTP_service(); get an FTP command; authentication, x = user ID; seteuid(x); SITE_EXEC(fn); printf(fn, ...); seteuid(0); exec(/bin/sh)]
9State Machine Model: NULL-HTTP Server Attack
[Figure: state machine of the attack. Recoverable labels: repeat HTTP_service(); process HTTP header; HTTP_POST(); p = malloc(); recv(p, ...); free(p); foo(); seteuid(0); exec(/bin/sh)]
10Control Data Attack: Well-Known, Dominant
- Control data
  - Data used as targets of call, return and jump.
  - Widely understood as security-critical elements
- Control data attack: the most dominant form of memory corruption attack (CERT and Microsoft Security Bulletin)
- Many current defense techniques enforce program control flow integrity to provide security.
11Non-control-data attacks
- Currently very rare in reality.
  - One instance suggested by Young and McHugh in 1987.
- How applicable are such attacks against real-world software?
  - Not studied yet, but important.
12An Important Question
- Are attackers in general incapable of mounting non-control-data attacks against many real systems?
  - PROBABLY NOT!
- Random hardware memory errors can subvert the security of real-world systems with a non-negligible probability.
- Software vulnerabilities are more deterministic and more amenable to attacks.
- Each attack exploiting software vulnerabilities is composed of multiple primitive components. This allows potentially polymorphic attacks. Dangerous.
13Our Claim: General Applicability of Non-control-data Attacks
- We claim
  - Many real-world software applications are susceptible to non-control-data attacks.
  - The severity of the attack consequences is equivalent to that due to control data attacks.
- Validate the claim by constructing non-control-data attacks to get the root privilege on major network servers
  - FTP, HTTP, SSH and Telnet servers
  - Over 1/3 of vulnerabilities in CERT advisories
- Non-control-data attacks are realistic threats.
14Non-control-data attack against WU-FTP Server
(via a format string bug)
int x;
FTP_service(...) {
    authenticate();
    x = user ID of the authenticated user;
    seteuid(x);
    while (1) {
        get_FTP_command(...);    // vulnerable
        if (a data command?)
            getdatasock(...);
    }
}
getdatasock(...) {
    seteuid(0);
    setsockopt(...);
    seteuid(x);
}
The format string bug lets the attacker overwrite x with 0. When getdatasock() returns to the service loop, the server still runs as EUID 0 (root). This allows me to upload /etc/passwd, so I can grant myself the root privilege! Only an integer is corrupted; this is not a control data attack.
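The privilege logic above can be replayed in a small simulation. This is only a sketch: sim_seteuid(), sim_euid and ftp_session() are hypothetical stand-ins (no real seteuid() call is made), and the corrupt_x flag merely models the effect of the format string write on x.

```c
/* Simulated process state; sim_seteuid() stands in for seteuid(2). */
static unsigned int sim_euid = 0;        /* the server starts as root */
static void sim_seteuid(unsigned int id) { sim_euid = id; }

static unsigned int x;                   /* cached user ID: non-control data */

/* Temporarily escalate, then drop back to the cached user ID. */
static void getdatasock(void) {
    sim_seteuid(0);
    /* setsockopt(...) would run here with root privilege */
    sim_seteuid(x);
}

/* Returns the effective UID the service loop is left with. */
unsigned int ftp_session(unsigned int user_id, int corrupt_x) {
    x = user_id;
    sim_seteuid(x);                      /* drop privilege after login */
    if (corrupt_x)
        x = 0;                           /* models the attacker's write to x */
    getdatasock();                       /* a data command arrives */
    return sim_euid;
}
```

Calling ftp_session(1000, 0) leaves the session at UID 1000, while ftp_session(1000, 1) leaves it at UID 0, although no return address or function pointer was touched.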
15Non-control-hijacking attack against NULL-HTTP
Server (via a heap overflow bug)
- Attack the configuration string of the CGI-BIN path.
- Mechanism of CGI
  - Suppose server name = www.foo.com, CGI-BIN = /usr/local/httpd/exe
  - Requested URL = http://www.foo.com/cgi-bin/bar
  - The server executes /usr/local/httpd/exe/bar
- Our attack
  - Exploit the vulnerability to overwrite CGI-BIN to /bin
  - Request URL = http://www.foo.com/cgi-bin/sh
  - The server executes /bin/sh
The server gives me a root shell! Only four characters in the CGI-BIN string are overwritten. Not a control data attack.
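A minimal sketch of the CGI mechanism, to show why the configuration string is security critical. The buffer cgi_bin and the functions resolve_cgi() and corrupt_cgi_bin() are hypothetical names for illustration; the real server's data layout is not shown in the slides.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical configuration buffer holding the CGI-BIN directory. */
static char cgi_bin[64] = "/usr/local/httpd/exe";

/* Map a URL path of the form /cgi-bin/<name> to the executable path.
   Returns 0 on success, -1 on a bad prefix or truncation. */
int resolve_cgi(const char *url_path, char *out, size_t outsz) {
    static const char prefix[] = "/cgi-bin/";
    size_t plen = sizeof prefix - 1;
    if (strncmp(url_path, prefix, plen) != 0)
        return -1;
    int n = snprintf(out, outsz, "%s/%s", cgi_bin, url_path + plen);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}

/* Models the effect of the heap overflow on the configuration string. */
void corrupt_cgi_bin(void) {
    memcpy(cgi_bin, "/bin", 5);   /* four characters plus the terminator */
}
```

Before the overwrite, /cgi-bin/bar resolves to /usr/local/httpd/exe/bar; after it, /cgi-bin/sh resolves to /bin/sh. The request itself looks legitimate in both cases.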
16Non-control-data attack against SSH
Communications SSH Server (via an integer
overflow bug)
void do_authentication(char *user, ...) {
    int auth = 0;
    ...
    while (!auth) {
        /* Get a packet from the client */
        type = packet_read();
        switch (type) {
            ...
            case SSH_CMSG_AUTH_PASSWORD:
                if (auth_password(user, password))
                    auth = 1;
            case ...
        }
        if (auth) break;
    }
    /* Perform session preparation. */
    do_authenticated();
}
17More non-control-hijacking attacks
- Against NetKit Telnet server (default Telnet server of Redhat Linux)
  - Exploit a heap overflow bug
  - Overwrite two strings:
    /bin/login -h foo.com -p (normal scenario)
    /bin/sh -h -p -p (attack scenario)
  - The server runs /bin/sh when it tries to authenticate the user.
- Against GazTek HTTP server
  - Exploit a stack buffer overflow bug
  - Send a legitimate URL: http://www.foo.com/cgi-bin/bar
  - The server checks that /.. is not embedded in the URL
  - Exploit the bug to change the URL to http://www.foo.com/cgi-bin/../../../../bin/sh
  - The server executes /bin/sh
18Implications of Non-Control-Data Attacks
- Control flow integrity is not a sufficiently accurate approximation to software security.
- Many types of non-control data are critical to security.
- Once attackers have the incentive, they are likely to succeed in non-control-data attacks.
19Re-Examining Current Defense Techniques
- Many of them are based on control flow integrity
- Monitor system call sequences
- Protect control data
- Non-executable stack and heap
- Pointer encryption: PointGuard
- Address space randomization
- StackGuard, Libsafe and FormatGuard
- Building a generic and secure defense technique is still an open problem.
20Pointer Taintedness Detection: Towards a Better Security Protection for Real-World Systems
21Pointer Taintedness
- Pointer taintedness: a pointer value, including a return address, is derived from user input.
- Most memory corruption attacks are due to pointer taintedness.
- Pointer taintedness: a unifying perspective for reasoning about security vulnerabilities.
22Most Memory Corruption Attacks are Due to Pointer
Taintedness
- Format string attack
  - Taint an argument pointer of functions such as printf, sprintf and syslog.
- Stack buffer overflow (stack smashing)
  - Taint a frame pointer or a return address.
- Heap corruption
  - Taint the free-chunk doubly-linked list maintaining the heap structure.
- Globbing attack
  - User input resides in a location that is used as a pointer by the parent function of glob().
23Internals of Stack Buffer Overflow Attacks
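Vulnerable code:
    char buf[100];
    strcpy(buf, user_input);
[Figure: stack layout from high to low addresses: return address, frame pointer, buf[99], ..., buf[1], buf[0]. The stack grows toward low addresses, while user_input fills buf from buf[0] upward, toward the frame pointer and the return address.]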
24Internals of Format String Attacks
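Vulnerable code:
    recv(buf);
    printf(buf);    /* should be printf("%s", buf) */
User input: \xdd \xcc \xbb \xaa %d %d %d %n
[Figure: stack layout from high to low addresses, holding the directives %n %d %d %d above the word 0xaabbccdd; the stack grows toward low addresses]
In vfprintf(): if (fmt points to "%n") then **ap = (character count)
*ap is a tainted value.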
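The fix noted in the comment above can be sketched as follows. format_untrusted() is a hypothetical helper name; the only point is that the format string is a constant, so directives inside the input are copied as plain characters and never consume arguments.

```c
#include <stdio.h>

/* Formats untrusted text as data: the format string is a constant,
   so "%d" or "%n" inside `untrusted` is never interpreted. */
int format_untrusted(char *out, size_t outsz, const char *untrusted) {
    return snprintf(out, outsz, "%s", untrusted);
}
```

With this wrapper, an input such as "%d%d%d%n" ends up in the output buffer unchanged.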
25Internals of Heap Corruption Attacks
Vulnerable code:
    buf = malloc(1000);
    recv(sock, buf, 1024);
    free(buf);
[Figure: heap layout with free chunk A, the allocated buffer buf (holding user input), free chunk B (fd = A, bk = C) and free chunk C]
In free():
    B->fd->bk = B->bk;
    B->bk->fd = B->fd;
When B->fd and B->bk are tainted, the effect of free() is to write a user-specified value to a user-specified address.
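The two unlink assignments can be tried out on a simplified chunk structure. This is a sketch, not the allocator's real code: struct chunk keeps only the two link fields, and demo_tainted_unlink() is a hypothetical harness in which the "attacker" chooses both links of B.

```c
#include <stddef.h>

/* Simplified free-chunk header; real allocators store more fields. */
struct chunk {
    struct chunk *fd;   /* next free chunk */
    struct chunk *bk;   /* previous free chunk */
};

/* The unlink step performed when chunk b leaves the free list. */
void unlink_chunk(struct chunk *b) {
    b->fd->bk = b->bk;
    b->bk->fd = b->fd;
}

/* Returns 1 if unlinking a chunk with attacker-chosen links wrote the
   chosen value into the chosen location (victim.bk). */
int demo_tainted_unlink(void) {
    struct chunk value  = { NULL, NULL };  /* attacker-chosen value (must be writable) */
    struct chunk victim = { NULL, NULL };  /* victim.bk plays the targeted memory word */
    struct chunk b;
    b.fd = &victim;   /* tainted: selects where the write lands */
    b.bk = &value;    /* tainted: selects what is written */
    unlink_chunk(&b);
    return victim.bk == &value && value.fd == &victim;
}
```

On a well-formed list A, B, C the same function simply links A to C; with tainted links it becomes a write of the attacker's choosing.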
26Building Defense Techniques based on Pointer
Taintedness
- Static code analysis: analyze the source code to extract the conditions under which pointer taintedness is possible.
  - To uncover potential vulnerabilities
- Runtime detection: monitor at runtime whether a tainted value is dereferenced as a pointer.
  - To defeat memory corruption attacks
27Static Analysis about Pointer Taintedness: To Extract Security Specifications of Library Functions
IFIP International Information Security Conference 2004
28Library function specifications are crucial to
secure programming
- Library function specifications are specified empirically
  - printf(fmt, ...), strcpy(d,s), free(p), glob(p), strtok(s,del), savestr(p), ...
- Formal and complete specifications are required by compiler techniques to check application source code for security.
- A unified reason why these specifications are required
  - Required to eliminate pointer taintedness.
- Extraction of security specifications of a function is reduced to a theorem proving task
29Semantics of Pointer Taintedness
- A formal definition of program semantics is required for theorem proving.
  - Currently defined using an equational logic framework
- Taintedness-aware memory model
  - The logic framework defines operations to fetch the content and test the taintedness (true/false) of each memory location.
- Incorporate pointer taintedness into program semantics
  - Define program semantics at the assembly level to reason about memory layout.
  - Load/Store/ALU instructions propagate taintedness from source data to destination data.
- Input functions (scanf, recv and recvfrom)
  - Axiom: The memory locations in the receiving buffer are tainted immediately after these function calls.
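The propagation rules above can be illustrated with a toy byte-addressed memory in which every location carries a taint bit. This is a sketch under simplifying assumptions: the names mem, taint, load(), store(), add() and sim_recv() are hypothetical, callers are assumed to keep addresses below MEM_SIZE, and sim_recv() stands in for a real input function.

```c
#include <stdbool.h>
#include <stddef.h>

#define MEM_SIZE 256

/* Each simulated memory byte has a content and a taintedness bit. */
static unsigned char mem[MEM_SIZE];
static bool taint[MEM_SIZE];

/* A register value together with its taintedness. */
struct word { unsigned char val; bool tainted; };

/* Load: the destination register inherits the location's taintedness. */
struct word load(size_t addr) {
    struct word w = { mem[addr], taint[addr] };
    return w;
}

/* Store: the destination location inherits the register's taintedness. */
void store(size_t addr, struct word w) {
    mem[addr] = w.val;
    taint[addr] = w.tainted;
}

/* ALU operation: the result is tainted if either operand is tainted. */
struct word add(struct word a, struct word b) {
    struct word r = { (unsigned char)(a.val + b.val), a.tainted || b.tainted };
    return r;
}

/* Input axiom: every byte of the receiving buffer becomes tainted. */
void sim_recv(size_t addr, const unsigned char *input, size_t len) {
    for (size_t i = 0; i < len; i++) {
        mem[addr + i] = input[i];
        taint[addr + i] = true;
    }
}
```

Data received by sim_recv() stays tainted through loads, additions and stores, and a location becomes untainted again only when an untainted value is stored into it.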
30Extracting Function Specifications by Theorem
Prover
- C source code of a library function
  - Automatically translated to a formal semantic representation
- Formal semantic representation
- Theorem generation
  - For each pointer dereference in an assignment, generate a theorem stating that the pointer is not tainted
- Theorem proving
  - A set of sufficient conditions that imply the validity of the theorems. They are the security specifications of the analyzed function.
31Example vfprintf()
int vfprintf (FILE *s, const char *format, va_list ap) {
    char *p, *q;
    int done, data, n, state;
    char buf[10];
    p = format; done = 0;
    if (p == NULL) return 0;
    state = NO_PENDING;
    while (*p != 0) {
        if (state == NO_PENDING) {
            if (*p == '%') state = PENDING;
            else outchar(s, *p);
        } else {
            switch (*p) {
            case '%': outchar(s, '%'); break;
            case 'd': data = va_arg(ap, int);
                      if (data < 0) { outchar(s, '-'); data = -data; }
                      n = 0;
                      while (data > 0 && n < 10) { buf[n] = data % 10 + '0'; data /= 10; n++; }
                      while (n > 0) { n--; outchar(s, buf[n]); }
                      break;
            case 's': q = va_arg(ap, char *);
                      if (q == NULL) break;
                      while (*q != 0) { outchar(s, *q); q++; }
                      break;
            case 'n': q = va_arg(ap, void *);
                      *(int *) q = done; break;
            default:  outchar(s, *p);
            }
            state = NO_PENDING;
        }
        p++;
    }
    return done;
}
Theorem 1: &buf[n] should not be a tainted value
Theorem 2: q should not be a tainted value
32Extracting the Specifications of vfprintf()
- Try to prove the two theorems
- The theorem prover cannot complete the proof initially
  - The theorems are only valid under certain preconditions.
- Add these preconditions as axioms to the theorem prover.
- Repeat until both theorems are proved.
- Four preconditions are added: the specifications of vfprintf (FILE *s, const char *format, va_list ap)
  - *ap never points to any location within the current function frame.
  - *ap never points to the location of variable ap, i.e., *ap ≠ &ap
  - Suppose the memory segment that ap sweeps over is called ap_activity_range; then *ap never points to any location within ap_activity_range.
  - No locations within ap_activity_range are tainted before vfprintf() is called.
33Other Studied Examples
- Function strcpy()
  - Four security specifications indicating buffer overflow, buffer overlapping and buffer underflow scenarios causing pointer taintedness.
- Function free() of a heap management system
  - Seven security specifications are extracted, including several specifications indicating heap corruption vulnerabilities.
- Socket read functions of Apache HTTP Server and NULL HTTP Server
  - The Apache function is proven to be free of pointer taintedness.
  - Two (known) vulnerabilities are exposed in the theorem proving process of the NULL HTTP Server function.
34Runtime Pointer Taintedness Detection: To Defeat Memory Corruption Attacks
To appear in IEEE Conference on Dependable Systems and Networks, 2005.
35The Technique
- A processor architectural level mechanism to detect pointer taintedness
  - Implemented on the SimpleScalar simulator
  - Architectural implementation of pointer taintedness semantics
- To show the validity of the pointer taintedness concept on whole programs of real applications
  - Network servers
  - SPEC 2000 integer benchmarks
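The detection rule (raise an alert when a tainted value is used as an address) can be sketched in software. This is not the SimpleScalar implementation: struct machine, struct reg, checked_load() and checked_store() are hypothetical names for a toy word-addressed machine.

```c
#include <stdbool.h>

#define MEM_WORDS 64

/* Toy machine: word-addressed memory with one taint bit per word. */
struct machine {
    unsigned int mem[MEM_WORDS];
    bool taint[MEM_WORDS];
};

/* A register value together with its taintedness. */
struct reg { unsigned int val; bool tainted; };

enum status { OK = 0, TAINTED_POINTER = 1, OUT_OF_RANGE = 2 };

/* Load through the pointer in `addr`; refuse tainted pointers. */
enum status checked_load(const struct machine *m, struct reg addr, struct reg *out) {
    if (addr.tainted) return TAINTED_POINTER;     /* the security alert */
    if (addr.val >= MEM_WORDS) return OUT_OF_RANGE;
    out->val = m->mem[addr.val];
    out->tainted = m->taint[addr.val];            /* taintedness follows the data */
    return OK;
}

/* Store through the pointer in `addr`; tainted data is allowed, tainted pointers are not. */
enum status checked_store(struct machine *m, struct reg addr, struct reg src) {
    if (addr.tainted) return TAINTED_POINTER;     /* the security alert */
    if (addr.val >= MEM_WORDS) return OUT_OF_RANGE;
    m->mem[addr.val] = src.val;
    m->taint[addr.val] = src.tainted;
    return OK;
}
```

Storing and loading tainted data is permitted, which keeps the check transparent to ordinary input processing; only the use of a tainted value as an address is flagged.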
36Evaluations on Real-World Software
- Evaluation
  - Effectiveness of detection
  - No false alarm in any application evaluated
  - Transparent to applications
  - A small number of potential attack scenarios undetected.
- Pointer taintedness detection can be applied to the whole program of real software
  - Offers a substantial improvement in security protection.
37Conclusions
38Conclusions
- Many real-world software applications can be compromised by corrupting non-control data.
  - It is insufficient to rely on control flow integrity for software security.
- Pointer taintedness is a unifying perspective to reason about most memory corruption vulnerabilities/attacks.
- Reasoning about pointer taintedness is a promising direction to enhance security on real-world systems
  - A theorem proving based code analysis approach
  - A runtime pointer taintedness detection mechanism
39Future Directions
- Short-term goals
  - Provide a higher degree of automation for the theorem proving technique.
  - Reduce the intrusiveness of the runtime pointer taintedness detection technique
    - Combine with the theorem proving technique. The processor only checks function preconditions.
- Long-term goals
  - Extract programming styles susceptible to security attacks. Can compilers detect bad programming styles?
  - Identify a broader range of non-traditional security threats.
  - Study historical data about how security vulnerabilities were discovered, reported and patched.
  - Decompose the behaviors of viruses, worms and rootkits into a number of basic building blocks.
40Summary of My Research Methodology
- Analysis-centric approach
  - A significant amount of effort in my dissertation is on analysis.
  - Starting from reality (usually a mess) to define problems!
- I am a data analysis person
  - Excited to analyze real data and incidents
  - Tedious? Sometimes, but it is a step toward a lot of fun.
  - Rewarding? Definitely. Especially important for systems research.
- Goal: strongly motivate research topics that solve real problems.
41Backup Slides
42Related Work
- Perl security
- Shankar and Wagner (2001)
  - Static analysis to uncover format string vulnerabilities
- Our work: pointer taintedness (Aug. 2004)
  - Reasoning about taintedness using an extended memory model
  - Pointer taintedness as the root cause
- Secure Program Execution (MIT), Minos (UC-Davis) and TaintCheck (CMU) (late 2004 and early 2005)
  - Similar memory model
  - Taintedness of control data
- Taintedness: cause or result of memory corruption?
43Static and Dynamic Approaches
- Static approaches (avoid producing memory vulnerabilities in programs)
  - Writing code in a type-safe language
  - Compiler techniques to uncover memory vulnerabilities
  - Compiler instruments source code according to program annotations.
  - Challenges: legacy code and low-level code, compatibility and performance.
  - Fact: Memory vulnerabilities are still constantly discovered and exploited.
- Intrusion detection techniques (defeat attacks, given the existence of vulnerabilities)
  - Specialized techniques
    - Defeat stack buffer overflow and format string attacks.
  - Generic defense techniques
    - Most techniques are designed to defeat control-hijacking attacks: host intrusion detection systems and control flow integrity protection techniques. A very active research area.
    - Others have constraints and difficulties in their deployment (pointer encryption and address randomization).
44One-Slide Intro to Equational Logic
- Use term rewriting to establish proofs of theorems.
- Natural number addition expressed in the Maude system.
    0 : Natural .
    s_ : Natural -> Natural .
    _+_ : Natural Natural -> Natural .
    vars N M : Natural
    Axiom: N + 0 = N .
    Axiom: N + s M = s (N + M) .
    (s s s 0) + (s s 0)
      = s ((s s s 0) + (s 0))
      = s (s ((s s s 0) + 0))
      = s (s (s s s 0))
      = s s s s s 0
- Intuitively, this is a proof of 3 + 2 = 5 in natural number algebra.
45Taintedness-Aware Memory Model
- A store represents a snapshot of the memory state at a point in the program execution.
- For each memory location, we can evaluate two properties: content and taintedness (true/false).
- Operations on memory locations
  - The fetch operation Ftch(S,A) gives the content of the memory address A in store S
  - The location-taintedness operation LocT(S,A) gives the taintedness of the location A in store S
- Operations on expressions
  - The evaluation operation Eval(S,E) evaluates expression E in store S
  - The expression-taintedness operation ExpT(S,E) computes the taintedness of expression E in store S.
46Axioms of Eval and ExpT operations
Eval(S, I) = I    // I is an integer constant
Eval(S, ^ E1) = Ftch(S, Eval(S,E1))
Eval(S, E1 + E2) = Eval(S, E1) + Eval(S, E2)
Eval(S, E1 - E2) = Eval(S, E1) - Eval(S, E2)
ExpT(S, I) = false
ExpT(S, ^ E1) = LocT(S, Eval(S,E1))
ExpT(S, E1 + E2) = ExpT(S,E1) or ExpT(S,E2)
ExpT(S, E1 - E2) = ExpT(S,E1) or ExpT(S,E2)
E.g., is the expression (^ 100) + 2 tainted in store S?
ExpT(S, (^ 100) + 2) = ExpT(S, (^ 100)) or ExpT(S, 2)
                     = LocT(S,100) or false
                     = LocT(S,100)
Note: ^ is the dereference operator; (^ 100) gives the content in the location 100
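The Eval and ExpT axioms translate almost line by line into two recursive functions over an expression tree. The sketch below is an illustration in C rather than the equational-logic encoding used in the slides: struct store, struct expr and the lowercase function names are hypothetical, and addresses are assumed to stay below STORE_SIZE.

```c
#include <stdbool.h>
#include <stddef.h>

#define STORE_SIZE 256

/* A store: content and taintedness for every location. */
struct store { int content[STORE_SIZE]; bool tainted[STORE_SIZE]; };

/* Expressions: integer constant, dereference, addition, subtraction. */
enum kind { CONST, DEREF, ADD, SUB };
struct expr { enum kind kind; int value; const struct expr *l, *r; };

int  ftch(const struct store *s, int a) { return s->content[a]; }
bool loct(const struct store *s, int a) { return s->tainted[a]; }

/* Eval(S,E): one case per axiom. */
int eval(const struct store *s, const struct expr *e) {
    switch (e->kind) {
    case CONST: return e->value;
    case DEREF: return ftch(s, eval(s, e->l));
    case ADD:   return eval(s, e->l) + eval(s, e->r);
    default:    return eval(s, e->l) - eval(s, e->r);    /* SUB */
    }
}

/* ExpT(S,E): constants are untainted, a dereference takes the taintedness
   of the location it reads, arithmetic is tainted if either side is. */
bool expt(const struct store *s, const struct expr *e) {
    switch (e->kind) {
    case CONST: return false;
    case DEREF: return loct(s, eval(s, e->l));
    default:    return expt(s, e->l) || expt(s, e->r);   /* ADD, SUB */
    }
}
```

Building the tree for "dereference of 100, plus 2" and calling expt() reproduces the worked example: the result equals the taintedness of location 100.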
47Semantics of My Assembly Language
- The following instructions are defined
  - mov Exp1 <- Exp2
  - branch (Condition) Label
  - call FuncName(Exp1, Exp2, ...)
- Axioms defining mov instruction semantics
  - Specify the effects of applying the mov instruction on a store
  - Allow taintedness to propagate from Exp2 to Exp1.
  - Ftch((S mov E1 <- E2),X1) = Eval(S,E2) if (Eval(S,E1) is X1) .
  - Ftch((S mov E1 <- E2),X1) = Ftch(S,X1) if not (Eval(S,E1) is X1) .
  - LocT((S mov E1 <- E2),X1) = ExpT(S,E2) if (Eval(S,E1) is X1) .
  - LocT((S mov E1 <- E2),X1) = LocT(S,X1) if not (Eval(S,E1) is X1) .
- Axioms defining the semantics of recv (similarly scanf and recvfrom, the user input functions)
  - Specify the memory locations tainted by the recv call.
48Example strcpy()
char * strcpy (char * dst, char * src) {
    char * res;
L0: res = dst;
    while (*src != 0) {
L1:     *dst = *src;
        dst++;
        src++;
    }
L2: *dst = 0;
    return res;
}
Translate to formal semantics:
L0: mov res <- ^ dst
    lbl(while6)
    branch (^ ^ src is 0) exwhile6
L1: mov ^ dst <- ^ ^ src
    mov dst <- (^ dst) + 1
    mov src <- (^ src) + 1
    branch true while6
    lbl(exwhile6)
L2: mov ^ dst <- 0
    mov ret <- ^ res
Theorem generation:
a) Suppose S1 is the store before Line L1, then LocT(S1,dst) = false
b) If S0 is the store before Line L0, and S2 is the store after Line L1, then
   I < Eval(S0, ^ dst) or Eval(S0, (^ dst) + dstsize) <= I  =>  LocT(S2,I) = LocT(S0,I)
c) Suppose S3 is the store before Line L2, then LocT(S3,dst) = false
Theorem proving
49Specifications Extracted
- Suppose that when function strcpy() is called, the size of the destination buffer (dst) is dstsize and the length of the user input string (src) is srclen
- Specifications that are extracted by the theorem proving approach
  - srclen < dstsize (documented in Linux man page)
  - The buffers src and dst do not overlap in such a way that the buffer dst covers the string terminator of the src string. (documented in Linux man page)
  - The buffers dst and src do not cover the function frame of strcpy. (not documented)
  - Initially, dst is not tainted (not documented)
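The two documented specifications can be checked at the call site, as sketched below. Both function names are hypothetical, and the sketch makes assumptions: it rejects any overlap between the copied ranges, which is stricter than the extracted condition, and it compares addresses through uintptr_t, which is implementation-defined but works on common flat-memory platforms. The frame and taintedness conditions cannot be checked this way and are not covered.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if copying src (with its terminator) into dst[0..dstsize) has
   enough room and the copied ranges do not overlap. */
bool strcpy_preconditions_hold(const char *dst, size_t dstsize, const char *src) {
    size_t srclen = strlen(src);
    if (srclen >= dstsize)
        return false;                          /* requires srclen < dstsize */
    uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
    /* bytes written: [d, d + srclen]; bytes read: [s, s + srclen] */
    return d + srclen < s || s + srclen < d;
}

/* Copies only when the preconditions hold; returns NULL otherwise. */
char *checked_strcpy(char *dst, size_t dstsize, const char *src) {
    if (!strcpy_preconditions_hold(dst, dstsize, src))
        return NULL;
    return strcpy(dst, src);
}
```

A caller would pass sizeof on the destination array as dstsize and treat a NULL return as a rejected copy.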