Security Applications For Emulation

Provided by: silvio2

1
Security Applications For Emulation

silvio.cesare_at_gmail.com

2
Speaker details

An independent researcher.
Presented a number of vulnerabilities at the
first Ruxcon after auditing the open source
kernels (FreeBSD, NetBSD, Linux, OpenBSD).
Also interested in reverse engineering, speaking
at CanSecWest on Linux malware.

3
Outline

A presentation examining public research, and the
results of my own research, on the topic of
emulation applied to security.
Technology review
Security applications for emulation
Reverse engineering Cisco IOS Heap Management
Tracing and evaluating the capabilities of
binaries
Dynamic Taint Analysis
Automated unpacking
Symbolic Execution
Detecting Runtime Errors in Programs
And introducing a new tool for detecting out
of bounds heap access in the Linux kernel

4
Virtualization

Different technologies all sharing similar
themes
Virtualization
Emulation
Dynamic Binary Translation
Different types of virtualization
Full Virtualization provides a simulation of the
underlying hardware
Host performs native execution of the guest as
much as possible.
Not an emulator, so aiming for near native
speeds.
On i386, if there isn't hardware virtualization
support, privileged code is translated.
Eg, VMware, VirtualBox
Virtualization is an important technology, but
this presentation focuses on the host being able
to intercept and emulate each individual
instruction in the guest. This is in contrast to
virtualization, which executes guest code
natively as much as possible, with little general
host interception.

5
Emulation and Dynamic Binary Translation

Emulation
An emulator fetches, decodes and executes guest code
instruction by instruction.
There are different types of emulators: whole system
emulators capable of running unmodified guest
operating systems, and emulators only capable of
running applications for specific systems.
Guest state is maintained in software, including
the CPU, system memory, and for whole system
emulators, hardware devices.
Eg, Bochs
Used in the open source automated unpacker,
Pandora's Bochs.
Dynamic Binary Translation
A faster form of emulation
Caches blocks of decoded and translated
instructions
Eg, QEMU
Used in Argos, a system for capturing 0-day
attacks
Used in my MemCheck tool for detecting Linux
kernel heap access bugs.

6
Dynamic Analysis and Emulation

An emulator can be used to implement dynamic
analysis.
Dynamic analysis means running a program and
seeing what's going on as it executes, eg as in a
debugger.
It can mean identifying specific behaviors in the
program, such as how the program accesses memory,
transfers execution control, or treats network
data.
Dynamic analysis using a debugger is prone to
anti-debugging tricks, and is very cumbersome
when applied in a kernel context.
A robust solution is to perform dynamic analysis
from inside an emulator.
Hooks are added in the fetch/decode/execute loop
of an emulator.
When modifying a dynamic binary translator,
instrumentation or callbacks are generally added
to the translated code blocks.
All the applications of emulation presented are
either related to, or applications of, dynamic analysis.
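To make the idea of hooks in the fetch/decode/execute loop concrete, here is a toy emulator loop in Python. The three-instruction ISA (`mov`, `add`, `jmp`), the `ToyEmulator` class and the hook signature are all hypothetical; real emulators such as Bochs or QEMU are far more involved, but the hook placement is the same idea: every instruction is shown to the analysis code before it executes.

```python
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical toy ISA: (opcode, destination register, source register or immediate).
Instr = Tuple[str, str, Union[str, int]]
# A hook receives the program counter, the decoded instruction and the registers.
Hook = Callable[[int, Instr, Dict[str, int]], None]


class ToyEmulator:
    """Minimal fetch/decode/execute loop with per-instruction analysis hooks."""

    def __init__(self, program: List[Instr]) -> None:
        self.program = program
        self.regs: Dict[str, int] = {"a": 0, "b": 0}
        self.pc = 0
        self.hooks: List[Hook] = []

    def add_hook(self, hook: Hook) -> None:
        self.hooks.append(hook)

    def _value(self, src: Union[str, int]) -> int:
        # Immediates are ints; anything else names a register.
        return src if isinstance(src, int) else self.regs[src]

    def run(self) -> None:
        while self.pc < len(self.program):
            instr = self.program[self.pc]      # fetch
            op, dst, src = instr               # decode
            for hook in self.hooks:            # dynamic analysis sees every instruction
                hook(self.pc, instr, self.regs)
            if op == "mov":                    # execute
                self.regs[dst] = self._value(src)
            elif op == "add":
                self.regs[dst] += self._value(src)
            elif op == "jmp":                  # absolute jump to an instruction index
                self.pc = self._value(src)
                continue
            else:
                raise ValueError(f"unknown opcode {op!r}")
            self.pc += 1
```

A hook that appends `pc` to a list produces an execution trace; a taint tracker or a memory checker would be registered the same way.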

7
Part i) Reverse Engineering Cisco IOS's
Heap Management

8
Reverse Engineering Cisco IOS with Dynamips

Dynamips is an open source emulator and binary
translator of Cisco hardware running PPC/MIPS IOS
images.
Potential future development environment for IOS
exploits.
Dynamic analysis of IOS
My experience is with IOS on MIPS
IOS MIPS images use an invalid ELF e_machine
field.
Some IDA (5.2) bugs with MIPS (turn off macros to
work around).
Dynamic analysis can identify heap management
functions in IOS and provide a means to
potentially implement Valgrind style heap
checkers.
It can also be used to reverse engineer other
components of IOS.
Dynamic analysis differs from the static
approach, and has some advantages:
It can be completely automated.
Since the behavior of the IOS implementation is
relatively constant, this method can work across
different IOS images, provided new or obsolete
features aren't being examined.

9
IOS Heap Management Basics

Well-documented public research on developing
heap-based buffer overflow exploits describes the
general heap layout.
IOS heap-allocated buffers have a header
appearing directly before the buffer, and a
trailer that follows the buffer.
These 'chunks' form a doubly linked list.
The chunk header begins with a known constant.
This fact is used later in the analysis.

10
Dynamic Analysis Approach

Knowing the header constant of a malloc chunk
enables us to track memory allocations by
intercepting writes to memory of that particular
constant.
Heap management is slightly different in a kernel,
but a kernel or user mode alloc/free still has a
set of expected semantics and prototypes.
An alloc(ation) function returns a pointer to an
allocated buffer.
But don't expect the allocation size to be the
only argument; eg kmalloc in Linux has
multiple arguments, including flags.
Free might also have multiple arguments, but one
of those arguments is certainly a pointer to an
allocated buffer.
By tracking allocations, and checking the
behavior of functions, we can infer the locations
of malloc and free.

11
Identifying Functions with Dynamic Analysis

Finding malloc
Track writes to memory that write the constant
that identifies a malloc chunk.
Track procedure exits, checking the return value
for a pointer to a known allocated buffer. This
return value is the chunk location + chunk header
length.
The first function to return the allocated buffer is
malloc, but sample a number of times to be sure.
Finding free
Find two malloc calls that return the same
memory.
A free must have occurred between the two mallocs
since, logically, allocated buffers can't overlap.
Track procedure calls with an argument matching
the freed memory, eg free(ptr).
Sample a large enough set; the function common to
all samples is free.
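The inference steps above could be sketched over a recorded trace as follows. The event format, the `MAGIC` header value and `HEADER_LEN` are hypothetical placeholders, not real IOS values; in practice the events would come from hooks added to Dynamips.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

MAGIC = 0xAB1234CD  # hypothetical chunk header constant
HEADER_LEN = 0x28   # hypothetical chunk header length

# Hypothetical trace events:
#   ("write", address, value)   memory write observed by the emulator
#   ("call", function, args)    procedure entry with its arguments
#   ("ret", function, value)    procedure exit with its return value
Event = Tuple[str, object, object]


def find_malloc(trace: List[Event]) -> Optional[object]:
    """Vote for the first function to return each newly written chunk's buffer."""
    pending: Set[int] = set()  # chunk headers written but not yet handed out
    votes: Counter = Counter()
    for kind, a, b in trace:
        if kind == "write" and b == MAGIC:
            pending.add(a)
        elif kind == "ret" and isinstance(b, int) and b - HEADER_LEN in pending:
            pending.discard(b - HEADER_LEN)  # later wrappers returning it don't count
            votes[a] += 1
    return votes.most_common(1)[0][0] if votes else None


def find_free(trace: List[Event], malloc_func: object) -> Optional[object]:
    """Intersect the functions passed a pointer between two mallocs returning it."""
    last_return: Dict[object, int] = {}  # pointer -> trace index where malloc returned it
    common: Optional[Set[object]] = None
    for i, (kind, func, value) in enumerate(trace):
        if kind != "ret" or func != malloc_func:
            continue
        if value in last_return:
            # The buffer was handed out again, so it must have been freed in between.
            between = {f for k, f, args in trace[last_return[value] + 1:i]
                       if k == "call" and value in args}
            common = between if common is None else common & between
        last_return[value] = i
    return next(iter(common)) if common and len(common) == 1 else None
```

`find_free` returns `None` while the samples are still ambiguous, which is the cue to collect a larger trace.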

12
Testing the results with a double free and
overlapping allocation checker.

How can we determine if malloc and free are the
only heap management functions?
The solution is to trace those functions while
running IOS, building our own representation of
the heap, all the while checking for consistency
in our representation.
Certain conditions should always be true in a
well managed heap. If any assertions fail
catastrophically, our model of the heap is
incorrect.
Only allocated memory can be freed.
Allocated memory cannot overlap.
This results in a checker that can be used to
detect double free bugs in IOS, as they happen,
much like Valgrind. But IOS checks the
consistency of the heap regularly and also during
free, so the checker is probably only useful for
automated analysis.
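A sketch of such a checker, assuming traced malloc and free calls are delivered as callbacks (the `HeapModel` class and its method names are hypothetical). Either exception means a genuine double free or overlap, or that our model is missing a heap management function.

```python
from typing import Dict


class HeapInconsistency(Exception):
    """Raised when the traced heap violates a well-managed-heap invariant."""


class HeapModel:
    """Our own representation of the heap, built only from traced malloc/free."""

    def __init__(self) -> None:
        self.live: Dict[int, int] = {}  # buffer address -> size

    def on_malloc(self, ptr: int, size: int) -> None:
        # Invariant: allocated memory cannot overlap.
        for start, length in self.live.items():
            if ptr < start + length and start < ptr + size:
                raise HeapInconsistency(
                    f"allocation {ptr:#x}+{size} overlaps {start:#x}+{length}")
        self.live[ptr] = size

    def on_free(self, ptr: int) -> None:
        # Invariant: only allocated memory can be freed.
        if ptr not in self.live:
            raise HeapInconsistency(f"free of unallocated {ptr:#x} (double free?)")
        del self.live[ptr]
```

The linear overlap scan keeps the sketch short; a sorted structure would be needed for a heap of realistic size.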

13
Detecting IOS 0-day

Another type of IOS checker could potentially be
made to detect 0-day attacks.
IOS exploitation uses corrupted malloc chunks
that are subsequently freed.
Freeing the corrupt chunk causes an arbitrary
write to memory.
The checker could confirm the consistency of
header attributes such as the size of each chunk
through the interception of free calls.
For more complete coverage, the chunk header
could be retrieved and stored after every malloc,
subsequently being verified before free.
In a roll-out, honeypots could automatically
detect mass 0-day exploitation and raise alarms
of the attack.
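The "store after malloc, verify before free" idea might look like the following sketch. The `read_mem` accessor and the `stable_fields` offsets are hypothetical; only fields expected to stay constant while a chunk is allocated (such as the size) are compared, since list pointers in the header may legitimately change when neighbouring chunks are allocated or freed.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical accessor for guest memory: (address, length) -> bytes.
ReadMem = Callable[[int, int], bytes]


class ChunkHeaderGuard:
    """Snapshot selected chunk header fields after malloc; verify them before free."""

    def __init__(self, read_mem: ReadMem, header_len: int,
                 stable_fields: List[Tuple[int, int]]) -> None:
        self.read_mem = read_mem
        self.header_len = header_len
        # (offset, length) pairs of header fields that should not change while
        # the chunk is allocated, eg the size field.
        self.stable_fields = stable_fields
        self.saved: Dict[int, Tuple[bytes, ...]] = {}

    def _snapshot(self, buf: int) -> Tuple[bytes, ...]:
        header = buf - self.header_len  # the header sits directly before the buffer
        return tuple(self.read_mem(header + off, n) for off, n in self.stable_fields)

    def after_malloc(self, buf: int) -> None:
        self.saved[buf] = self._snapshot(buf)

    def before_free(self, buf: int) -> bool:
        """True if the header is intact; False suggests a corrupted (exploited) chunk."""
        expected = self.saved.pop(buf, None)
        return expected is not None and expected == self._snapshot(buf)
```

Freeing a buffer that was never seen allocated also returns `False`, which a honeypot would treat as suspicious too.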

14
Reference Counting

Tracing malloc and free shows us conditions
where we are freeing the same memory twice,
that is, performing a double free.
Potentially this could indicate a bug in IOS, but
there are simply too many alerts to be
meaningful.
In fact, it turns out that, as suspected by other
researchers, allocated buffers are reference
counted.
Before each such pair of frees is a call to
increment the reference count (IncRefCnt) of the
buffer, so the first free simply
decrements the count without actually freeing the
memory.
MIPS has an atomic addition instruction, used
only for incrementing the malloc chunk refcnt.
Any procedure that uses this instruction on a
malloc chunk is IncRefCnt.
For other architectures, the refcnt field in the
malloc chunk is at a fixed offset, and writes to
this address may also indicate the location of
IncRefCnt.

15
MallocLite

Tracing also reveals the appearance of
overlapping memory allocations.
In later versions of IOS, a 'MallocLite'
implementation is used.
A 64k allocation is made, which is subsequently
subdivided for use in allocations.
This feature may affect the writing of heap
exploits and should be taken into account.
If malloc recursively calls itself, requesting
64k of memory, then MallocLite is allocating this
larger block of memory.
For tracing, ignoring recursive allocations works.

16
Cisco IOS TODO

The malloc tracer could potentially be used to
implement a Valgrind style MemCheck tool to
detect out of bounds heap access.
This could be used alongside fuzzing to provide
more accurate detection of vulnerabilities when
they happen.
Easy to implement, but the initial attempt
resulted in too many false positives.
Problem: there are other functions that have
direct access to internal heap structures besides
malloc, free and IncRefCnt, eg CheckHeaps.
More reversing is required.
If Cisco gave me access to the source, I'm pretty
sure I could whack this out in a week ;-)
The MemCheck concept was later successfully
implemented for the Linux Kernel as source code
is openly available.

17
Cisco IOS Summary

By modifying the open source Cisco emulator,
Dynamips, dynamic analysis of IOS is possible.
Dynamic analysis of IOS can aid in reverse
engineering.
Potentially one day we will have a Valgrind style
IOS memory checking tool, or in the near future a
0-day detection tool.

18
Part ii) Tracing execution and evaluating
the capabilities of binaries and potential malware

19
Tracing and evaluating the capabilities of
binaries

Run the binary inside a sandboxed environment,
logging events of interest:
system calls, registry changes, files accessed,
process management, services started or stopped
etc.
Public websites offer free online services to
evaluate binaries and potential malware.
The trace is useful for quickly determining what a
binary is doing.
It may help in determining if the binary is malicious.
A non-emulated approach is to trace the binary
using a debugger-based tool from userspace within
a VM.
Malware is almost certain to use anti-debugging
tricks, which may make tracing problematic.
Another approach is to perform the execution
inside an emulator.
The emulated approach is very resistant to modern
anti-debugging tricks.

20
TTAnalyze

TTAnalyze: A Master's thesis that presented a
closed source fork of QEMU that logged Windows
system calls.
Important because other techniques, such as automated
unpacking, are based on similar methods, and the
thesis clearly describes the implementation.
Windows XP runs as a guest, emulated by a fork
of QEMU in the host.
The host uploads the binary to the guest using a
virtual network created by the VM.
The binary is executed in the guest environment.
The host monitors execution and logs events of
interest.

21
TTAnalyze concepts

Host emulator intercepts every instruction.
It identifies instructions that belong to the
process being monitored.
How do we know what code is part of the process we
wish to monitor?
The CR3 register (the page directory base address) is
unique for each process.
The kernel maintains a process list (EPROCESS) with
these addresses.
An instruction executing in a specific process may
be either kernel code or user code.
For our target process, kernel code is when EIP
>= 0x80000000.
For the target process, it checks EIP, and if it
points to a Windows API call, it logs the event.
It also logs returning from Windows API calls.
To know the address of each Windows API call,
it uses the target process's PEB, which
eventually leads to a list of all loaded DLLs.
The library functions exported by each DLL are
parsed, and their addresses noted.
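A sketch of the per-instruction check described above, assuming the emulator passes in CR3, EIP and the dword at the top of the stack, and that the API address map has already been built from the PEB walk. The class and callback names are hypothetical, and the 0x80000000 boundary assumes the default 2 GB/2 GB split (it differs with the /3GB boot option).

```python
from typing import Dict, List, Tuple

# Default 2 GB/2 GB split on 32-bit Windows: kernel space starts here.
KERNEL_BASE = 0x80000000


class ApiCallLogger:
    """Per-instruction API call logging in the style described for TTAnalyze."""

    def __init__(self, target_cr3: int, api_addresses: Dict[int, str]) -> None:
        self.target_cr3 = target_cr3        # page directory base of the target process
        self.api_addresses = api_addresses  # entry address -> API name, from parsing DLLs
        self.pending: List[Tuple[str, int]] = []  # (API name, return address)
        self.log: List[Tuple[str, str]] = []

    def on_instruction(self, cr3: int, eip: int, stack_top: int) -> None:
        """Called for every emulated instruction; stack_top is the dword at [esp]."""
        if cr3 != self.target_cr3 or eip >= KERNEL_BASE:
            return  # another process, or kernel code running for the target
        if self.pending and eip == self.pending[-1][1]:
            name, _ = self.pending.pop()
            self.log.append(("leave", name))
        name = self.api_addresses.get(eip)
        if name is not None:
            # On the first instruction of a called function, [esp] holds the return address.
            self.pending.append((name, stack_top))
            self.log.append(("enter", name))
```

The exact EIP match on the API entry point is deliberately naive; it is precisely what the evasion discussed under TTAnalyze Attacks targets.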

22
TTAnalyze Implementation

A component that executes inside the guest
system
Kernel driver to parse kernel EPROCESS list, to
obtain the page directory address (CR3), and PEB
of the target process.
RPC mechanism to control guest operations from
host
uploading executables to guest
Controlling execution of the target process,
which is initially started in a suspended state
to allow querying.
Querying the kernel driver for the page directory
base (CR3) and PEB.
QEMU modifications
Identifying the process of interest using the CR3
result from the guest kernel driver.
The PEB is used to establish a list of
addresses for each Windows API call in a DLL.
Identifying entering and leaving Windows API
calls in the guest, based on intercepting each
instruction and checking EIP.

23
TTAnalyze Implementation Challenges

Arguments for system calls which reside in
virtual memory might be paged out.
The QEMU page fault handler detects this condition,
then alters guest code to access the target memory,
paging it in.
Malware can use the Native API directly.
Understanding this requires unofficial
documentation of the API.
Trap native calls by checking each instruction
for an OS trap (int 2e or sysenter).
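A minimal sketch of trapping native calls by looking at the instruction bytes at EIP: `int 0x2e` encodes as `CD 2E` and `sysenter` as `0F 34`, and on 32-bit Windows the system service number is passed in EAX. The function names and the service table below are hypothetical; service numbers change between Windows builds, so a real table must match the guest, and this sketch ignores instruction prefixes.

```python
from typing import Dict, Optional

INT_2E = b"\xcd\x2e"    # int 0x2e
SYSENTER = b"\x0f\x34"  # sysenter


def native_trap_kind(code_at_eip: bytes) -> Optional[str]:
    """Classify the instruction bytes at EIP (prefixes are ignored in this sketch)."""
    if code_at_eip.startswith(INT_2E):
        return "int 2e"
    if code_at_eip.startswith(SYSENTER):
        return "sysenter"
    return None


def describe_native_call(code_at_eip: bytes, eax: int,
                         service_names: Dict[int, str]) -> Optional[str]:
    """Name the native service being requested, or None if this is not a trap."""
    kind = native_trap_kind(code_at_eip)
    if kind is None:
        return None
    # Fall back to the raw service number when the table has no entry for it.
    return f"{kind}: {service_names.get(eax, hex(eax))}"
```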

24
TTAnalyze Attacks

Malware might evade detection of Windows API
calls, which depends on exact EIP matching.
Vulnerable if malware doesn't jump to the very
beginning of a function, eg the caller might
implement the callee's prologue itself.
Malware might detect guest changes:
The communication channel between host and guest.
The kernel driver component.
See Pandora's Bochs (an automated unpacker) for an
implementation with no guest changes.
Malware might detect system emulators:
CPU bugs (listed in errata) are generally not implemented.
Model Specific Registers are implemented differently
by different CPU vendors.

25
Binary Tracing Summary

Existing software that traces binaries using a
userland debugger-based tool in a VM is
vulnerable to many anti-debugging tricks.
An emulator can present a solution to that
problem.

26
Part iii) Using emulation for dynamic taint
analysis

27
Dynamic Taint Analysis

A technique used to analyze the flow of data
in a program.
Has applications in identifying vulnerabilities
as they happen, eg Argos.
Has also been used to identify spyware, eg
BitBlaze.
Is a general concept that can be used in a number
of applications, including symbolic execution.
Traces the flow of data, instruction by
instruction, from a source that generates
'tainted' data, to sinks where the data is used.
Variables, registers and memory are tagged as
being tainted or clean.
An instruction's destination operand becomes
tainted when a source operand is tainted.
Sometimes it's useful for data to become
untainted by certain operations.
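The propagation rule above can be sketched as a small shadow-state tracker. Operands are register names or byte addresses, operand sizes are ignored, and the opcode names (including `mov_imm` for loading an immediate) are hypothetical. As an example of untainting, `xor reg, reg` is treated as clean because its result is always zero regardless of the input.

```python
from typing import Optional, Set, Union

# An operand is a register name or a memory address. Memory is tracked per byte
# and operand sizes are ignored to keep the sketch short.
Operand = Union[str, int]


class TaintTracker:
    def __init__(self) -> None:
        self.tainted_regs: Set[str] = set()
        self.tainted_mem: Set[int] = set()

    def taint_source(self, addr: int, length: int) -> None:
        """Mark bytes written by a taint source (eg received network data)."""
        self.tainted_mem.update(range(addr, addr + length))

    def is_tainted(self, op: Optional[Operand]) -> bool:
        return op in self.tainted_regs if isinstance(op, str) else op in self.tainted_mem

    def _set(self, op: Operand, tainted: bool) -> None:
        shadow = self.tainted_regs if isinstance(op, str) else self.tainted_mem
        if tainted:
            shadow.add(op)
        else:
            shadow.discard(op)

    def step(self, op: str, dst: Operand, src: Optional[Operand] = None) -> None:
        """Propagate taint for one instruction."""
        if op == "mov":
            self._set(dst, self.is_tainted(src))
        elif op == "mov_imm":                 # constants are clean
            self._set(dst, False)
        elif op == "xor" and dst == src:      # xor eax, eax: result is always 0
            self._set(dst, False)
        elif op in ("add", "sub", "and", "or", "xor"):
            self._set(dst, self.is_tainted(dst) or self.is_tainted(src))
        else:
            raise ValueError(f"unsupported instruction {op!r}")
```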

28
Dynamic Taint Analysis in Vulnerability Detection

Dynamic taint analysis has been applied to
detecting vulnerabilities such as SQL injection, or
incorrect use of the Unix exec() or system()
calls, which run executables.
A source of user input, that is, untrusted data,
taints the data.
The flow of untrusted data is followed by taint
analysis.
If untrusted data is checked in a condition, then
input validation is deemed to have occurred, so the
data is untainted.
At the site of exec(), system(), or even
mysql_query, check that the argument is not tainted.
If it is tainted, then untrusted data is assumed to
have reached privileged code and a vulnerability has
occurred.
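The source/validate/sink logic above can be illustrated at the language level, similar in spirit to Perl's taint mode, rather than instruction by instruction. `TaintedStr`, `validate_name` and `guarded_sink` are hypothetical names. Only `+` propagates taint in this sketch; other string methods would silently drop it, which is one reason real implementations track taint at a lower level.

```python
class TaintedStr(str):
    """A string derived from untrusted input; concatenation keeps the taint."""

    def __add__(self, other: str) -> "TaintedStr":
        return TaintedStr(str.__add__(self, other))

    def __radd__(self, other: str) -> "TaintedStr":
        return TaintedStr(str.__add__(other, self))


class TaintViolation(Exception):
    pass


def validate_name(value: str) -> str:
    """Untrusted data checked in a condition is deemed validated, so untaint it."""
    if not value.isalnum():
        raise ValueError("rejected input")
    return str(value)  # str() of a str subclass yields a plain (untainted) str


def guarded_sink(command: str) -> str:
    """Stand-in for system()/exec(): refuse tainted arguments."""
    if isinstance(command, TaintedStr):
        raise TaintViolation(f"tainted data reached a privileged sink: {command!r}")
    return command  # a real sink would execute the command here
```

For example, `"ls " + TaintedStr(user_input)` stays tainted and is refused by the sink, while `"ls " + validate_name(TaintedStr(user_input))` passes once the input survives the check.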

29
Argos: A tool for detecting 0-day attacks

Uses dynamic taint analysis to detect 0-day
attacks.
An open source fork of QEMU.
Detects exploits as they are happening and
automatically generates vulnerability
signatures.
The vision is of an automatic worm defense system:
Honeypots detect 0-day attacks.
Generates and delivers vulnerability signatures
to intrusion prevention systems.
Argos works by dynamic taint analysis of network
data, which is considered untrusted.
Taints data returned from the QEMU emulated network
driver.
Exploits are detected when there is code redirection
under attacker control:
If EIP becomes tainted (under the control of the
attacker).
If EIP points to tainted data.
Execve system calls are also checked for tainted arguments.
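A sketch of the two EIP checks above, applied at control transfers. It assumes taint has already been propagated through ordinary instructions by the emulator; the class and method names are hypothetical, and only `ret` and register-indirect jumps are shown.

```python
from typing import Optional, Set


class ControlFlowTaintCheck:
    """Argos-style alerts on control transfers influenced by network data."""

    def __init__(self) -> None:
        self.tainted_mem: Set[int] = set()
        self.tainted_regs: Set[str] = set()

    def taint_network_input(self, addr: int, length: int) -> None:
        """Data delivered by the emulated network card is untrusted."""
        self.tainted_mem.update(range(addr, addr + length))

    def _check(self, target_from_taint: bool, target: int) -> Optional[str]:
        if target_from_taint:
            return f"alert: EIP {target:#x} loaded from tainted data"
        if target in self.tainted_mem:
            return f"alert: EIP {target:#x} points into tainted data"
        return None

    def on_ret(self, esp: int, target: int) -> Optional[str]:
        # ret loads the 4-byte return address stored at [esp].
        from_taint = any(a in self.tainted_mem for a in range(esp, esp + 4))
        return self._check(from_taint, target)

    def on_jmp_reg(self, reg: str, target: int) -> Optional[str]:
        return self._check(reg in self.tainted_regs, target)
```

A classic stack overflow shows up in `on_ret`, because the overwritten return address bytes came from the network.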

30
Dynamic Taint Analysis Summary

Dynamic taint analysis is a technique used to
track the flow of data.
Important because it can be used as a general
technique in more applied topics.
Has applications including vulnerability
detection, and is used in places like symbolic
execution.

31
Part iv) Automated Unpacking

32
Packers

A packer rewrites an executable, wrapping a new
layer of code around the original program.
It essentially becomes an executable inside an
executable.
A packer is used to compress, obfuscate or
encrypt the original executable.
Today almost all malware is packed.
Packers were originally used for compression.
I remember packers (or crunchers) from the early
90s; I had 2 floppy disks full of them for
the Commodore 64!
The resulting packed executable consists of a
runtime unpacking layer and a binary blob of the
compressed or obfuscated original program.
At runtime, the unpacking layer decompresses the
blob, writing the original executable to memory.
It then transfers execution back to the original
code.
Not all packers follow this behavior. Some
packers convert the original executable to PCODE.
At runtime the packed executable acts as a VM.

33
Unpacking

Unpacking is the process of extracting the
original executable from a packed image.
The manual approach is to run the packed
executable in a debugger, stepping over the
unpacking stub, which writes the original image
to memory, and breaking (in the debugger) when
execution transfers to the now unpacked image.
Dump memory, then rebuild the image so it's a
valid executable again.
This requires fixing the Import Address Table.
ImpRec can do this.
Debugger scripts can automate the process for
specific packers by identifying instruction
sequences that indicate which stage the unpacking
stub is in.

34
Automated Unpacking

Unpacking can be automated.
Run packed executable.
Track all memory writes by the executable.
If execution transfers to a previously written
memory location, then unpacking is deemed to have
occurred.
It may be necessary to repeat this, as multiple
layers may exist.
Public automated unpackers available from
Offensive Computing, and also Pandora's Bochs.
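The write-then-execute rule above can be sketched as follows, assuming the emulator reports every memory write and every executed EIP of the target process (the `UnpackDetector` name and callbacks are hypothetical). After each detection it clears the written set so the next layer is tracked from scratch.

```python
from typing import List, Set


class UnpackDetector:
    """Flag execution of memory that the program itself wrote earlier."""

    def __init__(self) -> None:
        self.written: Set[int] = set()      # bytes written by the traced executable
        self.entry_points: List[int] = []   # candidate original entry point per layer

    def on_write(self, addr: int, size: int) -> None:
        self.written.update(range(addr, addr + size))

    def on_execute(self, eip: int) -> bool:
        if eip not in self.written:
            return False
        # Unpacking is deemed to have occurred: this is the moment to dump memory.
        # Then start tracking writes again from scratch to catch the next layer.
        self.entry_points.append(eip)
        self.written.clear()
        return True
```

Clearing the set is one simple policy for handling multiple layers; its drawback is that code written by an early layer but executed only after a later layer would be missed.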

35
Automated Unpacking Implementation Approaches

Multiple implementation approaches:
Use hardware page protection in the OS to track
writes and execution, eg Offensive Computing.
This results in high performance.
If running inside a virtualized environment like
VMware, the VM might be detected. Offensive
Computing recommends using a real goat machine.
Dynamic instrumentation or complete emulation of
the packed program to track memory writes and
execution.
Offensive Computing uses an instrumentation approach
with the Intel PIN framework.
Pandora's Bochs uses the Bochs emulator.

36
Automated Unpacking using an Emulator

Emulation is a mature closed source technology
used by antivirus software.
Emulation was originally used to detect
polymorphic viruses, but is now also used for
unpacking.
A typical antivirus emulator emulates both the
instruction set and parts of the operating
system.
This is how I wrote my own automated unpacker and
emulator.
There are no software licensing problems since
the emulator is only a regular piece of
software.
Another approach is to use a whole system
emulator such as Bochs or QEMU running an
installed OS.
Non-emulated approaches are more likely to be
detected or be susceptible to anti-debugging tricks
employed by malware.

37
Using an AV style Emulator as a CPU checker

While developing my AV style emulator, a need
arose to verify the emulation.
I implemented a program tracer to trace programs
in parallel with emulation.
The tracer needed to automatically evade
anti-debugging tricks:
Instructions that would indicate the program was
being debugged needed to be emulated (eg,
EFlags and popf, rdtsc, or a software int1 being
confused with single stepping).
Library calls also (eg, Process32, which shows the
debugger in the process list, and IsDebuggerPresent).
For each traced instruction, the emulator
executes the same instruction.
The CPU state from the tracer is verified against
the state of the emulator, and checked for
consistency.
Some instructions produced differences between
emulation and tracing, through no fault of the
emulator or tracer:
CPU bugs: some instructions don't follow Intel
specifications,
eg not setting/clearing processor status flags.
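The consistency check between tracer and emulator might be sketched as a register/flag diff after each step. The function name and state dictionaries are hypothetical. One practical detail, which is my assumption rather than something from the slides, is masking flags that Intel documents as undefined for the instruction just executed (eg AF after AND), since real CPUs and emulators can legitimately disagree there.

```python
from typing import Dict, Iterable, List

# Bit positions of the EFLAGS status/control flags.
EFLAGS_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
               "TF": 8, "IF": 9, "DF": 10, "OF": 11}


def diff_cpu_state(traced: Dict[str, int], emulated: Dict[str, int],
                   undefined_flags: Iterable[str] = ()) -> List[str]:
    """Compare real-CPU state (from the tracer) with emulator state after one step."""
    mask = 0
    for name in undefined_flags:
        mask |= 1 << EFLAGS_BITS[name]  # documented as undefined: don't compare
    diffs: List[str] = []
    for reg in sorted(traced):
        if reg not in emulated:
            diffs.append(f"{reg}: missing from emulator state")
            continue
        t, e = traced[reg], emulated[reg]
        if reg == "eflags":
            changed = (t ^ e) & ~mask
            diffs.extend(f"eflags.{name}" for name, bit in EFLAGS_BITS.items()
                         if (changed >> bit) & 1)
        elif t != e:
            diffs.append(f"{reg}: traced={t:#x} emulated={e:#x}")
    return diffs
```

Anything still reported after masking is either an emulator bug or one of the CPU quirks described above, and needs a human to decide which.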

38
Automated Unpacking using an Emulator
implementation

The required changes to an emulator involve modifying
the software MMU to track memory writes, and
checking each instruction to see if EIP
matches any address where a memory write has
occurred.
The same problem as in TTAnalyze is present in
determining what code is part of the target
process.
The Renovo unpacker from the BitBlaze project
follows the TTAnalyze approach of starting the
executable in a suspended state, and then using a
kernel driver in the guest to find the page
directory base address of the process.
Pandora's Bochs uses an unmodified guest system
and instead watches for changes in the CR3
register to identify the target process.
To determine the target's CR3 value, it relies on
the fact that in kernel mode, Windows uses the fs
register to reference a known structure leading
to the EPROCESS list, which, like in TTAnalyze,
contains the page directory base address (CR3) of
each process.

39
Attacks against Automated Unpackers and Emulators

Malware might make use of parts of the
architecture, instruction set or operating
system that the emulator does not implement.
For AV emulators, use of obscure libraries.
For whole system emulators, detection of the
emulator. Malware might check for the existence of
known CPU errata.
Malware might require activation (eg, using the
Internet), or only activate occasionally.

40
Attacks (cont) Virtual Machine Packers

Packer translates executable into PCODE.
At runtime, PCODE is decoded and executed in the
style of a virtual machine.
PCODE can be polymorphic.
This type of packer doesn't follow the 'write to
memory then execute' algorithm.
Eg, Themida, but fortunately these packers are
not as common in current malware.
There is no automated method of unpacking an
unknown packer of this type.

41
Automated Unpacking Summary

Automated unpacking works by intercepting
execution of previously written memory
addresses.
There are multiple implementation approaches;
emulation has some advantages.
Automated unpacking doesn't work on VM-based
packers.

42
Part v) Using emulation to design and
implement symbolic execution

43
Symbolic Execution

A technique used to analyze programs.
For unknown input to a program, it maintains
generalized information on program state,
systematically exploring program paths.
(Strictly, this is a definition of mixed symbolic
execution.)
Execution occurs by emulating instructions and
using symbolic formulas instead of concrete data
for user-defined input.
Examples of symbolic data are network packet
contents, program arguments, file contents etc.
Symbolic formulas capture all program states along
that program path for arbitrary user input, that is,
all the values the data can possibly hold.
Bug finding is equivalent to solving the
equations.
Eg, is this pointer being dereferenced ever equal
to 0, given arbitrary user input?
And if so, what is the user input that triggers
that bug?

44
SMT Based Constraint Solvers

Symbolic equations are generated for instructions
that have symbolic arguments.
Conditional instructions generate equations which
are constraints (eg, x < 10).
Equations are handled by Satisfiability Modulo
Theories (SMT) solvers.
Efficient SMT-based solvers are a relatively recent
achievement of the past decade.
An annual SMT competition pits solvers against each
other.
Microsoft has their own solver, which is free to
use, but not open source.
A number of open source solvers are available.
An SMT solver can be queried, given a set of
equations and constraints, to see if certain
queried constraints can be true.
It can easily determine if a symbolic pointer can
be null.
SMT solvers can also generate concrete solutions
from symbolic equations.

45
Applications of Symbolic Execution

As a Bug checker
Dawson Engler's closed source C checker EXE, which
could detect buffer overflows, null pointer
dereferences and divisions by zero.
The open source Catchconv, which doesn't explore
program paths, but checks assertions on a given
input using symbolic execution to find
signedness bugs.
Intelligent fuzzing
Symbolic execution can automatically enumerate
the paths and data in a program that fuzzing
normally misses, aiming towards complete
automated code coverage.
Eg, Microsoft's closed source SAGE research.
Tracing and evaluating the capabilities of
binaries
The closed source BitBlaze project implements
BitScope, which is in a similar vein to TTAnalyze,
except it symbolically explores the many program
paths in potential malware to find its
capabilities.

46
Symbolic Execution Implementation

The emulator runs the program instruction by
instruction, generating symbolic equations for
instructions when a source operand is symbolic,
such as the symbolic equation ebx = eax + 10.
In an instruction, if a source operand is
symbolic, the destination becomes symbolic.
This is implemented using dynamic taint analysis.
At conditional instructions there are two possible
equations: the condition being true, or the
condition being false.
Symbolic execution explores each path
separately.
A symbolic constraint representing the condition's
truth is given to each path, eg x < 10 on one
path and x >= 10 on the other.
The feasibility of each path, that is, whether its
constraints can be satisfied, is determined by
SMT solvers.
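A toy model of forking at conditions and checking path feasibility. Two simplifications are deliberate and should not be read as how EXE, SAGE or BitBlaze work: path constraints are plain Python predicates over a single symbolic input `x`, and the SMT solver is replaced by brute-force enumeration over an 8-bit input, which only works for tiny domains. `If`, `Bug`, `solve` and `explore` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional, Sequence, Tuple, Union

Constraint = Callable[[int], bool]  # a predicate over the symbolic input x


@dataclass
class Bug:
    name: str


@dataclass
class If:
    cond: Constraint
    then: "Node"
    orelse: "Node"


Node = Union[If, Bug, None]  # None: the path ends without a bug


def solve(path: Sequence[Constraint]) -> Optional[int]:
    """Stand-in for an SMT solver: brute force over an 8-bit input."""
    for x in range(256):
        if all(c(x) for c in path):
            return x
    return None


def explore(node: Node, path: Tuple[Constraint, ...] = ()) -> Iterator[Tuple[str, int]]:
    """Fork at every condition, prune infeasible paths, report bug-triggering inputs."""
    if node is None:
        return
    if isinstance(node, Bug):
        witness = solve(path)  # concrete input that reaches the bug
        if witness is not None:
            yield node.name, witness
        return
    negated: Constraint = lambda x, c=node.cond: not c(x)
    for branch, constraint in ((node.then, node.cond), (node.orelse, negated)):
        new_path = path + (constraint,)
        if solve(new_path) is not None:  # feasibility check
            yield from explore(branch, new_path)
```

For a program like `if (eax + 10 > 100) { if (eax * 2 == 220) crash(); }`, modelled as `If(lambda x: x + 10 > 100, If(lambda x: x * 2 == 220, Bug("crash"), None), None)`, `explore` yields the single concrete input 110 that reaches the crash.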

47
Symbolic Execution Challenges

Symbolic Execution may never terminate in the
presence of loops, so loops must be simplified,
typically through unrolling.
Symbolic Execution therefore is not complete.
Path explosion: functions like strcmp with
symbolic input have many possible paths, an
exponential number of paths for the size of the
string.
BitBlaze approach: hard-code 'function summaries'
to deal with common library functions.
Dealing with symbolic pointers:
Dynamic taint analysis has trouble determining
the target memory that becomes tainted if a
pointer is symbolic.
This requires the SMT solver to determine concrete
solutions for the pointer.
SMT solver support for the target architecture
may not be complete.
No public solvers support floating point.

48
Symbolic Execution Summary

Symbolic execution is a relatively new method to
analyze programs.
Applications include bug checkers, smart fuzzers,
and binary evaluation.
I believe symbolic execution will play a big part in
the future of automated analysis.

49
Part vi) Detecting Runtime Errors in Programs

50
Valgrind

Valgrind is a heavyweight dynamic binary
instrumentation framework.
Most well known for the Memcheck checker.
Memcheck is used as a bug checker for incorrect heap
use or access.
It also detects uninitialized variable use.
Valgrind translates machine code to IR, then allows
instrumentation, with modules that implement
runtime checkers.
Valgrind's Memcheck can detect out of bounds or
invalid heap access, and tracks what addresses can
be accessed by maintaining a 'shadow memory'
mirroring allocations on the heap.
For each address, the shadow memory also stores
whether it's initialized or not.
It then checks that all guest memory references
belong to the shadow memory, using IR instrumentation.
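A heap-only shadow memory in the spirit described above might look like this sketch. It keeps per-byte "addressable" and "initialized" state in Python sets, whereas Memcheck itself shadows every bit with compact data structures; `ShadowHeap` and its methods are hypothetical, and the caller is assumed to pass in heap accesses only.

```python
from typing import Dict, Optional, Set


class ShadowHeap:
    """Per-byte 'addressable' and 'initialized' state for heap blocks."""

    def __init__(self) -> None:
        self.blocks: Dict[int, int] = {}   # start address -> size
        self.addressable: Set[int] = set()
        self.initialized: Set[int] = set()

    def on_malloc(self, ptr: int, size: int) -> None:
        self.blocks[ptr] = size
        self.addressable.update(range(ptr, ptr + size))
        self.initialized.difference_update(range(ptr, ptr + size))  # new memory is undefined

    def on_free(self, ptr: int) -> None:
        size = self.blocks.pop(ptr)
        self.addressable.difference_update(range(ptr, ptr + size))
        self.initialized.difference_update(range(ptr, ptr + size))

    def on_access(self, addr: int, size: int, is_write: bool) -> Optional[str]:
        """Check one guest heap access; return an error report or None."""
        if any(a not in self.addressable for a in range(addr, addr + size)):
            kind = "write" if is_write else "read"
            return f"invalid {kind} of size {size} at {addr:#x}"
        if is_write:
            self.initialized.update(range(addr, addr + size))
        return None

    def is_initialized(self, addr: int, size: int) -> bool:
        return all(a in self.initialized for a in range(addr, addr + size))
```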

51
Valgrind's Memcheck with uninitialized variables

The uninitialized variable checker is implemented
using dynamic taint analysis.
Newly allocated memory and new stack frames are
considered tainted.
Initializing data untaints it.
Alert when using tainted/uninitialized data.
A naive implementation causes false positives.
Memcpy of padded structures or memcpy of
structures with uninitialized members causes
false positives.
Fixed by warning only when uninitialized values
are used in system calls or conditions, or
dereferenced as a pointer.
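The fix described above, propagating "uninitialized" silently and warning only at a real use, can be sketched like this. The class and method names are hypothetical; only the branch-condition use site is shown, and system call arguments or pointer dereferences would be checked in the same way.

```python
from typing import List, Set


class DefinednessTracker:
    """Propagate 'uninitialized' silently; warn only when a value is actually used."""

    def __init__(self) -> None:
        self.undefined: Set[int] = set()  # addresses of uninitialized bytes
        self.warnings: List[str] = []

    def on_allocate(self, addr: int, size: int) -> None:
        self.undefined.update(range(addr, addr + size))  # new heap memory or stack frame

    def on_store_defined(self, addr: int, size: int) -> None:
        self.undefined.difference_update(range(addr, addr + size))  # initializing untaints

    def on_memcpy(self, dst: int, src: int, n: int) -> None:
        # Copying padding or unset members is legal: propagate the state, don't warn.
        src_state = [src + i in self.undefined for i in range(n)]
        for i, is_undefined in enumerate(src_state):
            if is_undefined:
                self.undefined.add(dst + i)
            else:
                self.undefined.discard(dst + i)

    def on_branch(self, addr: int, size: int, where: str) -> None:
        # A condition is a real use of the value.
        if any(a in self.undefined for a in range(addr, addr + size)):
            self.warnings.append(f"{where}: condition depends on uninitialized data")
```

Copying a padded structure produces no warning; only branching on the padding bytes does.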

52
Detecting Runtime Heap Errors in the Linux Kernel

Tools with similar designs or aims, which detect
some classes of heap errors in the Linux kernel:
KEFence (Linux) / MemGuard (FreeBSD)
Detects overflows (and underflows for KEFence,
but not both at the same time) of heap buffers.
Allocates a guard page next to the allocated
buffer that page faults on any access.
Only detects overflows, not arbitrary invalid
access.
kmemcheck (Linux)
Used to detect uninitialized variable bugs.
Maintains a shadow memory indicating whether
data is initialized or not.
Page faults on all heap accesses, then checks
each access against the shadow memory.
UML Valgrind
Doesn't seem active, and the source is unavailable.

53
Linux Kernel MemCheck

My own runtime checker that detects out of bounds
heap access in the Linux Kernel.
Not Valgrind's Memcheck; I named it poorly, I
know.
Tested under Linux 2.6.26 using a Windows Vista
Cygwin host.
Implemented as a C++ fork of QEMU.
Dumps a kernel stack trace on a guest access
violation.
Only reports when a memory access violation
occurs, much like Valgrind.
Not a static analysis tool.
The host maintains a 'shadow memory' of the guest
Linux kernel heap that identifies valid heap
addresses.
The shadow memory is created by intercepting the
heap management functions in the Linux kernel and
building a representation of the guest heap.
MemCheck validates all memory accesses against this
shadow memory (like Valgrind),
except in heap management functions like kmalloc,
kfree etc.

54
Linux Kernel Heap Management

Linux has had several memory allocators, with the
latest Linux kernels now using the slub
allocator.
MemCheck only supports the latest slub
allocator.
There are three internal allocators in Linux
that use the heap:
The Page Allocator, using the buddy allocator
internally, which only handles allocations of
sizes that are a power-of-two multiple of the page
size.
The page allocator can be called directly or
indirectly from the slub allocator.
The Slub Allocator, which handles allocations of
varying sizes by dividing up a slab that
originates from the page allocator.
The BootMem Allocator, which uses a simpler
algorithm than the other allocators and is used
during boot time only.

55
Linux Kernel Heap Tracing and Guest Linux
Implementation

MemCheck must trace the kernel allocator
functions to properly create its shadow memory.
However, tracing an unmodified Linux guest
presents problems:
The Page Allocator does not always return the
address of the allocated page contents, but
returns a page descriptor structure
instead.
The Slub Allocator defines kmalloc as an inline
function, which can't be intercepted using a
compile time symbol address.
Following the internal logic can be difficult, eg
kmalloc using the page allocator internally.
The solution is to use a modified guest Linux
kernel with instrumented allocators that
MemCheck can easily intercept.

56
MemCheck QEMU implementation

QEMU was modified to implement MemCheck.
MemCheck is written in C++ running on a Windows
host, so I ported QEMU 0.9.1 to compile under
g++. In hindsight, porting was not necessary and
not worth the effort. I also backported some
patches for problems that cause 0.9.1 to fail on
Windows.
QEMU has an optimization of merging basic blocks
into a translation block. I needed basic block
granularity to correctly intercept the beginning
of functions, so this QEMU optimization was turned
off.
A tracer was implemented to track functions using
a callback interface on function entry or exit.
By tracing the heap management code, a simple
shadow memory was constructed, using C++ STL maps
for the implementation.
The software MMU in QEMU was modified to check
that each memory access was to a valid address in
the shadow memory.
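A Python analogue of the shadow heap lookup, as a sketch only: a sorted list searched with `bisect` plays the role an ordered map (such as a C++ `std::map` searched with `upper_bound`) would play. Everything here is hypothetical, including how heap pages are registered through `on_heap_page` and how the function entry/exit callbacks for the allocators are wired up; it is not the actual MemCheck code.

```python
import bisect
from typing import Dict, List, Optional, Set

PAGE_SHIFT = 12  # 4 KiB pages


class KernelHeapShadow:
    """Shadow of the guest kernel heap, fed by allocator entry/exit callbacks."""

    def __init__(self) -> None:
        self.starts: List[int] = []        # sorted allocation start addresses
        self.sizes: Dict[int, int] = {}    # start -> size
        self.heap_pages: Set[int] = set()  # page frames handed to the heap
        self.allocator_depth = 0           # > 0 while inside kmalloc, kfree, ...

    def on_allocator_enter(self) -> None:
        self.allocator_depth += 1

    def on_allocator_exit(self) -> None:
        self.allocator_depth -= 1

    def on_heap_page(self, page_addr: int) -> None:
        self.heap_pages.add(page_addr >> PAGE_SHIFT)

    def on_alloc(self, ptr: int, size: int) -> None:
        bisect.insort(self.starts, ptr)
        self.sizes[ptr] = size

    def on_free(self, ptr: int) -> None:
        i = bisect.bisect_left(self.starts, ptr)
        if i < len(self.starts) and self.starts[i] == ptr:
            del self.starts[i]
            del self.sizes[ptr]

    def check(self, addr: int, size: int) -> Optional[str]:
        """Called from the software MMU for every guest memory access."""
        if self.allocator_depth > 0:
            return None  # heap management code may touch allocator metadata
        if (addr >> PAGE_SHIFT) not in self.heap_pages:
            return None  # not heap memory, so not checked
        i = bisect.bisect_right(self.starts, addr) - 1  # last block starting <= addr
        if i >= 0:
            start = self.starts[i]
            if addr + size <= start + self.sizes[start]:
                return None
        return f"out of bounds heap access at {addr:#x} (size {size})"
```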

57
MemChecking the Linux Kernel

The Linux Test Project (LTP) contains 3000 tests
for the Linux Kernel which exercise much of the
core kernel code.
Ran the default test suite on Linux 2.6.26.3
using MemCheck.
MemCheck is slow, but still allows for
interactive sessions.
Fedora Linux takes 30 minutes to boot.
Let the test suite run overnight.
No out of bounds access detected.
Reran the test suite using slub debugging,
which in combination with MemCheck may result in
more bugs being detected.
Again, no out of bounds access detected.
While no immediate bugs were identified in
2.6.26.3, MemCheck may be used against future
kernel releases, possibly as part of an automated
test suite, or used to aid kernel debugging and
development.

58
MemCheck Limitations

Because MemCheck is based on QEMU, very little
hardware is emulated, so most of the Linux driver
code is not tested.
Buffer overflows don't necessarily result in
memory access using invalid heap addresses.
A slab-based allocator fits heap allocations next
to each other, so buffers overflow into adjacent
and valid heap allocations.
A solution is to boot Linux using the slub_debug
kernel option, which separates heap objects using
a redzone.
If MemCheck generates a report from a vulnerable
kernel module, only kernel addresses are given in
the stack trace; no symbolic names are used.

59
MemCheck TODO

A solution to the adjacent buffer problem is to
associate every heap access with its original
allocation by tracking heap pointers using
dynamic taint analysis.
This use of dynamic taint analysis could also be
applied in userland, as a Valgrind checker.
Dynamic taint analysis can also be the basis of
tracking uninitialized variable usage without the
false positives currently associated with
kmemcheck.
Dynamic taint analysis could also be used to
implement garbage collection, which could be used
to identify memory leaks at the exact location of
each leak.
Symbol names for addresses in kernel modules!
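The adjacent-buffer idea above can be sketched by labelling pointers, rather than bytes, with the allocation they came from, and carrying the label through pointer arithmetic as taint would be carried. `TaggedPtr` and `PointerProvenanceChecker` are hypothetical names. The point of the example is that address 0x1020 is perfectly valid heap memory (it belongs to the second buffer), so a plain shadow map would not flag it, but the label shows the pointer was derived from the first buffer.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class TaggedPtr:
    value: int
    alloc_id: int  # the allocation this pointer was derived from (its taint label)


class PointerProvenanceChecker:
    """Check each heap access against the allocation its pointer came from."""

    def __init__(self) -> None:
        self.allocs: Dict[int, Tuple[int, int]] = {}  # id -> (start, size)
        self.next_id = 0

    def kmalloc(self, start: int, size: int) -> TaggedPtr:
        self.next_id += 1
        self.allocs[self.next_id] = (start, size)
        return TaggedPtr(start, self.next_id)

    def kfree(self, p: TaggedPtr) -> None:
        self.allocs.pop(p.alloc_id, None)

    @staticmethod
    def offset(p: TaggedPtr, delta: int) -> TaggedPtr:
        # Pointer arithmetic propagates the label, like taint from source to destination.
        return TaggedPtr(p.value + delta, p.alloc_id)

    def access(self, p: TaggedPtr, size: int) -> Optional[str]:
        if p.alloc_id not in self.allocs:
            return f"use after free at {p.value:#x}"
        start, length = self.allocs[p.alloc_id]
        if not (start <= p.value and p.value + size <= start + length):
            return f"access at {p.value:#x} outside allocation #{p.alloc_id}"
        return None
```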

60
MemCheck Packages

http://silvio.cesare.googlepages.com/ For the
package
http://silviocesare.wordpress.com/ For commentary
on some of MemCheck's internals.

61
Runtime Error Detection Summary

Existing tools for runtime error detection
include Valgrind which detects userland heap
bugs.
Tools for the kernel exist such as kmemcheck
which detects uninitialized variables.
MemCheck is a new tool to detect heap bugs in the
Linux kernel, and operates similarly to Valgrind.