Title: Security Testing
1- Security Testing
- fuzzing
- protocol fuzzing
- model-based testing
- automated reverse engineering
- Erik Poll
- Radboud University Nijmegen
2Testing Ingredients
- Two things are needed to test a SUT (System Under Test):
- a test suite, i.e. a collection of input data
- a test oracle that decides if a test was ok or reveals an error, i.e. some way to decide if the SUT behaves as we want
- A nice and simple test oracle: just seeing if the SUT crashes
- Both defining test suites and test oracles can be a lot of work
- for each individual test case the test oracle may need to be tweaked by specifying exactly what should happen
3Coverage Criteria
- Measures of how good a test suite is
- statement coverage
- branch coverage
- Statement coverage does not imply branch coverage, e.g. for
void f (int x, int y) { if (x > 0) { y++; } y--; }
- statement coverage needs 1 test case, branch coverage needs 2
- More complex coverage criteria exist, e.g. MC/DC (Modified Condition/Decision Coverage), which is used in avionics
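The difference between the two criteria can be checked mechanically. The sketch below is a Python rendering of f from this slide; the `outcomes` parameter and the helper `branch_outcomes` are illustrative additions, not part of the original example. A single test with x > 0 executes every statement, but only a second test with x <= 0 exercises the other outcome of the branch.

```python
def f(x, y, outcomes):
    """Python version of the slide's f, instrumented to record branch outcomes."""
    taken = x > 0
    outcomes.add(taken)      # remember which way the branch went
    if taken:
        y += 1
    y -= 1
    return y


def branch_outcomes(test_inputs):
    """Run f on every (x, y) pair and return the set of branch outcomes seen."""
    outcomes = set()
    for x, y in test_inputs:
        f(x, y, outcomes)
    return outcomes


# One test case reaches every statement but only one branch outcome.
one_test = branch_outcomes([(1, 0)])
# A second test case with x <= 0 is needed for full branch coverage.
two_tests = branch_outcomes([(1, 0), (0, 0)])
```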
4Possible Perverse Effect of Coverage Criteria
- High coverage criteria may discourage defensive programming
void m(File f) {
  if <security_check_fails> { throw (SecurityException) }
  try { <the main part of the method> }
  catch (SomeException) { <take some measures>; throw (SecurityException) }
}
- If the defensive code (highlighted in green on the original slide) is hard to trigger in tests, programmers may be tempted (or forced) to remove it to improve coverage in testing...
5Security testing is HARD, in general
- Normal testing will look at right, wanted behaviour for sensible inputs, and some inputs on borderline conditions
- Security testing also involves looking for the wrong, unwanted behaviour for really silly inputs
- Similarly, normal use of a system is more likely to reveal functional problems (users will complain) than security problems (hackers won't complain)
6Security testing is HARD, in general
[Figure: the set of all possible inputs, with the subset of normal inputs and a single marked input that triggers a security bug]
7JML annotations as test oracle
- Tools for runtime assertion checking of JML annotations can be used when testing
- code is instrumented with checks to test annotations, which throw special exceptions for violations
- effectively, the annotations serve as test oracle
- Benefits
- Test oracle for free: you can test by sending random data
- More precise and detailed feedback: after adding
//@ invariant contents != null;
an application may crash with an Invariant Violation in line 18000 after 1 minute with runtime assertion checking, whereas otherwise it would crash with a NullPointerException in line 12000 after 5 minutes
- pointing to the real origin of the problem, not the eventual effect
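The idea of an annotation acting as test oracle can be imitated outside JML. The Python sketch below is only an analogy: the `Buffer` class, its seeded bug and the `InvariantViolation` exception are hypothetical, and the invariant is checked by hand at the end of each method rather than by an instrumenting tool. Random data is fed in, and the invariant check reports the fault at the point where it is introduced.

```python
import random


class InvariantViolation(Exception):
    """Raised when the class invariant `contents is not None` is broken."""


class Buffer:
    """Hypothetical class under test with invariant: contents is not None."""

    def __init__(self):
        self.contents = []
        self._check_invariant()

    def _check_invariant(self):
        if self.contents is None:
            raise InvariantViolation("contents is None")

    def put(self, item):
        if item < 0:
            self.contents = None          # seeded bug: wipes the buffer
        else:
            self.contents.append(item)
        self._check_invariant()           # oracle: checked on method exit

    def size(self):
        return len(self.contents)         # without the check, the failure would surface here


def random_test(seed, steps=1000):
    """Send random data; return the step at which the invariant broke, or None."""
    rng = random.Random(seed)
    buf = Buffer()
    for step in range(steps):
        try:
            buf.put(rng.randint(-5, 50))
        except InvariantViolation:
            return step
    return None
```

Without the check in `put`, the same bug would only show up later as a `TypeError` in `size`, far from its cause.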
8Symbolic Execution for test suites
- Symbolic execution can be used to generate test suites with good coverage
- Basic idea of symbolic execution:
- instead of giving variables a concrete value (say 42), variables are given a symbolic value (say N), and the program is executed with these symbolic values to see when certain program points are reached
9Symbolic Execution
m(int x, int y) {
  x = x + y;
  y = y - x;
  if (2*y > 8) { .... }
  else if (3*x < 10) { ... }
}
10Symbolic Execution
m(int x, int y) {              // let x = N and y = M
  x = x + y;                   // x becomes N+M
  y = y - x;                   // y becomes M-(N+M) = -N
  if (2*y > 8) { .... }        // taken if 2*(-N) > 8, i.e. N < -4
  else if (3*x < 10) { ... }   // taken if N >= -4 and 3(M+N) < 10
}
- There are tools that, given such sets of constraints, try to produce test data that meets these constraints
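As a rough illustration of producing test data from path constraints, the Python sketch below writes down the constraints for method m and searches a small integer range for inputs satisfying each. The brute-force search stands in for a real constraint solver, the names `PATHS` and `find_input` are illustrative, and the third path ("neither") is added here for completeness.

```python
from itertools import product


def m(x, y):
    """The method from the slide, returning the name of the branch taken."""
    x = x + y
    y = y - x
    if 2 * y > 8:
        return "first"
    elif 3 * x < 10:
        return "second"
    return "neither"


# Path constraints over the symbolic initial values N (for x) and M (for y).
PATHS = {
    "first": lambda n, m_: n < -4,
    "second": lambda n, m_: n >= -4 and 3 * (m_ + n) < 10,
    "neither": lambda n, m_: n >= -4 and 3 * (m_ + n) >= 10,
}


def find_input(constraint, lo=-10, hi=10):
    """Brute-force stand-in for a solver: first (N, M) in range meeting the constraint."""
    for n, m_ in product(range(lo, hi + 1), repeat=2):
        if constraint(n, m_):
            return n, m_
    return None


# One concrete test input per path, giving full branch coverage of m.
test_suite = {name: find_input(c) for name, c in PATHS.items()}
```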
11Symbolic Execution
- Symbolic execution can also be used for program verification
- symbolically execute a method (or piece of code)
- assuming precondition (and invariant) on initial values,
- prove postcondition (and invariant) for final values
12Fuzzing
13Fuzzing
- Fuzzing
- try really long inputs for string arguments to trigger segmentation faults and hence find buffer overflows
- Benefit: can be automated, because a test suite of long inputs can be automatically generated, and the test oracle is trivial: looking if the program crashes
- This original idea has been generalised to other settings
- The general idea of fuzzing: using semi-random, automatically generated test data that is likely to trigger security problems
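A minimal sketch of this loop in Python, under clear assumptions: `parse_name` is a made-up SUT with a fixed 64-slot buffer and no length check, and an uncaught exception plays the role that a segmentation fault would play for native code. The test suite (ever longer strings) is generated automatically and the oracle is simply "did it crash".

```python
def parse_name(data):
    """Hypothetical SUT: copies input into a fixed 64-slot buffer, no length check."""
    buf = [""] * 64
    for i, ch in enumerate(data):
        buf[i] = ch              # fails with IndexError once i reaches 64
    return "".join(buf)


def long_inputs(max_len=4096):
    """Automatically generated test suite: strings of length 1, 2, 4, ..., max_len."""
    n = 1
    while n <= max_len:
        yield "A" * n
        n *= 2


def fuzz(sut, inputs):
    """Trivial oracle: any uncaught exception counts as a crash."""
    crashes = []
    for data in inputs:
        try:
            sut(data)
        except Exception as exc:
            crashes.append((len(data), type(exc).__name__))
    return crashes


crashes = fuzz(parse_name, long_inputs())
```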
14Fuzzing in memory safe languages
- For memory safe languages such as Java or C#, fuzzing can still reveal bugs in a VM, bytecode verifier, or libraries with native code
- E.g., fast graphics libraries often rely on native code
- CVE reference: CVE-2007-0243
Release Date: 2007-01-17
Sun Java JRE GIF Image Processing Buffer Overflow Vulnerability
- Critical: Highly critical
Impact: System access
Where: From remote
- Description: A vulnerability has been reported in Sun Java Runtime Environment (JRE), which can be exploited by malicious people to compromise a vulnerable system. The vulnerability is caused due to an error when processing GIF images and can be exploited to cause a heap-based buffer overflow via a specially crafted GIF image with an image width of 0. Successful exploitation allows execution of arbitrary code.
15File format fuzzing
- Incorrectly formatted files, or corner cases in file formats, can cause trouble
- E.g.
- GIF image with width 0 on previous slide
- Microsoft Security Bulletin MS04-028
- Buffer Overrun in JPEG Processing (GDI+) Could Allow Code Execution
Impact of Vulnerability: Remote Code Execution
Maximum Severity Rating: Critical
Recommendation: Customers should apply the update immediately
- Root cause: a zero sized comment field, without content.
16Fuzzing web-applications?
- Could we fuzz a web application in the hope of finding security flaws?
- SQL injection
- XSS
- ...
- What would be needed?
- test inputs that trigger these security flaws
- some way of detecting if a security flaw occurred
- looking at website response, or log files
17Fuzzing web-applications
- There are many tools to fuzz web-applications
- Spike proxy, HP Webinspect, AppScan, WebScarab, Wapiti, w3af, RFuzz, WSFuzzer, SPI Fuzzer, Burp, Mutilidae, ...
- Some fuzzers crawl a website, generating traffic themselves, other fuzzers modify traffic generated by some other means.
- As usual, there will be false positives & negatives, e.g.
- false negative for SQL injection due to not recognizing some SQL database errors
- false positives for XSS due to signalling a correctly quoted echoed response as XSS
- Frank van der Loo, Comparison of penetration testing tools for web applications, MSc thesis
18Protocol Fuzzing
- Protocol fuzzing based on known protocol format
- i.e. format of packets or messages
- Typical things to try in protocol fuzzing:
- trying out many/all possible values for specific fields
- esp. undefined values, or values Reserved for Future Use (RFU)
- giving incorrect lengths, lengths that are zero, or payloads that are too short/long
- Tools for protocol fuzzing exist, e.g. SNOOZE
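To make the list of things to try concrete, here is a small Python sketch for a made-up message format: one byte of message type, one byte of length, then the payload. The format, `encode` and `fuzz_cases` are assumptions for illustration and do not correspond to any real protocol or tool. The generator walks through all values of the type field and then produces messages whose length field is zero, too large or too small, plus an empty and an over-long payload.

```python
import struct


def encode(msg_type, payload, length=None):
    """Encode a message of the hypothetical format: type (1 byte), length (1 byte), payload."""
    if length is None:
        length = len(payload)             # honest length unless overridden
    return struct.pack("BB", msg_type, length) + payload


def fuzz_cases(payload=b"hello"):
    """Yield fuzzed messages for the hypothetical format."""
    # Every possible value of the type field, including undefined ones.
    for msg_type in range(256):
        yield encode(msg_type, payload)
    # Incorrect length fields around a valid type.
    yield encode(1, payload, length=0)                  # zero length, non-empty payload
    yield encode(1, payload, length=len(payload) + 10)  # claims more data than sent
    yield encode(1, payload, length=len(payload) - 1)   # claims less data than sent
    # Payloads that are too short or too long.
    yield encode(1, b"")
    yield encode(1, b"A" * 255)


cases = list(fuzz_cases())
```

Each case would then be sent to the implementation under test while watching for crashes or unexpected responses.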
19Example GSM protocol fuzzing
- GSM is an extremely rich & complicated protocol
20SMS message fields
Field                              Size
Message Type Indicator             2 bit
Reject Duplicates                  1 bit
Validity Period Format             2 bit
User Data Header Indicator         1 bit
Reply Path                         1 bit
Message Reference                  integer
Destination Address                2-12 byte
Protocol Identifier                1 byte
Data Coding Scheme (DCS)           1 byte
Validity Period                    1 byte/7 bytes
User Data Length (UDL)             integer
User Data                          depends on DCS and UDL
21Example GSM protocol fuzzing
- Lots of stuff to fuzz!
- We can use a USRP
- with open source cell tower software (OpenBTS)
- to fuzz phones
- Mulliner et al, SMS of Death: from analyzing to attacking mobile phones on a large scale
- Brinio Hond, Fuzzing the GSM protocol, MSc thesis
22Example GSM protocol fuzzing
- Fuzzing SMS layer of GSM reveals weird
functionality in GSM standard and in phones
23Example GSM protocol fuzzing
- Fuzzing SMS layer of GSM reveals weird functionality in GSM standard and on phones
- e.g. possibility to send faxes (!?)
- Only way to get rid of this icon: reboot the phone
you have a fax!
24Example GSM protocol fuzzing
- Malformed SMS text messages showing raw memory
contents, rather than content of the text message
25Example GSM protocol fuzzing
- Lots of success to DoS phones: phones crashing, disconnecting from the network, or stopping accepting calls
- e.g. requiring reboot or battery removal to restart, to accept calls again, or to remove weird icons
- after reboot, the network might redeliver the SMS message, if no acknowledgement was sent before crashing
- But: not all these SMS messages could be sent over a real network
- There is not always a correlation between problems and phone brands & firmware versions
- how many implementations of the GSM stack does Nokia have?
- The scary part: what would happen if we fuzz base stations...
26Example fuzzing e-passports
- e-passports implement a protocol to prevent giving any info to a passive eavesdropper or active attacker
- correct protocol runs don't leak info to an eavesdropper
- Fuzzing unexpected but correctly formatted instructions
- leaks a unique fingerprint per implementation, and hence (almost) unique per country
- for Australian, Belgian, Dutch, French, German, Greek, Italian, Polish, Spanish, Swedish passports
- Here we don't fuzz to crash, but to see if there is information leakage
- Henning Richter et al., Fingerprinting passports, NLUUG 2009
27State-based Protocol Fuzzing
- Instead of fuzzing the content of individual messages,
- we can also fuzz the order of messages
- using protocol state-machine to
- reach an interesting state in the protocol and then fuzz content of messages there
- fuzz the order of messages to discover effect of strange sequences
28State-based Protocol Fuzzing
- Most protocols have different types of messages, which should come in a particular order
- We can fuzz a protocol by trying out the different types of messages in all possible orders
- This can reveal loop-holes in the application logic
- Essentially this is a form of model-based testing, where we automatically test if an implementation conforms to a model
- Tools for this: Peach, jTor
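Trying message types in all possible orders is easy to sketch. In the Python example below everything is hypothetical: the four message names, the expected order in `model_accepts`, and the implementation `buggy_sut_accepts`, which has a seeded flaw that lets AUTH through before KEYEX. All sequences up to a given length are enumerated and any sequence on which implementation and model disagree is reported.

```python
from itertools import product

MESSAGES = ["HELLO", "KEYEX", "AUTH", "DATA"]   # hypothetical message types


def model_accepts(seq):
    """Hypothetical spec: HELLO, then KEYEX, then AUTH, then any number of DATA."""
    expected = ["HELLO", "KEYEX", "AUTH"]
    for i, msg in enumerate(seq):
        want = expected[i] if i < len(expected) else "DATA"
        if msg != want:
            return False
    return True


def buggy_sut_accepts(seq):
    """Hypothetical implementation that forgets to require KEYEX before AUTH."""
    state = 0
    for msg in seq:
        if state == 0 and msg == "HELLO":
            state = 1
        elif state == 1 and msg == "KEYEX":
            state = 2
        elif state == 1 and msg == "AUTH":
            state = 3                     # seeded flaw: key exchange skipped
        elif state == 2 and msg == "AUTH":
            state = 3
        elif state == 3 and msg == "DATA":
            state = 3
        else:
            return False
    return True


def fuzz_orders(max_len=3):
    """Try every message sequence up to max_len; collect disagreements with the model."""
    findings = []
    for n in range(1, max_len + 1):
        for seq in product(MESSAGES, repeat=n):
            if buggy_sut_accepts(seq) != model_accepts(seq):
                findings.append(seq)
    return findings


findings = fuzz_orders()
```

The number of sequences grows exponentially with the length, so real tools bound the length or use the state machine to prune.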
29Protocol Complexity
- NB most real protocols are much more complicated than the ones you study in Verification of Security Protocols
- Essence of SSH transport layer
- C -> S : NC
- S -> C : NS
- C -> S : exp(g,X)
- S -> C : k_S.exp(g,Y).{H}_inv(k_S)
with K = exp(exp(g,X),Y), H = hash(NC.NS.k_S.exp(g,X).exp(g,Y).K)
- C -> S : {XXX}_KCS
with SID = H, KCS = hash(K.H.c.SID)
- S -> C : {YYY}_KSC
with SID = H, KSC = hash(K.H.d.SID)
30Protocol Complexity
- NB most real protocols are much more complicated than the ones you study in Verification of Security Protocols
- Essence of SSH transport layer
- C -> S : NC
- S -> C : NS
- C -> S : exp(g,X)
- S -> C : k_S.exp(g,Y).{H}_inv(k_S)
with K = exp(exp(g,X),Y), H = hash(NC.NS.k_S.exp(g,X).exp(g,Y).K)
- C -> S : {XXX}_KCS
with SID = H, KCS = hash(K.H.c.SID)
- S -> C : {YYY}_KSC
with SID = H, KSC = hash(K.H.d.SID)
[Figure: real SSH transport layer, excluding all the error transitions back to the initial state]
31Model based testing
- General framework for automating testing
- make a formal model M of (some aspect of) the SUT
- fire random inputs to M and the SUT
- look for differences in the response
- Such a difference means an error in the SUT, or
the model...
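The framework can be sketched in a few lines of Python. The model `MODEL`, the input alphabet and the `Sut` class with its seeded bug are all made up for illustration. The same random input sequence is run through the model and the SUT, and the first sequence on which the responses differ is returned.

```python
import random

# Hypothetical formal model M: (state, input) -> (next state, output)
MODEL = {
    ("locked", "pin_ok"): ("open", "OK"),
    ("locked", "read"): ("locked", "DENIED"),
    ("open", "pin_ok"): ("open", "OK"),
    ("open", "read"): ("open", "DATA"),
}
INPUTS = ["pin_ok", "read"]


def run_model(inputs):
    """Responses of the model to a sequence of inputs, starting from 'locked'."""
    state, outputs = "locked", []
    for inp in inputs:
        state, out = MODEL[(state, inp)]
        outputs.append(out)
    return outputs


class Sut:
    """Hypothetical SUT with a seeded bug: the third read returns DATA even when locked."""

    def __init__(self):
        self.unlocked = False
        self.reads = 0

    def step(self, inp):
        if inp == "pin_ok":
            self.unlocked = True
            return "OK"
        self.reads += 1
        if self.unlocked or self.reads >= 3:
            return "DATA"
        return "DENIED"


def run_sut(inputs):
    """Responses of a freshly reset SUT to a sequence of inputs."""
    sut = Sut()
    return [sut.step(inp) for inp in inputs]


def model_based_test(runs=200, length=6, seed=0):
    """Fire random inputs at model and SUT; return the first sequence where they differ."""
    rng = random.Random(seed)
    for _ in range(runs):
        inputs = [rng.choice(INPUTS) for _ in range(length)]
        if run_model(inputs) != run_sut(inputs):
            return inputs
    return None


counterexample = model_based_test()
```

A reported difference still has to be inspected by hand, since the fault may lie in the model rather than the SUT.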
32Example model based testing of e-passport
[Figure: test tool connected to both the model and the SUT]
- Test tool sends the same random sequence of commands to the model and the SUT, and checks if the responses match
33Example model based testing of MIDPSSH
- MIDPSSH: implementation of SSH for Java-enabled feature phones
- Implementors of MIDPSSH forgot to track the protocol state: any sequence of messages would be accepted
- So a Man-in-the-Middle attacker could e.g. ask the client for a username/password before a session key had been agreed
[Figure: state machine model of SSH next to the state machine implemented in MIDPSSH, which has a transition labelled "any message"]
Aleksy Schubert et al, Verifying an implementation of SSH, WITS 2007
34Reverse Engineering
35In the other direction
- Instead of using protocol knowledge when testing
- in protocol fuzzing or model-based fuzzing
- we can also use testing to gain knowledge about a protocol
- or a particular implementation of a protocol
- In order to
- analyse your own code and hunt for bugs, or
- reverse-engineer someone else's unknown protocol,
- e.g. a botnet,
- to fingerprint or to analyse (and attack) it
36What to reverse engineer?
- Different aspects that can be learned
- timing/traffic analysis
- protocol formats
- i.e. format of protocol packets
- e.g. using Discoverer, Dispatcher, Tupni, ...
- protocol state-machine
- e.g. using LearnLib
- both protocol format & state-machine
- e.g. using Prospex
37How to reverse engineer?
- passive vs active learning
- i.e. passive observing or active testing
- active learning involves a form of fuzzing
- active learning is harder, as it requires more software in a test harness that produces meaningful data
- these approaches learn different things: passive learning produces statistics on normal use, active learning will more aggressively try out strange things
- black box vs white box
- i.e. only observing in/output or also looking inside running code
38Reverse engineering encrypted traffic?
- Can we reverse engineer protocol formats if traffic is encrypted?
- say for a botnet
- Trace the encrypted data through the code, to see where it gets decrypted, and then look at the parsing and case distinctions made on the buffer containing the decrypted data
- Such white-box analysis of encrypted traffic, by looking at handling of data after decryption, is done by ReFormat and TaintScope
39Active learning with Angluin's L* algorithm
- Basic idea: compare a deterministic system's response to
- a
- b a
- If the response is different, then [state diagram omitted]
- otherwise ? [state diagram omitted]
40Active learning with L*
Implemented in LearnLib library
The learner builds hypothesis H of what the real system M is
[Figure: Learner (H) sends reset and input to Teacher (M), which returns output; the Learner also asks the equivalence query "M = H?", answered with yes or a counterexample]
Equivalence can only be approximated in a black box setting, by doing model-based testing to see if a difference can be detected
41Learning set-up for EMV banking cards
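[Figure: Learner (H) exchanges abstract instructions and responses with a test harness, which exchanges concrete instructions and responses with the Teacher (M). Abstract level: instruction INS, answered by a 2 byte status word SW. Concrete level: INS + args, answered by data + SW]
Fides Aarts et al, Formal models of banking cards for free, SECTEST 2013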
42Test harness for EMV
- Our test harness implements standard EMV instructions, e.g.
- SELECT (to select application)
- INTERNAL AUTHENTICATE (for a challenge-response)
- VERIFY (to check the PIN code)
- READ RECORD
- GENERATE AC (to generate application cryptogram)
- LearnLib then tries to learn all possible combinations
- Most commands with fixed parameters, but some with different options
43Maestro application on Volksbank bank card: raw result
44Maestro application on Volksbank bank card: merging arrows with identical outputs
45Maestro application on Volksbank card: merging all arrows with same start & end state
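The two simplification steps named in these slide titles can be expressed as grouping operations on a list of transitions. The Python sketch below is a guess at what such post-processing could look like, not the procedure actually used for the diagrams. The transition list `RAW` is invented example data and does not describe any real card.

```python
from collections import defaultdict


def merge_identical_outputs(transitions):
    """Merge arrows that share start state, end state and output; collect their inputs."""
    grouped = defaultdict(list)
    for src, inp, out, dst in transitions:
        grouped[(src, dst, out)].append(inp)
    return {key: sorted(inps) for key, inps in grouped.items()}


def merge_same_endpoints(transitions):
    """Merge all arrows that share start and end state into one arrow with many labels."""
    grouped = defaultdict(list)
    for src, inp, out, dst in transitions:
        grouped[(src, dst)].append(f"{inp}/{out}")
    return {key: sorted(labels) for key, labels in grouped.items()}


# Invented raw learner output: (start state, instruction, status word, end state)
RAW = [
    ("init", "SELECT", "9000", "selected"),
    ("selected", "READ RECORD", "9000", "selected"),
    ("selected", "SELECT", "9000", "selected"),
    ("selected", "VERIFY", "6985", "selected"),
]

by_output = merge_identical_outputs(RAW)
by_endpoints = merge_same_endpoints(RAW)
```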
46Formal models of banking cards for free!
- Experiments with Dutch, German and Swedish banking and credit cards
- Learning takes between 9 and 26 minutes
- Editing by hand to merge arrows and give sensible names to states
- could be automated
- Limitations
- We do not try to learn the response to an incorrect PIN, as cards would quickly block...
- We cannot learn about one protocol step which requires knowledge of the card's secret 3DES key
- We would also like to learn some integer parameters used in the protocol
- No security problems found, but interesting insight in implementations
47SecureCode application on Rabobank card
used for internet banking, hence entering PIN
with VERIFY obligatory
48Understanding & comparing implementations
- Are both implementations correct & secure? And compatible?
- Presumably they both passed a Maestro-approved compliance test suite...
[Figures: Volksbank Maestro implementation and Rabobank Maestro implementation]
49Differences between TLS implementations (work in progress)
[Figures: GnuTLS and OpenSSL]
50Using such protocol state diagrams
- Analysing the models by hand, or with model checker, for flaws
- to see if all paths are correct & secure
- Fuzzing or model-based testing
- using the diagram as basis for deeper fuzz testing
- e.g. fuzzing also parameters of commands
- which Erik Boss did for SSH
- Program verification
- proving that there is no functionality beyond that in the diagram, which using testing you can never establish
- which we did for MIDPSSH, using ESC/Java2
- Using it when doing a manual code review
- which we did for OpenSSH
51Learning human interfaces?
- We would like to extend such learning to also take into account the human user interface (keyboard & display)
- Then reverse engineering the state diagram of an ATM or smartcard reader could be automated
- E.g., security bug in ABN-AMRO's e.dentifier2 could have been found by automated learning
Arjan Blom et al, Designed to Fail: a USB-connected reader for online banking, NORDSEC 2012
52Conclusions
- Various forms of fuzzing are great techniques to spot some security flaws
- More advanced forms of (protocol) fuzzing and automated reverse engineering (or learning) are closely related
- State machines are a great specification formalism
- easy to draw on white boards, typically omitted in official specs
- and you can extract them for free from implementations
- using standard, off-the-shelf, tools like LearnLib
- Useful for security analysis of protocol implementations
- for reverse engineering, fuzz testing, code reviews, or formal program verification
53Questions?