Title: Black Ops 2006 Viz Edition CCC 2006
1Black Ops 2006: Viz Edition, CCC 2006
- Dan Kaminsky
- Director Of Penetration Testing
- IOActive
2Thanks and No Thanks
- Thank you to Swissotel Amsterdam, who provided a net connection with which I could actually finish these slides
- No thanks to Delta Hotel of Amsterdam, which put a TV on a really weak shelf.
- I suppose it's my fault I put my laptop underneath.
- The Star System is officially meaningless
3Who Am I?
- Coauthor of several book series
- Hack Proofing Your Network
- Stealing The Network
- Formerly of Cisco and Avaya
- Presently partnering with IOActive
- One of the Blue Hat Hackers that has been auditing Windows Vista
- Been doing talks for six years now
- TCP/IP, DNS, MD5, SSH, etc.
4What Are We Here To Do?
- Break TCP/IP A Little More
- Not in the documentation
- It's for a good cause :)
- Analyze Data Linguistically
- Make Pretty Pretty Pictures!
5For Various Definitions Of Pretty: Visual Bindiff
6The Ancient Tongue: TCP/IP
- Can't all be about pretty pictures :)
- A new problem has popped up: network oligopolies are threatening to install firewalls that limit or eliminate bandwidth on a per-company basis
- Their own media services might be fast, others will be slow
- Their own VPN services might be fast, others will be slow
- Question: Is it possible to detect and locate devices violating network neutrality?
7What's The Closest Tool We Have?
- Firewalk
- Mike Schiffman's Firewall Analysis Tool
- Packets elicit an ICMP Time Exceeded error if they reach a router with TTL=0
- TTL is decremented by one for each hop, so if you start low, you can trace the route to a host
- A firewalled packet won't live long enough to reach TTL=0
- So you can locate the firewall, and divine things about its ruleset, based on when your packets stop getting ICMP Time Exceeded
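The TTL logic above can be sketched as a small simulation. This is not Firewalk's code and it sends no packets: the route, the filtering hop, and the port-80-only ruleset are all hypothetical, and the filter is assumed to answer probes that expire at it while silently dropping disallowed traffic that would travel further.

```python
from typing import Callable, List, Optional

def probe(path: List[str], filter_hop: Optional[int],
          allowed: Callable[[int], bool], port: int, ttl: int) -> Optional[str]:
    """Simulate one probe; return the router that answers with ICMP Time
    Exceeded, or None if no error comes back."""
    for hop, router in enumerate(path, start=1):
        ttl -= 1                      # every hop decrements the TTL by one
        if ttl == 0:
            return router             # TTL expired here: Time Exceeded is sent
        if hop == filter_hop and not allowed(port):
            return None               # silently filtered before the TTL ran out
    return None                       # reached the far end without expiring

def locate_filter(path: List[str], filter_hop: Optional[int],
                  allowed: Callable[[int], bool], port: int) -> Optional[int]:
    """Raise the TTL one hop at a time; report the last hop that still
    answered before the replies stopped (None if they never stopped)."""
    for ttl in range(1, len(path) + 1):
        if probe(path, filter_hop, allowed, port, ttl) is None:
            return ttl - 1
    return None

PATH = ["r1", "r2", "fw", "r4", "r5"]    # hypothetical five-hop route

def only_web(port: int) -> bool:         # hypothetical ruleset: allow port 80 only
    return port == 80

print(locate_filter(PATH, 3, only_web, 22))   # traffic the filter drops
print(locate_filter(PATH, 3, only_web, 80))   # traffic the filter passes
```

Comparing the two calls is the "divine things about its ruleset" step: the same path goes quiet for one port and not for the other.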
8Limitations of Firewalking
- But Firewalk tells us what, not who, is blocked, and it tells us nothing about who is allowed to go fast, and who is made to go slow
- Suddenly, we devolve to a much older question: Is it possible to find out that a target firewall is, or is not, blocking against or accepting traffic from an arbitrary IP address?
9TCP Does Speed Measurement
- TCP speed analysis is done blindly
- Endpoints do not negotiate with one another
- Everyone sends their packets, routers route what they will. Endpoints need to adjust to what the routers are willing to pass.
- Routers communicate with endpoints by dropping their packets
- Can we combine this router backchannel w/ Firewalk?
10In From The Side
- What causes packets to drop?
- Too many packets
- What are we going to do?
- Send too many packets
- Two channels are set up
- A primary channel, which drops packets at some known rate
- A secondary channel, whose purpose is to interfere (or not) with the primary channel
- When the secondary interferes with the primary, we get feedback via the primary channel
- The traffic composing the secondary channel can come from anywhere, be composed of anything, and can be TTL'd just like in a normal firewalk.
11The TTL Channel
- Normally, you don't know which router along a path is dropping your packets
- If you are the source of the drop-inducing packets, you can control how far your noise goes out; thus, you can discover which router is hitting its limit / censoring your net connection
12Scorchmarking
- Why Scorchmarking?
- Routers are burning packets; those that get through might have a scorch mark or two :)
- Basic Model
- Client downloads a file from a site, at some given speed negotiated via TCP.
- At the same time, traffic is injected from different IP addresses. This should cause drops.
- If it doesn't, the network is either penalizing the primary channel (easy to drop against) or rewarding the secondary channel (resilient to drops)
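A toy model of the two-channel idea, combined with the TTL channel from the previous slide. Everything here is an assumption made for illustration: links are reduced to a packets-per-tick capacity, an overloaded link drops proportionally, the noise is not itself thinned by earlier drops, and `filtered_from` is a hypothetical knob marking the first link the noise never reaches because something upstream blocks its source.

```python
from typing import List, Optional

def primary_loss(capacities: List[float], primary: float, noise: float,
                 noise_ttl: int, filtered_from: Optional[int] = None) -> float:
    """Fraction of the primary channel lost along the path."""
    delivered = primary
    for hop, capacity in enumerate(capacities, start=1):
        # Noise crosses this link only if its TTL reaches this far and no
        # filter has removed it earlier on the path.
        reaches = hop <= noise_ttl and (filtered_from is None or hop < filtered_from)
        load = delivered + (noise if reaches else 0.0)
        if load > capacity:
            delivered *= capacity / load   # proportional drop on overload
    return 1.0 - delivered / primary

def first_interfering_ttl(capacities: List[float], primary: float, noise: float,
                          filtered_from: Optional[int] = None) -> Optional[int]:
    """Smallest noise TTL at which the primary channel starts losing packets."""
    baseline = primary_loss(capacities, primary, 0.0, 0)
    for ttl in range(1, len(capacities) + 1):
        if primary_loss(capacities, primary, noise, ttl, filtered_from) > baseline + 1e-9:
            return ttl
    return None

LINKS = [100.0, 100.0, 10.0, 100.0]   # hypothetical path; link 3 is the choke point

print(first_interfering_ttl(LINKS, 8.0, 50.0))                   # noise from an unfiltered source
print(first_interfering_ttl(LINKS, 8.0, 50.0, filtered_from=2))  # noise from a blocked source
```

A source whose noise never disturbs the primary channel, at any TTL, is the interesting result: its traffic is not arriving at the choke point.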
13Advanced Scorchmarking 0
- Having to depend on a client is lame
- Wouldn't it be nice if we could scan the Internet for these servers?
- What fundamental service is a receiving client providing?
- It is acknowledging our traffic, letting us know how much it received, and how many milliseconds it took to receive it
- Aren't there other ways we could extract the same data from hosts?
14Advanced Scorchmarking 1
- What else will acknowledge receiving traffic from us?
- TCP Servers
- Sting, from Stefan Savage, used this to great effect
- DNS Servers :)
- Routers.
- Supposedly, routers won't send more than a certain number of ICMP Time Exceeded packets per second
- In reality, they seem to ICMP Time Exceeded ACK however much you throw at them
- Even if they didn't, you could use the difference in ICMP Time Exceeded rates between Primary and Secondary channel to determine whether interference was showing up.
- Everyone's got a NAT, so you can query everyone for whether certain sorts of traffic are being blocked to them
15Advanced Scorchmarking 2
- So, yes.
- You can scan for violations of Network Neutrality
- You can find networks that are blocking or passing particular IP ranges
- It's not exactly efficient though
- Neutrality violations are easier to find than the standard FW case
- Firewalls are normally between the WAN and the LAN (Slow Net - FW - Fast Net)
- Neutrality violators are mid-WAN (Slow Net - FW - Slow Net - Fast Net)
- Easier to overload the slow net after the firewall
- Boxes with max TTL rates override this
16Speed Limits
- Fundamental Problem: Have to max out bandwidth on the link to trigger the backchannel
- No packets dropping, no data
- Means you have to DoS a link: not scalable/legal
- Potential Solution: Find capped acknowledgers
- The mythical ICMP Time Exceeded rate limit works well
- Primary and Secondary channel both eliciting ITEs
- When the secondary channel gets a packet through, it takes up one of the primary channel's slots
- ITE is perfect, since you can TTL limit any packet
- Depends on the firewall passing the primary's ITEs
- Maybe Linux / NATs actually implement rate limits?
- Another option: What if we have code on the client?
17Windows Media Player: More Than Just DRM. Really!
- Bulk Transfer: RTP
- Runs over Unicast UDP
- Yes, the same Unicast UDP that penetrates NAT so well!
- Flow Control / Quality Monitoring: RTCP
- No technical reason RTCP needs to go back to the same address that the RTP stream is coming from
- So: We pretend to provide media streams from all sorts of sites, and use WMP to collect traffic stats for us :)
- It might work
18Symbols
- But this is not to be a talk on TCP/IP hackery
19SSH's Hex Problem
- ssh dan@blah
The authenticity of host 'blah (1.2.3.4)' can't be established.
RSA key fingerprint is 09:a9:b1:99:84:17:7d:ba:c6:55:46:5a:17:f8:83:01.
Are you sure you want to continue connecting (yes/no)?
- 09:a9:b1... am I supposed to do something with this?
- Yes. According to SSH's design, you're supposed to reject the proposed fingerprint if it looks unfamiliar. (Seriously.)
- The Two Billion SSH Key attack (by ADM) just comes up with 2B keys and emits the visibly closest key. It works.
20Hex sucks. A better mapping must be possible
21Cryptomnemonics
- There are three classes of memory, at least to the degree that is useful in cryptography
- Rejection: "I've never seen that before"
- Recognition: "It's that one, not that other one"
- Recollection: "Let me describe it to you."
- SSH just requires rejection: "What? That's new."
- The hex domain clearly does not work. What else is available?
- To restate the problem: Humans do not operate on hexadecimal symbols effectively. Are there any other symbol sets we can use?
22Alternative Symbolic Domains
- Abstract Art via Déjà Vu
- Calculated faces via Passfaces
- Both have attempted to address limited capacity for recollection by moving authentication to a recognition problem
- But recognition offers only a limited number of bits: 9^5 = 59049
- This is OK, since Passfaces is online and thus can lock a user out before 59K attempts are up
- We are not online, but we only need to reject, not recognize, and certainly not recollect
23The Nymic Domain: Names Are Identity Symbols
- Humans don't remember arbitrary bits, but we do remember stories.
- Stories change (the bits shift over time), but names stay the same
- Can we map the 160 bits SSH needs us to accept or reject, to names?
- Take 512 male names: 9 bits of info per male name
- Take 1024 female names: 10 bits of info per female name
- Take 8192 last names: 13 bits of info per last name
- 9+10+13=32. 5 couples = 160 bits
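The bit budget above can be sketched as follows. The name lists are placeholders generated on the spot (real lists of 512, 1024 and 8192 names are assumed), the bit order within each 32-bit word is an arbitrary choice, and using a SHA-1 digest as the 160-bit input is an assumption for the example.

```python
import hashlib
from typing import List

# Placeholder name lists (hypothetical): stand-ins for real name tables.
MALE = [f"m{i:03d}" for i in range(512)]       # 2**9 entries  -> 9 bits
FEMALE = [f"f{i:04d}" for i in range(1024)]    # 2**10 entries -> 10 bits
LAST = [f"l{i:04d}" for i in range(8192)]      # 2**13 entries -> 13 bits

def couples(fingerprint: bytes) -> List[str]:
    """Map a 160-bit value to five 'male and female last' couples."""
    if len(fingerprint) != 20:
        raise ValueError("expected exactly 160 bits (20 bytes)")
    out = []
    for i in range(0, 20, 4):
        word = int.from_bytes(fingerprint[i:i + 4], "big")   # 32 bits per couple
        male = MALE[word >> 23]                 # top 9 bits
        female = FEMALE[(word >> 13) & 0x3FF]   # next 10 bits
        last = LAST[word & 0x1FFF]              # low 13 bits
        out.append(f"{male} and {female} {last}")
    return out

# Example input: the SHA-1 of some stand-in key material.
for line in couples(hashlib.sha1(b"example host key").digest()):
    print(line)
```

Because every bit of the input selects part of a name, two different fingerprints always differ in at least one name.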
24Demo
- ssh dan@blah
Key Data:
julio and epifania dezzutti
luther and rolande doornbos
manual and twyla imbesi
dirk and cuc kolopajlo
omar and jeana hymel
The authenticity of host 'blah (1.2.3.4)' can't be established.
Are you sure you want to continue connecting (yes/no)?
- It is critical that the Key Data be shown every time there's a connection. The user must become familiar with the characters in the story.
- This actually seems to work.
25What about Bubble Babble?
- ssh-keygen.exe -B -f id_dsa.pub
1024 xegoz-tosys-vusik-masar-cifyc-cyled-kikih-zukuf-nypok-sezyt-noxax id_dsa.pub
- Problem: Humans do not remember arbitrary sequences of syllables well
- Names are special sequences; sharing with pre-existing language logic should improve retention
- Still, names are arbitrary (Boutros Boutros-Ghali); could merge approaches: Xegoz and Tosys Visuk, Masar and Cifyc Cyled, Kikih and Zukuf Nypok, Sezyt Noxax
- Requires testing
26Inverting The Symbol Flow: Passnyms
- Suppose you have 8 characters with one of 64 characters in each slot.
- al7$13nM
- 64 = 2^6, so (2^6)^8 = 2^48: 48 bits
- Lowercase A, lowercase L, seven, dollar sign, one, three, lowercase N, uppercase M
- This is twenty three syllables!
- What if, instead, you typed:
- dirk and cuc kolopajlo omar and jeana hymel
- 64 bits of entropy, 14 syllables, can be spell checked as the user types it in
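The entropy comparison above is plain arithmetic; a short check, assuming uniformly random choices in both schemes and the 9/10/13-bit name lists from the Nymic Domain slide:

```python
import math

def entropy_bits(alphabet_size: int, length: int) -> float:
    """Bits of entropy in `length` independent, uniform picks."""
    return length * math.log2(alphabet_size)

password_bits = entropy_bits(64, 8)    # 8 slots, 64 choices per slot
couple_bits = 9 + 10 + 13              # male + female + last name
passnym_bits = 2 * couple_bits         # two couples typed as a passphrase

print(password_bits, passnym_bits)
```

Two couples carry 16 more bits than the eight-character password while being pronounceable.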
27It Is Easier To Interface With Systems When Symbols Align
- Hacking is a form of interfacing :)
- We can break things with garbage symbols
- Dumb Fuzzing: Take a file, flip some bits, see what happens
- We can break more things with meaningful symbols used in unexpected ways
- Smart Fuzzing: Take a file, understand its internal structure, fuzz the structure, see what happens
- Dumb fuzzing is very easy.
- Smart fuzzing is very labor intensive; it requires smart people, maybe specifications.
- Is there any way we can automatically discover symbol sets?
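How easy dumb fuzzing is can be seen in a few lines. This is a generic bit flipper written for illustration, not any particular tool; the function name and the seed parameter are assumptions.

```python
import random

def flip_bits(data: bytes, n_flips: int, seed: int = 0) -> bytes:
    """Return a copy of `data` with `n_flips` distinct, randomly chosen bits inverted."""
    rng = random.Random(seed)          # seeded so a crashing case can be replayed
    buf = bytearray(data)
    for pos in rng.sample(range(len(buf) * 8), n_flips):
        buf[pos // 8] ^= 1 << (pos % 8)
    return bytes(buf)

original = b"GIF89a" + bytes(26)       # stand-in for a real input file
mutated = flip_bits(original, 4, seed=1)
print(original.hex())
print(mutated.hex())
```

The flipper knows nothing about the format, which is exactly its limitation: most flips land on bytes the parser never interprets as structure.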
28File Formats Are Languages
- Kids don't get documentation when they learn new languages. They just pick 'em up.
- They can do this because they actually design all sorts of internal structure and redundancy into them.
- Children make languages.
- Adults make working languages.
- Programmers make barely working languages.
- Let's autodiscover them!
29N'est-ce pas Non Sequitur
- Sequitur: Linear Time Pattern Finder
- Creates hierarchical Context Free Grammars from arbitrary input
- Compression algorithm in which you can look under the covers to see what's going on
- Created by Craig Nevill-Manning as his PhD thesis a decade ago
- He's now Chief Research Scientist at Google
30Syntax Highlighting For Hex Dumps
- Trivial Algorithm: In a hierarchical grammar, each byte requires traversing to a certain depth in order to recover the raw literal.
- Color each byte by how deep in the tree you have to go.
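A sketch of the depth-coloring idea. Note the swap: this does not implement Sequitur. It builds a hierarchical grammar with a much simpler offline scheme (repeatedly replacing the most frequent adjacent pair, in the style of Re-Pair), which is enough to show how a per-byte depth falls out of any such grammar. The function names are made up for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple

def build_grammar(data: bytes) -> Tuple[List[int], Dict[int, Tuple[int, int]]]:
    """Replace the most frequent adjacent pair with a new symbol until no
    pair repeats. Symbols below 256 are literal bytes; 256+ are rules."""
    seq = list(data)
    rules: Dict[int, Tuple[int, int]] = {}
    next_id = 256
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        rules[next_id] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_id += 1
    return seq, rules

def byte_depths(seq: List[int], rules: Dict[int, Tuple[int, int]]) -> List[int]:
    """For each input byte, how many rules must be expanded to reach it."""
    depths: List[int] = []
    def walk(symbol: int, depth: int) -> None:
        if symbol < 256:
            depths.append(depth)
        else:
            left, right = rules[symbol]
            walk(left, depth + 1)
            walk(right, depth + 1)
    for symbol in seq:
        walk(symbol, 0)
    return depths

sample = b"abcabcabcXYZ"
top, grammar = build_grammar(sample)
print(byte_depths(top, grammar))   # one depth per input byte; map to colors to render
```

Bytes inside repeated structure end up deep in the tree; one-off bytes stay at depth zero, which is what makes the coloring useful on a hex dump.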
31BLUR-O-VISION
32What's Actually Going On?
- (0) -> (73),b4,(73),ca,(73),e6,(73),02,(74),18,(74),2c,(74),4a,(74),5c,(74),6e,(74),80,(74),98,(74),b0,(74),c8,(74),e8,(74),fc,(74),10,(75),20,(75),30,(75),40,(75),50,(75),64,(75),82,(75),90,(75),9e,(75) ... (84),d6,(84),ee,(84),0c,(85),28,(85),3c,(85),4e,(85),66,(85),7e,(85),8c,(85),9e,(85),ac,(85),be,(85),ca,(85),ea,(85),08,(86),26,(86),44,(86),56,(86),6a,(86),7c,(86),8a,(86),a6,(86),b6,(86),cc,(86),de,(86),02,(87)
- Repeated sequence, single byte literal. Repeated sequence, single byte literal. Rinse, lather, repeat.
33Intersymbol Link Discovery
- Turns code on left into symbolic set on right; it's easy then to link the symbols together, as per the graph.
- This works for non-textual data
- Sequitur imputes meaningful symbols from arbitrary input data
34Context Free Grammar Fuzzer: THE CFG9000
- Reduce input data to a stream of symbols
- Fuzz data at the symbol level, rather than at pure bytes
- Shuffle
- Drop
- Repeat
- Uniform Corrupt
- Consistently corrupt all instances of a given symbol
- Sequitur is not necessarily the best way to generate a grammar.
- Doesn't handle recursion, common in genomic data
- Suffix trees may yield better output
- Sequitur may scale better (100MB input not an issue)
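The four symbol-level operations listed above, sketched against a deliberately flat grammar. This is not the CFG9000 source: the symbol names, the one-level `rules` table, and the function signatures are assumptions, and a real Sequitur grammar would be hierarchical.

```python
import random
from typing import Dict, List

def expand(symbols: List[str], rules: Dict[str, bytes]) -> bytes:
    """Turn a symbol stream back into bytes."""
    return b"".join(rules[s] for s in symbols)

def shuffle(symbols: List[str], rng: random.Random) -> List[str]:
    out = list(symbols)
    rng.shuffle(out)
    return out

def drop(symbols: List[str], rng: random.Random) -> List[str]:
    i = rng.randrange(len(symbols))
    return symbols[:i] + symbols[i + 1:]

def repeat(symbols: List[str], rng: random.Random, times: int = 3) -> List[str]:
    i = rng.randrange(len(symbols))
    return symbols[:i] + [symbols[i]] * times + symbols[i + 1:]

def uniform_corrupt(rules: Dict[str, bytes], name: str,
                    rng: random.Random) -> Dict[str, bytes]:
    """Corrupt the definition of one symbol, so every instance changes alike."""
    body = bytearray(rules[name])
    body[rng.randrange(len(body))] ^= 0xFF
    corrupted = dict(rules)
    corrupted[name] = bytes(body)
    return corrupted

RULES = {"A": b"foo(", "B": b"x", "C": b")"}   # hypothetical symbol table
STREAM = ["A", "B", "C", "A", "B", "C"]        # expands to b"foo(x)foo(x)"

rng = random.Random(0)
print(expand(repeat(STREAM, rng), RULES))
print(expand(STREAM, uniform_corrupt(RULES, "B", rng)))
```

Uniform Corrupt edits the rule rather than the stream, which is why it only works consistently if each trope maps to exactly one symbol.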
35Sample CFG9000 Output
- calculate_rule_usage(p->rulep->rulep->rulep->rulep->rulep->rulep->rulep->rulep->rulep->rulep->rulep->rulep->rulep->rule()
- calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(p->rule())
36Slashdot Fuzzed
37Slashdot Fuzzed (2)
38It's Not The Best CFG Fuzzing Ever
- "Many physicists would agree that, had it not been for congestion control, the evaluation of web browsers might never have occurred. In fact, few hackers worldwide would disagree with the essential unification of voice-over-IP and public-private key pair. In order to solve this riddle, we confirm that SMPs can be made stochastic, cacheable, and interposable."
- "Rooter: A Methodology for the Typical Unification of Access Points and Redundancy"
- By A Context-Free Grammar Generating CompSci Papers
- Authors handcoded meaningful symbols in CompSci-speak. The eventual goal is the autogeneration of symbol and inter-symbol patterns.
39Symbolic Discovery Is Inevitable
- "An early inference procedure was described by Chomsky and Miller (1957a), as reported in Solomonoff (1959). Chomsky proposed a method for detecting loops in finite state languages. The approach requires a set of valid sentences, and an oracle that determines whether a sentence is in the language. The algorithm proceeds by deleting part of a valid sentence and asking the oracle whether the sentence is still valid. If it is, the deleted part is reinserted into the sequence and repeated, so that it appears twice. If the sentence is still in the language, a cycle has been detected."
- Inferring Sequential Structure, Craig Nevill-Manning, 1996
- This couldn't POSSIBLY be useful for building a structure for a dumb fuzzer to operate against.
- Instead of seeing if the parser crashes, just see if it considers the input valid
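The delete-then-double procedure in the quote can be sketched directly. The oracle here is a stand-in (a regular expression playing the role of "the parser accepted the input"); the brute-force scan over every substring is an assumption made to keep the example short, not part of the quoted method.

```python
import re
from typing import Callable, List, Tuple

def find_loops(sentence: str, oracle: Callable[[str], bool]) -> List[Tuple[int, int, str]]:
    """Return (start, end, part) for every substring that can be both
    deleted and doubled while the sentence stays valid."""
    loops = []
    n = len(sentence)
    for start in range(n):
        for end in range(start + 1, n + 1):
            part = sentence[start:end]
            without = sentence[:start] + sentence[end:]          # part deleted
            doubled = sentence[:end] + part + sentence[end:]     # part appears twice
            if oracle(without) and oracle(doubled):
                loops.append((start, end, part))
    return loops

def toy_parser(text: str) -> bool:
    """Hypothetical oracle: accepts 'a', any number of 'bc', then 'd'."""
    return re.fullmatch(r"a(bc)*d", text) is not None

print(find_loops("abcd", toy_parser))
```

For a fuzzer, each detected loop is a repeatable unit of the format, found without ever reading a specification.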
40TODO
- Requitur: Sequitur implementation optimized for fuzzer use
- Generate larger symbols
- No two byte symbols please; we're not trying to compress, we're trying to elucidate structure
- Eliminate redundant symbols
- Kieffer-Yang optimization in 2001: If symbol (x) = symbol (y), then delete (y) and set all instances of (y) to (x)
- Need to do this to actually consistently fuzz all instances of a particular trope
- Possibly remove in-memory grammar requirement
- Use mechanisms from Ray, an out-of-memory variant
- Add foreign grammar capability
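The redundant-symbol rule above ("if x = y, delete y and point everything at x") can be sketched on a small dictionary-based grammar. This follows the one-line description on the slide only; it is not claimed to match Kieffer and Yang's published procedure, and the grammar representation is an assumption.

```python
from typing import Dict, List

Grammar = Dict[str, List[str]]   # rule name -> body; anything not a key is a terminal

def expand(symbol: str, rules: Grammar) -> List[str]:
    """Fully expand a symbol into its terminal sequence."""
    if symbol not in rules:
        return [symbol]
    out: List[str] = []
    for part in rules[symbol]:
        out.extend(expand(part, rules))
    return out

def merge_equal_rules(rules: Grammar, start: str) -> Grammar:
    """Drop every rule whose expansion duplicates an earlier rule's, and
    rewrite references to the dropped rule to point at the survivor."""
    survivor_for = {}            # expansion -> first rule seen with it
    rename = {}                  # dropped rule -> surviving rule
    for name in rules:
        if name == start:
            continue
        key = tuple(expand(name, rules))
        if key in survivor_for:
            rename[name] = survivor_for[key]
        else:
            survivor_for[key] = name
    return {name: [rename.get(s, s) for s in body]
            for name, body in rules.items() if name not in rename}

GRAMMAR = {"S": ["X", "q", "Y"], "X": ["a", "b"], "Y": ["a", "b"]}
MERGED = merge_equal_rules(GRAMMAR, "S")
print(MERGED)
```

After merging, corrupting the one surviving rule touches every place the trope occurs, which is the property the fuzzer needs.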
41What's Out Now
- 8 Bit Clean: Can Analyze Arbitrary Data
- Mergedot: Can create graph from Sequitur output
42How To Think Of Sequitur
- Any time you're manipulating data as bytes, think of manipulating it as symbols
- Trigram histograms on bytes -> Trigram histograms on symbols
- Bayesian probabilities on characters -> Bayesian probabilities on symbols
- Adapt yourself to more than 256 codes per symbol and reap the benefit
- If your code is already Unicode aware you might be one step ahead!
43Fuzzy Wuzzy Wuz A Symbol
- Symbol analysis systems (language translators, etc) have issues w/ TMTOWTDI (There's More Than One Way To Do It)
- Very similar messages can be encapsulated in very different ways
- Very similar messages can be encapsulated in very similar, but not identical ways
- Sequitur only handles exact matches; fuzzy grammar imputation doesn't appear to exist yet
- Are there any systems for analyzing complex, unequal but somewhat related sets of symbols?
44Another Approach: DotPlots
- Popular mechanism in bioinformatics for visual analysis of genomes.
- Some attempts to apply dotplots outside of bioinformatics
- Textual analysis
- Audio
- Remembered an old paper, entitled "Visualizing Music And Audio Using Self-Similarity"
- Jonathan Foote from Xerox
- Brute force solution: compare songs to themselves, splitting them into tiny chunks and marking light for similar and dark for dissimilar
- Disassociated Studio will do this for you
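The brute-force comparison described above, applied to bytes instead of audio. The similarity metric is an assumption (overlap of byte histograms between two chunks); the slides do not say which metric the actual tool uses. The text renderer is only there so the result can be looked at without an image library.

```python
from collections import Counter
from typing import List

def similarity(x: bytes, y: bytes) -> float:
    """Share of bytes two chunks have in common, ignoring position (0.0 to 1.0)."""
    shared = sum((Counter(x) & Counter(y)).values())
    return shared / max(len(x), len(y))

def self_similarity(data: bytes, window: int = 32) -> List[List[float]]:
    """Compare every window-sized chunk of the data against every other."""
    chunks = [data[i:i + window] for i in range(0, len(data), window)]
    return [[similarity(a, b) for b in chunks] for a in chunks]

def render(matrix: List[List[float]], shades: str = " .:*#") -> str:
    """Crude text rendering: denser characters mean more similar chunks."""
    top = len(shades) - 1
    return "\n".join("".join(shades[int(v * top)] for v in row) for row in matrix)

sample = b"A" * 32 + b"B" * 32 + b"A" * 32
print(render(self_similarity(sample)))
```

Each cell is one pixel of the dotplot; the main diagonal is always fully similar because every chunk matches itself.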
45Day Tripper from the Beatles: Music shows
internal pattern.
46So does MPEG.
47What Exactly Are We Doing
- Jonathan Helfman's "Dotplot Patterns: A Literal Look at Pattern Languages" offers an introduction
- Instead of "to", "be", "not" etc, we use chunks of data from arbitrary files
- The same similarity metric used to disambiguate names for the SSH hack is used to measure similarity here :)
48There are so many patterns we might see
49and no matter how much we've learned of this
pattern language
50???
51So How Might This Be Useful?
- A) Format Identification
- 1) Do different file formats appear different?
- 2) Do different instances of the same file format appear similar?
- 3) Does one format embedded in another make itself apparent?
- B) Fuzzer Guidance
- 1) Can we locate the actual byte offsets where one section ends and another begins?
- 2) Can we visualize and compare fuzzer operations via Dotplots?
52Format Identification
- 1) Do different files appear different, and does the appearance reflect the existence of internal structure?
- 2) Do different instances of the same file format appear similar?
- 3) Does one format embedded in another make itself apparent?
53Java Class Files
54.NET Assemblies
55CNN's Home Page
56SMBTorture Traffic (Packets; Note, Stop/Start Is Visible)
57Kernel32.dll
58Chromosome 22 (This is, after all, a genomics hack)
59The Legend Of Zelda
60Format Identification
- 1) Do different files appear different, and does the appearance reflect the existence of internal structure?
- Answer: Yes. They do.
- 2) Do different instances of the same file format appear similar?
- 3) Does one format embedded in another make itself apparent?
61Books from Project Gutenberg: Consistent
Despite English's low information content, lack
of even mildly related strings causes little
self-similarity across symbol clusters
62US Code: Moderately Consistent
Legalese is a massively structured dialect.
Symbols appear in very distinct patterns that are
more reminiscent of machine code than text.
63HTML: Consistent
HTML repeats smaller symbols (tags) and larger
symbol clusters (via template engines) regularly.
This shows up visually as a tightly repeating
pattern.
64Java Class Files (Compared): Mildly Consistent
- Binary code (be it bytecode or x86) tends to be
very structured. Still, we are dependent on both
the content and the compiler to generate distinct
patterns.
65x86: Consistent (In Sections)
x86 tends not to be handwritten; as such, complex
instructions are emitted in a highly structured
form.
66Exception?
- 64 kilobyte graphical demonstration
- Run through a packer :)
- Compression removes patterns
67NES Games
6502 Assembly Tends To Show Consistent Patterns,
But
68Mario Games Look Rather Different.
- Output is highly dependent on the compiler
- Output is highly dependent upon the actual content
- File formats are merely shells for actual content. You are analyzing the content; the format is just syntactic sugar.
69Format Identification
- 1) Do different files appear different, and does the appearance reflect the existence of internal structure?
- Answer: Yes. They do.
- 2) Do different instances of the same file format appear similar?
- Answer: Somewhat. Similar content looks like itself, but you're measuring the fundamental entropy of the underlying content, not the format of the content itself.
- 3) Does one format embedded in another make itself apparent?
70File Formats Contain Multiple Subformats: Another
Look At Kernel32.DLL
These are all different parts of Kernel32.
71Quickly Browsing Large Files: Tilt-Shift View
- Instead of measuring absolute Y against absolute X, make X relative
- Advance through the file going down, look back a number of bytes going right
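A sketch of the relative-X layout: each row is one chunk of the file, and each column compares it with the chunk some number of windows earlier. The similarity metric (byte-histogram overlap) and the parameter names are assumptions for the example.

```python
from collections import Counter
from typing import List

def similarity(x: bytes, y: bytes) -> float:
    """Share of bytes two chunks have in common, ignoring position."""
    shared = sum((Counter(x) & Counter(y)).values())
    return shared / max(len(x), len(y))

def tilt_shift(data: bytes, window: int = 32, lookback: int = 8) -> List[List[float]]:
    """Row y = chunk y of the file; column k = similarity to chunk y - k."""
    chunks = [data[i:i + window] for i in range(0, len(data), window)]
    rows = []
    for y, chunk in enumerate(chunks):
        row = []
        for back in range(lookback):
            x = y - back
            # Nothing exists before the start of the file: treat as dissimilar.
            row.append(similarity(chunk, chunks[x]) if x >= 0 else 0.0)
        rows.append(row)
    return rows

sample = (b"A" * 32 + b"B" * 32) * 4     # content that repeats every two chunks
for row in tilt_shift(sample, window=32, lookback=4):
    print(row)
```

The output is a strip of fixed width rather than a square, so its size grows linearly with the file instead of quadratically; periodic structure shows up as vertical stripes.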
72Complain All You Want. Hex Still Sucks.
73Format Identification
- 1) Do different files appear different, and does the appearance reflect the existence of internal structure?
- Answer: Yes. They do.
- 2) Do different instances of the same file format appear similar?
- Answer: Somewhat. Similar content looks like itself, but you're measuring the fundamental entropy of the underlying content, not the format of the content itself.
- 3) Does one format embedded in another make itself apparent?
- Answer: Yes. Multiple, distinct sections are clearly visible in a way that hex cannot show.
74Fuzzer Guidance
- 1) Can we locate the actual byte offsets where one section ends and another begins?
- Why would we want to?
- Fuzzers break parsers.
- Many subformats to a format, many subparsers to a parser
- To a rough level of approximation, fuzzing a single subformat lets you stress a single subparser
- So once we split a file up, we can selectively attack one subparser at a time.
- 2) Can we visualize and compare fuzzer operations via Dotplots?
75Simple Math
We select an interesting blob from kernel32.dll.
The blob is at pixel offset 507x507, and is a
square around 570 pixels wide. Window size on viz
was 32.
507*32: The interesting section starts 16224 bytes into the file.
570*32: The interesting section is 18240 bytes long.
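The pixel-to-byte conversion above as a tiny helper, assuming one pixel per window of bytes as on this slide. The function name is made up for the example.

```python
from typing import Tuple

def pixel_region_to_bytes(pixel_offset: int, pixel_width: int, window: int) -> Tuple[int, int]:
    """Convert a square region on the dotplot to (byte offset, byte length)."""
    return pixel_offset * window, pixel_width * window

start, length = pixel_region_to_bytes(507, 570, 32)
print(start, length)
```

The resulting offset and length are what you would hand to a tool such as dd, or to a fuzzer restricted to that slice of the file.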
76What's The Actual Data?
dd if=kernel32.dll bs=1 skip=16100 | hexdump | more
77Using Hardcorr as a first knife to locate
interesting-to-fuzz regions
78Fuzzer Guidance
- 1) Can we locate the actual byte offsets where one section ends and another begins?
- Answer: Yes. We can quickly route from the image to the byte offset, through basic arithmetic.
- 2) Can we visualize and compare fuzzer operations via Dotplots?
79Differentials
- Major use of dotplots in bioinformatics is to compare one genome against another
- Autocorrelation: Compare A to A
- Cross-Correlation: Compare A to B
- Most files are sufficiently dissimilar that not very interesting structure shows up
- Notable exception: Different versions of the same binary
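A minimal cross-correlation sketch for the "different versions of the same binary" case. It uses exact chunk equality as the metric, which is a simplifying assumption; the function names are hypothetical. Breaks in the main diagonal of the A-versus-B plot mark where the two versions diverge.

```python
from typing import List

def cross_dotplot(a: bytes, b: bytes, window: int = 32) -> List[List[bool]]:
    """Cell [i][j] is True when chunk i of `a` equals chunk j of `b`."""
    chunks_a = [a[i:i + window] for i in range(0, len(a), window)]
    chunks_b = [b[i:i + window] for i in range(0, len(b), window)]
    return [[x == y for y in chunks_b] for x in chunks_a]

def changed_offsets(a: bytes, b: bytes, window: int = 32) -> List[int]:
    """Byte offsets of the windows where the diagonal of the plot is broken."""
    longest = max(len(a), len(b))
    return [off for off in range(0, longest, window)
            if a[off:off + window] != b[off:off + window]]

old = bytes(range(256))      # stand-in for version 1 of a binary
patched = bytearray(old)
patched[100] ^= 0x01         # a single-bit difference, as a bit-flipping fuzzer would make
new = bytes(patched)

print(changed_offsets(old, new))
```

A one-bit change breaks exactly one diagonal cell, while a fuzzer that reorders large blocks would move whole diagonal segments off the main line.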
80Visual Bindiff!
81MSVCR70.DLL v. MSVCR71.DLL
82Fuzzers: Very Broken Patchers :)
Mangle.C: Single Bit Differences
CFG9000: Large Scale Reordering
83Fuzzer Guidance
- 1) Can we locate the actual byte offsets where one section ends and another begins?
- Answer: Yes. We can quickly route from the image to the byte offset, through basic arithmetic.
- 2) Can we visualize and compare fuzzer operations via Dotplots?
- Answer: Yes; visual diffing effectively shows differences between files, including differences introduced by various flavors of fuzzers.