Black Ops 2006 Viz Edition CCC 2006

1
Black Ops 2006: Viz Edition, CCC 2006

Dan Kaminsky
Director Of Penetration Testing
IOActive

2
Thanks and No Thanks

Thank You To Swissotel Amsterdam, who provided a
net connection with which I could actually finish
these slides
No Thanks to Delta Hotel of Amsterdam, which put
a TV on a really weak shelf.
I suppose it's my fault for putting my laptop underneath.
The Star System is officially meaningless

3
Who Am I?

Coauthor of several book series
Hack Proofing Your Network
Stealing The Network
Formerly of Cisco and Avaya
Presently partnering with IOActive
One of the Blue Hat Hackers who have been auditing Windows Vista
Been doing talks for six years now
TCP/IP, DNS, MD5, SSH, etc.

4
What Are We Here To Do?

Break TCP/IP A Little More
Not in the documentation
It's for a good cause :)
Analyze Data Linguistically
Make Pretty Pretty Pictures!

5
For Various Definitions Of Pretty: Visual Bindiff
6
The Ancient Tongue: TCP/IP

Can't all be about pretty pictures.
A new problem has popped up: network oligopolies are threatening to install firewalls that limit or eliminate bandwidth on a per-company basis
Their own media services might be fast; others will be slow
Their own VPN services might be fast; others will be slow
Question: Is it possible to detect and locate devices violating network neutrality?

7
What's The Closest Tool We Have?

Firewalk
Mike Schiffman's firewall analysis tool
Packets elicit an ICMP Time Exceeded error if their TTL reaches 0 at a router
The TTL is decremented by one at each hop, so if you start low and count up, you can trace the route to a host
A firewalled packet won't live long enough to reach TTL=0
So you can locate the firewall, and divine things about its ruleset, based on when your packets stop getting ICMP Time Exceeded
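The TTL logic above can be simulated offline. This is a minimal Python sketch of the inference only, not of Firewalk itself: the path is a hypothetical list of `Hop` objects, each with a hypothetical `blocks` set of filtered destination ports, and a filtering hop is assumed to drop the probe silently before its TTL can expire there.

```python
from dataclasses import dataclass, field


@dataclass
class Hop:
    name: str
    blocks: set = field(default_factory=set)  # hypothetical: ports this hop drops silently


def probe(path, ttl, port):
    """Return the name of the hop that sends ICMP Time Exceeded, or None for silence."""
    for hop_number, hop in enumerate(path, start=1):
        if port in hop.blocks:
            return None          # filtered before the TTL can expire
        if hop_number == ttl:
            return hop.name      # TTL hits 0 here, so this hop reports Time Exceeded
    return None                  # probe ran past the last modeled hop


def firewalk(path, port, max_ttl=30):
    """Raise the TTL until Time Exceeded stops; return (last responder, first silent TTL)."""
    last_seen = None
    for ttl in range(1, max_ttl + 1):
        responder = probe(path, ttl, port)
        if responder is None:
            return last_seen, ttl
        last_seen = responder
    return last_seen, None


path = [Hop("r1"), Hop("r2"), Hop("fw", {23}), Hop("r4")]
print(firewalk(path, 23))  # ('r2', 3): the filter sits just past r2
print(firewalk(path, 80))  # ('r4', 5): port 80 crosses every modeled hop
```

Port 80 also goes quiet at TTL 5, but only because the toy path ends there; a real run would hear from the destination instead.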

8
Limitations of Firewalking

But Firewalk tells us what, not who, is blocked, and it tells us nothing about who is allowed to go fast and who is made to go slow
Suddenly, we devolve to a much older question:
Is it possible to find out whether a target firewall is blocking or accepting traffic from an arbitrary IP address?

9
TCP Does Speed Measurement

TCP speed analysis is done blindly
Endpoints do not negotiate with one another
Everyone sends their packets; routers route what they will. Endpoints need to adjust to what the routers are willing to pass.
Routers communicate with endpoints by dropping their packets
Can we combine this router backchannel w/ Firewalk?
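The "adjust blindly to drops" behavior is classically additive-increase/multiplicative-decrease (AIMD). A toy Python sketch, assuming a fixed bottleneck capacity measured in packets per round trip and that any round exceeding it sees a drop; real TCP adds slow start, fast recovery, and timeouts on top of this.

```python
def aimd(capacity: int, rounds: int, start: int = 1) -> list:
    """Congestion window (packets per RTT) for each round of a toy AIMD sender."""
    cwnd = start
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd > capacity:
            cwnd = max(1, cwnd // 2)  # a drop is the router's only feedback: halve
        else:
            cwnd += 1                 # no drop: probe for one more packet per RTT
    return history


print(aimd(capacity=10, rounds=15))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 5, 6, 7, 8]
```

The sawtooth is the backchannel: the sender never learns the capacity directly, only the point where drops begin.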

10
In From The Side

What causes packets to drop?
Too many packets
What are we going to do?
Send too many packets
Two channels are set up:
A primary channel, which drops packets at some known rate
A secondary channel, whose purpose is to interfere (or not) with the primary channel
When the secondary interferes with the primary, we get feedback via the primary channel
The traffic composing the secondary channel can come from anywhere, be composed of anything, and can be TTL-limited just like in a normal firewalk.

11
The TTL Channel

Normally, you don't know which router along a path is dropping your packets
If you are the source of the drop-inducing packets, you can control how far your noise goes out; thus, you can discover which router is hitting its limit / censoring your net connection
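A fluid-model sketch of this TTL channel in Python. Everything here is a hypothetical toy: each hop has a fixed capacity, the primary flow crosses every hop, noise sent with TTL t loads hops 1..t only, an overloaded hop drops a uniform share of its traffic, and upstream losses are not propagated downstream.

```python
def primary_loss(capacities, primary_rate, noise_rate, noise_ttl):
    """Fraction of primary packets lost along the path (toy fluid model).

    An overloaded hop drops a uniform share of everything crossing it, so both
    channels survive it with probability capacity / load."""
    delivered = 1.0
    for hop, capacity in enumerate(capacities, start=1):
        load = primary_rate + (noise_rate if hop <= noise_ttl else 0.0)
        if load > capacity:
            delivered *= capacity / load
    return 1.0 - delivered


def locate_choke(capacities, primary_rate, noise_rate):
    """Raise the noise TTL hop by hop; the first hop where primary loss jumps is the choke."""
    baseline = primary_loss(capacities, primary_rate, noise_rate, noise_ttl=0)
    for ttl in range(1, len(capacities) + 1):
        if primary_loss(capacities, primary_rate, noise_rate, ttl) > baseline + 1e-9:
            return ttl
    return None


print(locate_choke([100, 100, 20, 100], primary_rate=10, noise_rate=50))  # 3
```

If the noise is too weak to overload anything (say `noise_rate=5` on the same path), the sweep finds nothing and returns `None`.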

12
Scorchmarking

Why Scorchmarking?
Routers are burning packets; those that get through might have a scorch mark or two
Basic Model
Client downloads a file from a site, at some given speed negotiated via TCP.
At the same time, traffic is injected from different IP addresses. This should cause drops.
If it doesn't, the network is either penalizing the primary channel (easy to drop against) or rewarding the secondary channel (resilient to drops)
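The inference in the basic model can be written as a before/after comparison. A Python sketch under a uniform-drop assumption; `shaped_cap` is a hypothetical stand-in for a non-neutral box that confines the primary channel to its own lane, so the two channels no longer share a drop point.

```python
def loss(load: float, capacity: float) -> float:
    """Uniform-drop model: share of offered traffic lost once load exceeds capacity."""
    return max(0.0, 1.0 - capacity / load) if load > 0 else 0.0


def primary_loss(primary, secondary, capacity, shaped_cap=None):
    """Neutral link: both channels share one queue. With shaped_cap set, the primary
    is instead confined to its own lane of that size (a hypothetical non-neutral box)."""
    if shaped_cap is None:
        return loss(primary + secondary, capacity)
    return loss(primary, shaped_cap)


def scorchmark(primary, secondary, capacity, shaped_cap=None) -> str:
    """Compare primary loss without and with injected secondary traffic."""
    before = primary_loss(primary, 0, capacity, shaped_cap)
    after = primary_loss(primary, secondary, capacity, shaped_cap)
    if after > before:
        return "interference: the channels share a drop point"
    return "no interference: primary penalized or secondary rewarded"


print(scorchmark(90, 50, 100))                 # interference: ...
print(scorchmark(90, 50, 100, shaped_cap=60))  # no interference: ...
```

The verdict only means something when the secondary is large enough to overload a neutral link; `scorchmark(10, 5, 100)` also reports "no interference".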

13
Advanced Scorchmarking 0

Having to depend on a client is lame
Wouldn't it be nice if we could scan the Internet for these servers?
What fundamental service is a receiving client providing?
It is acknowledging our traffic, letting us know how much it received and how many milliseconds it took to receive it
Aren't there other ways we could extract the same data from hosts?

14
Advanced Scorchmarking 1

What else will acknowledge receiving traffic from us?
TCP Servers
Sting, from Stefan Savage, used this to great effect
DNS Servers?
Routers.
Supposedly, routers won't send more than a certain number of ICMP Time Exceeded packets per second
In reality, they seem to acknowledge with ICMP Time Exceeded however much you throw at them
Even if they didn't, you could use the difference in ICMP Time Exceeded rates between the primary and secondary channels to determine whether interference was showing up.
Everyone's got a NAT, so you can query everyone for whether certain sorts of traffic are being blocked to them

15
Advanced Scorchmarking 2

So, yes.
You can scan for violations of Network Neutrality
You can find networks that are blocking or passing particular IP ranges
It's not exactly efficient, though
Neutrality violations are easier to find than the standard FW case
Firewalls are normally between the WAN and the LAN (Slow Net -> FW -> Fast Net)
Neutrality violators are mid-WAN (Slow Net -> FW -> Slow Net -> Fast Net)
Easier to overload the slow net after the firewall
Boxes with max TTL rates override this

16
Speed Limits

Fundamental Problem: Have to max out bandwidth on the link to trigger the backchannel
No packets dropping, no data
Means you have to DoS a link: not scalable/legal
Potential Solution: Find capped acknowledgers
The mythical ICMP Time Exceeded rate limit works well
Primary and secondary channels both elicit ITEs (ICMP Time Exceeded messages)
When the secondary channel gets a packet through, it takes up one of the primary channel's ITE slots
ITE is perfect, since you can TTL-limit any packet
Depends on the firewall passing the primary's ITEs
Maybe Linux / NATs actually implement rate limits?
Another option: What if we have code on the client?
17
Windows Media Player: More Than Just DRM. Really!

Bulk Transfer: RTP
Runs over unicast UDP
Yes, the same unicast UDP that penetrates NAT so well!
Flow Control / Quality Monitoring: RTCP
No technical reason RTCP needs to go back to the same address that the RTP stream is coming from
So: we pretend to provide media streams from all sorts of sites, and use WMP to collect traffic stats for us
It might work
18
Symbols

But this is not to be a talk on TCP/IP hackery

19
SSH's Hex Problem

ssh dan@blah
The authenticity of host 'blah (1.2.3.4)' can't be established.
RSA key fingerprint is 09:a9:b1:99:84:17:7d:ba:c6:55:46:5a:17:f8:83:01.
Are you sure you want to continue connecting (yes/no)?
09:a9:b1... am I supposed to do something with this?
Yes. According to SSH's design, you're supposed to reject the proposed fingerprint if it looks unfamiliar. (Seriously.)
The Two Billion SSH Key attack (by ADM) just comes up with 2B keys and emits the visibly closest key. It works.
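For reference, the legacy fingerprint is an MD5 over the decoded key blob, printed as colon-separated hex. The sketch below computes that and adds a `visual_score` that is purely hypothetical: the slide does not say how ADM measured "visibly closest", so this version simply counts same-position hex matches and double-weights the first and last eight digits, on the assumption that eyes land there. An attacker would generate many keys and keep the one scoring highest against the target.

```python
import base64
import hashlib


def md5_fingerprint(pubkey_line: str) -> str:
    """Legacy OpenSSH-style fingerprint: MD5 of the decoded key blob, colon-separated."""
    blob = base64.b64decode(pubkey_line.split()[1])
    digest = hashlib.md5(blob).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def visual_score(a: str, b: str) -> int:
    """Hypothetical 'looks alike' score: hex digits matching at the same position,
    with the first and last 8 digits weighted double (an assumption, not ADM's metric)."""
    da, db = a.replace(":", ""), b.replace(":", "")
    score = 0
    for i, (x, y) in enumerate(zip(da, db)):
        if x == y:
            score += 2 if i < 8 or i >= len(da) - 8 else 1
    return score


# Toy "key": the blob is just b"test", so this is md5(b"test").
fp = md5_fingerprint("ssh-rsa dGVzdA== demo")
print(fp)  # 09:8f:6b:cd:46:21:d3:73:ca:de:4e:83:26:27:b4:f6
```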

20
Hex sucks. A better mapping must be possible
21
Cryptomnemonics

There are three classes of memory, at least as far as cryptography is concerned:
Rejection: I've never seen that before
Recognition: It's that one, not that other one
Recollection: Let me describe it to you.
SSH just requires rejection: What? That's new.
The hex domain clearly does not work. What else is available?
To restate the problem: Humans do not operate on hexadecimal symbols effectively. Are there any other symbol sets we can use?

22
Alternative Symbolic Domains

Abstract Art via Déjà Vu
Calculated faces via Passfaces
Both have attempted to address limited capacity for recollection by moving authentication to a recognition problem
But recognition offers only a limited number of bits: 9^5 = 59,049
This is OK, since Passfaces is online and thus can lock a user out before 59K attempts are up
We are not online, but we only need to reject, not recognize, and certainly not recollect

23
The Nymic Domain: Names Are Identity Symbols

Humans don't remember arbitrary bits, but we do remember stories.
Stories change (the bits shift over time), but names stay the same
Can we map the 160 bits SSH needs us to accept or reject to names?
Take 512 male names: 9 bits of info per male name
Take 1024 female names: 10 bits of info per female name
Take 8192 last names: 13 bits of info per last name
9 + 10 + 13 = 32 bits per couple; 5 couples = 160 bits
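The 9 + 10 + 13 split can be made concrete with a little bit-slicing. In this Python sketch the name lists are placeholders (`male0` ... `male511` and so on) standing in for real lists of 512, 1024, and 8192 names; the chunk order (most significant bits first) is likewise an arbitrary choice.

```python
# Placeholder name lists: a real deployment would use 512 / 1024 / 8192 actual names.
MALE = [f"male{i}" for i in range(512)]        # 2**9
FEMALE = [f"female{i}" for i in range(1024)]   # 2**10
LAST = [f"last{i}" for i in range(8192)]       # 2**13
MALE_IX = {n: i for i, n in enumerate(MALE)}
FEMALE_IX = {n: i for i, n in enumerate(FEMALE)}
LAST_IX = {n: i for i, n in enumerate(LAST)}


def bits_to_couples(digest: bytes) -> list:
    """Map 160 bits to five 'male and female last' couples of 9 + 10 + 13 = 32 bits."""
    if len(digest) != 20:
        raise ValueError("expected a 160-bit (20-byte) value")
    n = int.from_bytes(digest, "big")
    couples = []
    for shift in range(128, -1, -32):      # five 32-bit chunks, most significant first
        chunk = (n >> shift) & 0xFFFFFFFF
        male = chunk >> 23                 # top 9 bits
        female = (chunk >> 13) & 0x3FF     # next 10 bits
        last = chunk & 0x1FFF              # low 13 bits
        couples.append(f"{MALE[male]} and {FEMALE[female]} {LAST[last]}")
    return couples


def couples_to_bits(couples: list) -> bytes:
    """Invert bits_to_couples, e.g. to check what a user typed back in."""
    n = 0
    for couple in couples:
        male, _and, female, last = couple.split()
        chunk = (MALE_IX[male] << 23) | (FEMALE_IX[female] << 13) | LAST_IX[last]
        n = (n << 32) | chunk
    return n.to_bytes(20, "big")


print(bits_to_couples(b"\xff" * 20)[0])  # male511 and female1023 last8191
```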

24
Demo

ssh dan@blah
Key Data: julio and epifania dezzutti, luther and rolande doornbos, manual and twyla imbesi, dirk and cuc kolopajlo, omar and jeana hymel
The authenticity of host 'blah (1.2.3.4)' can't be established.
Are you sure you want to continue connecting (yes/no)?
It is critical that the Key Data be shown every time there's a connection. The user must become familiar with the characters in the story.
This actually seems to work.

25
What about Bubble Babble?

ssh-keygen.exe -B -f id_dsa.pub
1024 xegoz-tosys-vusik-masar-cifyc-cyled-kikih-zukuf-nypok-sezyt-noxax id_dsa.pub
Problem: Humans do not remember arbitrary sequences of syllables well
Names are special sequences; sharing structure with pre-existing language logic should improve retention
Still, names are arbitrary (Boutros Boutros-Ghali); could merge approaches: Xegoz and Tosys Vusik, Masar and Cifyc Cyled, Kikih and Zukuf Nypok, Sezyt Noxax
Requires testing
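For comparison, a Python sketch of the Bubble Babble encoding itself, written from Antti Huima's published algorithm (vowels `aeiouy`, consonants `bcdfghklmnprstvzx`, a running checksum seed). OpenSSH's `ssh-keygen -B` applies it to a SHA-1 digest of the key, which is consistent with the eleven five-letter groups above: 20 bytes give 20 // 2 + 1 groups. The `as_couples` helper is a hypothetical rendering of the merged names idea, not an existing tool.

```python
VOWELS = "aeiouy"
CONSONANTS = "bcdfghklmnprstvzx"


def bubblebabble(data: bytes) -> str:
    """Bubble Babble: two bytes per five-letter group, plus a checksum-carrying seed."""
    seed = 1
    out = ["x"]
    rounds = len(data) // 2 + 1
    for i in range(rounds):
        if i + 1 < rounds or len(data) % 2:
            byte1 = data[2 * i]
            out.append(VOWELS[(((byte1 >> 6) & 3) + seed) % 6])
            out.append(CONSONANTS[(byte1 >> 2) & 15])
            out.append(VOWELS[((byte1 & 3) + seed // 6) % 6])
            if i + 1 < rounds:
                byte2 = data[2 * i + 1]
                out.append(CONSONANTS[(byte2 >> 4) & 15])
                out.append("-")
                out.append(CONSONANTS[byte2 & 15])
                seed = (seed * 5 + byte1 * 7 + byte2) % 36
        else:
            out.append(VOWELS[seed % 6])    # final group encodes only the checksum
            out.append(CONSONANTS[16])      # 'x'
            out.append(VOWELS[seed // 6])
    out.append("x")
    return "".join(out)


def as_couples(babble: str) -> str:
    """Hypothetical merge: capitalized words grouped as 'A and B C' triples."""
    words = [w.capitalize() for w in babble.split("-")]
    groups = [words[i:i + 3] for i in range(0, len(words), 3)]
    return ", ".join(f"{g[0]} and {g[1]} {g[2]}" if len(g) == 3 else " ".join(g)
                     for g in groups)


print(bubblebabble(b"1234567890"))  # xesef-disof-gytuf-katof-movif-baxux
print(as_couples("xegoz-tosys-vusik-masar-cifyc-cyled-kikih-zukuf-nypok-sezyt-noxax"))
```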

26
Inverting The Symbol Flow: Passnyms

Suppose you have 8 characters, with one of 64 characters in each slot.
al7$13nM
64 = 2^6, so (2^6)^8 = 2^48: 48 bits
Lowercase A, lowercase l, seven, dollar sign, one, three, lowercase n, uppercase M
This is twenty-three syllables!
What if, instead, you typed:
dirk and cuc kolopajlo, omar and jeana hymel
64 bits of entropy, 14 syllables, and it can be spell-checked as the user types it in
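The entropy comparison and the spell-check claim both fit in a few lines of Python. The spell checker here uses the standard library's `difflib.get_close_matches` against a tiny hypothetical name list; a real passnym system would check against the full name tables and might prefer a different distance measure.

```python
import difflib
import math
from typing import Optional


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy of `length` independent, uniformly chosen symbols."""
    return length * math.log2(alphabet_size)


def couple_bits(couples: int) -> int:
    """9 + 10 + 13 bits per 'male and female last' couple."""
    return couples * (9 + 10 + 13)


def spell_check(word: str, names: list) -> Optional[str]:
    """Snap a mistyped name to the closest entry in the name list, if one is close enough."""
    matches = difflib.get_close_matches(word.lower(), names, n=1, cutoff=0.6)
    return matches[0] if matches else None


print(entropy_bits(64, 8))                                          # 48.0
print(couple_bits(2))                                               # 64
print(spell_check("kolopajo", ["kolopajlo", "hymel", "doornbos"]))  # kolopajlo
```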

27
It Is Easier To Interface With Systems When Symbols Align

Hacking is a form of interfacing
We can break things with garbage symbols
Dumb Fuzzing: Take a file, flip some bits, see what happens
We can break more things with meaningful symbols used in unexpected ways
Smart Fuzzing: Take a file, understand its internal structure, fuzz the structure, see what happens
Dumb fuzzing is very easy.
Smart fuzzing is very labor-intensive: it requires smart people, maybe specifications.
Is there any way we can automatically discover symbol sets?
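Dumb fuzzing really is that easy. A minimal Python sketch of "take a file, flip some bits": it knows nothing about the format, and the seed makes any crashing case replayable.

```python
import random


def flip_bits(data: bytes, count: int, seed: int = 0) -> bytes:
    """Dumb fuzzing: flip `count` randomly chosen bits with no knowledge of the format.
    (The same bit can be picked twice, which flips it back.)"""
    rng = random.Random(seed)          # seeded, so a crashing case can be replayed
    buf = bytearray(data)
    for _ in range(count):
        bit = rng.randrange(len(buf) * 8)
        buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


original = bytes(64)
mutated = flip_bits(original, count=1, seed=7)
print(sum(bin(b).count("1") for b in mutated))  # 1
```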

28
File Formats Are Languages

Kids don't get documentation when they learn new languages. They just pick 'em up.
They can do this because they actually design all sorts of internal structure and redundancy into them.
Children make languages.
Adults make working languages.
Programmers make barely working languages.
Let's autodiscover them!

29
N'est-ce pas Non Sequitur?

Sequitur: Linear-Time Pattern Finder
Creates hierarchical context-free grammars from arbitrary input
A compression algorithm in which you can look under the covers to see what's going on
Created by Craig Nevill-Manning as his PhD thesis a decade ago
He's now Chief Research Scientist at Google
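Sequitur itself maintains its grammar incrementally in linear time, which takes some care to implement. As a simpler stand-in, here is a Python sketch of Re-Pair-style inference: repeatedly replace the most frequent adjacent pair with a new rule. This is a different (offline) algorithm from Sequitur, swapped in only to show what a hierarchical grammar over raw bytes looks like.

```python
from collections import Counter


def infer_grammar(data: bytes, min_count: int = 2):
    """Re-Pair-style inference (NOT Sequitur): repeatedly replace the most frequent
    adjacent pair of symbols with a new rule. Symbols 0-255 are literal bytes;
    rule ids start at 256 and map to a (left, right) pair of symbols."""
    seq = list(data)
    rules = {}
    next_id = 256
    while len(seq) > 1:
        pair, count = Counter(zip(seq, seq[1:])).most_common(1)[0]
        if count < min_count:
            break                              # nothing repeats any more
        rules[next_id] = pair
        out, i = [], 0
        while i < len(seq):                    # replace non-overlapping occurrences
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq, next_id = out, next_id + 1
    return seq, rules


def expand(symbol: int, rules) -> bytes:
    """Recover the literal bytes a symbol stands for."""
    if symbol < 256:
        return bytes([symbol])
    return b"".join(expand(child, rules) for child in rules[symbol])


data = b"abcabcabcabc"
seq, rules = infer_grammar(data)
print(b"".join(expand(s, rules) for s in seq) == data)  # True
```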

30
Syntax Highlighting For Hex Dumps

Trivial Algorithm: In a hierarchical grammar, each byte requires traversing to a certain depth in order to recover the raw literal.
Color each byte by how deep in the tree you have to go.
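The depth-coloring rule as a Python sketch. It assumes a grammar already exists in a simple form (a dict from rule id to a tuple of child symbols, ids below 256 being literal bytes); the example grammar is hand-written, and the ANSI palette is an arbitrary choice.

```python
def byte_depths(seq, rules):
    """Pair each literal byte with how many rules deep it sits in the grammar.
    Ids >= 256 are rules mapping to a tuple of child symbols; ids < 256 are bytes."""
    out = []

    def walk(symbol, depth):
        if symbol < 256:
            out.append((symbol, depth))
        else:
            for child in rules[symbol]:
                walk(child, depth + 1)

    for symbol in seq:
        walk(symbol, 0)
    return out


def colorize(seq, rules) -> str:
    """Hex dump with one ANSI foreground color per depth; deeper bytes use later colors."""
    palette = [37, 36, 32, 33, 35, 31]  # white, cyan, green, yellow, magenta, red
    cells = []
    for value, depth in byte_depths(seq, rules):
        color = palette[min(depth, len(palette) - 1)]
        cells.append(f"\x1b[{color}m{value:02x}\x1b[0m")
    return " ".join(cells)


# Hand-written toy grammar: R257 -> R256 'c', R256 -> 'a' 'b'; input is R257 R257 'd'.
rules = {256: (0x61, 0x62), 257: (256, 0x63)}
print(byte_depths([257, 257, 0x64], rules))
# [(97, 2), (98, 2), (99, 1), (97, 2), (98, 2), (99, 1), (100, 0)]
print(colorize([257, 257, 0x64], rules))
```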

31
BLUR-O-VISION
32
What's Actually Going On?

(0) -> (73),b4,(73),ca,(73),e6,(73),02,(74),18,(74),2c,(74),4a,(74),5c,
(74),6e,(74),80,(74),98,(74),b0,(74),c8,(74),e8,(74),fc,(74),10,(75),20,
(75),30,(75),40,(75),50,(75),64,(75),82,(75),90,(75),9e,(75)(84),d6,(84),
ee,(84),0c,(85),28,(85),3c,(85),4e,(85),66,(85),7e,(85),8c,(85),9e,(85),
ac,(85),be,(85),ca,(85),ea,(85),08,(86),26,(86),44,(86),56,(86),6a,(86),
7c,(86),8a,(86),a6,(86),b6,(86),cc,(86),de,(86),02,(87)
Repeated sequence, single-byte literal. Repeated sequence, single-byte literal. Rinse, lather, repeat.

33
Intersymbol Link Discovery

Turns code on the left into the symbolic set on the right; it's easy then to link the symbols together as per the graph.
This works for non-textual data
Sequitur imputes meaningful symbols from arbitrary input data
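Linking symbols into a graph is mostly a matter of emitting one edge per rule-to-child reference. This Python sketch writes Graphviz DOT text from a grammar in a simple assumed form (rule id to tuple of child symbols); it is not Mergedot, whose input and output formats the slides do not describe.

```python
def grammar_to_dot(rules) -> str:
    """Graphviz DOT text with an edge from each rule to every distinct symbol it uses.
    Rule ids >= 256 become R<id>; literal bytes become quoted hex labels."""
    def node(symbol):
        return f"R{symbol}" if symbol >= 256 else f'"0x{symbol:02x}"'

    lines = ["digraph grammar {"]
    for rule_id in sorted(rules):
        for child in dict.fromkeys(rules[rule_id]):  # de-duplicate, keep order
            lines.append(f"  {node(rule_id)} -> {node(child)};")
    lines.append("}")
    return "\n".join(lines)


rules = {256: (0x61, 0x62), 257: (256, 0x63)}
print(grammar_to_dot(rules))
```

Saved to `grammar.dot`, the output renders with `dot -Tpng grammar.dot -o grammar.png`.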

34
Context-Free Grammar Fuzzer: THE CFG9000

Reduce input data to a stream of symbols
Fuzz data at the symbol level, rather than at the level of pure bytes:
Shuffle
Drop
Repeat
Uniform Corrupt
Consistently corrupt all instances of a given symbol
Sequitur is not necessarily the best way to generate a grammar.
Doesn't handle recursion, common in genomic data
Suffix trees may yield better output
Sequitur may scale better (100MB input not an issue)
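A Python sketch of several of these operations (shuffle, drop, repeat, and consistent corruption; "uniform corrupt" is left out because the slide does not define it). It is not CFG9000's code, and it assumes the same simple grammar form as a dict from rule id to a tuple of child symbols. The key point is that consistent corruption edits a rule body once, so every instance of that symbol changes identically.

```python
import random


def expand(symbol, rules) -> bytes:
    """Literal bytes for a symbol: ids < 256 are bytes, larger ids are rules."""
    if symbol < 256:
        return bytes([symbol])
    return b"".join(expand(child, rules) for child in rules[symbol])


def render(seq, rules) -> bytes:
    return b"".join(expand(s, rules) for s in seq)


def fuzz(seq, rules, op, rng):
    """Apply one symbol-level mutation; returns a new (seq, rules), leaving inputs intact."""
    seq, rules = list(seq), dict(rules)
    i = rng.randrange(len(seq))
    if op == "shuffle":
        rng.shuffle(seq)                               # reorder whole symbols
    elif op == "drop":
        del seq[i]                                     # remove one symbol
    elif op == "repeat":
        seq[i:i + 1] = [seq[i]] * rng.randint(2, 8)    # stutter one symbol
    elif op == "corrupt_all":
        if not rules:
            raise ValueError("no rules to corrupt")
        # Rewrite ONE rule body, so every instance of that symbol (including
        # uses nested inside other rules) is corrupted the same way.
        target = rng.choice(sorted(rules))
        size = len(expand(target, rules))
        rules[target] = tuple(rng.randrange(256) for _ in range(size))
    else:
        raise ValueError(f"unknown op: {op}")
    return seq, rules


rules = {256: (0x61, 0x62), 257: (256, 0x63)}   # toy grammar: "ab", then "abc"
seq = [257, 0x64, 257, 0x65]                     # renders as b"abcdabce"
new_seq, new_rules = fuzz(seq, rules, "corrupt_all", random.Random(1))
out = render(new_seq, new_rules)
print(out[0:3] == out[4:7])  # True: both "abc" instances were corrupted the same way
```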

35
Sample CFG9000 Output

calculate_rule_usage(p->rulep->rulep->rulep->rulep->rulep->rulep->rulep->rulep->rulep->rulep->rulep->rulep->rulep->rule()
calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(
calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(
calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(
calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(
calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(
calculate_rule_usage(calculate_rule_usage(calculate_rule_usage(p->rule())

36
Slashdot Fuzzed
37
Slashdot Fuzzed (2)
38
It's Not The Best CFG Fuzzing Ever

Many physicists would agree that, had it not been
for congestion control, the evaluation of web
browsers might never have occurred. In fact, few
hackers worldwide would disagree with the
essential unification of voice-over-IP and public
private key pair. In order to solve this riddle,
we confirm that SMPs can be made stochastic,
cacheable, and interposable.
Rooter: A Methodology for the Typical Unification of Access Points and Redundancy
By A Context-Free Grammar Generating CompSci Papers
The authors hand-coded meaningful symbols in CompSci-speak. The eventual goal is the autogeneration of symbol and inter-symbol patterns.

39
Symbolic Discovery Is Inevitable

An early inference procedure was described by
Chomsky and Miller (1957a), as reported in
Solomonoff (1959). Chomsky proposed a method for
detecting loops in finite state languages. The
approach requires a set of valid sentences, and
an oracle that determines whether a sentence is
in the language. The algorithm proceeds by
deleting part of a valid sentence and asking the
oracle whether the sentence is still valid. If it
is, the deleted part is reinserted into the
sequence and repeated, so that it appears twice.
If the sentence is still in the language, a cycle
has been detected.
Inferring Sequential Structure, Craig Nevill-Manning, 1996
This couldn't POSSIBLY be useful for building a structure for a dumb fuzzer to operate against.
Instead of seeing if the parser crashes, just see if it considers the input valid
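The quoted procedure translates almost directly into code. In this Python sketch a regular expression plays the oracle; in the fuzzing use suggested above, the oracle would instead be "does the target parser accept this input", which is an assumption about how one would wire it up. The brute-force loop tries every substring, so it is only practical for short samples.

```python
import re


def find_cycles(sentence: str, oracle) -> list:
    """Delete each substring; if the sentence is still valid, reinsert it twice.
    If that doubled sentence is valid too, the substring is a candidate loop."""
    cycles = []
    for i in range(len(sentence)):
        for j in range(i + 1, len(sentence) + 1):
            part = sentence[i:j]
            deleted = sentence[:i] + sentence[j:]
            doubled = sentence[:i] + part * 2 + sentence[j:]
            if oracle(deleted) and oracle(doubled):
                cycles.append(part)
    return cycles


def toy_oracle(s: str) -> bool:
    """Stand-in for 'does the parser accept this?': the finite-state language a(bc)*d."""
    return re.fullmatch(r"a(bc)*d", s) is not None


print(find_cycles("abcd", toy_oracle))  # ['bc']
```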

40
TODO

Requitur: Sequitur implementation optimized for fuzzer use
Generate larger symbols
No two-byte symbols, please; we're not trying to compress, we're trying to elucidate structure
Eliminate redundant symbols
Kieffer-Yang optimization in 2001: If symbol (x) = symbol (y), then delete (y) and set all instances of (y) to (x)
Need to do this to actually consistently fuzz all instances of a particular trope
Possibly remove in-memory grammar requirement
Use mechanisms from Ray, an out-of-memory variant
Add foreign grammar capability
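A sketch of the redundant-symbol rule exactly as the TODO states it, reading "symbol (x) = symbol (y)" as "the two rules expand to identical bytes". This illustrates that single rule only, not the full Kieffer-Yang grammar transform, and keeping the lowest-numbered rule is an arbitrary choice.

```python
def merge_duplicate_rules(seq, rules):
    """If two rules expand to the same bytes, keep the lowest-numbered one and
    redirect every use of the others to it. Ids < 256 are literal bytes."""
    def expand(symbol):
        if symbol < 256:
            return bytes([symbol])
        return b"".join(expand(child) for child in rules[symbol])

    canonical, alias = {}, {}
    for rule_id in sorted(rules):
        alias[rule_id] = canonical.setdefault(expand(rule_id), rule_id)

    def resolve(symbol):
        return alias.get(symbol, symbol)

    merged = {rid: tuple(resolve(c) for c in body)
              for rid, body in rules.items() if alias[rid] == rid}
    return [resolve(s) for s in seq], merged


rules = {256: (0x61, 0x62), 257: (0x61, 0x62), 258: (257, 0x63)}  # 256 and 257 are both "ab"
print(merge_duplicate_rules([256, 258, 257], rules))
# ([256, 258, 256], {256: (97, 98), 258: (256, 99)})
```

After the merge, corrupting rule 256 reaches every "ab" in the input, which is what consistent fuzzing of a trope needs.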

41
What's Out Now

8-Bit Clean: Can Analyze Arbitrary Data
Mergedot: Can create graph from Sequitur output

42
How To Think Of Sequitur

Any time you're manipulating data as bytes, think of manipulating it as symbols
Trigram histograms on bytes -> Trigram histograms on symbols
Bayesian probabilities on characters -> Bayesian probabilities on symbols
Adapt yourself to more than 256 codes per symbol and reap the benefit
If your code is already Unicode-aware, you might be one step ahead!
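The bytes-to-symbols shift is often free if the code never assumed 8-bit values in the first place. For example, this Python trigram counter works unchanged on raw bytes and on sequences of grammar rule ids above 255.

```python
from collections import Counter


def trigram_histogram(symbols) -> Counter:
    """Count trigrams over any sequence of hashable symbols: byte values (0-255)
    or grammar rule ids (256 and up); nothing assumes a symbol fits in a byte."""
    return Counter(zip(symbols, symbols[1:], symbols[2:]))


print(trigram_histogram(b"abcabc")[(97, 98, 99)])                          # 2
print(trigram_histogram([300, 301, 302, 300, 301, 302])[(300, 301, 302)])  # 2
```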

43
Fuzzy Wuzzy Wuz A Symbol

Symbol analysis systems (language translators, etc.) have issues w/ TMTOWTDI (There's More Than One Way To Do It)
Very similar messages can be encapsulated in very different ways
Very similar messages can be encapsulated in very similar, but not identical, ways
Sequitur only handles exact matches; fuzzy grammar imputation doesn't appear to exist yet
Are there any systems for analyzing complex, unequal but somewhat related sets of symbols?

44
Another Approach: DotPlots

Popular mechanism in bioinformatics for visual analysis of genomes.
Some attempts to apply dotplots outside of bioinformatics:
Textual analysis
Audio
Remembered an old paper, entitled Visualizing Music And Audio Using Self-Similarity
Jonathan Foote from Xerox
Brute-force solution: compare songs to themselves, splitting them into tiny chunks and marking light for similar and dark for dissimilar
Disassociated Studio will do this for you
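A minimal Python sketch of the brute-force self-similarity idea applied to arbitrary bytes. Assumptions: exact byte equality stands in for the similarity metric (the talk's own metric is not spelled out here), chunks do not overlap, and the output is a plain-text PGM image that most image viewers can open.

```python
def dotplot(a: bytes, b: bytes, window: int = 4) -> list:
    """Brute-force dotplot over non-overlapping `window`-byte chunks.
    Row y is chunk y of `a`, column x is chunk x of `b`; a cell is 1 when the two
    chunks are identical. Pixel (x, y) thus maps to byte offsets x*window, y*window."""
    chunks_a = [a[i:i + window] for i in range(0, len(a) - window + 1, window)]
    chunks_b = [b[j:j + window] for j in range(0, len(b) - window + 1, window)]
    return [[1 if ca == cb else 0 for cb in chunks_b] for ca in chunks_a]


def to_pgm(matrix) -> str:
    """Plain-text PGM image: similar cells light (255), dissimilar cells dark (0)."""
    rows = [" ".join("255" if cell else "0" for cell in row) for row in matrix]
    return f"P2\n{len(matrix[0])} {len(matrix)}\n255\n" + "\n".join(rows) + "\n"


data = b"HEAD" + b"ABCD" * 3 + b"TAIL"
print(to_pgm(dotplot(data, data)))  # the repeated ABCD chunks form a bright 3x3 block
```

Autocorrelation is just `dotplot(a, a)`; passing two different files gives the cross-correlation view.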

45
Day Tripper, from the Beatles: Music shows internal pattern.

46
So does MPEG.
47
What Exactly Are We Doing?

Jonathan Helfman's "Dotplot Patterns: A Literal Look at Pattern Languages" offers an introduction
Instead of "to", "be", "not", etc., we use chunks of data from arbitrary files
The same similarity metric used to disambiguate names for the SSH hack is used to measure similarity here

48
There are so many patterns we might see
49
and no matter how much we've learned of this pattern language
50
???
51
So How Might This Be Useful?

A) Format Identification
1) Do different file formats appear different?
2) Do different instances of the same file
format appear similar?
3) Does one format embedded in another make
itself apparent?
B) Fuzzer Guidance
1) Can we locate the actual byte offsets where
one section ends and another begins?
2) Can we visualize and compare fuzzer
operations via Dotplots?

52
Format Identification

1) Do different files appear different, and does
the appearance reflect the existence of internal
structure?
2) Do different instances of the same file
format appear similar?
3) Does one format embedded in another make
itself apparent?

53
Java Class Files
54
.NET Assemblies
55
CNN's Home Page
56
SMBTorture Traffic (Packets; Note, Stop/Start Is Visible)
57
Kernel32.dll
58
Chromosome 22 (This is, after all, a genomics hack)
59
The Legend Of Zelda
60
Format Identification

1) Do different files appear different, and does
the appearance reflect the existence of internal
structure?
Answer: Yes. They do.
2) Do different instances of the same file
format appear similar?
3) Does one format embedded in another make
itself apparent?

61
Books from Project Gutenberg: Consistent
Despite English's low information content, the lack of even mildly related strings causes little self-similarity across symbol clusters
62
US Code: Moderately Consistent
Legalese is a massively structured dialect.
Symbols appear in very distinct patterns that are
more reminiscent of machine code than text.
63
HTML: Consistent
HTML repeats smaller symbols (tags) and larger
symbol clusters (via template engines) regularly.
This shows up visually as a tightly repeating
pattern.
64
Java Class Files (Compared): Mildly Consistent

Binary code (be it bytecode or x86) tends to be
very structured. Still, we are dependent on both
the content and the compiler to generate distinct
patterns.

65
x86: Consistent (In Sections)
x86 tends not to be handwritten; as such, complex instructions are emitted in a highly structured form.
66
Exception?

64 kilobyte graphical demonstration
Run through a packer?
Compression removes patterns

67
NES Games
6502 Assembly Tends To Show Consistent Patterns, But...
68
Mario Games Look Rather Different.

Output is highly dependent on the compiler
Output is highly dependent upon the actual
content
File formats are merely shells for actual content. You are analyzing the content; the format is just syntactic sugar.

69
Format Identification

1) Do different files appear different, and does
the appearance reflect the existence of internal
structure?
Answer: Yes. They do.
2) Do different instances of the same file
format appear similar?
Answer: Somewhat. Similar content looks like itself, but you're measuring the fundamental entropy of the underlying content, not the format of the content itself.
3) Does one format embedded in another make
itself apparent?

70
File Formats Contain Multiple Subformats: Another Look At Kernel32.DLL
These are all different parts of Kernel32.
71
Quickly Browsing Large Files: Tilt-Shift View

Instead of measuring absolute Y against absolute
X, make X relative
Advance through the file going down, look back a
number of bytes going right
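A Python sketch of the relative-X layout, under the same assumptions as a chunked, exact-match dotplot (non-overlapping chunks, byte equality as the similarity test). Because the width is capped at `max_lag` columns, a multi-megabyte file becomes a tall strip you can scroll, instead of an enormous square.

```python
def tilt_shift(data: bytes, window: int = 4, max_lag: int = 64) -> list:
    """Row y is chunk y of the file (advancing downward); column d - 1 is 1 when that
    chunk also occurred d chunks earlier (looking back, going right)."""
    chunks = [data[i:i + window] for i in range(0, len(data) - window + 1, window)]
    return [
        [1 if y - d >= 0 and chunks[y - d] == chunk else 0 for d in range(1, max_lag + 1)]
        for y, chunk in enumerate(chunks)
    ]


rows = tilt_shift(b"AAAABBBB" * 4, window=4, max_lag=4)
print(rows[4])  # [0, 1, 0, 1]: this "AAAA" chunk recurs 2 and 4 chunks back
```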

72
Complain All You Want. Hex Still Sucks.
73
Format Identification

1) Do different files appear different, and does
the appearance reflect the existence of internal
structure?
Answer: Yes. They do.
2) Do different instances of the same file
format appear similar?
Answer: Somewhat. Similar content looks like itself, but you're measuring the fundamental entropy of the underlying content, not the format of the content itself.
3) Does one format embedded in another make
itself apparent?
Answer: Yes. Multiple, distinct sections are clearly visible in a way that hex cannot show.

74
Fuzzer Guidance

1) Can we locate the actual byte offsets where
one section ends and another begins?
Why would we want to?
Fuzzers break parsers.
Many subformats to a format, many subparsers to a
parser
To a rough level of approximation, fuzzing a
single subformat lets you stress a single
subparser
So once we split a file up, we can selectively
attack one subparser at a time.
2) Can we visualize and compare fuzzer
operations via Dotplots?
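Once a section's byte range is known, attacking one subparser at a time can be as simple as confining mutations to that range. A Python sketch, with the region supplied by hand (for example, read off a dotplot); the idea that the damage stays within one subparser is the slide's rough approximation, not a guarantee.

```python
import random


def fuzz_region(data: bytes, start: int, length: int, count: int, seed: int = 0) -> bytes:
    """Overwrite `count` random bytes, but only inside [start, start + length), so that
    (roughly) a single subformat, and hence a single subparser, absorbs the damage."""
    end = min(len(data), start + length)
    if not 0 <= start < end:
        raise ValueError("empty or out-of-range region")
    rng = random.Random(seed)
    buf = bytearray(data)
    for _ in range(count):
        buf[rng.randrange(start, end)] = rng.randrange(256)
    return bytes(buf)


original = bytes(1000)
mutated = fuzz_region(original, start=100, length=50, count=20)
print(mutated[:100] == original[:100] and mutated[150:] == original[150:])  # True
```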

75
Simple Math
We select an interesting blob from kernel32.dll.
The blob is at pixel offset 507x507, and is a square around 570 pixels wide. Window size on the viz was 32.
507 * 32 = 16224: the interesting section starts 16224 bytes into the file.
570 * 32 = 18240: the interesting section is 18240 bytes long.
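The same arithmetic as a Python helper, assuming one pixel per window-sized chunk, which is what the multiplication above implies. The last line just prints a `dd` invocation for the computed region; `count=` limits the copy to the region's length.

```python
def pixels_to_bytes(pixel_offset: int, pixel_width: int, window: int) -> tuple:
    """(start byte, length in bytes) for a dotplot region, assuming one pixel per window."""
    return pixel_offset * window, pixel_width * window


start, length = pixels_to_bytes(507, 570, 32)
print(start, length)  # 16224 18240
print(f"dd if=kernel32.dll bs=1 skip={start} count={length} | hexdump -C")
```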
76
What's The Actual Data? dd if=kernel32.dll bs=1 skip=16100 | hexdump | more
77
Using Hardcorr as a first knife to locate
interesting-to-fuzz regions
78
Fuzzer Guidance

1) Can we locate the actual byte offsets where
one section ends and another begins?
Answer: Yes. We can quickly route from the image to the byte offset through basic arithmetic.
2) Can we visualize and compare fuzzer
operations via Dotplots?

79
Differentials

Major use of dotplots in bioinformatics is to
compare one genome against another
Autocorrelation: Compare A to A
Cross-Correlation: Compare A to B
Most files are sufficiently dissimilar that little interesting structure shows up
Notable exception: Different versions of the same binary
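Why versions of one binary cross-correlate so well can be shown numerically. This Python sketch indexes every window of B and records, for each matching window of A, the shift between the two positions; matching regions pile up on one diagonal, and an insertion moves everything after it to a new diagonal offset by the inserted length. It uses exact window matches, an assumption standing in for whatever metric the visual bindiff uses.

```python
from collections import Counter, defaultdict


def diagonal_offsets(a: bytes, b: bytes, window: int = 4) -> Counter:
    """Cross-correlate A against B: for each window of A found in B, record the
    shift (position in B minus position in A). Unchanged code stays on one
    diagonal; an insertion in B pushes later matches onto a diagonal shifted
    by the inserted length."""
    positions = defaultdict(list)
    for j in range(len(b) - window + 1):
        positions[b[j:j + window]].append(j)
    shifts = Counter()
    for i in range(len(a) - window + 1):
        for j in positions.get(a[i:i + window], []):
            shifts[j - i] += 1
    return shifts


old = bytes(range(64))
new = bytes(range(32)) + b"\xff" * 8 + bytes(range(32, 64))  # 8 bytes inserted mid-file
print(diagonal_offsets(old, new))  # Counter({0: 29, 8: 29})
```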

80
Visual Bindiff!
81
MSVCR70.DLL v. MSVCR71.DLL
82
Fuzzers: Very Broken Patchers?
Mangle.C: Single-Bit Differences
CFG9000: Large-Scale Reordering
83
Fuzzer Guidance

1) Can we locate the actual byte offsets where
one section ends and another begins?
Answer: Yes. We can quickly route from the image to the byte offset through basic arithmetic.
2) Can we visualize and compare fuzzer
operations via Dotplots?
Answer: Yes; visual diffing effectively shows differences between files, including differences introduced by various flavors of fuzzers.