Black Ops 2006 Viz Edition CCC 2006

1
Black Ops 2006: Viz Edition, CCC 2006
  • Dan Kaminsky
  • Director Of Penetration Testing
  • IOActive

2
Thanks and No Thanks
  • Thank You To Swissotel Amsterdam, who provided a
    net connection with which I could actually finish
    these slides
  • No Thanks to Delta Hotel of Amsterdam, which put
    a TV on a really weak shelf.
  • I suppose it's my fault I put my laptop
    underneath.
  • The Star System is officially meaningless

3
Who Am I?
  • Coauthor of several book series
  • Hack Proofing Your Network
  • Stealing The Network
  • Formerly of Cisco and Avaya
  • Presently partnering with IOActive
  • One of the Blue Hat Hackers that has been
    auditing Windows Vista
  • Been doing talks for six years now
  • TCP/IP, DNS, MD5, SSH, etc.

4
What Are We Here To Do?
  • Break TCP/IP A Little More
  • Not in the documentation
  • It's for a good cause :)
  • Analyze Data Linguistically
  • Make Pretty Pretty Pictures!

5
For Various Definitions Of Pretty: Visual Bindiff
6
The Ancient Tongue: TCP/IP
  • Can't all be about pretty pictures :)
  • A new problem has popped up: network oligopolies
    are threatening to install firewalls that limit
    or eliminate bandwidth on a per-company basis
  • Their own media services might be fast, others
    will be slow
  • Their own VPN services might be fast, others will
    be slow
  • Question: Is it possible to detect and locate
    devices violating network neutrality?

7
What's The Closest Tool We Have?
  • Firewalk
  • Mike Schiffman's Firewall Analysis Tool
  • Packets elicit an ICMP Time Exceeded error if they
    reach a router with TTL=0
  • TTL is decremented by one at each hop, so if you
    start low, you can trace the route to a host
  • A firewalled packet won't live long enough to
    reach TTL=0
  • So you can locate the firewall, and divine things
    about its ruleset, based on when your packets
    stop getting ICMP Time Exceeded
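
The TTL logic above can be sketched without touching a real network. This is a toy simulation, not Firewalk itself: the hop names, the `firewall_at` parameter, and the rule that every hop past the filter stays silent are all assumptions made for illustration.

```python
def firewalk(path, firewall_at=None):
    """Return the farthest hop that answered with ICMP Time Exceeded.

    path        -- hop names from us toward the target (made-up data)
    firewall_at -- index of the hop that filters our probes, or None
    """
    farthest = None
    for ttl in range(1, len(path) + 1):
        expires_at = ttl - 1          # hop where this probe's TTL reaches 0
        if firewall_at is not None and expires_at > firewall_at:
            break                     # probe was dropped before it could expire
        farthest = path[expires_at]   # this hop sent Time Exceeded
    return farthest

path = ["isp-gw", "core-1", "fw", "dmz-1", "target"]
print(firewalk(path, firewall_at=2))  # last hop heard from is the filter
print(firewalk(path))                 # nothing filtered: probes reach the end
```

A real tool would send probes and wait for replies with timeouts; the point here is only that the last TTL that still draws a Time Exceeded marks where filtering starts.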

8
Limitations of Firewalking
  • But Firewalk tells us what, not who, is
    blocked, and it tells us nothing about who is
    allowed to go fast, and who is made to go slow
  • Suddenly, we devolve to a much older question:
    Is it possible to find out that a target firewall
    is, or is not, blocking against or accepting
    traffic from an arbitrary IP address?

9
TCP Does Speed Measurement
  • TCP speed analysis done blindly
  • Endpoints do not negotiate with one another
  • Everyone sends their packets, routers route what
    they will. Endpoints need to adjust to what the
    routers are willing to pass.
  • Routers communicate with endpoints by dropping
    their packets
  • Can we combine this router backchannel w/
    Firewalk?

10
In From The Side
  • What causes packets to drop?
  • Too many packets
  • What are we going to do?
  • Send too many packets
  • Two channels are set up
  • A primary channel, which drops packets at some
    known rate
  • A secondary channel, whose purpose it is to
    interfere (or not) with the primary channel
  • When the secondary interferes with the primary,
    we get feedback via the primary channel
  • The traffic composing the secondary channel can
    come from anywhere, be composed of anything, and
    can be TTL'd just like in a normal firewalk.

11
The TTL Channel
  • Normally, you don't know which router along a
    path is dropping your packets
  • If you are the source of the drop-inducing
    packets, you can control how far your noise goes
    out; thus, you can discover which router is
    hitting its limit / censoring your net connection

12
Scorchmarking
  • Why Scorchmarking?
  • Routers are burning packets; those that get
    through might have a scorch mark or two :)
  • Basic Model
  • Client downloads a file from a site, at some
    given speed negotiated via TCP.
  • At the same time, traffic is injected from
    different IP addresses. This should cause drops.
  • If it doesn't, the network is either penalizing
    the primary channel (easy to drop against) or
    rewarding the secondary channel (resilient to
    drops)
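
A rough way to reason about the basic model is a fluid simulation: each router has a capacity, and when primary plus secondary traffic exceeds it, the router drops proportionally. Everything below (the capacities, the rates, the proportional-drop rule, and the `secondary_blocked_at` filter) is an assumption for illustration, not a measurement technique from the talk.

```python
def primary_loss(capacities, primary_rate, secondary_rate, secondary_ttl,
                 secondary_blocked_at=None):
    """Fraction of primary packets lost along a path (toy fluid model)."""
    surviving = float(primary_rate)
    for hop, capacity in enumerate(capacities):
        noise = secondary_rate
        if hop >= secondary_ttl:
            noise = 0      # secondary packets expired before this hop
        if secondary_blocked_at is not None and hop > secondary_blocked_at:
            noise = 0      # a filter upstream ate the secondary channel
        offered = surviving + noise
        if offered > capacity:
            surviving *= capacity / offered   # proportional drop
    return 1 - surviving / primary_rate

def locate_bottleneck(capacities, primary_rate, secondary_rate):
    """Raise the secondary TTL until the primary channel starts losing."""
    for ttl in range(1, len(capacities) + 1):
        if primary_loss(capacities, primary_rate, secondary_rate, ttl) > 0:
            return ttl - 1
    return None

caps = [100, 100, 50, 100]            # hop 2 is the tight link
print(primary_loss(caps, 40, 40, secondary_ttl=4))
print(primary_loss(caps, 40, 40, secondary_ttl=4, secondary_blocked_at=1))
print(locate_bottleneck(caps, 40, 40))
```

If the secondary source is filtered before the tight link, the primary channel sees no extra loss, which is the signal the slide describes.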

13
Advanced Scorchmarking 0
  • Having to depend on a client is lame
  • Wouldn't it be nice if we could scan the Internet
    for these servers?
  • What fundamental service is a receiving client
    providing?
  • It is acknowledging our traffic: letting us know
    how much it received, and how many milliseconds
    it took to receive it
  • Aren't there other ways we could extract the same
    data from hosts?

14
Advanced Scorchmarking 1
  • What else will acknowledge receiving traffic from
    us?
  • TCP Servers
  • Sting, from Stefan Savage, used this to great
    effect
  • DNS Servers :)
  • Routers.
  • Supposedly, routers won't send more than a
    certain number of ICMP Time Exceeded packets per
    second
  • In reality, they seem to ICMP Time Exceeded "ACK"
    however much you throw at them
  • Even if they didn't, you could use the difference
    in ICMP Time Exceeded rates between the Primary
    and Secondary channels to determine whether
    interference was showing up.
  • Everyone's got a NAT, so you can query everyone
    for whether certain sorts of traffic are being
    blocked to them

15
Advanced Scorchmarking 2
  • So, yes.
  • You can scan for violations of Network Neutrality
  • You can find networks that are blocking or
    passing particular IP ranges
  • It's not exactly efficient though
  • Neutrality violations are easier to find than the
    standard FW case
  • Firewalls are normally between the WAN and the
    LAN (Slow Net -> FW -> Fast Net)
  • Neutrality violators are mid-WAN (Slow Net -> FW
    -> Slow Net -> Fast Net)
  • Easier to overload the slow net after the
    firewall
  • Boxes with max TTL rates override this

16
Speed Limits
  • Fundamental Problem: Have to max out bandwidth
    on the link to trigger the backchannel
  • No packets dropping, no data
  • Means you have to DoS a link: not scalable/legal
  • Potential Solution: Find capped acknowledgers
  • The mythical ICMP Time Exceeded (ITE) rate limit
    works well
  • Primary and Secondary channel both eliciting
    ITEs
  • When the secondary channel gets a packet through,
    it takes up one of the primary channel's slots
  • ITE is perfect, since you can TTL limit any
    packet
  • Depends on the firewall passing the primary's
    ITEs
  • Maybe Linux / NATs actually implement rate
    limits?
  • Another option: What if we have code on the
    client?

17
Windows Media Player: More Than Just DRM. Really!
  • Bulk Transfer: RTP
  • Runs over Unicast UDP
  • Yes, the same Unicast UDP that penetrates NAT so
    well!
  • Flow Control / Quality Monitoring: RTCP
  • No technical reason RTCP needs to go back to the
    same address that the RTP stream is coming from
  • So: we pretend to provide media streams from all
    sorts of sites, and use WMP to collect traffic
    stats for us :)
  • It might work

18
Symbols
  • But this is not to be a talk on TCP/IP hackery

19
SSH's Hex Problem
  • ssh dan@blah
    The authenticity of host 'blah (1.2.3.4)' can't
    be established.
    RSA key fingerprint is
    09:a9:b1:99:84:17:7d:ba:c6:55:46:5a:17:f8:83:01.
    Are you sure you want to continue connecting
    (yes/no)?
  • 09:a9:b1... am I supposed to do something with
    this?
  • Yes. According to SSH's design, you're supposed
    to reject the proposed fingerprint if it looks
    unfamiliar. (Seriously.)
  • The Two Billion SSH Key attack (by ADM) just
    comes up with 2B keys and emits the visibly
    closest key. It works.

20
Hex sucks. A better mapping must be possible
21
Cryptomnemonics
  • There are three classes of memory, at least to
    the degree that is useful in cryptography
  • Rejection: "I've never seen that before"
  • Recognition: "It's that one, not that other one"
  • Recollection: "Let me describe it to you."
  • SSH just requires rejection: "What? That's
    new."
  • Hex domain clearly does not work. What else is
    available?
  • To restate the problem: Humans do not operate on
    hexadecimal symbols effectively. Are there any
    other symbol sets we can use?

22
Alternative Symbolic Domains
  • Abstract Art via Déjà Vu
  • Calculated faces via Passfaces
  • Both have attempted to address limited
    capacity for recollection by moving authentication
    to a recognition problem
  • But recognition offers only a limited number of
    bits: 9^5 = 59049
  • This is OK, since Passfaces is online and thus
    can lock a user out before 59K attempts are up
  • We are not online, but we only need to reject,
    not recognize, and certainly not recollect

23
The Nymic Domain: Names Are Identity Symbols
  • Humans don't remember arbitrary bits, but we do
    remember stories.
  • Stories change (the bits shift over time), but
    names stay the same
  • Can we map the 160 bits SSH needs us to accept or
    reject, to names?
  • Take 512 male names: 9 bits of info per male
    name
  • Take 1024 female names: 10 bits of info per
    female name
  • Take 8192 last names: 13 bits of info per last
    name
  • 9 + 10 + 13 = 32. 5 couples = 160 bits
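
The bit-slicing behind that arithmetic might look like the sketch below. The name tables are placeholders (`m0`, `f0`, `l0`, ...), since the actual lists used in the talk are not given; only their sizes matter. The bit order within each 32-bit chunk is also an assumption.

```python
import hashlib

# Placeholder tables (hypothetical): real ones would hold actual names.
MALE = [f"m{i}" for i in range(512)]       # 9 bits
FEMALE = [f"f{i}" for i in range(1024)]    # 10 bits
LAST = [f"l{i}" for i in range(8192)]      # 13 bits

def couples(digest: bytes):
    """Map a 160-bit digest to five 'male and female last' couples."""
    if len(digest) != 20:
        raise ValueError("expected a 160-bit (20-byte) digest")
    value = int.from_bytes(digest, "big")
    story = []
    for i in range(5):
        chunk = (value >> (32 * (4 - i))) & 0xFFFFFFFF  # 32 bits per couple
        male = MALE[chunk >> 23]                 # top 9 bits
        female = FEMALE[(chunk >> 13) & 0x3FF]   # next 10 bits
        last = LAST[chunk & 0x1FFF]              # low 13 bits
        story.append(f"{male} and {female} {last}")
    return story

for line in couples(hashlib.sha1(b"example host key").digest()):
    print(line)
```

Because every one of the 160 bits selects part of a name, two different digests always produce different stories.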

24
Demo
  • ssh dan@blah
    Key Data:
      julio and epifania dezzutti
      luther and rolande doornbos
      manual and twyla imbesi
      dirk and cuc kolopajlo
      omar and jeana hymel
    The authenticity of host 'blah (1.2.3.4)' can't
    be established.
    Are you sure you want to continue connecting
    (yes/no)?
  • It is critical that the Key Data be shown every
    time there's a connection. The user must become
    familiar with the characters in the story.
  • This actually seems to work.

25
What about Bubble Babble?
  • ssh-keygen.exe -B -f id_dsa.pub
    1024 xegoz-tosys-vusik-masar-cifyc-cyled-kikih-zukuf-nypok-sezyt-noxax
    id_dsa.pub
  • Problem: Humans do not remember arbitrary
    sequences of syllables well
  • Names are special sequences; sharing
    pre-existing language logic should improve
    retention
  • Still, names are arbitrary (Bhoutros-Bhoutros
    Ghali), so we could merge approaches:
    Xegoz and Tosys Vusik, Masar and Cifyc Cyled,
    Kikih and Zukuf Nypok, Sezyt Noxax
  • Requires testing

26
Inverting The Symbol Flow: Passnyms
  • Suppose you have 8 characters with one of 64
    characters in each slot.
  • al7$13nM
  • 64 = 2^6, so (2^6)^8 = 2^48: 48 bits
  • Lowercase A, lowercase L, seven, dollar sign,
    one, three, lowercase N, uppercase M
  • This is twenty three syllables!
  • What if, instead, you typed
  • dirk and cuc kolopajlo, omar and jeana hymel
  • 64 bits of entropy, 14 syllables, can be spell
    checked as the user types it in
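
The two entropy figures can be checked in a few lines. The table sizes (512, 1024, 8192) come from the earlier slide; the function name is just a label for this sketch.

```python
import math

def bits(alphabet_size: int, slots: int) -> float:
    """Entropy in bits of `slots` independent picks from an alphabet."""
    return slots * math.log2(alphabet_size)

password_bits = bits(64, 8)      # 8 slots, 64 possible characters each

# One couple = one male name + one female name + one last name.
couple_bits = math.log2(512) + math.log2(1024) + math.log2(8192)
passnym_bits = 2 * couple_bits   # two couples, as in the example above

print(password_bits, passnym_bits)
```

This assumes names are picked uniformly at random by the system; names a user chooses for themselves would carry far less entropy.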

27
It Is Easier To Interface With Systems When
Symbols Align
  • Hacking is a form of interfacing :)
  • We can break things with garbage symbols
  • Dumb Fuzzing: Take a file, flip some bits, see
    what happens
  • We can break more things with meaningful symbols
    used in unexpected ways
  • Smart Fuzzing: Take a file, understand its
    internal structure, fuzz the structure, see what
    happens
  • Dumb fuzzing is very easy.
  • Smart fuzzing is very labor intensive: it requires
    smart people, maybe specifications.
  • Is there any way we can automatically discover
    symbol sets?
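
For reference, "dumb fuzzing" really is only a few lines. This is a generic bit flipper written for illustration; the function name and the seeded generator are choices made here, not something from the talk.

```python
import random

def flip_bits(data: bytes, flips: int, seed: int = 0) -> bytes:
    """Return a copy of `data` with `flips` randomly chosen bits inverted."""
    if not data:
        return data
    rng = random.Random(seed)        # seeded so a crash can be reproduced
    out = bytearray(data)
    for _ in range(flips):
        pos = rng.randrange(len(out) * 8)
        out[pos // 8] ^= 1 << (pos % 8)
    return bytes(out)

sample = bytes(range(16))
fuzzed = flip_bits(sample, flips=1, seed=1)
print(sample.hex())
print(fuzzed.hex())
```

Note what it does not know: where fields start and end, which bytes are lengths or checksums, or which bytes belong together. That gap is what the rest of the talk tries to close automatically.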

28
File Formats Are Languages
  • Kids don't get documentation when they learn new
    languages. They just pick 'em up.
  • They can do this because they actually design all
    sorts of internal structure and redundancy into
    them.
  • Children make languages.
  • Adults make working languages.
  • Programmers make barely working languages.
  • Let's autodiscover them!

29
N'est-ce pas Non Sequitur
  • Sequitur: Linear Time Pattern Finder
  • Creates hierarchical Context Free Grammars from
    arbitrary input
  • Compression Algorithm in which you can look
    under the covers to see what's going on
  • Created by Craig Nevill-Manning as his PhD
    thesis a decade ago
  • He's now Chief Research Scientist at Google
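
To get a feel for what a hierarchical grammar inferred from raw bytes looks like, here is a deliberately simpler stand-in. It is not Sequitur: it is a greedy digram-replacement scheme (in the spirit of Re-Pair) that makes repeated passes over the input instead of running in linear time. Rule ids start at 256 so they cannot collide with byte values.

```python
from collections import Counter

def infer_grammar(data: bytes):
    """Greedy digram replacement. Returns (top-level sequence, rules)."""
    seq = list(data)              # ints below 256 are literal bytes
    rules = {}                    # rule id (>= 256) -> (left, right)
    next_id = 256
    while len(seq) > 1:
        pair, count = Counter(zip(seq, seq[1:])).most_common(1)[0]
        if count < 2:
            break                 # nothing repeats any more
        rules[next_id] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_id += 1
    return seq, rules

def expand(symbol, rules) -> bytes:
    """Recover the raw bytes that a symbol stands for."""
    if symbol < 256:
        return bytes([symbol])
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)

def restore(seq, rules) -> bytes:
    return b"".join(expand(s, rules) for s in seq)

seq, rules = infer_grammar(b"abcabcabc")
print(seq, rules)
```

The useful property is the same one the slide points at: the grammar is a compressed form you can open up and inspect, and expanding it gives back the input exactly.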

30
Syntax Highlighting For Hex Dumps
  • Trivial Algorithm: In a hierarchical grammar,
    each byte requires traversing to a certain depth
    in order to recover the raw literal.
  • Color each byte by how deep in the tree you have
    to go.
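
A minimal sketch of the depth calculation, assuming the grammar is already available as a dict. The grammar here is hand-written to stand in for Sequitur output (integers reference rules, `bytes` objects are literals), and the shade characters replace real colors.

```python
def byte_depths(grammar, start=0, depth=0):
    """Yield (byte, depth) pairs by expanding `start` left to right."""
    for item in grammar[start]:
        if isinstance(item, bytes):
            for b in item:
                yield b, depth          # literal found at this depth
        else:
            # a reference to another rule: one level deeper
            yield from byte_depths(grammar, item, depth + 1)

SHADES = ".-=#"                          # one character per depth level

def shade(pairs):
    return "".join(SHADES[min(d, len(SHADES) - 1)] for _, d in pairs)

# Hypothetical grammar for b"abcabcXabc": rule 1 is the repeated "abc".
grammar = {
    0: [1, 1, b"X", 1],
    1: [b"abc"],
}
pairs = list(byte_depths(grammar))
print(bytes(b for b, _ in pairs))
print(shade(pairs))
```

Bytes that sit inside repeated structure come out deeper than one-off bytes, which is what makes the highlighted hex dump readable at a glance.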

31
BLUR-O-VISION
32
What's Actually Going On?
  • (0) -> (73),b4,(73),ca,(73),e6,(73),02,
    (74),18,(74),2c,(74),4a,(74),5c,(74),6e,(74),80,
    (74),98,(74),b0,(74),c8,(74),e8,(74),fc,(74),10,
    (75),20,(75),30,(75),40,(75),50,(75),64,(75),82,
    (75),90,(75),9e,(75) ... (84),d6,(84),ee,(84),0c,
    (85),28,(85),3c,(85),4e,(85),66,(85),7e,(85),8c,
    (85),9e,(85),ac,(85),be,(85),ca,(85),ea,(85),08,
    (86),26,(86),44,(86),56,(86),6a,(86),7c,(86),8a,
    (86),a6,(86),b6,(86),cc,(86),de,(86),02,(87)
  • Repeated sequence, single byte literal. Repeated
    sequence, single byte literal. Rinse, lather,
    repeat.

33
Intersymbol Link Discovery
  • Turns code on left into symbolic set on
    right; it's easy then to link the symbols
    together, as per the graph.
  • This works for non-textual data
  • Sequitur imputes meaningful symbols from
    arbitrary input data

34
Context Free Grammar Fuzzer: THE CFG9000
  • Reduce input data to a stream of symbols
  • Fuzz data at the symbol level, rather than at
    pure bytes
  • Shuffle
  • Drop
  • Repeat
  • Uniform Corrupt
  • Consistently corrupt all instances of a given
    symbol
  • Sequitur is not necessarily the best way to
    generate a grammar.
  • Doesn't handle recursion, common in genomic data
  • Suffix trees may yield better output
  • Sequitur may scale better (100MB input not an
    issue)
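
The four symbol-level operations listed above could be sketched like this. This is not the CFG9000 source; the grammar format (rule names as strings, literals as `bytes`) and the sample grammar are invented for the example.

```python
import random

def expand(grammar, symbol) -> bytes:
    """Flatten a symbol into bytes; bytes objects are literals."""
    if isinstance(symbol, bytes):
        return symbol
    return b"".join(expand(grammar, s) for s in grammar[symbol])

def render(grammar, stream) -> bytes:
    return b"".join(expand(grammar, s) for s in stream)

def shuffle(stream, rng):
    out = list(stream)
    rng.shuffle(out)
    return out

def drop(stream, rng):
    i = rng.randrange(len(stream))
    return stream[:i] + stream[i + 1:]

def repeat(stream, rng, times=3):
    i = rng.randrange(len(stream))
    return stream[:i] + [stream[i]] * times + stream[i + 1:]

def uniform_corrupt(grammar, name, rng):
    """Rewrite one rule, so every use of it is corrupted identically."""
    body = bytearray(expand(grammar, name))
    body[rng.randrange(len(body))] ^= 0xFF
    fuzzed = dict(grammar)               # leave the original untouched
    fuzzed[name] = [bytes(body)]
    return fuzzed

# Hypothetical grammar: "A" appears twice in the top-level stream "S".
grammar = {"S": ["A", b" ", "A", b" ", "B"], "A": [b"GET"], "B": [b"/index"]}
rng = random.Random(1)
print(render(grammar, repeat(grammar["S"], rng)))
fuzzed = uniform_corrupt(grammar, "A", rng)
print(render(fuzzed, fuzzed["S"]))
```

Uniform corruption is the one a byte-level fuzzer cannot imitate: because the damage is applied to the rule, both occurrences of "A" change in the same way.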

35
Sample CFG9000 Output
  • calculate_rule_usage(p->rulep->rulep->rulep->rule
    p->rulep->rulep->rulep->rulep->rulep->rulep->rule
    p->rulep->rulep->rule()
  • calculate_rule_usage(calculate_rule_usage(
    calculate_rule_usage(calculate_rule_usage(
    calculate_rule_usage(calculate_rule_usage(
    calculate_rule_usage(calculate_rule_usage(
    calculate_rule_usage(calculate_rule_usage(
    calculate_rule_usage(calculate_rule_usage(
    calculate_rule_usage(calculate_rule_usage(
    calculate_rule_usage(calculate_rule_usage(
    calculate_rule_usage(calculate_rule_usage(
    p->rule())

36
Slashdot Fuzzed
37
Slashdot Fuzzed (2)
38
It's Not The Best CFG Fuzzing Ever
  • Many physicists would agree that, had it not been
    for congestion control, the evaluation of web
    browsers might never have occurred. In fact, few
    hackers worldwide would disagree with the
    essential unification of voice-over-IP and public
    private key pair. In order to solve this riddle,
    we confirm that SMPs can be made stochastic,
    cacheable, and interposable.
  • "Rooter: A Methodology for the Typical Unification
    of Access Points and Redundancy"
  • By A Context-Free Grammar Generating CompSci
    Papers
  • Authors handcoded meaningful symbols in CompSci
    speak. The eventual goal is the autogeneration
    of symbol and inter-symbol patterns.

39
Symbolic Discovery Is Inevitable
  • An early inference procedure was described by
    Chomsky and Miller (1957a), as reported in
    Solomonoff (1959). Chomsky proposed a method for
    detecting loops in finite state languages. The
    approach requires a set of valid sentences, and
    an oracle that determines whether a sentence is
    in the language. The algorithm proceeds by
    deleting part of a valid sentence and asking the
    oracle whether the sentence is still valid. If it
    is, the deleted part is reinserted into the
    sequence and repeated, so that it appears twice.
    If the sentence is still in the language, a cycle
    has been detected.
  • "Inferring Sequential Structure", Craig
    Nevill-Manning, 1996
  • This couldn't POSSIBLY be useful for building a
    structure for a dumb fuzzer to operate against.
  • Instead of seeing if the parser crashes, just see
    if it considers the input valid
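
The delete-then-double procedure from the quote translates almost directly into code. In this sketch the oracle is a regular expression standing in for "does the parser accept this input"; the brute-force scan over every substring is a simplification that would be far too slow on real files.

```python
import re

def find_loops(sentence, is_valid):
    """Return (start, end) spans of `sentence` that behave like loops."""
    loops = []
    n = len(sentence)
    for start in range(n):
        for end in range(start + 1, n + 1):
            part = sentence[start:end]
            without = sentence[:start] + sentence[end:]
            doubled = sentence[:start] + part + part + sentence[end:]
            # valid with the part removed AND with it repeated: a cycle
            if is_valid(without) and is_valid(doubled):
                loops.append((start, end))
    return loops

def accepts(s: str) -> bool:
    """Stand-in oracle: a toy language, a(bc)*d."""
    return re.fullmatch(r"a(bc)*d", s) is not None

print(find_loops("abcd", accepts))
```

For a fuzzer, each span found this way is a place where a chunk can be dropped or repeated while the parser still treats the input as valid.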

40
TODO
  • Requitur: Sequitur implementation optimized for
    fuzzer use
  • Generate larger symbols
  • No two byte symbols please: we're not trying to
    compress, we're trying to elucidate structure
  • Eliminate redundant symbols
  • Kieffer-Yang optimization in 2001: If symbol
    (x) = symbol (y), then delete (y) and set all
    instances of (y) to (x)
  • Need to do this to actually consistently fuzz all
    instances of a particular trope
  • Possibly remove in-memory grammar requirement
  • Use mechanisms from Ray, an out-of-memory variant
  • Add foreign grammar capability
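
The "delete (y) and set all instances of (y) to (x)" step can be sketched as below. This version only compares rule bodies as written and repeats until nothing changes; it is an illustration of the idea, not the optimization as published. The grammar format (string rule names, `bytes` literals) is made up for the example.

```python
def merge_duplicate_rules(grammar):
    """Collapse rules with identical bodies into one representative."""
    grammar = {name: list(body) for name, body in grammar.items()}
    while True:
        seen, alias = {}, {}
        for name, body in grammar.items():
            key = tuple(body)
            if key in seen:
                alias[name] = seen[key]   # (y) duplicates (x)
            else:
                seen[key] = name
        if not alias:
            return grammar
        # delete every (y) and point its uses at (x)
        grammar = {
            name: [alias.get(s, s) for s in body]
            for name, body in grammar.items()
            if name not in alias
        }

grammar = {
    "S": ["A", "B", "C"],
    "A": [b"ab"],
    "B": [b"ab"],          # same expansion as "A"
    "C": ["B", b"!"],
}
print(merge_duplicate_rules(grammar))
```

After merging, corrupting "A" hits every place the pattern occurs, which is the consistency the fuzzer needs.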

41
What's Out Now
  • 8 Bit Clean: Can Analyze Arbitrary Data
  • Mergedot: Can create graph from Sequitur output

42
How To Think Of Sequitur
  • Any time you're manipulating data as bytes, think
    of manipulating it as symbols
  • Trigram histograms on bytes -> Trigram histograms
    on symbols
  • Bayesian probabilities on characters -> Bayesian
    probabilities on symbols
  • Adapt yourself to more than 256 codes per symbol
    and reap the benefit
  • If your code is already Unicode aware you might
    be one step ahead!
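
As a small illustration of the byte-to-symbol switch, the same histogram code works on either, provided nothing in it assumes values stay below 256. The symbol ids here (256, 257) are invented stand-ins for grammar rule ids.

```python
from collections import Counter

def trigrams(seq):
    """Histogram of consecutive triples; works for bytes or symbol ids."""
    return Counter(zip(seq, seq[1:], seq[2:]))

byte_hist = trigrams(b"abcabc")                      # values 0..255
symbol_hist = trigrams([256, 257, 256, 257, 256])    # ids above 255 are fine

print(byte_hist.most_common(1))
print(symbol_hist.most_common(1))
```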

43
Fuzzy Wuzzy Wuz A Symbol
  • Symbol analysis systems (language translators,
    etc.) have issues w/ TMTOWTDI (There's More Than
    One Way To Do It)
  • Very similar messages can be encapsulated in very
    different ways
  • Very similar messages can be encapsulated in very
    similar, but not identical ways
  • Sequitur only handles exact matches; fuzzy
    grammar imputation doesn't appear to exist yet
  • Are there any systems for analyzing complex,
    unequal but somewhat related sets of symbols?

44
Another Approach: DotPlots
  • Popular mechanism in bioinformatics for visual
    analysis of genomes.
  • Some attempts to apply dotplots outside of
    bioinformatics
  • Textual analysis
  • Audio
  • Remembered an old paper, entitled "Visualizing
    Music And Audio Using Self-Similarity"
  • Jonathan Foote from Xerox
  • Brute Force solution: compare songs to
    themselves, splitting them into tiny chunks and
    marking light for similar and dark for dissimilar
  • Disassociated Studio will do this for you
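
A bare-bones version of the brute-force comparison, for orientation only. It is not how Disassociated Studio works internally: it uses exact chunk equality instead of a graded similarity metric, and prints text instead of pixels. Passing the same data twice gives the self-comparison described above; passing two inputs compares one against the other.

```python
def dotplot(a: bytes, b: bytes, window: int = 4):
    """Rows of 0/1 cells: 1 where chunk i of `a` equals chunk j of `b`."""
    chunks_a = [a[i:i + window] for i in range(0, len(a), window)]
    chunks_b = [b[j:j + window] for j in range(0, len(b), window)]
    return [[1 if x == y else 0 for y in chunks_b] for x in chunks_a]

def render(plot) -> str:
    """'#' for similar (light), '.' for dissimilar (dark)."""
    return "\n".join(
        "".join("#" if cell else "." for cell in row) for row in plot
    )

data = b"ABCDxxxxABCDyyyy"
print(render(dotplot(data, data)))
```

The main diagonal is always set when comparing data to itself; the interesting part is everything off the diagonal, where repeats show up as parallel lines and blocks.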

45
"Day Tripper" from the Beatles: Music shows
internal pattern.

46
So does MPEG.
47
What Exactly Are We Doing?
  • Jonathan Helfman's "Dotplot Patterns: A Literal
    Look at Pattern Languages" offers an introduction
  • Instead of "to", "be", "not", etc., we use chunks
    of data from arbitrary files
  • The same similarity metric used to disambiguate
    names for the SSH hack is used to measure
    similarity here :)

48
There are so many patterns we might see
49
and no matter how much we've learned of this
pattern language
50
???
51
So How Might This Be Useful?
  • A) Format Identification
  • 1) Do different file formats appear different?
  • 2) Do different instances of the same file
    format appear similar?
  • 3) Does one format embedded in another make
    itself apparent?
  • B) Fuzzer Guidance
  • 1) Can we locate the actual byte offsets where
    one section ends and another begins?
  • 2) Can we visualize and compare fuzzer
    operations via Dotplots?

52
Format Identification
  • 1) Do different files appear different, and does
    the appearance reflect the existence of internal
    structure?
  • 2) Do different instances of the same file
    format appear similar?
  • 3) Does one format embedded in another make
    itself apparent?

53
Java Class Files
54
.NET Assemblies
55
CNN's Home Page
56
SMBTorture Traffic (Packets: Note, Stop/Start Is
Visible)
57
Kernel32.dll
58
Chromosome 22 (This is, after all, a genomics
hack)
59
The Legend Of Zelda
60
Format Identification
  • 1) Do different files appear different, and does
    the appearance reflect the existence of internal
    structure?
  • Answer: Yes. They do.
  • 2) Do different instances of the same file
    format appear similar?
  • 3) Does one format embedded in another make
    itself apparent?

61
Books from Project Gutenberg: Consistent
Despite English's low information content, lack
of even mildly related strings causes little
self-similarity across symbol clusters
62
US Code: Moderately Consistent
Legalese is a massively structured dialect.
Symbols appear in very distinct patterns that are
more reminiscent of machine code than text.
63
HTML: Consistent
HTML repeats smaller symbols (tags) and larger
symbol clusters (via template engines) regularly.
This shows up visually as a tightly repeating
pattern.
64
Java Class Files (Compared): Mildly Consistent
  • Binary code (be it bytecode or x86) tends to be
    very structured. Still, we are dependent on both
    the content and the compiler to generate distinct
    patterns.

65
x86: Consistent (In Sections)
x86 tends not to be handwritten; as such, complex
instructions are emitted in a highly structured
form.
66
Exception?
  • 64 kilobyte graphical demonstration
  • Run through a packer :)
  • Compression removes patterns

67
NES Games
6502 Assembly Tends To Show Consistent Patterns,
But
68
Mario Games Look Rather Different.
  • Output is highly dependent on the compiler
  • Output is highly dependent upon the actual
    content
  • File formats are merely shells for actual
    content. You are analyzing the content; the
    format is just syntactic sugar.

69
Format Identification
  • 1) Do different files appear different, and does
    the appearance reflect the existence of internal
    structure?
  • Answer: Yes. They do.
  • 2) Do different instances of the same file
    format appear similar?
  • Answer: Somewhat. Similar content looks like
    itself, but you're measuring the fundamental
    entropy of the underlying content, not the format
    of the content itself.
  • 3) Does one format embedded in another make
    itself apparent?

70
File Formats Contain Multiple Subformats: Another
Look At Kernel32.DLL
These are all different parts of Kernel32.
71
Quickly Browsing Large Files: Tilt-Shift View
  • Instead of measuring absolute Y against absolute
    X, make X relative
  • Advance through the file going down, look back a
    number of bytes going right
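
One way to read the tilt-shift idea in code: each row is a position in the file, and each column is a distance looked back from that position. In this sketch the look-back is counted in chunks, not bytes, and chunks are compared for exact equality; both are simplifications chosen for the example.

```python
def tilt_shift(data: bytes, window: int = 4, lookback: int = 8):
    """Rows advance through the file; columns look back 1..lookback chunks."""
    chunks = [data[i:i + window] for i in range(0, len(data), window)]
    rows = []
    for y, chunk in enumerate(chunks):
        row = []
        for back in range(1, lookback + 1):
            x = y - back                       # relative, not absolute, X
            row.append(1 if x >= 0 and chunks[x] == chunk else 0)
        rows.append(row)
    return rows

data = b"ABCD" * 3 + b"WXYZ"
for row in tilt_shift(data, window=4, lookback=2):
    print("".join("#" if cell else "." for cell in row))
```

The output has one row per chunk and a fixed number of columns, so its size grows linearly with the file instead of quadratically, which is what makes large files browsable.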

72
Complain All You Want. Hex Still Sucks.
73
Format Identification
  • 1) Do different files appear different, and does
    the appearance reflect the existence of internal
    structure?
  • Answer: Yes. They do.
  • 2) Do different instances of the same file
    format appear similar?
  • Answer: Somewhat. Similar content looks like
    itself, but you're measuring the fundamental
    entropy of the underlying content, not the format
    of the content itself.
  • 3) Does one format embedded in another make
    itself apparent?
  • Answer: Yes. Multiple, distinct sections are
    clearly visible in a way that hex cannot show.

74
Fuzzer Guidance
  • 1) Can we locate the actual byte offsets where
    one section ends and another begins?
  • Why would we want to?
  • Fuzzers break parsers.
  • Many subformats to a format, many subparsers to a
    parser
  • To a rough level of approximation, fuzzing a
    single subformat lets you stress a single
    subparser
  • So once we split a file up, we can selectively
    attack one subparser at a time.
  • 2) Can we visualize and compare fuzzer
    operations via Dotplots?

75
Simple Math
We select an interesting blob from kernel32.dll.
The blob is at pixel offset 507x507, and is a
square around 570 pixels wide. Window size on viz
was 32. 507 * 32 = 16224: the interesting section
starts 16224 bytes into the file. 570 * 32 = 18240:
the interesting section is 18240 bytes long.
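
The same arithmetic as a helper, in case you want to carve the region out programmatically. The function names are made up for this sketch, and the zero-filled buffer stands in for the real file contents.

```python
def blob_range(pixel_offset: int, pixel_width: int, window: int = 32):
    """Convert dotplot pixels to (start byte, length in bytes)."""
    return pixel_offset * window, pixel_width * window

def extract(data: bytes, pixel_offset: int, pixel_width: int,
            window: int = 32) -> bytes:
    start, length = blob_range(pixel_offset, pixel_width, window)
    return data[start:start + length]

start, length = blob_range(507, 570)
print(start, length)

stand_in = bytes(40000)                 # placeholder for the file's bytes
region = extract(stand_in, 507, 570)
print(len(region))
```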
76
What's The Actual Data?
dd if=kernel32.dll bs=1 skip=16100 | hexdump - | more
77
Using Hardcorr as a first knife to locate
interesting-to-fuzz regions
78
Fuzzer Guidance
  • 1) Can we locate the actual byte offsets where
    one section ends and another begins?
  • Answer: Yes. We can quickly route from the
    image to the byte offset, through basic
    arithmetic.
  • 2) Can we visualize and compare fuzzer
    operations via Dotplots?

79
Differentials
  • Major use of dotplots in bioinformatics is to
    compare one genome against another
  • Autocorrelation: Compare A to A
  • Cross-Correlation: Compare A to B
  • Most files are sufficiently dissimilar that little
    interesting structure shows up
  • Notable exception: Different versions of the
    same binary

80
Visual Bindiff!
81
MSVCR70.DLL v. MSVCR71.DLL
82
Fuzzers: Very Broken Patchers :)
Mangle.C: Single Bit Differences
CFG9000: Large Scale Reordering
83
Fuzzer Guidance
  • 1) Can we locate the actual byte offsets where
    one section ends and another begins?
  • Answer: Yes. We can quickly route from the
    image to the byte offset, through basic
    arithmetic.
  • 2) Can we visualize and compare fuzzer
    operations via Dotplots?
  • Answer: Yes. Visual diffing effectively shows
    differences between files, including differences
    introduced by various flavors of fuzzers.