Paradyn/Dyninst Binary Analysis Session

Provided by: nathanro

Transcript and Presenter's Notes


1
Paradyn/Dyninst Binary Analysis Session
  • Stripped Binaries
  • Obfuscated Binary Code
  • Undetectable Transformation and Instrumentation

Paradyn / Condor Week, Madison, Wisconsin, April 29 - May 2, 2008
2
The World of Dyninst
instrumentation
analysis
debugging
The binary!
Code in known functions
3
Modern Binary Challenges
Missing symbols
gaps
Could be anything in there
4
Modern Binary Challenges
Packed code
New code appears at runtime
5
Modern Binary Challenges
very difficult
Introspective or self-modifying code
Makes instrumenting difficult
6
The Next 90 Minutes
7
Learning to Analyze Stripped Binary Code
Nathan Rosenblum, Paradyn Project
Paradyn / Condor Week, Madison, Wisconsin, April 29 - May 2, 2008
8
Code is Hard to Find
9
Code is Hard to Find
but Dyninst knows how
<< push ebp; mov esp,ebp >>
Raw bytes in the gap:
7a 01 00 fd a2 b3 74 68 69 73 20 65 78 61 6d 70
6c 65 20 69 55 89 e5 6f 67 75 73 2e 2e 2e 7a 01
00 fd a2 b3 74 68 69 73 20 65 78 61 6d 70 6c 65
After matching the pattern:
7a 01 00 fd a2 b3 74 68 69 73 20 65 78 61 6d 70
6c 65 20 69 push ebp; mov esp,ebp; push ebx; ...
examine gaps
scan for patterns
recover code
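The "scan for patterns" step can be illustrated with a minimal sketch. It assumes the only pattern is the 32-bit x86 byte sequence 55 89 e5 (push ebp; mov esp,ebp, with the source operand written first). The function name and the sample gap are hypothetical, and this is not Dyninst's actual implementation.

```python
# 32-bit x86 encoding of `push ebp; mov esp,ebp` (source operand first).
PROLOGUE = bytes.fromhex("5589e5")


def scan_for_prologue(gap: bytes, pattern: bytes = PROLOGUE) -> list:
    """Return every offset in `gap` at which `pattern` occurs."""
    offsets = []
    start = gap.find(pattern)
    while start != -1:
        offsets.append(start)
        # Resume one byte later so overlapping matches are not skipped.
        start = gap.find(pattern, start + 1)
    return offsets


# Hypothetical gap contents: string data with one prologue embedded in it.
gap = bytes.fromhex("7a0100fda2b3") + b"this example i" + PROLOGUE + b"ogus..."
candidates = scan_for_prologue(gap)  # offsets of candidate function entries
```

Each returned offset is only a candidate: the same bytes can occur by chance inside data, which is why the hits still need to be parsed and checked.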
10
Digression: Evaluating Code Parsers
Precision: better confidence in results
Recall: find more code
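For reference, both measures can be computed from the set of function entry addresses a parser reports and the set of true entries. The sketch below uses the standard definitions; the addresses are made up for illustration.

```python
def precision_recall(found: set, truth: set) -> tuple:
    """Precision and recall of reported entry points against ground truth."""
    true_pos = len(found & truth)
    # Precision: fraction of reported entries that are real functions.
    precision = true_pos / len(found) if found else 0.0
    # Recall: fraction of real functions that were reported.
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical addresses: 4 real functions, 3 reported, 2 of them correct.
truth = {0x1000, 0x1040, 0x10A0, 0x1100}
found = {0x1000, 0x1040, 0x2000}
p, r = precision_recall(found, truth)
```

A parser that reports very little can score high precision with low recall, which is why both numbers are given together.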
11
Dyninst Finds Gap Code Well
GCC-compiled binaries: 0.97 precision, 0.98 recall
12
...or not so well
Intel CC-compiled binaries: 0.67 precision, 0.16 recall
13
Why is Gap Parsing Hard?
Code Segment
Gap contents may vary
String data
  • Dialog Constants
  • Import names
  • Other strings

14
Why is Gap Parsing Hard?
15
Why is Gap Parsing Hard?
16
<< push ebp; mov esp,ebp >>
No flexibility for additional instructions
Ignores preceding information
Rigid, hand-tuned, compiler-specific
17
Learning to Recognize Functions
Goal: Automatically model binary code
We need:
  • Features to represent functions
  • Learning system to choose best features
  • A way to use the system when parsing

18
Idioms
<< push ebp; mov esp,ebp >>
<< push ebp; mov esp,ebp >>
<< mov 0x8(ebp),eax >>
PRE << ret; nop >>
PRE idioms match the code just before a function; the function starts after them.
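One way to turn idioms into features for a candidate entry point is sketched below. It assumes instructions are already disassembled into plain strings and that an idiom is an exact sequence of such strings. The function names and the feature-key layout are hypothetical, not taken from the talk.

```python
def matches_at(insns, start, idiom):
    """True if `idiom` matches the instructions beginning at index `start`."""
    end = start + len(idiom)
    if start < 0 or end > len(insns):
        return False
    return list(insns[start:end]) == list(idiom)


def idiom_features(insns, candidate, entry_idioms, pre_idioms):
    """Map each idiom to True/False for the candidate at index `candidate`."""
    feats = {}
    for idiom in entry_idioms:
        # Entry idioms are matched starting at the candidate itself.
        feats[("ENTRY",) + tuple(idiom)] = matches_at(insns, candidate, idiom)
    for idiom in pre_idioms:
        # PRE idioms are matched on the instructions just before the candidate.
        feats[("PRE",) + tuple(idiom)] = matches_at(
            insns, candidate - len(idiom), idiom
        )
    return feats


insns = ["ret", "nop", "push ebp", "mov esp,ebp", "mov 0x8(ebp),eax"]
entry_idioms = [["push ebp", "mov esp,ebp"]]
pre_idioms = [["ret", "nop"]]
feats = idiom_features(insns, 2, entry_idioms, pre_idioms)
```

Here the candidate at index 2 fires both the entry idiom and the PRE idiom, while the candidate at index 3 fires neither.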
19

A Problem of Scale
How do we choose the best idioms?
There are tens of thousands; we only want the best few!
Needles in an, um, idiom stack?
20
Distributed Feature Selection
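The slide itself gives no detail, so the following is only a sketch of one standard technique, greedy forward selection, and the choice of technique is an assumption. Each candidate feature is scored independently in a round, so the evaluations can be farmed out to workers. A local thread pool stands in here for a distributed pool, and the scoring function and weights are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def forward_select(candidates, score, max_features):
    """Greedily add the feature that most improves `score` until none helps."""
    selected = []
    best = score(selected)
    remaining = list(candidates)
    with ThreadPoolExecutor() as pool:
        while remaining and len(selected) < max_features:
            # Every trial is independent, so trials are evaluated in parallel.
            trials = [selected + [f] for f in remaining]
            scores = list(pool.map(score, trials))
            top = max(range(len(remaining)), key=scores.__getitem__)
            if scores[top] <= best:
                break  # no remaining feature improves the model
            best = scores[top]
            selected.append(remaining.pop(top))
    return selected, best


# Hypothetical scoring: made-up usefulness weights minus a per-feature cost.
USEFUL = {"push ebp|mov esp,ebp": 0.6, "PRE ret|nop": 0.3, "mov 0x8(ebp),eax": 0.05}


def toy_score(features):
    return sum(USEFUL.get(f, 0.0) for f in features) - 0.1 * len(features)


pool_of_idioms = ["mov 0x8(ebp),eax", "PRE ret|nop", "nop", "push ebp|mov esp,ebp"]
selected, best = forward_select(pool_of_idioms, toy_score, max_features=3)
```

In a real setting the score would come from training and validating a model on labeled binaries, which is what makes each trial expensive enough to distribute.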
21
Structural Complications
Candidate function
CALL
22
Structural Complications
23
Structural Complications
24
Research
Conditional Random Fields (CRFs)
  • Label each candidate FEP (function entry point)
  • Infers probability of joint labeling
  • Idiom features for single nodes
  • Call/conflict features for pairs
  • Skipping lots of math!
Greedy Approximation
  • Highest confidence
  • Idiom score
  • Call propagation
  • Conflict elimination
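The greedy approximation can be read as the loop sketched below. This is an interpretation of the four keywords on the slide, not the authors' algorithm: the data layout, the threshold, and the exact ordering of steps are all assumptions.

```python
def greedy_label(scores, calls, conflicts, threshold=0.5):
    """Greedily label candidates as function entry points (FEPs).

    scores:    {addr: idiom score in [0, 1]}
    calls:     {addr: candidate addrs called by code parsed from addr}
    conflicts: {addr: candidate addrs whose code overlaps addr's code}
    """
    accepted, rejected = set(), set()

    def accept(addr):
        work = [addr]
        while work:
            a = work.pop()
            if a in accepted or a in rejected:
                continue
            accepted.add(a)
            # Conflict elimination: overlapping candidates cannot also be FEPs.
            rejected.update(set(conflicts.get(a, ())) - accepted)
            # Call propagation: targets called from accepted code are FEPs too.
            work.extend(calls.get(a, ()))

    # Highest confidence first, ranked by idiom score.
    for addr in sorted(scores, key=scores.get, reverse=True):
        if scores[addr] < threshold:
            break
        accept(addr)
    return accepted


# Hypothetical candidates: 0x13 overlaps 0x10; 0x10 calls the weak 0x40.
scores = {0x10: 0.95, 0x13: 0.7, 0x40: 0.2, 0x80: 0.6}
calls = {0x10: {0x40}}
conflicts = {0x10: {0x13}, 0x13: {0x10}}
feps = greedy_label(scores, calls, conflicts)
```

In this example 0x40 is accepted despite its low idiom score because accepted code calls it, and 0x13 is dropped despite a good score because it overlaps the more confident 0x10.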
25
Experimentation
  • GNU C Compiler
      • Simple, regular function preamble
  • Intel C Compiler
      • Most variation in entry points; highly optimized
  • MS Visual Studio
      • High variation in function entry point idioms

26
Testing
Comparison of three binary analysis tools
  • Original Dyninst
      • Scans for common entry preamble
  • Dyninst w/ Model
      • Model replaces entry preamble heuristic
  • IDA Pro Disassembler
      • Scans for common entry preamble
      • List of Library Fingerprints (Windows)

27
More Testing
Classifier can be tuned to any point on the curve
ICC binaries are the hardest
[Figure panels: Visual Studio; Intel C Compiler]
28
One Final Issue
Which (compiler) model should we apply?
Reverse the Problem
29
Productization
Analysis mini tool
Annotated with SymtabAPI
Looks like normal binary
Optional gap parsing enabled at runtime
30
References
Rosenblum, Zhu, Miller, Hunt. Learning to Analyze Binary Computer Code. Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI-08), July 2008.
www.paradyn.org/html/publications-by-year.html