Recovery of Variables and Heap Structure in x86 Executables - PowerPoint PPT Presentation

Provided by: woo89


1
Recovery of Variables and Heap Structure in x86
Executables
  • Gogul Balakrishnan
  • Thomas Reps
  • University of Wisconsin

2
Overview
  • Introduction
  • Challenges
  • Background
  • Recovering A-locs via Iteration
  • An Abstraction for Heap-Allocated Storage
  • Experiments

3
Introduction
  • The Need for Analyzing Executables
  • What You See Is Not What You eXecute
  • Many Obstacles in Analyzing Executables
  • Data Objects are Not Easily Identifiable.
  • Absence of Symbol-Table and Debugging Information
  • Determining the Memory Addresses of Data Objects
  • Difficult to Track the Flow of Data through
    Memory
  • Challenging to get useful information about the
    heap

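e.g., memset(password, '\0', len); free(password);
(the compiler may remove the memset as a dead store, so the executable does not do what the source says)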
4
Challenges(1/3)
  • Recovering Variable-like Entities
  • The memory layout is fixed at compile time or
    assembly time (the IDA Pro approach relies on
    statically known addresses and stack offsets)
  • To recover y, the analysis must determine the set
    of values that eax holds at instruction 5

void main() {
  int x, y;
  x = 1;
  y = 2;
  return;
}

proc main
1  mov ebp, esp
2  sub esp, 8
3  mov [ebp-8], 1
4  mov eax, ebp
5  mov [eax-4], 2
6  add esp, 8
7  retn
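The following minimal sketch shows why eax matters here. It is not the paper's analysis. It assumes straight-line code in which each register holds a single value, measured as an offset from esp on entry to main. The type Frame and the function trace_main are hypothetical names. Replaying instructions 1 to 5 shows that eax equals ebp at instruction 5, so [eax-4] writes offset -4, which is the slot for y. No instruction mentions [ebp-4] directly.

```c
#include <assert.h>

/* Hypothetical model: each register holds one value, expressed as an offset
   from the value esp had on entry to main (main's activation record). */
typedef struct {
    int esp, ebp, eax;
} Frame;

/* Replays instructions 1-5 of main and reports the stack offsets written by
   instruction 3 (x) and instruction 5 (y). */
void trace_main(int *x_off, int *y_off) {
    Frame f = { 0, 0, 0 };  /* esp is offset 0 by definition; ebp and eax are set before use */
    f.ebp = f.esp;          /* 1: mov ebp, esp    */
    f.esp -= 8;             /* 2: sub esp, 8      */
    *x_off = f.ebp - 8;     /* 3: mov [ebp-8], 1  */
    f.eax = f.ebp;          /* 4: mov eax, ebp    */
    *y_off = f.eax - 4;     /* 5: mov [eax-4], 2  */
    /* both stores land inside the 8 bytes reserved by instruction 2 */
    assert(*x_off >= f.esp && *y_off >= f.esp && *y_off < f.ebp);
}
```

Calling trace_main(&x, &y) leaves x == -8 and y == -4.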
5
Challenges(2/3)
  • Granularity of Recovered Variable-like Entities
  • Affects the complexity and accuracy of subsequent
    analyses
  • The Structure of Heap-Allocated Objects
  • Only the Size of the Allocated Block is Known.
  • Approach: an abstraction-refinement algorithm

6
Challenges(3/3)
  • Resolving Virtual-Function Calls
  • Under the one-variable-per-malloc-site abstraction,
    a definite link between an object and its
    virtual-function table is never established
    (only weak updates are possible)
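The effect of a weak update can be sketched with value sets encoded as bitmasks. This uses a hypothetical encoding: bit i means "may point to vtable i", and a reserved VS_TOP bit means "any value". A freshly allocated block's vptr field starts at VS_TOP. Joining the vtable address into a summary node therefore leaves the field at VS_TOP. A strong update would instead record the vtable exactly. All names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding: bit i set = "may hold the address of vtable i";
   VS_TOP = "may hold any value" (e.g., uninitialized memory). */
typedef unsigned ValueSet;
#define VS_TOP 0x80000000u

static ValueSet vs_join(ValueSet a, ValueSet b) {
    return ((a | b) & VS_TOP) ? VS_TOP : (a | b);   /* TOP absorbs everything */
}

/* Abstract store of `stored` into an object's vptr field. A summary node
   stands for many concrete blocks, so only a weak update (join) is sound;
   a node known to represent one block admits a strong update (overwrite). */
ValueSet store_vptr(ValueSet old, ValueSet stored, bool summary) {
    assert(stored != 0);   /* an empty value set would mean the store is unreachable */
    return summary ? vs_join(old, stored) : stored;
}

/* True if the field definitely points to exactly one vtable. */
bool definite_vtable(ValueSet v) {
    return v != 0 && !(v & VS_TOP) && (v & (v - 1)) == 0;
}
```

store_vptr(VS_TOP, 1u, true) returns VS_TOP, so definite_vtable fails. With summary set to false, the result is 1u and the link is definite.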
7
Background(1/6)
  • Abstract Locations (A-locs)
  • Memory Region
  • A Set of Disjoint Memory Areas
  • Represents a Group of Locations that have Similar
    Runtime Properties
  • Abstract Locations
  • The locations between two consecutive statically
    known addresses/offsets in a memory region
  • Address Offsets are Statically Determined
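Here is a minimal sketch of the a-loc rule above. It assumes that a-locs are delimited by the distinct statically known offsets in a region, and that the last one runs up to a given region end. The helper alocs_from_offsets is hypothetical. Take the loop example on the next slide, with offset 0 as the end of the local area. Its direct offsets are -40 (lea eax, [ebp-40]) and -36 (mov eax, [ebp-36]). They give a 4-byte a-loc at -40 and a 36-byte a-loc at -36. That layout does not match the array of five 8-byte Point elements the source program declares.

```c
#include <assert.h>
#include <stdlib.h>

/* One abstract location: bytes [offset, offset + size - 1] of a memory region. */
typedef struct {
    int offset;
    int size;
} Aloc;

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: each a-loc starts at one statically known offset and
   extends up to (not including) the next one; the last ends at region_end.
   Sorts and deduplicates `offsets` in place; returns the number of a-locs. */
int alocs_from_offsets(int *offsets, int n, int region_end, Aloc *out) {
    int m = 0;                                   /* number of distinct offsets */
    qsort(offsets, (size_t)n, sizeof offsets[0], cmp_int);
    for (int i = 0; i < n; i++)
        if (m == 0 || offsets[i] != offsets[m - 1])
            offsets[m++] = offsets[i];
    for (int i = 0; i < m; i++) {
        int end = (i + 1 < m) ? offsets[i + 1] : region_end;   /* exclusive */
        assert(end > offsets[i]);
        out[i].offset = offsets[i];
        out[i].size = end - offsets[i];
    }
    return m;
}
```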

8
Background(2/6)
  • Abstract Locations (cont'd)

proc main
      mov ebp, esp
      sub esp, 40
      mov ecx, 0
      lea eax, [ebp-40]
L1:   mov [eax], 1
      mov [eax+4], 2
      add eax, 8
      inc ecx
      cmp ecx, 5
      jl L1
      mov eax, [ebp-36]
      add esp, 40
      retn
9
Background(3/6)
  • Value-Set Analysis (VSA)
  • A combined numeric analysis and pointer analysis
  • Over-Approximation of the values that each a-loc
    holds at each program point
  • Value-Set
  • The Set of Addresses and Numeric Values
  • An N-tuple of strided intervals of the form s[l, u]
  • (Global Region, Procedure Region, …)
  • e.g., (1[0, 9], ⊥) versus (⊥, 8[-40, -8]);
    1[0, 9] denotes the integers 0 through 9

N = the number of memory regions
e.g., 8[-40, -8] = {-40, -32, -24, -16, -8}
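To make the notation concrete: a strided interval s[l, u] denotes l, l+s, ..., u. The sketch below enumerates those members. It assumes a normalization in which s is 0 exactly for singletons, and u - l is otherwise a multiple of s. The type and function names are illustrative. It reproduces the example 8[-40, -8] = {-40, -32, -24, -16, -8}.

```c
#include <assert.h>

/* Strided interval s[l, u]: the integers l, l+s, l+2s, ..., u.
   Assumed normalization: s == 0 iff the interval is the singleton {l}. */
typedef struct {
    int s, l, u;
} StridedInterval;

/* Number of integers the interval denotes. */
int si_size(StridedInterval si) {
    if (si.s == 0) return 1;
    assert(si.s > 0 && si.u >= si.l && (si.u - si.l) % si.s == 0);
    return (si.u - si.l) / si.s + 1;
}

/* Writes the members into out[] in increasing order; returns how many. */
int si_members(StridedInterval si, int *out, int max) {
    int n = si_size(si);
    assert(n <= max);
    for (int i = 0; i < n; i++)
        out[i] = si.l + i * si.s;
    return n;
}
```

si_size applied to 1[0, 9] gives 10, the numeric component of the first example value-set.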
10
Background(4/6)
  • Value-Set Analysis (cont'd)
  • The Value-Set of eax at L1
  • (⊥, 8[-40, -8])
  • eax holds the offsets -40, -32, -24, -16, -8
  • i.e., the starting addresses of field x of
    p[0] through p[4]

typedef struct {
  int x, y;
} Point;

int main() {
  int i;
  Point p[5];
  for (i = 0; i < 5; i++) {
    p[i].x = 1;
    p[i].y = 2;
  }
  return p[0].y;
}
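The same representation explains how VSA handles the store mov [eax+4], 2. Adding a constant shifts the interval, so 8[-40, -8] + 4 = 8[-36, -4]. These are the starting addresses of field y. The final load mov eax, [ebp-36] reads offset -36, which is a member of that interval, i.e., p[0].y. Below is a minimal sketch with hypothetical helper names.

```c
#include <assert.h>

/* Strided interval s[l, u]; s == 0 denotes the singleton {l}. */
typedef struct { int s, l, u; } StridedInterval;

/* Adding a constant c shifts every member: s[l, u] + c = s[l + c, u + c]. */
StridedInterval si_add_const(StridedInterval si, int c) {
    StridedInterval r = { si.s, si.l + c, si.u + c };
    return r;
}

/* Returns 1 if v is one of the interval's members, 0 otherwise. */
int si_contains(StridedInterval si, int v) {
    assert(si.s >= 0);
    if (v < si.l || v > si.u) return 0;
    return si.s == 0 ? v == si.l : (v - si.l) % si.s == 0;
}
```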
11
Background(5/6)
  • Aggregate Structure Identification (ASI)
  • Can Distinguish between Accesses to Different
    Parts of the Same Aggregate
  • Aggregate is broken up into smaller parts (atoms)
  • Data-Access Constraint Language (DAC)
  • Specifying Data-Access Pattern in the Program
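The sketch below shows, in simplified form, how byte-level references induce atoms. It assumes flat byte ranges only. Real ASI also handles array references (DataRef\n) and builds a DAG, which this sketch does not attempt. Each atom boundary falls at the start of a reference or just past its end. The names ByteRange and split_into_atoms are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BYTES 64

/* A data reference covering bytes [lo, hi] of an aggregate (inclusive). */
typedef struct { int lo, hi; } ByteRange;

/* Simplified, not full ASI: split a `size`-byte aggregate into the coarsest
   atoms such that every reference is exactly a union of atoms. out[] needs
   room for up to `size` atoms. Returns the number of atoms written. */
int split_into_atoms(int size, const ByteRange *refs, int nrefs, ByteRange *out) {
    bool starts[MAX_BYTES + 1] = { false };
    assert(size > 0 && size <= MAX_BYTES);
    starts[0] = true;
    for (int i = 0; i < nrefs; i++) {
        assert(0 <= refs[i].lo && refs[i].lo <= refs[i].hi && refs[i].hi < size);
        starts[refs[i].lo] = true;
        starts[refs[i].hi + 1] = true;    /* index == size is ignored below */
    }
    int n = 0;
    for (int b = 0; b < size; b++) {
        if (!starts[b]) continue;
        int e = b + 1;
        while (e < size && !starts[e]) e++;
        out[n].lo = b;
        out[n].hi = e - 1;
        n++;
    }
    return n;
}
```

Take a 12-byte aggregate with references to bytes 0 to 3 and 4 to 7. The result is three atoms: [0, 3], [4, 7], and [8, 11].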

12
Background(6/6)
  • Aggregate Structure Identification (cont'd)
  • Data-Access Constraint Language (DAC)
  • DataRef[l:u] refers to bytes l through u in
    DataRef
  • DataRef\n views DataRef as an array of n elements
  • ASI DAG

e.g., P[0:11]\3 refers to P[0:3], P[4:7], or P[8:11]
[Figure: ASI DAG, including a node labeled return_main]
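The array operator can be sketched as a function that splits DataRef[l:u]\n into its n equal-sized element ranges. The helper is hypothetical and assumes n divides the byte count. Applied to the example above, P[0:11]\3 yields P[0:3], P[4:7], and P[8:11]. Applied to AR_main[-40:-1]\5, it yields the five 8-byte elements [-40:-33] through [-8:-1] of the Point array.

```c
#include <assert.h>

typedef struct { int lo, hi; } ByteRange;

/* DataRef[l:u]\n: view bytes l..u as n equal elements. Writes element i's
   byte range into out[i] and returns the element size in bytes. */
int split_array_ref(int l, int u, int n, ByteRange *out) {
    int len = u - l + 1;
    assert(n > 0 && len > 0 && len % n == 0);
    int elem = len / n;
    for (int i = 0; i < n; i++) {
        out[i].lo = l + i * elem;
        out[i].hi = l + (i + 1) * elem - 1;
    }
    return elem;
}
```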
13
Recovering A-locs via Iteration
  • Problems of VSA
  • Can only Represent a Contiguous Sequence of
    Memory Locations
  • Cannot Detect Internal Substructure
  • Basic Idea
  • VSA is used to obtain memory-access patterns in
    the executable
  • ASI is used as a heuristic to determine a set of
    a-locs according to the memory-access patterns
    obtained from the information recovered by VSA.

14
Recovering A-locs via Iteration
  • Generating Data-Access Constraints from Value-Sets

Input: (r, s[l, u], length)    Output: (ASI reference, Boolean)
[Figure: references into AR_main over the 8-byte ranges
AR_main[-40:-33], AR_main[-32:-25], AR_main[-24:-17],
AR_main[-16:-9], and AR_main[-8:-1], each labeled [0:7]]
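The body of SI2ASI is not shown in the slides. What follows is a hedged reconstruction under our own assumptions, not the authors' algorithm. A singleton s[l, l] maps to r[l : l+length-1]. Otherwise, the bytes from l to u+s-1 are treated as n = (u-l)/s + 1 elements of s bytes, each accessed at [0 : length-1]. The result is exact only when length ≤ s, i.e., when accesses do not overlap. Types and names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Strided interval s[l, u]; s == 0 denotes the singleton {l}. */
typedef struct { int s, l, u; } StridedInterval;

/* An ASI reference r[lo:hi]\n[elo:ehi]; n == 0 means no array split. */
typedef struct {
    const char *region;
    int lo, hi;      /* bytes covered in the region */
    int n;           /* number of array elements (0 = none) */
    int elo, ehi;    /* bytes accessed inside each element */
    bool exact;      /* does the reference describe only the accessed bytes? */
} AsiRef;

/* Hypothetical sketch of SI2ASI(r, s[l,u], length) under the assumptions above. */
AsiRef si2asi(const char *r, StridedInterval si, int length) {
    AsiRef ref = { r, si.l, si.l + length - 1, 0, 0, 0, true };
    assert(length > 0);
    if (si.s == 0 || si.l == si.u) return ref;        /* singleton access */
    assert(si.s > 0 && (si.u - si.l) % si.s == 0);
    ref.exact = length <= si.s;
    if (!ref.exact) {              /* overlapping accesses: one coarse range */
        ref.hi = si.u + length - 1;
        return ref;
    }
    ref.n = (si.u - si.l) / si.s + 1;
    ref.hi = si.u + si.s - 1;
    ref.elo = 0;
    ref.ehi = length - 1;
    return ref;
}

/* Prints the reference in the slides' DataRef notation. */
void asiref_print(const AsiRef *ref) {
    printf("%s[%d:%d]", ref->region, ref->lo, ref->hi);
    if (ref->n > 0) printf("\\%d[%d:%d]", ref->n, ref->elo, ref->ehi);
    printf("%s\n", ref->exact ? "" : "  (inexact)");
}
```

Take the loop store mov [eax], 1 with eax = 8[-40, -8] and length 4. This sketch yields AR_main[-40:-1]\5[0:3], marked exact.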
15
Recovering A-locs via Iteration
  • Generating Data-Access Constraints from Value-Sets

<Algorithm 2>
  if s1[l1, u1] or s2[l2, u2] is a singleton then
    return SI2ASI(r, s1[l1, u1] + s2[l2, u2], length)
  end if
  // Determine base register (row-major order)
  if s1 ≥ (u2 - l2 + length) then
    baseSI ← s1[l1, u1]
    indexSI ← s2[l2, u2]
  else if s2 ≥ (u1 - l1 + length) then
    baseSI ← s2[l2, u2]
    indexSI ← s1[l1, u1]
  else
    return SI2ASI(r, s1[l1, u1] + s2[l2, u2], length)
  end if
  <baseRef, exactRef> ← SI2ASI(r, baseSI, stride(baseSI))
  if exactRef is false then
    return SI2ASI(r, s1[l1, u1] + s2[l2, u2], length)
  else
    return concat(baseRef, SI2ASI("", indexSI, length))
  end if
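The "determine base register" step of Algorithm 2 can be sketched directly. This assumes the test s_base ≥ (u_index - l_index + length): an operand qualifies as the base of a row-major array access only if its stride spans every byte the other operand can reach. Names are hypothetical. The function returns false where Algorithm 2 falls back to SI2ASI on the sum.

```c
#include <assert.h>
#include <stdbool.h>

/* Strided interval s[l, u]; s == 0 denotes the singleton {l}. */
typedef struct { int s, l, u; } StridedInterval;

static bool si_singleton(StridedInterval a) { return a.s == 0 || a.l == a.u; }

/* For an access [a + b] of `length` bytes, pick which operand is the base
   (outer, row) and which is the index (inner). Returns false if neither fits. */
bool choose_base(StridedInterval a, StridedInterval b, int length,
                 StridedInterval *base, StridedInterval *index) {
    assert(length > 0);
    if (si_singleton(a) || si_singleton(b)) return false;
    if (a.s >= b.u - b.l + length) { *base = a; *index = b; return true; }
    if (b.s >= a.u - a.l + length) { *base = b; *index = a; return true; }
    return false;
}
```

Consider int m[3][4], with a row offset of 16[0, 32], a column offset of 4[0, 12], and length 4. Since 16 ≥ 12 - 0 + 4, the row operand is chosen as the base regardless of argument order.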
16
Recovering A-locs via Iteration
  • Interpreting Indirect Memory-References
  • Lookup Algorithm
  • NodeDesc: <name, length>
  • NodeDescList: an ordered list of NodeDescs
  • Three Operations

name: the name associated with the ASI tree node
length: the length of that node
e.g., [nd1, nd2, …, ndn]
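A NodeDesc and a NodeDescList can be modeled as plain structs, with nodes laid out back to back from byte 0. The slides do not spell out the three operations. The range query below (nodes_in_range) is therefore a hypothetical example of the kind of lookup the list supports, not one of the paper's operations.

```c
#include <assert.h>

/* NodeDesc <name, length>; a NodeDescList is an ordered array of these,
   assumed here to be laid out back to back starting at byte 0. */
typedef struct {
    const char *name;
    int length;
} NodeDesc;

/* Hypothetical lookup: copy the nodes that overlap bytes [start, end]
   into out[]; returns how many were copied. */
int nodes_in_range(const NodeDesc *list, int n, int start, int end, NodeDesc *out) {
    int offset = 0, count = 0;
    assert(start <= end);
    for (int i = 0; i < n; i++) {
        int first = offset, last = offset + list[i].length - 1;
        if (last >= start && first <= end)
            out[count++] = list[i];
        offset += list[i].length;
    }
    return count;
}
```

For [x:4, y:4, z:8], asking for bytes 4 through 11 returns y and z.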
17
Recovering A-locs via Iteration
  • Lookup Algorithm Examples

18
An Abstraction for Heap-Allocated Storage
  • Previous Abstraction
  • Recency Abstraction
  • Allows VSA and ASI to recover information about
    virtual-function tables
  • Uses two memory-regions per allocation site s
  • MRAB[s]: Most-Recently-Allocated Block
  • NMRAB[s]: Non-Most-Recently-Allocated Block
  • count: how many concrete blocks the
    memory-region represents (MRAB[s].count,
    NMRAB[s].count)
  • SmallRange = {[0, 0], [0, 1], [1, 1], [0, ∞],
    [1, ∞], [2, ∞]}
  • size: over-approximation of the size of the block
    (MRAB[s].size, NMRAB[s].size)

Previous abstraction: all of the nodes allocated at a given
allocation site s are folded together into a single summary
node n_s.
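The count domain can be sketched as follows, encoding ∞ as SR_INF (an illustrative choice). Join takes the smallest enclosing range. Folding MRAB[s] into NMRAB[s] uses addition, which adds the bounds and rounds outward, so [1, 1] + [1, 1] becomes [2, ∞]. The rounding rule is our assumption about how results are kept inside SmallRange.

```c
#include <assert.h>
#include <stdbool.h>

#define SR_INF (-1)   /* encodes an upper bound of infinity */

/* A count from SmallRange = {[0,0],[0,1],[1,1],[0,inf],[1,inf],[2,inf]}. */
typedef struct { int lo, hi; } SmallRange;

static bool sr_valid(SmallRange r) {
    return (r.lo == 0 && (r.hi == 0 || r.hi == 1 || r.hi == SR_INF)) ||
           (r.lo == 1 && (r.hi == 1 || r.hi == SR_INF)) ||
           (r.lo == 2 && r.hi == SR_INF);
}

/* Join: the smallest SmallRange element containing both arguments. */
SmallRange sr_join(SmallRange a, SmallRange b) {
    SmallRange r;
    assert(sr_valid(a) && sr_valid(b));
    r.lo = a.lo < b.lo ? a.lo : b.lo;
    r.hi = (a.hi == SR_INF || b.hi == SR_INF) ? SR_INF : (a.hi > b.hi ? a.hi : b.hi);
    assert(sr_valid(r));
    return r;
}

/* Addition of two counts, rounded outward into SmallRange. */
SmallRange sr_add(SmallRange a, SmallRange b) {
    SmallRange r;
    assert(sr_valid(a) && sr_valid(b));
    r.lo = (a.lo + b.lo > 2) ? 2 : a.lo + b.lo;
    r.hi = (a.hi == SR_INF || b.hi == SR_INF || a.hi + b.hi > 1) ? SR_INF : a.hi + b.hi;
    assert(sr_valid(r));
    return r;
}
```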
19
An Abstraction for Heap-Allocated Storage
  • Operation
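  • AbsEnv: MRAB[s]/NMRAB[s] → <count, size, alocEnv>
  • AlocEnv: a-loc → ValueSet
  • Allocation site s transforms absEnv to absEnv'
  • $\mathit{absEnv}'(\mathrm{MRAB}[s]) = \langle [0,1],\ \mathit{size},\ \lambda\,\textit{a-loc}.\,\top_{\mathit{ValueSet}} \rangle$
  • $\mathit{absEnv}'(\mathrm{NMRAB}[s]).\mathit{count} = \mathit{absEnv}(\mathrm{NMRAB}[s]).\mathit{count} + \mathit{absEnv}(\mathrm{MRAB}[s]).\mathit{count}$
  • $\mathit{absEnv}'(\mathrm{NMRAB}[s]).\mathit{size} = \mathit{absEnv}(\mathrm{NMRAB}[s]).\mathit{size} \sqcup \mathit{absEnv}(\mathrm{MRAB}[s]).\mathit{size}$
  • $\mathit{absEnv}'(\mathrm{NMRAB}[s]).\mathit{alocEnv} = \mathit{absEnv}(\mathrm{NMRAB}[s]).\mathit{alocEnv} \sqcup \mathit{absEnv}(\mathrm{MRAB}[s]).\mathit{alocEnv}$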
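Here is a minimal sketch of the allocation transfer function above. It makes the following assumptions, and all names are hypothetical:
- each block has a single a-loc (the vptr field);
- counts are in SmallRange, with ∞ encoded as SR_INF;
- sizes are intervals, where lo > hi means "no block yet";
- value sets are bitmasks with a VS_TOP bit.

Because MRAB[s] stands for at most one block, a store into it can be a strong update. The resulting vtable link then survives when the block is later folded into NMRAB[s].

```c
#include <assert.h>

#define SR_INF (-1)          /* upper bound "infinity" for counts */
#define VS_TOP 0x80000000u   /* value set "may hold any value" */

typedef struct { int lo, hi; } Range;   /* a SmallRange count, or a size interval */
typedef unsigned ValueSet;              /* bit i: may point to vtable i; 0 = empty */

/* Hypothetical <count, size, alocEnv> for one region; alocEnv is cut down to
   a single a-loc, the object's vptr field. size.lo > size.hi means "no block". */
typedef struct {
    Range count, size;
    ValueSet vptr;
} RegionState;

typedef struct { RegionState mrab, nmrab; } SiteState;

static Range count_add(Range a, Range b) {   /* add, then round outward into SmallRange */
    Range r;
    r.lo = (a.lo + b.lo > 2) ? 2 : a.lo + b.lo;
    r.hi = (a.hi == SR_INF || b.hi == SR_INF || a.hi + b.hi > 1) ? SR_INF : a.hi + b.hi;
    return r;
}

static Range size_join(Range a, Range b) {
    if (a.lo > a.hi) return b;               /* a describes no block */
    if (b.lo > b.hi) return a;
    Range r = { a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi };
    return r;
}

static ValueSet vs_join(ValueSet a, ValueSet b) {
    return ((a | b) & VS_TOP) ? VS_TOP : (a | b);
}

/* Transfer function for allocation site s (block of `bytes` bytes): fold the
   old MRAB[s] into NMRAB[s], then let MRAB[s] describe the fresh block. */
void on_allocate(SiteState *st, int bytes) {
    assert(bytes > 0);
    st->nmrab.count = count_add(st->nmrab.count, st->mrab.count);
    st->nmrab.size  = size_join(st->nmrab.size, st->mrab.size);
    st->nmrab.vptr  = vs_join(st->nmrab.vptr, st->mrab.vptr);
    st->mrab.count  = (Range){ 0, 1 };       /* count [0,1], as in the rule above */
    st->mrab.size   = (Range){ bytes, bytes };
    st->mrab.vptr   = VS_TOP;                /* contents unknown */
}

/* Strong update when the region stands for at most one block, weak otherwise. */
void store_vptr(RegionState *r, ValueSet v) {
    int single = r->count.hi != SR_INF && r->count.hi <= 1;
    r->vptr = single ? v : vs_join(r->vptr, v);
}
```

Suppose the program allocates, stores vtable 0 through MRAB[s], and allocates again. NMRAB[s].vptr is then exactly that vtable, with count [0, 1].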

20
An Abstraction for Heap-Allocated Storage
21
Experiments
  • Environments
  • Software

22
Experiments
  • Results of Virtual-Function Call Resolution

23
Experiments
  • Results of A-loc Identification
  • Comparing the results of the algorithm with
    debugging information

The structure of 87 of the local variables is
correct
24
Experiments
  • Results of A-loc Identification

The structure of 72 of the objects in the heap
is correct
25
Q & A