SRE Basics - PowerPoint PPT Presentation

Provided by: marks9
Learn more at: http://www.cs.sjsu.edu

1
SRE Basics
2
In this Section
  • We briefly cover the following topics
  • Assembly code
  • Virtual machine/Java bytecode
  • Windows PE file format

3
Assembly Code
4
High Level Languages
  • First, high level languages
  • Ancient high level languages
  • Basic --- little structure
  • FORTRAN --- limited structure
  • C --- structured language
  • C was designed to deal with complexity
  • OO languages take this one step further
  • Above languages considered primitive today

5
High Level Languages
  • Object oriented (OO) languages
  • Object groups code and data together
  • Considered best way to handle complexity (at least
    for now)
  • Important OO ideas include
  • Encapsulation, inheritance, polymorphism

6
High Level Languages
  • Program must deal with code and data
  • Data
  • Variables, data structures, files, etc.
  • Code
  • Reverser must study control flow
  • Conditionals, switches, loops, etc.

7
High Level Languages
  • High level languages --- different users want
    different things
  • Goes back (at least) to C vs FORTRAN
  • Today, major tradeoff is between simplicity and
    flexibility
  • Simplicity --- easy to write short program to do
    exactly what you want (e.g., C)
  • Flexibility --- language has it all (e.g., Java)

8
High Level Languages
  • Some languages compiled into native code
  • exe is specific to the hardware
  • C, C++, FORTRAN, etc.
  • Other languages compiled into code, which is
    interpreted by a virtual machine
  • Java, C#
  • Often possible to make compiled version
  • For reverser, this distinction is far more
    important than OO or not

9
Intro to Assembly
  • At the lowest level, machine binary
  • Assembly code lives between binary and high level
    languages
  • When reversing native code, we must deal with
    assembly code
  • Why assembly code?
  • Why not reverse binary to, say, C?

10
Intro to Assembly
  • Reverser would like to deal with high level, but
    is stuck with low level
  • Ideally, want to create mental link from low
    level to high level
  • Easier for code written in C
  • Harder for OO code, such as C++
  • Why?

11
Intro to Assembly
  • Perhaps biggest difference at assembly level is
    dealing with data
  • High level languages hide lots and lots of
    details on data manipulations
  • For example, loading and storing
  • Also, low level instructions are primitive
  • Each instruction does not do very much

12
Intro to Assembly
  • Consider following simple C program
  • Simple, but far higher level than assembly code

int multiply(int x, int y)
{
    int z;
    z = x * y;
    return z;
}
13
Intro to Assembly
int multiply(int x, int y)
{
    int z;
    z = x * y;
    return z;
}
  • In assembly code
    1. Store state before entering function
    2. Allocate memory for z
    3. Load x and y into registers
    4. Multiply x by y and store result in register
    5. Copy result back to memory for z (optional)
    6. Restore state that was stored in 1.
    7. Return z

14
Intro to Assembly
  • Why are things so complicated at low level?
  • It's all about efficiency!
  • Reading memory and storing are slow
  • No single asm instruction to read memory, operate
    on it, and store result
  • But this is common in high level languages

15
Intro to Assembly
  • Registers --- local processor memory
  • So don't have to read and write RAM
  • Stack --- scratch paper (in RAM)
  • Holds register values, local variables, function
    parameters and return values
  • E.g., storage for z in multiply example
  • Heap --- dynamic, variable-sized data
  • Data section --- e.g., string constants
  • Control flow --- high level "if" or "while" are
    much more complex at low level

16
Registers
  • Registers used in most instructions
  • Specifics here deal with IA-32
  • Intel Architecture, 32-bit
  • Used in Wintel machines
  • We use IA-32 notation
  • AT&T notation also exists
  • Eight 32-bit registers (next slide)
  • All 8 start with E
  • Also several system registers

17
Registers
  • EAX, EBX, EDX --- generic, used for int, Boolean,
    logical, memory operations
  • ECX --- generic, used as counter
  • ESI/EDI --- generic, source/destination pointers
    when copying memory
  • SI = source index, DI = destination index
  • EBP --- generic, stack base pointer
  • Usually, stack position after return address
  • ESP --- stack pointer
  • Current stack frame lies between ESP and EBP

18
Flags
  • EFLAGS --- special register
  • Status flags updated by various operations to
    record outcomes
  • System flags too, but we don't care about them
  • Flags are basic tool for conditionals
  • For example, a TEST followed by a jump
    instruction
  • TEST sets various flags, jump determines action
    to take, based on those flags

19
Instruction Format
  • Most instructions consist of
  • Opcode --- the instruction
  • One or two operands --- parameter(s)
  • Operand (parameters) are data
  • Operands come in 3 flavors
  • Register name --- for example, EAX
  • Immediate --- e.g., hard-coded constant
  • Memory address --- enclosed in [brackets]

20
Operand Examples
  • EAX
  • Read from (or write to) EAX register, depending
    on opcode
  • 0x30004040
  • Immediate --- number is embedded in code
  • Usually a constant in high-level code
  • [0x4000349e]
  • This is a memory address
  • Could be a global variable in high level code

21
Basic Instructions
  • We cover a few common instructions
  • First we give general format
  • Later, we give a few simple examples
  • There are lots of assembly instructions
  • But, most assembly code uses only a few
  • About 14 assembly instructions account for more
    than 90% of all code

22
Opcode Counts
  • Typical opcode counts, normal code

23
Opcode Counts
  • Opcode counts, typical virus code

24
Instructions
  • We consider following operations
  • Moving data
  • Arithmetic
  • Comparisons
  • Conditional branches
  • Function calls

25
Moving Data
  • MOV is the most popular opcode
  • 2 operands, destination and source
  • MOV DestOperand, SourceOperand
  • Note the order
  • Destination first, source second

26
Arithmetic
  • Six integer arithmetic operations
  • ADD, SUB, MUL, DIV, IMUL, IDIV
  • Many variations based on operands
  • ADD Op1, Op2 --- add, store result in Op1
  • SUB Op1, Op2 --- subtract Op2 from Op1, result --> Op1
  • MUL Op --- multiply EAX by Op, result --> EDX:EAX
  • DIV Op --- divide EDX:EAX by Op
  • quotient --> EAX, remainder --> EDX
  • IMUL, IDIV --- like MUL and DIV, but signed
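
The EDX:EAX register pairing used by MUL and DIV can be sketched in Python. This is only an illustrative model of the unsigned forms with 32-bit operands; the function names mul32 and div32 are made up for this sketch.

```python
MASK32 = 0xFFFFFFFF

def mul32(eax, op):
    """Unsigned MUL: EAX * op gives a 64-bit product, returned as (EDX, EAX)."""
    product = (eax & MASK32) * (op & MASK32)
    return (product >> 32) & MASK32, product & MASK32

def div32(edx, eax, op):
    """Unsigned DIV: EDX:EAX / op, returned as (EAX quotient, EDX remainder)."""
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, op & MASK32)
    if quotient > MASK32:
        # On real hardware a quotient that does not fit in EAX raises an exception
        raise OverflowError("quotient does not fit in EAX")
    return quotient, remainder
```

For example, mul32(0xFFFFFFFF, 2) leaves 1 in EDX and 0xFFFFFFFE in EAX, and feeding that pair back into div32 with divisor 2 recovers 0xFFFFFFFF with remainder 0.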

27
Comparisons
  • CMP opcode has 2 operands
  • CMP Operand1, Operand2
  • Subtracts Operand2 from Operand1
  • Result stored in flag bits
  • If 0 then ZF flag is set
  • Other flags can be used to tell which is greater,
    depending on signed or unsigned
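
A rough Python model of how CMP sets the status flags may help here. It assumes 32-bit operands and covers only ZF, SF, CF and OF; the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def to_signed(value):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def cmp32(op1, op2):
    """Model CMP op1, op2: compute op1 - op2, keep only the flags."""
    a, b = op1 & MASK32, op2 & MASK32
    result = (a - b) & MASK32
    signed_diff = to_signed(a) - to_signed(b)
    return {
        "ZF": result == 0,                      # operands are equal
        "SF": bool(result & 0x80000000),        # sign bit of the result
        "CF": a < b,                            # unsigned borrow
        "OF": not (-(1 << 31) <= signed_diff < (1 << 31)),  # signed overflow
    }

def unsigned_below(flags):
    """Condition tested by JB: op1 < op2 as unsigned values."""
    return flags["CF"]

def signed_less(flags):
    """Condition tested by JL: op1 < op2 as signed values."""
    return flags["SF"] != flags["OF"]
```

Comparing 0xFFFFFFFF with 1 shows why signedness matters: as unsigned values the first operand is larger (CF clear), but as signed values it is -1 and therefore smaller (SF differs from OF).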

28
Conditional Branches
  • Conditional branches use Jcc family of
    instructions (je, jne, jz, jnz, etc.)
  • Format is
  • Jcc TargetAddress
  • If Jcc true, goto TargetAddress
  • Otherwise, what happens?

29
Function Calls
  • Use CALL and RET
  • CALL FunctionAddress
  • RET pops return address
  • RET can be told to increment ESP
  • Need to reset stack pointer
  • Why?
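
One way to see why the stack pointer needs resetting is a toy stack model in Python. This is only a sketch: the Stack class, the addresses, and the assumption that the callee removes its own parameters (the "RET n" form) are illustrative choices, not something the slides specify.

```python
class Stack:
    """Toy downward-growing stack of 32-bit slots."""
    def __init__(self, esp=0x1000):
        self.esp = esp
        self.mem = {}

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem.pop(self.esp)
        self.esp += 4
        return value

def call(stack, return_address, target):
    """CALL pushes the return address, then transfers control to target."""
    stack.push(return_address)
    return target

def ret(stack, extra_bytes=0):
    """RET pops the return address; 'RET n' also adds n to ESP."""
    address = stack.pop()
    stack.esp += extra_bytes   # discard parameters the caller pushed
    return address
```

With two 4-byte parameters pushed before the CALL, ret(stack, 8) brings ESP back to its value from before the pushes; a plain ret(stack) would leave the parameters on the stack for the caller to clean up.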

30
Examples
cmp ebx,0xf020
jnz 10026509
  • What does this do?
  • Compares value in EBX with constant
  • Jumps to specified address if operands are not
    same
  • Note JNE and JNZ are same instruction

31
Examples
mov edi,[ecx+0x5b0]
mov ebx,[ecx+0x5b4]
imul edi,ebx
  • What does this do?
  • First, add 0x5b0 to ECX register, get value at
    that memory and put in EDI
  • Next, add 0x5b4 to ECX, get value at that memory
    and put in EBX
  • Note that ECX points to some data structure
  • Finally, EDI = EDI * EBX
  • Note there are different forms of IMUL

32
Examples
push eax
push edi
push ebx
push esi
push dword ptr [esp+0x24]
call 0x10026eeb
  • What does this do?
  • PUSH four register values
  • PUSH something related to stack ptr
  • Probably, parameter or local variable
  • Would need to look at more code to decide
  • Note dword ptr is effectively a cast
  • CALL a function

33
Examples
mov eax, dword ptr [ebp - 0x20]
shl eax, 4
mov ecx, dword ptr [ebp - 0x24]
cmp dword ptr [eax+ecx+4], 0
call 0x10026eeb
  • What does this do?
  • Maybe data structure in an array
  • Last line
  • ECX --- gets base pointer
  • EAX --- current offset into the array
  • Add 4 to get specific member of structure
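
The address arithmetic in this example can be written out in Python. The sketch assumes 16-byte array elements (which is what a left shift by 4 implies) and a member at offset 4; the function name is hypothetical.

```python
def element_member_address(base, index, member_offset=4, shift=4):
    """Address used by: shl eax, 4 ; cmp dword ptr [eax+ecx+4], 0

    base   -- value in ECX (start of the array)
    index  -- value in EAX before the shift (element number)
    shift  -- 4 means each element is 2**4 = 16 bytes
    """
    byte_offset = index << shift          # what EAX holds after SHL
    return (base + byte_offset + member_offset) & 0xFFFFFFFF
```

For an array at 0x1000, element 3 starts at 0x1030, so the member being compared sits at 0x1034.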

34
Examples
  • AT&T syntax

pushl $14
pushl $helloWorld
pushl $1
movl $4, %eax
pushl %eax
int $0x80
addl $16, %esp
pushl $0
movl $1, %eax
pushl %eax
int $0x80
35
Compilation
  • Converts high level representation of code to
    binary
  • Front end --- lexical analysis
  • Verify syntax, etc.
  • Intermediate representation
  • Optimization
  • Improve structure, eliminate redundancy, etc.

36
Compilation
  • Back end --- generates the actual code
  • Instruction selection
  • Register allocation
  • Instruction scheduling --- pipelining,
    parallelism
  • Back end process might make disassembly hard to
    read
  • Optimization too
  • Each compiler has its own quirks
  • Can you automatically determine compiler?

37
Virtual Machines & Bytecode
38
Virtual Machines
  • Some languages instead generate intermediate
    bytecode
  • Bytecode runs in a virtual machine
  • Virtual machine is a program that (historically)
    interprets bytecode
  • Translates bytecode for the hardware
  • Bytecode analogous to assembly code

39
Virtual Machines
  • Advantages?
  • Hardware independent
  • Disadvantages?
  • Slow
  • Today, usually just-in-time compilers instead of
    interpreters
  • Compile snippets of bytecode into native code as
    needed

40
Reversing Bytecode
  • Reversing bytecode is easy
  • Unless special precautions are taken
  • Even then, easier than native code
  • Bytecode usually contains lots of metadata
  • Possible to reconstruct highly accurate high
    level language
  • Bytecode can be obfuscated
  • In worst case, reverser must learn bytecode
  • But bytecode is easier than native code

41
Windows PE Files
42
Windows PE File Format
  • Designed to be standard executable file format
    for all versions of OS
  • on all supported processors
  • Only small changes since PE format was introduced
  • E.g., support for 64-bit Windows

43
Windows PE Files
  • Trivia
  • Q: What's the difference between exe and dll?
  • A: Not much --- one bit differs in PE files
  • Q: What is size of smallest possible PE file?
  • A: 133 bytes
  • PE file on disk is a file
  • Once loaded into memory, it's a module
  • File is mapped to module
  • Address where module begins is HMODULE
  • PE file may not all be mapped to module

44
Windows PE Files
  • WINNT.H is final word on what PE file looks like
  • Tools to examine PE files
  • Dumpbin (Visual Studio)
  • Depends
  • PEBrowse Professional
  • In spite of its name, it's free
  • PEDUMP (by author of article)

45
PE File Sections
  • Each section is chunk of code or data that
    logically belongs together
  • For example, all import tables in one section
  • Code is in .text section
  • Code is code, but many types of data
  • Data examples
  • Program data (e.g., .rdata for read-only)
  • API import/export tables
  • Resources, relocation info, etc.
  • Can specify section names in C source

46
PE File Sections
  • When mapped, module starts on a page boundary
  • Linker can be told to merge sections
  • E.g., to merge .text and .rdata
  • /MERGE:.rdata=.text
  • Some sections commonly merged
  • Some sections cannot be merged

47
Relative Virtual Addresses
  • Exe file specifies in-memory addresses
  • PE file specifies preferred load location
  • But DLL can actually load just about anywhere
  • So, PE specifies addresses in a way that is
    independent of where it loads
  • No hardcoded addresses in PE
  • Instead, Relative Virtual Addresses (RVAs)
  • RVA is an offset relative to where PE is loaded

48
Relative Virtual Addresses
  • To find actual memory location, add RVA to the
    actual load address
  • For example, suppose
  • Exe file is loaded at 0x400000
  • And RVA is 0x1000
  • Then code (.text) starts at 0x401000
  • In Windows terminology, actual address is known
    as Virtual Address (VA)
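
The RVA arithmetic above is a single addition, sketched here in Python with hypothetical helper names.

```python
def rva_to_va(load_address, rva):
    """Virtual Address = actual load address + Relative Virtual Address."""
    return load_address + rva

def va_to_rva(load_address, va):
    """Inverse mapping: recover the RVA from an in-memory address."""
    if va < load_address:
        raise ValueError("address lies below the module's load address")
    return va - load_address
```

With the numbers from this slide, rva_to_va(0x400000, 0x1000) gives 0x401000.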

49
Data Directory
  • There are many data structures within exe
  • For efficiency, must be loaded quickly
  • E.g., imports, exports, resources, base
    relocations, etc.
  • DataDirectory
  • Array of 16 data structures
  • #define IMAGE_DIRECTORY_ENTRY_xxx defines array
    indexes (0 to 15)
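
As a sketch of how the DataDirectory might be read, the Python below unpacks 16 entries of two little-endian DWORDs each (RVA, then size). The names follow the IMAGE_DIRECTORY_ENTRY_xxx suffixes in WINNT.H; "RESERVED" is just a label chosen here for the unused index 15, and the raw bytes are assumed to have been located already.

```python
import struct

# Suffixes of the IMAGE_DIRECTORY_ENTRY_xxx constants, in index order 0..14;
# "RESERVED" is this sketch's own placeholder for index 15.
DIRECTORY_NAMES = [
    "EXPORT", "IMPORT", "RESOURCE", "EXCEPTION", "SECURITY", "BASERELOC",
    "DEBUG", "ARCHITECTURE", "GLOBALPTR", "TLS", "LOAD_CONFIG",
    "BOUND_IMPORT", "IAT", "DELAY_IMPORT", "COM_DESCRIPTOR", "RESERVED",
]

def parse_data_directory(raw):
    """Return {name: (rva, size)} from 128 bytes of DataDirectory."""
    if len(raw) < 128:
        raise ValueError("DataDirectory needs 16 entries of 8 bytes")
    values = struct.unpack_from("<32I", raw, 0)
    return {
        name: (values[2 * i], values[2 * i + 1])
        for i, name in enumerate(DIRECTORY_NAMES)
    }
```

An entry whose RVA and size are both zero usually means the corresponding structure is absent from the file.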

50
Importing Functions
  • To use code or data from another DLL, must import
    it
  • When PE file loads, Windows loader locates
    imported functions/data
  • Usually automatic, when program first starts
  • Imported DLLs may import others
  • For example, any program created with Visual C++
    imports KERNEL32.DLL
  • and KERNEL32.DLL imports from NTDLL.DLL

51
Importing Functions
  • Each PE has Import Address Table (IAT)
  • IAT contains arrays of function pointers
  • One array per imported DLL
  • Each imported API has spot in IAT
  • The only place where API address stored
  • So, all calls to API go thru one function ptr
  • E.g., CALL DWORD PTR [0x00405030]
  • But, by default it's a little more complex

52
PE File Structure
  • Next slides describe PE file structure
  • Note that all of these data structures defined in
    WINNT.H
  • Usually, 32-bit and 64-bit versions
  • For example,
  • IMAGE_NT_HEADERS32
  • IMAGE_NT_HEADERS64
  • Identical except for widened fields for 64-bit

53
MS-DOS Header
  • Every PE begins with small MS-DOS exe
  • Prints message saying Windows required
  • MS-DOS Header
  • IMAGE_DOS_HEADER
  • 2 important values
  • e_lfanew --- file offset of PE header
  • e_magic --- 0x5A4D, "MZ" in ASCII. Why MZ?
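
A minimal Python sketch of reading these two fields follows. It assumes the standard 64-byte IMAGE_DOS_HEADER layout, with e_magic at offset 0 and e_lfanew at offset 0x3C; the function name is hypothetical.

```python
import struct

IMAGE_DOS_SIGNATURE = 0x5A4D   # "MZ" when read as a little-endian WORD

def pe_header_offset(data):
    """Validate e_magic and return e_lfanew, the file offset of the PE header."""
    if len(data) < 64:
        raise ValueError("too short to hold an IMAGE_DOS_HEADER")
    (e_magic,) = struct.unpack_from("<H", data, 0)
    if e_magic != IMAGE_DOS_SIGNATURE:
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<l", data, 0x3C)
    return e_lfanew
```

In practice this would be called on the bytes of a file opened in binary mode, and the returned offset is where the IMAGE_NT_HEADERS begin.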

54
IMAGE_NT_HEADERS Header
  • Primary location for PE specifics
  • Location in file given by e_lfanew
  • One version for 32-bit exes and another for
    64-bit exes
  • Only minor differences between them
  • Single bit specifies 32-bit or 64-bit

55
IMAGE_NT_HEADERS Header
  • Has 3 fields

typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
  • In valid PE, Signature is 0x00004550
  • In ASCII, this is PE\0\0

56
IMAGE_NT_HEADERS Header

typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
  • IMAGE_FILE_HEADER predates PE
  • Struct containing basic info about file
  • Most important info is size of optional data
    that follows (not really optional)
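
The Signature check and the IMAGE_FILE_HEADER fields can be sketched in Python as below. The 20-byte field layout and the IMAGE_FILE_DLL bit (0x2000 in Characteristics) are taken from WINNT.H; the function name and the returned dictionary are choices made for this sketch.

```python
import struct

IMAGE_NT_SIGNATURE = 0x00004550   # "PE\0\0"
IMAGE_FILE_DLL = 0x2000           # Characteristics bit that marks a DLL

FILE_HEADER_FIELDS = (
    "Machine", "NumberOfSections", "TimeDateStamp", "PointerToSymbolTable",
    "NumberOfSymbols", "SizeOfOptionalHeader", "Characteristics",
)

def parse_file_header(data, e_lfanew):
    """Check the PE signature at e_lfanew and decode the IMAGE_FILE_HEADER."""
    (signature,) = struct.unpack_from("<I", data, e_lfanew)
    if signature != IMAGE_NT_SIGNATURE:
        raise ValueError("missing PE signature")
    values = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    header = dict(zip(FILE_HEADER_FIELDS, values))
    header["is_dll"] = bool(header["Characteristics"] & IMAGE_FILE_DLL)
    return header
```

The SizeOfOptionalHeader value tells a parser how far to skip to reach the section table: it starts at e_lfanew + 4 + 20 + SizeOfOptionalHeader.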

57
IMAGE_NT_HEADERS Header

typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
  • IMAGE_OPTIONAL_HEADER
  • DataDirectory array (at end) is address book of
    important locations in exe
  • Each entry contains RVA and size of data

58
PE Sections
  • Recall, section is chunk of code or data that
    logically belongs together
  • For example
  • All data for an exe's import tables is in one
    section

59
Section Table
  • Section table contains array of
    IMAGE_SECTION_HEADER structs
  • An IMAGE_SECTION_HEADER has info about associated
    section
  • Location, length, and characteristics
  • Number of such headers given by field
    IMAGE_NT_HEADERS.FileHeader.NumberOfSections
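
A sketch of walking the section table in Python follows. It assumes the 40-byte IMAGE_SECTION_HEADER layout from WINNT.H and that the caller has already computed the table's file offset and read NumberOfSections; only a subset of fields is returned.

```python
import struct

# Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")

def parse_section_table(data, table_offset, number_of_sections):
    """Decode number_of_sections consecutive IMAGE_SECTION_HEADER structs."""
    sections = []
    for i in range(number_of_sections):
        fields = SECTION_HEADER.unpack_from(
            data, table_offset + i * SECTION_HEADER.size)
        sections.append({
            "Name": fields[0].rstrip(b"\0").decode("ascii", "replace"),
            "VirtualSize": fields[1],
            "VirtualAddress": fields[2],      # an RVA
            "SizeOfRawData": fields[3],
            "PointerToRawData": fields[4],    # a file offset
            "Characteristics": fields[9],
        })
    return sections
```

VirtualAddress and PointerToRawData together give the location of each section in memory and on disk, which is what a tool needs to translate an RVA into a file offset.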

60
Alignment of Sections
  • Visual Studio 6.0
  • 4KB sections by default
  • Visual Studio .NET
  • 4KB by default, except for small files uses
    0x200-byte alignment
  • Also, .NET spec requires 8KB in-memory alignment
    (for IA-64 compatibility)

61
PE Sections
  • So far, overview of PE file format
  • Now, look inside important sections
  • and some data structures within sections
  • Then we finish with look at PEDUMP
  • Recall there are other similar utilities

62
Section Names
  • .text ---The default code section.
  • .data --- The default read/write data section.
    Global variables typically go here.
  • .rdata --- The default read-only data section.
    String literals and C++/COM vtables are examples
    of items put into .rdata.

63
Section Names
  • .idata --- The imports table. It has become
    common practice (explicitly, or via linker
    default behavior) to merge .idata into another
    section, typically .rdata. By default, the linker
    only merges the .idata section into another
    section when creating a release mode exe.
  • .edata --- The exports table. When creating an
    executable that exports APIs or data, the linker
    creates an .EXP file which contains an .edata
    section that's added into the final executable.
    Like the .idata section, the .edata section is
    often found merged into the .text or .rdata
    sections.

64
Section Names
  • .rsrc --- The resources. This section is
    read-only. However, it should not be renamed and
    should not be merged into other sections.
  • .bss --- Uninitialized data. Rarely found in exes
    created with recent linkers. Instead, the
    VirtualSize of the exe's .data section is
    expanded to make room for uninitialized data.
  • .crt --- Data added for supporting the C++
    runtime (CRT). A good example is the function
    pointers that are used to call the constructors
    and destructors of static C++ objects.

65
Section Names
  • .tls --- Data for supporting thread local storage
    variables declared with __declspec(thread). This
    includes the initial value of the data, as well
    as additional variables needed by the runtime.
  • .reloc --- Base relocations in an exe. Base
    relocations are generally only needed for DLLs
    and not EXEs. In release mode, the linker doesn't
    emit base relocations for EXE files. Relocations
    can be removed when linking with the /FIXED
    switch.
  • .sdata --- "Short" read/write data that can be
    addressed relative to the global pointer. Used
    for IA-64 and other architectures that use a
    global pointer register. Regular-sized global
    variables on the IA-64 will go in this section.

66
Section Names
  • .srdata --- "Short" read-only data that can be
    addressed relative to the global pointer. Used on
    the IA-64 and other architectures that use a
    global pointer register.
  • .pdata --- The exception table. Contains an array
    of IMAGE_RUNTIME_FUNCTION_ENTRY structs,
    CPU-specific. Pointed to by the
    IMAGE_DIRECTORY_ENTRY_EXCEPTION slot in the
    DataDirectory. Used for architectures with
    table-based exception handling, such as the
    IA-64. The only architecture that doesn't use
    table-based exception handling is the x86.
  • .didat --- Delayload import data. Found in exes
    built in nonrelease mode. In release mode, the
    delayload data is merged into another section.

67
Exports Section
  • Exe may export code or data
  • Makes it available to other exes
  • Refer to an exported thing as a symbol
  • At minimum, to export symbol, must specify its
    address in defined way
  • Keyword ORDINAL tells linker to use numbers, not
    names, for symbols
  • After all, names just a convenience for coders

68
IMAGE_EXPORT_DIRECTORY
  • Points to 3 arrays
  • And a table of ASCII strings containing symbol
    names
  • Only required array is Export Address Table (EAT)
  • Array of function pointers
  • Addresses of exported functions
  • Export ordinal is an index into this array

69
IMAGE_EXPORT_DIRECTORY
  • Structure example

70
Example
  • exports table

  Name:            KERNEL32.dll
  Characteristics: 00000000
  TimeDateStamp:   3B7DDFD8 -> Fri Aug 17 23:24:08 2001
  Version:         0.00
  Ordinal base:    00000001
  # of functions:  000003A0
  # of Names:      000003A0

  Entry Pt  Ordn  Name
  00012ADA     1  ActivateActCtx
  000082C2     2  AddAtomA
  (remainder of exports omitted)

71
Example
  • Suppose we call GetProcAddress on AddAtomA API
  • System locates KERNEL32's IMAGE_EXPORT_DIRECTORY
  • Gets start address of Export Names Table (ENT)
  • It finds there are 0x3A0 entries in ENT
  • Does binary search for AddAtomA
  • Suppose AddAtomA is 2nd entry
  • loader reads 2nd value from export ordinal table

72
Example (Continued)
  • Call GetProcAddress on AddAtomA API
  • AddAtomA has export ordinal 2
  • Use this as index into EAT (taking into account
    base field value)
  • Finds AddAtomA has RVA of 0x82C2
  • Add 0x82C2 to load address of KERNEL32 to get
    actual address of AddAtomA
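
The lookup described on these two slides can be sketched in Python. This models the slides' description only: the tables are passed in as plain lists, the ordinal table is assumed to hold export ordinals that are turned into EAT indexes by subtracting the base, and the load address in the usage note is made up.

```python
from bisect import bisect_left

def get_proc_address(load_address, ordinal_base, names, ordinals, eat, target):
    """Resolve an exported name to an address, or None if it is not exported.

    names    -- Export Names Table, sorted so it can be binary searched
    ordinals -- export ordinal table, parallel to names
    eat      -- Export Address Table, a list of RVAs
    """
    i = bisect_left(names, target)            # binary search of the ENT
    if i == len(names) or names[i] != target:
        return None
    eat_index = ordinals[i] - ordinal_base    # take the base field into account
    return load_address + eat[eat_index]      # RVA + load address
```

With the KERNEL32 data from the example (base 1, AddAtomA at ordinal 2 with RVA 0x82C2) and a hypothetical load address of 0x77E80000, the result is 0x77E882C2.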

73
Export Forwarding
  • Can forward export to another DLL
  • That is, must find it at forward address
  • Example
  • KERNEL32 HeapAlloc function forwarded to
    RtlAllocHeap function exported by NTDLL
  • In EXPORTS section of KERNEL32, find
  • EXPORTS
  • HeapAlloc = NTDLL.RtlAllocHeap

74
Imports Section
  • Importing is opposite of exporting
  • IMAGE_IMPORT_DESCRIPTOR
  • Points to 2 essentially identical arrays
  • Import Address Table and Import Name Table
  • IAT and INT
  • Contain ordinal, address, forwarding info
  • After binding, IAT rewritten, INT retains
    original (pre-binding) info
  • Binding discussed next

75
Imports Section
  • Example
  • Importing APIs from USER32.DLL

76
Binding
  • Binding means IAT overwritten with actual
    addresses
  • VAs overwrite RVAs
  • Why do this?
  • Increased efficiency
  • Loader checks whether binding valid

77
Delayload Data
  • Hybrid between implicit and explicit importing
  • Not an OS issue
  • A linker issue, at runtime
  • There is IAT and INT for the DLL
  • Identical to regular IAT and INT
  • But read by runtime library code instead of OS
  • Benefit? Calls then go directly to API

78
Resources Section
  • For resources such as
  • icons, bitmaps, dialogs, etc.
  • Most complicated section to navigate
  • Organized like a file system

79
Base Relocations
  • Executable has many memory addresses
  • As mentioned, PE file specifies preferred memory
    address to load the module
  • ImageBase field in IMAGE_OPTIONAL_HEADER
  • If DLL loaded elsewhere, all addresses will be
    incorrect
  • Base relocations tell loader all locations that
    need to be modified
  • Note that this is extra work for the loader
  • What about EXE, which is not a DLL?

80
Base Relocation Example
  • Consider the following line of code

00401020 8B 0D 34 D4 40 00 mov ecx,dword ptr [0x0040D434]
  • Note that 8B 0D specifies opcode
  • Also note the address 0x0040D434
  • Suppose preferred load is at 0x00400000
  • If it loads at that address, it runs as-is
  • Suppose instead it loads at 0x00500000
  • Then code above needs to change to

8B 0D 34 D4 50 00 mov ecx,dword ptr [0x0050D434]
81
Base Relocation Example
  • If not loaded at preferred address, then loader
    computes delta
  • For example on previous slide
  • delta = 0x00500000 - 0x00400000
  • So, delta is 0x00100000
  • Also, there would be base relocation specifying
    location 0x00401022 (the 4-byte address operand,
    just past the 2 opcode bytes at 0x00401020)
  • Loader modifies address located here by delta
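
Applying one such fixup can be sketched in Python. The sketch assumes a 32-bit little-endian address stored in a mutable byte buffer and patches a single location; a real loader walks the whole base relocation table. The function name is hypothetical.

```python
import struct

def apply_relocation(image, offset, delta):
    """Add delta to the 32-bit address stored at image[offset:offset+4]."""
    (old_address,) = struct.unpack_from("<I", image, offset)
    new_address = (old_address + delta) & 0xFFFFFFFF
    struct.pack_into("<I", image, offset, new_address)
    return new_address

# Instruction bytes from the example: 8B 0D is the opcode, so the
# address operand starts 2 bytes into the instruction.
code = bytearray.fromhex("8B0D34D44000")
delta = 0x00500000 - 0x00400000
apply_relocation(code, 2, delta)
```

After the call, code holds 8B 0D 34 D4 50 00, matching the relocated instruction shown on the previous slide.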

82
Debug Directory
  • Contains debug info
  • Not required to run the program
  • But useful for development
  • Can be multiple forms of debug info
  • Most common is PDB file

83
.NET Header
  • .NET executables are PE files
  • However, code/data is minimal
  • Purpose of PE is simply to get .NET-specific info
    into memory
  • Metadata, intermediate language (IL)
  • MSCOREE.DLL is loaded at start of a .NET process
  • This dll takes charge and uses metadata and IL
    from executable
  • So PE has stub to get MSCOREE.DLL going

84
TLS Initialization
  • Thread Local Storage (TLS)
  • .tls section for thread local variables
  • New threads initialized using .tls data
  • Presence of TLS data indicated by nonzero
    IMAGE_DIRECTORY_ENTRY_TLS in DataDirectory
  • Points to IMAGE_TLS_DIRECTORY struct
  • Contains virtual addresses, VAs (not RVAs)
  • The actual struct is in .rdata, not in .tls

85
Program Exception Data
  • x86 architecture uses frame-based exception
    handling
  • A fairly complex way to handle exceptions
  • IA-64 and others use table-based approach
  • Table containing info about every function that
    might be affected by exception unwinding
  • Table entry includes start and end addresses, how
    and where exception to be handled
  • When exception occurs, search thru table

86
PEDUMP
  • Tools for analyzing PE files
  • Dumpbin (Visual Studio)
  • Depends
  • PEBrowse Professional
  • In spite of its name, it's free
  • PEDUMP (by author of article)