Reverse-Engineering Instruction Encodings

1
Reverse-Engineering Instruction Encodings
  • Wilson Hsieh, University of Utah
  • Dawson Engler, Stanford University
  • Godmar Back, University of Utah

2
What's the Problem?
  • Dynamic code generation, JIT compilation
  • Emit instructions quickly
  • Therefore, avoid assembler
  • Need to know how to produce binary instructions
  • Want to express instructions in assembly
  • Generate add %l1, %l2, %l1 for SPARC

3
What Do We Do?
  • How can I get the following mapping?
  • assembly instruction → binary format
  • That mapping exists in the assembler already!
  • So let's reverse-engineer it out of the assembler.

4
DERIVE Tool Chain
[Figure: tool-chain diagram; the surviving box labels are "instruction description", "DERIVE", and "assembler"]

5
Instruction Descriptions
  • /* SPARC fragment */
  • iregs = ( %g0, %g1, %g2, ..., %i6, %i7 )
  • and, andcc, andn, ...
  • → op r_1:iregs, r_2:iregs, r_dest:iregs
  • op r_1:iregs, imm, r_dest:iregs
  • ba, bn, bne, ...
  • → op label
  • op,a label

6
DERIVE Tool Chain
[Figure: tool-chain diagram; the surviving box labels are "instruction description", "DERIVE", and "assembler"]

7
Encoding Descriptions
  • /* MIPS breakpoint instruction */
  • break, op imm,
  • 1, /* operand */ 4, /* bytes */
    ...
  • 0xd, 0x0, 0x0, 0x0, , /* opcode information */
  • /* operand information */
  • imm, /* name */
  • IMMED, /* an immediate */
  • IDENT, /* encoded value = input value */
  • 0, /* lowest value */
  • 10, /* length */
  • ...
  • 16, /* bit offset */
  • I_UNSIGNED, /* unsigned field */
  • ... ,

8
DERIVE Tool Chain
[Figure: tool-chain diagram; the surviving box labels are "instruction description", "DERIVE", and "assembler"]

9
Code Emitters
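    /* x86 addl instruction */
    #define E_addl_rr_1(_code, rf, rt) do { \
        /* opcode 0x01 in the low byte, ModRM (0xc0 | rf<<3 | rt) in the high byte */ \
        register unsigned short _0 = (0xc001 \
            | (((rf)) << 11) \
            | (((rt)) << 8)); \
        /* store the two instruction bytes, then advance the code pointer */ \
        *(unsigned short *)((char *) _code) = _0; \
        _code = (void *)((char *) _code + 2); \
    } while (0)

    /* emit addl %ecx, %ebx in code_buffer */
    E_addl_rr_1(code_buffer, REGecx, REGebx);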
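
To make the effect of the emitter above concrete, here is a minimal sketch that produces the same two bytes, written one byte at a time so the result does not depend on host byte order. The function name emit_addl_rr and the REG_* constants are hypothetical names chosen for illustration (the numbers are the standard x86 register numbers); this is not DERIVE output.

```c
#include <stdint.h>

/* Standard x86 register numbers (only the ones used in this sketch). */
enum { REG_EAX = 0, REG_ECX = 1, REG_EDX = 2, REG_EBX = 3 };

/*
 * Emit "addl %rf, %rt" (register-to-register form) at `code` and return
 * the address of the next free byte. Hypothetical helper: it builds the
 * same 16-bit pattern as the macro, 0xc001 | rf << 11 | rt << 8, and
 * stores the low byte (opcode 0x01) before the high byte (ModRM).
 */
uint8_t *emit_addl_rr(uint8_t *code, unsigned rf, unsigned rt)
{
    uint16_t insn = (uint16_t)(0xc001u | (rf << 11) | (rt << 8));
    code[0] = (uint8_t)(insn & 0xffu);  /* opcode byte */
    code[1] = (uint8_t)(insn >> 8);     /* ModRM byte: 0xc0 | rf << 3 | rt */
    return code + 2;
}
```

Calling emit_addl_rr(buf, REG_ECX, REG_EBX) writes the bytes 0x01 0xcb, which is the encoding of addl %ecx, %ebx.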

10
Instruction Model
  • Opcode
  • Registers (names)
  • Register sets
  • Cache prefetch hints on MIPS
  • Address scale on x86
  • Immediates (integers)
  • Not registers
  • Labels (jump targets)
  • Absolute jumps
  • Relative jumps

[Figure: instruction word with bits numbered 31 down to 0, divided into OPCODE, ARG 1, ARG 2, and ARG 3 fields]

11
Overall Strategy
  • Solve for one field at a time
  • Hold other fields fixed and vary the desired
    field
  • Use randomization when necessary to find legal
    values
  • Anything that is not in a field is the opcode

12
Intuition Behind DERIVE
    Assembly instruction      Binary encoding
    and %g7, %g6, %g0         0x8009 0xc006
    and %g7, %g6, %g1         0x8209 0xc006
    and %g7, %g6, %g2         0x8409 0xc006
    and %g7, %g6, %g3         0x8609 0xc006
    and %g7, %g6, %g4         0x8809 0xc006
    and %g7, %g6, %g5         0x8a09 0xc006
    and %g7, %g6, %g6         0x8c09 0xc006
    and %g7, %g6, %g7         0x8e09 0xc006
    and %g7, %g6, %o0         0x9009 0xc006
    and %g7, %g6, %o1         0x9209 0xc006
    and %g7, %g6, %o2         0x9409 0xc006
    and %g7, %g6, %o3         0x9609 0xc006
    and %g7, %g6, %o4         0x9809 0xc006
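
The table suggests a simple way to locate a field: XOR each encoding against the first one and OR the results together, so that only the bits that changed remain set. The sketch below applies that idea to the thirteen encodings listed above, with the two 16-bit halves joined into one 32-bit word. The function and array names are hypothetical, and this illustrates the intuition rather than DERIVE's actual code.

```c
#include <stdint.h>

/* OR together the XOR of every sample with the first one: the set bits
 * of the result are exactly the bits that differ from the first sample. */
uint32_t varying_bits(const uint32_t *enc, int n)
{
    uint32_t mask = 0;
    for (int i = 1; i < n; i++)
        mask |= enc[0] ^ enc[i];
    return mask;
}

/* Index of the lowest set bit; the mask must be nonzero. */
int lowest_bit(uint32_t mask)
{
    int pos = 0;
    while (!(mask & 1u)) {
        mask >>= 1;
        pos++;
    }
    return pos;
}

/* Encodings of "and %g7, %g6, rd" for rd = %g0..%g7 and %o0..%o4,
 * copied from the table. */
const uint32_t and_samples[13] = {
    0x8009c006u, 0x8209c006u, 0x8409c006u, 0x8609c006u, 0x8809c006u,
    0x8a09c006u, 0x8c09c006u, 0x8e09c006u, 0x9009c006u, 0x9209c006u,
    0x9409c006u, 0x9609c006u, 0x9809c006u
};
```

With these thirteen samples, varying_bits returns 0x1e000000 and lowest_bit reports 25, so the destination register starts at bit 25. Only registers 0 to 12 appear in the table, so the top bit of the 5-bit field (bit 29) shows up only once the remaining registers are enumerated as well.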

13
DERIVE Structure
    Field type                Solver
    register fields           register solver
    absolute jump targets     immediate solver
    immediate fields          immediate solver
    relative jump targets     jump solver

14
Register Solver
  • Primary assumptions (for purposes of the talk)
  • Register fields are independent
  • All register values are legal
  • Enumerate registers for one field at a time
  • Hold other fields constant
  • Solve each field separately
  • Example: 3 register fields, 5 bits per field
  • 2^5 × 3 = 32 × 3 = 96 combinations
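
The enumeration described above can be sketched as follows. Since running a real assembler is outside the scope of a short example, mock_assemble_and is a stand-in that encodes the SPARC "and" instruction directly, using the standard SPARC format-3 field positions; a real solver would invoke the assembler and read back the object code. All names here are hypothetical. Because each field is solved separately, the sketch makes 3 × 32 = 96 assembler calls rather than the 32^3 = 32768 that exhaustive enumeration would need.

```c
#include <stdint.h>

/* Stand-in for running the assembler on "and rs1, rs2, rd".
 * Assumption: register-register form with the standard SPARC layout
 * (rd at bits 29..25, rs1 at bits 18..14, rs2 at bits 4..0). */
uint32_t mock_assemble_and(unsigned rs1, unsigned rs2, unsigned rd)
{
    return 0x80080000u | (rd << 25) | (rs1 << 14) | rs2;
}

int assembler_calls = 0;  /* number of probes issued so far */

/* Solve one register field: hold the other two operands at register
 * number `fixed` and enumerate all 32 registers in operand slot `field`
 * (0 = rs1, 1 = rs2, 2 = rd). Returns the mask of bits that varied. */
uint32_t solve_register_field(int field, unsigned fixed)
{
    uint32_t first = 0, mask = 0;
    for (unsigned r = 0; r < 32; r++) {
        unsigned ops[3] = { fixed, fixed, fixed };
        ops[field] = r;
        uint32_t enc = mock_assemble_and(ops[0], ops[1], ops[2]);
        assembler_calls++;
        if (r == 0)
            first = enc;        /* reference encoding for this field */
        mask |= first ^ enc;    /* accumulate every bit that changed */
    }
    return mask;
}
```

Solving slots 0, 1, and 2 in turn yields the masks 0x0007c000, 0x0000001f, and 0x3e000000, and leaves assembler_calls at 96.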

16
Immediate Solver
  • Primary assumptions
  • Immediate field is a single range of bits in
    instruction
  • Explore each bit size to find encoding of one
    field
  • Values of 1, 2, 4, 8, 16, ...
  • Again, hold other fields constant
  • Example: 10-bit immediate field
  • 10 combinations
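
A minimal sketch of the power-of-two probing described above, using the MIPS break instruction as the target. The mock assembler hard-codes the field position given in the encoding description earlier in the talk (10 bits at bit offset 16) and rejects values that do not fit; a real solver would call the assembler instead. The names mock_assemble_break, imm_field, and solve_immediate are hypothetical.

```c
#include <stdint.h>

/* Stand-in for assembling "break imm". Returns 0 on success and -1 if
 * the immediate does not fit, much as an assembler would report an error. */
int mock_assemble_break(uint32_t imm, uint32_t *out)
{
    if (imm >= (1u << 10))
        return -1;
    *out = 0x0000000du | (imm << 16);
    return 0;
}

struct imm_field {
    int offset;  /* bit position of the field's lowest bit, -1 if unknown */
    int length;  /* number of bits in the field */
};

/* Probe the values 1, 2, 4, 8, ... while holding everything else fixed.
 * Each accepted value flips one bit relative to the encoding of 0. */
struct imm_field solve_immediate(void)
{
    struct imm_field f = { -1, 0 };
    uint32_t base = 0, enc = 0;

    if (mock_assemble_break(0, &base) != 0)
        return f;                        /* no reference encoding */
    for (int bit = 0; bit < 32; bit++) {
        if (mock_assemble_break((uint32_t)1 << bit, &enc) != 0)
            break;                       /* value rejected: field has ended */
        uint32_t diff = base ^ enc;
        if (diff == 0)
            break;                       /* accepted but not encoded */
        if (f.offset < 0) {              /* first probe fixes the offset */
            f.offset = 0;
            while (!((diff >> f.offset) & 1u))
                f.offset++;
        }
        f.length++;
    }
    return f;
}
```

For the mock above, solve_immediate reports offset 16 and length 10 after ten accepted probes, matching the "10 combinations" in the example.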

17
Jump Solver
  • Primary assumptions
  • Label field is a single range of bits
  • Emit jumps to different offsets
  • Find where label goes for encoding of 0
  • Find smallest jump size
  • Find high bit by emitting a negative-valued jump
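
The three steps above can be sketched against a mock relative branch. The mock models a SPARC-style "ba" with a 22-bit signed displacement counted in 4-byte words (so the low two bits of the byte offset are dropped); the solver itself assumes only that the field is one contiguous two's-complement range. All function and type names are hypothetical, and the real jump solver may differ in detail.

```c
#include <stdint.h>

/* Stand-in for assembling a branch to a label `byte_offset` bytes away.
 * Returns -1 for misaligned or out-of-range targets. */
int mock_assemble_ba(int32_t byte_offset, uint32_t *out)
{
    if (byte_offset % 4 != 0)
        return -1;
    int32_t words = byte_offset / 4;
    if (words < -(1 << 21) || words >= (1 << 21))
        return -1;
    *out = 0x10800000u | ((uint32_t)words & 0x003fffffu);
    return 0;
}

/* Bit-scan helpers; the argument must be nonzero. */
int lowest_set_bit(uint32_t x)  { int p = 0;  while (!((x >> p) & 1u)) p++; return p; }
int highest_set_bit(uint32_t x) { int p = 31; while (!((x >> p) & 1u)) p--; return p; }

struct jump_field {
    int low_bit;    /* lowest bit of the label field, -1 if unknown */
    int high_bit;   /* highest bit of the label field, -1 if unknown */
    int32_t step;   /* smallest encodable jump in bytes, 0 if unknown */
};

struct jump_field solve_jump(void)
{
    struct jump_field f = { -1, -1, 0 };
    uint32_t zero = 0, enc = 0;

    /* Step 1: encoding of a jump with offset 0. */
    if (mock_assemble_ba(0, &zero) != 0)
        return f;

    /* Step 2: smallest jump that is accepted and changes the encoding. */
    for (int32_t step = 1; step <= 1024; step *= 2) {
        if (mock_assemble_ba(step, &enc) == 0 && enc != zero) {
            f.step = step;
            f.low_bit = lowest_set_bit(zero ^ enc);
            break;
        }
    }
    if (f.step == 0)
        return f;

    /* Step 3: a jump of -step sets every bit of a two's-complement
     * field, so the highest changed bit is the field's high bit. */
    if (mock_assemble_ba(-f.step, &enc) == 0 && enc != zero)
        f.high_bit = highest_set_bit(zero ^ enc);
    return f;
}
```

For this mock, solve_jump finds a smallest jump of 4 bytes and a label field spanning bits 0 through 21.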

18
Solving Time
    Processor     Run time (minutes)     Description (lines)
    Alpha           6.3                  104
    ARM            43                    227
    MIPS            2.5                   81
    PowerPC         4.8                  186
    SPARC           4.8                   97
    x86           240                    221
    x86-kaffe       4.9                  106

19
Instruction Emitter Generator
  • Reads in DERIVE-generated specifications
  • Produces C macros
  • Can generate runtime checks
  • Debugging support
  • Handles multiple instruction encodings
  • Linkage macros for backpatching
  • Used to retarget Kaffe (publicly available JVM)
    on x86
  • Reduced backend description from 2084 to 1267 lines
    (40%)

20
Extensions
  • Can handle instructions that take a subset of
    registers
  • SPARC double-word loads
  • Special encodings that are register-dependent
  • eax on x86
  • Can handle simple transformations
  • Low bits dropped off of jump offsets
  • User can specify transformations
  • Address scaling on x86
  • User can specify registers that are dependent
  • PowerPC post-increment instructions

21
Future Work
  • Extending DERIVE
  • Fields that are broken up into multiple bit
    ranges
  • Memoization of computations
  • ATOM-like tools
  • Reverse-engineering linkers

22
Related Work
  • Instruction encoding munging
  • NJ Toolkit [Ramsey and Fernández, USENIX 1995]
  • Testing assemblers
  • NJ Toolkit [Fernández and Ramsey, ICSE 1997]
  • Reverse engineering compiler technology
  • Retarget back-end generators [Collberg, PLDI 1997]

23
Summary
  • DERIVE is a cool hack, but it isn't just a hack.
  • It is a useful tool.
  • It is a good proof of concept.
  • We did some clever tricks to build it.
  • http://www.cs.utah.edu/wilson/derive.tar.gz