DIXIE Binary Translation and Optimization for Multiple ISAs

Provided by: DAC63
1
DIXIE Binary Translation and Optimization for
Multiple ISAs
  • Computer Architecture Department
  • Universitat Politècnica de Catalunya-Barcelona

www.ac.upc.es/dixie
2
UPC people involved
  • Roger Espasa
  • Agustín Fernández
  • Manel Fernández
  • Victor Moya
  • Juan Lopez
  • Silvia Cernuda
  • Antonio Parada
  • Albert Ribé
  • Álex Ramírez

3
Dixie
  • Static binary translator
      • Accepts multiple ISAs (Alpha, x86, PPC, MIPS, Convex)
      • Translates to a common IR (Dixie ISA)
  • Static binary instrumentation
      • Works on common IR but reflects source ISA
  • Static binary optimizer
      • Optimizes the common IR
      • Generates native code from common IR
      • Multiple targets also supported (Alpha, MIPS)
  • Dixie Virtual Machine
      • Can run binaries specified in the common IR
      • Also runs binaries with a mixture of common and native code

4
Dixie overview
5
Outline
  • Motivation
  • DIXIE Architecture
  • Debugging Tools
  • Performance
  • Summary

6
Outline
  • Motivation
  • DIXIE Architecture
  • Debugging Tools
  • Performance
  • Summary

7
Binary Translation
  • For embedded processors
      • Embedded market
          • Moves rapidly
          • Changes processors frequently
      • Software (development, porting) is a major cost issue
      • Binary translation is cheaper than retargeting gcc
  • Goals
      • Retargeting must be FAST and EASY
      • Support different ISAs
      • Provide good debugging tools
          • To ease writing ISA descriptions
          • To verify correctness of translations
  • Techniques
      • Static translation (as much as possible)
      • Some dynamic translation (only if necessary)

8
Binary Optimization
  • Inevitably, binary translation introduces overheads
  • Use static and dynamic optimization to
      • Adapt better to the new chip
      • Offset overheads of static binary translation
  • Goals
      • Eliminate overheads due to
          • Manual translation process
          • Limited expressiveness of the intermediate ISA
      • Incremental development of the optimizer
  • Techniques
      • Static optimization (as much as possible)
      • Dynamic optimization (only if necessary)
      • Optimized blocks still run within the Virtual Machine

9
Instrumentation
  • Instrumentation of program binaries
      • For computer architecture research
      • Due to lack of access to exotic machines
      • Historical origin of Dixie
  • Many classes of tools, but...
      • Different tools for different machines
      • Porting tools is difficult
      • Few tools allow research on vector machines or new ISAs
      • Lack of wrong-path information
  • Dixie goals
      • Cross-platform instrumentation
      • Research on multiple discontinued ISAs
      • Full architecture coverage
      • Wrong-path information

10
Outline
  • Motivation
  • DIXIE Architecture
  • Debugging Tools
  • Performance
  • Summary

11
Dixie overview
12
ISA highlights
13
ISA highlights
  • RISC-style architecture
      • 128-bit instructions (single format)
      • Thumb-like format under development
      • Only three addressing modes (vector support)
      • 32768 general purpose registers (64-bit wide)
  • Data types
      • Signed/unsigned integer (8, 16, 32 and 64 bits)
      • IEEE 754 floating point (single and double)
  • Address space
      • Load/store architecture
      • 32-bit and 64-bit address spaces
      • Bi-endian memory accesses
      • Byte granularity (no alignment is necessary)
  • Special support for some flags (x86 and PowerPC)
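The bi-endian, byte-granular memory accesses listed above can be illustrated with a small sketch. This is not Dixie code: the function name `load`, its parameters, and the sample memory image are assumptions made for the example.

```python
def load(memory: bytes, addr: int, size: int, big_endian: bool,
         signed: bool = False) -> int:
    """Read `size` bytes at any byte address, in either byte order.

    Hypothetical helper: models a load that needs no alignment and
    selects endianness per access.
    """
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4 or 8 bytes")
    raw = memory[addr:addr + size]
    if len(raw) != size:
        raise IndexError("access past end of memory")
    order = "big" if big_endian else "little"
    return int.from_bytes(raw, order, signed=signed)


# Sample memory image (made up for the example).
mem = bytes([0x00, 0x12, 0x34, 0x56, 0x78, 0xFF])

# Address 1 is not 4-byte aligned; both byte orders read the same bytes.
as_big = load(mem, 1, 4, big_endian=True)
as_little = load(mem, 1, 4, big_endian=False)
```

The same four bytes yield 0x12345678 when read big-endian and 0x78563412 when read little-endian.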

14
Dixie compiler
15
Dixie compiler components
Dixie binary
Target binary
16
Machine ISA Description
  • Describes ISA in a single file
      • yacc-style
      • specify translation for each machine instruction
      • provide debugging for translation
  • Format
      • (1) Directives
      • (2) ISA decoding and translation

17
Directives
  • Instruction size
      • .fixed_instr_size
  • Endianness
      • .big_endian, .little_endian
  • Memory address size
      • .32bits, .64bits
  • Stack placement
      • .place_begin, .place_before, .place_after, .place_end
  • Special registers
      • .sp, .errno, .zero, .vl, .vs, .fpcr, .tmpreg
      • Flags: .cf, .of, .uf, .df, .zf, .sf, .pf
  • Init code
      • .start Dixie Instructions

18
ISA Translation: Alpha
  • format( o6 ra5 disp21 )
  • if ( o 0x39 )
  • "BEQ r%d, 0x%lx", ra, UPC4SEXT(disp,21)
  • actionAEBB naddr2 addr0UPC addr1UPCSEXT(disp,21)4
  • < VBEQ.lo.64 r(ra), 4SEXT(disp,21)4 >
  • END
  • if ( o 0x3e )
  • "BGE r%d, 0x%lx", ra, UPC4SEXT(disp,21)
  • actionAEBB naddr2 addr0UPC addr1UPCSEXT(disp,21)4
  • < VBGE.c2.64 r(ra), 4SEXT(disp,21)4 >
  • END
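As a rough illustration of what the Alpha format line describes, the sketch below decodes a 32-bit branch word into a 6-bit opcode, a 5-bit register and a 21-bit displacement, then computes the branch target. The opcodes 0x39 (BEQ) and 0x3e (BGE) come from the slide. The target formula PC + 4 + 4 × SEXT(disp) is the standard Alpha branch rule. The function names and the assumption that fields are listed most-significant first are mine.

```python
def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of `value`."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_branch(word: int):
    """Split a 32-bit word into (o, ra, disp): 6, 5 and 21 bits."""
    o = (word >> 26) & 0x3F
    ra = (word >> 21) & 0x1F
    disp = word & 0x1FFFFF
    return o, ra, disp


# Opcodes taken from the slide; the table itself is illustrative.
MNEMONICS = {0x39: "BEQ", 0x3E: "BGE"}


def branch_target(pc: int, disp: int) -> int:
    """Alpha branch target: updated PC plus displacement in instructions."""
    return pc + 4 + 4 * sext(disp, 21)


def disassemble(pc: int, word: int):
    """Return a printable form, or None for opcodes not in the table."""
    o, ra, disp = decode_branch(word)
    name = MNEMONICS.get(o)
    if name is None:
        return None
    return f"{name} r{ra}, 0x{branch_target(pc, disp):x}"


# A BEQ on r3 with displacement -1 branches back to itself.
beq_word = (0x39 << 26) | (3 << 21) | 0x1FFFFF
bge_word = (0x3E << 26) | (1 << 21) | 2
```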

19
ISA Translation: Convex
  • format( hc2 op6 ind1 len1 aj3 rk3 )
  • if ( hc <0x0> )
  • switch( len )
  • case 0x0 format ( disp16 )
  • BREAK
  • case 0x1 format ( disp32 )
  • BREAK
  • action ACONT
  • if ( ind <0x1> aj <0x0, 0x2, 0x4, 0x6> )
  • if ( op 0x13 ) // shf N,Ak --OK--
  • "shf %d,a%d", SEXT(disp, NBITS(len)), rk
  • < MOV.c2.8 r(TMP0), disp >
  • < BGE.c2.8 r(TMP0), 4 >
  • < SUB.c2.8 r(TMP0), r(ZERO), r(TMP0) >
  • < SHR.lo.32 r(A(rk)), r(A(rk)), r(TMP0) >
  • < BR.lo.64 r(TMP0), 2 >
  • < SHL.lo.32 r(A(rk)), r(A(rk)), r(TMP0) >
  • END
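One reading of the Dixie sequence emitted for `shf` is a shift whose direction depends on the sign of the count: a non-negative count shifts left, a negative count is negated and shifts right. The sketch below models that reading on 32-bit values. It assumes the `.lo` right shift is a logical (zero-filling) shift, which the slide does not state, and the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def shf(value: int, count: int) -> int:
    """Model of the branchy shift sequence, under the assumptions above.

    count >= 0: shift left (the BGE skips to the SHL).
    count <  0: negate the count, then shift right (SUB, SHR, BR).
    """
    value &= MASK32
    if count >= 0:
        return (value << count) & MASK32
    return value >> -count  # assumed logical shift


left = shf(0x00000001, 4)
right = shf(0x80000000, -31)
```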

20
Unix OS interface
  • Many incompatibilities between Unixes
  • Need to map native OS to host OS
  • Two-step process
  • Special handling for errno

21
Typical incompatibilities
  • Syscall numbers
  • Common flags
      • O_RDONLY is different in Convex/Alpha
      • etc
  • Structs
      • Layout
      • Some vendors have extra fields (struct stat)
  • Errno codes
  • Hidden behavior in libc.a
      • getpid() also used as getppid() in ConvexOS

22
Jango
23
Breakpoints trace
Convex source:
  mov a0,a1
  ld.w @8(a1),a2
  sub.w 8,a2
Dixie translation:
  MOV.lo.32 r11,r10
  LOAD.lo.32 r500,r11,8
  LOAD.lo.32 r12,r500,0
  SUB.c2.32 r12,r12,8
Dixie translation with TRACE instructions:
  MOV.lo.32 r11,r10
  TRACE vpc,r11,8
  LOAD.lo.32 r500,r11,8
  TRACE vpc,r500,0
  LOAD.lo.32 r12,r500,0
  SUB.c2.32 r12,r12,8
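The listing on this slide pairs each LOAD with a TRACE on the same base register and offset. A minimal sketch of a pass producing that pattern follows. It treats instructions as plain strings and is an assumption about the general idea, not Jango's actual implementation. The function name is hypothetical.

```python
def instrument(instrs):
    """Insert a TRACE before every LOAD, reusing its base and offset."""
    out = []
    for ins in instrs:
        opcode, _, operands = ins.partition(" ")
        if opcode.startswith("LOAD."):
            _dest, base, offset = operands.split(",")
            out.append(f"TRACE vpc,{base},{offset}")
        out.append(ins)
    return out


dixie = [
    "MOV.lo.32 r11,r10",
    "LOAD.lo.32 r500,r11,8",
    "LOAD.lo.32 r12,r500,0",
    "SUB.c2.32 r12,r12,8",
]
traced = instrument(dixie)
```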
24
Speedy & DVM
25
Speedy & DVM
  • Dixie binary is optimized by Speedy
      • Optimizations at basic block (BB) level
      • Translates Dixie BBs into native code
      • Generates .speedy sections
  • Dixie binary is runnable on top of the DVM
      • Emulates the behavior of each Dixie instruction
          • Interpreting each Dixie instruction
          • Jumping into sequences of Speedy BBs
      • Interacts with the user simulator
          • Through trace instructions inserted by Jango
      • Maps target system calls into host system calls
          • Through DixOS
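The mix of interpretation and jumps into Speedy blocks can be pictured as a dispatch loop: for each basic block, run the native version if one exists, otherwise interpret its instructions one by one. The sketch below uses a made-up two-opcode toy ISA and Python callables standing in for native code. None of the names reflect the real DVM.

```python
# Hypothetical toy instructions: (op, dst, src, imm).
OPS = {
    "ADDI": lambda x, imm: x + imm,
    "MULI": lambda x, imm: x * imm,
}


def run(blocks, speedy_blocks, regs, entry):
    """Execute blocks in sequence, preferring a native version per block."""
    trace = []
    name = entry
    while name is not None:
        instrs, next_name = blocks[name]
        native = speedy_blocks.get(name)
        if native is not None:
            native(regs)  # jump into the optimized block
            trace.append((name, "speedy"))
        else:
            for op, dst, src, imm in instrs:  # interpret one by one
                regs[dst] = OPS[op](regs[src], imm)
            trace.append((name, "interpreted"))
        name = next_name
    return trace


BLOCKS = {
    "bb0": ([("ADDI", "r1", "r0", 5)], "bb1"),
    "bb1": ([("MULI", "r2", "r1", 3), ("ADDI", "r2", "r2", 1)], None),
}


def bb1_native(regs):
    """Stand-in for a translated version of bb1."""
    regs["r2"] = regs["r1"] * 3 + 1


regs = {"r0": 2, "r1": 0, "r2": 0}
trace = run(BLOCKS, {"bb1": bb1_native}, regs, "bb0")
```

Both paths must leave the registers in the same state, which is what makes a partially optimized binary safe to run.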

26
DVM Portability
  • DVM runs on all major hardware combinations

27
Speedy Architecture
  • Front End: Understands Dixie ISA
      • Optimizes Dixie Code (NOP, VPC, CSE)
      • Lowers Representation
          • Load Virtual Registers into physical registers
          • Local register allocation
          • Load large constants into registers
  • Back End: Translates Dixie ISA into target ISA
      • Instruction translation
          • Opcode selection
          • Big/Little endian memory access
          • Alignment issues
  • Peephole Optimizer
      • Recognize instruction sequences
      • Remove redundant loads
      • Remove redundant branches
28
Outline
  • Motivation
  • DIXIE Architecture
  • Debugging Tools
  • Performance
  • Summary

29
Debugging
  • Porting to a new ISA is not easy
      • Many cut-and-paste bugs
      • A trivial bug may take weeks to find without appropriate tools
  • We would like developers to
      • Test-as-you-go every instruction description
      • Test each instruction almost in isolation
      • Quickly compare DVM and native results
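Comparing DVM and native results for one instruction at a time amounts to differential testing. The sketch below shows the idea with two Python functions standing in for the native instruction and its translation. The helper name and the deliberately buggy translation are invented for the example.

```python
MASK32 = 0xFFFFFFFF


def first_mismatch(reference, candidate, inputs):
    """Return (args, expected, actual) for the first disagreement, else None."""
    for args in inputs:
        expected = reference(*args)
        actual = candidate(*args)
        if expected != actual:
            return args, expected, actual
    return None


def native_add32(a, b):
    """Stand-in for the native 32-bit add."""
    return (a + b) & MASK32


def translated_add32(a, b):
    """Stand-in for a translation that forgets to wrap at 32 bits."""
    return a + b


cases = [(1, 2), (0xFFFFFFFF, 1), (7, 9)]
report = first_mismatch(native_add32, translated_add32, cases)
```

Testing one instruction in isolation keeps the failing input small enough to read directly from the report.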

30
Outline
  • Motivation
  • DIXIE Architecture
  • Debugging Tools
  • Performance
  • Summary

31
Performance
  • Benchmark suite
      • SPECint95
  • Environment
      • DEC Alpha AXP-21264 running at 625 MHz
      • OSF/1 v4.0
  • Two versions of the Dixie binaries
      • DVM: pure Dixie binaries
      • Speedy: Dixie binaries optimized using Speedy

32
DVM slowdown
Alpha on Alpha
33
Performance Breakdown
Fixed Inputs (Alpha), Different platforms
34
Performance Breakdown
Different inputs, fixed platform (Power2/AIX)
35
Outline
  • Motivation
  • DIXIE Architecture
  • Debugging Tools
  • Performance
  • Summary

36
Summary
  • Binary translation and optimization
      • Are becoming important tools in the embedded market
      • Promise lower development costs
          • When changing architectures
      • Are also of interest to major computer manufacturers
          • IA-64 emulation
          • Transmeta
          • FX!32 (now obsolete)
  • DIXIE
      • Robust tool that meets most translation demands
      • Multi-ISA, Multi-platform