Title: Build GCC Cross Compiler for a Specific CPU
1Build GCC Cross Compiler for a Specific CPU
- Chia-Tsun Wu
- D92943007
- tommy_at_access.ee.ntu.edu.tw
2Outline
- Introduction to SoC
- Motivation and project goal
- Design a CPU
- Tools used to design the CPU hardware
- CPU Specification
- CPU Design flow
- Simulation and Results
3Outline
- Build a GCC Cross Compiler
- GCC structure
- Knowledge needed to port GCC
- Build Flow
- Build a GCC Cross Assembler and Cross Linker
- Build a GCC Cross Compiler
- A simple test program
- Summary
4Introduction to SoC
- SoC: System on a Chip.
- Highly integrated; includes
- CPU
- System bus
- Peripherals
- Co-processor
- Low cost, low area, high performance.
5What is SOC?
6SOC Design Flow
System Specs..
HW/SW Partitioning
Hardware Descript.
Software Descript.
HW Synth. and Configuration
Software Gen. Parameterization
Interface Synthesis
Configuration Modules
Hardware Components
HW/SW Interfaces
Software Modules
HW/SW Integration and Cosimulation
Integrated System
System Evaluation
Design Coverification
System Validation
7Motivation and project goal
- Motivation
- SoC has been a major trend in recent years
- The CPU is one of the key cores of an SoC design
- The development environment is the most important thing for a CPU
- Goal
- Design a simple 32-bit RISC CPU
- Build a cross assembler and cross linker for a specific CPU
- Build a cross compiler for a specific CPU
8Design a CPU
- Specification
- 32-bit RISC-based CPU
- General-purpose register architecture
- 32-bit (4 Gbyte) addressing
- 32-bit fixed instruction length (excluding immediate data)
- MSB first
- Reset address: 0x000ffffc
- No pipeline; one instruction cycle = four clock cycles
- Instruction fetch
- Instruction decode and data fetch
- Execution
- Write back
- No interrupts
- No timer
9Registers
- General-purpose registers: R0-R15
- R13: accumulator
- R14: memory data pointer
- R15: stack pointer
- Program counter (PC) (0x000ffffc after reset)
- Program status (PS) (Sign flag, Zero flag, oVerflow flag, Carry flag)
10Instruction formats
- General: OP Rn1, Rn2
- OP: 8 bits
- n: register number, 0000 = R0, 1111 = R15
- Immediate: OP data, Rn2
- OP: 8 bits
- n: register number, 0000 = R0, 1111 = R15
- data: 32-bit data
- Branch: OP Addr
- OP: 16 bits (low byte = 0x00)
- Addr: 32-bit branch address
11Instruction sets
- ADD Rn1,Rn2  Machine code: 00000000 Rn1 Rn2
- Rn2 = Rn1 + Rn2
- Flags: S, Z, V, C
- ADDC Rn1,Rn2  Machine code: 00000001 Rn1 Rn2
- Rn2 = Rn1 + Rn2
- Flags: S, Z, V, C
- SUB Rn1,Rn2  Machine code: 00000010 Rn1 Rn2
- Rn2 = Rn2 - Rn1
- Flags: S, Z, V, C
- SUBC Rn1,Rn2  Machine code: 00000011 Rn1 Rn2
- Rn2 = Rn2 - Rn1
- Flags: S, Z, V, C
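The slides list which flags ADD updates but not how each one is computed. The sketch below is a minimal Python model of a 32-bit ADD, assuming the conventional definitions of the four flags (sign = bit 31, zero = result is 0, overflow = signed overflow, carry = unsigned carry out). The function name `add32` and these definitions are assumptions for illustration, not taken from the CPU's RTL.

```python
MASK32 = 0xFFFFFFFF

def add32(rn1, rn2):
    """Model ADD Rn1,Rn2 (Rn2 = Rn1 + Rn2) on 32-bit values.

    Returns (result, flags). Flag definitions are the conventional ones
    and are an assumption; the slides only say S, Z, V and C are affected.
    """
    a, b = rn1 & MASK32, rn2 & MASK32
    total = a + b
    result = total & MASK32
    flags = {
        "S": (result >> 31) & 1,                       # bit 31 of the result
        "Z": int(result == 0),                         # result is zero
        "V": (((a ^ result) & (b ^ result)) >> 31) & 1,  # signed overflow
        "C": int(total > MASK32),                      # unsigned carry out
    }
    return result, flags

if __name__ == "__main__":
    print(add32(0x7FFFFFFF, 1))
    print(add32(0xFFFFFFFF, 1))
```

Adding 1 to 0x7FFFFFFF sets S and V without a carry, while adding 1 to 0xFFFFFFFF wraps to zero and sets Z and C.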
12Instruction sets
- LDI data,Rn2  Machine code: 00000100 0000 Rn2 Data
- Rn2 = data
- Flags: none
- MOV Rn1,Rn2  Machine code: 00000101 Rn1 Rn2
- Rn2 = Rn1
- Flags: none
- RET  Machine code: 00000110 00000000
- PC = SP--
- Flags: none
- JMP Addr  Machine code: 00000111 00000000 Addr
- PC = Addr
- Flags: none
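The instruction formats can be checked with a small encoder. The sketch below is a Python model, not the project's assembler: it assumes each instruction occupies a 32-bit word whose upper 16 bits are zero (as the 32-bit words in the deck's test vectors suggest), that the LDI opcode is 00000100 (also taken from the test vectors), and that LDI and JMP are followed by one extra 32-bit word. The names `OPCODES`, `word`, `encode` and `bits` are hypothetical.

```python
# Opcodes as listed on the instruction-set slides (LDI per the test vectors).
OPCODES = {"ADD": 0x00, "ADDC": 0x01, "SUB": 0x02, "SUBC": 0x03,
           "LDI": 0x04, "MOV": 0x05, "RET": 0x06, "JMP": 0x07}

def word(op, rn1=0, rn2=0):
    """Pack an 8-bit opcode and two 4-bit register numbers into one word."""
    if not (0 <= rn1 < 16 and 0 <= rn2 < 16):
        raise ValueError("register number must be 0..15")
    return (OPCODES[op] << 8) | (rn1 << 4) | rn2

def encode(op, *args):
    """Return the list of 32-bit words for one instruction."""
    if op in ("ADD", "ADDC", "SUB", "SUBC", "MOV"):
        rn1, rn2 = args
        return [word(op, rn1, rn2)]
    if op == "LDI":                      # LDI data,Rn2: opcode word, then data
        data, rn2 = args
        return [word(op, 0, rn2), data & 0xFFFFFFFF]
    if op == "RET":
        return [word(op)]
    if op == "JMP":                      # JMP Addr: opcode word, then address
        (addr,) = args
        return [word(op), addr & 0xFFFFFFFF]
    raise ValueError(f"unknown instruction {op!r}")

def bits(value):
    """Format a word as a 32-character binary string."""
    return format(value, "032b")

if __name__ == "__main__":
    for w in encode("LDI", 0x1, 1):
        print(bits(w))
    print(bits(encode("ADDC", 2, 3)[0]))
```

Under these assumptions the encoder reproduces the words shown for LDI 0x1,R1, ADDC R2,R3 and SUB R4,R5 in the test vectors.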
13Tools used
- Synopsys Design Compiler
- Mentor Graphics ModelSim
- Synopsys Apollo
- TSMC 0.25um standard cell libraries
14Design Flow
CPU Specifications
RTL Coding
Functional simulation
Test bench
Design Compiler
Constraints
Gate-level simulation
Test bench
Apollo
Constraints
Test bench
Post layout simulation
Tape out
15Test vectors
- LDI 0x0,R0   00000000000000000000010000000000 00000000000000000000000000000000
- LDI 0x1,R1   00000000000000000000010000000001 00000000000000000000000000000001
- LDI 0x2,R2   00000000000000000000010000000010 00000000000000000000000000000010
- LDI 0x3,R3   00000000000000000000010000000011 00000000000000000000000000000011
- LDI 0x4,R4   00000000000000000000010000000100 00000000000000000000000000000100
- LDI 0x5,R5   00000000000000000000010000000101 00000000000000000000000000000101
- LDI 0x6,R6   00000000000000000000010000000110 00000000000000000000000000000110
- LDI 0x7,R7   00000000000000000000010000000111 00000000000000000000000000000111
- LDI 0x8,R8   00000000000000000000010000001000 00000000000000000000000000001000
- LDI 0x9,R9   00000000000000000000010000001001 00000000000000000000000000001001
- LDI 0xa,R10  00000000000000000000010000001010 00000000000000000000000000001010
- LDI 0xb,R11  00000000000000000000010000001011 00000000000000000000000000001011
- LDI 0xc,R12  00000000000000000000010000001100 00000000000000000000000000001100
- LDI 0xd,R13  00000000000000000000010000001101 00000000000000000000000000001101
- LDI 0xe,R14  00000000000000000000010000001110 00000000000000000000000000001110
- LDI 0xf,R15  00000000000000000000010000001111 00000000000000000000000000001111
- ADD R0,R1    00000000000000000000000000000001
- ADDC R2,R3   00000000000000000000000100100011
- SUB R4,R5    00000000000000000000001001000101
16Simulation result
17Synthesis results
- TSMC 0.25um
- Area: 0.35 mm^2
- Clock: 400 MHz
- Power: 1.73 mW
- UMC 0.18um
- Area: 0.19 mm^2
- Clock: 600 MHz
- Power: 1 mW
18Build a GCC Cross Compiler
- GCC structure
- Knowledge needed to port GCC
- Build Flow
- Build a GCC Cross Assembler and Cross Linker
- Build a GCC Cross Compiler
- A simple test program
- Summary
19GCC Execution
20The Structure of Compiler
21The Structure of GCC
22GCC Code Generation
- Back end: machine description patterns are matched against the intermediate format (RTL).
- The machine description works like a template.
- The machine description includes
- type bit widths, memory alignment
- instruction patterns, register classes
- peephole optimization rules
23GCC Code Generation (contd)
24Example of RTL
- Adds two 4-byte integer (SImode) operands.
- First operand is register
- Register is also 4-byte integer.
- Register number is 8.
- Second operand is constant integer.
- Value is 123.
- Mode is VOIDmode (not given).
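The expression described by these bullets can be written out in RTL's textual notation. The sketch below is a small Python helper that renders it. The helper `rtl` is hypothetical and only formats text; it assumes the usual `code:mode` spelling, with the mode omitted for VOIDmode.

```python
def rtl(code, mode=None, *operands):
    """Render one RTL expression as text.

    mode=None stands for VOIDmode, which is simply not printed.
    """
    head = code if mode is None else f"{code}:{mode}"
    return "(" + " ".join([head, *map(str, operands)]) + ")"

# plus in SImode; first operand is SImode register 8,
# second operand is the constant integer 123 (VOIDmode).
expr = rtl("plus", "SI", rtl("reg", "SI", 8), rtl("const_int", None, 123))

if __name__ == "__main__":
    print(expr)  # (plus:SI (reg:SI 8) (const_int 123))
```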
25Templates
- Used for three purposes
- Generating RTL from parse tree.
- Generating machine insns from RTL.
- Specifying parameters about instructions.
- Sample Template for RISC machine
26GCC Porting and Retargeting
- Porting to new machines/processors
- The "Using and Porting GCC" book covers this and is self-contained.
- Done by describing the machine, not how to compile for the machine.
- Using GCC as a back end for other languages
- Little documentation.
- Few examples.
- See the GNAT, GNU Cobol, and Fortran ports.
- In both cases, copy from similar ports.
27How to port GCC
- In directory gcc-xxx/gcc/config/machine/
- machine.h
- Contains C macros that define general attributes of the machine.
- machine.md
- Contains RTL expressions that define the instruction set.
- Input to programs that produce .h and .c files.
- machine.c
- Machine-dependent functions, normally things too large to put cleanly into the two files above.
28How to port GCC (contd)
29gcc/config--Architecture characteristic key
- H: A hardware implementation does not exist.
- M: A hardware implementation is not currently being manufactured.
- S: A free simulator does not exist.
- L: Integer registers are narrower than 32 bits.
- Q: Integer registers are at least 64 bits wide.
- N: Memory is not byte addressable, and/or bytes are not eight bits.
- F: Floating point arithmetic is not included in the instruction set.
- I: Architecture does not use IEEE format floating point numbers.
- C: Architecture does not have a single condition code register.
- B: Architecture has delay slots.
- D: Architecture has a stack that grows upward.
- l: Port cannot use ILP32 mode integer arithmetic.
30gcc/config--Architecture characteristic key
- q: Port can use LP64 mode integer arithmetic.
- r: Port can switch between ILP32 and LP64 at runtime. (Not necessarily supported by all subtargets.)
- c: Port uses cc0.
- p: Port does not use define_peephole.
- f: Port does not define prologue and/or epilogue RTL expanders.
- g: Port does not define TARGET_ASM_FUNCTION_(PRO|EPI)LOGUE.
- m: Port does not use define_constants.
- b: Port does not use '"* ..."' notation for output template code.
- d: Port uses DFA scheduler descriptions.
- h: Port contains old scheduler descriptions.
- a: Port generates multiple inheritance thunks using TARGET_ASM_OUTPUT_MI(_VCALL)_THUNK.
- t: All insns either produce exactly one assembly instruction, or trigger a define_split.
- e: <arch>-elf is not a supported target.
- s: <arch>-elf is the correct target to use with the simulator in /cvs/src.
31gcc/config--Architecture characteristic key
32define_peephole
- In addition to instruction patterns, the 'md' file may contain definitions of machine-specific peephole optimizations.
- The combiner does not notice certain peephole optimizations when the data flow in the program does not suggest that it should try them.
- For example, sometimes two consecutive insns related in purpose can be combined even though the second one does not appear to use a register computed in the first one. A machine-specific peephole optimizer can detect such opportunities.
33define_splits
- Often you can rewrite the single insn as a list of individual insns, each corresponding to one machine instruction.
- The compiler splits the insn if there is a reason to believe that it might improve instruction or delay slot scheduling.
- Splits are evaluated after the combiner pass and before the scheduling passes.
- Splits can optimize for speed and instruction length, so they are the perfect place to put this intelligence.
- Example: if we are loading a small negative constant, we can save space and time by loading the positive value and then sign-extending it.
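The arithmetic behind the small-negative-constant example can be checked directly. The sketch below is a Python model, assuming "small" means the constant fits in a signed 8-bit value (-128 to -1) and that the target has a short load for 8-bit positive immediates plus a byte sign-extension instruction. The function names and the 8-bit range are assumptions for illustration.

```python
def sign_extend8(value):
    """Sign-extend the low 8 bits of value to a full-width integer."""
    low = value & 0xFF
    return low - 0x100 if low & 0x80 else low

def split_small_negative(constant):
    """Model the two-insn split for a constant in -128..-1.

    Step 1 loads the positive 8-bit value; step 2 sign-extends it.
    Returns (loaded_value, final_value).
    """
    if not -128 <= constant <= -1:
        raise ValueError("this sketch only covers -128..-1")
    positive = constant & 0xFF          # what the short load would carry
    return positive, sign_extend8(positive)

if __name__ == "__main__":
    print(split_small_negative(-5))  # (251, -5)
```

The split trades one long immediate load for two short instructions, which is the kind of length and speed decision the combiner does not make.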
34define_expand
- On some target machines, some standard pattern names for RTL generation cannot be handled with a single insn, but a sequence of RTL insns can represent them.
- For these target machines, you can write a 'define_expand' to specify how to generate the sequence of RTL.
- A 'define_expand' is an RTL expression that looks almost like a 'define_insn' but, unlike the latter, a 'define_expand' is used only for RTL generation and it can produce more than one RTL insn.
- The combiner pass only
- cares about reducing the number of instructions
- does not care about instruction lengths or speeds
35define_insn
- Push and pop
- movsi_push
- movsi_popmove
- Move
- movqi_unsigned_register_load
- movqi_signed_register_load
- movqi_internal
- movhi
- movhi_unsigned_register_load
- movhi_signed_register_load
- movhi_internal
- movsi
- movsi_internal
- movdi
- movdi_insn
- movsf
- movsf_internal
- movsf_constant_store
- Signed conversions from a smaller integer to a larger integer
- extendqisi2
- extendhisi2
- Addition
- add_to_stack
- addsi3
- addsi_regs
- addsi_small_int
- addsi_big_int
- addsi_for_reload
- Subtraction
- subsi3
- Multiplication
- mulsidi3
- umulsidi3
- mulhisi3
- umulhisi3
- mulsi3
- Negation
- negsi2
- Shifts
- ashlsi3
36define_insn
- Logical Operations
- andsi3
- iorsi3
- xorsi3
- one_cmplsi2
- Comparisons
- cmpsi
- cmpsi_internal
- Branches
- beq
- bne
- blt
- ble
- bgt
- bge
- bltu
- bleu
- bgtu
- bgeu
- Calls and Jumps
- call
- call_value
- jump
- indirect_jump
- tablejump
- Function Prologues and Epilogues
- prologue
- epilogue
- return_from_func
- leave_func
- enter_func
- Miscellaneous
- nop
- blockage
37define_insn addsi_regs
(define_insn "addsi_regs"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_operand:SI 1 "register_operand" "%0")
                 (match_operand:SI 2 "register_operand" "r")))]
  ""
  "add %2, %0"
)
- (set value x)  (chapter 9.15, p. 110)
- value = x
- (plus:m x y)
- x + y, carried out in mode m
38define_insn addsi_regs (contd)
- (match_operand:m n predicate constraint)  (chapter 10.4, p. 131)
- if the condition (predicate) is true, return operand n
- n counts from 0
- for each number n, only one match_operand expression
- predicate is the name of a C function; it returns 0 on failure
- general_operand: checks that the operand is either a constant, a register, or a memory reference
- register_operand: checks whether the operand is a register
- immediate_operand: checks whether the operand is immediate data
- constraint: describes the kinds of operand that are permitted
- r: register
- m: any kind of memory operand
- o: only offsettable memory operands
- V: only non-offsettable memory operands
- <: memory operand with autodecrement addressing
- >: memory operand with autoincrement addressing
- i: immediate integer operand
- 0-9: an operand that matches the specified operand number is allowed
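To make the roles of the constraint string and the output template concrete, here is a toy Python model of the addsi_regs pattern. It is not how GCC works internally (GCC matches RTL and resolves constraints during register allocation). It assumes that the output template is "add %2, %0" with %n standing for operand n, that a leading "=" or "%" in a constraint is a modifier, and that registers are named r0 to r15. All function names are hypothetical.

```python
import re

REGISTER = re.compile(r"r(1[0-5]|[0-9])")   # r0..r15, an assumed naming

def constraints_ok(constraints, operands):
    """Check operands against a list of constraint strings.

    Only two kinds are modelled: "r" (any register) and a digit
    (operand must be the same as the operand with that number).
    """
    for n, constraint in enumerate(constraints):
        letter = constraint.lstrip("=%")     # ignore modifier characters
        if letter.isdigit():
            if operands[n] != operands[int(letter)]:
                return False
        elif letter == "r":
            if not REGISTER.fullmatch(operands[n]):
                return False
        else:
            raise ValueError(f"constraint {constraint!r} not modelled here")
    return True

def emit(template, operands):
    """Substitute %n in the output template with operand n."""
    return re.sub(r"%(\d)", lambda m: operands[int(m.group(1))], template)

if __name__ == "__main__":
    ops = ["r4", "r4", "r6"]                 # operand 1 tied to operand 0
    if constraints_ok(["=r", "%0", "r"], ops):
        print(emit("add %2, %0", ops))       # add r6, r4
```

The digit constraint on operand 1 is what forces a two-address form: the destination must be the same register as one of the sources, so the template only needs to print operands 2 and 0.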
39Build a GCC Cross Compiler
Configure GCC
Configure Binutils
Machine Description
Make
Make
Make install
Make install
GCC compiler
40Build a GCC Cross Assembler and Cross Linker
- Binutils version 2.14
- configure --target=fr30-elf --prefix=dir
- make
- make install
41Build a GCC Cross Compiler
- GCC version 3.3.1
- ../configure --target=fr30-elf --prefix=dir --enable-languages=c
- make
- make install
42A simple c to test cross compiler
int test(int i, int j, int k)
{
    int a;
    int b;
    a = 49999999;
    b = 39999999;
    a += k;
    b += j;
    a++;
    b--;
    i += a + b;
    return i;
}
- fr30-elf-gcc -S -O2 t.c
43A simple c to test cross compiler (contd)
        .file   "t.c"
        .text
        .p2align 2
        .globl  test
        .type   test, @function
test:
        mov     r4, r2          00000000000000000000010101000010
        ldi32   50000000, r4    00000000000000000000010000000100 10111110101111000010000000
        ldi32   39999998, r1    00000000000000000000010000000001 10011000100101100111111110
        add     r6, r4          00000000000000000000000001100100
        add     r5, r1          00000000000000000000000001010001
        add     r1, r4          00000000000000000000000000010100
        add     r2, r4          00000000000000000000000000100100
        ret
        .size   test, .-test
        .ident  "GCC (GNU) 3.3.1 (cygming special)"
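One way to sanity-check the generated code is to model both the C function and the instruction sequence, then compare the results. The Python sketch below rests on several assumptions inferred from the listing rather than stated in the slides: the arguments i, j, k arrive in r4, r5, r6, the return value is in r4, "OP Rn1, Rn2" writes its result to Rn2, and the C source reads a = 49999999; b = 39999999; a += k; b += j; a++; b--; i += a + b. Arithmetic is wrapped to 32 bits.

```python
MASK32 = 0xFFFFFFFF

def wrap32(x):
    """Wrap an integer to a signed 32-bit value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def c_version(i, j, k):
    """The test function as read from the slide (an assumed reconstruction)."""
    a = 49999999
    b = 39999999
    a += k
    b += j
    a += 1
    b -= 1
    i += a + b
    return wrap32(i)

def asm_version(i, j, k):
    """Step through the listing; r[n] models register rn."""
    r = {4: i, 5: j, 6: k}
    r[2] = r[4]                      # mov   r4, r2
    r[4] = 50000000                  # ldi32 50000000, r4
    r[1] = 39999998                  # ldi32 39999998, r1
    r[4] = wrap32(r[6] + r[4])       # add   r6, r4
    r[1] = wrap32(r[5] + r[1])       # add   r5, r1
    r[4] = wrap32(r[1] + r[4])       # add   r1, r4
    r[4] = wrap32(r[2] + r[4])       # add   r2, r4
    return r[4]                      # ret

if __name__ == "__main__":
    print(c_version(1, 2, 3), asm_version(1, 2, 3))
```

The comparison also shows what -O2 did: the increment and decrement were folded into the constants, so 49999999 + 1 and 39999999 - 1 appear directly as 50000000 and 39999998.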
44A simple c to test cross compiler (contd)
45Summary
- Studying RTL is more important than studying the MD.
- Build the cross assembler and cross linker before building the cross compiler.
- There is little documentation on porting GCC as a cross compiler.
- Modifying an existing MD is easier than creating a new one.
- "The main goal of GCC was to make a good, fast compiler for machines in the class that the GNU system aims to run on: 32-bit machines that address 8-bit bytes and have several general registers." -- Richard Stallman
- For a GIEE student, designing a new CPU seems easier than building a cross compiler.
- http://gcc.gnu.org