Title: Chapter 2: Instruction Set Architecture
1Chapter 2 Instruction Set Architecture
- Yirng-An Chen
- Dept. of CIS
- Computer Architecture
- Fall, 2000
2Computer Architecture: Historical Perspective
- 1950s to 1960s Computer Architecture Course: Computer Arithmetic
- 1970s to mid 1980s Computer Architecture Course: Instruction Set Design, especially ISA appropriate for compilers
- 1990s Computer Architecture Course: Design of CPU, memory system, I/O system, Multiprocessors
3Instruction Set Architecture (ISA)
software
instruction set
hardware
4Interface Design
- A good interface
- Lasts through many implementations (portability, compatibility)
- Is used in many different ways (generality)
- Provides convenient functionality to higher levels
- Permits an efficient implementation at lower levels
[Figure: one interface over time, with several uses above it and implementations imp 1, imp 2, imp 3 below it]
5Evolution of Instruction Sets
Single Accumulator (EDSAC 1950)
Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
Separation of Programming Model from Implementation
- High-level Language Based (B5000 1963)
- Concept of a Family (IBM 360 1964)
General Purpose Register Machines
- Complex Instruction Sets (Vax, Intel 432 1977-80)
- Load/Store Architecture (CDC 6600, Cray 1 1963-76)
- CISC (Intel x86 1980-199x)
- RISC (Mips, Sparc, HP-PA, IBM RS6000, PowerPC 1987)
Mixed CISC & RISC? (IA-64 ... 1999)
6Basic Issues in Instruction Set Design
- What operations, and how many?
- Load/store/increment/branch are sufficient to do any computation, but not useful (programs too long!!).
- How (many) operands are specified?
- Most operations are dyadic (e.g., A <- B + C). Some are monadic (e.g., A <- ~B).
- How to encode them into instruction format?
- Instructions should be multiples of bytes.
- Typical Instruction Set
- 32-bit word
- Basic operand addresses are 32 bits long.
- Basic operands (like integer) are 32 bits long.
- In general, an instruction could refer to 3 operands (A <- B + C).
- Challenge: Encode operations in a small number of bits.
7What Must be Specified?
- Instruction Format (encoding)
- How is it decoded?
- Location of operands and result
- Where other than memory?
- How many explicit operands?
- How are memory operands located?
- Data type and Size
- Operations
- What are supported?
8Basic ISA Classes
- Accumulator
- 1 address: add A (acc <- acc + Mem[A]).
- Stack
- 0 address: add (tos <- tos + second of stack).
- General Purpose Register
- 2 addresses: add A, B (EA(A) <- EA(A) + EA(B))
- 3 addresses: add A, B, C (EA(A) <- EA(B) + EA(C))
- Load/Store (register-register)
- ALU operations: No memory reference.
- 3 addresses: add R1, R2, R3 (R1 <- R2 + R3)
- load R1, R2 (R1 <- Mem[R2])
- store R1, R2 (Mem[R1] <- R2)
- Comparison: Bytes per instruction? Number of instructions? Cycles per instruction?
9Comparison of ISA Classes
- Memory efficiency? Instruction access? Data
access?
10Stack Machine
- Instruction Set: Push, Pop, +, -, *, /, etc.
- Example: A*B - (A+B*C)
- Push A
- Push B
- *
- Push A
- Push B
- Push C
- *
- +
- -
- Drawbacks
- Duplicate data accesses.
- Not good for an optimizing compiler.
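The push/operate sequence above can be traced with a small evaluator. What follows is a minimal sketch in C, not any real machine's instruction set: the `StackMachine` type and the `sm_push`, `sm_binop`, and `eval_example` names are hypothetical, and the expression is assumed to be A*B - (A + B*C).

```c
#include <assert.h>

#define STACK_MAX 16

/* Hypothetical zero-address machine: operands live only on the stack. */
typedef struct {
    int data[STACK_MAX];
    int top;                 /* number of valid entries */
} StackMachine;

void sm_push(StackMachine *m, int value)
{
    assert(m->top < STACK_MAX);
    m->data[m->top++] = value;
}

/* Pop two operands, apply op, push the result. */
void sm_binop(StackMachine *m, char op)
{
    assert(m->top >= 2);
    int rhs = m->data[--m->top];
    int lhs = m->data[--m->top];
    int result = 0;
    switch (op) {
    case '+': result = lhs + rhs; break;
    case '-': result = lhs - rhs; break;
    case '*': result = lhs * rhs; break;
    default:  assert(!"unsupported operator");
    }
    m->data[m->top++] = result;
}

/* Evaluates A*B - (A + B*C) in the same instruction order as the example. */
int eval_example(int a, int b, int c)
{
    StackMachine m = { {0}, 0 };
    sm_push(&m, a);
    sm_push(&m, b);
    sm_binop(&m, '*');      /* A*B */
    sm_push(&m, a);         /* A is fetched from memory a second time */
    sm_push(&m, b);         /* and so is B */
    sm_push(&m, c);
    sm_binop(&m, '*');      /* B*C */
    sm_binop(&m, '+');      /* A + B*C */
    sm_binop(&m, '-');      /* A*B - (A + B*C) */
    return m.data[0];
}
```

The two extra pushes of A and B are the "duplicate data accesses" listed as a drawback: a register machine could keep both values in registers after the first load.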
11General Purpose Register
- All machines use general purpose registers after 1975.
- Advantages of registers
- Registers are faster than memory.
- Registers are easier for a compiler to use.
- E.g., (A*B) - (C*D) - (E*F) can do the multiplications in any order, but a stack cannot.
- Registers can hold variables.
- Memory traffic is reduced.
- Code density is improved (a register is named with fewer bits than a memory address).
12Examples of Register Usage
- Typical ALU Instructions, classified as (memory operands, total operands)
- MIPS: add Rd, Rs, Rt -> (0,3)
- 80x86: ADD AL, [SI] -> (1,2)
- VAX: CMPB (R0), (R0) -> (2,2)
13Pros and Cons
- Register-Register (0,3)
- Simple, fixed length instruction encoding.
- Simple code-generation model.
- Similar number of clocks to execute.
- Higher instruction count.
- Bit encoding may be wasteful.
- Memory-memory (3,3)
- Most compact.
- Different Instruction size.
- Memory access bottleneck.
- Register-Memory (1,2)
- Data access without loading first.
- Easy to encode and yield good density.
- One operand is destroyed.
- Limited number of registers.
14Byte Ordering
- Idea
- Bytes in long word numbered 0 to 3
- Which is most (least) significant?
- Can cause problems when exchanging binary data between machines
- Big Endian: Byte 0 is most, 3 is least
- IBM 360/370, Motorola 68K, Sparc.
- Little Endian: Byte 0 is least, 3 is most
- Intel x86, VAX
- Alpha
- Chip can be configured to operate either way
- DEC workstations are little endian
- Cray T3E Alphas are big endian
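One common way to avoid the exchange problem is to assemble multi-byte values from individual bytes with shifts, so the code does not depend on the host's byte order. The sketch below is illustrative only; `load_be32`, `load_le32`, and `host_is_little_endian` are hypothetical helper names, not part of any library mentioned in these slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit value from 4 bytes stored most-significant byte first. */
uint32_t load_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Assemble a 32-bit value from 4 bytes stored least-significant byte first. */
uint32_t load_le32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Returns 1 if byte 0 of a word holds the least significant byte on this host. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```

For the byte sequence f0 f1 f2 f3, `load_be32` yields 0xf0f1f2f3 and `load_le32` yields 0xf3f2f1f0 on any host, which is what a big-endian and a little-endian machine respectively would read as a 32-bit int.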
15Byte Ordering Example
union {
  unsigned char c[8];
  unsigned short s[4];
  unsigned int i[2];
  unsigned long l[1];
} dw;
16Byte Ordering Example (Cont.)
int j;
for (j = 0; j < 8; j++)
  dw.c[j] = 0xf0 + j;
printf("Characters 0-7 == [0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x]\n",
       dw.c[0], dw.c[1], dw.c[2], dw.c[3], dw.c[4], dw.c[5], dw.c[6], dw.c[7]);
printf("Shorts 0-3 == [0x%x,0x%x,0x%x,0x%x]\n",
       dw.s[0], dw.s[1], dw.s[2], dw.s[3]);
printf("Ints 0-1 == [0x%x,0x%x]\n", dw.i[0], dw.i[1]);
printf("Long 0 == [0x%lx]\n", dw.l[0]);
17Byte Ordering on Alpha
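Little Endian
Byte value:  f0   f1   f2   f3   f4   f5   f6   f7
char:        c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]
short:       s[0] = bytes 0-1, s[1] = bytes 2-3, s[2] = bytes 4-5, s[3] = bytes 6-7 (LSB first within each)
int:         i[0] = bytes 0-3, i[1] = bytes 4-7 (LSB first within each)
long:        l[0] = bytes 0-7 (LSB first)
Output on Alpha
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long 0 == [0xf7f6f5f4f3f2f1f0]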
18Byte Ordering on x86
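Little Endian
Byte value:  f0   f1   f2   f3   f4   f5   f6   f7
char:        c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]
short:       s[0] = bytes 0-1, s[1] = bytes 2-3, s[2] = bytes 4-5, s[3] = bytes 6-7 (LSB first within each)
int:         i[0] = bytes 0-3, i[1] = bytes 4-7 (LSB first within each)
long:        l[0] = bytes 0-3 (LSB first; long is 4 bytes here)
Output on Pentium
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long 0 == [0xf3f2f1f0]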
19Byte Ordering on Sun
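Big Endian
Byte value:  f0   f1   f2   f3   f4   f5   f6   f7
char:        c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]
short:       s[0] = bytes 0-1, s[1] = bytes 2-3, s[2] = bytes 4-5, s[3] = bytes 6-7 (MSB first within each)
int:         i[0] = bytes 0-3, i[1] = bytes 4-7 (MSB first within each)
long:        l[0] = bytes 0-3 (MSB first; long is 4 bytes here)
Output on Sun
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf0f1,0xf2f3,0xf4f5,0xf6f7]
Ints 0-1 == [0xf0f1f2f3,0xf4f5f6f7]
Long 0 == [0xf0f1f2f3]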
20Addressing Modes
21Addressing Modes(Cont.)
Memory
22Addressing Modes(Cont.)
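Scaled: Add R1, 100(R2)[R3]
Regs[R1] <- Regs[R1] + Mem[100 + Regs[R2] + Regs[R3]*d]
[Figure: the displacement 100, register R2, and register R3 scaled by d are summed to form the memory address of the operand]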
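The scaled mode's address arithmetic can be written out directly. This is a minimal sketch under stated assumptions: `scaled_ea` is a hypothetical helper, the register file is modeled as a plain array, and d is assumed to be the operand size in bytes (1, 2, 4, or 8).

```c
#include <assert.h>
#include <stdint.h>

/* Effective address for the scaled mode: disp + Regs[base] + Regs[index] * d. */
uint32_t scaled_ea(const uint32_t regs[], int base, int index,
                   uint32_t disp, uint32_t d)
{
    assert(d == 1 || d == 2 || d == 4 || d == 8);
    return disp + regs[base] + regs[index] * d;
}
```

With Regs[R2] = 0x1000, Regs[R3] = 5, a displacement of 100, and d = 4, the operand address is 100 + 4096 + 20 = 4216. The mode is a natural fit for array indexing, where R3 holds the element index and d the element size.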
23Addressing Mode Usage
- 3 Programs from SPEC89 on VAX
- Others ? 0.
24Displacement Address Size
- Average of 5 programs from SPECint92 and SPECfp92.
- X-axis is log2 of displacement.
- 1% of addresses > 16 bits.
[Figure: distribution of displacement sizes, Integer Average and FP Average]
25Immediate Addressing Mode
- 10 Programs from SPECInt92 and SPECfp92
26Immediate Addressing Mode
- 50% to 60% fit within 8 bits
- 75% to 80% fit within 16 bits
[Figure: immediate sizes for gcc, spice, and TeX]
27Addressing Mode Summary
- Important data addressing modes
- Displacement
- Immediate
- Register Indirect
- Displacement size should be 12 to 16 bits.
- Immediate size should be 8 to 16 bits.
28Instruction Operations
- Arithmetic and Logical
- add, subtract, and, or, etc.
- Data transfer
- Load, Store, etc.
- Control
- Jump, branch, call, return, trap, etc.
- Synchronization
- Test & Set.
- String
- string move, compare, search.
29Top-9 x86 Instructions
- Simple instructions dominate instruction frequency.
30Methods of Testing Condition
- Condition code: Status bits are set by ALU operations.
- add r1, r2, r3 and bz label
- Extra status bits
- Condition register
- cmp r1, r2, r3 and bgt r1, label
- Simple, but uses up a register
- Compare and branch
- bgt r1, r2, label
- One instruction
- Too much work per instruction
31Conditional Branch Distance
- Short displacement fields are often sufficient for branches
[Figure: distribution of conditional branch distances, Integer Average and FP Average]
32Conditional Branch Addressing
- PC-relative, since most branch targets are near the current PC address
- At least 8 bits.
- Compare Equal/Not Equal most important for integer programs.
33Data Types and Usage
- Byte, half word (16 bits), word (32 bits), double word (64 bits).
- Arithmetic
- Decimal: 4 bits per digit.
- Integers: 2's complement
- Floating-point: IEEE standard, with single, double, and extended precision.
34Instruction Format
- Fixed
- Operation, address specifier 1, address specifier 2, address specifier 3.
- MIPS, SPARC, Power PC.
- Variable
- Operation & no. of operands, address specifier 1, ..., address specifier n.
- VAX
- Hybrid
- Intel x86
- Operation, address specifier, address field.
- Operation, address specifier 1, address specifier 2, address field.
- Operation, address field, address specifier 1, address specifier 2.
- Summary
- If code size is most important, use variable format.
- If performance is most important, use fixed format.
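As an illustration of a fixed format, the MIPS R-type layout packs every register-register ALU instruction into one 32-bit word: opcode (6 bits, zero for R-type), rs (5), rt (5), rd (5), shamt (5), funct (6). The sketch below encodes that layout; the function names `mips_encode_rtype` and `mips_rd` are hypothetical helpers chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a MIPS R-type instruction (opcode field = 0) into a 32-bit word. */
uint32_t mips_encode_rtype(unsigned rs, unsigned rt, unsigned rd,
                           unsigned shamt, unsigned funct)
{
    assert(rs < 32 && rt < 32 && rd < 32 && shamt < 32 && funct < 64);
    return ((uint32_t)rs << 21) | ((uint32_t)rt << 16) |
           ((uint32_t)rd << 11) | ((uint32_t)shamt << 6) | (uint32_t)funct;
}

/* Because the format is fixed, a field is always found at the same bit position. */
unsigned mips_rd(uint32_t word)
{
    return (word >> 11) & 0x1f;
}
```

For example, `add $t0, $t1, $t2` (rd = 8, rs = 9, rt = 10, funct = 0x20) encodes as 0x012A4020. Decoding needs only shifts and masks, which is the performance argument for fixed formats; the cost is that unused fields such as shamt still occupy bits.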
35Summary ISA
- Use general purpose registers with a load-store architecture.
- Support these addressing modes: displacement, immediate, register indirect.
- Support these simple instructions: load, store, add, subtract, move register, shift, compare equal, compare not equal, branch, jump, call, return.
- Support these data sizes: 8-, 16-, 32-bit integer, IEEE FP standard.
- Provide at least 16 general purpose registers plus separate FP registers, and aim for a minimal instruction set.
36Sparc Processors
- Reduced Instruction Set Computer (RISC)
- Simple instructions with regular formats
- Key Idea: make the common case fast!
- infrequent operations can be synthesized using multiple instructions
- Assumes compiler will do optimizations
- e.g., scalar optimization, register allocation, scheduling, etc.
- ISA designed for compilers, not assembly language programmers
- A 2nd Generation RISC Instruction Set Architecture
- Designed for superscalar processors (i.e. >1 inst per cycle)
- avoids some of the pitfalls of earlier RISC ISAs (e.g., delay slots)
- Reference books
- Sparc Architecture, Assembly Language Programming, and C by Richard P. Paul
- The Sparc Architecture Manual by David L. Weaver/Tom Germond
37Translation Process
38Abstract Machines
C control: 1) loops 2) conditionals 3) goto 4) Proc. call 5) Proc. return
ASM data: 1) byte 2) 4-byte word 3) 8-byte word 4) contiguous word allocation 5) address of initial byte
ASM control: 3) branch/jump 4) jump & link
39Basic Data Types
- Integral
- Stored & operated on in general registers
- Signed vs. unsigned depends on instructions used
Sparc        Bytes  C
byte         1      [unsigned] char
half word    2      [unsigned] short
word         4      [unsigned] int
- Floating Point
- Stored & operated on in floating point registers
- Special instructions for four different formats (only 2 we care about)
UltraSparc   Bytes  C
S_floating   4      float
T_floating   8      double
40Sparc Register Convention
- General Purpose Registers: 32 total (32 or 64 bits each), store integers and pointers
- Usage Conventions: Established as part of architecture
- Used by all compilers, programs, and libraries
- Assures object code compatibility
Register    Usage
%g0         Always zero
%g1-%g7     Global data
%o0-%o5     Local data or arguments to called routine
%o6 (%sp)   Stack pointer
%o7         Call address
%l0-%l7     Local data
%i0-%i5     Integer arguments
%i6 (%fp)   Frame pointer
%i7         Return address (the caller's call address)
41Floating Point Unit
- Implemented as Separate Unit
- Hardware to add, multiply, and divide
- Floating point data registers
- Various control & status registers
- Floating Point Formats
- S_Floating (C float) 32 bits
- T_Floating (C double) 64 bits
- Floating Point Data Registers
- 32 registers, each 4 bytes
- Labeled f0 to f31
42Instruction Formats
43Program Representations
C Code
int test1(int x, int y)
{
  return (x+x+x) - (y+y+y);
}
Compiled to Assembly
  .align 4
  .global test1
  .type test1,#function
  .proc 04
test1:
  sll %o0,1,%g2       ! g2 = o0*2
  add %g2,%o0,%g2     ! g2 = o0+g2
  sll %o1,1,%o0       ! o0 = o1*2
  add %o0,%o1,%o0     ! o0 = o0+o1
  retl                ! return
  sub %g2,%o0,%o0     ! o0 = g2-o0 (place result in o0)
Obtain with command: gcc -O -S code.c
Produces file code.s
44Prog. Representation (Cont.)
Object
0x0 <test1>: 0x852a2001 0x84008008 0x912a6001 0x90020009 0x81c3e008 0x90208008
Disassembled
0x0 <test1>:      sll %o0, 1, %g2
0x4 <test1+4>:    add %g2, %o0, %g2
0x8 <test1+8>:    sll %o1, 1, %o0
0xc <test1+12>:   add %o0, %o1, %o0
0x10 <test1+16>:  retl
0x14 <test1+20>:  sub %g2, %o0, %o0
- Run gdb on object code
- x/6 0x0
- Print 6 words in hexadecimal starting at address 0x0
- disassemble test1
- Print disassembled version of procedure
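The object words above can be decoded by hand using the SPARC format-3 layout: op (bits 31-30), rd (29-25), op3 (24-19), rs1 (18-14), i (bit 13), and then either a 13-bit signed immediate or rs2 (bits 4-0). The following is a sketch of such a decoder; the `SparcFormat3` struct and `sparc_decode_format3` function are hypothetical names for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned op, rd, op3, rs1, i, rs2;
    int simm13;              /* sign-extended; meaningful only when i == 1 */
} SparcFormat3;

/* Split a format-3 instruction word (arithmetic or load/store) into fields. */
SparcFormat3 sparc_decode_format3(uint32_t word)
{
    SparcFormat3 f;
    f.op  = (word >> 30) & 0x3;
    assert(f.op >= 2);       /* format 3 uses op = 2 or op = 3 */
    f.rd  = (word >> 25) & 0x1f;
    f.op3 = (word >> 19) & 0x3f;
    f.rs1 = (word >> 14) & 0x1f;
    f.i   = (word >> 13) & 0x1;
    f.rs2 = word & 0x1f;     /* meaningful only when i == 0 */
    f.simm13 = (int)(word & 0x1fff);
    if (f.simm13 & 0x1000)
        f.simm13 -= 0x2000;  /* sign-extend the 13-bit immediate */
    return f;
}
```

Decoding the first word, 0x852a2001, gives op3 = 0x25 (sll), rs1 = 8 (%o0), i = 1, simm13 = 1, rd = 2 (%g2), which matches the disassembly `sll %o0, 1, %g2`. Register numbers 0-7 are the %g registers and 8-15 the %o registers.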
45Alternate Disassembly
- Sparc program dis
- /usr/ccs/bin/dis file.o
- Prints disassembled version of object code file
- Code not yet linked
- Addresses of procedures and global data not yet resolved
 0:  85 2a 20 01   sll %o0, 1, %g2
 4:  84 00 80 08   add %g2, %o0, %g2
 8:  91 2a 60 01   sll %o1, 1, %o0
 c:  90 02 00 09   add %o0, %o1, %o0
10:  81 c3 e0 08   jmp %o7 + 8
14:  90 20 80 08   sub %g2, %o0, %o0
46Pointer Examples
int iaddp(int *xp, int *yp)
{
  int x = *xp;
  int y = *yp;
  return x + y;
}
Annotated Assembly
iaddp:
  ld [%o0],%g2        ! g2 = *xp
  ld [%o1],%o0        ! o0 = *yp
  retl
  add %g2,%o0,%o0     ! o0 = x+y
void incr(int *sum, int val)
{
  int old = *sum;
  int new = old+val;
  *sum = new;
}
Annotated Assembly
incr:
  ld [%o0],%g2        ! g2 = *sum
  add %g2,%o1,%g2     ! g2 = old+val
  retl
  st %g2,[%o0]        ! store g2 to *sum
47Array Indexing
long int arefl(long int a[], long int i)
{
  return a[i];
}
Annotated Assembly
arefl:
  sll %o1,2,%o1       ! o1 = i*4
  retl
  ld [%o0+%o1],%o0    ! load a[i] to o0
long int garray[10];
long int gref(long int i)
{
  return garray[i];
}
Annotated Assembly
  .common garray,40,4
gref:
  sethi %hi(garray),%g2
  or %g2,%lo(garray),%g2
  sll %o0,2,%o0       ! o0 = i*4
  retl
  ld [%o0+%g2],%o0    ! load garray[i] to o0
48Structures & Pointers
struct rec {
  int i;
  int a[3];
  int *p;
};
void set_i(struct rec *r, int val)
{
  r->i = val;
}
Annotated Assembly
set_i:
  retl
  st %o1,[%o0]        ! r->i = val
int find_a(struct rec *r, int idx)
{
  return r->a[idx];
}
Annotated Assembly
find_a:
  sll %o1,2,%o1       ! o1 = idx*4
  add %o1,%o0,%o1     ! o1 += r
  retl
  ld [%o1+4],%o0      ! o0 = r->a[idx]
void set_p(struct rec *r, int *ptr)
{
  r->p = ptr;
}
Annotated Assembly
set_p:
  retl
  st %o1,[%o0+16]     ! r->p = ptr
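The constants 4 and 16 in the assembly are field offsets chosen by the compiler. The sketch below recomputes the address of `r->a[idx]` by hand with `offsetof`; the helper `find_a_addr` is a hypothetical name, and the struct is redeclared so the block stands alone. It assumes 4-byte ints, as on the 32-bit SPARC target in these slides.

```c
#include <assert.h>
#include <stddef.h>

struct rec {
    int i;
    int a[3];
    int *p;
};

/* Address arithmetic the compiler performs for r->a[idx]:
   base + offset of field a + idx * element size. */
int *find_a_addr(struct rec *r, int idx)
{
    assert(r != NULL && idx >= 0 && idx < 3);
    return (int *)((char *)r + offsetof(struct rec, a)
                             + (size_t)idx * sizeof(int));
}
```

Here `offsetof(struct rec, a)` corresponds to the `+4` in `ld [%o1+4],%o0`, and `idx * sizeof(int)` to the `sll %o1,2,%o1`.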
49Branches
- Unconditional Branches
- ba label
- Conditional Branches
- cmp Ra, Rb
- bCond label
- Result of Ra relative to Rb is recorded in the flags: Z (is zero), N (is negative), V (is too large, overflow)
- Cond: branch condition, relative to zero
- be: Equal, Z = 1
- bne: Not Equal, Z = 0
- bl: Less Than, (N xor V) = 1
- ble: Less Than or Equal, (Z or (N xor V)) = 1
- bg: Greater Than, (Z or (N xor V)) = 0
- bge: Greater Than or Equal, (N xor V) = 0
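The flag formulas above can be checked against C's own signed comparisons. This is a sketch only: the `Icc` struct and the `sparc_cmp`, `cond_*`, and `check_signed_conditions` names are hypothetical, and `sparc_cmp` models just the N, Z, and V flags of a 32-bit subtract.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int n, z, v; } Icc;

/* Model of cmp Ra, Rb: compute Ra - Rb and record the flags. */
Icc sparc_cmp(int32_t ra, int32_t rb)
{
    uint32_t a = (uint32_t)ra, b = (uint32_t)rb;
    uint32_t r = a - b;                       /* wraps modulo 2^32 */
    Icc f;
    f.n = (int)((r >> 31) & 1);
    f.z = (r == 0);
    /* Overflow: operands differ in sign and the result's sign differs from Ra's. */
    f.v = (int)((((a ^ b) & (a ^ r)) >> 31) & 1);
    return f;
}

int cond_be(Icc f)  { return f.z; }
int cond_bl(Icc f)  { return f.n ^ f.v; }
int cond_ble(Icc f) { return f.z | (f.n ^ f.v); }
int cond_bg(Icc f)  { return !(f.z | (f.n ^ f.v)); }
int cond_bge(Icc f) { return !(f.n ^ f.v); }

/* Sanity check: each branch condition agrees with the C comparison. */
void check_signed_conditions(int32_t ra, int32_t rb)
{
    Icc f = sparc_cmp(ra, rb);
    assert(cond_be(f)  == (ra == rb));
    assert(cond_bl(f)  == (ra <  rb));
    assert(cond_ble(f) == (ra <= rb));
    assert(cond_bg(f)  == (ra >  rb));
    assert(cond_bge(f) == (ra >= rb));
}
```

The xor with V is what makes the comparison correct when the subtraction overflows: comparing the most negative int with 1 produces a positive-looking result (N = 0), but V = 1 flips it back to "less than".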
50Conditional Branches
C Code
int condbr(int x, int y)
{
  int v = 0;
  if (x > y)
    v = x+x+x+y;
  return v;
}
Annotated Assembly
condbr:
  mov %o0,%g2         ! g2 = x
  cmp %g2,%o1         ! compare x and y
  ble .LL2            ! branch if x<=y
  mov 0,%o0           ! v = 0
  sll %g2,1,%o0       ! o0 = x*2
  add %o0,%g2,%o0     ! o0 = x+x+x
  add %o0,%o1,%o0     ! o0 = x+x+x+y
.LL2:
  retl
  nop
51Do-While Loop Example
C Code
int fact(int x)
{
  int result = 1;
  do {
    result *= x--;
  } while (x > 1);
  return result;
}
Annotated Assembly
fact:
  mov 1,%g2           ! result = 1
  smul %g2,%o0,%g2    ! result *= x
.LL6:
  add %o0,-1,%o0      ! x--
  cmp %o0,1           ! if (x>1) then
  bg,a .LL6           ! continue looping
  smul %g2,%o0,%g2    ! result *= x
  retl                ! return
  mov %g2,%o0         ! copy result to o0
52While Loop Example
C Code
int ifact(int x)
{
  int result = 1;
  while (x > 1)
    result *= x--;
  return result;
}
Annotated Assembly
ifact:
  cmp %o0,1           ! if (x<=1) then
  ble .LL9            ! branch to LL9
  mov 1,%g2           ! result = 1
  smul %g2,%o0,%g2    ! result *= x
.LL12:
  add %o0,-1,%o0      ! x--
  cmp %o0,1           ! if x>1 then
  bg,a .LL12          ! continue looping
  smul %g2,%o0,%g2    ! result *= x
.LL9:
  retl                ! return result
  mov %g2,%o0         ! copy result to o0
53For Loops in C
for (init; test; update)
  body
direct translation
init;
while (test) {
  body
  update;
}
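The translation can be checked on a concrete loop. The two functions below are a hypothetical example (the names `sum_for` and `sum_while` are not from the slides): the second is the mechanical init / while(test) / body / update rewrite of the first.

```c
#include <assert.h>
#include <stddef.h>

/* for-loop form: init is i = 0, test is i < n, update is i++. */
int sum_for(const int a[], int n)
{
    assert(a != NULL && n >= 0);
    int total = 0;
    int i;
    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Direct translation: init; while (test) { body; update; } */
int sum_while(const int a[], int n)
{
    assert(a != NULL && n >= 0);
    int total = 0;
    int i;
    i = 0;                  /* init */
    while (i < n) {         /* test */
        total += a[i];      /* body */
        i++;                /* update */
    }
    return total;
}
```

Both forms run the test before the first iteration, so an empty array (n = 0) skips the body in either version. The equivalence does not hold for a body containing `continue`, which in a for loop still runs the update.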
54For Loop Example
/* Find max ele. in array */
int amax(int a[], int count)
{
  int i;
  int result = a[0];
  for (i = 1; i < count; i++)
    if (a[i] > result)
      result = a[i];
  return result;
}
Annotated Assembly
amax:
  mov %o0,%o2         ! copy a to o2
  mov 1,%g3           ! i = 1
  cmp %g3,%o1         ! if (i>=count),
  bge .LL15           ! branch to return
  ld [%o2],%o0        ! result = a[0]
  sll %g3,2,%g2       ! g2 = i*4
.LL20:
  ld [%o2+%g2],%g2    ! g2 = a[i]
  cmp %g2,%o0         ! if (a[i] <= res),
  bg,a .LL16          ! skip then part (delay slot annulled)
  mov %g2,%o0         ! result = a[i]
.LL16:
  add %g3,1,%g3       ! i++
  cmp %g3,%o1         ! if (i < count),
  bl .LL20            ! continue looping
  sll %g3,2,%g2       ! g2 = i*4
.LL15:
  retl                ! return result
  nop
55Jumps
- Characteristics
- transfer of control is unconditional
- target address is specified by a register, or constant
- Format
- jmpl address,rd
- rd receives the address of the jmpl instruction itself (the return point is that address + 8)
- synonyms for jmpl
- jmp address -> jmpl address, %g0
- ret -> jmpl %i7+8, %g0
- retl -> jmpl %o7+8, %g0
56Compiling Switch Statements
- Implementation Options
- Series of conditionals
- Good if few cases
- Slow if many
- Jump Table
- Lookup branch target
- Avoids conditionals
- Possible when cases are small integer constants
- GCC
- Picks one based on case structure
C Code
typedef enum {ADD, MULT, MINUS, DIV, MOD, BAD} op_type;
char unparse_symbol(op_type op)
{
  switch (op) {
  case ADD:   return '+';
  case MULT:  return '*';
  case MINUS: return '-';
  case DIV:   return '/';
  case MOD:   return '%';
  case BAD:   return '?';
  }
}
57Switch Statement Example
Enumerated Values: ADD 0, MULT 1, MINUS 2, DIV 3, MOD 4, BAD 5
Assembly Setup
unparse_symbol:
  cmp %o0,5             ! if op > 5 then
  bgu .LL1              ! branch to return
  sethi %hi(.LL9),%g2   !
  or %g2,%lo(.LL9),%g2  ! g2 = address of jtab[0]
  sll %o0,2,%g3         ! g3 = op*4
  ld [%g3+%g2],%g2      ! g2 = jtab[op]
  jmp %g2               ! jump to jtab code
  nop
58Jump Table
Table Contents
.LL9:
  .word .LL3
  .word .LL4
  .word .LL5
  .word .LL6
  .word .LL7
  .word .LL8
Targets & Completion
.LL3:
  b .LL1              ! return '+'
  mov 43,%o0
.LL4:
  b .LL1              ! return '*'
  mov 42,%o0
.LL5:
  b .LL1              ! return '-'
  mov 45,%o0
.LL6:
  b .LL1              ! return '/'
  mov 47,%o0
.LL7:
  b .LL1              ! return '%'
  mov 37,%o0
.LL8:
  mov 63,%o0          ! return '?'
.LL1:
  retl
  nop
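The same dispatch can be expressed in C with a table of function pointers, which makes the "indexed load, then indirect jump" structure explicit. This is a sketch, not what GCC generates from the switch: `unparse_symbol_jtab`, `jtab`, and the `ret_*` targets are hypothetical names, and returning '?' for an out-of-range value is an assumption made here for the case the slide's code leaves unspecified.

```c
#include <assert.h>

typedef enum { ADD, MULT, MINUS, DIV, MOD, BAD } op_type;
typedef char (*target_fn)(void);

/* One small target per case, like the .LL3 ... .LL8 blocks. */
static char ret_add(void)   { return '+'; }
static char ret_mult(void)  { return '*'; }
static char ret_minus(void) { return '-'; }
static char ret_div(void)   { return '/'; }
static char ret_mod(void)   { return '%'; }
static char ret_bad(void)   { return '?'; }

/* One entry per enumerated value, in the same order as the .word table. */
static const target_fn jtab[] = {
    ret_add, ret_mult, ret_minus, ret_div, ret_mod, ret_bad
};

char unparse_symbol_jtab(op_type op)
{
    assert(sizeof jtab / sizeof jtab[0] == BAD + 1);
    if ((unsigned)op > (unsigned)BAD)   /* same unsigned range check as cmp/bgu */
        return '?';                     /* assumption: out-of-range handling */
    return jtab[op]();                  /* indexed load + indirect jump */
}
```

The unsigned comparison covers both too-large and negative values with a single test, which is why the assembly uses `bgu` rather than a signed branch.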
59Procedure Calls & Returns
- Maintain the return address in a special register (%o7 or %i7)
- Procedure call
- call address: Save return addr in %o7, branch to address
- Procedure return
- ret: Jump to address in %i7+8
- retl: Jump to address in %o7+8 (leaf procedure)
C Code
int callee()
{
  return 5;
}
int caller()
{
  return callee();
}
Annotated Assembly
  ...
callee:
  retl                ! return to o7+8
  mov 5,%o0           ! copy 5 to o0
  ...
caller:
  ...
  call callee,0       ! save current address to o7
  nop
  ...
60Sparc Register windows
61Stack-Based Languages
- Languages that support recursion
- e.g., C, Pascal
- Stack Allocated in Frames
- state for procedure invocation
- return point, arguments, locals
- Code Example
yoo() { who(); }
who() { amI(); }
amI() { amI(); }
[Figure: stack (grows down) holding frames for the call chain yoo, who, amI, amI, amI]
62Register Saving Conventions
- When procedure yoo calls who
- yoo is the caller, who is the callee
- Caller Save Registers
- not guaranteed to be preserved across procedure calls
- can be immediately overwritten by a procedure without first saving
- useful for storing local temporary values within a procedure
- if yoo wants to preserve a caller-save register across a call to who
- save it on the stack before calling who
- restore after who returns
- Callee Save Registers
- must be preserved across procedure calls
- if who wants to use a callee-save register
- save current register value on stack upon procedure entry
- restore when returning
63Register Saving Examples
- Callee Save
- Callee must save / restore if overwriting
yoo:
  or %l0, 17, %l1
  call who
  ret
  restore
who:
  save %sp,-112,%sp   ! save local regs
  or %g1, 6, %l1      ! overwrite l1
  ret
  restore             ! restore local regs
- Caller Save
- Caller must save / restore if live across procedure call
yoo:
  or $31, 17, $1
  stq $1, 8($sp)      # save $1
  bsr $26, who
  ldq $1, 8($sp)      # restore $1
  addq $1, 1, $0
  ret $31, ($26)
who:
  or $31, 6, $1       # overwrite $1
  ret $31, ($26)
Sparc uses the callee-save approach
64Sparc Stack Frame
- Conventions
- Agreed upon by all program/compiler writers
- Allows linking between different compilers
- Enables symbolic debugging tools
- Run Time Stack
- Save context
- Registers (l and i)
- Storage for local variables
- Parameters to called functions
- Required to support recursion
65Stack Frame Requirements
- Procedure Categories
- Leaf procedures that do not use stack
- Do not call other procedures
- Can fit all temporaries in caller-save registers
- Leaf procedures that use stack
- Do not call other procedures
- Need stack for temporaries
- Non-leaf procedures
- Must use stack.
- Stack Frame Structure
- Must be at least 8-byte aligned
- pad the region for locals and temporaries as
needed
66Stack Frame Example
C Code
/* Recursive factorial */
int rfact(int x)
{
  if (x <= 1) return 1;
  return x * rfact(x-1);
}
Assembly
rfact:
  save %sp,-112,%sp   ! save regs to stack
  cmp %i0,1           ! if x<=1 then
  ble,a .LL2          ! branch to return
  mov 1,%i0           ! executed only when the branch is taken
  call rfact,0        ! call rfact
  add %i0,-1,%o0      ! x-1 as argument
  smul %i0,%o0,%i0    ! multiplication
.LL2:
  ret                 ! return
  restore             ! restore from stack
- Stack frame: 112 bytes
- Frame ptr @ %sp + 112
- Save registers %l and %i
- No floating pt. regs. used
[Figure: stack frame slots from %sp+0 up to %sp+112]
67Stack Frame Example 2
C Code
void show_facts(void)
{
  int i;
  int vals[10];
  vals[0] = 1;
  for (i = 1; i < 10; i++)
    vals[i] = vals[i-1] * i;
  for (i = 9; i >= 0; i--)
    printf("Fact(%d) = %d\n", i, vals[i]);
}
- Stack frame: 152 bytes
- Frame ptr @ %sp + 152
- Local storage for vals
- %fp-20 to %fp-56
[Figure: stack frame slots from %sp+0 up to %sp+152]
68Stack Frame Example 2 (Cont.)
show_facts:
  save %sp,-152,%sp
  mov 1,%o0              ! o0 = 1
  st %o0,[%fp-56]        ! vals[0] = 1
  mov %o0,%l0            ! l0 = 1
  add %fp,-56,%o2        ! o2 = fp-56
.LL8:
  sll %l0,2,%o0          ! o0 = l0<<2
  add %l0,-1,%o1         ! o1 = l0-1 (i-1)
  sll %o1,2,%o1          ! o1 *= 4
  ld [%o2+%o1],%o1       ! o1 = vals[i-1]
  smul %l0,%o1,%o1       ! o1 *= l0
  add %l0,1,%l0          ! i++
  cmp %l0,9              ! if i<10 then
  ble .LL8               ! loop
  st %o1,[%o2+%o0]       ! store to vals[i]
  mov 9,%l0              ! l0 = 9
  sethi %hi(.LLC0),%l2   ! set l2 for the print format string
  add %fp,-56,%l1        ! l1 = fp-56
  sll %l0,2,%o2          ! o2 = l0*4
.LL15:
  or %l2,%lo(.LLC0),%o0  ! print address
  mov %l0,%o1            ! o1 = i
  call printf,0          ! call printf
  ld [%l1+%o2],%o2       ! o2 = vals[i]
  addcc %l0,-1,%l0       ! i--
  bpos .LL15             ! loop
  sll %l0,2,%o2          ! o2 = l0*4
  ret
  restore
69Stack Addrs as Procedure Args
C Code
void rfact2(int x, int *result)
{
  if (x <= 1)
    *result = 1;
  else {
    int val;
    rfact2(x-1, &val);
    *result = x * val;
  }
  return;
}
Assembly
rfact2:
  save %sp,-120,%sp   ! sp -= 120
  cmp %i0,1           ! if x<=1
  ble .LL19           ! jump to LL19
  mov 1,%o0           ! o0 = 1
  add %i0,-1,%o0      ! o0 = x-1
  call rfact2,0       ! call rfact2
  add %fp,-20,%o1     ! calculate &val
  mov %i0,%o0         ! o0 = x
  ld [%fp-20],%o1     ! load from val
  smul %o0,%o1,%o0    ! multiplication: x * val
.LL19:
  st %o0,[%i1]        ! store to *result
  ret                 ! return
  restore
- Stack frame: 120 bytes
- val stored at %fp-20
- %fp-20 passed as second argument (%o1) to recursive call of rfact2
70Floating Point Code Example
- Compute Inner Product of Two Vectors
- Single precision
float inner_prodF(float x[], float y[], int n)
{
  int i;
  float result = 0.0;
  for (i = 0; i < n; i++)
    result += x[i] * y[i];
  return result;
}
in_ProdF:
  sethi %hi(.LLC0),%o3
  or %o3,%lo(.LLC0),%o3
  mov 0,%g3           ! i = 0
  cmp %g3,%o2         ! if i>=n then
  bge .LL3            ! jump to return
  ld [%o3],%f0        ! f0 = 0.0
.LL5:
  sll %g3,2,%g2       ! g2 = i*4
  ld [%o0+%g2],%f2    ! f2 = x[i]
  ld [%o1+%g2],%f3    ! f3 = y[i]
  fmuls %f2,%f3,%f2   ! f2 = x[i]*y[i]
  add %g3,1,%g3       ! i++
  cmp %g3,%o2         ! if i<n then
  bl .LL5             ! loop
  fadds %f0,%f2,%f0   ! result += x[i]*y[i]
.LL3:
  retl                ! return
  nop
71Double Precision
double inner_prodD(double x[], double y[], int n)
{
  int i;
  double result = 0.0;
  for (i = 0; i < n; i++)
    result += x[i] * y[i];
  return result;
}
in_ProdD:
  sethi %hi(.LLC1),%o3
  ldd [%o3+%lo(.LLC1)],%f0  ! result = 0.0
  mov 0,%g3           ! i = 0
  cmp %g3,%o2         ! if i>=n then
  bge .LL9            ! branch to LL9
  nop
.LL11:
  sll %g3,3,%g2       ! g2 = i*8
  ldd [%o0+%g2],%f2   ! f2 = x[i]
  ldd [%o1+%g2],%f4   ! f4 = y[i]
  fmuld %f2,%f4,%f2   ! f2 = x[i]*y[i]
  add %g3,1,%g3       ! i++
  cmp %g3,%o2         ! if i<n then
  bl .LL11            ! looping
  faddd %f0,%f2,%f0   ! result += x[i]*y[i]
.LL9:
  retl                ! return
  nop
72Numeric Format Conversion
- Between Floating Point and Integer Formats
- Special conversion instructions: fdtos, fstod, fstoi, fitos, ...
- Convert source operand in one format to destination in other
- Both source & destination must be FP registers
- Transfer to & from GP registers via stack store/load
C Code
float double2float(double d)
{
  return (float) d;
}
Conversion Code (convert double to float)
  fdtos %f2,%f0
C Code
float int2float(int i)
{
  return (float) i;
}
Conversion Code (pass through stack and convert)
  st %o0,[%sp+100]
  ld [%sp+100],%f2
  fitos %f2,%f0
73Structure Allocation
- Principles
- Allocate space for structure elements contiguously
- Access fields by offsets from initial location
- Offsets determined by compiler
typedef struct {
  char c;
  int i[2];
  double d;
} struct_ele, *struct_ptr;
Layout: c at offset 0, i[0] at 4, i[1] at 8, d at 16, total size 24
74Alignment
- Requirements
- Primitive data type requires K bytes
- Address must be multiple of K
- Specific Cases
- Word data address must be multiple of 4
- Double word data address must be multiple of 8
- Reason
- Memory accessed by (aligned) words
- Compiler
- Inserts gaps within structure to ensure correct
alignment of fields
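The gap-insertion rule can be sketched as a small layout calculator. This is an illustration under a simplifying assumption, not a description of any particular compiler: each field is a primitive whose required alignment K equals its size, and the whole structure is padded to its largest alignment. `align_up` and `layout_struct` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of k. */
size_t align_up(size_t offset, size_t k)
{
    assert(k > 0);
    return (offset + k - 1) / k * k;
}

/* Place primitive fields of the given sizes in order.
   Writes each field's offset and returns the padded total size. */
size_t layout_struct(const size_t sizes[], size_t n, size_t offsets[])
{
    size_t offset = 0;
    size_t max_align = 1;
    for (size_t f = 0; f < n; f++) {
        offset = align_up(offset, sizes[f]);   /* insert gap if needed */
        offsets[f] = offset;
        offset += sizes[f];
        if (sizes[f] > max_align)
            max_align = sizes[f];
    }
    return align_up(offset, max_align);        /* pad the end for arrays */
}
```

Treating a structure of char, int, int, double as field sizes {1, 4, 4, 8} gives offsets 0, 4, 8, 16 and a total size of 24: three bytes of padding follow the char and four more precede the double. Reordering to {8, 4, 4, 1} still yields 24, because seven bytes of trailing padding keep array elements 8-byte aligned.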
75Structure Access
C Code and Result Computation
int *struct_i(struct_ptr p)
{
  return p->i;
}
  .align 4
  add %o0,4,%o0       ! address of 4th byte
int struct_i1(struct_ptr p)
{
  return p->i[1];
}
  ld [%o0+8],%o0      ! word at 8th byte
double struct_d(struct_ptr p)
{
  return p->d;
}
  ldd [%o0+16],%f0    ! double at 16th byte
char struct_c(struct_ptr p)
{
  return p->c;
}
  ldsb [%o0],%o0      ! byte at 0th byte
76Arrays vs. Pointers
- Recall
- Can access stored data either with pointer or array notation
- Differ in how storage is allocated
- Array declaration allocates space for array elements
- Pointer declaration allocates space for pointer only
C Code for Allocation
typedef struct {
  char c;
  int *i;
  double d;
} pstruct_ele, *pstruct_ptr;
pstruct_ptr pstruct_alloc(void)
{
  pstruct_ptr result = (pstruct_ptr) malloc(sizeof(pstruct_ele));
  result->i = (int *) calloc(2, sizeof(int));
  return result;
}
Layout: c at offset 0, i at 4, d at 8, total size 16
77Accessing Through Pointer
C Code and Result Computation
int *pstruct_i(pstruct_ptr p)
{
  return p->i;
}
  ld [%o0+4],%o0      ! word at 4th byte
int pstruct_i1(pstruct_ptr p)
{
  return p->i[1];
}
  ld [%o0+4],%g2      ! i = word at 4th byte from p
  ld [%g2+4],%o0      ! retrieve i[1]
78Arrays of Structures
- Principles
- Allocated by repeating allocation for array type
- Accessed by computing address of element
- Attempt to optimize
- Minimize use of multiplication
- Exploit values determined at compile time
C Code
/* Index into array of struct_ele's */
struct_ptr a_index(struct_ele a[], int idx)
{
  return &a[idx];
}
Address Computation
  sll %o1,1,%g2       ! g2 = idx*2
  add %g2,%o1,%g2     ! g2 = idx*3
  sll %g2,3,%g2       ! g2 = idx*24
  add %o0,%g2,%o0     ! a + idx*24
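The shift-and-add sequence works because 24 = 3 * 8 and 3 = 2 + 1. A short C rendering of the same three steps follows; `scale_by_24` is a hypothetical name, and the overflow guard is an addition for this sketch rather than something the assembly performs.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply by 24 (the element size) without a multiply instruction. */
size_t scale_by_24(size_t idx)
{
    assert(idx <= (size_t)-1 / 24);   /* guard against wraparound */
    size_t t = idx << 1;              /* idx*2  (sll by 1)  */
    t += idx;                         /* idx*3  (add)       */
    t <<= 3;                          /* idx*24 (sll by 3)  */
    return t;
}
```

Because the element size is known at compile time, the compiler can choose this decomposition statically; a size that is not a convenient sum of powers of two might instead need a real multiply.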
79Aligning Array Elements
- Requirement
- Must make sure alignment requirements are met when allocating an array of structures
- May require inserting unused space at end of structure
typedef struct {
  double d;
  int i[2];
  char c;
} rev_ele, *rev_ptr;
rev_ele a[2];
- a must be multiple of 8
- Alignment OK
80Nested Allocations
- Principles
- Can nest declarations of arrays and structures
- Compiler keeps track of allocation and access requirements
typedef struct {
  int x;
  int y;
} point_ele, *point_ptr;
typedef struct {
  point_ele ll;
  point_ele ur;
} rect_ele, *rect_ptr;
81Nested Allocation (cont.)
C Code
int area(rect_ptr r)
{
  int width = r->ur.x - r->ll.x;
  int height = r->ur.y - r->ll.y;
  return width * height;
}
Computation
  ld [%i0+8],%o0      ! o0 = ur.x
  ld [%i0],%o1        ! o1 = ll.x
  sub %o0,%o1,%o0     ! o0 = width
  ld [%i0+12],%o2     ! o2 = ur.y
  ld [%i0+4],%o1      ! o1 = ll.y
  sub %o2,%o1,%o1     ! o1 = height
  smul %o0,%o1,%o0    ! o0 = area
82Union Allocation
- Principles
- Overlay union elements
- Allocate according to largest element
- Programmer responsible for collision avoidance
typedef union {
  char c;
  int i[2];
  double d;
} union_ele, *union_ptr;
Layout: c, i[0], and d all start at offset 0; i[1] at 4; total size 8
83Example Use of Union
typedef enum {CHAR, INT, DOUBLE} utype;
typedef struct {
  utype type;
  union_ele e;
} store_ele, *store_ptr;
- Structure can hold 3 kinds of data
- Never use 2 forms simultaneously
- Identify particular kind with flag type
void print_store(store_ptr p)
{
  switch (p->type) {
  case CHAR:
    printf("Char = %c\n", p->e.c);
    break;
  case INT:
    printf("Int[0] = %d, Int[1] = %d\n", p->e.i[0], p->e.i[1]);
    break;
  case DOUBLE:
    printf("Double = %g\n", p->e.d);
  }
}
84Using Union to Access Bit Patterns
typedef union {
  float f;
  unsigned u;
} bit_float_t;
float bit2float(unsigned u)
{
  bit_float_t arg;
  arg.u = u;
  return arg.f;
}
void show_parts(float f)
{
  int sign, exp, significand;
  bit_float_t arg;
  arg.f = f;
  /* Get bit 31 */
  sign = (arg.u >> 31) & 0x1;
  /* Get bits 30 .. 23 */
  exp = (arg.u >> 23) & 0xFF;
  /* Get bits 22 .. 0 */
  significand = arg.u & 0x7FFFFF;
}
- Get direct access to bit representation of float
- bit2float generates float with given bit pattern
- NOT the same as (float) u
- show_parts extracts different components of float
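The same field extraction can be written so that it returns its results, using `memcpy` instead of a union to reinterpret the bits. This is an alternative sketch, not the slide's code: `float_bits`, `split_float`, and `FloatParts` are hypothetical names, and a 32-bit IEEE single-precision `float` is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned sign;          /* bit 31 */
    unsigned exp;           /* bits 30 .. 23, biased by 127 */
    unsigned significand;   /* bits 22 .. 0 */
} FloatParts;

/* Copy the bit pattern of f into an integer without converting its value. */
uint32_t float_bits(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);
    memcpy(&u, &f, sizeof u);
    return u;
}

FloatParts split_float(float f)
{
    uint32_t u = float_bits(f);
    FloatParts p;
    p.sign = (u >> 31) & 0x1;
    p.exp = (u >> 23) & 0xFF;
    p.significand = u & 0x7FFFFF;
    return p;
}
```

For 1.0f the bit pattern is 0x3F800000: sign 0, biased exponent 127, significand 0. By contrast, the cast `(unsigned) 1.0f` yields 1, which is the value conversion that the slide warns is not the same thing.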
85Sparc Memory Layout
- Segments
- Data
- Static space for global variables
- Allocation determined at compile time
- Text
- Stores machine code for program
- Stack
- Implements runtime stack
- Access via sp
- Reserved
- Used by operating system
- I/O devices, process info, etc.
86RISC Principles Summary
- Simple Regular Instructions
- Small number of uniform formats
- Each operation does just one thing
- Memory access, computation, conditional, etc.
- Encourage Register Usage over Memory
- Operate on register data
- Load/store architecture
- Procedure linkage
- Rely on Optimizing Compiler
- Data allocation & referencing
- Register allocation
- Improve efficiency of user's code