Title: CME212 Introduction to Large Scale Computing in Engineering
1Representation II
- Representing Composite Types, Disassembly,
Function Calls, Stack, Heap
2Instructions
- The back end of the compiler generates the machine code from the intermediate representation
- This code generation allows for many optimizations
- Usually, the compiler is conservative
- Example: strict IEEE FP compliance, non-associative algebras (cannot reorder)
3CISC
- Complex Instruction Set Computers
- Specialized machine instructions for things like managing linked lists, evaluating polynomials
- Created a large set of instructions
- Decoding was complicated
- Hard for compiler to utilize all instructions
- Useful for hand coded assembly
- x86 (IA-32) is usually thought of as a CISC
instruction set
4RISC
- Reduced Instruction Set Computer
- Uses only a small set of atomic operations that can be combined to form more complex ones
- Easy to decode
- Developed by Patterson (Berkeley) and Hennessy (Stanford) in the 80s
- SPARC, MIPS, PowerPC, Alpha
- The compiler had to do more work
- Very cumbersome to do RISC assembly programming
by hand
5Today
- RISC and CISC have converged into something in between
- IA-32 instructions are CISC but are decomposed into RISC-like micro-instructions internally
- Many of the ideas of RISC have survived
- Less debate today
- Most ISAs today are 64-bit
6Assembler Code
- Human-readable machine code, called assembly code, can be produced by the compiler
- gcc -S myfile.c
- You can also reverse engineer assembly code from machine code using a disassembler
- objdump -d file
- where file can be an executable or object file
7Compiling Into Assembly
int sum(int x, int y)
{
    int t = x + y;
    return t;
}
Generated Assembly
_sum:
    pushl %ebp
    movl  %esp,%ebp
    movl  12(%ebp),%eax
    addl  8(%ebp),%eax
    movl  %ebp,%esp
    popl  %ebp
    ret
Obtain with the command gcc -O -S code.c, which produces the file code.s
8Basic Operation
- Modern computers are of LOAD-STORE type
- Other types are accumulators, stack machines
- These machines store operands in a register file or simply registers
- Small scratch memory very close to the arithmetic units
- Data must be loaded from memory into a register, operated upon, and then stored back
9LOAD-STORE
- A regular calculator works like an accumulator
- Early computers worked this way too
- You have one register on which your arithmetic operations can work
- In a LOAD-STORE machine you have several accumulators or memory locations where you can store temporaries
10Registers
- Registers are a scarce resource
- Store words
- Special registers for floating-point
- The compiler tries to maximize the usage of the registers
- Called register allocation
- If you run out of registers, you must temporarily store results back in memory and then retrieve them again
- Register spill
- Degrades performance
11Registers in C
- There are two keywords that influence register allocation from C
- The register keyword asks the compiler to keep a variable in a register (a hint, not a guarantee)
- Used for heavily accessed variables
- Today, most compilers can figure this out themselves
- You cannot take the address of a variable declared register
- The volatile keyword forces every read and write of the variable to go to memory
- Used in low-level and parallel programming
12Example
register int a = 23;
volatile int b = 43;
13Machine Instructions (x86,IA32)
14Low-level control flow
- High-level constructs are mapped onto conditional and unconditional jumps (branches)
- Unconditional jump in C: the goto statement
- Jump targets (a new PC location) can be considered as labels
- You can define labels in C too
15goto Example
goto Version
    if (x < y)
        goto less;
    val = x - y;
    goto done;
less:
    /* y is larger than x */
    val = y - x;
done:
    use(val);
if-else Version
    if (x < y)
        val = y - x;
    else
        val = x - y;
    . . .
    use(val);
16Conditional Codes
- There are special registers which hold condition codes
- These registers are set by the test and compare instructions
- Control flow instructions use the condition codes to see the result of a comparison
- If (code), set a register (setl, ...)
- If (code), jump to a target (jmpl, ...)
17goto Example, Again
    if (x < y)
        goto less;
    val = x - y;
    goto done;
less:
    val = y - x;
done:

    load x into reg0
    load y into reg1
    compare reg0, reg1
    jump to less if reg1 is larger than reg0
    reg2 = subtract(reg0, reg1)
    jump to done
less:
    reg2 = subtract(reg1, reg0)
done:
18Loops
- while, do and for loops are also transformed into conditional and unconditional jumps
- A for loop contains a loop header, which controls the execution of the loop, and the loop body, which does the actual work
for (Init; Test; Update) Body
19for Loops
for Version
    for (Init; Test; Update)
        Body
while Version
    Init;
    while (Test) {
        Body
        Update;
    }
do-while Version
    Init;
    if (!Test)
        goto done;
    do {
        Body
        Update;
    } while (Test);
done:
goto Version
    Init;
    if (!Test)
        goto done;
loop:
    Body
    Update;
    if (Test)
        goto loop;
done:
20Calling Functions
- A function typically has input and output arguments and local variables
- As we use jumps we also need the return address to be able to get back after the call
- Both of these problems can be solved using a stack
21Stacks
- Works like a stack of papers
- Two operations
- Push (place something on the top)
- Pop (remove something from the top)
- In algorithm language stacks are LIFO,
last-in-first-out
22Stacks and Calls
- Most machines push the return address onto the stack before doing the call
- After this the PC is set to the address of the subroutine
- At the end of the subroutine, the return address can be popped from the stack
23Arguments
- Subroutines typically need many registers to be able to work efficiently
- Before a call the registers are spilled to memory, called a save
- Typically these are pushed onto the stack
- Next, we push the return address
- And finally the arguments onto the stack
- After we get back from the subroutine, we can pop
the saved registers (called a restore) from the
stack
24Stack Example
    save
    push address to Return_label
    push arguments
    call my_routine
Return_label:
    restore
Stack (top to bottom)
    Arguments
    Return Address
    Saved registers
my_routine:
    pop arguments
    do stuff
    pop return address
    jmp Return_label
25More on Calls
- Passing arguments using the stack is slow
- We would like to use registers
- Complicates register allocation
- Some processors have special input and output registers
- Passes arguments through these
- Limited amount
- If the number of arguments is large, the stack is
used
26ABI
- The scheme for subroutine calls is usually defined in the Application Binary Interface (ABI)
- Lets code generated by different compilers work together
- Linux Standard Base
- http://www.linux-foundation.org/en/LSB
- SPARC
- http://www.sparc.org
27Stacks and Recursion
- Stacks are used to implement recursion in an elegant way
- Fortran traditionally did not use a stack. To do recursion in Fortran you must declare the function as recursive
- Intermediate values are deferred by pushing the return value onto the stack
- Stack grows for each recursive call
- You can get a stack overflow error
28C and Stacks
- Automatic variables are typically stored on the stack
- When the function returns the arguments are popped
- They can however still be present in memory
- Implementations of stacks usually use a stack pointer to know where we are
- Old values might still be present in memory
29Stack size
- The address space sizes are controlled by the shell
- The maximum stack size is given by the command ulimit
- The shell also controls other things, such as the number of files, core files, maximum virtual memory
- ulimit -a
30Where are My Variables?
- C variables will be allocated at different locations depending on the scope and extent
- Global Variables
- Automatic Variables
- Memory from malloc(), calloc()
31
0xffffffff
    kernel virtual memory (code, data, heap, stack): memory invisible to user code
0xc0000000
    user stack (created at runtime): automatic variables
    memory mapped region for shared libraries
0x40000000
    run-time heap (managed by malloc): dynamically allocated data
    read/write segment (.data, .bss): uninitialized data, pointers, global variables
    read-only segment (.init, .text, .rodata): program code, read-only data
    (the read/write and read-only segments are loaded from the executable file)
0x08048000
    unused
0
32Memory Allocation
#include <stdio.h>
int main(void)
{
    int a, b, c, d;
    double e;
    char f[128];
    static int g;
    /* %p with a void * cast is the portable way to print an address */
    printf("Address of a is %p\n", (void *)&a);
    printf("Address of b is %p\n", (void *)&b);
    printf("Address of c is %p\n", (void *)&c);
    printf("Address of d is %p\n", (void *)&d);
    printf("Address of e is %p\n", (void *)&e);
    printf("Address of f is %p\n", (void *)f);
    printf("Address of g is %p\n", (void *)&g);
    return 0;
}
Sample output (from a 32-bit system)
Address of a is 0xffbff474
Address of b is 0xffbff470
Address of c is 0xffbff46c
Address of d is 0xffbff468
Address of e is 0xffbff460
Address of f is 0xffbff3e0
Address of g is 0x20b9c
33Array Example
typedef int zip_dig[5];
zip_dig cmu = { 1, 5, 2, 1, 3 };
zip_dig mit = { 0, 2, 1, 3, 9 };
zip_dig ucb = { 9, 4, 7, 2, 0 };
- Notes
- Declaration zip_dig cmu equivalent to int cmu[5]
- Example arrays were allocated in successive 20 byte blocks
- Not guaranteed to happen in general
34Array Accessing Example
- Computation
- Register reg0 contains starting address of array
- Register reg1 contains array index
- Desired digit at 4*reg1 + reg0
int get_digit(zip_dig z, int dig)
{
    return z[dig];
}
    # reg0 = z
    # reg1 = dig
    store 4*reg1 in reg2
    add reg2 to reg0                      # reg0 = 4*reg1 + reg0
    store value at address reg0 in reg3   # Mem[4*reg1 + reg0]
35Array Loop Example
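Original Version
int zd2int(zip_dig z)
{
    int i;
    int zi = 0;
    for (i = 0; i < 5; i++) {
        zi = 10 * zi + z[i];
    }
    return zi;
}
Transformed Version
int zd2int(zip_dig z)
{
    int zi = 0;
    int *zend = z + 4;
    do {
        zi = 10 * zi + *z;
        z++;
    } while (z <= zend);
    return zi;
}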
- Transformed Version
- As generated by GCC
- Eliminate loop variable i
- Convert array code to pointer code
- Express in do-while form
- No need to test at entrance
36Multidimensional arrays
- Memory is one-dimensional
- Multidimensional arrays need to be mapped onto a one-dimensional block of memory
- In C, we have two alternatives
- Nested arrays a[n*i+j]
- Static or dynamic allocation
- Multi-level arrays a[i][j]
- Static or dynamic allocation
37Nested Arrays
- Dimensions are stacked consecutively using an index mapping
- Consider a square two-dimensional array of size N
- Array(i,j) -> Array[j + i*N]
38Static Nested Array Example
#define PCOUNT 4
zip_dig pgh[PCOUNT] =
    {{1, 5, 2, 0, 6},
     {1, 5, 2, 1, 3},
     {1, 5, 2, 1, 7},
     {1, 5, 2, 2, 1}};
- Declaration zip_dig pgh[4] equivalent to int pgh[4][5]
- Variable pgh denotes array of 4 elements
- Allocated contiguously
- Each element is an array of 5 ints
- Allocated contiguously
- Row-Major ordering of all elements guaranteed
39Static Nested Array Element Access
- Array Elements
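- A[i][j] is element of type T
- Address A + (i * C + j) * K, where K is the size of T in bytes
int A[R][C];
Figure: row A[i] starts at A + i*C*4, the last row A[R-1] starts at A + (R-1)*C*4, and element A[i][j] is at A + (i*C + j)*4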
40Static Multi-Level Array Example
zip_dig cmu = { 1, 5, 2, 1, 3 };
zip_dig mit = { 0, 2, 1, 3, 9 };
zip_dig ucb = { 9, 4, 7, 2, 0 };
#define UCOUNT 3
int *univ[UCOUNT] = {mit, cmu, ucb};
- Variable univ denotes array of 3 elements
- Each element is a pointer
- 4 bytes
- Each pointer points to array of ints
41Element Access in Multi-Level Array
- Computation
- Element access Mem[Mem[univ+4*index]+4*dig]
- Must do two memory reads
- First get pointer to row array
- Then access element within array
int get_univ_digit(int index, int dig)
{
    return univ[index][dig];
}
    # reg0 = index
    # reg1 = dig
    store 4*reg0 to reg2                  # 4*index
    store addr of univ in reg3
    add reg2 to reg3                      # univ + 4*index
    load data at address reg3 into reg4   # Mem[univ+4*index]
    store 4*reg1 to reg2                  # 4*dig
    add reg2 to reg4                      # Mem[univ+4*index] + 4*dig
    load data at address reg4 into reg5   # Mem[Mem[univ+4*index]+4*dig]
42Static Array Element Accesses
- Similar C references
- Different address computation
- Nested Array
int get_pgh_digit(int index, int dig)
{
    return pgh[index][dig];
}
- Element at Mem[pgh+20*index+4*dig]
- Multi-Level Array
int get_univ_digit(int index, int dig)
{
    return univ[index][dig];
}
- Element at Mem[Mem[univ+4*index]+4*dig]
43Dynamic Nested Arrays in C
- Strength
- Can create matrix of arbitrary size
- Can choose row or column major order
- Programming
- Must do index computation explicitly
- Performance
- Accessing single element costly
- Must do multiplication by dimension
int get_element(int *a, int i, int j, int n)
{
    return a[i*n+j];
}
44Dynamic Multi-level Arrays in C
- Multi-level
- Pointer-to-pointer, bracket indexing [i][j]
- Same dual mem address calculations as for static multi-level arrays
- Can be packed (contiguous storage), bracket indexing
/* one allocation per row */
int **array1 = (int **)malloc(nrows * sizeof(int *));
for (i = 0; i < nrows; i++)
    array1[i] = (int *)malloc(ncolumns * sizeof(int));

/* packed: one contiguous block, row pointers point into it */
int **array2 = (int **)malloc(nrows * sizeof(int *));
array2[0] = (int *)malloc(nrows * ncolumns * sizeof(int));
for (i = 1; i < nrows; i++)
    array2[i] = array2[0] + i * ncolumns;
45Structs
- The individual components are laid out in memory in their declaration order
- There might still be gaps due to alignment of data, i.e. to place data on addresses that are a multiple of 2, 4 or 8
- Some ISAs require certain alignment of data to simplify the design
- C has support for bitfields, which are tiny members of a struct using only a few bits each
- Useful in low-level systems programming to pack data
46Bitfield example
struct {
    /* field 4 bits wide */
    unsigned field1 : 4;
    /*
     * unnamed 3 bit field
     * unnamed fields allow for padding
     */
    unsigned : 3;
    /*
     * one-bit field
     * can only be 0 or -1 in two's complement!
     */
    signed field2 : 1;
    /* align next field on a storage unit */
    unsigned : 0;
    unsigned field3 : 6;
} full_of_fields;
47Incomplete Array Type (c99)
struct s { int n; double d[]; };
struct s *p1, *p2;
size_t sz;
sz = sizeof(struct s); // sz == offsetof(struct s, d)
p1 = malloc(sz + 8 * sizeof (double));
p2 = malloc(sz + 5 * sizeof (double));
/* p1 behaves now as if it had been declared as
 *   struct { int n; double d[8]; } *p1;
 * p2 behaves now as if it had been declared as
 *   struct { int n; double d[5]; } *p2;
 */
48The Heap
- When you request memory using the standard C library functions, it will be placed in an area of the virtual address space called the heap
- The heap can grow quite large, but is ultimately limited by the word size of the CPU
- Parts of the address space are also reserved for the system and other parts of your program
- The stack usually grows downwards towards the heap, which means that they can meet
- Always check return values to see if you got any memory
49
0xffffffff
    kernel virtual memory (code, data, heap, stack): memory invisible to user code
0xc0000000
    user stack (created at runtime): automatic variables
    memory mapped region for shared libraries
0x40000000
    run-time heap (managed by malloc): dynamically allocated data
    read/write segment (.data, .bss): uninitialized data, pointers, global variables
    read-only segment (.init, .text, .rodata): program code, read-only data
    (the read/write and read-only segments are loaded from the executable file)
0x08048000
    unused
0
50The Inner Workings of the Heap
- The top of the heap is given by a kernel variable called brk
- To grow the heap you can call the UNIX function sbrk(2)
- The standard C library uses this function internally
- Memory on the heap can also be reused
- If memory has been freed you may not need to increase the heap
- This memory can be reused
- See Bryant/O'Hallaron 10.9-10.11
51The Heap Puzzle
- The standard C library keeps track of the chunks of memory you request
- Start address plus size in bytes
- The free() function marks chunks as reusable
- Programs that do a lot of mallocs and frees can fragment the heap
- You cannot move a chunk, as this would give a new start address, which means that all pointers storing this address need to be updated
- If memory is reused it must fit the new request
- If you do not match each malloc() with a free() you have created a memory leak
- Once a pointer has been overwritten there is no chance of calling free()
52Garbage Collection
- You can construct memory allocators that detect when chunks are available for reuse
- You do not call free()
- Java uses garbage collection
- There are such allocators for C too
- The Boehm-Demers-Weiser (BDW) GC
- http://www.linuxjournal.com/article/6679
- Garbage collection can degrade performance
- The GC is activated in cycles to sweep for free chunks
- However, it usually reduces or eliminates memory leaks
53Discussion Section
- Today's Location
- Terman 102-104 (elaine Linux cluster)
- Same time (4:15-5:30)
- Topics: bitwise operators, linking, command-line arguments