Title: Linking
1Linking
2Outline
- What is linking and why linking
- Compiler driver
- Static linking
- Symbols and symbol tables
- Symbol Resolution
- Relocation
- Executable Object Files
- Loading
- Dynamic Linking
- Suggested reading: 7.1 to 7.11
3Monolithic source file
- Problems
- efficiency: a small change requires complete recompilation
- modularity: hard to share common functions (e.g. printf)
4Separate Compilation
5What is a linker
- Linking is the process of
- collecting and combining various pieces of code and data into a single executable file
- Executable file
- Can be loaded (copied) into memory and executed.
6What is a linker
- Linking can be performed
- at compile time, when the source code is translated into machine code
- at load time, when the program is loaded into memory and executed by the loader
- at run time, by application programs.
7Why learn about linking
- Build large programs
- Avoid missing modules, especially libraries
- Avoid dangerous programming errors
- e.g., defining the same global variable in multiple modules
- Understand scoping rules
- extern and static vs. auto and register
9Example
- Two functions
- main() and swap()
- Three global variables
- buf and bufp0, which are initialized explicitly
- bufp1 implicitly initialized to 0
10Compiler Drivers
- Coordinates all steps in the translation and linking process
- Typically included with each compilation system (e.g., gcc)
- Invokes
- preprocessor (cpp)
- compiler (cc1)
- assembler (as)
- linker (ld)
- Passes command line args to appropriate phases
11Example
- unix> gcc -O2 -g -o p main.c swap.c
- cpp [args] main.c /tmp/main.i
- cc1 /tmp/main.i main.c -O2 [args] -o /tmp/main.s
- as [args] -o /tmp/main.o /tmp/main.s
- <similar process for swap.c>
- ld -o p [system obj files] /tmp/main.o /tmp/swap.o
- unix>
12Static linking
- Input
- A collection of relocatable object files and command line arguments
- Output
- Fully linked executable object file that can be loaded and run
13Static linking
- Object file
- Various code and data sections
- Instructions are in one section
- Initialized global variables are in one section
- Uninitialized global variables are in one section
14Static linking
- Symbol resolution
- resolves external references.
- external reference: a reference to a symbol defined in another object file
15Static linking
- Relocation
- relocates symbols from their relative locations in the .o files to new absolute positions in the executable.
- updates all references to these symbols to reflect their new positions.
- references can be in either code or data
- code: a(); /* ref to symbol a */
- data: int *xp = &x; /* ref to symbol x */
16Object files
- Relocatable object file
- Contains binary code and data in a form that can be combined with other relocatable object files to create an executable file
- Executable object file
- Contains binary code and data in a form that can be copied directly into memory and executed
17Object files
- Shared object file
- A special type of relocatable object file that
can be loaded into memory and linked dynamically,
at either load time or run time
18Executable and Linkable Format (ELF)
- Standard binary format for object files
- Derives from AT&T System V Unix
- later adopted by BSD Unix variants and Linux
- One unified format for relocatable object files (.o), executable object files, and shared object files (.so)
- generic name: ELF binaries
- Better support for shared libraries than old a.out formats.
19ELF object file format
20ELF object file format
- ELF header
- magic number, type (.o, exec, .so), machine, byte ordering, etc.
- Section header table
21ELF object file format
- .text section
- code
- .data section
- initialized (static) data
- .bss section
- uninitialized (static) data
- Block Started by Symbol
- Better Save Space
- has section header but occupies no space
22ELF object file format
- .symtab section
- symbol table
- procedure and static variable names
- section names and locations
- .rel.text section
- relocation info for .text section
- addresses of instructions that will need to be
modified in the executable
23ELF object file format
- .rel.data section
- relocation info for .data section
- addresses of pointer data that will need to be modified in the merged executable
- .debug section
- debugging symbol table, local variables and typedefs, global variables, original C source file (gcc -g)
24ELF object file format
- .line
- Mapping between line numbers in the original C source program and machine code instructions in the .text section.
- .strtab
- A string table for the symbol tables and for the section names.
25Symbols
- Three kinds of symbols
- Defined global symbols
- Referenced global symbols
- Local symbols
26Symbols
- Defined global symbols
- Defined by module m and can be referenced by other modules
- Nonstatic C functions
- Global variables that are defined without the C static attribute
27Symbols
- Referenced global symbols
- Referenced by module m but defined by some other module
- C functions and variables that are defined in other modules
- Local symbols
- Defined and referenced exclusively by module m.
- C functions and global variables with the static attribute
28Symbol Tables
- Each relocatable object module has a symbol table
- A symbol table contains information about the
symbols that are defined and referenced by the
module
29Symbol Tables
- Local nonstatic program variables
- Are not included in the symbol table in .symtab
- Local static procedure variables
- Are not managed on the stack
- Are allocated in .data or .bss
30ELF object file format
31Examples
int f()
{
    static int x = 1;
    return x;
}

int g()
{
    static int x = 1;
    return x;
}
- The two variables get distinct local linker symbols (e.g. x.1 and x.2), both allocated in .data
32Symbol Tables
- Compiler exports symbols in .s file
- Assembler builds symbol tables using exported symbols
- An ELF symbol table is contained in the .symtab section
- Symbol table contains an array of entries
34ELF Symbol Tables
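typedef struct {
    int name;        /* string table offset */
    int value;       /* section offset, or VM address */
    int size;        /* object size in bytes */
    char type:4,     /* data, func, section, or src file name */
         binding:4;  /* local or global */
    char reserved;   /* unused */
    char section;    /* section header index, ABS, UNDEF, or COMMON */
} Elf_Symbol;
- ABS, UNDEF, and COMMON are pseudo-sections with no section header entry: ABS marks symbols that should not be relocated, UNDEF marks symbols referenced here but defined elsewhere, and COMMON marks uninitialized data objects that are not yet allocated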
35ELF Symbol Tables
- main.o
Num:  Value  Size  Type    Bind    Ot  Ndx  Name
  8:      0     8  OBJECT  GLOBAL   0    3  buf
  9:      0    17  FUNC    GLOBAL   0    1  main
 10:      0     0  NOTYPE  GLOBAL   0  UND  swap
- swap.o
Num:  Value  Size  Type    Bind    Ot  Ndx  Name
  8:      0     4  OBJECT  GLOBAL   0    3  bufp0
  9:      0     0  NOTYPE  GLOBAL   0  UND  buf
 10:      0    39  FUNC    GLOBAL   0    1  swap
 11:      4     4  OBJECT  GLOBAL   0  COM  bufp1
- For the COMMON symbol bufp1, the Value field (4) gives the alignment requirement
36Symbol Resolution
void foo(void);

int main()
{
    foo();
    return 0;
}
- unix> gcc -Wall -O2 -o linkerror linkerror.c
- /tmp/ccSz5uti.o: In function `main':
- /tmp/ccSz5uti.o(.text+0x7): undefined reference to `foo'
- collect2: ld returned 1 exit status
37Multiply Defined Global Symbols
- Strong
- Functions and initialized global variables
- Weak
- Uninitialized global variables
- Rules
- Multiple strong symbols are not allowed
- Given a strong symbol and multiple weak symbols, choose the strong symbol
- Given multiple weak symbols, choose any of the weak symbols
38Multiply Defined Global Symbols
39Multiply Defined Global Symbols
/* foo3.c */
#include <stdio.h>
void f();

int x = 15213;

int main()
{
    f();
    printf("x = %d\n", x);
    return 0;
}

/* bar3.c */
int x;

void f()
{
    x = 15212;
}
40Multiply Defined Global Symbols
/* foo4.c */
#include <stdio.h>
void f();

int x = 15213;

int main()
{
    x = 15213;
    f();
    printf("x = %d\n", x);
    return 0;
}

/* bar4.c */
int x;

void f()
{
    x = 15212;
}
41Multiply Defined Global Symbols
/* foo5.c */
#include <stdio.h>
void f();

int x = 15213;
int y = 15212;

int main()
{
    f();
    printf("x = 0x%x y = 0x%x \n", x, y);
    return 0;
}

/* bar5.c */
double x;

void f()
{
    x = -0.0;
}
42Relocation
- Relocation
- Merge the input modules
- Assign runtime address to each symbol
- Two steps
- Relocating sections and symbol definitions
- Relocating symbol references within sections
43Relocation
- For each reference to an object with unknown location
- Assembler generates a relocation entry
- Relocation entries for code are placed in .rel.text
- Relocation entries for data are placed in .rel.data
44Relocation
- Relocation Entry
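typedef struct {
    int offset;      /* offset of the reference to relocate */
    int symbol:24,   /* symbol the reference should point to */
        type:8;      /* relocation type */
} Elf32_Rel;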
45Relocation
- e8 fc ff ff ff call 7 <main+0x7> swap();
-
- There is a relocation entry in .rel.text
- offset symbol type
- 7 swap R_386_PC32
46Relocation
- int *bufp0 = &buf[0];
- 00000000 <bufp0>:
- 0: 00 00 00 00
-
- There is a relocation entry in .rel.data
- offset symbol type
- 0 buf R_386_32
47Relocation
- e8 fc ff ff ff call 7 <main+0x7> swap();
- 7: R_386_PC32 swap relocation entry
- r.offset = 0x7
- r.symbol = swap
- r.type = R_386_PC32
- ADDR(main) = ADDR(.text) = 0x80483b4
- ADDR(swap) = 0x80483c8
- refaddr = ADDR(main) + r.offset = 0x80483bb
- ADDR(r.symbol) = ADDR(swap) = 0x80483c8
- *refptr = (unsigned) (ADDR(r.symbol) + *refptr - refaddr)
-         = (unsigned) (0x80483c8 + (-4) - 0x80483bb)
-         = (unsigned) 0x9
48Relocation
- int *bufp0 = &buf[0];
- 00000000 <bufp0>:
- 0: 00 00 00 00 int *bufp0 = &buf[0];
- 0: R_386_32 buf relocation entry
- ADDR(r.symbol) = ADDR(buf) = 0x8049454
- *refptr = (unsigned) (ADDR(r.symbol) + *refptr)
-         = (unsigned) (0x8049454)
- 0804945c <bufp0>:
- 804945c: 54 94 04 08
49Relocation
foreach section s {
    foreach relocation entry r {
        refptr = s + r.offset; /* ptr to reference to be relocated */

        /* relocate a PC-relative reference */
        if (r.type == R_386_PC32) {
            refaddr = ADDR(s) + r.offset; /* ref's runtime address */
            *refptr = (unsigned) (ADDR(r.symbol) + *refptr - refaddr);
        }

        /* relocate an absolute reference */
        if (r.type == R_386_32)
            *refptr = (unsigned) (ADDR(r.symbol) + *refptr);
    }
}
50Relocation
080483b4 <main>:
 80483b4: 55                push   %ebp
 80483b5: 89 e5             mov    %esp,%ebp
 80483b7: 83 ec 08          sub    $0x8,%esp
 80483ba: e8 09 00 00 00    call   80483c8 <swap>
 80483bf: 31 c0             xor    %eax,%eax
 80483c1: 89 ec             mov    %ebp,%esp
 80483c3: 5d                pop    %ebp
 80483c4: c3                ret
 80483c5: 90                nop
 80483c6: 90                nop
 80483c7: 90                nop
51Relocation
080483c8 <swap>:
 80483c8: 55                      push   %ebp
 80483c9: 8b 15 5c 94 04 08       mov    0x804945c,%edx          get bufp0
 80483cf: a1 58 94 04 08          mov    0x8049458,%eax          get buf[1]
 80483d4: 89 e5                   mov    %esp,%ebp
 80483d6: c7 05 48 95 04 08 58    movl   $0x8049458,0x8049548    bufp1 = &buf[1]
 80483dd: 94 04 08
 80483e0: 89 ec                   mov    %ebp,%esp
 80483e2: 8b 0a                   mov    (%edx),%ecx
 80483e4: 89 02                   mov    %eax,(%edx)
 80483e6: a1 48 95 04 08          mov    0x8049548,%eax
 80483eb: 89 08                   mov    %ecx,(%eax)
 80483ed: 5d                      pop    %ebp
 80483ee: c3                      ret
52Relocation
08049454 <buf>:
 8049454: 01 00 00 00 02 00 00 00
0804945c <bufp0>:
 804945c: 54 94 04 08
53ELF object file format
54Executable Object Files
- ELF header
- Overall information
- Entry point
- .init section
- A small function _init
- Initialization
- Program header table
- page size, virtual addresses for memory segments
(sections), segment sizes.
56Loading
- unix> ./p
- Loader
- Memory-resident operating system code
- Invoked by calling the execve function
- Copies the code and data in the executable object file from disk into memory
- Jumps to the entry point
- Runs the program
57Loading
- Startup code
- At the _start address defined in crt1.o
- Same for all C programs
- 0x080480c0 <_start>:
- call __libc_init_first
- call _init
- call atexit
- call main
- call _exit
58Packaging commonly used functions
- How to package functions commonly used by programmers?
- math, I/O, memory management, string manipulation, etc.
59Packaging commonly used functions
- Awkward, given the linker framework so far
- Option 1: Put all functions in a single source file
- programmers link big object file into their programs
- space and time inefficient
- Option 2: Put each function in a separate source file
- programmers explicitly link appropriate binaries into their programs
- more efficient, but burdensome on the programmer
60Packaging commonly used functions
- Solution: static libraries (.a archive files)
- concatenate related relocatable object files into a single file with an index (called an archive)
- enhance linker so that it tries to resolve unresolved external references by looking for the symbols in one or more archives
- If an archive member file resolves a reference, link it into the executable
61Static libraries (archives)
62Creating static libraries
63Using static libraries
- The linker maintains three sets during the scan
- E
- relocatable object files that will be merged to form the executable
- U
- Unresolved symbols
- D
- Symbols that have been defined in previous input files
- Initially all are empty
64Using static libraries
- Scan .o files and .a files in command line order.
- When scanning an object file f,
- Add f to E
- Update U and D
- When scanning an archive file f,
- Try to resolve the symbols in U against the members of f
- If member m resolves a symbol, m is added to E
- Update U and D using m
65Using static libraries
- If any entries remain in the unresolved list at the end of the scan, then error
- Problem
- command line order matters!
- Moral: put libraries at the end of the command line.
66Shared libraries
- Static libraries have the following disadvantages
- potential for duplicating lots of common code in the executable files on a file system.
- e.g., every C program needs the standard C library
- potential for duplicating lots of code in the virtual memory space of many processes.
- minor bug fixes of system libraries require each application to explicitly relink
67Shared libraries: Solution
- shared libraries (dynamic link libraries, DLLs) whose members are
- dynamically loaded into memory and
- linked into an application at run-time
68Shared libraries: Solution
- dynamic linking can occur when the executable is first loaded and run.
- common case for Linux, handled automatically by ld-linux.so
- dynamic linking can also occur after the program has begun.
- in Linux, this is done explicitly by the user with dlopen()
- shared library routines can be shared by multiple processes.
69Dynamic Linking
- unix> gcc -shared -fPIC -o libvector.so addvec.c multvec.c
- unix> gcc -o p2 main2.c ./libvector.so
70Dynamically linked shared libraries
71Dynamic Linking
#include <dlfcn.h>
void *dlopen(const char *filename, int flag);
- returns ptr to handle if OK, NULL on error
void *dlsym(void *handle, const char *symbol);
- returns ptr to symbol if OK, NULL on error
int dlclose(void *handle);
- returns 0 if OK, -1 on error
const char *dlerror(void);
- returns error msg if previous call to dlopen, dlsym, or dlclose failed, NULL if previous call was OK
72
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

int x[2] = {1, 2};
int y[2] = {3, 4};
int z[2];

int main()
{
    void *handle;
    void (*addvec)(int *, int *, int *, int);
    char *error;

    /* dynamically load the shared library that contains addvec() */
    handle = dlopen("./libvector.so", RTLD_LAZY);
    if (!handle) {
        fprintf(stderr, "%s\n", dlerror());
        exit(1);
    }
73
    /* get a pointer to the addvec() function we just loaded */
    addvec = dlsym(handle, "addvec");
    if ((error = dlerror()) != NULL) {
        fprintf(stderr, "%s\n", error);
        exit(1);
    }

    /* Now we can call addvec() just like any other function */
    addvec(x, y, z, 2);
    printf("z = [%d, %d]\n", z[0], z[1]);

    /* unload the shared library */
    if (dlclose(handle) < 0) {
        fprintf(stderr, "%s\n", dlerror());
        exit(1);
    }
    return 0;
}