Title: POSTECH CSE 211 Fall 2004 Microprocessor Programming and Application Instructor: G. Jounghyun Kim
1POSTECH CSE 211 (Fall 2004) Microprocessor Programming and Application
Instructor: G. Jounghyun Kim
- (Lecture notes borrowed and adapted from R. Bryant with permission)
2Course Theme
- Abstraction is good, but don't forget reality!
- Courses to date emphasize abstraction
- Abstract data types
- Asymptotic analysis
- These abstractions have limits
- Especially in the presence of bugs
- Need to understand underlying implementations
- Useful outcomes
- Become more effective programmers
- Able to find and eliminate bugs efficiently
- Able to tune program performance
- Prepare for later systems classes in CS & ECE
- Compilers, Operating Systems, Networks, Computer Architecture, Embedded Systems
3Great Reality #1
- Ints are not Integers, Floats are not Reals
- Examples
- Is x^2 >= 0?
- Floats: Yes!
- Ints:
- 40000 * 40000 --> 1600000000
- 50000 * 50000 --> ??
- Is (x + y) + z = x + (y + z)?
- Unsigned & Signed Ints: Yes!
- Floats:
- (1e20 + -1e20) + 3.14 --> 3.14
- 1e20 + (-1e20 + 3.14) --> ??
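The two questions on this slide can be checked directly. The following is a minimal sketch, assuming a 32-bit int and IEEE 754 doubles; the helper names square32, sum_left, and sum_right are illustrative and do not come from the lecture.

```c
#include <stdint.h>

/* Square a 32-bit int with two's-complement wraparound.
   Signed overflow is undefined behavior in C, so the multiply is done
   on unsigned values and the result is converted back (that conversion
   is implementation-defined, but wraps on common compilers). */
int32_t square32(int32_t x)
{
    uint32_t ux = (uint32_t)x;
    return (int32_t)(ux * ux);
}

/* Sum three doubles, grouping the first two together. */
double sum_left(double x, double y, double z)
{
    return (x + y) + z;
}

/* Sum three doubles, grouping the last two together. */
double sum_right(double x, double y, double z)
{
    return x + (y + z);
}
```

Under those assumptions, square32(50000) comes out negative (-1794967296), and sum_right(1e20, -1e20, 3.14) returns 0 because 3.14 is lost when it is added to -1e20 first, while sum_left returns 3.14.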
4Computer Arithmetic
- Does not generate random values
- Arithmetic operations have important mathematical properties
- Cannot assume "usual" properties
- Due to finiteness of representations
- Integer operations satisfy "ring" properties
- Commutativity, associativity, distributivity
- Floating point operations satisfy "ordering" properties
- Monotonicity, values of signs
- Observation
- Need to understand which abstractions apply in which contexts
- Important issues for compiler writers and serious application programmers
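The properties that do survive can be spot-checked in code. This is a small sketch rather than a proof; the function names are made up for illustration, and the floating-point check assumes finite, non-NaN inputs.

```c
#include <stdbool.h>

/* Unsigned arithmetic wraps modulo 2^w, so it still forms a ring:
   regrouping operands does not change the result, even when
   intermediate values overflow. */
bool uint_assoc_holds(unsigned x, unsigned y, unsigned z)
{
    return (x + y) + z == x + (y + z);
}

/* Distributivity also holds for wrapped unsigned arithmetic. */
bool uint_distrib_holds(unsigned x, unsigned y, unsigned z)
{
    return x * (y + z) == x * y + x * z;
}

/* Floating-point addition is monotonic: if a >= b then a + c >= b + c,
   even though it is not associative. */
bool float_monotonic_holds(float a, float b, float c)
{
    if (a < b) {            /* swap so that a >= b */
        float t = a;
        a = b;
        b = t;
    }
    return a + c >= b + c;
}
```

Calling these with values that overflow, such as uint_assoc_holds(4000000000u, 3000000000u, 2000000000u), is the interesting case: the intermediate sums wrap, yet both groupings agree.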
5Great Reality #2
- You've got to know assembly
- Chances are, you'll never write a program in assembly
- Compilers are much better & more patient than you are
- Understanding assembly is key to the machine-level execution model
- Behavior of programs in presence of bugs
- High-level language model breaks down
- Tuning program performance
- Understanding sources of program inefficiency
- Implementing system software
- Compiler has machine code as target
- Operating systems must manage process state
6Assembly Code Example
- Time Stamp Counter
- Special 64-bit register in Intel-compatible machines
- Incremented every clock cycle
- Read with rdtsc instruction
- Application
- Measure time required by procedure
- In units of clock cycles
    double t;
    start_counter();
    P();
    t = get_counter();
    printf("P required %f clock cycles\n", t);
7Code to Read Counter
- Write a small amount of assembly code using GCC's asm facility
- Inserts assembly code into machine code generated by compiler
    static unsigned cyc_hi = 0;
    static unsigned cyc_lo = 0;

    /* Set *hi and *lo to the high and low order bits
       of the cycle counter. */
    void access_counter(unsigned *hi, unsigned *lo)
    {
        asm("rdtsc; movl %%edx,%0; movl %%eax,%1"
            : "=r" (*hi), "=r" (*lo)
            :
            : "%edx", "%eax");
    }
8Code to Read Counter
    /* Record the current value of the cycle counter. */
    void start_counter()
    {
        access_counter(&cyc_hi, &cyc_lo);
    }

    /* Number of cycles since the last call to start_counter. */
    double get_counter()
    {
        unsigned ncyc_hi, ncyc_lo;
        unsigned hi, lo, borrow;
        /* Get cycle counter */
        access_counter(&ncyc_hi, &ncyc_lo);
        /* Do double precision subtraction */
        lo = ncyc_lo - cyc_lo;
        borrow = lo > ncyc_lo;
        hi = ncyc_hi - cyc_hi - borrow;
        return (double) hi * (1 << 30) * 4 + lo;
    }
9Measuring Time
- Trickier than it Might Look
- Many sources of variation
- Example
- Sum integers from 1 to n
    n               Cycles           Cycles/n
    100             961              9.61
    1,000           8,407            8.41
    1,000           8,426            8.43
    10,000          82,861           8.29
    10,000          82,876           8.29
    1,000,000       8,419,907        8.42
    1,000,000       8,425,181        8.43
    1,000,000,000   8,371,2305,591   8.37
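One common way to tame this variation is to repeat the measurement and keep the smallest value. The sketch below assumes that interference (context switches, cache misses) only ever adds time, and it uses the standard clock() function as a portable stand-in for the cycle counter; sum_to, time_once, and min_time are illustrative names, not part of the lecture code.

```c
#include <time.h>

/* Sum the integers from 1 to n (the procedure being timed). */
long sum_to(long n)
{
    long sum = 0;
    for (long i = 1; i <= n; i++)
        sum += i;
    return sum;
}

/* Time one call to sum_to(n), in seconds of CPU time.
   The computed sum is stored through result. */
double time_once(long n, long *result)
{
    clock_t start = clock();
    *result = sum_to(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Repeat the measurement and keep the smallest time observed. */
double min_time(long n, int trials, long *result)
{
    double best = time_once(n, result);
    for (int t = 1; t < trials; t++) {
        double s = time_once(n, result);
        if (s < best)
            best = s;
    }
    return best;
}
```

clock() is much coarser than a cycle counter, so small values of n may report zero; the structure of the measurement loop is the point here, not the timer.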
10Great Reality #3
- Memory Matters
- Memory is not unbounded
- It must be allocated and managed
- Many applications are memory dominated
- Memory referencing bugs especially pernicious
- Effects are distant in both time and space
- Memory performance is not uniform
- Cache and virtual memory effects can greatly affect program performance
- Adapting program to characteristics of memory system can lead to major speed improvements
11Memory Referencing Bug Example
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        long int a[2];
        double d = 3.14;
        a[2] = 1073741824; /* Out of bounds reference */
        printf("d = %.15g\n", d);
        exit(0);
    }
(Linux version gives correct result, but
implementing as separate function gives
segmentation fault.)
12Memory Referencing Errors
- C and C++ do not provide any memory protection
- Out of bounds array references
- Invalid pointer values
- Abuses of malloc/free
- Can lead to nasty bugs
- Whether or not bug has any effect depends on system and compiler
- Action at a distance
- Corrupted object logically unrelated to one being accessed
- Effect of bug may be first observed long after it is generated
- How can I deal with this?
- Program in Java, Lisp, or ML
- Understand what possible interactions may occur
- Use or develop tools to detect referencing errors
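As a very small instance of the last option, an array can carry its own length so that every store is checked. This is only a sketch of the idea; checked_array and checked_store are hypothetical names, and real tools work at a much lower level.

```c
#include <stdbool.h>
#include <stddef.h>

/* A hypothetical checked array: the length travels with the data. */
typedef struct {
    long  *data;
    size_t len;
} checked_array;

/* Store value at index i.
   Returns false, and writes nothing, if i is out of bounds. */
bool checked_store(checked_array *arr, size_t i, long value)
{
    if (i >= arr->len)
        return false;
    arr->data[i] = value;
    return true;
}
```

With a two-element array, checked_store(&arr, 2, 1073741824) is rejected instead of silently overwriting whatever happens to sit next to the array in memory.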
13Memory Performance Example
- Implementations of Matrix Multiplication
- Multiple ways to nest loops
    /* ijk */
    for (i=0; i<n; i++) {
      for (j=0; j<n; j++) {
        sum = 0.0;
        for (k=0; k<n; k++)
          sum += a[i][k] * b[k][j];
        c[i][j] = sum;
      }
    }

    /* jik */
    for (j=0; j<n; j++) {
      for (i=0; i<n; i++) {
        sum = 0.0;
        for (k=0; k<n; k++)
          sum += a[i][k] * b[k][j];
        c[i][j] = sum;
      }
    }
14Matmult Performance (Alpha 21164)
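[Figure: performance curves for different loop orderings (visible labels: jki, kij, kji), with regions annotated "Too big for L1 Cache" and "Too big for L2 Cache"]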
15Blocked matmult perf (Alpha 21164)
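Only the title of this slide survives, so the exact blocked code measured here is unknown. One common form of blocking is sketched below, under the assumptions that the matrices are square, that the block size B divides the dimension N evenly, and that c is zeroed by the caller; N, B, and matmul_blocked are illustrative names.

```c
#define N 8   /* matrix dimension (illustrative) */
#define B 4   /* block size; assumed to divide N evenly */

/* Blocked matrix multiply: c = c + a * b.
   The caller must zero c first, since partial results from each
   block of k values are accumulated into it. */
void matmul_blocked(double a[N][N], double b[N][N], double c[N][N])
{
    for (int jj = 0; jj < N; jj += B)
        for (int kk = 0; kk < N; kk += B)
            for (int i = 0; i < N; i++)
                for (int j = jj; j < jj + B; j++) {
                    double sum = c[i][j];
                    for (int k = kk; k < kk + B; k++)
                        sum += a[i][k] * b[k][j];
                    c[i][j] = sum;
                }
}
```

The idea is that each B-by-B block of b is reused for every row of a while it is still in cache, instead of being streamed through once per element of c.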
16Great Reality #4
- There's more to performance than asymptotic complexity
- Constant factors matter too!
- Easily see 10:1 performance range depending on how code is written
- Must optimize at multiple levels: algorithm, data representations, procedures, and loops
- Must understand system to optimize performance
- How programs are compiled and executed
- How to measure program performance and identify bottlenecks
- How to improve performance without destroying code modularity and generality
17Great Reality #5
- Computers do more than execute programs
- They need to get data in and out
- I/O system critical to program reliability and performance
- They communicate with each other over networks
- Many system-level issues arise in presence of network
- Concurrent operations by autonomous processes
- Coping with unreliable media
- Cross platform compatibility
- Complex performance issues
18Course Perspective
- Most Systems Courses are Builder-Centric
- Computer Architecture
- Design pipelined processor in Verilog
- Operating Systems
- Implement large portions of operating system
- Compilers
- Write compiler for simple language
- Networking
- Implement and simulate network protocols
19Course Perspective (Cont.)
- Our Course is Programmer-Centric
- Purpose is to show how, by knowing more about the underlying system, one can be more effective as a programmer
- Enable you to
- Write programs that are more reliable and efficient
- Incorporate features that require hooks into OS
- E.g., concurrency, signal handlers
- Not just a course for dedicated hackers
- We bring out the hidden hacker in everyone
- Transition from Abstract to Concrete!
- From high-level language model
- To underlying implementation
20What are we going to learn?
- Basic computer operations: How do computers compute?
- Basic components and their organization
- Understand computer architectures and their different implementations
- What determines the performance of a computer?
- Assembly programming as a way to learn the computer
- Relationship between software and hardware
21Architecture & Implementation 1
- Architecture is those attributes visible to the programmer
- Instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques.
- e.g. Is there a multiply instruction?
- Implementation (Organization !?)
- Control signals, interfaces, memory technology.
- e.g. Is there a hardware multiply unit or is it done by repeated addition?
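The multiply example can be mimicked in software: the caller sees the same result either way, which is the architecture, while how it is produced is the implementation. This is a sketch by analogy only; mul_direct and mul_by_addition are made-up names, and whether a * b actually compiles to a multiply instruction depends on the compiler and target.

```c
/* Multiply using the language's multiply operator, which a compiler
   will typically map onto a multiply instruction if one exists. */
unsigned mul_direct(unsigned a, unsigned b)
{
    return a * b;
}

/* Multiply by repeated addition: one way an implementation without
   a hardware multiplier could produce the same result. */
unsigned mul_by_addition(unsigned a, unsigned b)
{
    unsigned product = 0;
    for (unsigned i = 0; i < b; i++)
        product += a;
    return product;
}
```

Both functions return the same value for every input, including inputs that wrap around, but the second takes time proportional to b.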
22Architecture & Implementation 2
- All Intel x86 family share the same basic architecture
- The IBM System/370 family share the same basic architecture
- This gives code compatibility
- At least backwards
- Implementation (or Organization) differs between different versions
23Structure & Function
- Structure is the way in which components relate to each other
- Function is the operation of individual components as part of the structure
24Structure - Top Level
[Diagram labels: Computer, Peripherals, Communication lines, Central Processing Unit, Main Memory, Input/Output, Systems Interconnection]
25Structure - The CPU
[Diagram labels: Computer (I/O, System Bus, Memory, CPU); CPU (Registers, Arithmetic and Logic Unit, Internal CPU Interconnection, Control Unit)]
26Other Computer Structures
- Parallel Computers
- SIMD: Processor arrays (e.g. image processing unit)
- MIMD: Tightly coupled CPU-Memory units (e.g. dual-CPU PCs)
- Distributed computing
27Memory Hierarchy: Economic reason
- Put everything in fast memory (for speed)
- Put everything on disk (for size)
- Middle ground: Build a structure of memory
- Closer to CPU (fast memory, smaller size)
- Far from CPU (slower memory, bigger size)
- Management: Keep often-used data close to the CPU
28Function
- All computer functions are
- Data processing
- Data storage
- Data movement
- Control
- Main Components
- Processing element: CPU
- Input and output: Keyboard, Mouse, Monitor, etc.
- Memory: Hard disk, RAM, Cache, Zip drive, etc.
29Functional view
- Functional view of a computer
30Software 1
- Operating system (system software)
- Programs
- Sequence of instructions
- Written in a standard language
- Compiled or interpreted into machine-readable form (translation)
- Data / Files
31Information is bits + context
    #include <stdio.h>
    int main()
    {
        printf("hello, world\n");
    }
- In memory, looks like a bunch of binary numbers
- In fact, everything looks like a bunch of numbers as stored in the computer
- Same sequence of numbers may represent different things depending on context (e.g. is it a character or a number?)
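The "character or number?" question can be made concrete by reading the same bytes in two ways. A minimal sketch, assuming ASCII character codes; bytes_as_u32 is an illustrative helper, not part of the lecture.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the first four bytes of a string as a 32-bit integer.
   memcpy copies the bytes exactly, with no conversion, so the result
   is the same bit pattern viewed in a different context.
   The string must hold at least four bytes. */
uint32_t bytes_as_u32(const char *s)
{
    uint32_t v;
    memcpy(&v, s, sizeof v);
    return v;
}
```

For the text "hello, world\n" the first four bytes are 'h', 'e', 'l', 'l', that is 0x68, 0x65, 0x6c, 0x6c. Read as one integer they give 0x6c6c6568 on a little-endian machine and 0x68656c6c on a big-endian one, so even the byte order is part of the context.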
32Levels of Abstraction
- Display "hello, world!"
- ↓
- main()
- {
-     printf("hello, world!\n");
- }
- ↓
- move 'h', ah
- syscall 100
- move 'e', ah
- syscall 100
- ...
- ↓
- 010011101010101
[Annotations, from the top level to the bottom level: easy to understand, details are abstracted; instructions written in a particular language; difficult to understand, more control]
33Program Translation
- High-level programming languages, e.g. C, C++, LISP, Java, ...
- Can be understood by the naked eye, more or less (!?)
- Independent of underlying hardware
- Lowest level: binary numbers (they used to program with numbers in ancient times) --> hardware dependent
- Assembly: in between (somewhat readable by humans, yet deep connection to underlying hardware)
- But since computers can only understand binary numbers, all programs must ultimately be translated into binary format (object files, executables, ...) --> compilation
34Compilation
- Preprocessing: rewriting the original program for easier translation (e.g. replace #include <stdio.h> with actual content) --> .c, .cp, .lisp, etc.
- Compilation: translate original code into code made of assembly instructions (assembly language program) --> .s files. Why? Common ground for different languages for a given hardware platform
- Assembly: translate assembly language code into binaries --> .o files
- Linking: merge with other .o files to form a bigger program (e.g. printf routine) --> .out/.exe files --> read and executed by actual hardware
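With GCC, each of these stages can be run separately, which makes the intermediate files easy to inspect. The file names greet.c, main.o, and hello below are hypothetical; the commands are shown as comments in a small source file that could be fed through them.

```c
/* greet.c: hypothetical file name, used only to illustrate the stages.
 *
 * With GCC, each stage can be run on its own:
 *   gcc -E greet.c -o greet.i     preprocessing: expands #include and macros
 *   gcc -S greet.i -o greet.s     compilation: emits assembly language
 *   gcc -c greet.s -o greet.o     assembly: emits an object file
 *   gcc greet.o main.o -o hello   linking: merges object files and the C library
 * (main.o stands for a separately compiled file that defines main().)
 */
#include <stdio.h>

/* Print the greeting; returns the number of characters written. */
int greet(void)
{
    return printf("hello, world\n");
}
```

Opening greet.i shows the contents of stdio.h pasted in ahead of the function, and greet.s shows the call to printf that the linker later resolves.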
35Knowing about compiling
- Optimizing code performance
- Reduce link time errors
- Security and memory management
36How does a program run anyway?
- Fetch-Decode-Execute-Store Cycle
- CPU (Processor): Carries out instructions (load, store, IO, jump, ...)
- How fast?
- Memory: Stores instructions
- How close to processor (how often is the data used?, how fast?)
- IO Devices: Allow communications to the computer
- Speed
- What kind?
- Bus: Connections
- How wide?
- Between whom?
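The cycle can be sketched as a loop over a toy machine. The instruction set below (OP_LOAD, OP_ADD, OP_STORE, OP_JUMP, OP_HALT) is invented purely for illustration and does not correspond to any real processor; it assumes a single accumulator register and that all operands are valid addresses.

```c
#include <stddef.h>

/* Opcodes of a made-up accumulator machine (not a real instruction set). */
enum { OP_HALT, OP_LOAD, OP_ADD, OP_STORE, OP_JUMP };

typedef struct {
    int opcode;   /* what to do */
    int operand;  /* data memory address, or jump target */
} instr;

/* Run a program until OP_HALT and return the final accumulator value.
   mem is data memory; the caller guarantees all operands are in range. */
int run(const instr *program, int *mem)
{
    size_t pc = 0;   /* program counter */
    int acc = 0;     /* accumulator register */

    for (;;) {
        instr in = program[pc++];                        /* fetch */
        switch (in.opcode) {                             /* decode */
        case OP_LOAD:  acc = mem[in.operand];   break;   /* execute */
        case OP_ADD:   acc += mem[in.operand];  break;
        case OP_STORE: mem[in.operand] = acc;   break;   /* store */
        case OP_JUMP:  pc = (size_t)in.operand; break;
        case OP_HALT:
        default:       return acc;
        }
    }
}
```

A four-instruction program (load mem[0], add mem[1], store to mem[2], halt) run on memory {2, 3, 0} leaves 5 in both the accumulator and mem[2].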
37Operations (1)
- Data movement
- e.g. keyboard to screen
38Operations (2)
- Storage
- e.g. Internet download to disk
39Operation (3)
- Processing from/to storage
- e.g. updating bank statement
40Operation (4)
- Processing from storage to I/O
- e.g. printing a bank statement
41Hello world example: Entering program
[Diagram labels: User, IO, CPU, Memory]
42Hello world example: Loading program
[Diagram labels: User, IO, CPU, Memory (Disk), Memory (RAM)]
43Hello world example: Executing program
[Diagram labels: IO, CPU, Memory (RAM)]
44Architecture, Implementation and Performance
- Can architecture affect performance ?
- The Great Debate: RISC vs. CISC
- Can implementation affect performance ?
- Clock ?
- Abstraction Level ?
- Memory size / speed
- Bus / Interconnection / Network
- I/O