POSTECH CSE 211 (Fall 2004): Microprocessor Programming and Application. Instructor: G. Jounghyun Kim

Slides: 45
Provided by: randa88

Transcript and Presenter's Notes

1
POSTECH CSE 211 (Fall 2004): Microprocessor
Programming and Application
Instructor: G. Jounghyun Kim
  • (Lecture notes borrowed and adapted from R.
    Bryant with permission)

2
Course Theme
  • Abstraction is good, but don't forget reality!
  • Courses to date emphasize abstraction
  • Abstract data types
  • Asymptotic analysis
  • These abstractions have limits
  • Especially in the presence of bugs
  • Need to understand underlying implementations
  • Useful outcomes
  • Become more effective programmers
  • Able to find and eliminate bugs efficiently
  • Able to tune program performance
  • Prepare for later systems classes in CS & ECE
  • Compilers, Operating Systems, Networks, Computer
    Architecture, Embedded Systems

3
Great Reality #1
  • Ints are not Integers, Floats are not Reals
  • Examples
  • Is x² ≥ 0?
  • Floats: Yes!
  • Ints:
  • 40000 * 40000 --> 1600000000
  • 50000 * 50000 --> ??
  • Is (x + y) + z = x + (y + z)?
  • Unsigned & Signed Ints: Yes!
  • Floats:
  • (1e20 + -1e20) + 3.14 --> 3.14
  • 1e20 + (-1e20 + 3.14) --> ??

4
Computer Arithmetic
  • Does not generate random values
  • Arithmetic operations have important mathematical
    properties
  • Cannot assume usual properties
  • Due to finiteness of representations
  • Integer operations satisfy ring properties
  • Commutativity, associativity, distributivity
  • Floating point operations satisfy ordering
    properties
  • Monotonicity, values of signs
  • Observation
  • Need to understand which abstractions apply in
    which contexts
  • Important issues for compiler writers and serious
    application programmers

5
Great Reality #2
  • You've got to know assembly
  • Chances are, you'll never write a program in
    assembly
  • Compilers are much better and more patient than you
    are
  • Understanding assembly is key to the machine-level
    execution model
  • Behavior of programs in presence of bugs
  • High-level language model breaks down
  • Tuning program performance
  • Understanding sources of program inefficiency
  • Implementing system software
  • Compiler has machine code as target
  • Operating systems must manage process state

6
Assembly Code Example
  • Time Stamp Counter
  • Special 64-bit register in Intel-compatible
    machines
  • Incremented every clock cycle
  • Read with rdtsc instruction
  • Application
  • Measure time required by procedure
  • In units of clock cycles

    double t;
    start_counter();
    P();
    t = get_counter();
    printf("P required %f clock cycles\n", t);

7
Code to Read Counter
  • Write a small amount of assembly code using GCC's
    asm facility
  • Inserts assembly code into machine code generated
    by compiler

    static unsigned cyc_hi = 0;
    static unsigned cyc_lo = 0;

    /* Set *hi and *lo to the high and low order bits
       of the cycle counter. */
    void access_counter(unsigned *hi, unsigned *lo)
    {
        asm("rdtsc; movl %%edx,%0; movl %%eax,%1"
            : "=r" (*hi), "=r" (*lo)
            :
            : "%edx", "%eax");
    }

8
Code to Read Counter
    /* Record the current value of the cycle counter. */
    void start_counter()
    {
        access_counter(&cyc_hi, &cyc_lo);
    }

    /* Number of cycles since the last call to start_counter. */
    double get_counter()
    {
        unsigned ncyc_hi, ncyc_lo;
        unsigned hi, lo, borrow;

        /* Get cycle counter */
        access_counter(&ncyc_hi, &ncyc_lo);

        /* Do double precision subtraction */
        lo = ncyc_lo - cyc_lo;
        borrow = lo > ncyc_lo;
        hi = ncyc_hi - cyc_hi - borrow;
        return (double) hi * (1 << 30) * 4 + lo;
    }
9
Measuring Time
  • Trickier than it Might Look
  • Many sources of variation
  • Example
  • Sum integers from 1 to n

                n            Cycles    Cycles/n
              100               961        9.61
            1,000             8,407        8.41
            1,000             8,426        8.43
           10,000            82,861        8.29
           10,000            82,876        8.29
        1,000,000         8,419,907        8.42
        1,000,000         8,425,181        8.43
    1,000,000,000    8,371,2305,591        8.37

10
Great Reality #3
  • Memory Matters
  • Memory is not unbounded
  • It must be allocated and managed
  • Many applications are memory dominated
  • Memory referencing bugs especially pernicious
  • Effects are distant in both time and space
  • Memory performance is not uniform
  • Cache and virtual memory effects can greatly
    affect program performance
  • Adapting program to characteristics of memory
    system can lead to major speed improvements

11
Memory Referencing Bug Example
    main ()
    {
        long int a[2];
        double d = 3.14;
        a[2] = 1073741824;  /* Out of bounds reference */
        printf("d = %.15g\n", d);
        exit(0);
    }

(Linux version gives correct result, but
implementing as separate function gives
segmentation fault.)
12
Memory Referencing Errors
  • C and C++ do not provide any memory protection
  • Out of bounds array references
  • Invalid pointer values
  • Abuses of malloc/free
  • Can lead to nasty bugs
  • Whether or not bug has any effect depends on
    system and compiler
  • Action at a distance
  • Corrupted object logically unrelated to one being
    accessed
  • Effect of bug may be first observed long after it
    is generated
  • How can I deal with this?
  • Program in Java, Lisp, or ML
  • Understand what possible interactions may occur
  • Use or develop tools to detect referencing errors

13
Memory Performance Example
  • Implementations of Matrix Multiplication
  • Multiple ways to nest loops

    /* ijk */
    for (i=0; i<n; i++) {
      for (j=0; j<n; j++) {
        sum = 0.0;
        for (k=0; k<n; k++)
          sum += a[i][k] * b[k][j];
        c[i][j] = sum;
      }
    }

    /* jik */
    for (j=0; j<n; j++) {
      for (i=0; i<n; i++) {
        sum = 0.0;
        for (k=0; k<n; k++)
          sum += a[i][k] * b[k][j];
        c[i][j] = sum;
      }
    }

14
Matmult Performance (Alpha 21164)
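[Figure: performance curves for the loop orderings (labels include jki, kij, kji), with regions marked "Too big for L1 Cache" and "Too big for L2 Cache"]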
15
Blocked matmult perf (Alpha 21164)
16
Great Reality #4
  • There's more to performance than asymptotic
    complexity
  • Constant factors matter too!
  • Easily see a 10:1 performance range depending on
    how the code is written
  • Must optimize at multiple levels: algorithm, data
    representations, procedures, and loops
  • Must understand system to optimize performance
  • How programs compiled and executed
  • How to measure program performance and identify
    bottlenecks
  • How to improve performance without destroying
    code modularity and generality

17
Great Reality #5
  • Computers do more than execute programs
  • They need to get data in and out
  • I/O system critical to program reliability and
    performance
  • They communicate with each other over networks
  • Many system-level issues arise in presence of
    network
  • Concurrent operations by autonomous processes
  • Coping with unreliable media
  • Cross platform compatibility
  • Complex performance issues

18
Course Perspective
  • Most Systems Courses are Builder-Centric
  • Computer Architecture
  • Design pipelined processor in Verilog
  • Operating Systems
  • Implement large portions of operating system
  • Compilers
  • Write compiler for simple language
  • Networking
  • Implement and simulate network protocols

19
Course Perspective (Cont.)
  • Our Course is Programmer-Centric
  • Purpose is to show how by knowing more about the
    underlying system, one can be more effective as a
    programmer
  • Enable you to
  • Write programs that are more reliable and
    efficient
  • Incorporate features that require hooks into OS
  • E.g., concurrency, signal handlers
  • Not just a course for dedicated hackers
  • We bring out the hidden hacker in everyone
  • Transition from Abstract to Concrete!
  • From high-level language model
  • To underlying implementation

20
What are we going to learn?
  • Basic computer operations: how do computers
    compute?
  • Basic components and their organization
  • Understanding computer architectures and their
    different implementations
  • What determines the performance of a computer?
  • Assembly programming as a way to learn about
    computers
  • Relationship between software and hardware
21
Architecture & Implementation 1
  • Architecture is those attributes visible to the
    programmer
  • Instruction set, number of bits used for data
    representation, I/O mechanisms, addressing
    techniques
  • e.g. Is there a multiply instruction?
  • Implementation (or organization) is how those
    attributes are realized
  • Control signals, interfaces, memory technology
  • e.g. Is there a hardware multiply unit or is it
    done by repeated addition?

22
Architecture & Implementation 2
  • All members of the Intel x86 family share the same
    basic architecture
  • All members of the IBM System/370 family share the
    same basic architecture
  • This gives code compatibility
  • At least backwards
  • Implementation (or organization) differs between
    different versions
23
Structure & Function
  • Structure is the way in which components relate
    to each other
  • Function is the operation of individual
    components as part of the structure

24
Structure - Top Level
[Diagram labels: Computer (Central Processing Unit, Main Memory, Input/Output, Systems Interconnection); Peripherals; Communication lines]
25
Structure - The CPU
[Diagram labels: Computer (I/O, System Bus, Memory, CPU); CPU (Registers, Arithmetic and Logic Unit, Internal CPU Interconnection, Control Unit)]
26
Other Computer Structures
  • Parallel Computers
  • SIMD: processor arrays (e.g. image processing
    unit)
  • MIMD: tightly coupled CPU-memory units (e.g.
    dual-CPU PCs), distributed computing

27
Memory Hierarchy: Economic Reason
  • Put everything in fast memory (for speed)?
  • Put everything on disk (for size)?
  • Middle ground: build a structure of memory
  • Closer to the CPU: faster memory, smaller size
  • Farther from the CPU: slower memory, bigger size
  • Management: keep frequently used data close to the
    CPU

28
Function
  • All computer functions are:
  • Data processing
  • Data storage
  • Data movement
  • Control
  • Main components
  • Processing element: CPU
  • Input and output: keyboard, mouse, monitor, etc.
  • Memory: hard disk, RAM, cache, Zip drive, etc.

29
Functional view
  • Functional view of a computer

30
Software 1
  • Operating system (system software)
  • Programs
  • Sequence of instructions
  • Written in a standard language
  • Compiled or Interpreted into machine
    readable form (translation)
  • Data / Files

31
Information is bits + context
  • #include <stdio.h>
  • int main()
  • { printf("hello, world\n"); }
  • In memory, this looks like a bunch of binary
    numbers
  • In fact, everything looks like a bunch of numbers
    as stored in the computer
  • The same sequence of numbers may represent
    different things depending on context (e.g. is it
    a character or a number?)

32
Levels of Abstraction
  • Display "hello, world !"
  • ↓
  • main()
  • printf("hello, world !\n")
  • ↓
  • Move 'h', ah
  • Syscall 100
  • Move 'e', ah
  • Syscall 100
  • ↓
  • 010011101010101

Top level: easy to understand, details are abstracted
Middle levels: instructions written in a particular language
Bottom level: difficult to understand, more control
33
Program Translation
  • High-level programming languages, e.g. C, C++,
    LISP, Java, ...
  • Can be understood by the naked eye, more or less
    (!?); independent of the underlying hardware
  • Lowest level: binary numbers (they used to
    program with numbers in ancient times) →
    hardware dependent
  • Assembly: in between (somewhat readable by
    humans, yet deeply connected to the underlying
    hardware)
  • But since computers can only understand binary
    numbers, all programs must ultimately be
    translated into binary format (object files,
    executables, ...) → compilation

34
Compilation
  • Preprocessing: rewriting the original program for
    easier translation (e.g. replacing #include
    <stdio.h> with its actual content) → .c, .cp,
    .lisp, etc.
  • Compilation: translating the original code into
    code made of assembly instructions (an assembly
    language program) → .s files. Why? Common ground
    for different languages on a given hardware
    platform
  • Assembly: translating assembly language code into
    binaries → .o files
  • Linking: merging with other .o files to form a
    bigger program (e.g. the printf routine) →
    .out/.exe files, read and executed by the actual
    hardware

35
Knowing about compiling
  • Optimizing code performance
  • Reduce link time errors
  • Security and memory management

36
How does a program run anyway?
  • Fetch-Decode-Execute-Store cycle
  • CPU (processor): carries out instructions (load,
    store, I/O, jump, ...)
  • How fast?
  • Memory: stores instructions and data
  • How close to the processor? (How often is the data
    used? How fast?)
  • I/O devices: allow communication with the computer
  • Speed?
  • What kind?
  • Bus: connections
  • How wide?
  • Between whom?

37
Operations (1)
  • Data movement
  • e.g. keyboard to screen

38
Operations (2)
  • Storage
  • e.g. Internet download to disk

39
Operations (3)
  • Processing from/to storage
  • e.g. updating bank statement

40
Operations (4)
  • Processing from storage to I/O
  • e.g. printing a bank statement

41
Hello world example: Entering program
[Diagram labels: User, IO, CPU, Memory]
42
Hello world example: Loading program
[Diagram labels: User, IO, CPU, Memory (RAM), Memory (Disk)]
43
Hello world example: Executing program
[Diagram labels: IO, CPU, Memory (RAM)]
44
Architecture, Implementation and Performance
  • Can architecture affect performance?
  • The great debate: RISC vs. CISC
  • Can implementation affect performance?
  • Clock?
  • Abstraction level?
  • Memory size / speed
  • Bus / interconnection / network
  • I/O