How to measure, report, and summarize performance?



1
Performance
  • How to measure, report, and summarize
    performance?
  • What factors determine the performance of a
    computer?
  • Critical to purchase and design decisions
  • best performance?
  • least cost?
  • best performance/cost?
  • Questions
  • Why is some hardware better than others for
    different programs?
  • What factors of system performance are hardware
    related? (e.g., do we need a new machine, or a
    new operating system?)
  • How does the machine's instruction set affect
    performance?

2
Computer Performance
  • Response time (execution time): the time between
    the start and completion of a task
  • Throughput: the total amount of work done in a
    given time
  • Q: If we replace the processor with a faster one,
    what do we increase?
  • A: Response time and throughput
  • Q: If we add an additional processor to a system,
    what do we increase?
  • A: Throughput

3
Book's Definition of Performance
  • For some program running on machine X,
    PerformanceX = 1 / Execution timeX
  • "X is n times faster than Y": n = PerformanceX
    / PerformanceY
  • Problem: Machine A runs a program in 10 seconds
    and machine B in 15 seconds. How much faster is
    A than B?
  • Answer: n = PerformanceA / PerformanceB
    = Execution timeB / Execution timeA
    = 15/10 = 1.5
  • A is 1.5 times faster than B.
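This ratio is small enough to check with a quick Python sketch (the helper name is ours, not the book's):

```python
def speedup(exec_time_y, exec_time_x):
    """n = Performance_X / Performance_Y = Execution time_Y / Execution time_X."""
    return exec_time_y / exec_time_x

# Machine A runs the program in 10 s, machine B in 15 s:
n = speedup(15, 10)
print(n)  # 1.5, so A is 1.5 times faster than B
```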

4
Execution Time
  • Elapsed Time, wall-clock time or response time
  • counts everything (disk and memory accesses, I/O
    , etc.)
  • a useful number, but often not good for
    comparison purposes
  • CPU time
  • doesn't count I/O or time spent running other
    programs
  • can be broken up into system time and user time
  • Our focus: user CPU time
  • time spent executing the lines of code that are
    "in" our program

5
Clock Cycles
  • Instead of reporting execution time in seconds,
    we often use cycles
  • Execution time = # of clock cycles × cycle time
  • Clock ticks indicate when to start activities
    (one abstraction)
  • cycle time (period) = time between ticks
    = seconds per cycle
  • clock rate (frequency) = cycles per second
    (1 Hz = 1 cycle/sec)
  • A 200 MHz clock has a cycle time of
    1 / (200 × 10^6) s = 5 ns

6
How to Improve Performance
  • So, to improve performance (everything else being
    equal) you can either
  • reduce the # of required clock cycles for a
    program, or
  • decrease the clock period or, said another way,
    increase the clock frequency.

7
Different numbers of cycles for different
instructions
  • Multiplication takes more time than addition
  • Floating point operations take longer than
    integer ones
  • Accessing memory takes more time than accessing
    registers
  • Important point: changing the cycle time often
    changes the number of cycles required for various
    instructions (more later)
  • Another point: the same instruction might require
    a different number of cycles on a different
    machine

8
Example
  • A program runs in 10 seconds on computer A, which
    has a 400 MHz clock. We are trying to help a
    computer designer build a new machine B, that
    will run this program in 6 seconds. The designer
    can use new technology to substantially increase
    the clock rate, but this increase will affect the
    rest of the CPU design, causing machine B to
    require 1.2 times as many clock cycles as machine
    A. What clock rate should we tell the designer to
    target?
  • Clock cyclesA = 10 s × 400 MHz = 4×10^9 cycles
  • Clock cyclesB = 1.2 × 4×10^9 cycles = 4.8×10^9
    cycles
  • Execution time = # of clock cycles × cycle time
  • Clock rateB = Clock cyclesB / Execution timeB
    = 4.8×10^9 cycles / 6 s
    = 800 MHz
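The same computation, as a Python sketch (names are ours; the cycle ratio is kept as an integer fraction so the arithmetic stays exact):

```python
def target_clock_rate_hz(time_a_s, rate_a_hz, time_b_s, cycle_ratio=(12, 10)):
    """Solve Execution time = cycles x cycle time for machine B's rate.
    cycle_ratio is B's cycle overhead (1.2x) as a fraction num/den."""
    num, den = cycle_ratio
    cycles_a = time_a_s * rate_a_hz      # 10 s x 400 MHz = 4e9 cycles
    cycles_b = cycles_a * num // den     # B needs 1.2x as many cycles
    return cycles_b // time_b_s

print(target_clock_rate_hz(10, 400_000_000, 6))  # 800000000, i.e. 800 MHz
```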

9
Now that we understand cycles
  • A given program will require
  • some number of instructions (machine
    instructions)
  • some number of cycles
  • some number of seconds
  • We have a vocabulary that relates these
    quantities
  • cycle time (seconds per cycle)
  • clock rate (cycles per second)
  • CPI (cycles per instruction): an AVERAGE value!
  • a floating point intensive application might
    have a higher CPI
  • MIPS (millions of instructions per second): this
    would be higher for a program using simple
    instructions

10
Performance
  • Performance is determined by execution time
  • Related variables
  • # of cycles to execute program
  • # of instructions in program
  • # of cycles per second
  • average # of cycles per instruction
  • average # of instructions per second
  • Common pitfall: thinking one of the variables is
    indicative of performance when it really isn't.

11
CPI Example
  • Suppose we have two implementations of the same
    instruction set architecture (ISA). For some
    program, machine A has a clock cycle time of
    10 ns and a CPI of 2.0; machine B has a clock
    cycle time of 20 ns and a CPI of 1.2. Which
    machine is faster for this program, and by how
    much?
  • Time per instruction for A = 2.0 × 10 ns = 20 ns
  • Time per instruction for B = 1.2 × 20 ns = 24 ns
  • A is 24/20 = 1.2 times faster
  • If two machines have the same ISA, which of our
    quantities (e.g., clock rate, CPI, execution
    time, # of instructions, MIPS) will always be
    identical?
  • Answer: # of instructions

12
# of Instructions Example
  • A compiler designer has two alternatives for a
    certain code sequence. There are three different
    classes of instructions: A, B, and C, and they
    require one, two, and three cycles, respectively.
    The first sequence has 5 instructions: 2 of A,
    1 of B, and 2 of C. The second sequence has 6
    instructions: 4 of A, 1 of B, and 1 of C. Which
    sequence will be faster? What are the CPI values?
  • Sequence 1: 2×1 + 1×2 + 2×3 = 10 cycles;
    CPI1 = 10 / 5 = 2
  • Sequence 2: 4×1 + 1×2 + 1×3 = 9 cycles;
    CPI2 = 9 / 6 = 1.5
  • Sequence 2 is faster.
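The cycle counts and CPIs above can be reproduced with a small Python sketch (function and variable names are ours):

```python
def total_cycles_and_cpi(counts, cycles_per_class):
    """counts: instructions per class; returns (total cycles, CPI)."""
    instr = sum(counts.values())
    cycles = sum(n * cycles_per_class[c] for c, n in counts.items())
    return cycles, cycles / instr

cpc = {"A": 1, "B": 2, "C": 3}
print(total_cycles_and_cpi({"A": 2, "B": 1, "C": 2}, cpc))  # (10, 2.0)
print(total_cycles_and_cpi({"A": 4, "B": 1, "C": 1}, cpc))  # (9, 1.5)
```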

13
MIPS
  • Million Instructions Per Second
  • MIPS = instruction count / (execution time × 10^6)
  • MIPS is easy to understand, but
  • does not take into account the capabilities of
    the instructions; the instruction counts of
    different instruction sets differ
  • varies between programs even on the same computer
  • can vary inversely with performance!

14
MIPS example
  • Two compilers are being tested for a 100 MHz
    machine with three different classes of
    instructions: A, B, and C, which require one,
    two, and three cycles, respectively.
  • Compiler 1: Compiled code uses 5 million Class A,
    1 million Class B, and 1 million Class C
    instructions.
  • Compiler 2: Compiled code uses 10 million Class
    A, 1 million Class B, and 1 million Class C
    instructions.
  • Which sequence will be faster according to MIPS?
  • Which sequence will be faster according to
    execution time?

15
MIPS example
  • Cycles and instructions
  • Compiler 1: 10 million cycles, 7 million
    instructions
  • Compiler 2: 15 million cycles, 12 million
    instructions
  • Execution time = Clock cycles / Clock rate
  • Execution time1 = 10×10^6 / 100×10^6 = 0.1 s
  • Execution time2 = 15×10^6 / 100×10^6 = 0.15 s
  • MIPS = Instruction count / (Execution time × 10^6)
  • MIPS1 = 7×10^6 / (0.1×10^6) = 70
  • MIPS2 = 12×10^6 / (0.15×10^6) = 80
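A Python sketch of both metrics makes the contradiction explicit (names are ours):

```python
def time_and_mips(counts_millions, cycles_per_class, clock_hz):
    """Returns (execution time in seconds, MIPS rating)."""
    instr = sum(counts_millions.values()) * 1e6
    cycles = sum(n * 1e6 * cycles_per_class[c]
                 for c, n in counts_millions.items())
    # MIPS = instr / (time * 1e6), with time = cycles / clock_hz
    return cycles / clock_hz, instr * clock_hz / (cycles * 1e6)

cpc = {"A": 1, "B": 2, "C": 3}
t1, m1 = time_and_mips({"A": 5, "B": 1, "C": 1}, cpc, 100e6)
t2, m2 = time_and_mips({"A": 10, "B": 1, "C": 1}, cpc, 100e6)
print(t1, m1)  # 0.1 70.0
print(t2, m2)  # 0.15 80.0 -- higher MIPS, yet slower!
```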

16
Benchmarks
  • Performance is best determined by running a real
    application
  • Use programs typical of the expected workload
  • Or, typical of the expected class of
    applications, e.g., compilers/editors, scientific
    applications, graphics, etc.
  • Small benchmarks
  • nice for architects and designers
  • easy to standardize
  • can be abused
  • SPEC (System Performance Evaluation Cooperative)
  • companies have agreed on a set of real programs
    and inputs
  • can still be abused
  • valuable indicator of performance (and compiler
    technology)

17
SPEC 95
18
SPEC 89
  • Compiler effects on performance depend on
    applications.

19
SPEC 95
  • Organisational enhancements enhance performance.
  • Doubling the clock rate does not double the
    performance.

20
Amdahl's Law
  • Version 1
  • Execution Time After Improvement =
    Execution Time Unaffected +
    (Execution Time Affected / Amount of
    Improvement)
  • Version 2
  • Speedup =
    Performance after improvement /
    Performance before improvement =
    Execution time before improvement /
    Execution time after improvement
  • Execution time before = n + a
  • Execution time after = n + a/p
    (n = unaffected time, a = affected time,
    p = amount of improvement)
  • Principle: Make the common case fast

21
Amdahl's Law
  • Example: Suppose a program runs in 100 seconds on
    a machine, with multiply responsible for 80
    seconds of this time. How much do we have to
    improve the speed of multiplication if we want
    the program to run 4 times faster?
  • 100 s / 4 = 80 s/n + 20 s
  • 5 s = 80 s/n
  • n = 80 s / 5 s = 16
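Amdahl's Law (version 1) is easy to explore numerically; this sketch (names are ours) shows why the multiply unit must be 16 times faster:

```python
def time_after(t_unaffected, t_affected, improvement):
    """Amdahl: execution time after = unaffected + affected / improvement."""
    return t_unaffected + t_affected / improvement

# 100 s program, 80 s of it in multiply; 4x overall speedup needs 25 s:
for n in (4, 8, 16):
    print(n, time_after(20, 80, n))  # only n = 16 gives 20 + 5 = 25 s
```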

22
Amdahl's Law
  • Example: A benchmark program spends half of the
    time executing floating point instructions.
  • We improve the performance of the floating point
    unit by a factor of four.
  • What is the speedup?
  • Time before = 10 s
  • Time after = 5 s + 5 s/4 = 6.25 s
  • Speedup = 10/6.25 = 1.6

23
Machine Instructions
  • Language of the Machine
  • Lowest level of programming; controls the
    hardware directly
  • Assembly instructions are symbolic versions of
    machine instructions
  • More primitive than higher level languages
  • Very restrictive
  • Programs are stored in the memory, one
    instruction is fetched and executed at a time
  • We'll be working with the MIPS instruction set
    architecture

24
MIPS instruction set
  • Load from memory
  • Store in memory
  • Logic operations
  • and, or, negation, shift, ...
  • Arithmetic operations
  • addition, subtraction, ...
  • Branch

25
Instruction types
  • 1 operand
  • Jump address
  • Jump register number
  • 2 operands
  • Multiply reg1, reg2
  • 3 operands
  • Add reg1, reg2, reg3

26
MIPS arithmetic
  • Instructions have 3 operands
  • Operand order is fixed (destination first)
    Example: C code: A = B + C
    MIPS code: add $s0, $s1, $s2
  • $s0, etc. are registers
    (associated with variables by compiler)

27
MIPS arithmetic
  • Design Principle 1: simplicity favours
    regularity.
  • Of course this complicates some things...
    C code: A = B + C + D; E = F - A
    MIPS code: add $t0, $s1, $s2
               add $s0, $t0, $s3
               sub $s4, $s5, $s0
  • Operands must be registers; 32 registers provided
  • Design Principle 2: smaller is faster.

28
Registers vs. Memory
  • Arithmetic instruction operands are registers
  • Compiler associates variables with registers
  • What about programs with lots of variables?

29
Memory Organization
  • Viewed as a large, single-dimension array, with
    an address.
  • A memory address is an index into the array
  • "Byte addressing" means that the index points to
    a byte of memory.

[Diagram: byte-addressed memory; addresses 0, 1, 2, ..., each cell holding 8 bits of data]
30
Memory Organization
  • Bytes are nice, but most data items use larger
    "words"
  • For MIPS, a word is 32 bits or 4 bytes.
  • 2^32 bytes with byte addresses from 0 to 2^32-1
  • 2^30 words with byte addresses 0, 4, 8, ..., 2^32-4
  • Words are aligned, i.e., the 2 least significant
    bits of a word address are equal to 0.

[Diagram: word-aligned memory; addresses 0, 4, 8, 12, ..., each word holding 32 bits of data. Registers hold 32 bits of data.]
31
Load and store instructions
  • Example: C code: A[8] = h + A[8]
    MIPS code: lw $t0, 32($s3)
               add $t0, $s2, $t0
               sw $t0, 32($s3)
  • word offset 8 equals byte offset 32
  • Store word has destination last
  • Remember: arithmetic operands are registers, not
    memory!

32
So far we've learned
  • MIPS: loading and storing words but addressing
    bytes; arithmetic on registers only
  • Instruction           Meaning
    add $s1, $s2, $s3     $s1 = $s2 + $s3
    sub $s1, $s2, $s3     $s1 = $s2 - $s3
    lw $s1, 100($s2)      $s1 = Memory[$s2+100]
    sw $s1, 100($s2)      Memory[$s2+100] = $s1

33
Machine Language
  • Instructions, like registers and words of data,
    are also 32 bits long
  • Example: add $t0, $s1, $s2
  • R-type instruction format:
    000000 10001 10010 01000 00000 100000
    op     rs    rt    rd    shamt funct
  • op: opcode, basic operation
  • rs: 1st source reg.
  • rt: 2nd source reg.
  • rd: destination reg.
  • shamt: shift amount
  • funct: function, selects the specific variant of
    the operation

34
Machine Language
  • Introduce a new type of instruction format
  • I-type, for data transfer instructions
  • Example: lw $t0, 32($s2)
    35  18  8   32
    op  rs  rt  16-bit number
  • rt: destination register
  • new instruction format, but fields 1-3 are the
    same
  • Design principle 3: Good design demands good
    compromises

35
Stored Program Concept
  • Instructions are groups of bits
  • Programs are stored in memory to be read or
    written just like data
  • Fetch Execute Cycle
  • Instructions are fetched and put into a special
    register
  • Bits in the register "control" the subsequent
    actions
  • Fetch the next instruction and continue

memory for data, programs, compilers, editors,
etc.
36
Control
  • Decision making instructions
  • alter the control flow,
  • i.e., change the "next" instruction to be
    executed
  • MIPS conditional branch instructions:
    bne $t0, $t1, Label   # branch if not equal
    beq $t0, $t1, Label   # branch if equal
  • Example (if): if (i == j) h = i + j;
         bne $s0, $s1, Label
         add $s3, $s0, $s1
    Label: ....

37
Control
  • MIPS unconditional branch instruction: j Label
  • Example (if-then-else):
    if (i != j)        beq $s4, $s5, Label1
      h = i + j;         add $s3, $s4, $s5
    else                 j Label2
      h = i - j;   Label1: sub $s3, $s4, $s5
                   Label2: ...

38
Control
  • Example (loop):
    Loop: ----
          i = i + j; if (i != h) go to Loop
          ---
  • Loop: ---
          add $s1, $s1, $s2   # i = i + j
          bne $s1, $s3, Loop
          ---

39
So far
  • Instruction          Meaning
    add $s1,$s2,$s3      $s1 = $s2 + $s3
    sub $s1,$s2,$s3      $s1 = $s2 - $s3
    lw $s1,100($s2)      $s1 = Memory[$s2+100]
    sw $s1,100($s2)      Memory[$s2+100] = $s1
    bne $s4,$s5,L        Next instr. is at L if $s4 != $s5
    beq $s4,$s5,L        Next instr. is at L if $s4 == $s5
    j Label              Next instr. is at Label
  • Formats

R I J
40
Control Flow
  • We have beq, bne; what about branch-if-less-than?
  • New instruction: set on less than
    slt $t0, $s1, $s2   # if $s1 < $s2 then $t0 = 1
                        # else $t0 = 0
  • slt and bne can be used to implement branch on
    less than:
    slt $t0, $s0, $s1
    bne $t0, $zero, Less
  • Note that the assembler needs a register to do
    this; there are register conventions for the
    MIPS assembly language
  • we can now build general control structures

41
MIPS Register Convention
  • $at (1): reserved for assembler
  • $k0, $k1 (26-27): reserved for operating system

42
Procedure calls
  • Procedures and subroutines allow reuse and
    structuring of code
  • Steps
  • Place parameters in a place where the procedure
    can access them
  • Transfer control to the procedure
  • Acquire the storage needed for the procedure
  • Perform the desired task
  • Place the results in a place where the calling
    program can access them
  • Return control to the point of origin

43
Register assignments for procedure calls
  • $a0...$a3: four argument registers for passing
    parameters
  • $v0...$v1: two return value registers
  • $ra: return address register
  • use of argument and return value registers:
    compiler
  • handling of control passing mechanism: machine
  • jump and link instruction: jal ProcAddress
  • saves return address (PC+4) in $ra (the Program
    Counter holds the address of the current
    instruction)
  • loads ProcAddress into the PC
  • return jump: jr $ra
  • loads return address into the PC

44
Stack
  • Used if four argument registers and two return
    value registers are not enough or if nested
    subroutines (a subroutine calls another one) are
    used
  • Can also contain temporary data
  • The stack is a last-in-first-out structure in the
    memory
  • Stack pointer ($sp) points at the top of the
    stack
  • Push and pop instructions
  • MIPS stack grows from higher addresses to lower
    addresses

45
Stack and Stack Pointer
46
Constants
  • Small constants are used quite frequently, e.g.,
    A = A + 5; B = B - 1
  • Solution 1: put constants in memory and load them
  • To add a constant to a register:
    lw $t0, AddrConstant($zero)
    add $sp, $sp, $t0
  • Solution 2: to avoid extra instructions, keep the
    constant inside the instruction itself:
    addi $29, $29, 4    # i means immediate
    slti $8, $18, 10
    andi $29, $29, 6
  • Design principle 4: Make the common case fast.

47
How about larger constants?
  • We'd like to be able to load a 32 bit constant
    into a register
  • Must use two instructions; first a new "load
    upper immediate" instruction:
    lui $t0, 1010101010101010
  • Then must get the lower order bits right, i.e.:
    ori $t0, $t0, 1010101010101010
  • lui:    1010101010101010 0000000000000000
    ori:    0000000000000000 1010101010101010
    result: 1010101010101010 1010101010101010
48
Overview of MIPS
  • simple instructions all 32 bits wide
  • very structured, no unnecessary baggage
  • only three instruction formats
  • rely on compiler to achieve performance; what
    are the compiler's goals?
  • help compiler where we can
  • Formats:
    R: op rs rt rd shamt funct
    I: op rs rt 16-bit address
    J: op 26-bit address
49
Addresses in Branches and Jumps
  • Instructions
  • bne $t4,$t5,Label   # Next instruction is at
                        # Label if $t4 != $t5
  • beq $t4,$t5,Label   # Next instruction is at
                        # Label if $t4 == $t5
  • j Label             # Next instruction is at Label
  • Formats:
    I: op rs rt 16-bit address
    J: op 26-bit address
  • Addresses are not 32 bits. How do we handle
    this with load and store instructions?
50
Addresses in Branches
  • Instructions
  • bne $t4,$t5,Label   # Next instruction is at
                        # Label if $t4 != $t5
  • beq $t4,$t5,Label   # Next instruction is at
                        # Label if $t4 == $t5
  • Format:
    I: op rs rt 16-bit address
  • Could specify a register (like lw and sw) and add
    it to the address
  • use Instruction Address Register (PC = program
    counter)
  • most branches are local (principle of locality)
  • Jump instructions just use high order bits of PC
  • address boundaries of 256 MB
51
MIPS addressing mode summary
  • Register addressing
  • operand in a register
  • Base or displacement addressing
  • operand in the memory
    address is the sum of a register and a constant
    in the instruction
  • Immediate addressing
  • operand is a constant within the instruction
  • PC-relative addressing
  • address is the sum of the PC and a constant in
    the instruction
  • used e.g. in branch instructions
  • Pseudodirect addressing
  • jump address is the 26 bits of the instruction
    concatenated with the upper bits of the PC
  • Additional addressing modes in other computers

52
MIPS addressing mode summary
53
To summarize
54
Assembly Language vs. Machine Language
  • Assembly provides convenient symbolic
    representation
  • much easier than writing down numbers
  • e.g., destination first
  • Machine language is the underlying reality
  • e.g., destination is no longer first
  • Assembly can provide 'pseudoinstructions'
  • e.g., move $t0, $t1 exists only in assembly
  • would be implemented using add $t0, $t1, $zero
  • When considering performance you should count
    real instructions

55
Alternative Architectures
  • Design alternative
  • provide more powerful operations than found in
    MIPS
  • goal is to reduce number of instructions executed
  • danger is a slower cycle time and/or a higher CPI
  • Sometimes referred to as RISC vs. CISC
  • Reduced Instruction Set Computers
  • Complex Instruction Set Computers
  • virtually all new instruction sets since 1982
    have been RISC

56
Reduced Instruction Set Computers
  • Common characteristics of all RISCs
  • Single cycle issue
  • Small number of fixed length instruction formats
  • Load/store architecture
  • Large number of registers
  • Additional characteristics of most RISCs
  • Small number of instructions
  • Small number of addressing modes
  • Fast control unit

57
An alternative architecture 80x86
  • 1978: The Intel 8086 is announced (16 bit
    architecture)
  • 1980: The 8087 floating point coprocessor is
    added
  • 1982: The 80286 increases address space to 24
    bits, adds instructions
  • 1985: The 80386 extends to 32 bits, new
    addressing modes
  • 1989-1995: The 80486, Pentium, Pentium Pro add a
    few instructions (mostly designed for higher
    performance)
  • 1997: MMX is added
  • Intel had a 16-bit microprocessor two years
    before its competitors' more elegant
    architectures, which led to the selection of the
    8086 as the CPU for the IBM PC
  • This history illustrates the impact of the
    "golden handcuffs" of compatibility: an
    architecture that is difficult to explain and
    impossible to love

58
A dominant architecture 80x86
  • See your textbook for a more detailed description
  • Complexity
  • Instructions from 1 to 17 bytes long
  • one operand must act as both a source and
    destination
  • one operand can come from memory
  • complex addressing modes e.g., base or scaled
    index with 8 or 32 bit displacement
  • Saving grace
  • the most frequently used architectural components
    are not too difficult to implement
  • compilers avoid the portions of the architecture
    that are slow

59
Summary
  • Instruction complexity is only one variable
  • lower instruction count vs. higher CPI / lower
    clock rate
  • Design Principles
  • simplicity favours regularity
  • smaller is faster
  • good design demands good compromises
  • make the common case fast
  • Instruction set architecture
  • a very important abstraction indeed!

60
Arithmetic
  • Where we've been
  • Performance (seconds, cycles, instructions)
  • Abstractions Instruction Set Architecture
    Assembly Language and Machine Language
  • What's up ahead
  • Implementing the Architecture

61
Arithmetic
  • We start with the Arithmetic Logic Unit

62
Numbers
  • Bits are just bits (no inherent meaning)
    conventions define relationship between bits and
    numbers
  • Binary numbers (base 2): 0000 0001 0010 0011 0100
    0101 0110 0111 1000 1001 ...
    decimal: 0 ... 2^n - 1
  • Of course it gets more complicated: numbers are
    finite (overflow); fractions and real
    numbers; negative numbers
  • How do we represent negative numbers? I.e.,
    which bit patterns will represent which numbers?
  • Octal and hexadecimal numbers
  • Floating-point numbers

63
Possible Representations of Signed Numbers
  • Sign Magnitude    One's Complement    Two's Complement
    000 = +0          000 = +0            000 =  0
    001 = +1          001 = +1            001 = +1
    010 = +2          010 = +2            010 = +2
    011 = +3          011 = +3            011 = +3
    100 = -0          100 = -3            100 = -4
    101 = -1          101 = -2            101 = -3
    110 = -2          110 = -1            110 = -2
    111 = -3          111 = -0            111 = -1
  • Issues: balance, number of zeros, ease of
    operations.
  • Two's complement is best.

64
MIPS
  • 32 bit signed numbers:
    0000 0000 0000 0000 0000 0000 0000 0000two = 0ten
    0000 0000 0000 0000 0000 0000 0000 0001two = +1ten
    0000 0000 0000 0000 0000 0000 0000 0010two = +2ten
    ...
    0111 1111 1111 1111 1111 1111 1111 1110two = +2,147,483,646ten
    0111 1111 1111 1111 1111 1111 1111 1111two = +2,147,483,647ten
    1000 0000 0000 0000 0000 0000 0000 0000two = -2,147,483,648ten
    1000 0000 0000 0000 0000 0000 0000 0001two = -2,147,483,647ten
    1000 0000 0000 0000 0000 0000 0000 0010two = -2,147,483,646ten
    ...
    1111 1111 1111 1111 1111 1111 1111 1101two = -3ten
    1111 1111 1111 1111 1111 1111 1111 1110two = -2ten
    1111 1111 1111 1111 1111 1111 1111 1111two = -1ten

65
Two's Complement Operations
  • Negating a two's complement number: invert all
    bits and add 1
  • Remember: negate and invert are different
    operations.
  • You negate a number but invert a bit.
  • Converting n-bit numbers into numbers with more
    than n bits:
  • MIPS 16 bit immediate gets converted to 32 bits
    for arithmetic
  • copy the most significant bit (the sign bit) into
    the other bits: 0010 -> 0000 0010;
    1010 -> 1111 1010
  • "sign extension"
  • MIPS load byte instructions:
  • lbu: no sign extension
  • lb: sign extension
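Both operations are easy to sketch in Python (helper names are ours):

```python
def negate(x, bits=32):
    """Two's complement negation: invert all bits, then add 1."""
    mask = (1 << bits) - 1
    return ((x ^ mask) + 1) & mask

def sign_extend(x, from_bits, to_bits):
    """Copy the sign bit into all of the new upper bits."""
    if x & (1 << (from_bits - 1)):                       # negative?
        x |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return x

print(format(negate(2, 4), "04b"))               # 1110 (-2 in 4 bits)
print(format(sign_extend(0b0010, 4, 8), "08b"))  # 00000010
print(format(sign_extend(0b1010, 4, 8), "08b"))  # 11111010
```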

66
Addition and Subtraction
  • Just like in grade school (carry/borrow 1s):
       0111     0111     0110
     + 0110   - 0110   - 0101
       ----     ----     ----
       1101     0001     0001
  • Two's complement operations are easy
  • subtraction using addition of negative numbers:
    0111 + 1010 = 1 0001 (the carry out of the top
    bit is discarded, leaving 0001)
  • Overflow (result too large for finite computer
    word)
  • e.g., adding two n-bit numbers does not yield an
    n-bit number: 0111 + 0001 = 1000 (wrong sign!)

67
Detecting Overflow
  • No overflow when adding a positive and a negative
    number
  • No overflow when signs are the same for
    subtraction
  • Overflow occurs when the value affects the sign
  • overflow when adding two positives yields a
    negative
  • or, adding two negatives gives a positive
  • or, subtract a negative from a positive and get a
    negative
  • or, subtract a positive from a negative and get a
    positive
  • Consider the operations A + B and A - B
  • Can overflow occur if B is 0? No.
  • Can overflow occur if A is 0? Yes.
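The overflow rules above can be checked with a short Python sketch (the helper name is ours; it tests whether the mathematically exact sum fits in the n-bit signed range):

```python
def add_overflows(a, b, bits=4):
    """Signed overflow: the exact sum falls outside [-2^(n-1), 2^(n-1)-1]."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return not lo <= a + b <= hi

print(add_overflows(7, 1))    # True: 0111 + 0001 -> 1000 (negative!)
print(add_overflows(7, -1))   # False: mixed signs never overflow
print(add_overflows(-8, -1))  # True: two negatives give a positive
```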

68
Effects of Overflow
  • An exception (interrupt) occurs
  • Control jumps to predefined address for exception
  • Interrupted address is saved for possible
    resumption
  • Details based on software system / language
  • example flight control vs. homework assignment
  • Don't always want to detect overflow; new MIPS
    instructions: addu, addiu, subu
    (note: addiu still sign-extends;
    note: sltu, sltiu for unsigned comparisons)

69
Logical Operations
  • and, andi: bit-by-bit AND
  • or, ori: bit-by-bit OR
  • sll: shift left logical
  • srl: shift right logical
  • 0101 1010
  • shifting left two steps gives 0110 1000
  • 0110 1010
  • shifting right three steps gives 0000 1101
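Both shifts, sketched in Python on 8-bit values (helper names are ours):

```python
def sll(x, n, bits=8):
    """Shift left logical: high bits fall off, zeros fill the right."""
    return (x << n) & ((1 << bits) - 1)

def srl(x, n):
    """Shift right logical: zeros fill in from the left."""
    return x >> n

print(format(sll(0b01011010, 2), "08b"))  # 01101000
print(format(srl(0b01101010, 3), "08b"))  # 00001101
```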

70
Logical unit
  • Let's build a logical unit to support the and and
    or instructions
  • we'll just build a 1 bit unit, and use 32 of them
  • op = 0: and; op = 1: or
  • Possible implementation (sum-of-products):
    res = a·b + a·op + b·op
71
Review The Multiplexor
  • Selects one of the inputs to be the output,
    based on a control input
  • [Figure: IEC symbol of a 4-input MUX]
  • Let's build our logical unit using a MUX
72
Different Implementations
  • Not easy to decide the best way to build
    something
  • Don't want too many inputs to a single gate
  • Don't want to have to go through too many gates
  • For our purposes, ease of comprehension is
    important
  • We use multiplexors
  • Let's look at a 1-bit ALU for addition
  • How could we build a 1-bit ALU for AND, OR and
    ADD?
  • How could we build a 32-bit ALU?

[Figure: 1-bit adder with inputs a, b, CarryIn and outputs Sum, CarryOut;
 CarryOut = a·b + a·CarryIn + b·CarryIn; Sum = a xor b xor CarryIn]
73
Building a 32 bit ALU for AND, OR and ADD
We need a 4-input MUX.
74
What about subtraction (a - b)?
  • Two's complement approach: just negate b and
    add.
  • A clever solution:
  • In a multiple bit ALU the least significant
    CarryIn has to be equal to 1 for subtraction.

75
Tailoring the ALU to the MIPS
  • Need to support the set-on-less-than instruction
    (slt)
  • remember: slt is an arithmetic instruction
  • produces a 1 if rs < rt and 0 otherwise
  • use subtraction: (a-b) < 0 implies a < b
  • Need to support test for equality (beq $t5, $t6,
    $t7)
  • use subtraction: (a-b) = 0 implies a = b

76
Supporting slt
  • Other ALUs
  • Most significant ALU

77
32 bit ALU supporting slt
a<b is equivalent to a-b<0, thus Set is the sign
bit of the result.
78
Final ALU including test for equality
  • Notice control lines: 000 = and, 001 = or,
    010 = add, 110 = subtract, 111 = slt
  • Note: zero is 1 when the result is zero!

79
Conclusion
  • We can build an ALU to support the MIPS
    instruction set
  • key idea use multiplexor to select the output
    we want
  • we can efficiently perform subtraction using
    twos complement
  • we can replicate a 1-bit ALU to produce a 32-bit
    ALU
  • Important points about hardware
  • all of the gates are always working
  • the speed of a gate is affected by the number of
    inputs to the gate
  • the speed of a circuit is affected by the number
    of gates in series (on the critical path or
    the deepest level of logic)
  • Our primary focus comprehension, however,
  • Clever changes to organization can improve
    performance (similar to using better algorithms
    in software)
  • well look at examples for addition,
    multiplication and division

80
Problem: ripple carry adder is slow
  • A 32-bit ALU is much slower than a 1-bit ALU.
  • There is more than one way to do addition.
  • the two extremes: ripple carry and
    sum-of-products
  • Can you see the ripple? How could you get rid
    of it?
  • c1 = b0·c0 + a0·c0 + a0·b0
  • c2 = b1·c1 + a1·c1 + a1·b1    c2 = c2(a0,b0,c0,a1,b1)
  • c3 = b2·c2 + a2·c2 + a2·b2    c3 = c3(a0,b0,c0,a1,b1,a2,b2)
  • c4 = b3·c3 + a3·c3 + a3·b3    c4 = c4(a0,b0,c0,a1,b1,a2,b2,a3,b3)
  • Not feasible! Too many inputs to the gates.

81
Carry-lookahead adder
  • An approach in-between the two extremes
  • Motivation:
  • If we didn't know the value of carry-in, what
    could we do?
  • When would we always generate a carry?
    gi = ai · bi
  • When would we propagate the carry?
    pi = ai + bi
  • Look at the truth table!
  • Did we get rid of the ripple?
  • c1 = g0 + p0·c0
  • c2 = g1 + p1·c1    c2 = g1 + p1·g0 + p1·p0·c0
  • c3 = g2 + p2·c2    c3 = g2 + p2·g1 + p2·p1·g0 + p2·p1·p0·c0
  • c4 = g3 + p3·c3    c4 = ...
  • Feasible! A smaller number of inputs to the
    gates.
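The generate/propagate recurrence can be sketched in Python (names are ours; the loop stands in for the expanded two-level logic the hardware uses):

```python
def cla_carries(a, b, c0, bits=4):
    """Carry-lookahead carries: g = a AND b (generate),
    p = a OR b (propagate); c_{i+1} = g_i + p_i * c_i."""
    carries, c = [], c0
    for i in range(bits):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, p = ai & bi, ai | bi
        c = g | (p & c)
        carries.append(c)
    return carries              # [c1, c2, ..., c_bits], LSB first

print(cla_carries(0b0111, 0b0110, 0))  # [0, 1, 1, 0]
```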

82
1-bit adder
a b cin | cout sum
0 0 0   |  0    0
0 0 1   |  0    1
0 1 0   |  0    1
0 1 1   |  1    0
1 0 0   |  0    1
1 0 1   |  1    0
1 1 0   |  1    0
1 1 1   |  1    1
83
Use principle to build bigger adders
  • Can't build a 16 bit CLA adder (too big)
  • Could use ripple carry of 4-bit CLA adders
  • Better: use the CLA principle again!
  • Principle shown in the figure. See textbook for
    details.

84
Multiplication
  • More complicated than addition
  • can be accomplished via shifting and addition
  • More time and more area
  • Let's look at 2 versions based on the grammar
    school algorithm:
         0010   (multiplicand)
       x 1011   (multiplier)
         ----
         0010
        0010
       0000
      0010
      -------
      0010110
  • Negative numbers: easy way is to convert to
    positive and multiply
  • there are better techniques
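The grammar school algorithm above, as a Python sketch (names are ours):

```python
def shift_add_multiply(multiplicand, multiplier, bits=4):
    """Grammar-school multiply: for each 1 bit of the multiplier,
    add the correspondingly shifted multiplicand to the product."""
    product = 0
    for i in range(bits):
        if (multiplier >> i) & 1:
            product += multiplicand << i
    return product

print(format(shift_add_multiply(0b0010, 0b1011), "07b"))  # 0010110 (2 x 11 = 22)
```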

85
Multiplication, First Version
86
Multiplication, Final Version
87
Booth's Algorithm
  • The grammar school method was implemented using
    addition and shifting
  • Booth's algorithm also uses subtraction
  • Based on two bits of the multiplier: either add,
    subtract or do nothing; always shift
  • Handles two's complement numbers
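A Python sketch of the idea (names are ours; the explicit shift register of the hardware version is replaced by the loop index):

```python
def booth_multiply(m, r, bits=4):
    """Booth: scan multiplier bit pairs (r_i, r_{i-1}) LSB first;
    10 -> subtract m<<i (a run of 1s starts), 01 -> add m<<i
    (a run of 1s ends), 00/11 -> do nothing; always shift."""
    rbits = r & ((1 << bits) - 1)    # two's complement pattern of r
    product, prev = 0, 0
    for i in range(bits):
        cur = (rbits >> i) & 1
        if (cur, prev) == (1, 0):
            product -= m << i
        elif (cur, prev) == (0, 1):
            product += m << i
        prev = cur
    return product

print(booth_multiply(2, -5))  # -10: works for two's complement multipliers
print(booth_multiply(3, 6))   # 18
```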

88
Fast multipliers
  • Combinational implementations
  • Conventional multiplier algorithm
  • partial products with AND gates
  • adders
  • Lots of modifications
  • Sequential implementations
  • Pipelined multiplier
  • registers between levels of logic
  • result delayed
  • effective speed of multiple multiplications
    increased

89
Four-Bit Binary Multiplication
  Multiplicand:                              B3   B2   B1   B0
  Multiplier:                              x A3   A2   A1   A0
  1st partial product:                      A0B3 A0B2 A0B1 A0B0
  2nd partial product:                 A1B3 A1B2 A1B1 A1B0
  3rd partial product:            A2B3 A2B2 A2B1 A2B0
  4th partial product:       A3B3 A3B2 A3B1 A3B0
  Final product:        P7   P6   P5   P4   P3   P2   P1   P0

90
Classical Implementation
91
Pipelined Multiplier
[Figure: pipelined multiplier with clocked registers (Clk) between levels of logic]
92
Division
  • Simple method:
  • Initialise the remainder with the dividend
  • Start from the most significant end
  • Subtract the divisor from the remainder if
    possible (quotient bit = 1)
  • Shift the divisor to the right and repeat
93
Division, First Version
94
Division, Final Version
Same hardware for multiply and divide.
95
Floating Point (a brief look)
  • We need a way to represent
  • numbers with fractions, e.g., 3.1416
  • very small numbers, e.g., .000000001
  • very large numbers, e.g., 3.15576 × 10^9
  • Representation:
  • sign, exponent, significand:
    (-1)^sign × significand × 2^exponent
  • more bits for significand gives more accuracy
  • more bits for exponent increases range
  • IEEE 754 floating point standard
  • single precision: 8 bit exponent, 23 bit
    significand
  • double precision: 11 bit exponent, 52 bit
    significand

96
IEEE 754 floating-point standard
  • Leading 1 bit of significand is implicit
  • Exponent is biased to make sorting easier
  • all 0s is smallest exponent, all 1s is largest
  • bias of 127 for single precision and 1023 for
    double precision
  • summary: (-1)^sign × (1 + significand)
    × 2^(exponent - bias)
  • Example:
  • decimal: -.75 = -3/4 = -3/2^2
  • binary: -.11 = -1.1 × 2^-1
  • floating point: exponent = -1 + 127 = 126
    = 01111110
  • IEEE single precision:
    1 01111110 10000000000000000000000
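The encoding can be verified from Python with the standard struct module (the field-splitting helper is ours):

```python
import struct

def float_bits(x):
    """IEEE 754 single-precision pattern of x, split into fields."""
    (n,) = struct.unpack(">I", struct.pack(">f", x))
    s = f"{n:032b}"
    return s[0], s[1:9], s[9:]   # sign, exponent, significand

print(float_bits(-0.75))  # ('1', '01111110', '10000000000000000000000')
```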

97
Floating-point addition
  • 1. Shift the significand of the number with the
    lesser exponent right until the exponents match
  • 2. Add the significands
  • 3. Normalise the sum, checking for overflow or
    underflow
  • 4. Round the sum

98
Floating-point addition
99
Floating-point multiplication
  • 1. Add the exponents
  • 2. Multiply the significands
  • 3. Normalise the product, checking for overflow
    or underflow
  • 4. Round the product
  • 5. Find out the sign of the product

100
Floating Point Complexities
  • Operations are somewhat more complicated (see
    text)
  • In addition to overflow we can have underflow
  • Accuracy can be a big problem
  • IEEE 754 keeps two extra bits during intermediate
    calculations, guard and round
  • four rounding modes
  • positive divided by zero yields infinity
  • zero divided by zero yields "not a number" (NaN)
  • other complexities
  • Implementing the standard can be tricky

101
Chapter Four Summary
  • Computer arithmetic is constrained by limited
    precision
  • Bit patterns have no inherent meaning, but
    standards do exist
  • two's complement
  • IEEE 754 floating point
  • Computer instructions determine meaning of the
    bit patterns
  • Performance and accuracy are important so there
    are many complexities in real machines (i.e.,
    algorithms and implementation).
  • We are ready to move on (and implement the
    processor)