EECS 470 - PowerPoint PPT Presentation


1
EECS 470
  • Computer Architecture
  • Lecture 2
  • Coverage: Chapters 1-2

2
A Quantitative Approach
  • Hardware systems performance is generally easy to
    quantify
  • Machine A is 10× faster than Machine B
  • Of course Machine B's advertising will show the
    opposite conclusion
  • Example: Pentium 4 vs. AMD Hammer
  • Many software systems tend to have much more
    subjective performance evaluations.

3
Measuring Performance
  • Use Total Execution Time
  • A is 3 times faster than B for programs P1, P2
  • Issue: emphasizes long-running programs

AM = (1/n) · Σ_{i=1..n} Time_i
4
Measuring Performance
  • Weighted Execution Time
  • What if P1 is executed far more frequently?

Arithmetic mean (AM) = Σ_{i=1..n} Weight_i · Time_i, where Σ_{i=1..n} Weight_i = 1
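The weighted mean above can be sketched in a few lines of Python; the weights below are made up for illustration (P1 executed far more often than P2), while the times come from the CompA example later in the lecture:

```python
def weighted_mean(times, weights):
    """Weighted arithmetic mean of execution times; weights must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * t for w, t in zip(weights, times))

times = [1.0, 1000.0]                         # seconds for P1, P2 (CompA)
print(weighted_mean(times, [0.5, 0.5]))       # equal weights: 500.5
print(weighted_mean(times, [0.9, 0.1]))       # P1 weighted heavily: 100.9
```

Note how shifting weight toward the short program P1 pulls the mean far below the unweighted average.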
5
Measuring Performance
  • Normalized Execution Time
  • Compare machine performance to a reference
    machine and report a ratio.
  • SPEC ratings measure performance relative to a
    reference machine.

6
Example using execution times

         CompA   CompB
Prog1        1      10
Prog2     1000     100
Total     1001     110

Conclusion: B is faster than A; it is 1001/110 ≈
9.1 times faster
7
Averaging Performance Over Benchmarks
  • Arithmetic mean (AM):  (1/n) · Σ_{i=1..n} Time_i
  • Geometric mean (GM):   (Π_{i=1..n} Time_i)^(1/n)
  • Harmonic mean (HM):    n / Σ_{i=1..n} (1/Rate_i)
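The three means can be written directly from the formulas above; here is a minimal Python sketch, applied to CompA's times from the earlier example:

```python
import math

def am(xs):
    """Arithmetic mean: (1/n) * sum."""
    return sum(xs) / len(xs)

def gm(xs):
    """Geometric mean: n-th root of the product."""
    return math.prod(xs) ** (1 / len(xs))

def hm(xs):
    """Harmonic mean: n over the sum of reciprocals."""
    return len(xs) / sum(1 / x for x in xs)

times = [1, 1000]      # CompA's execution times
print(am(times))       # 500.5
print(gm(times))       # ~31.6
print(hm(times))       # ~2.0
```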
8
Which is the right Mean?
  • Arithmetic: when dealing with execution times
  • Harmonic: when dealing with rates
  • flops
  • MIPS
  • Hertz
  • Geometric mean gives an equi-weighted average

9
Use Harmonic Mean with Rates

Work (million flops) and execution times:
          mflops   CompA   CompB   CompC
Prog1        100       1      10      20
Prog2        100    1000     100      20
Total time          1001     110      40

Rates (mflops) from the table above:
          CompA   CompB   CompC
Prog1       100      10       5
Prog2       0.1       1       5
AM        50.05     5.5       5
GM          3.2     3.2       5
HM          0.2     1.8       5

Notice that the total-time ordering is preserved
in the HM of the rates
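A small Python check of the claim, using the slide's numbers (100 million flops of work per program): ranking machines by ascending total time gives the same order as ranking by descending harmonic mean of the rates.

```python
# Rates in mflops for each machine, from the table above
rates = {"CompA": [100, 0.1], "CompB": [10, 1], "CompC": [5, 5]}

# Each program is 100 mflops of work, so time = work / rate
total_time = {m: sum(100 / r for r in rs) for m, rs in rates.items()}
hm_rate = {m: len(rs) / sum(1 / r for r in rs) for m, rs in rates.items()}

by_time = sorted(total_time, key=total_time.get)            # fastest first
by_hm = sorted(hm_rate, key=hm_rate.get, reverse=True)      # highest HM first
assert by_time == by_hm == ["CompC", "CompB", "CompA"]
```

The AM and GM of the rates do not preserve this ordering, which is the point of the slide.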
10
Normalized Times
  • Don't take the AM of normalized execution times

         Time            Normalized to A     Normalized to B
         CompA   CompB     A        B          A        B
Prog1        1      10     1       10        0.1        1
Prog2     1000     100     1      0.1         10        1
AM       500.5    55.0     1     5.05       5.05        1
GM        31.6    31.6     1        1          1        1

Which one is right? Normalized to A, the AM says B is 5.05×
slower; normalized to B, it says A is 5.05× slower. The two
answers contradict each other.
  • GM doesn't track total execution time (last line)
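The contradiction is easy to reproduce in Python with the table's numbers: the AM of the normalized times flips depending on which machine is the reference, while the GM is consistent but blind to total time.

```python
import math

a = [1, 1000]    # CompA times
b = [10, 100]    # CompB times

def am(xs): return sum(xs) / len(xs)
def gm(xs): return math.prod(xs) ** (1 / len(xs))

norm_to_a = [bi / ai for ai, bi in zip(a, b)]   # B normalized to A
norm_to_b = [ai / bi for ai, bi in zip(a, b)]   # A normalized to B

# Each AM claims the *other* machine is ~5.05x slower: a contradiction
print(am(norm_to_a), am(norm_to_b))  # ~5.05 ~5.05

# GM is 1.0 either way: consistent, but it ignores totals (1001 vs 110)
print(gm(norm_to_a), gm(norm_to_b))  # 1.0 1.0
```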

11
Notes on Benchmarks
  • AM ≥ GM ≥ HM
  • GM(Xi) / GM(Yi) = GM(Xi / Yi)
  • The GM is unaffected by normalization; it just
    doesn't track execution time
  • Why does SPEC use it?
  • SPEC: System Performance Evaluation Cooperative
  • http://www.specbench.org/
  • EEMBC: benchmarks for embedded applications
    (Embedded Microprocessor Benchmark Consortium)
  • http://www.eembc.org/
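The GM ratio identity is worth verifying once: the GM of the ratios equals the ratio of the GMs, which is exactly why normalizing before or after averaging makes no difference to the GM. A quick check with the lecture's numbers:

```python
import math

def gm(xs): return math.prod(xs) ** (1 / len(xs))

x = [1, 1000]    # CompA times
y = [10, 100]    # CompB times
ratios = [xi / yi for xi, yi in zip(x, y)]

# GM(Xi) / GM(Yi) == GM(Xi / Yi)
assert abs(gm(x) / gm(y) - gm(ratios)) < 1e-9
```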

12
Amdahl's Law
  • Rule of Thumb: Make the common case faster

Execution time_new = Execution time_old × [ (1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced ]

(Attack the longest-running part until it is no longer
the longest; repeat.)
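Amdahl's Law translates directly into code; the 100 s / 80% / 4× numbers below are made up for illustration:

```python
def amdahl_new_time(t_old, frac_enhanced, speedup_enhanced):
    """Execution time after speeding up a fraction of the work (Amdahl's Law)."""
    return t_old * ((1 - frac_enhanced) + frac_enhanced / speedup_enhanced)

# Speed up 80% of a 100 s program by 4x (hypothetical numbers):
t_new = amdahl_new_time(100, 0.8, 4)   # 100 * (0.2 + 0.2) ≈ 40 s
print(100 / t_new)                     # overall speedup: ~2.5x, not 4x
```

Even an infinite speedup of the enhanced 80% leaves the 20 s of unenhanced work, capping the overall speedup at 5×.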
13
Instruction Set Design
  • Software systems: named variables, complex
    semantics
  • Hardware systems: tight timing requirements,
    small storage structures, simple semantics
  • Instruction set: the interface between these very
    different software and hardware systems

14
Design decisions
  • How much state is in the microarchitecture?
  • Registers, flags, IP/PC
  • How is that state accessed/manipulated?
  • Operand encoding
  • What commands are supported?
  • Opcode and opcode encoding

15
Design Challenges (or: why is architecture still
relevant?)
  • Clock frequency is increasing
  • This changes the number of levels of gates that
    can be completed each cycle, so old designs don't
    work
  • It also tends to increase the ratio of time spent
    on wires (the speed of light is fixed)
  • Power
  • Faster chips are hotter; bigger chips are hotter

16
Design Challenges (cont.)
  • Design complexity
  • More complex designs to fix frequency/power
    issues lead to increased development/testing
    costs
  • Failures (design or transient) can be difficult
    to understand (and fix)
  • We seem far less willing to live with hardware
    errors (e.g. FDIV) than software errors
  • which are often dealt with through upgrades
    (that we pay for!)

17
Techniques for Encoding Operands
  • Explicit operands
  • Include a field to specify which state data is
    referenced
  • Example: a register specifier
  • Implicit operands
  • All state data can be inferred from the opcode
  • Example: function return (CISC-style)

18
Accumulator
  • Architectures with one implicit register
  • Acts as source and/or destination
  • One other source is explicit
  • Example: C = A + B
  • Load A    // (Acc)umulator ← A
  • Add B     // Acc ← Acc + B
  • Store C   // C ← Acc

Ref: Instruction Level Distributed Processing:
Adapting to Shifting Technology
19
Stack
  • Architectures with an implicit stack
  • Acts as source(s) and/or destination
  • Push and Pop operations have 1 explicit operand
  • Example: C = A + B
  • Push A    // Stack: A
  • Push B    // Stack: A, B
  • Add       // Stack: A+B
  • Pop C     // C ← A+B; Stack: empty

Compact encoding, but it may require more
instructions
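The slide's C = A + B sequence can be sketched as a toy stack machine in Python; the memory contents (A = 2, B = 3) are made up for illustration:

```python
# A toy stack machine evaluating C = A + B, following the slide.
mem = {"A": 2, "B": 3}   # hypothetical memory contents
stack = []

def push(var):
    stack.append(mem[var])            # one explicit operand

def add():
    stack.append(stack.pop() + stack.pop())  # operands implicit (top of stack)

def pop(var):
    mem[var] = stack.pop()            # one explicit operand

push("A")   # stack: [A]
push("B")   # stack: [A, B]
add()       # stack: [A+B]
pop("C")    # C <- A+B; stack empty
assert mem["C"] == 5 and stack == []
```

Note that `Add` names no operands at all, which is what makes the encoding compact.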
20
Registers
  • Most general (and common) approach
  • Small array of storage
  • Explicit operands (register file index)
  • Example: C = A + B

  Register-memory       Load/store
  Load R1, A            Load R1, A
  Add R3, R1, B         Load R2, B
  Store R3, C           Add R3, R1, R2
                        Store R3, C

21
Memory
  • Big array of storage
  • More complex ways of indexing than registers
  • Build addressing modes to support efficient
    translation of software abstractions
  • Uses less space in the instruction than a 32-bit
    immediate field
  • A[i]: use base (A) + displacement (i)
    (scaled?)
  • a.ptr: use base (a) + displacement (offset of ptr)

22
Addressing modes
  • Register:          Add R4, R3
  • Immediate:         Add R4, #3
  • Base/Displacement: Add R4, 100(R1)
  • Register Indirect: Add R4, (R1)
  • Indexed:           Add R4, (R1+R2)
  • Direct:            Add R4, (1001)
  • Memory Indirect:   Add R4, @(R3)
  • Autoincrement:     Add R4, (R2)+
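One way to see how each mode locates its source operand is a small Python sketch; all register and memory contents below are made up for illustration, and autoincrement is omitted since it also updates the register as a side effect.

```python
# How each addressing mode finds its source operand (hypothetical contents).
reg = {"R1": 8, "R2": 4, "R3": 7}
mem = {8: 30, 12: 40, 100: 8, 108: 50, 1001: 60}

modes = {
    "register":     lambda: reg["R3"],                   # Add R4, R3
    "immediate":    lambda: 3,                           # Add R4, #3
    "base_disp":    lambda: mem[100 + reg["R1"]],        # Add R4, 100(R1)
    "reg_indirect": lambda: mem[reg["R1"]],              # Add R4, (R1)
    "indexed":      lambda: mem[reg["R1"] + reg["R2"]],  # Add R4, (R1+R2)
    "direct":       lambda: mem[1001],                   # Add R4, (1001)
    "mem_indirect": lambda: mem[mem[100]],               # Add R4, @(100)
}

for name, fetch in modes.items():
    print(name, "->", fetch())
```

Each lambda is the address computation the hardware would perform; note that memory indirect needs two memory accesses.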

23
Other Memory Issues
  • What is the size of each element in memory?

  Byte (at 0x000):      holds 0 – 255
  Half word (at 0x000): holds 0 – 65535
  Word (at 0x000):      holds 0 – ~4B (2^32 − 1)
24
Other Memory Issues
  • Big-endian or Little-endian? Store 0x114488FF at 0x000

  Big-endian (0x000 points to the most significant byte):
    0x000: 11 44 88 FF
  Little-endian (0x000 points to the least significant byte):
    0x000: FF 88 44 11
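Python's `struct` module can reproduce both byte orders for the slide's value:

```python
import struct

value = 0x114488FF
big    = struct.pack(">I", value)   # big-endian: most significant byte first
little = struct.pack("<I", value)   # little-endian: least significant byte first

print(big.hex())     # 114488ff
print(little.hex())  # ff884411
assert little == big[::-1]          # one is the byte-reverse of the other
```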
25
Other Memory Issues
  • Non-word loads? ldb R3, (000)

  Memory at 0x000: 11 44 88 FF
  R3 ← 00 00 00 11
26
Other Memory Issues
  • Non-word loads? ldb R3, (003)

  Memory at 0x000: 11 44 88 FF  (0x003 holds FF)
  R3 ← FF FF FF FF  (sign extended)
27
Other Memory Issues
  • Non-word loads? ldbu R3, (003)

  Memory at 0x000: 11 44 88 FF  (0x003 holds FF)
  R3 ← 00 00 00 FF  (zero filled)
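The difference between the sign-extending `ldb` and zero-filling `ldbu` loads on the last three slides can be modeled in Python, using the same 11 44 88 FF memory contents:

```python
def ldb(byte):
    """Sign-extending byte load: bit 7 set means negative."""
    return byte - 0x100 if byte & 0x80 else byte

def ldbu(byte):
    """Zero-filled (unsigned) byte load."""
    return byte

mem = [0x11, 0x44, 0x88, 0xFF]            # memory from the slides
assert ldb(mem[0]) == 0x11                # ldb R3, (000) -> 0x00000011
assert ldb(mem[3]) == -1                  # ldb R3, (003) -> 0xFFFFFFFF
assert ldb(mem[3]) & 0xFFFFFFFF == 0xFFFFFFFF
assert ldbu(mem[3]) == 0xFF               # ldbu R3, (003) -> 0x000000FF
```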
28
Other Memory Issues
  • Alignment? Word accesses: only addresses ending
    in 00
  • Half-word accesses: only addresses ending in 0
  • Byte accesses: any address

  Memory at 0x000: 11 44 88 FF
  ldw R3, (002) is illegal!

Why is it important to be aligned? How can it be
enforced?
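The alignment rule amounts to an address-modulo check, which is one way hardware can enforce it (trap when the check fails). A minimal sketch:

```python
def is_aligned(addr, size):
    """Word (4B) addresses must end in 00, half-words in 0, bytes anywhere."""
    return addr % size == 0

assert is_aligned(0x000, 4)        # ldw at 0x000: legal
assert not is_aligned(0x002, 4)    # ldw R3, (002): illegal, not word-aligned
assert is_aligned(0x002, 2)        # half-word at 0x002: legal
assert is_aligned(0x003, 1)        # byte access: any address
```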
29
Techniques for Encoding Operators
  • The opcode is translated to control signals that
  • direct data (MUX control)
  • select the operation for the ALU
  • set read/write selects for register/memory/PC
  • Tradeoff between how flexible the control is and
    how compact the opcode encoding is
  • Microcode: direct control of signals (Improv)
  • Opcode: compact representation of a set of
    control signals
  • You can make decode easier with careful opcode
    selection (as done in HW1)

30
Handling Control Flow
  • Conditional branches (short range)
  • Unconditional branches (jumps)
  • Function calls
  • Returns
  • Traps (OS calls and exceptions)
  • Predicates (conditional retirement)

31
Encoding branch targets
  • PC-relative addressing
  • Makes linking code easier
  • Indirect addressing
  • Jumps into shared libraries, virtual functions,
    case/switch statements
  • Some unusual modes to simplify target address
    calculation
  • (segment + offset) or (trap number)

32
Condition codes
  • Flags
  • Implicit flag(s) specified in the opcode (bgt)
  • Flag(s) set by earlier instructions (compare,
    add, etc.)
  • Register
  • Uses a register; requires an explicit specifier
  • Comparison operation
  • Two registers with the compare operation specified
    in the opcode

33
Higher Level Semantics: Functions
  • Function call semantics
  • Save PC + 1 instruction (the return address)
  • Manage parameters
  • Allocate space on the stack
  • Jump to the function
  • Simple approach
  • Use a jump instruction plus other instructions
  • Complex approach
  • Build the implicit operations into a new call
    instruction

34
Role of the Compiler
  • Compilers make the complexity of the ISA (from
    the programmer's point of view) less relevant.
  • Non-orthogonal ISAs are more challenging.
  • State allocation (register allocation) is better
    left to compiler heuristics
  • Complex semantics lead to more global
    optimization, which is easier for a machine to do.

People are good at optimizing 10 lines of
code. Compilers are good at optimizing 10M lines.
35
Next time
  • Compiler optimizations
  • Interaction between compilers and architectures
  • Higher-level machine codes (Java VM)
  • Starting pipelining (Appendix A)