Title: Computer Architecture
Computer Architecture
- Chapter 1
- Fundamentals
- Prof. Jerry Breecher
- CSCI 240
- Fall 2003
Introduction
- 1.1 Introduction
- 1.2 The Task of a Computer Designer
- 1.3 Technology and Computer Usage Trends
- 1.4 Cost and Trends in Cost
- 1.5 Measuring and Reporting Performance
- 1.6 Quantitative Principles of Computer Design
- 1.7 Putting It All Together: The Concept of Memory Hierarchy
Art and Architecture
What's the difference between Art and Architecture?
Lyonel Feininger, Marktkirche in Halle
Art and Architecture
Notre Dame de Paris
What's the difference between Art and Architecture?
What's Computer Architecture?
- "The attributes of a computing system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation."
  - Amdahl, Blaauw, and Brooks, 1964
What's Computer Architecture?
- 1950s to 1960s Computer Architecture Course: Computer Arithmetic.
- 1970s to mid 1980s Computer Architecture Course: Instruction Set Design, especially ISA appropriate for compilers. (What we'll do in Chapter 2)
- 1990s to 2000s Computer Architecture Course: Design of CPU, memory system, I/O system, Multiprocessors. (All evolving at a tremendous rate!)
The Task of a Computer Designer
Figure: the design cycle: Evaluate Existing Systems for Bottlenecks, Simulate New Designs and Organizations, Implement Next Generation System, shaped by Benchmarks, Workloads, Technology Trends, and Implementation Complexity.
Technology and Computer Usage Trends
- When building a cathedral, numerous very practical considerations need to be taken into account:
  - available materials
  - worker skills
  - willingness of the client to pay the price.
- Similarly, Computer Architecture is about working within constraints:
  - What will the market buy?
  - Cost/Performance
  - Tradeoffs in materials and processes
Trends
- Gordon Moore (co-founder of Intel) observed in 1965 that the number of transistors that could be crammed onto a chip doubled every year.
- The trend has CONTINUED since then, although in 1975 Moore revised the rate to a doubling about every two years.
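The doubling period matters a great deal once it compounds. The sketch below is illustrative arithmetic only. It compares a one-year doubling period (Moore's 1965 observation) with a two-year period (his 1975 revision) over a hypothetical ten-year span. `growth_factor` is a name introduced here, not taken from the slides.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Multiplier on transistor count after `years`, assuming a fixed doubling period."""
    return 2.0 ** (years / doubling_period_years)


# Hypothetical 10-year span: one-year vs. two-year doubling periods.
for period in (1.0, 2.0):
    factor = growth_factor(10, period)
    print(f"doubling every {period:g} year(s): x{factor:.0f} after 10 years")
```

Over ten years, the one-year period gives a 1024x increase, while the two-year period gives 32x.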
Trends
- Processor performance, as measured by the SPEC benchmark, has also risen dramatically.
Trends
- Memory capacity (and cost) has changed dramatically in the last 20 years.

Year   Size (Mb)   Cycle time
1980   0.0625      250 ns
1983   0.25        220 ns
1986   1           190 ns
1989   4           165 ns
1992   16          145 ns
1996   64          120 ns
2000   256         100 ns
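The contrast in the table is easier to see as compound annual rates. This short sketch uses only the first and last rows. From 1980 to 2000, DRAM capacity grew 4096x (roughly 52% per year), while cycle time improved only 2.5x (under 5% per year). The helper name `annual_rate` is hypothetical.

```python
# Data points from the table above: (year, size in Mb, cycle time in ns)
DRAM = [
    (1980, 0.0625, 250), (1983, 0.25, 220), (1986, 1, 190), (1989, 4, 165),
    (1992, 16, 145), (1996, 64, 120), (2000, 256, 100),
]


def annual_rate(start: float, end: float, years: float) -> float:
    """Compound annual growth factor that turns `start` into `end` over `years`."""
    return (end / start) ** (1.0 / years)


first, last = DRAM[0], DRAM[-1]
years = last[0] - first[0]
capacity_growth = annual_rate(first[1], last[1], years)
# A shorter cycle time is better, so measure improvement as old / new.
speed_growth = annual_rate(last[2], first[2], years)
print(f"capacity:   x{capacity_growth:.3f} per year")
print(f"cycle time: x{speed_growth:.3f} faster per year")
```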
Trends
- In terms of SPEED, the CPU has improved dramatically, but memory and disk have improved only a little. This has led to dramatic changes in architecture, operating systems, and programming practices.
         Capacity        Speed (latency)
Logic    2x in 3 years   2x in 3 years
DRAM     4x in 3 years   2x in 10 years
Disk     4x in 3 years   2x in 10 years
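The widening processor-memory gap follows directly from these two speed rates. The sketch below assumes both rates hold steadily, which is a simplification. Under that assumption the gap grows by a factor of 2^(1/3) / 2^(1/10), about 1.18 per year, or roughly 5x per decade. `improvement` is a hypothetical helper name.

```python
def improvement(factor: float, every_years: float, years: float) -> float:
    """Total improvement after `years` when speed multiplies by `factor` every `every_years`."""
    return factor ** (years / every_years)


for years in (3, 10, 20):
    logic = improvement(2, 3, years)   # logic speed: 2x every 3 years
    dram = improvement(2, 10, years)   # DRAM latency: 2x every 10 years
    print(f"{years:>2} years: logic x{logic:.1f}, DRAM x{dram:.2f}, gap x{logic / dram:.1f}")
```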
Measuring And Reporting Performance
- This section talks about:
  - Metrics: how do we describe the performance of a computer numerically?
  - What tools do we use to find those metrics?
Metrics
- Time to run the task (ExTime)
  - Execution time, response time, latency
- Tasks per day, hour, week, sec, ns (Performance)
  - Throughput, bandwidth
Metrics - Comparisons
- "X is n times faster than Y" means
$$n = \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)}$$
- Speed of Concorde vs. Boeing 747
- Throughput of Boeing 747 vs. Concorde
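The definition translates directly into code. Performance is the reciprocal of execution time, so both forms of the ratio agree. The timings below are hypothetical: X takes 10 s and Y takes 15 s.

```python
def times_faster(extime_x: float, extime_y: float) -> float:
    """How many times faster X is than Y: ExTime(Y) / ExTime(X)."""
    return extime_y / extime_x


def performance(extime: float) -> float:
    """Performance is the reciprocal of execution time."""
    return 1.0 / extime


# Hypothetical timings: X finishes a task in 10 s, Y needs 15 s.
n = times_faster(10.0, 15.0)
# The same n results from the ratio of performances.
assert abs(n - performance(10.0) / performance(15.0)) < 1e-12
print(f"X is {n} times faster than Y")  # prints: X is 1.5 times faster than Y
```

Whether "faster" refers to latency or throughput depends on which execution time you plug in. The Concorde and 747 comparisons differ for exactly that reason.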
Metrics - Comparisons
- Pat has developed a new product, "rabbit," whose performance she wishes to determine. There is special interest in comparing the new product, rabbit, to the old product, turtle, since the product was rewritten for performance reasons. (Pat had used Performance Engineering techniques and thus knew that rabbit was "about twice as fast" as turtle.) The measurements showed:

Performance Comparisons

Product   Transactions / second   Seconds / transaction   Seconds to process transaction
Turtle    30                      0.0333                  3
Rabbit    60                      0.0167                  1

- Which of the following statements reflect the performance comparison of rabbit and turtle?
  o Rabbit is 100% faster than turtle.
  o Rabbit is twice as fast as turtle.
  o Rabbit takes 1/2 as long as turtle.
  o Rabbit takes 1/3 as long as turtle.
  o Rabbit takes 100% less time than turtle.
  o Rabbit takes 200% less time than turtle.
  o Turtle is 50% as fast as rabbit.
  o Turtle is 50% slower than rabbit.
  o Turtle takes 200% longer than rabbit.
  o Turtle takes 300% longer than rabbit.
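The table reports both a throughput (transactions/second) and a latency (seconds to process a transaction), and the two metrics give different ratios. This sketch computes both, so each statement can be checked against the metric it implicitly uses. The percentage conventions below are assumptions: "n% faster" means (ratio - 1) x 100, and percent-more/less time is taken relative to the stated baseline.

```python
# Measurements from the table: transactions/second and seconds to process one transaction.
turtle = {"throughput_tps": 30, "latency_s": 3}
rabbit = {"throughput_tps": 60, "latency_s": 1}

throughput_ratio = rabbit["throughput_tps"] / turtle["throughput_tps"]
latency_ratio = turtle["latency_s"] / rabbit["latency_s"]


def percent_faster(ratio: float) -> float:
    """'n% faster' from a speed ratio, using the (ratio - 1) * 100 convention."""
    return (ratio - 1.0) * 100.0


def percent_less_time(old: float, new: float) -> float:
    """Time saved, relative to the old (slower) time."""
    return (old - new) / old * 100.0


def percent_more_time(slow: float, fast: float) -> float:
    """Extra time taken, relative to the faster product's time."""
    return (slow - fast) / fast * 100.0


print(f"throughput ratio: {throughput_ratio}")
print(f"latency ratio:    {latency_ratio}")
print(f"by throughput, rabbit is {percent_faster(throughput_ratio):.0f}% faster")
print(f"by latency, rabbit takes {percent_less_time(turtle['latency_s'], rabbit['latency_s']):.1f}% less time")
print(f"by latency, turtle takes {percent_more_time(turtle['latency_s'], rabbit['latency_s']):.0f}% longer")
```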
Metrics - Throughput
Methods For Predicting Performance
- Benchmarks, Traces, Mixes
- Hardware Cost, delay, area, power estimation
- Simulation (many levels)
- ISA, RT, Gate, Circuit
- Queuing Theory
- Rules of Thumb
- Fundamental Laws/Principles
Benchmarks
SPEC: System Performance Evaluation Cooperative
- First Round 1989
  - 10 programs yielding a single number (SPECmarks)
- Second Round 1992
  - SPECint92 (6 integer programs) and SPECfp92 (14 floating point programs)
  - Compiler flags unlimited. March 93 of DEC 4000 Model 610:
    - spice unix.c/def(sysv,has_bcopy,bcopy(a,b,c) memcpy(b,a,c)
    - wave5 /ali(all,dcomnat)/aga/ur4/ur200
    - nasa7 /norecu/aga/ur4/ur2200/lcblas
- Third Round 1995
  - new set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)
  - benchmarks useful for 3 years
  - Single flag setting for all programs: SPECint_base95, SPECfp_base95
Benchmarks
CINT2000 (Integer Component of SPEC CPU2000)
- Program Language What Is It
- 164.gzip C Compression
- 175.vpr C FPGA Circuit Placement and Routing
- 176.gcc C C Programming Language Compiler
- 181.mcf C Combinatorial Optimization
- 186.crafty C Game Playing: Chess
- 197.parser C Word Processing
- 252.eon C Computer Visualization
- 253.perlbmk C PERL Programming Language
- 254.gap C Group Theory, Interpreter
- 255.vortex C Object-oriented Database
- 256.bzip2 C Compression
- 300.twolf C Place and Route Simulator
http://www.spec.org/osg/cpu2000/CINT2000/
Benchmarks
CFP2000 (Floating Point Component of SPEC CPU2000)
- Program Language What Is It
- 168.wupwise Fortran 77 Physics / Quantum Chromodynamics
- 171.swim Fortran 77 Shallow Water Modeling
- 172.mgrid Fortran 77 Multi-grid Solver: 3D Potential Field
- 173.applu Fortran 77 Parabolic / Elliptic Differential Equations
- 177.mesa C 3-D Graphics Library
- 178.galgel Fortran 90 Computational Fluid Dynamics
- 179.art C Image Recognition / Neural Networks
- 183.equake C Seismic Wave Propagation Simulation
- 187.facerec Fortran 90 Image Processing: Face Recognition
- 188.ammp C Computational Chemistry
- 189.lucas Fortran 90 Number Theory / Primality Testing
- 191.fma3d Fortran 90 Finite-element Crash Simulation
- 200.sixtrack Fortran 77 High Energy Physics Accelerator Design
- 301.apsi Fortran 77 Meteorology: Pollutant Distribution
http://www.spec.org/osg/cpu2000/CFP2000/
Benchmarks
Sample Results For SpecINT2000
http://www.spec.org/osg/cpu2000/results/res2000q3/cpu2000-20000718-00168.asc

Benchmarks     Base Ref Time  Base Run Time  Base Ratio  Peak Ref Time  Peak Run Time  Peak Ratio
164.gzip       1400           277            505         1400           270            518
175.vpr        1400           419            334         1400           417            336
176.gcc        1100           275            399         1100           272            405
181.mcf        1800           621            290         1800           619            291
186.crafty     1000           191            522         1000           191            523
197.parser     1800           500            360         1800           499            361
252.eon        1300           267            486         1300           267            486
253.perlbmk    1800           302            596         1800           302            596
254.gap        1100           249            442         1100           248            443
255.vortex     1900           268            710         1900           264            719
256.bzip2      1500           389            386         1500           375            400
300.twolf      3000           784            382         3000           776            387

SPECint_base2000 = 438
SPECint2000 = 442

Intel OR840 (1 GHz Pentium III processor)
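The numbers in this table can be reproduced. In SPEC CPU2000, each ratio is 100 x (reference time / run time), and the summary score is the geometric mean of the per-benchmark ratios. The sketch below recomputes the base score from the listed times. The run times are rounded to whole seconds, so individual ratios can be off by a point: 176.gcc gives exactly 400 here versus the published 399. The geometric mean still lands within a point of the published 438. The helper names are hypothetical.

```python
import math

# Base results for the Intel OR840 run in the table: (benchmark, ref time s, run time s)
BASE_RESULTS = [
    ("164.gzip", 1400, 277), ("175.vpr", 1400, 419), ("176.gcc", 1100, 275),
    ("181.mcf", 1800, 621), ("186.crafty", 1000, 191), ("197.parser", 1800, 500),
    ("252.eon", 1300, 267), ("253.perlbmk", 1800, 302), ("254.gap", 1100, 249),
    ("255.vortex", 1900, 268), ("256.bzip2", 1500, 389), ("300.twolf", 3000, 784),
]


def spec_ratio(ref_time: float, run_time: float) -> float:
    """SPEC CPU2000 ratio: 100 x (reference time / measured run time)."""
    return 100.0 * ref_time / run_time


def geometric_mean(values: list[float]) -> float:
    """Geometric mean computed via logarithms to avoid overflowing a long product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


ratios = [spec_ratio(ref, run) for _, ref, run in BASE_RESULTS]
for (name, _, _), r in zip(BASE_RESULTS, ratios):
    print(f"{name:<12} {r:6.1f}")
print(f"geometric mean: {geometric_mean(ratios):.1f}")
```

A geometric mean has a useful property here: the ranking of two machines does not depend on which machine is chosen as the reference.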
Benchmarks
Performance Evaluation
- For better or worse, benchmarks shape a field.
- Good products are created when we have:
  - good benchmarks
  - good ways to summarize performance
- Given that sales are in part a function of performance relative to the competition, companies invest in improving the product as reported by the performance summary.
- If the benchmarks or summary are inadequate, a company must choose between improving the product for real programs and improving the product to get more sales. Sales almost always wins!
- Execution time is the measure of computer performance!
Benchmarks
How to Summarize Performance
- Management would like to have one number.
- Technical people want more:
  - They want evidence of reproducibility: there should be enough information that you or someone else can repeat the experiment.
  - There should be consistency when the measurements are repeated multiple times.
How would you report these results?
Quantitative Principles of Computer Design
Make the common case fast.
Amdahl's Law relates the total speedup of a system to the speedup of some portion of that system.
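Amdahl's Law
Quantitative Design
Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime without } E}{\text{ExTime with } E} = \frac{\text{Performance with } E}{\text{Performance without } E}$$
- Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.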
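Amdahl's Law
Quantitative Design
$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[(1 - F) + \frac{F}{S}\right]$$
$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - F) + \frac{F}{S}}$$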
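A minimal Python sketch of the overall-speedup formula follows. It assumes F is a fraction of the original execution time and S > 0. The loop shows the key consequence: no matter how large S gets, the overall speedup is capped at 1 / (1 - F). `amdahl_speedup` is a hypothetical name.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction F of execution time is sped up by a factor S."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    # New time as a fraction of old time: unaffected part + accelerated part.
    new_time_fraction = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time_fraction


# With F = 0.5, even an enormous S cannot push the overall speedup past 2.
for s in (2, 10, 100, 1_000_000):
    print(f"S = {s:>9}: overall speedup = {amdahl_speedup(0.5, s):.3f}")
```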
Amdahl's Law
Quantitative Design
- Floating point instructions are improved to run 2X faster, but only 10% of actual instructions are FP.
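Working these numbers through Amdahl's Law gives the classic result. This treats the 10% FP share as a fraction of execution time. That is an assumption: the slide states it as a share of instructions, and the two coincide only if FP and non-FP instructions take equally long on average.

$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left(0.9 + \frac{0.1}{2}\right) = 0.95 \times \text{ExTime}_{\text{old}}$$
$$\text{Speedup}_{\text{overall}} = \frac{1}{0.95} \approx 1.053$$

Doubling the speed of FP work buys only about a 5% overall improvement.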
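Quantitative Design
Cycles Per Instruction
$$\text{CPI} = \frac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} = \frac{\text{Cycles}}{\text{Instruction Count}}$$
$$\text{CPU Time} = \text{Cycle Time} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i$$
where $I_i$ is the number of instructions of type $i$.
$$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i, \qquad F_i = \frac{I_i}{\text{Instruction Count}}$$
where $F_i$ is the instruction frequency.
- Invest Resources where time is Spent!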
Quantitative Design
Cycles Per Instruction
Suppose we have a machine where we can count the
frequency with which instructions are executed.
We also know how many cycles it takes for each
instruction type.
Base Machine (Reg / Reg)

Op       Freq   Cycles   CPI(i)   (% Time)
ALU      50%    1        .5       (33%)
Load     20%    2        .4       (27%)
Store    10%    2        .2       (13%)
Branch   20%    2        .4       (27%)
Total CPI                1.5

How do we get CPI(i)? How do we get % Time?
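Both questions can be answered in a few lines. CPI(i) is the frequency times the cycle count, and each op's share of time is its CPI(i) divided by the total CPI. The sketch below assumes the frequencies are fractions of the dynamic instruction count. `cpi_breakdown` and `cpu_time` are hypothetical helper names, and the 1 GHz and 10^9-instruction figures in the test are illustrative only.

```python
# Instruction mix for the base machine in the table: (op, frequency, cycles)
MIX = [("ALU", 0.50, 1), ("Load", 0.20, 2), ("Store", 0.10, 2), ("Branch", 0.20, 2)]


def cpi_breakdown(mix):
    """Return total CPI, each op's CPI(i) = F_i * CPI_i, and each op's share of time."""
    contributions = {op: freq * cycles for op, freq, cycles in mix}
    total = sum(contributions.values())
    shares = {op: c / total for op, c in contributions.items()}
    return total, contributions, shares


def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = Instruction Count x CPI / Clock Rate (i.e. x Cycle Time)."""
    return instruction_count * cpi / clock_rate_hz


total, contrib, shares = cpi_breakdown(MIX)
print(f"Total CPI = {total:.2f}")
for op in contrib:
    print(f"{op:<6} CPI(i) = {contrib[op]:.1f}  time = {shares[op]:.0%}")
```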
Quantitative Design
Locality of Reference
- Programs access a relatively small portion of the address space at any instant of time.
- There are two different types of locality:
  - Temporal Locality (locality in time): If an item is referenced, it will tend to be referenced again soon (loops, reuse, etc.)
  - Spatial Locality (locality in space/location): If an item is referenced, items whose addresses are close by tend to be referenced soon (straight-line code, array access, etc.)
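Both kinds of locality show up clearly in a small simulation. The sketch below uses a toy direct-mapped cache with hypothetical parameters (64 lines, 8 words per block) and runs three address traces through it. A sequential walk exploits spatial locality. A stride of one block defeats spatial locality. Re-reading a small array exploits temporal locality. This is an illustration, not a model of any real machine.

```python
def hit_rate(addresses, num_lines=64, words_per_block=8):
    """Fraction of word addresses that hit in a toy direct-mapped cache.

    The cache starts empty; on a miss, the whole block containing the word is loaded.
    """
    tags = [None] * num_lines
    hits = 0
    for addr in addresses:
        block = addr // words_per_block   # memory block holding this word
        line = block % num_lines          # cache line the block maps to
        tag = block // num_lines          # distinguishes blocks sharing a line
        if tags[line] == tag:
            hits += 1
        else:
            tags[line] = tag              # miss: fetch the block into the line
    return hits / len(addresses)


sequential = list(range(4096))          # spatial locality: neighbours reused
strided = list(range(0, 4096, 8))       # one word per block: no spatial reuse
loop = list(range(256)) * 10            # temporal locality: small array reused
for name, trace in (("sequential", sequential), ("strided", strided), ("loop", loop)):
    print(f"{name:<10} hit rate = {hit_rate(trace):.4f}")
```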
The Concept of Memory Hierarchy
Fast memory is expensive; slow memory is cheap. The goal is to get the best price/performance at a particular price point.
Memory Hierarchy
Registers
Level 1 cache
Level 2 Cache
Memory
Disk
Memory Hierarchy
- Hit: data appears in some block in the upper level (example: Block X)
  - Hit Rate: the fraction of memory accesses found in the upper level
  - Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss
- Miss: data needs to be retrieved from a block in the lower level (Block Y)
  - Miss Rate = 1 - (Hit Rate)
  - Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
- Hit Time << Miss Penalty (500 instructions on 21264!)
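These definitions combine into the standard average memory access time formula: AMAT = Hit Time + Miss Rate x Miss Penalty. That formula is not stated on the slide. The numbers below are hypothetical: a 1-cycle hit time and a 500-cycle miss penalty, loosely echoing the 21264 remark. With a penalty that large, even a few percent of misses dominates the average.

```python
def amat(hit_time: float, hit_rate: float, miss_penalty: float) -> float:
    """Average memory access time = Hit Time + Miss Rate * Miss Penalty."""
    miss_rate = 1.0 - hit_rate
    return hit_time + miss_rate * miss_penalty


# Hypothetical numbers: 1-cycle hit time, 500-cycle miss penalty.
for hr in (0.90, 0.95, 0.99):
    print(f"hit rate {hr:.0%}: {amat(1, hr, 500):.1f} cycles per access")
```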
Memory Hierarchy
- What is the cost of executing a program if:
  - Stores are free (there's a write pipe)
  - Loads are 20% of all instructions
  - 80% of loads hit (are found) in the Level 1 cache
  - 97% of loads hit in the Level 2 cache?
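The slide does not give access times for each level, so the sketch below takes them as parameters and plugs in hypothetical values (L1 = 1, L2 = 10, memory = 100 cycles). It also assumes one reading of the 97% figure: 97% of all loads are found in L1 or L2, so 17% are served by L2 and 3% go to memory. Stores are treated as free, as stated. Under those assumptions, loads cost about 1.1 cycles per instruction. The function name is hypothetical.

```python
def load_cycles_per_instruction(
    load_fraction: float,
    l1_hit: float,
    l1_or_l2_hit: float,
    l1_time: float,
    l2_time: float,
    memory_time: float,
) -> float:
    """Average cycles per instruction spent on loads (stores assumed free).

    Assumption: `l1_or_l2_hit` is the fraction of ALL loads found in L1 or L2,
    and each *_time is the total latency of a load served by that level.
    """
    served_by_l1 = l1_hit
    served_by_l2 = l1_or_l2_hit - l1_hit
    served_by_memory = 1.0 - l1_or_l2_hit
    avg_load_time = (
        served_by_l1 * l1_time
        + served_by_l2 * l2_time
        + served_by_memory * memory_time
    )
    return load_fraction * avg_load_time


# Hypothetical latencies (not given on the slide): L1 = 1, L2 = 10, memory = 100 cycles.
cost = load_cycles_per_instruction(0.20, 0.80, 0.97, 1, 10, 100)
print(f"load cost: {cost:.2f} cycles per instruction")
```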
Wrap Up
- 1.1 Introduction
- 1.2 The Task of a Computer Designer
- 1.3 Technology and Computer Usage Trends
- 1.4 Cost and Trends in Cost
- 1.5 Measuring and Reporting Performance
- 1.6 Quantitative Principles of Computer Design
- 1.7 Putting It All Together: The Concept of Memory Hierarchy