Title: Course Introduction and Overview
1Course Introduction and Overview
- Pradondet Nilagupta
- Spring 2005
- (original notes from
- Prof. Randy Katz,
- Prof. Dan Connors,
- Prof. Amirali Baniasadi)
2Course Info
- Lecturer: Pradondet Nilagupta
- Email: pom_at_ku.ac.th
- Homepage: http://www.cpe.ku.ac.th/pom/courses/205521/204521.html
- Phone: 9428555 ext. 1401
- Office: Rm. 406, Building 15, Engineering Faculty
- Office hours: Wed. 4-6 PM, or make an appointment
- Lecture hours: Wed. 6-9 PM
3References Textbook
- Main Textbook (required)
- Computer Architecture: A Quantitative Approach, 3rd Edition, John L. Hennessy, David A. Patterson, Morgan Kaufmann, 2003.
- Supplementary Texts
- Advanced Computer Architectures: A Design Space Approach, Dezso Sima, Terence Fountain, Peter Kacsuk, Addison-Wesley, 1997.
- Computer Systems Design and Architecture, Vincent P. Heuring, Harry F. Jordan, Addison-Wesley, 1997.
- Computer Organization, V. Carl Hamacher, Zvonko G. Vranesic, Safwat G. Zaky, McGraw-Hill, 1996.
4Grading
- 25% Homework
- 30% Midterm Exam
- 30% Final Exam
- 15% Paper Presentation
5Topic Coverage
- Fundamentals of Computer Design (Chapter 1)
- Instruction Set Principles (Chapter 2)
- Pipelining: Basic and Intermediate Concepts (Appendix A)
- Instruction-Level Parallelism (Chapter 3, 4)
- Memory Hierarchy Design (Chapter 5)
- Storage System (Chapter 7)
- Computer Arithmetic (Appendix H)
- Vector Processors (Appendix G)
6Related Courses
Figure: course sequence; Digital Org. is a strong prerequisite for Comp. Arch.
- Digital Org.: how to build it, implementation details
- Comp. Arch.: why, analysis, evaluation
- Parallel: parallel architectures, languages, systems
7Course Focus (1/2)
- To understand the design techniques, machine structures, technology factors, and evaluation methods that will determine the form of computers in the 21st century
Figure: Computer Architecture (Instruction Set Design, Organization, Hardware) at the center, surrounded by the areas that shape it:
- Parallelism
- Technology
- Programming Languages
- Operating Systems
- Applications
- History
- Interface Design (Inst. Set Arch.)
- Measurement and Evaluation
8Computer Architecture Is
- "... the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation."
- Amdahl, Blaauw, and Brooks, 1964
9Computer Architecture's Changing Definition
- 1950s to 1960s Computer Architecture Course: Computer Arithmetic
- 1970s to mid 1980s Computer Architecture Course: Instruction Set Design, especially ISA appropriate for compilers
- 1990s Computer Architecture Course: Design of CPU, memory system, I/O system, Multiprocessors, Networks
- 2010s Computer Architecture Course: Self-adapting systems? Self-organizing structures? DNA Systems/Quantum Computing?
10Computer Architecture Topics (1/2)
Figure: topic map, from I/O and storage down to the processor core.
- Input/Output and Storage: Disks, WORM, Tape; RAID; Emerging Technologies
- Memory Hierarchy: DRAM (Interleaving, Bus protocols); L2 Cache; L1 Cache; Coherence, Bandwidth, Latency
- VLSI
- Instruction Set Architecture: Addressing, Protection, Exception Handling
- Pipelining and Instruction Level Parallelism: Pipelining, Hazard Resolution, Superscalar, Reordering, Prediction, Speculation
11Computer Architecture Topics (2/2)
Figure: processor (P) and memory (M) pairs connected through a switch (S) by an interconnection network.
- Processor-Memory-Switch
- Multiprocessors, Networks and Interconnections
- Topologies, Routing, Bandwidth, Latency, Reliability
12- Throughout this text we will focus on optimizing
machine cost per performance
13Computer Architecture
- Role of a computer architect
- To design and engineer the various levels of a
computer system to maximize performance and
programmability within limits of technology and
cost
14Levels of Abstraction
S/W and H/W consist of hierarchical layers of abstraction; each layer hides the details of the lower layers from the layer above.
The instruction set architecture abstracts the H/W and S/W interface and allows many implementations of varying cost and performance to run the same S/W.
15The Task of Computer Designer
- determine what attributes are important for a new machine
- design a machine to maximize cost-performance
- What are these tasks?
- instruction set design
- functional organization
- logic design
- implementation
- IC design, packaging, power, cooling
16Instruction Set Architecture (ISA)
- refers to the actual programmer-visible instruction set
- serves as the boundary between software and hardware
- must be designed to survive changes in hardware technology, software technology, and application characteristics
- e.g., 80xx, 68xxx, 80x86
17Instruction Set Architecture (ISA)
- Instruction set
- Complete set of instructions used by a machine
- ISA
- Abstract interface between the HW and lowest-level SW. It encompasses the information needed to write machine-language programs, including
- Instructions
- Memory size
- Registers used
- . . .
18Instruction Set Architecture (ISA)
- ISA is considered part of the SW
- Several implementations for the same ISA can exist
- Modern ISAs
- 80x86/Pentium/K6, PowerPC, DEC Alpha, MIPS, SPARC, HP
- We are going to study MIPS
- Advantages
- Different implementations of the same architecture
- Easier to change than HW
- Standardizes instructions, machine language bit patterns, etc.
- Disadvantage
- Sometimes prevents using new innovations
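The point that one ISA can have several implementations can be illustrated with a minimal sketch. The three-instruction ISA, the class names, and the cycle costs below are all hypothetical, invented only for this illustration. Both machines produce the same programmer-visible register values; only the cycle count, an organization detail, differs.

```python
# Hypothetical ISA: each instruction is (opcode, dest, a, b) over 4 registers.
PROGRAM = [
    ("li", 0, 6, None),    # r0 = 6 (load immediate)
    ("li", 1, 7, None),    # r1 = 7
    ("add", 2, 0, 1),      # r2 = r0 + r1
    ("mul", 3, 2, 1),      # r3 = r2 * r1
]


class Machine:
    """Shared programmer-visible state; subclasses choose how to multiply."""

    def __init__(self):
        self.regs = [0, 0, 0, 0]
        self.cycles = 0

    def mul(self, x, y):
        raise NotImplementedError

    def run(self, program):
        for opcode, dest, a, b in program:
            if opcode == "li":
                self.regs[dest] = a
                self.cycles += 1
            elif opcode == "add":
                self.regs[dest] = self.regs[a] + self.regs[b]
                self.cycles += 1
            elif opcode == "mul":
                self.regs[dest] = self.mul(self.regs[a], self.regs[b])
            else:
                raise ValueError(f"unknown opcode {opcode!r}")
        return list(self.regs)


class FastMachine(Machine):
    """Assumed to have a hardware multiplier: one cycle per multiply."""

    def mul(self, x, y):
        self.cycles += 1
        return x * y


class CheapMachine(Machine):
    """No multiplier: repeated addition, one cycle per step (y >= 0 assumed)."""

    def mul(self, x, y):
        total = 0
        for _ in range(y):
            total += x
            self.cycles += 1
        return total


fast, cheap = FastMachine(), CheapMachine()
print(fast.run(PROGRAM), fast.cycles)
print(cheap.run(PROGRAM), cheap.cycles)
```

Software written against the ISA cannot tell the two machines apart except by timing, which is what lets implementations of varying cost and performance run the same programs.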
19Instruction Set Architecture (ISA)
- Instruction Execution Cycle
20Organization and Hardware (1/2)
- organization includes the high-level aspects of a computer design, such as
- memory system
- bus structure
- internal CPU design, where arithmetic, logic, branch, and data transfer are implemented
- e.g., SPARC 2 and SPARC 20 have the same instruction set but different organizations
21Organization and Hardware (2/2)
- Hardware is used to refer to the specifics of a machine
- detailed logic design
- packaging technology of the machine
- Machines may have identical ISAs and nearly identical organizations but differ in the detailed hardware implementation
22Choosing between 2 designs
- What should the computer architect be aware of in choosing between two designs?
- design complexity
- a complex design takes longer to complete, which means it will need higher performance to be competitive
- design time for both hardware and software
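The claim that a slower-to-finish design needs higher performance can be made concrete with compound growth. The sketch below assumes competing products improve at a fixed yearly rate; the function name and the sample rates (35% and 55% per year) are illustrative inputs, not measurements.

```python
def required_speedup(extra_design_years, annual_improvement):
    """Speedup a late design needs just to match products that kept
    improving at `annual_improvement` per year while it was delayed."""
    return (1.0 + annual_improvement) ** extra_design_years


for rate in (0.35, 0.55):
    for delay in (0.5, 1.0, 2.0):
        factor = required_speedup(delay, rate)
        print(f"{rate:.0%}/yr, {delay} yr late: must be {factor:.2f}x faster")
```

Under these assumptions, a design that ships one year late in a market improving 55% per year has to be 1.55x faster than originally planned merely to break even.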
23Early Computing
- 1946 ENIAC, US Army, 18,000 Vacuum Tubes
- 1949 UNIVAC I, $250K, 48 systems sold
- 1954 IBM 701, Core Memory
- 1957 Moving Head Disk
- 1958 Transistor, FORTRAN, ALGOL, CDC and DEC Founded
- 1964 IBM 360, CDC 6600, DEC PDP-8
- 1969 UNIX
- 1970 Floppy Disk
- 1981 IBM PC, 1st Successful Portable (Osborne 1)
- 1986 Connection Machine, Max Headroom Debut
24Underlying Technologies
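Columns: Year; Logic; Storage; Prog. Lang.; O/S (empty columns omitted)
- 1954: logic: Tubes; storage: core (8 ms)
- 1958: logic: Transistor (10 µs); prog. lang.: FORTRAN
- 1960: prog. lang.: ALGOL, COBOL; O/S: Batch
- 1964: logic: Hybrid (1 µs); storage: thin film (200 ns); prog. lang.: Lisp, APL, Basic
- 1966: logic: IC (100 ns); prog. lang.: PL/1, Simula, C
- 1967: O/S: Multiprog.
- 1971: logic: LSI (10 ns); storage: 1k DRAM; prog. lang.: O.O.; O/S: V.M.
- 1973: logic: 8-bit µP
- 1975: logic: 16-bit µP; storage: 4k DRAM
- 1978: logic: VLSI (10 ns); storage: 16k DRAM; O/S: Networks
- 1980: storage: 64k DRAM
- 1984: logic: 32-bit µP; storage: 256k DRAM; prog. lang.: ADA
- 1987: logic: ULSI; storage: 1M DRAM
- 1989: logic: GaAs; storage: 4M DRAM; prog. lang.: C++
- 1992: logic: 64-bit µP; storage: 16M DRAM; prog. lang.: Fortran90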
Generation Evolutionary
Parallelism
25History
- 1. Big Iron Computers
- Used vacuum tubes, electric relays and bulk magnetic storage devices. No microprocessors. No memory.
- Example: ENIAC (1945), IBM Mark 1 (1944)
26History
- Von Neumann
- Proposed the stored-program design behind EDSAC (1949), an early stored-program computer. Uses memory.
- Importance: we are still using the same basic design.
27The Von Neumann Computer Model
- Partitioning of the computing engine into components
- Central Processing Unit (CPU): Control Unit (instruction decode, sequencing of operations), Datapath (registers, arithmetic and logic unit, buses).
- Memory: Instruction and operand storage.
- Input/Output (I/O) sub-system: I/O bus, interfaces, devices.
- The stored program concept: Instructions from an instruction set are fetched from a common memory and executed one at a time.
Major CPU Performance Limitation: The Von Neumann computing model implies sequential execution, one instruction at a time.
28Hardware Components of Any Computer
29Generic CPU Machine Instruction Execution Steps
(Implied by The Von Neumann Computer Model)
Obtain instruction from program storage
Determine required actions and instruction size
Locate and obtain operand data
Compute result value or status
Deposit results in storage for later use
Determine successor or next instruction
Major CPU Performance Limitation: The Von Neumann computing model implies sequential execution, one instruction at a time.
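The six execution steps above can be sketched as a small interpreter loop. The accumulator machine, its five opcodes, and the memory layout are hypothetical, chosen only to keep the example short; instructions and data share one memory, as the stored-program concept requires.

```python
# Hypothetical opcodes for a one-register (accumulator) machine.
LOAD, ADD, STORE, JUMPNZ, HALT = range(5)


def run(memory, max_steps=1000):
    """Execute instructions one at a time until HALT; returns the memory."""
    pc, acc = 0, 0
    for _ in range(max_steps):
        opcode, operand = memory[pc]       # obtain instruction from storage
        next_pc = pc + 1                   # decode: every instruction is one cell
        if opcode == LOAD:
            acc = memory[operand]          # locate and obtain operand data
        elif opcode == ADD:
            acc = acc + memory[operand]    # compute result value
        elif opcode == STORE:
            memory[operand] = acc          # deposit result for later use
        elif opcode == JUMPNZ:
            if acc != 0:
                next_pc = operand          # determine successor instruction
        elif opcode == HALT:
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode}")
        pc = next_pc
    raise RuntimeError("step limit reached without HALT")


# Cells 0-3 hold the program, cells 4-6 hold data: mem[6] = mem[4] + mem[5].
image = [(LOAD, 4), (ADD, 5), (STORE, 6), (HALT, 0), 20, 22, 0]
print(run(image)[6])
```

Each pass through the loop handles exactly one instruction, which is the sequential-execution limitation the slide points out.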
30Computer Components
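- Datapath of a von Neumann machine
Figure: the general-purpose registers supply operands Op1 and Op2 over the bus to the ALU input registers; the ALU combines them, and the ALU output register sends the result back over the bus to the registers.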
31Computer Components
- Processor(CPU)
- Active part of the motherboard
- Performs calculations and activates devices
- Gets instructions and data from memory
- Components are connected via Buses
- Bus
- Collection of parallel wires
- Transmits data, instructions, or control signals
- Motherboard
- Physical chips for I/O connections, memory, CPU
32Computer Components
- CPU consists of
- Datapath (ALU and Registers)
- Performs arithmetic and logical operations
- Control (CU)
- Controls the datapath, memory, I/O devices
- Sends signals that determine operations of datapath, memory, input and output
33Technology Change
- Technology changes rapidly
- HW
- Vacuum tubes: Electron-emitting devices
- Transistors: On-off switches controlled by electricity
- Integrated Circuits (IC/Chips): Combine thousands of transistors
- Very Large-Scale Integration (VLSI): Combines millions of transistors
- What next?
- SW
- Machine language: Zeros and ones
- Assembly language: Mnemonics
- High-Level Languages: English-like
- Artificial Intelligence languages: Functions and logic predicates
- Object-Oriented Programming: Objects and operations on objects
34Technology => dramatic change
- Processor
- logic capacity: about 30% per year
- clock rate: about 20% per year
- Memory
- DRAM capacity: about 60% per year (4x every 3 years)
- Memory speed: about 10% per year
- Cost per bit: improves about 25% per year
- Disk
- capacity: about 60% per year
- Question: Does everything look OK?
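One way to approach the closing question is to turn the yearly rates into multi-year factors. The sketch below is plain compound-growth arithmetic on the figures quoted above; the function names and the ten-year horizon are choices made for this illustration.

```python
import math


def growth_factor(rate, years):
    """Total improvement after compounding `rate` per year for `years`."""
    return (1.0 + rate) ** years


def doubling_time(rate):
    """Years needed to double at a constant yearly improvement `rate`."""
    return math.log(2.0) / math.log(1.0 + rate)


RATES = {
    "logic capacity": 0.30,
    "clock rate": 0.20,
    "DRAM capacity": 0.60,
    "memory speed": 0.10,
    "disk capacity": 0.60,
}

# Check the slide's parenthetical: 60% per year is about 4x every 3 years.
print(f"60%/yr over 3 years: {growth_factor(0.60, 3):.2f}x")

for name, rate in RATES.items():
    print(f"{name:15s} doubles every {doubling_time(rate):4.1f} yr, "
          f"{growth_factor(rate, 10):6.1f}x in 10 yr")
```

At these rates, capacities grow far faster than memory speed, so a memory system that was balanced against the processor one decade looks increasingly slow the next. That mismatch is what does not look OK.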
35Software Evolution.
- Machine language
- Assembly language
- High-level languages
- Subroutine libraries
- There is a large gap between what is convenient for computers and what is convenient for humans
- Translation/Interpretation is needed between both
36Language Evolution
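High-level language program (in C)
    /* Exchange the adjacent elements v[k] and v[k+1]. */
    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }
Assembly language program (for MIPS)
    swap:
        muli $2, $5, 4      # $2 = k * 4 (byte offset of v[k])
        add  $2, $4, $2     # $2 = address of v[k]
        lw   $15, 0($2)     # $15 = v[k]
        lw   $18, 4($2)     # $18 = v[k+1]
        sw   $18, 0($2)     # v[k] = old v[k+1]
        sw   $15, 4($2)     # v[k+1] = old v[k]
        jr   $31            # return to caller
Binary machine language program (for MIPS)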
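The binary form of the listing did not survive in these notes, so the following is a minimal sketch of how an assembler would produce it. It uses the standard 32-bit MIPS R-type and I-type field layouts; the helper names are made up for this example. The first instruction, `muli`, is left out because it is not a standard MIPS instruction and its encoding would be a guess.

```python
OPCODE_LW, OPCODE_SW = 35, 43     # I-type opcodes
FUNCT_ADD, FUNCT_JR = 32, 8       # R-type function codes (opcode field is 0)


def encode_i(opcode, rs, rt, imm):
    """I-type word: opcode(6) | rs(5) | rt(5) | immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_r(rs, rt, rd, shamt, funct):
    """R-type word: opcode 0 (6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# (assembly text, machine word) for the swap routine after muli.
words = [
    ("add $2, $4, $2", encode_r(4, 2, 2, 0, FUNCT_ADD)),
    ("lw  $15, 0($2)", encode_i(OPCODE_LW, 2, 15, 0)),
    ("lw  $18, 4($2)", encode_i(OPCODE_LW, 2, 18, 4)),
    ("sw  $18, 0($2)", encode_i(OPCODE_SW, 2, 18, 0)),
    ("sw  $15, 4($2)", encode_i(OPCODE_SW, 2, 15, 4)),
    ("jr  $31", encode_r(31, 0, 0, 0, FUNCT_JR)),
]

for text, word in words:
    print(f"{word:032b}  {text}")
```

Each line of assembly becomes exactly one 32-bit word, so the mnemonics are only a readable spelling of the bit patterns the hardware decodes.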
37HW - SW Components
- Hardware
- Memory components
- Registers
- Register file
- memory
- Disks
- Functional components
- Adder, multiplier, dividers, . . .
- Comparators
- Control signals
- Software
- Data
- Simple
- Characters
- Integers
- Floating-point
- Pointers
- Structured
- Arrays
- Structures ( records)
- Instructions
- Data transfer
- Arithmetic
- Shift
- Control flow
- Comparison
- . . .
38Predictions for the Late 1990s (1/2)
- Technology
- Very large dynamic RAM 64 MBits and beyond
- Large fast Static RAM 1 MB, 10ns
- Complete systems on a chip
- 10 Million Transistors
- Parallelism
- Superscalar, Superpipeline, Vector, Multiprocessors
- Processor Arrays
39Predictions for the Late 1990s (2/2)
- Low Power
- 50% of PCs portable by 1995
- Performance per watt
- Parallel I/O
- Many applications I/O limited, not computation limited
- Computation scaling, but memory, I/O bandwidth not keeping pace
- Multimedia
- New interface technologies
- Video, speech, handwriting, virtual reality, ...
40Moore's Law
41Technology => dramatic change
- Processor
- logic capacity: about 30% increase per year
- clock rate: about 20% increase per year
- Memory
- DRAM capacity: about 60% increase per year (4x every 3 years)
- Memory speed: about 10% increase per year
- Cost per bit: about 25% improvement per year
- Disk
- Capacity: about 60% increase per year
Higher logic density gave room for instruction pipeline and cache
Performance optimization no longer implies smaller programs
Computers became lighter and more power efficient
42Dead Computer Society
- ACRI
- Alliant
- American Supercomputer
- Ametek
- Applied Dynamics
- Astronautics
- BBN
- CDC
- Convex
- Cray Computer
- Cray Research
- Culler-Harris
- Culler Scientific
- Cydrome
- Dana/Ardent/Stellar/Stardent
- Denelcor
- Elxsi
- ETA Systems
- Evans and Sutherland Computer Division
- Gould NPL
- Guiltech
- Intel Scientific Computers
- International Parallel Machines
- Kendall Square Research
- Key Computer Laboratories
- MasPar
- Meiko
- Multiflow
- Myrias
- Numerix
- Prisma
- Thinking Machines
- Saxpy
- Scientific Computer Systems (SCS)
- Soviet Supercomputers
- Supertek
- Supercomputer Systems (SSI)
- Suprenum
43The Processor Chip
44Intel 4004 Die Photo
- Introduced in 1971
- First microprocessor
- 2,250 transistors
- 12 mm²
- 108 KHz
45Intel 8086 Die Scan
- 29,000 transistors
- 33 mm²
- 5 MHz
- Introduced in 1978
- Basic architecture of the IA32 PC
46Intel 80486 Die Scan
- 1,200,000 transistors
- 81 mm²
- 25 MHz
- Introduced in 1989
- 1st pipelined implementation of IA32
47Pentium Die Photo
- 3,100,000 transistors
- 296 mm²
- 60 MHz
- Introduced in 1993
- 1st superscalar implementation of IA32
48Pentium III
- 9,500,000 transistors
- 125 mm²
- 450 MHz
- Introduced in 1999
49Moore's Law
Figure: Processor-DRAM Memory Gap (latency). Performance (log scale, 1 to 1000) versus time (1980 to 2000).
- CPU (µProc): 60%/yr. (2X/1.5 yr), tracking Moore's Law
- DRAM: 9%/yr. (2X/10 yrs)
- Processor-Memory Performance Gap: grows 50% / year
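The roughly 50% yearly growth of the processor-memory gap follows from the two rates in the figure. The sketch below divides one compound growth curve by the other; it assumes both rates hold constant and that the two curves start equal in 1980, as the chart is drawn. The variable names are chosen for this illustration.

```python
CPU_RATE = 0.60     # processor performance improvement per year
DRAM_RATE = 0.09    # DRAM latency improvement per year

# The gap is the ratio of the two curves, so it compounds at the ratio of
# their yearly factors.
gap_growth = (1.0 + CPU_RATE) / (1.0 + DRAM_RATE) - 1.0
print(f"gap grows about {gap_growth:.0%} per year")


def gap_after(years):
    """Processor/DRAM performance ratio after `years`, starting at 1.0."""
    return ((1.0 + CPU_RATE) / (1.0 + DRAM_RATE)) ** years


for year in (1985, 1990, 1995, 2000):
    print(f"{year}: processor is about {gap_after(year - 1980):,.0f}x "
          f"ahead of DRAM")
```

The yearly figure comes out slightly under 50%, and over two decades it compounds to a ratio in the thousands, which is why later chapters spend so much effort on caches and memory hierarchy design.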
50Moore's Not-Exactly-Law
- Not a law of nature
- But fairly accurate over 38 years and counting
- No exponential is forever
- but we can delay "forever" (Gordon Moore in 2003)
- More about Moore's Law at http://www.intel.com/research/silicon/mooreslaw.htm
51It's all about money. (Trends affect profits, costs, etc.)
52Performance Trend
- In general, tradeoffs should improve performance
- The natural idea here: HW cheaper, easier to manufacture => can make our processor do more things
53Price Trends (Pentium III)
54Price Trends (DRAM memory)
55Computer Engineering Methodology
Technology Trends
56Computer Engineering Methodology
Evaluate Existing Systems for Bottlenecks
Benchmarks
Technology Trends
57Computer Engineering Methodology
Evaluate Existing Systems for Bottlenecks
Benchmarks
Technology Trends
Simulate New Designs and Organizations
Workloads
58Computer Engineering Methodology
Evaluate Existing Systems for Bottlenecks
Implementation Complexity
Benchmarks
Technology Trends
Implement Next Generation System
Simulate New Designs and Organizations
Workloads
59Context for Designing New Architectures
- Application Area
- Special Purpose (e.g., DSP) / General Purpose
- Scientific (FP intensive) / Commercial
- Level of Software Compatibility
- Object Code/Binary Compatible (cost HW vs. SW)
- Assembly Language (dream to be different from binary)
- Programming Language: Why not?
60Context for Designing New Architectures
- Operating System Requirements for General Purpose Applications
- Size of Address Space
- Memory Management/Protection
- Context Switch
- Interrupts and Traps
- Standards Innovation vs. Competition
- IEEE 754 Floating Point
- I/O Bus
- Networks
- Operating Systems / Programming Languages
61Recent Trends in Computer Design
- The cost/performance ratio of computing systems has seen a steady decline due to advances in
- Integrated circuit technology: decreasing feature size, λ
- Clock rate improves roughly proportional to improvement in λ
- Number of transistors improves proportional to λ² (or faster).
- Architectural improvements in CPU design.
- Microprocessor systems directly reflect IC improvement in terms of a yearly 35% to 55% improvement in performance.
- Assembly language has been mostly eliminated and replaced by other alternatives such as C or C++
- Standard operating systems (UNIX, NT) lowered the cost of introducing new architectures.
- Emergence of RISC architectures and RISC-core (x86) architectures.
- Adoption of quantitative approaches to computer design based on empirical performance observations.