Title: CS, CoE, EE 362 Digital Computers II: Architecture
1. CS, CoE, EE 362: Digital Computers II Architecture
- Prof. Mark Franklin, jbf_at_cse.wustl.edu
- Course Assistants:
  - Drew Frank, ajf1_at_cec.wustl.edu
- Required Book: Heuring & Jordan, 2nd Edition
- Optional Book: Intro. VHDL, Yalamanchili
- Read Academic Integrity Statement.
- Course Web Site: http://www.cse.wustl.edu/jbf/cse362.d/cse362.html
2. Four Key Questions
- What components must every computer have?
- How can computers be described, specified, and evaluated?
- What constitutes computer architecture (hardware, software, firmware, algorithms, etc.)?
- How does technology affect computer architecture (chip size, feature size, power, pin density, etc.)?
3. Essential Computer Components
- Processor: interpret/execute instructions.
- Memory: store instructions and data.
- Communication Device(s): communicate with the outside world (I/O).
Classic Computer Architecture (SISD: Single Instruction Stream, Single Data Stream)
[Figure: Processor (Control Unit, ALU), Memory, Input/Output]
4. Architecture Components
- INSTRUCTION SET DESIGN: Programmer-visible instruction set. Algorithm, compiler, OS design; algorithmic complexity.
- HIGH LEVEL COMPONENT ORGANIZATION: Memory system, bus structure, processor design, branch handling, pipelining, execution algorithms; instructions/second, clocks/instruction.
- HARDWARE: Detailed logic design, packaging, VLSI logic design, CAD algorithms; speed, area, power.
5. (SIMD) Single Instruction Stream, Multiple Data Stream Architecture
[Figure: Program Control Unit, Program Memory, four ALUs, Interconnection Network, Data Memory Unit, Input/Output]
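The difference between the SISD and SIMD organizations can be sketched in a few lines of Python. This is only a model of the semantics, not of any real machine: the opcode names and the functions `simd_step` and `sisd_run` are hypothetical, and Python still evaluates the lanes one after another.

```python
import operator

# Hypothetical opcode table shared by both models.
OPS = {"ADD": operator.add, "SUB": operator.sub, "MPY": operator.mul}

def simd_step(opcode, lanes_a, lanes_b):
    """SIMD model: the control unit broadcasts ONE opcode to every ALU lane,
    and each lane applies it to its own pair of data elements."""
    op = OPS[opcode]
    return [op(a, b) for a, b in zip(lanes_a, lanes_b)]

def sisd_run(opcode, xs, ys):
    """SISD model: a single ALU handles one pair of operands per instruction."""
    op = OPS[opcode]
    results = []
    for a, b in zip(xs, ys):
        results.append(op(a, b))  # one instruction issued per data element
    return results

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
print(simd_step("ADD", a, b))  # [11, 22, 33, 44]
```

Both functions return the same values; the point is that the SIMD version issues one instruction for four ALUs, while the SISD version issues four.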
6. Performance Expression: Amdahl's Law
7. Amdahl's Law
It does no good to have many processors if there is not enough parallelism. What portion of a computation can be sequential if we want the processors to be used at 50 percent efficiency? (S = p/2)
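The question can be worked numerically with the usual statement of Amdahl's Law, speedup = 1 / (f + (1 - f)/p), where f is the sequential fraction and p the number of processors. The sketch below assumes that form; the function names are illustrative and not from the course material.

```python
def amdahl_speedup(f_seq, p):
    """Speedup on p processors when a fraction f_seq of the work is sequential."""
    return 1.0 / (f_seq + (1.0 - f_seq) / p)

def seq_fraction_for_efficiency(p, efficiency=0.5):
    """Sequential fraction at which the speedup equals efficiency * p (p > 1)."""
    target = efficiency * p
    # Solve 1 / (f + (1 - f)/p) = target for f.
    return (p / target - 1.0) / (p - 1.0)

for p in (2, 4, 16, 64):
    f = seq_fraction_for_efficiency(p)
    print(p, round(f, 4), round(amdahl_speedup(f, p), 2))
```

For 50 percent efficiency the result reduces to f = 1/(p - 1): with 16 processors, only about 6.7 percent of the computation may be sequential.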
8. Generalize Amdahl's Law
Example: Suppose a program runs in 100 seconds on a machine. Multiply operations are responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster? What about 5 times faster?
PRINCIPLE: Make the common case fast!
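The example can be checked with a short calculation: the new run time is the unaffected time plus the affected time divided by the improvement factor. The helper below is a sketch with a hypothetical name, not part of the original slides.

```python
def required_improvement(total, affected, target_speedup):
    """Factor by which the affected part must speed up to reach target_speedup.

    Returns None when the target cannot be reached at all.
    """
    unaffected = total - affected
    budget = total / target_speedup - unaffected  # time left for the improved part
    if budget <= 0:
        return None
    return affected / budget

print(required_improvement(100, 80, 4))  # 16.0
print(required_improvement(100, 80, 5))  # None
```

Running 4 times faster requires multiplication to be 16 times faster. Running 5 times faster would need the multiplies to take zero time, because the remaining 20 seconds already use the whole 20-second budget.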
9. Computer Market Partitioning (costs are for processor, not system)
- Desktop Computing ($100 - $1,000)
  - Price-performance
- Servers ($200 - $2,000)
  - Availability (reliability and effectiveness)
  - Scalability
  - Throughput
- Embedded Computers ($0.20 - $1,000)
  - Real-time performance
  - Power and memory minimization
  - Cost minimization
  - Interface with special-purpose logic; use of processor cores
10. HLL (e.g., C, C++, Perl) vs Machine/Assembly Language (AL)
- HLL Pros
  - Easier to express algorithms due to higher-level constructs (e.g., For, Case, arithmetic expressions, objects, etc.)
  - Type checking (Hardware for type checking?).
  - Some memory allocation checking.
- Assembly Language Pros
  - More control over ISA → more speed, less memory
  - More control over I/O
- Combination is often best for embedded systems: HLL calling AL.
11. Example HLL → AL Mapping
HLL
AL
- LOAD R1, d
- LOAD R2, e
- LOAD R3, c
- MPY R4, R2, R1
- ADD R5, R4, R3
- STORE R5, b
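The HLL side of this slide did not survive in the text, but the instruction sequence can be traced with a tiny interpreter. The sketch below assumes destination-first operand order for MPY and ADD and "source register, address" order for STORE; under that assumption the sequence appears to compute b = c + d * e. The `run` function and the memory values are hypothetical.

```python
def run(program, memory):
    """Interpret a tiny three-operand register program (illustrative only)."""
    regs = {}
    for line in program:
        op, args = line.split(None, 1)
        a = [x.strip() for x in args.split(",")]
        if op == "LOAD":      # LOAD Rd, addr
            regs[a[0]] = memory[a[1]]
        elif op == "MPY":     # MPY Rd, Rs1, Rs2 (assumed operand order)
            regs[a[0]] = regs[a[1]] * regs[a[2]]
        elif op == "ADD":     # ADD Rd, Rs1, Rs2 (assumed operand order)
            regs[a[0]] = regs[a[1]] + regs[a[2]]
        elif op == "STORE":   # STORE Rs, addr
            memory[a[1]] = regs[a[0]]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory

PROGRAM = [
    "LOAD R1, d",
    "LOAD R2, e",
    "LOAD R3, c",
    "MPY R4, R2, R1",
    "ADD R5, R4, R3",
    "STORE R5, b",
]

mem = run(PROGRAM, {"c": 2, "d": 3, "e": 4})
print(mem["b"])  # 14, i.e. 2 + 3 * 4
```

One HLL statement expands into six machine instructions here, which is the kind of detail the HLL hides from the programmer.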
12. Buses I
- A set of path(s) (wires) connecting on-chip or off-chip modules.
  - Serial bus: transmits one bit at a time
  - Parallel bus: transmits many bits simultaneously
- Generally time-shared.
- Generally has separate data and control paths.
- Typically has a separate bus controller or arbiter that decides which modules can use the bus at any given time.
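As an illustration of what an arbiter does, here is a minimal round-robin sketch. Round-robin is only one possible policy (fixed priority is another common choice) and is not stated in the slides; the class and method names are hypothetical.

```python
class RoundRobinArbiter:
    """Grants the shared bus to one requesting module per cycle, in rotating order."""

    def __init__(self, n_modules):
        self.n = n_modules
        self.last = n_modules - 1  # so that module 0 is considered first

    def grant(self, requests):
        """requests: set of module indices asking for the bus.

        Returns the index of the module granted the bus, or None if idle.
        """
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if candidate in requests:
                self.last = candidate
                return candidate
        return None

arb = RoundRobinArbiter(3)
print(arb.grant({0, 2}))  # 0
print(arb.grant({0, 2}))  # 2
print(arb.grant({0, 2}))  # 0
```

Because the search starts just after the most recent winner, no requesting module can be starved by the others.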
13. Buses II
- Some common buses
  - On-chip: AMBA, Wishbone (generally not standard)
  - Off-chip: PCI Bus Family

    Bus                             32-bit transfer   64-bit transfer
    33-MHz PCI                      133 MB/sec        266 MB/sec
    66-MHz PCI                      266 MB/sec        532 MB/sec
    100-MHz PCI-X                   ---               800 MB/sec
    133-MHz PCI-X                   ---               1 GB/sec

    PCI-e(xpress), serial, 1 lane:  500 MB/sec
    PCI-e(xpress), serial, 4 lanes: 2 GB/sec

  - Off-chip, other buses: SCSI, IDE, Infiniband
- Common issues: arbitration, congestion.
- Logical equivalence between buses, multiplexers and switches.
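The parallel-bus figures above follow from clock rate times bus width. The sketch below assumes one transfer per clock and uses the nominal 33.33 and 66.66 MHz clocks, so the results agree with the quoted numbers only up to rounding; the function name is illustrative.

```python
def peak_bandwidth_mb_s(clock_mhz, width_bits, transfers_per_clock=1):
    """Peak (not sustained) bandwidth of a parallel bus in MB/sec."""
    return clock_mhz * transfers_per_clock * width_bits / 8

buses = [
    ("33-MHz PCI, 32-bit", 33.33, 32),
    ("66-MHz PCI, 64-bit", 66.66, 64),
    ("100-MHz PCI-X, 64-bit", 100, 64),
    ("133-MHz PCI-X, 64-bit", 133, 64),
]
for name, mhz, bits in buses:
    print(f"{name}: {peak_bandwidth_mb_s(mhz, bits):.0f} MB/sec")
```

These are upper bounds: arbitration and congestion on a time-shared bus reduce the bandwidth any single module actually sees.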
14. Bandwidth Requirements
15. Bandwidth Trend
16. Simple Queuing Theory View of Buses
- Bus is a shared resource and can be viewed as a server in a queuing system.
- Modules attached to the bus present inputs (i.e., requests) to the server (or bus) and are queued up if the server is busy.
[Figure: queuing model with labels CPU, Memory, I/O, Queue, Server (BUS)]
17. Basic Queueing Theory
- Utilization: fraction of time a server is busy.
- Average Queue Length: average number of jobs in queue.
- Average System Delay (latency): average time from job entry into the system to job departure from it.
- Arrival Time Distribution: Poisson distribution of arrivals (exponential interarrival times).
- Service Time Distribution: exponentially distributed service times.
- Queue Characteristics: infinite length, FIFO service discipline.
18. Basic Queueing Results
19. Basic Queueing Results
[Figures: M/M/1 Waiting Time; M/M/1 Queue Length]
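The formulas on these slides did not survive in the text. As a stand-in, the sketch below uses the standard steady-state M/M/1 results, which match the assumptions listed earlier (Poisson arrivals, exponential service, infinite FIFO queue). The function name, dictionary keys, and the example rates are illustrative, not taken from the slides.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Standard steady-state M/M/1 results (rates in jobs per unit time)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    rho = arrival_rate / service_rate  # utilization
    return {
        "utilization": rho,
        "avg_in_system": rho / (1 - rho),                   # jobs queued or in service
        "avg_in_queue": rho ** 2 / (1 - rho),               # jobs waiting only
        "avg_system_delay": 1 / (service_rate - arrival_rate),
        "avg_wait_in_queue": rho / (service_rate - arrival_rate),
    }

# Hypothetical bus: 8 requests per microsecond arriving, 10 per microsecond served.
for key, value in mm1_metrics(8, 10).items():
    print(f"{key}: {value:.3f}")
```

Delay and queue length grow without bound as utilization approaches 1, which is why a heavily loaded shared bus becomes a bottleneck.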
20. Computer Generations
- 1: 1950 - 1959, Vacuum Tubes
- 2: 1960 - 1968, Transistors
- 3: 1969 - 1977, Integrated Circuit
- 4: 1978 - 2005, LSI (Large Scale Integration), VLSI (Very LSI)
- 5: 2005 - 20??, ULSI (Ultra LSI), parallel processing
21. Technology: How we make a chip (roughly)
22. Integrated Circuit Cost
- Cost.per.die = Cost.per.wafer / (Dies.per.wafer x Yield)
- Dies.per.wafer = Wafer.area / Die.area (approximate)
- Yield = 1 / (1 + (Defects.per.area) x (Die.area / 2))^2 (empirical observation)
- Typical: Die area 1.5 cm x 1.5 cm; Wafer diameter 10 inches
- Defects.per.cm^2: 1.7; Yield: 50%
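The three relations can be chained to estimate a die cost. The sketch below reads the yield relation as 1 / (1 + defects x area / 2)^2, the commonly cited empirical form, and treats the wafer as a full circle. The die size and wafer diameter are the slide's typical values; the wafer cost and defect density in the example call are hypothetical inputs chosen only for illustration.

```python
import math

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Approximate die count: wafer area / die area (ignores edge losses)."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    return wafer_area / die_area_cm2

def die_yield(defects_per_cm2, die_area_cm2):
    """Empirical yield model: 1 / (1 + defects * area / 2)^2."""
    return 1.0 / (1.0 + defects_per_cm2 * die_area_cm2 / 2.0) ** 2

def cost_per_die(cost_per_wafer, wafer_diameter_cm, die_area_cm2, defects_per_cm2):
    """Cost per good die = wafer cost / (dies per wafer * yield)."""
    good_dies = dies_per_wafer(wafer_diameter_cm, die_area_cm2) * die_yield(
        defects_per_cm2, die_area_cm2
    )
    return cost_per_wafer / good_dies

die_area = 1.5 * 1.5          # cm^2, from the slide
wafer_diameter = 10 * 2.54    # 10 inches in cm, from the slide
print(round(dies_per_wafer(wafer_diameter, die_area)))
print(round(die_yield(0.5, die_area), 3))                       # 0.5 defects/cm^2 is hypothetical
print(round(cost_per_die(5000, wafer_diameter, die_area, 0.5), 2))  # $5000 wafer is hypothetical
```

Because yield falls roughly with the square of die area, cost per die rises much faster than linearly as the die grows.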
23. TECHNOLOGY TRENDS
- Semiconductors
  - Transistor Density: 50%/year, quadruple in 4 years.
  - Die Size: 10 - 25%/year
- IC Logic Technology
  - Transistors per Chip: 50 - 60%/year
  - Device Speed: 30%/year
  - Wire/Communications Speed: constant (Cu vs Al)
- Magnetic Disk Technology
  - Density: 25 - 60% / year
  - Access Time: 35% / 10 years (8 ms).
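Annual rates like these compound. Assuming the figures on this slide are percentage improvements per year, the following sketch converts a rate into a multi-year growth factor and into the time needed to reach a given factor; the function names are illustrative.

```python
import math

def growth_factor(rate_per_year, years):
    """Total improvement after compounding rate_per_year for the given years."""
    return (1.0 + rate_per_year) ** years

def years_to_multiply(rate_per_year, factor):
    """Years of compounding needed to improve by the given factor."""
    return math.log(factor) / math.log(1.0 + rate_per_year)

print(round(growth_factor(0.50, 4), 2))      # transistor density over 4 years
print(round(years_to_multiply(0.50, 4), 2))  # years for density to quadruple
print(round(growth_factor(0.30, 10), 1))     # device speed over a decade
```

At 50% per year, density grows by about 5x in four years, so "quadruple in 4 years" is a round figure on the conservative side.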
24. Feature and Die Size
25. Wafer Size
[Figure: 12-inch wafer]
26. SILICON & MAGNETIC DENSITIES
27. Processor Performance Gains
[Figure: y-axis is Performance (x VAX-11/780)]
28. Processor Cost Trends with Time
29. SILICON & MAGNETIC DENSITIES