The Elements of Computers

Slides: 186

Provided by: cwin

Transcript and Presenter's Notes

1
The Elements of Computers

A processor able to interpret and execute
programs
A memory for storing the programs and the data
they process
Input-output equipment for transferring
information between the computer and the outside
world.

2
The brain versus the Computer

Brain central processing unit (CPU)
program control unit control instructions
arithmetic-logic unit (ALU) execution data
Similarities and differences
digital or discrete information abacus
analog or continuous information slide-rule

3
The Turing Machine
Processor P
Read-write head
Memory tape M
4
An Abstract Computer

The Turing machine was introduced by the English
mathematician Alan M. Turing in 1936.
The tape M as a memory
unbounded length
blank or one of a small set of symbols
The processor P
a small number of internal states
linked to M via the read-write head

5
Instruction Format

Sh Ti Oj Sk
the current state of processor is Sh
the symbol it expects to read on the square of M
under the read-write head is Ti
perform the action Oj
write a new symbol
move the tape to left or right
change the state of P to Sk

6
Add Two Unary Numbers via Turing Machine

Instructions   Comment
S0 b R S1      move read-write head one square to right
S1 1 R S1      move read-write head rightward across n1
S1 b 1 S2      replace blank between n1 and n2 by 1
S2 1 L S2      move read-write head leftward across n1
S2 b R S3      blank square reached; move one square to right
S3 1 b S3      replace left-most 1 by blank
S3 b H S3      halt; the result n1 + n2 is now on the tape
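
The seven instructions on this slide can be traced with a small simulator. The following is a minimal sketch, not part of the original slides: the function name `run_unary_adder`, the finite list used as the tape, and the choice to move the head rather than the tape are illustrative assumptions.

```python
def run_unary_adder(n1, n2):
    """Simulate the unary-addition Turing machine; return the number of 1s left on the tape."""
    # (current state, symbol read) -> (action, next state)
    # Action "R"/"L" moves the head, "H" halts, anything else is written to the tape.
    program = {
        ("S0", "b"): ("R", "S1"),
        ("S1", "1"): ("R", "S1"),
        ("S1", "b"): ("1", "S2"),
        ("S2", "1"): ("L", "S2"),
        ("S2", "b"): ("R", "S3"),
        ("S3", "1"): ("b", "S3"),
        ("S3", "b"): ("H", "S3"),
    }
    # Tape layout (assumed): blank, n1 ones, blank, n2 ones, blank.
    tape = ["b"] + ["1"] * n1 + ["b"] + ["1"] * n2 + ["b"]
    head, state = 0, "S0"
    while True:
        action, next_state = program[(state, tape[head])]
        if action == "H":
            break
        if action == "R":
            head += 1
        elif action == "L":
            head -= 1
        else:
            tape[head] = action  # write a new symbol
        state = next_state
    return tape.count("1")


print(run_unary_adder(3, 2))
```

The machine joins the two blocks of 1s by filling the separating blank, then erases one 1 from the left end to compensate.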

7
A Little Flavor of RISC

A universal TM can by itself perform every
reasonable computation.
t different tape symbols
s different processor states
ts < 30 implies that it can have a very small
instruction set

8
Limitations of Computers

Unsolvable problems
no Turing machine and no practical computer can
solve
Goldbach's conjecture
Undecidable problems
TM halting problem
finite-state machines

9
Limitations of Computers

Intractable problems
no computer can solve a given problem in a
reasonable amount of time
finding an Euler circuit in a graph (tractable, shown for contrast)
traveling salesman problem
scheduling of airline flights
routing of wires in an electronic circuit
sequencing of steps in a factory assembly line
Brute force

10
Speed Limitations

The time complexity of an algorithm
order f(n), denoted O(f(n))
computing time grows with the problem size n
Effect of a computer speedup by 100 on four algorithms:
Time complexity   Problem size before   Problem size after
O(n)              n1                    100 n1
O(n^2)            n2                    10 n2
O(n^100)          n3                    1.047 n3
O(2^n)            n4                    n4 + 6.644
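
The figures in the speedup table can be reproduced directly. This is a sketch under the assumption that "problem size" means the largest n solvable in a fixed time budget; the function names are illustrative.

```python
import math


def size_after_speedup_poly(n, k, speedup=100.0):
    """O(n^k) algorithm: solve new_n**k == speedup * n**k for new_n."""
    return n * speedup ** (1.0 / k)


def size_after_speedup_exp(n, speedup=100.0):
    """O(2^n) algorithm: solve 2**new_n == speedup * 2**n for new_n."""
    return n + math.log2(speedup)


for k in (1, 2, 100):
    print(f"O(n^{k}): size grows by a factor of {size_after_speedup_poly(1.0, k):.3f}")
print(f"O(2^n): size grows by an additive {size_after_speedup_exp(0.0):.3f}")
```

For polynomial algorithms the gain is multiplicative; for the exponential algorithm a hundredfold faster machine adds fewer than seven to the solvable problem size.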

11
The Mechanical Era

Babbage's Difference Engine.
The Analytical Engine.
1896: Hollerith formed a company, which was renamed
IBM in 1924.

12
Electronic Computers

The first generation
stored program concept
mathematician John von Neumann (1903-1957)
vacuum-tube computers (1940-1950)
ferrite-core memory until the 1970s
machine language
assembly language
IAS computer

13
The IAS Computer(I)

12-bit address
2^12 = 4K 40-bit words
a pair of 20-bit instructions
fixed-point number system
one-address instruction format

14
The IAS Computer(II)

Program control unit (PCU)
AR memory address register
IR instruction opcode register
IBR next-instruction buffer register
PC program counter
Data processing unit (DPU)
AC accumulator register
DR general-purpose data register
MQ multiplier-quotient register

15
The IAS Computer(III)

Hardware description language (HDL) or
register-transfer language (RTL)
AC := M(100); AC := AC + M(100); M(102) := AC

16
The IAS Computer(IV)

instruction type
data transfer
ACMQ
ACM(X)
data processing
ACAC M(X)
ACAC ? 2
program control
go to M(X, 0:19)
if AC ≥ 0 then go to M(X, 0:19)

17
The Shortcomings of IAS Computer

Self-modifying programs are difficult to
debug...
The small amount of storage...
No procedure call or return instructions
Lack of text processing (biased toward numerical
computation)...
I/O instructions are not mentioned...

18
The Contribution of the First Generation
Computers

Use of a CPU with a small set of registers
A separate main memory for instruction and data
storage
An instruction set with a limited range of
operations and addressing capabilities.
The term von Neumann computer has become
synonymous with a computer of conventional design.

19
The Second Generation Computer(I)

The transistor, a high-speed electronic switch,
versus the vacuum tube
Ferrite cores for the main memories
Magnetic disks for the secondary memories

20
The Second Generation Computer(II)

More registers index registers
index instructions
array
More program control instructions
call
return
More scientific Floating-point instructions
M ? B-E

21
The Second Generation Computer(III)

Input/Output operations
trivial data-transfer task
at very low speeds compared to CPU
Input-output processors
channels
make CPU execution and IO data transfer
independently

22
The Second Generation Computer(IV)

Programming languages
high-level languages
Scientific languages
1954, FORmula TRANslation (FORTRAN)
Business languages
1959, Common Business Oriented Language (COBOL)

23
The Second Generation Computer(V)

System management
batch processing
a rudimentary version of operating system
multiprogramming
time-sharing system
keep CPU and IOPs busy by overlapping CPU and IO
operations.

24
A Nonstandard Architecture: Stack Computers

Top of the stack (TOS)
push operation
pop operation
stack pointer (SP)
generally slower than von Neumann machine
Pocket calculator
CALL sub and RETURN

25
A Nonstandard Architecture: Stack Computers (II)

Z := W + 3 × (X - Y)
Polish notation
Z := W 3 X Y - × +

PUSH W
PUSH 3
PUSH X
PUSH Y
SUBTRACT
MULTIPLY
ADD
POP Z
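
The stack program on this slide can be executed by a few lines of interpreter. This is a minimal sketch: the function `run_stack_program`, the dictionary used as memory, and the sample values of W, X, and Y are assumptions made for illustration.

```python
def run_stack_program(program, memory):
    """Execute PUSH/POP/ADD/SUBTRACT/MULTIPLY instructions on a push-down stack."""
    stack = []
    for instruction in program:
        op, *operand = instruction.split()
        if op == "PUSH":
            name = operand[0]
            # Push a memory word if the operand names one, otherwise a literal constant.
            stack.append(memory[name] if name in memory else int(name))
        elif op == "POP":
            memory[operand[0]] = stack.pop()
        else:
            right = stack.pop()  # top of stack (TOS)
            left = stack.pop()   # word just below TOS
            if op == "ADD":
                stack.append(left + right)
            elif op == "SUBTRACT":
                stack.append(left - right)
            elif op == "MULTIPLY":
                stack.append(left * right)
            else:
                raise ValueError(f"unknown instruction: {op}")
    return memory


program = ["PUSH W", "PUSH 3", "PUSH X", "PUSH Y",
           "SUBTRACT", "MULTIPLY", "ADD", "POP Z"]
memory = run_stack_program(program, {"W": 2, "X": 7, "Y": 4})
print(memory["Z"])
```

No instruction other than PUSH and POP names an operand: the arithmetic instructions always work on the top two stack entries.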
26
The Third Generation Computer(I)

1961, Integrated circuits (IC)
a large number of transistors to be combined on a
tiny piece of semiconductor material, usually
silicon.
Standardize computer
Software compatible
1964 IBM/360
1970 IBM/370
1979 IBM/4300
1990 IBM/390
about 200 distinct instructions

27
The Third Generation Computer(II)

Two major control states of CPU
a supervisor state
a user state
Architecture
microprogramming
placed in a special control memory in the PCU
a CPU can execute floating-point instruction
without floating-point arithmetic circuits

28
The Third Generation Computer(III)

Supercomputer
CDC Cyber series
pipelining
involves overlapping the execution of
instructions
multiprocessor
to be executed simultaneously
Minicomputers
DEC, Digital Equipment Corp. (1965)
Programmed Data Processor (PDP)
low cost

29
The VLSI Era

SSI
MSI
LSI
VLSI very large-scale integration
ULSI or MCM

30
CMOS: A Zero-Detection Circuit (I)

z = NOT (x0 + x1 + x2 + x3)
x0x1x2x3 = 0000 makes z = 1
x0x1x2x3 = 0001 makes z = 0

31
CMOS: A Zero-Detection Circuit (II)

Transistor
Gate
Inverter (NOT)

32
CMOS: A Zero-Detection Circuit (III)

Gate NAND

33
CMOS: A Zero-Detection Circuit (IV)

Gate NOR

34
Introduction Circuits

In 1959, Texas Instruments and Fairchild Corps.
Chip dimensions 10x10mm to 30x30x4mm with 300 or
more pins
IC density SSI, MSI, LSI, VLSI, ULSI...

35
Introduction Circuits
[Figure: IC density in components per chip (1 to 10^9) versus year (1960 to 2010), marking SSI, MSI, the 4-, 8-, 16-, 32-, and 64-bit microprocessors, and the 1K-, 1M-, and 1G-bit DRAMs]
36
Introduction to CMOS
Complementary Metal-Oxide-Semiconductor

Why CMOS?
Basic Concepts
CMOS Technology

37
Why CMOS?

Low power dissipation
at stable logic 0 or logic 1
Pstatic: leakage current
Pdynamic: switching current, charging/discharging
of Cload
Pdynamic ∝ frequency
A distinct advantage: it leads to reduced heating

38
Why CMOS?

High Logic Integration Density
freedom to adjust the size by demand
linewidth as small as 0.1 µm resolution is
possible by using optical lithography
logic density is increased as size decreased
achieve greater integration densities than that
of a bipolar technology

39
Why CMOS?

Logic Swings
rail-to-rail output logic voltage swings
better noise immunity
more reliable logic circuits
a bipolar TTL gate output range 0.3, 3.6
a CMOS gate output range 0, 5 volts

40
Why CMOS?

Symmetrical Transient Response
to switch from a logic 0 to a logic 1 can be made
equal to the time needed to switch from a logic 1
to a logic 0
Simplifies timing in a large system design

41
Why CMOS?

Bipolar Integrated Circuits
bipolar emitter-coupled logic (ECL) is the
fastest silicon logic available
due to much higher power dissipation levels and
subsequent heating, it has not taken over the
microprocessor market
BiCMOS tends to provide the best aspects of both
worlds

42
Why CMOS?

Gallium Arsenide?
The electron mobility is much larger in GaAs
can react to higher frequencies
Materials Costs
Technology Know-How
Applications

43
Chapter 2: Design Methodology

design process
gate level
register level
processor level
computer-aided design
analysis methods

44
System Design

a large and complex system, such as a computer
a collection of connected components

45
System Representation (I)

A system modeled by a directed graph
a set of nodes V = {v1, v2, v3, ..., vn}
a set of edges E = {(v1, v2), (v1, v3), ...,
(vn-1, vn)}
edge e = (vi, vj) connects node vi to node vj

46
System Representation (II)

a set of information processing components C
a set of lines S that carry information signals
between components
the system G is associating C with S

47
Structure versus Behavior (I)

structure
a graph
the abstract graph consisting of block diagram
with no function information
behavior
a truth table or a mathematical equation
to determine for any given input signal to system
and its corresponding output

48
Structure versus Behavior (II)

neither can be derived from the other
a schematic diagram, block diagram
conveys structure rather than behavior
needs more formal descriptions, a text, truth
table, a list of equations

49
HDL Hardware Description Language (I)

HDL- Hardware Description Language Babbages
notations
VHDL - based on the programming language Ada
Verilog - based on the programming language C
Both are embodied in formal standards sponsored
by IEEE (the Institute of Electrical and
Electronics Engineers)

50
HDL Hardware Description Language (II)

more precise
technology independent
descriptions of gate and register levels
documentation
suitable for CAD programs
long and verbose

51
Half Adder - Block Symbol
Half_adder
x
sum
y
carry
52
Half Adder - Truth Table
53
Half Adder - Behavior

entity half_adder is
  port (x, y: in bit; sum, carry: out bit);
end half_adder;

architecture behavior of half_adder is
begin
  sum <= x xor y;
  carry <= x and y;
end behavior;

54
Half Adder - Structure

architecture structure of half_adder is
  component xor_circuit
    port (a, b: in bit; c: out bit);
  end component;
  component nand_gate
    port (d, e: in bit; f: out bit);
  end component;
  signal alpha: bit;
begin
  XOR: xor_circuit port map (a => x, b => y, c => sum);
  NAND1: nand_gate port map (d => x, e => y, f => alpha);
  NAND2: nand_gate port map (d => alpha, e => alpha, f => carry);
end structure;

55
Half Adder - Block Diagram
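[Block diagram: inputs x and y feed xor_circuit XOR (ports a, b; output c), which produces sum, and nand_gate NAND1 (ports d, e; output f), which produces alpha; alpha feeds both inputs d and e of nand_gate NAND2, whose output f is carry]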
56
Exclusive-OR - Block Diagram
[Block diagram: x1 ⊕ x2 built from inputs x1 and x2 using two NOT gates, two AND gates, and one OR gate]
57
Exclusive-OR - Truth Table
58
Gate Level

combinational logic
z(x1, x2, ..., xn)
truth table
logic circuits
standard gates
functionally complete gate types
AND, OR, NOT
AND, NOT
NAND
NOR

59
Full Adder - Truth Table
60
Gate Level (Logic Level)

processing with binary digits (bits)
0 and 1
design components
simple and memoryless logic gates
flip-flops of bit-storage devices
combinational logic
flip-flops
sequential circuits

61
Combinational Logic(I)

A combinational function is a logic or Boolean
function
mapping a set of 2^n input combinations of n
binary variables onto the output values 0 and 1
z(x1, x2, ..., xn)
function z can be defined as a truth table

62
Combinational Logic(II)

The truth table of full-adder as shown in Figure
2.9(a) in page 74
a pair of three-binary-variable functions
the sum output s0(x0, y0, c-1)
the carry output c0(x0, y0, c-1)
realization using half adders
realization using AND and OR gates
realization using NAND, NOR, and NOT gates

63
Standard Gates

AND
x1 x2 = 1 if and only if x1 and x2 are both 1
OR
x1 + x2 = 1 if and only if x1 or x2 or both are 1
EXCLUSIVE-OR
x1 ⊕ x2 = 1 if and only if x1 or x2 but not both
are 1
NOT (inverter)
x1' = 1 if and only if x1 = 0 (x1' denotes NOT x1)

64
Functional Completeness (I)

AND, OR, NOT
AND, NOT
a + b = ((a + b)')' = (a' · b')'
NAND
a' = (a · a)'
a · b = ((a · b)')' = ((a · b)' · (a · b)')'
a + b = ((a + b)')' = (a' · b')' = ((a · a)' · (b · b)')'
(x' denotes NOT x)

65
Functional Completeness (II)

NOR
a' = (a + a)'
a · b = ((a · b)')' = (a' + b')' = ((a + a)' + (b + b)')'
a + b = ((a + b)')' = ((a + b)' + (a + b)')'
(x' denotes NOT x)
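
The claim that NAND alone, or NOR alone, is functionally complete can be checked exhaustively over all input combinations. This is a sketch with illustrative function names; it builds NOT, AND, and OR from each gate type and compares them against the direct definitions.

```python
from itertools import product


def nand(a, b):
    return 1 - (a & b)


def nor(a, b):
    return 1 - (a | b)


# NOT, AND, OR built from NAND gates only.
def not_from_nand(a):
    return nand(a, a)


def and_from_nand(a, b):
    return nand(nand(a, b), nand(a, b))


def or_from_nand(a, b):
    return nand(nand(a, a), nand(b, b))


# NOT, AND, OR built from NOR gates only.
def not_from_nor(a):
    return nor(a, a)


def and_from_nor(a, b):
    return nor(nor(a, a), nor(b, b))


def or_from_nor(a, b):
    return nor(nor(a, b), nor(a, b))


def all_identities_hold():
    """Compare every construction with the direct operator on all four input pairs."""
    for a, b in product((0, 1), repeat=2):
        if not_from_nand(a) != 1 - a or not_from_nor(a) != 1 - a:
            return False
        if and_from_nand(a, b) != (a & b) or and_from_nor(a, b) != (a & b):
            return False
        if or_from_nand(a, b) != (a | b) or or_from_nor(a, b) != (a | b):
            return False
    return True


print(all_identities_hold())
```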

66
Boolean Algebra

George Boole (1815-1864)
Boolean equation
s0 = x0' y0' c-1 + x0' y0 c-1' + x0 y0' c-1' + x0 y0 c-1
c0 = (x0 + c-1)(x0 + y0)(y0 + c-1)
(x' denotes NOT x)
SOP (sum-of-products)
POS (product-of-sums)
two-level logic circuit the longest IO path -
propagation delay
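
The two-level full-adder equations can be compared with ordinary arithmetic on all eight input combinations. This is a sketch: the sum is written in sum-of-products form and the carry in product-of-sums form, and the function names are illustrative.

```python
from itertools import product


def NOT(x):
    return 1 - x


def sum_sop(x, y, c):
    """Sum output as a sum of products: one AND term per input row with an odd number of 1s."""
    return ((NOT(x) & NOT(y) & c) | (NOT(x) & y & NOT(c))
            | (x & NOT(y) & NOT(c)) | (x & y & c))


def carry_pos(x, y, c):
    """Carry output as a product of sums: 1 when at least two inputs are 1."""
    return (x | c) & (x | y) & (y | c)


def matches_arithmetic():
    for x, y, c in product((0, 1), repeat=3):
        total = x + y + c
        if sum_sop(x, y, c) != total % 2 or carry_pos(x, y, c) != total // 2:
            return False
    return True


print(matches_arithmetic())
```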

67
Balance Logic Design

The balance between hardware cost and operating
speed depends on IC technology
considerations
Two-level adder has the shortest propagation
delay
Two-level adder has more gates and has a higher
hardware cost

68
Logic Synthesizer

To design circuits automatically via
computer-aided synthesis tools.
Restrictions of synthesizers
fan-in of a gate
fan-out of a gate
gate minimization
an intractable problem
only practical for small circuits

69
Flip-Flops (I)

A flip-flop is a 1-bit storage element
a sequential logic circuit = a combinational
circuit + memory
synchronization
external clock signal CK of a flip-flop
Four-bit ripple-carry D-flip-flop a serial
adder

70
Flip-Flops (II)

Edge triggering state changes around one edge of
CK (clock signal)
0-to-1
1-to-0
an edge-triggered D (delay) flip-flop
0-to-1 triggering edge of clock signal of CK
others well-known flip-flop
JK flip-flop
SR flip-flop
T flip-flop

71
Flip-Flops (III)

Edge triggering
a sequence of discrete state values y(i)
one for every clock cycle i
Timing diagram - Figure 2.11
Characteristic equation of D flip-flop
y(i+1) = D(i)

72
Sequential circuits

A combinational circuit + a set of flip-flops
A serial adder Figure 2.12
A four-bit-stream serial adder Figure 2.13
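
A serial adder combines a full adder (combinational) with one D flip-flop that holds the carry between clock cycles. The sketch below is an illustration, not the circuit of Figures 2.12 or 2.13: the class and helper names are assumptions, and bits are fed least significant first.

```python
class DFlipFlop:
    """Edge-triggered D flip-flop: the state after a clock edge equals the D input before it."""

    def __init__(self, initial=0):
        self.y = initial

    def clock(self, d):
        self.y = d  # y(i+1) = D(i)


def serial_add(x_bits, y_bits):
    """Add two equal-length bit streams (LSB first); return (sum bits, final carry)."""
    carry_ff = DFlipFlop(0)
    sum_bits = []
    for x, y in zip(x_bits, y_bits):
        c = carry_ff.y                                # carry stored in the previous cycle
        sum_bits.append(x ^ y ^ c)                    # full-adder sum output
        carry_ff.clock((x & y) | (x & c) | (y & c))   # full-adder carry becomes next state
    return sum_bits, carry_ff.y


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


bits, carry = serial_add(to_bits(5, 4), to_bits(3, 4))
print(from_bits(bits), carry)
```

One full adder is reused for every bit position, so the hardware cost is low but an n-bit addition takes n clock cycles.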

73
Register Level

Register-transfer level
a grouped, ordered sets of small combinational or
sequential circuits
process or store words or vectors
combinational
Multiplexers
Decoders and encoders
sequential
Shift registers
Counters

74
Component Types

MSI parts in IC series
Standard cells in VLSI
with or without the functional completeness
property
no universal graphic symbols
usually by an abbreviated description of their
behavior

75
Generic Block Representation of a Register-Level
Component

Data input lines
Data output lines
Control input lines
select lines
enable lines
clock lines
etc.
Control output lines

m
k
Multifunctionunit
76
Generic Block Representation

select lines one of several possible operations
that the unit is to perform
enable lines time or condition for a selected
operation to be performed
active, enable, or asserted state
an overbar low enable or active value is 0

77
Operations

Gate level: B = {0, 1}
Register level: B^m = set of 2^m m-bit words

78
Multiplexers (MUX)

a device intended to route data from one of
several sources to a common destination
k-input, m-bit MUX, k = 2^p

[Figure: multiplexer (MUX) with 2^p data inputs X0, X1, ..., X(2^p - 1), each m bits wide; a p-bit select input S; an enable input e; and an m-bit data output Z]
79
Multiplexers as Function Generators

A 2^n-input, 1-bit multiplexer MUX can generate
any n-variable function
z(v0, v1, ..., vn-1)
A 2-input, 4-bit multiplexer Figure 2.20
An 8-input multiplexer Figure 2.21
Multiplexer-based full adder Figure 2.22
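
A multiplexer generates a function when the variables drive its select lines and the truth-table column is wired to its data inputs. The sketch below illustrates this with two 8-input, 1-bit multiplexers forming a full adder; it is not a transcription of Figure 2.22, and the names are assumptions.

```python
def mux(data_inputs, select_bits):
    """2^n-input, 1-bit multiplexer: route the data input whose index the select bits encode."""
    index = 0
    for bit in select_bits:  # first select bit is the most significant
        index = (index << 1) | bit
    return data_inputs[index]


# Truth-table columns for inputs (x, y, c) = 000, 001, ..., 111.
SUM_TABLE = [0, 1, 1, 0, 1, 0, 0, 1]
CARRY_TABLE = [0, 0, 0, 1, 0, 1, 1, 1]


def full_adder_mux(x, y, c):
    """Full adder built from two multiplexers used as function generators."""
    return mux(SUM_TABLE, (x, y, c)), mux(CARRY_TABLE, (x, y, c))


print(full_adder_mux(1, 0, 1))
```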

80
Decoders

1-out-of-2^n or 1/2^n decoder
a 1/4 decoder Figure 2.23
used in RAMs to select storage cells to be read
from or written into.

81
Encoders

To generate the address or index of an active
input line
2^n-to-n encoder
x0x1x2x3x4x5x6x7 = 00000010 gives z0z1z2 = 110

83
Processor (System) Level

The highest in the computer design hierarchy
concerned with the storage and processing of
blocks of information
more complex and based on VLSI technology
very much a heuristic process

84
Processor-Level Components

Four main groups
processors
memories
IO devices
interconnection networks

85
Central Processing Unit

A general-purpose, instruction-set processor
specialized processors such as IOPs
operates on word-organized instructions and data

86
Chapter Three: Processor Basics

the overall design of instruction-set processors
CPU of a computer
microprocessors RISC and CISC types

87
CPU Organization

Fundamentals
External communication
User and supervisor modes
CPU operation
Accumulator-based CPU
Programming consideration
Instruction set
Program execution

88
Fundamentals

To execute sequences of instructions (programs),
which are stored in an external main memory.
Program execution steps
CPU transfers instructions with operands from
main memory to registers in CPU.
CPU executes the instructions sequentially except
when execution sequence is altered by a branch
instruction.
when necessary, CPU transfers results from CPU
registers to main memory

89
External Communication(I)

without a cache
CPU communicates directly with the main memory
a high-capacity multi-chip RAM (random-access
memory)
disadvantage speed disparity
CPU is significantly faster than memory (5 to 10
times)

90
External Communication(II)

with a cache
CM positioned between CPU and MM
CM is faster and smaller than MM
CM can reside wholly or in part in CPU
typically permits CPU to load or store in a
single clock cycle
advantage: CM is transparent to the CPU's instructions
CM and MM appear as a single, seamless memory space of
2^m addressable storage locations
further discussion in chapter 6

91
External Communication(III)

with IO devices
IO ports IO devices are associated with
addressable registers
CPU can load/store a word from/to IO ports
IO-mapped versus memory-mapped IO
IO ports share the same set of memory addresses
IO instructions produce IO control signals but
not memory-referencing signals

92
User and Supervisor Modes

user programs and supervisor programs
a user or application program handles a specific
application
a supervisor program manages various routine
aspects of the computer system
normally, CPU switches back and forth between
user and supervisor programs
interrupt is a way of requesting and switching to
supervisor mode

93
CPU Operation(I)

Overview of a CPU behavior (Figure 3.2, pp. 140)
instruction cycle: a fetch step and an execution
step
micro-operations (register-transfer operations)
within an instruction cycle

94
CPU Operation(II)

the shortest well-defined CPU micro-operation is
the CPU cycle time or clock period, Tclock
Tclock: the CPU cycle time
f: the CPU's clock frequency in MHz
Tclock = 1/f (in microseconds when f is in MHz)
each instruction is fetched from M in one CPU cycle
when M is CM
execution step is run in another CPU cycle

95
Accumulator-Based CPU(I)

to keep CPU relatively small
a small set of registers and circuits to
implement a functionally complete set of
instructions
the central role of registers --- the accumulator
register

96
Accumulator-Based CPU(II)

a small accumulator-based CPU (Figure 3.3, pp.
141)
PCU and DPU
fetch step
IR.AR := M(PC)
IR = op, AR = adr
load/store
AC := M(adr)
M(adr) := AC

98
Programming Consideration(I)

data processing
three-operand operations
Z := X + Y

99
Programming Consideration(II)

single address instructions (pp. 142)
HDL format       ASM format
AC := M(X)       LD X
DR := AC         MOV DR,AC
AC := M(Y)       LD Y
AC := AC + DR    ADD
M(Z) := AC       ST Z
implicit operands AC and DR
load/store architecture
only uses the load and store instructions to
access memory

100
Programming Consideration(III)

memory-referencing instruction form
AC := fi(AC, M(adr))
HDL format         ASM format
AC := M(X)         LD X
AC := AC + M(Y)    ADD Y
M(Z) := AC         ST Z
more complicated instruction-decoding logic in
PCU and more execution time in ADD
fewer instructions --- reduced overall execution
time?
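
The two ways of computing Z := X + Y on the accumulator machine can be compared with a toy interpreter. This is a sketch under stated assumptions: the function `run`, the dictionary memory, and the treatment of an operand-less ADD as AC := AC + DR are illustrative choices based on the HDL/ASM listings in these slides.

```python
def run(program, memory):
    """Execute LD, ST, MOV DR,AC, and ADD on a machine with registers AC and DR."""
    ac = dr = 0
    for line in program:
        op, _, arg = line.partition(" ")
        if op == "LD":
            ac = memory[arg]             # AC := M(adr)
        elif op == "ST":
            memory[arg] = ac             # M(adr) := AC
        elif op == "MOV":
            dr = ac                      # only MOV DR,AC is modeled: DR := AC
        elif op == "ADD":
            # ADD with an address reads memory; ADD alone uses the implicit operand DR.
            ac = ac + (memory[arg] if arg else dr)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


load_store_version = ["LD X", "MOV DR,AC", "LD Y", "ADD", "ST Z"]
memory_reference_version = ["LD X", "ADD Y", "ST Z"]

print(run(load_store_version, {"X": 5, "Y": 7})["Z"], len(load_store_version))
print(run(memory_reference_version, {"X": 5, "Y": 7})["Z"], len(memory_reference_version))
```

Both programs leave the same value in Z; the memory-referencing form needs three instructions instead of five, at the price of a more complex ADD.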

101
Programming Consideration(IV)

the cost performance debate of RISC-CISC Central
Processing Unit

102
Instruction Set

the flavor instruction set of RISC (a load/store
architecture)
data transfer load, store, move register
data processing add, subtract, and, not
program control branch, branch zero
(Figure 3.4, pp. 143)
Example 3.1 a multiplication program (pp.144)

103
A Multiplication Program(I)

Line Location Instruction or data
0 one 00001
1 mult N
2 ac 00000
3 prod 00000
4 ST ac
5 loop LD mult
6 BZ exit
.
17 BRA loop
18 exit ...

104
A Multiplication Program(II)

Line Location Instruction or data
7 LD one
8 MOV DR, AC
9 LD mult
10 SUB
11 ST mult
12 LD ac
13 MOV DR,AC
14 LD prod
15 ADD
16 ST prod
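
The multiplication program (lines 4 to 17 of the listing) multiplies by repeated addition. The sketch below simulates it. Several points are assumptions rather than statements from the slides: AC is taken to hold the multiplicand on entry (suggested by the initial ST ac), SUB is taken as AC := AC - DR, the multiplier N is non-negative, and the Python names are illustrative.

```python
def multiply(multiplicand, n):
    """Simulate the accumulator-machine multiplication loop; return the final value of prod."""
    mem = {"one": 1, "mult": n, "ac": 0, "prod": 0}
    program = [
        ("ST", "ac"),     # line 4: save the multiplicand
        ("LD", "mult"),   # line 5 (loop): load remaining count
        ("BZ", "exit"),   # line 6: done when the count reaches zero
        ("LD", "one"),    # line 7
        ("MOV", None),    # line 8: DR := AC
        ("LD", "mult"),   # line 9
        ("SUB", None),    # line 10: AC := AC - DR (decrement the count)
        ("ST", "mult"),   # line 11
        ("LD", "ac"),     # line 12
        ("MOV", None),    # line 13: DR := AC
        ("LD", "prod"),   # line 14
        ("ADD", None),    # line 15: AC := AC + DR (accumulate the product)
        ("ST", "prod"),   # line 16
        ("BRA", "loop"),  # line 17
    ]
    labels = {"loop": 1, "exit": len(program)}
    ac, dr, pc = multiplicand, 0, 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "LD":
            ac = mem[arg]
        elif op == "ST":
            mem[arg] = ac
        elif op == "MOV":
            dr = ac
        elif op == "ADD":
            ac += dr
        elif op == "SUB":
            ac -= dr
        elif op == "BZ":
            if ac == 0:
                pc = labels[arg]
        elif op == "BRA":
            pc = labels[arg]
    return mem["prod"]


print(multiply(6, 7))
```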

105
Program Execution

cycle by cycle execution (pp. 146)
PCU actions
fetch cycle includes the pair of
register-transfer operations
IR.AR := M(PC)
PC := PC + 1

106
Architecture Extensions(I)

multipurpose register set for storing data and
address
register file
additional data, instruction, and address types
fixed-point multiply and divide instructions
call and return instructions

107
Architecture Extensions(II)

register to indicate computation status
condition code or flag register
zero result or divide by zero, ...
program control stack
procedure calling
external interrupts
push-down stack - stack pointer
Figure 3.7, pp. 148.

108
Pipelining(I)

CPU speedup techniques
cache memories
instruction-level parallelism
in DPU
in PCU
Overlapping instructions in a two-stage
instruction pipeline
Figure 3.8, pp. 150.

109
Pipelining(II)

Branch instruction
reduce the efficiency of instruction pipelining
More than two stages
to increase the level of parallelism attainable

110
ARM6 Microprocessor

Organization of the ARM6
SR, PC, WDR, RDR, AR, IR, ALU, Shifter, Buses
(Figure 3.9, pp. 152)
Core instruction set of the ARM6
Data transfer, Data processing, Program control
(Figure 3.10, pp. 153 )
Shift or rotation operation
LSL: logical shift left
MOV R0, R1, LSL #2 ; R0 := R1 × 4

111
Motorola 680X0 family

Organization of the 68020
D0-D7, A0-A7, PC, CC (Figure 3.11, pp. 155)
Instruction set of the 68020
Data transfer, Data processing, Program control,
External synchronization (Figure 3.12, pp.
156-157 )

112
680X0 ASM for Vector Addition

Vector addition (Figure 3.13, pp. 158)
        MOVE.L  #2001, A0
        MOVE.L  #3001, A1
        MOVE.L  #4001, A2
START   ABCD    -(A0), -(A1)
        MOVE.B  (A1), -(A2)
        CMPA    #1001, A0
        BNE     START

113
Chapter Six: Memory Organization

impact on performance
survey storage-device technologies
multilevel hierarchical memory systems
cache memories

114
Memory Types

CPU registers
working memory for temporary storage of
instructions and data
Main memory (primary memory)
five or more clock cycles are usual
Secondary memory
in milliseconds
Cache
one to three clock cycles

115
Performance and Cost

Cost/performance trade-off
cost of memory c = C/S dollars/bit
access time tA 10y

The Von Neumann Bottleneck

The speed mismatch between the CPU and main
memory.
Storage density has grown rapidly
Access times have decreased at a much slower rate
capacity of single chip RAM
1975: 4 Kb (kilobit),
1985: 256 Kb,
1995: 16 Mb (megabit)

118
Access Modes(I)

Random-access memory
storage can be accessed in any order
access time is independent of location
Serial-access memory
storage can be accessed only in a certain
predetermined sequence
magnetic disks, magnetic tapes, and optical disks
(CD-ROM)
access time depends on its position relative to
the read-write head

119
Access Modes(II)

Serial access tends to be slower than random
access.
Semirandom-access mode
magnetic disks and CD-ROM
if each track has its own read-write head, tracks
can be accessed randomly
access within track is serial

120
Memory Retention(I)

Read-only memory
memories cannot be altered on-line
a nonerasable storage device
compact disk ROM
Programmable read-only memory
memories can be changed off-line
CD-recordable disk (CD-R) as a programmable CD

121
Memory Retention(II)

DRO: destructive readout
reading will destroy the stored data
restoration
a read operation followed by a write
NDRO: nondestructive readout
reading does not affect the stored data

122
Memory Retention(III)

Dynamic memory
Refreshing periodically
a stored 1 tends to 0 or vice versa due to some
physical decay process
a capacitor represents a stored 1 tends to 0 due
to leaking away
Static memory
require no refreshing
lower access time, faster, than DRAM

123
Memory Retention(IV)

Volatile
destroy the storage data if power is lost
most IC memories are volatile
Nonvolatile
most magnetic and optical memories

124
Memory Retention(V)

Cycle time - the elapsed time tM
the minimum time that must elapse between the
start of two consecutive access operations
tM ≥ tA
Data-transfer rate or bandwidth bM
bM = w/tM
w is the number of bits that can be transferred
simultaneously to or from the memory

125
Memory Retention(VI)

Reliability - MTBF
the mean time before failure
no moving parts (mechanical motion) has much
higher reliability
very high density or data-transfer rate has the
reliability problem
error-detecting and error-correcting codes can
increase the reliability of any memory
Performance parameters tA tM bM

126
Memory Retention(VII)
Technology                Primary     Access       Alter-    Permanence               Access
                          medium      mode         ability                            time
Bipolar semiconductor     Electronic  Random       R/W       NDRO, volatile           10 ns
Metal oxide (MOS)         Electronic  Random       R/W       DRO or NDRO, volatile    50 ns
semiconductor
Magnetic (hard) disk      Magnetic    Semirandom   R/W       NDRO, nonvolatile        10 ms
Magneto-optical disk      Optical     Semirandom   R/W       NDRO, nonvolatile        50 ms
Compact disk ROM          Optical     Semirandom   R         NDRO, nonvolatile        100 ms
Magnetic tape cartridge   Magnetic    Serial       R/W       NDRO, nonvolatile        1 s

Figure 6.6, pp. 407

127
Random-Access Memory(II)

Semiconductor RAMs DRAM and SRAM
a capacitor with one transistor versus six
transistors (Figure 6.9, pp. 410)
destructive readout and subsequently written back
to the cell is required for DRAM

128
MT4LC8M8E1(Micron Technology 1997)

64 Mb (2^26 bits) = 2^23 8-bit bytes = 8M × 8 bits
memory address size m = 23
data word size w = 8
13-bit row address (external address lines)
10-bit column address
control lines RAS, CAS, WE, OE
address bus A0-A12
data bus DQ1-DQ8 (32 pins)

129
MT4LC8M8E1(Micron Technology 1997)

64 Mb (2^26 bits) = 2^23 8-bit bytes = 8M × 8 bits
tA = 50 ns, tM = 90 ns, page mode
tREF: refresh at least once every 64 ms
refresh an entire row of storage locations in a
single read cycle
if a one-row read operation takes 90 ns, total
refreshing time = 90 ns × 8192 = 0.737 ms; the
fraction of time spent refreshing = 0.737 / 64 = 1.15%
(a negligible amount)
Figure 6.13
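
The refresh-overhead arithmetic on this slide can be written as a short calculation. This is a sketch using the slide's figures (8192 rows from the 13-bit row address, 90 ns per row, a 64 ms refresh period); the function name is illustrative.

```python
def refresh_overhead(rows, row_refresh_ns, refresh_period_ms):
    """Return (total refresh time in ms, fraction of each refresh period spent refreshing)."""
    total_ms = rows * row_refresh_ns / 1e6  # convert ns to ms
    return total_ms, total_ms / refresh_period_ms


total_ms, fraction = refresh_overhead(rows=2 ** 13, row_refresh_ns=90, refresh_period_ms=64)
print(f"{total_ms:.3f} ms per period, {fraction:.2%} of the time")
```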

130
Other Semiconductor Memories(I)

Read-only memory (ROM)
without writing ability and nonvolatile
store permanent code at the instruction and
microinstruction levels
Programmable ROM
can be programmed only once
can be programmed repeatedly (FPGA)
erased in bulk off-line

131
Other Semiconductor Memories(II)

Flash memory - similar to a PROM
nonvolatile storage and can be programmed and
erased on-line
can be programmed a bit at a time
can be erased in large blocks, that is, a
flash erase process
can randomly read a bit and write a block
storage density and access time are comparable to
those of DRAM

132
Fast RAM interfaces(I)

The speed gap between microprocessors and cheap
but slow DRAMs
Use a bigger memory word
Access more than one word at a time
interleaving rule
interference or contention occurs if two or more
addresses require simultaneous access to the same
module

133
Fast RAM interfaces(II)

Synchronous DRAM (SDRAM)
achieves a speed doubling by pipelining its
internal operations and by implementing two-way
address interleaving.
Cached DRAM (CDRAM)
an on-chip cache realized by a small, fast SRAM
that acts as a high-speed buffer or front-end
memory for the main DRAM
have a fast burst mode of operation

134
Fast RAM interfaces(III)

Rambus DRAM (1992)
the master transmits an initial packet on
Rambus channel
each Rambus DRAM chip examines it
DRAM unit Ri containing the address returns
ready or busy to the master
if Ri ready, the master proceeds to transfer to
or from Ri a data packet of up to 256 bytes in
burst mode at speed up to 500 MB/s, that is 1
byte every 2 ns.
if Ri busy, the master must try again later...

135
Serial-Access Memories

Tracks
data transfer to or from a track serially
low cost per bit
long access time
read-write head positioning time
slow speed of tracks moving
serial data transfer

136
Access Methods(I)

Seek time tS
the average time to move a head from one track to
another
Rotational latency time tL
the average time for the desired information cell
to rotate under the head
Block
all words in a block are stored in consecutive
locations such that an entire block takes one
seek and one latency time

137
Access Methods(II)

Data-transfer rate
V cm/s the speed of the stored information
relative to the read-write head
T bits/cm the storage density along the track
TV bits/s

138
Access Methods(III)

Time to access a block in a serial-access memory
tB = tS + 1/(2r) + n/(rN)
tS: the average seek time
r: the revolutions per second
1/(2r): the average latency of a track
n: the number of words per block
N: the number of words per track
n/(rN): the data transfer time

139
Magnetic Hard-Disk

9.3GB Quantum XP39100, 1996 pp. 422
tS = 7.9 ms
r = 0.12 revs/ms
n = 8
N = 144 × 512 = 73,728 bytes/track
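
The block access time for a serial-access memory is the sum of seek time, average rotational latency, and transfer time. The sketch below applies that sum to this disk's seek time, rotation rate, and track capacity. The 4096-byte block size is a hypothetical value chosen only for illustration, and the function name is an assumption.

```python
def block_access_time_ms(seek_ms, revs_per_ms, block_size, track_size):
    """Seek time + half a revolution of latency + time to pass the block under the head."""
    latency_ms = 1.0 / (2.0 * revs_per_ms)
    transfer_ms = block_size / (revs_per_ms * track_size)
    return seek_ms + latency_ms + transfer_ms


# Seek time, rotation rate, and track size from the slide; block size is hypothetical.
t_block = block_access_time_ms(seek_ms=7.9, revs_per_ms=0.12,
                               block_size=4096, track_size=144 * 512)
print(f"{t_block:.2f} ms")
```

Seek time and latency dominate; the transfer itself is a small part of the total, which is why data are moved in blocks rather than word by word.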

140
Magnetic Tape

Cartridge or Cassette
Data stored in parallel
80-track tape with
density 110 Kb/in
tape speed 50 in/s
max. data-transfer rate = 110K × 80 / 8 × 50 = 55
MB/s
200 m tape: 55/50 × 200/0.0254 = 8.661 GB

141
Optical Memory

CD-ROM (compact disk)
CD-R (recordable)
CD-RW (rewritable)
DVD (digital video disk)

142
Memory Systems

General characteristics
Multilevel memories
Hierarchical organization
Two key design issues
automatic translation of addresses
dynamic relocation of data

143
Multilevel Memories

An n-level system (M1, M2, ..., Mn)
two level
main memory (semiconductor DRAMs)
secondary memory (magnetic-disk units)
three level
cache memory (semiconductor SRAMs)
split cache (Instruction I-cache, Data D-cache)
four level
level 1 cache
level 2 cache
both are the nonsplit or unified caches

144
General Characteristics(I)

Two adjacent memory levels Mi and Mi+1
cost per bit Ci > Ci+1
access time tAi < tAi+1
storage capacity Si < Si+1
Communication between levels
CPU can communicate directly with M1
M1 can communicate directly with M2, and so on
except that CPU can bypass cache and go to main
memory

145
General Characteristics(II)

Relocation of addresses and transferring data
between two adjacent levels is a relatively slow
process
requires a reasonably predictable way to
guess the future addresses generated by the CPU

146
Cache and Virtual Memory

The cache and main memory act as a single memory
to the software
The main and secondary memories are NOT
transparent to system software
The main and secondary memories are transparent
to user code --- virtual memory --- like a
single, larger, and directly addressable memory

147
Reasons of Virtual Memory

To free user programs from the need of storage
allocation
To permit sharing of memory space among users
To make programs independent of physical
configuration and capacity of memories
To achieve the very low access time and cost per
bit

148
Locality of Reference(I)

The characteristic of computer programs
the predictability of memory addresses
the locality of reference

149
Locality of Reference(II)

Spatial locality
Instructions and data are specified, and
consequently stored in memory, in roughly the
order in which they are used
Temporal locality --- Working set W(t, T)
loops in programs tend to be executed
repeatedly
W(t, T), the set of addresses referenced during
the time interval (t-T, t), tends to change
rather slowly
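
A minimal sketch of the working-set idea, assuming the address trace is a Python list indexed by time step and that the interval (t-T, t) covers the T most recent references up to and including t. The trace values are hypothetical.

```python
def working_set(trace, t, T):
    """Distinct addresses referenced at time steps t-T+1 .. t (0-based)."""
    start = max(0, t - T + 1)
    return set(trace[start:t + 1])


# Hypothetical trace: a loop over addresses 0..2, then a loop over 7..8.
trace = [0, 1, 2, 0, 1, 2, 0, 1, 7, 8, 7, 8]

# While the first loop runs, the working set stays the same from step to step.
print(working_set(trace, 5, 4), working_set(trace, 6, 4))
# After the program moves to the second loop, the working set has changed.
print(working_set(trace, 11, 4))
```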

150
Cost and Performance(I)

Factors of performance
the address-reference statistics
the access time (tA)
the storage capacity (Si)
the size of blocks (pages) (SPi)
the allocation algorithm (blocks-swapping
process)

151
Cost and Performance(II)

The average cost per bit of memory, c
to reach the goal of making c approach c2, the
cost per bit of the slower memory M2,
S1 must be much smaller than S2
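
The formula on this slide did not survive in the transcript. The sketch below assumes the usual capacity-weighted average for a two-level memory, c = (c1 S1 + c2 S2) / (S1 + S2); the cost and capacity values are hypothetical.

```python
def average_cost_per_bit(c1, s1, c2, s2):
    """Capacity-weighted average cost per bit of a two-level memory (assumed formula)."""
    return (c1 * s1 + c2 * s2) / (s1 + s2)


# Hypothetical values: M1 costs 100 units per bit, M2 costs 1 unit per bit.
small_m1 = average_cost_per_bit(100, 1_000, 1, 1_000_000)
equal_m1 = average_cost_per_bit(100, 1_000_000, 1, 1_000_000)
print(f"S1 << S2: c = {small_m1:.3f}")   # close to c2
print(f"S1 == S2: c = {equal_m1:.1f}")   # far from c2
```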

152
Cost and Performance(III)

The performance can be measured by the hit ratio
H
the probability that a virtual address generated
by the CPU refers to information currently stored
in the faster memory
hit: reference satisfied by M1
miss: reference that must go to M2
miss ratio = 1 - H

153
Cost and Performance(IV)

N1 = the number of references satisfied by M1
N2 = the number of references satisfied by M2
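
The equations for this slide are missing from the transcript. As a sketch, the standard definitions are assumed here: H = N1 / (N1 + N2), and an average access time tA = H tA1 + (1 - H) tA2. The reference counts and access times are hypothetical.

```python
def hit_ratio(n1, n2):
    """Fraction of references satisfied by the faster memory M1."""
    return n1 / (n1 + n2)


def average_access_time(h, t_a1, t_a2):
    """Assumed model: hits cost t_a1, misses cost t_a2."""
    return h * t_a1 + (1 - h) * t_a2


# Hypothetical counts and times (arbitrary time units, 5:1 access-time ratio).
h = hit_ratio(950, 50)
t_a = average_access_time(h, 10, 50)
print(f"H = {h:.2f}, miss ratio = {1 - h:.2f}, tA = {t_a:.1f}")
```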

154
Address Translation(I)

Address mapping or address translation process
map virtual address onto real address
by programmer
by compiler
by loader
by run-time

155
Address Translation(II)

Static translation
complete the translation when the program is loaded
Dynamic translation
complete the translation during execution
run-time address translation by MMU (Memory
management unit)

156
Base Addressing

Effective address = base + displacement
Aeff = B + D or Aeff = B.D (concatenation)
Limit address: length of block
Figure 6.25, pp. 434
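
A minimal sketch of both forms of base addressing. It assumes the limit is the block length and that an out-of-range displacement is rejected; the function names and the error handling are illustrative, not taken from the figure.

```python
def effective_address(base, displacement, limit):
    """Aeff = B + D, with the displacement checked against the block length."""
    if not 0 <= displacement < limit:
        raise ValueError("displacement outside the block")
    return base + displacement


def effective_address_concat(base_field, displacement, d_bits):
    """Aeff = B.D: B supplies the high-order bits, D the low-order d_bits."""
    if not 0 <= displacement < (1 << d_bits):
        raise ValueError("displacement does not fit in d_bits")
    return (base_field << d_bits) | displacement


print(hex(effective_address(0x4000, 0x10, 0x100)))
print(hex(effective_address_concat(0x4, 0x010, 12)))
```

Concatenation avoids an addition on every access, at the price of requiring blocks to start on 2^d_bits boundaries.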

157
Translation Look-Aside Buffer

TLB
AV = BV.D
BR = TLB(BV)
AR = BR.D
Figure 6.26, pp. 434
MIPS R2/3000, pp. 434-435
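
The three steps above can be sketched with a dictionary standing in for the TLB. The 12-bit displacement width and the table entries are assumptions chosen for illustration.

```python
D_BITS = 12  # assumed width of the displacement field D

# Hypothetical TLB contents: virtual base BV -> real base BR.
tlb = {0x00012: 0x000A3, 0x00013: 0x00077}


def translate(av, tlb, d_bits=D_BITS):
    """Split AV into BV.D, look BV up in the TLB, and return AR = BR.D."""
    bv = av >> d_bits
    d = av & ((1 << d_bits) - 1)
    br = tlb.get(bv)
    if br is None:
        return None  # TLB miss: the mapping must be fetched from memory tables
    return (br << d_bits) | d


print(hex(translate(0x12ABC, tlb)))
print(translate(0x99000, tlb))
```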

158
Pages and Segments(I)

A page (frame) is a fixed-size block
suitable for physical partitioning and swapping
of information
A segment is a logical block of program or data
its boundaries correspond to natural program
or data boundaries
for example, a stack segment

159
Pages and Segments(II)

Two-stage address translation
AV = SI.PI.D
PB = Segment TLB(SB.SI)
P = Page TLB(PB.PI)
AR = P.D
Figure 6.30, pp. 440
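
A sketch of the two-stage lookup, with dictionaries keyed by pairs to model the concatenated indices SB.SI and PB.PI. The table contents and the 12-bit displacement are hypothetical.

```python
def translate_two_stage(si, pi, d, sb, segment_table, page_table, d_bits=12):
    """Segment lookup, then page lookup, then concatenate with D."""
    pb = segment_table[(sb, si)]   # PB = Segment TLB(SB.SI)
    p = page_table[(pb, pi)]       # P = Page TLB(PB.PI)
    return (p << d_bits) | d       # AR = P.D


# Hypothetical tables: segment 2 of the program at SB = 0 uses page table 5,
# and page 1 of that table lives in frame 0x3F.
segment_table = {(0, 2): 5}
page_table = {(5, 1): 0x3F}

ar = translate_two_stage(si=2, pi=1, d=0x123, sb=0,
                         segment_table=segment_table, page_table=page_table)
print(hex(ar))
```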

160
Memory Address Translation in Intel Pentium

32-bit linear address N
N = 32-bit effective address AV + Segment
TLB(STB.14-bit Ls)
N = 10-bit Nd . 10-bit Np . 12-bit displacement
AR = Page table TLB(Page directory
TLB(PDB.Nd).Np)
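
A sketch of the 10/10/12-bit split of the linear address and the two table lookups that follow. The dictionaries are hypothetical stand-ins for the page directory and page tables, and appending the 12-bit displacement to the frame number is assumed as the final step.

```python
def split_linear_address(n):
    """Return (Nd, Np, D) for a 32-bit linear address."""
    nd = (n >> 22) & 0x3FF    # 10-bit page-directory index
    np_ = (n >> 12) & 0x3FF   # 10-bit page-table index
    d = n & 0xFFF             # 12-bit displacement
    return nd, np_, d


def linear_to_real(n, page_directory, page_tables):
    """Directory lookup selects a page table; that table gives the frame."""
    nd, np_, d = split_linear_address(n)
    table_id = page_directory[nd]
    frame = page_tables[(table_id, np_)]
    return (frame << 12) | d


# Hypothetical table contents.
page_directory = {1: 7}
page_tables = {(7, 3): 0x00ABC}

print(split_linear_address(0x00403025))
print(hex(linear_to_real(0x00403025, page_directory, page_tables)))
```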

161
Page Size

Utilization versus page size
Figure 6.32, pp. 443
Hit ratio versus page size
Figure 6.33, pp. 443

162
Memory Allocation

Nonpreemptive allocation
blocks already occupying memory cannot be
overwritten or moved
first fit
best fit
Preemptive allocation
relocation is allowed
move
replace
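
The two nonpreemptive strategies can be sketched over a list of free regions. The (start, length) representation and the sample holes are assumptions for illustration.

```python
def first_fit(holes, size):
    """Start address of the first free region large enough, or None."""
    for start, length in holes:
        if length >= size:
            return start
    return None


def best_fit(holes, size):
    """Start address of the smallest free region large enough, or None."""
    candidates = [(length, start) for start, length in holes if length >= size]
    return min(candidates)[1] if candidates else None


# Hypothetical free regions as (start, length) pairs, in address order.
holes = [(0, 50), (100, 300), (500, 120)]

print(first_fit(holes, 100))  # takes the 300-word region at address 100
print(best_fit(holes, 100))   # takes the tighter 120-word region at address 500
print(first_fit(holes, 400))  # nothing fits without preemption
```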

163
Replacement policies

FIFO
LRU
OPT
Figure 6.36, pp. 448
Figure 6.37, pp. 449
Figure 6.38, pp. 450
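
A small simulation of FIFO and LRU, counting hits on a page-reference trace. The trace and the capacity of three page frames are hypothetical and are not claimed to match the figures. OPT is left out because it needs knowledge of future references.

```python
from collections import OrderedDict, deque


def fifo_hits(trace, capacity):
    """Count hits when the page resident longest is replaced."""
    resident, order, hits = set(), deque(), 0
    for page in trace:
        if page in resident:
            hits += 1
            continue
        if len(resident) == capacity:
            resident.discard(order.popleft())
        resident.add(page)
        order.append(page)
    return hits


def lru_hits(trace, capacity):
    """Count hits when the least recently used page is replaced."""
    resident, hits = OrderedDict(), 0
    for page in trace:
        if page in resident:
            hits += 1
            resident.move_to_end(page)        # mark as most recently used
        else:
            if len(resident) == capacity:
                resident.popitem(last=False)  # evict least recently used
            resident[page] = True
    return hits


trace = [2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2]
print(fifo_hits(trace, 3), lru_hits(trace, 3))
```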

164
Caches(I)

History
appeared as early as 1968 in the IBM S/360
in 1980, caches directly address the von Neumann
bottleneck by providing the CPU with fast,
single-cycle access to its external memory

165
Caches(II)

A cache serves as
a fast intermediate memory
a buffer between the CPU and its main memory
TLBs within a MMU
Data buffers built into high-speed secondary
memory devices

166
Caches(III)

Access time ratio
(M1, M2) is around tA2/tA1 = 5/1
(M2, M3) is around tA3/tA2 = 1000/1
Caches are therefore managed by high-speed
hardware circuits rather than by software routines
Figure 6.39, pp. 453

167
Cache Organization(I)

Cache data memory (cache blocks or lines)
Cache tag memory (cache directory)

[Figure: cache M1, made up of the cache tag memory and the cache data memory, with Address, Control, Data, and Hit signals]

168
Cache Organization(II)

Performance factors
time to match tag address
time to access data memory
use the SRAM technology of 10ns access time
Two general organizations
look-aside
look-through
Figure 6.41, pp. 454

169
Look-Aside Cache

CPU placing a real address on system bus
cache comparing address to tag
if hit, read or write operates on cache
if miss, read or write operates on main memory
and a block of data including its address is
transferred from main memory to cache
if miss, use the block replacement policy such as
LRU to determine where to place the incoming
block
the block transfer could tie up the system bus

170
Look-Through Cache

CPU placing a real address on a separate local
bus
cache access and memory access can proceed
concurrently
CPU sends memory requests to main memory only
after a cache miss
to speed up cache-main memory transfer, the local
bus between cache and main memory can be wider
than the system bus, for example as wide as the
cache block size (a 16-byte block needs a 128-bit
data bus)
disadvantages
higher complexity and cost
longer main memory access time if miss occurs

171
Cache Operation(I)

Read operation - Figure 6.42, a cache with 4-byte
block size and 12-bit address
Write operation - Figure 6.43
a temporary inconsistency between cache and main
memory is possible
preventing the improper use of stale data is the
cache coherence or cache consistency problem
between multiprocessors
between single-CPU and IO controllers
a systematic updating policy (chapter 7)

172
Cache Operation(II)

a systematic updating policy
a change (dirty) bit for each cache block
cache write-back or copy-back technique
if replacing occurs, the block data is written
back to main memory when its change bit is on
disadvantages
has a temporary inconsistency before write-back
complicates recovery from system failures

173
Cache Operation(III)

cache write-through policy
write data to both cache and main memory for
every memory write cycle
uses more memory write cycles than the write-back
policy, which slows system performance
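
The difference between the two policies shows up in the number of main-memory writes. The sketch below assumes a direct-mapped cache that allocates a line on a write miss; those choices, and the operation sequence, are illustrative assumptions.

```python
def count_memory_writes(ops, num_lines, write_back):
    """ops is a list of ('r' or 'w', block number); returns main-memory writes."""
    lines = {}   # line index -> (block, dirty bit)
    writes = 0
    for kind, block in ops:
        idx = block % num_lines
        resident = lines.get(idx)
        if resident is None or resident[0] != block:
            # Miss: the old block is written back only if its dirty bit is on.
            if resident is not None and resident[1]:
                writes += 1
            resident = (block, False)
        if kind == "w":
            if write_back:
                resident = (block, True)   # defer the memory update
            else:
                writes += 1                # write-through: update memory now
        lines[idx] = resident
    return writes


# Four writes to block 0, then a read of block 4, which replaces block 0.
ops = [("w", 0)] * 4 + [("r", 4)]
print(count_memory_writes(ops, 4, write_back=True))
print(count_memory_writes(ops, 4, write_back=False))
```

A dirty block still in the cache when the sequence ends has not yet reached main memory, which is the temporary inconsistency noted for write-back.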

174
Address Mapping

To quickly determine whether a tag address is
present in the cache
the fastest technique is to use the associative
or content addressing scheme to compare all tags
simultaneously

175
Associative Addressing

Fields of an item in CAM (Content-Addressable Memory)
KEY: stored address
DATA: information to be accessed
memory access request
in an associative cache, the tag serves as the key
the incoming tag is compared simultaneously to all
tags in the cache's tag memory
if a cache hit occurs, a match signal triggers the
memory access from the cache's data field
if a cache miss occurs, forward the request to main
memory

176
Associative Memory(I)

A fixed-length word for each unit
Mask register
to identify the bit positions (need not be
adjacent) that define the key
Match circuit
to compare all stored words with the key bits simultaneously
Select circuit
to enable the data field to be accessed
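
A software sketch of a masked associative search. The hardware compares every stored word at once, while this loop visits them one after another; the stored words, key, and mask are hypothetical.

```python
def cam_search(words, key, mask):
    """Indices of stored words that equal key in the bit positions set in mask."""
    return [i for i, word in enumerate(words) if ((word ^ key) & mask) == 0]


# Hypothetical 8-bit words; the mask selects the upper four bits as the key.
words = [0b1010_0001, 0b1010_1111, 0b0110_0001]
matches = cam_search(words, key=0b1010_0000, mask=0b1111_0000)
print(matches)

# The mask bits need not be adjacent: here only bits 7 and 0 define the key.
print(cam_search(words, key=0b0000_0001, mask=0b1000_0001))
```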

177
Associative Memory(II)

about 10 transistors per bit of associative memory
(Figure 6.44-46, pp. 458-460)
the cache's LRU block replacement policy is
implemented by special hardware that constantly
monitors cache usage.

178
Direct Mapping(I)

An alternative, simpler address-mapping
technique for caches
Divide M1 into s1 sets M1(0), M1(1), ..., M1(s1-1),
where s1 = 2^s
each set is a block of n consecutive words

179
Direct Mapping(II)

M2(i) is mapped into M1(j) if j = i (modulo s1)
if s1 = 2^6 = 64, blocks with addresses i, i+64,
i+128, i+192, ... can be mapped into M1(i)
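
The mapping rule is a single modulo operation. A sketch with s1 = 64 as in the slide; the function name is illustrative.

```python
S1 = 64  # number of sets in M1, 2**6 as in the slide


def direct_map(i, s1=S1):
    """Index j of the cache set M1(j) that main-memory block M2(i) maps into."""
    return i % s1


# Blocks i, i+64, i+128, i+192 compete for the same cache set.
i = 5
print([direct_map(i + k * S1) for k in range(4)])
```

Because these blocks share one set, two of them cannot be cached at the same time under direct mapping.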

180
Set-Associative Addressing

K-way set-associative mapping
Each set contains k = 2^h blocks
Permits up to k members of the same equivalence
class to be stored in the cache simultaneously
M2(i) and M2(j) are in the same class if i = j (modulo
s1)
One-way set-associative mapping is equivalent to
direct mapping
Two-way, four-way, eight-way, ...
Figure 6.49, pp. 463

181
Design of a 2-Way Set-Associative Cache(I)

8KB 2-way set-associative addressing
Example of a 32-bit processor
(Figure 6.50, pp. 464)
8B block, VAX-11/780 in 1978
32B block, PowerPC/603 in 1993

182
Design of a 2-Way Set-Associative Cache(II)

32-bit Memory address
Tag 20 bits -> 20 bits per cache tag
Set address 9 bits -> 512 sets
Displacement 3 bits -> 8 bytes (64 bits) per block
Cache architecture
Tag RAM: 512 x 20 x 2 (T0 and T1)
Data RAM: 512 x 64 x 2 (D0 and D1)
Two 20-bit tag comparators
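
The field widths can be checked with a short sketch that splits a 32-bit address into tag, set address, and displacement, and confirms that the data RAM adds up to 8KB. The function name and sample address are illustrative.

```python
TAG_BITS, SET_BITS, DISP_BITS = 20, 9, 3


def split_address(addr):
    """Return (tag, set address, displacement) for a 32-bit address."""
    displacement = addr & ((1 << DISP_BITS) - 1)
    set_address = (addr >> DISP_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (DISP_BITS + SET_BITS)
    return tag, set_address, displacement


# Data RAM size: 512 sets x 2 ways x 8 bytes per block.
data_bytes = (1 << SET_BITS) * 2 * (1 << DISP_BITS)

tag, set_address, displacement = split_address(0x12345678)
print(hex(tag), set_address, displacement, data_bytes)
```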

183
Design of a 2-Way Set-Associative Cache(III)

Cache operation
Use 9-bit set address to read T0 and T1 and
compare both outputs with Atag simultaneously
if a match occurs, Ti is used to initiate a
memory access of Di to or from 64-bit data bus
if a smaller data bus is used, a block needs
several cycles to transfer data
if a miss occurs, a 64-bit block is swapped from
main memory to cache
VAX-11/780 uses a random replacement and
write-through updating policy
PowerPC/603 uses an LRU and write-back policy
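
A minimal sketch of the lookup described above, for 512 sets of two 8-byte blocks with LRU replacement. Only tags are modeled, not the data RAM or the write policy, and the class is an illustration rather than the design in the figure.

```python
class TwoWayCache:
    """Tag store of a 2-way set-associative cache with LRU replacement."""

    def __init__(self, num_sets=512, block_bytes=8):
        self.num_sets = num_sets
        self.offset_bits = block_bytes.bit_length() - 1
        # Each set holds up to two tags, least recently used first.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit; on a miss, load the block, evicting the LRU way."""
        block = addr >> self.offset_bits
        index, tag = block % self.num_sets, block // self.num_sets
        ways = self.sets[index]
        hit = tag in ways
        if hit:
            ways.remove(tag)
        elif len(ways) == 2:
            ways.pop(0)       # evict the least recently used tag
        ways.append(tag)      # this tag is now the most recently used
        return hit


cache = TwoWayCache()
a, b, c = 0x0000, 0x1000, 0x2000   # three addresses that share set 0
results = [cache.access(x) for x in (a, b, a, c, a, b)]
print(results)
```

Two of the three conflicting blocks can be resident at once, which a direct-mapped cache of the same size would not allow.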

184
Structure versus Performance

The type of information to store in cache
The dimension of cache
The control method of cache
The impact on performance

185
Cache Types(I)

By different access behavior patterns
A unified cache stores both instructions and data
together
A split cache has two independent units
an I-cache for instructions
few write operations
more temporal and spatial locality
a D-cache for data

186
Cache Types(II)

By the level in the memory hierarchy
Primary cache Level 1 (L1) cache
implemented in part of the on-chip memory of a
microprocessor chip
Secondary cache Level 2 (L2) cache
via an off-chip memory

187
Performance

tB tA2 the block-transfer time from main
memory to cache can be identical to a single
