Title: Chapter 1 Microcomputers and Microprocessors
1Chapter 1 Microcomputers and Microprocessors
- Microprocessor Evolution and Performance
2Contents
- Introduction to microcomputer system
- Microprocessor evolution
- the INTEL processor family
- Microprocessor performance
3Introduction to Microcomputer
- A microcomputer can be viewed as a machine with
- I/O devices for input/output,
- a microprocessor for processing,
- memory units for storage,
- buses for connecting the above components
- In the 1970s, a microcomputer was normally understood to be a computer considerably smaller than a minicomputer, possibly using ROM for program storage
4Basic hardware units
- Input
- e.g. keyboard, mouse
- Microprocessor
- e.g. 8085, 8086, mc68000 microprocessors
- Memory
- e.g. RAM, hard disk
- Output
- e.g. monitor, printer
5Buses
- Buses: external connections to the input/output units
- Major buses
- Address bus: address of memory locations containing instructions or data
- Data bus: contents of memory locations
- Control bus: synchronization and handshaking between components
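As a rough illustration of what the address bus implies, the sketch below computes how many byte locations a bus of a given width can select, assuming byte-addressable memory. The function names are illustrative only and do not come from the slides.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses an address bus of this width can select."""
    return 1 << address_bits


def human(n: int) -> str:
    """Format a byte count using binary units (1 KB = 1024 bytes)."""
    for unit in ("bytes", "KB", "MB", "GB"):
        if n < 1024 or unit == "GB":
            return f"{n} {unit}"
        n //= 1024


# Bus widths that appear later in the deck: 8-bit CPUs, 8086, 80286, 80386.
for bits in (16, 20, 24, 32):
    print(f"{bits}-bit address bus -> {human(addressable_bytes(bits))}")
```

Each extra address line doubles the reachable memory, which is why the step from 16 to 20 lines takes the limit from 64 KB to 1 MB.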
6General Architecture
Memory Unit
Secondary memory
Primary memory
Microprocessing unit
Input unit
Output unit
7Processor History
8First Generation Computers
- Vacuum tube technology
- Large room, air-conditioned
- Tube life-time 3,000 hours
- Useless Machine?
- 1951: first Univac I (UNIVersal Automatic Computer) delivered
- 1952: prediction of the presidential election by CBS
- 1952: IBM Model 701 Data Processing System
9Second Generation Computers
- The Transistor Is Born (Solid-State Era)
- 1948: invention of the bipolar transistor
- 1956: Nobel Prize in Physics awarded to Drs. William Shockley, John Bardeen, and Walter H. Brattain (Bell Labs)
- 1954: Bell Labs all-transistorized computer (TRADIC)
- 800 transistors
- Much less heat
- More reliable and less costly
10Second Generation Computers
- Mainframe Computers
- 1958: IBM's first transistorized computers, the 7070/7090
- 1959: 1401 (business-oriented model)
- Built on circuit boards mounted into rack panels, or frames
- Main frame (mainframe): the CPU portion of the computer
- Popular with business and industry
11Third Generation Computers
- Invention of the IC, 1959
- Dr. Robert Noyce (Fairchild) and Jack Kilby (TI)
- Kilby: fabricating resistors, capacitors, and transistors on a germanium wafer, and connecting these parts with fine gold wires
- Noyce: isolating individual components with reverse-biased diodes, and depositing an adherent metal film over the circuit, thus connecting the components
- First IC: a 2-transistor multivibrator
- By the mid-1960s, memory chips with 1,000 components were common
12Third Generation Computers
- 1964 IBM 360 Series (32-bit)
- The first to use IC technology
- A family of 6 compatible computers
- 40 different I/O and auxiliary storage devices
- Memory capacity 16K words to over 1MB.
- 32-bit registers x 16
- 24-bit address bus
- 128-bit data bus
13Third Generation Computers
- 1964 IBM 360 Series (32-bit)
- 375,000 computations per second
- (<< 150 MIPS for a Pentium 100)
- $5 billion development cost
- IBM became the leading mainframe company
14Minicomputer
- 1960s: Space Race between the US and USSR
- IC industry boom
- A tremendous demand by scientists and engineers for an inexpensive computer that they could operate by themselves
- 1965: DEC PDP-8 (by Edson de Castro's group)
- Low-cost ($25,000) minicomputer
- 12-bit
- 16-bit PDP-11
- Supermini
15Microprocessors CPU on a Chip
- 1968: INTEL (Integrated Electronics)
- Founded by Robert Noyce and Gordon Moore (from Fairchild)
- Original goal: the semiconductor memory market
- 1969: customized ICs for Busicom calculators
- Ted Hoff and Stan Mazor proposed a 4-bit CPU on a single chip, plus ROM and RAM chips
16Microprocessors CPU on a Chip
- 1971: 4000 family
- By Federico Faggin
- 4001: 2K-bit ROM with 4-bit I/O port
- 4002: 320-bit RAM, 4-bit output port
- 4003: 10-bit serial-in parallel-out shift register
- 4004: 4-bit processor
- Processor-on-a-chip: the microprocessor era
17Microprocessors CPU on a Chip
- 1972 8008, 8-bit
- 1974 8080, an improved version
18Microprocessors CPU on a Chip
- 8-bit CPUs
- 16-bit address (64K)
- MC6800 Motorola
- 6502 MOS Technology (spin-off from Motorola)
- Apple-II, Apple DOS
- Z-80 Zilog (spin-off from Intel)
- Z-80 cards on Apple-II, CP/M
19Microprocessors CPU on a Chip
- 16-bit CPUs (Late 1970s)
- 8086, 80186, 80286 Intel
- PC, PC-DOS, MS-DOS, SCO-Unix
- MC68000 Motorola
- 16-bit instructions
- Hardware multiply and divide
- 20-bit address buses (1MB)
- Workstations Sun3
20Microprocessors CPU on a Chip
- 32-bit CPUs
- 80386, 80486 Intel
- MC68020, 68030 Motorola
- 64-bit CPUs
- Pentium, Pentium Pro (64-bit external data bus,
32-bit internal registers, not recognized as
64-bit CPUs in terms of internal register word
length)
21Microcomputers Computers Based on Microprocessors
- 1975: MITS Altair 8800 (kit)
- $399, i8080, programmed by depositing 1s/0s via front panel switches
- Other computers boom
- 8080: MITS,
- 6800: SWTPC 6800,
- Z-80: TRS-80,
- 6502: Apple I, 8K, programmed with BASIC
- Steve Jobs & Steve Wozniak, millionaires from PC COMs
22Personal Computers the Open Architecture Era
- 1981: IBM PC
- A system board (mother board)
- Intel 8088 processor
- 16K memory
- 5 expansion slots
- Third-party vendors to supply various I/O adapter cards
- Open architecture
- Computer with interchangeable components
23Micro-controllers Microcomputers on a Chip
- Microcontroller a computer on a chip
- Microprocessor, plus
- On-chip memory, plus
- Input/output ports
- 1995: microcontrollers outsold microprocessors 10:1
- Embedded in various equipment
- Thermostats, machine tools, communication, automotive, ...
- Evolution: gaining greater I/O capabilities
- Intel MCS-51, MCS-96,
24High-Performance Processors
- Supercomputers
- Aircraft design, global climate modeling, oil-bearing formation, molecular design of new drugs, financial behavior
- CDC 6600, 7600: Seymour Cray
- Cray-1: 1976, the first true supercomputer
- ECL, 128 kW power consumption
- 130 MFLOPS (Pentium 100: 150 MFLOPS)
- $5.1 million
25High-Performance Processors
- Parallel Processors
- Tens of gigaflops
- Multi-processors wired by a common bus
- Each is given a portion of the problem to solve
- Hypercube early 1980s
- Cosmic Cube, iPSC (with i860/RISC chips)
- 2D rectangular mesh architecture: multiple processors at each node
- Intel teraflops computer with 4500 nodes, each powered by 2 Pentium Pro 200s.
26RISC vs. CISC
- RISC Reduced Instruction Set Computer (1980s)
- A small number of fixed-length instructions
- Simple addressing modes
- A large number of registers
- Instructions executed in one clock cycle
- Intel i860 (Cray on a Chip)
- 82 instructions, 32-bit long each
- Four addressing modes
- 32 general-purpose registers
27RISC vs. CISC
- CISC Complex Instruction Set Computer
- A large number of variable length instructions
- Multiple addressing modes
- A small number of registers
- Multiple number of clock cycles to execute
- Intel 8086
- Over 3000 instruction forms, 1-6 bytes
- 9 addressing modes
- 8 general-purpose registers
- Execution from 2 to 80 cycles
28RISC vs. CISC
- RISC
- Control unit is much simpler (simpler instructions, execution in 1 CLK)
- Faster execution with less total on-chip logic
- Control unit chip area: 10% (vs. 50% for CISC)
- More area for register file, data and instruction caches, FPU, and co-processor
- PowerPC: 32-bit, by IBM, Apple, Motorola
- SPARC: for Sun Microsystems workstations
29Application-Specific Processors
- DSP Chips
- Mostly for analog signal processing
- ADC-DSP-DAC architecture
- Avoids processing analog signals with discrete circuits involving capacitors and inductors
- DSPs perform complex mathematical functions
- Digital filters, spectrum analysis
30Application-Specific Processors
- DSP Chip Architecture
- Separate data/program areas: Harvard architecture
- Hardware multipliers and adders, optimized to execute in a single cycle
- Arithmetic pipelining: several instructions operated on at once
- Hardware loop control
- Multiple I/O ports for communication with other processors
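To make the role of the hardware multiplier and adder concrete, here is a minimal sketch of a digital (FIR) filter written as a plain multiply-accumulate loop. It is not taken from the slides; the function name and the moving-average coefficients are assumptions chosen for illustration. On a DSP chip each inner-loop step would ideally map to one single-cycle MAC operation.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * x[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:                  # samples before x[0] are treated as 0
                acc += c * samples[n - k]   # one multiply-accumulate (MAC)
        out.append(acc)
    return out


# Hypothetical 4-tap moving average applied to a step from 4 to 8.
moving_average = [0.25, 0.25, 0.25, 0.25]
print(fir_filter([4, 4, 4, 4, 8, 8, 8, 8], moving_average))
```

The output ramps smoothly instead of jumping, which is the low-pass behaviour a moving average is expected to show.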
31Summary of Processor History
- 1940s: vacuum tubes, large and power-hungry
- 1950s: transistor (1948-)
- 1959: first IC (second industrial revolution)
- 1960s: ICs were widely used to build CPUs.
- 1971: Intel 4004 microprocessor (2300 transistors)
- Start of the microprocessor age
- Late 1970s: 8080/85
32Summary of Processor History
- 1980: RISC (reduced instruction set computer)
- CISC (complex instruction set computer) vs. RISC
- CISC family: Intel 80x86, Pentium; Motorola 68000 series
- All others are RISC series.
33Evolution of INTEL Processors
- 4004 ('71) to Pentium Pro ('95-)
34INTEL
- Integrated Electronics
- 1968: founded by Robert Noyce and Gordon Moore
- IA: Intel Architecture (e.g., IA-16, IA-32, IA-64); since the 8008 ('72) it has become the de facto standard
- Evolution
- Internal register sizes
- External bus widths
- Real, Protected, and Virtual 8086 modes
354-bit Processors
- 4004
- first microprocessor
- became available in 1971
- 4-bit microprocessor
- 4-bit registers, 4-bit data bus
- Transistors: 2250
- Min. feature size: 10 microns
- Address bus: 10 bits/1K
- 0.06 MIPS (@ 0.108 MHz)
- No internal cache
368-bit Processors
- 8008 (1972), 8080 (1974), 8085 (1976)
- 8-bit microprocessors
378086 IA standard
- Became available in 1978
- 16-bit data bus
- 20-bit address bus (was 16-bit for 8080)
- Memory organization: 16 segments of 64 KB (1 MB limit)
- CPU reorganized into BIU (bus interface unit) and EU (execution unit)
- Allows fetch and execution simultaneously
- Internal registers expanded to 16 bits
- Allows access to the low/high byte separately
388086
- Hardware multiply and divide instructions
- External math co-processor
- Instruction set compatible with 8080/8085
- 8086 defined the 80x86 architecture
398086
- Not quite successful
- 16-bit data bus: requires two separate 8-bit memory banks
- Memory chips were expensive
408088 PC standard
- Became available in 1979, almost identical to the 8086
- 8-bit data bus for hardware compatibility with the 8080
- 16-bit internal registers and internal data bus (same as 8086)
- 20-bit address bus (was 16-bit for 8080)
- BIU redesigned
- Memory organization: 16 segments of 64 KB (1 MB limit)
- Two memory accesses for 16-bit data (less efficient)
- But lower cost
- 8088 used by the IBM PC (1981), 16K-64K, 4.77 MHz
4180186, 80188 High Integration CPU
- PC system
- 8088 CPU + various supporting chips
- Clock generator
- 8251 serial I/O (RS-232)
- 8253 timer/counter
- 8255 PPI (programmable peripheral interface)
- 8257 DMA controller
- 8259 interrupt controller
- 80186/80188 = 8086/8088 + supporting functions
- Compatible instruction set (+ 9 new instructions)
4280286
- Became available in 1982
- used in IBM AT computer (1984)
- 16-bit data bus
- Clock speed 25% faster than the 8088; throughput 5 times greater than the 8088
- 24-bit address bus (16 MB) (vs. 20-bit/1 MB for the 8086)
4380286 Real vs. Protected Modes
- Larger address space: 24-bit address bus
- Real Mode vs. Protected Mode
- Real Mode
- Power-on default mode
- Functions like an 8086: uses only the 20 least significant address lines (1 MB)
- Software compatible with the 8086
- 16 new instructions (for Protected Mode management)
- Faster: the 286 is a redesigned processor, plus a higher clock rate (6-8 MHz)
4480286 Real vs. Protected Modes
- Protected Mode
- Multi-program environment
- Each program has a predetermined amount of memory
- Addressed via segment selector (physical addresses invisible); 16 MB addressable
- Multiple programs loaded at once (within their respective segments), protected from read/write by each other
4580286 Real vs. Protected Modes
- Protected Mode
- Cannot be switched back to Real Mode, to avoid illegal access by switching back and forth between modes
- A faster 8086 only?
- MS-DOS requires that all programs be run in Real Mode
46Clock Speed
- Electrical signals cannot change instantaneously (a transition period is required)
- The system clock provides a timing signal for synchronization
- Clock speed cannot be used to compare the performance of microprocessors with different instruction sets
- e.g., a 66 MHz Pentium is twice as fast as a 66 MHz 80486
4780386DX (aka. 80386)
- Available in 1985, a major redesign of the 8086/286
- Compatibility commitment through 2000
- 32-bit data and address buses (4 GB memory)
- Real Address Mode: 1 MB visible, like 286 real mode
- Protected Virtual Address Mode
- On-board MMU
- Segmented tasks of 1 byte to 4 GB
- Segment base, limit, and attributes defined by a descriptor register
- Page swapping: 4 KB pages, up to 64 TB virtual memory space
- Windows, OS/2, Unix/Linux
4880386DX (aka. 80386)
- Virtual 8086 mode (a special Protected Mode feature) permits multiple 8086 virtual machines with multitasking (each similar to real mode)
- Windows (multiple MS-DOS sessions)
- Clock rate
- Max. 40 MHz, 2 clock pulses per R/W bus cycle
- External memory cache to avoid wait states
- Fast SRAM
- 93% hit rate with 64K cache
- Compatible instructions (14 new)
4980386SX
- 80386SX (for transition to 32-bit)
- 16-bit data bus/32-bit register
- 24-bit address bus
5080486DX
- 1989: a polished 386, 6 new OS-level instructions
- Virtually identical to the 386 in terms of compatibility
- RISC design concepts
- Fewer clock cycles per operation; a single clock cycle for the most frequently used instructions
- Max. 50 MHz
- 5-stage execution pipeline
- Portions of 5 instructions execute at once
5180486DX
- Highly Integrated
- On board 8K memory cache
- FPP (equivalent to external 80387 co-processor)
- Twice as fast as 386 at any given clock rate
- 20 MHz 486 ≈ 40 MHz 386
5280486SX
- 80486SX
- NOT a 16-bit version for transition purpose
- no coprocessor
- No internal cache
- For low-end applications
- Max. 33Mhz only
5380486DX2/DX4 Overdrive Chips
- Processor speeds increased too fast
- Redesigning microcomputers for compatibility becomes harder
- Solution: separate internal speed from external speed, and improve performance independently
- 80486DX2/DX4: internal clock is twice/three times (NOT four times) the external clock, so the processor runs faster internally
5480486DX2/DX4 Overdrive Chips
- System board design is independent of processor upgrades (less expensive components are allowed)
- Processor operates at its maximum data rate internally
- Only the slow accesses to external data operate at the system board rate
- Internal cache offsets the speed gap
- 486DX2-66: 66 MHz internal, 33 MHz external
- 486DX4-100: 100 MHz internal, 33 MHz external (3x)
- OverDrive sockets for upgrading 486DX/SX to 486DX2/DX4 (with OverDrive socket pin-outs)
55Pentium Superscalar Processor
- Available in 1993
- 32-bit architecture
- Superscalar architecture
- Scaling: scaling down the etchable feature size to increase IC complexity (e.g., DRAM)
- 10 microns (4004) to 0.13 microns (2001)
- Superscalar: goes beyond simply scaling down
- Two instruction pipelines, each with its own ALU, address generation circuitry, and data cache interface
- Executes two different instructions simultaneously
56Pentium Superscalar Processor
- Onboard cache
- Separate 8K data and code caches to avoid access conflicts
- FPP
- Instruction pipeline: 8 stages
- Optimized floating point functions
- 5x-10x FLOPs of 486
- 2x performance of 486 at any clock rate
57Pentium Superscalar Processor
- Compatibility with 386/486
- Internal 32-bit registers and address bus
- Data bus expanded to 64 bits for a higher data transfer rate
- Compare the 8088 and 386SX transitions
58Pentium Superscalar Processor
- non-clone competition from AMD, Cyrix
- development of brand identity by Intel
59Pentium Pro Two Chips in One
- Became available in 1995
- Superscalar of degree 3
- Can execute 3 instructions simultaneously
- Optimized for 32-bit operating systems (e.g., Windows NT, OS/2 Warp)
- Two separate silicon dies in the same package
- Processor: 0.35 µm, 5.5 million transistors
- 256 KB (or 512 KB) Level 2 cache included in the package, 15.5 million transistors in a smaller area
60Pentium Pro Two Chips in One
- On Board Level 2 cache
- Simplifies system board design
- Requires less space
- Gains faster communication with processor
- Internal (level 1) cache 8K
- Pentium Pro 133 ≈ 2x Pentium 66 ≈ 4x 486DX2 66
61Pentium Pro Dynamic Execution
- Dynamic execution: reduces idle processor time by predicting instruction behavior
- Multiple branch prediction: looks as far as 30 instructions ahead to anticipate program branches
- Data flow analysis: looks at upcoming instructions and determines whether they are available for processing, depending on other instructions; determines optimal execution sequences
- Speculative execution: executes instructions in a different order than entered; speculative results are stored until final states can be determined
62Processor Future
- What's More from Moore's Law?
63Moore's Law
- In 1965, Gordon Moore predicted that
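- The number of transistors per integrated circuit would double every year (in 1975 he revised the rate to about every two years; the often-quoted 18 months is a later popularization)
- He forecast that this trend would continue through 1975
64Moore's Law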
65Other Microprocessors
- Motorola family
- from 6809 through 68040 (the Apple II itself used the MOS Technology 6502)
- PowerPC
- joint venture between Apple, IBM, and Motorola
- RISC Processors
- DEC Alpha, MIPS, Sun SPARC, etc.
66CISC vs. RISC
- CISC (Complex Instruction Set Computer)
- CISC processors have a large, versatile instruction set that supports many complex addressing modes
- move complexity from software to hardware
- RISC (Reduced Instruction Set Computer)
- RISC processors have a small instruction set
- move complexity from hardware to software
67Microprocessor Performance
- Two main factors
- Response time
- the time between the start and completion of a task, also referred to as execution time
- Throughput
- the total amount of work done in a given time
68MIPS
- Million Instructions Per Second
- MIPS = (Instruction count) / (Execution time in seconds x 10^6)
- It specifies performance inversely to execution time
- Faster machines have a higher MIPS rating
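A small worked example of the MIPS definition, assuming the execution time is measured in seconds. The instruction counts and times below are made-up numbers for illustration, not measurements from the slides.

```python
def mips(instruction_count: int, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time in seconds * 10**6)."""
    return instruction_count / (execution_time_s * 1e6)


# Hypothetical program A: 50 million instructions in 2 seconds.
print(mips(50_000_000, 2.0))   # 25.0

# Hypothetical program B on the same machine: fewer but slower instructions.
print(mips(10_000_000, 1.0))   # 10.0
```

The two ratings differ even though the machine is the same, which is one reason a single MIPS figure has to be read together with the program that produced it.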
69Some Problems of MIPS
- Cannot compare computers with different instruction sets, since the instruction counts will certainly differ
- MIPS varies between programs on the same computer
70iCOMP
- An index provided by Intel for comparing the performance of its 32-bit microprocessors
- Based on a variety of performance components that represent integer mathematics, graphics, etc.
- Combines the results of a set of software application benchmarks
72Chapter 2 Computer Codes, Programming, and Operating Systems
- Number Systems
- Computer Codes
- Programming
- Operating Systems
73Number Systems
- Decimal Base 10
- Binary Base 2
- Octal Base 8
- Hexadecimal Base 16
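74Base Conversion 2 ↔ 10
- Binary to Decimal
- $D = \sum_{i=0}^{n-1} b_i \times 2^i$
- Decimal to Binary
- Repeated subtraction
- $D' = \sum_{i=0}^{m-1} b_i \times 2^i = D - 2^m$ (with $b_m = 1$)
- $D' < D$, $m' < m$ ($m$ = max. exponent s.t. $b_m = 1$)
- Long division
- $D' = \lfloor D/2 \rfloor$, $b_i = D \bmod 2$, $D' < D$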
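The conversions on this slide can be sketched in a few lines. This is an illustrative version only, with function names of my own choosing: one routine sums the weighted bits, one subtracts the largest remaining power of two, and one repeatedly divides by 2 and collects remainders.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum b_i * 2**i, where the rightmost character is bit 0."""
    n = len(bits)
    return sum(int(b) << (n - 1 - i) for i, b in enumerate(bits))


def to_binary_by_subtraction(d: int) -> str:
    """Repeated subtraction: take out the largest power of two that still fits."""
    if d == 0:
        return "0"
    m = d.bit_length() - 1            # largest exponent with 2**m <= d
    out = []
    for exp in range(m, -1, -1):
        if d >= (1 << exp):
            out.append("1")
            d -= 1 << exp             # D' = D - 2**m
        else:
            out.append("0")
    return "".join(out)


def to_binary_by_division(d: int) -> str:
    """Long division: remainders of successive divisions by 2, LSB first."""
    if d == 0:
        return "0"
    out = []
    while d > 0:
        out.append(str(d % 2))        # b_i = D mod 2
        d //= 2                       # D' = floor(D / 2)
    return "".join(reversed(out))


print(to_binary_by_subtraction(37), to_binary_by_division(37), binary_to_decimal("100101"))
```

Both decimal-to-binary routines terminate because the remaining value strictly decreases at every step.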
76MCS-51 Program Development
[Figure: MCS-51 program development tool flow. Recoverable labels: Program, Editor, .ASM, Assembler (X8051), .OBJ, Linker (Link), .HEX, Target, .SYM, Symbol Converter (CVTSYM), .SDT, ICE]
77Chapter 3 80x86 Processor Architecture
- 8086/88
- Segmented Memory
- 80386
- 80486
- Pentium
- Pentium Pro
78The 8086 and 8088
- Processor Model
- Programming Model
798086 IA standard
- Became available in 1978
- 16-bit data bus
- 20-bit address bus (was 16-bit for 8080)
- Memory organization: 16 segments of 64 KB (1 MB limit)
- CPU reorganized into BIU (bus interface unit) and EU (execution unit)
- Allows fetch and execution simultaneously
- Internal registers expanded to 16 bits
- Allows access to the low/high byte separately
808088 PC standard
- Became available in 1979, almost identical to the 8086
- 8-bit data bus for hardware compatibility with the 8080
- 16-bit internal registers and internal data bus (same as 8086)
- 20-bit address bus (was 16-bit for 8080)
- BIU redesigned
- Memory organization: 16 segments of 64 KB (1 MB limit)
- Two memory accesses for 16-bit data (less efficient)
- But lower cost
- 8088 used by the IBM PC (1981), 16K-64K, 4.77 MHz
8180186, 80188 High Integration CPU
- PC system
- 8088 CPU + various supporting chips
- Clock generator
- 8251 serial I/O (RS-232)
- 8253 timer/counter
- 8255 PPI (programmable peripheral interface)
- 8257 DMA controller
- 8259 interrupt controller
- 80186/80188 = 8086/8088 + supporting functions
- Compatible instruction set (+ 9 new instructions)
828086 Processor Model: BIU + EU
- BIU
- Memory & I/O address generation
- EU
- Receives code and data from the BIU
- Not connected to the system buses
- Executes instructions
- Saves results in registers, or passes them to the BIU for memory and I/O
838086 Processor Model
Address Generation and Bus Control
EU
BIU
Instruction Queue
84Fetch and Execution Cycle
- BIU + EU allows the fetch and execution cycles to overlap
- 0. System boot, instruction queue is empty
- 1. IP => BIU => address bus; IP++
- 2. Mem[IP-1] => InstrQ[tail]
- 3a. InstrQ[head] => EU => execution
- 3b. Mem[IP++] => InstrQ[tail]
- Maybe multiple instructions
- Repeat 3a + 3b (overlapped)
85Waiting Conditions Memory Access
- BIU + EU execute (almost) continuously without waiting
- Waiting condition: accessing memory locations not in the queue
- BIU suspends instruction fetch
- Issues the external memory address
- Resumes instruction fetch and execution
86Waiting Conditions Jump
- The next instruction is a jump
- Instructions in the queue are discarded
- EU waits for the next instruction at the jump location to be fetched by the BIU
- Resume execution
87Waiting Conditions Long Instructions
- A long instruction is being executed
- Instruction queue is full
- BIU waits
- Resumes instruction fetch after the EU pulls one or two bytes from the queue
88BIU 8088 vs. 8086
- BIU is the major difference
- 8088
- Data bus: 8-bit (vs. 16-bit for the 8086)
- Instruction queue: 4 bytes (vs. 6 bytes for the 8086)
- Only 30% slower than the 8086
- If the queue is kept full
898086 Programming Model
908086 Programming Model
- Data Group
- AX (AH:AL): Accumulator
- BX (BH:BL): Base
- CX (CH:CL): Counter
- DX (DH:DL): Data
918086 Programming Model
- Segment Group
- CS Code Segment
- DS Data Segment
- ES Extra Segment
- SS Stack Segment
- Segment Registers
- Base address to particular segments
928086 Programming Model
- Pointer/Index Group
- IP: Instruction Pointer → CS
- SI: Source Index → DS
- DI: Destination Index → ES
- SP: Stack Pointer → SS
- Index Registers
- Index (offset) or Pointer to a Base address
938086 Flag Word
Low byte (bit 7 down to bit 0): SF ZF X AF X PF X CF
PF: (Even) Parity Flag (1 = even number of 1s in the low-order 8 bits of the result)
AF: Aux. Carry, carry/borrow on bit 3 (low nibble of AL)
ZF: Zero Flag (1 = result is zero)
SF: Sign Flag (0 = positive, 1 = negative)
948086 Flag Word
High byte (bit 15 down to bit 8): X X X X OF DF IF TF
TF: Trap Flag (single-step after the next instruction; cleared by the single-step interrupt)
IF: Interrupt-Enable, enables maskable interrupts
DF: Direction Flag, auto-decrement (1) or auto-increment (0) the index on string operations
OF: Overflow, signed result cannot be expressed within the bits of the destination operand
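As a way to check one's reading of the status flags, the following sketch models how an 8-bit addition would set CF, PF, AF, ZF, SF, and OF. It is a simplified software model written for this note (the function name and dictionary layout are assumptions), not a description of the processor's internal circuitry.

```python
def add8_flags(a: int, b: int) -> dict:
    """Add two 8-bit values and report the status flags an 8086-style ALU would set."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    return {
        "result": result,
        "CF": int(total > 0xFF),                            # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),         # even number of 1s
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),          # carry out of bit 3
        "ZF": int(result == 0),                             # result is zero
        "SF": result >> 7,                                  # copy of the sign bit
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),  # signed overflow
    }


print(add8_flags(0x7F, 0x01))   # 127 + 1 overflows the signed range
print(add8_flags(0xFF, 0x01))   # 255 + 1 wraps to 0 with a carry
```

The first case sets OF but not CF, and the second sets CF but not OF, which shows why signed and unsigned overflow need separate flags.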
95Segmented Memory
- Linear vs. Segmented
- Linear Addressing
- The entire memory is regarded as a whole
- the entire memory space is available all the time
- Segmented
- memory is divided into segments
- Process is limited to access designated segments
at a given time
968086 Memory Organization
- Even and Odd Memory Banks
- 16-bit data bus → one two-byte access or two one-byte accesses
- Allows the processor to work on bytes or on words (16-bit)
- I/O operations are normally conducted in bytes
- Can handle odd-length instructions
- Single byte instructions
- Multiple byte (and very long) instructions
978086 Memory Organization
- Memory Space
- 20-bit address bus
- Linearly, 1M bytes directly addressable
- Memory Banks
- Can read 16-bit data (512K words) from the even- and odd-addressed banks simultaneously
- → needs two memory banks in parallel
- → BHE control line allows addressing the even bank, the odd bank, or both
98Memory Organization Alignment
- Endianness
- One way to model a multi-byte CPU register
- AX = AH:AL
- Two ways to store operands in memory
- Big-endian CPU (IBM 370, M68, SPARC)
- High-order-byte-first (HOBF)
- Maps the highest-order byte of the internal register → lowest (1st) memory byte address
- Operand address → address of the MSB
- MOV R1, N → [N] (1st byte in memory) = MSB of the register
99Memory Organization Alignment
- Little-endian CPU (DEC, Intel)
- Low-order-byte-first (LOBF)
- Maps the lowest-order byte of the register → 1st memory byte
- Operand address → address of the LSB (1st memory byte)
- MOV AX, N → [N] (1st byte in memory) = LSB of the register
- AL ← [N], AH ← [N+1]
- Configurable
- Can switch between big-/little-endian, or
- Provide instructions which convert 16-/32-bit data between the two byte orderings (80486)
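The two byte orderings can be demonstrated with Python's built-in `int.to_bytes` and `int.from_bytes`. The helper names and the small `bytearray` standing in for memory are assumptions made for this illustration.

```python
def store_word(memory: bytearray, address: int, value: int, byteorder: str) -> None:
    """Store a 16-bit value at `address` using 'little' or 'big' byte order."""
    memory[address:address + 2] = value.to_bytes(2, byteorder)


def load_word(memory: bytearray, address: int, byteorder: str) -> int:
    """Load a 16-bit value from `address` using 'little' or 'big' byte order."""
    return int.from_bytes(memory[address:address + 2], byteorder)


N = 4
mem = bytearray(8)

store_word(mem, N, 0x1234, "little")      # Intel-style: LSB at the lower address
print(hex(mem[N]), hex(mem[N + 1]))       # 0x34 0x12  (AL <- [N], AH <- [N+1])

store_word(mem, N, 0x1234, "big")         # Motorola-style: MSB at the lower address
print(hex(mem[N]), hex(mem[N + 1]))       # 0x12 0x34

# Reading big-endian data with the wrong order swaps the bytes.
print(hex(load_word(mem, N, "little")))   # 0x3412
```

The last line shows the kind of mismatch that byte-swapping conversion instructions are meant to correct.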
1008086 Memory Organization
- Aligned operand
- Operand aligned at even-byte (word/dword) boundaries
- Allows a single access to read/write one operand
- Through an internal shift/swap mechanism, if necessary
- Misaligned words
- Word operand does not start at an even address
- Needs 2 bus cycles to read/write the word (8086)
- Issues two addresses to access the two even-aligned words containing the operand
- Slower, but transparent to the programmer
1018086 Memory Organization
- 8088
- always 2 cycles for word operations
- Aligned or not
- Because of 8-bit external data bus
- Single memory bank is sufficient
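The alignment rules for the 8086 and 8088 follow from counting how many bus-width units an operand touches. The sketch below is a simplified model under that assumption (one bus cycle per aligned unit, no wait states); the function name is illustrative.

```python
def bus_cycles(address: int, operand_bytes: int, bus_bytes: int) -> int:
    """Count the bus-width-aligned memory units an operand spans (one cycle each)."""
    first_unit = address // bus_bytes
    last_unit = (address + operand_bytes - 1) // bus_bytes
    return last_unit - first_unit + 1


# 8086: 16-bit (2-byte) data bus
print(bus_cycles(0x0100, 2, bus_bytes=2))   # aligned word
print(bus_cycles(0x0101, 2, bus_bytes=2))   # misaligned word

# 8088: 8-bit (1-byte) data bus, so every word costs two cycles
print(bus_cycles(0x0100, 2, bus_bytes=1))
print(bus_cycles(0x0101, 2, bus_bytes=1))
```

The same function applies to a 32-bit bus by passing `bus_bytes=4`, where an operand costs an extra cycle only when it crosses a 4-byte boundary.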
1028086 Memory Map
- Memory Map How memory space is allocated
- ROM Area boot, BIOS
- RAM OS/User Apps data
- Unused
- Reserved for future hardware/software uses
- Dedicated to specific system interrupt and reset functions, etc.
103Segment Registers
- 64K memory segments x 16
- 16-bit offset each
- CS, DS, ES, SS
104Logical and Physical Addresses
- Physical 20-bit
- Logical 16-bit
- 16-byte segment boundaries
- Address Translation
- E.g., CS:IP
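The translation from a logical segment:offset pair to a 20-bit physical address can be sketched as follows. The segment value is shifted left by 4 bits, which is why segments start on 16-byte boundaries. The register values used here are arbitrary examples, and the function name is an assumption.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode translation: (segment * 16 + offset), truncated to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


cs, ip = 0x1234, 0x0010
print(hex(physical_address(cs, ip)))          # 0x12350

# Different logical addresses can name the same physical byte.
print(hex(physical_address(0x1235, 0x0000)))  # 0x12350

# With only 20 address lines, addresses past 1 MB wrap around to 0.
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0
```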
10580286
- First with Protected Mode
- Review of 286 Protected Mode next
10680286
- Became available in 1982
- used in IBM AT computer (1984)
- 16-bit data bus
- Clock speed 25% faster than the 8088; throughput 5 times greater than the 8088
- 24-bit address bus (16 MB) (vs. 20-bit/1 MB for the 8086)
10780286 Real vs. Protected Modes
- Larger address space: 24-bit address bus
- Real Mode vs. Protected Mode
- Real Mode
- Power-on default mode
- Functions like an 8086: uses only the 20 least significant address lines (1 MB)
- Software compatible with the 8086
- 16 new instructions (for Protected Mode management)
- Faster: the 286 is a redesigned processor, plus a higher clock rate (6-8 MHz)
10880286 Real vs. Protected Modes
- Protected Mode
- Multi-program environment
- Each program has a predetermined amount of memory
- Addressed via segment selector (physical addresses invisible); 16 MB addressable
- Multiple programs loaded at once (within their respective segments), protected from read/write by each other
10980286 Real vs. Protected Modes
- Protected Mode
- Cannot be switched back to Real Mode, to avoid illegal access by switching back and forth between modes
- A faster 8086 only?
- MS-DOS requires that all programs be run in Real Mode
11080386 Model
- Refine 286 Protected Mode
- Expand to 32-bit registers
- New Virtual 8086 Mode
11180386 Review
11280386DX (aka. 80386)
- Available in 1985, a major redesign of the 8086/286
- Compatibility commitment through 2000
- 32-bit data and address buses (4 GB memory)
- Real Address Mode: 1 MB visible, like 286 real mode
- Protected Virtual Address Mode
- On-board MMU
- Segmented tasks of 1 byte to 4 GB
- Segment base, limit, and attributes defined by a descriptor register
- Page swapping: 4 KB pages, up to 64 TB virtual memory space
- Windows, OS/2, Unix/Linux
11380386DX (aka. 80386)
- Virtual 8086 mode (a special Protected Mode feature) permits multiple 8086 virtual machines with multitasking (each similar to real mode)
- Windows (multiple MS-DOS sessions)
- Clock rate
- Max. 40 MHz, 2 clock pulses per R/W bus cycle
- External memory cache to avoid wait states
- Fast SRAM
- 93% hit rate with 64K cache
- Compatible instructions (14 new)
11480386SX
- 80386SX (for transition to 32-bit)
- 16-bit data bus/32-bit register
- 24-bit address bus
11580386 Real vs. Protected Modes
- Larger address space: 32-bit address bus (4 GB)
- Real Mode vs. Protected Mode (refined from 286)
- Real Mode
- Power-on default mode
- Functions like an 8086: (1) uses only the 20 least significant address lines (1 MB); (2) segmented memory retained (64 KB)
- Software compatible with 286
- New Real Mode features
- access to the 32-bit register set
- two new segment registers: FS, GS
11680386 Real vs. Protected Modes
- Protected Mode
- new addressing mechanism vs. real mode
- supports protection levels
- segment size: 1 byte to 4 GB (not fixed at 64 KB)
- segment register: pointer to a descriptor table entry
- not a base address
11780386 Real vs. Protected Modes
- Protected Mode
- descriptor table (8 bytes per entry)
- 32-bit base address of segment
- segment size
- access rights
- memory address = base address (in table) + offset (in instruction)
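A minimal sketch of descriptor-based addressing follows, assuming a heavily simplified descriptor (base, limit, and a single writable bit) held in a Python list. A real 8-byte descriptor packs these fields into bit fields and carries more attributes; the class and function names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    base: int        # 32-bit base address of the segment
    limit: int       # highest valid offset within the segment
    writable: bool   # stand-in for the access-rights field


def linear_address(table, index: int, offset: int, write: bool = False) -> int:
    """Look up a descriptor by index, check limit and rights, then add base + offset."""
    d = table[index]
    if offset > d.limit:
        raise MemoryError("offset beyond segment limit")
    if write and not d.writable:
        raise PermissionError("segment is read-only")
    return (d.base + offset) & 0xFFFFFFFF


# Hypothetical table: entry 0 is read-only code, entry 1 is writable data.
table = [
    Descriptor(base=0x00100000, limit=0xFFFF, writable=False),
    Descriptor(base=0x00200000, limit=0x0FFF, writable=True),
]
print(hex(linear_address(table, 0, 0x20)))              # 0x100020
print(hex(linear_address(table, 1, 0x10, write=True)))  # 0x200010
```

Because every access passes through the limit and rights checks, a program cannot reach memory outside the segments described in its tables.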
11880386 Real vs. Protected Modes
- Protected Mode
- Paging mechanism
- maps the 32-bit linear address (base + offset) => physical address (page frame address)
- → 4 KB page frames in system memory
- 64 TB of virtual memory
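The paging step can be sketched by splitting a 32-bit linear address into a 10-bit page-directory index, a 10-bit page-table index, and a 12-bit offset within the 4 KB page. The dictionary used as a page map below is a stand-in for the real two-level tables, and all names are assumptions made for illustration.

```python
PAGE_BITS = 12                     # 4 KB pages
PAGE_SIZE = 1 << PAGE_BITS


def split_linear(addr: int):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    directory = (addr >> 22) & 0x3FF
    table = (addr >> 12) & 0x3FF
    offset = addr & 0xFFF
    return directory, table, offset


def translate(addr: int, page_frames: dict) -> int:
    """Map a linear address to a physical one using a page-number -> frame-base map."""
    page_number = addr >> PAGE_BITS
    frame_base = page_frames[page_number]      # KeyError here models a page fault
    return frame_base | (addr & (PAGE_SIZE - 1))


print(split_linear(0x00403025))                          # (1, 3, 37)
print(hex(translate(0x00403025, {0x00403: 0x0007A000})))  # 0x7a025

# 64 TB of virtual memory: 2**14 selectable segments of up to 4 GB each.
print((2**14 * 2**32) // 2**40, "TB")                     # 64 TB
```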
11980386 Real vs. Protected Modes
- Protected Mode
- Protection mechanism
- tasks/data/instructions are assigned a privilege level (PL)
- tasks running at a lower PL cannot access tasks or data segments at a higher PL
- running programs are protected from one another
12080386 Real vs. Protected Modes
- Two Ways to Run 8086 Programs
- Real Mode
- Virtual 8086 Mode
- Virtual 8086 Mode
- runs multiple 8086 + other 386 (protected mode) programs independently
- each sees 1 MB (mapped via paging to anywhere in the 4 GB space)
- runs V8086 and Protected Mode programs simultaneously
12180386 Processor Model
386
12280386 Processor Model: BIU + CPU + MMU
- BIU
- controls the 32-bit address and data buses
- keeps the instruction queue full (16 bytes)
- Address pipelining
- address of the next memory location is output halfway through the current bus cycle
- more address decode time
- slower memory chips are OK
- easier to keep up with the faster (2 CLK) bus cycle of the 386
12380386 Processor Model BIU
- dynamic data bus sizing
- switch between 16-/32-bit data bus on the fly
- accommodates external 16-bit memory cards or I/O devices
- adjusts bus timing to use only the least significant 16 bits
12480386 Processor Model BIU
- External memory
- 4 memory banks (4 x 8 = 32 bits)
- BE0-BE3 for bank selection
- access byte, word, or double word
- aligned operands: 1 bus cycle
- misaligned operands (crossing a 4-byte boundary): 2 bus cycles
12580386 Processor Model CPU
- CPU = IU (instruction unit) + EU (execution unit)
- fetch and execution overlap
- IU
- retrieves instructions from the queue
- decodes them
- stores them in the decoded-instruction queue
- EU = ALU + registers (32-bit)
- executes decoded instructions
12680386 Processor Model MMU
- Segmentation unit
- Real mode: generates the 20-bit physical address
- Protected mode: stores base/size/rights in descriptor registers
- caches entries of the descriptor tables held in RAM
- faster operations
- Paging Unit
- determines the physical addresses associated with active segments (divided into 4 KB pages)
- virtual memory support to allow larger programs
12780386 Programming Model
- General Purpose Registers
- Data & Address Groups
- Status & Control Flags
- VM, RF, NT, IOPL
- Segment Group
12880386 Programming Model
- Special purpose Registers
12980386 Programming Model
- Memory Management
- segment descriptors
- keep base, size, access rights
- 3 types of tables: global (GDT), local (LDT), interrupt (IDT)
- addressing
- selector = index (to a table) + RPL
- address = base + offset (from instruction)
- Paging
- TLB
13080386 Programming Model
- Protection (PL)
- task: CPL
- instruction: RPL
- data segment: DPL
- Gates
- special descriptors that allow access to higher-PL tasks from lower-PL tasks
13180486 Review
13280486DX
- 1989: a polished 386, 6 new OS-level instructions
- Virtually identical to the 386 in terms of compatibility
- RISC design concepts
- Fewer clock cycles per operation; a single clock cycle for the most frequently used instructions
- Max. 50 MHz
- 5-stage execution pipeline
- Portions of 5 instructions execute at once
13380486DX
- Highly Integrated
- On board 8K memory cache
- FPP (equivalent to external 80387 co-processor)
- Twice as fast as 386 at any given clock rate
- 20 MHz 486 ≈ 40 MHz 386
13480486SX
- 80486SX
- NOT a 16-bit version for transition purpose
- no coprocessor
- No internal cache
- For low-end applications
- Max. 33Mhz only
13580486DX2/DX4 Overdrive Chips
- Processor speeds increased too fast
- Redesigning microcomputers for compatibility becomes harder
- Solution: separate internal speed from external speed, and improve performance independently
- 80486DX2/DX4: internal clock is twice/three times (NOT four times) the external clock, so the processor runs faster internally
13680486DX2/DX4 Overdrive Chips
- System board design is independent of processor upgrade (less expensive components are allowed)
- Processor operates at maximum speed and data rate internally
- Only slow access to external data operates at system board rate
- Internal cache offsets the speed gap
- 486DX2 66: 66 MHz internal, 33 MHz external
- 486DX4 100: 100 MHz internal, 33 MHz external (3x)
- Overdrive sockets for upgrading 486DX/SX to 486DX2/DX4 (with overdrive socket pin-outs)
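The clock relationship on these two slides reduces to one multiplication. The sketch below uses a hypothetical helper and the nominal 33 MHz board clock; the marketed 66 and 100 MHz figures correspond to a board clock of about 33.3 MHz.

```python
# Multipliers as stated on the slide: DX2 doubles, DX4 triples (not quadruples).
MULTIPLIER = {"486DX": 1, "486DX2": 2, "486DX4": 3}


def internal_clock_mhz(model, external_mhz):
    """Internal (core) clock = external (system board) clock * multiplier."""
    return external_mhz * MULTIPLIER[model]


if __name__ == "__main__":
    for model in MULTIPLIER:
        print(model, internal_clock_mhz(model, 33), "MHz internal at 33 MHz external")
```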
137486 Processor Features
- 386 features
- Real/Protected Modes
- Memory Management
- PLs
- registers and bus sizes
- New features
- 6 OS instructions
- 8K/16K onboard cache (cache was external on 386 systems)
138486 Processor Features
- A better 386
- 5-stage instruction pipeline
- IF/ID/EX -> PF/D1/D2/EX/WB
- PF: instructions -> queue (2 x 16 bytes)
- D1: determine opcode
- D2: determine memory address of operands
- EX: execute indicated operation
- WB: update registers
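The overlap of the five stages can be made concrete with a small schedule generator. This is an idealized sketch: it assumes every instruction spends exactly one clock in each stage and that no stalls occur, which real code does not guarantee. The function name is hypothetical.

```python
STAGES = ["PF", "D1", "D2", "EX", "WB"]


def pipeline_schedule(num_instructions):
    """Return one dict per clock cycle mapping stage name -> instruction index."""
    total_cycles = len(STAGES) + num_instructions - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for stage_number, stage_name in enumerate(STAGES):
            instruction = cycle - stage_number  # instruction i enters PF at cycle i
            if 0 <= instruction < num_instructions:
                row[stage_name] = instruction
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_schedule(5)):
        print(cycle, row)
```

With n instructions and 5 stages the idealized pipeline needs 5 + n - 1 clocks instead of 5n, and once it is full, portions of 5 instructions are in flight in every clock.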
139486 Processor Features
- Reduced Instruction Cycle Times
- 5-stage instruction pipeline (e.g., Fig. 3.18)
- instruction cycle times
- 8086: 4 CLK
- 80386: 2 CLK
- 80486: 1 CLK (close to RISC)
- about 2x faster than 386
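The "about 2x" figure follows from the cycle counts above. The sketch below is a rough peak estimate only: it treats the listed CLK values as the cost of every instruction, which holds only for the most frequently used ones, and the helper name is hypothetical.

```python
# Clocks per instruction as listed on the slide.
CLK_PER_INSTRUCTION = {"8086": 4, "80386": 2, "80486": 1}


def peak_mips(processor, clock_mhz):
    """Idealized peak rate: millions of instructions per second."""
    return clock_mhz / CLK_PER_INSTRUCTION[processor]


if __name__ == "__main__":
    for cpu in CLK_PER_INSTRUCTION:
        print(cpu, peak_mips(cpu, 33), "MIPS (peak) at 33 MHz")
```

At equal clock rates the ratio between the 486 and the 386 is 2 / 1 = 2, which matches the earlier statement that a 20 MHz 486 is comparable to a 40 MHz 386.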
140486 Processor Model: 386 + FPU + Cache
- 386 units retained: BIU, CPU, MMU
- new: FPU (80387) + cache (8K/16K)
- FPU
- 387 onboard
- 0.8 micron process -> transistor count increased (275K -> 1 million)
- simplified system board design
- speeds up FP operations
142486 Processor Model Cache
- Cache (8K; 16K on the DX4)
- Function: bridge the processor/memory bandwidth gap
- 8088: 4.77 MHz
- 80486: 50 MHz
- Pentium: 100 MHz
- Pentium Pro: 133 MHz
- Main memory (DRAM) is relatively slow
- Fast static RAMs (SRAM) as cache
143486 Processor Model Cache
- Organization
- 8K
- 4-way set associative
- 4 direct mapped caches wired in parallel
- each block maps to a set of 4 lines
- unified: data and code in the same cache
- write-through: update cache and memory page on write operations
144486 Processor Model Cache
- locality (why do caches help?)
- spatial locality: e.g., array of data
- temporal locality: e.g., loops in code
- operations on hit/miss
- 128-bit cache lines
- 32-bit x N to catch locality (N = 4)
- 128 bits = 16 bytes
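Why a 16-byte line captures spatial locality can be seen by counting hits for a sequential array scan. This is a minimal sketch under simplifying assumptions: the cache starts empty, it is large enough that nothing is evicted, and the array begins on a line boundary. The function name and defaults are hypothetical.

```python
def sequential_hit_rate(num_words, word_bytes=4, line_bytes=16):
    """Fraction of hits when reading an array of words from start to end."""
    cached_lines = set()
    hits = 0
    for i in range(num_words):
        line = (i * word_bytes) // line_bytes  # which cache line holds word i
        if line in cached_lines:
            hits += 1                          # neighbour of an already loaded word
        else:
            cached_lines.add(line)             # miss: the whole line is filled
    return hits / num_words


if __name__ == "__main__":
    print(sequential_hit_rate(1024))
```

Each miss brings in N = 4 words, so the next three accesses hit and the scan settles at three hits out of every four accesses.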
145486 Processor Model Cache
- Mapping
- memory -> many-to-many -> cache
- Data RAM: saves memory data
- Tag RAM: saves memory address information
- 3 methods of mapping
- fully associative: memory block to any cache line
- direct mapped: memory block to a specific line
- thrashing
- set associative: memory block to a set of cache lines
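For the set-associative case, the split of an address into tag, set, and byte offset can be worked out from the 486 figures given earlier (8K cache, 4-way, 16-byte lines). The sketch below is an illustration with hypothetical names; it derives 128 sets, so a 32-bit address carries 4 offset bits, 7 set-index bits, and 21 tag bits.

```python
CACHE_BYTES = 8 * 1024
LINE_BYTES = 16
WAYS = 4
NUM_SETS = CACHE_BYTES // (LINE_BYTES * WAYS)  # 8192 / 64 = 128 sets


def split_address(address):
    """Return (tag, set index, byte offset) for a physical address."""
    offset = address % LINE_BYTES                    # byte within the line
    set_index = (address // LINE_BYTES) % NUM_SETS   # selects one set of 4 lines
    tag = address // (LINE_BYTES * NUM_SETS)         # stored in the tag RAM
    return tag, set_index, offset


if __name__ == "__main__":
    print(NUM_SETS)
    print(split_address(0x12345678))
```

Addresses that differ by a multiple of 2048 bytes (16 x 128) land in the same set with different tags. In a direct-mapped cache they would keep evicting each other (thrashing); with 4 lines per set, up to four such blocks can coexist.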
146486 Processor Model Cache
- Replacement policy (LRU)
- valid bits: all 4 lines in use?
- NO -> use any unused line
- YES -> find one to replace
- LRU bits: which is least recently used
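The replacement decision above can be sketched for a single 4-line set. Note the assumption: this models exact LRU with an ordered list, whereas a hardware cache typically approximates LRU with a few status bits per set. The class and method names are hypothetical.

```python
class CacheSet:
    """One set of a 4-way set-associative cache with exact LRU replacement."""

    def __init__(self, ways=4):
        self.ways = ways
        self.lines = []  # tags held in this set, least recently used first

    def access(self, tag):
        """Return True on a hit, False on a miss (the line is then filled)."""
        if tag in self.lines:
            self.lines.remove(tag)   # hit: move the tag to the most recent position
            self.lines.append(tag)
            return True
        if len(self.lines) == self.ways:
            self.lines.pop(0)        # all lines valid: evict the least recently used
        self.lines.append(tag)       # otherwise an unused line is taken
        return False


if __name__ == "__main__":
    cache_set = CacheSet()
    print([cache_set.access(tag) for tag in [1, 2, 3, 4, 1, 5, 2]])
    print(cache_set.lines)
```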
149Pentium Review
150Pentium Superscalar Processor
- available in 1993
- 32-bit architecture
- Superscalar architecture
- Scaling: scaling down etchable feature size to increase complexity of IC (e.g., DRAM)
- 10 microns (4004) to 0.13 microns (2001)
- Superscalar: go beyond simply scaling down
- Two instruction pipelines, each with its own ALU, address generation circuitry, and data cache interface
- Execute two different instructions simultaneously
151Pentium Superscalar Processor
- Onboard cache
- Separate 8K data and code caches to avoid access conflicts
- FPU
- Instruction pipeline: 8 stages
- Optimized floating point functions
- 5x-10x FLOPS of 486
- 2x performance of 486 at any clock rate
152Pentium Superscalar Processor
- Compatibility with 386/486
- Internal 32-bit registers and address bus
- Data bus expanded to 64 bits for higher data transfer rate
- Compare: 8088 to 386SX transition
153Pentium Superscalar Processor
- non-clone competition from AMD, Cyrix
- development of brand identity by Intel
154Pentium Pro Review
155Pentium Pro Two Chips in One
- Became available in 1995
- Superscalar of degree 3
- Can execute 3 instructions simultaneously
- Optimized for 32-bit operating systems (e.g., Windows NT, OS/2 Warp)
- Two separate silicon dies in the same package
- Processor: 0.35 micron, 5.5 million transistors
- 256KB (or 512KB) Level 2 cache included in the package, 15.5 million transistors in a smaller area
156Pentium Pro Two Chips in One
- On Board Level 2 cache
- Simplifies system board design
- Requires less space
- Gains faster communication with processor
- Internal (level 1) cache 8K
- Pentium Pro 133 ≈ 2x Pentium 66 ≈ 4x 486DX2 66
157Pentium Pro Dynamic Execution
- Dynamic execution: reduce idle processor time by predicting instruction behavior
- Multiple Branch Prediction: look as far as 30 instructions ahead to anticipate program branches
- Data Flow Analysis: looks at upcoming instructions and determines whether they are available for processing, depending on other instructions. Determines optimal execution sequences.
- Speculative Execution: execute instructions in a different order than entered. Speculative results are stored until final states can be determined.
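The idea behind data flow analysis can be illustrated with a toy scheduler. This is a sketch under stated assumptions, not the Pentium Pro's actual mechanism: every instruction takes one cycle, only true (read-after-write) dependencies are tracked, and up to 3 instructions issue per cycle to match the degree-3 superscalar design. The instruction format and function name are hypothetical.

```python
def schedule(instructions, width=3):
    """Group instructions into issue cycles based on data dependencies.

    `instructions` is a list of (destination, [sources]) tuples in program order.
    Returns a list of cycles, each a list of instruction indices.
    """
    last_writer = {}  # register name -> index of the latest earlier writer
    depends_on = []
    for index, (dest, sources) in enumerate(instructions):
        depends_on.append({last_writer[s] for s in sources if s in last_writer})
        last_writer[dest] = index

    finished = set()
    pending = list(range(len(instructions)))
    cycles = []
    while pending:
        # An instruction is ready once every producer it depends on has finished.
        ready = [i for i in pending if depends_on[i] <= finished][:width]
        cycles.append(ready)
        finished.update(ready)
        pending = [i for i in pending if i not in ready]
    return cycles


if __name__ == "__main__":
    program = [("a", []), ("b", ["a"]), ("c", []), ("d", ["b", "c"])]
    print(schedule(program))
```

In the example, instruction 2 does not depend on instructions 0 or 1, so it issues alongside instruction 0 instead of waiting its turn in program order.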