Title: The Microprocessor and its Architecture
1Chapter 2
- The Microprocessor and its Architecture
The Intel 8086, 80X86, and Pentium Family
2Contents
- Internal architecture of the Microprocessor
- The programmer's model, i.e. the register model
- The processor (organization) model
- Memory addressing with segmentation
- - In the real mode
- - In the protected mode
- Memory addressing with paging
3Objectives for this Chapter
- Describe the function and purpose of program-visible registers
- Describe the Flags register and the purpose of flag bits
- Describe how memory is accessed using segmentation in both the real mode and the protected mode
- Describe the program-invisible registers
- Describe the structures and operation of the memory paging mechanism
- Describe the organizational processor model
- Briefly review the evolution of the 80X86 architecture
4The Intel Family
[Figure: chart of the Intel family from 1978 to 2000 (including microcontrollers), with addressable memory in bytes given as 2^A for A address lines and increasing across the family.]
5Programming Model
General Purpose Registers
Special Purpose Registers
Segment Registers
- 80386 and above
- - 32-bit registers (except the segment registers)
- - Two additional segment registers, FS and GS
6General-Purpose Registers
- The top portion of the programming model contains the general-purpose registers EAX, EBX, ECX, EDX, EBP, ESI, and EDI
- They can hold both data and address offsets
- Although general in nature, each has a special purpose and name
- EAX: accumulator. Also used as AX (16 bit), AH (8 bit), and AL (8 bit)
- EBX: base index, often used to address memory (BX, BH, and BL)
7General-Purpose Registers (continued)
- ECX: count, for shifts, rotates, and loops (CX, CH, and CL)
- EDX: data, used with multiply and divide (DX, DH, and DL)
- EBP: base pointer, used to address stack data (BP)
- ESI: source index (SI) for memory locations, e.g. with string instructions
- EDI: destination index (DI) for memory locations
8Special-Purpose Registers
- ESP, EIP, and EFLAGS
- Each has a specific task
- ESP: stack pointer. Offset to the top of the stack in the stack segment, so it is used with SS. Used with procedure calls (SP)
- EIP: instruction pointer. Offset to the next instruction in a program in the code segment, so it is used with CS (IP)
- EFLAGS: indicates the latest conditions (state) of the microprocessor (FLAGS)
9EFLAGS
80386DX
10The Flags
Basic Flag Bits (8086 etc.): output and input bits
Determined by the last operation:
- C: carry/borrow from the last operation
- P: the parity flag (little used today)
- A: auxiliary flag. Half-carry between bits 3 and 4, used with BCD arithmetic
- Z: zero
- S: sign
- O: overflow
Set/reset explicitly by the programmer:
- D: direction. Determines the auto increment/decrement direction for the SI and DI registers with string instructions
- I: interrupt. Enables (using STI) or disables (using CLI) the processing of hardware interrupts arriving at the INTR input pin of the processor
- T: trap. Turns the trapping interrupt (for program debugging) on/off
Some flag bits can be both, e.g. the C flag
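As an illustration of how the status flags follow from the last operation, here is a minimal Python sketch for an 8-bit addition. The function name `add8_flags` and the dictionary layout are assumptions made for this example; the flag rules are the usual ones for an 8-bit add (P reports even parity of the result byte).

```python
def add8_flags(a: int, b: int) -> dict:
    """Add two 8-bit values; return the 8-bit result and the six status flags."""
    total = a + b
    result = total & 0xFF
    return {
        "result": result,
        "C": int(total > 0xFF),                               # carry out of bit 7
        "P": int(bin(result).count("1") % 2 == 0),            # 1 if the result has even parity
        "A": int(((a & 0x0F) + (b & 0x0F)) > 0x0F),           # half-carry from bit 3 into bit 4
        "Z": int(result == 0),                                # 1 if the result is zero
        "S": result >> 7,                                     # copy of the result's sign bit
        "O": int(((a ^ result) & (b ^ result) & 0x80) != 0),  # signed overflow
    }


if __name__ == "__main__":
    print(add8_flags(0x7F, 0x01))  # signed overflow: +127 + 1
    print(add8_flags(0xFF, 0x01))  # unsigned carry, zero result
```

Adding 7FH and 01H sets S, O, and A but not C, while adding FFH and 01H sets C, Z, A, and P, which shows that carry and overflow report different conditions.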
11Newer Flag Bits
- IOPL: 2-bit I/O privilege level in protected mode
- NT: nested task
- RF: resume flag (used with debugging)
- VM: virtual mode. Multiple DOS programs, each with a 1 MB memory partition, in Windows
- AC: alignment check. Detects addressing memory on the wrong boundary for words/double words
- VIF: virtual interrupt flag
- VIP: virtual interrupt pending
- ID: the CPUID instruction is supported
- - The instruction gives information on the CPU version and manufacturer
12 Segment Registers
Each register points to the start of a segment in memory
- The segment registers are
- - CS (code),
- - DS (data),
- - ES (extra data; used as the destination for some string instructions),
- - SS (stack),
- - FS and GS: additional segment registers on the 80386 and above
- Segment registers define the start of a section (segment) of memory for a program.
- A segment is either
- - 64K (2^16) bytes of fixed length (real mode), or
- - Up to 4G (2^32) bytes of variable length (protected mode).
- All code (programs) resides in a code segment.
13Real Mode Memory Addressing
- Used by the DOS operating system
- The only mode available on the 8086-8088
- - 20-bit address bus → 1 MB; 16-bit data bus (8 bits on the 8088); 16-bit registers
- Real mode memory is the first 1M (2^20) bytes of the memory system (real, conventional, or DOS memory) in later processors
- Real mode 20-bit addresses are obtained by combining a segment number (in a segment register) and an offset address (in another processor register)
- The 16-bit segment register value has 0H (0000b) appended (i.e. it is multiplied by 10H, or 16 decimal) to form a 20-bit start-of-segment address
- Then the effective memory address (EA) = this 20-bit segment start address + the 16-bit offset address in another processor register
- For the 8086, segment length is fixed at 2^16 = 64K bytes (determined by the size of the offset registers)
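14[Figure: real-mode address formation within the 1 MB memory. The 16-bit segment number in the segment register has 4 bits (0H) appended and is added to the 16-bit offset to give the 20-bit (5-hex-digit) physical memory address, i.e. the effective address (EA) of the byte accessed inside a 64 KB segment.]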
15Effective Address Calculations
- EA = segment register (SR) x 10H + offset
- (a) SR = 1000H, offset = 0023H: 10000H + 0023H = 10023H
- (b) SR = AAF0H, offset = 0134H: AAF00H + 0134H = AB034H
- (c) SR = 1200H, offset = FFF0H: 12000H + FFF0H = 21FF0H
Q: Is 3FC81H a valid start address of a segment?
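The calculation above can be checked with a short Python sketch. The function names are hypothetical, and masking the result to 20 bits models the 8086's 20 address lines, which is an assumption of this example rather than something the slide states.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode EA = segment x 10H + offset, kept to 20 bits (assumed 8086 wrap-around)."""
    return ((segment << 4) + offset) & 0xFFFFF


def is_valid_segment_start(address: int) -> bool:
    """A segment can only start where the low hex digit is 0, i.e. on a 16-byte boundary."""
    return (address & 0xF) == 0


if __name__ == "__main__":
    for seg, off in [(0x1000, 0x0023), (0xAAF0, 0x0134), (0x1200, 0xFFF0)]:
        print(f"{seg:04X}:{off:04X} -> {physical_address(seg, off):05X}")
    print(is_valid_segment_start(0x3FC81))  # low digit is 1, so not a segment start
```

The last call returns False: 3FC81H does not end in 0H, so no 16-bit segment number multiplied by 10H can produce it.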
16Overlapping segments: How to detect overlap?
Top of CS = 090F0H + FFFFH = 190EFH
Code should be limited to only the non-overlapped portion of the code segment, to avoid the effects of segment overlap
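One way to answer the overlap question is to compare segment start addresses: two 64 KB real-mode segments overlap whenever their starts are less than 64 KB apart. The sketch below assumes this reading; the function names and the second segment value (1000H) are hypothetical, chosen only for illustration.

```python
SEGMENT_SIZE = 0x10000  # 64 KB, the fixed real-mode segment length


def segment_range(segment: int) -> tuple:
    """Return (first, last) physical address covered by a real-mode segment."""
    start = segment << 4
    return start, start + SEGMENT_SIZE - 1


def segments_overlap(seg_a: int, seg_b: int) -> bool:
    """Two 64 KB segments overlap if their start addresses are under 64 KB apart."""
    return abs((seg_a << 4) - (seg_b << 4)) < SEGMENT_SIZE


if __name__ == "__main__":
    first, last = segment_range(0x090F)
    print(f"CS covers {first:05X}..{last:05X}")
    print(segments_overlap(0x090F, 0x1000))  # hypothetical second segment at 10000H
```

With CS = 090FH the segment covers 090F0H to 190EFH, so a second segment starting at 10000H overlaps it and leaves only 10000H - 090F0H = 6F10H bytes of CS free of overlap.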
17Defaults
Convention, e.g. EA = CS:IP
- Segment number: in a segment register
- Offset: a literal or in a CPU register
- Default segment numbers are in
- - CS for program (code)
- - SS for stack
- - DS for data
- - ES for string (destination) data
- Default offset addresses go with them
18Addressing Modes Summary
19Segmentation Pros and Cons
- Advantages
- - Allows easy and efficient relocation of code and data
- - To relocate code or data, only the number in the relevant segment register needs to be changed
- Consequences
- - → A program can be located anywhere in memory without making any changes to it (addresses are not absolute, but offsets relative to the start of segments)
- - → The program writer need not worry about the actual memory structure (map) of the computer used to execute it
- Disadvantages
- - Complex hardware for address generation
- - Address computation delay for every memory access
- - Software limitation: program size is limited by segment size (64 KB with the 8086)
20Limitations of the above real mode segmentation
scheme
- Segment size is fixed at, and limited to, 64 KB
- A segment cannot begin at an arbitrary memory address
- - With 20-bit memory addressing, it can only begin at addresses ending in 0H, i.e. at 16-byte intervals
- - → The principle is difficult to apply with the 80286 and above, where segment registers remain at 16 bits!
- - The 80286 and above use 24- and 32-bit addresses but still 16-bit segment registers (00H or 0000H would have to be appended instead of 0H)
- No protection mechanisms: programs can overwrite operating system code segments and corrupt them!
- → Use memory segmentation in the protected mode
21Protected Mode Segmentation
Primarily, what is needed
- Flexible definition of segment starting address
- Flexible definition of segment size
- Protection mechanisms that prevent programs from
corrupting the code and data of each other and of
the operating system
22Basic Segmentation in the Protected Mode
[Figure: address translation in the protected mode. The processor supplies a segment number (segment register) and an offset (offset register). The segment number is used to access the segment descriptor table; the selected segment descriptor gives the segment start address and the maximum allowed offset, and the translated address goes to memory.]
The scheme also checks privileges and access rights to prevent programs from corrupting other programs or the operating system
23Protected Mode 80286 and above
- Domain of the Windows operating system
- 32-bit addressing: 4G of memory, with 2G for the system and 2G for the application
- Protected mode still uses segment and offset addresses, but
- - Segment definition is through a more complex selector/descriptor mechanism (greater flexibility)
- - The offset address is 16 bits (286) or 32 bits (386 and above, e.g. the EIP register)
- Descriptors are placed in descriptor tables in main memory
- Protection is provided by restricting access to memory segments through
- - Privilege levels,
- - and access rights
24Descriptors specify memory segments
- The segment number (still in a 16-bit segment register) defines the segment through a selector/descriptor (not directly as in real mode → more flexibility)
- 16-bit segment register = 13-bit descriptor selector + 1-bit descriptor table selector + 2-bit requested privilege level
[Figure: layout of a segment register, e.g. DS. The 1-bit table selector chooses between the global descriptor table (1 table, segments available to all tasks) and the local descriptor table (1 table for each task, segments local to each task). The 13-bit selector addresses 2^13 = 8192 descriptors in a table.]
Q: How many segments can be defined in total?
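The three selector fields can be pulled apart with shifts and masks. This is a minimal sketch: `decode_selector` and its field names are hypothetical, and it uses the standard x86 convention that a table-selector bit of 0 means the GDT and 1 means the LDT.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit segment register value into its three protected-mode fields."""
    return {
        "index": selector >> 3,                       # 13-bit descriptor selector
        "table": "LDT" if selector & 0x4 else "GDT",  # 1-bit descriptor table selector
        "rpl": selector & 0x3,                        # 2-bit requested privilege level
        "descriptor_offset": selector & 0xFFF8,       # index x 8: byte offset into the table
    }


if __name__ == "__main__":
    print(decode_selector(0x0008))
    print(decode_selector(0x001F))
    print(2 * 2**13)  # descriptors reachable through one GDT plus one LDT
```

Since the 13-bit index reaches 8192 descriptors in each of the two tables visible to a task, a task can define 2 x 8192 = 16K segments in total.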
25- Each 8-byte segment descriptor entry in the table contains
- - Base address (start address of the segment) (size = µP address bus width)
- - Limit (maximum offset, i.e. the offset of the end address of the segment) (segment size = 1 + Limit)
- - Privilege level and access rights for this segment
- So a segment can start at any location and have a specified length.
8-byte Segment Descriptors
[Figure: the 80286 and the 80386 descriptor formats. Labelled fields include the base, the limit, the access rights byte (which contains the 2-bit Descriptor Privilege Level), the instruction mode bit (16/32 bits), and the segment availability bit.]
- 80286: the base is 3 bytes → 24-bit addressing; the limit is 2 bytes (16 bits) → segment size 1 B to 64 KB, so max limit = max offset. Note the provision for upward compatibility (286 software runs on higher processors)
- 80386 and above: the base is 4 bytes → 32-bit addressing; the limit is 2 1/2 bytes (20 bits) → segment size 1 B to 1 MB, so max limit < max offset. With the G (4K multiplier) bit = 1: 4 KB to 4 GB
26Protected Mode 80386 and above (Pentium class)
- The base is a 32-bit address at which the memory segment starts
- The limit is a 20-bit number. When added to the base, it addresses the last location in the segment
- The limit has a modifier bit called Granularity (G)
- - If G = 0, no change
- - If G = 1, append FFFH to the limit, i.e. the segment size is multiplied by 4K
- With the limit specifying 1 MB segments and G = 1 (i.e. 4K multiplier): max segment size = 4K x 1 MB = 4 GB
- With 16K segments like this, the system can address 16K x 4 GB = 64 TB (not necessarily all of it in physical memory)
27- 80386 and above: Example
- Descriptor has base = 23000000H and limit = 012FFH
- With G = 0
- - Segment start = 23000000H
- - Segment end = 23000000H + 012FFH = 230012FFH
- - Segment size = 12FFH + 1H = 1300H (= 19 x 256 bytes)
- With G = 1 (→ the actual limit = 012FFFFFH, i.e. FFFH is appended to the limit in the descriptor)
- - Segment start = 23000000H
- - Segment end = 23000000H + 012FFFFFH = 242FFFFFH
- - Segment size = 12FFFFFH + 1H = 1300000H = 2^12 x 1300H = 4K x 1300H
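The example can be reproduced with a few lines of Python. The function name `segment_bounds` is hypothetical; the rule it applies is the one stated for the G bit, namely appending FFFH to the 20-bit limit when G = 1.

```python
def segment_bounds(base: int, limit: int, g: int) -> tuple:
    """Return (start, end, size) of a segment from its descriptor base, limit, and G bit."""
    # G = 1 appends FFFH to the 20-bit limit; G = 0 uses the limit unchanged.
    effective_limit = ((limit << 12) | 0xFFF) if g else limit
    return base, base + effective_limit, effective_limit + 1


if __name__ == "__main__":
    for g in (0, 1):
        start, end, size = segment_bounds(0x23000000, 0x012FF, g)
        print(f"G={g}: start={start:08X}H end={end:08X}H size={size:X}H")
```

With the largest limit, FFFFFH, and G = 1 the size comes out as 100000000H bytes, which is the 4 GB maximum.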
28Processor 80286
Protected Mode Segmentation Example
[Figure: 80286 protected-mode address translation. The 16-bit segment register selects one of the 8-byte segment descriptors (Descriptor 0, 1, 2, ...) in a descriptor table in main memory, located from the GDT base address. The descriptor holds the base, the limit, and the access rights byte; its unused bits are always 0s for upward compatibility. The base plus the offset gives the 24-bit address of the addressed byte in the segment.]
- Because each descriptor in the table is 8 bytes wide, the selector with 000b appended is used as an offset from the GDT (or LDT) base address to point to the start of the required segment descriptor
- Segment size = Limit + 1 = FFH + 1 = 100H bytes
Q: What is the RPL value?
Q: What is the selector value? Are we using the global or the local descriptor table?
29The Access Rights Byte (80286 and higher)
This is the access rights byte in the 8-byte segment descriptor. The 80386 and higher have 4 more access rights bits.
[Figure: bit fields of the access rights byte, with separate layouts for a segment that is not a code segment (data or stack) and for a code segment. Labels that survive: the DPL field (00 = highest privilege); E: not code (0) or code (1); ED: expand direction for the segment; and the W/R bit.]
The DPL is compared with the requested privilege level (RPL) in the segment register specifying this segment. Access to the segment is allowed only if the RPL has higher or equal privilege to the DPL, subject to the state of the C bit if applicable.
30Privilege Levels
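00 = highest privilege, then 01, 10, and 11 = lowest privilege
[Figure: hardware privilege comparator. Its inputs are the RPL (in the segment register) and the DPL (in the descriptor). If RPL ≤ DPL numerically, i.e. the RPL has higher or equal privilege, access to the segment is allowed.]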
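The privilege comparison, combined with the maximum-offset check of the basic protected-mode scheme, can be sketched as below. This is a deliberately simplified model: it applies only the RPL ≤ DPL rule and the limit check as described in these slides, and it ignores the C bit and the other access rights bits. The function name and the descriptor values in the usage lines are hypothetical.

```python
def translate(base: int, limit: int, dpl: int, rpl: int, offset: int) -> int:
    """Return base + offset if the access passes the privilege and limit checks."""
    if rpl > dpl:
        # Numerically larger means less privileged, so the request is refused.
        raise PermissionError("privilege violation: RPL is less privileged than DPL")
    if offset > limit:
        raise ValueError("offset exceeds the segment limit")
    return base + offset


if __name__ == "__main__":
    # Hypothetical descriptor: base 100000H, limit FFH (a 100H-byte segment), DPL 2.
    print(hex(translate(0x100000, 0xFF, dpl=2, rpl=1, offset=0x10)))
```

A request with RPL = 3 against DPL = 2, or with an offset of 100H against a limit of FFH, raises an exception instead of returning an address.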
31Types of Descriptor Tables in memory
- One Global Descriptor Table (GDT) (64 KB max; start and limit are cached in GDTR), holding
- - 1. Descriptors for all global segments (common to all tasks) (cached on the processor for the 6 currently used segments, CS through GS)
- - 2. For each task:
- - → A descriptor for the task's task state segment (TSS) in memory. The TSS holds all information about the task, e.g. processor registers, LDT selector, etc. (The descriptor is cached on the processor for the currently running task, selected by the TR register)
- - → A descriptor for the task's Local Descriptor Table (LDT) in memory. The LDT holds descriptors for all local segments of that task. (The descriptor is cached on the processor for the currently running task, selected by the LDTR register)
- One Interrupt Descriptor Table (IDT) (64 KB max; start and limit are cached in IDTR)
- - 8-byte descriptors, called interrupt gates, that define the attributes and starting addresses of the interrupt service routines for up to 256 hardware and software interrupts
32Types of Descriptor Tables in memory
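[Figure: descriptor tables in memory. A 16-bit selector in a segment register picks the 8-byte descriptor for segment i (base, limit, access); the base plus the offset is used to calculate the physical address of the addressed byte in the memory system (code, data, stack, and extra segments). For each task, from task 0 to task j, there is a descriptor (base, limit, access) for that task's LDT, with one LDT per task selected through LDTR. The interrupt number indexes the interrupt descriptor table. The table registers hold each table's base address and limit.]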
33Program Invisible Registers (caches)
[Figure: the program-visible registers and the program-invisible caches attached to them.]
- Visible µP registers: the segment selectors, the Task Register (task selector), and the LDT Register (LDT selector)
- Invisible µP registers (not seen by the programmer):
- - A descriptor cache for the 6 currently used segments. Each segment descriptor is loaded from the GDT or LDT in main memory every time the segment number changes
- - A cache for the task state segment (TSS) descriptor and for the descriptor of the local descriptor table of the currently executing task. Loaded from the GDT in memory every time the task changes
- - GDTR and IDTR (for the GDT and IDT tables), each holding a table base address (24 or 32 bits) and a limit (16 bits)
- The Global Descriptor Table (GDT) holds descriptors for segments, tasks (the task's TSS), and LDTs (the task's LDT); the IDT is the descriptor table for interrupts
- The LDT for a task is accessed through a descriptor in the GDT
34Memory Management Paging 80386 and above
- The paging mechanism translates a logical (virtual, linear) address generated by the program to a physical (real) address that accesses a storage location in memory
- The address space consists of pages of bytes: virtual pages and physical pages (frames) of the same size (e.g. 4K bytes)
- Translation is done from virtual to physical pages
- Linear pages may or may not reside in physical memory
- If a page is not in memory (a page fault occurs), it is brought into memory for use
- Paging applies to both real and protected modes in the 80386 and above
- Paging can be enabled or disabled (using bit 31 of control register CR0)
- - If disabled, the address computed with segmentation is the physical address
- - If enabled, paging operates on the virtual address obtained with segmentation to provide the physical address
35Example of 1-level Paging
[Figure: 1-level paging. The address translation, done in the Memory Management Unit (MMU), takes the 32-bit logical (linear, virtual, programmer) address from the processor, looks up the logical page in the page table, and produces the physical address sent to memory; a page that is not in memory is fetched from the hard disk. Logical pages and physical frames (physical pages) are each 4 KB.]
- The page table maps logical page numbers to the corresponding physical frame numbers
- The offset part is the same for both
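A minimal sketch of 1-level translation, assuming 4 KB pages and modelling the page table as a Python dictionary from logical page number to frame number. The names and the table contents are hypothetical; a missing entry stands in for a page fault.

```python
PAGE_SIZE = 4096  # 4 KB pages, so the low 12 bits of an address are the offset


def translate_1level(page_table: dict, linear: int) -> int:
    """Map a linear address to a physical address through a single page table."""
    page, offset = divmod(linear, PAGE_SIZE)
    frame = page_table.get(page)
    if frame is None:
        # The page is not in memory: a real system would fetch it from disk.
        raise LookupError(f"page fault on logical page {page:#x}")
    return frame * PAGE_SIZE + offset  # the offset is carried over unchanged


if __name__ == "__main__":
    table = {0: 5, 1: 2}  # hypothetical mapping: page 0 -> frame 5, page 1 -> frame 2
    print(hex(translate_1level(table, 0x1234)))
```

Only the page number is replaced by a frame number; the 12-bit offset passes through untouched.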
362-Level Paging
- 4 GB address space → 1M x 4K pages
- The 1-level approach requires a single huge page table (but maybe we do not need to do all translations!)
- 2-level paging uses several smaller page tables (up to 1K tables), each providing translation for 1K pages
- - Allows using only as many of the smaller page tables as needed
- For 2-level paging, the 32-bit linear byte address (as generated by the processor) is divided into three parts
- - Directory: 10 bits, determines which page table in the page table directory
- - Page table: 10 bits, determines which page in that page table
- - Memory offset: 12 bits, determines which byte in that page (same for both logical and physical pages)
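The 10/10/12 split and the two table lookups can be sketched as follows. The directory and page tables are modelled as nested dictionaries that store 20-bit physical page numbers; the function names and the sample mapping are assumptions chosen for illustration.

```python
def split_linear(linear: int) -> tuple:
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def translate_2level(directory: dict, linear: int) -> int:
    """Walk the directory, then the selected page table, and rebuild the physical address."""
    dir_index, table_index, offset = split_linear(linear)
    page_table = directory.get(dir_index)
    if page_table is None or table_index not in page_table:
        raise LookupError("page fault")
    return (page_table[table_index] << 12) | offset


if __name__ == "__main__":
    # Hypothetical mapping: entry C8H of page table 0 points to physical page 00110H.
    directory = {0: {0xC8: 0x00110}}
    print(split_linear(0x000C8FFF))
    print(hex(translate_2level(directory, 0x000C8FFF)))
```

Because each entry is 4 bytes wide, table index C8H sits at byte offset C8H x 4 = 320H inside its page table.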
372-Level Memory Paging 80386 and above
[Figure: 2-level translation of the 32-bit linear byte address (2^32 = 4G virtual bytes), split into 10 + 10 + 12 bits.]
- 1. The first 10 bits answer "which page table?": they select one of the 1024 4-byte entries in the page table directory (1K x 4 bytes = 4 KB, in memory, located from the page table directory base address). The entry gives the start address of a page table.
- 2. The next 10 bits answer "which page in that page table?": they select one of the 1024 4-byte page entries in that table (1K x 4 bytes, in memory). The entry gives the start address of the physical page. The page mapping (linear → physical) is done here, as pages are paged in and out.
- 3. The last 12 bits answer "which byte in that page?": they are the offset of the addressed physical byte within the 4K-byte page.
- 1024 x 1024 = 1M pages; 1024 x 1024 x 4K = 4G physical bytes
3880386 and above: Paging is controlled by four control registers, CR0-CR3, on the µP
- CR0: bit 31 selects paging. 1 = paging; 0 = no paging (the address generated by segmentation is considered the physical address)
- CR2: the linear address corresponding to the most recent page fault
- CR3: the most significant 20 bits of the 32-bit start address of the page table directory (the low 12 bits are 000H)
- One further item in the figure is marked "Pentium only"
39Format for the linear address
[Figure: format of the 32-bit linear address: 10 bits (page table number), 10 bits (page number in table), 12 bits (offset).]
Format for an entry in the page table directory or in a page table
[Figure: each entry is 32 bits, i.e. 4 bytes. It holds the most significant 20 bits of the 32-bit start address of a page table or of a physical page in memory (the low 12 bits of that address are 000H), plus attribute bits for the entry (page table or page).]
- If the page is not in memory, a page fault interrupt occurs to bring it from mass storage
40Memory Paging 80386 and above
[Figure: detailed 2-level paging. CR3 supplies the upper 20 bits of the base address of the page directory (000H is appended). The first 10 bits of the 32-bit linear address (from segmentation; 4G of virtual bytes), with 00b appended, form a 12-bit offset that selects one of the 1024 4-byte page table entries in the directory. The 20 bits of that entry, with 000H appended, give the page table base; the next 10 bits of the linear address, with 00b appended, form a 12-bit offset that selects one of the 1024 4-byte page entries. The 20 bits of that entry, with the 12-bit offset of the linear address appended, give the address of the physical byte in the 4K-byte physical page. 1024 x 1024 = 1M pages; 1024 x 1024 x 4K = 4G physical bytes.]
41Memory Paging Example
[Figure: worked example of translating a linear (logical) byte address to the corresponding physical byte. The base address of the page table directory leads to the table directory, whose entries hold the bases of page tables 0, 1, and 2; an entry in page table 0 points to a 4K-byte physical page. Values that survive from the figure: C8FFF, 320H, physical page 00110H, and physical page 00000.]
42- Memory space required to accommodate the page directory and the page tables
- → To page the full linear address space of 4 GB
- - Each page is 4 KB, so we need to map (translate addresses for) 1M pages
- - 1M pages = 1K tables x (1K page translations/table)
- - Page tables = 1K tables x (1K x 4 B) = 4 MB = 4096 KB
- - Page table directory = 1K x 4 B = 4 KB
- - Total = 4100 KB (decimal)
- This is a considerable amount of memory
- So, some operating systems do not support paging for the total memory space, e.g. Windows 3.1 pages only 16 MB (i.e. only 16M/4K = 4K pages)
- This requires only 4 page tables, occupying 4 x (1K x 4 B) = 16 KB of memory. The page directory table needs 4 x 4 B = 16 B
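The arithmetic above generalises to any amount of paged memory. The sketch below assumes 4 KB pages, 4-byte entries, and 1024 entries per table, as in the text; the function name and the returned tuple layout are hypothetical.

```python
PAGE_SIZE = 4 * 1024      # bytes per page
ENTRY_SIZE = 4            # bytes per directory or page table entry
ENTRIES_PER_TABLE = 1024  # entries in one page table


def paging_overhead(mapped_bytes: int) -> tuple:
    """Return (pages, page tables, page-table bytes, directory bytes in use)."""
    pages = mapped_bytes // PAGE_SIZE
    tables = -(-pages // ENTRIES_PER_TABLE)  # ceiling division
    table_bytes = tables * ENTRIES_PER_TABLE * ENTRY_SIZE
    directory_bytes = tables * ENTRY_SIZE    # one directory entry per page table
    return pages, tables, table_bytes, directory_bytes


if __name__ == "__main__":
    for label, size in [("4 GB", 4 * 1024**3), ("16 MB", 16 * 1024**2)]:
        pages, tables, table_bytes, directory_bytes = paging_overhead(size)
        print(label, pages, tables, table_bytes // 1024, "KB of tables,", directory_bytes, "B of directory")
```

For 4 GB this gives 1M pages, 1K tables, 4096 KB of page tables, and 4 KB of directory entries; for 16 MB it gives 4K pages, 4 tables, 16 KB, and 16 B.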
43Speeding Up the Paging Mechanism
- Paging requires accessing the page table directory and a page table (both in main memory) to generate the physical memory address of the required memory location
- This slows down memory access
- To speed this up, a fast associative mapping cache memory is used to store the most recent page address translations, which are also likely to be accessed in the near future
- The 80386 uses a 32-entry TLB (translation look-aside buffer) for holding the physical addresses of the 32 most recently used memory pages (translation mapping from logical to physical)
- Scenarios
- - 1. Hit in the TLB cache? Good, use the address translation in the TLB
- - Miss in the TLB?
- - 2. Hit in the page tables: get the translation from the page tables in memory and place it in the TLB (e.g. by replacing the least-recently-used existing entry there)
- - 3. Miss in the page tables (page fault)? Bring the page from mass storage into physical memory and update both the page tables and the TLB
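The three scenarios can be modelled with a small least-recently-used cache in front of a page table. This is a toy sketch under several assumptions: 1-level paging, a dictionary as the page table, and a page fault that simply hands out the next unused frame number. The class name `TLBModel` and its parameters are hypothetical.

```python
from collections import OrderedDict


class TLBModel:
    """Toy model of a TLB with least-recently-used replacement."""

    def __init__(self, page_table: dict, capacity: int = 32, first_free_frame: int = 0x100):
        self.page_table = dict(page_table)
        self.capacity = capacity
        self.next_free_frame = first_free_frame  # assumed source of frames on a page fault
        self.tlb = OrderedDict()                 # oldest entry first

    def lookup(self, page: int) -> tuple:
        """Return (frame, outcome) for a logical page number."""
        if page in self.tlb:                     # scenario 1: hit in the TLB
            self.tlb.move_to_end(page)
            return self.tlb[page], "TLB hit"
        if page in self.page_table:              # scenario 2: hit in the page table
            frame, outcome = self.page_table[page], "page-table hit"
        else:                                    # scenario 3: page fault
            frame, outcome = self.next_free_frame, "page fault"
            self.next_free_frame += 1
            self.page_table[page] = frame        # update the page table
        if len(self.tlb) >= self.capacity:
            self.tlb.popitem(last=False)         # evict the least recently used entry
        self.tlb[page] = frame                   # add the translation to the TLB
        return frame, outcome


if __name__ == "__main__":
    model = TLBModel({1: 7}, capacity=2)
    for page in (1, 1, 2, 3, 1):
        print(page, model.lookup(page))
```

With a capacity of 2, the final lookup of page 1 misses in the TLB again because pages 2 and 3 displaced it, but it still hits in the page table.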
44TLB Operation Speeding up paging
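[Figure: TLB operation flowchart with the three scenarios numbered. (1) An associative search of the TLB hits and gives the corresponding frame. (2) On a TLB miss with a hit in the page table, the translation is read from the page table and an entry is added to the TLB. (3) On a miss in the page table also, i.e. a page fault, the page is brought into memory, an entry is added to the page table, and an entry is added to the TLB. The result is the physical address.]
We assume 1-level paging for simplicity
Note: frame = physical page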
45Organizational Model of the Processor
- Functional aspects: how the processor actually functions
- Internal organization is determined by the functionality required
- Two main tasks for the microprocessor in a system
- - Interface with external peripherals
- - Execute instructions
[Figure: a microprocessor-based system, e.g. a microcomputer. The microprocessor connects to memory and I/O devices over the external buses, including the control bus.]
46The 8086 processor model (Organization)
- Early pipelining attempts
- Two main functional units
- - The Bus Interface Unit (BIU)
- - The Execution Unit (EU)
- The BIU generates memory and I/O addresses for reading code and transferring data to/from the processor
- The EU takes code and data from the BIU, executes the instructions, and stores results in the general-purpose registers
- Pipelined architecture
- - Two, hopefully independent, operations are executed at the same time by two separate units
- - Fetch by the BIU
- - Execute by the EU
47The 8086 processor model
[Figure: block diagram of the 8086. The BIU connects to the external µP buses (including the control bus). The EU contains the ALU and has no direct interaction with the external µP buses. A FIFO instruction/operand queue sits between them: the BIU fills it by fetches from memory, and the EU empties it by executing instructions.]
- Interfacing (BIU)
- - Generate all timing control signals for reads, writes, etc.
- - Synchronize data transfers with all system modules
- Execution (EU)
- - Recognize, decode, and execute fetched program instructions
48The 8086 processor model
[Figure: timing comparison of non-pipelined operation and pipelined operation (8086) with fetch-execute overlap. Annotated events for a missing operand: 1. operand not in queue, 2. fetch operand, 3. execute. Annotated events for a jump: a. "Oops! Turned out to be a Jump instruction!", b. start fetching at the correct target location, c. execute at last. Fetches made after a jump instruction are wasted.]
- Wasted fetches and executes (inefficiency)
- RISC: modern architectures
- - Reduce fetches from memory (operate mostly on registers)
- - Speed up memory fetches (cache)
- - Use small instructions (both in length and in execution time)
- - Finer pipeline stages (super pipelined: 486)
- - Multiple pipelines (superscalar: P5)
- - Predict how the jump will go
- Common scenarios that cause pipeline inefficiency
- - Operand is not in the queue
- - Jump or branch instructions
- - Long-executing instructions, e.g. 83 clock cycles for execution vs. 4 cycles for a fetch. The BIU fills the buffer and waits idly!
49Evolution of the 80X86 Intel Processors
[Figure: evolution of the 80X86 processors in order of increasing performance. Stages shown: 2-stage pipelining with 1 execution unit; super pipelining with 5-stage pipelining; superscalar with 2 execution units and 5-stage pipelining; Pentium Pro, Pentium II, and III with 3 execution units and 12-stage pipelining; multi-core architecture, e.g. Itanium 2.]
50Super pipelining
[Figure: example 486 processor model. Colored units are new additions.]
51Chapter 2 Summary
- Described the function and purpose of program-visible registers (µP programming model)
- Described the Flags register and the purpose of each flag bit
- Described how memory is accessed using segmentation, both in the real mode and the protected mode
- Described the program-invisible registers
- Described the structures and operation of the memory paging mechanism
- Described the organizational model of the 8086 µP
- Reviewed the evolution of the 80X86 architecture