Title: Prelude to Multiprocessing
1. Prelude to Multiprocessing
- Detecting CPU and system-board capabilities with
CPUID and the MP Configuration Table
2. CPUID
- Recent Intel processors provide a cpuid
instruction (opcode 0x0F, 0xA2) to assist
software in detecting a CPU's capabilities
- If it's implemented, this instruction can be
executed in any of the processor modes, and at
any of its four privilege levels
- But this cpuid instruction might not be
implemented (e.g., 8086, 80286, 80386)
3. Intel x86 EFLAGS register
  bit   31-22  21  20   19   18  17  16
  flag    0    ID  VIP  VIF  AC  VM  RF

  bit   15  14  13-12  11  10  9   8   7   6   5  4   3  2   1  0
  flag  0   NT  IOPL   OF  DF  IF  TF  SF  ZF  0  AF  0  PF  1  CF
Software can toggle the ID-bit (bit 21) in the
32-bit EFLAGS register if the processor is
capable of executing the cpuid instruction
4. But what if there's no EFLAGS?
- The early Intel processors (8086, 80286) did not
implement any 32-bit registers
- The FLAGS register was only 16 bits wide
- So there was no ID-bit that software could try to
toggle (to see if cpuid existed)
- How can software be sure that the 32-bit EFLAGS
register exists within the CPU?
5. Detecting 32-bit processors
- There's a subtle difference in the way the
logical shift/rotate instructions work when
register CL contains the shift-factor
- On the 32-bit processors (e.g., 80386) the value
in CL is truncated to 5 bits, but not so on the
original 16-bit CPUs (8086, 8088)
- The 80286 also truncates the value in CL, so this
distinction separates the 8086 from later CPUs;
the 16-bit 80286 has to be ruled out separately
- Software can exploit this distinction as a first
step in telling whether EFLAGS is implemented
6. Detecting EFLAGS
- Here's a first test for the presence of EFLAGS
(it rules out the 8086, but not the 80286)
    mov   $-1, %ax      # a nonzero value
    mov   $32, %cl      # shift-factor of 32
    shl   %cl, %ax      # do logical shift
    or    %ax, %ax      # test result in AX
    jnz   is32bit       # EFLAGS may be present
    jmp   is16bit       # EFLAGS absent
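The effect of truncating the shift-factor can be modelled in portable C, without access to the old hardware. This is only a sketch: shl16 and its masks_count parameter are hypothetical names chosen for illustration, and the loop stands in for what the shl instruction does with the count in CL.

```c
#include <stdint.h>

/* Model of a 16-bit logical left shift by the count held in CL.
 * masks_count != 0: only the low 5 bits of the count are used.
 * masks_count == 0: the full count is used.                     */
uint16_t shl16(uint16_t value, uint8_t count, int masks_count)
{
    unsigned n = masks_count ? (count & 31u) : count;
    while (n > 0) {
        value = (uint16_t)(value << 1);   /* one bit position per step */
        n--;
    }
    return value;
}

/* The test from the slide: shift 0xFFFF by 32 and see what is left. */
int count_was_truncated(int masks_count)
{
    return shl16(0xFFFF, 32, masks_count) != 0;
}
```

With truncation the count 32 becomes 0 and AX keeps its nonzero value; without truncation all sixteen bits are shifted out and AX becomes zero.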
7. Testing for ID-bit toggle
- Here's a test for the presence of the CPUID
instruction
    pushfl              # copy EFLAGS contents
    pop   %eax          # to accumulator register
    mov   %eax, %edx    # save a duplicate image
    btc   $21, %eax     # toggle the ID-bit (bit 21)
    push  %eax          # copy revised contents
    popfl               # back into EFLAGS
    pushfl              # copy EFLAGS contents
    pop   %eax          # back into accumulator
    xor   %edx, %eax    # do XOR with prior value
    bt    $21, %eax     # did ID-bit get toggled?
    jc    y_cpuid       # yes, can execute cpuid
    jmp   n_cpuid       # else cpuid unimplemented
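The arithmetic of the test above can be followed on plain integers. The sketch below is a model, not a hardware probe: writable_mask is a hypothetical stand-in for the set of EFLAGS bits that the processor lets software change, and the function names are invented for illustration.

```c
#include <stdint.h>

#define EFLAGS_ID (1u << 21)    /* the ID-bit */

/* Models popfl followed by pushfl: only the bits in writable_mask
 * accept the new value, the others keep their current value.      */
static uint32_t write_then_read(uint32_t current, uint32_t wanted,
                                uint32_t writable_mask)
{
    return (current & ~writable_mask) | (wanted & writable_mask);
}

/* Returns 1 if the ID-bit could be toggled, 0 otherwise. */
int id_bit_toggles(uint32_t eflags, uint32_t writable_mask)
{
    uint32_t saved    = eflags;               /* mov %eax, %edx */
    uint32_t wanted   = eflags ^ EFLAGS_ID;   /* btc $21, %eax  */
    uint32_t readback = write_then_read(eflags, wanted, writable_mask);
    uint32_t changed  = readback ^ saved;     /* xor %edx, %eax */
    return (changed & EFLAGS_ID) != 0;        /* bt  $21, %eax  */
}
```

The XOR leaves a 1 only in bit positions whose value changed, so testing bit 21 of the result answers the question regardless of the ID-bit's starting value.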
8. How does CPUID work?
- Step 1: load value 0 into register EAX
- Step 2: execute cpuid instruction
- Step 3: verify the 'GenuineIntel' character-string
in registers (EBX,EDX,ECX)
- Step 4: find the maximum CPUID input-value in the
EAX register
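Step 3 can be sketched in C once the three register values are in hand. The code below does not execute cpuid itself; it only shows how the twelve characters are laid out, four per register, lowest byte first. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assemble the 12-character vendor string from the register values
 * that cpuid returns for input-value 0, in the order EBX, EDX, ECX. */
void cpuid_vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx,
                         char out[13])
{
    uint32_t regs[3] = { ebx, edx, ecx };
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[4 * r + b] = (char)((regs[r] >> (8 * b)) & 0xFF);
    out[12] = '\0';
}

/* Returns 1 if the three registers spell "GenuineIntel". */
int is_genuine_intel(uint32_t ebx, uint32_t edx, uint32_t ecx)
{
    char vendor[13];
    cpuid_vendor_string(ebx, edx, ecx, vendor);
    return strcmp(vendor, "GenuineIntel") == 0;
}
```

On an Intel processor the three values are 0x756E6547 ("Genu"), 0x49656E69 ("ineI") and 0x6C65746E ("ntel").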
9. Version and Features
- Load 1 into EAX and execute CPUID
- Processor model and stepping information is
returned in register EAX
  bits 27-20  Extended Family ID
  bits 19-16  Extended Model ID
  bits 13-12  Type
  bits 11-8   Family ID
  bits 7-4    Model
  bits 3-0    Stepping ID
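Pulling the fields out of EAX is a matter of shifting and masking. A minimal sketch follows, using the bit positions listed above; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* The fields of the version information returned in EAX. */
struct cpu_signature {
    unsigned stepping;     /* bits 3-0   */
    unsigned model;        /* bits 7-4   */
    unsigned family;       /* bits 11-8  */
    unsigned type;         /* bits 13-12 */
    unsigned ext_model;    /* bits 19-16 */
    unsigned ext_family;   /* bits 27-20 */
};

struct cpu_signature decode_cpu_signature(uint32_t eax)
{
    struct cpu_signature s;
    s.stepping   =  eax        & 0xF;
    s.model      = (eax >> 4)  & 0xF;
    s.family     = (eax >> 8)  & 0xF;
    s.type       = (eax >> 12) & 0x3;
    s.ext_model  = (eax >> 16) & 0xF;
    s.ext_family = (eax >> 20) & 0xFF;
    return s;
}
```

For example, the value 0x00000F29 decodes to family 0xF, model 2, stepping 9.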
10. Some Feature Flags in EDX
  bit 28  HTT   HyperThreading Technology (1 = yes, 0 = no)
  bit 13  PGE   Page Global Entries (1 = yes, 0 = no)
  bit 9   APIC  Advanced Programmable Interrupt Controller
                on-chip (1 = yes, 0 = no)
  bit 3   PSE   Page-Size Extensions (1 = yes, 0 = no)
  bit 2   DE    Debugging Extensions (1 = yes, 0 = no)
  bit 1   VME   Virtual-8086 Mode Enhancements (1 = yes, 0 = no)
  bit 0   FPU   Floating-Point Unit on-chip (1 = yes, 0 = no)
11. Some Feature Flags in ECX
  bit 5   VMX   Virtual Machine Extensions (1 = yes, 0 = no)
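Testing one of these flags takes a single shift and mask. The sketch below names the bit positions shown on the two slides above; the enumerator names and the feature_bit helper are hypothetical.

```c
#include <stdint.h>

/* Bit positions of some feature flags returned by cpuid
 * for input-value 1 (names chosen for this example).    */
enum {
    EDX_FPU  = 0,
    EDX_VME  = 1,
    EDX_DE   = 2,
    EDX_PSE  = 3,
    EDX_APIC = 9,
    EDX_PGE  = 13,
    EDX_HTT  = 28,
    ECX_VMX  = 5
};

/* Returns 1 if the given bit of the register value is set, else 0. */
int feature_bit(uint32_t reg, unsigned bit)
{
    return (int)((reg >> bit) & 1u);
}
```

A caller would pass the EDX value with an EDX_ name and the ECX value with an ECX_ name, e.g. feature_bit(edx, EDX_HTT).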
12. Multiprocessor Specification
- It's an industry standard, allowing OS software
to use multiple processors in a uniform way
- OS software searches in three regions of the
physical address-space below 1-megabyte for a
paragraph-aligned data-structure of length
16-bytes called the MP Floating Pointer
Structure
- Search in lowest KB of Extended BIOS Data Area
- Search in topmost KB of conventional 640K RAM
- Search in the 128KB ROM-BIOS (0xE0000-0xFFFFF)
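The search itself can be sketched as a scan over a copy of one of these regions. Two details below come from the MP Specification 1.4 rather than from the slides, so treat them as assumptions to check against that document: the structure begins with the signature "_MP_", and its 16 bytes sum to zero (modulo 256). The function names are hypothetical, and the buffer is an ordinary array, not live physical memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scan a region at 16-byte (paragraph) steps for the MP Floating
 * Pointer Structure. Returns its offset in the region, or -1.    */
long find_mp_floating_pointer(const uint8_t *region, size_t size)
{
    for (size_t off = 0; off + 16 <= size; off += 16) {
        const uint8_t *p = region + off;
        if (memcmp(p, "_MP_", 4) != 0)
            continue;                     /* signature not here */
        uint8_t sum = 0;
        for (int i = 0; i < 16; i++)
            sum = (uint8_t)(sum + p[i]);  /* checksum over 16 bytes */
        if (sum == 0)
            return (long)off;
    }
    return -1;
}

/* Bytes 4-7 (little-endian): physical address of the
 * MP Configuration Table, or 0 if there is none.     */
uint32_t mp_config_table_address(const uint8_t *fps)
{
    return (uint32_t)fps[4]
         | ((uint32_t)fps[5] << 8)
         | ((uint32_t)fps[6] << 16)
         | ((uint32_t)fps[7] << 24);
}
```

The same function would be called once for each of the three regions, stopping at the first region where it does not return -1.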
13. MP Floating Pointer Structure
- This structure may contain an ID-number for one
of a small number of standard SMP system
architectures, or may contain the memory address
for a more extensive MP Configuration Table
having entries that specify a customized system
architecture
- The machines in our classroom employ the latter
of these two options
14. An example record
- The MP Configuration Table will contain a record
for each logical processor
  byte 0       Entry Type (0)
  byte 1       Local-APIC ID
  byte 2       Local-APIC version
  byte 3       CPU Flags: BP (bit 1), EN (bit 0)
  bytes 4-7    CPU signature (stepping, model, family)
  bytes 8-11   Feature Flags
  bytes 12-15  reserved (0)
  bytes 16-19  reserved (0)
BP = Bootstrap Processor (1 = yes, 0 = no), EN =
Enabled (1 = yes, 0 = no)
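A record like this can be decoded from its raw bytes. The sketch below assumes the 20-byte layout of a processor entry as given in the MP Specification 1.4 (entry type in byte 0, then APIC ID, APIC version, CPU flags, and two little-endian 32-bit fields); the struct and function names are hypothetical.

```c
#include <stdint.h>

struct mp_processor_entry {
    uint8_t  lapic_id;
    uint8_t  lapic_version;
    int      enabled;        /* EN, bit 0 of the CPU Flags byte */
    int      bootstrap;      /* BP, bit 1 of the CPU Flags byte */
    uint32_t cpu_signature;
    uint32_t feature_flags;
};

/* Read a little-endian 32-bit value from four bytes. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode one 20-byte record. Returns 1 on success, or 0 if the
 * record's Entry Type is not 0 (i.e., not a processor entry).  */
int parse_mp_processor_entry(const uint8_t raw[20],
                             struct mp_processor_entry *out)
{
    if (raw[0] != 0)
        return 0;
    out->lapic_id      = raw[1];
    out->lapic_version = raw[2];
    out->enabled       = raw[3] & 1;
    out->bootstrap     = (raw[3] >> 1) & 1;
    out->cpu_signature = le32(raw + 4);
    out->feature_flags = le32(raw + 8);
    return 1;
}
```

Counting the records for which this returns 1 with enabled set gives the number of usable logical processors the table reports.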
15. Our mpinfo.cpp utility
- We created a Linux utility that will display the
system-information contained in the MP
Configuration Table (in hex format)
- You can refer to the MP Specification 1.4
document (online) to interpret this display
- This utility needs a device-driver dram.c to be
pre-installed (in order that it be able to
directly access the system's memory)
16. A processor's Local-APIC
- The purpose of each processor's APIC is to allow
the CPUs in a multiprocessor system to send
messages to one another and to manage the
delivery of the interrupt-requests from the
various peripheral devices to one (or more) of
the CPUs in a dynamically programmable way
- Each processor's Local-APIC has a variety of
registers, all memory mapped to
paragraph-aligned addresses within the 4KB page
at physical-address 0xFEE00000
17. Local-APIC's register-space
4GB physical address-space:
  0xFEE00000  APIC
  0x00000000  RAM
18. Analogies with the PIC
- Among the registers in a Local-APIC are these
(which had analogues in the older 8259 PIC's
design)
- IRR: Interrupt Request Register (256 bits)
- ISR: In-Service Register (256 bits)
- TMR: Trigger-Mode Register (256 bits)
- For each of these, its 256 bits are divided among
eight 32-bit register addresses
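Locating the bit for a given interrupt ID-number is simple arithmetic: the ID-number divided by 32 selects one of the eight registers, and the remainder selects the bit. The sketch below assumes the eight registers sit at consecutive paragraph-aligned addresses and uses starting offsets 0x100 (ISR), 0x180 (TMR) and 0x200 (IRR); those offsets come from Intel's manuals, not from the slides, and the names are hypothetical.

```c
#include <stdint.h>

#define LAPIC_BASE 0xFEE00000u
#define LAPIC_ISR  0x100u      /* assumed offset of first ISR register */
#define LAPIC_TMR  0x180u      /* assumed offset of first TMR register */
#define LAPIC_IRR  0x200u      /* assumed offset of first IRR register */

/* Physical address of the 32-bit register that holds the bit for
 * the given interrupt ID-number (0-255).                          */
uint32_t lapic_bit_register(uint32_t first_offset, unsigned vector)
{
    unsigned index = (vector & 0xFFu) / 32;   /* which of the eight */
    return LAPIC_BASE + first_offset + 0x10u * index;
}

/* Position of that bit within the 32-bit register. */
unsigned lapic_bit_position(unsigned vector)
{
    return (vector & 0xFFu) % 32;
}
```

For interrupt ID-number 0x20, for example, the ISR bit is bit 0 of the register at 0xFEE00110.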
19. New way to do EOI
- Instead of using a special End-Of-Interrupt
command-byte, the Local-APIC contains a dedicated
write-only register (named the EOI Register)
which an Interrupt Handler writes to when it is
ready to signal an EOI
    # issuing EOI to the Local-APIC
    mov   $0xFEE00000, %ebx     # address of the cpu's Local-APIC
    movl  $0, %fs:0xB0(%ebx)    # write any value into EOI register
Here we assume segment-register FS holds the
selector for a segment-descriptor for a writable
4GB-size expand-up data-segment whose
base-address equals 0
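In C the same write goes through a pointer to the APIC's register page. This is a sketch under assumptions: the caller supplies a pointer through which the 4KB page is reachable (in a kernel that would be a mapping of physical address 0xFEE00000), and the function names are hypothetical. Passing the page as a parameter also lets the code be tried against an ordinary array.

```c
#include <stdint.h>

#define LAPIC_EOI 0xB0u    /* offset of the EOI register in the page */

/* Write a 32-bit value to the register at the given byte offset. */
static void lapic_write(volatile uint32_t *page, uint32_t offset,
                        uint32_t value)
{
    page[offset / 4] = value;
}

/* Signal End-Of-Interrupt; the value written does not matter. */
void lapic_signal_eoi(volatile uint32_t *page)
{
    lapic_write(page, LAPIC_EOI, 0);
}
```

The volatile qualifier keeps the compiler from discarding or reordering a store whose only purpose is its side effect on the device.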
20. Each CPU has its own timer!
- Four of the Local-APIC registers are used to
implement a programmable timer
- It can privately deliver a periodic interrupt (or
one-shot interrupt) just to its own CPU
- 0xFEE00320 Timer Vector register
- 0xFEE00380 Initial Count register
- 0xFEE00390 Current Count register
- 0xFEE003E0 Divider Configuration register
21. Timer's Local Vector Table
Register at 0xFEE00320:
  bit 17    MODE  (0 = one-shot, 1 = periodic)
  bit 16    MASK  (0 = unmasked, 1 = masked)
  bit 12    BUSY  (0 = not busy, 1 = busy)
  bits 7-0  Interrupt ID-number
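A value read from this register can be split into its fields as follows. This is an illustrative sketch using the bit positions shown above; the struct and function names are hypothetical.

```c
#include <stdint.h>

struct lvt_timer {
    unsigned vector;     /* bits 7-0: interrupt ID-number */
    int      busy;       /* bit 12                        */
    int      masked;     /* bit 16                        */
    int      periodic;   /* bit 17: 0 = one-shot          */
};

struct lvt_timer decode_lvt_timer(uint32_t reg)
{
    struct lvt_timer t;
    t.vector   = reg & 0xFFu;
    t.busy     = (int)((reg >> 12) & 1u);
    t.masked   = (int)((reg >> 16) & 1u);
    t.periodic = (int)((reg >> 17) & 1u);
    return t;
}
```

For example, 0x00020020 describes an unmasked periodic timer that delivers interrupt ID-number 0x20.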
22. Timer's Divide-Configuration
Register at 0xFEE003E0:
  bits 31-4        reserved (0)
  bits 3, 1 and 0  Divider-Value field
  bit 2            0
Divider-Value field (bits 3, 1, and 0):
  000  divide by 2
  001  divide by 4
  010  divide by 8
  011  divide by 16
  100  divide by 32
  101  divide by 64
  110  divide by 128
  111  divide by 1
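Because the three field bits are not adjacent in the register, converting between a register value and a divisor takes a little bit-shuffling. A minimal sketch follows, with hypothetical function names.

```c
#include <stdint.h>

/* Gather register bits 3, 1 and 0 into a 3-bit code (0-7). */
unsigned divide_code_from_register(uint32_t reg)
{
    return ((reg >> 1) & 4u) | (reg & 3u);
}

/* Spread a 3-bit code back into register bits 3, 1 and 0. */
uint32_t divide_register_from_code(unsigned code)
{
    return ((code & 4u) << 1) | (code & 3u);
}

/* Divisor selected by a register value: 2, 4, ..., 128, or 1. */
unsigned divisor_from_register(uint32_t reg)
{
    unsigned code = divide_code_from_register(reg);
    return (code == 7) ? 1u : (2u << code);
}
```

So the register value 0x3 selects divide by 16, 0x8 selects divide by 32, and 0xB selects divide by 1.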
23. Initial and Current Counts
0xFEE00380  Initial Count Register (read/write)
0xFEE00390  Current Count Register (read-only)
When the timer is programmed for periodic mode,
the Current Count is automatically reloaded from
the Initial Count register, then counts down
with each CPU bus-cycle, generating an interrupt
when it reaches zero
24. Using the timer's interrupts
- Set up your desired Initial Count value
- Select your desired Divide Configuration
- Set up the APIC-timer's LVT register with your
desired interrupt-ID number and counting mode
(periodic or one-shot), and clear the LVT
register's Mask bit to initiate the automatic
countdown operation
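The three steps above can be sketched as three register writes, in the same order. As with any code that touches the APIC, this assumes the caller supplies a pointer through which the register page at 0xFEE00000 is reachable; the function name and parameters are hypothetical, and divide_cfg is the raw register value (for example 0xB for divide by 1).

```c
#include <stdint.h>

#define LAPIC_LVT_TIMER   0x320u
#define LAPIC_INIT_COUNT  0x380u
#define LAPIC_DIVIDE_CFG  0x3E0u
#define LVT_PERIODIC      (1u << 17)   /* MODE bit; MASK (bit 16) left 0 */

void lapic_timer_start(volatile uint32_t *page, uint8_t vector,
                       int periodic, uint32_t divide_cfg,
                       uint32_t initial_count)
{
    /* Step 1: the Initial Count value */
    page[LAPIC_INIT_COUNT / 4] = initial_count;

    /* Step 2: the Divide Configuration (only bits 3, 1, 0 are used) */
    page[LAPIC_DIVIDE_CFG / 4] = divide_cfg & 0xBu;

    /* Step 3: interrupt ID-number and mode, with the Mask bit clear */
    page[LAPIC_LVT_TIMER / 4] = (uint32_t)vector
                              | (periodic ? LVT_PERIODIC : 0u);
}
```

Halving the interrupt frequency afterwards only requires rewriting the Divide Configuration register with the next larger divisor.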
25. In-class exercise 1
- Run the cpuid.cpp Linux application (on our
course website) to see if the CPUs in our
classroom implement HyperThreading (i.e.,
multiple logical processors in a cpu)
- Then run the mpinfo.cpp application, to see if
the MP Base Configuration Table has entries for
more than one processor
- If both results hold true, then we can write our
own multiprocessing software in H235!
26. In-class exercise 2
- Run the apictick.s demo (on our CS 630 website)
to observe the APIC's periodic
interrupt-handler drawing T's onscreen
- It executes for ten milliseconds (the 8254 is
used here to create that timed delay)
- Try reprogramming the APIC's Divider
Configuration register, to cut the interrupt
frequency in half (or perhaps to double it)