Multiprocessor Initialization presentation

Title: Multiprocessor Initialization

1
Multiprocessor Initialization

An introduction to the use of Interprocessor
Interrupts

2
Multiprocessor topology
[Diagram: CPU 0 and CPU 1, each with its own Local APIC, together with
an IO APIC, a Back Side Bus, a Front Side Bus, a bridge, system memory,
and peripheral devices]
3
The Local-APIC ID register
Bits 31-24: APIC ID
Bits 23-0:  reserved
This register is initially zero, but its APIC ID
field (8 bits) is programmed by the BIOS during
system startup with a unique processor
identification number, which subsequently is used
when specifying the processor as a recipient of
inter-processor interrupts.
Memory-mapped register address: 0xFEE00020
4
The Local-APIC EOI register
Bits 31-0: write-only register
This write-only register is used by Interrupt
Service Routines to issue an End-Of-Interrupt
command to the Local-APIC. Any value written to
this register will be interpreted by the
Local-APIC as an EOI command. The value stored
in this register is initially zero (and it will
remain unchanged).
Memory-mapped register address: 0xFEE000B0
5
The Spurious Interrupt register
Bits 31-9: reserved
Bit 8:     EN, Local-APIC is Enabled (1 = yes, 0 = no)
Bits 7-0:  spurious vector
This register is used to enable or disable the
functioning of the Local-APIC and, when enabled,
to specify the interrupt-vector number to be
delivered to the processor in case the Local-APIC
generates a spurious interrupt. (In some
processor models, the vector's lowest 4 bits are
hardwired to 1.)
Memory-mapped register address: 0xFEE000F0
6
Interrupt Command Register

Each Pentium's Local-APIC has a 64-bit Interrupt
Command Register (ICR)
It can be programmed by system software to
transmit messages (via the Back Side Bus) to one
or several other processors
Each processor has a unique identification number
in its Local-APIC ID Register that can be used
for directing messages to it

7
ICR (upper 32-bits)
Bits 31-24: Destination field
Bits 23-0:  reserved
The Destination field (8 bits) can be used to
specify which processor (or group of processors)
will receive the message
Memory-mapped register address: 0xFEE00310
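
As a rough illustration (not part of the original slides), the upper ICR dword for a single target processor could be formed in C as below. The helper name `icr_high_for` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: place an 8-bit APIC ID in the Destination
   field (bits 31-24) of the upper ICR dword; bits 23-0 stay zero. */
uint32_t icr_high_for(uint8_t apic_id)
{
    return (uint32_t)apic_id << 24;
}
```

For example, `icr_high_for(1)` yields 0x01000000, the value that would be written to 0xFEE00310 to address the processor whose APIC ID is 1.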
8
ICR (lower 32-bits)
Bits 19-18: Destination Shorthand
              00 = no shorthand
              01 = only to self
              10 = all including self
              11 = all excluding self
Bit 15:     Trigger Mode (0 = Edge, 1 = Level)
Bit 14:     Level (0 = De-assert, 1 = Assert)
Bit 12:     Delivery Status, read-only (0 = Idle, 1 = Pending)
Bit 11:     Destination Mode (0 = Physical, 1 = Logical)
Bits 10-8:  Delivery Mode
              000 = Fixed, 001 = Lowest Priority, 010 = SMI, 011 = (reserved)
              100 = NMI, 101 = INIT, 110 = Start Up, 111 = (reserved)
Bits 7-0:   Vector field
Memory-mapped register address: 0xFEE00300
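
To make the field layout concrete, here is a minimal C sketch that packs the fields into the lower ICR dword. The function and constant names are illustrative assumptions, not names from the demo code. Only the bit positions and encodings come from the register description.

```c
#include <stdint.h>

/* Delivery Mode and Destination Shorthand encodings (names are illustrative) */
enum { DM_FIXED = 0, DM_LOWEST = 1, DM_SMI = 2, DM_NMI = 4, DM_INIT = 5, DM_STARTUP = 6 };
enum { SH_NONE = 0, SH_SELF = 1, SH_ALL_INCL_SELF = 2, SH_ALL_EXCL_SELF = 3 };

/* Pack the writable fields of the lower ICR dword.
   Delivery Status (bit 12) is read-only, so it is left at zero. */
uint32_t icr_low(uint8_t vector, unsigned delivery_mode, unsigned logical_dest,
                 unsigned assert_level, unsigned level_trigger, unsigned shorthand)
{
    return (uint32_t)vector
         | ((uint32_t)(delivery_mode & 7u) << 8)
         | ((uint32_t)(logical_dest  & 1u) << 11)
         | ((uint32_t)(assert_level  & 1u) << 14)
         | ((uint32_t)(level_trigger & 1u) << 15)
         | ((uint32_t)(shorthand     & 3u) << 18);
}
```

With these definitions, `icr_low(0x00, DM_INIT, 0, 1, 0, SH_ALL_EXCL_SELF)` gives 0x000C4500 and `icr_low(0x11, DM_STARTUP, 0, 1, 0, SH_ALL_EXCL_SELF)` gives 0x000C4611, the two constants used on the INIT and Startup IPI slides.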
9
MP initialization protocol

Set shared processor-counter equal to 1
Step 1: issue an INIT IPI to all-except-self
Delay for 10 milliseconds
Step 2: issue a Startup IPI to all-except-self
Delay for 200 microseconds
Step 3: issue a Startup IPI to all-except-self
Delay for 200 microseconds
Check the value of the processor-counter
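
The sequence can be sketched in C as follows. This is only an outline under stated assumptions. The `mp_hooks` structure and the recording stubs are hypothetical stand-ins for the real APIC write and timer delay, and the two ICR constants are the ones the following slides use (vector 0x11 is specific to that demo).

```c
#include <stdint.h>

/* Hypothetical hooks: on real hardware send_ipi would write the lower
   ICR dword and wait for Delivery Status to go idle, and delay_us
   would use a timer. Here the caller supplies them. */
struct mp_hooks {
    void (*send_ipi)(uint32_t icr_low);
    void (*delay_us)(uint32_t microseconds);
};

#define IPI_INIT_ALL_BUT_SELF    0x000C4500u
#define IPI_STARTUP_ALL_BUT_SELF 0x000C4611u   /* vector 0x11 */

void mp_wake_others(const struct mp_hooks *hw, volatile uint32_t *cpu_count)
{
    *cpu_count = 1;                           /* presumably counts the BSP itself */
    hw->send_ipi(IPI_INIT_ALL_BUT_SELF);      /* Step 1 */
    hw->delay_us(10000);                      /* 10 milliseconds */
    hw->send_ipi(IPI_STARTUP_ALL_BUT_SELF);   /* Step 2 */
    hw->delay_us(200);
    hw->send_ipi(IPI_STARTUP_ALL_BUT_SELF);   /* Step 3 */
    hw->delay_us(200);
    /* the caller now reads *cpu_count to see how many CPUs responded */
}

/* Recording stubs, so the ordering can be checked without hardware */
uint32_t sent[8];
unsigned sent_n;
uint32_t total_delay_us;
void record_ipi(uint32_t v)   { if (sent_n < 8) sent[sent_n++] = v; }
void record_delay(uint32_t u) { total_delay_us += u; }
```

Run with the recording stubs, the routine sends three IPIs and accumulates 10,400 microseconds of delay. On real hardware each awakened processor would increment the counter itself.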

10
Issue INIT IPI

# address Local-APIC via register FS
        mov   $sel_fs, %ax
        mov   %ax, %fs
# broadcast INIT IPI to all-except-self
        mov   $0x000C4500, %eax
        mov   %eax, %fs:0xFEE00300
# wait until Delivery Status (bit 12) shows Idle
.B0:    btl   $12, %fs:0xFEE00300
        jc    .B0

11
Issue Startup IPI

# broadcast Startup IPI to all-except-self,
# using vector 0x11 to specify the entry-point
# at real-mode memory address 0x00011000
        mov   $0x000C4611, %eax
        mov   %eax, %fs:0xFEE00300
.B1:    btl   $12, %fs:0xFEE00300
        jc    .B1
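
The relation between the Startup IPI vector and the entry-point can be written out as a small C sketch. The helper names are hypothetical. The mapping itself (vector VV selects the 4K page at VV * 0x1000, entered in real mode with CS = VV00 and IP = 0) is the standard Startup IPI behavior.

```c
#include <stdint.h>

/* Physical address where an AP starts executing for a given SIPI vector */
uint32_t sipi_entry_address(uint8_t vector)
{
    return (uint32_t)vector << 12;            /* vector * 0x1000 */
}

/* Real-mode code-segment value the AP starts with (IP is 0) */
uint16_t sipi_entry_segment(uint8_t vector)
{
    return (uint16_t)((uint32_t)vector << 8); /* segment * 16 = address */
}
```

For vector 0x11 this gives address 0x00011000 and segment 0x1100, matching the entry-point named on the slide.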

12
Timing delays

Intel's MP Initialization Protocol specifies the
use of some timing delays:
10 milliseconds (= 10,000 microseconds)
200 microseconds
We can use the 8254 Timer's Channel 2 for
implementing these timed delays, by programming
it for one-shot countdown mode, then polling
bit 5 at i/o port 0x61

13
Mathematical examples
EXAMPLE 1: Delaying for 10 milliseconds means
delaying for 1/100-th of a second (because 100
times 10 milliseconds = one thousand milliseconds)
EXAMPLE 2: Delaying for 200 microseconds means
delaying for 1/5000-th of a second (because 5000
times 200 microseconds = one million microseconds)
GENERAL PRINCIPLE: Delaying for x microseconds
means delaying for 1/(1000000/x)-th of a second
(because 1000000/x times x microseconds =
one million microseconds)

14
Mathematical theory
PROBLEM: Given the desired delay-time in
microseconds, express the desired delay-time in
clock-frequency pulses and program that number
into the PIT's Latch-Register
RECALL: Clock-Frequency = 1193182 Hertz
(pulses per second)
ALSO: One second equals one million microseconds
APPLYING DIMENSIONAL ANALYSIS:
Pulses-Per-Microsecond = Pulses-Per-Second /
Microseconds-Per-Second
Delay-in-Clock-Pulses = Delay-in-Microseconds *
Pulses-Per-Microsecond
CONCLUSION:
For a desired time-delay of x microseconds, the
number of clock-pulses may be computed as
x * (1193182 / 1000000) = 1193182 / (1000000 / x),
as dividing by a fraction amounts to multiplying by
that fraction's reciprocal
15
Delaying for EAX microseconds

We use the 8254 Timer/Counter Channel 2 to
generate a timed delay (expressed in microseconds
by the value in EAX)
        mov   %eax, %ecx        # copy delay-time to ECX
        mov   $1000000, %eax    # microseconds-per-sec
        xor   %edx, %edx        # extended to quadword
        div   %ecx              # perform dword division
        mov   %eax, %ecx        # copy quotient into ECX
        mov   $1193182, %eax    # input-pulses-per-sec
        xor   %edx, %edx        # extended to quadword
        div   %ecx              # perform dword division
# now transfer the quotient from AX to the
# Channel 2 Latch
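
The same arithmetic can be checked in C. The sketch below mirrors the two-division method, 1193182 / (1000000 / x), and sets it beside a multiply-first variant. The second function is an alternative added here for comparison, not something the slides propose, and both function names are hypothetical.

```c
#include <stdint.h>

#define PIT_HZ 1193182u   /* 8254 input clock, pulses per second */

/* Two-division method: 1193182 / (1000000 / x).
   Returns 0 for x == 0 or x > 1000000, where the inner quotient
   would be undefined or zero. */
uint32_t pit_count_two_div(uint32_t usec)
{
    if (usec == 0 || usec > 1000000u)
        return 0;
    return PIT_HZ / (1000000u / usec);
}

/* Alternative (an assumption, not from the slides): multiply first,
   using 64-bit arithmetic, then divide once. */
uint32_t pit_count_mul_first(uint32_t usec)
{
    return (uint32_t)(((uint64_t)usec * PIT_HZ) / 1000000u);
}
```

For the two delays the protocol needs, both methods agree: 11931 pulses for 10,000 microseconds and 238 pulses for 200 microseconds. They can differ when x does not divide one million evenly, because the inner quotient is truncated. For x = 30000 the two-division method gives 36157 while multiplying first gives 35795. Either way the result must fit the 16-bit latch, so it cannot exceed 65535.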

16
Mutual Exclusion

Shared variables must not be modified by more
than one processor at a time (mutual exclusion)
The Pentium's lock prefix helps enforce this
Example: every processor adds 1 to count
        lock
        incl  count
Example: each processor needs a private stack
        mov   $0x1000, %ax
        lock
        xadd  %ax, new_SS       # AX gets the old new_SS value
        mov   %ax, %ss
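
For readers more at home in C, the two locked instructions correspond roughly to C11 atomic fetch-and-add. This is a sketch only. The variable `cpu_count`, the function names, and the starting value 0x2000 are assumptions chosen for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

atomic_uint cpu_count;                 /* shared counter, starts at 0 */
_Atomic uint16_t new_SS = 0x2000;      /* next free stack segment (arbitrary start) */

/* Counterpart of a locked increment: one indivisible read-modify-write */
void count_me(void)
{
    atomic_fetch_add(&cpu_count, 1);
}

/* Counterpart of a locked xadd: return the old value and advance the
   shared variable by 0x1000, so no two callers get the same segment. */
uint16_t claim_stack_segment(void)
{
    return atomic_fetch_add(&new_SS, 0x1000);
}
```

Two successive calls to `claim_stack_segment()` return 0x2000 and then 0x3000, so each caller receives a distinct stack segment even if the calls come from different processors.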

17
ROM-BIOS isn't reentrant

The video service-functions in ROM-BIOS that we
use to display a message-string at the current
cursor-location (and afterward advance the
cursor) modify global storage locations (as well
as i/o ports), and hence must be called by one
processor at a time
A shared memory-variable (called mutex) is used
to enforce this mutual exclusion

18
Implementing a spinlock

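mutex:  .word 1                 # 1 = free, 0 = held
spin:   btw   $0, mutex         # is the mutex free?
        jnc   spin              # no, keep polling
        lock
        btrw  $0, mutex         # try to grab it atomically
        jnc   spin              # another CPU got it first
        <CRITICAL SECTION OF CODE GOES HERE>
        lock
        btsw  $0, mutex         # release the mutex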
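
The same test-then-grab pattern can be sketched with C11 atomics. This is an analogue under assumptions, not a translation of the demo. It keeps the slide's convention that 1 means free and 0 means held, and the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

atomic_int mutex = 1;              /* 1 = free, 0 = held */

/* One attempt: atomically swap in 0. We own the lock only if the old
   value was 1, which is what the carry flag reports after a locked
   bit-test-and-reset. */
bool spin_try_acquire(void)
{
    return atomic_exchange(&mutex, 0) == 1;
}

void spin_acquire(void)
{
    for (;;) {
        while (atomic_load(&mutex) == 0)
            ;                      /* plain read first, like the unlocked bit test */
        if (spin_try_acquire())
            return;                /* otherwise another CPU won the race */
    }
}

void spin_release(void)
{
    atomic_store(&mutex, 1);       /* mark the lock free again */
}
```

Polling with a plain read before attempting the atomic exchange means the locked bus operation is issued only when the lock looks free, which is the reason the slide's loop tests the bit before using the lock prefix.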

19
Demo: smphello.s

Each CPU needs to access its Local-APIC
The BSP (Boot-Strap Processor) wakes up other
processors by broadcasting the INIT-SIPI-SIPI
message-sequence
Each AP (Application Processor) starts
executing at a 4K page-boundary, and needs its
own private stack-area
Shared variables need exclusive access

20
In-class exercise

Include this procedure that multiple CPUs will
execute simultaneously (without lock)
total:  .word 0                 # the shared variable
add_one_thousand:
        mov   $1000, %cx
nxinc:  addw  $1, total
        loop  nxinc
        ret
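
What can go wrong without the lock prefix is easiest to see by stepping through one bad interleaving by hand. The C sketch below is a deterministic simulation, not real concurrency. It treats an unlocked add to memory as a separate load and store and plays out the two CPUs' steps in a fixed order. The function names are hypothetical.

```c
#include <stdint.h>

uint16_t total;                    /* the shared variable */

/* Both CPUs read before either writes back: one increment is lost */
uint16_t interleaved_increments(void)
{
    total = 0;
    uint16_t a = total;            /* CPU A reads 0 */
    uint16_t b = total;            /* CPU B reads 0 before A writes back */
    total = (uint16_t)(a + 1);     /* CPU A writes 1 */
    total = (uint16_t)(b + 1);     /* CPU B also writes 1 */
    return total;
}

/* Each CPU completes its read-modify-write before the other starts */
uint16_t serialized_increments(void)
{
    total = 0;
    uint16_t a = total;
    total = (uint16_t)(a + 1);     /* CPU A finishes first */
    uint16_t b = total;
    total = (uint16_t)(b + 1);     /* then CPU B */
    return total;
}
```

The interleaved version ends with total equal to 1 although two increments were performed, while the serialized version ends with 2. When two CPUs each run the unlocked loop a thousand times, any number of such collisions may occur, so the final total may fall short of 2000.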

21
We may need a barrier

We can use a software construct (known as a
barrier) to stop CPUs from entering a block of
code until a prescribed number of them are all
ready to enter it together
arrived: .word 0                # shared variable
barrier: lock
         incw  arrived
await:   cmpw  $2, arrived
         jb    await
         call  add_one_thousand
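
A C11 counterpart of this single-use barrier might look like the sketch below. The function names are hypothetical and the expected count is passed as a parameter rather than fixed at 2.

```c
#include <stdatomic.h>
#include <stdbool.h>

atomic_uint arrived;               /* shared variable, starts at 0 */

/* Announce this CPU's arrival with one atomic increment */
void barrier_arrive(void)
{
    atomic_fetch_add(&arrived, 1);
}

/* True once the prescribed number of CPUs have arrived */
bool barrier_open(unsigned expected)
{
    return atomic_load(&arrived) >= expected;
}

/* Arrive, then spin until everyone else has arrived too */
void barrier_wait(unsigned expected)
{
    barrier_arrive();
    while (!barrier_open(expected))
        ;
}
```

Each CPU would call `barrier_wait(2)` immediately before the code it must enter together with the other CPU. Because the counter is never reset, this barrier can be used only once.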
