Title: Intel's cmpxchg instruction
1. Intel's cmpxchg instruction
- How does the Linux kernel's cmos_lock mechanism work?
2. Review of the i386 registers
- Segment Registers (16 bits): CS, DS, ES, FS, GS, SS
- General Registers (32 bits): EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP
- Program Control and Status Registers (32 bits): EIP, EFLAGS
3. The x86 system registers
- Control Registers (32 bits): CR0, CR1, CR2, CR3, CR4, CR5, CR6, CR7
- Debug Registers (32 bits): DR0, DR1, DR2, DR3, DR4, DR5, DR6, DR7
- Memory-management registers: LDTR and TR (16 bits), GDTR and IDTR (48 bits)
- Unimplemented (reserved): CR1, CR5, CR6, CR7, DR4, DR5
4. How often is cmpxchg used?
cat vmlinux.asm | grep cmpxchg
c01046de:  f0 0f b1 15 3c 99 30   lock cmpxchg %edx,0xc030993c
c0105591:  f0 0f b1 15 3c 99 30   lock cmpxchg %edx,0xc030993c
c01055d9:  f0 0f b1 15 3c 99 30   lock cmpxchg %edx,0xc030993c
c010b895:  f0 0f b1 11            lock cmpxchg %edx,(%ecx)
c010b949:  f0 0f b1 0b            lock cmpxchg %ecx,(%ebx)
c0129a9f:  f0 0f b1 0b            lock cmpxchg %ecx,(%ebx)
c0129acf:  f0 0f b1 0b            lock cmpxchg %ecx,(%ebx)
c012d377:  f0 0f b1 0e            lock cmpxchg %ecx,(%esi)
c012d41a:  f0 0f b1 0e            lock cmpxchg %ecx,(%esi)
c012d968:  f0 0f b1 16            lock cmpxchg %edx,(%esi)
c012e568:  f0 0f b1 2e            lock cmpxchg %ebp,(%esi)
c012e57a:  f0 0f b1 2e            lock cmpxchg %ebp,(%esi)
c012e58a:  f0 0f b1 2e            lock cmpxchg %ebp,(%esi)
c012e83f:  f0 0f b1 13            lock cmpxchg %edx,(%ebx)
c012e931:  f0 0f b1 0a            lock cmpxchg %ecx,(%edx)
c012ea94:  f0 0f b1 11            lock cmpxchg %edx,(%ecx)
c012ecf4:  f0 0f b1 13            lock cmpxchg %edx,(%ebx)
c012f08e:  f0 0f b1 4b 18         lock cmpxchg %ecx,0x18(%ebx)
c012f163:  f0 0f b1 11            lock cmpxchg %edx,(%ecx)
c013cb60:  f0 0f b1 0e            lock cmpxchg %ecx,(%esi)
c0148b3c:  f0 0f b1 29            lock cmpxchg %ebp,(%ecx)
c0150d0f:  f0 0f b1 3b            lock cmpxchg %edi,(%ebx)
c0150d87:  f0 0f b1 31            lock cmpxchg %esi,(%ecx)
c0199c5e:  f0 0f b1 0b            lock cmpxchg %ecx,(%ebx)
c024b06f:  f0 0f b1 0b            lock cmpxchg %ecx,(%ebx)
c024b2fe:  f0 0f b1 51 18         lock cmpxchg %edx,0x18(%ecx)
c024b321:  f0 0f b1 51 18         lock cmpxchg %edx,0x18(%ecx)
c024b34b:  f0 0f b1 4b 18         lock cmpxchg %ecx,0x18(%ebx)
c024b960:  f0 0f b1 53 18         lock cmpxchg %edx,0x18(%ebx)
Here's the occurrence that we studied in the rtc_cmos_read() kernel function (at c0105591), plus 28 others!
5. Intel's documentation
- You can find out what any of the Intel x86 instructions does by consulting the official Software Developer's Manual, online at http://www.intel.com/products/processor/manuals/index.htm
- Our course webpage has a link to this site that you can just click (under Resources)
- The instruction-set reference is in two parts:
  - Volume 2A for opcodes A through M
  - Volume 2B for opcodes N through Z
6. Example: cmpxchg
- Operation of the cmpxchg instruction is described (on 3 pages) in Volume 2A
- There's an English-sentence description, and also a description in pseudo-code
- You probably do not want to print out this complete volume (.pdf): it's over 700 pages!
- (You could order a printed copy from Intel)
7. Instruction format
- Intel's assembly language syntax differs from the GNU/Linux syntax (known as AT&T syntax, with roots in UNIX history); for example, Intel syntax lists the same operands in the opposite order
- When AT&T syntax is used, the cmpxchg instruction has this layout:

    lock cmpxchg reg, reg/mem

  - lock: optional prefix (used for SMP)
  - cmpxchg: mnemonic opcode
  - reg: source operand
  - reg/mem: destination operand
8. Effects and affects
- According to Intel's manual, the cmpxchg instruction also uses two implicit operands (i.e., operands not mentioned in the instruction):
  - The CPU's accumulator register
  - The CPU's EFLAGS register
- The accumulator-register (EAX) is both a source-operand and a destination-operand
- The six status-bits in the EFLAGS register will get modified as a side-effect of this instruction
9. cmpxchg description
- This instruction compares the accumulator with the destination-operand (so the ZF-bit in EFLAGS gets assigned accordingly)
- Then:
  - If (accumulator == destination): ZF ← 1; destination ← source
  - If (accumulator != destination): ZF ← 0; accumulator ← destination
10. An instruction-instance
- In our recent disassembly of Linux's kernel function rtc_cmos_read(), this cmpxchg instruction-instance was used:

    lock cmpxchg %edx, cmos_lock

  Here lock is the prefix, cmpxchg the opcode, %edx the source-operand, and cmos_lock the destination-operand (the disassembler shows it by its address, 0xc030993c)
Note: Keep in mind that the accumulator %eax will affect what happens! So we need to consider this instruction within its surrounding context.
11. The complete function
c0105574 <rtc_cmos_read>:
c0105574:  53                     push   %ebx
c0105575:  9c                     pushf
c0105576:  5b                     pop    %ebx
c0105577:  fa                     cli
c0105578:  64 8b 15 08 20 30 c0   mov    %fs:0xc0302008,%edx
c010557f:  0f b6 c8               movzbl %al,%ecx
c0105582:  42                     inc    %edx
c0105583:  c1 e2 08               shl    $0x8,%edx
c0105586:  09 ca                  or     %ecx,%edx
c0105588:  a1 3c 99 30 c0         mov    0xc030993c,%eax
c010558d:  85 c0                  test   %eax,%eax
c010558f:  75 f7                  jne    c0105588 <rtc_cmos_read+0x14>
c0105591:  f0 0f b1 15 3c 99 30   lock cmpxchg %edx,0xc030993c
c0105598:  c0
c0105599:  85 c0                  test   %eax,%eax
c010559b:  75 eb                  jne    c0105588 <rtc_cmos_read+0x14>
c010559d:  88 c8                  mov    %cl,%al
c010559f:  e6 70                  out    %al,$0x70
c01055a1:  e6 80                  out    %al,$0x80
c01055a3:  e4 71                  in     $0x71,%al
c01055a5:  e6 80                  out    %al,$0x80
c01055a7:  c7 05 3c 99 30 c0 00   movl   $0x0,0xc030993c
c01055ae:  00 00 00
c01055b1:  53                     push   %ebx
c01055b2:  9d                     popf
c01055b3:  0f b6 c0               movzbl %al,%eax
c01055b6:  5b                     pop    %ebx
c01055b7:  c3                     ret
12. The preparation steps
- The instructions that precede cmpxchg will set up register EDX (source operand) and register EAX (the x86 accumulator)
- Several instructions are used to set up a value in EDX, and result in this layout:
  - EDX bits 31..8: the current processor's value for per_cpu__cpu_number, plus 1 (this part is guaranteed to be non-zero!)
  - EDX bits 7..0: the CMOS register's index (this might be zero)
13. The cmos_lock variable
- This global variable is initialized to zero, meaning that access to CMOS memory locations is not currently locked
- If some CPU stores a non-zero value in this variable's memory-location, it means that access to CMOS memory is locked
- The kernel needs to ensure that only one CPU at a time can set this lock
14. The most likely scenario
- One of the CPUs wishes to access CMOS memory, so it needs to test cmos_lock to be sure that access is now unlocked (i.e., cmos_lock == 0 is true)
- The CPU copies the cmos_lock variable into EAX, where it can then be tested using the test %eax, %eax instruction
- A conditional-jump follows the test
15. The busy-wait loop
Here is a busy-wait loop, used to wait for the CMOS access to be unlocked:

spin:
        mov   cmos_lock, %eax     # copy lock-variable to accumulator
        test  %eax, %eax          # was CMOS access unlocked?
        jnz   spin                # if it wasn't, then check it again

        # A CPU will fall through to here if unlocked access was detected,
        # and that CPU will now attempt to set the lock; in other words,
        # it will try to assign a non-zero value to the cmos_lock variable.
        # But there's a potential race here: cmos_lock might have been zero
        # when it was copied, but it could have been changed by now, and
        # that's why we need to execute 'lock cmpxchg' at this point.
16. Busy-waiting will be brief

spin:   # see if the lock-variable is clear
        mov   cmos_lock, %eax
        test  %eax, %eax
        jnz   spin
        # ok, now we try to grab the lock
        lock cmpxchg %edx, cmos_lock
        # did another CPU grab it first?
        test  %eax, %eax
        jnz   spin

If our CPU wins the race, the (non-zero) value from source-operand EDX will have been stored into the (previously zero) cmos_lock memory-location, but the (previously zero) accumulator EAX will not have been modified. Hence our CPU will not jump back, but will fall through and execute the "critical section" of code (just a few instructions), then will promptly clear the cmos_lock variable.
17. The less likely case

spin:   # see if the lock-variable is clear
        mov   cmos_lock, %eax
        test  %eax, %eax
        jnz   spin
        # ok, now we try to grab the lock
        lock cmpxchg %edx, cmos_lock
        # did another CPU grab it first?
        test  %eax, %eax
        jnz   spin

If our CPU loses the race, because another CPU changed cmos_lock to some non-zero value after we had fetched our copy of it, then the (now non-zero) value from the cmos_lock destination-operand will have been copied into EAX. The final conditional-jump shown above will therefore take our CPU back into the spin-loop, where it will resume busy-waiting until the winner of the race clears cmos_lock.
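18. Flowchart
- start
- Set up a non-zero value in EDX
- EAX ← cmos_lock
- EAX is zero? If no, repeat the previous step; if yes, continue
- EAX equals cmos_lock? (lock cmpxchg)
  - yes: ZF ← 1; cmos_lock ← EDX
  - no: ZF ← 0; EAX ← cmos_lock
- EAX is zero? If no, go back to "EAX ← cmos_lock"; if yes, continue
- critical section
- cmos_lock ← 0
- finish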
19. btr/bts versus cmpxchg
- In an earlier lesson we used the btr/bts instructions to achieve mutual exclusion, whereas Linux uses cmpxchg to do that
- We think btr/bts is easier to understand, so why do you think the Linux developers would prefer to use cmpxchg instead?
<allow some class discussion here>
20. In-class exercise 1
- Was it really necessary to insert a second "test %eax, %eax" following cmpxchg?
- Can you design a simple LKM that would verify your answer to that question?
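21. EFLAGS
- The Intel documentation does not state precisely how the other EFLAGS status-bits (besides ZF) are affected by cmpxchg, only that they reflect the "comparison" of the accumulator and destination operands
- Usually the CPU implements comparison-of-operands by performing a subtraction
- EFLAGS layout (bits 31..0; higher bits not shown): bit 0 CF, bit 1 = 1, bit 2 PF, bit 3 = 0, bit 4 AF, bit 5 = 0, bit 6 ZF, bit 7 SF, bit 8 TF, bit 9 IF, bit 10 DF, bit 11 OF, bits 12-13 IOPL, bit 14 NT, bit 16 RF, bit 17 VM, bit 18 AC
- The six status-bits are CF, PF, AF, ZF, SF and OF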
22. In-class exercise 2
- Can you decide what Intel means by "the comparison operation", by writing suitable code that examines the effect on EFLAGS of "cmpxchg opnd1, opnd2" and these two plausible alternatives:
  - cmp opnd1, opnd2
  - cmp opnd2, opnd1