Title: A race-cure case study
1A race-cure case study
- A look at how some standard software tools can
illuminate what is happening inside Linux
2Our recent race example
- Our cmosram.c device-driver included a race
condition in its read() and write()
functions, since accessing any CMOS
memory-location is a two-step operation, and thus
is a critical section in our code:
    outb( reg_id, 0x70 );
    datum = inb( 0x71 );
- Once the first step in this sequence is taken,
the second step needs to follow before anything
else touches these two i/o-ports
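The two-step hazard can be shown without touching real hardware. The following is a minimal sketch, assuming a purely software model of the port-pair: the names cmos_model, select_location, fetch_datum, read_undisturbed and read_with_intervention are hypothetical and do not come from cmosram.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of the CMOS port-pair; no real i/o is done. */
struct cmos_model {
    uint8_t index;      /* value most recently written to port 0x70 */
    uint8_t ram[128];   /* simulated CMOS memory-locations */
};

/* Step 1: select a location, like outb( reg_id, 0x70 ). */
static void select_location(struct cmos_model *c, uint8_t reg_id)
{
    assert(reg_id < 128);
    c->index = reg_id;
}

/* Step 2: fetch the selected location, like inb( 0x71 ). */
static uint8_t fetch_datum(const struct cmos_model *c)
{
    return c->ram[c->index];
}

/* Both steps run back-to-back, so the caller gets the location it asked for. */
uint8_t read_undisturbed(struct cmos_model *c, uint8_t reg_id)
{
    select_location(c, reg_id);
    return fetch_datum(c);
}

/* Thread B's step 1 slips in between thread A's two steps. */
uint8_t read_with_intervention(struct cmos_model *c, uint8_t reg_a, uint8_t reg_b)
{
    select_location(c, reg_a);   /* thread A, step 1 */
    select_location(c, reg_b);   /* thread B intervenes */
    return fetch_datum(c);       /* thread A, step 2: reads B's location */
}
```

If location 0 holds 0x11 and location 2 holds 0x22, then read_undisturbed(&c, 0) returns 0x11, while read_with_intervention(&c, 0, 2) hands thread A the value 0x22 that belongs to thread B's location.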
3No interventions!
- To guarantee the integrity of each access to CMOS
memory, we must prohibit every possibility that
another control-thread may intervene and access
that same i/o-port
- The main ways in which an intervention by another
thread might happen are:
    - The current CPU could get interrupted, or
    - Another CPU could access the same i/o-port
4Linux's solution
- Linux provides a function that an LKM can call
which is designed to ensure exclusive access to
a CMOS memory-location:
    datum = rtc_cmos_read( reg_id );
- By using this function, a programmer does not
have to expend time and mental effort analyzing
the race-condition and devising a suitable cure
for it
5But how does it work?
- As computer science students, we are not
satisfied with just using convenient black-box
solutions which we don't understand
- Such purported solutions may not always
accomplish everything that they claim
- If they perform correctly today, they still may
fail in some way in the future (if hardware changes)
- We don't want to be helpless!
6Is open source enough?
- In theory we could try to track down the actual
behavior of the rtc_cmos_read() function, by
reading Linux's source-code
- But is that really a practical approach?
- In some cases the answer might be yes, but in
other situations it might be no!
- Life is short, and the kernel source-files are
very numerous, with many layers
7LXR can help
- The Linux Cross-Reference tool offers a way to
automate searching kernel source
- This tool is online (see our website's link under
Resources) and it is hosted on a server in
Norway: http://lxr.linux.no/
- Here you just click on "Browse the Code"
8From <arch/i386/kernel/time.c>
    unsigned char rtc_cmos_read(unsigned char addr)
    {
        unsigned char val;
        lock_cmos_prefix( addr );
        outb_p( addr, RTC_PORT(0) );
        val = inb_p( RTC_PORT(1) );
        lock_cmos_suffix( addr );
        return val;
    }
    EXPORT_SYMBOL( rtc_cmos_read );
9Another approach
- There is an alternative to searching kernel
source files, which may well be faster
- We can use some standard command-line tools,
including objdump and grep
- In this approach, we look at the compiled
kernel's object-file, named vmlinux, found
normally in the /usr/src/linux subdirectory
- Using objdump, that file can be parsed!
10objdump can disassemble
- Change the current working directory:
    $ cd /usr/src/linux
- Then, to disassemble the vmlinux kernel file we
can use this command:
    $ objdump -d vmlinux
- But the amount of output will be huge, so it's
hard to find the part we're interested in
11grep can do filtering
- If we want to see the rtc_cmos_read code we
could use grep to eliminate irrelevant parts of
the disassembly-output:
    $ objdump -d vmlinux | grep rtc_cmos_read
- But we still see too many lines of output
(because the rtc_cmos_read() function gets
called at many places in the kernel)
12System.map
- We can use a special textfile, located in the
/boot directory, which tells us where each
exported kernel-symbol will reside at run-time
in the virtual address-space
- You can use cat to look at this textfile:
    $ cat /boot/System.map
- And you can use grep to find only the symbol
you care about:
    $ cat /boot/System.map | grep rtc_cmos_read
13Example on our machines
    $ cat /boot/System.map-2.6.22.5cslabs | grep rtc_cmos_read
    c0105574 T rtc_cmos_read
    c029b8a8 r __ksymtab_rtc_cmos_read
    c02a0bff r __kstrtab_rtc_cmos_read
- Note that the usual symbolic link is missing
from the /boot directory on our class and lab
machines, so you have to type a longer name
- With superuser privileges this could be fixed
(from within the /boot directory) using the ln
command:
    root# ln -s System.map-2.6.22.5cslabs System.map
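Each System.map line has three fields: a hexadecimal address, a one-letter type, and a symbol-name. As a rough sketch of the same lookup done in C rather than with grep, the code below parses that format; the function names match_symbol and lookup_symbol are hypothetical. Unlike grep, it matches the whole name, so the __ksymtab_ and __kstrtab_ lines are skipped.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one System.map line of the form "<hex-address> <type> <name>".
   Returns 1 and fills in *addr and *type when the name matches exactly. */
int match_symbol(const char *line, const char *wanted,
                 unsigned long *addr, char *type)
{
    unsigned long a;
    char t;
    char name[128];

    assert(line && wanted && addr && type);
    if (sscanf(line, "%lx %c %127s", &a, &t, name) != 3)
        return 0;
    if (strcmp(name, wanted) != 0)
        return 0;
    *addr = a;
    *type = t;
    return 1;
}

/* Scan a System.map file for one symbol; returns 1 if it was found. */
int lookup_symbol(const char *path, const char *wanted,
                  unsigned long *addr, char *type)
{
    char line[256];
    int found = 0;
    FILE *fp = fopen(path, "r");

    if (!fp)
        return 0;
    while (!found && fgets(line, sizeof line, fp))
        found = match_symbol(line, wanted, addr, type);
    fclose(fp);
    return found;
}
```

A call such as lookup_symbol("/boot/System.map-2.6.22.5cslabs", "rtc_cmos_read", &addr, &type) would, on the machines described above, be expected to report the address shown in the grep output.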
14Now we know where to look
- From the System.map we learn where in the
kernel our rtc_cmos_read() function will reside
- We can extract that function's code, for study
purposes, using these steps:
    - Save the complete vmlinux disassembly
    - Use grep to find its starting-address
    - Use vi to delete earlier and later instructions
15
- Step 1: saving the vmlinux disassembly
    $ objdump -d /usr/src/linux/vmlinux > /vmlinux.asm
- Step 2: finding our function's entry-point
    $ cat /vmlinux.asm | grep -n c0105574:
16What we discover
- Find the line that shows this virtual address
(with colon), and tell us which line-number it's on:
    $ cat vmlinux.asm | grep -n c0105574:
    6812:c0105574:  53   push   %ebx
- OK, here's that line; the number in front of
the first colon (6812) is its line-number
17Use a text-editor
- Remove all the lines in your vmlinux.asm
textfile whose line-numbers precede 6812
- Scroll down, to find where your function ends
(i.e., find its return-instruction, ret):
    c01055b7:  c3   ret
- Delete all the lines that follow the return
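The grep-and-vi steps can also be automated. The following is a minimal sketch, assuming the disassembly is plain objdump text and that the function ends at its first ret instruction (true for rtc_cmos_read, though not for every function). The names has_token and extract_function are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `word` appears in `line` as a whitespace-delimited token. */
static int has_token(const char *line, const char *word)
{
    char copy[512];
    char *tok;

    strncpy(copy, line, sizeof copy - 1);
    copy[sizeof copy - 1] = '\0';
    for (tok = strtok(copy, " \t\r\n"); tok; tok = strtok(NULL, " \t\r\n"))
        if (strcmp(tok, word) == 0)
            return 1;
    return 0;
}

/* Copy lines from `in` to `out`, beginning with the first line that starts
   with `start` (for example "c0105574:") and ending with the first line
   after that which contains a ret instruction.  Returns the number of
   lines copied. */
int extract_function(FILE *in, FILE *out, const char *start)
{
    char line[512];
    int copying = 0, copied = 0;
    size_t n;

    assert(in && out && start);
    n = strlen(start);
    while (fgets(line, sizeof line, in)) {
        const char *p = line;
        while (isspace((unsigned char)*p))
            p++;                          /* ignore any leading blanks */
        if (!copying && strncmp(p, start, n) == 0)
            copying = 1;                  /* found the entry-point line */
        if (copying) {
            fputs(line, out);
            copied++;
            if (has_token(line, "ret"))
                break;                    /* first return ends the copy */
        }
    }
    return copied;
}
```

Passing the saved disassembly as `in`, stdout as `out`, and "c0105574:" as `start` would print only the instructions of the function under study.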
18The complete function
c0105574 <rtc_cmos_read>:
c0105574:  53                       push   %ebx
c0105575:  9c                       pushf
c0105576:  5b                       pop    %ebx
c0105577:  fa                       cli
c0105578:  64 8b 15 08 20 30 c0     mov    %fs:0xc0302008,%edx
c010557f:  0f b6 c8                 movzbl %al,%ecx
c0105582:  42                       inc    %edx
c0105583:  c1 e2 08                 shl    $0x8,%edx
c0105586:  09 ca                    or     %ecx,%edx
c0105588:  a1 3c 99 30 c0           mov    0xc030993c,%eax
c010558d:  85 c0                    test   %eax,%eax
c010558f:  75 f7                    jne    c0105588 <rtc_cmos_read+0x14>
c0105591:  f0 0f b1 15 3c 99 30     lock cmpxchg %edx,0xc030993c
c0105598:  c0
c0105599:  85 c0                    test   %eax,%eax
c010559b:  75 eb                    jne    c0105588 <rtc_cmos_read+0x14>
c010559d:  88 c8                    mov    %cl,%al
c010559f:  e6 70                    out    %al,$0x70
c01055a1:  e6 80                    out    %al,$0x80
c01055a3:  e4 71                    in     $0x71,%al
c01055a5:  e6 80                    out    %al,$0x80
c01055a7:  c7 05 3c 99 30 c0 00     movl   $0x0,0xc030993c
c01055ae:  00 00 00
c01055b1:  53                       push   %ebx
c01055b2:  9d                       popf
c01055b3:  0f b6 c0                 movzbl %al,%eax
c01055b6:  5b                       pop    %ebx
c01055b7:  c3                       ret
19Some magic numbers
- There are some hexadecimal constants in this
code-disassembly which we probably will not
understand without more research:
    - This memory-address: 0xc030993c
    - This i/o-port address: 0x80
    - This memory-address: %fs:0xc0302008
- There's also a jump-target, but we do have some
help in deciphering what it means (it lies 0x14
bytes past the function's entry-point):
    jne c0105588 <rtc_cmos_read+0x14>
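The jump-target annotation is simple address arithmetic, and the target itself can be recomputed from the instruction-bytes 75 f7. The sketch below is only an illustration: symbol_offset and short_jump_target are hypothetical helper names, and the second one relies on the x86 rule that a two-byte short jump adds its signed 8-bit displacement to the address of the next instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Distance of a target from a function's entry-point, as objdump
   prints it in the form <symbol+0xNN>. */
uint32_t symbol_offset(uint32_t entry, uint32_t target)
{
    assert(target >= entry);
    return target - entry;
}

/* Target of a two-byte short conditional jump (opcode byte, rel8 byte):
   the sign-extended displacement is added to the next instruction's
   address, with 32-bit wraparound. */
uint32_t short_jump_target(uint32_t insn_addr, uint8_t rel8)
{
    return insn_addr + 2 + (uint32_t)(int32_t)(int8_t)rel8;
}
```

With the entry-point c0105574, symbol_offset gives 0x14 for the target c0105588, and both jne instructions in the listing (75 f7 at c010558f and 75 eb at c010559b) decode to that same target.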
20The cmpxchg instruction
- The cmpxchg instruction performs these CPU
actions in a single operation:
    cmpxchg source, destination
- The destination-operand is compared with the
accumulator-register's value, and the eflags-bits
are adjusted to reflect this comparison's result
- If ZF is set (the two values were equal), the
value of the source-operand is copied to the
destination-operand; otherwise, the
destination-operand is copied to the accumulator
register
- A lock prefix stops another CPU's bus-access
while the instruction executes
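The description of cmpxchg can be restated as ordinary C. This is only a model of the instruction's effect on its operands, written as a sketch: the struct cpu_model and the function cmpxchg_model are hypothetical names, and the C code is not atomic in the way the real instruction with a lock prefix is.

```c
#include <assert.h>
#include <stdint.h>

/* Just the two pieces of CPU state that cmpxchg touches in this model. */
struct cpu_model {
    uint32_t eax;   /* the accumulator register */
    int      zf;    /* the Zero Flag from eflags */
};

/* Model of: cmpxchg source, destination  (AT&T operand order). */
void cmpxchg_model(struct cpu_model *cpu, uint32_t source, uint32_t *destination)
{
    assert(cpu && destination);
    if (*destination == cpu->eax) {
        cpu->zf = 1;                 /* equal: store the source */
        *destination = source;
    } else {
        cpu->zf = 0;                 /* unequal: load the accumulator */
        cpu->eax = *destination;
    }
}
```

Starting with eax equal to 0 and a memory-location that holds 0, the first call stores the source value; a second call with eax still 0 fails the comparison and instead loads the location's current value into eax.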
21spinlock
Before the code's critical section we have
this:
    c0105588:  a1 3c 99 30 c0           mov    0xc030993c,%eax
    c010558d:  85 c0                    test   %eax,%eax
    c010558f:  75 f7                    jne    c0105588 <rtc_cmos_read+0x14>
    c0105591:  f0 0f b1 15 3c 99 30     lock cmpxchg %edx,0xc030993c
    c0105598:  c0
    c0105599:  85 c0                    test   %eax,%eax
    c010559b:  75 eb                    jne    c0105588 <rtc_cmos_read+0x14>
Then we have the function's critical section of
code:
    c010559d:  88 c8                    mov    %cl,%al
    c010559f:  e6 70                    out    %al,$0x70
    c01055a1:  e6 80                    out    %al,$0x80
    c01055a3:  e4 71                    in     $0x71,%al
    c01055a5:  e6 80                    out    %al,$0x80
I/O-port 0x80 has no defined system function,
so writing to it is used for a time-delay
And then after the code's critical section we
have this:
    c01055a7:  c7 05 3c 99 30 c0 00     movl   $0x0,0xc030993c
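The same lock protocol can be written in portable C11 as a rough user-space analogy. This sketch is not the kernel's code: cmos_lock_model, lock_value, lock_cmos_model, unlock_cmos_model and current_lock_value are hypothetical names, and the interrupt-disabling done by cli in the disassembly is not modeled. The lock value follows the inc, shl and or instructions: the CPU-number plus one, shifted left by 8, combined with the CMOS address, so it is never zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static atomic_uint cmos_lock_model;   /* zero means "unlocked" */

/* Value built by: inc %edx ; shl $0x8,%edx ; or %ecx,%edx */
uint32_t lock_value(uint32_t cpu_number, uint8_t addr)
{
    return ((cpu_number + 1) << 8) | addr;
}

/* Spin until the lock is seen to be free and is then claimed atomically. */
void lock_cmos_model(uint32_t cpu_number, uint8_t addr)
{
    uint32_t mine = lock_value(cpu_number, addr);

    assert(mine != 0);                /* nonzero, so it can mark "locked" */
    for (;;) {
        unsigned expected = 0;
        if (atomic_load(&cmos_lock_model) != 0)
            continue;                 /* like: mov, test, jne */
        if (atomic_compare_exchange_strong(&cmos_lock_model, &expected, mine))
            return;                   /* like: lock cmpxchg succeeded */
    }
}

/* Like: movl $0x0,cmos_lock */
void unlock_cmos_model(void)
{
    atomic_store(&cmos_lock_model, 0);
}

/* Lets a caller inspect the lock's current value. */
uint32_t current_lock_value(void)
{
    return atomic_load(&cmos_lock_model);
}
```

For CPU 1 reading CMOS location 0x0A, the lock-variable holds 0x20A while the lock is owned and returns to zero afterward.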
22The System-map again
- The System.map shows what the other mysterious
memory-addresses mean
- We see that memory-address c030993c has the label
cmos_lock (supporting our previous conclusion
about a spinlock); also we get a clue about
0xc0302008
    $ cat /boot/System.map-2.6.22.5cslabs | grep c030993c
    c030993c B cmos_lock
    $ cat /boot/System.map-2.6.22.5cslabs | grep c0302008
    c0302008 D per_cpu__cpu_number
23What is per_cpu data?
- With SMP systems there is often a need for each
CPU to have its own version of some
program-variable's value
- One example: each CPU needs a unique
identification-number (used in scheduling tasks
for load-balancing and respecting
processor-affinity, and keeping track of which
CPU now owns a particular lock)
- That's what per_cpu__cpu_number is
24Role of segmentation
- Linux has a clever way of allowing CPUs to access
their per_cpu variables using the same name for
different locations
- This can be arranged by exploiting the CPU's
memory-segmentation architecture
- The FS segment-register is used by the kernel to
reference identically-named, but differently
positioned, storage-locations
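The idea rests on one piece of arithmetic: a segmented reference adds the segment's base-address to the offset named in the instruction. The sketch below illustrates this under stated assumptions: the two base-addresses are invented purely for illustration (the real ones must be read from each CPU's GDT), and the names NCPUS, fs_base and fs_linear_address are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS 2   /* hypothetical CPU count for this model */

/* Hypothetical FS segment base-addresses, one per CPU (illustrative only). */
static const uint32_t fs_base[NCPUS] = { 0x00010000u, 0x00020000u };

/* A reference such as %fs:offset names the linear address
   base + offset, computed modulo 2^32. */
uint32_t fs_linear_address(unsigned cpu, uint32_t offset)
{
    assert(cpu < NCPUS);
    return fs_base[cpu] + offset;
}
```

The same operand, 0xc0302008, then resolves to a different linear address on each CPU, which is how one name can refer to several storage-locations.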
25Each CPU has its own GDT
- The Operating System sets up a Global Descriptor
Table for each CPU; it's an array of
memory-segment descriptors

Layout of one 64-bit segment-descriptor:
    bits 63..32:  segment-base 31..24 | G | D | segment-limit 19..16 | segment access rights | segment-base 23..16
    bits 31..0:   segment-base 15..0 | segment-limit 15..0

segment-base tells where the memory-area
begins, segment-limit tells how far the
memory-area extends, and access rights
specifies how the memory-area will be used by
the CPU (e.g., user or kernel)
26In-class exercise 1
- Install our dram.c device-driver, so you can
run our showgdt.cpp application
- You will see a CPU's memory-descriptors
(displayed as quadwords in hex format)
- You will probably see a slightly different table
when you run showgdt again, if Linux schedules
it on a different CPU
27What's in register FS?
- You can use our newinfo.cpp utility to quickly
create an LKM that displays the values in the
CPU's segment-registers

    // using global variables simplifies the inline assembly language
    short _cs, _ds, _es, _fs, _gs, _ss;   // global variables

    int my_get_info( char *buf, char **start, off_t off, int count )
    {
        int len;
        // copy two of the segment-registers into our global variables
        asm( " mov %cs, _cs \n mov %ds, _ds " );
        // format their values as four hex digits each
        len = sprintf( buf, " CS=%04X DS=%04X \n", _cs, _ds );
        return len;
    }
28In-class exercise 2
- Use the value in the FS segment-register to look
up that segment's base-address (different
base-address on different CPU)
- Convert the virtual base-address to its
corresponding physical base-address
- Use our fileview utility to look at what's
stored in physical memory at those spots
- Check the location %fs:0xc0302008
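To look up a descriptor from the value in FS, the selector must first be split into its fields. The sketch below assumes the standard x86 selector format (bits 1..0 hold the requested privilege-level, bit 2 chooses GDT or LDT, bits 15..3 hold the table index). The function names are hypothetical, and the selector values in the usage note are illustrative, not values read from our machines.

```c
#include <assert.h>
#include <stdint.h>

/* Requested privilege-level: bits 1..0 */
unsigned selector_rpl(uint16_t sel)   { return sel & 3; }

/* Table indicator: bit 2 (0 = GDT, 1 = LDT) */
unsigned selector_ti(uint16_t sel)    { return (sel >> 2) & 1; }

/* Index of the descriptor within its table: bits 15..3 */
unsigned selector_index(uint16_t sel) { return sel >> 3; }

/* Byte-offset of the selected 8-byte descriptor within the GDT. */
unsigned gdt_byte_offset(uint16_t sel)
{
    assert(selector_ti(sel) == 0);   /* this sketch handles GDT selectors only */
    return selector_index(sel) * 8;
}
```

An illustrative selector value of 0x00D8 would pick descriptor number 27, found 0xD8 bytes into the GDT; the quadword stored there holds the segment's base-address.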
29virtual-to-physical
- If a kernel virtual address is not in the high
area (i.e., if it's below 0xF8000000), then it is
easy to calculate its physical address by doing a
simple subtraction

virtual address-space:
    4GB         +--------------------+
                | High Memory Area   |
    0xF8000000  +--------------------+
                | kernel space (1GB) |
    0xC0000000  +--------------------+
                | user space (3GB)   |
    0x00000000  +--------------------+

Subtract 0xC0000000 from virtual address to get
physical address (but NOT in HMA)