Title: Introduction to System Calls and Kernel Modules
Simplified Organization of Linux Kernel
[Figure: layered block diagram of the Linux kernel]
- Top layer: System Call Interface
- Kernel functions: Process management, Memory management, Filesystems, Device control, Network sockets
- Supporting components: Architecture-specific details, Virtual memory subsystem, Block device drivers, Character device drivers, Network protocols, Network device drivers
- Hardware: CPU, Memory, Disk, Console, Network
System Calls
- Operating systems typically support two privilege levels:
  - User mode: applications execute at this level.
  - Supervisor mode: OS (kernel) code executes at this level.
- Applications need to call OS routines to request privileged operations.
- System calls
  - Safely transfer control from the lower privilege level (user mode) to the higher privilege level (supervisor mode).
  - Examples: open, read, write, close, wait, exec, fork, kill.
- The kernel can tightly control the entry points through which an application enters the OS.
  - An application can't jump into an arbitrary part of the OS code.
Syscall operation
- A syscall is invoked via a special CPU instruction that triggers a software trap.
  - int 0x80/lcall7/lcall27
- The process making the syscall is interrupted.
- Information needed to continue its execution later is saved.
- The processor switches to the higher privilege level.
- The processor determines the service requested by the user-mode process by examining the processor state and/or its stack.
- The processor executes the requested system call operation.
- The process making the syscall may be put to sleep if the syscall involves blocking I/O.
- When the syscall completes, the process is woken up (if needed).
- The original process state is restored.
- The processor switches back to the lower (user) privilege level.
- The process returns from the syscall and continues execution.
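The trap step above can be made concrete with a small user-space sketch. It assumes a Linux machine and GCC or Clang inline assembly. On x86-64 it uses the `syscall` instruction, which is the 64-bit way of entering the kernel, rather than the `int 0x80` trap named on the slide. On other architectures it falls back to the C library's `syscall()` function. The helper name `trap_getpid` is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall(), getpid() */

/*
 * Ask the kernel for our process ID by hand (hypothetical helper).
 * On x86-64 the syscall number goes in rax, the trap instruction is
 * `syscall`, and the result comes back in rax. The instruction
 * overwrites rcx and r11, so they are listed as clobbered.
 */
long trap_getpid(void)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"((long)SYS_getpid)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Assumption: elsewhere, let the C library issue the trap. */
    return syscall(SYS_getpid);
#endif
}
```

Calling `trap_getpid()` from a program should give the same value as the ordinary `getpid()` library call, since both end up requesting the same kernel service.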
Intermediate Library to invoke syscalls
- To make it easier to invoke system calls, OS writers normally provide a library that sits between programs and the system call interface.
  - libc, glibc, etc.
- This library provides wrapper routines.
- Wrappers hide the low-level details of:
  - Preparing arguments
  - Passing arguments to the kernel
  - Switching to supervisor mode
  - Fetching and returning results to the application
- Wrappers help reduce OS dependency and increase the portability of programs.
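As a rough illustration of what such a wrapper does, the sketch below builds a hand-written stand-in for the C library's `write()` on top of the generic `syscall()` function. It assumes Linux with glibc (or a compatible C library). `my_write` is a hypothetical name, and the real library wrapper is implemented differently.

```c
#define _GNU_SOURCE
#include <stddef.h>        /* size_t */
#include <sys/types.h>     /* ssize_t */
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>        /* syscall() */

/*
 * Hypothetical wrapper: prepares the three arguments, hands them to
 * the kernel through syscall(), and returns the result to the caller.
 * syscall() itself takes care of switching to supervisor mode.
 */
ssize_t my_write(int fd, const void *buf, size_t len)
{
    return (ssize_t)syscall(SYS_write, fd, buf, len);
}
```

A program could call `my_write(1, "hello\n", 6)` exactly as it would call `write()`. Only the wrapper has to know the syscall number and the calling convention.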
Implementing System Calls
Steps in writing a system call
- Create an entry for the system call in the kernel's sys_call_table.
  - User processes trapping to the kernel (through SYSENTER or int 0x80) find the syscall function by indexing into this table.
- Write the system call code as a kernel function.
  - Be careful when reading from or writing to user space.
  - Use the copy_to_user or copy_from_user routines.
- Generate or use a user-level system call stub.
  - The stub hides the complexity of making a system call from user applications.
Step 1: Create a sys_call_table entry
- arch/x86/ia32/ia32entry.S

    ENTRY(sys_call_table)
        ...
        .quad sys_epoll_create1
        .quad sys_dup3              /* 330 */
        .quad sys_pipe2
        .quad sys_inotify_init1
        .quad sys_foo               /* 333 */

- arch/x86/include/asm/unistd_32.h

    ...
    #define __NR_epoll_create1   329
    #define __NR_dup3            330
    #define __NR_pipe2           331
    #define __NR_inotify_init1   332
    #define __NR_foo             333

- arch/x86/include/asm/unistd_64.h (for 64-bit x86 machines)

    /*
     * This file contains the system call numbers.
     */
    ...
    #define __NR_epoll_create1   291
    __SYSCALL(__NR_epoll_create1, sys_epoll_create1)
    #define __NR_dup3            292
    __SYSCALL(__NR_dup3, sys_dup3)
    #define __NR_pipe2           293
    __SYSCALL(__NR_pipe2, sys_pipe2)
    #define __NR_inotify_init1   294
    __SYSCALL(__NR_inotify_init1, sys_inotify_init1)
    #define __NR_foo             295
    __SYSCALL(__NR_foo, sys_foo)
Step 2: Write the system call (1)
- System call with no arguments and an integer return value:

    asmlinkage int sys_foo(void)
    {
        printk(KERN_ALERT "I am foo. UID is %d\n", current->uid);
        return current->uid;
    }

- System call with one primitive argument:

    asmlinkage int sys_foo(int arg)
    {
        printk(KERN_ALERT "This is foo. Argument is %d\n", arg);
        return arg;
    }
Step 2: Write the system call (2)
- Verifying an argument passed by user space:

    asmlinkage long sys_close(unsigned int fd)
    {
        struct file *filp;
        struct files_struct *files = current->files;
        struct fdtable *fdt;

        spin_lock(&files->file_lock);
        fdt = files_fdtable(files);
        if (fd >= fdt->max_fds)    /* reject out-of-range descriptors */
            goto out_unlock;
        filp = fdt->fd[fd];
        if (!filp)                 /* reject descriptors that are not open */
            goto out_unlock;
        ...
    }

- Call-by-reference argument
  - A user-space pointer is sent as the argument.
  - Data is copied back to user space using the pointer.

    asmlinkage ssize_t sys_read(unsigned int fd,
                                char __user *buf, size_t count)
    {
        ...
        /* check that the user buffer is writable before using it */
        if (!access_ok(VERIFY_WRITE, buf, count))
            return -EFAULT;
        ...
    }
Example syscall implementation

    asmlinkage int sys_foo(void)
    {
        static int count = 0;

        printk(KERN_ALERT "Hello World! %d\n", count++);
        return -EFAULT; // what happens to this return value?
    }

    EXPORT_SYMBOL(sys_foo);
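The comment in the example asks what happens to the `-EFAULT` return value. One way to observe the answer from user space is sketched below, assuming Linux with glibc. Because `sys_foo` does not exist on a stock kernel, the sketch uses the real `uname` system call with a null pointer as a stand-in, since that call is expected to fail inside the kernel with `-EFAULT`. The helper name `failing_syscall` is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>         /* errno, EFAULT */
#include <sys/syscall.h>   /* SYS_uname */
#include <unistd.h>        /* syscall() */

/*
 * Hypothetical helper: make a system call that the kernel rejects
 * with -EFAULT and report what user space actually sees.
 * Returns the value handed back by syscall() and stores errno
 * in *saved_errno.
 */
long failing_syscall(int *saved_errno)
{
    long ret;

    errno = 0;
    /* A null buffer cannot be written by the kernel's copy_to_user. */
    ret = syscall(SYS_uname, (void *)0);
    *saved_errno = errno;
    return ret;
}
```

With glibc's `syscall()`, a negative kernel return value is not passed through as is: the library returns -1 and stores the positive error code in `errno`. A caller of the slide's `sys_foo` through `syscall()` would therefore be expected to see `ret` equal to -1 and `errno` equal to `EFAULT`.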
Step 3: Generate user-level stub. Using your new system call - the new way
- The old macros _syscall0, _syscall1, etc. are obsolete in newer kernels.
- The new way to invoke a system call is to use the syscall(...) library function.
  - Do a "man syscall" for details.
- For instance, for a no-argument system call named foo(), you'll call:

    ret = syscall(__NR_sys_foo);

  - This assumes you've defined __NR_sys_foo earlier.
- For a one-argument system call named foo(arg), you call:

    ret = syscall(__NR_sys_foo, arg);

- And so on for 2, 3, 4 arguments, etc.
- For this method, check
  - http://www.ibm.com/developerworks/linux/library/l-system-calls/
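The calling pattern above can be tried on an unmodified kernel by substituting existing system calls for `foo`. The sketch below assumes Linux with glibc. It uses `getppid` as the no-argument stand-in and `dup` as the one-argument stand-in, and both helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getppid, SYS_dup */
#include <unistd.h>        /* syscall() */

/* No-argument call, same shape as syscall(__NR_sys_foo). */
long getppid_via_syscall(void)
{
    return syscall(SYS_getppid);
}

/* One-argument call, same shape as syscall(__NR_sys_foo, arg). */
long dup_via_syscall(int fd)
{
    return syscall(SYS_dup, fd);
}
```

The first helper should agree with the library's `getppid()`. The second should return a new descriptor that refers to the same open file as `fd`, or -1 with `errno` set if `fd` is not open.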
Using your new system call - the new way (contd.)

    #include <stdio.h>
    #include <errno.h>
    #include <unistd.h>
    #include <linux/unistd.h>

    // define the new syscall number. Standard syscalls are defined in linux/unistd.h
    #define __NR_sys_foo 333

    int main(void)
    {
        int ret;

        while (1) {
            // making the system call
            ret = syscall(__NR_sys_foo);
            printf("ret = %d errno = %d\n", ret, errno);
            sleep(1);
        }
    }
Using your new system call - the old way
- You can still replicate the old _syscall0, _syscall1, etc. assembly code stubs in your user program, but this is not necessary anymore.
- These stubs use the old method of raising "int 0x80" software interrupts,
  - which are found to be quite slow on newer Pentium machines.
  - But this technique still works for backward compatibility.
- For this method, check http://www.linuxjournal.com/article/1145
Using your new system call - the old way (contd.)
- _syscall0(type, name)
  - type: type of the return value (e.g. void or int)
  - name: name of the system call (e.g. foo)
  - _syscall0(int, foo)
    - Defines the syscall entry point for asmlinkage int sys_foo(void)
- _syscall1(type, name, type1, arg1)
  - type and name: same as before
  - type1: type of the first argument
  - arg1: name of the first argument
  - _syscall1(void, foo, int, arg)
    - Defines the syscall entry point for asmlinkage void sys_foo(int arg)
- Similarly for two arguments, three arguments, and so on.
- For definitions of the _syscallN macros, check
  - include/asm/unistd.h
  - Also, pay attention to the usage and implementation of the __syscall_return macro. What does it do?
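To get a feel for how a stub-generating macro works without the obsolete kernel headers, here is a simplified sketch. It assumes Linux with glibc and builds the stubs on top of `syscall()` instead of the inline assembly used by the real `_syscallN` macros. `MY_SYSCALL0` and `MY_SYSCALL1` are hypothetical names, and the generated functions are prefixed with `my_` to avoid clashing with the C library.

```c
#define _GNU_SOURCE
#include <sys/types.h>     /* pid_t */
#include <sys/syscall.h>   /* SYS_getpid, SYS_close */
#include <unistd.h>        /* syscall() */

/* Generate a stub for a system call that takes no arguments. */
#define MY_SYSCALL0(type, name) \
    type my_##name(void) \
    { return (type)syscall(SYS_##name); }

/* Generate a stub for a system call that takes one argument. */
#define MY_SYSCALL1(type, name, type1, arg1) \
    type my_##name(type1 arg1) \
    { return (type)syscall(SYS_##name, arg1); }

MY_SYSCALL0(pid_t, getpid)         /* defines pid_t my_getpid(void) */
MY_SYSCALL1(int, close, int, fd)   /* defines int my_close(int fd)  */
```

After these two lines, `my_getpid()` and `my_close(fd)` can be called like ordinary functions. In this sketch the conversion of a negative kernel result into -1 plus `errno` is left to `syscall()`.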
Using your new system call - the old way (contd.)

    #include <stdio.h>
    #include <errno.h>
    #include <unistd.h>
    #include <linux/unistd.h>

    // define the new syscall number. Standard syscalls are defined in linux/unistd.h
    #define __NR_foo 311

    // generate a user-level stub
    _syscall0(int, foo)

    int main(void)
    {
        int ret;

        while (1) {
            // making the system call
            ret = foo();
            printf("ret = %d errno = %d\n", ret, errno);
        }
    }
SYSENTER/SYSEXIT Method
- This is the newest and fastest of all methods to make system calls on Pentium-class machines.
- Pentium machines have long supported these instructions as a faster way to enter and exit kernel mode than the old technique of raising the "int 0x80" software interrupt. Newer Linux kernels have adopted this technique.
- You can read about the details in the following links and maybe even try it out using the example code.
  - http://manugarg.googlepages.com/systemcallinlinux2_6.html
  - http://www.win.tue.nl/~aeb/linux/lk/lk-4.html
  - http://manugarg.googlepages.com/aboutelfauxiliaryvectors