Title: Advanced Character Driver Operations
Advanced Character Driver Operations
- Ted Baker, Andy Wang
- COP 5641 / CIS 4930
Topics
- Managing ioctl command numbers
- Blocking/unblocking a process
- Seeking on a device
- Access control
ioctl
- For operations beyond simple data transfers
- Eject the media
- Report error information
- Change hardware settings
- Self-destruct
- Alternatives
- Embedded commands in the data stream
- Driver-specific file systems
ioctl
- User-level interface
- int ioctl(int fd, unsigned long cmd, ...);
- The "..." denotes a variable number of arguments
- Problematic for the system call interface
- In this context, it is meant to pass a single optional argument
- Just a way to bypass the type checking
- Difficult to audit ioctl calls
- E.g., 32-bit vs. 64-bit modes
- In older 2.6 kernels, the legacy ioctl method ran under lock_kernel(), the global kernel lock
- See vfs_ioctl() in fs/ioctl.c
- Kernels from 2.6.36 on provide only unlocked_ioctl, which runs without this lock
ioctl
- Driver-level interface
- int (*ioctl) (struct inode *inode,
- struct file *filp,
- unsigned int cmd,
- unsigned long arg);
- cmd is passed from the user unchanged
- arg can be an integer or a pointer
- The compiler does not type check it
Choosing the ioctl Commands
- Need a numbering scheme to avoid mistakes
- E.g., issuing a command to the wrong device (changing the baud rate of an audio device)
- Check include/asm/ioctl.h and Documentation/ioctl/ioctl-decoding.txt
Choosing the ioctl Commands
- A command number uses four bitfields
- Defined in <linux/ioctl.h>
- <direction, type, number, size>
- direction: direction of data transfer
- _IOC_NONE
- _IOC_READ
- _IOC_WRITE
- _IOC_READ | _IOC_WRITE
Choosing the ioctl Commands
- type (ioctl device type)
- 8-bit (_IOC_TYPEBITS) magic number
- Associated with the device
- number
- 8-bit (_IOC_NRBITS) sequential number
- Unique within the device
- size: size of user data involved
- The width is either 13 or 14 bits (_IOC_SIZEBITS)
Choosing the ioctl Commands
- Useful macros to create ioctl command numbers
- _IO(type, nr)
- _IOR(type, nr, datatype)
- _IOW(type, nr, datatype)
- _IOWR(type, nr, datatype)
- Example
- cmd = _IOWR('k', 1, struct foo);
- The macros fill in size = sizeof(datatype)
Choosing the ioctl Commands
- Useful macros to decode ioctl command numbers
- _IOC_DIR(nr)
- _IOC_TYPE(nr)
- _IOC_NR(nr)
- _IOC_SIZE(nr)
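To see how the four bitfields are packed and unpacked, here is a small user-space sketch. It assumes a Linux system where <sys/ioctl.h> pulls in the generic _IO*/_IOC_* macros; struct foo, FOO_IOCXCHG, ioc_dir_name, and print_ioc_fields are hypothetical names. It decodes with the macros rather than fixed shifts because the field widths and direction values differ between architectures.

```c
#include <stdio.h>
#include <sys/ioctl.h>   /* on Linux: _IO, _IOWR, _IOC_DIR, _IOC_TYPE, ... */

/* Hypothetical payload type; it only gives the command a size field. */
struct foo {
    int a;
    int b;
};

/* Hypothetical command: magic 'k', number 1, data flows both ways. */
#define FOO_IOCXCHG _IOWR('k', 1, struct foo)

/* Name the direction bits; compare against the _IOC_* constants rather
 * than literals, since their numeric values are architecture-specific. */
const char *ioc_dir_name(unsigned int cmd)
{
    unsigned int dir = _IOC_DIR(cmd);

    if (dir == (_IOC_READ | _IOC_WRITE))
        return "read|write";
    if (dir == _IOC_READ)
        return "read";
    if (dir == _IOC_WRITE)
        return "write";
    return "none";
}

/* Decode the four bitfields packed into a command number. */
void print_ioc_fields(unsigned int cmd)
{
    printf("cmd 0x%08x: dir=%s type='%c' nr=%u size=%u\n",
           cmd, ioc_dir_name(cmd), (int)_IOC_TYPE(cmd),
           _IOC_NR(cmd), _IOC_SIZE(cmd));
}
```

Calling print_ioc_fields(FOO_IOCXCHG) shows the magic 'k', number 1, and a size equal to sizeof(struct foo), all recovered from a single integer.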
Choosing the ioctl Commands
- The scull example
- /* Use 'k' as magic number */
- #define SCULL_IOC_MAGIC 'k'
- /* Please use a different 8-bit number in your code */
- #define SCULL_IOCRESET _IO(SCULL_IOC_MAGIC, 0)
Choosing the ioctl Commands
- The scull example
- /*
- * S means "Set" through a ptr,
- * T means "Tell" directly with the argument value
- * G means "Get": reply by setting through a pointer
- * Q means "Query": response is on the return value
- * X means "eXchange": switch G and S atomically
- * H means "sHift": switch T and Q atomically
- */
- #define SCULL_IOCSQUANTUM _IOW(SCULL_IOC_MAGIC, 1, int)
- #define SCULL_IOCSQSET _IOW(SCULL_IOC_MAGIC, 2, int)
- #define SCULL_IOCTQUANTUM _IO(SCULL_IOC_MAGIC, 3)
- #define SCULL_IOCTQSET _IO(SCULL_IOC_MAGIC, 4)
- #define SCULL_IOCGQUANTUM _IOR(SCULL_IOC_MAGIC, 5, int)
- X and H set the new value and return the old value
Choosing the ioctl Commands
- The scull example
- #define SCULL_IOCGQSET _IOR(SCULL_IOC_MAGIC, 6, int)
- #define SCULL_IOCQQUANTUM _IO(SCULL_IOC_MAGIC, 7)
- #define SCULL_IOCQQSET _IO(SCULL_IOC_MAGIC, 8)
- #define SCULL_IOCXQUANTUM _IOWR(SCULL_IOC_MAGIC, 9, int)
- #define SCULL_IOCXQSET _IOWR(SCULL_IOC_MAGIC, 10, int)
- #define SCULL_IOCHQUANTUM _IO(SCULL_IOC_MAGIC, 11)
- #define SCULL_IOCHQSET _IO(SCULL_IOC_MAGIC, 12)
- #define SCULL_IOC_MAXNR 14
The Return Value
- When the command number is not supported
- Return -EINVAL
- Or -ENOTTY ("inappropriate ioctl for device"), as the POSIX standard specifies
The Predefined Commands
- Handled by the kernel first
- Will not be passed down to device drivers
- Three groups
- For any file (regular, device, FIFO, socket)
- Magic number 'T'
- For regular files only
- Specific to the file system type
Using the ioctl Argument
- If it is an integer, just use it directly
- If it is a pointer
- Need to check for a valid user address
- int access_ok(int type, const void *addr,
- unsigned long size);
- type: either VERIFY_READ or VERIFY_WRITE
- Returns 1 for success, 0 for failure
- The driver then returns -EFAULT to the caller
- Defined in <asm/uaccess.h>
- Usually called for you by the memory-access routines (copy_to_user, put_user, ...)
Using the ioctl Argument
- The scull example
- int scull_ioctl(struct inode *inode, struct file *filp,
- unsigned int cmd, unsigned long arg)
- {
- int err = 0, tmp;
- int retval = 0;
- /* check the magic number and whether the command is defined */
- if (_IOC_TYPE(cmd) != SCULL_IOC_MAGIC)
- return -ENOTTY;
- if (_IOC_NR(cmd) > SCULL_IOC_MAXNR)
- return -ENOTTY;
Using the ioctl Argument
- The scull example
- /* the concept of "read" and "write" is reversed here:
- * _IOC_READ means the user reads, so the kernel writes to user space */
- if (_IOC_DIR(cmd) & _IOC_READ)
- err = !access_ok(VERIFY_WRITE, (void __user *)arg,
- _IOC_SIZE(cmd));
- else if (_IOC_DIR(cmd) & _IOC_WRITE)
- err = !access_ok(VERIFY_READ, (void __user *)arg,
- _IOC_SIZE(cmd));
- if (err) return -EFAULT;
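The validation prologue can be exercised outside the kernel. The sketch below, assuming the same Linux <sys/ioctl.h> macros, reproduces the magic and number checks and records which access_ok() direction the driver would need; scull_check_cmd and the KERNEL_* flags are hypothetical names. Unlike the slide's else-if, it records both directions for an _IOWR command (on the kernels the slides target, the single VERIFY_WRITE check covered that case, since VERIFY_WRITE is a superset of VERIFY_READ).

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Values copied from the scull slides. */
#define SCULL_IOC_MAGIC   'k'
#define SCULL_IOC_MAXNR   14
#define SCULL_IOCSQUANTUM _IOW(SCULL_IOC_MAGIC, 1, int)
#define SCULL_IOCGQUANTUM _IOR(SCULL_IOC_MAGIC, 5, int)
#define SCULL_IOCXQUANTUM _IOWR(SCULL_IOC_MAGIC, 9, int)

/* Which way the kernel must touch the user buffer (hypothetical flags). */
#define KERNEL_READS_USER  0x1   /* would need access_ok(VERIFY_READ, ...)  */
#define KERNEL_WRITES_USER 0x2   /* would need access_ok(VERIFY_WRITE, ...) */

/* Hypothetical model of scull_ioctl()'s prologue: -ENOTTY for foreign or
 * out-of-range commands, otherwise 0 with the required access in *access.
 * _IOC_READ means the *user* reads, so the kernel must write, and vice versa. */
int scull_check_cmd(unsigned int cmd, int *access)
{
    if (_IOC_TYPE(cmd) != SCULL_IOC_MAGIC)
        return -ENOTTY;
    if (_IOC_NR(cmd) > SCULL_IOC_MAXNR)
        return -ENOTTY;

    *access = 0;
    if (_IOC_DIR(cmd) & _IOC_READ)
        *access |= KERNEL_WRITES_USER;
    if (_IOC_DIR(cmd) & _IOC_WRITE)
        *access |= KERNEL_READS_USER;
    return 0;
}
```

Rejecting foreign commands before touching the pointer means a command meant for another driver fails cleanly with ENOTTY instead of with a misleading EFAULT.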
Using the ioctl Argument
- Data transfer functions optimized for the most used data sizes (1, 2, 4, and 8 bytes)
- If the size is not one of these
- Cryptic compiler error message, e.g., "conversion to non-scalar type requested"
- Use copy_to_user and copy_from_user instead
- #include <asm/uaccess.h>
- put_user(datum, ptr)
- Writes to a user-space address
- Calls access_ok()
- Returns 0 on success, -EFAULT on error
Using the ioctl Argument
- __put_user(datum, ptr)
- Does not check access_ok()
- Can still fail if the user-space memory is not writable
- get_user(local, ptr)
- Reads from a user-space address
- Calls access_ok()
- Stores the retrieved value in local
- Returns 0 on success, -EFAULT on error
- __get_user(local, ptr)
- Does not check access_ok()
- Can still fail if the user-space memory is not readable
Capabilities and Restricted Operations
- Limit certain ioctl operations to privileged users
- See <linux/capability.h> for the full set of capabilities
- To check a certain capability, call
- int capable(int capability);
- In the scull example
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
- CAP_SYS_ADMIN: a catch-all capability for many system administration operations
The Implementation of the ioctl Commands
- A giant switch statement
- switch(cmd) {
- case SCULL_IOCRESET:
- scull_quantum = SCULL_QUANTUM;
- scull_qset = SCULL_QSET;
- break;
- case SCULL_IOCSQUANTUM: /* Set: arg points to the value */
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
- retval = __get_user(scull_quantum, (int __user *)arg);
- break;
The Implementation of the ioctl Commands
- case SCULL_IOCTQUANTUM: /* Tell: arg is the value */
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
- scull_quantum = arg;
- break;
- case SCULL_IOCGQUANTUM: /* Get: arg is pointer to result */
- retval = __put_user(scull_quantum, (int __user *)arg);
- break;
- case SCULL_IOCQQUANTUM: /* Query: return it (it is > 0) */
- return scull_quantum;
The Implementation of the ioctl Commands
- case SCULL_IOCXQUANTUM: /* eXchange: use arg as pointer */
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
- tmp = scull_quantum;
- retval = __get_user(scull_quantum, (int __user *)arg);
- if (retval == 0)
- retval = __put_user(tmp, (int __user *)arg);
- break;
The Implementation of the ioctl Commands
- case SCULL_IOCHQUANTUM: /* sHift: like Tell + Query */
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
- tmp = scull_quantum;
- scull_quantum = arg;
- return tmp;
- default: /* redundant, as cmd was checked against MAXNR */
- return -ENOTTY;
- } /* switch */
- return retval;
- } /* scull_ioctl */
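The six argument conventions of the switch can be checked with a user-space model. In this sketch, plain pointer dereferences stand in for __get_user/__put_user, the capable() checks are omitted, and model_scull_ioctl and the initial quantum value are hypothetical.

```c
#include <errno.h>
#include <sys/ioctl.h>

#define SCULL_IOC_MAGIC    'k'
#define SCULL_IOCSQUANTUM  _IOW(SCULL_IOC_MAGIC, 1, int)
#define SCULL_IOCTQUANTUM  _IO(SCULL_IOC_MAGIC, 3)
#define SCULL_IOCGQUANTUM  _IOR(SCULL_IOC_MAGIC, 5, int)
#define SCULL_IOCQQUANTUM  _IO(SCULL_IOC_MAGIC, 7)
#define SCULL_IOCXQUANTUM  _IOWR(SCULL_IOC_MAGIC, 9, int)
#define SCULL_IOCHQUANTUM  _IO(SCULL_IOC_MAGIC, 11)

int scull_quantum = 4000;   /* arbitrary initial value for the model */

/* Hypothetical model of the quantum cases of scull_ioctl(). In user space
 * arg can simply be cast back to a pointer; the kernel must use
 * __get_user/__put_user because arg points into another address space. */
long model_scull_ioctl(unsigned int cmd, unsigned long arg)
{
    int tmp;

    switch (cmd) {
    case SCULL_IOCSQUANTUM:          /* Set: arg points to the value */
        scull_quantum = *(int *)arg;
        return 0;
    case SCULL_IOCTQUANTUM:          /* Tell: arg is the value */
        scull_quantum = (int)arg;
        return 0;
    case SCULL_IOCGQUANTUM:          /* Get: arg points to the result */
        *(int *)arg = scull_quantum;
        return 0;
    case SCULL_IOCQQUANTUM:          /* Query: the return value is the data */
        return scull_quantum;
    case SCULL_IOCXQUANTUM:          /* eXchange through the pointer */
        tmp = scull_quantum;
        scull_quantum = *(int *)arg;
        *(int *)arg = tmp;
        return 0;
    case SCULL_IOCHQUANTUM:          /* sHift: Tell, then return old value */
        tmp = scull_quantum;
        scull_quantum = (int)arg;
        return tmp;
    default:
        return -ENOTTY;
    }
}
```

The Query and sHift cases return data through the return value, which is why the quantum must stay positive: a negative return would be interpreted as an error code.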
The Implementation of the ioctl Commands
- Six ways to pass and receive arguments from user space
- Need to know the command number
- int quantum;
- ioctl(fd, SCULL_IOCSQUANTUM, &quantum); /* Set by pointer */
- ioctl(fd, SCULL_IOCTQUANTUM, quantum); /* Set by value */
- ioctl(fd, SCULL_IOCGQUANTUM, &quantum); /* Get by pointer */
- quantum = ioctl(fd, SCULL_IOCQQUANTUM); /* Get by return value */
- ioctl(fd, SCULL_IOCXQUANTUM, &quantum); /* Exchange by pointer */
- quantum = ioctl(fd, SCULL_IOCHQUANTUM, quantum); /* Exchange by value */
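From the application side, the Get and Query conventions can be wrapped in a small helper. This sketch assumes the scull command numbers above; scull_read_quantum is a hypothetical name and the device path is supplied by the caller. On a device that does not implement these commands (for example /dev/null on Linux), ioctl() fails with ENOTTY, matching the return-value convention for unsupported commands.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define SCULL_IOC_MAGIC   'k'
#define SCULL_IOCGQUANTUM _IOR(SCULL_IOC_MAGIC, 5, int)
#define SCULL_IOCQQUANTUM _IO(SCULL_IOC_MAGIC, 7)

/* Hypothetical helper: read the quantum from a device node, first with
 * the "Get" convention (pointer argument), then with "Query" (return
 * value). Returns the quantum (>= 0) or a negative errno value. */
int scull_read_quantum(const char *path)
{
    int fd, quantum, ret;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    if (ioctl(fd, SCULL_IOCGQUANTUM, &quantum) == 0) {
        ret = quantum;                        /* Get by pointer */
    } else {
        ret = ioctl(fd, SCULL_IOCQQUANTUM);   /* Get by return value */
        if (ret < 0)
            ret = -errno;                     /* save errno before close() */
    }
    close(fd);
    return ret;
}
```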
Device Control Without ioctl
- Writing control sequences into the data stream itself
- Example: console escape sequences
- Advantages
- No need to implement ioctl methods
- Disadvantages
- Need to make sure that escape sequences do not appear in the normal data stream (e.g., cat a binary file)
- Need to parse the data stream
Blocking I/O
- Needed when no data is available for reads
- When the device is not ready to accept data
- Output buffer is full
Introduction to Sleeping
- A process is removed from the scheduler's run queue
- Certain rules
- Never sleep when running in an atomic context
- Multiple steps must be performed without concurrent accesses
- Not while holding a spinlock, seqlock, or RCU lock
- Not while interrupts are disabled
Introduction to Sleeping
- Okay to sleep while holding a semaphore
- Other threads waiting for the semaphore will also sleep
- Need to keep it short
- Make sure that it is not blocking the process that will wake it up
- After waking up
- Make no assumptions about the state of the system
- The resource one is waiting for might be gone again
- Must check the wait condition again
Introduction to Sleeping
- Wait queue: contains a list of processes waiting for a specific event
- #include <linux/wait.h>
- To initialize statically, call
- DECLARE_WAIT_QUEUE_HEAD(my_queue);
- To initialize dynamically, call
- wait_queue_head_t my_queue;
- init_waitqueue_head(&my_queue);
Simple Sleeping
- Call variants of the wait_event macros
- wait_event(queue, condition)
- queue: wait queue head
- Passed by value (the head itself, not a pointer), since wait_event is a macro
- Waits until the boolean condition becomes true
- Puts the process into an uninterruptible sleep
- Usually is not what you want
- wait_event_interruptible(queue, condition)
- Can be interrupted by any signal
- Returns nonzero if the sleep was interrupted
- Your driver should then return -ERESTARTSYS
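Simple Sleeping
- wait_event_killable(queue, condition)
- Can be interrupted only by fatal signals
- wait_event_timeout(queue, condition, timeout)
- Wait for a limited time (in jiffies)
- Returns the remaining jiffies if the condition became true in time
- On timeout, returns 0 (in older kernels regardless of how the condition evaluates; newer kernels return 1 if the condition is true at that point)
- wait_event_interruptible_timeout(queue, condition, timeout)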
Simple Sleeping
- To wake up, call variants of the wake_up functions
- void wake_up(wait_queue_head_t *queue);
- Wakes up all processes waiting on the queue
- void wake_up_interruptible(wait_queue_head_t *queue);
- Wakes up processes that perform an interruptible sleep
Simple Sleeping
- Example module: sleepy
- static DECLARE_WAIT_QUEUE_HEAD(wq);
- static int flag = 0;
- ssize_t sleepy_read(struct file *filp, char __user *buf,
- size_t count, loff_t *pos)
- {
- printk(KERN_DEBUG "process %i (%s) going to sleep\n",
- current->pid, current->comm);
- wait_event_interruptible(wq, flag != 0);
- flag = 0;
- printk(KERN_DEBUG "awoken %i (%s)\n",
- current->pid, current->comm);
- return 0; /* EOF */
- }
- Race: multiple threads can wake up after wait_event_interruptible() and see flag != 0 before any of them resets it
Simple Sleeping
- Example module: sleepy
- ssize_t sleepy_write(struct file *filp, const char __user *buf,
- size_t count, loff_t *pos)
- {
- printk(KERN_DEBUG "process %i (%s) awakening the readers...\n",
- current->pid, current->comm);
- flag = 1;
- wake_up_interruptible(&wq);
- return count; /* succeed, to avoid retrial */
- }
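The wait, re-check, and wake pattern of sleepy has a close user-space analogue in POSIX threads. This is a swapped-in technique, not the kernel API: a mutex plus a condition variable stand in for the wait queue, pthread_cond_broadcast() stands in for wake_up_interruptible(), and the function names are hypothetical.

```c
#include <pthread.h>

/* User-space analogue of the sleepy module's state. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wq   = PTHREAD_COND_INITIALIZER;
static int flag = 0;

/* Like sleepy_read: sleep until flag != 0, then consume it. The while
 * loop re-checks the condition after every wakeup, as wait_event does. */
void *sleepy_reader(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (flag == 0)
        pthread_cond_wait(&wq, &lock);   /* releases lock while asleep */
    flag = 0;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Like sleepy_write: set the flag and wake every waiter. */
void sleepy_writer(void)
{
    pthread_mutex_lock(&lock);
    flag = 1;
    pthread_cond_broadcast(&wq);
    pthread_mutex_unlock(&lock);
}
```

Because the flag is tested and cleared while holding the mutex, only one reader consumes each write; other woken readers find flag == 0 and wait again, which closes the race noted on the sleepy_read slide.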
Blocking and Nonblocking Operations
- By default, operations block
- If no data is available for reads
- If no space is available for writes
- Non-blocking I/O is indicated by the O_NONBLOCK flag in filp->f_flags
- Defined in <linux/fcntl.h>
- Only open, read, and write calls are affected
- Returns -EAGAIN immediately instead of blocking
- Applications need to distinguish non-blocking returns from EOFs
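The difference between a non-blocking "no data yet" and end-of-file can be seen from user space with an ordinary pipe standing in for the device. try_read is a hypothetical helper.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try one non-blocking read. Returns the byte count (> 0), 0 at
 * end-of-file, or -EAGAIN when no data is available yet; callers must
 * treat 0 and -EAGAIN differently. */
ssize_t try_read(int fd, void *buf, size_t len)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -errno;

    ssize_t n = read(fd, buf, len);
    return n < 0 ? -errno : n;
}
```

While a writer still holds the pipe open, an empty pipe yields -EAGAIN; only after every writer has closed it does read() return 0.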
A Blocking I/O Example
- scullpipe
- A read process
- Blocks when no data is available
- Wakes a blocking write when buffer space becomes available
- A write process
- Blocks when no buffer space is available
- Wakes a blocking read process when data arrives
A Blocking I/O Example
- scullpipe data structure
- struct scull_pipe {
- wait_queue_head_t inq, outq; /* read and write queues */
- char *buffer, *end; /* begin of buf, end of buf */
- int buffersize; /* used in pointer arithmetic */
- char *rp, *wp; /* where to read, where to write */
- int nreaders, nwriters; /* number of openings for r/w */
- struct fasync_struct *async_queue; /* asynchronous readers */
- struct semaphore sem; /* mutual exclusion semaphore */
- struct cdev cdev; /* Char device structure */
- };
A Blocking I/O Example
- static ssize_t scull_p_read(struct file *filp, char __user *buf,
- size_t count, loff_t *f_pos)
- {
- struct scull_pipe *dev = filp->private_data;
- if (down_interruptible(&dev->sem)) return -ERESTARTSYS;
- while (dev->rp == dev->wp) { /* nothing to read */
- up(&dev->sem); /* release the lock */
- if (filp->f_flags & O_NONBLOCK)
- return -EAGAIN;
- if (wait_event_interruptible(dev->inq, (dev->rp != dev->wp)))
- return -ERESTARTSYS;
- if (down_interruptible(&dev->sem)) return -ERESTARTSYS;
- }
A Blocking I/O Example
- if (dev->wp > dev->rp)
- count = min(count, (size_t)(dev->wp - dev->rp));
- else /* the write pointer has wrapped */
- count = min(count, (size_t)(dev->end - dev->rp));
- if (copy_to_user(buf, dev->rp, count)) {
- up(&dev->sem);
- return -EFAULT;
- }
- dev->rp += count;
- if (dev->rp == dev->end) dev->rp = dev->buffer; /* wrapped */
- up(&dev->sem);
- /* finally, awake any writers and return */
- wake_up_interruptible(&dev->outq);
- return count;
- }
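The chunking rule in scull_p_read (copy only one contiguous run per call) can be modeled in user space. struct ring and ring_read are hypothetical stand-ins for the buffer fields of struct scull_pipe, with no semaphore or wait queues.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the buffer fields of struct scull_pipe. */
struct ring {
    char *buffer, *end;   /* storage is [buffer, end) */
    char *rp, *wp;        /* read and write positions */
};

/* Same chunking as scull_p_read: copy at most one contiguous run, i.e.
 * up to wp, or up to end if the writer has wrapped. Returns the bytes
 * copied; 0 means empty (where the driver would sleep or return -EAGAIN). */
size_t ring_read(struct ring *r, char *out, size_t count)
{
    size_t avail;

    if (r->rp == r->wp)
        return 0;
    if (r->wp > r->rp)
        avail = (size_t)(r->wp - r->rp);
    else                                /* the write pointer has wrapped */
        avail = (size_t)(r->end - r->rp);
    if (count > avail)
        count = avail;

    memcpy(out, r->rp, count);
    r->rp += count;
    if (r->rp == r->end)                /* wrapped */
        r->rp = r->buffer;
    return count;
}
```

When the data wraps around the end of the buffer, a reader receives it in two short reads; like any read() caller, it must be prepared for fewer bytes than requested.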
Advanced Sleeping
- Uses low-level functions to effect a sleep
- How a process sleeps
- 1. Allocate and initialize a wait_queue_t structure (the queue element)
- DEFINE_WAIT(my_wait);
- Or
- wait_queue_t my_wait;
- init_wait(&my_wait);
Advanced Sleeping
- 2. Add to the proper wait queue and mark the process as being asleep
- TASK_RUNNING → TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE
- Call
- void prepare_to_wait(wait_queue_head_t *queue,
- wait_queue_t *wait,
- int state);
Advanced Sleeping
- 3. Give up the processor
- Double check the sleeping condition before going to sleep
- The wakeup thread might have changed the condition between steps 1 and 2
- if (/* sleeping condition */)
- schedule(); /* yield the CPU */
Advanced Sleeping
- 4. Return from sleep
- finish_wait() sets the task state back to TASK_RUNNING and removes the process from the wait queue if it is still there (e.g., when schedule() was not called)
- void finish_wait(wait_queue_head_t *queue,
- wait_queue_t *wait);
Advanced Sleeping
- scullpipe write method
- /* How much space is free? */
- static int spacefree(struct scull_pipe *dev)
- {
- if (dev->rp == dev->wp)
- return dev->buffersize - 1;
- return ((dev->rp + dev->buffersize - dev->wp)
- % dev->buffersize) - 1;
- }
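To check the arithmetic in spacefree(), here is an index-based version, with rp and wp as offsets rather than pointers; spacefree_idx is a hypothetical name. One slot is always kept empty so that rp == wp can only mean "empty", never "full".

```c
/* Free bytes in a circular buffer of `size` bytes, where rp and wp are
 * offsets in [0, size). One slot is sacrificed to tell full from empty. */
int spacefree_idx(int rp, int wp, int size)
{
    if (rp == wp)                       /* empty: all but one slot free */
        return size - 1;
    return ((rp + size - wp) % size) - 1;
}
```

For example, with size 8, rp = 3, and wp = 5, two bytes are in use, so 8 - 1 - 2 = 5 bytes are free.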
Advanced Sleeping
- static ssize_t
- scull_p_write(struct file *filp, const char __user *buf,
- size_t count, loff_t *f_pos)
- {
- struct scull_pipe *dev = filp->private_data;
- int result;
- if (down_interruptible(&dev->sem)) return -ERESTARTSYS;
- /* Wait for space for writing */
- result = scull_getwritespace(dev, filp);
- if (result)
- return result; /* scull_getwritespace called up(&dev->sem) */
- /* ok, space is there, accept something */
- count = min(count, (size_t)spacefree(dev));
Advanced Sleeping
- if (dev->wp >= dev->rp) /* includes the empty case wp == rp */
- count = min(count, (size_t)(dev->end - dev->wp));
- else /* the write pointer has wrapped, fill up to rp - 1 */
- count = min(count, (size_t)(dev->rp - dev->wp - 1));
- if (copy_from_user(dev->wp, buf, count)) {
- up(&dev->sem); return -EFAULT;
- }
- dev->wp += count;
- if (dev->wp == dev->end) dev->wp = dev->buffer; /* wrapped */
- up(&dev->sem);
- wake_up_interruptible(&dev->inq);
- if (dev->async_queue)
- kill_fasync(&dev->async_queue, SIGIO, POLL_IN);
- return count;
- }
- kill_fasync() notifies asynchronous readers who are waiting
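The write side can be modeled the same way. This sketch uses hypothetical ring fields, computes free space like spacefree() (with a pointer difference rather than rp + buffersize, which would point past the array in standard C), and applies the same two limits as scull_p_write; ring_space and ring_write are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the buffer fields of struct scull_pipe. */
struct ring {
    char *buffer, *end;   /* storage is [buffer, end) */
    char *rp, *wp;        /* read and write positions */
};

/* Equivalent of spacefree(): one slot always stays empty. */
size_t ring_space(const struct ring *r)
{
    ptrdiff_t size = r->end - r->buffer;

    if (r->rp == r->wp)
        return (size_t)(size - 1);
    return (size_t)(((r->rp - r->wp) + size) % size - 1);
}

/* Same limits as scull_p_write: no more than the free space, and no more
 * than one contiguous run (to the end, or up to rp - 1 after wrapping). */
size_t ring_write(struct ring *r, const char *in, size_t count)
{
    size_t room = ring_space(r);

    if (count > room)
        count = room;
    if (r->wp >= r->rp)                 /* fill up to the end of the buffer */
        room = (size_t)(r->end - r->wp);
    else                                /* wrapped: fill up to rp - 1 */
        room = (size_t)(r->rp - r->wp - 1);
    if (count > room)
        count = room;

    memcpy(r->wp, in, count);
    r->wp += count;
    if (r->wp == r->end)                /* wrapped */
        r->wp = r->buffer;
    return count;
}
```

The >= in the first branch matters: with an empty buffer (wp == rp) the else branch would compute rp - wp - 1 = -1, which becomes a huge size_t.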
Advanced Sleeping (Scenario 1)
- /* Wait for space for writing; caller must hold device semaphore.
- * On error the semaphore will be released before returning. */
- static int scull_getwritespace(struct scull_pipe *dev,
- struct file *filp)
- {
- while (spacefree(dev) == 0) { /* full */
- DEFINE_WAIT(wait);
- up(&dev->sem);
- if (filp->f_flags & O_NONBLOCK) return -EAGAIN;
- prepare_to_wait(&dev->outq, &wait, TASK_INTERRUPTIBLE);
- if (spacefree(dev) == 0) schedule();
- finish_wait(&dev->outq, &wait);
- if (signal_pending(current)) return -ERESTARTSYS;
- if (down_interruptible(&dev->sem)) return -ERESTARTSYS;
- }
- return 0;
- }
- Entering the loop: task state RUNNING; queue full
Advanced Sleeping (Scenario 1)
- prepare_to_wait(): task state RUNNING → INTERRUPTIBLE; queue full
Advanced Sleeping (Scenario 1)
- spacefree(dev) is still 0, so schedule() is called: task state INTERRUPTIBLE; the process sleeps until a reader frees space and wakes it
Advanced Sleeping (Scenario 2)
- Entering the loop: task state RUNNING; queue full
Advanced Sleeping (Scenario 2)
- A reader frees space and calls wake_up before prepare_to_wait(): task state stays RUNNING (the writer is not on the queue yet); queue no longer full
Advanced Sleeping (Scenario 2)
- prepare_to_wait(): task state RUNNING → INTERRUPTIBLE; queue not full
Advanced Sleeping (Scenario 2)
- spacefree(dev) != 0, so schedule() is skipped: task state INTERRUPTIBLE but no sleep; finish_wait() restores RUNNING
Advanced Sleeping (Scenario 3)
- Entering the loop: task state RUNNING; queue full
Advanced Sleeping (Scenario 3)
- prepare_to_wait(): task state RUNNING → INTERRUPTIBLE; queue full
Advanced Sleeping (Scenario 3)
- A reader frees space and calls wake_up after prepare_to_wait() but before the check: task state INTERRUPTIBLE → RUNNING; queue no longer full
Advanced Sleeping (Scenario 3)
- spacefree(dev) != 0, so schedule() is skipped: task state RUNNING; no sleep
Advanced Sleeping (Scenario 4)
- Entering the loop: task state RUNNING; queue full
Advanced Sleeping (Scenario 4)
- prepare_to_wait(): task state RUNNING → INTERRUPTIBLE; queue full
Advanced Sleeping (Scenario 4)
- spacefree(dev) is still 0, so schedule() is about to be called: task state INTERRUPTIBLE; queue full
Advanced Sleeping (Scenario 4)
- A reader frees space and calls wake_up after the check, just before or during schedule(): task state INTERRUPTIBLE → RUNNING, so the writer does not stay asleep and the wakeup is not lost; queue no longer full
More Examples of Advanced Sleeping
- See <linux/wait.h>
- Implementations of wait_event and wait_event_interruptible
Exclusive Waits
- Avoid waking up all processes waiting on a queue
- Wake up only one process
- Call
- void prepare_to_wait_exclusive(wait_queue_head_t *queue,
- wait_queue_t *wait, int state);
- Sets the WQ_FLAG_EXCLUSIVE flag
- Adds the queue entry to the end of the wait queue
- wake_up wakes all non-exclusive waiters, then stops after waking the first process with the flag set
The Details of Waking Up
- /* wakes up all non-exclusive waiters and one exclusive waiter */
- void wake_up(wait_queue_head_t *queue);
- /* same, for processes that perform an interruptible sleep */
- void wake_up_interruptible(wait_queue_head_t *queue);
- /* wake up to nr exclusive waiters */
- void wake_up_nr(wait_queue_head_t *queue, int nr);
- void wake_up_interruptible_nr(wait_queue_head_t *queue, int nr);
- /* wake up all exclusive waiters */
- void wake_up_all(wait_queue_head_t *queue);
- void wake_up_interruptible_all(wait_queue_head_t *queue);
- /* do not lose the CPU during this call */
- void wake_up_interruptible_sync(wait_queue_head_t *queue);
Ancient History: sleep_on
Testing the scullpipe Driver
- Window 1
- cat /dev/scullpipe
Testing the scullpipe Driver
- Window 1
- cat /dev/scullpipe
- Window 2
- ls -aF > /dev/scullpipe
Testing the scullpipe Driver
- Window 1
- cat /dev/scullpipe
- ./
- ../
- file1
- file2
- Window 2
- ls -aF > /dev/scullpipe
poll and select
- Nonblocking I/O often involves the use of the poll, select, and epoll system calls
- Allow a process to determine whether it can read or write open files without blocking
- Can block a process until any of a set of file descriptors becomes available for reading or writing
- select introduced in BSD Unix
- poll introduced in System V
- epoll added in 2.5.45 for better scaling
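From the application's point of view, poll() with a zero timeout reports readiness without blocking. A minimal sketch, using a pipe as the device; ready_events is a hypothetical helper.

```c
#include <poll.h>
#include <unistd.h>

/* Return the revents poll() reports for one descriptor without blocking
 * (timeout 0), or -1 on error. A timeout of -1 would instead block until
 * one of the requested events occurs. */
int ready_events(int fd, short events)
{
    struct pollfd pfd = { .fd = fd, .events = events, .revents = 0 };

    if (poll(&pfd, 1, 0) < 0)
        return -1;
    return pfd.revents;
}
```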
poll and select
- All three calls are supported through the poll method
- unsigned int (*poll) (struct file *filp,
- poll_table *wait);
- 1. Call poll_wait on one or more wait queues that could indicate a change in the poll status
- If no file descriptor is ready, the caller sleeps on these queues
- 2. Return a bit mask describing the operations that could be immediately performed without blocking
poll and select
- poll_table defined in <linux/poll.h>
- To add a wait queue into the poll_table, call
- void poll_wait(struct file *,
- wait_queue_head_t *,
- poll_table *);
- Bit mask flags defined in <linux/poll.h>
- POLLIN
- Set if the device can be read without blocking
poll and select
- POLLOUT
- Set if the device can be written without blocking
- POLLRDNORM
- Set if normal data is available for reading
- A readable device returns (POLLIN | POLLRDNORM)
- POLLWRNORM
- Same meaning as POLLOUT
- A writable device returns (POLLOUT | POLLWRNORM)
- POLLPRI
- High-priority data can be read without blocking
poll and select
- POLLHUP
- Set when a process reading the device sees end-of-file
- POLLERR
- An error condition has occurred
- POLLRDBAND
- Out-of-band data is available for reading
- Associated with sockets
- POLLWRBAND
- Data with nonzero priority can be written to the device
poll and select
- Example
- static unsigned int scull_p_poll(struct file *filp,
- poll_table *wait)
- {
- struct scull_pipe *dev = filp->private_data;
- unsigned int mask = 0;
- down(&dev->sem);
- poll_wait(filp, &dev->inq, wait);
- poll_wait(filp, &dev->outq, wait);
- if (dev->rp != dev->wp) /* circular buffer not empty */
- mask |= POLLIN | POLLRDNORM; /* readable */
- if (spacefree(dev)) /* circular buffer not full */
- mask |= POLLOUT | POLLWRNORM; /* writable */
- up(&dev->sem);
- return mask;
- }
poll and select
- scullpipe as shown has no end-of-file support
- The reader should see an end-of-file when all writers close the file
- Check dev->nwriters in read and poll
- Problem when a reader opens the scullpipe before the writer
- Need blocking within open
Interaction with read and write
- Reading from the device
- If there is data in the input buffer, return at least one byte
- poll returns POLLIN | POLLRDNORM
- If no data is available
- If O_NONBLOCK is set, return -EAGAIN
- poll must report the device unreadable until at least one byte arrives
- At the end-of-file, read returns 0 and poll returns POLLHUP
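The end-of-file rule (read returns 0, poll reports POLLHUP) can be observed on a Linux pipe once every writer has closed it. reader_at_eof is a hypothetical helper.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: report whether a reader on fd is at end-of-file,
 * first as poll() sees it (POLLHUP), then as read() sees it (0 bytes).
 * Returns 1 if both agree on EOF, 0 otherwise, -1 on error. read() is
 * only called after POLLHUP, so it cannot block on a live writer. */
int reader_at_eof(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    char c;

    if (poll(&pfd, 1, 0) < 0)
        return -1;
    if (!(pfd.revents & POLLHUP))
        return 0;
    return read(fd, &c, 1) == 0;
}
```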
Interaction with read and write
- Writing to the device
- If there is space in the output buffer, accept at least one byte
- poll reports that the device is writable by returning POLLOUT | POLLWRNORM
- If the output buffer is full, write blocks
- If O_NONBLOCK is set, write returns -EAGAIN
- poll reports that the file is not writable
- If the device is full, write returns -ENOSPC
Interaction with read and write
- In write, never wait for data transmission before returning
- Otherwise, applications that use select to avoid blocking may block anyway
- To make sure the output buffer is actually transmitted, use the fsync call
Interaction with read and write
- To flush pending output, call fsync
- int (*fsync) (struct file *file, struct dentry *dentry, int datasync);
- Should return only when the device has been completely flushed
- datasync
- Used by file systems, ignored by drivers
The Underlying Data Structure
- When the poll call completes, the poll_table is deallocated, with all wait queue entries removed
- epoll reduces the overhead of setting up and tearing down this data structure on every I/O
Asynchronous Notification
- Polling
- Inefficient for rare events
- A solution: asynchronous notification
- The application receives a signal whenever data becomes available
- Two steps
- Specify a process as the owner of the file (so that the kernel knows whom to notify)
- Set the FASYNC flag in the device via the fcntl command
Asynchronous Notification
- Example (user space)
- /* create a signal handler */
- signal(SIGIO, &input_handler);
- /* make the current pid the owner of stdin */
- fcntl(STDIN_FILENO, F_SETOWN, getpid());
- /* obtain the current file control flags */
- oflags = fcntl(STDIN_FILENO, F_GETFL);
- /* set the asynchronous flag */
- fcntl(STDIN_FILENO, F_SETFL, oflags | FASYNC);
Asynchronous Notification
- Some catches
- Not all devices support asynchronous notification
- Usually available for sockets and ttys
- When several files are enabled, the signal does not say which one has new input
- Still need to use poll or select
The Driver's Point of View
- 1. When F_SETOWN is invoked, a value is assigned to filp->f_owner
- 2. When F_SETFL is executed to change the status of FASYNC
- The driver's fasync method is called
- static int
- scull_p_fasync(int fd, struct file *filp, int mode)
- {
- struct scull_pipe *dev = filp->private_data;
- return fasync_helper(fd, filp, mode, &dev->async_queue);
- }
The Driver's Point of View
- fasync_helper adds or removes processes from the asynchronous list
- int fasync_helper(int fd, struct file *filp, int mode,
- struct fasync_struct **fa);
- 3. When data arrives, send a SIGIO signal to all processes registered for asynchronous notification
- Near the end of write, notify asynchronous readers
- if (dev->async_queue)
- kill_fasync(&dev->async_queue, SIGIO, POLL_IN);
- Similarly for read, as needed
91 The Driver's Point of View
- 4. When the file is closed, remove the file from the list of asynchronous readers in the release method
- scull_p_fasync(-1, filp, 0);
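The list that fasync_helper maintains and kill_fasync walks can be modeled in plain user-space C. This is a toy model built on several assumptions. struct toy_fasync, toy_fasync_helper and toy_kill_fasync are hypothetical stand-ins. An entry is identified by its file pointer, so the release-time call with fd = -1 still finds it. "Sending SIGIO" is reduced to counting the registered entries.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of one fasync list entry: which file asked for SIGIO. */
struct toy_fasync {
    const void *filp;          /* identifies the open file, like struct file * */
    int fd;                    /* descriptor recorded at registration time */
    struct toy_fasync *next;
};

/* mode != 0: add filp (or refresh its fd); mode == 0: remove filp.
 * Returns 0 on success, -1 if memory runs out. */
int toy_fasync_helper(int fd, const void *filp, int mode, struct toy_fasync **fa)
{
    struct toy_fasync **pp;

    for (pp = fa; *pp; pp = &(*pp)->next) {
        if ((*pp)->filp != filp)
            continue;
        if (mode) {
            (*pp)->fd = fd;            /* already registered: just update */
        } else {
            struct toy_fasync *dead = *pp;
            *pp = dead->next;          /* unlink and free the entry */
            free(dead);
        }
        return 0;
    }
    if (!mode)
        return 0;                      /* nothing to remove */

    struct toy_fasync *n = malloc(sizeof *n);
    if (!n)
        return -1;
    n->filp = filp;
    n->fd = fd;
    n->next = *fa;                     /* push on the front of the list */
    *fa = n;
    return 0;
}

/* Stand-in for kill_fasync(): visits every registered file and returns
 * how many would receive SIGIO. */
int toy_kill_fasync(struct toy_fasync **fa)
{
    int notified = 0;
    for (const struct toy_fasync *p = *fa; p; p = p->next)
        notified++;                    /* the kernel would signal p's owner here */
    return notified;
}
```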
92 The llseek Implementation
- Implements the lseek and llseek system calls
- Modifies filp->f_pos
- loff_t scull_llseek(struct file *filp, loff_t off, int whence) {
- struct scull_dev *dev = filp->private_data;
- loff_t newpos;
- switch(whence) {
- case 0: /* SEEK_SET */
- newpos = off;
- break;
- case 1: /* SEEK_CUR, relative to the current position */
- newpos = filp->f_pos + off;
- break;
93 The llseek Implementation
- case 2: /* SEEK_END, relative to the end of the file */
- newpos = dev->size + off;
- break;
- default: /* can't happen */
- return -EINVAL;
- }
- if (newpos < 0) return -EINVAL;
- filp->f_pos = newpos;
- return newpos;
- }
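The position arithmetic in scull_llseek can be checked outside the kernel. In this sketch, scull_newpos is a hypothetical user-space copy of the switch statement. pos and size stand in for filp->f_pos and dev->size, and long long stands in for loff_t. It uses the SEEK_* constants, which have the values 0, 1 and 2 on Linux, instead of the bare numbers.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/* User-space copy of scull_llseek()'s position computation.
 * Returns the new position, or -EINVAL. */
long long scull_newpos(long long pos, long long size, long long off, int whence)
{
    long long newpos;

    switch (whence) {
    case SEEK_SET: newpos = off;        break;   /* absolute */
    case SEEK_CUR: newpos = pos + off;  break;   /* relative to f_pos */
    case SEEK_END: newpos = size + off; break;   /* relative to dev->size */
    default:       return -EINVAL;
    }
    if (newpos < 0)
        return -EINVAL;          /* never move before the start */
    return newpos;               /* the driver also stores this in f_pos */
}
```

Only negative results are rejected, so a position past dev->size is accepted.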
94 The llseek Implementation
- Seeking does not make sense for serial ports and keyboard input
- Inform the kernel by calling nonseekable_open in the open method
- int nonseekable_open(struct inode *inode, struct file *filp);
- Set the llseek method in your file_operations structure to no_llseek (defined in <linux/fs.h>)
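Linux treats pipes as nonseekable, so they show what a process sees when a driver uses nonseekable_open and no_llseek: lseek() fails with ESPIPE, the error no_llseek returns as -ESPIPE. The following is a small sketch, with lseek_errno_on_pipe as an illustrative name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Returns the errno left by lseek() on a pipe's read end,
 * 0 if the seek unexpectedly succeeded, or -1 if pipe() failed. */
int lseek_errno_on_pipe(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    errno = 0;
    off_t r = lseek(p[0], 0, SEEK_SET);
    int err = (r == (off_t)-1) ? errno : 0;   /* save before close() can change errno */

    close(p[0]);
    close(p[1]);
    return err;
}
```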
95 Access Control on a Device File
- Prevents unauthorized users from using the device
- Sometimes permits only one authorized user to
open the device at a time
96 Single-Open Devices
- Example: scullsingle
- static atomic_t scull_s_available = ATOMIC_INIT(1);
- static int scull_s_open(struct inode *inode, struct file *filp) {
- struct scull_dev *dev = &scull_s_device;
- if (!atomic_dec_and_test(&scull_s_available)) {
- atomic_inc(&scull_s_available);
- return -EBUSY; /* already open */
- }
- /* then, everything else is the same as before */
- if ((filp->f_flags & O_ACCMODE) == O_WRONLY)
- scull_trim(dev);
- filp->private_data = dev;
- return 0; /* success */
- }
Note: atomic_dec_and_test() decrements the counter and returns true if the result is 0
97 Single-Open Devices
- In the release call, mark the device idle
- static int
- scull_s_release(struct inode *inode, struct file *filp) {
- atomic_inc(&scull_s_available); /* release the device */
- return 0;
- }
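The open/release protocol of scullsingle can be exercised in user space with C11 atomics. This is a model, not kernel code. atomic_int replaces atomic_t, dec_and_test is a hypothetical helper that emulates atomic_dec_and_test, and s_open/s_release drop the inode and file arguments.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* 1 = device free, 0 = in use (plays scull_s_available). */
static atomic_int s_available = 1;

/* Emulates atomic_dec_and_test(): decrement, report whether the result is 0. */
static int dec_and_test(atomic_int *v)
{
    return atomic_fetch_sub(v, 1) == 1;    /* old value 1 means new value 0 */
}

int s_open(void)
{
    if (!dec_and_test(&s_available)) {
        atomic_fetch_add(&s_available, 1); /* undo our decrement */
        return -EBUSY;                     /* already open */
    }
    return 0;                              /* we now hold the device */
}

void s_release(void)
{
    atomic_fetch_add(&s_available, 1);     /* mark the device idle */
}
```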
98 Restricting Access to a Single User (with Multiple Processes) at a Time
- Example: sculluid
- Includes the following in the open call
- spin_lock(&scull_u_lock);
- if (scull_u_count && /* someone is using the device */
- (scull_u_owner != current->uid) && /* not the same user */
- (scull_u_owner != current->euid) && /* not the same effective uid (for su) */
- !capable(CAP_DAC_OVERRIDE)) { /* not root override */
- spin_unlock(&scull_u_lock);
- return -EBUSY; /* -EPERM would confuse the user */
- }
- if (scull_u_count == 0)
- scull_u_owner = current->uid;
- scull_u_count++;
- spin_unlock(&scull_u_lock);
99 Restricting Access to a Single User (with Multiple Processes) at a Time
- Includes the following in the release call
- static int scull_u_release(struct inode *inode,
- struct file *filp) {
- spin_lock(&scull_u_lock);
- scull_u_count--; /* nothing else */
- spin_unlock(&scull_u_lock);
- return 0;
- }
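The following is a user-space model of the sculluid policy. It assumes a pthread mutex in place of the spinlock. It passes uid, euid and a root-override flag as parameters instead of reading current->uid, current->euid and capable(CAP_DAC_OVERRIDE). The names u_open and u_release are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t u_lock = PTHREAD_MUTEX_INITIALIZER;  /* plays scull_u_lock */
static int u_count = 0;                                     /* open count */
static unsigned u_owner = 0;                                /* uid holding the device */

int u_open(unsigned uid, unsigned euid, bool root_override)
{
    pthread_mutex_lock(&u_lock);
    if (u_count &&                 /* someone is using the device */
        u_owner != uid &&          /* not the same user */
        u_owner != euid &&         /* not the same effective uid */
        !root_override) {          /* no CAP_DAC_OVERRIDE */
        pthread_mutex_unlock(&u_lock);
        return -EBUSY;
    }
    if (u_count == 0)
        u_owner = uid;             /* first opener becomes the owner */
    u_count++;
    pthread_mutex_unlock(&u_lock);
    return 0;
}

void u_release(void)
{
    pthread_mutex_lock(&u_lock);
    assert(u_count > 0);           /* every release matches an open */
    u_count--;
    pthread_mutex_unlock(&u_lock);
}
```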
100 Blocking open as an Alternative to EBUSY (scullwuid)
- A user might prefer to wait rather than get errors
- E.g., data communication channel
- spin_lock(&scull_w_lock);
- while (!scull_w_available()) {
- spin_unlock(&scull_w_lock);
- if (filp->f_flags & O_NONBLOCK) return -EAGAIN;
- if (wait_event_interruptible(scull_w_wait,
- scull_w_available()))
- return -ERESTARTSYS; /* tell the fs layer to handle it */
- spin_lock(&scull_w_lock);
- }
- if (scull_w_count == 0)
- scull_w_owner = current->uid;
- scull_w_count++;
- spin_unlock(&scull_w_lock);
101 Blocking open as an Alternative to EBUSY (scullwuid)
- The release method wakes pending processes
- static int scull_w_release(struct inode *inode,
- struct file *filp) {
- int temp;
- spin_lock(&scull_w_lock);
- scull_w_count--;
- temp = scull_w_count;
- spin_unlock(&scull_w_lock);
- if (temp == 0)
- wake_up_interruptible_sync(&scull_w_wait);
- return 0;
- }
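The same wait-and-recheck loop can be modeled in user space, with a pthread condition variable standing in for the scull_w_wait queue. The names w_open, w_release and second_user are hypothetical. This model's availability test only compares the uid. The sculluid check also accepts the effective uid and CAP_DAC_OVERRIDE. The model also has no equivalent of the -ERESTARTSYS path for interrupted waits.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t w_lock = PTHREAD_MUTEX_INITIALIZER;  /* plays scull_w_lock */
static pthread_cond_t  w_wait = PTHREAD_COND_INITIALIZER;   /* plays scull_w_wait */
static int w_count = 0;
static unsigned w_owner = 0;

/* Caller must hold w_lock. */
static bool w_available(unsigned uid)
{
    return w_count == 0 || w_owner == uid;
}

int w_open(unsigned uid, bool nonblock)
{
    pthread_mutex_lock(&w_lock);
    while (!w_available(uid)) {
        if (nonblock) {                        /* O_NONBLOCK: fail instead of waiting */
            pthread_mutex_unlock(&w_lock);
            return -EAGAIN;
        }
        pthread_cond_wait(&w_wait, &w_lock);   /* sleep, then recheck the condition */
    }
    if (w_count == 0)
        w_owner = uid;
    w_count++;
    pthread_mutex_unlock(&w_lock);
    return 0;
}

void w_release(void)
{
    pthread_mutex_lock(&w_lock);
    int remaining = --w_count;
    pthread_mutex_unlock(&w_lock);
    if (remaining == 0)
        pthread_cond_broadcast(&w_wait);       /* wake every waiting user */
}

/* Demo helper: a second user (uid 2) that blocks in w_open(). */
static void *second_user(void *result)
{
    *(int *)result = w_open(2, false);
    return NULL;
}
```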
102 Blocking open as an Alternative to EBUSY
- Might not be the right semantics for interactive users
- E.g., a cp that hangs vs. one that fails right away with -EBUSY or -EPERM
- Incompatible policies for the same device
- One solution: one device node per policy
103 Cloning the Device on open
- Allows the creation of private, virtual devices
- E.g., one virtual scull device for each controlling tty, keyed by the tty's device number
- Example: scullpriv
104 Cloning the Device on open
- static int scull_c_open(struct inode *inode, struct file *filp) {
- struct scull_dev *dev;
- dev_t key;
- if (!current->signal->tty) {
- PDEBUG("Process \"%s\" has no ctl tty\n", current->comm);
- return -EINVAL;
- }
- key = tty_devnum(current->signal->tty);
- spin_lock(&scull_c_lock);
- dev = scull_c_lookfor_device(key);
- spin_unlock(&scull_c_lock);
- if (!dev) return -ENOMEM;
- ... /* then, everything else is the same as before */
105 Cloning the Device on open
- /* The clone-specific data structure includes a key field */
- struct scull_listitem {
- struct scull_dev device;
- dev_t key;
- struct list_head list;
- };
- /* The list of devices, and a lock to protect it */
- static LIST_HEAD(scull_c_list);
- static spinlock_t scull_c_lock = SPIN_LOCK_UNLOCKED;
106 Cloning the Device on open
- /* Look for a device or create one if missing */
- static struct scull_dev *scull_c_lookfor_device(dev_t key) {
- struct scull_listitem *lptr;
- list_for_each_entry(lptr, &scull_c_list, list) {
- if (lptr->key == key)
- return &(lptr->device);
- }
- /* not found */
- /* called with scull_c_lock held, so the allocation must not sleep */
- lptr = kmalloc(sizeof(struct scull_listitem), GFP_ATOMIC);
- if (!lptr) return NULL;
107 Cloning the Device on open
- /* initialize the device */
- memset(lptr, 0, sizeof(struct scull_listitem));
- lptr->key = key;
- scull_trim(&(lptr->device)); /* initialize it */
- init_MUTEX(&(lptr->device.sem));
- /* place it in the list */
- list_add(&lptr->list, &scull_c_list);
- return &(lptr->device);
- }
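The lookup-or-create logic of scull_c_lookfor_device can be sketched with an ordinary singly linked list. This sketch rests on several assumptions. struct toy_dev is a hypothetical placeholder for struct scull_dev. calloc replaces kmalloc plus memset. There is no locking, so the model is single-threaded, whereas the driver calls the real function with scull_c_lock held.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical placeholder for struct scull_dev. */
struct toy_dev {
    size_t size;               /* bytes stored; a new device starts empty */
};

/* Mirrors struct scull_listitem: the device plus the key it belongs to. */
struct toy_listitem {
    struct toy_dev device;
    dev_t key;
    struct toy_listitem *next;
};

static struct toy_listitem *toy_list = NULL;   /* plays scull_c_list */

/* Returns the device for key, creating a zeroed one if none exists yet,
 * or NULL if allocation fails. Not thread-safe: no lock is taken. */
struct toy_dev *toy_lookfor_device(dev_t key)
{
    struct toy_listitem *lptr;

    for (lptr = toy_list; lptr; lptr = lptr->next)
        if (lptr->key == key)
            return &lptr->device;              /* same tty: reuse its device */

    lptr = calloc(1, sizeof *lptr);            /* not found: allocate, zeroed */
    if (!lptr)
        return NULL;
    lptr->key = key;
    lptr->next = toy_list;                     /* add at the head, like list_add() */
    toy_list = lptr;
    return &lptr->device;
}
```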
108 What's going on?
[Figure: scull_c_list links scull_listitem entries; each entry holds a struct scull_dev device and a dev_t key]