Advanced Character Driver Operations

Last modified by: Andy Wang
Slides: 109
Provided by: fsu99
Learn more at: http://www.cs.fsu.edu

Transcript and Presenter's Notes


1
Advanced Character Driver Operations
  • Ted Baker, Andy Wang
  • COP 5641 / CIS 4930

2
Topics
  • Managing ioctl command numbers
  • Blocking/unblocking a process
  • Seeking on a device
  • Access control

3
ioctl
  • For operations beyond simple data transfers
  • Eject the media
  • Report error information
  • Change hardware settings
  • Self destruct
  • Alternatives
  • Embedded commands in the data stream
  • Driver-specific file systems

4
ioctl
  • User-level interface
  • int ioctl(int fd, unsigned long cmd, ...)
  • ...
  • Variable number of arguments
  • Problematic for the system call interface
  • In this context, it is meant to pass a single
    optional argument
  • Just a way to bypass the type checking
  • Difficult to audit ioctl calls
  • E.g., 32-bit vs. 64-bit modes
  • Currently uses lock_kernel(), or the global
    kernel lock
  • See vfs_ioctl() in /fs/ioctl.c

5
ioctl
  • Driver-level interface
  • int (*ioctl) (struct inode *inode,
  •               struct file *filp,
  •               unsigned int cmd,
  •               unsigned long arg);
  • cmd is passed from the user unchanged
  • arg can be an integer or a pointer
  • Compiler does not type check

6
Choosing the ioctl Commands
  • Need a numbering scheme to avoid mistakes
  • E.g., issuing a command to the wrong device
    (changing the baud rate of an audio device)
  • Check include/asm/ioctl.h and Documentation/ioctl/
    ioctl-decoding.txt

7
Choosing the ioctl Commands
  • A command number uses four bitfields
  • Defined in <linux/ioctl.h>
  • <direction, type, number, size>
  • direction: direction of data transfer
  • _IOC_NONE
  • _IOC_READ
  • _IOC_WRITE
  • _IOC_READ | _IOC_WRITE

8
Choosing the ioctl Commands
  • type (ioctl device type)
  • 8-bit (_IOC_TYPEBITS) magic number
  • Associated with the device
  • number
  • 8-bit (_IOC_NRBITS) sequential number
  • Unique within device
  • size: size of user data involved
  • The width is either 13 or 14 bits (_IOC_SIZEBITS)

9
Choosing the ioctl Commands
  • Useful macros to create ioctl command numbers
  • _IO(type, nr)
  • _IOR(type, nr, datatype)
  • _IOW(type, nr, datatype)
  • _IOWR(type, nr, datatype)
  • Example
  • cmd = _IOWR('k', 1, struct foo);

The macro will figure out that size =
sizeof(datatype)
10
Choosing the ioctl Commands
  • Useful macros to decode ioctl command numbers
  • _IOC_DIR(nr)
  • _IOC_TYPE(nr)
  • _IOC_NR(nr)
  • _IOC_SIZE(nr)
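
The encode and decode macros on the last two slides can be sketched in plain user-space C. This is a minimal illustration, not the kernel's header: every DEMO_/demo_ name is hypothetical, and the field widths (8-bit number, 8-bit type, 14-bit size, 2-bit direction) assume the common generic layout. Some architectures use 13 size bits instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; widths assume the common generic layout. */
#define DEMO_NRBITS    8
#define DEMO_TYPEBITS  8
#define DEMO_SIZEBITS  14
#define DEMO_DIRBITS   2

#define DEMO_NRSHIFT   0
#define DEMO_TYPESHIFT (DEMO_NRSHIFT + DEMO_NRBITS)
#define DEMO_SIZESHIFT (DEMO_TYPESHIFT + DEMO_TYPEBITS)
#define DEMO_DIRSHIFT  (DEMO_SIZESHIFT + DEMO_SIZEBITS)

#define DEMO_MASK(bits) ((1u << (bits)) - 1u)

enum { DEMO_NONE = 0, DEMO_WRITE = 1, DEMO_READ = 2 };

/* Pack <direction, type, number, size> into one 32-bit command number. */
unsigned int demo_ioc(unsigned int dir, unsigned int type,
                      unsigned int nr, size_t size)
{
    assert(size <= DEMO_MASK(DEMO_SIZEBITS)); /* must fit the size field */
    return (dir << DEMO_DIRSHIFT) | ((unsigned int)size << DEMO_SIZESHIFT) |
           (type << DEMO_TYPESHIFT) | (nr << DEMO_NRSHIFT);
}

/* Decoders: shift the field down, then mask off its width. */
unsigned int demo_dir(unsigned int cmd)
{
    return (cmd >> DEMO_DIRSHIFT) & DEMO_MASK(DEMO_DIRBITS);
}

unsigned int demo_type(unsigned int cmd)
{
    return (cmd >> DEMO_TYPESHIFT) & DEMO_MASK(DEMO_TYPEBITS);
}

unsigned int demo_nr(unsigned int cmd)
{
    return (cmd >> DEMO_NRSHIFT) & DEMO_MASK(DEMO_NRBITS);
}

unsigned int demo_size(unsigned int cmd)
{
    return (cmd >> DEMO_SIZESHIFT) & DEMO_MASK(DEMO_SIZEBITS);
}

struct demo_foo { int a; int b; }; /* stand-in for "struct foo" */
```

Calling demo_ioc(DEMO_READ | DEMO_WRITE, 'k', 1, sizeof(struct demo_foo)) plays the role of _IOWR('k', 1, struct foo); the four decoders then recover each field from the packed number.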

11
Choosing the ioctl Commands
  • The scull example
  • /* Use 'k' as magic number */
  • #define SCULL_IOC_MAGIC 'k'
  • /* Please use a different 8-bit number in your
    code */
  • #define SCULL_IOCRESET _IO(SCULL_IOC_MAGIC, 0)

12
Choosing the ioctl Commands
  • The scull example
  • /*
  •  * S means "Set" through a ptr,
  •  * T means "Tell" directly with the argument value
  •  * G means "Get": reply by setting through a
    pointer
  •  * Q means "Query": response is on the return
    value
  •  * X means "eXchange": switch G and S atomically
  •  * H means "sHift": switch T and Q atomically
  •  */
  • #define SCULL_IOCSQUANTUM _IOW(SCULL_IOC_MAGIC,
    1, int)
  • #define SCULL_IOCSQSET _IOW(SCULL_IOC_MAGIC, 2,
    int)
  • #define SCULL_IOCTQUANTUM _IO(SCULL_IOC_MAGIC, 3)
  • #define SCULL_IOCTQSET _IO(SCULL_IOC_MAGIC, 4)
  • #define SCULL_IOCGQUANTUM _IOR(SCULL_IOC_MAGIC,
    5, int)

Set new value and return the old value
13
Choosing the ioctl Commands
  • The scull example
  • #define SCULL_IOCGQSET _IOR(SCULL_IOC_MAGIC, 6,
    int)
  • #define SCULL_IOCQQUANTUM _IO(SCULL_IOC_MAGIC, 7)
  • #define SCULL_IOCQQSET _IO(SCULL_IOC_MAGIC, 8)
  • #define SCULL_IOCXQUANTUM _IOWR(SCULL_IOC_MAGIC,
    9, int)
  • #define SCULL_IOCXQSET _IOWR(SCULL_IOC_MAGIC,10,
    int)
  • #define SCULL_IOCHQUANTUM _IO(SCULL_IOC_MAGIC,
    11)
  • #define SCULL_IOCHQSET _IO(SCULL_IOC_MAGIC, 12)
  • #define SCULL_IOC_MAXNR 14

14
The Return Value
  • When the command number is not supported
  • Return -EINVAL
  • Or -ENOTTY (according to the POSIX standard)

15
The Predefined Commands
  • Handled by the kernel first
  • Will not be passed down to device drivers
  • Three groups
  • For any file (regular, device, FIFO, socket)
  • Magic number 'T'
  • For regular files only
  • Specific to the file system type

16
Using the ioctl Argument
  • If it is an integer, just use it directly
  • If it is a pointer
  • Need to check for valid user address
  • int access_ok(int type, const void *addr,
  •               unsigned long size);
  • type: either VERIFY_READ or VERIFY_WRITE
  • Returns 1 for success, 0 for failure
  • Driver then returns -EFAULT to the caller
  • Defined in <asm/uaccess.h>
  • Mostly called by memory-access routines

17
Using the ioctl Argument
  • The scull example
  • int scull_ioctl(struct inode *inode, struct file
    *filp,
  •                 unsigned int cmd, unsigned long
    arg) {
  •   int err = 0, tmp;
  •   int retval = 0;
  •   /* check the magic number and whether the
    command is defined */
  •   if (_IOC_TYPE(cmd) != SCULL_IOC_MAGIC)
  •     return -ENOTTY;
  •   if (_IOC_NR(cmd) > SCULL_IOC_MAXNR)
  •     return -ENOTTY;

18
Using the ioctl Argument
  • The scull example
  •   /* the concept of "read" and "write" is
    reversed here */
  •   if (_IOC_DIR(cmd) & _IOC_READ)
  •     err = !access_ok(VERIFY_WRITE, (void __user
    *) arg,
  •                      _IOC_SIZE(cmd));
  •   else if (_IOC_DIR(cmd) & _IOC_WRITE)
  •     err = !access_ok(VERIFY_READ, (void __user *)
    arg,
  •                      _IOC_SIZE(cmd));
  •   if (err) return -EFAULT;

19
Using the ioctl Argument
  • Data transfer functions optimized for most used
    data sizes (1, 2, 4, and 8 bytes)
  • If the size mismatches
  • Cryptic compiler error message
  • "Conversion to non-scalar type requested"
  • Use copy_to_user and copy_from_user
  • #include <asm/uaccess.h>
  • put_user(datum, ptr)
  • Writes to a user-space address
  • Calls access_ok()
  • Returns 0 on success, -EFAULT on error

20
Using the ioctl Argument
  • __put_user(datum, ptr)
  • Does not check access_ok()
  • Can still fail if the user-space memory is not
    writable
  • get_user(local, ptr)
  • Reads from a user-space address
  • Calls access_ok()
  • Stores the retrieved value in local
  • Returns 0 on success, -EFAULT on error
  • __get_user(local, ptr)
  • Does not check access_ok()
  • Can still fail if the user-space memory is not
    readable

21
Capabilities and Restricted Operations
  • Limit certain ioctl operations to privileged
    users
  • See <linux/capability.h> for the full set of
    capabilities
  • To check a certain capability call
  • int capable(int capability);
  • In the scull example
  • if (!capable(CAP_SYS_ADMIN))
  •   return -EPERM;

A catch-all capability for many system
administration operations
22
The Implementation of the ioctl Commands
  • A giant switch statement
  • switch(cmd) {
  • case SCULL_IOCRESET:
  •   scull_quantum = SCULL_QUANTUM;
  •   scull_qset = SCULL_QSET;
  •   break;
  • case SCULL_IOCSQUANTUM: /* Set: arg points to
    the value */
  •   if (!capable(CAP_SYS_ADMIN))
  •     return -EPERM;
  •   retval = __get_user(scull_quantum, (int
    __user *)arg);
  •   break;

23
The Implementation of the ioctl Commands
  • case SCULL_IOCTQUANTUM: /* Tell: arg is the
    value */
  •   if (!capable(CAP_SYS_ADMIN))
  •     return -EPERM;
  •   scull_quantum = arg;
  •   break;
  • case SCULL_IOCGQUANTUM: /* Get: arg is
    pointer to result */
  •   retval = __put_user(scull_quantum, (int
    __user *) arg);
  •   break;
  • case SCULL_IOCQQUANTUM: /* Query: return it
    (> 0) */
  •   return scull_quantum;

24
The Implementation of the ioctl Commands
  • case SCULL_IOCXQUANTUM: /* eXchange: use arg
    as pointer */
  •   if (!capable(CAP_SYS_ADMIN))
  •     return -EPERM;
  •   tmp = scull_quantum;
  •   retval = __get_user(scull_quantum, (int
    __user *) arg);
  •   if (retval == 0)
  •     retval = __put_user(tmp, (int __user *)
    arg);
  •   break;

25
The Implementation of the ioctl Commands
  • case SCULL_IOCHQUANTUM: /* sHift: like Tell +
    Query */
  •   if (!capable(CAP_SYS_ADMIN))
  •     return -EPERM;
  •   tmp = scull_quantum;
  •   scull_quantum = arg;
  •   return tmp;
  • default: /* redundant, as cmd was checked
    against MAXNR */
  •   return -ENOTTY;
  • } /* switch */
  • return retval;
  • } /* scull_ioctl */

26
The Implementation of the ioctl Commands
  • Six ways to pass and receive arguments from the
    user space
  • Need to know command number
  • int quantum;
  • ioctl(fd,SCULL_IOCSQUANTUM, &quantum); /* Set by
    pointer */
  • ioctl(fd,SCULL_IOCTQUANTUM, quantum); /* Set by
    value */
  • ioctl(fd,SCULL_IOCGQUANTUM, &quantum); /* Get by
    pointer */
  • quantum = ioctl(fd,SCULL_IOCQQUANTUM); /* Get by
    return value */
  • ioctl(fd,SCULL_IOCXQUANTUM, &quantum); /*
    Exchange by pointer */
  • /* Exchange by value */
  • quantum = ioctl(fd,SCULL_IOCHQUANTUM, quantum);

27
Device Control Without ioctl
  • Writing control sequences into the data stream
    itself
  • Example: console escape sequences
  • Advantages
  • No need to implement ioctl methods
  • Disadvantages
  • Need to make sure that escape sequences do not
    appear in the normal data stream (e.g., cat a
    binary file)
  • Need to parse the data stream

28
Blocking I/O
  • Needed when no data is available for reads
  • When the device is not ready to accept data
  • Output buffer is full

29
Introduction to Sleeping
30
Introduction to Sleeping
  • A process is removed from the scheduler's run
    queue
  • Certain rules
  • Never sleep when running in an atomic context
  • Multiple steps must be performed without
    concurrent accesses
  • Not while holding a spinlock, seqlock, or RCU
    lock
  • Not while disabling interrupts

31
Introduction to Sleeping
  • Okay to sleep while holding a semaphore
  • Other threads waiting for the semaphore will also
    sleep
  • Need to keep it short
  • Make sure that it is not blocking the process
    that will wake it up
  • After waking up
  • Make no assumptions about the state of the system
  • The resource one is waiting for might be gone
    again
  • Must check the wait condition again

32
Introduction to Sleeping
  • Wait queue contains a list of processes waiting
    for a specific event
  • #include <linux/wait.h>
  • To initialize statically, call
  • DECLARE_WAIT_QUEUE_HEAD(my_queue);
  • To initialize dynamically, call
  • wait_queue_head_t my_queue;
  • init_waitqueue_head(&my_queue);

33
Simple Sleeping
  • Call variants of wait_event macros
  • wait_event(queue, condition)
  • queue: wait queue head
  • Passed by value
  • Waits until the boolean condition becomes true
  • Puts into an uninterruptible sleep
  • Usually is not what you want
  • wait_event_interruptible(queue, condition)
  • Can be interrupted by any signals
  • Returns nonzero if sleep was interrupted
  • Your driver should return -ERESTARTSYS

34
Simple Sleeping
  • wait_event_killable(queue, condition)
  • Can be interrupted only by fatal signals
  • wait_event_timeout(queue, condition, timeout)
  • Wait for a limited time (in jiffies)
  • Returns 0 if the timeout expires, regardless of
    how the condition evaluates
  • wait_event_interruptible_timeout(queue,
  • condition,
  • timeout)

35
Simple Sleeping
  • To wake up, call variants of wake_up functions
  • void wake_up(wait_queue_head_t *queue);
  • Wakes up all processes waiting on the queue
  • void wake_up_interruptible(wait_queue_head_t
    *queue);
  • Wakes up processes that perform an interruptible
    sleep

36
Simple Sleeping
  • Example module sleepy
  • static DECLARE_WAIT_QUEUE_HEAD(wq);
  • static int flag = 0;
  • ssize_t sleepy_read(struct file *filp, char
    __user *buf,
  •                     size_t count, loff_t *pos) {
  •   printk(KERN_DEBUG "process %i (%s) going to
    sleep\n",
  •          current->pid, current->comm);
  •   wait_event_interruptible(wq, flag != 0);
  •   flag = 0;
  •   printk(KERN_DEBUG "awoken %i (%s)\n",
    current->pid,
  •          current->comm);
  •   return 0; /* EOF */
  • }

Multiple threads can wake up at this point
37
Simple Sleeping
  • Example module sleepy
  • ssize_t sleepy_write(struct file *filp, const
    char __user *buf,
  •                      size_t count, loff_t *pos) {
  •   printk(KERN_DEBUG "process %i (%s) awakening
    the readers...\n",
  •          current->pid, current->comm);
  •   flag = 1;
  •   wake_up_interruptible(&wq);
  •   return count; /* succeed, to avoid retrial */
  • }

38
Blocking and Nonblocking Operations
  • By default, operations block
  • If no data is available for reads
  • If no space is available for writes
  • Non-blocking I/O is indicated by the O_NONBLOCK
    flag in filp->f_flags
  • Defined in <linux/fcntl.h>
  • Only open, read, and write calls are affected
  • Return -EAGAIN immediately instead of blocking
  • Applications need to distinguish non-blocking
    returns vs. EOFs
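
From the application side, telling "no data yet" apart from end-of-file looks roughly like the sketch below. It uses an ordinary POSIX pipe rather than a device, and demo_set_nonblock, demo_try_read, and the DEMO_ result names are hypothetical helpers introduced only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

enum demo_result { DEMO_DATA, DEMO_WOULD_BLOCK, DEMO_EOF, DEMO_ERROR };

/* Add O_NONBLOCK to the file status flags; 0 on success, -1 on error. */
int demo_set_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Try to read one byte and classify the outcome. */
enum demo_result demo_try_read(int fd, char *byte)
{
    ssize_t n;

    assert(byte != NULL);
    n = read(fd, byte, 1);
    if (n == 1)
        return DEMO_DATA;
    if (n == 0)
        return DEMO_EOF;         /* all writers are gone */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return DEMO_WOULD_BLOCK; /* nothing to read yet, try again later */
    return DEMO_ERROR;
}
```

On an empty pipe with a writer still attached, demo_try_read reports DEMO_WOULD_BLOCK; once the write end is closed, the same call reports DEMO_EOF. A return of 0 from read always means end-of-file, never "try again".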

39
A Blocking I/O Example
  • scullpipe
  • A read process
  • Blocks when no data is available
  • Wakes a blocking write when buffer space becomes
    available
  • A write process
  • Blocks when no buffer space is available
  • Wakes a blocking read process when data arrives

40
A Blocking I/O Example
  • scullpipe data structure
  • struct scull_pipe {
  •   wait_queue_head_t inq, outq; /* read and write
    queues */
  •   char *buffer, *end; /* begin of buf, end of buf
    */
  •   int buffersize; /* used in pointer arithmetic
    */
  •   char *rp, *wp; /* where to read, where to write
    */
  •   int nreaders, nwriters; /* number of openings
    for r/w */
  •   struct fasync_struct *async_queue; /*
    asynchronous readers */
  •   struct semaphore sem; /* mutual exclusion
    semaphore */
  •   struct cdev cdev; /* Char device structure */
  • };

41
A Blocking I/O Example
  • static ssize_t scull_p_read(struct file *filp,
    char __user *buf,
  •                             size_t count, loff_t
    *f_pos) {
  •   struct scull_pipe *dev = filp->private_data;
  •   if (down_interruptible(&dev->sem)) return
    -ERESTARTSYS;
  •   while (dev->rp == dev->wp) { /* nothing to read
    */
  •     up(&dev->sem); /* release the lock */
  •     if (filp->f_flags & O_NONBLOCK)
  •       return -EAGAIN;
  •     if (wait_event_interruptible(dev->inq,
    (dev->rp != dev->wp)))
  •       return -ERESTARTSYS;
  •     if (down_interruptible(&dev->sem)) return
    -ERESTARTSYS;
  •   }

42
A Blocking I/O Example
  •   if (dev->wp > dev->rp)
  •     count = min(count, (size_t)(dev->wp -
    dev->rp));
  •   else /* the write pointer has wrapped */
  •     count = min(count, (size_t)(dev->end -
    dev->rp));
  •   if (copy_to_user(buf, dev->rp, count)) {
  •     up(&dev->sem);
  •     return -EFAULT;
  •   }
  •   dev->rp += count;
  •   if (dev->rp == dev->end) dev->rp = dev->buffer;
    /* wrapped */
  •   up(&dev->sem);
  •   /* finally, awake any writers and return */
  •   wake_up_interruptible(&dev->outq);
  •   return count;
  • }

43
Advanced Sleeping
44
Advanced Sleeping
  • Uses low-level functions to affect a sleep
  • How a process sleeps
  • 1. Allocate and initialize a wait_queue_t
    structure
  • DEFINE_WAIT(my_wait);
  • Or
  • wait_queue_t my_wait;
  • init_wait(&my_wait);

Queue element
45
Advanced Sleeping
  • 2. Add to the proper wait queue and mark a
    process as being asleep
  • TASK_RUNNING → TASK_INTERRUPTIBLE or
    TASK_UNINTERRUPTIBLE
  • Call
  • void prepare_to_wait(wait_queue_head_t *queue,
  •                      wait_queue_t *wait,
  •                      int state);

46
Advanced Sleeping
  • 3. Give up the processor
  • Double check the sleeping condition before going
    to sleep
  • The wakeup thread might have changed the
    condition between steps 1 and 2
  • if (/* sleeping condition */)
  •   schedule(); /* yield the CPU */

47
Advanced Sleeping
  • 4. Return from sleep
  • Remove the process from the wait queue if
    schedule() was not called
  • void finish_wait(wait_queue_head_t *queue,
  •                  wait_queue_t *wait);

48
Advanced Sleeping
  • scullpipe write method
  • /* How much space is free? */
  • static int spacefree(struct scull_pipe *dev) {
  •   if (dev->rp == dev->wp)
  •     return dev->buffersize - 1;
  •   return ((dev->rp + dev->buffersize - dev->wp)
  •           % dev->buffersize) - 1;
  • }
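
The free-space arithmetic is easier to check with integer indices than with pointers. The sketch below is a user-space restatement under that assumption; struct demo_ring, demo_spacefree, and demo_used are hypothetical names, and rp and wp are offsets into the buffer rather than char pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical index-based view of the circular buffer. */
struct demo_ring {
    size_t size; /* total slots in the buffer */
    size_t rp;   /* next slot to read         */
    size_t wp;   /* next slot to write        */
};

/* Free slots; one slot stays unused so that rp == wp means "empty". */
size_t demo_spacefree(const struct demo_ring *r)
{
    assert(r->size > 0 && r->rp < r->size && r->wp < r->size);
    if (r->rp == r->wp)
        return r->size - 1;
    return ((r->rp + r->size - r->wp) % r->size) - 1;
}

/* Slots currently holding unread data. */
size_t demo_used(const struct demo_ring *r)
{
    assert(r->size > 0 && r->rp < r->size && r->wp < r->size);
    return (r->wp + r->size - r->rp) % r->size;
}
```

For any valid state, demo_spacefree plus demo_used equals size - 1. Adding size before subtracting keeps the intermediate value non-negative when the write index has wrapped around behind the read index.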

49
Advanced Sleeping
  • static ssize_t
  • scull_p_write(struct file *filp, const char
    __user *buf,
  •               size_t count, loff_t *f_pos) {
  •   struct scull_pipe *dev = filp->private_data;
  •   int result;
  •   if (down_interruptible(&dev->sem)) return
    -ERESTARTSYS;
  •   /* Wait for space for writing */
  •   result = scull_getwritespace(dev, filp);
  •   if (result)
  •     return result; /* scull_getwritespace called
    up(&dev->sem) */
  •   /* ok, space is there, accept something */
  •   count = min(count, (size_t)spacefree(dev));

50
Advanced Sleeping
  •   if (dev->wp >= dev->rp)
  •     count = min(count, (size_t)(dev->end -
    dev->wp));
  •   else /* the write pointer has wrapped, fill up
    to rp - 1 */
  •     count = min(count, (size_t)(dev->rp - dev->wp
    - 1));
  •   if (copy_from_user(dev->wp, buf, count)) {
  •     up(&dev->sem); return -EFAULT;
  •   }
  •   dev->wp += count;
  •   if (dev->wp == dev->end) dev->wp = dev->buffer;
    /* wrapped */
  •   up(&dev->sem);
  •   wake_up_interruptible(&dev->inq);
  •   if (dev->async_queue)
  •     kill_fasync(&dev->async_queue, SIGIO,
    POLL_IN);
  •   return count;
  • }

Notify asynchronous readers who are waiting
51
Advanced Sleeping (Scenario 1)
  • /* Wait for space for writing; caller must hold
    device semaphore.
  •  * On error the semaphore will be released before
    returning. */
  • static int scull_getwritespace(struct scull_pipe
    *dev,
  •                                struct file *filp) {
  •   while (spacefree(dev) == 0) { /* full */
  •     DEFINE_WAIT(wait);
  •     up(&dev->sem);
  •     if (filp->f_flags & O_NONBLOCK) return
    -EAGAIN;
  •     prepare_to_wait(&dev->outq, &wait,
    TASK_INTERRUPTIBLE);
  •     if (spacefree(dev) == 0) schedule();
  •     finish_wait(&dev->outq, &wait);
  •     if (signal_pending(current)) return
    -ERESTARTSYS;
  •     if (down_interruptible(&dev->sem)) return
    -ERESTARTSYS;
  •   }
  •   return 0;
  • }

Task state RUNNING
Queue full
52
Advanced Sleeping (Scenario 1)

Task state RUNNING → INTERRUPTIBLE
Queue full
53
Advanced Sleeping

Task state INTERRUPTIBLE / sleep /
Queue full
54
Advanced Sleeping (Scenario 2)

Task state RUNNING
Queue full
55
Advanced Sleeping (Scenario 2)

wake up
Task state RUNNING → RUNNING
Queue !full
56
Advanced Sleeping (Scenario 2)

Task state RUNNING → INTERRUPTIBLE
Queue !full
57
Advanced Sleeping (Scenario 2)

Task state INTERRUPTIBLE / no sleep /
Queue !full
58
Advanced Sleeping (Scenario 3)

Task state RUNNING
Queue full
59
Advanced Sleeping (Scenario 3)

Task state RUNNING → INTERRUPTIBLE
Queue full
60
Advanced Sleeping (Scenario 3)

wake up
Task state INTERRUPTIBLE → RUNNING
Queue !full
61
Advanced Sleeping (Scenario 3)

Task state RUNNING / do not sleep /
Queue !full
62
Advanced Sleeping (Scenario 4)

Task state RUNNING
Queue full
63
Advanced Sleeping (Scenario 4)

Task state RUNNING → INTERRUPTIBLE
Queue full
64
Advanced Sleeping (Scenario 4)

Task state INTERRUPTIBLE
Queue full
65
Advanced Sleeping (Scenario 4)

wake up
Task state INTERRUPTIBLE → RUNNING
Queue !full
66
More Examples of Advanced Sleeping
  • See linux/wait.h
  • Implementations of wait_event, and
    wait_event_interruptible

67
Exclusive Waits
  • Avoid waking up all processes waiting on a queue
  • Wake up only one process
  • Call
  • void prepare_to_wait_exclusive(wait_queue_head_t
    *queue,
  •                                wait_queue_t
    *wait, int state);
  • Set the WQ_FLAG_EXCLUSIVE flag
  • Add the queue entry to the end of the wait queue
  • wake_up stops after waking the first process with
    the flag set
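
The effect of the exclusive flag on a wake-up pass can be modelled with a small array. This is a simplified user-space simulation, not kernel code: struct demo_waiter and demo_wake_up are hypothetical, the array stands in for the wait queue with exclusive entries already at the tail, and entries that are already awake are skipped instead of being unlinked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_waiter {
    bool exclusive; /* models WQ_FLAG_EXCLUSIVE      */
    bool awake;     /* set once the entry is woken   */
};

/*
 * Walk the queue in order and wake sleeping entries. Stop once
 * nr_exclusive exclusive waiters have been woken; passing 0 never
 * reaches the stop condition, so every waiter is woken.
 * Returns the number of entries woken by this call.
 */
size_t demo_wake_up(struct demo_waiter *q, size_t n, int nr_exclusive)
{
    size_t woken = 0;

    assert(q != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (q[i].awake)
            continue;
        q[i].awake = true;
        woken++;
        if (q[i].exclusive && --nr_exclusive == 0)
            break;
    }
    return woken;
}
```

With two ordinary waiters followed by two exclusive ones, a call with nr_exclusive set to 1 wakes the two ordinary waiters and the first exclusive waiter, then stops; the second exclusive waiter keeps sleeping until the next wake-up.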

68
The Details of Waking Up
  • /* wakes up all processes waiting on the queue */
  • void wake_up(wait_queue_head_t *queue);
  • /* wakes up processes that perform an
    interruptible sleep */
  • void wake_up_interruptible(wait_queue_head_t
    *queue);
  • /* wake up to nr exclusive waiters */
  • void wake_up_nr(wait_queue_head_t *queue, int
    nr);
  • void wake_up_interruptible_nr(wait_queue_head_t
    *queue, int nr);
  • /* wake up all exclusive waiters */
  • void wake_up_all(wait_queue_head_t *queue);
  • void wake_up_interruptible_all(wait_queue_head_t
    *queue);
  • /* do not lose the CPU during this call */
  • void wake_up_interruptible_sync(wait_queue_head_t
    *queue);

69
Ancient History: sleep_on
  • Not safe
  • Deprecated

70
Testing the scullpipe Driver
  • Window 1
  • cat /dev/scullpipe
  • Window 2

71
Testing the scullpipe Driver
  • Window 1
  • cat /dev/scullpipe
  • Window 2
  • ls -aF > /dev/scullpipe

72
Testing the scullpipe Driver
  • Window 1
  • cat /dev/scullpipe
  • ./
  • ../
  • file1
  • file2
  • Window 2
  • ls -aF > /dev/scullpipe

73
poll and select
  • Nonblocking I/Os often involve the use of poll,
    select, and epoll system calls
  • Allow a process to determine whether it can read
    or write open files without blocking
  • Can block a process until any of a set of file
    descriptors becomes available for reading or
    writing
  • select introduced in BSD Unix
  • poll introduced in System V
  • epoll added in 2.5.45 for better scaling
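
From user space, asking "can I read right now without blocking?" takes a single poll call with a zero timeout. The sketch below assumes a POSIX system and works on any descriptor, such as a pipe; demo_readable_now is a hypothetical helper name.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Returns 1 if fd can be read without blocking, 0 if not,
 * and -1 if poll itself fails.
 */
int demo_readable_now(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n;

    assert(fd >= 0);
    n = poll(&pfd, 1, 0); /* timeout 0: report status, never sleep */
    if (n < 0)
        return -1;
    return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

Passing a positive timeout, or -1 for no timeout, turns the same call into a blocking wait that returns as soon as the descriptor becomes readable.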

74
poll and select
  • All three calls supported through the poll method
  • unsigned int (*poll) (struct file *filp,
  •                       poll_table *wait);
  • 1. Call poll_wait on one or more wait queues that
    could indicate a change in the poll status
  • If no file descriptors are available, wait
  • 2. Return a bit mask describing the operations
    that could be immediately performed without
    blocking

75
poll and select
  • poll_table defined in <linux/poll.h>
  • To add a wait queue into the poll_table, call
  • void poll_wait(struct file *,
  •                wait_queue_head_t *,
  •                poll_table *);
  • Bit mask flags defined in <linux/poll.h>
  • POLLIN
  • Set if the device can be read without blocking

76
poll and select
  • POLLOUT
  • Set if the device can be written without blocking
  • POLLRDNORM
  • Set if normal data is available for reading
  • A readable device returns (POLLIN | POLLRDNORM)
  • POLLWRNORM
  • Same meaning as POLLOUT
  • A writable device returns (POLLOUT | POLLWRNORM)
  • POLLPRI
  • High-priority data can be read without blocking

77
poll and select
  • POLLHUP
  • Returns when a process reads the end-of-file
  • POLLERR
  • An error condition has occurred
  • POLLRDBAND
  • Out-of-band data is available for reading
  • Associated with sockets
  • POLLWRBAND
  • Data with nonzero priority can be written to the
    device

78
poll and select
  • Example
  • static unsigned int scull_p_poll(struct file
    *filp,
  •                                  poll_table
    *wait) {
  •   struct scull_pipe *dev = filp->private_data;
  •   unsigned int mask = 0;
  •   down(&dev->sem);
  •   poll_wait(filp, &dev->inq, wait);
  •   poll_wait(filp, &dev->outq, wait);
  •   if (dev->rp != dev->wp) /* circular buffer not
    empty */
  •     mask |= POLLIN | POLLRDNORM; /* readable */
  •   if (spacefree(dev)) /* circular buffer not full
    */
  •     mask |= POLLOUT | POLLWRNORM; /* writable */
  •   up(&dev->sem);
  •   return mask;
  • }

79
poll and select
  • No end-of-file support
  • The reader sees an end-of-file when all writers
    close the file
  • Check dev->nwriters in read and poll
  • Problem when a reader opens the scullpipe before
    the writer
  • Need blocking within open

80
Interaction with read and write
  • Reading from the device
  • If there is data in the input buffer, return at
    least one byte
  • poll returns POLLIN | POLLRDNORM
  • If no data is available
  • If O_NONBLOCK is set, return -EAGAIN
  • poll must report the device unreadable until one
    byte arrives
  • At the end-of-file, read returns 0, poll returns
    POLLHUP
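
The read-side rules can be checked from user space. This is a sketch under stated assumptions: it uses a Linux pipe instead of a scull device, and struct read_outcome, try_read, and nonblocking_read_demo are hypothetical names introduced only for illustration. From user space the error shows up as read returning -1 with errno set to EAGAIN.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* What poll() and read() reported for one attempt. */
struct read_outcome {
    ssize_t nread;   /* return value of read() */
    int err;         /* errno if read() failed, else 0 */
    short revents;   /* poll() result taken just before the read */
};

static struct read_outcome try_read(int fd)
{
    struct read_outcome out;
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    char c;

    poll(&pfd, 1, 0);                 /* timeout 0: never blocks */
    out.revents = pfd.revents;
    out.nread = read(fd, &c, 1);
    out.err = (out.nread < 0) ? errno : 0;
    return out;
}

/* Fills *empty with the result of reading an empty pipe whose writer is
 * still open, and *eof with the result after the writer has closed. */
static int nonblocking_read_demo(struct read_outcome *empty,
                                 struct read_outcome *eof)
{
    int fds[2];

    if (pipe(fds) < 0)
        return -1;
    /* Set O_NONBLOCK on the read end so read() cannot block. */
    if (fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_NONBLOCK) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    *empty = try_read(fds[0]);        /* no data, writer still open */
    close(fds[1]);                    /* last writer goes away */
    *eof = try_read(fds[0]);          /* end-of-file */
    close(fds[0]);
    return 0;
}
```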

81
Interaction with read and write
  • Writing to the device
  • If there is space in the output buffer, accept at
    least one byte
  • poll reports that the device is writable by
    returning POLLOUT | POLLWRNORM
  • If the output buffer is full, write blocks
  • If O_NONBLOCK is set, write returns -EAGAIN
  • poll reports that the file is not writable
  • If the device itself is full (no space left),
    write returns -ENOSPC
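
The write-side rules can be sketched the same way. The example below assumes a Linux pipe as a stand-in for a device with a bounded output buffer; fill_pipe_nonblocking is a hypothetical helper name.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Write to a non-blocking pipe until the kernel refuses more data.
 * Returns the number of bytes accepted, stores the errno of the failing
 * write in *last_errno, and stores in *writable_after whether poll()
 * still reports POLLOUT once the buffer is full. Returns -1 on setup
 * failure. */
static long fill_pipe_nonblocking(int *last_errno, int *writable_after)
{
    int fds[2];
    char buf[1024] = { 0 };
    long total = 0;
    struct pollfd pfd;

    if (pipe(fds) < 0)
        return -1;
    if (fcntl(fds[1], F_SETFL, fcntl(fds[1], F_GETFL) | O_NONBLOCK) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    for (;;) {
        ssize_t n = write(fds[1], buf, sizeof buf);
        if (n < 0) {                  /* buffer full: no blocking allowed */
            *last_errno = errno;
            break;
        }
        total += n;
    }
    pfd.fd = fds[1];
    pfd.events = POLLOUT;
    pfd.revents = 0;
    poll(&pfd, 1, 0);
    *writable_after = (pfd.revents & POLLOUT) != 0;
    close(fds[0]);
    close(fds[1]);
    return total;
}
```

Nobody reads from the pipe, so the loop ends as soon as the buffer is full; at that point write fails with EAGAIN and poll no longer reports the descriptor as writable.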

82
Interaction with read and write
  • In write, never wait for data transmission before
    returning
  • Otherwise, a write issued after select reports the
    device writable could still block
  • To make sure the output buffer is actually
    transmitted, use the fsync call

83
Interaction with read and write
  • To flush pending output, call fsync
  • int (*fsync) (struct file *file, struct dentry *dentry, int datasync);
  • Should return only when the device has been
    completely flushed
  • datasync
  • Used by file systems, ignored by drivers

84
The Underlying Data Structure
85
The Underlying Data Structure
  • When the poll call completes, poll_table is
    deallocated with all wait queue entries removed
  • epoll reduces this overhead of setting up and
    tearing down the data structure between every I/O

86
Asynchronous Notification
  • Polling
  • Inefficient for rare events
  • A solution: asynchronous notification
  • Application receives a signal whenever data
    becomes available
  • Two steps
  • Specify a process as the owner of the file (so
    that the kernel knows whom to notify)
  • Set the FASYNC flag on the device via the fcntl
    F_SETFL command

87
Asynchronous Notification
  • Example (user space)
  • /* install a signal handler */
  • signal(SIGIO, &input_handler);
  • /* make the current pid the owner of stdin */
  • fcntl(STDIN_FILENO, F_SETOWN, getpid());
  • /* obtain the current file control flags */
  • oflags = fcntl(STDIN_FILENO, F_GETFL);
  • /* set the asynchronous flag */
  • fcntl(STDIN_FILENO, F_SETFL, oflags | FASYNC);

88
Asynchronous Notification
  • Some catches
  • Not all devices support asynchronous notification
  • Usually available for sockets and ttys
  • Need to know which input file to process
  • Still need to use poll or select

89
The Driver's Point of View
  • 1. When F_SETOWN is invoked, a value is assigned
    to filp->f_owner
  • 2. When F_SETFL is executed to change the status
    of FASYNC
  • The driver's fasync method is called
  • static int scull_p_fasync(int fd, struct file *filp, int mode)
  • {
  •     struct scull_pipe *dev = filp->private_data;
  •     return fasync_helper(fd, filp, mode, &dev->async_queue);
  • }

90
The Driver's Point of View
  • fasync_helper adds or removes processes from the
    asynchronous list
  • int fasync_helper(int fd, struct file *filp, int mode,
                      struct fasync_struct **fa);
  • 3. When data arrives, send a SIGIO signal to all
    processes registered for asynchronous
    notification
  • Near the end of write, notify the registered
    readers
  • if (dev->async_queue)
  •     kill_fasync(&dev->async_queue, SIGIO, POLL_IN);
  • Similarly for read, as needed

91
The Driver's Point of View
  • 4. When the file is closed, remove the file from
    the list of asynchronous readers in the release
    method
  • scull_p_fasync(-1, filp, 0);

92
The llseek Implementation
  • Implements the lseek and llseek system calls
  • Modifies filp->f_pos
  • loff_t scull_llseek(struct file *filp, loff_t off, int whence)
  • {
  •     struct scull_dev *dev = filp->private_data;
  •     loff_t newpos;
  •     switch (whence) {
  •     case 0: /* SEEK_SET */
  •         newpos = off;
  •         break;
  •     case 1: /* SEEK_CUR, relative to the current position */
  •         newpos = filp->f_pos + off;
  •         break;

93
The llseek Implementation
  •     case 2: /* SEEK_END, relative to the end of the file */
  •         newpos = dev->size + off;
  •         break;
  •     default: /* can't happen */
  •         return -EINVAL;
  •     }
  •     if (newpos < 0) return -EINVAL;
  •     filp->f_pos = newpos;
  •     return newpos;
  • }
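
The position arithmetic can be tried outside the kernel. This is a hypothetical user-space model, not the scull code: model_llseek is an invented name, *cur stands in for filp->f_pos, and size stands in for dev->size.

```c
#include <errno.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Compute the new position for a seek request and store it in *cur.
 * Returns the new position, or -EINVAL if whence is unknown or the
 * result would be negative (in which case *cur is left unchanged). */
static long long model_llseek(long long *cur, long long size,
                              long long off, int whence)
{
    long long newpos;

    switch (whence) {
    case SEEK_SET:               /* absolute position */
        newpos = off;
        break;
    case SEEK_CUR:               /* relative to the current position */
        newpos = *cur + off;
        break;
    case SEEK_END:               /* relative to the end of the data */
        newpos = size + off;
        break;
    default:
        return -EINVAL;
    }
    if (newpos < 0)
        return -EINVAL;
    *cur = newpos;
    return newpos;
}
```

For example, with a size of 100, seeking to offset -20 from SEEK_END yields position 80, and a request that would land before offset 0 is rejected without moving the position.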

94
The llseek Implementation
  • Does not make sense for serial ports and keyboard
    inputs
  • Inform the kernel by calling nonseekable_open in
    the open method
  • int nonseekable_open(struct inode *inode, struct file *filp);
  • Replace the llseek method with no_llseek (defined
    in <linux/fs.h>) in your file_operations structure

95
Access Control on a Device File
  • Prevents unauthorized users from using the device
  • Sometimes permits only one authorized user to
    open the device at a time

96
Single-Open Devices
  • Example: scullsingle
  • static atomic_t scull_s_available = ATOMIC_INIT(1);
  • static int scull_s_open(struct inode *inode, struct file *filp)
  • {
  •     struct scull_dev *dev = &scull_s_device;
  •     if (!atomic_dec_and_test(&scull_s_available)) {
  •         atomic_inc(&scull_s_available);
  •         return -EBUSY; /* already open */
  •     }
  •     /* then, everything else is the same as before */
  •     if ((filp->f_flags & O_ACCMODE) == O_WRONLY) scull_trim(dev);
  •     filp->private_data = dev;
  •     return 0; /* success */
  • }
  • Note: atomic_dec_and_test returns true if the
    value is 0 after the decrement

97
Single-Open Devices
  • In the release call, mark the device idle
  • static int scull_s_release(struct inode *inode, struct file *filp)
  • {
  •     atomic_inc(&scull_s_available); /* release the device */
  •     return 0;
  • }
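
The open/release pairing can be modeled in user space with C11 atomics. This is an illustrative sketch, not kernel code: dec_and_test imitates the kernel's atomic_dec_and_test, and single_open, single_release, and available are hypothetical names.

```c
#include <errno.h>
#include <stdatomic.h>

/* 1 means the device is free, 0 means it is open. */
static atomic_int available = 1;

/* Decrement *v atomically and report whether the result is zero. */
static int dec_and_test(atomic_int *v)
{
    return atomic_fetch_sub(v, 1) - 1 == 0;
}

/* Returns 0 for the first opener, -EBUSY while the device is held. */
static int single_open(void)
{
    if (!dec_and_test(&available)) {
        atomic_fetch_add(&available, 1);   /* undo our decrement */
        return -EBUSY;
    }
    return 0;
}

/* Mark the device idle again. */
static void single_release(void)
{
    atomic_fetch_add(&available, 1);
}
```

A failed opener always restores the counter, so any number of rejected attempts leaves the counter at 0 until the holder calls single_release.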

98
Restricting Access to a Single User (with
Multiple Processes) at a Time
  • Example: sculluid
  • Includes the following in the open call
  • spin_lock(&scull_u_lock);
  • if (scull_u_count && /* someone is using the device */
  •     (scull_u_owner != current->uid) && /* not the same user */
  •     (scull_u_owner != current->euid) && /* not the same effective uid (for su) */
  •     !capable(CAP_DAC_OVERRIDE)) { /* not root override */
  •     spin_unlock(&scull_u_lock);
  •     return -EBUSY; /* -EPERM would confuse the user */
  • }
  • if (scull_u_count == 0) scull_u_owner = current->uid;
  • scull_u_count++;
  • spin_unlock(&scull_u_lock);

99
Restricting Access to a Single User (with
Multiple Processes) at a Time
  • Includes the following in the release call
  • static int scull_u_release(struct inode *inode, struct file *filp)
  • {
  •     spin_lock(&scull_u_lock);
  •     scull_u_count--; /* nothing else */
  •     spin_unlock(&scull_u_lock);
  •     return 0;
  • }
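
The ownership policy can be expressed as a small user-space model. This is a single-threaded sketch with the spinlock omitted; struct uid_gate, uid_gate_open, and uid_gate_release are hypothetical names, and the privileged argument stands in for capable(CAP_DAC_OVERRIDE).

```c
#include <errno.h>
#include <stdbool.h>

/* Open count and the uid of the user who currently owns the device. */
struct uid_gate {
    int count;
    unsigned owner;
};

/* Admit the caller if the device is idle, if the caller's uid or euid
 * matches the owner, or if the caller is privileged; otherwise -EBUSY. */
static int uid_gate_open(struct uid_gate *g, unsigned uid, unsigned euid,
                         bool privileged)
{
    if (g->count &&               /* someone is using the device */
        g->owner != uid &&        /* not the same user */
        g->owner != euid &&       /* not the same effective uid */
        !privileged)              /* no override */
        return -EBUSY;
    if (g->count == 0)
        g->owner = uid;           /* first opener becomes the owner */
    g->count++;
    return 0;
}

static void uid_gate_release(struct uid_gate *g)
{
    g->count--;                   /* owner is reassigned on next open */
}
```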

100
Blocking open as an Alternative to EBUSY
(scullwuid)
  • A user might prefer to wait over getting errors
  • E.g., data communication channel
  • spin_lock(&scull_w_lock);
  • while (!scull_w_available()) {
  •     spin_unlock(&scull_w_lock);
  •     if (filp->f_flags & O_NONBLOCK) return -EAGAIN;
  •     if (wait_event_interruptible(scull_w_wait, scull_w_available()))
  •         return -ERESTARTSYS; /* tell the fs layer to handle it */
  •     spin_lock(&scull_w_lock);
  • }
  • if (scull_w_count == 0) scull_w_owner = current->uid;
  • scull_w_count++;
  • spin_unlock(&scull_w_lock);

101
Blocking open as an Alternative to EBUSY
(scullwuid)
  • The release method wakes pending processes
  • static int scull_w_release(struct inode *inode, struct file *filp)
  • {
  •     int temp;
  •     spin_lock(&scull_w_lock);
  •     scull_w_count--;
  •     temp = scull_w_count;
  •     spin_unlock(&scull_w_lock);
  •     if (temp == 0)
  •         wake_up_interruptible_sync(&scull_w_wait);
  •     return 0;
  • }

102
Blocking open as an Alternative to EBUSY
  • Might not be the right semantics for interactive
    users
  • Blocking on cp vs. getting a return value of
    -EBUSY or -EPERM
  • Incompatible policies for the same device
  • One solution: one device node per policy

103
Cloning the Device on open
  • Allows the creation of private, virtual devices
  • E.g., one virtual scull device for each
    controlling tty, keyed by the tty device number
  • Example: scullpriv

104
Cloning the Device on open
  • static int scull_c_open(struct inode *inode, struct file *filp)
  • {
  •     struct scull_dev *dev;
  •     dev_t key;
  •     if (!current->signal->tty) {
  •         PDEBUG("Process \"%s\" has no ctl tty\n", current->comm);
  •         return -EINVAL;
  •     }
  •     key = tty_devnum(current->signal->tty);
  •     spin_lock(&scull_c_lock);
  •     dev = scull_c_lookfor_device(key);
  •     spin_unlock(&scull_c_lock);
  •     if (!dev) return -ENOMEM;
  •     ... /* then, everything else is the same as before */
  • }

105
Cloning the Device on open
  • /* The clone-specific data structure includes a key field */
  • struct scull_listitem {
  •     struct scull_dev device;
  •     dev_t key;
  •     struct list_head list;
  • };
  • /* The list of devices, and a lock to protect it */
  • static LIST_HEAD(scull_c_list);
  • static spinlock_t scull_c_lock = SPIN_LOCK_UNLOCKED;

106
Cloning the Device on open
  • /* Look for a device or create one if missing */
  • static struct scull_dev *scull_c_lookfor_device(dev_t key)
  • {
  •     struct scull_listitem *lptr;
  •     list_for_each_entry(lptr, &scull_c_list, list) {
  •         if (lptr->key == key)
  •             return &(lptr->device);
  •     }
  •     /* not found */
  •     lptr = kmalloc(sizeof(struct scull_listitem), GFP_KERNEL);
  •     if (!lptr) return NULL;

107
Cloning the Device on open
  •     /* initialize the device */
  •     memset(lptr, 0, sizeof(struct scull_listitem));
  •     lptr->key = key;
  •     scull_trim(&(lptr->device)); /* initialize it */
  •     init_MUTEX(&(lptr->device.sem));
  •     /* place it in the list */
  •     list_add(&lptr->list, &scull_c_list);
  •     return &(lptr->device);
  • }
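
The lookup-or-create pattern can be sketched in user space with a plain singly linked list. The names below (struct item, head, lookfor_device) are hypothetical; an int stands in for struct scull_dev, calloc replaces kmalloc plus memset, and locking is omitted.

```c
#include <stdlib.h>

/* One list entry per key; `device` stands in for struct scull_dev. */
struct item {
    unsigned key;
    int device;
    struct item *next;
};

static struct item *head = NULL;

/* Return the device for `key`, creating a zeroed one if none exists.
 * Returns NULL only if allocation fails. */
static int *lookfor_device(unsigned key)
{
    struct item *p;

    for (p = head; p; p = p->next)
        if (p->key == key)
            return &p->device;     /* found an existing clone */

    p = calloc(1, sizeof *p);      /* not found: allocate and zero */
    if (!p)
        return NULL;
    p->key = key;
    p->next = head;                /* place it at the front of the list */
    head = p;
    return &p->device;
}
```

Two lookups with the same key return the same device, so state written through one is visible through the other, while different keys get independent devices. Entries are never freed in this sketch.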

108
What's going on?
  • [Figure: scull_c_list links scull_listitem entries,
    each holding a struct scull_dev device and a
    dev_t key]