Title: UNIX
UNIX: The Kernel
UNIX Internals Motivations
- Knowledge of UNIX internals helps in
  - understanding similar systems (for example, Windows NT, Linux)
  - designing high-performance UNIX applications
WHAT IS THE KERNEL?
- The part of the UNIX OS that contains code for
  - controlling the execution of processes (creation, termination, suspension, communication)
  - scheduling processes fairly for execution on the CPU
  - allocating main memory for executing processes
  - allocating secondary memory for efficient storage and retrieval of user data
  - handling peripherals such as terminals, tape drives, disk drives and network devices
Kernel Characteristics
- The kernel is loaded into memory and runs until the system is turned off or crashes.
- It is mostly written in C, with some assembly language for efficiency reasons.
- User programs make use of kernel services via the system call interface.
- It provides its services transparently.
Kernel Subsystems
- File system
  - Directory hierarchy, regular files, peripherals
  - Multiple file systems
- Process management
  - How processes share CPU, memory and signals
- Input/Output
  - How processes access files, terminal I/O
- Interprocess Communication
- Memory management
- System V and BSD implement several of these subsystems differently.
TALKING TO THE KERNEL
- Processes access kernel facilities via system calls.
- Peripherals communicate with the kernel via hardware interrupts.
EXECUTION IN USER MODE AND KERNEL MODE
- The kernel contains several data structures needed for implementing kernel services. These structures include
  - the process table, which contains an entry for every process in the system
  - the open-file table, which contains at least one entry for every open file in the system
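In UNIX, an open-file table entry records the current file offset. The sketch below relies on that fact and assumes a POSIX system. Two descriptors created with `dup()` point at the same entry, so they share one offset. A second `open()` of the same file creates a new entry with its own offset.

```python
import os
import tempfile

def shared_offset_demo():
    """Return (offset seen via a dup'd fd, offset seen via a fresh open)."""
    fd, path = tempfile.mkstemp()
    try:
        fd2 = os.dup(fd)                  # new descriptor, same open-file entry
        os.write(fd, b"hello")            # advances the shared offset by 5
        via_dup = os.lseek(fd2, 0, os.SEEK_CUR)
        fd3 = os.open(path, os.O_RDONLY)  # separate open() -> separate entry
        via_fresh = os.lseek(fd3, 0, os.SEEK_CUR)
        os.close(fd2)
        os.close(fd3)
        return via_dup, via_fresh
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling `shared_offset_demo()` should return `(5, 0)`.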
Execution in kernel mode and user mode
- When a process executes a system call, its execution mode changes from user mode to kernel mode.
- Processes in user mode can access their own instructions and data, but not kernel instructions and data structures.
- In kernel mode, a process can access system data structures, such as the process table.
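The kernel keeps separate counts of a process's CPU time in user mode and in kernel (system) mode. `os.times()` reports both. This is a rough sketch: the arithmetic loop runs in user mode, and the repeated `getppid()` calls trap into the kernel. Clock resolution is coarse, so either difference may come out as zero.

```python
import os

def user_vs_system(n=200_000):
    """Return (user seconds, system seconds) consumed by the work below."""
    before = os.times()
    total = 0
    for i in range(n):
        total += i * i            # pure computation: user mode
    for _ in range(n // 10):
        os.getppid()              # each call is a system call: kernel mode
    after = os.times()
    return after.user - before.user, after.system - before.system
```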
Flow of Control during a System call
- A user process invokes a system call (for example, open()).
- Every system call is allocated a code number at system initialization.
- The C runtime library version of the system call places the system call parameters and the system call code number into machine registers, then executes a trap machine instruction, switching to kernel code and kernel mode.
Flow of control of a system call (cont)
- The kernel's trap handler uses the system call number as an index into a system call vector table (located in kernel memory), which is an array of pointers to the kernel code for each system call.
- The code corresponding to the system call executes in kernel mode, modifying kernel data structures if necessary.
- The kernel then performs a special "return" instruction that flips the machine back into user mode and returns to the user process's code.
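The dispatch steps above can be modelled in user space. In this sketch the call numbers, the handler names and the `Kernel` class are all hypothetical. The vector table is a list of handlers indexed by call number, and a mode flag stands in for the user/kernel switch.

```python
from typing import Callable, List

SYS_getpid, SYS_open = 0, 1   # hypothetical system call numbers

class Kernel:
    def __init__(self):
        self.mode = "user"
        self.open_files: List[str] = []   # stands in for a kernel data structure
        # System call vector: index = call number, value = kernel handler
        self.sysent: List[Callable[..., int]] = [self.sys_getpid, self.sys_open]

    def sys_getpid(self) -> int:
        return 42                          # hypothetical PID

    def sys_open(self, path: str) -> int:
        self.open_files.append(path)       # modify kernel data structures
        return len(self.open_files) - 1    # hypothetical descriptor number

    def trap(self, number: int, *args) -> int:
        """Model of the trap: enter kernel mode, dispatch, return to user mode."""
        self.mode = "kernel"
        try:
            if not 0 <= number < len(self.sysent):
                return -1                  # unknown call number
            return self.sysent[number](*args)
        finally:
            self.mode = "user"             # the special "return" instruction

def open_stub(kernel: Kernel, path: str) -> int:
    """Plays the C library wrapper: load call number and arguments, then trap."""
    return kernel.trap(SYS_open, path)
```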
SYNCHRONOUS VS ASYNCHRONOUS PROCESSING
- Usually, processes performing system calls cannot be preempted.
- Processes must voluntarily relinquish the CPU, for example while waiting for I/O to complete.
- The kernel puts the process to sleep and wakes it up when the I/O has completed.
- The scheduler does not allocate any CPU time to a sleeping process; it allocates the CPU to other processes while the hardware device services the I/O request.
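Here is a minimal sketch of sleep and wakeup, with hypothetical names. A process sleeps on an "event" (for example, completion of a disk read) and leaves the run queue. When the event occurs, `wakeup(event)` makes every process sleeping on it runnable again. The scheduler only ever picks from the run queue.

```python
from collections import defaultdict, deque

class Scheduler:
    def __init__(self, pids):
        self.run_queue = deque(pids)        # runnable processes
        self.sleeping = defaultdict(list)   # event -> processes sleeping on it

    def sleep(self, pid, event):
        """A process that must wait gives up the CPU and gets no CPU time."""
        self.run_queue.remove(pid)
        self.sleeping[event].append(pid)

    def wakeup(self, event):
        """E.g. called when the device reports that the I/O is complete."""
        for pid in self.sleeping.pop(event, []):
            self.run_queue.append(pid)

    def pick_next(self):
        """Round-robin over runnable processes only."""
        pid = self.run_queue.popleft()
        self.run_queue.append(pid)
        return pid
```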
INTERRUPTS AND EXCEPTIONS
- The UNIX system allows devices such as I/O peripherals and the clock to interrupt the CPU asynchronously.
- On receipt of an interrupt, the kernel saves its current context (a frozen image of what the process was doing), determines the cause of the interrupt and services it.
- Devices are allocated an interrupt priority based on their relative importance.
- When the kernel services an interrupt, it blocks out lower-priority interrupts but services higher-priority ones.
PROCESSOR EXECUTION LEVELS
- The kernel must sometimes prevent interrupts from occurring during critical activity to avoid corrupting data.
- Typical interrupt levels, from higher to lower priority:
  - Machine errors
  - Clock
  - Disk
  - Network devices
  - Terminals
  - Software interrupts
Interrupts
- Interrupts are serviced by kernel interrupt handlers, which must be very fast to avoid losing any interrupts.
- If a higher-priority interrupt occurs while a lower-priority interrupt is being serviced, the handlers nest and the higher-priority interrupt is serviced first.
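The execution levels and nesting rules can be sketched as follows. The level numbers and class names are illustrative only; the ordering follows the list of typical levels. An interrupt is serviced only if its level is above the current execution level. Otherwise it stays pending until the level drops.

```python
# Illustrative level numbers: a larger number means a higher priority.
LEVELS = {"machine_error": 6, "clock": 5, "disk": 4,
          "network": 3, "terminal": 2, "software": 1}

class CPU:
    def __init__(self):
        self.level = 0      # current execution level; 0 blocks nothing
        self.pending = []   # interrupts blocked out at the current level
        self.log = []       # order in which handlers start and finish

    def interrupt(self, device, nested=()):
        """Deliver an interrupt; `nested` lists devices that interrupt its handler."""
        level = LEVELS[device]
        if level <= self.level:
            self.pending.append(device)     # blocked out: service it later
            return
        saved = self.level                  # save context (here, just the level)
        self.level = level                  # block this level and everything below
        self.log.append("start " + device)
        for other in nested:
            self.interrupt(other)           # higher levels nest, lower ones wait
        self.log.append("end " + device)
        self.level = saved                  # restore the interrupted context
        self._drain()

    def _drain(self):
        """Service pending interrupts that the restored level now allows."""
        ready = sorted((d for d in self.pending if LEVELS[d] > self.level),
                       key=LEVELS.get, reverse=True)
        for d in ready:
            self.pending.remove(d)
        for d in ready:
            self.interrupt(d)
```

Suppose a clock interrupt and a terminal interrupt both arrive while the disk handler runs. The clock handler nests inside the disk handler. The terminal interrupt waits until the disk handler finishes.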
DISK ARCHITECTURE
- A disk is split in two ways: it is sliced like a pizza into sectors,
- and subdivided into concentric rings called tracks.
- Blocks are the individual areas bounded by the intersection of sectors and tracks; they are the basic units of disk storage.
- A typical block holds 4K bytes.
Disk architecture (cont)
- There are several variations of disk architecture. Many disks contain several platters, stacked one upon the other. In these systems, the collection of tracks with the same index number is called a cylinder.
- A big issue: sequential reads are much faster than random ones (by a factor of 10 to 15).
- When a sequence of contiguous blocks is read, there is a latency delay between blocks, caused by the communication between the disk controller and the device driver.
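The classic cylinder/head/sector (CHS) to linear block number conversion shows why cylinders matter. Consecutive block numbers fill one track, then move to the next head (surface) in the same cylinder, so a sequential read stays in one cylinder without moving the arm. The geometry values in the test are hypothetical.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Classic CHS-to-LBA formula; sectors are numbered from 1, the rest from 0."""
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)

def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Inverse mapping: linear block number back to (cylinder, head, sector)."""
    cylinder, rest = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector0 = divmod(rest, sectors_per_track)
    return cylinder, head, sector0 + 1
```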
Disk architecture (cont)
- We want consecutive data to be on the same track, though not in consecutive sectors on the track. See interleaving techniques, wherein consecutive blocks are three sectors apart.
- Extent file systems allocate large contiguous chunks at once (needed for data-intensive applications).
- I/O is always done in terms of blocks.
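Interleaving with consecutive blocks three sectors apart can be computed as below. This sketch assumes the interleave factor is coprime with the number of sectors per track. Only then does the simple formula `slot = block * factor mod sectors` visit every slot exactly once, so the function rejects other factors rather than guessing how a real formatter would handle them.

```python
from math import gcd

def interleave(sectors_per_track, factor):
    """Return the track layout: slot i holds the logical block stored there."""
    if gcd(sectors_per_track, factor) != 1:
        # The simple formula only fills every slot when the two are coprime
        raise ValueError("factor must be coprime with sectors_per_track")
    layout = [0] * sectors_per_track
    for block in range(sectors_per_track):
        # Each consecutive logical block lands `factor` slots further on
        layout[(block * factor) % sectors_per_track] = block
    return layout
```

With 8 sectors and factor 3, the track holds blocks `[0, 3, 6, 1, 4, 7, 2, 5]`. While the controller and driver finish with block 0, two slots rotate past, and block 1 arrives under the head at slot 3.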
THE FILE SUBSYSTEM
- Supports
  - regular files
  - directories
  - special files, which correspond to peripherals such as tapes, terminals or disks, and to inter-process communication mechanisms such as pipes and sockets
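Pipes and sockets really do appear as special file types. A sketch, assuming a Linux or similar POSIX system where `/dev/null` exists, classifies a few objects with the standard `stat` predicates. The helper names are hypothetical.

```python
import os
import socket
import stat

def file_kind(mode):
    """Classify an st_mode value with the stat module's type predicates."""
    checks = [(stat.S_ISREG, "regular"), (stat.S_ISDIR, "directory"),
              (stat.S_ISCHR, "character special"), (stat.S_ISBLK, "block special"),
              (stat.S_ISFIFO, "pipe"), (stat.S_ISSOCK, "socket"),
              (stat.S_ISLNK, "symbolic link")]
    for predicate, name in checks:
        if predicate(mode):
            return name
    return "unknown"

def kinds_demo():
    r, w = os.pipe()
    a, b = socket.socketpair()
    try:
        return {
            "/": file_kind(os.stat("/").st_mode),
            "/dev/null": file_kind(os.stat("/dev/null").st_mode),
            "pipe": file_kind(os.fstat(r).st_mode),
            "socket": file_kind(os.fstat(a.fileno()).st_mode),
        }
    finally:
        os.close(r)
        os.close(w)
        a.close()
        b.close()
```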
INODES
- An inode contains the file's permissions, owner, group and last modification time.
- The type of the file: regular, directory or special file.
- If the file is a symbolic link, the value of the symbolic link.
- If it is a regular file or directory, the location of its disk blocks.
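User programs can read most of these inode fields with `lstat()`. Unlike `stat()`, `lstat()` does not follow a final symbolic link. The sketch below assumes a POSIX system that allows symlinks in a temporary directory. The `inode_info` and `demo` names are hypothetical.

```python
import os
import stat
import tempfile

def inode_info(path):
    """Collect a few inode fields for path without following a final symlink."""
    st = os.lstat(path)
    info = {
        "inode": st.st_ino,
        "type_and_permissions": stat.filemode(st.st_mode),  # e.g. '-rw-r-----'
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "modified": st.st_mtime,
    }
    if stat.S_ISLNK(st.st_mode):
        info["link_value"] = os.readlink(path)  # the symbolic link's contents
    return info

def demo():
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "data.txt")
        with open(target, "w") as f:
            f.write("x")
        os.chmod(target, 0o640)
        link = os.path.join(d, "link")
        os.symlink("data.txt", link)
        return inode_info(target), inode_info(link)
```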
Inode (cont)
- Direct pointers to blocks 0 to 9.
- A single indirect pointer to an entire block of pointers, covering file blocks 10 to 1033.
- A double indirect pointer (in the primary inode) to a block that holds only pointers to other blocks, each of which holds 1024 pointers to data blocks.
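Here is how a logical block number of a file maps to these pointers, assuming 10 direct pointers and 1024 pointers per indirect block. The sketch stops at the double indirect level, which is as far as the list above goes.

```python
NDIRECT = 10    # direct pointers: blocks 0-9
NPTRS = 1024    # pointers held by one indirect block

def locate(n):
    """Return the pointer path that reaches logical block n of a file."""
    if n < 0:
        raise ValueError("negative block number")
    if n < NDIRECT:
        return ("direct", n)
    n -= NDIRECT
    if n < NPTRS:
        return ("single", n)                       # index in the indirect block
    n -= NPTRS
    if n < NPTRS * NPTRS:
        return ("double", n // NPTRS, n % NPTRS)   # first-level, second-level index
    raise ValueError("beyond the double indirect range")
```

With the typical 4K-byte block, the direct pointers cover 40 KB of a file. The single indirect block adds 4 MB (1024 × 4 KB), and the double indirect block adds 4 GB (1024 × 1024 × 4 KB).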
LAYOUT OF THE FILE SYSTEM
- A file system has the following structure:
  - The first logical block is the boot block, used for starting the OS.
  - The second logical block is the superblock, which contains information about free blocks and the inode list.
  - Next comes the inode list. Administrators specify the size of the inode list when configuring the file system. The kernel references inodes by index into the inode list.
  - The data blocks start at the end of the inode list and contain file data and administrative data.
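Because the kernel references inodes by index into the inode list, it can compute where an inode lives on disk. This is a sketch under stated assumptions. Inode numbers start at 1, and the inode list begins at block 2, right after the boot block and superblock. The 1024-byte block size and 64-byte on-disk inode size are hypothetical values.

```python
BLOCK_SIZE = 1024        # hypothetical block size in bytes
INODE_SIZE = 64          # hypothetical on-disk inode size in bytes
INODE_LIST_START = 2     # block 0 = boot block, block 1 = superblock
INODES_PER_BLOCK = BLOCK_SIZE // INODE_SIZE   # 16 here

def inode_location(inum):
    """Return (disk block, byte offset in that block) of inode number inum."""
    if inum < 1:
        raise ValueError("inode numbers start at 1")
    block = INODE_LIST_START + (inum - 1) // INODES_PER_BLOCK
    offset = ((inum - 1) % INODES_PER_BLOCK) * INODE_SIZE
    return block, offset
```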
CONVERSION OF PATHNAME TO AN INODE
- Initial access to a file is through its pathname. To access a file, the kernel needs to translate its pathname to an inode.
- The algorithm namei parses the pathname one component at a time. It converts each component into an inode based on its name and the directory being searched, and eventually returns the inode of the input pathname.
Namei ALGORITHM
- If the pathname is absolute, the search starts from the root inode.
- If the pathname is relative, the search starts from the inode of the process's current working directory (kept in the process u area).
- The components of the pathname are then processed from left to right. Every component except the last one must be either a directory or a symbolic link. Let's call the intermediate inodes the working inodes.
Namei algorithm (cont)
- If the working inode is a directory, the current pathname component is looked up in that directory. If it is not found, namei returns an error; otherwise, the inode associated with the located component becomes the new working inode.
Namei (cont)
- If the working inode corresponds to a symbolic link, the pathname up to and including the current path component is replaced by the contents of the symbolic link, and the pathname is reprocessed.
- The inode corresponding to the final pathname component is the inode of the file referenced by the entire pathname.
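The namei steps can be traced on a small in-memory file system. The inode table and every name in it are hypothetical. As in real UNIX directories, each directory holds "." and ".." entries, so ".." needs no special case. A relative link target is resolved from the directory that contains the link, which is how real implementations interpret it. A link-count limit guards against symlink loops.

```python
ROOT = 2            # hypothetical root inode number
MAX_SYMLINKS = 8    # guard against symlink loops

inodes = {
    2: {"type": "dir", "entries": {".": 2, "..": 2, "usr": 3, "etc": 4, "home": 7}},
    3: {"type": "dir", "entries": {".": 3, "..": 2, "bin": 5}},
    4: {"type": "dir", "entries": {".": 4, "..": 2, "passwd": 6}},
    5: {"type": "dir", "entries": {".": 5, "..": 3}},
    6: {"type": "file"},
    7: {"type": "dir", "entries": {".": 7, "..": 2, "cfg": 8, "bin": 9}},
    8: {"type": "symlink", "target": "/etc"},
    9: {"type": "symlink", "target": "../usr/bin"},
}

def namei(path, cwd=ROOT):
    """Return the inode number that path refers to."""
    work = ROOT if path.startswith("/") else cwd   # absolute vs relative start
    parts = [p for p in path.split("/") if p]
    links = 0
    while parts:
        if inodes[work]["type"] != "dir":
            raise NotADirectoryError(path)
        name = parts.pop(0)
        found = inodes[work]["entries"].get(name)
        if found is None:
            raise FileNotFoundError(path)
        if inodes[found]["type"] == "symlink":
            links += 1
            if links > MAX_SYMLINKS:
                raise OSError("too many levels of symbolic links")
            target = inodes[found]["target"]
            # Splice the link's contents in place of the processed prefix
            parts = [p for p in target.split("/") if p] + parts
            if target.startswith("/"):
                work = ROOT   # absolute target: restart at the root
            continue          # relative target: continue from the link's directory
        work = found          # the located inode becomes the working inode
    return work
```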
MOUNTING FILE SYSTEMS
- When UNIX is started, the directory hierarchy corresponds to the file system located on a single disk, called the root device.
- The mount utility allows a super-user to splice the root directory of a file system into the existing directory hierarchy.
- File systems created on other devices can thus be attached to the original directory hierarchy using the mount mechanism.
MOUNT (CONT)
- Once a mount is established, users are unaware of crossing mount points.
- A file system may be detached from the main hierarchy using the umount utility.
- Hard links do not work across mounts (System V).
- Example
  - mount /dev/floppy /mnt
  - umount /mnt
Mount (cont)
- The kernel maintains a system-wide data structure called the mount table, which allows multiple file systems to be accessed via a single directory hierarchy.
- The mount() and umount() system calls modify the table in the following manner.
MOUNT (CONT)
- With mount(), an entry is added containing
  - the device number of the device holding the file system
  - a pointer to the root inode of the newly mounted file system
  - a pointer to the inode of the mount point
  - a pointer to the file-system-specific mount data structure of the newly mounted file system
Umount()
- With umount(), the kernel performs several checks and updates:
  - it checks that there are no open files in the file system to be unmounted
  - it flushes the superblock and buffered inodes back to the file system
  - it removes the mount table entry and removes the "mount point" mark from the mount point directory
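The mount table operations can be sketched as follows. All class and field names here are hypothetical. Inodes are identified as `(device, inode number)` pairs, and the file-system-specific data pointer is omitted. During pathname lookup, `cross()` steps from a mount point into the root of the mounted file system. `umount()` refuses to proceed while files are open, and a simple list records the flush of the superblock and buffered inodes.

```python
from dataclasses import dataclass

@dataclass
class MountEntry:
    device: str         # device holding the mounted file system
    root_inode: tuple   # (device, inode number) of the mounted root
    mount_point: tuple  # (device, inode number) of the covered directory

class MountTable:
    def __init__(self):
        self.entries = []
        self.open_files = {}   # device -> number of open files (hypothetical)
        self.flushed = []      # devices whose superblock/inodes were written back

    def mount(self, device, root_inode, mount_point):
        if any(e.mount_point == mount_point for e in self.entries):
            raise OSError("mount point already in use")
        self.entries.append(MountEntry(device, root_inode, mount_point))

    def cross(self, inode):
        """During lookup: step from a mount point into the mounted root."""
        for e in self.entries:
            if e.mount_point == inode:
                return e.root_inode
        return inode

    def umount(self, device):
        entry = next((e for e in self.entries if e.device == device), None)
        if entry is None:
            raise OSError(f"{device} is not mounted")
        if self.open_files.get(device, 0):
            raise OSError(f"{device} is busy")   # files are still open
        self.flushed.append(device)   # stands in for flushing superblock and inodes
        self.entries.remove(entry)    # also removes the mount-point "mark"
```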
THE PROCESS SUBSYSTEM: process states
- Every process on the system can be in one of six states:
  - running: the process is currently using the CPU
  - runnable: ready to run; it will run depending on its priority
  - sleeping: waiting for an event
  - suspended: stopped (e.g., as a result of Ctrl-Z)
  - idle: being created by fork(), not yet runnable
  - zombie: terminated, but its parent has not yet accepted its return value
Example of process state
- For example, when a process issues an I/O request, it goes to sleep. It becomes runnable again when the I/O completes, and then runs depending on its priority.
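The six states can be modelled as a transition table. The event names and the exact set of allowed transitions are assumptions made for this sketch. For example, real systems can also stop a sleeping process, which is omitted here. Replaying the I/O example above gives the sequence running, sleeping, runnable.

```python
# (current state, event) -> next state. Event names are hypothetical labels.
TRANSITIONS = {
    ("idle", "fork_complete"): "runnable",
    ("runnable", "scheduled"): "running",
    ("running", "quantum_expired"): "runnable",
    ("running", "wait_for_event"): "sleeping",     # e.g. waiting for I/O
    ("sleeping", "event_occurred"): "runnable",
    ("running", "stop_signal"): "suspended",       # e.g. Ctrl-Z
    ("suspended", "continue_signal"): "runnable",
    ("running", "exit"): "zombie",
}

def replay(events, state="idle"):
    """Apply events in order and return the list of states visited."""
    history = [state]
    for event in events:
        try:
            state = TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"{event!r} is not valid in state {state!r}") from None
        history.append(state)
    return history
```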
PROCESS COMPOSITION
- code area: the executable (text) portion of the process
- data area: used by the process to contain static data
- stack area: used by the process to store temporary data
- user area: holds housekeeping information
- page tables: used for memory management
USER AREA
- Every process has a private user area holding housekeeping information that the kernel uses for process management.
- It contains control and status information.
- The contents of the user area are only accessible when the process is executing in kernel mode.
- The kernel can only access the user area of the currently running process, not the user areas of other processes.
PROCESS USER AREA (CONT)
- The important fields in the user area include
  - a pointer to the process table slot of the currently executing process
  - file descriptors for all open files
  - internal I/O parameters
  - the current directory and current root
  - process and file size limits
  - real and effective user IDs
  - an array indicating how the process reacts to signals
  - how much CPU time the process has recently used
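Several of these per-process fields can be queried from user space through system calls. This sketch assumes a POSIX system. Modern kernels such as Linux keep this information in other per-process structures rather than a classic u area, so the mapping is illustrative. The function name is hypothetical.

```python
import os
import resource
import signal

def u_area_view():
    """Query a few per-process values that correspond to user-area fields."""
    soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)   # file size limit
    return {
        "current_directory": os.getcwd(),
        "file_size_limit": (soft, hard),    # RLIM_INFINITY means unlimited
        "real_uid": os.getuid(),
        "effective_uid": os.geteuid(),
        "sigint_action": signal.getsignal(signal.SIGINT),  # how SIGINT is handled
    }
```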
PROCESS TABLE
- The process table is a kernel data structure that contains one entry for every process in the system.
- The process table contains fields that must always be accessible to the kernel.
Process entry info
- state (running, runnable, sleeping, suspended, idle or zombie)
- process ID (PID) and parent PID
- real and effective user ID (UID) and group ID (GID)
- location of its code, data, stack and user areas
- a list of all pending signals
- various timers giving process execution time and kernel resource utilization
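The PID and parent PID stored in each process table entry are easy to observe. In this sketch for a POSIX system, a child created with `fork()` reports its own PID and parent PID through a pipe, then exits with a status the parent collects. The function name and exit status 3 are arbitrary choices for illustration.

```python
import os

def child_sees_parent():
    """Return (pid from fork, child's getpid, child's getppid, exit status)."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                                   # child
        os.close(r)
        os.write(w, f"{os.getpid()} {os.getppid()}".encode())
        os._exit(3)                                # exit without Python cleanup
    os.close(w)                                    # parent
    with os.fdopen(r) as f:
        child_pid, child_ppid = map(int, f.read().split())
    # After exiting, the child stays a zombie until waitpid() collects its status
    _, status = os.waitpid(pid, 0)
    return pid, child_pid, child_ppid, os.WEXITSTATUS(status)
```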
THE SCHEDULER
- The scheduler is responsible for sharing CPU time between competing processes.
- The scheduler maintains a multilevel priority queue that allows it to schedule processes efficiently, and follows a specific algorithm for selecting which process should run.
Scheduling Rules
- The kernel allocates the CPU to a process for a time quantum, preempts a process that exceeds its time quantum, and feeds it back into one of several priority queues.
- Every second, the processes in the highest-priority non-empty queue are allocated the CPU in a round-robin fashion.
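Here is a simplified multilevel feedback queue following these rules. The number of levels, the quantum and the process names are hypothetical parameters. Queue 0 has the highest priority. A process that still needs CPU time after its quantum is fed back one level lower. Classic UNIX schedulers instead recompute priorities from recent CPU usage, so treat this as a model of the idea rather than of a specific kernel.

```python
from collections import deque

class MultilevelFeedbackQueue:
    def __init__(self, levels=3, quantum=2):
        self.queues = [deque() for _ in range(levels)]  # index 0 = highest priority
        self.quantum = quantum

    def add(self, name, work, level=0):
        self.queues[level].append((name, work))

    def run(self):
        """Simulate until all work is done; return the order of time slices."""
        timeline = []
        while any(self.queues):
            # Round-robin within the highest-priority non-empty queue
            level = next(i for i, q in enumerate(self.queues) if q)
            name, work = self.queues[level].popleft()
            work -= min(self.quantum, work)
            timeline.append(name)
            if work > 0:
                # Quantum used up with work left: preempt and demote one level
                lower = min(level + 1, len(self.queues) - 1)
                self.queues[lower].append((name, work))
        return timeline
```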
Scheduler (cont)
- To support real-time processes, the scheduler needs to be changed so that scheduling is based on priority inheritance rather than time quanta. More preemption points in the kernel are also needed.
Context Switch
- To switch from one process to another, the kernel saves the process's program counter, stack pointer and other important information in the process's user area.
- When the process is ready to run again, the kernel restores this information from the process's user area.
Loading an executable
- A user compiles the source code of a program to create an executable file, which consists of several parts:
  - a set of "headers" that describe the attributes of the file
  - the program text, i.e. the machine language representation of the program
  - other sections, such as symbol table information
Loading an executable (cont)
- The kernel loads an executable file into memory during an exec() system call.
- The loaded process contains at least three parts, called regions:
  - text, corresponding to the text section of the executable file
  - data, corresponding to the data section of the executable file
  - stack, which is created automatically and whose size is adjusted dynamically by the kernel at run time
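The usual way to run a new program is `fork()` followed by `exec()`. The exec call replaces the child's regions with new ones built from the executable file. This sketch uses Python's thin wrappers `os.fork`, `os.execv` and `os.waitpid` on a POSIX system. To stay self-contained, it runs the Python interpreter itself as the program. Exit status 127 for a failed exec follows the shell's convention and is an arbitrary choice here.

```python
import os
import sys

def spawn(argv):
    """fork() a child, exec() argv[0] in it, and return its exit status."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execv(argv[0], argv)   # on success, never returns
        finally:
            os._exit(127)             # reached only if exec fails
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

For example, `spawn([sys.executable, "-c", "import sys; sys.exit(7)"])` returns 7.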
Loading an executable (cont)
- The compiler generates addresses for a virtual address space with a given address range.
- The memory management unit (MMU) translates the virtual addresses generated by the compiler into physical memory addresses.
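The MMU's translation can be sketched as a paged lookup through the per-process page tables. This assumes a 4096-byte page and a page table given as a mapping from virtual page number to physical frame number, both hypothetical. A missing entry models a page fault.

```python
PAGE_SIZE = 4096   # hypothetical page size in bytes

def translate(vaddr, page_table):
    """Map a virtual address to a physical one; page_table: page -> frame."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # split into page number and offset
    if vpn not in page_table:
        raise LookupError(f"page fault at {vaddr:#x}")
    return page_table[vpn] * PAGE_SIZE + offset
```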
THE BOOT and INIT PROCESS
- The administrator initializes the system through a bootstrap sequence.
- On UNIX systems, the bootstrap sequence eventually reads the boot block (block 0) of a disk and loads it into memory.
- The program contained in the boot block loads the kernel from the file system (for example, /unix).
- After the kernel is loaded into memory, the boot program transfers control to the start address of the kernel, and the kernel starts running.
Boot process (cont)
- After initialization, the kernel mounts the root file system and handcrafts the environment for process 0.
- Process 0 forks from within the kernel, creating process 1.
- Process 1, running in kernel mode, creates its user-level context by allocating a data region and attaching it to its address space.
Boot process (cont)
- Process 1 copies code from kernel space into the new region, which forms the new user-level context of process 1.
- Process 1 sets up the saved user register context, "returns" from kernel mode and executes the code just copied from the kernel.
- Process 1 is now a user-level process, and its text consists of a call to exec the /etc/init program.
The End