Linux Virtual File System - PowerPoint PPT Presentation

Slides: 33
Provided by: bra94

1
Linux Virtual File System
  • Peter J. Braam

2
Aims
  • Present the data structures in Linux VFS
  • Provide information about flow of control
  • Describe methods and invariants needed to
    implement a new file system
  • Illustrate with some examples

3
History
  • BSD implemented VFS for NFS; aim: dispatch to
    different filesystems
  • VMS had an elaborate filesystem
  • NT/Win95 have VFS-type interfaces
  • Newer systems integrate VM with buffer cache.

[Diagram: file access enters the VFS, which dispatches to nfs, ufs, and Coda; these sit on udp, disk, and Venus respectively.]

4
Linux Filesystems
  • Media based
    • ext2 - Linux native
    • ufs - BSD
    • fat - DOS FS
    • vfat - Win 95
    • hpfs - OS/2
    • minix - well.
    • isofs - CDROM
    • sysv - SysV Unix
    • hfs - Macintosh
    • affs - Amiga Fast FS
    • NTFS - NT's FS
    • adfs - Acorn StrongARM
  • Network
    • nfs
    • Coda
    • AFS - Andrew FS
    • smbfs - LanManager
    • ncpfs - Novell
  • Special ones
    • procfs - /proc
    • umsdos - Unix in DOS
    • userfs - redirector to user

5
Linux Filesystems (ctd)
  • Forthcoming
    • devfs - device file system
    • DFS - DCE distributed FS
  • Varia
    • cfs - crypt filesystem
    • cfs - cache filesystem
    • ftpfs - ftp filesystem
    • mailfs - mail filesystem
    • pgfs - Postgres versioning file system
  • Linux serves (unrelated to the VFS!)
    • NFS - user and kernel
    • Coda
    • AppleShare - netatalk/CAP
    • SMB - samba
    • NCP - Novell

6
Usefulness
"Linux is Obsolete"
  • Andrew Tanenbaum

7
Linux VFS
  • Multiple interfaces build up VFS
    • files
    • dentries
    • inodes
    • superblock
    • quota
  • VFS can do all caching and provides utility
    functions to the FS
  • FS provides methods to VFS; many are optional

[Diagram: file access goes through the VFS to nfs, ext2fs, and Coda FS, backed by udp, disk, and Venus respectively.]

8
User level file access
  • Typical user level types and code
    • pathnames: "/myfile"
    • file descriptors: fd = open("/myfile", flags)
    • attributes in struct stat: stat("/myfile",
      &mybuf), chmod, chown...
    • offsets: write, read, lseek
    • directory handles: DIR *dh = opendir("/mydir")
    • directory entries: struct dirent *ent =
      readdir(dh)
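
The user level calls listed above can be tried out with a short sketch like the following, assuming a POSIX system. The helper names write_and_stat and dir_contains are made up for this illustration and are not part of the slides.

```c
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write `text` to `path`, then read the size back through stat().
 * Returns the file size, or -1 on any error. */
long write_and_stat(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);

    /* pathname -> file descriptor */
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);      /* advances the file offset */
    off_t pos = lseek(fd, 0, SEEK_CUR);    /* ask where the offset is now */
    close(fd);
    if (n != (ssize_t)len || pos != (off_t)len)
        return -1;

    /* pathname -> attributes in struct stat */
    struct stat mybuf;
    if (stat(path, &mybuf) != 0)
        return -1;
    return (long)mybuf.st_size;
}

/* Return 1 if directory `dir` has an entry called `name`,
 * 0 if it does not, -1 if the directory cannot be opened. */
int dir_contains(const char *dir, const char *name)
{
    assert(dir != NULL && name != NULL);

    DIR *dh = opendir(dir);                /* directory handle */
    if (dh == NULL)
        return -1;

    int found = 0;
    struct dirent *ent;                    /* directory entry */
    while ((ent = readdir(dh)) != NULL) {
        if (strcmp(ent->d_name, name) == 0) {
            found = 1;
            break;
        }
    }
    closedir(dh);
    return found;
}
```

Each of these calls ends up in the VFS, which is the subject of the following slides.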

9
VFS
  • Manages kernel level file abstractions in one
    format for all file systems
  • Receives system call requests from user level
    (e.g. write, open, stat, link)
  • Interacts with a specific file system based on
    mount point traversal
  • Receives requests from other parts of the kernel,
    mostly from memory management

10
File system level
  • Individual File Systems
    • responsible for managing file & directory data
    • responsible for managing metadata: timestamps,
      owners, protection etc.
    • translate data between
      • particular FS data, e.g. disk data, NFS data,
        Coda/AFS data
      • VFS data: attributes etc. in standard format
    • e.g. nfs_getattr(.) returns attributes in VFS
      format; it acquires attributes in NFS format to
      do so.

11
Anatomy of stat system call
sys_stat(path, buf)
{
    dentry = namei(path);                      /* Establish VFS data */
    if ( dentry == NULL )
        return -ENOENT;
    inode = dentry->d_inode;
    rc = inode->i_op->i_permission(inode);     /* Call into inode layer of filesystem */
    if ( rc ) {
        dput(dentry);                          /* drop the reference from namei */
        return -EPERM;
    }
    rc = inode->i_op->i_getattr(inode, buf);   /* Call into inode layer of filesystem */
    dput(dentry);
    return rc;
}
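
The dispatch through inode->i_op can be imitated in user space with a table of function pointers. This is only a sketch under stated assumptions: all toy_ and ramfs_ names are invented, the path lookup (namei) is left out so the caller passes a dentry directly, and d_count merely stands in for the reference that namei would take and dput would release.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_inode;

/* Methods a filesystem provides for its inodes (hypothetical). */
struct toy_inode_ops {
    int (*permission)(struct toy_inode *inode);
    int (*getattr)(struct toy_inode *inode, long *size_out);
};

struct toy_inode {
    const struct toy_inode_ops *i_op;
    long size;
    int readable;
};

struct toy_dentry {
    struct toy_inode *d_inode;
    int d_count;                    /* reference count */
};

/* One "filesystem" implementing the methods. */
static int ramfs_permission(struct toy_inode *inode)
{
    return inode->readable ? 0 : -1;
}

static int ramfs_getattr(struct toy_inode *inode, long *size_out)
{
    *size_out = inode->size;
    return 0;
}

const struct toy_inode_ops ramfs_iops = { ramfs_permission, ramfs_getattr };

/* Generic layer: knows nothing about ramfs, only about the method table. */
int toy_stat(struct toy_dentry *dentry, long *size_out)
{
    assert(size_out != NULL);
    if (dentry == NULL)
        return -ENOENT;

    dentry->d_count++;              /* reference namei() would have taken */
    struct toy_inode *inode = dentry->d_inode;

    int rc = inode->i_op->permission(inode);
    if (rc)
        rc = -EPERM;
    else
        rc = inode->i_op->getattr(inode, size_out);

    dentry->d_count--;              /* dput() on every exit path */
    return rc;
}
```

Releasing the reference on both the success and the failure path keeps the use count balanced.
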
12
Anatomy of fstatfs system call
sys_fstatfs(fd, buf)                               /* for things like df */
{
    file = fget(fd);                               /* Translate fd to VFS data structure */
    if ( file == NULL )
        return -EBADF;
    superb = file->f_dentry->d_inode->i_sb;
    rc = superb->sb_op->sb_statfs(superb, buf);    /* Call into superblock layer of filesystem */
    return rc;
}
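
From user level, the same path can be exercised with a short sketch. It uses the POSIX call fstatvfs rather than the Linux-specific fstatfs named on the slide; the helper name fs_usage is made up for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/statvfs.h>
#include <unistd.h>

/* Report total and available bytes of the filesystem holding `path`,
 * the kind of numbers df prints. Returns 0 on success, -1 on error. */
int fs_usage(const char *path, unsigned long long *total,
             unsigned long long *avail)
{
    assert(path != NULL && total != NULL && avail != NULL);

    int fd = open(path, O_RDONLY);          /* pathname -> file descriptor */
    if (fd < 0)
        return -1;

    struct statvfs sv;
    int rc = fstatvfs(fd, &sv);             /* fd -> filesystem statistics */
    close(fd);
    if (rc != 0)
        return -1;

    *total = (unsigned long long)sv.f_blocks * sv.f_frsize;
    *avail = (unsigned long long)sv.f_bavail * sv.f_frsize;
    return 0;
}
```

Calling fs_usage("/", &total, &avail) should fill in the figures for the root filesystem.
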
13
Data structures
  • VFS data structures for
    • VFS handle to the file: inode (BSD vnode)
    • User instantiated file handle: file (BSD file)
    • The whole filesystem: superblock (BSD vfs)
    • A name to inode translation: dentry

14
Shorthand method notation
  • super block methods: sss_methodname
  • inode methods: iii_methodname
  • dentry methods: ddd_methodname
  • file methods: fff_methodname
  • instead of
    • inode -> i_op -> lookup we write iii_lookup

15
namei
VFS:
    struct dentry *namei(parent, name)
    {
        if (dentry = d_lookup(parent, name)) ... else ...
    }
    struct inode *iget(ino, dev) { /* try cache, else .. */ }
FS methods called along the way:
    ddd_hash(parent, name)
    ddd_revalidate(dentry)
    iii_lookup(parent, name)
    sss_read_inode()
16
Superblocks
  • Handle metadata only (attributes etc)
  • Responsible for retrieving and storing metadata
    from the FS media or peers
  • Superblock structs hold things like
    • device, blocksize, dirty flags, list of dirty
      inodes
    • super operations
    • wait queue
    • pointer to the root inode of this FS

17
Super Operations (sss_)
  • Ops on Inodes
    • read_inode
    • put_inode
    • write_inode
    • delete_inode
    • clear_inode
    • notify_change
  • Superblock manips
    • read_super (mount)
    • put_super (unmount)
    • write_super (unmount)
    • statfs (attributes)
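
Since many of these methods are optional, the generic layer has to check a method pointer before calling it. The sketch below shows that pattern in user space. All names (toy_super, diskfs_, netfs_) are hypothetical, and the fallbacks chosen here (-ENOSYS for a missing statfs, "nothing to do" for a missing write_inode) are assumptions made for the example, not a statement of what the kernel does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_super;

/* Hypothetical method table; a NULL pointer means "not implemented". */
struct toy_super_ops {
    int (*write_inode)(struct toy_super *sb, long ino);
    int (*statfs)(struct toy_super *sb, long *free_blocks);
};

struct toy_super {
    const struct toy_super_ops *s_op;
    long free_blocks;
    int inodes_written;
};

/* Generic layer: no sensible default for statfs, so report an error. */
int toy_vfs_statfs(struct toy_super *sb, long *free_blocks)
{
    assert(sb != NULL && sb->s_op != NULL);
    if (sb->s_op->statfs == NULL)
        return -ENOSYS;
    return sb->s_op->statfs(sb, free_blocks);
}

/* Generic layer: without write_inode there is nothing to flush. */
int toy_vfs_write_inode(struct toy_super *sb, long ino)
{
    assert(sb != NULL && sb->s_op != NULL);
    if (sb->s_op->write_inode == NULL)
        return 0;
    return sb->s_op->write_inode(sb, ino);
}

/* A "media" filesystem that implements both methods. */
static int diskfs_write_inode(struct toy_super *sb, long ino)
{
    (void)ino;                      /* unused in this toy */
    sb->inodes_written++;
    return 0;
}

static int diskfs_statfs(struct toy_super *sb, long *free_blocks)
{
    *free_blocks = sb->free_blocks;
    return 0;
}

const struct toy_super_ops diskfs_sops = { diskfs_write_inode, diskfs_statfs };

/* A filesystem that leaves both methods out. */
const struct toy_super_ops netfs_sops = { NULL, NULL };
```

A filesystem author then fills in only the methods that make sense for the medium.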

18
Inodes
  • Inodes are the VFS abstraction for the file
  • Inode has operations (iii_methods)
  • VFS maintains an inode cache, NOT the individual
    FSs (compare NT, BSD etc)
  • Inodes contain an FS specific area where
    • ext2 stores disk block numbers etc
    • AFS would store the FID
  • Extraordinary inode ops are good for dealing with
    stale NFS file handles etc.

19
What's inside an inode - 1
list_head i_hash        /* caching */
list_head i_list        /* caching */
list_head i_dentry      /* caching */
int       i_count       /* caching */
long      i_ino         /* identifies file */
int       i_dev         /* identifies file */
{m,a,c}time, {u,g}id, mode, size, n_link    /* usual stuff */
20
What's inside an inode - 2
superblock i_sb                          /* which FS */
inode_ops  i_op                          /* which FS */
wait objects, semaphore lock             /* for mmap, networking, waiting */
vm_area_struct, pipe/socket info, page information
union {
    ext2fs_inode_info i_ext2;
    nfs_inode_info    i_nfs;
    coda_inode_info   i_coda;
    ..
} u                                      /* FS specific info: blocknos, fids etc */
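
The FS specific area is a plain C union: each filesystem uses only its own member, and the inode is as large as the biggest one. A minimal sketch, with all names and sizes invented for illustration. The i_fstype tag is an addition for this toy so the helper can check which member is valid; the slide's inode has no such field and relies on i_sb and i_op to say which filesystem owns the inode.

```c
#include <assert.h>
#include <stddef.h>

enum toy_fstype { TOY_EXT2, TOY_NFS, TOY_CODA };

/* Hypothetical per-filesystem private data. */
struct toy_ext2_info { unsigned long blocks[4]; };     /* disk block numbers */
struct toy_nfs_info  { unsigned char fhandle[32]; };   /* server file handle */
struct toy_coda_info { unsigned long fid[3]; };        /* file identifier */

struct toy_inode {
    unsigned long i_ino;
    enum toy_fstype i_fstype;       /* toy tag, see the note above */
    union {
        struct toy_ext2_info ext2_i;
        struct toy_nfs_info  nfs_i;
        struct toy_coda_info coda_i;
    } u;
};

/* bmap-style helper: map a logical block to a disk block number.
 * Only meaningful for the media filesystem in this toy.
 * Returns 0 on success, -1 otherwise. */
int toy_bmap(const struct toy_inode *inode, size_t logical,
             unsigned long *physical)
{
    assert(inode != NULL && physical != NULL);
    if (inode->i_fstype != TOY_EXT2 || logical >= 4)
        return -1;
    *physical = inode->u.ext2_i.blocks[logical];
    return 0;
}
```

The cost of this layout is that every inode pays for the largest member, whichever filesystem it belongs to.
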
21
Inode state
  • Inode can be on one or two lists
    • (hash & in_use) or (hash & dirty) or unused
  • inode has a use count i_count
  • Transitions
    • unused -> hash: iget calls sss_read_inode
    • dirty -> in_use: sss_write_inode
    • hash -> unused: call on sss_clear_inode, but if
      i_nlink == 0, iput calls sss_delete_inode when
      i_count falls to 0

22
Inode Cache
Players
  1. iget: if i_count > 0, ++
  2. iput: if i_count > 1, --
  3. free_inodes
  4. syncing inodes
[Diagram: the Inode_hashtable and three lists: unused inodes, used inodes, dirty inodes. Labelled transitions: sss_read_inode (iget); sss_clear_inode (freeing inodes) or sss_delete_inode (iput); sss_write_inode (sync one); mark_inode_dirty (media fs only).]
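
The inode life cycle can be followed in a small user-space state machine. This is a deliberately simplified sketch: the three states replace the real lists, the counters stand in for calls to the filesystem's sss_ methods, and clearing the inode as soon as the count reaches zero is an assumption of the toy (a real cache could keep it around for reuse). All toy_ names are invented.

```c
#include <assert.h>

enum toy_state { TOY_UNUSED, TOY_IN_USE, TOY_DIRTY };

struct toy_inode {
    enum toy_state state;
    int i_count;                    /* use count */
    int i_nlink;                    /* number of names for the file */
    /* counters standing in for calls into the filesystem */
    int reads, writes, deletes, clears;
};

/* Take a reference; the first user triggers sss_read_inode. */
void toy_iget(struct toy_inode *ino)
{
    if (ino->state == TOY_UNUSED) {
        ino->reads++;               /* sss_read_inode */
        ino->state = TOY_IN_USE;
    }
    ino->i_count++;
}

/* Only an inode that is in use can be modified. */
void toy_mark_dirty(struct toy_inode *ino)
{
    assert(ino->i_count > 0);
    ino->state = TOY_DIRTY;
}

/* dirty -> in_use */
void toy_sync(struct toy_inode *ino)
{
    if (ino->state == TOY_DIRTY) {
        ino->writes++;              /* sss_write_inode */
        ino->state = TOY_IN_USE;
    }
}

/* Drop a reference; the last user decides between delete and clear. */
void toy_iput(struct toy_inode *ino)
{
    assert(ino->i_count > 0);
    if (--ino->i_count > 0)
        return;
    if (ino->i_nlink == 0) {
        ino->deletes++;             /* sss_delete_inode: file is gone */
    } else {
        toy_sync(ino);              /* flush before letting go */
        ino->clears++;              /* sss_clear_inode */
    }
    ino->state = TOY_UNUSED;
}
```

The point to notice is that an unlinked file (i_nlink == 0) survives until its last user calls toy_iput.
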
23
Sales
Red Hat Software sold 240,000 copies of Red Hat
Linux in 1997 and expects to reach 400,000 in
1998.
Estimates of installed servers (InfoWorld):
  • Linux: 7 million
  • OS/2: 5 million
  • Macintosh: 1 million
24
Inode operations (iii_)
  • symbolic links
    • readlink
    • follow_link
  • pages
    • readpage, writepage, updatepage - read or write
      page. Generic for media fs.
    • bmap - return disk block number of logical block
  • special operations
    • revalidate - see dentry section
    • truncate
    • permission
  • lookup: return inode
    • calls iget
  • creation/removal
    • create
    • link
    • unlink
    • symlink
    • mkdir
    • rmdir
    • mknod
    • rename

25
Dentry world
  • Dentry is a name to inode translation structure
  • Cached aggressively by VFS
  • Eliminates lookups by FS private caches
    • timing on Coda FS: ls -lR of 1000 files after
      priming cache
      • linux 2.0.32: 7.2 secs
      • linux 2.1.92: 0.6 secs
    • disk fs: less benefit; NFS: even more
  • Negative entries!
  • Namei is dramatically simplified

26
Inside dentries
  • name
  • pointer to inode
  • pointer to parent dentry
  • list head of children
  • chains for lots of lists
  • use count

27
Dentry associated lists
[Diagram legend: inode; dentry; dentry inode relationship; dentry tree relationship]
  • inode i_dentry list head
  • d_inode pointer
  • d_parent pointer
  • d_child chains: place d_alloc; remove d_prune,
    d_invalidate, d_put
  • d_alias chains: place d_instantiate; remove
    dentry_iput
28
Dcache
dentry_hashtable (d_hash chains)
  • namei tries cache: d_lookup
    • ddd_compare
  • Success: ddd_revalidate
    • d_invalidate if fails
    • proceed if success
  • Failure: iii_lookup
    • find inode
    • iget
      • sss_read_inode
    • finish
    • d_add
    • can give negative entry in dcache

[Diagram: dentry_hashtable, where dhash(parent, name) selects a list head. Entries arrive via namei -> iii_lookup -> d_add and leave via prune, d_invalidate, d_drop. Unused dentries sit on d_lru chains.]
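
The lookup flow above, including negative entries, can be sketched in user space. Assumptions of this toy: a small fixed array with a linear scan replaces the hash table, directories are identified by an integer, inode number 0 marks a negative entry, and toy_fs_lookup is a stand-in for iii_lookup that knows exactly one file. Revalidation and pruning are left out.

```c
#include <assert.h>
#include <string.h>

#define TOY_DCACHE_SIZE 16
#define TOY_NAME_MAX    32

struct toy_dentry {
    int  used;
    int  parent;                    /* id of the parent directory */
    char name[TOY_NAME_MAX];
    long ino;                       /* 0 = negative entry: no such name */
};

static struct toy_dentry toy_dcache[TOY_DCACHE_SIZE];
int toy_fs_lookups;                 /* how often the "filesystem" was asked */

/* Stand-in for iii_lookup: only "passwd" exists, in directory 1. */
static long toy_fs_lookup(int parent, const char *name)
{
    toy_fs_lookups++;
    return (parent == 1 && strcmp(name, "passwd") == 0) ? 42 : 0;
}

/* Stand-in for d_lookup: search the cache. */
static struct toy_dentry *toy_d_lookup(int parent, const char *name)
{
    for (int i = 0; i < TOY_DCACHE_SIZE; i++) {
        struct toy_dentry *d = &toy_dcache[i];
        if (d->used && d->parent == parent && strcmp(d->name, name) == 0)
            return d;
    }
    return NULL;
}

/* Stand-in for d_add: remember the result, positive or negative. */
static struct toy_dentry *toy_d_add(int parent, const char *name, long ino)
{
    for (int i = 0; i < TOY_DCACHE_SIZE; i++) {
        struct toy_dentry *d = &toy_dcache[i];
        if (!d->used) {
            d->used = 1;
            d->parent = parent;
            strcpy(d->name, name);  /* length checked by the caller */
            d->ino = ino;
            return d;
        }
    }
    return NULL;                    /* cache full; a real cache would prune */
}

/* Returns the inode number for `name` in `parent`, or 0 if it does not exist. */
long toy_namei(int parent, const char *name)
{
    assert(name != NULL && strlen(name) < TOY_NAME_MAX);

    struct toy_dentry *d = toy_d_lookup(parent, name);
    if (d == NULL) {
        long ino = toy_fs_lookup(parent, name);
        d = toy_d_add(parent, name, ino);
        if (d == NULL)
            return ino;
    }
    return d->ino;
}
```

After the first miss for a name, repeated lookups of the same missing name are answered from the cache without asking the filesystem again.
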
29
Dentry methods
  • ddd_revalidate: can force new lookup
  • ddd_hash: compute hash value of name
  • ddd_compare: are names equal?
  • ddd_delete, ddd_put, ddd_iput: FS cleanup
    opportunity

30
Dentry particulars
  • ddd_hash and ddd_compare have to deal with
    extraordinary cases for msdos/vfat
    • case insensitive
    • long and short filename pleasantries
  • ddd_revalidate -- can force new lookup if inode
    not in use
    • used for NFS/SMBfs aging
    • used for Coda/AFS callbacks
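
For the case insensitive part, the key invariant is that any two names the compare method treats as equal must also hash to the same value, otherwise the cache would look in the wrong chain. A minimal sketch, assuming ASCII names; the function names and the particular hash (a djb2-style string hash) are choices made for this example, not the vfat implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hash a name after folding it to lower case. */
unsigned long toy_d_hash(const char *name)
{
    assert(name != NULL);
    unsigned long h = 5381;
    for (; *name != '\0'; name++)
        h = h * 33 + (unsigned long)tolower((unsigned char)*name);
    return h;
}

/* Compare two names ignoring case.
 * Returns 0 if they are equal, non-zero otherwise. */
int toy_d_compare(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 1;
        a++;
        b++;
    }
    return *a != *b;                /* equal only if both ended together */
}
```

Both functions fold case the same way, which is what keeps the hash and the comparison consistent with each other.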

31
Style
"Dijkstra probably hates me"
  • Linus Torvalds
32
Memory mapping
  • vm_area structure has
    • vm_operations
    • inode, addresses etc.
  • vm_operations
    • map, unmap
    • swapin, swapout
    • nopage -- read when page isn't in VM
  • mmap
    • calls on iii_readpage
    • keeps a use count on the inode until unmap
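
The effect of the use count is visible from user level: a mapping stays valid after the file descriptor is closed and even after the name is removed. A small sketch assuming a POSIX system; the helper name mapped_after_close is made up for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Write `text` to a new file at `path`, map the file, then close the
 * descriptor and unlink the name before reading through the mapping.
 * Returns 1 if the mapped bytes match `text`, 0 if not, -1 on error. */
int mapped_after_close(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);
    size_t len = strlen(text);
    assert(len > 0);                /* zero-length mappings are rejected */

    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, text, len) != (ssize_t)len) {
        close(fd);
        unlink(path);
        return -1;
    }

    char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping holds its own reference */
    unlink(path);                   /* the name can go as well */
    if (p == MAP_FAILED)
        return -1;

    int same = (memcmp(p, text, len) == 0);   /* pages are read on demand */
    munmap(p, len);                 /* last reference dropped here */
    return same;
}
```

Only the final munmap releases the file, which matches the "until unmap" remark on the slide.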