Title: Distributed File Systems
Chapter 6
Topics
- Review of UNIX
- Sun NFS
- VFS architecture
- Caching
Layered Structure
- Directory service
- Mapping file name → unique file ID
- Access control
- File service
- Mapping file ID → i-node
- File access
- Block service
- Block management
- Device access
(Figure: the layered structure, with the directory service on top of the file service, which sits on top of the block service.)
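A minimal C sketch of how the three layers could be wired together; all names and signatures below are illustrative, not taken from any particular system.

#include <stddef.h>
#include <sys/types.h>

typedef unsigned long file_id_t;   /* unique file ID handed out by the directory service */
typedef unsigned long block_no_t;  /* physical block number managed by the block service */

/* Directory service: map a file name to a unique file ID, enforcing access control. */
file_id_t dir_lookup(const char *path, uid_t uid);

/* File service: map a file ID to its i-node and perform file-level access. */
struct inode *file_get_inode(file_id_t id);
ssize_t file_read(file_id_t id, void *buf, size_t len, off_t offset);

/* Block service: manage blocks and access the device. */
block_no_t block_alloc(void);
int block_read(block_no_t b, void *buf);
int block_write(block_no_t b, const void *buf);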
Hierarchical Directory Systems
- A general hierarchy: a tree of directories
(Figure: a directory tree, with the root directory at the top and user directories, further subdirectories, and files nested beneath.)
File System Layout
- A disk is divided into several partitions
- Each partition has one file system
- MBR (master boot record)
- Boots the computer; contains the partition table
- Partition table
- Starting and ending addresses of each partition
- One partition is marked as active
- Within each partition
- Boot block: the first block; holds a program that loads the OS
- Superblock: key parameters of the file system
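As a concrete illustration of the layout above, here is a C sketch of the classic 512-byte MBR: 446 bytes of boot code, a four-entry partition table, and a 2-byte signature. The struct and field names are illustrative; the offsets and sizes follow the standard PC MBR layout. The per-partition boot block and superblock live inside each partition and are file-system specific, so they are not part of this structure.

#include <stdint.h>

struct mbr_partition_entry {            /* 16 bytes each, 4 entries at offset 446 */
    uint8_t  status;                    /* 0x80 = active (bootable), 0x00 = inactive */
    uint8_t  chs_first[3];              /* CHS address of the first sector (legacy) */
    uint8_t  type;                      /* partition / file-system type code */
    uint8_t  chs_last[3];               /* CHS address of the last sector (legacy) */
    uint32_t lba_first;                 /* starting sector address (LBA) */
    uint32_t sector_count;              /* number of sectors in the partition */
} __attribute__((packed));

struct mbr {
    uint8_t  boot_code[446];            /* program that boots the computer */
    struct mbr_partition_entry part[4]; /* the partition table */
    uint16_t signature;                 /* 0xAA55 boot signature */
} __attribute__((packed));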
Implementing Files
- Key issue: how to keep track of which disk sectors go with which file
- E.g., with a block size of 512 B and a file size of 2014 B, where are these ⌈2014/512⌉ = 4 blocks on disk?
- Many methods
- Contiguous allocation
- Linked list allocation
- I-nodes
- Each one has its own pros and cons
Index Nodes (i-nodes)
- An i-node lists the attributes and the disk addresses of the file's blocks
- An i-node needs to be in memory only while the file is open
- Much smaller than a FAT
- Its size is independent of the size of the disk
(Figure: an example i-node in which the last entry points to a disk block containing additional disk addresses.)
i-node and 3-level index
(Figure: an i-node with 12 direct block pointers plus single, double, and triple indirect pointers (entries 13, 14, and 15); each 4 KB indirect block holds 1K pointers.)
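Assuming the figure shows the classic UNIX layout (12 direct pointers, then single, double, and triple indirect blocks, 4 KB blocks, 1K pointers per indirect block), the maximum file size can be computed with a few lines of C:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    const uint64_t block = 4096, ptrs = 1024, direct = 12;
    /* direct blocks + single + double + triple indirect blocks */
    uint64_t blocks = direct + ptrs + ptrs * ptrs + ptrs * ptrs * ptrs;
    printf("max file size = %llu bytes (about %llu GB)\n",
           (unsigned long long)(blocks * block),
           (unsigned long long)((blocks * block) >> 30));
    return 0;
}

With these parameters the triple indirect block dominates, giving a limit of roughly 4 TB.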
Managing open files in the File Service layer
Implementing Directories
- The directory system maps the ASCII file name onto the information needed to locate the data
- Directory entry
- Where are the attributes stored?
- In the directory entry (MS-DOS/Windows)
- In the i-node (UNIX)
(Figure: a DOS/Windows directory entry that holds the attributes itself, versus a UNIX directory entry that holds only the i-node number; the attributes live in the i-node.)
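The contrast can be sketched with two C structs; the field names and sizes here are illustrative rather than the exact on-disk formats.

#include <stdint.h>

/* MS-DOS/Windows (FAT) style: the attributes live in the directory entry itself. */
struct dos_dir_entry {
    char     name[8], ext[3];
    uint8_t  attributes;
    uint16_t time, date;
    uint16_t first_block;   /* where the file's data starts */
    uint32_t size;
};

/* UNIX style: the entry is only a name plus an i-node number;
   all attributes are stored in the i-node. */
struct unix_dir_entry {
    uint32_t inode;
    char     name[252];
};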
Implementing Directories: Example
(Figure: an example of directory contents and i-nodes: each directory block holds '.', '..', and name-to-i-node-number entries such as bin, usr, local, vmunix, and foo, and each i-node records a link count.)
Locate A File: /usr/ast/mbox
- Looking up usr in the root directory yields i-node 6
- I-node 6 says that /usr is in block 132
- Looking up ast in block 132 (the /usr directory) yields i-node 26
- I-node 26 says that /usr/ast is in block 406
- Looking up mbox in block 406 (the /usr/ast directory) yields i-node 60 for /usr/ast/mbox
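A minimal sketch of the lookup loop these steps describe; dir_search() is an assumed helper that scans one directory for a name and returns the bound i-node number.

#include <string.h>

#define ROOT_INO 2   /* i-node number of the root directory (2 in many UNIX file systems) */

/* Assumed helper: search the directory held in i-node 'dir_ino' for 'name';
   return the i-node number it maps to, or 0 if the name is absent. */
unsigned dir_search(unsigned dir_ino, const char *name);

unsigned namei(const char *path) {            /* e.g. "/usr/ast/mbox" */
    unsigned ino = ROOT_INO;
    char buf[256], *comp;
    strncpy(buf, path, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    for (comp = strtok(buf, "/"); comp != NULL; comp = strtok(NULL, "/")) {
        ino = dir_search(ino, comp);          /* "usr" -> 6, "ast" -> 26, "mbox" -> 60 */
        if (ino == 0)
            return 0;                         /* component not found */
    }
    return ino;                               /* i-node of the final component */
}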
How to Share A File?
- If the directory entry holds the addresses of the blocks
- What about newly appended blocks?
- Addresses of disk blocks stored separately
- The UNIX i-node approach (hard links)
- Symbolic linking: create a link file containing the path name (see the example after the figure)
(Figure: three arrangements: a directory entry that contains the disk addresses of File 1 directly; Dir B and Dir C both pointing to the same i-node for File 1; and symbolic linking, where one directory holds a link file containing the path name ../Dir C/File1.)
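The two sharing mechanisms map onto the standard POSIX calls link() and symlink(). The directory and file names below are placeholders matching the figure.

#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Hard link (the UNIX i-node approach): DirB/File1 and DirC/File1
       now name the same i-node, whose link count becomes 2. */
    if (link("DirC/File1", "DirB/File1") == -1)
        perror("link");

    /* Symbolic link: DirB/File1.sym is a small file that merely stores a path name. */
    if (symlink("../DirC/File1", "DirB/File1.sym") == -1)
        perror("symlink");
    return 0;
}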
Caching
- Reserve a set of blocks in main memory as a cache of disk sectors
- How does the cache work?
- Maintenance of the cache
- Like page replacement: FIFO, LRU, etc.
(Figure: the buffer cache organized as a hash table for lookup plus an LRU list, with the least recently used block at the front and the most recently used at the rear.)
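A compact sketch of such a cache, assuming illustrative names: a hash table for fast lookup and a doubly linked LRU list whose rear end holds the most recently used block.

#include <stddef.h>

#define NBUCKETS 64
#define BLKSIZE  4096

struct buf {
    unsigned long blockno;
    char data[BLKSIZE];
    struct buf *hash_next;        /* chain within one hash bucket */
    struct buf *prev, *next;      /* position in the LRU list */
};

static struct buf *bucket[NBUCKETS];
static struct buf *front, *rear;  /* front = LRU end, rear = MRU end */

static void lru_touch(struct buf *b) {
    /* unlink from its current position ... */
    if (b->prev) b->prev->next = b->next; else front = b->next;
    if (b->next) b->next->prev = b->prev; else rear = b->prev;
    /* ... and append at the rear (most recently used) */
    b->prev = rear; b->next = NULL;
    if (rear) rear->next = b; else front = b;
    rear = b;
}

struct buf *cache_lookup(unsigned long blockno) {
    struct buf *b = bucket[blockno % NBUCKETS];
    while (b && b->blockno != blockno)
        b = b->hash_next;
    if (b) lru_touch(b);          /* hit: move to the MRU end */
    return b;                     /* miss: caller reads from disk and inserts (omitted) */
}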
Write Important Blocks Back First
- Write critical blocks back to disk immediately after they are updated (write-through)
- Greatly reduces the probability of inconsistency
- Write-through cache: modified blocks are written back immediately
- Compare with delayed write
- Don't keep data blocks in memory for too long
- Force synchronization periodically (e.g., every 30 seconds); see the example below
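On a UNIX client this policy corresponds to the standard POSIX calls below; the function name and path are illustrative.

#include <fcntl.h>
#include <unistd.h>

int update_critical(const char *path, const void *buf, size_t len) {
    /* O_SYNC: each write() returns only after the data has reached the disk
       (write-through), instead of lingering in the cache as a delayed write. */
    int fd = open(path, O_WRONLY | O_SYNC);
    if (fd == -1)
        return -1;
    ssize_t n = write(fd, buf, len);
    if (n >= 0)
        fsync(fd);   /* alternative: write normally, then force the blocks out */
    return close(fd);
}
/* The periodic flush mentioned above corresponds to sync(), traditionally
   invoked about every 30 seconds by an update daemon. */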
Block Read Ahead
- If a file is being read sequentially, read block k+1 while block k is still in use by the process
- If a file is accessed randomly, read ahead wastes bandwidth
- Detect the access pattern for each open file
- Switch read ahead on or off according to the current pattern (see the sketch below)
- Q: how can this be done with stateless vs. stateful servers?
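A sketch of the per-open-file bookkeeping this implies; prefetch_block() is an assumed helper that issues an asynchronous read, and on a stateless server this state would have to be kept by the client rather than the server.

struct open_file {
    unsigned long last_block;   /* last block handed to the process */
    int sequential;             /* current guess about the access pattern */
};

void prefetch_block(unsigned long blockno);     /* assumed helper: asynchronous read */

void note_read(struct open_file *f, unsigned long k) {
    f->sequential = (k == f->last_block + 1);   /* detect sequential access */
    f->last_block = k;
    if (f->sequential)
        prefetch_block(k + 1);                  /* read ahead only while sequential */
}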
Mapping file systems to physical devices
Mounting
man mount
- Mount attaches a file system to the file system hierarchy at the mount_point, which is the pathname of a directory. If mount_point has any contents prior to the mount operation, these are hidden until the file system is unmounted.
- The table of currently mounted file systems can be found by examining the mounted file system information file. This is provided by a file system that is usually mounted on /etc/mnttab.
NFS Architecture
Stateless File Server
- Robust in the face of failures, but
- Not all operations are idempotent
- E.g., the lock operation
- Longer messages
- Longer processing time
Transparency
- Location transparency
- The path name (i.e., the full name of the file) does not say where the file is located.
- Location independence
- The path name is independent of the server, so a file can be moved from server to server without changing its name.
- Have a namespace of files and then have some (dynamically) assigned to certain servers. This namespace would be the same on all machines in the system.
- Root transparency
- A made-up name
- / is the same on all systems
- This would break some conventions, like /tmp
NFS Protocols
- Mounting
- Analyze the path name
- Request and store the file handle
- Static mounting vs. automounting
- Directory and file access
- Supports most UNIX calls
- No support for open() and close()
VFS/v-node Architecture
- Motivation: allow a common file server to be shared by an arbitrary collection of clients and servers
- Requires a file-system-independent framework for file access
- A v-node (virtual i-node) in the VFS layer for every open file
- Records whether the directory or file is local or remote
- For a remote file, contains a pointer to an r-node (remote i-node) in the NFS client
- A VFS object can represent any file system
- Well-defined interface
- One per mounted file system
Virtual File System
v-node
- Data fields (struct vnode)
- Methods (struct vnodeops)
(Figure: the FS-independent part of a v-node, with fields v_flag, v_count, v_type, v_vfsmountedhere, v_op, and v_data, where v_op points to the FS-dependent implementation of the vnodeops interface (vop_open, vop_lookup, vop_read, vop_mkdir, vop_getattr), shared among all v-nodes of that file-system type, and v_data points to FS-dependent data such as an r-node.)
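A simplified C sketch of these two structures, following the field names on the slide (modelled loosely on the classic SunOS/Solaris definitions; the argument lists are illustrative).

struct vnode;
struct vfs;
struct vattr;

struct vnodeops {                        /* FS-dependent implementation, shared by all
                                            v-nodes of the same file-system type */
    int (*vop_open)(struct vnode *vp, int flags);
    int (*vop_lookup)(struct vnode *dvp, const char *name, struct vnode **vpp);
    int (*vop_read)(struct vnode *vp, void *buf, unsigned len, long long off);
    int (*vop_mkdir)(struct vnode *dvp, const char *name);
    int (*vop_getattr)(struct vnode *vp, struct vattr *vap);
};

struct vnode {                           /* FS-independent part, one per open file */
    unsigned short   v_flag;
    unsigned int     v_count;            /* reference count */
    int              v_type;             /* regular file, directory, ... */
    struct vfs      *v_vfsmountedhere;   /* set if a file system is mounted on this directory */
    struct vnodeops *v_op;               /* methods supplied by the file system */
    void            *v_data;             /* FS-dependent data, e.g. an i-node or an r-node */
};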
VFS implementation
- Data fields (struct vfs)
- Methods (struct vfsops)
(Figure: the FS-independent part of a struct vfs, with fields vfs_next, vfs_fstype, vfs_vnodecovered, vfs_op, and vfs_data, where vfs_op points to the FS-dependent implementation of the vfsops interface (vfs_mount, vfs_unmount, vfs_root, vfs_statvfs, vfs_sync) and vfs_data points to FS-dependent data.)
Struct vfs instance
- vfs_data
- vfs_op
- vfs_next: pointer to the next mounted file system
- vfs_fstype: ufs, nfs, ext2fs, etc.
Mounting
(Figure: rootvfs points to the vfs of the root file system; the vfs of the mounted file system covers the /usr v-node of the root file system, while that v-node records that a file system is mounted here; each v-node, including each ROOT v-node, belongs to its own vfs. v-nodes for mounted-on directories are kept in main memory.)
Implementation
- Server: exports one or more of its directories for access by remote clients
- Listed in the /etc/exports file, e.g.,
- /usr/local, with access restricted to hostA and hostB
- /usr/bin, exported read-only (ro)
- Client: mounts the exported directories
- They become part of its directory hierarchy
- No difference between a local file and a remote file
- Two clients can communicate by sharing files in their common directories.
Mount A Remote File System
- Call the mount program, specifying the remote directory and the local mount point
- E.g., mount -t msdos /dev/ad0s1 /mnt/windows
- E.g., mount indus:/usr/src /usr/src
- Parse the name and find the server
- Contact the server
- Receive the file handle
- Create a v-node for the remote directory in the VFS layer
- Create an r-node in the NFS client, pointed to by the v-node
Mount (1)
Mounting (part of) a remote file system in NFS.
Mount (2)
Mounting nested directories from multiple servers in NFS.
Automounting (1)
ps -fe | grep automount
Automounting (2)
- Using symbolic links with automounting.
- Can also be used with file replication.
Open A Remote File
- Parse the file name
- Get the v-node and r-node of the mounted file system
- Ask the NFS client to open the file
- Contact the server and get the file handle for the opened file
- The NFS client creates an r-node for the file
- The VFS layer creates a v-node for the file
File Attributes (1)
- Some general mandatory file attributes in NFS.
File Attributes (2)
- Some general recommended file attributes.
Semantics of File Sharing (1)
- On a single processor, when a read follows a write, the value returned by the read is the value just written.
- In a distributed system with caching, obsolete values may be returned.
Semantics of File Sharing (2)
Session semantics: changes to an open file are initially visible only to the processes on the same machine. Only when the file is closed do the changes become visible to other machines.
UNIX Semantics
- Probably UNIX doesn't quite do this.
- If a write is large (several blocks), a seek is done for each block.
- During a seek, the process sleeps (in the kernel).
- Another process can be writing a range of blocks that intersects the blocks of the first write.
- The result could be (depending on disk scheduling) that the file does not reflect a single last write.
- Perhaps UNIX semantics means: a read returns the value stored by the last write, provided one exists.
File Locking in NFS
More complicated with file replication.
Client Caching (1)
- Q: where should the cache be placed? (a) user space, or (b) kernel space?
Client Caching (2)
- Using the NFS version 4 callback mechanism to recall a file delegation.
Lease
- When a client wants a file, the server grants a lease on it that specifies how long the copy is valid
- The client renews the lease before it expires
- No message is sent when a lease expires
- What if the client crashes?
- What if the server crashes?
- Lease time vs. reboot time
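A minimal sketch of the client-side bookkeeping; names and the renewal margin are illustrative. One common policy for the server-crash question is for a rebooted server to wait one maximum lease period before granting new leases, so stale leases expire first.

#include <time.h>

struct lease {
    time_t expires;   /* deadline granted by the server */
};

/* The cached copy may be used only while the lease is valid. */
int lease_valid(const struct lease *l) {
    return time(NULL) < l->expires;
}

/* Renew shortly before expiry; if the client has crashed, it simply stops
   renewing and the lease lapses on its own. */
int lease_needs_renewal(const struct lease *l, int margin_sec) {
    return time(NULL) >= l->expires - margin_sec;
}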
Cache Management Algorithms
General Principles for DS
- Proposed by Satyanarayanan
- Clients have cycles to burn
- Cache whenever possible
- Exploit the usage properties
- Minimize system-wide knowledge and change
- Trust the fewest possible entities
- Batch work where possible
Possible Trends
- Main memory file system
- Fiber optic network
- Effects on cache
- Mobile users
- Disconnection
- Geographic location
- Multimedia application
- VOD