Title: Naming and Directories
Recall from last time
- File system components
- Disk management organizes disk blocks into files.
- Many disk block management schemes
- A file header associates the file with its data blocks
- Naming provides file names and directories to users.
- Protection
- Reliability
File Header Storage
- Under UNIX, a file header is stored in a data structure called an i-node
- For early UNIX systems
- I-nodes are stored in a special array
- Fixed number of array entries
- Maximum number of files is fixed
- Not stored near data blocks on disk
- Reading a small file involves
- One disk seek to get the i-node
- Other disk seek(s) to get file blocks
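
The fixed-size i-node array can be pictured with a small sketch. This is a toy model, not real UNIX code: the class name `InodeTable`, the dictionary-shaped headers, and the table size of two are all assumptions made for illustration.

```python
class InodeTable:
    """Toy model of an early-UNIX-style i-node array with a fixed size."""

    def __init__(self, num_entries):
        # One slot per possible file; None marks a free slot.
        self.slots = [None] * num_entries

    def allocate(self, header):
        """Store a file header in the first free slot; return its i-node number."""
        for inode_number, slot in enumerate(self.slots):
            if slot is None:
                self.slots[inode_number] = header
                return inode_number
        # The array cannot grow, so the number of files is capped.
        raise OSError("no free i-nodes: the maximum number of files is fixed")


table = InodeTable(num_entries=2)          # hypothetical, tiny table
first = table.allocate({"size": 10, "blocks": [7]})
second = table.allocate({"size": 20, "blocks": [8, 9]})
try:
    table.allocate({"size": 0, "blocks": []})
    third_ok = True
except OSError:
    third_ok = False                       # the table is already full
```

Once both slots are taken, the third allocation fails even if the disk still has free data blocks, which is the "maximum number of files is fixed" limitation.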
Reasons for Separate Allocations
- Reliability
- Data corruption is unlikely to affect i-nodes
- Reduced fragmentation
- File headers are smaller than a whole block
- By packing them in an array, multiple headers can be fetched from disk in one read
- File headers are accessed more often
- e.g., ls
- Grouping file headers improves disk efficiency
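
A quick back-of-the-envelope calculation shows why packing headers helps a command such as ls. The block size and header size below are assumed values picked for illustration; the slides do not give them.

```python
BLOCK_SIZE = 4096   # bytes per disk block (assumed)
HEADER_SIZE = 128   # bytes per file header (assumed)

# Number of headers that one block read brings into memory.
headers_per_block = BLOCK_SIZE // HEADER_SIZE


def reads_needed(num_files):
    """Block reads needed to fetch num_files headers packed back to back."""
    return -(-num_files // headers_per_block)   # ceiling division


packed_reads = reads_needed(100)   # headers packed in an array
scattered_reads = 100              # one read per header if each sat in its own block
```

Under these assumptions, listing 100 files costs 4 block reads with packed headers instead of 100 reads with scattered ones.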
For BSD 4.2
- Portions of the file header array are stored on each cylinder
- For small directories
- All file headers and data are stored in the same cylinder
- Reduces seek time
Naming
- Naming allows users to issue file names instead of i-node numbers
- A mapping from names (paths) to i-nodes
- Similar to DNS on the Internet, which maps host names to IP addresses
Directories
- A table of file names and their i-node numbers
- Under many file systems
- Directories are implemented as normal files
- Containing file names and i-node numbers
- Only the OS is permitted to modify directories
- Is this right?
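
To make "a directory is a normal file containing names and i-node numbers" concrete, here is a minimal sketch that serializes such a table into bytes and scans it. The 16-byte entry layout (2-byte i-node number plus 14-byte name) and the function names are assumptions chosen for illustration, not the format of any particular file system discussed here.

```python
import struct

# Assumed on-disk entry: little-endian 2-byte i-node number + 14-byte name.
ENTRY = struct.Struct("<H14s")


def pack_directory(entries):
    """Serialize {name: inode_number} into the bytes of a directory file."""
    return b"".join(
        ENTRY.pack(inode, name.encode("ascii"))   # short names are NUL-padded
        for name, inode in entries.items()
    )


def lookup(directory_bytes, name):
    """Scan the directory file linearly; return the i-node number or None."""
    for offset in range(0, len(directory_bytes), ENTRY.size):
        inode, raw_name = ENTRY.unpack_from(directory_bytes, offset)
        if raw_name.rstrip(b"\x00").decode("ascii") == name:
            return inode
    return None


# Hypothetical root directory with two entries.
root = pack_directory({"pets": 12, "notes.txt": 31})
pets_inode = lookup(root, "pets")
missing = lookup(root, "dogs")
```

Because the directory is just bytes in a file, a stray user write could corrupt the name-to-i-node mapping, which is one reason modification is usually reserved for the OS.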
Name Space
- Flat name space
- Hierarchical naming
- Relational name space
- Contextual naming
- Content-based naming
Flat Name Space
- All files are stored in a single directory
- Easy to implement
- Drawback: not scalable for large directories
- Drawback: name collisions (multiple files with the same name)
Hierarchical Naming
- Uses multiple levels of directories
- Most popular name space organization
- Conceptual model maps well onto the human model of organizing things
- A file cabinet contains many files
- Scalable
- The probability of name collisions decreases
- Spatial locality
- Store all files under a directory within a cylinder to avoid disk seeks
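
The difference in name collisions between a flat and a hierarchical name space can be sketched with plain dictionaries. The helper `create` and the directory names are hypothetical.

```python
def create(directory, name, inode):
    """Add name to one directory (a dict); fail if the name is already taken."""
    if name in directory:
        raise FileExistsError(name)
    directory[name] = inode


# Flat: every file shares one directory, so a second "cat.jpg" collides.
flat = {}
create(flat, "cat.jpg", 1)
try:
    create(flat, "cat.jpg", 2)
    flat_collision = False
except FileExistsError:
    flat_collision = True

# Hierarchical: each directory is its own table, so the same name can recur.
root = {"pets": {}, "photos": {}}
create(root["pets"], "cat.jpg", 1)
create(root["photos"], "cat.jpg", 2)
```

Each directory only has to keep names unique among its own entries, which is why collisions become less likely as files spread across directories.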
More on Hierarchical Naming
- An absolute path name consists of the path from the root directory / to the file
- e.g., /pets/cat.jpg
- / is the root directory, pets is a subdirectory, and cat.jpg is the file name
Drawbacks of Hierarchical Naming
- Not all files fit into the hierarchical model
- Accessing a file may involve many levels of directory lookups (a path resolution) before getting to the file content
An Example of Path Resolution
- To access the data content of /pets/cat.jpg
- The system needs to perform the following disk I/Os
- 1. Read the file header for the root directory /
- Stored at a fixed location on disk
- 2. Read the first data block for the root directory
- Look up the directory entry for pets
- 3. Read the file header for pets
- 4. Read the first data block for the pets directory
- Look up the directory entry for cat.jpg
- 5. Read the file header for cat.jpg
- 6. Read the data block for cat.jpg
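
The six disk I/Os can be traced with a small simulation. This is a sketch under stated assumptions: the `Disk` class, the i-node and block numbers, and the single-block-per-file layout are all invented for illustration, and the resolver handles only absolute paths that name a file below the root.

```python
class Disk:
    """Toy disk: headers and data blocks live in dicts; every read is logged."""

    def __init__(self, inodes, blocks):
        self.inodes = inodes   # i-node number -> {"block": block number}
        self.blocks = blocks   # block number -> directory dict or file bytes
        self.io_log = []

    def read_inode(self, number):
        self.io_log.append(("header", number))
        return self.inodes[number]

    def read_block(self, number):
        self.io_log.append(("data", number))
        return self.blocks[number]


ROOT_INODE = 0  # assumed fixed location of the root directory's header


def read_file(disk, path):
    """Resolve an absolute path and return the file's first data block."""
    header = disk.read_inode(ROOT_INODE)
    for component in path.strip("/").split("/"):
        directory = disk.read_block(header["block"])     # directory contents
        header = disk.read_inode(directory[component])   # next file header
    return disk.read_block(header["block"])


# Hypothetical layout: / is i-node 0, pets is i-node 5, cat.jpg is i-node 9.
disk = Disk(
    inodes={0: {"block": 100}, 5: {"block": 101}, 9: {"block": 102}},
    blocks={100: {"pets": 5}, 101: {"cat.jpg": 9}, 102: b"<jpeg bytes>"},
)
data = read_file(disk, "/pets/cat.jpg")
```

With this layout, `disk.io_log` ends up with six entries that alternate between header reads and data reads, one per step in the list above.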
Some Performance Optimizations
- Top-level directories are usually cached
- A user inside a directory (e.g., /pets)
- Can issue relative path names (e.g., cat.jpg) to refer to files within the current directory
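
Both optimizations can be illustrated with a simplified resolver that counts only directory reads (header reads are ignored to keep the sketch short). The directory contents, the `cache` dictionary, and the function names are hypothetical.

```python
import posixpath

# Toy directory tree: directory path -> {name: i-node number} (made-up data).
DIRECTORIES = {"/": {"pets": 5}, "/pets": {"cat.jpg": 9, "dog.jpg": 10}}

disk_reads = 0
cache = {}  # directory path -> entries already read from "disk"


def read_directory(path):
    """Return a directory's entries, touching the disk only on a cache miss."""
    global disk_reads
    if path not in cache:
        disk_reads += 1
        cache[path] = DIRECTORIES[path]
    return cache[path]


def resolve(path, cwd="/"):
    """Map an absolute or cwd-relative path to an i-node number."""
    current = "/" if path.startswith("/") else cwd   # relative paths start at cwd
    components = [c for c in path.split("/") if c]
    for component in components[:-1]:
        read_directory(current)[component]           # KeyError if missing
        current = posixpath.join(current, component)
    return read_directory(current)[components[-1]]


dog = resolve("dog.jpg", cwd="/pets")   # relative: skips the root directory
reads_relative = disk_reads
cat = resolve("/pets/cat.jpg")          # absolute: must also read /
reads_absolute = disk_reads
resolve("/pets/cat.jpg")                # repeated: served from the cache
reads_repeat = disk_reads
```

The relative lookup reads only /pets, the first absolute lookup adds one read for /, and the repeated lookup adds none because both directories are cached.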
Relational Name Space
- The hierarchical naming model is largely a tree
- One step beyond is the relational naming model, which allows the construction of general graphs
- A file can belong to multiple folders
- According to its attributes
- Files can be accessed in a manner similar to relational databases
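
A minimal sketch of attribute-based access, loosely in the spirit of a relational query. The file records, attribute names, and the `select` helper are all hypothetical.

```python
# Each file is a row of attributes rather than a position in a tree.
FILES = [
    {"name": "cat.jpg", "type": "photo", "subject": "cat", "year": 2019},
    {"name": "dog.jpg", "type": "photo", "subject": "dog", "year": 2019},
    {"name": "cat.txt", "type": "notes", "subject": "cat", "year": 2021},
]


def select(files, **criteria):
    """Return the names of files whose attributes match every criterion."""
    return [
        f["name"]
        for f in files
        if all(f.get(key) == value for key, value in criteria.items())
    ]


cat_files = select(FILES, subject="cat")            # the "cat" folder
photos_2019 = select(FILES, type="photo", year=2019)  # the "2019 photos" folder
```

Here cat.jpg shows up in both result sets, which is the sense in which one file can belong to multiple folders according to its attributes.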
Pros and Cons of Relational Name Space
- More flexible than hierarchical naming
- Drawback: may require a long list of attributes to name a single piece of data
- e.g., this lecture
- Keywords: operating systems, file systems, naming, PowerPoint XP
- Drawback: who will create those attributes?
Contextual Naming
- Takes advantage of the observation that certain attributes can be added automatically
- e.g., when you try to open a file with Word, the system searches only the file types supported by Word (.doc, .txt, .html)
- Avoids a long list of attributes
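
The Word example can be sketched as a filter whose file-type attribute comes from the context (the application) rather than from the user. The `SUPPORTED` table and the `candidates` helper are hypothetical; the extension list is the one given above.

```python
import os.path

# Context supplied automatically by the application doing the opening.
SUPPORTED = {"word": {".doc", ".txt", ".html"}}


def candidates(app, names):
    """Keep only the files whose extension the given application supports."""
    allowed = SUPPORTED[app]
    return [n for n in names if os.path.splitext(n)[1].lower() in allowed]


listing = ["report.doc", "cat.jpg", "notes.txt", "index.HTML"]
shown_in_word = candidates("word", listing)
```

The user never types a file-type attribute; the open dialog's context narrows the search on its own.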
Content-Based Naming
- Searches for a file by its content instead of its name
- File contents are extracted automatically
- e.g., "I want a photo of a cat taken five years ago"
- The system returns all files satisfying the criteria
Content-Based Naming
- Drawback: requires advanced information processing techniques
- e.g., image recognition
- Many existing systems use manual indexing
- Automated content-based naming is still an active area of research
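
For plain text, automatic content extraction can be as simple as a word index; photos would need the image recognition mentioned above, which this sketch does not attempt. The file contents and the helper names `build_index` and `search` are made up for illustration.

```python
from collections import defaultdict


def build_index(files):
    """Map each lowercase word to the set of file names containing it."""
    index = defaultdict(set)
    for name, text in files.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def search(index, *words):
    """Return the files whose content contains every query word."""
    sets = [index.get(word.lower(), set()) for word in words]
    return sorted(set.intersection(*sets)) if sets else []


files = {
    "a.txt": "my cat sleeps",
    "b.txt": "the cat and the dog",
    "c.txt": "dog food",
}
index = build_index(files)
cat_hits = search(index, "cat")
cat_and_dog_hits = search(index, "cat", "dog")
```

The query never mentions a file name: files are located purely by what they contain.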
Example: The Internet File System
- Can be viewed as a worldwide file system
- What is the naming scheme for the Internet file system?
The Internet File System
- Contains shades of various naming schemes
- Flat name space
- Each website provides a unique name
- Hierarchical name space
- Within individual websites
- Relational name space
- Can search the Internet via search engines
- Contextual name space
- Pages are ranked according to relevance
- Content-based name space
- You can find your information without knowing the exact file names
Example: Plan 9
- Modern UNIX has been deeply influenced by the Plan 9 OS
- Developed by Bell Labs
- Major design philosophy: everything is a file
- A single hierarchical name space for
- Processes (e.g., /proc)
- Files
- IPC (e.g., pipe)
- Devices (e.g., /dev/fd0)
- Use open/close/read/write for everything
- e.g., /dev/mem
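
The "same calls for everything" idea can be seen from a program's side with file descriptors. This sketch uses the POSIX-style calls exposed by Python's `os` module on an ordinary system, not Plan 9 itself, and the payload is arbitrary.

```python
import os
import tempfile

# IPC: a pipe is just a pair of file descriptors.
pipe_read, pipe_write = os.pipe()
os.write(pipe_write, b"hello")
via_pipe = os.read(pipe_read, 5)
os.close(pipe_read)
os.close(pipe_write)

# Regular file: the same write/read/close calls work unchanged.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello")
os.lseek(fd, 0, os.SEEK_SET)   # rewind so the read starts at the beginning
via_file = os.read(fd, 5)
os.close(fd)
os.remove(path)                # clean up the temporary file
```

The only file-specific step is the seek back to the start; writing, reading, and closing look identical for the pipe and the file.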