Disks and Files - PowerPoint PPT Presentation

About This Presentation
Title:

Disks and Files

Description:

Title: Processes, Threads and Address Spaces Author: Kai Li Last modified by: Valued Sony Customer Created Date: 6/17/1995 11:31:02 PM Document presentation format – PowerPoint PPT presentation

Slides: 30
Provided by: Kai45
1
Disks and Files
  • Vivek Pai
  • Princeton University

2
Why Files
  • Physical reality
  • Block oriented
  • Physical sector numbers
  • No protection among users of the system
  • Data might be corrupted if machine crashes
  • Filesystem model
  • Byte oriented
  • Named files
  • Users protected from each other
  • Robust to machine failures

3
File Structures
  • Byte sequence
  • Read or write a number of bytes
  • Unstructured or linear
  • Record sequence
  • Fixed or variable length
  • Read or write a number of records
  • Tree
  • Records with keys
  • Read, insert, delete a record (typically using
    B-tree)

4
File Structures Today
  • Stream of bytes
  • Simplest to implement in kernel
  • Easy to manipulate in other forms
  • Little performance loss
  • More complicated structures
  • Hardware assist fell out of favor
  • Special-purpose hardware slower, costly
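
As a rough illustration of "easy to manipulate in other forms", fixed-length records can be layered on a plain byte stream in user code. The sketch below assumes a hypothetical record layout (a 4-byte key plus a 12-byte name) and uses an in-memory stream in place of a real file; none of these names come from the slides.

```python
import io
import struct

# Hypothetical record layout: unsigned 32-bit key + 12-byte name field.
RECORD = struct.Struct("<I12s")

def write_record(stream, index, key, name):
    """Store record number `index` by seeking to its byte offset."""
    stream.seek(index * RECORD.size)
    stream.write(RECORD.pack(key, name.encode("ascii")))

def read_record(stream, index):
    """Fetch record number `index` from the byte stream."""
    stream.seek(index * RECORD.size)
    key, raw = RECORD.unpack(stream.read(RECORD.size))
    return key, raw.rstrip(b"\0").decode("ascii")

stream = io.BytesIO()                 # stands in for a file opened in binary mode
write_record(stream, 0, 7, "alpha")
write_record(stream, 2, 9, "gamma")   # leaves a zero-filled gap at record 1
```

The kernel only sees seeks, reads, and writes of bytes; the record structure lives entirely in the library code.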

5
File Types
  • ASCII plain text
  • A Unix executable file
  • Header: magic number, sizes, entry point, flags
  • Text (code)
  • Data
  • relocation bits
  • symbol table
  • Devices
  • Everything else in the system

6
So What Makes Filesystems Hard?
  • Files grow and shrink in pieces
  • Little a priori knowledge
  • 6 orders of magnitude in file sizes
  • Overcoming disk performance behavior
  • Desire for efficiency
  • Coping with failure

7
File System Components
  • Disk management
  • Arrange collection of disk blocks into files
  • Naming
  • User gives file name, not track or sector number,
    to locate data
  • Security
  • Keep information secure
  • Reliability/durability
  • When system crashes, lose stuff in memory, but
    want files to be durable

[Figure: layered components between the user and the disk, top to bottom: file naming, file access, disk management, disk drivers]

8
Some Definitions
  • File descriptor (fd): an integer used to
    represent a file; easier than using names
  • Metadata: data about data; bookkeeping data
    used to eventually access the real data
  • Open file table: system-wide list of descriptors
    in use
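
A minimal sketch of these definitions, using Python's `os` module on a scratch file (the file name and contents are made up for the example): the name is used once at open time, and everything afterwards goes through the integer descriptor.

```python
import os
import tempfile

# Create a scratch file so the example does not depend on any existing path.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "wb") as f:
        f.write(b"hello, disk")

    fd = os.open(path, os.O_RDONLY)   # name lookup happens once, here
    fd_is_int = isinstance(fd, int)   # the descriptor is just a small integer
    first = os.read(fd, 5)            # later calls use the fd, not the name
    meta = os.fstat(fd)               # metadata reached through the fd
    size = meta.st_size
    os.close(fd)
```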

9
Kinds of Metadata
  • inode: index node, a specific set of
    information kept about each file
  • Two forms: on disk and in memory
  • Directory: names and location information for
    files and subdirectories
  • Note: stored in files in Unix
  • Superblock: contains information to describe the
    file system and disk layout
  • Information about free blocks/inodes on disk

10
Contents of an Inode
  • Disk inode
  • File type, size, blocks on disk
  • Owner, group, permissions (r/w/x)
  • Reference count
  • Times: creation, last access, last modification
  • Inode generation number
  • Padding and other stuff
  • 128 bytes on classic Unix

11
Directories in Unix
  • Stored like regular files
  • Contents are file names and inode numbers
  • Names are nul-terminated strings
  • Logic
  • Separates file from location in tree
  • File can appear in multiple places
  • What are the drawbacks?
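
The point that one file can appear in multiple places can be observed with hard links. This is a small sketch that assumes a Unix-like filesystem supporting `os.link`; the file names are invented for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.txt")
    b = os.path.join(tmp, "b.txt")
    with open(a, "w") as f:
        f.write("same data")

    os.link(a, b)                       # second directory entry, same inode
    same_inode = os.stat(a).st_ino == os.stat(b).st_ino
    links = os.stat(a).st_nlink         # reference count kept in the inode

    os.remove(a)                        # removes one name only
    with open(b) as f:
        survived = f.read()
    links_after = os.stat(b).st_nlink
```

Removing one name only decrements the reference count; the data goes away when the last name does.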

12
Effects of Corruption
  • inode: file gets damaged
  • Maybe some free block gets viewed
  • Directory: lose files/directories
  • Might get to read deleted files
  • Superblock: can't figure out anything
  • This is why we replicate the superblock

13
Data Structures for A Typical File System
[Figure: process control block (open file pointer array) → open file table (system-wide) → in-memory inode → disk inode]
14
Opening A File
fd = open( FileName, access )
  • File name lookup and authentication
  • Copy the file metadata into the in-memory data
    structure, if it is not in yet
  • Create an entry in the open file table
    (system-wide) if there isn't one
  • Create an entry in PCB
  • Link up the data structures
  • Return a pointer to user
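
The steps above can be sketched as a toy model. Everything here is hypothetical (the `DISK_INODES` contents, the dictionary fields, the `Process` class), authentication is skipped, and real kernels differ in detail; the code only follows the order of operations listed on the slide.

```python
# Toy model; every name here is hypothetical.
DISK_INODES = {"/etc/motd": {"ino": 12, "size": 40}}   # "on-disk" metadata

inode_cache = {}        # in-memory inodes, keyed by inode number
open_file_table = []    # system-wide table of open-file entries

class Process:
    def __init__(self):
        self.fds = []   # per-process open file pointer array (in the PCB)

def open_file(proc, name, access):
    disk_inode = DISK_INODES[name]             # 1. name lookup (no auth check here)
    ino = disk_inode["ino"]
    if ino not in inode_cache:                 # 2. copy metadata into memory if absent
        inode_cache[ino] = dict(disk_inode)
    entry = next((e for e in open_file_table
                  if e["inode"] is inode_cache[ino] and e["access"] == access),
                 None)
    if entry is None:                          # 3. system-wide entry if there isn't one
        entry = {"inode": inode_cache[ino], "access": access, "offset": 0}
        open_file_table.append(entry)
    proc.fds.append(entry)                     # 4. entry in the PCB, linked to the table
    return len(proc.fds) - 1                   # 5. small integer handed back to the user

p, q = Process(), Process()
fd_p = open_file(p, "/etc/motd", "r")
fd_q = open_file(q, "/etc/motd", "r")          # reuses the cached inode and table entry
```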

[Figure: open() path. File name lookup and authentication against the file system on disk; metadata loaded; data structures allocated and linked up in the open file table and the PCB]
15
Reading And Writing
  • What happens when you
  • read 10 bytes from a file?
  • write 10 bytes into an existing file?
  • write 1024 bytes into a file?
  • Disk works on blocks (sectors)
  • Can have temporary (ephemeral) buffers
  • Longer-lasting buffers: disk cache
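
One possible answer to the questions above, as a sketch: because the disk transfers whole blocks, a 10-byte write into an existing file needs a read-modify-write through a temporary buffer, while an aligned 1024-byte write can go straight out as whole blocks. The `ToyDisk` class and the 512-byte block size are assumptions for illustration, and no caching is modeled.

```python
BLOCK = 512  # assumed sector size in bytes

class ToyDisk:
    """Disk that only transfers whole blocks, and counts each transfer."""
    def __init__(self, nblocks):
        self.blocks = [bytes(BLOCK) for _ in range(nblocks)]
        self.reads = self.writes = 0
    def read_block(self, n):
        self.reads += 1
        return self.blocks[n]
    def write_block(self, n, data):
        assert len(data) == BLOCK
        self.writes += 1
        self.blocks[n] = bytes(data)

def write_bytes(disk, offset, data):
    """Write `data` at byte `offset`; partial blocks need read-modify-write."""
    pos = 0
    while pos < len(data):
        n, start = divmod(offset + pos, BLOCK)
        chunk = data[pos:pos + BLOCK - start]
        if len(chunk) == BLOCK:
            disk.write_block(n, chunk)            # whole block: no read needed
        else:
            buf = bytearray(disk.read_block(n))   # temporary buffer
            buf[start:start + len(chunk)] = chunk
            disk.write_block(n, buf)
        pos += len(chunk)

disk = ToyDisk(8)
write_bytes(disk, 100, b"x" * 10)      # 10 bytes inside one block
small = (disk.reads, disk.writes)      # one read plus one write
write_bytes(disk, 1024, b"y" * 1024)   # 1024 aligned bytes: two whole-block writes
```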

16
Reading A Block
read( fd, userBuf, size )
[Figure: read path. PCB → open file table → metadata (logical → physical block mapping) → buffer cache → read( device, phyBlock, size ) → disk device driver; the physical block is fetched into sysBuf, then copied to userBuf]
17
A Disk Layout for A File System
[Figure: disk layout, in order: boot block, super block, file metadata (i-node in Unix), file data blocks]
  • Superblock defines a file system
  • size of the file system
  • size of the file descriptor area
  • free list pointer, or pointer to bitmap
  • location of the file descriptor of the root
    directory
  • other meta-data such as permission and various
    times
  • For reliability, replicate the superblock

18
File Usage Patterns
  • How do users access files?
  • Sequential: bytes read in order
  • Random: read/write element out of middle of
    arrays
  • Whole file or partial file
  • How are files used?
  • Most files are small
  • Large files use up most of the disk space
  • Large files account for most of the bytes
    transferred
  • Bad news
  • Need everything to be efficient

19
Data Structures for Disk Management
  • A header for each file (part of the file
    meta-data)
  • Disk sectors associated with each file
  • A data structure to represent free space on disk
  • Bit map
  • 1 bit per block (sector)
  • blocks numbered in cylinder-major order, why?
  • Linked list
  • Others?
  • How much space does a bit map need for a 4G disk?
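
A quick worked answer to the last question, assuming "4G" means 4 GiB and trying two block sizes (512-byte sectors and 4 KiB blocks; both sizes are assumptions, not from the slide):

```python
def bitmap_bytes(disk_bytes, block_bytes):
    """Bytes needed for a free-space bitmap at 1 bit per block."""
    nblocks = disk_bytes // block_bytes
    return (nblocks + 7) // 8          # round up to whole bytes

GIB = 1 << 30
per_sector = bitmap_bytes(4 * GIB, 512)     # 2**23 bits -> 1 MiB of bitmap
per_block = bitmap_bytes(4 * GIB, 4096)     # 2**20 bits -> 128 KiB of bitmap
```

Either way the bitmap is a tiny fraction of the disk (1/4096 with 512-byte sectors), so it can reasonably be kept in memory.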

20
Linked Files (Alto)
  • File header points to 1st block on disk
  • Each block points to next
  • Pros
  • Can grow files dynamically
  • Free list is similar to a file
  • Cons
  • Random access is horrible
  • Unreliable: losing a block means losing the rest

[Figure: file header → first block → . . . → last block → null]
21
Contiguous Allocation
  • Request in advance for the size of the file
  • Search bit map or linked list to locate a space
  • File header
  • first sector in file
  • number of sectors
  • Pros
  • Fast sequential access
  • Easy random access
  • Cons
  • External fragmentation
  • Hard to grow files
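
A sketch of the bitmap search and of why external fragmentation hurts. First-fit is used here as one simple policy; the slide does not say which policy is intended, and the function names are hypothetical.

```python
def first_fit(bitmap, nblocks):
    """Return the start of the first run of `nblocks` free blocks, or None.

    `bitmap` is a list of booleans, True meaning the block is in use.
    """
    run_start, run_len = 0, 0
    for i, used in enumerate(bitmap):
        if used:
            run_start, run_len = i + 1, 0
        else:
            run_len += 1
            if run_len == nblocks:
                return run_start
    return None

def allocate(bitmap, nblocks):
    """Mark a contiguous run as used and return a (first, count) header."""
    start = first_fit(bitmap, nblocks)
    if start is None:
        return None
    for i in range(start, start + nblocks):
        bitmap[i] = True
    return (start, nblocks)

# 10 blocks with blocks 2 and 6 in use: free runs of 2, 3, and 3 blocks.
bitmap = [False, False, True, False, False, False, True, False, False, False]
header = allocate(bitmap, 3)       # fits in blocks 3..5
too_big = allocate(bitmap, 4)      # 5 blocks remain free, but no run of 4
```

The second request fails even though enough blocks are free in total, which is the external fragmentation listed under cons.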

22
Single-Level Indexed Files or Extent-based
Filesystems
  • A user declares max size
  • A file header holds an array of pointers to point
    to disk blocks
  • Pros
  • Can grow up to a limit
  • Random access is fast
  • Cons
  • Clumsy to grow beyond limit
  • Periodic cleanup of new files
  • Up-front declaration a real pain

[Figure: file header holding an array of pointers to disk blocks]
23
File Allocation Table (FAT)
  • Approach
  • A section of disk for each partition is reserved
  • One entry for each block
  • A file is a linked list of blocks
  • A directory entry points to the 1st block of the
    file
  • Pros
  • Simple
  • Cons
  • Always go to FAT
  • Wasting space
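
A minimal sketch of following a file through the FAT, using the block numbers that accompany this slide (foo, 217, 619, 399, EOF). The chain order 217 → 619 → 399 is a reading of those numbers, and the `EOF` marker value and dictionary representation are assumptions for the example.

```python
EOF = -1   # assumed end-of-chain marker

# FAT[block] = next block in the file.
fat = {217: 619, 619: 399, 399: EOF}
directory = {"foo": 217}           # directory entry holds the first block

def file_blocks(name):
    """Follow the chain in the FAT and list the file's blocks in order."""
    blocks = []
    block = directory[name]
    while block != EOF:
        blocks.append(block)
        block = fat[block]         # every step consults the FAT
    return blocks
```

Reaching the n-th block still takes n lookups, but they are lookups in the table rather than reads of the data blocks themselves.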

[Figure: directory entry foo → block 217; FAT entries: 217 → 619, 619 → 399, 399 → EOF]
24
Multi-Level Indexed Files (Unix)
  • 13 pointers in a header
  • 10 direct pointers
  • Pointer 11: 1-level indirect
  • Pointer 12: 2-level indirect
  • Pointer 13: 3-level indirect
  • Pros & cons
  • In favor of small files
  • Can grow
  • Limit is 16G and lots of seeks
  • What happens to reach block 23, 5, 340?
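
One way to work through the question above. The sketch assumes 1 KB blocks holding 256 four-byte pointers (an assumption, chosen because it reproduces a limit of about 16G) and zero-based logical block numbers; header pointers are numbered 1 to 13 as on the slide.

```python
DIRECT = 10
PER_BLOCK = 256     # assumed: 1 KB blocks holding 4-byte pointers

def locate(block):
    """Return (header pointer number, indices into each indirect block)."""
    if block < DIRECT:
        return (block + 1, [])                 # pointers 1..10 are direct
    block -= DIRECT
    span = PER_BLOCK
    for level in (1, 2, 3):                    # pointers 11, 12, 13
        if block < span:
            idx = []
            for _ in range(level):
                idx.append(block % PER_BLOCK)
                block //= PER_BLOCK
            return (DIRECT + level, idx[::-1])
        block -= span
        span *= PER_BLOCK
    raise ValueError("block number beyond the 3-level limit")

max_blocks = DIRECT + PER_BLOCK + PER_BLOCK**2 + PER_BLOCK**3
```

Under these assumptions, block 5 is reached directly, block 23 needs one indirect block (pointer 11, entry 13), and block 340 needs two (pointer 12, entries 0 then 74). Each indirect block is a potential extra seek before the data block itself.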

[Figure: header pointers 1, 2, . . . lead directly to data blocks; pointers 11, 12, and 13 lead through one, two, and three levels of indirect blocks to data blocks]

25
Challenges
  • Unix filesystem has great flexibility
  • Extent-based filesystems have speed
  • Seeks kill performance, so locality matters
  • Bitmaps show contiguous free space
  • Linked lists easy to search
  • How do you perform backup/restore?

26
Bigger, Faster, Stronger
  • Making individual disks larger is hard
  • Throw more disks at the problem
  • Capacity increases
  • Effective access speed may increase
  • Probability of failure also increases
  • Use some disks to provide redundancy
  • Generally assume a fail-stop model
  • Fail-stop versus Byzantine failures
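
The claim that the probability of failure increases with more disks can be made concrete under a simplifying assumption of independent failures. The 2% per-disk figure below is purely hypothetical.

```python
def p_any_failure(p_disk, ndisks):
    """Probability that at least one of `ndisks` independent disks fails."""
    return 1 - (1 - p_disk) ** ndisks

one = p_any_failure(0.02, 1)     # hypothetical 2% chance per disk per year
ten = p_any_failure(0.02, 10)    # roughly 18% for ten such disks
```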

27
RAID (Redundant Array of Inexpensive Disks)
  • Main idea
  • Store the error correcting codes on other disks
  • General error correcting codes are too powerful
  • Use XORs or single parity
  • Upon any failure, one can recover the entire
    block from the spare disk (or any disk) using
    XORs
  • Pros
  • Reliability
  • High bandwidth
  • Cons
  • The controller is complex
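
A minimal sketch of single-parity recovery with XOR, assuming a fail-stop failure (the controller knows which disk is gone). The block contents are made-up sample data.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"disk0---", b"disk1---", b"disk2---"]   # one block per data disk
parity = xor_blocks(data)                         # stored on the parity disk

# Fail-stop: disk 1 is known to be gone, so XOR everything that survives.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

This works because XOR-ing all data blocks with the parity gives zero, so any single missing block equals the XOR of the rest. Parity alone cannot tell which disk is wrong, which is why the fail-stop assumption matters.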

[Figure: RAID controller computing XOR parity across the disks]
28
Synopsis of RAID Levels
RAID Level 0: Non-redundant (JBOD)
RAID Level 1: Mirroring
RAID Level 2: Byte-interleaved, ECC
RAID Level 3: Byte-interleaved, parity
RAID Level 4: Block-interleaved, parity
RAID Level 5: Block-interleaved, distributed parity
29
Did RAID Work?
  • Performance: yes
  • Reliability: yes
  • Cost: no
  • Controller design complicated
  • Fewer economies of scale
  • High-reliability environments don't care
  • Now also software implementations