Title: CENG 340 Data Management and File Structures
1. CENG 340 Data Management and File Structures
2. Introduction to File Management
3. Motivation
- Most computers are used for data processing (over 80 billion/year), a big growth area in the information age.
- This course covers data processing from a computer science perspective:
  - Storage of data
  - Organization of data
  - Access to data
  - Processing of data
4. Data Structures vs File Structures
- Both involve:
  - Representation of data
  - Operations for accessing data
- Difference:
  - Data structures deal with data in main memory
  - File structures deal with data in secondary storage
5. Where do File Structures fit in Computer Science?
- Application
- DBMS
- File system
- Operating system
- Hardware
6. Computer Architecture
- Main Memory (RAM): data is manipulated here
  - Semiconductors: fast, expensive, volatile, small
- Data is transferred between main memory and secondary storage
- Secondary Storage: data is stored here
  - Disks, tape: slow, cheap, stable, large
7. Advantages and Disadvantages
- Advantages
  - Main memory is fast
  - Secondary storage is big (because it is cheap)
  - Secondary storage is stable (non-volatile), i.e. data is not lost during power failures
- Disadvantages
  - Main memory (MM) is small. Many databases are too large to fit in MM.
  - Main memory is volatile, i.e. data is lost during power failures.
  - Secondary storage is slow (at least 10,000 times slower than MM)
8. How fast is main memory?
- Typical time for getting info from:
  - Main memory: 120 nanosec = 120 x 10^-9 sec
  - Magnetic disks: 30 millisec = 30 x 10^-3 sec
- An analogy keeping the same time proportion as above:
  - Looking at the index of a book: 20 sec
  - versus
  - Going to the library: 58 days
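The proportion behind the analogy can be checked with a little arithmetic. The following is a minimal sketch in C that uses the 120 x 10^-9 sec and 30 x 10^-3 sec figures from this slide; the function and macro names are made up for illustration.

```c
#include <assert.h>

/* Access times taken from the slide, in seconds. */
#define MAIN_MEMORY_ACCESS_SEC 120e-9   /* 120 nanoseconds */
#define DISK_ACCESS_SEC        30e-3    /* 30 milliseconds  */

/* How many times slower the slow device is than the fast one. */
double slowdown_ratio(double fast_sec, double slow_sec)
{
    assert(fast_sec > 0.0);
    return slow_sec / fast_sec;
}

/* Stretch a human-scale action by the same ratio; the result is in days. */
double scaled_days(double fast_action_sec, double ratio)
{
    const double seconds_per_day = 60.0 * 60.0 * 24.0;
    return fast_action_sec * ratio / seconds_per_day;
}
```

With these figures, slowdown_ratio(MAIN_MEMORY_ACCESS_SEC, DISK_ACCESS_SEC) comes out at about 250,000, and scaled_days(20.0, 250000.0) is a little under 58 days, which matches the book index versus library comparison.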
9. Normal Arrangement
- Secondary storage (SS) provides reliable, long-term storage for large volumes of data.
- At any given time, we are usually interested in only a small portion of the data.
- This data is loaded temporarily into main memory, where it can be rapidly manipulated and processed.
- As our interests shift, data is transferred automatically between MM and SS, so the data we are focused on is always in MM.
10. Goal of File Structures
- Minimize the number of trips to the disk needed to get the desired information.
- Group related information so that we are likely to get everything we need with only one trip to the disk.
11. Physical Files and Logical Files
- Physical file: a collection of bytes stored on a disk or tape.
- Logical file: a "channel" (like a telephone line) that connects the program to a physical file.
- The program (application) sends bytes to, or receives bytes from, a file through the logical file. The program knows nothing about where the bytes go to or come from.
- The operating system is responsible for associating a logical file in a program with a physical file on disk or tape. Writing to or reading from a file in a program is done through the operating system.
12. Files
- The physical file has a name, for instance myfile.txt.
- The logical file has a logical name (a variable) inside the program.
- In C:
  - FILE *outfile;
- In C++:
  - fstream outfile;
13. Basic File Processing Operations
- Opening
- Closing
- Reading
- Writing
- Seeking
14. Opening Files
- Opening a file links a logical file to a physical file.
- In C:
  - FILE *outfile;
  - outfile = fopen("myfile.txt", "w");
- In C++:
  - fstream outfile;
  - outfile.open("myfile.txt", ios::out);
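An open can fail, for example when the directory does not exist or is not writable, so the result is usually checked before the logical file is used. Here is a minimal sketch in C; the helper name open_for_output is hypothetical and not part of the slides.

```c
#include <assert.h>
#include <stdio.h>

/* Link the logical file (a FILE *) to the physical file named by `path`.
   Returns NULL if the operating system could not make the association. */
FILE *open_for_output(const char *path)
{
    assert(path != NULL);
    FILE *outfile = fopen(path, "w");   /* "w" creates or truncates the file */
    if (outfile == NULL) {
        perror(path);                   /* report why the open failed */
    }
    return outfile;
}
```

A caller would test the returned pointer against NULL and only then write to it, closing it with fclose when done.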
15. Closing Files
- Closing a file cuts the link between the physical and logical files.
- After closing a file, the logical name is free to be associated with another physical file.
- Closing a file used for output guarantees that everything has been written to the physical file. (When the file is closed, whatever is left in the buffer is flushed to the file.)
- In C:
  - fclose(outfile);
- In C++:
  - outfile.close();
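The reuse of a logical name after closing can be sketched as follows in C. The function name and the text written are illustrative assumptions; the point is that one FILE * variable serves two physical files in turn, and that the return value of fclose reports whether the final flush succeeded.

```c
#include <assert.h>
#include <stdio.h>

/* Write one line to each of two physical files through the same
   logical file variable. Returns 0 on success, -1 on any failure. */
int write_two_files(const char *first_path, const char *second_path)
{
    assert(first_path != NULL && second_path != NULL);

    FILE *outfile = fopen(first_path, "w");
    if (outfile == NULL) return -1;
    fputs("first\n", outfile);
    /* fclose flushes the buffer and cuts the link; EOF means the flush failed */
    if (fclose(outfile) == EOF) return -1;

    /* The logical name is free again, so link it to another physical file. */
    outfile = fopen(second_path, "w");
    if (outfile == NULL) return -1;
    fputs("second\n", outfile);
    if (fclose(outfile) == EOF) return -1;

    return 0;
}
```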
16. Reading
- Read data from a file and place it in a variable inside the program.
- In C:
  - char c;
  - FILE *infile;
  - infile = fopen("myfile.txt", "r");
  - fread(&c, 1, 1, infile);
- In C++:
  - char c;
  - fstream infile;
  - infile.open("myfile.txt", ios::in);
  - infile >> c;
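In practice a read is repeated until the file is exhausted, and fread signals that by returning fewer items than requested. A minimal sketch in C, assuming a hypothetical helper count_bytes that reads one byte at a time as on the slide:

```c
#include <assert.h>
#include <stdio.h>

/* Count the bytes in a file by reading them one at a time.
   Returns -1 if the file cannot be opened. */
long count_bytes(const char *path)
{
    assert(path != NULL);
    FILE *infile = fopen(path, "rb");   /* binary mode: count raw bytes */
    if (infile == NULL) return -1L;

    long count = 0;
    char c;
    /* fread returns the number of items read: 1 on success, 0 at end of file */
    while (fread(&c, 1, 1, infile) == 1) {
        count++;
    }
    fclose(infile);
    return count;
}
```

Reading byte by byte keeps the example close to the slide; the standard I/O library buffers the data, so each fread call does not necessarily mean a separate trip to the disk.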
17. Writing
- Write data from a variable inside the program into the file.
- In C:
  - char c;
  - FILE *outfile;
  - outfile = fopen("mynew.txt", "w");
  - fwrite(&c, 1, 1, outfile);
- In C++:
  - char c;
  - fstream outfile;
  - outfile.open("mynew.txt", ios::out);
  - outfile << c;
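The same fwrite call can write a whole buffer at once, and its return value tells how many items actually went out. The sketch below is one way to wrap this in C; write_text is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the characters of `text` to the file named by `path`.
   Returns 0 if every byte was written and flushed, -1 otherwise. */
int write_text(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);
    FILE *outfile = fopen(path, "w");
    if (outfile == NULL) return -1;

    size_t len = strlen(text);
    /* fwrite returns the number of items written; fewer than len means an error */
    size_t written = fwrite(text, 1, len, outfile);

    if (fclose(outfile) == EOF) return -1;   /* the final flush can also fail */
    return written == len ? 0 : -1;
}
```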
18. Seeking
- Used for direct access: an item can be accessed by specifying its position in the file.
- In C:
  - fseek(infile, 0, 0);   // moves to the beginning
  - fseek(infile, 0, 2);   // moves to the end
  - fseek(infile, -10, 1); // moves 10 bytes back from the current position
- In C++:
  - infile.seekg(0, ios::beg);
  - infile.seekg(0, ios::end);
  - infile.seekg(-10, ios::cur);
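In C, the third argument of fseek is normally written with the named constants SEEK_SET (beginning), SEEK_CUR (current position) and SEEK_END (end) rather than the literal 0, 1 and 2. The following sketch shows the three origins in use; the helper names are hypothetical, and the stream is assumed to be open in binary mode so that offsets are plain byte counts.

```c
#include <assert.h>
#include <stdio.h>

/* Size of the file in bytes, found by seeking to the end.
   Returns -1 on failure. */
long file_size(FILE *infile)
{
    assert(infile != NULL);
    if (fseek(infile, 0L, SEEK_END) != 0) return -1L;
    return ftell(infile);               /* current position == size */
}

/* Byte stored `offset` bytes from the beginning, or EOF on failure. */
int byte_at(FILE *infile, long offset)
{
    assert(infile != NULL);
    if (fseek(infile, offset, SEEK_SET) != 0) return EOF;
    return fgetc(infile);
}

/* Step back one byte from the current position and read it again. */
int reread_previous_byte(FILE *infile)
{
    assert(infile != NULL);
    if (fseek(infile, -1L, SEEK_CUR) != 0) return EOF;
    return fgetc(infile);
}
```

For a file containing the ten characters 0123456789, file_size reports 10, byte_at(infile, 3) returns '3', and calling reread_previous_byte straight afterwards returns '3' again.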
19. File Systems
- Data is not scattered hither and thither on disk.
- Instead, it is organized into files.
- Files are organized into records.
- Records are organized into fields.
20. Example
- A student file may be a collection of student records, one record for each student.
- Each student record may have several fields, such as:
  - Name
  - Address
  - Student number
  - Gender
  - Age
  - GPA
- Typically, each record in a file has the same fields.
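One possible way to lay out such a student file is as fixed-length records, so that record number i starts at byte i times the record size and can be fetched with one seek and one read. The sketch below is in C; the field sizes, type choices and function names are assumptions made for illustration, and writing a raw struct like this is not portable between compilers or machines.

```c
#include <assert.h>
#include <stdio.h>

/* One student record. Field sizes are assumptions chosen for illustration. */
typedef struct {
    char   name[32];
    char   address[64];
    long   student_number;
    char   gender;
    int    age;
    double gpa;
} Student;

/* Append one fixed-length record to the end of the file.
   Returns 1 on success, 0 on failure. */
int append_student(const char *path, const Student *s)
{
    assert(path != NULL && s != NULL);
    FILE *outfile = fopen(path, "ab");
    if (outfile == NULL) return 0;
    size_t written = fwrite(s, sizeof *s, 1, outfile);
    if (fclose(outfile) == EOF) return 0;
    return written == 1;
}

/* Fetch record number `index` (counting from 0) with one seek and one read.
   Returns 1 on success, 0 if the file or the record does not exist. */
int read_student(const char *path, long index, Student *out)
{
    assert(path != NULL && out != NULL && index >= 0);
    FILE *infile = fopen(path, "rb");
    if (infile == NULL) return 0;
    int ok = fseek(infile, index * (long)sizeof(Student), SEEK_SET) == 0
          && fread(out, sizeof *out, 1, infile) == 1;
    fclose(infile);
    return ok;
}
```

Because all the fields of a student travel together in one record, a single read brings in everything about that student, which is the grouping idea behind minimizing trips to the disk.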
21. Properties of Files
- Persistence: Data written into a file persists after the program stops, so the data can be used later.
- Sharability: Data stored in files can be shared by many programs and users simultaneously.
- Size: Data files can be very large. Typically, they cannot fit into MM.