File Organization and Indexing

Slides: 30

Provided by: imadr

1
Lecture 26 (11/16/05)

File Organization and Indexing

2
Announcements

Report 4 write up
Exam II guideline

4
- Sectors: hard-coded on the disk surface and
cannot be changed
- Blocks (pages): set during disk formatting
- The block size B is fixed for each system
(B = 512 bytes to B = 4096 bytes)
- Whole blocks are transferred between disk and
main memory for processing

5
Disk Storage Devices

Preferred secondary storage device for high
storage capacity and low cost
Capacity is the number of bytes that can be stored
A disk pack contains several magnetic disk
platters connected to a rotating spindle
Usually each platter has two surfaces
Each disk surface is divided into concentric
circular tracks
A disk can be double-sided or single-sided
Typical track capacities vary from 4 to 50 Kbytes

6
Disk Storage Devices

Because a track usually contains a large amount
of information, it is divided into smaller
sectors
The division of a track into sectors is
hard-coded on the disk surface and cannot be
changed
A track is divided into blocks (pages)
During disk formatting
The block size B is fixed for each system
Typical block sizes range from B = 512 bytes to
B = 4096 bytes
Whole blocks are transferred between disk and
main memory for processing

7
Disk Storage Devices

To read/write a block from/to disk
A read-write head moves to the track that
contains the block to be transferred
(this is the seek time)
Disk rotation moves the block under the
read-write head for reading or writing
(this is the rotational delay, or latency)
The block is then read or written (transfer time)
A physical disk block (hardware) address consists
of
a cylinder number (imaginary collection of tracks
of same radius from all recorded surfaces)
track number or surface number (within the
cylinder),
block number (within track)
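
As a rough sketch of how the three time components above add up, the following Python function estimates the time to read a single block. The function name, the parameter names, and the sample figures (8 ms seek, 7200 rpm, 4096-byte block, 100 MB/s transfer rate) are illustrative assumptions, not values from the lecture.

```python
def block_access_time_ms(seek_ms, rpm, block_bytes, transfer_bytes_per_s):
    """Estimate the time to read one block: seek + rotational delay + transfer."""
    # One full revolution, in milliseconds.
    rotation_ms = 60_000.0 / rpm
    # On average the block is half a revolution away from the head.
    rotational_delay_ms = rotation_ms / 2
    # Time for the block itself to pass under the head.
    transfer_ms = block_bytes / transfer_bytes_per_s * 1000.0
    return seek_ms + rotational_delay_ms + transfer_ms


# Hypothetical drive parameters, chosen only for illustration.
t = block_access_time_ms(seek_ms=8.0, rpm=7200,
                         block_bytes=4096,
                         transfer_bytes_per_s=100_000_000)
print(f"{t:.2f} ms")
```

With figures like these, seek time and rotational delay dominate and the transfer itself is almost negligible, which is why file organizations try to minimize the number of block transfers.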

8
Disk Storage Devices

A buffer is a contiguous reserved area in main
memory that holds one or more blocks of data
Reading or writing a disk block is time consuming
Read: disk -> buffer
Write: buffer -> disk
Read/write operations work on either one block or
a cluster of blocks (must fit in buffer) at a time
Principle of locality of reference

9
O.S. Modules
10
Records

Data is stored in the form of records, each of
which is a collection of fields
Records contain fields which have values of a
particular type
Record type or record format is the collection of
field names making up a record along with their
data types
Fields may be fixed length or variable length
E.g. VarChar(10)
Records may be fixed length or variable length
A file has variable-length records if its records
Contain variable-length fields
Contain repeating groups (multi-valued
attributes)
Contain optional fields (can be null)
Are of different record formats
(mixed files)
Usually a file contains records of a single
record type (or relation)
A mixed file might, e.g., place the grades of a
student next to the student's record

11
Records

A system can easily identify and parse
fixed-length records
Each has the same size, with the same set of fields
(and field lengths)
For variable-length records
with variable-length fields
Use a special delimiter or separator to terminate
fields
The delimiter must not appear in field values
Or, store the length of the field in bytes before
the field value
with repeating groups
Need one separator for the values of the repeating
group(s) and another for the fields
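
The two options for variable-length fields can be sketched in Python as follows. This is only an illustration: the separator byte `0x1F`, the 2-byte length prefix, and all function names are assumptions made for the example, not formats prescribed by the lecture.

```python
import struct

FIELD_SEP = b"\x1f"  # assumed delimiter; must never occur inside a field value


def encode_delimited(fields):
    """Option 1: terminate every field with a separator byte."""
    for f in fields:
        if FIELD_SEP in f:
            raise ValueError("delimiter appears inside a field value")
    return b"".join(f + FIELD_SEP for f in fields)


def decode_delimited(record):
    # split() leaves one empty piece after the final separator; drop it.
    return record.split(FIELD_SEP)[:-1]


def encode_length_prefixed(fields):
    """Option 2: store each field's length (2 bytes, big-endian) before its value."""
    return b"".join(struct.pack(">H", len(f)) + f for f in fields)


def decode_length_prefixed(record):
    fields, pos = [], 0
    while pos < len(record):
        (n,) = struct.unpack_from(">H", record, pos)
        pos += 2
        fields.append(record[pos:pos + n])
        pos += n
    return fields


fields = [b"Smith", b"", b"Computer Science"]
assert decode_delimited(encode_delimited(fields)) == fields
assert decode_length_prefixed(encode_length_prefixed(fields)) == fields
```

The length-prefixed form places no restriction on field contents, while the delimited form is simpler but has to reject (or escape) values that contain the delimiter.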

12
Records

with optional fields
If there are a lot of optional fields, store
<field-name, field-value> pairs rather than field
values only
Otherwise, use nulls
Need a separator between field-names and
field-values, a second one between fields and a
third one between records (why?)
in mixed files
Each record must be preceded by a record type
indicator

13
Variable length fields
Variable length and optional fields
14
Unspanned Block Organization for fixed-length
records
Spanned Block Organization for variable-length
records

- Blocking factor (bfr): bfr = floor(B/R)
Spanned organization is used
(1) When utilizing empty space or
(2) If R > B

15
Blocking

Blocking factor (bfr) refers to the number of
records per block
There may be empty space in a block if an
integral number of records do not fit in one
block
Record size = R and block size = B
bfr = floor(B/R) and the rest is empty space
Spanned blocking
Records can span a number of blocks
A pointer at the end of the first block points to
the block containing the remainder of the record
(needed if the blocks are not contiguous)
Used
When utilizing empty space or
If R > B
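
A small worked example may make the blocking factor concrete. The sketch below assumes fixed-length records; the function names and the sample sizes (B = 512, R = 100, 1000 records) are chosen for illustration, and the spanned estimate ignores the space taken by the end-of-block pointers.

```python
import math


def blocking_factor(B, R):
    """Records per block for unspanned organization: bfr = floor(B/R)."""
    if R > B:
        raise ValueError("R > B: a record cannot fit in one block, "
                         "so spanned organization is required")
    return B // R


def unspanned_stats(B, R, r):
    """Return (bfr, unused bytes per block, blocks needed for r records)."""
    bfr = blocking_factor(B, R)
    unused = B - bfr * R
    blocks = math.ceil(r / bfr)
    return bfr, unused, blocks


def spanned_blocks(B, R, r):
    """Blocks needed if records may span blocks (pointer overhead ignored)."""
    return math.ceil(r * R / B)


print(unspanned_stats(B=512, R=100, r=1000))  # (5, 12, 200)
print(spanned_blocks(B=512, R=100, r=1000))   # 196
```

In this example each unspanned block wastes 12 bytes, and spanning recovers that space at the cost of records that straddle two blocks.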

16
Files of Records

File records can be un-spanned (no record can
span two blocks) or spanned (a record can be
stored in more than one block)
The physical disk blocks that are allocated to
hold the records of a file can be contiguous,
linked, or indexed
In a file of fixed-length records, all records
have the same format
Usually, unspanned blocking is used with such
files
Files of variable-length records require
additional information to be stored in each
record, such as separator characters and field
types
Usually spanned blocking is used with such files

17
File Headers

A file descriptor (or file header) includes
information that describes the file, such as
Record format (Field names/Field order/Field data
types)
Separator characters
The addresses of the file blocks on disk
To search for a record on disk one or more blocks
are copied into memory buffers
The buffers are then searched using information
in the file header
What if address is not known?
The main goal of file organization is to locate
the block that contains a desired record with a
minimal number of block transfers

18
Heap Files

Also called unordered or pile files
No order is enforced on the records of the file
Insert operation
New records are inserted at the end of the file
Record insertion is quite efficient
Search operation
To search for a record, a linear search through
the file records is necessary
This requires reading and searching half the file
blocks on the average, and is hence quite
expensive
Read_Ordered operation
Reading the records in order of a particular
field requires sorting the file records

19
Unordered Files

Delete operation
To delete a record we must first find it and then
either delete or mark it for deletion
The former causes wasted storage within the file
Both require periodic reorganization
Remove wasted storage or deleted records
Modify operation
If the record is fixed-length, modify it in its
current position
If it is variable-length, delete the old record
and then reinsert the updated one

20
An Example Heap File
21
Sequential Files

Also called sorted or ordered files
File records are kept sorted by the values of an
ordering field
Physically sorted
Insert operation
Insertion is expensive because records must be
inserted in the correct order
It is common to keep a separate unordered
overflow file for new records to improve
insertion efficiency
This is periodically merged with the main ordered
file
Delete operation
Expensive because of physical moving of the rest
of the records

23
Ordered Files

Search operation
A binary search can be used to search for a
record on its ordering field value
For a file of b blocks, this requires reading and
searching about log2(b) blocks on the average, an
improvement over linear search
Search by a non-ordering field is expensive
Read_Ordered operation
Reading the records in order of the ordering
field is quite efficient
Reading the records in order of a non-ordering
field requires sorting the file records
Modify operation
Non-ordering attribute: modify in place
Ordering attribute: delete and then reinsert the
record in its new correct position
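
The binary search on the ordering field described above can be sketched at the block level. In this illustrative model (function name and sample data are assumptions), each inner list is one block of an ordered file, records are `(key, payload)` tuples, and the function counts block reads.

```python
def binary_search_blocks(blocks, key):
    """Binary search over the blocks of an ordered file.

    Returns (record or None, number of blocks read).
    """
    lo, hi, reads = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]  # one block transfer
        reads += 1
        if key < block[0][0]:
            hi = mid - 1      # key lies before this block
        elif key > block[-1][0]:
            lo = mid + 1      # key lies after this block
        else:
            # Key falls within this block's range: scan it in memory.
            for record in block:
                if record[0] == key:
                    return record, reads
            return None, reads
    return None, reads


# 32 sorted keys stored 4 per block, giving b = 8 blocks.
keys = list(range(0, 64, 2))
blocks = [[(k, f"rec{k}") for k in keys[i:i + 4]]
          for i in range(0, len(keys), 4)]

print(binary_search_blocks(blocks, 10))
print(binary_search_blocks(blocks, 11))
```

With 8 blocks no search here reads more than 4 of them, compared with up to 8 for a linear scan.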

24
Indexed Files

A single-level index is an auxiliary file that
makes it more efficient to search for a record in
the data file
CREATE INDEX SQL command
CREATE INDEX part_of_name ON customer (name(10))
i.e., using the first 10 characters of the name
column
E.g. ISAM or MyISAM in MySQL (on Pk)
The index is usually specified on one field of
the file
although it could be specified on several fields
Usually, an index is a file of entries
<field value, pointer to record>
ordered by the field value
The index is called an access path on the field
Provides another (fast) access mechanism to the
file

25
Indexes as Access Paths

The index file usually occupies considerably fewer
disk blocks than the data file because its
entries are much smaller
sometimes there are also fewer entries (sparse
indexes)
A binary search on the index yields a pointer to
the file record (all indexes are sorted)
Indexes can also be characterized as dense or
sparse
A dense index has an index entry for every search
key value (and hence every record) in the data
file
A sparse (or nondense) index, on the other hand,
has index entries for only some of the search
values
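
To illustrate the difference in size between the two kinds of index, the sketch below builds both over the same small ordered file. The function names and sample data are hypothetical; index entries are modelled as `(key, block number)` pairs, and the sparse variant assumes the data file is sorted on the indexed field.

```python
def build_dense_index(blocks):
    """One <key, block number> entry for every record in the file."""
    return [(rec[0], b) for b, block in enumerate(blocks) for rec in block]


def build_sparse_index(blocks):
    """One <key, block number> entry per block, using the block's first key.

    Only meaningful when the data file is ordered on the indexed field.
    """
    return [(block[0][0], b) for b, block in enumerate(blocks)]


# 24 sorted keys stored 4 per block, giving 6 data blocks.
keys = list(range(100, 124))
blocks = [[(k, f"rec{k}") for k in keys[i:i + 4]]
          for i in range(0, len(keys), 4)]

dense = build_dense_index(blocks)
sparse = build_sparse_index(blocks)
print(len(dense), len(sparse))  # 24 6
```

The sparse index is smaller by a factor equal to the number of records per block, so a binary search over it touches fewer index blocks.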

26
Indexes as Access Paths

Primary Index
Defined on an ordered data file (by a key field)
i.e., no duplicates allowed for ordering field
Index provides faster access using the ordering
field
Includes one index entry for each block in the
data file
the index entry has the key field value for the
first record (usually) in the block, (called
block anchor)
A primary index is a nondense (sparse) index,
since it includes only one entry for each disk
block of the data file
It holds the keys of the anchor records rather
than every search value in every record
One primary index per file
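
A lookup through a primary index can be sketched as a binary search over the block anchors followed by a single data-block read. The function names and the sample names below are made up for illustration, and Python's `bisect` module stands in for the binary search on the index file.

```python
from bisect import bisect_right


def build_primary_index(blocks):
    """One entry per data block: (anchor key, block number)."""
    return [(block[0][0], b) for b, block in enumerate(blocks)]


def lookup(index, blocks, key):
    """Find a record via the primary index; returns the record or None."""
    anchors = [anchor for anchor, _ in index]
    i = bisect_right(anchors, key) - 1  # last anchor <= key
    if i < 0:
        return None                     # key precedes every anchor
    _, block_no = index[i]
    for record in blocks[block_no]:     # one data-block read
        if record[0] == key:
            return record
    return None


# Hypothetical ordered data file, 3 records per block.
names = ["Aaron", "Adams", "Akers", "Allen", "Amir",
         "Baker", "Clark", "Davis", "Evans"]
blocks = [[(n, f"rec-{n}") for n in names[i:i + 3]]
          for i in range(0, len(names), 3)]
index = build_primary_index(blocks)

print(index)
print(lookup(index, blocks, "Amir"))
print(lookup(index, blocks, "John"))
```

Because the index only says which block could hold the key, a missing key is detected only after that one block has been read.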

27
Search for Amir, John
28
Indexes as Access Paths

Clustering Index
Defined on an ordered data file (by a non-key
field)
Clustering field
Unlike a primary index, which requires that the
ordering field of the data file have a distinct
value for each record, a clustering field may
have duplicate values
Index provides faster access using the ordering
field
Includes one index entry for each distinct value
of the field
the index entry points to the first data block
that contains records with that field value
Dense or sparse?
One clustering index per file
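
The clustering index can be sketched in the same toy model. Here a Python dict stands in for the ordered index file, the department numbers and record names are invented sample data, and the function names are assumptions; the point is only that each distinct value maps to the first block containing it, and retrieval then scans forward.

```python
def build_clustering_index(blocks):
    """Map each distinct clustering-field value to the first block holding it."""
    index = {}
    for b, block in enumerate(blocks):
        for value, _ in block:
            index.setdefault(value, b)  # keep only the first block seen
    return index


def fetch_all(index, blocks, value):
    """Return every record whose clustering field equals value."""
    if value not in index:
        return []
    result = []
    for block in blocks[index[value]:]:
        matches = [rec for rec in block if rec[0] == value]
        if not matches:
            break  # file is ordered, so no later block can match
        result.extend(matches)
        if block[-1][0] != value:
            break  # the run of this value ended inside this block
    return result


# Hypothetical file ordered by department number, 3 records per block.
depts = [1, 1, 2, 3, 3, 3, 3, 5, 5, 7]
records = [(d, f"emp{i}") for i, d in enumerate(depts)]
blocks = [records[i:i + 3] for i in range(0, len(records), 3)]
index = build_clustering_index(blocks)

print(fetch_all(index, blocks, 5))
print(fetch_all(index, blocks, 7))
```

Note that records for one value may continue into following blocks, as department 3 does here, so the scan must keep reading until the value changes.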

29
Search for dept. 5 and then 7
