External Storage - PowerPoint PPT Presentation

Slides: 19
Provided by: WillT154

1
External Storage
  • Primary Storage: Main Memory (RAM).
  • Secondary Storage: Peripheral Devices
  • Disk Drives
  • Tape Drives
  • Secondary storage is CHEAP.
  • Secondary storage is SLOW, about 250,000 times
    slower (i.e., the CPU can execute about 250,000
    instructions in the time of one I/O operation).
  • Primary storage is volatile.
  • Secondary storage is permanent.

2
Golden Rule of File Processing
  • Minimize the number of disk accesses!!
  • Arrange information so that you get what you want
    with few disk accesses.
  • Arrange information so you minimize future disk
    accesses.
  • An organization for data on disk is often called
    a file structure.
  • Disk-based space/time tradeoff: compress
    information to save time by reducing disk
    accesses.

3
Disk Drives
  • Track - a circle around a disk that holds
    information.
  • Sector - arc portion of a track.
  • Interleaving factor - Physical distance between
    logically adjacent sectors on a track, to allow
    for processing of sector data.
  • Locality of Reference - If a record is read from
    a disk, the next request is likely to come from
    near the same place in the file.
  • Cluster - smallest unit of allocation, usually
    several sectors.
  • Extent - group of physically contiguous clusters
  • Internal Fragmentation - wasted space within a
    sector if records do not exactly fill the sector.

4
Access Time
  • Seek Time - time for I/O head to reach desired
    track. Based on distance between I/O head and
    desired track.
  • f(n) = t × n + s, where n is the number of tracks
    to cross, t is the time to traverse one track,
    and s is the startup time for the I/O head.
  • Rotational delay (latency) - time for data to
    rotate to I/O head position. Determined by disk
    RPM.
  • Transfer time - time for data to move under the
    I/O head. Determined by RPM and size of
    information to transfer.

5
Disk Access time example
  • 675 Mbyte disk drive
  • 15 platters -> 45 Mbyte/platter
  • 612 tracks/platter
  • 150 sectors/track -> 512 bytes/sector
  • 8 sectors/cluster -> 4K/cluster -> 18
    clusters/track
  • 1 track seek -> .08 ms
  • seek startup -> 3 ms
  • 1 revolution -> 16.7 ms
  • Interleaving factor of 3 -> 3 revolutions to read
    1 track (50.1 msec)
  • How long to read a file of 128K divided into 256
    records of 512 bytes?
  • Uses 2 tracks (150 sectors on one, 106 on the other)
  • The average distance for seek is disk size/3 (not
    2!)
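
The "disk size/3" figure for the average seek can be checked directly. The sketch below assumes the head's current track and the requested track are independent and uniformly distributed over the 612 tracks; that assumption and the function name are illustrative, not part of the slides.

```python
from fractions import Fraction


def average_seek_distance(num_tracks: int) -> Fraction:
    """Exact mean of |i - j| over all ordered pairs of tracks (i, j)."""
    total = sum(
        abs(i - j) for i in range(num_tracks) for j in range(num_tracks)
    )
    return Fraction(total, num_tracks * num_tracks)


avg = average_seek_distance(612)
print(float(avg))  # very close to 612/3 = 204 tracks, well below 612/2 = 306
```

Under this uniform model the exact mean is (n² - 1)/(3n), which approaches n/3 as the number of tracks n grows.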

6
Access Time
  • total time = initial seek + second seek +
    2 × (rotational delay + transfer time)
  • total = (612/3 × .08 + 3) + (.08 + 3) + 2 × (.5 + 3) × 16.7
    = 139.3 msec
  • Assumes clusters are contiguous and tracks are
    contiguous
  • If clusters are randomly spread across disk
    (the 128K file occupies 32 clusters of 4K):
  • time = 32 × (612/3 × .08 + 3 + 16.7/2 + 24/150 × 16.7)
    ≈ 32 × 30.3 = 969.6 msec
  • 24/150 is the part of the disk to read (8 sectors
    with an interleaving factor of 3).
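
The two totals above can be reproduced with a few lines of arithmetic. This is a minimal sketch using the example drive's numbers; the constant and function names are made up for illustration. Carried out without intermediate rounding, the scattered case comes to about 970.9 msec; the slide's 969.6 corresponds to rounding the per-cluster time to 30.3 msec first.

```python
TRACKS = 612              # tracks per platter
TRACK_SEEK_MS = 0.08      # time to cross one track
SEEK_STARTUP_MS = 3.0     # head startup time
REVOLUTION_MS = 16.7      # one full rotation
SECTORS_PER_TRACK = 150
SECTORS_PER_CLUSTER = 8
INTERLEAVE = 3            # revolutions needed to read one full track


def seek_ms(tracks_moved):
    """Seek time f(n) = t*n + s from the Access Time slide."""
    return tracks_moved * TRACK_SEEK_MS + SEEK_STARTUP_MS


def contiguous_read_ms():
    """File on two adjacent tracks: one average seek, one single-track seek."""
    initial_seek = seek_ms(TRACKS / 3)
    second_seek = seek_ms(1)
    # Half a revolution of latency plus 3 revolutions of transfer per track.
    per_track = (0.5 + INTERLEAVE) * REVOLUTION_MS
    return initial_seek + second_seek + 2 * per_track


def scattered_read_ms(num_clusters=32):
    """Every cluster needs its own average seek and rotational delay."""
    track_fraction = SECTORS_PER_CLUSTER * INTERLEAVE / SECTORS_PER_TRACK
    per_cluster = (
        seek_ms(TRACKS / 3)
        + REVOLUTION_MS / 2
        + track_fraction * REVOLUTION_MS
    )
    return num_clusters * per_cluster


print(round(contiguous_read_ms(), 1))
print(round(scattered_read_ms(), 1))
```

Either way the point of the example stands: scattering the clusters makes the same 128K read roughly seven times slower.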

7
Buffers
  • Read time for one track
  • 612/3 × .08 + 3 + (3.5) × 16.7 = 77.8 msec
  • Read time for one sector
  • 612/3 × .08 + 3 + (.5 + 1/150) × 16.7 = 27.8 msec
  • Read time for one byte
  • 612/3 × .08 + 3 + (.5) × 16.7 = 27.7 msec
  • Nearly all disk drives read/write one sector
    every time there is an I/O access.
  • The information in a sector is stored in a buffer
    in the operating system.
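
The three read times differ only in the transfer term, which the sketch below makes explicit. It reuses the example drive's figures; `read_time_ms` and its parameter are hypothetical names chosen for illustration, and the one-byte case treats transfer time as negligible, as the slide does.

```python
def read_time_ms(transfer_revolutions):
    """Average seek + half-revolution latency + transfer, in msec."""
    seek = 612 / 3 * 0.08 + 3          # average seek over 612 tracks
    revolution = 16.7                  # msec per rotation
    return seek + (0.5 + transfer_revolutions) * revolution


track_ms = read_time_ms(3)             # interleaving factor 3: 3 revolutions
sector_ms = read_time_ms(1 / 150)      # one of 150 sectors on the track
byte_ms = read_time_ms(0)              # transfer time taken as zero

print(round(track_ms, 1), round(sector_ms, 1), round(byte_ms, 1))
```

Reading a whole sector costs almost exactly the same as reading one byte, which is why drives transfer at least a sector per access.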

8
Buffer Pools
  • A series of buffers used by an application to
    cache disk data is called a buffer pool.
  • Double buffering - read data from disk while the
    CPU is processing the previous buffer. The same
    applies to writing: store written data in a
    buffer while the previous buffer is being
    physically written to disk.
  • Many buffers - allow for large differences
    between I/O time and CPU time. Sometimes I/O may
    fill a buffer faster than the CPU can empty it.
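
A buffer pool can be sketched as a small cache of disk blocks. The slides do not say how a full pool chooses which buffer to reuse; the version below assumes least-recently-used replacement, and the class name, method name, and default sizes are all hypothetical. An in-memory `io.BytesIO` stands in for the disk file.

```python
import io
from collections import OrderedDict


class BufferPool:
    """Caches fixed-size blocks of a file, evicting the least recently used."""

    def __init__(self, file, block_size=512, num_buffers=4):
        self.file = file
        self.block_size = block_size
        self.num_buffers = num_buffers
        self.buffers = OrderedDict()   # block number -> bytes, oldest first
        self.disk_reads = 0            # counts real accesses to the file

    def get_block(self, block_no):
        if block_no in self.buffers:
            # Hit: no disk access, just mark the buffer as recently used.
            self.buffers.move_to_end(block_no)
            return self.buffers[block_no]
        if len(self.buffers) >= self.num_buffers:
            # Pool is full: drop the least recently used buffer (assumed policy).
            self.buffers.popitem(last=False)
        self.file.seek(block_no * self.block_size)
        data = self.file.read(self.block_size)
        self.disk_reads += 1
        self.buffers[block_no] = data
        return data


disk = io.BytesIO(bytes(2048))         # four empty 512-byte blocks
pool = BufferPool(disk, block_size=512, num_buffers=2)
pool.get_block(0)
pool.get_block(0)                      # served from the pool
print(pool.disk_reads)
```

Repeated requests for the same block cost one disk access instead of many, which is the whole purpose of the pool.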

9
Programmer's View of Files
  • Logical view of files:
  • An array of bytes.
  • A file pointer marks the current position.
  • Three fundamental operations:
  • Read bytes from current position (move file
    pointer).
  • Write bytes to current position (move file
    pointer).
  • Set file pointer to specified position.
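
The three operations map directly onto `read`, `write`, and `seek` in most languages. A minimal Python sketch, using an in-memory `io.BytesIO` as a stand-in for a disk file (an assumption made so the example is self-contained):

```python
import io

f = io.BytesIO(b"0123456789")  # behaves like a 10-byte binary file

first = f.read(4)              # read 4 bytes; pointer moves from 0 to 4
pos_after_read = f.tell()      # current file pointer position

f.write(b"AB")                 # overwrite bytes 4 and 5; pointer moves to 6

f.seek(2)                      # set the file pointer to position 2
rest = f.read()                # read from position 2 to the end

print(first, pos_after_read, rest)
```

A real file opened with `open(path, "r+b")` supports the same three calls.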

10
External Sorting
  • Problem: sort data sets too large to fit in main
    memory.
  • Assume the data is stored on a disk drive.
  • To sort, portions of the data must be brought
    into main memory, processed, and returned to
    disk.
  • An external sort should minimize disk accesses.

11
External Computation Model
  • Secondary storage is divided into equal sized
    blocks.
  • The basic I/O operation transfers one block of
    information.
  • Under certain circumstances, reading blocks of a
    file in sequential order is more efficient:
  • Seek time is minimized.
  • The file is physically sequential.
  • The head does not move between accesses (no
    timesharing).

12
More Model
  • Typically, the time to perform a single block I/O
    operation is enough to Quicksort the contents of
    the block.
  • Most systems have single drive, so must sort on a
    single drive.
  • Need to minimize the number of block I/O
    operations.

13
Key Sorting
  • Often records are large while keys are small.
  • Approach 1:
  • Read in records, sort them, write them out.
  • Approach 2 (resembles pointer sort):
  • Read in only the key values.
  • Store with each key the location on disk of its
    associated record.
  • Read in records in key order and write them out
    in sorted order.
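
Approach 2 can be sketched as follows. The fixed record length, the position of the key at the start of each record, and the function name are assumptions made for illustration; the slides do not specify a record layout.

```python
import io

RECORD_SIZE = 16   # hypothetical fixed record length in bytes
KEY_SIZE = 4       # hypothetical: the key is the first 4 bytes of a record


def key_sort(infile, outfile, num_records):
    """Sort fixed-size records by key, holding only keys and offsets in memory."""
    # Read only the key of each record, remembering where the record lives.
    index = []
    for i in range(num_records):
        offset = i * RECORD_SIZE
        infile.seek(offset)
        index.append((infile.read(KEY_SIZE), offset))

    # Sort the small (key, offset) pairs in memory.
    index.sort()

    # Fetch the full records in key order and write them out.
    for _key, offset in index:
        infile.seek(offset)
        outfile.write(infile.read(RECORD_SIZE))


records = [b"0003" + b"c" * 12, b"0001" + b"a" * 12, b"0002" + b"b" * 12]
source = io.BytesIO(b"".join(records))
result = io.BytesIO()
key_sort(source, result, len(records))
print(result.getvalue()[:4])  # key of the first record in the sorted output
```

Note that the final loop seeks to a different position for every record, so its disk accesses are not sequential.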

14
External Sort: Simple Mergesort
  • Quicksort requires random access to the entire
    set of records
  • A better process for external data is a modified
    Mergesort algorithm
  • This processes n elements in O(log n) passes.
  • A group of sorted records is called a run.

15
External Mergesort Algorithm
  1. Split the file into two files.
  2. Read a block from each file.
  3. Take the first record from each block, output
     them to the first output file in sorted order.
  4. Take the next record from each block, output
     them to a second file in sorted order.
  5. Repeat until finished, alternating between output
     files. Read new input blocks as needed. Now
     have runs of size 2.
  6. Repeat steps 2-5, except the input files now have
     groups of 2 that need to be merged.
  • Each pass provides runs of twice the size.
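
The passes can be sketched in a few lines. Python lists stand in for the four disk files here, and block-by-block reading is left out, so this shows only the run-doubling logic; the function names and the way the file is split (alternating records) are assumptions for illustration.

```python
def merge_pass(in_a, in_b, run_len):
    """Merge runs of length run_len pairwise, alternating between two outputs."""
    outs = ([], [])
    target = 0
    for start in range(0, max(len(in_a), len(in_b)), run_len):
        run_a = in_a[start:start + run_len]
        run_b = in_b[start:start + run_len]
        out = outs[target]
        i = j = 0
        # Standard two-way merge of one run from each input file.
        while i < len(run_a) and j < len(run_b):
            if run_a[i] <= run_b[j]:
                out.append(run_a[i])
                i += 1
            else:
                out.append(run_b[j])
                j += 1
        out.extend(run_a[i:])
        out.extend(run_b[j:])
        target = 1 - target            # next merged run goes to the other file
    return outs


def simple_external_mergesort(records):
    """Sort by repeated merge passes; run length doubles on each pass."""
    file_a, file_b = records[0::2], records[1::2]   # split into two files
    run_len = 1
    while run_len < len(records):
        file_a, file_b = merge_pass(file_a, file_b, run_len)
        run_len *= 2
    return file_a + file_b             # file_b is empty once one run remains


print(simple_external_mergesort([5, 3, 8, 1, 9, 2, 7]))
```

Because the run length doubles each time, the loop makes about log2(n) passes over the data, matching the O(log n) passes stated on the previous slide.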

16
Problems
  • Is each pass through the input and output files
    truly sequential?
  • If the files are all on the same disk, then the
    heads will be moving from one input file to
    another input file to an output file to another
    output file.
  • Very helpful if each file has its own disk head.
  • How do we reduce the number of passes?
  • At the beginning, read in as much data as
    possible and sort it internally and just write it
    out.
  • Can merge more than 2 runs at a time (multiway
    merging).
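
Both ideas for reducing passes fit in a short sketch: sort memory-sized chunks to form the initial runs, then merge all runs at once. Lists again stand in for disk files, `memory_size` is a hypothetical limit on how many records fit in memory, and the standard library's `heapq.merge` performs the multiway merge.

```python
import heapq


def make_initial_runs(records, memory_size):
    """Sort each memory-sized chunk internally; each sorted chunk is one run."""
    return [
        sorted(records[i:i + memory_size])
        for i in range(0, len(records), memory_size)
    ]


def multiway_merge(runs):
    """Merge any number of sorted runs in a single pass."""
    return list(heapq.merge(*runs))


runs = make_initial_runs([9, 4, 7, 1, 8, 2, 6, 3, 5], memory_size=3)
print(runs)
print(multiway_merge(runs))
```

With runs this large and a wide enough merge, the whole file is sorted in one run-building pass plus one merge pass.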

17
THE MERGESORT
  • The Mergesort process has 2 phases
  • Break the file into large initial runs.
  • Merge the runs together to make a single sorted
    list

18
Replacement Selection
  • This method tries to maximize the size of initial
    runs
  • Break available memory into an array for a heap,
    an input buffer and an output buffer.
  • Fill the array from disk.
  • Make a min-heap
  • Send the smallest value to the output buffer
  • Read next key
  • If new key is greater than last output value
  • replace the root with this key and heapify
  • else
  • replace the root with the last key in the heap,
    store the new key in the freed slot at the end of
    the array (it belongs to the heap for the next
    run), and heapify
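
A minimal sketch of replacement selection using Python's `heapq`. Several simplifications are assumptions of this sketch, not part of the slides: the input and output buffers are omitted, a separate `pending` list stands in for the end of the array where the next run's heap accumulates, and a key equal to the last output value is also admitted to the current run.

```python
import heapq
from itertools import islice

_END = object()  # sentinel marking exhausted input


def replacement_selection(records, heap_size):
    """Split records into sorted runs using a min-heap of at most heap_size keys."""
    source = iter(records)
    heap = list(islice(source, heap_size))   # fill the array from the input
    heapq.heapify(heap)                      # make a min-heap

    runs, current, pending = [], [], []
    while heap:
        smallest = heap[0]
        current.append(smallest)             # send the smallest value to output
        new_key = next(source, _END)         # read the next key
        if new_key is _END:
            heapq.heappop(heap)              # no input left: the heap shrinks
        elif new_key >= smallest:
            heapq.heapreplace(heap, new_key) # fits this run: replace the root
        else:
            heapq.heappop(heap)              # too small: the heap shrinks and
            pending.append(new_key)          # the key waits for the next run
        if not heap:
            runs.append(current)             # current run is complete
            current = []
            heap, pending = pending, []      # pending keys form the next heap
            heapq.heapify(heap)
    return runs


print(replacement_selection([5, 1, 9, 2, 8, 3, 7], heap_size=3))
```

With room for only 3 keys, the first run in this example already holds 5 records, which is the point of the method: runs can be longer than the available memory.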