Lecture 6: External Sorting

1
Lecture 6: External Sorting
  • Bong-Soo Sohn
  • Assistant Professor
  • School of Computer Science and Engineering
  • Chung-Ang University

2
External Sorting
  • A sorting algorithm that can handle massive amounts
    of data by using external memory
  • Required when the data does not fit into main memory
  • An out-of-core algorithm, as opposed to an in-core
    algorithm that keeps all of its data in main memory

3
Motivation
  • Sometimes the data to sort are too large to fit
    in memory (Why not virtual memory?)
  • Use external memory (disk)
  • Disk performance
      • seek time (major factor)
      • rotational latency
      • transfer time
  • Primary rule for disk access
      • Minimize the number of disk accesses
  • Assume external (secondary) memory is divided into
    equal-sized blocks (e.g., 1 KB, 4 KB, ...)
      • Block: the unit in which data is stored and
        retrieved
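To make the block model concrete, here is a minimal Python sketch, assuming a 4 KB block size (one of the slide's examples) and treating each fixed-size read call as one block transfer. The names `BLOCK_SIZE`, `blocks_needed`, and `read_in_blocks` are hypothetical helpers for illustration; a real operating system may also serve reads from its page cache rather than the disk.

```python
import math
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size in bytes (the slide's 4 KB example)


def blocks_needed(n_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of whole blocks that must be transferred to read n_bytes."""
    return math.ceil(n_bytes / block_size)


def read_in_blocks(path: str, block_size: int = BLOCK_SIZE) -> int:
    """Read a file one block at a time and count the read calls issued."""
    reads = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)  # at most one block per call
            if not block:  # empty bytes object means end of file
                break
            reads += 1
    return reads


if __name__ == "__main__":
    # Write a 10,000-byte scratch file and read it back block by block.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 10_000)
        path = tmp.name
    try:
        print(blocks_needed(10_000), read_in_blocks(path))
    finally:
        os.remove(path)
```

Even a partially filled last block costs a full transfer, which is why the cost of an algorithm is counted in blocks rather than bytes.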

4
External Merge Sort Idea
  • Example: sorting 900 MB of data using only 100 MB
    of RAM
    1. Read 100 MB of the data into main memory and sort
       it by some conventional method (usually quicksort).
    2. Write the sorted data to disk.
    3. Repeat steps 1 and 2 until all of the data is
       sorted in 100 MB chunks (9 chunks here), which now
       need to be merged into one single output file.
    4. Read the first 10 MB of each sorted chunk (call
       them input buffers) into main memory (90 MB total)
       and allocate the remaining 10 MB for an output
       buffer.
    5. Perform a 9-way merge and store the result in the
       output buffer. Whenever the output buffer is full,
       write it to the final sorted file. Whenever one of
       the 9 input buffers becomes empty, refill it with
       the next 10 MB of its associated 100 MB sorted
       chunk; if that chunk has no more data, mark the
       buffer as exhausted and stop using it for merging.
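The two phases above (creating sorted runs, then a buffered k-way merge) can be sketched in Python. This is a minimal, scaled-down sketch under stated assumptions: integer records are stored one per line in text files, and one record stands in for 1 MB, so `run_size=100` plays the role of the 100 MB of RAM while `buf_size=10` and `out_size=10` play the 10 MB input and output buffers. All function names are hypothetical, Python's built-in sort stands in for quicksort, and a binary heap (`heapq`) is assumed here as one common way to pick the smallest head among the k input buffers; the slide does not say how the minimum is chosen.

```python
import heapq
import os
import tempfile
from typing import IO, Iterable, List


def write_run(chunk: List[int]) -> str:
    """Sort one memory-sized chunk in RAM and write it to a temporary run file."""
    chunk.sort()  # in-memory sort (built-in sort stands in for quicksort)
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{v}\n" for v in chunk)
    return path


def make_runs(values: Iterable[int], run_size: int) -> List[str]:
    """Phase 1: cut the input into sorted runs of at most run_size records."""
    runs: List[str] = []
    chunk: List[int] = []
    for v in values:
        chunk.append(v)
        if len(chunk) == run_size:
            runs.append(write_run(chunk))
            chunk = []
    if chunk:
        runs.append(write_run(chunk))
    return runs


def fill_buffer(f: IO[str], buf_size: int) -> List[int]:
    """Read up to buf_size records from a run file into an input buffer."""
    buf: List[int] = []
    for _ in range(buf_size):
        line = f.readline()
        if not line:
            break
        buf.append(int(line))
    return buf


def merge_runs(run_paths: List[str], out_path: str, buf_size: int, out_size: int) -> None:
    """Phase 2: k-way merge of all runs through per-run input buffers."""
    files = [open(p) for p in run_paths]
    buffers = [fill_buffer(f, buf_size) for f in files]
    pos = [0] * len(files)  # index of the next unread record in each buffer
    heap = [(buf[0], i) for i, buf in enumerate(buffers) if buf]
    heapq.heapify(heap)
    out_buf: List[int] = []
    with open(out_path, "w") as out:
        while heap:
            value, i = heapq.heappop(heap)  # smallest head among all buffers
            out_buf.append(value)
            if len(out_buf) == out_size:  # output buffer full: write it out
                out.writelines(f"{v}\n" for v in out_buf)
                out_buf = []
            pos[i] += 1
            if pos[i] == len(buffers[i]):  # input buffer empty: refill it
                buffers[i] = fill_buffer(files[i], buf_size)
                pos[i] = 0
            if buffers[i]:  # an empty refill means run i is exhausted
                heapq.heappush(heap, (buffers[i][pos[i]], i))
        out.writelines(f"{v}\n" for v in out_buf)  # flush the last partial buffer
    for f in files:
        f.close()
    for p in run_paths:
        os.remove(p)


def external_sort(values: Iterable[int], out_path: str,
                  run_size: int, buf_size: int, out_size: int) -> int:
    """Sort values into out_path; returns the number of runs that were merged."""
    runs = make_runs(values, run_size)
    merge_runs(runs, out_path, buf_size, out_size)
    return len(runs)
```

For example, calling `external_sort(values, "sorted.txt", run_size=100, buf_size=10, out_size=10)` (with a hypothetical output path) on 900 integers creates 9 runs and performs a single 9-way merge, mirroring the slide's proportions: 9 input buffers of 10 records plus a 10-record output buffer fill exactly the 100-record memory budget.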

5
2-way merge sort
  • Number of passes: 5
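The pass count can also be written in closed form (a standard textbook result, not stated on the slide). Assuming pass 0 produces R sorted runs and every later pass merges groups of k runs into one, the number of runs shrinks by a factor of k per pass; if the file occupies N blocks and each pass reads and writes every block once, the total disk cost follows directly:

```latex
\text{passes}(R, k) = 1 + \left\lceil \log_k R \right\rceil,
\qquad
\text{passes}(R, 2) = 1 + \left\lceil \log_2 R \right\rceil,
\qquad
\text{block I/Os} = 2N \cdot \text{passes}(R, k)
```

For instance, a hypothetical input that yields R = 16 initial runs needs 1 + 4 = 5 passes with 2-way merging.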

6
5-way merge sort
  • With 5-way merging, we can reduce the number of
    passes
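The effect of the merge order k on the pass count can be checked with a small calculation. This is a minimal Python sketch, assuming R initial runs from pass 0 and that each merge pass turns every group of up to k runs into one, so the run count goes from R to ⌈R/k⌉ per pass. The function names are hypothetical; integer ceiling division is used instead of floating-point logarithms to avoid rounding errors at exact powers of k.

```python
def merge_passes(runs: int, k: int) -> int:
    """Merge passes needed to reduce `runs` sorted runs to one with k-way merging."""
    passes = 0
    while runs > 1:
        runs = -(-runs // k)  # ceiling division: each group of up to k runs becomes one
        passes += 1
    return passes


def total_passes(runs: int, k: int) -> int:
    """Total passes, including pass 0, which creates the initial sorted runs."""
    return 1 + merge_passes(runs, k)


if __name__ == "__main__":
    for r in (9, 16, 100, 1000):
        print(f"{r} runs: 2-way {total_passes(r, 2)}, 5-way {total_passes(r, 5)}")
```

For 16 initial runs, `total_passes(16, 2)` gives 5 while `total_passes(16, 5)` gives 3. Since every pass reads and writes the whole file, the total number of disk accesses falls in proportion to the number of passes.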