External Sorting - PowerPoint PPT Presentation

About This Presentation
Title:

External Sorting

Description:

CPSC 404, Laks V.S. Lakshmanan. 1. External Sorting. Chapter 13 (Sec. ... for me to tune the performance of such a sort algorithm & how do I tune them? ... – PowerPoint PPT presentation

Number of Views:173
Avg rating:3.0/5.0
Slides: 29
Provided by: laksvsla
Category:
Tags: external | sorting

less

Transcript and Presenter's Notes

Title: External Sorting


1
External Sorting
  • Chapter 13 (Sec. 13-1-13.5) Ramakrishnan
    Gehrke and
  • Chapter 11 (Sec. 11.4-11.5) G-M et al. (R2)
  • OR
  • Chapter 2 (Sec. 2.4-2.5) Garcia-et Molina al.
    (R1)

NOTE sorting using B-trees to be assigned for
reading after we cover B-trees.
2
What you will learn from this set of lectures
  • How to efficiently sort a file so large it wont
    all fit in memory?
  • What knobs are available for me to tune the
    performance of such a sort algorithm how do I
    tune them?

3
Motivation
  • A classic problem in computer science!
  • Data requested in sorted order
  • e.g., arrange movie records in increase order of
    ratings.
  • Why is it important?
  • is first step in bulk loading (i.e., creating) B
    tree index.
  • useful also for eliminating duplicate copies in a
    collection of records for aggregation (Why?)
    can you think of any other strategy for these
    tasks?
  • (later) for efficient joins of relations, i.e.,
    Sort-merge algo.
  • Problem sort 1GB of data with 50MB of RAM.

4
I/O Computation Model
  • disk I/O (read/write a block) very expensive
    compared to processing it once its in memory.
  • random block accesses very common.
  • reasonable model for computation using secondary
    storage ? count only disk I/Os.

5
Desiderata for good DBMS algorithms
  • overlap I/O and cpu processing as much as
    possible.
  • use as much data from each block read as
    possible depends on record clustering.
  • cluster records that are accessed together in
    consecutive blocks/pages.
  • buffer frequently accessed blocks in memory.

6
Merge Sort Overview
  • Idea given B buffer pages of main memory
  • (Sort phase)
  • read in B pages of data each time and sort
    internally
  • suppose entire data stored in m B pages
  • sort phase produces m sorted runs (sublists)
    of B pages each

7
Merge Sort Overview
  • (Merge phase)
  • repeatedly merge two sorted runs
  • (Pass 0 the sort phase)
  • Pass 1 produces m/2 sorted runs of 2B pages
    each
  • Pass 2 produces m/4 sorted runs of 4B pages
    each
  • continue until one sorted run of m B pages
    produced
  • 2-way merge can be optimized to k-way merge,
    where k can be as large as B-1 (see Ch. 13.2 and
    (R1 OR R2) chapters).

8
Sorting Example
  • Setup
  • 10M records of 100 bytes 1GB file.
  • Stored on Megatron 747 disk, with 4KB blocks,
    each holding 40 records header information.
  • Suppose each cylinder 1MB. ? entire file takes
    up 1000 cylinders ? 250,000 blocks.
  • 50M available main memory 50 x 106/(4000)
    12,500 blocks 1/20th of file.
  • Sort by primary key field. gt gt

9
Merge Sort
  • Common mainmemory sorting algorithms don't
    optimize disk I/O's. Variants of Merge Sort do
    better.
  • Merge take two sorted lists and repeatedly
    choose the smaller of the heads'' of the lists
    (head first among the unchosen).
  • Example merge 1,3,4,8 with 2,5,7,9
    1,2,3,4,5,7,8,9.
  • Merge Sort based on recursive algorithm divide
    list of records into two parts recursively
    mergesort the parts, and merge the resulting
    lists.

10
TwoPhase, Multiway Merge Sort
  • Vanilla Merge Sort still not very good in disk
    I/O model.
  • log2 n passes, so each record is read/written
    from disk log2 n times.
  • 2PMMS 2 reads 2 writes per block.
  • Phase 1
  • 1. Fill buffer with records.
  • 2. Sort using favorite mainmemory sort.
  • 3. Write sorted sublist to disk.
  • 4. Repeat until all records have been put into
    one of the sorted sublists.
  • Sorted sublist SSL sorted run.

11
TwoPhase, Multiway Merge Sort
12
TwoPhase, Multiway Merge Sort
13
TwoPhase, Multiway Merge Sort
14
TwoPhase, Multiway Merge Sort
15
2PMMS (contd.)
  • Phase 2
  • Use one buffer for each of the sorted sublists
    and one buffer for output.
  • i.e., split available buffer into several parts
    (logical buffers) and allocate for various
    purposes.


Available Memory
16
2PMMS (contd.)
  • Phase 2
  • Use one buffer for each of the sorted sublists
    and one buffer for output.

Pointers to first unchosen
Select the smallest unchosen
Output buffer
17
2PMMS (contd.)
  • Phase 2
  • Use one buffer for each of the sorted sublists
    and one buffer for output.

5
Pointers to first unchosen
8
6
Select the smallest unchosen
5
Output buffer
18
2PMMS (contd.)
  • Initially load input buffers with the first
    blocks of their respective sorted lists.
  • Repeatedly run a competition among the first
    unchosen records of each of the buffered blocks.
  • Move the record with the least key to the output
    block it is now chosen.''
  • Manage the buffers as needed
  • If an input block is exhausted, get the next
    block from the same list (i.e., file).
  • If the output block is full, write it to disk.
  • (assumes each buffer holds just one block. In
    practice,you design for each buffer to hold as
    many blocks as possible.)

19
Analysis of Naïve Implementation
  • Recall problem parameters.
  • Assume blocks are stored at random, so average
    access time includes seek rotational delay
    transfer, and is about 15 ms.
  • File stored on 250,000 blocks, read and written
    once in each phase.
  • 1,000,000 disk I/O's 0.015 seconds 15,000
    seconds 250 minutes 4 hours.
  • Problem and Interlude
  • How many records can you sort with 2PMMS, under
    our assumptions about records, main memory, and
    the disk? What would you do if there were more?

20
Improving the Running Time of 2PMMS
  • Here are some techniques that sometimes make
  • secondarymemory algorithms more efficient
  • 1. Group blocks by cylinder (cylindrification).
  • 2. One big disk ? several smaller disks.
  • 3. Mirror disks multiple copies of the same
  • data.
  • 4. Prefetching'' or double buffering.''
  • 5. Disk scheduling the elevator'' algorithm.

?
?
21
Cylindrification
  • If we are going to read or write blocks in a
    known order, place them by cylinder, so once we
    reach that cylinder we can read block after
    block, with no seek time or rotational latency.
  • Application to Phase 1 of 2PMMS Recall problem
    parameters.
  • 1. Initially, records on 1000 consecutive
    cylinders.
  • 2. Load main memory from 50 consecutive
    cylinders.
  • Order of blocks read is unimportant, so only time
  • besides transfer time is one random seek
  • and 49 1cylinder seeks (neglect).
  • Time to transfer 12,500 blocks at 0.5 ms/block
    6.25 sec.
  • 3. Write each sorted sublist onto 50 consecutive
    cylinders,
  • so write time is also about 6.25 sec. What about
    for the whole file?

Single sorted sublist perspective
But in Phase 2 Blocks are read from the 20
sublists in a datadependent order, so we cannot
take much advantage of cylinders. (why 20
sublists? cos there is 1000 cylinders worth of
data) .
22
Prefetching
  • If we have extra space for mainmemory buffers,
    consider loading buffers in advance of need.
    Similarly, keep output blocks buffered in main
    memory until it is convenient to write them to
    disk.
  • Example Phase 2 of 2PMMS
  • With 50MB of main memory, we can afford to
    buffer two cylinders for each sublist and for the
    output (recall one cylinder 1MB).
  • Consume one cylinder for each sublist while the
    other is being loaded from disk.
  • Similarly, write one output cylinder while the
    other is being constructed (via merge).

23
Prefetching
  • Thus, seek and rotational latency are made
    negligible for Phase I (thanks to
    cylindrification) and are minimized for phase II
    (thanks to prefetching).
  • So, the total time for phase 2 is approx. one
    read plus one write of whole file, but one
    cylinderful at a time.

24
Prefetching/Double buffering Illustrated
i/p buffer
o/p buffer
  • Alternately process different halves of each
    input buffer while the
  • other part is being filled from disk.
  • Similar idea for output buffer.

25
Using B Trees for Sorting
  • Scenario Table to be sorted has B tree index on
    sorting column(s).
  • Idea Can retrieve records in order by traversing
    leaf pages.
  • Is this a good idea?
  • Cases to consider
  • B tree is clustered Good idea!
  • B tree is not clustered Could be a very bad idea!

26
Clustered B Tree Used for Sorting
  • Cost root to the left-most leaf, then retrieve
    all leaf pages (Alternative 1)
  • If Alternative 2 is used? Additional cost of
    retrieving data records each page fetched just
    once.

Index
(Direct search)
Data Entries
("Sequence set")
Data Records
27
Unclustered B Tree Used for Sorting
  • Alternative (2) for data entries each data entry
    contains rid of a data record. In general, one
    I/O per data record!

Index
(Directs search)
Data Entries
("Sequence set")
Data Records
28
Conclusion Sorting Records!
  • Sorting has become a blood sport!
  • Parallel sorting is the name of the game ...
  • Datamation Sort 1M records of size 100 bytes
  • Typical DBMS 15 minutes
  • World record 3.5 seconds
  • 12-CPU SGI machine, 96 disks, 2GB of RAM
  • These records beaten big time in 2010
  • UCSD Sort gt 1 TB in 60 s! Sort 1 trillion
    records in 172 min.!
  • New benchmarks proposed
  • Minute Sort How many can you sort in 1 minute?
  • Dollar Sort How many can you sort for 1.00?
Write a Comment
User Comments (0)
About PowerShow.com