Improve Run Generation - PowerPoint PPT Presentation

Description:

Title: Data Representation Methods Author: Preferred Customer Last modified by: sahni Created Date: 6/17/1995 11:31:02 PM Document presentation format – PowerPoint PPT presentation

Slides: 31
Provided by: Preferr1140
Learn more at: https://www.cise.ufl.edu
Transcript and Presenter's Notes

1
Improve Run Generation
  • Overlap input, output, and internal CPU work.
  • Reduce the number of runs (equivalently, increase
    average run length).

2
Internal Quick Sort
  • Use 6 as the pivot (median of 3).
  • Input the first, middle, and last blocks first.
  • Partition in place.
  • Input the remaining blocks from the ends toward the
    middle.
  • Sort the left and right groups recursively.
  • Output can begin as soon as the leftmost block is
    ready.
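The in-memory part of this slide can be sketched as follows. This is a minimal illustration of quick sort with a median-of-3 pivot and in-place partitioning; it does not model the block-by-block input and output overlap, and the function names are assumptions made for the example, not taken from the slides.

```python
def median_of_three(a, lo, hi):
    """Return the median of the first, middle, and last keys of a[lo..hi]."""
    mid = (lo + hi) // 2
    return sorted((a[lo], a[mid], a[hi]))[1]


def quick_sort(a, lo=0, hi=None):
    """Sort a[lo..hi] in place using a median-of-3 pivot."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = median_of_three(a, lo, hi)
    i, j = lo, hi
    # Scan from both ends toward the middle, swapping misplaced keys.
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    quick_sort(a, lo, j)   # left group: keys <= pivot
    quick_sort(a, i, hi)   # right group: keys >= pivot


data = [6, 2, 8, 5, 11, 10, 4, 1, 9, 7, 3]
quick_sort(data)
print(data)
```

Because the left group is finished first, its leftmost keys reach their final positions before the right group is touched, which is what lets output start early.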
3
Alternative Internal Sort Scheme
Partition memory into 3 areas; each may be more than 1
block in size.
[Figure: the memory areas with two disks, each labeled DISK.]
4
Steady State Operation
  • Synchronization is done when the current internal
    sort terminates.

5
New Strategy
  • Use 2 input and 2 output buffers.
  • Rest of memory is used for a min loser tree.
  • Actually, 3 buffers are adequate.
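For readers unfamiliar with the data structure named above, here is a minimal sketch of a min loser tree. The class name `MinLoserTree` and its methods are assumptions made for illustration and do not come from the slides; the sketch assumes at least one player and ignores the buffers entirely.

```python
class MinLoserTree:
    """Tournament tree over k players; tree[0] holds the overall winner."""

    def __init__(self, keys):
        self.k = len(keys)               # number of external nodes (k >= 1)
        self.keys = list(keys)
        self.tree = [0] * self.k         # tree[1..k-1] hold losers (player indices)
        winners = [0] * (2 * self.k)     # winners[node] = player that wins at node
        for p in range(self.k):
            winners[self.k + p] = p      # leaves sit at positions k..2k-1
        for node in range(self.k - 1, 0, -1):
            a, b = winners[2 * node], winners[2 * node + 1]
            if self.keys[b] < self.keys[a]:
                a, b = b, a
            winners[node] = a            # smaller key advances
            self.tree[node] = b          # larger key is recorded as the loser
        self.tree[0] = winners[1]

    def winner(self):
        """Return (player index, key) of the current minimum."""
        p = self.tree[0]
        return p, self.keys[p]

    def replace_winner(self, key):
        """Give the winning player a new key and replay its path to the root."""
        p = self.tree[0]
        self.keys[p] = key
        node = (self.k + p) // 2
        while node >= 1:
            if self.keys[self.tree[node]] < self.keys[p]:
                # The stored loser now wins this match; the challenger stays here.
                self.tree[node], p = p, self.tree[node]
            node //= 2
        self.tree[0] = p


tree = MinLoserTree([4, 3, 6, 8, 1, 5, 7])
print(tree.winner())        # player 4 holds the smallest key, 1
tree.replace_winner(9)
print(tree.winner())
```

Each replacement touches only the nodes on one leaf-to-root path, so it costs O(log k) comparisons. For run generation, one option is to store keys as (run number, value) pairs so that records held over for the next run lose to every record of the current run.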

6
Steady State Operation
  • Synchronization is done when the active input
    buffer gets empty (the active output buffer will
    be full at this time).

7
Initialize
[Figure: loser tree and buffers during initialization. Label: Fill From Disk. Node values are not recoverable from this transcript.]
8
Initialize
[Figure: loser tree and buffers during initialization. Label: Fill From Disk. Node values are not recoverable from this transcript.]
9
Initialize
[Figure: loser tree and buffers during initialization. Label: Fill From Disk. Node values are not recoverable from this transcript.]
10
Initialize
[Figure: loser tree and buffers during initialization. Label: Fill From Disk. Node values are not recoverable from this transcript.]
11
Initialize
[Figure: loser tree and buffers during initialization. Label: Fill From Disk. Node values are not recoverable from this transcript.]
12
Initialize
[Figure: loser tree and buffers at the end of initialization. Label: Fill From Disk. Node values are not recoverable from this transcript.]
13
Generate Run 1
[Figure: loser tree and buffers. Labels: Fill From Tree, Fill From Disk. Block shown: 3 5 4.]
14
Generate Run 1
[Figure: loser tree and buffers. Labels: Fill From Tree, Fill From Disk. Block shown: 3 5 4. Node values are not recoverable from this transcript.]
15
Generate Run 1
[Figure: loser tree and buffers. Labels: Fill From Tree, Fill From Disk. Block shown: 3 5 4. Node values are not recoverable from this transcript.]
16
Generate Run 1
[Figure: loser tree and buffers. Labels: Fill From Tree, Fill From Disk, Interchange Role Of Buffers. Block shown: 3 5 4. Node values are not recoverable from this transcript.]
17
Interchange Role Of Buffers
[Figure: loser tree and buffers after the buffers swap roles. Labels: Write To Disk, Fill From Tree, Fill From Disk. Block shown: 1 9 2. Node values are not recoverable from this transcript.]
18
Continue With Run 1
[Figure: loser tree and buffers. Labels: Write To Disk, Fill From Tree, Fill From Disk. Block shown: 1 9 2. Node values are not recoverable from this transcript.]
19
Continue With Run 1
[Figure: loser tree and buffers. Labels: Write To Disk, Fill From Tree, Fill From Disk. Block shown: 1 9 2. Node values are not recoverable from this transcript.]
20
Continue With Run 1
[Figure: loser tree and buffers. Labels: Write To Disk, Fill From Tree, Fill From Disk. Block shown: 1 9 2. Node values are not recoverable from this transcript.]
21
Continue With Run 1
[Figure: loser tree and buffers. Labels: Write To Disk, Fill From Tree, Fill From Disk, Interchange Role Of Buffers. Block shown: 1 9 2. Node values are not recoverable from this transcript.]
22
Interchange Role Of Buffers
[Figure: loser tree and buffers after the buffers swap roles. Labels: Write To Disk, Fill From Tree, Fill From Disk. Block shown: 6 1 3. Node values are not recoverable from this transcript.]
23
Continue With Run 1
[Figure: loser tree and buffers. Labels: Write To Disk, Fill From Tree, Fill From Disk. Block shown: 6 1 3. Node values are not recoverable from this transcript.]
24
Continue With Run 1
[Figure: loser tree and buffers. Labels: Write To Disk, Fill From Tree, Fill From Disk. Block shown: 6 1 3. Node values are not recoverable from this transcript.]
25
Continue With Run 1
[Figure: loser tree and buffers. Labels: Write To Disk, Fill From Tree, Fill From Disk. Block shown: 6 1 3. Node values are not recoverable from this transcript.]
26
RUN SIZE
  • Let k be number of external nodes in loser tree.
  • Run size >= k.
  • Sorted input => 1 run.
  • Reverse of sorted input => n/k runs.
  • Average run size is 2k.
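The run-size claims above can be checked with a small experiment. The sketch below uses Python's `heapq` binary heap as a stand-in for the loser tree (a different structure that makes the same selections), holds k records in memory, and ignores buffering. The function name `generate_runs` is an assumption made for the example.

```python
import heapq
import random

_DONE = object()  # sentinel marking the end of the input


def generate_runs(records, k):
    """Replacement selection with k records in memory; returns a list of runs."""
    it = iter(records)
    # Heap entries are (run number, key), so the current run is drained first.
    heap = [(1, key) for _, key in zip(range(k), it)]
    heapq.heapify(heap)
    runs = []
    while heap:
        run, key = heap[0]
        if run > len(runs):
            runs.append([])              # first record of a new run
        runs[-1].append(key)
        nxt = next(it, _DONE)
        if nxt is _DONE:
            heapq.heappop(heap)
        else:
            # A key smaller than the one just output must wait for the next run.
            heapq.heapreplace(heap, (run if nxt >= key else run + 1, nxt))
    return runs


n, k = 1000, 10
print(len(generate_runs(range(n), k)))         # sorted input: 1 run
print(len(generate_runs(range(n, 0, -1), k)))  # reverse-sorted input: n/k = 100 runs
random.seed(0)
shuffled = random.sample(range(n), n)
runs = generate_runs(shuffled, k)
print(n / len(runs))                           # average run length on random input
```

On random input the printed average should land in the neighborhood of 2k, though a short input like this one is noisy because the first and last runs are atypical.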

27
Comparison
  • Memory capacity = m records.
  • Run size using fill memory, sort, and output run
    scheme = m.
  • Use loser tree scheme.
  • Assume block size is b records.
  • Need memory for 4 buffers (4b records).
  • Loser tree k = m - 4b.
  • Average run size = 2k = 2(m - 4b).
  • 2k >= m when m >= 8b.

28
Comparison
  • Assume b = 100.
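The figures for this slide did not survive in the transcript, but the comparison can be recomputed from the formulas on the previous slide. The sketch below tabulates k and 2k for b = 100; the memory sizes m are illustrative values chosen for this example, not the ones from the original slide, and the function name is hypothetical.

```python
def compare_run_sizes(m, b):
    """Return (k, average loser-tree run size, fill-sort-output run size)."""
    k = m - 4 * b          # memory left for the tree after 4 buffers of b records
    return k, 2 * k, m


b = 100
for m in (500, 800, 1000, 5000, 10000):    # illustrative memory sizes
    k, loser_avg, simple = compare_run_sizes(m, b)
    print(f"m={m:>6}  k={k:>5}  2k={loser_avg:>6}  fill/sort/output={simple:>6}")
```

The break-even point is m = 8b = 800; for larger memories the loser tree scheme's average run approaches twice the run size of the fill, sort, and output scheme.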

29
Comparison
  • Total internal processing time using fill memory,
    sort, and output run scheme = O((n/m) m log m)
    = O(n log m), since each of the n/m runs is sorted
    in O(m log m) time.
  • Total internal processing time using loser tree
    = O(n log k).
  • Loser tree scheme generates runs that differ in
    their lengths.

30
Merging Runs Of Different Length
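[Figure: two merge trees over the same runs, each with root weight 22. One has internal node weights 7, 13, 22 (cost 42); the other has internal node weights 7, 15, 22 (cost 44). The leaf run lengths are not recoverable from this transcript.]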
Best merge sequence?
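One way to explore the question is to compute merge costs directly. In the sketch below, the cost of a 2-way merge is the length of the run it produces, and a merge tree is written as nested tuples. The run lengths 4, 3, 6, 9 are an assumption: they are not in the transcript and were chosen only because they reproduce the costs 42 and 44. The greedy rule shown (always merge the two shortest runs, as in Huffman coding) is a well-known optimal strategy for 2-way merging; whether the original slides go on to present it is not visible here.

```python
import heapq


def tree_cost(tree):
    """Return (merged length, total cost) for a nested-tuple merge tree."""
    if isinstance(tree, int):
        return tree, 0                     # a single run costs nothing
    left_len, left_cost = tree_cost(tree[0])
    right_len, right_cost = tree_cost(tree[1])
    length = left_len + right_len          # records moved by this merge
    return length, left_cost + right_cost + length


def best_merge_cost(run_lengths):
    """Greedy 2-way merging: repeatedly merge the two shortest runs."""
    heap = list(run_lengths)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


print(tree_cost(((4, 3), (6, 9))))     # balanced tree: (22, 44)
print(tree_cost((((4, 3), 6), 9)))     # skewed tree:   (22, 42)
print(best_merge_cost([4, 3, 6, 9]))   # greedy:        42
```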