Title: Cosequential Processing
1. Cosequential Processing
2. Cosequential Processing
- Coordinated processing of two or more sequential lists
- Goals
  - To merge lists into a single sorted list (union): make a single sorted list from many
  - To match records with the same keys (intersection): find entries which exist in multiple lists
  - To apply transactions to a master file
3. Cosequential Processing
- Keys
  - Matching/merging may be by a single key or several.
  - The number of keys only affects the compare operator, not the sort strategy.
4. Master Transaction File Processing
- A common processing strategy on sequential files.
- Common since historically sequential processing was the rule (tapes, cards)
  - Companies stored data in sequential files
  - Lists of transactions were posted against these records periodically.
5. Master Transaction File Processing
- Consider a grocery store
  - A record of inventory for each type of item is stored in a large sequential file (the master file)
  - As items are sold, the item number and quantity sold are posted (written) as records to a transaction file
  - As trucks deliver new items, item numbers and quantities are entered into the transaction file.
  - As new types of items are added to inventory, or old items are discontinued, entries about this are placed in the transaction file.
6. Master Transaction File Processing

Master File:
  Item   Item Name        Type  Quan
  20231  Shoe Shine (br)  6     4
  20231  Shoe Shine (bl)  6     1
  20177  Cottage Cheese   5     392
  20179  Chicken Soup     6     32
  20231  T-bone           2     43
  ...

Transaction File:
  Item   Trans  Quan  Item Name
  20231  U      -2
  20231  U      50
  20379  U      -5
  20443  U      -4
  20445  A      40    Corn Chips
  20532  A      300   Butter
  20534  D
  20558  U      200
  ...

U - Update, A - Add, D - Delete
7. Master Transaction File Processing
- Periodically update the master from the transaction file

[Diagram: Old Master File and Transaction File feed the Update Operation, which produces a New Master File and Update Messages]
8. Master Transaction File Processing
- Transactions are applied against the master.
- A new master is created
- Invalid transactions result in a message
- Important changes are recorded in messages - an audit trail
- Transaction and master files must be in sorted order.
9. Master Transaction File Processing
- Processing Scheme
  - Read record Mast from old Master and Trans from Transaction
  - While more records in both files
    - If Trans.ID < Mast.ID and ADD, write Trans (the new record) to new master
    - else if Trans.ID = Mast.ID then
      - If UPDATE then update record and write to new master
      - If DELETE then continue (no write)
      - else transaction error
    - else write Mast to new master
    - Read the next transaction and/or master record, as each is consumed
  - If more records remain in old master, write them to new master
  - If more records remain in transaction, give errors
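A minimal sketch of this scheme in Python. The record layout (`id`, `op`, `quan` fields) is hypothetical, and at most one transaction per master record is assumed, matching the simple read-both step in the outline above:

```python
def update_master(master, trans):
    """Apply sorted transactions to a sorted master file.
    Ops: 'A' = add, 'U' = update quantity, 'D' = delete.
    Returns (new_master, error_messages)."""
    new_master, errors = [], []
    m = t = 0
    while m < len(master) and t < len(trans):
        mast, tr = master[m], trans[t]
        if tr["id"] < mast["id"]:
            if tr["op"] == "A":
                new_master.append({"id": tr["id"], "quan": tr["quan"]})
            else:
                errors.append(f"no master record for {tr['id']}")
            t += 1
        elif tr["id"] == mast["id"]:
            if tr["op"] == "U":
                new_master.append({"id": mast["id"],
                                   "quan": mast["quan"] + tr["quan"]})
            elif tr["op"] == "D":
                pass  # deleted: simply not written to the new master
            else:
                errors.append(f"duplicate add for {tr['id']}")
            m += 1
            t += 1
        else:
            new_master.append(mast)  # no transaction for this record
            m += 1
    new_master.extend(master[m:])    # leftover master records pass through
    errors.extend(f"unmatched transaction {tr['id']}" for tr in trans[t:])
    return new_master, errors
```

Both inputs must already be sorted by `id`; the single forward pass is what makes sequential (tape-era) processing workable.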
10. Merging
- Merge two (or more) sorted lists into a single sorted list
- May remove duplicates (union) or keep them

  List 1: Bill Gray Hillery Jenny Linda Mary Randy
  List 2: Cathy Fran Kenny Pete Sally Zeke
  Merged: Bill Cathy Fran Gray Hillery Jenny Kenny Linda Mary Pete Randy Sally Zeke
11. Merging
- Merge(List1, Max1, List2, Max2, Result)
  - int next1 = 0, next2 = 0, out = 0
  - while (next1 < Max1 and next2 < Max2)
    - if (List1[next1] > List2[next2])
      - Result[out++] = List2[next2++]
    - else
      - Result[out++] = List1[next1++]
  - while (next1 < Max1) Result[out++] = List1[next1++]
  - while (next2 < Max2) Result[out++] = List2[next2++]
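The two-way merge pseudocode translates directly to Python (a sketch; duplicates are kept, as in the merged name list above):

```python
def merge(list1, list2):
    """Merge two sorted lists into one sorted list, keeping duplicates."""
    result = []
    next1 = next2 = 0
    while next1 < len(list1) and next2 < len(list2):
        if list1[next1] > list2[next2]:
            result.append(list2[next2])
            next2 += 1
        else:
            result.append(list1[next1])
            next1 += 1
    result.extend(list1[next1:])  # copy whichever list has leftovers
    result.extend(list2[next2:])
    return result
```

Each comparison advances exactly one input, so the merge reads both lists serially in a single pass.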
12. Sorting
- Small files
- sort completely in memory
- Called internal sorting.
13. Sorting
- Larger files
- may be too large to fit in memory simultaneously
- require "external sorting"
- Sorting using secondary devices
14. External Sorting
- Criteria for evaluating external sorting algorithms
- Different from internal sorts
- Internal sort comparison criteria
- Number of comparisons required
- Number of swaps made
- Memory needs
- External sort comparison criteria
- Dominated by I/O time
- Minimize transfers between secondary storage and
main memory
15. External Sorting
- Two major external sorting methods
  - in situ - sort the file in place
  - non-in situ - use additional storage space
16. External Sorting
- Characteristics of in situ sorting
  - uses less file space, thus larger files may be sorted
  - if a crash occurs during the sort, the file may be left in a corrupt state
  - in situ sorts may be done on direct-access files using standard internal-type sorts
  - direct access is required (and may not be available)
  - performance of such algorithms tends to be data sensitive
17. External Sorting
- Consider a file with 1000 records, 120 bytes each
- We have 25,000 bytes available for a buffer.
- Solution?
  - read in 200 records at a time, sort internally
  - this results in 5 sorted files
  - merge the resulting sorted files into 1 sorted file
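The 1000-record scenario can be sketched in Python, with in-memory lists standing in for files (illustrative only; a real external sort would write each sorted partition to secondary storage):

```python
import heapq

def external_sort(records, buffer_size):
    """Two-stage sort: internally sort buffer_size-record chunks
    (the sorted partitions), then k-way merge the partitions."""
    partitions = []
    for start in range(0, len(records), buffer_size):
        # each chunk fits the buffer and is sorted with an internal sort
        partitions.append(sorted(records[start:start + buffer_size]))
    # heapq.merge streams the partitions serially, never loading them all
    return list(heapq.merge(*partitions))
```

With 1000 records and a 200-record buffer this produces exactly the 5 sorted partitions described above before the merge stage.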
18. Sort/Merge
- A common non-in situ method is an algorithm called "sort/merge"
- A "safe" sorting technique
  - performance is guaranteed
  - requires only serial file access
19. Sort/Merge

[Diagram: the file is partitioned, each partition is sorted, and the sorted partitions are merged]
20. Sort/Merge
- Sort/Merge techniques have two stages
  - sort stage - sorted partitions are generated
    - size depends on available memory
  - merge stage - sorted partitions are merged (repetitively if necessary)
- Why might more than one merge phase be needed?
21. Basic Sort/Merge
- initial partition size is 1
- Merge begins immediately (no sort)
- Smallest main memory use
- requires only 2 buffers in memory.
- File starts with N "sorted" files of size 1
- Similar to internal merge/sort
22. Improving Sort/Merge
- Increase buffer size
  - partitions are sorted (in memory) with little I/O
  - larger partitions mean fewer (I/O intensive) merges needed
- Take advantage of already sorted runs of data
  - consider the "unsortedness" of the data
23. Sort/Merge
- Producing sorted partitions
- internal sorting
- natural selection - (use already sorted runs)
- replacement selection
24. Internal sorting
- read M records (M determined by available memory)
- sort them using internal sorting techniques
- write back out, creating a partition of size M
25. Sort/Merge
- Replacement selection ("snowshovel")
  - files are usually not totally out of order
  - take advantage of partial ordering in the file
  - partition size varies with the already existing ordering
26. Replacement selection (snowshovel)
- Start with a primary buffer of size N (the snowshovel)
- 1. Read N records into the buffer
- 2. Output the record with the smallest key
- 3. Replace it with the next record in the file
- 4. If this new record is smaller than the last record written, "freeze" it (it must wait for the next partition)
- 5. If unfrozen records remain, go to 2
- 6. If all records are frozen, unfreeze them all, start a new partition, and go to 2
27. Replacement selection (snowshovel)
- If the file is sorted or almost sorted, one pass may suffice for a complete sort!
- Average partition length is 2N
- Consider a file with N = 4:
  - 29 42 3 7 9 101 99 87 89 100 16 8 12 2 15 EOF
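The numbered steps above can be traced with a short sketch, using a min-heap to stand in for the buffer (illustrative only; run it on the N = 4 data to see the partitions):

```python
import heapq

def replacement_selection(records, n):
    """Generate sorted partitions (runs) from records using an
    n-record buffer; frozen records wait for the next run."""
    it = iter(records)
    heap = [x for _, x in zip(range(n), it)]  # step 1: fill the buffer
    heapq.heapify(heap)
    frozen = []
    run = []
    while heap:
        smallest = heapq.heappop(heap)        # step 2: output smallest key
        run.append(smallest)
        nxt = next(it, None)                  # step 3: replace from input
        if nxt is not None:
            if nxt < smallest:
                frozen.append(nxt)            # step 4: freeze - too small
            else:
                heapq.heappush(heap, nxt)
        if not heap:                          # step 6: all frozen (or EOF)
            yield run
            run = []
            heap = frozen                     # unfreeze, start new partition
            heapq.heapify(heap)
            frozen = []
```

On the slide's data the first run already stretches to 10 records - longer than the 2N = 8 average - because the tail of the input happens to be ascending.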
28. Natural Selection
- Frozen records in the replacement scheme take up space and search time.
- Natural selection, rather than freezing, writes these unusable records to a fixed-length secondary file (called the reservoir)
- Partition creation terminates when the reservoir is full.
- Next, the buffer is refilled first with records from the reservoir, then with records from the file (if more are needed)
- Expected partition length is 2.718N (eN) if reservoir and buffer are the same size - about 36% longer than replacement selection's 2N
29. Natural Selection
- Redo the example with reservoir size 4:
  - 29 42 3 7 9 101 99 87 89 100 16 8 12 2 15 EOF
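A sketch of natural selection under one common interpretation of the rule above: reading stops once the reservoir fills, the buffer drains to finish the current run, and the next run starts from the reservoir. On this particular data the runs come out the same as replacement selection's:

```python
import heapq

def natural_selection(records, n, r):
    """Sorted runs via natural selection: an n-record buffer plus an
    r-record reservoir for input too small for the current run."""
    it = iter(records)
    buf = [x for _, x in zip(range(n), it)]
    heapq.heapify(buf)
    reservoir = []
    while buf:
        run = []
        while buf:
            smallest = heapq.heappop(buf)
            run.append(smallest)
            # refill the freed slot; too-small records go to the reservoir
            while len(reservoir) < r:
                nxt = next(it, None)
                if nxt is None:
                    break
                if nxt >= smallest:
                    heapq.heappush(buf, nxt)
                    break
                reservoir.append(nxt)
        yield run  # reservoir full (or input done): run is complete
        # next run: reservoir records first, then fresh input if needed
        buf = reservoir + [x for _, x in zip(range(n - len(reservoir)), it)]
        heapq.heapify(buf)
        reservoir = []
```

Unlike the frozen slots in replacement selection, the reservoir never occupies buffer space, which is where the longer eN expected run length comes from.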
30. Distribution and Merging
- Merging
  - required to bring the sorted partitions together into a sorted whole
  - may require a series of merge phases, where shorter partitions are merged into larger partitions
  - more than one partition per file
    - not all partitions can be opened at once
31. Merging - Single Phase

32. Merging - Multiple Phase

33. Merging - Multiple Partitions / File

[Diagram: multiple partitions per file - e.g. partitions through P5-8 merge into P1-8, which merges with P9-12 into P1-12]
34. Merging
- Major issue - minimizing overall I/O
  - Different-length partitions
    - time is spent simply reading and writing from one file
  - Left-over partitions
    - time is spent simply copying partitions
35. Distribution and Merging
- Distribution
  - In order to merge, partitions must be distributed to files in a manner facilitating the merge process.
  - If 1 partition per file, distribution is trivial
  - If >1 partition per file, distribution should minimize I/O
    - several partitions may be placed in each file
36. Balanced N-way merge
- Use as many files (or tapes) as the system can open at once (F files)
- Distribute the partitions evenly among F/2 files
- Repetitively merge back and forth between one set of F/2 files and the other
  - distribute the generated partitions evenly among the F/2 output files
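A small simulation of the balanced scheme, with lists of runs standing in for the F/2 input and output files (a sketch, not tuned for real I/O):

```python
import heapq

def balanced_merge(runs, f):
    """Balanced F-way merge: runs are spread over F/2 input 'files',
    and corresponding runs are merged pass by pass until one remains."""
    k = max(2, f // 2)  # number of input files per pass
    while len(runs) > 1:
        # distribute runs evenly (round-robin) among the k input files
        files = [runs[i::k] for i in range(k)]
        merged = []
        for p in range(max(len(fl) for fl in files)):
            group = [fl[p] for fl in files if p < len(fl)]
            merged.append(list(heapq.merge(*group)))  # one longer run
        runs = merged  # merged runs become next pass's input
    return runs[0] if runs else []
```

Each pass roughly halves (for F = 4) the number of runs, so the number of passes grows only logarithmically in the initial run count.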
37. Balanced 2-way merge

[Diagram: partitions distributed on Files 1 and 2 are merged to Files 3 and 4 (e.g. P5-8, then P1-8 and P9-12), and finally merged back to File 1 as P1-12]
38. Balanced 2-way merge
- Example: 4 files, 700 records, 100 records can be sorted in primary memory at once

  Sort:     File 1: 1-100  201-300  401-500  601-700
            File 2: 101-200  301-400  501-600
  Merge 1:  File 3: 1-200  401-600
            File 4: 201-400  601-700
  Merge 2:  File 1: 1-400
            File 2: 401-700
  Merge 3:  File 3: 1-700
39. Balanced N-way merge
- Advantage
  - simple
- Disadvantage
  - wastes time if partition sizes differ
  - time is spent reading and writing records without actually merging
40. Polyphase merging
- Strategically distribute the partitions onto F files based on the Fibonacci sequence
- Algorithm
  - During each phase, merge the input files until the end of the smallest one is reached
  - After each phase at least one file will be empty - this file becomes the new place to merge into
  - Continue to merge until only one file remains
41. Polyphase merging
- Consider: initially generate three files
  - 24 partitions, 20 partitions, and 13 partitions
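The three-file example can be simulated by tracking run counts only. One assumption here: a fourth, initially empty file receives each phase's output, and a phase merges one run from every non-empty file until the smallest file empties:

```python
def polyphase_phases(run_counts):
    """Count polyphase merge phases, given initial run counts per file
    plus one empty output file. Stops when a single run remains."""
    files = list(run_counts) + [0]  # the extra 0 is the empty output file
    phases = 0
    while sorted(files) != [0] * (len(files) - 1) + [1]:
        out = files.index(0)                # empty file receives merged runs
        m = min(c for c in files if c > 0)  # phase ends when this empties
        files = [c - m if c > 0 else c for c in files]
        files[out] = m                      # m merged (longer) runs written
        phases += 1
    return phases
```

Starting from (24, 20, 13) the counts shrink 13, 7, 4, 2, 1, 1 runs per phase, so six phases reduce 57 initial runs to one sorted file; only the smallest count's worth of data moves in each phase, which is the scheme's I/O advantage.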
42. Polyphase merging
- Advantages
  - no overhead from merging partitions of different sizes
- Disadvantages
  - complex management of files
  - must know partition sizes
  - still not completely optimal - partition sizes are not always maximal