Sort - PowerPoint PPT Presentation

About This Presentation
Title:

Sort

Description:

Sorting operations are frequently used in database processing. ... in which LBOS-F can fragment oversize buckets and distribute them to different PNs. ...

Provided by: yulu

Transcript and Presenter's Notes


1
Parallel Sorting
2
Sort
  • Sorting operations are frequently used in database
    processing.
  • For example, sorting may be requested by users
    through the Distinct, Order By, and Group By
    clauses in SQL.
  • In general, parallel database sorting consists of
    multiple rounds of local sort, performed in each
    processing node (PN) in parallel, followed by
    movement of tuples among PNs.
  • Dynamic load balancing approaches have been
    proposed recently.
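
The "local sort, then move tuples" pattern can be illustrated with a small single-process sketch. It simulates each PN as a plain Python list and merges the sorted runs on one node; the function name and the sample data are hypothetical, and a real system would run the local sorts in parallel on separate nodes.

```python
import heapq

def local_sort_then_merge(partitions):
    """Sort each simulated PN's data locally, then merge the sorted runs."""
    # Local sort: in a real system this step runs on every PN in parallel.
    sorted_runs = [sorted(p) for p in partitions]
    # Tuple movement: all sorted runs are sent to one node and merged there.
    return list(heapq.merge(*sorted_runs))

# Three simulated PNs, each holding unsorted local tuples (made-up data).
partitions = [[9, 3, 7], [4, 8, 1], [6, 2, 5]]
result = local_sort_then_merge(partitions)
```

Merging everything on a single node keeps the sketch short, but that node becomes the bottleneck, which is the motivation for the partitioned and load-balanced variants that follow.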

3
Parallel Sorting
  • Traditional Technologies
      • Parallel Merge-All Sort
      • Parallel Binary-Merge Sort
      • Parallel Partitioned Sort
  • Dynamic Load Balancing Approaches
      • Partitioned with Redistribution Sort (PRS)
      • Load Balancing Optimization Sort (LBOS)
      • Load Balancing Optimization Sort with
        Fragment Feature (LBOS-F)

4
Parallel Merge-All Sort
5
Parallel Binary-Merge Sort
6
Parallel Partitioned Sort
7
Partitioned with Redistribution Sort (PRS)
  • Partition Phase: each PN first scans its local
    tuples and distributes them as buckets according
    to range partitioning.
  • Optimization Phase: a coordinator PN monitors the
    load of every PN. If every PN has a similar
    amount of data to process, go to the Sorting
    Phase. Otherwise, an imbalanced workload has
    been detected and the data must be
    repartitioned.
  • Redistribution Phase: the data is redistributed
    among the PNs.
  • Sorting Phase: a local sort is carried out in
    each PN.
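
The Partition and Optimization phases can be sketched as follows. This is a minimal single-process illustration, not the presenter's implementation: the range boundaries, the `tolerance` parameter, and the helper names are assumptions, and the redistribution step is left out because the slide does not say how the new partitioning is chosen.

```python
from bisect import bisect_right

def range_partition(local_tuples, boundaries):
    """Split one PN's tuples into len(boundaries) + 1 range buckets."""
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for t in local_tuples:
        # bisect_right returns the index of the range that t falls into.
        buckets[bisect_right(boundaries, t)].append(t)
    return buckets

def is_balanced(loads, tolerance):
    """Coordinator check (assumed rule): every load within tolerance of the mean."""
    avg = sum(loads) / len(loads)
    return all(abs(load - avg) <= tolerance * avg for load in loads)

# Three simulated PNs with made-up data; bucket j is sent to PN j.
pns = [[1, 2, 3, 50], [4, 5, 60, 6], [7, 8, 9, 70]]
boundaries = [10, 40]
received = [[] for _ in range(3)]
for local in pns:
    for dest, bucket in enumerate(range_partition(local, boundaries)):
        received[dest].extend(bucket)

loads = [len(r) for r in received]          # tuples each PN must sort
needs_redistribution = not is_balanced(loads, tolerance=0.25)
```

With this data most values fall below 10, so the first PN receives far more tuples than the others and the coordinator check reports an imbalance.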

8
PRS
9
Load Balancing Optimization Sort (LBOS)
  • Phase 1. Bucket Sizes Counting: each PN reads
    its local data and computes the size of each
    bucket according to the range distribution.
    Each PN then transfers the bucket information
    to a coordinator.
  • Phase 2. Load Balancing Optimization: the
    coordinator decides the distribution strategy,
    assigning buckets to PNs sequentially in order
    of bucket ID. Let $P_i$ be the total size
    already assigned to the current $PN_i$.
    Whether the next bucket (size $B$) is also
    assigned to $PN_i$ depends on the test
    $(P_i + B) - P_{avg} < d$: if it holds, the
    bucket is assigned to $PN_i$; otherwise it is
    assigned to the next node, $PN_{i+1}$. This
    phase only decides the distribution strategy;
    no data is transferred.

10
  • Phase 3. Data Partitioning: each PN sends and
    receives data according to the distribution
    strategy decided in Phase 2.
  • Phase 4. Sorting: each PN performs a local sort.
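
The Phase 2 assignment rule can be sketched as below. The sketch assumes that Pavg is the total size divided by the number of PNs and that d is a threshold chosen by the caller; neither is spelled out on the slides. It also keeps assigning to the last PN once the nodes run out, which is an added guard rather than part of the stated rule. The function name is hypothetical.

```python
def assign_buckets(bucket_sizes, num_pns, d):
    """Assign buckets to PNs in bucket-ID order; only sizes are used, no data moves."""
    p_avg = sum(bucket_sizes) / num_pns      # assumed definition of Pavg
    assignment = [[] for _ in range(num_pns)]
    loads = [0] * num_pns                    # loads[i] plays the role of Pi
    i = 0                                    # index of the current PN
    for bucket_id, b in enumerate(bucket_sizes):
        # Rule from the slide: keep the bucket on PN i if (Pi + B) - Pavg < d,
        # otherwise give it to PN i+1. The last PN takes whatever remains
        # (assumed guard, not stated in the source).
        if (loads[i] + b) - p_avg >= d and i < num_pns - 1:
            i += 1
        assignment[i].append(bucket_id)
        loads[i] += b
    return assignment, loads

# Six buckets with made-up sizes, three PNs, threshold d = 1.
assignment, loads = assign_buckets([4, 3, 5, 2, 6, 4], num_pns=3, d=1)
```

Because whole buckets are the unit of assignment, a single very large bucket still lands on one PN, which is the weakness that LBOS-F targets.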

11
LBOS
12
Effect of Bucket Skew
13
Effect of No. of Processing Nodes
PNs = 16
PNs = 256
PRS PPS ? 21.8
PRS PPS ? 21.8
LBOS PPS ? 59.9
LBOS PPS ? 39.7
14
Load Balancing Optimization Sort with Fragment
Feature (LBOS-F)
  • When the skew is severe, LBOS does not perform
    well.
  • LBOS-F addresses this drawback; it also consists
    of four phases.
  • Phases 1, 3, and 4 are exactly the same as in
    LBOS. Only Phase 2 differs: in it, LBOS-F can
    fragment oversize buckets and distribute them to
    different PNs.

15
Phase 2 for LBOS-F
  • Phase 2. Load Balancing Optimization with
    Fragment Feature: the coordinator decides the
    distribution strategy, assigning buckets to PNs
    sequentially in order of bucket ID. Suppose
    that the next bucket considered for assignment
    has size $B$ with $P_i + B > P_{avg}$, and
    that its data values span the interval from
    $r_s$ to $r_e$ ($r_s < r_e$). The coordinator
    fragments this bucket into two portions.
    Suppose that $PN_i$ already holds tuples of
    total size $P_i$, which is $d$ tuples less
    than $P_{avg}$ ($P_i + d = P_{avg}$). The
    first portion, consisting of the data values
    in the range from $r_s$ to
    $r_s + (r_e - r_s)\,d/B$, is assigned to the
    current $PN_i$. The second portion, consisting
    of the remaining data of the bucket, is
    assigned to the next node, $PN_{i+1}$. Again,
    no data is transferred in this phase.
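
The split point can be computed directly from the formula above. This is a small sketch with a hypothetical function name and made-up numbers. Note that the formula is a linear interpolation over the value range, so the first portion holds about d tuples only if the values are spread roughly evenly within the bucket.

```python
def fragment_bucket(r_s, r_e, b, p_i, p_avg):
    """Split the value range [r_s, r_e] of an oversize bucket of size b.

    Returns (first_range, second_range): the first range goes to the current
    PN, the second to the next PN. Only ranges are computed; no data moves.
    """
    d = p_avg - p_i                      # room left on the current PN (Pi + d = Pavg)
    split = r_s + (r_e - r_s) * d / b    # split point from the slide's formula
    return (r_s, split), (split, r_e)

# Bucket of 50 tuples covering values 100..200; the current PN holds 30 tuples
# and the average load is 40, so it has room for d = 10 more.
first, second = fragment_bucket(r_s=100, r_e=200, b=50, p_i=30, p_avg=40)
```

Here the current PN receives the values from 100 up to the split point at 120, which is one fifth of the range for one fifth (10 of 50) of the bucket's tuples.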

16
Effect of Bucket Skew
17
Effect of Number of PNs