A classic problem in computer science - PowerPoint PPT Presentation

About This Presentation
Title:

A classic problem in computer science

Description:

Each pass we read write each page in file. N pages in the file = the number of passes ... Pass 1: = 6 sorted runs of 20 pages each (last run is only 8 pages) ... – PowerPoint PPT presentation

Number of Views:53
Avg rating:3.0/5.0
Slides: 17
Provided by: RaghuRama84
Category:

less

Transcript and Presenter's Notes

Title: A classic problem in computer science


1
External Sorting
  • A classic problem in computer science!
  • Data requested in sorted order
  • e.g., find students in increasing cap order
  • Sorting is used in many applications
  • First step in bulk loading operations.
  • Sorting useful for eliminating duplicate copies
    in a collection of records (How?)
  • Sort-merge join algorithm involves sorting.
  • Problem sort 1Gb of data with 1Mb of RAM.

2
2-Way Sort Requires 3 Buffers
  • Pass 1 Read a page, sort it, write it.
  • only one buffer page is used
  • Pass 2, 3, , etc.
  • three buffer pages used.

INPUT 1
OUTPUT
INPUT 2
Main memory buffers
Disk
Disk
3
Two-Way External Merge Sort
Input file
6,2
2
3,4
9,4
8,7
5,6
3,1
  • Each pass we read write each page in file.
  • N pages in the file gt the number of passes
  • So total cost is
  • Idea Divide and conquer sort subfiles and merge

PASS 0
1-page runs
1,3
2
3,4
5,6
2,6
4,9
7,8
PASS 1
4,7
1,3
2,3
2-page runs
8,9
5,6
2
4,6
PASS 2
2,3
4,4
1,2
4-page runs
6,7
3,5
6
8,9
PASS 3
1,2
2,3
3,4
8-page runs
4,5
6,6
7,8
9
4
General External Merge Sort
  • More than 3 buffer pages. How can we utilize
    them?
  • To sort a file with N pages using B buffer pages
  • Pass 0 use B buffer pages. Produce
    sorted runs of B pages each.
  • Pass 2, , etc. merge B-1 runs.

INPUT 1
. . .
. . .
INPUT 2
. . .
OUTPUT
INPUT B-1
Disk
Disk
B Main memory buffers
5
Cost of External Merge Sort
  • Number of passes
  • Cost 2N ( of passes)
  • E.g., with 5 buffer pages, to sort 108 page file
  • Pass 0 22 sorted runs of 5
    pages each (last run is only 3 pages)
  • Pass 1 6 sorted runs of 20
    pages each (last run is only 8 pages)
  • Pass 2 2 sorted runs, 80 pages and 28 pages
  • Pass 3 Sorted file of 108 pages

6
Number of Passes of External Sort
7
Sequential vs Random I/Os
  • Is minimizing passes optimal? Would merging as
    many runs as possible the best solution?
  • Suppose we have 80 runs, each 80 pages long and
    we have 80 pages of buffer space.
  • We can merge all 80 runs in a single pass
  • each page requires a seek to access (Why?)
  • there are 80 pages per run, so 80 seeks per run
  • total cost 80 runs X 80 seeks 6,400 seeks

8
Sequential vs Random I/Os (Cont)
  • We can merge all 80 runs in two steps
  • 5 sets of 16 runs each
  • read 80/165 pages of one run
  • 16 runs result in sorted run of 1280 pages
  • each merge requires 80/5X16 256 seeks
  • for 5 sets, we have 5X256 1280 seeks
  • merge 5 runs of 1280 pages
  • read 80/516 pages of one run gt 1280/1680 seeks
    in total
  • 5 runs gt 5X80 400 seeks
  • total 12804001680 seeks!!!
  • Number of passes increases, but number of seeks
    decreases!

9
Using B Trees for Sorting
  • Scenario Table to be sorted has B tree index on
    sorting column(s).
  • Idea Can retrieve records in order by traversing
    leaf pages.
  • Is this a good idea?
  • Cases to consider
  • B tree is clustered -- Good idea!
  • B tree is not clustered -- Could be a very bad
    idea!

10
Clustered B Tree Used for Sorting
  • Cost root to the left-most leaf, then retrieve
    all leaf pages (ltkey,recordgt pair organization)
  • If ltkey, ridgt pair organization is used?
    Additional cost of retrieving data records each
    page fetched just once.

Index
(Directs search)
Data Entries
("Sequence set")
Data Records
  • Always better than external sorting!

11
Unclustered B Tree Used for Sorting
  • each data entry contains ltkey,rid gt of a data
    record. In general, one I/O per data record!

Index
(Directs search)
Data Entries
("Sequence set")
Data Records
12
Summary
  • External sorting is important DBMS may dedicate
    part of buffer pool for sorting!
  • External merge sort minimizes disk I/O cost
  • Pass 0 Produces sorted runs of size B ( buffer
    pages). Later passes merge runs.
  • of runs merged at a time depends on B, and
    block size.
  • Larger block size means less I/O cost per page.
  • Larger block size means smaller runs merged.
  • In practice, of runs rarely more than 2 or 3.
  • Clustered B tree is good for sorting
    unclustered tree is usually very bad.

13
Database Tuning
  • Database Tuning is the activity of making a
    database application run more quickly. More
    quickly usually means higher throughput, though
    it may mean lower response time for time-critical
    applications.

14
Tuning Principles
  • Think globally, fix locally
  • Partitioning breaks bottlenecks (temporal and
    spatial)
  • Start-up costs are high running costs are low
  • Render onto server what is due onto Server
  • Be prepared for trade-offs (indexes and inserts)

15
Tuning Mindset
  • Set reasonable performance tuning goals
  • Measure and document current performance
  • Identify current system performance bottleneck
  • Identify current OS bottleneck
  • Tune the required components eg application, DB,
    I/O, contention, OS etc
  • Track and exercise change-control procedures
  • Measure and document current performance
  • Repeat step 3 through 7 until the goal is met

16
Goals Met?
  • Appreciation of DBMS architecture
  • Study the effect of various components on the
    performance of the systems
  • Tuning principle
  • Troubleshooting techniques for chasing down
    performance problems
  • Hands-on experience in Tuning
Write a Comment
User Comments (0)
About PowerShow.com