Sort-Merge Join Implementation Details for Minibase PowerPoint PPT Presentation

presentation player overlay
About This Presentation
Transcript and Presenter's Notes

Title: Sort-Merge Join Implementation Details for Minibase


1
Sort-Merge Join Implementation Details for
Minibase
University of California Riverside Department
of Computer Science Engineering cs179G
Database Project Phase 4
  • by
  • Demetris Zeinalipour
  • http//www.cs.ucr.edu/cs179g-t/

2
Sort-Merge Join Review
Sailors S
Reserves R
Name
Carl
Tom
  • Query with JOIN
  • SELECT S.name
  • FROM Sailors S, Reserves R
  • WHERE S.sid R.id

SID NAME AGE
1 Carl 23
2 Tom 26
3 Peter 23
SID BID
2 5
1 4
2 9
  • How to implement the JOIN operator?
  • Using Nested Loop Joins. tuple-at-a-time For
    each tuple in S check with every tuple in R
    variation page-at-a-time (every page contains
    several records)
  • Using Block Nested Loop Join. Idea Load in
    memory smaller relation e.g. S and then scan
    relation S on a page-at-a-time basis. Notice The
    idea can be generalized even if the smaller
    relation doesnt fit in memory
  • Using Sort-Merge Join Idea Sort both relations
    using an external sort algorithm and then merge
    the relations.

3
External-Sorting Review 1/3
  1. When? If the data to be sorted is too big to fit
    in main memory then we need an external sort
    algorithm.
  2. Simple approach 2-way Merge-Sort


Inpage1 11,7, 2,1
Inpage219, 7,5,4
Use quicksort internally
Memory
4
  • Steps
  • Fetch a page to memory
  • Sort Page in memory
  • Write page back to disk
  • Merge pages levelwise
  • (see next page)

HeapFile 1
inpage1
1-3
outpage
4
inpage2
4
External-Sorting Review 2/3
  • 2-way mergesort gt
  • Passes log2N14
  • Cost 2N(log2N1)64 I/O
  • Expensive
  • In project we use External
  • Sort (sort.C) which utilizes
  • all available buffer pages (10)
  • and reduces the number of
  • Passes and the I/O cost

5
External-Sorting Review 3/3
  • Idea similar with 2-way Mergesort with the
    difference
  • that we utilize B-1 buffer pages (Bgt3)

Dont worry about this implementation as it is
already implemented in sort.C
Use quicksort internally
Memory
Heapf
1-3
0
1
2
In this project you should call Sort() from
within the sortMerge constructor before
proceeding to the merge phase
outpage
3
4
5
4
6
7
6
The Merging Phase of SMJoin
  • Now that the two relations R and S are sorted we
  • must merge them.
  • Merge using 2 iterators to move from page to page
    and from record to record
  • Works fine ONLY if both R and S have NO
    duplicates.
  • (e.g if sid is a foreign key in a 11 relation
    Sailor, Address)

Page
Tr
2 5 10 70 80 90
R.sid
Heapfile
S.sid
1 3 6 10 30 40
Gs
Forward the iterator (Tr or Gs) that has the
smallest value until TrGs then output value
7
The Merging Phase of SMJoin
  • What if relations have duplicates? (check
    example)
  • (either both of them or just one of them)
  • Therefore we need to use 3 iterators (1 for
    Marking)

Tr
2 5 10 10 80 90
R.sid
S.sid
1 3 6 10 10 40
Gs
Ts
  • The one extra Ts iterator will be used as soon
    TrGs at which point we will move Ts to Gs
    position and use Ts to iterate S
  • The full version of the algorithm is shown in 2
    slides

8
HeapFiles (heapfile.h, scan.C)
  1. Database File Organization of various pages into
    a logical.
  2. In a heapfile pages are unordered within the file
  3. In order to Scan the pages (records) of a
    heapfile we will use scan.C

Example // Sorting a heapfile (after heapR
stored in Catalog) Sort(unsortheapR, heapR,
..)rest of params from SortMerge() // Creating a
scan on a heapfile HeapFile heapR(heapR,
status) Scan Rscan heapR.openScan(Rstatus)
// Scan until DONE Rstatus Rscan-gtgetNext(RID
rid,char RecR, int lenR) // Inserting results
in Out Heapfile HeapFile heapOut(heapOut,
status) memmove(char RecO, char RecR, int
lenR)//do same for S heapOut.insertRecord(char
recptr, lenRlenS, RIDoutRID)
All the work of Pinning/Unpinning pages is done
from within Scan since it locates the
directoryheader Page from the catalog and
proceeds from there on with getNext()
9
The Merging Phase of SMJoin
  • The full algorithm (this is all you need to
    implement)


Sort R S

Init Iterators

Fast forward R

Fast forward S
Consider ScanPosition()
10
Implementation of the SMJoin Algorithm
  • The Big Picture
  • main.C (or smjoin_main.C same thing)
  • gt SMJTester.C (runTests())
  • gt test1() (this runs all 6 tests
    actually)
  • gt createFiles() // creates 5 Heapfiles using
  • // the data of same constant integer arrays
    (data0,data4)
  • gt test(i) // inside sortMerge Constructor

//sortMerge
heapfileR
No_of_cols
joinColumn
sortMerge sm(file0, 2, attrType, attrsize, 0,
file1, 2, attrType, attrsize, 0,
test1, 10, Ascending, s) // inside
sortMerge Constructor
heapfileS
Out_heapfile
minirel.h
No_of_pages available
gt
Sort( (infile) file0, (outfile) BRfile1,
(no_of_cols)2, attrType, attrsize,
(joincolumn)0, (no_of_pages_available)10,
(status)s )
11
Where to start from?
  1. Start out by reading Sort-Merge Join from book
    (Chap 12.5.2 2nd edition, 14.4.2 3rd edition)
  2. Study SMJTester.C which contains the tests and
    understand the test scenarios.
  3. Start Implementing sortMerge.C by invoking the
    Sort() constructor etc.
  4. Have a closer look at the new classes that you
    have Sort.C, Heapfile.C and Scan.C
Write a Comment
User Comments (0)
About PowerShow.com