Title: BTrees and Sorting
1B-Trees and Sorting
- Zachary G. Ives
- University of Pennsylvania
- CIS 550 Database Information Systems
- November 26, 2009
2Speeding Operations over Data
- Three general data organization techniques
- Indexing
- Sorting
- Hashing
3B Tree The DB Worlds Favorite Index
- Insert/delete at log F N cost
- (F fanout, N leaf pages)
- Keep tree height-balanced
- Minimum 50 occupancy (except for root).
- Each node contains d lt m lt 2d entries. d is
called the order of the tree. - Supports equality and range searches efficiently.
Index Entries
(Direct search)
Data Entries
("Sequence set")
4Example B Tree
- Search begins at root, and key comparisons direct
it to a leaf. - Search for 5, 15, all data entries gt 24 ...
Root
30
17
24
13
39
3
5
19
20
22
24
27
38
2
7
14
16
29
33
34
- Based on the search for 15, we know it is not
in the tree!
5B Trees in Practice
- Typical order 100. Typical fill-factor 67.
- average fanout 133
- Typical capacities
- Height 4 1334 312,900,700 records
- Height 3 1333 2,352,637 records
- Can often hold top levels in buffer pool
- Level 1 1 page 8 Kbytes
- Level 2 133 pages 1 Mbyte
- Level 3 17,689 pages 133 MBytes
6Inserting Data into a B Tree
- Find correct leaf L.
- Put data entry onto L.
- If L has enough space, done!
- Else, must split L (into L and a new node L2)
- Redistribute entries evenly, copy up middle key.
- Insert index entry pointing to L2 into parent of
L. - This can happen recursively
- To split index node, redistribute entries evenly,
but push up middle key. (Contrast with leaf
splits.) - Splits grow tree root split increases height.
- Tree growth gets wider or one level taller at
top.
7Inserting 8 into Example B Tree
- Observe how minimum occupancy is guaranteed in
both leaf and index pg splits. - Recall that all data items are in leaves, and
partition values for keys are in intermediate
nodes - Note difference between copy-up and push-up.
8Inserting 8 Example Copy up
Root
24
30
17
13
39
3
5
19
20
22
24
27
38
2
7
14
16
29
33
34
Want to insert here no room, so split copy up
8
Entry to be inserted in parent node.
(Note that 5 is copied up and
5
continues to appear in the leaf.)
3
5
2
7
8
9Inserting 8 Example Push up
Need to split node push up
Root
24
30
17
13
5
39
3
19
20
22
24
27
38
2
14
16
29
33
34
5
7
8
Entry to be inserted in parent node.
(Note that 17 is pushed up and only appears once
in the index. Contrast this with a leaf split.)
17
5
24
30
13
10Deleting Data from a B Tree
- Start at root, find leaf L where entry belongs.
- Remove the entry.
- If L is at least half-full, done!
- If L has only d-1 entries,
- Try to re-distribute, borrowing from sibling
(adjacent node with same parent as L). - If re-distribution fails, merge L and sibling.
- If merge occurred, must delete entry (pointing to
L or sibling) from parent of L. - Merge could propagate to root, decreasing height.
11B Tree Summary
- B tree and other indices ideal for range
searches, good for equality searches. - Inserts/deletes leave tree height-balanced logF
N cost. - High fanout (F) means depth rarely more than 3 or
4. - Almost always better than maintaining a sorted
file. - Typically, 67 occupancy on average.
- Note Order (d) concept replaced by physical
space criterion in practice (at least
half-full). - Records may be variable sized
- Index pages typically hold more entries than
leaves
12Other Kinds of Indices
- Multidimensional indices
- R-trees, kD-trees,
- Text indices
- Inverted indices
- Structural indices
- Object indices access support relations, path
indices - XML and graph indices dataguides, 1-indices,
d(k) indices - Describe parent-child, path relationships
13Speeding Operations over Data
- Three general data organization techniques
- Indexing
- Sorting
- Hashing
14Technique II Sorting
- Pass 1 Read a page, sort it, write it
- Can use a single page to do this!
- Pass 2, 3, , etc.
- Requires a minimum of 3 pages
INPUT 1
OUTPUT
INPUT 2
Disk
Disk
Main memory buffers
15Two-Way External Merge Sort
- Divide and conquer sort into subfiles and merge
- Each pass we read write every page
- If N pages in the file, we need dlog2(N)e 1
- passes to sort the data, yielding a cost of
- 2Ndlog2(N)e 1
Input file
3,4
6,2
9,4
8,7
5,6
3,1
2
PASS 0
1-page runs
1,3
2
3,4
5,6
2,6
4,9
7,8
PASS 1
4,7
1,3
2,3
2-page runs
8,9
5,6
2
4,6
PASS 2
2,3
4,4
1,2
4-page runs
6,7
3,5
6
8,9
PASS 3
1,2
2,3
3,4
8-page runs
4,5
6,6
7,8
9
16General External Merge Sort
- How can we utilize more than 3 buffer pages?
- To sort a file with N pages using B buffer pages
- Pass 0 use B buffer pages. Produce dN / Be
sorted runs of B pages each - Pass 2, , etc. merge B-1 runs
INPUT 1
. . .
. . .
INPUT 2
. . .
OUTPUT
INPUT B-1
Disk
Disk
B Main memory buffers
17Cost of External Merge Sort
- Number of passes 1dlogB-1 dN / Bee
- Cost 2N ( of passes)
- With 5 buffer pages, to sort 108 page file
- Pass 0 d108/5e 22 sorted runs of 5 pages each
(last run is only 3 pages) - Pass 1 d22/4e 6 sorted runs of 20 pages each
(final run only uses 8 pages) - Pass 2 d6/4e 2 sorted runs, 80 pages and 28
pages - Pass 3 Sorted file of 108 pages
18Speeding Operations over Data
- Three general data organization techniques
- Indexing
- Sorting
- Hashing
19Technique 3 Hashing
- A familiar idea, which we just saw for hash
files - Requires good hash function (may depend on
data) - Distribute data across buckets
- Often multiple items in same bucket (buckets
might overflow) - Hash indices can be built along the same lines as
what we discussed - The difference they may be unclustered as well
as clustered - Types
- Static
- Extendible (requires directory to buckets can
split) - Linear (two levels, rotate through split bad
with skew) - We wont get into detail because of time, but see
text
20Making Use of the Data IndicesQuery Execution
- Query plans exec strategies
- Basic principles
- Standard relational operators
- Querying XML
21Making Use of the Data IndicesQuery Execution
- Query plans exec strategies
- Basic principles
- Standard relational operators
- Querying XML
22Query Plans
- Data-flow graph of relational algebra operators
- Typically determined by optimizer
JoinPressRel.Symbol EastCoast.CoSymbol
JoinPressRel.Symbol Clients.Symbol
ProjectCoSymbol
SelectClient Atkins
SELECT FROM PressRel p, Clients C WHERE
p.Symbol c.Symbol AND c.Client Atkins
AND c.Symbol IN (SELECT CoSymbol FROM EastCoast)
Scan PressRel
ScanEastCoast
Scan Clients
23Iterator-Based Query Execution
- Execution begins at root
- open, next, close
- Propagate calls to children
- May call multiple child nexts
- Efficient scheduling resource usage
- Can you think of alternatives and their benefits?
JoinPressRel.Symbol EastCoast.CoSymbol
JoinPressRel.Symbol Clients.Symbol
ProjectCoSymbol
SelectClient Atkins
Scan PressRel
Scan Clients
ScanEastCoast
24Execution Strategy Issues
- Granularity parallelism
- Pipelining vs. blocking
- Materialization
JoinPressRel.Symbol EastCoast.CoSymbol
JoinPressRel.Symbol Clients.Symbol
ProjectCoSymbol
SelectClient Atkins
Scan PressRel
ScanEastCoast
Scan Clients
25Basic Principles
- Many DB operations require reading tuples, tuple
vs. previous tuples, or tuples vs. tuples in
another table - Techniques generally used
- Iteration for/while loop comparing with all
tuples on disk - Index if comparison of attribute thats
indexed, look up matches in index return those - Sort/merge iteration against presorted data
(interesting orders) - Hash build hash table of the tuple list, probe
the hash table - Must be able to support larger-than-memory data
26Basic Operators
- One-pass operators
- Scan
- Select
- Project
- Multi-pass operators
- Join
- Various implementations
- Handling of larger-than-memory sources
- Semi-join
- Aggregation, union, etc.
271-Pass Operators Scanning a Table
- Sequential scan read through blocks of table
- Index scan retrieve tuples in index order
- May require 1 seek per tuple! When?
- Cost in page reads b(T) blocks, r(T) tuples
- b(T) pages for sequential scan
- Up to r(T) for index scan if unclustered index
- Requires memory for one block
281-Pass Operators Select (s)
- Typically done while scanning a file
- If unsorted no index, check against predicate
- Read tuple
- While tuple doesnt meet predicate
- Read tuple
- Return tuple
- Sorted data can stop after particular value
encountered - Indexed data apply predicate to index, if
possible - If predicate is
- conjunction may use indexes and/or scanning loop
above (may need to sort/hash to compute
intersection) - disjunction may use union of index results, or
scanning loop
291-Pass Operators Project (P)
- Simple scanning method often used if no index
- Read tuple
- While tuple exists
- Output specified attributes
- Read tuple
- Duplicate removal may be necessary
- Partition output into separate files by bucket,
do duplicate removal on those - If have many duplicates, sorting may be better
- If attributes belong to an index, dont need to
retrieve tuples!