B-Trees - PowerPoint PPT Presentation


Provided by: ValuedGate2244

Learn more at: https://cs.gmu.edu


1
B-Trees

CS 583
Analysis of Algorithms

2
Outline

Data Structures on Secondary Storage
Magnetic disks
Efficient operations
B-Trees
Definitions
Searching
Inserting
Self-test
18.1-1, 18.1-2, 18.2-1, 18.2-2

3
Magnetic Disks

The main memory of a computer system consists of
silicon memory chips.
Per bit stored, it is typically two orders of
magnitude more expensive than magnetic storage.
Magnetic disks are cheaper and have higher
capacity than main memory.
However, they are much slower because of moving
parts.
In order to amortize the time spent on mechanical
movements, disks access several items at the same
time.
Information is divided into equal-size pages.
The bits of a page appear consecutively within a
cylinder.
Once the read/write head is positioned at the
desired page, large amounts of data can be
accessed quickly.

4
Disk Operations
When x is an object that resides on a disk, the
following pseudocode conventions are used:
    x = <a pointer to some object>
    Disk-Read(x)
    <access and modify fields of x>
    Disk-Write(x)
In most systems the running time of a B-tree
algorithm is determined by the number of disk read
and write operations. Hence, a B-tree node is
usually as large as a disk page.
Example: a B-tree with a branching factor of 1001
and height 2 can store more than one billion keys.
Since the root node is kept in main memory, at most
two disk accesses are needed to find any key.
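
The billion-key figure in the example can be checked with a few lines of Python. This is only a sketch: the function name `max_keys` is made up for illustration, and it assumes a completely full tree in which every node has exactly `branching_factor` children and `branching_factor - 1` keys.

```python
def max_keys(branching_factor: int, height: int) -> int:
    """Number of keys in a completely full tree of the given height."""
    keys_per_node = branching_factor - 1
    # There are branching_factor ** depth nodes at each depth 0..height.
    nodes = sum(branching_factor ** depth for depth in range(height + 1))
    return nodes * keys_per_node


# 1 root + 1001 children + 1001 * 1001 grandchildren, 1000 keys each.
print(max_keys(1001, 2))  # 1003003000
```

Only the 1001 nodes at depth 1 and the 1,002,001 nodes at depth 2 live on disk, which is why a search needs at most two disk reads.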
5
B-tree Definition

We assume that any satellite information
associated with a key is stored in the same node
as a key.
A B-tree is a rooted tree with the following
properties:
Every node x has the following fields:
n[x], the number of keys stored in x.
The n[x] keys themselves, stored in non-decreasing order:
$\mathit{key}_1[x] \le \mathit{key}_2[x] \le \cdots \le \mathit{key}_{n[x]}[x]$
leaf[x], which is true if x is a leaf and false
otherwise.
Each internal node x contains n[x]+1 pointers to
its children: c_1[x], c_2[x], ... , c_{n[x]+1}[x]

6
B-tree Definition (cont.)

Properties (cont.)
The keys key_i[x] separate the ranges of keys stored in
each subtree: if k_i is any key stored in the
subtree with root c_i[x], then
$k_1 \le \mathit{key}_1[x] \le k_2 \le \mathit{key}_2[x] \le \cdots \le \mathit{key}_{n[x]}[x] \le k_{n[x]+1}$
All leaves have the same depth, which is the tree's
height h.
There are lower and upper bounds on the number of
keys in a node. They are expressed in terms of an
integer t ≥ 2 called the minimum degree:
Every node other than the root must have at least
t-1 keys.
Every node can contain at most (2t-1) keys. We
say the node is full if it contains exactly
(2t-1) keys.
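
One way to make these properties concrete is a small checker. The sketch below is an illustration only: the names `BTreeNode` and `check` are hypothetical, keys are assumed to be integers, nodes live in memory rather than on disk, and a node is treated as a leaf when it has no children. The tree is assumed to be non-empty, so the root must hold at least one key.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    keys: List[int] = field(default_factory=list)              # key_1[x] .. key_n[x]
    children: List["BTreeNode"] = field(default_factory=list)  # c_1[x] .. c_{n+1}[x]

    @property
    def leaf(self) -> bool:
        return not self.children


def check(node, t, lo=float("-inf"), hi=float("inf"), is_root=True):
    """Return the height of the subtree, or raise ValueError on a violation."""
    n = len(node.keys)
    min_keys = 1 if is_root else t - 1
    if not min_keys <= n <= 2 * t - 1:
        raise ValueError(f"node has {n} keys, outside the allowed range")
    if node.keys != sorted(node.keys):
        raise ValueError("keys are not in non-decreasing order")
    if node.keys[0] < lo or node.keys[-1] > hi:
        raise ValueError("key outside the range set by the parent's keys")
    if node.leaf:
        return 0
    if len(node.children) != n + 1:
        raise ValueError("internal node must have n + 1 children")
    # Child j may only hold keys between the separators on either side of it.
    bounds = [lo] + node.keys + [hi]
    heights = {
        check(child, t, bounds[j], bounds[j + 1], is_root=False)
        for j, child in enumerate(node.children)
    }
    if len(heights) != 1:
        raise ValueError("leaves are not all at the same depth")
    return heights.pop() + 1


root = BTreeNode([10], [BTreeNode([1, 5]), BTreeNode([20])])
print(check(root, t=2))  # 1
```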

7
Height of the Tree
The number of disk accesses for a B-tree is
proportional to the height of the tree.
Theorem 18.1. If n ≥ 1, then for any n-key B-tree T
of height h and minimum degree t ≥ 2,
$$h \le \log_t \frac{n+1}{2}$$
Proof. If a B-tree has height h, the root contains
at least one key and all other nodes contain at
least (t-1) keys. Thus there are at least 2 nodes
at depth 1, at least 2t nodes at depth 2, and so
on, until at least $2t^{h-1}$ nodes at depth h.
8
Height of the Tree (cont.)
The number of keys n therefore satisfies the inequality
$$n \ge 1 + (t-1)\sum_{i=1}^{h} 2t^{i-1} = 1 + 2(t-1)\,\frac{t^h - 1}{t - 1} = 2t^h - 1$$
$$\Rightarrow\; t^h \le \frac{n+1}{2} \;\Rightarrow\; h \le \log_t \frac{n+1}{2} \qquad \blacksquare$$
Hence the height of the B-tree grows as
O(log_t n). This is the same logarithmic order as
the O(lg n) height of a red-black tree, but because
the base t can be large, the height is smaller by a
factor of about lg t. This means that the number of
disk accesses is substantially reduced for most
tree operations.
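
To get a feel for the bound, it can be evaluated for concrete values of n and t. The helper below is a sketch with a made-up name, `max_height`. It uses integer arithmetic rather than a floating-point logarithm to find the largest h for which the minimum key count 2t^h - 1 does not exceed n.

```python
def max_height(n: int, t: int) -> int:
    """Largest h with 2 * t**h - 1 <= n, i.e. floor(log_t((n + 1) / 2))."""
    if n < 1 or t < 2:
        raise ValueError("requires n >= 1 and t >= 2")
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


n = 10 ** 9
print(max_height(n, 1001))  # 2
print(max_height(n, 2))     # 28
```

With a billion keys, a minimum degree of 1001 keeps the tree at height 2 or less, while a minimum degree of 2 allows a height of up to 28.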
9
Basic Operations

The root of the B-tree is always in main memory.
Disk-Read on the root is never required.
Disk-Write is required when the root node is
changed.
Any nodes that are passed as parameters have
already had Disk-Read performed on them.
All basic procedures are one-pass algorithms
They proceed downward from the root of the tree,
without having to back up.

10
Searching
The searching algorithm takes as input a pointer
to the root node x of a subtree, and a key k. It
returns a pair (y, i) such that key_i[y] = k, or
NIL if k is not in the subtree.
B-Tree-Search(x, k)
 1  i = 1
 2  while i ≤ n[x] and k > key_i[x]
 3      i++
 4  if i ≤ n[x] and k == key_i[x]
 5      return (x, i)
 6  if leaf[x]
 7      return NIL
 8  else
 9      Disk-Read(c_i[x])    // read the i-th child of x
10      return B-Tree-Search(c_i[x], k)
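
The search procedure translates almost line for line into Python. The sketch below makes a few assumptions that are not in the slides: nodes are in-memory objects (so the Disk-Read step is only a comment), indices are 0-based, a node with no children counts as a leaf, and the names `Node` and `btree_search` are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)


def btree_search(x: Node, k: int) -> Optional[Tuple[Node, int]]:
    """Return (node, index) with node.keys[index] == k, or None if absent."""
    i = 0
    # Linear scan for the first key that is not smaller than k.
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return x, i
    if not x.children:  # x is a leaf, so k is not in the tree
        return None
    # On disk, this is where Disk-Read(c_i[x]) would happen.
    return btree_search(x.children[i], k)


root = Node([10, 20], [Node([1, 5]), Node([12, 15]), Node([25, 30])])
node, index = btree_search(root, 15)
print(node.keys, index)        # [12, 15] 1
print(btree_search(root, 13))  # None
```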
11
Searching Performance

The nodes encountered during the recursion form a
path downward from the root of the tree.
The number of disk pages accessed by
B-Tree-Search is O(h) = O(log_t n).
For each node, n[x] < 2t, hence the while loop on
lines 2-3 takes O(t) time.
Therefore the total CPU time is O(th) = O(t log_t n).

12
Inserting

General algorithm
Search for the leaf node y at which to insert the
new key.
If the node y is full (having 2t-1 keys):
Split the full node around its median key
key_t[y].
Create two nodes with (t-1) keys each.
Move the median key up to y's parent.
If y's parent is also full, it must be split as well.
To avoid backing up, the key is inserted in a
single pass down the tree.
Each full node encountered along the way is split.
This ensures that when node y needs to be
split, its parent cannot be full.

13
Splitting a Node

The procedure B-Tree-Split-Child takes as input a
non-full node x, an index i, and a full child y of
x, where y = c_i[x].
The procedure then splits y in two and adjusts x
so that it has an additional child.
When the root needs to be split, a new root must
first be created.
The tree then grows in height by one.
Splitting is the only means by which the tree grows.

14
Splitting Node Pseudocode
B-Tree-Split-Child(x, i, y)
 1  z = Allocate-Node()          // allocate a disk page
 2  leaf[z] = leaf[y]
 3  n[z] = t-1
 4  for j = 1 to t-1
 5      key_j[z] = key_{j+t}[y]
 6  if not leaf[y]
 7      for j = 1 to t
 8          c_j[z] = c_{j+t}[y]
 9  n[y] = t-1
    // shift children to the right
10  for j = n[x]+1 downto i+1
11      c_{j+1}[x] = c_j[x]
12  c_{i+1}[x] = z                // add z as a new child
15
Splitting Node Pseudocode (cont.)
    // make room for the median
13  for j = n[x] downto i
14      key_{j+1}[x] = key_j[x]
15  key_i[x] = key_t[y]
16  n[x]++
17  Disk-Write(y)
18  Disk-Write(z)
19  Disk-Write(x)
The CPU time is determined by the loops on lines
4-5 and 7-8, which take Θ(t) time.
Note that the other loops perform O(t) iterations.
The procedure performs Θ(1) disk operations.
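
The split can be sketched in Python with list slicing in place of the explicit copy and shift loops. This is an illustration under stated assumptions, not the slides' code: `Node` and `split_child` are made-up names, nodes are in memory (no Disk-Write calls), indices are 0-based, and the child is looked up as `x.children[i]` instead of being passed in separately.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []

    @property
    def leaf(self):
        return not self.children


def split_child(x, i, t):
    """Split the full child x.children[i] around its median key."""
    y = x.children[i]
    assert len(y.keys) == 2 * t - 1, "child must be full"
    assert len(x.keys) < 2 * t - 1, "parent must not be full"
    median = y.keys[t - 1]
    # z takes the largest t-1 keys (and the last t children) of y.
    z = Node(y.keys[t:], y.children[t:])
    # y keeps the smallest t-1 keys (and the first t children).
    y.keys = y.keys[:t - 1]
    y.children = y.children[:t]
    # The median moves up into x, with z as the child to its right.
    x.keys.insert(i, median)
    x.children.insert(i + 1, z)


# Splitting a full root with t = 2: a new, empty root is created first.
old_root = Node([1, 2, 3])
new_root = Node(children=[old_root])
split_child(new_root, 0, t=2)
print(new_root.keys, [c.keys for c in new_root.children])  # [2] [[1], [3]]
```

The slices and list insertions each touch O(t) elements, which matches the Θ(t) CPU cost stated above.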
16
Inserting a Key Algorithm
B-Tree-Insert(T, k)
 1  r = root[T]
 2  if n[r] == 2t-1               // full node
 3      s = Allocate-Node()
 4      root[T] = s
 5      leaf[s] = FALSE
 6      n[s] = 0
 7      c_1[s] = r
        // split the old root
 8      B-Tree-Split-Child(s, 1, r)
 9      B-Tree-Insert-Nonfull(s, k)
10  else
11      B-Tree-Insert-Nonfull(r, k)
17
Inserting a Key Algorithm (cont.)
// Insert key k into a non-full node x
B-Tree-Insert-Nonfull(x, k)
 1  i = n[x]
 2  if leaf[x]
        // k is inserted in the ordered list
 3      while i ≥ 1 and k < key_i[x]
 4          key_{i+1}[x] = key_i[x]
 5          i--
 6      key_{i+1}[x] = k
 7      n[x]++
 8      Disk-Write(x)
 9  else
        // search the leaf to insert into
18
Inserting a Key Algorithm (cont.)
10      while i ≥ 1 and k < key_i[x]
11          i--
12      i++
13      Disk-Read(c_i[x])
14      if n[c_i[x]] == 2t-1      // full node
15          B-Tree-Split-Child(x, i, c_i[x])
16          if k > key_i[x]
17              i++
18      B-Tree-Insert-Nonfull(c_i[x], k)
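
Putting the pieces together, the whole insertion path can be sketched as a small in-memory Python class. The names `BTree`, `Node` and `inorder` are hypothetical, indices are 0-based, the Disk-Read and Disk-Write steps are omitted, and a node with no children is treated as a leaf. The structure otherwise follows the one-pass algorithm above: a full root is split first, and every full child is split before the recursion descends into it.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []


class BTree:
    def __init__(self, t):
        self.t = t          # minimum degree, t >= 2
        self.root = Node()

    def insert(self, k):
        r = self.root
        if len(r.keys) == 2 * self.t - 1:   # full root: grow the tree
            s = Node(children=[r])
            self.root = s
            self._split_child(s, 0)
            self._insert_nonfull(s, k)
        else:
            self._insert_nonfull(r, k)

    def _split_child(self, x, i):
        t, y = self.t, x.children[i]
        z = Node(y.keys[t:], y.children[t:])
        x.keys.insert(i, y.keys[t - 1])     # median moves up
        x.children.insert(i + 1, z)
        y.keys, y.children = y.keys[:t - 1], y.children[:t]

    def _insert_nonfull(self, x, k):
        i = len(x.keys) - 1
        if not x.children:                  # leaf: shift keys and insert
            x.keys.append(k)                # grow the list by one slot
            while i >= 0 and k < x.keys[i]:
                x.keys[i + 1] = x.keys[i]
                i -= 1
            x.keys[i + 1] = k
        else:                               # internal: find the child
            while i >= 0 and k < x.keys[i]:
                i -= 1
            i += 1
            if len(x.children[i].keys) == 2 * self.t - 1:
                self._split_child(x, i)
                if k > x.keys[i]:
                    i += 1
            self._insert_nonfull(x.children[i], k)


def inorder(x):
    """Yield the keys of the subtree rooted at x in sorted order."""
    for j, key in enumerate(x.keys):
        if x.children:
            yield from inorder(x.children[j])
        yield key
    if x.children:
        yield from inorder(x.children[-1])


tree = BTree(t=2)
for key in [7, 3, 15, 1, 9, 12, 20, 5, 18, 2]:
    tree.insert(key)
print(list(inorder(tree.root)))  # [1, 2, 3, 5, 7, 9, 12, 15, 18, 20]
```

An in-order traversal returning the keys in sorted order is a quick sanity check that the splits preserved the separation property.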
19
Inserting a Key Performance

The number of disk accesses performed by
B-Tree-Insert is O(h) for a B-tree of height h.
Only O(1) Disk-Read and Disk-Write operations are
performed at each level in
B-Tree-Insert-Nonfull.
The total CPU time is O(th) = O(t log_t n).
At each level of the tree, the number of CPU
operations is determined by the while loops in
B-Tree-Insert-Nonfull and by at most one Θ(t) call
to B-Tree-Split-Child.
The maximum number of iterations in these loops
is 2t-1, hence the total time at each level is
O(t).