B-Trees - PowerPoint PPT Presentation

Slides: 20
Provided by: ValuedGate2244
Learn more at: https://cs.gmu.edu

1
B-Trees
  • CS 583
  • Analysis of Algorithms

2
Outline
  • Data Structures on Secondary Storage
  • Magnetic disks
  • Efficient operations
  • B-Trees
  • Definitions
  • Searching
  • Inserting
  • Self-test
  • 18.1-1, 18.1-2, 18.2-1, 18.2-2

3
Magnetic Disks
  • The main memory of a computer system consists of
    silicon memory chips.
  • Per bit stored, it is typically two orders of
    magnitude more expensive than magnetic storage.
  • Magnetic disks are cheaper and have higher
    capacity than main memory.
  • However, they are much slower because of moving
    parts.
  • To amortize the time spent on mechanical
    movements, a disk accesses several items at a
    time.
  • Information is divided into equal-size pages.
  • Each page appears as consecutive bits within a
    cylinder.
  • Once the read/write head is positioned at the
    desired page, large amounts of data can be
    accessed quickly.

4
Disk Operations
When x is an object that resides on a disk, the
following pseudocode conventions are used:

    x ← a pointer to some object
    Disk-Read(x)
    <access and modify fields of x>
    Disk-Write(x)

In most systems, the running time of a B-tree
algorithm is determined mainly by the number of
disk read and write operations. Hence, a B-tree
node is usually as large as a disk page.

Example: a B-tree with a branching factor of 1001
and height 2 can store more than one billion keys.
Since the root node is kept in main memory, at most
two disk accesses are needed to find any key.
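
The "one billion keys" figure can be checked with a few lines of arithmetic. The sketch below assumes every node is full, so each node holds branching factor minus one keys and depth d contains branching_factor ** d nodes. The function name `max_keys` is only illustrative.

```python
def max_keys(branching_factor: int, height: int) -> int:
    """Largest number of keys when every node of the tree is full.

    Each node holds (branching_factor - 1) keys, and depth d contains
    branching_factor ** d nodes for d = 0 .. height.
    """
    keys_per_node = branching_factor - 1
    nodes = sum(branching_factor ** depth for depth in range(height + 1))
    return keys_per_node * nodes


print(max_keys(1001, 2))  # 1003003000, just over one billion
```

The root contributes 1000 keys, the 1001 nodes at depth 1 contribute about a million, and the 1001 * 1001 leaves contribute about a billion.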
5
B-tree Definition
  • We assume that any satellite information
    associated with a key is stored in the same node
    as the key.
  • A B-tree is a rooted tree with the following
    properties.
  • Every node x has the following fields:
  • n[x], the number of keys stored in x.
  • The n[x] keys themselves, stored in
    non-decreasing order:
    $\mathit{key}_1[x] \le \mathit{key}_2[x] \le \dots \le \mathit{key}_{n[x]}[x]$.
  • leaf[x], which is true if x is a leaf and false
    otherwise.
  • Each internal node x also contains n[x]+1
    pointers to its children:
    $c_1[x], c_2[x], \dots, c_{n[x]+1}[x]$.

6
B-tree Definition (cont.)
  • Properties (cont.)
  • The keys key_i[x] separate the ranges of keys
    stored in each subtree: if k_i is any key stored
    in the subtree with root c_i[x], then
    $k_1 \le \mathit{key}_1[x] \le k_2 \le \mathit{key}_2[x] \le \dots \le \mathit{key}_{n[x]}[x] \le k_{n[x]+1}$.
  • All leaves have the same depth, which is the
    tree's height h.
  • There are lower and upper bounds on the number of
    keys in a node. They are expressed in terms of an
    integer $t \ge 2$ called the minimum degree.
  • Every node other than the root must have at least
    t-1 keys.
  • Every node can contain at most 2t-1 keys. We
    say the node is full if it contains exactly
    2t-1 keys.
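
The properties above can be turned into an executable check. This is a minimal in-memory sketch, not part of the original slides: the `Node` class and the `check` function are hypothetical names, indices are 0-based, a leaf is represented by an empty `children` list, and the tree is assumed to be non-empty.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)         # n[x] is len(keys)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf

    @property
    def leaf(self) -> bool:
        return not self.children


def check(node, t, lo=float("-inf"), hi=float("inf"), is_root=True):
    """Return the subtree height; raise AssertionError if a property fails."""
    n = len(node.keys)
    assert n <= 2 * t - 1, "too many keys"
    assert n >= (1 if is_root else t - 1), "too few keys"
    assert node.keys == sorted(node.keys), "keys out of order"
    assert all(lo <= k <= hi for k in node.keys), "key outside subtree range"
    if node.leaf:
        return 0
    assert len(node.children) == n + 1, "internal node needs n + 1 children"
    # Child i may only hold keys between separator i-1 and separator i.
    bounds = [lo] + node.keys + [hi]
    heights = {check(child, t, bounds[i], bounds[i + 1], False)
               for i, child in enumerate(node.children)}
    assert len(heights) == 1, "leaves at different depths"
    return heights.pop() + 1


root = Node([10], [Node([3, 7]), Node([12])])
print(check(root, t=2))  # height of this small valid tree
```

Passing the allowed key range down the recursion checks the separator property for whole subtrees, not just for the immediate children.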

7
Height of the Tree
The number of disk accesses for a B-tree operation
is proportional to the height of the tree.

Theorem 18.1. If $n \ge 1$, then for any n-key
B-tree T of height h and minimum degree $t \ge 2$,

$$h \le \log_t \frac{n+1}{2}.$$

Proof. If a B-tree has height h, the root contains
at least one key and all other nodes contain at
least t-1 keys. Thus there are at least 2 nodes at
depth 1, at least 2t nodes at depth 2, and so on,
up to at least $2t^{h-1}$ nodes at depth h.
8
Height of the Tree (cont.)
The number of keys n therefore satisfies

$$n \ge 1 + (t-1)\sum_{i=1}^{h} 2t^{i-1} = 1 + 2(t-1)\,\frac{t^h - 1}{t-1} = 2t^h - 1,$$

so $t^h \le (n+1)/2$ and hence
$h \le \log_t \frac{n+1}{2}$. ∎

Hence the height of a B-tree grows as
$O(\log_t n)$. This is the same order as the
$O(\lg n)$ height of a red-black tree, but smaller
by a factor of about $\lg t$, which is large for
typical node sizes. The number of disk accesses is
therefore substantially reduced for most tree
operations.
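
The bound can be explored numerically. The sketch below is an illustration only; `min_keys` and `max_height` are hypothetical helper names. It evaluates the sum from the proof directly and then finds the largest height allowed for a given n using integer arithmetic, which avoids floating-point logarithms.

```python
def min_keys(t: int, h: int) -> int:
    """Fewest keys in a B-tree of minimum degree t and height h."""
    return 1 + (t - 1) * sum(2 * t ** (i - 1) for i in range(1, h + 1))


def max_height(n: int, t: int) -> int:
    """Largest h with 2 * t**h - 1 <= n, i.e. floor(log_t((n + 1) / 2))."""
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


for t in (2, 10, 1001):
    print(t, max_height(10 ** 9, t))
```

For a billion keys, the largest possible height drops from 28 at t = 2 to 2 at t = 1001, which is why a large minimum degree keeps the number of disk accesses small.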
9
Basic Operations
  • The root of the B-tree is always in main memory.
  • Disk-Read on the root is never required.
  • Disk-Write is required when the root node is
    changed.
  • Any nodes that are passed as parameters have
    already had Disk-Read performed on them.
  • All basic procedures are one-pass algorithms
  • They proceed downward from the root of the tree,
    without having to back up.

10
Searching
The searching algorithm takes as input a pointer
to the root node x of a subtree, and a key k. It
returns a pair (y, i) such that key_i[y] = k, or
NIL if k is not in the subtree.

B-Tree-Search(x, k)
 1  i ← 1
 2  while i ≤ n[x] and k > key_i[x]
 3      i ← i + 1
 4  if i ≤ n[x] and k = key_i[x]
 5      return (x, i)
 6  if leaf[x]
 7      return NIL
 8  else
 9      Disk-Read(c_i[x])   // read the ith child of x
10      return B-Tree-Search(c_i[x], k)
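
A direct Python transcription of the search may help. It is a sketch under a few assumptions that are not in the slides: nodes live in memory, so the Disk-Read call becomes a comment; indices are 0-based; a leaf is a node with an empty `children` list; and `None` stands in for NIL. The `Node` class is hypothetical.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted list; n[x] is len(keys)
        self.children = children or []  # empty list means the node is a leaf


def btree_search(x, k):
    """Return (node, index) with node.keys[index] == k, or None if absent."""
    i = 0
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return x, i
    if not x.children:
        return None
    # An on-disk implementation would call Disk-Read on the child here.
    return btree_search(x.children[i], k)


leaves = [Node([3, 7]), Node([12, 15]), Node([25])]
root = Node([10, 20], leaves)
node, index = btree_search(root, 15)
print(node.keys, index)
print(btree_search(root, 8))
```

The first call descends into the middle leaf and reports position 1 within it; the second reaches a leaf without finding the key.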
11
Searching Performance
  • The nodes encountered during the recursion form a
    path downward from the root of the tree.
  • The number of disk pages accessed by
    B-Tree-Search is $O(h) = O(\log_t n)$.
  • For each node, n[x] < 2t, hence the while loop
    in lines 2-3 takes O(t) time.
  • Therefore the total CPU time is
    $O(th) = O(t \log_t n)$.

12
Inserting
  • General algorithm
  • Search for the leaf node y at which to insert the
    new key.
  • If the node y is full (it has 2t-1 keys):
  • Split the full node around its median key
    key_t[y].
  • Create two nodes with t-1 keys each.
  • Move the median key up to y's parent.
  • If y's parent is also full, it must be split
    too, which could propagate back up the tree.
  • To avoid backing up, the key is inserted in a
    single pass down the tree.
  • Each full node encountered is split along the
    way.
  • This ensures that when the node y needs to be
    split, its parent cannot be full.

13
Splitting a Node
  • The procedure B-Tree-Split-Child takes as input
    a non-full node x, an index i, and a full child
    y of x, where y = c_i[x].
  • The procedure then splits y in two and adjusts x
    so that it has an additional child.
  • When the root needs to be split, a new root needs
    to be created.
  • The tree grows in height by one.
  • Splitting is the only means to grow the tree.

14
Splitting Node Pseudocode
B-Tree-Split-Child(x, i, y)
 1  z ← Allocate-Node()   // allocate a disk page
 2  leaf[z] ← leaf[y]
 3  n[z] ← t - 1
 4  for j ← 1 to t - 1
 5      key_j[z] ← key_{j+t}[y]
 6  if not leaf[y]
 7      for j ← 1 to t
 8          c_j[z] ← c_{j+t}[y]
 9  n[y] ← t - 1
    // shift children to the right
10  for j ← n[x] + 1 downto i + 1
11      c_{j+1}[x] ← c_j[x]
12  c_{i+1}[x] ← z   // add z as a new child
15
Splitting Node Pseudocode (cont.)
    // make room for the median
13  for j ← n[x] downto i
14      key_{j+1}[x] ← key_j[x]
15  key_i[x] ← key_t[y]
16  n[x] ← n[x] + 1
17  Disk-Write(y)
18  Disk-Write(z)
19  Disk-Write(x)

The CPU time is determined by the loops in lines
4-5 and 7-8, which take Θ(t) time. Note that the
other loops perform O(t) iterations. The procedure
performs Θ(1) disk operations.
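
The split can be sketched in Python as follows. This is an in-memory illustration, not the slides' procedure verbatim: list slicing and `list.insert` replace the explicit copy and shift loops, the Disk-Write calls are omitted, the index i is 0-based, the child is looked up as `x.children[i]` instead of being passed in, and the `Node` class is hypothetical.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []  # empty list means the node is a leaf


def split_child(x, i, t):
    """Split the full child x.children[i] around its median key."""
    y = x.children[i]
    assert len(y.keys) == 2 * t - 1, "child must be full"
    assert len(x.keys) < 2 * t - 1, "parent must not be full"
    # z takes the upper t-1 keys and, for an internal node, the upper t children.
    z = Node(keys=y.keys[t:], children=y.children[t:])
    median = y.keys[t - 1]
    y.keys = y.keys[:t - 1]
    y.children = y.children[:t]
    x.keys.insert(i, median)     # median moves up into the parent
    x.children.insert(i + 1, z)  # z becomes the right sibling of y


x = Node([10], [Node([1, 2, 3]), Node([20])])
split_child(x, 0, t=2)
print(x.keys, [child.keys for child in x.children])
```

After the call the parent holds the keys 2 and 10, and its three children hold 1, 3, and 20 respectively.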
16
Inserting a Key Algorithm
B-Tree-Insert(T, k)
    r ← root[T]
    if n[r] = 2t - 1   // full node
        s ← Allocate-Node()
        root[T] ← s
        leaf[s] ← FALSE
        n[s] ← 0
        c_1[s] ← r
        // split the old root
        B-Tree-Split-Child(s, 1, r)
        B-Tree-Insert-Nonfull(s, k)
    else
        B-Tree-Insert-Nonfull(r, k)
17
Inserting a Key Algorithm (cont.)
// Insert key k into a non-full node x
B-Tree-Insert-Nonfull(x, k)
    i ← n[x]
    if leaf[x]   // k is inserted in the ordered list
        while i ≥ 1 and k < key_i[x]
            key_{i+1}[x] ← key_i[x]
            i ← i - 1
        key_{i+1}[x] ← k
        n[x] ← n[x] + 1
        Disk-Write(x)
    else   // search the leaf to insert into
18
Inserting a Key Algorithm (cont.)
        // else branch of B-Tree-Insert-Nonfull, continued
        while i ≥ 1 and k < key_i[x]
            i ← i - 1
        i ← i + 1
        Disk-Read(c_i[x])
        if n[c_i[x]] = 2t - 1   // full node
            B-Tree-Split-Child(x, i, c_i[x])
            if k > key_i[x]
                i ← i + 1
        B-Tree-Insert-Nonfull(c_i[x], k)
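
Putting the pieces together, the single-pass insertion can be sketched in Python. The sketch rests on assumptions that are not in the slides: the tree is held in memory, so all Disk-Read and Disk-Write calls are dropped; indices are 0-based; the standard `bisect` module replaces the linear while loops; and the `Node`, `BTree`, and `inorder` names are hypothetical.

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []          # sorted keys; n[x] is len(keys)
        self.children = children or []  # empty list means the node is a leaf

    @property
    def leaf(self):
        return not self.children


class BTree:
    def __init__(self, t):
        assert t >= 2, "minimum degree must be at least 2"
        self.t = t
        self.root = Node()

    def _split_child(self, x, i):
        """Split the full child x.children[i]; x itself must not be full."""
        t, y = self.t, x.children[i]
        z = Node(y.keys[t:], y.children[t:])
        x.keys.insert(i, y.keys[t - 1])  # median moves up into x
        x.children.insert(i + 1, z)
        y.keys, y.children = y.keys[:t - 1], y.children[:t]

    def insert(self, k):
        if len(self.root.keys) == 2 * self.t - 1:  # full root: tree grows
            self.root = Node(children=[self.root])
            self._split_child(self.root, 0)
        self._insert_nonfull(self.root, k)

    def _insert_nonfull(self, x, k):
        if x.leaf:
            bisect.insort(x.keys, k)
            return
        i = bisect.bisect_right(x.keys, k)  # child whose range contains k
        if len(x.children[i].keys) == 2 * self.t - 1:
            self._split_child(x, i)
            if k > x.keys[i]:               # median moved up; pick a side
                i += 1
        self._insert_nonfull(x.children[i], k)


def inorder(x):
    """All keys of the subtree rooted at x, in sorted order."""
    if x.leaf:
        return list(x.keys)
    out = []
    for i, child in enumerate(x.children):
        out += inorder(child)
        if i < len(x.keys):
            out.append(x.keys[i])
    return out


tree = BTree(t=2)
for key in [(i * 37) % 101 for i in range(101)]:  # fixed permutation of 0..100
    tree.insert(key)
print(inorder(tree.root) == list(range(101)))
```

Because every full node met on the way down is split before the descent continues, the recursion never has to return to a parent to make room.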
19
Inserting a Key Performance
  • The number of disk accesses performed by
    B-Tree-Insert is O(h) for a B-tree of height h.
  • Only O(1) Disk-Read and Disk-Write operations
    are performed at each level in
    B-Tree-Insert-Nonfull.
  • The total CPU time is $O(th) = O(t \log_t n)$.
  • At each level of the tree, the number of CPU
    operations is determined by the while loops in
    B-Tree-Insert-Nonfull.
  • The maximum number of iterations in these loops
    is 2t-1, hence the total time at each level is
    O(t).