Title: ISYS 300 Search Trees, B-Trees and B+ Trees
1 ISYS 300 Search Trees, B-Trees and B+ Trees
2 How did we get here?
- Created an inverted index from text
- Used the index for text searching (vector model)
- There can be tens of thousands of items in an index. How can we access them quickly?
- Note: these techniques are important beyond text searching; there are many places they can be used.
3 Finding Items in an Inverted Index
- Sorted List
- Binary search
- Linked List
- Hard to search without an index
- Easy to insert and delete items
- Binary Trees
- Always two choices (20 questions)
- Multi-way Trees (B-Trees and B+ Trees)
- Several choices at choice point
- High fan-out. Not as deep as binary trees
- Other techniques to remember, but probably not a good choice here:
- Hashing
- Content-based retrieval (neural networks)
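As a rough illustration of the sorted-list option, the sketch below looks up a term in a sorted list with binary search, using Python's standard `bisect` module. The term list and the function name `find_term` are made up for the example.

```python
from bisect import bisect_left

def find_term(sorted_terms, term):
    """Return the position of `term` in `sorted_terms`, or -1 if it is absent."""
    # bisect_left performs a binary search: roughly log2(n) comparisons.
    i = bisect_left(sorted_terms, term)
    if i < len(sorted_terms) and sorted_terms[i] == term:
        return i
    return -1

# Hypothetical index terms, already in sorted order.
terms = ["apple", "banana", "cherry", "grape", "mango", "peach"]
print(find_term(terms, "grape"))  # 3
print(find_term(terms, "kiwi"))   # -1
```

Binary search only works because the list is kept sorted, which is why inserting and deleting items is more awkward here than in a linked list.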
4 Graphs
- A graph is a set of nodes and edges
- Some common graph problems include
- Layout of highways, telephone lines
- The structure of the Web
- Social networks
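One common way to hold a set of nodes and edges in a program is an adjacency list. The sketch below assumes an undirected graph; the node names and edges are invented for illustration.

```python
from collections import defaultdict

# Hypothetical graph: four nodes and three undirected edges.
nodes = {"A", "B", "C", "D"}
edges = [("A", "B"), ("A", "C"), ("C", "D")]

# Adjacency list: for each node, the set of nodes it shares an edge with.
neighbors = defaultdict(set)
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)

print(sorted(neighbors["A"]))  # ['B', 'C']
```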
5 Trees and Hierarchies: Some Terminology
- A tree is a data structure in which any two nodes are linked by exactly one path.
- We often think of a tree as a hierarchy
- For a hierarchy, we often talk about parents (up the hierarchy) and children (lower in the hierarchy).
- We often pick one central node as the root of a tree and call the nodes with no children the leaves
- The depth of a tree is the (average) distance from the root to the leaves
- We often say "tree" when we mean "hierarchy"
- Trees can be used for searching with keys. A key is an element that provides directions for moving around in the tree.
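The terms root, children, leaves and depth can be made concrete with a small sketch. The hierarchy below is hypothetical, and `depth` here measures the longest root-to-leaf distance, which is one common convention rather than the average.

```python
# Hypothetical hierarchy: each node maps to the list of its children.
children = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": [],
    "a1": [],
    "a2": [],
}

def leaves(node):
    """Return the nodes with no children at or below `node`."""
    kids = children[node]
    if not kids:
        return [node]
    return [leaf for kid in kids for leaf in leaves(kid)]

def depth(node):
    """Return the longest distance, counted in links, from `node` to a leaf."""
    kids = children[node]
    return 0 if not kids else 1 + max(depth(kid) for kid in kids)

print(leaves("root"))  # ['a1', 'a2', 'b']
print(depth("root"))   # 2
```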
6 Balanced Binary Search Trees
- Binary search tree (there are always 2 choices)
- Each node contains a key
- The left sub-tree stores all keys smaller than the parent key
- The right sub-tree stores all keys larger than the parent key
- Balanced trees
- Every parent has approximately the same number of nodes in its left and right sub-trees
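A minimal sketch of searching a binary search tree follows. The `Node` class, the `search` function and the seven-key tree are assumptions made for the example; the tree is built by hand so that it is balanced.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left    # sub-tree with smaller keys
        self.right = right  # sub-tree with larger keys

def search(node, key):
    """Return the number of nodes visited to find `key`, or None if absent."""
    steps = 0
    while node is not None:
        steps += 1
        if key == node.key:
            return steps
        # Two choices at every node: go left for smaller, right for larger.
        node = node.left if key < node.key else node.right
    return None

# Hypothetical balanced tree holding 2, 4, 6, 8, 10, 12, 14.
tree = Node(8, Node(4, Node(2), Node(6)), Node(12, Node(10), Node(14)))
print(search(tree, 8))   # 1
print(search(tree, 6))   # 3
print(search(tree, 5))   # None
```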
7 Using Trees for Storing Data and Then Searching for It There
- Storing Data in a Balanced Binary Tree
- In a balanced tree, the number of steps to find an answer (the depth) is
- NumSteps = Levels = log2(NumItems)
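In a balanced binary tree the number of levels grows with the base-2 logarithm of the number of items. As a worked example, the smallest number of levels that can hold n items is floor(log2(n)) + 1, which Python computes exactly for positive integers with `int.bit_length()`. The function name `min_levels` is made up for this sketch.

```python
def min_levels(num_items):
    """Smallest number of levels a binary tree needs to hold `num_items` keys."""
    # For n >= 1, n.bit_length() equals floor(log2(n)) + 1.
    return num_items.bit_length()

for n in (7, 15, 1000, 1_000_000):
    print(n, min_levels(n))
# 7 3
# 15 4
# 1000 10
# 1000000 20
```

So even a million items can be reached in about 20 steps, provided the tree stays balanced.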
8 Operations on Binary Search Trees
- Search
- Create a New Tree
- Insert New Items
- Delete Items
9 Inserting Data into a Search Tree
- Walk down the tree to the node closest in value to the one to be inserted
- Link the new node in. If this results in an unbalanced tree, go to the next page.
10 Balancing a Binary Search Tree
The left tree is slightly less efficient than the right tree: in the left tree, more items sit deep in the tree than at the higher levels.
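To show why balance matters, the sketch below compares two trees holding the same 15 keys. It rebuilds a balanced tree from sorted keys by always choosing the middle key as the parent; this is one simple approach assumed for illustration, not necessarily the rebalancing method the slide has in mind. Trees are written as `(key, left, right)` tuples.

```python
def build_balanced(sorted_keys):
    """Build a balanced tree (key, left, right) from keys in sorted order."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    return (sorted_keys[mid],
            build_balanced(sorted_keys[:mid]),
            build_balanced(sorted_keys[mid + 1:]))

def build_chain(sorted_keys):
    """The shape naive insertion of sorted keys gives: right children only."""
    tree = None
    for key in reversed(sorted_keys):
        tree = (key, None, tree)
    return tree

def levels(tree):
    """Number of levels in the tree (0 for an empty tree)."""
    return 0 if tree is None else 1 + max(levels(tree[1]), levels(tree[2]))

keys = list(range(1, 16))
print(levels(build_chain(keys)))     # 15
print(levels(build_balanced(keys)))  # 4
```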
11 B-Trees (Multi-Way Trees)
- Description
- Each node can have more than one key (multi-way tree)
- If a node has m keys, it will have m+1 child branches.
- All keys in branch i-1 are smaller than key i. This makes the tree useful for searching; it is the search tree property.
- All leaves are at the same depth.
- Advantages of B-Tree over Binary Tree
- Useful when reading long records off a disk
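A minimal sketch of searching a multi-way tree follows. The class `BTreeNode`, the function `btree_search` and the small two-level tree are hypothetical; insertion and node splitting are left out. Branches are numbered from 0 here, so branch `i` holds the keys smaller than `keys[i]`.

```python
from bisect import bisect_left

class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted list of m keys
        self.children = children or []  # empty for a leaf, else m + 1 nodes

def btree_search(node, key):
    """Return True if `key` is stored in the tree rooted at `node`."""
    while True:
        i = bisect_left(node.keys, key)  # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:            # reached a leaf without a match
            return False
        node = node.children[i]          # follow the branch between the keys

# Hypothetical tree: a root with 2 keys and therefore 3 children.
root = BTreeNode([10, 25], [
    BTreeNode([4, 8]),
    BTreeNode([13, 18, 21]),
    BTreeNode([28, 39, 51]),
])
print(btree_search(root, 18))  # True
print(btree_search(root, 5))   # False
```

Because each node offers several choices, the tree is shallower than a binary tree over the same keys, so fewer nodes (and fewer disk reads) are needed per lookup.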
12 B-Tree
[Figure: example B-tree. Keys as extracted from the diagram: 10, 25, 4, 8, 13, 18, 21, 22, 28, 39, 51, 52, 0, 3]
13 B+ Tree
- B+ Trees
- A B-tree that stores all data in the leaves.
- That way, the B-Tree index can be kept in a separate file from the data
14 B+ Tree
[Figure: example B+ tree. Root node (F, M); second-level nodes (Ap, Bs, E), (Gr, H, L), (P, Ru, T); items numbered 1 to 12 beneath them]
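A lookup in a tree like this one can be sketched as follows. The sketch assumes one reading of the figure: the root (F, M) and the three second-level nodes form the index, each second-level node has four branches, and the numbers 1 to 12 are the data blocks those branches point to. The function name `find_block` and the rule that a term equal to a key goes to the right-hand branch are also assumptions.

```python
from bisect import bisect_right

# Index nodes hold only keys; the data lives in numbered blocks below them.
ROOT_KEYS = ["F", "M"]
CHILD_KEYS = [["Ap", "Bs", "E"], ["Gr", "H", "L"], ["P", "Ru", "T"]]

def find_block(term):
    """Return the 1-based number of the data block that would hold `term`."""
    child = bisect_right(ROOT_KEYS, term)         # which of the 3 child nodes
    slot = bisect_right(CHILD_KEYS[child], term)  # which of its 4 branches
    return child * 4 + slot + 1

print(find_block("Cat"))  # 3
print(find_block("Hat"))  # 7
print(find_block("Zed"))  # 12
```

Only the small index is consulted to find the block number, so the index can sit in one file and the data blocks in another.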
15 Searching a Tree that is Not Well Structured
- The moves in some games can be organized as a tree (a game space)
- Tic-tac-toe, Chess
- Have to make estimates of the value of each choice
- A value can be calculated for the alternatives. Further, we can have a rule (such as MIN-MAX) for selecting alternatives.
- Can do learning (remember what were good choices)
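The MIN-MAX rule can be sketched on a tiny game tree. In this sketch a tree is a nested list, a number is the estimated value of a final position, and the two players alternate between picking the largest and the smallest value. The tree and its values are invented for the example.

```python
def minimax(node, maximizing=True):
    """Return the value of `node` when both players choose their best option."""
    if isinstance(node, (int, float)):  # a leaf: the estimated value
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Hypothetical game: we choose one of three moves, then the opponent replies.
game = [[3, 5], [2, 9], [0, 7]]
print(minimax(game))  # 3
```

The opponent would answer our three moves with 3, 2 and 0 respectively, so the first move is the best available choice.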