CSC 213 - PowerPoint PPT Presentation

Provided by: wwwcsCa
Title: CSC 213


1
CSC 213 Large Scale Programming
  • Lecture 37
  • External Caching (a,b)-Trees

2
Today's Goal
  • Look at advanced Tree structures
  • Part of most databases, operating systems
  • Anywhere there is a lot of data to be held
  • Already examined related (2,4) trees
  • Now look at more general definition
  • Also examine why we should care

3
Lies My Professor Told Me
  • Big-Oh notation not always accurate
  • For example, treats memory accesses equally
  • But many different memories inside machine
  • Organized in a pyramid
  • Higher is faster
  • Lower is cheaper
  • (Cheaper also means more memory available)

(Figure: memory pyramid, from top to bottom: registers, L1 cache, L2 cache, main memory (RAM), hard drive)
4
Hierarchy In Perspective
  • Suppose the processor needs a beverage
  • Registers -- Drink from the mug in its hand
  • L1 Cache -- Get from a case in the fridge
  • L2 Cache -- Get from tapped barrel in the cellar
  • Main memory -- Purchase at the corner Wilson Farms
  • Hard drive -- Drive to the closest brewery and buy a vat
  • Network -- Go to Germany and buy Bavaria

5
Waiting Is a Pain
6
Not All Accesses Are Equal
  • Want to limit access to lowest possible level
  • Easy when we are only using a few objects
  • Difficult when working with non-trivial data sets
  • Two common approaches to avoid the wait
  • Caching -- hold data from hard drive in RAM
  • Usually stores most recently or frequently used
    data
  • Locality -- organize data to limit amount used
  • By matching internal storage to improve cache
    effectiveness
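
The caching bullet above can be sketched in a few lines. This is a minimal illustration, not how any particular operating system does it: the class name `LRUCache`, the `load_from_disk` callback, and the choice of a least-recently-used eviction policy are all assumptions made for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical cache that keeps the most recently used disk blocks in RAM."""

    def __init__(self, capacity, load_from_disk):
        self.capacity = capacity              # how many blocks fit in RAM
        self.load_from_disk = load_from_disk  # slow fallback (assumed callable)
        self.blocks = OrderedDict()           # oldest entry first
        self.misses = 0                       # number of trips to the "brewery"

    def get(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return self.blocks[block_id]
        self.misses += 1
        data = self.load_from_disk(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used
        return data
```

Repeated requests for the same few blocks never leave RAM; only a request for a block that has been evicted pays the slow cost again.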

7
Virtual Memory
  • Extends RAM by using space on hard drive
  • Big win if we rarely access the material on disk
  • Incredibly slow if always stuck driving to
    brewery
  • Works by dividing memory into pages
  • Each page is a constant size (usually 4096 bytes)
  • Operating system handles memory at page level
  • Limits overhead and maximizes efficiency
  • Evicts unused pages to the hard drive for storage
  • Reloads a page when it is next accessed
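
To make the page arithmetic concrete, here is a small sketch using the 4096-byte page size from the slide. The function names are made up for the example, and real address translation is done by hardware and the operating system, not by code like this.

```python
PAGE_SIZE = 4096  # bytes per page, the usual size quoted on the slide


def page_and_offset(address):
    """Split a byte address into (page number, offset within that page)."""
    return divmod(address, PAGE_SIZE)


def pages_touched(addresses):
    """Count how many distinct pages a sequence of accesses lands on."""
    return len({address // PAGE_SIZE for address in addresses})
```

Accesses packed close together stay on one page, while the same number of accesses scattered across memory can each need a different page, and each page may have to be reloaded from disk.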

8
Problems with Binary Trees
  • Good way to organize information
  • Provides consistent O(log n) processing times
  • Organization is very bad for locality, however
  • Nodes contain only 1 piece of data
  • Must then jump to one of its two children
  • Nodes can get randomly spread over heap
  • Good torture test for roommate's computer
  • (2,4) trees provide some improvement
  • Still have at most 3 elements and 4 children
  • Does not use anything like 4096 bytes in a page

9
(a, b) Trees to the Rescue!
  • Real-world solution to killing disks by paging
  • Used by Linux and MacOS to track files and directories
  • Organization used by MySQL and other databases
  • Found in many other places where paging occurs
  • (2,4) trees are one example of these
  • Can also create others, just follow the rules
  • All leaves are found at same level of the tree
  • All internal nodes except the root have at least a
    children
  • All internal nodes have at most b children
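
The three rules can be checked mechanically. The sketch below is one possible checker, not code from the course. It assumes a hypothetical `Node` with a sorted `keys` list and a `children` list that is empty at the bottom level, and it assumes a node with k elements has k + 1 children (as with 3 elements and 4 children in a (2,4) tree), so the rules turn into bounds of a - 1 and b - 1 elements.

```python
class Node:
    """Hypothetical node: a list of elements plus child references."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []  # empty list marks a bottom-level node


def is_valid_ab_tree(root, a, b):
    """Check the (a, b) rules under the assumptions stated above."""
    leaf_depths = set()

    def visit(node, depth, is_root):
        if len(node.keys) > b - 1:               # at most b children
            return False
        if not is_root and len(node.keys) < a - 1:  # at least a children
            return False
        if not node.children:
            leaf_depths.add(depth)
            return True
        if len(node.children) != len(node.keys) + 1:
            return False
        return all(visit(child, depth + 1, False) for child in node.children)

    # All bottom-level nodes must sit at the same depth.
    return visit(root, 0, True) and len(leaf_depths) == 1
```

With a = 2 and b = 4 this accepts ordinary (2,4) trees and rejects a node holding four elements or a tree whose leaves sit at different depths.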

10
Improving Locality
  • For (2,4) trees, a = 2 and b = 4
  • Process of splitting and merging nodes still
    holds
  • We only vary the number of children in Node
  • Minimize paging by using good sizes for a and b
  • Store all the elements in an additional
    dictionary
  • Make sure a full node, including its dictionary and
    child references, fills a page
  • Limit number of nearly empty pages by selecting
    reasonable value for a
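
A rough way to pick b is to ask how many child references and elements fit in one page. The sketch below does that arithmetic. The 16-byte element and 8-byte child reference used in the usage note are illustrative assumptions, not figures from the lecture, and a real node also carries bookkeeping overhead that is ignored here.

```python
def max_children_per_page(page_bytes, entry_bytes, child_ref_bytes):
    """Largest b such that b child references and b - 1 elements fit in a page.

    Solves b * child_ref_bytes + (b - 1) * entry_bytes <= page_bytes.
    """
    return (page_bytes + entry_bytes) // (child_ref_bytes + entry_bytes)


def levels_needed(n, fanout):
    """Smallest number of levels so that fanout ** levels reaches n."""
    levels, reach = 0, 1
    while reach < n:
        reach *= fanout
        levels += 1
    return levels
```

With a 4096-byte page, 16-byte elements, and 8-byte child references this gives b = 171. Taking a to be about half of b (86 here, one common choice but still an assumption) keeps every page at least half full. As a rough estimate, reaching a million items takes about 4 levels with at least 86 children per node, against about 20 levels when every node has 2 children.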

11
Insertion
  • Always insert data into a leaf node
  • Once inserted, check for overflow!
  • Overflow: node is now larger than allowed
  • Example: insert(30)

(Figure: root [15 24] with leaves [12], [18], and [27 32 35]; inserting 30 makes the last leaf [27 30 32 35])
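
The leaf step of insert(30) can be sketched as follows. The function name is made up for the example, and it assumes a node with at most b children holds at most b - 1 elements, as in a (2,4) tree.

```python
import bisect


def insert_into_leaf(keys, key, b):
    """Insert key into a sorted leaf in place; return True if it now overflows."""
    bisect.insort(keys, key)   # keep the leaf's elements in sorted order
    return len(keys) > b - 1   # assumed limit: b - 1 elements per node
```

For the leaf [27 32 35] in a (2,4) tree, inserting 30 gives [27 30 32 35], which is one element too many, so the function reports an overflow.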
12
Split In Case Of Overflow
  • Split overflowing Node into 2 new nodes
  • Promote median element to the parent Node
  • Divide remaining elements into the two new Nodes
  • This may cause parent Node to overflow
  • So must repeat the process, possibly all the way up to the root
  • If the root node overflows, we create a new root!

(Figure: root [15 24 32] with leaves [12], [18], [27 28 30], and [35]; the third leaf then becomes [27 28 29 30])
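
The split itself is just a slice around the median. In this sketch the function name is hypothetical, and picking the upper of the two middle elements when the count is even is an assumption chosen to match the figures (32 and 29 are the elements that move up).

```python
def split_keys(keys):
    """Split an overflowing node's sorted elements around the median.

    Returns (left, median, right); the median goes to the parent and the
    other two lists become the two new nodes.
    """
    mid = len(keys) // 2  # upper middle for an even count (assumed convention)
    return keys[:mid], keys[mid], keys[mid + 1:]
```

Splitting [27 30 32 35] promotes 32 and leaves [27 30] and [35] as the two new nodes.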
13
Parent Overflow
  • Example: insert(29)

(Figure: root [15 24 29 32], now overflowing, with leaves [12], [18], [27 28], [30], and [35])
14
Parent Overflow
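  • Example: insert(29)

(Figure: new root [29] with children [15 24] and [32]; leaves [12], [18], and [27 28] under [15 24], leaves [30] and [35] under [32])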
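
Putting insertion and splitting together, overflow can be passed up the tree as a return value. This is a sketch under assumptions, not the course's implementation: `Node`, `insert`, and the helpers are hypothetical names, duplicate keys are not handled, and each node is assumed to hold at most b - 1 elements.

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []  # empty list marks a leaf


def _split(node):
    """Shrink node to its left half; return (median, new right sibling)."""
    mid = len(node.keys) // 2
    median = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return median, right


def _insert(node, key, b):
    """Insert below node; return (median, right) if node had to split."""
    if not node.children:
        bisect.insort(node.keys, key)          # always insert into a leaf
    else:
        i = bisect.bisect_left(node.keys, key)
        promoted = _insert(node.children[i], key, b)
        if promoted:                            # child split: adopt its median
            median, right = promoted
            node.keys.insert(i, median)
            node.children.insert(i + 1, right)
    if len(node.keys) > b - 1:                  # overflow, so split this node
        return _split(node)
    return None


def insert(root, key, b):
    """Insert key and return the (possibly new) root."""
    promoted = _insert(root, key, b)
    if promoted:                                # root overflowed: new root
        median, right = promoted
        return Node([median], [root, right])
    return root
```

Running insert(29) on the tree with root [15 24 32] and leaves [12], [18], [27 28 30], [35] splits the leaf, then the root, and ends with the new root [29] shown in the figure.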
15
Underflow and Fusion
  • Deleting Entry may cause underflow
  • Two possible solutions depending on situation
  • Example: remove(15)

(Figure: root [9 14] with leaves [2 5 7], [10], and [15])
16
Case 1: Transfer
  • Has adjacent sibling with elements to spare
  • Steal closest Entry from parent and closest child from sibling
  • Parent takes sibling's closest Entry
  • We're done
  • Example: remove(10)

(Figure: root [4 9] with leaves [2], [6 8], and [10]; after remove(10) and a transfer, root [4 8] with leaves [2], [6], and [9])
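
A transfer from the left sibling can be sketched as below. The `Node` class and function name are hypothetical, and only the left-sibling case is shown; borrowing from the right sibling would mirror it.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []  # empty list marks a leaf


def transfer_from_left(parent, i):
    """Refill the underflowing child at index i from its left sibling."""
    node, sibling = parent.children[i], parent.children[i - 1]
    # The parent's Entry between the two siblings moves down into the node.
    node.keys.insert(0, parent.keys[i - 1])
    # The sibling's closest (largest) Entry moves up to take its place.
    parent.keys[i - 1] = sibling.keys.pop()
    # For internal nodes, the sibling's closest child moves across as well.
    if sibling.children:
        node.children.insert(0, sibling.children.pop())
```

For remove(10), the emptied leaf takes 9 from the parent and the parent takes 8 from the sibling [6 8], giving root [4 8] with leaves [2], [6], and [9].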
17
Case 2: Fusion
  • Emptied node has siblings of minimum size
  • Merge node and sibling into one
  • Steal Entry from parent that was between siblings
  • May propagate underflow to parent!
  • Example: remove(15)

(Figure: root [9 14] with leaves [2 5 7], [10], and [15]; after remove(15) and a fusion, root [9] with leaves [2 5 7] and [10 14])
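
Fusion with the left sibling can be sketched the same way. Again, `Node` and the function name are hypothetical. The return value only reports whether the parent has dropped below a children; the root is exempt from that rule, so the caller has to decide what to do with it.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []  # empty list marks a leaf


def fuse_with_left(parent, i, a):
    """Merge the underflowing child at index i into its left sibling.

    Returns True if the parent now has fewer than a children.
    """
    node, sibling = parent.children[i], parent.children[i - 1]
    # Pull down the parent's Entry that sat between the two siblings.
    sibling.keys.append(parent.keys.pop(i - 1))
    # Absorb whatever the underflowing node still holds.
    sibling.keys.extend(node.keys)
    sibling.children.extend(node.children)
    del parent.children[i]
    return len(parent.children) < a
```

For remove(15), the emptied leaf merges with [10] and pulls 14 down from the parent, leaving root [9] with leaves [2 5 7] and [10 14].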
18
For Next Lecture
  • Look at most popular version of (a, b) Tree
  • How a BTree is implemented
  • Ways of reading and writing these trees to disk