The Buffer Tree - PowerPoint PPT Presentation


Title: The Buffer Tree


1
The Buffer Tree
  • Lars Arge

Presented by Or Ozery
2
I/O Model
  • Previously defined:
  • N = number of elements in the input
  • M = number of elements that fit into memory
  • B = number of elements per block
  • Costs are measured in number of blocks:
  • n = N / B
  • m = M / B

3
I/O Model vs. RAM Model
Operation               RAM Model       I/O Model
Scanning                Θ(N)            Θ(n)
List merging            Θ(N)            Θ(n)
Sorting                 Θ(N log_2 N)    Θ(n log_m n)
Searching               Θ(log_2 N)      Θ(log_B N)
Sorting using a B-tree  Θ(N log_2 N)    Θ(N log_B N)
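
To get a feel for the gap between the last row and the sorting row, the I/O bounds can be evaluated for concrete parameters. The following is a minimal sketch that drops all constant factors; the function name `io_costs` and the sample values of N, M and B are assumptions chosen for illustration, not figures from the presentation.

```python
import math


def io_costs(N, M, B):
    """Evaluate the I/O-model bounds from the table, ignoring constants.

    N, M, B are element counts: input size, memory size, block size.
    """
    n = N / B  # input size in blocks
    m = M / B  # memory size in blocks
    return {
        "scan": n,                           # Theta(n)
        "sort": n * math.log(n, m),          # Theta(n log_m n)
        "search": math.log(N, B),            # Theta(log_B N)
        "btree_sort": N * math.log(N, B),    # Theta(N log_B N)
    }


# Hypothetical machine: 2^30 elements, 2^20 fit in memory, 2^10 per block.
costs = io_costs(N=2**30, M=2**20, B=2**10)
ratio = costs["btree_sort"] / costs["sort"]
print(f"optimal sort: {costs['sort']:.0f} I/Os (up to constants)")
print(f"B-tree sort:  {costs['btree_sort']:.0f} I/Os (up to constants)")
print(f"ratio:        {ratio:.0f}")
```

With these sample values, sorting through a B-tree costs more than a thousand times as many I/Os as the optimal bound, which is the inefficiency the buffer tree is designed to remove.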
4
Online vs. Batched
  • Online problems:
  • A single command is given each time.
  • It must be processed before other commands are
    given.
  • Should be performed in good worst-case time.
  • For example: searching.
  • Batched problems:
  • A stream of commands is given.
  • Commands can be performed in any legal order.
  • Should be performed in good amortized time.
  • For example: sorting.

5
Motivation
  • We've seen that using an online-efficient data
    structure (B-tree) for a batched problem
    (sorting) is inefficient.
  • We would thus like to design a data structure for
    efficient use on batched problems, such as:
  • Sorting
  • Minimum reporting (priority queue)
  • Range searching
  • Interval stabbing

6
The Main Idea
  • There are 2 reasons why B-tree sorting is
    inefficient:
  • We work element-wise instead of block-wise.
  • We don't take advantage of the memory size m.
  • We can fix both problems by using buffers:
  • Buffers allow us to accumulate elements into blocks.
  • Using buffers of size Θ(m), we fully utilize the
    memory.

7
The Buffer Tree
  • An (m/4, m)-tree ⇒ branching factor Θ(m).
  • Elements are stored in the leaves, in blocks ⇒ O(n)
    leaves.
  • Each internal node has a buffer of size m.

8
Basic Properties
  • The height of the tree is O(log_m n).
  • The number of internal nodes is O(n/m).
  • From now on, define:
  • Leaf nodes: nodes whose children are leaves.
  • Internal nodes: nodes that are not leaf nodes.
  • The buffer tree uses linear space:
  • Each leaf takes O(1) space ⇒ O(n) space.
  • Each node takes O(m) space ⇒ O(n) space.

9
Processing Commands
  • We wait until we have a block of commands, then
    we insert it into the buffer of the root.
  • Because we process commands lazily, we
    need to time-stamp them.
  • When the buffer of the root gets full, we empty
    it using a buffer-emptying process (BEP):
  • We distribute the elements to the buffers one level
    down.
  • If any of the child buffers gets full, we
    continue recursively.

10
Internal Node BEP
  • Sort the elements in the buffer, removing
    matching pairs of insert and delete elements.
  • Scan through the sorted buffer and distribute the
    elements to the appropriate buffers one level
    down.
  • If any of the child buffers is now full, run the
    appropriate BEP recursively.
  • Internal node BEP takes O(x + m), where x is the
    size of the buffer in blocks.
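
The steps above can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not the presentation's implementation: Python lists stand in for disk blocks, the names `Op`, `Node`, `sort_and_cancel` and `empty_buffer` are hypothetical, a key is assumed to be inserted only while it is absent from the tree (so an insert followed by a delete of the same key can cancel), and leaf nodes simply apply their operations to a set instead of merging with leaf blocks and rebalancing.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Op:
    key: int
    kind: str  # "insert" or "delete"
    time: int  # time stamp assigned when the command was issued


@dataclass
class Node:
    splitters: list = field(default_factory=list)  # sorted routing keys
    children: list = field(default_factory=list)   # empty for a leaf node
    buffer: list = field(default_factory=list)     # pending Op objects
    keys: set = field(default_factory=set)         # stored keys (leaf nodes only)


def sort_and_cancel(ops):
    """Sort by (key, time); drop each insert directly followed by a delete of the same key."""
    out = []
    for op in sorted(ops, key=lambda o: (o.key, o.time)):
        if (op.kind == "delete" and out
                and out[-1].key == op.key and out[-1].kind == "insert"):
            out.pop()  # the insert/delete pair annihilates
        else:
            out.append(op)
    return out


def empty_buffer(node, capacity):
    """Buffer-emptying process: sort, cancel, distribute, recurse on full children."""
    ops = sort_and_cancel(node.buffer)
    node.buffer = []
    if not node.children:
        # Simplified leaf node: apply the surviving operations in time order.
        for op in sorted(ops, key=lambda o: o.time):
            if op.kind == "insert":
                node.keys.add(op.key)
            else:
                node.keys.discard(op.key)
        return
    for op in ops:
        # Child i receives the keys in (splitters[i-1], splitters[i]].
        child = node.children[bisect_left(node.splitters, op.key)]
        child.buffer.append(op)
    for child in node.children:
        if len(child.buffer) >= capacity:
            empty_buffer(child, capacity)


# Usage: a root with two leaf-node children split at key 10.
left, right = Node(), Node()
root = Node(splitters=[10], children=[left, right])
root.buffer = [Op(5, "insert", 1), Op(12, "insert", 2),
               Op(5, "delete", 3), Op(7, "insert", 4)]
empty_buffer(root, capacity=1)
print(left.keys, right.keys)
```

In the usage example the insert and the later delete of key 5 cancel inside the root buffer, so only keys 7 and 12 travel down the tree.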

11
Leaf Node BEP
  1. Sort the elements in the buffer as for internal
    nodes.
  2. Merge the sorted buffer with the leaves of the
    node.
  3. If the number of leaves increased:
    • Place the smallest elements in the leaves of the
      node.
    • Repeatedly insert one block of elements and
      rebalance.
  4. If the number of leaves decreased:
    • Place the elements in sorted order in the leaves,
      and append dummy blocks at the end.
    • Repeatedly delete one dummy block and rebalance.

12
Rebalancing - Fission
13
Rebalancing - Fusion
14
Rebalancing Cost
  • Rebalancing starts when inserting/deleting a
    block.
  • The leaf node that sparked the rebalancing will
    not cause rebalancing for the next O(m)
    inserts/deletes.
  • Thus the total number of rebalancing operations
    on leaf nodes is O(n/m).
  • Each rebalancing operation on a leaf node can
    trigger O(log_m n) rebalancing operations.
  • So there are O((n/m) log_m n) rebalancing
    operations, each costing O(m) ⇒ rebalancing takes
    O(n log_m n).
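
The arithmetic behind the last bullet can be written out as a single product. This only restates the bound on the slide, with every factor taken from the bullets above (the `\text` and `\tfrac` commands assume the amsmath package):

```latex
\[
\underbrace{O\!\left(\tfrac{n}{m}\right)}_{\text{rebalancings started at leaf nodes}}
\cdot
\underbrace{O(\log_m n)}_{\text{operations each one can trigger}}
\cdot
\underbrace{O(m)}_{\text{cost per operation}}
= O(n \log_m n).
\]
```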

15
Summing Up
  • We've seen that rebalancing takes O(n log_m n).
  • BEP cost:
  • The BEP of a full buffer is linear in the number of
    blocks in the buffer ⇒ each element pays O(1/B)
    to be pushed one level down the tree.
  • Because there are O(log_m n) levels in the tree,
    each element pays O((log_m n) / B) ⇒ BEP takes
    O(n log_m n).
  • Therefore, a sequence of N operations on an empty
    buffer tree takes O(n log_m n).
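
Spelled out, the per-element accounting for the buffer-emptying cost is the following product. It uses only the quantities on this slide and the definition n = N / B (the `\text` and `\tfrac` commands assume the amsmath package):

```latex
\[
\underbrace{N}_{\text{elements}}
\cdot
\underbrace{O\!\left(\tfrac{1}{B}\right)}_{\text{cost per element per level}}
\cdot
\underbrace{O(\log_m n)}_{\text{levels}}
= O\!\left(\tfrac{N}{B}\,\log_m n\right)
= O(n \log_m n).
\]
```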

16
Sorting
  • After inserting all N items into the tree, we need
    to empty all the buffers. We do this in BFS
    order.
  • How much does emptying all buffers cost?
  • Emptying a buffer takes O(m) amortized.
  • There are O(n/m) buffers ⇒ the total cost is O(n).
  • Thus sorting using a buffer tree takes
    O(n log_m n).

17
Priority Queue
  • We can easily transform our buffer tree into a PQ
    by adding support for a delete-min operation:
  • The smallest element is found on the path from
    the root to the leftmost leaf.
  • Therefore a delete-min operation will empty all
    the buffers on that path in O(m log_m n).
  • To make up for this cost, we delete the M/4
    smallest elements and keep them in memory.
  • This way we can answer the next M/4 delete-mins
    for free.
  • Thus our PQ supports N operations in O(n log_m n).
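
The caching idea can be illustrated in isolation. The sketch below is an assumption-laden stand-in, not the buffer tree itself: a `heapq` heap plays the role of the external tree, `cache_size` plays the role of M/4, and the class name `CachedMinPQ` and its `refills` counter are hypothetical. It only shows how the expensive step (emptying the leftmost path) is paid once per batch of delete-mins, and how a new element smaller than the cached maximum must go into the cache to keep answers correct.

```python
import heapq
from bisect import insort


class CachedMinPQ:
    def __init__(self, cache_size):
        self.cache_size = cache_size  # plays the role of M/4
        self.cache = []               # sorted list of the smallest elements ("in memory")
        self.store = []               # heap standing in for the external buffer tree
        self.refills = 0              # number of expensive refill steps so far

    def insert(self, x):
        # Invariant: every cached element is <= every stored element.
        if self.cache and x < self.cache[-1]:
            insort(self.cache, x)
            if len(self.cache) > self.cache_size:
                # Evict the largest cached element back to the store.
                heapq.heappush(self.store, self.cache.pop())
        else:
            heapq.heappush(self.store, x)

    def delete_min(self):
        """Return the smallest element; raises IndexError if the queue is empty."""
        if not self.cache:
            self._refill()
        return self.cache.pop(0)

    def _refill(self):
        # Stand-in for emptying all buffers on the leftmost path and
        # moving the smallest elements into memory.
        self.refills += 1
        take = min(self.cache_size, len(self.store))
        self.cache = [heapq.heappop(self.store) for _ in range(take)]


pq = CachedMinPQ(cache_size=4)
for x in [9, 3, 7, 1, 8, 2, 6, 4, 5, 0]:
    pq.insert(x)
print([pq.delete_min() for _ in range(5)], "refills:", pq.refills)
```

Each refill serves up to `cache_size` subsequent delete-mins without touching the store, which mirrors the amortization argument on the slide.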

18
Time-Forward Processing
  • The problem:
  • We are given a topologically ordered DAG.
  • For each vertex v there is a function f_v which
    depends on all f_u where u is a predecessor of v.
  • The goal is to compute f_v for all v.

19
TFP Using Our PQ
  • For each vertex v (in topological order):
  • Extract the d⁻(v) smallest elements from the PQ,
    where d⁻(v) is the in-degree of v.
  • Use the extracted elements to compute f_v.
  • For each edge (v, u), insert f_v into the PQ with
    priority u.
  • The above works in O(n log_m n).
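
The loop above can be sketched as follows, with Python's `heapq` standing in for the external priority queue. The function name `time_forward`, the in-memory adjacency lists and the example function are assumptions made for illustration. Vertices are assumed to be numbered 0..V-1 in topological order, so that every edge (v, u) has v < u. Popping every entry whose priority equals v is equivalent to extracting the d⁻(v) smallest elements, because all entries with smaller priorities were already removed.

```python
import heapq


def time_forward(num_vertices, edges, f):
    """Compute f for every vertex of a DAG whose vertices are topologically numbered.

    edges: list of (v, u) pairs with v < u.
    f:     callable f(v, inputs) -> value, where inputs are the values
           sent along the incoming edges of v.
    """
    out_edges = [[] for _ in range(num_vertices)]
    for v, u in edges:
        out_edges[v].append(u)

    pq = []    # entries are (target vertex, sequence number, value)
    seq = 0    # tie-breaker so values themselves are never compared
    values = [None] * num_vertices
    for v in range(num_vertices):
        inputs = []
        while pq and pq[0][0] == v:
            inputs.append(heapq.heappop(pq)[2])
        values[v] = f(v, inputs)
        for u in out_edges[v]:
            heapq.heappush(pq, (u, seq, values[v]))
            seq += 1
    return values


# Example: count the paths from source vertices to each vertex.
def count_paths(v, inputs):
    return sum(inputs) if inputs else 1


print(time_forward(4, [(0, 1), (0, 2), (1, 3), (2, 3)], count_paths))
```

In an external-memory setting the adjacency lists would not be held in memory; the edges would instead be scanned in order of their source vertex.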

20
Buffered Range Tree
  • We want to extend our tree to support range
    queries:
  • Given an interval [x1, x2], report all elements
    of the tree that are contained in it.
  • How will we distribute the query elements when
    emptying a buffer?
  • As long as the interval is contained in a
    sub-tree, send the query element to the root
    buffer of that sub-tree.
  • Otherwise, we split the query into its 2 query
    elements, and report the elements in the relevant
    sub-trees.

21
Time Order Representation
  • We say that a list of elements is in time order
    representation (TOR) if it is of the form D-S-I,
    where:
  • D is a sorted list of delete elements.
  • S is a sorted list of query elements.
  • I is a sorted list of insert elements.
  • Lemma 1:
  • A non-full buffer can be brought into TOR in
    O(m + r), where r · B is the number of query
    results reported in the process.

22
Merging of TOR Lists
  • Lemma 2:
  • Let S1 and S2 be TOR lists such that all elements
    of S2 are older than the elements of S1.
  • S1 and S2 can be merged into a TOR list in
    O(s1 + s2 + r), where s1 and s2 are the sizes in
    blocks of S1 and S2, and r · B is the number of
    query results reported in the process.

23
Proof of Lemma 2
  • Let S_j = d_j - s_j - i_j.
  • Sub-lists in time order, after each swap of two
    adjacent sub-lists:

d2 s2 i2 d1 s1 i1
d2 s2 d1 i2 s1 i1
d2 d1 s2 i2 s1 i1
d2 d1 s2 s1 i2 i1
d     s     i
24
Full Sub-Tree Reporting
  • Lemma 3:
  • All buffers of a sub-tree with x leaves can be
    emptied and collected into a TOR list in O(x + r).
  • Proof:
  • For each level, prepare a TOR list of its
    elements.
  • Merge the TOR lists of all levels.

25
Internal Node BEP
  1. Compute the TOR of the buffer.
  2. Scan the delete elements and distribute them.
  3. Scan the range search elements and determine
    which sub-trees should have their elements
    reported.
  4. For each such sub-tree:
    • Remove the delete elements from (2) and store
      them in a temporary place.
    • Collect the elements of the sub-tree into a TOR.
    • Merge this TOR with the TOR of the removed delete
      elements.
    • Distribute the insert and delete elements to the
      leaf buffers.
    • Merge a copy of the leaves with the TOR.
    • Remove the range search elements from the TOR.
    • Report the resulting elements to the queries
      that need them.
  5. Distribute the range search elements.
  6. Distribute the insert elements (if the sub-tree was
    emptied, to the leaf buffers).
  7. If any child buffer got full, apply the BEP
    recursively.

26
Leaf Node BEP
  1. Construct the TOR of the elements in the buffer.
  2. Merge the TOR with the leaves.
  3. Remove all range search elements and continue the
    BEP as in the normal buffer tree.

27
Analysis
  • The main difference from the normal buffer tree
    is the action of reporting all elements of a
    sub-tree.
  • By Lemma 3, this action has a linear cost.
  • We can thus split this cost between the delete
    elements and the query elements, as each element
    gets either deleted or reported.
  • Thus, a series of N operations on our buffered
    range tree costs O(n log_m n + r).

28
Orthogonal Line Intersection
  • The problem
  • Given N line segments parallel to the axes,
    report all intersections of orthogonal segments.

29
OLI Using Our Range Tree
  • Sort the segments, once by their top y
    coordinate, and once by their bottom y
    coordinate.
  • Merge the 2 sorted lists of segments:
  • When encountering the top coordinate of a vertical
    segment, insert its x coordinate into the tree.
  • When encountering the bottom coordinate of a
    vertical segment, delete its x coordinate from
    the tree.
  • When encountering a horizontal segment, insert a
    query for its endpoints.
  • The above takes an optimal O(n log_m n + r).
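
The sweep described above can be tried out in memory. In the sketch below a sorted Python list with `bisect` stands in for the buffered range tree, so it shows the event order and the insert/delete/query pattern, not the I/O behavior. The function name, the tuple layouts and the tie rule (at equal y, inserts come before queries and queries before deletes, so touching segments count as intersecting) are assumptions.

```python
from bisect import bisect_left, bisect_right, insort


def orthogonal_intersections(verticals, horizontals):
    """Report intersecting (vertical index, horizontal index) pairs.

    verticals:   list of (x, y_bottom, y_top)
    horizontals: list of (x_left, x_right, y)
    """
    INSERT, QUERY, DELETE = 0, 1, 2  # processing order at equal y
    events = []
    for i, (_x, y_bottom, y_top) in enumerate(verticals):
        events.append((-y_top, INSERT, i))
        events.append((-y_bottom, DELETE, i))
    for j, (_x1, _x2, y) in enumerate(horizontals):
        events.append((-y, QUERY, j))
    events.sort()  # negated y: the sweep runs from top to bottom

    active = []  # sorted (x, vertical index) of segments crossing the sweep line
    result = []
    for _neg_y, kind, idx in events:
        if kind == INSERT:
            insort(active, (verticals[idx][0], idx))
        elif kind == DELETE:
            active.remove((verticals[idx][0], idx))
        else:
            x1, x2, _y = horizontals[idx]
            lo = bisect_left(active, (x1, -1))
            hi = bisect_right(active, (x2, len(verticals)))
            result.extend((i, idx) for _x, i in active[lo:hi])
    return result


verticals = [(1, 0, 4), (3, 2, 6), (5, 0, 1)]
horizontals = [(0, 4, 3), (2, 6, 5)]
print(sorted(orthogonal_intersections(verticals, horizontals)))
```

With the buffer tree, the same stream of inserts, deletes and range queries would be fed to the tree lazily and answered during buffer emptying, rather than answered immediately as here.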

30
Buffered Segment Tree
  • We switch the roles of points and intervals:
  • We insert and delete intervals from the tree.
  • We use points as queries, to get a report of all
    intervals stabbed by a point.
  • We assume the intervals have (distinct) endpoints
    from a fixed given set E of size N.
  • The elements in the leaves will be the points of E.
  • We build our tree bottom-up in O(n).

31
Buffered Segment Tree
  • Define slabs, multi-slabs, short/long segments.

32
Internal Node BEP
  • Repeatedly load m/2 blocks of elements into
    memory, and perform the following:
  • For every multi-slab list, insert the relevant
    long segments.
  • For every multi-slab list that is stabbed by a
    point, report intervals and remove expired ones.
  • Distribute segments and queries.
  • If there's a full child buffer, apply BEP
    recursively.
  • The above costs O(m + x + r) = O(x + r)
    amortized.

33
Analysis
  • Because the tree structure is static, there is no
    rebalancing, and also no emptying of non-full
    buffers.
  • Therefore the only cost is emptying of full
    buffers, which is linear.
  • Thus a series of N operations on our segment tree
    takes O(n log_m n + r).
  • A write (flush) operation takes O(n log_m n).
  • Therefore we have the desired O(n log_m n + r).

34
Batched Range Searching
  • The problem
  • Given N points and N axis parallel rectangles in
    the plane, report all points inside each
    rectangle.

35
BRS Using Our Segment Tree
  • Sort points and rectangles by their top y
    coordinate.
  • Scan the sorted list:
  • For each rectangle, insert the interval that
    corresponds to its horizontal side, with a delete
    time matching its bottom y coordinate.
  • For each point, insert a stabbing query.
  • Flush the tree (empty all buffers).
  • The above takes an optimal O(n log_m n + r).
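
The scan above can be illustrated with a small in-memory sweep. Here a plain list of active rectangles replaces the buffered segment tree, so each stabbing query is answered by a linear check; only the event order and the expiry rule are taken from the slide. The function name, the tuple layouts and the choice to treat rectangle boundaries as inside are assumptions.

```python
def batched_range_search(points, rects):
    """Report (point index, rectangle index) pairs with the point inside the rectangle.

    points: list of (x, y)
    rects:  list of (x_left, x_right, y_bottom, y_top)
    """
    RECT, POINT = 0, 1  # at equal y, rectangles are inserted before points query
    events = [(-r[3], RECT, j) for j, r in enumerate(rects)]
    events += [(-p[1], POINT, i) for i, p in enumerate(points)]
    events.sort()  # negated y: scan from the top downwards

    active = []  # rectangles whose top side has been passed
    result = []
    for _neg_y, kind, idx in events:
        if kind == RECT:
            active.append(idx)
            continue
        x, y = points[idx]
        # Drop expired rectangles: their bottom side lies above the point.
        active = [j for j in active if rects[j][2] <= y]
        # Stabbing query: which active horizontal intervals contain x?
        result.extend((idx, j) for j in active
                      if rects[j][0] <= x <= rects[j][1])
    return result


rects = [(0, 4, 0, 4), (2, 6, 2, 6)]
points = [(1, 1), (3, 3), (5, 5), (7, 7)]
print(sorted(batched_range_search(points, rects)))
```

The expiry filter corresponds to the delete time attached to each inserted interval: once the scan passes a rectangle's bottom y coordinate, its interval no longer answers queries.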

36
Pairwise Rectangle Intersection
  • The problem
  • Given N axis parallel rectangles in the plane,
    report all intersecting pairs.

37
PRI Using Our Segment Tree
  • 2 rectangles in the plane intersect ⇔ one of the
    following holds:
  1. They have intersecting edges.
  2. One contains the other ⇒ one contains the other's
    midpoint.
  • We have shown an O(n log_m n + r) solution for
    both (1) and (2).
  • Therefore we have an optimal O(n log_m n + r)
    solution for the PRI problem.