1
The Power of Incorrectness
  • A Brief Introduction to Soft Heaps

2
The Problem
  • A heap (priority queue) is a data structure that
    stores elements with keys chosen from a totally
    ordered set (e.g. integers)
  • We need to support the following operations (a
    minimal interface is sketched in code after this
    list):
  • Insert an element
  • Update (decrease the key of an element)
  • Extract min (find and delete the element with
    minimum key)
  • Merge (optional)
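  • A minimal sketch of this interface in Python; the
    class and method names below are illustrative, not
    part of the presentation:

    from abc import ABC, abstractmethod

    class PriorityQueue(ABC):
        """Hypothetical interface for the operations above."""

        @abstractmethod
        def insert(self, item, key):
            """Add item with the given key."""

        @abstractmethod
        def decrease_key(self, item, new_key):
            """Update: lower item's key to new_key."""

        @abstractmethod
        def extract_min(self):
            """Find and delete the element with minimum key."""

        @abstractmethod
        def merge(self, other):
            """Optional: absorb another heap of the same kind."""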

3
A Note on Notation
  • We evaluate algorithm speed using big O notation.
  • Most of the upper bounds on runtime given here
    are also lower bounds, but we use just big-O to
    simplify notation.
  • Some of the runtimes given are amortized, meaning
    they are averages over a sequence of operations.
    They're stated as normal bounds to reduce
    confusion.
  • All logs are base 2 and denoted lg.
  • N is the number of elements in our heap at any
    time. We also use it to denote the number of
    operations.
  • We're working in the comparison model.

4
What about delete?
  • Note that we can perform deletes lazily by marking
    elements as deleted using a flag.
  • Then extract-min simply pops again whenever the
    minimum element is already marked as deleted.
  • So delete doesn't need to be treated any
    differently than extract-min (a sketch follows
    this list).
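  • A sketch of lazy deletion on top of Python's heapq
    (the class name is illustrative):

    import heapq

    class LazyDeleteHeap:
        """Delete by flagging; flagged minima are purged on extract-min."""

        def __init__(self):
            self._heap = []          # (key, item) pairs
            self._deleted = set()    # items flagged as deleted

        def insert(self, item, key):
            heapq.heappush(self._heap, (key, item))

        def delete(self, item):
            # O(1): just mark the element; it is removed physically later.
            self._deleted.add(item)

        def extract_min(self):
            # Pop repeatedly while the minimum is flagged as deleted.
            while self._heap:
                key, item = heapq.heappop(self._heap)
                if item not in self._deleted:
                    return key, item
                self._deleted.discard(item)
            raise IndexError("extract_min from an empty heap")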

5
The General Approach
  • We store the elements in a tree with a constant
    branching factor
  • Heap condition: the key of any node is always at
    least the key of its parent.
  • Exercise: show that we can perform insert and
    extract-min in time proportional to the height of
    the tree.

6
Binary Heaps
  • We use a perfectly balanced tree with a constant
    branching factor.
  • The height of the tree is O(lgN).
  • So insert/update/extract-min all take O(lgN)
    time (a sketch follows this list).
  • Merge is not supported as a basic operation.
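  • A sketch of the array representation, with insert
    and extract-min each doing work proportional to
    the height (this also answers the exercise on the
    previous slide); the class is illustrative:

    class BinaryHeap:
        """Array-based binary min-heap (illustrative sketch)."""

        def __init__(self):
            self._a = []

        def insert(self, key):
            a = self._a
            a.append(key)
            i = len(a) - 1
            # Sift up: swap with the parent while the heap condition fails.
            while i > 0 and a[(i - 1) // 2] > a[i]:
                a[i], a[(i - 1) // 2] = a[(i - 1) // 2], a[i]
                i = (i - 1) // 2

        def extract_min(self):
            a = self._a
            if not a:
                raise IndexError("extract_min from an empty heap")
            a[0], a[-1] = a[-1], a[0]
            minimum = a.pop()
            # Sift down: push the displaced key below any smaller child.
            i = 0
            while True:
                left, right, smallest = 2 * i + 1, 2 * i + 2, i
                if left < len(a) and a[left] < a[smallest]:
                    smallest = left
                if right < len(a) and a[right] < a[smallest]:
                    smallest = right
                if smallest == i:
                    return minimum
                a[i], a[smallest] = a[smallest], a[i]
                i = smallest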

7
Binomial Heap
  • Binomial heaps use a branching factor of O(lgN)
    and can also support merge in O(lgN) time.
  • The main idea is to keep a forest of trees, each
    with a power-of-two number of nodes and no two of
    the same size.
  • When we merge two trees of the same size, we get
    another tree whose size is a power of two.
  • We can merge two such forests in O(lgN) time in a
    manner analogous to binary addition (sketched in
    code below).
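  • A sketch of the carry-style merge in Python, with
    a forest represented as a dict mapping rank to
    tree root (this representation is an assumption
    made for illustration):

    class BinomialTree:
        """A heap-ordered tree with exactly 2**rank nodes."""
        def __init__(self, key):
            self.key = key
            self.children = []
            self.rank = 0

    def link(t1, t2):
        # Combine two trees of equal rank into one of rank + 1,
        # keeping the smaller key at the root (heap condition).
        if t2.key < t1.key:
            t1, t2 = t2, t1
        t1.children.append(t2)
        t1.rank += 1
        return t1

    def merge(forest1, forest2):
        # Walk the ranks like bit positions in binary addition,
        # propagating a "carry" tree whenever two trees collide.
        result, carry = {}, None
        top = max(list(forest1) + list(forest2) + [-1])
        for rank in range(top + 2):
            trees = [f[rank] for f in (forest1, forest2) if rank in f]
            if carry is not None:
                trees.append(carry)
                carry = None
            if len(trees) >= 2:
                carry = link(trees.pop(), trees.pop())
            if trees:
                result[rank] = trees[0]
        return result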

8
Structure of Binomial Heaps
  • We typically build each binomial tree by attaching
    one binomial tree to the root of another of the
    same size.
  • Let the rank of a tree in a binomial heap be the
    log of the number of nodes it has.

9
Even More Heap
  • If we make binomial heaps lazy, deferring all the
    work until we perform extract-min, we get O(1) per
    insert and merge, but O(lgN) for extract-min and
    update.
  • Fibonacci heaps (by Fredman and Tarjan) can do
    insert, update, and merge in O(1) per operation.
  • But a delete still requires O(lgN).
  • Can we get rid of the O(lgN) factor?

10
No!
  • WHY?

11
A Bound on Sorting
  • We can't sort N numbers in the comparison model
    faster than O(NlgN) time.
  • Sketch of proof:
  • There are N! possible permutations.
  • Each comparison can only have 2 possible results,
    true/false.
  • So for our algorithm to distinguish all N! inputs,
    we need lg(N!) = O(NlgN) comparisons (spelled out
    below).
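  • In LaTeX, one way to see the last step (half the
    factors of N! are at least N/2):

    \lg(N!) \;\ge\; \lg\!\left((N/2)^{N/2}\right)
            \;=\; \tfrac{N}{2}\,\lg\tfrac{N}{2}
            \;=\; \Theta(N \lg N)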

12
Now Apply to Heaps
  • Given an array of N elements, we can insert them
    into a heap in N inserts.
  • Performing extract-min once gives the 1st element
    of the sorted list; a 2nd time gives the 2nd
    element.
  • So we can perform extract-min N times to get a
    sorted list back (the reduction is sketched in
    code below).
  • So one of insert or extract-min must take at
    least O(lgN) time per operation.
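  • The reduction in Python, using heapq as the
    comparison-based heap:

    import heapq

    def heap_sort(values):
        # N inserts followed by N extract-mins yields a sorted list,
        # so both operations cannot be faster than O(lgN).
        heap = []
        for v in values:
            heapq.heappush(heap, v)           # N inserts
        return [heapq.heappop(heap)           # N extract-mins
                for _ in range(len(heap))]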

13
Is There a Way Around This?
  • Note that there is a hidden assumption in the
    proof on the previous slide:
  • The result given by every call of extract-min
    must be correct.

14
The Idea
  • We sacrifice some correctness to get a better
    runtime. To be more specific, we allow a fraction
    of the answers provided by extract-min to be
    incorrect.

15
Soft Heaps
  • Supports insertion, update, extract-min and merge
    in O(1) time.
  • No more than eN (for a parameter 0 < e < 0.5) of
    the elements have their keys raised at any point.

16
The Motivation: Car Pooling
17
No, I Meant This
18
The Idea in Words
  • We modify the binomial heap described earlier.
  • Trees don't have to be full anymore.
  • The idea of rank carries over.
  • We put multiple elements on the same node, which
    is what makes the trees non-full.
  • This allows a reduction in the height of the tree.

19
The Catch
  • If a node has multiple elements stored on it, how
    do we track which one is the minimum?
  • Solution: we assign all the elements in the list
    the same key.
  • Some of the keys are thereby raised.
  • This is where the error rate comes in (a sketch of
    such a node follows this list).
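  • A sketch of such a node in Python (the field names
    are assumptions for illustration, not Chazelle's):

    from dataclasses import dataclass, field

    @dataclass
    class SoftNode:
        # A single key shared by every element in the item list.
        # The shared key is an upper bound on each item's original key,
        # so items with smaller original keys have been raised (corrupted).
        key: int
        items: list = field(default_factory=list)
        children: list = field(default_factory=list)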

20
Example
  • Modified binomial heap with 8 elements.
  • Two of the nodes have 2 elements instead of one.
    Note that 2's and 3's key values are raised.
  • But two nodes in the deeper parts of the tree are
    no longer there.

21
Outline of the Algorithm
  • Insert is done through merging of heaps
  • We merge as we do in binomial heaps, in a manner
    not so different from adding binary numbers.
  • When inserting, we do not have to change any of
    the lists stored in the nodes; all we have to do
    is maintain heap order when merging trees.

22
Extract-Min
  • If the root's list is not empty, we just take
    something close to the minimum, remove it, and
    reduce the size of the list by 1.
  • Recall that we don't always have to be right.
  • This is a bit trickier when the list is empty.
  • In both cases we siphon elements from below the
    root and append them to the root's list using a
    separate procedure called sift.

23
Sift
  • We pull elements up the tree into the current
    node's list, concatenating the item lists when two
    lists collide.
  • Then we perform sift on one of the children of
    the current node. Note that at this point we're
    doing the same thing as in a binary heap.
  • However, in some cases we call sift on another
    child of the node, which makes the sift calls
    truly branching. The question is when to do this.

24
How Many Elements Do We Sift?
  • This is tricky. If we don't sift, the height (and
    thus the runtime) would remain O(lgN).
  • But if we sift too much, we can get more than eN
    elements with raised keys.
  • We use a combination of the size of the tree and
    the size of the current list to decide when to
    sift and when to destroy nodes; hence the
    branching condition is key.

25
Sift Loop Condition
  • We call sift twice when the rank of the current
    tree is large enough (greater than some threshold
    r) and the rank is odd.
  • The odd-rank condition ensures we never call sift
    more than twice.
  • The constant r is used to globally control how
    much we sift (the condition is sketched in code
    below).
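  • The condition from this slide as a small Python
    predicate (the function name is illustrative; the
    value of r comes from the analysis on slide 27):

    def sift_branches(rank, r):
        # Recurse on a second child only when the rank exceeds the
        # global threshold r and the rank is odd.
        return rank > r and rank % 2 == 1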

26
One More Detail
  • We need to keep a rank invariant, which states
    that a node has at least half as many children as
    its rank.
  • This prevents excessive merging of lists.
  • We can keep this condition as follows: every time
    we find a violation at a root, we dismantle that
    node and merge its item list and subtrees back
    into the heap.

27
Result of the Analysis
  • The total cost of merging is O(N) by an argument
    similar to counting the number of carries
    resulting from incrementing a binary counter N
    times.
  • Result on sift from the paper (no proof):
  • Let r = 2 + 2⌈lg(1/e)⌉; then sift runs in O(r)
    time per call, which is O(1) per operation since e
    is constant.
  • We can also show that a runtime of O(lg(1/e)) is
    optimal if at most eN elements can have their keys
    raised.
  • Note that if we set e = 1/(2N), no errors can
    occur and we get a normal heap back.
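  • In LaTeX, the binary-counter bound behind the
    first bullet (a carry out of bit i happens roughly
    N/2^(i+1) times over N increments):

    \text{total carries} \;\le\; \sum_{i \ge 1}
    \left\lfloor \frac{N}{2^{i}} \right\rfloor
    \;<\; N \;=\; O(N)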

28
Is This Any Use?
  • Don't ever submit this for your CS assignment and
    expect it to give right answers.

29
A Problem
  • Given a list of N numbers, we want to find the
    kth largest in O(N) time.
  • Randomized quick-select does it in expected O(N)
    time, but it's randomized.
  • The most well-known deterministic algorithm for
    this involves splitting into groups of 5, finding
    the median of each group, and taking the median of
    those medians... basically a mess.

30
A Simple Deterministic Solution
  • We insert all N elements into a soft heap with
    error rate e = 1/3 and perform extract-min N/3
    times. The largest number deleted then has rank
    between N/3 and 2N/3.
  • So we can remove at least N/3 numbers from
    consideration each time (the ones on the wrong
    side of the kth element) and handle the rest
    recursively (sketched in code below).
  • Runtime: N + (2/3)N + (2/3)^2 N + (2/3)^3 N + ...
    = O(N)
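  • A sketch of the recursion in Python. It uses heapq
    as a stand-in for a soft heap (the standard
    library has none), so it shows only the pruning
    structure, not the O(N) bound. It returns the kth
    smallest; the kth largest is symmetric:

    import heapq

    def select(items, k):
        """Return the k-th smallest (1-indexed) element of items."""
        n = len(items)
        if n <= 3:
            return sorted(items)[k - 1]

        # Phase 1: build a heap and extract about n/3 minima.
        heap = list(items)
        heapq.heapify(heap)
        deleted = [heapq.heappop(heap) for _ in range(n // 3)]
        pivot = max(deleted)   # with a soft heap: rank between n/3 and 2n/3

        # Phase 2: discard the side of the pivot that cannot contain
        # the answer and recurse on the rest.
        smaller = [x for x in items if x < pivot]
        larger = [x for x in items if x > pivot]
        if k <= len(smaller):
            return select(smaller, k)
        if k > n - len(larger):
            return select(larger, k - (n - len(larger)))
        return pivot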

31
Other Applications
  • Approximate sorting: sort N numbers so they're
    nearly ordered.
  • Dynamic maintenance of percentiles.
  • Minimum spanning trees.
  • This is the problem soft heaps were designed to
    solve, and they give the best algorithm to date.
  • With soft heaps (and another 56 pages of work),
    we can get an O(Eα(E)) algorithm for minimum
    spanning trees. (α(E) is the inverse Ackermann
    function.)

32
Bibliography
  • Chazelle, Bernard. "The Soft Heap: An Approximate
    Priority Queue with Optimal Error Rate."
  • Chazelle, Bernard. "A Minimum Spanning Tree
    Algorithm with Inverse-Ackermann Type Complexity."
  • Wikipedia