Sampling and Soundness: Can We Have Both?
1
Sampling and Soundness: Can We Have Both?
  • Carla Gomes, Bart Selman, Ashish Sabharwal
  • Cornell University
  • Jörg Hoffmann
  • DERI Innsbruck
  • Presented by Frank van Harmelen

2
Talk Roadmap
  • A Sampling Method with a Correctness Guarantee
  • Can we apply this to the Semantic Web?
  • Discussion

3
How Might One Count?
How many people are present in the hall?
  • Problem characteristics
  • Space naturally divided into rows, columns,
    sections, …
  • Many seats empty
  • Uneven distribution of people (e.g. more near
    door, aisles, front, etc.)

4
1 Brute-Force Counting
  • Idea
  • Go through every seat
  • If occupied, increment counter
  • Advantage
  • Simplicity, accuracy
  • Drawback
  • Scalability

5
2 Branch-and-Bound (DPLL-style)
  • Idea
  • Split space into sections, e.g. front/back,
    left/right/center, …
  • Use smart detection of full/empty sections
  • Add up all partial counts
  • Advantage
  • Relatively faster, exact
  • Drawback
  • Still accounts for every single person present;
    needs extremely fine granularity
  • Scalability

Framework used in DPLL-based systematic exact
counters, e.g. Relsat [Bayardo et al. '00] and Cachet
[Sang et al. '04]
6
3 Naïve Sampling Estimate
  • Idea
  • Randomly select a region
  • Count within this region
  • Scale up appropriately
  • Advantage
  • Quite fast
  • Drawback
  • Robustness can easily under- or over-estimate
  • Scalability in sparse spaces: e.g. 10^60 solutions
    out of 10^300 means the region must be much larger
    than 10^240 to hit any solutions

7
Sampling with a Guarantee
  • Idea
  • Identify a balanced row split or column split
    (roughly equal number of people on each side)
  • Use local search for estimate
  • Pick one side at random
  • Count on that side recursively
  • Multiply result by 2
  • This provably yields the true count on average!
  • Even when an unbalanced row/column is accidentally
    picked for the split, e.g. even when samples are
    biased or insufficiently many
  • Surprisingly good in practice, using a local
    search as the sampler

8
Algorithm SampleCount
[Gomes, Hoffmann, Sabharwal, Selman, IJCAI '07]
  • Input: Boolean formula F
  • Set numFixed := 0, slack := some constant (e.g. 2,
    4, 7, …)
  • Repeat until F becomes feasible for exact
    counting:
  • Obtain s solution samples for F
  • Identify the most balanced variable and
    variable-pair: x is balanced iff ≈ s/2
    samples have x = 0 and ≈ s/2 have x = 1; (x, y) is
    balanced iff ≈ s/2 samples have x = y and ≈ s/2 have
    x ≠ y
  • If x is more balanced than (x, y), randomly set x
    to 0 or 1; else randomly replace x with y or ¬y;
    simplify F
  • Increment numFixed
  • Output: model count ≥ 2^(numFixed - slack) ·
    exactCount(simplified F), with confidence
    (1 - 2^(-slack))

Note: showing one trial.
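As a sanity check, one trial of the loop above can be sketched in Python. Everything here is a toy stand-in: brute-force enumeration replaces the exact counter (Cachet/Relsat), rejection sampling replaces the biased local-search sampler, and the variable-pair step is omitted; all function names are illustrative, not the authors' code.

```python
import itertools
import random

def satisfies(assignment, clauses):
    # assignment: {var: bool}; clauses: CNF as lists of signed ints
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)

def exact_count(clauses, variables):
    # brute-force model count, standing in for an exact counter
    return sum(satisfies(dict(zip(variables, bits)), clauses)
               for bits in itertools.product([False, True], repeat=len(variables)))

def sample_solutions(clauses, variables, s):
    # toy rejection sampler, standing in for local-search sampling
    samples = []
    while len(samples) < s:
        a = {v: random.getrandbits(1) == 1 for v in variables}
        if satisfies(a, clauses):
            samples.append(a)
    return samples

def simplify(clauses, var, value):
    # fix var := value; drop satisfied clauses, remove falsified literals
    keep = var if value else -var
    return [[l for l in c if abs(l) != var] for c in clauses if keep not in c]

def sample_count(clauses, variables, s=20, slack=2, cutoff=4):
    # one trial of SampleCount (single-variable version only)
    num_fixed = 0
    while len(variables) > cutoff:
        samples = sample_solutions(clauses, variables, s)
        # most balanced variable: #True among samples closest to s/2
        var = min(variables, key=lambda v: abs(sum(a[v] for a in samples) - s / 2))
        clauses = simplify(clauses, var, random.getrandbits(1) == 1)
        variables = [v for v in variables if v != var]
        num_fixed += 1
    # lower bound, correct with probability >= 1 - 2^(-slack)
    return 2 ** (num_fixed - slack) * exact_count(clauses, variables)
```

On the toy formula (x1 ∨ x2) ∧ (x3 ∨ x4) ∧ (x5 ∨ x6), which has 27 models, a trial fixes two variables and returns 2^(2 - slack) times an exact count of the residual formula: a lower bound that holds with probability at least 1 - 2^(-slack).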
9
Correctness Guarantee
Theorem SampleCount with t trials gives a
correct lower bound with
probability (1 2 slack ? t )
e.g. slack 2, t 4 ? 99 correctness
confidence
  • Key properties
  • Holds irrespective of the quality of the local
    search estimates
  • No free lunch! Bad estimates ? high variance of
    trial outcome ? min(trials) is high-confidence
    but not tight
  • Confidence grows exponentially with slack and t
  • Ideas used in the proof
  • Expected model count true count (for each
    trial)
  • Use Markovs inequality PrXgtkEX lt 1/k to
    bound error probability (X is outcome of one
    trial)
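The Markov step can be written out in two lines. Per trial, let X = 2^numFixed · exactCount(simplified F); as stated above, E[X] equals the true count M, and the algorithm reports X / 2^slack:

```latex
\Pr\left[\frac{X}{2^{\mathrm{slack}}} > M\right]
  = \Pr\left[X > 2^{\mathrm{slack}}\,\mathbb{E}[X]\right]
  < 2^{-\mathrm{slack}},
\qquad
\Pr\left[\min_{1 \le i \le t} \frac{X_i}{2^{\mathrm{slack}}} > M\right]
  < \left(2^{-\mathrm{slack}}\right)^{t} = 2^{-\mathrm{slack}\cdot t}.
```

With slack = 2 and t = 4 the reported minimum exceeds the true count with probability below 2^(-8), i.e. confidence about 99.6%, matching the 99% figure on the slide.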

10
Circuit Synthesis, Random CNFs
(per-instance results table omitted: Cachet (exact),
Relsat (exact), SampleCount (99% conf.))
11
Talk Roadmap
  • A Sampling Method with a Correctness Guarantee
  • Can we apply this to the Semantic Web?
  • Discussion

12
Talk Roadmap
  • A Sampling Method with a Correctness Guarantee
  • Can we apply this to the Semantic Web?
  • Highly speculative
  • Discussion

13
Counting in the Semantic Web
  • should certainly be possible with this method
  • Example: given RDF database D, count how many
    triples comply with query q
  • Throw a constraint cutting the set of all triples
    in half
  • If feasible, count the n remaining triples exactly;
    return n · 2^(numConstraints - slack)
  • Else, iterate
  • "Merely" technical challenges
  • What are constraints cutting the set of all
    triples in half?
  • How to throw a constraint?
  • When to stop throwing constraints?
  • How to efficiently count the remaining triples?
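A minimal sketch of the iteration above, under heavy assumptions: hashable Python objects stand in for triples, and a "constraint" is a random parity (XOR) predicate over hash bits, the standard streamlining trick from propositional model counting (e.g. MBound); the function name and parameters are hypothetical.

```python
import random

def count_lower_bound(matches, slack=2, cutoff=8):
    # matches: the set of items (e.g. triples complying with query q);
    # each random parity constraint cuts the set roughly in half
    num_constraints = 0
    while len(matches) > cutoff:
        mask = random.getrandbits(64)
        parity = random.getrandbits(1)
        # keep an item iff the parity of its masked hash bits matches
        matches = {t for t in matches
                   if bin(hash(t) & mask).count("1") % 2 == parity}
        num_constraints += 1
    # count the survivors exactly, scale back up, discount by the slack
    return 2 ** (num_constraints - slack) * len(matches)
```

The open challenges on the slide are exactly where this sketch cheats: a real RDF store would need query-level constraints that provably halve the match set, and an efficient exact count of the residue.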

14
What about Deduction?
  • Does ψ follow from φ?
  • Exploit the connection implication ≡ UNSAT? Upper
    bounds?
  • A similar theorem does NOT hold for upper bounds
  • In a nutshell: Markov's inequality
    Pr[X > k · E[X]] < 1/k does not have a symmetric
    Pr[X < E[X]/k] counterpart
  • An adaptation is possible but has many problems ⇒
    does not look too promising
  • Heuristic alternative
  • Add constraints into φ to obtain φ′; check
    whether φ′ implies ψ
  • If no, stop; if yes, go to the next trial
  • After t successful trials, output "it's enough, I
    believe it"
  • No provable confidence, but may work well in
    practice
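The heuristic loop above, sketched with toy stand-ins (brute-force SAT in place of a real solver; ψ restricted to a single clause; all names are illustrative). Note the asymmetry the slide relies on: since the models of a strengthened φ′ are a subset of the models of φ, a "No" answer (φ′ ⊭ ψ) soundly refutes φ ⊨ ψ, while t "Yes" answers give only belief.

```python
import itertools
import random

def is_sat(clauses, n):
    # brute-force SAT check over variables 1..n (placeholder for a solver)
    for bits in itertools.product([False, True], repeat=n):
        a = dict(zip(range(1, n + 1), bits))
        if all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def entails(phi, psi_clause, n):
    # phi |= psi  iff  phi AND NOT(psi) is UNSAT; psi is a single clause,
    # so NOT(psi) is a conjunction of negated unit clauses
    return not is_sat(phi + [[-l] for l in psi_clause], n)

def heuristic_entailment(phi, psi_clause, n, t=4):
    # strengthen phi with a random unit constraint per trial
    for _ in range(t):
        unit = random.randint(1, n)
        phi_prime = phi + [[unit if random.getrandbits(1) else -unit]]
        if not entails(phi_prime, psi_clause, n):
            return False   # sound "No": phi does not entail psi
    return True            # "it's enough, I believe it" (no guarantee)
```

In the toy setting the brute-force check makes the strengthening pointless; the hoped-for gain is that checking the strengthened φ′ is cheaper than checking φ itself.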

15
What about Deduction?
  • Does ψ follow from φ?
  • Much more distant adaptation
  • Constraint = something that removes half of φ!!
  • Throw some and check whether φ′ ⊨ ψ
  • Confidence problematic
  • Can we draw any conclusions if φ′ ⊭ ψ?
  • May be that φ1, φ2 in φ with φ1 ∧ φ2 ⊨ ψ, but a
    constraint separated φ1 from φ2
  • May be that all relevant parts of φ are thrown out
  • Are there interesting cases where we can bound
    the probability of these events??

16
Talk Roadmap
  • A Sampling Method with a Correctness Guarantee
  • Can we apply this to the Semantic Web?
  • Highly speculative
  • Discussion

17
Discussion
  • In propositional CNF, one can efficiently obtain
    high-confidence lower bounds on the number of
    models by sampling
  • Application to the Semantic Web
  • Adaptation to counting tasks should be possible
  • Adaptation for φ ⊨ ψ, via upper bounds, is
    problematic
  • Promising heuristic method sacrificing the
    confidence guarantee
  • Alternative adaptation weakens φ instead of
    strengthening it
  • Sampling the knowledge base
  • Confidence guarantees??
  • Your feedback and thoughts are highly
    appreciated!!

18
What about Deduction?
  • Does ψ follow from φ?
  • Straightforward adaptation
  • There is a variant of this algorithm that
    computes high-confidence upper bounds instead
  • Throw large constraints, check if the constrained
    φ ∧ ¬ψ is SAT
  • If SAT, no implication; if UNSAT in each of t
    iterations, confidence in an upper bound on the
    number of models
  • Many problems
  • Is the constrained φ ∧ ¬ψ actually easier to
    check??
  • Large constraints are tough even in the
    propositional CNF context!
  • ("Large" involves half of the propositional
    variables; needed for confidence)
  • An upper bound on the number of models is not
    confidence in UNSAT!