On Random Sampling over Joins

1
On Random Sampling over Joins
  • Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya
  • Microsoft Research, Stanford University, Microsoft Research

2
Outline
  • The difficulty of join sampling - example
  • Semantics and algorithms of sampling
  • Two previous sampling strategies
  • New strategies for join sampling
  • Experimental results

3
The Difficulty of Join Sampling - Example
  • Suppose that we have the relations

4
Black-Box U2: Given relation R with n tuples, generate an unweighted WR (with-replacement) sample of size r.
  • 1. N ← 0.
  • 2. Initialize reservoir array A[1..r] with r dummy values.
  • 3. While tuples are streaming by do begin
        (a) get next tuple t;
        (b) N ← N + 1;
        (c) for j = 1 to r, set A[j] to t with probability 1/N
      end.
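
The streaming procedure above can be sketched in Python. This is a minimal sketch, assuming the two steps lost from the transcript are a counter N that starts at 0 and is incremented once per tuple. The names `sample_wr_unweighted`, `stream`, and `rng` are illustrative and do not come from the slides.

```python
import random

def sample_wr_unweighted(stream, r, rng=None):
    """One-pass unweighted with-replacement sample of size r (sketch of Black-Box U2)."""
    rng = rng or random.Random()
    n_seen = 0                # N: number of tuples seen so far (assumed reconstruction)
    reservoir = [None] * r    # A[1..r], initialized with dummy values
    for t in stream:
        n_seen += 1
        for j in range(r):
            # Each slot independently switches to t with probability 1/N.
            if rng.random() < 1.0 / n_seen:
                reservoir[j] = t
    return reservoir
```

Because every slot is updated independently, the same tuple can occupy several slots, which is what makes the sample with-replacement. The first tuple always fills all r slots, since 1/N equals 1 at that point.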

5
Black-Box WR2: Given relation R with n tuples, generate a weighted WR (with-replacement) sample of size r.
  • 1. W ← 0.
  • 2. Initialize reservoir array A[1..r] with r dummy values.
  • 3. While tuples are streaming by do begin
        (a) get next tuple t with weight w(t);
        (b) W ← W + w(t);
        (c) for j = 1 to r, set A[j] to t with probability w(t)/W
      end.
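
A minimal Python sketch of the weighted variant follows, assuming the steps lost from the transcript are a running total W that starts at 0 and grows by w(t) for each tuple. The names `sample_wr_weighted` and `weight` are illustrative, and skipping zero-weight tuples is a choice made here, not something stated on the slide.

```python
import random

def sample_wr_weighted(stream, r, weight, rng=None):
    """One-pass weighted with-replacement sample of size r (sketch of Black-Box WR2)."""
    rng = rng or random.Random()
    total = 0.0               # W: sum of weights seen so far (assumed reconstruction)
    reservoir = [None] * r    # A[1..r], initialized with dummy values
    for t in stream:
        w = weight(t)
        if w <= 0:
            continue          # a zero-weight tuple can never be selected
        total += w
        for j in range(r):
            # Each slot independently switches to t with probability w(t)/W.
            if rng.random() < w / total:
                reservoir[j] = t
    return reservoir
```

With all weights equal to 1 this reduces to the unweighted procedure, since w(t)/W becomes 1/N.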

6
The Classification of the Problem
  • Case A: No information is available for either R1 or R2.
  • Case B: No information is available for R1, but indexes and/or statistics are available for R2.
  • Case C: Indexes/statistics are available for both R1 and R2.

7
Previous Sampling Strategies
  • Strategy Naive-Sample
  • 1. Compute the join J = R1 ⋈ R2.
  • 2. As the tuples of J stream by, use Black-Box U1 or U2 to produce a WR sample of size r from J.
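
The two steps above can be sketched as follows. This is a minimal illustration, assuming an equi-join on a single attribute and an in-memory hash join; `join_stream`, `naive_sample`, `key1`, and `key2` are hypothetical names, and the sampling loop is the U2-style reservoir applied to the join output.

```python
import random
from collections import defaultdict

def join_stream(r1, r2, key1, key2):
    """Yield every pair (t1, t2) of the equi-join, using a simple hash join."""
    index = defaultdict(list)
    for t2 in r2:
        index[key2(t2)].append(t2)
    for t1 in r1:
        for t2 in index.get(key1(t1), ()):
            yield (t1, t2)

def naive_sample(r1, r2, key1, key2, r, rng=None):
    """Compute the full join, then draw an unweighted WR sample of size r from it."""
    rng = rng or random.Random()
    n_seen = 0
    reservoir = [None] * r
    for pair in join_stream(r1, r2, key1, key2):
        n_seen += 1
        for j in range(r):
            # U2-style update: slot j takes the new join tuple with probability 1/N.
            if rng.random() < 1.0 / n_seen:
                reservoir[j] = pair
    return reservoir
```

The sketch makes the cost visible: every tuple of the join is produced, even though only r of them survive.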

8
Previous Sampling Strategies
  • Strategy Olken-Sample
  • 1. Let M be an upper bound on m2(v) for all v, where m2(v) is the number of tuples in R2 with join-attribute value v.
  • 2. Repeat
  •    (a) Sample a tuple t1 from R1 uniformly at random.
  •    (b) Sample a random tuple t2 from among all tuples t in R2 that have t.A = t1.A.
  •    (c) Output t1 ⋈ t2 with probability m2(t2.A)/M, and with the remaining probability reject the sample.
  •    until r tuples have been produced.
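
A minimal Python sketch of this rejection loop is given below. It assumes the acceptance probability is the frequency of the join value in the second relation divided by M, and it uses an in-memory dictionary as a stand-in for the index on that relation. The names `olken_sample`, `key1`, and `key2` are illustrative, and M is taken here as the exact maximum frequency, although any upper bound would do.

```python
import random
from collections import defaultdict

def olken_sample(r1, r2, key1, key2, r, rng=None):
    """Rejection-sampling sketch: r1 must be a sequence so it can be sampled at random."""
    rng = rng or random.Random()
    index = defaultdict(list)            # stand-in for an index on R2's join attribute
    for t2 in r2:
        index[key2(t2)].append(t2)
    m_bound = max((len(g) for g in index.values()), default=0)   # M
    if m_bound == 0 or not any(key1(t) in index for t in r1):
        raise ValueError("the join is empty, so no sample can be produced")
    out = []
    while len(out) < r:
        t1 = rng.choice(r1)              # (a) uniform random tuple of R1
        matches = index.get(key1(t1), [])
        if not matches:
            continue                     # frequency 0: accepted with probability 0
        t2 = rng.choice(matches)         # (b) random matching tuple of R2
        if rng.random() < len(matches) / m_bound:
            out.append((t1, t2))         # (c) accept with probability m2(v)/M
    return out
```

The number of loop iterations per output tuple is random: when most join values are far less frequent than M, most draws are rejected.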

9
New Strategies for Join Sampling
  • Strategy Stream-Sample is more efficient than Olken-Sample:
    1. No information is required for R1 (Case B).
    2. No tuple is rejected after computing the join t1 ⋈ t2.
    3. Only one iteration is needed for each output tuple.

10
New Strategies for Join Sampling
  • Strategy Stream-Sample
  • 1. Use Black-Box WR1 or WR2 to produce a WR sample S1 of R1 of size r, where the weight w(t) for a tuple t in R1 is set to m2(t.A).
  • 2. While tuples of S1 are streaming by do begin
  •    (a) get next tuple t1 and let v = t1.A;
  •    (b) sample a random tuple t2 from among all tuples t in R2 that have t.A = v;
  •    (c) output t1 ⋈ t2
       end.
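
A minimal Python sketch of Stream-Sample follows. It assumes the weight lost from the transcript is the number of tuples in the second relation that match the tuple's join value, and it uses an in-memory dictionary as a stand-in for the index on that relation. The names `stream_sample`, `r1_stream`, `key1`, and `key2` are illustrative.

```python
import random
from collections import defaultdict

def stream_sample(r1_stream, r2, key1, key2, r, rng=None):
    """Weighted WR sample of the first relation, then one random match per sampled tuple."""
    rng = rng or random.Random()
    index = defaultdict(list)            # stand-in for an index on R2's join attribute
    for t2 in r2:
        index[key2(t2)].append(t2)
    # Step 1: WR2-style weighted reservoir over the stream, weight = match count.
    total = 0
    s1 = [None] * r
    for t1 in r1_stream:
        w = len(index.get(key1(t1), ()))
        if w == 0:
            continue                     # tuples with no match can never appear in the join
        total += w
        for j in range(r):
            if rng.random() < w / total:
                s1[j] = t1
    # Step 2: pair each sampled tuple with one uniformly chosen match; nothing is rejected.
    return [(t1, rng.choice(index[key1(t1)])) for t1 in s1 if t1 is not None]
```

The first relation is consumed in a single pass and never indexed, which matches Case B. An empty join yields an empty list in this sketch.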

11
New Strategies for Join Sampling
  • Strategy Group-Sample
  • 1. Use Black-Box WR1 or WR2 to produce a WR sample S1 of R1 of size r, where the weight w(t) for a tuple t in R1 is set to m2(t.A).
  • 2. Let S1 consist of the tuples s1, s2, ..., sr. Produce S2 = S1 ⋈ R2, whose tuples are grouped by the tuples of S1 that generated them.
  • 3. Use r invocations of Black-Box U1 or U2 to sample r tuples, one from each group.
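
The three steps can be sketched in Python as below. This is a minimal sketch under stated assumptions: the weight is taken to be the number of matching tuples in the second relation, a `Counter` built from that relation stands in for precomputed frequency statistics, and each group is sampled with a size-1 reservoir (a U2-style invocation with r = 1) during a single scan. The names `group_sample`, `key1`, and `key2` are illustrative.

```python
import random
from collections import Counter, defaultdict

def group_sample(r1, r2, key1, key2, r, rng=None):
    """r2 must be re-iterable here: once for the statistics, once for the join scan."""
    rng = rng or random.Random()
    m2 = Counter(key2(t2) for t2 in r2)  # stand-in for frequency statistics on R2
    # Step 1: weighted WR sample S1 of the first relation (WR2-style reservoir).
    total = 0
    s1 = [None] * r
    for t1 in r1:
        w = m2.get(key1(t1), 0)
        if w == 0:
            continue
        total += w
        for j in range(r):
            if rng.random() < w / total:
                s1[j] = t1
    if total == 0:
        return []                        # empty join
    # Step 2: map each join value to the S1 slots it feeds, so one scan groups the join.
    slots = defaultdict(list)
    for j, t1 in enumerate(s1):
        slots[key1(t1)].append(j)
    # Step 3: one size-1 reservoir per group picks a uniform match for each slot.
    picked = [None] * r
    seen = [0] * r
    for t2 in r2:
        for j in slots.get(key2(t2), ()):
            seen[j] += 1
            if rng.random() < 1.0 / seen[j]:
                picked[j] = t2
    return list(zip(s1, picked))
```

Unlike the Stream-Sample sketch, this version needs only frequency counts for the second relation and reads that relation sequentially instead of probing an index.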

12
New Strategy for Join Sampling
  • Strategy Frequency-Partition-Sample

13
Experimental Results

14
Experimental Results

15
Experimental Results

16
Summary
  • The difficulty of join sampling - example.
  • The classification of the problem - 3 cases.
  • Previous strategies: Naive-Sample, Olken-Sample.
  • New strategies: Stream-Sample, Group-Sample, Frequency-Partition-Sample.
  • Conclusion: The new strategies are better than the earlier techniques.