Title: Analysis of the Three Birthday Problem
- In this last assignment, you were asked to
distribute your queries among a number of
parties. You could attend large or small parties,
but you were limited to attending 10,000 parties.
- What happens as we try different methods of
analysis? What makes a difference? How do we
most accurately estimate the number of people
required to give exactly a 50% probability of
winning a bet?
- Start with a brute force approach. If we surmise
that the 50% points and the maximum are between
50 and 200 people, simply use our 10,000 guesses
divided evenly among these party sizes. The data
is noisy, but there's a lot of it, so it's not
difficult to fit a curve.
- At the other extreme, divide the 10,000 samples
into only 10 party sizes. The data is much less
noisy, but there are many fewer points. Which is
best at finding the crossover points?
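The two sampling plans above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not restate the rule of the bet, so the sketch assumes a party "wins" when at least three guests share a birthday, with 365 equally likely birthdays. The names `party_wins` and `win_fractions` are hypothetical helpers, not part of the assignment.

```python
import random
from collections import Counter

DAYS = 365  # assumption: uniform birthdays, no leap day


def party_wins(n_people, rng):
    """Assumed bet: at least three guests share a birthday."""
    counts = Counter(rng.choices(range(DAYS), k=n_people))
    return max(counts.values(), default=0) >= 3


def win_fractions(sizes, total_parties, rng):
    """Split the party budget evenly over the sizes; return {size: fraction won}."""
    per_size = total_parties // len(sizes)
    return {
        n: sum(party_wins(n, rng) for _ in range(per_size)) / per_size
        for n in sizes
    }


rng = random.Random(0)                 # fixed seed so the sketch is repeatable
many_sizes = list(range(50, 200))      # 150 sizes, about 66 parties each
few_sizes = list(range(50, 200, 15))   # 10 sizes, 1,000 parties each
dense = win_fractions(many_sizes, 10_000, rng)
sparse = win_fractions(few_sizes, 10_000, rng)
```

Plotting `dense` and `sparse` against party size should show the trade-off described above: many scattered points versus a few tight ones.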
Analysis of the Three Birthday Problem
- For this problem I did the following steps.
- Do four experiments. In each, divide my 10,000
parties evenly between a defined set of numbers:
  - 50, 51, 52, ..., 198, 199
  - 50, 52, 54, 56, ..., 196, 198
  - 50, 55, 60, 65, 70, ..., 185, 190, 195
  - 50, 65, 80, 95, ..., 170, 185, 200
- Try to fit data to a centered 3rd order
polynomial. Feed this into my 30-day free trial
of Prism by GraphPad. (BTW, a great product. I
was able to solve this problem relatively quickly,
yet can see that the tool has great potential for
a number of other applications.)
- The polynomial is
$Y = B_0 + B_1 (X - X_0) + B_2 (X - X_0)^2 + B_3 (X - X_0)^3$
- Take the equation produced by Prism, along with
the raw data, to Excel, where it's plotted in the
next few graphs.
- Use Excel to determine the 0.500 crossover point
for the 4 data sets.
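For readers without Prism or Excel, the fit-and-crossover steps can be approximated with NumPy. This is a minimal sketch, not the workflow used for these slides: it assumes $X_0$ is the mean of the X values, uses `np.polyfit` for the least-squares fit, and finds crossovers with `np.roots`. The function names and the demo data are made up for illustration.

```python
import numpy as np


def fit_centered_cubic(x, y):
    """Least-squares fit of y = B0 + B1*(x-x0) + B2*(x-x0)**2 + B3*(x-x0)**3."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = x.mean()  # assumption: centre on the mean of X
    # np.polyfit returns the highest power first, so reverse to get B0..B3
    b = np.polyfit(x - x0, y, 3)[::-1]
    return x0, b


def crossovers(x0, b, level=0.5, lo=50.0, hi=200.0):
    """Real X values in [lo, hi] where the fitted cubic equals `level`."""
    coeffs = b[::-1].copy()   # back to highest power first for np.roots
    coeffs[-1] -= level       # solve cubic(u) - level = 0, with u = x - x0
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-9].real + x0
    return sorted(r for r in real if lo <= r <= hi)


# Synthetic demo data (made up for illustration, not the assignment's results)
x_demo = np.arange(50, 200, 2)
y_demo = 0.5 + 0.004 * (x_demo - 124) + 1e-7 * (x_demo - 124) ** 3
x0_demo, b_demo = fit_centered_cubic(x_demo, y_demo)
print(crossovers(x0_demo, b_demo))
```

Centering on $X_0$ keeps the powers of $(X - X_0)$ small and less correlated with one another, which tends to make the fitted coefficients more stable than fitting raw powers of X.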
150 points
Data are noisy, but there are many points. So
how is this curve fitting accomplished?
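One common answer, offered as a general sketch rather than a description of Prism's internals: ordinary least squares picks the coefficients that minimize the sum of squared vertical distances between the data points $(X_i, Y_i)$ and the curve. With $X_0$ held fixed, the model is linear in $B_0, \dots, B_3$, so the minimum has a closed form. The symbols $N$ (number of points), $A$ (design matrix), and $\hat{B}$ are introduced here for illustration.

```latex
\[
\min_{B_0,\dots,B_3} \; \sum_{i=1}^{N}
\Bigl[ Y_i - B_0 - B_1 (X_i - X_0) - B_2 (X_i - X_0)^2 - B_3 (X_i - X_0)^3 \Bigr]^2
\]
% Matrix form, where row i of A is [1, (X_i - X_0), (X_i - X_0)^2, (X_i - X_0)^3]:
\[
\hat{B} = (A^{\mathsf{T}} A)^{-1} A^{\mathsf{T}} Y
\]
```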
75 points
30 points
10 points
Data are much less noisy, but considerably fewer
points.
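A rough way to put numbers on "noisy", assuming each party is an independent win/lose trial with win probability $p$, so that the observed fraction $\hat{p}$ at one party size is binomial. The symbol $m$ (parties per size) is introduced here for illustration, and $p \approx 0.5$ is used because that is the region of interest.

```latex
\[
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{m}}
\]
\[
m \approx \frac{10000}{150} \approx 67:\quad
\mathrm{SE} \approx \sqrt{\frac{0.25}{67}} \approx 0.061
\qquad\qquad
m = \frac{10000}{10} = 1000:\quad
\mathrm{SE} = \sqrt{\frac{0.25}{1000}} \approx 0.016
\]
```

Under these assumptions each point in the 10-size design is roughly four times less noisy, but the fit has 15 times fewer points to average over.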
Comparison of the four methods
Is there a clear winner here?
Analysis of the Three Birthday Problem
- This is an extension of the problem. Since the
four types of data sampling produced a variation
in both the equation and in the crossover points,
what happens with 4 runs of the same data sizes?
- Divide my 10,000 parties evenly between a defined
set of numbers:
  - 50, 52, 54, 56, ..., 196, 198
- Run this data four times and get four sets of
random numbers.
- Again, fit the data to a third order polynomial.
- The polynomial is
$Y = B_0 + B_1 (X - X_0) + B_2 (X - X_0)^2 + B_3 (X - X_0)^3$
- Take the equation produced by Prism, along with
the raw data, to Excel, where it's displayed in the
following table.
- Use Excel to determine the 0.500 crossover point
for the 4 data sets.
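The repeated-run experiment can be imitated end to end in Python. As before, this is a sketch under loud assumptions: the bet is taken to be "at least three guests share a birthday" (the slides do not restate the rule), birthdays are uniform over 365 days, $X_0$ is the mean party size, and NumPy stands in for Prism and Excel. The helper names and seeds are hypothetical.

```python
import numpy as np

DAYS = 365  # assumption: uniform birthdays, no leap day


def win_fraction(n_people, n_parties, rng):
    """Fraction of simulated parties won under the assumed bet
    (at least three guests share a birthday)."""
    wins = 0
    for _ in range(n_parties):
        birthdays = rng.integers(0, DAYS, size=n_people)
        wins += int(np.bincount(birthdays, minlength=DAYS).max() >= 3)
    return wins / n_parties


def one_run(sizes, total_parties, rng):
    """Simulate one full experiment, fit a centered cubic, return crossovers."""
    per_size = total_parties // len(sizes)
    x = np.array(sizes, dtype=float)
    y = np.array([win_fraction(n, per_size, rng) for n in sizes])
    x0 = x.mean()
    coeffs = np.polyfit(x - x0, y, 3)   # highest power first
    coeffs[-1] -= 0.5                   # roots of fit(u) - 0.5, with u = x - x0
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-9].real + x0
    return sorted(r for r in real if x.min() <= r <= x.max())


sizes = list(range(50, 200, 2))         # 75 sizes: 50, 52, ..., 198
runs = [one_run(sizes, 10_000, np.random.default_rng(seed)) for seed in range(4)]
for seed, found in enumerate(runs):
    print(seed, [round(r, 1) for r in found])
```

The only thing that changes between the four runs is the random seed, so any spread in the printed crossovers comes from sampling noise alone.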
Comparison of four runs of 75 numbers
Note that 10,000 samples give us a definite
uncertainty in the crossover number. This is
approximately the same spread we see when we try
different distributions of samples. One can only
conclude that 10,000 samples produce this amount
of uncertainty, no matter which party-size
distribution you choose.