Measures of Precision - PowerPoint PPT Presentation

About This Presentation
Title:

Measures of Precision

Slides: 63
Provided by: CRE120

Transcript and Presenter's Notes

Title: Measures of Precision


1
How to Fake Data if you must
Rachel Fewster
Department of Statistics
2
Who wants to fake data?
  • Electoral finance returns
  • Toxic emissions reports
  • Business tax returns

3
Land areas of world countries: real or fake?
4
Land areas of world countries: real or fake?
1 2 3 4 5 6 7 8 9
IIIII III III I I II I
5
Land areas of world countries: real or fake?
1 2 3 4 5 6 7 8 9
IIIII III III I I II I
1 2 3 4 5 6 7 8 9
I I III I IIII I II III
6
Land areas of world countries: real or fake?
1 2 3 4 5 6 7 8 9
IIIII III III I I II I
This one is right! It has as many 1s as 5-9s put together!
1 2 3 4 5 6 7 8 9
I I III I IIII I II III
This one seems more even.
7
Real land areas of world countries
11 of them begin with digits 1-4
Only 5 begin with digits 5-9
8
Friday's Newspaper
10 out of 34 numbers began with a 1
None out of 34 began with a 9!
9
The Curious Case of the Grimy Log-books
  • In 1881, American astronomer Simon Newcomb
    noticed something funny about books of logarithm
    tables

10
The Curious Case of the Grimy Log-books
The books always seemed grubby on the first pages
but clean on the last pages.
The first pages are for numbers beginning with digits 1 and 2.
The last pages are for numbers beginning with digits 8 and 9.
11
The Curious Case of the Grimy Log-books
Why?
People seemed to look up numbers beginning with
1 and 2 more often than they looked up numbers
beginning with 8 and 9.
Because numbers beginning with 1 and 2 are MORE
COMMON than numbers beginning with 8 and 9!!
12
Newcomb's Law
30% of numbers begin with a 1!!
< 5% of numbers begin with a 9!!
American Journal of Mathematics, 1881
13
The First Digits
Over 30% of numbers begin with a 1
Only 5% of numbers begin with a 9
14
The First Digits
Numbers beginning with a 1
Numbers beginning with a 9
There is the same opportunity for numbers to
begin with 9 as with 1, but for some reason they
don't!
15
Chance of a number starting with digit d
$0.301 = \log_{10}(2/1)$
$0.176 = \log_{10}(3/2)$
$0.125 = \log_{10}(4/3)$
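The three values on this slide fit the pattern log10((d + 1) / d) for d = 1, 2, 3. A minimal sketch that extends the same pattern to all nine digits; the function name `benford_probability` is made up for this illustration and is not from the talk.

```python
import math

def benford_probability(d: int) -> float:
    """Chance that a number starts with digit d under the logarithmic law."""
    if not 1 <= d <= 9:
        raise ValueError("first digit must be between 1 and 9")
    return math.log10((d + 1) / d)

# Table of first-digit chances for d = 1..9.
benford_table = {d: benford_probability(d) for d in range(1, 10)}

for d, p in benford_table.items():
    print(f"digit {d}: {p:.3f}")
```

The nine chances add up to 1, because the logs telescope to log10(10 / 1).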
16
Reactions to Newcomb's law
Nothing!
for 57 years!
17
Enter Frank Benford, 1938
Physicist with the General Electric Company
Assembled over 20,000 numbers and
counted their first digits!
"A study as wide as time and energy permitted."
18
Populations
Numbers from newspapers
Drainage rates of rivers
Numbers from Reader's Digest articles
Street addresses of American Men of Science
19
About 30% begin with a 1
About 5% begin with a 9
20
Anomalous numbers !!
Benford gave the law its name but no
explanation.
21
The logarithmic law applies to outlaw numbers
that are without known relationship, rather than
to those that follow an orderly course and so
the logarithmic relation is essentially a Law of
Anomalous Numbers.
22
What is the explanation?
Explanations for Benford's Law
  • Numbers from a wide range of data sources have
    about 30% 1s, down to only 5% 9s.
  • Benford called these "outlaw" or "anomalous"
    numbers. They include street addresses of
    American Men of Science, populations, areas,
    numbers from magazines and newspapers.
  • Benford's "orderly" numbers, like atomic weights
    and physical constants, don't follow the law.

23
Popular Explanations
  • Scale Invariance
  • Base Invariance
These two say that IF there is a universal law,
it must be Benford's.
They don't explain why there should be a law to
start with!
  • Complicated Measure Theory
  • Divine choice
  • Mystery of Nature

24
Complicated Measure Theory
In a nutshell: if you grab numbers from all
over the place (a random mix of distributions),
their digit frequencies ultimately converge to
Benford's Law.
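One way to see the nutshell claim in action is a small simulation. This is only a sketch under stated assumptions: the three distribution families and the six-orders-of-magnitude spread of scales are choices made for this illustration, not something taken from the talk or from the measure-theory result itself.

```python
import random
from collections import Counter

def first_digit(x: float) -> int:
    """Leading digit of a positive number, read off its scientific notation."""
    return int(f"{x:.16e}"[0])

def draw_from_random_mix(rng: random.Random) -> float:
    """Pick a distribution at random, then draw one positive value from it."""
    # Assumption: scales are spread over six orders of magnitude.
    scale = 10 ** rng.uniform(0, 6)
    kind = rng.choice(["uniform", "exponential", "lognormal"])
    if kind == "uniform":
        return scale * (1.0 - rng.random())      # value in (0, scale]
    if kind == "exponential":
        return rng.expovariate(1.0 / scale)      # mean equal to scale
    return scale * rng.lognormvariate(0.0, 1.0)

rng = random.Random(2009)
n_draws = 50_000
counts = Counter(first_digit(draw_from_random_mix(rng)) for _ in range(n_draws))
frequencies = {d: counts[d] / n_draws for d in range(1, 10)}

for d, freq in frequencies.items():
    print(f"digit {d}: {freq:.3f}")
```

With these choices the share of leading 1s should land near 30% and the share of leading 9s near 5%, but a different mix could behave differently.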
25
That's why THIS works well
26
It doesn't really explain WHAT will work well,
nor why
It doesn't explain why street addresses of
American Men of Science work well!
27
The Key Idea
If a hat is covered evenly in red and white
stripes...
Photo - Eric Pouhier http://commons.wikimedia.org/wiki/Napoleon
28
The Key Idea
If a hat is covered evenly in red and white
stripes,
it will be half red
and half white.
Photo - Eric Pouhier http://commons.wikimedia.org/wiki/Napoleon
29
A Hat
30
A Hat
31
A Hat
If the red stripes cover half the base, they'll
cover about half the hat.
The red stripes and the white stripes even out
over the shape of the hat.
32
What if the red stripes cover 30% of the base?
[Figure: hat over a base axis from 0 to 6, with red stripes from n to n + 0.3 for n = 0, 1, ..., 5]
Then they'll cover about 30% of the hat.
33
What if the red stripes cover precisely fraction
0.301 of the base?
Then they'll cover fraction 0.301 of the hat.
[Figure: hat over a base axis from 0 to 6, with red stripes from n to n + 0.301 for n = 0, 1, ..., 5]
$0.301 = \log_{10}(2/1)$
34
Think of X as a random number
We want the probability that X has first digit 1
Let the hat be a probability density curve for X
Then AREAS on the hat give PROBABILITIES for X
35
Think of X as a random number
We want the probability that X has first digit 1
Let the hat be a probability density curve for X
Then AREAS on the hat give PROBABILITIES for X
Area = 0.95 from 1 to 5
Pr(1 < X < 5) = 0.95
Total area = 1
36
In the same way...
[Figure: hat over a base axis from 0 to 6, with red stripes from n to n + 0.301 for n = 0, 1, ..., 5]
If the red stripes somehow represent the X values
with first digit 1, and the red stripes have
area 0.301, then Pr(X has first digit 1) = 0.301.
37
So X values with first digit 1 somehow lie on a
set of evenly spaced stripes?
Write X in Scientific Notation
38
So X values with first digit 1 somehow lie on a
set of evenly spaced stripes?
Write X in Scientific Notation: $X = r \times 10^n$
r is between 1 and 10
n is an integer
39
For example
r is between 1 and 10
n is an integer
40
For example
For the first digit of X, only r matters!
41
For example
r ≥ 2: first digit is not 1
1 ≤ r < 2: first digit is 1
For the first digit of X, only r matters!
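A minimal sketch of the scientific-notation step: split a positive X into r (between 1 and 10) and an integer n with X = r × 10^n, then read the first digit from r alone. The function names are made up for this illustration.

```python
import math
from typing import Tuple

def scientific_notation(x: float) -> Tuple[float, int]:
    """Split a positive x into (r, n) with x = r * 10**n and 1 <= r < 10."""
    if x <= 0:
        raise ValueError("x must be positive")
    n = math.floor(math.log10(x))
    r = x / 10 ** n
    # Guard against floating-point rounding at the edge of a decade.
    if r < 1:
        r, n = r * 10, n - 1
    elif r >= 10:
        r, n = r / 10, n + 1
    return r, n

def first_digit(x: float) -> int:
    """Only r matters for the first digit; n is ignored."""
    r, _ = scientific_notation(x)
    return int(r)

for x in [4321.0, 0.052, 150, 200]:
    r, n = scientific_notation(x)
    print(f"{x} = {r:.4g} x 10^{n}, first digit {first_digit(x)}")
```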
42
Take logs to base 10: $\log_{10}(X) = \log_{10}(r \times 10^n)$
Or in other words: $\log_{10}(X) = \log_{10}(r) + n$
43
r is between 1 and 10
n is an integer
46
n is an integer
X has first digit 1 precisely when log(X)
is between n and n + 0.301 for any integer n
n = 0: X from 1 to 2
n = 1: X from 10 to 20
n = 2: X from 100 to 200
47
n is an integer
X has first digit 1 precisely when log(X)
is between n and n + 0.301 for any integer n
STRIPES!!
n = 0
n = 1
n = 2
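The stripe rule can be checked directly: take log10(X), subtract the integer n just below it, and see whether what is left is under 0.301. A minimal sketch; the name `on_red_stripe` is made up for this illustration, and the test values avoid exact decade boundaries such as 2 or 20.

```python
import math

STRIPE_WIDTH = math.log10(2)  # about 0.301

def on_red_stripe(x: float) -> bool:
    """True when log10(x) lies between n and n + 0.301 for some integer n."""
    log_x = math.log10(x)
    n = math.floor(log_x)
    return log_x - n < STRIPE_WIDTH

for x in [1.5, 15, 150, 1999, 0.013, 2.5, 99]:
    print(x, on_red_stripe(x))
```

The values with first digit 1 come out on a stripe; 2.5 and 99 do not.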
48
The hat is the probability density curve for
log(X)
[Figure: hat over a base axis from 0 to 6, with red stripes from n to n + 0.301 for n = 0, 1, ..., 5]
X values with first digit 1 satisfy n < log(X) < n + 0.301 for
n = 0
n = 1
n = 2
and so on!
49
The hat is the probability density curve for
log(X)
[Figure: hat over a base axis from 0 to 6, with red stripes from n to n + 0.301 for n = 0, 1, ..., 5]
X values with first digit 1 satisfy n < log(X) < n + 0.301:
n = 0: X from 1 to 2
n = 1: X from 10 to 20
n = 2: X from 100 to 200
50
[Figure: hat over a base axis from 0 to 6, with red stripes from n to n + 0.301 for n = 0, 1, ..., 5]
So X values with first digit 1 DO lie on evenly
spaced stripes, on the log scale!
The PROBABILITY of getting first digit 1 is the
AREA of the red stripes, approx the fraction on
the base, 0.301.
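To put a number on "area of the red stripes", here is a sketch that assumes the hat is a normal curve for log10(X), centred at 3 with standard deviation 1. That shape is an assumption made for this illustration only; the talk does not specify one.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def red_stripe_area(mu: float, sigma: float,
                    stripe_width: float = math.log10(2)) -> float:
    """Area of a normal hat for log10(X) lying on stripes [n, n + stripe_width]."""
    # Stripes more than 10 standard deviations from the centre contribute nothing visible.
    lowest = math.floor(mu - 10 * sigma) - 1
    highest = math.ceil(mu + 10 * sigma) + 1
    total = 0.0
    for n in range(lowest, highest + 1):
        upper = normal_cdf((n + stripe_width - mu) / sigma)
        lower = normal_cdf((n - mu) / sigma)
        total += upper - lower
    return total

area = red_stripe_area(mu=3.0, sigma=1.0)
print(f"area of red stripes: {area:.4f}")
print(f"fraction on the base: {math.log10(2):.4f}")
```

For a hat this wide, the two printed numbers should agree closely.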
51
We've done it!
We've shown that we really should expect the
first digit to be 1 about 30% of the time!
52
Intuitively
[Figure: hat over a base axis from 0 to 6, with red stripes from n to n + 0.301 for n = 0, 1, ..., 5]
The log scale distorts: small numbers (e.g. 100)
are stretched out; larger numbers (e.g. 900) are
bunched up. The first digit corresponds to
regularly spaced stripes on the log scale.
So the smallest numbers (first digit 1) are
stretched out, and get the highest probability!
53
When is this going to work?
[Figure: hat over a base axis from 0 to 6, with red stripes from n to n + 0.301 for n = 0, 1, ..., 5]
The distribution of X needs to be WIDE on the
log scale!
We need a lot of stripes to balance out big ones
and little ones! We get one stripe every
integer, so we need a lot of integers!
54
When is this going to work?
[Figure: hat over a base axis from 0 to 6, with red stripes from n to n + 0.301 for n = 0, 1, ..., 5]
X ranges from 0 to 6 on the log scale, so it
ranges from 1 to $10^6$ on the usual scale!
1 .. 2 .. Miss a few ... 999,999 .. 1,000,000
55
These are Benford's Outlaw Numbers!
[Figure: hat over a base axis from 0 to 6, with red stripes from n to n + 0.301 for n = 0, 1, ..., 5]
  • All we need is a distribution that is
  • WIDE (4-6 orders of magnitude or more)
  • Reasonably SMOOTH
  • Then the red stripes will even out to cover about
    30% of the total area.
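A quick way to see why WIDE matters is to simulate hats of different widths and count leading 1s. This sketch assumes a normal hat for log10(X) centred at 3.5; the centre, the widths tried, and the function names are all choices made for this illustration.

```python
import random

def first_digit(x: float) -> int:
    """Leading digit of a positive number, read off its scientific notation."""
    return int(f"{x:.16e}"[0])

def share_of_leading_ones(sigma: float, rng: random.Random,
                          n_draws: int = 20_000, mu: float = 3.5) -> float:
    """Fraction of simulated X = 10**N(mu, sigma) values whose first digit is 1."""
    hits = sum(first_digit(10 ** rng.gauss(mu, sigma)) == 1 for _ in range(n_draws))
    return hits / n_draws

rng = random.Random(1)
# sigma is the spread of the hat on the log scale, in orders of magnitude.
shares = {sigma: share_of_leading_ones(sigma, rng) for sigma in (0.05, 0.2, 1.0, 2.0)}

for sigma, share in shares.items():
    print(f"spread {sigma}: {share:.3f} of values start with 1")
```

The narrow hats sit almost entirely between stripes, so they give very few leading 1s; the wide hats settle near 30%.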

56
In Real Life
First digits: very good fit to Benford!
World populations: from 50 for the Pitcairn
Islands to $1.3 \times 10^9$ for China
Wide (9 integers => 9 stripes)
57
In Real Life
World populations: from 50 for the Pitcairn
Islands to $1.3 \times 10^9$ for China
58
Electorate populations? From 583,000 to 773,000
in California
The hat has less than one stripe! Benford
doesn't work here.
Of course not! All the first digits are 5, 6, or 7.
59
But naturally occurring populations are a
different story!
Cities in California: from 94 in the city of
Vernon to 3.9 million in Los Angeles
Yes! It's Benford!
Wide enough (5 integers => 5 stripes)
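The stripe counts for the two California examples can be worked out from the quoted smallest and largest values alone. A minimal sketch; `count_stripes` is a name made up for this illustration, and it only counts stripes touched by the range, saying nothing about how smooth the data are.

```python
import math

STRIPE_WIDTH = math.log10(2)  # about 0.301

def count_stripes(smallest: float, largest: float) -> int:
    """Number of first-digit-1 stripes [n, n + 0.301) touched by the log10 range."""
    low, high = math.log10(smallest), math.log10(largest)
    count = 0
    for n in range(math.floor(low), math.floor(high) + 1):
        # The stripe starting at n overlaps [low, high] if it ends after low
        # and starts no later than high.
        if n + STRIPE_WIDTH > low and n <= high:
            count += 1
    return count

ranges = {
    "California electorates": (583_000, 773_000),
    "California cities": (94, 3.9e6),
}

for name, (smallest, largest) in ranges.items():
    width = math.log10(largest / smallest)
    stripes = count_stripes(smallest, largest)
    print(f"{name}: {width:.2f} orders of magnitude, {stripes} stripes")
```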
60
Powerball jackpots? From 10 million to 365
million
Not bad!
Orders of magnitude: only 1.5, but sometimes
you just hit lucky!
Data with kind permission from www.lottostrategies.com
61
Your tax return...?
???
If you plan to fake data, you should first check
whether it ought to be Benford! BUT the IRD has
a few other tricks up its sleeve too.
62
Thanks for listening!
  • To find out more:
  • A Simple Explanation of Benford's Law
  • by R. M. Fewster
  • The American Statistician, to appear.
  • PDF from
  • www.stat.auckland.ac.nz/fewster/benford.html
  • Judy Paterson's CMCT course, Term 1 2009, Centre
    for Mathematical Content in Teaching