How to Lie with Statistics - PowerPoint PPT Presentation

About This Presentation
Title:

How to Lie with Statistics

Description:

72% of all crow nests in a particular forest are in pine trees. Therefore, crows prefer to nest in pine trees. But, 95% of all trees in the forest are pine! ... – PowerPoint PPT presentation

Slides: 42
Provided by: lindat2
Tags: crows | lie | statistics


Transcript and Presenter's Notes

Title: How to Lie with Statistics


1
How to Lie with Statistics
  • as in the book by Darrell Huff

2
Types of Lies
  • Intentional deceit
  • Selective data use
  • Extrapolation
  • Creative graphics
  • Faulty assumptions
  • Incompetence

3
Mmmm
  • "Many of the truths we hold onto depend on our
    point of view." (Ben Kenobi)

4
Look at this graph
5
Compared to this one
6
Or this one
7
Some graphs use pictures
8
Both the height and the weight were doubled from Joe
to Ann
9
The Gee-Whiz Graph
  • Attractive figures must be true
  • Axes do not need labels or units
  • Scale is intent dependent
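
The effect of an "intent dependent" scale can be put in numbers. The following is a minimal sketch, not something from the slides: the two values (100 and 110), the axis baselines, and the helper name `apparent_ratio` are all hypothetical, chosen only to show how moving the start of the axis changes how tall the bars are drawn.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar heights when the value axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must lie below both values")
    return (b - baseline) / (a - baseline)

# Hypothetical data: a real increase of 10%.
before, after = 100.0, 110.0

honest = apparent_ratio(before, after, baseline=0.0)     # axis starts at zero
gee_whiz = apparent_ratio(before, after, baseline=95.0)  # axis starts at 95

print(f"axis from 0:  second bar drawn {honest:.2f}x as tall")
print(f"axis from 95: second bar drawn {gee_whiz:.2f}x as tall")
```

With these made-up numbers the same 10% increase is drawn as a bar three times as tall once the axis starts at 95.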

10
This looks like the fertilizer does an OK job.
11
This looks even better
12
How to mislead through visual effects
  • Direct labels - changing the title
  • Encoding - using color coding to associate values
    to numbers.
  • Self-representing scales - portraying commonly
    known objects next to the object being discussed.

13
(No Transcript)
14
(No Transcript)
15
(No Transcript)
16
(No Transcript)
17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
(No Transcript)
21
Sample with the Built-in Bias
  • Practically all statistics are based on a sample
    of a population. So...
  • how was the sample chosen?
  • how big is the sample?
  • what population does it claim to represent?
  • what population does it actually represent?

22
Size of Sample
  • Flip a coin 5 times
  • Heads four times
  • 80% heads
  • Flip 100 times
  • Same results???
  • In general, the larger the sample size, the
    better the estimate.
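
How unlikely "the same results" become with a larger sample can be checked exactly. This is a small sketch assuming a fair coin (p = 0.5); the helper name `prob_at_least` is made up for illustration.

```python
from math import comb

def prob_at_least(n, k, p=0.5):
    """Exact binomial probability of k or more heads in n flips."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_small = prob_at_least(5, 4)     # 4 or more heads in 5 flips
p_large = prob_at_least(100, 80)  # 80 or more heads in 100 flips

print(f"P(>= 4 heads in 5 flips)    = {p_small:.4f}")
print(f"P(>= 80 heads in 100 flips) = {p_large:.2e}")
```

Under that assumption, 80% heads in 5 flips happens by chance nearly one time in five, while 80% heads in 100 flips is far rarer than one in a million.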

23
FDR poll
  • 1930s
  • Polls were taken during a U.S. presidential
    campaign of Franklin D. Roosevelt (FDR).
  • Only people with telephones were polled.
  • The pollsters predicted that another candidate
    would win, but FDR won the actual election.
  • The poll did NOT accurately reflect all of the
    voters because the opinions of only one part of
    the population (wealthy people with telephones)
    were taken into account.
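
The mechanism can be sketched with a two-group population. Every number below is hypothetical and chosen only for illustration; none of it is historical polling data.

```python
# All numbers below are hypothetical, chosen only to illustrate the mechanism.
population = {
    # group: (share of all voters, share of that group supporting candidate A)
    "has telephone": (0.25, 0.40),
    "no telephone": (0.75, 0.65),
}

# True support: weight each group's support by its share of the electorate.
true_support = sum(share * support for share, support in population.values())

# A telephone-only poll sees just one group, however many people it calls.
poll_support = population["has telephone"][1]

print(f"true support for A:      {true_support:.1%}")
print(f"telephone-only estimate: {poll_support:.1%}")
```

In this made-up electorate the telephone-only poll predicts a loss for candidate A even though A has majority support, and calling more telephone owners would not fix it.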

24
Random?
  • Truly random samples are practically impossible
    but almost everyone claims they've done it
  • Quadrats were randomly placed in a forest -
    except where there was a thicket of multiflora
    rose and greenbrier
  • Water quality was determined at random locations
    in a stream - but somehow the locations always
    cluster around access roads

25
What population does the sample represent?
  • No population is uniform - it is composed of many
    distinct and interacting subsets
  • are you crossing subset boundaries?
  • mixing age groups, income levels, soil types
  • are you aware of the subsets?
  • how finely are the subsets divided?
  • is each observation independent?
  • inflating sample size

26
Best ways to Lie with Sampling
  • Ignore possible biases in your sampling method -
    many are too difficult to detect anyway.
  • Claim everything has been done randomly - it is
    expected of you
  • Express more confidence in your sampling method
    than it merits
  • Do not elaborate

27
The Well Chosen Average
  • Remember the three measures of central tendency
  • Mean
  • Median
  • Mode
  • "Average" could mean any of them

28
Pick your favorite average
  • Mean: 37,000
  • Median: 12,000
  • Mode: 9,000
  • Each is a legitimate average but can serve
    conflicting purposes
  • Incomes:
  • 9,000
  • 9,000
  • 9,000
  • 12,000
  • 120,000
  • 85,000
  • 15,000
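
The three averages on this slide can be reproduced from the listed incomes. This is a short check using Python's standard `statistics` module; only the variable name `incomes` is introduced here.

```python
from statistics import mean, median, mode

# The seven incomes listed on the slide.
incomes = [9000, 9000, 9000, 12000, 120000, 85000, 15000]

print("mean:  ", mean(incomes))    # pulled upward by the two large incomes
print("median:", median(incomes))  # the middle value once sorted
print("mode:  ", mode(incomes))    # the most frequent value
```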

29
Standard Deviation versus Standard Error
  • Standard Deviation describes variability around
    the mean
  • Standard Error assesses the precision of the
    estimate of the population mean
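
The two quantities are related by SE = SD / sqrt(n). The sketch below uses hypothetical measurements (not from the slides) to show the difference in size.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical measurements, not taken from the slides.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)

sd = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
se = sd / sqrt(n)   # standard error of the mean

print(f"mean = {mean(sample)}, n = {n}")
print(f"standard deviation = {sd:.3f}")
print(f"standard error     = {se:.3f}")
```

Error bars drawn from `se` are shorter than bars drawn from `sd` by a factor of sqrt(n), although both describe the same data.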

30
How to Lie with Averages
  • Use the standard error. With more than one
    observation it is always smaller than the
    standard deviation and thus looks better.
  • People look at the error bars but rarely at what
    the bars represent
  • State an average without explaining what it
    measures.
  • The average person will think it is the mean

31
The Little Figures that are Not There
  • Confusing graphics
  • Meaningless and misleading averages
  • lose an average of 30 pounds
  • Proportions or ratios stated without an
    explanation of what produced them
  • 3 out of 4 dentists prefer Brand X toothpaste

32
What Average?
  • Person Money
  • John 2
  • Ann 3
  • Bob 1
  • Mary 10
  • Sue 5
  • Carol 2
  • Ken 999
  • Mean 146
  • Median 3
  • Mode 2

33
Misleading Graphics
  • Graphics are aesthetically appealing
  • Graphics convey a lot of information with minimum
    effort
  • Graphics are nice and vague

34
(No Transcript)
35
Examples of Irrelevant Conclusions
  • 72% of all crow nests in a particular forest are
    in pine trees
  • Therefore, crows prefer to nest in pine trees.
  • But, 95% of all trees in the forest are pine!
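
One way to see why the conclusion does not follow is to compare use with availability. The ratio below is an assumption of this sketch rather than a method named on the slide, and `selection_ratio` is a made-up helper name; the two percentages are the slide's own.

```python
# Figures from the slide: share of nests and share of trees that are pine.
nests_in_pine = 0.72
trees_that_are_pine = 0.95

def selection_ratio(used, available):
    """Share of use divided by share of availability; 1.0 means no preference."""
    return used / available

pine = selection_ratio(nests_in_pine, trees_that_are_pine)
other = selection_ratio(1 - nests_in_pine, 1 - trees_that_are_pine)

print(f"pine trees:  {pine:.2f}")
print(f"other trees: {other:.2f}")
```

A ratio below 1 for pine and well above 1 for other trees suggests that, if anything, the crows nest in pines less often than the supply of pines alone would predict.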

36
Failure to apply in general..
  • A bird survey in a woodlot detected a healthy
    songbird population
  • But at the next survey, hardly a bird was heard.
  • What happened to the birds?

37
Confuse cause and effect
  • Does high income cause ownership of stocks or
    does ownership of stocks cause high income?
  • Mistake correlation for causation

38
Misleading Numbers
  • Play with significant digits. 1.28 appears more
    accurate than 1.0.
  • Answer a question with a statistic that is
    slightly irrelevant.
  • Do students get a quality education at school X?
    Yes, the average GPA is 3.2
  • But what if a student has a class rank of 70
    with a GPA of 3.8?

39
Other mis-uses
  • Present a result without a significance value.
  • Use untestable assumptions.
  • Use precision and accuracy interchangeably.
  • Perform nonsensical tests that sound good.
  • "No significant difference was found between
    field A and field B." At what scale?

40
Remember that...
  • a statistic is only worthwhile when it
    satisfies the assumptions of the model/test.
    Knowing whether the assumptions are met depends
    on the competence of the person running the
    stats. Violations are often difficult to catch
    in the review process.

41
Thanks to
  • http://faculty.washington.edu/chudler/stat3.html
    Eric Chudler
  • http://atlantic.evsc.virginia.edu/jhp7e/EVSC503/slides/stats_lie02/index.htm
    David Bowne
  • http://sweeneyhall.sjsu.edu/edit272/lie/sld001.htm
    Dr. Robertta H. Barba, San José State University