Chapter 3: Producing Data

Slides: 47

Provided by: hint9

1
Chapter 3 Producing Data
2
1. Inferential Statistics

Population: The population is the group of
people or things that we're interested in. This
group is defined by whatever question is being
asked.
Example 1 Do Texas A&M students have breakfast
regularly?
How many populations are of interest?
One
What is the population of interest?
All current Texas A&M students

3
1. Inferential Statistics

Example 2 Is the IQ of women the same as the IQ
of men?
How many populations are of interest?
Two
What are the populations of interest?
All women and all men
Example 3 Which is more effective at lowering
the heart rate of mice, no drug (control), drug
A, drug B, or drug C?
How many populations are of interest?
Four
What are the populations of interest?
All mice taking no drug, all mice taking drug A,
all mice taking drug B, all mice taking drug C

4
1. Inferential Statistics

Suppose we have no previous information about
these questions. How could we answer them?
Census: Collect data from every member of the
population.
Advantages
We get everyone, so we know the truth.
Disadvantages
Expensive, difficult to obtain, and may be
impossible.
Sample: A subset of the population selected for
the study
Advantages
Takes less time and money. Feasible.
Disadvantages
Uncertainty about the truth. We may have error.

5
1. Inferential Statistics

Example 1 Do Texas A&M students have breakfast
regularly?
Population: all current Texas A&M students
Sample 1: all STAT 30X students
Sample 2: students who come to the Blocker building
on Monday morning
Sample 3: randomly choose names from the phone
book and ask them
One population can be sampled in many different ways.

6
1. Inferential Statistics

General Idea of Inferential Statistics
1. Take a sample from the whole population.
2. Summarize the sample using important
statistics.
3. Use those summaries to make inferences about
the whole population.
4. Recognize that there may be some error involved
in making inferences.

7
1. Inferential Statistics

Example
Question Can Aspirin reduce the risk of heart
attack?
1. Take a sample: 22,071 male physicians between
the ages of 40 and 84 were randomly assigned to
one of two groups. One group took an ordinary
aspirin tablet every other day. The other group,
the control group, took a placebo every other day.
2. Summary statistic: The rate of heart attacks
in the group taking aspirin was only 55% of the
rate of heart attacks in the placebo group.
3. Inference to the population: Taking aspirin
causes a lower rate of heart attacks in men like
those studied.

8
2. Sampling a Single Population

Basics for sampling
Sampling should not be biased: a sampling method
is biased if it systematically favors some parts
of the population, for example if some part of the
population can't get in.
Example 1 Only select STAT 30X students --- biased
The selection of one individual in the population
should not affect the selection of the next
individual (independence).
Example 1 Survey one student, then ask him to
introduce his friend to you --- dependent
The sample should be large enough to adequately
cover the population. A good sample is one that
is collected in such a way that it is
representative of the population.
Example 1 Only ask 3 students --- sample size
too small

9
2. Sampling a Single Population

Sampling Techniques
Simple Random Sample (SRS): every set of n
individuals in the population has an equal chance
of being the sample selected (so, in particular,
every member has an equal chance of being selected).

10
2. Sampling a Single Population

Sampling Techniques
Simple Random Sample (SRS)
Assign every individual a number and randomly
select n numbers using a random number table (or
computer generated random numbers).
Table B at the back of the book is a table of
random digits.
Example 1 Obtain a list of all TAMU students and
assign every student a number. Using a random
number table, select 50 of them.
Example 2 Obtain a list of the SSNs of all
individuals in the U.S. who are over 65. Using a
random number table, select 50 of them.
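The labeling procedure above can be sketched in Python. This is a minimal illustration, not the textbook's method: the function name `simple_random_sample` and the roster of 40,000 made-up student IDs are hypothetical, and Python's `random.sample` stands in for the random number table.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw an SRS of size n: every set of n individuals is equally likely."""
    rng = random.Random(seed)
    # Label individuals 0, 1, ..., N-1, then choose n distinct labels at random,
    # which is what reading a random digit table does by hand.
    labels = rng.sample(range(len(population)), n)
    return [population[i] for i in labels]

# Hypothetical roster standing in for "a list of all TAMU students".
roster = [f"student_{k:05d}" for k in range(40000)]
sample = simple_random_sample(roster, 50, seed=1)
print(len(sample), len(set(sample)))  # 50 50
```

Fixing the seed makes the draw reproducible, which is handy when checking work; leaving it as `None` gives a fresh sample each run.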

11
2. Sampling a Single Population

Sampling Techniques SRS Exercise
Choose an SRS of three names from the following
employees of a small company:
1 Bechhofer 2 Brown 3 Ito 4 Kesten
5 Kiefer 6 Spitzer 7 Taylor 8 Wald
9 Weiss
Use the numerical labels attached to the
names above and the following list of random
digits. Read the list of random digits from left
to right, starting at the beginning of the
list.
11793 20495 05907 11384 44982 20751 27498
12009 45287
The simple random sample is
a) 1(1)79
b) Bechhofer, then Bechhofer again, then
Taylor
c) Bechhofer, Taylor, Weiss
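The table-reading rule in this exercise can be checked with a short sketch. It assumes the labels are 1 through 9 in the order the names are listed (Bechhofer = 1, ..., Weiss = 9); the helper `srs_from_digits` is hypothetical. Digits that are not a label (such as 0) and labels already chosen are skipped.

```python
def srs_from_digits(digits, labels, k):
    """Read single digits left to right, skipping unused labels and repeats."""
    chosen = []
    for ch in digits:
        if not ch.isdigit():
            continue  # ignore the spaces between groups of five digits
        label = int(ch)
        if label in labels and labels[label] not in chosen:
            chosen.append(labels[label])
            if len(chosen) == k:
                break
    return chosen

# Assumed labels: 1-9 in the order the names appear in the exercise.
names = ["Bechhofer", "Brown", "Ito", "Kesten", "Kiefer",
         "Spitzer", "Taylor", "Wald", "Weiss"]
labels = {i + 1: name for i, name in enumerate(names)}
digits = "11793 20495 05907 11384 44982 20751 27498 12009 45287"
print(srs_from_digits(digits, labels, 3))
# ['Bechhofer', 'Taylor', 'Weiss']
```

The second 1 is skipped because Bechhofer is already in the sample, which is why an SRS never contains the same person twice.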

12
2. Sampling a Single Population

Sampling Techniques
Stratified Random Sample: Divide the population
into several strata (groups of similar
individuals). Then take an SRS from each stratum.

13
2. Sampling a Single Population

Sampling Techniques
Stratified Random Sample
Example 1 Obtain a list of all TAMU students and
divide them into colleges. Then randomly sample
10 from each college.
Example 2 Obtain a list of the SSNs of all
individuals in the U.S. who are over 65. Divide
the SSNs into regions of the country (time
zones). Then randomly sample 30 from each time
zone.
Advantage: Each stratum is guaranteed to be
represented in the sample.
Disadvantage: It is no longer a simple random
sample.
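A stratified random sample is just a separate SRS inside each stratum. In this sketch the helper `stratified_sample`, the four college names, and the 2,000 student records are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Group units by stratum, then take a separate SRS inside every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical students, each tagged with a college.
colleges = ["Engineering", "Science", "Business", "Liberal Arts"]
students = [(f"s{k}", colleges[k % 4]) for k in range(2000)]
sample = stratified_sample(students, lambda s: s[1], 10, seed=2)
for college, chosen in sample.items():
    print(college, len(chosen))  # each college contributes exactly 10
```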

14
2. Sampling a Single Population

Sampling Techniques
Cluster Sample: Divide the population into
several clusters. Then take an SRS of clusters
and include every individual in each chosen
cluster.

15
2. Sampling a Single Population

Sampling Techniques
Cluster Sample
Advantage: May be the only feasible method, given
resources.
Example Obtain a list of the SSNs of all
individuals in the U.S. who are over 65. Sort
the SSNs by their last 4 digits, making each block
of 100 endings (for example, 4100 through 4199) a
cluster. Use a random number table to pick the
clusters. You may get the 4100s, 5600s, and
8200s, for example.
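The key difference from stratified sampling is that randomness picks whole clusters, and everyone in a chosen cluster is kept. This sketch uses hypothetical ID numbers (0 to 49,999) in place of SSNs and groups them by the block of 100 their last four digits fall in; `cluster_sample` and `block_of_100` are illustrative names.

```python
import random
from collections import defaultdict

def cluster_sample(population, cluster_of, n_clusters, seed=None):
    """Take an SRS of clusters and keep every unit in each chosen cluster."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for unit in population:
        clusters[cluster_of(unit)].append(unit)
    # Only the clusters are chosen at random; there is no sampling inside them.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: clusters[c] for c in chosen}

def block_of_100(id_number):
    """Cluster label from the last 4 digits: 4100-4199 -> 41 (the '4100s')."""
    return (id_number % 10_000) // 100

# Hypothetical ID numbers standing in for the SSN list.
ids = range(50_000)
sample = cluster_sample(ids, block_of_100, 3, seed=8)
for c, members in sorted(sample.items()):
    print(f"the {c * 100:04d}s: {len(members)} people")
```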

16
2. Sampling a Single Population

Sampling Techniques
Multi-Stage Sample: Divide the population into
several strata. Then randomly choose a subset of
the strata and take an SRS from each chosen
stratum.

17
2. Sampling a Single Population

Sampling Techniques
Multi-Stage Sample
Advantage: May be the only feasible method, given
resources.
Example Obtain a list of the SSNs of all
individuals in the U.S. who are over 65. Divide
the SSNs into the 50 states. Randomly select 10
states. Then randomly sample 40 from each of the
selected states.
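The state-then-person procedure can be sketched as two nested SRS steps. The 50 hypothetical "states" with 1,000 people each, and the helper `two_stage_sample`, are made up for illustration.

```python
import random

def two_stage_sample(groups, n_groups, n_per_group, seed=None):
    """Stage 1: SRS of groups. Stage 2: SRS of individuals within each chosen group."""
    rng = random.Random(seed)
    chosen_groups = rng.sample(sorted(groups), n_groups)
    return {g: rng.sample(groups[g], n_per_group) for g in chosen_groups}

# Hypothetical population: 50 "states" with 1,000 people each.
groups = {
    f"state_{k:02d}": [f"state_{k:02d}_person_{j}" for j in range(1000)]
    for k in range(50)
}
sample = two_stage_sample(groups, 10, 40, seed=3)
print(len(sample), sum(len(v) for v in sample.values()))  # 10 400
```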

18
2. Sampling a Single Population

Sampling Techniques Exercise
A small college has 500 male and 600 female
undergraduates. A simple random sample of 50 of
the male undergraduates is selected and,
separately, a simple random sample of 60 of the
female undergraduates is selected. The two
samples are combined to give an overall sample of
110 students. The overall sample is
a) a simple random sample
b) a stratified random sample
c) a cluster sample
d) none of the above

19
2. Sampling a Single Population

Sampling Problems
Voluntary response: Internet surveys, call-in
surveys
E.g. A survey about earnings where people who take
the survey get free T-shirts. Busy people won't
come, and these people often have high earnings.
So our sample mean/median may be lower than the
true mean/median.
Convenience sampling: Sampling friends, sampling
at the mall
Problem: They may have similar interests.
Dishonesty: Asking personal questions, not enough
time to respond honestly

20
2. Sampling a Single Population

Cautions about Sample Surveys
Undercoverage: Some groups in the population are
left out when the sample is taken.
Ex) A sample survey of households will miss not
only homeless people but also prison inmates and
students in dormitories.
Nonresponse: An individual chosen for the sample
can't be contacted or does not cooperate.
Ex) phone survey, mail survey

21
2. Sampling a Single Population

Cautions about Sample Surveys
Response Bias: Results that are influenced by
the behavior of the respondent or interviewer
The wording of questions can influence the
answers.
Eg) (Text p254) How do Americans feel about
government help for the poor?
Only 13% think we are spending too much on
assistance to the poor.
But 44% think we are spending too much on
welfare.
It seems that "assistance to the poor" is a nice,
hopeful phrase, while "welfare" is a negative word.
Respondents may not want to give truthful answers
to sensitive questions.

22
2. Sampling a Single Population

Ex. In order to assess the opinion of students at
the University of Minnesota on campus snow
removal, a reporter for the student newspaper
interviews the first 12 students he meets who are
willing to express their opinions. The method of
sampling used is
a) simple random sampling
b) the Gallup Poll
c) voluntary response
d) a census

23
3. Sampling More than One Population

We sample from more than one population when we
are interested in more than one variable:
typically one response variable and one
explanatory variable. The populations are defined
by the values the explanatory variable takes on.
Example 1 Comparing decibel levels of 4
different brands of speakers
What is the explanatory variable?
Brand
What is the response variable?
Decibel Level
Number of Populations?
Four

24
3. Sampling More than One Population

Example 2 Determining time to failure of 3
different types of light bulbs
What is the explanatory variable?
Type
What is the response variable?
Time to Failure
Number of Populations?
Three

25
3. Sampling More than One Population

Example 3 Comparing GRE scores for students from
5 different majors
What is the explanatory variable?
Major
What is the response variable?
GRE score
Number of Populations?
Five

26
3. Sampling More than One Population

Important Considerations
Each sample should represent its corresponding
population well.
Samples from more than one population should be
as close to each other in every respect as
possible except for the explanatory variable.
Otherwise we may have confounding variables.
Two variables are confounded if we cannot
determine which one caused the differences in the
response.

27
3. Sampling More than One Population

Important Considerations
Examples of Confounding
Suppose we compared the decibel levels of the
four different speaker brands, each with a
different measuring instrument
We wouldn't know if the differences were due to
the different brands or different instruments.
Brand and Instrument are then confounded.
Suppose we compared the time to failure of the
three different types of light bulbs, each in a
different light socket.
We wouldn't know if the differences were due to
the different types of light bulbs or different
light sockets.
Type and Socket are then confounded.

28
3. Sampling More than One Population

Important Considerations
Examples of Confounding
Suppose we obtained GRE scores for each major,
each from a different university.
We wouldn't know if the differences were due to
the different majors or different universities.
Major and University are then confounded.
Good sampling techniques help avoid
confounding.

29
3. Sampling More than One Population

Important Considerations
It is also possible that more than one (possibly
several) explanatory variable can influence a
given response variable.
Example
Perhaps both the type of light bulb and the type
of light socket influence the time to failure of
a light bulb.
It is likely that different types of light bulbs
work better in different sockets.
This concept is known as interaction.
Interaction: The effect of one variable on the
response differs across the levels of another
variable.
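Interaction can be checked by comparing the effect of one variable at each level of the other. The sketch below uses made-up mean lifetimes (in hours) for two hypothetical bulb types A and B in two hypothetical sockets S1 and S2; these are not data from any study, and `bulb_effect` is an illustrative name.

```python
def bulb_effect(means, socket, bulb_1="A", bulb_2="B"):
    """Difference in mean time to failure between two bulb types in one socket."""
    return means[(bulb_1, socket)] - means[(bulb_2, socket)]

# Hypothetical population mean lifetimes (hours), made up for illustration only.
means = {
    ("A", "S1"): 1000, ("B", "S1"): 800,
    ("A", "S2"): 700,  ("B", "S2"): 900,
}

effect_s1 = bulb_effect(means, "S1")  # bulb A lasts longer in socket S1
effect_s2 = bulb_effect(means, "S2")  # bulb B lasts longer in socket S2
interaction = effect_s1 - effect_s2   # nonzero: the bulb effect depends on the socket
print(effect_s1, effect_s2, interaction)  # 200 -200 400
```

With no interaction, the bulb effect would be the same in both sockets and the difference would be 0. With real sample data the difference will rarely be exactly 0, so judging whether it is meaningful takes the inference tools from later chapters.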

30
3. Sampling More than One Population

Good Sampling Techniques
Randomized Experiment
Observational Studies

31
3. Sampling More than One Population

Randomized Experiment
The key to a randomized experiment: the treatment
(explanatory variable) is randomly assigned to
the experimental units or subjects.

[Diagram: Random Assignment, then Compare]

32
3. Sampling More than One Population

Randomized Experiment
Example Suppose we wish to do a study on the
effect of aspirin on mice, comparing heart rates.
We obtain a random sample of 100 mice.
We randomly assign 50 of these mice to receive a
placebo and the remaining 50 to receive aspirin.
After 20 days of administering the placebo and
aspirin, we measure the heart rates and obtain
summary statistics for comparison.
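Random assignment can be done by shuffling the subjects and splitting the shuffled list into equal groups. This sketch assumes the 100 mice are identified by hypothetical labels and that the number of subjects divides evenly among the groups; `randomly_assign` is an illustrative name.

```python
import random

def randomly_assign(subjects, group_names, seed=None):
    """Shuffle the subjects, then split them into equal-sized treatment groups."""
    if len(subjects) % len(group_names) != 0:
        raise ValueError("subjects must divide evenly among the groups")
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(group_names)
    return {name: shuffled[i * size:(i + 1) * size]
            for i, name in enumerate(group_names)}

mice = [f"mouse_{k:03d}" for k in range(100)]
groups = randomly_assign(mice, ["placebo", "aspirin"], seed=4)
print({g: len(m) for g, m in groups.items()})  # {'placebo': 50, 'aspirin': 50}
```

Because every ordering of the shuffled list is equally likely, each mouse is equally likely to land in either group, so no other characteristic of the mice should systematically differ between the groups.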

33
3. Sampling More than One Population

Randomized Experiment
The single greatest advantage of a randomized
experiment is that we can infer causation.
Through random assignment to groups, all other
factors tend to be balanced across the groups,
which guards against confounding variables.
We cannot always use a randomized experiment:
it is often impossible or unethical, particularly
with humans.

34
3. Sampling More than One Population

Observational Study
We are forced to select samples from different
pre-existing populations.

[Diagram: Simple Random Sample from each population, then Compare]

35
3. Sampling More than One Population

Observational Study
Example 1 Suppose we are interested in comparing
GRE scores for students in five different majors
We cannot do a randomized experiment because we
cannot randomly assign individuals to a specific
major.
Thus, we observe students from 5 different
pre-existing populations: the five majors.
We obtain a random sample of size 15 from each of
the five majors.
We calculate statistics and compare the 5 groups.
Can we say being in a specific major causes
someone to get a higher GRE score?

36
3. Sampling More than One Population

Observational Study
Example 2 Suppose we are interested in finding out
which age group talks the most on the telephone:
0-10 years, 10-20 years, 20-30 years, or 30-40
years.
We cannot do a randomized experiment because we
cannot randomly assign individuals to an age
group.
Thus, we observe (through polling or wire
tapping) individuals from 4 different
pre-existing populations: the four age groups.
We obtain a random sample of size 25 from each of
the four age groups.
We calculate statistics and compare the 4 groups.
Can we say being in a specific age group causes
someone to talk more on the telephone?

37
3. Sampling More than One Population

Observational Study
Advantage: The data are much easier to obtain.
Disadvantages
We cannot say the explanatory variable caused the
response.
There may be lurking or confounding variables.
Observational studies are better suited to
describing the past than to predicting the future.

38
4. Inference Overview

Recall that inference is using statistics from a
sample to talk about a population.
We need some background in how we talk about
populations and how we talk about samples.

39
4. Inference Overview

Describing a Population
It is common practice to use Greek letters when
talking about a population.
We call the mean of a population $\mu$.
We call the standard deviation of a population
$\sigma$ and the variance $\sigma^2$.
When we are talking about percentages, we call
the population proportion $\pi$.
It is important to know that for a given
population there is only one true mean, one true
standard deviation and variance, and one true
proportion.
These values have a special name:
parameters.

40
4. Inference Overview

Describing a Sample
It is common practice to use Roman letters when
talking about a sample.
We call the mean of a sample $\bar{x}$.
We call the standard deviation of a sample $s$ and
the variance $s^2$.
When we are talking about percentages, we call
the sample proportion $p$.
There are many different possible samples that
could be taken from a given population. For each
sample there may be a different mean, standard
deviation, variance, or proportion.
These values have a special name:
statistics.
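The sample statistics above can be computed with Python's standard `statistics` module. The five heart rates (beats per minute) are hypothetical values chosen to make the arithmetic easy to check, and the proportion is taken, as an assumption for illustration, to be the share of mice above 600 bpm.

```python
import statistics

# Hypothetical heart rates (beats per minute) for a sample of 5 mice.
rates = [600, 620, 580, 640, 610]

x_bar = statistics.mean(rates)     # sample mean
s2 = statistics.variance(rates)    # sample variance (divides by n - 1)
s = statistics.stdev(rates)        # sample standard deviation
p = sum(r > 600 for r in rates) / len(rates)  # sample proportion above 600 bpm

print(f"x_bar = {x_bar:.1f}, s^2 = {s2:.1f}, s = {s:.2f}, p = {p:.2f}")
# x_bar = 610.0, s^2 = 500.0, s = 22.36, p = 0.60
```

A different sample of mice would give different values for all four numbers; that is exactly what makes them statistics rather than parameters.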

41
4. Inference Overview

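We use sample statistics to make inferences about
population parameters.

$\mu$ (population mean) is estimated by $\bar{x}$ (sample mean)
$\sigma$ (population standard deviation) is estimated by $s$ (sample standard deviation)
$\pi$ (population proportion) is estimated by $p$ (sample proportion)
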
42
4. Inference Overview

Sampling Variability
There are many different samples that you can
take from the population.
Statistics can be computed on each sample.
Since different members of the population are in
each sample, the value of a statistic varies from
sample to sample.

43
4. Inference Overview

The sampling distribution of a statistic is the
distribution of values taken by the statistic in
all possible samples of the same size from the
same population.
We can then examine the shape, center, and spread
of the sampling distribution.
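In practice the sampling distribution can be approximated by simulation: draw many samples of the same size and look at the distribution of the statistic. This sketch assumes a hypothetical population of the values 1 to 100 (so the population mean is 50.5) and uses 5,000 random samples of size 25 rather than literally all possible samples; `simulate_sample_means` is an illustrative name.

```python
import random
import statistics
from collections import Counter

def simulate_sample_means(population, n, reps, seed=None):
    """Approximate the sampling distribution of x-bar with `reps` SRSs of size n."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

population = list(range(1, 101))   # hypothetical population; mean 50.5
means = simulate_sample_means(population, n=25, reps=5000, seed=5)

# Shape: a rough text histogram with bins of width 5.
bins = Counter(int(m // 5) * 5 for m in means)
for left in sorted(bins):
    print(f"{left:3d}-{left + 5:3d} {'*' * (bins[left] // 50)}")

# Center and spread of the simulated sampling distribution.
print("center:", round(statistics.fmean(means), 2))
print("spread:", round(statistics.stdev(means), 2))
```

The center should land close to 50.5, and the histogram should look roughly symmetric and mound-shaped even though the population itself is flat.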

44
4. Inference Overview Bias and Variability

Bias concerns the center of the sampling
distribution. A statistic used to estimate a
parameter is unbiased if the mean of its sampling
distribution is equal to the true value of the
parameter being estimated.
To reduce bias, use random sampling. The values
of a statistic computed from an SRS neither
consistently overestimate nor consistently
underestimate the value of the population
parameter.
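A simulation can show the difference between an unbiased and a biased method. The sketch assumes a hypothetical population of the values 1 to 1,000 and a hypothetical flawed sampling frame that can only reach values up to 500, similar to busy high earners never answering an earnings survey.

```python
import random
import statistics

population = list(range(1, 1001))                # hypothetical values; true mean 500.5
reachable = [x for x in population if x <= 500]  # hypothetical biased frame: top half never reached

rng = random.Random(6)
reps, n = 2000, 30
# Simulated sampling distribution of x-bar under each method.
srs_means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]
biased_means = [statistics.fmean(rng.sample(reachable, n)) for _ in range(reps)]

print("true mean:", statistics.fmean(population))                   # true mean: 500.5
print("center, SRS:", round(statistics.fmean(srs_means), 1))        # close to 500.5
print("center, biased:", round(statistics.fmean(biased_means), 1))  # far below, near 250
```

Taking larger samples from the flawed frame would not help: the biased center stays near 250 no matter how large n is, because bias is about where the sampling distribution is centered, not how spread out it is.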

45
4. Inference Overview Bias and Variability

Variability is described by the spread of the
sampling distribution.
To reduce the variability of a statistic from an
SRS, use a larger sample. You can make the
variability as small as you want by taking a
large enough sample. The variability of a
statistic from a random sample does not depend on
the size of the population, as long as the
population is large enough.
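The two claims in this slide (larger samples reduce spread; population size barely matters) can be checked by simulation. This sketch assumes a hypothetical population in which unit i has value i mod 100, so populations of 10,000 and 1,000,000 units have the same mix of values; `sd_of_sample_means` is an illustrative name.

```python
import random
import statistics

def sd_of_sample_means(pop_size, n, reps, rng):
    """Spread of x-bar over many SRSs of size n from a population of pop_size units."""
    means = []
    for _ in range(reps):
        labels = rng.sample(range(pop_size), n)  # SRS of unit labels
        # Hypothetical population: unit i has value i % 100.
        means.append(statistics.fmean([i % 100 for i in labels]))
    return statistics.stdev(means)

rng = random.Random(7)
results = {}
for pop_size in (10_000, 1_000_000):
    for n in (25, 400):
        results[(pop_size, n)] = sd_of_sample_means(pop_size, n, 1000, rng)
        print(pop_size, n, round(results[(pop_size, n)], 2))
```

Theory suggests spreads near $\sigma/\sqrt{n}$ (with $\sigma \approx 28.9$ for the values 0 to 99), roughly 5.8 for n = 25 and 1.4 for n = 400, for both population sizes: multiplying the sample size by 16 cuts the spread by a factor of about 4, while multiplying the population size by 100 changes almost nothing.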

46
4. Inference Overview
