Chapter 3: Producing Data - PowerPoint PPT Presentation

Provided by: hint9

Title: Chapter 3: Producing Data


1
Chapter 3 Producing Data
2
1. Inferential Statistics
  • Population: The population is the group of
    people or things that we're interested in. This
    group is defined by whatever question is being
    asked.
  • Example 1: Do Texas A&M students have breakfast
    regularly?
  • How many populations are of interest?
  • One
  • What is the population of interest?
  • All current Texas A&M students

3
1. Inferential Statistics
  • Example 2 Is the IQ of women the same as the IQ
    of men?
  • How many populations are of interest?
  • Two
  • What are the populations of interest?
  • All women and all men
  • Example 3 Which is more effective at lowering
    the heart rate of mice, no drug (control), drug
    A, drug B, or drug C?
  • How many populations are of interest?
  • Four
  • What are the populations of interest?
  • All mice taking no drug, all mice taking drug A,
    all mice taking drug B, all mice taking drug C

4
1. Inferential Statistics
  • Suppose we have no previous information about
    these questions. How could we answer them?
  • Census
  • Advantages
  • We get everyone, we know the truth
  • Disadvantages
  • Expensive, Difficult to obtain, may be
    impossible.
  • Sample A subset of the population selected for
    the study
  • Advantages
  • Take less time and money. Feasible.
  • Disadvantages
  • Uncertainty about the truth. We may have error.

5
1. Inferential Statistics
  • Example 1: Do Texas A&M students have breakfast
    regularly?
  • Population: all current Texas A&M students
  • Sample 1 all STAT 30X students
  • Sample 2 students who come to Blocker building
    on Monday morning
  • Sample 3 randomly choose names from the phone
    book and ask them
  • .
  • One population has many sampling methods

6
1. Inferential Statistics
  • General Idea of Inferential Statistics
  • 1. Take a sample from the whole population.
  • 2. Summarize the sample using important
    statistics.
  • 3. Use those summaries to make inference about
    the whole population.
  • 4. We realize there may be some error involved in
    making inference.
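The four steps above can be sketched in a few lines of Python. This is only an illustration: the population below (40,000 students, 60% of whom eat breakfast regularly) is made up, and the function name `estimate_proportion` is hypothetical.

```python
import random

def estimate_proportion(population, n, seed=0):
    """Sample, summarize, and return the estimate (steps 1 to 3)."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)   # 1. take a sample from the population
    p_hat = sum(sample) / n              # 2. summarize the sample with a statistic
    return p_hat                         # 3. use the statistic as the estimate

# Hypothetical population: 1 = eats breakfast regularly, 0 = does not
population = [1] * 24_000 + [0] * 16_000
true_p = sum(population) / len(population)

p_hat = estimate_proportion(population, n=400)
error = abs(p_hat - true_p)              # 4. the estimate usually carries some error
print(f"true proportion {true_p:.3f}, estimate {p_hat:.3f}, error {error:.3f}")
```

In practice the true proportion is unknown, so the error in step 4 cannot be computed directly. It is shown here only because the population was constructed by hand.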

7
1. Inferential Statistics
  • Example
  • Question: Can aspirin reduce the risk of heart
    attack?
  • 1. Take Sample: A sample of 22,071 male physicians
    between the ages of 40 and 84 was randomly assigned
    to one of two groups. One group took an ordinary
    aspirin tablet every other day. The other group,
    the control group, took a placebo every other
    day.
  • 2. Summary statistic: The rate of heart attacks
    in the group taking aspirin was only 55% of the
    rate of heart attacks in the placebo group.
  • 3. Inference to population: Taking aspirin causes
    a lower rate of heart attacks in humans.

8
2. Sampling a Single Population
  • Basics for sampling
  • Sampling should not be biased: A sampling method
    is biased if any part of the population cannot get
    in.
  • Example 1: Only select STAT 30X students ---
    biased
  • The selection of an individual in the population
    should not affect the selection of the next
    individual (independence).
  • Example 1: Survey one student, then ask him to
    introduce his friend to you. --- dependent
  • Sampling should be large enough to adequately
    cover the population. A good sample is one that
    is collected in such a way that it is
    representative of the population.
  • Example 1 only ask 3 students --- sample size
    too small

9
2. Sampling a Single Population
  • Sampling Techniques
  • Simple Random Sample (SRS): every member of the
    population has an equal chance of being selected,
    and every possible sample of the same size is
    equally likely to be chosen.

10
2. Sampling a Single Population
  • Sampling Techniques
  • Simple Random Sample (SRS)
  • Assign every individual a number and randomly
    select n numbers using a random number table (or
    computer generated random numbers).
  • Table B at the back of the book is random digits.
  • Example1 Obtain a list of all TAMU students and
    assign every student a number. Using a random
    number table, select 50 of them.
  • Example Obtain a list of all SSN for individuals
    in the U.S. who are over 65. Using a random
    number table, select 50 of them.
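With computer-generated random numbers, the same procedure might look like the sketch below. The student list is a placeholder (the real list would come from the registrar), and `simple_random_sample` is a hypothetical helper name. Python's `random.sample` draws without replacement, so no one is chosen twice.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(units, n)

# Placeholder list standing in for the full roster of students
students = [f"student_{i}" for i in range(1, 40_001)]
srs = simple_random_sample(students, 50, seed=1)
print(srs[:5])
```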

11
2. Sampling a Single Population
  • Sampling Techniques SRS Exercise
  • Choose an SRS of three names from the following
    employees of a small company.
    1 Bechhofer   2 Brown    3 Ito
    4 Kesten      5 Kiefer   6 Spitzer
    7 Taylor      8 Wald     9 Weiss
    Use the numerical labels attached to the names
    above and the following list of random digits.
    Read the list of random digits from left to
    right, starting at the beginning of the list.
    11793 20495 05907 11384 44982 20751 27498
    12009 45287
  • The simple random sample is
  • a)     1(1)79
  • b)    Bechhofer, then Bechhofer again, then
    Taylor
  • c)     Bechhofer, Taylor, Weiss
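The table-reading procedure in this exercise can be traced with a short sketch. It assumes the nine names carry the labels 1 through 9 in the order listed, skips digits that match no label (such as 0), and skips a label that has already been chosen. The function name `srs_from_digits` is made up for illustration.

```python
def srs_from_digits(labels, digits, n):
    """Read digits left to right and pick the first n distinct labelled names."""
    chosen = []
    for ch in digits:
        if not ch.isdigit():
            continue                      # ignore the spaces between digit groups
        name = labels.get(int(ch))        # None if the digit is not a label (e.g. 0)
        if name is not None and name not in chosen:
            chosen.append(name)           # repeats are skipped, not counted twice
        if len(chosen) == n:
            break
    return chosen

names = ["Bechhofer", "Brown", "Ito", "Kesten", "Kiefer",
         "Spitzer", "Taylor", "Wald", "Weiss"]
labels = {i: name for i, name in enumerate(names, start=1)}   # assumed labels 1..9
digits = "11793 20495 05907 11384 44982 20751 27498 12009 45287"

sample = srs_from_digits(labels, digits, 3)
print(sample)
```

The first digits read are 1, 1, 7, 9: the second 1 is a repeat and is skipped, which is why a name cannot appear twice in the sample.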

12
2. Sampling a Single Population
  • Sampling Techniques
  • Stratified Random Sample Divide the population
    into several strata. Then take a SRS from each
    stratum.

13
2. Sampling a Single Population
  • Sampling Techniques
  • Stratified Random Sample
  • Example1 Obtain a list of all TAMU students and
    divide them into colleges. Then randomly sample
    10 from each college.
  • Example Obtain a list of all SSN for individuals
    in the U.S. who are over 65. Divide up the SSNs
    into region of the country (time zones). Then
    randomly sample 30 from each time zone.
  • Advantage: Each stratum is guaranteed to be
    randomly sampled
  • Disadvantage: The overall sample is no longer a
    simple random sample of the whole population
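A stratified random sample can be sketched as one SRS per stratum. The college names and sizes below are invented for illustration, and `stratified_sample` is a hypothetical helper.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a separate SRS of n_per_stratum units from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical strata: students grouped by college
colleges = {
    "Engineering": [f"eng_{i}" for i in range(500)],
    "Science": [f"sci_{i}" for i in range(300)],
    "Liberal Arts": [f"lib_{i}" for i in range(400)],
}
sample = stratified_sample(colleges, 10, seed=2)
for college, chosen in sample.items():
    print(college, len(chosen))
```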

14
2. Sampling a Single Population
  • Sampling Techniques
  • Cluster Sample Divide the population into
    several strata or clusters. Then take a SRS of
    clusters.

15
2. Sampling a Single Population
  • Sampling Techniques
  • Cluster Sample
  • Advantage May be the only feasible method, given
    resources.
  • Example Obtain a list of all SSNs for
    individuals in the U.S. who are over 65. Sort
    the SSNs by the last 4 digits making each set of
    100 a cluster. Use a random number table to pick
    the clusters. You may get the 4100s, 5600s and
    8200s for example.
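The cluster example might be sketched as follows. Integers 0 to 9999 stand in for the last four SSN digits (no real SSNs are used), each block of 100 consecutive values forms one cluster, and `cluster_sample` is a hypothetical helper. Everyone in a selected cluster is kept.

```python
import random

def cluster_sample(units, cluster_of, n_clusters, seed=None):
    """Group units into clusters, then take an SRS of whole clusters."""
    rng = random.Random(seed)
    clusters = {}
    for u in units:
        clusters.setdefault(cluster_of(u), []).append(u)
    picked = rng.sample(sorted(clusters), n_clusters)   # SRS of cluster labels
    return {c: clusters[c] for c in picked}             # keep every unit in each pick

# Stand-in for the last four SSN digits; 4100..4199 is "the 4100s", and so on
last_four = list(range(10_000))
sample = cluster_sample(last_four, lambda x: (x // 100) * 100, 3, seed=3)
print(sorted(sample))
```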

16
2. Sampling a Single Population
  • Sampling Techniques
  • Multi-Stage Sample Divide the population into
    several strata. Then take a SRS from a random
    subset of all the strata.

17
2. Sampling a Single Population
  • Sampling Techniques
  • Multi-Stage Sample
  • Advantage May be the only feasible method, given
    resources.
  • Example Obtain a list of all SSN for individuals
    in the U.S. who are over 65. Divide up the SSNs
    into 50 states. Randomly select 10 states. Then
    randomly sample 40 from each of the selected
    states.
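The two stages in this example can be sketched as an SRS of states followed by an SRS of people within each selected state. The state and person identifiers are placeholders, and `multistage_sample` is a hypothetical helper.

```python
import random

def multistage_sample(strata, n_strata, n_per_stratum, seed=None):
    """Stage 1: SRS of strata. Stage 2: SRS of units within each chosen stratum."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(strata), n_strata)
    return {s: rng.sample(strata[s], n_per_stratum) for s in chosen}

# Placeholder data: 50 states with 200 listed individuals each
states = {f"state_{i:02d}": [f"s{i:02d}_person_{j}" for j in range(200)]
          for i in range(50)}
sample = multistage_sample(states, n_strata=10, n_per_stratum=40, seed=4)
print(len(sample), sum(len(people) for people in sample.values()))
```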

18
2. Sampling a Single Population
  • Sampling Techniques Exercise
  • A small college has 500 male and 600 female
    undergraduates. A simple random sample of 50 of
    the male undergraduates is selected and,
    separately, a simple random sample of 60 of the
    female undergraduates is selected. The two
    samples are combined to give an overall sample of
    110 students. The overall sample is
  • a)     a simple random sample
  • b)    a stratified random sample
  • c)     a cluster sample
  • d)    none of the above
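One way to think about this exercise is to count samples. An SRS of 110 students would make every set of 110 undergraduates equally likely, whereas the design described can only produce sets with exactly 50 men and 60 women. The short calculation below compares the two counts; the variable names are illustrative.

```python
from math import comb

# Samples the described design can produce: exactly 50 men and 60 women
possible_under_design = comb(500, 50) * comb(600, 60)

# Samples an SRS of 110 from all 1100 students could produce
possible_under_srs = comb(1100, 110)

fraction_reachable = possible_under_design / possible_under_srs
print(f"fraction of SRS outcomes the design can reach: {fraction_reachable:.4f}")
```

Because many possible samples of 110 can never occur under this design, the overall sample is not a simple random sample, even though each individual student has the same 1-in-10 chance of being chosen.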

19
2. Sampling a Single Population
  • Sampling Problems
  • Voluntary response Internet surveys, Call-in
    surveys
  • E.g. Survey about earnings. People who take the
    survey can get free T-shirts. Busy people won't
    come, and these people often have high earnings.
    So our sample mean/median may be lower than the
    true mean/median.
  • Convenience sampling Sampling friends, Sampling
    at the mall
  • Problem They may have similar interests
  • Dishonesty Asking personal questions, Not enough
    time to respond honestly
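The voluntary-response problem in the earnings example can be illustrated with a small simulation. Everything here is assumed for the sake of the sketch: the earnings distribution, and the response rates of 50% for people at or below the median and 10% for people above it.

```python
import random
import statistics

rng = random.Random(5)

# Hypothetical population of annual earnings
earnings = [rng.lognormvariate(10.5, 0.6) for _ in range(50_000)]
median = statistics.median(earnings)

def responds(e):
    # Assumption: higher earners are busier and volunteer less often
    return rng.random() < (0.1 if e > median else 0.5)

volunteers = [e for e in earnings if responds(e)]

true_mean = statistics.fmean(earnings)
volunteer_mean = statistics.fmean(volunteers)
print(f"population mean {true_mean:,.0f}, volunteer mean {volunteer_mean:,.0f}")
```

Under these assumptions the volunteer mean falls below the population mean, which is the direction of bias described above.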

20
2. Sampling a Single Population
  • Cautions about Sample Surveys
  • Undercoverage Some groups in the population are
    left out when the sample is taken
  • Ex) sample survey of households will miss not
    only homeless people but prison inmates and
    students in dormitories.
  • Nonresponse: An individual chosen for the sample
    cannot be contacted or does not cooperate
  • Ex) phone survey, mail survey

21
2. Sampling a Single Population
  • Cautions about Sample Surveys
  • Response Bias Results that are influenced by
    the behavior of the respondent or interviewer
  • The wording of questions can influence the
    answers
  • E.g. (Text p. 254) How do Americans feel about
    government help for the poor?
  • Only 13% think we are spending too much on
    "assistance to the poor."
  • But 44% think we are spending too much on
    "welfare."
  • It seems that "assistance to the poor" is a nice,
    hopeful phrase, while "welfare" is a negative word.
  • Respondent may not want to give truthful answers
    to sensitive questions

22
2. Sampling a Single Population
  • Ex. In order to assess the opinion of students at
    the University of Minnesota on campus snow
    removal, a reporter for the student newspaper
    interviews the first 12 students he meets who are
    willing to express their opinions. The method of
    sampling used is
  • a)    simple random sampling
  • b)    the Gallup Poll
  • c)    voluntary response
  • d) a census

23
3. Sampling More than One Population
  • We sample from more than one population when we
    are interested in more than one variable.
  • One response variable and one explanatory
    variable. The populations are defined by the
    values the explanatory variable takes on.
  • Example 1 Comparing decibel levels of 4
    different brands of speakers
  • What is the explanatory variable?
  • Brand
  • What is the response variable?
  • Decibel Level
  • Number of Populations?
  • Four

24
3. Sampling More than One Population
  • Example 2 Determining time to failure of 3
    different types of light bulbs
  • What is the explanatory variable?
  • Type
  • What is the response variable?
  • Time to Failure
  • Number of Populations?
  • Three

25
3. Sampling More than One Population
  • Example 3 Comparing GRE scores for students from
    5 different majors
  • What is the explanatory variable?
  • Major
  • What is the response variable?
  • GRE score
  • Number of Populations?
  • Five

26
3. Sampling More than One Population
  • Important Considerations
  • Each sample should represent the population it
    corresponds to well.
  • Samples from more than one population should be
    as close to each other in every respect as
    possible except for the explanatory variable.
    Otherwise we may have confounding variables.
  • Two variables are confounded if we cannot
    determine which one caused the differences in the
    response.

27
3. Sampling More than One Population
  • Important Considerations
  • Examples of Confounding
  • Suppose we compared the decibel levels of the
    four different speaker brands, each with a
    different measuring instrument
  • We wouldn't know if the differences were due to
    the different brands or different instruments.
  • Brand and Instrument are then confounded.
  • Suppose we compared the time to failure of the
    three different types of light bulbs, each in a
    different light socket.
  • We wouldn't know if the differences were due to
    the different types of light bulbs or different
    light sockets.
  • Type and Socket are then confounded.
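The speaker example can be made concrete with invented numbers. In the sketch below all four brands are given the same true decibel level, and each hypothetical instrument adds its own fixed offset. When every brand gets its own instrument, the instrument offsets show up as apparent brand differences.

```python
# Hypothetical values: all brands are truly identical
true_level = {"A": 90.0, "B": 90.0, "C": 90.0, "D": 90.0}
# Hypothetical fixed offset added by each measuring instrument
instrument_offset = {"i1": 0.0, "i2": 1.0, "i3": 2.0, "i4": 3.0}

def measure(brand, instrument):
    return true_level[brand] + instrument_offset[instrument]

# Confounded design: brand A always with i1, brand B always with i2, ...
confounded = {b: measure(b, i) for b, i in zip(true_level, instrument_offset)}

# Alternative design: every brand measured with every instrument, then averaged
crossed = {b: sum(measure(b, i) for i in instrument_offset) / len(instrument_offset)
           for b in true_level}

print(confounded)   # brands appear to differ
print(crossed)      # brands agree
```

Measuring every brand with the same instrument would serve the same purpose. The crossed design is shown only as one possible remedy.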

28
3. Sampling More than One Population
  • Important Considerations
  • Examples of Confounding
  • Suppose we obtained GRE scores for each major,
    each from a different university.
  • We wouldn't know if the differences were due to
    the different majors or different universities.
  • Major and University are then confounded.
  • Confounding can be avoided by using good sampling
    techniques

29
3. Sampling More than One Population
  • Important Considerations
  • It is also possible that more than one (possibly
    several) explanatory variable can influence a
    given response variable.
  • Example
  • Perhaps both the type of light bulb and the type
    of light socket influence the time to failure of
    a light bulb.
  • It is likely that different types of light bulbs
    work better for different sockets.
  • This concept is known as interaction.
  • Interaction The responses for the levels of one
    variable differ over the levels of another
    variable.
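A small numeric illustration of interaction, using made-up mean times to failure (in hours) for two bulb types in two sockets. If the difference between bulb types changes from one socket to the other, the two variables interact.

```python
# Hypothetical mean times to failure, in hours
mean_life = {
    ("bulb_A", "socket_1"): 1000, ("bulb_A", "socket_2"): 800,
    ("bulb_B", "socket_1"): 900,  ("bulb_B", "socket_2"): 1100,
}

def bulb_effect(socket):
    """Difference in mean life (bulb B minus bulb A) within one socket."""
    return mean_life[("bulb_B", socket)] - mean_life[("bulb_A", socket)]

effect_socket_1 = bulb_effect("socket_1")          # B is worse in socket 1
effect_socket_2 = bulb_effect("socket_2")          # B is better in socket 2
interaction = effect_socket_2 - effect_socket_1    # nonzero means interaction
print(effect_socket_1, effect_socket_2, interaction)
```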

30
3. Sampling More than One Population
  • Good Sampling Techniques
  • Randomized Experiment
  • Observational Studies

31
3. Sampling More than One Population
  • Randomized Experiment
  • The key to a randomized experiment: the treatment
    (explanatory variable) is randomly assigned to
    the experimental units or subjects.

[Diagram: random assignment to treatment groups, then compare]
32
3. Sampling More than One Population
  • Randomized Experiment
  • Example Suppose we wish to do a study on the
    effect of aspirin on mice, comparing heart rates.
  • We obtain a random sample of 100 mice.
  • We randomly assign 50 mice to receive a placebo.
  • We randomly assign 50 mice to receive aspirin.
  • After 20 days of administering the placebo and
    aspirin, we measure the heart rates and obtain
    summary statistics for comparison.
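The random assignment step in this example can be sketched by shuffling the 100 mice and splitting the shuffled list in half. The mouse identifiers and the helper name `randomize` are placeholders.

```python
import random

def randomize(units, group_names, seed=None):
    """Shuffle the units and split them into equal-sized treatment groups."""
    rng = random.Random(seed)
    shuffled = units[:]                 # copy so the original list is untouched
    rng.shuffle(shuffled)
    size = len(shuffled) // len(group_names)
    return {g: shuffled[k * size:(k + 1) * size]
            for k, g in enumerate(group_names)}

mice = [f"mouse_{i}" for i in range(1, 101)]
groups = randomize(mice, ["placebo", "aspirin"], seed=6)
print({g: len(members) for g, members in groups.items()})
```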

33
3. Sampling More than One Population
  • Randomized Experiment
  • The single greatest advantage of a randomized
    experiment is that we can infer causation.
  • Through randomization to groups, we have
    controlled all other factors and eliminated the
    possibility of a confounding variable.
  • We cannot always use a randomized experiment
  • Often impossible or unethical, particularly with
    humans.

34
3. Sampling More than One Population
  • Observational Study
  • We are forced to select samples from different
    pre-existing populations

[Diagram: simple random sample from each pre-existing population, then compare]
35
3. Sampling More than One Population
  • Observational Study
  • Example 1 Suppose we are interested in comparing
    GRE scores for students in five different majors
  • We cannot do a randomized experiment because we
    cannot randomly assign individuals to a specific
    major.
  • Thus, we observe students from 5 different
    pre-existing populations: the five majors.
  • We obtain a random sample of size 15 from each of
    the five majors.
  • We calculate statistics and compare the 5 groups.
  • Can we say being in a specific major causes
    someone to get a higher GRE score?

36
3. Sampling More than One Population
  • Observational Study
  • Example 2: Suppose we are interested in finding out
    which age group talks the most on the telephone:
    0-10 years, 10-20 years, 20-30 years, or 30-40
    years
  • We cannot do a randomized experiment because we
    cannot randomly assign individuals to an age
    group.
  • Thus, we observe (through polling or
    wiretapping) individuals from 4 different
    pre-existing populations: the four age groups.
  • We obtain a random sample of size 25 from each of
    the four age groups.
  • We calculate statistics and compare the 4 groups.
  • Can we say being in a specific age group causes
    someone to talk more on the telephone?

37
3. Sampling More than One Population
  • Observational Study
  • Advantage: The data are much easier to obtain.
  • Disadvantages
  • We cannot say the explanatory variable caused the
    response
  • There may be lurking or confounding variables
  • Observational studies should be used more to
    describe the past than to predict the future.

38
4. Inference Overview
  • Recall that inference is using statistics from a
    sample to talk about a population.
  • We need some background in how we talk about
    populations and how we talk about samples.

39
4. Inference Overview
  • Describing a Population
  • It is common practice to use Greek letters when
    talking about a population.
  • We call the mean of a population μ.
  • We call the standard deviation of a population σ
    and the variance σ².
  • When we are talking about percentages, we call
    the population proportion π.
  • It is important to know that for a given
    population there is only one true mean and one
    true standard deviation and variance or one true
    proportion.
  • There is a special name for these values:
    parameters.

40
4. Inference Overview
  • Describing a Sample
  • It is common practice to use Roman letters when
    talking about a sample.
  • We call the mean of a sample x̄.
  • We call the standard deviation of a sample s and
    the variance s².
  • When we are talking about percentages, we call
    the sample proportion p.
  • There are many different possible samples that
    could be taken from a given population. For each
    sample there may be a different mean, standard
    deviation, variance, or proportion.
  • There is a special name for these values:
    statistics.
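The sample statistics named above can be computed with Python's standard `statistics` module. The two small samples below are invented. `statistics.variance` and `statistics.stdev` use the n - 1 divisor, which is the usual definition of the sample variance and sample standard deviation.

```python
import statistics

# Hypothetical sample of five heart rates (beats per minute)
heart_rates = [62, 70, 68, 74, 66]

x_bar = statistics.mean(heart_rates)       # sample mean
s2 = statistics.variance(heart_rates)      # sample variance (divisor n - 1)
s = statistics.stdev(heart_rates)          # sample standard deviation

# Hypothetical yes/no answers to "Do you eat breakfast regularly?"
eats_breakfast = [True, False, True, True, False]
p = sum(eats_breakfast) / len(eats_breakfast)   # sample proportion

print(x_bar, s2, round(s, 3), p)
```

A different sample from the same population would generally give different values for all four statistics.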

41
4. Inference Overview
  • We use sample statistics to make inference about
    population parameters

[Diagram: x̄ estimates μ, s estimates σ, p estimates π]
42
4. Inference Overview
  • Sampling Variability
  • There are many different samples that you can
    take from the population.
  • Statistics can be computed on each sample.
  • Since different members of the population are in
    each sample, the value of a statistic varies from
    sample to sample.

43
4. Inference Overview
  • The sampling distribution of a statistic is the
    distribution of values taken by the statistic in
    all possible samples of the same size from the
    same population.
  • We can then examine the shape, center, and spread
    of the sampling distribution.
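For a very small population, the sampling distribution can be listed in full. The sketch below uses an invented population of five values and enumerates every possible sample of size 2, then tabulates the sample means. Exact fractions are used to avoid rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 8, 10]   # hypothetical population
n = 2

# One sample mean for every possible sample of size n
sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]

distribution = Counter(sample_means)                 # shape: how often each value occurs
center = sum(sample_means) / len(sample_means)       # center: mean of the sample means
spread = max(sample_means) - min(sample_means)       # spread: range of the sample means

for value in sorted(distribution):
    print(value, distribution[value])
print("center", center, "spread", spread)
```

For realistic population sizes the number of possible samples is far too large to enumerate, so the sampling distribution is usually studied by simulation or theory instead.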

44
4. Inference Overview Bias and Variability
  • Bias concerns the center of the sampling
    distribution. A statistic used to estimate a
    parameter is unbiased if the mean of the sampling
    distribution is equal to the true value of the
    parameter being estimated.
  • To reduce bias, use random sampling. The values
    of a statistic computed from an SRS neither
    consistently overestimate nor consistently
    underestimate the value of the population
    parameter.
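The definition of an unbiased statistic can be checked exactly on a tiny invented population by averaging a statistic over every possible SRS. In this sketch the sample mean is compared with the population mean. As a contrast, the sample maximum is compared with the population maximum, which it can never exceed.

```python
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 8, 10]   # hypothetical population
n = 2
samples = list(combinations(population, n))   # every possible SRS of size n

mu = Fraction(sum(population), len(population))
mean_of_sample_means = sum(Fraction(sum(s), n) for s in samples) / len(samples)

population_max = max(population)
mean_of_sample_maxima = Fraction(sum(max(s) for s in samples), len(samples))

print("mean of sample means:", mean_of_sample_means, "vs mu:", mu)
print("mean of sample maxima:", mean_of_sample_maxima, "vs max:", population_max)
```

The sample means average out to exactly μ, so the sample mean is unbiased here. The sample maxima average below the population maximum, so the sample maximum consistently underestimates it.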

45
4. Inference Overview Bias and Variability
  • Variability is described by the spread of the
    sampling distribution.
  • To reduce the variability of a statistic from an
    SRS, use a larger sample. You can make the
    variability as small as you want by taking a
    large enough sample. The variability of a
    statistic from a random sample does not depend on
    the size of the population, as long as the
    population is large enough.
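Both claims can be checked against the standard formula for the standard deviation of the mean of an SRS of size n drawn without replacement from a population of N units, σ/√n · √((N − n)/(N − 1)). The formula is not stated on the slides, and the value of σ below is invented.

```python
from math import sqrt

def sd_of_sample_mean(sigma, n, N):
    """SD of the mean of an SRS of size n drawn without replacement from N units."""
    return sigma / sqrt(n) * sqrt((N - n) / (N - 1))

sigma = 15.0   # hypothetical population standard deviation

small_pop = sd_of_sample_mean(sigma, n=100, N=10_000)
large_pop = sd_of_sample_mean(sigma, n=100, N=1_000_000)
bigger_sample = sd_of_sample_mean(sigma, n=400, N=1_000_000)

print(f"n=100, N=10,000:    {small_pop:.4f}")
print(f"n=100, N=1,000,000: {large_pop:.4f}")
print(f"n=400, N=1,000,000: {bigger_sample:.4f}")
```

Increasing the population size a hundredfold barely changes the result, while quadrupling the sample size roughly halves it.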

46
4. Inference Overview