Data Collection and Sampling - PowerPoint PPT Presentation

About This Presentation
Title:

Data Collection and Sampling

Slides: 29
Provided by: zgo1

Transcript and Presenter's Notes


1
Data Collection and Sampling
  • Chapter 5

2
Recall
  • Statistics is a tool for converting data into
    information
  • But
  • Where then does data come from?
  • How is it gathered?
  • How do we ensure it is accurate? Is the data
    reliable?
  • Is it representative of the population from
    which it was drawn?
  • This chapter explores some of these issues.

3
5.1 Methods of Collecting Data
  • The reliability and accuracy of the data affect
    the validity of the results of a statistical
    analysis.
  • The reliability and accuracy of the data depend
    on the method of collection.
  • Four of the most popular sources of statistical
    data are:
  • Published data
  • Observational studies
  • Experimental studies
  • Surveys

4
Published Data
  • This is often a preferred source of data due to
    low cost and convenience.
  • Published data is found as printed material,
    tapes, disks, and on the Internet.
  • Data published by the organization that has
    collected it is called PRIMARY DATA.
  • For example, data published by the US Bureau of
    the Census.
  • Data published by an organization different from
    the organization that has collected it is called
    SECONDARY DATA.
  • For example
  • The Statistical Abstract of the United States
    compiles data from primary sources.
  • Compustat sells a variety of financial data
    tapes compiled from primary sources.

5
Observational and experimental studies
  • When published data is unavailable, one needs to
    conduct a study to generate the data.
  • An observational study is one in which measurements
    representing a variable of interest are observed
    and recorded, without controlling any factor that
    might influence their values.
  • An experimental study is one in which measurements
    representing a variable of interest are observed
    and recorded, while controlling factors
    that might influence their values.

6
Surveys
  • Surveys solicit information from people, e.g.
    pre-election polls and marketing surveys.
  • The Response Rate (i.e. the proportion of
    all people selected who complete the survey) is a
    key survey parameter.
  • Surveys can be made by means of
  • personal interview
  • telephone interview
  • self-administered questionnaire

7
Questionnaire Design
  • Key design principles of a good questionnaire
  • Keep the questionnaire as short as possible.
  • Ask short, simple, and clearly worded questions.
  • Start with demographic questions to help
    respondents get started comfortably.
  • Use dichotomous (yes/no) and multiple choice
    questions.
  • Use open-ended questions cautiously.
  • Avoid using leading questions.
  • Pretest a questionnaire on a small number of
    people.
  • Think about the way you intend to use the
    collected data when preparing the questionnaire.

8
5.2 Sampling
  • Recall that statistical inference permits us to
    draw conclusions about a population based on a
    sample.
  • Motivation for conducting a sampling procedure
  • Costs. (e.g. it's less expensive to sample 1,000
    television viewers than 100 million TV viewers)
  • Population size.
  • The possible destructive nature of the
    sampling process. (e.g. performing a crash test
    on every automobile produced is impractical).
  • The sampled population and the target
    population should be similar to one another.

9
5.3 Sampling Plans
  • A sampling plan is just a method or procedure for
    specifying how a sample will be taken from a
    population.
  • We will focus our attention on these three
    methods
  • Simple random sampling
  • Stratified random sampling
  • Cluster sampling

10
Simple Random Sampling
  • In simple random sampling, all samples of
    the same size are equally likely to be chosen.
  • To conduct random sampling
  • assign a number to each element of the chosen
    population (or use already given numbers),
  • randomly select the sample numbers (members). Use
    a random numbers table, or a software package.
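
The "equally likely" property can be checked by brute force on a tiny population. The sketch below is illustrative only: the five-element population, the sample size of 2, and all variable names are assumptions made for the example, not values from the slides.

```python
import itertools
import random

# Hypothetical population of five numbered elements (an assumption for illustration).
population = [1, 2, 3, 4, 5]
sample_size = 2

# Every distinct sample of the chosen size; order within a sample does not matter.
all_samples = list(itertools.combinations(population, sample_size))

# Under simple random sampling each of these samples has the same probability.
probability_of_each = 1 / len(all_samples)

# Draw one simple random sample (without replacement) with the standard library.
rng = random.Random(0)  # fixed seed only so the run is repeatable
chosen = tuple(sorted(rng.sample(population, sample_size)))
```

With 5 elements and samples of size 2 there are 10 possible samples, so each one is chosen with probability 1/10.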

11
Simple Random Sampling
  • Example 5.1
  • A government income-tax auditor is responsible
    for 1,000 tax returns.
  • The auditor will randomly select 40 returns to
    audit.
  • Use Excel's random number generator to select
    the returns.
  • Solution
  • We generate 50 numbers between 1 and 1000 (we
    need only 40 numbers, but the extra might be used
    if duplicate numbers are generated.)

12
Simple Random Sampling
  • Example 5.1 A government income tax auditor must
    choose a sample of 40 of 1,000 returns to audit.
  • Extra numbers may be used if duplicate random
    numbers are generated.

13
Simple Random Sampling
Round-up
X(100)
383 101 597 900 885 959 15 408 864 139 2 46 . .
The auditor should select 40 files numbered 383, 101, ...
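
The procedure in Example 5.1 can be sketched outside Excel as well. The Python version below is a rough equivalent under stated assumptions, not the spreadsheet steps themselves: the function name `select_returns`, its parameters, and the seed are hypothetical, and the "keep generating if the extras run out" step is an addition that the slides do not describe.

```python
import random

def select_returns(population_size=1000, sample_size=40, draws=50, seed=None):
    """Pick `sample_size` distinct return numbers from 1..population_size."""
    if sample_size > population_size:
        raise ValueError("sample_size cannot exceed population_size")
    rng = random.Random(seed)
    # Generate numbers independently, so duplicates are possible (as in the example).
    numbers = [rng.randint(1, population_size) for _ in range(draws)]
    # Drop duplicates while keeping the order in which numbers were generated.
    distinct = list(dict.fromkeys(numbers))
    # Assumption beyond the slides: if the extras run out, keep generating.
    while len(distinct) < sample_size:
        candidate = rng.randint(1, population_size)
        if candidate not in distinct:
            distinct.append(candidate)
    return distinct[:sample_size]

# Hypothetical run: 40 distinct return numbers between 1 and 1,000.
returns_to_audit = select_returns(seed=2024)
```

In Python, `random.sample(range(1, 1001), 40)` draws without replacement directly, so the extra numbers are only needed when the generator can repeat values.
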
14
Stratified Random Sampling
  • This sampling procedure separates the population
    into mutually exclusive sets (strata),
    and then draws simple random samples from each
    stratum.

15
Stratified Random Sampling
  • With this procedure we can acquire information
    about
  • the whole population
  • each stratum
  • the relationships among strata.

16
Stratified Random Sampling
  • After the population has been stratified, we can
    use simple random sampling within each stratum to
    generate the complete sample. For example, sample
    each stratum in proportion to its share of the
    population.
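
A minimal sketch of proportional stratified sampling follows. Everything named here is an assumption for illustration: the function `stratified_sample`, the age-group strata, and the population of 1,000 people are hypothetical and do not come from the slides.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, sample_size, seed=None):
    """Draw a simple random sample inside each stratum, sized proportionally."""
    rng = random.Random(seed)
    # Separate the population into mutually exclusive strata.
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for name, members in strata.items():
        # Proportional allocation; rounding can make the total differ
        # slightly from sample_size when the shares are not exact.
        k = round(sample_size * len(members) / len(units))
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population: 1,000 people tagged with an age group.
people = (
    [(i, "under 40") for i in range(500)]
    + [(i, "40 to 64") for i in range(500, 800)]
    + [(i, "65 and over") for i in range(800, 1000)]
)
sample = stratified_sample(people, stratum_of=lambda p: p[1], sample_size=100, seed=1)
sizes = {name: len(members) for name, members in sample.items()}
```

Here the strata hold 50%, 30%, and 20% of the population, so a sample of 100 takes 50, 30, and 20 people from them.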

17
Cluster Sampling
  • Cluster sampling is a simple random sample of
    groups or clusters of elements.
  • This procedure is useful when
  • it is difficult and costly to develop a complete
    list of the population members (making it
    difficult to develop a simple random sampling
    procedure).
  • the population members are widely dispersed
    geographically.
  • Cluster sampling may increase sampling
    error, because of probable similarities
    among cluster members.
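
Cluster sampling can be sketched as a simple random sample of cluster names, followed by taking every element in the chosen clusters. The function name `cluster_sample` and the city-block data are hypothetical, chosen only to make the idea concrete.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep all of their members."""
    rng = random.Random(seed)
    # Simple random sample of cluster names, not of individual elements.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: five city blocks, each listing its four households.
blocks = {
    f"block {b}": [f"household {b}-{h}" for h in range(1, 5)]
    for b in "ABCDE"
}
sample = cluster_sample(blocks, n_clusters=2, seed=3)
```

Only a list of blocks is needed up front, not a list of every household, which is why the method suits populations that are hard to enumerate.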

18
Sample Size
  • Numerical techniques for determining sample sizes
    will be described later, but suffice it to say
    that the larger the sample size is, the more
    accurate we can expect the sample estimates to be.

19
5.4 Sampling and Non-Sampling Errors
  • Two major types of error can arise when a sample
    of observations is taken from a population
  • Sampling error refers to differences
    between the sample and the population that exist
    only because of the observations that happened to
    be selected for the sample.
  • Nonsampling errors are more serious and
    are due to mistakes made in the acquisition of
    data or due to the sample observations being
    selected improperly.

20
Sampling Error
  • Sampling error refers to differences between the
    sample and the population that exist only because
    of the observations that happened to be selected
    for the sample.
  • Another way to look at this: the differences
    in results between different samples (of the same
    size) are due to sampling error.
  • E.g. Two samples of size 10 of 1,000 households.
    If we happened to get the highest income level
    data points in our first sample and all the
    lowest income levels in the second, this
    difference is due to sampling error.
  • Increasing the sample size will reduce this type
    of error.
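
The effect of sample size on sampling error can be illustrated with a small simulation. This is a sketch under assumptions: the 1,000 household incomes are synthetic (drawn from a lognormal distribution picked only for illustration), and the function name, seeds, and repetition count are hypothetical.

```python
import random
import statistics

def average_sampling_error(population, n, repetitions=200, seed=0):
    """Average |sample mean - population mean| over repeated random samples."""
    rng = random.Random(seed)
    mu = statistics.fmean(population)
    errors = [
        abs(statistics.fmean(rng.sample(population, n)) - mu)
        for _ in range(repetitions)
    ]
    return statistics.fmean(errors)

# Synthetic incomes for 1,000 households (made-up data, not from the slides).
income_rng = random.Random(42)
households = [income_rng.lognormvariate(10.8, 0.6) for _ in range(1000)]

error_small = average_sampling_error(households, n=10)
error_large = average_sampling_error(households, n=250)
```

Comparing `error_small` with `error_large` shows the average gap between sample mean and population mean shrinking as the sample grows.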

21
Sampling Errors
[Figure: population income distribution, marking μ (the population mean) and the sampling error]

22
Nonsampling Error
  • Nonsampling errors are more serious and are due
    to mistakes made in the acquisition of data or
    due to the sample observations being selected
    improperly.
  • Three types of nonsampling errors:
  • Errors in data acquisition
  • Nonresponse errors
  • Selection bias
  • Note: increasing the sample size will not reduce
    this type of error.

23
Errors in data acquisition
  • These errors arise from the recording of
    incorrect responses, due to
  • incorrect measurements being taken because of
    faulty equipment,
  • mistakes made during transcription from primary
    sources,
  • inaccurate recording of data due to
    misinterpretation of terms, or
  • inaccurate responses to questions concerning
    sensitive issues.

24
Data Acquisition Error
[Figure: population and sample, labeled with sampling error and data acquisition error]

25
Nonresponse Error
  • Nonresponse error refers to error (or bias)
    introduced when responses are not obtained from
    some members of the sample, i.e. the sample
    observations that are collected may not be
    representative of the target population.
  • As mentioned earlier, the Response Rate (i.e. the
    proportion of all people selected who complete
    the survey) is a key survey parameter and helps
    in understanding the validity of the survey and
    the sources of nonresponse error.
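
A small worked example may make the response rate and its link to nonresponse error concrete. The ten people and their incomes below are invented for illustration, and the variable names are hypothetical.

```python
# Hypothetical sample of ten selected people:
# (annual income in $1,000s, completed the survey?)
selected = [
    (20, True), (25, True), (30, True), (35, True), (40, True), (60, True),
    (80, False), (100, False), (120, False), (150, False),
]

# Incomes we actually observe: only people who completed the survey.
respondent_incomes = [income for income, completed in selected if completed]

# Response rate: proportion of all people selected who complete the survey.
response_rate = len(respondent_incomes) / len(selected)

# The respondents' mean versus the mean of everyone who was selected.
mean_of_respondents = sum(respondent_incomes) / len(respondent_incomes)
mean_of_all_selected = sum(income for income, _ in selected) / len(selected)
```

In this made-up case the response rate is 0.6, and because the nonrespondents are the higher earners, the respondents' mean (35.0) falls well below the mean of all ten selected people (66.0).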

26
Nonresponse Error
[Figure: population and sample. Caption: No response here... may lead to biased results here.]

27
Selection Bias
  • Selection bias occurs when the sampling plan is
    such that some members of the target population
    cannot possibly be selected for inclusion in the
    sample.

28
Selection Bias
[Figure: population and sample. Caption: When parts of the population cannot be selected... the sample cannot represent the whole population.]
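
Selection bias, and the earlier point that a larger sample does not remove nonsampling error, can be illustrated with a short simulation. The setup is an assumption made for the example: a hypothetical target population with values 1 to 1,000, and a sampling plan that can only reach the first 800 members.

```python
import random
import statistics

def sample_mean_from_frame(frame, n, seed=0):
    """Mean of a simple random sample of size n drawn from the reachable frame."""
    rng = random.Random(seed)
    return statistics.fmean(rng.sample(frame, n))

# Hypothetical target population: one value per member, 1 through 1000.
population = list(range(1, 1001))
# Flawed sampling plan: members 801..1000 cannot possibly be selected.
frame = population[:800]

population_mean = statistics.fmean(population)

# Error of the estimate for a moderate sample and for the largest possible one.
error_n400 = abs(sample_mean_from_frame(frame, 400) - population_mean)
error_n800 = abs(sample_mean_from_frame(frame, 800) - population_mean)
```

Even when every reachable member is sampled (n = 800), the estimate is still off by 100, because the excluded members never had a chance to be selected.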