Statistical Inference - PowerPoint PPT Presentation

Description:

The purpose of statistical inference is to obtain information about a population ... Sample mean1. 2.5. ECO 3411. 34. Central Limit Theorem ... – PowerPoint PPT presentation

Provided by: tarekbu
Transcript and Presenter's Notes

Title: Statistical Inference


1
Lesson 1
  • Statistical Inference
  • Random Sampling

2
Sampling and Sampling Distributions
  • Simple Random Sampling
  • Point Estimation
  • Introduction to Sampling Distributions
  • Sampling Distribution of $\bar{x}$
  • Sampling Distribution of $\bar{p}$
  • Properties of Point Estimators
  • Other Sampling Methods

3
Statistical Inference
  • The purpose of statistical inference is to obtain
    information about a population from information
    contained in a sample
  • A population is the set of all the elements of
    interest.
  • A sample is a subset of the population.
  • The sample results provide only estimates of the
    values of the population characteristics.
  • A parameter is a numerical characteristic of a
    population.
  • With proper sampling methods, the sample results
    will provide good estimates of the population
    characteristics.

4
Simple Random Sampling: Finite Population
  • Finite populations are often defined by lists
    such as:
  • Organization membership roster
  • Credit card account numbers
  • Inventory product numbers
  • A simple random sample of size n from a finite
    population of size N is a sample selected such
    that each possible sample of size n has the same
    probability of being selected.

5
Simple Random Sampling: Finite Population
  • Replacing each sampled element before selecting
    subsequent elements is called sampling with
    replacement.
  • Sampling without replacement is the procedure
    used most often.
  • In large sampling projects, computer-generated
    random numbers are often used to automate the
    sample selection process.

6
Simple Random Sampling: Infinite Population
  • Infinite populations are often defined by an
    ongoing process whereby the elements of the
    population consist of items generated as though
    the process would operate indefinitely.
  • A simple random sample from an infinite
    population is a sample selected such that the
    following conditions are satisfied:
  • Each element selected comes from the same
    population.
  • Each element is selected independently.

7
Simple Random Sampling: Infinite Population
  • In the case of infinite populations, it is
    impossible to obtain a list of all elements in
    the population.
  • The random number selection procedure cannot be
    used for infinite populations.

8
Random Sampling
  • The basis for statistical inference about a
    population based on a sample
  • Example: Build a restaurant in the neighborhood?
  • Population: the collection of items you want to
    understand
  • N items. Example: all people in the neighborhood
  • Sample: a smaller collection of population units
  • n items. Example: 100 neighborhood residents who
    agree to be interviewed
  • Which 100 residents?
  • How to select?

9
Random Sample
  • A random sample must satisfy:
  • 1. Each population unit must have an equal chance
    of being selected
  • This helps assure representation, because all
    units in the population are equally accessible
  • 2. Units must be selected independently of one
    another
  • This guarantees that each item to be selected
    will bring new, independent information
  • Properties:
  • Sample is representative of population (on
    average)

10
Selecting a Random Sample
  • Use Table of Random Digits
  • Establish the frame (population units from 1 to
    N)
  • Decide starting place in table of random digits
  • Read random digits in groups
  • e.g., if N = 5,281, then use groups of 4 digits
    (N has 4 digits)
  • Include a number group if it is from 1 to N and
    has not yet been chosen
  • Shuffle the Population (Spreadsheet)
  • Arrange the population items in a column from 1
    to N
  • Put random numbers in an adjacent column using
    RAND()
  • Sort population items in order by random numbers

11
Table of Random Digits
  • For example
  • Starting in row 21, column 3
  • We find 52794, then 01466

12
Example: Sample Selection
  • Select random sample using random number table
  • Of size n = 4 from a population of size N = 861,
    starting at row 21, column 3 of the Table of
    Random Digits
  • Start with random digits
  • 52794 01466 85938 14565 79993
  • Group by 3 (because N = 861 has 3 digits)
  • 527 940 146 685 938 145 657
  • Omit 000, and also 862, 863, ..., 999
  • Omit duplicates, until n = 4 are obtained in the
    sample
  • 527 146 685 145
  • The random sample includes the following
    population units (numbered by frame)
  • 527, 146, 685, 145
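
The selection steps on this slide can be sketched in a few lines of Python. This is only an illustration of the procedure, not part of the original lesson; the function name `select_from_digits` and its parameters are made up for the example.

```python
def select_from_digits(digit_groups, population_size, sample_size):
    """Pick frame numbers by reading random digits in fixed-width groups."""
    width = len(str(population_size))      # N = 861 has 3 digits
    stream = "".join(digit_groups)         # run the table entries together
    chosen = []
    for start in range(0, len(stream) - width + 1, width):
        unit = int(stream[start:start + width])
        # Keep only frame numbers 1..N that have not been chosen yet;
        # this drops 000, anything above N, and duplicates.
        if 1 <= unit <= population_size and unit not in chosen:
            chosen.append(unit)
        if len(chosen) == sample_size:
            break
    return chosen


# Digits read from the table starting at row 21, column 3 (from the slide).
table_row = ["52794", "01466", "85938", "14565", "79993"]
sample = select_from_digits(table_row, population_size=861, sample_size=4)
print(sample)  # [527, 146, 685, 145]
```

The groups 940 and 938 are skipped because they exceed N = 861, which matches the hand selection above.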

13
Terminology of Sampling
  • Representative Sample
  • Same percentages in sample as in population
  • Example: sample is representative if the same
    percent work, are young/old, are single/married,
    etc.
  • Biased Sample
  • Not representative of population in an important
    way
  • Example: sample is biased if it has too many
    retired people

14
Statistic and Parameter
  • Sample Statistic
  • Any number computed from sample data
  • A random variable. Its value is known once the
    sample is drawn
  • Example: average weekly food expenditures for 100
    sampled residents
  • Random? Yes! Due to randomness of sample
    selection
  • Population Parameter
  • Any number computed for the entire population
  • A fixed number. Its value is unknown
  • Example: mean weekly food expenditures for all
    77,386 residents
  • Do we ever know this? NO!
  • But we estimate it (with error)

15
Point Estimation
  • In point estimation we use the data from the
    sample to compute a value of a sample statistic
    that serves as an estimate of a population
    parameter.
  • We refer to $\bar{x}$ as the point estimator of
    the population mean $\mu$.
  • s is the point estimator of the population
    standard deviation $\sigma$.
  • $\bar{p}$ is the point estimator of the
    population proportion p.

16
Sampling Error
  • When the expected value of a point estimator is
    equal to the population parameter, the point
    estimator is said to be unbiased.
  • The absolute value of the difference between an
    unbiased point estimate and the corresponding
    population parameter is called the sampling
    error.
  • Sampling error is the result of using a subset
    of the population (the sample), and not the
    entire population.
  • Statistical methods can be used to make
    probability statements about the size of the
    sampling error.

17
Sampling Error
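  • The sampling errors are
  • $|\bar{x} - \mu|$ for the sample mean
  • $|s - \sigma|$ for the sample standard deviation
  • $|\bar{p} - p|$ for the sample proportion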

18
Example: UCF BUSINESS STUDENTS
  • The director of admissions would like to know
    the following information:
  • the average SAT score for the applicants, and
  • the proportion of applicants that want to live on
    campus.
  • We will now look at two alternatives for
    obtaining the desired information:
  • Conducting a census of the entire 900 applicants
  • Selecting a sample of 30 applicants

19
Example: UCF
  • Taking a Census of the 900 Applicants
  • SAT Scores
  • Population Mean: $\mu = 990$
  • Population Standard Deviation: $\sigma = 80$
  • Applicants Wanting On-Campus Housing
  • Population Proportion: $p = .72$

20
Example: UCF
  • Take a Sample of 30 Applicants Using a Random
    Number Table
  • We will need 3-digit random numbers to randomly
    select applicants numbered from 1 to 900.
  • We will use the last three digits of the
    5-digit random numbers in the third column of a
    random number table. The numbers we draw will be
    the numbers of the applicants we will sample
    unless
  • the random number is greater than 900 or
  • the random number has already been used.
  • We will continue to draw random numbers until we
    have selected 30 applicants for our sample.

21
Example: UCF
  • Use of Random Numbers for Sampling

    3-Digit Random Number    Applicant Included in Sample
    744                      No. 744
    436                      No. 436
    865                      No. 865
    790                      No. 790
    835                      No. 835
    902                      Number exceeds 900
    190                      No. 190
    436                      Number already used
    etc.                     etc.

22
Example: UCF
  • Sample Data

    No.   Random Number   Applicant         SAT Score   On-Campus
    1     744             Connie Reyman     1025        Yes
    2     436             William Fox       950         Yes
    3     865             Fabian Avante     1090        No
    4     790             Eric Paxton       1120        Yes
    5     835             Winona Wheeler    1015        No
    .     .               .                 .           .
    30    685             Kevin Cossack     965         No

23
Example: UCF
  • Take a Sample of 30 Applicants Using
    Computer-Generated Random Numbers
  • Excel provides a function for generating random
    numbers in its worksheet.
  • 900 random numbers are generated, one for each
    applicant in the population.
  • Then we choose the 30 applicants corresponding to
    the 30 smallest random numbers as our sample.
  • Each of the 900 applicants has the same
    probability of being included.
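
The same "random number beside each row, then sort" idea can be sketched outside a spreadsheet. The Python below is a minimal illustration under stated assumptions: the function name `sample_by_random_keys` and the seed are hypothetical, and applicants are represented only by their numbers 1 to 900.

```python
import random


def sample_by_random_keys(population, sample_size, rng):
    """Attach a random key to every unit, sort by key, keep the first n units."""
    keyed = [(rng.random(), unit) for unit in population]  # like RAND() beside each row
    keyed.sort()                                           # sort rows by random number
    return [unit for _, unit in keyed[:sample_size]]       # the n smallest keys


rng = random.Random(2024)           # arbitrary seed, used only for reproducibility
applicants = list(range(1, 901))    # applicants numbered 1..900
sample = sample_by_random_keys(applicants, 30, rng)
print(sorted(sample))
```

Because every applicant receives a key from the same distribution, each one is equally likely to land among the 30 smallest keys, and no applicant can be chosen twice.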

24
Using Excel to Select a Simple Random Sample
  • Formula Worksheet

Note: Rows 10-901 are not shown.
25
Using Excel to Select a Simple Random Sample
  • Value Worksheet

Note: Rows 10-901 are not shown.
26
Using Excel to Select a Simple Random Sample
  • Value Worksheet (Sorted)

Note: Rows 10-901 are not shown.
27
Example: UCF
  • Point Estimates
  • $\bar{x}$ as Point Estimator of $\mu$
  • s as Point Estimator of $\sigma$
  • $\bar{p}$ as Point Estimator of p
  • Note: Different random numbers would have
    identified a different sample, which would have
    resulted in different point estimates.
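
As a rough illustration of how the three point estimates are computed, the Python sketch below uses only the six applicants listed on the Sample Data slide (the other 24 rows are not in the transcript), so its results will not match the estimates from the full sample of 30. The variable names are chosen for the example.

```python
import math

# The six rows visible on the Sample Data slide (not the full sample of 30).
sat_scores = [1025, 950, 1090, 1120, 1015, 965]
wants_housing = ["Yes", "Yes", "No", "Yes", "No", "No"]

n = len(sat_scores)

# Sample mean: point estimator of the population mean.
x_bar = sum(sat_scores) / n

# Sample standard deviation (divisor n - 1): point estimator of sigma.
s = math.sqrt(sum((x - x_bar) ** 2 for x in sat_scores) / (n - 1))

# Sample proportion: point estimator of the population proportion p.
p_bar = wants_housing.count("Yes") / n

print(x_bar, round(s, 2), p_bar)
```

A different set of six applicants would give different values, which is the point of the note on this slide.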

28
Summary of Point Estimates Obtained from a Simple Random Sample

    Population Parameter                   Parameter Value   Point Estimator                       Point Estimate
    $\mu$ = Population mean SAT score      990               $\bar{x}$ = Sample mean SAT score     997
    $\sigma$ = Population std. deviation   80                s = Sample std. deviation             75.2
      for SAT score                                            for SAT score
    p = Population proportion wanting      .72               $\bar{p}$ = Sample proportion         .68
      campus housing                                           wanting campus housing
29
Sampling Distribution of $\bar{x}$
  • Process of Statistical Inference

Population with mean $\mu$ = ?
A simple random sample of n elements is
selected from the population.
30
Sampling Distribution of $\bar{x}$
  • The sampling distribution of $\bar{x}$ is the
    probability distribution of all possible values
    of the sample mean $\bar{x}$.
  • Expected Value of $\bar{x}$
  • $E(\bar{x}) = \mu$
  • where
  • $\mu$ = the population mean
  • For this class: ALWAYS INFINITE POPULATION

31
Sampling Distribution of $\bar{x}$
  • Standard Deviation of $\bar{x}$
  • FOR THIS CLASS: ALWAYS INFINITE POPULATION
  • $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
  • $\sigma_{\bar{x}}$ is referred to as the standard
    error of the mean.

32
Example
  • Let us assume a population of N = 5.
  • There are 5 possible samples of size 4 (order is
    not important).
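
The count of 5 samples can be checked by listing them. The Python sketch below uses a hypothetical population of five values (the slide's actual values are not in the transcript) and also shows that the average of all possible sample means equals the population mean.

```python
from itertools import combinations

population = [2, 4, 6, 8, 10]              # hypothetical values, N = 5
mu = sum(population) / len(population)     # population mean

# Every possible sample of size 4, ignoring order.
samples = list(combinations(population, 4))
sample_means = [sum(sample) / len(sample) for sample in samples]

# Each sample is equally likely, so E(x-bar) is the plain average of the means.
mean_of_means = sum(sample_means) / len(sample_means)

for sample, mean in zip(samples, sample_means):
    print(sample, mean)
print(len(samples), mu, mean_of_means)
```

No single sample mean has to equal the population mean; it is their average over all possible samples that does.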

33
Continued
34
Central Limit Theorem
  • In selecting a random sample of size n from a
    population, the sampling distribution of the
    sample mean $\bar{x}$ can be approximated by a
    normal probability distribution as the sample
    size becomes large.
  • The sampling distribution of $\bar{x}$ can be
    approximated by a normal probability distribution
    whenever the sample size is large. The
    large-sample condition can be assumed for simple
    random samples of size 30 or more.
  • Whenever the population has a normal probability
    distribution, the sampling distribution of
    $\bar{x}$ has a normal probability distribution
    for any sample size.
  • Please go to www.ruf.rice.edu./lane/rvls.html
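
A small simulation can make the theorem concrete. The Python sketch below is an illustration only: it assumes a skewed (exponential) population with mean 1 and standard deviation 1, and the seed, number of repetitions, and variable names are arbitrary choices for the example.

```python
import math
import random
import statistics

rng = random.Random(7)      # arbitrary seed, used only for reproducibility
n = 30                      # sample size
reps = 5000                 # number of repeated samples

# Draw many samples of size n from a skewed population and record each mean.
sample_means = []
for _ in range(reps):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    sample_means.append(sum(sample) / n)

center = statistics.mean(sample_means)    # should be close to mu = 1
spread = statistics.stdev(sample_means)   # should be close to sigma / sqrt(n)

print(round(center, 3), round(spread, 3), round(1 / math.sqrt(n), 3))
```

Even though individual observations are far from normal, a histogram of `sample_means` would look roughly bell-shaped, centered near 1 with spread near 1/√30.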

35
Sampling Distribution of $\bar{x}$
  • If we use a large (n ≥ 30) simple random sample,
    the central limit theorem enables us to conclude
    that the sampling distribution of $\bar{x}$ can
    be approximated by a normal probability
    distribution.
  • When the simple random sample is small (n < 30),
    the sampling distribution of $\bar{x}$ can be
    considered normal only if we assume the
    population has a normal probability distribution.

36
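Example: UCF
  • Sampling Distribution of $\bar{x}$ for the SAT
    Scores
  • $E(\bar{x}) = \mu = 990$
  • $\sigma_{\bar{x}} = \sigma / \sqrt{n} = 80 / \sqrt{30} = 14.6$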


37
Example: UCF BUSINESS STUDENTS
  • Sampling Distribution of $\bar{x}$ for the SAT
    Scores
  • What is the probability that a simple random
    sample of 30 applicants will provide an estimate
    of the population mean SAT score that is within
    plus or minus 10 of the actual population mean
    $\mu$?
  • In other words, what is the probability that
    $\bar{x}$ will be between 980 and 1000?

38
Example: UCF
  • Sampling Distribution of $\bar{x}$ for the SAT
    Scores
  • Using the standard normal probability table with
    z = 10/14.6 = .68, we have area = (.2518)(2) =
    .5036

[Figure: sampling distribution of $\bar{x}$, centered at 990, with area .2518 on each side between 980 and 1000]
39
  • Suppose we select a simple random sample of 100
    applicants instead of the 30 originally
    considered.

40
(No Transcript)
41
(No Transcript)
42
[Figure: sampling distribution of $\bar{x}$ for n = 100, centered at 990; the area between 980 and 1000 is .7888]
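
The two probabilities for the SAT example (n = 30 and n = 100) can be reproduced without a table. The Python sketch below is an illustration; the helper names `normal_cdf` and `prob_within` are made up for the example, and the results differ slightly from the table-based figures because z is not rounded to two decimals.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative probability, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_within(margin, sigma, n):
    """P(|x-bar - mu| <= margin) when x-bar is approximately normal."""
    standard_error = sigma / math.sqrt(n)   # sigma / sqrt(n)
    z = margin / standard_error
    return 2.0 * normal_cdf(z) - 1.0        # area between -z and +z


p30 = prob_within(margin=10, sigma=80, n=30)
p100 = prob_within(margin=10, sigma=80, n=100)
print(round(p30, 4), round(p100, 4))
```

Raising n from 30 to 100 shrinks the standard error from about 14.6 to 8, which is why the probability of landing within 10 points of the population mean rises from roughly .50 to roughly .79.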
43
Sampling Distribution of $\bar{p}$
  • The sampling distribution of $\bar{p}$ is the
    probability distribution of all possible values
    of the sample proportion $\bar{p}$.
  • Expected Value of $\bar{p}$
  • $E(\bar{p}) = p$
  • where
  • p = the population proportion

44
Sampling Distribution of $\bar{p}$
  • Standard Deviation of $\bar{p}$ (Infinite
    Population ONLY)
  • $\sigma_{\bar{p}} = \sqrt{p(1 - p)/n}$
  • $\sigma_{\bar{p}}$ is referred to as the standard
    error of the proportion.

45
np > 5
and
n(1 - p) > 5
46
Example: UCF
  • Sampling Distribution of $\bar{p}$ for Applicants
    Wanting On-Campus Housing
  • Recall that 72% of the prospective students
    applying to UCF desire on-campus housing.
  • What is the probability that a simple random
    sample of 30 applicants will provide an estimate
    of the population proportion of applicants
    desiring on-campus housing that is within plus or
    minus .05 of the actual population proportion?
  • In other words, what is the probability that
    $\bar{p}$ will be between .67 and .77?

47
  • For our example, with n = 30 and p = .72, the
    normal distribution is an acceptable
    approximation because

np = 30(.72) = 21.6 > 5
and
n(1 - p) = 30(.28) = 8.4 > 5
48
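Example: UCF
  • Sampling Distribution of $\bar{p}$ for Applicants
    Wanting On-Campus Housing
  • $E(\bar{p}) = p = .72$
  • $\sigma_{\bar{p}} = \sqrt{.72(1 - .72)/30} = .082$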


49
Example: UCF
  • Sampling Distribution of $\bar{p}$ for Applicants
    Wanting On-Campus Housing
  • For z = .05/.082 = .61, the area = (.2291)(2) =
    .4582.
  • The probability is .4582 that the sample
    proportion will be within ±.05 of the actual
    population proportion.

[Figure: sampling distribution of $\bar{p}$, centered at 0.72, with area .2291 on each side between 0.67 and 0.77]
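
The proportion calculation follows the same pattern as the one for the mean. The Python sketch below is an illustration with variable names chosen for the example; it checks the two large-sample conditions, computes the standard error of the proportion, and then the probability of being within .05 of p.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative probability, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n, p, margin = 30, 0.72, 0.05

# Large-sample conditions for using the normal approximation.
normal_ok = n * p > 5 and n * (1 - p) > 5

# Standard error of the proportion: sqrt(p(1 - p) / n).
standard_error = math.sqrt(p * (1 - p) / n)

# Probability that the sample proportion falls within +/- margin of p.
prob = 2.0 * normal_cdf(margin / standard_error) - 1.0

print(normal_ok, round(standard_error, 3), round(prob, 4))
```

The result is close to the table-based .4582; the small difference comes from rounding z to .61 when using the table.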
50
Properties of Point Estimators
  • Before using a sample statistic as a point
    estimator, statisticians check to see whether the
    sample statistic has the following properties
    associated with good point estimators.
  • Unbiasedness
  • Efficiency
  • Consistency

51
Properties of Point Estimators
  • Unbiasedness
  • If the expected value of the sample statistic
    is equal to the population parameter being
    estimated, the sample statistic is said to be an
    unbiased estimator of the population parameter.

52
Properties of Point Estimators
  • Efficiency
  • Given the choice of two unbiased estimators of
    the same population parameter, we would prefer to
    use the point estimator with the smaller standard
    deviation, since it tends to provide estimates
    closer to the population parameter.
  • The point estimator with the smaller standard
    deviation is said to have greater relative
    efficiency than the other.

53
Properties of Point Estimators
  • Consistency
  • A point estimator is consistent if the values
    of the point estimator tend to become closer to
    the population parameter as the sample size
    becomes larger.

54
Other Sampling Methods
  • Stratified Random Sampling
  • Cluster Sampling
  • Systematic Sampling
  • Convenience Sampling
  • Judgment Sampling

55
Stratified Random Sampling
  • The population is first divided into groups of
    elements called strata.
  • Each element in the population belongs to one
    and only one stratum.
  • Best results are obtained when the elements
    within each stratum are as much alike as
    possible (i.e., a homogeneous group).
56
Stratified Random Sampling
  • A simple random sample is taken from each
    stratum.
  • Formulas are available for combining the
    stratum sample results into one population
    parameter estimate.
  • Advantage: If strata are homogeneous, this
    method is as precise as simple random sampling
    but with a smaller total sample size.
  • Example: The basis for forming the strata might
    be department, location, age, industry type, and
    so on.
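
A minimal Python sketch of the selection step is shown below. The strata (three departments), their sizes, and the 10% allocation per stratum are hypothetical choices for illustration; the slides do not specify an allocation rule.

```python
import random


def stratified_sample(strata, per_stratum, rng):
    """Take a simple random sample (without replacement) from every stratum."""
    return {name: rng.sample(units, per_stratum[name])
            for name, units in strata.items()}


rng = random.Random(11)   # arbitrary seed, used only for reproducibility

# Hypothetical frame: employee numbers grouped by department.
strata = {
    "Accounting": list(range(1, 41)),      # 40 employees
    "Finance": list(range(41, 101)),       # 60 employees
    "Marketing": list(range(101, 201)),    # 100 employees
}
per_stratum = {"Accounting": 4, "Finance": 6, "Marketing": 10}  # 10% of each

sample = stratified_sample(strata, per_stratum, rng)
for name, units in sample.items():
    print(name, sorted(units))
```

Every stratum is guaranteed to appear in the sample, which a single simple random sample of 20 from all 200 employees would not guarantee.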
57
Cluster Sampling
  • The population is first divided into separate
    groups of elements called clusters.
  • Ideally, each cluster is a representative
    small-scale version of the population (i.e., a
    heterogeneous group).
  • A simple random sample of the clusters is then
    taken.
  • All elements within each sampled (chosen)
    cluster form the sample.
58
Cluster Sampling
  • Example: A primary application is area
    sampling, where clusters are city blocks or
    other well-defined areas.
  • Advantage: The close proximity of elements can
    be cost effective (i.e., many sample observations
    can be obtained in a short time).
  • Disadvantage: This method generally requires a
    larger total sample size than simple or
    stratified random sampling.
59
Systematic Sampling
  • If a sample size of n is desired from a
    population containing N elements, we might sample
    one element for every N/n elements in the
    population.
  • We randomly select one of the first N/n elements
    from the population list.
  • We then select every (N/n)th element that follows
    in the population list.
  • This method has the properties of a simple random
    sample, especially if the list of the population
    elements is a random ordering.
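
The procedure can be sketched in Python as follows. This is an illustration only: the function name `systematic_sample` is made up, the population is a hypothetical list of 1,000 numbered listings, and the sketch assumes N is a multiple of n so that the sampling interval N/n is a whole number.

```python
import random


def systematic_sample(population, sample_size, rng):
    """Pick a random start among the first k units, then take every k-th unit."""
    step = len(population) // sample_size   # sampling interval k = N / n
    start = rng.randrange(step)             # random position within the first k units
    return population[start::step][:sample_size]


rng = random.Random(3)                  # arbitrary seed, used only for reproducibility
listings = list(range(1, 1001))         # hypothetical population, N = 1000
sample = systematic_sample(listings, 10, rng)
print(sample)
```

Only the starting point is random; once it is fixed, the rest of the sample is determined, which is why the ordering of the list matters.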

60
Systematic Sampling
  • Advantage: The sample usually will be easier to
    identify than it would be if simple random
    sampling were used.
  • Example: Selecting every 100th listing in a
    telephone book after the first randomly selected
    listing.

61
Convenience Sampling
  • It is a nonprobability sampling technique. Items
    are included in the sample without known
    probabilities of being selected.
  • The sample is identified primarily by
    convenience.
  • Advantage: Sample selection and data collection
    are relatively easy.
  • Disadvantage: It is impossible to determine how
    representative of the population the sample is.
  • Example: A professor conducting research might
    use student volunteers to constitute a sample.

62
Judgment Sampling
  • The person most knowledgeable on the subject of
    the study selects elements of the population that
    he or she feels are most representative of the
    population.
  • It is a nonprobability sampling technique.
  • Advantage: It is a relatively easy way of
    selecting a sample.
  • Disadvantage: The quality of the sample results
    depends on the judgment of the person selecting
    the sample.
  • Example: A reporter might sample three or four
    senators, judging them as reflecting the general
    opinion of the senate.

63
END LESSON 1