Title: Statistical Inference
1Lesson 1
- Statistical Inference
- Random Sampling
2Sampling and Sampling Distributions
- Simple Random Sampling
- Point Estimation
- Introduction to Sampling Distributions
- Sampling Distribution of $\bar{x}$
- Sampling Distribution of $\bar{p}$
- Properties of Point Estimators
- Other Sampling Methods
3Statistical Inference
- The purpose of statistical inference is to obtain information about a population from information contained in a sample.
- A population is the set of all the elements of interest.
- A sample is a subset of the population.
- The sample results provide only estimates of the values of the population characteristics.
- A parameter is a numerical characteristic of a population.
- With proper sampling methods, the sample results will provide good estimates of the population characteristics.
4Simple Random Sampling: Finite Population
- Finite populations are often defined by lists such as:
- Organization membership roster
- Credit card account numbers
- Inventory product numbers
- A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.
5Simple Random Sampling: Finite Population
- Replacing each sampled element before selecting subsequent elements is called sampling with replacement.
- Sampling without replacement is the procedure used most often.
- In large sampling projects, computer-generated random numbers are often used to automate the sample selection process.
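The difference between the two procedures can be sketched with Python's standard `random` module. This is only an illustration: the population of account numbers, its size N = 1000, the sample size n = 10, and the seed are all assumptions made for the example.

```python
import random

# Hypothetical finite population: account numbers 1..N (values chosen for illustration).
N = 1000
n = 10
population = list(range(1, N + 1))

rng = random.Random(42)  # fixed seed only so the sketch is reproducible

# Sampling without replacement: an element cannot be selected twice.
without_replacement = rng.sample(population, n)

# Sampling with replacement: every draw is made from the full population,
# so the same element may appear more than once.
with_replacement = rng.choices(population, k=n)

print(without_replacement)
print(with_replacement)
```

With a population this large relative to n, repeats under sampling with replacement are uncommon, but they are possible.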
6Simple Random Sampling: Infinite Population
- Infinite populations are often defined by an ongoing process whereby the elements of the population consist of items generated as though the process would operate indefinitely.
- A simple random sample from an infinite population is a sample selected such that the following conditions are satisfied.
- Each element selected comes from the same population.
- Each element is selected independently.
7Simple Random Sampling: Infinite Population
- In the case of infinite populations, it is impossible to obtain a list of all elements in the population.
- The random number selection procedure cannot be used for infinite populations.
8Random Sampling
- The basis for statistical inference about a population based on a sample
- Example: Build a restaurant in the neighborhood?
- Population: the collection of items you want to understand
- N items. Example: all people in the neighborhood
- Sample: a smaller collection of population units
- n items. Example: 100 neighborhood residents who agree to be interviewed
- Which 100 residents?
- How to select?
9Random Sample
- A random sample must satisfy:
- 1. Each population unit must have an equal chance of being selected
- This helps assure representation, because all units in the population are equally accessible
- 2. Units must be selected independently of one another
- This guarantees that each item to be selected will bring new, independent information
- Properties
- Sample is representative of population (on average)
10Selecting a Random Sample
- Use Table of Random Digits
- Establish the frame (population units from 1 to N)
- Decide starting place in table of random digits
- Read random digits in groups
- e.g., if N = 5,281, then use groups of 4 digits (N has 4 digits)
- Include number group
- if it is from 1 to N, and has not yet been chosen
- Shuffle the Population (Spreadsheet)
- Arrange the population items in a column from 1 to N
- Put random numbers in an adjacent column: =RAND()
- Sort population items in order by random numbers
11Table of Random Digits
- For example
- Starting in row 21, column 3
- We find 52794, then 01466
12Example: Sample Selection
- Select random sample using random number table
- Of size n = 4 from a population of size N = 861, starting at row 21, column 3 of the Table of Random Digits
- Start with random digits
- 52794 01466 85938 14565 79993
- Group by 3 (because N = 861 has 3 digits)
- 527 940 146 685 938 145 657
- Omit 000, and also 862, 863, ..., 999
- Omit duplicates, until n = 4 are obtained in the sample
- 527 146 685 145
- The random sample includes the following population units (numbered by frame)
- 527, 146, 685, 145
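The selection steps in this example can be traced in a few lines of Python. The function name `select_from_digits` is made up for this sketch; the digit string, N = 861 and n = 4 come from the example itself.

```python
def select_from_digits(digits, N, n):
    """Pick up to n distinct frame numbers in 1..N by reading digits in groups."""
    width = len(str(N))               # group size = number of digits in N
    digits = digits.replace(" ", "")  # read the table row as one digit stream
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        value = int(digits[i:i + width])
        # Keep the group only if it is a valid frame number not yet chosen.
        if 1 <= value <= N and value not in chosen:
            chosen.append(value)
        if len(chosen) == n:
            break
    return chosen

sample = select_from_digits("52794 01466 85938 14565 79993", N=861, n=4)
print(sample)  # [527, 146, 685, 145]
```

The groups 940 and 938 are skipped because they exceed 861. If the digit string runs out before n numbers are found, the function simply returns fewer than n values.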
13Terminology of Sampling
- Representative Sample
- Same percentages in sample as in population
- Example: sample is representative if same percent work, are young/old, are single/married, etc.
- Biased Sample
- Not representative of population in an important way
- Example: sample is biased if too many retired people
14Statistic and Parameter
- Sample Statistic
- Any number computed from sample data
- A random variable. Known
- Example: average weekly food expenditures for 100 sampled residents
- Random? Yes! Due to randomness of sample selection
- Population Parameter
- Any number computed for the entire population
- A fixed number. Unknown
- Example: mean weekly food expenditures for all 77,386 residents
- Do we ever know this? NO!
- But we estimate it (with error)
15Point Estimation
- In point estimation we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter.
- We refer to $\bar{x}$ as the point estimator of the population mean $\mu$.
- s is the point estimator of the population standard deviation $\sigma$.
- $\bar{p}$ is the point estimator of the population proportion p.
16Sampling Error
- When the expected value of a point estimator is equal to the population parameter, the point estimator is said to be unbiased.
- The absolute value of the difference between an unbiased point estimate and the corresponding population parameter is called the sampling error.
- Sampling error is the result of using a subset of the population (the sample), and not the entire population.
- Statistical methods can be used to make probability statements about the size of the sampling error.
17Sampling Error
18Example: UCF BUSINESS STUDENTS
- The director of admissions would like to know the following information:
- the average SAT score for the applicants, and
- the proportion of applicants that want to live on campus.
- We will now look at two alternatives for obtaining the desired information.
- Conducting a census of the entire 900 applicants
- Selecting a sample of 30 applicants
19Example: UCF
- Taking a Census of the 900 Applicants
- SAT Scores
- Population Mean: $\mu = 990$
- Population Standard Deviation: $\sigma = 80$
- Applicants Wanting On-Campus Housing
- Population Proportion: $p = .72$
20Example: UCF
- Take a Sample of 30 Applicants Using a Random Number Table
- We will need 3-digit random numbers to randomly select applicants numbered from 1 to 900.
- We will use the last three digits of the 5-digit random numbers in the third column of a random number table. The numbers we draw will be the numbers of the applicants we will sample unless
- the random number is greater than 900 or
- the random number has already been used.
- We will continue to draw random numbers until we have selected 30 applicants for our sample.
21Example: UCF
- Use of Random Numbers for Sampling

3-Digit Random Number    Applicant Included in Sample
744                      No. 744
436                      No. 436
865                      No. 865
790                      No. 790
835                      No. 835
902                      Number exceeds 900
190                      No. 190
436                      Number already used
etc.                     etc.
22Example: UCF
- Sample Data

No.   Random Number   Applicant        SAT Score   On-Campus
1     744             Connie Reyman    1025        Yes
2     436             William Fox      950         Yes
3     865             Fabian Avante    1090        No
4     790             Eric Paxton      1120        Yes
5     835             Winona Wheeler   1015        No
.     .               .                .           .
30    685             Kevin Cossack    965         No
23Example: UCF
- Take a Sample of 30 Applicants Using Computer-Generated Random Numbers
- Excel provides a function for generating random numbers in its worksheet.
- 900 random numbers are generated, one for each applicant in the population.
- Then we choose the 30 applicants corresponding to the 30 smallest random numbers as our sample.
- Each of the 900 applicants has the same probability of being included.
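The same "one random number per applicant, keep the 30 smallest" idea can be sketched outside a spreadsheet. The snippet below is an illustration in Python rather than the Excel worksheet used in the slides; the seed and variable names are assumptions.

```python
import random

rng = random.Random(2024)  # arbitrary seed, used only for reproducibility

applicants = list(range(1, 901))  # applicants numbered 1..900

# One uniform random number per applicant, playing the role of a RAND() column.
random_numbers = [rng.random() for _ in applicants]

# Sort applicants by their random number and keep the 30 smallest.
ranked = sorted(zip(random_numbers, applicants))
sample_ids = [applicant for _, applicant in ranked[:30]]

print(sample_ids)
```

Because the random numbers are assigned independently of the applicants, every applicant is equally likely to land among the 30 smallest, and no applicant can be chosen twice.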
24Using Excel to Select a Simple Random Sample
Note: Rows 10-901 are not shown.
27Example: UCF
- Point Estimates
- $\bar{x}$ as Point Estimator of $\mu$
- s as Point Estimator of $\sigma$
- $\bar{p}$ as Point Estimator of p
- Note: Different random numbers would have identified a different sample, which would have resulted in different point estimates.
28Summary of Point Estimates Obtained from a Simple Random Sample

Population Parameter                                  Parameter Value   Point Estimator                                        Point Estimate
$\mu$ = Population mean SAT score                     990               $\bar{x}$ = Sample mean SAT score                      997
$\sigma$ = Population std. deviation for SAT score    80                s = Sample std. deviation for SAT score                75.2
p = Population proportion wanting campus housing      .72               $\bar{p}$ = Sample proportion wanting campus housing   .68
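As a small worked check, the sampling error of each estimate in the summary above is the absolute difference between the point estimate and the parameter value. The dictionary keys below are labels invented for this sketch; the numbers are the ones in the summary.

```python
# Parameter values and point estimates taken from the summary of the UCF example.
parameters = {"mean SAT": 990, "std dev SAT": 80, "proportion on campus": 0.72}
estimates = {"mean SAT": 997, "std dev SAT": 75.2, "proportion on campus": 0.68}

# Sampling error = |point estimate - population parameter|
sampling_error = {
    name: abs(estimates[name] - parameters[name]) for name in parameters
}

for name, error in sampling_error.items():
    print(f"{name}: {error:.2f}")
```

The errors come out to roughly 7 points for the mean, 4.8 points for the standard deviation, and .04 for the proportion.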
29Sampling Distribution of $\bar{x}$
- Process of Statistical Inference
Population with mean $\mu = ?$
A simple random sample of n elements is
selected from the population.
30Sampling Distribution of $\bar{x}$
- The sampling distribution of $\bar{x}$ is the probability distribution of all possible values of the sample mean $\bar{x}$.
- Expected Value of $\bar{x}$
- $E(\bar{x}) = \mu$
- where
- $\mu$ = the population mean
- For this class: ALWAYS INFINITE POPULATION
31Sampling Distribution of $\bar{x}$
- Standard Deviation of $\bar{x}$
- FOR THIS CLASS: ALWAYS INFINITE POPULATION
- $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
- $\sigma_{\bar{x}}$ is referred to as the standard error of the mean.
32Example
- Let us assume a population of N = 5
- We can have 5 possible samples of size 4 (order is not important).
33Continued
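To make the small example concrete, the sketch below enumerates every sample of size 4 from a population of 5 elements. The five population values are hypothetical, since the transcript does not list them.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of N = 5 values (not taken from the slides).
population = [10, 20, 30, 40, 50]

# All samples of size 4 when order is not important: C(5, 4) = 5 of them.
samples = list(combinations(population, 4))
sample_means = [mean(s) for s in samples]

for s, m in zip(samples, sample_means):
    print(s, m)

# The mean of all possible sample means equals the population mean.
print(mean(sample_means), mean(population))
```

The list of sample means, each with probability 1/5, is the sampling distribution of the sample mean for this tiny population, and its average matches the population mean of 30.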
34Central Limit Theorem
- In selecting a random sample of size n from a population, the sampling distribution of the sample mean $\bar{x}$ can be approximated by a normal probability distribution as the sample size becomes large.
- The sampling distribution of $\bar{x}$ can be approximated by a normal probability distribution whenever the sample size is large. The large-sample condition can be assumed for simple random samples of size 30 or more.
- Whenever the population has a normal probability distribution, the sampling distribution of $\bar{x}$ has a normal probability distribution for any sample size.
- Please go to www.ruf.rice.edu./lane/rvls.html
35Sampling Distribution of $\bar{x}$
- If we use a large ($n \geq 30$) simple random sample, the central limit theorem enables us to conclude that the sampling distribution of $\bar{x}$ can be approximated by a normal probability distribution.
- When the simple random sample is small ($n < 30$), the sampling distribution of $\bar{x}$ can be considered normal only if we assume the population has a normal probability distribution.
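A quick simulation can illustrate the large-sample claim. The sketch below is an assumption-laden illustration: it draws samples of size 30 from an exponential population (strongly skewed, with mean 1 and standard deviation 1), which is not a population used in the slides.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed for reproducibility

def sample_mean(n):
    # One sample of size n from an exponential population with mean 1, std dev 1.
    return mean([rng.expovariate(1.0) for _ in range(n)])

n = 30
means = [sample_mean(n) for _ in range(5000)]

center = mean(means)   # expected to be close to the population mean, 1
spread = stdev(means)  # expected to be close to sigma / sqrt(n)

print(center, spread, 1 / sqrt(n))
```

Even though the population is far from normal, the simulated sample means cluster around 1 with a spread near $1/\sqrt{30}$, and a histogram of `means` would look roughly bell-shaped.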
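36Example: UCF
- Sampling Distribution of $\bar{x}$ for the SAT Scores
- $E(\bar{x}) = \mu = 990$
- $\sigma_{\bar{x}} = \sigma / \sqrt{n} = 80 / \sqrt{30} = 14.6$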
37Example: UCF BUSINESS STUDENTS
- Sampling Distribution of $\bar{x}$ for the SAT Scores
- What is the probability that a simple random sample of 30 applicants will provide an estimate of the population mean SAT score that is within plus or minus 10 of the actual population mean $\mu$?
- In other words, what is the probability that $\bar{x}$ will be between 980 and 1000?
38Example: UCF
- Sampling Distribution of $\bar{x}$ for the SAT Scores
- Using the standard normal probability table with $z = 10/14.6 = .68$, we have area = (.2518)(2) = .5036
[Figure: sampling distribution of $\bar{x}$, centered at 990, with area .2518 between 980 and 990 and area .2518 between 990 and 1000]
39- Suppose we select a simple random sample of 100 applicants instead of the 30 originally considered.
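42- With n = 100, $\sigma_{\bar{x}} = 80/\sqrt{100} = 8$ and $z = 10/8 = 1.25$, so the area = (.3944)(2) = .7888.
[Figure: sampling distribution of $\bar{x}$ for n = 100, centered at 990, with area .7888 between 980 and 1000]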
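The two probabilities for the SAT example (n = 30 and n = 100) can be checked without a normal table. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `prob_within` is made up here. Because it does not round z to two decimals, the n = 30 result differs slightly from the table-based .5036.

```python
from math import sqrt
from statistics import NormalDist

def prob_within(margin, sigma, n):
    """P(|xbar - mu| <= margin) under the normal approximation."""
    standard_error = sigma / sqrt(n)  # sigma_xbar = sigma / sqrt(n)
    z = margin / standard_error
    return 2 * NormalDist().cdf(z) - 1

p30 = prob_within(margin=10, sigma=80, n=30)    # about .51 (table value .5036)
p100 = prob_within(margin=10, sigma=80, n=100)  # about .79 (table value .7888)

print(round(p30, 4), round(p100, 4))
```

Raising the sample size from 30 to 100 shrinks the standard error from about 14.6 to 8, which is why the probability of landing within 10 points of the population mean rises so sharply.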
43Sampling Distribution of $\bar{p}$
- The sampling distribution of $\bar{p}$ is the probability distribution of all possible values of the sample proportion $\bar{p}$.
- Expected Value of $\bar{p}$
- $E(\bar{p}) = p$
- where
- p = the population proportion
44Sampling Distribution of $\bar{p}$
- Standard Deviation of $\bar{p}$ (Infinite Population ONLY)
- $\sigma_{\bar{p}} = \sqrt{p(1 - p)/n}$
- $\sigma_{\bar{p}}$ is referred to as the standard error of the proportion.
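45- The sampling distribution of $\bar{p}$ can be approximated by a normal probability distribution whenever $np > 5$ and $n(1 - p) > 5$.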
46Example: UCF
- Sampling Distribution of $\bar{p}$ for In-State Residents
- Recall that 72% of the prospective students applying to UCF desire on-campus housing.
- What is the probability that a simple random sample of 30 applicants will provide an estimate of the population proportion of applicants desiring on-campus housing that is within plus or minus .05 of the actual population proportion?
- In other words, what is the probability that $\bar{p}$ will be between .67 and .77?
47- For our example, with n = 30 and p = .72, the normal distribution is an acceptable approximation because
np = 30(.72) = 21.6 > 5
and
n(1 - p) = 30(.28) = 8.4 > 5
48Example: UCF
- Sampling Distribution of $\bar{p}$ for In-State Residents
- $E(\bar{p}) = p = .72$
- $\sigma_{\bar{p}} = \sqrt{.72(1 - .72)/30} = .082$
49Example: UCF
- Sampling Distribution of $\bar{p}$ for In-State Residents
- For $z = .05/.082 = .61$, the area = (.2291)(2) = .4582.
- The probability is .4582 that the sample proportion will be within +/-.05 of the actual population proportion.
[Figure: sampling distribution of $\bar{p}$, centered at 0.72, with area .2291 between 0.67 and 0.72 and area .2291 between 0.72 and 0.77]
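The proportion calculation can be reproduced in the same way. This is a sketch using the standard library's `NormalDist`; the variable names are illustrative, and small differences from .4582 come from not rounding the standard error and z.

```python
from math import sqrt
from statistics import NormalDist

p = 0.72       # population proportion wanting on-campus housing
n = 30         # sample size
margin = 0.05  # "within plus or minus .05"

# Check the conditions for the normal approximation.
normal_ok = n * p > 5 and n * (1 - p) > 5

standard_error = sqrt(p * (1 - p) / n)  # sigma_pbar, about .082
z = margin / standard_error             # about .61
prob = 2 * NormalDist().cdf(z) - 1      # about .458

print(normal_ok, round(standard_error, 3), round(z, 2), round(prob, 4))
```

A sample of only 30 applicants therefore has less than an even chance of estimating the housing proportion to within five percentage points.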
50Properties of Point Estimators
- Before using a sample statistic as a point estimator, statisticians check to see whether the sample statistic has the following properties associated with good point estimators.
- Unbiasedness
- Efficiency
- Consistency
51Properties of Point Estimators
- Unbiasedness
- If the expected value of the sample statistic
is equal to the population parameter being
estimated, the sample statistic is said to be an
unbiased estimator of the population parameter.
52Properties of Point Estimators
- Efficiency
- Given the choice of two unbiased estimators of the same population parameter, we would prefer to use the point estimator with the smaller standard deviation, since it tends to provide estimates closer to the population parameter.
- The point estimator with the smaller standard deviation is said to have greater relative efficiency than the other.
53Properties of Point Estimators
- Consistency
- A point estimator is consistent if the values
of the point estimator tend to become closer to
the population parameter as the sample size
becomes larger.
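Consistency can be illustrated by simulation. The sketch below assumes a normal population with mean 990 and standard deviation 80 (the SAT figures from the UCF example; the normal shape is an assumption) and measures how far the sample mean typically falls from the population mean for several sample sizes.

```python
import random

rng = random.Random(1)  # fixed seed for reproducibility
MU, SIGMA = 990, 80     # assumed population mean and standard deviation

def average_abs_error(n, reps=500):
    """Average of |xbar - MU| over `reps` simulated samples of size n."""
    total = 0.0
    for _ in range(reps):
        xbar = sum(rng.gauss(MU, SIGMA) for _ in range(n)) / n
        total += abs(xbar - MU)
    return total / reps

errors_by_n = {n: average_abs_error(n) for n in (10, 100, 1000)}

for n, err in errors_by_n.items():
    print(n, round(err, 2))
```

The typical error shrinks as n grows, roughly in proportion to $1/\sqrt{n}$, which is the behavior the consistency property describes.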
54Other Sampling Methods
- Stratified Random Sampling
- Cluster Sampling
- Systematic Sampling
- Convenience Sampling
- Judgment Sampling
55Stratified Random Sampling
The population is first divided into groups of
elements called strata.
Each element in the population belongs to one
and only one stratum.
Best results are obtained when the elements
within each stratum are as much alike as
possible (i.e. a homogeneous group).
56Stratified Random Sampling
A simple random sample is taken from each
stratum.
Formulas are available for combining the
stratum sample results into one population
parameter estimate.
Advantage: If strata are homogeneous, this
method is as precise as simple random sampling
but with a smaller total sample size.
Example: The basis for forming the strata might
be department, location, age, industry type, and
so on.
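A minimal sketch of stratified random sampling follows. The population, the department labels, and the 10% sampling fraction are all hypothetical, and taking the same fraction from every stratum (proportional allocation) is just one common choice, not something the slides prescribe.

```python
import random
from collections import defaultdict

rng = random.Random(7)  # fixed seed for reproducibility

# Hypothetical population: each element is (id, department), department = stratum.
departments = ["sales"] * 60 + ["engineering"] * 30 + ["finance"] * 10
population = list(enumerate(departments))

# Step 1: divide the population into strata; each element is in exactly one.
strata = defaultdict(list)
for element_id, dept in population:
    strata[dept].append(element_id)

# Step 2: take a simple random sample from each stratum.
sampling_fraction = 0.10  # assumed: the same fraction from every stratum
stratified_sample = {
    dept: rng.sample(members, max(1, round(len(members) * sampling_fraction)))
    for dept, members in strata.items()
}

print({dept: len(ids) for dept, ids in stratified_sample.items()})
```

Every stratum is guaranteed to be represented, which a simple random sample of the same total size would not ensure for the small finance group.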
57Cluster Sampling
The population is first divided into separate
groups of elements called clusters.
Ideally, each cluster is a representative
small-scale version of the population (i.e.
heterogeneous group).
A simple random sample of the clusters is then
taken.
All elements within each sampled (chosen)
cluster form the sample.
58Cluster Sampling
Example: A primary application is area
sampling, where clusters are city blocks or
other well-defined areas.
Advantage: The close proximity of elements can
be cost effective (i.e. many sample observations
can be obtained in a short time).
Disadvantage: This method generally requires a
larger total sample size than simple or
stratified random sampling.
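Cluster sampling can be sketched along the same lines. The 20 city blocks with 8 households each are hypothetical, as is the choice of 4 clusters.

```python
import random

rng = random.Random(3)  # fixed seed for reproducibility

# Hypothetical clusters: 20 city blocks, each containing 8 households.
clusters = {
    block: [f"block{block}-house{h}" for h in range(1, 9)]
    for block in range(1, 21)
}

# Step 1: take a simple random sample of the clusters themselves.
chosen_blocks = rng.sample(sorted(clusters), 4)

# Step 2: every element of each chosen cluster enters the sample.
cluster_sample = [house for block in chosen_blocks for house in clusters[block]]

print(chosen_blocks)
print(len(cluster_sample))  # 4 blocks x 8 households = 32
```

Only four blocks need to be visited to collect 32 observations, which is the cost advantage of having sample elements close together.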
59Systematic Sampling
- If a sample size of n is desired from a population containing N elements, we might sample one element for every N/n elements in the population.
- We randomly select one of the first N/n elements from the population list.
- We then select every (N/n)th element that follows in the population list.
- This method has the properties of a simple random sample, especially if the list of the population elements is a random ordering.
60Systematic Sampling
- Advantage: The sample usually will be easier to identify than it would be if simple random sampling were used.
- Example: Selecting every 100th listing in a telephone book after the first randomly selected listing.
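Systematic sampling with a sampling interval of k = N/n can be sketched as follows. The listing of 5,000 entries and the sample size of 50 are hypothetical, chosen so that k = 100 as in the telephone book example; the function assumes N is a multiple of n.

```python
import random

def systematic_sample(population, n, rng):
    """Random start among the first k elements, then every k-th element."""
    N = len(population)
    k = N // n                # sampling interval (assumes N is a multiple of n)
    start = rng.randrange(k)  # 0-based index of the randomly chosen first element
    return [population[start + i * k] for i in range(n)]

rng = random.Random(11)          # fixed seed for reproducibility
listing = list(range(1, 5001))   # hypothetical list of N = 5,000 entries
sample = systematic_sample(listing, 50, rng)

print(sample[:5])
```

Only the starting point is random; once it is chosen, the rest of the sample is determined, which is what makes the sample easy to identify from a list.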
61Convenience Sampling
- It is a nonprobability sampling technique. Items are included in the sample without known probabilities of being selected.
- The sample is identified primarily by convenience.
- Advantage: Sample selection and data collection are relatively easy.
- Disadvantage: It is impossible to determine how representative of the population the sample is.
- Example: A professor conducting research might use student volunteers to constitute a sample.
62Judgment Sampling
- The person most knowledgeable on the subject of the study selects elements of the population that he or she feels are most representative of the population.
- It is a nonprobability sampling technique.
- Advantage: It is a relatively easy way of selecting a sample.
- Disadvantage: The quality of the sample results depends on the judgment of the person selecting the sample.
- Example: A reporter might sample three or four senators, judging them as reflecting the general opinion of the senate.
63END LESSON 1