Title: Economics 173 Business Statistics
1Economics 173 Business Statistics
- Lectures 3 & 4
- Summer, 2001
- Professor J. Petry
2Introduction to Estimation
39.1 Introduction
- Statistical inference is the process by which we
acquire information about populations from
samples.
- There are two procedures for making inferences:
- Estimation.
- Hypothesis testing.
49.2 Concepts of Estimation
- The objective of estimation is to determine the
value of a population parameter on the basis of a
sample statistic.
- There are two types of estimators:
- Point estimator
- Interval estimator
5Point Estimator
- A point estimator draws inference about a
population by estimating the value of an unknown
parameter using a single value or a point.
6[Figure: the point estimator, a single value computed from the sample distribution, estimates the unknown parameter of the population distribution.]
7Interval Estimator
- An interval estimator draws inferences about a
population by estimating the value of an unknown
parameter using an interval.
- The interval estimator is affected by the sample
size.
[Figure: interval estimator.]
89.3 Estimating the Population Mean when the
Population Standard Deviation is Known
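- How is an interval estimator produced from a
sampling distribution?
- To estimate $\mu$, a sample of size n is drawn from
the population, and its mean $\bar{x}$ is calculated.
- Under certain conditions, $\bar{x}$ is normally
distributed (or approximately normally
distributed), thus
$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ follows the standard normal distribution.
9- This leads to the relationship
$P\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
10The confidence level is $1 - \alpha$.
Lower confidence limit: $LCL = \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Upper confidence limit: $UCL = \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$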
See simulation results demonstrating this point
11- Confidence intervals are correct most, but
not all, of the time.
[Figure: 100 simulated confidence intervals (LCL to UCL) plotted against the true mean of 100.]
Not all the confidence intervals cover the real
expected value of 100.
The selected confidence level is 90%, and 10 out
of 100 intervals do not cover the real $\mu$.
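The coverage claim can be checked with a small simulation. The sketch below is only an illustration: the true mean of 100 and the 90% level come from the slide, while the standard deviation, sample size, number of repetitions and seed are assumed values chosen for the example.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=100.0, sigma=10.0, n=25, conf=0.90, reps=2000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu.

    sigma, n, reps and seed are illustrative assumptions, not slide values.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n and compute its mean.
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps

print(coverage())
```

The printed fraction should be close to 0.90, but it will rarely equal 0.90 exactly, which is the point of the slide.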
12- Four commonly used confidence levels
[Table: confidence levels and the corresponding values of $z_{\alpha/2}$.]
The mean values obtained in repeated draws of
samples of size 100 result in interval
estimators of the form [sample mean - .28,
sample mean + .28], 90% of which cover the real
mean of the distribution.
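The values of $z_{\alpha/2}$ can be recomputed from the standard normal distribution. In the sketch below, the set of four levels (90%, 95%, 98%, 99%) is an assumption, since the slide's table is not reproduced here; the function name `z_critical` is also just illustrative.

```python
from statistics import NormalDist

def z_critical(conf):
    """Return z_{alpha/2} for a two-sided interval with confidence level conf."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2)

# Assumed set of commonly tabulated confidence levels.
for conf in (0.90, 0.95, 0.98, 0.99):
    print(f"{conf:.2f}  z = {z_critical(conf):.3f}")
```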
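13- Recalculate the confidence interval for the 95%
confidence level.
- Solution: the interval becomes [sample mean - .34, sample mean + .34].
- The width of the 90% confidence interval is
2(.28) = .56.
- The width of the 95% confidence interval is
2(.34) = .68.
- Because the 95% confidence interval is wider,
it is more likely to include the value of $\mu$.
[Figure: sampling distribution with the .90 and .95 intervals marked.]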
14- Example 9.1
- The number and the types of television programs
and commercials targeted at children are affected
by the amount of time children watch TV.
- A survey was conducted among 100 North American
children, in which they were asked to record the
number of hours they watched TV per week.
- The population standard deviation of TV watching time was
known to be $\sigma = 8.0$.
- Estimate the mean watching time with a 95% confidence
level.
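15- The parameter to be estimated is $\mu$, the mean time
of TV watching per week per child (of all American
children).
- We need to compute the interval estimator for $\mu$.
- From the data provided in file XM09-01, the
sample mean $\bar{x}$ is calculated.
Since $1 - \alpha = .95$, $\alpha = .05$. Thus $\alpha/2 = .025$ and
$z_{.025} = 1.96$.
The interval estimate is
$\bar{x} \pm z_{.025}\frac{\sigma}{\sqrt{n}} = \bar{x} \pm 1.96\frac{8.0}{\sqrt{100}} = \bar{x} \pm 1.57$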
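A minimal sketch of the interval calculation for Example 9.1 follows. The sample mean computed from file XM09-01 is not given in these notes, so the `xbar` value below is a made-up placeholder; only $\sigma = 8.0$, n = 100 and the 95% level come from the example.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """Return (LCL, UCL) for the population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# xbar = 27.0 is a hypothetical placeholder, NOT the value from file XM09-01.
lcl, ucl = z_interval(xbar=27.0, sigma=8.0, n=100, conf=0.95)
print(round(lcl, 2), round(ucl, 2))
```

Whatever the actual sample mean is, the half-width of the interval is about 1.57 hours.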
16- Interpreting the interval estimate
- It is wrong to state that the interval
estimator is an interval for which there is a $1 - \alpha$
chance that the population mean lies between the
LCL and the UCL.
- This is so because $\mu$ is a parameter, not a
random variable.
17- LCL, UCL and the sample mean are the random
variables; $\mu$ is a parameter, NOT a random
variable.
- Thus, it is correct to state that there is a $1 - \alpha$
chance that LCL will be less than $\mu$ and UCL will
be greater than $\mu$.
18- Example 9.2
- To lower inventory costs, the Doll Computer
company wants to employ an inventory model.
- Lead time demand is normally distributed with a
standard deviation of 50 computers.
- The mean must be known in order to
calculate optimum inventory levels.
- Estimate the mean demand during lead time with
95% confidence.
19- Solution
- The parameter to be estimated is $\mu$. The interval
estimator is $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.
- Demand during 60 lead times is recorded: 514, 525,
..., 476.
- The sample mean $\bar{x}$ is calculated.
- The 95% confidence interval is
$\bar{x} \pm 1.96\frac{50}{\sqrt{60}} = \bar{x} \pm 12.65$
20Information and the Width of the Interval
- A wide interval estimator provides little
information.
[Figure: a wide interval ("Where is $\mu$?") contrasted with a much narrower one ("Ahaaa!").]
Here is a much smaller interval. If the
confidence level remains unchanged, the smaller
interval provides more meaningful information.
21- The width of the interval estimate is a function
of
- the population standard deviation
- the confidence level
- the sample size.
22Suppose the standard deviation has increased by
50%.
[Figure: intervals at the 90% confidence level before and after the increase.]
To maintain a certain level of confidence, changing
to a larger standard deviation requires a
wider confidence interval.
23Let us increase the confidence level from 90%
to 95%.
[Figure: the 90% and 95% confidence intervals.]
Increasing the confidence level produces a wider
interval.
There is an inverse relationship between the
width of the interval and the sample size.
Increasing the sample size decreases the width
of the interval estimate while the confidence
level can remain unchanged.
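The three effects on the width, $2 z_{\alpha/2}\sigma/\sqrt{n}$, can be seen numerically. The baseline values in this sketch ($\sigma = 10$, n = 100) are assumptions chosen only for illustration; they do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist

def interval_width(sigma, n, conf):
    """Width of the z-interval: 2 * z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return 2 * z * sigma / sqrt(n)

# Illustrative baseline (assumed values).
base = interval_width(sigma=10, n=100, conf=0.90)
wider_sigma = interval_width(sigma=15, n=100, conf=0.90)  # sigma up by 50%
higher_conf = interval_width(sigma=10, n=100, conf=0.95)  # 90% -> 95%
larger_n = interval_width(sigma=10, n=400, conf=0.90)     # n quadrupled

print(wider_sigma / base, higher_conf / base, larger_n / base)
```

A 50% larger standard deviation gives a 50% wider interval, a higher confidence level widens it, and quadrupling the sample size halves it.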
249.4 Selecting the Sample size
- We can control the width of the interval estimate
by changing the sample size.
- Thus, we determine the interval width first, and
derive the required sample size.
- The phrase "estimate the mean to within W units"
translates to an interval estimate of the form
$\bar{x} \pm W$
25- The required sample size to estimate the mean is
$n = \left(\frac{z_{\alpha/2}\,\sigma}{W}\right)^2$
- Example 9.3
- To estimate the amount of lumber that can be
harvested in a tract of land, the mean diameter
of trees in the tract must be estimated to within
one inch with 99% confidence.
- What sample size should be taken? (Assume
diameters are normally distributed with $\sigma = 6$
inches.)
26- Solution
- The estimate accuracy is +/-1 inch. That is, $W = 1$.
- The confidence level of 99% leads to $\alpha = .01$, thus
$z_{\alpha/2} = z_{.005} = 2.575$.
- We compute
$n = \left(\frac{2.575 \times 6}{1}\right)^2 = 238.7$, which is rounded up to 239.
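The sample-size calculation of Example 9.3 can be sketched as below. The function name `required_sample_size` is illustrative, and the code uses the exact normal quantile rather than the tabulated 2.575.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, w, conf):
    """Smallest n for which z_{alpha/2} * sigma / sqrt(n) <= w."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # Round up: a fractional observation is not possible.
    return ceil((z * sigma / w) ** 2)

print(required_sample_size(sigma=6, w=1, conf=0.99))
```

With $\sigma = 6$, W = 1 and 99% confidence this gives 239, in agreement with the hand calculation.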
27Introduction to Hypothesis Testing
2810.1 Introduction
- The purpose of hypothesis testing is to determine
whether there is enough statistical evidence in
favor of a certain belief about a parameter.
- Examples
- Is there statistical evidence, in a random sample
of potential customers, that supports the
hypothesis that more than p% of the potential
customers will purchase a new product?
- Is a new drug effective in curing a certain
disease? A sample of patients is randomly
selected. Half of them are given the drug, while
the other half are given a placebo. The improvement in the
patients' conditions is then measured and compared.
2910.2 Concept of hypothesis testing
- The critical concepts of hypothesis testing.
- There are two hypotheses (about a population
parameter or parameters):
- $H_0$, the null hypothesis, for example
$\mu = 5$
- $H_1$, the alternative hypothesis, for example $\mu > 5$.
This is what you want to prove.
- Assume the null hypothesis is true.
- Build a statistic related to the parameter
hypothesized.
- Pose the question: How probable is it to obtain a
statistic value at least as extreme as the one
observed from the sample?
[Figure: sampling distribution of the statistic under $\mu = 5$.]
30- Continued
- Make one of the following two decisions (based on
the test):
- Reject the null hypothesis in favor of the
alternative hypothesis.
- Do not reject the null hypothesis in favor of the
alternative hypothesis.
- Two types of errors are possible when making the
decision whether to reject $H_0$:
- Type I error: reject $H_0$ when it is true.
- Type II error: do not reject $H_0$ when it is false.
3110.3 Testing the Population Mean When the
Population Standard Deviation is Known
- Example 10.1
- A new billing system for a department store will
be cost-effective only if the mean monthly
account is more than 170.
- A sample of 400 monthly accounts has a mean of
178.
- If the accounts are approximately normally
distributed with $\sigma = 65$, can we conclude that
the new system will be cost-effective?
32- Solution
- The population of interest is the credit accounts
at the store.
- We want to show that the mean account for all
customers is greater than 170.
$H_1$: $\mu > 170$
- The null hypothesis must specify a single value
of the parameter $\mu$.
$H_0$: $\mu = 170$
33- Is a sample mean of 178 sufficiently greater
than 170 to infer that the population mean is
greater than 170?
34The rejection region method
The rejection region is a range of values such
that if the test statistic falls into that range,
the null hypothesis is rejected in favor of the
alternative hypothesis.
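35The rejection region is
$\bar{x} > \bar{x}_L$, where $\bar{x}_L$ denotes the critical value.
[Figure: do not reject the null hypothesis when $\bar{x} \le \bar{x}_L$; reject the null hypothesis when $\bar{x} > \bar{x}_L$.]
36The rejection region is $\bar{x} > \bar{x}_L$.
[Figure: the area $\alpha$ under the sampling distribution to the right of $\bar{x}_L$; reject the null hypothesis here.]
$\alpha = P(\text{commit a type I error}) = P(\text{reject } H_0 \text{ given that } H_0 \text{ is true})$
37The rejection region is $\bar{x} > \bar{x}_L$, with $\alpha = 0.05$:
$\bar{x}_L = 170 + z_{.05}\frac{65}{\sqrt{400}} = 170 + 1.645(3.25) \approx 175.34$
38The rejection region is $\bar{x} > 175.34$.
Conclusion: Since the sample mean (178) is greater
than the critical value of 175.34, there is
sufficient evidence in the sample to reject $H_0$ in
favor of $H_1$, at the 5% significance level.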
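The rejection region for Example 10.1 can be reproduced with a short sketch. It assumes the critical value is $\mu_0 + z_{\alpha}\sigma/\sqrt{n}$, which is consistent with the 175.34 quoted on the slide; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def upper_critical_value(mu0, sigma, n, alpha):
    """Critical sample mean for a right-tail z-test of H0: mu = mu0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return mu0 + z_alpha * sigma / sqrt(n)

crit = upper_critical_value(mu0=170, sigma=65, n=400, alpha=0.05)
reject = 178 > crit   # 178 is the sample mean from Example 10.1
print(round(crit, 2), reject)
```

The computed critical value may differ from 175.34 in the last digit because of rounding; the decision to reject $H_0$ is unaffected.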
39The standardized test statistic
- Instead of using the statistic $\bar{x}$, we can use
the standardized value z:
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
- Then, the rejection region becomes
$z > z_{\alpha}$ (one-tail test)
40- Example 10.1 - continued
- We redo this example using the standardized test
statistic.
- $H_0$: $\mu = 170$
- $H_1$: $\mu > 170$
- Test statistic:
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{178 - 170}{65/\sqrt{400}} = 2.46$
- Rejection region: $z > z_{.05} = 1.645$.
- Conclusion: Since 2.46 > 1.645, reject the null
hypothesis in favor of the alternative
hypothesis.
41P-value method
- The p-value provides information about the
amount of statistical evidence that supports the
alternative hypothesis.
42The probability of observing a test statistic at
least as extreme as 178, given that the null
hypothesis is true, is the p-value:
$P(\bar{x} > 178 \mid \mu = 170) = P(Z > 2.46) = .0069$
43- Interpreting the p-value
- Because the probability that the sample mean will
assume a value of more than 178 when $\mu = 170$ is
so small (.0069), there are reasons to believe
that $\mu > 170$.
We can conclude that the smaller the p-value,
the more statistical evidence exists to support
the alternative hypothesis.
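As a rough check of the p-value in Example 10.1, the sketch below computes the right-tail probability directly from the standard normal distribution. The function name is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def right_tail_p_value(xbar, mu0, sigma, n):
    """Return (z, p) where p = P(sample mean >= xbar) if H0: mu = mu0 is true."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return z, 1 - NormalDist().cdf(z)

z, p = right_tail_p_value(xbar=178, mu0=170, sigma=65, n=400)
print(round(z, 2), round(p, 4))
```

The result is close to the .0069 quoted above; small differences arise because the slide rounds z to 2.46 before looking up the probability.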
44- Describing the p-value
- If the p-value is less than 1%, there is
overwhelming evidence that supports the
alternative hypothesis.
- If the p-value is between 1% and 5%, there is
strong evidence that supports the alternative
hypothesis.
- If the p-value is between 5% and 10%, there is
weak evidence that supports the alternative
hypothesis.
- If the p-value exceeds 10%, there is no evidence
that supports the alternative hypothesis.
45- The p-value and rejection region methods
- The p-value can be used when making decisions
based on rejection region methods as follows:
- Define the hypotheses to test, and the required
significance level $\alpha$.
- Perform the sampling procedure, calculate the
test statistic and the p-value associated with
it.
- Compare the p-value to $\alpha$. Reject the null
hypothesis only if $p < \alpha$; otherwise, do not reject
the null hypothesis.
46Conclusions of a test of Hypothesis
- If we reject the null hypothesis, we conclude
that there is enough evidence to infer that the
alternative hypothesis is true.
- If we do not reject the null hypothesis, we
conclude that there is not enough statistical
evidence to infer that the alternative
hypothesis is true.
The alternative hypothesis is the more
important one. It represents what we are
investigating.
47- Example 10.2
- A government inspector samples 25 bottles of
catsup labeled "Net weight 16 ounces," and
records their weights.
- From previous experience it is known that the
weights are normally distributed with a standard
deviation of 0.4 ounces.
- Can the inspector conclude that the product
label is unacceptable?
48- Solution
- We need to draw a conclusion about the mean
weight of all the catsup bottles.
- We investigate whether the mean weight is less
than 16 ounces (bottle label is unacceptable).
$H_1$: $\mu < 16$
Then
$H_0$: $\mu = 16$
- Select a significance level:
- $\alpha = 0.05$
- Define the rejection region:
- $z < -z_{\alpha} = -1.645$ (one-tail test)
49We want this mistake (rejecting a true $H_0$) to happen not more than 5% of the time.
A sample mean far below 16 should be a rare
event if $\mu = 16$.
[Figure: sampling distribution centered at 16, with the rejection region to the left of $-z_{\alpha} = -1.645$.]
50The test statistic is $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = -1.25$.
Since the value of the test statistic does not
fall in the rejection region, we do not reject
the null hypothesis in favor of the alternative
hypothesis.
There is insufficient evidence to infer that the
mean is less than 16 ounces.
The p-value = $P(Z < -1.25) = .1056 > .05$
51- Example 10.3
- The amount of time required to complete a
critical part of a production process on an
assembly line is normally distributed. The mean
was believed to be 130 seconds.
- To test if this belief is correct, a sample of
100 randomly selected assemblies was drawn, and
the processing time recorded. The sample mean was
126.8 seconds.
- If the process time is really normal with a
standard deviation of 15 seconds, can we conclude
that the belief regarding the mean is incorrect?
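52- Solution
- Is the mean different from 130?
$H_1$: $\mu \neq 130$
Then
$H_0$: $\mu = 130$
- Define the rejection region:
- $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
53We want this mistake (rejecting a true $H_0$) to happen not more than 5%
of the time.
A sample mean far below 130 or far above 130
should be a rare event if $\mu = 130$.
[Figure: sampling distribution centered at 130, with rejection regions in both tails.]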
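54The test statistic is
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{126.8 - 130}{15/\sqrt{100}} = -2.13$
Since the value of the test statistic falls in
the rejection region, we reject the null
hypothesis in favor of the alternative
hypothesis.
There is sufficient evidence to infer that the
mean is not 130.
The p-value = $P(Z < -2.13) + P(Z > 2.13) = 2(.0166) = .0332 < .05$
[Figure: standard normal curve with $\alpha/2 = 0.025$ in each tail and the test statistic at -2.13.]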
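A two-tail version of the test in Example 10.3 might be sketched as follows. The function name and return layout are illustrative choices; the inputs are the figures given in the example.

```python
from math import sqrt
from statistics import NormalDist

def two_tail_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Return (z, p_value, reject) for H0: mu = mu0 vs H1: mu != mu0."""
    std_normal = NormalDist()
    z = (xbar - mu0) / (sigma / sqrt(n))
    # Both tails count as "at least as extreme".
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return z, p_value, abs(z) > z_crit

z, p_value, reject = two_tail_z_test(xbar=126.8, mu0=130, sigma=15, n=100)
print(round(z, 2), round(p_value, 4), reject)
```

The p-value comes out near .033; the slide's .0332 uses z rounded to -2.13 and a table lookup, so the last digit can differ slightly.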
55Testing hypotheses and interval estimators
- Interval estimators can be used to test
hypotheses.
- Calculate the $1 - \alpha$ confidence level interval
estimator, then
- if the hypothesized parameter value falls within
the interval, do not reject the null hypothesis,
while
- if the hypothesized parameter value falls outside
the interval, conclude that the null hypothesis
can be rejected ($\mu$ is not equal to the
hypothesized value).
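To illustrate the interval approach, the sketch below reuses the figures of Example 10.3 (sample mean 126.8, $\sigma = 15$, n = 100, hypothesized mean 130). Applying the interval method to that example is a choice made here for illustration; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def reject_by_interval(xbar, mu0, sigma, n, alpha=0.05):
    """Two-tail decision: reject H0 when mu0 lies outside the 1 - alpha interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    lcl, ucl = xbar - half_width, xbar + half_width
    return (lcl, ucl), not (lcl <= mu0 <= ucl)

(lcl, ucl), reject = reject_by_interval(xbar=126.8, mu0=130, sigma=15, n=100)
print(round(lcl, 2), round(ucl, 2), reject)
```

The 95% interval is roughly 123.86 to 129.74. Since 130 falls outside it, the null hypothesis is rejected, matching the two-tail test at $\alpha = .05$.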
56- Drawbacks
- Two-tail interval estimators may not provide the
right answer to the question posed in one-tail
hypothesis tests.
- The interval estimator does not yield a p-value.
There are cases where only tests produce the
information needed to make decisions.
57Calculating the Probability of a Type II Error
- To properly interpret the results of a test of
hypothesis, we need to
- specify an appropriate significance level or
judge the p-value of a test
- understand the relationship between Type I and
Type II errors.
- How do we compute the probability of a type II error?
58- Calculating the probability of a type II error requires that
- the rejection region be expressed directly, in
terms of the parameter hypothesized (not
standardized).
- the alternative value (under $H_1$) be specified.
$H_0$: $\mu = \mu_0$; $H_1$: $\mu = \mu_1$ ($\mu_0$ is not equal to $\mu_1$)
[Figure: sampling distribution centered at $\mu = \mu_0$.]
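59- Let us revisit Example 10.1
- The rejection region was $\bar{x} > 175.34$ with
$\alpha = .05$.
- A type II error occurs when a false $H_0$ is not
rejected.
[Figure: two sampling distributions, one centered at $\mu_0 = 170$ and one at $\mu_1 = 180$, with the critical value 175.34 marked on both. $H_0$ is not rejected when $\bar{x} \le 175.34$, but $H_0$ is false when $\mu = \mu_1 = 180$.]
$\beta = P(\bar{x} < 175.34 \text{ given that } \mu = 180) = P\left(Z < \frac{175.34 - 180}{65/\sqrt{400}}\right) = P(Z < -1.43) = .0764$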
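The probability of a type II error in Example 10.1 can be approximated with the sketch below, which takes $\mu_0 = 170$, $\mu_1 = 180$, $\sigma = 65$, n = 400 and $\alpha = .05$ from the slides. The function name is illustrative, and the code uses the unrounded critical value.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu1, sigma, n, alpha):
    """beta for a right-tail z-test of H0: mu = mu0 when the true mean is mu1."""
    se = sigma / sqrt(n)
    # Reject H0 when the sample mean exceeds this critical value.
    crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # beta = P(sample mean <= crit) when the true mean is mu1.
    return NormalDist(mu1, se).cdf(crit)

beta = type_ii_error(mu0=170, mu1=180, sigma=65, n=400, alpha=0.05)
print(round(beta, 4))
```

The result is about .076, so a true mean of 180 would go undetected in roughly 8% of samples of this size.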
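60- Effects on $\beta$ of changing $\alpha$
- Decreasing the significance level $\alpha$ increases
the value of $\beta$, and vice versa.
[Figure: with $\alpha_1 > \alpha_2$, the corresponding type II error probabilities satisfy $\beta_1 < \beta_2$.]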
61- Judging the test
- A hypothesis test is effectively defined by the
significance level $\alpha$ and by the sample size
n.
- If the probability of a type II error $\beta$ is judged
to be too large, we can reduce it by
- increasing $\alpha$, and/or
- increasing the sample size.
62- In Example 10.1, suppose n increases from 400 to
1000.
As a result, $\beta$ decreases.
[Figure: with $\alpha$ held fixed, the larger sample gives $\beta_1 > \beta_2$.]
63- In summary,
- By increasing the sample size, we reduce the
probability of a type II error.
- Hence, we shall accept the null hypothesis when
it is false less frequently.
- Power of a test
- The power of a test is defined as $1 - \beta$.
- It represents the probability of rejecting the null
hypothesis when it is false.
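The effect of the sample size on power can be sketched for Example 10.1, comparing n = 400 with n = 1000. The alternative mean $\mu_1 = 180$ is the value used when the example was revisited; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def power(mu0, mu1, sigma, n, alpha):
    """Power (1 - beta) of a right-tail z-test when the true mean is mu1."""
    se = sigma / sqrt(n)
    crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se   # reject if xbar > crit
    beta = NormalDist(mu1, se).cdf(crit)                 # P(do not reject | mu1)
    return 1 - beta

for n in (400, 1000):
    print(n, round(power(mu0=170, mu1=180, sigma=65, n=n, alpha=0.05), 4))
```

With $\alpha$ held at .05, the larger sample moves the critical value closer to 170 and narrows the sampling distribution, so the power rises from about .92 to above .99.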