Economics 173 Business Statistics

Slides: 64

Provided by: sba461

Learn more at: http://www.econ.uiuc.edu

1
Economics 173 Business Statistics

Lectures 3 & 4
Summer, 2001
Professor J. Petry

2
Introduction to Estimation

Chapter 9

3
9.1 Introduction

Statistical inference is the process by which we
acquire information about populations from
samples.
There are two procedures for making inferences
Estimation.
Hypotheses testing.

4
9.2 Concepts of Estimation

The objective of estimation is to determine the
value of a population parameter on the basis of a
sample statistic.
There are two types of estimators
Point Estimator
Interval estimator

5
Point Estimator

A point estimator draws inference about a
population by estimating the value of an unknown
parameter using a single value or a point.

[Figure: population distribution with an unknown parameter; sample distribution with the point estimator.]
7
Interval Estimator

An interval estimator draws inferences about a
population by estimating the value of an unknown
parameter using an interval.
The interval estimator is affected by the sample
size.

[Figure: interval estimator.]
8
9.3 Estimating the Population Mean when the
Population Standard Deviation is Known

How is an interval estimator produced from a
sampling distribution?
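To estimate μ, a sample of size n is drawn from
the population, and its mean x̄ is calculated.
Under certain conditions, x̄ is normally
distributed (or approximately normally
distributed), thus

\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]

follows the standard normal distribution.

We know that

\[ P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha \]

This leads to the relationship

\[ P\left( \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha \]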

10
Confidence level: 1 - α
Upper confidence limit (UCL): x̄ + zα/2 σ/√n
Lower confidence limit (LCL): x̄ - zα/2 σ/√n
See simulation results demonstrating this point
11

Confidence intervals are correct most, but
not all, of the time.

[Figure: 100 simulated interval estimates, each running from LCL to UCL, plotted against the true mean of 100.]
Not all the confidence intervals cover the real
expected value of 100.
The selected confidence level is 90%, and 10 out
of 100 intervals do not cover the real μ.
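
The coverage claim can be checked with a small simulation. The sketch below is only an illustration: the true mean of 100 comes from the slide, while the standard deviation (10), the sample size (25), the number of repetitions and the seed are assumptions chosen for the example.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=100.0, sigma=10.0, n=25, conf=0.90, reps=2000, seed=1):
    """Fraction of simulated known-sigma intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        # The interval is random; mu is fixed.
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps

print(coverage())   # should land near 0.90
```

The fraction printed varies with the seed, but it should stay close to the chosen confidence level.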
12

Four commonly used confidence levels

Confidence level 1 - α    α      α/2     zα/2
.90                       .10    .05     1.645
.95                       .05    .025    1.96
.98                       .02    .01     2.33
.99                       .01    .005    2.575

The mean values obtained in repeated draws of
samples of size 100 result in interval
estimators of the form [sample mean - .28,
sample mean + .28], 90% of which cover the real
mean of the distribution.
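
The zα/2 value for a given confidence level can be computed directly from the standard normal distribution. This is a minimal sketch using Python's standard library; the function name `z_critical` is a label chosen here, not something from the slides.

```python
from statistics import NormalDist

def z_critical(conf):
    """Return z_{alpha/2} for a two-sided interval with confidence level conf."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2)

for conf in (0.90, 0.95, 0.98, 0.99):
    print(f"{conf:.2f}  z = {z_critical(conf):.3f}")
```

Printed tables often round or interpolate slightly differently (for example 2.33 and 2.575 rather than 2.326 and 2.576), which is why hand calculations can differ in the last digit.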
13

Recalculate the confidence interval for a 95%
confidence level.
Solution
The 95% interval is [sample mean - .34,
sample mean + .34].

The width of the 90% confidence interval =
2(.28) = .56
The width of the 95% confidence interval =
2(.34) = .68
Because the 95% confidence interval is wider,
it is more likely to include the value of μ.
14

Example 9.1
The number and the types of television programs
and commercials targeted at children are affected
by the amount of time children watch TV.
A survey was conducted among 100 North American
children, in which they were asked to record the
number of hours they watched TV per week.
The population standard deviation of TV watching
time was known to be σ = 8.0.
Estimate the mean watching time with a 95%
confidence level.

Solution

The parameter to be estimated is μ, the mean time
of TV watching per week per child (of all North
American children).
We need to compute the interval estimator for μ.
From the data provided in file XM09-01, the
sample mean x̄ is calculated.

Since 1 - α = .95, α = .05. Thus α/2 = .025 and
z.025 = 1.96.
The interval estimate is x̄ ± zα/2 σ/√n
= x̄ ± 1.96(8.0/√100) = x̄ ± 1.57.
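
The arithmetic for Example 9.1 can be sketched as follows. The inputs σ = 8.0, n = 100 and the 95% level come from the example; the function names and the sample mean of 25.0 used in the last call are hypothetical, since the transcript does not preserve the actual sample mean.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, conf):
    """Half-width z_{alpha/2} * sigma / sqrt(n) of a known-sigma interval."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * sigma / sqrt(n)

def z_interval(xbar, sigma, n, conf):
    """Return (LCL, UCL) for the population mean."""
    e = margin_of_error(sigma, n, conf)
    return xbar - e, xbar + e

# Example 9.1 inputs: sigma = 8.0, n = 100, 95% confidence.
print(round(margin_of_error(8.0, 100, 0.95), 2))   # 1.57

# Hypothetical sample mean, used only to show the call.
print(z_interval(25.0, 8.0, 100, 0.95))
```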
16

Interpreting the interval estimate
It is wrong to state that the interval
estimator is an interval for which there is a 1 - α
chance that the population mean lies between the
LCL and the UCL.
This is so because μ is a parameter, not a
random variable.

LCL, UCL and the sample mean are the random
variables; μ is a parameter, NOT a random
variable.
Thus, it is correct to state that there is a 1 - α
chance that LCL will be less than μ and UCL will
be greater than μ.

Example 9.2
To lower inventory costs, the Doll Computer
company wants to employ an inventory model.
Lead time demand is normally distributed with a
standard deviation of 50 computers.
The mean must be known in order to
calculate optimum inventory levels.
Estimate the mean demand during lead time with
95% confidence.

Solution
The parameter to be estimated is μ. The interval
estimator is x̄ ± zα/2 σ/√n.
Demand during 60 lead times is recorded: 514, 525,
..., 476.
The sample mean x̄ is calculated.
The 95% confidence interval is
x̄ ± 1.96(50/√60) = x̄ ± 12.65.

20
Information and the Width of the Interval

A wide interval estimator provides little
information.

Where is μ?

[Figure: a wide interval with many candidate positions for μ, each marked "?".]
Ahaaa!
Here is a much smaller interval. If the
confidence level remains unchanged, the smaller
interval provides more meaningful information.
21

The width of the interval estimate is a function
of
the population standard deviation
the confidence level
the sample size.

22
Suppose the standard deviation has increased by
50%.
[Figure: interval estimates at the 90% confidence level.]
To maintain a certain level of confidence, changing
to a larger standard deviation requires a
wider confidence interval.
23
Let us increase the confidence level from 90%
to 95%.
Increasing the confidence level produces a wider
interval.
[Figure: interval estimates at the 90% and 95% confidence levels.]
There is an inverse relationship between the
width of the interval and the sample size.
Increasing the sample size decreases the width
of the interval estimate while the confidence
level can remain unchanged.
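
The three effects on the interval width can be seen by varying one input at a time. This is an illustrative sketch for the known-σ case, where the width is 2 zα/2 σ/√n; the baseline values (σ = 10, n = 100, 90% confidence) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def interval_width(sigma, n, conf):
    """Full width 2 * z_{alpha/2} * sigma / sqrt(n) of a known-sigma interval."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return 2 * z * sigma / sqrt(n)

base = interval_width(sigma=10, n=100, conf=0.90)   # hypothetical baseline

print(interval_width(15, 100, 0.90) / base)   # sigma up 50%: ratio about 1.5
print(interval_width(10, 100, 0.95) / base)   # 90% to 95%: ratio above 1
print(interval_width(10, 400, 0.90) / base)   # n quadrupled: ratio about 0.5
```

Because the width depends on 1/√n, halving the width requires four times the sample size.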
24
9.4 Selecting the Sample size

We can control the width of the interval estimate
by changing the sample size.
Thus, we determine the interval width first, and
derive the required sample size.
The phrase "estimate the mean to within W units"
translates to an interval estimate of the form

\[ \bar{x} \pm W \]

The required sample size to estimate the mean is

\[ n = \left( \frac{z_{\alpha/2} \, \sigma}{W} \right)^2 \]

Example 9.3
To estimate the amount of lumber that can be
harvested in a tract of land, the mean diameter
of trees in the tract must be estimated to within
one inch with 99% confidence.
What sample size should be taken? (Assume
diameters are normally distributed with σ = 6
inches.)

Solution
The estimate accuracy is ±1 inch. That is, W =
1.
The confidence level 99% leads to α = .01, thus
zα/2 = z.005 = 2.575.
We compute

\[ n = \left( \frac{z_{\alpha/2} \, \sigma}{W} \right)^2 = \left( \frac{2.575 \times 6}{1} \right)^2 = 238.7 \approx 239 \]
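
The sample-size calculation for Example 9.3 can be sketched as below, assuming the standard formula n = (zα/2 σ / W)² rounded up to the next whole observation. The function name is a label chosen for this illustration.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, w, conf):
    """Smallest n for which the known-sigma interval is within +/- w of the mean."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z * sigma / w) ** 2)

# Example 9.3: sigma = 6 inches, W = 1 inch, 99% confidence.
print(required_sample_size(sigma=6, w=1, conf=0.99))   # 239
```

Rounding up rather than to the nearest integer keeps the interval no wider than requested.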

27
Introduction to Hypothesis Testing

Chapter 10

28
10.1 Introduction

The purpose of hypothesis testing is to determine
whether there is enough statistical evidence in
favor of a certain belief about a parameter.
Examples
Is there statistical evidence in a random sample
of potential customers that supports the
hypothesis that more than p% of the potential
customers will purchase a new product?
Is a new drug effective in curing a certain
disease? A sample of patients is randomly
selected. Half of them are given the drug, while
the other half are given a placebo. The improvement
in the patients' conditions is then measured and
compared.

29
10.2 Concept of hypothesis testing

The critical concepts of hypothesis testing:
There are two hypotheses (about a population
parameter or parameters):
H0 - the null hypothesis, for example
μ = 5
H1 - the alternative hypothesis, for example μ > 5

This is what you want to prove.

Assume the null hypothesis is true.

Build a statistic related to the parameter
hypothesized.
Pose the question: How probable is it to obtain a
statistic value at least as extreme as the one
observed from the sample?

[Figure: distribution under the null hypothesis, μ = 5.]
30

Continued
Make one of the following two decisions (based on
the test)
Reject the null hypothesis in favor of the
alternative hypothesis.
Do not reject the null hypothesis in favor of the
alternative hypothesis.

Two types of errors are possible when making the
decision whether to reject H0
Type I error - reject H0 when it is true.
Type II error - do not reject H0 when it is false.

31
10.3 Testing the Population Mean When the
Population Standard Deviation is Known

Example 10.1
A new billing system for a department store will
be cost-effective only if the mean monthly
account is more than 170.
A sample of 400 monthly accounts has a mean of
178.
If the accounts are approximately normally
distributed with σ = 65, can we conclude that
the new system will be cost-effective?

Solution
The population of interest is the credit accounts
at the store.
We want to show that the mean account for all
customers is greater than 170.

H1: μ > 170

The null hypothesis must specify a single value
of the parameter μ.

H0: μ = 170
33

Is a sample mean of 178 sufficiently greater
than 170 to infer that the population mean is
greater than 170?

34
The rejection region method
The rejection region is a range of values such
that if the test statistic falls into that range,
the null hypothesis is rejected in favor of the
alternative hypothesis.
35
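The rejection region is

\[ \bar{x} > \bar{x}_L \]

where x̄_L denotes the critical value.
Do not reject the null hypothesis if x̄ ≤ x̄_L.
Reject the null hypothesis if x̄ > x̄_L.
36
The rejection region is x̄ > x̄_L, with x̄_L chosen so that
α = P(commit a type I error) = P(reject H0 given
that H0 is true) = P(x̄ > x̄_L given that μ = 170).

37
With α = 0.05, the critical value satisfies

\[ \frac{\bar{x}_L - 170}{65 / \sqrt{400}} = z_{.05} = 1.645 \quad \Rightarrow \quad \bar{x}_L = 170 + 1.645 \cdot \frac{65}{\sqrt{400}} \approx 175.34 \]

38
The rejection region is x̄ > 175.34.
Conclusion: Since the sample mean (178) is greater
than the critical value of 175.34, there is
sufficient evidence in the sample to reject H0 in
favor of H1 at the 5% significance level.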
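
The critical value of about 175.34 can be reproduced from the Example 10.1 inputs. This is a minimal sketch for an upper-tail test with known σ; the function name `critical_value` is chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def critical_value(mu0, sigma, n, alpha):
    """Boundary of the rejection region for an upper-tail test of H0: mu = mu0."""
    return mu0 + NormalDist().inv_cdf(1 - alpha) * sigma / sqrt(n)

x_l = critical_value(mu0=170, sigma=65, n=400, alpha=0.05)
print(round(x_l, 2))   # close to the slides' 175.34
print(178 > x_l)       # True, so H0 is rejected
```

The small difference in the last digit comes from using an unrounded z.05 instead of the table value 1.645.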
39
The standardized test statistic

Instead of using the statistic x̄, we can use
the standardized value z.

\[ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]

Then, the rejection region becomes

\[ z > z_{\alpha} \]

One-tail test
40

Example 10.1 - continued
We redo this example using the standardized test
statistic.
H0: μ = 170
H1: μ > 170
Test statistic:

\[ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{178 - 170}{65 / \sqrt{400}} = 2.46 \]

Rejection region: z > z.05 = 1.645.
Conclusion: Since 2.46 > 1.645, reject the null
hypothesis in favor of the alternative
hypothesis.
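
The standardized rule and the rule stated in terms of the sample mean always give the same decision. The sketch below checks this for Example 10.1; the sample means 172, 175 and 176 are hypothetical values added only to show cases on both sides of the boundary.

```python
from math import sqrt
from statistics import NormalDist

MU0, SIGMA, N, ALPHA = 170, 65, 400, 0.05   # Example 10.1 inputs
Z_ALPHA = NormalDist().inv_cdf(1 - ALPHA)

def z_statistic(xbar):
    """Standardized test statistic under H0: mu = MU0."""
    return (xbar - MU0) / (SIGMA / sqrt(N))

def reject_by_z(xbar):
    """Rejection region expressed in terms of z."""
    return z_statistic(xbar) > Z_ALPHA

def reject_by_critical_value(xbar):
    """Rejection region expressed in terms of the sample mean."""
    return xbar > MU0 + Z_ALPHA * SIGMA / sqrt(N)

print(round(z_statistic(178), 2))   # 2.46
for xbar in (172, 175, 176, 178):   # 172, 175, 176 are hypothetical
    print(xbar, reject_by_z(xbar), reject_by_critical_value(xbar))
```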

41
P-value method

The p-value provides information about the
amount of statistical evidence that supports the
alternative hypothesis.

42
The probability of observing a test statistic at
least as extreme as 178, given that the null
hypothesis is true, is the p-value:

\[ P(\bar{x} \ge 178 \mid \mu = 170) = P\left( Z \ge \frac{178 - 170}{65 / \sqrt{400}} \right) = P(Z \ge 2.46) = .0069 \]
43

Interpreting the p-value
Because the probability that the sample mean will
assume a value of more than 178 when μ = 170 is
so small (.0069), there are reasons to believe
that μ > 170.

We can conclude that the smaller the p-value,
the more statistical evidence exists to support
the alternative hypothesis.
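
The p-value of about .0069 can be reproduced from the Example 10.1 inputs. This is a minimal sketch for the upper-tail case only; the function name is chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_p_value(xbar, mu0, sigma, n):
    """P(sample mean >= xbar) when the true mean is mu0 and sigma is known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z)

p = upper_tail_p_value(xbar=178, mu0=170, sigma=65, n=400)
print(round(p, 4))   # about .0069
```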
44

Describing the p-value
If the p-value is less than 1%, there is
overwhelming evidence that supports the
alternative hypothesis.

If the p-value is between 1% and 5%, there is
strong evidence that supports the alternative
hypothesis.
If the p-value is between 5% and 10%, there is
weak evidence that supports the alternative
hypothesis.
If the p-value exceeds 10%, there is no evidence
that supports the alternative hypothesis.

The p-value and rejection region methods
The p-value can be used when making decisions
based on rejection region methods as follows:
Define the hypotheses to test, and the required
significance level α.
Perform the sampling procedure, calculate the
test statistic and the p-value associated with
it.
Compare the p-value to α. Reject the null
hypothesis only if p < α; otherwise, do not reject
the null hypothesis.

46
Conclusions of a test of Hypothesis

If we reject the null hypothesis, we conclude
that there is enough evidence to infer that the
alternative hypothesis is true.
If we do not reject the null hypothesis, we
conclude that there is not enough statistical
evidence to infer that the alternative
hypothesis is true.

The alternative hypothesis is the more
important one. It represents what we are
investigating.
47

Example 10.2
A government inspector samples 25 bottles of
catsup labeled "Net weight 16 ounces", and
records their weights.
From previous experience it is known that the
weights are normally distributed with a standard
deviation of 0.4 ounces.
Can the inspector conclude that the product
label is unacceptable?

Solution
We need to draw a conclusion about the mean
weight of all the catsup bottles.
We investigate whether the mean weight is less
than 16 ounces (bottle label is unacceptable).

H0: μ = 16
Then
H1: μ < 16

Select a significance level
α = 0.05

Define the rejection region
z < -zα = -z.05 = -1.645

One-tail test
49
[Figure: sampling distribution of the sample mean centered at 16, with the rejection region to the left of -zα = -1.645.]
A sample mean far below 16 should be a rare
event if μ = 16.
We want this mistake (rejecting H0 when μ = 16)
to happen not more than 5% of the time.
50
Since the value of the test statistic (z = -1.25)
does not fall in the rejection region, we do not
reject the null hypothesis in favor of the
alternative hypothesis.
There is insufficient evidence to infer that the
mean is less than 16 ounces.
The p-value = P(Z < -1.25) = .1056 > .05
51

Example 10.3
The amount of time required to complete a
critical part of a production process on an
assembly line is normally distributed. The mean
was believed to be 130 seconds.
To test if this belief is correct, a sample of
100 randomly selected assemblies was drawn, and
the processing time recorded. The sample mean was
126.8 seconds.
If the process time is really normal with a
standard deviation of 15 seconds, can we conclude
that the belief regarding the mean is incorrect?

Solution
Is the mean different from 130?

H0: μ = 130
Then
H1: μ ≠ 130

Define the rejection region
z < -zα/2 or z > zα/2

53
[Figure: sampling distribution of the sample mean centered at 130, with rejection regions of area α/2 = 0.025 in each tail.]
A sample mean far below 130 or far above 130
should be a rare event if μ = 130.
We want this mistake to happen not more than 5%
of the time.
54
With α = .05, zα/2 = z.025 = 1.96, and the test
statistic is

\[ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{126.8 - 130}{15 / \sqrt{100}} = -2.13 \]

Since the value of the test statistic falls in
the rejection region, we reject the null
hypothesis in favor of the alternative
hypothesis.
There is sufficient evidence to infer that the
mean is not 130.
The p-value = P(Z < -2.13) + P(Z > 2.13)
= 2(.0166) = .0332 < .05
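
The lower-tail, upper-tail and two-tail p-values differ only in which tail area is taken. The sketch below is one possible way to organize this, applied to Examples 10.2 and 10.3; the function name and the `tail` argument are conventions invented for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def p_value(z, tail):
    """p-value of a z statistic; tail is 'lower', 'upper', or 'two'."""
    std = NormalDist()
    if tail == "lower":
        return std.cdf(z)
    if tail == "upper":
        return 1 - std.cdf(z)
    if tail == "two":
        return 2 * (1 - std.cdf(abs(z)))
    raise ValueError("tail must be 'lower', 'upper', or 'two'")

# Example 10.3: sample mean 126.8, hypothesized mean 130, sigma 15, n 100.
z_103 = (126.8 - 130) / (15 / sqrt(100))
print(round(z_103, 2), round(p_value(z_103, "two"), 4))

# Example 10.2: the slides report P(Z < -1.25) for the lower-tail test.
print(round(p_value(-1.25, "lower"), 4))
```

The two-tail result comes out near .033 rather than exactly .0332 because the slides round z to -2.13 before looking up the tail area.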
55
Testing hypotheses and interval estimators

Interval estimators can be used to test
hypotheses.
Calculate the 1 - α confidence level interval
estimator, then
if the hypothesized parameter value falls within
the interval, do not reject the null hypothesis,
while
if the hypothesized parameter value falls outside
the interval, conclude that the null hypothesis
can be rejected (μ is not equal to the
hypothesized value).
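
As an illustration of this procedure, the sketch below applies it to the Example 10.3 data (sample mean 126.8, σ = 15, n = 100, hypothesized mean 130, α = .05). The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_tail_test_by_interval(xbar, mu0, sigma, n, alpha):
    """Reject H0: mu = mu0 when mu0 lies outside the 1 - alpha interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lcl = xbar - z * sigma / sqrt(n)
    ucl = xbar + z * sigma / sqrt(n)
    return (lcl, ucl), not (lcl <= mu0 <= ucl)

interval, reject = two_tail_test_by_interval(126.8, 130, 15, 100, 0.05)
print([round(v, 2) for v in interval], reject)
```

The hypothesized value 130 falls above the upper limit, so the interval leads to the same decision as the two-tail z test.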

Drawbacks
Two-tail interval estimators may not provide the
right answer to the question posed in one-tail
hypothesis tests.
The interval estimator does not yield a p-value.

There are cases where only tests produce the
information needed to make decisions.
57
Calculating the Probability of a Type II Error

To properly interpret the results of a test of
hypothesis, we need to
specify an appropriate significance level or
judge the p-value of a test, and
understand the relationship between Type I and
Type II errors.
How do we compute the probability of a type II
error?

Calculating the probability of a type II error
requires that
the rejection region be expressed directly, in
terms of the parameter hypothesized (not
standardized), and
the alternative value (under H1) be specified.

H0: μ = μ0; H1: μ = μ1 (μ0 is not equal to μ1)
59

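Let us revisit Example 10.1.
The rejection region was x̄ > 175.34 with
α = .05.
A type II error occurs when a false H0 is not
rejected.

[Figure: two sampling distributions of the sample mean, one centered at μ0 = 170 and one at μ1 = 180, with the critical value 175.34 marked on both. H0 is not rejected when x̄ < 175.34, but H0 is false when μ = μ1 = 180.]

\[ \beta = P(\bar{x} < 175.34 \mid \mu = 180) = P\left( Z < \frac{175.34 - 180}{65 / \sqrt{400}} \right) = P(Z < -1.43) = .0764 \]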
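
The probability of a type II error for this setup is the chance that the sample mean falls below the critical value 175.34 when the true mean is 180. A minimal sketch, using σ = 65 and n = 400 from Example 10.1; the variable names are chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist

mu1, sigma, n = 180, 65, 400   # alternative mean and Example 10.1 inputs
critical = 175.34              # critical value quoted in the slides

# Sampling distribution of the sample mean when mu = mu1.
xbar_under_h1 = NormalDist(mu=mu1, sigma=sigma / sqrt(n))

beta = xbar_under_h1.cdf(critical)   # P(do not reject H0 | mu = 180)
power = 1 - beta                     # P(reject H0 | mu = 180)
print(round(beta, 3), round(power, 3))
```

The result is close to .076; a table lookup with z rounded to -1.43 gives .0764.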
60

Effects on β of changing α
Decreasing the significance level α increases
the value of β, and vice versa.

[Figure: for α1 > α2, the corresponding type II error probabilities satisfy β1 < β2.]
61

Judging the test
A hypothesis test is effectively defined by the
significance level α and by the sample size
n.
If the probability of a type II error β is judged
to be too large, we can reduce it by
increasing α, and/or
increasing the sample size.

62
[Figure: with α held fixed, β1 > β2.]
As a result β decreases
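
Both ways of reducing β can be checked numerically. The sketch below assumes the upper-tail setup of Example 10.1 (μ0 = 170, μ1 = 180, σ = 65); the alternative α values and the sample size of 1000 are hypothetical choices for comparison.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu1, sigma, n, alpha):
    """beta for the upper-tail test of H0: mu = mu0, evaluated at mu1 > mu0."""
    se = sigma / sqrt(n)
    critical = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Probability the sample mean stays below the critical value when mu = mu1.
    return NormalDist(mu1, se).cdf(critical)

base = type_ii_error(170, 180, 65, 400, alpha=0.05)
print(round(base, 4))
print(round(type_ii_error(170, 180, 65, 400, alpha=0.10), 4))    # larger alpha
print(round(type_ii_error(170, 180, 65, 400, alpha=0.01), 4))    # smaller alpha
print(round(type_ii_error(170, 180, 65, 1000, alpha=0.05), 4))   # larger n
```

Raising α or n lowers β, while lowering α raises it, in line with the slides.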