Chapter 2 The Forecast Process, Data Considerations, and Model Selection PowerPoint PPT Presentation


Title: Chapter 2 The Forecast Process, Data Considerations, and Model Selection


1
Chapter 2 The Forecast Process, Data
Considerations, and Model Selection
2
The Forecast Process
  • Specify Objectives
  • Determine What to Forecast
  • Identify Time Dimensions
  • Data Considerations
  • Model Selection
  • Model Evaluation
  • Forecast Presentation
  • Tracking Results

3
A Quick Example
  • You are a local government official looking at
    the infrastructure needs for Carroll County.
  • Specifically, there has been a lot of talk about
    the water supply and whether the source we
    currently have is enough.
  • You need to figure out whether supply is going
    to be a problem.

4
1. Specify Objectives
  • Why is the forecast important?
  • Carroll County needs forecasts of water
    demand for peak demand periods and for growth
  • What is needed to help determine future water
    policy?
  • A long-range forecast
  • - to determine future capacity needs
    (reservoir)
  • A short-range forecast
  • - to decide when to restrict current water
    usage (drought)

5
2. Determine What to Forecast
  • industrial demand?
  • commercial demand?
  • residential demand?
  • total demand
  • Sometimes, what you forecast depends on data
    availability.

6
3. Identify Time Dimensions (two in particular)
  • Periodicity - length of time period
  • long-range forecast: years
  • short-range forecast: days, weeks, or months
  • Lead time - how far in advance forecast must
    be available
  • long-range forecast: 10 years ahead
  • short-range forecast: 2 weeks ahead

7
4. Data Considerations
  • What is available internally?
  • Demand for Water
  • long range: millions of gallons per year
  • short range: millions (or thousands) of gallons
    per day
  • Number of residential, industrial customers
  • Residential rates, industrial rates (why do rates
    matter?)
  • What must be obtained from external sources?
  • long range: annual population
  • annual employment by industry
  • short range: daily high temperature
  • daily rainfall
  • flow capacities from groundwater sources

8
5. Model Selection
  • Options for forecasting approach depend on
  • Pattern exhibited by the data
  • Quantity of historic data available
  • Cost of acquiring available data

9
Data Patterns
  • Trend
  • level varies over time due to changes in income,
    population, relative prices, etc.
  • positive trend - level increases over time
  • negative trend - level decreases over time
  • stationary data - level doesn't change over time
  • Remember, some forecasting models that are
    appropriate for stationary data produce biased
    results when there is a trend in the data.
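
A minimal sketch of that bias, assuming a simple moving-average forecast as the "stationary" method (the slides do not name one) and a made-up series that rises by 2 every period. All names and numbers here are illustrative:

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)

# Hypothetical series with a steady positive trend: 10, 12, 14, ...
trend_series = [10 + 2 * t for t in range(20)]

# One-step-ahead forecast errors (actual - forecast) for the last 10 periods
errors = []
for t in range(10, 20):
    forecast = moving_average_forecast(trend_series[:t], window=4)
    errors.append(trend_series[t] - forecast)

mean_error = sum(errors) / len(errors)
print(errors)      # every error has the same sign
print(mean_error)  # positive: the forecast is consistently too low
```

Because the moving average looks backward, it always trails a rising series, so the errors do not average out to zero.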

10
Positive Trend (Income)
11
Negative Trend (apparel employment)
12
Data Patterns
  • Seasonal Pattern
  • level changes at same time of year, month, week,
    day, etc.
  • Examples: retail sales, electricity demand,
    water demand, building permits, rainfall

13
Seasonal Pattern
14
Data Patterns
  • Cyclical Pattern (don't confuse cyclical with
    seasonal)
  • level moves with business cycle
  • pro-cyclical - level ↑ as economic activity ↑
  • sales of normal goods, employment, interest rates
  • counter-cyclical - level ↑ as economic activity ↓
  • sales of inferior goods, bankruptcies,
    unemployment (sometimes; why only sometimes?)

15
Cyclical Pattern (pro)
16
Cyclical Pattern (counter; sometimes, why only
sometimes?!)
17
Chart 2.1 from book is a good starting point
18
5. Model Selection Continued
  • Number of options for appropriate forecasting
    models also depends on quantity of historic data
    available.
  • How far back does each time series go?
  • Rule-of-thumb for multiple regression models:
  • For statistically significant results, generally
    need 10 time periods for each explanatory
    variable
  • Example: 40 years of annual data supports only 4
    explanatory variables

19
6. Model Evaluation (Diagnostics)
  • Need annual forecast for next 5 years and we have
    50 years of data for all variables. Then the
    approach for evaluating different models is to
  • Estimate different models using the first 45 years
    of data.
  • Use results to develop forecasts for most recent
    5 years.
  • Select most accurate model (lowest RMSE) for last
    5 years.
  • Re-estimate selected model using the full 50
    years of data.
  • Use results to generate forecast for next 5
    years.
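
The holdout procedure above can be sketched as follows. This is only an illustration: the data are made up, and the two candidate models (a historical mean and a least-squares linear trend) are assumptions chosen for brevity, not models named in the slides.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def fit_mean(y):
    """Hypothetical model A: forecast every period with the historical mean."""
    level = sum(y) / len(y)
    return lambda t: level

def fit_linear_trend(y):
    """Hypothetical model B: least-squares line y = a + b*t, with t = 0, 1, 2, ..."""
    n = len(y)
    t_bar = (n - 1) / 2
    y_bar = sum(y) / n
    b = (sum((t - t_bar) * (v - y_bar) for t, v in enumerate(y))
         / sum((t - t_bar) ** 2 for t in range(n)))
    a = y_bar - b * t_bar
    return lambda t: a + b * t

# Made-up 50 years of annual data: upward trend plus a small alternating wiggle
data = [100 + 3 * t + (2 if t % 2 else -2) for t in range(50)]
train, holdout = data[:45], data[45:]
candidates = {"mean": fit_mean, "linear trend": fit_linear_trend}

# Estimate on the first 45 years, forecast the most recent 5, compare RMSE
scores = {}
for name, fit in candidates.items():
    model = fit(train)
    forecasts = [model(t) for t in range(45, 50)]
    scores[name] = rmse(holdout, forecasts)
best = min(scores, key=scores.get)

# Re-estimate the selected model on all 50 years, then forecast the next 5
final_model = candidates[best](data)
next_five = [final_model(t) for t in range(50, 55)]
print(best, scores, next_five)
```

The key design point is that the last 5 years are never used for estimation in the comparison step, so the RMSE reflects out-of-sample accuracy.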

20
7. Presentation
  • Communicating your results to the relevant
    parties, taking into consideration their level of
    sophistication.
  • Presentation can be
  • formal report
  • presentation
  • memo
  • phone call
  • Or a combination

21
8. Tracking Results
  • Monitoring accuracy of the model over time.
  • Determining if the model needs changing.
  • Trying other approaches.

22
Pause to Catch Breath and Change Gears a bit
23
Chapter 2 Review of Stats and Hypothesis Testing
  • Basic Descriptive Stats
  • Developing Null and Alternative Hypotheses
  • Type I and Type II Errors
  • One-Tailed Tests About a Population Mean
  • Large-Sample Case
  • Two-Tailed Tests About a Population Mean
  • Large-Sample Case

24
Basic Descriptive Stats
Mean
25
Developing Null and Alternative Hypotheses
  • Hypothesis testing can be used to determine
    whether a statement about the value of a
    population parameter should or should not be
    rejected.
  • The null hypothesis, denoted by H0, is a
    tentative assumption about a population parameter.
  • The alternative hypothesis, denoted by Ha (or
    H1), is the opposite of what is stated in the
    null hypothesis.
  • By construction, the alternative hypothesis is
    what the test is attempting to establish.

26
Developing Null and Alternative Hypotheses (uses)
  • Testing Research Hypotheses
  • Hypothesis testing is proof by contradiction.
  • The research hypothesis should be expressed as
    the alternative hypothesis.
  • The conclusion that the research hypothesis is
    true comes from sample data that contradict the
    null hypothesis.
  • Testing the Validity of a Claim
  • Manufacturers' claims are usually given the
    benefit of the doubt and stated as the null
    hypothesis (innocent until proven guilty).
  • The conclusion that the claim is false comes
    from sample data that contradict the null
    hypothesis.
  • Testing in Decision-Making Situations
  • A decision maker might have to choose between
    two courses of action, one associated with the
    null hypothesis and another associated with the
    alternative hypothesis.
  • Example: Accepting a shipment of goods from a
    supplier or returning the shipment of goods to
    the supplier.

27
Summary of Forms for Null and Alternative
Hypotheses about a Population Mean
  • The equality part of the hypotheses always
    appears in the null hypothesis.
  • In general, a hypothesis test about the value
    of a population mean μ must take one of the
    following three forms (where μ0 is the
    hypothesized value of the population mean).

One-tailed (lower-tail): H0: μ ≥ μ0   Ha: μ < μ0
One-tailed (upper-tail): H0: μ ≤ μ0   Ha: μ > μ0
Two-tailed: H0: μ = μ0   Ha: μ ≠ μ0
28
Example: Metro EMS
  • Null and Alternative Hypotheses
  • The director of medical services wants to
    formulate a hypothesis test that could use a
    sample of 40 emergency response times to
    determine whether or not the service goal of
    12 minutes or less is being achieved.

29
Null and Alternative Hypotheses
H0: μ ≤ 12
The emergency service is meeting the response
goal; no follow-up action is necessary.
Ha: μ > 12
The emergency service is not meeting the response
goal; appropriate follow-up action is necessary.
where μ = mean response time for the
population of medical emergency requests
30
Type I and Type II Errors
  • Because hypothesis tests are based on sample
    data, we must allow for the possibility of errors.
  • A Type I error is rejecting H0 when it is
    true.
  • The person conducting the hypothesis test
    specifies the maximum allowable probability of
    making a Type I error, denoted by α and called
    the level of significance, often 0.01, 0.05, or
    0.10. The smaller the α, the larger the
    confidence.
  • Our goal is to minimize Type I errors.

31
Type I and Type II Errors
  • A Type II error is accepting H0 when it is
    false.
  • It is difficult to control for the probability
    of making a Type II error, denoted by β. E.g.,
    poor selection of explanatory variables can
    easily lead to Type II errors.
  • Statisticians avoid the risk of making a Type
    II error by using the phrase "do not reject H0"
    and NOT "accept H0". We NEVER accept, we just
    fail to reject!...in the end, it's just
    terminology.

32
Type I and Type II Errors
                              Population Condition
Conclusion                    H0 True (μ ≤ 12)                       H0 False (μ > 12)
Accept H0 (Conclude μ ≤ 12)   Correct Decision                       Type II Error (Accepting a false Null)
Reject H0 (Conclude μ > 12)   Type I Error (Rejecting a true Null)   Correct Decision
33
Steps of Hypothesis Testing
1. Determine the null and alternative hypotheses.
2. Specify the level of significance α.
3. Select the test statistic that will be used
to test the hypothesis.
Using the Test Statistic
4. Use α to determine the critical value for the
test statistic and state the rejection rule
for H0.
5. Collect the sample data and compute the
value of the test statistic.
6. Use the value of the test statistic and the
rejection rule to determine whether to
reject H0.
34
One-Tailed Tests about a Population Mean
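Large-Sample Case (n ≥ 30)
  • Hypotheses
  • Test Statistic
  • Rejection Rule

H0: μ ≤ μ0   Ha: μ > μ0
or
H0: μ ≥ μ0   Ha: μ < μ0
σ Known: z = (X-bar - μ0) / (σ / √n)
σ Unknown (or n < 30): t = (X-bar - μ0) / (s / √n)
Reject H0 if z > zα (upper tail) or z < -zα (lower tail)
Reject H0 if t > tα (upper tail) or t < -tα (lower tail)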
35
It should be noted
  • For our purposes, at 30 degrees of freedom the
    t-distribution begins to approximate the standard
    normal; however, we almost never have population
    statistics, so we calculate t-stats for inference
    and testing.
  • See the t-distribution with infinite degrees of
    freedom (df).

36
Example: Metro EMS
  • Null and Alternative Hypotheses
  • The response times for a random sample of 40
    medical emergencies were tabulated. The sample
    mean is 13.25 minutes and the sample standard
    deviation is 3.2 minutes.
  • The director of medical services wants to
    perform a hypothesis test, with a .05 level of
    significance, to determine whether or not the
    service goal of 12 minutes or less is being
    achieved.

37
One-Tailed Tests about a Population Mean
Large-Sample Case (n ≥ 30)
  • Using the Test Statistic

1. Determine the hypotheses.
H0: μ ≤ 12   Ha: μ > 12
2. Specify the level of significance.
α = .05
3. Select the test statistic.
t = (X-bar - μ0) / (s / √n)
(σ is not known; we could use Z, but...)
4. State the rejection rule.
Reject H0 if t > 1.645 (df > 30, so it's the
same as Z in our book)
38
One-Tailed Tests about a Population Mean
Large-Sample Case (n ≥ 30)
  • Using the Test Statistic

5. Compute the value of the test statistic.
t = (13.25 - 12) / (3.2 / √40) = 2.47
6. Determine whether to reject H0.
Because 2.47 > 1.645 (the critical value), we
reject H0.
We are 95% confident that Metro EMS is not
meeting the response goal of 12 minutes.
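
The Metro EMS test statistic and rejection decision can be reproduced with a short sketch. It uses only the numbers given on the slides (n = 40, sample mean 13.25, sample standard deviation 3.2, goal of 12 minutes, .05 significance) and, as the slides do, the standard normal critical value as the large-sample approximation:

```python
import math
from statistics import NormalDist

n = 40          # sample size
x_bar = 13.25   # sample mean response time (minutes)
s = 3.2         # sample standard deviation (minutes)
mu_0 = 12       # hypothesized mean under H0
alpha = 0.05    # level of significance

# Test statistic: (sample mean - hypothesized mean) / standard error
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))

# Upper-tail critical value from the standard normal distribution
critical = NormalDist().inv_cdf(1 - alpha)

print(round(t_stat, 2))    # 2.47
print(round(critical, 3))  # 1.645
print(t_stat > critical)   # True, so reject H0
```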
39
One-Tailed Tests about a Population Mean
Large-Sample Case (n ≥ 30)
[Figure: t distribution with upper-tail rejection region, α = .05; critical value tα = 1.645; our t = 2.47; p-value = .0068]
Interpretation: Our t-stat indicates that the
sample mean that we got would have to be in the
tail of the distribution if the true mean were 12
minutes, and that's not very likely. The
likelihood of drawing a sample mean at random
that far from the true population mean is .0068,
or less than 1%.
40
One-Tailed Tests about a Population Mean
Large-Sample Case (n ≥ 30)
  • Using the p-Value

4. Compute the value of the test statistic.
5. Compute the p-value.
For z = 2.47, cumulative probability =
.9932. p-value = 1 - .9932 = .0068
6. Determine whether to reject H0.
Because p-value = .0068 < α = .05, we reject H0.
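
As a check on the p-value arithmetic, here is a minimal sketch using the standard normal distribution from Python's standard library, with z = 2.47 taken from the slide:

```python
from statistics import NormalDist

z = 2.47      # test statistic from the slide
alpha = 0.05  # level of significance

cumulative = NormalDist().cdf(z)  # area to the left of z
p_value = 1 - cumulative          # upper-tail area

print(round(cumulative, 4))  # 0.9932
print(round(p_value, 4))     # 0.0068
print(p_value < alpha)       # True, so reject H0
```

The p-value approach and the critical-value approach always give the same decision; the p-value simply reports how far into the tail the statistic falls.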
41
Two-Tailed Tests about a Population Mean
Large-Sample Case (n ≥ 30)
  • Hypotheses
  • Test Statistic
  • Rejection Rule

H0: μ = μ0   Ha: μ ≠ μ0
σ Known: z = (X-bar - μ0) / (σ / √n)
σ Unknown, or n < 30: t = (X-bar - μ0) / (s / √n)
Reject H0 if |z| > zα/2
Reject H0 if |t| > tα/2
42
Example: Glow Toothpaste
  • Two-Tailed Tests about a Population Mean: Large
    n
  • The production line for Glow toothpaste is
    designed to fill tubes with a mean weight of 6
    oz. Periodically, a sample of 30 tubes will be
    selected in order to check the filling process.
  • Quality assurance procedures call for the
    continuation of the filling process if the sample
    results are consistent with the assumption that
    the mean filling weight for the population of
    toothpaste tubes is 6 oz.; otherwise the process
    will be adjusted.

43
Example: Glow Toothpaste
  • Two-Tailed Tests about a Population Mean:
    Large n
  • Assume that a sample of 30 toothpaste tubes
    provides a sample mean of 6.1 oz. and standard
    deviation of 0.2 oz.
  • Perform a hypothesis test, at the .05 level of
    significance, to help determine whether the
    filling process should continue operating or be
    stopped and corrected.

44
Two-Tailed Tests about a Population Mean
Large-Sample Case (n ≥ 30)
  • Using the Test Statistic

1. Determine the hypotheses.
H0: μ = 6   Ha: μ ≠ 6
2. Specify the level of significance.
α = .05
3. Select the test statistic.
t = (X-bar - μ0) / (s / √n)
(σ is not known)
4. State the rejection rule.
Reject H0 if |t| > 2.045
45
Two-Tailed Tests about a Population Mean
Large-Sample Case (n ≥ 30)
  • Using the Test Statistic

[Figure: t distribution centered at 0; reject H0 for t < -2.045 or t > 2.045 (α/2 = .025 in each tail); do not reject H0 in between]
46
Two-Tailed Tests about a Population Mean
Large-Sample Case (n ≥ 30)
  • Using the Test Statistic

5. Compute the value of the test statistic.
t = (6.1 - 6) / (0.2 / √30) = 2.74
6. Determine whether to reject H0.
Because 2.74 > 2.045, we reject H0.
We are 95% confident that the mean filling weight
of the toothpaste tubes is not 6 oz. IT'S
MORE!!!
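
The Glow toothpaste calculation can be verified with a brief sketch. The critical value 2.045 (df = 29, .025 in each tail) is copied from the slide rather than computed, since the Python standard library has no t-distribution:

```python
import math

n = 30            # tubes sampled
x_bar = 6.1       # sample mean fill weight (oz.)
s = 0.2           # sample standard deviation (oz.)
mu_0 = 6.0        # target mean fill weight under H0
critical = 2.045  # two-tailed t critical value from the slide (df = 29)

t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
reject = abs(t_stat) > critical  # two-tailed: compare the absolute value

print(round(t_stat, 2), reject)  # 2.74 True
```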
47
Graphically
  • Using the Test Statistic

We reject the null!
[Figure: t distribution centered at 0; rejection regions t < -2.045 and t > 2.045 (α/2 = .025 in each tail); our t = 2.74 falls in the upper rejection region]
Interpretation: The likelihood of getting a
sample of 30 tubes with a sample mean of 6.1 oz,
when the true mean is 6 oz, is less than 5% (or,
less than 1 chance in 20).
48
Summary of Test Statistics to Be Used in
a Hypothesis Test about a Population Mean
n ≥ 30?
  Yes: σ known?
    Yes: Use σ in the test statistic
    No: Use s to estimate σ
  No: Popul. approx. normal?
    Yes: σ known?
      Yes: Use σ in the test statistic
      No: Use s to estimate σ
    No: Increase n to ≥ 30
49
Questions to Consider
  • What if we
  • -increase the st. dev. or variance, how will this
    affect the test statistic t or Z?...
  • and, how will this affect the likelihood of
    rejecting the null?
  • -Increase n, how will this affect the test
    statistic t or Z?...
  • and, how will this affect the likelihood of
    rejecting the null?
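
One way to explore these questions is to recompute the test statistic while changing one input at a time. The sketch below reuses the Glow toothpaste numbers as a baseline and holds the sample mean fixed; the alternative values of s and n are made up for illustration:

```python
import math

def t_statistic(x_bar, mu_0, s, n):
    """Test statistic for a hypothesis test about a population mean."""
    return (x_bar - mu_0) / (s / math.sqrt(n))

base = t_statistic(x_bar=6.1, mu_0=6.0, s=0.2, n=30)
more_spread = t_statistic(x_bar=6.1, mu_0=6.0, s=0.4, n=30)  # s doubled
more_data = t_statistic(x_bar=6.1, mu_0=6.0, s=0.2, n=120)   # n quadrupled

print(base, more_spread, more_data)
```

Doubling s halves the statistic (harder to reject the null), while quadrupling n doubles it (easier to reject), because the standard error is s / √n.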

50
Number of Pets in Household
A local pet store has claimed that the average
decent American owns 4 pets. However, you feel
that this is an overstatement, so you decide to
test this claim with a survey.
2-tailed test, right?! H0: μ0 = 4   H1: μ0 ≠ 4
You pick α = .05, so your critical value with
df = 11 is ±2.201
51
Let's double the sample size, but keep the
dispersion pretty constant
All I did here was increase the sample size to 24
people. The st. dev. changed slightly, but not
much. X-bar is still 3. Critical values are now
±2.069. However, now in both samples we can
strongly reject the null that the average number
of pets is 4.
52
Another Example (we aren't always able to reject)
  • Suppose the claim is made that UWG's average
    student age has not changed since 2002 (22.5
    years old; source: 2002 Fall Snapshot).
  • We really do want to know the average age of
    UWG's student population, but we don't have
    access to the population data...so, I decide to
    take a quick sample in front of the Library.

53
Our Sample
54
  • It's a two-tailed test
  • μ0 = 22.5
  • n = 9
  • X-bar = 21.7
  • s = 3.153

55
What do we need to decide?
  • Significance level
  • .01?
  • .05?
  • .10?
  • Now we just calculate the test statistic

56
Again, calculating the t-stat
.687
We fail to reject the null. Now what can we do if
we still think the null is wrong!?!?! Get more
data! Bigger samples, more samples
57
Dow Jones 30 Rates of Return and Assumptions of
Normality
  • Financial economists and stock analysts have long
    sought to define the return generating process of
    asset prices.
  • Specifically, are monthly stock returns normally
    distributed?
  • If so, all the information we need to make
    probability statements about future stock returns
    is the mean and variance using the standard
    normal distribution.
  • This also means we can make more meaningful
    comparisons about the returns in the stock market
    versus the returns to other investments, like a
    bond or a simple bank account.

58
Dow Jones 30 Index
59
  • Here again, we have used a natural log
    transformation, converting our stock index data
    into continuously compounded rates of return,
    i.e.,
  • Rate of return R = ln(Dow30(t) / Dow30(t-1))
  • We can see that there is a bit more on the
    positive side than on the negative side.
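
A minimal sketch of the log transformation, assuming a short list of made-up month-end index levels (these are not actual Dow Jones values):

```python
import math

# Hypothetical month-end index levels (illustrative only)
index_levels = [10000.0, 10150.0, 10050.0, 10300.0]

# Continuously compounded return for each month: ln(level_t / level_{t-1})
returns = [
    math.log(curr / prev)
    for prev, curr in zip(index_levels, index_levels[1:])
]

# Log returns add up across periods to the total log return
total = math.log(index_levels[-1] / index_levels[0])

print(returns)
print(sum(returns), total)
```

One reason for using log returns is this additivity: the sum of the monthly returns equals the return over the whole span, which is not true of simple percentage changes.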

60
Distribution of Rates of Returns (sorta)
The dist. of rates looks approximately
normal; for now, let's assume it is.
61
Descriptive Statistics
62
Normality and What it Gets Us
  • Considering the dist. above, stock rates of
    return cluster around the average monthly rate of
    return of .997, and so it can be approximated by
    the normal probability distribution.
  • The key benefit of assuming normality is
    simplicity: if we know the mean and variance of a
    normally distributed random variable, then we know
    completely the behavior of such a variable from
    its probability distribution function.

63
Normality means we can predict
  • To see this point we will make a simple forecast
    regarding the probability of a rate of return
    ≤ 0, in any given month, on the Dow Jones 30
    Industrials Index.

64
What do we know???
  • It's a one-tailed test; we are only concerned with
    returns ≤ 0.
  • We want to falsify the claim that rate of return
    is less than or equal to zero. So, we set up our
    null like the following:

65
The Hypothesis
  • H0: Rate of Return ≤ 0
  • H1: Rate of Return > 0
  • We pick an α of .01. Risky or not risky?
  • Now, all that's left is calculating the test
    statistic.

66
[Figure: t distribution centered at 0; reject H0 for t > 2.326 (α = .01); our t = 4.253]
67
What's the conclusion???
  • The probability of the average monthly rate of
    return being 0 or less is VERY SMALL.
  • Based on this historic data, how risky is this
    stock index if you are worried about losing money?

68
Correlation
  • How two things are related
  • It could be useful to know the correlation
    between advertising expenditures and sales, the
    book says. Why?
  • There are several ways to look at correlation,
    but we use one, Pearson's r, AKA the correlation
    coefficient.

69
Correlation Coefficient
  • The correlation coefficient measures the degree
    of linear association between an X and a Y.
  • It's defined as
    r = Σ(X - X-bar)(Y - Y-bar) / √[Σ(X - X-bar)² · Σ(Y - Y-bar)²]
  • It ranges from -1 to 1, with 0 indicating no
    linear correlation
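
A minimal sketch of the correlation coefficient, using the standard definition of Pearson's r (the sum of cross-deviations divided by the square root of the product of the summed squared deviations). The advertising and sales figures are made up for illustration:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cross = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_x = sum((a - x_bar) ** 2 for a in x)
    ss_y = sum((b - y_bar) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: sales rise exactly linearly with advertising
advertising = [1, 2, 3, 4, 5]
sales = [12, 14, 16, 18, 20]
r_linear = pearson_r(advertising, sales)

# A perfect but non-linear (quadratic) relationship
x = [-2, -1, 0, 1, 2]
y = [4, 1, 0, 1, 4]
r_curved = pearson_r(x, y)

print(r_linear, r_curved)
```

The second pair shows why r measures only linear association: y is completely determined by x, yet the coefficient is zero.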

70
Uses of r
  • r is typically used to examine
  • Correlation of RHS variables used in a regression
    equation.
  • -picking variables
  • -diagnosing problems
  • Correlation of observations near each other in
    time, space, or some other measure of distance.
  • -diagnosing problems (autocorrelation)

71
Correlation
  • Linear Correlation
  • Non-linear (like F)

72
Correlation Coefficient and Autocorrelation
  • Initially, everyone is taught to think of
    correlation as being between two separate
    variables, but it can also be between observations
    within a series. When this occurs, we refer to
    it as Autocorrelation.

If k = 1, then the lag is for only 1 period, but
you could have lags that are many periods.
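
A minimal sketch of a lag-k autocorrelation coefficient, assuming the commonly used sample formula: the sum of products of deviations k periods apart, divided by the total sum of squared deviations. The slide's own formula did not survive in this transcript, so treat this form as an assumption; the price series is made up.

```python
def autocorrelation(series, k):
    """Lag-k sample autocorrelation of a sequence of numbers."""
    n = len(series)
    mean = sum(series) / n
    # Products of deviations for observations k periods apart
    numerator = sum(
        (series[t] - mean) * (series[t + k] - mean) for t in range(n - k)
    )
    denominator = sum((value - mean) ** 2 for value in series)
    return numerator / denominator

# Hypothetical price series (illustrative only)
prices = [100, 102, 101, 105, 107, 110, 108, 112, 115, 117]

r1 = autocorrelation(prices, 1)  # lag of 1 period
r3 = autocorrelation(prices, 3)  # lag of 3 periods
print(round(r1, 3), round(r3, 3))
```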
73
Examples of Data that Exhibit Autocorrelation
  • Employment and employment rates
  • Prices
  • Crime (spatial)
  • County smog levels (spatial)

74
A Simple Way of Thinking of Autocorrelation
  • Yesterday's price level affects today's
    price level (time series autocorrelation).
  • The crime in the community next door
    affects crime here (spatial autocorrelation).
  • What other stuff fits this description?

75
Patterns in the Data
  • The form of autocorrelation is closely related
    to patterns you see in the data

76
Correlation Coefficient and Autocorrelation
  • A stationary data series should see its
    autocorrelation coefficient decline to zero
    quickly as the number of lags increases.
  • In data that has a trend, r_k will diminish
    slowly as k increases.
  • Seasonal data may have 12-month or 4-quarter
    lags. Autocorrelation may be strong for months
    or years.
  • Autocorrelation can be used as an indicator of
    trends or seasonality. As such, it can be used
    as a check for stationarity.
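
The contrast between stationary and trending data can be illustrated with a small simulation. This is a sketch under stated assumptions: the stationary series is simulated white noise, the trending series adds a made-up linear trend of 0.5 per period, and the autocorrelation formula is the common sample version rather than one confirmed by the slides.

```python
import random

def autocorrelation(series, k):
    """Lag-k sample autocorrelation of a sequence of numbers."""
    n = len(series)
    mean = sum(series) / n
    numerator = sum(
        (series[t] - mean) * (series[t + k] - mean) for t in range(n - k)
    )
    denominator = sum((value - mean) ** 2 for value in series)
    return numerator / denominator

random.seed(0)  # fixed seed so the run is repeatable
noise = [random.gauss(0, 1) for _ in range(200)]    # stationary series
trend = [0.5 * t + e for t, e in enumerate(noise)]  # same noise plus a trend

# Compare the two series at a few lags
for k in (1, 5, 10):
    print(k, round(autocorrelation(noise, k), 2), round(autocorrelation(trend, k), 2))
```

For the stationary series the coefficients sit near zero at every lag, while for the trending series they start close to 1 and fall off only gradually, which is the pattern the bullets above describe.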

77
Going over Assignment 1
  • Instructions
  • Questions
  • Return assignments by email?
  • I need your permission

78
Brainstorming Topics for Individual Projects