Title: IT-390 Forecasting
1. IT-390 Forecasting
2. Introduction
- Forecasting is the estimation of future values
- Not an exact science
- Forecasting involves extrapolation
  - This is based on the assumption that past data follow a particular pattern that will continue into the future
3. Graphic Analysis of Data
- Understanding how data are organized
- Raw data → frequency distribution
  - Group the data into categories (classes)
  - Count the number of observations in each class
- A frequency distribution shows the range (largest value minus smallest value), apparent patterns, the values the data may group around, where values appear most often, and so on
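The grouping steps above can be sketched in Python. This is a minimal illustration, assuming equal-width classes that start at the smallest observation and data with at least two distinct values; the function name `frequency_distribution` and the sample data are hypothetical, not taken from the textbook.

```python
def frequency_distribution(data, num_classes):
    """Group raw data into equal-width classes and count observations in each.

    Assumes at least two distinct values (so the class width is non-zero).
    Returns (data_range, classes), where classes is a list of
    (lower_bound, upper_bound, count) tuples. Each class includes its
    lower bound; the last class also includes the maximum value.
    """
    low, high = min(data), max(data)
    data_range = high - low                  # Range = largest - smallest value
    width = data_range / num_classes
    counts = [0] * num_classes
    for value in data:
        # Class index for this value; clamp the maximum into the last class.
        index = min(int((value - low) / width), num_classes - 1)
        counts[index] += 1
    classes = [(low + i * width, low + (i + 1) * width, counts[i])
               for i in range(num_classes)]
    return data_range, classes


# Hypothetical sample data (not the textbook's spreadsheet)
sample = [12.4, 12.9, 13.1, 13.3, 13.6, 13.8, 13.9, 14.0, 14.2, 14.6]
data_range, classes = frequency_distribution(sample, 4)
print(f"Range: {data_range:.2f}")
for lower, upper, count in classes:
    print(f"{lower:.2f} - {upper:.2f}: {count}")
```

The class with the most observations shows where the values group most often.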
4. Graphic Analysis of Data
- Histograms are used to graphically display relative frequency data
- From the Frequency Distribution spreadsheet on page 141, we get the relative frequency
  - Relative frequency is calculated by dividing the number of observations in a class by the total number of observations
[Figure: Histogram of relative frequency (vertical axis, up to .400) for class intervals from 12.35-12.75 through 15.65-16.05]
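A minimal Python sketch of the relative frequency calculation; each histogram bar's height is the relative frequency of its class. The function name and the class counts are hypothetical, not the spreadsheet data from page 141.

```python
def relative_frequencies(counts):
    """Relative frequency of each class: class count / total observations."""
    total = sum(counts)
    return [count / total for count in counts]


# Hypothetical class counts, e.g. taken from a frequency distribution
counts = [2, 3, 8, 5, 2]
rel = relative_frequencies(counts)
print([round(r, 3) for r in rel])  # → [0.1, 0.15, 0.4, 0.25, 0.1]
```

Because every observation falls in exactly one class, the relative frequencies always sum to 1.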
5. Graphic Analysis of Data
- Cumulative frequency is graphed using a plot called an "ogive" (pronounced "oh-jive")
- To estimate the future or a trend, use historical data
- When historical data for two variables (normally two) are plotted against each other, a scatter diagram is created
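An ogive plots the cumulative frequency of each class against that class's upper boundary. The Python sketch below computes those points; the function name and the class data are hypothetical. Many ogives also start with a zero at the lower boundary of the first class, which this sketch omits.

```python
from itertools import accumulate


def ogive_points(upper_bounds, counts):
    """Pair each class upper boundary with the cumulative frequency up to that class."""
    return list(zip(upper_bounds, accumulate(counts)))


# Hypothetical classes: upper boundaries and class counts
upper_bounds = [10, 20, 30, 40]
counts = [3, 7, 6, 4]
points = ogive_points(upper_bounds, counts)
print(points)  # → [(10, 3), (20, 10), (30, 16), (40, 20)]
```

The last cumulative value always equals the total number of observations.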
6. Graphic Analysis of Data
- Data on a Scatter Diagram takes many forms
7. Graphic Analysis of Data
- Steps to generate a scatter diagram
  - 1) Collect data
  - 2) Draw the horizontal axis: the independent ("cause") variable goes on X
  - 3) Draw the vertical axis: the dependent ("effect") variable goes on Y
  - 4) Plot the data; circle repeated points
  - 5) Analyze
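Steps 2 through 4 can be sketched as a text-only scatter diagram in Python. This is a hypothetical illustration, not a plotting tool from the course: the function name, grid size, and data are assumptions, and an `@` stands in for a circled repeat point.

```python
from collections import Counter


def scatter_diagram(points, width=20, height=10):
    """Return a text scatter diagram as a list of strings (top row first).

    x (independent) runs left to right; y (dependent) runs bottom to top.
    A cell holding one point is drawn as 'o'; a cell holding repeated
    points is drawn as '@' (standing in for a circled point).
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    x_span = (max(xs) - x_min) or 1   # avoid division by zero
    y_span = (max(ys) - y_min) or 1

    # Map each point to a grid cell and count how many land in each cell.
    cells = Counter(
        (round((x - x_min) / x_span * (width - 1)),
         round((y - y_min) / y_span * (height - 1)))
        for x, y in points
    )
    grid = [[" "] * width for _ in range(height)]
    for (col, row), n in cells.items():
        grid[height - 1 - row][col] = "@" if n > 1 else "o"
    # Left edge is the Y axis; bottom line is the X axis.
    return ["|" + "".join(r) for r in grid] + ["+" + "-" * width]


# Hypothetical data: (x = cause, y = effect); (3, 6) appears twice
data = [(1, 2), (2, 4), (3, 6), (3, 6), (4, 7), (5, 10)]
for line in scatter_diagram(data):
    print(line)
```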
8. Graphic Analysis of Data
- Graphs let us "see" where the data are going
- Mathematical methods (algorithms):
  - Linear Least Squares Regression
  - Curvilinear Least Squares Regression
9. Least Squares and Regression
- Generate a mathematical equation for the line of best fit to the data
- The first method we will examine is Linear Least Squares Regression
10. Least Squares and Regression
- Linear Least Squares Regression
  - Finds the line of best fit through the historical points
- Assumptions
  - 1) Data are normally distributed
  - 2) y is the dependent variable and x is the independent variable
  - 3) Data appear to be linear, not curvilinear
11. Least Squares and Regression
- The linear equation is \(Y = a + bx\), where
  - a is a constant equal to the value of Y at the point where \(x = 0\) (the Y-intercept)
  - b is the slope of the line
  - x is the independent variable
  - Y is the dependent variable
- The two equations necessary are
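A minimal Python sketch of linear least squares, assuming the standard closed-form solution for n data points: \(b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}\) and \(a = \frac{\sum y - b\sum x}{n}\). The textbook may write the two equations in a different but equivalent form (for example, as the normal equations); the function name and data below are hypothetical.

```python
def linear_least_squares(xs, ys):
    """Fit Y = a + bx by least squares; return (a, b)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n                                   # Y value at x = 0
    return a, b


# Hypothetical historical data lying exactly on Y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = linear_least_squares(xs, ys)
print(a, b)               # → 1.0 2.0
forecast = a + b * 6      # extrapolate to x = 6
print(forecast)           # → 13.0
```

Forecasting then means plugging a future x into the fitted equation, which is the extrapolation described in the introduction.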
12. Linear (Data) Example
13. Linear Example
14. Handout Problem (1)
15. Least Squares and Regression
- Curvilinear Least Squares Regression
  - Uses logarithmic functions (log and antilog)
  - Same assumptions as linear, except the data are curvilinear
- Power equation (Eq. 5.25, pg. 201)
  - \(Y = a x^{b}\), where
  - a is a constant equal to the value of Y at the point where \(x = 1\)
  - b is the slope of the straight line obtained by plotting log Y against log x
  - x is the independent variable
  - Y is the dependent variable
- Taking logs gives the linear equation \(\log Y = \log a + b \log x\); from Eqs. 5.29 and 5.30 (pg. 202), compute log a and b, then
  - \(a = \operatorname{antilog}(\log a)\)
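A minimal Python sketch of fitting the power equation by running linear least squares on the logarithms, then taking the antilog to recover a. It assumes base-10 logs and strictly positive x and Y values; the function name and data are hypothetical, and the textbook's Eqs. 5.29 and 5.30 may be arranged differently.

```python
import math


def power_least_squares(xs, ys):
    """Fit Y = a * x**b by linear least squares on base-10 logarithms.

    Requires all x and Y values to be positive. Returns (a, b).
    """
    log_x = [math.log10(x) for x in xs]
    log_y = [math.log10(y) for y in ys]
    n = len(xs)
    sx, sy = sum(log_x), sum(log_y)
    sxy = sum(u * v for u, v in zip(log_x, log_y))
    sx2 = sum(u * u for u in log_x)
    b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)  # slope in log-log space
    log_a = (sy - b * sx) / n                      # intercept: log a
    a = 10 ** log_a                                # a = antilog of log a
    return a, b


# Hypothetical data lying exactly on Y = 3x^2
xs = [1, 2, 4, 8]
ys = [3, 12, 48, 192]
a, b = power_least_squares(xs, ys)
print(round(a, 6), round(b, 6))  # → 3.0 2.0
```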
16. Curvilinear Example
17. Handout Problem (2)
18. Caution!
- There is an axiom in statistics that says,
"Correlation does not imply causality." In other
words, your scatter plot may show that a
relationship exists, but it does not and cannot
prove that one variable is causing the other.
There could be a third factor involved which is
causing both, some other systemic cause, or the
apparent relationship could just be a fluke.
Nevertheless, the scatter plot can give you a
clue that two things might be related, and if so,
how they move together.
19. Standard Error of Estimate
- Standard Error of Estimate (not in the book, but important). To measure the reliability of the estimating equation, statisticians have developed the standard error of estimate. It is symbolized by \(S_e\) and is a measure of dispersion, or the variability (scatter), around the regression line.
- The equation is
  - \(S_e = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{n - 2}}\), where \(\hat{Y}\) is the value estimated from the regression line and n is the number of data points
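A minimal Python sketch of the standard error of estimate, assuming the fitted line's a and b are already known. The data are hypothetical; for these points, least squares gives \(\hat{Y} = 1.6 + 1.8x\), and the function name is also hypothetical.

```python
import math


def standard_error_of_estimate(xs, ys, a, b):
    """Se = sqrt( sum((Y - Y_hat)^2) / (n - 2) ) for the line Y_hat = a + b*x."""
    n = len(xs)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))   # n - 2: two parameters (a, b) were estimated


# Hypothetical data; least squares on these points gives a = 1.6, b = 1.8
xs = [1, 2, 3, 4, 5]
ys = [3, 6, 7, 8, 11]
se = standard_error_of_estimate(xs, ys, 1.6, 1.8)
print(round(se, 4))  # → 0.7303
```

The residuals here are -0.4, 0.8, 0, -0.8, and 0.4, so the sum of squares is 1.6 and \(S_e = \sqrt{1.6/3}\).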
20. Standard Error of Estimate
21. Handout Problem 1: Standard Error
22. Interpretation of Standard Error
- If \(S_e = 0\), the equation would be perfect: all points would lie on the line instead of around it
- Assuming the points are normally distributed around the line:
  - About 68% of all points lie within ±1 \(S_e\) (the 1st standard deviation)
  - About 95.5% of all points lie within ±2 \(S_e\) (the 2nd standard deviation)
  - About 99.73% of all points lie within ±3 \(S_e\) (the 3rd standard deviation)
23. Interpretation of Standard Error
- Problem 1: estimated value = 6,515,346 kilowatts. The first standard deviation is ±120, the 2nd is ±240, and the 3rd is ±360 kilowatts
- This means
  - 68% of data points are between 6,515,226 and 6,515,466 (±120)
  - 95.5% of data points are between 6,515,106 and 6,515,586 (±240)
  - 99.7% of data points are between 6,514,986 and 6,515,706 (±360)
- This also shows the probability of where new data will fall in relation to the regression line.
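The bands in Problem 1 can be reproduced with a short Python sketch. The function name `se_bands` is hypothetical; the inputs are the estimate (6,515,346) and \(S_e = 120\) from the problem.

```python
def se_bands(estimate, se, max_k=3):
    """Return {k: (lower, upper)} for estimate ± k*Se, k = 1..max_k."""
    return {k: (estimate - k * se, estimate + k * se) for k in range(1, max_k + 1)}


bands = se_bands(6_515_346, 120)
for k, (low, high) in bands.items():
    print(f"±{k} Se: {low:,} to {high:,}")
```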
24. Interpretation of Standard Error Graphically
[Figure: Data points (o) plotted with bands marking the 1st, 2nd, and 3rd standard deviations]
25. Correlation
- A correlation coefficient is a number between -1
and 1 which measures the degree to which two
variables are linearly related.
26. Correlation
27. Correlation
- 1) If there is a perfect linear relationship with positive slope between the two variables, we have a correlation coefficient of 1. If there is positive correlation, Y (dependent variable) increases as X (independent variable) increases.
28. Correlation
- 2) If there is a perfect linear relationship with
negative slope between the two variables, we have
a correlation coefficient of -1. If there is
negative correlation, Y (dependent variable)
decreases as X (independent variable) increases.
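A minimal Python sketch of the correlation coefficient, assuming the standard Pearson formula \(r = \frac{n\sum xy - \sum x \sum y}{\sqrt{(n\sum x^2 - (\sum x)^2)(n\sum y^2 - (\sum y)^2)}}\). The function name and data are hypothetical; the two examples reproduce the perfect positive and perfect negative cases described above.

```python
import math


def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r, between -1 and 1."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return numerator / denominator


print(correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8]))  # → 1.0 (perfect positive)
print(correlation_coefficient([1, 2, 3, 4], [8, 6, 4, 2]))  # → -1.0 (perfect negative)
```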
29. Correlation
Positive Correlation (as X increases, Y increases)
30. Handout Problem 1: Correlation
31. Cost Indexes
- An index is a dimensionless number used to
indicate how a cost has changed over time
relative to a base year. The creator of the
index publishes a table showing the value of the
index for the years of interest.
32. Cost Index Example Graph
33. Cost Indexes
- For example, the Bureau of Labor Statistics tracks the Consumer Price Index (CPI), which "is a measure of the average change over time in the prices paid by urban consumers for a market basket of consumer goods and services." It includes a number of goods and services in categories such as Food and Beverages, Housing, Apparel, and Transportation. Selected index values are shown below.
- Source: Bureau of Labor Statistics
34. Cost Indexes
- Using the indexes, if we know the cost of goods in one year, we can estimate the cost of the same goods in another year by using a simple ratio:
  - \(\dfrac{C_A}{C_B} = \dfrac{I_A}{I_B}\)
- where \(I_A\) and \(I_B\) are the indexes in years A and B, respectively, and \(C_A\) and \(C_B\) are the costs in years A and B, respectively,
- which simplifies to
  - \(C_B = C_A \left( \dfrac{I_B}{I_A} \right)\)
35. Index Examples
- Example 1
  - A family spent $160 per month on groceries in 1987. How much can they expect to spend in 1994?
- Solution
  - To find the cost in 1994, we need the cost of groceries in 1987 (\(C_{1987} = \$160\)) and the index values for 1987 and 1994 (\(I_{1987} = 118.2\) and \(I_{1994} = 156.5\)). Then,
  - \(C_{1994} = C_{1987} \, (I_{1994} / I_{1987})\)
  - \(C_{1994} = 160 \, (156.5 / 118.2)\)
  - \(C_{1994} = 160 \, (1.3240)\)
  - \(C_{1994} \approx \$212\)
- Since the index value for 1994 is higher than the index value for 1987, we should expect the cost of groceries to increase, and it does.
36. Index Examples
- Example 2
  - When Anna graduated as an engineer in 1992, her starting salary was $33,000. What would her first employer have to offer a 2000 graduate to start in her job? Given \(I_{1992} = 147.3\) and \(I_{2000} = 182.3\).
- Solution
  - Simply taking into account the change in the cost of living reflected in the CPI, her employer would have to offer
  - \(C_{2000} = C_{1992} \, (I_{2000} / I_{1992})\)
  - \(C_{2000} = 33{,}000 \, (182.3 / 147.3)\)
  - \(C_{2000} = 33{,}000 \, (1.2376)\)
  - \(C_{2000} \approx \$40{,}841\)
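The ratio used in both examples, \(C_B = C_A \, (I_B / I_A)\), can be sketched as a one-line Python function. The function name `convert_cost` is hypothetical; the inputs are the index values and costs from Examples 1 and 2.

```python
def convert_cost(cost_a, index_a, index_b):
    """Estimate the cost in year B from the cost in year A: C_B = C_A * (I_B / I_A)."""
    return cost_a * (index_b / index_a)


# Example 1: groceries, 1987 -> 1994
print(round(convert_cost(160, 118.2, 156.5)))      # → 212
# Example 2: starting salary, 1992 -> 2000
print(round(convert_cost(33_000, 147.3, 182.3)))   # → 40841
```

Rounding happens only at the end; the slides round the index ratio to four decimals first, which gives the same answers here.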
37. Cost Indexes
- Cost indexes: A cost index expresses a change in, or the relationship between, price levels at two points in time. All that is needed to generate an index is a base period. The relationship between the base period and each of the other periods is an index value. See the example below, where period 4 is selected as the base period.

  Year   Period   Price   Index
  1981   1        43.75   43.75/46.10 = .949 or 94.9
  1982   2        44.25   44.25/46.10 = .960 or 96.0
  1983   3        45.00   45.00/46.10 = .976 or 97.6
  1984   4        46.10   46.10/46.10 = 1.000 or 100.0   ← Base
  1985   5        47.15   47.15/46.10 = 1.023 or 102.3
  1986   6        49.25   49.25/46.10 = 1.068 or 106.8
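A minimal Python sketch of building the index table above: each period's price is divided by the base-period price. The function name `index_series` is hypothetical; the prices are the ones in the table, with period 4 (1984) as the base.

```python
def index_series(prices, base_period):
    """Index for each period = price / price in the base period (periods numbered from 1)."""
    base_price = prices[base_period - 1]
    return [price / base_price for price in prices]


prices = [43.75, 44.25, 45.00, 46.10, 47.15, 49.25]   # periods 1-6 (1981-1986)
indexes = index_series(prices, base_period=4)          # 1984 is the base
print([round(i, 3) for i in indexes])
# → [0.949, 0.96, 0.976, 1.0, 1.023, 1.068]
```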
38. Cost Indexes
- The base period can be shifted or changed by dividing by the dollar amounts, as above, or by simply dividing the current indexes by the index value of the new base period. For example, if we select period 5 as the new base, our indexes would appear as follows:

  Year   Period   Price   New Index
  1981   1        43.75   .949/1.023 = .928 or 92.8
  1982   2        44.25   .960/1.023 = .938 or 93.8
  1983   3        45.00   .976/1.023 = .954 or 95.4
  1984   4        46.10   1.000/1.023 = .978 or 97.8
  1985   5        47.15   1.023/1.023 = 1.000 or 100.0   ← New base
  1986   6        49.25   1.068/1.023 = 1.044 or 104.4
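Shifting the base can be sketched in Python by dividing every existing index by the index of the new base period. The function name `rebase` is hypothetical; the starting values are the period-4-based indexes from the previous table.

```python
def rebase(indexes, new_base_period):
    """Shift the base: divide every index by the index of the new base period (periods from 1)."""
    new_base = indexes[new_base_period - 1]
    return [index / new_base for index in indexes]


old = [0.949, 0.960, 0.976, 1.000, 1.023, 1.068]   # base = period 4
new = rebase(old, new_base_period=5)
print([round(i, 3) for i in new])
# → [0.928, 0.938, 0.954, 0.978, 1.0, 1.044]
```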
39. Cost Indexes
- Instead of trying to work with dollars throughout
all computations, it is often easier to use an
index to convert a prior cost into an estimate of
future cost.