Title: Government Financial Accounting
1. Regression Analysis
Defense Resources Management Institute
2. Unscheduled Maintenance Issue
- 36 flight squadrons
- Each experiences unscheduled maintenance actions (UMAs)
- Each UMA costs $1,000 to repair, on average.
3. You've got the Data. Now What?
Unscheduled Maintenance Actions (UMAs)
4. What do you want to know?
- How many UMAs will there be next month?
- What is the average number of UMAs?
5. Sample Mean
6. Sample Standard Deviation
7. UMA Sample Statistics
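The formulas on these slides did not survive extraction. As a minimal sketch, assuming the usual definitions (mean = sum of values / n, and a sample standard deviation with an n - 1 divisor), the two statistics could be computed as below. The function names and the UMA counts are hypothetical, made up for illustration; they are not the squadron data from the deck.

```python
import math

def sample_mean(values):
    """Arithmetic mean of the observations."""
    return sum(values) / len(values)

def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    m = sample_mean(values)
    sum_sq = sum((v - m) ** 2 for v in values)
    return math.sqrt(sum_sq / (len(values) - 1))

# Hypothetical monthly UMA counts for five squadrons (illustrative only).
uma_counts = [48, 60, 72, 54, 66]

print("mean:", sample_mean(uma_counts))
print("standard deviation:", sample_std(uma_counts))
```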
8. UMAs Next Month
95% Confidence Interval
9. Average UMAs
95% Confidence Interval
10. Model: Cost of UMAs for one squadron
- If the cost per UMA is $1,000, the expected cost for one squadron = 60 × $1,000 = $60,000
12. Model: Total Cost of UMAs
- Expected cost for all squadrons
- 60 × $1,000 × 36 = $2,160,000
- How confident are we about this estimate?
13. 95% interval (figure): mean = 60, standard error = 12/√36 = 2
14. Figure: axis marked at 56, 58, 60, 62, 64 (1 standard unit = 2), with the 95% region indicated
15. 95% Confidence Interval on our estimate of UMAs and costs
- 60 ± 2(2) = [56, 64]
- low cost = 56 × $1,000 × 36 = $2,016,000
- high cost = 64 × $1,000 × 36 = $2,304,000
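The interval and cost arithmetic can be reproduced with a short sketch. It assumes the figures given on the slides (mean 60, standard deviation 12, 36 squadrons, $1,000 per UMA) and the slides' round multiplier of 2 standard errors for 95% (the normal value is closer to 1.96). The function name is hypothetical.

```python
import math

def mean_confidence_interval(mean, std_dev, n, multiplier=2.0):
    """Return (low, high) = mean ± multiplier * std_dev / sqrt(n)."""
    std_error = std_dev / math.sqrt(n)
    return mean - multiplier * std_error, mean + multiplier * std_error

# Figures taken from the slides.
COST_PER_UMA = 1000
SQUADRONS = 36

low, high = mean_confidence_interval(mean=60, std_dev=12, n=SQUADRONS)
low_cost = low * COST_PER_UMA * SQUADRONS
high_cost = high * COST_PER_UMA * SQUADRONS

print("UMAs per squadron:", low, "to", high)
print("total cost:", low_cost, "to", high_cost)
```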
16. What do you want to know?
- How many UMAs will there be next month?
- What is the average number of UMAs?
- Is there a relationship between UMAs and some other variable that may be used to predict UMAs?
- What is that relationship?
17. Relationships
- What might be related to UMAs?
- Pilot Experience?
- Flight hours?
- Sorties flown?
- Mean time to failure (for specific parts)?
- Number of landings / takeoffs?
18. Regression
- To estimate the expected or mean value of UMAs for next month
- Look for a linear relationship between UMAs and a predictive variable
- If a linear relationship exists, use regression analysis
19. Regression analysis
- describes and evaluates relationships between one variable (the dependent or explained variable) and one or more other variables (called the independent or explanatory variables).
20. What is a good estimating variable for UMAs?
- quantifiable
- predictable
- logical relationship with dependent variable
- must be a linear relationship
- Y = a + bX
21. Sorties
22. Pilot Experience
23. Sample Statistics
24. Describing the Relationship
- Is there a relationship?
- Do the two variables (UMAs and sorties or experience) move together?
- Do they move in the same direction or in opposite directions?
- How strong is the relationship?
- How closely do they move together?
25. Positive Relationship
26. Strong Positive Relationship
27. Negative Relationship
28. Strong Negative Relationship
29. No Relationship
30. Relationship?
31. Correlation Coefficient
- Statistical measure of how closely two variables are moving together in a coordinated fashion
- Measures strength and direction
- Value ranges from -1.0 to 1.0
- 1.0 indicates perfect positive linear relation
- -1.0 indicates perfect negative linear relation
- 0 indicates no linear relation between the two variables
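A minimal sketch of how a correlation coefficient with these properties can be computed, assuming the standard Pearson definition (the deck's own formula was lost in extraction). The function name and the sortie/UMA numbers are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-movement of x and y scaled to the range [-1, 1]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (not from the deck): sorties flown and UMAs for four squadrons.
sorties = [10, 20, 30, 40]
umas = [27, 43, 67, 83]

print("r =", pearson_r(sorties, umas))
# Points lying exactly on a falling line give r = -1 (up to rounding).
print("r =", pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```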
32. Correlation Coefficient
33. Sorties vs. UMAs
r = .9788
34. Experience vs. UMAs
r = .1896
35. Correlation Matrix
36. A Word of Caution...
- Correlation does NOT imply causation
- It simply measures the coordinated movement of two variables
- Variation in two variables may be due to a third common variable
- The observed relationship may be due to chance alone
37. What is the Relationship?
- In order to use the correlation information to help describe the relationship between two variables we need a model
- The simplest one is a linear model
38. Fitting a Line to the Data
39. One Possibility
Sum of errors = 0
40. Another Possibility
Sum of errors = 0
41. Which is Better?
- Both have sum of errors = 0
- Compare sum of absolute errors
42. Fitting a Line to the Data
43. One Possibility
Sum of absolute errors = 6
44. Another Possibility
Sum of absolute errors = 6
45. Which is Better?
- Sums of the absolute errors are equal
- Compare sum of errors squared
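The progression above (sum of errors, then absolute errors, then squared errors) can be illustrated with a small sketch. The four data points and the two candidate lines are hypothetical, constructed so that both lines tie on the first two criteria and only the squared errors separate them; they are not the points plotted in the deck.

```python
def error_sums(xs, ys, intercept, slope):
    """Return (sum of errors, sum of absolute errors, sum of squared errors)."""
    errors = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return (
        sum(errors),
        sum(abs(e) for e in errors),
        sum(e * e for e in errors),
    )

# Hypothetical data points.
xs = [1, 2, 3, 4]
ys = [6, 6, 9, 9]

line_a = error_sums(xs, ys, intercept=0.0, slope=3.0)  # Y = 3X
line_b = error_sums(xs, ys, intercept=7.5, slope=0.0)  # Y = 7.5

print("line A:", line_a)  # (0.0, 6.0, 18.0)
print("line B:", line_b)  # (0.0, 6.0, 9.0)
```

Both lines have errors that sum to 0 and absolute errors that sum to 6, but line B has the smaller sum of squared errors because it avoids the two large misses of line A.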
46. The Correct Relationship: Y = a + bX + U
Figure: Y (50 to 100) plotted against X (100 to 130); a + bX is the systematic part and U is the random part.
48. Least-Squares Method
- Penalizes large absolute errors
- Y-intercept
- Slope
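The slope and intercept formulas on this slide were lost in extraction. A minimal sketch, assuming the standard least-squares solution for Y = a + bX (slope b = Σ(X - mean X)(Y - mean Y) / Σ(X - mean X)², intercept a = mean Y - b × mean X). The function name and data are hypothetical.

```python
def least_squares(xs, ys):
    """Return (intercept a, slope b) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through the means
    return intercept, slope

# Hypothetical data (not from the deck): sorties flown and UMAs for four squadrons.
sorties = [10, 20, 30, 40]
umas = [27, 43, 67, 83]

a, b = least_squares(sorties, umas)
print("intercept a =", a)
print("slope b =", b)
```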
49. Assumptions
- Linear relationship
- Errors are random and normally distributed with mean 0 and variance σ²
- Supported by Central Limit Theorem
50. Least Squares Regression for Sorties and UMAs
51. Regression Calculations
52. Sorties vs. UMAs
53. Regression Calculations: Confidence in the predictions
54. Confidence Interval for Estimate
55. 95% Confidence Interval for the model (b)
Figure: Y plotted against X
56. Testing Model Parameters
- How well does the model explain the variation in the dependent variable?
- Does the independent variable really seem to matter?
- Is the intercept constant statistically significant?
57. Variation
58. Coefficient of Determination
- Values between 0 and 1
- R² = 1 when all data on line (r = ±1)
- R² = 0 when no correlation (r = 0)
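A minimal sketch of the coefficient of determination, assuming the usual definition R² = 1 - (unexplained variation / total variation). In a one-variable regression R² equals r², so a correlation of r = .9788 would correspond to R² of about .958. The function name, data, and fitted line below are hypothetical.

```python
def r_squared(xs, ys, intercept, slope):
    """Share of the variation in Y explained by the line Y = intercept + slope * X."""
    mean_y = sum(ys) / len(ys)
    total = sum((y - mean_y) ** 2 for y in ys)
    unexplained = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - unexplained / total

# Hypothetical data and an assumed least-squares line Y = 7 + 1.92 X fitted to it.
sorties = [10, 20, 30, 40]
umas = [27, 43, 67, 83]

print("R^2 =", r_squared(sorties, umas, intercept=7.0, slope=1.92))
# A flat line at the mean of Y explains none of the variation.
print("R^2 =", r_squared(sorties, umas, intercept=55.0, slope=0.0))
```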
59. Regression Calculations: How well does the model explain the variation?
60. Does the Independent Variable Matter?
- If sorties do not help predict UMAs we expect b = 0
- If b is not 0, is it statistically significant?
61. Regression Calculations: Does the Independent Variable Matter?
62. 95% Confidence Interval for the slope (a)
Figure: Y plotted against X, with the mean of Y and the mean of X marked
63. Confidence Interval for Slope
64. Is the Intercept Statistically Significant?
65. Confidence Interval for Y-intercept
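The interval formulas for these slides were lost in extraction. The sketch below assumes the standard simple-regression results: the residual standard deviation s uses n - 2 degrees of freedom, the slope has standard error s / √Σ(X - mean X)², and the intercept has standard error s × √(1/n + (mean X)² / Σ(X - mean X)²). The multiplier `t_crit=2.0` is the round figure the deck uses for 95%; for small samples the proper t-table value with n - 2 degrees of freedom is larger. All names and data are hypothetical.

```python
import math

def regression_intervals(xs, ys, t_crit=2.0):
    """Least-squares fit plus rough interval estimates for slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual standard deviation: estimate of the spread of U (n - 2 degrees of freedom).
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(ss_resid / (n - 2))

    se_slope = s / math.sqrt(sxx)
    se_intercept = s * math.sqrt(1 / n + mean_x ** 2 / sxx)
    return {
        "slope": (slope - t_crit * se_slope, slope + t_crit * se_slope),
        "intercept": (intercept - t_crit * se_intercept,
                      intercept + t_crit * se_intercept),
    }

# Hypothetical data (not from the deck).
sorties = [10, 20, 30, 40]
umas = [27, 43, 67, 83]

intervals = regression_intervals(sorties, umas)
slope_low, slope_high = intervals["slope"]
icpt_low, icpt_high = intervals["intercept"]
print("slope interval:", slope_low, "to", slope_high)
print("slope interval excludes 0:", not (slope_low <= 0 <= slope_high))
print("intercept interval:", icpt_low, "to", icpt_high)
```

If the slope interval excludes 0, the independent variable appears to matter; the same check on the intercept interval addresses whether the intercept is statistically significant.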
66. Basic Steps of Regression Analysis
- Formulate the model
- Plot scatter diagram for visual inspection
- Compute correlation coefficient
- Fit the regression line
- Test the model
67. Factors affecting estimation accuracy
- Sample size (larger is better)
- Range of X values (wider is better)
- Standard deviation of U (smaller is better)
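One way to see why these three factors matter, assuming the standard simple-regression result for the standard error of the slope b (here s_U is the estimated standard deviation of U and s_X is the sample standard deviation of X; both symbols are notation introduced for this note):

```latex
\mathrm{SE}(b) \;=\; \frac{s_U}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}
\;=\; \frac{s_U}{s_X \sqrt{n-1}}
```

A larger n, a wider spread of X values (larger s_X), or a smaller s_U each shrink the standard error and so tighten the estimate of b.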
68. Uses and Limitations of Regression Analysis
- Identifying relationships
  - Not necessarily cause
  - May be due to chance only
- Forecasting future outcomes
  - Only valid over the range of the data
  - Past may not be good predictor of future
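The range limitation can be enforced mechanically. The sketch below is one hypothetical way to do it: a prediction helper that refuses to extrapolate beyond the X values used to fit the line. The function name, data, and fitted coefficients are assumptions for illustration.

```python
def predict_within_range(x_new, xs, intercept, slope):
    """Predict Y = intercept + slope * x_new, but only inside the observed X range."""
    lowest, highest = min(xs), max(xs)
    if not lowest <= x_new <= highest:
        raise ValueError(
            f"X = {x_new} is outside the fitted range [{lowest}, {highest}]"
        )
    return intercept + slope * x_new

# Hypothetical fitted line Y = 7 + 1.92 X from sortie counts between 10 and 40.
sorties = [10, 20, 30, 40]

print(predict_within_range(25, sorties, intercept=7.0, slope=1.92))

try:
    predict_within_range(100, sorties, intercept=7.0, slope=1.92)
except ValueError as err:
    print("refused:", err)
```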
69. Common pitfalls in regression
- Failure to draw scatter diagrams
- Omitting important variables from the model
- The two-point phenomenon
- Unfounded claims of model sophistication
- Insufficient attention to interval estimates and predictions
- Predicting too far outside of known range
70. Lines can be deceiving...
R² = .6662
71. Nonlinear Relationship
72. Best fit?
73. Misleading data
74. Summary
- Regression Analysis is a useful tool
  - Helps quantify relationships
- But be careful
  - Does not imply cause and effect
  - Don't go outside range of data
  - Check linearity assumptions
  - Use common sense!
75. Non-linear relationship between output and cost