Title: Statistics and Quantitative Analysis U4320
Statistics and Quantitative Analysis U4320
- Segment 8
- Prof. Sharyn O'Halloran
I. Introduction
- A. Overview
- 1. Ways to describe, summarize and display data.
- 2. Summary statements
- Mean
- Standard deviation
- Variance
- 3. Distributions
- Central Limit Theorem
I. Introduction (cont.)
- 4. Test hypotheses
- 5. Differences of Means
- B. What's to come?
- 1. Analyze the relationship between two or more
variables with a specific technique called
regression analysis.
I. Introduction (cont.)
- B. What's to come? (cont.)
- 2. This tool allows us to predict the impact of
one variable on another.
- For example, what is the expected impact of a
SIPA degree on income?
II. Causal Models
- Causal models explain how changes in one variable
affect changes in another variable.
- Incinerator --> Bad Public Health
- Regression analysis gives us a way to analyze
precisely the cause-and-effect relationships
between variables.
- Direction
- Magnitude
II. Causal Models (cont.)
- A. Variables
- Let us start off with a few basic definitions.
- 1. Dependent Variable
- The dependent variable is the factor that we want
to explain.
- 2. Independent Variables
- The independent variable is the factor that we
believe causes or influences the dependent
variable.
- Independent Variable --> Dependent Variable
- Cause --> Effect
II. Causal Models (cont.)
- B. Voting Example
- Suppose we have a vote in the House of
Representatives on health care, and we want to know
whether party affiliation influenced individual
members' voting decisions.
- 1. The raw data looks like this
II. Causal Models (cont.)
- 2. Percentages look like this
- 3. Does party affect voting behavior?
- Given that the legislator is a Democrat, what is
the chance of voting for the health care
proposal?
II. Causal Models (cont.)
- 3. Does party affect voting behavior? (cont.)
- What is the probability of being a Democrat?
- What is the probability of being a Democrat and
voting yes?
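The three questions above ask for a conditional, a marginal and a joint probability. A minimal Python sketch of how they relate is given here; the vote counts are purely hypothetical, since the slide's own table is not reproduced in this text.

```python
# Hypothetical vote counts (NOT the data from the slides).
counts = {
    "Democrat": {"yes": 200, "no": 50},
    "Republican": {"yes": 40, "no": 145},
}

total = sum(sum(row.values()) for row in counts.values())

# Marginal probability of being a Democrat.
p_democrat = sum(counts["Democrat"].values()) / total

# Joint probability of being a Democrat and voting yes.
p_democrat_and_yes = counts["Democrat"]["yes"] / total

# Conditional probability of voting yes, given that the legislator is a Democrat.
p_yes_given_democrat = p_democrat_and_yes / p_democrat

print(round(p_democrat, 3), round(p_democrat_and_yes, 3), round(p_yes_given_democrat, 3))
```

If party affects voting, the conditional probability of a yes vote should differ noticeably between Democrats and Republicans.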
II. Causal Models (cont.)
- 4. Causal Model
- This is the simplest way to state a causal model
- A --> B
- Party --> Vote
- 5. Interpretation
- The interpretation is that if party influences
vote, then as we move from Republicans to
Democrats we should see a move from a No vote to
a Yes vote.
II. Causal Models (cont.)
- C. Summary
- 1. Regression analysis helps us to explain the
impact of one variable on another.
- We will be able to answer such questions as: what
is the relative importance of race in explaining
one's income?
- Or perhaps: what is the influence of economic conditions
on the levels of trade barriers?
II. Causal Models (cont.)
- C. Summary (cont.)
- 2. Univariate Model
- For now, we will focus on the univariate case, or
the causal relation between two variables (one
independent and one dependent).
- We will then relax this assumption and look at
the relation among multiple variables in a couple of
weeks.
III. Fitted Line
- Although regression analysis can be very
complicated, the heart of it is actually very
simple.
- It centers on the notion of fitting a line
through the data.
- 1. Example
- Suppose we have a study of how wheat yield
depends on fertilizer, and we observe this
relation
III. Fitted Line (cont.)
- 1. Example (cont.)
- The observed relation between Fertilizer and
Yield then can be plotted as follows
III. Fitted Line (cont.)
- 2. What line best approximates the relation
between these observations?
- a) Highest and Lowest Value
III. Fitted Line (cont.)
- 2. What line best approximates the relation
between these observations? (cont.)
- b) Median Value
III. Fitted Line (cont.)
- 3. Predicted Values
- a) Example 1
- The line that is fitted to the data gives the
predicted value of Y for any given level of X.
III. Fitted Line (cont.)
- 3. Predicted Values (cont.)
- a) Example 1 (cont.)
- If X is 400 and all we knew was the fitted line,
then we would expect the yield to be around 65.
III. Fitted Line (cont.)
- 3. Predicted Values (cont.)
- b) Example 2
- Often we have a lot of data, and fitting the
line by eye becomes rather difficult.
III. Fitted Line (cont.)
- 3. Predicted Values (cont.)
- b) Example 2 (cont.)
- For example, if our plotted data looked like this
IV. OLS: Ordinary Least Squares
- We want a methodology that allows us to
draw the line that best fits the data.
- A. The Least Squares Criterion
- What we want to do is to fit a line whose
equation is of the form
$$\hat{Y} = a + bX$$
- This is just the algebraic representation of a
line.
IV. OLS: Ordinary Least Squares (cont.)
- A. The Least Squares Criterion (cont.)
- 1. Intercept
- a represents the intercept of the line, that is,
the point at which the line crosses the Y axis.
- 2. Slope of the line
- b represents the slope of the line.
IV. OLS: Ordinary Least Squares (cont.)
- 2. Slope of the line (cont.)
- Remember, the slope is just the change in Y
divided by the change in X (rise over run).
- 3. Minimizing the Sum of Squares
- a) Problem
- How do we select a and b so that we minimize the
pattern of vertical Y deviations (prediction
errors)?
- We want to minimize the deviation
$$d = Y - \hat{Y}$$
IV. OLS: Ordinary Least Squares (cont.)
- 3. Minimizing the Sum of Squares (cont.)
- b) There are several ways in which we can do
this.
- 1. First, we could minimize the sum of the d's.
- We could find the line that gives us the
lowest sum of all the d's.
- The problem, of course, is that some d's would be
positive and others negative, and when we
add them all up they would end up canceling each
other.
- In effect, we would be picking a line so that the
d's add up to zero.
IV. OLS: Ordinary Least Squares (cont.)
- 3. Minimizing the Sum of Squares (cont.)
- b) There are several ways in which we can do
this. (cont.)
- 2. Absolute Values
- 3. Sum of Squared Deviations
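To see why the plain sum of deviations is a poor criterion, the following Python sketch compares the three candidates on a small set of made-up points (the data and the helper names `deviations` and `criteria` are assumptions for illustration, not part of the lecture). Two quite different lines both give deviations that sum to zero, while the sum of absolute values and the sum of squares tell them apart.

```python
# Hypothetical data points (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

def deviations(a, b):
    """Vertical deviations d = Y - (a + b*X) for the line with intercept a and slope b."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def criteria(a, b):
    """Evaluate the three candidate criteria for one line."""
    d = deviations(a, b)
    return {
        "sum_d": sum(d),
        "sum_abs_d": sum(abs(v) for v in d),
        "sum_sq_d": sum(v * v for v in d),
    }

y_bar = sum(ys) / len(ys)
flat = criteria(y_bar, 0.0)   # horizontal line through the mean of Y
sloped = criteria(2.2, 0.6)   # a line that follows the upward trend

for name, result in (("flat", flat), ("sloped", sloped)):
    print(name, {k: round(v, 2) for k, v in result.items()})
```

Both lines score (essentially) zero on the first criterion, so it cannot choose between them; the squared criterion clearly prefers the sloped line.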
IV. OLS: Ordinary Least Squares (cont.)
- B. OLS Formulas
- 1. Fitted Line
- The line that we want to fit to the data is
$$\hat{Y} = a + bX$$
- This is simply what we call the OLS line.
- Remember, we are concerned with how to calculate
the slope of the line, b, and the intercept of the
line, a.
IV. OLS: Ordinary Least Squares (cont.)
- B. OLS Formulas (cont.)
- 2. OLS Slope
- The OLS slope can be calculated from the formula
$$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$
IV. OLS: Ordinary Least Squares (cont.)
- 2. OLS Slope (cont.)
- In the book they use the abbreviations
IV. OLS: Ordinary Least Squares (cont.)
- B. OLS Formulas (cont.)
- 3. Intercept
- Now that we have the slope b, it is easy to
calculate a
$$a = \bar{Y} - b\bar{X}$$
- Note that when b = 0 the intercept is just the
mean of the dependent variable.
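The slope and intercept calculations can be sketched in a few lines of Python. This is a minimal illustration rather than the course's own procedure: the function name `ols_fit` is hypothetical, and the fertilizer and yield values are assumed for illustration because the data table from the slides is not reproduced in this text.

```python
def ols_fit(xs, ys):
    """Return (a, b) for the least-squares line Y = a + b*X."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products of deviations, and sum of squared X deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept
    return a, b

# Assumed illustrative values: fertilizer in lbs, yield in bushels.
fertilizer = [100, 200, 300, 400, 500, 600, 700]
wheat_yield = [40, 50, 50, 70, 65, 65, 80]

a, b = ols_fit(fertilizer, wheat_yield)
print(f"a = {a:.1f}, b = {b:.3f}")
```

The intercept is computed after the slope, mirroring the order of the formulas: b first, then a from the two sample means.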
IV. OLS: Ordinary Least Squares (cont.)
- C. Example 1: Fertilizer and Yield
IV. OLS: Ordinary Least Squares (cont.)
- C. Example 1: Fertilizer and Yield (cont.)
- So to calculate the slope we solve
- We can then use the slope b to calculate the
intercept
IV. OLS: Ordinary Least Squares (cont.)
- C. Example 1: Fertilizer and Yield (cont.)
- Remember
- Plugging these estimated values into our fitted
line equation, we get
$$\hat{Y} = 36.4 + 0.059X$$
IV. OLS: Ordinary Least Squares (cont.)
- C. Example 1: Fertilizer and Yield (cont.)
- What is the predicted number of bushels produced with 400
lbs of fertilizer?
- If we add 700 lbs of fertilizer, what would
the expected yield be?
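Both questions amount to plugging X into the fitted line. A short Python sketch, using the intercept (36.4) and slope (0.059) quoted in the interpretation part of this example; the function name `predicted_yield` is hypothetical.

```python
A = 36.4   # intercept in bushels, as quoted in this example
B = 0.059  # slope in bushels per lb of fertilizer, as quoted in this example

def predicted_yield(fertilizer_lbs):
    """Predicted yield from the fitted line Y-hat = A + B * X."""
    return A + B * fertilizer_lbs

for lbs in (400, 700):
    print(lbs, "lbs ->", round(predicted_yield(lbs), 1), "bushels")
```

With these rounded coefficients the line predicts about 60 bushels at 400 lbs and about 77.7 bushels at 700 lbs.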
IV. OLS: Ordinary Least Squares (cont.)
- D. Interpretation of b and a
- 1. Slope b
- The change in Y that accompanies a unit change in X.
- The slope tells us the predicted effect on the
dependent variable of a one-unit change in the
independent variable.
IV. OLS: Ordinary Least Squares (cont.)
- 1. Slope b (cont.)
- The slope then tells us two things
- i) The directional effect of the independent
variable on the dependent variable.
- There was a positive relation between fertilizer
and yield.
IV. OLS: Ordinary Least Squares (cont.)
- 1. Slope b (cont.)
- The slope then tells us two things (cont.)
- ii) It also tells us the magnitude of the effect
on the dependent variable.
- For each additional pound of fertilizer we expect
an increased yield of 0.059 bushels.
IV. OLS: Ordinary Least Squares (cont.)
- D. Interpretation of b and a (cont.)
- 2. The Intercept
- The intercept tells us what we would expect if
no fertilizer is added: a yield
of 36.4 bushels.
- So independent of the fertilizer you can expect
36.4 bushels.
- Alternatively, if fertilizer had no effect on
yield, we would simply expect 36.4 bushels, the
yield we expected with no fertilizer.
IV. OLS: Ordinary Least Squares (cont.)
- E. Example II: Radioactive Exposure
- 1. Causal Model
- We want to know whether exposure to radioactive
waste is linked to cancer.
- Radioactive Waste --> Cancer
IV. OLS: Ordinary Least Squares (cont.)
- E. Example II: Radioactive Exposure (cont.)
- 2. Data
IV. OLS: Ordinary Least Squares (cont.)
- E. Example II: Radioactive Exposure (cont.)
- 3. Graph
IV. OLS: Ordinary Least Squares (cont.)
- E. Example II: Radioactive Exposure (cont.)
- 4. Calculate the regression line for predicting Y
from X
- i) Slope
- How do we interpret the slope coefficient?
- For each unit of radioactive exposure, the cancer
mortality rate rises by 9.03 deaths per 10,000
individuals.
IV. OLS: Ordinary Least Squares (cont.)
- E. Example II: Radioactive Exposure (cont.)
- ii) Calculate the intercept
- Plugging these estimated values into our fitted
line equation, we get
$$\hat{Y} = 118.5 + 9.03X$$
IV. OLS: Ordinary Least Squares (cont.)
- E. Example II: Radioactive Exposure (cont.)
- 5. Predictions
- Let's calculate the mortality rate if X were 5.0.
- How about if X were 0?
IV. OLS: Ordinary Least Squares (cont.)
- E. Example II: Radioactive Exposure (cont.)
- How can we interpret this result?
- Even with no radioactive exposure, the
mortality rate would be 118.5.
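The two predictions asked for in this example follow directly from the fitted line. A brief Python sketch, taking the slope (9.03) and the zero-exposure value (118.5) from the slides; the function name `predicted_mortality` is hypothetical.

```python
INTERCEPT = 118.5  # predicted mortality rate at zero exposure (from the slides)
SLOPE = 9.03       # change in the mortality rate per unit of exposure (from the slides)

def predicted_mortality(exposure):
    """Predicted cancer mortality rate from the fitted line."""
    return INTERCEPT + SLOPE * exposure

at_five = predicted_mortality(5.0)
at_zero = predicted_mortality(0.0)
print(round(at_five, 2), round(at_zero, 2))
```

At X = 5.0 the line gives 118.5 + 9.03 * 5 = 163.65, and at X = 0 it returns the intercept itself.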
III. Advantages of OLS
- A. Easy
- 1. The least squares method gives relatively easy, or
at least computable, formulas for calculating a
and b.
III. Advantages of OLS (cont.)
- B. OLS is similar to many concepts we have
already used.
- 1. We are minimizing the sum of the squared
deviations. In effect, this is very similar to
how we find the variance.
- 2. Also, we saw above that when b = 0,
$$a = \bar{Y}$$
- The interpretation of this is that the best
prediction we can make of Y is just the sample
mean $\bar{Y}$.
- This is the case when the two variables are
independent.
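The link between OLS, the variance and the sample mean can be checked numerically. In this Python sketch (the observations and the helper `sum_sq_dev` are made up for illustration), a grid search over constant predictions finds that the sample mean gives the smallest sum of squared deviations, and that minimum is the numerator of the sample variance.

```python
ys = [3.0, 5.0, 6.0, 10.0]  # hypothetical observations

def sum_sq_dev(center):
    """Sum of squared deviations of the observations from a constant prediction."""
    return sum((y - center) ** 2 for y in ys)

mean_y = sum(ys) / len(ys)

# Try constant predictions from 0.0 to 12.0 in steps of 0.1 and keep the best one.
candidates = [i / 10 for i in range(0, 121)]
best = min(candidates, key=sum_sq_dev)

# The minimized sum of squares divided by n - 1 is the sample variance.
variance = sum_sq_dev(mean_y) / (len(ys) - 1)

print(best, mean_y, round(variance, 3))
```

This is the b = 0 case of the fitted line: with no slope, the least-squares line collapses to the horizontal line at the mean of Y.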
III. Advantages of OLS (cont.)
- C. Extension of the Sample Mean
- Since OLS is just an extension of the sample
mean, it has many of the same properties, such as
being efficient and unbiased.
- D. Weighted Least Squares
- We might want to weight some observations more
heavily than others.
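One common way to weight observations is to minimize the weighted sum of squared deviations, which leads to the same slope and intercept formulas with weighted means and weighted sums. The Python sketch below assumes that formulation; the function name `weighted_ols_fit`, the data and the weights are all hypothetical.

```python
def weighted_ols_fit(xs, ys, ws):
    """Return (a, b) minimizing sum(w * (y - a - b*x)**2) over the observations."""
    if not (len(xs) == len(ys) == len(ws)) or len(xs) < 2:
        raise ValueError("need three equally long sequences with at least 2 points")
    if any(w < 0 for w in ws) or sum(ws) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w_total = sum(ws)
    # Weighted means replace the ordinary sample means.
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_total
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:
        raise ValueError("weighted X values show no variation; the slope is undefined")
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.0, 3.0, 10.0]  # the last point sits far above the others

equal = weighted_ols_fit(xs, ys, [1, 1, 1, 1])       # ordinary least squares
downweighted = weighted_ols_fit(xs, ys, [1, 1, 1, 0])  # ignore the last point

print(equal, downweighted)
```

With equal weights the result is the ordinary OLS line; giving the last point zero weight recovers the line through the first three points.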
V. Homework Example
- In the homework assignment, you are asked to
select two interval/ratio level variables and
calculate the fitted line that minimizes the sum
of the squared deviations (the regression line).
- A. Choose 2 Variables
- What effect does the number of years of education
have on how often one reads the
newspaper?
- The independent variable is Education
- And the dependent variable is Newspaper reading.
V. Homework Example (cont.)
- B. Coding the Variables
- First, I made a new variable called PAPER.
- Recode all the missing data values to a single
value.
- Remove missing values from the data set.
- Then do the same for education
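The slides carry out these steps through a statistics package's menus. For readers who prefer code, the same recode-then-drop logic might look like the Python sketch below; the records, the field names `educ` and `paper`, and the missing-data codes 98 and 99 are all assumptions for illustration.

```python
# Hypothetical codes standing in for "don't know" / "no answer" responses.
MISSING_CODES = {98, 99}
MISSING = None  # the single value that every missing code is recoded to

# Hypothetical raw records (not the homework data).
records = [
    {"educ": 12, "paper": 3},
    {"educ": 16, "paper": 99},
    {"educ": 98, "paper": 1},
    {"educ": 10, "paper": 5},
]

def recode(value):
    """Map every missing-data code to one shared marker."""
    return MISSING if value in MISSING_CODES else value

# Step 1: recode all missing values to a single value, for both variables.
recoded = [{key: recode(value) for key, value in row.items()} for row in records]

# Step 2: drop every record in which either variable is missing.
valid = [row for row in recoded if MISSING not in row.values()]

print(len(valid), "valid observations")
```

Counting the rows of `valid` corresponds to the next step in the slides, getting the number of valid observations.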
V. Homework Example (cont.)
- C. Getting the number of valid observations
- Next, see how many valid observations are left by
using the Summarize command under the Data
menu.
V. Homework Example (cont.)
- D. Sampling five observations
- 1. So we randomly sample 5 from 1019.
- 2. As before, use the Select command under the
Data menu to get 5 random observations.
- 3. Then go to the Statistics menu and use the
Summarize > List command to get the entries
for the variables of interest.
V. Homework Example (cont.)
- E. Calculate the OLS Line
- Finally, you will have to compute the fitted line
for these data.
V. Homework Example (cont.)
- E. Calculate the OLS Line (cont.)
- 1. Calculate b
- 2. Calculate the intercept
- 3. Calculate the OLS line
V. Homework Example (cont.)
- E. Calculate the OLS Line (cont.)
- 4. Plot
V. Homework Example (cont.)
- E. Calculate the OLS Line (cont.)
- 5. Interpretation
- A person with no education would read 3.3
newspapers a day.
V. Homework Example (cont.)
- E. Calculate the OLS Line (cont.)
- 5. Interpretation (cont.)
- Our results further tell us that each additional
year of education reduces the number of
newspapers a person reads by 0.14.
- So for every additional year of education you read
0.14 fewer newspapers.
V. Homework Example (cont.)
- E. Calculate the OLS Line (cont.)
- 5. Interpretation (cont.)
- This example suggests some of the problems with
drawing inferences about the underlying
population from small samples.
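A small simulation can illustrate the small-sample problem. The Python sketch below builds a purely hypothetical population of 1019 people (the education range, the noise level and the use of 3.3 and -0.14 as population values are all assumptions), then compares how much the fitted slope varies across repeated samples of 5 versus samples of 200.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope: sum of cross-products over sum of squared X deviations."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: years of education and a noisy newspaper-reading score.
population = []
for _ in range(1019):
    educ = rng.randint(8, 20)
    paper = 3.3 - 0.14 * educ + rng.gauss(0, 1.0)
    population.append((educ, paper))

def slope_of_random_sample(size):
    """Draw a random sample without replacement and return its fitted slope."""
    while True:
        sample = rng.sample(population, size)
        xs = [educ for educ, _ in sample]
        ys = [paper for _, paper in sample]
        if len(set(xs)) > 1:  # redraw if every X is identical (slope undefined)
            return ols_slope(xs, ys)

small = [slope_of_random_sample(5) for _ in range(200)]
large = [slope_of_random_sample(200) for _ in range(200)]

def spread(values):
    """Range of the fitted slopes across repeated samples."""
    return max(values) - min(values)

print("slope range, n=5:  ", round(spread(small), 2))
print("slope range, n=200:", round(spread(large), 2))
```

Slopes fitted to five observations swing far more widely than slopes fitted to two hundred, which is why a single five-person sample says little about the population.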