Statistics and Quantitative Analysis U4320 - PowerPoint PPT Presentation

Slides: 58
Provided by: CCN4
Learn more at: http://www.columbia.edu

1
Statistics and Quantitative Analysis U4320
  • Segment 8
  • Prof. Sharyn O'Halloran

2
I. Introduction
  • A. Overview
  • 1. Ways to describe, summarize and display data.
  • 2. Summary statements
  • Mean
  • Standard deviation
  • Variance
  • 3. Distributions
  • Central Limit Theorem

3
I. Introduction (cont.)
  • A. Overview
  • 4. Test hypotheses
  • 5. Differences of Means
  • B. What's to come?
  • 1. Analyze the relationship between two or more
    variables with a specific technique called
    regression analysis.

4
I. Introduction (cont.)
  • A. Overview
  • B. What's to come?
  • 2. This tool allows us to predict the impact of
    one variable on another.
  • For example, what is the expected impact of a
    SIPA degree on income?

5
II. Causal Models
  • Causal models explain how changes in one variable
    affect changes in another variable.
  • Incinerator --> Bad Public Health
  • Regression analysis gives us a way to analyze
    precisely the cause-and-effect relationships
    between variables.
  • Directional
  • Magnitude

6
II. Causal Models (cont.)
  • A. Variables
  • Let us start off with a few basic definitions.
  • 1. Dependent Variable
  • The dependent variable is the factor that we want
    to explain.
  • 2. Independent Variables
  • The independent variable is the factor that we
    believe causes or influences the dependent
    variable.
  • Independent Variable --> Dependent Variable
  • Cause --> Effect

7
II. Causal Models (cont.)
  • A. Variables
  • B. Voting Example
  • Let us say that we have a vote in the House of
    Representatives on health care, and we want to
    know whether party affiliation influenced
    individual members' voting decisions.
  • 1. The raw data looks like this

8
II. Causal Models (cont.)
  • A. Variables
  • B. Voting Example
  • 2. Percentages look like this
  • 3. Does party affect voting behavior?
  • Given that the legislator is a Democrat, what is
    the chance of voting for the health care
    proposal?

9
II. Causal Models (cont.)
  • A. Variables
  • B. Voting Example
  • 3. Does party affect voting behavior? (cont.)
  • What is the probability of being a Democrat?
  • What is the probability of being a Democrat and
    voting yes?
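
The questions on this slide and the previous one fit together through the definition of conditional probability: P(Yes | Democrat) = P(Democrat and Yes) / P(Democrat). The following is a minimal Python sketch of that arithmetic. The vote counts are hypothetical, since the raw-data table is not reproduced in this transcript, and the names `counts` and `p_yes_given` are illustrative only.

```python
from fractions import Fraction

# Hypothetical vote counts: (party, vote) -> number of members.
# ASSUMPTION: these are NOT the slide's data; they only illustrate the arithmetic.
counts = {
    ("Democrat", "Yes"): 200,
    ("Democrat", "No"): 50,
    ("Republican", "Yes"): 40,
    ("Republican", "No"): 145,
}

total = sum(counts.values())


def p_party(party):
    """P(party): share of all members who belong to the party."""
    n = sum(c for (p, _), c in counts.items() if p == party)
    return Fraction(n, total)


def p_party_and_yes(party):
    """P(party and Yes): share of all members in the party who voted yes."""
    return Fraction(counts[(party, "Yes")], total)


def p_yes_given(party):
    """P(Yes | party) = P(party and Yes) / P(party)."""
    return p_party_and_yes(party) / p_party(party)


for party in ("Democrat", "Republican"):
    print(party, float(p_yes_given(party)))
```

If party had no influence on the vote, the two conditional probabilities would be roughly equal; a large gap between them is what the causal model Party --> Vote predicts.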

10
II. Causal Models (cont.)
  • A. Variables
  • B. Voting Example
  • 4. Causal Model
  • This is the simplest way to state a causal model:
  • A --> B
  • Party --> Vote
  • 5. Interpretation
  • The interpretation is that if party influences
    vote, then as we move from Republicans to
    Democrats we should see a move from a No vote to
    a Yes vote.

11
II. Causal Models (cont.)
  • A. Variables
  • B. Voting Example
  • C. Summary
  • 1. Regression analysis helps us to explain the
    impact of one variable on another.
  • We will be able to answer questions such as: what
    is the relative importance of race in explaining
    one's income?
  • Or: what is the influence of economic conditions
    on the levels of trade barriers?

12
II. Causal Models (cont.)
  • A. Variables
  • B. Voting Example
  • C. Summary
  • 2. Univariate Model
  • For now, we will focus on the univariate case (one
    independent variable), that is, the causal
    relation between two variables.
  • We will then relax this assumption and look at
    the relation of multiple variables in a couple of
    weeks.

13
III. Fitted Line
  • Although regression analysis can be very
    complicated, the heart of it is actually very
    simple.
  • It centers on the notion of fitting a line
    through the data.
  • 1. Example
  • Suppose we have a study of how wheat yield
    depends on fertilizer. And we observe this
    relation

14
III. Fitted Line (cont.)
  • 1. Example (cont.)
  • The observed relation between Fertilizer and
    Yield then can be plotted as follows

15
III. Fitted Line (cont.)
  • 1. Example
  • 2. What line best approximates the relation
    between these observations?
  • a) Highest and Lowest Value

16
III. Fitted Line (cont.)
  • 1. Example
  • 2. What line best approximates the relation
    between these observations? (cont.)
  • b) Median Value

17
III. Fitted Line (cont.)
  • 1. Example
  • 2. What line best approximates the relation
    between these observations?
  • 3. Predicted Values
  • a) Example 1
  • The line that is fitted to the data gives the
    predicted value of Y for any given level of X.

18
III. Fitted Line (cont.)
  • 1. Example
  • 2. What line best approximates the relation
    between these observations?
  • 3. Predicted Values (cont.)
  • a) Example 1
  • If X is 400 and all we knew was the fitted line,
    then we would expect the yield to be around 65.

19
III. Fitted Line (cont.)
  • 1. Example
  • 2. What line best approximates the relation
    between these observations?
  • 3. Predicted Values (cont.)
  • b) Example 2
  • Many times we have a lot of data and fitting the
    line becomes rather difficult.

20
III. Fitted Line (cont.)
  • 1. Example
  • 2. What line best approximates the relation
    between these observations?
  • 3. Predicted Values (cont.)
  • b) Example 2
  • For example, if our plotted data looked like this

21
IV. OLS Ordinary Least Squares
  • We want a methodology that allows us to be able
    to draw a line that best fits the data.
  • A. The Least Square Criteria
  • What we want to do is to fit a line whose
    equation is of the form $\hat{Y} = a + bX$.
  • This is just the algebraic representation of a
    line.

22
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria (cont.)
  • 1. Intercept
  • a represents the intercept of the line. That is,
    the point at which the line crosses the Y axis.
  • 2. Slope of the line
  • b represents the slope of the line.

23
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria (cont.)
  • 1. Intercept
  • 2. Slope of the line
  • Remember the slope is just the change in Y
    divided by the change in X (rise over run).
  • 3. Minimizing the Sum of Squares
  • a) Problem
  • How do we select a and b so that we minimize the
    pattern of vertical Y deviations (prediction
    errors)?
  • We want to minimize the deviations
    $d = Y - \hat{Y}$.

24
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria (cont.)
  • 1. Intercept
  • 2. Slope of the line
  • 3. Minimizing the Sum of Squares
  • b) There are several ways in which we can do
    this.
  • 1. First, we could minimize the sum of d.
  • We could find the line that will give us the
    lowest sum of all the d's.
  • The problem of course is that some d's would be
    positive and others would be negative and when we
    add them all up they would end up canceling each
    other.
  • In effect, we would be picking a line so that the
    d's add up to zero.

25
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria (cont.)
  • 1. Intercept
  • 2. Slope of the line
  • 3. Minimizing the Sum of Squares
  • b) There are several ways in which we can do
    this.
  • 2. Absolute Values
  • 3. Sum of Squared Deviations
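
To compare the three candidate criteria (sum of d, sum of absolute values, sum of squared deviations), the sketch below evaluates each one for three lines that all pass through the point of means. The data and the candidate lines are hypothetical and chosen only to make the arithmetic easy to follow; the function names are illustrative.

```python
# Illustrative data (hypothetical, not from the slides).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]


def deviations(a, b):
    """Vertical deviations d = Y - (a + b*X) for the line with intercept a and slope b."""
    return [y - (a + b * x) for x, y in zip(X, Y)]


def criteria(a, b):
    """Return (sum of d, sum of |d|, sum of d squared) for a candidate line."""
    d = deviations(a, b)
    return sum(d), sum(abs(v) for v in d), sum(v * v for v in d)


# Three candidate lines (intercept, slope), all through the point of means (3, 4).
candidates = {
    "flat line": (4.0, 0.0),
    "upward line": (2.2, 0.6),
    "downward line": (7.0, -1.0),
}

for name, (a, b) in candidates.items():
    s, s_abs, s_sq = criteria(a, b)
    print(f"{name:14s} sum d = {s:6.2f}  sum |d| = {s_abs:6.2f}  sum d^2 = {s_sq:6.2f}")
```

All three lines give a sum of d that is zero up to rounding, so that criterion cannot tell a sensible line from an absurd one. The sum of squared deviations does separate them, and it is smallest for the upward line.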

26
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria
  • B. OLS Formulas
  • 1. Fitted Line
  • The line that we want to fit to the data is
    $\hat{Y} = a + bX$.
  • This is simply what we call the OLS line.
  • Remember we are concerned with how to calculate
    the slope of the line, b, and the intercept of
    the line, a.

27
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria
  • B. OLS Formulas
  • 1. Fitted Line
  • 2. OLS Slope
  • The OLS slope can be calculated from the formula
    $b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$

28
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria
  • B. OLS Formulas
  • 1. Fitted Line
  • 2. OLS Slope
  • In the book they use the abbreviations

29
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria
  • B. OLS Formulas
  • 1. Fitted Line
  • 2. OLS Slope
  • 3. Intercept
  • Now that we have the slope b it is easy to
    calculate a: $a = \bar{Y} - b\bar{X}$
  • Note that when b = 0, the intercept is just the
    mean of the dependent variable.
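
The slope and intercept formulas come from minimizing the sum of squared deviations. A standard derivation is sketched below; the subscripted notation ($X_i$, $Y_i$, $S$) is an assumption made here for clarity and may differ from the notation in the course textbook.

```latex
\begin{aligned}
S(a, b) &= \sum_{i=1}^{n} (Y_i - a - bX_i)^2 \\
\frac{\partial S}{\partial a} &= -2 \sum_{i=1}^{n} (Y_i - a - bX_i) = 0
  \quad\Longrightarrow\quad a = \bar{Y} - b\bar{X} \\
\frac{\partial S}{\partial b} &= -2 \sum_{i=1}^{n} X_i (Y_i - a - bX_i) = 0
  \quad\Longrightarrow\quad
  b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
\end{aligned}
```

Setting $b = 0$ in the first result gives $a = \bar{Y}$, which is the note on this slide: with a zero slope, the intercept is the mean of the dependent variable.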

30
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria
  • B. OLS Formulas
  • C. Example 1 Fertilizer and Yield

31
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria
  • B. OLS Formulas
  • C. Example 1 Fertilizer and Yield
  • So to calculate the slope we solve
  • We can then use the slope b to calculate the
    intercept

32
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria
  • B. OLS Formulas
  • C. Example 1 Fertilizer and Yield
  • Remember
  • Plugging these estimated values into our fitted
    line equation, we get
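
The calculation on these slides can be sketched in Python as follows. The data table is not reproduced in this transcript, so the seven observations below are an assumption: illustrative values chosen so that the fitted line comes out close to the coefficients quoted later in the slides (an intercept of 36.4 and a slope of 0.059). The function name `ols_fit` is illustrative.

```python
# Illustrative data: fertilizer (lbs) and yield (bushels).
# ASSUMPTION: these values are not taken from the transcript; they are chosen
# so that the result is close to the coefficients quoted in the slides.
fertilizer = [100, 200, 300, 400, 500, 600, 700]
yield_bu = [40, 50, 50, 70, 65, 65, 80]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def ols_fit(xs, ys):
    """Return (a, b) for the least-squares line Y = a + b*X."""
    x_bar, y_bar = mean(xs), mean(ys)
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum_xy / sum_xx      # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


a, b = ols_fit(fertilizer, yield_bu)
print(f"slope b = {b:.3f}, intercept a = {a:.1f}")
```

The slope is computed first because the intercept formula needs it, which mirrors the order of the steps on the slides.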

33
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria
  • B. OLS Formulas
  • C. Example 1 Fertilizer and Yield
  • What is the predicted yield in bushels with 400
    lbs of fertilizer?
  • If we add 700 lbs of fertilizer, what would be
    the expected yield?
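
Both questions can be answered by plugging X into the fitted line. The sketch below uses the intercept (36.4) and slope (0.059) reported on the interpretation slides; the function name `predict_yield` is illustrative.

```python
A = 36.4    # intercept in bushels, as reported in the slides
B = 0.059   # slope in bushels per lb of fertilizer, as reported in the slides


def predict_yield(fertilizer_lbs):
    """Predicted yield (bushels) from the fitted line Y = A + B*X."""
    return A + B * fertilizer_lbs


for lbs in (400, 700):
    print(f"{lbs} lbs -> {predict_yield(lbs):.1f} bushels")
```

With these rounded coefficients the predictions are about 60.0 bushels at 400 lbs and about 77.7 bushels at 700 lbs.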

34
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria
  • B. OLS Formulas
  • C. Example 1 Fertilizer and Yield
  • D. Interpretation of b and a
  • 1. Slope b
  • The slope is the change in Y that accompanies a
    unit change in X.
  • It tells us the predicted effect on the dependent
    variable of a one-unit change in the independent
    variable.

35
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria
  • B. OLS Formulas
  • C. Example 1 Fertilizer and Yield
  • D. Interpretation of b and a
  • 1. Slope b
  • The slope then tells us two things
  • i) The directional effect of the independent
    variable on the dependent variable.
  • There was a positive relation between fertilizer
    and yield.

36
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria
  • B. OLS Formulas
  • C. Example 1 Fertilizer and Yield
  • D. Interpretation of b and a
  • 1. Slope b
  • The slope then tells us two things
  • ii) It also tells you the magnitude of the effect
    on the dependent variable.
  • For each additional pound of fertilizer we expect
    an increased yield of 0.059 bushels.

37
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria
  • B. OLS Formulas
  • C. Example 1 Fertilizer and Yield
  • D. Interpretation of b and a
  • 2. The Intercept
  • The intercept tells us what we would expect if
    no fertilizer is added: a yield of 36.4 bushels.
  • So, independent of the fertilizer, you can expect
    36.4 bushels.
  • Alternatively, if fertilizer has no effect on
    yield, we would simply expect 36.4 bushels, the
    yield we expected with no fertilizer.

38
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria
  • B. OLS Formulas
  • C. Example 1 Fertilizer and Yield
  • D. Interpretation of b and a
  • E. Example II Radio Active Exposure
  • 1. Causal Model
  • We want to know whether exposure to radioactive
    waste is linked to cancer.
  • Radioactive Waste --> Cancer

39
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria
  • B. OLS Formulas
  • C. Example 1 Fertilizer and Yield
  • D. Interpretation of b and a
  • E. Example II Radio Active Exposure
  • 2. Data

40
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria
  • B. OLS Formulas
  • C. Example 1 Fertilizer and Yield
  • D. Interpretation of b and a
  • E. Example II Radio Active Exposure
  • 3. Graph

41
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria
  • B. OLS Formulas
  • C. Example 1 Fertilizer and Yield
  • D. Interpretation of b and a
  • E. Example II Radio Active Exposure
  • 4. Calculate the regression line for predicting Y
    from X
  • i) Slope
  • How do we interpret the slope coefficient?
  • For each unit of radioactive exposure, the cancer
    mortality rate rises by 9.03 deaths per 10,000
    individuals.

42
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria
  • B. OLS Formulas
  • C. Example 1 Fertilizer and Yield
  • D. Interpretation of b and a
  • E. Example II Radio Active Exposure
  • ii) Calculate the intercept
  • Plugging these estimated values into our fitted
    line equation, we get

43
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria
  • B. OLS Formulas
  • C. Example 1 Fertilizer and Yield
  • D. Interpretation of b and a
  • E. Example II Radio Active Exposure
  • 5. Predictions
  • Let's calculate the mortality rate if X were 5.0.
  • How about if X were 0?
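
These predictions can be sketched in Python using the coefficients reported on the neighboring slides (a slope of 9.03 and an intercept of 118.5). The function name `predict_mortality` is illustrative, and the units follow the slides: deaths per 10,000 individuals.

```python
INTERCEPT = 118.5   # predicted mortality rate at zero exposure, from the slides
SLOPE = 9.03        # change in mortality rate per unit of exposure, from the slides


def predict_mortality(exposure):
    """Predicted cancer mortality rate from the fitted line."""
    return INTERCEPT + SLOPE * exposure


for x in (5.0, 0.0):
    print(f"X = {x}: predicted rate = {predict_mortality(x):.2f}")

# Moving X up by one unit changes the prediction by exactly the slope.
print(f"{predict_mortality(6.0) - predict_mortality(5.0):.2f}")
```

At X = 5.0 the prediction is roughly 163.65, and at X = 0 it is the intercept itself, 118.5.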

44
IV. OLS Ordinary Least Squares (cont.)
  • A. The Least Square Criteria
  • B. OLS Formulas
  • C. Example 1 Fertilizer and Yield
  • D. Interpretation of b and a
  • E. Example II Radio Active Exposure
  • How can we interpret this result?
  • Even with no radioactive exposure, the
    mortality rate would be 118.5.

45
III. Advantages of OLS
  • A. Easy
  • 1. The least squares method gives relatively
    easy, or at least computable, formulas for
    calculating a and b.

46
III. Advantages of OLS (cont.)
  • A. Easy
  • B. OLS is similar to many concepts we have
    already used.
  • 1. We are minimizing the sum of the squared
    deviations. In effect, this is very similar to
    how we find the variance.
  • 2. Also, we saw above that when b = 0, the fitted
    line reduces to $\hat{Y} = \bar{Y}$.
  • The interpretation of this is that the best
    prediction we can make of Y is just the sample
    mean $\bar{Y}$.
  • This is the case when the two variables are
    independent.
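
The link between least squares, the sample mean, and the variance can be checked numerically. The sketch below uses hypothetical Y values and shows that, among constant predictions, the sample mean gives the smallest sum of squared deviations, and that dividing that minimum by n - 1 gives the sample variance.

```python
from statistics import mean, variance

# Hypothetical observations of Y (not taken from the slides).
Y = [40, 50, 50, 70, 65, 65, 80]


def sum_sq_dev(values, c):
    """Sum of squared deviations of the values from a constant prediction c."""
    return sum((v - c) ** 2 for v in values)


y_bar = mean(Y)
best = sum_sq_dev(Y, y_bar)

# Sums of squares for constant predictions that differ from the mean.
others = [sum_sq_dev(Y, c) for c in (y_bar - 5, y_bar - 1, y_bar + 1, y_bar + 5)]

print("sum of squares about the mean:", best)
print("sums of squares about other constants:", others)
print("minimum / (n - 1):", best / (len(Y) - 1), " sample variance:", variance(Y))
```

Every alternative constant gives a larger sum of squares than the mean does, which is the b = 0 case of the least squares criterion.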

47
III. Advantages of OLS (cont.)
  • A. Easy
  • B. OLS is similar to many concepts we have
    already used.
  • C. Extension of the Sample Mean
  • Since OLS is just an extension of the sample
    mean, it has many of the same properties, such as
    efficiency and unbiasedness.
  • D. Weighted Least Squares
  • We might want to weight some observations more
    heavily than others.
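
One way to weight observations is to minimize the weighted sum of squared deviations, sum of w * (Y - a - bX)^2. The sketch below is a minimal version of that idea under stated assumptions: the data and weights are hypothetical, and `wls_fit` is an illustrative name rather than a function from any particular package.

```python
def wls_fit(xs, ys, ws):
    """Weighted least-squares line: minimizes sum of w * (y - a - b*x)**2."""
    w_total = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_total   # weighted mean of X
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_total   # weighted mean of Y
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx            # weighted slope
    a = y_bar - b * x_bar    # weighted intercept
    return a, b


# Hypothetical data and weights.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

equal = wls_fit(X, Y, [1, 1, 1, 1, 1])            # same as ordinary least squares
downweighted = wls_fit(X, Y, [1, 1, 1, 1, 0.25])  # last observation counts less

print("equal weights:       a = %.3f, b = %.3f" % equal)
print("last point weighted: a = %.3f, b = %.3f" % downweighted)
```

With equal weights the result matches ordinary least squares, and a weight of zero is equivalent to dropping the observation.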

48
V. Homework Example
  • In the homework assignment, you are asked to
    select two interval/ratio level variables and
    calculate the fitted line that minimizes the sum
    of the squared deviations (the regression line).
  • A. Choose 2 Variables
  • What effect does the number of years of education
    have on the frequency that one reads the
    newspaper?
  • The independent variable is Education
  • And the dependent variable is Newspaper reading.

49
V. Homework Example (cont.)
  • A. Choose 2 Variables
  • B. Coding the Variables
  • First, I made a new variable called PAPER.
  • Recode all the missing data values to a single
    value.
  • Remove missing values from the data set.
  • Then do the same for education.

50
V. Homework Example (cont.)
  • A. Choose 2 Variables
  • B. Coding the Variables
  • C. Getting the number of valid observations
  • Next, see how many valid observations are left by
    using the Summarize command under the Data
    menu.
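
The slides carry out these steps with menu commands in a statistics package. For readers working elsewhere, the same logic (recode missing values to a single value, drop them, count what is left) can be sketched in plain Python. The records and the missing-value codes below are hypothetical; the real codes depend on the data set's codebook.

```python
# ASSUMPTION: hypothetical missing-data codes; check the codebook for real ones.
MISSING_PAPER = {8, 9}
MISSING_EDUC = {97, 98, 99}

# Hypothetical survey records (newspaper reading and years of education).
records = [
    {"paper": 1, "educ": 12},
    {"paper": 9, "educ": 16},     # missing newspaper answer
    {"paper": 3, "educ": 98},     # missing education answer
    {"paper": 2, "educ": 14},
    {"paper": None, "educ": 10},  # missing newspaper answer
    {"paper": 5, "educ": 8},
]


def recode(value, missing_codes):
    """Map every missing-data code to the single value None."""
    return None if value is None or value in missing_codes else value


cleaned = [
    {"paper": recode(r["paper"], MISSING_PAPER),
     "educ": recode(r["educ"], MISSING_EDUC)}
    for r in records
]

# Keep only observations where both variables are valid.
valid = [r for r in cleaned if r["paper"] is not None and r["educ"] is not None]
print(len(valid), "valid observations out of", len(records))
```

Counting after both variables are cleaned matters, because an observation is usable for the regression only if neither value is missing.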

51
V. Homework Example (cont.)
  • A. Choose 2 Variables
  • B. Coding the Variables
  • C. Getting the number of valid observations
  • D. Sampling five observations
  • 1. So we randomly sample 5 from 1019.
  • 2. As before, use the Select command under the
    Data menu to get 5 random observations.
  • 3. Then go to the Statistics menu and use the
    Summarize > List command to get the entries
    for the variables of interest.
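
Outside the statistics package, drawing 5 observations at random from the 1019 valid ones can be sketched with Python's standard library. The seed value is arbitrary and is fixed only so that the draw can be repeated; the row indices stand in for whatever case identifiers the data set uses.

```python
import random

N_VALID = 1019    # number of valid observations reported on the slide
SAMPLE_SIZE = 5

rng = random.Random(4320)   # arbitrary fixed seed so the draw is repeatable

# Draw 5 distinct row indices without replacement.
sample_rows = sorted(rng.sample(range(N_VALID), SAMPLE_SIZE))
print(sample_rows)
```

Sampling without replacement guarantees five different observations, which is what selecting cases from a data set normally means.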

52
V. Homework Example (cont.)
  • A. Choose 2 Variables
  • B. Coding the Variables
  • C. Getting the number of valid observations
  • D. Sampling five observations
  • E. Calculate the OLS Line
  • Finally, you will have to compute the fitted line
    for these data.

53
V. Homework Example (cont.)
  • A. Choose 2 Variables
  • B. Coding the Variables
  • C. Getting the number of valid observations
  • D. Sampling five observations
  • E. Calculate the OLS Line
  • 1. Calculate b
  • 2. Calculate the intercept
  • 3. Calculate the OLS line

54
V. Homework Example (cont.)
  • A. Choose 2 Variables
  • B. Coding the Variables
  • C. Getting the number of valid observations
  • D. Sampling five observations
  • E. Calculate the OLS Line
  • 4. Plot

55
V. Homework Example (cont.)
  • A. Choose 2 Variables
  • B. Coding the Variables
  • C. Getting the number of valid observations
  • D. Sampling five observations
  • E. Calculate the OLS Line
  • 5. Interpretation
  • A person with no education would read 3.3
    newspapers a day.

56
V. Homework Example (cont.)
  • A. Choose 2 Variables
  • B. Coding the Variables
  • C. Getting the number of valid observations
  • D. Sampling five observations
  • E. Calculate the OLS Line
  • 5. Interpretation (cont.)
  • Our results further tell us that each additional
    year of education reduces the number of
    newspapers a person reads by 0.14.
  • So for every additional year of education you
    read 0.14 fewer newspapers a day.

57
V. Homework Example (cont.)
  • A. Choose 2 Variables
  • B. Coding the Variables
  • C. Getting the number of valid observations
  • D. Sampling five observations
  • E. Calculate the OLS Line
  • 5. Interpretation (cont.)
  • This example suggests some of the problems with
    drawing inferences about the underlying
    population from small samples.
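
The small-sample problem can be illustrated by simulation. The sketch below builds a hypothetical population of 1019 observations in which reading rises slightly with education, then fits a line to many random samples of 5 and of 200. The population, its true slope of 0.1, the noise level, and the seed are all assumptions made for illustration; none of this is the homework data.

```python
import random


def ols_fit(xs, ys):
    """Return (a, b) for the least-squares line Y = a + b*X."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


rng = random.Random(0)   # arbitrary fixed seed for repeatability

# ASSUMPTION: simulated population, not the homework data. The true slope
# is +0.1, with a lot of noise around the line.
educ = [rng.randint(8, 20) for _ in range(1019)]
paper = [1.0 + 0.1 * e + rng.gauss(0, 1.5) for e in educ]


def slopes_from_samples(sample_size, n_samples=200):
    """Fit a line to each of n_samples random samples and collect the slopes."""
    slopes = []
    while len(slopes) < n_samples:
        rows = rng.sample(range(len(educ)), sample_size)
        xs = [educ[i] for i in rows]
        if len(set(xs)) < 2:   # slope is undefined if all X values are equal
            continue
        ys = [paper[i] for i in rows]
        slopes.append(ols_fit(xs, ys)[1])
    return slopes


small = slopes_from_samples(5)
large = slopes_from_samples(200)
print("n = 5:   slopes range from", round(min(small), 2), "to", round(max(small), 2))
print("n = 200: slopes range from", round(min(large), 2), "to", round(max(large), 2))
print("share of n = 5 samples with a negative slope:",
      sum(s < 0 for s in small) / len(small))
```

In this simulated setting the slopes from samples of 5 vary far more than those from samples of 200, and some of them have the wrong sign, so a single sample of 5 says little about the underlying population.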