Lecture 8 Relationships between Scale variables: Regression Analysis

Slides: 43
Provided by: gwilym

1
Lecture 8: Relationships between Scale variables
Regression Analysis
  • Graduate School
  • Quantitative Research Methods
  • Gwilym Pryce
  • g.pryce_at_socsci.gla.ac.uk

2
Notices
  • Register

3
Plan
  • 1. Linear & Non-linear Relationships
  • 2. Fitting a line using OLS
  • 3. Inference in Regression
  • 4. Omitted Variables & R²
  • 5. Types of Regression Analysis
  • 6. Properties of OLS Estimates
  • 7. Assumptions of OLS
  • 8. Doing Regression in SPSS

4
1. Linear & Non-linear relationships between
variables
  • Often of greatest interest in social science is
    investigation into relationships between
    variables
  • is social class related to political perspective?
  • is income related to education?
  • is worker alienation related to job monotony?
  • We are also interested in the direction of
    causation, but this is more difficult to prove
    empirically
  • our empirical models are usually structured
    assuming a particular theory of causation

5
Relationships between scale variables
  • The most straightforward way to investigate
    evidence of a relationship is to look at scatter
    plots
  • It is traditional to:
  • put the dependent variable (i.e. the effect) on
    the vertical axis (the y axis)
  • put the explanatory variable (i.e. the cause)
    on the horizontal axis (the x axis)

6
Scatter plot of IQ and Income
7
We would like to find the line of best fit
8
What does the output mean?


9
Sometimes the relationship appears non-linear
10
and so a straight line of best fit is not
always very satisfactory
11
Could try a quadratic line of best fit
12
But we can simulate a non-linear relationship by
first transforming one of the variables
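The transformation idea can be sketched in Python with made-up data. If y rises with the logarithm of x, a straight line fitted to (ln x, y) captures a relationship that is curved on the original (x, y) scale. The helper name `fit_line`, the choice of a log transform, and all the numbers are illustrative assumptions, not taken from the lecture.

```python
import math

def fit_line(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data generated from the curve y = 2 + 3*ln(x), with no noise.
xs = [1, 2, 4, 8, 16, 32]
ys = [2 + 3 * math.log(x) for x in xs]

# Transform x first, then fit a straight line to (ln x, y).
log_xs = [math.log(x) for x in xs]
a, b = fit_line(log_xs, ys)
print(round(a, 6), round(b, 6))
```

The fitted line is still linear in its parameters a and b; only the variable has been transformed.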
13
(No Transcript)
14
or a cubic line of best fit (overfitted?)
15
Or could try two linear lines: structural break
16
2. Fitting a line using OLS
  • The most popular algorithm for drawing the line
    of best fit is one that minimises the sum of
    squared deviations from the line to each
    observation

$\min \sum_i (y_i - \hat{y}_i)^2$

where $y_i$ = observed value of y, and $\hat{y}_i$ = predicted
value of $y_i$ = the value on the line of best fit
corresponding to $x_i$
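As a rough illustration of the criterion, the sketch below computes the sum of squared deviations for two candidate lines over the same observations. The function name `sum_squared_deviations` and the data are hypothetical; the point is only that the criterion ranks lines by this single number.

```python
def sum_squared_deviations(a, b, xs, ys):
    """Sum over observations of (y_i - y_hat_i)^2, where y_hat_i = a + b*x_i."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical observations.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Two candidate lines: the criterion prefers the one with the smaller sum.
loose_fit = sum_squared_deviations(1.0, 1.5, xs, ys)   # line y = 1 + 1.5x
tight_fit = sum_squared_deviations(0.0, 2.0, xs, ys)   # line y = 2x
print(loose_fit, tight_fit)
```

Least squares picks, out of all possible (a, b) pairs, the one for which this sum is smallest.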
17
Regression estimates of a and b, or Ordinary Least
Squares (OLS)
  • This criterion yields estimates of the slope b
    and y-intercept a of the straight line
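For a single explanatory variable, the least-squares criterion has the standard closed-form solution b = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)² and a = ȳ − b·x̄. The sketch below applies those textbook formulas to hypothetical data; the function name `ols_line` is an assumption made for illustration.

```python
def ols_line(xs, ys):
    """Closed-form OLS estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation in x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: forces the line through the point of means (x_bar, y_bar).
    a = y_bar - b * x_bar
    return a, b

# Hypothetical observations.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = ols_line(xs, ys)
print(round(a, 4), round(b, 4))
```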

18
3. Inference in Regression: Hypothesis tests on
the slope coefficient
  • Regressions are usually run on samples, so what
    can we say about the population relationship
    between x and y?
  • Repeated samples would yield a range of values
    for estimates of b: $b \sim N(\beta, s_b)$
  • i.e. b is normally distributed with mean $\beta$ =
    the value the slope would take if the regression
    were run on the population
  • If there is no relationship in the population
    between x and y, then $\beta = 0$; this is our $H_0$
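The idea of repeated samples can be mimicked with a small simulation. This is a sketch under stated assumptions: the population intercept, slope, error standard deviation, sample size and number of replications are all invented, and the errors are drawn as independent normal values.

```python
import math
import random
import statistics

def ols_slope(xs, ys):
    """Least-squares slope b of the line y = a + b*x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(0)                  # fixed seed so the run is repeatable
POP_INTERCEPT, POP_SLOPE = 1.0, 0.5     # hypothetical population values
ERROR_SD = 1.0                          # hypothetical error standard deviation
xs = list(range(30))                    # the same x values in every sample

# Draw 2000 samples from the same population and estimate b in each one.
slopes = []
for _ in range(2000):
    ys = [POP_INTERCEPT + POP_SLOPE * x + rng.gauss(0, ERROR_SD) for x in xs]
    slopes.append(ols_slope(xs, ys))

mean_b = statistics.mean(slopes)        # centre of the sampling distribution
sd_b = statistics.stdev(slopes)         # spread of b across samples

# Textbook standard deviation of b when x is fixed: sigma / sqrt(Sxx).
x_bar = sum(xs) / len(xs)
theory_sd = ERROR_SD / math.sqrt(sum((x - x_bar) ** 2 for x in xs))
print(round(mean_b, 3), round(sd_b, 4), round(theory_sd, 4))
```

The estimates cluster around the population slope, and their spread is what the standard error of b is trying to measure from a single sample.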

19
What does the standard error mean?
20
Hypothesis test on b
  • (1) $H_0: \beta = 0$
  • (i.e. slope coefficient, if regression run on
    population, would equal 0)
  • $H_1: \beta \neq 0$
  • (2) $\alpha = 0.05$ or $0.01$ etc.
  • (3) Reject $H_0$ iff $P < \alpha$
  • (N.B. Rule of thumb: $P < 0.05$ if $t_c \geq 2$, and
    $P < 0.01$ if $t_c \geq 2.6$)
  • (4) Calculate P and conclude.
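The steps above can be sketched as follows. The t statistic is the estimated slope divided by its standard error, using the usual formula with n − 2 degrees of freedom. The function names, the data, and the use of the absolute value of t (for a two-sided alternative) are assumptions made for illustration; the rule of thumb is a large-sample shortcut, not an exact test.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (b, standard error of b, t) for the fitted line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(ssr / (n - 2) / sxx)   # residual variance uses n - 2 df
    return b, se_b, b / se_b

def rule_of_thumb(t):
    """Large-sample shortcut: compare |t| with 2 and 2.6."""
    size = abs(t)                           # two-sided alternative assumed
    if size >= 2.6:
        return "reject H0 at the 0.01 level"
    if size >= 2:
        return "reject H0 at the 0.05 level"
    return "do not reject H0"

# Hypothetical data: y = 2x plus a deviation that alternates between +1 and -1.
xs = list(range(40))
ys = [2 * x + (1 if x % 2 == 0 else -1) for x in xs]
b, se_b, t = slope_t_statistic(xs, ys)
print(round(b, 4), round(se_b, 4), rule_of_thumb(t))
```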

21
Example using SPSS output
  • (1) $H_0$: no relationship between house price and
    floor area.
  • $H_1$: there is a relationship
  • (2), (3), (4)
  • P = 1 - CDF.T(24.469, 554) = 0.000000
  • Reject $H_0$
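Outside SPSS, the size of this P value can be checked approximately. The sketch below swaps in the standard normal distribution for the t distribution, which is a close stand-in only because 554 degrees of freedom is large; it is not the exact CDF.T calculation. The function name is hypothetical.

```python
import math

def upper_tail_p_normal(t):
    """P(Z > t) for a standard normal Z, a large-df approximation to 1 - CDF.T."""
    return 0.5 * math.erfc(t / math.sqrt(2))

# t statistic taken from the SPSS example on the slide.
p = upper_tail_p_normal(24.469)
print(f"{p:.6f}")   # prints 0.000000
```

A t value this far from zero leaves essentially no probability in the tail, which is why SPSS reports the P value as zero to six decimal places.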

22
4. Omitted Variables & $R^2$: Q/ Is floor area the
only factor? How much of the variation in Price
does it explain?
23
R-square
  • R-square tells you what proportion of the variation
    in y is explained by the explanatory variable x
  • $0 < R^2 < 1$ (NB you want $R^2$ to be near
    1).
  • If more than one explanatory variable, use
    Adjusted $R^2$
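A minimal sketch of both measures, using the standard definitions: R-square is 1 minus the ratio of the residual sum of squares to the total sum of squares, and adjusted R-square rescales it by (n − 1)/(n − k − 1), where k is the number of explanatory variables. The data and the fitted line y = 0.05 + 1.99x are hypothetical.

```python
def r_squared(ys, fitted):
    """Share of the variation in y accounted for by the fitted values."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # unexplained
    ss_tot = sum((y - y_bar) ** 2 for y in ys)               # total
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """Penalise R-square for k explanatory variables (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical observations and the least-squares line fitted to them.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
fitted = [0.05 + 1.99 * x for x in xs]

r2 = r_squared(ys, fitted)
adj = adjusted_r_squared(r2, n=len(ys), k=1)
print(round(r2, 4), round(adj, 4))
```

Adjusted R-square is never larger than R-square, and the gap widens as explanatory variables are added without improving the fit.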

24
Example 2 explanatory variables
25
Scatter plot (with floor spikes)
26
3D Surface Plots: Construction, Price & Unemployment
$Q = -246 + 27P - 0.2P^2 - 73U + 3U^2$
27
Construction Equation in a Slump
$Q = 315 + 4P - 73U + 5U^2$
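To see how the two fitted surfaces differ, the sketch below evaluates both at the same point. It reads the slide's equations as Q = −246 + 27P − 0.2P² − 73U + 3U² and Q = 315 + 4P − 73U + 5U², assuming the missing operators are plus signs lost in extraction. The function names and the values P = 50 and U = 5 are arbitrary illustrations; the slide does not state units.

```python
def construction_normal(p, u):
    """Construction Q as a quadratic in price P and unemployment U."""
    return -246 + 27 * p - 0.2 * p ** 2 - 73 * u + 3 * u ** 2

def construction_slump(p, u):
    """Slump-period equation: no squared price term, steeper U-squared term."""
    return 315 + 4 * p - 73 * u + 5 * u ** 2

# Arbitrary illustrative values of price and unemployment.
price, unemployment = 50, 5
q_normal = construction_normal(price, unemployment)
q_slump = construction_slump(price, unemployment)
print(round(q_normal, 2), round(q_slump, 2))
```

Because both equations include squared terms, the effect of a change in P or U on Q depends on the level at which it is evaluated, which is what the 3D surface plots display.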
28
5. Types of regression analysis
  • Univariate regression: one explanatory variable
  • what we've looked at so far in the above
    equations
  • Multivariate regression: >1 explanatory variable
  • more than one variable on the RHS
  • Log-linear regression & log-log regression
  • taking logs of variables can deal with certain
    types of non-linearities & has useful properties
    (e.g. elasticities)
  • Categorical dependent variable regression
  • dependent variable is dichotomous -- observation
    has an attribute or not
  • e.g. MPPI take-up, unemployed or not etc.
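The elasticity property of log-log regression can be illustrated with a small sketch. If y = A·x^b, then ln y = ln A + b·ln x, so the slope of a line fitted to the logged variables is the elasticity b. The data, the constants 5 and 0.8, and the helper name `fit_line` are assumptions made for illustration.

```python
import math

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical constant-elasticity relationship: y = 5 * x**0.8, no noise.
xs = [1, 2, 5, 10, 20, 50]
ys = [5 * x ** 0.8 for x in xs]

# Log-log regression: regress ln(y) on ln(x).
ln_a, elasticity = fit_line([math.log(x) for x in xs],
                            [math.log(y) for y in ys])
print(round(math.exp(ln_a), 6), round(elasticity, 6))
```

The slope is read as the percentage change in y associated with a one percent change in x.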

29
6. Properties of OLS estimators
  • OLS estimates of the slope and intercept
    parameters have been shown to be BLUE (provided
    certain assumptions are met)
  • Best
  • Linear
  • Unbiased
  • Estimator

30
  • Best in that they have the minimum variance
    compared with other linear unbiased estimators (i.e.
    given repeated samples, the OLS estimates for $\alpha$
    and $\beta$ vary less between samples than any other
    such estimates for $\alpha$ and $\beta$).
  • Linear in that a straight line relationship is
    assumed.
  • Unbiased because, in repeated samples, the mean
    of all the estimates achieved will tend towards
    the population values for $\alpha$ and $\beta$.
  • Estimator in that the true values of $\alpha$ and $\beta$
    cannot be known, and so we are using statistical
    techniques to arrive at the best possible
    assessment of their values, given the information
    available.
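The "Best" and "Unbiased" properties can be illustrated by simulation. The sketch below compares the OLS slope with a cruder estimator that is also unbiased: the slope of the line through the first and last observations only. The population values, sample design and the comparison estimator are all assumptions chosen for illustration.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Least-squares slope of the line y = a + b*x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def endpoint_slope(xs, ys):
    """Slope of the line through the first and last observations only."""
    return (ys[-1] - ys[0]) / (xs[-1] - xs[0])

rng = random.Random(1)          # fixed seed so the run is repeatable
ALPHA, BETA = 1.0, 0.5          # hypothetical population intercept and slope
xs = list(range(30))

ols_draws, endpoint_draws = [], []
for _ in range(2000):           # 2000 repeated samples
    ys = [ALPHA + BETA * x + rng.gauss(0, 1) for x in xs]
    ols_draws.append(ols_slope(xs, ys))
    endpoint_draws.append(endpoint_slope(xs, ys))

# Unbiased: both averages sit near BETA. Best: OLS varies less across samples.
print(round(statistics.mean(ols_draws), 3),
      round(statistics.mean(endpoint_draws), 3))
print(statistics.variance(ols_draws) < statistics.variance(endpoint_draws))
```

Both estimators are centred on the population slope, but the OLS estimates are more tightly bunched because OLS uses every observation rather than just two.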

31
7. Assumptions of OLS
  • For estimation of a and b to be BLUE and for
    regression inference to be correct
  • 1. Equation is correctly specified
  • Linear in parameters (can still transform
    variables)
  • Contains all relevant variables
  • Contains no irrelevant variables
  • Contains no variables with measurement errors
  • 2. Error Term has zero mean
  • 3. Error Term has constant variance

32
  • 4. Error Term is not autocorrelated
  • i.e. not correlated with the error term from
    previous time periods
  • 5. Explanatory variables are fixed
  • observe normal distribution of y for repeated
    fixed values of x
  • 6. No linear relationship between RHS variables
  • i.e. no multicollinearity
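One informal check on assumption 6 is to look at the correlation between pairs of explanatory variables. The sketch below uses hypothetical housing variables: a second variable that is an exact linear function of the first gives a correlation of 1 (perfect multicollinearity), while a related but distinct variable gives a high correlation short of 1. Pairwise correlation is only a rough screen and will not catch linear relationships involving three or more variables.

```python
import math

def correlation(xs, zs):
    """Pearson correlation coefficient between two variables."""
    n = len(xs)
    x_bar = sum(xs) / n
    z_bar = sum(zs) / n
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    szz = sum((z - z_bar) ** 2 for z in zs)
    return sxz / math.sqrt(sxx * szz)

# Hypothetical explanatory variables for five houses.
floor_area_m2 = [50, 65, 80, 95, 110]
rooms = [2, 3, 3, 4, 5]                               # related, not exact
floor_area_sqft = [10.764 * m for m in floor_area_m2] # exact linear function

r_rooms = correlation(floor_area_m2, rooms)
r_sqft = correlation(floor_area_m2, floor_area_sqft)
print(round(r_rooms, 3), round(r_sqft, 3))
```

Including both floor area measures in one regression would make it impossible to separate their effects, since one carries no information beyond the other.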

33
8. Doing Regression analysis in SPSS
  • To run regression analysis in SPSS, click on
    Analyse, Regression, Linear

34
Select your dependent (i.e. explained) variable
and independent (i.e. explanatory) variables
35
e.g. Floor area and bathrooms
$\text{Floor area} = a + b \cdot \text{Number of bathrooms} + e$
36
(No Transcript)
37
Confidence Intervals for regression coefficients
  • Population slope coefficient CI
  • Rule of thumb: 95% CI is roughly $b \pm 2 \times$ (standard error of b)

38
e.g. regression of floor area on number of
bathrooms, CI on slope
  • $b = 64.6 \pm 2 \times 3.8$
  • $= 64.6 \pm 7.6$
  • 95% CI $= (57, 72)$
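The arithmetic above can be reproduced in a few lines. The function name `rule_of_thumb_ci` and the default multiplier of 2 are illustrative; an exact interval would replace 2 with the critical t value for the chosen confidence level and degrees of freedom.

```python
def rule_of_thumb_ci(b, se, multiplier=2.0):
    """Approximate 95% confidence interval: estimate +/- 2 standard errors."""
    half_width = multiplier * se
    return b - half_width, b + half_width

# Slope and standard error from the floor area on bathrooms example.
low, high = rule_of_thumb_ci(64.6, 3.8)
print(round(low, 1), round(high, 1))
```

Rounded to whole numbers, this gives the interval (57, 72) quoted on the slide.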

39
Confidence Intervals in SPSS: Analyse,
Regression, Linear, click on Statistics and
select Confidence intervals
40
Our rule of thumb said 95% CI for slope = (57,
72). How does this compare?
41
Past Paper (C2): Relationships (30)
  • Suppose you have a theory that suggests that time
    watching TV is determined by gregariousness
  • the less gregarious, the more time spent watching
    TV
  • Use a random sample of 60 observations from the
    TV watching data to run a statistical test for
    this relationship that also controls for the
    effects of age and gender.
  • Carefully interpret the output from this model
    and discuss the statistical robustness of the
    results.

42
Reading
  • Regression Analysis
  • Field, A. chapters on regression.
  • Moore and McCabe Chapters on regression.
  • Kennedy, P. A Guide to Econometrics
  • Bryman, Alan, and Cramer, Duncan (1999)
    Quantitative Data Analysis with SPSS for
    Windows: A Guide for Social Scientists, Chapters
    9 and 10.
  • Achen, Christopher H. Interpreting and Using
    Regression (London: Sage, 1982).