PSCI 5108/7108 Advanced Data Analysis III




1
PSCI 5108/7108 Advanced Data Analysis III
  • David Leblang
  • leblang_at_colorado.edu

2
Texts
  • Texts from last semester
  • Gujarati
  • Kennedy
  • Long and Freese (2003).
  • Charemza and Deadman (1997).
  • Enders (2004) -- optional

3
What this course is about
  • Models with limited dependent variables
  • Models for time series data
  • Models that pool cross-sections and time-series
    • With limited dependent variables
    • With continuous dependent variables

4
Limited Dependent Variables
  • Individual level behavior
  • What factors determine which candidate an
    individual votes for?
  • In a two-party system, the dependent variable is
    dichotomous, so we use a linear probability model,
    logit, or probit
  • In a multi-party system, the dependent variable is
    polychotomous with no specific ranking, so we can
    use multinomial logit or conditional logit
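
As a rough illustration of how the three binary-outcome models named above differ, the sketch below maps a linear index xb to a predicted probability under each one. The function names and the index values are illustrative assumptions, not part of the course materials, and in practice each model would have its own estimated coefficients and therefore its own index.

```python
import math


def lpm_prob(xb):
    """Linear probability model: the linear index is itself the prediction."""
    return xb


def logit_prob(xb):
    """Logit: logistic CDF evaluated at the linear index."""
    return 1.0 / (1.0 + math.exp(-xb))


def probit_prob(xb):
    """Probit: standard normal CDF evaluated at the linear index."""
    return 0.5 * (1.0 + math.erf(xb / math.sqrt(2.0)))


# Hypothetical index values, chosen only to show the shape of each mapping.
for xb in (-2.0, 0.0, 0.5, 2.0):
    print(
        f"xb={xb:+.1f}  LPM={lpm_prob(xb):+.3f}  "
        f"logit={logit_prob(xb):.3f}  probit={probit_prob(xb):.3f}"
    )
```

The linear probability model can return values below 0 or above 1, while the logit and probit mappings always stay strictly inside that range.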

5
  • Similar questions in comparative politics
  • Which countries are democratic?
  • Or in international politics
  • What factors determine whether a country gets
    involved in a conflict?

6
  • Other questions concern the level/degree of
    intensity of a given response.
  • Do you favor/oppose prayer in schools?
    • Favor strongly
    • Favor not strongly
    • Oppose not strongly
    • Oppose strongly
  • Which opinion best describes your position?
    • By law all students should pray in school.
    • By law no students should pray in school.
    • These decisions should be left to the localities.
  • Ordered logit or ordered probit
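
For the four-category favor/oppose item, an ordered logit assigns each response a probability based on where a linear index falls relative to a set of cutpoints. The sketch below shows that calculation only; the cutpoint and index values are hypothetical, and the function names are assumptions made for illustration.

```python
import math


def logistic_cdf(z):
    """Logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(xb, cutpoints):
    """Return P(y = j) for each ordered category.

    cutpoints must be strictly increasing; k cutpoints give k + 1 categories.
    """
    # Cumulative probabilities P(y <= j); the last category closes at 1.
    cumulative = [logistic_cdf(c - xb) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs


labels = ["Oppose strongly", "Oppose not strongly",
          "Favor not strongly", "Favor strongly"]
cutpoints = [-1.0, 0.0, 1.0]  # hypothetical values, not estimates
for label, p in zip(labels, ordered_logit_probs(0.5, cutpoints)):
    print(f"{label:<20} {p:.3f}")
```

An ordered probit follows the same pattern with the standard normal CDF in place of the logistic CDF.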

7
  • Other related models we will cover
  • Logit models for rare events: situations when
    the outcome occurs very infrequently (far less
    than 50% of the time).
  • Probit models for simultaneous choices:
    bivariate probit models with differing degrees of
    observability.

8
Limited outcome models we will not cover (but can
if there is interest)
  • Situations where the dependent variable is
    bounded
    • What countries receive US foreign aid?
    • Some countries receive zero.
    • Other countries receive lots.
    • These are censored regression models (tobit
      models).
  • Situations where the individual appears in the
    sample only if the value of their dependent
    variable is above a certain threshold (truncated
    regression models).
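
The difference between censoring and truncation can be sketched with a handful of made-up numbers. The values below are hypothetical latent outcomes, not real foreign aid data, and the helper names are assumptions introduced for illustration.

```python
# Hypothetical latent outcomes; negative values stand for cases that
# would be recorded at the lower bound of zero.
latent = [-3.0, -1.0, 0.0, 2.0, 5.0]


def censor_below(values, limit=0.0):
    """Censoring: every case stays in the sample, but values below the
    limit are recorded as the limit."""
    return [max(v, limit) for v in values]


def truncate_below(values, limit=0.0):
    """Truncation: only cases above the limit enter the sample at all."""
    return [v for v in values if v > limit]


def mean(values):
    return sum(values) / len(values)


censored = censor_below(latent)
truncated = truncate_below(latent)

print("latent   :", latent, "mean =", mean(latent))
print("censored :", censored, "mean =", mean(censored))
print("truncated:", truncated, "mean =", mean(truncated))
```

With these numbers both the censored and the truncated sample means are larger than the latent mean, which is one way to see why an ordinary regression on either sample can mislead.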

9
  • Models where we only observe an outcome (e.g., a
    salary) if an earlier condition is met (e.g., the
    individual is employed).
  • This is non-random selection: individuals are
    assigned to the outcome/observation sample on the
    basis of some earlier model/decision.

10
Other MLE Models (we will not cover)
  • Models for event counts (Poisson, negative
    binomial models)
  • Models for event histories (survival models)

11
Time Series Models
  • (Taken from the Penn World Tables, v. 6.1,
    http://pwt.econ.upenn.edu/)
    Country  Year  Pop        Prices  Cons.    Gov't    Inv.
    USA      1950  152594.16  100     95.1243  121.347  106.393
    USA      1951  155605.88  100     94.2701  117.868  106.421
    USA      1952  158617.61  100     94.2852  118.444  104.006
    USA      1953  160625.43  100     94.3136  117.762  104.493
    USA      1954  163637.16  100     94.4554  118.980  104.958
    ...
    USA      2000  275423     100     100.478  128.568  85.4658
  • Note: Population is in 1,000s; C, I, and G are in
    current U.S. dollars.
  • GDP, Prices, Cons, etc. are each univariate time
    series.
  • The critical feature of time series analysis is
    that observations have some logical order.
  • Ideally, time series observations are
    equally-spaced. (E.g., one year apart, or one
    month apart.)

12
Why we need special tools
  • 1. Autocorrelation is common in time series data
    because the values of omitted causes are related
    from one period to the next.
  • 2. Autocorrelation does not affect point
    estimates (i.e., it doesn't affect the estimates
    of the betas).
  • 3. It does affect the standard errors of the
    estimates.
  • 4. GLS (or weighted least squares) can be used to
    correct the standard errors. Since it is only a
    re-weighting scheme, on average, it doesn't
    affect the point estimates.
  • CRITICAL PROBLEM: (2) above is based on the
    assumption that only the "no autocorrelation"
    assumption is violated. But if the omitted
    variables that cause the autocorrelation are also
    confounding variables, then E(X e_t) = 0 is
    violated also, introducing omitted variables
    bias. Generalized least squares does not fix
    this.
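
One common way to check regression residuals for the first-order autocorrelation described above is the Durbin-Watson statistic. The slides do not name this diagnostic, so treat the sketch below as an illustrative assumption; the residual series are made up.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of
    squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals: one series with long runs of the same sign,
# one that flips sign every period.
persistent = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]
alternating = [1.0, -1.0, 1.0, -1.0]

print("persistent :", durbin_watson(persistent))
print("alternating:", durbin_watson(alternating))
```

Values near 2 are consistent with no first-order autocorrelation, values well below 2 point to positive autocorrelation, and values well above 2 point to negative autocorrelation. The statistic says nothing about whether the omitted causes are also confounders.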

13
Other issues
  • Integrated and co-integrated series
  • Simultaneous equations (VARs)
  • Causality tests

14
Pooled Cross-Section Time-Series Data
    Country  Year  Pop        Prices   Cons.    Gov't    Inv.
    CAN      1950  13753.32   81.976   80.034   94.007   82.890
    CAN      1951  14149.10   87.562   85.165   97.855   88.949
    CAN      1952  14544.88   94.751   92.189   107.266  94.699
    ...
    CAN      2000  30750.00   79.2984  79.2705  109.141  63.8771
    USA      1950  152594.16  100      95.1243  121.347  106.393
    USA      1951  155605.88  100      94.2701  117.868  106.421
    USA      1952  158617.61  100      94.2852  118.444  104.006
    USA      1953  160625.43  100      94.3136  117.762  104.493
    USA      1954  163637.16  100      94.4554  118.980  104.958

15
  • When the number of time points is relatively
    small, this sort of data is sometimes referred to
    as panel data.
  • Pooling data in this way introduces a level of
    complexity beyond that of standard time series
    analysis due to the heterogeneity of units.
  • Don't confuse this sort of data with "repeated
    cross-sections." The latter include things like
    annual surveys in which different sets of
    respondents are interviewed each year.

16
Fixes
  • Fixed effects
  • Random effects
  • Different fixes depending on whether the panel is
    short and fat or tall and skinny
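
The fixed effects idea can be sketched as a "within" transformation: subtract each unit's own mean from its observations, then run ordinary least squares on the demeaned data. The two-unit data set below is invented so that the units share a slope of 2 but have different intercepts; the helper names are assumptions for illustration.

```python
def ols_slope(x, y):
    """Bivariate OLS slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def demean_by_unit(units, values):
    """Subtract each unit's mean from that unit's observations."""
    totals, counts = {}, {}
    for unit, value in zip(units, values):
        totals[unit] = totals.get(unit, 0.0) + value
        counts[unit] = counts.get(unit, 0) + 1
    return [v - totals[u] / counts[u] for u, v in zip(units, values)]


# Hypothetical panel: y = 10 + 2x for unit A, y = 0 + 2x for unit B.
units = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [12.0, 14.0, 16.0, 8.0, 10.0, 12.0]

pooled = ols_slope(x, y)
within = ols_slope(demean_by_unit(units, x), demean_by_unit(units, y))

print("pooled OLS slope      :", round(pooled, 3))
print("within (fixed effects):", round(within, 3))
```

In this constructed example the pooled slope is negative because it mixes up the unit intercepts with the effect of x, while the within estimator recovers the common slope of 2. Random effects estimation is a different procedure and is not shown here.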

17
Bigger problems when we combine cross-sectional
time-series data with limited outcome dependent
variables
  • Conditional (fixed effects) logit and random
    effects probit
  • BCSTS (binary cross-section time-series) models
    comparable to failure-time data

18
STATA
  • Version 8
  • Labs
  • Student STATA