VARIABLES - PowerPoint PPT Presentation

Provided by: userpage5

Title: VARIABLES


1
VARIABLES
  • Topic #3

2
Variables and the Unit of Analysis
  • Variables are characteristics of the things
    that we are studying.
  • These things are commonly called cases or
    units.
  • A case study focuses on a single thing.
  • The kind of thing that is being studied is
    called the unit of analysis.
  • Individuals constitute the unit of analysis for
    much empirical social science research (and
    almost all survey research in political science).
  • A particular research project focuses on a
    particular set or population of cases
    (individuals or other units),
  • often by studying a sample of cases drawn from
    the population.

3
American National Election Studies
  • ANES focuses on individuals as the units of
    analysis in the American voting age population
    (VAP).
  • ANES variables pertain to these individuals
  • ANES variables include
  • gender, race, education, and other demographic
    variables
  • party identification, voting intention, Presidential
    approval, ideology, abortion opinion, political
    trust, and other attitudinal variables
  • whether registered/voted, candidate voted for,
    whether contributed campaign money, and other
    behavioral variables
  • These are all variable properties of individuals,
  • not households, elections, nations, etc.

4
Other Populations of Individuals
  • Population: All Members of Congress
  • additional variables pertaining to this
    specialized population of individuals include
  • number of terms served, campaign expenditure in
    last election, last re-election margin, party
    affiliation, committee assignments, roll-call
    vote on specified bill, ADA (etc.) rating,
    NOMINATE score, etc.
  • Annual Survey of Social Security and Medicare
    Beneficiaries
  • British etc. Election Studies

5
Other Units of Analysis in Political Research
  • Presidential elections: variables include
  • winning party, winner's % of the popular vote, Dem.
    candidate's % of the popular vote, winner's electoral
    vote margin, turnout %, whether the incumbent was
    running for re-election, total campaign
    expenditures, etc.
  • States in a given Presidential election:
    variables include
  • number of electoral votes, winning
    party/candidate, winner's vote %, Rep.
    candidate's vote %, turnout %, etc.
  • States in all historical Presidential elections:
    variables include
  • all of the above for each election year
  • Nations: variables include
  • population, GNP, per capita income, literacy
    rate, military spending as % of GNP, size of
    army, type of party system, etc.
  • States, counties, other jurisdictions, precincts,
    legislatures, political parties, etc.

6
Households
  • Households are often the unit of analysis in
    economic and sociological research
  • Variables include
  • size (# of persons)
  • type (single-parent, no children, unrelated,
    etc.)
  • type of housing unit
  • household income
  • etc.
  • Current Population Survey (CPS)
  • Panel Study of Income Dynamics (PSID)
  • Rotating panel surveys of households

7
Variables vs. Values
  • Variables that pertain to a given unit of
    analysis take on different values from case to
    case (cross-sectional analysis).
  • Gender [individuals]: male, female
  • Education [individuals]: primary school only,
    years completed, etc.
  • Income [individuals or households]: dollar amount
    (or dollar range), quintile, etc.
  • Type of dwelling [households]: detached,
    townhouse, apartment, etc.
  • Literacy rate [nations]: numerical
  • Turnout [elections]: numerical
  • Variables can also vary over time in the same
    case (longitudinal analysis),
  • e.g., a state's Democratic candidate vote % over time.

8
Variables are the building blocks of empirical
political science research
  • Researchers have to figure out how to measure the
    variables they are interested in by designing
  • appropriate survey questions
  • or other kinds of measures
  • Researchers next need to actually collect the
    data, e.g., by carrying out
  • the survey they have designed
  • or other data collecting operations.
  • With the data at hand, researchers then ask such
    questions as the following
  • What is the average or typical value of a
    variable in a set of cases?
  • For example, what is the typical level of interest
    among voters, or the average rate of turnout in
    recent elections?

9
Questions (cont.)
  • How are the values of a variable distributed in a
    set of data, i.e., do most of the cases have
    about the same value (low dispersion) or do
    different cases have very different values (high
    dispersion)? For example
  • Do all voters have about the same level of
    interest or are some very interested while others
    not interested at all?
  • Do all elections have about the same level of
    turnout, or do some have very high turnout while
    others have very low turnout?
  • Distribution of income or wealth.
  • How are two variables related or associated in a
    set of data? For example
  • Is the level of interest among voters related to
    their level of education?
  • Does the level of turnout in elections depend on
    how close elections are expected to be?
  • Does one variable have a (direct) causal impact
    on another variable? For example
  • Does higher education cause people to become more
    interested in politics?
  • Does the prospect of a close election cause more
    voters to turn out and vote?
  • Does one variable have an (indirect) causal
    impact on another variable? For example
  • Does the prospect of a close election cause
    greater activity by campaign organizations that
    in turn causes more voters to turn out and vote?
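
The descriptive questions above (typical value, dispersion, association) can be sketched numerically. This is a minimal illustration in Python, assuming made-up turnout percentages and expected victory margins for five hypothetical elections; the names `turnout`, `margin`, and `pearson_r` are illustrative and not part of the course materials. A correlation coefficient describes association only and does not by itself answer the causal questions.

```python
from statistics import mean, pstdev

# Hypothetical data (made-up numbers): one entry per election (case).
turnout = [50.0, 55.0, 60.0, 55.0, 55.0]   # LEVEL OF TURNOUT, in percent
margin = [10.0, 6.0, 2.0, 6.0, 6.0]        # expected victory margin, in points

# Typical value and dispersion of a single variable.
average_turnout = mean(turnout)
spread_turnout = pstdev(turnout)  # population standard deviation


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Association between two variables: negative here, because in this toy
# data the closer elections (smaller margin) have the higher turnout.
association = pearson_r(turnout, margin)

print(average_turnout, spread_turnout, association)
```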

10
Variables and Their Values
  • To repeat, variables vary: they take on
    different values from case to case or from time
    to time
  • Thus, associated with every variable is a list or
    range of possible values. For example
  • PARTY IDENTIFICATION (pertaining to individuals)
    in the U.S. has values REPUBLICAN, DEMOCRAT,
    INDEPENDENT (or perhaps refinements like STRONG
    REPUBLICAN, WEAK DEMOCRAT, etc., and/or other
    values like MINOR PARTY).
  • VOTED IN 2008 ELECTION? is another variable
    pertaining to individuals, with just two possible
    values, YES and NO.
  • HEIGHT is a physical variable pertaining to
    individuals with values that are real numbers
    (expressed in units such as inches, centimeters,
    or feet).
  • SIZE (# of persons) is a variable pertaining to
    households with values that are whole numbers ≥ 1
    (values are counts).
  • LEVEL OF TURNOUT is a variable pertaining to
    elections (or to different jurisdictions in a
    given election), with values ranging potentially
    from 0% to 100%.

11
Naming Variables
  • As a reminder that any variable must have a range
    of two or more possible values, it is useful to
    give variables names like
  • LEVEL OF EDUCATION
  • WHETHER OR NOT VOTED IN 2000 ELECTION
  • SIZE OF POPULATION
  • TYPE OF POLITICAL REGIME
  • LEVEL OF VOTING TURNOUT
  • DIRECTION OF IDEOLOGY
  • ETC.
  • In quantitative research, variable names are
    often written in capital letters (as above).

12
Observations/Observed Values
  • The actual value of a variable in a particular
    case is called an observation (or observed
    value). For example,
  • we "observe" (by asking the appropriate
    question(s) in a survey) that Joe Smith (the
    case) has the PARTY IDENTIFICATION (the variable)
    WEAK DEMOCRAT (the observed value), and likewise
  • we "observe" (by consulting the appropriate
    records) that the 2008 Presidential election (the
    case) has a LEVEL OF TURNOUT (the variable) of
    61% (the observed value).

13
Identifying Variables (PS3A)
  • Each of the following statements makes an
    empirical assertion (which may or may not be
    true); each refers (at least implicitly) to two
    variables (and asserts that there is some kind of
    relationship between them). For each statement:
  • (a) indicate to what unit of analysis
    (individuals, nations, elections, etc.) and, as
    appropriate, what particular population the
    variables pertain
  • (b) identify the two variables, with appropriate
    names (probably TYPE OF _____, LEVEL OF _____,
    DEGREE OF _____, AMOUNT OF _____, WHETHER OR NOT
    _____) and
  • (c) indicate a range of possible values for each
    variable (often, but certainly not always, LOW
    and HIGH will do).
  • (Note: both variables in each sentence pertain
    to the same units.)
  • 1. Junior members of Congress are less
    pragmatic than their senior colleagues.
  • 2. Education tends to undermine religious
    faith.
  • 3. Capital punishment deters murder.
  • 8. When times are bad, incumbent candidates
    are punished in elections.
  • 11. If you want to get ahead, stay in school.

14
CLASS LIST (Data Spreadsheet)
  • Case          ID           Variable 1  Var2   Var3  Var4
  • Name          SSN          Class       Major  GPA   Grad. Cand?
  • Jones, R.     215-14-6609  Senior      POLI   3.12  No
  • Kim, S.       144-56-9231  Sophomore   PSYC   2.78  No
  • Smith, H.     502-45-2323  Junior      POLI   2.75  No
  • Williams, R.  212-16-7834  Senior      HIST   3.28  Yes
  • Etc.
  • What distinctions between different types of
    variables can we make?

15
Types of Variables
  • Our concern here is with drawing distinctions
    among variables with respect to their logical
    properties, not their substantive nature (e.g.,
    demographic, attitudinal, etc.)
  • Every variable has at least two possible values
    (otherwise it could not vary).
  • A variable is dichotomous (also called a dummy
    variable) if it has exactly two possible values
    (typically yes and no), e.g.,
  • GRADUATION CANDIDATE? [students] (Yes/No)
  • WHETHER VOTED IN 2000 ELECTION [individuals] (Yes/No)
  • GENDER [individuals] (M/F)
  • However, most variables have three or more
    possible values.
  • Some variables have an infinite number of
    possible values.

16
Qualitative Variables
  • A variable is qualitative if its values are given
    by words
  • MAJOR [students]: POLI, HIST, BIOL, etc.
  • TYPE OF REGIME [nations]: Free, Partly Free,
    Unfree
  • ABORTION OPINION [individuals]: Never permit, etc.
  • In a data spreadsheet (e.g., in SPSS), these verbal
    values are typically recorded in terms of
    numerical codes, because this
  • saves space, and
  • facilitates machine processing.
  • Moreover, survey data from closed-form questions
    is often pre-coded (e.g., the Student Survey).

17
  • In a spreadsheet
  • Rows are cases
  • Columns are variables
  • Cells are values (varying from case to case)
  • Values (except V01 YEAR OF SURVEY) in the Student
    Survey and SETUPS are numerically coded.
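
The rows/columns/cells layout can be sketched with the class list from the earlier slide. This is only an illustration in Python, not how SPSS stores data; the key names (`class`, `major`, `gpa`, `grad_cand`) and the helper functions are hypothetical, and the SSN column is left out.

```python
# Each dict is one row (a case); each key is a column (a variable);
# each entry is a cell (the observed value for that case).
class_list = [
    {"name": "Jones, R.", "class": "Senior", "major": "POLI", "gpa": 3.12, "grad_cand": "No"},
    {"name": "Kim, S.", "class": "Sophomore", "major": "PSYC", "gpa": 2.78, "grad_cand": "No"},
    {"name": "Smith, H.", "class": "Junior", "major": "POLI", "gpa": 2.75, "grad_cand": "No"},
    {"name": "Williams, R.", "class": "Senior", "major": "HIST", "gpa": 3.28, "grad_cand": "Yes"},
]


def column(variable):
    """Return the observed values of one variable, in case order."""
    return [row[variable] for row in class_list]


def observed_value_set(variable):
    """Return the distinct values of a variable that occur in the data."""
    return sorted(set(column(variable)))


gpa_values = column("gpa")                       # a quantitative column
grad_values = observed_value_set("grad_cand")    # a dichotomous variable
major_values = observed_value_set("major")       # a qualitative variable

print(gpa_values, grad_values, major_values)
```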

18
Quantitative Variables
  • A variable is quantitative if its (true, not
    coded) values are given by numbers
  • GPA [students]: 3.12, 2.78, etc.
  • LITERACY RATE [nations]: 98%, 55%, etc.
  • HEIGHT [individuals]: 72", 62", etc.
  • SIZE [households]: 1 person, 2 persons, etc.
  • LEVEL OF TURNOUT [elections or jurisdictions]:
    51%, etc.
  • The magnitude of these numbers may depend on the
    units of measurement used (e.g., is HEIGHT given
    in inches, feet, centimeters, etc.?).
  • In a spreadsheet, such values are typically
    recorded in terms of their actual numerical
    values.
  • The SETUPS data contains data pertaining to
    variables that, while truly quantitative in
    nature, are recoded in broad categories, e.g.,
  • AGE (V60) 18-24, 25-34, etc. or
  • INCOME (V65A) 0-16th percentile, 17-33rd
    percentile, etc.
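
Recoding a truly quantitative variable into broad categories can be sketched as follows. Only the first two class intervals (18-24 and 25-34) come from the slide; the remaining cut points, the labels, and the function name `recode_age` are assumptions made for illustration and need not match the actual SETUPS codebook.

```python
from bisect import bisect_right

# Lower bound of each class interval. Only 18 and 25 are taken from the
# slide; 35, 45, 55 and 65 are assumed cut points.
LOWER_BOUNDS = [18, 25, 35, 45, 55, 65]
LABELS = ["18-24", "25-34", "35-44", "45-54", "55-64", "65+"]


def recode_age(age):
    """Map an exact age in years to its (ordered) class interval label."""
    if age < LOWER_BOUNDS[0]:
        raise ValueError("age is below the voting-age population")
    # bisect_right counts how many lower bounds are <= age.
    return LABELS[bisect_right(LOWER_BOUNDS, age) - 1]


recoded = [recode_age(a) for a in (18, 24, 25, 70)]
print(recoded)
```

After recoding, the exact distance between two cases is lost: two people in the same interval may differ by several years, so only their order across intervals remains.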

19
Truly Quantitative Data Need Not be Coded
20
Variables and the Unit of Analysis
  • Substantively related variables may be of
    different types depending on the unit of analysis
    to which they pertain.
  • TURNOUT pertaining to individuals is a
    dichotomous variable with values "yes, voted"
    and "no, did not vote".
  • LEVEL OF TURNOUT pertaining to elections (or
    jurisdictions, precincts, etc.) is a quantitative
    variable with possible values ranging from 0% to
    100%.
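
The link between the two units of analysis can be sketched by aggregating the individual-level dichotomous variable into the precinct-level quantitative one. The precinct names and the yes/no observations below are hypothetical.

```python
# Individual-level TURNOUT (dichotomous), grouped by hypothetical precinct.
voted = {
    "precinct_1": ["yes", "no", "yes", "yes"],
    "precinct_2": ["no", "no", "yes", "no"],
}


def level_of_turnout(observations):
    """Percent of individuals whose TURNOUT value is 'yes'."""
    n_yes = sum(1 for value in observations if value == "yes")
    return 100.0 * n_yes / len(observations)


# Precinct-level LEVEL OF TURNOUT (quantitative, 0 to 100).
levels = {name: level_of_turnout(obs) for name, obs in voted.items()}
print(levels)
```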

21
Types of Variables / Levels of Measurement
  • It is useful to refine both qualitative and
    quantitative variables further by distinguishing
    among four
  • different types of variables, or (equivalently)
  • different levels of measurement pertaining to
    variables.
  • Note: these distinctions are relevant only as
    they pertain to non-dichotomous variables.
  • Please take note of this with respect to PS 3B,
    Question 2.

22
Nominal Variables
  • A nominal variable (or a variable measured at the
    nominal level) has values that are unordered
    categories.
  • Accordingly, nominal variables are qualitative in
    nature.
  • Given two cases and a nominal variable, we can
    observe
  • that they have the same value or they have
    different values, but (if they have different
    values)
  • we cannot say that one has the higher/bigger
    value and the other the lower/smaller, etc.

23
Nominal Variables (cont.)
  • A nominal variable typically has a name like
  • NAME OF ____
  • TYPE OF ____
  • NATURE OF ____
  • KIND OF ____
  • Examples
  • (NAME OF) MAJOR: Political Science, Economics,
    History, etc.
  • (TYPE OF) RELIGIOUS AFFILIATION: Protestant,
    Catholic, Jewish, etc.
  • PREFERENCE FOR REPUBLICAN NOMINATION: Giuliani,
    McCain, Romney, etc.
  • In a data spreadsheet, numerical codes must be
    assigned to values of nominal variables in an
    essentially arbitrary manner,
  • so it is certainly illegitimate to do arithmetic
    on the numerical code values.
  • Typically the numerical codes are consecutive
    whole numbers.
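
Why arithmetic on nominal codes is illegitimate can be sketched by coding the same observations in two equally arbitrary ways. The six observations and both code assignments are hypothetical. The "average code" changes with the coding, while the most frequent value does not.

```python
from collections import Counter
from statistics import mean

# Hypothetical observations of RELIGIOUS AFFILIATION for six individuals.
observations = ["Protestant", "Catholic", "Protestant",
                "Jewish", "Protestant", "Catholic"]

# Two equally arbitrary codings of the same nominal values.
coding_a = {"Protestant": 1, "Catholic": 2, "Jewish": 3}
coding_b = {"Protestant": 3, "Catholic": 1, "Jewish": 2}


def modal_value(values, coding):
    """Code the values, find the most frequent code, decode it to a label."""
    decode = {code: label for label, code in coding.items()}
    codes = [coding[v] for v in values]
    most_common_code = Counter(codes).most_common(1)[0][0]
    return decode[most_common_code]


# The mean of the codes depends on the arbitrary coding ...
mean_a = mean([coding_a[v] for v in observations])
mean_b = mean([coding_b[v] for v in observations])

# ... but the modal (most frequent) value is the same either way.
mode_a = modal_value(observations, coding_a)
mode_b = modal_value(observations, coding_b)

print(mean_a, mean_b, mode_a, mode_b)
```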

24
Ordinal Variables
  • An ordinal variable (or a variable measured at
    the ordinal level) has values that fall into some
    kind of natural ordering,
  • often (but not always) running from (in some
    sense) LOW to HIGH.
  • Therefore, cases can be ranked or ordered with
    respect to their values on an ordinal variable.
  • An ordinal variable is also qualitative in
    nature.
  • Given two cases and an ordinal variable, we can
    observe
  • that they have the same value or they have
    different values, and also (if they have
    different values)
  • that one has the higher/bigger value and the
    other lower/smaller, etc., but
  • we cannot say how much higher/bigger or
    lower/smaller.
  • Given three cases with different values on an
    ordinal variable,
  • we can identify the case with the observed value
    between the other two
  • but we cannot say which of the other values it is
    closer to.

25
Ordinal Variables (cont.)
  • An ordinal variable typically has a name like
  • DIRECTION OF ___
  • EXTENT OF ____
  • LEVEL OF ____
  • DEGREE OF ____
  • Examples
  • TYPE OF REGIME/DEGREE OF FREEDOM [nations]: Free,
    Partly Free, Unfree
  • (LEVEL OF) INTEREST IN THE ELECTION CAMPAIGN
    [individuals]: from low to high
  • (DIRECTION OF) IDEOLOGY [individuals]: from most
    liberal to most conservative
  • (DEGREE OF) PRESIDENTIAL APPROVAL [individuals]:
    from strongly approve to strongly disapprove
  • DIRECTION OF ABORTION OPINION [individuals]:
    Never permit, . . . , Always permit
  • (LEVEL OF) CLASS STANDING [students]: freshman,
    sophomore, junior, senior
  • When data is recorded in coded form, numerical
    codes should be assigned to values in a manner
    consistent with the natural ordering of the
    values.

26
Ordinal Variables (cont.)
  • If the natural ordering is from LOW to HIGH, the
    codes should likewise run from lower to higher
    numbers.
  • If the natural ordering is not from LOW to HIGH,
    e.g., DIRECTION OF IDEOLOGY,
  • the two extreme values (or poles), e.g., MOST
    LIBERAL and MOST CONSERVATIVE, should be assigned
    the minimum and maximum code values, but
  • which gets which is arbitrary
  • (and intermediate values, e.g., MODERATE, should
    be assigned intermediate codes).
  • In any event, values are typically assigned
    numerical codes that are consecutive integers,
  • but this is not a logical necessity (because only
    their order matters).
  • It remains illegitimate to do arithmetic on the
    numerical code values
  • unless we are willing to attribute interval
    status to the code values.
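
That only the order of ordinal codes matters can be sketched by comparing two order-preserving codings of the same data. The five observations and the second coding (1, 2, 10) are hypothetical. The median case is the same under both codings, whereas the mean depends on the arbitrary spacing of the codes.

```python
from statistics import mean, median_low

# Hypothetical observations of DIRECTION OF IDEOLOGY for five individuals.
observations = ["MOST LIBERAL", "MODERATE", "MODERATE",
                "MOST CONSERVATIVE", "MOST CONSERVATIVE"]

# Two codings that both respect the natural ordering of the values.
consecutive = {"MOST LIBERAL": 1, "MODERATE": 2, "MOST CONSERVATIVE": 3}
stretched = {"MOST LIBERAL": 1, "MODERATE": 2, "MOST CONSERVATIVE": 10}


def median_value(values, coding):
    """Return the label of the middle case after ranking cases by code."""
    decode = {code: label for label, code in coding.items()}
    return decode[median_low([coding[v] for v in values])]


median_1 = median_value(observations, consecutive)
median_2 = median_value(observations, stretched)

# Arithmetic on the codes gives answers that depend on the spacing chosen.
mean_1 = mean([consecutive[v] for v in observations])
mean_2 = mean([stretched[v] for v in observations])

print(median_1, median_2, mean_1, mean_2)
```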

27
Ordinal Variables (cont.)
  • Note that DIRECTION OF IDEOLOGY could be renamed
    DEGREE OF LIBERALISM,
  • which does range from LOW (i.e., least liberal
    or most conservative) to HIGH (most liberal
    or least conservative).
  • We could also reverse the polarity of the
    renamed variable and call it DEGREE OF
    CONSERVATISM,
  • ranging from LOW (i.e., least conservative or
    most liberal) to HIGH (most conservative or
    least liberal).

28
Ordinal Variables (cont.)
  • Opinion variables with closed-form values running
    from (STRONGLY) AGREE (or APPROVE) to (STRONGLY)
    DISAGREE (or DISAPPROVE) are ordinal in nature.
  • The value INDEPENDENT is usually deemed to fall
    between DEMOCRAT and REPUBLICAN, so PARTY
    IDENTIFICATION is usually deemed to be ordinal in
    nature.
  • But this works only if we treat cases with minor
    party or DK values as missing data (since these
    values don't fall in the natural ordering).
  • An SPSS spreadsheet normally displays a numerical
    code (rather than a blank) for missing data
    (unobserved values), which must be understood
    as not part of the natural ordering.
  • In the SETUPS and Student Survey data, missing
    data are coded as (9).
  • SPSS must be told the missing data code(s) for
    each variable, so that it can set cases so coded
    aside when it processes data.

29
(No Transcript)
30
Interval Scale Variables
  • An interval variable (or variable measured at the
    interval level) has values that are real numbers
    that can appropriately be added together,
    subtracted one from another, and averaged.
  • SPSS refers to these as "scale" variables.
  • An interval variable is quantitative in nature.
  • Given two cases and an interval variable, we can
    say they have the same value or they have
    different values, and also (if they have
    different values)
  • that one has the higher value and the other
    lower, etc., and also
  • how much higher or lower one value is than the
    other, because
  • we can subtract one value from another,
  • i.e., we can determine the magnitude of the
    interval separating them and thus say how far
    apart the cases are with respect to the
    variable.
  • Given three cases with different values on an
    interval variable, we can identify the case with
    the observed value between the other two and we
    can also determine which of the two other cases it
    is closer to.
  • But we cannot say how many times greater one
    value is than another.

31
Interval Variables (cont.)
  • An interval variable typically has a name like
  • LEVEL OF ____
  • DEGREE OF ____ 
  • NUMBER OF ____
  • AMOUNT OF ____
  • In a spreadsheet, actual numerical values (rather
    than numerical codes) are normally entered into a
    data array (e.g., Presidential election data).
  • But sometimes (numerically coded) class intervals
    are used instead (e.g., SETUPS V60 AGE), as
    will be discussed later.
  • Variables like PARTY IDENTIFICATION, IDEOLOGY, and
    ISSUE OPINIONS are often treated as interval
    variables (e.g., my Student Survey/ANES
    longitudinal charts that showed changing average
    levels of Party ID, Ideology, etc., over time).

32
A Truly Interval Variable May Be Recoded into An
Ordinal One
33
Ordinal vs. Interval Variables
  • Example: Baseball Standings
  • Rank (standing) of a team (first place, second
    place, etc.) is ordinal information
  • Winning Percent (or Games Behind Leader) is
    interval information
  • For the league playoffs
  • the determination of division winners is based on
    ordinal information only, but
  • the determination of the wild card entry is
    based on interval information (best winning
    percent not otherwise in playoffs)
  • A team that fails to make the playoffs may have a
    higher winning percent than a team that does make
    the playoffs
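
The difference between the two kinds of information can be sketched with three hypothetical teams and made-up winning percentages. Ranks are derived from the winning percentages, but the size of the gaps cannot be recovered from the ranks alone.

```python
# Hypothetical winning percentages (interval information).
win_pct = {"Team A": 0.625, "Team B": 0.600, "Team C": 0.450}

# Standings (ordinal information): sort teams from best to worst record.
standings = sorted(win_pct, key=win_pct.get, reverse=True)
rank = {team: place for place, team in enumerate(standings, start=1)}

# Adjacent teams are always one rank apart ...
rank_gap_ab = rank["Team B"] - rank["Team A"]
rank_gap_bc = rank["Team C"] - rank["Team B"]

# ... but the intervals separating them can be very different.
pct_gap_ab = win_pct["Team A"] - win_pct["Team B"]
pct_gap_bc = win_pct["Team B"] - win_pct["Team C"]

print(rank, pct_gap_ab, pct_gap_bc)
```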

34
Ratio Variables
  • A ratio variable (or a variable measured at the
    ratio level) is an interval variable (that has
    values that are real numbers that can
    appropriately be added together, subtracted one
    from another, and averaged) but in addition
  • one can appropriately divide one value by another
    (i.e., compute their ratio), and
  • say, for example, that one case has twice the
    observed value of another.
  • This requires that the ratio variable have a
    non-arbitrary zero value,
  • which usually represents in some sense the
    complete absence of the characteristic or
    property to which the variable refers.
  • Even if negative values are possible, the zero
    value is non-arbitrary, e.g.,
  • level of profit (of a business) may have a
    negative value, or
  • rate of economic growth (over years) may have a
    negative value.

35
Ratio Variables (cont.)
  • Examples of interval variables that are not
    ratio
  • LEVEL OF SAT (or IQ) SCORE: there is no 0 score
  • DEGREE OF TEMPERATURE (Fahrenheit or Celsius):
    while each has a 0 value,
  • 0°F and 0°C represent different temperatures, so
  • 0 has no fundamental significance in either
    temperature scale
  • vs. the Kelvin temperature scale with absolute zero (0 K).
  • IDEOLOGY, PARTY IDENTIFICATION and OPINION
    variables
  • may perhaps be treated as interval rather than
    merely ordinal,
  • but they certainly are not ratio.
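
The temperature example can be worked through numerically. The three Celsius readings below are arbitrary illustrative values; the conversion formulas are the standard ones. Ratios of values change when the scale changes, so "twice as hot" has no meaning in Celsius or Fahrenheit, while comparisons of intervals survive the conversion.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def c_to_k(celsius):
    """Convert degrees Celsius to kelvins (non-arbitrary zero)."""
    return celsius + 273.15


a_c, b_c, c_c = 10.0, 20.0, 30.0  # three illustrative readings in Celsius

# Ratio of two values: depends on the scale, so it is not meaningful
# for Celsius or Fahrenheit.
ratio_c = b_c / a_c
ratio_f = c_to_f(b_c) / c_to_f(a_c)
ratio_k = c_to_k(b_c) / c_to_k(a_c)  # the physically meaningful ratio

# Ratio of two intervals: the same in Celsius and Fahrenheit.
diff_ratio_c = (b_c - a_c) / (c_c - b_c)
diff_ratio_f = (c_to_f(b_c) - c_to_f(a_c)) / (c_to_f(c_c) - c_to_f(b_c))

print(ratio_c, ratio_f, ratio_k, diff_ratio_c, diff_ratio_f)
```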

36
Ratio Variables (cont.)
  • Examples of ratio variables include
  • NUMBER OF CHILDREN or AGE (uncoded) [individuals]
  • SIZE/NUMBER OF MEMBERS [households or
    legislatures]
  • SIZE OF POPULATION [nations]
  • LEVEL OF INCOME [individuals or households]
  • PER CAPITA INCOME [nations]
  • LEVEL OF PROFITS [firms]
  • SIZE OF BUDGET SURPLUS [governments or fiscal
    years]
  • NUMBER OF VOTES FOR DEM CAND [elections, states]
  • PERCENT OF VOTES FOR DEM CAND [elections, states]
  • Even though LEVEL OF PROFITS or SIZE OF BUDGET
    SURPLUS can have negative values, their zero
    points are not arbitrary.
  • However, ratio comparisons can only be made
    between observed values with the same positive
    or negative sign.

37
Freeway Exits and Levels of Measurement
  • The identification of freeway exits has changed
    over the years, progressing from lower to higher
    levels of measurement.
  • Nominal: exits were once only given names (e.g.,
    name of crossroad or town),
  • so you could tell only whether the upcoming exit
    is your exit or not.
  • Ordinal: exits then were ordered (e.g., from east
    to west) and consecutively numbered, so you could
    tell
  • whether you have passed your exit or not, and
  • how many exits there are between your exit and
    where you are now.
  • (Otherwise exit numbers are uninformative.)
  • Interval/Ratio: exits are now usually numbered in
    terms of their distance in miles from the state
    line,
  • so you can tell how far you have to go to get to your
    exit
  • (and also that your exit is X times as far from
    the state line as where you are now).

38
Ordinal Information May Not Be Informative
39
But Ordinal Is Better Than Nominal
40
Discrete vs. Continuous Variables
  • Quantitative (interval and ratio) variables may
    be either discrete or continuous.
  • Qualitative variables are pretty much
    necessarily discrete.
  • A discrete variable has a finite (and typically
    small) number of possible values that usually (if
    the variable is quantitative) correspond to whole
    numbers (or integers) only.
  • NUMBER OF CHILDREN [households]
  • NUMBER OF MEMBERS [councils or legislatures]
  • NUMBER OF ELECTORAL VOTES WON BY DEM CANDIDATE
    [Presidential elections] vs.
  • PERCENT OF POPULAR VOTE WON BY DEM CANDIDATE
    [Presidential elections]

41
Continuous Variables
  • A continuous variable can have any real number
    (at least within some range) as a value (i.e.,
    including fractional values between the
    integers).
  • So a continuous variable has (at least in
    principle) an infinite number of possible values,
  • so that given two cases with distinct values of
    the continuous variable, it is in principle
    always possible that there is another case with
    an intermediate value of the variable.
  • Discrete vs. continuous: temperature controls
    on a kitchen range.
  • Digital vs. old-fashioned thermometer

42
Continuous Variables (cont.)
  • Examples
  • LEVEL OF DAILY HIGH TEMPERATURE [places
    (cross-sectional), days (longitudinal)]
  • HEIGHT, WEIGHT, and AGE [individuals]
  • Because we typically round off the value of such
    variables to the nearest degree, inch, pound,
    year, etc., such variables may look discrete.
  • IDEOLOGY might be thought of as a truly
    continuous variable.
  • Some interval variables are in principle discrete
    but are virtually continuous because they have
    so many possible (numerical) values, e.g.,
  • RATE OF TURNOUT [elections]
  • PERCENT OF VOTE FOR DEMOCRATIC CANDIDATE
    [elections]
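
Why RATE OF TURNOUT is discrete in principle can be sketched directly: with n eligible voters, the rate can only take the n + 1 values k/n for k = 0, 1, ..., n. The function name `possible_turnout_rates` and the electorate sizes are illustrative.

```python
from fractions import Fraction


def possible_turnout_rates(n_eligible):
    """All turnout rates that can occur with n_eligible eligible voters."""
    return [Fraction(k, n_eligible) for k in range(n_eligible + 1)]


# A tiny electorate makes the discreteness obvious.
small = possible_turnout_rates(4)

# With more eligible voters, the possible values crowd together and the
# variable becomes virtually continuous.
n_values_large = len(possible_turnout_rates(1000))
gap_large = Fraction(1, 1000)  # distance between adjacent possible values

print(small, n_values_large, gap_large)
```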