Describing Univariate Variables - PowerPoint PPT Presentation

Slides: 27
Provided by: homeUc

1
Describing Univariate Variables

2
Stages of empirical research
  • derive a hypothesis from theory
  • identify epistemic relationships
  • test relationship with data

3
Agenda
  • Foundations of Hypothesis Testing
  • How do we measure concepts of interest?
  • How do we describe our measures?

4
Topic One: What is a variable?
  • Definition: A variable is any characteristic or
    attribute of an object under investigation that
    takes on numerical values.
  • For example, variables associated with employees
    may be their talent, work-ethic, wage, gender,
    age, productivity level, etc.

5
Latent vs. Manifest Variables
  • A manifest variable can be observed.
  • e.g. age, gender, productivity-level, tenure and
    wage
  • A latent variable is not observed and can only be
    measured indirectly.
  • e.g. talent, work ethic

6
Dependent vs. Independent Variables
  • An independent variable has an antecedent or
    causal role.
  • e.g. talent, work-ethic, age, tenure
  • A dependent variable plays a consequent, or
    affected, role in relation to the independent
    variable.
  • e.g. productivity

7
Dependent vs. Independent Variables cont.
  • Sometimes a variable can be both an independent
    and a dependent variable.
  • e.g. productivity
  • talent, work-ethic, age -> productivity
  • productivity -> wage
  • It is even possible to have a system of equations
    such as:
  • talent, work-ethic, age, wage -> productivity
  • productivity, tenure -> wage

8
Discrete vs. Continuous Variables
  • A discrete variable classifies persons, objects,
    or events according to the kind or quality of
    their attributes.
  • e.g. gender, race
  • A continuous variable can, in theory, take on all
    possible numerical values.
  • e.g. age, income

9
Types of Discrete Variables
  • A non-orderable discrete variable does not have
    an intrinsic order from high to low that can be
    imposed on categories.
  • e.g. East, North, South, West
  • e.g. Asian, Black, Latino, White
  • An orderable discrete variable can be arranged in
    a meaningful ascending or descending sequence.
  • e.g. responses to a Likert-scale survey
    question such as:
  • "Do you strongly agree, agree, neither agree
    nor disagree, disagree, or strongly disagree with
    the statement that ...?"
  • A dichotomous variable classifies observations
    into two mutually exclusive categories.
  • e.g. gender

10
Desirable Properties of Variables
  • Validity is the degree to which a variable's
    operationalization accurately reflects the
    concept it is intended to measure.
  • This is a property related to the strength of the
    epistemic relationship (the linkage between data
    and theory).
  • e.g. For simple tasks, productivity may be a
    valid measure of work ethic.
  • For harder tasks, productivity may not be a valid
    measure of work ethic.

11
Desirable Properties of Variables cont.
  • Reliability is the extent to which different
    operationalizations of the same concept produce
    consistent results.
  • It is desirable to find that multiple measures
    of the same concept yield consistent inferences.
  • e.g. Reliable measures of a country's level of
    industrialization are the kilowatt hours of
    electricity per capita and the proportion of GNP
    in manufacturing.

12
Data Collection
  • Data collection is the activity of constructing
    primary data records for a given set of
    observations.
  • Two basic forms of data collection:
  • 1) A population refers to a set of variables
    collected for the entire set of objects of
    analysis.
  • e.g. the census, collections of roll call votes
  • 2) A sample refers to a data collection that
    contains a subset of cases or elements selected
    from a population.
  • e.g. A public opinion survey

13
Desirable Properties of Samples
  • Most social scientific research is based on a
    sample of the entire population.
  • A desirable property of a sample is
    representativeness, which refers to the selection
    of units of analysis that accurately capture the
    characteristics of the larger population.
  • A standard way to ensure representativeness is
    through the collection of a random sample, which
    refers to a sample in which each member of the
    population is equally likely to be selected for
    the sample.
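
As a minimal sketch of the random sample described above, the Python snippet below draws units so that each member of the population is equally likely to be selected. The population of employee IDs, the sample size, and the helper name simple_random_sample are all hypothetical, chosen only for illustration.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)  # the seed only makes this illustration repeatable
    return rng.sample(population, k)

# Hypothetical population: 1,000 employee IDs (not from the slides).
population = list(range(1, 1001))
sample = simple_random_sample(population, k=50, seed=42)

print(len(sample), "units drawn;", len(set(sample)), "are distinct")
```

Sampling without replacement, as here, is one common way to implement a simple random sample; other designs (stratified, cluster) are not covered by this sketch.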

14
Topic Two: How do we summarize variables?
  • It is impossible to consider piece by piece all
    of the data that we collect.
  • Instead we summarize the data in ways that
    facilitate inferences.

15
Tabular Representation of a Variable
  • A Frequency Distribution is a table of the
    outcomes of a variable and the number of times
    each outcome is observed.
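
A frequency distribution of this kind can be sketched in a few lines of Python. The survey responses below are made up for illustration; the table lists each outcome, its count, and its share of all observations.

```python
from collections import Counter

# Hypothetical survey responses (illustrative data only).
responses = ["agree", "disagree", "agree", "strongly agree", "agree", "disagree"]

freq = Counter(responses)  # maps each outcome to the number of times it occurs
n = len(responses)

# Print outcome, count, and relative frequency, most common outcome first.
for outcome, count in freq.most_common():
    print(f"{outcome:15s} {count:3d} {count / n:6.1%}")
```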

16
Distribution of Income of People 15 Years and Over in 2001
Source: Current Population Survey, March 2002
17
Graphical Representation of Data
  • A Bar Chart is a type of diagram for discrete
    variables in which the numbers or percentages of
    cases in each outcome are displayed.
  • Note: the number of people is represented by the
    height of the bar.

18
Graphical Representation of Data
  • A Histogram is fundamentally similar to a bar
    chart, except that it is used for continuous
    variables: its cases are associated with an
    interval of outcomes of a variable.
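
To illustrate how a histogram assigns cases to intervals, the sketch below counts values in equal-width bins and prints a crude text histogram. The ages, the bin width, and the helper name histogram_counts are assumptions made for this example.

```python
def histogram_counts(values, lower, width, n_bins):
    """Count values falling in each interval [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - lower) // width)  # index of the interval containing v
        if 0 <= i < n_bins:            # values outside all intervals are ignored
            counts[i] += 1
    return counts

# Hypothetical ages (illustrative data only).
ages = [19, 23, 25, 31, 34, 38, 41, 47, 52, 58]
counts = histogram_counts(ages, lower=10, width=10, n_bins=5)

for i, c in enumerate(counts):
    lo = 10 + i * 10
    print(f"[{lo}, {lo + 10}): {'#' * c}")
```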

19
Mathematical or Statistical Representation of the
Data
  • A more succinct way to present data is through
    statistics.
  • Interest focuses on two primary quantities:
  • the center of a distribution
  • the dispersion of observations around that
    center

20
Measuring the Center of a Distribution
  • The mode identifies the most common value of the
    variable being analyzed.
  • The median is the middle value of the data when
    ordered from highest to lowest value.
  • The mean of the data is the simple arithmetic
    average.
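
The three measures of center can be computed with Python's standard statistics module. The hourly wages below are hypothetical; note how the single large value pulls the mean above the median.

```python
import statistics

# Hypothetical hourly wages (illustrative data only).
wages = [12, 15, 15, 18, 20, 22, 45]

mode = statistics.mode(wages)      # most common value
median = statistics.median(wages)  # middle value of the ordered data
mean = statistics.mean(wages)      # simple arithmetic average

print("mode:", mode, "median:", median, "mean:", mean)
```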

21
Different formulas for the Mean
  • F1. Mean of X = $(X_1 + X_2 + \dots + X_n) / n$
  • F2. Mean of X = $\sum_{i=1}^{n} X_i / n$
  • For grouped data, such as the income data above,
    we use
  • Mean of X =
    $\sum_{\text{categories}} (\text{midpoint} \times \text{number in category}) / n$
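
A short worked sketch of the grouped-data formula follows. The income brackets and counts are hypothetical (the income table from the slides is not reproduced here), and the helper name grouped_mean is an assumption for this example.

```python
def grouped_mean(groups):
    """Approximate the mean from (lower, upper, count) groups using midpoints."""
    n = sum(count for _, _, count in groups)
    weighted_total = sum((lower + upper) / 2 * count for lower, upper, count in groups)
    return weighted_total / n

# Hypothetical income brackets in thousands of dollars: (lower, upper, number of people).
groups = [
    (0, 10, 4),    # midpoint 5
    (10, 20, 10),  # midpoint 15
    (20, 30, 6),   # midpoint 25
]

# (5*4 + 15*10 + 25*6) / 20 = 320 / 20
print(grouped_mean(groups))
```

The result is only an approximation of the true mean, because every case in a category is treated as if it sat at the category midpoint.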

22
The Dispersion of a Distribution
  • The center of mass obviously does not provide a
    complete description of a variable.
  • For example, we would not want to draw the same
    conclusion if we observed these two
    distributions.

23
Measures of Dispersion
  • Range: the difference between the largest and
    smallest observed values for a variable.
  • Variance: the mean squared deviation from the
    mean of a continuous distribution.
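
As a small illustration of why the center alone is not enough, the sketch below builds two hypothetical data sets with the same mean and median but very different ranges. The helper name value_range and both data sets are assumptions for this example.

```python
import statistics

def value_range(values):
    """Difference between the largest and smallest observed values."""
    return max(values) - min(values)

# Two hypothetical variables with the same center but different spread.
narrow = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, data in [("narrow", narrow), ("wide", wide)]:
    print(name, "mean:", statistics.mean(data),
          "median:", statistics.median(data),
          "range:", value_range(data))
```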

24
Variance
  • Sample Variance of X =
    $\sum_{i=1}^{n} (X_i - \text{Mean}(X))^2 / (n-1)$
  • Notice that the denominator is n-1, not n.
  • Dividing by n-1 gives an unbiased estimate
    of the true population variance.
  • If we were to divide by n, we would, on average,
    underestimate the population variance.
  • The standard deviation of X is the square root of
    the sample variance.
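
The sample variance and standard deviation can be sketched directly from the definition and checked against Python's statistics module, which also divides by n-1. The data values and the helper name sample_variance are hypothetical.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

# Hypothetical observations (illustrative data only); their mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

var_n_minus_1 = sample_variance(data)                  # 32 / 7
var_n = sum((x - 5) ** 2 for x in data) / len(data)    # 32 / 8, the smaller figure
std_dev = math.sqrt(var_n_minus_1)

print("sample variance:", var_n_minus_1)
print("dividing by n instead:", var_n)
print("standard deviation:", std_dev)
print("statistics.variance:", statistics.variance(data))
```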

25
Symmetry of Distribution
  • The mean and variance do not summarize all of the
    information about a distribution.
  • We also want to be able to distinguish between
    the following situations

26
How do we measure symmetry?
  • Skewness is a measure of how asymmetric a
    variable's distribution is around its median
    value.
  • Skewness of X = $3(\text{Mean}(X) - \text{Median}(X)) / \sigma_X$,
    where $\sigma_X$ is the standard deviation of X
  • Positive skew: the tail of the distribution
    is to the right of the median
  • Negative skew: the tail of the distribution
    is to the left of the median
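
The skewness measure on this slide can be sketched as follows, assuming the denominator is the standard deviation of X (computed here as the square root of the sample variance, as defined on the variance slide). The data sets and the helper name median_skewness are hypothetical.

```python
import statistics

def median_skewness(values):
    """3 * (mean - median) / standard deviation, using the n-1 standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    return 3 * (mean - median) / statistics.stdev(values)

# Hypothetical data sets (illustrative only).
right_tail = [12, 15, 15, 18, 20, 22, 45]   # one large value stretches the right tail
symmetric = [1, 2, 3, 4, 5]
left_tail = [-x for x in right_tail]        # mirror image, so the tail is on the left

for name, data in [("right tail", right_tail),
                   ("symmetric", symmetric),
                   ("left tail", left_tail)]:
    print(f"{name:10s} skewness = {median_skewness(data):+.3f}")
```

A positive result indicates a tail to the right of the median, a negative result a tail to the left, and a value of zero indicates that the mean and median coincide.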