Title: Describing Univariate Variables
1Describing Univariate Variables
2Stages of empirical research
- - derive a hypothesis from theory
- - identify epistemic relationships
- - test relationship with data
3Agenda
- Foundations of Hypothesis Testing
- How do we measure concepts of interest?
- How do we describe our measures?
4Topic One-What is a variable?
- Definition: A variable is any characteristic or attribute of an object under investigation that takes on numerical values.
- For example, variables associated with employees may be their talent, work ethic, wage, gender, age, productivity level, etc.
5Latent vs. Manifest Variables
- A manifest variable can be observed.
- e.g. age, gender, productivity level, tenure, and wage
- A latent variable is not observed and can only be measured indirectly.
- e.g. talent, work ethic
6Dependent vs. Independent Variables
- An independent variable has an antecedent or causal role.
- e.g. talent, work ethic, age, tenure
- A dependent variable plays a consequent, or affected, role in relation to the independent variable.
- e.g. productivity
7Dependent vs. Independent Variables cont.
- Sometimes a variable can be both an independent and a dependent variable.
- e.g. productivity
- talent, work ethic, age -> productivity
- productivity -> wage
- It is even possible to have a system of equations such as
- talent, work ethic, age, wage -> productivity
- productivity, tenure -> wage
8Discrete vs. Continuous Variables
- A discrete variable classifies persons, objects, or events according to the kind or quality of their attributes.
- e.g. gender, race
- A continuous variable can, in theory, take on all possible numerical values.
- e.g. age, income
9Types of Discrete Variables
- A non-orderable discrete variable does not have an intrinsic order from high to low that can be imposed on categories.
- e.g. East, North, South, West
- e.g. Asian, Black, Latino, White
- An orderable discrete variable can be arranged in a meaningful ascending or descending sequence.
- e.g. Responses to a Likert scale survey question such as
- Do you strongly agree, agree, neither agree nor disagree, disagree, or strongly disagree with the statement that ...
- A dichotomous variable classifies observations into two mutually exclusive categories.
- e.g. Gender
10Desirable Properties of Variables
- Validity is the degree to which a variable's operationalization accurately reflects the concept it is intended to measure.
- This is a property related to the strength of the epistemic relationship (the linkage between data and theory).
- e.g. For simple tasks, productivity may be a valid measure of work ethic.
- For harder tasks, productivity may not be a valid measure of work ethic.
11Desirable Properties of Variables cont.
- Reliability is the extent to which different operationalizations of the same concept produce consistent results.
- - It is desirable to find that multiple measures of the same concept yield consistent inferences.
- e.g. Reliable measures of a country's level of industrialization are the kilowatt hours of electricity per capita and the proportion of GNP in manufacturing.
12Data Collection
- Data collection is the activity of constructing primary data records for a given set of observations.
- Two Basic Forms of Data Collection
- 1) A population refers to a set of variables collected for the entire set of objects of analysis.
- e.g. The census, collections of roll call votes
- 2) A sample refers to a data collection that contains a subset of cases or elements selected from a population.
- e.g. A public opinion survey
13Desirable Properties of Samples
- Most social scientific research is based on a sample of the entire population.
- A desirable property of a sample is representativeness, which refers to the selection of units of analysis that accurately capture the characteristics of the larger population.
- A standard way to ensure representativeness is through the collection of a random sample, which refers to a sample in which each member of the population is equally likely to be selected for the sample.
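The idea of a random sample can be sketched in a few lines of Python. This is a minimal illustration, not a survey design: the population of 1,000 employee IDs, the sample size of 50, and the seed are all hypothetical values chosen for the example.

```python
import random

# Hypothetical population: ID numbers for 1,000 employees.
population = list(range(1, 1001))

# A fixed seed makes the draw reproducible; the value itself is arbitrary.
rng = random.Random(42)

# Draw 50 distinct members without replacement; every member of the
# population is equally likely to end up in the sample.
sample = rng.sample(population, k=50)

print(len(sample), "members drawn,", len(set(sample)), "of them distinct")
```

Because `sample` draws without replacement, no member can appear twice, which matches the usual notion of a simple random sample.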
14Topic Two-How do we summarize variables?
- It is impossible to consider piece by piece all of the data that we collect.
- Instead we summarize the data in ways that facilitate inferences.
15Tabular Representation of a Variable
- A Frequency Distribution is a table of the outcomes of a variable and the number of times each outcome is observed.
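A frequency distribution can be built by counting how often each outcome occurs. The sketch below uses Python's `collections.Counter`; the list of survey responses is hypothetical data made up for illustration.

```python
from collections import Counter

# Hypothetical responses to a Likert-type survey item.
responses = ["agree", "agree", "disagree", "strongly agree", "agree",
             "neither", "disagree", "agree"]

# Counter maps each outcome to the number of times it was observed.
frequency = Counter(responses)

# Print the table, most frequent outcome first.
for outcome, count in frequency.most_common():
    print(f"{outcome:15s} {count}")
```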
16Distribution of Income of People 15 Years and
Over in 2001
Source: Current Population Survey, March 2002
17Graphical Representation of Data
- A Bar Chart is a type of diagram for discrete variables in which the numbers or percentages of cases in each outcome are displayed.
- Note: the number of people is represented by the height of the bar.
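As a rough sketch of the same idea in code, the following draws a text-only bar chart in which the length of each bar plays the role of the bar's height. The function name `text_bar_chart` and the region data are hypothetical, introduced only for this example.

```python
from collections import Counter


def text_bar_chart(values):
    """Return one line per outcome; bar length equals the number of cases."""
    counts = Counter(values)
    width = max(len(str(outcome)) for outcome in counts)
    return [f"{str(outcome):<{width}} | {'#' * n} ({n})"
            for outcome, n in sorted(counts.items())]


# Hypothetical non-orderable discrete variable: region of residence.
regions = ["East", "North", "South", "West", "South", "East", "South"]

for line in text_bar_chart(regions):
    print(line)
```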
18Graphical Representation of Data
- A Histogram is fundamentally similar to a bar chart, except that it is used when cases are associated with an interval of outcomes of a variable.
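The grouping step behind a histogram can be sketched as follows. The helper `histogram_counts`, the bin boundaries, and the ages are all assumptions made for illustration; real data would need bins chosen to suit the variable.

```python
def histogram_counts(values, low, width, n_bins):
    """Count values in each interval [low + i*width, low + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - low) // width)
        # Values outside the covered range are ignored in this sketch.
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


# Hypothetical ages, grouped into ten-year intervals starting at 10.
ages = [19, 23, 25, 31, 34, 38, 42, 47, 55, 61]
counts = histogram_counts(ages, low=10, width=10, n_bins=6)

for i, n in enumerate(counts):
    lower = 10 + 10 * i
    print(f"{lower}-{lower + 9}: {'#' * n}")
```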
19Mathematical or Statistical Representation of the
Data
- A more succinct way to present data is through statistics.
- Interest focuses on two primary quantities:
- - the center of a distribution
- - the dispersion of observations around that center
20Measuring the Center of a Distribution
- The mode identifies the most common value of the variable being analyzed.
- The median is the middle value of the data when ordered from highest to lowest value.
- The mean of the data is the simple arithmetic average.
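These three measures of center are available in Python's standard `statistics` module. The wage figures below are hypothetical; they include one large value to show how the mean can be pulled away from the median and the mode.

```python
import statistics

# Hypothetical hourly wages for seven employees.
wages = [12, 15, 15, 18, 22, 30, 75]

mode = statistics.mode(wages)      # most common value
median = statistics.median(wages)  # middle value of the ordered data
mean = statistics.mean(wages)      # arithmetic average

print("mode:", mode)
print("median:", median)
print("mean:", round(mean, 2))
```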
21Different formulas for the Mean
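- F1. Mean of X: $\bar{X} = (X_1 + X_2 + \cdots + X_n) / n$
- F2. Mean of X: $\bar{X} = \sum_{i=1}^{n} X_i / n$
- For grouped data, such as the income data above, we use
- Mean of X: $\bar{X} = \sum_{\text{categories}} (\text{midpoint} \times \text{number in category}) / n$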
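A small worked example may help with the grouped-data version of the mean. The three categories and their counts below are hypothetical and are not taken from the income table; each category is summarized by its midpoint, so the result is an approximation of the mean of the underlying data.

```python
# Hypothetical grouped data: (lower bound, upper bound, number of cases).
groups = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]

# Total number of cases across all categories.
n = sum(count for _, _, count in groups)

# Sum of midpoint * number in category, divided by n.
grouped_mean = sum(((lo + hi) / 2) * count for lo, hi, count in groups) / n

print("n =", n)
print("grouped mean =", grouped_mean)
```

Here the midpoints are 5, 15, and 25, so the sum is 5*4 + 15*10 + 25*6 = 320, and 320 / 20 = 16.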
22The Dispersion of a Distribution
- The center of a distribution does not provide a complete description of a variable.
- For example, we would not want to draw the same conclusion if we observed these two distributions.
23Measures of Dispersion
- Range: the difference between the largest and smallest observed values for a variable.
- Variance: the mean squared deviation from the mean of a continuous distribution.
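Both definitions can be computed directly. The sketch below uses hypothetical data and takes the definition literally, dividing the sum of squared deviations by the number of observations.

```python
import statistics

# Hypothetical observations of a variable.
x = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: largest observed value minus smallest observed value.
value_range = max(x) - min(x)

# Mean squared deviation from the mean, dividing by the number of cases.
mean_x = statistics.mean(x)
msd = sum((xi - mean_x) ** 2 for xi in x) / len(x)

print("range:", value_range)
print("mean squared deviation:", msd)
```

The same mean squared deviation is what `statistics.pvariance` returns for these data.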
24Variance
- Sample variance of X:
- $s_X^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n-1)$, where $\bar{X}$ is the mean of X
- Notice that the denominator is n-1, not n.
- - Dividing by n-1 gives an unbiased estimate of the true population variance.
- - If we were to divide by n, we would on average underestimate the population variance.
- The standard deviation of X is the square root of the sample variance.
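The effect of the denominator can be seen on a small example. The five observations below are hypothetical; the sketch computes the sum of squared deviations once and then divides it by n-1 and by n for comparison.

```python
import math
import statistics

# Hypothetical sample of five observations.
x = [3, 5, 8, 10, 14]
n = len(x)
mean_x = sum(x) / n

# Sum of squared deviations from the sample mean.
ss = sum((xi - mean_x) ** 2 for xi in x)

sample_var = ss / (n - 1)   # sample variance (denominator n-1)
divide_by_n = ss / n        # smaller value obtained with denominator n
sample_sd = math.sqrt(sample_var)

print("sample variance:", sample_var)
print("dividing by n instead:", divide_by_n)
print("standard deviation:", round(sample_sd, 3))
```

The standard library's `statistics.variance` and `statistics.stdev` use the n-1 denominator as well, so they should agree with `sample_var` and `sample_sd`.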
25Symmetry of Distribution
- The mean and variance do not summarize all of the information about a distribution.
- We also want to be able to distinguish between the following situations.
26How do we measure symmetry?
- Skewness is a measure of how asymmetric a variable's distribution is around its median value.
- Skewness of X $= 3(\bar{X} - \text{Median}(X)) / \sigma_X$, where $\bar{X}$ is the mean and $\sigma_X$ is the standard deviation of X
- Positive skew: the tail of the distribution is to the right of the median.
- Negative skew: the tail of the distribution is to the left of the median.
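The skewness measure on this slide, three times the difference between mean and median divided by the standard deviation, can be sketched as below. The function name `median_skewness` and the data are hypothetical, and the sketch assumes the standard deviation is the square root of the sample variance (the n-1 version).

```python
import statistics


def median_skewness(values):
    """Return 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # assumption: sample (n-1) standard deviation
    return 3 * (mean - median) / sd


# Hypothetical data with a long tail to the right, and its mirror image.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]
left_tailed = [-v for v in right_tailed]
symmetric = [1, 2, 3, 4, 5]

print("right tail:", round(median_skewness(right_tailed), 3))
print("left tail:", round(median_skewness(left_tailed), 3))
print("symmetric:", median_skewness(symmetric))
```

A positive result indicates a tail to the right of the median, a negative result a tail to the left, and a value near zero an approximately symmetric distribution.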