Measuring Individual Differences presentation


Title: Measuring Individual Differences

1
Measuring Individual Differences
2
Overview

Measurement as a scientific process
Psychological tests
Statistical Concepts
Reliability
Validity

3
Psychological Measurement

Measurement: the process (rules) for assigning
numbers to observations to represent quantities
of attributes
Statistics: a body of procedures for organizing
data, describing variation, and making inferences

4
Goals of Science

The goal of science is to describe, predict, and
explain natural phenomena
It is necessary to make careful and precise
observations of the phenomena (i.e., measurement)
These observations must be interpreted within an
explanatory framework (i.e., theory)
The truthfulness of the explanation must be
evaluated (i.e., validation)

Theory: the proposed interpretation, or
explanation, of the interrelationships among
variables found in nature
Constructs: the basic elements of a theory
Constructs are abstractions from our observations
and are themselves unobservable.
Constructs are related to other constructs by
hypotheses, which specify the ways in which variation
in one construct will cause, accompany, or affect
variation in another construct.

6
Stages of Scientific Inquiry

Observation: construct generation
Hypothesis generation: constructs are related to, associated
with, or predict one another
Investigation of hypotheses/data collection:
create operational definitions (measurement) and
gather data. Because there may be many
operational definitions of any single construct,
the adequacy of the operational definition has to
be ascertained (construct validation)
Verify/Refute Theory

7
Operational Definition

Each (abstract) construct must be translated into
something that is directly observable, which
serves as a proxy for the construct.
Operational Definition: the set of specified
procedures which take the theoretical construct
and reduce (map) it to a quantitative (real
world) level.

8
Scientific Method (Measurement)

Hypothesis based on Theory
Operational Definition of Constructs
Data Collection
Data Analysis, Summarization, Interpretation
Evaluate the fit of the results to either
support or fail-to-support the stated
(theoretical) hypothesis

9
Nomological Network

Visual representation of a theory that delineates
the relations among the constructs
Path diagram conventions:

Circles represent latent or unobserved
variables (constructs)
Double-headed arrows represent correlations
Rectangles represent observed (manifest)
variables: measurements that serve as
operational definitions of constructs
Single-headed arrows represent regression
coefficients: change in the variable at the tail
is hypothesized to cause change in the variable at the head
10
Hypothesis Relationship
[Figure: At the construct level ("Theory Land"), hypothesized relationships link C1 Intelligence, C2 School Performance, and C3 Learning Ability. At the observable level ("Data Land"), observed relationships link their measures: O1 IQ scores, O2 Course Grades, and O3 Speed of Learning Paired Associates.]
11
Measurement Standardizes Meaning and Communication

Express general laws in precise ways
Allows for the use of math and stats
Greater descriptive flexibility
Use of numbers conveys more precise information
Better characterization of relative position

Psychological measurement is less clear and far
more complicated than physical measurement.
Physical measurements can be repeated without
substantially changing the measurement.
Psychological measurements run the risk of
changing the individual as a result of the
measuring process.
Due to limitations of scale, the basis of
psychological measurement is found in comparing
an individual with the group (normative).

13
What is a Psychological Test?

3 criteria
Sample of behavior
Obtained under standardized conditions
Established measurement and scoring rules

14
Sampling Behavior

Can't measure all relevant behavior; have to get
a sample of behavior
Three types:
Specific tasks: tests of performance
Observation
Self-reports

15
Specific Task Tests

Most familiar type
Score based on success in performing the task
Generalizable?
Limited by testing situation
Examples?

16
Observation

Participant knowledge of being observed
Generalizable?
Again, limited by the testing situation
Examples?

17
Self-reports

Widely used
Description or report
Valid?
Truthful?
Faking
Examples?

18
Standardization

Key word: uniformity
Administration
Scoring
Why is uniformity important?
What factors could affect test results?

19
Test Scoring

Obvious: objective
Right/wrong
Not so obvious: subjective
Projective tests
Must set up clear criteria
Again, consistency is very important!

20
Back to Standardization

Key word: uniformity
Administration
Scoring
Establish norms
What are norms?
Terminology: norm group = normative sample =
standardization sample

21
Test Norms

A conversion process
Raw scores are converted to scaled scores for comparisons
Example: percentiles or percentile ranks
Where do we get norms?
Standardization group
Needs to be a representative sample!

22
Statistics

Measurement is a set of procedures for assigning
numbers to observations to represent quantities
of attributes
Measurement yields data
Statistics is a set of procedures for summarizing
data, describing variation, and making inferences

23
Descriptive Statistics

Summarizes data and describes variation
Central tendency: mean, median, mode
Dispersion: variance, standard deviation, min
and max
Distribution: skewness, kurtosis

24
Central Tendency

The typical or expected score
Mean: average, $\bar{X} = \sum X_i / N$
Median: middle score of the distribution
Mode: most frequent score
If the distribution is symmetrical and unimodal (e.g., a
normal distribution), the mean, median, and mode
will be the same value
The greater the skew, the more the
measures of central tendency will differ
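A quick check of the last point, using Python's standard `statistics` module on a small set of hypothetical, positively skewed scores (the numbers are made up for illustration): the long right tail pulls the mean above the median, which in turn sits above the mode.

```python
from statistics import mean, median, mode

# Hypothetical, positively skewed scores: most are low, one is very high.
scores = [2, 3, 3, 3, 4, 4, 5, 6, 9, 21]

print("mean:  ", mean(scores))    # pulled toward the long right tail
print("median:", median(scores))  # middle score of the distribution
print("mode:  ", mode(scores))    # most frequent score
```

Removing the outlying 21 would move the mean much closer to the median, while the median and mode would barely change.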

25
Dispersion or Variability

Indication of how much scatter there is in the
distribution of scores
Variability is absolutely essential to
measurement and the study of individual
differences: there is no reason to measure if everyone is the
same
Generally want to maximize variability in a
measuring instrument, because it provides greater
sensitivity, or ability to distinguish people

26
Variance

Average (squared) deviation from the mean
$s^2 = \frac{\sum (X_i - \bar{X})^2}{N}$
Have to square the deviations; otherwise the sum of
the deviations will equal zero
This puts the variance in a different (squared) metric than
the mean
Just take the square root of the variance to get
the Standard deviation (SD)
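A minimal sketch of these computations, dividing by N as on the slide (many statistics packages divide by N - 1 when estimating a population variance from a sample). The scores and the helper names `variance` and `sd` are hypothetical, chosen for illustration.

```python
import math

def variance(xs):
    """Population variance: average squared deviation from the mean (divides by N)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def sd(xs):
    """Standard deviation: square root of the variance, back in the raw-score metric."""
    return math.sqrt(variance(xs))

scores = [4, 8, 6, 5, 7]  # hypothetical raw scores
m = sum(scores) / len(scores)
deviations = [x - m for x in scores]

print(sum(deviations))   # unsquared deviations always sum to zero
print(variance(scores))  # in squared units
print(sd(scores))        # in the original units
```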

27
Distribution statistics

Skewness: describes the tail of the distribution
If the distribution is symmetrical (e.g., normal),
there is no skew
Positive skew: tail is in the high values, but
most scores in the low values
Negative skew: tail is in the low values, but
most scores in the high values
Kurtosis: how much the distribution bunches up
around the mean
Usually want to minimize both skew and kurtosis
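As a rough illustration, skewness and kurtosis can be computed from the third and fourth moments about the mean. This sketch assumes the simple population (moment) formulas and reports excess kurtosis (kurtosis minus 3, so a normal distribution scores 0); statistical packages often apply small-sample corrections, so their values will differ somewhat. The data and the `moments` helper are hypothetical.

```python
def moments(xs):
    """Return (skewness, excess kurtosis) using population moment formulas."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n  # variance
    m3 = sum((x - m) ** 3 for x in xs) / n  # third moment: sign gives direction of the tail
    m4 = sum((x - m) ** 4 for x in xs) / n  # fourth moment: weight in the tails vs. center
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3      # 0 for a normal distribution
    return skew, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 2, 3, 10]

print(moments(symmetric))     # skewness 0; negative excess kurtosis (flat, no bunching)
print(moments(right_skewed))  # positive skewness: tail is in the high values
```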

28
(No Transcript)
29
Meaning of scores

Raw scores of psychological tests usually have
little inherent meaning
Meaning is derived by comparing scores to others
(e.g., other members of a sample or a normative
sample)
Percentiles
Z scores
T scores

30
Percentile/Percentile Rank

Percentile: relative position in the sample or
reference group
Percentile rank: percentage of people who
earned a raw score lower than the given score
Percentage of persons, not items
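A minimal sketch of the percentile-rank definition above, counting people (not items) in a hypothetical norm group who scored below a given raw score. Some test manuals also count half of the people tied at the score, which gives slightly different values; the function name is illustrative.

```python
def percentile_rank(score, norm_scores):
    """Percentage of people in the reference group who scored LOWER than `score`."""
    below = sum(1 for s in norm_scores if s < score)  # count persons, not items
    return 100 * below / len(norm_scores)

norm_group = [10, 12, 12, 15, 18, 20, 21, 23, 25, 30]  # hypothetical raw scores
print(percentile_rank(21, norm_group))  # 6 of 10 people scored lower
```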

31
Standard scores

Expresses distance of score from the mean in SD
units
Advantages of standard scores
Includes information about the person's standing
in the distribution (i.e., percentile rank, if the
distribution is approximately normal)
Allows comparisons across tests that have
different raw metrics

32
Z score

How far the score is away from the mean in SD
units
$z = \frac{X_i - \bar{X}}{SD}$
Z scores have mean = 0, SD = 1.0
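A short sketch of the z-score formula using the standard `statistics` module, assuming the population SD (`pstdev`, dividing by N). The raw scores are hypothetical; the check at the end confirms the transformed scores have mean 0 and SD 1.

```python
import statistics

def z_scores(xs):
    """Convert raw scores to z scores: (X - mean) / SD, using the population SD."""
    m = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    return [(x - m) / sd for x in xs]

raw = [4, 8, 6, 5, 7]  # hypothetical raw scores
z = z_scores(raw)
print([round(v, 2) for v in z])
print(statistics.mean(z), statistics.pstdev(z))  # mean 0, SD 1 (up to rounding)
```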

33
Z scores and Percentile ranks

Z scores relate to percentile ranks (see figure
2-7 in textbook)
For a normal distribution:

Z score   Percentile rank
  +2        97.7
  +1        84
   0        50
  -1        16
  -2         2.3

Z scores between -1 and +1 are usually considered
the average range
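The percentile ranks in the table come from the standard normal cumulative distribution function. A minimal sketch, assuming normality, computes them with `math.erf` from the standard library; the function name is illustrative.

```python
import math

def normal_percentile_rank(z):
    """Percentile rank of a z score, assuming a normal distribution.

    Uses the standard normal CDF, Phi(z) = (1 + erf(z / sqrt(2))) / 2.
    """
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

for z in (2, 1, 0, -1, -2):
    print(f"z = {z:+d}: percentile rank = {normal_percentile_rank(z):.1f}")
```

The familiar 97.5 and 2.5 values correspond to z = +1.96 and -1.96, not exactly +2 and -2.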

34
T scores

T scores are linear transformations of Z scores
Why? For Z scores, half the scores are negative
and fractional numbers are involved

35
T scores

$T = 10z + 50$
Mean = 50, SD = 10
If the distribution is normal, nearly all scores (about 99.7%) will fall between 20 and 80
Scores are no longer negative, and are usually rounded to whole
numbers

36
Conversions of Standard Scores
37
T scores

MMPI scales are expressed in T score units
T score of 65 or higher is considered clinically
significant
GRE and SAT subtests have traditionally used a T score × 10 metric:
Mean = 500, SD = 100
E.g., Verbal score of 600 is 1 SD above the mean
Quantitative score of 700 is 2 SD above the mean
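T scores and the 500/100 metric are both linear transformations of z. A minimal sketch, with hypothetical helper names, reproduces the examples on these slides.

```python
def rescale(z, new_mean, new_sd):
    """Linear transformation of a z score to any standard-score metric."""
    return new_mean + new_sd * z

def to_t(z):
    return rescale(z, 50, 10)    # T score: mean 50, SD 10

def to_500_100(z):
    return rescale(z, 500, 100)  # the GRE/SAT-style metric on the slide: mean 500, SD 100

print(to_t(1.5))        # the MMPI cutoff of 65 is 1.5 SD above the mean
print(to_500_100(1))    # 1 SD above the mean
print(to_500_100(2))    # 2 SD above the mean
print(to_t(-3), to_t(3))  # the 20 to 80 range covers +/- 3 SD
```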

38
Norms

A person's score is usually compared to the scores of a
normative sample
The normative sample is some defined group
A persons score is interpreted in relation to
the scores of this defined group

39
Types of Norms

Age-related
Average scores for persons of a certain
chronological age
Grade equivalent
Average scores for persons of a certain grade
level
Percentile
Relative position in the norm group
Standard score (Z and T scores)
Deviation score from the mean of the norm group

40
Scales of Measurement

We usually treat psychological tests as interval
but really they are ordinal
Interval: equal spaces on the scale have the
same meaning
Can say how far apart scores are from each other in the
distribution, but not ratios (there is no true zero)

41
Stats 2 Inferential Statistics

Statistics are tools that help us understand our
observations by
Summarizing and describing our data (descriptive
stats)
Testing hypotheses (inferential stats)
We need inferential statistics to verify or
refute hypotheses
These tests help to establish the validity of our
theory and measuring instruments

42
Population vs. Sample

Population: encompasses all the phenomena of
interest
Parameters are the numbers used to describe the
population
Sample: a subset of observations from the
population
Sample statistics are the numbers used to
describe the sample and to estimate the
population parameters
Want to generalize, or infer, that what we observe
in our sample also applies to the population
We do this by making probabilistic statements
relating the population and sample

43
Population vs. Sample

This is exactly the same logic used in testing
The population is the construct of interest
The sample is the test
We then generalize from what we observe in the
sample to the population using probabilistic
statements

44
Correlation (r) Coefficient

Way to describe relationship between two
variables
Magnitude
Direction
Many types
Pearson's r (Product Moment Correlation)

45
Pearson's r

Ranges from -1.0 to +1.0
Has no units of measurement
0 indicates no linear relationship
-1 indicates a perfect, negative linear
relationship
+1 indicates a perfect, positive linear
relationship

46
Co-variance

Where does correlation come from?
Amount of overlapping variance; need variance to
have covariance
Covariance
$\mathrm{cov}_{XY} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{N}$
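A minimal sketch of the covariance formula (dividing by N, as on the slide), with hypothetical data. It also illustrates "need variance to have covariance": if one variable does not vary, the covariance is zero, and a variable's covariance with itself is just its variance.

```python
def covariance(xs, ys):
    """Population covariance: average product of paired deviations (divides by N)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

x = [1, 2, 3, 4, 5]  # hypothetical scores
y = [2, 4, 5, 4, 5]

print(covariance(x, y))         # positive: above-average x goes with above-average y
print(covariance(x, x))         # covariance of a variable with itself is its variance
print(covariance(x, [3] * 5))   # no variance in y, so no covariance
```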

47
Problems with Covariance

As with raw scores, units typically have little
intrinsic meaning and no upper limit
Also, two variables may be on different scales
Need an analog to standard scores:
standardized covariance

48
Correlation as Standardized Covariance

Doesn't matter which variable is x or y
$r = \frac{\mathrm{cov}_{XY}}{SD_X \, SD_Y}$
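A minimal sketch of correlation as standardized covariance, using hypothetical height and weight data. Converting height from inches to centimeters changes the covariance but leaves r unchanged, and swapping x and y gives the same r.

```python
import math

def covariance(xs, ys):
    """Population covariance (divides by N)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def pearson_r(xs, ys):
    """Correlation as standardized covariance: cov(X, Y) / (SD_X * SD_Y)."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

height_in = [60, 62, 65, 68, 70, 72]       # hypothetical heights in inches
weight_lb = [115, 120, 140, 150, 165, 180]  # hypothetical weights in pounds
height_cm = [h * 2.54 for h in height_in]   # same heights, different units

print(covariance(height_in, weight_lb), covariance(height_cm, weight_lb))  # depends on units
print(pearson_r(height_in, weight_lb), pearson_r(height_cm, weight_lb))    # unit-free, same value
print(pearson_r(weight_lb, height_in))  # symmetric in x and y
```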

49
Examples of Correlations

Item 1 on Quiz 1 and total score for Quiz 1: r =
.92, corrected r = .82
Cumulative quiz scores and total score on
screening measure of IQ: r = .03
Difference score (Quiz 2 - Quiz 1) and Quiz 1
score: r = -.81
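The slides do not define "corrected" r; a common meaning, assumed here, is the item-total correlation computed after removing the item from the total, so the item is not partly correlated with itself. A minimal sketch with hypothetical 0/1 quiz responses (not the class data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def item_total(responses, j):
    """Return (uncorrected, corrected) correlation of item j with the total score.

    `responses` is a list of people, each a list of item scores. The corrected
    version correlates item j with the total of the OTHER items.
    """
    item = [person[j] for person in responses]
    total = [sum(person) for person in responses]
    rest = [t - i for t, i in zip(total, item)]
    return pearson_r(item, total), pearson_r(item, rest)

# Hypothetical 0/1 responses of six people to four quiz items
quiz = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
print(item_total(quiz, 0))  # the corrected r is smaller than the uncorrected r
```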

50
r = .92, corrected r = .82
51
(No Transcript)
52
r = -.81
53
Null Hypothesis for Correlation Coefficient

Typically, the NH is that the correlation equals zero; the test asks
whether the observed correlation is significantly different from zero
The bigger the sample, the more power to detect any
difference from zero (reject the NH)
A correlation can be different from zero but have little
practical significance
r²: coefficient of determination, or proportion
of variance accounted for
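The slides do not give a test statistic; a standard one for the NH that the population correlation is zero is t = r * sqrt(n - 2) / sqrt(1 - r²), with n - 2 degrees of freedom. This sketch (helper name hypothetical) shows the same small r becoming "significant" as n grows, while r² stays at 1% of variance.

```python
import math

def t_for_r(r, n):
    """t statistic for H0: population correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.10  # a small correlation
for n in (30, 300, 3000):
    # For large samples, |t| above about 1.96 is significant at .05 (two-tailed)
    print(n, round(t_for_r(r, n), 2))

print("r^2 =", round(r ** 2, 4))  # proportion of variance accounted for, regardless of n
```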

54
Effect Sizes for Correlations

Small ES: r = .10 to .29
Medium ES: r = .30 to .49
Large ES: r = .50 to 1.00
Most psychological research works with effects in
the small to medium range

55
Usual Correlation Disclaimer

Correlation does NOT equal causation
Reasons?
Chance
Third variable causes the relationship

56
Prediction

r describes how much two things go together
Therefore, can be used to predict y from x
If r = 1.0, what z score would you predict for y
if you knew x?
Unlike correlation, in regression it matters
which variable is x and which is y

57
Linear Regression

Describes the association between two variables
using a straight line
Equation of a line
$\hat{y} = a + bx$
Where
x: predictor or independent variable
y: outcome or criterion variable
$\hat{y}$: predicted value of y
b: slope, amount of change in $\hat{y}$ associated with
one unit change in x
a: intercept, value of $\hat{y}$ when x = 0
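The slides do not say how a and b are estimated; a minimal sketch, assuming ordinary least squares, is below (b = sum of cross-products / sum of squares of x, and the line passes through the two means). The data are hypothetical. Swapping x and y gives a different line, which is why the choice of predictor matters in regression.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx    # change in y-hat per one-unit change in x
    a = my - b * mx  # the line passes through (mean of x, mean of y)
    return a, b

x = [1, 2, 3, 4, 5]  # hypothetical predictor scores
y = [2, 4, 5, 4, 5]  # hypothetical criterion scores
a, b = fit_line(x, y)
print(a, b)          # intercept and slope
print(a + b * 6)     # predicted y for a new person with x = 6
print(fit_line(y, x))  # regressing x on y gives a different line
```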

58
Conceptual Understanding of Linear Regression

a = mean of y (when b = 0, or when x is centered at its mean)
If x tells you nothing, the mean of y is your best guess of someone's
score
x just gives you additional information to
improve your prediction
The stronger the relationship between x and y
(i.e., the correlation), the better your
prediction gets

59
Linear Regression with z scores

b is simply r
Why?
a is zero (both means are 0)
No adjustment needed for different scales
b is the change in $z_y$ per 1 SD change in x
Equation
$\hat{z}_y = r z_x$
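A quick numerical check of this slide, using hypothetical data and least-squares helpers (names are illustrative): after converting both variables to z scores, the fitted intercept is 0 and the slope equals Pearson's r.

```python
import math

def z_scores(xs):
    """Population z scores: (x - mean) / SD, with SD dividing by N."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return [(x - m) / sd for x in xs]

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
a, b = fit_line(z_scores(x), z_scores(y))
print(round(a, 10), round(b, 4), round(pearson_r(x, y), 4))  # a is 0; b equals r
```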

60
Regression and ANOVA

Regression and ANOVA are really the same
General Linear Model (GLM)
$\hat{y} = a + bx$
If you have groups, x is group membership
Dummy code: 0 = group 1, 1 = group 2
Plot the means: the slope of the line is the mean
difference between the groups, and the correlation of
group membership with y is the point-biserial correlation
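A minimal sketch of the dummy-coding idea with hypothetical heights for two groups: regressing height on a 0/1 group code gives an intercept equal to the group 1 mean and a slope equal to the mean difference, and the Pearson r between the code and height is the point-biserial correlation.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

# Hypothetical heights (in), dummy coded 0 = group 1, 1 = group 2
group = [0, 0, 0, 0, 1, 1, 1, 1]
height = [63, 64, 65, 66, 68, 69, 70, 71]

a, b = fit_line(group, height)
mean_g1 = sum(h for g, h in zip(group, height) if g == 0) / group.count(0)
mean_g2 = sum(h for g, h in zip(group, height) if g == 1) / group.count(1)

print(a, mean_g1)                # intercept = mean of group 1 (x = 0)
print(b, mean_g2 - mean_g1)      # slope = mean difference between the groups
print(pearson_r(group, height))  # point-biserial r: Pearson r with a 0/1 predictor
```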

61
Mean differences as a Correlation
[Figure: Height (in)]
62
GLM

ANOVA and regression both try to account for
variance in a criterion
The only difference is the nature of the predictor
variable: quantitative (continuous) or
categorical (e.g., dichotomous)

63
Reliability

Definition: the proportion of variance in a set
of test scores that is due to the real or true
attributes of the persons being measured, rather
than error
Also, repeatability, consistency, or stability

64
Reliability as Repeatability

Conceptually, any observation has some degree of
error or imprecision
By taking multiple measurements it is presumed
that these random errors will cancel each other
out
Under certain assumptions the mean of repeated
measurements is considered an estimate of the
true score

65
Components of Reliability

Want a statistic of the proportion of total test
score variance that is due to the true score
variance
i.e., what proportion is not due to error
variance?
Defining true score variance as the consistent,
stable variance

66
Classical Test Theory (CTT) Reliability

Observed score = true score + error
$X = T + E$
$\sigma_X^2 = \sigma_T^2 + \sigma_E^2$
What is observed is a function of the variability
in the true score and variability of the errors
of measurement

67
Definition by Symbols

Reliability
$r_{xx} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}$
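A small simulation can make the classical test theory decomposition concrete. This sketch assumes a true-score SD of 15 and an error SD of 5 (both invented for illustration), with errors independent of true scores; observed-score variance then approximately equals true plus error variance, and the ratio of true to observed variance lands near the population reliability of 225 / (225 + 25) = .90.

```python
import random

def pvar(xs):
    """Population variance (divides by N)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

random.seed(1)  # fixed seed so the sketch is repeatable

N = 20_000
# Assumed population values: true-score SD 15, error SD 5, errors independent of true scores
true = [random.gauss(100, 15) for _ in range(N)]
error = [random.gauss(0, 5) for _ in range(N)]
observed = [t + e for t, e in zip(true, error)]  # X = T + E for each person

var_t, var_e, var_x = pvar(true), pvar(error), pvar(observed)
reliability = var_t / var_x

print(round(var_x, 1), round(var_t + var_e, 1))  # close, because T and E are uncorrelated
print(round(reliability, 3))  # near the population value of .90
```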

68
Assumptions of True Score Theory

Error of measurement is unsystematic or random
deviation of an individual's score from a
theoretically expected observed score
(true score)
Observed score = true score + error
True score is an expected or mean score
Errors are not correlated with true score (i.e.,
random)

69
Methods of Assessing Reliability

Test-retest
Alternate Forms
Split-half
Internal Consistency

70
Average Item Intercorrelation

Related to the last type of reliability we'll
discuss, internal consistency reliability
An example: imagine two people who are taking an
internally consistent test of extraversion

71
An Example cont.

Brittany is very extraverted, Hillary is not
For every item, Brittany always responds true
and Hillary always responds false
So, within a sample of different people, the
responses to items will be correlated
People who score high on item 1 will also score
high on items 2, 3, ..., n
Internal consistency

72
Another Example

Imagine Brittany and Hillary take an internally
consistent test of intelligence
Hillary is very intelligent; Brittany is not so
bright
Hillary passes every item; Brittany fails nearly
every item
Again, within a sample of different people, the
item responses will be correlated
People who pass item 1 will tend to pass items 2,
3, ..., n

73
Flipping Examples

Now imagine an internally inconsistent test
Responses would be random with respect to what
the test is supposedly measuring (extraversion,
intelligence)
What does this have to do with reliability?

74
Internal Consistency Reliability

Take the logic of split-half and parallel forms
reliability to the extreme
Every ITEM is a parallel test of the construct
Therefore, the average correlation among items is
an index of reliability
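A minimal sketch of the average inter-item correlation, computed over every pair of items. The two response sets are hypothetical 1 to 5 ratings from five people on four items: in the consistent set the items rank people the same way; in the inconsistent set responses are unrelated across items.

```python
import itertools
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_interitem_r(responses):
    """Mean Pearson r over all pairs of items.

    `responses` is a list of people, each a list of item scores.
    """
    k = len(responses[0])
    items = [[person[j] for person in responses] for j in range(k)]
    rs = [pearson_r(items[i], items[j]) for i, j in itertools.combinations(range(k), 2)]
    return sum(rs) / len(rs)

consistent = [[5, 5, 4, 5], [4, 4, 4, 3], [3, 3, 2, 3], [2, 1, 2, 2], [1, 2, 1, 1]]
inconsistent = [[5, 1, 3, 2], [1, 5, 2, 4], [3, 2, 5, 1], [2, 4, 1, 5], [4, 3, 4, 3]]

print(round(average_interitem_r(consistent), 2))    # high
print(round(average_interitem_r(inconsistent), 2))  # low (here even negative)
```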

75
Cronbach's Coefficient Alpha (α)

Alpha is the average value of all possible
split-half reliabilities
As number of items increases so will alpha
Some consider this a major flaw, claiming alpha
is useless if more than 40 items are used
Use average interitem correlation instead
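The usual computing formula for alpha (not shown on the slide) is α = k / (k - 1) × (1 - sum of item variances / variance of total scores), for k items. A minimal sketch with hypothetical responses; items that are exact copies of each other give α = 1.

```python
def pvar(xs):
    """Population variance; the N vs. N - 1 choice cancels in the ratio below."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(responses):
    """Coefficient alpha; `responses` is a list of people, each a list of k item scores."""
    k = len(responses[0])
    items = [[person[j] for person in responses] for j in range(k)]
    sum_item_var = sum(pvar(item) for item in items)
    total_var = pvar([sum(person) for person in responses])
    return k / (k - 1) * (1 - sum_item_var / total_var)

consistent = [[5, 5, 4, 5], [4, 4, 4, 3], [3, 3, 2, 3], [2, 1, 2, 2], [1, 2, 1, 1]]
parallel = [[s, s, s] for s in (1, 2, 3, 4, 5)]  # three identical items

print(round(cronbach_alpha(consistent), 3))
print(round(cronbach_alpha(parallel), 3))
```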

76
Standard Error of Measurement

Applying reliability to individuals
SEM
standard deviation of the distribution of test
scores you would expect if a test were
administered repeatedly to the same person

77
SEM

If test scores are consequential, a small SEM is
important
Normal curve reference
Like a standard deviation, the SEM tells how far off you are in
estimating the true score, on average

78
SEM Formula

$SEM = SD\sqrt{1 - r_{xx}}$
SD Standard deviation of test scores
rxx reliability coefficient

79
SEM example

IQ score: rxx = .90, SD = 15
$SEM = 15\sqrt{1 - .90} \approx 4.74$
Get a confidence interval for a score of 110
68% CI (±1 SEM): 110 ± 4.7 = 105.3 to 114.7
95% CI (±2 SEM): 110 ± 9.5 = 100.5 to 119.5
99.7% CI (±3 SEM): 110 ± 14.2 = 95.8 to 124.2
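A minimal sketch reproducing the SEM example, with hypothetical helper names. The bands use ±1, ±2, and ±3 SEM as in the example; an exact 95% interval would use ±1.96 SEM.

```python
import math

def sem(sd, rxx):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - rxx)

def score_band(score, sd, rxx, n_sem):
    """Interval of +/- n_sem standard errors around an observed score."""
    e = n_sem * sem(sd, rxx)
    return score - e, score + e

print(round(sem(15, 0.90), 2))
for n_sem, label in ((1, "~68%"), (2, "~95%"), (3, "~99.7%")):
    lo, hi = score_band(110, 15, 0.90, n_sem)
    print(f"{label}: {lo:.1f} to {hi:.1f}")
```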

80
Relationship between Reliability and Validity

Reliability places a limit on validity
Why?

81
Factors Influencing Reliability

Inter-item correlation
Number of items
The more items, the higher the reliability
coefficient (holding the inter-item correlation constant)
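One standard way to quantify the effect of test length, not named on the slide, is the Spearman-Brown prophecy formula, which predicts reliability when a test is lengthened by a factor k with items assumed parallel to the existing ones. A minimal sketch:

```python
def spearman_brown(r, k):
    """Predicted reliability when a test is lengthened by a factor k (k = 2 doubles it).

    Assumes the added items are parallel to the existing ones.
    """
    return k * r / (1 + (k - 1) * r)

r = 0.60  # hypothetical reliability of the current test
for k in (0.5, 1, 2, 3):
    print(k, round(spearman_brown(r, k), 3))  # reliability rises with length, with diminishing returns
```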

82
Dependence of Reliability on the Sample Tested

Internal consistency reliability is dependent on
observed item scores
Can't assume a reliability estimate from one sample
will apply to a different sample

83
Dependence of Reliability on the Sample Tested

Also applies to the SEM, which
assumes
equal measurement precision across all levels of a
trait
Individuals don't differ in the ability of the
test to measure their trait level
SEM is dependent on the variability of sample scores

84
Validity

Does the test measure what it is supposed to
measure?
Is the label put on the test and scores
appropriate?
What inferences can you make about a test score?
Validity is multifaceted
Face, Content, Criterion, and Construct Validity

85
Face Validity

Does the test appear to measure what people
responding to it think it does?
Subjective reaction to a test
Primarily a PR issue
Some don't consider it part of validity

86
Content Validity

Is the coverage of testing material an adequate
sample of the construct of interest?
Have to cover everything
Structure of test should be the same as the
construct
Factor analysis

87
Criterion-related Validity

Can a test predict a criterion that is external
to the test?
Concurrent validity
Can the test predict criteria measured at roughly
the same time?
Predictive validity
Can the test predict criteria measured after the
test was taken?

88
Construct Validity

Subsumes all types of validity
Determines the appropriateness of inferences
about a construct
What is part of the construct?
What other constructs is it related to?
What other constructs is it NOT related to?

89
Construct Validation

An ongoing process
Interplay between hypothesis generation, data
collection, and refining the construct
No construct validation index: no single value
summarizes a test's construct validity

90
Evidence of Construct Validity

Group (mean) differences
Correlations
Factor analysis
Studies of internal structure
Studies of change over occasions
Studies of process (experimental manipulations)

91
Establishing Validity

Scores on the measuring instrument must behave in
a way that is consistent with theory
Make measurements, test hypotheses
Validating a measuring instrument also validates
(or refutes) a theory

92
Max Consumption as a measure of Alcoholism

Construct of Alcoholism
People all over the world consume alcohol
Individual differences in alcohol consumption
Some persons' use of alcohol is considered
pathological
Drink large quantities, spend excessive time
drinking or pursuing alcohol, interferes with
major life roles (work, parent), unable to stop
drinking, withdrawal, medical problems and
continued use despite medical problems

93
Alcoholism Definitions

DSM-IV criteria for Alcohol Dependence
3 symptoms (or more) occurring in the same
12-month period
Tolerance, withdrawal, drinking more than
intended, unable to cut down, great deal of time
spent obtaining, consuming, or recovering from
substance use, important activities given up, use
continued despite physical or psychological
problem caused or exacerbated by the substance

94
Maximum Consumption

What is the largest amount of alcohol you have
ever consumed in 24 hours?
Alternative measure of alcoholism?
Must demonstrate the same associations as would
be predicted for Alcoholism

95
Max Consumption vs. Alc Dep

Advantages Max Consumption
Objective number and easy to compare across
people
More socially acceptable: people are reluctant to
admit to Alc Dep symptoms
Quantitative, spans the full range of
vulnerability to alcoholism
Alc Dep only measures the extreme range
Lose information, lose statistical power

96
Quantitative Measures and Alcoholism Severity
[Figure: labels "Threshold" and "Liability"]
97
Max Consumption vs. Alc Dep

Potential Disadvantages
Sufficient content validity?
How accurate are people at reporting?
False positives?
False negatives?
Most of these concerns apply equally to any
other measure, including Alc Dep

98
Alcoholism's Nomological Network
[Figure: network linking Alcoholism to Drug Use, Tobacco Use, Adult Antisocial Behavior, Delinquency, Risky Sexual Behavior, Depression, Intelligence, and School Achievement]
99
Hypotheses connecting Alcoholism and other
constructs

Construct                    Predicted relation
Drug use                     Strong (+)
Tobacco use                  Strong (+)
Adult Antisocial Behavior    Strong (+)
Delinquency                  Strong (+)
Risky Sexual Behavior        Moderate (+)
Depression                   Zero, small (+)
Intelligence                 Zero, small (-)
School Achievement           Small (-)

100
Does Max Consumption Reproduce the same
Nomological Network?
[Figure: network linking MAX CONS to Drug Use, Tobacco Use, Adult Antisocial Behavior, Delinquency, Risky Sexual Behavior, Depression, Intelligence, and School Achievement]
101
Validate Max Consumption as measure of Alcoholism

Does Max Consumption exhibit the same relations
with other constructs?
Have to measure each construct
Make observations in representative sample
Test hypotheses using statistics

102
Need to measure each construct

Construct                    Measure
Drug use                     DSM symptoms of Drug Dependence
Tobacco use                  Nicotine Dependence
Adult Antisocial Behavior    Antisocial Personality Disorder
Delinquency                  Conduct Disorder
Risky Sexual Behavior        Life Events Interview
Depression                   DSM Major Depression
Intelligence                 Wechsler IQ scores
School Achievement           Class Grades

103
Sample

Minnesota Twin Family Study
17-year-old male and female twins
Born in MN, recruited from all over the state
Almost all white, IQ > 70, no mental or physical
disability
Representative?

104
Statistics

Need to use statistics to test hypotheses
Rely on correlations
Correlation: index of association, ranges from -1
to +1
+1: perfect positive relation
-1: perfect inverse relation
0: no association

105
Convergent and Discriminant Relations

Convergent validity
Measure should be positively correlated with
certain constructs
Discriminant Validity
Measure should be uncorrelated or negatively
correlated with other constructs

106
Convergent Validity

Test should be positively correlated with other
tests attempting to measure the same construct
Correlation between Max Consumption and Alc Dep:
r = .65
Big correlation
Measures a similar construct, but not the same
construct
Which is a better measure of Alcoholism?

107
Convergent Validity
108
Discriminant Validity
109
Group (mean) differences

Alcoholism more common in men than women
Mean Alc Dep symptoms
Men .63, Women .43
Mean Max Consumption
Men 7.7, Women 4.87
Using t-tests, means are significantly different
for both

110
Evaluate Measure

How consistent are the observations with theory?
As measures of Alcoholism, both Max Cons and Alc
Dep criteria are related to external constructs
in a way consistent with theory
Therefore, both are valid measures of Alcoholism
It might seem simple, but if the observations are
NOT as predicted, the test is not valid

111
Is Construct Validation over?

No, it's just the beginning
Continue to delineate relations with other
constructs
Now that good measures of the construct of
Alcoholism are available
Can now study the etiology or causal processes of
Alcoholism

112
Construct Elaboration

As new observations accrue, new criteria become available to
evaluate the construct validity of measures
For example, develop markers of the underlying
processes of Alcoholism
Specific genes
Psychophysiological markers present before
symptom onset
The test that has a stronger relation with these
variables is the more valid measure of the construct
