Title: Measuring Individual Differences
1Measuring Individual Differences
2Overview
- Measurement as a scientific process
- Psychological tests
- Statistical Concepts
- Reliability
- Validity
3Psychological Measurement
- Measurement: the process (rules) for assigning numbers to observations to represent quantities of attributes
- Statistics: a body of procedures for organizing data, describing variation, and making inferences
4Goals of Science
- The goal of science is to describe, predict, and explain natural phenomena
- It is necessary to make careful and precise observations of the phenomena (i.e., measurement)
- These observations must be interpreted within an explanatory framework (i.e., theory)
- The truthfulness of the explanation must be evaluated (i.e., validation)
5
- Theory: the proposed interpretation, or explanation, of the interrelationships among variables found in nature
- Constructs: the basic elements of a theory
- Constructs are abstractions from our observations and are themselves unobservable.
- Constructs are related to other constructs by hypotheses, which specify the ways in which variation in one construct will cause, accompany, or affect variation in another construct.
6Stages of Scientific Inquiry
- Observation: construct generation
- Hypothesis generation: "related to," "associated with," "predicts"
- Investigation of hypotheses / data collection: create operational definitions (measurement) and gather data. Because there may be many operational definitions of any single construct, the adequacy of the operational definition has to be ascertained (construct validation).
- Verify/refute theory
7Operational Definition
- Each (abstract) construct must be translated into something that is directly observable, which serves as a proxy for the construct.
- Operational definition: the set of specified procedures that take the theoretical construct and reduce (map) it to a quantitative (real-world) level.
8Scientific Method (Measurement)
- Hypothesis based on Theory
- Operational Definition of Constructs
- Data Collection
- Data Analysis, Summarization, Interpretation
- Evaluate the fit of the results to either
support or fail-to-support the stated
(theoretical) hypothesis
9Nomological Network
- Visual representation of a theory that delineates the relations among the constructs
- Path diagram conventions:
- Circles represent latent or unobserved variables (constructs)
- Double-headed arrows represent correlations
- Rectangles represent observed variables: measurements that serve as operational definitions of constructs
- Single-headed arrows represent regression coefficients: change in the variable at the tail causes change in the variable at the head
10Hypothesis Relationship
[Figure: two-level path diagram. Construct level ("Theory Land"): C1 Intelligence, C2 School Performance, and C3 Learning Ability, linked by hypothesized relationships. Observable level ("Data Land"): O1 IQ scores, O2 Course Grades, and O3 Speed of Learning Paired Associates, linked by observed relationships.]
11Measurement Standardizes Meaning and Communication
- Express general laws in precise ways
- Allows for the use of math and statistics
- Greater descriptive flexibility
- Use of numbers relays more precise information
- Better characterization of relative position
12
- Psychological measurement is less clear and far more complicated than physical measurement.
- Physical measurements can be repeated without substantially changing the measurement.
- Psychological measurements run the risk of changing the individual as a result of the measuring process.
- Due to limitations of scale, the basis of psychological measurement is found in comparing an individual with the group (normative).
13What is a Psychological Test?
- 3 criteria
- Sample of behavior
- Obtained under standardized conditions
- Established measurement and scoring rules
14Sampling Behavior
- Can't measure all relevant behavior; have to get a sample of behavior
- Three types
- Specific task tests of performance
- Observation
- Self-reports
15Specific Task Tests
- Most familiar type
- Score based on success in performing the task
- Generalizable?
- Limited by testing situation
- Examples?
16Observation
- Participant knowledge of being observed
- Generalizable?
- Again, limited by the testing situation
- Examples?
17Self-reports
- Widely used
- Description or report
- Valid?
- Truthful?
- Faking
- Examples?
18Standardization
- Key word: uniformity
- Administration
- Scoring
- Why is uniformity important?
- What factors could affect test results?
19Test Scoring
- Obvious: objective
- Right/wrong
- Not so obvious: subjective
- Projective tests
- Must set up clear criteria
- Again, consistency is very important!
20Back to Standardization
- Key word: uniformity
- Administration
- Scoring
- Establish norms
- What are norms?
- Terminology: norm group, normative sample, standardization sample
21Test Norms
- A conversion process
- Raw scores → scaled scores for comparisons
- Example: percentiles or percentile ranks
- Where do we get norms?
- Standardization group
- Needs to be representative sample!
22Statistics
- Measurement is a set of procedures for assigning numbers to observations to represent quantities of attributes
- Measurement yields data
- Statistics is a set of procedures for summarizing data, describing variation, and making inferences
23Descriptive Statistics
- Summarizes data and describes variation
- Central tendency: mean, median, mode
- Dispersion: variance, standard deviation, min and max
- Distribution: skewness, kurtosis
24Central Tendency
- The typical or expected score
- Mean: average, $\bar{X} = \sum X_i / N$
- Median: middle score of the distribution
- Mode: most frequent score
- If the distribution is symmetrical (e.g., a normal distribution), the mean, median, and mode will be the same value
- The greater the skew or kurtosis, the more the measures of central tendency will differ
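As a rough illustration of how the three measures agree for a symmetrical distribution and drift apart under skew, the sketch below uses Python's standard `statistics` module. Both score lists are hypothetical, invented only to show the pattern.

```python
from statistics import mean, median, mode

# Hypothetical scores, invented for illustration only.
symmetric_scores = [1, 2, 3, 3, 3, 4, 5]
skewed_scores = [1, 1, 1, 2, 3, 6, 14]  # long tail in the high values

for label, scores in [("symmetric", symmetric_scores),
                      ("positive skew", skewed_scores)]:
    print(label,
          "mean:", mean(scores),
          "median:", median(scores),
          "mode:", mode(scores))
```

In the skewed list the mean is pulled toward the tail, so it ends up above the median, which in turn sits above the mode.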
25Dispersion or Variability
- Indication of how much scatter there is in the distribution of scores
- Variability is absolutely essential to measurement and the study of individual differences: there is no reason to measure if everyone is the same
- Generally want to maximize variability in a measuring instrument; this provides greater sensitivity, or ability to distinguish people
26Variance
- Average (squared) deviation from the mean
- Variance = $\sum (X_i - \bar{X})^2 / N$
- Have to square the deviations; otherwise the sum of the deviations will equal zero
- This puts the variance in a different metric than the mean
- Just take the square root of the variance to get the standard deviation (SD)
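The variance and SD formulas can be sketched in a few lines of Python. The function names and the score list are hypothetical, and the code divides by N (not N - 1), matching the formula on this slide.

```python
import math

def mean(scores):
    return sum(scores) / len(scores)

def variance(scores):
    """Average squared deviation from the mean (divides by N)."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)

def standard_deviation(scores):
    """Square root of the variance, back in the metric of the raw scores."""
    return math.sqrt(variance(scores))

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Unsquared deviations always sum to zero, which is why they are squared.
deviation_sum = sum(x - mean(scores) for x in scores)

print("sum of deviations:", deviation_sum)
print("variance:", variance(scores))
print("SD:", standard_deviation(scores))
```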
27Distribution statistics
- Skewness: describes the tail of the distribution
- If the distribution is symmetrical (e.g., normal), there is no skew
- Positive skew: tail is in the high values, but most scores are in the low values
- Negative skew: tail is in the low values, but most scores are in the high values
- Kurtosis: how much the distribution bunches up around the mean
- Usually want to minimize both skew and kurtosis
29Meaning of scores
- Raw scores of psychological tests usually have little inherent meaning
- Meaning is derived by comparing scores to others (e.g., other members of a sample or a normative sample)
- Percentiles
- Z scores
- T scores
30Percentile/Percentile Rank
- Percentile: relative position in the sample or reference group
- Percentile rank: percentage of people who earned a raw score lower than the given score
- Percentage of persons, not items
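A minimal sketch of the percentile rank definition above, counting the persons in a reference group who scored lower than a given raw score. The function name and the reference scores are hypothetical.

```python
def percentile_rank(score, reference_scores):
    """Percentage of persons in the reference group with a lower raw score."""
    below = sum(1 for s in reference_scores if s < score)
    return 100 * below / len(reference_scores)

# Hypothetical reference group of 10 people, invented for illustration only.
reference = [10, 12, 15, 15, 18, 20, 22, 25, 27, 30]

print(percentile_rank(20, reference))  # 5 of 10 people scored below 20
```

Note that the result is a percentage of persons, so it says nothing about how many items the person answered correctly.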
31Standard scores
- Express the distance of a score from the mean in SD units
- Advantages of standard scores
- Include information about the person's standing in the distribution (i.e., percentile rank)
- Allow comparisons across tests that have different raw metrics
32Z score
- How far the score is from the mean in SD units
- $Z = (X_i - \bar{X}) / SD$
- Z scores have mean = 0, SD = 1.0
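The Z score formula can be sketched as follows. The helper name and data are hypothetical, and the SD divides by N, as in the variance formula used earlier in these slides.

```python
import math

def z_scores(scores):
    """Convert raw scores to Z scores: (X - mean) / SD, with SD computed over N."""
    n = len(scores)
    m = sum(scores) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in scores) / n)
    return [(x - m) / sd for x in scores]

# Hypothetical raw scores (mean 5, SD 2), invented for illustration only.
raw = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(raw)
print(z)
```

Whatever the raw metric, the resulting Z scores have a mean of 0 and an SD of 1.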
33Z scores and Percentile ranks
- Z scores relate to percentile ranks (see Figure 2-7 in the textbook)
- For a normal distribution (Z score: approximate percentile rank)
- +2: 97.7
- +1: 84
- 0: 50
- -1: 16
- -2: 2.3
- Z scores between -1 and +1 are usually considered the average range
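For a normal distribution, the percentile rank that goes with a Z score is the area under the curve below it. A short sketch using `statistics.NormalDist` from the Python standard library (the function name is hypothetical):

```python
from statistics import NormalDist

def percentile_rank_from_z(z):
    """Percentage of a standard normal distribution falling below z."""
    return 100 * NormalDist().cdf(z)

for z in (2, 1, 0, -1, -2):
    print(z, round(percentile_rank_from_z(z), 1))
```

This only applies when scores are approximately normal; for skewed distributions the percentile rank has to be read from the actual score distribution.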
34T scores
- T scores are linear transformations of Z scores
- Why? For Z scores, half the scores are negative and fractional numbers are involved
35T scores
- T score = (Z score × 10) + 50
- Mean = 50, SD = 10
- If the distribution is normal, nearly all scores will be between 20 and 80
- Scores no longer have negative or fractional components
36Conversions of Standard Scores
37T scores
- MMPI scales are expressed in T score units
- T score of 65 or higher is considered clinically significant
- GRE and SAT subtests use a T score (× 10) metric
- Mean = 500, SD = 100
- E.g., a Verbal score of 600 is 1 SD above the mean
- A Quantitative score of 700 is 2 SD above the mean
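Both the T score and the 500/100 metric are the same linear transformation of a Z score with a different mean and SD. A minimal sketch (the function names are hypothetical):

```python
def rescale(z, new_mean, new_sd):
    """Linear transformation of a Z score onto a scale with the given mean and SD."""
    return new_mean + new_sd * z

def t_score(z):
    """T score metric: mean 50, SD 10."""
    return rescale(z, 50, 10)

print(t_score(1.5))          # a Z of 1.5 lands at the T = 65 cutoff
print(rescale(1, 500, 100))  # 1 SD above the mean on the 500/100 metric
print(rescale(2, 500, 100))  # 2 SD above the mean on the 500/100 metric
```

Because the transformation is linear, it changes the units but not anyone's relative standing in the distribution.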
38Norms
- Usually compare a person's score relative to a normative sample
- The normative sample is some defined group
- A person's score is interpreted in relation to the scores of this defined group
39Types of Norms
- Age-related
- Average scores for persons of a certain chronological age
- Grade equivalent
- Average scores for persons of a certain grade level
- Percentile
- Relative position in the norm group
- Standard score (Z and T scores)
- Deviation score from the mean of the norm group
40Scales of Measurement
- We usually treat psychological tests as interval, but really they are ordinal
- Interval: equal spaces on the scale have the same meaning
- Can only say how far apart scores are from each other in the distribution
41Stats 2: Inferential Statistics
- Statistics are tools that help us understand our observations by
- Summarizing and describing our data (descriptive stats)
- Testing hypotheses (inferential stats)
- We need inferential statistics to verify or refute hypotheses
- These tests help to establish the validity of our theory and measuring instruments
42Population vs. Sample
- Population: encompasses all the phenomena of interest
- Parameters are the numbers used to describe the population
- Sample: a subset of observations from the population
- Sample statistics are the numbers used to describe the sample and to estimate the population parameters
- Want to generalize or infer that what we observe in our sample also applies to the population
- We do this by making probabilistic statements relating the population and sample
43Population vs. Sample
- This is exactly the same logic used in testing
- The population is the construct of interest
- The sample is the test
- We then generalize from what we observe in the sample to the population using probabilistic statements
44Correlation (r) Coefficient
- Way to describe the relationship between two variables
- Magnitude
- Direction
- Many types
- Pearson's r (product-moment correlation)
45Pearson's r
- Ranges from -1.0 to +1.0
- Has no units of measurement
- 0 indicates no linear relationship
- -1 indicates a perfect, negative linear relationship
- +1 indicates a perfect, positive linear relationship
46Co-variance
- Where does correlation come from?
- Amount of overlapping variance: need variance to have covariance
- Covariance
- $\text{Cov}(X, Y) = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / N$
47Problems with Covariance
- Same as raw scores: units typically have little intrinsic meaning and no upper limit
- Also, two variables may be on different scales
- Need an analog to standard scores
- Standardized covariance
48Correlation as Standardized Covariance
- Doesn't matter which variable is x or y
- $r = \text{Cov}(X, Y) / (SD_x \, SD_y)$
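To make "correlation as standardized covariance" concrete, the sketch below computes the covariance and then divides by the two SDs. Function names and data are hypothetical; all variances divide by N, as in the formulas on these slides.

```python
import math

def covariance(x, y):
    """Average cross-product of deviations from the means (divides by N)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

def pearson_r(x, y):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = math.sqrt(covariance(x, x))  # covariance of x with itself is its variance
    sd_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sd_x * sd_y)

# Hypothetical paired scores, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print("covariance:", covariance(x, y))
print("r:", round(pearson_r(x, y), 3))

# Rescaling x changes the covariance but leaves r untouched.
x_rescaled = [10 * xi for xi in x]
print("covariance after rescaling x:", covariance(x_rescaled, y))
print("r after rescaling x:", round(pearson_r(x_rescaled, y), 3))
```

The rescaling step shows why covariance is hard to interpret on its own: it depends on the units of both variables, whereas r does not.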
49Examples of Correlations
- Item 1 on Quiz 1 and total score for Quiz 1: r = .92, corrected r = .82
- Cumulative quiz scores and total score on a screening measure of IQ: r = .03
- Difference score (Quiz 2 − Quiz 1) and Quiz 1 score: r = -.81
50r = .92, corrected r = .82
52r = -.81
53Null Hypothesis for Correlation Coefficient
- Typically, the null hypothesis (NH) is that the correlation does not differ from zero
- The bigger the sample, the more power to detect any difference from zero (reject NH)
- A correlation can be different from zero but have little practical significance
- $r^2$: coefficient of determination, or proportion of variance accounted for
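The interplay of sample size, significance, and practical importance can be sketched with the standard t statistic for a correlation, t = r √((n − 2) / (1 − r²)) with n − 2 degrees of freedom. The slides do not give this formula, so treat it as a supplied assumption; the function name and the sample sizes are hypothetical.

```python
import math

def t_for_correlation(r, n):
    """t statistic (df = n - 2) for testing whether r differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r = 0.10
for n in (30, 1000):
    print("n =", n, "t =", round(t_for_correlation(r, n), 2))

# Proportion of variance accounted for is the same regardless of n.
print("r squared:", round(r ** 2, 2))
```

With the same r = .10, the t statistic is well under the usual cutoff of roughly 2 when n = 30 but clearly above it when n = 1000, even though r² says only about 1% of the variance is accounted for in both cases.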
54Effect Sizes for Correlations
- Small ES: r = .10 to .29
- Medium ES: r = .30 to .49
- Large ES: r = .50 to 1.00
- Most psychological research works with effects in
the small to medium range
55Usual Correlation Disclaimer
- Correlation does NOT equal causation
- Reasons?
- Chance
- Third variable causes the relationship
56Prediction
- r describes how much two things go together
- Therefore, it can be used to predict y from x
- If r = 1.0, what z score would you predict for y if you knew x?
- Unlike correlation, in regression it matters which variable is x and which is y
57Linear Regression
- Describes the association between two variables using a straight line
- Equation of a line
- $\hat{y} = a + bx$
- Where
- x = predictor or independent variable
- y = outcome or criterion variable
- $\hat{y}$ = predicted value of y
- b = slope: amount of change in y associated with a one-unit change in x
- a = intercept: value of y when x = 0
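One common way to choose a and b is ordinary least squares, where b = Cov(x, y) / Var(x) and a = mean(y) − b · mean(x). The slides do not name a fitting method, so this is an assumption; the function names and data below are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n
    b = cov_xy / var_x
    a = mean_y - b * mean_x
    return a, b

def predict(a, b, x_value):
    """Predicted y for a given x, using y-hat = a + b * x."""
    return a + b * x_value

# Hypothetical paired scores, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
print("intercept a:", round(a, 2), "slope b:", round(b, 2))
print("predicted y at x = 5:", round(predict(a, b, 5), 2))
```

Swapping x and y gives a different line, which is the sense in which it matters which variable is the predictor.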
58Conceptual Understanding of Linear Regression
- a = mean of y (this holds when x is centered at its mean)
- If x = 0, the mean is your best guess of someone's score
- x just gives you additional information to improve your prediction
- The stronger the relationship between x and y (i.e., the correlation), the better your prediction gets
59Linear Regression with z scores
- b is simply r
- Why?
- a is zero
- No adjustment needed for different scales
- Change in y (in SD units) per 1 SD change in x
- Equation
- $\hat{z}_y = r z_x$
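A short derivation of why b reduces to r, assuming the usual least-squares solution for the slope and intercept (the slides do not state how a and b are estimated, so that solution is an assumption here):

```latex
% Least-squares slope and intercept for raw scores (assumed fitting method)
b = \frac{\text{Cov}(X, Y)}{SD_x^2} = r \, \frac{SD_y}{SD_x},
\qquad
a = \bar{Y} - b \, \bar{X}

% With z scores, both means are 0 and both SDs are 1
b = r \cdot \frac{1}{1} = r,
\qquad
a = 0 - r \cdot 0 = 0

% So the regression equation becomes
\hat{z}_y = r \, z_x
```

When |r| < 1, the predicted z score is closer to the mean than the predictor's z score, which is regression toward the mean.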
60Regression and ANOVA
- Regression and ANOVA are really the same
- General Linear Model (GLM)
- $\hat{y} = a + bx$
- If you have groups, x is group membership
- Dummy code: 0 = group 1, 1 = group 2
- Plot the means; the slope of the line through them is the mean difference, and in standardized form it is the correlation (point-biserial correlation)
61Mean differences as a Correlation
[Figure: mean differences shown as a correlation; axis label: Height (in)]
62GLM
- ANOVA and regression both try to account for variance in a criterion
- The only difference is the nature of the predictor variable: quantitative (continuous) or categorical (dichotomous)
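A small sketch of the claim that a group comparison is a regression with a dummy-coded predictor. The heights are hypothetical, and the slope and intercept use the ordinary least-squares formulas (an assumption, since the slides do not name a fitting method).

```python
import math

# Hypothetical heights in inches for two groups, invented for illustration only.
group_1 = [64, 66, 68]
group_2 = [68, 70, 72]

# Dummy code group membership: 0 = group 1, 1 = group 2.
x = [0] * len(group_1) + [1] * len(group_2)
y = group_1 + group_2

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
var_x = sum((xi - mean_x) ** 2 for xi in x) / n
var_y = sum((yi - mean_y) ** 2 for yi in y) / n

slope = cov_xy / var_x               # equals the difference between group means
intercept = mean_y - slope * mean_x  # equals the mean of the group coded 0
point_biserial_r = cov_xy / math.sqrt(var_x * var_y)

mean_difference = sum(group_2) / len(group_2) - sum(group_1) / len(group_1)

print("slope:", slope, "mean difference:", mean_difference)
print("intercept:", intercept)
print("point-biserial r:", round(point_biserial_r, 3))
```

The regression slope reproduces the mean difference that a t-test or ANOVA would evaluate, and the same data yield a point-biserial correlation.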
63Reliability
- Definition: the proportion of variance in a set of test scores that is due to the real or true attributes of the persons being measured, rather than error
- Also: repeatability, consistency, or stability
64Reliability as Repeatability
- Conceptually, any observation has some degree of error or imprecision
- By taking multiple measurements, it is presumed that these random errors will cancel each other out
- Under certain assumptions, the mean of repeated measurements is considered an estimate of the true score
65Components of Reliability
- Want a statistic for the proportion of total test score variance that is due to true score variance
- I.e., what proportion is not due to error variance?
- Defining true score variance as the consistent, stable variance
66Classical Test Theory (CTT) Reliability
- Observed score = true score + error
- $X = T + e$
- $\sigma_X^2 = \sigma_T^2 + \sigma_e^2$
- What is observed is a function of the variability in the true score and the variability of the errors of measurement
67Definition by Symbols
- Reliability
- $r_{xx} = \sigma_T^2 / \sigma_X^2 = \sigma_T^2 / (\sigma_T^2 + \sigma_e^2)$
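The definition can be checked with a small simulation under classical test theory assumptions: true scores and random errors are generated independently and added to form observed scores. All numbers are hypothetical (true score variance 9, error variance 1, so the expected reliability is 9 / 10 = .90), and in real data the true scores are never observed directly.

```python
import random

def variance(values):
    """Population variance (divides by N)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

random.seed(0)  # fixed seed so the sketch is repeatable
n_people = 20000

# Hypothetical parameters: true-score SD = 3 (variance 9), error SD = 1 (variance 1).
true_scores = [random.gauss(100, 3) for _ in range(n_people)]
errors = [random.gauss(0, 1) for _ in range(n_people)]
observed = [t + e for t, e in zip(true_scores, errors)]

# Reliability as the proportion of observed variance due to true scores.
reliability = variance(true_scores) / variance(observed)
print("estimated reliability:", round(reliability, 2))
```

Increasing the error SD in this sketch lowers the ratio, which is the sense in which error variance reduces reliability.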
68Assumptions of True Score Theory
- Error of measurement is the unsystematic, or random, deviation of an individual's score from a theoretically expected observed score (true score)
- Observed score = true score + error
- True score is an expected or mean score
- Errors are not correlated with true score (i.e., random)
69Methods of Assessing Reliability
- Test-retest
- Alternate Forms
- Split-half
- Internal Consistency
70Average Item Intercorrelation
- Related to the last type of reliability we'll discuss, internal consistency reliability
- An example: imagine two people who are taking an internally consistent test of extraversion
71An Example cont.
- Brittany is very extraverted; Hillary is not
- For every item, Brittany always responds true and Hillary always responds false
- So, within a sample of different people, the responses to items will be correlated
- People who score high on item 1 will also score high on items 2, 3, ..., n
- Internal consistency
72Another Example
- Imagine Brittany and Hillary take an internally consistent test of intelligence
- Hillary is very intelligent; Brittany is not so bright
- Hillary passes every item; Brittany fails nearly every item
- Again, within a sample of different people, the item responses will be correlated
- People who pass item 1 will tend to pass items 2, 3, ..., n
73Flipping Examples
- Now imagine an internally inconsistent test
- Responses would be random with respect to what the test is supposedly measuring (extraversion, intelligence)
- What does this have to do with reliability?
74Internal Consistency Reliability
- Take the logic of split-half and parallel forms reliability to the extreme
- Every ITEM is a parallel test of the construct
- Therefore, the average correlation among items is an index of reliability
75Cronbach's Coefficient Alpha (α)
- Alpha is the average value of all possible split-half reliabilities
- As the number of items increases, so will alpha
- Some consider this a major flaw, claiming alpha is useless if more than 40 items are used
- Use the average interitem correlation instead
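A minimal sketch of coefficient alpha using its standard variance form, α = k/(k − 1) × (1 − Σ item variances / total score variance). The slides do not give this formula, so it is supplied here as an assumption; the function names and the true/false response data are hypothetical.

```python
def population_variance(values):
    """Variance dividing by N."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def cronbach_alpha(responses):
    """responses: one row per person, one column per item."""
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose to one tuple per item
    item_variance_sum = sum(population_variance(item) for item in items)
    total_variance = population_variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical true (1) / false (0) answers, invented for illustration only.
# Every person answers all items the same way: perfectly consistent.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
# Answers agree only partly across items.
mixed = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [0, 1, 0]]

print("alpha, consistent items:", round(cronbach_alpha(consistent), 2))
print("alpha, mixed items:", round(cronbach_alpha(mixed), 2))
```

The first data set mirrors the Brittany and Hillary example: when responses to every item line up, alpha reaches its maximum, and it drops as item responses become less related.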
76Standard Error of Measurement
- Applying reliability to individuals
- SEM
- Standard deviation of the distribution of test scores you would expect if a test were administered repeatedly to the same person
77SEM
- If test scores are consequential, a small SEM is important
- Normal curve reference
- As a standard deviation, the SEM tells how far off you are in estimating the true score, on average
78SEM Formula
- $SEM = SD \sqrt{1 - r_{xx}}$
- SD = standard deviation of test scores
- $r_{xx}$ = reliability coefficient
79SEM example
- IQ score: $r_{xx}$ = .90, SD = 15
- $SEM = 15 \sqrt{1 - .90} = 4.74$
- Get a confidence interval for a score of 110
- 68% CI = 110 ± 4.7 = 105.3 to 114.7
- 95% CI = 110 ± 9.5 = 100.5 to 119.5
- 99% CI = 110 ± 14.2 = 95.8 to 124.2
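The worked example can be reproduced with a short sketch. The function names are hypothetical, and the intervals use multipliers of 1, 2, and 3 SEM, which is what the numbers on the slide correspond to (rounded normal-curve values rather than exact critical values).

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(score, sem, multiplier):
    """Interval of +/- multiplier SEMs around an observed score."""
    margin = multiplier * sem
    return score - margin, score + margin

sem = standard_error_of_measurement(sd=15, reliability=0.90)
print("SEM:", round(sem, 2))

# Multipliers of 1, 2, and 3 SEM, matching the slide's 68%, 95%, and 99% rows.
for label, multiplier in [("68%", 1), ("95%", 2), ("99%", 3)]:
    low, high = confidence_interval(110, sem, multiplier)
    print(label, "CI:", round(low, 1), "to", round(high, 1))
```

Raising the reliability toward 1.0 shrinks the SEM toward zero, which narrows every interval.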
80Relationship between Reliability and Validity
- Reliability places a limit on validity
- Why?
81Factors Influencing Reliability
- Inter-item correlation
- Number of items
- The more items, the higher the reliability
coefficient
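The joint effect of inter-item correlation and test length can be sketched with the standardized alpha (Spearman-Brown) formula, α = k · r̄ / (1 + (k − 1) · r̄), where k is the number of items and r̄ is the average inter-item correlation. This formula is not on the slides and is supplied as an assumption; the function name and the value r̄ = .20 are hypothetical.

```python
def standardized_alpha(n_items, mean_interitem_r):
    """Reliability implied by the number of items and their average intercorrelation."""
    return (n_items * mean_interitem_r) / (1 + (n_items - 1) * mean_interitem_r)

# Hold the average inter-item correlation fixed at a modest, hypothetical .20.
for k in (5, 10, 40):
    print(k, "items:", round(standardized_alpha(k, 0.20), 2))
```

With the item quality held constant, reliability climbs from about .56 with 5 items to about .91 with 40, which is why a long test can show a high coefficient even when its items are only weakly related.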
82Dependence of Reliability on the Sample Tested
- Internal consistency reliability is dependent on observed item scores
- Can't assume a reliability estimate in one sample will apply to a different sample
83Dependence of Reliability on the Sample Tested
- Also applies to SEM
- Assumes
- Equal measurement precision across all levels of a trait
- Individuals don't differ in the ability of the test to measure their trait level
- SEM is dependent on the variability of sample scores
84Validity
- Does the test measure what it is supposed to measure?
- Is the label put on the test and scores appropriate?
- What inferences can you make about a test score?
- Validity is multifaceted
- Face, Content, Criterion, and Construct Validity
85Face Validity
- Does the test appear to measure what people responding to it think it does?
- Subjective reaction to a test
- Primarily a PR issue
- Some don't consider it part of validity
86Content Validity
- Is the coverage of testing material an adequate sample of the construct of interest?
- Have to cover everything
- Structure of the test should be the same as the construct
- Factor analysis
87Criterion-related Validity
- Can a test predict a criterion that is external to the test?
- Concurrent validity
- Can the test predict criteria measured at roughly the same time?
- Predictive validity
- Can the test predict criteria measured after the test was taken?
88Construct Validity
- Subsumes all types of validity
- Determines the appropriateness of inferences about a construct
- What is part of the construct?
- What other constructs is it related to?
- What other constructs is it NOT related to?
89Construct Validation
- An ongoing process
- Interplay between hypothesis generation, data collection, and refining the construct
- No construct validation index: no single value summarizes a test's construct validity
90Evidence of Construct Validity
- Group (mean) differences
- Correlations
- Factor analysis
- Studies of internal structure
- Studies of change over occasions
- Studies of process (experimental manipulations)
91Establishing Validity
- Scores on the measuring instrument must behave in a way that is consistent with theory
- Make measurements, test hypotheses
- Validating a measuring instrument also validates (or refutes) a theory
92Max Consumption as a measure of Alcoholism
- Construct of Alcoholism
- People all over the world consume alcohol
- Individual differences in alcohol consumption
- Some persons' use of alcohol is considered pathological
- Drink large quantities, spend excessive time drinking or pursuing alcohol, drinking interferes with major life roles (work, parent), unable to stop drinking, withdrawal, medical problems, and continued use despite medical problems
93Alcoholism Definitions
- DSM-IV criteria for Alcohol Dependence
- 3 symptoms (or more) occurring in the same 12-month period
- Tolerance, withdrawal, drinking more than intended, unable to cut down, great deal of time spent obtaining, consuming, or recovering from substance use, important activities given up, use continued despite physical or psychological problem caused or exacerbated by the substance
94Maximum Consumption
- What is the largest amount of alcohol you have ever consumed in 24 hours?
- Alternative measure of alcoholism?
- Must demonstrate the same associations as would be predicted for Alcoholism
95Max Consumption vs. Alc Dep
- Advantages of Max Consumption
- Objective number and easy to compare across people
- More socially acceptable: people are reluctant to admit to Alc Dep symptoms
- Quantitative; spans the full range of vulnerability to alcoholism
- Alc Dep only measures the extreme range
- Lose information, lose statistical power
96Quantitative Measures and Alcoholism Severity
[Figure: quantitative measures and alcoholism severity; labels: Threshold, Liability]
97Max Consumption vs. Alc Dep
- Potential Disadvantages
- Sufficient content validity?
- How accurate are people at reporting?
- False positives?
- False negatives?
- Most of these are no different for any other measure, including Alc Dep
98Alcoholism's Nomological Network
[Figure: nomological network linking Alcoholism to Drug Use, Tobacco Use, Adult Antisocial Behavior, Delinquency, Risky Sexual Behavior, Depression, Intelligence, and School Achievement]
99Hypotheses connecting Alcoholism and other constructs
- Construct: predicted relation
- Drug use: strong (+)
- Tobacco use: strong (+)
- Adult Antisocial Behavior: strong (+)
- Delinquency: strong (+)
- Risky Sexual Behavior: moderate (+)
- Depression: zero, small (+)
- Intelligence: zero, small (-)
- School Achievement: small (-)
100Does Max Consumption Reproduce the same Nomological Network?
[Figure: nomological network linking MAX CONS to Drug Use, Tobacco Use, Adult Antisocial Behavior, Delinquency, Risky Sexual Behavior, Depression, Intelligence, and School Achievement]
101Validate Max Consumption as measure of Alcoholism
- Does Max Consumption exhibit the same relations with other constructs?
- Have to measure each construct
- Make observations in a representative sample
- Test hypotheses using statistics
102Need to measure each construct
- Construct: measure
- Drug use: DSM symptoms of Drug Dependence
- Tobacco use: Nicotine Dependence
- Adult Antisocial Behavior: Antisocial Personality Disorder
- Delinquency: Conduct Disorder
- Risky Sexual Behavior: Life Events Interview
- Depression: DSM Major Depression
- Intelligence: Wechsler IQ scores
- School Achievement: Class Grades
103Sample
- Minnesota Twin Family Study
- 17-year-old male and female twins
- Born in MN, recruited from all over the state
- Almost all white, IQ > 70, no mental or physical disability
- Representative?
104Statistics
- Need to use statistics to test hypotheses
- Rely on correlations
- Correlation: index of association, ranges from -1 to +1
- +1 = perfect positive relation
- -1 = perfect inverse relation
- 0 = no association
105Convergent and Discriminant Relations
- Convergent validity
- Measure should be positively correlated with certain constructs
- Discriminant validity
- Measure should be uncorrelated or negatively correlated with other constructs
106Convergent Validity
- Test should be positively correlated with other tests attempting to measure the same construct
- Correlation between Max Consumption and Alc Dep: r = .65
- Big correlation
- Measures a similar construct, but not the same construct
- Which is a better measure of Alcoholism?
107Convergent Validity
108Discriminant Validity
109Group (mean) differences
- Alcoholism is more common in men than women
- Mean Alc Dep symptoms
- Men = .63, Women = .43
- Mean Max Consumption
- Men = 7.7, Women = 4.87
- Using t-tests, means are significantly different for both
110Evaluate Measure
- How consistent are the observations with theory?
- As measures of Alcoholism, both Max Cons and Alc Dep criteria are related to external constructs in a way consistent with theory
- Therefore, both are valid measures of Alcoholism
- It might seem simple, but if the observations are NOT as predicted, the test is not valid
111Is Construct Validation over?
- No, it's just the beginning
- Continue to delineate relations with other constructs
- Now that good measures of the construct of Alcoholism are available
- Can now study the etiology or causal processes of Alcoholism
112Construct Elaboration
- As new observations accrue, new criteria emerge to evaluate the construct validity of measures
- For example, develop markers of the underlying processes of Alcoholism
- Specific genes
- Psychophysiological markers present before symptom onset
- The test that has a stronger relation with these variables is the more valid measure of the construct
113What to do when theory and data don't agree?
- Have to determine whether the measure is invalid or the theory is wrong
- Need multiple lines of evidence
- Tests of hypotheses across different measures and samples
- A body of evidence accumulates to make the determination