Title: Introduction to Psychometrics
1. Introduction to Psychometrics
- Psychometrics and Measurement Validity
- Constructs and Measurement
- Kinds of Items
- Properties of a good measure
- Standardization
- Reliability
- Validity
- Standardization and Inter-rater Reliability
2. Psychometrics (Psychological Measurement)
- The process of assigning a value to represent the amount or kind of a specific attribute of an individual.
- Individuals can be participants, collectives, stimuli, or processes.
- We do not measure individuals.
- We measure specific attributes of an individual.
E.g., Each participant in the Heptagonal Condition was presented with a 2-inch-wide polygon to view for 10 seconds. Then this polygon and four similar ones were presented, and the participant's reaction time to identify the polygon presented previously was recorded.
We will focus on measuring attributes of persons in this introduction!
3. Psychometrics is the centerpiece of scientific, empirical psychological research and practice.
- All psychological data result from some form of measurement.
- Behaviors are collected by observation, self-report, or behavioral traces.
- Measurement is the process of turning those behaviors into data for analysis.
- For those data to be useful, we need Measurement Validity.
- The better the measurement, the better the data, and the more accurate and useful the conclusions of the data analysis are for the intended psychological research or application.
Without Measurement Validity, there can't be Internal Validity, External Validity, or Statistical Conclusion Validity!
4. Most of what we try to measure in Psychology are constructs. They're called this because most of what we care about as psychologists are not physical measurements, such as height, weight, pressure, or velocity. Rather, the stuff of psychology (learning, motivation, anxiety, social skills, depression, wellness, etc.) are things that don't really exist. They are attributes and characteristics that we've constructed to give organization and structure to behavior. Essentially all of the things we psychologists research, both as causes and effects, are Attributive Hypotheses with different levels of support and acceptance!
5. Measurement of constructs is more difficult than measurement of physical properties! We can't just walk up to someone with a scale, ruler, graduated cylinder, or velocimeter and measure how depressed they are. We have to figure out some way to turn observations of their behavior, self-reports, or traces of their behavior into variables that give values for the constructs we want to measure. So measurement is, just like the rest of what we've learned about so far in this course, all about representation! Measurement Validity is the extent to which the data (variable values) we have represent the behaviors (constructs) we want to study.
6. What are the different types of constructs we measure from persons? The most commonly discussed types are:
- Demographics: population/subpopulation identifiers
  - e.g., age, gender, race/ethnicity, history variables
- Ability/Skill: performance broadly defined
  - e.g., scholastic skills, job-related skills, research DVs, etc.
- Attitude/Opinion: how things are or should be
  - e.g., polls, product evaluations, etc.
- Personality: characterological and contextual attributes of an individual
  - e.g., anxiety, psychoses, assertiveness, extroversion, etc.
7. However, it is difficult to categorize many of the things we psychologists measure:
- Diagnostic Category
  - achievement: limits of what can be learned/expressed
  - and/or personality: private and social expressions
  - and/or attitude/opinion: beliefs and feelings
- Social Skills
  - achievement: something that has been learned?
  - and/or personality: how we get along socially is part of who we are?
- Intelligence
  - innate (biological) preparedness for learning
  - and/or achievement: earlier learning yields more intelligence
- Aptitude
  - achievement: know things necessary to learn other things
  - and/or specific capacity: the ability to learn certain skills
8. Each separate thing we measure is called an item
  - e.g., a question, a problem, a page, a trial, etc.
- Collections of items are called many things
  - e.g., survey, questionnaire, instrument, measure, test, or scale
- Three kinds of item collections you should know:
  - Scale (Test): all items are put together to get a single score
  - Subscale (Subtest): item sets are put together to get multiple separate scores
  - Survey: each item gives a specific piece of information
- Most questionnaires, surveys, or interviews are a combination of all three.
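The three kinds of item collections can be sketched in a few lines of Python. This is only an illustration: the item names, the ratings, and the grouping of items into "mood" and "energy" subscales are all made up, and summing is just one common way of combining items.

```python
# Hypothetical responses from one participant (item id -> rating on a 1-5 scale).
responses = {"q1": 4, "q2": 5, "q3": 2, "q4": 3, "q5": 1, "q6": 2}

# Hypothetical assignment of items to two subscales.
subscales = {"mood": ["q1", "q2", "q3"], "energy": ["q4", "q5", "q6"]}


def scale_score(resp):
    """Scale (Test): all items are combined into a single score."""
    return sum(resp.values())


def subscale_scores(resp, groups):
    """Subscale (Subtest): each item set is combined into its own score."""
    return {name: sum(resp[item] for item in items) for name, items in groups.items()}


def survey_values(resp):
    """Survey: each item is kept as its own piece of information."""
    return dict(resp)


print(scale_score(responses))                 # one number for the whole instrument
print(subscale_scores(responses, subscales))  # one number per item set
print(survey_values(responses))               # one value per item
```

The same set of responses can be used all three ways, which is why most real instruments end up being a mix.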
9. There are scads of ways of classifying or categorizing items; here are three ways that I want you to be familiar with.
Kinds of items 1: objective items vs. subjective items
- objective does not mean true, real, or accurate
- subjective does not mean made up or inaccurate
- Defined by how the observer/interviewer/coder transforms participants' responses into data
Objective Items: no evaluation or decision is needed; the response either is the data or is turned into data by a mathematical transformation
- e.g., multiple choice, T/F, matching, fill-in-the-blank (strict)
Subjective Items: the response must be evaluated and a decision or judgment made about what the data value should be
- content coding, diagnostic systems, behavioral taxonomies
- e.g., essays, interview answers, drawings, facial expressions
10. A bit more about objective vs. subjective
- Seems simple
  - the objective measure IS the behavior of interest
  - e.g., impolite statements, GPA, hourly sales, publications
  - problems? Objective doesn't mean representative
- Seems harder
  - subjective rating of behavior IS the behavior of interest
  - e.g., friend's eval, advisor's eval, manager's eval, Chair's eval
  - problems? Good subjective measures are hard work, but ...
- Hardest and most common
  - construct of interest isn't a specific behavior
  - e.g., social skills, preparation for the professoriate, sales skill, contribution to the department
  - problems? What is the construct, and how do we represent it?
11. Kind 2: Judgments, Sentiments, and Scored Sentiments
- Judgments: do have a correct answer (e.g., 2 + 2 = 4)
  - the behavior, response, or trace must be scored (compared to the correct answer) to produce the variable/data
  - scoring may be objective or subjective, depending on item
- Scored Sentiments: do not have a correct answer but do have an indicative answer (e.g., Do you prefer to be alone?)
  - the behavior, response, or trace must be scored (compared to the indicative answer) to produce the variable/data
  - scoring may be objective or subjective, depending on item
- Sentiments: do not have a correct answer (e.g., Like Psyc350?) or have a correct answer, but we won't check (e.g., age)
  - the behavior, response, or trace is the variable/data
  - scoring may be objective or subjective, depending on item
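The difference among the three kinds comes down to what the response is compared with before it becomes data. Here is a minimal sketch in Python, assuming simple one-point scoring; the function names, the answer keys, and the idea that "yes" is the indicative answer for the alone item are all hypothetical.

```python
def score_judgment(response, correct_answer):
    """Judgment: compare the response to the correct answer."""
    return 1 if response == correct_answer else 0


def score_scored_sentiment(response, indicative_answer):
    """Scored sentiment: compare the response to the indicative answer.

    There is no right or wrong here; a match only means the response
    points toward the construct being measured.
    """
    return 1 if response == indicative_answer else 0


def score_sentiment(response):
    """Sentiment: nothing to compare against, so the response is the data."""
    return response


# Judgment item "2 + 2 = ?" with correct answer 4.
arithmetic_point = score_judgment(4, correct_answer=4)

# Scored sentiment item "Do you prefer to be alone?", assuming "yes" is the
# indicative answer for whatever construct this hypothetical scale measures.
alone_point = score_scored_sentiment("no", indicative_answer="yes")

# Sentiment item "age": recorded as given, never checked.
age_value = score_sentiment(21)
```

Both scoring functions do the same comparison; what differs is how the result is interpreted (correct vs. indicative).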
12. Using Judgments, Sentiments, and Scored Sentiments
- Judgments: do have a correct answer
  - Ability/skill
  - Intelligence
  - Diagnostic category
  - Aptitude
- Scored Sentiments: do not have a correct answer but do have an indicative answer
  - Personality
  - Diagnostic category
  - Aptitude
- Sentiments: do not have a correct answer or have a correct answer, but we won't check
  - Demographics
  - Attitude/Opinion
13. Kind 3: Direct Keying vs. Reverse Keying
We want the respondents to carefully read and respond to each item of our scale/test. One thing we do is to write the items so that some of them are backwards or reversed. Consider these items from a depression measure, each rated from 1 (disagree) to 5 (agree):
  1. It is tough to get out of bed some mornings.    1 2 3 4 5
  2. I'm generally happy about my life.    1 2 3 4 5
  3. I sometimes just want to sit and cry.    1 2 3 4 5
  4. Most of the time I have a smile on my face.    1 2 3 4 5
If the person is depressed, we would expect them to give a fairly high rating for questions 1 and 3, but a low rating on 2 and 4. Before aggregating these items into a composite scale or test score, we would direct key items 1 and 3 (1=1, 2=2, 3=3, 4=4, 5=5) and reverse key items 2 and 4 (1=5, 2=4, 4=2, 5=1; a 3 stays a 3).
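The keying step can be sketched in Python. On a 1 to 5 scale, reverse keying is the same as subtracting the rating from 6 (the scale minimum plus the scale maximum). The function names and the example ratings below are made up for illustration, and summing the keyed items is only one way to form the composite.

```python
SCALE_MIN, SCALE_MAX = 1, 5

# Items 2 and 4 of the depression example are the reversed items.
REVERSED_ITEMS = {2, 4}


def key_item(item_number, rating):
    """Return the keyed value of one rating (direct or reverse keyed)."""
    if not SCALE_MIN <= rating <= SCALE_MAX:
        raise ValueError(f"rating {rating} is outside {SCALE_MIN}-{SCALE_MAX}")
    if item_number in REVERSED_ITEMS:
        # Reverse key: 1->5, 2->4, 3->3, 4->2, 5->1
        return SCALE_MIN + SCALE_MAX - rating
    # Direct key: the rating is used as is
    return rating


def composite_score(ratings):
    """Sum the keyed ratings; higher totals indicate more depression."""
    return sum(key_item(item, rating) for item, rating in ratings.items())


# Hypothetical respondent: high on items 1 and 3, low on items 2 and 4.
ratings = {1: 5, 2: 1, 3: 4, 4: 2}
print(composite_score(ratings))  # keyed values are 5, 5, 4, 4
```

Without the reverse keying step, this respondent's raw ratings would sum to 12 and look only moderately depressed; after keying, all four items point the same way.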
14. Desirable Properties of Psychological Measures
- Interpretability of Individual and Group Scores
- Population Norms
- Validity
- Reliability
- Standardization
15. Desirable Properties of Psychological Measures
- Interpretability of Individual and Group Scores
- Population Norms: Scoring Distribution and Cutoffs
- Validity: Face, Content, Criterion-Related, Construct
- Reliability: Inter-rater, Internal Consistency, Test-Retest, and Alternate Forms
- Standardization: Administration and Scoring
16. Standardization
- Administration: test is given the same way every time
  - who administers the instrument
  - specific instructions, order of items, timing, etc.
  - Varies greatly
    - multiple-choice classroom test: hand it out
    - MMPI: hand it out
    - WAIS: whole books and courses
- Scoring: test is scored the same way every time
  - who scores the instrument
  - correct, partial, and incorrect answers, points awarded, etc.
  - Varies greatly
    - multiple-choice test: fill in the bubble sheet
    - MMPI: whole books and courses
    - WAIS: whole books and courses
17. We need to assess the inter-rater reliability of the scores from subjective items.
- Have two or more raters score the same set of tests (usually 25-50% of the tests)
- Assess the consistency of the scores in different ways for different types of items
  - Quantitative Items
    - correlation, intraclass correlation, RMSD
  - Ordered Categorical Items
    - % agreement, Cohen's Kappa
- Keep in mind: what we really want is rater validity
  - we don't really want raters to agree, we want them to be right!
  - so it is best to compare raters with a standard rather than just with each other
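A few of these consistency checks can be sketched in plain Python. This is a sketch, not a replacement for a statistics package: the function names and all of the codes and scores are made up. Cohen's Kappa is computed as (observed agreement - chance agreement) / (1 - chance agreement), and RMSD is the root mean squared difference between two raters' scores.

```python
from collections import Counter
from math import sqrt


def proportion_agreement(codes_1, codes_2):
    """Proportion of cases given the same code (multiply by 100 for %)."""
    if len(codes_1) != len(codes_2) or not codes_1:
        raise ValueError("need two non-empty code lists of equal length")
    matches = sum(a == b for a, b in zip(codes_1, codes_2))
    return matches / len(codes_1)


def cohens_kappa(codes_1, codes_2):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(codes_1)
    p_observed = proportion_agreement(codes_1, codes_2)
    counts_1, counts_2 = Counter(codes_1), Counter(codes_2)
    categories = counts_1.keys() | counts_2.keys()
    # Chance agreement: product of the two raters' marginal proportions
    p_expected = sum((counts_1[c] / n) * (counts_2[c] / n) for c in categories)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


def rmsd(scores_1, scores_2):
    """Root mean squared difference between two sets of quantitative scores."""
    if len(scores_1) != len(scores_2) or not scores_1:
        raise ValueError("need two non-empty score lists of equal length")
    squared = [(a - b) ** 2 for a, b in zip(scores_1, scores_2)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical categorical codes for 10 essays, plus a hypothetical standard.
standard = ["A", "A", "B", "B", "A", "B", "A", "A", "B", "B"]
rater_a = ["A", "B", "B", "B", "A", "B", "A", "A", "A", "B"]
rater_b = ["A", "B", "B", "B", "A", "B", "A", "B", "A", "B"]

print(proportion_agreement(rater_a, rater_b))   # raters with each other
print(cohens_kappa(rater_a, rater_b))
print(proportion_agreement(rater_a, standard))  # each rater with the standard
print(proportion_agreement(rater_b, standard))
print(rmsd([3, 4, 5, 2], [3, 5, 4, 2]))         # hypothetical quantitative scores
```

In this made-up data the two raters agree with each other more often than either agrees with the standard, which is the situation the rater validity point warns about.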
18. Ways to improve inter-rater reliability
- improved standardization of the measurement instrument
  - do questions focus respondents' answers?
  - will single sentence or other response limitations help?
- instruction in the elements of the standardization
  - is complete explication possible? (borders on objective)
  - if not, need conceptual matches
- practice with the instrument, with feedback
  - walk-through with experienced coders
  - practice with common problems or historical challenges
- experience with the instrument
  - really no substitute
  - have to worry about drift and generational reinterpretation
- use of the instrument with the intended population
  - different populations can have different response tendencies