Introduction to Psychometrics - PowerPoint PPT Presentation

About This Presentation
Title:

Introduction to Psychometrics


Slides: 20
Provided by: Gar68
Learn more at: https://psych.unl.edu

Transcript and Presenter's Notes



1
Introduction to Psychometrics
  • Psychometrics
  • Some important language
  • Properties of a good measure
  • Standardization
  • Reliability
  • Validity
  • Common Item types
  • Reverse Keying
  • Construction & Validation Process

2
  • Psychometrics
  • (Psychological measurement)
  • The process of assigning values to represent the
    amounts and kinds of specified attributes, to
    describe (usually) persons.
  • We do not measure people
  • We measure specific attributes or
    characteristics of a person
  • Psychometrics is the centerpiece of empirical
    psychological research and practice.
  • All data result from some form of measurement
  • This is what we've meant by "Measurement Validity" all
    along
  • The better the measurement, the better the data,
    the better the conclusions of the psychological
    research or application

3
Most of what we try to measure in Psychology are
constructs. They're called constructs because
most of what we care about as psychologists are
not physical measurements, such as height,
weight, pressure & velocity; rather, the "stuff
of psychology" (learning, motivation, anxiety,
social skills, depression, wellness, etc.) are
things that don't really exist but have been
constructed to help us describe and understand
behavior. They are attributes and
characteristics that we've constructed to give
organization and structure to behavior.
Essentially all of the things we psychologists
research, both as causes and effects, are
Attributive Hypotheses with different levels of
support and acceptance!
4
Measurement of constructs is more difficult than
measurement of physical properties! We can't just
walk up to someone with a scale, ruler, graduated
cylinder or velocimeter and measure how depressed
they are. We have to figure out some way to turn
their behavior, self-reports or traces of their
behavior into variables that give values for the
constructs we want to measure. So, measurement
is, much like the rest of research that we've
learned about so far, all about representation!
Measurement Validity is the extent to which
the variable values (data) we have represent the
behaviors we want to study.
5
  • What are the different types of constructs we
    measure ???
  • The most commonly discussed types are ...
  • Achievement -- performance broadly defined
    (judgements)
  • e.g., scholastic skills, job-related skills,
    research DVs, etc.
  • Attitude/Opinion -- how things should be
    (sentiments)
  • polls, product evaluations, etc.
  • Personality -- characterological attributes
    (keyed sentiments)
  • anxiety, psychoses, assertiveness, etc.
  • There are other types of measures that are often
    used
  • Social Skills -- achievement or personality ??
  • Aptitude -- how well someone will perform after
    they are trained and gain experience, but measured
    before the training experience
  • some combo of achievement, personality and
    likes
  • IQ -- is it achievement (things learned) or is
    it aptitude for academics, career and life ??

6
  • Some language
  • Mostly we will talk about measurement using
    self-report
  • behavioral observation, instrumentation & trace
    indices are all part of measurement, but
  • Each question is called an "item"
  • Kinds of items: objective items vs. subjective
    items
  • objective does not mean true or real
  • objective means no judgment or evaluation is
    required
  • there is one correct answer and everything
    else is wrong
  • e.g., multiple choice, T/F, fill-in-the-blanks
  • subjective means that someone has to judge what
    is correct
  • short answer, essay
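Because an objective item has exactly one keyed answer, scoring it needs no judgment: a response either matches the key or it doesn't. Here is a minimal Python sketch. It assumes a hypothetical three-item multiple-choice test whose answer key (`ANSWER_KEY`) and item labels are made up for illustration. Each match earns 1 point, and anything else, including a blank, earns 0.

```python
# Hypothetical answer key for a three-item multiple-choice test (illustration only).
ANSWER_KEY = {"q1": "d", "q2": "a", "q3": "c"}


def score_objective(responses: dict, key: dict) -> int:
    """Return the number of items whose response exactly matches the keyed answer.

    Objective scoring: one correct answer per item, everything else (including a
    missing response) is wrong, so no rater judgment is involved.
    """
    return sum(1 for item, correct in key.items() if responses.get(item) == correct)


respondent = {"q1": "d", "q2": "b", "q3": "c"}
print(score_objective(respondent, ANSWER_KEY))  # 2
```

Subjective items (short answer, essay) can't be scored by a simple key match like this, because someone has to judge each answer.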

7
  • Some more language
  • A collection of items is called many things
  • e.g., survey, questionnaire, instrument,
    measure, test, or scale
  • Three kinds of item collections you should know
    ..
  • Scale (Test): all items are put together to
    get a single score
  • Subscale (Subtest): item sets are put together
    to get multiple separate scores
  • Survey: each item gives a specific piece of
    information
  • Most questionnaires or surveys are a
    combination of all three, giving data like you
    used for your research project
  • single demographic & history survey items
  • some instruments that gave a single scale score
  • some instruments that gave multiple subscale
    scores
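The three kinds of item collections differ mainly in how responses are combined. This minimal sketch assumes a hypothetical six-item instrument split into two made-up subscales (`social`, `academic`). It also assumes scores are formed by simple summing, though some instruments average instead. A survey item such as age is simply reported as-is.

```python
# Hypothetical item-to-subscale assignment (illustration only).
SUBSCALES = {
    "social": ["i1", "i2", "i3"],
    "academic": ["i4", "i5", "i6"],
}


def scale_score(responses: dict) -> int:
    """Scale (test): all scale items are summed into one score."""
    return sum(responses[item] for items in SUBSCALES.values() for item in items)


def subscale_scores(responses: dict) -> dict:
    """Subscale (subtest): each item set is summed separately."""
    return {name: sum(responses[item] for item in items) for name, items in SUBSCALES.items()}


record = {"age": 19, "i1": 4, "i2": 5, "i3": 3, "i4": 2, "i5": 1, "i6": 2}
print(record["age"])            # survey item, used on its own: 19
print(scale_score(record))      # 17
print(subscale_scores(record))  # {'social': 12, 'academic': 5}
```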

8
  • Some more language
  • Psychometric Sampling & Inference process
  • Research Sampling is about how well a sample of
    participants represents the target population
  • we collect data from the sample and infer that
    the statistical results from the sample tell us
    about the entire population
  • Measurement Sampling is about how well a scale (a
    set of items) represents behavior (domain)
  • we collect data using the scale and infer that
    the score we get reflects the score for the
    behavior (entire domain)
  • Psychometric Sampling is both
  • collecting a set of items sampled from a domain
    from a set of participants sampled from a
    population
  • and using statistics calculated from the scale
    scores to represent the behavior of that
    population

9
Desirable Properties of Psychological Measures
  • Interpretability of Individual and Group Scores
  • Population Norms (Typical Scores)
  • Validity (Consistent Accuracy)
  • Reliability (Consistency)
  • Standardization (Administration & Scoring)
10
  • Standardization
  • Administration: the test is given the same way
    every time
  • who administers the instrument
  • specific instructions, order of items, timing,
    etc.
  • Varies greatly: multiple-choice classroom test
    (hand it out) vs. WAIS (100-page
    administration manual)
  • Scoring: the test is scored the same way every
    time
  • who scores the instrument
  • correct, partial and incorrect answers, points
    awarded, etc.
  • Varies greatly: multiple-choice test (fill in
    the sheet) vs. WAIS (200-page scoring
    manual)

11
  • Reliability (Consistency or Agreement)
  • Inter-rater or inter-observer reliability
  • do multiple observers/coders score an item the
    same way?
  • important whenever using subjective items
  • Internal reliability -- do the items measure a
    central thing?
  • Cronbach's alpha (α) typically ranges from .00 to
    1.00; higher values mean stronger
    internal consistency/reliability
  • External reliability -- consistency of
    scale/test scores
  • test-retest reliability: correlate scores from
    the same test given 3-18 weeks apart
  • alternate forms reliability: correlate scores
    from two versions of the test

12
  • Validity (Consistent Accuracy)
  • Face Validity -- do the items come from the domain
    of interest? Non-statistical: decision of the
    target population
  • Content Validity -- do the items come from the
    domain of interest? Non-statistical:
    decision of experts in the field
  • Criterion-related Validity -- does the test correlate
    with the criterion?
  • statistical: requires a criterion that you
    believe in
  • predictive, concurrent, postdictive validity
  • Construct Validity -- does the test relate to other
    measures as it should?
  • Statistical: discriminant validity
  • convergent validity -- correlates with selected
    tests
  • divergent validity -- doesn't correlate with
    others

13
  • Is the test valid?
  • Jum Nunnally (one of the founders of modern
    psychometrics) claimed this was a silly question!
    The point wasn't that tests shouldn't be valid,
    but that a test's validity must be assessed
    relative to
  • the construct it is intended to measure
  • the population for which it is intended (e.g.,
    age, level)
  • the application for which it is intended (e.g.,
    for classifying folks into categories vs.
    assigning them quantitative values)
  • So, the real question is, "Is this test a valid
    measure of this construct for this population in
    this application?" That question can be answered!

14
Most Common Types of Items ???
Personality, Attitude, Opinion (Psychology) Items
1. How do you feel today?
   Unhappy 1 2 3 4 5 Happy
2. How interested are you in campus politics?
   Interested 1 2 3 4 5 6 7 Uninterested
  • These are called Likert or Likert-Type
    items
  • statement
  • response along a continuum with verbal anchors
  • 5-, 7- & 9-point response scales are common

15
  • Most Common Types of Items ???
  • Personality, Attitude, Opinion (Psychology)
    Items
  • 1. Which of these best describes you ?
  • a. I am mostly interested in the social side
    of college.
  • b. I am mostly interested in the intellectual
    side of college.
  • 2. Would you rather spend time with a friend ...
  • at your favorite restaurant
  • watching a sporting event

These are called Forced Choice items
16
Most Common Types of Items ???
Test Items
1. Which of these is one of the 7 dwarves?
   a. Grungy  b. Sleazy  c. Kinky  d. Doc  e. Dorky
2. What should you do if the traffic light turns
   yellow as you approach an intersection?
   a. Stop  b. Speed up  c. Check for police and then
   choose a. vs. b.
These (as you know) are called Multiple Choice
items. Their difference from Likert items is
that, for these, the response options are
qualitatively different.
17
Reverse Keying
We want the respondents to carefully read and
separately respond to each item of our scale/test.
One thing we do is to write the items so that some
of them are "backwards" or "reversed". Consider
these items from a depression measure:
1. It is tough to get out of bed some mornings.
   disagree 1 2 3 4 5 agree
2. I'm generally happy about my life.
   1 2 3 4 5
3. I sometimes just want to sit and cry.
   1 2 3 4 5
4. Most of the time I have a smile on my face.
   1 2 3 4 5
If the person is depressed, we would expect them to
give a fairly high rating for questions 1 & 3,
but a low rating on 2 & 4. Before aggregating
these items into a composite scale or test score,
we would reverse key items 2 & 4 (1→5, 2→4,
4→2, 5→1).
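The reversal described above is just (lowest + highest) - response, which for a 1-5 scale is 6 - response. This minimal sketch scores the four example depression items and treats items 2 & 4 as reverse keyed. As one common choice, it assumes the composite is a simple sum. The two response patterns are invented.

```python
REVERSED_ITEMS = {2, 4}  # "I'm generally happy...", "Most of the time I have a smile..."
LOW, HIGH = 1, 5         # endpoints of the response scale


def reverse_key(rating: int, low: int = LOW, high: int = HIGH) -> int:
    """Flip a rating on a low..high scale (1->5, 2->4, 3->3, 4->2, 5->1 for 1-5)."""
    return low + high - rating


def depression_score(responses: dict) -> int:
    """Sum item ratings after reverse keying the reversed items (higher = more depressed)."""
    return sum(
        reverse_key(rating) if item in REVERSED_ITEMS else rating
        for item, rating in responses.items()
    )


print([reverse_key(r) for r in range(1, 6)])       # [5, 4, 3, 2, 1]
print(depression_score({1: 5, 2: 1, 3: 4, 4: 2}))  # 5 + 5 + 4 + 4 = 18
print(depression_score({1: 1, 2: 5, 3: 2, 4: 4}))  # 1 + 1 + 2 + 2 = 6
```

Skipping this step is a common cause of a low or even negative Cronbach's alpha. The un-reversed items then correlate negatively with the rest of the scale.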
18
Scale Construction & Validation Process
  • Determine what kind of scale you are trying to
    make
  • Construct, Population & Application
  • Focus groups
  • Work with subject matter experts to define the
    domain
  • Write items
  • Content validity is the focus
  • item types, reverse keying & face validity
  • Focus groups
  • Back to the SMEs to evaluate content validity
  • Pilot the scale
  • Walk-through with members of target population
    for readability, word choices, reverse-keying
    issues, etc.
  • Assess face validity
  • Establish standards
  • Administration
  • Scoring

19
Scale Construction & Validation Process, cont.
  • Collect data from first sample
  • Evaluate internal reliability
  • Evaluate alternate form reliability (if
    applicable)
  • Evaluate inter-rater reliability (if applicable)
  • Collect data again from the same sample (3-18
    weeks later)
  • Evaluate test-retest reliability
  • Collect data from second sample (including other
    measures)
  • Repeat internal and external reliability analyses
  • Evaluate criterion-related validity (if
    applicable)
  • Evaluate discriminant validity (if applicable)
  • Collect data from third sample (including other
    measures)
  • Repeat validity evaluation(s), called
    cross-validation
  • Combining all data: establish population
    standards
  • Population norms (e.g., mean & std)
  • Cutoff scores (diagnosis, selection, etc.)
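Population norms make an individual's score interpretable by locating it relative to the typical score. This minimal sketch assumes norms are summarized by the mean and standard deviation, as the slide suggests, and that a cutoff is expressed as a z-score. The ten norm-sample scores and the cutoff of z ≥ 1.5 are hypothetical, not values from any real instrument.

```python
import statistics

# Hypothetical scale scores from the combined norm sample (illustration only).
NORM_SAMPLE = [12, 15, 9, 18, 14, 11, 16, 13, 10, 12]
CUTOFF_Z = 1.5  # hypothetical cutoff, e.g., for flagging a score for follow-up

NORM_MEAN = statistics.mean(NORM_SAMPLE)
NORM_SD = statistics.stdev(NORM_SAMPLE)  # sample standard deviation


def z_score(raw: float) -> float:
    """Distance of a raw score from the norm mean, in standard-deviation units."""
    return (raw - NORM_MEAN) / NORM_SD


def meets_cutoff(raw: float) -> bool:
    """True if the raw score is at or above the cutoff."""
    return z_score(raw) >= CUTOFF_Z


for raw in (15, 18):
    print(raw, round(z_score(raw), 2), meets_cutoff(raw))
# 15 0.72 False
# 18 1.79 True
```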