Title: Reliability
1. Reliability and Validity of Instruments
2. The MaxMinCon Principle
- Maximize
- The variance of the variables under study
- You want greater variance in the dependent variable as a result of the independent variable.
- Make treatments as different as possible.
3. The MaxMinCon Principle
- Minimize
- Error or random variance, including errors in measurement.
4. From Week 1
- Typically concerned with three aspects of validity
- Measurement validity exists when a measure measures what we think it measures.
- Generalizability exists when a conclusion holds true for the population, group, setting, or event that we say it does, given the conditions that we specify.
5. Introduction to Validity (continued)
- Internal validity (also called causal validity) exists when a conclusion that A leads to or results in B is correct.
6. Measurement Validity
- How do we know that this measuring instrument measures what we think it measures?
- Deals with accuracy
- There is an inference involved between the indicators we can observe and the construct we aim to measure.
7. Types of Validity
- Content
- Face validity
- Criterion-related
- Concurrent
- Predictive
- Construct
- Convergent
- Discriminant
8. Content Validity
- Focuses on whether the full content of a conceptual definition is represented in the measure
- Essentially, checks the operationalization against the relevant content domain of the construct(s) you are measuring.
9. Content Validity
- Driven by the literature review.
- Did you include all the content that you should have to measure the construct?
- What are the criteria to measure the construct?
10. Content Validation
- Two steps
- Specify the content of a definition.
- Develop indicators which sample from all areas of the content in the definition.
11. Content Validity
- Face validity
- A visual inspection of the test by expert (or non-expert) reviewers.
- Often included as a part (or subset) of content validity.
12. Criterion-Related Validity
- An indicator is compared with another measure of the same construct in which the researcher has confidence.
- Comparing test or scale scores with one or more external variables known or believed to measure the attributes under study.
13. Criterion-Related Validity
- An instrument high in criterion-related validity helps test users make better decisions in terms of placement, classification, selection, and assessment.
14. Criterion-Related Validity
- Two methods
- Concurrent
- Predictive
- The difference is the temporal (time) relationship between the operationalization and the criterion.
15. Concurrent
- The criterion variable exists in the present.
- Compares the operationalization's ability to distinguish between groups that it should theoretically be able to distinguish between.
16. Concurrent
- Example
- A researcher might wish to establish the awareness of students about their performance in school during the past year.
- Ask the student, "What was your grade point average last year?"
- Compare to school records
- Calculate the correlation (see the sketch below)
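A minimal sketch of that correlation step, assuming hypothetical arrays of self-reported and recorded GPAs (the variable names and values are illustrative, not from the slides):

```python
import numpy as np

# Hypothetical data: self-reported GPA vs. GPA taken from school records
self_report = np.array([3.2, 2.8, 3.9, 3.5, 2.1, 3.0, 3.7, 2.5])
records = np.array([3.0, 2.9, 3.8, 3.4, 2.3, 2.8, 3.6, 2.7])

# Pearson product-moment correlation between the two sets of scores;
# a high positive r supports concurrent (criterion-related) validity.
r = np.corrcoef(self_report, records)[0, 1]
print(f"Concurrent validity coefficient: r = {r:.2f}")
```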
17. Predictive
- The criterion variable will not exist until later.
- You assess the operationalization's ability to predict something it should theoretically be able to predict.
18. Predictive
- Example
- A researcher might ask students to anticipate their performance in school during the next year.
- Ask the student, "What do you think your GPA will be next year?"
- Compare to school records after the year is completed
- Calculate the correlation
19. Construct Validity
- Focuses on how well a measure conforms with theoretical expectations
- Established by relating a presumed measure of a construct to some behavior that it is hypothesized to underlie.
20. Construct Validity
- Think of construct validity as the distinction between two broad territories: the land of theory and the land of observation.
21. Construct Validity
22. Construct Validity Methods
- Convergent
- Examine the degree to which the operationalization is similar to (converges on) other operationalizations to which it theoretically should be similar.
23. Construct Validity Methods
- Discriminant
- Examine the degree to which the operationalization is not similar to (diverges from) other operationalizations to which it theoretically should not be similar.
24. Reliability
- Reliability is another important consideration, since researchers want consistent results from instrumentation.
- Consistency gives researchers confidence that the results actually represent the achievement of the individuals involved.
25. Two Main Aspects of Reliability
- Consistency over time (or stability)
- Internal consistency
26. Consistency Over Time
- Usually expressed in this question:
- If the same instrument were given to the same people, under the same circumstances, but at a different time, to what extent would they get the same scores?
- Takes two administrations of the instrument to establish
27. Internal Consistency
- Relates to the concept-indicator idea of measurement
- Since we use multiple items (indicators) to make inferences about a concept, the question concerns the extent to which these items are consistent with each other.
- All working in the same direction
- Only one administration of the instrument is needed
28. Reliability
- Scores obtained can be reliable but not valid; reliability alone does not guarantee validity.
- An instrument should be both reliable and valid (Figure 8.2) for the context in which it is used.
29. Errors of Measurement
- Because errors of measurement are always present to some degree, variation in test scores is common.
- This is due to
- Differences in motivation
- Energy
- Anxiety
- A different testing situation
30. Some Factors Influencing Reliability
- The greater the number of items, the more reliable the test.
- In general, the longer the test administration time, the greater the reliability.
- The narrower the range of difficulty of the items, the greater the reliability.
- The more objective the scoring, the greater the reliability.
31. Some Factors Influencing Reliability
- The greater the probability of success by chance, the lower the reliability.
- Inaccuracy in scoring leads to unreliability.
- The more homogeneous the material, the greater the reliability.
32. Some Factors Influencing Reliability
- The more common the experiences of the individuals tested, the greater the reliability.
- Catch/trick questions lower reliability.
- Subtle factors leading to misinterpretation of the test item lead to unreliability.
33. Reliability Coefficient
- Expresses the relationship between scores on the same instrument at two different times, or between parts of the instrument.
- The three best-known methods are
- Test-retest
- Equivalent-forms method
- Internal-consistency method
34. Test-Retest Method
- Involves administering the same test twice to the same group after a certain time interval has elapsed.
- A reliability coefficient is calculated to indicate the relationship between the two sets of scores.
- A coefficient of stability (a Pearson product-moment correlation; see the sketch below)
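A minimal sketch of the coefficient of stability, assuming hypothetical scores from two administrations of the same test to the same people (the names, values, and the three-week interval are illustrative):

```python
import numpy as np

# Hypothetical scores for the same eight people at time 1 and, three weeks later, time 2
time1 = np.array([52, 47, 60, 38, 55, 49, 44, 58])
time2 = np.array([50, 49, 58, 40, 54, 47, 45, 57])

# Coefficient of stability: a Pearson product-moment correlation
# between the two administrations of the same instrument.
r_stability = np.corrcoef(time1, time2)[0, 1]
print(f"Coefficient of stability (3-week interval): r = {r_stability:.2f}")
```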
35. Test-Retest Method
- Reliability coefficients are affected by the lapse of time between the administrations of the test.
- An appropriate time interval should be selected.
- Greater than zero but less than six months
36. Test-Retest Method
- When reporting a coefficient of stability
- Always report the time interval between administrations (as a subscript to the r).
- Report any significant experiences that may have intervened between the measurements.
37. Test-Retest Method
- When reporting a coefficient of stability
- Describe the conditions of each measurement to account for measurement error due to poor lighting, loud noises, and the like.
38. Equivalent-Forms Method
- Two different but equivalent (alternate or parallel) forms of an instrument are administered to the same group during the same time period.
- Also called the parallel-forms reliability procedure.
39. Equivalent-Forms Method
- A reliability coefficient is then calculated between the two sets of scores.
- A coefficient of equivalence (again, a Pearson r correlation coefficient)
40. Equivalent-Forms Method
- Parallel forms should
- Contain the same number of items
- Contain items of equal difficulty
- Have means, variances, and correlations with other variables that are not significantly different from each other.
41. Equivalent-Forms Method
- Parallel-form tests are often needed for studies with pretests and posttests
- Where it is important that they are not the same test but measure the same things.
42. Equivalent-Forms Method
- It is possible to combine the test-retest and equivalent-forms methods by giving two different forms of the test with a time interval between the two administrations.
- This produces a coefficient of stability and equivalence.
43. Internal-Consistency Methods
- There are several internal-consistency methods that require only one administration of an instrument.
- In essence, the computer splits the instrument into two halves and correlates the scores from each half.
- Because each half contains only half the items, the estimate for a short instrument will be conservative.
44. Internal-Consistency Methods
- Split-half procedure
- Involves scoring two halves of a test separately for each subject and calculating the correlation coefficient between the two scores.
- Split systematically (odd-even) or randomly
- A Spearman-Brown correction formula is then used (see the sketch below).
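A minimal sketch of an odd-even split-half estimate with the Spearman-Brown correction, assuming a hypothetical item-response matrix (rows are respondents, columns are items; the names and data are illustrative):

```python
import numpy as np

def split_half_reliability(items: np.ndarray) -> float:
    """Odd-even split-half reliability with the Spearman-Brown correction.

    items: 2-D array of item scores, shape (n_respondents, n_items).
    """
    odd_half = items[:, 0::2].sum(axis=1)    # total score on odd-numbered items
    even_half = items[:, 1::2].sum(axis=1)   # total score on even-numbered items
    r_half = np.corrcoef(odd_half, even_half)[0, 1]
    # Spearman-Brown prophecy formula: corrects for the halved test length.
    return 2 * r_half / (1 + r_half)

# Hypothetical responses from five students to a 6-item test scored 0/1
responses = np.array([
    [1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1, 1],
])
print(f"Split-half reliability: {split_half_reliability(responses):.2f}")
```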
45. Internal-Consistency Methods
- Other measures do not require the researcher to split the instrument in half
- They assess the homogeneity of the items.
46. Internal-Consistency Methods
- Two sources of random error are reflected in reliability measures based on the homogeneity of the items
- Content sampling (as in split-half)
- Heterogeneity of the behavior domain sampled
- The more homogeneous the domain, the less error and thus the higher the reliability
47. Internal-Consistency Methods
- Kuder-Richardson approaches (KR20 and KR21)
- KR21 requires only three pieces of information
- The number of items on the test
- The mean
- The standard deviation
48. Internal-Consistency Methods
- Kuder-Richardson approaches (KR20 and KR21)
- Both are used with dichotomously scored (right/wrong) items.
- KR21 additionally assumes that all items are of equal difficulty.
- If they are not, the KR21 estimate will be lower (see the sketch below).
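A minimal sketch of both formulas, assuming a hypothetical 0/1 item-response matrix (names and data are illustrative):

```python
import numpy as np

def kr20(items: np.ndarray) -> float:
    """KR20 for dichotomous (0/1) items; does not assume equal item difficulty."""
    k = items.shape[1]
    p = items.mean(axis=0)                # proportion answering each item correctly
    total_var = items.sum(axis=1).var()   # variance of total scores
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)

def kr21(items: np.ndarray) -> float:
    """KR21: needs only the number of items, the mean, and the variance of
    total scores, but assumes all items are of equal difficulty."""
    k = items.shape[1]
    totals = items.sum(axis=1)
    m, var = totals.mean(), totals.var()
    return (k / (k - 1)) * (1 - m * (k - m) / (k * var))

responses = np.array([    # hypothetical answers from five students to a 6-item test
    [1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1, 1],
])
# Because these items are not equally difficult, KR21 comes out lower than KR20.
print(f"KR20 = {kr20(responses):.2f}, KR21 = {kr21(responses):.2f}")
```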
49. Internal-Consistency Methods
- Alpha coefficient (Cronbach's alpha)
- A general form of the KR20 used to calculate the reliability of items that are not scored right vs. wrong (see the sketch below).
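A minimal sketch of the alpha computation, assuming a hypothetical matrix of Likert-type item scores (rows are respondents, columns are items; names and data are illustrative):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scale scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 1-5 Likert responses to a 4-item scale
scores = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [1, 2, 2, 1],
])
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")
```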
50. Internal-Consistency Methods
- KR21 and Cronbach's alpha are the most widely used because they require only one administration.
- The choice may depend on which statistical package you use (SPSS has Cronbach's alpha).
51. Running Our Reliability Analysis
- Design items that appear to measure the same domain.
- Collect data using a pilot test.
- Run the Cronbach's alpha procedure.
52. Running Our Reliability Analysis
- View the correlation matrix
- No 0s or 1s?
- No negatives?
- Eliminate items producing negatives, ones, and zeros
- View "alpha if item deleted"
- If alpha can be raised by deleting an item, decide whether the reduction in the number of items will hurt content validity.
- Rerun alpha and repeat the procedure (see the sketch below)
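A minimal sketch of the "alpha if item deleted" check, reusing the same alpha formula as the sketch after slide 49 (the data and names are illustrative):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    k = items.shape[1]
    return (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum()
                            / items.sum(axis=1).var(ddof=1))

def alpha_if_item_deleted(items: np.ndarray):
    """Alpha recomputed with each item removed in turn; one value per item."""
    return [cronbach_alpha(np.delete(items, i, axis=1))
            for i in range(items.shape[1])]

scores = np.array([    # hypothetical 1-5 Likert responses to a 4-item scale
    [4, 5, 4, 2],
    [2, 3, 2, 4],
    [5, 5, 4, 1],
    [3, 2, 3, 5],
    [1, 2, 2, 3],
])
overall = cronbach_alpha(scores)
print(f"Overall alpha = {overall:.2f}")
for i, a in enumerate(alpha_if_item_deleted(scores), start=1):
    flag = "  <- deleting raises alpha; weigh against content validity" if a > overall else ""
    print(f"Item {i}: alpha if deleted = {a:.2f}{flag}")
```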
53. How High Must the Reliability of a Measurement Be?
- There is no absolute answer to this question.
- It may depend on the competition: are there stronger instruments available?
- In the early stages of developing a test for a construct, a reliability of .50 or .60 may suffice.
- Usually the higher the better (at least .70).
54. How High Must the Reliability of a Measurement Be?
- Remember
- A reliability coefficient of .90 says that 90% of the variance in scores is due to true variance in the characteristic measured, leaving only 10% due to error.
55. Standard Error of Measurement
- An index that shows the extent to which a measurement would vary under changed circumstances.
- There are many possible standard errors for the scores given.
- Also known as measurement error: a range of scores that shows the amount of error that can be expected (Appendix D; see the sketch below).
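A minimal sketch of the usual standard error of measurement formula, SEM = SD × sqrt(1 − reliability), assuming a hypothetical score standard deviation and reliability coefficient (the values are illustrative):

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r): expected spread of observed scores around the true score."""
    return sd * math.sqrt(1 - reliability)

sd, r = 10.0, 0.90   # hypothetical score SD and reliability coefficient
sem = standard_error_of_measurement(sd, r)
print(f"SEM = {sem:.2f}")
# Roughly 68% of observed scores would fall within +/- 1 SEM of the true score.
print(f"Band around an observed score of 75: {75 - sem:.1f} to {75 + sem:.1f}")
```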
56. Scoring Agreement
- Scoring agreement requires a demonstration that independent scorers can achieve satisfactory agreement in their scoring.
- Instruments that use direct observations are highly vulnerable to observer differences (e.g., qualitative research methods).
57. Scoring Agreement
- What is desired is a correlation of at least .90 among scorers as an acceptable level of agreement (see the sketch below).
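A minimal sketch of that check, assuming hypothetical scores assigned by three independent scorers to the same set of responses (the scorer names and values are illustrative); it reports each pairwise Pearson correlation against the .90 target:

```python
from itertools import combinations
import numpy as np

# Hypothetical scores given by three independent scorers to the same eight essays
scores = {
    "scorer_a": np.array([18, 14, 20, 11, 16, 19, 13, 15]),
    "scorer_b": np.array([17, 15, 20, 12, 15, 18, 13, 16]),
    "scorer_c": np.array([19, 13, 19, 10, 17, 20, 12, 14]),
}

# Pairwise Pearson correlations; each pair should reach at least .90
for (name1, s1), (name2, s2) in combinations(scores.items(), 2):
    r = np.corrcoef(s1, s2)[0, 1]
    verdict = "OK" if r >= 0.90 else "below target"
    print(f"{name1} vs {name2}: r = {r:.2f} ({verdict})")
```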