1
Reliability & Validity of Instruments
  • Lesson 8

2
The MaxMinCon Principle
  • Maximize
  • The variance of the variables under study
  • You want greater variance in the dependent variable
    as a result of the independent variable.
  • Make treatments as different as possible.

3
The MaxMinCon Principle
  • Minimize
  • Error or random variance including errors in
    measurement.

4
From Week 1
  • Typically concerned with three aspects of
    validity
  • Measurement validity exists when a measure
    measures what we think it measures.
  • Generalizability exists when a conclusion holds
    true for the population, group, setting, or event
    that we say it does, given the conclusions that
    we specify.

5
Introduction to Validity
  • (continued)
  • Internal validity (also called causal validity)
    exists when a conclusion that A leads to or
    results in B is correct.

6
Measurement Validity
  • How do we know that this measuring instrument
    measures what we think it measures?
  • Deals with accuracy
  • There is an inference involved
  • Between the indicators we can observe and the
    construct we aim to measure.

7
Types of Validity
  • Content
  • Face validity
  • Criterion-related
  • Concurrent
  • Predictive
  • Construct
  • Convergent
  • Discriminant

8
Content Validity
  • Focuses on whether the full content of a
    conceptual definition is represented in the
    measure
  • Essentially, checks the operationalization
    against the relevant content domain of the
    construct(s) you are measuring.

9
Content Validity
  • Driven by the literature review.
  • Did you include all the content that you should
    have to measure the construct?
  • What are the criteria to measure the construct?

10
Content Validation
  • Two steps
  • Specify the content of a definition.
  • Develop indicators which sample from all areas of
    the content in the definition.

11
Content Validity
  • Face validity
  • A visual inspection of the test by expert (or
    non-expert) reviewers.
  • Often included as a part (or subset) of content
    validity.

12
Criterion-Related Validity
  • An indicator is compared with another measure of
    the same construct in which the researcher has
    confidence.
  • Comparing test or scale scores with one or more
    external variables known or believed to measure
    the attributes under study.

13
Criterion-Related Validity
  • An instrument high in criterion-related validity
    helps test users make better decisions in terms
    of placement, classification, selection, and
    assessment.

14
Criterion-Related Validity
  • Two methods
  • Concurrent
  • Predictive
  • The difference is the temporal (time) relationship
    between the operationalization and the criterion.

15
Concurrent
  • The criterion variable exists in the present
  • Compares the operationalization's ability to
    distinguish between groups that it should
    theoretically be able to distinguish between.

16
Concurrent
  • Example
  • A researcher might wish to establish the
    awareness of students about their performance in
    school during the past year.
  • Ask the student, "What was your grade point average
    last year?"
  • Compare to school records
  • Calculate the correlation (see the sketch below)
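As a hedged illustration (not part of the original slides), the correlation step in this example could be computed as follows; the GPA values are hypothetical.

```python
# Minimal sketch of the concurrent-validity check described above: correlate
# self-reported GPA with the GPA taken from school records.
# The values below are hypothetical.
import numpy as np
from scipy.stats import pearsonr

self_reported_gpa = np.array([3.2, 2.8, 3.9, 3.5, 2.4, 3.0, 3.7, 2.9])
school_record_gpa = np.array([3.1, 2.9, 3.8, 3.4, 2.6, 2.8, 3.7, 3.0])

r, p_value = pearsonr(self_reported_gpa, school_record_gpa)
print(f"concurrent validity coefficient r = {r:.2f} (p = {p_value:.3f})")
```

The same calculation applies to the predictive example on the next slide, except that the criterion (the recorded GPA) is collected only after the year is completed.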

17
Predictive
  • The criterion variable will not exist until later.
  • You assess the operationalization's ability to
    predict something it should theoretically be able
    to predict.

18
Predictive
  • Example
  • A researcher might wish students to anticipate
    their performance in school during the next year.
  • Ask the student, "What do you think your GPA will be
    next year?"
  • Compare to school records after the year is completed
  • Calculate the correlation

19
Construct Validity
  • Focuses on how well a measure conforms with
    theoretical expectations
  • Established by relating a presumed measure of a
    construct to some behavior that it is
    hypothesized to underlie.

20
Construct Validity
  • Think of construct validity as the distinction
    between two broad territories: the land of theory
    and the land of observation.

21
Construct Validity
22
Construct Validity Methods
  • Convergent
  • Examine the degree to which the
    operationalization is similar to (converges on)
    other operationalizations to which it
    theoretically should be similar.

23
Construct Validity Methods
  • Discriminant
  • Examine the degree to which the
    operationalization is not similar to (diverges
    from) other operationalizations that it
    theoretically should not be similar to.

24
Reliability
  • Reliability is another important consideration,
    since researchers want consistent results from
    instrumentation
  • Consistency gives researchers confidence that the
    results actually represent the achievement of the
    individuals involved.

25
Two Main Aspects to Reliability
  • Consistency over time (or stability)
  • Internal consistency

26
Consistency Over Time
  • Usually expressed by this question:
  • If the same instrument were given to the same
    people, under the same circumstances, but at a
    different time, to what extent would they get the
    same scores?
  • Takes two administrations of the instrument to
    establish

27
Internal Consistency
  • Relates to the concept-indicator idea of
    measurement
  • Since we will use multiple items (indicators) to
    infer a concept, the question concerns the extent
    to which these items are consistent with each
    other.
  • All working in the same direction
  • Requires only one administration of the instrument

28
Reliability
  • Scores obtained can be reliable but not valid.
  • An instrument should be both reliable and valid
    (Figure 8.2) for the context in which it is used.

29
Errors of Measurement
  • Because errors of measurement are always present
    to some degree, variation in test scores is
    common.
  • This is due to
  • Differences in motivation
  • Energy
  • Anxiety
  • Different testing situations

30
Some Factors Influencing Reliability
  • The greater the number of items, the more
    reliable the test.
  • In general, the longer the test administration
    time, the greater the reliability.
  • The narrower the range of difficulty of items,
    the greater the reliability.
  • The more objective the scoring, the greater the
    reliability.

31
Some Factors Influencing Reliability
  • The greater the probability of success by chance,
    the lower the reliability.
  • Inaccuracy in scoring leads to unreliability.
  • The more homogeneous the material, the greater
    the reliability.

32
Some Factors Influencing Reliability
  • The more common the experiences of the
    individuals tested, the greater the reliability
  • Catch/trick questions lower reliability
  • Subtle factors leading to misinterpretation of
    the test item lead to unreliability.

33
Reliability Coefficient
  • Expresses the relationship between scores on the
    same instrument at two different times, or between
    parts of the instrument.
  • The three best-known methods are
  • Test-retest
  • Equivalent forms method
  • Internal consistency method

34
Test-Retest Method
  • Involves administering the same test twice to the
    same group after a certain time interval has
    elapsed.
  • A reliability coefficient is calculated to
    indicate the relationship between the two sets of
    scores.
  • This yields a coefficient of stability (a Pearson
    product-moment correlation), as sketched below.
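As a minimal sketch (not part of the slides), the coefficient of stability could be computed like this; the scores are hypothetical results from two administrations of the same instrument to the same group.

```python
# Minimal sketch of a test-retest coefficient of stability: correlate the
# scores from the first and second administrations of the same instrument.
# The scores below are hypothetical.
import numpy as np

scores_time1 = np.array([34, 28, 41, 25, 38, 30, 45, 33])
scores_time2 = np.array([36, 27, 40, 28, 37, 29, 44, 35])

stability = np.corrcoef(scores_time1, scores_time2)[0, 1]
print(f"coefficient of stability r = {stability:.2f}")
```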

35
Test-Retest Method
  • Reliability coefficients are affected by the
    lapse of time between the administrations of the
    test.
  • An appropriate time interval should be selected.
  • Greater than zero but less than 6 months

36
Test-Retest Method
  • When reporting a coefficient of stability
  • Always report the time interval between
    administrations (as a subscript to the r)
  • Report any significant experiences that may have
    intervened in the measurements.

37
Test-Retest Method
  • When reporting a coefficient of stability
  • Describe the conditions of each measurement to
    account for measurement error due to poor
    lighting, loud noises and the like.

38
Equivalent-Forms Method
  • Two different but equivalent (alternate or
    parallel) forms of an instrument are administered
    to the same group during the same time period.
  • Also called the parallel-forms reliability procedure.

39
Equivalent-Forms Method
  • A reliability coefficient is then calculated
    between the two sets of scores.
  • A coefficient of equivalence (again, a Pearson
    r correlation coefficient)

40
Equivalent-Forms Method
  • Parallel forms should
  • Contain the same number of items
  • Contain items of equal difficulty
  • Have means, variances, and intercorrelations with
    other variables that are not significantly
    different from each other (one way to screen for
    this is sketched below).
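One way to screen whether two candidate forms meet these requirements is sketched below; this is only an illustration under stated assumptions, not a procedure from the slides. A paired t-test compares the means and Levene's test compares the variances; the form scores are hypothetical.

```python
# Minimal sketch of checking that two candidate parallel forms do not differ
# significantly in mean or variance. The scores are hypothetical; the same
# group is assumed to have taken both forms.
import numpy as np
from scipy.stats import ttest_rel, levene

form_a = np.array([34, 28, 41, 25, 38, 30, 45, 33])
form_b = np.array([36, 27, 39, 27, 37, 31, 44, 34])

t_stat, p_mean = ttest_rel(form_a, form_b)   # paired t-test on the means
w_stat, p_var = levene(form_a, form_b)       # Levene's test on the variances
print(f"mean difference p = {p_mean:.3f}, variance difference p = {p_var:.3f}")
```

Non-significant results (conventionally p greater than .05) are what you hope to see here.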

41
Equivalent-Forms Method
  • Parallel-form tests are often needed for studies
    with pretests and posttests
  • Where it is important that the tests are not
    identical but measure the same things.

42
Equivalent-Forms Method
  • It is possible to combine the test-retest and
    equivalent-forms methods by giving two different
    forms of the test with a time interval between the
    two administrations.
  • This produces a coefficient of stability and
    equivalence.

43
Internal-Consistency Methods
  • There are several internal-consistency methods
    that require only one administration of an
    instrument.
  • In essence, the instrument is split into two halves
    and the scores from each half are correlated.
  • Because splitting halves the effective length, a
    short instrument's reliability will be estimated
    conservatively.

44
Internal-Consistency Methods
  • Split-half Procedure
  • Involves scoring two halves of a test separately
    for each subject and calculating the correlation
    coefficient between the two scores.
  • Split systematically (odd-even) or randomly
  • A Spearman-Brown correction formula is then used to
    adjust for the shortened length (see the sketch
    below).
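A minimal sketch of the split-half procedure, assuming dichotomously scored (0/1) items; the item matrix below is hypothetical.

```python
# Minimal sketch of the split-half procedure: split the items odd-even,
# correlate the two half scores, then apply the Spearman-Brown correction
# for the shortened length. Rows = subjects, columns = items (hypothetical).
import numpy as np

items = np.array([
    [1, 0, 1, 1, 0, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 0, 1, 0],
])

odd_half = items[:, 0::2].sum(axis=1)    # items 1, 3, 5, ...
even_half = items[:, 1::2].sum(axis=1)   # items 2, 4, 6, ...

r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = 2 * r_half / (1 + r_half)       # Spearman-Brown correction
print(f"half-test r = {r_half:.2f}, corrected reliability = {r_full:.2f}")
```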

45
Internal-Consistency Methods
  • Other measures do not require the researcher to
    split the instrument in half
  • Assess the homogeneity of the items.

46
Internal-Consistency Methods
  • Two sources of random error are reflected in the
    measures of reliability using homogeneity of the
    items.
  • Content sampling (as in split-half)
  • Heterogeneity of the behavior domain sampled
  • The more homogeneous the domain is, the less the
    error and thus the higher the reliability.

47
Internal-Consistency Methods
  • Kuder-Richardson Approaches (KR20 and KR21)
  • KR21 requires only 3 pieces of information
  • Number of items on the test
  • The mean
  • The standard deviation

48
Internal-Consistency Methods
  • Kuder-Richardson Approaches (KR20 and KR21)
  • Both are used with dichotomously scored (right/wrong)
    items
  • KR21 additionally assumes that all items have equal
    difficulty
  • If they do not, KR21 estimates will be lower (both
    formulas are sketched below)
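A minimal sketch of both Kuder-Richardson formulas, assuming dichotomously scored items; the item matrix is hypothetical (rows = subjects, columns = items).

```python
# Minimal sketch of KR20 and KR21 on hypothetical dichotomous (0/1) item data.
import numpy as np

items = np.array([
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
])

k = items.shape[1]                  # number of items on the test
totals = items.sum(axis=1)          # total score for each subject
mean_total = totals.mean()          # the mean
var_total = totals.var(ddof=1)      # square of the standard deviation

p = items.mean(axis=0)              # proportion passing each item (KR20 only)
kr20 = (k / (k - 1)) * (1 - (p * (1 - p)).sum() / var_total)
kr21 = (k / (k - 1)) * (1 - mean_total * (k - mean_total) / (k * var_total))
print(f"KR20 = {kr20:.2f}, KR21 = {kr21:.2f}")
```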

49
Internal-Consistency Methods
  • Alpha Coefficient (Cronbach's Alpha)
  • A general form of the KR20 used to calculate the
    reliability of items that are not scored right
    vs. wrong (sketched below).
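A minimal sketch of Cronbach's alpha for items that are not scored right/wrong (for example, 1-5 ratings); the data are hypothetical.

```python
# Minimal sketch of Cronbach's alpha: rows = subjects, columns = items
# scored on a rating scale rather than right/wrong (hypothetical data).
import numpy as np

items = np.array([
    [4, 5, 4, 3],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [3, 4, 3, 3],
    [1, 2, 2, 1],
])

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)       # variance of each item
total_var = items.sum(axis=1).var(ddof=1)   # variance of the total scores

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```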

50
Internal-Consistency Methods
  • KR21 and Cronbach's alpha are the most widely used
    because they require only one administration.
  • The choice may depend on which statistical package
    you use (SPSS provides Cronbach's alpha).

51
Running Our Reliability
  • Design items that appear to measure the same
    domain.
  • Collect data using a pilot test.
  • Run the Cronbach's alpha procedure.

52
Running Our Reliability
  • View the correlation matrix
  • No 0s or 1s?
  • No negatives?
  • Eliminate items that produce negatives, ones, and zeros
  • View "alpha if item deleted"
  • If alpha can be raised by deletion, decide whether
    the reduction in the number of items will hurt
    content validity.
  • Rerun alpha and repeat the procedure (see the sketch
    below)
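A minimal sketch of the "alpha if item deleted" step in the procedure above; the data are hypothetical, and the last item is deliberately inconsistent with the others so that deleting it raises alpha.

```python
# Minimal sketch of the "alpha if item deleted" check: recompute alpha with
# each item removed in turn. The data are hypothetical; the fifth item is
# deliberately inconsistent with the others.
import numpy as np

def cronbach_alpha(items):
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

items = np.array([
    [4, 5, 4, 3, 2],
    [2, 2, 3, 2, 5],
    [5, 5, 4, 5, 1],
    [3, 4, 3, 3, 4],
    [1, 2, 2, 1, 5],
])

print(f"alpha with all items: {cronbach_alpha(items):.2f}")
for i in range(items.shape[1]):
    reduced = np.delete(items, i, axis=1)   # drop item i
    print(f"alpha if item {i + 1} deleted: {cronbach_alpha(reduced):.2f}")
```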

53
How high must the reliability of a measurement be?
  • There is no absolute answer to this question.
  • May depend on competition -- Are there stronger
    instruments available?
  • In the early stages of developing a test for a
    construct, a reliability of .50 or .60 may
    suffice.
  • Usually the higher the better (at least .70)

54
How high must the reliability of a measurement be?
  • Remember
  • A reliability coefficient of .90 says that 90% of
    the variance in scores is due to true variance in
    the characteristic measured, leaving only 10% due
    to error.

55
Standard Error of Measurement
  • An index that shows the extent to which a
    measurement would vary under changed
    circumstances.
  • There are many possible standard errors for a given
    set of scores.
  • Also known as measurement error, it is a range of
    scores that shows the amount of error which can be
    expected (Appendix D), as sketched below.
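As a hedged illustration (not from the slides), a commonly used formula for the standard error of measurement is SEM = SD * sqrt(1 - reliability); the values below are hypothetical.

```python
# Minimal sketch of the standard error of measurement using the common
# formula SEM = SD * sqrt(1 - reliability); the values are hypothetical.
import math

sd_of_scores = 12.0    # standard deviation of the obtained scores
reliability = 0.90     # reliability coefficient of the instrument

sem = sd_of_scores * math.sqrt(1 - reliability)
print(f"SEM = {sem:.2f}")   # roughly a +/- band of expected error around a score
```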

56
Scoring Agreement
  • Scoring agreement requires a demonstration that
    independent scorers can achieve satisfactory
    agreement in their scoring.
  • Instruments that use direct observations are
    highly vulnerable to observer differences (e.g.
    qualitative research methods).

57
Scoring Agreement
  • A correlation of at least .90 among scorers is
    desired as an acceptable level of agreement (see the
    sketch below).
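A minimal sketch of checking agreement between two independent scorers; the ratings are hypothetical.

```python
# Minimal sketch of scoring agreement: correlate the scores that two
# independent scorers assigned to the same set of responses (hypothetical).
import numpy as np

scorer_a = np.array([78, 85, 62, 90, 71, 88, 66, 95])
scorer_b = np.array([80, 83, 60, 92, 70, 90, 68, 94])

agreement = np.corrcoef(scorer_a, scorer_b)[0, 1]
print(f"inter-scorer correlation = {agreement:.2f}")   # at least .90 is desired
```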