Title: Development of a Confidence Interval for Small Sample Expert Review of Item Content Validation
1Development of a Confidence Interval for Small
Sample Expert Review of Item Content Validation
- Jeffrey M. Miller Randall D. Penfield
- FERA November 19, 2003
- University of Florida
- millerjm_at_ufl.edu penfield_at_coe.ufl.edu
2INTRODUCING CONTENT VALIDITY
- Validity refers to the degree to which evidence
and theory support the interpretations of test
scores entailed by proposed uses of tests
(AERA/APA/NCME, 1999) - Content validity refers to the degree to which
the content of the items reflects the content
domain of interest (APA, 1954)
3THE NEED FOR IMPROVED REPORTING
Content is a precursor to drawing a score-based
inference. It is evidence-in-waiting (Shepard,
1993 Yalow Popham, 1983)
Unfortunately, in many technical manuals,
content representation is dealt with in a
paragraph, indicating that selected panels of
subject matter experts (SMEs) reviewed the test
content, or mapped the items to the content
standards and all is well (Crocker, 2003)
4QUANTIFYING CONTENT VALIDITY
- Several indices for quantifying expert agreement
have been proposed - For many, experts quantify the match of the item
to an objective using a rating scale - The mean rating across raters is often used in
calculations - Klein Kosecoffs Correlation (1975)
- Aikens V (1985)
- The mean, by itself, does not account for error
and does not tell us how far it lies from the
population mean. WE NEED A CONFIDENCE INTERVAL!
5THE CONFIDENCE INTERVAL
- The traditional confidence interval assumes a
normal distribution for the sample mean of a
rating scale. - However, the assumption of population normality
can not be justified when analyzing the mean of
an individual scale item because - 1.) the outcomes of the items are discrete
- 2.) the items are bounded by the limits of the
Likert-scale. - 3.) sample size for raters is too small even if
the above were not problematic
6SCORE CONFIDENCE INTERVAL FOR RATING SCALES
- The Score confidence interval (Penfield, 2003)
treats rating scale variables as outcomes of a
binomial distribution. - This interval is asymmetric
- Hence, it is based on the actual distribution for
the item of concern. - Further, the limits cannot extend below or above
the actual limits of the categories.
7- 1. Obtain values for n, k, and z
- n the number of raters
- k the number of possible ratings
- The highest rating is scale starts with 0
- The highest rating minus 1 if scale starts
greater than 0 - z the standard normal variate associated with
the confidence level (e.g., /- 1.96 at 95
confidence)
8- 2. CalculateThe sum of the ratings for an
item divided by the number of raters
9- 3. Calculate p
- Or if scale begins with 1 then
10- 4. Use p to calculate the upper and lower limits
for the population proportion (Wilson, 1927)
11- 5. Calculate the upper and lower limits of the
Score confidence interval
12- Shorthand Example (cont.)
- Let n 10, k 4, z 1.96, and let the sum of
the items 31 -
-
so, the mean equals 31/10 3.100 -
-
so, p 31 / (104) 0.775 -
13- Shorthand Example (cont.)
-
-
-
- 3.100 1.96sqrt(0.938/10) 2.500
- 3.100 1.96sqrt(0.421/10) 3.507
-
14-
-
-
- (65.842 11.042) / 87.683 0.625
- (65.842 11.042) / 87.683 0.877
-
15We are 95 confident that the population mean
rating falls somewhere between 2.500 and 3.507
16