Title: Dr. Mohammad H. Omar Department of Mathematical Sciences
1. Some Statistics for Equating Multiple Forms of a Test
- by
- Dr. Mohammad H. Omar, Department of Mathematical Sciences
- May 16, 2006
- Presented at Statistic Research (STAR) colloquium, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia.
2. Equating
3. Brief overview of Talk
- Test administration using
- Only one form
- More than one form
- Test Equity
- Steps to ensuring equity
- Conditions for Equated Score
- Data Collection Designs
- Equating procedures
- Illustration of the Equipercentile Equating process
- Use of smoothing techniques
- Application of equipercentile equating to data collection design
- Standard errors of equipercentile equating
- Linear equating
- Illustration of the Linear Equating process
- Application of linear equating to data collection design
- Standard errors of linear equating
- Comparison of equating methods
4. Test Administration using only one Form
Advantage
(1) Score means the same thing for every student, if cheating doesn't occur.
Disadvantage
(1) Dishonest students can copy answers from neighbouring students.
(2) Scores of dishonest students can be unreliably high.
(3) Honest students are disadvantaged by acts of dishonest students.
5. Test Administration using more than one form
Advantage
(1) Substantially reduces the chance for dishonesty (cheating).
(2) Honest students are not disadvantaged by acts of dishonest students.
(3) Scores of dishonest students are reliably low if cheating occurs.
Disadvantage
(1) Some equity issues arise if test equating is not carried out.
6. Test Equity
- Definition (layman's definition)
- Equity
- "It is a matter of indifference which test form a student took"
7. Steps to ensuring Equity
- Building test forms to the same test content specifications
- Test forms should be interchangeable.
- No one form should have different content specifications than others.
- Test length should be the same.
- No one form should be longer than another.
- Students should not be disadvantaged by taking a longer test form than their peers.
Interchangeable Content?
                  Form X   Form Y
Differentiation   80       20
Integration       20       80
Same length?
Time                          Form X        Form Y
Required to finish            2 hr          1 hr
Allotted for Administration   1 hr 30 min   1 hr 30 min
8. Steps to ensuring Equity (continued)
- Building test forms to the same test parameter specifications
- Test forms should be equally difficult.
- Students should not be disadvantaged by taking test forms that are very difficult compared to what their peers take in the same administration.
- Test forms should be equally reliable.
Same Difficulty?
                                        Form X   Form Y
Percent of students below median of X   50       70
Same consistency?
                    Form X   Form Y
Coefficient alpha   0.70     0.90
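As a rough illustration of the consistency check above, coefficient alpha can be computed from an examinee-by-item score matrix. This is a minimal sketch assuming the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the function name and the response data are hypothetical and are not taken from the talk.

```python
import numpy as np

def coefficient_alpha(item_scores):
    """Coefficient alpha for an examinee-by-item score matrix (rows = examinees)."""
    items = np.asarray(item_scores, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # sample variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Hypothetical 0/1 item responses: 5 examinees, 3 items.
responses = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 1, 0], [1, 1, 1]]
alpha = coefficient_alpha(responses)
print(round(alpha, 3))
```

Computing alpha separately for Form X and Form Y gives the two values compared in the table.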
9. Conditions For Equated Scores
- The purpose of equating is to establish, as nearly as possible, an effective equivalence between raw scores on two test forms.
- Because equating is an empirical procedure, it requires a design for data collection and a rule for transforming scores on one test form to scores on another.
- Many practitioners would agree with Lord (1980) that scores on test X and test Y are equated if the following four conditions are met:
- Same Ability: the two tests must both be measures of the same characteristic (latent trait, ability or skill).
- Equity: for every group of examinees of identical ability, the conditional frequency distribution of scores on test Y, after transformation, is the same as the conditional frequency distribution of scores on test X.
- Population Invariance: the transformation is the same regardless of the group from which it is derived.
- Symmetry: the transformation is invertible, that is, the mapping of scores from form X to form Y is the inverse of the mapping of scores from form Y to form X.
10. Conditions For Equated Scores (continued)
- The equity condition is unlikely to be precisely satisfied in practice.
- Although it might be possible to build two forms of a test that measured the same characteristic and were equally reliable overall, it is highly unlikely that one could ever build two forms that were equally reliable at every ability level, let alone two forms that produce the same conditional frequency distributions.
11. Data Collection Designs
12.
13. Equating Data Collection Designs
- No statistical procedure can provide completely appropriate adjustments when non-equivalent or naturally occurring groups are used,
- but
- adjustments based on another test that is as close as possible to the tests to be equated are much more satisfactory than those based on nonparallel tests.
14. Equating Procedures
- Can regression be used to equate scores?
- No, because the regression Y = a + bX does not give us the same conversion function as the regression X = c + mY.
- To ensure equity, the conversion functions need to be the same (each must be the inverse of the other).
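The asymmetry of regression can be checked numerically. The sketch below, using hypothetical paired scores, fits both regressions and shows that the product of the two slopes equals the squared correlation rather than 1, so neither line is the inverse of the other unless the correlation is perfect. The function name and data are assumptions made for illustration.

```python
import numpy as np

def regression_slopes(x, y):
    """Return (b, m): slope of Y regressed on X and slope of X regressed on Y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov_xy = np.cov(x, y, ddof=1)[0, 1]
    b = cov_xy / np.var(x, ddof=1)   # Y = a + bX
    m = cov_xy / np.var(y, ddof=1)   # X = c + mY
    return b, m

# Hypothetical paired scores on two forms.
x = [10, 12, 15, 18, 20, 25]
y = [11, 15, 14, 20, 19, 27]

b, m = regression_slopes(x, y)
r = np.corrcoef(x, y)[0, 1]

# If the two regressions were inverses, b * m would equal 1.
print("b * m =", b * m)
print("r^2   =", r ** 2)
```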
15. Equating Procedures
- Pre-Equating
- Equating done on sections of a test, not the final test booklets
- Scores on these sections are not counted toward the student's score
- Post-Equating
- Equating done on final test booklets, not sections of a test
- Equipercentile Equating
- Equates percentiles of two score distributions for two test forms
- Linear Equating
- Equates means and standard deviations of two score distributions for two test forms
16. Illustration of the Equipercentile Equating Process
- Equipercentile equating can be thought of as a two-stage process (Kolen, 1984).
- First,
- the relative cumulative frequency (i.e. percentage of cases below a score interval) distributions are tabulated or plotted for the two forms to be equated.
- Second,
- equated scores (i.e. scores with identical relative cumulative frequencies) on the two forms are obtained from these cumulative frequency distributions.
17. Illustration of the Equipercentile Equating Process (continued)
- A graphical method for equipercentile equating is illustrated in Figure 6.4.
- First,
- the relative cumulative frequency distributions, each based on 471 examinees, for two forms (designated X and Y) of a 60-item number-right-scored test were plotted.
- The crosses (and stars) represent the relative cumulative frequency (i.e., percent below) at the lower real limit of each integer score interval (e.g., at i - 0.5, for i = 1, 2, ..., n, where n is the number of items).
- Next,
- the crosses (stars) were connected with straight line segments.
- Graphs constructed in this manner are referred to as linearly interpolated relative cumulative frequency distributions.
- The curves connecting the crosses (stars) need not be linear.
- Methods of curvilinear interpolation, such as the use of cubic splines, could also be employed.
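The first stage can be sketched in code. The example below tabulates the percent below each lower real limit and connects the points by linear interpolation. It is a minimal sketch: the function names and the tiny 4-item data set are hypothetical, and the end points -0.5 and n + 0.5 are added as an assumption so that the curve runs from 0 to 100.

```python
import numpy as np

def percent_below(scores, n_items):
    """Percent of examinees scoring below each real limit -0.5, 0.5, ..., n_items + 0.5."""
    scores = np.asarray(scores, dtype=int)
    counts = np.bincount(scores, minlength=n_items + 1)   # frequency of each raw score
    cum = np.concatenate(([0], np.cumsum(counts)))        # cases below each limit
    limits = np.arange(n_items + 2) - 0.5
    return limits, 100.0 * cum / scores.size

def interpolated_percent_below(x, limits, pct):
    """Linearly interpolated relative cumulative frequency at score x."""
    return float(np.interp(x, limits, pct))

# Hypothetical number-right scores on a 4-item test.
scores = [0, 1, 1, 2, 3, 3, 3, 4]
limits, pct = percent_below(scores, n_items=4)
print(list(zip(limits, pct)))
print(interpolated_percent_below(3.0, limits, pct))   # 68.75
```

Replacing `np.interp` with a spline fit would give the curvilinear variant; that is not shown here.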
18. Illustration of the Equipercentile Equating Process (continued)
- Let the form-X equipercentile equivalent of y_i be denoted e_x(y_i).
- The calculation of the form-X equipercentile equivalent e_x(18) of a number-right score of 18 on form Y is illustrated in Figure 6.4.
- The left-hand vertical arrow indicates that the relative cumulative frequency for a score of 18 on form Y is 50.
- The short horizontal arrow shows the point on the curve for form X with the same relative cumulative frequency (50).
- The right-hand vertical arrow indicates that a score of 30 on form X is associated with this relative cumulative frequency.
- Thus, a score of 30 on form X is considered to be equivalent to a score of 18 on form Y.
- A plot of the score conversion (equivalents) is given in Figure 6.5.
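The three arrows of the graphical method correspond to a forward lookup on the form-Y curve followed by an inverse lookup on the form-X curve. The sketch below assumes linearly interpolated curves; the function names and the 4-item data are hypothetical and do not reproduce the 471-examinee data behind Figure 6.4. Where the form-X curve is flat (a score nobody obtained), the lowest limit is used, which is one possible convention.

```python
import numpy as np

def ogive(scores, n_items):
    """Percent below each real limit -0.5, 0.5, ..., n_items + 0.5."""
    scores = np.asarray(scores, dtype=int)
    counts = np.bincount(scores, minlength=n_items + 1)
    limits = np.arange(n_items + 2) - 0.5
    pct = 100.0 * np.concatenate(([0], np.cumsum(counts))) / scores.size
    return limits, pct

def equipercentile_equivalent(y, scores_x, scores_y, n_items):
    """Form-X score with the same percent below as score y on form Y."""
    lim_y, pct_y = ogive(scores_y, n_items)
    lim_x, pct_x = ogive(scores_x, n_items)
    p = np.interp(y, lim_y, pct_y)                        # vertical arrow up from y
    # Keep the first limit at each distinct percent so the inverse lookup is well defined.
    pct_u, first = np.unique(pct_x, return_index=True)
    return float(np.interp(p, pct_u, lim_x[first]))       # across, then down to form X

# Hypothetical scores on two 4-item forms.
scores_y = [0, 1, 1, 2, 3, 3, 3, 4]
scores_x = [1, 2, 2, 3, 3, 4, 4, 4]
e_x_2 = equipercentile_equivalent(2, scores_x, scores_y, n_items=4)
print(e_x_2)   # 2.75
```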
19.
- The equipercentile transformation between two forms, X and Y, of a test will usually be curvilinear.
- If form X is more difficult than form Y, the conversion line will tend to be concave downward.
- If the distribution of scores on form X is flatter, more platykurtic, than that on form Y, the conversion will tend to be S-shaped.
- If the shapes of the score distributions on the two forms are the same (i.e., have the same moments except for the first two), the conversion line will be linear.
20. Use of Smoothing Techniques
- Unsmoothed equipercentile equating uses straight-line (linear) interpolation for the ogives.
- Smoothing techniques can be used with curvilinear interpolation, such as cubic splines with different smoothing parameters.
- Smoothing the ogives is known as the pre-smoothing method.
- Smoothing the conversion function is known as the post-smoothing method.
21. Application of Equipercentile Equating to Data Collection Designs
- Equipercentile equating can also be carried out for the anchor-test-random-groups design in the following manner:
- Step 1: Using the data for the group taking tests X and V (the anchor test), for each raw score on test V, determine the score on test X with the same percentile rank.
- Step 2: Using the data for the group taking tests Y and V, for each raw score on test V, determine the score on test Y with the same percentile rank.
22. Application of Equipercentile Equating to Data Collection Designs (continued)
- Step 3: Tabulate pairs of scores on tests X and Y that correspond to the same raw score on test V.
- Step 4: Using data from step 3, for each raw score on test Y, interpolate to determine the equivalent score on test X.
- The last procedure uses the data on test V to adjust for differences in ability between the two groups. This procedure really involves two equatings, instead of just one, and therefore doubles the variance of equating error.
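The steps of the anchor-test procedure above can be sketched as follows. This is an illustrative sketch only: all function and variable names are hypothetical, percentile ranks are taken from linearly interpolated percent-below curves, and the data are made up. Scores outside the tabulated range are clamped by `np.interp`.

```python
import numpy as np

def ogive(scores, max_score):
    """Percent below each real limit -0.5, 0.5, ..., max_score + 0.5."""
    scores = np.asarray(scores, dtype=int)
    counts = np.bincount(scores, minlength=max_score + 1)
    limits = np.arange(max_score + 2) - 0.5
    return limits, 100.0 * np.concatenate(([0], np.cumsum(counts))) / scores.size

def percentile_rank(score, scores, max_score):
    limits, pct = ogive(scores, max_score)
    return np.interp(score, limits, pct)

def score_at_rank(p, scores, max_score):
    limits, pct = ogive(scores, max_score)
    pct_u, first = np.unique(pct, return_index=True)   # drop flat segments
    return np.interp(p, pct_u, limits[first])

def chained_equipercentile(y, x1, v1, y2, v2, max_x, max_y, max_v):
    """Form-X equivalents of form-Y scores y, linked through anchor test V."""
    v_raw = np.arange(max_v + 1)
    # Step 1: X score with the same percentile rank as each raw V score (group 1).
    x_of_v = np.array([score_at_rank(percentile_rank(v, v1, max_v), x1, max_x) for v in v_raw])
    # Step 2: Y score with the same percentile rank as each raw V score (group 2).
    y_of_v = np.array([score_at_rank(percentile_rank(v, v2, max_v), y2, max_y) for v in v_raw])
    # Steps 3 and 4: tabulate the (Y, X) pairs and interpolate along them.
    y_u, first = np.unique(y_of_v, return_index=True)
    return np.interp(y, y_u, x_of_v[first])

# Hypothetical data: group 1 took X and V, group 2 took Y and V.
x1 = [0, 1, 2, 3, 1, 2, 4, 3]
v1 = [0, 1, 1, 2, 0, 1, 2, 2]
y2 = [1, 1, 2, 4, 0, 3, 4, 3]
v2 = [0, 1, 1, 2, 0, 2, 2, 1]
print(chained_equipercentile([1, 2, 3], x1, v1, y2, v2, max_x=4, max_y=4, max_v=2))
```

As a sanity check, feeding the same group in twice (same form, same anchor scores) should return each score unchanged within the tabulated range.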
23. Standard Errors of Equipercentile Equating
24. Standard Errors of Equipercentile Equating (continued)
- Another procedure that may be used to estimate the standard error of an equipercentile equating is the bootstrap method (Efron, 1982).
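A bootstrap standard error can be sketched by resampling examinees with replacement, re-running the equating on each resample, and taking the standard deviation of the resulting equivalents. The sketch below assumes a random-groups design, so the two groups are resampled independently; for a single-group design, examinees would instead be resampled as (X, Y) pairs. The function names, the simulated binomial scores, and the number of replications are all hypothetical choices.

```python
import numpy as np

def ogive(scores, n_items):
    scores = np.asarray(scores, dtype=int)
    counts = np.bincount(scores, minlength=n_items + 1)
    limits = np.arange(n_items + 2) - 0.5
    return limits, 100.0 * np.concatenate(([0], np.cumsum(counts))) / scores.size

def equipercentile_equivalent(y, scores_x, scores_y, n_items):
    lim_y, pct_y = ogive(scores_y, n_items)
    lim_x, pct_x = ogive(scores_x, n_items)
    p = np.interp(y, lim_y, pct_y)
    pct_u, first = np.unique(pct_x, return_index=True)
    return float(np.interp(p, pct_u, lim_x[first]))

def bootstrap_se(y, scores_x, scores_y, n_items, n_boot=200, seed=0):
    """Bootstrap standard error of the form-X equivalent of form-Y score y."""
    rng = np.random.default_rng(seed)
    scores_x = np.asarray(scores_x)
    scores_y = np.asarray(scores_y)
    reps = np.empty(n_boot)
    for i in range(n_boot):
        bx = rng.choice(scores_x, size=scores_x.size, replace=True)   # resample group X
        by = rng.choice(scores_y, size=scores_y.size, replace=True)   # resample group Y
        reps[i] = equipercentile_equivalent(y, bx, by, n_items)
    return float(reps.std(ddof=1))

# Simulated (not real) scores for two groups of 471 examinees on 60-item forms.
sim = np.random.default_rng(1)
scores_x = sim.binomial(60, 0.50, size=471)
scores_y = sim.binomial(60, 0.55, size=471)
se = bootstrap_se(33, scores_x, scores_y, n_items=60)
print(se)
```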
25. Linear Equating
When tests X and Y are not equally reliable, true scores on X and Y are used instead of observed scores.
26. Illustration of the Linear Equating Process
- Linear equating, like equipercentile equating, can be thought of as a two-stage process.
- First, compute the sample means (m) and standard deviations (s) of scores on the two forms to be equated.
- Second, obtain equated scores on the two forms by substituting these values into the linear equating equation.
- For example, suppose the raw-score means and the standard deviations for two forms, X and Y, of a 60-item number-right-scored test administered to a single group of 471 examinees are
27. Illustration of the Linear Equating Process (continued)
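The two stages of linear equating can be sketched in a few lines, assuming the usual linear equating equation that sets standardized scores equal: l_x(y) = m_x + (s_x / s_y)(y - m_y). The means and standard deviations below are hypothetical placeholders, not the values from the 471-examinee example.

```python
def linear_equate(y, mean_x, sd_x, mean_y, sd_y):
    """Form-X equivalent of form-Y score y, matching means and standard deviations."""
    return mean_x + (sd_x / sd_y) * (y - mean_y)

# Hypothetical summary statistics (placeholders only).
m_x, s_x = 30.0, 8.0
m_y, s_y = 32.0, 10.0

x_equiv = linear_equate(42.0, m_x, s_x, m_y, s_y)     # form-Y score 42 -> form X
y_back = linear_equate(x_equiv, m_y, s_y, m_x, s_x)   # and back again to form Y
print(x_equiv, y_back)
```

Swapping the roles of the two forms recovers the original score, so this conversion satisfies the symmetry condition that regression does not.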
28. Application of Linear Equating to Data Collection Designs
- Linear equating can be carried out for the anchor-test-random-groups design in the same manner as for the equivalent-groups design, in which case the data on anchor test V are ignored.
- However, even when the groups are chosen at random, it is inevitable that there will be some differences between them, which, if ignored, will lead to bias in the conversion line.
- The data on test V can be used to adjust for differences between groups by means of the maximum-likelihood approach (Lord, 1955a).
- Maximum-likelihood estimates of the population means and standard deviations on forms X and Y are as follows:
29. Application of Linear Equating to Data Collection Designs (continued)
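An anchor-test adjustment for linear equating can be sketched as below. The estimates themselves are not shown in this text, so the sketch assumes a commonly cited regression-type form: the adjusted mean is m_x + b_xv (m_v,all - m_v,1) and the adjusted variance is s_x^2 + b_xv^2 (s_v,all^2 - s_v,1^2), where b_xv is the within-group slope of X on V and "all" refers to both groups combined. Treat these formulas, the function names, and the data as assumptions rather than as the talk's own equations.

```python
import numpy as np

def adjusted_moments(form, anchor, anchor_all):
    """Estimated population mean and SD on a form, adjusted through anchor test V."""
    form = np.asarray(form, dtype=float)
    anchor = np.asarray(anchor, dtype=float)
    anchor_all = np.asarray(anchor_all, dtype=float)
    b = np.cov(form, anchor, ddof=0)[0, 1] / anchor.var()   # slope of form on anchor
    mean = form.mean() + b * (anchor_all.mean() - anchor.mean())
    var = form.var() + b ** 2 * (anchor_all.var() - anchor.var())
    return mean, np.sqrt(var)

def anchor_linear_equate(y, x1, v1, y2, v2):
    """Form-X equivalent of form-Y score y using anchor-adjusted moments."""
    v_all = np.concatenate([v1, v2])
    m_x, s_x = adjusted_moments(x1, v1, v_all)
    m_y, s_y = adjusted_moments(y2, v2, v_all)
    return m_x + (s_x / s_y) * (y - m_y)

# Hypothetical data: group 1 took X and V, group 2 took Y and V.
x1, v1 = [10, 12, 15, 19], [1, 2, 3, 4]
y2, v2 = [11, 14, 15, 20], [2, 3, 4, 5]
print(anchor_linear_equate(15.0, x1, v1, y2, v2))
```

When the two groups have identical anchor-score distributions, the adjustment terms vanish and the result reduces to ordinary linear equating.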
30. Standard Errors of Linear Equating
31. Comparison of equating methods
- Equipercentile Equating
- Adjusts for differences in difficulty of test forms
- Can equate up to the fourth moments of the score distribution
- Percent of students below a particular score is equated
- Linear Equating
- Adjusts for differences in difficulty of test forms
- Only equates up to the first two moments of the score distribution
- Percent of students scoring below an equated score is not equated
32. References
- Kolen and Brennan (1995). Test Equating. Springer-Verlag.
- Kolen, Petersen, and Hoover's chapter on test equating in Linn (1993), Educational Measurement, ACE-Oryx publishing.
33. Thank You
34. Application of Linear Equating to Data Collection Designs