Title: Measurement
1 Measurement
2 Classroom Assessment Reliability
- Reliability = assessment consistency.
- Consistency within tests across examinees.
- Consistency within tests over multiple administrations to the same examinees.
- Consistency across alternative forms of the same test for the same examinees.
3 Three Types of Reliability
- Stability reliability.
- Alternate form reliability.
- Internal consistency reliability.
4 Stability Reliability
- Concerned with the question: are assessment results consistent over time (over occasions)?
- Think of some examples where stability reliability might be important.
- Why might test results NOT be consistent over time?
5 Evaluating Stability Reliability
- Test-Retest Reliability.
- Compute the correlation between a first and a later administration of the same test (see the sketch after this list).
- Classification consistency.
- Compute the percentage of consistent student classifications over time.
- The main concern is with the stability of the assessment over time.
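
To make both approaches concrete, here is a minimal Python sketch; all scores are made up for illustration:

from statistics import correlation  # Python 3.10+

first  = [82, 75, 91, 60, 88]   # first administration (hypothetical scores)
second = [80, 68, 89, 63, 90]   # retest of the same five students

# Test-retest reliability: Pearson correlation between the two administrations.
print(f"test-retest r = {correlation(first, second):.2f}")

# Classification consistency: percentage of students classified the same way
# (e.g., pass/fail at a cut score of 70) on both occasions.
cut = 70
same = sum((a >= cut) == (b >= cut) for a, b in zip(first, second))
print(f"classification consistency = {same / len(first):.0%}")
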
6 Example of Classification Consistency
7 Example of Classification Consistency (Good Reliability)
8 Example of Classification Consistency (Poor Reliability)
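
The tables from these three slides are not reproduced in this text version. As a stand-in, the sketch below (with hypothetical scores) shows how a 2x2 classification table is tallied and how its agreement rate separates good from poor reliability:

def classification_table(time1, time2, cut):
    """Tally a 2x2 pass/fail table for two administrations of one test."""
    cells = {("pass", "pass"): 0, ("pass", "fail"): 0,
             ("fail", "pass"): 0, ("fail", "fail"): 0}
    for a, b in zip(time1, time2):
        cells[("pass" if a >= cut else "fail",
               "pass" if b >= cut else "fail")] += 1
    # Agreement = proportion on the main diagonal (pass/pass plus fail/fail).
    agreement = (cells["pass", "pass"] + cells["fail", "fail"]) / len(time1)
    return cells, agreement

good = classification_table([82, 75, 91, 60, 88], [80, 78, 89, 63, 90], cut=70)
poor = classification_table([82, 75, 91, 60, 88], [65, 84, 66, 80, 71], cut=70)
print(f"good reliability: {good[1]:.0%} agreement")   # 100%
print(f"poor reliability: {poor[1]:.0%} agreement")   # 40%
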
9 Alternate-form Reliability
- Are two, supposedly equivalent, forms of an assessment in fact equivalent?
- The two forms do not have to yield identical scores.
- The correlation between two or more forms of the assessment should be reasonably substantial.
10 Evaluating Alternate-form Reliability
- Administer two forms of the assessment to the same individuals and correlate the results (see the sketch after this list).
- Determine the extent to which the same students are classified the same way by the two forms.
- Alternate-form reliability is established by evidence, not by proclamation.
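
A minimal sketch of the correlation approach, with hypothetical Form A / Form B scores. Form B runs a few points harder, so the scores are not identical, yet the forms rank students similarly and the correlation stays high:

from statistics import correlation  # Python 3.10+

form_a = [85, 72, 93, 64, 78]   # hypothetical scores on Form A
form_b = [79, 68, 88, 58, 74]   # same students on a slightly harder Form B

print(f"alternate-form r = {correlation(form_a, form_b):.2f}")  # high despite the shift

The classification-table check works exactly as in the stability example above, with the two forms taking the place of the two occasions.
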
11 Example of Using a Classification Table to Assess Alternate-Form Reliability
12 Example of Using a Classification Table to Assess Alternate-Form Reliability
13 Internal Consistency Reliability
- Concerned with the extent to which the items (or components) of an assessment function consistently.
- To what extent do the items in an assessment measure a single attribute?
- For example, consider a math problem-solving test. To what extent does reading comprehension play a role? What is being measured?
14 Evaluating Internal Consistency Reliability
- Split-Half Correlations.
- Kuder-Richardson Formula 20 (KR-20); see the sketch after this list.
- Used with binary-scored (dichotomous) items.
- Average of all possible split-half correlations.
- Cronbach's Coefficient Alpha.
- Similar to KR-20, except used with non-binary-scored (polytomous) items (e.g., items that measure attitude).
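
A minimal sketch of the computation, assuming a small hypothetical matrix of item scores (rows are students, columns are items):

from statistics import pvariance

def coefficient_alpha(scores):
    """Alpha = k/(k-1) * (1 - sum of item variances / total-score variance).
    On 0/1 (dichotomous) items each item variance is p*q, so this is KR-20."""
    k = len(scores[0])                     # number of items
    item_var = sum(pvariance(col) for col in zip(*scores))
    totals = [sum(row) for row in scores]  # each student's total score
    return (k / (k - 1)) * (1 - item_var / pvariance(totals))

binary = [[1, 1, 0, 1],   # right/wrong items, one row per student
          [1, 0, 0, 0],
          [1, 1, 1, 1],
          [0, 1, 0, 1]]
print(f"KR-20 / alpha = {coefficient_alpha(binary):.2f}")   # 0.53 for this tiny example
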
15 Reliability: Components of an Observation
- O = T + E
- Observation = True Status + Measurement Error.
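
A small simulation (with assumed true-score and error distributions) makes the decomposition concrete: reliability is the share of observed-score variance that is true-score variance:

import random
from statistics import pvariance

random.seed(1)
true = [random.gauss(75, 10) for _ in range(5000)]   # T: true status (SD 10)
error = [random.gauss(0, 5) for _ in range(5000)]    # E: measurement error (SD 5)
observed = [t + e for t, e in zip(true, error)]      # O = T + E

# Var(O) = Var(T) + Var(E) when T and E are independent, so the
# reliability Var(T)/Var(O) should come out near 100/125 = 0.80.
print(f"reliability = {pvariance(true) / pvariance(observed):.2f}")
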
16 Standard Error of Measurement
- Provides an index of the reliability of an individual's score.
- The standard deviation of the theoretical distribution of errors (i.e., the Es).
- The more reliable a test, the smaller the SEM (see the sketch after this list).
- The SEM is smallest near the average score on a test.
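
The classical formula is SEM = SD * sqrt(1 - reliability); a quick sketch with assumed values:

import math

def sem(sd, reliability):
    """Classical standard error of measurement."""
    return sd * math.sqrt(1 - reliability)

# With SD = 10: a more reliable test yields a smaller SEM.
print(f"r = 0.80 -> SEM = {sem(10, 0.80):.2f}")   # about 4.47
print(f"r = 0.90 -> SEM = {sem(10, 0.90):.2f}")   # about 3.16

A student's observed score plus or minus one SEM gives a rough 68% band for their true score.
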
17 Things to Do to Improve Reliability
- Use more items or tasks (see the Spearman-Brown sketch after this list).
- Use items or tasks that differentiate among students.
- Use items or tasks that measure within a single content domain.
- Keep scoring objective.
- Eliminate (or reduce) extraneous influences.
- Use shorter assessments more frequently.
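
The first suggestion has a classical quantitative backing, the Spearman-Brown prophecy formula (not named on the slide): it predicts the reliability of a test lengthened by a factor of k with comparable items:

def spearman_brown(r, k):
    """Predicted reliability of a test lengthened by a factor of k."""
    return (k * r) / (1 + (k - 1) * r)

# Doubling a test with reliability 0.60 is predicted to raise it to 0.75.
print(f"{spearman_brown(0.60, 2):.2f}")
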