Title: Selecting Effective Early Reading Assessments
1. Selecting Effective Early Reading Assessments
2. What We'll Cover
- A research-based framework for selecting early reading assessments
- Application of the framework to selected early reading instruments
- Early reading assessment case examples
- Resources for early reading assessment and intervention
3. So many tests, so few guidelines . . .
- Growing number of print and online tests that claim to assess or predict reading
- Standards for Educational and Psychological Testing (AERA, APA, NCME, 1999)
  - Provides general guidelines, not specific criteria, for evaluating psychometric quality
4. Myths about Early Reading Assessments
- All claims that a reading measure is scientifically based are equally valid.
- A valid and reliable measure is equally valid and reliable for all examinees.
- All measures of the same reading component yield similar results for the same examinee.
5. Does Tim (Grade 1) have a reading problem?
6. Why does this happen?
- Tests vary in terms of their psychometric characteristics and soundness.
- Early Reading Assessment: A Practitioner's Handbook
7. Early Reading Assessment Models
- Traditional
  - Standard battery (one size fits all)
  - Assumes reading problems arise from internal child deficits
  - Designed to provide a categorical label for programming purposes
- Component-based
  - Targets domains related to the identified deficits
  - Assumes most reading problems arise from experiential and/or instructional deficits
  - Designed to provide information for guiding instruction
8. 10 Key Reading Components
- 4 Cognitive-linguistic Variables
  - Phonological processing
  - Rapid naming
  - Orthographic processing
  - Oral language
- 6 Literacy Skills
  - Print awareness
  - Alphabet knowledge
  - Single word reading
  - Contextual reading
  - Reading comprehension
  - Written language
9. Considerations in Selecting Early Reading Assessments
- Technical adequacy: psychometric soundness
- Usability: degree to which practitioners can actually use a measure in applied settings
10. Five Key Technical Adequacy Characteristics
- Norms
- Test floors
- Item gradients
- Reliability
- Validity
11. How can we examine a test's technical characteristics?
- Test manuals? Tremendous variation in the quality and quantity of the psychometric information provided
  - WJ III: 2 examiner manuals, plus a separate 209-page technical manual
  - Dyslexia Early Screening Test: 7 pages in a 45-page manual
- Research literature?
  - Continuing stream of validation data
12. Norms: How do we interpret performance?
- Norm-referenced measures: comparisons with age/grade peers
- Criterion-referenced measures: comparisons with pre-determined performance standards
- Nonstandardized measures: research norms or examiner judgment
13. Evaluating the Adequacy of Norms
- Are they representative?
  - Criteria: should match a national or appropriate reference population
- Are they recent?
  - Criteria: no more than 7 to 12 years old
- Are subgroup and sample sizes large enough?
  - Criteria: at least 100 (subgroup size) and 1,000 (sample size)
14. Evaluating Norms, II
- Are norm table intervals small enough to reflect small changes in skill development and small differences among examinees?
- Criteria (see the sketch below):
  - No more than 6 months for students aged 7-11 (years-months) and younger
  - No more than 1 year for students aged 8-0 to 18
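
A minimal sketch of how these interval criteria, together with the recency and sample-size criteria from the previous slide, could be checked programmatically. The `NormInfo` fields and the example values are illustrative assumptions, not data from any published test.

```python
from dataclasses import dataclass

@dataclass
class NormInfo:
    norming_year: int     # year the norm data were collected
    subgroup_n: int       # smallest age/grade subgroup size
    total_n: int          # total standardization sample size
    interval_months: int  # width of the norm table intervals
    age_years: int        # examinee age in whole years

def norm_problems(n: NormInfo, current_year: int) -> list[str]:
    """Return the criteria the norms fail (empty list = adequate)."""
    problems = []
    # Recency: no more than 7 to 12 years old; this uses the lenient 12-year bound
    if current_year - n.norming_year > 12:
        problems.append("norms more than 12 years old")
    if n.subgroup_n < 100:
        problems.append("subgroup n below 100")
    if n.total_n < 1000:
        problems.append("total sample below 1,000")
    # Interval width: 6 months through age 7-11, 1 year for ages 8-0 to 18
    max_interval = 6 if n.age_years < 8 else 12
    if n.interval_months > max_interval:
        problems.append("norm table intervals too wide")
    return problems

# Hypothetical test that fails every criterion for a 6-year-old examinee
print(norm_problems(NormInfo(1988, 80, 900, 12, 6), current_year=2004))
```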
15. Norms Example 1: Expressive Vocabulary Test (AGS, 1997)
- Dates: 1995-1996 (age norms only)
- Total norm group: 2,725 examinees
- 5-0 to 6-11 group: 119-122 examinees tested per 6-month interval
- Derived scores: 2-month increments
- Derived scores for the 5-0 to 6-11 age group are based on 39-56 examinees.
16. Norms Example 2: TOWRE, 8-year-old Grade 2 student
17. Reliability: Are scores consistent and accurate?
- Alternate-form: Form A vs. Form B
- Internal consistency: Item A vs. Item B
- Test-retest: Time A vs. Time B
- Interscorer: Scorer A vs. Scorer B
- Criteria: .80 or higher for screening measures; .90 or higher for diagnostic measures (see the sketch below)
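
As one concrete illustration of the internal-consistency index, here is a minimal sketch of Cronbach's alpha checked against the criteria above. The simulated item matrix is made-up data, not scores from any real instrument.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: examinees x items matrix; classic alpha formula."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Simulate 200 examinees on 20 dichotomous items sharing one ability factor
rng = np.random.default_rng(0)
ability = rng.normal(size=(200, 1))
items = (ability + rng.normal(size=(200, 20)) > 0).astype(float)

alpha = cronbach_alpha(items)
print(f"alpha = {alpha:.2f}")
print("meets screening criterion (.80 or higher):", alpha >= 0.80)
print("meets diagnostic criterion (.90 or higher):", alpha >= 0.90)
```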
18. Hidden Threat to Reliability
- Examiner variance: differences among assessors in administering tasks and recording responses
- Especially likely on:
  - Live-voice tasks (phoneme blending)
  - Fluency-based tasks (rapid naming)
  - Tasks with complex administration or scoring systems (LAC-3)
19. Reliability Example: TOWRE (PRO-ED, 1999)
- Internal consistency: .93 and above
- Alternate-form: .90 and above
- Test-retest: .90 and above for a study with examinees ages 6-9 (n = 29)
- Interscorer: .99, based on agreement of 2 independent scorers on 30 completed protocols
20. Test Floors: Can the Test Detect Poor Readers?
- Test floor: lowest possible standard score when a student answers 1 item correctly
- Adequate floors permit identification of students with very weak skills
- Inadequate floors overestimate students' level of skills
21. Test Floor Criteria
- A subtest raw score of 1 should yield a standard score more than 2 standard deviations below the subtest mean (see the sketch below).
  - SS of 3 or less for a subtest mean of 10
  - SS of 69 or less for a subtest mean of 100
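
A minimal sketch of the floor check above. The means and SDs are the two common score metrics named on this slide; the floor values passed in are hypothetical.

```python
def floor_adequate(floor_ss: float, mean: float, sd: float) -> bool:
    """True if the lowest obtainable standard score (raw score of 1)
    falls more than 2 standard deviations below the subtest mean."""
    return floor_ss < mean - 2 * sd

# Scaled-score metric (M = 10, SD = 3): adequate floor is SS of 3 or less
print(floor_adequate(3, 10, 3))     # True  (3 < 10 - 2*3 = 4)
print(floor_adequate(4, 10, 3))     # False (4 is exactly 2 SDs below)
# Standard-score metric (M = 100, SD = 15): adequate floor is SS of 69 or less
print(floor_adequate(69, 100, 15))  # True  (69 < 100 - 2*15 = 70)
```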
22. Which Tests Are Likely to Display Floor Effects?
- Cradle-to-grave tests
- Phonemic manipulation tasks (deletion, substitution, reversal)
- Oral reading fluency tests
- Pseudoword reading tests
- Spelling tests
- Reading comprehension tests
23. Item Gradients: Can the Test Detect Small Differences?
- Item gradient: steepness with which standard scores change from 1 raw score unit to another
- Adequate gradient: sensitive to small differences in performance
- Steep gradient: obscures differences among performance levels
24. Item Gradient Criteria
- 6 or more items between subtest floor and mean (M = 10), or
- 10 or more items between subtest floor and mean (M = 100)
- Caution: item gradients should be evaluated in the context of test floors (see the sketch below).
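
A minimal sketch of the gradient criteria above. The raw scores for the floor and the mean would be read from a test's norm table; the values here are hypothetical.

```python
def gradient_adequate(raw_at_floor: int, raw_at_mean: int, scale_mean: int) -> bool:
    """Check the number of raw-score items between the subtest floor
    and the scale mean against the criterion for that score metric."""
    items_between = raw_at_mean - raw_at_floor
    required = 6 if scale_mean == 10 else 10  # M = 10 scale vs. M = 100 scale
    return items_between >= required

# Hypothetical subtest on the M = 10 scale: the floor sits at raw score 1
# and the mean at raw score 9, so 8 items separate them: adequate.
print(gradient_adequate(raw_at_floor=1, raw_at_mean=9, scale_mean=10))  # True
```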
25. Test Floors and Item Gradients: Special Cases
- Screening tests
  - Critical issue is cutoff score accuracy, not floor/gradient violations
- Tests not yielding standard scores
  - Deciles, percentiles, quartiles, stanines
- Rasch-model tests
  - Preclude direct inspection of raw score-standard score relationships
  - WJ family: WJ III, WRMT-R/NU, WDRB
26. Floor and Gradient Example: GORT-4 (PRO-ED, 2001)
- Item gradients: adequate
- Floors:
  - Rate: inadequate below 8-0 for both forms
  - Accuracy: inadequate below 7-6 for Form A and below 8-0 for Form B
  - Comprehension: inadequate below 8-0 for Form A and below 9-0 for Form B
  - ORQ: inadequate below 6-6 for Form A and below 7-6 for Form B
27. Validity: Are the Results Meaningful?
- Content validity: effectiveness in assessing the relevant domain
- Criterion-related validity: effectiveness in predicting performance now (concurrent validity) or later (predictive validity); see the sketch below
- Construct validity: effectiveness in measuring what the test is supposed to measure
- Criteria: evidence of all three types of validity for the target population
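
A minimal sketch of how a concurrent validity coefficient is obtained: correlate scores on the test under review with an established criterion measure administered at about the same time. The data below are simulated for illustration, not results from any published study.

```python
import numpy as np

rng = np.random.default_rng(1)
true_skill = rng.normal(100, 15, size=150)            # latent reading skill
new_test = true_skill + rng.normal(0, 8, size=150)    # test under review
criterion = true_skill + rng.normal(0, 8, size=150)   # established measure

r = np.corrcoef(new_test, criterion)[0, 1]
print(f"concurrent validity r = {r:.2f}")  # around .7-.8 with these settings
```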
28. Validity Example: WJ III ACH
- Content validity: remarkably little content validity evidence
- Criterion-related validity: correlates .63 to .82 with the WIAT
  - WJ III Written Expression mean standard scores more than 10 points higher than WIAT Written Expression mean standard scores
29. WJ III ACH Validity Example, Cont.
- Diagnostic utility study with 48 students with ADHD, ages 6-17
- ADHD group scored significantly lower than the norm group on 3 of 8 WJ III ACH tests (Oral Comprehension, Passage Comprehension, and Calculation)
30. The Untold Story: Usability Considerations
- Usability often has more influence on test selection and use than technical adequacy.
- Virtually no research exists on the impact of usability on test selection and use.
31. Do these comments sound familiar?
- I know how to give it.
- It doesn't take long to give.
- It's easy to carry around.
- I think I saw one in the storage closet.
- I think that test kit has all the parts.
32. Key Practical Characteristics
- Test construction
- Administration
- Accommodations and adaptations
- Scores and scoring
- Interpretation
- Links to intervention
33. Usability Example: DEST (PsyCorp, 1996)
- Inexpensive (130.00)
- Has numerous stimulus materials to manage, increasing administration time
  - Letter Naming subtest: 4 cards for 12 items
  - Digit Naming subtest: 3 cards for 9 items
- Requires calibrating a postural stability balance tester
- Manual is not spiral-bound, so it doesn't lie flat during administration.
34. Increasing the Effectiveness of Early Reading Assessments
- Begin with measures that target domains directly related to the referral problem.
- Supplement norm-referenced measures with criterion-referenced measures to ensure adequate coverage and increase instructionally relevant information.
- Know the psychometric strengths and limitations of each measure you use.
35. Increasing Effectiveness, II
- Evaluate the presence of attentional, behavioral, and motivational problems.
  - Key predictors of response to intervention
  - The Unmotivated Child
- Assess environmental and instructional variables.
36. Instructional Disability?
37. The Golden Rule of Assessment
- The best-designed assessment with the most reliable and valid measures, administered by the best-trained examiner, won't change a child's reading trajectory . . . unless someone in the child's life does something different.
- Effective School Interventions: Strategies for Enhancing Academic Achievement and Social Competence
38. Early Reading Assessment and Intervention Resources
- AERA, APA, & NCME. (1999). Standards for educational and psychological testing. Washington, DC: AERA. www.apa.org
- Buros Institute of Mental Measurements. www.unl.edu/buros
- Center for Equity and Excellence in Education Test Database. http://ceee.gwu.edu/standards_assessments/sa.htm
- ERIC Clearinghouse on Assessment and Evaluation. http://www.ericae.net
- Florida Center for Reading Research. http://www.fcrr.org
39. More Resources
- Rathvon, N. (2004). Early reading assessment: A practitioner's handbook. New York: Guilford. www.guilford.com
- Rathvon, N. (1999). Effective school interventions: Strategies for enhancing achievement and social competence. New York: Guilford. www.guilford.com
- Rathvon, N. (1996). The unmotivated child: How to help your underachiever become a successful student. New York: Simon & Schuster. www.simonsays.com
- Southwest Educational Development Laboratory. www.sedl.org/reading/rad
40. Thank you!