Title: State Exemplar: Maryland's Alternate Assessment Using Alternate Achievement Standards: The Alternate Maryland School Assessment
1 State Exemplar: Maryland's Alternate Assessment Using Alternate Achievement Standards
The Alternate Maryland School Assessment
- Presenters
- Sharon Hall
- U.S. Department of Education
- Martin Kehe
- Maryland State Department of Education
- William Schafer
- University of Maryland
2 Session Summary
- This session highlights the Alternate Assessment based on Alternate Academic Achievement Standards in Maryland, the Alternate Maryland School Assessment (Alt-MSA). Discussion will focus on:
- A description of the assessment and the systems-change process that was required to develop and implement the testing program
- Development of reading, mathematics, and science item banks
- The process to ensure alignment with grade-level content standards and results of independent alignment studies
- Technical documentation and research agenda to support validity and reliability
3 Agenda
- Developing Maryland's AA-AAAS: A Systems Change Perspective
- Conceptual Framework
- Alt-MSA Design
- Developing the Mastery Objective Banks
- Evaluation of the Alt-MSA's alignment with content standards
- Technical Documentation and Establishing a Research Agenda to Support Validity and Reliability
- Questions and Answers
4 A Systems Change Perspective
- Process
- Collaboration
- Divisions of Special Education and Assessment
- Stakeholder Advisory
- Alt-MSA Facilitators
- Alt-MSA Facilitators and LACs
- MSDE and Vendor
- Instruction and Assessment
- Students assigned to age-appropriate grade (for purposes of Alt-MSA)
- Local School System Grants
5 A Systems Change Perspective
- Content
- Reading and Mathematics mastery objectives and artifacts (evidence) linked with grade-level content standards
- No program evaluation criteria
6 Maryland's Alternate Assessment Design (Alt-MSA)
- Portfolio Assessment
- 10 Reading and 10 Mathematics Mastery Objectives (MOs)
- Evidence of Baseline (50% or less attained)
- Evidence of Mastery (80%-100%); 1 artifact for each MO
- 2 Reading and 3 Mathematics MOs aligned with science: vocabulary and informational text; measurement and data analysis
7 What's Assessed: Reading
- Maryland Reading Content Standards
- 1.0 General Reading Processes
- Phonemic awareness, phonics, fluency (2 MOs)
- Vocabulary (2 MOs; 1 aligned with science)
- General reading comprehension (2 MOs)
- 2.0 Comprehension of Informational Text (2 MOs; 1 aligned with science)
- 3.0 Comprehension of Literary Text (2 MOs)
8 What's Assessed: Mathematics
- Algebra, Patterns, and Functions (2 MOs)
- Geometry (2 MOs)
- Measurement (2 MOs; 1 aligned with science)
- Statistics-Data Analysis (2 MOs aligned with science)
- Number Relationships and Computation (2 MOs)
9 What's Assessed: Science (2008)
- Grades 5, 8, 10
- Grades 5 and 8 select 1 MO each
- Earth/Space Science
- Life Science
- Chemistry
- Physics
- Environmental Science
- Grade 10
- 5 Life Science MOs
10 Steps in the Alt-MSA Process: Step 1 (September)
- Principal meets with Test Examiner Teams
- Review results or conduct pre-assessment
11 Steps in the Alt-MSA Process: Step 2 (September-November)
- TET selects or writes Mastery Objectives
- Principal reviews and submits
- Share with parents
- Revise (written) Mastery Objectives
12 Steps in the Alt-MSA Process: Step 3 (September-March)
- Collect Baseline Data for Mastery Objectives (50% or less accuracy)
- Teach Mastery Objectives
- Assess Mastery Objectives
- Construct Portfolio
13 Standardized
- Number of mastery objectives assessed
- Format of mastery objectives
- Content standards/topics assessed
- All MOs must have baseline data and evidence of mastery at 80%-100%
- Types of artifacts permissible
- Components of artifacts
- Training and Handbook provided
- Scoring training and procedures
14 MO Format
15 Evidence (Artifacts)
- Acceptable Artifacts (Primary Evidence)
- Videotapes (1 reading and 1 math mandatory)
- Audiotape
- Student work (original)
- Data collection charts (original)
- Unacceptable Artifacts
- photographs, checklists, narrative descriptions
16 Artifact Requirements
- Aligned with Mastery Objective
- Must include baseline data that demonstrates student performs MO with 50% or less accuracy
- Data chart must show 3-5 demonstrations of instruction prior to mastery
- The observable, measurable student response must be evident (not trial 1)
- Mastery is 80%-100% accuracy
- Name, date, accuracy score, prompts
17 Scores and Condition Codes
- A: MO is not aligned
- B: Artifact is missing or not acceptable
- C: Artifact is incomplete
- D: Artifact does not align with MO, or components of MO are missing
- E: Data chart does not show 3-5 observations of instruction on different days prior to demonstration of mastery
- F: Accuracy score is not reported
18 Reliability: Scorer Training
- Conducted by contractor scoring director; MSDE always present
- Must attain 80% accuracy on each qualifying set
- Every portfolio is scored twice by 2 different teams
- Daily backreading by supervisors and scoring directors
- Daily inter-rater reliability data
- Twice-weekly validity checks
- Ongoing retraining
19 Maryland's Alt-MSA Report
20 Development of the Mastery Objective Banks
- Initial three years of the program involved teachers writing individualized reading and mathematics Mastery Objectives (approximately 100,000 objectives each year)
- Necessary process to help staff learn the content standards
- Maryland and contractor staff reviewed 100% of MOs for alignment and technical quality
21 Mastery Objective Banks
- Prior to year 4, Maryland conducted an analysis of written MOs to create the MO Banks for reading and mathematics
- Banked items available in an online application, linked to and aligned with content standards
- Provided additional degree of standardization
- Process still allows for writing of customized MOs, as needed
22 Mastery Objective Banks
- In year 4, baseline MO measurement was added
- Teachers take stock of where a student is, without prompts, at the beginning of the year on each proposed MO
- This helps to ensure that students are learning and assessed on skills and knowledge that have not already been mastered
- Year 5 added Science MO Bank
23 Mastery Objective Banks
24 Mastery Objective Banks
25 Mastery Objective Banks
26 Mastery Objective Banks
27 Mastery Objective Banks
28 National Alternate Assessment Center (NAAC)
- Alignment Study of the Alt-MSA
29 NAAC Alt-MSA Alignment Study
- Conducted by staff from University of North Carolina at Charlotte and Western Carolina University from March to August 2007
- Study was an investigation of the alignment of Alt-MSA Mastery Objectives in Reading and Mathematics to grade-level content standards
30 NAAC Alt-MSA Alignment Study
- Eight (8) criteria used to evaluate
- Developed in collaboration by content experts, special educators, and measurement experts at University of North Carolina at Charlotte (Browder, Wakeman, Flowers, Rickleman, Pugalee, Karvonen, 2006)
- A stratified random sampling method (stratified on grade level) was used to select the portfolios, grades 3-8 and 10; 225 reading/231 mathematics
31 Alignment Results by Criterion
- Criterion 1: The content is academic and includes the major domains/strands of the content area as reflected in state and national standards (e.g., reading, math, science)
- Outcome
- Reading: 99% of MOs were rated academic
- Math: 94% of MOs were rated academic
32 Alignment Results by Criterion
- Criterion 2: The content is referenced to the student's assigned grade level (based on chronological age)
- Outcome
- Reading: 82% of the MOs reviewed were referenced to a grade-level standard (2.0% were not referenced to a grade-level standard; 16% were referenced to off-grade standards (K-2), which addressed the standards of phonics and phonemic awareness)
- Math: 97% were referenced to a grade-level standard
33 Alignment Results by Criterion
- Criterion 3: The focus of achievement maintains fidelity with the content of the original grade-level standards (content centrality) and, when possible, the specified performance
- Outcome
- Reading: 99% of MOs rated as far or near for content centrality, 92% of MOs rated partial or full performance centrality, and 90% rated as being linked to the MO
- Math: 92% of MOs rated as far in content centrality, 92% of MOs rated partial performance centrality, and 92% rated as being linked to the MO
34 Alignment Results by Criterion
- Criterion 4: The content differs from grade level in range, balance, and Depth of Knowledge (DOK), but matches high expectations set for students with significant cognitive disabilities
- Outcome
- Reading: All the reading standards had multiple MOs that were linked to the standard, and although 73% were rated at the depth of knowledge level of memorize/recall, there were MOs rated at the higher depth of knowledge levels (i.e., comprehension, application, and analysis)
- Math: MOs were aligned to all grade-level standards and distributed across all levels of depth of knowledge except the lowest level (i.e., attention), with the largest percentage of MOs at the performance and analysis/synthesis/evaluation levels
35 Alignment Results by Criterion
- Criterion 5: There is some differentiation in achievement across grade levels or grade bands
- Outcome
- Reading: Overall, the reading MOs show good differentiation across grade levels
- Math: While there is some limited differentiation, some items were redundant from lower to upper grades
- Criterion 6: The expected achievement for students is for the students to show learning of grade-referenced academic content
- Outcome: The Alt-MSA score is not augmented with program factors. However, in cases where more intrusive prompting is used, the level of inference that can be made is limited
36 Alignment Results by Criterion
- Criterion 7: The potential barriers to demonstrating what students know and can do are minimized in the assessment
- Outcome: Alt-MSA minimizes barriers for the broadest range of heterogeneity within the population, because flexibility is built into the tasks teachers select (92% of the MOs were accessible at an abstract level of symbolic communication, while the remaining MOs were accessible to students at a concrete level of symbolic communication)
- Criterion 8: The instructional program promotes learning in the general curriculum
- Outcome: The Alt-MSA Handbook is well developed and covers the grade-level domains that are included in the alternate assessment. Some LEAs in MD have exemplary professional development materials
37 Study Summary
- Overall, the Alt-MSA demonstrated good access to the general curriculum
- The Alt-MSA was well developed and covered the grade-level standards
- The quality of the professional development materials varied across the different counties
38 Technical Documentation of the Alt-MSA
39 Sources
- Alt-MSA Technical Manuals (2004, 2005, 2006)
- Schafer, W. D. (2005). Technical documentation for alternate assessments. Practical Assessment, Research and Evaluation, 10(10). Available at PAREonline.net.
- Marion, S. F., & Pellegrino, J. W. (2007). A validity framework for evaluating the technical adequacy of alternate assessments. Educational Measurement: Issues and Practice, 25(4), 47-57.
- Report from the National Alternate Assessment Center from a panel review of the Alt-MSA
- Contracted technical studies on Alt-MSA
40 Validity of the Criterion Is Always Important
- To judge proficiency in any assessment, a student's score is compared with a criterion score
- Regular assessment: standard setting generates a criterion score for all examinees
- Regular assessment: the criterion score is assumed appropriate for everyone
- It defines an expectation for minimally acceptable performance
- It is interpreted in behavioral terms through achievement level descriptions
41 Criterion in Alternate Assessment
- A primary question in alternate assessment is: Should the same criterion score apply to everyone?
- Our answer was no, because behaviors that imply success for some students imply failure for others
- This implies that flexible criteria are needed to judge the success of a student or of a teacher, unlike the regular assessment
42 Criterion Validity
- The quality of criteria is documented for the regular assessment through a standard-setting study
- When criteria vary, each different criterion needs to be documented
- So we need to consider both score and criterion reliability and validity for Alt-MSA
43 Technical Research Agenda
- There are four sorts of technical research we should undertake:
- Reliability of Criteria
- Reliability of Scores
- Validity of Criteria
- Validity of Scores
- We will describe some examples and possibilities
for each.
44 Reliability of Criteria
- Could see if the criteria (MOs) are internally consistent for a student in terms of difficulty, cognitive demand, and/or levels of the content elements they represent
- Could do that for, say, 9 samples of students: L-M-H degrees of challenge by L-M-H grade levels
- Degree of challenge might be assessed by age of identification of disability or by location in the extended standards of last year's MOs
45 Reliability of Scores
- 2007 rescore of a 5% sample of 2006 portfolios (n=266) showed agreement rates of 82%-89% for reading and 83%-89% for math
- A NAAC review concluded the inter-rater evidence of scorer reliability is strong
- Amount of evidence could be evaluated using Smith's (2003) approach of modeling error using the binomial distribution to get decision accuracy estimates
46 Decision Accuracy Study
- Assume each student produces a sample of size 10 from a binomial population of MOs
- Can use the binomial distribution to generate the probabilities of all outcomes (X = 0 to 10) for any p
- For convenience, use the midpoints of ten equally-spaced intervals for p (.05 to .95)
- Using X = 0%-50% for Basic, X = 60%-80% for Proficient, and X = 90%-100% for Advanced yields:
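The binomial model described above can be sketched in Python; this is an illustrative computation, and the function name is ours rather than part of the Alt-MSA documentation:

```python
from math import comb

def classification_probs(p, n=10):
    """Probability a student with true mastery rate p is classified
    Basic (X <= 5), Proficient (6 <= X <= 8), or Advanced (X >= 9),
    where X ~ Binomial(n, p) counts MOs mastered out of n = 10."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    return sum(pmf[:6]), sum(pmf[6:9]), sum(pmf[9:])

# A student with true p = .95 is classified Advanced about 91% of the time,
# matching the first row of the table that follows
basic, proficient, advanced = classification_probs(0.95)
```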
47 Classification Probabilities for Students with Various p's
- p Basic Proficient Advanced
- .95 .0001 .0861 .9138
- .85 .0098 .4458 .5443
- .75 .0781 .6779 .2440
- .65 .2485 .6656 .0860
- .55 .4956 .4812 .0232
- .45 .7384 .2571 .0045
- .35 .9052 .0944 .0005
- .25 .9803 .0207 .0000
- .15 .9986 .0013 .0000
- .05 1.000 .0000 .0000
48 3x3 Decision Accuracy
- Collapsing across p with True Basic = .05-.55, True Proficient = .65-.85, True Advanced = .95
- Classification
- True Level  Basic  Proficient  Advanced  Total
- Advanced  .0000  .0086  .0914  .1000
- Proficient  .0336  .1789  .0874  .3000
- Basic  .5118  .0855  .0028  .6000
- P(Accurate)  .5118  .1789  .0914  .7821
- This assumes equally-weighted p's
49 Empirically Weighted p's
- Mastery Objectives Mastered in 2006 for Reading and Math (N = 4,851 students)
- Percent Mastered  Reading Percent  Math Percent
- 100 21.8 26.4
- 90 16.1 16.7
- 80 11.6 10.3
- 70 8.0 7.8
- 60 6.7 6.1
- 50 5.5 5.8
- 40 4.9 4.6
- 30 5.1 4.1
- 20 4.7 4.1
- 10 6.7 6.3
- 0 6.9 7.7
50 3x3 Decision Accuracy with Empirical Weights - Reading
- Observed Achievement Level
- True Level Basic Proficient Advanced Total
- Advanced .0000 .0258 .2726 .2984
- Proficient .0274 .1768 .1057 .3099
- Basic .3414 .0486 .0017 .3917
- P(Accurate) .3414 .1768 .2726 .7908
51 NCLB requires decisions in terms of Proficient/Advanced vs. Basic
- Observed Level Group - Reading
- True Level  Basic  Proficient or Advanced
- Proficient or Advanced  .0451  .9549
- Basic  .8716  .1284
- These are conditional probabilities; they sum to 1 by rows
- P(Type I error) (taking action) = .0451
- P(Type II error) (taking no action) = .1284
- These are less than Cohen's guidelines of .05 and .20
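The 2x2 collapse is simple arithmetic on the 3x3 table from the previous slide; a sketch using the reading figures (the dictionary layout and function name are ours):

```python
# 3x3 reading table: rows = true level, columns = joint probability of being
# classified [Basic, Proficient, Advanced]
reading = {
    "Basic":      [0.3414, 0.0486, 0.0017],
    "Proficient": [0.0274, 0.1768, 0.1057],
    "Advanced":   [0.0000, 0.0258, 0.2726],
}

def nclb_error_rates(table):
    """Collapse to the Basic vs Proficient-or-Advanced decision and return
    the row-conditional Type I and Type II error rates."""
    # Truly Proficient or Advanced but classified Basic -> Type I (taking action)
    pa = [p + a for p, a in zip(table["Proficient"], table["Advanced"])]
    type_i = pa[0] / sum(pa)
    # Truly Basic but classified Proficient or Advanced -> Type II (no action)
    b = table["Basic"]
    type_ii = (b[1] + b[2]) / sum(b)
    return type_i, type_ii

type_i, type_ii = nclb_error_rates(reading)
```

The same function applied to the math table on the next slides yields the .0398 and .1365 rates reported there.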
52 3x3 Decision Accuracy with Empirical Weights - Math
- Observed Achievement Level
- True Level Basic Proficient Advanced Total
- Advanced .0000 .0299 .3174 .3474
- Proficient .0256 .1676 .1014 .2946
- Basic .3092 .0472 .0017 .3581
- P(Accurate) .3092 .1676 .3174 .7942
53 NCLB requires decisions in terms of Proficient/Advanced vs. Basic
- Observed Level Group - Math
- True Level  Basic  Proficient or Advanced
- Proficient or Advanced  .0398  .9602
- Basic  .8635  .1365
- These are conditional probabilities; they sum to 1 by rows
- P(Type I error) (taking action) = .0398
- P(Type II error) (taking no action) = .1365
- These are also less than Cohen's guidelines of .05 and .20
54 Reliability of Scores: Conclusions
- Decision accuracy of Reading is 79.1%
- Decision accuracy of Math is 79.4%
- Misclassification probabilities are:
- False  Reading  Math
- Prof.  12.8%  13.6%
- Not Prof.  4.5%  4.0%
- These are within Cohen's guidelines
55 Validity of Criteria: Content Evidence
- Could study MO development review process for 9 samples of students, L-M-H degrees of challenge by L-M-H grade levels
- Could map student progress along content standard strands over time
- Could evaluate and monitor the use of the bank
- Could survey parents: are MOs too modest, about right, or too idealistic?
- MSDE will conduct a new cut-score study
56 Validity of Criteria: Quantitative Evidence
- For n=267 same-student portfolio pairs from 2006 and 2007:
- 95% of 2007 reading MOs
- 90% of 2007 math MOs
- were completely new or more demanding than the respective student's 2006 MOs (suggesting growth)
- Alternate standard-setting studies could generate evidence about validity of the existing (or resulting) criteria
57 Possible Alternate Standard-Setting Study Approaches
- Develop percentage cut-scores for groups with different degrees of disability (e.g., modified Angoff); articulate vertically and horizontally
- Establish criterion groups using an external criterion and identify cut scores that minimize classification errors (contrasting groups)
- Set cutpoints that match the percentages of students in the achievement levels in the general population (equipercentile)
58 Validity of Criteria: Consequential Evidence
- Could study IEPs to see if they have become more oriented toward academic goals over time
- Could study the ability of Alt-MSA to drive instruction, e.g., do the enacted content standards move toward the assessed content standards?
59 Validity of Scores: Content Evidence
- Could study how well raters can categorize
samples of artifacts into the content strand
elements their MOs were designed to represent
60 Validity of Scores: Consequential Evidence
- Could survey stakeholders
- How have the scores been used?
- How have the scores been misused?
61 Two Philosophical Issues
- Justification is needed for implementing flexible performance expectations all the way down to the individual student
- Justification is needed for using standardized percentages for success categories across the flexible performance expectations
62 Contact Information
- Sharon Hall: Sharon.Hall@ed.gov
- Martin Kehe: mkehe@msde.state.md.us
- William Schafer: wschafer@umd.edu