Title: State Exemplar: Maryland's Alternate Assessment Using Alternate Achievement Standards: The Alternate Maryland School Assessment
1 State Exemplar: Maryland's Alternate Assessment Using Alternate Achievement Standards
The Alternate Maryland School Assessment
- Presenters
- Sharon Hall
- U.S. Department of Education
- Martin Kehe
- Maryland State Department of Education
- William Schafer
- University of Maryland
2 Session Summary
- This session highlights the Alternate Assessment based on Alternate Academic Achievement Standards in Maryland, the Alternate Maryland School Assessment (Alt-MSA). Discussion will focus on:
- A description of the assessment and the systems-change process that was required to develop and implement the testing program
- Development of reading, mathematics, and science item banks
- The process to ensure alignment with grade-level content standards and results of independent alignment studies
- Technical documentation and research agenda to support validity and reliability
3 Agenda
- Developing Maryland's AA-AAAS: A Systems Change Perspective
- Conceptual Framework
- Alt-MSA Design
- Developing the Mastery Objective Banks
- Evaluation of the Alt-MSA's alignment with content standards
- Technical Documentation and Establishing a Research Agenda to Support Validity and Reliability
- Questions and Answers
4 A Systems Change Perspective
- Process
- Collaboration
- Divisions of Special Education and Assessment
- Stakeholder Advisory
- Alt-MSA Facilitators
- Alt-MSA Facilitators and LACs
- MSDE and Vendor
- Instruction and Assessment
- Students assigned to age-appropriate grade (for purposes of Alt-MSA)
- Local School System Grants
5 A Systems Change Perspective
- Content
- Reading and Mathematics mastery objectives and artifacts (evidence) linked with grade-level content standards
- No program evaluation criteria
6 Maryland's Alternate Assessment Design (Alt-MSA)
- Portfolio Assessment
- 10 Reading and 10 Mathematics Mastery Objectives (MOs)
- Evidence of Baseline (50% or less attained)
- Evidence of Mastery (80%-100%); 1 artifact for each MO
- 2 Reading and 3 Mathematics MOs aligned with science: vocabulary and informational text; measurement and data analysis
7 What's Assessed: Reading
- Maryland Reading Content Standards
- 1.0 General Reading Processes
- Phonemic awareness, phonics, fluency (2 MOs)
- Vocabulary (2 MOs; 1 aligned with science)
- General reading comprehension (2 MOs)
- 2.0 Comprehension of Informational Text (2 MOs; 1 aligned with science)
- 3.0 Comprehension of Literary Text (2 MOs)
8 What's Assessed: Mathematics
- Algebra, Patterns, and Functions (2 MOs)
- Geometry (2 MOs)
- Measurement (2 MOs; 1 aligned with science)
- Statistics-Data Analysis (2 MOs aligned with science)
- Number Relationships and Computation (2 MOs)
9 What's Assessed: Science (2008)
- Grades 5, 8, 10
- Grades 5 and 8 select 1 MO each
- Earth/Space Science
- Life Science
- Chemistry
- Physics
- Environmental Science
- Grade 10
- 5 Life Science MOs
10 Steps in the Alt-MSA Process: Step 1 (September)
- Principal meets with Test Examiner Teams
- Review results or conduct pre-assessment
11 Steps in the Alt-MSA Process: Step 2 (September-November)
- TET selects or writes Mastery Objectives
- Principal reviews and submits
- Share with parents
- Revise (written) Mastery Objectives
12 Steps in the Alt-MSA Process: Step 3 (September-March)
- Collect Baseline Data for Mastery Objectives (50% or less accuracy)
- Teach Mastery Objectives
- Assess Mastery Objectives
- Construct Portfolio
13 Standardized
- Number of mastery objectives assessed
- Format of mastery objectives
- Content standards/topics assessed
- All MOs must have baseline data and evidence of mastery at 80%-100%
- Types of artifacts permissible
- Components of artifacts
- Training and Handbook provided
- Scoring training and procedures
14 MO Format
15 Evidence (Artifacts)
- Acceptable Artifacts (Primary Evidence)
- Videotapes (1 reading and 1 math mandatory)
- Audiotape
- Student work (original)
- Data collection charts (original)
- Unacceptable Artifacts
- photographs, checklists, narrative descriptions
16 Artifact Requirements
- Aligned with Mastery Objective
- Must include baseline data that demonstrates student performs MO with 50% or less accuracy
- Data chart must show 3-5 demonstrations of instruction prior to mastery
- The observable, measurable student response must be evident (not trial 1)
- Mastery is 80%-100% accuracy
- Name, date, accuracy score, prompts
17 Scores and Condition Codes
- A: MO is not aligned
- B: Artifact is missing or not acceptable
- C: Artifact is incomplete
- D: Artifact does not align with MO, or components of MO are missing
- E: Data chart does not show 3-5 observations of instruction on different days prior to demonstration of mastery
- F: Accuracy score is not reported
18 Reliability: Scorer Training
- Conducted by contractor scoring director; MSDE always present
- Must attain 80% accuracy on each qualifying set
- Every portfolio is scored twice by 2 different teams
- Daily backreading by supervisors and scoring directors
- Daily inter-rater reliability data
- Twice-weekly validity checks
- Ongoing retraining
19 Maryland's Alt-MSA Report
20 Development of the Mastery Objective Banks
- Initial three years of the program involved teachers writing individualized reading and mathematics Mastery Objectives (approximately 100,000 objectives each year)
- Necessary process to help staff learn the content standards
- Maryland and contractor staff reviewed 100% of MOs for alignment and technical quality
21 Mastery Objective Banks
- Prior to year 4, Maryland conducted an analysis of written MOs to create the MO Banks for reading and mathematics
- Banked items available in an online application, linked to and aligned with content standards
- Provided additional degree of standardization
- Process still allows for writing of customized MOs, as needed
22 Mastery Objective Banks
- In year 4, baseline MO measurement was added
- Teachers take stock of where a student is, without prompts, at the beginning of the year on each proposed MO
- This helps to ensure that students are learning and assessed on skills and knowledge that have not already been mastered
- Year 5 added Science MO Bank
23 Mastery Objective Banks
24 Mastery Objective Banks
25 Mastery Objective Banks
26 Mastery Objective Banks
27 Mastery Objective Banks
28 National Alternate Assessment Center (NAAC)
- Alignment Study of the Alt-MSA
29 NAAC Alt-MSA Alignment Study
- Conducted by staff from University of North Carolina at Charlotte and Western Carolina University from March to August 2007
- Study was an investigation of the alignment of Alt-MSA Mastery Objectives in Reading and Mathematics to grade-level content standards
30 NAAC Alt-MSA Alignment Study
- Eight (8) criteria used to evaluate
- Developed in collaboration by content experts, special educators, and measurement experts at University of North Carolina at Charlotte (Browder, Wakeman, Flowers, Rickleman, Pugalee, Karvonen, 2006)
- A stratified random sampling method (stratified on grade level) was used to select the portfolios, grades 3-8 and 10; 225 reading/231 mathematics
31 Alignment Results by Criterion
- Criterion 1: The content is academic and includes the major domains/strands of the content area as reflected in state and national standards (e.g., reading, math, science)
- Outcome
- Reading: 99% of MOs were rated academic
- Math: 94% of MOs were rated academic
32 Alignment Results by Criterion
- Criterion 2: The content is referenced to the student's assigned grade level (based on chronological age)
- Outcome
- Reading: 82% of the MOs reviewed were referenced to a grade-level standard (2.0% were not referenced to a grade-level standard; 16% were referenced to off-grade standards (K-2), which addressed the standards of phonics and phonemic awareness)
- Math: 97% were referenced to a grade-level standard
33 Alignment Results by Criterion
- Criterion 3: The focus of achievement maintains fidelity with the content of the original grade-level standards (content centrality) and, when possible, the specified performance
- Outcome
- Reading: 99% of MOs rated as far or near for content centrality, 92% of MOs rated partial or full performance centrality, and 90% rated as being linked to the MO
- Math: 92% of MOs rated as far in content centrality, 92% of MOs rated partial performance centrality, and 92% rated as being linked to the MO
34 Alignment Results by Criterion
- Criterion 4: The content differs from grade level in range, balance, and Depth of Knowledge (DOK), but matches high expectations set for students with significant cognitive disabilities
- Outcome
- Reading: All the reading standards had multiple MOs that were linked to the standard, and although 73% were rated at the depth of knowledge level of memorize/recall, there were MOs rated at the higher depth of knowledge levels (i.e., comprehension, application, and analysis)
- Math: MOs were aligned to all grade-level standards and distributed across all levels of depth of knowledge except the lowest level (i.e., attention), with the largest percentage of MOs at the performance and analysis/synthesis/evaluation levels
35 Alignment Results by Criterion
- Criterion 5: There is some differentiation in achievement across grade levels or grade bands
- Outcome
- Reading: Overall, the reading MOs show good differentiation across grade levels
- Math: While there is some limited differentiation, some items were redundant from lower to upper grades
- Criterion 6: The expected achievement for students is for the students to show learning of grade-referenced academic content
- Outcome: The Alt-MSA score is not augmented with program factors. However, in cases where more intrusive prompting is used, the level of inference that can be made is limited
36 Alignment Results by Criterion
- Criterion 7: The potential barriers to demonstrating what students know and can do are minimized in the assessment
- Outcome: Alt-MSA minimizes barriers for the broadest range of heterogeneity within the population, because flexibility is built into the tasks teachers select (92% of the MOs were accessible at an abstract level of symbolic communication, while the remaining MOs were accessible to students at a concrete level of symbolic communication)
- Criterion 8: The instructional program promotes learning in the general curriculum
- Outcome: The Alt-MSA Handbook is well developed and covers the grade-level domains that are included in the alternate assessment. Some LEAs in MD have exemplary professional development materials
37 Study Summary
- Overall, the Alt-MSA demonstrated good access to the general curriculum
- The Alt-MSA was well developed and covered the grade-level standards
- The quality of the professional development materials varied across the different counties
38 Technical Documentation of the Alt-MSA
39 Sources
- Alt-MSA Technical Manuals (2004, 2005, 2006)
- Schafer, W. D. (2005). Technical documentation for alternate assessments. Practical Assessment, Research and Evaluation, 10(10). Available at PAREonline.net.
- Marion, S. F., & Pellegrino, J. W. (2007). A validity framework for evaluating the technical adequacy of alternate assessments. Educational Measurement: Issues and Practice, 25(4), 47-57.
- Report from the National Alternate Assessment Center from a panel review of the Alt-MSA
- Contracted technical studies on Alt-MSA
40 Validity of the Criterion Is Always Important
- To judge proficiency in any assessment, a student's score is compared with a criterion score
- Regular assessment: standard setting generates a criterion score for all examinees
- Regular assessment: the criterion score is assumed appropriate for everyone
- It defines an expectation for minimally acceptable performance
- It is interpreted in behavioral terms through achievement level descriptions
41 Criterion in Alternate Assessment
- A primary question in alternate assessment is: Should the same criterion score apply to everyone?
- Our answer was no, because behaviors that imply success for some students imply failure for others
- This implies that flexible criteria are needed to judge the success of a student or of a teacher, unlike the regular assessment
42 Criterion Validity
- The quality of criteria is documented for the regular assessment through a standard-setting study
- When criteria vary, each different criterion needs to be documented
- So we need to consider both score and criterion reliability and validity for Alt-MSA
43 Technical Research Agenda
- There are four sorts of technical research we should undertake:
- Reliability of Criteria
- Reliability of Scores
- Validity of Criteria
- Validity of Scores
- We will describe some examples and possibilities
for each.
44 Reliability of Criteria
- Could see if the criteria (MOs) are internally consistent for a student in terms of difficulty, cognitive demand, and/or levels of the content elements they represent
- Could do that for, say, 9 samples of students: L-M-H degrees of challenge by L-M-H grade levels
- Degree of challenge might be assessed by age of identification of disability or by location in the extended standards of last year's MOs
45 Reliability of Scores
- 2007 rescore of a 5% sample of 2006 portfolios (n=266) showed agreement rates of 82%-89% for reading and 83%-89% for math
- A NAAC review concluded the inter-rater evidence of scorer reliability is strong
- Amount of evidence could be evaluated using Smith's (2003) approach of modeling error using the binomial distribution to get decision accuracy estimates
46 Decision Accuracy Study
- Assume each student produces a sample of size 10 from a binomial population of MOs
- Can use the binomial distribution to generate the probabilities of all outcomes (X = 0 to 10) for any p
- For convenience, use the midpoints of ten equally-spaced intervals for p (.05 to .95)
- Using X = 0%-50% for Basic, X = 60%-80% for Proficient, and X = 90%-100% for Advanced yields:
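The binomial model described above can be sketched in Python; this is an illustrative computation, and the function name is ours rather than part of the Alt-MSA documentation:

```python
from math import comb

def classification_probs(p, n=10):
    """Probability a student with true mastery rate p is classified
    Basic (X <= 5), Proficient (6 <= X <= 8), or Advanced (X >= 9),
    where X ~ Binomial(n, p) counts MOs mastered out of n = 10."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    return sum(pmf[:6]), sum(pmf[6:9]), sum(pmf[9:])

# A student with true p = .95 is classified Advanced about 91% of the time,
# matching the first row of the table that follows
basic, proficient, advanced = classification_probs(0.95)
```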
47 Classification Probabilities for Students with Various p's
- p Basic Proficient Advanced
- .95 .0001 .0861 .9138
- .85 .0098 .4458 .5443
- .75 .0781 .6779 .2440
- .65 .2485 .6656 .0860
- .55 .4956 .4812 .0232
- .45 .7384 .2571 .0045
- .35 .9052 .0944 .0005
- .25 .9803 .0207 .0000
- .15 .9986 .0013 .0000
- .05 1.000 .0000 .0000
48 3x3 Decision Accuracy
- Collapsing across p with True Basic = .05-.55, True Proficient = .65-.85, True Advanced = .95
- Classification
- True Level  Basic  Proficient  Advanced  Total
- Advanced  .0000  .0086  .0914  .1000
- Proficient  .0336  .1789  .0874  .3000
- Basic  .5118  .0855  .0028  .6000
- P(Accurate)  .5118  .1789  .0914  .7821
- This assumes equally-weighted p's
49 Empirically Weighted p's
- Mastery Objectives Mastered in 2006 for Reading and Math (N = 4,851 students)
- Percent Mastered  Reading Percent  Math Percent
- 100 21.8 26.4
- 90 16.1 16.7
- 80 11.6 10.3
- 70 8.0 7.8
- 60 6.7 6.1
- 50 5.5 5.8
- 40 4.9 4.6
- 30 5.1 4.1
- 20 4.7 4.1
- 10 6.7 6.3
- 0 6.9 7.7
50 3x3 Decision Accuracy with Empirical Weights - Reading
- Observed Achievement Level
- True Level Basic Proficient Advanced Total
- Advanced .0000 .0258 .2726 .2984
- Proficient .0274 .1768 .1057 .3099
- Basic .3414 .0486 .0017 .3917
- P(Accurate) .3414 .1768 .2726 .7908
51 NCLB requires decisions in terms of Proficient/Advanced vs. Basic
- Observed Level Group - Reading
- True Level  Basic  Proficient or Advanced
- Proficient or Advanced  .0451  .9549
- Basic  .8716  .1284
- These are conditional probabilities; they sum to 1 by rows
- P(Type I error) (taking action) = .0451
- P(Type II error) (taking no action) = .1284
- These are less than Cohen's guidelines of .05 and .20
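The 2x2 collapse is simple arithmetic on the 3x3 table from the previous slide; a sketch using the reading figures (the dictionary layout and function name are ours):

```python
# 3x3 reading table: rows = true level, columns = joint probability of being
# classified [Basic, Proficient, Advanced]
reading = {
    "Basic":      [0.3414, 0.0486, 0.0017],
    "Proficient": [0.0274, 0.1768, 0.1057],
    "Advanced":   [0.0000, 0.0258, 0.2726],
}

def nclb_error_rates(table):
    """Collapse to the Basic vs Proficient-or-Advanced decision and return
    the row-conditional Type I and Type II error rates."""
    # Truly Proficient or Advanced but classified Basic -> Type I (taking action)
    pa = [p + a for p, a in zip(table["Proficient"], table["Advanced"])]
    type_i = pa[0] / sum(pa)
    # Truly Basic but classified Proficient or Advanced -> Type II (no action)
    b = table["Basic"]
    type_ii = (b[1] + b[2]) / sum(b)
    return type_i, type_ii

type_i, type_ii = nclb_error_rates(reading)
```

The same function applied to the math table on the next slides yields the .0398 and .1365 rates reported there.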
52 3x3 Decision Accuracy with Empirical Weights - Math
- Observed Achievement Level
- True Level Basic Proficient Advanced Total
- Advanced .0000 .0299 .3174 .3474
- Proficient .0256 .1676 .1014 .2946
- Basic .3092 .0472 .0017 .3581
- P(Accurate) .3092 .1676 .3174 .7942
53 NCLB requires decisions in terms of Proficient/Advanced vs. Basic
- Observed Level Group - Math
- True Level  Basic  Proficient or Advanced
- Proficient or Advanced  .0398  .9602
- Basic  .8635  .1365
- These are conditional probabilities; they sum to 1 by rows
- P(Type I error) (taking action) = .0398
- P(Type II error) (taking no action) = .1365
- These are also less than Cohen's guidelines of .05 and .20
54 Reliability of Scores: Conclusions
- Decision accuracy of Reading is 79.1%
- Decision accuracy of Math is 79.4%
- Misclassification probabilities are:
- False  Reading  Math
- Prof.  12.8%  13.6%
- Not Prof.  4.5%  4.0%
- These are within Cohen's guidelines
55 Validity of Criteria: Content Evidence
- Could study MO development review process for 9 samples of students, L-M-H degrees of challenge by L-M-H grade levels
- Could map student progress along content standard strands over time
- Could evaluate and monitor the use of the bank
- Could survey parents: are MOs too modest, about right, or too idealistic?
- MSDE will conduct a new cut-score study
56 Validity of Criteria: Quantitative Evidence
- For n=267 same-student portfolio pairs from 2006 and 2007:
- 95% of 2007 reading MOs
- 90% of 2007 math MOs
- were completely new or more demanding than the respective student's 2006 MOs (suggesting growth)
- Alternate standard-setting studies could generate evidence about validity of the existing (or resulting) criteria
57 Possible Alternate Standard-Setting Study Approaches
- Develop percentage cut-scores for groups with different degrees of disability (e.g., modified Angoff); articulate vertically and horizontally
- Establish criterion groups using an external criterion and identify cut scores that minimize classification errors (contrasting groups)
- Set cutpoints that match the percentages of students in the achievement levels in the general population (equipercentile)
58 Validity of Criteria: Consequential Evidence
- Could study IEPs to see if they have become more oriented toward academic goals over time
- Could study the ability of Alt-MSA to drive instruction, e.g., do the enacted content standards move toward the assessed content standards?
59 Validity of Scores: Content Evidence
- Could study how well raters can categorize
samples of artifacts into the content strand
elements their MOs were designed to represent
60 Validity of Scores: Consequential Evidence
- Could survey stakeholders
- How have the scores been used?
- How have the scores been misused?
61 Two Philosophical Issues
- Justification is needed for implementing flexible performance expectations all the way down to the individual student
- Justification is needed for using standardized percentages for success categories across the flexible performance expectations
62 Contact Information
- Sharon Hall: Sharon.Hall@ed.gov
- Martin Kehe: mkehe@msde.state.md.us
- William Schafer: wschafer@umd.edu