Title: Standard Setting and Linking Examinations to the CEFR
3. CEFR
4. Bad Practice
Good Practice
5. Terminology
Alignment
Anchoring
Calibration
Projection
Scaling
Comparability
Linking
Concordance
Benchmarking
Equating
Prediction
Moderation
6. Milestones in Comparability
"The proof and measurement of association between two things" (1904)
Spearman
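Spearman's 1904 paper gave measurement its first formal index of association, and the rank correlation coefficient named after him is still the standard tool. A minimal sketch of computing Spearman's rho from scratch; the score lists are invented for illustration:

```python
# Spearman's rho: Pearson correlation of the rank vectors.
def ranks(values):
    """Average 1-based ranks, with ties sharing the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    rank = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            rank[order[k]] = avg
        i = j + 1
    return rank

def spearman_rho(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

test_a = [12, 15, 19, 22, 25, 30]   # invented scores on one test
test_b = [40, 44, 43, 55, 60, 62]   # invented scores on another
print(round(spearman_rho(test_a, test_b), 3))  # about 0.943
```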
7. Milestones in Comparability
"Scores on two or more tests may be said to be comparable for a certain population if they show identical distributions for that population."
Flanagan
Spearman
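Flanagan's definition of comparability as identical distributions is the idea behind equipercentile linking: map each form X score to the form Y score with the same percentile rank, so that the linked scores share one distribution. A minimal sketch under that reading, with invented score distributions:

```python
# Equipercentile linking in the spirit of Flanagan's definition.
def percentile_rank(scores, x):
    """Fraction of the group below x, plus half of those exactly at x."""
    below = sum(s < x for s in scores)
    at = sum(s == x for s in scores)
    return (below + at / 2) / len(scores)

def equipercentile(x, scores_x, scores_y):
    """Form Y score whose percentile rank is closest to that of x on form X."""
    p = percentile_rank(scores_x, x)
    return min(sorted(set(scores_y)),
               key=lambda y: abs(percentile_rank(scores_y, y) - p))

form_x = [10, 12, 12, 15, 18, 20, 21, 24]   # invented form X scores
form_y = [30, 33, 35, 35, 40, 44, 47, 50]   # invented form Y scores
print(equipercentile(15, form_x, form_y))    # form Y equivalent of X = 15 -> 35
```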
8. Milestones in Comparability
- Scales, norms, and equivalent scores
- Equating
- Calibration
- Comparability
Angoff
Flanagan
Spearman
9. Milestones in Comparability
Linking
Mislevy, Linn
Angoff
Flanagan
Spearman
10. Milestones in Comparability
Alignment
Webb, Porter
Mislevy, Linn
Angoff
Flanagan
Spearman
11. Alignment
- Alignment refers to the degree of match between test content and the standards
- Dimensions of alignment:
- Content
- Depth
- Emphasis
- Performance
- Accessibility
12. Alignment
- Alignment is related to content validity
- Specification (Manual Ch. 4)
- "Specification can be seen as a qualitative method. There are also quantitative methods for content validation, but this manual does not require their use." (p. 2)
- 24 pages of forms
- Outcome: "A chart profiling coverage graphically in terms of levels and categories of CEF." (p. 7)
- Crocker, L. et al. (1989). Quantitative Methods for Assessing the Fit Between Test and Curriculum. Applied Measurement in Education, 2(2), 179-194.
Why?
How?
13. Alignment (Porter, 2004)
www.ncrel.org
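Porter's procedure codes test items and standards into the same content-by-cognitive-demand matrix and compares the two cell-proportion profiles; the commonly cited index is 1 minus half the summed absolute differences, so identical profiles score 1.0 and disjoint profiles 0.0. A sketch with invented cell counts (the function name is mine):

```python
# Porter-style alignment index over matched content cells.
def porter_alignment(test_counts, standards_counts):
    tx = sum(test_counts)
    ty = sum(standards_counts)
    x = [c / tx for c in test_counts]        # test cell proportions
    y = [c / ty for c in standards_counts]   # standards cell proportions
    return 1 - sum(abs(a - b) for a, b in zip(x, y)) / 2

# One count per cell (e.g. topic x depth-of-knowledge), same order in both.
test = [8, 4, 2, 0, 6]
standards = [5, 5, 3, 3, 4]
print(round(porter_alignment(test, standards), 3))  # 0.75
```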
14. Milestones in Comparability
Linking
Webb, Porter
Mislevy, Linn
Angoff
Flanagan
Spearman
15. Mislevy & Linn: Linking Assessments
Equating ≠ Linking
16. The Good, the Bad
17. Model-Data Fit
18. Model-Data Fit
19. Model-Data Fit
Reality
Models
20. Sample-Free Estimation
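"Sample-free" refers to the Rasch model's invariance claim: because the log-odds of success is theta minus b, the difference between two persons' log-odds is the same on every item, so person comparisons do not depend on which items (or which examinee sample) happened to be used. A small numerical illustration with invented parameter values:

```python
import math

def p_correct(theta, b):
    """Rasch model probability of a correct response."""
    return 1 / (1 + math.exp(-(theta - b)))

def log_odds(theta, b):
    p = p_correct(theta, b)
    return math.log(p / (1 - p))

theta_1, theta_2 = 1.5, 0.5
for b in (-2.0, 0.0, 2.0):   # easy, medium, hard item
    diff = log_odds(theta_1, b) - log_odds(theta_2, b)
    print(f"b={b:+.1f}  difference in log-odds = {diff:.3f}")  # always 1.000
```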
21. The ruler (θ scale)
22. The ruler (θ scale)
23. The ruler (θ scale)
24. The ruler (θ scale)
boiling water
absolute zero
25. The ruler (θ scale)
F = 1.8C + 32;  C = (F - 32) / 1.8
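The temperature formulas above are the stock analogy for linear linking: two scales for the same construct that differ only by slope and intercept, with an exactly invertible transformation. A sketch using the anchor points from the previous slide:

```python
# Linear linking, temperature-style: slope 1.8, intercept 32.
def c_to_f(c):
    return 1.8 * c + 32

def f_to_c(f):
    return (f - 32) / 1.8

assert c_to_f(100) == 212                      # boiling water
assert round(f_to_c(-459.67), 2) == -273.15    # absolute zero
print("round trip:", f_to_c(c_to_f(37.0)))     # recovers 37.0
```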
26. Mislevy & Linn: Linking Assessments
27. Standard Setting
28. The Ugly
29. Fact 1
- Human judgment is the epicenter of every standard-setting method (Berk, 1995)
30. When Ugliness turns to Beauty
31. When Ugliness turns to Beauty
32. Fact 2
- The cut-off points on the latent continuum do not possess any objective reality outside of and independently of our minds. They are mental constructs, which can differ from person to person.
33. Consequently
- Whether the levels themselves are set at the proper points is a most contentious issue and depends on the defensibility of the procedures used for determining them. (Messick, 1994)
34. Defensibility
Evidence
Claims
35. Defensibility: Claims vs. Evidence
- National Standards
- Understands manuals for devices used in their everyday life
- CEF A2
- Can understand simple instructions on equipment encountered in everyday life, such as a public telephone (p. 70)
36. Defensibility: Claims vs. Evidence
- Cambridge ESOL
- DIALANG
- Finnish Matriculation
- CIEP (TCF)
- CELI, Università per Stranieri di Perugia
- Goethe-Institut
- TestDaF Institut
- WBT (Zertifikat Deutsch)
75% of the institutions provide only claims about the items' CEF level
37. Defensibility: Claims vs. Evidence
- Common Practice (Buckendahl et al., 2000)
- External evaluation of the alignment of 12 tests by 2 publishers
- Publisher reports
- No description of the exact procedure followed
- Reports include only the match between items and standards
- Evaluation study
- At least 10 judges per test
- Comparison results
- % of agreement: 26-55% (see the sketch below)
- Overestimation of the match by test publishers
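One way to read the 26-55% figure: for each item, compare the publisher's claimed standard with the panel's majority coding and report the share of matches. A minimal sketch with invented labels (ties in the majority vote are broken arbitrarily):

```python
# Percent agreement between publisher claims and panel majority codings.
def percent_agreement(publisher, panel_votes):
    hits = 0
    for claim, votes in zip(publisher, panel_votes):
        majority = max(set(votes), key=votes.count)  # panel's modal coding
        hits += (majority == claim)
    return 100 * hits / len(publisher)

publisher = ["S1", "S2", "S1", "S3"]                 # claimed standard per item
panel = [["S1", "S1", "S2"], ["S2", "S2", "S2"],
         ["S3", "S3", "S1"], ["S3", "S1", "S3"]]     # judges' codings per item
print(percent_agreement(publisher, panel))           # 75.0
```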
38. Standards for Educational and Psychological Testing, 1999
- Standard 1.7
- When a validation rests in part on the opinions or decisions of expert judges, observers or raters, procedures for selecting such experts and for eliciting judgments or ratings should be fully described. The description of procedures should include any training and instruction provided, should indicate whether participants reached their decisions independently, and should report the level of agreement reached. If participants interacted with one another or exchanged information, the procedures through which they may have influenced one another should be set forth.
39. Evaluation Criteria
- Hambleton, R. (2001). Setting Performance Standards on Educational Assessments and Criteria for Evaluating the Process. In G. Cizek (Ed.), Setting Performance Standards: Concepts, Methods and Perspectives (pp. 89-116). Lawrence Erlbaum Associates.
- A list of 20 questions as evaluation criteria
- Planning / Documentation: 4 (20%)
- Judgments: 11 (55%)
- Standard-Setting Method: 5 (25%)
40. Judges
- Because standard-setting inevitably involves human judgment, a central issue is who is to make these judgments, that is, whose values are to be embodied in the standards. (Messick, 1994)
41. Selection of Judges
- The judges should have the right qualifications, but other criteria such as
- occupation,
- working experience,
- age,
- sex
- may also be taken into account: "although ensuring expertise is critical, sampling from relevant different constituencies may be an important consideration if the testing procedures and passing scores are to be politically acceptable" (Maurer & Alexander, 1992).
42. Number of Judges
- Livingston & Zieky (1982) suggest that the number of judges be no fewer than 5.
- Based on court cases in the USA, Biddle (1993) recommends using 7 to 10 subject-matter experts in the judgment session.
- As a general rule, Hurtz & Hertz (1999) recommend sampling 10 to 15 raters.
- 10 judges is the minimum number, according to the Manual (p. 94).
43. Training Session
- The weakest point
- How much? "Until it hurts" (Berk, 1995)
- Main focus: intra-judge consistency
- Evaluation forms (Hambleton, 2001)
- Feedback (see the sketch below)
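A common intra-judge consistency check during training is to correlate each judge's Angoff ratings (predicted proportions correct) with the items' empirical p-values; a low correlation flags a judge who needs more feedback. A sketch with invented ratings and p-values:

```python
# Intra-judge consistency: Pearson correlation of Angoff ratings with p-values.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

angoff_ratings = [0.80, 0.65, 0.55, 0.40, 0.30]   # judge's estimated difficulties
p_values = [0.75, 0.70, 0.50, 0.45, 0.25]          # observed proportions correct
print(round(pearson(angoff_ratings, p_values), 3))
```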
44. Training Session: Feedback Form
45. Training Session: Feedback Form
46. Standard-Setting Method
- Good Practice
- The most appropriate
- Due diligence
- Field tested
- Reality check
- Validity evidence
- More than one
47. Standard-Setting Method
- Probably the only point of agreement among standard-setting gurus is that there is hardly any agreement between the results of any two standard-setting methods, even when applied to the same test under seemingly identical conditions. (Berk, 1995)
48. He that increaseth knowledge increaseth sorrow. (Ecclesiastes 1:18)
Examinee-centered methods
B1/B2
Test-centered methods
49. He that increaseth knowledge increaseth sorrow. (Ecclesiastes 1:18)
Test-centered methods
B1/B2
Examinee-centered methods
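As an illustration of the examinee-centered family, a contrasting-groups sketch: teachers first classify examinees as B1 or B2, and the B1/B2 cut score is placed where misclassification against those judgments is smallest. Scores and classifications below are invented:

```python
# Contrasting-groups method: choose the cut minimizing misclassification.
def contrasting_groups_cut(b1_scores, b2_scores):
    candidates = sorted(set(b1_scores + b2_scores))
    def misclassified(cut):
        # B1 examinees at/above the cut plus B2 examinees below it
        return sum(s >= cut for s in b1_scores) + sum(s < cut for s in b2_scores)
    return min(candidates, key=misclassified)

b1 = [22, 25, 28, 30, 31, 33]   # scores of examinees judged B1
b2 = [29, 34, 36, 38, 41, 44]   # scores of examinees judged B2
print(contrasting_groups_cut(b1, b2))  # 34
```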
50. Instead of a Conclusion
- In sum, it may seem that providing valid grounds for valid inferences in standards-based educational assessment is a costly and complicated enterprise. But when the consequences of the assessment affect accountability decisions and educational policy, this needs to be weighed against the costs of uninformed or invalid inferences. (Messick, 1994)
Butterfly Effect
Change one thing, change everything!
51. Instead of a Conclusion
- The chief determiner of performance standards is not truth; it is consequences. (Popham, 1997)
Butterfly Effect
Change one thing, change everything!
52. Instead of a Conclusion
- Perhaps by the year 2000, the collaborative efforts of measurement researchers and practitioners will have raised the standard on standard-setting practices for this emerging testing technology. (Berk, 1996)
Butterfly Effect
Change one thing, change everything!
53. Rise up, Magyar!
A coward and a lowly bastard
Is he, who dares not raise the standard!
54. Thanks!
Rise up, Magyar!
A coward and a lowly bastard
Is he, who dares not raise the standard!