When Ugliness turns to Beauty - PowerPoint PPT Presentation

About This Presentation

Title: When Ugliness turns to Beauty

Author: F. Kaftandjieva

Slides: 55

Transcript and Presenter's Notes
3
CEFR
4
Bad Practice
Good Practice
5
Terminology
Alignment
Anchoring
Calibration
Projection
Scaling
Comparability
Linking
Concordance
Benchmarking
Equating
Prediction
Moderation
6
Milestones in Comparability
The proof and measurement of association between
two things
Spearman
7
Milestones in Comparability
Scores on two or more tests may be said to be
comparable for a certain population if they show
identical distributions for that population.
Flanagan
Spearman
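Flanagan's definition suggests a simple operationalization: if comparable scores must show identical distributions for the population, the most basic form of equating is a linear transformation matching one test's mean and standard deviation to the other's. A minimal sketch, with invented score data for illustration:

```python
import statistics

def linear_equate(score_x, scores_x, scores_y):
    """Map a score on test X onto the scale of test Y by matching
    the two score distributions' means and standard deviations."""
    mx, sx = statistics.mean(scores_x), statistics.stdev(scores_x)
    my, sy = statistics.mean(scores_y), statistics.stdev(scores_y)
    return my + (sy / sx) * (score_x - mx)

# Invented example: test X scores centre on 50, test Y on 70.
x = [40, 45, 50, 55, 60]
y = [60, 65, 70, 75, 80]
print(linear_equate(50, x, y))  # the X mean maps onto the Y mean: 70.0
```

Matching only the first two moments is the weakest reading of "identical distributions"; equipercentile methods go further and match the full score distribution.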
8
Milestones in Comparability
  • Scales, norms, and equivalent scores
  • Equating
  • Calibration
  • Comparability

Angoff
Flanagan
Spearman
9
Milestones in Comparability
Linking
Mislevy, Linn
Angoff
Flanagan
Spearman
10
Milestones in Comparability
Alignment
Webb, Porter
Mislevy, Linn
Angoff
Flanagan
Spearman
11
Alignment
  • Alignment refers to the degree of match between
    test content and the standards
  • Dimensions of alignment
  • Content
  • Depth
  • Emphasis
  • Performance
  • Accessibility

12
Alignment
  • Alignment is related to content validity
  • Specification (Manual Ch. 4)
  • Specification can be seen as a qualitative
    method. There are also quantitative methods for
    content validation but this manual does not
    require their use. (p. 2)
  • 24 pages of forms
  • Outcome: a chart profiling coverage graphically
    in terms of levels and categories of the CEF (p. 7)
  • Crocker, L. et al. (1989). Quantitative Methods
    for Assessing the Fit Between Test and
    Curriculum. Applied Measurement in
    Education, 2(2), 179-194.

Why?
How?
13
Alignment (Porter, 2004)
www.ncrel.org
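Porter's approach quantifies alignment as the overlap between two content matrices: the proportion of test points and the proportion of standards falling in each content-by-cognitive-demand cell. The index is 1 − Σ|x_i − y_i| / 2, ranging from 0 (no overlap) to 1 (perfect alignment). A sketch with invented cell proportions:

```python
def porter_alignment(test_props, standard_props):
    """Porter's alignment index: 1 - sum(|x_i - y_i|) / 2, where the
    arguments are matching lists of cell proportions (each summing
    to 1) from the test and standards content matrices."""
    assert abs(sum(test_props) - 1) < 1e-9
    assert abs(sum(standard_props) - 1) < 1e-9
    return 1 - sum(abs(x - y)
                   for x, y in zip(test_props, standard_props)) / 2

# Invented 4-cell example: the test over-weights the first cell.
test_matrix = [0.40, 0.30, 0.20, 0.10]
standards_matrix = [0.25, 0.25, 0.25, 0.25]
print(porter_alignment(test_matrix, standards_matrix))
```

The cell layout (levels by categories, in CEF terms) is an assumption here; Porter's own applications use topic-by-cognitive-demand matrices.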
14
Milestones in Comparability
Linking
Webb, Porter
Mislevy, Linn
Angoff
Flanagan
Spearman
15
Mislevy & Linn: Linking Assessments
Equating ≠ Linking
16
The Good & The Bad
  • in Calibration

17
Model Data Fit
18
Model Data Fit
19
Model Data Fit
Reality
Models
20
Sample-Free Estimation
21
The ruler (θ scale)
22
The ruler (θ scale)
23
The ruler (θ scale)
24
The ruler (θ scale)
boiling water
absolute zero
25
The ruler (θ scale)
F = 1.8C + 32        C = (F - 32) / 1.8
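The temperature analogy can be made concrete: Celsius and Fahrenheit measure the same underlying quantity on different scales, and a fixed linear rule converts between them, just as a linear transformation maps equated test scales onto one another. A tiny sketch:

```python
def c_to_f(c):
    """Celsius to Fahrenheit: F = 1.8 * C + 32."""
    return 1.8 * c + 32

def f_to_c(f):
    """Fahrenheit to Celsius: C = (F - 32) / 1.8."""
    return (f - 32) / 1.8

print(c_to_f(100))  # boiling water: 212.0
print(f_to_c(32))   # freezing point: 0.0
```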
26
Mislevy & Linn: Linking Assessments
27
Standard Setting
28
The Ugly
29
Fact 1
  • Human judgment is
  • the epicenter of every standard-setting method
  • Berk, 1995

30
When Ugliness turns to Beauty
31
When Ugliness turns to Beauty
32
Fact 2
  • The cut-off points on the latent continuum do not
    possess any objective reality outside and
    independently of our minds. They are mental
    constructs, which can differ within different
    persons.

33
Consequently
  • Whether the levels themselves are set at the
    proper points is a most contentious issue and
    depends on the defensibility of the procedures
    used for determining them
  • Messick, 1994

34
Defensibility
Evidence
Claims
35
Defensibility Claims vs. Evidence
  • National Standards
  • Understands manuals for devices used in their
    everyday life
  • CEF A2
  • Can understand simple instructions on equipment
    encountered in everyday life such as a public
    telephone (p. 70)

(A2)
36
Defensibility Claims vs. Evidence
  • Cambridge ESOL
  • DIALANG
  • Finnish Matriculation
  • CIEP (TCF)
  • CELI Università per Stranieri di Perugia
  • Goethe-Institut
  • TestDaF Institut
  • WBT (Zertifikat Deutsch)

75% of the institutions provide only claims about
items' CEF levels
37
Defensibility Claims vs. Evidence
  • Common Practice (Buckendahl et al., 2000)
  • External Evaluation of the alignment of
  • 12 tests by 2 publishers
  • Publisher reports
  • No description of the exact procedure followed
  • Reports include only the match between items and
    standards
  • Evaluation study
  • At least 10 judges per test
  • Comparison results
  • % of agreement: 26%-55%
  • Overestimation of the match by test-publishers

38
Standards for educational and psychological
testing,1999
  • Standard 1.7
  • When a validation rests in part on the opinions
    or decisions of expert judges, observers, or
    raters,
    procedures for selecting such experts and for
    eliciting judgments or ratings should be fully
    described. The description of procedures should
    include any training and instruction provided,
    should indicate whether participants reached
    their decisions independently, and should report
    the level of agreement reached. If participants
    interacted with one another or exchanged
    information, the procedures through which they
    may have influenced one another should be set
    forth.
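Standard 1.7's requirement to "report the level of agreement reached" can be met with simple statistics: raw percent agreement and Cohen's kappa, which corrects for agreement expected by chance. A minimal sketch for two raters; the CEF-level ratings below are invented:

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Proportion of items on which both raters gave the same rating."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Cohen's kappa: observed agreement corrected for the agreement
    expected by chance from the two raters' marginal distributions."""
    n = len(r1)
    po = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum(c1[k] * c2[k] for k in set(r1) | set(r2)) / n ** 2
    return (po - pe) / (1 - pe)

# Invented CEF-level ratings of 8 items by two judges.
rater1 = ["A2", "A2", "B1", "B1", "B2", "A2", "B1", "B2"]
rater2 = ["A2", "B1", "B1", "B1", "B2", "A2", "B1", "B1"]
print(percent_agreement(rater1, rater2))  # 0.75
print(round(cohens_kappa(rater1, rater2), 3))  # 0.61
```

Because CEF levels are ordered, a weighted kappa that penalizes A2-vs-B2 disagreements more than A2-vs-B1 ones would arguably be a better fit; the unweighted version is shown for brevity.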

39
Evaluation Criteria
  • Hambleton, R. (2001). Setting Performance
    Standards on Educational Assessments and Criteria
    for Evaluating the Process. In Cizek, G. (Ed.),
    Setting Performance Standards: Concepts, Methods,
    and Perspectives. Lawrence Erlbaum Associates,
    89-116.
  • A list of 20 questions as evaluation criteria
  • Planning & Documentation: 4 (20%)
  • Judgments: 11 (55%)
  • Standard Setting Method: 5 (25%)

Planning
40
Judges
  • Because standard-setting inevitably involves
    human judgment, a central issue is who is to make
    these judgments, that is, whose values are to be
    embodied in the standards.
  • Messick, 1994

41
Selection of Judges
  • The judges should have
  • the right qualifications, but
  • some other criteria such as
  • occupation,
  • working experience,
  • age,
  • sex
  • may be taken into account, because although
    ensuring expertise is critical, sampling from
    relevant different constituencies may be an
    important consideration if the testing
    procedures and passing scores are to be
    politically acceptable (Maurer & Alexander,
    1992).

42
Number of Judges
  • Livingston & Zieky (1982) suggest that the
    number of judges should be no fewer than 5.
  • Based on court cases in the USA, Biddle
    (1993) recommends using 7 to 10 Subject Matter
    Experts in the judgement session.
  • As a general rule, Hurtz & Hertz (1999)
    recommend sampling 10 to 15 raters.
  • 10 judges is a minimum number, according to the
    Manual (p. 94).

43
Training Session
  • The weakest point
  • How much?
  • Until it hurts (Berk, 1995)
  • Main focus
  • Intra-judge consistency
  • Evaluation forms
  • Hambleton, 2001
  • Feedback

44
Training Session Feedback Form
45
Training Session Feedback Form
46
Standard Setting Method
  • Good Practice
  • The most appropriate
  • Due diligence
  • Field tested
  • Reality check
  • Validity evidence
  • More than one

47
Standard Setting Method
  • Probably the only point of agreement among
    standard-setting gurus is that there is hardly
    any agreement between results of any two
    standard-setting methods, even when applied to
    the same test under seemingly identical
    conditions.
  • Berk, 1995

48
He that increaseth knowledge increaseth sorrow.
(Ecclesiastes 1:18)
Examinee-centered methods
B1/B2
Test-centered methods
49
He that increaseth knowledge increaseth sorrow.
(Ecclesiastes 1:18)
Test-centered methods
B1/B2
Examinee-centered methods
50
Instead of Conclusion
  • In sum, it may seem that providing valid grounds
    for valid inferences in standards-based
    educational assessment is a costly and
    complicated enterprise. But when the consequences
    of the assessment affect accountability decisions
    and educational policy, this needs to be weighed
    against the costs of uninformed or invalid
    inferences.
  • Messick, 1994

Butterfly Effect
Change one thing, change everything!
51
Instead of Conclusion
  • The chief determiner of performance standards is
    not truth it is consequences.
  • Popham, 1997

Butterfly Effect
Change one thing, change everything!
52
Instead of Conclusion
  • Perhaps by the year 2000, the collaborative
    efforts of measurement researchers and
    practitioners will have raised the standard on
    standard-setting practices for this emerging
    testing technology.
  • Berk, 1996

Butterfly Effect
Change one thing, change everything!
53
Rise up Magyar
A coward and a lowly bastard
Is he, who dares not raise the standard!
54
Thanks!
Rise up Magyar
A coward and a lowly bastard
Is he, who dares not raise the standard!