Sevres

1

Sevres and Munich benchmarking conferences: some
personal observations
Dr Neil Jones
ALTE Conference, Cardiff, November 2005

2
A report on the Sevres analysis

Is available at http://www.coe.int/T/E/Cultural_Co-operation/education/Languages/Language_Policy/Common_Framework_of_Reference/SevresreportNJ.pdf
(Follow links for CEFR, then click on
"Illustrations of the European levels of language
proficiency",
then on "Report on analysis of rating data".)
The following questions are addressed:
1. What is the best estimate of the CEFR level of
each extract?
2. How well do raters agree in their ratings?
3. What is the effect of plenary discussion on
the extent of agreement?
4. How do raters understand and use the rating
criteria?
5. Does agreement improve over time?
6. Do rater groups perform differently?

3
A benchmarking conference
True ability
Shared construct
CEFR
Learners ranked on shared criteria
and their CEFR levels agreed
Own constructs
Sample of performance
Raters
5
A benchmarking conference
True ability
Shared construct
CEFR
Learners ranked on shared criteria
and their CEFR levels agreed
Own constructs
Sample of performance
Guidance
Raters
6
Issues

What formats can work well for benchmarking?
How can/should a benchmarking exemplar differ
from a standardisation video for a given exam?
Which parts of the CEFR reference scales are
most useful for rating?
Is developing a shared understanding of the
construct exactly the same thing as standardising
raters' agreement on CEFR levels?

7
Using the CEFR scales: evidence from analysis of
data

The speaking assessment criteria are not
differentiated.
A generalizability study from Sevres
Munich data not yet analysed

8
Generalizability study (Rating criteria are not
differentiated)
Rating criteria are Range, Accuracy, Fluency,
Interaction, Coherence, Global rating
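
One way to read "criteria are not differentiated" is that, once each person's overall level is accounted for, little variance is left for the criteria to explain. The sketch below is not the analysis in the Sevres report; it is a minimal persons x criteria variance decomposition under stated assumptions: ratings are coded numerically (for example A1=1 to C2=6), already averaged over raters, and the data shown are invented for illustration. The function name `variance_components` is hypothetical.

```python
from statistics import mean


def variance_components(scores):
    """Estimate variance components for a crossed persons x criteria design.

    scores[p][c] is the (assumed rater-averaged) numeric rating of person p
    on criterion c. With one score per cell, the person-by-criterion
    interaction cannot be separated from error, so both sit in the residual.
    Returns (var_person, var_criterion, var_residual).
    """
    n_p, n_c = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    person_means = [mean(row) for row in scores]
    crit_means = [mean(row[c] for row in scores) for c in range(n_c)]

    # Sums of squares for the two main effects and the residual.
    ss_p = n_c * sum((m - grand) ** 2 for m in person_means)
    ss_c = n_p * sum((m - grand) ** 2 for m in crit_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_c

    ms_p = ss_p / (n_p - 1)
    ms_c = ss_c / (n_c - 1)
    ms_res = ss_res / ((n_p - 1) * (n_c - 1))

    # Expected-mean-square solutions; negative estimates are clipped to zero.
    var_p = max((ms_p - ms_res) / n_c, 0.0)
    var_c = max((ms_c - ms_res) / n_p, 0.0)
    return var_p, var_c, ms_res


# Invented example: four persons at levels 1..4, and five criteria that
# differ only by a fixed offset (Range, Accuracy, Fluency, Interaction,
# Coherence). No person has an uneven profile.
offsets = (0.0, -0.5, 0.5, 0.5, 0.0)
example = [[level + off for off in offsets] for level in (1, 2, 3, 4)]
var_person, var_criterion, var_residual = variance_components(example)
print(var_person, var_criterion, var_residual)
```

In this invented data set almost all variance belongs to persons, a little to criteria, and essentially none to the residual, which is the pattern one would expect if raters apply one overall judgement across all criteria.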
9
So what are raters really doing?

First form an overall impression of the level.
(this was the procedure adopted in Munich)
Then look at the criteria to confirm/rationalise
the decision.
The criteria are generally not concrete enough to
differentiate between specific performances.
Yes, raters do judge some criteria more harshly
than others

10
Relative difficulty of rating criteria
11
Relative difficulty of rating criteria

Raters do judge some criteria more harshly than
others, but they do the same for everybody!
Munich data not analysed but seemed very similar
in this respect
Should the accuracy scale be adjusted down and
the fluency/interaction scales adjusted up?
Perhaps this would not help: penalizing error and
rewarding communication is part of feeling
comfortable about our overall decision.
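
The claim that raters judge some criteria more harshly "but do the same for everybody" can be checked with a simple calculation: for each performance, subtract its own mean rating from each criterion rating, then look at the average and the spread of those offsets per criterion. The sketch below assumes numeric coding of CEFR levels and uses invented ratings; `criterion_offsets` is a hypothetical helper, not something taken from the Sevres report.

```python
from statistics import mean, pstdev

CRITERIA = ["Range", "Accuracy", "Fluency", "Interaction", "Coherence"]


def criterion_offsets(ratings):
    """Return {criterion: (mean offset, spread of offset)}.

    ratings is a list of dicts, one per performance, mapping each criterion
    to a numeric level. An offset is the criterion rating minus that
    performance's own mean across criteria. A negative mean offset suggests
    the criterion is judged more harshly; a small spread suggests the
    harshness is the same for every performance.
    """
    per_criterion = {c: [] for c in CRITERIA}
    for rating in ratings:
        centre = mean(rating[c] for c in CRITERIA)
        for c in CRITERIA:
            per_criterion[c].append(rating[c] - centre)
    return {c: (mean(v), pstdev(v)) for c, v in per_criterion.items()}


# Invented ratings for three performances at different overall levels.
example_ratings = [
    {"Range": 3, "Accuracy": 2, "Fluency": 4, "Interaction": 3, "Coherence": 3},
    {"Range": 5, "Accuracy": 4, "Fluency": 6, "Interaction": 5, "Coherence": 5},
    {"Range": 2, "Accuracy": 1, "Fluency": 3, "Interaction": 2, "Coherence": 2},
]
for criterion, (offset, spread) in criterion_offsets(example_ratings).items():
    print(criterion, offset, spread)
```

If the spread were large for a criterion, its harshness would vary between performances, and a fixed adjustment of that scale up or down would be harder to justify.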

12
Focus on salience

When we form an overall impression of the level
of a performance, what are we focussing on?
The salient features of the level: what
distinguishes it from a higher or lower level
(an exercise based on this in Munich)
Fluency is a scale with one point on it described
as "fluent".
My attempt at a minimal level description

13
NJ's minimal table of salient features
14
NJ's even shorter list
15
Do all rating criteria have the same status?

Range appears to be linked to the tasks one can
do: it is almost a definition of a
functionally-defined proficiency scale.
(Hence the problems with format, if the task is
below a subject's level.)
Interaction and coherence are dependent on the task:
you can't demonstrate more than the task
demands.
Fluency and accuracy are inversely related to the
demands of a task.
But there may be a trade-off between fluency and
accuracy.

16
Simple model of speaking performance
Subject at same level as task: even profile
17
Simple model of speaking performance
Subject at higher level than task: can't show
true range, interaction, coherence, but good
accuracy, fluency
18
Simple model of speaking performance
Subject below level of task: shows true range,
interaction, coherence, but gives poor impression of
accuracy, fluency
19
Simple model of speaking performance
But may manage greater accuracy at expense of
fluency
20
Simple model of speaking performance
Or vice-versa
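
The qualitative model on these slides can be written down as a toy function. Everything numeric here is an assumption made for illustration: levels are on an arbitrary numeric scale, the linear form and the `sensitivity` and `tradeoff` parameters are hypothetical, and nothing is fitted to the Sevres or Munich data.

```python
def observed_profile(subject, task, tradeoff=0.0, sensitivity=0.5):
    """Toy version of the simple model of speaking performance.

    subject, task: numeric levels on the same arbitrary scale.
    tradeoff: amount shifted from fluency to accuracy (negative values
        shift the other way).
    sensitivity: how strongly accuracy and fluency react to the gap
        between task demands and subject level (assumed, not estimated).
    """
    # Range, interaction and coherence are capped by the task:
    # a subject cannot demonstrate more than the task demands.
    shown = min(subject, task)

    # Accuracy and fluency are inversely related to task demands:
    # an easy task flatters them, a hard task depresses them.
    gap = task - subject
    base = subject - sensitivity * gap

    return {
        "range": shown,
        "interaction": shown,
        "coherence": shown,
        "accuracy": base + tradeoff,
        "fluency": base - tradeoff,
    }


print(observed_profile(3, 3))                # task at the subject's level
print(observed_profile(4, 2))                # subject above the task
print(observed_profile(2, 4))                # subject below the task
print(observed_profile(2, 4, tradeoff=0.5))  # accuracy at expense of fluency
```

With equal levels the profile is even; with the subject above the task, range, interaction and coherence are capped at the task level while accuracy and fluency look strong; with the subject below the task, the reverse holds.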
21
My personal opinion

One should aim for a shared understanding of the
construct, which includes a shared awareness of
how rating works.
Spending more time comparing performances (rather
than rating them) would help.
It's vital to grasp the salient features of a
level.
CEFR has much useful text about this.
Detailed study of the text of the rating criteria
may not be the best way of standardising
perceptions.

23
Some other analysis results

Degree of agreement before and after discussion
Performance by rater group

24
Agreement before and after discussion
25
Performance by rater group
