Title: Sevres


1
  • Sevres/Munich benchmarking conferences: some
    personal observations
  • Dr Neil Jones
  • ALTE Conference Cardiff November 2005

2
A report on the Sevres analysis
  • Is available at
    http://www.coe.int/T/E/Cultural_Co-operation/education/Languages/Language_Policy/Common_Framework_of_Reference/SevresreportNJ.pdf
  • (Follow the links for the CEFR, then click on
    "Illustrations of the European levels of language
    proficiency", then on "Report on analysis of rating
    data".)
  • The following questions are addressed:
  • 1. What is the best estimate of the CEFR level of
    each extract?
  • 2. How well do raters agree in their ratings?
  • 3. What is the effect of plenary discussion on
    the extent of agreement?
  • 4. How do raters understand and use the rating
    criteria?
  • 5. Does agreement improve over time?
  • 6. Do rater groups perform differently?
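Question 2 asks how well raters agree. One common way to put a number on this, sketched here only as an illustration (the report's own agreement statistics may differ), is the proportion of extracts on which two raters give exactly the same CEFR level, and the proportion on which they are within one level. The function name `agreement` and the ratings below are hypothetical, not Sevres data.

```python
# Minimal sketch: exact and adjacent agreement between two raters on CEFR levels.
# The ratings used below are hypothetical illustrations, not data from Sevres.

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]
LEVEL_INDEX = {level: i for i, level in enumerate(CEFR_LEVELS)}


def agreement(ratings_a, ratings_b):
    """Return (exact, adjacent) agreement proportions for two raters.

    'Adjacent' counts pairs that differ by at most one CEFR level,
    so it includes the exact matches.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    diffs = [abs(LEVEL_INDEX[a] - LEVEL_INDEX[b]) for a, b in zip(ratings_a, ratings_b)]
    exact = sum(d == 0 for d in diffs) / len(diffs)
    adjacent = sum(d <= 1 for d in diffs) / len(diffs)
    return exact, adjacent


# Hypothetical ratings of five extracts by two raters.
rater_1 = ["B1", "B2", "C1", "A2", "B2"]
rater_2 = ["B1", "B1", "C1", "B1", "C2"]
print(agreement(rater_1, rater_2))  # (0.4, 0.8)
```

Computing the same figures on ratings collected before and after plenary discussion is one simple way to approach question 3.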

3
A benchmarking conference
[Diagram: True ability; Shared construct; CEFR; Learners ranked on shared criteria and their CEFR levels agreed; Own constructs; Sample of performance; Raters]

4
A benchmarking conference
[Diagram: same labels as slide 3]

5
A benchmarking conference
[Diagram: True ability; Shared construct; CEFR; Learners ranked on shared criteria and their CEFR levels agreed; Own constructs; Sample of performance; Guidance; Raters]

6
Issues
  • What formats can work well for benchmarking?
  • How can/should a benchmarking exemplar differ
    from a standardisation video for a given exam?
  • Which parts of the CEFR reference scales are
    most useful for rating?
  • Is developing a shared understanding of the
    construct exactly the same thing as standardising
    raters' agreement on CEFR levels?

7
Using the CEFR scales: evidence from analysis of
data
  • The speaking assessment criteria are not
    differentiated.
  • A generalizability study from Sevres
  • Munich data not yet analysed

8
Generalizability study (rating criteria are not
differentiated)
Rating criteria: Range, Accuracy, Fluency,
Interaction, Coherence, Global rating

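A generalizability study splits score variance into components. As a minimal sketch, assuming a fully crossed persons × criteria design with one score per cell (the actual Sevres design, which also involved multiple raters, is not described on these slides), the components can be estimated from two-way ANOVA mean squares. "Undifferentiated" criteria show up as a large person component and a small person × criterion residual: the criteria rank learners in much the same way. The function name `variance_components` and the score matrix are hypothetical.

```python
# Minimal sketch of a G-study for a fully crossed persons x criteria (p x i) design.
# Assumption: one score per person-criterion cell; the score matrix is hypothetical.

def variance_components(scores):
    """Estimate person, criterion and residual variance components.

    scores: list of rows, one row per person, one column per rating criterion.
    Also returns the generalizability coefficient for the mean over criteria.
    """
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_i)
    person_means = [sum(row) / n_i for row in scores]
    crit_means = [sum(row[j] for row in scores) / n_p for j in range(n_i)]

    ss_p = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in crit_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_tot - ss_p - ss_i  # person x criterion interaction confounded with error

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Expected-mean-square solutions; negative estimates are set to zero by convention.
    var_res = ms_res
    var_p = max((ms_p - ms_res) / n_i, 0.0)
    var_i = max((ms_i - ms_res) / n_p, 0.0)
    denom = var_p + var_res / n_i
    g_coef = var_p / denom if denom > 0 else 0.0
    return {"person": var_p, "criterion": var_i, "residual": var_res, "g": g_coef}


# Hypothetical scores for four learners on three criteria (e.g. Range, Accuracy, Fluency).
scores = [
    [3, 0, 3],
    [2, 3, 4],
    [5, 4, 6],
    [6, 5, 7],
]
print(variance_components(scores))
```

With these made-up scores the person component (about 3.11) dwarfs the residual (about 0.67), while the non-zero criterion component reflects some criteria being scored lower across the board.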
9
So what are raters really doing?
  • First form an overall impression of the level.
  • (This was the procedure adopted in Munich.)
  • Then look at the criteria to confirm/rationalise
    the decision.
  • The criteria are generally not concrete enough to
    differentiate between specific performances.
  • Yes, raters do judge some criteria more harshly
    than others.

10
Relative difficulty of rating criteria
11
Relative difficulty of rating criteria
  • Raters do judge some criteria more harshly than
    others, but they do the same for everybody!
  • Munich data not analysed, but seemed very similar
    in this respect.
  • Should the accuracy scale be adjusted down and
    the fluency/interaction scales adjusted up?
  • Perhaps this would not help: penalizing error and
    rewarding communication is part of feeling
    comfortable about our overall decision.
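If harshness is a constant offset per criterion ("they do the same for everybody"), then adjusting the scales amounts to an additive correction, and an additive correction cannot change how learners are ordered. A minimal sketch, assuming numeric scores on a common scale; the functions and the data (in which Accuracy is scored one point lower and Fluency one point higher than the other criteria for every learner) are hypothetical.

```python
# Minimal sketch: constant per-criterion severity offsets and their additive removal.
# Assumption: scores are numeric on a common scale; all data here are hypothetical.

CRITERIA = ["Range", "Accuracy", "Fluency", "Interaction"]


def criterion_offsets(scores):
    """Offset of each criterion = its mean score minus the grand mean.

    A negative offset means the criterion is judged more harshly than average.
    """
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_i)
    return [sum(row[j] for row in scores) / n_p - grand for j in range(n_i)]


def adjust(scores, offsets):
    """Subtract each criterion's offset from every learner's score on it."""
    return [[x - off for x, off in zip(row, offsets)] for row in scores]


def ranking(scores):
    """Learner indices ordered from highest to lowest total score."""
    totals = [sum(row) for row in scores]
    return sorted(range(len(scores)), key=lambda p: totals[p], reverse=True)


# Hypothetical scores for three learners on the four criteria above.
scores = [
    [4, 3, 5, 4],
    [2, 1, 3, 2],
    [6, 5, 7, 6],
]
offsets = criterion_offsets(scores)
adjusted = adjust(scores, offsets)
print(dict(zip(CRITERIA, offsets)))  # {'Range': 0.0, 'Accuracy': -1.0, 'Fluency': 1.0, 'Interaction': 0.0}
print(adjusted)                      # every learner now has an even profile
print(ranking(scores) == ranking(adjusted))  # True
```

The adjustment flattens each profile but leaves the ordering of learners exactly as it was, which is one way of reading the doubt expressed on the slide.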

12
Focus on salience
  • When we form an overall impression of the level
    of a performance, what are we focussing on?
  • The salient features of the level: what
    distinguishes it from a higher or lower level.
  • (An exercise in Munich was based on this.)
  • Fluency is a scale with one point on it, described
    as "fluent".
  • My attempt at a minimal level description

13
NJ's minimal table of salient features
14
NJ's even shorter list
15
Do all rating criteria have the same status?
  • Range appears to be linked to the tasks one can
    do: it is almost a definition of a
    functionally defined proficiency scale.
  • (Hence the problems with format, if the task is
    below a subject's level.)
  • Interaction and coherence are dependent on the
    task: you can't demonstrate more than the task
    demands.
  • Fluency and accuracy are inversely related to the
    demands of a task.
  • But there may be a trade-off between fluency and
    accuracy.

16
Simple model of speaking performance
Subject at same level as task: even profile
17
Simple model of speaking performance
Subject at higher level than task: can't show
true range, interaction, coherence, but good
accuracy, fluency
18
Simple model of speaking performance
Subject below level of task: shows true range,
interaction, coherence; gives poor impression of
accuracy, fluency
19
Simple model of speaking performance
But may manage greater accuracy at the expense of
fluency
20
Simple model of speaking performance
Or vice-versa
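The simple model on slides 16 to 20 can be restated as a toy calculation. Everything numeric here is an assumption for illustration, not part of the presentation: levels are numbers on one scale; range, interaction and coherence are capped at the task's level; fluency and accuracy rise as the task becomes easier relative to the subject, with a hypothetical sensitivity `k`; and a hypothetical `trade_off` parameter shifts performance between accuracy and fluency.

```python
# Toy version of the "simple model of speaking performance" (slides 16 to 20).
# Assumptions (not from the presentation): levels are numbers on one scale,
# k controls how strongly fluency/accuracy respond to task demands, and
# trade_off > 0 favours accuracy at the expense of fluency (< 0: vice versa).

def observed_profile(ability, task_level, k=0.5, trade_off=0.0):
    """Return the apparent level on each criterion for one performance."""
    # Range, interaction, coherence: you can't demonstrate more than the task demands.
    shown = min(ability, task_level)
    # Fluency and accuracy are inversely related to the demands of the task.
    ease = k * (ability - task_level)
    return {
        "range": shown,
        "interaction": shown,
        "coherence": shown,
        "accuracy": ability + ease + trade_off,
        "fluency": ability + ease - trade_off,
    }


cases = [
    ("same level (even profile)", 3, 0.0),
    ("subject above task", 2, 0.0),
    ("subject below task", 4, 0.0),
    ("subject below task, accuracy favoured", 4, 0.5),
]
for label, task, trade in cases:
    print(label, observed_profile(ability=3, task_level=task, trade_off=trade))
```

Each case reproduces the qualitative pattern on the corresponding slide; the particular numbers carry no meaning beyond that.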
21
My personal opinion
  • One should aim for a shared understanding of the
    construct, which includes a shared awareness of
    how rating works.
  • Spending more time comparing performances (rather
    than rating them) would help.
  • It's vital to grasp the salient features of a
    level.
  • The CEFR has much useful text about this.
  • Detailed study of the text of the rating criteria
    may not be the best way of standardising
    perceptions.

22
(No Transcript)
23
Some other analysis results
  • Degree of agreement before and after discussion
  • Performance by rater group

24
Agreement before and after discussion
25
Performance by rater group