Title: Standards-based assessment
1 Standards-based assessment
- Tim McNamara
- The University of Melbourne
2 Standards-based assessment and criterion referencing
- Standards-based assessment is a form of
criterion-referenced assessment (cf.
norm-referenced assessment).
3 Information derived from a Criterion-Referenced Test
- The degree to which the student has attained
criterion performance, for example whether he can
satisfactorily prepare an experimental report.
- Glaser 1994 [1963], p. 6
4 Information derived from a Norm-Referenced Test
- The relative ordering of individuals with respect
to their test performance, for example, whether
Student A can solve his problems more quickly
than Student B.
- Glaser 1994 [1963], p. 6
5 Definition of a criterion-referenced test
- A criterion-referenced test is one that is
deliberately constructed to yield measurements
that are directly interpretable in terms of
specified performance standards. Performance
standards are generally specified by defining a
class or domain of tasks that should be performed
by the individual.
- Glaser and Nitko, 1971, p. 653
6 Definition of a criterion-referenced test (2)
- A student's score on a criterion-referenced
measure provides explicit information as to what
the student can and can't do. Criterion-referenced
measures indicate the content of the
behavioural repertory, and the correspondence
between what an individual does and the
underlying continuum of achievement. Measures
which assess student achievement in terms of a
certain criterion standard thus provide
information as to the degree of competence
attained by a particular student which is
independent of reference to the performance of
others.
- Glaser, 1963, p. 519
7 Norm-referenced test
- Any test that is primarily designed to disperse
the performances of students in a normal
distribution based on their general abilities, or
proficiencies, for purposes of categorizing the
students into levels or comparing students'
performances to the performances of others who
formed the normative group.
- Brown and Hudson (2002, p. 2)
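The contrast between the two kinds of information on slides 3 to 7 can be illustrated with a minimal sketch. The scores, the cut score of 70, and the function names are hypothetical and chosen only for illustration; they do not come from Glaser or from Brown and Hudson.

```python
from bisect import bisect_left


def criterion_referenced(score, cut_score):
    """Interpret a score against a fixed performance standard."""
    return score >= cut_score


def norm_referenced(score, cohort_scores):
    """Interpret a score as the percentage of the cohort scoring strictly below it."""
    ranked = sorted(cohort_scores)
    below = bisect_left(ranked, score)
    return 100.0 * below / len(ranked)


CUT_SCORE = 70                      # hypothetical standard
cohort_one = [62, 71, 55, 88]       # hypothetical cohort
cohort_two = [80, 90, 71, 85]       # hypothetical, stronger cohort

student_score = 71

# The criterion-referenced result depends only on the standard.
print(criterion_referenced(student_score, CUT_SCORE))

# The norm-referenced result changes with the comparison group.
print(norm_referenced(student_score, cohort_one))
print(norm_referenced(student_score, cohort_two))
```

The same score of 71 meets the standard in both cases, but its standing relative to others differs between the two cohorts, which is the sense in which criterion-referenced information is "independent of reference to the performance of others".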
8 Is CRT behaviourist?
- Criterion-referenced testing has its origins in
behaviourism, but need not be atomistic, purely
dichotomous, or reductive.
9 Criterion-referencing and levels on a continuum
- Underlying the concept of achievement measurement
is the notion of a continuum of knowledge
acquisition ranging from no proficiency at all to
perfect performance. An individual's achievement
level falls at some point on this continuum as
indicated by the behaviors he displays during
testing. The degree to which his achievement
resembles desired performance at any level is
assessed by criterion-referenced measures of
achievement or proficiency.
10 Scales and CRT
- The standard against which a student's
performance is compared when measured in this
manner is the behavior which defines each point
along the achievement continuum. The term
criterion, when used in this way, does not
necessarily refer to final end-of-course
behavior. Criterion levels can be established at
any point in instruction where it is necessary to
obtain information as to the adequacy of an
individual's performance.
- Glaser, 1963, pp. 519-520
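The idea that criterion levels can be set at several points along one continuum can be sketched as a lookup against ordered cut points. The 0 to 100 scale, the cut points, and the level labels below are hypothetical and are not taken from Glaser or from any published scale.

```python
from bisect import bisect_right

# Hypothetical cut points on a 0-100 achievement continuum.
CUT_POINTS = [40, 60, 80]
# One more label than cut points: a level below the first cut and one per cut.
LEVELS = ["Level 1", "Level 2", "Level 3", "Level 4"]


def level_for(score):
    """Return the level whose range contains the score.

    A score equal to a cut point is placed in the higher level.
    """
    return LEVELS[bisect_right(CUT_POINTS, score)]


for score in (39, 40, 79, 80):
    print(score, level_for(score))
```

Each level is defined by its own criterion, so a report of "Level 3" describes what was demonstrated rather than how the student compares with others.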
11 Interface with policy - scales and frameworks
- Dominant movement in language education
internationally
- Driven by need for accountability and emphasis on
demonstrable outcomes
- Has adopted functionalist view of language
education (i.e. not cultural, intellectual,
values dimension)
- Response to demands of globalization, efficiency
- Curriculum and assessment addressed in single
framework
- Emphasis on reporting
12 Format of standards
- Standards are typically formulated as an ordered
series of statements about levels of achievement
or stages of development.
- (There may be multiple sets of ordered statements
for different aspects of language development)
13 CEFR Levels A2, B1 (speaking)
- A2: Can understand sentences and frequently used
expressions related to areas of most immediate
relevance (e.g. very basic personal and family
information, shopping, local geography,
employment). Can communicate in simple and
routine tasks requiring a simple and direct
exchange of information on familiar and routine
matters. Can describe in simple terms aspects of
his/her background, immediate environment and
matters in areas of immediate need.
- B1: Can understand the main points of clear
standard input on familiar matters regularly
encountered in work, school, leisure, etc. Can
deal with most situations likely to arise whilst
travelling in an area where the language is
spoken. Can produce simple connected text on
topics which are familiar or of personal
interest. Can describe experiences and events,
dreams, hopes and ambitions and briefly give
reasons and explanations for opinions and plans.
14 Mislevy: claims and evidence
- ASSESSMENT ARGUMENT: An assessment is a machine for reasoning
- CLAIMS: about what students know, can do or have accomplished
- OBSERVATIONS/EVIDENCE: based on a handful of things they say, do, or make in particular settings
15 What is the CEFR?
- It represents a construct definition: it is an
exercise in domain modelling
- It provides a set of claims
- It provides a general characterization of
evidence and tasks
- It is not a test; it allows different kinds of
tests to be realizations of this construct
16 Possible functions of standards
- Planning: to act as a series of objectives or
goals for teaching and learning; involves clear
and specific statements of teaching aims
- Professional understanding: to inform teachers
about the typical progress of learning; involves more
complex statements that include contextual and
interpretative information in order to help the
teacher understand more fully the nature of the
emergent ability in the learner
- Accountability: to act as statements of learning
outcomes for administrative purposes; tends to
be the dominant function
17 Formative vs summative assessment
- Can standards-based assessment help with
formative assessment?
18 Gathering evidence to form basis of reporting
- Gathering of evidence: a mixture of teacher-led
assessment and external examination
- External evidence may be seen as intrusive,
insensitive to learning
- Places burden on teacher for record keeping
- Requires intensive professional development of
teachers
- Best schemes provide good advice to teachers
about integrating assessment in instruction
- Assessment for learning movement
19 The assessment pyramid
- LEVELS (NUMBERED)
- LEVEL SUMMARIES
- STRAND DESCRIPTIONS
- WITHIN EACH MODE, EXAMPLES PROVIDED
- ADVICE TO TEACHERS DETAILED EXAMPLES
- TEACHER CHOOSES ACTIVITY CRITERIA
20 Competing demands in standards-based assessment
- Validity demands: intellectual defensibility of
construct; evidence of reliability; other validity
evidence; concern for consequences
- Managerialist demands: reporting; accountability
- Teacher/learner demands: meaningfulness in
instructional process; facilitation of learning;
enhanced quality of teaching; minimization of
administrative burden on teachers
21 Dylan Wiliam: Beyond norm- and criterion-referenced tests
- Norm-referenced: hard to interpret in terms of
what a student can do; limited to placing student
in cohort group
- Criterion-referenced: leads to narrowing of
teaching; also implies a cohort group
22 Wiliam on the role of teachers
- An assessment is valid to the extent that you are
happy for teachers to teach towards the test
- Therefore:
- Involve teachers in summative assessment
- Increases reliability and validity
- Externalize standards
- Locates teacher as coach, not judge
- Requires teachers to form a community of
practice
23 Wiliam on construct-referenced assessment
- Criteria do not define but exemplify grades
- Standards are shared by the community of
practice
- Standards are implicit and evolve
24 Example: Standards and the PhD
- Implies a yes/no decision about individuals
- Impossible to specify criteria
- But examination process proceeds successfully
- Granting PhD is a performative utterance, an
illocutionary act (not a description) - the
person is launched on their career
25 Wiliam on summative and formative assessment
- Effective summative assessment
- Requires teachers to share a construct of quality
- Effective formative assessment
- Requires students to share the same construct of
quality
- Requires teachers to possess an anatomy of quality
26 Wiliam on quality rather than criteria
- Maxims cannot be understood, still less applied
by anyone not already possessing a good practical
knowledge of the art. They derive their interest
from our appreciation of the art and cannot
themselves either replace or establish that
appreciation. (Polanyi, 1958, p. 50)
- Quality doesn't have to be defined. You
understand it without definition. Quality is a
direct experience independent of and prior to
intellectual abstractions. (Pirsig, 1991, p. 64)
27 Our questions
- 1 assessment vs testing vs evaluation vs
validation vs measurement
- 2 affective factors in assessment
- 3 influence of L1 on assessment
- 4 raters/judges
- 5 effect of tasks (esp. CELU)
- 6 criteria in writing and oral interaction
- 7 history of assessment
- 8 why assessment? Can we do without it?
- 9 performance assessment
28 Our questions
- 10 qualitative vs quantitative aspects
- 11 correction in an oral exam
- 12 assessment as a process - and the final exam?
- 13 scales/descriptors for oral language
- 14 should listening be part of the oral exam?
- 15 Are we assessing what we want to assess?
- 16 Defining standards - intermediate/advanced etc.
- 17 Inter-rater reliability?
29 Our questions
- 18 Inferring actual performance from exam
performance?
- 19 Exam strategies
- 20 Criteria in assessing a performance - e.g.
grammar?
- 21 Cultural aspects - interference in
performance, rating, etc.?