Inside the black box: Raising standards through classroom assessment

Description:

Not invented here: the baffling insularity of assessment practices in higher education. Dylan Wiliam, www.dylanwiliam.net. Keynote presentation at the University of ...

Slides: 29
Provided by: DylanW4

Transcript and Presenter's Notes

1
(No Transcript)
2
Not invented here: the baffling insularity of
assessment practices in higher education
Dylan Wiliam
www.dylanwiliam.net
  • Keynote presentation at the University of London
    External System's 150th anniversary Assessment
    Symposium

3
Overview: some assessment tensions
  • Function
      • Formative versus summative
  • Quality
      • Validity versus reliability
  • Format
      • Multiple-choice versus constructed response
  • Scope
      • Continuous versus one-off

4
Function, Quality, Format, Scope
5
A statement of the blindingly obvious
  • You can't work out how good something is until
    you know what it's intended to do
  • Function, then quality

6
Formative and summative
  • Descriptions of
      • Instruments
      • Purposes
      • Functions

An assessment functions formatively when evidence
about student achievement elicited by the
assessment is interpreted and used to make
decisions about the next steps in instruction
that are likely to be better, or better founded,
than the decisions they would have taken in the
absence of that evidence.
7
Gresham's law and assessment
  • Usually (incorrectly) stated as "Bad money drives
    out good"
  • The essential condition for Gresham's Law to
    operate is that there must be two (or more) kinds
    of money which are of equivalent value for some
    purposes and of different value for others
    (Mundell, 1998)
  • The parallel for assessment: "Summative drives out
    formative"
  • The most that summative assessment (more
    properly, assessment designed to serve a
    summative function) can do is keep out of the way

8
Function, Quality, Format, Scope
9
Validity
  • Traditional definition: a property of assessments
      • A test is valid to the extent that it assesses
        what it purports to assess
  • Key properties (content validity)
      • Relevance
      • Representativeness
  • Trinitarian doctrines of validity
      • Content validity
      • Criterion-related validity
          • Concurrent validity
          • Predictive validity
      • Construct validity

10
Validity
  • Validity is a property of inferences, not of
    assessments
  • "One validates, not a test, but an interpretation
    of data arising from a specified procedure"
    (Cronbach, 1971; emphasis in original)
  • The phrase "a valid test" is therefore a category
    error (like "a happy rock")
  • No such thing as a valid (or indeed invalid)
    assessment
  • No such thing as a biased assessment
  • Reliability is a pre-requisite for validity
  • Talking about reliability and validity is like
    talking about swallows and birds
  • Validity includes reliability
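
One conventional way to see why reliability is a prerequisite is the classical test theory bound on validity coefficients. This is offered only as an illustration under that model's assumptions (in particular, measurement error uncorrelated with the criterion); the slides do not name a model, and the symbols below are introduced here for the purpose. With X the observed score, Y any criterion, and rho_XX' the reliability of X:

```latex
\[
  \lvert \rho_{XY} \rvert \;\le\; \sqrt{\rho_{XX'}}
\]
```

Under these assumptions, a score with reliability 0.49 could not correlate above 0.7 with any criterion, however well chosen.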

11
Modern conceptions of validity
"Validity is an integrative evaluative judgment
of the degree to which empirical evidence and
theoretical rationales support the adequacy and
appropriateness of inferences and actions based
on test scores or other modes of assessment"
(Messick, 1989, p. 13)
  • Validity subsumes all aspects of assessment
    quality
      • Reliability
      • Representativeness (content coverage)
      • Relevance
      • Predictiveness

12
Meanings and consequences
                       Result interpretation   Result use
Evidential basis       Content validity        Construct validity utility
Consequential basis    Value implications      Social consequences
"Adverse social consequences are not in
themselves indicative of invalidity" (Messick,
1989, p. 89)
"Right concern, wrong concept" (Popham, 1997)
13
Threats to validity
  • Inadequate reliability
  • Construct-irrelevant variance
      • Differences in scores are caused, in part, by
        differences not relevant to the construct of
        interest
      • The assessment assesses things it shouldn't
      • The assessment is too big
  • Construct under-representation
      • Differences in the construct are not reflected in
        scores
      • The assessment doesn't assess things it should
      • The assessment is too small
  • With clear construct definition all of these are
    technical (not value) issues
      • But they interact strongly
14
Function, Quality, Format, Scope
15
Item formats
  • "No assessment technique has been rubbished quite
    like multiple choice, unless it be graphology"
    (Wood, 1991, p. 32)
  • Myths about multiple-choice items
      • They are biased against females
      • They assess only candidates' ability to spot or
        guess
      • They test only lower-order skills

16
Mathematics 2
  • What can you say about the means of the following
    two data sets?
  • Set 1: 10 12 13 15
  • Set 2: 10 12 13 15 0
  • The two sets have the same mean.
  • The two sets have different means.
  • It depends on whether you choose to count the
    zero.
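
The arithmetic behind this item can be checked directly. The following is a minimal sketch; the variable names are illustrative and do not come from the slides.

```python
from statistics import mean

# The two data sets from the item.
set_1 = [10, 12, 13, 15]
set_2 = [10, 12, 13, 15, 0]

mean_1 = mean(set_1)  # 50 / 4 = 12.5
mean_2 = mean(set_2)  # 50 / 5 = 10

print(mean_1, mean_2)
print("same mean" if mean_1 == mean_2 else "different means")
```

The zero contributes nothing to the total but still counts as a data point, so the divisor changes from 4 to 5 while the sum stays at 50.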

17
Mathematics 3
Which of the shapes below contains a dotted line
that is also a diagonal?
18
Science
  • The ball sitting on the table is not moving. It
    is not moving because
  • no forces are pushing or pulling on the ball.
  • gravity is pulling down, but the table is in the
    way.
  • the table pushes up with the same force that
    gravity pulls down
  • gravity is holding it onto the table.
  • there is a force inside the ball keeping it from
    rolling off the table

Wilson & Draney, 2004
19
OU S354: Understanding space and time
  • Below are five statements about the cosmic
    background radiation of our Universe. Select two
    options that are correct, according to the
    standard model of the Universe.
  • The microwave radiation collected on Earth is
    dominated by signals of cosmic origin
  • The total energy of the cosmic background
    radiation is currently much greater than that of
    matter
  • In a closed universe, the cosmic background
    radiation would eventually appear as visible
    light
  • The temperature of the cosmic background
    radiation was equal to that of the matter in the
    Universe until the appearance of galaxies
  • The number of photons in the cosmic background
    radiation has remained approximately constant
    since the era of decoupling

20
English
  • Where would be the best place to begin a new
    paragraph?

No rules are carved in stone dictating how long a
paragraph should be. However, for argumentative
essays, a good rule of thumb is that, if your
paragraph is shorter than five or six good,
substantial sentences, then you should reexamine
it to make sure that you've developed the ideas
fully. [A] Do not look at that rule of thumb,
however, as hard and fast. It is simply a general
guideline that may not fit some paragraphs. [B] A
paragraph should be long enough to do justice to
the main idea of the paragraph. Sometimes a
paragraph may be short; sometimes it will be
long. [C] On the other hand, if your paragraph
runs on to a page or longer, you should probably
reexamine its coherence to make sure that you are
sticking to only one main topic. Perhaps you can
find subtopics that merit their own paragraphs. [D]
Think more about the unity, coherence, and
development of a paragraph than the basic
length. [E] If you are worried that a paragraph is
too short, then it probably lacks sufficient
development. If you are worried that a paragraph
is too long, then you may have rambled on to
topics other than the one stated in your topic
sentence.
21
English 2
  • In a piece of persuasive writing, which of these
    would be the best thesis statement?
  • The typical TV show has 9 violent incidents
  • There is a lot of violence on TV
  • The amount of violence on TV should be reduced
  • Some programs are more violent than others
  • Violence is included in programs to boost ratings
  • Violence on TV is interesting
  • I don't like the violence on TV
  • The essay I am going to write is about violence
    on TV

22
History
  • Why are historians concerned with bias when
    analyzing sources?
  • People can never be trusted to tell the truth
  • People deliberately leave out important details
  • People are only able to provide meaningful
    information if they experienced an event
    firsthand
  • People interpret the same event in different
    ways, according to their experience
  • People are unaware of the motivations for their
    actions
  • People get confused about sequences of events

23
Automated scoring technologies
[Figure: multiple-choice items, c-rater, m-rater,
e-rater and simulations plotted by skill level
assessed (low-order to high-order) against
evidence structure (structured, unstructured)]
24
Function, Quality, Format, Scope
25
Continuous vs. one-off assessment
  • Continuous assessment
      • Pros
          • High validity (including reliability)
          • Reduced stress (for some students)
      • Cons
          • Comparability of work done at different times
          • Questions about the accumulation of learning
            over the programme
  • One-off assessment
      • Pros
          • Synoptic
          • Comparability issues minimized
      • Cons
          • Limited validity (especially reliability)
          • Stressful for some students
            (construct-irrelevant variance)

26
Reflections
27
The challenge
  • To design an assessment system that is:
      • Distributed
          • So that evidence collection is not undertaken
            entirely at the end
      • Synoptic
          • So that learning has to accumulate
      • Extensive
          • So that all important aspects are covered
            (breadth and depth)
      • Manageable
          • So that costs are proportionate to benefits
      • Trusted
          • So that stakeholders have faith in the outcomes

28
The minimal take-aways
  • No such thing as a summative assessment
  • No such thing as a reliable test
  • No such thing as a valid test
  • No such thing as a biased test
  • Validity including reliability