1
Prospectus for the PADI design framework in language testing
Robert J. Mislevy, Professor of Measurement & Statistics, University of Maryland
Geneva D. Haertel, Assessment Research Area Director, SRI International
  • ECOLT 2006, October 13, 2006, Washington, D.C.
  • PADI is supported by the National Science
    Foundation under grant REC-0129331. Any opinions,
    findings, and conclusions or recommendations
    expressed in this material are those of the
    authors and do not necessarily reflect the views
    of the National Science Foundation.

2
Some Challenges in Language Testing
  • Sorting out evidence about interacting aspects of knowledge & proficiency in complex performances
  • Understanding the impact of complexity factors and difficulty factors on inference
  • Scaling up efficiently to high-volume tests: task creation, scoring, delivery
  • Creating valid, cost-effective low-volume tests

3
Evidence-Centered Design
  • Evidence-centered assessment design (ECD) provides language, concepts, knowledge representations, data structures, and supporting tools to help design and deliver educational assessments,
  • all organized around the evidentiary argument an assessment is meant to embody.

4
The Assessment Argument
  • What kinds of claims do we want to make about
    students?
  • What behaviors or performances can provide us
    with evidence for those claims?
  • What tasks or situations should elicit those
    behaviors?
  • Generalizing from Messick (1994)

5
Evidence-Centered Design
  • With Linda Steinberg & Russell Almond at ETS
  • The Portal project / TOEFL
  • NetPASS with Cisco (computer network design & troubleshooting)
  • Principled Assessment Design for Inquiry (PADI)
  • Supported by NSF (co-PI Geneva Haertel, SRI)
  • Focus on science inquiry, e.g., investigations
  • Models, tools, examples

6
Some allied work
  • Cognitive design for generating tasks (Embretson)
  • Model-based assessment (Baker)
  • Analyses of task characteristics: test and TLU (Bachman & Palmer)
  • Test specifications (Davidson & Lynch)
  • Constructing measures (Wilson)
  • Understanding by design (Wiggins)
  • Integrated Test Design, Development, and Delivery
    (Luecht)

7
Key ideas: explicit relationships, explicit structures, generativity, re-usability, recombinability, interoperability
Layers in the assessment enterprise
  • From Mislevy & Riconscente, in press

8
Expertise research, task analysis, curriculum,
target use, critical incident analysis,
ethnographic studies, etc.
  • In language assessment, importance of
  • Psycholinguistics
  • Sociolinguistics
  • Target language use
  • From Mislevy & Riconscente, in press

9
Tangible stuff
e.g., what gets made and how it operates in the testing situation
  • From Mislevy & Riconscente, in press

10
How do you get from here to here?
  • From Mislevy & Riconscente, in press

11
We will focus today on two hidden layers
  • From Mislevy & Riconscente, in press

12
We will focus today on two hidden layers
Domain modeling, which concerns the Assessment
Argument
  • From Mislevy & Riconscente, in press

13
And the Conceptual Assessment Framework, which concerns generative, re-combinable design schemas
  • From Mislevy & Riconscente, in press

14
More on the Assessment Argument
  • From Mislevy & Riconscente, in press

15
PADI Design Patterns
  • Organized around elements of assessment argument
  • Narrative structures for assessing pervasive
    kinds of knowledge / skill / capabilities
  • Based on research and experience, e.g.:
  • PADI: design under constraint, inquiry cycles, representations
  • Compliance with Grice's maxims, cause/effect reasoning, giving spoken directions
  • Suggest design choices that apply to different
    contexts, levels, purposes, formats
  • Capture experience in structured form
  • Organized in terms of assessment argument

16
A Design Pattern Motivated by Grice's Relation Maxims

Name: Grice's Relation Maxim (Responding to a Request)
Summary: In this design pattern, an examinee demonstrates following Grice's Relation Maxim in a given language by producing or selecting a response in a situation that presents a request for information (e.g., a conversation).
Central claims: In contexts/situations with xxx characteristics, can formulate and respond to representations of implicature from referents: semantic implication, pragmatic implication.
Additional knowledge that may be at issue: substantive knowledge in the domain; familiarity with cultural models; knowledge of the language.
17
Grice's Relation Maxims
Characteristic features: The stimulus situation needs to present a request for relevant information to the examinee, either explicitly or implicitly.

Variable task features:
  • Production or choice as response? If production, is oral or written production required? If oral, a single response to a preconfigured situation or part of an evolving conversation? If an evolving conversation, an open or structured interview?
  • Format of prepackaged products (multiple choice, videotaped conversations, written questions or conversations, one-to-one or multi-party conversations prepared by interviewers)
  • Formality of information and task (concrete or abstract, immediate or remote, information requiring retrieval or transformation, familiar or unfamiliar setting and topic, written or spoken)
  • If prepackaged speech: stimulus length, content, difficulty of language, explicitness of request, degree of cultural dependence
  • Content of situation (familiar or unfamiliar, degree of difficulty)
  • Time pressure (e.g., time for planning and response)
  • Opportunity to control the conversation
18
Grice's Relation Maxims
Potential performances and work products: constructed oral response; constructed written or typed-in response; answer to a multiple-choice question where alternatives vary.

Potential features of performance to evaluate:
  • Whether the student can formulate representations of implicature as required in the given situation
  • Whether the student can make a conversational contribution or express ideas in the accepted direction of the exchange
  • Whether the student provides the relevant information that is required
  • Whether a choice among alternatives offered for a production in a given situation satisfies the Relation Maxim

Potential rubrics: (see later slide)
Examples: (in paper)
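
Design patterns live in PADI as structured, machine-readable objects. As a rough sketch only (the Python types are invented and the attribute names are taken from the slides above; this is not the actual PADI object model), such a record might look like:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DesignPattern:
    """Illustrative stand-in for a PADI design-pattern record.

    Attribute names follow the slides; this is not the real PADI schema.
    """
    name: str
    summary: str
    central_claims: List[str]
    additional_knowledge: List[str] = field(default_factory=list)
    characteristic_features: List[str] = field(default_factory=list)
    variable_task_features: List[str] = field(default_factory=list)
    potential_work_products: List[str] = field(default_factory=list)

relation_maxim = DesignPattern(
    name="Grice's Relation Maxim: Responding to a Request",
    summary=("Examinee demonstrates following Grice's Relation Maxim by "
             "producing or selecting a response to a request for information."),
    central_claims=["semantic implication", "pragmatic implication"],
    additional_knowledge=["substantive knowledge in the domain",
                          "familiarity with cultural models",
                          "knowledge of the language"],
    characteristic_features=["situation presents a request for relevant "
                             "information, explicitly or implicitly"],
    variable_task_features=["production vs. choice", "oral vs. written",
                            "time pressure",
                            "opportunity to control the conversation"],
    potential_work_products=["constructed oral response",
                             "constructed written response",
                             "answer to a multiple-choice question"],
)
```

Because the record is structured rather than free text, the same pattern can be queried, tailored, and re-used across contexts, levels, purposes, and formats.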
19
Some Relationships between Design Patterns and
Other TD Tools
  • Conceptual models for proficiency
  • Task characteristic frameworks
  • Grist for design choices about KSAs & task features
  • DPs present an integrated design space
  • Test specifications
  • DPs for generating argument, design choices
  • Test specs for documenting, specifying choices

20
More on the Conceptual Assessment Framework
  • From Mislevy & Riconscente, in press

21
Evidence-centered assessment design
Technical specs that embody the elements
suggested in the design pattern
  • The three basic models

22
Evidence-centered assessment design
Conceptual Representation
  • The three basic models

23
Screen shot of user interface
User-Interface Representation
24
High-level UML Representation of the PADI Object
Model
UML Representation (sharable data structures,
behind the screen)
25
Evidence-centered assessment design
  • What complex of knowledge, skills, or other
    attributes should be assessed?

26
The NetPass Student Model
Multidimensional measurement model with selected
aspects of proficiency
Can use the same student model with different tasks.
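
As a toy illustration (variable names and numbers are invented; the actual NetPass student model is a multidimensional psychometric model), a student model can be thought of as a probability distribution over levels of each aspect of proficiency, re-usable across any task whose evidence model links to these variables:

```python
# Toy student model: a probability distribution over discrete proficiency
# levels for each student-model variable (SMV). Names and numbers are
# invented for illustration only.
student_model = {
    "network_design":       {"novice": 0.3, "intermediate": 0.5, "advanced": 0.2},
    "troubleshooting":      {"novice": 0.4, "intermediate": 0.4, "advanced": 0.2},
    "networking_knowledge": {"novice": 0.2, "intermediate": 0.5, "advanced": 0.3},
}
```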
27
Evidence-centered assessment design
  • What behaviors or performances should reveal
    those constructs?

28
Evidence-centered assessment design
  • What behaviors or performances should reveal
    those constructs?

From unique student work product to evaluations of observable variables, i.e., task-level scoring
29
Skeletal Rubric for Satisfaction of Quality
Maxims
  • 4: Responses and explanations are relevant as required for the current purposes of the exchange and are neither more elaborated than appropriate nor insufficient for the context. They fulfill the demands of the task with at most minor lapses in completeness. They are appropriate for the task and exhibit coherent discourse.
  • 3: Responses and explanations address the task appropriately and are relevant as required for the current purposes of the exchange, but they may be either more elaborated than required or fall short of being fully developed.
  • 2: The responses and explanations are connected to the task, but are either markedly excessive in the information supplied or not very relevant to the current purpose of the exchange. Some relevant information might be missing or inaccurately cast.
  • 1: The responses and explanations are either grossly irrelevant or very limited in content or coherence. In either case they may be only minimally connected to the task.
  • 0: The speaker makes no attempt to respond, or the response is unrelated to the topic. A written response at this level merely copies sentences from the topic, rejects the topic, or is otherwise not connected to the topic. A spoken response is not connected to the direct or implied request for information.
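
A rubric like this is applied at the task level. As a purely illustrative sketch (the dictionary and helper function below are invented, not part of any PADI tool), the levels and a rater's judgment can be packaged as an observable variable:

```python
# The skeletal rubric as a score -> descriptor map. A rater (or automated
# scoring process) assigns the level whose descriptor best fits the response;
# the helper below, invented for illustration, packages that judgment as an
# observable variable for task-level scoring.
RELATION_RUBRIC = {
    4: "Relevant as required; neither over-elaborated nor insufficient; coherent.",
    3: "Appropriate and relevant, but over-elaborated or not fully developed.",
    2: "Connected to the task, but excessive or only loosely relevant.",
    1: "Grossly irrelevant, or very limited in content or coherence.",
    0: "No attempt, or response unrelated to the topic or request.",
}

def observable_from_rating(rating: int) -> dict:
    """Package a rater's judgment as a task-level observable variable."""
    return {"observable": "relation_maxim_satisfaction",
            "value": rating,
            "descriptor": RELATION_RUBRIC[rating]}

print(observable_from_rating(3)["descriptor"])
```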

30
Notes re Observable Variables
  • Re-usable (tailorable) for different tasks & projects
  • There can be multiple aspects of performance being rated.
  • There may be a 1-1 relationship with student-model variables, but need not be.
  • That is, multiple aspects of proficiency can be involved in the probability of a high / satisfactory / certain style of response.

31
Evidence-centered assessment design
Values of observable variables are used to update probability distributions for student-model variables via a psychometric model, i.e., test-level scoring.
  • What behaviors or performances should reveal
    those constructs?

32
A NetPass Evidence-Model Fragment for Design
Measurement models indicate which SMVs, in which
combinations, affect which observables. Task
features influence which ones and how much, in
structured measurement models.
Re-usable conditional-probability fragments and
variable names for different tasks with the same
evidentiary structure.
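
As a toy illustration of such a fragment (the prior, the conditional probabilities, and the variable names are all invented), this is the Bayes-rule update that turns one observable into revised belief about one student-model variable:

```python
# Toy Bayes-rule update for one student-model variable (SMV) and one
# observable, in the spirit of the reusable conditional-probability
# fragments described above. Numbers and names are invented.
prior = {"low": 0.5, "high": 0.5}          # belief about the SMV before the task
p_good_given = {"low": 0.2, "high": 0.7}   # P(good response | SMV level)

# Observe a "good" response and apply Bayes' rule.
unnorm = {level: prior[level] * p_good_given[level] for level in prior}
total = sum(unnorm.values())
posterior = {level: p / total for level, p in unnorm.items()}

print(posterior)  # {'low': 0.22..., 'high': 0.78...} -- evidence shifts belief upward
```

Because the fragment is stored with variable names rather than hard-wired numbers, the same structure can be re-bound to any task with the same evidentiary structure.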
33
Evidence-centered assessment design
  • What tasks or situations should elicit those
    behaviors?

34
Representations to the student, and sources of
variation
35
Task Specification Template: Determining Key Features (Wizards)
  • Setting: Corporation / Conference Center / University
  • Building Length: Less than 100m / More than 100m
  • Ethernet Standard: 10BaseT / 100BaseT
  • Subgroup Name: Teacher / Student / Customer
  • Bandwidth for a Subgroup Drop: 10Mbps / 100Mbps
  • Growth Requirements: Given / NA
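
A wizard built on such a template can enumerate, or let a designer select among, combinations of feature values. A minimal sketch using the feature options above (this is illustrative, not the actual PADI wizard code):

```python
from itertools import product

# Feature options from the task-specification template above. A wizard can
# enumerate (or let a designer select among) combinations to stamp out task
# variants; this sketch is illustrative, not actual PADI wizard code.
FEATURES = {
    "setting": ["Corporation", "Conference Center", "University"],
    "building_length": ["Less than 100m", "More than 100m"],
    "ethernet_standard": ["10BaseT", "100BaseT"],
    "subgroup_name": ["Teacher", "Student", "Customer"],
    "bandwidth_per_subgroup_drop": ["10Mbps", "100Mbps"],
    "growth_requirements": ["Given", "NA"],
}

task_specs = [dict(zip(FEATURES, combo)) for combo in product(*FEATURES.values())]
print(len(task_specs))  # 3 * 2 * 2 * 3 * 2 * 2 = 144 candidate task variants
```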

36
Structured Measurement Models
  • Examples of models:
  • Multivariate Random Coefficients Multinomial Logit Model (MRCMLM; Adams, Wilson, & Wang, 1997)
  • Bayes nets (Mislevy, 1996)
  • General Diagnostic Model (von Davier & Yamamoto)
  • By relating task characteristics to difficulty with respect to different aspects of proficiency, create tasks with known properties.
  • Can create families of tasks around the same evidentiary framework; e.g., for read & write tasks, can vary characteristics of texts, directives, audience, and purpose.

37
Structured Measurement Models
  • Articulated connection between task characteristics and models of proficiency
  • Moves beyond modeling difficulty alone
  • Traditional test theory is a bottleneck in a multivariate environment
  • Dealing with complexity factors and difficulty factors (Robinson)
  • Model complexity factors as covariates for difficulty parameters with respect to those aspects of proficiency they impact (see the sketch below)
  • Model difficulty factors either as SMVs, if they are targets of inference, or as noise, if nuisance
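
A minimal sketch of the covariate idea, in the spirit of the linear logistic test model (a special case of the MRCMLM listed on the previous slide); the weights, feature names, and proficiency value here are invented for illustration:

```python
import math

# LLTM-style sketch: an item's difficulty is a linear combination of
# task-feature covariates (complexity factors), so a task assembled from
# known features has a predictable difficulty. All values are invented.
WEIGHTS = {"text_complexity": 0.8, "unfamiliar_topic": 0.5, "time_pressure": 0.3}

def difficulty(features: dict) -> float:
    return sum(WEIGHTS[name] * value for name, value in features.items())

def p_success(theta: float, features: dict) -> float:
    """Rasch-style probability of a successful response given proficiency theta."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty(features))))

task = {"text_complexity": 1.0, "unfamiliar_topic": 1.0, "time_pressure": 0.0}
print(p_success(theta=1.0, features=task))  # b = 1.3, so about 0.43
```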

38
Advantages: A framework that
  • Guides task and test construction (Wizards)
  • Provides high efficiency and scalability
  • By relating task characteristics to difficulty,
    allows creating tasks with targeted properties
  • Promotes re-use of conceptual structures (DPs,
    arguments) in different projects
  • Promotes re-use of machinery in different projects

39
Evidence of effectiveness
  • Cisco
  • Certification & training assessment
  • Simulation-based assessment tasks
  • IMS/QTI
  • Conceptual model for standards for data
    structures for computer-based testing
  • ETS
  • TOEFL
  • NBPTS

40
Conclusion
  • Isn't this just a bunch of new words for describing what we already do?

41
An answer (Part 1)
  • No.

42
An answer (Part 2)
  • An explicit, general framework makes
    similarities and implicit principles explicit
  • To better understand current assessments
  • To design for new kinds of assessment
  • Tasks that tap multiple aspects of proficiency
  • Technology-based tasks (e.g., simulations)
  • Complex observations, student models, evaluation
  • To foster re-use, sharing, modularity
  • Concepts & arguments
  • Pieces of machinery & processes (QTI)

43
For more information
  • www.education.umd.edu/EDMS/mislevy/
  • Has links to PADI, Cisco, articles, etc.
  • (e.g., CRESST report on Task-Based Language Assessment)