Title: Suggestions for Using Information-Exchange Tasks for Oral Testing
1. Chapter 5
- Suggestions for Using Information-Exchange Tasks
for Oral Testing
2. In this chapter we explore
- Four general criteria for designing language
tests that can be applied to the design of oral
tests
- Washback effects
- Suggestions for developing oral tests from
information-exchange tasks
- Evaluation criteria for oral tests
3. Four criteria for designing a good test
- Carroll (1980) identifies four general criteria
in foreign language testing:
- Economy
- Relevance
- Acceptability
- Comparability
4. Economy
- By economy Carroll means obtaining the greatest
amount of information about the learner's
language in as little time as possible and with a
minimum of energy expended.
- For a test to be economical, it should merely
sample the material covered, not exhaust it.
- An instructor can select from among the many
items covered and infer or project something
about the learner's overall knowledge or ability.
5. Relevance
- Relevance refers to the match between the course
and curriculum goals and the tests.
- For example, if you teach a course in
conversational use of Italian, you would not want
to give a formal composition as the final exam.
- For a test to be relevant, it should reflect not
simply what is taught but, more importantly, how
it is taught.
6. Acceptability
- Acceptability is a concept that takes the
learners' point of view into consideration.
- It implies learners' willingness to participate
in the testing and their satisfaction that the
test evaluates their progress.
- For many learners, acceptability is tied to
familiarity. If they are not familiar with a
testing format or procedure, they may view it as
unacceptable.
7. Comparability
- Comparability is a concept that takes the
institution's point of view into consideration.
- Test scores for learners who are taught the same
material by the same method should be similar.
- For example, those enrolled in the 9 a.m. section of
Portuguese 102 should have test scores similar to
the scores of learners enrolled in the 2 p.m.
section if the two sections have common goals,
materials, syllabi, and methods.
8. Washback effects
- Krashen and Terrell (1983) made a statement that
addresses the acceptability of a test.
- "Testing can be done in a way that will have a
positive effect on the students' progress. The
key to effective testing is the realization that
testing has a profound effect on what goes on in
the classroom."
9. Krashen and Terrell (1983) continued
- "Teachers are motivated to teach and students
are motivated to study materials which will be
covered on tests. Quite simply, if we want
students to acquire a second language, we should
give tests that promote the use of acquisition
activities in and out of the classroom. In
other words, our tests should motivate students
to prepare for the tests by obtaining more
comprehensible input and motivate teachers to
supply it."
- (Krashen & Terrell, 1983, p. 165)
10. Washback effect
- What and how you test has ramifications for what
instructors do in the classroom, what learners
expect instructors to do in the classroom, and
what learners do outside the classroom.
- Testing cannot be viewed as an isolated event; it
must be an integral part of the teaching and
learning enterprise.
11. The relevance of a test
- "Using an approach in the classroom which
emphasizes the ability to exchange messages, and
at the same time testing only the ability to
apply grammar rules correctly, is an invitation
to disaster."
- (Krashen & Terrell, 1983, p. 165)
12. Oral testing in classrooms
- Adapting information-exchange tasks for use as
oral tests and quizzes
- Lee and VanPatten define communicative burden
as the responsibility of an individual test taker
to initiate, respond, manage, and negotiate an
oral event.
- The communicative burden of a group discussion is
less than the communicative burden of an oral
interview.
- In a discussion, multiple participants share the
communicative burden, each one assuming the
responsibilities of initiating, responding,
managing, and negotiating the event.
13. Communicative burden
- The communicative burden of a test format becomes
an issue when the teacher is considering whether
to give an oral quiz or test.
- One might decide that an oral quiz at the end of
a lesson in the first semester should have a low
communicative burden, whereas a quiz at the end
of a lesson in the fourth semester should have a
greater one.
- There are a number of instructional decisions to
make regarding oral testing, and these decisions
depend on a variety of pedagogical and practical
factors.
14. Washback effect
- These decisions may well have a washback effect
on instruction.
- Instructors who are familiar with the
characteristics of the test may
incorporate activities into the classroom that
they feel will lead to success on the test.
- The type of test can influence both what
instructors emphasize and the way in which they
emphasize it.
15. Content of the oral quiz
- The content of the oral quiz or test can have
another kind of washback effect on instruction.
- If the content of the oral test is overtly tied
to classroom activities, the learners have
a stronger motivation for participating
in the activities.
- Testing and teaching should be interrelated so
that learners are responsible for what happens in
class.
16. Demonstration
- To demonstrate how teaching and testing can be
interrelated, Lee and VanPatten convert four of the
information-exchange tasks presented in Chapter 3
into test sections. One of these examples is
illustrated here.
- Recall the following activity from Chapter 3.
17. Compare your birthday experiences
- Step 1: Fill in the chart as you interview a
classmate.

Birthday     | Where? | With whom? | Food? | Fun?
2 yrs. ago   |        |            |       |
5 yrs. ago   |        |            |       |
10 yrs. ago  |        |            |       |

- Step 2: Now write a paragraph in which you
compare and contrast your birthdays.
18. Test section on this activity
- Phase 1: Warm-up. Make the test taker feel
comfortable.
- Phase 2: Initial questioning. Who was your
partner? When is that person's birthday? When is
your birthday?
- Phase 3: Activity-related questions. Referring to
the chart you made in class, tell me whether you
and your partner have celebrated your birthdays
in similar or different ways.
19. Two tests for evaluating spoken language
- The first test is the Oral
Proficiency Interview (OPI), which was developed
by the American Council on the Teaching of
Foreign Languages (ACTFL) in conjunction with the
Educational Testing Service and several
government agencies.
- The other test is the Israeli National Oral
Proficiency Test, developed by Elana Shohamy and
her colleagues.
20. The Oral Proficiency Interview (OPI)
- The ACTFL Oral Proficiency Interview has been
likened to a face-to-face conversation because an
interviewer converses with an interviewee.
- The goal of the OPI is to obtain a sample of
speech that can be rated using the ACTFL
Proficiency Guidelines as the measure.
21. Guidelines
- These guidelines comprise level-by-level (from
Novice to Superior) descriptions of learner
performance:
- The content that a learner at a particular level
might command (e.g., simple greetings, health
matters, family)
- The functions the learner commands (e.g.,
narrating in the past, present, and future)
- The accuracy present in the learner's speech
22. Phases
- The procedures used to elicit learner language
during the OPI are termed phases.
- Omaggio Hadley (1993, pp. 456-58) describes each
phase as follows:
23. Phase 1: Warm-up
- The warm-up portion of the interview is very
brief and consists of greeting the interviewee,
making him or her feel comfortable, and
exchanging the social amenities that are normally
used in everyday conversations.
- Typically, the warm-up lasts less than three
minutes.
24. Phase 2: Level check
- This phase consists of establishing the highest
level of proficiency at which the interviewee can
sustain speaking performance.
- This phase of the interview allows the person
being tested to demonstrate his or her strengths.
- It is designed to elicit a speech sample that is
adequate to prove that the person can function
accurately at the level hypothesized by the
interviewer during the warm-up phase.
- It allows the interviewer to get a better idea of
the actual proficiency level of the interviewee.
25. Phase 3: Probes
- Probes are questions or tasks designed to elicit
a language sample at one level of proficiency
higher than the hypothesized level in order to
establish a ceiling on the interviewee's
performance.
- The probes may result in linguistic breakdown:
the point at which the interviewee ceases to
function accurately or cogently because the task
is too difficult.
26. Phase 4: Wind-down
- When a ratable sample has been obtained, the
tester brings the interviewee back to the level
at which he or she functions most comfortably for
the last few minutes of the interview.
- This last phase gives the tester one more
opportunity to verify that his or her rating is
indeed correct.
27. Single-format test
- Each test giver follows the standard, prescribed
phases.
- OPI training ensures that raters carry out the
interview uniformly and apply the ratings
consistently.
- The OPI is referred to as a single-format test
because it consists of only one task (an interview)
and has no other components.
28. Two concepts
- There are two important concepts that emerge from
a consideration of testing.
- Bias refers to situations in which elicitation
and evaluation procedures are not the same for
all test takers. The test giver is the variable
in this scenario.
- Inter-rater reliability refers to the goal of
having all raters evaluate a test the same way.
Given a set of criteria, all raters should apply
them the same way.
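One simple way to check whether two raters apply a scale the same way is to compare their scores on the same speech samples. The sketch below is illustrative only: the function name, the one-point tolerance, and the scores are assumptions made for this example, not part of the OPI or INOPT procedures.

```python
def agreement_rates(rater_a, rater_b, tolerance=1):
    """Return (exact, adjacent) agreement proportions for two raters.

    exact    = share of samples given the identical score
    adjacent = share of samples scored within `tolerance` points
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of samples")
    pairs = list(zip(rater_a, rater_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= tolerance for a, b in pairs) / len(pairs)
    return exact, adjacent


# Hypothetical scores from two raters for the same six speech samples.
rater_a = [4, 6, 7, 9, 5, 8]
rater_b = [4, 6, 8, 9, 5, 6]

exact, adjacent = agreement_rates(rater_a, rater_b)
print(f"exact agreement: {exact:.2f}, within one point: {adjacent:.2f}")
```

Low agreement would suggest that the raters are interpreting the criteria differently and may need further training.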
29. Questions about OPI
- Although useful for a variety of reasons, the OPI
has been questioned because of its single-format
nature.
- Shohamy states that
- "Viewing oral language as constituting a
multiple of different speech styles and
functions, (e.g., discussing, arguing,
apologizing, interviewing, conversing, being
interviewed, reporting, etc.) means that ..."
30. Shohamy continued
- "... being interviewed, the speech style and
function tapped in an oral interview, represents
only a single type of oral interaction. No doubt
that it is an important speech style, but
clearly, there are also other oral interactions
which are equally important in real life
situations."
- (Shohamy, 1987, p. 52)
31. The Israeli National Oral Proficiency Test
- In a series of studies, Shohamy and her
colleagues (Reyes, 1982; Shohamy, Reyes, &
Bejerano, 1986) found that a learner's
performance on an oral interview was not a valid
predictor of that learner's performance on other
oral tasks.
- This test was introduced in Israel in 1986 as the
national examination for students at the end of
twelfth grade.
32. INOPT continued
- The Israeli National Oral Proficiency Test in
English as a Foreign Language (INOPT), in
contrast to the OPI, is multicomponential by
design and therefore more comprehensive.
- In addition to the oral interview, three other
tasks are used to evaluate test takers' oral
proficiency: a role play, a reporting task, and a
group discussion.
33. Justification of the four formats
- Each format elicits a different speech style, so
that the test as a whole comprises a range of
speech styles that reflect communicative language
use in authentic situations.
- Their research demonstrated that the test did
discriminate well among various levels of oral
proficiency.
34. Justification continued
- Their statistical analyses on the test allowed
them to conclude that each section of the test
was indeed different from the other sections.
- They concluded that if the goal was to test
various speech styles, then each would need to be
tested via separate oral tests.
35. Descriptions
- Shohamy and her fellow researchers offer the
following descriptions of the tests used in the INOPT.
- You will notice that their Oral Interview and the
ACTFL OPI are quite similar.
36. Shohamy's tests
- Test 1: Oral interview. The rationale underlying
this test was to guide the test-takers into a
dialogue with the tester.
- Test 2: Role play. The rationale behind this test
was to stimulate the test-taker to produce
spontaneous speech-behavior within given roles
eliciting specific speech functions. In it, the
test-taker had to play one role, with the tester
playing another, both partners in a dialogue.
37. Shohamy's tests continued
- Test 3: Reporting test. The rationale underlying
this test was to stimulate the test-taker into a
monologue in the foreign language. The student
read a newspaper article silently in Hebrew, and
was asked to report its general content in
English.
- Test 4: Group discussion. The rationale underlying
this test was to stimulate the test-takers into a
spontaneous discussion of a controversial issue.
38. Evaluation criteria for tests of spoken language
- The speech sample elicited via the OPI is judged
against the ACTFL Proficiency Guidelines.
- The following level descriptions are taken from
Omaggio Hadley (1993, pp. 502-504).
- Novice: The Novice level is characterized by the
ability to communicate minimally with learned
material.
39. Level descriptions continued
- Intermediate:
- Create with the language by combining and
recombining learned elements, though primarily in
a reactive mode
- Initiate, minimally sustain, and close in a
simple way basic communicative tasks
- Ask and answer questions
40. Level descriptions continued
- Advanced:
- Converse in a clearly participatory fashion
- Initiate, sustain, and bring to closure a wide
variety of communicative tasks
- Satisfy the requirements of school and work
situations
- Narrate and describe with paragraph-length
connected discourse
41. Level descriptions continued
- Superior:
- Participate effectively in most formal and
informal conversations on practical, social,
professional, and abstract topics
- Support opinions and hypothesize using
native-like discourse strategies
42. Judging speech samples
- The four speech samples elicited by the INOPT are
each judged separately according to the following
scale.
- 4: Unintelligible
- No language produced
- No interaction possible
- 5: Hardly intelligible
- Very poor language produced
- Only simplest, fragmentary interaction possible
43. Judging speech samples continued
- 6: Clearly intelligible
- Simple language produced
- Interaction possible
- Not articulate
- 7: Responsive in interaction
- Slightly more sophisticated language produced
- Consistent errors that do not interfere with
fluency
- Strong MT (mother tongue) interference
(translated patterns, etc.)
44. Judging speech samples continued
- 8: Almost effortless in expression
- Adequate in interaction
- Errors not consistent
- 9: Facility of expression
- Comfortable initiating in interaction
- Sporadic mistakes
- 10: No limitation whatsoever
- Near-native
- (Shohamy et al., 1986, p. 219)
45. Similarities between OPI and INOPT
- Each contains some kind of interview.
- Each uses holistic ratings (that is, a single
final score for the entire test).
- Bachman (1990, p. 328) argues that proficiency is
not a unitary ability, but rather a componential
one because we can identify the pieces and
constituent parts of oral proficiency.
46. Componential rating scales
- If oral proficiency is not a unitary ability,
then it should not be tested as such (Shohamy et
al., 1986), and just as important, it should not
be scored as such (Bachman, 1990).
- Bachman proposes that tests of oral proficiency
be evaluated using componential scoring criteria
and provides the following criteria used in a
test of oral proficiency he developed with a
colleague (Bachman & Palmer, 1983).
- The three scales assess grammatical, pragmatic,
and sociolinguistic competence.
47. Sample of scale for grammatical competence
Rating | Range | Accuracy
0 | No systematic evidence of morphologic and syntactic structures | Control of few or no structures; errors of all or most possible types
3 | Large, but not complete range of both morphologic and syntactic structures | Control of some structures used, but with many error types
6 | Complete range | No systematic errors
48. Sample of scale for pragmatic competence
Rating | Vocabulary | Cohesion
0 | Extremely limited (a few phrases and formulaic words) | No cohesion (utterances completely disjointed)
2 | Moderate size (frequently misses or searches for words) | Moderate cohesion (relationships between utterances generally marked)
4 | Extensive size (rarely, if ever, misses or searches for words) | Excellent cohesion (uses a variety of appropriate devices)
49. Sample of scale for sociolinguistic competence
Registers | Nativeness | Use of cultural references
0: Evidence of only one register | 1: Frequent nonnative but grammatical structures | 0.5: No evidence of ability to use cultural references
3.5: Control of both formal and informal registers | 4: No nonnative but grammatical structures | 4: Full control of appropriate cultural references
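The difference between holistic and componential scoring can be illustrated with a small sketch. Everything in it is hypothetical: the learner ratings, the scale maxima, and the function names are assumptions chosen for illustration and are not taken from Bachman and Palmer's instrument. It shows two learners whose totals are identical but whose component profiles point to different weaknesses.

```python
# Assumed maximum rating for each scale (illustrative values only).
SCALE_MAXIMA = {"grammatical": 6, "pragmatic": 4, "sociolinguistic": 4}

# Hypothetical componential ratings for two learners.
profiles = {
    "Learner A": {"grammatical": 5, "pragmatic": 1, "sociolinguistic": 3},
    "Learner B": {"grammatical": 2, "pragmatic": 4, "sociolinguistic": 3},
}


def holistic_total(profile):
    """Collapse a componential profile into one overall number."""
    return sum(profile.values())


def weakest_component(profile, maxima=SCALE_MAXIMA):
    """Return the scale on which the learner is furthest from the maximum."""
    return min(profile, key=lambda name: profile[name] / maxima[name])


for learner, profile in profiles.items():
    print(learner, holistic_total(profile), weakest_component(profile))
```

Both learners receive the same total, yet the profile singles out a different scale for each, which is the kind of diagnostic detail a single score cannot carry.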
50. Pause to consider (p. 111)
- ... the diagnostic uses of classroom tests. One of
the important functions of classroom testing is
its diagnostic function. By examining learners'
performance on a test, we can provide them
feedback on their strengths and weaknesses.
- Does a global, holistic score enable an
instructor to give diagnostic
feedback? Think about what you would want to
know about your own oral proficiency in the L2.
- Would a high score on an oral proficiency test
mean that you did not have weaknesses?
- Would a low score indicate what specific things
you could do to improve?
51. Summary of Chapter 5
- Adapted classroom activities for testing
situations
- Examined two tests for evaluating spoken language
- Suggested the use of tests that examine a variety
of speech styles and functions via multiple
formats
52. Summary of Chapter 5 continued
- Presented several componential rating scales,
which allow a more precise evaluation of the
speech sample as well as a more detailed
diagnosis of the learner's language.
- Suggested that the choice of rating scales should
depend on the types of oral interactions elicited
and whether the interaction involves just a test
giver or other learners.
53. Thinking more about it (p. 115)
- 3. Consider the advantages and disadvantages of
having three language learners perform an
information-exchange task as an oral test or
quiz. What rating scales would you use?