Week 1
- Introduction to language testing/assessment
(Diagram: the relationship among TEACHING, ASSESSMENT (FORMAL and INFORMAL), TESTS, and ALTERNATIVE ASSESSMENT)
Alternative Assessment vs. Traditional Assessment (Testing)
Alternative Assessment | Traditional Assessment (Testing)
Direct examination of student performance and knowledge on real-life-like tasks | Indirect examination of students' performance and knowledge; test items represent competence
Formative | Summative
Open-ended, creative answers | Focus on the right answer
Continuous, long-term assessment (process-oriented) | One-shot standardized exams (product-oriented)
Untimed, free-response format, e.g., essays, oral presentations, portfolios | Timed multiple-choice, true/false, or matching format
Multiple modes of assessment, e.g., conduct research, write, revise, discuss papers, provide oral analysis of an event or reading (interactive performance) | Modes of assessment usually limited to paper-and-pencil, one-answer questions (non-interactive performance)
Contextualized communicative tasks | Decontextualized test items
Individualized feedback | Scores as feedback
How is the procedure to be used? Formative assessment vs. summative assessment
- Formative: evaluating students in the process of forming their skills, in order to help them. All kinds of informal assessment are formative.
- Summative: aims to measure what a student has done at the end of instruction or a course, e.g., final exams.
Kinds of tests
- Language tests can be described from different perspectives, depending on our criteria of classification.
Dimension One: How are test scores reported?
- Norm-referenced
- Criterion-referenced
Norm-referenced tests
- Measure global language abilities (e.g., listening, reading, speaking, writing)
- A score on the test is interpreted relative to the scores of all other students who took the test, in PERCENTILE terms.
- The assumption is that the scores will be spread out along a normal, bell-shaped curve.
Norm-referenced tests
- Students are compared with their peers. We are not told directly what the student is capable of doing.
- Students know the format of the test but do not know what specific content or skills will be tested.
- A few relatively long subtests with a variety of question contents
Criterion-referenced tests
- Students are evaluated against a set of criteria (i.e., language skills and knowledge, etc.). We learn something about what the student can actually do in the language.
- Measure well-defined and fairly specific objectives
- Interpretation of scores is absolute: without referring to other students' scores, a student's performance is compared only to the amount, or PERCENTAGE, of material learned.
Criterion-referenced tests
- Students know in advance what types of questions, tasks, and content to expect on the test
- A series of short, well-defined subtests with similar question contents
TYPES of TESTS (Adapted from Brown, 1996)
Characteristic | Norm-Referenced | Criterion-Referenced
Type of interpretation | Relative (a student's performance is compared to that of all other students, in percentile terms) | Absolute (a student's performance is compared only to the amount or percentage of material learned)
Type of measurement | Measures general language abilities or proficiencies | Measures specific, objectives-based language points
Distribution of scores | Normal distribution of scores around the mean | Varies, usually non-normal (students who know all the material should score 100)
Purpose of testing | Spread students out along a continuum of general abilities or proficiencies | Assess the amount of material known, or learned, by each student
Test structure | A few relatively long subtests with a variety of question contents | A series of short, well-defined subtests with similar question contents
Knowledge of questions | Students have little or no idea what content to expect in questions | Students know exactly what content to expect
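The two score interpretations contrasted above can be made concrete in code. A minimal sketch (the function names and all sample scores are invented for illustration, not taken from the course):

```python
# Sketch: the two score interpretations contrasted above.
# All scores below are invented sample data.

def percentile_rank(score, all_scores):
    """Norm-referenced: percentage of test takers scoring below this score."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

def percent_mastered(correct, total_objectives):
    """Criterion-referenced: share of tested objectives answered correctly."""
    return 100 * correct / total_objectives

cohort = [45, 52, 58, 61, 61, 67, 70, 74, 80, 88]  # hypothetical raw scores
print(percentile_rank(70, cohort))   # relative standing within the cohort: 60.0
print(percent_mastered(18, 25))      # absolute amount of material learned: 72.0
```

The same raw score of 70 thus means different things under the two interpretations: "better than 60% of the cohort" versus nothing at all until it is restated as a percentage of defined objectives.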
Dimension Two: Purpose/Content
- Proficiency tests
- Achievement tests
- Diagnostic tests
- Placement tests
Proficiency tests
- Test students' ability in a language, regardless of any training they may have had in that language
Achievement tests
- Measure the amount of learning that students have done
- The decision may involve who will advance to the next level of study or which students should graduate
- Must be designed with specific reference to a particular course
- Conducted at the end of the program
- Used to make decisions about students' levels of learning; they can also be used to effect curriculum changes and to test those changes continually against program realities
Placement tests
- Group students of similar ability levels (homogeneous ability levels)
- Help decide what each student's appropriate level will be within a specific program
Diagnostic tests
- Aimed at fostering achievement by promoting the strengths and eliminating the weaknesses of individual students
- Require more detailed information about the very specific areas in which students have strengths and weaknesses
- Conducted at the beginning or in the middle of a language course
- A test can be diagnostic at the beginning or in the middle of a course but an achievement test at the end
- Perhaps the most effective use of a diagnostic test is to report the performance level on each objective to each student, so that he or she can decide how and where to invest time and energy most profitably
Decision Purposes (Adapted from Brown, 1996); NR = norm-referenced, CR = criterion-referenced
Test Qualities | Proficiency (NR) | Placement (NR) | Achievement (CR) | Diagnostic (CR)
Detail of information | Very general | General | Specific | Very specific
Focus | Language ability regardless of any training | Learning points at all levels and skills of the program | Terminal objectives of a course or program | Terminal and enabling objectives of courses
Purpose of decision | Overall comparison of an individual with other individuals | To find each student's appropriate level | To determine the degree of learning for advancement or graduation | To inform students and teachers of objectives needing more work
Relationship to program | Comparisons with other institutions | Comparisons within the program | Directly related to the objectives of the program | Directly related to objectives still needing work
When administered | Before entry, sometimes at exit | Beginning of program | End of courses | Beginning and/or middle of courses
Interpretation of scores | Spread of scores | Spread of scores | Number/amount of objectives learned | Number/amount of objectives learned
Dimension Three: Tasks to be performed in a test
- Direct
- Testing is said to be direct when it requires the candidate to perform the skill that we want to measure
- e.g., writing a short essay to test writing ability; an interview is also direct testing
- Semi-direct
- e.g., speaking and recording onto a tape recorder
- Indirect
- Measures the abilities that underlie the skills in which we are interested
- e.g., error correction to test writing ability
Types of Tests (Continued)
- Direct Tests
- The candidate performs precisely the skill being measured
- Straightforward assessment of performance and interpretation of scores
- Positive backwash effect
- Easier with productive skills (i.e., writing and speaking)
- Small sample of tasks and teaching objectives (generalizability problem)
- Better for final achievement and proficiency testing, as long as a wide sample of tasks is used
- Indirect Tests
- Measure the abilities which underlie the skills being measured (e.g., testing writing or pronunciation through tasks that require recognition, identification, and discrimination skills)
- Allow for a representative sample of tasks and teaching objectives
- Weak relationship between performance on the task and performance on the skill
- Better for obtaining diagnostic information (e.g., measuring control of grammatical structures)
Dimension Four: What language components are tested
- Discrete-point
- Refers to the testing of one element at a time, item by item; focuses on precise points of vocabulary, syntax, or grammar
- e.g., a test on a set of vocabulary only, or multiple-choice:
- Angry:
  - alerta
  - aburrido
  - enojado
  - molesto
- Integrative
- Requires the candidate to combine many language elements in the completion of a task
- e.g., cloze tests and dictation
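A fixed-ratio cloze test can be generated mechanically by deleting every nth word of a passage, which is part of why it forces the candidate to integrate many language elements at once. A minimal sketch (the passage and the deletion interval n = 5 are arbitrary choices, not a prescribed procedure):

```python
# Sketch of a fixed-ratio cloze test: delete every nth word from a passage.
# The passage and the deletion interval (n=5) are arbitrary examples.

def make_cloze(text, n=5):
    """Replace every nth word with a blank; return the gapped text and the answer key."""
    words = text.split()
    answers = []
    for i in range(n - 1, len(words), n):
        answers.append(words[i])
        words[i] = "_____"
    return " ".join(words), answers

passage = ("Language testing can be described from different perspectives "
           "depending on our criteria of classification")
gapped, key = make_cloze(passage, n=5)
print(gapped)   # every fifth word is now a blank
print(key)      # the deleted words, for scoring
```

Scoring such a test already raises the subjectivity question taken up later: exact-word scoring is objective, while acceptable-alternative scoring requires rater judgement.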
Types of Tests (Continued)
Discrete-point Tests | Integrative Tests
Test one element at a time; performance in very restricted areas of target-language use | Combine many language elements in the completion of a task (e.g., writing a composition, dictation, cloze tests)
Based on the assumption that language can be broken down into its component parts | Based on the assumption that language proficiency is indivisible (Oller's unitary trait hypothesis)
Tend to be indirect tests; used for diagnostic purposes | Tend to be direct; used to test overall language ability
Dimension Five: How tests are scored (methods of scoring)
- Objective
- If no judgement is required on the part of the scorer, scoring is objective
- e.g., multiple-choice
- Subjective
- e.g., short essay questions
Types of Tests (Continued)
- Subjective Tests
- Judgement is required
- There are degrees of subjectivity
- Reliable (objectified) subjective scoring is possible through the use of (a) precise rating scales and (b) multiple independent raters
- Require special expertise in the content area on the part of the scorer
- Objective Tests
- Do not involve judgement
- Do not require special expertise in the content area on the part of the scorer
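The objectified subjective scoring described above (a fixed rating scale plus multiple independent raters) can be sketched as follows; the band scale, the re-marking rule, and the sample ratings are all invented for illustration:

```python
# Sketch: objectifying subjective scoring by combining multiple independent
# raters who use the same rating scale (a 0-5 band scale here).
# The re-marking threshold and sample ratings are invented.

def combined_score(ratings, max_spread=1):
    """Average independent ratings; flag the script for re-marking
    when raters disagree by more than max_spread bands."""
    spread = max(ratings) - min(ratings)
    needs_remark = spread > max_spread
    return sum(ratings) / len(ratings), needs_remark

essay_ratings = {"rater_A": 4, "rater_B": 4, "rater_C": 3}  # hypothetical raters
score, remark = combined_score(list(essay_ratings.values()))
print(score, remark)   # close agreement, so no re-marking flag
```

The point of the flag is that averaging alone can hide large disagreements; a spread check makes the rating scale's reliability visible per script.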
Dimension Six: Test formats
Dimension Seven: What technology is used in test administration
- Computer-based testing (CBT)
- The electronic equivalent of traditional paper-and-pencil tests. Measurement of test quality and student scores often uses classical test theory (which will be discussed in this course).
- Computer-adaptive testing (CAT)
- The computer selects the next test item for the student based on his/her response to the previous item. It uses Item Response Theory.
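One adaptive step of a CAT can be sketched under the simplest IRT model, the one-parameter (Rasch) model. Everything concrete here is invented for illustration: the item bank, the difficulties, and the crude fixed-step ability update (operational CATs estimate ability by maximum likelihood instead):

```python
import math

# Sketch of one computer-adaptive step under the one-parameter (Rasch) IRT
# model: after each response, nudge the ability estimate and pick the unused
# item whose difficulty is closest to it.

def p_correct(ability, difficulty):
    """Rasch (1PL) model: probability of a correct response."""
    return 1 / (1 + math.exp(-(ability - difficulty)))

def next_item(ability, item_bank, used):
    """Pick the unused item whose difficulty is nearest the ability estimate
    (for the Rasch model, that is the most informative item)."""
    candidates = [i for i in item_bank if i not in used]
    return min(candidates, key=lambda i: abs(item_bank[i] - ability))

item_bank = {"q1": -1.0, "q2": 0.0, "q3": 0.8, "q4": 1.5}  # invented difficulties
ability, used = 0.0, set()

first = next_item(ability, item_bank, used)   # difficulty nearest 0.0 -> "q2"
used.add(first)
correct = True                                # simulated response to "q2"
ability += 0.5 if correct else -0.5           # crude fixed-step update
second = next_item(ability, item_bank, used)  # nearest the new estimate -> "q3"
print(first, second)
```

Each correct answer pushes the student toward harder items and each wrong answer toward easier ones, which is what lets a CAT reach a stable estimate with fewer items than a fixed-form test.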
What to test
- Attitudes (toward FL study, the TL, the TL culture, TL speakers, the course, etc.)
- Motivation
- Beliefs (about how FLs are learnt, the benefits of FL learning)
- Interests (topics, activities)
- Needs
- Personality variables (e.g., introversion vs. extraversion, creativity, self-confidence, language ego, etc.)
- Learning styles and strategies (e.g., field dependence vs. field independence)
- Study habits
- Language ability (language skills, language areas, e.g., grammar usage, vocabulary, phonology)
Why to test (Adapted from Ur, 1996)
- Tests may be used as a means to:
- Give the teacher information about where the Ss are at the moment, to help decide what they need to teach next
- Give the Ss information about what they know, so that they also have an awareness of what they need to learn or review
- Assess current teaching (a final grade for the course, selection)
- Motivate Ss to learn or review specific material
- Get a noisy class to keep quiet and concentrate
- Provide a clear indication that the class has reached a "station" in learning, such as the end of a unit
- Get Ss to make an effort (in doing the test itself), which is likely to lead to better results and a feeling of satisfaction
- Provide Ss with a sense of achievement and progress in their learning
Backwash
- The effect of testing on teaching and learning (if a test is regarded as important, then preparation for the test will dominate all teaching and learning activities)
- Teachers should know that the way they teach and the way they test should go hand in hand. There should be a strong fit between teaching and testing.
- Backwash is beneficial if the test is supportive of good teaching practice and exerts a corrective influence on bad teaching.
Achieving Beneficial Backwash
- Test the abilities whose development you want to encourage
- Sample widely and unpredictably (reduce the guessing factor)
- Use direct testing
- Base tests on objectives rather than on detailed teaching and textbook content
- Ensure that the test format is known and understood by Ss and teachers (Ss must be familiar with the task types)
- Backwash is harmful/negative if the test content and test techniques do not match the course objectives.
Teaching factors
- Teachers narrow the curriculum.
- Teachers stop teaching.
- Teachers replace class textbooks with worksheets identical to previous years'.
Course content factors
- Students being taught "examination-ese"
- Students practicing items similar in format to those on the test
- Students applying test-taking strategies in class
- Students studying vocabulary and grammar rules
Course time factors
- Requesting additional test-preparation classes
- Review sessions added to regular class hours
- Skipping language classes to study for the test
TEST SPECIFICATIONS
- An official statement about what the test measures and how it measures it
- Who needs it?
- Test writers
- Test users
- Test validators
WRITING TEST SPECIFICATIONS
- 1. Statement of the Problem
- Determine the purpose of the test
  - Proficiency
  - Placement
  - Diagnostic
  - Achievement
- Determine the characteristics of the test takers
  - Age
  - Language ability level
  - Grade level
  - Type of learner
Writing test specifications
- 2. Content
- Grammar: a complete list of structures
- Vocabulary: a complete list of lexical items
- Language skills
- Language skills
- Operations
- Types of text
- Addressees of texts
- Topics
- Length of texts
- Readability
- Structural range
- Vocabulary range
- Dialect, accent
- 3. Blueprint
- Test structure
- Number of parts
- Sequence of parts
- Relative importance of parts
- Number of items per part
- Timing of each part
Stages of Test Construction
- Item Specifications
- Question/item types
- Time
- Weighting of items/questions
- Instructions
  - Language (native, target)
  - Channel (aural, visual)
- Characteristics of input and expected response
  - Channel (aural, visual)
  - Language (native, target, both)
  - Length
- Sample items
- Scoring method
  - Procedures for scoring
  - Criterial levels of performance
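Since a blueprint fixes parts, item counts, timing, and weighting, it can be written down as structured data and checked mechanically before item writing begins. A sketch (the field names and all concrete values are invented examples, not a prescribed schema):

```python
# Sketch: a test blueprint as structured data. Field names follow the
# blueprint outline above; all concrete values are invented examples.

blueprint = {
    "purpose": "achievement",
    "parts": [
        {"name": "listening", "items": 20, "minutes": 25, "weight": 0.3},
        {"name": "grammar",   "items": 30, "minutes": 30, "weight": 0.3},
        {"name": "writing",   "items": 1,  "minutes": 35, "weight": 0.4},
    ],
}

# Simple consistency checks a spec writer might run on the blueprint:
total_minutes = sum(p["minutes"] for p in blueprint["parts"])
total_weight = sum(p["weight"] for p in blueprint["parts"])
print(total_minutes)  # total test time implied by the part timings: 90
assert abs(total_weight - 1.0) < 1e-9, "relative importance of parts must sum to 1"
```

Writing the blueprint down this way makes the "relative importance of parts" explicit and catches arithmetic slips (timings or weights that do not add up) before the test is assembled.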
4. WRITING AND MODERATING ITEMS
- Sampling
  - Sample widely for content validity
- Writing items
- Moderating items
  - Get expert opinion
- Pretesting
  - Pilot testing with a similar group
  - Item analysis
  - Reliability measures
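Item analysis at the pretesting stage typically reports item facility (the proportion of pilot students answering an item correctly) and a discrimination index (upper-group facility minus lower-group facility). A sketch using these standard definitions; the response matrix is invented sample data:

```python
# Sketch of two standard item-analysis statistics computed during pretesting.
# Responses (1 = correct, 0 = wrong) are invented sample data: each row is
# one pilot student (already ranked by total score), each column one item.

responses = [
    [1, 1, 1],   # strongest student
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],   # weakest student
]

def item_facility(col):
    """Proportion of students answering the item correctly."""
    return sum(r[col] for r in responses) / len(responses)

def discrimination(col, k=2):
    """Upper-group minus lower-group facility (rows assumed ranked by total score)."""
    upper = sum(r[col] for r in responses[:k]) / k
    lower = sum(r[col] for r in responses[-k:]) / k
    return upper - lower

print(item_facility(0))    # 0.6: a moderately easy item
print(discrimination(0))   # 1.0: strong students pass it, weak ones fail it
```

Items with facility near 0 or 1, or with low (or negative) discrimination, are the ones a moderator would revise or drop before the operational test.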
- 5. ADMINISTERING THE TEST
- 6. SCORING
- 7. VALIDATION (statistical measures)
Questions on Test Administration
- Before the test
- How far in advance do you announce the test?
- How much do you tell the class about what is going to be in it, and about the criteria for marking?
- How much information do you need to give them about the time, place, and any limitations or rules?
- Do you give them any tips about how best to cope with the test format?
- Do you expect them to prepare at home, or do you give them some class time for preparation?
- Giving the test
- How important is it for you yourself to administer the test?
- Assuming that you do, what do you say before giving out the test papers?
- Do you add anything when the papers have been distributed but students have not yet started work?
- During the test, are you absolutely passive, or are you interacting with the students in any way?
- After the test
- How long does it take you to mark and return the papers?
- Do you go through them in class?
- Do you demand any follow-up work on the part of the students?
- Lecturer: Serkan GÜRKAN
- serkan.gurkan_at_kou.edu.tr