Title: Construction and analysis of test
1ITEM ANALYSIS
2CONSTRUCTION of test analysis
3Characteristics of a Good Test
- Validity
- It refers to the appropriateness or truthfulness of a tool. A tool is valid if it measures what it is supposed to measure.
- Reliability
- It refers to the trustworthiness or consistency of measurement of a tool, whatever it measures.
4- Objectivity
- It refers to the absence of subjective bias in the interpretation of responses obtained by a tool.
- Economy
- The test should be simple and quick to administer, saving money and time.
5- Practicability or Feasibility
- The test should not require special infrastructure such as a dark room or a one-way see-through room.
6Decision to gather evidence
↓
Decision to allocate resources
↓
Content analysis and test blue print
↓
Item writing
↓
Item review 1
↓
Planning item scoring
↓
Production of trial tests
↓
Trials
↓
Item review 2
↓
Amendment (revise/replace/discard)
↓
More items needed?
↓
No
↓
Assembly of final tests
7Trial Test
- It involves time and resources
- Prepare the content analysis and blue print
- Review each item before trial testing
8Content Analysis
- Which area of the curriculum is selected?
- Are there significant sections in the content?
- Are there significant subdivisions in the content?
- Which of the representative areas should be included?
9Blue Print
- Title
- Fundamental purpose
- The aspects of the curriculum covered
- For whom the test is constructed
- Time, date, who will administer and who will score
- Weightage for recall, comprehension and reflective thinking
10Blue Print
Content       Recall     Comprehension   Critical thinking   Total
PROSE         2 items    2 items         5 items             9
POETRY        2 items    4 items         5 items             11
GRAMMAR       2 items    4 items         12 items            18
CRITICISM     4 items    2 items         -                   6
COMPARISONS   4 items    2 items         -                   6
TOTAL         14 items   14 items        22 items            50
11Item Specification
Content       Recall                 Comprehension          Critical thinking                                      Total
PROSE         Items 2, 5             Items 12, 23           Items 28, 31, 32, 40, 50                               9
POETRY        Items 6, 10            Items 13, 14, 16, 17   Items 33, 36, 37, 38, 39                               11
GRAMMAR       Items 1, 7             Items 18, 19, 20, 21   Items 24, 29, 30, 41, 42, 43, 44, 45, 46, 47, 48, 49   18
CRITICISM     Items 3, 4, 8, 9       Items 34, 35           -                                                      6
COMPARISONS   Items 11, 15, 22, 25   Items 26, 27           -                                                      6
TOTAL         14 items               14 items               22 items                                               50
12Scoring Key
Item  1 2 3 4 5 6 7 8 9 10
Key   2 5 1 2 3 4 1 4 5 3
13Item Review 1
- Dependable inferences can be made about the choice of content
- All important parts of the curriculum are addressed
- Achievement over the full range is assessed
14How to review?
- Is the item clear in expression?
- Are the items expressed in the simplest possible language?
- Are there unintended clues to the correct answer?
- Is the format reasonably consistent?
- Is there a single, clearly correct answer for each item?
- Is the type of item appropriate to the information required?
- Are there enough items to provide adequate coverage of the behaviour to be assessed?
15Purpose of Trial Test
- Establish the difficulty of each item
- Identify the distracters which do not appear plausible
- Suggest the number of items to be included in the final test
- Establish the contribution of each item to the discrimination between low-achieving and high-achieving candidates
- Check the adequacy of the administration instructions
- Identify misconceptions held by the students through analysis of their responses
16Choosing a Sample
- A sample of 100 to 150 students of varied abilities may be selected
- The numbers of male and female students should be approximately equal
- Judgment sampling technique, applied to the target group
17Try out of the Test
- The test is administered to a representative sample chosen from the target population for whom the test is intended, and scored. This pilot study is useful for the following:
- To identify weak or defective items and to reveal needed improvements.
- To determine the difficulty level and discriminating power of each individual item so that a selection of items may be made.
18- To provide the data needed to determine an appropriate time limit for the final test.
- To standardize the instructions and procedures.
- To know how to organize the items.
- To decide the proper format.
19Scoring of Trial Test
- Needs training
- Not according to the scorers' judgment
- Refer to the scoring key
- Mechanical scoring is recommended to maintain accuracy
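Mechanical scoring against a key can be sketched as follows. The key is the one shown on the Scoring Key slide; the pupil's answer sheet, the function name `score_responses`, and the choice to mark a missing response as wrong are assumptions made for illustration.

```python
# Scoring key from the Scoring Key slide: item number -> keyed option.
SCORING_KEY = {1: 2, 2: 5, 3: 1, 4: 2, 5: 3, 6: 4, 7: 1, 8: 4, 9: 5, 10: 3}


def score_responses(responses, key=SCORING_KEY):
    """Return a 0/1 mark per item: 1 if the chosen option matches the key."""
    # A missing response is treated as wrong (an assumption of this sketch).
    return {item: int(responses.get(item) == correct)
            for item, correct in key.items()}


# Hypothetical answer sheet for one pupil (item number -> option chosen).
pupil = {1: 2, 2: 3, 3: 1, 4: 2, 5: 3, 6: 4, 7: 2, 8: 4, 9: 5, 10: 1}
marks = score_responses(pupil)
total = sum(marks.values())
print(marks)
print("Total:", total)
```

Because the marks come only from comparing each response with the key, two scorers using the same key will always produce the same total.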
20Scores in the Matrix
Item GEET RAI RAJU RANI SURI POO RITA JOE CATH RUTH Total
1 1 1 1 1 1 0 1 0 0 1 7
2 1 0 0 1 0 1 0 0 0 0 3
3 1 1 1 1 1 1 1 1 0 0 8
4 1 1 1 0 1 0 1 1 1 0 7
5 1 1 1 1 1 1 1 1 1 1 10
6 1 1 0 0 1 1 0 0 1 0 5
7 1 1 1 1 0 1 0 1 0 0 6
8 1 0 1 0 0 0 0 1 0 0 3
9 1 0 0 0 0 1 0 0 0 0 2
10 1 1 1 1 1 0 1 0 0 0 6
Total 10 7 7 6 6 6 5 5 3 2 57
21Arranging Pupils
- After the trial test is scored, pupils are placed in order of total score, from high to low.
22Arranging Pupils' Scores
Item GEET RAI RAJU RANI SURI POO RITA JOE CATH RUTH Total
5 1 1 1 1 1 1 1 1 1 1 10
3 1 1 1 1 1 1 1 1 0 0 8
1 1 1 1 1 1 0 1 0 0 1 7
4 1 1 1 0 1 0 1 1 1 0 7
7 1 1 1 1 0 1 0 1 0 0 6
10 1 1 1 1 1 0 1 0 0 0 6
6 1 1 0 0 1 1 0 0 1 0 5
2 1 0 0 1 0 1 0 0 0 0 3
8 1 0 1 0 0 0 0 1 0 0 3
9 1 0 0 0 0 1 0 0 0 0 2
Total 10 7 7 6 6 6 5 5 3 2 57
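The totals and the ordering in the two matrices can be reproduced with a short script. This is a sketch that uses the data from the score matrix; the variable names are illustrative, and ties are left in their original order.

```python
PUPILS = ["GEET", "RAI", "RAJU", "RANI", "SURI", "POO", "RITA", "JOE", "CATH", "RUTH"]

# Rows are items 1..10, columns follow PUPILS (1 = correct, 0 = wrong).
SCORES = {
    1: [1, 1, 1, 1, 1, 0, 1, 0, 0, 1],
    2: [1, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    3: [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    4: [1, 1, 1, 0, 1, 0, 1, 1, 1, 0],
    5: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    6: [1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
    7: [1, 1, 1, 1, 0, 1, 0, 1, 0, 0],
    8: [1, 0, 1, 0, 0, 0, 0, 1, 0, 0],
    9: [1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    10: [1, 1, 1, 1, 1, 0, 1, 0, 0, 0],
}

# Column totals (one per pupil) and row totals (one per item).
pupil_totals = {name: sum(row[i] for row in SCORES.values())
                for i, name in enumerate(PUPILS)}
item_totals = {item: sum(row) for item, row in SCORES.items()}

# sorted() is stable, so pupils or items with equal totals keep their original order.
pupils_ranked = sorted(PUPILS, key=lambda name: pupil_totals[name], reverse=True)
items_ranked = sorted(SCORES, key=lambda item: item_totals[item], reverse=True)

print(pupils_ranked)
print(items_ranked)
```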
23Indices of difficulty and discriminating power of items
- The top 27% of pupils constitute the high achieving group and the bottom 27% constitute the low achieving group.
- The indices of discriminating power and difficulty level are computed for each item of the test using the following formulae.
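Forming the two groups can be sketched as below. The pupil totals are those from the score matrix; the function name `split_groups`, the rounding of the group size, and the arbitrary handling of ties at the cut-off are assumptions of this sketch rather than rules stated on the slide.

```python
def split_groups(totals, fraction=0.27):
    """Return (high_group, low_group) as lists of pupil names."""
    # Rank pupils from highest to lowest total score.
    ranked = sorted(totals, key=totals.get, reverse=True)
    # Group size: the given fraction of the sample, rounded, at least one pupil.
    n = max(1, round(len(ranked) * fraction))
    return ranked[:n], ranked[-n:]


totals = {"GEET": 10, "RAI": 7, "RAJU": 7, "RANI": 6, "SURI": 6,
          "POO": 6, "RITA": 5, "JOE": 5, "CATH": 3, "RUTH": 2}
high, low = split_groups(totals)
print("High group:", high)
print("Low group:", low)
```

With only ten pupils each group holds three names, which is far too few for stable indices; the trial sample of 100 to 150 pupils gives groups of roughly 27 to 40.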
24Analysis of an Item
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
0 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1
25- Discriminating power = (Ph - Pl) / U
- Difficulty level = (Ph + Pl) / U
- Ph = the number of pupils in the high achieving group who answered the item correctly.
- Pl = the number of pupils in the low achieving group who answered the item correctly.
- U = total number of pupils in both groups.
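A minimal sketch of the two indices, working from counts of correct answers in each group. It assumes the high and low groups are the same size, and it divides the difference by the size of one group (half of U), a widely used convention that lets the discrimination index run from -1 to +1; dividing by U itself, as the formula above is written, gives half that value. The function and parameter names are illustrative.

```python
def item_indices(high_correct, low_correct, group_size):
    """Return (difficulty_level, discriminating_power) for one item.

    high_correct / low_correct: pupils answering correctly in each group.
    group_size: number of pupils in ONE group (both groups assumed equal).
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    # Difficulty level: share of pupils in both groups who got the item right.
    difficulty = (high_correct + low_correct) / (2 * group_size)
    # Discriminating power: difference between the two groups' success rates.
    discrimination = (high_correct - low_correct) / group_size
    return difficulty, discrimination


# Item 1 of the score matrix, taking the top three and bottom three pupils:
# all three high scorers and one low scorer answered it correctly.
difficulty, discrimination = item_indices(high_correct=3, low_correct=1, group_size=3)
print(round(difficulty, 2), round(discrimination, 2))
```

A negative discriminating power means the low group did better than the high group on that item, which usually points to a faulty key or an ambiguous question.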
26Types of Discriminators
- Positive Discriminator
- Negative Discriminator
- Non Discriminator
27Graphical Analysis of Scores
- Acceptable (or possibly acceptable) correct answer response pattern.
- Non-acceptable correct answer response pattern.
28Criteria for Selection
Discriminating Power                        Difficulty Level
.4 and above        Excellent item          Between .4 and .6   Average difficulty
Between .3 and .4   Good                    Between .2 and .4   Difficult item
Between .2 and .3   Average item            Between .6 and .8   Easy item
Between .1 and .2   Requires improvement    Between .8 and 1    Very easy item
Less than .1        Item to be dropped      Between 0 and .2    Very difficult item
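The selection criteria can be written as two small lookup functions. This is a sketch: the table does not say which band a boundary value such as .4 belongs to, so the code assumes each band includes its lower limit, and the function names are illustrative.

```python
def classify_discrimination(d):
    """Label a discriminating power value using the bands in the table."""
    if d >= 0.4:
        return "Excellent item"
    if d >= 0.3:
        return "Good"
    if d >= 0.2:
        return "Average item"
    if d >= 0.1:
        return "Requires improvement"
    return "Item to be dropped"


def classify_difficulty(p):
    """Label a difficulty level (share of pupils answering correctly)."""
    if not 0 <= p <= 1:
        raise ValueError("difficulty level must lie between 0 and 1")
    if p < 0.2:
        return "Very difficult item"
    if p < 0.4:
        return "Difficult item"
    if p < 0.6:
        return "Average difficulty"
    if p < 0.8:
        return "Easy item"
    return "Very easy item"


print(classify_discrimination(0.35), "/", classify_difficulty(0.5))
```

Note that a higher difficulty level means an easier item, because the index counts correct answers.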
29Is this a good item?
- Compute the difficulty and discrimination indices for an item administered to 263 pupils, where 74 pupils answered the item correctly and 32 pupils in the upper group and 23 pupils in the lower group passed the item.
- Is this a good item?
30Is this a good item?
- Compute the difficulty and discrimination indices of a test item administered to 84 pupils, if 52 test takers answered the item correctly, 20 in the upper group and 12 in the lower group.
- Is this a good item?
31Selection of Items
- Based on the calculated values of item discrimination and difficulty, appropriate items are chosen for the final form of the standardized test.
- The items are arranged in increasing order of difficulty.
32Assembly of the test in the final form
- Items are first chosen on the basis of discriminating power, and from among these, items with a proper difficulty level are finally selected for the final form.
- Care should be taken to see that at least 50% of the items are of average difficulty, 25% are easy, 20% are difficult and 5% are very difficult.
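The recommended mix can be turned into item counts for a test of any length. This is a sketch that treats the four percentages as targets; the function name `band_quotas` and the rule for handing out items left over after rounding down (largest remainder first) are assumptions, not part of the slide.

```python
# Target share of each difficulty band, in percent, as given above.
MIX_PERCENT = {"average": 50, "easy": 25, "difficult": 20, "very difficult": 5}


def band_quotas(n_items, mix=MIX_PERCENT):
    """Return the number of items to draw from each difficulty band."""
    # Integer arithmetic avoids floating-point surprises when rounding down.
    quotas = {band: n_items * pct // 100 for band, pct in mix.items()}
    leftover = n_items - sum(quotas.values())
    # Give any remaining items to the bands that lost most in rounding down.
    by_remainder = sorted(mix, key=lambda band: (n_items * mix[band]) % 100,
                          reverse=True)
    for band in by_remainder[:leftover]:
        quotas[band] += 1
    return quotas


print(band_quotas(40))
print(band_quotas(50))
```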
33- A detailed scoring scheme is also to be prepared so as to ensure objective evaluation of pupil responses.
- Appropriate instructions and procedures for administering the test also have to be developed and incorporated suitably in the test.
34Advantages of Item Analysis
- Powerful technique to improve instruction.
- Helpful for guidance.
- Valid measures of instructional objectives.
- Gives clue to the nature of the misunderstanding
and suggests remediation.
35Reliability
- The stability and trustworthiness of a test is called reliability.
- It should be free from error.
- E.g. the Stanford-Binet I.Q. test
- The score is a good estimate of the child's mental ability.
36Methods of determining Reliability
- There are four procedures for computing the reliability coefficient.
- Test Retest method
- Alternative or Parallel form
- Split half technique
- Rational Equivalence
37Test Retest Method
- Repetition of the test is the simplest method of determining agreement between two sets of scores.
- The test is given and repeated on the same group, and the correlation is computed between the first and second sets of scores.
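The test retest coefficient is the ordinary product-moment (Pearson) correlation between the two sets of scores. In this sketch the first set is the pupils' totals from the score matrix; the retest scores are invented for illustration, and the function name `pearson_r` is an assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return sxy / sqrt(sxx * syy)


first_test = [10, 7, 7, 6, 6, 6, 5, 5, 3, 2]   # totals from the score matrix
retest = [9, 8, 6, 7, 6, 5, 6, 4, 4, 2]        # hypothetical retest scores
r = pearson_r(first_test, retest)
print("Test retest reliability:", round(r, 3))
```

Adding the same number of points to every retest score leaves the coefficient unchanged, because the correlation depends only on the pupils' relative positions.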
38Defects in Test Retest method
- If the test is repeated immediately, many subjects will recall their first answers, which tends to increase their scores.
- Practice and confidence induced by familiarity also affect scores.
- If the interval is longer (six months), growth changes will affect the retest.
- Because of these defects, test retest is generally less useful than the other methods.
39Alternative or Parallel form method
- When alternative or parallel forms of a test can be constructed, the correlation between form A and form B may be taken as a measure of the self correlation of the test.
- The alternative form method is satisfactory when sufficient time has intervened between the administration of the two forms to weaken or eliminate memory and practice effects.
40- When form B of a test follows form A closely, scores on the second form of the test will often be increased because of familiarity.
- If such increases are approximately constant (3 to 5 points), the reliability coefficient of the test will not be affected, since the paired A and B scores maintain the same relative positions in the two distributions.
41- In drawing up alternative test forms, care must be exercised to match test materials for content, difficulty and form.
- When alternative forms are virtually identical, reliability will be too high; when they are not sufficiently alike, reliability will be too low.
- An interval of at least two to four weeks should be allowed between administrations of the test.
42The split half method
- In this method the test is first divided into two equivalent halves and the correlation is found for these half tests.
- From the reliability of the half test, the self correlation of the whole test is then estimated by the Spearman-Brown prophecy formula.
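The two steps can be sketched on the ten-item score matrix. The Spearman-Brown formula for a test of doubled length is r_whole = 2 * r_half / (1 + r_half). Splitting the test into odd-numbered and even-numbered items is one common way of forming the halves and is an assumption here, as are the function names.

```python
from math import sqrt

# Score matrix from the trial test: one row per item (1..10), one column per pupil.
ITEM_ROWS = [
    [1, 1, 1, 1, 1, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 1, 0, 0, 0],
]


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Estimate whole-test reliability from the half-test correlation."""
    return 2 * r_half / (1 + r_half)


# Transpose so that each entry holds one pupil's marks on items 1..10.
pupils = list(zip(*ITEM_ROWS))
odd_half = [sum(marks[0::2]) for marks in pupils]    # items 1, 3, 5, 7, 9
even_half = [sum(marks[1::2]) for marks in pupils]   # items 2, 4, 6, 8, 10

r_half = pearson_r(odd_half, even_half)
r_whole = spearman_brown(r_half)
print(round(r_half, 3), round(r_whole, 3))
```

The whole-test estimate is always at least as large as the half-test correlation when that correlation is positive, since a longer test samples the behaviour more fully.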
43- The split half method is regarded by many as the
best of the methods for measuring test
reliability.
44- Advantage
- All data for computing reliability are obtained on one occasion, so that variations brought about by differences between the two testing situations are eliminated.
45- How to divide?
- Alternate statements
- All the items are of equal difficulty
46Method of Rational Equivalence
- This method represents an attempt to get an estimate of the reliability of a test free from the objections raised against the methods outlined above.
- Two forms of a test are equivalent when the items a and A, b and B, c and C, etc. are interchangeable and when the inter-item correlations are the same for both forms.
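The slide gives no formula for this method. It is usually associated with the Kuder-Richardson formulas, so the sketch below uses KR-20 as an assumption: reliability = (k / (k - 1)) * (1 - sum(p * q) / variance of total scores), where k is the number of items, p is the share of pupils passing an item and q = 1 - p. The data are the ten-item score matrix; the function name and the use of the population variance (dividing by n) are also assumptions.

```python
def kr20(item_rows):
    """KR-20 reliability from a 0/1 matrix: one row per item, one column per pupil."""
    k = len(item_rows)
    n = len(item_rows[0])
    if k < 2 or n < 2:
        raise ValueError("need at least two items and two pupils")
    # p = share of pupils answering each item correctly; q = 1 - p.
    p = [sum(row) / n for row in item_rows]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    # Total score of each pupil and the population variance of those totals.
    totals = [sum(column) for column in zip(*item_rows)]
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n
    if variance == 0:
        raise ValueError("all pupils have the same total; KR-20 is undefined")
    return (k / (k - 1)) * (1 - sum_pq / variance)


ITEM_ROWS = [
    [1, 1, 1, 1, 1, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 1, 0, 0, 0],
]
reliability = kr20(ITEM_ROWS)
print("KR-20:", round(reliability, 3))
```

Like the split half method, this needs only one administration of the test, but it does not depend on how the items happen to be divided into halves.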
47Errors
- Chance Error
- Many psychological factors affect the reliability coefficient of a test: fluctuations in interest and attention, shifts in emotional attitude, and differential effects of memory and practice.
- Environmental factors such as distractions, noise, interruptions and scoring errors also play a part. All of these are called chance errors or errors of measurement.
- The scores may go up or down from the true value.
48- Constant Errors
- Constant errors work in only one direction. A constant error raises or lowers all of the scores on a test but doesn't affect the reliability coefficient.
- Such errors are more easily avoided than chance errors, for example by subtracting two points from a retest score to allow for practice.
49Validity
- The validity of a test, or of any measuring instrument, depends upon the fidelity with which it measures what it purports to measure.
- A test is valid when the performances which it measures correspond to the same performances as otherwise independently measured or objectively defined.
50Difference between Reliability and Validity
- Suppose that a clock is set forward 20 minutes. If the clock is a good timepiece, the time it tells will be reliable (consistent) but will not be valid as judged by standard time.
- Validity is a relative term.
51- A test is valid for a particular purpose or in a particular situation; it is not generally valid.
52- Content Validity
- This requires content analysis. Validity is inferred by subject experts, who go through the test items and give their opinions on the extent to which the items form a fair, representative sample of the universe of items that could be drawn from the content areas being tested.
53- Construct Validity
- This is the functional aspect of content validity.
- Suppose the test is to measure the creative writing of students; then the items should cover creative expression only.
- A well known test of creative expression and the newly constructed creative expression test are both administered to a group of students for whom the tests are meant.
- The coefficient of correlation computed for the scores from the two tests is an index of the validity of the newly constructed test.
54- Predictive Validity
- It is concerned with the relation of test scores to some measure of future performance.
- If scores on a spelling test help us to differentiate between pupils who will succeed and pupils who will fail in a stenography course, then we can infer that the spelling test has predictive validity as far as stenography is concerned.
- This type of validity is mainly useful in evaluating aptitude tests.
55Relations of Validity and Reliability
- They refer to different aspects of test efficiency.
- A reliable test is theoretically valid, but may be practically invalid, as judged by its correlations with various independent criteria.
- A highly valid test cannot be unreliable, since its correlation with a criterion is limited by its own index of reliability.
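The limit mentioned in the last point can be made concrete. In classical test theory the index of reliability is the square root of the reliability coefficient, and a test's correlation with any criterion cannot exceed it. The sketch below assumes that definition; the function name `max_validity` and the example figure of 0.64 are illustrative.

```python
from math import sqrt


def max_validity(reliability_coefficient):
    """Upper limit on a test's correlation with any criterion.

    Uses the index of reliability, taken here as the square root of the
    reliability coefficient (classical test theory assumption).
    """
    if not 0 <= reliability_coefficient <= 1:
        raise ValueError("reliability coefficient must lie between 0 and 1")
    return sqrt(reliability_coefficient)


# A test with a reliability coefficient of 0.64 (hypothetical figure).
ceiling = max_validity(0.64)
print("Validity cannot exceed about", round(ceiling, 2))
```

The reverse does not hold: a high reliability coefficient only raises the ceiling and says nothing about whether the test measures what it is meant to measure.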
56- Want to have a best choice
- then
- ANALYSE AND CHOOSE
58THANKS FOR MAKING ME HAPPY