Title: Research
1Research
- Dr. Terry Ackerman
- Chair, Department of Educational
- Research Methodology
- University of North Carolina at Greensboro
2Objectives of this Session
In this session I will walk you through a basic
and thorough item and test analysis to help you
better understand what the results mean, and I
will suggest how this information can be
recycled to improve classroom instruction.
3- Topics that will be covered
- Item/Test analysis using ITEMAL
- Comparison of Groups using SPSS
4An Overview of the ITEMAL5
5- When would you use this program?
- ITEMAL5 is an item analysis program designed to
provide the researcher or practitioner with
psychometric information about the quality of
individual items. This program is designed to
analyze assessment data in which each item can be
scored as correct or incorrect. Once the output
is obtained the researcher or practitioner should
go through the statistics for each item in
concert with a substantive review of the actual
item to decide if that item is appropriate for
the assessment.
6- Why would you use this program?
- Anytime you administer an assessment you must
perform an item analysis and evaluate both the
reliability and validity of the responses. This
should be done before you do any statistical
analysis with the results. Researchers should
always include a discussion of the reliability
and validity of their data as part of the
descriptive analysis before discussing any
statistical results. Educators can examine which
incorrect answers were most often chosen to help
identify errors in students' reasoning.
7- How could you use this program?
- If you have identifiable subgroups (e.g.,
gender, ethnicity, SES) you should probably run
the program for each group separately as well as
combined to see if there are any noticeable
differences. Remember, the item and test
statistics will change if you add or drop
examinees or add or drop items.
- If an item has poor psychometric properties
(e.g., it has a low point biserial, is too easy
or too difficult, or correlates negatively with
other items) you need to examine that item
carefully and decide whether to retain it. If you
decide to delete it, you have to rerun
the analysis.
8- How would you run this program?
- ITEMAL5 reads two input files: the file
containing your response data and a key file.
Both files must be text files. It is easiest to
create these in either Notepad or WordPad. The
program reads in the key, scores and analyzes the
data, and then writes the output to a file whose
name you provide.
9- DATA File
- The data file contains a row of item responses
for each examinee. If you had 36 examinees you
would have 36 rows. The number of responses in
each row should equal the number of items.
Convert each item response to an integer using
the same scheme that is used in the key (e.g.,
a = 1, b = 2, c = 3, etc.). If an examinee does
not answer an item, enter a zero for that item.
Thus, if an examinee had the responses
abbcade_ab you would enter 1223145012.
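The recoding rule above can be sketched in a few lines of Python. This is only an illustration, not part of ITEMAL5; the function name `encode_responses` is hypothetical, and treating every character outside a-e as an omitted item is an assumption.

```python
# Letter-to-integer scheme from the slide: a=1 ... e=5, omitted item=0.
CODES = {"a": "1", "b": "2", "c": "3", "d": "4", "e": "5"}


def encode_responses(letters: str) -> str:
    """Convert a string of letter responses into a row of digits.

    Assumption: any character that is not a-e (for example "_" or a
    blank) marks an unanswered item and is coded 0.
    """
    return "".join(CODES.get(ch.lower(), "0") for ch in letters)


print(encode_responses("abbcade_ab"))  # 1223145012
```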
10- EXAMPLE INPUT RESPONSE FILE
-
- DanC111323
- MarK133142
- SalM142322
- JohP424232
- AdrT513132
The first four characters of each row are the
examinee ID; the remaining six digits are the
item responses. These data would be read in with
the following format: (4x,6i1)
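To make the format string concrete, here is a rough Python equivalent of what (4x,6i1) does to one row: skip four columns, then read six one-digit integers. The function name and defaults are illustrative only.

```python
def parse_record(line: str, skip: int = 4, n_items: int = 6) -> list:
    """Mimic the Fortran format (4x,6i1) on a single text row.

    skip    -- number of leading columns to ignore (the ID field)
    n_items -- number of one-digit integer fields to read after that
    """
    fields = line[skip:skip + n_items]
    return [int(ch) for ch in fields]


print(parse_record("DanC111323"))  # [1, 1, 1, 3, 2, 3]
```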
11- KEY File
- The key file is simply a row of integers denoting
the correct responses, one integer for each item
(i.e., if you have 20 items your row would be 20
values long). Use the following scheme: a = 1,
b = 2, c = 3, d = 4, e = 5. Your key must be made
up of values from 1 to 5.
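Scoring against the key amounts to comparing each response with the keyed value. The sketch below shows the idea; the key used in the example is hypothetical (no key is given in these slides), and the function is not ITEMAL5's own code.

```python
def score_responses(responses, key):
    """Return 1 for each response that matches the key, else 0.

    An omitted item (coded 0) can never match, because key values
    run from 1 to 5.
    """
    if len(responses) != len(key):
        raise ValueError("response row and key must have the same length")
    return [1 if r == k else 0 for r, k in zip(responses, key)]


hypothetical_key = [2, 3, 2, 3, 1]          # assumption, for illustration
print(score_responses([2, 2, 2, 3, 4], hypothetical_key))  # [1, 0, 1, 1, 0]
print(score_responses([4, 0, 1, 2, 1], hypothetical_key))  # [0, 0, 0, 0, 1]
```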
12- The user is prompted to answer the following
questions:
- What will appear on the screen                    Type of response input by the user
-
- Enter the file name containing item responses     mydata.txt
- Enter response file format (e.g. (30I1))          (5i1)
- Enter the file name containing the key            mykey.txt
- Enter key file format (e.g. (30I1))               (5i1)
13- The user is prompted to answer the following
questions:
- What will appear on the screen                    Type of response input by the user
-
- Enter the number of items                         5
- Enter the number of people                        15
- Enter in the output file name                     output.txt
14- How do you interpret the output?
15The key is printed out so that the user can check
to make sure the program is reading the key file
correctly.
16-
- Response Vectors for the
- First Five Examinees
-
-
- 22234
- 23221
- 22211
- 40121
- 23231
The response vectors for the first five examinees
are printed out so that the user can make sure
that the program is reading the dataset properly.
17-
- ITEM 1                        0    1*   2    3    4    5
- PBIS    .5817    UPP N  3     0    3    0    0    0    0
- BIS     .9450    MID N  3     0    3    0    0    0    0
- DIFF    .8750    LOW N  2     1    1    0    0    0    0
- IREL    .5090    TOT N  8     1    7    0    0    0    0
- DIS     .4000
Subjects are divided into three groups based
upon their total score: the Upper Group is the
top 27%, the Middle Group is the middle 46%, and
the Lower Group is the bottom 27%.
For skewed or nonnormal distributions the
frequencies may not be symmetrical.
18For each item the possible responses are listed.
There is a maximum of 5 categories. Note
that the categories are numerical and are
designed to represent the choices in a multiple
choice item (a = 1, b = 2, c = 3, etc.; 0 = no
response). The asterisked value (*) represents
the correct answer.
-
- Individual Item Statistics
-
- ITEM 1                        0    1    2*   3    4    5
- PBIS    .8427    UPP N  9     0    0    9    0    0    0
- BIS    1.1392    MID N  1     0    0    1    0    0    0
- DIFF    .7333    LOW N  5     0    1    1    1    2    0
- IREL    .3727    TOT N 15     0    1   11    1    2    0
- DIS     .8000
19The program determines the number of examinees
from each group that selected each alternative.
You would expect the Upper group to have the
largest proportion of examinees selecting the
correct alternative. Note that in the Upper group
below all 9 selected 2 (b). Of the 5 examinees in
the Lower group, one selected 1 (a), one selected
2 (b), one selected 3 (c), and two selected 4 (d).
-
- Individual Item Statistics
-
- ITEM 1                        0    1    2*   3    4    5
- PBIS    .8427    UPP N  9     0    0    9    0    0    0
- BIS    1.1392    MID N  1     0    0    1    0    0    0
- DIFF    .7333    LOW N  5     0    1    1    1    2    0
- IREL    .3727    TOT N 15     0    1   11    1    2    0
- DIS     .8000
The total number of subjects selecting each
alternative is listed in the bottom row (TOT).
20The PBIS is the point biserial. It is the
Pearson correlation between the scored responses
for an item and the total test score. It is a
measure of discrimination. Acceptable values are
between .3 and .8. Items with values less than .2
should probably be edited.
Scored responses (1 = correct, 0 = incorrect):
- Ss    Item 1   Item 2   Item 3   Item 4   Item 5   Tot
- 1        1        0        1        1        0      3
- 2        0        0        0        1        0      1
- 3        1        1        0        0        0      2
- 4        1        1        1        1        0      4
-
- Individual Item Statistics
-
- ITEM 1                        0    1    2*   3    4    5
- PBIS    .8427    UPP N  9     0    0    9    0    0    0
- BIS    1.1392    MID N  1     0    0    1    0    0    0
- DIFF    .7333    LOW N  5     0    1    1    1    2    0
- IREL    .3727    TOT N 15     0    1   11    1    2    0
- DIS     .8000
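As a small check on the definition, the point biserial for Item 1 can be computed by hand from the four-examinee scored matrix shown on this slide. The sketch below uses a plain Pearson correlation with the item included in the total score, as the slide describes; the helper name `pearson` is illustrative, and whether ITEMAL5 applies any correction is not stated in the source.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Scored responses (rows = examinees, columns = items) from the slide.
scored = [
    [1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0],
]
totals = [sum(row) for row in scored]      # [3, 1, 2, 4]
item1 = [row[0] for row in scored]         # [1, 0, 1, 1]

pbis_item1 = pearson(item1, totals)
print(round(pbis_item1, 4))                # about 0.7746
```

This tiny matrix is only a teaching example; it is not the 15-examinee data set that produced the PBIS of .8427 in the output.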
21The BIS is the biserial. It is a theoretically
computed measure of discrimination. It is always
larger in absolute value than the PBIS, and
because it is a theoretical estimate it can
exceed 1.0, as it does for the item below.
Acceptable values are between .3 and .8.
-
- Individual Item Statistics
-
- ITEM 1                        0    1    2*   3    4    5
- PBIS    .8427    UPP N  9     0    0    9    0    0    0
- BIS    1.1392    MID N  1     0    0    1    0    0    0
- DIFF    .7333    LOW N  5     0    1    1    1    2    0
- IREL    .3727    TOT N 15     0    1   11    1    2    0
- DIS     .8000
22The DIFF is the p-value or proportion correct.
Acceptable values are between .3 and .8. The
larger the value the easier the item. It is
equal to the total number of correct answers
divided by the total number of subjects. For
Item 1 below, 11/15 = .7333.
-
- Individual Item Statistics
-
- ITEM 1                        0    1    2*   3    4    5
- PBIS    .8427    UPP N  9     0    0    9    0    0    0
- BIS    1.1392    MID N  1     0    0    1    0    0    0
- DIFF    .7333    LOW N  5     0    1    1    1    2    0
- IREL    .3727    TOT N 15     0    1   11    1    2    0
- DIS     .8000
23The IREL is the item reliability. It is the
standard deviation of the scored responses of
the item times the point biserial. The item
standard deviation is $\sqrt{p(1-p)}$, where $p$
is the item's proportion correct (DIFF).
-
- Individual Item Statistics
-
- ITEM 1                        0    1    2*   3    4    5
- PBIS    .8427    UPP N  9     0    0    9    0    0    0
- BIS    1.1392    MID N  1     0    0    1    0    0    0
- DIFF    .7333    LOW N  5     0    1    1    1    2    0
- IREL    .3727    TOT N 15     0    1   11    1    2    0
- DIS     .8000
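The IREL value in the output can be reproduced from DIFF and PBIS. The sketch below assumes the standard deviation of a 0/1 item is sqrt(p(1 - p)), which is consistent with the printed values; the function name is illustrative.

```python
from math import sqrt


def item_reliability(p: float, pbis: float) -> float:
    """Item reliability index: SD of a 0/1 item times its point biserial."""
    item_sd = sqrt(p * (1.0 - p))   # SD of a dichotomously scored item
    return item_sd * pbis


# Item 1 from the output: 11 of 15 examinees correct, PBIS = .8427.
irel = item_reliability(11 / 15, 0.8427)
print(round(irel, 4))  # 0.3727
```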
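24The DIS is another measure of item discrimination
called the item discrimination index. It is
calculated as $\mathrm{DIS} = p_{U} - p_{L}$, the
proportion of the Upper group answering correctly
minus the proportion of the Lower group answering
correctly. For Item 1 below, 9/9 - 1/5 = .8000.
-
- Individual Item Statistics
-
- ITEM 1                        0    1    2*   3    4    5
- PBIS    .8427    UPP N  9     0    0    9    0    0    0
- BIS    1.1392    MID N  1     0    0    1    0    0    0
- DIFF    .7333    LOW N  5     0    1    1    1    2    0
- IREL    .3727    TOT N 15     0    1   11    1    2    0
- DIS     .8000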
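The discrimination index can be checked directly from the UPP and LOW rows of the output. This sketch assumes DIS is the Upper-group proportion correct minus the Lower-group proportion correct, which matches the printed .8000; the function name is illustrative.

```python
def discrimination_index(upper_correct: int, upper_n: int,
                         lower_correct: int, lower_n: int) -> float:
    """Upper-group proportion correct minus lower-group proportion correct."""
    return upper_correct / upper_n - lower_correct / lower_n


# Item 1: all 9 Upper examinees chose the keyed answer (2);
# 1 of the 5 Lower examinees did.
dis = discrimination_index(9, 9, 1, 5)
print(round(dis, 4))  # 0.8
```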
25The Inter-item Correlations are the Pearson
correlations between the scored responses for
each pair of items. Items which correlate
negatively with other items cause the reliability
to decrease. These items need to be reviewed
substantively to see if they are measuring the
same skills as the other items.
Scored responses (1 = correct, 0 = incorrect):
- Ss    Item 1   Item 2   Item 3   Item 4   Item 5   Tot
- 1        1        0        1        1        0      3
- 2        0        0        0        1        0      1
- 3        1        1        0        0        0      2
- 4        1        1        1        1        0      4
-
- Inter-item Correlations
-
-               1      2      3      4      5
- ITEM 1     1.00    .36    .66    .49    .30
- ITEM 2      .36   1.00    .36    .43   -.30
- ITEM 3      .66    .36   1.00    .49    .30
- ITEM 4      .49    .43    .49   1.00   -.07
- ITEM 5      .30   -.30    .30   -.07   1.00
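A correlation matrix like the one above can be built by correlating every pair of item columns. The sketch below runs on the small four-examinee scored matrix from this slide, not on the 15-examinee data behind the printed table. Returning `None` for an item with no variance (here Item 5, which nobody answered correctly) is a choice made for this illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation, or None when either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def inter_item_correlations(scored):
    """Return the items-by-items matrix of Pearson correlations."""
    n_items = len(scored[0])
    cols = [[row[j] for row in scored] for j in range(n_items)]
    return [[pearson(cols[i], cols[j]) for j in range(n_items)]
            for i in range(n_items)]


scored = [
    [1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0],
]
r = inter_item_correlations(scored)
print(round(r[0][1], 3))  # Item 1 with Item 2, about 0.577
print(r[0][4])            # None: Item 5 has zero variance
```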
26- OVERALL TEST SCORE DISTRIBUTION
-
- CUMULATIVE
- SCORE FREQUENCY FREQUENCY
- 0 3 3
- 1 2 5
- 2 1 6
- 3 5 11
- 4 4 15
- 5 0 15
The Frequency and Cumulative Frequency
distributions are based upon the number correct
scores for all examinees
27The Test Statistics represent the first four
moments of the raw scores. The KR20 value
represents the internal consistency measure of
reliability.
- Test Statistics
-
-
- MEAN 2.333
- STANDARD DEVIATION 1.491
- SKEWNESS -.461
- KURTOSIS -1.284
- KR20 .695
Remember, negative skewness means you have more
high scores than low (i.e., the test was not very
difficult).
KR-20 is an estimate of internal consistency
reliability
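The printed moments can be reproduced from the score frequency distribution, and KR-20 can be computed from a scored 0/1 matrix. The sketch below assumes population (divide by N) formulas and excess kurtosis, which is what matches the printed values. The KR-20 call uses the small four-examinee matrix from the earlier slides, because the full 15-examinee item data are not reproduced here. Function names are illustrative.

```python
from math import sqrt


def score_moments(freq):
    """Mean, SD, skewness and excess kurtosis from {score: frequency}."""
    n = sum(freq.values())
    mean = sum(s * f for s, f in freq.items()) / n
    m2 = sum(f * (s - mean) ** 2 for s, f in freq.items()) / n
    m3 = sum(f * (s - mean) ** 3 for s, f in freq.items()) / n
    m4 = sum(f * (s - mean) ** 4 for s, f in freq.items()) / n
    sd = sqrt(m2)
    return mean, sd, m3 / sd ** 3, m4 / m2 ** 2 - 3.0


def kr20(scored):
    """KR-20 for a matrix of 0/1 scores (rows = examinees)."""
    n, k = len(scored), len(scored[0])
    totals = [sum(row) for row in scored]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n  # population variance
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scored) / n           # item difficulty
        sum_pq += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - sum_pq / var_t)


# Frequency distribution from the OVERALL TEST SCORE DISTRIBUTION output.
freq = {0: 3, 1: 2, 2: 1, 3: 5, 4: 4, 5: 0}
mean, sd, skew, kurt = score_moments(freq)
print(round(mean, 3), round(sd, 3), round(skew, 3), round(kurt, 3))
# 2.333 1.491 -0.461 -1.284

small = [[1, 0, 1, 1, 0], [0, 0, 0, 1, 0], [1, 1, 0, 0, 0], [1, 1, 1, 1, 0]]
print(round(kr20(small), 3))  # 0.375 for this illustrative matrix
```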
28Introduction to SPSS (Statistical Package for the
Social Sciences)
- SPSS is a complete, user-friendly software
package that allows researchers to do both
descriptive and inferential univariate and
multivariate statistical analysis.
29 Spearman-Brown formula
a. A formula used to estimate how much a test's
reliability will increase (or decrease) when the
length of the test is increased (or decreased) by
adding (or removing) parallel items.
b. Formula:
$\rho_{new} = \dfrac{L\,\rho_{XX'}}{1 + (L - 1)\,\rho_{XX'}}$
where $\rho_{XX'}$ is the reliability of the
current test and $L$ is the number of times
longer the new test will be.
30Example of the Spearman-Brown Formula
Given a 20-item test with reliability
$\rho_{XX'}$:
1. What would the reliability be if the test
length were tripled (i.e., made three times
longer by adding 40 parallel items)? Here $L = 3$.
31What would the reliability be if I reduced the
test length to only 10 items? Here
$L = 10/20 = 0.5$.
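32How much longer (i.e., number of times longer)
would I have to make the test to achieve a
desired reliability? Solving the Spearman-Brown
formula for $L$ gives
$L = \dfrac{\rho^{*}\,(1 - \rho_{XX'})}{\rho_{XX'}\,(1 - \rho^{*})}$
where $\rho_{XX'}$ is what you have and
$\rho^{*}$ is what you want.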
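The three Spearman-Brown questions can be worked numerically as below. The starting reliability of 0.70 and the target of 0.90 are assumptions chosen only for illustration, since the values used on the original slides did not survive in this transcript. Function names are hypothetical.

```python
def spearman_brown(rho: float, length_factor: float) -> float:
    """Projected reliability when the test is made length_factor times as long."""
    return length_factor * rho / (1.0 + (length_factor - 1.0) * rho)


def required_length_factor(rho: float, target: float) -> float:
    """How many times longer the test must be to reach the target reliability."""
    return target * (1.0 - rho) / (rho * (1.0 - target))


rho_have = 0.70   # assumed reliability of the 20-item test (illustrative)

print(round(spearman_brown(rho_have, 3), 3))          # tripled to 60 items: 0.875
print(round(spearman_brown(rho_have, 10 / 20), 3))    # cut to 10 items: 0.538
print(round(required_length_factor(rho_have, 0.90), 2))  # about 3.86 times longer
```

Under these assumed values a factor of about 3.86 means roughly 77 items in total, which shows how quickly the required length grows as the target reliability approaches 1.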
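33 f. Standard error of measurement (SEM)
1. The formula to calculate the SEM is given as
$SEM = s_X \sqrt{1 - \rho_{XX'}}$, where $s_X$ is
the standard deviation of the observed scores and
$\rho_{XX'}$ is the test reliability.
2. Confidence intervals (bands):
CI = Observed Score ± 1.0(SEM)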
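As a worked example, the SEM can be computed from the test statistics printed earlier (standard deviation 1.491, KR-20 .695). Using KR-20 as the reliability estimate here is an assumption made for illustration, and the function names are hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD of observed scores times sqrt(1 - reliability)."""
    return sd * sqrt(1.0 - reliability)


def confidence_band(observed: float, sem: float, width: float = 1.0):
    """Band of observed score plus or minus width * SEM."""
    return observed - width * sem, observed + width * sem


sem = standard_error_of_measurement(1.491, 0.695)
print(round(sem, 3))                       # about 0.823
low, high = confidence_band(3, sem)        # band around an observed score of 3
print(round(low, 2), round(high, 2))       # about 2.18 and 3.82
```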
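34 g. Estimating true scores from observed scores
1. Estimated true score:
$\hat{T} = \bar{X} + \rho_{XX'}\,(X - \bar{X})$
where $X$ is the observed score, $\bar{X}$ is
the mean test score, and $\rho_{XX'}$ is the
test reliability.
2. Estimated true scores are always closer to the
mean test score than the obtained test score
because all tests contain measurement error.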
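The pull toward the mean is easy to see numerically. The sketch below assumes the usual regression estimate (mean plus reliability times the deviation from the mean) and borrows the mean (2.333) and KR-20 (.695) from the earlier output purely as illustrative inputs; the function name is hypothetical.

```python
def estimated_true_score(observed: float, mean: float, reliability: float) -> float:
    """Regress an observed score toward the test mean by the reliability."""
    return mean + reliability * (observed - mean)


test_mean, rel = 2.333, 0.695   # taken from the earlier output for illustration

high = estimated_true_score(4, test_mean, rel)
low = estimated_true_score(1, test_mean, rel)
print(round(high, 3))  # about 3.492, below the observed 4
print(round(low, 3))   # about 1.407, above the observed 1
```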
35Make sure your variable columns are labeled
SPSS ANOVA Handout
For one-way ANOVA you put your data into two
columns: one for grouping (i.e., levels of the
independent variable) and one for values of
the dependent variable.
Note: Subject 4 belongs to Group 2 and scored
9 on the dependent variable.
These are the data used in Example 1 for Chapter
11
36To perform the ANOVA, after the data are
entered, click on STATISTICS (v.8) or ANALYZE
(v.9, 10), then COMPARE MEANS, then ONE-WAY ANOVA.
37 Dependent variables are highlighted and brought
into the Dependent List.
Independent variables are highlighted and brought
into the FACTOR box.
OPTIONS includes the computation of means and
standard deviations for each of the levels of
the independent variable.
To have Post Hoc tests performed click on POST
HOC. These are discussed in Chapter 12.
A priori contrasts can also be performed.
38Always ask for the Descriptive statistics
(means, standard deviations) to be calculated
Click on Continue to return to the ANOVA box
To perform Levene's test for the Homogeneity of
Variance assumption click here.
If you click on OPTIONS this box will appear.
If a subject's data are missing, that subject or
case can be deleted from the entire
analysis (TOP choice) or that subject can
be deleted from only the levels for which data
are missing (BOTTOM choice).
The means for each level of the
independent variable can be plotted.
39This is the box that appears when you click on
Post Hoc.
These post hoc tests apply only when the
homogeneity of variance assumption is met. These
tests are discussed in Chapter 12.
These post hoc tests apply when the
homogeneity of variance assumption
(Levene's Test) is NOT met.
Click on CONTINUE to return to the main ANOVA
box.
This is where you select the alpha level for the
post hoc tests.
40These are the Means, Standard Deviations, Standard
Errors, 95% Confidence Intervals, and Minimum
and Maximum values for each level of the
independent variable.
If this Significance value is less than .05,
then the assumption of homogeneity of variance is
NOT met.
Levene's test of Homogeneity of Variance is an
F-test and thus has two degrees of freedom values.
41The Treatment effect corresponds to Between
Groups and the Error corresponds to Within Groups.
The Significance value represents the proportion
of area in the F-distribution to the right of the
observed F-value. The Fcrit is based on two
degrees of freedom values, dfB and dfW.
The F ratio equals MSB/MSW.
Sum of Squares, SS, refers to the numerator of
the variance. SST = SSB + SSW.
Mean Square values are variances. They
equal the Sum of Squares values divided by the
degrees of freedom.
df refers to your Degrees of Freedom.
dfT = dfB + dfW.
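The pieces of the ANOVA table fit together as in the sketch below, which computes the sums of squares, degrees of freedom, mean squares, and F ratio by hand. The three groups of scores are made-up illustrative data, not the Chapter 11 example, and the function is a teaching sketch rather than a substitute for the SPSS output (it does not compute the Significance value).

```python
def one_way_anova(groups):
    """Return the entries of a one-way ANOVA summary table as a dict."""
    scores = [x for g in groups for x in g]
    n, k = len(scores), len(groups)
    grand_mean = sum(scores) / n
    means = [sum(g) / len(g) for g in groups]

    # Between groups: group means around the grand mean, weighted by group size.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: scores around their own group mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    sst = sum((x - grand_mean) ** 2 for x in scores)

    df_b, df_w = k - 1, n - k
    msb, msw = ssb / df_b, ssw / df_w
    return {"SSB": ssb, "SSW": ssw, "SST": sst,
            "dfB": df_b, "dfW": df_w, "dfT": n - 1,
            "MSB": msb, "MSW": msw, "F": msb / msw}


# Hypothetical scores for three groups (illustration only).
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(table["SSB"], table["SSW"], table["F"])
```

For these made-up scores SSB is 26, SSW is 6, and F is 13 (up to floating-point rounding), with SST = SSB + SSW = 32 and dfT = dfB + dfW = 8.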
42The Y-axis is for the dependent variable (i.e., in
terms of the mean metric).
The graph shows the mean values of your
dependent variable for each level of independent
variable.
The X-axis is for the independent variable and
lists the values for the three groups.