Slide 1: Instructional Tools in Educational Measurement and Statistics (ITEMS) for School Personnel
Development and Evaluation of Three Web-Based Training Modules
- Rebecca Zwick
- U.C. Santa Barbara
- Measured Progress
- August 2007
Slide 2: Overview of Presentation
- 1. What was the impetus for the project?
- 2. How is the project structured?
- 3. What's in the modules, and how are statistical concepts presented?
- 4. How effective are the modules?
- 5. What have been the challenges and successes?
- 6. Clip from Module 3, "What's the Difference?"
Slide 3: 1. What was the impetus for the project?
Slide 4: In today's NCLB era
- Teachers and administrators are expected to use test results to make decisions about instruction and resource allocation, and to explain results to students, parents, the school board, and the press.
- Many educators have not received the measurement and statistics training needed to use test scores productively.
Slide 5: Stiggins, Education Week, 2002
- "...only a few states explicitly require competence in assessment as a condition for being licensed to teach. No licensing examination now in place verifies competence in assessment..."
- "...almost no states require competence in assessment for licensure as a principal or school administrator at any level."
Slide 6: Evidence from Preliminary Assessment Literacy Survey (Brown & Daw, 2004)
- Of 24 UCSB M.Ed./credential students, only
- 10 could choose the correct definition of a z-score
- 10 could choose the definition of measurement error
- Of 10 experienced teachers/administrators, only
- 5 could choose the correct combined average when told that 20 students averaged 90 on an exam and 30 students averaged 40 (see the arithmetic sketch below)
- 1 could choose the definition of measurement error
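The combined-average item hinges on weighting each group mean by its group size. A minimal arithmetic sketch in Python, using the numbers from the survey item above:

    # Combined average of 20 students averaging 90
    # and 30 students averaging 40 on the same exam.
    n1, mean1 = 20, 90
    n2, mean2 = 30, 40
    combined = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    print(combined)  # 60.0 -- not the unweighted midpoint of 65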
Slide 7: Goal of ITEMS
- Create three 25-minute Web-based modules to increase the assessment literacy of K-12 educators by teaching basic concepts in educational measurement and statistics, as applied to test score interpretation.
- Assess the effectiveness of the modules.
- Funded by the National Science Foundation, 2004-2008.
Slide 8: 2. How is the project structured?
Slide 9: Who works on the project?
- Staff:
- Rebecca Zwick, Project Director
- Jeff Sklar (Statistics Dept., Cal Poly, San Luis Obispo), Senior Researcher
- Alex Norman (Media Arts and Technology, UCSB), Technical Specialist
- Cris Hamilton, independent animator/designer
- Pamela Yeagley (Education, UCSB), Project Evaluator
- Liz Alix (Education, UCSB), Project Administrator
Slide 10: Advisory Committee
- Kevin Almeroth, Computer Science, UCSB
- Beth Chance, Statistics Department, Cal Poly
- Willis Copeland, Education, UCSB
- Raya Feldman, Statistics, UCSB
- Mary Hegarty, Psychology, UCSB
- Richard Mayer, Psychology, UCSB
- Tine Sloan, Acting Director, Teacher Education, UCSB
- 4 administrators and 2 teachers (local districts)
Slide 11: Work cycle: develop and evaluate one module per year
- Fall: develop the module.
- Winter/spring: collect data on module effectiveness.
- Summer: analyze the data; post the module on our Website with supplementary materials; distribute CDs/DVDs.
- Modules 1 and 2 are posted; Module 3 will be posted soon.
Slide 12: Module Administration and Evaluation
- On the Website, participants view the module and take an assessment literacy quiz tailored to its content.
- Participants are randomly assigned to take the quiz either before or after viewing the module (see the sketch below).
- Hypothesis: the mean score for the Module-first (treatment) group will be higher than the mean for the Quiz-first (control) group.
- Participants get a $15 Borders (electronic) gift card and can print out a personalized completion certificate.
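A minimal sketch of the comparison this design implies, with invented group sizes and quiz scores (the source does not specify the analysis method; a two-sample t-test is one natural choice):

    # Sketch of the ITEMS evaluation design: random assignment to
    # Module-first (treatment) or Quiz-first (control), then a
    # comparison of mean quiz scores. All numbers are hypothetical.
    import random
    from scipy import stats

    ids = list(range(20))
    random.shuffle(ids)
    module_first, quiz_first = ids[:10], ids[10:]  # random assignment

    # Hypothetical quiz scores (out of 20 items) for each group:
    treatment_scores = [15, 16, 14, 17, 15, 13, 16, 14, 15, 16]
    control_scores   = [12, 13, 11, 14, 12, 13, 10, 12, 13, 11]

    t, p_two_sided = stats.ttest_ind(treatment_scores, control_scores)
    # The hypothesis is one-sided (treatment mean > control mean),
    # so halve the two-sided p-value when t is positive.
    print(t, p_two_sided / 2)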
Slide 13: Sample Quiz Item
Slide 14: Later phases of data collection
- One-month follow-up: participants take the quiz again to check retention (and receive another Borders card).
- Participants respond to a Web-based project evaluation survey asking their opinions of the module (no gift card!).
Slide 15: 3. What's in the modules? How are statistical concepts presented?
Slide 16: Module Content
- Module 1 (2005), "What's the Score?": test score distributions and their properties, types of test scores, score interpretations
- Module 2 (2006), "What Test Scores Do and Don't Tell Us": measurement error and sampling error; imprecision in individual and average test scores
- Module 3 (2007), "What's the Difference?": interpretation of test score trends and group differences; data aggregation issues
Slide 17: Modules use cognitive psychology principles to enhance learning
- Multimedia: present concepts using both words and pictures (see Mayer, Multimedia Learning, 2001).
- Prior knowledge: use words and pictures that invoke participants' prior knowledge (Narayanan & Hegarty, 2002); use analogies and metaphors (English, 1997).
- Use a conversational (informal) style.
Slide 18: Embedded questions (Modules 2 and 3)
- Each module segment includes a question designed to allow participants to check their understanding of the material.
- If their answer is incorrect, they're encouraged to go back and view the segment again.
- Found helpful by nearly all participants (Year 3).
- An example appears in the upcoming clip.
Slide 19: Goals for Presentation of Technical Concepts
- Clear and accurate, but without formulas or jargon
- Based on realistic examples; no abstractions
- Engaging; not just talking heads
- Decision: use animated characters
Slide 20: EXAMPLES
Slide 21: Module 1: How to explain a distribution of test scores?
- Show test papers being tossed into bins, gradually forming a distribution.
- Then discuss the mean, median, SD, and skewness of the distribution (see the sketch below).
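A minimal sketch of the statistics the module discusses, computed in Python for a small hypothetical set of test scores (the skewness here is the adjusted third standardized moment, one common sample formula):

    # Mean, median, SD, and skewness of hypothetical test scores.
    from statistics import mean, median, stdev

    scores = [62, 70, 74, 75, 78, 80, 81, 85, 88, 97]  # hypothetical
    m, s, n = mean(scores), stdev(scores), len(scores)

    # Adjusted Fisher-Pearson sample skewness; positive values
    # indicate a distribution with a longer right tail.
    skew = (n / ((n - 1) * (n - 2))) * sum(((x - m) / s) ** 3 for x in scores)

    print(m, median(scores), round(s, 2), round(skew, 2))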
Slides 22-23: Module 1: Test Score Distribution
Slide 24: Module 2: How to convey the idea of measurement error?
- "Multiple Edgars":
- A child takes a test repeatedly.
- His brain is magically purged of his memory of the test between administrations.
- For various reasons, he gets different scores each time (see the simulation sketch below).
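A minimal simulation sketch of the "Multiple Edgars" idea, assuming a hypothetical true score and normally distributed error: each administration yields the true score plus random measurement error.

    # One child takes the same test six times, memory wiped between
    # administrations; each score is true score + measurement error.
    import random

    random.seed(1)
    true_score = 75   # hypothetical true score
    error_sd = 5      # hypothetical standard deviation of the error

    observed = [round(random.gauss(true_score, error_sd)) for _ in range(6)]
    print(observed)   # six different scores for the same child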
Slides 25-26: Module 2: Measurement Error
Slide 27: Module 3: How to explain data aggregation complexities and paradoxes?
- No abstractions!
- Use realistic and specific examples.
- Performance for all student groups could increase, while overall school performance decreases (Simpson's paradox, also called the amalgamation paradox); a numeric illustration appears under Slide 28 below.
Slide 28: Simpson's Paradox Example
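The original slide presents its example as a chart. As a stand-in, here is a minimal hypothetical illustration of the same phenomenon in Python: both groups improve, yet the school average falls because the lower-scoring group grows as a share of enrollment.

    # Hypothetical Simpson's paradox data: (n_students, mean_score).
    year1 = {"group_a": (90, 80.0), "group_b": (10, 40.0)}
    year2 = {"group_a": (40, 82.0), "group_b": (60, 42.0)}

    def school_mean(groups):
        total = sum(n * m for n, m in groups.values())
        return total / sum(n for n, _ in groups.values())

    # Each group's mean rises (80->82, 40->42), but the school
    # mean falls because group_b's enrollment share grew.
    print(school_mean(year1))  # 76.0
    print(school_mean(year2))  # 58.0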
Slide 29: Module 3: How to explain sampling error (of a change in test score averages)?
- Especially complex in the case of NCLB-type testing.
- Models based on random sampling are not only hard to explain, but don't apply!
- Solution: show that the change in test score averages is more sensitive to extreme values when N is small (see the sketch below).
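A minimal sketch of the module's argument, with hypothetical scores: the same single extreme score shifts a small school's average far more than a large school's.

    # Effect of one extreme score on a school average, small vs. large N.
    def mean(xs):
        return sum(xs) / len(xs)

    small_school = [70] * 9    # 9 students with typical scores
    large_school = [70] * 99   # 99 students with typical scores

    # One very low score (10) arrives at each school:
    print(round(mean(small_school + [10]) - mean(small_school), 2))  # -6.0
    print(round(mean(large_school + [10]) - mean(large_school), 2))  # -0.6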
Slide 30: Later...
- A clip from Module 3.
- Module 3 includes upgrades: a professional animator, actors, and a sound studio.
Slide 31: 4. How effective are the modules?
- Quiz Results
- Program Evaluation Results
- Informal Emails
Slide 32: Quiz Results for Module 1 Evaluation (N = 113): Average Number of Correct Responses (out of 20 items)
Slide 33: Quiz Results for Module 2 Evaluation (N = 104): Average Number of Correct Responses (out of 16 items)
Slide 34: Module 3 Quiz Results
- Major recruitment problems; N = 23.
- The Module-first and Quiz-first groups both scored an average of 10.4 on a 14-item quiz.
- Possible reason: only 4 of the 23 participants were teacher education students.
- A supplementary data analysis may be conducted with CSU Fresno teacher education students.
Slide 35: One-month follow-up
- Quiz results tended to be the same or better at the one-month follow-up.
- However, the follow-up samples are small (N = 11, 38, and 10 for the three years) and are not a random subgroup of the initial participants.
Slide 36: Conclusion on quiz outcomes
- The modules are probably most effective for those who are new to the classroom.
- We hope to encourage their use in teacher education programs and in in-service training programs for new teachers.
Slide 37: Formal independent program evaluation
- Year 1: phone interviews and paper surveys on presentation, content, and impact.
- Years 2 and 3: Web-based surveys.
- Responses to the above were positive, but participation rates were only 10-12%.
Slide 38: Formal program evaluation (continued)
- Comments entered in text boxes during participation were mixed.
- There were some negative comments on navigational features (later improved) and on the animation.
- Comments on content and utility were favorable.
Slide 39: Sample of Email Comments Received
- "Very helpful and right to the point. If I were a building principal or a department chair today, all of the staff would go through this until everyone really understood it."
- "I am inclined to recommend this as required viewing for all new hires in our K-12 district, and it certainly will be recommended for inclusion in professional development on assessment literacy."
- "I will be sharing this with my Assistant Superintendent with the hope of promoting it as a part of our new teacher induction process."
Slide 40: 5. Project Challenges and Successes
Slide 41: The big challenge: publicity and recruitment
- Despite:
- Ads in two educational magazines
- Personal contacts with school districts
- District participation on the advisory committee
- Contacts with professional organizations
- Contacts with the California State Dept. of Education and other state organizations
- A dean's letter to 100 superintendents
- Website and blog postings
Slide 42: Successes
- The automated system has facilitated administration and evaluation of the modules, and module quality has improved.
- Quiz results show that Modules 1 and 2 were effective, mainly for teacher education students.
- Participant comments indicated that many found the modules useful.
Slide 43: The future
- A repackaging project?
- Redo the modules with superior production values, as in Module 3: professional animation, professional actors, a sound studio.
- Unify the look and feel across the modules.
- Work on mechanisms for disseminating the modules as a package.
Slide 44: More information
- See http://items.education.ucsb.edu
- See Zwick, Sklar, Wakefield, & Folsom, Educational Measurement: Issues and Practice, in press.
- Email us at rzwick@education.ucsb.edu or items@education.ucsb.edu
Slide 45: Disclaimer
- Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Slide 46: Clip from Module 3, "What's the Difference?"
- Topic: how the number of students affects the interpretation of score trends.
- Context: a press conference.
- Two reporters ask questions about a recent test score release.
- Superintendent Florence and two teachers, Stan and Norma, respond.