Title: Measurement and Evaluation of Science Teaching
1Measurement and Evaluation of Science Teaching
- Xiufeng Liu, PhD
- Department of Learning and Instruction
- University at Buffalo, SUNY
- E-mail xliu5_at_buffalo.edu
2Measurement and Evaluation
- What may happen in measurement and evaluation of science teaching?
- "We have never studied the content of these questions."
- "This test is too easy. I can answer the questions with my eyes closed."
- "I scored 65 on the test, but I still failed, which doesn't make sense to me."
3What is it all about?
5Key to Good Assessment
- Assessment
- (Measurement and Evaluation)
- alignment
- Curriculum Instruction
6Necessary Conditions for Good Assessment
- Planning
- Developing
- Administering
- Scoring
- Analyzing
- Grading
7Test Planning: Test Grid
8Developing multiple-choice (MC) questions
better
poor
9Developing MC questions
poor
better
10Ways to make choices plausible
- 1. Use students' common misconceptions or errors
- 2. Use textbook language or other phraseology that has the appearance of truth
- 3. Use distracters that are parallel in form and grammatically consistent with the item's stem
- 4. Make the distracters similar to the correct answer in length, vocabulary, sentence structure, and complexity of thought
11Developing MC questions
poor
better
12Developing MC questions
poor
better
13Developing MC questions
better
poor
14Summary Checklist for good MC questions
- The stem presents a clear problem
- The stem is stated as a question
- The choices are equally plausible
- The choices are in alphanumerical or other logical order
- The choices are consistent in length and contain no extraneous clues
- The choices contain only one best or correct answer
- "None of the above" and "all of the above" choices are avoided
15Advantages and Limitations of M-C Questions
- Advantages
- Easy to score
- Objective to score
- Large coverage
- Good at assessing specific knowledge and understanding, or lower order thinking skills (LOTS)
- Incorrect answers provide valuable information on students' learning difficulties
- Limitations
- Limited in assessing higher order thinking skills (HOTS)
- Guessing
- Reading comprehension
- Time consuming to write good M-C questions
16How to measure higher order thinking skills (HOTS)
- First of all: you don't have to use MC to assess HOTS; many other question formats assess HOTS better than MC does
- Understand the differences among cognitive levels
- Use combinations of question formats
- Develop appropriate multiple-choice questions
17Lower Order Thinking Skills (LOTS)
- Remember: recognize (identify), recall (retrieve)
- Understand: interpret (clarify, paraphrase, represent, translate), exemplify (illustrate, instantiate), classify (categorize, subsume), summarize (abstract, generalize), infer (conclude, extrapolate, interpolate, predict), compare (contrast, map, match), explain (construct models)
- Apply: execute (carry out), implement (use)
18Higher Order Thinking Skills (HOTS)
- Analyze: differentiate (discriminate, distinguish, focus, select), organize (find coherence, integrate, outline, parse, structure), attribute (deconstruct)
- Evaluate: check (coordinate, detect, monitor, test), critique (judge)
- Create: generate (hypothesize), plan (design), produce (construct)
19Using combinations of question formats: M-C + M-C
- After a large ice cube has melted in a beaker of water, how will the water level change?
- a. higher
- b. lower
- c. the same
- After a large ice cube has melted in a beaker of water, how will the water level change?
- a. higher
- b. lower
- c. the same
- Why do you think so? Choose all that apply.
- a. The mass of water displaced is equal to the mass of the ice.
- b. Ice has more volume than water.
- c. Water is denser than ice.
- d. The ice cube decreases the temperature of the water.
- e. Water molecules in water occupy more space than in ice.
analyze
understand
20Using combinations of question formats: M-C + Constructed Response
- After a large ice cube has melted in a beaker of water, how will the water level change?
- a. higher
- b. lower
- c. the same
- After a large ice cube has melted in a beaker of water, how will the water level change?
- a. higher
- b. lower
- c. the same
- Why do you think so? Please justify your choice.
understand
analyze
21Using combinations of question formats: Performance + M-C
- Using the materials provided at your table, create a model of the human heart. You should use the blue and red play-doh to represent de-oxygenated and oxygenated blood. Be sure to create and label the following:
- Left Atrium (2 pts.)
- Right Atrium (2 pts.)
- Left Ventricle (2 pts.)
- Right Ventricle (2 pts.)
- Aorta (2 pts.)
- Pulmonary Vein (2 pts.)
- Pulmonary Artery (2 pts.)
- Using the materials provided at your table, create a model of the human heart. You should use the blue and red play-doh to represent de-oxygenated and oxygenated blood. Be sure to create and label the following:
- Same as the left
- In the heart, the mixing of oxygen-rich and oxygen-poor blood is prevented by the
- a. mitral valve
- b. tricuspid valve
- c. septum
- d. pericardium
Create
Understand/Create
22Developing appropriate M-C questions for HOTS
- 1. Provide a factual statement and ask students to analyze it.
- The Sun is the only body in our solar system that gives off large amounts of light and heat. Why can we see the Moon?
- A. It is nearer the Earth than the Sun
- B. It is reflecting light from the Sun
- C. It is the biggest object in the solar system
- D. It is without an atmosphere
23Developing appropriate M-C questions for HOTS
- 2. Provide a diagram and ask students to identify elements.
- In the cell on the right, what letter correctly identifies the portion that first receives a signal?
- a. A
- b. B
- c. C
- d. D
- e. E
24Developing appropriate M-C questions for HOTS
- 3. Provide data and ask students to develop a hypothesis.
- Amounts of oxygen produced in a pond at different depths are shown below.
- Location: Oxygen
- Top meter: 4 g/m3
- Second meter: 3 g/m3
- Third meter: 1 g/m3
- Bottom meter: 0 g/m3
- Which statement is a reasonable hypothesis based on the data in the table?
- A. More oxygen production occurs near the surface because there is more light there.
- B. More oxygen production occurs near the bottom because there are more plants there.
- C. The greater the water pressure, the more oxygen production occurs.
- D. The rate of oxygen production is not related to depth.
25Developing appropriate M-C questions for HOTS
- 4. Provide a statement and ask students to evaluate its validity.
- The crews of two boats at sea can communicate by shouting to each other, and so can the crews of two nearby spaceships in space. How valid is this statement?
- A. Valid
- B. Partially valid
- C. Invalid
- D. Not enough information to make a judgment
28Developing constructed response questions for assessing HOTS
- Short constructed response (SCR) questions require answers ranging from one word to a few sentences.
- Extended constructed response (ECR) questions require students to write a few sentences or a short paragraph.
- Essay (E) questions require students to write a few paragraphs to a few pages.
29General guidelines for writing constructed-response questions
- 1. Define the task completely and specifically.
- Poor: State whether you think pesticides should be used on farms.
- Better: State the environmental effects of pesticide use on farms.
30Avoid ambiguous words
- Possible student interpretations of the word "discuss":
- Explain in my own words, maybe with an introduction, something in the middle, and a conclusion
- Analyze at length
- Present analogies and comparisons
- Tell as much as I know
- Put down facts
31General guidelines for writing constructed-response questions
- 2. Give explicit directions, such as the length, grading guidelines, and time to complete.
- Poor: State whether you think pesticides should be used on farms.
- Better: State whether you think pesticides should be used on farms. Defend your position as follows:
- a. Identify any positive benefits associated with pesticide use.
- b. Identify any negative effects associated with pesticide use.
- c. Compare positive benefits against negative effects.
- d. Suggest whether better alternatives to pesticides are available.
- Your essay should be no more than 2 double-spaced pages. Two of the points will be used to evaluate sentence structure, punctuation, and spelling. (10 points)
32General guidelines for writing constructed-response questions
- 3. Do not provide optional questions for students to choose from
- Different questions may measure completely different constructs, which makes comparisons among students difficult
33General guidelines for writing constructed-response questions
- 4. Define scoring clearly and appropriately: use a scoring rubric
- Analytic vs. holistic
34Holistic Scoring Rubric
35Analytic Scoring Rubric
36Holistic vs Analytic
- Holistic
- Easy to construct
- Efficient to score
- Clear implication
- Vague feedback
- Less informative for students to answer the
question
- Analytic
- Time consuming to construct
- Time consuming to score
- Unclear implication
- Specific feedback
- Informative for students to answer the question
37Guidelines for scoring essay questions
- Essays are scored anonymously
- Essays are scored question by question across students
- Each essay is graded twice independently to ensure consistency/reliability
- Appropriate scoring rubrics are developed and applied consistently
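One way to check whether two independent gradings of the same essays are consistent is sketched below. The slide does not name a statistic, so percent agreement and Cohen's kappa are assumptions chosen for illustration, and the rubric scores are hypothetical.

```python
from collections import Counter


def percent_agreement(first, second):
    """Share of essays that received the same score in both readings."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty score lists of equal length")
    return sum(a == b for a, b in zip(first, second)) / len(first)


def cohens_kappa(first, second):
    """Agreement corrected for the agreement expected by chance."""
    n = len(first)
    observed = percent_agreement(first, second)
    counts_first, counts_second = Counter(first), Counter(second)
    # Chance agreement: product of the two readings' score proportions.
    expected = sum(counts_first[s] * counts_second[s] for s in counts_first) / (n * n)
    if expected == 1:
        return 1.0  # both readings used one identical score throughout
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (1-4) for six essays, each graded twice.
first_reading = [3, 2, 4, 1, 3, 2]
second_reading = [3, 2, 3, 1, 3, 1]
print(round(percent_agreement(first_reading, second_reading), 2))
print(round(cohens_kappa(first_reading, second_reading), 2))
```

Kappa is lower than raw agreement because some matches would occur by chance alone; which threshold counts as acceptable is a local decision.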
38Multiple Faculty, TAs
- Common curriculum
- Common learning opportunities
- Develop and agree on a common test grid
- Same scoring rubrics
- Consider item banking
39Developing vs. Adopting
- There are many standardized tests or item banks (e.g. http://www.flaguide.org/)
- Standardized tests have established validity, reliability, and absence of bias
- The key is the match between the test coverage and the curriculum/instruction
40Necessary conditions for good assessment
- Planning
- Developing
- Administering
- Scoring
- Analyzing
- Grading
41Administering tests
- Order questions from easy to difficult: SRC questions first, SCR questions next, and ECR questions last
- Give complete instructions before students begin: test purpose, time allowance, basis for responding, methods of recording, appropriateness of guessing
- Use equivalent forms or different item orders and recording sheets to deter cheating
- Ensure an adequate physical setting
- Avoid unnecessary interaction with students
- Start and end the test at the same time
42Scoring tests
- Hand scoring vs. optical scanning
- Need to establish inter-rater reliability for constructed response questions
- Correct for guessing when appropriate (e.g. speeded tests)
- $\text{Corrected Score} = R - \frac{W}{n-1}$ (R is the number right, W the number wrong, n the number of choices per question)
- Correct for cheating
- Harpp-Hogan index: $\text{H-H} = \text{EEIC}/D$
- EEIC is the number of exact errors in common
- D is the number of different responses
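The two scoring formulas on this slide can be sketched in a few lines. The function names, the answer key, and the response strings are hypothetical. Returning infinity when two answer sheets are identical is an assumption, since the slide does not say how to handle D = 0.

```python
def corrected_score(right, wrong, choices_per_item):
    """Correction for guessing: R - W / (n - 1)."""
    if choices_per_item < 2:
        raise ValueError("each question needs at least two choices")
    return right - wrong / (choices_per_item - 1)


def harpp_hogan_index(answers_a, answers_b, key):
    """H-H = EEIC / D for one pair of students."""
    if not (len(answers_a) == len(answers_b) == len(key)):
        raise ValueError("answer sheets and key must have the same length")
    # EEIC: both students chose the same wrong option on a question.
    eeic = sum(1 for a, b, k in zip(answers_a, answers_b, key) if a == b and a != k)
    # D: questions on which the two students answered differently.
    differences = sum(1 for a, b in zip(answers_a, answers_b) if a != b)
    if differences == 0:
        return float("inf")  # assumption: identical sheets, no differences to divide by
    return eeic / differences


# 30 right and 8 wrong on five-choice questions.
print(corrected_score(30, 8, 5))  # 28.0

# Hypothetical ten-question key and two students' responses.
key = "ABCDABCDAB"
student_a = "ABDDACCDAA"
student_b = "ABDDACCDBB"
print(harpp_hogan_index(student_a, student_b, key))  # 1.0
```

A high index only flags a pair for a closer look; it is not proof of copying.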
43Item and test analysis
- Item analysis: item response patterns, item difficulty, item discrimination, etc.
- Test analysis: reliability ($\alpha$), criterion-related validity, bias, prediction-related validity, etc.
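A minimal sketch of the item and test statistics named above, for questions scored 0 or 1. The slide does not specify formulas, so three common choices are assumed here: proportion correct for difficulty, the upper-group minus lower-group difference (top and bottom 27%) for discrimination, and KR-20 with population variance for reliability. The score matrix is hypothetical.

```python
from statistics import pvariance


def item_difficulty(scores):
    """Proportion of students answering each question correctly."""
    n_students = len(scores)
    return [sum(row[j] for row in scores) / n_students for j in range(len(scores[0]))]


def item_discrimination(scores, fraction=0.27):
    """Upper-group minus lower-group proportion correct for each question."""
    ranked = sorted(scores, key=sum, reverse=True)  # best total scores first
    size = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:size], ranked[-size:]
    return [
        (sum(row[j] for row in upper) - sum(row[j] for row in lower)) / size
        for j in range(len(scores[0]))
    ]


def kr20(scores):
    """Kuder-Richardson 20 reliability for dichotomously scored questions."""
    k = len(scores[0])
    total_variance = pvariance([sum(row) for row in scores])
    if k < 2 or total_variance == 0:
        raise ValueError("need at least two questions and varying total scores")
    pq_sum = sum(p * (1 - p) for p in item_difficulty(scores))
    return (k / (k - 1)) * (1 - pq_sum / total_variance)


# Hypothetical results: six students (rows) by four questions (columns).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(item_difficulty(scores))
print(item_discrimination(scores))
print(round(kr20(scores), 2))
```

With a real class, the same matrix would come from the scored answer sheets; six students is far too few for stable estimates and is used only to keep the example readable.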
44Grading
- Lake Wobegon Effect
- In 1988 it was reported that 70% of the students, 90% of the 15,000 school districts, and all 50 states in the US were scoring above the national norms on norm-referenced achievement tests in elementary schools (Cannell, 1988)
45Criterion-referenced grading based on standards
- Commonly used standards: pass/fail, A/B/C/F
- The essential part of standard setting is to decide a cut-off score
46Deciding the cut-off score M-C test
- If the number of test questions is more than 20, and $\pi_0$ is within the range of .50 to .80, the approximate X can be calculated as follows:
- $X = \frac{n-\alpha}{\alpha}\,\pi_0 + \frac{\alpha-1}{\alpha}\,M + .5$
- M is the mean score on the test, $\alpha$ is the test reliability, n is the total number of questions
- $\pi_0$ is the true cut-off score if measurement quality is perfect
- X is the approximate cut-off score given the measurement error ($\alpha$)
47Example ($n = 28$, $\pi_0 = .75$, $X_0 = 21$)
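The approximate cut-off calculation can be sketched as below. The Greek symbols are lost in the transcript, so the reading used here (reliability in both denominators, the true cut-off as a proportion) is an assumption. Only n = 28 and the true cut-off of .75 come from the slide; the mean score of 18 and the reliability of .80 are hypothetical values added to make the example computable.

```python
def approximate_cutoff(n_items, true_cutoff, mean_score, reliability):
    """X = ((n - r) / r) * p0 + ((r - 1) / r) * M + 0.5, with r the reliability."""
    if n_items <= 20:
        raise ValueError("the approximation assumes more than 20 questions")
    if not 0.50 <= true_cutoff <= 0.80:
        raise ValueError("the approximation assumes a true cut-off between .50 and .80")
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    return (
        (n_items - reliability) / reliability * true_cutoff
        + (reliability - 1) / reliability * mean_score
        + 0.5
    )


# n and the true cut-off are from the slide; mean and reliability are hypothetical.
cutoff = approximate_cutoff(n_items=28, true_cutoff=0.75, mean_score=18, reliability=0.80)
print(round(cutoff, 2))  # 21.5
```

Under this reading, a lower reliability or a lower class mean pushes the working cut-off further above the raw target of 21 correct answers.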
48Norm-referenced grading, or grading on a curve
49Norm-referenced vs. criterion-referenced
- Number of students
- Characteristics of students
- Purpose of testing
- Use of testing results
- Quality of tests
50Other grading issues
- 1. Components of Grades (achievement, effort, attitude)
- 2. Combining Scores for the Final Grade (equate before weight before aggregate)
- 3. Translating Final Grades to Letter Grades (pre-determined scheme)
- 4. Reporting Grades (clear definition)
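The "equate before weight before aggregate" order in item 2 can be sketched as follows. The slide does not say how to equate, so converting each component to z-scores is an assumption, and the component names, scores, and weights are hypothetical.

```python
from statistics import mean, pstdev


def standardize(raw_scores):
    """Equate: put one component on a common scale (z-scores)."""
    center, spread = mean(raw_scores), pstdev(raw_scores)
    if spread == 0:
        return [0.0] * len(raw_scores)  # everyone scored the same
    return [(score - center) / spread for score in raw_scores]


def composite_scores(components, weights):
    """Equate each component, then weight it, then aggregate per student."""
    equated = {name: standardize(raw) for name, raw in components.items()}
    n_students = len(next(iter(components.values())))
    return [
        sum(weights[name] * equated[name][i] for name in components)
        for i in range(n_students)
    ]


# Hypothetical class of three students, listed in the same order in each component.
components = {"midterm": [50, 60, 70], "final": [80, 90, 100]}
weights = {"midterm": 0.4, "final": 0.6}
print([round(score, 2) for score in composite_scores(components, weights)])
```

Standardizing first keeps a component with a wide score spread from counting for more than its stated weight.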
51Putting it all together: VRA
reliability
absence of bias
validity
52
- Congratulations! You have survived two hours of preaching; now you can preach to others.
- Questions?