Measurement and Evaluation of Science Teaching - PowerPoint PPT Presentation

About This Presentation
Title:

Measurement and Evaluation of Science Teaching

Description:

What may happen in measurement and evaluation of science teaching? ... Use the textbook language or other phraseology that has the 'appearance of truth' ... – PowerPoint PPT presentation

Slides: 53
Provided by: xliu8

Transcript and Presenter's Notes

Title: Measurement and Evaluation of Science Teaching


1
Measurement and Evaluation of Science Teaching
  • Xiufeng Liu, PhD
  • Department of Learning and Instruction
  • University at Buffalo, SUNY
  • E-mail: xliu5_at_buffalo.edu

2
Measurement and Evaluation
  • What may happen in measurement and evaluation of
    science teaching?
  • We have never studied the content of these
    questions.
  • This test is too easy. I can answer them by
    closing my eyes.
  • I scored 65 on the test, but I still failed the
    test, which doesn't make sense to me.

3
What is it all about?
4
(No Transcript)
5
Key to Good Assessment
  • Assessment
  • (Measurement and Evaluation)
  • alignment
  • Curriculum Instruction

6
Necessary Conditions for Good Assessment
  • Planning
  • Developing
  • Administering
  • Scoring
  • Analyzing
  • Grading

7
Test Planning: Test Grid
8
Developing multiple-choice (MC) questions
better
poor
9
Developing MC questions
poor
better
10
Ways to make choices plausible
  • 1. Use students' common misconceptions or
    errors
  • 2. Use textbook language or other phraseology
    that has the 'appearance of truth'
  • 3. Use distracters that are parallel in form and
    grammatically consistent with the item's stem
  • 4. Make the distracters similar to the correct
    answer in length, vocabulary, sentence structure,
    and complexity of thought

11
Developing MC questions
poor
better
12
Developing MC questions
poor
better
13
Developing MC questions
better
poor
14
Summary Checklist for good MC questions
  • The stem presents a clear problem
  • The stem is stated as a question
  • The choices are equally plausible
  • The choices are in alpha-numerical or other
    logical order
  • The choices are consistent in length and contain
    no extraneous clues
  • The choices contain only one best or correct
    answer
  • "None of the above" or "all of the above" choices
    are avoided

15
Advantages and Limitations of M-C Questions
  • Advantages
  • Easy to score
  • Objective to score
  • Large coverage
  • Good at assessing specific knowledge and
    understanding or lower order thinking skills
    (LOTS)
  • Incorrect answers provide valuable information on
    students' learning difficulties
  • Limitations
  • Limited in assessing higher order thinking skills
    (HOTS)
  • Guessing
  • Reading comprehension
  • Time consuming to write good M-C questions

16
How to measure higher order thinking skills (HOTS)
  • First of all, you don't have to use MC to assess
    HOTS; there are many other question formats that
    assess HOTS better than MC does
  • Understand the differences among cognitive
    levels
  • Use combinations of question formats
  • Develop appropriate multiple-choice questions

17
Lower Order Thinking Skills (LOTS)
  • Remember: recognize (identify), recall
    (retrieve)
  • Understand: interpret (clarify, paraphrase,
    represent, translate), exemplify (illustrate,
    instantiate), classify (categorize, subsume),
    summarize (abstract, generalize), infer
    (conclude, extrapolate, interpolate, predict),
    compare (contrast, map, match), explain
    (construct models)
  • Apply: execute (carry out), implement (use)

18
Higher Order Thinking Skills (HOTS)
  • Analyze: differentiate (discriminate,
    distinguish, focus, select), organize (find
    coherence, integrate, outline, parse, structure),
    attribute (deconstruct)
  • Evaluate: check (coordinate, detect, monitor,
    test), critique (judge)
  • Create: generate (hypothesize), plan (design),
    produce (construct)

19
Using combinations of question formats: M-C + M-C
  • M-C only (understand)
  • After a large ice cube has melted in a beaker of
    water, how will the water level change?
  • a. higher
  • b. lower
  • c. the same
  • M-C + M-C (analyze)
  • After a large ice cube has melted in a beaker of
    water, how will the water level change?
  • a. higher
  • b. lower
  • c. the same
  • Why do you think so? Choose all that apply.
  • a. The mass of water displaced is equal to the
    mass of the ice.
  • b. Ice has more volume than water.
  • c. Water is denser than ice.
  • d. The ice cube decreases the temperature of the
    water.
  • e. Water molecules in water occupy more space
    than in ice.

20
Using combinations of question formats: M-C +
Constructed Response
  • M-C only (understand)
  • After a large ice cube has melted in a beaker of
    water, how will the water level change?
  • a. higher
  • b. lower
  • c. the same
  • M-C + constructed response (analyze)
  • After a large ice cube has melted in a beaker of
    water, how will the water level change?
  • a. higher
  • b. lower
  • c. the same
  • Why do you think so? Please justify your choice.

21
Using combinations of question formats:
Performance + M-C
  • Performance task only (Create)
  • Using the materials provided at your table,
    create a model of the human heart. You should
    use the blue and red Play-Doh to represent
    de-oxygenated and oxygenated blood. Be sure to
    create and label the following:
  • Left Atrium (2 pts.)
  • Right Atrium (2 pts.)
  • Left Ventricle (2 pts.)
  • Right Ventricle (2 pts.)
  • Aorta (2 pts.)
  • Pulmonary Vein (2 pts.)
  • Pulmonary Artery (2 pts.)
  • Performance task + M-C (Understand/Create)
  • The same performance task as above
  • In the heart, the mixing of oxygen-rich and
    oxygen-poor blood is prevented by the
  • a. mitral valve
  • b. tricuspid valve
  • c. septum
  • d. pericardium

22
Developing appropriate M-C questions for HOTS
  • 1. Providing a factual statement, ask students to
    analyze.
  • The Sun is the only body in our solar system that
    gives off large amounts of light and heat. Why
    can we see the Moon?
  • A. It is nearer the Earth than the Sun
  • B. It is reflecting light from the Sun
  • C. It is the biggest object in the solar system
  • D. It is without an atmosphere

23
Developing appropriate M-C questions for HOTS
  • 2. Providing a diagram, ask students to identify
    elements
  • In the cell on the right, what letter correctly
    identifies the portion that first receives a
    signal?
  • a. A
  • b. B
  • c. C
  • d. D
  • e. E

24
Developing appropriate M-C questions for HOTS
  • 3. Providing data, ask students to develop a
    hypothesis
  • Amounts of oxygen produced in a pond at
    different depths are shown below:
  • Location         Oxygen
  • Top meter        4 g/m³
  • Second meter     3 g/m³
  • Third meter      1 g/m³
  • Bottom meter     0 g/m³
  • Which statement is a reasonable hypothesis based
    on the data in the table?
  • A. More oxygen production occurs near the surface
    because there is more light there.
  • B. More oxygen production occurs near the bottom
    because there are more plants there
  • C. The greater the water pressure, the more
    oxygen production occurs
  • D. The rate of oxygen production is not related
    to depth.

25
Developing appropriate M-C questions for HOTS
  • 4. Providing a statement, ask students to
    evaluate its validity
  • The crews of two boats at sea can communicate
    with each other by shouting, and so can the
    crews of two nearby spaceships in space.
    How valid is this statement?
  • A. Valid
  • B. Partially valid
  • C. Invalid
  • D. Not enough information to make a judgment

26
(No Transcript)
27

28
Developing constructed response questions for
assessing HOTS
  • Short constructed response (SCR) questions
    require answers ranging from one word to a few
    sentences.
  • Extended constructed response (ECR) questions
    require students to write a few sentences or a
    short paragraph.
  • Essay (E) questions require students to write a
    few paragraphs to a few pages.

29
General guidelines for writing constructed-response
questions
  • 1. Define the task completely and specifically.
  • Poor: State whether you think pesticides should
    be used on farms.
  • Better: State the environmental effects of
    pesticide use on farms.

30
Avoid ambiguous words
  • Possible student interpretations of the word
    "discuss":
  • Explain in my own words, maybe with an
    introduction, something in the middle and a
    conclusion
  • Analyze at length
  • Present analogies and comparisons
  • Tell as much as I know
  • Put down facts

31
General guidelines for writing constructed-response
questions
  • 2. Give explicit directions such as the length,
    grading guideline, and time to complete.
  • Poor: State whether you think pesticides should
    be used on farms.
  • Better: State whether you think pesticides should
    be used on farms. Defend your position as
    follows:
  • a. Identify any positive benefits associated with
    pesticide use.
  • b. Identify any negative effects associated with
    pesticide use.
  • c. Compare positive benefits against negative
    effects.
  • d. Suggest whether better alternatives to
    pesticides are available.
  • Your essay should be no more than 2
    double-spaced pages. Two of the points will be
    used to evaluate sentence structure,
    punctuation, and spelling. (10 points)

32
General guidelines for writing constructed-response
questions
  • 3. Do not provide optional questions for students
    to choose from
  • Different questions may measure completely
    different constructs, which makes comparisons
    among students difficult

33
General guidelines for writing constructed-response
questions
  • 4. Define scoring clearly and appropriately:
    scoring rubric
  • Analytic vs. Holistic

34
Holistic Scoring Rubric
35
Analytic Scoring Rubric
36
Holistic vs. Analytic
  • Holistic
  • Easy to construct
  • Efficient to score
  • Clear implication
  • Vague feedback
  • Less informative to students about how to answer
    the question
  • Analytic
  • Time consuming to construct
  • Time consuming to score
  • Unclear implication
  • Specific feedback
  • Informative to students about how to answer the
    question

37
Guidelines for scoring essay questions
  • Essays are scored anonymously
  • Essays are scored question by question across
    students
  • Each essay is graded twice independently to
    ensure consistency/reliability
  • Appropriate scoring rubrics are developed and
    applied consistently
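
One way to check the consistency of two independent gradings is sketched below. The slide does not name a statistic, so the use of percent agreement and Cohen's kappa here is an assumption, and the rubric scores are made-up illustration data.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of essays on which both gradings gave the same rubric score."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both gradings must cover the same essays")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions.
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used a single identical category
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical rubric scores (1 to 4) from two independent gradings.
first_grading = [4, 3, 2, 4, 1, 3, 2, 4]
second_grading = [4, 3, 3, 4, 1, 2, 2, 4]

agreement = percent_agreement(first_grading, second_grading)
kappa = cohens_kappa(first_grading, second_grading)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Kappa is lower than raw agreement because some matches would occur by chance alone; which threshold counts as acceptable is a local decision.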

38
Multiple Faculty, TAs
  • Common curriculum
  • Common learning opportunities
  • Develop and agree on a common test grid
  • Same scoring rubrics
  • Consider item banking

39
Developing vs. Adopting
  • There are many standardized tests or item banks
    (e.g. http://www.flaguide.org/)
  • Standardized tests have established validity,
    reliability, and absence of bias
  • The key is the match between the test coverage
    and the curriculum/instruction

40
Necessary conditions for good assessment
  • Planning
  • Developing
  • Administering
  • Scoring
  • Analyzing
  • Grading

41
Administering tests
  • Order questions from easy to difficult:
    selected-response questions first, SCR questions
    next, and ECR questions last
  • Give complete instructions before students begin:
    test purpose, time allowance, basis for
    responding, methods of recording, appropriateness
    of guessing
  • Use equivalent forms or different item orders and
    recording sheets to avoid cheating
  • Ensure adequate physical setting
  • Avoid unnecessary interaction with students
  • Start and end the test at the same time

42
Scoring tests
  • Hand scoring vs. optical scanning
  • Need to establish inter-rater reliability for
    constructed response questions
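  • Correct for guessing when appropriate (e.g.
    speeded tests)
  • Corrected Score = R - W/(n-1), where R is the
    number of right answers, W is the number of wrong
    answers, and n is the number of choices per item
  • Correct for cheating
  • Harpp-Hogan index (H-H) = EEIC/D
  • EEIC is the number of exact errors in common
  • D is the number of different responses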
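
The two scoring corrections on this slide can be sketched as follows. This assumes the guessing formula reads R - W/(n-1), with R right answers, W wrong answers, and n choices per item, and that D counts the items on which two students gave different responses. The function names, answer key, and answer strings are hypothetical.

```python
def corrected_score(right, wrong, choices_per_item):
    """Correction for guessing: R - W/(n-1)."""
    return right - wrong / (choices_per_item - 1)


def harpp_hogan_index(answers_a, answers_b, key):
    """EEIC/D for a pair of students.

    EEIC: items both students missed with the same wrong choice.
    D: items on which the two students' responses differ.
    Returns None when D is zero, since the ratio is undefined.
    """
    eeic = 0
    different = 0
    for a, b, correct in zip(answers_a, answers_b, key):
        if a != b:
            different += 1
        elif a != correct:
            eeic += 1
    if different == 0:
        return None
    return eeic / different


# A student with 40 right and 12 wrong on four-choice items.
adjusted = corrected_score(40, 12, 4)

# Hypothetical ten-item key and two students' answer strings.
key = "ABCDABCDAB"
student_a = "ABDDACCDBB"
student_b = "ABDDACCABB"
hh = harpp_hogan_index(student_a, student_b, key)
print(adjusted, hh)
```

A high ratio only flags a pair for closer review; it is not proof of copying on its own.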

43
Item and test analysis
  • Item analysis: item response patterns, item
    difficulty, item discrimination, etc.
  • Test analysis: reliability (α), criterion-related
    validity, bias, prediction-related validity, etc.
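
A minimal sketch of three of these statistics for dichotomously scored items follows. The slide does not specify formulas, so the choices here are assumptions: difficulty as the proportion correct, discrimination as the upper-half minus lower-half proportion correct, and KR-20 (with population variance) as the reliability estimate. The score matrix is invented.

```python
def item_difficulty(responses, item):
    """Proportion of students answering the item correctly."""
    return sum(row[item] for row in responses) / len(responses)


def item_discrimination(responses, item):
    """Upper-half minus lower-half proportion correct (needs 2+ students)."""
    ranked = sorted(responses, key=sum, reverse=True)
    half = len(ranked) // 2
    upper, lower = ranked[:half], ranked[-half:]
    p_upper = sum(row[item] for row in upper) / half
    p_lower = sum(row[item] for row in lower) / half
    return p_upper - p_lower


def kr20(responses):
    """Kuder-Richardson 20 reliability for 0/1 scored items."""
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / len(totals)
    var_total = sum((t - mean_total) ** 2 for t in totals) / len(totals)
    difficulties = [item_difficulty(responses, i) for i in range(n_items)]
    sum_pq = sum(p * (1 - p) for p in difficulties)
    return n_items / (n_items - 1) * (1 - sum_pq / var_total)


# Hypothetical data: rows are students, columns are items (1 = correct).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

for i in range(4):
    p = item_difficulty(scores, i)
    d = item_discrimination(scores, i)
    print(f"item {i + 1}: difficulty = {p:.2f}, discrimination = {d:.2f}")
print(f"KR-20 = {kr20(scores):.2f}")
```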

44
Grading
  • Lake Wobegon Effect
  • In 1988 it was reported that 70% of the
    students, 90% of the 15,000 school districts, and
    all 50 states in the US were scoring above the
    national norms on norm-referenced achievement
    tests in elementary schools (Cannell, 1988)

45
Criterion-referenced grading based on standards
  • Commonly used standards: pass/fail, A/B/C/F
  • The essential part of standard setting is to
    decide a cut-off score

46
Deciding the cut-off score: M-C test
  • If the number of test questions is more than 20
    and θ0 is within the range of .50 to .80, the
    approximate X can be calculated as follows:
  • X = ((n-α)/α) θ0 + ((α-1)/α) M + .5
  • M is the mean score on the test, α is the test
    reliability, n is the total number of questions
  • θ0 is the true cut-off score if measurement
    quality is perfect
  • X is the approximate cut-off score given the
    measurement error (α)
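
The calculation can be sketched in a few lines. Because the operators and Greek symbols on this slide were lost in extraction, the sketch assumes the formula reads X = ((n - r)/r) * c0 + ((r - 1)/r) * M + .5, with r the test reliability and c0 the true cut-off expressed as a proportion. The mean score and reliability used below are hypothetical values, not taken from the presentation.

```python
def approximate_cutoff(n_items, true_cutoff, reliability, mean_score):
    """Approximate raw cut-off score under the assumed reading of the slide.

    n_items: total number of questions (the slide asks for more than 20)
    true_cutoff: cut-off as a proportion (the slide asks for .50 to .80)
    reliability: test reliability, greater than 0 and at most 1
    mean_score: mean raw score on the test
    """
    r = reliability
    return (n_items - r) / r * true_cutoff + (r - 1) / r * mean_score + 0.5


# n = 28 and a true cut-off of .75 follow the slide's example;
# reliability = .80 and mean = 20 are invented for illustration.
cutoff = approximate_cutoff(28, 0.75, 0.80, 20)
print(round(cutoff, 2))
```

Under this reading, a class mean below the nominal cut-off pushes the adjusted cut-off up and a mean above it pulls the cut-off down, with the size of the shift growing as reliability falls.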

47
Example (n = 28, θ0 = .75, X0 = 21)
48
Norm-referenced grading or curving grading
  • Z = (X - µ)/s
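
A short sketch of the Z-score conversion, taking µ as the class mean and s as the class standard deviation. Using the population standard deviation rather than the sample one is an assumption, and the scores are made up.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Convert raw scores to Z = (X - mean) / standard deviation."""
    mu = mean(scores)
    s = pstdev(scores)
    if s == 0:
        raise ValueError("all scores are identical, so Z is undefined")
    return [(x - mu) / s for x in scores]


# Hypothetical raw test scores for five students.
raw = [55, 65, 70, 75, 85]
standardized = z_scores(raw)
print(standardized)
```

Letter grades can then be assigned from Z bands fixed in advance, which is what makes the grading norm-referenced.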

49
Norm-referenced vs. criterion-referenced
  • Number of students
  • Characteristics of students
  • Purpose of testing
  • Use of testing results
  • Quality of tests

50
Other grading issues
  • 1. Components of Grades (achievement, effort,
    attitude)
  • 2. Combining Scores for the Final Grade (equate,
    then weight, then aggregate)
  • 3. Translating Final Grades to Letter Grades
    (pre-determined scheme)
  • 4. Reporting Grades (clear definition)
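
The "equate before weight before aggregate" order in point 2 can be sketched as below. Reading "equate" as putting every component on a common Z-score scale is an assumption, and the component names, weights, and scores are invented.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Equate: express one component's scores as Z-scores."""
    mu = mean(scores)
    s = pstdev(scores)
    return [(x - mu) / s for x in scores]


def composite_scores(components, weights):
    """Weight the equated components, then aggregate per student.

    components: dict of component name -> raw scores, same student order
    weights: dict of component name -> weight
    """
    equated = {name: standardize(raw) for name, raw in components.items()}
    n_students = len(next(iter(components.values())))
    return [
        sum(weights[name] * equated[name][i] for name in components)
        for i in range(n_students)
    ]


# Hypothetical class of three students with two graded components.
components = {
    "quiz": [8, 6, 10],    # scored out of 10
    "exam": [70, 90, 50],  # scored out of 100
}
weights = {"quiz": 0.4, "exam": 0.6}

final = composite_scores(components, weights)
print([round(x, 3) for x in final])
```

Equating first keeps the exam's larger raw spread from outweighing the quiz beyond the intended 60/40 split.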

51
Putting all things together: VRA

reliability
absence of bias
validity
52
  • Congratulations! You have survived two hours of
    preaching; you can preach to others now.
  • Questions?