Classroom Assessments in Large Scale Assessment Programs

1
Classroom Assessments in Large Scale Assessment
Programs
  • Catherine Taylor
  • University of Washington/OSPI
  • Lesley Klenk
  • OSPI

2
History of Criterion-Referenced Assessment Models
  • "Measurement driven instruction" (e.g., Popham,
    1987) emerged during the 1980s
  • A process wherein the tests are used as the
    driver for instructional change.
  • If we value something, we must assess it.
  • Minimum-competency movement of the 1980s
  • "Drive" instructional practices toward teaching
    of basic skills
  • Movement was successful - Teachers did teach to
    the tests.
  • Unfortunately, teachers taught too closely to
    tests (Smith, 1991; Haladyna, Nolen, & Haas, 1991).
  • The tests were typically multiple-choice tests of
    discrete skills
  • Instruction narrowed to the content that was
    tested in the same form that it was tested.

3
History of Criterion-Referenced Assessment Models
  • Large-scale achievement tests came under
    widespread criticism
  • Negative impacts on the classroom
    (Darling-Hammond & Wise, 1985; Madaus, West,
    Harmon, Lomax, & Viator, 1992; Shepard &
    Dougherty, 1991).
  • Lack of fidelity to valued performances

4
History of Criterion-Referenced Assessment Models
  • Studies compared indirect and direct measures of
  • writing (Stiggins, 1982)
  • mathematical problem-solving (Baxter, Shavelson,
    Herman, Brown, & Valadez, 1993)
  • science inquiry (Shavelson, Baxter, and Gao,
    1993)
  • Demonstrated that some of the knowledge and
    skills measured in each assessment format overlap
  • Moderate to low correlations between different
    assessment modes
  • Questions about the validity of multiple-choice
    test scores.
  • Other studies (Haladyna, Nolen, and Haas, 1991;
    Shepard and Dougherty, 1991; Smith, 1991)
    showed
  • pressure to raise scores on large scale tests
  • narrowing of the curriculum to the specific
    content tested
  • substantial classroom time spent teaching to the
    test and item formats.
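
As a rough illustration of what a "moderate to low" correlation between assessment modes looks like, the sketch below computes a Pearson correlation between scores on two formats. The scores are made-up illustrative values, not data from the cited studies, and the function name `pearson_r` is hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for six students (illustrative only).
multiple_choice = [12, 15, 9, 18, 14, 11]   # indirect measure
performance_task = [3, 2, 3, 4, 1, 2]       # direct measure, rubric score

r = pearson_r(multiple_choice, performance_task)
print(f"r = {r:.2f}")
```

A value near 1 would suggest the two formats rank students similarly; a low value suggests they capture partly different knowledge and skills.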

5
History of Criterion-Referenced Assessment Models
  • In response to criticisms of multiple-choice
    tests, assessment reformers (e.g., Shepard, 1989;
    Wiggins, 1989) pressed for
  • Different types of assessment
  • Assessments that measure students' achievement of
    new curriculum standards
  • Assessment formats that more closely match the
    ways knowledge, concepts and skills are used in
    the world beyond tests
  • Assessments that encourage teachers to teach
    higher order thinking, problem-solving, and
    reasoning skills rather than rote skills and
    knowledge.

6
History of Criterion-Referenced Assessment Models
  • In response to these pressures to improve tests
  • LEAs, testing companies, and projects (e.g., New
    Standards Project) incorporated performance
    assessments into testing programs
  • "Performance assessments" included
  • Short-answer items similar to multiple-choice
    items
  • Carefully scaffolded, multi-step tasks with
    several short-answer items (e.g., Yen, 1993)
  • Open-ended performance tasks (California, 1990;
    OSPI, 1997).

7
History of Criterion-Referenced Assessment Models
  • Still, writers criticized these efforts
  • Tasks are contrived and artificial (see, for
    example, Wiggins, 1992)
  • Teachers complain that standardized tests don't
    assess what is taught in the classroom
  • Shepard (2000) indicated that the promises of
    high quality performance-based assessments have
    not been realized.
  • Authentic tasks are costly to implement,
    time-consuming, and difficult to evaluate
  • Less expensive performance assessment options are
    less authentic

8
Impact of National Curriculum Standards
  • Knowledge is expanding rapidly
  • Education must shift away from knowledge
    dissemination
  • Students must learn how to
  • Gather information
  • Comprehend, analyze, interpret information
  • Evaluate the credibility of information
  • Synthesize information from different sources
  • Develop new knowledge

9
Early Attempts to Use Portfolios for State
Assessment
  • Three states attempted to use collections of
    classroom work for state assessment
  • California (Kirst & Mazzeo, 1996; Palmquist,
    1994)
  • Kentucky (Kentucky State Department of Education,
    1996)
  • Vermont (Fontana, 1995; Forseth, 1992; Hewitt,
    1993; Vermont State Department of Education,
    1993, 1994a, 1994b).

10
Early Attempts to Use Portfolios for State
Assessment
  • Initial efforts were fraught with problems
  • Inconsistency of raters when applying scoring
    criteria (Koretz, Stecher, & Deibert, 1992b;
    Koretz, Stecher, Klein, & McCaffrey, 1994a),
  • Lack of teacher preparation in high quality
    assessment development (Gearhart & Wolf, 1996),
  • Inconsistencies in the focus, number, and types
    of evidence included in portfolios (Gearhart &
    Wolf, 1996; Koretz et al., 1992b), and
  • Costs and logistics associated with processing
    portfolios (Kirst & Mazzeo, 1996).

11
Research on Large Scale Portfolio Assessment
  • Research on impact of portfolios showed mixed
    results
  • Teachers and administrators have generally
    positive attitudes about use of portfolios
    (Klein, Stecher, & Koretz, 1995; Koretz et al.,
    1992a; Koretz et al., 1994a)
  • Positive effects on instruction (Stecher &
    Hamilton, 1994)
  • Teachers develop a better understanding of
    mathematical problem-solving (Stecher & Mitchell,
    1995)
  • Too much time spent on the assessment process
    (Stecher & Hamilton, 1994; Koretz et al., 1994a)
  • Teachers work too hard to ensure that portfolios
    "look good" (Callahan, 1997).

12
Advantages of using classroom evidence in
large-scale assessment programs
  • Evidence that teachers are preparing students to
    meet curriculum and performance standards
    (opportunity to learn),
  • Broader evidence about student achievement
  • Opportunity to assess knowledge and skills
    difficult to assess via standardized tests (e.g.,
    speaking and presenting, report writing,
    scientific inquiry processes)
  • Opportunity to include work that more closely
    represents the real contexts in which knowledge
    and skill are applied

13
Opportunity to Learn
  • Little evidence is available about whether
    teachers are actually teaching to curriculum
    standards.
  • Claims about positive impacts of new assessments
    on instructional practices are largely anecdotal
    or based on teacher self-report
  • Legal challenges to tests for graduation,
    placement, and promotion demand evidence that
    students have had the opportunity to learn tested
    curriculum (Debra P. v. Turlington, 1979).
  • There is no efficient method to assess students'
    opportunity to learn the valued concepts and
    skills
  • Collections of classroom work provide a window
    into the educational experiences of students
  • Collections of classroom work provide a window
    into the educational practices of teachers
  • Collections of classroom work could help
    administrators evaluate the effectiveness of
    in-service teacher development programs
  • Classroom assessments could be used in court
    cases to provide evidence of individual students'
    opportunity to learn

14
Broader Evidence of Student Learning
  • Some students function well in the classroom but
    do not perform well on tests.
  • "Stereotype threat" research - fear of negative
    stereotype can lead minority students and girls
    to perform less well than they should on
    standardized tests (Aronson, Lustina, Good,
    Keough, Steele, & Brown, 1999; Steele, 1999;
    Steele & Aronson, 2000).
  • Students may have cultural values or language
    development issues that inhibit performance on
    timed, standardized tests
  • These factors threaten the validity of
    large-scale test scores.
  • Classroom work can be more sensitive to students'
    cultural and linguistic backgrounds
  • Collections of classroom work can be more
    reliable than standardized test scores

15
Including Standards that are Difficult to Measure
on Tests
  • Some desirable curriculum standards are too
    unwieldy to measure on large-scale tests (e.g.,
    scientific inquiry, research reports, oral
    presentations)
  • Historically, standardized tests measure complex
    work by testing knowledge of how to conduct the
    work. Examples:
  • Knowing where to locate sources for reports
  • Knowing how to use tables of contents,
    bibliographies, card catalogues, and indexes
  • Identifying control or experimental variables in
    a science experiment
  • Knowing appropriate strategies for oral
    presentation
  • Knowing appropriate ways to use visual aids
  • Critics often note that knowing what to do
    doesn't necessarily mean one is able to do.

16
Authenticity
  • Frederickson (1984) raised the question of
    authenticity in assessment due to
    misrepresentation of domains by standardized
    tests.
  • Wiggins (1989) claimed that in every discipline
    there are tasks that are authentic to the given
    discipline.
  • Frederickson (1998) stated that authentic
    achievement is
  • "significant intellectual accomplishment that
    results in the construction of knowledge through
    disciplined inquiry to produce discourse,
    products, or performances that have meaning or
    value beyond success in school" (p. 19, italics
    added).
  • Examples of performances
  • Policy analysis
  • Historical narrative and evaluation of historical
    artifacts
  • Geographic analysis of human movement
  • Political debate
  • Story and poetry writing
  • Literary analysis/critique
  • Mathematical modeling
  • Investment or business analyses
  • Geometric design and animation
  • Written report of a scientific investigation
  • Evaluation of the health of an ecosystem

17
Authenticity
  • Some measurement specialists question the use of
    the terms authentic and direct measurement
  • All assessments are indirect measures from which
    we make inferences about other, related
    performances (Terwilliger, 1997)
  • However
  • Validity is related to the degree of inference
    necessary from scores on a standardized test to
    valued work
  • Authentic classroom work requires less inference
    than multiple choice test scores

18
Challenges with Inclusion of Classroom Work in
Large Scale Programs
  1. Limited teacher preparation in classroom-based
    assessment (which can limit the quality of
    classroom-based evidence),
  2. Selections of evidence (which can limit
    comparisons across students),
  3. Reliability of raters (which can limit the
    believability of scores given to student work)
  4. Construct irrelevant variance (which can limit
    the validity of scores)

19
Solving Teacher Preparation Issues
  • Teachers must be taught how to
  • Select, modify, and develop assessments
  • Score (evaluate) student work
  • Write scoring (marking) rules for assessments
    that align to standards
  • Significant, ongoing professional development in
    assessment is essential.
  • Teachers need to re-examine
  • Important knowledge and skills within each
    discipline
  • How to teach so that students are more
    independent learners

20
Selection of Evidence
  • "For which knowledge, concepts, and skills do we
    need classroom-based evidence?"
  • Koretz, et al (1992b) claimed that, when teachers
    are free to select evidence, there is too much
    diversity in tasks
  • Diversity may cause low inter-judge agreement
    among raters of the portfolios.
  • Koretz and his colleagues recommended placing
    some restrictions on the types of tasks
    considered acceptable for portfolios.
  • Teachers need guidance in terms of what
    constitutes appropriate types of evidence.
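
To make "inter-judge agreement" concrete, the following sketch computes simple percent agreement and Cohen's kappa (agreement corrected for chance) for two raters scoring the same portfolios. The rubric scores are invented for illustration and the function names are hypothetical; this is not the procedure used in the cited studies.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of portfolios given the same score by both raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of portfolios")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement adjusted for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters assign the same category
    # if each scored independently at their own observed rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1 - expected)

# Hypothetical 4-point rubric scores for eight portfolios (illustrative only).
scores_a = [4, 3, 3, 2, 4, 1, 2, 3]
scores_b = [4, 3, 2, 2, 3, 1, 2, 4]

agreement = percent_agreement(scores_a, scores_b)
kappa = cohens_kappa(scores_a, scores_b)
print(f"agreement = {agreement:.3f}, kappa = {kappa:.3f}")
```

Kappa is reported alongside raw agreement because two raters who both use a narrow band of scores will agree fairly often by chance alone.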

21
Improving Selections of Evidence
  • Provide guidelines for what constitutes an
    effective collection of evidence
  • Provide models for the types of assignments
    (performances) that will demonstrate the
    standards.
  • Provide blueprints for tests that can assess the
    EALRs assessed by WASL
  • Provide guides for writing test questions and
    scoring rubrics
  • Provide guides for writing directions and scoring
    rubrics for assignments (performances)

22
Guidelines for Collections Include
  • Lists of important work samples to collect (e.g.,
    research reports, mathematics problems)
  • Number and types of evidence for each category
  • Outline of steps in performances and work samples
  • Tools for assessment of students' performances
    and work samples

23
Example Lists of Number and Types of Work Samples
to Collect
  • Writing Performances
  • At least 2 different writing purposes
  • At least 3 different audiences
  • Some examples from courses other than English
  • Science Investigations
  • At least 3 investigations (physical, earth/space,
    life)
  • Observational assessments of hands-on work
  • Lab books
  • Summary research reports
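
The writing requirements listed above could be checked mechanically. The sketch below is one hypothetical way to do so; the record fields (`purpose`, `audience`, `course`) and the function name are assumptions made for illustration, not part of any actual state guideline or tool.

```python
def check_writing_collection(samples, min_purposes=2, min_audiences=3):
    """Check a list of writing samples against the example guidelines.

    Each sample is a dict with hypothetical keys 'purpose', 'audience',
    and 'course'.
    """
    purposes = {s["purpose"] for s in samples}
    audiences = {s["audience"] for s in samples}
    # "Some examples from courses other than English"
    other_courses = any(s["course"].lower() != "english" for s in samples)
    return {
        "enough_purposes": len(purposes) >= min_purposes,
        "enough_audiences": len(audiences) >= min_audiences,
        "includes_other_courses": other_courses,
    }

# Hypothetical collection for one student (illustrative only).
samples = [
    {"purpose": "persuade", "audience": "peers", "course": "English"},
    {"purpose": "inform", "audience": "teacher", "course": "Science"},
    {"purpose": "inform", "audience": "community", "course": "Social Studies"},
]

report = check_writing_collection(samples)
print(report)
```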

24
Develop Benchmark Performance Assessments
  • Benchmark performances are performances that
  • Have value in their own right
  • Are complex and interdisciplinary
  • Students are expected to do by the end of some
    defined period of time (e.g., the end of middle
    school).
  • Performance may require
  • Application of knowledge, concepts and skills
    across subject disciplines (e.g., survey
    research)
  • Authentic work within one subject discipline
    (e.g., scientific investigations, expository
    writing)

25
Example Description of a Benchmark Performance in
Reading
  • By the end of middle school, students will select
    one important character from a novel, short
    story, or play and write a multi-paragraph essay
    describing the character, how the character's
    personality, actions, choices, and relationships
    influence the outcome of the story, and how the
    character was affected by the events in the
    story. Each paragraph will have a central thought
    that is unified into a greater whole supported by
    factual material (direct quotations and examples
    from the text) as well as commentary to explain
    the relationship between the factual material and
    the student's ideas.

26
Example Description of a Benchmark Performance in
Mathematics
  • By the end of high school, students will
    investigate and report on a topic of personal
    interest by collecting data for a research
    question of personal interest. Students will
    construct a questionnaire and obtain a sample from
    a relevant population. In the report, students will
    report the results in a variety of appropriate
    forms (including pictographs, circle graphs, bar
    graphs, histograms, line graphs, and/or stem and
    leaf plots and incorporating the use of
    technology), analyze and interpret the data using
    statistical measures (central tendency,
    variability, and range) as appropriate, describe
    the results, make predictions, and discuss the
    limitations of their data collection methods.
    Graphics will be clearly labeled (including name
    of data, units of measurement and appropriate
    scale) and informatively titled. References to
    data in reports will include units of
    measurement. Sources will be documented.
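
As an illustration of the statistical measures named in this benchmark (central tendency, variability, and range), here is a minimal sketch using Python's standard `statistics` module. The survey data and the `summarize` helper are hypothetical.

```python
import statistics

def summarize(values):
    """Return central tendency, variability, and range for survey data."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),   # sample standard deviation
        "range": max(values) - min(values),
    }

# Hypothetical survey responses: hours of homework per week (illustrative only).
hours_of_homework = [2, 4, 4, 5, 7, 8]

summary = summarize(hours_of_homework)
for name, value in summary.items():
    # Report the unit with every figure, as the benchmark requires.
    print(f"{name}: {value:.2f} hours per week")
```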

27
Example of the Process of Developing Benchmark
Performances
  • Select work that would be familiar or meaningful
  • Purchasing decision
  • Describe the performance in some detail
  • A person plans to buy a ___ on credit. The
    person figures out how much s/he can spend
    (down-payment and monthly payments), does
    research on the different types of ___, reads
    consumer reports or product reviews, compares
    costs and qualities, and makes a final selection.
    The person then locates the chosen product and
    purchases it or finances the purchase.

28
Example of the Process (continued)
  • Define the steps adults take to complete the
    performance
  • A person plans to buy a ___ on credit for ____
    purpose.
  • The person figures out how much s/he can spend
  • Determines money available for down-payment
  • Compares income and monthly expenses to determine
    cash available for monthly payment
  • Does research on the different types of ___
    including costs and finance options.
  • Reads consumer reports or product reviews
  • Compares costs, qualities, and finance options
  • Makes a final selection.
  • Locates the chosen product and finances the
    purchase.

29
Example of the Process (continued)
  • Create grade level appropriate steps
  • The student plans to buy a ___ on credit for
    _____ purpose.
  • The student
  • Figures out how much s/he can spend
  • Determines money available for down-payment
  • Compares income and monthly expenses to determine
    cash available for monthly payment
  • Does research on at least 3 types of ______
  • Determines costs and finance options.
  • Reads consumer reports or product reviews
  • Compares costs, qualities, and finance options
  • Makes a final selection that is optimal for cost,
    quality and finance options within budget.

30
Example of the Process (continued)
  • Identify the EALRs demonstrated at each step
  • The student plans to buy a ___ on credit for
    _____ purpose.
  • The student
  • Figures out how much s/he can spend (EALR 4.1)
  • Determines money available for down-payment (EALR
    4.1)
  • Compares income and monthly expenses to determine
    cash available for monthly payment (EALR 3.1)
  • Does research on at least 3 types of ______
    (EALR 4.1)
  • Determines costs and finance options (EALR 1.5.4)
  • Reads consumer reports or product reviews (EALR
    4.1)
  • Compares costs, qualities, and finance options
    (EALR 3.1)
  • Makes a final selection that is optimal for cost,
    quality and finance options within budget (EALR
    2.1-2.3)

31
Example of the Process (continued)
  • Modify the steps as needed to ensure
    demonstration of the EALRs
  • The student plans to buy a ___ on credit for
    _____ purpose.
  • The student
  • Figures out how much s/he can spend (EALR 4.1)
  • Determines money available for down-payment (EALR
    4.1)
  • Compares income and monthly expenses to determine
    cash available for monthly payment (EALR 3.1)
  • Does research on at least 3 types of ____
    (EALR 4.1)
  • Determines costs and finance options (EALR 1.5.4)
  • Reads consumer reports or product reviews (EALR
    4.1)
  • Creates a table to show comparison of costs,
    qualities, and finance options (EALR 3.1)
  • Makes a final selection and explains how it is
    optimal for cost, quality and finance options
    within budget (EALR 2.1-2.3)
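
The arithmetic behind the purchasing steps above can be sketched as follows. This is an illustrative example only: it assumes a fixed-rate loan priced with the standard amortization formula, and all prices, rates, and names (`monthly_payment`, `affordable_options`, "Model A") are hypothetical.

```python
def monthly_payment(principal, annual_rate, months):
    """Fixed monthly payment on a loan (standard amortization formula)."""
    if months <= 0:
        raise ValueError("months must be positive")
    monthly_rate = annual_rate / 12
    if monthly_rate == 0:
        return principal / months          # interest-free financing
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -months)

def affordable_options(options, down_payment, monthly_budget):
    """Return (name, payment, total_cost) for options within budget,
    sorted from lowest to highest total cost."""
    results = []
    for opt in options:
        up_front = min(down_payment, opt["price"])
        financed = opt["price"] - up_front
        payment = monthly_payment(financed, opt["annual_rate"], opt["months"])
        if payment <= monthly_budget:
            total_cost = up_front + payment * opt["months"]
            results.append((opt["name"], payment, total_cost))
    return sorted(results, key=lambda item: item[2])

# Hypothetical budget: compare income and expenses to find cash available.
monthly_income, monthly_expenses = 1200, 950
monthly_budget = monthly_income - monthly_expenses
down_payment = 500

# Hypothetical products and finance options (illustrative only).
options = [
    {"name": "Model A", "price": 3500, "annual_rate": 0.06, "months": 12},
    {"name": "Model B", "price": 3200, "annual_rate": 0.09, "months": 24},
    {"name": "Model C", "price": 3000, "annual_rate": 0.00, "months": 12},
]

results = affordable_options(options, down_payment, monthly_budget)
for name, payment, total in results:
    print(f"{name}: ${payment:.2f}/month, ${total:.2f} total")
```

Sorting by total cost reflects only one criterion; a student's final selection would also weigh product quality, which this sketch does not model.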

32
Possible Authentic Performances in Mathematics
  • Survey Research
  • Community issue
  • School issue
  • Return on investment (costs and sales)
  • Purchasing decisions
  • Graphic designs
  • Animation
  • Social science analyses
  • Sources of GDP
  • Major categories of federal budget
  • Casualties during war

33
Possible Authentic Performances in Reading
  • Literary analyses
  • Comparisons across different works by the same
    author
  • Comparisons across works by different authors on
    same theme
  • Analysis of theme, character, plot development
  • Reading journals
  • Research reports
  • Summary of information on a topic from multiple
    sources
  • Investigation of a social or natural science
    research question using multiple sources
  • Position paper based on information from multiple
    sources

34
Providing example blueprint for tests that can
assess the standards
Type of Standard       | Multiple-Choice Items | Short-Answer Items | Essay Items and/or Performance Tasks
Simple Application     | 2-4                   | 1-2                |
Multi-step application |                       | 2-3                |
Solve problem          |                       |                    | 2-3
Communicate            |                       | 1-2                | 1-2
Total                  | 2-4                   | 4-6                | 4-5

35
Example blueprint for tests that can assess
standards
Learning Target                     | Multiple-Choice Items | Short-Answer Items | Essay Items and/or Performance Tasks
Main ideas/important details        | 3-4                   | 1-2                |
Analysis, interpretation, synthesis |                       | 1-2                | 2-3
Critical thinking                   |                       | 1-2                | 2-3
Total                               | 3-4                   | 4-6                | 4-5

36
Solving Score Reliability Issues
  • Train expert teachers to evaluate diverse
    collections of evidence
  • Expert teachers evaluate the collection of work
    to determine whether it meets standards

37
Construct Irrelevant Variance
  • Factors that are unrelated to targeted knowledge
    and skills that affect validity of performance
  • Teachers provide too much help
  • Teachers provide differential types of help
  • Students get help from parents
  • Directions for assignments are not clear
  • Students are taught the content but not how to do
    the type of performance

38
Solving Construct Irrelevant Variance Problems
  • Provide guidelines for what constitutes valid
    evidence
  • Provide model performance assessments or
    benchmark performance descriptions
  • Provide professional development on appropriate
    levels of help
  • Provide professional development on the EALRs and
    GLEs
  • Provide professional development on how to teach
    to authentic work

39
Conclusion
  • Collections of evidence CAN be used to measure
    valued knowledge and skills
  • Collection of Evidence (COE) guidelines for
    Washington State
  • Incorporate many of the characteristics that will
    ensure more valid student scores
  • Will continue to improve as more examples are
    provided
  • Scoring of collections
  • Will involve use of the same rigor in scoring as
    on WASL items
  • Will provide reliable student level scores