Automatically Grading Programming Assignments with Web-CAT

1
Automatically Grading Programming Assignments
with Web-CAT
  • Stephen H. Edwards
  • Virginia Tech
  • Dept. of Computer Science
  • edwards_at_cs.vt.edu
  • http://web-cat.sourceforge.net/

2
My goals today are to
  • Explain how requiring students to formulate and
    test hypotheses about their own code can improve
    their understanding and performance
  • Describe our experiences with an alternate
    grading approach supported by a new tool, Web-CAT
  • Describe some of the flexibility in Web-CAT for
    supporting other approaches
  • Convince you software testing can be an
    important, and practical, addition to classroom
    practices

3
Students hold onto ineffective techniques
  • Too often, intro students believe that if their
    code
  • compiles, the errors are mostly gone
  • runs correctly when I try it once, it is correct
  • runs on the instructor-provided sample input, it
    is correct
  • has a problem, it can be fixed by trial and error

4
What is reflection-in-action?
  • For an expert, when the current technique is
    failing:
  • Step back and reflect: "I must be missing
    something"
  • Re-examine the situation, your solution, and your
    implicit assumptions about the problem
  • Leads to guesses (hypotheses) about why the
    solution isn't working or why something else will
    be better
  • Carry out an experiment which serves to
    generate both a new understanding of the
    phenomenon and a change in the situation

5
Practicing software testing will help students
frame and carry out experiments
  • The problem: too much focus on synthesis and
    analysis too early in teaching CS
  • Need to be able to read and comprehend source
    code
  • Envision how a change in the code will result in
    a change in the behavior
  • Need explicit, continually reinforced practice in
    hypothesizing about program behavior and then
    experimentally verifying their hypotheses

6
Student comments suggest their current testing
practices are often weak
  • "I run them through some simple tests to ensure
    that it is operating as expected. But for the
    most part I have always relied on supplied test
    data."
  • "I don't think about test cases until I am
    confident my program is 100% working. Of course,
    it almost never is."
  • "I usually write the whole thing up and then start
    doing rapid-fire tests of everything I can think
    of."

7
A comprehensive strategy is necessary for a
culture shift in what students do
  • Students cannot test their own code
  • Want a culture shift in student behavior
  • A single upper-division course would have little
    impact on practices in other classes
  • So: systematically incorporate testing practices
    across many courses

[Figure: testing practices spanning CS1, CS2, OO Design, and Data Structures courses]

8
Expect students to apply their testing skills all
the time in programming assignments
  • Expect students to test their own work
  • Empower students by engaging them in the
    process of assessing their own programs
  • Require students to demonstrate the correctness
    of their own work through testing
  • Do this consistently across many courses

9
What tools and techniques should I teach?
  • We want to start with skills that are directly
    applicable to authentic student-oriented tasks
  • Don't want to add bureaucratic busywork to
    assignments
  • Without tool support, this is a lost cause!
  • It is imperative to give students skills they
    value
  • But most textbooks only give a conceptual
    intro to idealized industrial practices, not
    techniques students can use in their own
    assignments

10
Test-driven development is very accessible for
students
  • Also called test-first coding
  • Focuses on thorough unit testing at the level of
    individual methods/functions
  • Write a little test, write a little code
  • Tests come first and describe what is expected;
    code follows, and must be revised until all
    tests pass
  • Encourages lots of small (even tiny) iterations
  • See http://web-cat.sf.net/ for on-line references
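
The "write a little test, write a little code" rhythm can be sketched in plain Java. This is only an illustration, not material from the talk: the Counter class, its methods, and the small check helper are hypothetical names, and a real course would more likely write these tests with JUnit.

```java
// Hypothetical example: Counter and its methods are illustrative names only.
class Counter {
    private int count = 0;

    // Written only after the tests below described the expected behavior.
    void increment() {
        count++;
    }

    int getCount() {
        return count;
    }
}

class CounterTest {
    // Minimal stand-in for a JUnit assertion, so the sketch needs no libraries.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    // Step 1: a little test stating what a new counter should report.
    static void testStartsAtZero() {
        check(new Counter().getCount() == 0, "new counter should start at 0");
    }

    // Step 2: another little test, written before increment() existed.
    static void testIncrementAddsOne() {
        Counter c = new Counter();
        c.increment();
        c.increment();
        check(c.getCount() == 2, "two increments should give 2");
    }

    public static void main(String[] args) {
        testStartsAtZero();
        testIncrementAddsOne();
        System.out.println("All tests passed");
    }
}
```

Each test is small enough that a failure points at one method, which is what keeps the iterations tiny.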

11
Students can apply TDD in assignments and get
immediate, useful benefits
  • Conceptually, easy for students to understand and
    relate to
  • Increases confidence in code
  • Increases understanding of requirements
  • Preempts "big bang" integration

12
The problem is devising an effective assessment
strategy
  • Need to assess student performance at testing
  • Need to give productive feedback
  • Need to provide rapid turnaround
  • Cannot afford huge increase in resources required

13
Conventional automated assessment does not
encourage good testing habits
  • Student uploads program
  • Program is compiled
  • Executed against test data
  • Scored based on output

14
The conventional approach provides useful
benefits that do lead to a cultural change
  • Fast, precise feedback to students
  • Chance(s) to improve based on feedback
  • Good assessment of behavior
  • Systematic use resulted in culture change

15
But the conventional approach may discourage
desired behavior and skills
  • Focus is on output correctness, first and
    foremost
  • "Get it working first, work on commenting,
    structure, etc. later"
  • Students not encouraged or rewarded for testing
    on their own
  • Students often do less testing

16
Proper grading and feedback can provide positive
incentive for desirable behavior
  • Decide what behavior to foster
  • Choose a corresponding scoring/reward
    system
  • Design feedback approach
  • Use students' adaptive nature to drive cultural
    change

17
Proper grading and feedback is critical to
reinforcing desired behavior
  • Assess test validity: the correctness of
    students' tests
  • Assess test completeness: the thoroughness of
    students' tests
  • Assess program correctness: the behavior of the
    student's solution
  • Multiply the three scores together, as
    percentages
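
A minimal sketch of the multiplication, assuming each of the three measures has already been reduced to a fraction between 0 and 1. The class and method names are hypothetical, and the sample values are invented for illustration; they are not data from the talk.

```java
// Hypothetical helper: names and sample values are illustrative only.
class SubmissionScore {
    // Each argument is a fraction between 0.0 and 1.0.
    static double combined(double testValidity,
                           double testCompleteness,
                           double programCorrectness) {
        return testValidity * testCompleteness * programCorrectness;
    }

    public static void main(String[] args) {
        // Assumed inputs: all student tests pass (100%), the tests exercise
        // 80% of the code, and 90% of the reference tests pass.
        double score = combined(1.0, 0.8, 0.9);
        System.out.printf("%.0f%%%n", 100 * score); // prints "72%"
    }
}
```

Because the scores are multiplied rather than averaged, a weak result on any one measure pulls the whole grade down: a submission with no tests scores zero however correct the program is.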

18
Students improve their code quality when using
Web-CAT
[Figure: scale running from "Newly written untested code" to "Commercial-quality code"]

19
Students start earlier and finish earlier when
they use Web-CAT


20
An evaluation of submitted code indicates
students program more effectively
Bold = p < .05 significance

                                  Without TDD   With TDD
Recorded grades                   90.2          96.1
TA assessment                     98.1          98.2
Automated grader assessment       76.8          94.0
Faults on master test suite       36.7          24.9
Projected Defects/KSLOC           70            38 (45% less!)
How early was first submission?   2.2 days      4.2 days

21
After using TDD and Web-CAT, students clearly
perceive practical benefits
Statement                                       Rating (Agree/Disagree scale)
More helpful at detecting errors than Curator   4.3
Provides excellent support for TDD              4.1
Increases my confidence in correctness          3.9
Increases my confidence when making changes     3.8
Makes me test my solution more thoroughly       3.8
Makes me more systematic in devising tests      3.8
Would like to use, even if not required         3.8

22
Student reactions are very positive toward TDD
  • "I am very excited about using TDD."
  • "I agree that TDD can be beneficial and I'm glad
    we are being required to experiment with it in
    this course."
  • "If it increases the effectiveness of my
    programming and decreases the time I spend
    debugging, then I am all for it."
  • "Previously, I had to quit my detailed testing
    and stick to making the program appear to work
    with the sample data given every time a deadline
    drew near. With TDD, the tests are such an
    integral part of the project that no
    time-conserving measure will save me."

23
We use Web-CAT to automatically process student
submissions and check their work
  • Web application written in 100% pure Java
  • Deployed as a servlet
  • Built on Apple's WebObjects
  • Uses a large-grained plug-in architecture
    internally, providing for easily extensible data
    model, UI, and processing features

24
Web-CAT's strengths are targeted at broader use
  • Security: mini-plug-ins for different
    authentication schemes, global user permissions,
    and per-course role-based permissions
  • Portability: 100% pure Java servlet for Web-CAT
    engine
  • Extensibility: completely language-neutral,
    process-agnostic approach to grading, via
    site-wide or instructor-specific grading plug-ins
  • Manual grading: HTML web printouts of student
    submissions can be directly marked up by course
    staff to provide feedback

25
Grading plug-ins are the key to process
flexibility and extensibility in Web-CAT
  • Processing for an assignment consists of a tool
    chain or pipeline of one or more grading
    plug-ins
  • The instructor has complete control over which
    plug-ins appear in the pipeline, in what order,
    and with what parameters
  • A simple and flexible, yet powerful way for
    plug-ins to communicate with Web-CAT and with
    each other
  • We have a number of existing plug-ins for Java,
    C++, Scheme, Prolog, Pascal, Standard ML, ...
  • Instructors can write and upload their own
    plug-ins
  • Plug-ins can be written in any language
    executable on the server (we usually use Perl)
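
The pipeline idea can be sketched as an ordered list of steps sharing a set of results. This is not Web-CAT's actual plug-in API: the GradingStep interface, the GradingPipeline class, and the key/value map used to pass data between steps are all assumptions made for illustration, since the slide does not say how plug-ins exchange data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface: NOT Web-CAT's real plug-in API.
interface GradingStep {
    void run(Map<String, String> results);
}

class GradingPipeline {
    private final List<GradingStep> steps = new ArrayList<>();

    // The instructor chooses which steps appear, and in what order.
    GradingPipeline add(GradingStep step) {
        steps.add(step);
        return this;
    }

    // Runs each step in order; later steps can read what earlier ones wrote.
    Map<String, String> grade() {
        Map<String, String> results = new LinkedHashMap<>();
        for (GradingStep step : steps) {
            step.run(results);
        }
        return results;
    }
}

class PipelineDemo {
    public static void main(String[] args) {
        // Two stand-in steps; the keys and values are invented for the demo.
        Map<String, String> results = new GradingPipeline()
            .add(r -> r.put("compiled", "true"))
            .add(r -> r.put("studentTestsRun", r.get("compiled")))
            .grade();
        System.out.println(results);
    }
}
```

Keeping each step behind one small interface is what lets steps be reordered or swapped without touching the others.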

26
The most well-known plug-in is for grading Java
assignments that include student tests
  • ANT-based build of arbitrary Java projects
  • PMD and Checkstyle static analysis
  • ANT-based execution of student-written JUnit
    tests
  • Carefully designed Java security policy
  • Clover test coverage instrumentation
  • ANT-based execution of optional instructor
    reference tests
  • Unified HTML web printout
  • Highly configurable (PMD rules, Checkstyle rules,
    supplemental jar files, supplemental data files,
    Java security policy, point deductions, and lots
    more)

27
Web-CAT supports a variety of languages, and its
Java plug-in is aimed at software testing

28
Web-CAT provides timely, constructive feedback on
how to improve performance
  • Indicates where code can be improved
  • Indicates which parts were not tested well enough
  • Provides as many revise/resubmit cycles as
    possible

29
The most important step in writing testable
assignments is
  • Learning to write tests yourself
  • Writing an instructor's solution with tests that
    thoroughly cover all the expected behavior
  • Practice what you are teaching/preaching

30
Students get frustrated without feedback, so
reference tests must provide some
  • If students only get a score, but no other
    feedback for how to improve, they get easily
    frustrated
  • We augment our reference tests to provide hints
    for failed tests, cross-referenced to the program
    assignment

Requirement in assignment spec:      mul: this command takes two arguments from the
                                     evaluation stack and multiplies them. (11)
Feedback to student on failed test:  Your testing does not fully cover (11)
More detailed alternate feedback:    (11) mul command failed, expected 4 but received 8
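
One way such a hint-bearing reference test might look, sketched in plain Java. The StackMachine class, the deliberately buggy subclass, and the checkMul helper are hypothetical; only the requirement tag (11) and the two feedback messages come from the example above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical solution class for an assignment with a "mul" command.
class StackMachine {
    protected final Deque<Integer> stack = new ArrayDeque<>();

    void push(int value) {
        stack.push(value);
    }

    int top() {
        return stack.peek();
    }

    // Requirement (11): take two arguments from the stack and multiply them.
    void mul() {
        int b = stack.pop();
        int a = stack.pop();
        stack.push(a * b);
    }
}

// A deliberately wrong submission, used only to show the feedback.
class BuggyMachine extends StackMachine {
    @Override
    void mul() {
        int b = stack.pop();
        int a = stack.pop();
        stack.push(a * b * 2); // bug: doubles the product
    }
}

class ReferenceTest {
    // Returns null on success, or a hint keyed to the spec on failure.
    static String checkMul(StackMachine machine, boolean detailed) {
        machine.push(2);
        machine.push(2);
        machine.mul();
        int actual = machine.top();
        if (actual == 4) {
            return null;
        }
        return detailed
            ? "(11) mul command failed, expected 4 but received " + actual
            : "Your testing does not fully cover (11)";
    }
}
```

The short hint points the student back to the requirement without saying which input failed; the detailed form is the alternative that gives more away.
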
31
Students will try to get Web-CAT to do their work
for them
  • Students appreciate the feedback, but will avoid
    thinking at (nearly) all costs
  • Too much feedback encourages students to use
    Web-CAT for testing instead of writing their own
    tests; they use it as a development tool instead
    of simply to check their work
  • This limits the learning benefits, which come in
    large part from students writing their own tests
  • Lesson: balance providing suggestive feedback
    without giving away the answers; lead the
    student to think about the problem

32
We have also tried to influence student work
habits to improve their success
  • Encourage early submission by providing extra
    incentives or using late penalties
  • Score bonuses and/or penalties are easy
  • Another useful approach:
  • Generous limit on the total number of submissions
    (60)
  • Hints disappear one day before the due date
  • Project closes for one day to encourage students
    to step away and reflect on the last bug
  • Project opens again for one day with hints
    re-enabled, but with a cap on how much the score
    can improve

33
Lessons for writing program assignments intended
for automatic grading
  • Requires greater clarity and specificity
  • Requires you to explicitly decide what you wish
    to test, and what you wish to leave open to
    student interpretation
  • Requires you to unambiguously specify the
    behaviors you intend to test
  • Requires preparing a reference solution before
    the project is due, more upfront work for
    professors or TAs
  • Grading is much easier, as many things are taken
    care of by Web-CAT; course staff can focus on
    assessing design

34
Areas to look out for in writing testable
assignments
  • How do you write tests for the following?
  • Main programs
  • Code that reads from or writes to stdin/stdout or
    files
  • Code with graphical output
  • Code with a graphical user interface

35
Testing main programs
  • The key: think in object-oriented terms
  • There should be a principal class that does all
    the work, and a really short main program
  • The problem is then simply how to test the
    principal class (i.e., test all of its methods)
  • Make sure you specify your assignments so that
    such principal classes provide enough accessors
    to inspect or extract what you need to test
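
A small sketch of that split, using invented names (GradeBook and GradeBookMain are hypothetical, not from any actual assignment): the principal class does the work and exposes accessors, and the main program only wires it up.

```java
// Hypothetical principal class: does all the work and exposes accessors
// so that tests can inspect its state.
class GradeBook {
    private int count = 0;
    private int total = 0;

    void addScore(int score) {
        count++;
        total += score;
    }

    int getCount() {
        return count;
    }

    double getAverage() {
        return count == 0 ? 0.0 : (double) total / count;
    }
}

// The main program stays really short: it only connects the
// command-line arguments to the principal class.
class GradeBookMain {
    public static void main(String[] args) {
        GradeBook book = new GradeBook();
        for (String arg : args) {
            book.addScore(Integer.parseInt(arg));
        }
        System.out.println("Average: " + book.getAverage());
    }
}
```

With this shape, tests never need to launch the program; they construct a GradeBook, call its methods, and check the accessors.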

36
Testing input and output behavior
  • The key: specify assignments so that input and
    output use streams given as parameters, and are
    not hard-coded to specific sources or destinations
  • Then use string-based streams to write test
    cases; show students how
  • In Java, we use BufferedReaders and PrintWriters
    for all I/O
  • In C++, we use istreams and ostreams for all I/O
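
A minimal Java sketch of this technique, using the standard StringReader and StringWriter classes as the string-based streams. The Echo program and the runOn helper are hypothetical examples, not code from the course.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Hypothetical assignment: copy each input line to the output in upper case.
class Echo {
    // Streams arrive as parameters, so nothing is tied to System.in/System.out.
    static void run(BufferedReader in, PrintWriter out) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            out.println(line.toUpperCase());
        }
        out.flush();
    }
}

class EchoTest {
    // Feeds a string in as "stdin" and returns everything written to "stdout".
    static String runOn(String input) {
        StringWriter captured = new StringWriter();
        try {
            Echo.run(new BufferedReader(new StringReader(input)),
                     new PrintWriter(captured));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return captured.toString();
    }

    public static void main(String[] args) {
        String expected = "HELLO" + System.lineSeparator();
        if (!runOn("hello\n").equals(expected)) {
            throw new AssertionError("unexpected output");
        }
        System.out.println("All tests passed");
    }
}
```

The real main program would pass in readers and writers wrapped around System.in and System.out, while the tests substitute strings.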

37
Testing programs with graphical output
  • The key: if graphics are only for output, you can
    ignore them in testing
  • Ensure there are enough methods to extract the
    key data in test cases
  • We use this approach for testing Karel the Robot
    programs, which use graphic animation so students
    can observe behavior

38
Testing programs with graphical UIs
  • This is a harder problem, maybe too distracting
    for many students, depending on their level
  • The key question: what is the goal in writing the
    tests? Is it the GUI you want to test, some
    internal behavior, or both?
  • Three basic approaches:
  • Specify a well-defined boundary between the GUI
    and the core, and only test the core code
  • Switch in an alternative implementation of the UI
    classes during testing
  • Test by simulating GUI events
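
The second approach, switching in an alternative UI implementation during testing, might look like the following sketch. All names here (MessageView, GuessingGame, RecordingView) are hypothetical and chosen only to illustrate the idea.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical boundary between the core logic and the user interface.
interface MessageView {
    void show(String message);
}

// Core logic: knows nothing about any concrete GUI toolkit.
class GuessingGame {
    private final int secret;
    private final MessageView view;

    GuessingGame(int secret, MessageView view) {
        this.secret = secret;
        this.view = view;
    }

    void guess(int value) {
        if (value == secret) {
            view.show("Correct!");
        } else {
            view.show(value < secret ? "Too low" : "Too high");
        }
    }
}

// Test-time replacement for the real UI: records messages instead of
// displaying them, so a test can inspect what the user would have seen.
class RecordingView implements MessageView {
    final List<String> messages = new ArrayList<>();

    @Override
    public void show(String message) {
        messages.add(message);
    }
}
```

A test builds the game with a RecordingView, makes a few guesses, and checks the recorded messages; the production program passes in a view backed by real GUI components instead.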

39
Conclusion: including software testing helps
promote learning and performance
  • If you require students to write their own tests
  • Our experience indicates students are more likely
    to complete assignments on time, produce one
    third fewer bugs, and achieve higher grades on
    assignments
  • It is definitely more work for the instructor
  • But it definitely improves the quality of
    programming assignment writeups and student
    submissions

40
Visit our SourceForge project!
  • http://web-cat.sourceforge.net/
  • Info about using our automated grader, getting
    trial accounts, etc.
  • Movies of making submissions, setting up
    assignments, and more
  • Custom Eclipse plug-ins for C-style TDD
  • Links to our own Eclipse feature site