Developers Workflows - PowerPoint PPT Presentation
1
Developers Workflows
2
Outline
  • Workflow definition and reasons for study
  • Strategies for gathering and interpreting
    workflow data
  • Approaches for:
    • Defining activities in the workflow
    • Understanding how accurately they can be
      identified

3
Defining workflow models
  • A workflow model is a set of activities, with
    transitions among them, that describes the
    activities of some person(s) when developing
    code to solve a given problem.
  • The workflow applied by a developer may vary
    from one context (language, architecture,
    problem type) to another.
  • Basic assumption: it is useful and possible to
    break development effort up into discrete
    activities
  • I.e., although we expect the actual process to
    be highly iterative, we can pretend activities
    are discrete and draw dividing lines between
    them

4
Goals for Modeling Workflows
  • Improving productivity / time-to-solution:
    • What is the relative importance of activities?
    • Which activities are the bottlenecks /
      resource sinks?
    • For what activities can vendor support have
      an effect?
  • Prediction:
    • Of project success/failure (even at a
      heuristic level)
    • Of time/effort that would be required for
      different activities
  • Identifying necessary infrastructure / tool
    support for each activity
  • Training:
    • Communicating effective strategies to novices
    • Helping programmers be more productive by
      adopting more effective workflows

5
Types of data we can collect
  • Self-reported data:
    • Effort logs
    • Defect logs
    • Background questionnaires
    • Post-study questionnaires
  • Automatically collected data in studies so far:
    • Compile / job-submission timestamps (UMD)
    • Success/failure of compiles and job
      submissions (UMD)
    • Time for compiling and submitting a job (UMD)
    • Source code at each compile (UMD)
    • Shell commands (Hackystat/UMD)
    • Editor events (Hackystat)
    • Job start/stop times, number of procs
      (accounting logs)
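As a sketch of what such an automatically collected event stream might look like, here is a minimal hypothetical record type; the field names and values are assumptions for illustration, not the actual UMD or Hackystat schema:

```python
from dataclasses import dataclass

# Hypothetical record for one automatically collected event.
# Field names are illustrative, not the actual UMD/Hackystat schema.
@dataclass
class DevEvent:
    timestamp: float      # seconds since the session started
    kind: str             # "compile", "job_submit", "shell", "editor"
    success: bool         # e.g. did the compile succeed?
    detail: str = ""      # e.g. the shell command or compiler output

# A tiny example stream, ordered by time:
events = [
    DevEvent(0.0,  "editor",     True,  "save main.c"),
    DevEvent(30.0, "compile",    False, "syntax error"),
    DevEvent(55.0, "compile",    True),
    DevEvent(90.0, "job_submit", True,  "8 procs"),
]

failed_compiles = sum(1 for e in events if e.kind == "compile" and not e.success)
print(failed_compiles)  # -> 1
```

Even this minimal shape supports the kinds of questions above, such as counting failed compiles or measuring gaps between submissions.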

6
Self-reported data has not been reliable
  • We can ask the programmer to keep track of what
    they are doing (effort log with activity info),
    BUT
  • Many students and professionals do not fill in
    the log
  • Logging is disruptive
  • They often report multiple activities in a given
    time interval (e.g., parallelizing and debugging)
  • Manual logs tend to show little correlation to
    automated logging

7
Strategies for inferring workflow from
automatically collected data
  • Assign activities based on the type of program
    being run. Examples:
    • Debugger → debugging
    • Profiler → tuning
    • Job scheduler → debugging? testing? tuning?
    • Editor → ???
  • However, this is usually not fine-grained enough
  • Analyze changes in source code:
    • Visually (e.g., with CodeVizard)
    • (Semi-)automatically: several heuristics have
      been developed, for example:
      • Many small changes followed by a recompile
        and rerun → debugging
8
Some activities could not be distinguished in
this approach
9
Study: An Observational Study to Calibrate
Workflow
  • Conducted an observational study with a graduate
    student
  • The problem to be solved was Game of Life in C
    and MPI (starting from scratch)
  • The subject had little experience with C and none
    with parallel programming (novice)
  • Before the study, the subject was given a
    1.5-hour lecture on parallel programming and MPI
  • The study was conducted in 3 sessions of 3 hours
    each, in a room with a one-way mirror
  • 3 team members observed the subject through
    direct observation, VNC screenshots, and
    source-code analysis
  • The resulting "golden" workflow was described at
    two levels of detail:
    • High-level view of the general phase (what part
      of the problem is the subject addressing)
    • Low-level view of more specific activities

10
Current model of workflow (based on heuristics)
[Chart: activity timeline; categories: Testing/debugging, Syntax fixes, Parallel coding, Serial coding, Break]
11
High-level Workflow (human observation rating)
[Chart: activity timeline; categories: Testing/debugging, Syntax fixes, Parallel coding, Serial coding, Break]
12
High-level Workflow (human observation rating)
[Chart: activity timeline; categories: Testing/debugging, Syntax fixes, Parallel coding, Serial coding, Break]
Added initial code, followed by a test/debug
session to figure out why the file wasn't being
read correctly.
13
High-level Workflow (human observation rating)
[Chart: activity timeline; categories: Testing/debugging, Syntax fixes, Parallel coding, Serial coding, Break]
Adding additional code (with a break) until an
infinite loop is found that causes a run-time
problem.
14
High-level Workflow (human observation rating)
[Chart: activity timeline; categories: Testing/debugging, Syntax fixes, Parallel coding, Serial coding, Break]
This infinite loop was hard to fix! The subject
spends a significant amount of effort debugging.
Some time at the end was spent cleaning up the
code again (removing printfs, etc.).
15
High-level Workflow (human observation rating)
[Chart: activity timeline; categories: Testing/debugging, Syntax fixes, Parallel coding, Serial coding, Break]
Towards the end of the 2nd session, the subject
begins parallel coding: not yet using parallel
functions, but working on the algorithm to
distribute processing.
16
Results of calibration study
  • Compared to the current baseline:
    • Heuristics match the golden workflow for 45%
      of activities
    • Heuristics find 44 transitions, while the
      high-level golden workflow has 8 (the
      low-level golden workflow has 21 transitions)

17
Work-Rework Heuristics (based on activity types)
Total work: 29.08%; total rework: 70.92%
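Given per-activity time totals and a mapping of activity types to work vs. rework, such a split can be computed as in this sketch; the durations and the work/rework classification below are made-up illustrations, not the study's data:

```python
# Sketch: compute work vs. rework percentages from labeled activity
# durations. The durations and the classification are illustrative only.
minutes = {"serial coding": 120, "parallel coding": 60,
           "testing/debugging": 150, "syntax fixes": 30}
REWORK = {"testing/debugging", "syntax fixes"}  # assumed classification

total = sum(minutes.values())
rework = sum(t for a, t in minutes.items() if a in REWORK)
print(f"work {100*(total-rework)/total:.2f}% rework {100*rework/total:.2f}%")
# -> work 50.00% rework 50.00%
```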
18
Work-Rework Heuristics (based on compile/run
successes and failures)
[Chart legend: Successful compile-run cycle, Successful edit-compile, Failed compile-run cycle, Failed edit-compile]
Observation → Conclusion:
  • A series of failed and successful compile-run
    cycles → run-time defects being fixed
  • A series of failed and successful compile cycles
    with no runs → new code is being added and
    compile-time defects being fixed
  • A series of successful compile and failed run
    cycles → the developer is not able to fix the
    defects
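The observation → conclusion rules above can be sketched as a small classifier over a sequence of (compiled_ok, run_ok) outcomes; the exact rule boundaries here are simplifying assumptions, not the study's definitions:

```python
# Sketch of the compile/run heuristics. Each cycle is a pair
# (compile_succeeded, run_succeeded) where run_succeeded is None if the
# program was never run. Rule boundaries are simplified assumptions.
def diagnose(cycles):
    runs = [r for compiled, r in cycles if compiled and r is not None]
    if not runs:
        return "adding code / fixing compile-time defects"
    if any(runs):
        return "fixing run-time defects"
    return "unable to fix the defects"

print(diagnose([(False, None), (True, None), (False, None)]))
# -> adding code / fixing compile-time defects
print(diagnose([(True, False), (True, True)]))   # -> fixing run-time defects
print(diagnose([(True, False), (True, False)]))  # -> unable to fix the defects
```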
19
Conclusions
  • We need to be realistic about the level of
    possible precision in workflow modeling
  • E.g., time spent thinking/designing solutions is
    too onerous for developers to track and not
    possible for computers to log.
  • Although observational studies are expensive,
  • The human-rated workflows are very valuable as a
    baseline.
  • It lets us reason about whether / why the
    heuristics mis-classify some activities.
  • We can refine our understanding by testing and
    proposing additional heuristics. E.g.,
  • Added a significant number of printfs → likely
    test/debug
  • Created/modified a test data file → likely
    test/debug
  • We need to explore other directions
  • Presenting data in such a way that researchers
    can grasp the ideas and identify high-level
    activities easily (e.g. with CodeVizard)
  • Allow users to capture this knowledge (e.g.
    annotations / journaling)
  • Providing sufficient data for simulation

20
  • BACKUPS

21
Who would use small-scale lone programmer
workflow models?
  • Intended users / stakeholders include
  • Researchers / HPC developers
  • Understand lessons learned about what works /
    what doesn't work for similar development
    activities
  • HPC development managers
  • Understand the process that their developers may
    follow
  • Understand broadly what types of resources /
    support / etc. will be necessary
  • People making decisions about funding new HPC
    acquisition
  • Understand for a given problem type what
    models/languages are useful and what support
    infrastructure will be necessary

22
Methods for Modeling Workflow
  • Measure what fraction of effort over the entire
    development is spent on each activity
  • Process models / flowcharts
  • Timed Markov models
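A timed Markov model of this kind can be estimated from a labeled activity trace roughly as follows; the trace data below is invented for illustration:

```python
# Sketch: estimate transition counts and mean dwell times for a timed
# Markov model from a labeled activity trace. The trace is made up.
from collections import Counter, defaultdict

trace = [("serial coding", 40), ("debugging", 15), ("serial coding", 30),
         ("debugging", 25), ("parallel coding", 50)]  # (activity, minutes)

activities = [a for a, _ in trace]
transitions = Counter(zip(activities, activities[1:]))

dwell = defaultdict(list)
for activity, mins in trace:
    dwell[activity].append(mins)

print(transitions[("serial coding", "debugging")])        # -> 2
print(sum(dwell["debugging"]) / len(dwell["debugging"]))  # -> 20.0
```

Normalizing the transition counts per source state gives the transition probabilities; the dwell-time lists give the "timed" part of the model.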

23
Goal
  • Given a stream of timestamped events, we want to
    estimate
  • When the programmer was working
  • What type of activity the programmer was engaged
    in

[Diagram: event stream on a time axis, with intervals labeled Parallelizing and Debugging]
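One simple way to estimate "when the programmer was working" from such a stream is to split it at long idle gaps; the gap threshold here is an assumption for illustration:

```python
# Sketch: segment a stream of event timestamps (in minutes) into working
# sessions, splitting wherever the idle gap exceeds a threshold.
IDLE_GAP = 15.0  # assumed: a longer gap means the programmer stopped working

def sessions(timestamps, gap=IDLE_GAP):
    out, current = [], [timestamps[0]]
    for t in timestamps[1:]:
        if t - current[-1] > gap:
            out.append(current)
            current = []
        current.append(t)
    out.append(current)
    return out

print(sessions([0, 3, 8, 40, 42, 90]))  # -> [[0, 3, 8], [40, 42], [90]]
```

Labeling each segment with an activity type is then the harder, heuristic-driven step discussed in the main slides.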
24
Goals for Modeling Workflows
  • Research goal: to understand how people solve
    problems with HPC codes under various conditions
  • To do this we need to study programmer
    activities for varied:
    • Problem types
    • Programming languages
    • Parallel models
  • This understanding should be able to support

25
Imposing questions on the developer
Activities used by UMD in Classroom studies
26
Collecting self-reported data on only one activity
  • Defect logs: collect info on debugging activity
    alone
  • Less overhead than full effort log(?)
  • Even partial data is useful
  • Can we come up with logs to capture other
    activities?

27
Workflow Categories
  • We refined the current set of workflow
    categories to incorporate the situations that
    arose during the observation:
  • Serial coding: The subject is primarily focused
    on adding functionality through serial code.
    Includes short "syntax debugging" activities:
    • Small changes are made to fix specific
      problems, but the subject doesn't really leave
      the programming mindset.
  • Testing/debugging: Focused on finding and fixing
    a problem, not adding new functionality. Can be
    identified via some recognizable testing
    strategies:
    • Adding printfs to code
    • Creating / modifying test data files
    • Includes refactoring steps to get rid of test
      code when done
  • Parallel coding: Adding code to take advantage
    of multiple processors. (NOT just adding
    function calls to a parallel library.)
  • These categories are consistent with the latest
    set used in classroom studies and SSCAs.

28
More Detailed Workflow (human observation rating)
[Chart: activity timeline; categories: Testing/debugging, Syntax fixes, Parallel coding, Serial coding, Break]
The detailed view generally tracks the high-level
view, but smaller activities like syntax problems
are shown. Can see that:
  • Generally, adding new functionality is followed
    by fixing syntax issues
  • Some syntax issues require full-fledged
    debugging
29
More Detailed Workflow (human observation rating)
[Chart: activity timeline (continued); same categories and annotation as the previous slide]
30
More Detailed Workflow (human observation rating)
[Chart: activity timeline (continued); same categories and annotation as the previous slide]