Introducing Natural Language Program Analysis - PowerPoint PPT Presentation



1
Introducing Natural Language Program Analysis
  • Lori Pollock, K. Vijay-Shanker, David Shepherd,
  • Emily Hill, Zachary P. Fry, Kishen Maloor

2
NLPA Research Team Leaders
K. Vijay-Shanker, "The Umpire"
Lori Pollock, "Team Captain"
University of Delaware
3
Problem
  • Modern software is large and complex

Software development tools are needed
[Figure: object-oriented class hierarchy]
4
Successes in Software Development Tools
  • Good with local tasks
  • Good with traditional structure

[Figure: object-oriented class hierarchy]
5
Issues in Software Development Tools
  • Scattered tasks are difficult
  • Programmers use more than traditional program
    structure

[Figure: object-oriented class hierarchy]
6
Observations in Software Development Tools
Key Insight: Programmers leave natural language clues that can benefit
software development tools.
[Figure: an object-oriented system annotated with NL clues, such as
public interface Storable..., the comment //Store the fields in a file....,
public void Circle.save(), and the actions undo action, update drawing,
activate tool, and save drawing]
7
Studies on choosing identifiers
Pete, the programmer: "So, I could use x, y, z. But no one will understand
my code."
Carla, the compiler writer: "I don't care about names."
  • Impact of human cognition on names (Liblit et al., PPIG '06): names carry
    metaphor, morphology, scope, and part-of-speech hints that aid
    understanding of code
  • Analysis of function identifiers (Caprile and Tonella, WCRE '99): lexical,
    syntactic, and semantic analysis, used in software tools for metrics,
    traceability, and program understanding

8
Our Research Path
  • Motivated the usefulness of exploiting natural language (NL) clues in
    tools (MACS '05, LATE '05)
  • Developed an extraction process and an NL-based program representation
    (AOSD '06)
  • Created and evaluated a concern location tool and an aspect miner with
    NL-based analysis (ASE '05, AOSD '07, PASTE '07)
9
Name: David C. Shepherd
Nickname: Leadoff Hitter
Current Position: PhD (May 30, 2007)
Future Position: Postdoc with Gail Murphy
Stats:
  • 2002: 0.1 coffees/day, 500 red marks/paper draft
  • 2007: 2.2 coffees/day, 100 red marks/paper draft
10
Aspect Mining
Applying NL Clues for Aspect-Oriented Programming
Molly, the Maintainer: "How can I fix Paul's atrocious code?"
Aspect Mining Task: Locate refactoring candidates
11
Timna: An Aspect Mining Framework (ASE '05)
  • Uses program analysis clues for mining
  • Combines clues using machine learning
  • Evaluated vs. Fan-in
  • Precision (quality) and Recall (completeness)

Fan-In: Precision 37%, Recall 2%
Timna: Precision 62%, Recall 60%
12
Integrating NL Clues into Timna
  • iTimna (Timna with NL)
  • Integrates natural language clues
  • Example: opposite verbs (open and close)

Fan-In: Precision 37%, Recall 2%
Timna: Precision 62%, Recall 60%
iTimna: Precision 81%, Recall 73%
Natural language information increases the effectiveness of Timna.
Come back Thursday at 10:05 am.
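The opposite-verbs clue can be illustrated by pairing method names whose leading verbs are opposites. This is a hypothetical sketch, not iTimna's implementation: the `OPPOSITES` table is a small hand-written assumption, the class name is invented, and the slide does not show how iTimna weighs this clue inside its machine-learning model.

```java
import java.util.*;

/** Hypothetical sketch: find method pairs whose leading verbs are opposites (open/close). */
class OppositeVerbs {
    // Assumed, hand-written table of opposite verbs (verb -> its opposite).
    private static final Map<String, String> OPPOSITES = Map.of(
        "open", "close", "start", "stop", "add", "remove", "lock", "unlock");

    /** Leading word of a camelCase method name, lower-cased; assumed to be the verb. */
    static String leadingVerb(String methodName) {
        return methodName.split("(?<=[a-z0-9])(?=[A-Z])")[0].toLowerCase();
    }

    /** Returns "a/b" for each pair of methods whose leading verbs are opposites. */
    static List<String> oppositePairs(List<String> methodNames) {
        List<String> pairs = new ArrayList<>();
        for (String a : methodNames) {
            String opposite = OPPOSITES.get(leadingVerb(a));
            if (opposite == null) continue; // no known opposite for this verb
            for (String b : methodNames) {
                if (leadingVerb(b).equals(opposite)) pairs.add(a + "/" + b);
            }
        }
        return pairs;
    }
}
```

For the names openFile, closeFile, readFile, addListener, and removeListener, this sketch reports openFile/closeFile and addListener/removeListener; such paired actions are the kind of evidence an aspect miner might treat as a crosscutting-concern candidate.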
13
Concern Location
Applying NL Clues for Concern Location
Motivation
  • 60-90% of software costs are spent on reading and navigating code for
    maintenance (fixing bugs, adding features, etc.)

Erlikh, "Leveraging Legacy System Dollars for E-Business"
14
Key Challenge: Concern Location
  • Find, collect, and understand all source code
    related to a particular concept


Concerns are often crosscutting
15
State of the Art for Concern Location
  • Mining dynamic information (Wilde, ICSM '00)
  • Program structure navigation (Robillard, FSE '05; FEAT; Schaefer,
    ICSM '05)
  • Search-based approaches
  • Regular expressions: grep, Aspect Mining Tool ('00)
  • LSA-based (Marcus '04)
  • Word-frequency based: GES ('06)

Trade-offs: reduced to a similar problem; slow; fast; fragile; sensitive;
no semantics
16
Limitations of Search Techniques
  1. Return large result sets
  2. Return irrelevant results
  3. Return hard-to-interpret result sets

17
The Find-Concept Approach
1. More effective search
2. Improved search terms
3. Understandable results
[Figure: Find-Concept turns a concept into a concrete query, searches an
NL-based code representation built from the source code and natural
language information, offers recommendations, and presents a result graph
of methods (Method a through Method e)]
18
Underlying Program Analysis
  • Action-Oriented Identifier Graph (AOIG) (AOSD '06)
  • Provides access to NL information
  • Provides an interface between NL and traditional program structure
  • Word Recommendation Algorithm
  • NL-based: stemmed/rooted forms (complete, completing) and synonyms
    (finish, complete)
  • Combining NL and traditional: co-location (completeWord())
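The three kinds of word recommendation can be sketched as follows. This is a hypothetical illustration, not the published algorithm: the suffix-stripping stemmer, the hand-written synonym table, and the class name `WordRecommender` are all assumptions. A real tool would use a proper stemmer and a lexical database for synonyms.

```java
import java.util.*;

/** Hypothetical sketch of NL-based word recommendation for a query word. */
class WordRecommender {
    // Assumed toy synonym table; a real tool would consult a thesaurus.
    private static final Map<String, Set<String>> SYNONYMS = Map.of(
        "complete", Set.of("finish"),
        "finish", Set.of("complete"));

    /** Naive stemmer: strips one common English suffix (an assumption, not Porter). */
    static String stem(String word) {
        String w = word.toLowerCase();
        for (String suffix : List.of("ing", "ed", "es", "e", "s")) {
            if (w.length() > suffix.length() + 2 && w.endsWith(suffix)) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    /** Splits a camelCase identifier into lower-case words. */
    static List<String> splitIdentifier(String id) {
        List<String> words = new ArrayList<>();
        for (String part : id.split("(?<=[a-z0-9])(?=[A-Z])")) {
            if (!part.isEmpty()) words.add(part.toLowerCase());
        }
        return words;
    }

    /** Recommends words related to the query by stem, synonymy, or co-location. */
    static SortedSet<String> recommend(String query, Collection<String> identifiers) {
        String q = query.toLowerCase();
        SortedSet<String> result = new TreeSet<>();
        for (String id : identifiers) {
            List<String> words = splitIdentifier(id);
            boolean containsQuery = false;
            for (String w : words) {
                if (stem(w).equals(stem(q))) containsQuery = true;
                // NL-based: same stem (complete, completing) or a listed synonym (finish).
                if (!w.equals(q) && (stem(w).equals(stem(q))
                        || SYNONYMS.getOrDefault(q, Set.of()).contains(w))) {
                    result.add(w);
                }
            }
            // Combining NL and structure: words co-located with the query in one identifier.
            if (containsQuery) {
                for (String w : words) {
                    if (!stem(w).equals(stem(q))) result.add(w);
                }
            }
        }
        return result;
    }
}
```

For the query complete over the identifiers completeWord, completing, finishInput, and saveDrawing, the sketch recommends completing (shared stem), finish (synonym), and word (co-located in completeWord).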

19
Experimental Evaluation
Tools compared: Find-Concept (FC), GES, and ELex
  • Research question 1: Which search tool is most effective at forming and
    executing a query for concern location?
  • Research question 2: Which search tool requires the least human effort to
    form an effective query?
  • Methodology: 18 developers completed nine concern location tasks on
    medium-sized (>20 KLOC) programs
  • Measures: Precision (quality), Recall (completeness), and F-Measure
    (combination of both P and R)
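The three measures can be computed from the set of methods a tool returns and the set judged relevant. The sketch assumes the balanced F-measure (F1, the harmonic mean of P and R); the slide does not say which weighting the study used, and the class and method names are illustrative.

```java
import java.util.*;

/** Precision, recall, and F-measure of a search result against a human-judged gold set. */
class SearchMetrics {
    /** Fraction of returned methods that are relevant (quality). */
    static double precision(Set<String> returned, Set<String> relevant) {
        if (returned.isEmpty()) return 0.0;
        return (double) intersectionSize(returned, relevant) / returned.size();
    }

    /** Fraction of relevant methods that were returned (completeness). */
    static double recall(Set<String> returned, Set<String> relevant) {
        if (relevant.isEmpty()) return 0.0;
        return (double) intersectionSize(returned, relevant) / relevant.size();
    }

    /** Harmonic mean of precision and recall (balanced F-measure, F1). */
    static double fMeasure(double p, double r) {
        return (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
    }

    private static int intersectionSize(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }
}
```

For example, if a tool returns four methods, two of which are among three relevant ones, then P = 1/2, R = 2/3, and F1 = 4/7 ≈ 0.57. The harmonic mean penalizes a tool that buys high recall with a flood of irrelevant results.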

20
Overall Results
Across all tasks
  • Effectiveness: FC > ELex with statistical significance; FC > GES on 7/9
    tasks; FC is more consistent than GES
  • Effort: FC required no more effort than ELex or GES

FC is more consistent and more effective in the experimental study without
requiring more effort.
21
Natural Language Extraction from Source Code
Molly, the Maintainer: "What was Pete thinking when he wrote this code?"
  • Key challenges
  • Decode name usage
  • Develop an automatic extraction process
  • Create an NL-based program representation
22
Natural Language: Which Clues to Use?
  • Software Maintenance
  • Typically focused on actions
  • Objects are well-modularized

[Figure: maintenance requests]
23
Natural Language: Which Clues to Use?
  • Software Maintenance
  • Typically focused on actions
  • Objects are well-modularized
  • Focus on actions
  • Actions correspond to verbs
  • Verbs need a direct object (DO)
  • Extract verb-DO pairs

24
Extracting Verb-DO Pairs
  • Two types of extraction
  • Extraction from comments
  • Extraction from method signatures

    class Player {
        /** Play a specified file with specified time interval */
        public static boolean play(final File file, final float fPosition,
                                   final long length) {
            fCurrent = file;
            try {
                playerImpl = null;
                // make sure to stop non-fading players
                stop(false);
                // Choose the player Class
                cPlayer = file.getTrack().getType().getPlayerImpl();
                ...
25
Extracting Clues from Signatures
  1. POS Tag Method Name
  2. Chunk Method Name
  3. Identify Verb and Direct-Object (DO)

    public UserList getUserListFromFile(String path) throws IOException {
        try {
            File tmpFile = new File(path);
            return parseFile(tmpFile);
        } catch (java.io.IOException e) {
            throw new IOException("UserList format issue" + path + " file " + e);
        }
    }

POS Tag: get<verb> User<adj> List<noun> From<prep> File<noun>
Chunk: get<verb phrase> User List<noun phrase> From File<prep phrase>
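The three steps can be sketched for a single method name. This is a deliberately simplified, hypothetical illustration: instead of a trained part-of-speech tagger it uses a tiny hand-written lexicon (`LEXICON`), tags unknown words as nouns, and assumes the verb comes first in the name (names such as mouseDragged fall outside this sketch).

```java
import java.util.*;

/** Hypothetical sketch: POS-tag, chunk, and pick out the verb and DO of a method name. */
class SignatureClues {
    // Assumed toy lexicon standing in for a real POS tagger.
    private static final Map<String, String> LEXICON = Map.of(
        "get", "verb", "parse", "verb", "add", "verb",
        "user", "adj",
        "from", "prep", "to", "prep", "with", "prep");

    /** Splits a camelCase name into words, e.g. getUserListFromFile. */
    static List<String> split(String name) {
        return Arrays.asList(name.split("(?<=[a-z0-9])(?=[A-Z])"));
    }

    /** Step 1: tag each word; words missing from the lexicon default to noun. */
    static List<String> tag(List<String> words) {
        List<String> tags = new ArrayList<>();
        for (String w : words) tags.add(LEXICON.getOrDefault(w.toLowerCase(), "noun"));
        return tags;
    }

    /**
     * Steps 2 and 3: group the words after a leading verb, up to the first
     * preposition, into a noun phrase and report it as the DO.
     * Returns {verb, DO}, or null when the name does not start with a verb.
     */
    static String[] verbAndDirectObject(String methodName) {
        List<String> words = split(methodName);
        List<String> tags = tag(words);
        if (!tags.get(0).equals("verb")) return null; // not handled by this sketch
        StringBuilder dobj = new StringBuilder();
        for (int i = 1; i < words.size() && !tags.get(i).equals("prep"); i++) {
            if (dobj.length() > 0) dobj.append(' ');
            dobj.append(words.get(i));
        }
        return new String[] { words.get(0), dobj.toString() };
    }
}
```

On getUserListFromFile the sketch produces the tags verb, adj, noun, prep, noun and returns the pair (get, User List), matching the chunking shown on the slide.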
26
Name: Zak Fry
Nickname: The Rookie
Current Position: Upcoming senior
Future Position: Graduate school
Stats:
  • 2006: 1 diet coke/day, 2 lab days/week
  • 2007: 6 diet cokes/day, 8 lab days/week
27
Developing rules for extraction
  • For many methods:
  • Identify the relevant verb (V) and direct object (DO) in the method
    signature
  • Classify the pattern of V and DO locations
  • If it is a new pattern, create a new extraction rule
[Figure: example method signatures with the verb and DO highlighted]
28
Our Current Extraction Rules
  • 4 general rules with subcategories

URL parseURL()
void mouseDragged()
void Host.onSaved()
void message()
29
Example Sub-Categories for Left-Verb General Rule
  • Look beyond the method name
  • Parameters, Return type, Declaring class name,
    Type hierarchy
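One way a left-verb sub-category might look beyond the method name is sketched below. It is a hypothetical illustration, not the published rule set: when the name consists of a verb alone (e.g. save()), it falls back first to the first parameter's type and then to the declaring class name. The `MethodSig` class and this fallback order are assumptions; return types and the type hierarchy are not consulted here.

```java
import java.util.*;

/** Hypothetical sketch of a left-verb rule that looks beyond the method name for the DO. */
class LeftVerbRule {
    /** Minimal stand-in for a method signature (assumed structure). */
    static final class MethodSig {
        final String declaringClass;
        final String name;
        final List<String> paramTypes;

        MethodSig(String declaringClass, String name, List<String> paramTypes) {
            this.declaringClass = declaringClass;
            this.name = name;
            this.paramTypes = paramTypes;
        }
    }

    /** Returns {verb, DO}, assuming the first word of the name is the verb. */
    static String[] extract(MethodSig m) {
        String[] words = m.name.split("(?<=[a-z0-9])(?=[A-Z])");
        String verb = words[0].toLowerCase();
        if (words.length > 1) {
            // DO appears in the name itself, e.g. parseURL -> (parse, URL).
            String dobj = String.join(" ", Arrays.copyOfRange(words, 1, words.length));
            return new String[] { verb, dobj };
        }
        if (!m.paramTypes.isEmpty()) {
            // Name is only a verb: try the first parameter type, e.g. play(File) -> (play, File).
            return new String[] { verb, m.paramTypes.get(0) };
        }
        // Otherwise fall back to the declaring class, e.g. Circle.save() -> (save, Circle).
        return new String[] { verb, m.declaringClass };
    }
}
```

With this ordering, parseURL() yields (parse, URL), Player.play(File, float, long) yields (play, File), and Circle.save() yields (save, Circle).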

30
Representing Verb-DO Pairs
  • Action-Oriented Identifier Graph (AOIG)

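[Figure: AOIG with verb nodes (verb1, verb2, verb3), direct object nodes
(DO1, DO2, DO3), verb-DO pair nodes ((verb1, DO1), (verb1, DO2),
(verb3, DO2), (verb2, DO3)), and "use" edges linking the graph to the
source code files]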
31
Representing Verb-DO Pairs
  • Action-Oriented Identifier Graph (AOIG)

[Figure: example AOIG with verb nodes play, add, remove; direct object
nodes file, playlist, listener; pair nodes (play, file), (play, playlist),
(remove, playlist), (add, listener); and "use" edges linking the graph to
the source code files]
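A minimal sketch of an AOIG-like structure, using the verb-DO pairs from this slide. The class and method names, and the file names in the usage note, are hypothetical. The AOSD '06 representation is a graph; this sketch keeps only the verb, DO, pair, and "use" relationships as sorted maps.

```java
import java.util.*;

/** Hypothetical sketch of an Action-Oriented Identifier Graph (AOIG). */
class Aoig {
    // verb -> DOs it pairs with, and DO -> verbs it pairs with.
    private final Map<String, Set<String>> verbToDos = new TreeMap<>();
    private final Map<String, Set<String>> doToVerbs = new TreeMap<>();
    // "verb,DO" pair -> source files where the pair is used.
    private final Map<String, Set<String>> pairUses = new TreeMap<>();

    /** Records that a verb-DO pair occurs in the given source file. */
    void addUse(String verb, String dobj, String file) {
        verbToDos.computeIfAbsent(verb, k -> new TreeSet<>()).add(dobj);
        doToVerbs.computeIfAbsent(dobj, k -> new TreeSet<>()).add(verb);
        pairUses.computeIfAbsent(verb + "," + dobj, k -> new TreeSet<>()).add(file);
    }

    /** DOs that the verb acts on, e.g. play -> {file, playlist}. */
    Set<String> dosFor(String verb) {
        return verbToDos.getOrDefault(verb, Collections.emptySet());
    }

    /** Verbs that act on the DO, e.g. playlist -> {play, remove}. */
    Set<String> verbsFor(String dobj) {
        return doToVerbs.getOrDefault(dobj, Collections.emptySet());
    }

    /** Files that use any pair containing the given verb. */
    Set<String> filesForVerb(String verb) {
        Set<String> files = new TreeSet<>();
        for (String dobj : dosFor(verb)) files.addAll(pairUses.get(verb + "," + dobj));
        return files;
    }
}
```

After recording (play, file) in Player.java, (play, playlist) and (remove, playlist) in PlaylistView.java, and (add, listener) in Player.java, dosFor("play") gives {file, playlist} and filesForVerb("play") gives {Player.java, PlaylistView.java}. Indexing in both directions lets a tool start a search from either an action or an object.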
32
Evaluation of Extraction Process
  • Compare automatic vs. ideal (human) extraction
  • 300 methods from 6 medium-sized open source programs
  • Annotated by 3 Java developers
  • Promising results: Precision 57%, Recall 64%
  • Context of results
  • Did not analyze trivial methods
  • On average, at least the verb or the direct object was obtained

33
Name: Emily Gibson Hill
Nickname: Batter on Deck
Current Position: 2nd-year PhD student
Future Position: PhD candidate
Stats:
  • 2003: 0.2 cokes/day, 1 meeting/week
  • 2007: 2 cokes/day, 5 meetings/week
34
Program Exploration
Ongoing work
  • Purpose: Expedite software maintenance and program comprehension
  • Key Insight: Automated tools can use program structure and identifier
    names to save the developer time and effort

35
Dora the Program Explorer
[Figure: Dora takes a query and returns the relevant neighborhood]
Dora comes from exploradora, the Spanish word
for a female explorer.
36
State of the Art in Exploration
  • Structural (dependence, inheritance)
  • Slicing
  • Suade (Robillard 2005)
  • Lexical (identifier names, comments)
  • Regular expressions: grep, Eclipse search
  • Information retrieval: FindConcept, Google Eclipse Search (Poshyvanyk
    2006)

37
Motivating the need for structural and lexical information
Example Scenario
  • Program: JBidWatcher, an eBay auction sniping program
  • Bug: User-triggered add auction event has no effect
  • Task: Locate code related to the "add auction" trigger
  • Seed: DoAction() method, from prior knowledge

38
Using only structural information
Looking for the add auction trigger
  • DoAction() has 38 callees; only 2 of 38 are relevant
  • Locates locally relevant items, but many are irrelevant

And what if you wanted to explore more than one edge away?
[Figure: DoAction() with its callees split into relevant and irrelevant
methods]
39
Using only lexical information
Looking for the add auction trigger
  • 50/1812 methods contain matches to the addauction regular expression
    query
  • Only 2/50 are relevant
  • Locates globally relevant items, but many are irrelevant

40
Combining Structural and Lexical Information
Looking for the add auction trigger
  • Structural information guides exploration from the seed
  • Lexical information prunes irrelevant edges
[Figure: the relevant neighborhood of the seed]

41
The Dora Approach
Prune irrelevant structural edges from seed
  • Determine method relevance to query
  • Calculate lexical-based relevance score
  • Low-scored methods pruned from neighborhood
  • Recursively explore
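The exploration loop can be sketched as a depth-bounded traversal of the call graph that follows only edges to methods scoring at or above a threshold. The call-graph map, the `score` function, the depth bound, and the class name are assumptions for illustration; the 0.5 threshold used in the usage note comes from slide 44.

```java
import java.util.*;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch of Dora-style exploration: follow structural edges, prune by lexical score. */
class Explorer {
    /**
     * Explores outward from the seed, keeping callees whose relevance score is
     * at least the threshold and recursing from them up to maxDepth edges away.
     */
    static Set<String> explore(String seed, Map<String, List<String>> callGraph,
                               ToDoubleFunction<String> score, double threshold, int maxDepth) {
        Set<String> neighborhood = new LinkedHashSet<>();
        neighborhood.add(seed);
        visit(seed, callGraph, score, threshold, maxDepth, neighborhood);
        return neighborhood;
    }

    private static void visit(String method, Map<String, List<String>> callGraph,
                              ToDoubleFunction<String> score, double threshold,
                              int depthLeft, Set<String> neighborhood) {
        if (depthLeft == 0) return;
        for (String callee : callGraph.getOrDefault(method, List.of())) {
            // Prune low-scored methods; skip ones already in the neighborhood.
            if (score.applyAsDouble(callee) >= threshold && neighborhood.add(callee)) {
                visit(callee, callGraph, score, threshold, depthLeft - 1, neighborhood);
            }
        }
    }
}
```

With a depth bound of 1 and the scores reported on slide 44, this sketch would keep DoAdd(), DoPasteFromClipboard(), and DoSave() as neighbors of DoAction(), while any callee scoring below 0.5 would be pruned along with everything reachable only through it.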

42
Calculating Relevance Score: Term Frequency
Query: "add auction"
  • Score based on query term frequency of the method
[Figure: two methods compared; one contains 6 query term occurrences, the
other only 2]
43
Calculating Relevance Score: Location Weights
Query: "add auction"
  • Weigh term frequency based on location
  • Method name more important than body
  • Method body statements normalized by length
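A relevance score that combines term frequency with location weights might look like the sketch below. The exact Dora formula is not given on these slides. The weights (`NAME_WEIGHT`, `BODY_WEIGHT`), the squashing into [0, 1) with x / (1 + x), and the class name are assumptions chosen only to illustrate the two ideas: matches in the name count more than matches in the body, and body matches are normalized by the number of statements.

```java
import java.util.*;

/** Hypothetical location-weighted term-frequency score for a method against a query. */
class RelevanceScore {
    static final double NAME_WEIGHT = 1.0; // assumed: a name match counts fully
    static final double BODY_WEIGHT = 0.5; // assumed: body matches count less

    /** Counts occurrences of query terms among the camelCase-split, lower-cased words of a text. */
    static int termFrequency(String text, List<String> queryTerms) {
        int count = 0;
        String spaced = text.replaceAll("(?<=[a-z0-9])(?=[A-Z])", " ").toLowerCase();
        for (String word : spaced.split("[^a-z0-9]+")) {
            if (queryTerms.contains(word)) count++;
        }
        return count;
    }

    /** Scores a method from its name and body statements; the result lies in [0, 1). */
    static double score(String methodName, List<String> bodyStatements, List<String> queryTerms) {
        double nameTf = termFrequency(methodName, queryTerms);
        double bodyTf = 0;
        for (String stmt : bodyStatements) bodyTf += termFrequency(stmt, queryTerms);
        // Normalize body matches by body length so long methods are not favored.
        double bodyPart = bodyStatements.isEmpty() ? 0 : bodyTf / bodyStatements.size();
        double raw = NAME_WEIGHT * nameTf + BODY_WEIGHT * bodyPart;
        return raw / (1 + raw);
    }
}
```

Under these assumed weights, a hypothetical method DoAdd() whose only statement is addAuction(auction); scores 2.5 / 3.5 ≈ 0.71, and a method DoSave() whose only statement is save(); scores 0.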
44
Dora explores the "add auction" trigger
  • From DoAction() seed
  • Correctly identified at 0.5 threshold: DoAdd() (0.93),
    DoPasteFromClipboard() (0.60)
  • With only one false positive: DoSave() (0.52)

45
Summary
  • NL technology used: synonyms, collocations, morphology, word
    frequencies, part-of-speech tagging, AOIG
  • Evaluation indicates that natural language information shows promise
    for improving software development tools
  • Key to success: accurate extraction of NL clues

46
Our Current and Future Work
  • Basic NL-based tools for software
  • Abbreviation expander
  • Program synonyms
  • Determining relative importance of words
  • Integrating information retrieval techniques

47
Posed Questions for Discussion
  • What open problems faced by software tool
    developers can be mitigated by NLPA?
  • Under what circumstances is NLPA not useful?