Title: Refactoring Support Based on Code Clone Analysis
1 Refactoring Support Based on Code Clone Analysis
- Yoshiki Higo, Toshihiro Kamiya, Shinji Kusumoto, Katsuro Inoue
- Graduate School of Information Science and Technology, Osaka University
- Presto, Japan Science and Technology Agency
- y-higo,kamiya,kusumoto,inoue_at_ist.osaka-u.ac.jp
2 Background
- What is a code clone?
  - A code fragment that has identical or similar fragments in the same or different files in a system
  - Introduced into the source program for various reasons, such as reusing code by copy-and-paste
  - Makes software maintenance more difficult
3 Requirements for Code Clone Detection
- Appropriate code clones should be detected according to the purpose of the analysis.
  - To understand the amount and distribution of code clones, it is desirable to detect all code clones.
  - To remove code clones (restructuring or refactoring), it is useful to detect the code clones that can be removed and whose removal improves software maintainability.
4 Research Objective and Approach
- We aim to extract code clones that can be easily refactored.
- Approach
  - To detect code clones efficiently, we use a code clone detection tool, CCFinder.
  - Then, we extract the specific code clones that can be easily refactored and provide applicable refactoring patterns for them.
  - Finally, we develop a refactoring support tool and apply it to an open source program.
5 Refactoring Process Support
- Commonly used refactoring process
  - Step 1: Determine where refactoring should be applied
  - Step 2: Determine which refactoring patterns can/should be applied
  - Step 3: Investigate the effectiveness of the refactoring patterns
  - Step 4: Modify source code
  - Step 5: Conduct regression tests
- The proposed method supports Steps 1 and 2
  - High scalability: it avoids detection techniques with high time complexity.
  - Fine-grained clone detection: it detects code clones that are finer-grained than the method unit.
6 Outline of CCFinder
- CCFinder directly compares source code token by token and detects code clones
  - Normalization of name space
  - Replacement of user-defined names
  - Removal of table initialization
  - Consideration of module delimiters
- CCFinder can analyze systems on the scale of millions of lines in practical time
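The replacement of user-defined names can be illustrated with a minimal sketch. This is not CCFinder's implementation: the class `TokenNormalizer`, its tiny keyword list, and the simplified tokenizer are assumptions made only for illustration. It shows that two fragments differing only in identifier names yield the same token sequence once every name is replaced by a placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of CCFinder.
final class TokenNormalizer {
    // Small keyword subset, enough for this illustration only.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
        "do", "else", "for", "if", "int", "new", "return", "static", "void", "while"));

    // Identifier or keyword | integer literal | string literal | a few
    // two-character operators | any other single non-space character.
    private static final Pattern TOKEN = Pattern.compile(
        "[A-Za-z_][A-Za-z_0-9]*|\\d+|\"[^\"]*\"|\\+\\+|\\+=|==|!=|\\S");

    /** Splits source text into tokens and replaces user-defined names with "$". */
    static List<String> normalize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String t = m.group();
            char first = t.charAt(0);
            boolean isName = (Character.isLetter(first) || first == '_')
                    && !KEYWORDS.contains(t);
            tokens.add(isName ? "$" : t);
        }
        return tokens;
    }
}
```

For example, `normalize("if (pat.match(a[i])) sum += x;")` and `normalize("if (exp.match(b[j])) total += y;")` return equal lists, so a token-by-token comparison would treat the two statements as a clone candidate.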
7 CCFinder: Clone Detection Process
    static void foo() throws RESyntaxException {
        String a[] = new String[] { "123,400", "abc", "orange 100" };
        org.apache.regexp.RE pat = new org.apache.regexp.RE("[0-9,]+");
        int sum = 0;
        for (int i = 0; i < a.length; ++i)
            if (pat.match(a[i]))
                sum += Sample.parseNumber(pat.getParen(0));
        System.out.println("sum = " + sum);
    }
    static void goo(String[] a) throws RESyntaxException {
        RE exp = new RE("[0-9,]+");
        int sum = 0;
        for (int i = 0; i < a.length; ++i)
            if (exp.match(a[i]))
                sum += parseNumber(exp.getParen(0));
        System.out.println("sum = " + sum);
    }
8 Definitions: Clone Pair and Clone Set
- Clone Pair: a pair of identical or similar fragments
- Clone Set: a set of identical or similar fragments
- CCFinder detects code clones as clone pairs
- After the detection process, clone pairs are transformed into clone sets

Clone Pair    Clone Set
(C1, C4)      {C1, C4, C5}
(C1, C5)      {C2, C3}
(C2, C3)
(C4, C5)
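The transformation from clone pairs to clone sets can be sketched as grouping fragments that are connected through clone pairs. The slides do not say how the tool does this; the union-find approach and the class name `CloneSets` below are assumptions chosen for brevity.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper: merges clone pairs into clone sets with union-find.
final class CloneSets {
    /** Each pair is a two-element array of fragment names, e.g. {"C1", "C4"}. */
    static Collection<TreeSet<String>> fromPairs(List<String[]> pairs) {
        Map<String, String> parent = new HashMap<>();
        for (String[] p : pairs) {
            String a = find(parent, p[0]);
            String b = find(parent, p[1]);
            if (!a.equals(b)) parent.put(a, b); // union the two groups
        }
        // Collect fragments by representative, visiting names in sorted order.
        Map<String, TreeSet<String>> sets = new LinkedHashMap<>();
        for (String f : new TreeSet<>(parent.keySet())) {
            sets.computeIfAbsent(find(parent, f), k -> new TreeSet<>()).add(f);
        }
        return sets.values();
    }

    /** Returns the representative of x, registering x on first sight. */
    private static String find(Map<String, String> parent, String x) {
        parent.putIfAbsent(x, x);
        String root = x;
        while (!parent.get(root).equals(root)) root = parent.get(root);
        parent.put(x, root); // light path compression
        return root;
    }
}
```

Given the four pairs (C1, C4), (C1, C5), (C2, C3), (C4, C5), `fromPairs` returns the two sets {C1, C4, C5} and {C2, C3}.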
9 Extraction of code clones easily refactored
- Structural code clones are regarded as the target of refactoring
  - Detect clone pairs with CCFinder
  - Transform the detected clone pairs into clone sets
  - Extract structural parts as structural code clones from the detected clone sets
- What is a structural code clone?
  - Example: Java language
    - Declaration: class declaration, interface declaration
    - Method: method body, constructor, static initializer
    - Statement: do, for, if, switch, synchronized, try, while
10 fragment 1
    reset();
    grammar = g;
    // Lookup make-switch threshold in the grammar generic options
    if (grammar.hasOption("codeGenMakeSwitchThreshold")) {
        try {
            makeSwitchThreshold = grammar.getIntegerOption("codeGenMakeSwitchThreshold");
            //System.out.println("setting codeGenMakeSwitchThreshold to " + makeSwitchThreshold);
        } catch (NumberFormatException e) {
            tool.error(
                "option 'codeGenMakeSwitchThreshold' must be an integer",
                grammar.getClassName(),
                grammar.getOption("codeGenMakeSwitchThreshold").getLine()
            );
        }
    }

    // Lookup bitset-test threshold in the grammar generic options
    if (grammar.hasOption("codeGenBitsetTestThreshold")) {
        try {
            bitsetTestThreshold = grammar.getIntegerOption("codeGenBitsetTestThreshold");
fragment 2
    }

    // Lookup bitset-test threshold in the grammar generic options
    if (grammar.hasOption("codeGenBitsetTestThreshold")) {
        try {
            bitsetTestThreshold = grammar.getIntegerOption("codeGenBitsetTestThreshold");
            //System.out.println("setting codeGenBitsetTestThreshold to " + bitsetTestThreshold);
        } catch (NumberFormatException e) {
            tool.error(
                "option 'codeGenBitsetTestThreshold' must be an integer",
                grammar.getClassName(),
                grammar.getOption("codeGenBitsetTestThreshold").getLine()
            );
        }
    }

    // Lookup debug code-gen in the grammar generic options
    if (grammar.hasOption("codeGenDebug")) {
        Token t = grammar.getOption("codeGenDebug");
        if (t.getText().equals("true")) {
11 fragment 3
    if ( inputState.guessing==0 ) {
        buf.append(a.getText());
    }
    {
    _loop144:
    do {
        if ((LA(1)==WILDCARD)) {
            match(WILDCARD);
            a=id();
            if ( inputState.guessing==0 ) {
                buf.append('.'); buf.append(a.getText());
            }
        }
fragment 4
    if ( inputState.guessing==0 ) {
        t=a.getText();
    }
    {
    _loop84:
    do {
        if ((LA(1)==COMMA)) {
            match(COMMA);
            b=id();
            if ( inputState.guessing==0 ) {
                t+=","+b.getText();
            }
        }
12 Provision of Applicable Refactoring Patterns
- The following refactoring patterns [1][2] can be used to remove clone sets that include structural code clones
  - Extract Class
  - Extract Method
  - Extract Super Class
  - Form Template Method
  - Move Method
  - Parameterize Method
  - Pull Up Constructor
  - Pull Up Method
- For each clone set, the proposed method determines which refactoring pattern is applicable by using several metrics.
[1] M. Fowler, Refactoring: Improving the Design of Existing Code, Addison-Wesley, 1999.
[2] http://www.refactoring.com/, 2004.
13 Metrics (1): Volume Metrics for Clone Set (LEN, POP, DFL)
- LEN(S) is the average length of the token sequences in a clone set S
- POP(S) is the number of elements (code fragments) of a clone set S
- DFL(S) is an estimate of how many tokens would be removed from the source files when all code fragments in a clone set S are reconstructed
(Figure: the fragments of S are replaced by caller statements that invoke a new subroutine.)
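The three volume metrics can be sketched as follows. The slide gives no formula for DFL, so the one below is an assumption: it reads the reconstruction as one new subroutine of LEN(S) tokens plus one caller statement per fragment, where the size of a caller statement (`callTokens`) is a hypothetical parameter. The class name `VolumeMetrics` is likewise illustrative.

```java
// Hypothetical helper; the DFL formula is an assumed reading of the slide.
final class VolumeMetrics {
    /** LEN(S): average number of tokens per fragment of the clone set. */
    static double len(int[] tokensPerFragment) {
        long total = 0;
        for (int t : tokensPerFragment) total += t;
        return (double) total / tokensPerFragment.length;
    }

    /** POP(S): number of fragments in the clone set. */
    static int pop(int[] tokensPerFragment) {
        return tokensPerFragment.length;
    }

    /**
     * DFL(S) estimate: tokens before the rewrite minus tokens after it.
     * Assumption: afterwards there is one subroutine of LEN(S) tokens
     * plus one caller statement of callTokens tokens per fragment.
     */
    static double dfl(int[] tokensPerFragment, int callTokens) {
        double len = len(tokensPerFragment);
        int pop = pop(tokensPerFragment);
        return len * pop - (len + (double) callTokens * pop);
    }
}
```

Under these assumptions, a clone set of three 40-token fragments with 5-token caller statements gives 120 - (40 + 15) = 65 tokens saved.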
14 Metrics (2): Coupling Metrics for Clone Set (NRV, NSV)
- NRV(S) is the average number of externally defined variables referred to in the fragments of a clone set S
- NSV(S) is the average number of externally defined variables assigned to in the fragments of a clone set S
- Definition
  - Clone set S includes fragments f_1, f_2, ..., f_n
  - s_i is the number of externally defined variables that fragment f_i refers to
  - t_i is the number of externally defined variables that fragment f_i assigns to
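Since both metrics are described as averages over the fragments, the definitions can be written out as below. The formulas are reconstructed from the wording of the bullets and do not appear in this transcript.

```latex
\mathrm{NRV}(S) = \frac{1}{n}\sum_{i=1}^{n} s_i,
\qquad
\mathrm{NSV}(S) = \frac{1}{n}\sum_{i=1}^{n} t_i
```

For instance, a clone set with two fragments that refer to 3 and 5 externally defined variables has NRV(S) = (3 + 5) / 2 = 4.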
15 Metrics (3): Inheritance Metric for Clone Set (DCH)
- DCH(S) represents the position of, and the distance between, the fragments of a clone set S in the class hierarchy
- Definition
  - Clone set S includes fragments f_1, f_2, ..., f_n
  - Fragment f_i exists in class C_i
  - Class C_p is the common parent class of C_1, C_2, ..., C_n located lowest in the class hierarchy
  - If no common parent class of C_1, C_2, ..., C_n exists, the value of DCH(S) is ∞
- This metric is measured only for the class hierarchy in which the target software exists.
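A sketch of how DCH might be computed is given below. The transcript does not show the formula, so this assumes that DCH(S) is the largest distance from any class containing a fragment up to the lowest common parent class, and infinity when there is none. The `parentOf` map (child class name to parent class name, restricted to the target software) and the class name `InheritanceMetric` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; assumes single inheritance and an acyclic parentOf map.
final class InheritanceMetric {
    /** Chain from cls up to its topmost known ancestor; index = distance from cls. */
    private static List<String> ancestors(Map<String, String> parentOf, String cls) {
        List<String> chain = new ArrayList<>();
        for (String c = cls; c != null; c = parentOf.get(c)) chain.add(c);
        return chain;
    }

    /** Assumed DCH: max distance to the lowest common parent, or infinity. */
    static double dch(Map<String, String> parentOf, List<String> classes) {
        // Candidates are visited from the lowest upward, so the first
        // candidate shared by every class is the lowest common parent.
        for (String candidate : ancestors(parentOf, classes.get(0))) {
            int max = 0;
            boolean common = true;
            for (String c : classes) {
                int distance = ancestors(parentOf, c).indexOf(candidate);
                if (distance < 0) { common = false; break; }
                max = Math.max(max, distance);
            }
            if (common) return max;
        }
        return Double.POSITIVE_INFINITY;
    }
}
```

With this reading, fragments that all sit in one class give 0, and fragments in two sibling classes under the same parent give 1.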
16 Aries: Refactoring Support Tool (Overview)
- Target: Java programs
- Runtime environment: JDK 1.4 or above
- Implementation
  - Analysis component: Java, 32,000 lines
    - CCFinder is used as the code clone detection component
    - JavaCC is used to construct the syntax and semantic analysis component
  - GUI component: Java, 14,000 lines
- Users can specify target clone sets through GUI operations.
17 Case Study: Ant (Overview)
- Ant is a build tool like make
- Input for Aries
  - Source files of Ant: 627
  - LOC: about 180,000
- It took 30 seconds to extract structural code clones
- We got 151 clone sets.
- Environment
  - OS: FreeBSD 4.9
  - CPU: Xeon 2.8 GHz x 2
  - Memory: 4 GB
18 Case Study: Ant, Extract Method (conditions)
- To apply the Extract Method pattern, we filtered clone sets using the following conditions
  - The unit of the clone is a statement (do, for, if, ...)
  - DCH(S) = 0
    - All fragments of a clone set are included in a single class
  - NSV(S) < 2
    - Each fragment of a clone set assigns a value to at most one externally defined variable.
- 32 clone sets satisfied these conditions
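The filtering conditions can be expressed as a simple predicate over per-clone-set metric values. This is only a sketch: the classes `CloneSetMetrics` and `ExtractMethodFilter` and their fields are hypothetical and do not reflect how Aries stores or queries its data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record of the metric values measured for one clone set.
final class CloneSetMetrics {
    final String id;          // illustrative identifier of the clone set
    final boolean statement;  // true if the clone unit is a statement
    final double dch;         // DCH(S)
    final double nsv;         // NSV(S)

    CloneSetMetrics(String id, boolean statement, double dch, double nsv) {
        this.id = id;
        this.statement = statement;
        this.dch = dch;
        this.nsv = nsv;
    }
}

final class ExtractMethodFilter {
    /** Conditions from the slide: statement unit, DCH(S) = 0, NSV(S) < 2. */
    static boolean isCandidate(CloneSetMetrics s) {
        return s.statement && s.dch == 0 && s.nsv < 2;
    }

    /** Returns the ids of the clone sets that satisfy all three conditions. */
    static List<String> candidates(List<CloneSetMetrics> all) {
        List<String> ids = new ArrayList<>();
        for (CloneSetMetrics s : all) {
            if (isCandidate(s)) ids.add(s.id);
        }
        return ids;
    }
}
```

A clone set is kept only when all three conditions hold, so a statement-level clone set whose fragments span two classes (DCH(S) > 0) is dropped even if its NSV(S) is 0.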
19 Case Study: Ant, Extract Method (result)
- The 32 clone sets can be categorized as follows

Category                                            Number
No parameter, no return value                       3
Addition of some parameters, no return value        18
Addition of some parameters and a return value      7
Others                                              4

    if (!isChecked()) {
        // make sure we don't have a circular reference here
        Stack stk = new Stack();
        stk.push(this);
        dieOnCircularReference(stk, getProject());
    }

    if (name == null) {
        if (other.name != null) {
            return false;
        }
    } else if (!name.equals(other.name)) {
        return false;
    }
20 Conclusion
- We have
  - proposed a refactoring support method
  - implemented a refactoring support tool, Aries
  - conducted a case study on Ant, an open source program, in which most of the filtered clone sets could be removed.
21 Future Works
- As future work, we are going to
  - evaluate whether each refactoring should be done from the viewpoint of software quality (support for Step 3)
  - find groups of clone sets that can be refactored at once, to conduct refactoring more effectively
- Commonly used refactoring process
  - Step 1: Determine where refactoring should be applied
  - Step 2: Determine which refactoring patterns can/should be applied
  - Step 3: Investigate the effectiveness of the refactoring patterns
  - Step 4: Modify source code
  - Step 5: Conduct regression tests
23 Code Clone Detection for Refactoring: Related Works
- Detect similar sub-graphs as clones on the program dependency graph [1].
  - High accuracy: this approach takes data dependence and control dependence in the source code into account.
  - High time complexity: it takes O(n^2) time to construct the program dependency graph.
- Detect similar methods and functions as clones using metrics [2].
  - Low accuracy: if the target method or function is small, the metric values make no difference.
  - Detection unit restriction: only clones at the method or function unit can be detected.
[1] R. Komondoor and S. Horwitz, Using slicing to identify duplication in source code, In Proc. of the 8th International Symposium on Static Analysis, Paris, France, July 16-18, 2001.
[2] Magdalena Balazinska, Ettore Merlo, Michel Dagenais, Bruno Lague, and Kostas Kontogiannis, Advanced Clone-Analysis to Support Object-Oriented System Refactoring, WCRE 2000, pp. 98-107.