Title: Refactoring Erlang Programs
1 Refactoring Erlang Programs
- Huiqing Li
- Simon Thompson
- University of Kent
2 Overview
- What is refactoring?
- Examples
- The process of refactoring
- Tool building and infrastructure
- What is in Wrangler: demo
- Latest advances: data, processes, Erlide.
3 Introducing refactoring
4 Soft-ware
- There's no single correct design: different options suit different situations.
- Maintain flexibility as the system evolves.
5 Refactoring
- Refactoring means changing the design or
structure of a program without changing its
behaviour.
(Diagram contrasting "Refactor" with "Modify".)
6 Examples
7 Generalisation
Generalisation and renaming
Original module:

    -module(test).
    -export([f/1]).

    add_one([H|T]) ->
        [H+1 | add_one(T)];
    add_one([]) -> [].

    f(X) -> add_one(X).

After generalising the literal 1 into a parameter N:

    -module(test).
    -export([f/1]).

    add_one(N, [H|T]) ->
        [H+N | add_one(N,T)];
    add_one(N,[]) -> [].

    f(X) -> add_one(1, X).

After renaming add_one to add_int:

    -module(test).
    -export([f/1]).

    add_int(N, [H|T]) ->
        [H+N | add_int(N,T)];
    add_int(N,[]) -> [].

    f(X) -> add_int(1, X).
8 Generalisation
Before:

    -export([printList/1]).

    printList([H|T]) ->
        io:format("~p\n",[H]),
        printList(T);
    printList([]) -> true.

    printList([1,2,3])

After generalising the call to io:format into a function parameter F:

    -export([printList/2]).

    printList(F,[H|T]) ->
        F(H),
        printList(F, T);
    printList(F,[]) -> true.

    printList(
        fun(H) ->
            io:format("~p\n", [H])
        end,
        [1,2,3]).
9 Generalisation
Before:

    -export([printList/1]).

    printList([H|T]) ->
        io:format("~p\n",[H]),
        printList(T);
    printList([]) -> true.

After generalising, with a wrapper that keeps the original interface:

    -export([printList/1]).

    printList(F,[H|T]) ->
        F(H),
        printList(F, T);
    printList(F,[]) -> true.

    printList(L) ->
        printList(
            fun(H) ->
                io:format("~p\n", [H]) end,
            L).
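Below is a minimal runnable sketch of the generalised module from slides 8 and 9. The module name print_list is an assumption, and this sketch exports printList/2 as well as the wrapper. Because the per-element action is now a parameter, the same function can do something other than printing. The usage in the comment at the end collects the elements by sending each one to the calling process.

```erlang
-module(print_list).
-export([printList/1, printList/2]).

%% Generalised function: the action applied to each element is the
%% parameter F, so printList/2 behaves much like lists:foreach/2.
printList(F, [H|T]) ->
    F(H),
    printList(F, T);
printList(_F, []) ->
    true.

%% Wrapper keeping the original interface: print each element with ~p.
printList(L) ->
    printList(fun(H) -> io:format("~p~n", [H]) end, L).

%% Example use with a non-printing action (run in a shell):
%%   print_list:printList(fun(H) -> self() ! {item, H} end, [1, 2, 3]).
```

Slide 9 exports only printList/1. Exporting printList/2 too is a choice made here so that the generalised function can be called directly.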
10 Asynchronous to synchronous
Before, client and server:

    Pid ! {self(), msg}

    {Parent, msg} ->
        body

After, client and server:

    Pid ! {self(), msg},
    receive {Pid, ok} -> ok end

    {Parent, msg} ->
        Parent ! {self(), ok},
        body
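Here is the "after" version of slide 10 as a small runnable module. The names sync_demo, start/0, call/2 and body/1 are hypothetical, and body/1 stands in for the slide's "body". The server acknowledges each request before running the body. The client blocks until the acknowledgement arrives, and it matches on the server's pid so that it cannot take an ack from some other process.

```erlang
-module(sync_demo).
-export([start/0, call/2]).

%% Hypothetical server: handles requests of the form {Parent, Msg},
%% acknowledges each with {self(), ok}, then runs the body.
start() ->
    spawn(fun loop/0).

loop() ->
    receive
        {Parent, Msg} ->
            Parent ! {self(), ok},   % acknowledgement added by the refactoring
            body(Msg),
            loop()
    end.

%% Stand-in for "body" on the slide; assumed to send no reply of its own.
body(_Msg) ->
    ok.

%% Synchronous client: send, then wait for the ack from this server only.
call(Pid, Msg) ->
    Pid ! {self(), Msg},
    receive
        {Pid, ok} -> ok
    end.
```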
11 Refactoring
12 Refactoring = Transformation + Condition
- Transformation
- Ensure change at all those points needed.
- Ensure change at only those points needed.
- Condition
- Is the refactoring applicable?
- Will it preserve the semantics of the module? Of the program?
13 Transformations
(Diagram: full / stop / one)
14 Condition > Transformation
- Renaming an identifier
- "The existing binding structure should not be
affected. No binding for the new name may
intervene between the binding of the old name and
any of its uses, since the renamed identifier
would be captured by the renaming. Conversely,
the binding to be renamed must not intervene
between bindings and uses of the new name."
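In Erlang the binding condition matters even for local variables. A second occurrence of an already-bound variable is a match, not a new binding, so a capture silently turns a binding into a test. The sketch below uses hypothetical module and function names. It shows a function before and after an unchecked rename of X to Y.

```erlang
-module(rename_demo).
-export([original/1, renamed_unsafely/1]).

%% Before: X is a fresh variable, bound to Y + 1.
original(Y) ->
    X = Y + 1,
    {X, Y}.

%% After a naive rename of X to Y: Y is already bound, so "Y = Y + 1"
%% is now a match that always fails with a badmatch error.
renamed_unsafely(Y) ->
    Y = Y + 1,
    {Y, Y}.
```

Checking the condition before transforming would have rejected this rename, because a binding of the new name Y sits between the old binding of X and its uses.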
15 Which refactoring exactly?
- Generalise f by making 23 a parameter of f:

    f(X) ->
        Con = 23,
        g(X) + Con + 23.

- This one occurrence?
- All occurrences (in the body)?
- Some of the occurrences, to be selected.
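The choice on slide 15 changes the meaning of the new parameter. The sketch below puts both variants side by side. The helper g/1 is a hypothetical stand-in, and the generalised versions are named f_all/2 and f_one/2 only so that they can coexist with f/1 in one module; a real generalisation would keep the name f. The two variants agree when called with 23 and differ for any other argument.

```erlang
-module(gen_demo).
-export([f/1, f_all/2, f_one/2]).

%% Hypothetical helper standing in for g/1 on the slide.
g(X) -> 2 * X.

%% Original: the literal 23 occurs twice.
f(X) ->
    Con = 23,
    g(X) + Con + 23.

%% Generalise all occurrences of 23 into the new parameter N.
f_all(X, N) ->
    Con = N,
    g(X) + Con + N.

%% Generalise only the selected (second) occurrence.
f_one(X, N) ->
    Con = 23,
    g(X) + Con + N.
```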
16 Compensate or crash?
Keep oldFun as a compensating wrapper:

    -export([oldFun/1,
             newFun/1]).

    oldFun(L) ->
        newFun(L).

    newFun(L) ->
        ...
        .

or remove it, so that any remaining call to oldFun crashes?

    -export([newFun/1]).

    newFun(L) ->
        ...
        .
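The "compensate" option can be sketched as a runnable module. The module name is hypothetical, and the body of newFun/1 is an arbitrary placeholder for the one elided on the slide. The -deprecated attribute is one way to flag the wrapper for later removal; the xref tool reads it. With the "crash" option, a call such as compensate_demo:oldFun([1]) made after oldFun/1 has been removed fails at run time with an undef error.

```erlang
-module(compensate_demo).
-export([oldFun/1, newFun/1]).
-deprecated([{oldFun, 1}]).

%% Compensating wrapper: existing callers of oldFun/1 keep working.
oldFun(L) ->
    newFun(L).

%% Hypothetical body standing in for the elided one on the slide.
newFun(L) ->
    lists:reverse(L).
```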
17 Refactoring tools
18 Tool support
- Bureaucratic and diffuse.
- Tedious and error prone.
- Semantics: scopes, types, modules, ...
- Undo/redo
- Enhanced creativity
19 Semantic analysis
- Binding structure: dynamic atom creation, multiple binding occurrences, pattern semantics etc.
- Module structure and projects: no explicit projects for Erlang; cf. Erlide / Emacs.
- Type and effect information: need effect information for e.g. generalisation.
20 Erlang refactoring challenges
- Multiple binding occurrences of variables.
- Indirect function call or function spawn: apply(lists, rev, [[a,b,c]]).
- Multiple arities mean multiple functions: rev/1 and rev/2 are different functions.
- Concurrency.
- Refactoring within a design library: OTP.
- Side-effects.
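The sketch below gathers these challenges in one module. The module indirect_demo and its functions are hypothetical. All three calls reach rev/1, but only the direct one names it syntactically. The apply/3 call carries the name as an atom argument. In dynamic/0 the name exists only as a string until run time, which is the kind of dynamically manufactured atom that static analysis cannot track in general.

```erlang
-module(indirect_demo).
-export([rev/1, rev/2, direct/0, indirect/0, dynamic/0]).

%% rev/1 and rev/2 share a name but are two different functions;
%% renaming one of them must leave the other untouched.
rev(L) -> rev(L, []).

rev([H|T], Acc) -> rev(T, [H|Acc]);
rev([], Acc) -> Acc.

%% Direct call: easy for a tool to find.
direct() -> rev([a, b, c]).

%% Indirect call: the function name is just an atom argument to apply/3.
indirect() -> apply(indirect_demo, rev, [[a, b, c]]).

%% Dynamically manufactured atom: "rev" only exists as a string here.
dynamic() -> apply(indirect_demo, list_to_atom("re" ++ "v"), [[a, b, c]]).
```

Renaming rev/1 to reverse/1 without updating the last two functions would leave them failing at run time with undef.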
21 Static vs dynamic
- Aim to check conditions statically.
- Static analysis tools are possible, but some aspects are intractable, e.g. dynamically manufactured atoms.
- Conservative vs liberal.
- Compensation?
22 Architecture of Wrangler
23 Wrangler in Emacs
24 Wrangler in Emacs
25 Wrangler refactorings
- Rename variable/function/module
- Generalise function definition
- Move a function definition to another (new) module
- Function extraction
- Fold expression against function
- Expression search
- Detect duplicate code
- Tuple function parameters
- From tuple to record
26 Wrangler demo
28 Tool building
29 Wrangler and RefactorErl
Wrangler:
- Lightweight.
- Better integration with interactive tools (e.g. Emacs).
- Undo/redo external?
- Ease of implementing conditions.
RefactorErl:
- Higher entry cost.
- Better for a series of refactorings on a large project.
- Transaction support.
- Ease of implementing transformations.
30 Integration with IDEs
- Back to the future? Programmers prefer Emacs and gvim, though there is some interest in IDEs (Eclipse, NetBeans).
- Issue of integration with multiple IDEs: building common interfaces.
31 Integration with tools
- Test data sets and test generation.
- Makefiles, etc.
- Working with macros: e.g. QuickCheck uses Erlang macros in a particular idiom.
32 APIs: programmer / user
- API in Erlang to support user-programmed refactorings: declarative, straightforward and complete, but relatively low-level.
- Higher-level combining forms?
- OK for transformations, but need a separate condition language.
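Below is a sketch of what a user-programmed transformation can look like at this low level. It uses OTP's syntax_tools library (erl_syntax, erl_syntax_lib, erl_prettypr), on which Wrangler builds, rather than Wrangler's own API. The module and function names are hypothetical. The code performs only the transformation. It checks no conditions and ignores arity, which illustrates why a separate condition language is needed.

```erlang
-module(rename_sketch).
-export([rename_fun/3]).

%% Rename the definition of, and local calls to, function Old as New in a
%% single function given as source text. No condition checking is done
%% and arity is ignored: this is deliberately naive.
rename_fun(Source, Old, New) ->
    {ok, Tokens, _EndLoc} = erl_scan:string(Source),
    {ok, Form} = erl_parse:parse_form(Tokens),
    %% erl_syntax_lib:map/2 visits nodes bottom-up, replacing each node.
    Tree = erl_syntax_lib:map(fun(Node) -> rename(Node, Old, New) end, Form),
    lists:flatten(erl_prettypr:format(Tree)).

rename(Node, Old, New) ->
    case erl_syntax:type(Node) of
        function ->
            case is_atom_named(erl_syntax:function_name(Node), Old) of
                true ->
                    erl_syntax:copy_attrs(
                      Node,
                      erl_syntax:function(erl_syntax:atom(New),
                                          erl_syntax:function_clauses(Node)));
                false ->
                    Node
            end;
        application ->
            case is_atom_named(erl_syntax:application_operator(Node), Old) of
                true ->
                    erl_syntax:copy_attrs(
                      Node,
                      erl_syntax:application(erl_syntax:atom(New),
                                             erl_syntax:application_arguments(Node)));
                false ->
                    Node
            end;
        _ ->
            Node
    end.

%% True if Node is an atom node whose value is Name.
is_atom_named(Node, Name) ->
    erl_syntax:type(Node) =:= atom andalso erl_syntax:atom_value(Node) =:= Name.
```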
33 Verification and validation
- Possible to write formal proofs of correctness, checking both conditions and transformations.
- Different levels of abstraction: possibly name-binding substitution for renaming etc.; a more abstract formulation for e.g. data type changes.
- Use of Quviq QuickCheck to verify refactorings in Wrangler.
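The idea behind property-based validation can be shown at its smallest scale: run the code before and after a refactoring on many random inputs and compare the results. This sketch does not use QuickCheck's API. It substitutes a hand-rolled random test built on the standard rand module, applied to the add_one/add_int generalisation example from slide 7. The module name equiv_check is hypothetical.

```erlang
-module(equiv_check).
-export([check/1]).

%% Original function from the generalisation example.
add_one([H|T]) -> [H+1 | add_one(T)];
add_one([]) -> [].

%% Refactored: generalised over N and renamed to add_int.
add_int(N, [H|T]) -> [H+N | add_int(N, T)];
add_int(_N, []) -> [].

f_before(X) -> add_one(X).
f_after(X) -> add_int(1, X).

%% Run NumTests random tests; return ok or the first counterexample.
check(NumTests) when NumTests =< 0 ->
    ok;
check(NumTests) ->
    L = random_list(),
    case f_before(L) =:= f_after(L) of
        true -> check(NumTests - 1);
        false -> {counterexample, L}
    end.

%% A random list of 0..20 integers, each in the range -100..99.
random_list() ->
    [rand:uniform(200) - 101 || _ <- lists:seq(1, rand:uniform(21) - 1)].
```

QuickCheck adds generators, shrinking of counterexamples and much more. Verifying Wrangler itself also means generating the programs to be refactored, not just their inputs.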
34 Clone detection
35 The Wrangler Clone Detector
- Uses syntactic and static semantic information.
- Finds syntactically well-formed code fragments that are identical after consistent renaming of variables, allowing variations in literals, layout and comments.
- Integrated within the refactoring environment.
36 The Wrangler Clone Detector
- Makes use of both the token stream and the annotated AST.
- Token-based approaches: efficient, but report non-syntactic clones.
- AST-based approaches: report syntactic clones, and checking for consistent renaming is easier.
37 The Wrangler Clone Detector
(Diagram, built up over slides 37 to 40: the clone detection pipeline.)
- Source Files → Tokenisation → Token Stream → Normalisation → Normalised Token Stream → Suffix Tree Construction → Suffix tree
- Suffix tree → Clone Collector → Initial Clones → Clone Filter → Filtered Initial Clones
- Source Files → Parsing + Static Analysis → Annotated ASTs
- Filtered Initial Clones + Annotated ASTs → Clone Decomposition → Syntactic Clones
- Syntactic Clones → Consistent Renaming Checking → Clones to report
- Clones to report → Formatting → Reported Code Clones
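The token-based front end of the pipeline can be sketched in a few lines. The tokens come from erl_scan, and variables and literals are normalised to placeholders. Where Wrangler builds a suffix tree, this sketch substitutes a much simpler technique: it groups every window of exactly N tokens in a map and reports the windows that occur more than once. It therefore finds only fixed-length clones, and it performs none of the later AST and consistent-renaming checks. The module and function names are hypothetical.

```erlang
-module(clone_sketch).
-export([normalise/1, repeated_windows/2]).

%% Tokenise source text and normalise it: variables and literals become
%% placeholders, so fragments that differ only in them look identical.
normalise(Source) ->
    {ok, Tokens, _EndLoc} = erl_scan:string(Source),
    [norm(T) || T <- Tokens].

norm({var, _, _})     -> var;
norm({integer, _, _}) -> lit;
norm({float, _, _})   -> lit;
norm({char, _, _})    -> lit;
norm({string, _, _})  -> lit;
norm({atom, _, A})    -> {atom, A};   % keep atoms such as function names
norm({Cat, _, _})     -> Cat;
norm({Cat, _})        -> Cat.         % punctuation and reserved words

%% Start positions (1-based) of each window of exactly N normalised
%% tokens that occurs at least twice, one sorted list per clone class.
repeated_windows(Tokens, N) when is_integer(N), N > 0 ->
    Groups = lists:foldl(
               fun({Pos, W}, Acc) ->
                       maps:update_with(W, fun(Ps) -> [Pos | Ps] end, [Pos], Acc)
               end, #{}, windows(Tokens, N, 1, [])),
    lists:sort([lists:sort(Ps) || Ps <- maps:values(Groups), length(Ps) >= 2]).

windows(Tokens, N, Pos, Acc) when length(Tokens) >= N ->
    {W, _Rest} = lists:split(N, Tokens),
    windows(tl(Tokens), N, Pos + 1, [{Pos, W} | Acc]);
windows(_Tokens, _N, _Pos, Acc) ->
    lists:reverse(Acc).
```

For example, in "f(X) -> X + 1. g(Y) -> Y + 2." the 8-token windows starting at positions 2 and 11 (the argument list, arrow and body of each function) normalise to the same sequence, so they form one clone class.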
41 Clone detection demo
45 Support for clone removal
- Refactorings to support clone removal.
- Function extraction.
- Generalise a function definition.
- Fold against a function definition.
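Here is how function extraction and folding combine to remove a clone, as a before/after sketch with hypothetical names. The two fold expressions differ only in variable names. The first is extracted into sum/1, and the second is then folded against sum/1. If the clones had differed in a literal, generalising the extracted function over that literal would come between the two steps.

```erlang
-module(extract_demo).
-export([report_before/2, report_after/2]).

%% Before: the same fold expression appears twice, differing only in
%% variable names, so it would be reported as a clone.
report_before(Xs, Ys) ->
    SumX = lists:foldl(fun(X, Acc) -> X + Acc end, 0, Xs),
    SumY = lists:foldl(fun(Y, Acc) -> Y + Acc end, 0, Ys),
    {SumX, SumY}.

%% After function extraction: the first occurrence becomes sum/1.
sum(L) ->
    lists:foldl(fun(X, Acc) -> X + Acc end, 0, L).

%% After folding against sum/1: the second occurrence is a call too.
report_after(Xs, Ys) ->
    SumX = sum(Xs),
    SumY = sum(Ys),
    {SumX, SumY}.
```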
46 Case studies
- Applied the clone detector to Wrangler itself with threshold values of 30 and 2.
- 36 final clone classes were reported: 12 are across modules, and 3 are duplicated function definitions.
- Without syntactic checking and consistent variable renaming checking, 191 would have been reported.
- Applied to a third-party code base (32k loc, 89 modules): 109 clone classes reported.
47 Data-oriented refactorings
48 Tupling parameters
Before:

    -module(tup1).
    -export([gcd/2]).

    gcd(X,Y) ->
        if X>Y ->
               gcd(X-Y,Y);
           Y>X ->
               gcd(Y-X,X);
           true ->
               X
        end.

After tupling the 2 parameters of gcd:

    -module(tup1).
    -export([gcd/1]).

    gcd({X,Y}) ->
        if X>Y -> gcd({X-Y,Y});
           Y>X -> gcd({Y-X,X});
           true -> X
        end.
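Both versions of slide 48 can live in one module, because gcd/2 and gcd/1 are different functions in Erlang, which is the "multiple arities" point from slide 20. The module name tup1_check is hypothetical. The sketch lets you check that tupling preserved behaviour. Like the slide's code, it assumes positive integer arguments: with a zero argument the subtraction never terminates.

```erlang
-module(tup1_check).
-export([gcd/1, gcd/2]).

%% Before: two parameters. Assumes X and Y are positive integers.
gcd(X, Y) ->
    if X > Y -> gcd(X - Y, Y);
       Y > X -> gcd(Y - X, X);
       true  -> X
    end.

%% After tupling: one parameter, a 2-tuple; every recursive call now
%% builds the tuple as well.
gcd({X, Y}) ->
    if X > Y -> gcd({X - Y, Y});
       Y > X -> gcd({Y - X, X});
       true  -> X
    end.
```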
49 Introduce records
Before:

    -module(rec1).

    g({A, B}) ->
        A + B.

    h(X, Y) ->
        g({X, X}),
        g(Y).

After introducing a record with fields f1 and f2 for the parameter of g:

    -module(rec1).
    -record(rec, {f1, f2}).

    g(#rec{f1=A, f2=B}) -> A + B.

    h(X, Y) ->
        g(#rec{f1=X, f2=X}),
        g(#rec{f1=element(1,Y), f2=element(2,Y)}).
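A runnable version of slide 49 with both variants in one module. The names rec1_check, g_before, g_after, h_before and h_after are hypothetical and exist only so that the variants can coexist. The tuple literal {X, X} becomes a record construction directly. Y is a tuple built elsewhere, so it is unpacked field by field with element/2; this is why the refactoring has to chase dependencies when applied across a project.

```erlang
-module(rec1_check).
-export([h_before/2, h_after/2]).

-record(rec, {f1, f2}).

%% Before: g takes a plain 2-tuple.
g_before({A, B}) -> A + B.

h_before(X, Y) ->
    g_before({X, X}),
    g_before(Y).

%% After: g takes a #rec{} record; a tuple arriving from elsewhere (Y)
%% is converted with element/2.
g_after(#rec{f1 = A, f2 = B}) -> A + B.

h_after(X, Y) ->
    g_after(#rec{f1 = X, f2 = X}),
    g_after(#rec{f1 = element(1, Y), f2 = element(2, Y)}).
```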
50 Introduce records in a project
- Need to replace other expressions: tuples with records, record update expressions, record access expressions.
- Chase dependencies across functions and across modules.
51 Refactoring and Concurrency
52 Wrangler and processes
- Refactorings which address processes:
- Register a process.
- Rename a registered process.
- From function to process.
- Add tags to messages sent / received.
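Below is a sketch of code in the shape these refactorings aim for: a process registered under a name, and messages carrying tags. The counter server, its registered name counter and its functions are hypothetical and are not Wrangler output. After "register a process", clients refer to the name instead of passing a pid around. After "add tags", each message says what it is.

```erlang
-module(reg_demo).
-export([start/0, inc/0, stop/0]).

%% Start a counter process and register it under the name counter.
start() ->
    Pid = spawn(fun() -> loop(0) end),
    true = register(counter, Pid),
    Pid.

%% Clients use the registered name; request and reply are both tagged.
inc() ->
    counter ! {inc, self()},
    receive
        {counter, N} -> N
    end.

stop() ->
    counter ! stop,
    ok.

loop(N) ->
    receive
        {inc, From} ->
            From ! {counter, N + 1},
            loop(N + 1);
        stop ->
            ok
    end.
```

Part of what makes these refactorings hard (slide 53) is that the link from the name counter to the spawned process is made only by the register/2 call at run time.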
53 Challenges to implementation
- Data gathering is a challenge because:
- Processes are syntactically implicit.
- Pid to process links are implicit.
- Communication structure is implicit.
- Side effects.
54 Underlying analysis
- Analyses include:
- Annotation of the AST, using the call graph.
- Forward program slicing.
- Backwards program slicing.
55 Wrangler and Erlide
56 Wrangler and Erlide
- Erlide is an Eclipse plugin for Erlang.
- Distribution simplified.
- Integration with the edit undo history.
- Notion of project.
- Refactoring API in the Eclipse LTK.
- Ongoing support for Erlide from Ericsson.
59 Issues with integration
- LTK has a fixed workflow for interactions.
- New file vs set of diffs as representation.
- Fold and generalise interaction pattern.
- Cannot support rename / create file.
- Other refactorings involve search: a different API.
60 Conclusions
61 Future work
- Concurrency: continue work.
- Refactoring within a design library: OTP.
- Working with Erlang Training and Consulting.
- Continue integration with Eclipse and other IDEs.
- Test and property refactoring in …
- Clone detection: fuller integration.
62 Acknowledgements
- Wrangler development funded by EPSRC.
- The developers of syntax-tools, distel and Erlide.
- George Orosz and Melinda Toth.
- Zoltan Horvath and the RefactorErl group at Eotvos Lorand Univ., Budapest.
63 http://projects.cs.kent.ac.uk