Title: YACC no more
1. YACC no more
Integrating parsers, interpreters and compilers
into your application
2. This is he
- Sriram Srinivasan
- One of the core engineers of the WebLogic app server
- Wrote the first commercially available EJB implementation
- Wrote the TP engine in WLS
- Author of Advanced Perl Programming (O'Reilly)
3. Why this talk?
- Quest for higher level programming patterns
- More productive / faster / maintainable etc
- Integrating compilers, parsers, interpreters into
your application
4. Embeddable Parsers
Case Study: Configuration Data
- JDK parsers for configuration data
- java.util.Properties, XML, regex library
- java.util.Properties
- Limited to property = value format
- Takes care of comments, multi-line values, quotes
    # app server properties
    connectionPoolName = testPool
    numThreads = 10

    Properties p = new Properties();
    p.load(inputStream);
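
A minimal, self-contained sketch of the java.util.Properties approach described above. The class name, the helper methods and the inline configuration text are assumptions made for illustration; only Properties.load() and getProperty() come from the JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropertiesDemo {
    // Hypothetical configuration text, modeled on the slide's example.
    static final String CONFIG =
        "# app server properties\n"
        + "connectionPoolName = testPool\n"
        + "numThreads = 10\n";

    /** Parses property text. load() returns void, so the Properties object is kept separately. */
    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    /** Values are always strings, so numeric settings must be converted by hand. */
    static int intProperty(Properties p, String key, int fallback) {
        String raw = p.getProperty(key);
        return raw == null ? fallback : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Properties p = load(CONFIG);
        System.out.println(p.getProperty("connectionPoolName"));
        System.out.println(intProperty(p, "numThreads", 1));
    }
}
```

The manual string-to-int conversion hints at the format's limit: every value is plain text, with no expressions or types.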
5. XML parsers
- Good for structured, hierarchical data
- DOM (Document Object Model) parser
- Converts an entire XML document into a corresponding tree of Nodes.
- SAX (Simple API for XML)
- Callback class extends DefaultHandler
- Supplies methods for startDocument(), startElement(), endElement() etc.
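
A small sketch of the SAX callback style, using the JAXP classes that ship with the JDK. The element and attribute names (config, property, name, value) are hypothetical; the slides do not define an XML configuration schema.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxDemo {
    /** Collects "name=value" strings from <property name="..." value="..."/> elements. */
    static class PropertyHandler extends DefaultHandler {
        final List<String> seen = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            // Called once per opening tag, in document order; no tree is built.
            if ("property".equals(qName)) {
                seen.add(attrs.getValue("name") + "=" + attrs.getValue("value"));
            }
        }
    }

    static List<String> parse(String xml) {
        try {
            PropertyHandler handler = new PropertyHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.seen;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<config>"
            + "<property name=\"numThreads\" value=\"10\"/>"
            + "<property name=\"connectionPoolName\" value=\"testPool\"/>"
            + "</config>";
        System.out.println(parse(xml));
    }
}
```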
6. Adding code to data
- Problem: We want to add macros and expressions to our properties.
    numThreads = numProcessors
    # Ensure that connection pool is smaller than thread pool.
    connectionPoolSize = min(numThreads - 2, 1)
- This requires an expression evaluator
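
To give a sense of what "an expression evaluator" means when written by hand, here is a rough sketch of a tiny recursive descent evaluator. The class name TinyEval, its grammar (integers, names, + - *, parentheses, two-argument min and max) and its error handling are all assumptions for illustration; the talk itself goes on to recommend embedding an existing interpreter instead.

```java
import java.util.HashMap;
import java.util.Map;

class TinyEval {
    private final String src;
    private final Map<String, Integer> vars;
    private int pos = 0;

    private TinyEval(String src, Map<String, Integer> vars) {
        this.src = src.replaceAll("\\s+", "");   // simplification: drop all whitespace up front
        this.vars = vars;
    }

    static int eval(String expr, Map<String, Integer> vars) {
        TinyEval e = new TinyEval(expr, vars);
        int v = e.expr();
        if (e.pos != e.src.length()) {
            throw new IllegalArgumentException("unexpected input at position " + e.pos);
        }
        return v;
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("expected '" + c + "' at position " + pos);
        pos++;
    }

    // expr := term (('+' | '-') term)*
    private int expr() {
        int v = term();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            int r = term();
            v = (op == '+') ? v + r : v - r;
        }
        return v;
    }

    // term := atom ('*' atom)*
    private int term() {
        int v = atom();
        while (peek() == '*') { pos++; v *= atom(); }
        return v;
    }

    // atom := NUMBER | NAME | NAME '(' expr ',' expr ')' | '(' expr ')'
    private int atom() {
        if (peek() == '(') { pos++; int inner = expr(); expect(')'); return inner; }
        int start = pos;
        if (Character.isDigit(peek())) {
            while (Character.isDigit(peek())) pos++;
            return Integer.parseInt(src.substring(start, pos));
        }
        while (Character.isLetter(peek())) pos++;
        String name = src.substring(start, pos);
        if (peek() == '(') {                      // two-argument function call
            pos++; int a = expr(); expect(','); int b = expr(); expect(')');
            if (name.equals("min")) return Math.min(a, b);
            if (name.equals("max")) return Math.max(a, b);
            throw new IllegalArgumentException("unknown function: " + name);
        }
        Integer value = vars.get(name);
        if (value == null) throw new IllegalArgumentException("unknown name: '" + name + "'");
        return value;
    }

    public static void main(String[] args) {
        Map<String, Integer> vars = new HashMap<>();
        vars.put("numThreads", 10);
        System.out.println(eval("max(numThreads - 2, 1)", vars));
    }
}
```

Even this toy version runs to dozens of lines and still lacks division, strings and useful diagnostics, which is the argument for reusing an embeddable interpreter.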
7. Embeddable interpreters
- Plethora of free, high quality interpreters available
- BeanShell (Java-like syntax)
- Rhino (JavaScript)
- Jython (Python in Java)
- Kawa (Scheme in Java)
- When embedded, flow of control easily goes from java to the interpreter and back.
- Command-line shell always included
8. BeanShell
- Expressions identical to java
- Types are inferred dynamically
    add( a, b ) { return a + b; }
    sum = add(1, 2);            // 3
    str = add("Web", "Logic");  // "WebLogic"
9. Embedding BeanShell
    import bsh.Interpreter;

    Interpreter i = new Interpreter();
    i.set("foo", 5);
    i.eval("bar = foo*10");
    System.out.println("bar = " + i.get("bar"));
- Instead of writing code to parse the properties file, just eval it!
- Comments should be //, not #
- Each property definition line should end in ;
    i.eval(new FileReader("config.properties"));
    Integer n = (Integer) i.get("connectionPoolSize");
10. BeanShell features
- Strict java expression syntax
- no class declarations
- Loose convenience syntax
    b = new java.awt.Button();
    b.label = "Yo";   // eqvt. to b.setLabel("Yo")
    h = new Hashtable();
    h{"spud"} = "potato";
    // Swing stuff
    b = new JButton("My Button");
    f = new JFrame("My Frame");
    f.getContentPane().add(b, "Center");
    f.pack();
    f.show();
11. Rhino
- Free ECMAScript interpreter from Mozilla
- Slightly more cumbersome to embed than BeanShell
- Contains bytecode compiler that can be called from within java
- Closures
- Regex support built in. Good for text manipulation
12. Case study: Command pattern
    function insertCommand(text) {
        this.pos = buf.pos;
        buf.insert(text);
        this.len = text.length;
        this.undo = function () {
            buf.moveTo(this.pos);
            buf.erase(this.len);
        };
        undoStack.push(this);
    }
    new insertCommand("foo");
    undoStack.pop().undo();
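
For comparison, the same idea can be sketched in plain Java, with a lambda playing the role of the JavaScript closure. The names UndoableBuffer and Command are made up for this sketch, and for brevity the buffer only supports appending at the end.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class UndoableBuffer {
    /** A command only needs to know how to undo itself. */
    interface Command { void undo(); }

    private final StringBuilder buf = new StringBuilder();
    private final Deque<Command> undoStack = new ArrayDeque<>();

    /** Appends text and records a closure that removes exactly that text again. */
    void insert(String text) {
        final int pos = buf.length();     // captured by the lambda below
        final int len = text.length();
        buf.append(text);
        undoStack.push(() -> buf.delete(pos, pos + len));
    }

    /** Undoes the most recent insert; returns false when there is nothing left to undo. */
    boolean undo() {
        if (undoStack.isEmpty()) return false;
        undoStack.pop().undo();
        return true;
    }

    String text() { return buf.toString(); }

    public static void main(String[] args) {
        UndoableBuffer b = new UndoableBuffer();
        b.insert("foo");
        b.insert("bar");
        b.undo();
        System.out.println(b.text());
    }
}
```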
13. Python
- Python (Java implementation is "Jython")
- powerful high-level language
- Compiles to bytecode.
- True scripting language
- Can extend java classes
- Static compilation and standalone execution
14. More case studies
- Embedded expressions
- Spreadsheet formulae
- Customizable GUIs
- Macro facility, keyboard mapping
- Remote agents
- Monitoring
- Performance through partial evaluation
15. Case Study: Remote Agents
- Example: Test Agents
- Can upload script to each agent to launch processes, control them locally.
- Jython is well-suited for this kind of task
- Example: Scriptable IMAP mail server
- "All messages that contain this regex, make a copy in this folder"
16. Case Study: Monitoring
- SNMP model: Obtain attributes from each node over the network, do calculation
- Alternatively, upload script to each node, and let it return the result
- Conserves network bandwidth
- Can insert any kind of probe
- Study application data structures
- Application-specific profiling
17. Case Study: Performance
- Partial evaluation can yield substantial performance benefits
- Object-RDBMS adaptors
- Code generator studies class and db schema
- Omits unnecessary conversions, null checks
- Vector dot product
    dp = a[0]*b[0] + a[1]*b[1] + a[2]*b[2]
    // But if 'a' is fixed: [16, 0, 4]
    dp = (b[0] << 4) + (b[2] << 2)
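
The dot product specialization can be sketched in Java as a function that inspects the fixed vector once and returns a function of b alone. DotSpecializer is a hypothetical name, and composing closures like this only illustrates the decision-making: a real partial evaluator would emit source or bytecode, since the closure calls here would likely cost more than the multiplications they replace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

class DotSpecializer {
    /** General version: every term is multiplied at run time. */
    static int dot(int[] a, int[] b) {
        int dp = 0;
        for (int i = 0; i < a.length; i++) dp += a[i] * b[i];
        return dp;
    }

    /** Decides once, per coefficient of the fixed vector a, how each term is computed. */
    static ToIntFunction<int[]> specialize(int[] a) {
        List<ToIntFunction<int[]>> terms = new ArrayList<>();
        for (int i = 0; i < a.length; i++) {
            final int idx = i;
            final int coeff = a[i];
            if (coeff == 0) continue;                            // term vanishes entirely
            if (coeff > 0 && Integer.bitCount(coeff) == 1) {     // power of two: use a shift
                final int shift = Integer.numberOfTrailingZeros(coeff);
                terms.add(b -> b[idx] << shift);
            } else {
                terms.add(b -> coeff * b[idx]);                  // fall back to a multiply
            }
        }
        return b -> {
            int dp = 0;
            for (ToIntFunction<int[]> term : terms) dp += term.applyAsInt(b);
            return dp;
        };
    }

    public static void main(String[] args) {
        int[] a = {16, 0, 4};
        int[] b = {3, 7, 5};
        System.out.println(dot(a, b) + " " + specialize(a).applyAsInt(b));
    }
}
```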
18. Generating java
- Moving from embedded interpreters to generating java source
- Example: JSP.
- Convert template to java, compile and dynamically load
- BEA/WebLogic's weblogic.dtdc
- Converts XML DTD to a high performance SAX parser tuned to that DTD
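
As a rough illustration of the "convert template to java" step only, the sketch below turns a template with JSP-style <%= ... %> expressions into the source text of a class. TemplateToJava, the generated render(String name) signature and the single supported tag are assumptions for this sketch and do not reflect how any real JSP engine works; compiling and loading the result is left out.

```java
class TemplateToJava {
    /** Translates a template into java source for a class with a static render(String name) method. */
    static String translate(String className, String template) {
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(className).append(" {\n");
        src.append("  public static String render(String name) {\n");
        src.append("    StringBuilder out = new StringBuilder();\n");
        int pos = 0;
        while (pos < template.length()) {
            int open = template.indexOf("<%=", pos);
            if (open < 0) {                          // no more expressions: rest is literal text
                emitText(src, template.substring(pos));
                break;
            }
            emitText(src, template.substring(pos, open));
            int close = template.indexOf("%>", open);
            if (close < 0) throw new IllegalArgumentException("unterminated <%= at " + open);
            String expr = template.substring(open + 3, close).trim();
            src.append("    out.append(").append(expr).append(");\n");   // expression copied verbatim
            pos = close + 2;
        }
        src.append("    return out.toString();\n");
        src.append("  }\n}\n");
        return src.toString();
    }

    /** Emits literal template text as an escaped java string constant. */
    private static void emitText(StringBuilder src, String text) {
        if (text.isEmpty()) return;
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
        src.append("    out.append(\"").append(escaped).append("\");\n");
    }

    public static void main(String[] args) {
        System.out.print(translate("Greeting", "Hi <%= name %>!"));
    }
}
```

The template is parsed once at generation time, so the generated class contains nothing but straight-line append calls.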
19. Generating code with Doclets
- javadoc is a general purpose parser
- javadoc -doclet ListClass foo.java
- ListClass.start() called with a hierarchy of Doc nodes
    import com.sun.javadoc.*;

    public class ListClass {
        public static boolean start(RootDoc root) {
            ClassDoc[] classes = root.classes();
            for (int i = 0; i < classes.length; i++) {
                System.out.println(classes[i]);
            }
            return true;
        }
    }
- Arbitrary tags can be introduced at any level
20. Case study: iContract
- Pattern: doclet expressions converted to annotated java code
    /**
     * Ensure that argument is always > 0
     * @pre f > 0.0
     * Ensure that the function produces the sqrt within a
     * @post Math.abs((return * return) - f) < 0.001
     */
    public float sqrt(float f) { ... }
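
To make the pattern concrete, this is roughly what such a method could look like once the @pre and @post expressions have been turned into explicit checks. It is written by hand for illustration and is not actual iContract output; the class name, the exception types and the messages are assumptions.

```java
class CheckedSqrt {
    static float sqrt(float f) {
        // From "@pre f > 0.0": reject bad arguments before doing any work.
        if (!(f > 0.0)) {
            throw new IllegalArgumentException("precondition violated: f > 0.0");
        }

        float result = (float) Math.sqrt(f);   // the original method body

        // From "@post Math.abs((return * return) - f) < 0.001": verify the result.
        if (!(Math.abs((result * result) - f) < 0.001)) {
            throw new IllegalStateException("postcondition violated for f = " + f);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sqrt(4.0f));
    }
}
```

With single-precision floats the fixed 0.001 tolerance may be too tight for large arguments, so a check like this can expose weaknesses in the contract itself as well as in the code.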
21. Case Study: EJBGen
    /**
     * @ejbgen:entity
     *   ejb-name = AccountEJB-OneToMany
     *   data-source-name = demoPool
     *   table-name = Accounts
     */
    abstract public class AccountBean implements EntityBean {
        /**
         * @ejbgen:cmp-field column = acct_id
         * @ejbgen:primkey-field
         * @ejbgen:remote-method transaction-attribute = Required
         */
        abstract public String getAccountId();
    }
22. Generating bytecode
- Example: WebLogic RMI adaptors
- Sometimes, some facilities are available only in bytecode (goto's!)
- Example: fast string matching
- Given a search string, encode the state machine into bytecode
- Worth it if the same pattern is going to be used many times
- Virus scanners
- Searching genome sequences
23. Example: String matching
- Problem: match "10100"
- Convert to a state machine
- Each state encodes a successful prefix match
24. String matching (contd.)
- If only goto were allowed in java
- But, goto's are allowed in bytecode!
    try {
        // buf is the buffer to be searched
        int i = -1;
        s0: i++; if (buf[i] != '1') goto s0;
        s1: i++; if (buf[i] != '0') goto s1;
        s2: i++; if (buf[i] != '1') goto s0;
        s3: i++; if (buf[i] != '0') goto s1;
        s4: i++; if (buf[i] != '0') goto s3;
        s5: i++; return i - 5;
    } catch (ArrayIndexOutOfBoundsException e) {
        return -1;
    }
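
Without goto, the same state machine can be approximated in legal java with a state variable and a switch. This is a sketch under the same assumption the goto version makes implicitly, namely that the buffer contains only '0' and '1'; the class and method names are hypothetical.

```java
class Match10100 {
    /**
     * Returns the index of the first occurrence of "10100" in buf, or -1.
     * State n means "the last n characters read are the first n of the pattern".
     * Assumes buf holds only '0' and '1', like the goto version it mirrors.
     */
    static int find(char[] buf) {
        int state = 0;
        for (int i = 0; i < buf.length; i++) {
            char c = buf[i];
            switch (state) {
                case 0: state = (c == '1') ? 1 : 0; break;
                case 1: state = (c == '0') ? 2 : 1; break;   // "11" still ends in "1"
                case 2: state = (c == '1') ? 3 : 0; break;
                case 3: state = (c == '0') ? 4 : 1; break;   // "1011" ends in "1"
                case 4: state = (c == '0') ? 5 : 3; break;   // "10101" ends in "101"
                default: break;
            }
            if (state == 5) return i - 4;   // i is the last matched character
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(find("1010100".toCharArray()));
    }
}
```

The loop pays for a switch dispatch and a bounds check per character, which is the overhead the bytecode version with direct jumps is meant to avoid.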
25. String matching (contd.)
- Using an assembler like jasmin
    iconst_m1
    istore_1
    S0:                  ; i++; if a[i] != '1' goto S0
        iinc 1 1         ; i++
        aload_0          ; load a[i]
        iload_1
        caload
        bipush 49        ; load '1'
        if_icmpne S0     ; if .. goto S0
    S1:                  ; i++; if a[i] != '0' goto S1
        iinc 1 1
        aload_0
        iload_1
        caload
        bipush 48
        if_icmpne S1
26. Custom languages
- Craft a language that fits the context you are working in
- Avoid XML ugliness: SRML (Simple Rule Markup)
- Instead of "if s.purchaseAmount > 100"
    <simpleCondition className="ShoppingCart" objectVariable="s">
      <binaryExp operator="gt">
        <field name="purchaseAmount"/>
        <constant type="float" value="100"/>
      </binaryExp>
    </simpleCondition>
27. Antlr Introduction
- Antlr: generates recursive descent parsers with configurable lookahead (LL(k))
- Much, much simpler than lex/yacc
- Yacc error messages are cryptic, tough for non-CS types to understand
- Even generated code easy to understand
- Includes tree building and recognition
- No such facility in yacc
- Lexer, parser and tree recognizer phases have similar syntax
28. Antlr
- Example: hierarchical property list
- A list consists of name-value pairs
- Names are identifiers, values are numbers or lists
    ( a 200 b (c 10 d 20) )
29. Antlr (contd.)
    class LispLexer extends Lexer;
    ID  : ('a' .. 'z')+ ;
    NUM : ('0' .. '9')+ ;
    LP  : '(' ;
    RP  : ')' ;

    class LispParser extends Parser;
    list          : LP (nameValuePair)* RP ;
    nameValuePair : ID value ;
    value         : NUM | list ;
30. Antlr (contd.)
- Adding code, arguments, return values
    nameValuePair returns [NVP ret=null] { Object v; }
        : t:ID v=value { ret = new NVP(t.getText(), v); }
        ;
    value returns [Object ret=null]
        : t:NUM { ret = t.getText(); }
        | ret=list
        ;
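
Because Antlr generates recursive descent parsers, the three grammar rules map almost one-to-one onto methods. The sketch below is a hand-written java approximation of such a parser for the property list, not actual Antlr output. The class name, the regex-based lexer and the choice to return nested maps with Integer values (instead of NVP objects and token text) are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PropertyListParser {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private PropertyListParser(String input) {
        // Lexer: ID, NUM, LP, RP. Anything else (including whitespace) is silently skipped.
        Matcher m = Pattern.compile("[a-z]+|[0-9]+|[()]").matcher(input);
        while (m.find()) tokens.add(m.group());
    }

    static Map<String, Object> parse(String input) {
        PropertyListParser p = new PropertyListParser(input);
        Map<String, Object> result = p.list();
        if (p.pos != p.tokens.size()) throw new IllegalArgumentException("trailing input");
        return result;
    }

    // list : LP (nameValuePair)* RP ;
    private Map<String, Object> list() {
        expect("(");
        Map<String, Object> pairs = new LinkedHashMap<>();
        while (!peek().equals(")")) {
            // nameValuePair : ID value ;
            String name = next();
            if (!name.matches("[a-z]+")) throw new IllegalArgumentException("expected ID, got " + name);
            pairs.put(name, value());
        }
        expect(")");
        return pairs;
    }

    // value : NUM | list ;
    private Object value() {
        if (peek().equals("(")) return list();
        String t = next();
        if (!t.matches("[0-9]+")) throw new IllegalArgumentException("expected NUM, got " + t);
        return Integer.valueOf(t);
    }

    private String peek() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("unexpected end of input");
        return tokens.get(pos);
    }

    private String next() { String t = peek(); pos++; return t; }

    private void expect(String s) {
        String t = next();
        if (!t.equals(s)) throw new IllegalArgumentException("expected " + s + ", got " + t);
    }

    public static void main(String[] args) {
        System.out.println(parse("( a 200 b (c 10 d 20) )"));
    }
}
```

One token of lookahead (the peek() call) is enough to choose between NUM and a nested list, which is why this grammar is LL(1).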
31. Way out there
- Configurable hardware
- New circuits on the fly
- Intentional programming
- Code not represented as a stream of characters
32. Summary
- Run-time evaluation gives you a lot of power
- Other languages add features (e.g. closures) to java
- Lots of simple, free, quality parsers, interpreters
- Produce custom java source or byte code for performance
- Roll your own domain-specific language with ANTLR or javacc.
- Yacc No More.
33. References
- Doclets
- Doclet tools: www.doclet.com
- EJBGen: www.beust.com, Cedric Beust
- iContract: www.reliable-systems.com, Reto Kramer
- Languages, interpreters
- BeanShell: www.beanshell.org
- Rhino: www.mozilla.org/rhino
- Python: www.python.org, www.jython.org
- ANTLR: www.antlr.org
- More: flp.cs.tu-berlin.de/~tolk/vmlanguages.html
- SRML: xml.coverpages.org/srml.html
34. References (contd.)
- Bytecode manipulation
- Jasmin: mrl.nyu.edu/~meyer/jasmin/
- Jikes Bytecode toolkit: www.alphaworks.ibm.com/tech/jikesbt
- BCEL: bcel.sourceforge.net
- "Rapid" - Reconfigurable hardware
- www.cs.washington.edu/research
- "The death of computer languages, the birth of intentional programming", Charles Simonyi
- research.microsoft.com/scripts/pubs/trpub.asp
- Microsoft tech report MSR-TR-95-52
- Thinking in Patterns with Java, Bruce Eckel
- www.mindview.net/Books/TIPatterns