Title: YOLT
1 YOLT
- Yuan Zheng
- Omar Ahmed
- Lukas Dudkowski
- T. Mark Kuba
2 Overview of YOLT
- Simple scripting language
- Easy for coding and maintenance.
- Regular expression support, including the @ operator
- Web-scraping uses:
  - Natural Language Processing
  - Generating RSS feeds
  - Reformatting HTML for other uses (XML, etc.)
3 A Useful YOLT Program
4 Semantics
- The YOLT semantic checker is extremely simple. It serves a few main tasks:
  - Make sure that functions are declared properly, i.e. function declarations match the functions, and function calls match the declarations
  - Make sure that variables are initialized before they are used (or, in some cases, un-initialized)
  - (redundant) Make sure that the tree is properly formed (i.e. make sure that an if-then-else node has exactly three children, etc.)
- Note: there was once basic type-checking, but no longer.
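The three checks above can be sketched in a few lines. This is a Python illustration only, not the authors' checker. The Node class, the node kinds ("block", "assign", "call", "var", "string", "if") and the function table mapping names to argument counts are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical AST node: a kind, an optional text value, and children."""
    kind: str
    value: str = ""
    children: list = field(default_factory=list)


def check(node, functions, defined, errors):
    """Walk the tree once, appending a message to errors for each problem."""
    if node.kind == "if" and len(node.children) != 3:
        errors.append("if-then-else node must have exactly three children")
    if node.kind == "call":
        expected = functions.get(node.value)
        if expected is None:
            errors.append(f"call to undeclared function {node.value}")
        elif expected != len(node.children):
            errors.append(f"wrong number of arguments for {node.value}")
    if node.kind == "var" and node.value not in defined:
        errors.append(f"variable {node.value} used before initialization")
    if node.kind == "assign":
        # Check the right-hand side first, then mark the target as initialized.
        target, expr = node.children
        check(expr, functions, defined, errors)
        defined.add(target.value)
        return errors
    for child in node.children:
        check(child, functions, defined, errors)
    return errors


# A small tree with three deliberate mistakes.
program = Node("block", children=[
    Node("assign", children=[Node("var", "a"),
                             Node("string", "http://www.columbia.edu")]),
    Node("call", "show", [Node("var", "a"), Node("var", "b")]),
    Node("if", children=[Node("var", "a")]),
])
errors = check(program, {"show": 1}, set(), [])
```

With no types to track, the checker needs only a set of initialized names and a table of declared functions.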
5 Semantics Lessons Learned
- It is very easy to do too much in semantic checking
- Either there are types, or no types (NO MIDDLE GROUND)
- Scripting languages are an enormous relief to a semantic checker: they take away the biggest hassles
- The tree walker should know EXACTLY what the structure of the AST will look like and cannot make ANY assumptions. As is evident, things can break down when you least expect them to.
6 Code Generation
- Written in Java
- Input: a correct AST
- Output: a Perl program
[Diagram: AST -> Code generator (Java) -> Perl program]
7 Implementation
- Walk the AST
- According to the information in each node, generate code or go down to the child nodes
- e.g. a http://www.columbia.edu
  - Go down the tree at the node
  - Generate code at the nodes a and http://www.columbia.edu
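A minimal sketch of this walk, in Python rather than the Java the generator was written in. The Node class and the node kinds ("block", "assign", "var", "string") are hypothetical. The walker descends at interior nodes and emits Perl text at the leaves.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical AST node: a kind, an optional text value, and children."""
    kind: str
    value: str = ""
    children: list = field(default_factory=list)


def generate(node):
    """Return Perl source text for one AST node."""
    if node.kind == "block":
        return "\n".join(generate(child) for child in node.children)
    if node.kind == "assign":
        # Interior node: go down to both children, then combine their code.
        target, expr = node.children
        return f"{generate(target)} = {generate(expr)};"
    if node.kind == "var":
        # Leaf: emit the variable as a Perl scalar.
        return "$" + node.value
    if node.kind == "string":
        # Leaf: emit a quoted literal (no escaping in this sketch).
        return '"' + node.value + '"'
    raise ValueError(f"unexpected node kind: {node.kind}")


tree = Node("assign", children=[
    Node("var", "a"),
    Node("string", "http://www.columbia.edu"),
])
perl = generate(tree)
```

Raising on an unknown node kind keeps the walker from silently assuming a tree shape it does not recognize.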
8 Implementation (tricks)
- The httpget
  - Invoke the UNIX command wget (through Perl's system call) to download the web page into a temp file
  - Read the file line by line and store the lines in a Perl array
  - Invoke the UNIX command rm to remove the temp file
  - Keep the web address in a Perl scalar
- Scalars and arrays use the same syntax
  - The compiler (code generator) guesses whether the variable is a scalar or an array
  - Arrays can only appear in certain places (e.g. foreach)
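The httpget steps can be sketched as a small emitter that returns the Perl lines for one download. This is a Python illustration; emit_httpget and sigil are hypothetical names. The temp file is named after the variable, as in the generated Perl of the example slide. The context rule in sigil is an assumed simplification of the generator's guess.

```python
def emit_httpget(name, url):
    """Return the Perl lines for one httpget, following the steps listed above."""
    tmp = name + ".txt"
    return [
        f"system('wget -q -O - {url} > {tmp}');",  # download into a temp file
        f'open INFILE, "{tmp}";',
        f"@{name} = <INFILE>;",                    # one array element per line
        "close INFILE;",
        f"system('rm {tmp}');",                    # remove the temp file
        f'${name} = "{url}";',                     # keep the address in a scalar
    ]


def sigil(name, context):
    """Guess the Perl sigil from where the variable appears (assumed rule)."""
    return "@" + name if context == "foreach" else "$" + name


lines = emit_httpget(
    "toothpaste", "http://www.toothpastefordinner.com/archives-sum02.php")
```

The same YOLT name ends up as both an array (the page contents) and a scalar (the address), which is why the generator has to pick the sigil from context. This sketch does not quote or escape the URL for the shell.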
9Documentation and Testing
Lexer/Parser - Semantic Checker
Log result Good should be good. Bad should be
bad.
Lexer/Parser Semantic Checker
Diff
Reference File What I think it should produce
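The diff-against-a-reference step can be sketched with Python's standard difflib module. The function name and the string inputs are assumptions. The slides do not say which diff tool the team used.

```python
import difflib


def diff_against_reference(actual, reference):
    """Return unified-diff lines; an empty list means the output matched."""
    return list(difflib.unified_diff(
        reference.splitlines(), actual.splitlines(),
        fromfile="reference", tofile="actual", lineterm=""))


same = diff_against_reference("a\nb", "a\nb")
changed = diff_against_reference("a\nc", "a\nb")
```

An empty result means the stage produced exactly what the reference file says it should.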
10 Integration Testing
Trying little YOLT programs to see functionality,
code generation, etc. Working out bugs in
implementation design.
Example
- Goal: display any comics from the Summer 2002 archive of www.toothpastefordinner.com that have the word hamster in their URL.
Generated Perl
$toothpaste_home = "http://www.toothpastefordinner.com/";
system('wget -q -O - http://www.toothpastefordinner.com/archives-sum02.php > toothpaste.txt');
open INFILE, "toothpaste.txt";
@toothpaste = <INFILE>;
close INFILE;
system('rm toothpaste.txt');
$toothpaste = "http://www.toothpastefordinner.com/archives-sum02.php";
$tags = ".hamster.";
@tmp1 = ();
foreach (@toothpaste) { if ($_ =~ m/($tags)/i) { push @tmp1, $2; } }
@elements = @tmp1;
foreach $x (@elements) {
print "src=\"".$toothpaste_home.$x."\""."<br>";
print "\n";
}
YOLT Program
begin
toothpaste_home "http://www.toothpastefordinner.com/"
toothpaste "http://www.toothpastefordinner.com/archives-sum02.php"
tags "href\"(.)\".hamster."
elements tags @ toothpaste
foreach x in elements
echo "<br>"
echo "\n"
end
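The tags @ toothpaste match, as it appears in the generated Perl loop, wraps the pattern in an outer group, matches each line case-insensitively, and keeps group 2. A Python sketch of that behaviour follows. The function name match_all, the sample lines and the sample pattern are hypothetical, because the pattern on the slide did not survive intact.

```python
import re


def match_all(tags, lines):
    """Collect group 2 of (tags) from every line that matches, ignoring case."""
    pattern = re.compile("(" + tags + ")", re.IGNORECASE)
    found = []
    for line in lines:
        m = pattern.search(line)
        if m:
            # Group 1 is the whole pattern; group 2 is its first inner group.
            found.append(m.group(2))
    return found


# Hypothetical page lines and pattern, for illustration only.
page = [
    '<a href="072502/even-hamsters.gif">',
    '<a href="072502/cats.gif">',
]
elements = match_all('href="(.*hamster.*)"', page)
```

The pattern passed in must contain a capture group of its own. Without one, group 2 does not exist and the Python call raises an error.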
Resultant HTML
2/hamster-table-tennis.gif"
src="http://www.toothpastefordinner.com/072502/even-hamsters.gif"
tefordinner.com/060602/hamsters-are-the-best.gif"
11 The Result
[Figures: the source site; the end result]
12 Lessons Learned
- Develop and test incrementally
- There are ALWAYS bugs; you just haven't found them yet
- CLIC is not designed to be lived in
13 One More Example