Title: LANGUAGE TRANSLATORS: WEEK 14
1LANGUAGE TRANSLATORS WEEK 14
- LECTURE
- REGULAR EXPRESSIONS
- FINITE STATE MACHINES
- LEXICAL ANALYSERS
- TUTORIAL
- CAPTURING LANGUAGES USING REGULAR EXPRESSIONS
2LEXICAL ANALYSIS
- Is the first step in the translation/compilation
process: input language -> output language
- Means grouping the raw characters of the input
into TOKENS.
3LEXICAL ANALYSIS PHASE
- The language of TOKENS (e.g. identifiers) is always
a regular language.
- REGULAR EXPRESSIONS generate regular languages
(as do regular grammars). The tokens of
languages are often specified by regular
expressions.
- Finite State Machines consume (recognise) regular languages.
4REGULAR EXPRESSIONS
- One-line method of specifying a language
- Equivalent to Type 3 (regular) grammars
- Used to parameterize UNIX/LINUX file processing
commands (e.g. grep)
5REGULAR EXPRESSIONS - DEFINITION
- EXAMPLE and DEFINITION:
- a | b means choice
- [abc..] is shorthand for the multiple choice
a | b | c ..
- e means the empty word (often written ε)
- (abc)* means repetition 0, 1 or more times
- (abcd)+ means repetition 1 or more times
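The operators defined on this slide carry over almost unchanged to java.util.regex. The sketch below assumes the usual notation (| for choice, [...] for multiple choice, * for zero or more, + for one or more); the class name RegexOperators and the helper inLanguage are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: each call asks whether the WHOLE word belongs
// to the language generated by the regular expression.
class RegexOperators {
    // Pattern.matches succeeds only if the entire input matches the expression.
    static boolean inLanguage(String regex, String word) {
        return Pattern.matches(regex, word);
    }

    public static void main(String[] args) {
        System.out.println(inLanguage("a|b", "b"));          // choice
        System.out.println(inLanguage("[abc]", "c"));        // multiple choice
        System.out.println(inLanguage("(abc)*", ""));        // zero repetitions: the empty word
        System.out.println(inLanguage("(abc)*", "abcabc"));  // two repetitions
        System.out.println(inLanguage("(abcd)+", ""));       // one or more: empty word is rejected
    }
}
```

Note that Java has no symbol for the empty word; it is simply the empty string "".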
6REGULAR EXPRESSIONS - EXAMPLES
- [a-zA-Z][a-zA-Z0-9]*
defines the language of IDENTIFIERS in some
programming languages
- (xyz)* defines the language
e, xyz, xyzxyz, xyzxyzxyz, ..
- [abcd]+ defines the language
a, b, c, d, aa, ab, ac, ad, ba, bb, bc, bd, ca,
..
- Putting choice and repetition together produces
complicated regular languages
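To see which words a regular expression generates, one option is to enumerate every word over the alphabet up to some length and keep those that match. The sketch below does this by brute force; the class LanguageLister and its method words are illustrative names, and the expressions are written in java.util.regex syntax on the assumption that the slide's examples are (xyz)* and [abcd]+.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the words of a regular language, shortest first.
class LanguageLister {
    // Returns all words over `alphabet` of length <= maxLen that match `regex`.
    static List<String> words(String regex, String alphabet, int maxLen) {
        List<String> result = new ArrayList<>();
        List<String> level = new ArrayList<>();
        level.add("");                              // length 0: the empty word
        for (int len = 0; len <= maxLen; len++) {
            List<String> next = new ArrayList<>();
            for (String w : level) {
                if (w.matches(regex)) result.add(w);   // whole-word match
                for (char c : alphabet.toCharArray()) next.add(w + c);
            }
            level = next;                           // all words one symbol longer
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("(xyz)*", "xyz", 6));
        System.out.println(words("[abcd]+", "abcd", 2));
        System.out.println("x1".matches("[a-zA-Z][a-zA-Z0-9]*"));
        System.out.println("1x".matches("[a-zA-Z][a-zA-Z0-9]*"));
    }
}
```

The enumeration grows exponentially with maxLen, so it is only useful for checking small cases by eye.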
7Finite State Machines
- Can be defined by annotated nodes (states) and arcs (transitions).
- Regular expressions can be translated into FSMs, but
ERROR STATES must be added to the FSMs to catch input that fits no arc.
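As an illustration of nodes, arcs and an explicit error state, here is a minimal table-driven FSM for identifiers (a letter followed by letters or digits). The state names, the table layout and the class name IdentifierFsm are assumptions made for this sketch, not something taken from the slides.

```java
// Hypothetical table-driven FSM recognising identifiers.
class IdentifierFsm {
    static final int START = 0, IN_ID = 1, ERROR = 2;

    // Rows are states; columns are character classes:
    // 0 = letter, 1 = digit, 2 = anything else.
    static final int[][] NEXT = {
        /* START */ { IN_ID, ERROR, ERROR },
        /* IN_ID */ { IN_ID, IN_ID, ERROR },
        /* ERROR */ { ERROR, ERROR, ERROR },   // once in ERROR, stay there
    };

    static int charClass(char c) {
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) return 0;
        if (c >= '0' && c <= '9') return 1;
        return 2;
    }

    // Follows one arc per input character; accepts only if it ends in IN_ID.
    static boolean accepts(String input) {
        int state = START;
        for (int i = 0; i < input.length(); i++) {
            state = NEXT[state][charClass(input.charAt(i))];
        }
        return state == IN_ID;
    }

    public static void main(String[] args) {
        System.out.println(accepts("count2"));
        System.out.println(accepts("2count"));
    }
}
```

Every state has an arc for every character class, which is what adding the error state buys: the machine never gets stuck, it just ends up in a non-accepting state.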
8Regular Expression -> NDFSM
- [Diagrams: NDFSMs for a | b, ab and a*, with arcs labelled a and b]
- then NDFSM -> FSM ..
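The NDFSM -> FSM step is commonly done with the subset construction, in which each deterministic state stands for a set of non-deterministic states. The sketch below applies it to a small made-up NDFSM (not one of the slide's diagrams) that accepts strings over {a,b} ending in a. It has no empty-word arcs, which keeps the code short; state sets are stored as bitmasks, and all names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical subset construction for a two-state NDFSM.
class SubsetConstruction {
    // NFA[state][symbol] is a bitmask of possible next states
    // (symbol 0 = 'a', symbol 1 = 'b').
    static final int[][] NFA = {
        { 0b11, 0b01 },   // state 0: on a go to {0,1}; on b go to {0}
        { 0b00, 0b00 },   // state 1: no outgoing arcs
    };
    static final int NFA_START = 0b01;    // the set {0}
    static final int NFA_ACCEPT = 0b10;   // the set {1}

    // Moves a whole set of NDFSM states over one symbol.
    static int move(int set, int symbol) {
        int result = 0;
        for (int s = 0; s < NFA.length; s++) {
            if ((set & (1 << s)) != 0) result |= NFA[s][symbol];
        }
        return result;
    }

    // Builds the deterministic table: state-set bitmask -> next set per symbol.
    static Map<Integer, int[]> determinise() {
        Map<Integer, int[]> dfa = new HashMap<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.add(NFA_START);
        while (!work.isEmpty()) {
            int set = work.remove();
            if (dfa.containsKey(set)) continue;      // already expanded
            int[] row = { move(set, 0), move(set, 1) };
            dfa.put(set, row);
            for (int target : row) work.add(target);
        }
        return dfa;
    }

    // Runs the deterministic machine: exactly one table lookup per character.
    static boolean accepts(Map<Integer, int[]> dfa, String input) {
        int set = NFA_START;
        for (char c : input.toCharArray()) {
            if (c != 'a' && c != 'b') return false;  // outside the alphabet
            set = dfa.get(set)[c == 'a' ? 0 : 1];
        }
        return (set & NFA_ACCEPT) != 0;
    }

    public static void main(String[] args) {
        Map<Integer, int[]> dfa = determinise();
        System.out.println(dfa.size() + " deterministic states");
        System.out.println(accepts(dfa, "abba"));
    }
}
```

A general version would also need to follow empty-word arcs (the closure step) before and after each move.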
9Example
- Specify a language over the alphabet {w,x,y,z} with
the only restrictions being that
- 1. no string contains both x and y, and
- 2. if there is a y and a w in a string, then the
first w ALWAYS occurs before the first y
- SOLUTION
- 1. Write down examples and counter-examples
- 2. Decide on any ambiguities
- 3. Use Case Analysis to sub-divide the problem
- language = (a) strings of w,x,z UNION
(b) strings of w,y,z with restriction 2.
- Part (a): [wxz]*
- Part (b): can assume y is always in a string, so either
there is no w, or the first w comes before the first y
- [yz]* | z* w [wz]* y [wyz]*
- Put together, the answer is: [wxz]* | [yz]* |
z* w [wz]* y [wyz]*
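One way to gain confidence in an answer like this is to compare the regular expression against a direct coding of the two restrictions on every short string. The sketch below does that exhaustively. The expression in ANSWER is an assumed reading of the case analysis (part (a) UNION part (b)) in java.util.regex syntax, with x excluded after a y so that restriction 1 holds; the class and method names are illustrative.

```java
// Hypothetical brute-force check of the worked example.
class ExampleChecker {
    // Assumed answer: part (a) | part (b) without w | part (b) with w before y.
    static final String ANSWER = "[wxz]*|[yz]*|z*w[wz]*y[wyz]*";

    // Direct statement of the two restrictions from the problem.
    static boolean satisfiesSpec(String s) {
        int x = s.indexOf('x'), y = s.indexOf('y'), w = s.indexOf('w');
        if (x >= 0 && y >= 0) return false;            // 1. both x and y present
        if (y >= 0 && w >= 0 && w > y) return false;   // 2. first w after first y
        return true;
    }

    // Counts strings of length <= maxLen on which regex and spec disagree.
    static int mismatches(int maxLen) {
        return count("", maxLen);
    }

    private static int count(String prefix, int remaining) {
        int bad = (prefix.matches(ANSWER) == satisfiesSpec(prefix)) ? 0 : 1;
        if (remaining > 0) {
            for (char c : "wxyz".toCharArray()) {
                bad += count(prefix + c, remaining - 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println("mismatches up to length 6: " + mismatches(6));
    }
}
```

A count of zero does not prove the expression correct for all lengths, but a non-zero count pinpoints a case the analysis missed.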
10A LEXICAL ANALYSER - GENERATOR (e.g. LEX, JLEX):
how they work
- INPUT REGULAR EXPRESSIONS
- TRANSLATE REGULAR EXPRESSIONS INTO
NON-DETERMINISTIC FSM
- TRANSLATE NON-DETERMINISTIC FSM INTO
DETERMINISTIC FSM (which is easily described as a
simple program)
11EXAMPLE INPUT TO A LEXICAL ANALYSER - GENERATOR
- ";" { return new Symbol(sym.SEMI); }
- "+" { return new Symbol(sym.PLUS); }
- "*" { return new Symbol(sym.TIMES); }
- "(" { return new Symbol(sym.LPAREN); }
- ")" { return new Symbol(sym.RPAREN); }
- [0-9]+ { return new Symbol(sym.NUMBER, new
Integer(yytext())); }
- [ \t\r\n\f] { /* ignore white space. */ }
- . { System.err.println("Illegal character:
"+yytext()); }
- Example: if the string (231+3)*3 was input to the
generated lexical analyser, the output would be
- LPAREN (NUMBER,231) PLUS (NUMBER,3) RPAREN
TIMES (NUMBER,3)
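The behaviour of such a generated analyser can be imitated in plain Java. The sketch below is a stand-in built on java.util.regex, not what JLEX actually generates: it tries the rules in order at each position, whereas a generated analyser runs a deterministic FSM and prefers the longest match. It assumes the rules are the single characters ; + * ( ), the number pattern [0-9]+ and white space; the class name MiniLexer and the printed token format are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a generated lexical analyser.
class MiniLexer {
    // One named group per rule; alternatives are tried left to right.
    // DOTALL lets the final catch-all rule match any single character.
    private static final Pattern RULES = Pattern.compile(
        "(?<SEMI>;)|(?<PLUS>\\+)|(?<TIMES>\\*)|(?<LPAREN>\\()|(?<RPAREN>\\))"
        + "|(?<NUMBER>[0-9]+)|(?<WS>[ \\t\\r\\n\\f])|(?<ILLEGAL>.)",
        Pattern.DOTALL);

    private static final String[] SIMPLE = { "SEMI", "PLUS", "TIMES", "LPAREN", "RPAREN" };

    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = RULES.matcher(input);
        while (m.find()) {
            if (m.group("NUMBER") != null) {
                out.add("(NUMBER," + Integer.parseInt(m.group("NUMBER")) + ")");
            } else if (m.group("WS") != null) {
                // ignore white space
            } else if (m.group("ILLEGAL") != null) {
                System.err.println("Illegal character: " + m.group());
            } else {
                for (String name : SIMPLE) {
                    if (m.group(name) != null) out.add(name);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", tokens("(231+3)*3")));
    }
}
```

For the rules used here, ordered choice and longest match give the same tokens, because no rule's match is a prefix of another rule's match at the same position.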
12Simple Lexical Analyser
import java_cup.runtime.Symbol;  // CUP runtime token class; sym holds the CUP-generated token codes

public class scanner {
  /* single lookahead character */
  protected static int next_char;

  /* advance input by one character */
  protected static void advance() throws java.io.IOException {
    next_char = System.in.read();
  }

  /* initialise the scanner by reading the first character */
  public static void init() throws java.io.IOException {
    advance();
  }

  /* recognise and return the next complete token */
  public static Symbol next_token() throws java.io.IOException {
    for (;;)
      switch (next_char) {
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
          /* parse a decimal integer */
          int i_val = 0;
          do {
            i_val = i_val * 10 + (next_char - '0');
            advance();
          } while (next_char >= '0' && next_char <= '9');
          return new Symbol(sym.INT, new Integer(i_val));
        case 'p': advance(); return new Symbol(sym.PRINT);
        case 'r': advance(); return new Symbol(sym.REPEAT);
        case 'u': advance(); return new Symbol(sym.UNTIL);
        case '=': advance(); return new Symbol(sym.ASSIGNS);  /* '=' assumed */
        case ';': advance(); return new Symbol(sym.SEMI);
        case '+': advance(); return new Symbol(sym.PLUS);
        case '-': advance(); return new Symbol(sym.MINUS);
        case '(': advance(); return new Symbol(sym.LPAREN);
        case ')': advance(); return new Symbol(sym.RPAREN);
        case 'x': advance(); return new Symbol(sym.ID, "x");
        case 'y': advance(); return new Symbol(sym.ID, "y");
        case 'z': advance(); return new Symbol(sym.ID, "z");
        case -1: return new Symbol(sym.EOF);  /* end of input */
        default: advance(); break;            /* ignore everything else */
      }
  }
}
13nb
- Regular expressions simplify pattern-matching
code
- Discover the elegance of regular expressions in
text-processing scenarios that involve pattern
matching
- By Jeff Friesen, JavaWorld.com, 02/07/03
- Text processing frequently requires code to match
text against patterns. That capability makes
possible text searches, email header validation,
custom text creation from generic text (e.g.,
"Dear Mr. Smith" instead of "Dear Customer"), and
so on. Java supports pattern matching via its
character and assorted string classes. Because
that low-level support commonly leads to complex
pattern-matching code, Java also offers regular
expressions to help you write simpler code.
- Regular expressions often confuse newcomers.
However, this article dispels much of that
confusion. After introducing regular expression
terminology, the java.util.regex package's
classes, and a program that demonstrates regular
expression constructs, I explore many of the
regular expression constructs that the Pattern
class supports. I also examine the methods
comprising Pattern and other java.util.regex
classes. A practical application of regular
expressions concludes my discussion.
- See http://www.javaworld.com/javaworld/jw-02-2003/jw-0207-java101.html
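As a small taste of the java.util.regex classes the article covers, the sketch below uses Pattern and Matcher for two of the tasks mentioned above: creating custom text from generic text, and searching text for a pattern. It is not taken from the article; the class RegexTextDemo and its methods are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of Pattern and Matcher.
class RegexTextDemo {
    // Replaces the generic greeting with a personal one.
    static String personalise(String template, String name) {
        Matcher m = Pattern.compile("Dear Customer").matcher(template);
        // quoteReplacement stops characters such as $ in the name being
        // treated as group references.
        return m.replaceAll(Matcher.quoteReplacement("Dear " + name));
    }

    // Counts the non-overlapping matches of regex in text.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    public static void main(String[] args) {
        System.out.println(personalise("Dear Customer, thank you.", "Mr. Smith"));
        System.out.println(countMatches("[0-9]+", "a1b22c333"));
    }
}
```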
14Summary
- Regular expressions are a quick and easy way to
specify simple forms of language. They can be
easily translated into FSMs, which have nice
properties (e.g. a deterministic FSM runs in time
linear in the length of its input).
- There are tools (e.g. JLEX) which input regular
expressions and output a lexical analyser which
recognises the language they define.