Two issues in lexical analysis

About This Presentation

Slides: 12

Provided by: xyu

Learn more at: http://www.cs.fsu.edu

Transcript and Presenter's Notes

How to recognize tokens specified by regular
expressions?
A recognizer for a language is a program that
takes a string x as input and answers yes if x
is a sentence of the language and no otherwise.
In the context of lexical analysis, given a
string and a regular expression, a recognizer of
the language specified by the regular expression
answers yes if the string is in the language and
no otherwise.
A regular expression can be compiled into a
recognizer (automatically) by constructing a
finite automaton, which can be deterministic or
non-deterministic.

Non-deterministic finite automata (NFA)
A non-deterministic finite automaton (NFA) is a
mathematical model that consists of a 5-tuple:
a set of states Q
a set of input symbols (the input alphabet)
a transition function that maps state-symbol
pairs to sets of states
a state q0 that is distinguished as the start
(initial) state
a set of states F distinguished as accepting
(final) states.
An NFA accepts an input string x if and only if
there is some path in the transition graph from
the start state to some accepting state whose
edge labels spell out x.
Show an NFA example (page 116, Figure 3.21).
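
The 5-tuple above maps directly onto a small data structure. The following is a minimal Python sketch: the class name NFA, its field names, the targets helper, and the example automaton (strings over a and b that end in "ab") are illustrative assumptions, not taken from the slides or from Figure 3.21.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NFA:
    """The five components of an NFA, named after the definition above."""
    states: frozenset      # Q
    alphabet: frozenset    # set of input symbols
    delta: dict            # (state, symbol) -> set of next states
    start: int             # q0
    accepting: frozenset   # F

    def targets(self, state, symbol):
        """Set of states reachable from `state` on one `symbol` edge."""
        return self.delta.get((state, symbol), frozenset())


# Hypothetical example: accepts strings over {a, b} that end in "ab".
example = NFA(
    states=frozenset({0, 1, 2}),
    alphabet=frozenset({"a", "b"}),
    delta={
        (0, "a"): frozenset({0, 1}),  # two choices on 'a': non-determinism
        (0, "b"): frozenset({0}),
        (1, "b"): frozenset({2}),
    },
    start=0,
    accepting=frozenset({2}),
)

print(example.targets(0, "a"))  # the set containing states 0 and 1
```

Because the transition function returns a set, a missing entry is simply the empty set, which is how "no transition" is modeled here.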

An NFA is non-deterministic in that (1) the same
character can label two or more transitions out
of one state, and (2) the empty string can label
transitions.
For example, here is an NFA that recognizes the
language (a|b)*abb, with state 3 as the accepting
state.
An NFA can easily be implemented using a
transition table.
State    a        b
0        {0, 1}   {0}
1        -        {2}
2        -        {3}
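
The transition table above can be stored as a nested dictionary, one row per state. This is only a sketch: the names TABLE and move are assumptions chosen for illustration, and a "-" entry in the table is represented by leaving the key out.

```python
# Rows are states, columns are input symbols, entries are sets of states.
# A "-" in the table (no transition) is represented by a missing key.
TABLE = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
}


def move(states, symbol):
    """Return every state reachable from `states` on one `symbol` edge."""
    result = set()
    for s in states:
        result |= TABLE.get(s, {}).get(symbol, set())
    return result


print(move({0}, "a"))     # {0, 1}
print(move({0, 1}, "b"))  # {0, 2}
```

State 3 has no row because it has no outgoing transitions; looking it up yields the empty set.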

[Figure: transition graph of the NFA, with states 0, 1, 2 and 3 and edges labeled a and b]

The algorithm that recognizes the language
accepted by an NFA.
Input: an NFA (transition table) and a string x
(terminated by eof).
Output: yes if accepted, no otherwise.
S := e-closure(s0)
a := nextchar
while a != eof do begin
    S := e-closure(move(S, a))
    a := nextchar
end
if (intersect(S, F) != empty) then return yes
else return no
Note: e-closure(S) is the set of states that can
be reached from states in S through transitions
labeled by the empty string (including the states
in S themselves); move(S, a) is the set of states
reachable from states in S on one transition
labeled a.
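
The simulation loop above can be sketched in Python as follows. Several details are assumptions made for illustration: the empty string "" stands for the empty-string label, the end of the Python string plays the role of eof, the function names are invented, and state 3 is taken to be the only accepting state of the transition-table NFA shown earlier. The second automaton (for a*b*) is hypothetical and exists only to exercise an empty-string edge.

```python
EPS = ""  # assumed label for empty-string transitions (not from the slides)


def e_closure(table, states):
    """States reachable from `states` using only empty-string transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in table.get(s, {}).get(EPS, set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def move(table, states, symbol):
    """States reachable from `states` on one transition labeled `symbol`."""
    result = set()
    for s in states:
        result |= table.get(s, {}).get(symbol, set())
    return result


def accepts(table, start, accepting, text):
    """Set-of-states simulation; the end of `text` plays the role of eof."""
    current = e_closure(table, {start})
    for ch in text:
        current = e_closure(table, move(table, current, ch))
    return bool(current & accepting)


# Transition table from the slides; the accepting set {3} is an assumption.
ABB = {0: {"a": {0, 1}, "b": {0}}, 1: {"b": {2}}, 2: {"b": {3}}}

# Hypothetical NFA with an empty-string edge, recognizing a*b*.
A_STAR_B_STAR = {0: {"a": {0}, EPS: {1}}, 1: {"b": {1}}}

print(accepts(ABB, 0, {3}, "aabb"))          # True
print(accepts(A_STAR_B_STAR, 0, {1}, "ba"))  # False
```

Each input character triggers one move and one e-closure over the current set of states, which is where the dependence on both the number of states and the input length comes from.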

[Figures: NFA construction diagrams, showing an edge labeled a and the component automata N(s) and N(t)]

Using an NFA, we can recognize a token in
O(|S|^2 |X|) time, where |S| is the number of NFA
states and |X| is the length of the input string.
We can improve the time complexity by using a
deterministic finite automaton (DFA) instead of
an NFA.
An NFA is deterministic (a DFA) if
it has no transitions on the empty string, and
for each state S and input symbol a, there is
at most one edge labeled a leaving S.
What is the time complexity to recognize a token
when a DFA is used?
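
To make the question concrete, here is a minimal Python sketch of DFA-based recognition. The table DFA below is a hand-built, hypothetical DFA for (a|b)*abb (each state records how much of "abb" has just been matched); it is not taken from the slides, and the function name is an assumption. The loop does one table lookup per input character, so its running time is proportional to the length of the input and does not depend on the number of states.

```python
# Hypothetical DFA for (a|b)*abb: exactly one next state per (state, symbol).
DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
DFA_START = 0
DFA_ACCEPTING = {3}


def dfa_accepts(table, start, accepting, text):
    """Follow one transition per character; reject on a missing edge."""
    state = start
    for ch in text:
        state = table[state].get(ch)
        if state is None:  # no edge for this symbol: reject immediately
            return False
    return state in accepting


print(dfa_accepts(DFA, DFA_START, DFA_ACCEPTING, "ababb"))  # True
print(dfa_accepts(DFA, DFA_START, DFA_ACCEPTING, "abba"))   # False
```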

Algorithm to convert an NFA to a DFA that accepts
the same language (algorithm 3.2, page 118)
initially e-closure(s0) is the only state in
Dstates and it is unmarked
while there is an unmarked state T in Dstates do
begin
    mark T
    for each input symbol a do begin
        U := e-closure(move(T, a))
        if (U is not in Dstates) then
            add U as an unmarked state to Dstates
        Dtran[T, a] := U
    end
end
Initial state: e-closure(s0). Final state: ?
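
The conversion loop above can be sketched in Python as follows. The function and variable names, the use of "" for the empty-string label, and the choice of {3} as the accepting set of the example NFA are assumptions made for illustration. The sketch also marks a DFA state as accepting when it contains at least one accepting NFA state, which is the standard rule for this construction.

```python
EPS = ""  # assumed label for empty-string transitions (not from the slides)


def e_closure(table, states):
    """States reachable from `states` using only empty-string transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in table.get(s, {}).get(EPS, set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def move(table, states, symbol):
    """States reachable from `states` on one transition labeled `symbol`."""
    result = set()
    for s in states:
        result |= table.get(s, {}).get(symbol, set())
    return result


def subset_construction(table, start, accepting, alphabet):
    """Build DFA states (frozensets of NFA states) and the Dtran table."""
    start_set = frozenset(e_closure(table, {start}))
    dstates = {start_set}
    unmarked = [start_set]
    dtran = {}
    while unmarked:
        T = unmarked.pop()  # taking T off the list marks it
        for a in alphabet:
            U = frozenset(e_closure(table, move(table, T, a)))
            if U not in dstates:  # an empty U becomes a dead state
                dstates.add(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    final = {T for T in dstates if T & accepting}
    return start_set, dstates, dtran, final


# NFA transition table from the slides; accepting set {3} is an assumption.
NFA_TABLE = {0: {"a": {0, 1}, "b": {0}}, 1: {"b": {2}}, 2: {"b": {3}}}
d_start, d_states, d_tran, d_final = subset_construction(NFA_TABLE, 0, {3}, "ab")
print(len(d_states))  # 4
```

For this NFA the construction reaches the four subsets {0}, {0, 1}, {0, 2} and {0, 3}; in general the number of subsets can grow exponentially in the number of NFA states.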