Title: Discussion Class 4
1. Discussion Class 4
- Lexical Analysis and Stop Lists
2. Discussion Classes
Format:
- Question: ask a member of the class to answer.
- Provide an opportunity for others to comment.
When answering:
- Give your name. Make sure that the TA hears it.
- Stand up.
- Speak clearly so that all the class can hear.
3. Question 1: Tokens
(a) What is a token?
(b) Give some examples where decisions about hyphenation may change retrieval effectiveness.
(c) Give some examples where decisions about the inclusion of punctuation may change retrieval effectiveness.
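To make parts (b) and (c) concrete, here is a minimal Python sketch of a tokenizer with two policy switches. The function name `tokenize` and the parameters `hyphen` and `keep_inner_punct` are hypothetical names chosen for illustration, and the sketch handles ASCII letters and digits only.

```python
import re

def tokenize(text, hyphen="split", keep_inner_punct=False):
    """Split text into lowercase tokens (ASCII letters and digits only).

    hyphen: "split" treats a hyphen as a separator, "keep" leaves it
            inside the token, "join" deletes it and fuses the parts.
    keep_inner_punct: if True, '.' and '/' between alphanumerics stay
            inside the token; otherwise they act as separators.
    """
    text = text.lower()
    if hyphen == "join":
        text = text.replace("-", "")
    elif hyphen == "split":
        text = text.replace("-", " ")
    # With hyphen == "keep" the hyphens are left in place.
    inner = r"[-./]" if keep_inner_punct else r"-"
    pattern = rf"[a-z0-9]+(?:{inner}[a-z0-9]+)*"
    return re.findall(pattern, text)
```

Under these assumptions, "State-of-the-art" becomes four tokens, one hyphenated token, or one fused token depending on `hyphen`, and "OS/2" is either one token or two depending on `keep_inner_punct`. A query tokenized under one policy will only match documents indexed under the same policy.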
4. Question 2: Transition diagram
[Figure: transition diagram with states numbered 0 to 8. Transition labels: space, letter, "letter, digit", "(", ")", eos, other.]
In this diagram, what are
(a) the states
(b) the transitions
(c) the starting state(s)
(d) the final state(s)
Can you suggest a simplification of this diagram?
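As a reminder of how states and transitions map onto code, here is a small table-driven sketch in Python. It is not the diagram from this slide: the states `START` and `IN_TERM`, the helper `classify`, and the function `first_term` are hypothetical, and the machine only recognizes a term that begins with a letter and continues with letters or digits.

```python
# Hypothetical states; these are NOT the numbered states of the slide's diagram.
START, IN_TERM = "start", "in_term"

def classify(ch):
    """Map a character to a coarse class used as the transition label."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    if ch.isspace():
        return "space"
    return "other"

# (current state, character class) -> next state
TRANSITIONS = {
    (START, "space"): START,      # skip leading spaces
    (START, "letter"): IN_TERM,   # a term must begin with a letter
    (IN_TERM, "letter"): IN_TERM,
    (IN_TERM, "digit"): IN_TERM,
}

def first_term(text):
    """Return the first term in text, or None if no term starts it."""
    state = START
    chars = []
    for ch in text:
        nxt = TRANSITIONS.get((state, classify(ch)))
        if nxt is None:           # no transition defined: stop scanning
            break
        if nxt == IN_TERM:
            chars.append(ch)
        state = nxt
    return "".join(chars) if state == IN_TERM else None
```

Here `START` is the starting state and `IN_TERM` plays the role of a final state. The dictionary holds the transitions, so changing the language the machine accepts means editing table entries, not control flow.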
5. Question 3: Data conversion
In the implementation of the lexical analyzer,
(a) What is the purpose of the array char_class[128]?
(b) What is the purpose of the array convert_class[128]?
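One plausible reading, offered only as an assumption, is that both arrays are lookup tables indexed by 7-bit ASCII code: one gives the class of a character and the other gives a converted form of it. The Python sketch below illustrates that idea with hypothetical names (`build_tables`, `LETTER`, `DIGIT`, `SPACE`, `OTHER`) and assumes the conversion is to lowercase; it is not the analyzer's actual code.

```python
# Hypothetical class codes for illustration.
LETTER, DIGIT, SPACE, OTHER = range(4)

def build_tables():
    """Build two 128-entry tables indexed by ASCII code.

    char_class[c] is the class of character c.
    convert[c] is the code of the character that c is converted to
    (assumed here to be lowercase folding; everything else maps to itself).
    """
    char_class = [OTHER] * 128
    convert = list(range(128))
    for code in range(128):
        ch = chr(code)
        if ch.isalpha():
            char_class[code] = LETTER
        elif ch.isdigit():
            char_class[code] = DIGIT
        elif ch.isspace():
            char_class[code] = SPACE
        if "A" <= ch <= "Z":
            convert[code] = ord(ch.lower())
    return char_class, convert
```

The point of such tables is speed: classifying or converting a character becomes a single array index instead of a chain of comparisons inside the scanning loop.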
6. Question 4: Changing the lexical analyzer
Suppose that you wish to change the lexical analyzer to accept tokens that begin with a digit. What would you change?
7. Question 5: Stop lists
(a) What is a stop list?
(b) What are the benefits of a long stop list?
(c) What are the benefits of a short stop list?
(d) How would you choose?
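The trade-off in parts (b) to (d) can be seen in a small Python sketch. The two stop lists and the function `remove_stop_words` are hypothetical examples chosen for illustration, not lists recommended by the course.

```python
# Hypothetical stop lists for illustration only.
SHORT_STOP_LIST = frozenset({"a", "an", "and", "in", "into", "to"})
LONG_STOP_LIST = SHORT_STOP_LIST | {"be", "not", "of", "or", "the"}

def remove_stop_words(tokens, stop_list):
    """Return the tokens that are not on the stop list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stop_list]

query = "to be or not to be".split()
short_result = remove_stop_words(query, SHORT_STOP_LIST)
long_result = remove_stop_words(query, LONG_STOP_LIST)
```

With the short list the query keeps the tokens be, or, not, be. With the long list nothing is left to search on. A longer list shrinks the index and speeds up matching, but it can make some phrases impossible to retrieve.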
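8. Question 6: A generated finite state machine
[Figure: finite state machine with states q0 to q6 and transitions labeled a, d, i, n, o, t. Annotations in the original: "a an and in into to", "? n nd", "? d", "n nto", "? to", "?", and "L0", where "?" is a symbol lost in extraction.]
Explain this diagram.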
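For comparison, here is a minimal Python sketch that builds a recognizer for the words a, an, and, in, into, to as a plain trie. This is an assumption-laden illustration, not the generator that produced the slide's machine: `build_trie` and `accepts` are hypothetical names, and the trie is not minimized.

```python
def build_trie(words):
    """Build a deterministic recognizer for a fixed word list.

    transitions[state][ch] is the next state; state 0 is the start state.
    accepting is the set of states in which a complete word has been read.
    """
    transitions = [{}]
    accepting = set()
    for word in words:
        state = 0
        for ch in word:
            if ch not in transitions[state]:
                transitions.append({})
                transitions[state][ch] = len(transitions) - 1
            state = transitions[state][ch]
        accepting.add(state)
    return transitions, accepting

def accepts(machine, word):
    """Return True if the machine ends in an accepting state on word."""
    transitions, accepting = machine
    state = 0
    for ch in word:
        state = transitions[state].get(ch)
        if state is None:         # no transition on this character
            return False
    return state in accepting

STOP_WORDS = ["a", "an", "and", "in", "into", "to"]
machine = build_trie(STOP_WORDS)
```

For this word list the trie has ten states. A minimized machine merges states that accept the same remaining suffixes, so it can be smaller than the trie while recognizing exactly the same words.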