Building Finite-State Machines - PowerPoint PPT Presentation

About This Presentation
Title:

Building Finite-State Machines

Description:

Title: Lecture 17: Implementing Finite-State Operations Author: Jason Eisner Last modified by: UM Licensed User Created Date: 3/22/1999 2:46:01 AM – PowerPoint PPT presentation

Slides: 71
Provided by: Jason621
Learn more at: https://www.cs.jhu.edu

Transcript and Presenter's Notes

Title: Building Finite-State Machines


1
Building Finite-State Machines
2
Finite-State Toolkits
  • In these slides, we'll use Xerox's regexp
    notation
  • Their tool is XFST; the free version is called FOMA
  • Usage
  • Enter a regular expression; it builds an FSA or FST
  • Now type in an input string
  • FSA: It tells you whether it's accepted
  • FST: It tells you all the output strings (if any)
  • Can also invert the FST to let you map outputs to
    inputs
  • Could hook it up to other NLP tools that need
    finite-state processing of their input or output
  • There are other tools for weighted FSMs (Thrax,
    OpenFST)

3
Common Regular Expression Operators (in XFST
notation)
  • concatenation EF
  • * + iteration E*, E+
  • | union E | F
  • & intersection E & F
  • ~ \ - complementation, minus ~E, \x, F-E
  • .x. crossproduct E .x. F
  • .o. composition E .o. F
  • .u upper (input) language E.u "domain"
  • .l lower (output) language E.l "range"

600.465 - Intro to NLP - J. Eisner
4
Common Regular Expression Operators (in XFST
notation)
  • concatenation EF
  • EF = {ef : e ∈ E, f ∈ F}
  • ef denotes the concatenation of 2 strings.
  • EF denotes the concatenation of 2 languages.
  • To pick a string in EF, pick e ∈ E and f ∈ F and
    concatenate them.
  • To find out whether w ∈ EF, look for at least one
    way to split w into two "halves," w = ef, such
    that e ∈ E and f ∈ F.
  • A language is a set of strings.
  • It is a regular language if there exists an FSA
    that accepts all the strings in the language, and
    no other strings.
  • If E and F denote regular languages, then so does
    EF. (We will have to prove this by finding the
    FSA for EF!)
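
The split test for concatenation can be sketched in Python. This is only an illustration: the languages E and F are stood in for by membership predicates (an assumption made here; a real toolkit would use FSAs), and the example languages are hypothetical.

```python
from typing import Callable

Language = Callable[[str], bool]  # membership test standing in for an FSA


def in_concat(w: str, in_e: Language, in_f: Language) -> bool:
    """Return True if w = ef for some split with e in E and f in F."""
    # Try every split point, including an empty prefix or an empty suffix.
    return any(in_e(w[:i]) and in_f(w[i:]) for i in range(len(w) + 1))


# Hypothetical example languages: E = {"a", "ab"}, F = {"c", "bc"}
in_e = lambda s: s in {"a", "ab"}
in_f = lambda s: s in {"c", "bc"}

print(in_concat("abc", in_e, in_f))  # True: "a" + "bc" (also "ab" + "c")
print(in_concat("bc", in_e, in_f))   # False: no prefix of "bc" is in E
```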

5
Common Regular Expression Operators (in XFST
notation)
  • concatenation EF
  • * + iteration E*, E+
  • E* = {e1e2 ... en : n ≥ 0, e1 ∈ E, ..., en ∈ E}
  • To pick a string in E*, pick any number of
    strings in E and concatenate them.
  • To find out whether w ∈ E*, look for at least one
    way to split w into 0 or more sections, e1e2 ...
    en, all of which are in E.
  • E+ = {e1e2 ... en : n > 0, e1 ∈ E, ..., en ∈ E} = EE*
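
The "split into 0 or more sections" test can be sketched with a small dynamic program. As before, this is an illustration under assumptions: E is given as a membership predicate and the example language is hypothetical.

```python
from typing import Callable


def in_star(w: str, in_e: Callable[[str], bool]) -> bool:
    """Return True if w can be split into 0 or more sections, each in E."""
    # ok[j] is True when the prefix w[:j] belongs to E*.
    ok = [False] * (len(w) + 1)
    ok[0] = True  # the empty string uses n = 0 sections
    for j in range(1, len(w) + 1):
        ok[j] = any(ok[i] and in_e(w[i:j]) for i in range(j))
    return ok[len(w)]


def in_plus(w: str, in_e: Callable[[str], bool]) -> bool:
    """E+ = E E*: a first section in E, followed by a string in E*."""
    return any(in_e(w[:i]) and in_star(w[i:], in_e) for i in range(len(w) + 1))


in_e = lambda s: s in {"ab", "c"}  # hypothetical example language

print(in_star("abcab", in_e))  # True: ab + c + ab
print(in_star("", in_e))       # True: zero sections
print(in_plus("", in_e))       # False: needs at least one section
```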

6
Common Regular Expression Operators (in XFST
notation)
  • concatenation EF
  • * + iteration E*, E+
  • | union E | F
  • E | F = {w : w ∈ E or w ∈ F} = E ∪ F
  • To pick a string in E | F, pick a string from
    either E or F.
  • To find out whether w ∈ E | F, check whether w ∈
    E or w ∈ F.

7
Common Regular Expression Operators (in XFST
notation)
  • concatenation EF
  • * + iteration E*, E+
  • | union E | F
  • & intersection E & F
  • E & F = {w : w ∈ E and w ∈ F} = E ∩ F
  • To pick a string in E & F, pick a string from E
    that is also in F.
  • To find out whether w ∈ E & F, check whether w ∈
    E and w ∈ F.

8
Common Regular Expression Operators (in XFST
notation)
  • concatenation EF
  • * + iteration E*, E+
  • | union E | F
  • & intersection E & F
  • ~ \ - complementation, minus ~E, \x, F-E
  • ~E = {e : e ∉ E} = Σ* - E
  • E - F = {e : e ∈ E and e ∉ F} = E & ~F
  • \E = Σ - E (any single character not in E)

Σ is the set of all letters, so Σ* is the set of all
strings
9
Regular Expressions
  • A language is a set of strings.
  • It is a regular language if there exists an FSA
    that accepts all the strings in the language, and
    no other strings.
  • If E and F denote regular languages, then so do
    EF, etc.
  • Regular expression: EF*|(F & G)+
  • Syntax

Semantics: Denotes a regular language. As
usual, can build semantics compositionally
bottom-up. E, F, G must be regular languages.
As a base case, e denotes {e} (a language
containing a single string), so ef*|(f&g)+ is
regular.
10
Regular Expressions for Regular Relations
  • A language is a set of strings.
  • It is a regular language if there exists an FSA
    that accepts all the strings in the language, and
    no other strings.
  • If E and F denote regular languages, then so do
    EF, etc.
  • A relation is a set of pairs; here, pairs of
    strings.
  • It is a regular relation if there exists an FST
    that accepts all the pairs in the relation, and
    no other pairs.
  • If E and F denote regular relations, then so do
    EF, etc.
  • EF = {(ef,e'f') : (e,e') ∈ E, (f,f') ∈ F}
  • Can you guess the definitions for E*, E+, E | F,
    E & F when E and F are regular relations?
  • Surprise: E & F isn't necessarily regular in the
    case of relations, so not supported.

11
Common Regular Expression Operators (in XFST
notation)
  • concatenation EF
  • * + iteration E*, E+
  • | union E | F
  • & intersection E & F
  • ~ \ - complementation, minus ~E, \x, F-E
  • .x. crossproduct E .x. F
  • E .x. F = {(e,f) : e ∈ E, f ∈ F}
  • Combines two regular languages into a regular
    relation.

12
Common Regular Expression Operators (in XFST
notation)
  • concatenation EF
  • * + iteration E*, E+
  • | union E | F
  • & intersection E & F
  • ~ \ - complementation, minus ~E, \x, F-E
  • .x. crossproduct E .x. F
  • .o. composition E .o. F
  • E .o. F = {(e,f) : ∃m. (e,m) ∈ E, (m,f) ∈ F}
  • Composes two regular relations into a regular
    relation.
  • As we've seen, this generalizes ordinary function
    composition.

13
Common Regular Expression Operators (in XFST
notation)
  • concatenation EF
  • * + iteration E*, E+
  • | union E | F
  • & intersection E & F
  • ~ \ - complementation, minus ~E, \x, F-E
  • .x. crossproduct E .x. F
  • .o. composition E .o. F
  • .u upper (input) language E.u "domain"
  • E.u = {e : ∃m. (e,m) ∈ E}

14
Common Regular Expression Operators (in XFST
notation)
  • concatenation EF
  • * + iteration E*, E+
  • | union E | F
  • & intersection E & F
  • ~ \ - complementation, minus ~E, \x, F-E
  • .x. crossproduct E .x. F
  • .o. composition E .o. F
  • .u upper (input) language E.u "domain"
  • .l lower (output) language E.l "range"
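
The relational operators (.x., .o., .u, .l) can be illustrated on small finite relations. This is a sketch under a strong assumption: relations are plain Python sets of string pairs, whereas regular relations are usually infinite and a toolkit operates on the machines instead. The function names are made up for the example.

```python
Pair = tuple[str, str]


def cross(e: set[str], f: set[str]) -> set[Pair]:
    """E .x. F: pair every string of E with every string of F."""
    return {(x, y) for x in e for y in f}


def compose(e: set[Pair], f: set[Pair]) -> set[Pair]:
    """E .o. F: keep (x, z) whenever some middle string links them."""
    return {(x, z) for (x, m) in e for (m2, z) in f if m == m2}


def upper(r: set[Pair]) -> set[str]:
    """E.u: the input side (domain)."""
    return {x for (x, _) in r}


def lower(r: set[Pair]) -> set[str]:
    """E.l: the output side (range)."""
    return {y for (_, y) in r}


names = cross({"Wilbur"}, {"pig"})               # {("Wilbur", "pig")}
colors = {("pig", "pink"), ("elephant", "gray")}

print(compose(names, colors))                    # {('Wilbur', 'pink')}
print(upper(colors) == {"pig", "elephant"})      # True
```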

15
Function from strings to ...
false, true
strings
numbers
(string, num) pairs
16
How to implement?
slide courtesy of L. Karttunen (modified)
  • concatenation EF
  • * + iteration E*, E+
  • | union E | F
  • ~ \ - complementation, minus ~E, \x, E-F
  • & intersection E & F
  • .x. crossproduct E .x. F
  • .o. composition E .o. F
  • .u upper (input) language E.u "domain"
  • .l lower (output) language E.l "range"

17
Concatenation
example courtesy of M. Mohri
18
Union
example courtesy of M. Mohri

19
Closure (this example has outputs too)
example courtesy of M. Mohri

The loop creates (red machine)+. Then we add a
state to get ε | (red machine)+. Why do it
this way? Why not just make state 0 final?
20
Upper language (domain)
example courtesy of M. Mohri
.u
Similarly construct the lower language .l. These are also
called the input and output languages.
21
Reversal
example courtesy of M. Mohri
22
Inversion
example courtesy of M. Mohri
23
Complementation
  • Given a machine M, represent all strings not
    accepted by M
  • Just change final states to non-final and
    vice-versa
  • Works only if machine has been determinized and
    completed first (why?)
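
The completion-then-flip recipe can be sketched as follows. This is a minimal illustration, assuming the machine is already deterministic and is stored as a transition dictionary; the names `complete`, `complement`, `accepts` and the sink label "SINK" are hypothetical. It also hints at the "why": without the sink state, a string that falls off the machine would be rejected both before and after flipping.

```python
def complete(states, alphabet, delta, sink="SINK"):
    """Add a non-final sink so every (state, symbol) pair has a transition."""
    all_states = set(states) | {sink}
    full = dict(delta)
    for q in all_states:
        for a in alphabet:
            full.setdefault((q, a), sink)
    return all_states, full


def complement(states, alphabet, delta, start, finals):
    """Complete the (already deterministic) machine, then flip final states."""
    all_states, full = complete(states, alphabet, delta)
    return all_states, full, start, all_states - set(finals)


def accepts(delta, start, finals, w):
    """Run a complete DFA on the string w."""
    q = start
    for a in w:
        q = delta[(q, a)]
    return q in finals


# Hypothetical DFA over {a, b} that accepts only "ab".
states, alphabet = {0, 1, 2}, {"a", "b"}
delta = {(0, "a"): 1, (1, "b"): 2}
c_states, c_delta, c_start, c_finals = complement(states, alphabet, delta, 0, {2})

print(accepts(c_delta, c_start, c_finals, "ab"))   # False
print(accepts(c_delta, c_start, c_finals, "abb"))  # True
print(accepts(c_delta, c_start, c_finals, "b"))    # True
```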

24
Intersection
example adapted from M. Mohri
[Figure: weighted machine with arcs fat/0.5, pig/0.3, eats/0, sleeps/0.6]

25
Intersection

Paths 0→0→1→2 and 0→1→1→0 both accept "fat pig eats". So
must the new machine, along path (0,0) → (0,1) → (1,1) →
(2,0).
26
Intersection
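[Figure: first machine with arcs fat/0.5, pig/0.3, eats/0, sleeps/0.6; second machine with arcs pig/0.4, fat/0.2, sleeps/1.3, eats/0.6; new machine so far: state (0,0)]
Paths 0→0 and 0→1 both accept "fat". So must the
new machine, along path (0,0) → (0,1).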
27
Intersection
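[Figure: first machine with arcs fat/0.5, pig/0.3, eats/0, sleeps/0.6; second machine with arcs pig/0.4, fat/0.2, sleeps/1.3, eats/0.6; new machine so far: arc fat/0.7, states (0,0), (0,1)]
Paths 0→0 and 1→1 both accept "pig". So must the
new machine, along path (0,1) → (1,1).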
28
Intersection
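[Figure: first machine with arcs fat/0.5, pig/0.3, eats/0, sleeps/0.6; second machine with arcs pig/0.4, sleeps/1.3, fat/0.2, eats/0.6; new machine so far: arcs fat/0.7, pig/0.7, states (0,0), (0,1), (1,1)]
Paths 1→2 and 1→2 both accept fat. So must the
new machine, along path (1,1) → (2,2).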
29
Intersection
[Figure: first machine with arcs fat/0.5, eats/0, pig/0.3, sleeps/0.6; second machine with arcs pig/0.4, sleeps/1.3, fat/0.2, eats/0.6; new machine: arcs fat/0.7, pig/0.7, sleeps/1.9, states (0,0), (0,1), (1,1)]
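
The step-by-step construction on these slides can be sketched as a product construction in Python. This is an illustration, not the toolkit's implementation: the arc weights come from the slides, but the state topology of the two machines is a guess reconstructed from the path captions, and final states are left out for brevity.

```python
from collections import deque


def intersect(arcs1, start1, arcs2, start2):
    """Product construction: match arcs by label and add their weights."""
    start = (start1, start2)
    arcs = {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in arcs:
            continue  # already expanded
        q1, q2 = state
        arcs[state] = []
        for label1, w1, r1 in arcs1.get(q1, []):
            for label2, w2, r2 in arcs2.get(q2, []):
                if label1 == label2:
                    arcs[state].append((label1, w1 + w2, (r1, r2)))
                    queue.append((r1, r2))
    return start, arcs


# Each machine maps a state to a list of (label, weight, next_state).
# The topology below is assumed; only the labels and weights are from the slides.
m1 = {0: [("fat", 0.5, 0), ("pig", 0.3, 1)],
      1: [("eats", 0.0, 2), ("sleeps", 0.6, 2)]}
m2 = {0: [("fat", 0.2, 1)],
      1: [("pig", 0.4, 1), ("eats", 0.6, 0), ("sleeps", 1.3, 2)]}

start, product = intersect(m1, 0, m2, 0)
for state, out in product.items():
    for label, weight, target in out:
        print(state, f"{label}/{weight:.1f}", target)
```

Under this assumed topology, only state pairs reachable from (0,0) are built, and the new arcs carry the summed weights fat/0.7, pig/0.7, eats/0.6 and sleeps/1.9.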
30
What Composition Means
f
ab?d
abcd
31
What Composition Means
[Figure: ab?d maps to αβγδ, αβεδ, αβ∈δ, ...]
Relation composition: f ∘ g
32
Relation set of pairs
ab?d → abcd, ab?d → abed, ab?d → abjd
abcd → αβγδ, abed → αβεδ, abed → αβ∈δ
[Figure: f maps ab?d to abcd]
33
Relation set of pairs
ab?d → abcd, ab?d → abed, ab?d → abjd
abcd → αβγδ, abed → αβεδ, abed → αβ∈δ
ab?d → αβγδ, ab?d → αβεδ, ab?d → αβ∈δ
[Figure: ab?d maps to αβγδ, αβεδ, αβ∈δ, ...; the numbers 4, 2 and 8 label the mappings]
34
Intersection vs. Composition
Intersection
pig/0.3 & pig/0.4 = pig/0.7, on the new arc from (0,1) to (1,1)
Composition
Wilbur:pig/0.3 .o. pig:pink/0.4 = Wilbur:pink/0.7, on the new arc from (0,1) to (1,1)
35
Intersection vs. Composition
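Intersection mismatch
pig/0.3 & elephant/0.4: the labels differ, so there is no pig/0.7 arc from (0,1) to (1,1)
Composition mismatch
Wilbur:pig/0.3 .o. elephant:gray/0.4: the intermediate symbols differ, so there is no Wilbur:gray/0.7 arc from (0,1) to (1,1)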
36
Composition
example courtesy of M. Mohri
.o.

37
Composition
.o.

a:b .o. b:b = a:b
38
Composition
.o.

a:b .o. b:a = a:a
39
Composition
.o.

a:b .o. b:a = a:a
40
Composition
.o.

b:b .o. b:a = b:a
41
Composition
.o.

a:b .o. b:a = a:a
42
Composition
.o.

a:a .o. a:b = a:b
43
Composition
.o.

b:b .o. a:b = nothing (since intermediate symbol
doesn't match)
44
Composition
.o.

b:b .o. b:a = b:a
45
Composition
.o.

ab .o. ab ab
46
Composition in Dyna
start = pair( start1, start2 ).
final(pair(Q1,Q2)) :- final1(Q1), final2(Q2).
edge(U, L, pair(Q1,Q2), pair(R1,R2)) min= edge1(U, Mid, Q1, R1) + edge2(Mid, L, Q2, R2).
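
The same idea can be sketched in Python: pair up the states, match the lower symbol of an edge in the first machine with the upper symbol of an edge in the second, add the weights, and keep the minimum. This is a rough analogue under assumptions, not a translation of Dyna semantics: the dictionary layout and function names are hypothetical, and epsilon symbols get no special treatment.

```python
def compose_edges(edges1, edges2):
    """Each edge is (upper, lower, weight, source, target)."""
    best = {}
    for u, mid1, w1, q1, r1 in edges1:
        for mid2, l, w2, q2, r2 in edges2:
            if mid1 == mid2:  # intermediate symbols must agree
                key = (u, l, (q1, q2), (r1, r2))
                best[key] = min(best.get(key, float("inf")), w1 + w2)
    return best


def compose(fst1, fst2):
    """An FST here is a dict with 'start', 'finals' and 'edges' (assumed layout)."""
    return {
        "start": (fst1["start"], fst2["start"]),
        "finals": {(f1, f2) for f1 in fst1["finals"] for f2 in fst2["finals"]},
        "edges": compose_edges(fst1["edges"], fst2["edges"]),
    }


t1 = {"start": 0, "finals": {1}, "edges": [("Wilbur", "pig", 0.3, 0, 1)]}
t2 = {"start": 0, "finals": {1},
      "edges": [("pig", "pink", 0.4, 0, 1), ("elephant", "gray", 0.4, 0, 1)]}

result = compose(t1, t2)
for (u, l, src, dst), w in result["edges"].items():
    print(f"{u}:{l}/{w:.1f}", src, "->", dst)
```

Only the Wilbur:pig edge and the pig:pink edge share an intermediate symbol, so the composed machine has a single Wilbur:pink edge; the elephant:gray edge finds no partner.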
47
Relation set of pairs
ab?d → abcd, ab?d → abed, ab?d → abjd
abcd → αβγδ, abed → αβεδ, abed → αβ∈δ
ab?d → αβγδ, ab?d → αβεδ, ab?d → αβ∈δ
[Figure: ab?d maps to αβγδ, αβεδ, αβ∈δ, ...; the numbers 4, 2 and 8 label the mappings]
48
3 Uses of Set Composition
  • Feed string into Greek transducer
  • {abed → abed} .o. Greek = {abed → αβεδ, abed → αβ∈δ}
  • {abed} .o. Greek = {abed → αβεδ, abed → αβ∈δ}
  • [{abed} .o. Greek].l = {αβεδ, αβ∈δ}
  • Feed several strings in parallel
  • {abcd, abed} .o. Greek = {abcd → αβγδ,
    abed → αβεδ, abed → αβ∈δ}
  • [{abcd, abed} .o. Greek].l = {αβγδ, αβεδ, αβ∈δ}
  • Filter result via Noε = {αβγδ, αβ∈δ, ...}
  • {abcd, abed} .o. Greek .o. Noε = {abcd → αβγδ,
    abed → αβ∈δ}

49
What are the basic transducers?
  • The operations on the previous slides combine
    transducers into bigger ones
  • But where do we start?
  • a:ε for a ∈ Σ
  • ε:x for x ∈ Δ
  • Q: Do we also need a:x? How about ε:ε?

50
Some Xerox Extensions
slide courtesy of L. Karttunen (modified)
  • $ containment
  • => restriction
  • -> @-> replacement
  • Make it easier to describe complex languages and
    relations without extending the formal power of
    finite-state systems.

51
Containment
Warning: ? in regexps means "any character at
all." But ? in machines means "any character not
explicitly mentioned anywhere in the machine."
52
Restriction
slide courtesy of L. Karttunen (modified)
53
Replacement
slide courtesy of L. Karttunen (modified)
54
Replacement is Nondeterministic
55
Replacement is Nondeterministic
56
Replacement is Nondeterministic
slide courtesy of L. Karttunen (modified)
a b | b | b a | a b a -> x applied to "aba"
Four overlapping substrings match; we haven't told it
which one to replace, so it chooses
nondeterministically.
Input:  a b a   a b a   a b a   a b a
Output: a x a   a x     x a     x
57
More Replace Operators
slide courtesy of L. Karttunen
  • Optional replacement: a b (->) b a
  • Directed replacement
  • guarantees a unique result by constraining the
    factorization of the input string by
  • Direction of the match (rightward or leftward)
  • Length (longest or shortest)

58
_at_-gt Left-to-right, Longest-match Replacement
slide courtesy of L. Karttunen
a b | b | b a | a b a @-> x applied to "aba"
[Figure: the four candidate outputs a x a, a x, x a, x; only the
left-to-right, longest match survives, giving x]
@-> left-to-right, longest match
@> left-to-right, shortest match
->@ right-to-left, longest match
>@ right-to-left, shortest match
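
The left-to-right variants can be sketched directly in Python for a finite set of pattern strings. This is a simplified illustration, not how XFST compiles the operator: the function name and signature are hypothetical, and the right-to-left variants are not covered.

```python
def replace_directed(text, patterns, replacement, longest=True):
    """Scan left to right; at each position replace the longest
    (or shortest) pattern that starts there, then continue after it."""
    out = []
    i = 0
    while i < len(text):
        lengths = [len(p) for p in patterns if p and text.startswith(p, i)]
        if lengths:
            n = max(lengths) if longest else min(lengths)
            out.append(replacement)
            i += n  # skip past the replaced substring
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


patterns = {"ab", "b", "ba", "aba"}
print(replace_directed("aba", patterns, "x"))                 # x
print(replace_directed("aba", patterns, "x", longest=False))  # xa
```

With the longest match the whole of "aba" is consumed at position 0; with the shortest match only "ab" is replaced and the final "a" is copied through.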
59
Using "..." for marking
slide courtesy of L. Karttunen (modified)
p o t a t o → p[o]t[a]t[o]
Note: actually have to write as -> %[ ... %]
or -> "[" ... "]" since [] are parens
in the regexp language
60
Using "..." for marking
slide courtesy of L. Karttunen (modified)
p o t a t o → p[o]t[a]t[o]
Which way does the FST transduce potatoe?
p o t a t o e → p[o]t[a]t[oe]
vs.
p o t a t o e → p[o]t[a]t[o][e]
How would you change it to get the other answer?
61
Example Finnish Syllabification
slide courtesy of L. Karttunen
define C [ b | c | d | f ... ] ;
define V [ a | e | i | o | u ] ;
[C* V+ C*] @-> ... "-" || _ [C V]
"Insert a hyphen after the longest instance of the C* V+ C*
pattern in front of a C V pattern."
s t r u k t u r a l i s m i → s t r u k - t u - r a - l i s - m i
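
A rough Python analogue of this rule uses a regular expression with a lookahead for the "in front of a C V pattern" context. Two assumptions to note: the consonant list below is a stand-in for the slide's elided list, and Python's greedy backtracking is not the same mechanism as XFST's longest-match replacement, although it yields the slide's result on this word.

```python
import re

C = "bcdfghjklmnpqrstvwxz"  # stand-in for the slide's elided consonant list
V = "aeiou"                 # vowels as defined on the slide

# C* V+ C*, but only where a consonant-vowel pair follows.
SYLLABLE = re.compile(rf"[{C}]*[{V}]+[{C}]*(?=[{C}][{V}])")


def syllabify(word: str) -> str:
    """Insert a hyphen after each match of the syllable pattern."""
    return SYLLABLE.sub(lambda m: m.group(0) + "-", word)


print(syllabify("strukturalismi"))  # struk-tu-ra-lis-mi
```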
62
Conditional Replacement
slide courtesy of L. Karttunen
63
Hand-Coded Example Parsing Dates
slide courtesy of L. Karttunen
Today is Tuesday, July 25, 2000.
Need left-to-right, longest-match constraints.
64
Source code Language of Dates
slide courtesy of L. Karttunen
  • Day = Monday | Tuesday | ... | Sunday
  • Month = January | February | ... | December
  • Date = 1 | 2 | 3 | ... | 3 1
  • Year = 0To9 (0To9 (0To9 (0To9))) - %0?*
    (from 1 to 9999)
  • AllDates = Day | (Day ", ") Month " " Date (", "
    Year)

65
Object code All Dates from 1/1/1 to 12/31/9999
slide courtesy of L. Karttunen
[Figure: compiled AllDates network; arcs are labeled with month abbreviations (Jan through Dec) and commas]
13 states, 96 arcs; 29 760 007 date expressions
66
Parser for Dates
slide courtesy of L. Karttunen (modified)
Compiles into an unambiguous transducer (23
states, 332 arcs).
AllDates @-> "[DT " ... "]"
Today is [DT Tuesday, July 25, 2000] because
yesterday was [DT Monday] and it was [DT July 24]
so tomorrow must be [DT Wednesday, July 26] and
not [DT July 27] as it says on the program.
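
The date marker can be approximated with Python's `re` module. This is a sketch under assumptions: the pattern spells out the day and month names that the slides abbreviate with "...", and because Python alternation is ordered rather than longest-match, the alternatives are arranged so that the longer date forms are tried first.

```python
import re

DAY = "Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"
MONTH = ("January|February|March|April|May|June|July|"
         "August|September|October|November|December")
DATE = r"(?:3[01]|[12][0-9]|[1-9])"  # 1 to 31, two-digit options first
YEAR = r"[1-9][0-9]{0,3}"            # 1 to 9999, no leading zero

# (Day ", ") Month " " Date (", " Year), or a bare Day.
ALL_DATES = re.compile(
    rf"(?:(?:{DAY}), )?(?:{MONTH}) {DATE}(?:, {YEAR})?|(?:{DAY})"
)


def mark_dates(text: str) -> str:
    """Wrap every date expression in [DT ...] brackets."""
    return ALL_DATES.sub(lambda m: f"[DT {m.group(0)}]", text)


sentence = ("Today is Tuesday, July 25, 2000 because yesterday was Monday "
            "and it was July 24 so tomorrow must be Wednesday, July 26 "
            "and not July 27 as it says on the program.")
print(mark_dates(sentence))
```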
67
Problem of Reference
slide courtesy of L. Karttunen
Valid dates:
Tuesday, July 25, 2000
Tuesday, February 29, 2000
Monday, September 16, 1996
Invalid dates:
Wednesday, April 31, 1996
Thursday, February 29, 1900
Tuesday, July 26, 2000
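
The three ways a date expression can be wrong (too many days for the month, a non-leap February 29, or a weekday that does not match) can be checked procedurally with Python's `datetime` module. This is only a cross-check for the examples, not the finite-state approach of the slides; the function name is hypothetical, and `datetime` assumes the proleptic Gregorian calendar for years 1 to 9999.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def is_valid_date(weekday: str, month: str, day: int, year: int) -> bool:
    """True if the calendar date exists and falls on the given weekday."""
    try:
        d = date(year, MONTHS.index(month) + 1, day)
    except ValueError:  # unknown month name, April 31, Feb 29 in a non-leap year
        return False
    return WEEKDAYS[d.weekday()] == weekday


print(is_valid_date("Tuesday", "July", 25, 2000))       # True
print(is_valid_date("Wednesday", "April", 31, 1996))    # False
print(is_valid_date("Thursday", "February", 29, 1900))  # False
print(is_valid_date("Tuesday", "July", 26, 2000))       # False
```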
68
Refinement by Intersection
slide courtesy of L. Karttunen (modified)
Valid Dates
Q: Why does this rule end with a comma? Q: Can we
write the whole rule?
Q: Why do these rules start with spaces? (And is
it enough?)
69
Defining Valid Dates
slide courtesy of L. Karttunen
AllDates: 13 states, 96 arcs, 29 760 007 date
expressions
AllDates & MaxDaysInMonth & LeapYears & WeekdayDates
= ValidDates
ValidDates: 805 states, 6472 arcs, 7 307 053 date
expressions
70
Parser for Valid and Invalid Dates
slide courtesy of L. Karttunen
[AllDates - ValidDates] @-> "[ID " ... "]"
, ValidDates @-> "[VD " ... "]"
2688 states, 20439 arcs