Finite state subautomata: Application to Electronic Dictionaries

1
Finite state subautomata: Application to Electronic Dictionaries
Lamia Tounsi
Polytech'Tours, Computer Science Laboratory, François Rabelais University of Tours, France
Lamia.tounsi_at_univ-tours.fr
2
Motivation
  • DFSA are widely used in natural language processing
  • Find all substructures in a given FSA
  • Search for subautomata in a DFSA
  • Decompose a very large FSA into smaller ones
  • Discover frequently occurring data
  • Reduce memory consumption

3
Plan
  • Mathematical preliminaries
  • Automaton
  • Subautomaton
  • Search for subautomata
  • Smallest closed subautomaton
  • Smallest subautomaton
  • Application to automata representing dictionaries
  • Indexing and compression
  • Conclusion

5
Automaton
  • A deterministic acyclic automaton $A = \langle \Sigma, Q, \delta, q_i, q_f \rangle$
  • $\Sigma$ is the alphabet
  • $Q$ is the finite set of states
  • $\delta$ is the transition function, $\delta : Q \times \Sigma \to Q$
  • $q_i$ is the initial state ($q_i \in Q$)
  • $q_f$ is the final state ($q_f \in Q$)
  • Let $a \in \Sigma$ and $w \in \Sigma^*$
  • $\delta(p, \varepsilon) = p$
  • $\delta(p, wa) = \delta(\delta(p, w), a)$
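
The recursive definition of $\delta$ on words amounts to a left-to-right scan of the word. The following is a minimal Python sketch of that reading, not the author's implementation; the toy automaton `DELTA`, its state numbers, and the helper names `delta_star` and `accepts` are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy automaton: states are integers and delta is stored as a
# partial map from (state, symbol) to the next state.
Delta = Dict[Tuple[int, str], int]

DELTA: Delta = {
    (0, "a"): 1,
    (0, "b"): 2,
    (1, "c"): 3,
    (2, "c"): 3,
}
Q_I, Q_F = 0, 3  # initial and final state of the toy automaton


def delta_star(delta: Delta, p: int, w: str) -> Optional[int]:
    """Extended transition function: delta(p, eps) = p and
    delta(p, wa) = delta(delta(p, w), a). Returns None if a step is undefined."""
    state: Optional[int] = p
    for a in w:
        if state is None:
            return None
        state = delta.get((state, a))
    return state


def accepts(delta: Delta, w: str) -> bool:
    """A word is recognized when reading it from q_i ends in q_f."""
    return delta_star(delta, Q_I, w) == Q_F
```

With this toy automaton, `accepts(DELTA, "ac")` and `accepts(DELTA, "bc")` hold, while `"ab"` is rejected because no transition on `b` leaves state 1.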

6
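Successors and predecessors
  • $\mathrm{Succ}(p) = \{ q \in Q \mid \exists \sigma \in \Sigma,\ \delta(p, \sigma) = q \}$
  • $\mathrm{Succ}^*(p) = \{ q \in Q \mid \exists w \in \Sigma^*,\ \delta(p, w) = q \}$
  • $\mathrm{Pred}(p) = \{ q \in Q \mid \exists \sigma \in \Sigma,\ \delta(q, \sigma) = p \}$
  • $\mathrm{Pred}^*(p) = \{ q \in Q \mid \exists w \in \Sigma^*,\ \delta(q, w) = p \}$
  • Height
  • $H(q_f) = 0$
  • $H(p) = \max_{q \in \mathrm{Succ}(p)} H(q) + 1$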
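
The successor, predecessor and height definitions can be sketched directly on an adjacency map. This is an illustrative Python sketch under stated assumptions: the toy graph `EDGES`, the final state `Q_F`, and the function names are hypothetical, and transition labels are ignored because only reachability matters here.

```python
# Hypothetical toy DAG: each state maps to the list of its direct successors.
EDGES = {0: [1, 2], 1: [3], 2: [3, 4], 3: [4], 4: []}
Q_F = 4  # unique final state


def succ(p):
    """Direct successors of p (one transition)."""
    return set(EDGES[p])


def pred(p):
    """Direct predecessors of p (one transition)."""
    return {q for q, targets in EDGES.items() if p in targets}


def succ_star(p):
    """States reachable from p by any word, including the empty word."""
    seen, stack = {p}, [p]
    while stack:
        for r in EDGES[stack.pop()]:
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def height(p):
    """H(q_f) = 0 and H(p) = max over direct successors of H(q), plus 1."""
    if p == Q_F:
        return 0
    return 1 + max(height(q) for q in succ(p))
```

On this graph the height of state 0 is 3, since its longest path to the final state is 0, 1, 3, 4.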

7
Automaton
[Figure: an automaton that recognizes the inflection of nine verbs]
12
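Source(E) and Initial State(p)
  • Let $E \neq \emptyset$
  • $AP(E) = \{ w \mid w \text{ is a path from } q_i \text{ to } p,\ p \in E \}$
  • $AN(E) = \{ p \in Q \mid \forall w \in AP(E),\ p \in w \}$
  • $\mathrm{Source}(E) \in AN(E)$
  • $H(\mathrm{Source}(E)) = \min_{q \in AN(E)} H(q)$
  • Let $p \in Q$, $p \neq q_i$
  • $IS(p) = \mathrm{Source}(\mathrm{Pred}(p))$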
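
Read literally, Source(E) is the state of smallest height among the states that lie on every path from the initial state to a state of E. The sketch below follows that reading by brute-force path enumeration, which is only reasonable for a toy graph; the graph `EDGES` and all function names are assumptions for illustration, not the method used in the presentation.

```python
# Hypothetical toy DAG: state -> direct successors.
EDGES = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}
Q_I, Q_F = 0, 5


def height(p):
    """H(q_f) = 0, otherwise 1 + the largest height of a direct successor."""
    return 0 if p == Q_F else 1 + max(height(q) for q in EDGES[p])


def paths(src, dst):
    """Yield every path (list of states) from src to dst in the DAG."""
    if src == dst:
        yield [src]
        return
    for nxt in EDGES[src]:
        for rest in paths(nxt, dst):
            yield [src] + rest


def source(E):
    """Source(E): the state of AN(E) with minimal height."""
    ap = [set(w) for p in E for w in paths(Q_I, p)]  # AP(E), as state sets
    an = set.intersection(*ap)                       # AN(E): on every path
    return min(an, key=height)


def initial_state(p):
    """IS(p) = Source(Pred(p)), defined for p != q_i."""
    preds = {q for q, targets in EDGES.items() if p in targets}
    return source(preds)
```

Here states 2 and 3 are both reached only through state 1, so `source({2, 3})` and `initial_state(4)` both return 1.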

13
Source(E) and Initial State(p)
  • $\mathrm{Source}(\{q_2, q_3, q_5\}) = \mathrm{Source}(\{q_3, q_4\}) = q_2$
  • $\mathrm{Source}(\{q_3, q_4, q_5\}) = \mathrm{Source}(\{q_3, q_4, q_5, q_6\}) = q_1$
  • $IS(q_3) = q_2$
  • $IS(q_5) = q_1$
  • $IS(q_6) = q_1$

14
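Sink(E) and Final State(p)
  • Let $E \neq \emptyset$
  • $PP(E) = \{ w \mid w \text{ is a path from } p \text{ to } q_f,\ p \in E \}$
  • $PN(E) = \{ p \in Q \mid \forall w \in PP(E),\ p \in w \}$
  • $\mathrm{Sink}(E) \in PN(E)$
  • $H(\mathrm{Sink}(E)) = \max_{q \in PN(E)} H(q)$
  • Let $p \in Q$, $p \neq q_f$
  • $FS(p) = \mathrm{Sink}(\mathrm{Succ}(p))$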

15
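Subautomaton (SA)
  • $A' = \langle \Sigma, Q', \delta', s_i, s_f \rangle$ is a subautomaton of $A$ iff
  • $Q' \subseteq Q$
  • $s_i, s_f \in Q'$
  • $\delta' : Q' \times \Sigma \to Q'$
  • $\forall (q, \sigma) \in Q' \times \Sigma$: $\delta'(q, \sigma) = \delta(q, \sigma)$
  • $\forall q \in Q'$: $q \in \mathrm{Succ}^*(s_i)$ and $q \in \mathrm{Pred}^*(s_f)$
  • $\forall q \in Q' \setminus \{s_i, s_f\}$: $\mathrm{Succ}(q) \subseteq Q'$ and $\mathrm{Pred}(q) \subseteq Q'$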

16
Subautomaton (SA)
SA
[Figure: an automaton that recognizes the inflection of nine verbs]
20
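Closed subautomaton (CSA)
  • Let $Q' \subseteq Q$ and $s_i$, $s_f$ two distinct states
  • A subautomaton $A' = \langle \Sigma, Q', \delta', s_i, s_f \rangle$ is a closed subautomaton iff
  • $\forall q \in Q' \setminus \{s_i\}$: $\mathrm{Pred}(q) \subseteq Q'$
  • $\forall q \in Q' \setminus \{s_f\}$: $\mathrm{Succ}(q) \subseteq Q'$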
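
The subautomaton and closed-subautomaton conditions are set inclusions, so they can be checked mechanically. The following Python sketch tests a candidate state set against both definitions; the toy graph `EDGES` and the function names are hypothetical, transition labels are ignored, and the reachability conditions are read as reflexive-transitive closures.

```python
# Hypothetical toy DAG: state -> direct successors.
EDGES = {0: [1], 1: [2, 3, 5], 2: [4], 3: [4], 4: [5], 5: []}


def succ(q):
    return set(EDGES[q])


def pred(q):
    return {p for p, targets in EDGES.items() if q in targets}


def reach(start, step):
    """Reflexive-transitive closure of `step` (succ or pred) from `start`."""
    seen, stack = {start}, [start]
    while stack:
        for r in step(stack.pop()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def is_subautomaton(qs, si, sf):
    """SA: every state lies between si and sf, inner states are closed."""
    if not ({si, sf} <= qs <= set(EDGES)):
        return False
    if not (qs <= reach(si, succ) and qs <= reach(sf, pred)):
        return False
    return all(succ(q) <= qs and pred(q) <= qs for q in qs - {si, sf})


def is_closed(qs, si, sf):
    """CSA: additionally, only si may be entered and only sf may be left."""
    return (
        si != sf
        and is_subautomaton(qs, si, sf)
        and all(pred(q) <= qs for q in qs - {si})
        and all(succ(q) <= qs for q in qs - {sf})
    )
```

In this toy graph, `{1, 2, 3, 4}` with `si = 1` and `sf = 4` is a subautomaton but not a closed one, because state 1 also has a transition to state 5 outside the set; extending the set to `{1, 2, 3, 4, 5}` with `sf = 5` makes it closed.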

21
Closed subautomaton (CSA)
?CSA
[Figure: an automaton that recognizes the inflection of nine verbs]
22
Closed subautomaton (CSA)
CSA
[Figure: an automaton that recognizes the inflection of nine verbs]
24
Smallest closed subautomaton (SCSA)
  • Let $Q' \subseteq Q$ and $s_i$, $s_f$ two distinct states
  • A closed subautomaton $A' = \langle \Sigma, Q', \delta', s_i, s_f \rangle$ is a smallest closed subautomaton iff, $\forall q \in Q'$:
  • $(s_i, q)$ is a CSA $\Rightarrow q = s_f$
  • $(q, s_f)$ is a CSA $\Rightarrow q = s_i$

25
Smallest Closed subautomaton (SCSA)
?SCSA
SCSA
SCSA
SCSA
[Figure: an automaton that recognizes the inflection of nine verbs]
26
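Smallest subautomaton (SSA)
  • Let $p \in Q \setminus \{s_i, s_f\}$
  • The subautomaton $A' = \langle \Sigma, Q', \delta', s_i, s_f \rangle$ is SSA(p) iff
  • $A'$ strictly contains $p$
  • $\forall A'' = \langle \Sigma, Q'', \delta'', s_i'', s_f'' \rangle$ which strictly contains $p$: $Q' \subseteq Q''$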

27
Smallest subautomaton (SSA)
SSA(6)
SSA(18)
[Figure: an automaton that recognizes the inflection of nine verbs]
29
Search for SCSA
  • Property 1.
  • $(s_i, s_f)$ is an SCSA iff $IS(s_f) = s_i$ and $FS(s_i) = s_f$
  • Property 2. (Associativity)
  • If $E = E_1 \cup E_2$ and $E_1 \neq \emptyset$, $E_2 \neq \emptyset$ then
  • $\mathrm{Source}(E) = \mathrm{Source}(\{\mathrm{Source}(E_1), \mathrm{Source}(E_2)\})$
  • Property 3. (Hierarchy between two SCSA)
  • Either they have no common transitions,
  • or one is strictly included in the other.

30
Search for SCSA
  • Let $p \in Q$
  • p.IS: initial state associated with p.
  • p.FSmin: minimal final state associated with p, assuming that p is the initial state of an SCSA.
  • p.FSmax: maximal final state associated with p, assuming that p is the initial state of an SCSA.
  • Property 4.
  • $\forall p > q_i$: (p.IS, p) is an SCSA iff $p.\mathrm{IS}.\mathrm{FSmin} \le p \le p.\mathrm{IS}.\mathrm{FSmax}$
  • Algorithm complexity: $O(n^2)$

31
Search for SCSA
33
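Search for SSA
  • Let $A' = \langle \Sigma, Q', \delta', s_i, s_f \rangle$ be a subautomaton
  • Property 5.
  • $\forall E \subseteq Q' \setminus \{s_f\}$: $\mathrm{Succ}(s_i) \cap \mathrm{Pred}(E) \subseteq Q'$
  • $\forall E \subseteq Q' \setminus \{s_i\}$: $\mathrm{Pred}(s_f) \cap \mathrm{Succ}(E) \subseteq Q'$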

34
SSA associated to grey states
Search for SSA
  • Property 6.
  • Let $p, p', q, q' \in Q$
  • $p, p' \in \mathrm{Pred}(q)$ and $q, q' \in \mathrm{Succ}(p)$
  • $H(p) = H(p')$ and $H(q) = H(q')$
  • Then $p$ and $q$ belong to the same SSA

40
All subautomata of an automaton
  • Algorithm: input A, output the subautomata of A
      repeat
        repeat
          detect, store and replace each parallel by one transition
          detect, store and replace each sequence by one transition
        until the automaton is free of parallels and sequences
        detect, store and replace each smallest subautomaton by one transition
      until the automaton A is reduced to a single transition

Valdes J., Tarjan R. E., Lawler E. L., The recognition of series parallel digraphs, SIAM J. Comput. 11(2):298-313, 1982.
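
The inner loop of the algorithm above (parallel and sequence reduction) can be sketched as follows. This is a simplified Python illustration, not the author's implementation: transitions are plain `(source, target)` pairs without labels, duplicates in the list stand for parallel transitions, the function name `reduce_series_parallel` is hypothetical, and the outer step that replaces smallest subautomata is left out.

```python
from collections import Counter


def reduce_series_parallel(edges, q_i, q_f):
    """Repeatedly merge parallels and sequences in a DAG multigraph.

    edges: list of (src, dst) pairs. Returns (reduced edges, log of merges).
    """
    edges = list(edges)
    log = []
    changed = True
    while changed:
        changed = False
        # Parallel reduction: several transitions between the same two states.
        counts = Counter(edges)
        for (u, v), n in counts.items():
            if n > 1:
                log.append(("parallel", u, v, n))
                changed = True
        edges = list(counts)  # keep one copy of each (u, v)
        # Sequence reduction: an inner state with exactly one incoming and
        # one outgoing transition is replaced by a single transition.
        indeg = Counter(v for _, v in edges)
        outdeg = Counter(u for u, _ in edges)
        for s in sorted(set(indeg) & set(outdeg)):
            if s not in (q_i, q_f) and indeg[s] == 1 and outdeg[s] == 1:
                (u,) = [a for a, b in edges if b == s]
                (v,) = [b for a, b in edges if a == s]
                edges = [e for e in edges if s not in e] + [(u, v)]
                log.append(("sequence", u, s, v))
                changed = True
                break  # degrees are stale now, recompute them
    return edges, log
```

A diamond followed by a tail, `[(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]`, collapses to the single transition `(0, 4)`. A bridge-shaped graph such as `[(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]` contains no parallel or sequence, so this sketch leaves it unchanged; that is the situation in which the outer step of the algorithm is needed.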
41
All Subautomata of an automaton
48
Dictionaries and automata
  • 10 dictionaries, with words in lexicographic order
  • 6 Delaf: French, English, Serbian, German, English multiword units (polylexical), French cities
  • 4 Web: French, Hungarian, Bulgarian and Portuguese
  • Properties of the automata
  • Finite set of states, acyclic, deterministic, unique initial state, unique final state, minimal

49
Internal structure of automata
50
Internal structure of automata
51
Experimental Results
53
Factorization, indexing and compression
  • The search for subautomata detects sequences and parallels
  • Sequence subautomaton
  • Parallel subautomaton
  • Proposal
  • Apply the directed acyclic word graph (DAWG), originally designed for indexing text, to index the subautomata
  • Use a heuristic to select the most profitable substructure to factorize

54
Storage of an automaton
55
Factorization
[Figure: factorization example with transitions labelled a, b and c]
56
Factorization
57
How can we choose the subautomata to factorize?
  • The best candidates for factorization are those that increase storage efficiency and reduce the size of the initial automaton
  • $\mathrm{Profit} = \text{saved memory} - \text{consumed memory}$
  • Memory is saved by eliminating all occurrences of the substructure
  • Memory is consumed by the extension of the alphabet and by the index

58
Directed acyclic word graph (DAWG)
Computation of the frequency and profit associated with each sequence, using a DAWG
59
Greedy compression algorithm
  • Algorithm: input A, output A and the extended alphabet
      iterate:
        select the best sequence s from the DAWG
        extend the alphabet with a new symbol that represents s
        delete s from A and from the DAWG
        update the DAWG
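
The greedy loop above can be illustrated on plain strings. The sketch below is a deliberate simplification and not the method of the presentation: it counts substring frequencies by brute force instead of using a DAWG, it works on a list of label sequences instead of an automaton, and its cost model (each replaced occurrence saves `len(s) - 1` symbols, each index entry costs `len(s) + 1`) is an assumption chosen only to make the profit computable. All function names are hypothetical.

```python
def best_sequence(words, max_len=4):
    """Return (sequence, profit) with the highest positive profit, or (None, 0)."""
    candidates = {
        w[i:i + n]
        for w in words
        for n in range(2, max_len + 1)
        for i in range(len(w) - n + 1)
    }
    best, best_profit = None, 0
    for s in sorted(candidates):  # sorted only to make ties deterministic
        freq = sum(w.count(s) for w in words)  # non-overlapping occurrences
        saved = freq * (len(s) - 1)  # each occurrence shrinks to one symbol
        consumed = len(s) + 1        # assumed cost of one index entry
        if saved - consumed > best_profit:
            best, best_profit = s, saved - consumed
    return best, best_profit


def greedy_compress(words, max_len=4):
    """Replace the most profitable sequence by a new symbol until none pays off."""
    words = list(words)
    alphabet = {}  # new symbol -> sequence it stands for
    while True:
        s, _profit = best_sequence(words, max_len)
        if s is None:
            return words, alphabet
        symbol = chr(0xE000 + len(alphabet))  # private-use code point
        alphabet[symbol] = s
        words = [w.replace(s, symbol) for w in words]


def expand(word, alphabet):
    """Undo the compression by expanding new symbols, most recent first."""
    for symbol, s in reversed(list(alphabet.items())):
        word = word.replace(symbol, s)
    return word
```

For the input `["abcabc", "xabcx", "abc"]`, the only sequence with a positive profit under this cost model is `abc`, so one new symbol is created and the three words shrink to lengths 2, 3 and 1.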

60
Compression FCM
FCM
61
Compression FCNM
62
Compression FCDic
63
Best Compressions
64
Best Compressions
66
Conclusion
  • Search for two kinds of smallest subautomata
  • Statistical analysis of the internal structure of some automata associated with dictionaries
  • Compression method based on the factorization of sequence or parallel subautomata
  • A minimized automaton does not always lead to the best compression.

67
Future work
  • Factorization of more kinds of subautomata,
  • Find a way to de-minimize an automaton in order to get better compression,
  • Work on alternative encodings of automata, for example a depth-first encoding