Efficient Inference of a subclass of Even Linear Languages

1
Efficient Inference of a subclass of Even Linear
Languages
  • J. A. Laxminarayana
  • G. Nagaraja
  • Department of Computer Science and Engineering
  • Indian Institute of Technology, Bombay
  • Powai, Mumbai, INDIA

2
Motivation
  • Growing influence of grammatical inference.
  • The class of terminal distinguishable languages.
  • Well established theoretical framework.
  • Scope to improve existing techniques.
  • Scope to design efficient algorithms.
  • Identification of suitable applications.

3
Grammatical Inference

[Figure: block diagram with the labels Sample strings, Addtl. info (if any), Inference algorithm, Grammar rules, Parser Generator, Parser, Test strings, Accept/Reject.]

4
Formal Languages
  • A formal grammar G has four components:
  • A set of symbols Σ, called terminals.
  • A set of symbols V, called non-terminals, with the
    restriction that Σ and V are disjoint.
  • A special non-terminal symbol S, called the start
    symbol.
  • A set of production rules P, where each
    production is of the form α → β.
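The four components can be written down directly as a data structure. The following is only a minimal sketch: the class name `Grammar`, its field names, and the helper `is_well_formed` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # Σ
    nonterminals: frozenset   # V
    start: str                # S
    productions: frozenset    # P, as (lhs, rhs) pairs of symbol tuples

    def is_well_formed(self) -> bool:
        # Σ and V must be disjoint, and S must be a non-terminal.
        if self.terminals & self.nonterminals:
            return False
        if self.start not in self.nonterminals:
            return False
        symbols = self.terminals | self.nonterminals
        # Every symbol used in a production must come from Σ ∪ V.
        return all(
            set(lhs) <= symbols and set(rhs) <= symbols
            for lhs, rhs in self.productions
        )


# Hypothetical example grammar for { a^n b^n : n >= 1 }.
anbn = Grammar(
    terminals=frozenset({"a", "b"}),
    nonterminals=frozenset({"S"}),
    start="S",
    productions=frozenset({
        (("S",), ("a", "S", "b")),
        (("S",), ("a", "b")),
    }),
)
```

Storing each side of a production as a tuple of symbols keeps multi-character non-terminal names such as S1 unambiguous.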

5
Chomsky Hierarchy
  • Noam Chomsky defined four classes of grammars:
  • Type 0: Recursively Enumerable Languages
    (Unrestricted Grammars)
  • Type 1: Context Sensitive Languages (Context
    Sensitive Grammars)
  • Type 2: Context Free Languages (Context Free
    Grammars)
  • Type 3: Regular Languages (Regular Grammars)

6
Inductive Inference
  • Surveyed by Angluin and Smith, 1983.
  • Deductive and inductive inference.
  • Identification by enumeration and identification
    in the limit.
  • Specifying inference problems:
  • Class of rules, hypothesis space,
  • Set of examples, inference methods.
  • Criteria for evaluating and comparing inference
    methods.

7
Identification in the limit
[Figure: diagram with a Teacher and a Learner; labels G0, Sample 1, Sample 2, ..., Sample n, and G1, G2, ..., Gn.]

8
Gold's Results
  • The class of phrase structured languages is
    learnable from positive and negative samples.
  • Not even the class of regular languages is
    learnable from positive samples alone.
  • Any language class that contains all finite
    languages and at least one infinite language
    (a superfinite language class) is NOT identifiable
    in the limit from positive samples.
  • The class of finite-cardinality languages is
    identifiable from positive samples.
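The last result can be illustrated with a learner that always conjectures exactly the set of strings seen so far. This is the standard textbook construction rather than anything taken from the slides; the function name and the sample presentation are made up for illustration.

```python
def finite_language_learner(presentation):
    """Yield one conjecture (a finite language) after each positive example."""
    seen = set()
    for w in presentation:
        seen.add(w)
        # The conjecture is simply everything observed so far.
        yield frozenset(seen)


# Hypothetical target language and a presentation that repeats its strings.
target = {"ab", "aabb", "aaabbb"}
presentation = ["ab", "aabb", "ab", "aaabbb", "aabb", "ab"]
guesses = list(finite_language_learner(presentation))
```

Once every string of the finite target has appeared, the conjecture equals the target and never changes again, which is what identification in the limit requires.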

9
Angluin's Results
  • Angluin (1980) showed that a language class
    containing some finite languages and some
    infinite languages can be identifiable from positive
    samples alone.
  • Angluin proposed an efficient, characterizable
    method with which one can learn many interesting
    classes of languages, for example:
  • Parenthesis languages, pattern languages.
  • k-reversible languages and TDR languages.

10
Terminal Distinguishable Languages
  • Based on structural information (skeleton).
  • Good algebraic and grammatical characteristics.
  • Good incremental behaviour.
  • Based on three properties: backward determinism,
    terminal completeness, and terminal dissimilarity.

11
Example of even linear skeleton

12
Definitions
  • Two nodes Nij and Nkl are equivalent and merged
    iff
  • SSNF(Nij) = SSNF(Nkl) or
  • FS(Nij) = FS(Nkl) or PTF(Nij) = PTF(Nkl)
  • An even linear grammar G is a TDELG iff
  • B → w and C → w implies B = C
  • ∀A ∈ N − {S} and ∀x, y ∈ L(A), Ter(x) = Ter(y)
  • Let A, B, C ∈ N and a, b ∈ Σ. If i) the productions
  • S → B and S → C, where S is the start symbol, or
  • ii) the productions A → aBb and A → aCb appear in
    the set of productions, then
  • ∀x ∈ L(B), ∀y ∈ L(C): Ter(x) ≠ Ter(y)
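As a small illustration, the first condition (backward determinism) can be checked mechanically on a list of productions. The sketch below makes two assumptions not stated on the slide: productions are stored as (left-hand side, right-hand-side tuple) pairs, and Ter(x) is taken to mean the set of terminal symbols occurring in x. Both function names are hypothetical.

```python
def ter(x):
    """Assumed meaning of Ter(x): the set of terminal symbols occurring in x."""
    return frozenset(x)


def is_backward_deterministic(productions):
    """Check that B -> w and C -> w always imply B = C."""
    owner = {}
    for lhs, rhs in productions:
        # setdefault records the first non-terminal seen for this right-hand side.
        if owner.setdefault(rhs, lhs) != lhs:
            return False
    return True


good = [
    ("S", ("S1",)),
    ("S1", ("a", "S1", "b")),
    ("S1", ("a", "b")),
]
# Adding a second non-terminal with the same right-hand side breaks the property.
bad = good + [("S2", ("a", "b"))]
```

Here `is_backward_deterministic(good)` holds, while `bad` fails because S1 and S2 both derive ab.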

13
Proposed Method to Infer TDELG
  • Union-Find approach
  • Simple to understand and easy to implement
  • Incorporates all properties of TDELG
  • Minimizes computing overhead
  • Suitable for incremental inference

14
Definitions
  • S: A set of given sample strings.
  • ND(S): The set of nodes found in all skeletons of
    S.
  • Parent(p): The immediate predecessor of p in the
    skeleton containing p.
  • H(S): The set of SSNFs of S.
  • F(S): The set of frontier strings of S.

15
Method
  • The initial partition π0 is obtained using SSNFs.
  • A list of pairs of nodes is maintained to denote
    possible merging of blocks of the partition.
  • Until the list becomes empty, a pair of nodes is
    repeatedly removed from it, and the blocks
    containing the two nodes are tested for
    merging to get a new partition πn. New pairs are
    added to the list if the PTFs of nodes in other
    blocks are changed.
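The loop described above can be sketched as a worklist over a union-find structure. This is not the authors' implementation: the merge test and the rule for generating new pairs depend on SSNF, FS, and PTF, which the transcript does not define, so they appear here as the hypothetical callbacks `should_merge` and `new_pairs`. The initial blocks in the usage lines are our reading of the example slides.

```python
from collections import deque


def refine(initial_blocks, pairs, should_merge, new_pairs):
    """Merge blocks of a partition until the list of candidate pairs is empty."""
    parent = {}
    for block in initial_blocks:
        for node in block:
            parent[node] = block[0]  # first node of a block acts as its root

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    worklist = deque(pairs)
    while worklist:
        p, q = worklist.popleft()
        rp, rq = find(p), find(q)
        if rp == rq or not should_merge(rp, rq):
            continue
        parent[rq] = rp
        # Hypothetical hook: pairs triggered by the change caused by this merge.
        worklist.extend(new_pairs(rp))

    blocks = {}
    for node in parent:
        blocks.setdefault(find(node), set()).add(node)
    return list(blocks.values())


# Usage with placeholder callbacks: always merge, never add new pairs.
result = refine(
    initial_blocks=[["N11", "N21", "N31"], ["N22", "N32"], ["N33"]],
    pairs=[("N11", "N22"), ("N11", "N33"), ("N22", "N33"), ("N21", "N32")],
    should_merge=lambda p, q: True,
    new_pairs=lambda rep: [],
)
```

With these placeholder callbacks every candidate pair is merged, so all six nodes end up in a single block; a real merge test would be more selective.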

16
Example
  • Let the sample set be {ab, aabb, aaabbb}.
  • The computation of head, tail, SSNF, FS, and PTF of
    the nodes in the even linear skeletons is shown
    in the table.
  • Initial partition is π0 = {{N11, N21, N31},
    {N22, N32}, {N33}}
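The node names can be reproduced with a short sketch. It assumes, without confirmation from the slides, that Nij is the j-th level of the skeleton of the i-th sample string, that its frontier is the substring left after stripping j − 1 symbols from each end, and that head and tail are the first and last symbols of that frontier. The function name and tuple layout are hypothetical, and only even-length strings are handled.

```python
from itertools import combinations


def skeleton_nodes(samples):
    """Return {name: (head, tail, frontier)} for every skeleton node."""
    nodes = {}
    for i, w in enumerate(samples, start=1):
        if len(w) % 2 != 0:
            raise ValueError("this sketch handles even-length strings only")
        for j in range(1, len(w) // 2 + 1):
            # Strip j - 1 symbols from each end of the string.
            frontier = w[j - 1 : len(w) - j + 1]
            nodes[f"N{i}{j}"] = (frontier[0], frontier[-1], frontier)
    return nodes


nodes = skeleton_nodes(["ab", "aabb", "aaabbb"])

# Pairs of distinct nodes that share the same frontier string.
same_frontier = [
    (p, q) for p, q in combinations(nodes, 2) if nodes[p][2] == nodes[q][2]
]
```

Under these assumptions the sample yields the six nodes N11, N21, N22, N31, N32, and N33, and the pairs with equal frontier strings are (N11, N22), (N11, N33), (N21, N32), and (N22, N33).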

17
Computation of Functions
18
Example
  • LIST = {(N11, N22), (N11, N33), (N22, N33),
    (N21, N32)}.
  • Consider (N11, N22) ∈ LIST.
  • This pair forces the merging of blocks
  • B1 = {N11, N21, N31} and B2 = {N22, N32}.
  • New partition π1 = {{N11, N21, N31, N22, N32},
    {N33}}.
  • Final partition: {{N11, N21, N31, N22, N32,
    N33}}.
  • TDELG is ({S, S1}, {a, b}, {S → S1, S1 → aS1b, S1 →
    ab}, S).
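As a sanity check, one can enumerate the short strings generated by the inferred grammar and compare them with the sample. The generator below is a generic bounded search written for this illustration (the name `generate` and the tuple encoding of productions are assumptions). It relies on the grammar being linear and having no erasing productions, so that pruning by terminal count is safe.

```python
def generate(productions, start, max_len):
    """All terminal strings of length <= max_len derivable from start."""
    nonterminals = {lhs for lhs, _ in productions}
    results = set()
    seen = {(start,)}
    stack = [(start,)]
    while stack:
        form = stack.pop()
        # Position of the leftmost non-terminal, if any.
        idx = next((k for k, s in enumerate(form) if s in nonterminals), None)
        if idx is None:
            results.add("".join(form))
            continue
        for lhs, rhs in productions:
            if lhs != form[idx]:
                continue
            new = form[:idx] + rhs + form[idx + 1:]
            # Terminals never disappear here, so overly long forms can be dropped.
            n_terminals = sum(s not in nonterminals for s in new)
            if n_terminals <= max_len and new not in seen:
                seen.add(new)
                stack.append(new)
    return results


P = [("S", ("S1",)), ("S1", ("a", "S1", "b")), ("S1", ("a", "b"))]
strings = generate(P, "S", 6)
```

For a length bound of 6, the result is exactly the sample set ab, aabb, aaabbb, consistent with the grammar generating strings of the form a^n b^n.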

19
Complexity Analysis
  • Let S = {w1, w2, ..., wn} be the input sample.
  • Let k be the size of the input, i.e. the sum of the
    lengths of all input strings belonging to S.
  • Construction of the initial partition of ND(S)
    requires O(k²) computation time.
  • The partition may be queried and updated using
    collapsing FIND and weighted UNION operations,
    which lead to a running time of O(k α(k)), where α
    is a very slowly growing function.
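For reference, a textbook union-find with both heuristics named above looks roughly as follows. This is a generic sketch, not code from the presentation; in the standard analysis of this structure, α is the inverse Ackermann function.

```python
class UnionFind:
    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.size = {x: 1 for x in items}

    def find(self, x):
        # First pass: locate the root of x's tree.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Collapsing: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx
        # Weighted union: attach the smaller tree under the larger one.
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]
        return rx


uf = UnionFind(["N11", "N21", "N31", "N22", "N32", "N33"])
uf.union("N11", "N21")
uf.union("N21", "N31")
uf.union("N22", "N32")
```

After these calls, N11 and N31 share a representative, while N22 and N33 are still in different blocks.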

20
Comparison with the related works
  • Methods in the literature
  • Takada proposed a method of inferring a subclass
    of even linear languages using control sets.
  • Fernau also proposed a similar method.
  • The basic idea in their methods is to extend a
    regular language inference algorithm.
  • However, their methods require extra
    pre-processing and post-processing steps.
  • Their approaches are difficult to understand.
  • Our method
  • We propose a characterizable subclass of even
    linear languages.
  • Our method is not an extension of a regular
    inference mechanism.
  • We need not preprocess the input or post-process
    the output.
  • The proposed algorithm takes only
    positive samples as input.
  • The tabular approach, which is simple to
    understand, can be used for implementation.

21
References
  • D. Angluin. Inference of reversible languages. J.
    ACM, 29, pp. 741-765, 1982.
  • D. Angluin, C. H. Smith. Inductive inference:
    theory and methods. Computing Surveys, 15, pp.
    237-269, 1983.
  • E. M. Gold. Language identification in the limit.
    Information and Control, 10, pp. 447-474, 1967.
  • V. Radhakrishnan. Grammatical inference from
    positive data: An effective integrated approach.
    PhD thesis, Department of Computer Science and
    Engineering, Indian Institute of Technology,
    Bombay (India), 1987.