Title: Rani Siromoney
1 Forty Years of Formal Languages and Automata Theory: Reminiscences
Rani Siromoney
Professor Emeritus, Madras Christian College, and
Adjunct Professor, Chennai Mathematical Institute
3rd Update Meeting, Automata and Verification
MatScience, February 29 to March 2, 2004
2 LANDMARKS / MILESTONES
3 LANDMARKS / MILESTONES (continued)
5 CHRONOLOGY
- Equal Matrix Languages (n-right-linear / multi-tape FSA)
- Grammatical Inference (Takada)
- Two-dimensional (KK, GS)
- Algorithmic Learning Theory: Noise Model (Sakakibara)
- L-Systems (2-D, Radial, Hexagonal) (GS)
- DNA Splicing (Takada)
- Recognizable (KGS et al.)
- DNA Computing (Lila Kari)
- PKC
- OTA (Learning)
- Machine Learning: ILP (Arul)
- Aqueous Computing (Bireswar)
6
- Prior to 1965: mainly the Chomsky hierarchy.
- General trend from 1965: to introduce new classes of grammars / automata and the corresponding languages generated, accepted or recognized, motivated by nice theoretical properties or by applications.
- Parallel rewriting and restricted rewriting using simple (CF / regular) rules, to enhance generative capacity and capture interesting properties (indexed grammars, matrix grammars, etc.).
- Equal Matrix Languages fall under the following categories:
  - parallel rewriting
  - restricted rewriting
  - nice theoretical properties
7 Equal Matrix Languages (EML), RS 1969
(n-right-linear, Ibarra 1970; multi-tape finite automata, 1967)
- Motivation: the Parikh mapping of any CFL is semi-linear. The converse is not true; examples are $\{a^n b^n c^n \mid n \geq 1\}$ and $\{ww \mid w \in \{a,b\}^*\}$, which are context-sensitive.
- Goal: to find a family of languages for which the converse is true.
- Came across Matrix Languages (Abraham 1964).
- Came up with EML: n right-linear rules applied in parallel, constrained by a matrix.
- Nice properties:
  - higher generative capacity (simple EMGs for well-known CSLs)
  - Parikh mapping semi-linear for all EMLs
  - $L \subseteq w_1^* \cdots w_k^*$, with $w_1, \ldots, w_k \in \Sigma^*$ and k finite, is an EML (bounded EML) iff its Parikh mapping is semi-linear
  - unambiguity
  - automata characterization (finite-turn checking automata)
  - closure properties
  - decidability results
- Application to kernel sentences in Tamil (GS)
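As a rough illustration of rules applied in parallel under a matrix, the Python sketch below simulates one small equal matrix grammar of degree 3. The rule encoding (triples of left-hand side, terminal string and next nonterminal) and the names `generate` and `parikh_vector` are our own hypothetical choices, not notation from the 1969 work. The grammar, with matrices [A → aA, B → bB, C → cC] and [A → a, B → b, C → c], generates $\{a^n b^n c^n \mid n \geq 1\}$, a context-sensitive language whose Parikh image $\{(n, n, n) \mid n \geq 1\}$ is semi-linear.

```python
from collections import Counter

# Hypothetical encoding of an equal matrix grammar (EMG) of degree n:
#   START    : the n nonterminals produced by the initial rule S -> A1 ... An
#   MATRICES : n-tuples of right-linear rules (lhs, terminals, next_nonterminal);
#              next_nonterminal is None for a terminating rule A -> t.
# A matrix applies its n rules in parallel, rule i to component i.
START = ("A", "B", "C")
MATRICES = [
    (("A", "a", "A"), ("B", "b", "B"), ("C", "c", "C")),
    (("A", "a", None), ("B", "b", None), ("C", "c", None)),
]


def generate(start, matrices, max_steps):
    """Terminal words derivable with at most max_steps matrix applications."""
    words = set()
    # A sentential form is a tuple of (terminal prefix, nonterminal) pairs.
    frontier = [tuple(("", nt) for nt in start)]
    for _ in range(max_steps):
        next_frontier = []
        for form in frontier:
            for matrix in matrices:
                # Rule i must rewrite exactly the nonterminal of component i.
                if any(nt != rule[0] for (_, nt), rule in zip(form, matrix)):
                    continue
                new_form = tuple(
                    (prefix + terminals, nxt)
                    for (prefix, _), (_, terminals, nxt) in zip(form, matrix)
                )
                if all(nxt is None for _, nxt in new_form):
                    words.add("".join(prefix for prefix, _ in new_form))
                elif all(nxt is not None for _, nxt in new_form):
                    next_frontier.append(new_form)
                # Forms in which only some components terminated are dead ends.
        frontier = next_frontier
    return words


def parikh_vector(word, alphabet):
    """Number of occurrences of each letter, in alphabet order."""
    counts = Counter(word)
    return tuple(counts[a] for a in alphabet)


if __name__ == "__main__":
    for w in sorted(generate(START, MATRICES, 3), key=len):
        print(w, parikh_vector(w, "abc"))
    # abc (1, 1, 1)
    # aabbcc (2, 2, 2)
    # aaabbbccc (3, 3, 3)
```

Every generated word has Parikh vector $(n, n, n)$, i.e. the image is the linear set $(1,1,1) + \mathbb{N}(1,1,1)$.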
8 Two-Dimensional Languages / Picture Languages
- Introduction (as an extension of EML).
- Two major types.
- Significant contribution: the extension of catenation to row catenation and column catenation, which has become standard terminology.
- Digression to Kolam: Marcia Ascher's paper and book in 2002.
9 Lindenmayer Systems (2D, Radial, Hexagonal)
- 2D: expansion in all four directions.
- Hexagonal: significant contribution is the introduction of arrowhead catenation; a hexagon catenated to a hexagon yields a hexagon.
- Work continued (i) by MM, KK in the memorial volume; (ii) by KGS: view of 3D solids as hexagons, with "block behind block" reflected by arrowhead catenation.
10 Consider hexagonal arrays and define arrowhead catenation on a triangular grid made up of lines equally inclined and parallel to three fixed directions. Definition of the catenation of an a-hexagonal array (A) with a p-hexagonal array (H).
13 Kambi Kolam and Circular DNA Splicing
Rani Siromoney
Professor Emeritus, Madras Christian College
Adjunct Professor, Chennai Mathematical Institute
Chennai, Tamil Nadu, India
14 Picture Languages
Kolam is a traditional art practiced extensively in the southern part of India for decorating the courtyards of dwellings.
16 Picture languages (continued)
Kolam figures, grouped into families, attracted the interest of theoretical computer scientists concerned with the analysis and description of pictures through picture languages, which use sets of basic units and specific, formal rules for combining the units.
17 Derivation of a multi-kambi kolam from a single-kambi kolam
- According to one KP, a proper kambi kolam should consist of a single kambi. If a kolam did contain more than one kambi, then the greater the number of kambis, the easier it is to memorize the kolam.
- A single-kambi kolam can be converted into a multi-kambi kolam by applying a cut at a crossing.
- A cut at a crossing produces four ends; a cut and join (de-link) operation then fuses these ends together, two at a time.
19
- A cut and connect operation can link two adjacent rounded corners:
  - a cut is introduced such that it goes through two adjacent rounded corners, producing four ends;
  - these ends are connected, either forming a crossing or, alternately, two new adjacent rounded corners.
- Two kambis, when used in a cut and connect operation, fuse into one kambi.
- If two adjacent corners belong to the same kambi, then a cut and connect operation can produce two kambis, or just one kambi with an additional crossing.
20 SPLICING SYSTEMS (Tom Head): INTRODUCTION
- The initial set (finite or infinite) consists of double-stranded DNA molecules.
- Specific classes of enzymatic activities are considered: those of restriction enzymes.
- Recombinant behavior is modeled, and the associated sets are analyzed, by a new formalism called splicing systems.
- Attention is focused on the effect of sets of restriction enzymes and a ligase that allow DNA molecules to be cleaved and re-associated to produce further molecules.
21 Circular DNA and Splicing Systems
DNA molecules exist not only in linear forms but also in circular forms.
22 Action of splicing schemes on circular strings (RS, KGS, VRD, UBE, 1992)
- In certain recombination processes, a pair of circular DNA molecules produces a single circular molecule.
- Formally, let $S = (A, T, P)$ be a splicing system, and let $hpxq$ and $wuxv$ be circular strings over $A$ for which $(p, x, q)\,P\,(u, x, v)$. Then $S$ acts on the two circular strings to produce the single circular string $hpxvwuxq$.
- This is the cut and connect operation.
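The operation can be sketched in Python under a few assumptions of ours: a circular string is stored as an ordinary string and compared up to rotation (via its least rotation), and the relation $P$ is not modelled; the caller simply supplies a pair of triples assumed to be related by $P$. The function names `canonical`, `rotations` and `circular_splice` are hypothetical. Cutting $hpxq$ between $p$ and $x$ leaves the linear string $xqhp$, cutting $wuxv$ between $u$ and $x$ leaves $xvwu$, and ligating both ends gives the circle $xqhpxvwu$, which is $hpxvwuxq$ read from $h$.

```python
def canonical(word):
    """Representative of a circular string: its lexicographically least rotation."""
    return min((word[i:] + word[:i] for i in range(len(word))), default=word)


def rotations(word):
    """All linear readings of a circular string."""
    return {word[i:] + word[:i] for i in range(len(word))} or {word}


def circular_splice(c1, c2, t1, t2):
    """Splice c1 = hpxq with c2 = wuxv using triples t1 = (p, x, q), t2 = (u, x, v).

    Cutting c1 between p and x leaves the linear string x q h p; cutting c2
    between u and x leaves x v w u. Ligating the two gives the circular string
    x q h p x v w u, i.e. hpxvwuxq read from h. Whether t1 P t2 holds is
    assumed, not checked.
    """
    p, x, q = t1
    u, x2, v = t2
    if x != x2:
        raise ValueError("the two triples must share the crossing x")
    products = set()
    for r1 in rotations(c1):
        # r1 must read x q ... p with the three factors not overlapping.
        if len(r1) >= len(x + q + p) and r1.startswith(x + q) and r1.endswith(p):
            for r2 in rotations(c2):
                if len(r2) >= len(x + v + u) and r2.startswith(x + v) and r2.endswith(u):
                    products.add(canonical(r1 + r2))
    return products


# Symbolic check with one-letter factors h, p, x, q, w, u, v:
print(circular_splice("hpxq", "wuxv", ("p", "x", "q"), ("u", "x", "v"))
      == {canonical("hpxvwuxq")})  # True
```

Returning a set allows for circular strings in which the site occurs several times, each occurrence giving one product.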
23 Picture Languages (references)
- Ascher, Marcia M. 2002. The Kolam Tradition. American Scientist 90, 56-63.
- Ascher, Marcia. 2002. Mathematics Elsewhere. Princeton University Press, Princeton, N.J.
- Narasimhan, R. 1992. The oral-literate dimension in Indian culture. In Indological Essays: Commemorative Volume II for Gift Siromoney, ed. M. Lockwood, Department of Statistics, Madras Christian College, pp. 67-79.
- Prusinkiewicz, P., K. Krithivasan and M. G. Vijayanarayana. 1989. Application of L-systems to algorithmic generation of South India folk art patterns and Karnatic music. In A Perspective in Computer Science: Commemorative Volume for Gift Siromoney, ed. R. Narasimhan, Computer Science Series, vol. 16, World Scientific, Singapore, pp. 229-247.
- Siromoney, Gift, Rani Siromoney and T. Robinson. Kambi kolam and cycle grammars. In A Perspective in Computer Science: Commemorative Volume for Gift Siromoney, ed. R. Narasimhan, Computer Science Series, vol. 16, World Scientific, Singapore, pp. 267-300.
- Siromoney, Gift. 1985. Studies on the traditional art of Kolam. Working Paper I, May 1985 (manuscript).
24 Distributed Circular Systems
Rani Siromoney, Distributed Circular Systems, Grammar Systems 2000, Bad Ischl, Austria
25 Sequential Distributed Circular Automata
26 Finite Automata for Circular Languages
- J. Kari and L. Kari, Context-free recombinations, in Words, Sequences, Languages: Where Computer Science, Biology and Linguistics Meet, C. Martin-Vide, V. Mitrana (Eds.), Kluwer, The Netherlands.
- Definition (K-acceptance): given a finite automaton A, the circular language K-accepted by A, $L(A)_K$, is the set of all words w such that A has a cycle labeled by w.
- The circular/linear language accepted by a finite automaton A is defined as $L(A)_K \cup L(A)$, where $L(A)$ is the linear language accepted by A, defined in the usual way.
- Definition: a circular/linear language $L \subseteq \Sigma^{\circ} \cup \Sigma^{*}$, where $\Sigma^{\circ}$ denotes the circular words over $\Sigma$, is regular if there is a finite automaton A that accepts the circular and linear parts of L, i.e., that accepts $L \cap \Sigma^{\circ}$ and $L \cap \Sigma^{*}$.
27 P-Acceptance
The following definition is equivalent to a definition given by Pixton, namely: the circular language accepted by a finite automaton is the set of all words that label a loop containing at least one initial and one final state.
Definition: given a finite automaton A, the circular language accepted by A, $L(A)_P$, is the set of all words w such that A has a cycle labelled by w that contains at least one final state.
28 H-Acceptance
The circular languages accepted by finite automata under the following definition coincide with the regular circular languages introduced by Head.
Definition: given a finite automaton A, the circular language accepted by A, $L(A)_H$, is the set of all words w such that $w = uv$ and $vu \in L(A)$.
Pixton has shown that if, in addition, we assume that the family of languages is closed under repetition (i.e., $w^n$ is in the language whenever w is), then H-acceptance and P-acceptance are equivalent.
29
- Proposition: the family of circular languages accepted by finite automata by K-acceptance is strictly included in the family accepted by P-acceptance, which is strictly included in the family accepted by H-acceptance.
- Context-free recombinations are computationally weak, able to generate only regular languages.
- Advantage of K-acceptance: the same automaton accepts both the linear and the circular components of the language.
30 Sequential Distributed Architecture by Automata for Circular Languages
- Splicing for purely circular strings: automata characterization given by Pixton.
- Mixed splicing/recombination: automata characterizations given by Pixton and by Kari and Kari.
- Aim: to extend the sequential distributed architecture to these cases.
- Sequential grammar systems have been extended to automata (Krithivasan et al.):
  - distributed FSAs and distributed PDAs, with modes similar to those defined in CD grammar systems;
  - acceptance power analyzed: distributed FSAs in all modes are not more powerful than centralized FSAs, while for PDAs in all modes the distributed counterpart is as powerful as TMs.
- For purely circular languages, consider P-acceptance.
- For mixed (circular/linear) languages, consider K-automata.
- Apply the technique of Krithivasan et al. for distributed processing in automata to the different modes of acceptance.
- Result: distributed (sequential) processing in automata does not increase the generative power for regular circular languages.
31 Algorithmic Learning Theory, Inductive Logic Programming
- Sakakibara, Y., Siromoney, R. 1992. A noise model on learning sets of strings. Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory (ACM Press), pp. 295-302.
- Siromoney, A., Siromoney, R. 1993. Local exceptions in inductive logic programming. Presented at the International Workshop on Machine Intelligence, ARL Labs, Hitachi, Japan.
- Siromoney, A., Siromoney, R. 1995. Variations and local exceptions in inductive logic programming. In Machine Intelligence, vol. 14, eds. K. Furukawa, D. Michie, S. Muggleton, Oxford: Clarendon, pp. 211-232.
- Siromoney, A., Siromoney, R. June 1996. A machine learning system for identifying transmembrane domains from amino acid sequences. Sadhana, vol. 21, part 3, pp. 317-325.
32 A new noise model on learning sets of strings
- Framework: PAC learning.
- Instance domain: $\Sigma^n$, the set of strings of length n over a finite alphabet $\Sigma$.
- EDIT operation errors: insertion, deletion, or change of a symbol in a string.
- EDIT noise: examples are corrupted by random EDIT errors.
- General upper bounds on the EDIT noise rate that can be tolerated by a learning algorithm taking the strategy of minimizing disagreements, and by any learning algorithm.
- Next, an efficient algorithm is presented that can learn a class of decision lists (Rivest 87) over the attributes "a string w contains a pattern p" from noisy examples, under some restriction on the EDIT noise rate.
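The slide does not spell out how EDIT noise is generated or how such a decision list is evaluated, so the following Python sketch rests on our own assumptions: with probability eta (the EDIT noise rate) an example is hit by exactly one uniformly chosen EDIT operation at a uniform position, and "w contains p" is read as p occurring as a contiguous substring of w. The names `edit_noise` and `classify` are hypothetical. This is not the learning algorithm of Sakakibara and Siromoney, only the objects it works with.

```python
import random


def edit_noise(word, alphabet, eta, rng):
    """With probability eta, corrupt word by one random EDIT operation
    (insertion, deletion or change of a symbol); otherwise return it unchanged.
    The uniform choice of operation and position is an assumption."""
    if rng.random() >= eta:
        return word
    op = rng.choice(("insert", "delete", "change"))
    if op == "insert":
        i = rng.randrange(len(word) + 1)
        return word[:i] + rng.choice(alphabet) + word[i:]
    if not word:  # nothing to delete or change
        return word
    i = rng.randrange(len(word))
    if op == "delete":
        return word[:i] + word[i + 1:]
    replacements = [a for a in alphabet if a != word[i]] or [word[i]]
    return word[:i] + rng.choice(replacements) + word[i + 1:]


def classify(decision_list, default, word):
    """Decision list with tests 'word contains pattern': the first test that
    succeeds decides the label; if none succeeds, return the default."""
    for pattern, label in decision_list:
        if pattern in word:
            return label
    return default


DL = [("ab", 1), ("ba", 0)]  # hypothetical list: contains "ab" -> 1, else contains "ba" -> 0
clean = "aabba"
noisy = edit_noise(clean, "ab", eta=1.0, rng=random.Random(42))
print(classify(DL, 0, clean), noisy, classify(DL, 0, noisy))
```

A single EDIT error can remove or create an occurrence of a pattern, which is why the label of a noisy example may disagree with the target decision list.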
33
- An ILP system uses background knowledge and a set of examples and counter-examples to learn the description of a concept in the form of a set of Horn clauses or a Prolog program.
- A machine learning system that uses inductive logic programming techniques to learn how to identify transmembrane domains from amino acid sequences, which is very important in protein classification problems.
- It uses operations such as contains, which act on entire sequences rather than on individual elements.
- The prediction accuracy of the implementation is around 93%, which compares favourably with earlier results.
34 In real life, rules have exceptions
- Exceptions are incorporated in ILP: GOLEM (an ILP learning algorithm) is suitably extended.
- Learning local exceptions in ILP is PAC learnable.
- An application is considered where exceptions are useful.
35 AS, RS: Variations and local exceptions in inductive logic programming
- Variations are valid departures from the normal:
  - a classical composer writing variations on a theme;
  - a jazz player improvising on a melody;
  - an Indian percussionist improvising on a rhythm.
- Variations are different from noise:
  - variation: a valid departure from the normal;
  - noise: an incorrect or illegal deviation from the normal.
- Applications in music, molecular biology, speech recognition and distributed knowledge.
36 DNA Plasmids to Solve a Counting Problem
Rani Siromoney
Professor Emeritus, Madras Christian College
Adjunct Professor, Chennai Mathematical Institute
Chennai, Tamil Nadu, India
siromoney_at_cmi.ac.in
Bireswar Das
Junior Research Fellow, Institute of Mathematical Sciences
Chennai, Tamil Nadu, India
bireswar_at_imsc.res.in
37 Sources
1. T. Head, Circular Suggestions for DNA Computing, in Pattern Formation in Biology, Vision and Dynamics, eds. A. Carbone, M. Gromov and P. Prusinkiewicz, World Scientific, Singapore, 2000, pp. 325-335.
2. J. Kari, A Cryptosystem Based on Propositional Logic, in Machines, Languages and Complexity, 5th International Meeting of Young Computer Scientists, Czechoslovakia, Nov. 14-18, 1988, eds. J. Dassow and J. Kelemen, LNCS 381, Springer, 1989, pp. 210-219.
3. Rani Siromoney, Bireswar Das, DNA Algorithm for Breaking a Propositional Logic Based Cryptosystem, Bulletin of the EATCS, Number 79, February 2003, pp. 170-176.
38 Introducing the CUT-DELETE-EXPAND-LIGATE (C-D-E-L) model
We combine features of Divide-Delete-Drop (D-D-D) (Leiden) and CUT-EXPAND-LIGATE (C-E-L) (Binghamton) to form CUT-DELETE-EXPAND-LIGATE (C-D-E-L). This enables us to get an aqueous solution to 3SAT, posed here as a counting problem, which is known to be in IP.
3SAT (counting version) is defined as follows.
Instance: a propositional formula of the form $F = C_1 \wedge C_2 \wedge \cdots \wedge C_m$, where $C_i$, $i = 1, 2, \ldots, m$, are clauses. Each $C_i$ is of the form $(l_{i1} \vee l_{i2} \vee l_{i3})$, where $l_{ij}$, $j = 1, 2, 3$, are literals over the set of variables $\{x_1, x_2, \ldots, x_n\}$.
Question: what is the number of truth assignments that satisfy F?
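To pin down the question the DNA model answers, here is a conventional exhaustive count in Python. It takes $2^n$ steps and says nothing about the C-D-E-L model itself. The encoding of a literal as +i for $x_i$ and -i for $\neg x_i$ (the DIMACS convention) and the function names are our own assumptions.

```python
from itertools import product


def satisfies(assignment, clauses):
    """assignment[i] is the truth value of x_{i+1}; literal +i is x_i, -i is ¬x_i.
    F holds if every clause contains at least one true literal."""
    return all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses)


def count_satisfying(clauses, n):
    """Exhaustive count over all 2**n truth assignments."""
    return sum(1 for a in product((False, True), repeat=n) if satisfies(a, clauses))


# F = (x1 ∨ x2 ∨ x3) ∧ (¬x1 ∨ ¬x2 ∨ ¬x3)
print(count_satisfying([(1, 2, 3), (-1, -2, -3)], 3))  # 6
```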
39 Data register molecule: a standard, commercially available double-stranded DNA cloning plasmid. This plasmid is a circular molecule of approximately 3 kb. It contains a sub-segment, the MCS (multiple cloning site), of approximately 175 base pairs, which can be removed using a pair of restriction enzyme sites that flank the segment. The MCS contains pairwise disjoint sites at which restriction enzymes act, such that each enzyme produces a 5' overhang.
40 In C-D-E-L, the segment of the plasmid used is of the form $c_1 s_1 c_1 \, c_2 s_2 c_2 \cdots c_n s_n c_n$, where the $c_i$, $i = 1, \ldots, n$, are called sites, chosen so that no other subsequence of the plasmid matches them, and the $s_i$, $i = 1, \ldots, n$, are called stations.
- In D-D-D, the lengths of the stations are required to be the same.
- The difference in C-D-E-L: the lengths of the stations are all different, and these differences in length are fundamental in solving 3SAT.
- The bio-molecular operations used in C-D-E-L are similar to the operations in C-E-L.
41 DESIGN
- $x_1, \ldots, x_n$: the variables in F; $\neg x_1, \ldots, \neg x_n$: their negations.
- $s_i$: the station associated with $x_i$; $\bar{s}_i$: the station associated with $\neg x_i$.
- $c_i$: the site associated with station $s_i$; $\bar{c}_i$: the site associated with station $\bar{s}_i$.
- $v_i$: the length of the station associated with $x_i$, $i = 1, \ldots, n$; $v_{n+j}$: the length of the station associated with the literal $\neg x_j$, $j = 1, \ldots, n$.
Choose the stations in such a way that the sequence $v_1, \ldots, v_{2n}$ satisfies the property
$$\sum_{i=1}^{k} v_i < v_{k+1}, \qquad k = 1, \ldots, 2n-1,$$
i.e., it is a super-increasing (easy) knapsack sequence. From a sum, the sub-sequence can be efficiently recovered.
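Why a super-increasing sequence lets the sub-sequence be recovered from its sum can be seen in a short greedy sketch: scanning from the largest length down, $v_k$ must be used whenever it does not exceed the remaining sum, because all smaller lengths together add up to less than $v_k$. The function names are hypothetical, and the lengths are abstract positive integers, not real station sizes in base pairs.

```python
def is_super_increasing(v):
    """True if v[0] + ... + v[k-1] < v[k] for every k >= 1."""
    total = 0
    for k, value in enumerate(v):
        if k > 0 and value <= total:
            return False
        total += value
    return True


def decode(total, v):
    """Indices of the unique sub-sequence of super-increasing v summing to total,
    or None if no sub-sequence has that sum. Greedy from the largest length:
    v[k] must be taken whenever v[k] <= remaining, since all smaller lengths
    together sum to less than v[k]."""
    chosen = []
    for k in range(len(v) - 1, -1, -1):
        if v[k] <= total:
            chosen.append(k)
            total -= v[k]
    return sorted(chosen) if total == 0 else None


V = [3, 5, 10, 20, 40, 80]  # hypothetical station lengths v_1, ..., v_6 (n = 3)
print(is_super_increasing(V), decode(53, V))  # True [0, 2, 4]
```

Indices are 0-based, so [0, 2, 4] means $v_1 + v_3 + v_5 = 53$.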
42 The solution in $C_n$ is analyzed by gel separation. If more than one solution is present, they will be of different lengths and thus will form separate bands. By counting the number of bands, we count the number of satisfying assignments. Furthermore, the exact assignment is read from the length of each satisfying molecule. This can be done because the stations have lengths from an easy knapsack sequence: any sub-sequence of an easy knapsack sequence has a sum different from the sums of all other sub-sequences.
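The counting-by-bands argument can be simulated in Python. This is only a sketch under an assumption of ours that the slides do not state: the molecule surviving for a satisfying assignment keeps exactly one station per variable, the station of the literal made true, and the fixed remainder of the plasmid is ignored. Under that model, distinct satisfying assignments give distinct lengths, so the number of bands equals the number of satisfying assignments, and each length decodes back to its assignment. `molecule_length`, `gel_bands` and `read_assignment` are hypothetical names.

```python
from itertools import product


def satisfies(assignment, clauses):
    """Literal +i stands for x_i and -i for ¬x_i (assumed encoding)."""
    return all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses)


def molecule_length(assignment, v):
    """Assumed length model: one station per variable survives, the station of
    the true literal. v[i] belongs to x_{i+1}, v[n + i] to ¬x_{i+1}."""
    n = len(assignment)
    return sum(v[i] if value else v[n + i] for i, value in enumerate(assignment))


def gel_bands(clauses, v):
    """Distinct lengths, one per satisfying assignment: the bands on the gel."""
    n = len(v) // 2
    return {molecule_length(a, v) for a in product((False, True), repeat=n)
            if satisfies(a, clauses)}


def read_assignment(length, v):
    """Decode a band length back to a truth assignment (v super-increasing)."""
    n = len(v) // 2
    chosen = set()
    for k in range(len(v) - 1, -1, -1):
        if v[k] <= length:
            chosen.add(k)
            length -= v[k]
    # Valid only if the sum is exact and exactly one of x_i, ¬x_i was chosen.
    if length != 0 or any((i in chosen) == (n + i in chosen) for i in range(n)):
        return None
    return tuple(i in chosen for i in range(n))


V = [3, 5, 10, 20, 40, 80]        # hypothetical super-increasing station lengths
F = [(1, 2, 3), (-1, -2, -3)]     # (x1 ∨ x2 ∨ x3) ∧ (¬x1 ∨ ¬x2 ∨ ¬x3)
bands = gel_bands(F, V)
print(len(bands))  # 6
print(sorted(read_assignment(b, V) for b in bands))
```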
43 C-D-E-L
45 Thus the solution to 3SAT, viz. finding the number of satisfying assignments, is effectively obtained. Moreover, being able to read the truth assignments is a great advantage in breaking the cryptosystem based on propositional logic.
46 Advantage over the previous method of attack
In the cryptanalytic attack proposed earlier, which modified D-D-D, the DNA algorithm had to be executed for each bit of the cryptotext. In the present method, using C-D-E-L (combining features of D-D-D and C-E-L), we apply 3SAT to P and read any satisfying assignment from the final solution. This gives an equivalent public key, which amounts to breaking the cryptosystem.
47 Parikh Again
- The Parikh mapping (vector) is a classical and important tool in the theory of formal languages.
- The Parikh image of a CFL is semi-linear.
- Basic idea: properties of words are expressed as numerical properties.
- But much information is lost in the transition.
48 Parikh Matrices
- All Parikh matrices are upper triangular, with 1s on the main diagonal and 0s below it.
- The classical Parikh vector occurs as the second diagonal; all other entries contain information about the order of letters in the original word.
- Two words with the same Parikh matrix always have the same Parikh vector.
- But two words with the same Parikh vector in many cases have different Parikh matrices.
- Thus the Parikh matrix gives more information about a word than its Parikh vector.
- Still, the Parikh matrix mapping is not injective; characterizing its non-injectivity is an open problem.
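Following the construction of Mateescu, Salomaa, Salomaa and Yu as we understand it, the Parikh matrix of a word over an ordered alphabet $a_1 < \cdots < a_k$ is the product of the matrices of its letters, where the matrix of $a_q$ is the $(k+1) \times (k+1)$ identity with one extra 1 at position $(q, q+1)$; the entry in row $i$, column $j+1$ then counts the occurrences of $a_i a_{i+1} \cdots a_j$ as a scattered subword (indices start at 1 here and at 0 in the code). The Python sketch below, with function names of our own, shows that ab and ba share a Parikh vector but not a Parikh matrix, and that abba and baab share the same Parikh matrix, so the mapping is not injective.

```python
def parikh_matrix(word, alphabet):
    """Parikh matrix of word over the ordered alphabet a_1 < ... < a_k,
    computed as the product M(w_1) M(w_2) ... of the letter matrices."""
    k = len(alphabet)
    index = {a: i for i, a in enumerate(alphabet)}
    m = [[int(r == c) for c in range(k + 1)] for r in range(k + 1)]
    for ch in word:
        q = index[ch]
        # Right-multiplying by M(a_q) = I + E_{q,q+1} adds column q to column q + 1.
        for r in range(k + 1):
            m[r][q + 1] += m[r][q]
    return m


def parikh_vector(word, alphabet):
    """The classical Parikh vector, read off the second diagonal."""
    m = parikh_matrix(word, alphabet)
    return tuple(m[i][i + 1] for i in range(len(alphabet)))


print(parikh_vector("ab", "ab"), parikh_vector("ba", "ab"))           # (1, 1) (1, 1)
print(parikh_matrix("ab", "ab") == parikh_matrix("ba", "ab"))         # False
print(parikh_matrix("abba", "ab") == parikh_matrix("baab", "ab"))     # True
```

Over a two-letter alphabet the matrix records only $|w|_a$, $|w|_b$ and $|w|_{ab}$, which is why abba and baab (both with two scattered occurrences of ab) collide.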
49 References
- Mateescu, A., Salomaa, A., Salomaa, K. and Yu, S., A Sharpening of the Parikh Mapping, Theoret. Informatics Appl. 35 (2001), 551-564.
- Atanasiu, A., Martin-Vide, C. and Mateescu, A., On the Injectivity of the Parikh Matrix Mapping (submitted).
- Mateescu, A., Salomaa, A. and Yu, S., An Inequality for Occurrences of Subwords, Turku Centre for Computer Science, TUCS Technical Report No. 481, December 2002.
50 THANK YOU