Title: Grammatical Complexity of Symbolic Sequences: A Brief Introduction
1 Grammatical Complexity of Symbolic Sequences: A Brief Introduction
Bailin Hao
T-Life Research Center, Fudan University
Institute of Theoretical Physics, Academia Sinica
The Santa Fe Institute, New Mexico, USA
http://www.itp.ac.cn/hao/
2 Three Paradigms in Theoretical Description of Nature
- Deterministic: based on periodicities and recurrences, from Kepler to Yang-Mills
- Stochastic: based on randomness, from Brownian motion to MSR field theory of hydrodynamics and molecular motors
- Fractal, self-similar, scale invariant: from phase transitions and critical phenomena to chaotic dynamics
- Finiteness is the unifying Physics languages
3 [Slide text in Chinese lost in extraction; surviving terms: language, philology, Zipf, Chomsky, Lindenmayer, factorizable language]
5 An Observation
- u d c s b t
- charge, mass, flavor, charm,
- p n e
- charge, mass, spin, magnetic moment,
- H C N O P
- atomic number, ion radius, valence, affinity,
- H2O NO CO2
- molecular weight, polarity,
- a c g t
- A D E F G H W Y V
- BRCA1 PDGF
6 A PROGRAMME
- Coarse-Grained Description of Nature
- Use of Symbols and Symbolic Strings
- Language
- Grammar and Complexity
- (Chomsky, Lindenmayer, etc.)
- So far this programme has been best realized in the study of dynamics by using Symbolic Dynamics.
- There have been preliminary attempts at analyzing biological sequences.
7
- "It may not be a coincidence that the two systems in the universe that most impress us with their open-ended complex design, life and mind, are based on discrete combinatorial systems. Many biologists believe that if inheritance were not discrete, evolution as we know it could not have taken place."
- S. Pinker, The Language Instinct (1995)
8 Simple Examples
- At the level of words
- DOG GOD
- At sentence level
- Dog bites Man
- Man bites Dog
9 [Figure: domain structure, from N-terminus to C-terminus, of five proteins: EGF (epidermal growth factor), chymotrypsin, urokinase (UK), Factor IX, plasminogen]
- Source: B. Alberts et al., Molecular Biology of the Cell, 1994, p. 123
10 GC [remaining slide text in Chinese lost in extraction; surviving example alphabets:]
- 1. a, c, g, t
- 2. A, C, D, ..., W, Y
- 3. a, ..., z, A, ..., Z, ...
11 Classification of Formal Languages
- Chomsky Hierarchy
- Sequential production rules
- Lindenmayer Systems
- Parallel production rules
12 Generative Grammar
- S: Sentence
- NP: Noun Phrase
- VP: Verb Phrase
- Adj: Adjective
- Art: Article
S → NP VP
VP → V NP
NP → (Art) Adj N
S → if S then S
S → either S or S
N → boy | girl | scientist
V → sees | believes | loves | eats
Adj → young | good | beautiful
Art → a | one | the
Non-Terminal and Terminal Symbols
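The generative grammar on this slide can be exercised with a few lines of Python. The following is a minimal sketch, not the author's implementation: it assumes the rules read as S → NP VP, VP → V NP, NP → (Art) Adj N with an optional article, and the names `GRAMMAR`, `generate` and `max_depth` are illustrative. The depth cap is an added assumption to keep the recursive rules for S from running forever.

```python
import random

# Production rules transcribed from the slide; "(Art)" is read as an optional article.
GRAMMAR = {
    "S": [["NP", "VP"], ["if", "S", "then", "S"], ["either", "S", "or", "S"]],
    "VP": [["V", "NP"]],
    "NP": [["Art", "Adj", "N"], ["Adj", "N"]],
    "N": [["boy"], ["girl"], ["scientist"]],
    "V": [["sees"], ["believes"], ["loves"], ["eats"]],
    "Adj": [["young"], ["good"], ["beautiful"]],
    "Art": [["a"], ["one"], ["the"]],
}

def generate(symbol="S", rng=random, depth=0, max_depth=3):
    """Expand a symbol into a list of terminal words by random rule choice."""
    if symbol not in GRAMMAR:          # terminal symbol: emit it unchanged
        return [symbol]
    options = GRAMMAR[symbol]
    if symbol == "S" and depth >= max_depth:
        options = options[:1]          # force S -> NP VP so the expansion terminates
    words = []
    for part in rng.choice(options):
        words.extend(generate(part, rng, depth + 1, max_depth))
    return words

if __name__ == "__main__":
    print(" ".join(generate(rng=random.Random(2024))))
```

Every generated sentence is syntactically valid under these rules, whether or not it makes sense, which is the syntax-versus-semantics distinction in miniature.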
13 Chomsky Grammar
- N: set of non-terminal symbols
- T: set of terminal symbols
- S ∈ N: start symbol
- P: set of production rules (x → y)
- x, y: strings of symbols
- Grammar G = (N, T, P, S)
- Type 0 grammar:
- x ∈ (N ∪ T)* N (N ∪ T)*
- y ∈ (N ∪ T)*
14
- Type 1 grammar: context-sensitive
- x = t1 a t2
- t1, t2 ∈ T*
- a ∈ N
- Type 2 grammar: context-free
- x = a ∈ N
- Type 3 grammar: regular
- x = a, y = b or bc
- a, c ∈ N, b ∈ T
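15 Chomsky Classification of Formal Languages
Type  Language                       Automaton                 Memory
0     Recursively enumerable (REL)   Turing machine            unlimited
1     Context-sensitive (CSL)        Linear bounded automaton  proportional to input length
2     Context-free (CFL)             Pushdown automaton        pushdown list (stack)
3     Regular (RGL)                  Finite state automaton    none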
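16 A Finite State Automaton (FSA)
[Figure: two diagrams, (i) and (ii), for the input R L R R with states a and b and the transition δ(a, R) = b; a transfer function table over the inputs R, L and the states a, b, c, d]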
17 FSA: Finite State Automata
- Deterministic FSA (DFSA)
- Non-Deterministic FSA (NDFSA)
- Equivalence of DFSA and NDFSA: subset construction
- Minimal DFSA
- Myhill-Nerode theorem (1958): number of nodes in minDFSA
18 A Pushdown Automaton
- Pushdown list
- Stack
- First In Last Out (FILO)
19 A Turing Machine: Alan M. Turing (1912-1954)
- FSA + R/W tape
- Church-Turing Thesis (1936)
- Any effective (mechanical) computation can be carried out by a Turing machine
20 Example: a^i b^i c^i, i > 0 (CSL)
- Terminals: a, b, c
- Non-terminals: A, B
- Sequential rules: B → aBAc | abc
- bA → bb
- cA → Ac
- B → abc
- B → aBAc → aabcAc → aabAcc → aabbcc
- B → aBAc → aaBAcAc → aaBAAcc → aaabcAAcc → aaabAcAcc → aaabAAccc → aaabbAccc → aaabbbccc
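The derivations on this slide can be checked mechanically. The sketch below assumes the rules read as B → aBAc | abc, bA → bb, cA → Ac with start symbol B; the function name `generated_words` and the breadth-first search strategy are illustrative choices, not part of the slide.

```python
from collections import deque

# Rules as read from the slide: B -> aBAc | abc, bA -> bb, cA -> Ac.
RULES = [("B", "aBAc"), ("B", "abc"), ("bA", "bb"), ("cA", "Ac")]

def generated_words(max_len, start="B"):
    """Return all terminal words of length <= max_len derivable from start."""
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if form.islower():                 # only terminals a, b, c remain
            words.add(form)
            continue
        for lhs, rhs in RULES:
            pos = form.find(lhs)
            while pos != -1:               # rewrite every occurrence of lhs in turn
                new = form[:pos] + rhs + form[pos + len(lhs):]
                # No rule shortens a sentential form, so longer forms can be pruned.
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return words

if __name__ == "__main__":
    print(sorted(generated_words(9), key=len))
```

With `max_len=9` the search yields exactly abc, aabbcc and aaabbbccc: the rule cA → Ac moves each A leftwards past the c's, and bA → bb converts it once it reaches the block of b's.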
21 Rules to Generate Gene-Like Sequences (according to David Searls)
- gene → upstream transcript downstream
- transcript → 5'-untranslated-region start-codon coding-region 3'-untranslated-region
- coding-region → codon coding-region | stop-codon | splice coding-region
- codon → lys | asn | thr | met | glu | his | pro | asp | ala | gly | tyr | trp | phe | leu | ile | ser | arg | gln | val | cys
- start-codon → met
- stop-codon → taa | tag | tga
22
- leu → tt purine | ct base (6)
- ser → ag pyrimidine | tc base (6)
- arg → ag purine | cg base (6)
- val → gt base; pro → cc base (4)
- ala → gc base; gly → gg base (4)
- thr → ac base (4)
- ile → at pyrimidine | ata (3)
- lys → aa purine; asn → aa pyrimidine (2)
- gln → ca purine; his → ca pyrimidine (2)
- glu → ga purine; cys → tg pyrimidine (2)
- phe → tt pyrimidine; tyr → ta pyrimidine (2)
- asp → ga pyrimidine (2)
- met → atg; trp → tgg
- base → a | c | g | t
- purine → a | g
- pyrimidine → c | t
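The codon rules on this slide are compact enough to expand by machine. The sketch below assumes each rule is a fixed prefix optionally followed by one of the nucleotide classes base, purine or pyrimidine; the names `RULES`, `CLASSES` and `expand` are illustrative.

```python
# Nucleotide classes from the slide.
CLASSES = {"base": "acgt", "purine": "ag", "pyrimidine": "ct"}

# Amino-acid rules read from the slide: (prefix, class or None) alternatives.
RULES = {
    "leu": [("tt", "purine"), ("ct", "base")],
    "ser": [("ag", "pyrimidine"), ("tc", "base")],
    "arg": [("ag", "purine"), ("cg", "base")],
    "val": [("gt", "base")], "pro": [("cc", "base")], "ala": [("gc", "base")],
    "gly": [("gg", "base")], "thr": [("ac", "base")],
    "ile": [("at", "pyrimidine"), ("ata", None)],
    "lys": [("aa", "purine")], "asn": [("aa", "pyrimidine")],
    "gln": [("ca", "purine")], "his": [("ca", "pyrimidine")],
    "glu": [("ga", "purine")], "asp": [("ga", "pyrimidine")],
    "cys": [("tg", "pyrimidine")], "phe": [("tt", "pyrimidine")],
    "tyr": [("ta", "pyrimidine")],
    "met": [("atg", None)], "trp": [("tgg", None)],
}

def expand(amino):
    """Return the set of codons generated by the rule for one amino acid."""
    codons = set()
    for prefix, cls in RULES[amino]:
        if cls is None:
            codons.add(prefix)                       # fully specified codon
        else:
            codons.update(prefix + n for n in CLASSES[cls])
    return codons

if __name__ == "__main__":
    for amino in sorted(RULES):
        print(amino, sorted(expand(amino)))
```

Expanding all twenty rules gives 61 distinct codons with no overlaps, which together with the three stop codons taa, tag and tga accounts for all 64 triplets; the counts in parentheses on the slide are the sizes of these sets.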
23
- splice → intron
- intron → gt intron-body ag
- splice a → a intron; splice c → c intron
- splice t → t intron; splice g → g intron
- a splice → intron a; c splice → intron c
- t splice → intron t; g splice → intron g
intron g - upstream enhancer promotor enhancer
- enhancer
- promotor
- silencer
- isolator
24
- These rules can generate an unlimited set of gene-like sequences, mostly biological nonsense.
- They may be used to recognize gene-like segments in long DNA sequences.
- Syntax versus semantics: texts vs. grammar.
- Physics behind this coarse-grained description: stereochemistry, interaction between proteins and DNA chains, metallic ions, etc.
25 Symbolic Dynamics Languages
1991
1999
27 Subintervals determined by the periodic kneading sequence (RLRRC)^∞
28 Order of visits in the periodic kneading sequence (RLRRC)^∞
29 Transformations of subintervals
- a → c, d (on reading L)
- b → d (on reading R)
- c → b, c (on reading R)
- d → a (on reading R)
30
Input  L  R  R  R
q      a  b  c  d
d      1  1  0  0
c      1  0  1  0
b      0  0  1  0
a      0  0  0  1
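The 0/1 table above can be rebuilt from the interval transformations a → c, d; b → d; c → b, c; d → a. The sketch below assumes columns are the source interval q and rows are the target interval, in the slide's order d, c, b, a; the names `IMAGES`, `transfer_matrix` and `count_paths` are illustrative.

```python
# Transformations of subintervals as listed on the slides (interval -> images).
IMAGES = {"a": ["c", "d"], "b": ["d"], "c": ["b", "c"], "d": ["a"]}
COLUMNS = ["a", "b", "c", "d"]      # source interval q
ROWS = ["d", "c", "b", "a"]         # target interval, in the slide's row order

def transfer_matrix(images, rows, columns):
    """Entry is 1 when the column interval is mapped onto the row interval."""
    return [[1 if r in images[q] else 0 for q in columns] for r in rows]

def count_paths(images, length):
    """Number of admissible sequences of `length` consecutive intervals."""
    counts = {q: 1 for q in images}
    for _ in range(length - 1):
        counts = {q: sum(counts[t] for t in images[q]) for q in images}
    return sum(counts.values())

if __name__ == "__main__":
    for label, row in zip(ROWS, transfer_matrix(IMAGES, ROWS, COLUMNS)):
        print(label, *row)
```

Counting admissible paths with `count_paths` shows how quickly the number of allowed interval sequences grows with length, the quantity that such transfer matrices are typically used to estimate.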
31 Transfer Functions
State    R        L
a        -        c,d
b        d        -
c        b,c      -
d        a        -

State    R        L
a,b,c,d  a,b,c,d  c,d
c,d      a,b,c    -
a,b,c    b,c,d    c,d
b,c,d    a,b,c,d  -
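The second table follows from the first by the subset construction. The sketch below assumes the non-deterministic transfer function reads a → c, d on L and b → d, c → b, c, d → a on R, and that the start set contains all four intervals; the names `NFA` and `subset_construction` are illustrative.

```python
# Non-deterministic transfer function: (state, input letter) -> set of next states.
NFA = {
    ("a", "L"): {"c", "d"},
    ("b", "R"): {"d"},
    ("c", "R"): {"b", "c"},
    ("d", "R"): {"a"},
}

def subset_construction(nfa, start, alphabet=("R", "L")):
    """Build the deterministic transfer function reachable from the start set."""
    start = frozenset(start)
    dfa, todo = {}, [start]
    while todo:
        current = todo.pop()
        if current in dfa:
            continue
        dfa[current] = {}
        for letter in alphabet:
            target = frozenset(s for q in current for s in nfa.get((q, letter), ()))
            if target:                      # an empty target means "no transition"
                dfa[current][letter] = target
                todo.append(target)
    return dfa

if __name__ == "__main__":
    for state, moves in subset_construction(NFA, "abcd").items():
        row = {k: ",".join(sorted(v)) for k, v in moves.items()}
        print(",".join(sorted(state)), row)
```

Starting from {a, b, c, d}, only four subsets are reachable: {a,b,c,d}, {c,d}, {a,b,c} and {b,c,d}, matching the rows of the deterministic table.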
33 Stefan matrix for 256P in Feigenbaum cascade
34 Stefan matrix for F13233. Case (a)
35 Stefan matrix for F13233. Case (b)
36 Stefan matrix for F13233. Case (c)
37 Stefan matrix for F13233. Case (d)
38 Symbolic Dynamics Languages
1991
1999
39 Development of Anabaena catenula
- Alphabet Σ = {ar, al, br, bl}
- Production rules P: ar → al br, al → bl ar, br → ar, bl → al
- Initial symbol (axiom) ω = ar
- Grammar G = (Σ, P, ω)
- Language L(G) ⊂ Σ*
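The parallel rewriting that defines this L-system is short to simulate. The sketch below assumes the production rules read as ar → al br, al → bl ar, br → ar, bl → al with axiom ar; the function name `develop` is illustrative.

```python
# Production rules for Anabaena catenula as read from the slide.
RULES = {"ar": ["al", "br"], "al": ["bl", "ar"], "br": ["ar"], "bl": ["al"]}

def develop(axiom=("ar",), steps=1):
    """Apply the production rules to every cell in parallel, `steps` times."""
    word = list(axiom)
    for _ in range(steps):
        # Parallel rewriting: every symbol is replaced within the same step.
        word = [cell for symbol in word for cell in RULES[symbol]]
    return word

if __name__ == "__main__":
    for n in range(6):
        print(n, " ".join(develop(steps=n)))
```

Unlike the sequential rules of a Chomsky grammar, every symbol is rewritten at each step. For this system the filament length grows as 1, 2, 3, 5, 8, 13, i.e. along the Fibonacci numbers.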
40 Lindenmayer Systems
- Parallel production rules. Finer classification:
- D0L: Deterministic, no interaction, i.e., context-free
- 0L: non-deterministic, no interaction
- IL: non-deterministic, with Interaction, i.e., context-sensitive
- T0L: with Table of production rules
- TIL
- E0L: Extended to non-terminal symbols
- ET0L
- EIL = REL of Chomsky
41 [Figure: diagram relating the language classes FIN, RGL, CFL, CSL, REL and D0L]
- RGL: Regular
- CFL: Context-Free
- CSL: Context-Sensitive
- REL: Recursively Enumerable
42 [Figure: hierarchy diagram combining the Chomsky classes (0 REL, 1 CSL, 2 CFL, 3 RGL), the Lindenmayer classes (EIL, ET0L, IL, E0L, T0L, 0L, D0L) and the indexed languages (IND)]
43 Example à la Lindenmayer
- L = {a^i b^i c^i, i > 0} (CSL)
- G = (Σ, T, ω)
- ω = abc
- Σ = {a, b, c}
- T = {T1, T2}
- T1: a → aa, b → bb, c → cc
- T2: a → ?, b → ?, c → ?
- T0L
44 Dyck language: A language of nested parentheses
- Many types of parentheses
- Finite depth of nesting
- Context-free language
- Our case
- Only 3 types of parentheses
- Shallow nesting
- Conjecture (Xie): may be a regular language
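A Dyck language with several parenthesis types is recognized with a stack, as in a pushdown automaton. The sketch below is illustrative only: the three bracket pairs and the names `PAIRS`, `is_dyck` and `max_depth` are assumptions, since the slide does not specify the symbols used in "our case".

```python
# Three parenthesis types (an illustrative choice of symbols).
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())

def is_dyck(word, max_depth=None):
    """Check proper nesting; optionally reject nesting deeper than max_depth."""
    stack = []
    for ch in word:
        if ch in OPENERS:
            stack.append(ch)
            if max_depth is not None and len(stack) > max_depth:
                return False
        elif ch in PAIRS:
            # A closing symbol must match the most recently opened one.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        else:
            return False                    # symbol outside the alphabet
    return not stack

if __name__ == "__main__":
    print(is_dyck("([]{})"), is_dyck("([)]"), is_dyck("(([]))", max_depth=2))
```

When the nesting depth is bounded, the stack can hold only finitely many different contents, so each content can serve as the state of a finite automaton. This is the standard reason why a depth-bounded Dyck language is regular rather than merely context-free.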
45 [Slide text in Chinese lost in extraction; surviving terms: Z. G. Yu (2001), consensus]
46 Factorizable Languages
- Symbolic dynamics leads to factorizable languages
- A complete genome defines a factorizable language
- An amino acid sequence with unique reconstruction (at certain K) defines a factorizable language
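As a minimal sketch, assume the usual definition that a language is factorizable when every substring of each of its words also belongs to the language. Under that assumption, the set of all substrings of a sequence is factorizable by construction. The function names and the sample sequence below are hypothetical.

```python
def substrings(sequence, max_len=None):
    """All non-empty substrings of the sequence, optionally up to max_len."""
    n = len(sequence)
    max_len = n if max_len is None else max_len
    return {
        sequence[i:j]
        for i in range(n)
        for j in range(i + 1, min(n, i + max_len) + 1)
    }

def is_factorizable(language):
    """True when every substring of every word is itself in the language."""
    return all(substrings(word) <= language for word in language)

if __name__ == "__main__":
    genome = "acgtacgga"                    # hypothetical toy sequence
    language = substrings(genome)
    print(len(language), is_factorizable(language))
    print(is_factorizable({"acg"}))         # lacks "a", "c", "g", "ac", "cg"
```

For a real genome one would restrict attention to strings up to some length K, which is what the `max_len` parameter is meant to suggest.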
47 Modeling in Biology
- Cells
- Tissues
- Organs
- Systems: circulation, respiration, reproduction, neural, sensory, muscular, etc.
- Organisms, population, ecosystems
- Animals versus plants
- Plant development, morphology, physiology and pathology
48 Modeling of Plant Morphology by Using L-Systems
- P. Prusinkiewicz, J. Hanan, Lindenmayer Systems, Fractals, and Plants, LN in Biomath., vol. 79, Springer, 1989
- P. Prusinkiewicz, A. Lindenmayer, The Algorithmic Beauty of Plants, Springer, 1990
- P. Prusinkiewicz, M. Hammel, J. Hanan, R. Mech, Visual models of plant development, Chap. 9 in Handbook of Formal Languages, Vol. 3, Springer, 1997
49 Consistency of Macro- and Micro-Description of Nature
- Molecular phylogeny versus phylogeny based on morphological features
- Modeling plant development without getting into molecular and cellular description
- No need to model protein folding by invoking quarks!
50 Some Useful URLs
- www.grogra.org (Growth Grammar)
- http://www.computableplant.org
- http://algorithmicbotany.org
51
- Huimin Xie, Grammatical Complexity and 1D Dynamical Systems, Vol. 6 in Directions in Chaos, WSPC, 1996.
- Huimin Xie, Complexity and Dynamical Systems (in Chinese), Shanghai Scientific and Technological Education Publishing House, 1994
- Bailin Hao, Weimou Zheng, Applied Symbolic Dynamics and Chaos (WSPC, 1998), Chap. 8
- J. Hopcroft, J. Ullman, Introduction to Automata Theory, Languages and Computation, Addison-Wesley, 1979.
52 Thanks!