Grammatical Complexity of Symbolic Sequences: A Brief Introduction

1
Grammatical Complexity of Symbolic Sequences: A Brief Introduction
Bailin Hao
T-Life Research Center, Fudan University
Institute of Theoretical Physics, Academia Sinica
The Santa Fe Institute, New Mexico, USA
http://www.itp.ac.cn/hao/
2
Three Paradigms in Theoretical Description of Nature
  • Deterministic: based on periodicities and
    recurrences, from Kepler to Yang-Mills
  • Stochastic: based on randomness, from Brownian
    motion to MSR field theory of hydrodynamics and
    molecular motors
  • Fractal, self-similar, scale invariant: from
    phase transitions and critical phenomena to
    chaotic dynamics
  • Finiteness is the unifying Physics languages

3
Linguistics (language, philology)
  • Zipf
  • Chomsky
  • Lindenmayer
  • Factorizable language

4
(No Transcript)
5
An Observation
  • u d c s b t
  • charge, mass, flavor, charm,
  • p n e
  • charge, mass, spin, magnetic moment,
  • H C N O P
  • atomic number, ion radius, valence, affinity,
  • H2O NO CO2
  • molecular weight, polarity,
  • a c g t
  • A D E F G H W Y V
  • BRCA1 PDGF

6
A PROGRAMME
  • Coarse-Grained Description of Nature
  • Use of Symbols and Symbolic Strings
  • Language
  • Grammar and Complexity
  • (Chomsky, Lindenmayer, etc.)
  • So far this programme has been best realized in
    the study of dynamics by using Symbolic Dynamics.
  • There have been preliminary attempts in analyzing
    biological sequences.

7
  • It may not be a coincidence that the two systems
    in the universe that most impress us with their
    open-ended complex design, life and mind, are
    based on discrete combinatorial systems. Many
    biologists believe that if inheritance were not
    discrete, evolution as we know it could not have
    taken place.
  • S. Pinker, The Language Instinct (1995)

8
Simple Examples
  • At the level of words
  • DOG GOD
  • At sentence level
  • Dog bites Man
  • Man bites Dog

9
[Figure: domain structure of five proteins, each drawn from the N-terminus to the C-terminus]
  • EGF (epidermal growth factor)
  • Chymotrypsin
  • Urokinase (UK)
  • Factor IX
  • Plasminogen
  • B. Alberts et al., Molecular Biology of the Cell,
    3rd ed., 1994, p. 123
10
  • GC
  • 1. a, c, g, t
  • 2. A, C, D, ..., W, Y
  • 3. a, ..., z, A, ..., Z, ...

11
Classification of Formal Languages
  • Chomsky Hierarchy
  • Sequential production rules
  • Lindenmayer Systems
  • Parallel production rules

12
Generative Grammar
  • S: Sentence
  • NP: Noun Phrase
  • VP: Verb Phrase
  • Adj: Adjective
  • Art: Article

S → NP VP          VP → V NP          NP → (Art) Adj N
S → if S then S    S → either S or S
N → boy | girl | scientist
V → sees | believes | loves | eats
Adj → young | good | beautiful
Art → a | one | the
Non-Terminal and Terminal Symbols
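
The rewriting rules on this slide can be tried out directly. The following is a minimal sketch in Python, assuming the rules read S → NP VP, VP → V NP, NP → (Art) Adj N with the word lists as terminals. The names RULES and generate are illustrative, and the recursive rules S → if S then S and S → either S or S are left out so that every derivation stays finite.

```python
import random

# Production rules: each non-terminal maps to a list of alternative right-hand sides.
# "(Art)" is read as an optional article, hence the two NP alternatives (an assumption).
RULES = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["Art", "Adj", "N"], ["Adj", "N"]],
    "N": [["boy"], ["girl"], ["scientist"]],
    "V": [["sees"], ["believes"], ["loves"], ["eats"]],
    "Adj": [["young"], ["good"], ["beautiful"]],
    "Art": [["a"], ["one"], ["the"]],
}

def generate(symbol="S", rng=random):
    """Expand `symbol` until only terminal words remain."""
    if symbol not in RULES:              # terminal symbol: emit as is
        return [symbol]
    words = []
    for part in rng.choice(RULES[symbol]):
        words.extend(generate(part, rng))
    return words

if __name__ == "__main__":
    print(" ".join(generate()))
```

Every generated sentence is grammatical by construction, whether or not it is meaningful, which is the point of separating non-terminal from terminal symbols.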
13
  • Chomsky grammar
  • N: the set of non-terminal symbols
  • T: the set of terminal symbols
  • S ∈ N: the start symbol
  • P: the set of production rules (x → y)
  • x, y: strings over N ∪ T
  • Grammar: G = (N, T, P, S)
  • Type 0 grammar:
  • x ∈ (N∪T)* N (N∪T)*
  • y ∈ (N∪T)*
14
  • Type 1 grammar: context-sensitive
  • x = t1 a t2
  • t1, t2 ∈ T*
  • a ∈ N
  • Type 2 grammar: context-free
  • x = a ∈ N
  • Type 3 grammar: regular
  • x = a, y = b or bc
  • a, c ∈ N; b ∈ T

15
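Chomsky Hierarchy of Formal Languages
Type  Language                Abbreviation  Automaton
0     Recursively enumerable  REL           Turing machine
1     Context-sensitive       CSL           Linear bounded automaton
2     Context-free            CFL           Pushdown automaton (stack)
3     Regular                 RGL           Finite state automaton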
16
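A Finite State Automaton (FSA)
[Figure: state-transition diagrams (i) and (ii) with nodes a, b and edges labeled R and L, illustrating δ(a, R) = b]
A transfer function
[Figure: transfer-function table with columns R and L and rows for the states a, b, c, d]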
17
FSA: Finite State Automata
  • Deterministic FSA (DFSA)
  • Non-deterministic FSA (NDFSA)
  • Equivalence of DFSA and NDFSA: subset
    construction
  • Minimal DFSA
  • Myhill-Nerode theorem (1958): number of nodes in
    minDFSA

18
A Pushdown Automaton
  • Pushdown list
  • Stack
  • First In Last Out (FILO)

19
A Turing Machine
Alan M. Turing (1912-1954)
  • FSA + R/W tape
  • Church-Turing Thesis (1936):
    any effective (mechanical) computation can
    be carried out by a Turing machine

20
Example: {a^i b^i c^i | i > 0} ∈ CSL
  • Terminals: a, b, c
  • Non-terminals: A, B
  • Sequential rules: B → aBAc | abc
  •                   bA → bb
  •                   cA → Ac
  • i = 1: B ⇒ abc
  • i = 2: B ⇒ aBAc ⇒ aabcAc ⇒ aabAcc ⇒ aabbcc
  • i = 3: B ⇒ aBAc ⇒ aaBAcAc ⇒ aaBAAcc ⇒ aaabcAAcc
    ⇒ aaabAcAcc ⇒ aaabAAccc ⇒ aaabbAccc ⇒ aaabbbccc
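
The derivations on this slide can be checked mechanically. Below is a minimal sketch in Python, assuming the four rules B → aBAc, B → abc, bA → bb and cA → Ac; the function name derive and the length bound are illustrative. Because no rule shortens a string, discarding every sentential form longer than the bound still finds all terminal words up to that length.

```python
from collections import deque

# Sequential rewriting rules: one rule is applied at one position per step.
RULES = [("B", "aBAc"), ("B", "abc"), ("bA", "bb"), ("cA", "Ac")]

def derive(start="B", max_len=9):
    """Return all terminal strings of length <= max_len derivable from `start`."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if form.islower():               # only the terminals a, b, c remain
            words.add(form)
            continue
        for lhs, rhs in RULES:
            pos = form.find(lhs)
            while pos != -1:             # try the rule at every occurrence
                new = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return sorted(words, key=len)

if __name__ == "__main__":
    print(derive())   # ['abc', 'aabbcc', 'aaabbbccc']
```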

21
Rules to Generate Gene-Like Sequences (according
to David Searls)
  • gene → upstream transcript downstream
  • transcript → 5'-untranslated-region
    start-codon coding-region
    3'-untranslated-region
  • coding-region → codon coding-region | stop-codon
  • coding-region → splice coding-region
  • codon → lys | asn | thr | met | glu | his |
    pro | asp | ala | gly | tyr |
    trp | phe | leu | ile | ser |
    arg | gln | val | cys
  • start-codon → met
  • stop-codon → taa | tag | tga

22
  • leu → tt purine | ct base (6)
  • ser → ag pyrimidine | tc base (6)
  • arg → ag purine | cg base (6)
  • val → gt base (4)
  • pro → cc base (4)
  • ala → gc base (4)
  • gly → gg base (4)
  • thr → ac base (4)
  • ile → at pyrimidine | ata (3)
  • lys → aa purine (2)
  • asn → aa pyrimidine (2)
  • gln → ca purine (2)
  • his → ca pyrimidine (2)
  • glu → ga purine (2)
  • cys → tg pyrimidine (2)
  • phe → tt pyrimidine (2)
  • tyr → ta pyrimidine (2)
  • asp → ga pyrimidine (2)
  • met → atg
  • trp → tgg
  • base → a | c | g | t
  • purine → a | g
  • pyrimidine → c | t
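
The codon rules can be expanded to check the counts given in parentheses. This is a minimal sketch in Python, assuming each rule has the form "two-letter prefix followed by a nucleotide class" or a fixed triplet; the names CLASSES, CODON_RULES and expand are illustrative.

```python
from itertools import product

# Nucleotide classes used in the rules.
CLASSES = {"base": "acgt", "purine": "ag", "pyrimidine": "ct"}

# Each amino acid maps to its alternatives: a prefix plus a nucleotide class
# for the last position, or a complete triplet with class None.
CODON_RULES = {
    "leu": [("tt", "purine"), ("ct", "base")],
    "ser": [("ag", "pyrimidine"), ("tc", "base")],
    "arg": [("ag", "purine"), ("cg", "base")],
    "val": [("gt", "base")], "pro": [("cc", "base")],
    "ala": [("gc", "base")], "gly": [("gg", "base")],
    "thr": [("ac", "base")],
    "ile": [("at", "pyrimidine"), ("ata", None)],
    "lys": [("aa", "purine")], "asn": [("aa", "pyrimidine")],
    "gln": [("ca", "purine")], "his": [("ca", "pyrimidine")],
    "glu": [("ga", "purine")], "asp": [("ga", "pyrimidine")],
    "cys": [("tg", "pyrimidine")],
    "phe": [("tt", "pyrimidine")], "tyr": [("ta", "pyrimidine")],
    "met": [("atg", None)], "trp": [("tgg", None)],
}
STOP = ["taa", "tag", "tga"]

def expand(amino_acid):
    """List every codon the rules derive for one amino acid."""
    codons = []
    for prefix, cls in CODON_RULES[amino_acid]:
        if cls is None:
            codons.append(prefix)
        else:
            codons.extend(prefix + n for n in CLASSES[cls])
    return codons

if __name__ == "__main__":
    sense = [c for aa in CODON_RULES for c in expand(aa)]
    all_triplets = {"".join(p) for p in product("acgt", repeat=3)}
    # Sense codons plus stop codons should cover every triplet exactly once.
    print(len(sense), set(sense) | set(STOP) == all_triplets)
```

The 61 sense codons and the 3 stop codons together account for all 64 triplets, so the rules reproduce the standard genetic code without overlap.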

23
  • splice → intron
  • intron → gt intron-body ag
  • splice a → a intron
  • splice c → c intron
  • splice t → t intron
  • splice g → g intron
  • a splice → intron a
  • c splice → intron c
  • t splice → intron t
  • g splice → intron g
  • upstream → enhancer promoter enhancer
  • enhancer
  • promoter
  • silencer
  • isolator

24
  • These rules are capable of generating an
    unlimited set of gene-like sequences, mostly
    biological nonsense.
  • They may be used to recognize gene-like segments
    in long DNA sequences.
  • Syntax versus semantics: texts vs. grammar.
  • Physics behind this coarse-grained
    description: stereochemistry, interaction
    between proteins and DNA chains, metallic
    ions, etc.

25
Symbolic Dynamics Languages
1991
1999
26
(No Transcript)

27
Subintervals determined by the periodic
kneading sequence (RLRRC)^∞
28
Order of visits in the periodic kneading sequence
(RLLRC)^∞
29
Transformations of subintervals
  • a → c d (on reading L)
  • b → d (on reading R)
  • c → b c (on reading R)
  • d → a (on reading R)

30
Input   L  R  R  R
q       a  b  c  d
d       1  1  0  0
c       1  0  1  0
b       0  0  1  0
a       0  0  0  1
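
The 0/1 table above follows directly from the transformations of subintervals on the preceding slide. A minimal sketch in Python, assuming that entry (p, q) is 1 exactly when subinterval q is mapped over subinterval p; the names IMAGE and transfer_matrix are illustrative.

```python
# Transformations of subintervals: each subinterval is mapped onto a union
# of subintervals when the given symbol is read.
IMAGE = {
    "a": ("L", ["c", "d"]),
    "b": ("R", ["d"]),
    "c": ("R", ["b", "c"]),
    "d": ("R", ["a"]),
}
STATES = ["a", "b", "c", "d"]

def transfer_matrix(image, states):
    """Row p, column q holds 1 when subinterval q is mapped over subinterval p."""
    return {p: [1 if p in image[q][1] else 0 for q in states] for p in states}

if __name__ == "__main__":
    matrix = transfer_matrix(IMAGE, STATES)
    print("Input  " + " ".join(IMAGE[q][0] for q in STATES))
    print("q      " + " ".join(STATES))
    for p in reversed(STATES):           # rows listed d, c, b, a as in the table
        print(p + "      " + " ".join(str(x) for x in matrix[p]))
```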
31
Transfer Functions
            R          L
a           -          c, d
b           d          -
c           b, c       -
d           a          -

            R          L
a,b,c,d     a,b,c,d    c,d
c,d         a,b,c      -
a,b,c       b,c,d      c,d
b,c,d       a,b,c,d    -
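
The second table is what the subset construction yields when it is started from the full set of states. A minimal sketch in Python, assuming the non-deterministic transfer function a → c, d on L; b → d, c → b, c and d → a on R; the names DELTA and subset_construction are illustrative.

```python
# Non-deterministic transfer function: (state, symbol) -> set of next states.
DELTA = {
    ("a", "L"): {"c", "d"},
    ("b", "R"): {"d"},
    ("c", "R"): {"b", "c"},
    ("d", "R"): {"a"},
}

def subset_construction(delta, start, alphabet=("R", "L")):
    """Build the deterministic transfer function reachable from `start`."""
    start = frozenset(start)
    table, todo = {}, [start]
    while todo:
        subset = todo.pop()
        if subset in table:
            continue
        table[subset] = {}
        for symbol in alphabet:
            target = frozenset(
                t for s in subset for t in delta.get((s, symbol), ())
            )
            if target:                   # an empty target means no transition
                table[subset][symbol] = target
                todo.append(target)
    return table

if __name__ == "__main__":
    for subset, row in subset_construction(DELTA, "abcd").items():
        cells = [",".join(sorted(row[s])) if s in row else "-" for s in ("R", "L")]
        print(",".join(sorted(subset)).ljust(10), cells[0].ljust(10), cells[1])
```

Only four subsets are reachable, so the deterministic automaton has four states.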
32
(No Transcript)
33
Stefan matrix for 256P in Feigenbaum cascade
34
Stefan matrix for F13233. Case (a)
35
Stefan matrix for F13233. Case (b)
36
Stefan matrix for F13233. Case (c)
37
Stefan matrix for F13233. Case (d)
38
Symbolic Dynamics Languages
1991
1999
39
Development of Anabaena catenula
  • Alphabet: Σ = {ar, al, br, bl}
  • Production rules P:
      ar → al br
      al → bl ar
      br → ar
      bl → al
  • Initial symbol (axiom): ω = ar
  • Grammar: G = (Σ, P, ω)
  • Language: L(G) ⊂ Σ*
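
The parallel rewriting of this D0L system is easy to simulate. A minimal sketch in Python, assuming the production rules ar → al br, al → bl ar, br → ar, bl → al and the axiom ar; the function names step and develop are illustrative.

```python
# D0L system for Anabaena catenula: every symbol is rewritten in parallel.
PRODUCTIONS = {
    "ar": ["al", "br"],
    "al": ["bl", "ar"],
    "br": ["ar"],
    "bl": ["al"],
}
AXIOM = ["ar"]

def step(word, productions=PRODUCTIONS):
    """Apply the production rules to all symbols of `word` at once."""
    return [out for symbol in word for out in productions[symbol]]

def develop(axiom=AXIOM, generations=5):
    """Return the words from the axiom up to the given generation."""
    words = [list(axiom)]
    for _ in range(generations):
        words.append(step(words[-1]))
    return words

if __name__ == "__main__":
    for n, word in enumerate(develop()):
        print(n, len(word), " ".join(word))
```

The word lengths in successive generations are 1, 2, 3, 5, 8, 13, that is, they follow the Fibonacci sequence.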
40
  • Lindenmayer Systems
  • Parallel production rules. Finer classification:
  • D0L: deterministic, no interaction, i.e.,
    context-free
  • 0L: non-deterministic, no interaction
  • IL: non-deterministic, with interaction, i.e.,
    context-sensitive
  • T0L: with a table of production rules
  • TIL
  • E0L: extended to non-terminal symbols
  • ET0L
  • EIL = REL of Chomsky

41

[Figure: inclusion diagram of the language classes FIN, RGL, CFL, CSL, REL and D0L]
  • RGL: Regular
  • CFL: Context-Free
  • CSL: Context-Sensitive
  • REL: Recursively Enumerable

42
[Figure: inclusion diagram relating three families of language classes]
  • Chomsky: 0 REL, 1 CSL, 2 CFL, 3 RGL
  • Lindenmayer: EIL, IL, ET0L, E0L, T0L, 0L, D0L
  • Indexed: IND
43
Example à la Lindenmayer
  • L = {a^i b^i c^i | i > 0} ∈ CSL
  • G = (Σ, T, ω)
  • ω = abc
  • Σ = {a, b, c}
  • T = {T1, T2}
  • T1: a → aa, b → bb, c → cc
  • T2: a → ?, b → ?, c → ?

  • T0L

44
Dyck language: a language of nested parentheses
  • Many types of parentheses
  • Finite depth of nesting
  • Context-free language
  • Our case:
  • Only 3 types of parentheses
  • Shallow nesting
  • Conjecture (Xie): may be a regular language
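
The link between shallow nesting and regularity rests on a general fact: when the nesting depth is bounded, the stack of a recognizer can hold only finitely many different contents, so it can be folded into the states of a finite automaton. The sketch below illustrates that fact only, not the conjecture itself. It is written in Python and assumes three bracket types "()", "[]", "{}" and a depth bound of 2, both chosen purely for illustration.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())

def accepts(word, max_depth=2):
    """Recognize well-nested words whose nesting depth never exceeds max_depth.

    The tuple `state` plays the role of a finite-automaton state: it is the
    stack of currently open brackets, and its length is at most max_depth.
    """
    state = ()
    for ch in word:
        if ch in OPENERS:
            if len(state) == max_depth:
                return False             # nesting too deep: reject
            state = state + (ch,)
        elif ch in PAIRS:
            if not state or state[-1] != PAIRS[ch]:
                return False             # mismatched or unopened bracket
            state = state[:-1]
        else:
            return False                 # symbol outside the alphabet
    return state == ()

def number_of_states(types=3, max_depth=2):
    """Count the possible stack contents: 1 + k + k^2 + ... + k^d."""
    return sum(types ** depth for depth in range(max_depth + 1))

if __name__ == "__main__":
    print(accepts("([]){}"), accepts("([{}])"), number_of_states())
```

With 3 bracket types and depth at most 2 there are 1 + 3 + 9 = 13 possible stack contents, so 13 states plus one rejecting state suffice.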

45
  • Z. G. Yu (2001)
  • Consensus

46
Factorizable Languages
  • Symbolic dynamics leads to factorizable languages
  • A complete genome defines a factorizable language
  • An amino acid sequence with unique reconstruction
    (at certain K) defines a factorizable language
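
A minimal sketch in Python of how a single sequence defines a factorizable language. It assumes the usual definition, which the slide does not spell out: a language is factorizable when every substring (factor) of an admissible word is itself admissible. The toy sequence and the function names are illustrative.

```python
def factors(sequence):
    """Return the set of all non-empty substrings (factors) of `sequence`."""
    n = len(sequence)
    return {sequence[i:j] for i in range(n) for j in range(i + 1, n + 1)}

def is_factorizable(language):
    """Check that every substring of every word is again in the language."""
    return all(factors(word) <= language for word in language)

if __name__ == "__main__":
    genome = "acgtacgga"                 # toy stand-in for a complete genome
    language = factors(genome)
    print(is_factorizable(language))                 # True
    # Removing a factor of some remaining word breaks the property.
    print(is_factorizable(language - {"cg"}))        # False
```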

47
Modeling in Biology
  • Cells
  • Tissues
  • Organs
  • Systems: circulation, respiration,
    reproduction, neural, sensory, muscular, etc.
  • Organisms, population, ecosystems
  • Animals versus plants
  • Plant development, morphology, physiology and
    pathology

48
Modeling of Plant Morphology by Using L-Systems
  • P. Prusinkiewicz, J. Hanan, Lindenmayer Systems,
    Fractals, and Plants, LN in Biomath., vol. 79,
    Springer, 1989
  • P. Prusinkiewicz, A. Lindenmayer, The Algorithmic
    Beauty of Plants, Springer, 1990
  • P. Prusinkiewicz, M. Hammel, J. Hanan, R. Mech,
    Visual models of plant development, Chap.9 in
    Handbook of Formal Languages, Vol.3, Springer,
    1997

49
Consistency of Macro- and Micro-Description of
Nature
  • Molecular phylogeny versus phylogeny based on
    morphological features
  • Modeling plant development without getting into
    molecular and cellular description
  • No need to model protein folding by invoking
    quarks!

50
Some Useful URLs
  • www.grogra.org (Growth Grammar)
  • http://www.computableplant.org
  • http://algorithmicbotany.org

51
  • Huimin Xie, Grammatical Complexity and
    1D Dynamical Systems, Vol. 6 in Directions in
    Chaos, WSPC, 1996.
  • Huimin Xie, Complexity and Dynamical Systems
    (in Chinese), Shanghai Scientific and Technological
    Education Publishing House, 1994
  • Bailin Hao, Weimou Zheng, Applied Symbolic
    Dynamics and Chaos (WSPC, 1998), Chap. 8
  • J.Hopcroft, J.Ullman, Introduction to Automata
    Theory, Languages and Computation,
    Addison-Wesley, 1979.

52
Thanks!