Title: MOST OF BIOLOGY IS LOOKING FOR SIGNALS
1 (MOST OF) BIOLOGY IS LOOKING FOR SIGNALS
- STRUCTURAL ( HOW IT IS PUT TOGETHER )
- FUNCTIONAL ( HOW IT WORKS )
- EVOLUTIONARY ( HOW IT CAME TO BE LIKE IT IS )
- S. and F. CAN BE STUDIED DIRECTLY - OBSERVATION, EXPERIMENT
- E. HAPPENED WHEN WE WERE NOT THERE => NEED TO INFER
- INFERENCE IS TYPICALLY BASED ON COMPARISON
- S., F., and E. signals may enhance or interfere with each other
2 COMPUTATIONAL APPROACHES IN BIOLOGY
- RELY ON BACKGROUND KNOWLEDGE OF THE PROCESS
- DB SEARCH STATISTICS: WHAT IS A RANDOM MATCH ?
- SIMILARITY BETWEEN SEQUENCES: WHERE DOES THE SCORING FUNCTION COME FROM ?
- PHYLOGENETIC TREES: ARE CHANGE RATES CONSTANT OVER TIME ?
- etc., etc.
- WHAT FOR ?
- ESTIMATION OF PARAMETERS
- DEEPER FOUNDATIONS: WHAT IS POSSIBLE / ALLOWED
3 PROBLEM: TO INTERPRET GENOME SEQUENCE
- NON ALGORITHMIC, OPEN-ENDED
- I: NEW GENOME SEQUENCE, GENOMES AND OTHER SEQUENCES IN DB ( ALL BIOLOGICAL KNOWLEDGE )
- O: STATEMENTS ABOUT STRUCTURE, FUNCTION, AND EVOLUTIONARY HISTORY
- NOTHING IN BIOLOGY MAKES SENSE EXCEPT IN THE LIGHT OF EVOLUTION ( Th. Dobzhansky )
- NOT A PLUG, BUT LITERAL DESCRIPTION
- CHANGE INPUT TO LIST OF PROTEIN CODING GENES
4 SIMILARITY SEARCH ANNOTATION TRANSFER
98  COG0494  NTP pyrophosphohydrolases including oxidative damage repair enzymes
87  COG2217  Cation transport ATPases
78  COG0050  GTPases - translation elongation factors
77  COG0037  Predicted ATPase of the PP-loop superfamily implicated in cell cycle control
76  COG0330  Membrane protease subunits, stomatin/prohibitin homologs
75  COG0492  Thioredoxin reductase
74  COG0480  Translation elongation and release factors (GTPases)
71  COG0008  Glutamyl- and glutaminyl-tRNA synthetases
69  COG0459  Chaperonin GroEL (HSP60 family)
65  COG0470  ATPase involved in DNA replication
65  COG0681  Signal peptidase I
60  COG0086  DNA-directed RNA polymerase beta' subunit/160 kD subunit (split gene in archaea and Syn)
60  COG0475  Kef-type K+ transport systems, membrane components
58  COG1475  Predicted transcriptional regulators
56  COG0258  5'-3' exonuclease (including N-terminal domain of PolI)
58  COG0550  Topoisomerase IA
58  COG0638  Proteasome protease subunit
57  COG0468  RecA/RadA recombinase
56  COG2890  Predicted rRNA or tRNA methylase
56  COG0438  Predicted glycosyltransferases
56  COG0616  Periplasmic serine proteases (ClpP class)
53  COG0009  Putative translation factor (SUA5)
49  COG0085  DNA-directed RNA polymerase beta subunit/140 kD subunit
48  COG0441  Threonyl-tRNA synthetase
48  COG0162  Tyrosyl-tRNA synthetase
47  COG0532  Translation initiation factor 2 (GTPase)
46  COG0080  Ribosomal protein L11
46  COG0048  Ribosomal protein S12
46  COG0522  Ribosomal protein S4 and related proteins
46  COG0592  DNA polymerase sliding clamp subunit (PCNA homolog)
46  COG0350  Methylated DNA-protein cysteine methyltransferase
45  COG0018  Arginyl-tRNA synthetase
45  COG0495  Leucyl-tRNA synthetase
45  COG0143  Methionyl-tRNA synthetase
45  COG0081  Ribosomal protein L
5 MOST OF THE PROTEINS ARE CONSERVED
AND MOST OF THE FAMILIES ARE ANNOTATED
( BUT MIND BOGUS NAMES, DOMAINS, AND FILTERING ! )
6 HOMOLOGY = COMMON ANCESTRY
- IT IS EITHER THERE OR IT IS NOT ( NO DEGREES )
- OBJECTION 1: WHAT IF ONLY HALF OF THE MOLECULE IS HOMOLOGOUS ?
- JUST SAY SO
- OBJECTION 2: WE MAY MEAN THE DEGREE OF CERTAINTY THAT THEY ARE HOMOLOGOUS
- 1. JUST SAY SO
- 2. SOME STATISTICIANS DO NOT LIKE IT EITHER
- 3. 60% IDENTITY MAY CONFER 100% BELIEF THAT HOMOLOGY EXISTS
- ORTHOLOGY / PARALOGY IS ESTABLISHED AFTER
- FUNCTIONAL HOMOLOGY USUALLY DOES NOT MAKE SENSE ( CALL IT THE SAME FUNCTION )
7 HOMOLOGS AND THEIR SUBSETS
[Figure: homologs divided into subsets labelled PARALOGS, ORTHOLOGS AND PARALOGS, and ORTHOLOGS]
8 WHAT IS A TREE, ANYWAY ?
- TREE IS AN OBJECT OF MATHEMATICS - A SPECIAL TYPE OF GRAPH
- GRAPH - A SET OF ELEMENTS ( VERTICES ) PLUS A SET OF SOME PAIRS OF THESE ELEMENTS ( EDGES )
- GRAPH IS CONNECTED IF EACH TWO VERTICES ARE LINKED ( BY ONE PATH OR MORE )
- CONNECTED GRAPH IS A TREE IF IT HAS NO CYCLES
[Figure: two example graphs. One is not a TREE, and the other is]
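The definition on this slide can be checked mechanically. The following is a minimal Python sketch, not taken from the slides: the function name `is_tree` and the vertex labels are hypothetical, the graph is assumed to be undirected, and an empty graph is assumed not to count as a tree. It relies on the standard fact that a connected graph on n vertices has no cycles exactly when it has n - 1 edges.

```python
from collections import defaultdict


def is_tree(vertices, edges):
    """Return True if the undirected graph is connected and has no cycles."""
    vertices = list(vertices)
    if not vertices:
        return False  # assumption: the empty graph is not counted as a tree
    # Treat each edge as an unordered pair so (u, v) and (v, u) are one edge.
    edge_set = {frozenset(edge) for edge in edges}
    if any(len(edge) != 2 for edge in edge_set):
        return False  # a self-loop is a cycle
    # A connected graph on n vertices is acyclic exactly when it has n - 1 edges.
    if len(edge_set) != len(vertices) - 1:
        return False
    adjacency = defaultdict(set)
    for edge in edge_set:
        u, v = tuple(edge)
        adjacency[u].add(v)
        adjacency[v].add(u)
    # Depth-first search from an arbitrary vertex to test connectivity.
    seen = {vertices[0]}
    stack = [vertices[0]]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(vertices)


path = [("A", "B"), ("B", "C"), ("C", "D")]
print(is_tree("ABCD", path))                  # a simple path is a tree
print(is_tree("ABCD", path + [("D", "A")]))   # closing the path makes a cycle
```

The edge-count shortcut keeps the search simple: once the count is right, only connectivity remains to be tested.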
9 PROBLEM: MAP ORTHOLOGS AND PARALOGS
- I: TREE OF GENES G (BUILT USING KNOWN ALGORITHMS)
- TREE OF SPECIES S (ASSUMED NON-CONTROVERSIAL)
- O: FOR EACH NODE g IN G, LABEL IT AS DUPL. or SPEC.
[Figure: gene tree G with internal nodes g1, g2, g3, and species tree S; leaves of both trees are labelled A, C, D, E, B]
- FOR EACH g, γ(g) IS THE SET OF SPECIES TO WHICH g's DESCENDANTS BELONG
- FOR EACH s, σ(s) IS THE SET OF SPECIES DESCENDING FROM s
- MAPPING FUNCTION: M(g) IS THE LOWEST NODE s IN S SUCH THAT γ(g) ⊆ σ(s)
- NODE g ( PARENT OF gi, gj ) IS A DUPLICATION iff M(g) = M(gi) OR M(g) = M(gj)
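The mapping rule on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the implementation used for the slides: trees are encoded as nested 2-tuples with species names as leaves, the function names (`leaf_set`, `lowest_node`, `label_nodes`) are hypothetical, and the toy trees `S` and `G` are invented for the example and are not the trees drawn in the figure. Every species in the gene tree is assumed to occur in the species tree.

```python
def leaf_set(tree):
    """Set of species below a node (plays the role of both gamma and sigma)."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaf_set(left) | leaf_set(right)


def lowest_node(species_tree, gamma):
    """M(g): descend through S while one child still covers all of gamma."""
    node = species_tree
    while not isinstance(node, str):
        left, right = node
        if gamma <= leaf_set(left):
            node = left
        elif gamma <= leaf_set(right):
            node = right
        else:
            break  # neither child covers gamma, so this node is the lowest
    return node


def label_nodes(gene_tree, species_tree):
    """Label each internal node of G as 'DUPL.' or 'SPEC.' (preorder)."""
    labels = []

    def visit(g):
        if isinstance(g, str):
            return
        gi, gj = g
        m_g = lowest_node(species_tree, leaf_set(g))
        m_i = lowest_node(species_tree, leaf_set(gi))
        m_j = lowest_node(species_tree, leaf_set(gj))
        # Duplication iff g maps to the same species node as one of its children.
        labels.append((g, "DUPL." if m_g in (m_i, m_j) else "SPEC."))
        visit(gi)
        visit(gj)

    visit(gene_tree)
    return labels


S = (("A", "B"), "C")                 # hypothetical species tree
G = (("A", "B"), ("A", "C"))          # hypothetical gene tree, two copies in A
for node, kind in label_nodes(G, S):
    print(kind, node)
# DUPL. (('A', 'B'), ('A', 'C'))
# SPEC. ('A', 'B')
# SPEC. ('A', 'C')
```

In this toy case the root of G is a duplication because its right child, spanning species A and C, already maps to the root of S. Recomputing leaf sets at every step is wasteful on large trees; a real implementation would cache them.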
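10
[Figure: gene tree G (internal nodes g1, g2, g3) and species tree S, followed by the nodes g1, g2, g3 mapped onto the species tree with leaves A, C, D, E, B]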
13 CANDIDATE ORTHOLOGS ARE
- EACH OTHER'S TOP MATCH
- CLOSER TO ONE ANOTHER THAN TO A HOMOLOG FROM AN OUTGROUP
- ARCHITECTURALLY SIMILAR
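The first criterion, being each other's top match, is often implemented as a reciprocal-best-hit test. The Python sketch below covers only that criterion; the outgroup and domain-architecture checks are not attempted. The function names, gene identifiers, and scores are all hypothetical, and the scores are assumed to come from some similarity search where higher means more similar.

```python
def best_hits(scores):
    """Map each query to its top-scoring subject.

    scores: dict of {(query, subject): similarity score}.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's top match and a is b's top match."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical scores between genes of genome A and genome B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 90,
          ("a2", "b1"): 150, ("a2", "b2"): 60}
b_vs_a = {("b1", "a1"): 200, ("b1", "a2"): 150,
          ("b2", "a1"): 90, ("b2", "a2"): 60}

print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```

Here a2 also prefers b1, but b1 prefers a1, so only the pair (a1, b1) survives as a candidate; a2 would need the tree-based analysis to be classified.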
14 DISPLACEMENT of ORTHOLOGOUS GENES (DOGs)
WHERE [figure] AND [figure] MAY BE
- ANALOGS
- PARALOGS
- ORTHOLOGS (XENOLOGS)
( BIZARRE PHYLOGENY, COEXISTENCE )
15 DO NOT KNOW WHY, BUT MAY GUESS HOW
- DIFFERENTIAL LOSS
- INDEPENDENT RECRUITMENT
16 MEVALONATE PATHWAY FULL OF DOGs
[Figure: reaction scheme of the mevalonate pathway (chemical structures carrying SCoA, CoA, OPi, and OPPi groups), shown with the DXPS pathway]
17 DOGged PATHWAY: CANDIDATE APPROACH
- SEQUENCE SIMILARITY AND TREE ANALYSIS ( PHYLOGENOMICS )
- PMK - KINASE OF GHMP FAMILY
- PMK - animal dNTP KINASE FAMILY
- MPPDC - GHMP-RELATED
- - PHOSPHORYLATES SUBSTRATE
- IPPI - MutT DOMAIN
- - PPase ACTIVITY USELESS - BINDING ?
- EXHAUSTIVE DEFINITION OF ALL PARALOGS IN ARCHAEA
- ANALYSIS OF TREES AND ORPHANS - 1 , NO , NO
[Figure: chemical structures of pathway intermediates with OPi and OPPi groups]
18 NON-HOMOLOGY METHODS
- GENOME CONTEXT METHODS DEPEND ON ORTHOLOGS ! ! !
- DOMAIN FUSIONS - ROSETTA STONE - NO GO
- FUNCTIONS OF ADJACENT GENES
- PHYLETIC PATTERNS
- BORRELIA: 5 MP GENES IN A SIX-GENE STRING
- ORTHOLOGS IN ALL ARCHAEA, ONLY PARALOGS ELSEWHERE
- IN 4 ARCHAEA, CLOSE TO 1 - 2 RELEVANT GENES
- sequence analysis suggests oxidoreductase
- some plant oxidoreductases have desaturase activity - flips C=C bond
- unusual IPPI in Borrelia and archaea ?
[Figure: chemical structures of two compounds with OPPi groups]
19 MEVALONATE PATHWAY FULL OF DOGs
[Figure: tree descending from the LCA with branches ARCHAEA, METAZOA, PLANTS, PLASMODIUM, BORRELIA (and cocci), and BACTERIA, annotated as follows]
- lipids by DXP
- tryouts of MP
- lipids by DXP
- lipids by MP
- non-lipid isoprenoids by DXP
- non-lipid isoprenoids by MP - B, E
- lipids by MP - A
- lipids by fatty acids - B, E
20 DOGs OF ALL TYPES
- STEP FOR STEP or PATH FOR PATH
- in Mycoplasmae
- AARSase for AARSase (Gly, Pro)
- AARSase(Gln) for amidotransferase complex ( 3 su )
- OPTIONAL LOSS or MUTUAL EXCLUSION
- analogous phosphoglyceromutases often together
- analogous thymidylate synthases never together
21 USES FOR DOGs
- EVOLUTIONARY MARKERS
- WHERE DID BORRELIA GET ITS MP - ANIMALS ?
- NO PHYLOGENETIC SUPPORT
- WRONG TYPE OF PHOSPHOMEVALONATE KINASE
- TARGETS FOR DRUG DEVELOPMENT
- DXP RIase IN PLASMODIUM, PMK IN BORRELIA ?
- PROTEIN - PROTEIN INTERACTIONS
- ENOLASE SURFACE: TWO PATCHES DISCRIMINATE BETWEEN SPECIES WITH DIFFERENT PGMs
22 DOGs AND PHYLETIC PATTERNS
23 DOGs AND PHYLETIC PATTERNS
Two types of thymidylate synthase
- ThyA: conventional, all kingdoms, transfer of MeOH
- ThyX: recently described, only in some bacteria and archaea, transfer and reduction of MeOH
Dissimilar (?) structures, but at least sequences are clearly dissimilar
[Figure: structures labelled ThyA and ThyX (half)]
24 PATTERNS FOR THE TWO TYPES OF TS
- 0262 ------y-vdrlb-efghsn-j-i-w DHFR
- 0207 a-m---y--drlb-efghsn-j---w TS (ThyA)
- 1351 -o-pkz-qv-r--c------u-xit- ThyX
25 PATTERNS ARE VECTORS
- 0262 ------y-vdrlb-efghsn-j-i-w DHFR
- 0207 a-m---y--drlb-efghsn-j---w TS (ThyA)
- 1351 -o-pkz-qv-r--c------u-xit- ThyX
[Figure: axes Sp A, Sp B, Sp C with points (0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1); label "A, B, C"]
- DISTANCES BETWEEN VECTORS ARE MEASURED - EUCLIDEAN OR NOT
- SP. VECTORS IN GENE SPACE => STATEMENTS ABOUT SPECIES
- GENE VECTORS IN SP. SPACE => STATEMENTS ABOUT GENES
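To make the vector view concrete, the three phyletic patterns on this slide can be turned into 0/1 vectors and compared. This is a minimal Python sketch: the pattern strings are copied from the slide, while the function names and the choice of Hamming and Euclidean distance as examples are assumptions made for illustration. Each position is taken to stand for one genome, with a letter meaning "present" and "-" meaning "absent".

```python
import math

# Phyletic patterns from the slide: one character per genome.
PATTERNS = {
    "DHFR": "------y-vdrlb-efghsn-j-i-w",
    "ThyA": "a-m---y--drlb-efghsn-j---w",
    "ThyX": "-o-pkz-qv-r--c------u-xit-",
}


def to_vector(pattern):
    """Gene vector in species space: 1 if the gene is present in that genome."""
    return [0 if ch == "-" else 1 for ch in pattern]


def hamming(u, v):
    """Number of genomes in which exactly one of the two genes is present."""
    return sum(a != b for a, b in zip(u, v))


def euclidean(u, v):
    """Euclidean distance; for 0/1 vectors this is sqrt(Hamming distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


genes = {name: to_vector(p) for name, p in PATTERNS.items()}

print(hamming(genes["DHFR"], genes["ThyA"]))    # 4
print(hamming(genes["ThyA"], genes["ThyX"]))    # 25
print(euclidean(genes["ThyA"], genes["ThyX"]))  # 5.0

# Transposing gives species vectors in gene space (one 3-vector per genome).
species = list(zip(*genes.values()))
print(len(species), species[0])                 # 26 (0, 1, 0)
```

On these three rows, ThyA and ThyX differ in 25 of 26 genomes: both are present in exactly one genome and at least one is present in every genome, which is the nearly complementary pattern expected of a displacement.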
26 WHICH VECTOR IS THE FARTHEST FROM MINE ?
- NOTE THAT THIS IS AN ASYMMETRIC PROPERTY !
- 2131 ? 0717 ? 1428 or 1428 ? 0717 ? 2131 work, but 0717 ? does not
- 0717 aompkz-q--r-bcef-hsnujxi-- dCytidine DA
- 2131 a--pk-y-vd-lb---g-------t- dCytidylate DA
- 1428 ---------d-lb-----------tw dNuc kinase
- DOG1 0717 vs 2131 1428 DOG(s) of dNuc kinase
- loss of salvage function in mycoplasmae
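The asymmetry noted on this slide can be seen even within the three patterns shown. The Python sketch below uses Hamming distance as an assumed measure and searches only among these three rows; the slide presumably refers to a search over a whole database of patterns, so this is an illustration of the property and not a reproduction of that search. The function names are hypothetical.

```python
# Phyletic patterns from the slide, keyed by the identifiers shown there.
PATTERNS = {
    "0717": "aompkz-q--r-bcef-hsnujxi--",  # dCytidine DA
    "2131": "a--pk-y-vd-lb---g-------t-",  # dCytidylate DA
    "1428": "---------d-lb-----------tw",  # dNuc kinase
}


def hamming(p, q):
    """Count genomes where one pattern has the gene and the other does not."""
    return sum((a == "-") != (b == "-") for a, b in zip(p, q))


def farthest(name, patterns):
    """Identifier of the pattern with the largest distance from `name`."""
    others = [other for other in patterns if other != name]
    return max(others, key=lambda other: hamming(patterns[name], patterns[other]))


for query in ("2131", "1428", "0717"):
    print(query, "->", farthest(query, PATTERNS))
# 2131 -> 0717
# 1428 -> 0717
# 0717 -> 1428
```

Starting from 2131 the farthest pattern is 0717, but starting from 0717 the farthest is 1428, not 2131. "Farthest from" is therefore not a symmetric relation, and the answer depends on which gene the search starts from.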
27 CONSERVED GENE ORDER
- ONE EXTREME: CONSERVATION THROUGHOUT THE MAP
- OTHER EXTREME: ONLY IF PHYSICALLY INTERACT