MOST OF BIOLOGY IS LOOKING FOR SIGNALS - PowerPoint PPT Presentation

Slides: 28
Provided by: arcadymu

Transcript and Presenter's Notes



1
(MOST OF) BIOLOGY IS LOOKING FOR SIGNALS
  • STRUCTURAL ( HOW IT IS PUT TOGETHER )
  • FUNCTIONAL ( HOW IT WORKS )
  • EVOLUTIONARY ( HOW IT CAME TO BE
    LIKE IT IS )
  • S. and F. CAN BE STUDIED DIRECTLY -
    OBSERVATION, EXPERIMENT
  • E. HAPPENED WHEN WE WERE NOT THERE =>
    NEED TO INFER
  • INFERENCE IS TYPICALLY BASED ON COMPARISON
  • S., F., and E. signals may enhance or
    interfere with each other

2
COMPUTATIONAL APPROACHES IN BIOLOGY
  • RELY ON BACKGROUND KNOWLEDGE OF THE PROCESS
  • DB SEARCH STATISTICS: WHAT IS A RANDOM
    MATCH ?
  • SIMILARITY BETWEEN SEQUENCES: WHERE DOES THE
    SCORING FUNCTION COME FROM ?
  • PHYLOGENETIC TREES: ARE CHANGE RATES
    CONSTANT OVER TIME ?
  • etc., etc.
  • WHAT FOR ?
  • ESTIMATION OF PARAMETERS
  • DEEPER FOUNDATIONS: WHAT IS POSSIBLE /
    ALLOWED

3
PROBLEM: TO INTERPRET GENOME SEQUENCE
  • NON-ALGORITHMIC, OPEN-ENDED
  • INPUT: NEW GENOME SEQUENCE, GENOMES AND OTHER
    SEQUENCES IN DB ( ALL BIOLOGICAL KNOWLEDGE )
  • OUTPUT: STATEMENTS ABOUT STRUCTURE, FUNCTION,
    AND EVOLUTIONARY HISTORY
  • "NOTHING IN BIOLOGY MAKES SENSE EXCEPT
    IN THE LIGHT OF EVOLUTION" ( Th.
    Dobzhansky )
  • NOT A PLUG, BUT LITERAL DESCRIPTION
  • CHANGE INPUT TO LIST OF PROTEIN-CODING
    GENES

4
SIMILARITY SEARCH ANNOTATION TRANSFER
98  COG0494  NTP pyrophosphohydrolases including oxidative damage repair enzymes
87  COG2217  Cation transport ATPases
78  COG0050  GTPases - translation elongation factors
77  COG0037  Predicted ATPase of the PP-loop superfamily implicated in cell cycle control
76  COG0330  Membrane protease subunits, stomatin/prohibitin homologs
75  COG0492  Thioredoxin reductase
74  COG0480  Translation elongation and release factors (GTPases)
71  COG0008  Glutamyl- and glutaminyl-tRNA synthetases
69  COG0459  Chaperonin GroEL (HSP60 family)
65  COG0470  ATPase involved in DNA replication
65  COG0681  Signal peptidase I
60  COG0086  DNA-directed RNA polymerase beta' subunit/160 kD subunit (split gene in archaea and Syn)
60  COG0475  Kef-type K+ transport systems, membrane components
58  COG1475  Predicted transcriptional regulators
56  COG0258  5'-3' exonuclease (including N-terminal domain of PolI)
58  COG0550  Topoisomerase IA
58  COG0638  Proteasome protease subunit
57  COG0468  RecA/RadA recombinase
56  COG2890  Predicted rRNA or tRNA methylase
56  COG0438  Predicted glycosyltransferases
56  COG0616  Periplasmic serine proteases (ClpP class)
53  COG0009  Putative translation factor (SUA5)
49  COG0085  DNA-directed RNA polymerase beta subunit/140 kD subunit
48  COG0441  Threonyl-tRNA synthetase
48  COG0162  Tyrosyl-tRNA synthetase
47  COG0532  Translation initiation factor 2 (GTPase)
46  COG0080  Ribosomal protein L11
46  COG0048  Ribosomal protein S12
46  COG0522  Ribosomal protein S4 and related proteins
46  COG0592  DNA polymerase sliding clamp subunit (PCNA homolog)
46  COG0350  Methylated DNA-protein cysteine methyltransferase
45  COG0018  Arginyl-tRNA synthetase
45  COG0495  Leucyl-tRNA synthetase
45  COG0143  Methionyl-tRNA synthetase
45  COG0081  Ribosomal protein L
5
MOST OF THE PROTEINS ARE CONSERVED,
AND MOST OF THE FAMILIES ARE ANNOTATED
( BUT MIND BOGUS NAMES, DOMAINS, AND FILTERING ! )
6
HOMOLOGY = COMMON ANCESTRY
  • IT IS EITHER THERE OR IT IS NOT ( NO
    DEGREES )
  • OBJECTION 1: WHAT IF ONLY HALF OF THE
    MOLECULE IS HOMOLOGOUS ? - JUST SAY SO
  • OBJECTION 2: WE MAY MEAN THE DEGREE OF
    CERTAINTY THAT THEY ARE HOMOLOGOUS
  • 1. JUST SAY SO
  • 2. SOME STATISTICIANS DO NOT LIKE IT EITHER
  • 3. 60% IDENTITY MAY CONFER 100% BELIEF
    THAT HOMOLOGY EXISTS
  • ORTHOLOGY / PARALOGY IS ESTABLISHED AFTER
  • 'FUNCTIONAL HOMOLOGY' USUALLY DOES NOT MAKE
    SENSE
  • ( CALL IT THE SAME FUNCTION )

7
HOMOLOGS AND THEIR SUBSETS
(Figure: diagram of homologs with subsets labeled PARALOGS, ORTHOLOGS AND PARALOGS, and ORTHOLOGS)
8
WHAT IS A TREE, ANYWAY ?
  • TREE IS AN OBJECT OF MATHEMATICS -
    A SPECIAL TYPE OF GRAPH
  • GRAPH: A SET OF ELEMENTS ( VERTICES ) PLUS A
    SET OF SOME PAIRS OF THESE ELEMENTS (
    EDGES )
  • GRAPH IS CONNECTED IF EVERY TWO VERTICES
    ARE LINKED ( BY ONE PATH OR MORE )
  • CONNECTED GRAPH IS A TREE IF IT HAS NO
    CYCLES

This is not a TREE, and this is
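
The definition on this slide (a connected graph with no cycles) can be checked mechanically. The sketch below is only an illustration: the function name `is_tree`, the vertex labels, and the choice to treat the empty graph as "not a tree" are assumptions, not part of the presentation. It relies on the standard fact that a graph with n vertices is a tree exactly when it is connected and has n - 1 edges.

```python
from collections import deque


def is_tree(vertices, edges):
    """Return True if the undirected graph (vertices, edges) is a tree.

    Assumes every edge joins two of the listed vertices.
    """
    vertices = set(vertices)
    edge_set = {frozenset(edge) for edge in edges}  # ignore direction and repeats
    if not vertices or any(len(edge) != 2 for edge in edge_set):
        return False  # empty graph (by convention here) or a self-loop
    if len(edge_set) != len(vertices) - 1:
        return False  # too many edges means a cycle, too few means disconnected
    adjacency = {v: [] for v in vertices}
    for a, b in edge_set:
        adjacency[a].append(b)
        adjacency[b].append(a)
    # Breadth-first search: the graph is connected if every vertex is reached.
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == vertices


star = [("A", "B"), ("B", "C"), ("B", "D")]
print(is_tree("ABCD", star))                 # connected, no cycles
print(is_tree("ABCD", star + [("C", "D")]))  # contains the cycle B-C-D
```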
9
PROBLEM: MAP ORTHOLOGS AND PARALOGS
  • INPUT: TREE OF GENES G ( BUILT USING KNOWN
    ALGORITHMS )
  • TREE OF SPECIES S ( ASSUMED
    NON-CONTROVERSIAL )
  • OUTPUT: FOR EACH NODE g IN G, LABEL IT AS
    DUPL. or SPEC.

(Figure: gene tree G with internal nodes g1, g2, g3, drawn next to species tree S; leaves are labeled with species A, B, C, D, E)
FOR EACH g, $\gamma(g)$ IS THE SET OF SPECIES
TO WHICH g's DESCENDANTS BELONG
FOR EACH s, $\sigma(s)$ IS THE SET OF SPECIES DESCENDING FROM s
MAPPING FUNCTION $M(g)$ IS THE LOWEST NODE s IN S
SUCH THAT $\gamma(g) \subseteq \sigma(s)$
NODE g ( PARENT OF $g_i$, $g_j$ ) IS A DUPLICATION iff
$M(g) = M(g_i)$ OR $M(g) = M(g_j)$
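
The mapping rule on this slide can be sketched in a few lines. Everything concrete here is an assumption made for illustration: trees are written as nested 2-tuples with species names at the leaves, the example trees `SPECIES_TREE` and `GENE_TREE` are invented (the slide's own figure is not recoverable from the transcript), and the function names are hypothetical. The rule itself follows the slide: M(g) is the lowest node of S whose species set contains the species set of g, and g is a duplication if M(g) equals M of one of its children.

```python
def leaf_set(tree):
    """Species found at the leaves below a node (leaves are species names)."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaf_set(left) | leaf_set(right)


def lowest_species_node(species_tree, gene_species):
    """M(g): the lowest node s of S whose species set contains gene_species."""
    node = species_tree
    while not isinstance(node, str):
        for child in node:
            if gene_species <= leaf_set(child):
                node = child  # the set still fits below this child: descend
                break
        else:
            return node  # no single child contains the set: lowest node found
    return node


def label_nodes(gene_tree, species_tree):
    """Return [(species below g, label)] for internal nodes of G, post-order."""
    if isinstance(gene_tree, str):
        return []
    labels = []
    for child in gene_tree:
        labels += label_nodes(child, species_tree)
    m_parent = lowest_species_node(species_tree, leaf_set(gene_tree))
    m_children = [lowest_species_node(species_tree, leaf_set(c)) for c in gene_tree]
    # Duplication iff the parent maps to the same species node as a child.
    label = "DUPL." if m_parent in m_children else "SPEC."
    labels.append((sorted(leaf_set(gene_tree)), label))
    return labels


# Invented example: genes are named by the species they come from.
SPECIES_TREE = ((("A", "B"), "C"), ("D", "E"))
GENE_TREE = ((("A", "B"), ("A", "C")), ("D", "E"))

for species, label in label_nodes(GENE_TREE, SPECIES_TREE):
    print(label, species)
```

In this invented gene tree, species A appears twice, so the node joining the (A, B) and (A, C) clades is the one expected to be labeled as a duplication. Nodes are compared by value, which is safe here only because each species name occurs once in S.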
10
(Figure: gene tree G with internal nodes g1, g2, g3 and species tree S, followed by a third tree with nodes g1, g2, g3 over leaves A, B, C, D, E)
11
(No Transcript)
12
(No Transcript)
13
CANDIDATE ORTHOLOGS ARE
  • EACH OTHER'S TOP MATCH
  • CLOSER TO ONE ANOTHER THAN TO
    A HOMOLOG FROM AN OUTGROUP
  • ARCHITECTURALLY SIMILAR
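
The first criterion above (each other's top match) is often checked as a reciprocal best hit. The sketch below covers only that criterion, not the outgroup or domain-architecture checks. The gene names, the similarity scores, and the function names are hypothetical; in practice the scores would come from a similarity search run in both directions.

```python
def top_match(hits):
    """Map each query gene to its highest-scoring subject gene.

    `hits` maps query -> {subject: score}; ties go to the first subject listed.
    """
    return {
        query: max(subjects, key=subjects.get)
        for query, subjects in hits.items()
        if subjects
    }


def reciprocal_best_hits(hits_ab, hits_ba):
    """Pairs (a, b) where b is a's top match and a is b's top match."""
    best_ab = top_match(hits_ab)
    best_ba = top_match(hits_ba)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Made-up scores for two genomes, searched in both directions.
hits_ab = {"a1": {"b1": 200, "b2": 150}, "a2": {"b2": 90}}
hits_ba = {"b1": {"a1": 200}, "b2": {"a1": 150, "a2": 90}}

print(reciprocal_best_hits(hits_ab, hits_ba))
```

Here a2's top match is b2, but b2's top match is a1, so that pair is not reported as a candidate.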

14
DISPLACEMENT of ORTHOLOGOUS GENES (DOGs)
WHERE AND MAY BE
ANALOGS PARALOGS
( BIZARRE PHYLOGENY ORTHOLOGS
(XENOLOGS) COEXISTENCE )

15
DO NOT KNOW WHY, BUT MAY GUESS HOW
DIFFERENTIAL LOSS
INDEPENDENT RECRUITMENT
16
MEVALONATE PATHWAY FULL OF DOGs
(Figure: chemical structures of the pathway intermediates, with substituent labels SCoA, CoA, OPi, and OPPi)
DXPS pathway
17
DOGged PATHWAY: CANDIDATE APPROACH
  • SEQUENCE SIMILARITY AND TREE ANALYSIS
  • ( P H Y L O G E N O M I C S )
  • PMK - KINASE OF GHMP FAMILY
  • PMK ( animal ) - dNTP KINASE FAMILY
  • MPPDC - GHMP-RELATED
  • - PHOSPHORYLATES SUBSTRATE
  • IPPI - MutT DOMAIN
  • - PPase ACTIVITY USELESS;
    BINDING ?
  • EXHAUSTIVE DEFINITION OF ALL PARALOGS IN
    ARCHAEA,
  • ANALYSIS OF TREES AND ORPHANS - 1 ,
    NO , NO

(Figure: chemical structures of pathway intermediates, with substituent labels OPi and OPPi)
18
N O N H O M O L O G Y METHODS
  • GENOME CONTEXT METHODS DEPEND ON ORTHOLOGS ! ! !
  • DOMAIN FUSIONS - ROSETTA STONE - NO GO
  • FUNCTIONS OF ADJACENT GENES
  • PHYLETIC PATTERNS
  • BORRELIA: 5 MP GENES IN A SIX-GENE STRING
  • ORTHOLOGS IN ALL ARCHAEA, ONLY PARALOGS
    ELSEWHERE
  • IN 4 ARCHAEA, CLOSE TO 1 - 2 RELEVANT
    GENES
  • sequence analysis suggests oxidoreductase
  • some plant oxidoreductases have desaturase
    activity
  • flips C=C bond - unusual IPPI in
    Borrelia and archaea ?

(Figure: chemical structures of two OPPi-bearing compounds)
19
MEVALONATE PATHWAY FULL OF DOGs
ARCHAEA
METAZOA
PLANTS
LCA
PLASMODIUM
BORRELIA (and cocci)
BACTERIA
lipids by DXP tryouts of MP
lipids by DXP lipids by MP
non-lipid isoprenoids by DXP
non-lipid isoprenoids by MP - B, E
lipids by MP - A
lipids by fatty acids - B, E
20
DOGs OF ALL TYPES
  • STEP FOR STEP or PATH FOR PATH
  • in Mycoplasmae
  • AARSase for AARSase (Gly, Pro)
  • AARSaseGln for amidotransferase complex ( 3
    su )
  • OPTIONAL LOSS or MUTUAL EXCLUSION
  • analogous phosphoglyceromutases often together
  • analogous thymidylate synthases never together

21
USES FOR DOGs
  • EVOLUTIONARY MARKERS
  • WHERE DID BORRELIA GET ITS MP - ANIMALS ?
  • NO PHYLOGENETIC SUPPORT
  • WRONG TYPE OF PHOSPHOMEVALONATE KINASE
  • TARGETS FOR DRUG DEVELOPMENT
  • DXP RIase IN PLASMODIUM, PMK IN BORRELIA ?
  • PROTEIN - PROTEIN INTERACTIONS
  • ENOLASE SURFACE: TWO PATCHES DISCRIMINATE
    BETWEEN SPECIES WITH DIFFERENT PGMs

22
DOGs AND PHYLETIC PATTERNS

23
DOGs AND PHYLETIC PATTERNS
Two types of thymidylate synthase:
ThyA - conventional, all kingdoms, transfer
of MeOH
ThyX - recently described, only in
some bacteria and archaea,
transfer and reduction of MeOH
Dissimilar (?) structures, but at least
sequences are clearly dissimilar
(Figure: structures labeled ThyA and ThyX (half))
24
PATTERNS FOR THE TWO TYPES OF TS
  • 0262 ------y-vdrlb-efghsn-j-i-w DHFR
  • 0207 a-m---y--drlb-efghsn-j---w TS (ThyA)
  • 1351 -o-pkz-qv-r--c------u-xit- ThyX

25
PATTERNS ARE VECTORS

  • 0262 ------y-vdrlb-efghsn-j-i-w DHFR
  • 0207 a-m---y--drlb-efghsn-j---w TS (ThyA)
  • 1351 -o-pkz-qv-r--c------u-xit- ThyX

(Figure: presence/absence vectors (0, 0, 0), (1, 0, 0), (1, 1, 0), and (1, 1, 1) drawn on axes Sp A, Sp B, Sp C; additional label "A, B, C")
DISTANCES BETWEEN VECTORS ARE MEASURED -
EUCLIDEAN OR NOT
- SP. VECTORS IN GENE SPACE => STATEMENTS ABOUT SPECIES
- GENE VECTORS IN SP. SPACE => STATEMENTS ABOUT GENES
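
As a minimal sketch of "patterns are vectors", the three pattern strings from this slide can be turned into 0/1 vectors and compared. The encoding is an assumption: a letter at a position is read as "present in that species" and a dash as "absent". The function names are illustrative, and Hamming and Euclidean distance are just two possible choices, since the slide leaves the measure open.

```python
import math

# Pattern strings copied from the slide; one position per species.
PATTERNS = {
    "DHFR": "------y-vdrlb-efghsn-j-i-w",
    "ThyA": "a-m---y--drlb-efghsn-j---w",
    "ThyX": "-o-pkz-qv-r--c------u-xit-",
}


def to_vector(pattern):
    """Gene vector in species space: 1 = present, 0 = absent ('-')."""
    return [0 if symbol == "-" else 1 for symbol in pattern]


def hamming(u, v):
    """Number of positions at which two equal-length vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(u, v))


def euclidean(u, v):
    """Euclidean distance; for 0/1 vectors this is sqrt(hamming)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


gene_vectors = {name: to_vector(p) for name, p in PATTERNS.items()}

# Transposing gives species vectors in gene space (one row per species).
species_vectors = [list(column) for column in zip(*gene_vectors.values())]

for first, second in [("DHFR", "ThyA"), ("ThyA", "ThyX"), ("DHFR", "ThyX")]:
    print(first, second, hamming(gene_vectors[first], gene_vectors[second]))
```

Under this encoding the ThyA and ThyX patterns differ at 25 of 26 positions, while DHFR and ThyA differ at only 4, which is the kind of near-complementary pattern the DOG slides are about.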
26
WHICH VECTOR IS THE FARTHEST FROM MINE ?
  • NOTE THAT THIS IS AN ASYMMETRIC PROPERTY !
  • 2131 ? 0717 ? 1428 or 1428 ? 0717 ? 2131 work,
    but 0717 ? does not
  • 0717 aompkz-q--r-bcef-hsnujxi-- dCytidine DA
  • 2131 a--pk-y-vd-lb---g-------t- dCytidylate DA
  • 1428 ---------d-lb-----------tw dNuc kinase
  • DOG1: 0717 vs 2131, 1428; DOG(s) of dNuc kinase
  • loss of salvage function in mycoplasmae
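
The asymmetry noted on this slide can be illustrated with the three patterns listed: "farthest from X" need not be mutual. This sketch assumes Hamming distance on presence/absence (the slide does not say which measure was used) and compares only these three patterns rather than a full database; the function names are hypothetical.

```python
# Pattern strings copied from the slide, keyed by their numbers.
PATTERNS = {
    "0717": "aompkz-q--r-bcef-hsnujxi--",
    "2131": "a--pk-y-vd-lb---g-------t-",
    "1428": "---------d-lb-----------tw",
}


def distance(pattern_a, pattern_b):
    """Count positions where one pattern has a gene and the other does not."""
    return sum((a == "-") != (b == "-") for a, b in zip(pattern_a, pattern_b))


def farthest_from(name, patterns):
    """Name of the pattern at the greatest distance from patterns[name]."""
    others = [other for other in patterns if other != name]
    return max(others, key=lambda other: distance(patterns[name], patterns[other]))


for name in PATTERNS:
    print(name, "-> farthest:", farthest_from(name, PATTERNS))
```

With these three patterns, the farthest from 2131 is 0717, but the farthest from 0717 is 1428, so the relation is not symmetric.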

27
CONSERVED GENE ORDER
  • ONE EXTREME - CONSERVATION THROUGHOUT THE MAP
  • OTHER EXTREME - ONLY IF PHYSICALLY INTERACT