Title: Orthology Analysis
1 Orthology Analysis
Erik Sonnhammer
Center for Genomics and Bioinformatics
Karolinska Institutet, Stockholm
2 Outline
- Basic concepts
- BLAST-based approaches to orthology
- Tree-based approaches to orthology
- Domain-level orthology
3 Homologs
- Genes with a common origin
- May be genes in the same or in different organisms
- Does not say that function is identical
- Homology can only be true or false, not a percentage!
- Homologs have the same 3D-structure layout
4 Homologs
Orthologs
Paralogs
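5 Orthologs separated by speciation
[Figure: Gene X in an ancient mammal gives rise to Gene X in human and Gene X in rat, which are orthologs. A second tree, drawn along a time axis, marks speciation (S) and duplication (D) nodes and labels orthologs, in-paralogs, and out-paralogs.]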
6 In/Out-paralog definition
- In-paralogs = co-orthologs: paralogs that were duplicated after the speciation and hence are orthologs to a cluster in the other species
- Out-paralogs = not co-orthologs: paralogs that were duplicated before the speciation. Not necessarily in the same species.
Sonnhammer & Koonin, Trends Genet. 18:619-620 (2002)
7 Orthologs for functional genomics
- Co-orthologs / inparalogs are more likely than outparalogs to have identical biochemical functions and biological roles.
- Co-orthologs can be used to discover human gene function via model organism experiments.
- Co-orthologs are key to exploiting functional genomics/proteomics data in model organisms.
8 Orthology and function conservation
- Orthology does not say anything about evolutionary distance.
- Close orthologs, e.g. human-mouse, are very likely to have the same biological role in the organism.
- Distant orthologs, e.g. human-worm, are less likely to have the same phenotypic role, but may have the same role in the corresponding pathway.
9 Ortholog Databases
Sequence database  | Orthology detection method | Ortholog database
SwTrembl proteomes | Inparanoid (blast)         | Inparanoid
proteomes          | COGs (blast)               | COGs / KOGs
TIGR gene index    | COGs (blast)               | TOGA/EGO
proteomes          | OrthoMCL (blast)           | OrthoMCL
Pfam               | Orthostrapper (tree)       | HOPS
Pfam               | RIO (tree)                 |
10 How to find orthologs?
1. Calculate a phylogenetic tree and look for orthologs in the tree (Orthostrapper, RIO).
2. Two-way best matches between two species can be used to find orthologs without trees. However, in-paralogs are harder to find this way.
11 Two-way best match approach to finding orthologs
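A minimal sketch of the two-way best match idea, assuming similarity scores (for example BLAST bit scores) are already available as dictionaries. The gene names and scores are invented for illustration, and the helper names `best_hits` and `two_way_best_matches` are hypothetical, not part of any tool named on these slides.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` maps (query, subject) pairs to a similarity score,
    e.g. a BLAST bit score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def two_way_best_matches(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy scores (invented): human genes H1 and H2 are recent duplicates,
# and both find the fly gene F1 as their best hit.
human_vs_fly = {("H1", "F1"): 420, ("H2", "F1"): 390, ("H3", "F2"): 250}
fly_vs_human = {("F1", "H1"): 420, ("F1", "H2"): 390, ("F2", "H3"): 250}

pairs = two_way_best_matches(human_vs_fly, fly_vs_human)
print(pairs)  # [('H1', 'F1'), ('H3', 'F2')]
```

In this toy data H2 is not reported, although it would be a co-ortholog of F1 if it arose by duplication after the speciation. This is the sense in which in-paralogs are harder to find with two-way best matches alone.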
12 COGs
Out-paralogs
13 Blue = species 1, Red = species 2
Inparalog 'n' ortholog identification
Inpara-n-oid
14 Blue = species 1, Red = species 2
Inparanoid
15 Resolve overlapping clusters
No overlap - no problems
Partial overlap - separate
Complete overlap - merge
16 Inparalog score
[Figure: scale from 0 to 100 with B at the 0 end, A at the 100 end, and inparalog P placed on the scale]
Score for inparalog P = (scoreAP - scoreAB) / (scoreAA - scoreAB)
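The formula can be checked with a few lines of Python. This is a sketch with invented scores; the function name `inparalog_score` is hypothetical, and multiplying by 100 to match the 0-100 scale of the figure is an assumption.

```python
def inparalog_score(score_ap, score_ab, score_aa):
    """Relative score of candidate inparalog P, scaled between B (0) and A (1)."""
    if score_aa <= score_ab:
        raise ValueError("score(A,A) must exceed score(A,B)")
    return (score_ap - score_ab) / (score_aa - score_ab)


# Invented scores: A against itself, A against B, and A against P.
score = inparalog_score(score_ap=450, score_ab=300, score_aa=500)
print(f"{100 * score:.0f}")  # 75
```

A candidate that scores no better against A than B does gets 0, and A itself gets 1.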
17 Confidence values for main orthologs from sampling
- Original alignment:
  TVHIVDDEEPVR---KSLAFM---LTMNGFA
  T    DD    R   K L  M    T  G A
  TILLIDDHPMLRTGVKQLISMAPDITVVGEA
- Sampling with replacement, insertions kept intact:
  GAFDEP---LVTHVR..........
  GA         T  R
  GAEEHMAPDILTLLR..........
- Bootstrap alignment -> bootstrap score
- Confidence = (bootstrap alignments with best-best matches) / (nr of bootstraps)
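The sampling step can be sketched as follows. This is not the Inparanoid implementation: the scoring scheme (identity +1, mismatch 0, gap run -1) is a toy assumption, "insertions kept intact" is read here as resampling each gap run as one unit, and the comparison against a single rival alignment stands in for the full best-best match test. All function names are hypothetical.

```python
import random


def alignment_units(seq_a, seq_b):
    """Split a pairwise alignment into resampling units.

    Each ungapped column is one unit; a run of gapped columns is kept
    together as one unit (one reading of "insertions kept intact").
    """
    units, i = [], 0
    while i < len(seq_a):
        j = i + 1
        if "-" in (seq_a[i], seq_b[i]):
            while j < len(seq_a) and "-" in (seq_a[j], seq_b[j]):
                j += 1
        units.append((seq_a[i:j], seq_b[i:j]))
        i = j
    return units


def unit_score(unit):
    """Toy scoring (assumed): identity +1, mismatch 0, gap run -1."""
    a, b = unit
    if "-" in a or "-" in b:
        return -1
    return 1 if a == b else 0


def bootstrap_score(units, rng):
    """Score one bootstrap alignment drawn with replacement from `units`."""
    return sum(unit_score(rng.choice(units)) for _ in units)


def bootstrap_confidence(main_units, rival_units, n_boot=1000, seed=0):
    """Fraction of bootstraps in which the main pair outscores the rival pair."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_boot):
        if bootstrap_score(main_units, rng) > bootstrap_score(rival_units, rng):
            wins += 1
    return wins / n_boot


# The alignment shown on the slide: 25 ungapped columns plus 2 gap runs.
main = alignment_units("TVHIVDDEEPVR---KSLAFM---LTMNGFA",
                       "TILLIDDHPMLRTGVKQLISMAPDITVVGEA")
print(len(main))  # 27
```

Calling `bootstrap_confidence(main, rival)` with the units of a competing alignment then gives the fraction of bootstrap samples in which the main pair wins.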
18 http://inparanoid.cgb.ki.se
19 inparanoid.cgb.ki.se
Homo sapiens vs. C. elegans
Remm et al., J. Mol. Biol. 314:1041-1052 (2001)
20 Ortholog group sizes, human vs X
21 Nr of inparalogs per ortholog group
Species      | Avg. inparalogs in model organism ortholog groups | Avg. inparalogs in human ortholog groups
Mouse        | 1.36 | 1.56
Fly          | 1.77 | 2.75
Worm         | 1.44 | 3.13
Mustard weed | 3.73 | 3.33
Yeast        | 1.26 | 3.34
E. coli      | 1.73 | 3.57
22 Drawbacks of BLAST-based orthology assignment
- No guarantee that the same segment is used in different sequences
- No evolutionary distance model
- Does not take multiple domains into account
23 Domain orthology
- Inparanoid Human-Fly ortholog pairs with domains in Pfam-A 13.0: 20335
- Different domain architectures: 5411
- Many of these are minor differences, e.g. 22 vs 21 Spectrin repeats
- Sometimes the difference is big:
- ef-hand UCH
- TBC UCH
24 Tree-based approaches
25 Distance-based tree building
A1 MKFYSLPNFPEN
A2 MKYYKLPDLPDE
A3 MRFYTACENPRS
Distance matrix
     A2  A3
A1    4   8
A2       10
[Figure: tree with leaves A1, A2, A3 and branch labels 1, 2, 3, 5]
- Bootstrapping
- Randomly pick columns to bootstrap the alignment, calculate tree
- Repeat 1000 times; frequency of a node = bootstrap support
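For three sequences, branch lengths follow directly from the distance matrix. The sketch below assumes the distances are additive along the tree; the function name is hypothetical, and the reading of the figure labels 1, 2, 3, 5 as branch lengths is an assumption.

```python
def three_leaf_branch_lengths(d12, d13, d23):
    """Branch lengths from the central node of a three-leaf tree to each leaf."""
    leaf1 = (d12 + d13 - d23) / 2
    leaf2 = (d12 + d23 - d13) / 2
    leaf3 = (d13 + d23 - d12) / 2
    return leaf1, leaf2, leaf3


# Distances from the matrix: A1-A2 = 4, A1-A3 = 8, A2-A3 = 10.
a1, a2, a3 = three_leaf_branch_lengths(d12=4, d13=8, d23=10)
print(a1, a2, a3)  # 1.0 3.0 7.0
```

The result is consistent with the labels in the figure if A1 and A2 hang on branches of length 1 and 3 and A3 is reached through an internal branch of length 2 followed by a branch of length 5 (2 + 5 = 7).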
26 Orthology by tree reconciliation
Species tree
Gene tree
Infer 2 duplications and 2 losses
27 Drawbacks of tree reconciliation for orthology assignment
- Assumption that the species tree is fully known
- Does not give confidence values
- Gene trees become unreliable when involving a lot of sequences (more data -> less certainty)
- Computationally expensive
28 Partial tree reconciliation
- Find pairwise orthologs by computer parsing of the tree.
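One way to parse a tree for pairwise orthologs is to label each gene-tree node as a speciation (S) or duplication (D) and report gene pairs whose last common ancestor is a speciation. The sketch below uses the standard rule that a node is a duplication if it maps to the same species-tree clade as one of its children. It is not the Orthostrapper or RIO code: the nested-tuple tree format, the `<species>_<gene>` leaf naming, the example trees, and all function names are assumptions, it handles binary trees only, and it does not count losses.

```python
from itertools import product


def leaves(tree):
    """List the leaf names of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def species_of(gene):
    """Gene leaves are named '<species>_<gene>' (a convention assumed here)."""
    return gene.split("_")[0]


def smallest_clade(species_tree, wanted):
    """Species set of the smallest species-tree clade that contains `wanted`."""
    if not isinstance(species_tree, str):
        for child in species_tree:
            if wanted <= frozenset(leaves(child)):
                return smallest_clade(child, wanted)
    return frozenset(leaves(species_tree))


def label_nodes(gene_tree, species_tree, events):
    """Map a gene-tree node to a species clade and record 'S' or 'D' for it."""
    if isinstance(gene_tree, str):
        return frozenset([species_of(gene_tree)])
    child_clades = [label_nodes(c, species_tree, events) for c in gene_tree]
    clade = smallest_clade(species_tree, frozenset().union(*child_clades))
    # A node mapping to the same clade as one of its children is a duplication.
    events.append(("D" if clade in child_clades else "S", gene_tree))
    return clade


def ortholog_pairs(gene_tree, species_tree):
    """Pairs of genes whose last common ancestor is a speciation node."""
    events = []
    label_nodes(gene_tree, species_tree, events)
    pairs = set()
    for kind, (left, right) in events:
        if kind == "S":
            pairs.update(frozenset(p) for p in product(leaves(left), leaves(right)))
    return pairs, events


# Invented example: a human-specific duplication after the human-mouse split.
species_tree = ("worm", ("human", "mouse"))
gene_tree = ("worm_A", (("human_A1", "human_A2"), "mouse_A"))
pairs, events = ortholog_pairs(gene_tree, species_tree)
print(len(pairs), [kind for kind, _ in events])  # 5 ['D', 'S', 'S']
```

Here human_A1 and human_A2 are in-paralogs: they are not orthologs of each other, but both are orthologs of mouse_A and of worm_A.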
29 Pairwise orthology confidence by orthostrapping
The original tree with bootstrap support values
33 orthostrapper.cgb.ki.se
34 Orthology is not transitive!
Multiple species at different distances may give erroneous groups that include out-paralogs
35 Orthology is not transitive!
[Figure: tree with leaves Y, H1, D1, H2, D2, and a group containing Y, H2, D1]
-> Orthology is strictly defined for only 2 species/clades
Combining species at different distances is very dangerous
But it is OK to combine multiple equidistant ones
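The effect can be illustrated by merging pairwise orthologs into groups. The sketch assumes the leaves Y, H1, D1, H2, D2 form the tree (Y, ((H1, D1), (H2, D2))), i.e. a duplication before the H/D speciation with Y as the more distant species. That topology is an assumption read from the leaf labels, and the function name is hypothetical.

```python
from itertools import combinations

# Pairwise orthologs under the ASSUMED tree (Y, ((H1, D1), (H2, D2))):
# Y is orthologous to all four genes, but H1/D1 and H2/D2 are separate pairs.
orthologs = {
    frozenset(p) for p in [
        ("Y", "H1"), ("Y", "D1"), ("Y", "H2"), ("Y", "D2"),
        ("H1", "D1"), ("H2", "D2"),
    ]
}


def single_linkage_groups(pairs):
    """Merge genes into groups whenever any pairwise link connects them."""
    groups = []
    for pair in pairs:
        touching = [g for g in groups if g & pair]
        merged = set(pair).union(*touching)
        groups = [g for g in groups if g not in touching] + [merged]
    return groups


groups = single_linkage_groups(orthologs)
# Pairs that end up in the same group without being orthologs.
non_orthologs = sorted(
    p
    for g in groups
    for p in combinations(sorted(g), 2)
    if frozenset(p) not in orthologs
)
print(non_orthologs)
```

All five genes land in one group because Y links them, yet the group contains pairs such as D1 and H2 that are out-paralogs rather than orthologs.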
36 Domain-level orthology
37 HOPS - Hierarchy of Orthologs and Paralogs
- All species in Pfam are bundled in groups according to scheme
- Apply Orthostrapper to groups at the same level in Pfam families
- Display results in NIFAS
38 Pfam
39 Pfam in brief
SEED alignment representative members
Profile-HMM HMMer-2.0
Search database
FULL alignment
Description file
Manually curated
Automatically made
- Release 13.0 (April 2004)
- 7426 Pfam-A domain families
- Based on 1160000 sequences (Swissprot + Trembl)
- 21980 unique Pfam-A domain architectures
- 73% of all proteins have >=1 Pfam-A domain
40 HOPS results
- Pfam 10, 6190 families
- 2450 families (40%) have HOPS orthologs
- 1319 families (21%) have HOPS orthologs in all 6 pairwise comparisons
- 286356 pairwise orthology assignments (> 75% orthostrap)
Storm and Sonnhammer, Genome Research 13:2353-2362 (2003)
41 Ways to access HOPS
- NIFAS graphical browser
- By sequence ID at Pfam.cgb.ki.se/HOPS
- Flatfiles (Orthostrap tables of 2 clades)
42 Pfam.cgb.ki.se/HOPS
45 Evolution of Domain Architectures
46 ATP sulfurylase / APS kinase
47 ATP sulfurylase domain, metazoa vs fungi
Orthologous shuffled domains?
48 APS kinase domain
49 HOPS orthologs of PPS1_HUMAN (ATP sulfurylase/APS kinase)
50 Summary of ATP sulfurylases/APS kinases
Shuffled non-orthologous domains
Metazoa
Fungi
51 Conclusions
- Orthologs can be detected by:
- BLAST: fast
- tree: slow but less error-prone
- Species at different evolutionary distances should not be combined in orthology analysis
- Inparanoid and Orthostrapper were designed to find inparalogs but not outparalogs
- HOPS/NIFAS can be used to find domain orthologs and analyze domain architecture evolution
52 Future perspectives
- Multiparanoid: multiple-species merging of pairwise inparalogs
- Functional divergence among inparalogs
53 Acknowledgments
- Christian Storm
- Maido Remm
- Andrey Alexeyenko
- Volker Hollich
- Mats Jonsson
http://sonnhammer.cgb.ki.se