Title: FOG: High-Resolution Fungal Orthologous Groups
1. FOG: High-Resolution Fungal Orthologous Groups
2. What is this presentation about?
- What is orthology?
- Why do we study gene ancestry / gene trees (phylogenies)?
- Why high-resolution orthology?
- Automated high-resolution orthology detection
- The FOG database and some applications
3. Orthology
- This gene in that other species
- We don't have chicken genes!
- They mean the corresponding gene?
- Why that particular gene?
- Sure this actually is the gene?
- Sure that all n orthologs are correct?
4. Orthologous genes
5. Duplications, Speciations, and Orthology
- Two genes in two species are orthologous if they derive from one gene in their last common ancestor
gene duplication by cell division
Orthologous genes are likely to have the same function
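
The definition above can be checked mechanically once a gene tree has its internal nodes labeled as speciation or duplication events. The following is a minimal Python sketch under that assumption; the tree encoding, the gene names, and the `relation` helper are hypothetical illustrations and are not taken from FOG.

```python
# Hypothetical toy gene tree (an assumption for illustration only).
# Internal node: (event, left, right); leaf: "species|gene".
TREE = ("speciation",
        ("duplication", "yeast|A1", "yeast|A2"),
        ("speciation", "candida|A", "ashbya|A"))

def leaves(node):
    """Return the set of gene names below a node."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)

def relation(node, g1, g2):
    """Classify two genes by the event at their last common ancestor."""
    if g1 == g2 or not {g1, g2} <= leaves(node):
        raise ValueError("expected two distinct genes from this tree")
    event, left, right = node
    for child in (left, right):
        below = leaves(child)
        if g1 in below and g2 in below:
            return relation(child, g1, g2)  # the common ancestor lies deeper
    # The genes sit in different children, so this node is their last
    # common ancestor: speciation means orthologs, duplication paralogs.
    return "orthologs" if event == "speciation" else "paralogs"

print(relation(TREE, "yeast|A1", "candida|A"))  # orthologs
print(relation(TREE, "yeast|A1", "yeast|A2"))   # paralogs
```

In this toy tree both yeast copies are orthologous to the single candida gene, while the two yeast copies are paralogs of each other.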
6. Detecting orthologous genes
- Usual methods are based on BLAST hit quality, e.g. bi-directional best hit (BBH)
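
As an illustration of the BBH idea, here is a minimal Python sketch. It assumes the BLAST results have already been reduced to dictionaries mapping (query, subject) pairs to bit scores; the function names, gene identifiers, and scores are made up for the example.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject.

    `scores` maps (query, subject) -> bit score (higher is better).
    On ties the first hit encountered is kept.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return gene pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

# Made-up scores for two genes in species A and two in species B.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b1"): 120.0, ("a2", "b2"): 60.0}
b_vs_a = {("b1", "a1"): 245.0, ("b1", "a2"): 118.0,
          ("b2", "a1"): 88.0, ("b2", "a2"): 61.0}

print(bidirectional_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```

Gene a2 also has b1 as its best hit, but b1's best hit is a1, so only the a1/b1 pair is reported.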
7. KOG clusters
- Based on triangles of BBHs between genes of three species
- InParalogs are added
- Triangles are extended by other genes and other species
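
The triangle seeding step can be sketched as follows. This is a simplified Python illustration, not the KOG procedure itself: it only finds triples of genes from three different species that are all mutual BBHs, and it leaves out the addition of in-paralogs and the extension of triangles. The `species|gene` naming and the species codes are assumptions.

```python
from itertools import combinations

def bbh_triangles(bbh_pairs):
    """Find gene triples from three species that are pairwise BBHs.

    `bbh_pairs` is an iterable of (gene, gene) pairs, with each gene
    written as 'species|id' (a hypothetical convention).
    """
    partners = {}
    for g1, g2 in bbh_pairs:
        partners.setdefault(g1, set()).add(g2)
        partners.setdefault(g2, set()).add(g1)

    triangles = set()
    for gene, linked in partners.items():
        for x, y in combinations(sorted(linked), 2):
            if y in partners.get(x, set()):
                species = {g.split("|")[0] for g in (gene, x, y)}
                if len(species) == 3:
                    triangles.add(tuple(sorted((gene, x, y))))
    return sorted(triangles)

# Made-up BBH pairs: one closed triangle and one isolated pair.
pairs = [("sce|g1", "cal|g7"), ("cal|g7", "spo|g3"),
         ("sce|g1", "spo|g3"), ("sce|g2", "cal|g9")]

print(bbh_triangles(pairs))  # [('cal|g7', 'sce|g1', 'spo|g3')]
```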
8. KOG statistics
Low resolution: there must be functional specialization within these clusters!
These large KOG clusters must have multiple representatives per species
9. High-res versus Low-res
- Many, complete, and closely related genomes
Challenge: automatic orthology assignment
10. Gene Families
- Use PSI-BLAST to recognize (distant) homologs
- Split gene set into families of homologous genes
Challenge: promiscuous domains
Multi-domain genes occur very often in eukaryotic genomes
11. Gene Families
- Promiscuous domains cause genes to be only partially homologous
- Gene A-B is partially homologous to gene A-C, as is gene B-C
- Merging everything with homologous parts generates far too large gene families
- Not possible to obtain proper multiple alignments
- A more advanced technique is needed for separating multi-domain genes into gene families
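
The problem with merging everything that shares a homologous part can be shown with a small Python sketch. It assumes each gene is described by a set of domain labels and merges genes by single linkage (union-find); the gene names and domain labels mirror the A-B / A-C / B-C example above and are otherwise hypothetical.

```python
def single_linkage_families(gene_domains):
    """Merge genes that share at least one domain into one family."""
    parent = {gene: gene for gene in gene_domains}

    def find(gene):
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    first_gene_with = {}
    for gene, domains in gene_domains.items():
        for domain in domains:
            if domain in first_gene_with:
                # Shared domain: join the two genes' families.
                parent[find(gene)] = find(first_gene_with[domain])
            else:
                first_gene_with[domain] = gene

    families = {}
    for gene in gene_domains:
        families.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in families.values())

genes = {"geneAB": {"A", "B"}, "geneAC": {"A", "C"},
         "geneBC": {"B", "C"}, "geneD": {"D"}}

print(single_linkage_families(genes))
# [['geneAB', 'geneAC', 'geneBC'], ['geneD']]
```

All three multi-domain genes end up in one family even though no single domain is common to all of them, which is why such merged families cannot be aligned properly.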
12. Generating Gene Families
- A more advanced technique for merging genes into gene families is not functional yet
- Fall back on known gene families using KOG
- Low-resolution orthology assignments for eukaryotes
- Some inclusive families with many genes per species
- Some statistics
- 15 fungal species with 104,440 genes in total
- Divided into 11,020 KOG clusters (gene families)
- Involving 70,867 genes (68%)
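
The coverage figure follows directly from the counts on this slide. A short Python check; the derived averages are computed here for illustration and are not numbers stated in the presentation.

```python
# Counts as given on the slide.
species = 15
total_genes = 104_440
kog_clusters = 11_020
genes_in_kogs = 70_867

coverage = genes_in_kogs / total_genes
genes_per_cluster = genes_in_kogs / kog_clusters
genes_per_cluster_per_species = genes_per_cluster / species

print(f"coverage: {coverage:.0%}")                       # coverage: 68%
print(f"genes per cluster: {genes_per_cluster:.1f}")     # genes per cluster: 6.4
print(f"per species: {genes_per_cluster_per_species:.2f}")  # per species: 0.43
```

These are averages only; they say nothing about how unevenly the genes are spread over clusters.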
13. Uncertainty in trees
- Evolutionary noise
- Differing rates of evolution
- Convergent evolution (low complexity, coiled coils)
- Promiscuous domains (recombination, fusion, fission)
- Use of heuristic methods
- Multiple alignment
- Tree making
14. Reading Gene-Trees
Although genes spec1,1 and spec2,1 are closer relatives, their distance is larger than that between spec1,1 and spec3,1
The tree suggests at least 2 gene losses
15. Analyze trees but don't trust them fully
If this is correct ... this can't be
- Rigid analysis suggests many duplications and losses
- Presume scp branch is wrongly placed!
16. Analyze trees but don't trust them fully
- And if we accept that branches may be wrongly placed:
Three orthologous groups suggest 15 gene losses
Treating one gene as wrongly placed leaves only 2 gene losses
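
Loss counts like the ones above depend on which genes are assigned to which orthologous group. As a rough illustration of how such a count can be obtained for a single group, the Python sketch below counts the minimum number of losses on a species tree. It assumes the gene was present in the root ancestor and is never regained; the species tree and the presence set are hypothetical and do not reproduce the trees on the slides.

```python
def count_losses(species_tree, present):
    """Minimum number of gene losses for one orthologous group.

    `species_tree` is a nested tuple of species names; `present` is the
    set of species that have a gene in the group. Assumes the gene was
    present at the root and is never regained.
    """
    def walk(node):
        if isinstance(node, str):
            return node in present, 0
        results = [walk(child) for child in node]
        if not any(has_gene for has_gene, _ in results):
            return False, 0  # the whole clade lacks the gene
        # Each child clade that lacks the gene entirely is one loss.
        lost_clades = sum(1 for has_gene, _ in results if not has_gene)
        return True, lost_clades + sum(n for _, n in results)

    return walk(species_tree)[1]

tree = ((("spec1", "spec2"), "spec3"), ("spec4", "spec5"))
print(count_losses(tree, {"spec1", "spec4", "spec5"}))  # 2
```

Moving a single gene into a different group changes the presence sets of both groups, which is how one misplaced branch can inflate the total number of inferred losses.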
17. Automatic Orthology assignment
- LOFT: Levels of Orthology From Trees
18. Result
- Collection of genes is split into KOG families
- KOG families are aligned and phylogenetic trees are derived
- Phylogenetic trees are analyzed using LOFT, resulting in high-resolution orthology
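
The slides do not describe how LOFT analyzes a tree. The Python sketch below shows one plausible way to go from a gene tree to levels of orthology, under two assumptions made here for illustration: a node is called a duplication when the species sets of its two children overlap (a species-overlap rule), and every duplication adds a sub-level to a hierarchical number. The function names, gene names, and tree are hypothetical.

```python
def species_of(gene):
    """Extract the species code from a 'species|gene' leaf name."""
    return gene.split("|")[0]

def label_events(node):
    """Return (species set, labeled tree) using a species-overlap rule."""
    if isinstance(node, str):
        return {species_of(node)}, node
    left, right = node
    species_left, labeled_left = label_events(left)
    species_right, labeled_right = label_events(right)
    # Shared species on both sides implies a duplication (assumption).
    event = "duplication" if species_left & species_right else "speciation"
    return species_left | species_right, (event, labeled_left, labeled_right)

def assign_levels(labeled, level="1", out=None):
    """Give each gene a hierarchical level; duplications add a sub-level."""
    out = {} if out is None else out
    if isinstance(labeled, str):
        out[labeled] = level
        return out
    event, left, right = labeled
    if event == "duplication":
        assign_levels(left, level + ".1", out)
        assign_levels(right, level + ".2", out)
    else:
        assign_levels(left, level, out)
        assign_levels(right, level, out)
    return out

gene_tree = ((("sce|a1", "cal|a1"), ("sce|a2", "cal|a2")), "spo|a")
_, labeled = label_events(gene_tree)
levels = assign_levels(labeled)
print(levels["sce|a1"], levels["sce|a2"], levels["spo|a"])  # 1.1 1.2 1
```

Genes sharing a full level (here 1.1 or 1.2) form a high-resolution orthologous group, while the gene at level 1 is related to both sub-groups.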
19. Result
20. Can LOFT be trusted?
21. It seems okay!
22. Applications
- We now have FOG: a complete set of high-resolution orthology assignments for fungi
- We know which orthologous genes are present and absent in which species
- Phyletic distribution
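
A phyletic distribution is a presence/absence pattern of an orthologous group across species. A minimal Python sketch of deriving such patterns from group membership; the group identifiers, species codes, and gene names are hypothetical and the data is made up.

```python
def phyletic_profiles(groups, species):
    """Return group id -> tuple of 0/1 flags, one per species in order.

    `groups` maps a group id to its member genes ('species|gene').
    """
    profiles = {}
    for group_id, genes in groups.items():
        present = {gene.split("|")[0] for gene in genes}
        profiles[group_id] = tuple(int(s in present) for s in species)
    return profiles

species = ["sce", "ago", "cal", "spo"]
groups = {
    "og1": ["sce|x", "ago|y", "cal|z", "spo|w"],  # present everywhere
    "og2": ["sce|p", "cal|q", "spo|r"],           # absent in 'ago'
}

print(phyletic_profiles(groups, species))
# {'og1': (1, 1, 1, 1), 'og2': (1, 0, 1, 1)}
```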
23. Complex I
24. Complex I
25. Complex I
26. Phyletic distribution of mitochondrial orthologous groups
27. Phylogenetic Tree for Mitochondrial Carrier Proteins
28. Orthologous group 24 is an uncharacterized mitochondrial carrier
In yeast this is known as YMC1, unknown function
It is present in all fungi, except in Ashbya gossypii
29. YMC1: predicted glycine/serine antiporter
- There are three S. cerevisiae genes with the same phyletic distribution
- subunit glycine decarboxylase
- other subunit glycine decarboxylase
- gene with unknown function
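
The search for genes with the same phyletic distribution can be sketched in Python by grouping orthologous groups on identical presence/absence profiles. The profiles below are toy values over four hypothetical species columns, and the group labels are placeholders; they are not the actual FOG data.

```python
from collections import defaultdict

def same_distribution(profiles):
    """Return profile -> sorted group ids, for profiles shared by 2+ groups."""
    by_profile = defaultdict(list)
    for group_id, profile in profiles.items():
        by_profile[profile].append(group_id)
    return {profile: sorted(ids)
            for profile, ids in by_profile.items() if len(ids) > 1}

# Toy profiles (made up): 1 = present, 0 = absent in a species.
profiles = {
    "carrier_group": (1, 0, 1, 1),
    "gdc_subunit_a": (1, 0, 1, 1),
    "gdc_subunit_b": (1, 0, 1, 1),
    "unknown_gene": (1, 0, 1, 1),
    "housekeeping": (1, 1, 1, 1),
}

print(same_distribution(profiles))
# {(1, 0, 1, 1): ['carrier_group', 'gdc_subunit_a', 'gdc_subunit_b', 'unknown_gene']}
```

Groups that are gained and lost together are candidates for a shared function, which is the reasoning behind the prediction on this slide.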