Bioinformatics Werkbespreking (work meeting), 2006-11-07

1
Bioinformatics Werkbespreking (work meeting) 2006-11-07
  • 1. PhyloPat: phylogenetic pattern analysis of
    eukaryotic genes (20 slides)
  • 2. Chicken-human immunogenomics project (9
    slides)

2
PhyloPat: phylogenetic pattern analysis of
eukaryotic genes
  • Tim Hulsen
  • 2006-10-17
  • BeNeLux BioInformatics Conference 2006

3
Introduction (1)
  • Phylogenetic patterns show the presence/absence
    of genes over a certain set of species
  • e.g. for 10 species: 0011101011
  • Very useful for all kinds of evolutionary
    analyses:
  • Origin of certain genes
  • Deletion of certain genes
  • Clustering of genes with similar patterns: likely
    to have a similar function / be in the same pathway
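
A minimal sketch of how such a pattern could be built from presence/absence data. The species labels `sp0` to `sp9` and the function name are hypothetical and used only for illustration; the resulting string is the 10-species example from the slide.

```python
def phylogenetic_pattern(species_with_gene, species_order):
    """Return a 0/1 string: '1' where the gene is present in that species."""
    present = set(species_with_gene)
    return "".join("1" if sp in present else "0" for sp in species_order)

# Hypothetical species labels, ordered as in the pattern.
species_order = [f"sp{i}" for i in range(10)]

pattern = phylogenetic_pattern(
    ["sp2", "sp3", "sp4", "sp6", "sp8", "sp9"], species_order
)
print(pattern)  # 0011101011
```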

4
Introduction (2)
  • Earlier phylogenetic pattern initiatives:
  • Phylogenetic Pattern Search (PPS), incorporated
    into COG (Natale et al., 2000)
  • Extended Phylogenetic Patterns Search (EPPS)
    (Reichard & Kaufmann, 2003)
  • Incorporated into OrthoMCL-DB (Chen et al., 2006)
  • All applied to proteins, not to genes!
  • -> PhyloPat: phylogenetic pattern analysis of
    eukaryotic genes

5
Method
  • Genes are easier to check for lineage-specific
    expansions (no alternative transcripts or splice
    forms): less redundant
  • Basis: Ensembl (EnsMart) database, 21 fully
    available genomes (i.e. no Pre! versions or
    low-coverage genomes), S. cer. to H. sap.
  • Makes use of the accurate Ensembl orthology
    pipeline (combination of BLAST, SW, MUSCLE and
    PHYML)
  • Single-linkage clustering algorithm: creates
    orthologous groups containing ALL genes in Ensembl
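
The single-linkage step can be sketched as follows: any two genes connected by a chain of pairwise orthologies end up in the same group, and genes without orthologs form groups of their own. This is an illustration using a union-find structure, not the actual PhyloPat code; the gene identifiers are made up.

```python
def single_linkage_groups(genes, orthologies):
    """Group genes so that every orthology pair shares one group."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the group representative (path halving).
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in orthologies:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical gene IDs: hs = human, mm = mouse, gg = chicken, sc = yeast.
genes = ["hs1", "mm1", "gg1", "hs2", "mm2", "sc1"]
orthologies = [("hs1", "mm1"), ("mm1", "gg1"), ("hs2", "mm2")]
groups = single_linkage_groups(genes, orthologies)
print(groups)
```

Note that `hs1` and `gg1` land in the same group even though no direct orthology between them is listed; that transitive merging is what distinguishes single linkage from stricter clustering schemes.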

6
Results
  • 446,825 genes were clustered into 147,922 groups,
    using 3,164,088 orthologies from 21 species
  • Species ordered from low ( ) to high (
    ), i.e. approximate distance to human
  • Can be queried in several ways
  • Output in HTML, Excel or plain text format

7
Web interface
http://www.cmbi.ru.nl/phylopat
8
Pattern/ID Search
  • Binary string
  • 0 = absent, 1 = present, * = absent/present
  • e.g. 00000********11111111
  • -> must be absent in non-chordata,
    must be present in all mammals
  • MySQL regular expression
  • e.g. ^0*1{10}0*$
  • -> gives all genes that occur only in ten
    consecutive species
  • Input list of Ensembl/EMBL IDs (PhyloPat contains
    EMBL to Ensembl mapping)
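
Both query styles can be sketched in a few lines. The example below assumes `*` as the absent/present wildcard and uses Python's `re` module as a stand-in for MySQL regular expressions; the six-species patterns and the function names are invented for illustration.

```python
import re

def pattern_to_regex(pattern, wildcard="*"):
    """Translate a 0/1/wildcard query string into an anchored regex."""
    parts = []
    for ch in pattern:
        if ch in "01":
            parts.append(ch)
        elif ch == wildcard:
            parts.append("[01]")  # wildcard: absent or present
        else:
            raise ValueError(f"unexpected character in pattern: {ch!r}")
    return re.compile("^" + "".join(parts) + "$")

def search(patterns, regex):
    """Return the stored patterns that match the compiled regex."""
    return [p for p in patterns if regex.search(p)]

# Hypothetical stored patterns for six species.
stored = ["000011", "001111", "100011", "011100"]

# Absent in the first two species, present in the last two.
wildcard_hits = search(stored, pattern_to_regex("00**11"))

# Present in exactly three consecutive species and nowhere else.
run_hits = search(stored, re.compile(r"^0*1{3}0*$"))

print(wildcard_hits)
print(run_hits)
```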

9
Output
10
Phylogenetic Tree
11
Oligo-/Polypresent Genes
  • Oligopresent: present in only one/two species
    (oligo = few),
  • e.g. 000000010000000000100
  • These two species should be highly related:
  • C. sav / C. int: 1737, div. 100 Mya
    (Boffelli et al., 2004)
  • T. nig / T. rub: 1572, div. 85 Mya
    (Yamanoue et al., 2006)
  • A. gam / A. aeg: 1058, div. 140 Mya
    (Service, 1993)
  • P. tro / H. sap: 887, div. 6 Mya
    (Glazko & Nei, 2003)
  • R. nor / M. mus: 713, div. 20 Mya
    (Springer et al., 2003)
  • Polypresent: present in all species except
    one/two (poly = many),
  • e.g. 111110111110111111111
  • These two species should be related too: similar
    analysis possible
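
A sketch of how patterns could be sorted into these categories and how species pairs sharing oligopresent groups could be counted. The thresholds follow the one/two-species definitions on the slide; the function names are illustrative, and species are referred to by position in the pattern rather than by name.

```python
from collections import Counter

def classify(pattern):
    """Label a pattern by how many species have the gene."""
    n_present = pattern.count("1")
    n_absent = len(pattern) - n_present
    if n_absent == 0:
        return "omnipresent"   # present in all species
    if 1 <= n_present <= 2:
        return "oligopresent"  # present in only one/two species
    if 1 <= n_absent <= 2:
        return "polypresent"   # absent from only one/two species
    return "other"

def oligopresent_pairs(patterns):
    """Count how often each pair of species positions co-occurs alone."""
    pairs = Counter()
    for p in patterns:
        positions = tuple(i for i, ch in enumerate(p) if ch == "1")
        if len(positions) == 2:
            pairs[positions] += 1
    return pairs

examples = [
    "000000010000000000100",
    "000000010000000000100",
    "111110111110111111111",
    "111111111111111111111",
]
labels = [classify(p) for p in examples]
pairs = oligopresent_pairs(examples)
print(labels)
print(pairs.most_common(1))
```

Sorting the pair counts from high to low gives the kind of ranking shown on the slide, where the most frequent pairs are expected to be closely related species.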

12
Omnipresent genes
  • Omnipresent: present in all 21 species
    (omni = all): 111111111111111111111
  • Currently 1001 omnipresent groups
  • Tend to have very general/important functions,
    mostly involved in transcription/translation

13
FatiGO analysis
  • FatiGO: connection with GO terms, KEGG pathways,
    InterPro domains, etc. (Al-Shahrour et al., 2004)
  • Analysis of all human genes in the output with
    just one mouse click
  • e.g. omnipresent genes

14
Other possibilities
  • Anti-correlating patterns
  • e.g. 001111100011000000000
  • and 110000011100111111111
  • -> could be completely different, or very
    similar (analogous)!
  • Easy homology-inferred functional annotation
    (using information from other genes in the same
    lineage)
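
Anti-correlation between two patterns can be checked by comparing them position by position. The sketch below uses the two example patterns from the slide; the function names and the idea of a fractional score are assumptions made for illustration.

```python
def complement(pattern):
    """Swap presence and absence in a 0/1 pattern."""
    return pattern.translate(str.maketrans("01", "10"))

def anticorrelation_score(a, b):
    """Fraction of species in which exactly one of the two genes occurs."""
    if len(a) != len(b):
        raise ValueError("patterns must cover the same species")
    return sum(x != y for x, y in zip(a, b)) / len(a)

first = "001111100011000000000"
second = "110000011100111111111"

is_exact_complement = complement(first) == second
score = anticorrelation_score(first, second)
print(is_exact_complement, score)  # True 1.0
```

A score of 1.0 means the two genes never occur in the same species; lower scores could be used to rank near-complementary pairs.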

15
Case study: Hox genes (1)
  • Hox genes determine where limbs and other body
    segments will grow in a developing embryo
  • Should exist mostly in vertebrates
  • Expansion in teleost fish species (
    , 8-11)
  • seven Hox clusters instead of the mammalian four
  • Search Ensembl database for human genes with
    the term "hox" in the annotation
  • 44 genes found -> enter in PhyloPat -> 32 groups
    found (PP)

16
Case study: Hox genes (2)
PPID      genes per species      phylogenetic pattern   gene name(s)
PP022041  011111136562233233222  011111111111111111111  MSX1, MSX2
PP024984  001000011111001111111  001000011111001111111  HOXC4
PP027791  001110023343233333333  001110011111111111111  TLX1, TLX2, TLX3
PP049478  000000221153112322223  000000111111111111111  HOXB8, HOXC8, HOXD8
PP053824  000000011120010101011  000000011110010101011  HOXD11
PP053827  000000022211111111111  000000011111111111111  HOXA10
PP053828  000000021111212122222  000000011111111111111  HOXC13, HOXD13
PP053829  000000063341122222222  000000011111111111111  HOXA1, HOXB1
PP053830  000000011110010111111  000000011110010111111  HOXB4
PP053832  000000021111011111111  000000011111011111111  HOXA5
PP053833  000000021110111111011  000000011110111111011  HOXB2
PP053834  000000031101011111111  000000011101011111111  HOXD3
PP053835  000000021110111111101  000000011110111111101  HOXA9
PP053836  000000021111111111111  000000011111111111111  HOXA3
PP053838  000000021110101111111  000000011110101111111  HOXC12
PP053839  000000011111111110111  000000011111111110111  HOXD4
PP053840  000000021111201011101  000000011111101011101  HOXC11
PP053842  000000043221111111111  000000011111111111111  HOXA13
PP053844  000000032231011111111  000000011111011111111  HOXB5
PP053845  000000021111111111011  000000011111111111011  HOXB3
PP053846  000000021121111111111  000000011111111111111  HOXD10
PP053847  000000022211111111111  000000011111111111111  HOXA2
PP053849  000000034151132333323  000000011111111111111  HOXA6, HOXB6, HOXC6
PP053853  000000011101111111011  000000011101111111011  HOXA4
PP053854  000000032252223133213  000000011111111111111  HOXB9, HOXC9, HOXD9
PP053858  000000011120011111111  000000011110011111111  HOXA11
PP070659  000000000121212222222  000000000111111111111  HOXA7, HOXB7
PP075622  000000000010001111111  000000000010001111111  HOXC5
PP084287  000000000001101111111  000000000001101111111  HOXC10
PP085049  000000000001011011111  000000000001011011111  HOXD1
PP087941  000000000000111011111  000000000000111011111  HOXD12
PP089685  000000000000111111111  000000000000111111111  HOXB13
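
The relation between the two numeric columns can be sketched as follows, assuming each digit of the "genes per species" string is the gene count for one species (so counts stay below 10). The function names and the expansion threshold of two genes are illustrative choices, not part of PhyloPat.

```python
def counts_to_pattern(counts):
    """Collapse per-species gene counts into a presence/absence pattern."""
    return "".join("0" if ch == "0" else "1" for ch in counts)

def expanded_positions(counts, threshold=2):
    """Species positions holding at least `threshold` genes of the group."""
    return [i for i, ch in enumerate(counts) if int(ch) >= threshold]

msx_counts = "011111136562233233222"     # PP022041 (MSX1, MSX2)
hoxd11_counts = "000000011120010101011"  # PP053824 (HOXD11)

msx_pattern = counts_to_pattern(msx_counts)
hoxd11_expanded = expanded_positions(hoxd11_counts)
print(msx_pattern)      # 011111111111111111111
print(hoxd11_expanded)  # [10]
```

Positions with a count above one are candidates for lineage-specific expansions, which is the use case mentioned in the conclusions.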
17
Case study: Hox genes (3)
PPID(s)                        name   cl.A    cl.B    cl.C    cl.D    first sp.   position
PP053829,085049                HOX1   HOXA1   HOXB1   -       HOXD1   T. nigrov.  anterior
PP053847,053833                HOX2   HOXA2   HOXB2   -       -       T. nigrov.  anterior
PP053836,053845,053834         HOX3   HOXA3   HOXB3   -       HOXD3   T. nigrov.  PG3
PP053832,053844,075622         HOX5   HOXA5   HOXB5   HOXC5   -       T. nigrov.  central
PP053849                       HOX6   HOXA6   HOXB6   HOXC6   -       T. nigrov.  central
PP053835,053854                HOX9   HOXA9   HOXB9   HOXC9   HOXD9   T. nigrov.  posterior
PP053827,084287,053846         HOX10  HOXA10  -       HOXC10  HOXD10  T. nigrov.  posterior
PP053858,053840,053824         HOX11  HOXA11  -       HOXC11  HOXD11  T. nigrov.  posterior
PP053838,087941                HOX12  -       -       HOXC12  HOXD12  T. nigrov.  posterior
PP053842,089685,053828         HOX13  HOXA13  HOXB13  HOXC13  HOXD13  T. nigrov.  posterior
PP053853,053830,024984,053839  HOX4   HOXA4   HOXB4   HOXC4   HOXD4   A. gamb.    central
PP027791                       TLX    TLX1    TLX2    TLX3    -       A. gamb.    -
PP070659                       HOX7   HOXA7   HOXB7   -       -       G. acul.    central
PP049478                       HOX8   -       HOXB8   HOXC8   HOXD8   C. intest.  central
PP022041                       MSX    MSX1    MSX2    -       -       C. eleg.    -
First species: T. nigrov. = first vertebrate; A. gamb. = non-vertebrate;
G. acul. = vertebrate; C. intest. = non-vertebrate; C. eleg. = non-vertebrate
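
The "first sp." column can be derived from the patterns of the previous slide: it is the earliest position at which any group of the family has a 1. A sketch under that assumption follows; the position-to-species mapping is partial and inferred from the two Hox tables, not taken from PhyloPat itself, and the function name is hypothetical.

```python
# Partial mapping, inferred by matching patterns to the "first sp." column.
SPECIES_AT = {1: "C. eleg.", 2: "A. gamb.", 6: "C. intest.",
              7: "T. nigrov.", 9: "G. acul."}

def first_present_position(patterns):
    """Earliest species position with a '1' across a family's patterns."""
    firsts = [p.index("1") for p in patterns if "1" in p]
    return min(firsts) if firsts else None

hox4 = [
    "000000011101111111011",  # PP053853 (HOXA4)
    "000000011110010111111",  # PP053830 (HOXB4)
    "001000011111001111111",  # PP024984 (HOXC4)
    "000000011111111110111",  # PP053839 (HOXD4)
]
hox1 = [
    "000000011111111111111",  # PP053829 (HOXA1, HOXB1)
    "000000000001011011111",  # PP085049 (HOXD1)
]

hox4_first = SPECIES_AT[first_present_position(hox4)]
hox1_first = SPECIES_AT[first_present_position(hox1)]
print(hox4_first, hox1_first)  # A. gamb. T. nigrov.
```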
18
Conclusions
  • PhyloPat: quick and easy tool for phylogenetic
    pattern search on the complete Ensembl database
  • Also usable for the study of lineage-specific
    expansions of genes
  • Just updated to Ensembl v41 (released last
    Thursday): 5 new species
  • D.nov, E.tel, L.afr, O.cun, O.lat
  • Extra option: gene neighborhood

19
Gene neighborhood
Conservation of gene order: functionally related
Equal color: belonging to the same orthologous group
20
Acknowledgements
  • Supervision
  • Peter Groenen (supervisor)
  • Jacob de Vlieg (head of group)
  • Fruitful discussions (suggestions)
  • Wilco Fleuren
  • Erik Franck
  • Nanning de Jong
  • Arnold Kuzniar

21
Where to find
  • Web interface
  • http://www.cmbi.ru.nl/phylopat
  • (accessible through www.cmbi.ru.nl and
    www.nbic.nl)
  • Publication
  • Hulsen T., Groenen P.M.A., de Vlieg J.
  • BMC Bioinformatics 2006, 7:398
  • http://www.biomedcentral.com/1471-2105/7/398
  • Powered by Ensembl
  • http://www.ensembl.org/info/about/ensembl_powered.html

22
Bioinformatics Werkbespreking (work meeting) 2006-11-07
  • 1. PhyloPat: phylogenetic pattern analysis of
    eukaryotic genes (20 slides)
  • 2. Chicken-human immunogenomics project (9
    slides)

23
Chicken-human immunogenomics project (part of
Biorange SP3.2.2)
In collaboration with Martien Groenen,
Hinri Kerstens (Animal Sciences Group, Wageningen
UR)
  • Goals:
  • study the evolution of genes/proteins involved in
    the immune system, from chicken to human
  • check for expansions and deletions in families
  • zoom in on interesting families

24
Proteins -> Genes
  • Earlier initiatives based on proteins (Protein
    World, IPI, ParAlign, MCL)
  • Disadvantages:
  • Large-scale computations needed for orthology
    determination
  • Difficult to study lineage-specific expansions
    because of alternative transcripts, isoforms
  • Difficult to connect to WUR synteny data
  • --> Genes: connect to PhyloPat tool

25
PhyloPat
  • PhyloPat queries the orthologies of all complete
    genomes within the Ensembl database using
    phylogenetic patterns
  • Advantages:
  • Use of Ensembl's accurate orthology determination
    (BLAST/SW, MUSCLE, PHYML); single-linkage
    clustering by ourselves
  • No alternative transcripts, isoforms
  • Easy to connect to WUR synteny data
  • 26 species, from S.cer. to H.sap.
  • Disadvantage:
  • Genome information sometimes incomplete (but
    Pre-versions and low-coverage genomes are not
    included)

26
Immunophyle
  • Application to the immune system: parse through
    the PhyloPat set using the IRIS database
  • Take all HUGO IDs from the IRIS database, input
    in PhyloPat -> 585 immunologic lineages containing
    18,933 genes from 26 species
  • Divided into 22 immunologic categories from the
    IRIS database (adaptive immunity, innate immunity,
    inflammation, chemotaxis, etc.)
  • Connected to GO, InterPro, KEGG, etc. by FatiGO

27
Immunophyle
  • http://www.cmbi.ru.nl/immunophyle

28
Categories
29
Example: Toll-like receptors
GeneGo MetaCore, canonical pathway
30
Example: Toll-like receptors
Check ImmunoPhyle for each gene involved in the
TLR pathway
Green: first occurrence
Red: deletion
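
The green/red marking can be read off a phylogenetic pattern directly. The sketch below assumes that "first occurrence" is the first species position with a 1 and that every later 0 counts as a deletion; in practice an absence can also reflect incomplete genome information. The function name is hypothetical, and the example reuses the HOXD11 pattern from the Hox case study rather than a TLR gene.

```python
def annotate_pattern(pattern):
    """Return (first occurrence position, positions of later absences)."""
    first = pattern.find("1")
    if first == -1:
        return None, []  # gene not present in any species
    deletions = [i for i in range(first + 1, len(pattern))
                 if pattern[i] == "0"]
    return first, deletions

first, deletions = annotate_pattern("000000011110010101011")
print(first)      # 7
print(deletions)  # [11, 12, 14, 16, 18]
```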
31
Current/future directions
  • Connect to literature (CoPub?)
  • Connect to expression data, protein interaction
    data
  • Zoom in on families: immunology expertise needed!