Title: Bioinformatics Werkbespreking (work meeting) 2006-11-07
1. Bioinformatics Werkbespreking (work meeting) 2006-11-07
- 1. PhyloPat: phylogenetic pattern analysis of eukaryotic genes (20 slides)
- 2. Chicken-human immunogenomics project (9 slides)
2. PhyloPat: phylogenetic pattern analysis of eukaryotic genes
- Tim Hulsen
- 2006-10-17
- BeNeLux BioInformatics Conference 2006
3. Introduction (1)
- Phylogenetic patterns show the presence/absence of genes over a certain set of species
  - e.g. for 10 species: 0011101011
- Very useful for all kinds of evolutionary analyses
  - Origin of certain genes
  - Deletion of certain genes
  - Clustering of genes with similar patterns: likely to have a similar function / be in the same pathway
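A minimal sketch of how such a presence/absence string could be built. The species labels and the presence set below are hypothetical placeholders, chosen only to reproduce the 10-species example string; they are not PhyloPat data.

```python
def phylogenetic_pattern(species_order, species_with_gene):
    """Return a 0/1 string: '1' where the gene is present, '0' where it is absent."""
    return "".join("1" if sp in species_with_gene else "0" for sp in species_order)

# Hypothetical ordering of 10 species (sp1 .. sp10) and a made-up presence set.
species_order = [f"sp{i}" for i in range(1, 11)]
present = {"sp3", "sp4", "sp5", "sp7", "sp9", "sp10"}

pattern = phylogenetic_pattern(species_order, present)
print(pattern)
```

The string is only meaningful together with a fixed species order, which is why the order is passed in explicitly.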
4. Introduction (2)
- Earlier phylogenetic pattern initiatives
  - Phylogenetic Pattern Search (PPS), incorporated into COG (Natale et al., 2000)
  - Extended Phylogenetic Patterns Search (EPPS) (Reichard & Kaufmann, 2003)
  - Incorporated into OrthoMCL-DB (Chen et al., 2006)
- All applied to proteins, not to genes!
- -> PhyloPat: phylogenetic pattern analysis of eukaryotic genes
5. Method
- Genes: easier to check for lineage-specific expansions (no alternative transcripts or splice forms), less redundant
- Basis: Ensembl (EnsMart) database, 21 fully available genomes (i.e. no Pre! versions or low-coverage genomes), S. cer. to H. sap.
- Make use of the accurate Ensembl orthology pipeline (combination of BLAST, SW, MUSCLE and PHYML)
- Single linkage cluster algorithm: create orthologous groups containing ALL genes in Ensembl
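The single linkage step can be illustrated with a small union-find sketch: any two genes connected by a chain of pairwise orthologies end up in the same group. This is an illustration under stated assumptions, not the PhyloPat implementation; the gene IDs are invented, and treating genes without any orthology as singleton groups is an assumption based on the statement that the groups contain all genes.

```python
from collections import defaultdict

def single_linkage_groups(genes, orthologies):
    """Group genes so that any chain of pairwise orthologies joins them."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the group representative (with path halving).
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in orthologies:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(set)
    for g in genes:
        groups[find(g)].add(g)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical gene IDs and orthology pairs (illustrative only).
genes = ["hsa_g1", "mmu_g1", "dre_g1", "dre_g2", "sce_g9"]
orthologies = [("hsa_g1", "mmu_g1"), ("mmu_g1", "dre_g1"), ("hsa_g1", "dre_g2")]

groups = single_linkage_groups(genes, orthologies)
print(groups)
```

Note that single linkage never splits a group: one spurious orthology is enough to merge two otherwise unrelated groups, so the quality of the input orthologies matters.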
6. Results
- 446,825 genes were clustered into 147,922 groups, using 3,164,088 orthologies from 21 species
- Species ordered from low to high, i.e. by approximate distance to human
- Can be queried in several ways
- Output in HTML, Excel or plain text format
7. Web interface
http://www.cmbi.ru.nl/phylopat
8. Pattern/ID Search
- Binary string
  - 0 = absent, 1 = present, * = absent/present
  - e.g. 00000********11111111
  - -> must be absent in non-chordata, must be present in all mammals
- MySQL regular expression
  - e.g. ^0*1{10}0*$
  - -> gives all genes that occur only in ten consecutive species
- Input a list of Ensembl/EMBL IDs (PhyloPat contains an EMBL to Ensembl mapping)
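The two query styles can be sketched as follows. This is a hedged illustration: it assumes `*` is the absent/present wildcard, uses Python's `re` module as a stand-in for MySQL regular expressions, and works on invented group IDs and patterns rather than on the PhyloPat database.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a 0/1/* pattern into an anchored regular expression."""
    if not set(pattern) <= set("01*"):
        raise ValueError("pattern may only contain 0, 1 and *")
    return re.compile("^" + pattern.replace("*", "[01]") + "$")

def search(groups, regex):
    """Return the IDs of all groups whose pattern matches the regex."""
    return sorted(gid for gid, pat in groups.items() if regex.search(pat))

# Hypothetical group IDs with 21-species patterns (illustrative only).
groups = {
    "grpA": "0" * 7 + "1" * 14,            # absent in the first 7 species
    "grpB": "0" * 5 + "1" * 10 + "0" * 6,  # exactly ten consecutive species
    "grpC": "0" * 12 + "1" * 9,            # present in the last 9 species
    "grpD": "1" * 21,                      # present everywhere
}

# Wildcard query: absent in the first 5 species, present in the last 8.
wildcard_hits = search(groups, wildcard_to_regex("0" * 5 + "*" * 8 + "1" * 8))

# Regular-expression query: present in exactly ten consecutive species.
regex_hits = search(groups, re.compile(r"^0*1{10}0*$"))

print(wildcard_hits, regex_hits)
```

Anchoring with `^` and `$` matters: without the anchors, any pattern containing a run of at least ten 1s would match the second query.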
9. Output
10. Phylogenetic Tree
11. Oligo-/Polypresent Genes
- Oligopresent: present in only one/two species (oligo = few)
  - e.g. 000000010000000000100
  - These two species should be highly related
  - C. sav / C. int: 1737 groups, div. 100 Mya (Boffelli et al., 2004)
  - T. nig / T. rub: 1572 groups, div. 85 Mya (Yamanoue et al., 2006)
  - A. gam / A. aeg: 1058 groups, div. 140 Mya (Service, 1993)
  - P. tro / H. sap: 887 groups, div. 6 Mya (Glazko & Nei, 2003)
  - R. nor / M. mus: 713 groups, div. 20 Mya (Springer et al., 2003)
- Polypresent: present in all species, except for one/two (poly = many)
  - e.g. 111110111110111111111
  - These two species should be related too: similar analysis possible
12. Omnipresent genes
- Omnipresent: present in all 21 species (omni = all): 111111111111111111111
- Currently 1001 omnipresent groups
- Tend to have very general/important functions, mostly involved in transcription/translation
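The oligo-, poly- and omnipresent categories can be expressed as a small classifier over pattern strings. The function name and the "other" label are illustrative choices, not part of PhyloPat.

```python
def classify(pattern):
    """Classify a 0/1 pattern as omni-, oligo- or polypresent (else 'other')."""
    n_present = pattern.count("1")
    n_absent = pattern.count("0")
    if n_absent == 0:
        return "omnipresent"   # present in all species
    if 1 <= n_present <= 2:
        return "oligopresent"  # present in only one or two species
    if 1 <= n_absent <= 2:
        return "polypresent"   # absent from only one or two species
    return "other"

for example in ("000000010000000000100",
                "111110111110111111111",
                "111111111111111111111"):
    print(example, classify(example))
```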
13. FatiGO analysis
- FatiGO: connection with GO terms, KEGG pathways, InterPro domains, etc. (Al-Shahrour et al., 2004)
- Analysis of all human genes in the output with just one mouse click
  - e.g. omnipresent genes
14. Other possibilities
- Anti-correlating patterns
  - e.g. 001111100011000000000
  - and 110000011100111111111
  - -> could be completely different, or very similar (analogous)!
- Easy homology-inferred functional annotation (using information from other genes in the same lineage)
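Two patterns anti-correlate when one is the exact complement of the other. A minimal check, using the two example patterns from this slide (the function names are illustrative):

```python
def complement(pattern):
    """Swap presence and absence in a 0/1 pattern."""
    return pattern.translate(str.maketrans("01", "10"))

def anti_correlating(a, b):
    """True if b is present exactly where a is absent, and vice versa."""
    return len(a) == len(b) and complement(a) == b

a = "001111100011000000000"
b = "110000011100111111111"
print(anti_correlating(a, b))
```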
15. Case study: Hox genes (1)
- Hox genes determine where limbs and other body segments will grow in a developing embryo
- Should exist mostly in vertebrates
- Expansion in teleost fish species (positions 8-11)
  - seven Hox clusters instead of the mammalian four
- Search the Ensembl database for human genes with the term hox in the annotation
  - 44 genes found -> enter in PhyloPat -> 32 groups found (PP)
16. Case study: Hox genes (2)
PPID      genes per species      phylogenetic pattern   gene name(s)
PP022041  011111136562233233222  011111111111111111111  MSX1, MSX2
PP024984  001000011111001111111  001000011111001111111  HOXC4
PP027791  001110023343233333333  001110011111111111111  TLX1, TLX2, TLX3
PP049478  000000221153112322223  000000111111111111111  HOXB8, HOXC8, HOXD8
PP053824  000000011120010101011  000000011110010101011  HOXD11
PP053827  000000022211111111111  000000011111111111111  HOXA10
PP053828  000000021111212122222  000000011111111111111  HOXC13, HOXD13
PP053829  000000063341122222222  000000011111111111111  HOXA1, HOXB1
PP053830  000000011110010111111  000000011110010111111  HOXB4
PP053832  000000021111011111111  000000011111011111111  HOXA5
PP053833  000000021110111111011  000000011110111111011  HOXB2
PP053834  000000031101011111111  000000011101011111111  HOXD3
PP053835  000000021110111111101  000000011110111111101  HOXA9
PP053836  000000021111111111111  000000011111111111111  HOXA3
PP053838  000000021110101111111  000000011110101111111  HOXC12
PP053839  000000011111111110111  000000011111111110111  HOXD4
PP053840  000000021111201011101  000000011111101011101  HOXC11
PP053842  000000043221111111111  000000011111111111111  HOXA13
PP053844  000000032231011111111  000000011111011111111  HOXB5
PP053845  000000021111111111011  000000011111111111011  HOXB3
PP053846  000000021121111111111  000000011111111111111  HOXD10
PP053847  000000022211111111111  000000011111111111111  HOXA2
PP053849  000000034151132333323  000000011111111111111  HOXA6, HOXB6, HOXC6
PP053853  000000011101111111011  000000011101111111011  HOXA4
PP053854  000000032252223133213  000000011111111111111  HOXB9, HOXC9, HOXD9
PP053858  000000011120011111111  000000011110011111111  HOXA11
PP070659  000000000121212222222  000000000111111111111  HOXA7, HOXB7
PP075622  000000000010001111111  000000000010001111111  HOXC5
PP084287  000000000001101111111  000000000001101111111  HOXC10
PP085049  000000000001011011111  000000000001011011111  HOXD1
PP087941  000000000000111011111  000000000000111011111  HOXD12
PP089685  000000000000111111111  000000000000111111111  HOXB13
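The relation between the two string columns can be sketched in a few lines: the phylogenetic pattern is the genes-per-species string with every non-zero count collapsed to 1, and positions with a count above 1 point to possible lineage-specific expansions. The sketch assumes one digit per species (at most 9 genes per species in a group), which holds for the rows shown here; the function names are illustrative.

```python
def counts_to_pattern(counts):
    """Collapse a genes-per-species string to a presence/absence pattern."""
    return "".join("0" if c == "0" else "1" for c in counts)

def expanded_positions(counts):
    """1-based positions of species with more than one gene in the group."""
    return [i for i, c in enumerate(counts, start=1) if int(c) > 1]

counts = "011111136562233233222"  # PP022041 (MSX1, MSX2)
pattern = counts_to_pattern(counts)
expanded = expanded_positions(counts)
print(pattern)
print(expanded)
```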
17. Case study: Hox genes (3)
PPID(s)                        name   cl.A    cl.B    cl.C    cl.D    first sp.   first sp. type    position
PP053829,085049                HOX1   HOXA1   HOXB1   -       HOXD1   T. nigrov.  First vertebrate  anterior
PP053847,053833                HOX2   HOXA2   HOXB2   -       -       T. nigrov.  First vertebrate  anterior
PP053836,053845,053834         HOX3   HOXA3   HOXB3   -       HOXD3   T. nigrov.  First vertebrate  PG3
PP053832,053844,075622         HOX5   HOXA5   HOXB5   HOXC5   -       T. nigrov.  First vertebrate  central
PP053849                       HOX6   HOXA6   HOXB6   HOXC6   -       T. nigrov.  First vertebrate  central
PP053835,053854                HOX9   HOXA9   HOXB9   HOXC9   HOXD9   T. nigrov.  First vertebrate  posterior
PP053827,084287,053846         HOX10  HOXA10  -       HOXC10  HOXD10  T. nigrov.  First vertebrate  posterior
PP053858,053840,053824         HOX11  HOXA11  -       HOXC11  HOXD11  T. nigrov.  First vertebrate  posterior
PP053838,087941                HOX12  -       -       HOXC12  HOXD12  T. nigrov.  First vertebrate  posterior
PP053842,089685,053828         HOX13  HOXA13  HOXB13  HOXC13  HOXD13  T. nigrov.  First vertebrate  posterior
PP053853,053830,024984,053839  HOX4   HOXA4   HOXB4   HOXC4   HOXD4   A. gamb.    Non-vertebrate    central
PP027791                       TLX    TLX1    TLX2    TLX3    -       A. gamb.    Non-vertebrate    -
PP070659                       HOX7   HOXA7   HOXB7   -       -       G. acul.    Vertebrate        central
PP049478                       HOX8   -       HOXB8   HOXC8   HOXD8   C. intest.  Non-vertebrate    central
PP022041                       MSX    MSX1    MSX2    -       -       C. eleg.    Non-vertebrate    -
18. Conclusions
- PhyloPat: quick and easy tool for phylogenetic pattern search on the complete Ensembl database
- Also usable for the study of lineage-specific expansions of genes
- Just updated to Ensembl v41 (released last Thursday)
  - 5 new species: D.nov, E.tel, L.afr, O.cun, O.lat
  - extra option: gene neighborhood
19. Gene neighborhood
Conservation of gene order: functionally related
Equal color: belonging to the same orthologous group
20. Acknowledgements
- Supervision
  - Peter Groenen (supervisor)
  - Jacob de Vlieg (head of group)
- Fruitful discussions
  - Wilco Fleuren (suggestions)
  - Erik Franck (suggestions)
  - Nanning de Jong (suggestions)
  - Arnold Kuzniar (suggestions)
21. Where to find
- Web interface
  - http://www.cmbi.ru.nl/phylopat
  - (accessible through www.cmbi.ru.nl and www.nbic.nl)
- Publication
  - Hulsen T., Groenen P.M.A., de Vlieg J.
  - BMC Bioinformatics 2006, 7:398
  - http://www.biomedcentral.com/1471-2105/7/398
- Powered by Ensembl
  - http://www.ensembl.org/info/about/ensembl_powered.html
22. Bioinformatics Werkbespreking (work meeting) 2006-11-07
- 1. PhyloPat: phylogenetic pattern analysis of eukaryotic genes (20 slides)
- 2. Chicken-human immunogenomics project (9 slides)
23. Chicken-human immunogenomics project (part of Biorange SP3.2.2)
In collaboration with Martien Groenen and Hinri Kerstens (Animal Sciences Group, Wageningen UR)
- Goals
  - study the evolution of genes/proteins involved in the immune system, from chicken to human
  - check for expansions and deletions in families
  - zoom in on interesting families
24. Proteins -> Genes
- Earlier initiatives based on proteins (Protein World, IPI, ParAlign, MCL)
- Disadvantages
  - Large-scale computations needed for orthology determination
  - Difficult to study lineage-specific expansions because of alternative transcripts, isoforms
  - Difficult to connect to WUR synteny data
- -> Genes: connect to the PhyloPat tool
25. PhyloPat
- PhyloPat queries the orthologies of all complete genomes within the Ensembl database using phylogenetic patterns
- Advantages
  - Uses the accurate orthology determination of Ensembl (BLAST/SW, MUSCLE, PHYML), with single linkage clustering done by ourselves
  - No alternative transcripts, isoforms
  - Easy to connect to WUR synteny data
  - 26 species, from S.cer. to H.sap.
- Disadvantage
  - Genome information sometimes incomplete (but Pre-versions and low-coverage genomes are not included)
26. Immunophyle
- Application to the immune system: parse through the PhyloPat set using the IRIS database
- Take all HUGO IDs from the IRIS database, input in PhyloPat -> 585 immunologic lineages containing 18,933 genes from 26 species
- Divided into 22 immunologic categories from the IRIS database (adaptive immunity, innate immunity, inflammation, chemotaxis, etc.)
- Connected to GO, InterPro, KEGG, etc. by FatiGO
27. Immunophyle
- http://www.cmbi.ru.nl/immunophyle
28. Categories
Cat Category (# lineages, # genes) Lin Sc Ce Ag Aa Dm Cs Ci Tn Tr Ol Ga Dr Xt Gg Md Dn Bt Cf Et La Rn Mmus Oc Mmul Pt Hs Total
All All immunologic lineages (585,18933) 585 54 156 193 211 214 219 239 876 830 824 855 1015 686 685 969 740 1121 948 870 802 1070 1163 1131 818 1087 1157 18933
InImm Innate Immunity (272, 8640) 272 17 51 77 89 81 81 93 351 355 339 351 420 304 295 466 355 566 435 416 384 517 571 539 384 535 568 8640
Inflm Inflammation (117, 4568) 117 13 34 45 57 43 53 55 202 200 194 197 237 179 150 227 200 267 221 215 197 263 302 271 194 265 287 4568
Chmtx Chemotaxis (54, 2374) 54 4 12 18 24 18 22 28 107 118 112 122 125 96 69 157 90 135 121 112 103 124 147 132 86 141 151 2374
Phago Phagocytosis (17, 890) 17 1 4 9 10 10 8 10 46 43 41 47 50 32 31 42 34 51 45 46 42 49 58 44 33 51 53 890
Compl Complement (33, 958) 33 0 3 13 7 7 11 19 45 41 43 43 54 37 31 50 36 58 45 48 34 60 62 55 43 54 59 958
Cy_Ch Cytokines and Chemokines (109, 2947) 109 2 11 14 20 18 18 18 122 119 124 119 148 92 120 144 106 219 173 143 133 175 187 195 143 190 194 2947
AdImm Adaptive Immunity (140, 4983) 140 17 44 37 40 48 59 62 212 207 204 219 253 158 170 246 188 330 260 225 223 276 315 324 244 303 319 4983
ClRsp Cellular Response (63, 2358) 63 6 26 20 23 22 36 41 106 101 102 100 119 78 96 116 93 138 112 105 104 124 148 148 111 137 146 2358
HmRsp Humoral Response (34, 1087) 34 3 9 8 8 9 8 9 48 46 48 45 47 37 40 49 43 60 58 55 50 61 65 76 68 65 72 1087
BMImm Barrier and Mucosal Immunity (18, 713) 18 0 1 10 9 15 2 4 20 25 17 27 24 18 11 33 30 68 42 38 32 48 58 47 25 52 57 713
Devlp Development of Immune System (50, 2044) 50 5 18 23 25 23 22 29 109 90 89 96 124 64 72 106 74 109 108 103 86 114 122 116 92 108 117 2044
AgPrc Antigen Processing (31, 830) 31 3 8 9 11 11 10 12 34 31 36 38 56 22 25 39 40 49 35 37 36 39 40 63 35 54 57 830
PtSig Immune Pathway or Signalling (224, 8245) 224 13 63 70 87 81 93 102 400 381 382 390 480 301 296 446 302 454 415 371 344 459 508 489 337 480 501 8245
Recpt Receptor (118, 3506) 118 2 18 16 20 18 18 24 148 151 150 158 187 125 124 165 141 205 191 170 154 227 240 226 156 231 241 3506
IndIm Induced by Immunomodulator (86, 3487) 86 7 23 28 25 40 29 32 172 163 159 175 200 129 122 171 130 224 184 154 154 198 218 197 151 193 209 3487
ImDef Involved in Immunodeficiency (30, 1013) 30 4 8 15 12 9 18 28 44 44 41 38 64 35 42 48 34 61 45 45 43 54 56 58 52 56 59 1013
AutIm Involved in Autoimmunity (19, 530) 19 0 1 12 6 1 5 3 23 24 26 23 32 18 20 25 19 29 30 28 22 29 30 31 29 31 33 530
ExpIT Expressed Primarily in Immune Tissues (134, 3970) 134 11 22 36 27 31 32 42 157 160 153 173 186 137 118 249 155 238 201 179 170 257 274 260 158 261 283 3970
Other Other (43, 1843) 43 9 28 32 31 37 23 25 99 82 80 90 105 86 78 84 78 98 82 75 74 93 92 96 73 94 99 1843
InKil Innate NK Killing (33, 1015) 33 1 4 9 9 6 6 8 31 37 28 32 35 44 23 38 49 70 49 52 50 74 88 72 38 80 82 1015
RlDis Related to Disease (91, 3141) 91 6 14 32 28 32 25 34 151 143 133 144 176 115 128 159 131 190 159 153 133 170 184 184 145 182 190 3141
Coagl Coagulation (51, 2624) 51 5 25 36 44 33 31 33 132 123 122 124 154 114 81 124 108 141 123 121 109 145 162 139 106 138 151 2624
29. Example: Toll-like receptors
GeneGo MetaCore, canonical pathway
30. Example: Toll-like receptors
Check ImmunoPhyle for each gene involved in the TLR pathway
Lineage Sc Ce Ag Aa Dm Cs Ci Tn Tr Ol Ga Dr Xt Gg Md Dn Bt Cf Et La Rn Mmus Oc Mmul Pt Hs HUGO
IP406 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 2 2 2 0 1 2 2 3 2 3 3 TLR1/6/10
IP308 0 0 0 0 0 0 0 1 1 1 1 1 0 2 1 0 1 1 0 0 1 1 1 1 1 1 TLR2
IP197 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 TLR3
IP430 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 1 1 1 1 1 1 1 TLR4
IP289 0 0 0 0 0 0 0 2 2 2 3 1 1 1 1 0 1 1 1 0 1 0 1 0 0 1 TLR5
IP359 0 0 0 0 0 0 0 1 1 1 2 4 1 1 1 1 0 1 2 2 1 2 2 1 2 2 TLR7/8
IP550 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 1 1 TLR9
IP458 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 1 1 1 0 1 1 IRAK1
IP475 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 1 1 1 1 1 IRAK2
IP397 0 0 0 0 0 0 0 0 0 0 1 3 0 0 1 1 1 1 1 1 1 1 1 1 1 1 IRAK3/IRAK-M
IP321 0 0 0 0 0 0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 IRAK4
IP539 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 IL4
IP421 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 IL6
IP294 0 0 0 0 0 0 0 1 1 1 1 2 1 0 1 1 1 1 1 1 0 0 1 1 1 1 IL8
IP078 0 5 0 0 0 1 2 3 3 3 4 1 3 4 5 3 4 4 3 4 4 4 4 4 4 4 LBP
IP484 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 LTA
IP057 0 1 1 0 0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 TOLLIP
IP045 0 1 1 1 1 2 2 4 4 3 4 4 3 3 3 1 4 4 3 3 4 4 4 2 4 4 NFKB1,NFKB2,NFKBIA
IP132 0 0 1 0 1 1 0 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 2 1 1 1 TRAF6
IP059 0 1 1 1 1 1 1 7 7 5 6 4 4 0 3 0 3 2 2 0 2 3 2 0 2 3 JUN/JUNB/JUND
IP145 0 0 0 1 0 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 MAP3K7/TAK1
IP158 0 0 0 0 0 1 1 2 2 2 2 3 2 2 2 2 2 2 2 1 2 2 2 2 2 2 MAP3K7IP2
IP222 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 MAP3K14
IP101 0 1 1 1 1 0 0 5 4 5 5 4 1 3 4 3 3 3 3 3 3 3 3 3 3 3 MAP4K4
IP434 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 0 1 1 1 1 0 1 1 MAP2K3
Green: first occurrence
Red: deletion
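The two highlighted events can be derived from a row of gene counts. The sketch below treats any zero after the first non-zero column as a deletion; that definition is an assumption made for illustration, and an apparent absence may also reflect incomplete genome information rather than a true loss. The mouse and macaque column labels are written out as Mmus and Mmul to keep them distinct (which column is which is also an assumption).

```python
# Column labels in table order (26 species).
SPECIES = ("Sc Ce Ag Aa Dm Cs Ci Tn Tr Ol Ga Dr Xt Gg Md Dn Bt Cf Et La "
           "Rn Mmus Oc Mmul Pt Hs").split()

def first_occurrence_and_deletions(counts):
    """Return (index of first non-zero count, indices of later zero counts)."""
    present = [i for i, c in enumerate(counts) if c > 0]
    if not present:
        return None, []
    first = present[0]
    deletions = [i for i in range(first + 1, len(counts)) if counts[i] == 0]
    return first, deletions

# Row IP289 (TLR5) from the table above.
tlr5 = [int(x) for x in
        "0 0 0 0 0 0 0 2 2 2 3 1 1 1 1 0 1 1 1 0 1 0 1 0 0 1".split()]

first, deletions = first_occurrence_and_deletions(tlr5)
print(SPECIES[first], [SPECIES[i] for i in deletions])
```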
31. Current/future directions
- Connect to literature (CoPub?)
- Connect to expression data, protein interaction data
- Zoom in on families: immunology expertise needed!