Title: The Genome Gamble, Knowledge or Carnage?
1 The Genome Gamble, Knowledge or Carnage?
- Comparative Genomics Leading the Way at Organon
Tim Hulsen, Oss, November 11, 2003
2 Summary
- (1) An introduction to orthology and paralogy
- (2) Orthology determination within eukaryotes
- (3) Testing the advantages of our ortholog set
- (4) Using evolutionary conservation of co-expression for function prediction
- (5) Evolutionary conservation of chromosomal distance and orientation
3 (1) An introduction to orthology and paralogy
- Homologous genes: genes that have a common ancestor
- Orthologous genes: genes that evolved from a common ancestor through a speciation event (i.e. equivalents in different species)
- Paralogous genes: genes that evolved from a common ancestor through a duplication event
4 Orthology and paralogy explained graphically
(from http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html)
5 The importance of orthology and paralogy
- Orthology relationships are especially important for function prediction: orthologous genes generally have the same function, but in different species
- Paralogy relationships can be used for function prediction too: paralogous genes are often involved in the same process, but have different molecular functions (e.g. globins)
6 (2) Orthology determination within eukaryotes
- Not much eukaryotic orthology data is available at the moment:
  - euKaryotic Orthologous Groups (KOG, NCBI)
  - Inparanoid
  - OrthoMCL
- Existing databases are either too inclusive or too restrictive
- Most methods rely on the best bidirectional hit (E-value), but orthology is an evolutionary principle and should be determined using phylogenetic trees!
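
The best-bidirectional-hit criterion mentioned above can be sketched as follows. This is only an illustration of the general idea, not the procedure of KOG, Inparanoid or OrthoMCL; the hit lists, gene names and E-values are invented.

```python
def best_hits(hits):
    """Map each query to its best (lowest E-value) subject."""
    best = {}
    for query, subject, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity hits: (query, subject, E-value)
human_vs_mouse = [("hsA", "mmA", 1e-80), ("hsA", "mmB", 1e-60), ("hsB", "mmA", 1e-50)]
mouse_vs_human = [("mmA", "hsA", 1e-78), ("mmB", "hsA", 1e-61)]

pairs = bidirectional_best_hits(human_vs_mouse, mouse_vs_human)
print(pairs)
```

In this toy case mmB is dropped even though its best hit is hsA, which is the kind of relation a tree-based method could still recover.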
7 Our orthology determination within eukaryotes
[Workflow diagram; labels recovered from the slide:]
- Hs
- GENOME
- At, Ce, Dm, Ec, Gt, Hs, Mm, Sc, Sp
- Z > 20, RH > 0.5 QL
- 24,263 groups
- TREE SCANNING
- Hs-Mm: 85,848 pairs; Hs-Dm: 55,934 pairs; etc.
8 Our orthology determination using phylogenetic trees
- Example: BMP6 (Bone Morphogenetic Protein 6) → 5 orthologous relations are defined, all Hs-Mm
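
One common way to read orthology from a phylogenetic tree is to label each internal node as a speciation or a duplication, and to call two genes orthologs when their last common ancestor is a speciation node. The sketch below assumes that rule; the slides do not spell out the tree-scanning procedure actually used, and the tree and gene names are invented.

```python
# A leaf is a gene name; an internal node is (event, left_subtree, right_subtree).
# Hypothetical gene tree: one duplication that predates the human/mouse split.
TREE = ("duplication",
        ("speciation", "hs_gene1", "mm_gene1"),
        ("speciation", "hs_gene2", "mm_gene2"))


def leaves(node):
    """List all gene names below a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def classify_pairs(node, out=None):
    """Label every leaf pair by the event at its last common ancestor."""
    if out is None:
        out = {}
    if isinstance(node, str):
        return out
    event, left, right = node
    relation = "ortholog" if event == "speciation" else "paralog"
    # Pairs split across the two subtrees have this node as last common ancestor.
    for a in leaves(left):
        for b in leaves(right):
            out[frozenset((a, b))] = relation
    classify_pairs(left, out)
    classify_pairs(right, out)
    return out


relations = classify_pairs(TREE)
orthologs = sorted(tuple(sorted(pair)) for pair, rel in relations.items() if rel == "ortholog")
print(orthologs)
```

Here hs_gene1/mm_gene1 and hs_gene2/mm_gene2 come out as orthologs, while every cross pair is a paralog because the duplication separates them.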
9 The ortholog database: Eukaryortho
http://t2.teras.sara.nl:4086
(only accessible from Organon, CMBI and SARA)
10 (3) Testing the advantages of our ortholog set
- Quality of orthology is difficult to test
- Orthologs should have more or less the same function --> use conservation of function as an orthology benchmark
- Gene Ontology (GO) database: hierarchical system of function and location descriptions
- Orthologs are in the same functional category when they are in the same 4th level GO Molecular Function class
11 GO molecular function benchmark
[Figure: GO hierarchy, levels 0 to 4]
- Molecular function: one of the three subroots (together with biological process and cellular location)
- True orthologs should share a 4th level molecular function (here GO:0019912)
- Our Hs-Mm ortholog set: 67
- KOG Hs-Mm ortholog set: 51
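
The benchmark above amounts to counting how many ortholog pairs fall into the same 4th level class. A minimal sketch, assuming a simplified single-parent hierarchy (the real GO is a graph in which a term can have several parents) and using invented term and gene names:

```python
# Hypothetical hierarchy: term -> parent. The root "F0" is level 0.
PARENT = {"F1": "F0", "F2": "F1", "F3": "F2",
          "F4a": "F3", "F4b": "F3", "F5a": "F4a", "F5b": "F4a"}


def path_from_root(term):
    """Return the list of terms from the root down to `term`."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def level_ancestor(term, level=4):
    """Return the ancestor of `term` at depth `level`, or None if the term is shallower."""
    path = path_from_root(term)
    return path[level] if len(path) > level else None


def same_class_fraction(pairs, annotation, level=4):
    """Fraction of gene pairs whose annotations share a class at `level`."""
    shared = 0
    for gene_a, gene_b in pairs:
        class_a = level_ancestor(annotation[gene_a], level)
        class_b = level_ancestor(annotation[gene_b], level)
        if class_a is not None and class_a == class_b:
            shared += 1
    return shared / len(pairs)


# Invented annotations for three hypothetical human-mouse pairs.
annotation = {"hs1": "F5a", "mm1": "F5b", "hs2": "F4a",
              "mm2": "F4b", "hs3": "F3", "mm3": "F3"}
pairs = [("hs1", "mm1"), ("hs2", "mm2"), ("hs3", "mm3")]
fraction = same_class_fraction(pairs, annotation)
print(fraction)
```

Pairs annotated only above level 4 are counted as not sharing a class here; whether the original benchmark excluded such pairs instead is not stated on the slide.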
12 Co-expression benchmark
- Second method: comparing expression profiles of each orthologous gene pair
- Using GeneLogic Expressor data set
- Human chips: 3269 samples, 44792 fragments, 115 tissue categories, 15 SNOMED tissue categories
- Mouse chips: 859 samples, 36701 fragments, 25 tissue categories, 12 SNOMED tissue categories
13 SNOMED tissue categories used for co-expression calculation
HUMAN                        MOUSE
 1 Blood vessel               1 Blood vessel
 2 Cardiovascular system      2 Cardiovascular system
 3 Digestive organs           3 Digestive organs
 4 Digestive system           4 Digestive system
 5 Endocrine gland            -
 6 Female genital system      5 Female genital system
 7 Hematopoietic system       6 Hematopoietic system
 8 Integumentary system       7 Integumentary system
 9 Male genital system        8 Male genital system
10 Musculoskeletal system     9 Musculoskeletal system
11 Nervous system            10 Nervous system
12 Product of conception      -
13 Respiratory system        11 Respiratory system
14 Topographic region         -
15 Urinary tract             12 Urinary tract
14 Calculating the correlation
$$r = \frac{N \sum xy - (\sum x)(\sum y)}{\sqrt{\left(N \sum x^2 - (\sum x)^2\right)\left(N \sum y^2 - (\sum y)^2\right)}}$$

Tissue     Human gene 1    Mouse gene 1    Human gene 2    Mouse gene 2
category   206316_s_at     162926_at       205428_s_at     97166_at
 1          41.04           83.56           62.95           49.11
 2          30.78           61.11           67.72           45.18
 3          74.73           92.95           93.2            40.76
 4          43.9            78.85           68.48           41.2
 5          39.23           88.93           54.8            41.24
 6          88.72          100.7            52.16           49.64
 7          39.71           83.15           73.56           42.84
 8         135.42          169.28           46.59           49.58
 9          55.98           79.91          205.58            0
10           0              59.05          142.9            34.7
11          54.78           97.37           48.57           48.04
12          68.11           87.85           48.97           46.26

→ Human gene 1 vs. mouse gene 1: high correlation, 0.914167
→ Human gene 2 vs. mouse gene 2: low correlation, -0.935731
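
The correlation formula on this slide can be written out directly. The sketch below uses the expression values from the table; the function and variable names are mine, not taken from the original analysis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient, computed with the sum-based formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Expression values per tissue category (1 to 12), as listed on the slide.
human_1 = [41.04, 30.78, 74.73, 43.9, 39.23, 88.72, 39.71, 135.42, 55.98, 0, 54.78, 68.11]
mouse_1 = [83.56, 61.11, 92.95, 78.85, 88.93, 100.7, 83.15, 169.28, 79.91, 59.05, 97.37, 87.85]
human_2 = [62.95, 67.72, 93.2, 68.48, 54.8, 52.16, 73.56, 46.59, 205.58, 142.9, 48.57, 48.97]
mouse_2 = [49.11, 45.18, 40.76, 41.2, 41.24, 49.64, 42.84, 49.58, 0, 34.7, 48.04, 46.26]

r_pair1 = pearson(human_1, mouse_1)
r_pair2 = pearson(human_2, mouse_2)
print(r_pair1, r_pair2)
```

For these two pairs the result should come out close to the 0.914 and -0.936 reported on the slide.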
15 Co-expression comparison of our ortholog set to the KOG set
16 (4) Using evolutionary conservation of co-expression for function prediction
- Human: Gene A, Gene B, with co-expression C_ab (-1 < corr. < 1)
- Human/Mouse: Gene A, Gene B, with co-expression conserved (C_ab above the cutoff in both human and mouse)
- (Co-expression calculated over 115 tissues in human, 25 in mouse)
→ Increases probability that A and B are involved in the same process
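
The idea of conserved co-expression can be sketched as a simple test: a human gene pair counts as conserved when both the pair itself and its mouse orthologs are co-expressed. The cutoff of 0.6, the profiles and the gene names below are invented for illustration; the slides do not state the cutoff that was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def conserved_coexpression(gene_a, gene_b, human, mouse, ortholog, cutoff=0.6):
    """True if A and B exceed `cutoff` in human and their orthologs do in mouse."""
    c_human = pearson(human[gene_a], human[gene_b])
    c_mouse = pearson(mouse[ortholog[gene_a]], mouse[ortholog[gene_b]])
    return c_human > cutoff and c_mouse > cutoff


# Toy profiles; the number of tissues may differ between the two species
# because each correlation is computed within one species.
human = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 5, 8, 10], "C": [5, 3, 4, 1, 2]}
mouse = {"a": [10, 20, 35], "b": [1, 2, 4], "c": [3, 2, 1]}
ortholog = {"A": "a", "B": "b", "C": "c"}  # hypothetical human -> mouse map

print(conserved_coexpression("A", "B", human, mouse, ortholog))
print(conserved_coexpression("A", "C", human, mouse, ortholog))
```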
17 GO biological process benchmark
[Figure: GO hierarchy, levels 0 to 4]
- Biological process: one of the three subroots (together with cellular location and molecular function)
- Both orthologs and paralogs are often involved in the same process/pathway (sharing a 4th level biological process, here GO:0007584)
18 Conservation of co-expression used in function prediction
19 The importance of (conserved) co-expression for function prediction
- Co-expression without conservation can already be used for function prediction
- Paralogous conservation gives a 2x higher accuracy
- Orthologous conservation gives a 3x or 4x higher accuracy
- Alternative to GO Biological Process: KEGG Pathway database → similar results
20 (5) Evolutionary conservation of chromosomal distance and orientation
- Human: Gene A, Gene B, with distance D_ab (bp), orientation O_ab (→→, ←→, →←) and co-expression C_ab (-1 < corr. < 1)
- Human/Mouse: Gene A, Gene B, with distance, orientation and co-expression conserved (D_ab below a cutoff, O_ab identical, C_ab above a cutoff in both human and mouse)
- (Co-expression calculated over 115 tissues in human, 25 in mouse)
→ Increases probability that A and B are involved in the same process
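
A check for conserved distance and orientation could look like the sketch below. It assumes three orientation classes (co-directional, divergent, convergent), 1-based inclusive coordinates, and a hypothetical 1 Mbp distance cutoff; the `Gene` class, all names and all coordinates are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    chrom: str
    start: int   # 1-based, start <= end
    end: int
    strand: str  # "+" or "-"


def distance(a, b):
    """Intergenic distance in bp; 0 if overlapping; None on different chromosomes."""
    if a.chrom != b.chrom:
        return None
    first, second = sorted((a, b), key=lambda g: g.start)
    return max(0, second.start - first.end - 1)


def orientation(a, b):
    """Relative orientation of two genes, read left to right along the chromosome."""
    first, second = sorted((a, b), key=lambda g: g.start)
    if first.strand == second.strand:
        return "co-directional"
    return "convergent" if first.strand == "+" else "divergent"


def conserved_neighbourhood(h_a, h_b, m_a, m_b, max_distance=1_000_000):
    """True if both species keep the pair close and in the same orientation."""
    d_h, d_m = distance(h_a, h_b), distance(m_a, m_b)
    if d_h is None or d_m is None:
        return False
    return (d_h < max_distance and d_m < max_distance
            and orientation(h_a, h_b) == orientation(m_a, m_b))


# Hypothetical human pair and mouse orthologs, both divergently oriented.
h_a, h_b = Gene("chr1", 1000, 2000, "-"), Gene("chr1", 2501, 4000, "+")
m_a, m_b = Gene("chr4", 50_000, 51_000, "-"), Gene("chr4", 52_000, 53_000, "+")
print(distance(h_a, h_b), orientation(h_a, h_b))
print(conserved_neighbourhood(h_a, h_b, m_a, m_b))
```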
21 Function prediction using co-expression and chromosomal distance (without conservation)
22 Conservation of chromosomal distance used in function prediction
23 The importance of chromosomal distance and orientation for function prediction
- Chromosomal distance is less important in eukaryotes than in prokaryotes (due to the absence of operons)
- Only genes with distance < 1 Mbp seem to be coregulated
- Conservation of relative orientation seems to be important only for very close gene pairs
- Only a limited number of genes can be functionally annotated using the conservation of chromosomal distance and orientation
24 Conclusions
- Orthologous and paralogous relations can be used to improve function prediction
- Our orthologous pairs of Protein World proteins perform better than KOG, in terms of co-expression and involvement in the same process
- Chromosomal distance and relative orientation between genes can be used for function prediction too, in a limited number of cases
- Future plans: find examples where the function of a protein can be predicted using these methods
25 Credits
- Martijn Huynen
- Peter Groenen
- Others at Comics
- Others at Organon Bioinf.