Title: Various Career Options Available
1 Basics of Comparative Genomics
Dr G. P. S. Raghava
2
- Aim: To understand the biology of organisms
- Importance: More than 100 genomes sequenced, more than 250 in progress
- Definition: Comparison of the set of proteins of one genome with that of another genome; comparison of gene location, gene order, and gene regulation
- Application
  - Visualization of information on the genome
  - Genome annotation (prediction of genes, repeats, regulatory regions)
  - Evolutionary information (gene loss, duplication, horizontal gene transfer, ancestor)
  - Essential genes for cell survival
  - Classification of genes based on function
- Tools and Databases
3 What is comparative genomics?
- Analyzing and comparing genetic material from different species to study evolution, gene function, and inherited disease
- Understanding what makes each species unique
4 Why Comparative Genomics?
- It tells us what is common and what is unique between different species at the genome level.
- Genome comparison may be the surest and most reliable way to identify genes and predict their functions and interactions.
  - e.g., to distinguish orthologs from paralogs
- The functions of human genes and other DNA regions can be revealed by studying their counterparts in lower organisms.
5 What is compared?
- Gene location
- Gene structure
- Exon number
- Exon lengths
- Intron lengths
- Sequence similarity
- Gene characteristics
- Splice sites
- Codon usage
- Conserved synteny
6 A few facts from genome comparison
- High degree of conservation of microbial proteins (70% ancestral conserved regions)
- Proteins related to ENERGY processes are generally found in all genomes
- Proteins related to COMMUNICATION represent the most distinctive functions in each genome
- INFORMATION-related proteins show complex behaviour
- High frequency (10%) of non-orthologous gene displacement
7 A few terminologies
- Homology: the relationship of any two characters (such as two proteins that have similar sequences) that have descended, usually through divergence, from a common ancestral character. Homologues are thus components or characters (such as genes/proteins with similar sequences) that can be attributed to a common ancestor of the two organisms during evolution.
8 Homologues can be orthologues, paralogues, or xenologues.
- Orthologues are homologues that have evolved from a common ancestral gene by speciation. They usually have similar functions.
- Paralogues are homologues that are related or produced by duplication within a genome followed by subsequent divergence. They often have different functions.
- Xenologues are homologues that are related by an interspecies (horizontal) transfer of the genetic material for one of the homologues. The functions of xenologues are quite often similar.
9 Analogues
- Analogues are non-homologous genes/proteins that have arisen by convergence from unrelated ancestors. They have similar functions although they are unrelated in either sequence or structure.
10 Frequently used terms
- Homology
  - Orthologous: derived from a common ancestral gene by speciation; usually have similar functions
  - Paralogous: duplication of a gene within a genome; usually have different functions
  - Xenologous: related by an interspecies (horizontal) transfer of genetic material; have similar functions
- Analogous: did not evolve from the same ancestor
- Similarity: sequence similarity
- Percent identity
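Percent identity is usually reported from a pairwise alignment. The following is a minimal sketch, assuming two already-aligned sequences of equal length with "-" as the gap character, and assuming gap columns are left out of the denominator (tools differ on this choice). The function name `percent_identity` is illustrative and does not come from the slides.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free alignment columns where both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are excluded from the denominator
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

For example, `percent_identity("ACDE", "ACDF")` compares four columns and finds three matches.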
11 Visualising Genome Information
12 Genome Annotation
- The process of adding biological information and predictions to a sequenced genome framework
13 All-against-all Self-comparison
- How?
  - Make a database of the proteome
  - Use each protein as a query in a similarity search against the database (BLAST, WU-BLAST, or FASTA)
  - Generate a matrix of alignment scores (P or E values)
  - Apply a conservative cutoff: E value of 10e-6
- Why?
  - Number of gene families: this comparison distinguishes unique proteins from proteins that have arisen from gene duplication, and also reveals the number of gene families.
  - Paralogs: significantly matched pairs of protein sequences may be paralogs.
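The self-comparison can be sketched in code once the similarity search has been run. This is a minimal sketch, not the procedure of any particular tool: it assumes the hits are already available as `(query_id, subject_id, e_value)` tuples (for example, parsed from tabular search output), reads the slide's cutoff as 1e-6, and groups proteins by single linkage, which is one of several possible clustering choices. All function names are hypothetical.

```python
from collections import defaultdict

E_CUTOFF = 1e-6  # assumption: one reading of the slide's conservative cutoff


def cluster_families(proteins, hits, e_cutoff=E_CUTOFF):
    """Group proteins into families by single linkage over significant hits.

    proteins: iterable of protein IDs in the proteome
    hits: iterable of (query_id, subject_id, e_value) from an
          all-against-all search of the proteome against itself
    """
    parent = {p: p for p in proteins}

    def find(x):
        # Walk up to the representative of x's group, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, e_value in hits:
        for pid in (query, subject):
            parent.setdefault(pid, pid)  # tolerate IDs missing from `proteins`
        if query == subject or e_value > e_cutoff:
            continue  # skip self-hits and weak matches
        parent[find(query)] = find(subject)

    groups = defaultdict(set)
    for p in parent:
        groups[find(p)].add(p)
    return list(groups.values())


def split_unique_and_families(groups):
    """Singletons are unique proteins; larger groups are candidate paralog families."""
    unique = [g for g in groups if len(g) == 1]
    families = [g for g in groups if len(g) > 1]
    return unique, families
```

The number of gene families is then `len(families)`. Single linkage can chain loosely related proteins into one group through shared domains, so the families it reports are candidates rather than confirmed paralog sets.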
14 Between-Proteome Comparisons: Why?
- To identify orthologs, gene families, and domains
- Orthologs (proteins that share a common ancestry and function)
  - A pair of proteins in two organisms that align along most of their lengths with a highly significant alignment score.
  - These proteins perform the core biological functions shared by the two organisms.
- Two matched sequences (X in A, Y in B) may not be orthologs
  - (Y and Z are paralogs in B; X and Z are orthologs)
- Identify true orthologs
  - Highest-scoring match (best hit)
  - E value < 0.01
  - > 60% alignment over both proteins
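The three criteria for candidate orthologs (best hit, E value below 0.01, more than 60% of both proteins aligned) can be expressed as a simple filter. This is a minimal sketch under stated assumptions: the `Hit` record and its field names are hypothetical, and coverage is assumed to be given as a fraction of each protein's length.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str          # protein ID in organism A
    subject: str        # protein ID in organism B
    score: float        # alignment score (higher is better)
    e_value: float
    query_cov: float    # fraction of the query covered by the alignment
    subject_cov: float  # fraction of the subject covered by the alignment


def candidate_orthologs(hits, max_e=0.01, min_cov=0.60):
    """Return {query: subject} for best hits that pass both thresholds."""
    best = {}
    for hit in hits:
        current = best.get(hit.query)
        if current is None or hit.score > current.score:
            best[hit.query] = hit  # keep only the highest-scoring match
    return {
        q: h.subject
        for q, h in best.items()
        if h.e_value < max_e and h.query_cov > min_cov and h.subject_cov > min_cov
    }
```

With hits from X to both Y and Z in organism B, where Z scores higher, the filter keeps the X-Z pair and drops the paralog Y.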
15 Between-Proteome Comparisons: How?
1. Choose a yeast protein and perform a database similarity search of the worm proteome (WU-BLAST): a yeast-versus-worm search
2. Group the worm seqs that match the yeast query seq with a highly significant P value (10^-10 to 10^-100); also include the yeast query seq in the group
3. From the group made in 2, choose a worm seq and search the yeast proteome, using the same P limit
4. Add any matching yeast seq to the group made in 2
5. Repeat 3 and 4 for all initially matched seqs in the group
6. Repeat 1-5 for every yeast protein
7. As in 1-6, perform a comparable worm-versus-yeast search
8. Coalesce the groups of related seqs and remove any redundancies so that every sequence is represented only once
9. Eliminate any matched pairs in which less than 80% of each seq is in the alignment
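The grouping procedure above can be sketched as follows. This is a minimal sketch under several assumptions: the searches have already been run and are available as dictionaries mapping each query ID to a list of `(other_id, p_value, coverage)` tuples, where `coverage` is the smaller fraction of the two seqs in the alignment; IDs are unique across both proteomes; and the 80% filter is applied up front for simplicity rather than as a final step. All function names are hypothetical.

```python
def significant(hits, p_limit, min_cov):
    """IDs of hits at or below the P limit with enough of each seq aligned."""
    return {other for other, p_value, cov in hits
            if p_value <= p_limit and cov >= min_cov}


def build_group(seed, forward, backward, p_limit=1e-10, min_cov=0.80):
    """Build the group for one seed protein (steps 1-5)."""
    matched = significant(forward.get(seed, []), p_limit, min_cov)  # steps 1-2
    group = {seed} | matched
    for other in matched:                                           # steps 3-5
        group |= significant(backward.get(other, []), p_limit, min_cov)
    return group


def coalesce(groups):
    """Merge groups that share any sequence so each seq appears once (step 8)."""
    merged = []
    for group in groups:
        group = set(group)
        overlapping = [m for m in merged if m & group]
        for m in overlapping:
            group |= m
            merged.remove(m)
        merged.append(group)
    return merged


def compare_proteomes(yeast_vs_worm, worm_vs_yeast, p_limit=1e-10, min_cov=0.80):
    """Run the search in both directions (steps 6-7), then coalesce."""
    groups = [build_group(y, yeast_vs_worm, worm_vs_yeast, p_limit, min_cov)
              for y in yeast_vs_worm]
    groups += [build_group(w, worm_vs_yeast, yeast_vs_worm, p_limit, min_cov)
               for w in worm_vs_yeast]
    # Groups of size one had no significant match in the other proteome.
    return coalesce(g for g in groups if len(g) > 1)
```

Each returned set mixes yeast and worm IDs and represents one cluster of related sequences; proteins without a significant cross-proteome match are left out.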
16 Figure 1: Regions of the human and mouse homologous genes: coding exons (white), noncoding exons (gray), introns (dark gray), and intergenic regions (black). Corresponding strong (white) and weak (gray) alignment regions of GLASS are shown connected with arrows. Dark lines connecting the alignment regions denote very weak or no alignment. The predicted coding regions of ROSETTA in human, and the corresponding regions in mouse, are shown (white) between the genes and the alignment regions.
17 Target Validation
- Target validation involves taking steps to prove that a DNA, RNA, or protein molecule is directly involved in a disease process and is therefore a suitable target for development of a new therapeutic compound.
- Genes that do not belong to an established family are critical to many disease processes and also need to be validated as potential drug targets.