Title: Functional annotation Part I
1. Functional annotation, Part I
2. Content
- Introduction
- Why is manual annotation important?
- Protein databases (RefSeq, UniProt and nr)
- Gene Ontology terms
- Discovering conserved protein domains
- PSSMs, HMMs and Patterns
- Domain databases
- InterPro
- CDD
- Phylogenomics
- Comparison and Problems
3. Introduction: How do new functions evolve?
[Figure labels: Bacteria, Bee, Super Bee; mechanisms shown: horizontal gene transfer, domain shuffling, duplication, mutations]
4. [Figure: M. Gerstein et al., Yale; labels: Fold, Function, Same Function, ID]
- Can transfer both fold and functional annotation
- Can transfer annotation related to fold, but not function
- Cannot transfer fold or functional annotation ("Twilight Zone")
5. Why is manual annotation so important?
Wrong functional annotations
[Figure label: "WRONG"]
6. Problems
- Where to set the cutoffs?
- Which are the true orthologs?
- Errors in databases
- Where can I find functional information?
- Which descriptions are experimentally verified?
- Little information available
- How reliable are the results from automatic annotation tools?
7. Why is manual annotation so important?
-> For validation and to get more information!
8. Information collection
9. Protein databases
10. RefSeq, UniProt and nr
11. RefSeq entry
12. UniProt entry: cross-references
13. Gene Ontology
14. Ontologies
[Figure: example ontology with the nodes "Passenger transport", "Military vehicles", "car", "tank", "4 wheels" and "road", linked by the relations "is a", "has" and "drives on"]
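The toy ontology in the figure can be written down as a set of typed relations. The Python sketch below is only an illustration: the figure survives as a flat list of labels, so which nodes each "is a", "has" and "drives on" edge connects is an assumption, and `build_index` and `ancestors` are hypothetical helper names.

```python
from collections import defaultdict

# (subject, relation, object) triples. The exact edges are ASSUMED,
# because the original figure only survives as a list of labels.
TRIPLES = [
    ("car", "is_a", "passenger transport"),
    ("tank", "is_a", "military vehicle"),
    ("car", "has", "4 wheels"),
    ("car", "drives_on", "road"),
    ("tank", "drives_on", "road"),
]

def build_index(triples):
    """Index triples as subject -> relation -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in triples:
        index[subject][relation].add(obj)
    return index

def ancestors(index, term):
    """Follow is_a edges upwards and collect every parent term."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in index[current]["is_a"]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

index = build_index(TRIPLES)
print(ancestors(index, "car"))
print(index["tank"]["drives_on"])
```

The Gene Ontology uses the same idea: terms are nodes, and typed relations between them let an annotation made with a specific term also count for its more general parent terms.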
15. Gene Ontology
- 3 domains
  - Molecular function
  - Biological process
  - Cellular component
- Controlled vocabulary for functional annotation
- Added to many protein sequence database entries
[Figure taken from SGD]
16. Evidence codes (http://www.geneontology.org/GO.evidence.shtml)
- Automatically derived
  - IEA
- Reviewed
  - ISS
  - RCA
  - IC
  - TAS
  - NAS
- Experimentally verified
  - IEP
  - IMP
  - IDA
  - IPI
  - IGI
- No data available
  - ND
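The grouping of evidence codes above lends itself to a simple lookup when deciding which annotations to trust. The following Python sketch assumes annotations are available as `(protein, GO term, evidence code)` tuples; the protein names and the helper functions `evidence_group` and `experimentally_verified` are hypothetical and only serve the illustration.

```python
# Grouping of GO evidence codes as given on the slide.
EVIDENCE_GROUPS = {
    "automatic": {"IEA"},
    "reviewed": {"ISS", "RCA", "IC", "TAS", "NAS"},
    "experimental": {"IEP", "IMP", "IDA", "IPI", "IGI"},
    "no_data": {"ND"},
}

def evidence_group(code):
    """Return the slide's group name for an evidence code, or 'unknown'."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    return "unknown"

def experimentally_verified(annotations):
    """Keep only (protein, go_term, evidence_code) tuples with experimental support."""
    return [a for a in annotations if evidence_group(a[2]) == "experimental"]

# Hypothetical annotations: the protein names are placeholders.
EXAMPLE = [
    ("protA", "GO:0003677", "IDA"),  # direct assay
    ("protB", "GO:0003677", "IEA"),  # electronic annotation only
    ("protC", "GO:0005634", "ISS"),  # sequence similarity
]

for protein, term, code in experimentally_verified(EXAMPLE):
    print(protein, term, code)
```

Filtering like this is one way to answer the question "Which descriptions are experimentally verified?" before transferring an annotation to a new sequence.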
17. Protein domain search
18. Domain search: PROSITE patterns
- For highly conserved signatures derived from a multiple alignment
RU1A_HUMAN  SRSLKMRGQAFVIFKEVSSAT
SXLF_DROME  KLTGRPRGVAFVRYNKREEAQ
ROC_HUMAN   VGCSVHKGFAFVQYVNERNAR
ELAV_DROME  GNDTQTKGVGFIRFDKREEAT
Pattern: [RK]-G-{EDRKHPCG}-[AG]-F-[IV]-x-[FY]
Example from R. Durbin et al. 1998
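A PROSITE-style pattern can be checked against sequences by translating it into a regular expression. The sketch below is a minimal Python illustration that assumes the pattern reads `[RK]-G-{EDRKHPCG}-[AG]-F-[IV]-x-[FY]`, with `{...}` meaning "any residue except these". The function name `prosite_to_regex` is hypothetical, and only the syntax elements used in this pattern are handled (no repeat counts or anchors).

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        if element == "x":
            parts.append(".")                        # any residue
        elif element.startswith("{") and element.endswith("}"):
            parts.append("[^" + element[1:-1] + "]")  # excluded residues
        else:
            parts.append(element)                    # single residue or [ABC] set
    return "".join(parts)

PATTERN = "[RK]-G-{EDRKHPCG}-[AG]-F-[IV]-x-[FY]"

# The four aligned segments shown on the slide.
SEQUENCES = {
    "RU1A_HUMAN": "SRSLKMRGQAFVIFKEVSSAT",
    "SXLF_DROME": "KLTGRPRGVAFVRYNKREEAQ",
    "ROC_HUMAN": "VGCSVHKGFAFVQYVNERNAR",
    "ELAV_DROME": "GNDTQTKGVGFIRFDKREEAT",
}

regex = re.compile(prosite_to_regex(PATTERN))
for name, sequence in SEQUENCES.items():
    match = regex.search(sequence)
    print(name, match.group() if match else "no match")
```

Because a pattern either matches or does not, a single unexpected substitution in a true family member leads to a miss; this is the motivation for the probabilistic models (profile HMMs, PSSMs) covered next.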
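19. Domain search: profile hidden Markov models (HMMs)
[Figure: profile HMM running from Begin to End, with transition probabilities between the states]
- Mj: match state
- Ij: insertion state
- Dj: delete state
Per-state nucleotide probabilities shown in the figure:
A 0.2, C 0.3, T 0.4, G 0.1
A 0.6, C 0.3, T 0.1, G 0.0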
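To make the roles of the two kinds of probabilities concrete, the Python sketch below computes the probability of one sequence along a single path that visits only match states. The two per-state tables are copied from the figure, but assigning them to states `M1` and `M2` is an assumption, and the transition probabilities of 0.9 are invented for illustration (the figure's transition values are not recoverable). Insert and delete states are left out entirely.

```python
# Per-state symbol probabilities from the figure; assigning them to
# M1 and M2 is an ASSUMPTION made for this example.
EMISSIONS = {
    "M1": {"A": 0.2, "C": 0.3, "T": 0.4, "G": 0.1},
    "M2": {"A": 0.6, "C": 0.3, "T": 0.1, "G": 0.0},
}

# HYPOTHETICAL transition probabilities (not taken from the slide).
TRANSITIONS = {
    ("Begin", "M1"): 0.9,
    ("M1", "M2"): 0.9,
    ("M2", "End"): 0.9,
}

def match_path_probability(sequence):
    """Probability of emitting `sequence` on the path Begin -> M1 -> M2 -> End."""
    states = ["M1", "M2"]
    if len(sequence) != len(states):
        raise ValueError("the match-only path needs one symbol per match state")
    path = ["Begin"] + states + ["End"]
    probability = 1.0
    # Multiply the transition probabilities along the path ...
    for previous, current in zip(path, path[1:]):
        probability *= TRANSITIONS[(previous, current)]
    # ... and the probability of each symbol in its match state.
    for state, symbol in zip(states, sequence):
        probability *= EMISSIONS[state][symbol]
    return probability

print(match_path_probability("TA"))
print(match_path_probability("TG"))
```

A real profile HMM search does not fix the path in advance: it either sums over all paths (forward algorithm) or takes the most probable one (Viterbi), including paths through insertion and delete states.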
20. InterPro
- Database of protein families, domains and functional sites
- Hosted at the European Bioinformatics Institute (EBI)
- Consortium of member databases (PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, SUPERFAMILY and PANTHER)
- Search tool: InterProScan
- InterPro2GO (GOA project)
21. CDD: Conserved Domain Database
- Contains protein domain models imported from Pfam, SMART, COG, KOG
- Curated and provided at NCBI
- Since 2003
- Search tool: RPS-BLAST
- 27036 PSSMs (position-specific scoring matrices) (status December 2008)
  - Count amino acids at each position in the multiple alignment
  - Compute percentage
  - Compute log ratio
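The three PSSM steps (count, percentage, log ratio) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure CDD itself uses: the background frequency is assumed uniform (1/20), a pseudocount of 1 is added so unseen residues do not give log(0), and the input is an ungapped 8-column motif cut from the four-sequence alignment on the PROSITE patterns slide. The names `build_pssm` and `score` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """Build a log-odds matrix: one dict (residue -> score) per alignment column."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # ASSUMED uniform background
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for position in range(length):
        # Step 1: count amino acids at this position.
        counts = Counter(row[position] for row in alignment)
        column = {}
        for aa in AMINO_ACIDS:
            # Step 2: turn counts into a (pseudocount-smoothed) frequency.
            frequency = (counts[aa] + pseudocount) / total
            # Step 3: log ratio of observed frequency to background.
            column[aa] = math.log2(frequency / background)
        pssm.append(column)
    return pssm

def score(pssm, segment):
    """Sum the per-position scores of a segment as long as the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, segment))

# Ungapped motif columns from the four aligned sequences.
MOTIF = ["RGQAFVIF", "RGVAFVRY", "KGFAFVQY", "KGVGFIRF"]

pssm = build_pssm(MOTIF)
print(round(score(pssm, "RGQAFVIF"), 2))
print(round(score(pssm, "AAAAAAAA"), 2))
```

Positive scores mark residues seen more often than expected by chance at that position; a search tool then slides the matrix along a query sequence and reports segments whose summed score exceeds a cutoff.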
22. Phylogenomics
Transfers functional annotations (e.g. Gene Ontology terms) within a phylogenetic tree by considering the evolutionary history of each gene.
Better prediction accuracy than BLAST
23. SIFTER
- Tool developed by Barbara Engelhardt (UC Berkeley)
- Prediction accuracy about 96%
24. Phylogenomics
26. Problems
- Domain shuffling
- Gene loss
- Paralogous genes with different functions
- Cutoffs
- Missing data (e.g. ontology terms)
- Horizontal gene transfer
27. Summary
- Manual functional annotation is necessary
  - For data validation
  - To prevent errors in public databases
  - To get more information
- Different methods for searching functional sites
  - HMMs, PROSITE patterns, PSSMs
  - Enable higher sensitivity and a low false-positive rate
- Problems
  - Gene loss, domain shuffling, paralogous genes, missing data, wrong cutoffs, horizontal gene transfer