Functional annotation Part I - PowerPoint PPT Presentation

About This Presentation
Description:

Molecular function. Biological process. Cellular component ... Different methods for searching functional sites. HMMs, Prosite patterns, PSSMs ...

Slides: 28
Provided by: Anni

Transcript and Presenter's Notes



1
Functional annotation Part I
  • Anika Joecker

2
Content
  • Introduction
  • Why is manual annotation important?
  • Protein databases (RefSeq, UniProt and nr)
  • Gene Ontology terms
  • Discovering conserved protein domains
      • PSSMs, HMMs and patterns
  • Domain databases
      • InterPro
      • CDD
  • Phylogenomics
  • Comparison and problems

3
Introduction: How do new functions evolve?
[Figure: bacteria, a bee and a "Super Bee"; mechanisms shown: horizontal gene transfer, domain shuffling, duplication, mutations]
4
[Figure (M. Gerstein et al., Yale): annotation transfer as a function of sequence identity (ID), from high to low identity: can transfer both fold and functional annotation (same function); can transfer annotation of a related fold, but not function; cannot transfer fold or functional annotation ("Twilight Zone").]
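The figure implies that annotation transfer depends on sequence identity, which leads directly to the cutoff question listed under Problems. A minimal Python sketch, assuming identity is computed over alignment columns where neither sequence has a gap; the cutoffs of 40% and 25% in `transfer_zone` are hypothetical placeholders, not values from the slide, and the two 10-residue fragments (from the PROSITE example sequences) only exercise the code.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap ("-")."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


def transfer_zone(identity: float, function_cutoff: float = 40.0,
                  twilight_cutoff: float = 25.0) -> str:
    """Map percent identity onto the three zones of the figure.

    The default cutoffs are hypothetical placeholders chosen for illustration.
    """
    if identity >= function_cutoff:
        return "fold and function"
    if identity >= twilight_cutoff:
        return "fold only"
    return "twilight zone"


identity = percent_identity("KMRGQAFVIF", "RPRGVAFVRY")
print(identity, transfer_zone(identity))
```

Making the cutoffs parameters rather than constants reflects the open question on the Problems slide: where to set them depends on the protein family and must be validated.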
5
Why is manual annotation so important? Wrong functional annotations
[Figure: example of wrong functional annotations, marked "WRONG"]
6
Problems
  • Where to set the cutoffs?
  • Which are the true orthologs?
  • Errors in databases
  • Where can I find functional information?
  • Which descriptions are experimentally verified?
  • Little information available
  • How reliable are the results from automatic
    annotation tools?

7
Why is manual annotation so important?
-> For validation and to get more information!
8
Information collection
9
Protein databases
10
RefSeq, UniProt and nr
11
RefSeq entry
12
UniProt entry: cross-references
13
Gene Ontology
14
Ontologies
[Figure: example ontology with the terms "car", "tank", "Passenger transport", "Military vehicles", "road" and "4 wheels", connected by the relations "is a", "drives on" and "has"]
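An ontology can be stored as (subject, relation, object) triples, with "is a" links followed transitively. The triples below are a hypothetical reading of the diagram; the root term "vehicle" is an assumption added so that there is a chain to follow. GO terms are linked the same way (e.g. by is_a and part_of), and annotating a gene to a term implies annotation to that term's is_a ancestors.

```python
from collections import defaultdict

# Hypothetical triples read from the diagram; "vehicle" is an assumed extra root term.
TRIPLES = [
    ("car", "is_a", "passenger transport"),
    ("tank", "is_a", "military vehicle"),
    ("passenger transport", "is_a", "vehicle"),
    ("military vehicle", "is_a", "vehicle"),
    ("car", "has", "4 wheels"),
    ("car", "drives_on", "road"),
]


def build_index(triples):
    """Index triples as subject -> relation -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in triples:
        index[subject][relation].add(obj)
    return index


def ancestors(term, index):
    """All terms reachable from `term` by following is_a edges upward."""
    found, stack = set(), [term]
    while stack:
        for parent in index[stack.pop()]["is_a"]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


index = build_index(TRIPLES)
print(sorted(ancestors("car", index)))
print(index["car"]["drives_on"])
```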
15
Gene Ontology
  • 3 domains:
      • Molecular function
      • Biological process
      • Cellular component
  • Controlled vocabulary for functional annotation
  • Added to many protein sequence database entries

Taken from SGD
16
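Evidence codes (http://www.geneontology.org/GO.evidence.shtml)
  • Automatically derived
      • IEA (Inferred from Electronic Annotation)
  • Reviewed
      • ISS (Inferred from Sequence or structural Similarity)
      • RCA (inferred from Reviewed Computational Analysis)
      • IC (Inferred by Curator)
      • TAS (Traceable Author Statement)
      • NAS (Non-traceable Author Statement)
  • Experimentally verified
      • IEP (Inferred from Expression Pattern)
      • IMP (Inferred from Mutant Phenotype)
      • IDA (Inferred from Direct Assay)
      • IPI (Inferred from Physical Interaction)
      • IGI (Inferred from Genetic Interaction)
  • No data available
      • ND (No biological Data available)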
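These groups can be used to keep only annotations with experimental support. A minimal sketch that follows the grouping on this slide; the annotation tuples, gene names and term names are hypothetical placeholders.

```python
# Grouping of GO evidence codes as listed on the slide.
EVIDENCE_GROUPS = {
    "automatically derived": {"IEA"},
    "reviewed": {"ISS", "RCA", "IC", "TAS", "NAS"},
    "experimentally verified": {"IEP", "IMP", "IDA", "IPI", "IGI"},
    "no data available": {"ND"},
}
CODE_TO_GROUP = {code: group for group, codes in EVIDENCE_GROUPS.items() for code in codes}


def evidence_group(code):
    """Return the slide's group for an evidence code, or "unknown"."""
    return CODE_TO_GROUP.get(code, "unknown")


def experimentally_verified(annotations):
    """Keep only (gene, go_term, evidence_code) tuples backed by experimental evidence."""
    return [a for a in annotations if evidence_group(a[2]) == "experimentally verified"]


# Hypothetical annotations; gene and term names are placeholders.
annotations = [
    ("geneA", "term1", "IDA"),
    ("geneA", "term2", "IEA"),
    ("geneB", "term1", "ISS"),
    ("geneB", "term3", "IMP"),
]
print(experimentally_verified(annotations))
```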

17
Protein domain search
18
Domain search: PROSITE patterns
  • For highly conserved signatures derived from a
    multiple alignment

RU1A_HUMAN  SRSLKMRGQAFVIFKEVSSAT
SXLF_DROME  KLTGRPRGVAFVRYNKREEAQ
ROC_HUMAN   VGCSVHKGFAFVQYVNERNAR
ELAV_DROME  GNDTQTKGVGFIRFDKREEAT
Pattern:    [RK]-G-{EDRKHPCG}-[AG]-F-[IV]-x-[FY]
Example from R. Durbin et al. 1998
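To search sequences with a PROSITE pattern, it can be translated into a regular expression: [..] lists allowed residues, {..} forbidden residues, x any residue, and (n) or (n,m) gives a repeat count. A minimal Python sketch, assuming only these elements plus the < and > terminal anchors (other PROSITE syntax is not handled), applied to the four example sequences.

```python
import re

# One PROSITE element: residue class plus optional repeat count, e.g. "[AG]", "{P}", "x(2,4)".
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern such as "[RK]-G-{EDRKHPCG}" into a Python regex."""
    pieces = []
    for element in pattern.rstrip(".").split("-"):
        at_start = element.startswith("<")  # "<": match must start at the N-terminus
        at_end = element.endswith(">")      # ">": match must end at the C-terminus
        match = ELEMENT.fullmatch(element.strip("<>"))
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residues, low, high = match.groups()
        if residues == "x":
            piece = "."                          # any residue
        elif residues.startswith("{"):
            piece = "[^" + residues[1:-1] + "]"  # any residue except these
        else:
            piece = residues                     # "[AG]" or a single residue
        if low is not None:
            piece += "{" + low + (("," + high) if high else "") + "}"
        pieces.append(("^" if at_start else "") + piece + ("$" if at_end else ""))
    return "".join(pieces)


SEQUENCES = {
    "RU1A_HUMAN": "SRSLKMRGQAFVIFKEVSSAT",
    "SXLF_DROME": "KLTGRPRGVAFVRYNKREEAQ",
    "ROC_HUMAN": "VGCSVHKGFAFVQYVNERNAR",
    "ELAV_DROME": "GNDTQTKGVGFIRFDKREEAT",
}

motif = re.compile(prosite_to_regex("[RK]-G-{EDRKHPCG}-[AG]-F-[IV]-x-[FY]"))
for name, seq in SEQUENCES.items():
    hit = motif.search(seq)
    print(name, hit.start() + 1, hit.group())  # 1-based start position and matched motif
```

All four sequences match at position 7, which shows why a pattern is only suitable for highly conserved signatures: a single substitution outside the allowed residues makes the match fail entirely.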
19
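Domain search: Profile hidden Markov models (HMMs)
[Figure: profile HMM running from a Begin to an End state; each position j has a match state Mj, an insertion state Ij and a delete state Dj, connected by transition probabilities. Example emission distributions shown: A 0.2, C 0.3, T 0.4, G 0.1 and A 0.6, C 0.3, T 0.1, G 0.0.]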
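A profile HMM scores a sequence from its emission and transition probabilities. The sketch below is deliberately reduced: it scores only the gap-free path through two match states using the two emission distributions from the slide (assigning them to M1 and M2 is an assumption), ignores transitions, and compares against an assumed uniform background of 0.25 per nucleotide. A real profile search (e.g. with the Viterbi or forward algorithm) also considers the insert and delete states.

```python
import math

# The two emission distributions shown on the slide, assigned to M1 and M2 for illustration.
MATCH_EMISSIONS = [
    {"A": 0.2, "C": 0.3, "T": 0.4, "G": 0.1},
    {"A": 0.6, "C": 0.3, "T": 0.1, "G": 0.0},
]
BACKGROUND = 0.25  # assumed uniform null model over A, C, G, T


def match_path_log_odds(seq, emissions=MATCH_EMISSIONS, background=BACKGROUND):
    """Log2-odds of `seq` being emitted by the match states M1..Mn in order.

    Transitions are ignored, so only the gap-free path through the model is scored.
    """
    if len(seq) != len(emissions):
        raise ValueError("sequence length must equal the number of match states")
    score = 0.0
    for residue, probs in zip(seq, emissions):
        p = probs[residue]
        if p == 0.0:
            return -math.inf  # this match state can never emit the residue
        score += math.log2(p / background)
    return score


for candidate in ["TA", "CA", "AG"]:
    print(candidate, f"{match_path_log_odds(candidate):.3f}")
```

Unlike a pattern, the model gives graded scores: "TA" scores higher than "CA", while "AG" is impossible because M2 assigns G probability 0.0 (real profile HMMs usually avoid such zeros with pseudocounts).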

20
InterPro
  • Database of protein families, domains and
    functional sites
  • Hosted at the European Bioinformatics Institute
    (EBI)
  • Consortium of member databases (PROSITE, Pfam,
    PRINTS, ProDom, SMART, TIGRFAMs, SUPERFAMILY,
    PANTHER)
  • Search tool: InterProScan
  • InterPro2GO: mapping of InterPro entries to GO terms (GOA project)
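Because InterPro2GO maps InterPro entries to GO terms, each domain hit reported by InterProScan can be turned into GO annotations. A minimal sketch; the mapping, the accessions and the protein names are hypothetical placeholders, and annotations derived this way are tagged with the electronic evidence code IEA.

```python
# Hypothetical InterPro2GO-style mapping: InterPro entry -> GO terms.
# Identifiers are placeholders, not real InterPro or GO accessions.
INTERPRO2GO = {
    "IPR_A": {"GO_kinase_activity"},
    "IPR_B": {"GO_membrane", "GO_transport"},
}


def go_from_domain_hits(hits, mapping=INTERPRO2GO):
    """Turn (protein, InterPro entry) hits into (protein, GO term, evidence) annotations.

    The annotations are electronic, so they carry the IEA evidence code.
    """
    annotations = set()
    for protein, entry in hits:
        for go_term in mapping.get(entry, ()):  # entries without a mapping add nothing
            annotations.add((protein, go_term, "IEA"))
    return sorted(annotations)


hits = [("prot1", "IPR_A"), ("prot2", "IPR_B"), ("prot2", "IPR_unmapped")]
for row in go_from_domain_hits(hits):
    print(row)
```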

21
CDD Conserved Domain Database
  • Contains protein domain models imported from
    Pfam, SMART, COG, KOG
  • Curated and provided at NCBI
  • Since 2003
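  • Search tool: RPS-BLAST
  • 27,036 PSSMs (position-specific scoring matrices;
    status December 2008)
  • Building a PSSM:
      • Count amino acids at each position in the multiple
        alignment
      • Convert the counts to percentages (frequencies)
      • Compute the log ratio of each frequency to the background frequency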
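The three steps for building a PSSM (count, convert to percentages, take log ratios) can be sketched in Python. Assumptions: a pseudocount of 1 per amino acid to avoid log(0), a uniform background frequency of 1/20, and the eight-residue motif windows of the PROSITE example as a toy alignment. Production matrices (e.g. those in CDD) typically add sequence weighting and more refined pseudocounts, so this shows only the principle.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Toy alignment: the motif windows of the PROSITE example sequences.
ALIGNMENT = ["RGQAFVIF", "RGVAFVRY", "KGFAFVQY", "KGVGFIRF"]


def build_pssm(alignment, pseudocount=1.0):
    """One dict per column: amino acid -> log2(observed frequency / background frequency)."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    pssm = []
    for column in zip(*alignment):
        # 1. count amino acids at this position (plus a pseudocount)
        counts = {aa: column.count(aa) + pseudocount for aa in AMINO_ACIDS}
        total = sum(counts.values())
        # 2. convert to frequencies, 3. take the log ratio against the background
        pssm.append({aa: math.log2((counts[aa] / total) / background) for aa in AMINO_ACIDS})
    return pssm


def score(seq, pssm):
    """Sum the position-specific scores of an ungapped window of the same length."""
    if len(seq) != len(pssm):
        raise ValueError("window length must equal the PSSM length")
    return sum(row[aa] for aa, row in zip(seq, pssm))


pssm = build_pssm(ALIGNMENT)
print(round(score("RGVAFVRY", pssm), 2), round(score("PPPPPPPP", pssm), 2))
```

Positive scores mark residues seen more often than expected at that position, so a motif member scores well above an unrelated window.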

22
Phylogenomics
  • Phylogenomics transfers functional annotations (e.g. Gene
    Ontology terms) within a phylogenetic tree by
    considering the evolutionary history of each
    gene.
  • Better prediction accuracy than BLAST-based
    annotation transfer
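A simplified way to transfer annotations along a tree is Fitch parsimony: a bottom-up pass collects candidate functions for each clade, and a top-down pass assigns one function per node, so an unannotated gene inherits the function inferred for its ancestor rather than that of a single best hit. This is a sketch of the general idea, not SIFTER's probabilistic model; the tree, gene names and functions are hypothetical, and ties are broken alphabetically.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf name or a pair of subtrees


def fitch_sets(tree: Tree, labels: Dict[str, str], all_labels: Set[str], sets: dict) -> Set[str]:
    """Bottom-up pass: candidate function sets for every subtree."""
    if isinstance(tree, str):
        # unannotated genes may have any of the observed functions
        result = {labels[tree]} if tree in labels else set(all_labels)
    else:
        left = fitch_sets(tree[0], labels, all_labels, sets)
        right = fitch_sets(tree[1], labels, all_labels, sets)
        result = (left & right) or (left | right)
    sets[tree] = result
    return result


def assign(tree: Tree, sets: dict, parent_state, out: Dict[str, str]) -> None:
    """Top-down pass: keep the parent's function when possible, else pick one deterministically."""
    state = parent_state if parent_state in sets[tree] else min(sets[tree])
    if isinstance(tree, str):
        out[tree] = state
    else:
        for child in tree:
            assign(child, sets, state, out)


def predict_functions(tree: Tree, labels: Dict[str, str]) -> Dict[str, str]:
    sets: dict = {}
    fitch_sets(tree, labels, set(labels.values()), sets)
    out: Dict[str, str] = {}
    assign(tree, sets, None, out)
    return out


# Hypothetical gene tree and annotations (names are placeholders).
tree = ((("geneA", "geneB"), "query"), ("geneC", "geneD"))
labels = {"geneA": "kinase", "geneB": "kinase", "geneC": "transporter", "geneD": "transporter"}
print(predict_functions(tree, labels)["query"])
```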
23
SIFTER
  • Tool developed by Barbara Engelhardt (UC
    Berkeley)
  • Prediction accuracy about 96%

24
Phylogenomics
25
(No Transcript)
26
Problems
  • Domain shuffling
  • Gene loss
  • Paralogous genes with different functions
  • Choice of cutoffs
  • Missing data (e.g. ontology terms)
  • Horizontal gene transfer

27
Summary
  • Manual functional annotation is necessary
      • For data validation
      • To prevent errors in public databases
      • To get more information
  • Different methods for searching functional sites
      • HMMs, PROSITE patterns, PSSMs
      • Enable higher sensitivity and a low false-positive
        rate
  • Problems
      • Gene loss, domain shuffling, paralogous genes,
        missing data, wrong cutoffs, horizontal gene
        transfer