Genome Annotation Continued

Provided by: RobertBr158
Learn more at: https://msu.edu

1
Genome Annotation Continued
  • This week's lab.
  • Genome annotation - web-based databases for
    assigning gene function.

2
Last week's lab
  • E-value
  • Score
  • BLASTX
  • Taxonomy
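As a reminder of how E-value and score relate, here is a minimal sketch of the Karlin-Altschul statistics behind BLAST. For a raw alignment score S, the expected number of chance hits is E = K*m*n*exp(-lambda*S), where m is the query length and n the database length; the normalized bit score S' = (lambda*S - ln K)/ln 2 gives the equivalent form E = m*n*2^(-S'). The lambda and K values below are illustrative assumptions (BLAST reports the actual values, which depend on the scoring matrix and gap penalties), and the sketch ignores the effective-length corrections BLAST applies.

```python
import math

# Illustrative values only (assumed); BLAST reports the actual lambda and K,
# which depend on the scoring matrix and gap penalties.
LAMBDA = 0.267
K = 0.041

def bit_score(raw_score, lam=LAMBDA, k=K):
    """Normalized bit score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def evalue(raw_score, m, n, lam=LAMBDA, k=K):
    """Expected number of chance hits: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)

def evalue_from_bits(bits, m, n):
    """Equivalent form using the bit score: E = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-bits)

# Hypothetical search: 300-residue query, database of 10^8 residues
m, n = 300, 10**8
for s in (50, 100, 150):
    print(f"raw score {s}: {bit_score(s):.1f} bits, E = {evalue(s, m, n):.2e}")
```

Because E is proportional to m*n, doubling the database size doubles the E-value for the same score, which is why an identical hit can look less significant when searched against a larger database.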

3
Lab
  • Sequence assembly and analysis
  • Assemble individual sequence reads
  • Phred 30 - good or bad?
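A Phred quality score Q encodes the estimated probability P that a base call is wrong, Q = -10*log10(P). A minimal sketch of the conversion in both directions:

```python
import math

def phred_to_error(q):
    """Estimated probability that a base call is wrong: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_to_phred(p):
    """Phred quality score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)

for q in (10, 20, 30, 40):
    p = phred_to_error(q)
    print(f"Q{q}: P(error) = {p:g}, about 1 error in {round(1 / p):,} bases")
```

So Phred 30 means an estimated error rate of 1 in 1,000 (99.9% accuracy); Q20 and Q30 are commonly used as quality thresholds.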

4
Linking Protein Sequence, Structure, and Function
[Figure, from the NCBI Field Guide: protein sequences are linked to conserved
functional domains in CDD, each represented by a PSSM, and to 3D domain
structures, using PSI-BLAST, RPS-BLAST, and CDART.]

5
Position-Specific Substitution Rates
[Figure: substitution rates at an active site serine compared with a weakly
conserved serine.]

6
Position-Specific Score Matrix (PSSM)
Pos Res   A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
206   D   0  -2   0   2  -4   2   4  -4  -3  -5  -4   0  -2  -6   1   0  -1  -6  -4  -1
207   G  -2  -1   0  -2  -4  -3  -3   6  -4  -5  -5   0  -2  -3  -2  -2  -1   0  -6  -5
208   V  -1   1  -3  -3  -5  -1  -2   6  -1  -4  -5   1  -5  -6  -4   0  -2  -6  -4  -2
209   I  -3   3  -3  -4  -6   0  -1  -4  -1   2  -4   6  -2  -5  -5  -3   0  -1  -4   0
210   S  -2  -5   0   8  -5  -3  -2  -1  -4  -7  -6  -4  -6  -7  -5   1  -3  -7  -5  -6
211   S   4  -4  -4  -4  -4  -1  -4  -2  -3  -3  -5  -4  -4  -5  -1   4   3  -6  -5  -3
212   C  -4  -7  -6  -7  12  -7  -7  -5  -6  -5  -5  -7  -5   0  -7  -4  -4  -5   0  -4
213   N  -2   0   2  -1  -6   7   0  -2   0  -6  -4   2   0  -2  -5  -1  -3  -3  -4  -3
214   G  -2  -3  -3  -4  -4  -4  -5   7  -4  -7  -7  -5  -4  -4  -6  -3  -5  -6  -6  -6
215   D  -5  -5  -2   9  -7  -4  -1  -5  -5  -7  -7  -4  -7  -7  -5  -4  -4  -8  -7  -7
216   S  -2  -4  -2  -4  -4  -3  -3  -3  -4  -6  -6  -3  -5  -6  -4   7  -2  -6  -5  -5
217   G  -3  -6  -4  -5  -6  -5  -6   8  -6  -8  -7  -5  -6  -7  -6  -4  -5  -6  -7  -7
218   G  -3  -6  -4  -5  -6  -5  -6   8  -6  -7  -7  -5  -6  -7  -6  -2  -4  -6  -7  -7
219   P  -2  -6  -6  -5  -6  -5  -5  -6  -6  -6  -7  -4  -6  -7   9  -4  -4  -7  -7  -6
220   L  -4  -6  -7  -7  -5  -5  -6  -7   0  -1   6  -6   1   0  -6  -6  -5  -5  -4   0
221   N  -1  -6   0  -6  -4  -4  -6  -6  -1   3   0  -5   4  -3  -6  -2  -1  -6  -1   6
222   C   0  -4  -5  -5  10  -2  -5  -5   1  -1  -1  -5   0  -1  -4  -1   0  -5   0   0
223   Q   0   1   4   2  -5   2   0   0   0  -4  -2   1   0   0   0  -1  -1  -3  -3  -4
224   A  -1  -1   1   3  -4  -1   1   4  -3  -4  -3  -1  -2  -2  -3   0  -2  -2  -2  -3
Serine is scored differently in these two positions: at position 216, the
active site nucleophile, a serine scores 7, while at position 211 a serine
scores only 4.
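To see how a PSSM scores the same residue differently at different positions, here is a minimal sketch that scores an ungapped peptide against rows 214-219 of the matrix above (the GDSGGP stretch containing the active site serine). The score of a window is the sum, over its residues, of each residue's entry in the corresponding row. A general substitution matrix such as BLOSUM62 would instead give a serine-to-serine match the same score at every position.

```python
# Rows 214-219 of the PSSM above; columns follow the table's residue order.
ALPHABET = "ARNDCQEGHILKMFPSTWYV"
PSSM_ROWS = {
    214: [-2, -3, -3, -4, -4, -4, -5, 7, -4, -7, -7, -5, -4, -4, -6, -3, -5, -6, -6, -6],
    215: [-5, -5, -2, 9, -7, -4, -1, -5, -5, -7, -7, -4, -7, -7, -5, -4, -4, -8, -7, -7],
    216: [-2, -4, -2, -4, -4, -3, -3, -3, -4, -6, -6, -3, -5, -6, -4, 7, -2, -6, -5, -5],
    217: [-3, -6, -4, -5, -6, -5, -6, 8, -6, -8, -7, -5, -6, -7, -6, -4, -5, -6, -7, -7],
    218: [-3, -6, -4, -5, -6, -5, -6, 8, -6, -7, -7, -5, -6, -7, -6, -2, -4, -6, -7, -7],
    219: [-2, -6, -6, -5, -6, -5, -5, -6, -6, -6, -7, -4, -6, -7, 9, -4, -4, -7, -7, -6],
}

def score_window(peptide, start=214):
    """Ungapped PSSM score: sum each residue's score in its position's row."""
    total = 0
    for offset, residue in enumerate(peptide):
        row = PSSM_ROWS[start + offset]
        total += row[ALPHABET.index(residue)]
    return total

print(score_window("GDSGGP"))  # consensus stretch with the active site serine
print(score_window("GDAGGP"))  # serine at position 216 replaced by alanine
```

Replacing the active site serine with alanine drops the score from 48 to 39, because row 216 scores S at 7 but A at -2.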
7
Hidden Markov Models
  • A statistical model that can be applied to any
    system that can be represented as a series of
    discrete states.
  • Applies to both protein and nucleotide sequences.
  • Can be thought of much like the PSSM that
    PSI-BLAST builds after several iterations.
  • Used in gene finding and protein profile
    analysis.
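A minimal sketch of how an HMM is decoded, assuming a toy two-state model (hypothetical "AT_rich" and "GC_rich" states with made-up probabilities) rather than any real gene finder or profile HMM. The Viterbi algorithm finds the single most probable path of hidden states for an observed sequence. Gene finders apply the same idea with states such as exon, intron, and intergenic, while profile HMMs like those behind Pfam and TIGRFAMs use match, insert, and delete states.

```python
import math

def viterbi(seq, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for seq, computed in log space."""
    # score[s]: log-probability of the best path ending in state s at the current position
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}
    backpointers = []
    for symbol in seq[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Best predecessor state for moving into s
            prev = max(states, key=lambda r: score[r] + math.log(trans_p[r][s]))
            new_score[s] = score[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][symbol])
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical toy model: states tend to persist; each favors different bases
STATES = ("AT_rich", "GC_rich")
START = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANS = {"AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
         "GC_rich": {"AT_rich": 0.1, "GC_rich": 0.9}}
EMIT = {"AT_rich": {"A": 0.35, "T": 0.35, "G": 0.15, "C": 0.15},
        "GC_rich": {"A": 0.15, "T": 0.15, "G": 0.35, "C": 0.35}}

path = viterbi("AAAAAAGGGGGG", STATES, START, TRANS, EMIT)
print(path)
```

Working in log space avoids the numerical underflow that multiplying many small probabilities would cause on long sequences.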

8
Uses of HMMs in protein function analysis.
  • TIGRFAMs strive to annotate the function of an
    entire protein.
  • Pfam strives to annotate protein domains.

9
Homologs, orthologs, and paralogs.
  • Homologous genes are genes that share a common
    evolutionary ancestor.
  • Orthologs are genes found in different organisms
    that arose from a common ancestral gene through
    speciation.
  • Paralogs are genes found in the same organism
    that arose from a common ancestral gene through
    duplication. The duplication could have occurred
    in that species or earlier, and paralogs have
    often diverged in function.

10
Orthologs may differ in function!
11
TIGRFAM
  • Curated such that proteins in a TIGRFAM should
    have the same function if they are equivalogs.
  • Member proteins are homologous over their entire
    length.
  • Equivalog family - all proteins that are
    conserved with respect to function since their
    last common ancestor.
  • Superfamily - all homologous proteins, which may
    have different biological functions.
  • Subfamily - an incomplete set of homologous
    proteins, which may have diverse biological
    functions.

12
PFAM
  • More likely to describe a protein domain than a
    whole-protein family.
  • Pfam families do not overlap one another.
  • Cross-listed on TIGRFAM pages.
  • 70% of proteins in Swiss-Prot have a Pfam match.

13
COGs
  • Clusters of Orthologous Groups.
  • Built from pairwise comparisons of proteins from
    many bacterial genomes.
  • Suggests function only (book example).
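A minimal sketch of the best-hit idea behind ortholog grouping, assuming precomputed similarity scores between two genomes (the gene names and scores are hypothetical). A reciprocal best hit (RBH) is a pair of genes that are each other's best match. The actual COG procedure is more involved (it groups consistent best hits across three or more genomes), so treat this as a simplified proxy for orthology, not the COG method.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)

# Hypothetical bit scores between genome A (a1, a2) and genome B (b1, b2, b3)
a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 90, ("a2", "b2"): 180, ("a2", "b3"): 175}
b_vs_a = {("b1", "a1"): 248, ("b2", "a2"): 182, ("b2", "a1"): 88, ("b3", "a2"): 170}

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here b3's best hit is a2, but a2 prefers b2, so b3 is left out of the RBH pairs. If b2 and b3 are paralogs, grouping by best hits alone could place genes whose functions have diverged in the same cluster, which is one reason a shared group only suggests function.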

14
Gene Ontology (GO)
  • The goal of the Gene Ontology project is to
    produce a controlled vocabulary that can be
    applied to all organisms even as knowledge of
    gene and protein roles in cells is accumulating
    and changing.
  • Three ontologies: biological process, molecular
    function, and cellular component.
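GO terms form a directed acyclic graph in which a term can have more than one parent, and an annotation to a specific term implies annotation to all of its ancestors (the "true path rule"). A minimal sketch, using a hypothetical, heavily simplified fragment of the molecular function ontology (the real graph has more intermediate terms and relationship types):

```python
# Hypothetical, simplified fragment: child term -> list of parent terms.
# Lists allow a term to have several parents, as in the real GO graph.
PARENTS = {
    "serine-type peptidase activity": ["peptidase activity"],
    "peptidase activity": ["hydrolase activity"],
    "hydrolase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
    "molecular_function": [],
}

def ancestors(term, parents=PARENTS):
    """The term itself plus every ancestor implied by an annotation to it."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

print(sorted(ancestors("serine-type peptidase activity")))
```

Because the walk keeps a `seen` set, a term reachable through several parents is visited only once.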

15
Literature Curation
  • For example, the Saccharomyces Genome Database (SGD).
  • Manual curation of the literature for
    experimental evidence linking function to
    annotation.

16
Additional databases
  • SMART - Simple Modular Architecture Research
    Tool.
  • PROSITE - Protein motifs
  • ProDom - A database based on PSI-BLAST PSSMs.
  • InterPro - A database that brings together many
    of the above databases so that you can search
    them all at once.
  • Others.
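PROSITE describes many motifs as patterns in a compact syntax: elements separated by "-", "x" for any residue, "[..]" for allowed residues, "{..}" for disallowed residues, "(n)" or "(n,m)" for repeats, and "<" / ">" for the N- and C-terminus. A minimal sketch that translates this common subset into a Python regular expression (the example patterns are hypothetical, not actual PROSITE entries, and rarer features such as ">" inside brackets are not handled):

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern such as 'G-x-S-x-G.' into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Separate a repeat count such as x(2) or x(2,4) from the residue spec
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        residues, count = match.group(1), match.group(2)
        if residues == "x":
            residues = "."
        elif residues.startswith("{"):
            residues = "[^" + residues[1:-1] + "]"
        # Single residues and [..] classes are already valid regex syntax
        if count:
            residues += "{" + count + "}"
        parts.append(prefix + residues + suffix)
    return "".join(parts)

regex = prosite_to_regex("[ST]-x(2)-{P}-G.")
print(regex)
print(re.search(regex, "MKSAAQGLL"))
```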

17
CDD
  • Conserved Domain Database - links all of this
    information together.
  • Includes domain models from SMART, Pfam, and COGs
    (KOGs).
  • Can be searched directly, and is searched
    automatically by BLAST.
  • Linked to CDART - allows the identification of
    proteins with a similar domain architecture.
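Domain architecture is the order of domains along a protein. A minimal sketch of comparing architectures, assuming hypothetical domain hits with start/end coordinates and scores: overlapping hits are resolved by keeping the higher-scoring one (loosely mirroring the rule that Pfam families do not overlap), and two proteins are compared by the fraction of domains they share. This is a simplified illustration, not CDART's actual algorithm.

```python
from typing import NamedTuple

class DomainHit(NamedTuple):
    name: str
    start: int   # first residue of the hit (1-based, inclusive)
    end: int     # last residue of the hit (inclusive)
    score: float

def architecture(hits):
    """Keep the best-scoring non-overlapping hits; return domain names N- to C-terminal."""
    chosen = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if all(hit.end < kept.start or hit.start > kept.end for kept in chosen):
            chosen.append(hit)
    return tuple(h.name for h in sorted(chosen, key=lambda h: h.start))

def shared_domain_fraction(arch_a, arch_b):
    """Jaccard similarity of two domain sets (ignores order and copy number)."""
    a, b = set(arch_a), set(arch_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical hits on one query protein; Dom_B overlaps Dom_A but scores lower
query_hits = [
    DomainHit("Dom_A", 30, 250, 180.0),
    DomainHit("Dom_B", 40, 240, 60.0),
    DomainHit("Dom_C", 300, 380, 90.0),
]
query_arch = architecture(query_hits)
print(query_arch)
print(shared_domain_fraction(query_arch, ("Dom_A", "Dom_D")))
```

A set-based comparison is the simplest choice; a closer match to "similar architecture" would also compare domain order, for example by aligning the two name tuples.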

18
Bottom line about databases
  • Databases are useful tools for assigning possible
    functions.
  • Be careful about annotations.
  • Example: proteins in the same COG can be
    orthologs that have evolved different functions.
  • Many annotations are not backed up by
    experimental data.
  • Some databases are populated automatically, and
    their entries have not been checked for accuracy.
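One practical way to act on this caution is to check the evidence code attached to each GO annotation. GO's experimental evidence codes include EXP, IDA, IPI, IMP, IGI, and IEP, while IEA marks annotations inferred electronically without curator review and ISS marks inference from sequence or structural similarity. A minimal sketch that keeps only experimentally supported annotations (the genes and annotations are hypothetical):

```python
from typing import NamedTuple

# GO experimental evidence codes (high-throughput variants omitted here)
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}

class Annotation(NamedTuple):
    gene: str
    go_term: str
    evidence: str

def experimentally_supported(annotations):
    """Keep annotations backed by a GO experimental evidence code."""
    return [a for a in annotations if a.evidence in EXPERIMENTAL_CODES]

# Hypothetical annotations for illustration only
annotations = [
    Annotation("geneA", "peptidase activity", "IDA"),  # direct assay
    Annotation("geneB", "peptidase activity", "IEA"),  # automated, not reviewed
    Annotation("geneC", "hydrolase activity", "ISS"),  # sequence similarity
]

for a in experimentally_supported(annotations):
    print(a.gene, a.go_term, a.evidence)
```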

19
Annotation cannot be guaranteed without
experimental evidence.
  • Functional genomics