Subsystem Approach to Genome Annotation

1
Subsystem Approach to Genome Annotation
  • National Microbial Pathogen Data Resource
  • www.nmpdr.org
  • Claudia Reich
  • NCSA, University of Illinois, Urbana

2
Complete Microbial Genomes
  • 464 complete microbial genomes in NCBI as of
    3-1-07
  • 691 microbial genomes in progress as of 3-1-07

3
Making Sense of Genome Data
  • Locate genes: identify ORFs automatically
    • GeneMark
    • NCBI's ORF Finder
    • Glimmer
    • Critica
  • Assign function by sequence similarity to
    experimentally characterized proteins
    • BLAST family of sequence comparison tools
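
To make "identify ORFs automatically" concrete, here is a deliberately naive sketch of an ORF scan over the three forward reading frames. It is not how GeneMark, Glimmer, or Critica work (those tools use statistical models), and the function name `find_orfs` and the `min_codons` threshold are assumptions made for illustration.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for forward-strand ORFs.

    start is the index of the ATG, end is exclusive and includes the
    stop codon. min_codons counts the codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the frame one codon at a time.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

print(find_orfs("CCATGAAATTTGGGTAACC"))  # [(2, 17, 2)]
```

A real gene finder would also scan the reverse complement and handle alternative start codons; both are left out here to keep the sketch short.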

4
Problems with Assignments by Similarity
  • When an ORF is a member of a protein family
    • Paralogous genes
    • ORFs encoding similar proteins acting on
      different substrates
  • Assignments can be transitive, and many times
    removed from experimental data

5
Other Factors Can Aid in Function Assignments
  • Molecular phylogeny
  • Paralogous and orthologous families
  • Conserved gene neighborhood
  • Metabolic context
  • Bidirectional best hit matches across multiple
    genomes
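
A bidirectional best hit between two genomes can be sketched as follows. The input format (a dictionary of similarity scores keyed by query and subject gene) and all gene identifiers are assumptions for illustration; in practice the scores would come from a sequence comparison tool such as BLAST.

```python
def best_hits(scores):
    """scores: {(query, subject): score} -> {query: best-scoring subject}.

    Ties keep the first pair encountered.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

# Hypothetical scores between genome A and genome B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b2"): 90, ("a3", "b1"): 120}
b_vs_a = {("b1", "a1"): 195, ("b2", "a1"): 140, ("b2", "a2"): 80}
print(bidirectional_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```

Here a2 points to b2, but b2 prefers a1, so that pair is not reported. Extending the check across multiple genomes means repeating it for each genome pair.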

6
Incorporating Information Other Than Similarity
  • KEGG: manually curated pathway and metabolic maps
  • GO: vocabularies that describe ORFs as associated
    with
    • biological processes
    • cellular components
    • molecular function
  • MetaCyc: experimentally elucidated metabolic
    pathways

7
What is Needed
  • A system that
    • integrates all the above concepts
    • organizes genomic data in structured idioms
    • allows high-throughput annotation of newly
      sequenced genomes
    • resolves discrepancies between different
      annotation tools
    • informs experimental research

8
Enter the SEED
  • Database and annotation environment
  • Underlies, and accessible through, NMPDR
    (www.nmpdr.org)
  • Expert annotation via subsystem building
  • Provides the most accurate genome annotations
    available
  • Argonne National Lab, University of Chicago,
    UIUC, FIG

9
What is a Subsystem?
  • Any organizing biological principle
    • metabolic pathway
      • amino acid biosynthesis, nitrogen fixation,
        glycolysis
    • complex structure
      • ribosome, flagellum
    • set of defining features
      • virulome, pathogenicity islands
    • functional concept
      • bacterial sigma factors, DNA binding proteins

10
Subsystems are
  • Sets of functional roles, which are functions, or
    abstractions of functions (such as an EC number),
    that together implement a specific biological
    process or concept
  • Created manually by expert curators
  • Experts annotate single subsystems over the
    complete collection of genomes, thus contributing
    and sharing their expertise with the scientific
    community

11
How Subsystems are Built
  • Create a subsystem for the biological concept,
    and define the functional roles
  • In one (or a few) key organisms that include the
    subsystem, find the genes and assign meaningful
    functional names
  • Project the annotations to orthologous genes
  • Expand to more genomes, creating a Populated
    Subsystem
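
The projection step can be pictured as copying a functional role from an annotated reference gene to its ortholog in another genome. This is a minimal sketch under stated assumptions: the function name, the gene identifiers, and the idea that ortholog pairs are already available (for example from bidirectional best hits) are all illustrative, not a description of the SEED's internals.

```python
def project_annotations(reference_roles, ortholog_pairs):
    """Copy roles from annotated reference genes to their orthologs.

    reference_roles: {reference_gene: functional_role}
    ortholog_pairs:  iterable of (reference_gene, target_gene)
    Returns {target_gene: functional_role}; pairs whose reference gene
    has no role are skipped.
    """
    projected = {}
    for ref_gene, target_gene in ortholog_pairs:
        role = reference_roles.get(ref_gene)
        if role is not None:
            projected[target_gene] = role
    return projected

# Hypothetical identifiers; only the role label SelA comes from the slides.
reference_roles = {"ref_0001": "SelA"}
ortholog_pairs = [("ref_0001", "new_0421"), ("ref_0002", "new_0007")]
print(project_annotations(reference_roles, ortholog_pairs))  # {'new_0421': 'SelA'}
```

In a curated setting the projected roles would still be reviewed by the subsystem's expert before being accepted.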

12
Populated Subsystems
  • Are spreadsheets where
    • Columns: functional roles
    • Rows: specific genomes
    • Cells: genes in the organism that implement the
      functional role
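
One way to picture a populated subsystem is as a nested mapping from genome to role to gene list. The layout below is an assumption for illustration, and every role, genome, and gene name in it is a made-up placeholder.

```python
# Columns are functional roles, rows are genomes, cells are gene lists.
subsystem = {
    "roles": ["RoleA", "RoleB", "RoleC"],
    "rows": {
        "Genome1": {"RoleA": ["g1_0001"], "RoleB": ["g1_0002"], "RoleC": ["g1_0417"]},
        "Genome2": {"RoleA": ["g2_0103", "g2_0950"], "RoleB": ["g2_0104"]},
    },
}

def cell(subsystem, genome, role):
    """Genes in `genome` that implement `role` (empty list if none)."""
    return subsystem["rows"].get(genome, {}).get(role, [])

def to_table(subsystem):
    """Render the spreadsheet as tab-separated text, '-' for empty cells."""
    lines = ["\t".join(["Genome"] + subsystem["roles"])]
    for genome in sorted(subsystem["rows"]):
        cells = [", ".join(cell(subsystem, genome, role)) or "-"
                 for role in subsystem["roles"]]
        lines.append("\t".join([genome] + cells))
    return "\n".join(lines)

print(to_table(subsystem))
```

An empty cell and a cell with more than one gene are both meaningful in this view: the first suggests a missing gene, the second a duplicate assignment.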

13
How to Access Subsystems
  • From the Home page (left navigation bar):
    Subsystem Summaries, then select organism
  • From Organism pages
  • From Subsystem Search
  • From protein pages to specific subsystems

14
Subsystem Pages in NMPDR
  • Table of Functional Roles
  • Subsystem diagram (if appropriate)
  • Populated subsystem spreadsheet
  • Customizable spreadsheet viewing options
  • Functional variants and subsets of roles
  • Curators' notes

15
Benefits of Subsystems
  • More accurate annotations
  • Annotation of protein families
  • Analysis of sets of functionally related proteins
  • Less error-prone automatic projections to
    novel genomes

16
Subsystems Reveal Interesting
  • Pathway variants
    • Are they clustered by phylogeny?
      • Delta subunit of RNA polymerase: only in
        Bacillales
    • Are they clustered by functional niche?
    • Horizontal gene transfer?
  • Fused genes
    • β and β′ subunits of RNA polymerase fused in
      Helicobacter
  • Fissioned genes
    • β′ subunit of RNA polymerase is fissioned in
      Cyanobacteria

17
Subsystems Reveal Interesting
  • Duplicate assignments
    • More than one gene for one functional role?
      • Alpha subunit of RNA polymerase in Magnetococcus
        and Francisella
    • Same sequenced region in more than one contig in
      partially assembled genomes?
    • Frameshifts or other sequencing errors?
    • Annotation errors?
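
Duplicate assignments are easy to flag once the spreadsheet is in machine-readable form. The sketch below assumes rows are stored as a mapping from genome to role to gene list; that layout and all names in the example data are hypothetical.

```python
def duplicate_assignments(rows):
    """Find cells holding more than one gene.

    rows: {genome: {role: [gene_ids]}}
    Returns a sorted list of (genome, role, gene_ids) tuples.
    """
    return sorted(
        (genome, role, tuple(genes))
        for genome, cells in rows.items()
        for role, genes in cells.items()
        if len(genes) > 1
    )

# Hypothetical example rows.
rows = {
    "Genome1": {"RoleA": ["g1_0001"], "RoleB": ["g1_0002"]},
    "Genome2": {"RoleA": ["g2_0103", "g2_0950"], "RoleB": ["g2_0104"]},
}
print(duplicate_assignments(rows))
# [('Genome2', 'RoleA', ('g2_0103', 'g2_0950'))]
```

The flag only says that a cell deserves a look. Deciding between a true paralog, an assembly artifact, a frameshift, or an annotation error still takes the questions listed above.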

18
Subsystems Reveal Interesting
  • Missing genes
    • Is the function essential?
    • Is the function conserved?
    • Does the missing gene cluster with homologs in
      other organisms?
    • Is the function performed by a newly recruited
      gene?
    • Has a gene been acquired by horizontal gene
      transfer and now performs that function?
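
Missing genes show up as empty cells in genomes that otherwise carry the subsystem. A minimal sketch, assuming the same hypothetical genome-to-role-to-genes layout and invented names:

```python
def missing_roles(roles, rows):
    """Report roles with no assigned gene, per genome.

    roles: list of functional roles (the spreadsheet columns)
    rows:  {genome: {role: [gene_ids]}}
    Only genomes with at least one filled and one empty role are
    reported, so genomes lacking the whole subsystem are not flagged.
    """
    report = {}
    for genome, cells in rows.items():
        gaps = [role for role in roles if not cells.get(role)]
        if gaps and len(gaps) < len(roles):
            report[genome] = gaps
    return report

# Hypothetical example data.
roles = ["RoleA", "RoleB", "RoleC"]
rows = {
    "Genome1": {"RoleA": ["g1_0001"], "RoleB": ["g1_0002"], "RoleC": ["g1_0417"]},
    "Genome2": {"RoleA": ["g2_0103"], "RoleB": ["g2_0104"]},
    "Genome3": {},
}
print(missing_roles(roles, rows))  # {'Genome2': ['RoleC']}
```

Each reported gap is a candidate for the follow-up questions above, such as looking for an unannotated gene in the conserved neighborhood.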

19
Synthesis of Selenocysteinyl-tRNA
  • Two known pathway variants
    • One step in Bacteria
      • SelA is annotated
    • Two steps in Archaea and Eucarya
      • PSTK was missing until very recently

20
Explore Selenocysteine Usage
  • Start by searching for gene name, selA, in an
    organism known to use Sec, E. coli K12
  • Start from subsystem tree: expand category of
    "Protein metabolism," then expand subcategory of
    "Selenoproteins"
  • Open "Selenocysteine metabolism" subsystem from
    protein page or SS tree
  • Genomes arranged phylogenetically
  • Roles defined on mouse-over
  • What genes are missing in which organisms?
  • Are there Sec metabolism genes present in any
    organisms that do not have proteins that need
    Sec?
  • Are there organisms known to need Sec for certain
    proteins, but that do not have a complete Sec
    biosynthesis pathway?
  • Why is there a hypothetical protein included in
    this subsystem?