A QuickStart Guide to Using PhyloFacts - PowerPoint PPT Presentation

Provided by: kimmensj

1
A Quick-Start Guide to Using PhyloFacts
  • February 21, 2008

2
Overview
  • A. Background
  • B. Browsing the library and reading PhyloFacts books
  • C. Submitting sequences for functional and structural classification
  • D. Database queries
3
Background
Retrieve paper
http://phylogenomics.berkeley.edu/phylofacts/
4
Background
Simple overview of the different web servers (how to use)
Detailed description of PhyloFacts construction and recommended use and interpretation
5
Homology-based functional annotations are fraught
with systematic error
Background
  • Gilks et al., "Modeling the percolation of annotation errors in a database of protein sequences," Bioinformatics, 2002.
  • Galperin and Koonin, 1998, "Sources of Systematic Error in Functional Annotation of Genomes," In Silico Biology.
  • Brenner, 1999, "Errors in Genome Annotation," Trends Genet.
  • Brown and Sjölander, "Functional Classification using Phylogenomic Inference," PLoS Computational Biology, 2006.
6
Structural phylogenomic inference of protein
function addresses these errors
Background
7
Phylogenomic library construction
Background
Cluster genome into global homology groups
8
Types of PhyloFacts books
Background
  • Global homology: sequences that share a common domain architecture
    • Alignable over entire length
    • Homologs retrieved using FlowerPower
  • Domain: sequences that contain a structural domain
    • Seeded using a PDB structure or SCOP domain
  • Conserved region: sequences that share a region of similarity
    • Correspondence to structure unknown
  • Motif: short regions (typically <50 aa) conserved for functional reasons
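The four book types on this slide can be written down as a small enumeration. This is only an illustrative sketch: the names `BookType` and `describe` are hypothetical and are not part of any PhyloFacts interface; the descriptions are taken from the slide text.

```python
from enum import Enum


class BookType(Enum):
    """Hypothetical labels for the four PhyloFacts book types on this slide."""
    GLOBAL_HOMOLOGY = "sequences that share a common domain architecture"
    DOMAIN = "sequences that contain a structural domain"
    CONSERVED_REGION = "sequences that share a region of similarity"
    MOTIF = "short regions conserved for functional reasons"


def describe(book_type: BookType) -> str:
    """Return a one-line description of a book type."""
    name = book_type.name.replace("_", " ").capitalize()
    return f"{name}: {book_type.value}"


for book_type in BookType:
    print(describe(book_type))
```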

9
Proteins are composed of modular structural
domains which are found in different domain
architectures
Background
Leucine-Rich Repeat (LRR)
Toll-Interleukin Receptor (TIR) domain
PhyloFacts Global Homology books include only those sequences that can be predicted to share the same domain architecture (series of structural domains). These are more suitable for predicting function.
PhyloFacts Domain books model individual domains that may be found in different domain architectures; they therefore include sequences with different overall folds and functions.
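The difference between the two book types can be sketched with a toy example. Everything here is an assumption made for illustration: the protein names are invented, "Kinase" is a made-up third domain label, and the domain labels are given up front, whereas PhyloFacts itself predicts shared architecture from sequence homology. The sketch only shows that grouping by the full ordered series of domains is stricter than grouping by the presence of one domain.

```python
from collections import defaultdict

# Hypothetical proteins, each listed with its ordered series of domains.
proteins = {
    "protA": ("LRR", "TIR"),
    "protB": ("LRR", "TIR"),
    "protC": ("TIR",),
    "protD": ("LRR", "Kinase"),
}


def global_homology_groups(proteins):
    """Group proteins that share exactly the same ordered domain architecture."""
    groups = defaultdict(list)
    for name, architecture in proteins.items():
        groups[architecture].append(name)
    return dict(groups)


def domain_group(proteins, domain):
    """Collect every protein containing `domain`, whatever its architecture."""
    return [name for name, arch in proteins.items() if domain in arch]


print(global_homology_groups(proteins))
print(domain_group(proteins, "TIR"))
```

In this toy data the TIR domain group mixes three proteins from two architectures, while each architecture group contains only proteins with the same full series of domains.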
10
KEGG Orthology Group K00002 spans five domain
architectures
Background
Group 1: Zinc-binding dehydrogenase (all cellular organisms)
ADH_N
ADH_zinc_N
Group 2: Iron-binding dehydrogenase (all cellular organisms)
Group 3: Cofactor-binding domain of zinc-binding dehydrogenase (Bacteria/Eukarya)
ADH_zinc_N
Group 4: Sequences of unknown function (Halobacterium)
ADH_zinc_N
PF02894
Group 5: Aldo-keto reductase (Bacteria/Eukarya)
11
Summary
  • Each book in PhyloFacts contains:
    • a multiple sequence alignment
    • one or more phylogenetic trees
    • hidden Markov models for each subfamily and family
    • predicted PFAM domains
    • predicted trans-membrane helices
    • predicted subfamilies
    • homologous solved 3D structures
    • predicted functional residues
    • GO annotations and evidence codes
    • UniProt definitions
    • links to literature
    • links to genome databases and other external resources
  • Graphical user interfaces to view:
    • multiple sequence alignment
    • phylogenetic tree(s)
    • 3D structure
  • PhyloFacts is an encyclopedia of protein families
    across the Tree of Life
  • The majority of PhyloFacts books represent
    proteins sharing a common domain architecture
  • The second largest fraction are based on protein
    structures and structural domains
  • Functional annotation of a sequence included in a
    PhyloFacts book is enabled by examination of the
    sequence in its evolutionary context
  • New sequences can be classified to PhyloFacts
    families and subfamilies using the Sequence
    Search page
  • Results include functional classification,
    prediction of 3D structure and detection of
    remote homologs
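As a rough mental model of what one book holds, the contents listed on this slide can be sketched as a record type. This is a hypothetical container, not the PhyloFacts schema: the class name `Book`, its field names, and the sample values (including the placeholder GO identifier) are assumptions for illustration, and only a subset of the listed contents is shown.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    """Hypothetical container mirroring part of the contents listed for one book."""
    alignment: list[str]   # rows of the multiple sequence alignment
    trees: list[str]       # one or more phylogenetic trees, e.g. Newick strings
    subfamilies: dict[str, list[str]] = field(default_factory=dict)
    pfam_domains: list[str] = field(default_factory=list)
    # Each GO annotation is paired with its evidence code.
    go_annotations: list[tuple[str, str]] = field(default_factory=list)
    structures: list[str] = field(default_factory=list)  # homologous solved 3D structures

    def __post_init__(self):
        # The slide says each book contains one or more trees.
        if not self.trees:
            raise ValueError("a book holds one or more phylogenetic trees")


# Placeholder values only; none of these come from a real PhyloFacts book.
book = Book(alignment=["MKV-LA", "MKVTLA"], trees=["(seq1,seq2);"])
book.go_annotations.append(("GO:0000000", "IEA"))
print(book)
```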