P-POD - PowerPoint PPT Presentation

About This Presentation
Slides: 21
Provided by: henkno

Transcript and Presenter's Notes

Title: P-POD


1
P-POD
  • The Princeton Protein Orthology Database
  • Literature Discussion
  • Tim Hulsen 2008-05-08

2
P-POD - Manuscript
  • The Princeton Protein Orthology Database (P-POD)
    a comparative genomics analysis tool for
    biologists
  • Heinicke S1,*, Livstone MS1,*, Lu C1,*, Oughtred
    R1,*, Kang F1, Angiuoli SV2,3, White O2, Botstein
    D1, Dolinski K1
  • PLoS ONE. 2007 Aug 22; 2(8): e766
  • PubMed ID: 17712414
  • 1 Lewis-Sigler Institute for Integrative
    Genomics, Princeton University, Princeton, New
    Jersey, United States of America
  • 2 The Institute for Genomic Research, Rockville,
    Maryland, United States of America
  • 3 Center for Bioinformatics and Computational
    Biology, University of Maryland, College Park,
    Maryland, United States of America
  • * These authors contributed equally to this work

3
P-POD - Introduction
  • Many existing biological databases provide
    comparative genomics information and tools
  • None of these combine results from multiple
    comparative genomics methods with manually
    curated information from the literature
  • → P-POD: Princeton Protein Orthology Database
  • Visualizes phylogenetic relationships among
    predicted orthologs
  • Shows the orthologs in a wider evolutionary
    context
  • Contains experimental results manually collected
    from the literature, which can be compared to the
    computational analyses
  • Links to relevant human disease and gene
    information via OMIM, model organism, and
    sequence databases

4
P-POD Ortholog methods
  • Orthology is determined using OrthoMCL
  • Can be run on multiple species at once
  • One of the better performing algorithms in terms
    of sensitivity and specificity (Alexeyenko et
    al., 2006 and Chen et al., 2007)
  • Evolutionary context is determined using Jaccard
    clustering
  • A clustering algorithm that finds related proteins
  • Produces larger groups than just orthologs
  • Manuscript in preparation
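
The slides do not describe how P-POD's Jaccard clustering is computed (the manuscript was still in preparation). As a rough illustration of the general idea only: the Jaccard coefficient of two sets is |A ∩ B| / |A ∪ B|, and proteins whose sets of similarity-search neighbors overlap strongly can be grouped together. In the sketch below the protein names, the neighbor sets, the 0.5 threshold, and the single-linkage grouping are all assumptions made for the example, not P-POD's actual implementation.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(neighbors, threshold=0.5):
    """Group proteins by single linkage: two proteins are linked when the
    Jaccard coefficient of their neighbor sets is >= threshold (assumed value)."""
    parent = {p: p for p in neighbors}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p, q in combinations(neighbors, 2):
        if jaccard(neighbors[p], neighbors[q]) >= threshold:
            parent[find(p)] = find(q)

    groups = {}
    for p in neighbors:
        groups.setdefault(find(p), set()).add(p)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical neighbor sets (e.g. proteins found by a similarity search).
hits = {
    "tubA1": {"tubA1", "tubA2", "tubB1"},
    "tubA2": {"tubA1", "tubA2", "tubB1"},
    "tubB1": {"tubA1", "tubA2", "tubB1", "tubG1"},
    "kinX": {"kinX"},
}
clusters = cluster(hits, threshold=0.5)
```

Lowering the threshold merges more distantly related proteins into one group, which matches the slide's point that this kind of clustering yields larger groups than strict orthology.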

5
P-POD Covered species
  • P-POD contains 8 species
  • Plasmodium falciparum
  • Homo sapiens
  • Drosophila melanogaster
  • Mus musculus
  • Arabidopsis thaliana
  • Caenorhabditis elegans
  • Danio rerio
  • Saccharomyces cerevisiae
  • → Most widely studied organisms, from a wide
    evolutionary range

6
P-POD Source Species Databases

7
P-POD Supported identifiers
Organism         Source database  Valid gene/protein identifier(s)  Examples
P. falciparum    PlasmoDB         PlasmoDB ID                       PF08_0034
H. sapiens       ENSEMBL          ENSEMBL peptide ID, peptide name  ENSP00000266970, CDK2
D. melanogaster  FlyBase          FlyBase ID                        CG17520-PA, CkIIalpha-PA
M. musculus      ENSEMBL          ENSEMBL peptide ID                ENSMUSP00000068896
A. thaliana      TAIR             TAIR identifier or gene name      AT1G25490.1, PAB4
C. elegans       WormBase         WormBase identifier or gene name  C09G4.1, dbr-1
D. rerio         ENSEMBL          ENSEMBL peptide ID, ZFIN ID       ENSDARP00000007117, ZDB-GENE-040808-60
S. cerevisiae    SGD              ORF name or gene name             YNL098C, DPM1
OMIM IDs
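
To make the identifier table concrete, here is a minimal sketch that guesses the source organism from the shape of a systematic identifier. The regular expressions are assumptions inferred only from the example identifiers listed above; they are not an official grammar from any of the source databases, and the function name `guess_organism` is hypothetical. Plain gene names such as CDK2 or DPM1 carry no organism information and return None.

```python
import re

# Patterns inferred from the example identifiers on the slide (assumptions).
# Order matters: the loose C. elegans pattern is tried last.
PATTERNS = [
    (re.compile(r"ENSP\d{11}"), "H. sapiens"),
    (re.compile(r"ENSMUSP\d{11}"), "M. musculus"),
    (re.compile(r"ENSDARP\d{11}"), "D. rerio"),
    (re.compile(r"ZDB-GENE-\d{6}-\d+"), "D. rerio"),
    (re.compile(r"PF\d{2}_\d{4}"), "P. falciparum"),
    (re.compile(r"CG\d+-P[A-Z]"), "D. melanogaster"),
    (re.compile(r"AT[1-5CM]G\d{5}(\.\d+)?"), "A. thaliana"),
    (re.compile(r"Y[A-P][LR]\d{3}[CW](-[A-Z])?"), "S. cerevisiae"),
    (re.compile(r"[A-Z0-9]+\.\d+[a-z]?"), "C. elegans"),
]


def guess_organism(identifier):
    """Return the organism whose pattern fully matches, or None if unknown."""
    for pattern, organism in PATTERNS:
        if pattern.fullmatch(identifier):
            return organism
    return None


examples = ["PF08_0034", "ENSP00000266970", "CG17520-PA", "ENSMUSP00000068896",
            "AT1G25490.1", "C09G4.1", "ZDB-GENE-040808-60", "YNL098C", "CDK2"]
guesses = {ident: guess_organism(ident) for ident in examples}
```

A real search form would more likely look the identifier up in the database than rely on its shape; the sketch only shows why the listed identifier types are distinguishable from one another.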
8
P-POD Orthology and clustering numbers
  • 25,271 OrthoMCL families
  • 15,050 Jaccard Clustering families
  • 165,970 proteins (154,736 OrthoMCL and 152,799
    Jaccard)
  • 984 families containing proteins in all species
    (omnipresent)
  • 112 families with exactly one protein in each of
    the 8 species; these are involved in core
    biological processes, such as:
  • Translation
  • Transport
  • Cell cycle regulation
  • Cytoskeleton organization
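
The two family categories above ("omnipresent" and exactly one protein per species) can be sketched as a simple count over family membership. The data layout (a dict mapping a family ID to a list of (protein, species) pairs), the family IDs, and the protein names are hypothetical; only the eight species names come from the slides.

```python
from collections import Counter

SPECIES = [
    "P. falciparum", "H. sapiens", "D. melanogaster", "M. musculus",
    "A. thaliana", "C. elegans", "D. rerio", "S. cerevisiae",
]


def classify(families):
    """Return (omnipresent, one_to_one) lists of family IDs.

    omnipresent: at least one protein from every species.
    one_to_one:  exactly one protein from every species.
    """
    omnipresent, one_to_one = [], []
    for family_id, members in families.items():
        counts = Counter(species for _, species in members)
        if all(counts[s] >= 1 for s in SPECIES):
            omnipresent.append(family_id)
            if all(counts[s] == 1 for s in SPECIES):
                one_to_one.append(family_id)
    return omnipresent, one_to_one


# Toy families with made-up protein names.
toy = {
    "fam1": [(f"p{i}", s) for i, s in enumerate(SPECIES)],
    "fam2": [(f"q{i}", s) for i, s in enumerate(SPECIES)] + [("q_extra", "H. sapiens")],
    "fam3": [("r0", "H. sapiens"), ("r1", "M. musculus")],
}
omni, single = classify(toy)
```

Every one-to-one family is also omnipresent, which is why the 112 families on the slide are a subset of the 984.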

9
P-POD Proteins in families, and orphans
  • Relatively low percentages of orphans (<13%,
    except for S. cerevisiae and P. falciparum)
  • These numbers confirm the high conservation of
    proteins across eukaryotes, with the notable
    exception of the Plasmodium outlier
  • The complete yeast protein set was used, including
    800 ORFs flagged as Dubious by SGD. If these are
    excluded, the percentage of orphans drops to 20%

10
P-POD Compared to other orthology databases
[Comparison table not preserved in this transcript; only the "Tot." column header survives]

11
P-POD - Pipeline
12
P-POD Pipeline Components
4 Li L, Stoeckert CJ Jr, Roos DS (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13: 2178-2189
5 Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673-4680
29 Samuel Lattimore B, van Dongen S, Crabbe MJ (2005) GeneMCL in microarray analysis. Comput Biol Chem 29: 354-359

13
P-POD The Database
  • P-POD is built on the Generic Model Organism
    Database (GMOD) package, running on PostgreSQL
  • GMOD is the Generic Model Organism Database
    project, a collection of open source software
    tools for creating and managing genome-scale
    biological databases. You can use it to create a
    small laboratory database of genome annotations,
    or a large web-accessible community database.
    GMOD tools are in use at many large and small
    community databases
  • Other popular GMOD tools are Apollo (genome
    annotation editor), GBrowse (genome annotation
    viewer), CMap (comparative map viewer), Sybil
    (comparative genome viewer), Chado (biological
    database schema) and BioMart (data mining system)

14
P-POD - Web Interface (1)
  • The web interface allows users to search and
    browse the data in several ways
  • Results can be queried by various peptide
    identifiers or gene names
  • Searches generate result pages that contain
  • a hyperlinked phylogenetic tree of predicted
    orthologs generated by OrthoMCL or of more
    distantly-related proteins generated by Jaccard
    clustering
  • a list of diseases and genes associated with the
    human ortholog(s) as documented in OMIM
  • a manually curated list of papers with
    cross-complementation experiments involving the
    yeast ortholog(s), from SGD database
  • a downloadable ClustalW alignment of family
    members
  • Web address: http://ortholog.princeton.edu

15
P-POD - Web Interface (2)
[Figure not preserved; labels: INPUT, OrthoMCL, OMIM, SGD Lit., CLUSTALW]

16
P-POD - Web Interface (3)
[Figure not preserved; labels: SGD Lit., CLUSTALW, JACCARD]

17
P-POD Comparison of methods
  • Orthology/clustering methods OrthoMCL and Jaccard
    can be compared using P-POD
  • Jaccard is far more inclusive than OrthoMCL
  • Shown at the right: the OrthoMCL family of the
    alpha tubulins. It contains only the alpha
    tubulins, while the Jaccard family contains the
    alpha, beta, and gamma tubulins

18
P-POD Discussion (1)
  • P-POD shows direct orthology (OrthoMCL) and
    broader evolutionary clustering (Jaccard)
  • P-POD uses a generic, modular database schema
    (GMOD) in combination with a freely available
    database system (PostgreSQL)
  • P-POD provides experimental evidence of
    conservation curated from the primary literature
  • Three sets of users
  • Molecular biologists who query the database over
    the web to browse orthology data for their
    favorite proteins
  • Model organism database developers, who can
    quickly provide comparative genomics tools for
    their species of interest by implementing our
    system
  • Computational biologists who are developing novel
    comparative genomics algorithms will find the
    curated information and computational data from
    other methods extremely useful in assessing their
    approach

19
P-POD Discussion (2)
  • P-POD can be downloaded in its entirety for
    installation on one's own system
  • Software developers can use the P-POD database
    infrastructure when developing their own
    comparative genomics resources and database tools

20
P-POD Future plans
  • Provide regular updates to the data contained
    within the database
  • Add new features to the web interface
  • Expand the amount of data stored within the
    database
  • Provide curated literature describing
    experimental confirmation of orthology
  • Include literature from species other than
    S. cerevisiae
  • As more refined methods for automatic detection
    of orthology are developed, they can be
    incorporated into the P-POD tool, taking
    advantage of the modular design scheme