Title: Bio-Trac 25 (Proteomics: Principles and Methods)
1Tutorial Bioinformatics Resources
(http://pir.georgetown.edu/pirwww/workshop/bioinfo_resource.html)
- Bio-Trac 25 (Proteomics Principles and Methods)
- October 5, 2007
- Zhang-Zhi Hu, M.D.
- Research Associate Professor
- Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology
- Georgetown University Medical Center
2What is Bioinformatics?
computer (information) + mouse (biology) = bioinformatics
- NIH Biomedical Information Science and Technology
Initiative (BISTI) Working Definition (2000):
Research, development, or application of
computational tools and approaches for expanding
the use of biological, medical, behavioral or
health data, including those to acquire, store,
organize, archive, analyze, or visualize such
data.
3Molecular Biology Database Collection
(http://nar.oxfordjournals.org/cgi/content/full/35/suppl_1/D3/DC1)
4Database Collection in Nucleic Acids Res.
52007
Online Access to Database Collection
http://pir.georgetown.edu/pirwww/workshop/2005_database_update.html
http://www.oxfordjournals.org/nar/database/cap/
6Overview
Database Contents, Search and Retrieval
- Text search / Information retrieval
- Sequence and genomics databases
- Protein family databases
- Database of protein functions
- Databases of protein structures
- Proteomics databases
7Entrez Text Searches
(http://www.ncbi.nlm.nih.gov/Entrez/)
Lab
8PubMed Literature Database
(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=Search&DB=PubMed)
Lab
9iProLINK Protein Literature Mining Resource
Text mining for protein phosphorylation
Gene/protein name thesaurus: synonyms, ambiguous names
http://pir.georgetown.edu/iprolink/
Lab
10BioThesaurus: Gene/protein name searches (synonyms, ambiguous names)
Synonyms: CRYAA; crystallin, alpha A; CRYA1; HSPB4
http://pir.georgetown.edu/iprolink/biothesaurus
Lab
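A synonym search of the kind described on this slide can be pictured as a lookup from any one name to the full group of names for the same entry. The sketch below is only an illustration in Python and is not how BioThesaurus is implemented. The synonym group is the one shown on the slide; the label "entry_1" and the function names are made up for the example. A name counts as ambiguous here when it maps to more than one entry.

```python
from collections import defaultdict

# Synonym group copied from the slide; "entry_1" is a hypothetical label.
SYNONYM_GROUPS = {
    "entry_1": ["CRYAA", "crystallin, alpha A", "CRYA1", "HSPB4"],
}

def build_index(groups):
    """Map each lower-cased name to the set of entries that use it."""
    index = defaultdict(set)
    for entry_id, names in groups.items():
        for name in names:
            index[name.lower()].add(entry_id)
    return index

def lookup(name, groups, index):
    """Return (other names for the same entries, is_ambiguous)."""
    entries = index.get(name.lower(), set())
    synonyms = sorted(
        {n for e in entries for n in groups[e] if n.lower() != name.lower()}
    )
    # A name shared by several entries is treated as ambiguous.
    return synonyms, len(entries) > 1

INDEX = build_index(SYNONYM_GROUPS)
synonyms, ambiguous = lookup("HSPB4", SYNONYM_GROUPS, INDEX)
print(synonyms, ambiguous)
```

Matching is case-insensitive, so "hspb4" and "HSPB4" return the same group.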
11RLIMS-P: Text mining for protein phosphorylation
http://pir.georgetown.edu/iprolink/rlimsp/
Lab
12UniProt Text Search
(http://www.pir.uniprot.org/cgi-bin/textSearch)
Google-type search vs. Boolean searches (AND, OR, NOT)
Lab
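The three Boolean operators map directly onto set operations on the records that match each term. The sketch below is a toy illustration in Python, not the UniProt search engine. The record IDs and texts are invented for the example, and matching is by whole word only.

```python
# Toy records; IDs and texts are illustrative, not real database entries.
RECORDS = {
    "rec1": "alpha crystallin A chain",
    "rec2": "delta crystallin argininosuccinate lyase",
    "rec3": "argininosuccinate lyase",
}

def matching(term, records):
    """IDs of records whose text contains the term as a whole word."""
    term = term.lower()
    return {rid for rid, text in records.items() if term in text.lower().split()}

crystallin = matching("crystallin", RECORDS)
lyase = matching("lyase", RECORDS)

both = crystallin & lyase             # crystallin AND lyase
either = crystallin | lyase           # crystallin OR lyase
only_crystallin = crystallin - lyase  # crystallin NOT lyase

print(sorted(both), sorted(either), sorted(only_crystallin))
```

AND narrows the result, OR widens it, and NOT removes the records that match the second term.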
13PIR Text Search (I)
(http://pir.georgetown.edu/pirwww/search/textsearch.html)
Search: which alpha crystallin A chains are classified in protein families?
Search for synonyms
Lab
14PIR Text Search (II)
Search: which crystallins are enzymes, and which families do they belong to?
Can you find which crystallins have a determined 3D structure?
Lab
15I. Sequence and Genomics Databases
- GenBank: An annotated collection of all publicly available nucleotide and protein sequences.
- RefSeq: NCBI non-redundant set of reference sequences, including genomic DNA, transcript (RNA), and protein products.
- UniProt Consortium Database: Universal Protein Resource, a central repository of protein sequence and function.
- Entrez Gene: Gene-centered information at NCBI.
- UniGene: Unified clusters of ESTs and full-length mRNA sequences.
- OMIM: Online Mendelian Inheritance in Man, a catalog of human genetic and genomic disorders.
- Model Organism Genome Databases: MGD, RGD, SGD, FlyBase.
- GeneCards: Integrated database of human genes, maps, proteins and diseases.
- SNP Consortium Database, International HapMap Project: Genes associated with human disease.
(http://www.oxfordjournals.org/nar/database/cap/)
16UniProt Consortium Databases
Universal Protein Resource
http://beta.uniprot.org/
(http://www.uniprot.org)
17UniProt Sequence Report (I)
UniProtKB
What's the difference between CRYAA_RABIT and CYRBAA?
(http://www.pir.uniprot.org/cgi-bin/unipEntry?id=CRYAA_RABIT)
Lab
18UniProt Report (II): UniRef100 and UniRef90
UniRef100
(http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef100_P02489)
UniRef90
(http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef90_P02489)
19Entrez Gene: Gene-centric information
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?dbgenecmdRetrievedoptGraphicslist_uids12954ubor0_RefSeq
20OMIM: Online Mendelian Inheritance in Man
(http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=123580)
21II. Protein Family Databases
- Whole Proteins
- PIRSF: Network Classification Based on Evolutionary Relationship of Whole Proteins
- COG (Clusters of Orthologous Groups) of Complete Genomes
- PANTHER: Proteins Classified into Families/Subfamilies of Shared Function
- ProtoNet: Automated Hierarchical Classification of Proteins
- Protein Domains
- Pfam: Alignments and HMM Models of Protein Domains
- SMART: Protein Domain Families
- CDD: Conserved Domain Database
- Protein Motifs
- PROSITE: Protein Patterns and Profiles
- BLOCKS: Protein Sequence Motifs and Alignments
- PRINTS: Compendium of Protein Fingerprints (a group of conserved motifs)
- Integrated Family Databases
- InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF, SUPERFAMILY
22Protein Clustering
Initial version: COGs (http://www.ncbi.nlm.nih.gov/COG/)
New version: Includes Eukaryotic Clusters (KOGs)
23PIRSF Full Length Classification iProClass
Family Report
Lab
(http://pir.georgetown.edu/cgi-bin/ipcSF?id=SF002280)
24Domain Classification Pfam Domain
(http://www.sanger.ac.uk/cgi-bin/Pfam/swisspfamget.pl?name=CRYAA_RABIT)
(http://pir.georgetown.edu/cgi-bin/ipcEntry?id=P02493)
25Pfam Domain
(http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00525)
26Protein Motifs: PROSITE
A database of protein families and domains. It consists of biologically significant sites, patterns and profiles.
(http://us.expasy.org/prosite/)
27Integrated Family Classification
- InterPro
- An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, and PIRSF.
(http://www.ebi.ac.uk/interpro/search.html)
Mapping of families
28III. Databases of Protein Functions
- Metabolic Pathways, Enzymes, and Compounds
- Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
- KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
- LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
- EcoCyc: Encyclopedia of E. coli Genes and Metabolism
- MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
- BRENDA: Enzyme Database
- UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
- Inter-Molecular Interactions and Regulatory Pathways
- IntAct: Protein interaction data from literature and user submission
- BIND: Descriptions of interactions, molecular complexes and pathways
- DIP: Catalogs experimentally determined interactions between proteins
- Reactome: A curated knowledgebase of biological pathways
- BioCarta: Biological pathways of human and mouse
- GO: Gene Ontology Consortium Database
- Pathway Resources: Pathguide
29Biological Pathway Resource Collection
http://www.pathguide.org/
- Protein-protein interactions
- Metabolic pathways
- Signaling pathways
- Pathway diagrams
- Transcription factors / gene regulatory networks
- Protein-compound interactions
- Genetic interaction networks
30KEGG: Metabolic and Regulatory Pathways
Lab
- KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html)
(http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00220+4.3.2.1)
31BioCyc: EcoCyc/MetaCyc Metabolic Pathways
- The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)
32BioCarta: Cellular Pathways
(http://www.biocarta.com/index.asp)
33Reactome (http://www.reactome.org/)
- Collaboration of CSHL, EBI and GO Consortium
- Curated resource of core pathways and reactions in human biology
- Authored by biological researchers who are experts in their fields
- Cross-referenced with NCBI, Ensembl, UniProt, HapMap and KEGG
- Inferred orthologous events in 22 non-human species (e.g., mouse, rat)
34Transforming Growth Factor (TGF) beta signaling
Homo sapiens
(http://reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=170834)
Reactome events and objects (including modified forms and complexes)
Event -> REACT_6879.1 Activated type I receptor phosphorylates R-SMAD directly (Homo sapiens)
Object -> REACT_7364.1 Phospho-R-SMAD (cytosol)
Event -> REACT_6760.1 Phospho-R-SMAD forms a complex with CO-SMAD (Homo sapiens)
Object -> REACT_7344.1 Phospho-R-SMAD:CO-SMAD complex (cytosol)
Event -> REACT_6726.1 The phospho-R-SMAD:CO-SMAD transfers to the nucleus
Object -> REACT_7382.2 Phospho-R-SMAD:CO-SMAD complex (nucleoplasm)
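The event and object listing above alternates between a reaction (Event) and the molecule or complex it produces (Object). One way to picture that structure is as an ordered list of typed steps. The sketch below is a hypothetical Python representation, not the Reactome data model. The identifiers and names are transcribed from the slide, the ":" separator in complex names is an assumed reading of the slide text, and the class and function names are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    kind: str        # "Event" or "Object"
    identifier: str  # REACT_ identifier as printed on the slide
    name: str        # ":" in complex names is an assumed reading

PATHWAY = [
    Step("Event", "REACT_6879.1",
         "Activated type I receptor phosphorylates R-SMAD directly"),
    Step("Object", "REACT_7364.1", "Phospho-R-SMAD (cytosol)"),
    Step("Event", "REACT_6760.1",
         "Phospho-R-SMAD forms a complex with CO-SMAD"),
    Step("Object", "REACT_7344.1", "Phospho-R-SMAD:CO-SMAD complex (cytosol)"),
    Step("Event", "REACT_6726.1",
         "The phospho-R-SMAD:CO-SMAD transfers to the nucleus"),
    Step("Object", "REACT_7382.2",
         "Phospho-R-SMAD:CO-SMAD complex (nucleoplasm)"),
]

def strictly_alternates(steps):
    """True if no two neighbouring steps share the same kind."""
    return all(a.kind != b.kind for a, b in zip(steps, steps[1:]))

def event_products(steps):
    """Pair each event with the object listed directly after it."""
    return [
        (a.identifier, b.identifier)
        for a, b in zip(steps, steps[1:])
        if a.kind == "Event" and b.kind == "Object"
    ]

for step in PATHWAY:
    print(f"{step.kind:<6} {step.identifier:<13} {step.name}")
```

Reading the pairs in order follows R-SMAD from phosphorylation in the cytosol, through complex formation with CO-SMAD, to the complex in the nucleoplasm.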
35Protein-Protein Interaction Database - IntAct
(http://www.ebi.ac.uk/intact/)
36Gene Ontology (GO)
(http://www.geneontology.org/)
- Molecular Function
- Biological Process
- Cellular Component
37IV. Databases of Protein Structures
- Protein Structure
- PDB: Structure Determined by X-ray Crystallography and NMR
- PDBsum: Summaries and analyses of PDB structures
- MMDB: NCBI's database of 3D structures, part of NCBI Entrez
- SWISS-MODEL Repository: Database of annotated protein 3D models
- ModBase: Annotated comparative protein structure models
- Structure Classification
- CATH: Hierarchical Classification of Protein Domain Structures
- SCOP: Familial and Structural Protein Relationships
- FSSP: Protein Fold Classification Based on Structure-Structure Alignment
38PDB Experimental 3D Structure Repository
Rat gamma-crystallin (chains A and B)
Can you do a text search at PIR to find this (CRGE_RAT)?
(http://www.rcsb.org/pdb/)
Lab
39PDBsum
Pictorial database providing summaries and analyses of PDB entries
Search
3-D structure summary
2-D structure
(http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/)
40Protein Structural Classification (1)
CATH: Hierarchical domain classification of protein structures
(http://www.cathdb.info/latest/index.html)
41Protein Structural Classification (2)
SCOP: Comprehensive description of structural and evolutionary relationships between all proteins whose structure is known.
(http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html)
42SWISS-MODEL Repository
A database of annotated three-dimensional
comparative protein structure models
(http://swissmodel.expasy.org/repository/smr.php?sptr_ac=CRGE_RAT&job=2)
43V. Proteomic Resources
- GELBANK (http://gelbank.anl.gov): 2D-gel patterns of species with completed genomes.
- SWISS-2DPAGE (http://www.expasy.org/ch2d/): Index of 2D-gels.
- PEP (http://cubic.bioc.columbia.edu/~pep/): Predictions for Entire Proteomes, summarized analyses of protein sequences.
- Integr8 (http://www.ebi.ac.uk/integr8/): A browser for information relating to completed genomes and proteomes, based on data contained in Genome Reviews and the UniProt proteome sets.
- PRIDE (http://www.ebi.ac.uk/pride/): PRoteomics IDEntifications database. Expression Profiling databases.
- GPMdb (http://gpmdb.thegpm.org/): Mass Spec Proteomics Databases.
442D-Gel Image Databases
Lab
(http://us.expasy.org/ch2d/)
Part of WORLD-2DPAGE, an index to 2-D PAGE databases and services
(http://us.expasy.org/swiss-2dpage/ac=P02489)
45GPMdb: MS Data Search
(http://gpmdb.thegpm.org/)
Craig et al., J Proteome Res. 2004, 3:1234-42.
46PRIDE: Centralized, standards-compliant, public data repository for proteomics data
http://www.ebi.ac.uk/pride/
47Lab
- Text search / Information retrieval
- Literature search and text mining
- Finding synonyms (BioThesaurus)
- Information extraction (e.g., protein phosphorylation sites)
- Find the sequence for the rabbit alpha crystallin A chain
- Find all alpha crystallin A chains classified in protein families
- Search for crystallins that have enzyme activity
- Find crystallins that have determined 3D structures
- Database contents (reports)
- Sequence and genomics databases (UniProt)
- Protein family databases (PIRSF)
- Database of protein functions (KEGG)
- Databases of protein structures (PDB)
- Proteomics databases (Swiss-2D)
- Protein Examples
- Rabbit alpha crystallin A (UniProtKB CRYAA_RABIT/P02493)
- Delta crystallin II (Argininosuccinate lyase) (UniProtKB ARLY2_ANAPL/P24058)
- Any additional proteins of your interest for search and retrieval