Title: BioTrac 25 Proteomics: Principles and Methods
1 Tutorial: Bioinformatics Resources
(http://pir.georgetown.edu/pirwww/workshop/bioinfo_resource.html)
- Bio-Trac 25 (Proteomics: Principles and Methods)
- October 6, 2006
- Zhang-Zhi Hu, M.D.
- Research Associate Professor
- Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology
- Georgetown University Medical Center
2 What is Bioinformatics?
computer (information) + mouse (biology) = bioinformatics
- NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2000): Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
3 Molecular Biology Database Collection
(http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D3/DC1)
4 Database Collection in Nucleic Acids Res.
5 2006
Online Access to Database Collection
http://pir.georgetown.edu/pirwww/workshop/2005_database_update.html
http://www.oxfordjournals.org/nar/database/cap/
6 Overview
Database Contents, Search and Retrieval
- Text search / Information retrieval
- Sequence and genomics databases
- Protein family databases
- Databases of protein functions
- Databases of protein structures
- Proteomics databases
7 Entrez Text Searches
(http://www.ncbi.nlm.nih.gov/Entrez/)
8 PubMed Literature Database
(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=Search&DB=PubMed)
9 UniProt Text Search
(http://www.pir.uniprot.org/cgi-bin/textSearch)
Google-type search vs. Boolean searches (AND, OR, NOT)
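The Boolean operators named on this slide correspond to set operations over the records that match each term. The following is a minimal sketch of that idea only, not how the UniProt or PIR search engines are implemented. The record texts and all function names are made up for illustration.

```python
import re

# Hypothetical mini-collection; the texts are illustrative, not real database records.
RECORDS = {
    1: "alpha crystallin A chain",
    2: "delta crystallin II argininosuccinate lyase",
    3: "gamma crystallin E",
    4: "argininosuccinate lyase",
}


def tokens(text):
    """Return the set of lower-cased words in a record."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(records):
    """Map each word to the set of record IDs that contain it."""
    index = {}
    for rec_id, text in records.items():
        for word in tokens(text):
            index.setdefault(word, set()).add(rec_id)
    return index


INDEX = build_index(RECORDS)
ALL_IDS = set(RECORDS)


def term(word):
    """Record IDs matching a single search term."""
    return INDEX.get(word.lower(), set())


def bool_and(a, b):
    return a & b          # AND is set intersection


def bool_or(a, b):
    return a | b          # OR is set union


def bool_not(a):
    return ALL_IDS - a    # NOT is the complement within the collection


crystallin_and_lyase = bool_and(term("crystallin"), term("lyase"))
crystallin_not_lyase = bool_and(term("crystallin"), bool_not(term("lyase")))
alpha_or_gamma = bool_or(term("alpha"), term("gamma"))
```

A Google-type search, by contrast, typically treats the whole query as a bag of words and ranks the results, so it returns an ordered list where a Boolean query returns an exact set.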
10 PIR Text Search (I)
(http://pir.georgetown.edu/pirwww/search/textsearch.html)
Search: alpha crystallin A chain and protein family?
Search for synonyms
11 PIR Text Search (II)
Search: which crystallins are enzymes?
Can you find which crystallin has a determined 3D structure?
12 I. Sequence and Genomics Databases
- GenBank: An annotated collection of all publicly available nucleotide and protein sequences.
- RefSeq: NCBI non-redundant set of reference sequences, including genomic DNA, transcript (RNA), and protein products.
- UniProt Consortium Database: Universal protein resource, a central repository of protein sequence and function.
- Entrez Gene: Gene-centered information at NCBI.
- UniGene: Unified clusters of ESTs and full-length mRNA sequences.
- OMIM: Online Mendelian Inheritance in Man, a catalog of human genetic and genomic disorders.
- Model Organism Genome Databases: MGD, RGD, SGD, FlyBase
- GeneCards: Integrated database of human genes, maps, proteins and diseases.
- SNP Consortium Database
- International HapMap Project: Genes associated with human disease
13 UniProt Consortium Databases
Universal Protein Resource
(http://www.uniprot.org)
UniProtKB, UniRef, UniParc
14 UniProt Sequence Report (I)
UniProtKB
What's the difference between CRYAA_RABIT and CYRBAA?
(http://www.pir.uniprot.org/cgi-bin/unipEntry?id=CRYAA_RABIT)
15 UniProt Sequence Report (II)
UniRef100
(http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef100_P02489)
UniRef90
(http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef90_P02489)
16 Entrez Gene: Gene-centric information
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?dbgenecmdRetrievedoptGraphicslist_uids12954ubor0_RefSeq
17 OMIM: Online Mendelian Inheritance in Man
(http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=123580)
18 II. Protein Family Databases
- Whole Proteins
  - PIRSF: A Network Classification System of Protein Families
  - COG (Clusters of Orthologous Groups) of Complete Genomes
  - ProtoNet: Automated Hierarchical Classification of Proteins
- Protein Domains
  - Pfam: Alignments and HMM Models of Protein Domains
  - SMART: Protein Domain Families
  - CDD: Conserved Domain Database
- Protein Motifs
  - PROSITE: Protein Patterns and Profiles
  - BLOCKS: Protein Sequence Motifs and Alignments
  - PRINTS: Protein Sequence Motifs and Signatures
- Integrated Family Databases
  - iProClass: Superfamilies/Families, Domains, Motifs, Rich Links
  - InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF, SUPERFAMILY
19 Protein Clustering
Initial version
COGs (http://www.ncbi.nlm.nih.gov/COG/)
New version: includes eukaryotic clusters
20 Domain Classification
(http://www.sanger.ac.uk/cgi-bin/Pfam/swisspfamget.pl?name=CRYAA_RABIT)
(http://pir.georgetown.edu/cgi-bin/ipcEntry?id=CRYAA_RABIT)
21 Pfam Domain
(http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00525)
22 Integrated Family Classification
- InterPro
- An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, and PIRSF. (http://www.ebi.ac.uk/interpro/search.html)
Mapping of families
23 PIRSF Full-Length Classification: iProClass Family Report
(http://pir.georgetown.edu/cgi-bin/ipcSF?id=SF002280)
24 Protein Motifs: PROSITE
A database of protein families and domains. It consists of biologically significant sites, patterns and profiles.
(http://us.expasy.org/prosite/)
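PROSITE patterns are written in a compact syntax that maps closely onto regular expressions: `x` is any residue, `[ST]` lists allowed residues, `{P}` lists forbidden residues, and `(n)` or `(n,m)` gives a repeat count. The sketch below translates a pattern into a Python regular expression and scans a sequence with it. The function names are ours, the test sequence is made up, and edge cases such as anchors inside brackets are not handled. The example pattern `N-{P}-[ST]-{P}` is the N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regex string."""
    pattern = pattern.rstrip(".")                 # patterns end with a period
    parts = []
    for element in pattern.split("-"):            # elements are separated by '-'
        element = element.replace("{", "[^").replace("}", "]")  # forbidden residues
        element = element.replace("(", "{").replace(")", "}")   # repeat counts
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # terminal anchors
        parts.append(element)
    return "".join(parts)


def find_sites(pattern, sequence):
    """Return (start, matched_text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


n_glyc = "N-{P}-[ST]-{P}."
print(prosite_to_regex(n_glyc))
print(find_sites(n_glyc, "AANGSAA"))   # made-up test sequence
```

Because `re.finditer` reports non-overlapping matches only, a production scanner would need a lookahead or a manual scan to report overlapping sites.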
25 III. Databases of Protein Functions
- Metabolic Pathways, Enzymes, and Compounds
  - Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
  - KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
  - LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
  - EcoCyc: Encyclopedia of E. coli Genes and Metabolism
  - MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
  - BRENDA: Enzyme Database
  - UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
- Inter-Molecular Interactions and Regulatory Pathways
  - IntAct: Protein interaction data from literature and user submission
  - BIND: Descriptions of interactions, molecular complexes and pathways
  - DIP: Catalogs experimentally determined interactions between proteins
  - Reactome: A curated knowledgebase of biological pathways
  - BioCarta: Biological pathways of human and mouse
- GO: Gene Ontology Consortium Database
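The enzyme classification listed above assigns each enzyme a four-field EC number whose fields run from the broad class down to the individual enzyme. A minimal sketch of how such a number can be unpacked is shown below, using EC 4.3.2.1 (argininosuccinate lyase) as the example. The function names are ours, and the class table is typed in by hand and not read from any database.

```python
# Top-level EC classes. Class 7 was added by IUBMB in 2018,
# after these slides were written.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def ec_lineage(ec_number):
    """Return the hierarchy prefixes of an EC number, broadest first.

    A '-' field marks an unspecified level (e.g. '4.3.-.-') and ends the lineage.
    """
    fields = ec_number.strip().split(".")
    if len(fields) != 4:
        raise ValueError("an EC number has exactly four fields: %r" % ec_number)
    lineage = []
    for depth, field in enumerate(fields, start=1):
        if field == "-":
            break
        lineage.append(".".join(fields[:depth]))
    return lineage


def ec_class_name(ec_number):
    """Name of the top-level class for an EC number."""
    return EC_CLASSES[int(ec_number.split(".")[0])]


print(ec_class_name("4.3.2.1"))
print(ec_lineage("4.3.2.1"))
```

Grouping proteins by a lineage prefix such as `4.3` is one simple way to collect related enzymes when browsing the pathway databases in this list.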
26 KEGG: Metabolic & Regulatory Pathways
- KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html)
(http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00220+4.3.2.1)
27 BioCyc: EcoCyc/MetaCyc Metabolic Pathways
- The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)
28 BioCarta: Cellular Pathways
(http://www.biocarta.com/index.asp)
29 Reactome (http://www.reactome.org/)
30 BIND: Protein-Protein Interaction
(http://www.bind.ca/)
31 Gene Ontology (http://www.geneontology.org/)
Three GOs: Molecular Function, Biological Process, Cellular Component
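The three ontologies give each annotation an aspect, and GO annotation files abbreviate the aspects as F, P and C. The sketch below groups a handful of annotations by aspect. The protein labels and the protein-to-term pairings are hypothetical and chosen only to show the data structure.

```python
from collections import defaultdict

# Single-letter aspect codes as used in GO annotation files.
ASPECTS = {
    "F": "Molecular Function",
    "P": "Biological Process",
    "C": "Cellular Component",
}

# Hypothetical annotations: (protein label, GO term name, aspect code).
annotations = [
    ("PROT_A", "structural molecule activity", "F"),
    ("PROT_A", "visual perception", "P"),
    ("PROT_A", "cytoplasm", "C"),
    ("PROT_B", "lyase activity", "F"),
]


def group_by_aspect(rows):
    """Group (protein, term) pairs under the full name of their GO aspect."""
    grouped = defaultdict(list)
    for protein, term_name, code in rows:
        grouped[ASPECTS[code]].append((protein, term_name))
    return dict(grouped)


for aspect, items in group_by_aspect(annotations).items():
    print(aspect, items)
```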
32 IV. Databases of Protein Structures
- Protein Structure
  - PDB: Structures Determined by X-ray Crystallography and NMR
  - PDBsum: Summaries and analyses of PDB structures
  - MMDB: NCBI's database of 3D structures, part of NCBI Entrez
  - SWISS-MODEL Repository: Database of annotated protein 3D models
  - ModBase: Annotated comparative protein structure models
- Structure Classification
  - CATH: Hierarchical Classification of Protein Domain Structures
  - SCOP: Familial and Structural Protein Relationships
  - FSSP: Protein Fold Classification Based on Structure-Structure Alignment
33 PDB: Experimental 3D Structure Repository
Rat gamma-crystallin, chains A and B.
Can you do a text search at PIR to find this?
(http://www.rcsb.org/pdb/)
34 PDBsum: Summary and Analysis
(http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/)
Search
3-D structure summary
2-D structure
35 Protein Structural Classification (1)
CATH: Hierarchical domain classification of protein structures (http://www.biochem.ucl.ac.uk/bsm/cath_new/)
36 Protein Structural Classification (2)
SCOP: A comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known.
(http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html)
37 SWISS-MODEL Repository
A database of annotated three-dimensional comparative protein structure models
(http://swissmodel.expasy.org/repository/smr.php?sptr_ac=CRGE_RAT&job=2)
38 V. Proteomic Resources
- GELBANK (http://gelbank.anl.gov): 2D-gel patterns of species with completed genomes.
- SWISS-2DPAGE (http://www.expasy.org/ch2d/): Index of 2D-gels
- PEP (http://cubic.bioc.columbia.edu/pep/): Predictions for Entire Proteomes, summarized analyses of protein sequences
- Integr8 (http://www.ebi.ac.uk/integr8/): A browser for information relating to completed genomes and proteomes, based on data contained in Genome Reviews and the UniProt proteome sets
- PRIDE (http://www.ebi.ac.uk/pride/): PRoteomics IDEntifications database; expression profiling databases
- GPMdb (http://gpmdb.thegpm.org/): Mass Spec Proteomics Databases
39 2D-Gel Image Databases (1)
(http://us.expasy.org/ch2d/2d-index.html)
Part of WORLD-2DPAGE: Index to 2-D PAGE databases and services
(http://us.expasy.org/cgi-bin/nice2dpage.pl?P02489)
40 2D-Gel Image Databases (2)
(http://gelbank.anl.gov/2dgels/index.asp)
41 GPMdb: MS Data Search
http://gpmdb.thegpm.org/
Craig et al., J Proteome Res. 2004, 3:1234-42.
42 iProLINK: Protein Literature Mining Resource
Text mining of protein phosphorylation
Gene/protein name thesaurus: synonyms, ambiguous names
http://pir.georgetown.edu/iprolink/
43 Lab
- Rabbit alpha crystallin A (UniProt CRYAA_RABIT/P02493)
- Delta crystallin II (Argininosuccinate lyase) (UniProt ARLY2_ANAPL/P24058)
- Choose additional protein IDs to browse the
variety of molecular biology databases each
sequence report links to.
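The lab identifiers pair a UniProt entry name with an accession number, written as `ENTRYNAME/ACCESSION`. An entry name has a protein mnemonic and a species code joined by an underscore. A small sketch for splitting such labels before looking them up is given below. The helper name and the returned keys are hypothetical, and the sketch assumes the label follows the `NAME_SPECIES/ACCESSION` layout used on this slide.

```python
def split_lab_id(label):
    """Split 'ENTRYNAME/ACCESSION' into its parts.

    Assumes exactly one '/' and an entry name of the form PROTEIN_SPECIES.
    """
    entry_name, accession = label.strip().split("/")
    protein, species = entry_name.split("_")
    return {
        "entry_name": entry_name,
        "protein": protein,
        "species": species,
        "accession": accession,
    }


lab_ids = ["CRYAA_RABIT/P02493", "ARLY2_ANAPL/P24058"]
for lab_id in lab_ids:
    print(split_lab_id(lab_id))
```

The accession is the stable identifier to use in links, while the entry name is easier to read because it encodes the protein and the source organism.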