Slides: 82
Provided by: Gia78
Transcript and Presenter's Notes

1
Proteomics Bioinformatics
MBI, Master's Degree Program in Helsinki, Finland
Lecture 4
10 May, 2007
Sophia Kossida, BRF, Academy of Athens, Greece
Esa Pitkänen, University of Helsinki, Finland
Juho Rousu, University of Helsinki, Finland
2
Proteomics and biology / Applications
  • Protein expression profiling: identification of
    proteins in a particular sample as a function of
    a particular state of the organism or cell
  • Proteome mining: identifying as many as possible
    of the proteins in your sample
  • Post-translational modifications: identifying how
    and where the proteins are modified
  • Functional proteomics
  • Protein-protein interactions / protein-network
    mapping: determining how the proteins interact
    with each other in living systems
  • Protein quantitation or differential analysis
  • Structural proteomics
DATABASES
TOOLS
3
Databases and tools
Melanie
4
General workflow of proteomics analysis
Proteins/peptides
Digestion and/or separation
2D gel image acquisition and storage
External data sources: taxonomy, ontologies,
bibliography
Applications: systems biology (pathways,
interactions...), biomarker discovery, drug targets
MALDI, MS/MS
Store peak lists and all meta data
Identification Quantification
PMF MS/MS DIGE LC-MS Tags
5
General workflow of proteomics analysis
Digestion and/or separation
Make 2D
Proteins/peptides
2D PAGE databases: Swiss-2DPAGE, GelBank,
Cornea 2D-PAGE, World-2DPAGE
Imaging tools: Melanie, PDQuest, Progenesis,
Delta2D
Sequence databases: EMBL Nucleotide Sequence
Database, GenBank, UniProtKB/Swiss-Prot, TrEMBL,
Ensembl, EST database, PIR
Storing/organising: ProteinScape, MSight
KEGG, PDB, DIP, OMIM, Reactome, PROSITE, Pfam,
SPIN, BOND, STRING, AmiGO, David, PubMed, MEDLINE
MALDI, MS/MS
Mascot, Sequest, Aldente, Popitam, Phenyx, FindMod,
ProFound, PepFrag, MS-Fit, OMSSA, SearchXLinks,
TagIdent
Identification Quantification
6
General workflow of proteomics analysis
Make 2D
2D Page data bases
  • Imaging software
  • The ability to compare two gels (images) and
    then identify differentially expressed spots
  • Melanie
  • PDQuest
  • Progenesis
  • Delta2D
  • 2D gel databases
  • Data integration on the web
  • Image data and textual information
  • Swiss-2DPAGE
  • GelBank
  • Cornea 2D-PAGE
  • World-2DPAGE

ProteinScape: platform for storing and organizing
data
MSight: representation of mass spectra along
with data from the separation
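As a rough illustration of the gel comparison step, the sketch below takes normalised spot volumes from two gels and flags the matched spots whose ratio passes a threshold. The spot IDs, the volumes and the two-fold cutoff are invented for the example. Spot detection, spot matching and statistics, which the packages listed above also handle, are not shown.

```python
def differential_spots(gel_a, gel_b, min_fold=2.0):
    """Return {spot_id: fold_change} for matched spots that differ by >= min_fold.

    gel_a, gel_b: dicts mapping a matched spot ID to its normalised volume.
    Fold change is the volume in gel_b divided by the volume in gel_a.
    """
    changed = {}
    for spot_id in gel_a.keys() & gel_b.keys():   # only spots present in both gels
        a, b = gel_a[spot_id], gel_b[spot_id]
        if a <= 0 or b <= 0:
            continue                              # skip spots without a usable volume
        fold = b / a
        if fold >= min_fold or fold <= 1.0 / min_fold:
            changed[spot_id] = fold
    return changed


# Hypothetical normalised spot volumes for a control gel and a treated gel
control = {"spot1": 1.0, "spot2": 0.8, "spot3": 2.0}
treated = {"spot1": 1.1, "spot2": 2.4, "spot3": 0.5, "spot4": 1.0}
result = differential_spots(control, treated)
print(sorted(result))
```

Here spot4 is ignored because it has no match in the control gel, and spot1 is not flagged because its ratio is close to 1.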
7
2D Gel Databases
Swiss-2DPAGE: www.expasy.ch
GelBank: http://www.gelscape.ualberta.ca:8080/htm/gdbIndex.html
Cornea 2D-PAGE: http://www.cornea-proteomics.com/
World-2DPAGE, index of 2D gel databases: http://ca.expasy.org/ch2d/2d-index.html
8
Swiss 2D PAGE viewer
9
GelBank
10
Cornea
11
World-2DPAGE
http://ca.expasy.org/ch2d/2d-index.html
12
Make 2D database
A software package to create, convert, publish,
interconnect and keep up to date 2-DE databases.
Provided by ExPASy.
The database can be queried by description, by
accession number or by clicking on a spot.
Cross-references are provided to other federated
2D PAGE database entries, Medline and SWISS-PROT.
Entries are linked to images showing the
experimentally determined and theoretical protein
locations. Searches are possible via clickable
images and keywords.
Data can be marked as public, or as fully or
partially private. A highly secured administration
Web interface makes external data integration,
data export, data privacy control, database
publication and version control easy to perform.
It runs on most UNIX-based operating systems
(Linux, Solaris/SunOS, IRIX). Being continuously
developed, the tool is evolving in concert with
the current Proteomics Standards Initiative of
the Human Proteome Organization (HUPO).
13
Federated database
A collection of databases that are treated as one
entity and viewed through a single user interface
(pc.mag.com)
  • Robustness
  • Consistency
  • Maintenance of the database
  • Data quality
Limitations of current databases
  • They do not contain strict/detailed descriptions
    of the protocol (buffers, sample volume, staining
    techniques: all important information for gel
    comparisons).
  • They were designed as 2D (and not proteomics)
    databases and are therefore not readily
    expandable to incorporate other proteomics data,
    e.g. MS, MDLC.
  • They were designed for reference gels, not
    ongoing projects.
14
Guidelines for building a federated 2-DE database
http://ca.expasy.org/ch2d/fed-rules.html
  • Individual entries in the database must be
    accessible by a keyword search. Other methods are
    possible but not required.
  • The database must be linked to other databases
    by active hypertext cross-references, linking
    together all related databases. Database entries
    must be at least linked to the main index.
  • A main index has to be supplied that provides a
    means of querying all databases through one
    unique query point.
  • Individual protein entries must be available
    through clickable images.
  • 2DE analysis software designed for use with
    federated databases must be able to access
    individual entries in any federated 2DE database.
(For a complete reference, see Appel et al.,
Electrophoresis 17, 540-546, 1996.)
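The idea of a main index that queries all member databases through one query point can be sketched as follows. This is only a toy model under stated assumptions: the class `FederatedIndex`, the database names, the accessions and the descriptions are all invented, and the member databases are plain in-memory lists rather than remote servers linked by hypertext.

```python
class FederatedIndex:
    """A single query point over several independent 2-DE databases (toy model)."""

    def __init__(self):
        self.databases = {}            # database name -> list of entry dicts

    def register(self, name, entries):
        """Add a member database to the federation."""
        self.databases[name] = entries

    def search(self, keyword):
        """Keyword search across all members; returns (database, accession) pairs."""
        keyword = keyword.lower()
        hits = []
        for name, entries in self.databases.items():
            for entry in entries:
                if keyword in entry["description"].lower():
                    hits.append((name, entry["accession"]))
        return hits


index = FederatedIndex()
# Hypothetical entries; accessions and descriptions are invented for illustration
index.register("db_one", [{"accession": "X0001", "description": "Serum albumin"}])
index.register("db_two", [{"accession": "Y0007", "description": "Albumin fragment"},
                          {"accession": "Y0008", "description": "Keratin"}])
hits = index.search("albumin")
print(hits)
```

The user issues one query and receives entries from every member database, which is the behaviour the guidelines ask of the main index.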
15
Image analysis software
  • ImageMaster 2D / Melanie
  • PDQuest (Bio-Rad, USA)
  • Progenesis (Nonlinear, UK)
  • Delta2D (Decodon, Germany)
16
Melanie
http://au.expasy.org/melanie/
17
Melanie
http://www.2d-gel-analysis.com/
18
PDQuest
http://www.bio-rad.com/
19
Progenesis
http://www.nonlinear.com/products/progenesis/
20
Delta 2D
http://www.decodon.com/Solutions/Delta2D/
21
ProteinScape
Platform for storing, organizing, analyzing data
generated during the proteomics workflow.
  • Hierarchy
  • Project
  • Sample
  • Gel
  • Spots
  • MS Data
  • Search Events
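The hierarchy above maps naturally onto nested records. The sketch below is one hypothetical way to model it. It is not ProteinScape's actual schema, and all class names, field names and example values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchEvent:
    engine: str                                   # name of the search engine used
    protein_hits: List[str] = field(default_factory=list)


@dataclass
class MSData:
    peak_list: List[float]
    searches: List[SearchEvent] = field(default_factory=list)


@dataclass
class Spot:
    spot_id: str
    ms_data: List[MSData] = field(default_factory=list)


@dataclass
class Gel:
    name: str
    spots: List[Spot] = field(default_factory=list)


@dataclass
class Sample:
    name: str
    gels: List[Gel] = field(default_factory=list)


@dataclass
class Project:
    name: str
    samples: List[Sample] = field(default_factory=list)

    def all_hits(self):
        """Walk the whole hierarchy and collect every protein hit in the project."""
        return [hit
                for sample in self.samples
                for gel in sample.gels
                for spot in gel.spots
                for ms in spot.ms_data
                for search in ms.searches
                for hit in search.protein_hits]


# Hypothetical example data
search = SearchEvent(engine="Mascot", protein_hits=["protein_A"])
spot = Spot("s1", [MSData([842.51, 1045.56], [search])])
project = Project("demo", [Sample("liver", [Gel("gel1", [spot])])])
print(project.all_hits())
```

Keeping each level as its own record means a search result can always be traced back to the spot, gel and sample it came from.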

22
MSight
Specifically developed for the representation of
mass spectra along with data from the separation
http://www.expasy.org/MSight
23
General workflow of proteomics analysis
Sequence databases: EMBL Nucleotide Sequence
Database, GenBank, UniProtKB/Swiss-Prot,
TrEMBL, Ensembl, EST database, PIR
MALDI, MS/MS
Store peak lists and all meta data
PMF MS/MS DIGE LC-MS Tags
Identification Quantification
24
EMBL Nucleotide Sequence Database
Collaboration between the EBI, GenBank (USA) and
the DNA Data Bank of Japan (DDBJ). Newly collected
sequence data are exchanged, and each database is
updated daily.
25
EBI
26
GenBank
GenBank is the NIH genetic sequence database, an
annotated collection of all publicly available
DNA sequences.
GenBank is available for searching at NCBI
Each entry includes a concise description of the
sequence, the scientific name and the taxonomy of
the source organism, and a table of features that
identifies coding regions and other sites of
biological significance, such as transcription
units, sites of mutations or modifications and
repeats. Protein translations for coding regions
are included in the feature table. Bibliographic
references are included along with a link to the
Medline unique identifier for all published
sequences.
http://www.psc.edu/general/software/packages/genbank/genbank.html
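The protein translation stored with a coding region can be reproduced from the nucleotide sequence. The sketch below translates an in-frame coding sequence with the standard genetic code. The function name and the example sequence are invented for illustration, and the sketch ignores the alternative genetic codes that an entry may specify for some organisms and organelles.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order, '*' marks a stop codon
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate a coding sequence (DNA, 5'->3', in frame) up to the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")   # 'X' for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCAAATAA"))   # prints MAK
```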
27
Search GenBank
http://www.ncbi.nlm.nih.gov/Genbank/index.html
28
DDBJ
29
INSDC
30
UniProt
Universal Protein Resource
  • Joins the information contained in
    UniProtKB/Swiss-Prot, UniProtKB/TrEMBL and PIR.
  • It comprises three components:
  • UniProt Knowledgebase (curated protein
    information, including function, classification
    and cross-references)
  • UniProt Reference Clusters (combines closely
    related sequences into a single record to speed
    searches)
  • UniProt Archive (a repository reflecting the
    history of all protein sequences)

31
ExPASy Proteomics Server
The Expert Protein Analysis System (ExPASy)
proteomics server of the Swiss Institute of
Bioinformatics (SIB) is dedicated to the analysis
of protein sequences and structures as well as
2D-PAGE.
http://ca.expasy.org/
32
UniProtKB/Swiss-Prot
The UniProtKB/Swiss-Prot Protein Knowledgebase
is an annotated protein sequence database
established in 1986. It is maintained
collaboratively by the SIB (Swiss Institute of
Bioinformatics) and the European Bioinformatics
Institute (EBI).
http://ca.expasy.org/sprot/
33
Swiss-Prot
34
TrEMBL
  • UniProtKB/TrEMBL is a computer-annotated protein
    sequence database complementing the
    UniProtKB/Swiss-Prot Protein Knowledgebase.
  • It contains the translations of all coding
    sequences (CDS) present in the EMBL/GenBank/DDBJ
    Nucleotide Sequence Databases and also protein
    sequences extracted from the literature or
    submitted to UniProtKB/Swiss-Prot.
  • The database is enriched with automated
    classification and annotation.

35
PIR
http://pir.georgetown.edu/pirwww/
36
ESTdb
An Expressed Sequence Tag (EST) is a unique DNA
sequence within a coding region of a gene that is
useful for identifying full-length genes and
serves as a landmark for mapping. dbEST is a
division of GenBank that contains sequence data
and other information on single-pass cDNA
sequences from a number of organisms.
http://www.ncbi.nlm.nih.gov/dbEST/
37
Ensembl
Ensembl is a joint project between the EMBL-EBI
and the Wellcome Trust Sanger Institute that aims
at developing a system that maintains automatic
annotation of large eukaryotic genomes. Access to
all the software and data is free and without
constraints of any kind.
http://www.ebi.ac.uk/ensembl/
38
IPI - International Protein Index
39
General workflow of proteomics analysis
Mascot, Sequest, Aldente, Popitam, Phenyx, FindMod,
ProFound, PepFrag, MS-Fit, OMSSA, SearchXLinks,
TagIdent
MALDI, MS/MS
Store peak lists and all meta data
PMF MS/MS DIGE LC-MS Tags
Identification Quantification
40
Proteomics tools
http://restools.sdsc.edu/biotools/biotools19.html
http://ca.expasy.org/tools/
41
(No Transcript)
42
PROWL
43
Identification and Characterization Tools
PMF data
  • Mascot (Matrix Science)
  • Aldente (ExPASy)
  • ProFound (Rockefeller University)
  • MS-Fit (Prospector, UCSF)
MS/MS data
  • Sequest
  • Mascot
  • OMSSA
  • X!Hunter
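To make the PMF idea concrete, the sketch below digests a protein sequence in silico with trypsin-like rules, computes monoisotopic peptide masses and counts how many observed peaks they explain. It is a minimal illustration, not the scoring scheme of any tool listed above. The function names, the example sequence, the tolerance and the assumption that peaks are already converted to neutral masses are all choices made for the example. Missed cleavages and modifications are ignored, and splitting on a zero-width pattern requires Python 3.7 or later.

```python
import re

# Monoisotopic residue masses in daltons
MONO = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
        "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
        "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
        "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
WATER = 18.01056


def tryptic_peptides(protein):
    """Cleave after K or R, except when the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER


def count_matches(protein, peaks, tol=0.2):
    """Number of observed peaks (neutral masses) within tol Da of a theoretical peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(protein)]
    return sum(any(abs(peak - m) <= tol for m in theoretical) for peak in peaks)


# Hypothetical protein sequence, used only to show the cleavage rule
print(tryptic_peptides("AGKPEPTIDERSAMPLEK"))
```

A search engine would repeat this for every sequence in a database and rank the candidates, typically with a statistical score rather than a raw match count.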
44
Identification and Characterization Tools
  • Popitam (ExPASy, SIB)
  • Phenyx (GeneBio, Switzerland)
  • PepFrag (Rockefeller University, USA)
  • SearchXLinks (Caesar, Germany)
45
Popitam
Popitam is designed to characterize peptides
with unexpected modifications (e.g.
post-translational modifications or mutations) by
tandem mass spectrometry (ExPASy, SIB).
http://expasy.org/cgi-bin/popitam/help.pl
46
Popitam results
47
Phenyx
Phenyx is a software platform for the
identification and characterization of proteins
and peptides from mass spectrometry
data. Developed by GeneBio in collaboration with
SIB
http://www.phenyx-ms.com/about/about_phenyx.html
48
PEPFRAG
Searches known protein sequences with peptide
fragment mass information
http://prowl.rockefeller.edu/
49
SearchXLinks
http://www.searchxlinks.de/
Analysis of mass spectra of modified,
cross-linked, and digested proteins, the amino
acid sequence of which is known
50
Identification and Characterization Tools
  • FindMod predicts potential protein
    post-translational modifications (PTM) and finds
    potential single amino acid substitutions in
    peptides.
  • FindPept identifies peptides that result from
    unspecific cleavage of proteins from
    experimental masses, taking into account
    artefactual chemical modifications,
    post-translational modifications (PTM) and
    protease autolytic cleavage.
  • GlycoMod predicts possible oligosaccharide
    structures that occur on proteins from their
    experimentally determined masses.
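The principle behind modification prediction from masses can be shown with a small mass-difference lookup: if an observed peptide mass exceeds the theoretical mass by the mass shift of a known modification, that modification is a candidate. This is a sketch of the general idea only, not FindMod's actual algorithm. The function name, the tolerance and the short list of modifications are assumptions made for the example.

```python
# Monoisotopic mass shifts (Da) of a few common modifications
MODIFICATIONS = {
    "methylation": 14.01565,
    "oxidation": 15.99491,
    "acetylation": 42.01057,
    "phosphorylation": 79.96633,
}


def candidate_modifications(observed_mass, theoretical_mass, tol=0.02):
    """Return modifications whose mass shift explains observed - theoretical within tol Da."""
    delta = observed_mass - theoretical_mass
    return [name for name, shift in MODIFICATIONS.items()
            if abs(delta - shift) <= tol]


# A peptide observed about 79.97 Da heavier than predicted from its sequence
print(candidate_modifications(1079.97, 1000.0))   # prints ['phosphorylation']
```

A real tool would also check that the peptide contains a residue that can carry the modification, for example S, T or Y for phosphorylation.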
  • AACompIdent achieves identification with amino
    acid composition.
  • TagIdent identifies proteins with isoelectric
    point (pI), molecular weight (Mw) and sequence
    tag, generating a list of proteins close to a
    given pI and Mw.
  • MultiIdent achieves cross-species identification
    with multiple parameters (pI, Mw, sequence tag
    and peptide mass fingerprinting data).
http://au.expasy.org/tools/findmod/
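Identification by pI, Mw and sequence tag amounts to filtering a protein list with a window around the measured values. The sketch below shows that filtering step on invented records. The function name, the window sizes and all protein data are hypothetical, and computing pI and Mw from sequence is left out.

```python
def tagident_like(proteins, pi, mw, pi_range=0.5, mw_percent=10.0, tag=""):
    """Keep proteins whose pI and Mw fall inside the window and that contain the tag."""
    mw_delta = mw * mw_percent / 100.0
    return [p["name"] for p in proteins
            if abs(p["pI"] - pi) <= pi_range
            and abs(p["Mw"] - mw) <= mw_delta
            and tag in p["sequence"]]


# Hypothetical protein records (names, values and sequences are invented)
proteins = [
    {"name": "prot_A", "pI": 5.6, "Mw": 66000.0, "sequence": "MKWVTFISLL"},
    {"name": "prot_B", "pI": 5.9, "Mw": 69000.0, "sequence": "MDAHKSEVAH"},
    {"name": "prot_C", "pI": 8.4, "Mw": 67000.0, "sequence": "MKWVAAGHLL"},
]
print(tagident_like(proteins, pi=5.7, mw=67000.0, tag="KWV"))   # prints ['prot_A']
```

Without a tag the same call returns every protein inside the pI and Mw window, which corresponds to the list of proteins close to a given pI and Mw.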
51
General workflow of proteomics analysis
KEGG, PDB, DIP, OMIM, Reactome, PROSITE, Pfam,
SPIN, BOND, STRING, AmiGO, David, PubMed, MEDLINE
MALDI, MS/MS
Store peak lists and all meta data
PMF MS/MS DIGE LC-MS Tags
Identification Quantification
52
KEGG
KEGG: Kyoto Encyclopedia of Genes and Genomes
  • Organism-specific entry points
    - KEGG Organisms
  • Subject-specific entry points
    - DRUG, GLYCAN, REACTION, KAAS

http://www.genome.jp/kegg/kegg2.html
53
KEGG
KEGG is a biological systems database
integrating both molecular building block
information and higher-level systemic
information.
  • Manually drawn pathway maps representing our
    knowledge on the molecular interaction and
    reaction networks for metabolism, other cellular
    processes, and human diseases.
  • Functional hierarchies and binary relations of
    KEGG objects, including genes and proteins,
    compounds and reactions, drugs and diseases, and
    cells and organisms.
  • Gene catalogs of all complete genomes and some
    partial genomes with ortholog annotation (KO
    assignment), enabling KEGG PATHWAY mapping and
    BRITE mapping.
  • A composite database of chemical substances and
    reactions representing our knowledge on the
    chemical repertoire of biological systems and
    environments.
54
Search Pathway
Carbon fixation
55
Search Pathway
56
Pathways _motifs
57
Reactome
58
Reactome
59
PubMed
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed
60
David
http://david.abcc.ncifcrf.gov/home.jsp
61
Protein Data Bank
62
OMIM
This database is a catalog of human genes and
genetic disorders. The database contains textual
information and references. It also contains
links to MEDLINE and sequence records
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
63
Protein family classification
  • PROSITE (ExPASy)
  • Pfam (Sanger Institute)
  • SMART (EMBL)
64
PROSITE
Database of protein domains, families and
functional sites
Proteins can be grouped on the basis of their
sequences, into a limited number of
families. Some regions have been better
conserved than others during evolution. These
regions are generally important for the function
of a protein and/or the maintenance of the three-
dimensional structure. By analyzing the constant
and variable properties of such groups of similar
sequences, it is possible to derive a signature
for a protein family or domain, which
distinguishes its members from all other
unrelated proteins.
http://au.expasy.org/prosite/
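A signature of this kind is often written as a pattern that can be matched against a sequence. The sketch below converts a PROSITE-style pattern into a Python regular expression and scans a sequence with it. The converter handles only the common syntax elements (`x`, `[...]`, `{...}`, repeat counts and the `<` and `>` anchors outside brackets), and the function name and the test sequence are invented for illustration. The pattern used is the N-glycosylation site pattern N-{P}-[ST]-{P}.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern such as 'N-{P}-[ST]-{P}' into a Python regex."""
    pattern = pattern.rstrip(".")
    regex = []
    for element in pattern.split("-"):
        element = element.replace("{", "[^").replace("}", "]")   # {P} -> any residue but P
        element = element.replace("x", ".")                      # x -> any residue
        element = element.replace("(", "{").replace(")", "}")    # x(2,4) -> .{2,4}
        element = element.replace("<", "^").replace(">", "$")    # sequence start / end
        regex.append(element)
    return "".join(regex)


regex = prosite_to_regex("N-{P}-[ST]-{P}")
sequence = "MNGTAANPSA"                      # hypothetical sequence
positions = [m.start() for m in re.finditer(regex, sequence)]
print(regex, positions)                      # prints N[^P][ST][^P] [1]
```

The second asparagine in the test sequence is not reported because it is followed by a proline, which the pattern excludes.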
65
PROSITE
66
PROSITE
67
PROSITE
68
Pfam
Multiple sequence alignments and HMMs of protein
domains and families, at Sanger Institute.
http://www.sanger.ac.uk/Software/Pfam/help/index.shtml
69
Browse interactions
70
http://smart.embl-heidelberg.de/
71
Structure data bases/interactions
  • STRING (EMBL)
  • BOND (Unleashed Informatics)
  • Cytoscape
  • DIP (UCLA)
  • iHOP
  • SPIN-PP (protein-protein interfaces in the PDB)
  • MIPS (Mammalian Protein-Protein Interaction
    Database)
  • IntAct (protein interactions from literature
    curation)
72
STRING
http://string.embl.de
73
STRING search results
74
STRING graphical
75
STRING_ new node
76
BOND
The Biomolecular Object Network Databank
http://bond.unleashedinformatics.com
77
Cytoscape
Cytoscape is an open source bioinformatics
software platform for visualizing molecular
interactions with gene expression profiles and
other state data.
78
Node label position can be controlled by the new
GUI in VizMapper.
79
Cytoscape_ plugins
  • Plugins are available for network and molecular
    profile analysis.
  • For example:
  • Filter the network
  • Find active subnetworks / pathway modules
  • Find clusters

A tool to determine which Gene Ontology (GO)
categories are statistically overrepresented in
a set of genes or a subgraph of a biological
network.
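One common way to test whether a GO category is overrepresented in a gene set is the hypergeometric test, sketched below. The slide does not say which test the plugin uses, so this is an assumption. The function name and the numbers are illustrative, multiple-testing correction is omitted, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def enrichment_p_value(total_genes, annotated_genes, sample_size, annotated_in_sample):
    """Hypergeometric upper tail: probability of seeing at least annotated_in_sample
    annotated genes when sample_size genes are drawn from total_genes."""
    upper = min(annotated_genes, sample_size)
    numerator = sum(comb(annotated_genes, k)
                    * comb(total_genes - annotated_genes, sample_size - k)
                    for k in range(annotated_in_sample, upper + 1))
    return numerator / comb(total_genes, sample_size)


# Toy numbers: 10 genes in total, 5 carry the GO term, 2 genes selected, both annotated
p = enrichment_p_value(10, 5, 2, 2)
print(round(p, 4))   # prints 0.2222
```

A small p-value means the category appears in the gene set more often than expected by chance.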
80
Database of Interacting Proteins
The DIP database catalogs experimentally
determined interactions between proteins. It
combines information from a variety of sources to
create a single, consistent set of
protein-protein interactions.
http://dip.doe-mbi.ucla.edu/
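Combining interactions from several sources into one consistent set mainly means treating each interaction as an undirected pair and removing duplicates. The sketch below does this for invented data. The function name, the source names and the protein names are hypothetical, and the curation and quality checks a real database performs are not modelled.

```python
def merge_interactions(*sources):
    """Merge interaction lists from several sources into one non-redundant set.

    Each interaction is an undirected pair, so (A, B) and (B, A) are the same record.
    Returns {pair: set of source names reporting it}.
    """
    merged = {}
    for source_name, pairs in sources:
        for a, b in pairs:
            key = frozenset((a, b))
            merged.setdefault(key, set()).add(source_name)
    return merged


# Hypothetical sources and protein names
merged = merge_interactions(
    ("study_1", [("P1", "P2"), ("P2", "P3")]),
    ("study_2", [("P2", "P1"), ("P3", "P4")]),
)
print(len(merged))   # prints 3
```

Recording which sources report each pair keeps the supporting evidence available after merging.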
81
iHOP
http://www.ihop-net.org/UniPub/iHOP/