Title: Computer Analysis of Mass Spectrometry Data


1
Computer Analysis of Mass Spectrometry Data
  • David Perkins
  • Proteomics Section,
  • Hammersmith Hospital Campus,
  • Imperial College School of Medicine.
  • david.perkins_at_imperial.ac.uk

2
Introduction
  • Background to protein sequencing and
    identification using mass spectrometry (MS)
  • Software and computational techniques for
    analysis of MS and MS/MS data.

4
Peptide Mass Fingerprinting
  • Simplest form of protein identification (not
    sequencing)
  • Majority of proteins in a sample are identified
    using this technique
  • Involves a simple enzymatic digest of a protein
    and the measurement of the mass of the resultant
    peptide fragments
  • Sample amounts as low as 10 femtomoles
    (1 fmol = 10^-15 mol) are sufficient

5
Sample Preparation for Peptide Mass Fingerprinting
  • Excise band from gel
  • Tryptic Digestion of gel fragment
  • Supernatant transferred to fresh eppendorf
  • Sample transferred to target plate

6
Enzymatic Cleavage
(Diagram: an enzyme cleaves the native protein into peptide fragments)
7
Sample Preparation Robot
8
MALDI Mass Spectrometer
  • Matrix Assisted Laser Desorption Ionisation
  • Peptides are mixed with matrix and then applied
    to wells on a target plate
  • Peptide ions are generated by a LASER firing at
    the target plate
  • The time of firing of the LASER and the arrival
    time of the ions at the detector are both known,
    so the time of flight, and from it the
    mass-to-charge ratio, can be calculated
  • Almost exclusively singly charged ions are
    generated; other types of spectrometer may
    generate multiply charged ions
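
The time-of-flight calculation can be sketched as follows. This is a minimal illustration for an idealised linear analyser, not the calibration used by any real instrument: it assumes every ion is accelerated through the same voltage and then drifts a fixed distance, and the function and parameter names are hypothetical.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19   # coulombs
DALTON_KG = 1.66053906660e-27           # kilograms per dalton


def flight_time_to_mz(flight_time_s, drift_length_m, accel_voltage_v):
    """Return m/z (Da per unit charge) for an idealised linear TOF analyser.

    Energy balance: z*e*V = (1/2)*m*v**2 with v = L/t,
    so m/z = 2*e*V*t**2 / L**2.
    """
    mass_per_charge_kg = (2.0 * ELEMENTARY_CHARGE_C * accel_voltage_v
                          * flight_time_s ** 2 / drift_length_m ** 2)
    return mass_per_charge_kg / DALTON_KG


def mz_to_flight_time(mz, drift_length_m, accel_voltage_v):
    """Inverse of flight_time_to_mz: drift time in seconds for a given m/z."""
    return drift_length_m * math.sqrt(
        mz * DALTON_KG / (2.0 * ELEMENTARY_CHARGE_C * accel_voltage_v))
```

Because flight time grows with the square root of m/z, an ion four times heavier takes twice as long to reach the detector.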

9
MALDI Internals
10
Micromass MALDI
11
Typical Fingerprint Spectrum
12
Isotopic Cluster
13
Poorly Resolved Peak
14
Protein Identification Using Peptide Mass
Fingerprinting
  • Produce a theoretical digest of all the proteins
    in a database with a specific enzyme
  • Compare these theoretical masses with
    experimentally observed masses
  • Assign a score to matching peptides/proteins
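
The three steps above can be sketched with a deliberately naive score: the number of observed masses explained by each protein's theoretical digest. This is only an illustration, not the scoring used by MOWSE or MASCOT; the function names, protein names and masses are all hypothetical.

```python
def count_matches(observed, theoretical, tolerance_da=0.5):
    """Count observed masses within tolerance_da of any theoretical mass."""
    return sum(
        any(abs(obs - theo) <= tolerance_da for theo in theoretical)
        for obs in observed
    )


def rank_proteins(observed, digests, tolerance_da=0.5):
    """Rank proteins by how many observed masses their digest explains.

    digests maps a protein name to the peptide masses from its
    theoretical digest. Returns (name, match_count) pairs, best first.
    """
    scores = {name: count_matches(observed, masses, tolerance_da)
              for name, masses in digests.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical masses, for illustration only.
digests = {
    "protein_A": [1045.6, 1479.8, 2211.1],
    "protein_B": [1045.5, 1300.2, 1800.9],
}
observed = [1045.55, 1479.7, 2211.3]
ranking = rank_proteins(observed, digests)
```

A real scoring scheme would also weight matches by how likely they are to occur by chance, which a raw count ignores.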

15
Which Observed Masses to Include?
The optimum dataset for a peptide mass
fingerprint is all the correct peptides and none
of the wrong ones! By correct, we mean that the
textbook cleavage rules were followed. In
practice, this rarely (if ever) happens.
  • Enzymatic cleavage not perfect (partials)
  • Sequence coverage may be poor
  • Mixtures and contamination
  • Identifying real peaks
  • Residue modifications
  • Mass accuracy

16
Choice of Enzyme
  • Enzymes of low specificity are next to useless as
    they produce a complex mixture of similar masses
  • Enzymes of high specificity may produce no
    cleaved peptide at all
  • Trypsin is especially good: it cleaves after the
    basic residues K and R, so these sit at the C
    terminus of each peptide, which reduces their
    disruptive influence on peptide fragmentation

17
Enzyme Specificity
Enzyme         Cleave At   Don't Cleave   N or C term
Trypsin        KR          P              C
Lys-C          K           P              C
Lys-C/P        K           -              C
Arg-C          R           P              C
V8-E           E           P              C
V8-DE          DE          P              C
Chymotrypsin   FYWLIVM     P              C
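
The specificity table can be turned into a small digestion routine. This sketch assumes that every enzyme listed cleaves on the C-terminal side of the listed residues, and that the "Don't Cleave" column means cleavage is blocked when the following residue is P. The names `ENZYME_RULES` and `digest` are hypothetical.

```python
# enzyme -> (cleave after these residues, blocked if the next residue is one of these)
ENZYME_RULES = {
    "Trypsin": ("KR", "P"),
    "Lys-C": ("K", "P"),
    "Lys-C/P": ("K", ""),
    "Arg-C": ("R", "P"),
    "V8-E": ("E", "P"),
    "V8-DE": ("DE", "P"),
    "Chymotrypsin": ("FYWLIVM", "P"),
}


def digest(sequence, enzyme):
    """Split a protein sequence into limit peptides (no missed cleavages)."""
    cleave_after, blocked_by = ENZYME_RULES[enzyme]
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        next_residue = "" if is_last else sequence[i + 1]
        if (residue in cleave_after and not is_last
                and next_residue not in blocked_by):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])   # C-terminal peptide
    return peptides
```

For the made-up sequence `AKRPGKPLMR`, the Trypsin rule cleaves only after the first K, while Lys-C/P also cleaves the K that precedes a P.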
18
Missed Cleavages
  • Digests are usually not perfect
  • Cleavage sites may be missed by an enzyme
  • These incompletely cleaved peptides are known as
    partials
  • Allowing for partials reduces the discrimination
    of a search, because more theoretical masses
    must be considered
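
A minimal sketch of how partials enlarge the search space: given the limit peptides of a perfect digest, it joins adjacent peptides to produce every peptide with up to a chosen number of missed cleavages. The function name and the example peptides are hypothetical.

```python
def with_missed_cleavages(limit_peptides, max_missed=1):
    """Join up to max_missed + 1 adjacent limit peptides into partials."""
    peptides = []
    for start in range(len(limit_peptides)):
        for extra in range(max_missed + 1):
            end = start + extra + 1
            if end > len(limit_peptides):
                break
            peptides.append("".join(limit_peptides[start:end]))
    return peptides
```

With three limit peptides, allowing one missed cleavage raises the number of candidate peptides from three to five, and allowing two raises it to six.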

19
Search Masses
  • Select masses which are large enough to provide
    discrimination
  • Larger masses are more likely to be imperfect
    cleavages
  • Masses smaller than 500 Da are likely to be matrix peaks
  • With Trypsin, a mass range of 1000 to 3000 Da is
    usually safe
  • Mass tolerance is important in obtaining good
    discrimination
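
As a small worked example, the mass-range selection and a relative (ppm) tolerance can be expressed as below. The default range follows the 1000 to 3000 Da guideline above; the function names and the use of ppm are assumptions made for illustration.

```python
def select_search_masses(peaks, low_da=1000.0, high_da=3000.0):
    """Keep only the peaks inside the chosen search mass range."""
    return [mass for mass in peaks if low_da <= mass <= high_da]


def ppm_to_da(mass_da, tolerance_ppm):
    """Convert a relative tolerance in ppm to an absolute window in Da."""
    return mass_da * tolerance_ppm / 1e6
```

A 50 ppm tolerance at 2000 Da corresponds to a window of 0.1 Da, so a relative tolerance is tighter for small peptides than for large ones.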

20
Constraining Protein Mass
  • To increase discrimination, the mass of the
    intact protein can be used in a search
  • This is dangerous, since the sample may be just a
    fragment of the entire protein

21
Autolysis Products
  • Some digests may be dominated by the autolysis
    peaks of the enzyme used
  • In these cases, the known masses of these
    products may be filtered out
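
Filtering autolysis peaks amounts to removing any observed mass that lies close to a known product mass. In this sketch the function name is hypothetical. The two masses in the usage example are values commonly quoted for porcine trypsin autolysis peptides; treat them as assumptions and check them against the data for the enzyme actually used.

```python
def remove_known_peaks(peaks, known_masses, tolerance_da=0.2):
    """Drop observed peaks within tolerance_da of any known contaminant mass."""
    return [peak for peak in peaks
            if all(abs(peak - known) > tolerance_da for known in known_masses)]


# Assumed autolysis masses; verify before use.
autolysis_masses = [842.51, 2211.10]
observed = [842.5, 1045.6, 2211.2, 1479.8]
filtered = remove_known_peaks(observed, autolysis_masses)
```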

22
Residue Modifications
  • Some residues may be modified during the sample
    preparation procedure
  • This introduces discrepancies between the expected
    and observed masses
  • For example, Met residues are often oxidised
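
A variable modification such as Met oxidation means each peptide has several possible masses. This sketch assumes each oxidation adds one oxygen atom (about 15.9949 Da, monoisotopic) and that any number of the Met residues may be oxidised; the function name is hypothetical.

```python
OXIDATION_DA = 15.9949  # monoisotopic mass of one added oxygen atom


def candidate_masses(peptide, unmodified_mass):
    """Return the possible masses with 0..n of the n Met residues oxidised."""
    n_met = peptide.count("M")
    return [unmodified_mass + k * OXIDATION_DA for k in range(n_met + 1)]
```

A peptide with two Met residues therefore contributes three candidate masses to the search instead of one.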

23
MOWSE
  • One of the first programs for identifying
    proteins by peptide mass fingerprinting
  • Developed by Darryl Pappin and Alan Bleasby
  • Developed alongside the OWL non-redundant protein
    database

24
Problems with MOWSE
  • Databases had to be pre-indexed; these indexes
    are large and slow to build
  • Does not handle variable modifications
  • Indexing means that databases cannot easily be
    updated regularly
  • Limited functionality

25
MASCOT
  • Takes advantage of multi-processor systems
  • Totally web based
  • No pre-indexing of databases
  • Increased functionality
  • Copes with multiple modifications
  • Easily expandable
  • Increased speed

26
Search Speed
Search speed is very important as databases
increase in size and automation leads to a high
throughput of samples. Also, if the algorithms
are efficient, more elaborate searches may be
undertaken, for instance with large numbers of
variable residue modifications and different
mass tolerances, to attempt to make more sense of
data derived from mixtures or with contamination.
  • Ability to use multiple processors when available
  • Very efficient I/O, databases may also be mapped
    to memory
  • Efficient cleavage site and mass calculation

27
Thread Models
  • Boss/Worker
  • Peer
  • Pipeline
  • MASCOT is based on the Boss/Worker model

28
Boss/Worker Model
The Boss accepts input and then distributes the
work to other threads
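
The Boss/Worker pattern can be sketched with a shared queue. This is a generic illustration of the thread model, not MASCOT's implementation; `run_boss_worker` and its parameters are hypothetical names.

```python
import queue
import threading


def run_boss_worker(tasks, work_fn, n_workers=4):
    """The calling (boss) thread queues tasks; worker threads process them."""
    task_queue = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = task_queue.get()
            if item is None:              # sentinel: no more work
                break
            index, task = item
            value = work_fn(task)
            with lock:                    # protect the shared results dict
                results[index] = value

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in workers:
        thread.start()
    for index, task in enumerate(tasks):  # the boss distributes the work
        task_queue.put((index, task))
    for _ in workers:                     # one sentinel per worker
        task_queue.put(None)
    for thread in workers:
        thread.join()
    return [results[i] for i in range(len(results))]
```

Results are keyed by task index, so the output order matches the input order regardless of which worker finishes first.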
29
Peer Model
Each thread is responsible for its own input
(Diagram: several threads, each producing its own output)
30
Pipeline Model
(Diagram: Input Stream, then Thread A, Thread B and Thread C in sequence, then Output)
A single thread accepts input, passing the data
on to the next thread for further processing
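
A pipeline of threads can be sketched with one queue between each pair of stages. This is a generic illustration with hypothetical names; each stage is a plain function applied to every item in turn.

```python
import queue
import threading

_DONE = object()  # sentinel passed down the pipeline to signal the end


def _stage(fn, inbox, outbox):
    """Apply fn to each item from inbox and pass the result to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            break
        outbox.put(fn(item))


def run_pipeline(items, stage_fns):
    """Run items through the stage functions, one thread per stage."""
    queues = [queue.Queue() for _ in range(len(stage_fns) + 1)]
    threads = [
        threading.Thread(target=_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stage_fns)
    ]
    for thread in threads:
        thread.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results
```

Since every stage is a single thread reading from a first-in, first-out queue, items leave the pipeline in the order they entered it.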
35
Peptide Mass Fingerprinting-Related Search
Methods
  • Masses may be combined with sequence information,
    e.g. 1234.5 seq(c-ABCD) seq(EF)
  • These searches are very valuable as even small
    amounts of sequence information may be very
    discriminating
  • Sequence information is derived from the partial
    interpretation of an MS/MS spectrum
  • Known as the sequence tag method
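
The discriminating power of a sequence tag can be illustrated with a simple filter: a candidate peptide must match the query mass and contain every tag. This sketch treats tags as plain substrings and does not model query qualifiers such as the "c-" prefix; the function name and example values are hypothetical.

```python
def matches_query(peptide, peptide_mass, query_mass, tags, tolerance_da=0.5):
    """True if the mass is within tolerance and every tag occurs in the peptide."""
    mass_ok = abs(peptide_mass - query_mass) <= tolerance_da
    return mass_ok and all(tag in peptide for tag in tags)
```

A peptide that matches the mass alone is rejected as soon as one tag is absent, which is why even a short tag removes most false candidates.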

36
Composition Queries
  • Composition information may also be used with
    mass information to refine queries
  • Chemical or enzymatic analysis, such as N
    terminal analysis with Edman, may give
    composition information
  • A typical query would
    be 1234.5 comp(2H0M)
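
A composition constraint can be checked by counting residues. This sketch assumes the example query means "exactly two H (His) and no M (Met) residues"; that reading, the function name and the example peptides are assumptions.

```python
from collections import Counter


def matches_composition(peptide, required_counts):
    """True if the peptide contains exactly the required number of each residue."""
    counts = Counter(peptide)
    return all(counts[residue] == n for residue, n in required_counts.items())
```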

37
MASCOT Queries
  • One of the most powerful features of MASCOT is
    the ability to mix all the types of query in one
    search
  • MASCOT allows the user to specify a particular
    species to further increase search discrimination

38
Databases Searched with Peptide Mass Fingerprint
Data
  • Non-identical protein databases are the ideal
  • EST sequences are too short to contain meaningful
    information for these searches
  • Non-redundant databases may be problematic
  • MASCOT translates nucleic acid databases on the
    fly

39
MSDB
  • A non-identical protein sequence database
    designed for mass spectrometry searches
  • Additional information, such as multiple species
    lines, in the textual information
  • De-convolution of SWISSPROT and other sequences
  • Nightly updates
  • Links to source databases

40
Is the Protein Identified?
  • Most samples are identified using just peptide
    mass fingerprinting
  • With the growth of databases, this trend will
    continue
  • Some samples do not have representatives in any
    of the databases; to sequence these proteins,
    more analysis is required

42
MS/MS Analysis
  • Also known as tandem MS
  • Individual peptides from the enzymatic digests
    are fragmented further
  • From the resulting ladder of fragment masses,
    sequences may be reconstructed
  • Much more discriminating search than simple
    peptide mass fingerprinting

43
MS/MS Analysis
  • Carried out on nanospray/electrospray mass
    spectrometers
  • Rather than being spotted on a target plate, the
    sample is introduced through an inlet from a
    capillary
  • Peptides identified by the MALDI analysis are
    fragmented inside the mass spectrometer and the
    resultant daughter ions observed

44
Stylized Nanospray Mass Spectrometer
45
Micromass QTOF
46
Finnigan Ion Trap
47
Daughter Ions
  • Unlike in MALDI, ions produced by
    electrospray/nanospray machines may carry
    multiple charges
  • Various types of ions are produced, categorized
    by their charge and their direction in the
    peptide sequence
  • Fortunately the peptides fragment at the peptide
    bonds

48
B and Y Fragment Ions
(Diagram: peptide backbone, NH2 terminus to COOH terminus, with fragmentation at the peptide bonds)
Y-ions from C to N terminus: Y ion 3, Y ion 2, Y ion 1
B-ions from N to C terminus: B ion 1, B ion 2, B ion 3
49
Typical MS/MS Spectrum
(Spectrum plot: mass is on the X axis, intensity on the Y axis)
50
MASCOT Searches with MS/MS Data
  • In a similar fashion to peptide mass
    fingerprinting, the predicted fragment ion masses
    for each peptide of a database sequence are
    calculated
  • The calculated and observed ion masses are
    compared and given a score
  • Individual peptide scores are combined to give a
    protein score
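
Predicted fragment ion masses can be calculated from residue masses. This sketch uses standard monoisotopic residue masses and assumes singly charged, unmodified b and y ions only; the names are hypothetical and it is not a description of how MASCOT computes or scores fragments.

```python
# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i] covers the first i + 1 residues from the N terminus;
    y_ions[i] covers the last i + 1 residues from the C terminus.
    """
    masses = [RESIDUE_MASS[residue] for residue in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for mass in masses[:-1]:
        running += mass
        b_ions.append(running + PROTON)
    running = 0.0
    for mass in reversed(masses[1:]):
        running += mass
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions
```

Each b ion and its complementary y ion add up to the peptide mass plus two protons, which is a useful consistency check on an assigned spectrum.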

51
Problems with MS/MS data
  • The number of daughter ion types produced may be
    large and depends on the machine and analytic
    procedure used
  • Searches tend to be run with a "no enzyme"
    option, which introduces a large number of
    calculations as peptide boundaries cannot be
    predicted
  • Residue modifications are far more difficult to
    handle, as the number of mass permutations is
    very large

62
Databases Searched with MS/MS Data
  • Non-identical protein databases are the ideal
  • EST databases translated in 6 frames are very
    useful as individual peptides may be identified
  • Translated nucleic acid databases
  • Non-redundant databases create problems
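
Six-frame translation of a nucleic acid sequence can be sketched as below, using the standard genetic code. The function names are hypothetical; codons containing characters other than A, C, G and T are translated as X, and this is not a description of how MASCOT translates databases.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate one reading frame; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, usable, 3))


def six_frame_translations(dna):
    """Return the three forward frames, then the three reverse-complement frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:])
            for strand in (dna, reverse)
            for offset in range(3)]
```

Short EST reads rarely cover a whole protein, but any frame that contains a matching peptide is enough for an MS/MS identification.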

63
De-Novo Sequencing
  • If the protein is still not identified, the
    sequence of a peptide has to be reconstructed
    from the MS/MS data
  • Very time consuming and demands a great deal of
    skill; noisy data is very problematic
  • Sequencing is carried out by finding mass
    differences between peaks that correspond to
    amino acid masses
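
Reading a sequence from mass differences can be sketched as follows. The sketch assumes a clean, complete ladder from a single ion series and standard monoisotopic residue masses; real spectra with noise and missing peaks need far more care. The function name is hypothetical.

```python
# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peaks, tolerance_da=0.5):
    """For each gap between consecutive peaks, list the residues that fit it."""
    ordered = sorted(peaks)
    calls = []
    for low, high in zip(ordered, ordered[1:]):
        gap = high - low
        calls.append([residue for residue, mass in RESIDUE_MASS.items()
                      if abs(gap - mass) <= tolerance_da])
    return calls
```

For the peak masses 256.1, 327.3, 426.2 and 555.5, the gaps of about 71.2, 98.9 and 129.3 Da fit Ala, Val and Glu. A gap near 113.08 Da fits both Leu and Ile, and a gap near 128.07 Da fits both Gln and Lys, which is the isobaric problem.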

64
Tags
  • Make it easy to find the initial masses in the ladder
  • Tags modify the fragmentation of the peptide
  • Reduce isobaric problems
  • Neutralise the adverse effects of certain
    residues on peptide fragmentation

65
Example Tags
66
(Figure: annotated MS/MS spectrum showing the y-ion series and b-ion series; mass axis from 500 to 2000, intensity axis from 20 to 100; labelled peaks include 1571.45, 1725.29 and 1814.37)
Descending mass ladder: 2044.9 1933.7 1862.7 1763.7 1634.6 1521.6 1450.4 1321.4 1220.3 1106.5 959.4 831.3 760.4 689.3 618.2 517.0 446.3 375.1
Residues: Gln Ala Val Glu Xle Ala Glu Thr Asn Phe Gln Ala Ala Ala Thr Ala Ala Thr Lys
Ascending mass ladder: 256.1 327.3 426.2 555.5 668.2 739.2 868.2 969.2 1083.1 1230.6 1358.3 1429.4 1500.4 1571.5 1672.3 1743.5 1814.4 1915.5
68
Automation
Automation is critical to maintain a high
throughput of samples. It is essential to produce
closer integration of machine control and data
analysis software
  • New generation of Mass Spectrometers, quadrupole
    machines with LASER sources
  • Laboratory Information Management Systems
  • Automated sample preparation

69
Laboratory Information Management System
(Workflow diagram; boxes:)
  • Mass Spectrometer
  • Data Reduction / Peak Processing
  • Submission into Microarray/Proteomics database
  • MASCOT Search Engine
  • Re-search after database updates
  • Protein Identified
  • Protein not Identified
  • Automatic report generation for sample submitter
    via WWW
  • Results database
70
Future of MASCOT
  • Homology searching
  • Post processing of results for easier
    interpretation
  • Distributed processing - Linux cluster. MASCOT is
    based on the Boss/Worker model so is easy to port
  • Development of a standard API to allow simpler
    automation and extensions to functionality

71
MASCOT Homology Searching
  • Identification is dependent on at least some of
    the peptide sequences being identical to a
    database sequence
  • Homology searching (for instance allowing common
    substitutions to occur by default) would overcome
    this limitation
  • Would lead to less selectivity and also increased
    search times

72
Post processing of Results
  • Allows easier interpretation by, e.g. removing
    all identical peptide matches from the report
    page
  • Text mining to interpret the results of a search,
    for instance: are all the proteins identified
    involved in a particular cellular process?
  • Important when dealing with quantitative studies

73
Distributed Processing
  • Ability to use as much processing power as
    possible when dealing with high throughput data,
    for instance the thousands of peptides from LC
    MS/MS
  • Implemented in MASCOT using an MPI-style mechanism
  • Has the ability to dynamically add/remove
    processors for data processing

74
Processing Farm
75
Standard Programming Interface
  • A standard interface to MASCOT routines allowing
    users to, e.g., produce a bespoke interface
  • Allows integration with instrument control
    software (although this is dependent on the
    goodwill of the manufacturers!)

76
MSDB developments
  • Inclusion of variable splicing regions from
    SWISSPROT
  • Integration of textual information from all
    source databases
  • Clustering of highly similar sequences into
    families with extra annotation
  • Inclusion of more translations from nucleic acid
    databases

77
Identification of proteins using short peptide
sequences
  • FASTS is the most commonly used tool at the
    moment, but it is relatively slow and doesn't
    take into account peptide masses and other
    information
  • New functionality for MASCOT based on tri-peptide
    indices and using mass and residue modification
    information

78
MS/MS Data Mining
  • MS/MS data may contain useful information in
    addition to sequence
  • Statistical methods for mining MS/MS data for,
    e.g., fragmentation efficiency
  • Predictive tool for de-novo sequencing
  • Understanding of physical/chemical processes
    involved in fragmentation

79
PEDRo
  • Software and schemata for modelling, capturing
    and disseminating proteomics experimental data
  • The lack of such a system hinders the handling,
    exchange and dissemination of proteomics data
  • Implemented in XML
  • Analogous to the MIAME guidelines for
    transcriptomics
  • http://pedro.man.ac.uk

80
PEDRo Schema
81
Matrix Science
  • Dr. John Cottrell
  • Dr. David Creasy
  • URL: http://www.matrixscience.com

82
Imperial College
  • Prof. Darryl Pappin (now director of research
    ABI)
  • Dr Mike Bartlet-Jones
  • Dr Inga Bellahn
  • URL: http://csc-fserve.hh.med.ic.ac.uk