NoDupe algorithm to detect and group similar mass spectra - PowerPoint PPT Presentation

Slides: 22
Provided by: syst50

1
NoDupe algorithm to detect and group similar
mass spectra.
2
Reducing the number of similar spectra in
proteomic experiments: why?
  • Identifying peptides from spectral collections is
    time-consuming.
  • Detecting similarities reduces the number of
    spectra to be processed.
  • The dynamic exclusion feature of the mass
    spectrometer does not eliminate all duplicate
    spectra, because:
  • a. Peptides may elute over a period of time.
  • b. The peptide mixture may be highly complex.

3
MS/MS spectra from the same peptide may look
different because of:
  • Differences in signal-to-noise ratio.
  • Variations in collision energy.
  • Random noise.

4
Finding the degree of similarity between two spectra
  • A dot-product comparison is used to measure
    similarity.
  • A vector is built for each spectrum.
  • Greater angles between the vectors imply greater
    differences between the spectra.
  • Angles near zero imply high similarity.
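The dot-product comparison can be illustrated with a short Python sketch. It assumes each spectrum has already been turned into a sparse vector, represented here as a hypothetical dictionary from bin index to intensity; the normalized dot product is the cosine of the angle between the two vectors, so the angle is its arccosine.

```python
import math


def spectral_angle(a: dict[int, float], b: dict[int, float]) -> float:
    """Angle (radians) between two sparse intensity vectors keyed by bin index."""
    # Dot product only needs the bins present in both spectra
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty spectrum shares nothing with anything: treat as maximally dissimilar
        return math.pi / 2
    cos_theta = dot / (norm_a * norm_b)
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos
    return math.acos(max(-1.0, min(1.0, cos_theta)))


# Same peak pattern at double the intensity -> angle 0; no shared peaks -> pi/2
print(spectral_angle({100: 3.0, 200: 4.0}, {100: 6.0, 200: 8.0}))  # 0.0
print(spectral_angle({100: 1.0}, {300: 1.0}))  # 1.5707963267948966
```

Because the vectors are normalized, the angle depends only on the relative peak pattern, not on the overall signal strength of each spectrum.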

5
NoDupe Algorithm
  • Implemented in the Java programming language.
  • Spectra are grouped based on their similarities.
  • Preprocessing is done to reduce spectral
    complexity.
  • Optionally removes duplicate spectra from each LC
    run, retaining only one representative spectrum
    per group.

6
NoDupe: Preprocessing
  • All fragment ions in a run are assigned to bins
    1.0057 m/z wide.
  • Intensities of peaks falling into the same bin
    are summed.
  • Peak intensities are normalized by the sum of
    the intensities of all peaks.
  • Smaller peaks are emphasized.
  • Peaks of very low intensity are removed.
  • The sum of the square roots of the intensities
    is calculated.
  • Only significant peaks are retained; the rest
    are discarded.
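A minimal Python sketch of these preprocessing steps follows. Only the 1.0057 m/z bin width comes from the slide; the `min_intensity` and `keep_fraction` thresholds, the use of square roots to emphasize smaller peaks, and the rule of keeping the largest peaks until they cover a fixed fraction of the square-root sum are assumptions made for illustration, not NoDupe's published parameters.

```python
import math
from collections import defaultdict

BIN_WIDTH = 1.0057  # bin width in m/z, as stated on the slide


def preprocess(peaks, min_intensity=1e-3, keep_fraction=0.9):
    """Return (binned spectrum, number of binned peaks removed).

    peaks: list of (mz, intensity) pairs.
    min_intensity and keep_fraction are hypothetical thresholds.
    """
    # 1. Assign each fragment ion to a fixed-width bin and sum intensities per bin
    bins = defaultdict(float)
    for mz, intensity in peaks:
        bins[int(mz // BIN_WIDTH)] += intensity
    total = sum(bins.values())
    if total == 0:
        return {}, len(bins)
    # 2. Normalize by total intensity, then drop very low peaks
    norm = {k: v / total for k, v in bins.items() if v / total >= min_intensity}
    # 3. Square roots compress large peaks, which emphasizes smaller ones (assumed)
    roots = {k: math.sqrt(v) for k, v in norm.items()}
    root_sum = sum(roots.values())
    # 4. Keep the strongest peaks until they cover keep_fraction of the root sum
    kept, running = {}, 0.0
    for k, r in sorted(roots.items(), key=lambda kv: kv[1], reverse=True):
        if running >= keep_fraction * root_sum:
            break
        kept[k] = r
        running += r
    return kept, len(bins) - len(kept)


# Two ions share a bin; the weak ion at 250 m/z is dropped by the coverage rule
kept, removed = preprocess([(100.0, 50.0), (100.5, 50.0), (250.0, 1.0), (400.0, 100.0)])
print(sorted(kept), removed)  # [99, 397] 1
```

Returning the number of removed peaks alongside the spectrum keeps that count available for the tie-breaking rule used later when a representative spectrum is selected.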

7
Results of preprocessing
8
NoDupe: Finding similarities
  • Scans are sorted by precursor m/z.
  • Spectral contrast angles are calculated for pairs
    of spectra whose precursor m/z values lie within
    3 m/z of each other:
    $\cos\theta = \dfrac{\sum i_a i_b}{\sqrt{\sum i_a^2 \sum i_b^2}}$
  • $i_a$: peak intensity in spectrum A
  • $i_b$: peak intensity in spectrum B
  • $\theta$: spectral contrast angle
  • For identical spectra, $\theta = 0$.
  • For completely dissimilar spectra, $\theta = \pi/2$.
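The pairing step can be sketched in Python as follows, assuming preprocessed spectra are stored as hypothetical `Scan` records. The angle is the arccosine of the normalized dot product of the two intensity vectors. Because the scans are sorted by precursor m/z, the inner loop can stop as soon as the difference exceeds the 3 m/z window, so distant pairs are never compared.

```python
import math
from typing import NamedTuple


class Scan(NamedTuple):
    scan_id: int
    precursor_mz: float
    peaks: dict  # bin index -> preprocessed intensity (hypothetical layout)


def contrast_angle(a, b):
    """Spectral contrast angle: arccos of the normalized dot product."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    denom = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    if denom == 0:
        return math.pi / 2  # an empty spectrum is treated as completely dissimilar
    return math.acos(max(-1.0, min(1.0, dot / denom)))


def candidate_angles(scans, window=3.0):
    """Yield (id_a, id_b, angle) for every pair whose precursor m/z differ by at most window."""
    ordered = sorted(scans, key=lambda s: s.precursor_mz)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Sorted order means every later scan is even farther away, so stop here
            if b.precursor_mz - a.precursor_mz > window:
                break
            yield a.scan_id, b.scan_id, contrast_angle(a.peaks, b.peaks)
```

For example, with scans at precursor m/z 500.2, 502.0 and 510.0, only the first two are compared, because the third lies more than 3 m/z away from both.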

9
Spectral contrast angles
10
The similarity angle cutoff is taken as 1.1 radians (contrast angles range from $0$ to $\pi/2$).
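How NoDupe turns similar pairs into groups is not detailed here, so the Python sketch below makes an explicit assumption: any two spectra whose contrast angle is below the 1.1 cutoff are linked, and linked spectra are merged into one group with a union-find structure (single-linkage grouping). The function `group_spectra` and its input format of `(id_a, id_b, angle)` triples are hypothetical.

```python
ANGLE_CUTOFF = 1.1  # similarity cutoff from the slide, assumed to be in radians


def group_spectra(scan_ids, angle_pairs, cutoff=ANGLE_CUTOFF):
    """Union-find grouping: spectra linked by any angle below cutoff share a group."""
    parent = {s: s for s in scan_ids}

    def find(x):
        # Path halving keeps the union-find trees shallow
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, theta in angle_pairs:
        if theta < cutoff:  # strictly below the cutoff counts as similar (assumption)
            parent[find(a)] = find(b)

    groups = {}
    for s in scan_ids:
        groups.setdefault(find(s), []).append(s)
    return list(groups.values())


# 1-2 and 2-3 are similar; 3-4 is above the cutoff, so 4 stays alone
print(group_spectra([1, 2, 3, 4], [(1, 2, 0.4), (2, 3, 0.9), (3, 4, 1.3)]))  # [[1, 2, 3], [4]]
```

Single linkage is simple, but it can chain spectra that are not directly similar to each other (scans 1 and 3 above); a stricter rule would require every pair in a group to pass the cutoff.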
11
NoDupe: Selecting representative spectra
  • A match count is calculated for each spectrum.
  • Duplicates are detected based on the match count.
  • Ties are broken based on the number of peaks
    removed during preprocessing.
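The selection rule can be sketched in Python as below. The definition of the match count (the number of other spectra a spectrum is similar to) and the tie-break direction (the scan with fewer peaks removed wins) are assumptions for illustration; the slides only state that the match count and the number of removed peaks are used. Flipping the sign in the sort key gives the opposite tie-break.

```python
def match_counts_from_pairs(scan_ids, angle_pairs, cutoff=1.1):
    """Count, for each scan, how many other scans it is similar to (assumed definition)."""
    counts = {s: 0 for s in scan_ids}
    for a, b, theta in angle_pairs:
        if theta < cutoff:  # below the similarity angle cutoff
            counts[a] += 1
            counts[b] += 1
    return counts


def choose_representative(group, match_counts, peaks_removed):
    """Highest match count wins; ties go to the scan with fewer peaks removed (assumed)."""
    return max(group, key=lambda s: (match_counts[s], -peaks_removed[s]))


# Hypothetical pair of scans: equal match counts, so the tie-break decides
counts = match_counts_from_pairs([10, 11], [(10, 11, 0.5)])
print(choose_representative([10, 11], counts, {10: 30, 11: 25}))  # 11
```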

12
Samples used
  • Gel band sample: protein complex from a stable
    HEK 293 cell line.
  • Microtubule-associated protein (MAP) sample: MAP
    purified from bovine brains.
  • Rat hippocampus sample: protein from rat brains.
  • Sample complexity varied from 18.3 to 34.6
    spectra/min.

13
Experimental process
  • LC separations were performed for all three
    samples.
  • The 2to3 algorithm was applied to remove spectral
    copies with incorrect charge state assignments.
  • NoDupe was then used to reduce the number of
    spectra.

14
Observations
  • A large number of peaks were removed.
  • For the peptide VAAPEEHPVLLTEAPLNPK,
    approximately 70% of the peaks in the spectra
    were removed, and both the number of peaks and
    the relative standard deviation diminished.
  • The relative standard deviation diminished from
    26% to 20%.
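For reference, the relative standard deviation is the standard deviation divided by the mean, expressed as a percentage. The Python sketch below assumes the sample standard deviation is meant; the input list is made-up illustrative data, not values from the study.

```python
import statistics


def relative_std_dev(values):
    """Relative standard deviation (%) = sample standard deviation / mean * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Made-up peak counts for two replicate spectra, for illustration only
print(round(relative_std_dev([8, 12]), 2))  # 28.28
```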

15
Observations: Clusters
  • The average cluster size was found to be around
    4 spectra.
  • Pairs of spectra were the most common kind of
    cluster.
  • Two-thirds of the spectra were not significantly
    similar to any other spectrum.
  • High-confidence peptides were lost when duplicate
    spectra were removed.

16
Identifications lost
  • 4% to 14% of the identifications were lost.
  • Without removing the duplicate spectra, 5% to 19%
    of the identifications were lost.
  • The angle was found to be 0.847.

17
For a group of size 2
  • Since there are only two spectra in this group,
    the more representative one is chosen.
  • Scan 491 is chosen because only 21 of its peaks
    remain, as opposed to 24.
  • Since pairs are common, there might be a
    significant loss of protein identifications.

18
Lost spectra
NoDupe did not find scan 4892 to be similar
enough.
19
Duplicate spectra and peptides identified
20
Where it can be used
  • Grouping spectra results in substantial time
    savings.
  • Instead of finding the best sequence for each
    spectrum separately, the search finds the
    sequence that best matches the spectra in a group.
  • The time savings are greater when the database is
    large.
  • A narrower mass window can be used.
  • Random matching is alleviated.
  • Spectral libraries will be more effective if they
    contain representative spectra rather than
    randomly chosen ones.
  • Spectra that are in the same group but receive
    different identifications by de novo
    interpretation can be flagged.
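Flagging such groups is straightforward once groups and identifications are available. In this Python sketch the function name `flag_conflicting_groups` and the input layout (a list of groups of scan IDs and a dictionary from scan ID to peptide sequence) are hypothetical, and the sequences in the usage example are made up.

```python
def flag_conflicting_groups(groups, identifications):
    """Return the groups whose members received more than one distinct identification.

    groups: list of lists of scan IDs; identifications: scan ID -> peptide sequence.
    Scans without an identification are ignored.
    """
    flagged = []
    for group in groups:
        sequences = {identifications[s] for s in group if s in identifications}
        if len(sequences) > 1:
            flagged.append(group)
    return flagged


# Hypothetical groups and de novo calls: only the second group disagrees
groups = [[1, 2], [3, 4, 5]]
calls = {1: "PEPTIDEK", 2: "PEPTIDEK", 3: "SAMPLER", 4: "SAMPLEK"}
print(flag_conflicting_groups(groups, calls))  # [[3, 4, 5]]
```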

21
Acknowledgments
  • The paper presented was "Similarity among tandem
    mass spectra from proteomic experiments:
    detection, significance, and utility" by David L.
    Tabb, Michael J. MacCoss, Christine C. Wu, Scott
    D. Anderson, and John R. Yates.
  • Thanks to Prof. Haixu Tang for guiding me.