Title: Glycomics
1 Glycomics
2 What is Glycomics?
- Glycomic analyses seek to understand how a collection of glycans relates to a particular biological event.
- Glycomes can far exceed proteomes and transcriptomes in complexity.
  - Some estimates have placed the vertebrate glycome at more than one million discrete structures.
- Many aspects of glycobiology can be understood only with a systems-level analysis:
  - glycomic changes during development and cancer progression
  - many glycan-binding proteins (GBPs) are oligomerized on cells and interact with multivalent arrays of glycans on opposing cells
  - multiple discrete glycan epitopes work in concert to engage two cells or to deliver a signal from one cell to the other
3 Virtual glycomes
- More than 250 glycosyltransferases have been found encoded in the human genome, as well as many nucleotide sugar biosynthetic enzymes and Golgi transporters.
- Expression patterns of many glycosyltransferases have been determined in human and mouse tissues using northern blots, quantitative PCR and transcriptomic analyses.
- Constructing virtual glycomes from this information is of limited value, because the combinatorial action of glycosyltransferases in many competing biosynthetic pathways makes the complete glycome very difficult to predict with any accuracy:
  - The glycome can change dramatically in response to a subtle change in the cellular system.
  - Glycan synthesis is not isolated in the cell (other enzymes may compete for common intermediates).
  - Dietary monosaccharide supply varies.
4 Tools to characterize the glycome
- Mass spectrometry
- Lectin and antibody arrays
- Cell and tissue analysis using lectins and antibodies
- Imaging the glycome by metabolic and covalent labeling
- Glycan arrays
- Comparative glycomics
5 Mass spectrometry
- Glycoprotein- or glycolipid-enriched samples are prepared from cell lysates and subsequently analyzed by (tandem) mass spectrometry.
- For glycoproteins, the N-glycans can be selectively released enzymatically or chemically, separated by HPLC (high-pressure liquid chromatography), and then sequenced. The O-glycans are released chemically and sequenced as well.
- Glycolipids can often be sequenced directly, without separation of the lipid component.
- Glycosaminoglycans are difficult to analyze because of their large size.
  - Small fragments can be sequenced by mass spectrometry in conjunction with enzymatic digestion.
6 Glycan profiling for biomarker discovery
7 Mass spectrometry pros and cons
- Advantages: high throughput; any given glycan subtype can be profiled at once.
- Disadvantages:
  - At present there is no method by which highly complex samples containing many glycan subtypes can be analyzed in a single mass spectrometry experiment.
  - Many MS experiments partially or completely destroy the sample, or miss potentially important modifications such as sulfation and O-acetylation.
8 Glycan sequencing using MS
- Database searching: matching fragment ions in an MS/MS spectrum
  - Glycofragment / GlycoSearchMS against SweetDB
  - GlycosidIQ against GlycoSuite
  - Cannot identify new glycan structures; the number of known glycan structures is limited
- De novo sequencing
  - Enumerating all possible structures: STAT, StrOligo, OSCAR
    - Only suitable for small glycans
    - Can be used for MSn analysis
  - Dynamic programming algorithm
  - Linkage prediction from fragment ions
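The database-search idea above can be sketched as a simple peak-matching score. This is a toy illustration only: the candidate names, fragment m/z lists and 0.5 m/z tolerance are invented, and the scoring does not reproduce how Glycofragment, GlycoSearchMS or GlycosidIQ actually rank spectra. It also makes the listed limitation concrete: a structure that is absent from `database` can never be returned.

```python
# Toy database search for glycan MS/MS identification (all data hypothetical).
# Each candidate is represented only by a list of theoretical fragment m/z values;
# a real search engine would generate these from the candidate structure.

def count_matches(observed, theoretical, tol=0.5):
    """Number of theoretical fragments that have an observed peak within tol (m/z)."""
    return sum(1 for t in theoretical if any(abs(t - o) <= tol for o in observed))

def rank_candidates(observed, database, tol=0.5):
    """Return (name, score) pairs sorted from best to worst match."""
    scores = [(name, count_matches(observed, frags, tol)) for name, frags in database.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical database entries and spectrum (m/z values invented for illustration).
database = {
    "candidate_A": [366.1, 528.2, 690.2, 893.3],
    "candidate_B": [366.1, 569.2, 731.3, 893.3],
}
spectrum = [204.1, 366.1, 528.2, 690.3, 893.3]

ranking = rank_candidates(spectrum, database)
print(ranking)
```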
9 Glycan MS/MS spectra
10 De novo glycan sequencing problem
- Monosaccharide sequence
- Branching structure
- Linkages
Glycan MS/MS spectra
Tang, H. et al., ISMB 2005
11 Solution: dynamic programming algorithm
Tang, H. et al., ISMB 2005
12 Using cross-ring ions to score paths
13 Structural characterization of glycans
- The challenge: isoforms with different linkages (and sometimes different branching structures)
- The mere occurrence of cross-ring fragment ions is not sufficient to distinguish glycan isoforms.
- Solution: scoring functions that distinguish the different linkages of isoformal glycans
14 Relative intensities of cross-ring fragments are different
Dextran (1-6)
Maltooligosaccharides (1-4)
15 Ranking ion types by intensity
Dextran_glc9_Oligosaccharide5
Maltooligosaccharides_glc10_Oligosaccharide6
16 Two-tailed Student's t-tests based on the ranks of cross-ring fragment ions
Ion type   Probability   Checks (1,4 / 1,6)
0,2A       0.448175
0,2X       0.001822      v v
0,3A       0.85771
0,3X       0.947922
0,4A       4.33E-25      v
0,4X       2.98E-25      v
1,4A       9.93E-07      v
1,4X       4.80E-04      v
1,5A       0.315954
1,5X       0.442356
2,4A       2.28E-27      v
2,4X       1.05E-08      v
2,5A       0.21049
2,5X       0.011821
3,5A       8.70E-10      v v
3,5X       0.270859
- 1,3A/1,3X ions are not considered, since they do not occur in the fragmentation of either 1,6 or 1,4 linkages.
- The cross-ring ion types with significantly different ranks were used to distinguish 1,6 from 1,4 linkages.
- The checked ion types are used in the later rank comparison for linkage discrimination.
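Slides 15 and 16 rank ion types by intensity within each spectrum and then compare those ranks between the two linkage classes with a two-tailed Student's t-test. A minimal sketch of that computation follows, assuming pooled variance; the intensities and the placeholder ion types `ionA`, `ionB`, `ionC` are invented for illustration and are not measured data.

```python
from math import sqrt
from statistics import mean, variance

def rank_ion_types(spectrum):
    """Rank the ion types of one spectrum by intensity (1 = most intense)."""
    ordered = sorted(spectrum, key=spectrum.get, reverse=True)
    return {ion: rank for rank, ion in enumerate(ordered, start=1)}

def ranks_of(ion_type, spectra):
    """Rank of one ion type in each spectrum of a group."""
    return [rank_ion_types(s)[ion_type] for s in spectra]

def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Invented intensities for three placeholder ion types in each linkage class.
group_16 = [
    {"ionA": 90, "ionB": 10, "ionC": 40},
    {"ionA": 45, "ionB": 20, "ionC": 50},
    {"ionA": 70, "ionB": 15, "ionC": 60},
]
group_14 = [
    {"ionA": 5, "ionB": 80, "ionC": 30},
    {"ionA": 35, "ionB": 70, "ionC": 30},
    {"ionA": 8, "ionB": 60, "ionC": 40},
]

t, df = student_t(ranks_of("ionA", group_16), ranks_of("ionA", group_14))
print(f"t = {t:.3f}, df = {df}")
```

The sketch stops at the t statistic; a two-tailed p-value would come from the t distribution with `df` degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available. Ties in intensity are broken by dictionary order here, which a real analysis would need to handle explicitly.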
17 Rank-based discriminant analysis
18 Mass spectrometry for glycoproteomics
- Mapping the sites at which glycans are attached to the underlying protein scaffold (i.e., glycoproteomic analysis)
- Slow reaction: collision-induced dissociation (CID)
  - Fragmentation mainly at the glycosidic bonds
- Fast reaction: photodissociation, electron transfer dissociation (ETD)
  - Fragmentation mainly at the peptide bonds
- High energy: higher-energy collisional dissociation (HCD), scanning at low m/z
  - Small oligosaccharide fragments
19 Site-specific protein glycosylation analysis using mass spectrometry
- LC/MS (ion trap) identification of co-eluted clusters of peptide glycoforms (CPGs), i.e., glycopeptides with various glycans attached to the same peptide backbone
- LC/MS/MS identification of glycopeptides based on their fragmentation patterns
20 Identification of CPGs from MS1: problem formulation
- Peptide backbone mass: 1000
- Masses of glycoforms: 200, 250, 300, 350, 400, 450
- Masses of peptide glycoforms (expected to be observed): 1200, 1250, 1300, 1350, 1400, 1450
- Masses observed (including other ions, with some expected ions missing): 1100, 1200, 1210, 1250, 1290, 1300, 1310, 1370, 1380, 1400, 1430, 1450, 1490
21 Identification of CPGs from MS1: spectrum convolution
- Masses observed, Y: 1100, 1200, 1210, 1250, 1290, 1300, 1310, 1370, 1380, 1400, 1430, 1450, 1490
- Masses of glycan forms, X: 200, 250, 300, 350, 400, 450
- Spectrum convolution $Y \ominus X = \{\, y - x : y \in Y,\ x \in X \,\}$, one row per observed mass y:
  650, 700, 750, 800, 850, 900
  750, 800, 850, 900, 950, 1000
  760, 810, 860, 910, 960, 1010
  800, 850, 900, 950, 1000, 1050
  840, 890, 940, 990, 1040, 1090
  850, 900, 950, 1000, 1050, 1100
  860, 910, 960, 1010, 1060, 1110
  920, 970, 1020, 1070, 1120, 1170
  930, 980, 1030, 1080, 1130, 1180
  950, 1000, 1050, 1100, 1150, 1200
  980, 1030, 1080, 1130, 1180, 1230
  1000, 1050, 1100, 1150, 1200, 1250
  1040, 1090, 1140, 1190, 1240, 1290
- The peptide mass 1000 has the highest multiplicity (5) in $Y \ominus X$.
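The convolution on this slide is easy to reproduce. The sketch below uses the slide's own toy masses, represents the multiset $Y \ominus X$ with `collections.Counter`, reads off the most frequent difference as the inferred peptide mass, and then lists the glycan forms that this backbone mass explains.

```python
from collections import Counter

def spectrum_convolution(observed, glycan_masses):
    """Multiset Y (-) X = {y - x : y in Y, x in X}, returned as a Counter."""
    return Counter(y - x for y in observed for x in glycan_masses)

Y = [1100, 1200, 1210, 1250, 1290, 1300, 1310, 1370, 1380, 1400, 1430, 1450, 1490]
X = [200, 250, 300, 350, 400, 450]

conv = spectrum_convolution(Y, X)
peptide_mass, multiplicity = conv.most_common(1)[0]
print(peptide_mass, multiplicity)  # 1000 5

# Glycan forms supported by the inferred backbone mass (1350 is missing from Y).
supported = [x for x in X if peptide_mass + x in Y]
print(supported)  # [200, 250, 300, 400, 450]
```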
22 Implementing spectrum convolution: practical issues
- Incorporating peak intensity into the scoring of the spectrum convolution
- The same glycopeptide may carry different charges.
- There may be more than one cluster of sister glycopeptides co-eluting in the same LC window.
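One possible way to handle the first two issues is to convert each peak to candidate neutral masses over several charge states and let peak intensities, rather than counts, accumulate in the convolution. This is a sketch under our own assumptions: ions are protonated [M+zH]z+, charges 1 to 2 are considered, differences are binned at 1 Da, and the peak list is invented.

```python
from collections import defaultdict

PROTON = 1.007276  # proton mass in Da

def neutral_masses(mz, max_charge):
    """Candidate neutral masses of one [M+zH]z+ peak for charges 1..max_charge."""
    return [(mz - PROTON) * z for z in range(1, max_charge + 1)]

def weighted_convolution(peaks, glycan_masses, max_charge=2, bin_width=1.0):
    """Sum peak intensities into bins of (neutral mass - glycan mass)."""
    scores = defaultdict(float)
    for mz, intensity in peaks:
        for mass in neutral_masses(mz, max_charge):
            for g in glycan_masses:
                scores[round((mass - g) / bin_width) * bin_width] += intensity
    return scores

# Toy peaks: glycopeptides of neutral mass 1200 (2+), 1250 (1+) and 1300 (2+).
peaks = [(601.007276, 10.0), (1251.007276, 20.0), (651.007276, 30.0)]
scores = weighted_convolution(peaks, [200, 250, 300])
best = max(scores, key=scores.get)
print(best, scores[best])
```

Enumerating every charge state creates spurious candidates (here, for example, a mass near 2500 Da from the 1+ peak read as 2+), which the intensity-weighted score tolerates in this toy case. When several clusters co-elute in one LC window, one would inspect more than the single top-scoring bin.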
23 Identification of individual glycopeptides from their fragmentation (MS/MS) spectra
Find the largest subset of peaks such that the mass difference between any two consecutive peaks corresponds to the mass of a monosaccharide.
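The peak-subset problem above is a longest-path problem over peaks sorted by mass, so it can be solved by dynamic programming in O(n²) time. A minimal sketch follows, assuming singly charged fragments, a 0.02 Da tolerance and four common monosaccharide residue masses (monoisotopic); the peak list is invented, and this is an illustration of the formulation rather than the authors' implementation.

```python
# Monoisotopic residue masses (Da) of common monosaccharides.
RESIDUES = {
    "Hex": 162.0528,
    "HexNAc": 203.0794,
    "dHex": 146.0579,
    "NeuAc": 291.0954,
}

def residue_for(delta, tol):
    """Name of the monosaccharide whose residue mass matches delta, else None."""
    for name, mass in RESIDUES.items():
        if abs(delta - mass) <= tol:
            return name
    return None

def longest_glycan_ladder(peaks, tol=0.02):
    """Longest chain of peaks whose consecutive gaps are monosaccharide masses."""
    peaks = sorted(peaks)
    if not peaks:
        return [], []
    best = [1] * len(peaks)     # best[i]: length of the longest valid chain ending at peak i
    prev = [None] * len(peaks)  # prev[i]: predecessor of peak i in that chain
    for i in range(len(peaks)):
        for j in range(i):
            if residue_for(peaks[i] - peaks[j], tol) and best[j] + 1 > best[i]:
                best[i], prev[i] = best[j] + 1, j
    # Backtrack from the peak that ends the longest chain.
    i = max(range(len(peaks)), key=best.__getitem__)
    chain = []
    while i is not None:
        chain.append(peaks[i])
        i = prev[i]
    chain.reverse()
    labels = [residue_for(b - a, tol) for a, b in zip(chain, chain[1:])]
    return chain, labels

# Invented peaks: an ion at 1000.0 extended by HexNAc, HexNAc, Hex, plus two noise peaks.
peaks = [1406.1588, 1000.0, 1300.5, 1203.0794, 1568.2116, 1100.0]
chain, labels = longest_glycan_ladder(peaks)
print(chain)
print(labels)
```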
24 Challenges of direct glycoproteomic analysis
- Augmented ion complexity
  - The number of ion species is multiplied.
- Instrument duty cycle
  - Glycopeptides often correspond to less abundant ions than peptides.
25 Our approach
- We developed a new experimental protocol using
  - iterative (replicated) experiments, to overcome the duty-cycle limit
  - a time-segmented inclusion list, to replace intensity-based ion selection
26 Motivation of dynamic analysis
- Intensity-based ion selection is inflexible.
- Many glycopeptides have lower-than-average intensity in real samples.
- Dynamic exclusion helps a little, but the duty cycle is an inevitable limit.
27 Motivation of dynamic analysis
28 Our approach
- We select ions by their glycomic features, not by intensity.
- We call this Targeted Data-Dependent Acquisition (TDDA).
Pick labeled ions
29 Our approach
- The time-segmented inclusion list is available on the Thermo LTQ-MS instrument.
- TDDA is implemented using the time-segmented inclusion list.
User interface of the time-segmented inclusion list.
30 Selecting ions for the inclusion list
- Group ions into clusters of putative micro-heterogeneities.
- Insert the ions of large clusters into the inclusion list.
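A possible way to carry out these two steps is sketched below, under assumptions that are ours rather than the slide's: ions are given as (neutral mass, retention time in minutes) pairs; two ions belong to the same cluster if they co-elute within 2 min and differ by one Hex, HexNAc, dHex or NeuAc residue; clusters with at least three members count as "large"; and each included ion gets a ±1 min time segment. The output is a plain list of tuples, not the instrument's inclusion-list file format.

```python
# Hypothetical clustering criteria; masses are assumed already charge-deconvoluted.
RESIDUE_MASSES = (162.0528, 203.0794, 146.0579, 291.0954)  # Hex, HexNAc, dHex, NeuAc

def related(a, b, mass_tol, rt_tol):
    """True if two (mass, rt) ions co-elute and differ by one monosaccharide residue."""
    (m1, t1), (m2, t2) = a, b
    if abs(t1 - t2) > rt_tol:
        return False
    return any(abs(abs(m1 - m2) - r) <= mass_tol for r in RESIDUE_MASSES)

def clusters(ions, mass_tol=0.02, rt_tol=2.0):
    """Connected components of the 'differs by one residue' graph (depth-first search)."""
    seen, groups = set(), []
    for start in range(len(ions)):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            group.append(ions[i])
            for j in range(len(ions)):
                if j not in seen and related(ions[i], ions[j], mass_tol, rt_tol):
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups

def inclusion_list(ions, min_size=3, window=1.0):
    """(mass, rt_start, rt_end) entries for the ions of clusters with >= min_size members."""
    entries = []
    for group in clusters(ions):
        if len(group) >= min_size:
            entries.extend((m, t - window, t + window) for m, t in group)
    return sorted(entries)

# Invented ions: a three-member glycoform cluster near 30 min plus two unrelated ions.
ions = [(1200.0, 30.0), (1362.0528, 30.5), (1565.1322, 31.0), (900.0, 30.2), (1365.0, 60.0)]
for entry in inclusion_list(ions):
    print(entry)
```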
31 Iterative experiments applying TDDA
- Sample preparation
- Initial experiment
- Software analysis → Terminate?
  - Yes: the end
  - No: inclusion list → experiment with inclusion list → software analysis again
32 CID vs. ETD: complementary fragmentation
J. M. Hoga, Journal of Proteome Research 2005, 4(2), 628-632
33 Lectin and antibody arrays
34 Lectin array pros and cons
- Cons: provides global information about the types of glycan epitopes present in the sample, but no detailed structural information, nor information about which proteins the glycans are attached to.
- Pros: high-throughput platform (allows rapid comparison of many glycomes in search of global changes that might motivate further mass spectrometry studies)
35 Imaging the glycome
36 Specific expression of glycans
Gagneux and Varki, Glycobiology 9, 747-755 (1999)
37 Glycan arrays
- Profiling GBPs, e.g., plant GBPs, viral antigens, and GBPs of the innate and adaptive immune systems
- DC-SIGN and DC-SIGNR C-type lectins
  - DC-SIGN is expressed on dendritic cells and plays a key role in the adhesion of T cells as well as in the recognition of pathogens such as HIV.
  - The two lectins share 77% sequence identity, but have distinct ligand specificities.
- Influenza virus hemagglutinin (HA)
  - The HA glycoprotein mediates host-cell recognition.
  - Human viral HA preferentially recognizes glycans terminated by NeuAcα2-6Gal, whereas avian HA preferentially recognizes glycans containing NeuAcα2-3Gal.
  - The target upper airway epithelial cells in humans contain mainly NeuAcα2-6Gal, whereas in birds both the airways and the intestine contain mainly NeuAcα2-3Gal linkages.
38 Hemagglutinin (viral lectin)
- The influenza virus hemagglutinin was the first GBP isolated from a microorganism (1950).
- 3D structure determined in 1981 (Wiley)
  - complex structure with sialyllactose
- Such GBPs mainly bind to terminal residues; some can bind to internal sequences found in linear or branched glycans.
- These interactions can be highly selective.
  - For example, the human influenza viruses bind primarily to cells containing Siaα2-6Gal linkages, whereas other animal and bird influenza viruses preferentially bind to Siaα2-3Gal termini.
  - Influenza C, in contrast, binds preferentially to glycoproteins containing terminal 9-O-acetylated sialic acids.
- Many other viruses (e.g., reovirus, rotavirus, Sendai virus, and polyomavirus) also appear to use sialic acids in specific linkages for infection.
- Other viruses display glycosaminoglycan-binding proteins that can bind to heparan sulfate proteoglycans, often with high specificity for certain sulfated sequences.
39 Glycan topology determines human adaptation of avian H5N1 virus hemagglutinin (HA)
- A key determinant of the transmission and virulence of influenza viruses is the binding of HA to sialylated glycans on the epithelial cell surface.
- Transmission from birds to humans is believed to be closely associated with the ability of HA to switch its preference from α2-3 sialylated glycans (α2-3) to α2-6 sialylated glycans (α2-6), which are extensively expressed in the human upper respiratory epithelia.
- Glycan-array studies of the glycan-binding specificity of wild-type and mutant H1, H3 and H5 HAs have shown confounding results.
- The relationship between HA glycan-binding specificity and transmission efficiency has been demonstrated with the highly pathogenic and virulent 1918 H1N1 viruses.
- Switching the receptor-binding specificity of the highly transmissible and pathogenic human H1N1 virus A/South Carolina/1/18 (SC18) from α2-6 to α2-3 produced a virus (AV18) that is not transmissible. Although the A/New York/1/18 (NY18) H1N1 virus, which shows mixed α2-3/α2-6 binding, does not transmit efficiently, the A/Texas/36/91 (Tx91) H1N1 strain, which also binds to both α2-3 and α2-6, transmits efficiently.
40 Data mining of glycan array data
- HA binding array
- Extracted features
Chandrasekaran et al., Nature Biotechnology 26, 107-113 (2008)
42 Data mining of glycan array data
- Correlations between glycan features and HA binding to these glycans are given as logistic regression classifiers.
- Variations around the trisaccharide α2-3 motif primarily influence the differential α2-3 binding of H1, H3 and H5 HAs.
- The α2-6 classifier common to the human-adapted H1 and H3 HAs is consistent with their gain in ability to bind long α2-6 glycans. Although the glycan binding of wild-type and mutant H5N1 HAs is not supported by the long α2-6 classifier, it is consistent with both the α2-3 and the short α2-6 classifiers.
Chandrasekaran et al., Nature Biotechnology 26, 107-113 (2008)
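The slide reports the feature-binding relationships as logistic regression classifiers. As a generic illustration of what such a classifier computes (not the features, data or coefficients of Chandrasekaran et al.), the sketch below fits a two-feature logistic regression by batch gradient descent on invented binding labels. Both features are hypothetical binary indicators: a terminal α2-6 sialic acid and a long poly-lactosamine branch.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def predict(weights, bias, features):
    """Predicted probability that a glycan with these features is bound."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def train(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for features, label in zip(X, y):
            err = predict(weights, bias, features) - label  # d(log-loss)/d(logit)
            grad_b += err
            for k in range(d):
                grad_w[k] += err * features[k]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

# Hypothetical features: [terminal alpha2-6 sialic acid, long poly-lactosamine branch]
X = [[1, 1], [1, 1], [1, 0], [0, 1], [0, 0], [1, 0]]
y = [1, 1, 0, 0, 0, 0]  # invented binding labels
weights, bias = train(X, y)
for features in ([1, 1], [1, 0], [0, 1], [0, 0]):
    print(features, round(predict(weights, bias, features), 3))
```

With these invented labels, the fitted model only predicts binding when both features are present, mirroring in miniature how a classifier can encode a requirement for long α2-6 glycans.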
43 Combining with other evidence
- Predominant expression of α2-6 glycans in the human upper respiratory epithelium, and expression of long oligosaccharide branches with multiple lactosamine repeats on the apical side of the upper respiratory epithelia
- Structural analysis of HA and glycans suggested the existence of two topologies: the cone-like topology is characteristic of α2-3 as well as short α2-6 glycans, such as single-lactosamine branches, whereas the umbrella-like topology is unique to α2-6 and is typically adopted by long glycans with multiple repeating lactosamine units.
- Both SC18 and Mos99 HAs show substantial and preferential binding to the apical side of the tracheal tissue, compared with deep lung tissue.
Chandrasekaran et al., Nature Biotechnology 26, 107-113 (2008)
44 Cone-like (left) and umbrella-like (right) topologies of α2-3 and α2-6 sialylated glycans binding to influenza viral HAs
Chandrasekaran et al., Nature Biotechnology 26, 107-113 (2008)