Title: Canadian Bioinformatics Workshops
3Module 1 Introduction to Metabolomics
- David Wishart
- Informatics and Statistics for Metabolomics
- June 15-16, 2015
4Learning Objectives
- To define metabolomics and the size of the metabolome(s)
- To appreciate the importance and potential applications of metabolomics
- To understand the operational principles of key metabolomics technologies (LC, GC, MS and NMR)
- To understand the difference between targeted and untargeted metabolomics
5Schedule
6The Pyramid of Life
Metabolomics Proteomics Genomics
Metabolome
Environmental Influence
Physiological Influence
Proteome
Genome
7What is Metabolomics?
- Genomics - A field of life science research that uses High Throughput (HT) technologies to identify and/or characterize all the genes in a given cell, tissue or organism (i.e. the genome).
- Metabolomics - A field of life science research that uses High Throughput (HT) technologies to identify and/or characterize all the small molecules or metabolites in a given cell, tissue or organism (i.e. the metabolome).
8What is a Metabolite?
- Any organic molecule detectable in the body with a MW < 1500 Da
- Includes peptides, oligonucleotides, sugars, nucleosides, organic acids, ketones, aldehydes, amines, amino acids, lipids, steroids, alkaloids, foods, food additives, toxins, pollutants, drugs and drug metabolites
- Includes human and microbial products
- Concentration > detectable limit (1 pM)
9What is a Metabolome?
- The complete collection of small molecule metabolites in a cell, organ, tissue or organism
- Includes endogenous and exogenous molecules as well as transient or even theoretical molecules
- Defined by the detection technology
- Metabolome size is always ill-defined
10Different Metabolomes
All Mammals All Microbes All Plants
60,000 Chemicals
100,000 Chemicals
300,000 Chemicals
11Human Metabolomes (2015)
3670 (T3DB), 1240 (DrugBank), 28500 (FooDB), 1550 (DrugBank), 19700 (HMDB)
Concentration axis: M, mM, µM, nM, pM, fM
12Theoretical Human Metabolomes
100,000 (Lipidome), 10,000 (Drug metabolome), 100,000 (Food metabolome), 10,000 (Secondome)
Concentration axis: M, mM, µM, nM, pM, fM
13Why is Metabolomics Important?
14Small Molecules Count
- >95% of all diagnostic clinical assays test for small molecules
- 89% of all known drugs are small molecules
- 50% of all drugs are derived from pre-existing metabolites
- 30% of identified genetic disorders involve diseases of small molecule metabolism
- Small molecules serve as cofactors and signaling molecules to 1000s of proteins
15Metabolites Are the Canaries of the Genome
A single base change can lead to a 10,000X change
in metabolite levels
16Metabolomics is More Time Sensitive Than Other Omics
[Figure: response versus time curves for metabolomics, proteomics and genomics]
17Metabolism is Understood
18The Metabolome is Connected to all other Omes
Metabolome
Proteome
Genome
19The Metabolome is Connected to All Other Omes
- Small molecules (i.e. AMP, CMP, GMP, TMP) are the primary constituents of the genome and transcriptome
- Small molecules (i.e. the 20 amino acids) are the primary constituents of the proteome
- Small molecules (i.e. lipids) give cells their shape, form, integrity and structure
- Small molecules (sugars, lipids, AAs, ATP) are the source of all cellular energy
- Small molecules serve as cofactors and signaling molecules for both the proteome and the genome
- The genome and proteome largely evolved to catalyze the chemistry of small molecules
20Metabolomics Enables Systems Biology
Bioinformatics
Metabolomics
Cheminformatics
Systems Biology
Proteomics
Genomics
21Metabolomics Applications
- Genetic Disease Tests
- Nutritional Analysis
- Clinical Blood Analysis
- Clinical Urinalysis
- Cholesterol Testing
- Drug Compliance
- Transplant Monitoring
- MRS and CS imaging
- Toxicology Testing
- Clinical Trial Testing
- Fermentation Monitoring
- Food and Beverage Tests
- Nutraceutical Analysis
- Drug Phenotyping
- Water Quality Testing
- Petrochemical Analysis
22Metabolomics Methods
23Metabolomics Workflow
Biological or Tissue Samples
Extraction
Biofluids or Extracts
Chemical Analysis
Data Analysis
24Comparing Omics Coverage
Metabolomics Proteomics Genomics
200 Chemicals
Completeness
5000 Proteins
22,000 Genes
25Why Metabolomics is Difficult
Metabolomics Proteomics Genomics
2 x 10^5 Chemicals
Chemical Diversity
20 Amino acids
4 Bases
26Metabolomics Technologies
- UPLC, HPLC
- CE/microfluidics
- LC-MS
- FT-MS
- QqQ-MS
- NMR spectroscopy
- X-ray crystallography
- GC-MS
- FTIR
27Chromatography
28Chromatography
- The separation of components in a mixture that involves passing the mixture dissolved in a "mobile phase" through a stationary phase, which separates the analyte to be measured from other molecules in the mixture based on differential partitioning between the mobile and stationary phases
- Column, thin layer, liquid, gas, affinity, ion exchange, size exclusion, reverse phase, normal phase, gravity, high pressure
29High Pressure (Performance) Liquid Chromatography
- HPLC
- Developed in 1970s
- Uses high pressures (6000 psi) and smaller (5 µm), pressure-stable particles
- Allows compounds to be detected at ppt (parts per trillion) level
- Allows separation of many types of polar and nonpolar compounds
30HPLC Modalities
- Reversed phase: for separation of non-polar molecules (non-polar stationary phase, polar mobile phase)
- Normal phase: for separation of non-polar molecules (polar stationary phase, non-polar/organic mobile phase)
- HILIC (hydrophilic interaction liquid chromatography): for separation of polar molecules (polar stationary phase, mixed polar/nonpolar mobile phase)
31HPLC Columns
32Reverse Phase Column
33HPLC Separation Efficiency
34HPLC Schematic
35Gradient HPLC Schematic
36HPLC of a Biological Mixture
37Gas Chromatography
38Gas Chromatography
- Involves a sample being vaporized to a gas and injected into a column
- Sample is transported through the column by an inert gas mobile phase
- Column has a liquid or polymer stationary phase that is adsorbed to the surface of a metal tube
- Columns are 1.5-10 m in length and 2-4 mm in internal diameter
- Samples are usually derivatized with TMS to make them volatile
39TMS Derivatization
40Gas Chromatography
41GC-Columns
Polysiloxane
42Retention Time/Index
- Retention time (RT) is the time taken by an analyte to pass through a column
- RT is affected by compound, column (dimensions and stationary phase), flow rate, pressure, carrier, temp.
- Comparing RT from a standard sample to an unknown allows compound ID
- Retention index (RI) is the retention time normalized to the retention times of adjacently eluting n-alkanes
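The retention index definition above can be sketched as follows. This is a minimal illustration that assumes the linear (temperature-programmed) form of the index, in which the analyte RT is interpolated between the two bracketing n-alkanes; the slide does not say which form is intended, and the function name and alkane ladder values are hypothetical.

```python
from bisect import bisect_right

def retention_index(rt, alkane_rts):
    """Linear retention index from the two n-alkanes that bracket the analyte.

    alkane_rts maps carbon number -> retention time (same units as rt).
    Assumes retention time increases with carbon number.
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    # Index of the alkane eluting just before (or together with) the analyte.
    i = bisect_right(times, rt) - 1
    if i < 0 or i >= len(carbons) - 1:
        raise ValueError("analyte is not bracketed by two n-alkanes")
    n, t_n, t_next = carbons[i], times[i], times[i + 1]
    step = carbons[i + 1] - n
    return 100 * (n + step * (rt - t_n) / (t_next - t_n))

# Hypothetical alkane ladder (minutes); the values are illustrative only.
ladder = {10: 5.0, 11: 6.0, 12: 7.5}
ri = retention_index(6.75, ladder)  # halfway between C11 and C12
```

Because the index is expressed relative to the alkanes, it should change less than the raw RT when flow rate or column length changes.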
43Compound Identification and Quantification
44GC-MS Chromatogram of a Biological Mixture
45Mass Spectrometry
- Analytical method to measure the molecular or
atomic weight of samples
46Typical Mass Spectrometer
47MS Principles
- Different compounds can be uniquely identified by their mass
Butorphanol: MW 327.1
L-dopa: MW 197.2
Ethanol (CH3CH2OH): MW 46.1
48Mass Spectrometry
- For small organic molecules the MW can be determined to within 1 ppm or 0.0001%, which is sufficiently accurate to confirm the molecular formula from mass alone
- For large biomolecules the MW can be routinely determined within an accuracy of 0.002% (i.e. within 1 Da for a 40 kD protein)
- Recall 1 dalton = 1 atomic mass unit (1 amu)
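The accuracy figures above are simple ratios, and a short sketch makes the arithmetic explicit. The function names are hypothetical; the only assumption is the usual definition of ppm error as the mass difference divided by the theoretical mass, times 10^6.

```python
def ppm_error(measured, theoretical):
    """Mass error in parts per million relative to the theoretical mass."""
    return (measured - theoretical) / theoretical * 1e6

def mass_tolerance(mass, ppm):
    """Absolute mass window (Da) corresponding to a ppm tolerance."""
    return mass * ppm / 1e6

# 0.002% is 20 ppm; for a 40 kDa protein that is a window of about 0.8 Da.
protein_window = mass_tolerance(40000.0, 20.0)

# 1 ppm on a 327.1 Da small molecule is a window of about 0.0003 Da.
small_molecule_window = mass_tolerance(327.1, 1.0)
```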
49Different Types of MS
- GC-MS - Gas Chromatography MS
- separates volatile compounds in gas column and IDs by mass
- LC-MS - Liquid Chromatography MS
- separates delicate compounds in HPLC column and IDs by mass
- MS-MS - Tandem Mass Spectrometry
- separates compound fragments by magnetic or electric fields and IDs by mass fragment patterns
50Masses in MS
- Monoisotopic mass is the mass determined using the masses of the most abundant isotopes
- Average mass is the abundance weighted mass of all isotopic components
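The two mass definitions can be compared with a small sketch, here for ethanol (C2H6O). The element tables hold standard monoisotopic and average atomic masses for C, H, N and O only; the helper name and the dictionary-based formula representation are assumptions made for illustration.

```python
# Mass of the most abundant isotope of each element (Da).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
# Abundance-weighted (average) atomic mass of each element (Da).
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

def formula_mass(composition, table):
    """Sum atomic masses for a composition given as {element: count}."""
    return sum(table[element] * count for element, count in composition.items())

ethanol = {"C": 2, "H": 6, "O": 1}
mono = formula_mass(ethanol, MONOISOTOPIC)  # about 46.042
avg = formula_mass(ethanol, AVERAGE)        # about 46.069
```

The average mass rounds to the 46.1 quoted for ethanol on the MS Principles slide, while a high-resolution instrument would report the monoisotopic peak near 46.042.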
51Isotopic Distributions
1H 99.9% | 2H 0.02%
12C 98.9% | 13C 1.1%
35Cl 75.8% | 37Cl 24.2%
52Isotopic Distributions
[Figure: isotopic distribution spectrum; relative peak intensities 100, 32.1, 6.6, 2.1, 0.06, 0.00 plotted against m/z]
53Mass Spec Principles
Sample -> Ionizer -> Mass Analyzer -> Detector
54Typical Mass Spectrum
aspirin
55Typical Mass Spectrum
- Characterized by sharp, narrow peaks
- X-axis position indicates the m/z ratio of a given ion (for singly charged ions this corresponds to the mass of the ion)
- Height of peak indicates the relative abundance of a given ion (not reliable for quantitation)
- Peak intensity indicates the ion's ability to desorb or fly (some fly better than others)
56Resolution Resolving Power
- Width of peak indicates the resolution of the MS instrument
- The better the resolution or resolving power, the better the instrument and the better the mass accuracy
- Resolving power is defined as M/ΔM
- M is the mass number of the observed mass; ΔM is the difference between two masses that can be separated
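As a rough sketch of the definition, the smallest separable mass difference at a given resolving power follows directly from M/ΔM. The function names are hypothetical and the example mass is the monoisotopic value shown on the Resolution/Resolving Power slide.

```python
def resolving_power(mass, delta_m):
    """Resolving power R = M / delta_M."""
    return mass / delta_m

def min_separable_difference(mass, power):
    """Smallest mass difference (Da) separable at mass M for a given R."""
    return mass / power

mass = 3482.7473
# At R = 1000 peaks about 3.5 Da apart can be separated;
# at R = 30000 the limit drops to about 0.12 Da.
low_res = min_separable_difference(mass, 1000)
high_res = min_separable_difference(mass, 30000)
```

Since isotope peaks of a singly charged ion are about 1 Da apart, the first setting would blur the isotopic envelope of this ion and the second would resolve it.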
57Resolution in MS
58Resolution in MS
Low resolution Instrument (Ion trap)
2847
High resolution Instrument (TOF)
59Resolution/Resolving Power
MW(mono) = 3482.7473, MW(ave) = 3484
Blue: M/ΔM = 1000; Red: M/ΔM = 3000; Green: M/ΔM = 10000; Black: M/ΔM = 30000
60Mass Spectrometer Schematic
61Different Ionization Methods
- Electron Ionization (EI - Hard method)
- Small molecules, 1-1000 Daltons, structure
- Chemical Ionization (CI - Semi-hard)
- Small molecules, 1-1000 Daltons, simple spectra
- Electrospray Ionization (ESI - Soft)
- Small molecules, peptides, proteins, up to 200,000 Daltons
- Matrix Assisted Laser Desorption (MALDI - Soft)
- Smallish molecules, peptides, proteins, DNA, up to 500 kD
63Electron Impact Ionization
- Sample introduced into instrument by heating it until it evaporates
- Gas phase sample is bombarded with electrons coming from rhenium or tungsten filament (energy 70 eV)
- Molecule is shattered into fragments (70 eV >> 5 eV bonds)
- Fragments sent to mass analyzer
- Most commonly used in GC-MS
64EI Fragmentation of CH3OH
[Figure: EI fragmentation scheme for CH3OH, showing the fragments CH2OH, CH3, OH, CHOH and H]
65Electron Impact MS of CH3OH
Molecular ion
EI Breaks up Molecules in Predictable Ways
66Soft Ionization Methods
MALDI: 337 nm UV laser, cyano-hydroxy cinnamic acid
ESI: fluid (no salt), gold tip needle
67Electrospray (Detail)
68Electrospray (Detail)
69Electrospray Ionization
- Sample dissolved in polar, volatile buffer (no salts) and pumped through a stainless steel capillary (70 - 150 µm) at a rate of 10-100 µL/min
- Strong voltage (3-4 kV) applied at tip along with flow of nebulizing gas causes the sample to nebulize or aerosolize
- Aerosol is directed through regions of higher vacuum until droplets evaporate to near atomic size (still carrying charges)
70Electrospray Ionization
5% H2O / 95% CH3CN
95% H2O / 5% CH3CN
100 V, 1000 V, 3000 V
71Electrospray Ionization
- Can be modified to nanospray system with flow < 1 µL/min
- Very sensitive technique, requires less than a picomole of material
- Strongly affected by salts and detergents
- Positive ion mode measures (M + H)+ (add formic acid to solvent)
- Negative ion mode measures (M - H)- (add ammonia to solvent)
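The ions measured in the two modes can be related to the neutral monoisotopic mass with a short sketch. It assumes simple protonation and deprotonation only (no sodium or other adducts), uses the proton mass of about 1.007276 Da, and the function names and the glucose example are illustrative choices.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton (hydrogen atom minus one electron)

def mz_positive(neutral_mass, charge=1):
    """m/z of the (M + zH)z+ ion seen in positive ion mode."""
    return (neutral_mass + charge * PROTON_MASS) / charge

def mz_negative(neutral_mass, charge=1):
    """m/z of the (M - zH)z- ion seen in negative ion mode."""
    return (neutral_mass - charge * PROTON_MASS) / charge

glucose = 180.06339  # monoisotopic mass of C6H12O6 in Da
pos = mz_positive(glucose)  # about 181.0707
neg = mz_negative(glucose)  # about 179.0561
```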
72Mass Spectrometer Schematic
73Different Types of Mass Analyzers
- Magnetic Sector Analyzer (MSA)
- High resolution, exact mass, original MA
- Quadrupole Analyzer (Q)
- Low (1 amu) resolution, fast, cheap
- Time-of-Flight Analyzer (TOF)
- No upper m/z limit, high throughput
- Ion Cyclotron Resonance (FT-ICR)
- Highest resolution, exact mass, costly
74MS Mass Accuracy
Type | Mass Accuracy
FT-ICR-MS | 0.1 - 1 ppm
Orbitrap | 0.5 - 1 ppm
Magnetic Sector | 1 - 2 ppm
TOF-MS | 3 - 5 ppm
Q-TOF | 3 - 5 ppm
Triple Quad | 3 - 5 ppm
Linear IonTrap | 50-200 ppm (10 ppm in Ultra-Zoom)
75Mass Chromatograms
- Standard output from an LC-MS or GC-MS experiment
- X-axis is retention time, Y-axis is signal intensity
- Total Ion Current (TIC) chromatogram is summed intensity across the entire range of masses being detected at every point in the analysis
- Base Peak chromatogram (BPC) is like a TIC but displays only the most intense peak in each spectrum
- Extracted Ion chromatogram (EIC) contains one or more analytes extracted from the TIC or BPC
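The three chromatogram types can be sketched from a list of scans. This is a minimal illustration that assumes each scan is a (retention time, list of (m/z, intensity) pairs) tuple and that an EIC is built by summing intensity inside an m/z window around a target ion; the data layout, function names and example values are all hypothetical.

```python
def tic(scans):
    """Total ion current: summed intensity of every peak in each scan."""
    return [(rt, sum(i for _, i in peaks)) for rt, peaks in scans]

def bpc(scans):
    """Base peak chromatogram: most intense peak in each scan."""
    return [(rt, max((i for _, i in peaks), default=0.0)) for rt, peaks in scans]

def eic(scans, target_mz, tol):
    """Extracted ion chromatogram: intensity within target_mz +/- tol."""
    return [
        (rt, sum(i for mz, i in peaks if abs(mz - target_mz) <= tol))
        for rt, peaks in scans
    ]

# Two illustrative scans: (retention time in min, [(m/z, intensity), ...]).
scans = [
    (0.5, [(100.0, 10.0), (200.0, 30.0)]),
    (1.0, [(100.0, 50.0), (200.0, 5.0), (300.0, 20.0)]),
]
total = tic(scans)
base = bpc(scans)
extracted = eic(scans, 100.0, 0.01)
```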
76Mass Chromatograms of Biological Mixtures
Tomato Extract
Arabidopsis Extract
77NMR Spectroscopy
78Explaining NMR
79Principles of NMR
- Measures nuclear magnetism or changes in nuclear magnetism in a molecule
- NMR spectroscopy measures the absorption of light (radio waves) due to changes in nuclear spin orientation
- NMR only occurs when a sample is in a strong magnetic field
- Different nuclei absorb at different energies (frequencies)
80Protons (and other nucleons) Have Spin
Spin up Spin down
81Each Spinning Proton is Like a Mini-Magnet
Spin up Spin down
82Principles of NMR
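[Figure: a nuclear spin between the N and S poles of a magnet moving between a low energy state and a high energy state by absorbing or emitting hν]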
83Bigger Magnets are Better
Increasing magnetic field strength
low frequency high frequency
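The link between field strength and frequency can be made concrete with the Larmor relation, in which the resonance frequency is proportional to the field. The sketch below assumes the 1H gyromagnetic ratio divided by 2π of about 42.577 MHz per tesla; the function name and the example field strengths are illustrative.

```python
GAMMA_1H_MHZ_PER_T = 42.577  # 1H gyromagnetic ratio / (2 * pi), in MHz per tesla

def proton_frequency_mhz(field_tesla):
    """1H resonance (Larmor) frequency in MHz at a given field strength."""
    return GAMMA_1H_MHZ_PER_T * field_tesla

# A 14.1 T magnet corresponds to a "600 MHz" spectrometer,
# and an 18.8 T magnet to an "800 MHz" spectrometer.
freq_14 = proton_frequency_mhz(14.1)
freq_18 = proton_frequency_mhz(18.8)
```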
84A Modern NMR Instrument
Radio Wave Transceiver
85NMR Magnet
86NMR Magnet Cross-Section
Sample Bore
Cryogens
Magnet Coil
Magnet Legs
Probe
87An NMR Probe
88NMR Sample Probe Coil
89 1H NMR Spectra Exhibit...
- Chemical Shifts (peaks at different frequencies or ppm values)
- Splitting Patterns (from spin coupling)
- Different Peak Intensities (proportional to the number of 1H)
90Chemical Shifts
- Key to the utility of NMR in chemistry
- Different 1H in different molecules exhibit different absorption frequencies
- Each compound can be defined by a unique pattern of chemical shifts (a fingerprint)
- Chemical shifts are mostly affected by electronegativity of neighbouring atoms, bonds or groups
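Chemical shifts are usually reported in ppm rather than Hz so that the fingerprint does not depend on the magnet. A minimal sketch, assuming the standard definition (offset from the reference peak in Hz divided by the spectrometer frequency in MHz) and hypothetical function and variable names:

```python
def chemical_shift_ppm(peak_hz, reference_hz, spectrometer_mhz):
    """Chemical shift in ppm relative to a reference peak (e.g. TMS or DSS).

    The offset is in Hz and the spectrometer frequency in MHz, so the
    ratio is already in parts per million.
    """
    return (peak_hz - reference_hz) / spectrometer_mhz

# The same proton appears at a different offset in Hz on different magnets
# but at the same ppm value.
shift_500 = chemical_shift_ppm(1000.0, 0.0, 500.0)
shift_800 = chemical_shift_ppm(1600.0, 0.0, 800.0)
```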
91Characteristic Chemical Shifts
92Assigning Simple NMR Spectra
TMS
93Assigning Simple NMR Spectra
94NMR Spectra Need Fixin
Before
After
Baseline correction
Water suppression
Referencing
Shimming
Phasing
95NMR Spectra Need Fixin
- Chemical shift referencing (TMS, DSS)
- Calibrates/normalizes chemical shifts
- Shimming
- Fixes line shape to look Lorentzian
- Phasing
- Fixes line shape to look absorptive
- Water suppression/removal
- Removes large water signal
- Baseline correction
- Makes spectrum look flat not wobbly
96NMR Spectrum of a Biological Mixture
97Technology Sensitivity
[Figure: metabolites or features detected (log10 scale, 0 to 4) versus sensitivity or LDL (M, mM, µM, nM, pM, fM) for NMR, GC-MS Quad, GC-MS TOF and LC-MS or DI-MS, ranging from knowns to unknowns]
98Comparison
Techniques | NMR (with cold probe) | GC-MS | DI-MS
Metabolites | Water-soluble (amino acids, organic acids, sugars) | Mainly water-soluble (some hydrophobic) | Mainly hydrophobic (some water-soluble)
Types of samples | Biofluids, plant, bacterial, animal tissue extracts, food | Biofluids, plant, bacterial, animal tissue extracts, food | Mainly biofluids
Sample volume | 100 µL (min) | 30-50 µL (min) | 10 µL
99Comparison
Techniques | NMR | GC-MS | DI-MS
Sample prep time | 30-120 min/20 samples | 30-120 min/20 samples | 3-4 h for 96 samples
Run time | 10-90 min/sample | 30-60 min/sample | 7 min/sample
Data analysis | 30-60 min/sample | 30-60 min/sample | 1-2 h for 96 samples
Limit of detection | 5 µM | 100 nM | 5 nM
No. of metabolites | 20-150 | 20-50 | 100-180
Overlapping metabolites | 10-15 | 10-15 | 10-15
Cross-checking | 10-30 | 10-30 | 10-30
100What's Possible
- NMR-based metabolomics (50-200 metabolites identified/quantified, µM sensitivity)
- GC-MS based metabolomics (70-120 metabolites identified/quantified, <µM sensitivity)
- DI-MS based metabolomics (180 metabolites identified/quantified, nM sensitivity)
- LC-MS based metabolomics (300-500 metabolites identified/quantified, nM sensitivity)
- Lipidomics (3000 lipids identified and semi-quantified, nM sensitivity)
- Specialty phytochemical, nutrient, drug and pesticide analysis (mostly HPLC, nM sensitivity)
101 2 Routes to Metabolomics
Quantitative (Targeted) Methods
Chemometric (Profiling) Methods
102Profiling (Untargeted)
Data Reduction
Data Collection
Sample Prep
Metabolite Identification
103Quantitative (Targeted)
Sample Prep
Biological Interpretation
Data Reduction
Metabolite Identification Quantification
104From Spectra to Lists
105From Lists to Pathways
106From Pathways and Lists to Models and Biomarkers
107Key Informatics Challenges in Metabolomics
- Spectra -> Lists
- Data integrity and quality
- Data alignment and normalization
- Data reduction and classification
- Assessment of significance
- Metabolite identification/quantification
- Lists -> Pathways and Biomarkers
- Pathway mapping and identification
- Biological interpretation
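As one small example of the normalization step listed above, a sample's metabolite intensities can be scaled to their total so that samples of different overall concentration become comparable. This is only a sketch of one common choice (normalization to total signal); the slides do not prescribe a method, and the function name and example values are hypothetical.

```python
def normalize_to_total(sample):
    """Scale each metabolite value by the sample's total signal.

    sample maps metabolite name -> intensity or concentration.
    Returns fractions that sum to 1.
    """
    total = sum(sample.values())
    if total == 0:
        raise ValueError("sample has zero total signal")
    return {name: value / total for name, value in sample.items()}

# Illustrative values only: two samples where the second is twice as dilute.
sample_a = {"glucose": 4.0, "lactate": 1.0, "alanine": 3.0}
sample_b = {"glucose": 2.0, "lactate": 0.5, "alanine": 1.5}
norm_a = normalize_to_total(sample_a)
norm_b = normalize_to_total(sample_b)
```

After normalization the two samples give the same profile, which is the effect one would want for a pure dilution difference such as variable urine concentration.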