Title: Designing a high quality metabolomics experiment
1Designing a high quality metabolomics experiment
- Grier P Page, Ph.D.
- Senior Statistical Geneticist
- RTI International
- Atlanta Office
- gpage_at_rti.org
- 770-407-4907
2Metabolomics is Powerful and Central
3 4Errors Errors Everywhere
6UMSA Analysis
Day 1
Day 2
Insulin Resistant
Insulin Sensitive
7Primary consideration of good experimental design
- Understand the strengths and weaknesses of each step of the experiment.
- Take these strengths and weaknesses into account in your design.
9From Drug Discov Today. 2005 Sep 1;10(17):1175-82.
10- State the Question and Articulate the Goals
11The Myth That Metabolomics Does Not Need a Hypothesis
- There always needs to be a biological question in the experiment. If there is not even a question, don't bother.
- The question can be nebulous, e.g. "What happens to the metabolome of this tissue when I apply Drug A?"
- The purpose of the question is to drive the experimental design.
- Make sure the samples answer the question: cause vs. effect.
13Design Issues
- Known sources of non-biological error (not exhaustive) that must be addressed:
- Technician / post-doc
- Reagent lot
- Temperature
- Protocol
- Date
- Location
- Cage/ Field positions
14 15Biological replication is essential.
- Two types of replication
- Biological replication: samples from different individuals are analyzed
- Technical replication: the same sample is measured repeatedly
- Technical replicates allow only the effects of measurement variability to be estimated and reduced, whereas biological replicates allow this to be done for both measurement variability and biological differences between cases. Almost all experiments that use statistical inference require biological replication.
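The point about the two kinds of replication can be illustrated with the standard variance formula for a group mean under a simple nested model (individuals within a group, repeated measurements within an individual). This is only a sketch: the function name and the variance values are assumptions chosen for illustration, not figures from the talk.

```python
def variance_of_group_mean(var_biological, var_technical, n_biological, n_technical):
    """Variance of a group mean under a simple nested model.

    Each of n_biological individuals is measured n_technical times.
    Technical replicates shrink only the measurement term; the
    biological term shrinks only when more individuals are sampled.
    """
    if n_biological < 1 or n_technical < 1:
        raise ValueError("replicate counts must be at least 1")
    return var_biological / n_biological + var_technical / (n_biological * n_technical)


# Hypothetical variance components (arbitrary units).
VAR_BIO, VAR_TECH = 4.0, 1.0

# 40 measurements spent on 4 individuals, 10 technical replicates each.
more_technical = variance_of_group_mean(VAR_BIO, VAR_TECH, n_biological=4, n_technical=10)
# 10 measurements spent on 10 individuals, measured once each.
more_biological = variance_of_group_mean(VAR_BIO, VAR_TECH, n_biological=10, n_technical=1)

print(more_technical, more_biological)
```

With these assumed values, ten individuals measured once give a more precise group mean than four individuals measured ten times each, despite using a quarter of the measurements.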
16How many replicates?
- Controlled experiments (cell lines, mice, rats): 8-12 per group.
- Human studies (discovery): 20 per group
- For predictive models: 100 per group; need model building and validation sets
- The more the better, always.
17- Experimental Conduct
- All experiments are subject to non-biological
variability that can confound any study
18Control Everything!
- Know what you are doing
- Practice!
- Practice!
19- What if you can't control or make all things uniform?
20What are Orthogonalization and Randomization?
- Orthogonalization: spreading the biological sources of error evenly across the non-biological sources of error.
- Maximally powerful for known sources of error.
- Randomization: spreading the biological sources of error at random across the non-biological sources of error.
- Useful for controlling for unknown sources of error
21Examples of Orthogonalization and Randomization
Randomize
Order  Sample
1      7
2      6
3      4
4      1
5      2
6      8
7      5
8      3
The experiment
Sample  Treatment  Variety
1       1          1
2       1          2
3       1          1
4       1          2
5       2          1
6       2          2
7       2          1
8       2          2
Orthogonalize
Order  Sample
1      1
2      2
3      5
4      6
5      8
6      7
7      4
8      3
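The two strategies can be sketched in a few lines of Python using the eight-sample experiment from the slide. This is an illustrative sketch, not the procedure used to build the slide: the function names, the choice of two blocks (for example, two run days) and the seeds are all assumptions.

```python
import random
from collections import defaultdict

# The eight samples from the slide: sample -> (treatment, variety).
SAMPLES = {
    1: (1, 1), 2: (1, 2), 3: (1, 1), 4: (1, 2),
    5: (2, 1), 6: (2, 2), 7: (2, 1), 8: (2, 2),
}


def randomized_order(samples, seed=None):
    """Run order drawn completely at random."""
    rng = random.Random(seed)
    order = list(samples)
    rng.shuffle(order)
    return order


def orthogonalized_order(samples, n_blocks=2, seed=None):
    """Run order in which every block (e.g. a run day) holds each
    treatment/variety combination equally often. Order within a
    block is random."""
    rng = random.Random(seed)
    by_combo = defaultdict(list)
    for sample, combo in samples.items():
        by_combo[combo].append(sample)
    blocks = [[] for _ in range(n_blocks)]
    for members in by_combo.values():
        if len(members) % n_blocks:
            raise ValueError("each combination must divide evenly across blocks")
        rng.shuffle(members)
        # Deal the samples of this combination out across the blocks.
        for i, sample in enumerate(members):
            blocks[i % n_blocks].append(sample)
    for block in blocks:
        rng.shuffle(block)
    return [sample for block in blocks for sample in block]


print(randomized_order(SAMPLES, seed=1))
print(orthogonalized_order(SAMPLES, n_blocks=2, seed=1))
```

Randomizing within each orthogonalized block combines the two ideas: known nuisance factors are balanced by design, and unknown ones are left to chance.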
22Statistical analyses have assumptions too
23Statistical analyses
- Supervised analyses: linear models, etc.
- Assume IID (independent and identically distributed)
- Normality
- Sometimes can rely on the central limit theorem
- Weird variances
- Using fold change alone as a statistic is not valid.
- Shrinkage and/or use of Bayes can be a good thing.
- False-discovery rate is a good alternative to conventional multiple-testing approaches.
- Pathway testing is desirable.
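As one concrete example of false-discovery-rate control, the Benjamini-Hochberg step-up adjustment can be written in a few lines. The talk does not name a specific FDR procedure, so the choice of Benjamini-Hochberg here is an assumption, and the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-metabolite p-values.
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)
significant = [i for i, value in enumerate(q) if value <= 0.05]
print(q, significant)
```

In this made-up example four of the five raw p-values fall below 0.05, but only the first two remain below 0.05 after adjustment.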
24Classification
- Supervised classification
- Supervised-classification procedures require independent cross-validation.
- See MAQC-II recommendations: Nat Biotechnol. 2010 Aug;28(8):827-838. doi:10.1038/nbt.1665.
- Wholly separate model building and validation stages. Can be three-stage when multiple models are tested.
- Unsupervised classification
- Unsupervised classification should be validated using resampling-based procedures.
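Keeping model building and validation wholly separate starts with setting the validation samples aside before any feature selection or model fitting. The following is a minimal sketch of a stratified split; the function name, the 30% validation fraction and the class labels are assumptions for illustration, not part of the MAQC-II recommendations.

```python
import random


def split_build_validate(labels, validation_fraction=0.3, seed=None):
    """Stratified split into model-building and validation indices.

    The validation indices should be set aside before any feature
    selection or model fitting and used once, at the end.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    build, validate = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Hold out at least one sample per class.
        n_validate = max(1, round(len(indices) * validation_fraction))
        validate.extend(indices[:n_validate])
        build.extend(indices[n_validate:])
    return sorted(build), sorted(validate)


# Hypothetical two-class study with ten samples per class.
labels = ["resistant"] * 10 + ["sensitive"] * 10
build_idx, validate_idx = split_build_validate(labels, validation_fraction=0.3, seed=42)
print(len(build_idx), len(validate_idx))
```

Any cross-validation used to tune or compare models would then run inside the model-building indices only.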
25Unsupervised classification - continued
- Unsupervised analysis methods
- Cluster analysis
- Principal components
- Separability analysis
- All have assumptions and input parameters and
changing them results in very different answers
28- Sample size estimation for metabolomics studies
29There is strength in numbers: power and sample size
- Unsupervised analyses
- Principal components, clustering, heat maps and variants
- These are actually data transformations or data displays rather than hypothesis tests, so it is unclear whether sample size estimation is appropriate or even possible.
- Stability of clustering may be appropriate to think about. Garge et al. (2005) suggested 50 samples for any stability.
30Sample size in supervised experiments
- Supervised analyses
- Linear models and variants
- Methods are still evolving, but we suggest the
approach we developed for microarrays may be
appropriate for metabolomics (being evaluated)
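For orientation only, a classical normal-approximation sample size calculation for a two-group comparison of a single metabolite is sketched below. This is not the microarray-derived approach the slide refers to. It is the textbook formula, with a Bonferroni division of alpha added as a crude, assumed allowance for testing many metabolites, and the function name and parameter values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.8, n_tests=1):
    """Normal-approximation sample size for a two-sided, two-group comparison.

    effect_size is the difference in group means divided by the common
    standard deviation. alpha is divided by n_tests (Bonferroni) as a
    crude allowance for multiple testing.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - (alpha / n_tests) / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# One metabolite, difference of one standard deviation.
print(samples_per_group(1.0))
# The same effect when 500 metabolites are tested (assumed count).
print(samples_per_group(1.0, n_tests=500))
```

The required group size grows quickly as the effect size shrinks or the number of tested metabolites rises, which is one reason small discovery studies tend to be underpowered.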
33Metabolomics does not reveal everything and
different technologies show different things
34- Technology and detection evolve over time.
35Technologies are not in perfect agreement
36The human urine metabolome
37- Sample, Image and Data Quality Checking
43Metabolite quality
- Still evolving field
- RTI is one of the Metabolomics Reference
Standards Synthesis Centers
44- Know your data: what should it look like?
45These are OK
46These are not OK
47- One bad sample can contaminate an experiment
48Histogram of p-values
49Potentially Bad Data
50Histogram of p-values with bad data removed
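A p-value histogram is a cheap diagnostic. When the tests are valid, p-values from metabolites with no real effect are roughly uniform, so the histogram should look flat with, at most, a spike near zero. Other shapes are a hint that something in the data or the model is off. The sketch below bins p-values and prints a text histogram; the function names and the p-values are made up for illustration.

```python
def p_value_histogram(p_values, n_bins=20):
    """Count p-values in n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        # A p-value of exactly 1.0 belongs in the last bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


def print_histogram(counts):
    """Print one row of '#' marks per bin."""
    n_bins = len(counts)
    for i, count in enumerate(counts):
        print(f"{i / n_bins:4.2f}-{(i + 1) / n_bins:4.2f} {'#' * count}")


# Hypothetical p-values from per-metabolite tests.
p_values = [0.001, 0.004, 0.01, 0.02, 0.03, 0.2, 0.35, 0.5, 0.62, 0.81, 0.97, 1.0]
counts = p_value_histogram(p_values, n_bins=10)
print_histogram(counts)
```

Re-running the same check after removing a suspect sample shows whether that sample was driving the shape.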
51- Quality of Database, Bioinformatics and
Interpretative tools
52Understand what databases include, don't include, and assume
- Just because a database says something does not mean it is right. Read the evidence.
- Databases are biased.
- Databases are incomplete.
- Databases have lots of data.
- Understand data before you use it.
- Databases are useful!
53Issues in the Annotation of Genes, proteins,
metabolites
54Annotation is inconsistent across sources
55Issues with pathway data
57TCA cycle from Ingenuity
58TCA from GenMAPP
59TCA cycle from Ingenuity
60Share Your Data
61Metabolomics WorkBench
- http://www.metabolomicsworkbench.org/
62MetaboLights
63Overshare your data and show work
- Practice compendium research to allow others to replicate your work
- Many high-profile omic studies are not even technically reproducible
64Use metabolomics databases
- Limited in the literature so far. Some work on
tissue and species metabolomes.
65Summary
- Design your experiment well
- Conduct your experiment well
- Control for non-biological sources of error
- Know what is good and bad quality data at each stage, including metabolite, image, data, and annotation
- If you are aware of these issues and control for them, highly powerful and reproducible metabolite experimentation is possible.
- Else you get garbage
- Share your data and use shared data
66References
- The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat Biotechnol. 2010 Aug;28(8):827-838.
- Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet. 2006 Jan;7(1):55-65.
- Baggerly K. "Disclose all data in publications." Nature. 2010 Sep 23;467(7314):401. PMID: 20864982
- Repeatability of published microarray gene expression analyses. Nat Genet. 2009 Feb;41(2):149-55.
- A design and statistical perspective on microarray gene expression studies in nutrition: the need for playful creativity and scientific hard-mindedness. Nutrition. 2003 Nov-Dec;19(11-12):997-1000.
- 39 Steps. From Drug Discov Today. 2005 Sep 1;10(17):1175-82.
67If time allows
68RTI Regional Comprehensive Metabolomics Resource Core (RTI RCMRC)
- Susan Sumner, PhD
- Director RTI RCMRC
- Discovery Sciences
- Proteomics and Metabolomics Programs
- RTI International
69Contact Information for the RTI RCMRC
- Susan C.J. Sumner, PhD
- Director RTI RCMRC
- Senior Scientist nanoSafety
- RTI International
- Discovery Sciences
- 3040 Cornwallis Drive
- Research Triangle Park
- North Carolina 27709
- ssumner_at_rti.org
- 919-541-7479 (office)
- 919-622-4456 (cell)
- Jason P. Burgess, PhD
- Program Coordinator, RTI RCMRC
- Associate Director, Discovery Sciences
- RTI International
- 3040 Cornwallis Drive
- Research Triangle Park
- North Carolina 27709
- jpb_at_rti.org
- 919-541-6700 (office)
70 MS and NMR Instruments at RTI and DHMRI
Instrument               RTI  DHMRI
Mass Spectrometers (38)
LC-MS                    13   6
GC-MS                    4    3
GC x GC-TOF-MS           1    1
ICP-MS                   6    1
MALDI ToF/ToF            2    1
NMR (6)                  2    4
71Some RTI Metabolomics Applications and Pilots
- Experience with adolescent and adult human subject research, animal model and cell-based research, e.g.,
- Apoptosis: cells
- Drug-induced liver injury: animal models
- In utero exposure to chemicals and fetal imprinting: animal models
- Dietary exposure and imprinting: animal models
- NAFLD: pediatric obesity microbiome
- Weight loss: pediatric obesity
- Preterm delivery: human subjects
- Response to vaccine: human subjects
- Nicotine withdrawal: human subjects
- Colon cancer: human subjects
72Pilot and Feasibility Studies
- The aim of the pilot and feasibility program is to foster collaborations and promote the use of metabolomics.
- Studies will be selected through an application process.
- The application involves an abstract, a description of the samples available (matrix type, volume, type and duration of storage, sample processing, freeze-thaws, etc.), a description of phenotypes, and a plan for subsequent grant/contract submissions for metabolomics analysis beyond the initial pilot study.
- Applications may also include technology development.
- Applicants must agree to deposit data in DRCC, coauthor publications, and submit joint grant/contract proposals.
- Deadlines are being defined