Metabolomics: A Promising Omics Science - PowerPoint Presentation


Transcript and Presenter's Notes



1
Metabolomics: A Promising Omics Science
By Susan Simmons, University of North Carolina Wilmington
2
Collaborators
  • Dr. David Banks, Duke
  • Dr. Chris Beecher, University of Michigan
  • Dr. Xiaodong Lin, University of Cincinnati
  • Dr. Young Truong, UNC
  • Dr. Jackie Hughes-Oliver, NC State
  • Dr. Stanley Young, NISS
  • Dr. Ann Stapleton, UNCW Biology
  • Dr. Robert Simmons, MD

3
What is Metabolomics?
  • The word "metabolome" was first used in 1998 (less
    than a decade ago) and referred to all low-molecular-mass
    compounds synthesized and modified by a living cell
    or organism (Villas-Boas, 2007)
  • The complete human metabolome consists of
    endogenous metabolites (roughly 1,800) and exogenous
    metabolites (many more)
  • Human Metabolome Project

4
(No Transcript)
5
Fluorene degradation - Reference pathway
(www.genome.jp/KEGG; Kyoto Encyclopedia of Genes
and Genomes)
6
Mass Distribution of Compounds in the Human
Metabolome
7
History of Metabolomics
  • Instruments capable of detecting metabolites have
    existed since the late 1960s
  • The first paper on metabolic profiling appeared in
    1971 (Robinson and Pauling)
  • The first paper involving metabolomics appeared
    in the late 1990s

8
Why Metabolomics can be promising
  • Easy-to-use screening for disease
  • Assist in identifying gene function
  • Drug discovery
  • Assessment of toxicity (especially liver
    toxicity) in new drugs.
  • Nutrigenomics and diet strategies

9
Genomics, Proteomics and Metabolomics
10
The emerging science of Metabolomics
11
Metabolomics
12
Biochemical Profile Map to Metabolic Pathways
13
Data Collection and Measurement Issues
  • To obtain data, a tissue sample is taken from a
    patient.
  • The sample is prepped and placed into wells on a
    silicon plate.
  • Each well's aliquot is subjected to gas and/or
    liquid chromatography.
  • After separation, the sample goes to a mass
    spectrometer.

14
(No Transcript)
15
Data Collection and Measurement Issues
  • Sample prep involves stabilizing the sample,
    adding spiked-in calibrants, and creating
    multiple aliquots (some of which are frozen) for QC
    purposes. This step is automated with robots.
  • Sources of error in this step include
      • within-subject variation
      • within-tissue variation
      • contamination by cleaning solvents
      • calibrant uncertainty
      • evaporation of volatiles.

16
Data Collection and Measurement Issues
  • The result is a set of m/z ratios and timestamps,
    one for each ion, which can be viewed as a 2-D
    histogram in the m/z × time plane.
  • One then estimates the amount of each metabolite.
    This entails normalization, which also introduces
    error.
  • The caveats pointed out in Baggerly et al.
    (Proteomics, 2003) apply.
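To make the m/z × time histogram concrete, here is a minimal Python sketch using NumPy. It assumes each detected ion is available as an (m/z, retention time) pair; the function name, the bin edges, the toy ion values, and the choice of total-ion-count normalization are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np


def ion_histogram(mz, rt, mz_edges, rt_edges):
    """Bin ion events into a 2-D histogram on the m/z x time plane.

    Returns the raw counts and a total-ion-count (TIC) normalised copy.
    TIC normalisation (dividing every cell by the total number of ions
    in the run) is one simple choice, assumed here for illustration.
    """
    counts, _, _ = np.histogram2d(mz, rt, bins=[mz_edges, rt_edges])
    total = counts.sum()
    normalised = counts / total if total > 0 else counts.copy()
    return counts, normalised


# Hypothetical ion events from a single run: three ions
mz = np.array([100.2, 100.4, 250.0])
rt = np.array([1.0, 1.1, 5.0])
counts, norm = ion_histogram(
    mz, rt,
    mz_edges=np.arange(100, 301, 50),          # 100, 150, ..., 300
    rt_edges=np.array([0.0, 2.0, 4.0, 6.0]),
)
```

Any other normalization (for example, scaling to a spiked-in calibrant) would replace the division by `total`; as the slide notes, whichever is chosen adds its own error.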

17
Data Collection and Measurement Issues
  • Baseline correction
  • Alignment
  • Estimating the quantity of specific metabolites.
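These steps are usually handled by dedicated software. As an illustration of baseline correction only, the sketch below uses a morphological "top-hat" filter (rolling minimum followed by rolling maximum), a common choice that is not necessarily what the authors used. The window half-width is a hypothetical tuning parameter that should be wider than the chromatographic peaks, and the trace is synthetic.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _rolling(y, half_width, reducer):
    """Apply reducer (np.min or np.max) over a centred sliding window."""
    padded = np.pad(y, half_width, mode="edge")
    windows = sliding_window_view(padded, 2 * half_width + 1)
    return reducer(windows, axis=1)


def estimate_baseline(y, half_width):
    """Morphological opening: erosion (rolling min) then dilation (rolling max).

    Features narrower than the window (peaks) are removed, while the
    slowly varying background is kept.
    """
    eroded = _rolling(y, half_width, np.min)
    return _rolling(eroded, half_width, np.max)


def baseline_correct(y, half_width=25):
    """Subtract the estimated baseline from a 1-D intensity trace."""
    y = np.asarray(y, dtype=float)
    return y - estimate_baseline(y, half_width)


# Synthetic trace: linear drift plus one narrow peak at index 250
t = np.arange(500, dtype=float)
trace = 5.0 + 0.01 * t + 10.0 * np.exp(-((t - 250.0) / 3.0) ** 2)
corrected = baseline_correct(trace, half_width=25)
```

Near the ends of the trace the edge padding can bias the baseline estimate, so points within one half-width of the boundary deserve caution.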

18
(No Transcript)
19
Data Collection and Measurement Issues
  • Let z be the vector of raw data, and let x be the
    vector of estimates. Then the measurement equation is
      $G(\mathbf{z}) = \mathbf{x} = \boldsymbol{\mu} + \boldsymbol{\epsilon}$,
    where µ is the vector of unknown true values
    and ε is decomposable into separate components.
  • For metabolite i, the estimate X_i is
      $X_i = g_i(\mathbf{z}) = \ln \sum_j w_{ij} \iint s_m(\mathbf{z})\, c(m,t)\, dm\, dt$.

20
Data Collection and Measurement Issues
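  • The law of propagation of error (essentially the
    delta method) says that the variance of X is
    approximately
      $\mathrm{Var}(X) \approx \sum_{i=1}^{n} \left(\frac{\partial g}{\partial z_i}\right)^2 \mathrm{Var}(z_i) + 2 \sum_{i<k} \frac{\partial g}{\partial z_i} \frac{\partial g}{\partial z_k} \mathrm{Cov}(z_i, z_k)$
  • The weights depend on the values of the spiked-in
    calibrants, so this gets complicated.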
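The propagation-of-error formula is the quadratic form ∇g(z)ᵀ Cov(z) ∇g(z). A minimal NumPy sketch, assuming g is a scalar function of the raw data and that Cov(z) is known; the finite-difference step and the toy estimator ln(z1 + z2) are illustrative assumptions, not the authors' g_i.

```python
import numpy as np


def delta_method_variance(g, z, cov, eps=1e-6):
    """First-order (delta-method) variance of the scalar g(z).

    Computes grad(g)^T Cov(z) grad(g), which expands to
    sum_i (dg/dz_i)^2 Var(z_i) + 2 sum_{i<k} (dg/dz_i)(dg/dz_k) Cov(z_i, z_k).
    The gradient is approximated by central finite differences.
    """
    z = np.asarray(z, dtype=float)
    cov = np.asarray(cov, dtype=float)
    grad = np.empty_like(z)
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = eps
        grad[i] = (g(z + step) - g(z - step)) / (2.0 * eps)
    return float(grad @ cov @ grad)


# Hypothetical estimator: log of the sum of two raw intensities
g = lambda z: np.log(z[0] + z[1])
var_x = delta_method_variance(g, z=[3.0, 1.0],
                              cov=[[0.04, 0.01], [0.01, 0.09]])
```

Here both partial derivatives equal 1/4, so the result is (1/16)(0.04 + 0.09 + 2 × 0.01) = 0.009375. With spiked-in calibrants, g itself depends on the calibrant values, which is what makes the real calculation harder.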

21
Data Collection and Measurement Issues
  • Cross-platform experiments are also crucial for
    medical use. This leads to key comparison
    designs, in which the same sample (or aliquots of a
    standard solution or sample) is sent to multiple
    labs. Each lab produces its own spectrogram.
  • It is impossible to decide which lab is best, but
    one can estimate how to adjust for interlab
    differences.

22
Data Collection and Measurement Issues
  • The Mandel bundle-of-lines model is what we
    suggest for interlaboratory comparisons. It
    assumes
      $X_{ik} = \alpha_i + \beta_i \mu_k + \epsilon_{ik}$,
    where X_ik is the estimate at lab i for
    metabolite k, μ_k is the unknown true quantity of
    metabolite k, and
      $\epsilon_{ik} \sim N(0, \sigma_{ik}^2)$.

23
Data Collection and Measurement Issues
  • To solve the equations given values from the
    labs, one must impose constraints. A Bayesian
    can put priors on the laboratory coefficients and
    the error variance.
  • Metabolomics needs a multivariate version, with
    models for the rates at which compounds
    volatilize.
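The slides suggest a Bayesian treatment. As a simpler, non-Bayesian illustration of why constraints are needed, the sketch below fits the bundle-of-lines model by alternating least squares. The constraints mean(α) = 0 and mean(β) = 1, the common error variance (σ_ik ignored), the function name, and the simulated lab data are all our assumptions.

```python
import numpy as np


def fit_mandel(X, n_iter=500, tol=1e-12):
    """Least-squares fit of X[i, k] ~ alpha[i] + beta[i] * mu[k].

    Rows are labs, columns are metabolites. The model is unchanged by
    mu -> a + b*mu (with matching changes to alpha, beta), so we impose
    mean(alpha) = 0 and mean(beta) = 1 (an assumption) to pin it down.
    A common error variance is assumed, i.e. sigma_ik is ignored.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)  # start from the across-lab average per metabolite
    prev_rss = np.inf
    for _ in range(n_iter):
        # 1) For each lab, regress its row on the current mu (with intercept).
        mu_c = mu - mu.mean()
        beta = X @ mu_c / (mu_c @ mu_c)
        alpha = X.mean(axis=1) - beta * mu.mean()
        # 2) For each metabolite, least-squares update of mu given the lab lines.
        mu = beta @ (X - alpha[:, None]) / (beta @ beta)
        # 3) Re-impose the constraints; this reparametrisation keeps the fit.
        a, b = alpha.mean(), beta.mean()
        alpha = alpha - beta * a / b
        beta = beta / b
        mu = a + b * mu
        rss = np.sum((X - alpha[:, None] - beta[:, None] * mu) ** 2)
        if prev_rss - rss < tol:
            break
        prev_rss = rss
    return alpha, beta, mu


# Simulated key-comparison data: 4 hypothetical labs, 10 metabolites
rng = np.random.default_rng(0)
true_alpha = np.array([0.5, -0.2, 0.1, -0.4])
true_beta = np.array([1.1, 0.9, 1.05, 0.95])
true_mu = rng.uniform(1.0, 10.0, size=10)
X = (true_alpha[:, None] + true_beta[:, None] * true_mu
     + rng.normal(0.0, 0.01, size=(4, 10)))
alpha, beta, mu = fit_mandel(X)
```

The fitted α_i and β_i are the estimated interlab adjustments. Without step 3 the least-squares problem has infinitely many equally good solutions, which is the point the slide makes about constraints; a Bayesian analysis would use priors instead.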

24
(No Transcript)
25
(No Transcript)
26
Statistical Issues
  • Many missing values!!!
  • Outliers
  • Metabolite abundances are not normally
    distributed
  • n < p (fewer samples than metabolites)
  • Correlated metabolites

27
Statistical Issues
  • PCA or ICA
  • Partial Least Squares
  • Clustering
  • Random Forest, SVM
  • rSVD

28
Statistical Issues
  • Dealing with missing values
      • Replacing missing values with 0s is not
        necessarily a good idea: the true abundance is not 0.
      • Minimum, half-minimum, or a draw from
        Uniform(0, minimum)
      • Random forest imputation
      • Observing the conditional distribution (Dr. Young
        Truong at UNC)
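The minimum-based replacements above can be sketched as follows, assuming missing entries are coded as NaN in a subjects-by-metabolites array and that each metabolite has at least one observed, positive value. The function name is hypothetical; random-forest and conditional-distribution imputation are not shown.

```python
import numpy as np


def impute_minimum_based(X, method="half_min", rng=None):
    """Fill NaNs (e.g. abundances below detection) column by column.

    method: "min"      -> observed minimum of that metabolite
            "half_min" -> half of that minimum
            "uniform"  -> a random draw from Uniform(0, minimum)
    Assumes every column has at least one observed, positive value.
    """
    X = np.array(X, dtype=float)  # work on a copy, leave the input intact
    col_min = np.nanmin(X, axis=0)
    rows, cols = np.nonzero(np.isnan(X))
    if method == "min":
        X[rows, cols] = col_min[cols]
    elif method == "half_min":
        X[rows, cols] = 0.5 * col_min[cols]
    elif method == "uniform":
        rng = np.random.default_rng() if rng is None else rng
        X[rows, cols] = rng.uniform(0.0, col_min[cols])
    else:
        raise ValueError(f"unknown method: {method}")
    return X


# Toy data: 3 subjects x 2 metabolites, one value missing in each column
raw = np.array([[1.0, np.nan],
                [2.0, 4.0],
                [np.nan, 6.0]])
filled = impute_minimum_based(raw, method="half_min")
```

All three rules treat a missing value as "small but positive", which fits abundances below detectability better than 0 does; the uniform draw avoids piling every missing value onto a single number.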

29
Statistical Issues
  • Prediction and classification
      • Partial least squares
      • Random Forest
      • SVM
      • Neural networks

30
Statistical Issues
  • Identifying relationships
      • MDS
      • Clustering
      • rSVD (PowerMV from NISS)

31
ALS Metabolomic Data Set
  • We had abundance data on 317 metabolites from 63
    subjects: 32 were healthy, 22 had ALS but were not
    on medication, and 9 had ALS and were taking
    medication.
  • The goal was to distinguish the two ALS groups and
    the healthy group.
  • Here p > n (317 metabolites, 63 subjects). Also,
    some abundances were below the detection limit.

32
ALS Metabolomic Data Set
  • Using the Breiman-Cutler code for Random Forests,
    the out-of-bag error rate was 7.94% (5 of 63): 29 of
    the 31 ALS patients and 29 of the 32 healthy
    subjects were correctly classified.
  • 20 of the 317 metabolites were important in the
    classification, and three were dominant.
  • RF can detect outliers via proximity scores;
    four such outliers were found.
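The proximity-based outlier measure can be sketched as below. It follows the outlier measure described in Breiman and Cutler's Random Forests documentation, but the details here (excluding each case's self-proximity, using the unscaled median absolute deviation) are our assumptions, and the proximity matrix is a made-up example rather than output from the ALS data.

```python
import numpy as np


def rf_outlier_scores(prox, labels):
    """Outlier measure from a random-forest proximity matrix.

    For case n in class j: raw(n) = N / sum_{k in class j, k != n} prox(n, k)^2,
    where N is the total number of cases. Raw values are then standardised
    within each class: subtract the class median, divide by the class
    median absolute deviation (MAD).
    """
    prox = np.asarray(prox, dtype=float)
    labels = np.asarray(labels)
    n_cases = prox.shape[0]
    scores = np.empty(n_cases)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        block = prox[np.ix_(idx, idx)] ** 2
        np.fill_diagonal(block, 0.0)  # ignore each case's proximity to itself
        raw = n_cases / np.maximum(block.sum(axis=1), 1e-12)
        med = np.median(raw)
        mad = np.median(np.abs(raw - med))
        scores[idx] = (raw - med) / (mad if mad > 0 else 1.0)
    return scores


# Hypothetical proximities for 5 cases of one class; case 4 is rarely
# in the same terminal node as the others.
prox = np.array([
    [1.0, 0.8, 0.7, 0.9, 0.1],
    [0.8, 1.0, 0.6, 0.7, 0.2],
    [0.7, 0.6, 1.0, 0.8, 0.1],
    [0.9, 0.7, 0.8, 1.0, 0.2],
    [0.1, 0.2, 0.1, 0.2, 1.0],
])
scores = rf_outlier_scores(prox, labels=[0, 0, 0, 0, 0])
```

A case that seldom shares terminal nodes with the rest of its class has a small sum of squared proximities and therefore a large score; flagging cases above some cutoff is how a handful of outliers, such as the four reported here, would be identified.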

33
ALS Metabolomic Data Set
  • Several support vector machine approaches were
    tried on these data:
      • Linear SVM
      • Polynomial SVM
      • Gaussian SVM
      • L1 SVM (Bradley and Mangasarian, 1998)
      • SCAD SVM (Fan and Li, 2000)
  • The SCAD SVM had the best leave-one-out (LOO)
    error rate, 14.3%.
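What distinguishes the L1 and SCAD SVMs is the penalty on the coefficients. A sketch of the penalty alone (not the full SCAD SVM fit), using the standard SCAD form from Fan and Li with their suggested a = 3.7; the function names are ours.

```python
import numpy as np


def scad_penalty(w, lam, a=3.7):
    """SCAD penalty applied elementwise to coefficients w.

    Equals the L1 penalty lam*|w| for |w| <= lam, bends quadratically for
    lam < |w| <= a*lam, and is constant beyond a*lam, so large
    coefficients are not shrunk further.
    """
    t = np.abs(np.asarray(w, dtype=float))
    linear = lam * t
    quadratic = -(t**2 - 2 * a * lam * t + lam**2) / (2 * (a - 1))
    constant = np.full_like(t, (a + 1) * lam**2 / 2)
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))


def l1_penalty(w, lam):
    """L1 (lasso-type) penalty, as used by the L1 SVM."""
    return lam * np.abs(np.asarray(w, dtype=float))


w = np.array([0.5, 2.0, 10.0])
p_scad = scad_penalty(w, lam=1.0)   # grows, then levels off at (a+1)/2
p_l1 = l1_penalty(w, lam=1.0)       # keeps growing linearly
```

Both penalties can set small coefficients exactly to zero (useful when p > n), but SCAD stops penalizing large coefficients, so strongly informative metabolites are not shrunk toward zero.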

34
ALS Metabolomic Data Set
  • Robust SVD (Liu et al., 2003) is used to
    simultaneously cluster patients (rows) and
    metabolites (columns). Given the
    patient-by-metabolite matrix X, one writes
      $X_{ik} = r_i c_k + \epsilon_{ik}$,
    where r_i and c_k are the row and column effects.
    One can then sort the array by the effect
    magnitudes.

35
ALS Metabolomic Data Set
  • To compute an rSVD, use alternating L1 regression
    without an intercept to estimate the row and
    column effects: first fit the row effects given the
    column effects, and then reverse the roles.
    Robustness stems from using L1 rather than OLS.
  • Repeating the procedure on the residuals gives the
    second singular-value solution.
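A minimal NumPy sketch of the alternating L1 scheme for the first component. The starting values, iteration count, weighted-median solver, function names, and toy matrix are our choices, not necessarily those of Liu et al. (2003) or PowerMV.

```python
import numpy as np


def weighted_median(values, weights):
    """Lower weighted median: a minimiser of sum(weights * |values - m|)."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]


def l1_slope(y, x):
    """L1 regression through the origin: argmin_b sum |y - b*x|.

    Since |y_k - b*x_k| = |x_k| * |y_k/x_k - b|, the minimiser is the
    weighted median of y_k/x_k with weights |x_k|.
    """
    keep = x != 0
    return weighted_median(y[keep] / x[keep], np.abs(x[keep]))


def robust_rank1(X, n_iter=50):
    """One robust singular triple (s, u, v) with X ~ s * outer(u, v)."""
    X = np.asarray(X, dtype=float)
    c = np.median(X, axis=0)  # starting column effects (an assumption)
    c[c == 0] = 1.0
    for _ in range(n_iter):
        # Row effects given column effects, then column effects given rows
        r = np.array([l1_slope(X[i, :], c) for i in range(X.shape[0])])
        c = np.array([l1_slope(X[:, k], r) for k in range(X.shape[1])])
    s = np.linalg.norm(r) * np.linalg.norm(c)
    return s, r / np.linalg.norm(r), c / np.linalg.norm(c)


# Toy patient-by-metabolite matrix: exactly rank one plus a gross outlier
true_rows = np.array([1.0, 2.0, 3.0, 4.0])
true_cols = np.array([1.0, -1.0, 2.0, 0.5, 3.0])
clean = np.outer(true_rows, true_cols)
X = clean.copy()
X[0, 0] += 100.0
s1, u1, v1 = robust_rank1(X)
first = s1 * np.outer(u1, v1)
row_order = np.argsort(u1)  # sort patients by row-effect magnitude
```

Because the outlier at X[0, 0] gets little weight in each L1 fit, the rank-one reconstruction recovers the clean matrix, whereas ordinary SVD would be pulled toward the outlier. For the second component, the same function would be applied to the residuals X - first; this sketch does not guard against degenerate (all-zero) effects.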

36
(No Transcript)
37
NCI data set
  • NCI 60 cell lines
  • 9 cancer types: breast, CNS, colon, melanoma,
    renal, leukemia, prostate, ovarian, lung
  • GC-LS
  • Melanoma vs CNS (8 cell lines for melanoma and 6
    cell lines for CNS)

38
Variable Importance using RF
39
Component 1 versus 2
40
Useful websites
  • Deconvolution of peaks: software AMDIS
    (http://chemdata.nist.gov/mass-spc/amdis, NIST,
    Gaithersburg, USA)
  • Human Metabolome Database (www.hmdb.ca)
  • KEGG (www.genome.jp/kegg)
  • PowerMV (http://www.niss.org/PowerMV/)
  • Many, many others

41
Concluding Remarks
  • Many interesting statistical issues still need to
    be addressed.
  • Measurement issues and interlaboratory
    differences need to be properly addressed.
  • Statistical issues in analyzing metabolomic data
    still remain an interesting challenge.
  • Metabolomics is an important part of
    understanding systems biology.