Diapositive 1 - PowerPoint PPT Presentation

About This Presentation
Title:

Diapositive 1

Description:

Metabonomics. Identification of biomarkers. Experimental data: biologists have a pool of n rats with a controlled pathological state. – PowerPoint PPT presentation

Slides: 36
Provided by: Rjan71

Transcript and Presenter's Notes



1
Combination of Independent Component Analysis and statistical modeling
for the identification of metabonomic biomarkers in 1H-NMR spectroscopy
Réjane Rousseau (Institut de Statistique, UCL, Belgium)
2
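Metabonomics
[Figure: metabolites from tissues and organs pass into a biofluid (urine), which is analysed by 1H-NMR spectroscopy. Comparing spectra obtained without contact and after contact, altered metabolites are detected as spectral alterations in the frequency domain; a specific spectral region is a biomarker.]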
  • One molecule gives several peaks (1 to 3) at specific positions
    in the frequency domain.
  • The concentration of a molecule is proportional to the area under
    the curve of its peaks.
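A minimal sketch of why peak area tracks concentration, assuming Lorentzian line shapes (an illustrative assumption; real 1H-NMR peaks are more complex). The peak positions, the width and the helper names are hypothetical. Every peak height scales with concentration, so doubling the concentration doubles the integrated area.

```python
import numpy as np


def lorentzian(x, center, width, height):
    """Lorentzian line shape, used here as a simple model of one NMR peak."""
    return height * width**2 / ((x - center) ** 2 + width**2)


def trapezoid_area(y, x):
    """Area under a sampled curve, computed with the trapezoidal rule."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


# Hypothetical frequency axis (ppm) and two peaks of the same molecule
x = np.linspace(0.0, 10.0, 20001)
PEAK_POSITIONS = [2.55, 2.70]  # illustrative values only
WIDTH = 0.01


def molecule_signal(concentration):
    """Signal of one molecule: every peak height scales with concentration."""
    return sum(lorentzian(x, c, WIDTH, concentration) for c in PEAK_POSITIONS)


area_1 = trapezoid_area(molecule_signal(1.0), x)
area_2 = trapezoid_area(molecule_signal(2.0), x)
print(area_1, area_2 / area_1)  # the area ratio equals the concentration ratio
```

Each Lorentzian integrates to roughly pi × width × height, so the area of the two-peak molecule is close to 2 × pi × 0.01 at unit concentration.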

Identification of biomarkers: which part of
the spectrum should be examined?
  • How?
  • In an experimental database,
  • with a methodology based on Principal Component
    Analysis (PCA).

Objective: propose a methodology combining ICA
and statistical modeling.
3
Identification of biomarkers
  • Experimental data
  • Biologists have a pool of n rats with a
    controlled pathological state.
  • They collect n samples of urine.
  • 1H-NMR spectroscopy produces n spectra of m values:
    the spectral data.
  • Each sample or spectrum is described by l
    variables in a design matrix Y.
  • One of these variables, yk, describes the
    characteristic related to the biomarkers.
  • Biomarker identification: statistical methods to
    answer the question:
  • in these multivariate data,
    which variables xj are the most altered?

Spectral data X (n x m), with columns xj
Design data Y (n x l), with columns y1, y2, ..., yl
Example of design data (y1 = age of the rat, y2 = yk = severity of diabetes):
  y1    y2
  13    320
  11    200
  11    100
  12    270
4
Example: controlled data
  • Advantage of controlled data:
  • we know the spectral regions that should be
    identified as biomarkers.
  • The controlled data:
  • 28 spectra of 600 points, X (28 x 600).
  • Each spectrum comes from a sample of urine with
  • a chosen concentration of citrate and
  • a chosen concentration of hippurate.
  • X (28 x 600), Y (28 x 2): y1 is the
    concentration of citrate,
  • y2 is the concentration of hippurate.
  • We need a biomarker to detect changes in the
    level of citrate described by y1.
  • Which spectral regions xj are the most
    altered when y1 changes?
  • The spectral regions corresponding to citrate are
    the biomarkers to identify.

[Figure: spectra with the hippurate (y2) and citrate (y1) regions marked.]
5
Spectrum of 3000 points
A spectrum of 600 values xj with $\sum_j x_j = 1$
[Figure: mixtures of natural urine, citrate (y1) and hippurate (y2); 14 mixtures in 2 replications give 28 samples.]
The biomarkers to identify: the spectral regions
corresponding to citrate.
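The slide does not say how the 3000-point spectrum becomes 600 values. A common preprocessing, assumed here, is to sum consecutive points into equal-width buckets and then scale each spectrum so that its values sum to 1. The function `bucket_and_normalize` and the random stand-in spectrum are hypothetical.

```python
import numpy as np


def bucket_and_normalize(spectrum, n_buckets):
    """Sum consecutive points into equal-width buckets, then scale to unit total.

    Assumption: buckets of equal width and total-area normalization.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    if spectrum.size % n_buckets != 0:
        raise ValueError("spectrum length must be a multiple of n_buckets")
    # reshape groups consecutive points: row i holds points i*w .. i*w + w - 1
    buckets = spectrum.reshape(n_buckets, -1).sum(axis=1)
    return buckets / buckets.sum()


rng = np.random.default_rng(0)
raw = rng.random(3000)               # stand-in for a 3000-point spectrum
x = bucket_and_normalize(raw, 600)   # 600 values that sum to 1
print(x.shape, x.sum())
```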
6
USUAL
I. Reduction of the dimension by PCA: $X_C = T P^T$
  • Principal components are
  • uncorrelated,
  • in the direction of maximum variance.
  • Examination of the first 2 components
    (score plot; loadings L1 and L2).
  • This is only powerful if the biological
    question is related to the highest
    variance in the dataset!

NEW
I. Reduction of the dimension by ICA: $X_C^T = S A^T$
  • Components
  • are independent,
  • with a biological meaning.
  • Examination of ALL the components
  • to visualize unconnected
  • molecules in the samples.

II. Biomarker discovery through statistical
modelling (e.g. citrate plays an important role)
  • Identification of biomarkers
  • Comparison of the intensities of biomarkers
    between spectra from different conditions
7
The proposed methodology
  • Part I: dimension reduction
  • with ICA on the spectral data,
    $X_C^T = S A^T$.
  • Resulting components:
  • meaningful components,
  • with some advantages over
  • the principal components.

Part II: biomarker discovery through
statistical modelling
  • Identification of biomarkers
  • Comparison of the values in biomarkers between
    spectra from different conditions
8
What is Independent Component Analysis (ICA)?
  • The idea:
  • each observed data vector is a linear
    combination of unknown independent (not only
    linearly independent) components.
  • ICA provides the independent components
    (sources, sk) that generated a data vector
    and the corresponding mixing weights aki.
  • How do we estimate the sources?
  • With linear transformations of the observed
    signals that maximize the independence of the
    sources.
  • How do we evaluate this independence?
  • By the Central Limit Theorem, a mixture of independent
    components is closer to Gaussian than the components
    themselves, so the independence of the sources
  • is reflected by non-Gaussianity.
  • Solving the ICA problem consists of
    finding a demixing matrix that maximizes the
    non-Gaussianity of the estimated sources under
    the constraint that their variances are constant.
  • Fast-ICA algorithm:
  • - uses an objective function related to
    negentropy;
  • - uses a fixed-point iteration scheme.
  • (Central Limit Theorem: almost any measured quantity that depends on
    several underlying independent factors has an approximately
    Gaussian PDF.)
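A minimal NumPy sketch of the symmetric Fast-ICA fixed-point iteration with the tanh nonlinearity (the log-cosh approximation of negentropy), run on simulated data. It is a textbook version, not the author's code; the two simulated sources and the mixing matrix are made up for illustration. scikit-learn's `sklearn.decomposition.FastICA` is a maintained implementation of the same algorithm.

```python
import numpy as np


def whiten(X):
    """Center data X (N observations x d) and whiten it with PCA."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / Xc.shape[0]
    eigval, eigvec = np.linalg.eigh(cov)
    return Xc @ eigvec / np.sqrt(eigval)        # unit-variance, uncorrelated


def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W


def fast_ica(Z, n_iter=200, tol=1e-8, seed=0):
    """Symmetric Fast-ICA with g = tanh on whitened data Z (N x q).

    Returns the estimated sources (N x q) and the demixing matrix W
    (q x q, orthogonal) with sources = Z @ W.T.
    """
    N, q = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((q, q)))
    for _ in range(n_iter):
        Y = Z @ W.T                             # current source estimates
        g = np.tanh(Y)
        g_prime = 1.0 - g**2
        # Fixed-point update for each row w: E[z g(w^T z)] - E[g'(w^T z)] w
        W_new = (g.T @ Z) / N - np.diag(g_prime.mean(axis=0)) @ W
        W_new = sym_decorrelate(W_new)
        # Converged when each new row is parallel (up to sign) to the old one
        dots = np.abs(np.einsum("ij,ij->i", W_new, W))
        W = W_new
        if np.max(np.abs(dots - 1.0)) < tol:
            break
    return Z @ W.T, W


# Simulated example: one sub-Gaussian and one super-Gaussian source
rng = np.random.default_rng(1)
N = 5000
S_true = np.column_stack([rng.uniform(-1.0, 1.0, N), rng.laplace(size=N)])
A_true = np.array([[1.0, 0.5], [0.3, 1.0]])    # hypothetical mixing matrix
X = S_true @ A_true.T                          # observed mixtures

S_est, W = fast_ica(whiten(X))
corr = np.corrcoef(S_true.T, S_est.T)[:2, 2:]  # true vs estimated sources
print(np.round(np.abs(corr), 3))
```

The sources are recovered only up to order and sign, which is why the check uses absolute correlations; this is also why ICA sources are not naturally sorted by importance.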

9
I. Dimension reduction by ICA
X (n x m): n spectra defined by m variables,
e.g. (28 x 600)
Transposition: $X^T$ (m x n)
Centering: the mixtures and the sources sj have zero mean
$X_C^T = S A^T$, with $X_C^T$ (m x n)
  • Each spectrum is a weighted sum of the
    independent spectral expressions, each of which
    can correspond to an independent (composite)
    metabolite contained in the studied sample
    (weight in $A^T$ ∝ quantity).

Whitening by PCA:
  1. we obtain uncorrelated scores with unit variance,
    so the demixing matrix is orthogonal;
  2. irrelevant scores can be discarded: choose
    the number q of sources to estimate.

PCA: $T_{(m \times q)} = X_C^T P$
ICA: $S_{(m \times q)} = X_C^T P W$ (the overall demixing matrix is $P W$)
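A minimal sketch of the transposition, centering and PCA-whitening steps, on simulated data with the sizes of the example (n = 28, m = 600, q = 6). Keeping the q largest eigenvalues is assumed as the way to discard irrelevant scores; the function name and the simulated sources and weights are hypothetical.

```python
import numpy as np


def center_and_whiten(X, q):
    """Transpose X (n x m), center every spectrum and PCA-whiten to q scores.

    Returns XTc (m x n) and P (n x q) such that T = XTc @ P has q
    uncorrelated columns with unit variance.
    """
    XT = X.T                                  # m x n: one column per spectrum
    XTc = XT - XT.mean(axis=0)                # every mixture gets zero mean
    cov = XTc.T @ XTc / XTc.shape[0]          # n x n covariance of the mixtures
    eigval, eigvec = np.linalg.eigh(cov)      # eigenvalues in ascending order
    keep = np.argsort(eigval)[::-1][:q]       # indices of the q largest
    P = eigvec[:, keep] / np.sqrt(eigval[keep])
    return XTc, P


rng = np.random.default_rng(0)
n, m, q = 28, 600, 6
S_true = rng.laplace(size=(m, q))             # hypothetical sources
A_true = rng.random((n, q))                   # hypothetical mixing weights
X = (S_true @ A_true.T).T + 0.01 * rng.standard_normal((n, m))

XTc, P = center_and_whiten(X, q)
T = XTc @ P                                   # m x q whitened scores
print(np.round(T.T @ T / m, 6))               # approximately the identity

# Any orthogonal W keeps the scores white, so ICA only has to search
# over rotations: S = XTc @ P @ W
W, _ = np.linalg.qr(rng.standard_normal((q, q)))
S = T @ W
```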
10
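Example: I. Dimension reduction by ICA
$X_C^T = S A^T$, with $X_C^T$ (600 x 28), S (600 x 6) and $A^T$ (6 x 28):
  • the columns of $X_C^T$ are the 28 centered spectra $x^T_{C1}, \dots, x^T_{C28}$;
  • the columns of S are the sources s1, ..., s6, with entries $s_{ij}$ ($s_{1,1}, \dots, s_{600,6}$);
  • the rows of $A^T$ are the weight vectors $a^T_1, \dots, a^T_6$, with entries $a^T_{ij}$ ($a^T_{1,1}, \dots, a^T_{6,28}$).
[Figure: sources corresponding to urine, citrate and hippurate.]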
11
[Figure: the 6 sources S (600 x 6) and their weights $A^T$ over the 28 spectra (e.g. $a^T_{i,8}$); sources corresponding to natural urine, citrate and hippurate.]
12
Note: comparison with the usual PCA
  • Similarities: projection methods linearly
    decomposing multi-dimensional data into components.
  • Differences:
  • the number of sources, q, has to be fixed;
  • sources are not naturally sorted according to
    their importance;
  • the independence condition is the biggest
    advantage of ICA:
  • - independent components are more
    meaningful than uncorrelated components;
  • - it is more suitable for our question, in which
    the component of interest is not always in the
    direction with the maximum variance.

[Figure: first 2 components of PCA versus ICA; natural urine.]
13
[Figure: PCA loadings 1, 2 and 3 compared with the ICA sources s1, s2 and s3 (natural urine, citrate, hippurate), and the PCA score plot (PC1, PC2) compared with the ICA weights $a^T_2$ and $a^T_3$.]
14
The proposed methodology
  • Part I: dimension reduction
  • with ICA on the spectral data,
    $X_C^T = S A^T$,
    giving q sources representing the spectra of
    independent (unrelated) composite metabolites
    contained in the samples.

Part II: biomarker discovery
through statistical modeling on
the mixing matrix $A^T$, with
covariates chosen in the design matrix
  • Identification of biomarkers
  • Comparison of the intensities in biomarkers
    between spectra from different conditions
15
PART II: Biomarker discovery with a statistical
model
  • The idea: among the q recovered sj, we
    suppose that some sources
  • present biomarker regions for a chosen factor yk,
  • are interpretable as the spectra of pure or
    composite independent
  • metabolites whose concentration in
    the samples is influenced by the chosen factor yk,
  • have weights influenced by the chosen factor yk.

$A^T$ (q x n)
  • Modelling of the relation between the weight
    vectors and the design variables

16
PART II
  • Part I: ICA on the spectral data, $X_C^T = S A^T$

Part II: biomarker discovery through statistical
modelling
Step 1: fit a linear model on $A^T$
  • relation between the weight vector and the
    covariates in the design variables;
  • different models.

Step 2: biomarker identification
  • apply statistical tests on the parameters
    of the models;
  • selection of the sources with
    significant effects.

Step 3: comparison of the intensities
in biomarkers between spectra
from different conditions
  • prediction of the mixing weights by the model;
  • reconstruction with the biomarker sources;
  • comparison between factor levels.

17
Step 1: fit a model on $A^T$
  • The design matrix Y is rewritten into 2 separate
    matrices:
  • Z1, the (n x p1) incidence matrix for the p1
    covariates with fixed effects;
  • Z2, the (n x p2) incidence matrix for the p2
    covariates with random effects.
  • For each of the q recovered sj, we assume a
    linear relation between its vector of weights and
    the design variables:
    $a_j = Z_1 \beta_j + Z_2 \gamma_j + e_j$
  • Models with only fixed-effects covariates:
    $a_j = Z_1 \beta_j + e_j$
  • Case 1: categorical covariates, ANOVA
  • e.g. 1: biomarker to discriminate 2 groups of
    subjects: diseased and healthy;
  • e.g. 2: biomarker to discriminate 3 groups of
    subjects: disease 1, disease 2 and healthy.
  • Case 2: quantitative covariates, linear
    regression
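The slide does not specify how a categorical covariate is coded in Z1. A minimal sketch, assuming treatment (reference-level) coding with an intercept column; other parametrizations give the same fitted group means. The helper `incidence_matrix`, the group labels and the coefficient values are hypothetical.

```python
import numpy as np


def incidence_matrix(groups, levels):
    """Fixed-effects incidence matrix Z1 for one categorical covariate.

    Column 0 is the intercept; column i >= 1 indicates membership of
    levels[i], so levels[0] acts as the reference group.
    """
    groups = np.asarray(groups)
    Z1 = np.ones((groups.size, len(levels)))
    for i, level in enumerate(levels[1:], start=1):
        Z1[:, i] = (groups == level).astype(float)
    return Z1


groups = ["healthy", "disease1", "disease2", "healthy", "disease1", "disease2"]
Z1 = incidence_matrix(groups, ["healthy", "disease1", "disease2"])

# Noise-free weights a_j = Z1 beta_j: healthy mean, then shifts for each disease
beta_true = np.array([0.2, 1.5, -0.7])
a_j = Z1 @ beta_true
beta_hat, *_ = np.linalg.lstsq(Z1, a_j, rcond=None)
print(Z1)
print(beta_hat)  # recovers beta_true
```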

18
Step 1: fit a model, example
  • For each of the q = 6 recovered sj, we construct
    a multiple linear regression model
  • with 2 quantitative covariates (p = 3 parameters) and
    no interaction:
  • $a_j = \beta_{j0} + \beta_{j1} y_1 + \beta_{j2} y_2 + e_j$
  • with $\beta_{j0}$ the intercept,
  • y1 the citrate concentration in
    mg (quantitative),
  • y2 the hippurate concentration in
    mg (quantitative),
  • $\beta_{j1}$ the effect of citrate on the
    mean of $a_j$ for a fixed value of hippurate,
  • $\beta_{j2}$ the effect of hippurate on the
    mean of $a_j$ for a fixed value of citrate,
  • $e_j$ the vector of independent
    random errors, $N(0, \sigma^2)$.
  • For each of the q recovered sj, the model fitted
    by least squares is
  • $\hat a_j = b_{j0} + b_{j1} y_1 + b_{j2} y_2$
  • In this example, we want to identify biomarkers
    for the concentration of citrate.
  • The covariate of interest is yk = y1.
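A minimal sketch of this least-squares fit for all q weight vectors at once with `numpy.linalg.lstsq`, on simulated data. The concentrations, true coefficients and noise level are hypothetical; each row of $A^T$ is one weight vector $a_j$, so the response matrix passed to `lstsq` is $A$ (n x q).

```python
import numpy as np


def fit_weights(AT, y1, y2):
    """Least-squares fit of a_j = b_j0 + b_j1*y1 + b_j2*y2 for every row of AT.

    AT: mixing weights, shape (q, n). Returns B of shape (q, 3), where row j
    holds (b_j0, b_j1, b_j2), and the fitted weights, shape (q, n).
    """
    Z = np.column_stack([np.ones_like(y1), y1, y2])   # n x p design, p = 3
    B, *_ = np.linalg.lstsq(Z, AT.T, rcond=None)      # p x q, one column per source
    return B.T, (Z @ B).T


# Simulated example: n = 28 spectra, q = 6 sources, made-up concentrations
rng = np.random.default_rng(0)
n, q = 28, 6
y1 = rng.uniform(0.0, 10.0, n)      # hypothetical citrate concentrations
y2 = rng.uniform(0.0, 10.0, n)      # hypothetical hippurate concentrations
B_true = rng.normal(size=(q, 3))
Z = np.column_stack([np.ones(n), y1, y2])
AT = B_true @ Z.T + 0.01 * rng.standard_normal((q, n))

B_hat, AT_hat = fit_weights(AT, y1, y2)
print(np.round(B_hat - B_true, 3))   # small estimation errors
```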

19
Step 1: fit a model, example
[Figure: weights a2 (source s2, citrate) and a3 (source s3, hippurate) plotted against citrate (y1) and hippurate (y2).]
20
Step 2: biomarker identification
  • Goal:
  • among the q sources, we want to select the ones
    presenting a significant effect of the
  • chosen covariate yk on their weights.
  • These discriminant sources represent the
    spectrum of an independent metabolite
  • whose concentration depends on the chosen
    covariate: the biomarkers.
  • For each source sj, test the significance of the
    parameter $\beta_{jk}$ of the covariate of interest yk
  • (e.g. when searching for biomarkers of the dose
    of citrate y1, we test each of the 6 $\beta_{j1}$):
  • $H_0: \beta_{jk} = 0$ vs $H_1: \beta_{jk} \neq 0$;
  • compute the statistic $t_j = b_{jk} / s(b_{jk}) \sim t_{(n-p)}$;
  • take the corresponding p-value $p_j = P(|t_{(n-p)}| \geq |t_j|)$.
  • We are in a multiple-testing situation:
  • the selection of a significant set of r
    coefficients $\beta_{jk}$ is based on the q values pj obtained from q
    individual tests.
  • Bonferroni correction: select, in an (m
    x r) matrix S, the r sources with $p_j < 0.05/q$.
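A minimal sketch of the two-sided t-tests on $\beta_{jk}$ followed by the Bonferroni selection, using ordinary least-squares standard errors $s(b_{jk}) = \sqrt{s^2 [(Z^T Z)^{-1}]_{kk}}$ and `scipy.stats.t.sf`. The simulated data (only sources 1 and 2 depend on y1) and the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def coefficient_t_tests(Z, AT, k):
    """Two-sided t-test of H0: beta_jk = 0 for every source j.

    Z: design matrix (n x p) with the intercept column; AT: weights (q x n);
    k: column of the covariate of interest in Z. Returns t_j and p_j (length q).
    """
    n, p = Z.shape
    B, *_ = np.linalg.lstsq(Z, AT.T, rcond=None)      # p x q coefficients b
    resid = AT.T - Z @ B                              # n x q residuals
    s2 = (resid**2).sum(axis=0) / (n - p)             # residual variance per source
    se = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[k, k])   # s(b_jk)
    t = B[k] / se
    pvals = 2.0 * stats.t.sf(np.abs(t), df=n - p)     # P(|t_(n-p)| >= |t_j|)
    return t, pvals


def bonferroni_select(pvals, alpha=0.05):
    """Indices of the sources with p_j < alpha / q."""
    pvals = np.asarray(pvals)
    return np.flatnonzero(pvals < alpha / pvals.size)


# Simulated example: only sources 1 and 2 depend on y1 (hypothetical data)
rng = np.random.default_rng(0)
n, q = 28, 6
y1, y2 = rng.uniform(0.0, 10.0, (2, n))
Z = np.column_stack([np.ones(n), y1, y2])
beta = np.zeros((q, 3))
beta[:, 2] = 0.5                     # every source depends on y2
beta[[1, 2], 1] = 1.0                # sources 1 and 2 also depend on y1
AT = beta @ Z.T + 0.1 * rng.standard_normal((q, n))

t, pvals = coefficient_t_tests(Z, AT, k=1)
selected = bonferroni_select(pvals)
print(selected)   # should contain sources 1 and 2
```

The columns of S corresponding to `selected` then form the (m x r) matrix of biomarker sources.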

21
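Step 2: biomarker identification, example
We search for biomarkers of the dose of
citrate, y1 = yk, so we test each of the 6 $\beta_{j1}$.
P-values compared with the threshold $\alpha = 0.05/6$;
p-values below the threshold: $9.18 \times 10^{-13}$, $1.84 \times 10^{-15}$, $2.86 \times 10^{-31}$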
22
Step 3: comparison of the intensities in
biomarkers
  • Goal: comparison of the effects on the biomarker
    caused by different changes in yk.
  • Choose 3 or more values of yk:
  • yk1, a first reference value of yk;
  • yk2, a new value of interest of yk;
  • yk3, a second new value of interest of yk.
  • Compute (with $\beta_k$ the vector of the r coefficients
    $\beta_{jk}$ of the selected sources)
  • the effect on the biomarker of the change of yk
    from yk1 to yk2:
  • $C_1 = S \beta_k (y_{k2} - y_{k1})$;
  • the effect on the biomarker of the change of yk
    from yk1 to yk3:
  • $C_2 = S \beta_k (y_{k3} - y_{k1})$.
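A minimal sketch of $C = S \beta_k (y_{new} - y_{ref})$, using fitted coefficients in place of the unknown $\beta_{jk}$. The selected sources, the coefficients and the values of yk are hypothetical.

```python
import numpy as np


def biomarker_effect(S_sel, beta_k, y_ref, y_new):
    """Effect on the biomarker spectrum of moving y_k from y_ref to y_new.

    S_sel: selected sources (m x r); beta_k: the r coefficients of y_k.
    Returns C = S_sel beta_k (y_new - y_ref), one value per spectral point.
    """
    return S_sel @ np.asarray(beta_k, dtype=float) * (y_new - y_ref)


rng = np.random.default_rng(0)
S_sel = rng.laplace(size=(600, 3))     # hypothetical selected sources
beta_k = [0.5, -0.2, 1.0]              # hypothetical fitted coefficients b_jk
yk1, yk2, yk3 = 0.0, 1.0, 3.0          # reference value and two new values

C1 = biomarker_effect(S_sel, beta_k, yk1, yk2)
C2 = biomarker_effect(S_sel, beta_k, yk1, yk3)
```

Because the model is linear in yk, C2 is C1 scaled by (yk3 - yk1) / (yk2 - yk1), here a factor of 3.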

23
Step 3: example
[Figure: effects on the citrate biomarker for the citrate values (yk) yk1, yk2, yk3 and yk4.]
24
Conclusions
  • With the presented methodology combining ICA with
    statistical modeling,
  • we visualize the independent metabolites
    contained in the studied biofluid (through the
    sources) and their quantity (through the mixing
    weights);
  • we identify biomarkers, or spectral regions
    changing according to a chosen factor, by
    selecting sources;
  • we compare the effects on these spectral
    biomarkers caused by different changes of this
    factor.
  • In comparison with PCA, ICA
  • gives more biologically meaningful and natural
    representations of these data.

25
  • Thank you
    for your attention

26
Example 2: the data
  • 18 spectra of 600 values
  • 1 characteristic in Y:
  • X (18 x 600), Y (18 x 1)
  • y1: disease
    group of the rat (qualitative)
  • We want biomarkers for the disease group described
    in y1,
  • so we use a model with
    qualitative covariates.

Group 1: disease 1; Group 2: disease 2; Group 3: no
disease
27
Example 2, Part I: dimension reduction by ICA
$X_C^T = S A^T$, with S (600 x 5) and $A^T$ (5 x 18)
28
Example 2, Part II: biomarker discovery through
statistical modeling
  • Step 1: fit a model on $A^T$: models with
    only a categorical covariate with fixed effects,
    one-way ANOVA:
  • $a_j = Z_1 \beta_j + e_j$
  • Step 2: biomarker identification
  • For each of the q recovered sj, test the effect
    of y1: statistic $F_j$, p-value $p_j$.
  • Bonferroni correction: select, in an (m x r)
    matrix S, the r sources with $p_j < 0.05/q$.

P-values of the selected sources (here q = 5, so the threshold is 0.05/5 = 0.01):
0.009797431
0.0002412604
0.005710213
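A minimal sketch of Step 2 for a categorical covariate: a one-way ANOVA F-test per weight vector with `scipy.stats.f_oneway`, followed by the Bonferroni selection. The simulated weights (only source 0 depends on the group) are hypothetical; the sizes match Example 2 (18 spectra, 3 groups, q = 5).

```python
import numpy as np
from scipy import stats


def anova_pvalues(AT, groups):
    """One-way ANOVA F-test of the group effect on each weight vector a_j."""
    groups = np.asarray(groups)
    levels = np.unique(groups)
    pvals = []
    for a_j in AT:                                   # one row of AT per source
        samples = [a_j[groups == g] for g in levels]
        pvals.append(stats.f_oneway(*samples).pvalue)
    return np.array(pvals)


# Simulated example: 18 spectra in 3 groups of 6, q = 5 sources;
# only source 0 has weights that depend on the group (hypothetical data)
rng = np.random.default_rng(0)
groups = np.repeat(["disease1", "disease2", "healthy"], 6)
AT = rng.normal(scale=0.5, size=(5, 18))
AT[0] += np.repeat([0.0, 2.0, 4.0], 6)

pvals = anova_pvalues(AT, groups)
q = AT.shape[0]
selected = np.flatnonzero(pvals < 0.05 / q)          # Bonferroni correction
print(np.round(pvals, 4), selected)
```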
29
Step 3: comparison of the intensities in
biomarkers
  • Goal: comparison of the effects on the biomarker
    caused by different changes in yk.
  • Choose 3 or more values of yk:
  • yk1, a first reference value of yk;
  • yk2, a new value of interest of yk;
  • yk3, a second new value of interest of yk.
  • Compute
  • the effect on the biomarker of the change of yk
    from yk1 to yk2:
  • $C_1 = S \beta_k (y_{k2} - y_{k1})$;
  • the effect on the biomarker of the change of yk
    from yk1 to yk3:
  • $C_2 = S \beta_k (y_{k3} - y_{k1})$.

30
Step 3: comparison of the intensities in
biomarkers
Goal: comparison of the effects on the biomarker
caused by the changes of group.
[Figure: citrate.]
31
  • Other slides

32
Example 1, Step 4: comparison of the intensities
in biomarkers
  • For a first chosen value of interest of yk (not
    necessarily observed), yk0,
  • choose a reference value for the other
    factors, $y_{0,\neq k}$,
  • so that $Z_{01} = (1, y_{k0}, y_{0,\neq k})$ is the vector of values.
  • For each of the r sources selected in S, use the
    model to predict the weights for the biomarkers:
  • $\hat a_j(y_{k0}) = Z_{01} b_j$,
  • so $\hat a(y_{k0})$ is a vector
    of r new weights.
  • Reconstruction of the values to expect in the
    biomarkers:

  • $S \hat a(y_{k0})$
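A minimal sketch of the prediction and reconstruction step: the fitted coefficient vectors $b_j$ are stacked as rows of a matrix, multiplied by $Z_{01}$ to get the r predicted weights, and the selected sources are recombined with those weights. The sources, coefficients and covariate values are hypothetical.

```python
import numpy as np


def reconstruct_biomarkers(S_sel, B_sel, z0):
    """Predict the weights a_hat_j(y_k0) = Z01 b_j and rebuild S a_hat(y_k0).

    S_sel: selected sources (m x r); B_sel: fitted coefficients (r x p),
    row j = b_j; z0: the vector Z01 = (1, y_k0, reference values), length p.
    Returns the reconstructed biomarker spectrum (m,) and the r weights.
    """
    a_hat = B_sel @ np.asarray(z0, dtype=float)     # r predicted weights
    return S_sel @ a_hat, a_hat


rng = np.random.default_rng(0)
S_sel = rng.laplace(size=(600, 3))                   # hypothetical selected sources
B_sel = np.array([[0.1, 0.2, 0.0],                   # hypothetical (b_j0, b_j1, b_j2)
                  [0.0, 0.5, 0.1],
                  [1.0, 0.0, 0.3]])
z0 = [1.0, 5.0, 2.0]                                 # (1, y_k0 = 5, y2 fixed at 2)

spectrum, a_hat = reconstruct_biomarkers(S_sel, B_sel, z0)
print(a_hat)
```

Repeating the call for several values of yk0 and subtracting the reconstructed spectra gives the comparison between factor levels.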

33
(No Transcript)
34
Example 2: the reconstructed spectra
35
(No Transcript)