Some Two-Block Problems
Transcript and Presenter's Notes

1
Some Two-Block Problems
  • Douglas M. Hawkins
  • NCSU ECCR NISS Feb 2007
  • (work with Despina Stefan.)

2
Disclaimer
  • No new results or material coming up.
  • I will present some things that are known, and
    some useful-looking extensions.
  • Extensions that look worth pursuing are for
    another day.

3
Two Conceptual Settings
  • Usual QSAR has
  • One dependent variable (activity),
  • A vector of predictors (structure),
  • and seeks a model connecting the two.
  • Variants include
  • A vector of dependent variables, and/or
  • Predictors that break logically into blocks.

4
Example first type
  • In drug discovery, concern is with efficacy (one
    measure) and also with safety (many measures).
  • The safety endpoints constitute a vector of
    dependents to relate to the vector of predictors.
  • Commonly we handle safety with a collection of QSAR
    models, each predicting an individual AE (adverse
    event). But other approaches are possible.

5
Example second type
  • Or we may have a single dependent, and the
    predictors may break into blocks, e.g.
  • Molecular structure variables,
  • Microarray measures,
  • Proteomic measures,
  • Ames test toxicity.

6
First type in detail
  • In the first setting, we have
  • an m-component vector Y of dependents,
  • a p-component vector X of predictors,
  • that we seek to relate.
  • The classical tool is canonical correlation
    analysis (CC).

7
Canonical Correlation
  • Consider the classical setting in psychometrics:
  • X and Y are scores on two batteries of tests
    thought to measure innate ability.
  • Seek a common linking subspace. Find coefficient
    vectors a and b such that
  • aᵀX and bᵀY
  • are maximally correlated.
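
In symbols (the covariance notation is assumed here, not
on the slide), the first canonical pair solves

    \max_{a,b}\ \operatorname{corr}(a^{\mathsf T}X,\; b^{\mathsf T}Y)
      \;=\; \frac{a^{\mathsf T}\Sigma_{XY}\,b}
                 {\sqrt{a^{\mathsf T}\Sigma_{XX}\,a}\,
                  \sqrt{b^{\mathsf T}\Sigma_{YY}\,b}} ,

where \Sigma_{XX}, \Sigma_{YY}, \Sigma_{XY} are the covariance
matrices of X and Y and their cross-covariance.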

8
Canonical continued
  • The idea is that aᵀX, bᵀY capture a latent
    dimension, conceptually like a factor in factor
    analysis.
  • Having found the maximizing pair a, b, go off at
    right angles and get another, orthogonal,
    maximizing pair. Do so repeatedly.
  • Finding k such significant coefficient-vector
    pairs points to the data containing k dimensions
    in which X and Y co-vary. So CC is a dimension
    reduction method (DRM).

9
How do we fit CC?
  • The least-squares criterion leads to a closed-form
    eigenvalue problem.
  • Another potential approach: use an alternating fit
    algorithm (sketched below).
  • Get a trial b.
  • Regress bᵀY on X to get a trial a.
  • Regress aᵀX on Y to get a new trial b.
  • Iterate to convergence.
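
A minimal sketch of the alternating fit, assuming numpy
and mean-centred data matrices X (n x p) and Y (n x m);
the function and variable names are illustrative, not
from the slides.

    import numpy as np

    def alternating_cc(X, Y, n_iter=100, tol=1e-8):
        """First canonical pair by alternating least-squares regressions."""
        rng = np.random.default_rng(0)
        b = rng.standard_normal(Y.shape[1])                    # trial b
        for _ in range(n_iter):
            a, *_ = np.linalg.lstsq(X, Y @ b, rcond=None)      # regress b'Y on X
            b_new, *_ = np.linalg.lstsq(Y, X @ a, rcond=None)  # regress a'X on Y
            b_new /= np.linalg.norm(b_new)                     # fix the scale of b
            if np.linalg.norm(b_new - b) < tol:
                b = b_new
                break
            b = b_new
        return a / np.linalg.norm(X @ a), b / np.linalg.norm(Y @ b)

For later pairs, deflate X and Y (remove the fitted
scores' contribution, as on the next slide) and repeat.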

10
Algorithm continued
  • This gives first coefficient vector pair.
  • Deflate both X and Y.
  • Start all over and get second coefficient pair.
  • Continue until you have enough dimensions.
  • A hideously inefficient calculation compared to
    the eigen approach.

11
What about outliers?
  • As usual, LS is susceptible to outlier problems,
    and so CC is also.
  • The alternating optimization algorithm allows a
    choice of other, outlier-resistant criteria. For
    example, use the L1 criterion, or trimmed least
    squares, to get a robust CC (see the sketch below).
  • I don't know anyone who has tried this idea, but
    it is straightforward to do.
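
One way to make the regression step robust (an
illustrative choice, not named on the slide) is median
(L1) regression via statsmodels; the two least-squares
regressions in the alternating loop would simply be
swapped for this.

    from statsmodels.regression.quantile_regression import QuantReg

    def l1_step(X, t):
        # median regression of the current score t on the block X
        return QuantReg(t, X).fit(q=0.5).params

    # inside the alternating loop:
    #   a = l1_step(X, Y @ b)
    #   b = l1_step(Y, X @ a)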

12
Non-negative CC
  • Alternating optimization provides a route to
    non-negative canonical correlation (NNCC).
  • Fit alternating regressions, as in the sketch above.
  • But restrict the coefficients to be non-negative
    using standard inequality-constrained regression
    methods (see the sketch below).
  • This leads to NNCC.
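
A minimal sketch of the constrained regression step,
assuming scipy; nnls solves min ||Xa - t|| subject to
a >= 0 and would replace the plain least-squares step.

    from scipy.optimize import nnls

    def nn_step(X, t):
        # non-negative least-squares regression step
        a, _ = nnls(X, t)      # returns (solution, residual norm)
        return a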

13
Robust NNCC
  • When fitting the alternating regressions, use an
    outlier-resistant criterion.
  • For example, the L1 norm.
  • The marriage of the L1 norm and non-negative
    coefficients leads to a linear program (sketched
    below). This may prove to be surprisingly
    reasonable computationally.
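
A minimal sketch of that linear program, assuming scipy;
the residuals get slack variables e >= 0 and the
coefficients are constrained non-negative. Names are
illustrative, not from the slides.

    import numpy as np
    from scipy.optimize import linprog

    def l1_nonneg_fit(X, t):
        # min sum|X a - t|  subject to  a >= 0
        n, p = X.shape
        c = np.concatenate([np.zeros(p), np.ones(n)])   # cost: sum of slacks e
        A_ub = np.block([[ X, -np.eye(n)],              #  X a - e <=  t
                         [-X, -np.eye(n)]])             # -X a - e <= -t
        b_ub = np.concatenate([t, -t])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                      bounds=[(0, None)] * (p + n), method="highs")
        return res.x[:p]

Adding the LASSO-style L1 penalty of the next slide only
changes the cost vector: since a >= 0, a penalty lambda on
each coefficient just adds lambda to the first p entries of c.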

14
And while we are at it
  • If we use the L1 criterion and non-negative
    coefficients, we can also impose an L1 penalty on
    the coefficient vector.
  • This leads to a linear programming problem. A
    Koenker/Portnoy paper suggests this can be solved
    in time competitive with L2 regression.
  • The L1 penalty on the coefficient vector, the
    LASSO, is known to be a route to automatic
    sparsity.

15
Detour Ridge and LASSO
  • In regression, penalizing the L2 norm of the
    coefficient vector gives ridge regression; the L1
    norm gives the LASSO.
  • LASSO gives sparse coefficients; ridge does not.
    Given a set of equivalent predictors, LASSO keeps
    one and drops the rest; ridge smooths all their
    coefficients toward a common consensus figure (see
    the illustration below).
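
A small illustration of the contrast, assuming
scikit-learn; with two identical predictors, the LASSO
zeroes one coefficient while ridge shares the weight.
The data and penalty values are made up for illustration.

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(0)
    x = rng.standard_normal(200)
    X = np.column_stack([x, x])              # two equivalent predictors
    y = 3 * x + 0.1 * rng.standard_normal(200)

    print(Lasso(alpha=0.1).fit(X, y).coef_)  # roughly [2.9, 0.0]: one kept, one dropped
    print(Ridge(alpha=1.0).fit(X, y).coef_)  # roughly [1.5, 1.5]: weight shared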

16
CC is not widely used.
  • CC is unhelpful in safety studies: we care about
    the incidence of headaches and of diarrhea, not
    about 0.7 headache - 0.5 diarrhea.
  • But CC can be a valuable screen. Variables with
    large loadings apparently relate in some way to
    variables on the other side. The converse, though,
    is not true.
  • Extended robust and/or NN versions could be
    valuable tools.

17
PLS
  • PLS is also able to handle relating a vector Y
    to a vector X (see the example below).
  • Computation is a lot faster than CC.
  • But it also has an underlying LS criterion, so you
    are still at the mercy of outliers,
  • and it also gives you linear combinations of
    variables that are not easy to interpret.
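
A minimal two-block PLS fit, assuming scikit-learn; the
latent-dimension data here are simulated purely for
illustration and stand in for the predictors X and the
vector of dependents Y.

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(0)
    latent = rng.standard_normal((100, 2))                 # two latent dimensions
    X = latent @ rng.standard_normal((2, 20)) + 0.1 * rng.standard_normal((100, 20))
    Y = latent @ rng.standard_normal((2, 5)) + 0.1 * rng.standard_normal((100, 5))

    pls = PLSRegression(n_components=2).fit(X, Y)
    Y_hat = pls.predict(X)            # fitted values for all 5 dependents at once
    print(pls.x_weights_.shape)       # (20, 2): linear combinations of the X variables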

18
Second Setting
  • Suppose we have predictors that divide into
    natural blocks X1, X2, ..., Xk.
  • The obvious analysis method adjoins all the
    predictors and fits a QSAR in the usual way -
    nothing new.

19
Predictor Blocks
  • Or we can form subsets of blocks (2^k - 1 possible)
    and fit a QSAR on each subset of blocks. Use
    measures of additional information to see how much
    each block adds to its predecessors. Helpful to
    know if microarray adds usefully to atom pairs.
  • Again, nothing earth-shattering. Exhaustive
    enumeration of block subsets is thinkable, as we
    typically have only a few blocks (see the sketch
    below).
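
A minimal sketch of the exhaustive enumeration, assuming
scikit-learn; the linear model, 5-fold cross-validated R2
scoring, and block names are illustrative choices, not
from the slides.

    import numpy as np
    from itertools import combinations
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    def score_block_subsets(blocks, y):
        # fit on every non-empty subset of predictor blocks (2^k - 1 of them)
        scores = {}
        names = list(blocks)
        for r in range(1, len(names) + 1):
            for subset in combinations(names, r):
                X = np.hstack([blocks[name] for name in subset])
                cv = cross_val_score(LinearRegression(), X, y, scoring="r2", cv=5)
                scores[subset] = cv.mean()
        return scores

    # e.g. blocks = {"structure": X1, "microarray": X2, "proteomic": X3};
    # comparing scores[("structure",)] with scores[("structure", "microarray")]
    # shows how much the microarray block adds to the structure block.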

20
Different Way of Thinking
  • Return to CC.
  • It was not wonderfully helpful as a modeling tool.
  • But it might be successful as a DRM.

21
A DRM Model
  • Suppose there are a few latent dimensions.
    These dimensions drive Y and the Xk blocks.
  • Maybe we can recover the latent dimensions from
    the X, and use these to predict Y.
  • Potential for a huge reduction in the standard
    errors of the components if the model holds.
  • Principal component regression (PCR) is the
    special case obtained when we have only one block.

22
Example
  • With two blocks of predictors, X1 and X2:
  • Do a CC of the two blocks.
  • Use these apparently-common dimensions as
    predictors of Y (see the sketch below).
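
A minimal sketch of this recipe, assuming scikit-learn;
X1, X2 and y are illustrative names, and CCA plays the
role of the CC step on the two predictor blocks.

    import numpy as np
    from sklearn.cross_decomposition import CCA
    from sklearn.linear_model import LinearRegression

    def cc_drm_fit(X1, X2, y, n_dims=3):
        # CC of the two predictor blocks, then regress y on the common dimensions
        cca = CCA(n_components=n_dims).fit(X1, X2)
        T1, T2 = cca.transform(X1, X2)      # canonical variates of each block
        Z = np.hstack([T1, T2])             # apparently-common latent dimensions
        model = LinearRegression().fit(Z, y)
        return cca, model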

23
Is this like a PCA of adjoined X?
  • In principle, no. Getting under the hood of the
    eigensolution to CC, step 1 is:
  • Multistandardize: transform X to W = EX and
    Y to V = FY, where the elements of W are
    uncorrelated and the elements of V are uncorrelated.
  • Do an SVD of the cross-covariance matrix of W and V.
  • The multistandardization step flattens out the
    principal components of both X and Y.
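
In symbols (notation assumed here, not on the slide):
taking E = \Sigma_{XX}^{-1/2} and F = \Sigma_{YY}^{-1/2},
the canonical correlations are the singular values of the
whitened cross-covariance

    \operatorname{cov}(EX,\,FY)
      \;=\; \Sigma_{XX}^{-1/2}\,\Sigma_{XY}\,\Sigma_{YY}^{-1/2}
      \;=\; U\,\operatorname{diag}(\rho_1,\dots,\rho_r)\,V^{\mathsf T},

and the coefficient vectors are recovered as
a_i = \Sigma_{XX}^{-1/2} u_i and b_i = \Sigma_{YY}^{-1/2} v_i.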

24
Which means...
  • To come out of the CC as an important latent
    dimension, covarying within either X or Y is not
    enough: the dimension needs to be common between
    the two blocks.
  • Thus CC of the two blocks is, in principle, a
    different DRM approach.

25
Three or more blocks
  • CC covers two predictor blocks. There are
    several ways to generalize to three or more
    blocks.
  • A recent U of MN PhD thesis by Despina Stefan
    discussed a number of them.
  • In it, she looked at generalized CC as a DRM
    method for use in QSAR.

26
Does it work?
  • She simulated a setting with 3 latent dimensions
    that determined both the blocks of X and the
    dependent Y.
  • Doing this DRM on the predictor blocks and
    regressing on the constructed variables was
    highly effective when there was appreciable noise
    in the relationships from the latent dimensions
    to the X and Y.

27
Real-data results
  • Limited testing on real data sets to date.
    Results have been OK, but not earth-shattering.
    We await the setting where there really are a few
    underlying latent dimensions.

28
And non-negative?
  • These results were in the sign-unconstrained
    setting. It is reasonable to expect them to carry
    over to non-negative equivalents. NN variants of
    the multi-block approach as a DRM should be
    straightforward and potentially powerful QSAR
    tools.

29
Wrapup
  • The first setting, a vector Y, is familiar from
    the early days of psychometrics. Robust and/or NN
    variants seem ripe for the picking.
  • The second setting, multiple predictor blocks, is
    gaining relevance. Robust and/or NN variants seem
    straightforward to develop.
  • Work on unrestricted formulations indicates
    potential for specialized DRM approaches; this
    should carry over.