1
Cancer Biomarkers, Insufficient Data, Multiple
Explanations
2
  • Do not trust everything you read

3
Side Topic: Biomarkers from mass spec data
  • SELDI
  • Surface Enhanced Laser Desorption/Ionization. It
    combines chromatography with TOF-MS.
  • Advantages of SELDI technology
  • Uses small amounts (< 1 µl / 500-1000 cells) of
    sample (biopsies, microdissected tissue).
  • Quickly obtains protein maps from multiple
    samples under the same conditions.
  • Ideal for discovering biomarkers quickly.

4
ProteinChip Arrays
5
SELDI Process
Copied from http://www.bmskorea.co.kr/new01_21-1.htm
6
Protein mapping
[Figure: example protein maps]
7
Biomarker Discovery
  • Markers can be easily found by comparing protein
    maps.
  • SELDI is faster and more reproducible than 2D
    PAGE.
  • Has been used to discover protein
    biomarkers of diseases such as ovarian cancer,
    breast cancer, prostate and bladder cancers.

[Figure: (Normal) vs. (Cancer) spectra, modified from the
Ciphergen web site]
8
Inferring biomarkers
  • Conrads-Zhou-Petricoin-Liotta-Veenstra (Cancer
    diagnosis using proteomic patterns, Expert Rev.
    Mol. Diagn. 3(4) (2003) 411-420): a genetic
    algorithm found 10 proteins (out of 15000
    peaks!) as biomarkers to separate about 100
    spectra.
  • Questions
  • Are they the only separator?
  • Might they be by-products of other, more important
    proteins?
  • Could some of them be noise or contaminants?

9
Multiple Decision List Experiment (research work
with Jian Liu)
  • We answer the first question; the last two are not
    our concern.
  • Decision List
  • If M1 then O1
  • else if M2 then O2
  • ...
  • else if Mj-1 then Oj-1
  • else Oj
  • Each term Mi is a monomial of boolean variables.
    Each Oi denotes positive or negative, for
    classification.
  • k-decision list: the Mi's are monomials of size at
    most k. It is PAC-learnable (Rivest, 1987); a
    representation sketch follows below.
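A minimal sketch (not from the slides) of how such a decision list could be represented and evaluated in Python; the type names and the example list are illustrative only.

```python
# Hypothetical representation of a k-decision list over boolean features.
# Each rule is (monomial, label); a monomial is a list of literals
# (feature_index, required_value). The final rule has an empty monomial
# and acts as the default label.

from typing import Dict, List, Tuple

Monomial = List[Tuple[int, bool]]          # e.g. [(3, True), (7, False)] means x3 AND NOT x7
DecisionList = List[Tuple[Monomial, str]]  # ordered rules; last rule's monomial is empty

def satisfies(example: Dict[int, bool], monomial: Monomial) -> bool:
    """True if the example satisfies every literal of the monomial."""
    return all(example.get(i, False) == value for i, value in monomial)

def classify(dl: DecisionList, example: Dict[int, bool]) -> str:
    """Walk the list; the first satisfied monomial decides the label."""
    for monomial, label in dl:
        if satisfies(example, monomial):
            return label
    raise ValueError("decision list has no default rule")

# Illustrative 2-decision list: "if x1 and not x4 then cancer,
# else if x2 then cancer, else normal".
dl = [([(1, True), (4, False)], "cancer"), ([(2, True)], "cancer"), ([], "normal")]
print(classify(dl, {1: True, 4: False}))   # -> cancer
print(classify(dl, {2: False}))            # -> normal
```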

10
Learning Algorithm for k-DL
  • k-decision list Algorithm
  • Stage i: find a term that
  • covers only the yet-uncovered cancer (or normal)
    set, and
  • whose cover is largest.
  • Add the term to the decision list as the i-th term,
    with Oi = cancer (or normal).
  • Repeat until, say, the cancer set is fully covered.
    Then the last Oj = normal.
  • Multiple k-decision list: at each stage, find
    many terms that have the same coverage (a greedy
    sketch follows below).
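A rough Python sketch of the greedy stage described above, under the simplifying assumption that the features are already boolean (e.g. thresholded peak intensities); the function and variable names are illustrative, not the authors' code. It enumerates monomials by brute force, so it is only meant to convey the idea.

```python
from itertools import combinations, product
from typing import Dict, List, Tuple

Example = Dict[int, bool]
Monomial = Tuple[Tuple[int, bool], ...]

def covers(m: Monomial, x: Example) -> bool:
    return all(x.get(i, False) == v for i, v in m)

def best_terms(pos: List[Example], neg: List[Example],
               n_features: int, k: int) -> List[Monomial]:
    """All monomials of size <= k that cover no negative example and
    cover the maximum number of still-uncovered positives."""
    best, best_cover = [], 0
    for size in range(1, k + 1):
        for idx in combinations(range(n_features), size):
            for vals in product([True, False], repeat=size):
                m = tuple(zip(idx, vals))
                if any(covers(m, x) for x in neg):
                    continue                      # must cover only the positive class
                c = sum(covers(m, x) for x in pos)
                if c > best_cover:
                    best, best_cover = [m], c     # new largest cover
                elif c == best_cover and c > 0:
                    best.append(m)                # equally good alternative term
    return best

def learn_multiple_k_dl(pos, neg, n_features, k):
    """Greedy stages: keep one chosen term per stage, but record how many
    equally-covering alternatives existed (the 'multiple' counts)."""
    dl, alternatives, uncovered = [], [], list(pos)
    while uncovered:
        terms = best_terms(uncovered, neg, n_features, k)
        if not terms:
            break
        dl.append((terms[0], "cancer"))
        alternatives.append(len(terms))
        uncovered = [x for x in uncovered if not covers(terms[0], x)]
    dl.append(((), "normal"))                     # default label
    return dl, alternatives
```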

11
Thousands of Solutions
    proteins   nodes     training    # of equivalent terms       testing accuracy
    used       in 3-DL   accuracy    in each DL node             max        min
    -------------------------------------------------------------------------------
      3        4         (43/79)     (1, 3, 7, 7)                (45/81)    (44/81)
      5        5         (42/81)     (1, 5, 1, 24, 24)           (44/81)    (43/81)
      8        5         (43/81)     (1, 9, 1, 29, 92)           (43/81)    (42/81)
     10        5         (43/81)     (1, 9, 3, 46, 175)          (43/81)    (42/81)
     15        6         (45/80)     (1, 10, 5, 344, 198, 575)   (46/81)    (43/71)
     20        5         (45/81)     (1, 10, 5, 121, 833)        (45/81)    (43/73)
     25        4         (45/81)     (1, 14, 2, 1350)            (46/81)    (44/80)
     30        4         (45/81)     (1, 18, 2, 2047)            (46/81)    (44/80)
    ------------------------------------ 2-DL ------------------------------------
     50        4         (45/81)     (1, 4, 8, 703)              (46/81)    (42/78)
    100        4         (45/81)     (1, 4, 60, 2556)            (46/81)    (42/76)
  • WCX I: Performance of multiple decision lists.
    Notation: (x/y) = (normal/cancer).
  • Example: 1 × 4 × 60 × 2556 > ½ million 2-decision
    lists (involving 7.8 proteins each on average) are
    perfect for the (45/81) training spectra, but most
    fail to classify all (45/81) + (46/81) spectra
    perfectly (on average such a random hypothesis cuts
    off 3 healthy women's ovaries and leaves one cancer
    undetected). Why should we trust the 10
    particular proteins that are perfect for 216
    spectra?
  • Need a new learning theory to deal with a small
    amount of data and too many relevant attributes.

12
Excursion into Learning Theory
  • OED: "Induction is the process of inferring a
    general law or principle from the observations of
    particular instances."
  • Science is induction from observed data to
    physical laws.
  • But, how?

13
Occam's Razor
  • Commonly attributed to William of Ockham
    (1290-1349). It states: "Entities should not be
    multiplied beyond necessity."
  • Commonly explained as: when you have choices, choose
    the simplest theory.
  • Bertrand Russell: "It is vain to do with more
    what can be done with fewer."
  • Newton (Principia): "Natura enim simplex est, et
    rerum causis superfluis non luxuriat" (Nature is
    simple, and does not indulge in the luxury of
    superfluous causes).

14
Example. Inferring a DFA
  • A DFA accepts 1, 111, 11111, 1111111 and
    rejects 11, 1111, 111111. What is it?
  • There are actually infinitely many DFAs
    satisfying these data.
  • The first DFA makes a nontrivial inductive
    inference; the second does not (a small consistency
    check follows the figure below).

[Figure: the two candidate DFAs]
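A small Python check of the kind of "simple" DFA the slide presumably has in mind. This particular two-state automaton, accepting strings of 1s of odd length, is my illustration and not necessarily the one drawn on the slide; it is merely one DFA consistent with the data.

```python
# Hypothetical two-state DFA over the alphabet {'1'}:
# state 0 = rejecting (start), state 1 = accepting.
# It accepts exactly the strings of 1s of odd length.

def accepts(s: str) -> bool:
    state = 0                      # start in the rejecting state
    for ch in s:
        assert ch == '1'
        state = 1 - state          # each '1' toggles the state
    return state == 1              # accept iff we end in state 1

positives = ['1', '111', '11111', '1111111']
negatives = ['11', '1111', '111111']
assert all(accepts(s) for s in positives)
assert not any(accepts(s) for s in negatives)
print("consistent with all given examples")
```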
15
Example. History of Science
  • Maxwell's (1831-1879) equations say that (a)
    an oscillating magnetic field gives rise to an
    oscillating electric field; (b) an oscillating
    electric field gives rise to an oscillating
    magnetic field. Item (a) was known from M.
    Faraday's experiments. However, (b) is a
    theoretical inference by Maxwell, based on his
    aesthetic appreciation of simplicity. The
    existence of such electromagnetic waves was
    demonstrated by the experiments of H. Hertz in
    1888, 8 years after Maxwell's death, and this
    opened the new field of radio communication.
    Maxwell's theory is even relativistically
    invariant. This was long before Einstein's
    special relativity. As a matter of fact, it is
    even likely that Maxwell's theory influenced
    Einstein's 1905 paper on relativity, which was
    actually titled "On the electrodynamics of moving
    bodies".
  • J. Kemeny, a former assistant to Einstein,
    explains the transition from the special theory
    to the general theory of relativity: At the time,
    there were no new facts that failed to be
    explained by the special theory of relativity.
    Einstein was purely motivated by his conviction
    that the special theory was not the simplest
    theory which can explain all the observed facts.
    Reducing the number of variables obviously
    simplifies a theory. By the requirement of
    general covariance, Einstein succeeded in
    replacing the previous "gravitational mass" and
    "inertial mass" by a single concept.
  • Double helix vs. triple helix --- 1953, Watson &
    Crick.

16
Counter Example.
  • Once upon a time, there was a little girl named
    Emma. Emma had never eaten a banana, nor had she
    ever been on a train. One day she had to journey
    from New York to Pittsburgh by train. To relieve
    Emma's anxiety, her mother gave her a large bag
    of bananas. At Emma's first bite of her banana,
    the train plunged into a tunnel. At the second
    bite, the train broke into daylight again. At the
    third bite, Lo! into a tunnel; at the fourth bite,
    La! into daylight again. And so on all the way to
    Pittsburgh. Emma, being a bright little girl,
    told her grandpa at the station: "Every odd bite
    of a banana makes you blind; every even bite puts
    things right again." (N.R. Hanson, Perception and
    Discovery)

17
PAC Learning (L. Valiant, 1983)
  • Fix a distribution on the sample space (P(v) for
    each v in the sample space). A concept class C is
    PAC-learnable (probably approximately correct
    learnable) iff there exists a learning algorithm
    A such that, for each f in C and ε (0 < ε < 1),
    algorithm A halts in a polynomial number of steps
    and examples, and outputs a concept h in C which
    satisfies the following: with probability at
    least 1 - ε,
  •   Σ_{v: f(v) ≠ h(v)} P(v) < ε.
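The error condition written out in standard notation (a restatement of the bullet above, nothing new):

```latex
\Pr\Big[\; \sum_{v \,:\, f(v) \neq h(v)} P(v) \;<\; \epsilon \;\Big] \;\ge\; 1 - \epsilon .
```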

18
Simplicity means understanding
  • We will prove that, given a set of positive and
    negative data, any consistent concept of size
    "reasonably" shorter than the size of the data is an
    "approximately" correct concept with high
    probability. That is, if one finds a shorter
    representation of the data, then one learns. The
    shorter the conjecture is, the more efficiently
    it explains the data, hence the more precise the
    future prediction.
  • Let α < 1, β ≥ 1, m be the number of examples,
    and s be the length (in number of bits) of the
    smallest concept in C consistent with the
    examples. An Occam algorithm is a polynomial-time
    algorithm which finds a hypothesis h in C
    consistent with the examples and satisfying
  •   K(h) ≤ s^β m^α

19
Occam's Razor Theorem
  • Theorem. A concept class C is polynomially
    PAC-learnable if there is an Occam algorithm for
    it. I.e., with probability > 1 - ε,
    Σ_{v: f(v) ≠ h(v)} P(v) < ε.
  • Proof. Fix an error tolerance ε (0 < ε < 1).
    Choose m such that
  •   m ≥ max{ (2 s^β / ε)^{1/(1-α)}, (2/ε) log(1/ε) }.
  • This is polynomial in s and 1/ε. Let m be
    as above. Let S be a set of r concepts, and let f
    be one of them.
  • Claim: The probability that some concept h in S
    satisfies P(f Δ h) ≥ ε and is consistent with m
    independent examples of f is less than (1 - ε)^m r.
  • Proof: Let E_h be the event that hypothesis h
    agrees with all m examples of f. If P(h Δ f) ≥ ε,
    then h is a bad hypothesis. That is, h and f
    disagree with probability at least ε on a random
    example. The set of bad hypotheses is denoted by
    B. Since the m examples of f are independent,
  •   P(E_h) ≤ (1 - ε)^m.
  • Since there are at most r bad hypotheses,
  •   P(∪_{h in B} E_h) ≤ (1 - ε)^m r.

20
Proof of the theorem continues
  • The postulated Occam algorithm finds a hypothesis
    of Kolmogorov complexity at most s^β m^α. The number
    r of hypotheses of this complexity satisfies
  •   log r ≤ s^β m^α.
  • By the assumption on m, r ≤ (1 - ε)^{-m/2} (use
    ε < -log(1 - ε) < ε/(1 - ε) for 0 < ε < 1). Using
    the claim, the probability of producing a hypothesis
    with error larger than ε is less than
  •   (1 - ε)^m r ≤ (1 - ε)^{m/2} < ε.
  • The last inequality is by substituting m (the
    chain of inequalities is spelled out below).
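A worked version of the "substituting m" step: this is my expansion of the slide's one-line remark, assuming natural logarithms throughout.

```latex
% From m \ge (2 s^\beta/\epsilon)^{1/(1-\alpha)} we get m^{1-\alpha} \ge 2 s^\beta/\epsilon,
% hence s^\beta m^\alpha \le \epsilon m / 2. Combined with \log r \le s^\beta m^\alpha
% and \epsilon < -\log(1-\epsilon):
r \;\le\; e^{\epsilon m/2} \;\le\; (1-\epsilon)^{-m/2}.
% From m \ge (2/\epsilon)\log(1/\epsilon) and 1-\epsilon \le e^{-\epsilon}:
(1-\epsilon)^{m}\, r \;\le\; (1-\epsilon)^{m/2} \;\le\; e^{-\epsilon m/2} \;\le\; \epsilon .
```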

21
Inadequate data, too many relevant attributes?
  • Data in biotechnology is often expensive or hard
    to get.
  • PAC-learning theory, MDL, SVMs, and decision-tree
    algorithms all need sufficient data.
  • Similar situation in expression arrays, where
    too many attributes are relevant: which ones to
    choose?

22
Epicurus' Multiple Explanations
  • The Greek philosopher of science Epicurus
    (342-270 BC) proposed the Principle of Multiple
    Explanations: if more than one theory is
    consistent with the observations, keep all
    theories. 1500 years before Occam's razor!
  • "There are also some things for which it is not
    enough to state a single cause, but several, of
    which one, however, is the case. Just as if you
    were to see the lifeless corpse of a man lying
    far away, it would be fitting to state all the
    causes of death in order that the single cause of
    this death may be stated. For you would not be
    able to establish conclusively that he died by
    the sword or of cold or of illness or perhaps
    by poison, but we know that there is something of
    this kind that happened to him." (Lucretius)

23
Can the two theories be integrated?
  • When we do not have enough data, Epicurus said
    that we should just be happy to keep all the
    alternative consistent hypotheses, not selecting
    the simplest one. But how can such a
    philosophical idea be converted to concrete
    mathematics and learning algorithms?

24
A Theory of Learning with Insufficient Data
  • Definition. With the PAC-learning notations, a
    concept class is polynomially Epicurus-learnable
    iff the learning algorithm always halts within
    time and number of examples p(f, 1/ε), for some
    polynomial p, with a list of hypotheses of which
    one is probably approximately correct.
  • Definition. Let α < 1 and β ≥ 1 be constants, m be
    the number of examples, and s be the length (in
    number of bits) of the smallest concept in C
    consistent with the examples. An Epicurus
    algorithm is a polynomial-time algorithm which
    finds a collection of hypotheses h1, ..., hk in C
    such that
  • they are all consistent with the examples;
  • they satisfy K(hi) ≤ s^β m^α, for i = 1, ..., k,
    where K(x) is the Kolmogorov complexity of x;
  • they are mutually error-independent with respect
    to the true hypothesis h, that is, h1 Δ h, ..., hk
    Δ h are mutually independent, where hi Δ h is the
    symmetric difference of the two concepts
    (restated compactly below).
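The second and third conditions in compact notation; this is my restatement of the bullets above (reading "mutually independent" as mutual independence of the error events under the sampling distribution P, with h the true concept and Δ the symmetric difference):

```latex
K(h_i) \;\le\; s^{\beta} m^{\alpha} \qquad (i = 1,\dots,k),
\qquad\text{and}\qquad
P\Big(\bigcap_{i \in I} (h_i \,\Delta\, h)\Big)
\;=\; \prod_{i \in I} P(h_i \,\Delta\, h)
\quad\text{for every } I \subseteq \{1,\dots,k\}.
```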

25
  • Theorem. A concept class C is polynomially
    Epicurus-learnable if there is an Epicurus
    algorithm outputting k hypotheses and using
  •   m ≥ (1/k) max{ (2 s^β / ε)^{1/(1-α)}, (2/ε) log(1/ε) }
  • examples, where 0 < ε < 1 is the error tolerance.
  • This theorem gives a sample-size vs. learnability
    tradeoff (see the sketch below). When k = 1, it
    becomes the old Occam's Razor theorem.
  • Admittedly, the error-independence requirement is
    too strong to be practical.
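A small Python illustration of the sample-size vs. k tradeoff in the theorem; the parameter values (s, α, β, ε) are arbitrary choices for illustration, not numbers from the slides.

```python
import math

def epicurus_sample_size(s: float, alpha: float, beta: float,
                         eps: float, k: int) -> float:
    """Sample-size bound m >= (1/k) * max{ (2 s^beta / eps)^(1/(1-alpha)),
    (2/eps) * log(1/eps) } from the Epicurus-learning theorem
    (k = 1 gives the ordinary Occam's Razor bound)."""
    term1 = (2 * s**beta / eps) ** (1.0 / (1.0 - alpha))
    term2 = (2.0 / eps) * math.log(1.0 / eps)
    return max(term1, term2) / k

# Illustrative parameters only (assumed, not from the presentation).
s, alpha, beta, eps = 50, 0.5, 1.0, 0.1
for k in (1, 2, 5, 10):
    m = epicurus_sample_size(s, alpha, beta, eps, k)
    print(f"k = {k:2d}  ->  m >= {m:10.0f} examples")
```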

26
  • Proof. Let m be as in the theorem, C contain r
    concepts, f be one of them, and h1, ..., hk be
    the k error-independent hypotheses from the
    Epicurus algorithm.
  • Claim. The probability that h1, ..., hk in C satisfy
    P(f Δ hi) ≥ ε and are consistent with m
    independent examples of f is less than
    (1 - ε)^{km} C(r,k).
  • Proof of Claim. Let E(h1, ..., hk) be the event that
    hypotheses h1, ..., hk all agree with all m examples
    of f. If P(hi Δ f) ≥ ε, for i = 1, ..., k, then,
    since the m examples of f are independent and the
    hi's are mutually f-error-independent,
  •   P(E(h1, ..., hk)) ≤ (1 - ε)^{km}.
  • Since there are at most C(r,k) sets of bad
    hypothesis choices,
  •   P(∪ {E(h1, ..., hk) : the hi's are bad
      hypotheses}) ≤ (1 - ε)^{km} C(r,k).
  • The postulated Epicurus algorithm finds k
    consistent hypotheses of Kolmogorov complexity at
    most s^β m^α. The number r of hypotheses of this
    complexity satisfies log r ≤ s^β m^α. By the
    assumption on m, C(r,k) ≤ (1 - ε)^{-km/2}. Using
    the claim, the probability that all k hypotheses
    have error larger than ε is less than
  •   (1 - ε)^{km} C(r,k) ≤ (1 - ε)^{km/2}.
  • Substituting m we find that the right-hand
    side is at most ε.

27
  • When there is not enough data to assure that the
    Occam learning converges, do Epicurus learning
    and leave the final selection to the experts.