1
Nonlinear Knowledge in Kernel Approximation
  • Olvi Mangasarian
  • UW Madison and UCSD La Jolla
  • Edward Wild
  • UW Madison

2
Objectives
  • Primary objective: incorporate prior knowledge over
    completely arbitrary sets into function approximation
    without transforming (kernelizing) the knowledge
  • Secondary objective: achieve transparency of the prior
    knowledge for practical applications

3
Outline
  • Use kernels for function approximation
  • Incorporate prior knowledge
    • Previous approaches require transformation of the
      knowledge
    • New approach does not require any transformation of
      the knowledge
    • Knowledge given over completely arbitrary sets
  • Experimental results
    • Two synthetic examples and one real-world example
      related to breast cancer prognosis
    • Approximations with prior knowledge are more accurate
      than approximations without prior knowledge

4
Function Approximation
  • Given a set of m points in n-dimensional real space R^n
    and corresponding function values in R
  • Points are represented by the rows of a matrix
    A ∈ R^(m×n)
  • Exact or approximate function values for each point are
    given by a corresponding vector y ∈ R^m
  • Find a function f: R^n → R based on the given data such
    that f(A_i′) ≈ y_i

5
Function Approximation
  • Problem: using only the given data may result in a poor
    approximation
    • Points may be noisy
    • Sampling may be costly
  • Solution: use prior knowledge to improve the
    approximation

6
Adding Prior Knowledge
  • Standard approach: fit the function at the given data
    points, without knowledge
  • Constrained approach: satisfy inequalities at given
    points
  • 2004 MSW paper: satisfy linear inequalities over
    polyhedral regions
  • Proposed new approach: satisfy nonlinear inequalities
    over arbitrary regions

7
Kernel Approximation
  • Approximate f by a nonlinear kernel function K:
    • f(x) ≈ K(x′, A′)α + b
    • For the Gaussian kernel,
      K(x′, A′)α = Σ_i exp(−μ‖x − A_i‖²) α_i
  • Error in the kernel approximation of the given data:
    • −s ≤ K(A, A′)α + be − y ≤ s
    • e is a vector of ones in R^m
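As an illustration of the approximation above, a minimal NumPy sketch that builds the Gaussian kernel K(x′, A′) and fits α and b by plain least squares (the slides instead use the linear program introduced next; the data, μ = 1, and the target function here are our own choices):

```python
import numpy as np

def gaussian_kernel(X, A, mu=1.0):
    """K(X, A')_ij = exp(-mu * ||X_i - A_j||^2)."""
    d2 = ((X[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-mu * d2)

rng = np.random.default_rng(0)
A = rng.uniform(-2.0, 2.0, size=(20, 2))   # rows of A are the given points
y = A[:, 0] * A[:, 1]                      # sample target values f(x1, x2) = x1*x2

# Fit f(x) = K(x', A')alpha + b by least squares.
K = gaussian_kernel(A, A)
M = np.hstack([K, np.ones((K.shape[0], 1))])   # last column multiplies b
coef, *_ = np.linalg.lstsq(M, y, rcond=None)
alpha, b = coef[:-1], coef[-1]

# Evaluate the approximation at new points.
X = rng.uniform(-2.0, 2.0, size=(5, 2))
f_hat = gaussian_kernel(X, A) @ alpha + b
```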

8
Kernel Approximation
  • Trade-off between solution complexity (‖α‖₁) and data
    fitting (‖s‖₁)
  • Convert to a linear program by bounding α with a new
    variable a, −a ≤ α ≤ a, and minimizing νe′a + e′s
    subject to −s ≤ K(A, A′)α + be − y ≤ s
  • At the solution:
    • e′a = ‖α‖₁
    • e′s = ‖s‖₁
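A sketch of that linear program, assuming SciPy's linprog is available; the variable ordering (α, b, s, a) and the small synthetic dataset are our own:

```python
import numpy as np
from scipy.optimize import linprog

def fit_kernel_lp(K, y, nu=0.05):
    """min nu*e'a + e's  s.t.  -s <= K@alpha + b*e - y <= s,  -a <= alpha <= a.
    Variable order: alpha (n), b (1), s (m), a (n)."""
    m, n = K.shape
    c = np.concatenate([np.zeros(n + 1), np.ones(m), nu * np.ones(n)])
    e = np.ones((m, 1))
    A_ub = np.block([
        [ K,  e, -np.eye(m), np.zeros((m, n))],                        #  K@alpha + b*e - s <= y
        [-K, -e, -np.eye(m), np.zeros((m, n))],                        # -K@alpha - b*e - s <= -y
        [ np.eye(n), np.zeros((n, 1)), np.zeros((n, m)), -np.eye(n)],  #  alpha - a <= 0
        [-np.eye(n), np.zeros((n, 1)), np.zeros((n, m)), -np.eye(n)],  # -alpha - a <= 0
    ])
    b_ub = np.concatenate([y, -y, np.zeros(2 * n)])
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + n)  # s >= 0, a >= 0
    return linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(10, 1))
K = np.exp(-((x - x.T) ** 2))       # Gaussian kernel on 10 one-dimensional points
y = np.sin(3.0 * x[:, 0])
res = fit_kernel_lp(K, y)
alpha, b = res.x[:10], res.x[10]
```

At the optimum the auxiliary variables satisfy e′a = ‖α‖₁ and e′s = ‖s‖₁, since any slack in them would only increase the objective.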

9
Incorporating Nonlinear Prior Knowledge (MSW 2004)
  • Prior knowledge implication: Bx ≤ d ⇒ α′Ax + b ≥ h′x + β
  • Need to kernelize the knowledge from input space to the
    feature space of the kernel
  • Requires the change of variable x = A′t:
    • BA′t ≤ d ⇒ α′AA′t + b ≥ h′A′t + β
    • K(B, A′)t ≤ d ⇒ α′K(A, A′)t + b ≥ h′A′t + β
  • Motzkin's theorem of the alternative gives an equivalent
    linear system of inequalities, which is added to a
    linear program
  • Achieves good numerical results, but the kernelization
    is not readily interpretable in the original space

10
Incorporating Nonlinear Prior Knowledge: New Approach
  • Start with an arbitrary nonlinear knowledge implication:
    • g(x) ≤ 0 ⇒ K(x′, A′)α + b ≥ h(x), ∀x ∈ Γ ⊆ R^n
    • g, h are arbitrary functions: g: Γ → R^k, h: Γ → R
  • Problem: need to add this knowledge to the optimization
    problem
  • Logically equivalent system:
    • g(x) ≤ 0, K(x′, A′)α + b − h(x) < 0 has no solution
      x ∈ Γ

11
Prior Knowledge as a System of Linear Inequalities
  • Use a theorem of the alternative for convex functions:
    assume that g(x), K(x′, A′)α + b, and −h(x) are convex
    functions of x, that Γ is convex, and that there exists
    x ∈ Γ with g(x) < 0. Then either
  • I. g(x) ≤ 0, K(x′, A′)α + b − h(x) < 0 has a solution
    x ∈ Γ, or
  • II. ∃v ∈ R^k, v ≥ 0: K(x′, A′)α + b − h(x) + v′g(x) ≥ 0,
    ∀x ∈ Γ,
  • but never both
  • If we can find v ≥ 0 such that
    K(x′, A′)α + b − h(x) + v′g(x) ≥ 0, ∀x ∈ Γ, then by the
    above theorem
  • g(x) ≤ 0, K(x′, A′)α + b − h(x) < 0 has no solution
    x ∈ Γ, or equivalently
  • g(x) ≤ 0 ⇒ K(x′, A′)α + b ≥ h(x), ∀x ∈ Γ

12
Proof
  • ¬I ⇒ II:
    • Follows from OLM 1969, Corollary 4.2.2, and the
      existence of an x ∈ Γ such that g(x) < 0
  • II ⇒ ¬I:
    • Suppose not; that is, suppose there exist x ∈ Γ and
      v ∈ R^k, v ≥ 0, such that both hold:
    • g(x) ≤ 0, K(x′, A′)α + b − h(x) < 0, (I)
    • v ≥ 0, v′g(x) + K(x′, A′)α + b − h(x) ≥ 0, ∀x ∈ Γ (II)
    • Since v ≥ 0 and g(x) ≤ 0 give v′g(x) ≤ 0, we have the
      contradiction 0 > v′g(x) + K(x′, A′)α + b − h(x) ≥ 0
  • This direction requires no assumptions on g, h, K, or Γ
    whatsoever

13
Example: g(x) = 1250 − x³, f(x) = x⁴, h(x) = x² + 5000
  • Find v ≥ 0 such that v g(x) + f(x) − h(x) ≥ 0 for all x
    (condition II)
  • Then g(x) ≤ 0 ⇒ f(x) ≥ h(x), i.e.,
    x³ ≥ 1250 ⇒ x⁴ ≥ x² + 5000
  • (Figure: plot of the regions where I and II hold, with
    the curves x³ = 1250 and x⁴ = x² + 5000)
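The example can be checked numerically: exhibiting any v ≥ 0 satisfying condition II certifies the implication. A sketch (the specific multiplier v = 16 and the grid are our own choices, not from the slides):

```python
import numpy as np

def g(x): return 1250.0 - x**3
def f(x): return x**4
def h(x): return x**2 + 5000.0

# v = 16 is one multiplier that happens to satisfy condition II
# globally (our choice; the slides do not give a specific v).
v = 16.0
x = np.linspace(-20.0, 20.0, 400001)

cert = v * g(x) + f(x) - h(x)       # condition II, checked on a grid
mask = g(x) <= 0                    # points where x^3 >= 1250
implication = f(x)[mask] >= h(x)[mask]
```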
14
Incorporating Prior Knowledge
  • Linear semi-infinite program: infinite number of
    constraints
  • Discretize to obtain a finite linear program
  • Slacks allow the knowledge to be satisfied inexactly
  • Add a term to the objective function to drive the
    slacks to zero
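One way the discretized knowledge constraints could be assembled as linear inequality rows, in the spirit of the slides (the variable ordering w = (α, b, v, z) and the hyperboloid-style g and h below are our own; the slacks z would receive a penalty term in the objective):

```python
import numpy as np

def knowledge_rows(K_d, G, h_vals):
    """Inequality rows enforcing, at each discretization point x_j,
        K(x_j', A')alpha + b + v'g(x_j) - h(x_j) + z_j >= 0,
    written as A_ub @ w <= b_ub with w = (alpha, b, v, z);
    v >= 0 and z >= 0 are handled by variable bounds in the LP."""
    p, n = K_d.shape
    A_ub = np.hstack([-K_d, -np.ones((p, 1)), -G, -np.eye(p)])
    b_ub = -h_vals
    return A_ub, b_ub

rng = np.random.default_rng(2)
A = rng.uniform(-5.0, 5.0, size=(11, 2))     # kernel basis points
Xd = rng.uniform(-5.0, 5.0, size=(25, 2))    # discretization of the knowledge region
Kd = np.exp(-0.1 * ((Xd[:, None, :] - A[None, :, :]) ** 2).sum(axis=2))
G = (1.0 - Xd[:, 0] * Xd[:, 1])[:, None]     # g(x) = 1 - x1*x2, so g <= 0 iff x1*x2 >= 1
hv = Xd[:, 0] * Xd[:, 1]                     # h(x) = x1*x2
A_k, b_k = knowledge_rows(Kd, G, hv)
```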

15
Numerical Experience
  • Evaluate on three datasets:
    • Two synthetic datasets
    • Wisconsin Prognostic Breast Cancer Database
  • Compare the approximation with prior knowledge to one
    without prior knowledge:
    • Prior knowledge leads to an improved approximation
    • The prior knowledge used cannot be handled exactly by
      previous work
    • No kernelization needed on the knowledge set

16
Two-Dimensional Hyperboloid
  • f(x1, x2) = x1x2

17
Two-Dimensional Hyperboloid
  • Given exact values only at 11 points along the line
    x1 = x2
  • At x1 ∈ {−5, …, 5}
  • (Figure: the 11 sample points in the (x1, x2) plane)
18
Two-Dimensional Hyperboloid Approximation without
Prior Knowledge
19
Two-Dimensional Hyperboloid
  • Add prior knowledge:
    • x1x2 ≥ 1 ⇒ f(x1, x2) ≥ x1x2
  • The nonlinear term x1x2 cannot be handled exactly by
    any previous approach
  • Discretization used only 11 points along the line
    x1 = −x2, x1 ∈ {−5, −4, …, 4, 5}

20
Two-Dimensional Hyperboloid Approximation with
Prior Knowledge
21
Two-Dimensional Tower Function
22
Two-Dimensional Tower Function Data
  • Given 400 points on the grid [−4, 4] × [−4, 4]
  • Values are min{g(x), 2}, where g(x) is the exact tower
    function

23
Two-Dimensional Tower Function Approximation
without Prior Knowledge
24
Two-Dimensional Tower Function: Prior Knowledge
  • Add prior knowledge:
    • (x1, x2) ∈ [−4, 4] × [−4, 4] ⇒ f(x) = g(x)
  • The prior knowledge is the exact function value,
    enforced at 2500 points on the grid [−4, 4] × [−4, 4]
    through the above implication
  • The principal objective of the prior knowledge here is
    to overcome the poor given data

25
Two-Dimensional Tower Function Approximation with
Prior Knowledge
26
Predicting Lymph Node Metastasis
  • Number of metastasized lymph nodes is an
    important prognostic indicator for breast cancer
    recurrence
  • Determined by surgery in addition to the removal
    of the tumor
  • Wisconsin Prognostic Breast Cancer (WPBC) data:
    • Lymph node metastasis for 194 patients
    • 30 cytological features from a fine-needle aspirate
    • Tumor size, obtained during surgery
  • Task: predict the number of metastasized lymph nodes
    given tumor size alone

27
Predicting Lymph Node Metastasis
  • Split the data into two portions:
    • Past data (20%): used to find the prior knowledge
    • Present data (80%): used to evaluate performance
  • Simulates acquiring prior knowledge from an expert's
    experience

28
Prior Knowledge for Lymph Node Metastasis
  • Use kernel approximation without knowledge on the past
    data:
    • f1(x) = K(x′, A1′)α1 + b1
    • A1 is the matrix of the past data points
  • Use density estimation to decide where to enforce the
    knowledge:
    • p(x) is the empirical density of the past data
  • Knowledge: the number of metastasized lymph nodes is
    greater than the value predicted on the past data, with
    a tolerance of 0.01:
    • p(x) ≥ 0.1 ⇒ f(x) ≥ f1(x) − 0.01
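Since the WPBC data is not included here, a sketch with synthetic stand-in data of how a simple empirical density estimate could gate where the knowledge is enforced (the Gaussian kernel density estimator, bandwidth, and synthetic data are our own choices; only the 0.1 threshold is from the slide):

```python
import numpy as np

def empirical_density(x, past, bandwidth=1.0):
    """Gaussian kernel density estimate of one-dimensional past data."""
    z = (x[:, None] - past[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).mean(axis=1) / (bandwidth * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(3)
past = rng.normal(2.5, 1.0, size=40)     # synthetic stand-in for past tumor sizes
grid = np.linspace(0.0, 10.0, 201)       # candidate discretization points
p = empirical_density(grid, past)
enforce = grid[p >= 0.1]                 # enforce f(x) >= f1(x) - 0.01 only here
```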

29
Predicting Lymph Node Metastasis Results
  • Table shows the root-mean-squared error (RMSE) of the
    past-data (20%) approximation f1(x) on the present data
    (80%)
  • Leave-one-out (LOO) RMSE reported for approximations
    with and without knowledge
  • Improvement due to knowledge: 14.8%

30
Conclusion
  • Added general nonlinear prior knowledge to kernel
    approximation
    • Implemented as linear inequalities in a linear
      programming problem
    • Knowledge incorporated transparently
  • Demonstrated effectiveness on:
    • Two synthetic examples
    • A real-world problem from breast cancer prognosis
  • Future work:
    • More general prior knowledge, with inequalities
      replaced by more general functions
    • Apply to classification problems

31
Questions
  • Websites linking to papers and talks:
    • http://www.cs.wisc.edu/~olvi/
    • http://www.cs.wisc.edu/~wildt/