Three feature selection problems with solutions


1
Three feature selection problems (with solutions)

Jose M. Peña
IISLAB, IDA
Linköping University
Sweden
jospe_at_ifm.liu.se
www.ida.liu.se/jospe

Joint work with Roland Nilsson, Johan Björkegren and Jesper Tegnér
2
Outline

Problem I: Posterior distribution.
Solution: Markov boundary.
Peña, J. M., Nilsson, R., Björkegren, J. and Tegnér, J. (2007). Towards Scalable and Data Efficient Learning of Markov Boundaries. International Journal of Approximate Reasoning, 45(2), 211-232.
Peña, J. M. (2008). Learning Gaussian Graphical Models of Gene Networks with False Discovery Rate Control. In Proceedings of the 6th European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics (EvoBIO 2008), Lecture Notes in Computer Science 4973, 165-176.
Problem II: Class label.
Solution: Bayes relevant features.
Nilsson, R., Peña, J. M., Björkegren, J. and Tegnér, J. (2007). Consistent Feature Selection for Pattern Recognition in Polynomial Time. Journal of Machine Learning Research, 8, 589-612.
Peña, J. M. (2009). On the Possible Ordering of Discrete Features Subsets. Submitted.
Problem III: All relevant features.
Solution: RIT algorithm.
Nilsson, R., Peña, J. M., Björkegren, J. and Tegnér, J. (2007). Detecting Multivariate Differentially Expressed Genes. BMC Bioinformatics, 8:150.

3
Problem I: Posterior distribution

The Markov boundary of $Y$, $S_M$, is the minimal set of features such that $p(p(Y \mid X) = p(Y \mid S_M)) = 1$.
If $p(X) > 0$ then $S_M$ is unique.
If $p(X) > 0$ then $Z \in S_M$ iff $p(p(Y \mid X) \neq p(Y \mid X \setminus Z)) > 0$, that is, iff $Z$ is strongly relevant.

No exhaustive search required, but data inefficient.
4
Algorithms for $S_M$

IAMB (Tsamardinos et al., 2003) is consistent under the composition property assumption ($X \perp Y \mid Z \wedge X \perp W \mid Z \Rightarrow X \perp YW \mid Z$).
The composition property is satisfied by Gaussian distributions and by distributions perfect to some graph, and it is closed under marginalization and conditioning.
KIAMB: IAMB with randomness at step 4.
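The algorithm listing for this slide did not survive in the transcript, so the following is only a minimal sketch of the grow-shrink scheme that IAMB is generally described as using, not the authors' code. The function `iamb`, the association measure `assoc`, the `threshold` and the toy oracle `toy_assoc` are all hypothetical names introduced for illustration. In practice `assoc` would be a conditional dependence statistic estimated from data.

```python
from typing import Callable, List, Set

# Hypothetical signature: assoc(feature, target, conditioning_set) -> strength >= 0.
Assoc = Callable[[str, str, List[str]], float]


def iamb(features: List[str], target: str, assoc: Assoc,
         threshold: float = 0.05) -> Set[str]:
    """Grow-shrink estimate of the Markov boundary of `target` (sketch)."""
    mb: List[str] = []
    # Growing phase: add the feature most associated with the target given
    # the current estimate, as long as that association is non-negligible.
    while True:
        candidates = [f for f in features if f not in mb]
        if not candidates:
            break
        best = max(candidates, key=lambda f: assoc(f, target, mb))
        if assoc(best, target, mb) <= threshold:
            break
        mb.append(best)
    # Shrinking phase: drop features that look independent of the target
    # given the rest of the current estimate.
    for f in list(mb):
        rest = [g for g in mb if g != f]
        if assoc(f, target, rest) <= threshold:
            mb.remove(f)
    return set(mb)


def toy_assoc(feature: str, target: str, cond: List[str]) -> float:
    """Made-up oracle: A looks relevant until B is known; C is pure noise."""
    if feature == "A":
        return 0.0 if "B" in cond else 0.6
    if feature == "B":
        return 0.4 if "A" in cond else 0.5
    return 0.0


markov_boundary = iamb(["A", "B", "C"], "Y", toy_assoc)
print(markov_boundary)
```

In this toy run the growing phase admits A first and then B, and the shrinking phase removes A once B is in the conditioning set.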

5
Thrombin data
Data provided by DuPont Pharmaceuticals for KDD Cup 2001: 1909 training instances, 634 testing instances and 139351 binary features (3-D properties of a drug compound tested for binding to thrombin, a key receptor in blood clotting).
6
Preliminaries for problem II

Classifier: $g : X \to Y$.
Bayes classifier: $g(X) = \arg\max_y p(y \mid X)$.
Risk: $R(g) = p(g(X) \neq Y) = \sum_{x,y} p(x,y) \, \mathbf{1}\{g(x) \neq y\}$.
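As a small illustration of these definitions, the sketch below computes the Bayes classifier and its risk for a discrete joint distribution given as a table. The table `joint` and the function names are made up for the example. The sketch relies on the fact that, for fixed x, maximising p(y | x) over y is the same as maximising p(x, y).

```python
from typing import Dict, Hashable, Tuple

Joint = Dict[Tuple[Hashable, Hashable], float]  # (x, y) -> p(x, y)


def bayes_classifier(joint: Joint) -> Dict[Hashable, Hashable]:
    """Return g with g[x] = argmax_y p(y | x) = argmax_y p(x, y)."""
    best: Dict[Hashable, Tuple[Hashable, float]] = {}
    for (x, y), p in joint.items():
        if x not in best or p > best[x][1]:
            best[x] = (y, p)
    return {x: y for x, (y, _) in best.items()}


def risk(g: Dict[Hashable, Hashable], joint: Joint) -> float:
    """R(g) = sum over (x, y) of p(x, y) * 1{g(x) != y}."""
    return sum(p for (x, y), p in joint.items() if g[x] != y)


# Made-up joint distribution over x in {0, 1} and y in {-1, 1}.
joint = {(0, -1): 0.4, (0, 1): 0.1, (1, -1): 0.2, (1, 1): 0.3}
g = bayes_classifier(joint)
print(g, risk(g, joint))
```

Here the classifier predicts -1 for x = 0 and 1 for x = 1, and its risk is the total mass of the two cells it gets wrong.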

7
Problem II: Class label

Let $X \in [0,1]$, $Y \in \{-1, 1\}$, $f(x) > 0$ and $p(Y = 1 \mid x) = x/3$ for all $x$.
Then $S_M = \{X\}$, but $g(x) = -1$ for all $x$ and, thus, $X$ is irrelevant for classification.
$Z$ is Bayes relevant iff $p(g(X) \neq g(X \setminus Z)) > 0$. Let $S$ denote the set of Bayes relevant features.
If $p(x) > 0$ and $p(Y \mid x)$ has a single maximum for all $x$, then $p(g(X) \neq g(X \setminus Z)) > 0$ iff $R(g(X)) \neq R(g(X \setminus Z))$.
If $p(x) > 0$ and $p(Y \mid x)$ has a single maximum for all $x$, then $S$ is the only minimal feature subset such that $R(g(S)) = R(g(X))$.
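The example at the top of this slide can be checked numerically. The sketch below assumes, purely for illustration, that X takes values on an equally weighted grid over [0, 1]. The grid and all variable names are assumptions, not part of the slides. It shows that the posterior changes with x (so X belongs to the Markov boundary) while the Bayes decision and the Bayes risk are the same with and without X.

```python
# Assumption: X is discretised on a uniform, equally weighted grid over [0, 1].
xs = [i / 10 for i in range(11)]
posterior = {x: x / 3 for x in xs}  # p(Y = 1 | x) = x / 3

# Bayes decision with X: predict 1 only where p(Y = 1 | x) > 1/2.
g_with_x = {x: (1 if posterior[x] > 0.5 else -1) for x in xs}

# Bayes decision without X uses the marginal p(Y = 1).
p_y1 = sum(posterior.values()) / len(xs)
g_without_x = 1 if p_y1 > 0.5 else -1

# Bayes risks: the probability mass of the less likely class.
risk_with_x = sum(min(p, 1 - p) for p in posterior.values()) / len(xs)
risk_without_x = min(p_y1, 1 - p_y1)

print(set(g_with_x.values()), g_without_x, risk_with_x, risk_without_x)
```

Because x/3 never exceeds 1/3, the prediction is -1 everywhere, so dropping X changes the posterior but not the classifier or its risk.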

No exhaustive search required, but data inefficient.
8
UCI data sets
Consistent version of the one-shot approach

The following backward search is correct:
$S = X$
Repeat while possible:
If there exists $Z \in S$ such that $R(g(S)) = R(g(S \setminus Z))$, then $S = S \setminus Z$.
Data inefficient. Forward approaches?
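The backward search can be sketched as follows for a fully known discrete distribution. This is an illustration under strong assumptions: the true joint table is available, whereas with data the risks would have to be estimated. The names `bayes_risk`, `backward_search`, `joint` and the tolerance `tol` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Joint = Dict[Tuple[Tuple[int, ...], int], float]  # (x_vector, y) -> p(x, y)


def bayes_risk(joint: Joint, subset: Sequence[int]) -> float:
    """Bayes risk when only the features indexed by `subset` are observed."""
    marginal: Dict[Tuple[Tuple[int, ...], int], float] = defaultdict(float)
    for (x, y), p in joint.items():
        marginal[(tuple(x[i] for i in subset), y)] += p
    best: Dict[Tuple[int, ...], float] = defaultdict(float)
    for (key, _), p in marginal.items():
        best[key] = max(best[key], p)
    # Assumes the table sums to 1: risk = 1 - probability of a correct guess.
    return 1.0 - sum(best.values())


def backward_search(joint: Joint, n_features: int, tol: float = 1e-12) -> List[int]:
    """Start from all features; drop any feature whose removal keeps the risk."""
    s = list(range(n_features))
    changed = True
    while changed:
        changed = False
        for z in s:
            reduced = [i for i in s if i != z]
            if abs(bayes_risk(joint, s) - bayes_risk(joint, reduced)) <= tol:
                s, changed = reduced, True
                break
    return s


# Made-up distribution: Y copies X0 with probability 0.9; X1 is independent noise.
joint = {((x0, x1), y): 0.25 * (0.9 if y == x0 else 0.1)
         for x0 in (0, 1) for x1 in (0, 1) for y in (0, 1)}
print(backward_search(joint, 2))
```

In the toy distribution only the first feature survives, since removing the noise feature leaves the Bayes risk at 0.1 while removing the first feature raises it to 0.5.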

9
$S$ may differ from $S_M$

$S \subseteq S_M$.
But the converse may not be true.

10
Possible orderings of discrete feature subsets

Any strictly increasing Bayes risk ordering is possible as long as $R(g(S \cup T)) < R(g(S))$.
E.g.,
$R(g(X_1,X_2,X_3)) < R(g(X_1,X_2)) < R(g(X_1,X_3)) < R(g(X_2,X_3)) < R(g(X_3)) < R(g(X_2)) < R(g(X_1)) < R(g(\emptyset))$
Finding the feature subset of size $k$ that has minimal Bayes risk requires exhaustive search.
As we have seen, finding $S$ does not require exhaustive search.
Open problem: is any sequence of Bayes risks possible?
Analogous results exist for continuous domains, though not Gaussian. See Cover and Van Campenhout (1978) and Van Campenhout (1980).

11
Problem III: All relevant features

$Z$ is weakly relevant iff $p(p(Y \mid X) = p(Y \mid X \setminus Z)) = 1$ but $p(p(Y \mid S) \neq p(Y \mid S, Z)) > 0$ for some $S \subseteq X \setminus Z$.
The set of all-relevant features, $S_A$, is the set of strongly and weakly relevant features.

12
Algorithm for $S_A$

There exists $f(X, Y) > 0$ such that searching for $S_A$ implies an exhaustive search.
RIT is consistent under the following assumptions:
composition ($X \perp Y \mid Z \wedge X \perp W \mid Z \Rightarrow X \perp YW \mid Z$), and
weak transitivity ($X \perp Y \mid Z \wedge X \perp Y \mid ZV \Rightarrow X \perp V \mid Z \vee V \perp Y \mid Z$).
These assumptions are satisfied by Gaussian distributions and by distributions perfect to some graph, and they are closed under marginalization and conditioning.
RIT performs at most $|S_A| |X|$ tests ($|S_A| < |X|$).
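The RIT listing itself is not preserved in this transcript. The sketch below is therefore an assumption about its shape: a recursive scheme that uses only marginal (unconditional) independence tests, first collecting the features dependent on the target and then recursing from each of them over the features not yet selected. The function `rit`, the test `dependent` and the toy dependence structure `EDGES` are hypothetical.

```python
from typing import Callable, List, Tuple

calls: List[Tuple[str, str]] = []  # records every test, to count them afterwards


def rit(target: str, features: List[str],
        dependent: Callable[[str, str], bool]) -> List[str]:
    """Recursive marginal-independence search (sketch, not the authors' code)."""
    # Features that are marginally dependent on the current target.
    found = [f for f in features if dependent(f, target)]
    remaining = [f for f in features if f not in found]
    # Recurse from each feature found, searching only the unselected remainder.
    for f in list(found):
        extra = rit(f, remaining, dependent)
        found.extend(extra)
        remaining = [r for r in remaining if r not in extra]
    return found


# Made-up marginal dependencies: A depends on Y, B depends on A, C on nothing.
EDGES = {frozenset(("A", "Y")), frozenset(("A", "B"))}


def toy_dependent(a: str, b: str) -> bool:
    calls.append((a, b))
    return frozenset((a, b)) in EDGES


relevant = rit("Y", ["A", "B", "C"], toy_dependent)
print(relevant, len(calls))
```

In this toy run the search selects A and B using six tests, which matches the bound of $|S_A| |X|$ tests stated above for two relevant features out of three.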

13
Algorithm for $S_A$
No exhaustive search required, and data efficient.
14
Algorithm for $S_A$ with FDR control
15
Simulated data
16
Diabetes data
Data from Gunton et al. (2005), Cell, 122: 7 normal vs. 15 type 2 diabetic patients, and 5000 genes kept after filtering out those with low variance.
3 genes are univariately differentially expressed: Arnt, Cdc14a and Ddx3Y (370 if no control for multiplicity).
Dopey1 was recently shown to be active in the vesicle traffic system, the mechanism that delivers insulin receptors to the cell surface.
4 genes encode transcription factors (TFs), which is intriguing since a large fraction of previously discovered diabetes-related genes are TFs.
So does Ddx3Y (only 6 genes annotated with this function).
17
Summary

Problem I: Posterior distribution.
Solution: Markov boundary.
Peña, J. M., Nilsson, R., Björkegren, J. and Tegnér, J. (2007). Towards Scalable and Data Efficient Learning of Markov Boundaries. International Journal of Approximate Reasoning, 45(2), 211-232.
Peña, J. M. (2008). Learning Gaussian Graphical Models of Gene Networks with False Discovery Rate Control. In Proceedings of the 6th European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics (EvoBIO 2008), Lecture Notes in Computer Science 4973, 165-176.
Problem II: Class label.
Solution: Bayes relevant features.
Nilsson, R., Peña, J. M., Björkegren, J. and Tegnér, J. (2007). Consistent Feature Selection for Pattern Recognition in Polynomial Time. Journal of Machine Learning Research, 8, 589-612.
Peña, J. M. (2009). On the Possible Ordering of Discrete Features Subsets. Submitted.
Problem III: All relevant features.
Solution: RIT algorithm.
Nilsson, R., Peña, J. M., Björkegren, J. and Tegnér, J. (2007). Detecting Multivariate Differentially Expressed Genes. BMC Bioinformatics, 8:150.