1
Robust Feature Selection by Mutual Information Distributions
  • Marco Zaffalon and Marcus Hutter
  • IDSIA
  • Galleria 2, 6928 Manno (Lugano), Switzerland
  • www.idsia.ch/~{zaffalon,marcus}
  • {zaffalon,marcus}@idsia.ch

2
Mutual Information (MI)
  • Consider two discrete random variables taking
    values i = 1..r and j = 1..s
  • (In)Dependence often measured by MI (see the
    sketch below)
  • Also known as cross-entropy or information gain
  • Examples
  • Inference of Bayesian nets, classification trees
  • Selection of relevant variables for the task at
    hand
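The following is a minimal sketch (not from the slides) of the quantity being discussed: it computes I = Σ p(i,j) log( p(i,j) / (p(i) p(j)) ) for a small, made-up joint probability table of two discrete variables.

```python
import numpy as np

def mutual_information(joint):
    """MI (in nats) of two discrete variables, given their joint probability table."""
    joint = np.asarray(joint, dtype=float)
    pi = joint.sum(axis=1, keepdims=True)   # marginal of the row variable
    pj = joint.sum(axis=0, keepdims=True)   # marginal of the column variable
    nz = joint > 0                          # 0 * log 0 = 0 by convention
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pi @ pj)[nz])))

# Made-up joint table of two dependent binary variables.
joint = np.array([[0.30, 0.10],
                  [0.10, 0.50]])
print(mutual_information(joint))  # about 0.18 nats: the variables are dependent
```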

3
MI-Based Feature-Selection Filter (F) [Lewis, 1992]
  • Classification
  • Predicting the class value given values of
    features
  • Features (or attributes) and class are random
    variables
  • Learning the rule features → class from data
  • Filter's goal: removing irrelevant features
  • More accurate predictions, easier models
  • MI-based approach
  • Remove feature F if class C does not depend on
    it
  • Or remove F if I(C,F) ≤ ε, where ε is an
    arbitrary threshold of relevance (sketch below)
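A minimal sketch of the filter F decision rule above, assuming the empirical MI of each feature with the class has already been computed; the feature names, MI values, and threshold are illustrative only.

```python
def filter_F(mi_by_feature, eps=0.01):
    """Keep the features whose empirical MI with the class exceeds the threshold eps."""
    return [f for f, mi in mi_by_feature.items() if mi > eps]

# Hypothetical empirical MI values for three features.
mi_by_feature = {"f1": 0.15, "f2": 0.002, "f3": 0.08}
print(filter_F(mi_by_feature))  # ['f1', 'f3'] -- f2 falls below the relevance threshold
```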

4
Empirical Mutual Information: a common way to use
MI in practice
j\i   1     2     ...   r
1     n11   n12   ...   n1r
2     n21   n22   ...   n2r
...   ...   ...   ...   ...
s     ns1   ns2   ...   nsr
  • Data (counts nij) → contingency table
  • Empirical (sample) probability: pij = nij / n,
    where n is the total count
  • Empirical mutual information:
    I = Σij pij log( pij / (pi+ · p+j) )
  • Problems of the empirical approach
  • Is I > 0 just due to random fluctuations?
    (finite sample)
  • How to know if it is reliable? (see the sketch
    below)
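A sketch of the plug-in estimate and of the fluctuation problem mentioned above: the empirical MI of two genuinely independent variables, computed from a small, randomly generated contingency table, comes out positive. The data and sample size are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_mi(counts):
    """Plug-in MI estimate from a contingency table: probabilities pij = nij / n."""
    p = np.asarray(counts, dtype=float)
    p /= p.sum()
    pi = p.sum(axis=1, keepdims=True)   # empirical marginals
    pj = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (pi @ pj)[nz])))

# Two independent binary variables, 50 samples: the estimate is almost never
# exactly 0 on a finite sample, even though the true MI is 0.
x = rng.integers(0, 2, size=50)
y = rng.integers(0, 2, size=50)
counts = np.zeros((2, 2), dtype=int)
np.add.at(counts, (x, y), 1)
print(empirical_mi(counts))  # small but positive
```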

5
We Need the Distribution of MI
  • Bayesian approach
  • Prior distribution for the unknown chances
    (e.g., Dirichlet)
  • Posterior: again Dirichlet, with counts updated
    by the data
  • Posterior probability density of MI (illustrated
    by the sampling sketch below)
  • How to compute it?
  • Fitting a curve by the exact mean and approximate
    variance
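One way to visualise the posterior density of MI (a brute-force alternative to the curve-fitting approach of the slides) is to sample chance matrices from the Dirichlet posterior and compute MI for each sample. The prior count of 1 per cell and the contingency table below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def mi_of(p):
    """MI of a chance matrix p (all entries strictly positive)."""
    pi = p.sum(axis=1, keepdims=True)
    pj = p.sum(axis=0, keepdims=True)
    return float(np.sum(p * np.log(p / (pi @ pj))))

def mi_posterior_samples(counts, prior=1.0, n_samples=10_000):
    """Sample chance matrices from the Dirichlet posterior and return their MI values."""
    counts = np.asarray(counts, dtype=float)
    alpha = (counts + prior).ravel()               # posterior Dirichlet parameters
    thetas = rng.dirichlet(alpha, size=n_samples)  # each row is one sampled chance matrix
    return np.array([mi_of(t.reshape(counts.shape)) for t in thetas])

counts = np.array([[30, 10],
                   [12, 48]])                      # made-up contingency table
samples = mi_posterior_samples(counts)
print(samples.mean(), samples.std())               # Monte Carlo mean and spread of MI
```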

6
Mean and Variance of MI [Hutter, 2001; Wolpert &
Wolf, 1995]
  • Exact mean (in terms of the digamma function ψ;
    see the sketch below)
  • Leading and next to leading order term (NLO) for
    the variance
  • Computational complexity O(rs)
  • As fast as empirical MI
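A sketch of the O(rs) moment computations as I read them from Hutter (2001): the exact posterior mean of MI written with the digamma function ψ, and the leading-order variance term (the next-to-leading-order correction is omitted here). The counts are assumed to already include the Dirichlet prior counts, so every cell is strictly positive.

```python
import numpy as np
from scipy.special import digamma

def mi_mean_var(counts):
    """Exact posterior mean and leading-order variance of MI under a Dirichlet
    posterior; `counts` must already include the prior counts (all cells > 0)."""
    nij = np.asarray(counts, dtype=float)
    ni = nij.sum(axis=1, keepdims=True)   # row totals
    nj = nij.sum(axis=0, keepdims=True)   # column totals
    n = nij.sum()

    # Exact mean: (1/n) * sum_ij nij [psi(nij+1) - psi(ni+1) - psi(nj+1) + psi(n+1)]
    mean = np.sum(nij * (digamma(nij + 1) - digamma(ni + 1)
                         - digamma(nj + 1) + digamma(n + 1))) / n

    # Leading-order variance: (K - J^2) / (n + 1)
    log_term = np.log(nij * n / (ni * nj))
    J = np.sum(nij / n * log_term)
    K = np.sum(nij / n * log_term ** 2)
    var = (K - J ** 2) / (n + 1)
    return float(mean), float(var)

counts = np.array([[30, 10],
                   [12, 48]]) + 1.0   # made-up data plus a uniform prior count of 1
print(mi_mean_var(counts))
```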

7
MI Density: Example Graphs
8
Robust Feature Selection
  • Filters: two new proposals (see the sketch
    below)
  • FF: include feature F iff p(I > ε | data) is
    large enough (include iff proven relevant)
  • BF: exclude feature F iff p(I < ε | data) is
    large enough (exclude iff proven irrelevant)
  • Examples
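A sketch of the two decision rules above, under the assumption that the MI posterior is summarised by a Gaussian with the mean and variance of the previous slide; the 0.95 confidence level, the threshold ε, and the moments used in the example are illustrative choices only.

```python
from math import erf, sqrt

def normal_cdf(x, mean, var):
    """CDF of the Gaussian used here to approximate the MI posterior."""
    return 0.5 * (1.0 + erf((x - mean) / sqrt(2.0 * var)))

def ff_include(mean, var, eps=0.01, level=0.95):
    """Forward filter FF: include the feature iff p(I > eps) >= level."""
    return 1.0 - normal_cdf(eps, mean, var) >= level

def bf_exclude(mean, var, eps=0.01, level=0.95):
    """Backward filter BF: exclude the feature iff p(I < eps) >= level."""
    return normal_cdf(eps, mean, var) >= level

# Made-up posterior moments for one feature (e.g., from mi_mean_var above).
mean, var = 0.05, 0.0004
print(ff_include(mean, var), bf_exclude(mean, var))  # True False: proven relevant
```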

9
Comparing the Filters
  • Experimental set-up
  • Filter (F, FF, or BF) followed by a Naive Bayes
    classifier
  • Sequential learning and testing (see the sketch
    below)
  • Collected measures for each filter
  • Average of correct predictions (prediction
    accuracy)
  • Average number of features used
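A sketch of the sequential learn-and-test protocol with a minimal categorical Naive Bayes (Laplace smoothing): each instance is first predicted from the model built on the preceding instances, then used to update it. The filter is abstracted as a function returning the indices of the kept features, and the tiny dataset is made up.

```python
import numpy as np

def sequential_eval(X, y, select_features, n_classes, n_values):
    """Predict each instance from the counts seen so far, then update the counts.
    Returns the overall prediction accuracy."""
    n_feat = X.shape[1]
    class_counts = np.zeros(n_classes)
    # feat_counts[f][c, v]: occurrences of value v of feature f within class c
    feat_counts = [np.zeros((n_classes, n_values)) for _ in range(n_feat)]
    correct = 0
    for t, (x, c) in enumerate(zip(X, y)):
        kept = select_features(t)                       # indices chosen by the filter
        # Naive Bayes with Laplace smoothing, restricted to the kept features.
        log_post = np.log(class_counts + 1.0)
        for f in kept:
            log_post += np.log(feat_counts[f][:, x[f]] + 1.0) \
                        - np.log(class_counts + n_values)
        correct += int(np.argmax(log_post) == c)
        # Learn from the instance just tested.
        class_counts[c] += 1
        for f in range(n_feat):
            feat_counts[f][c, x[f]] += 1
    return correct / len(y)

# Made-up binary data: the class copies feature 0, feature 1 is pure noise.
rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(200, 2))
y = X[:, 0]
print(sequential_eval(X, y, select_features=lambda t: [0, 1],
                      n_classes=2, n_values=2))
```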

10
Results on 10 Complete Datasets
  • Number of used features
  • Accuracies NOT significantly different
  • Except Chess and Spam with FF

11
Results on 10 Complete Datasets (continued)
12
FF: Significantly Better Accuracies
  • Chess
  • Spam

13
Extension to Incomplete Samples
  • MAR (missing at random) assumption
  • General case: missing features and class
  • EM + closed-form expressions
  • Missing features only
  • Closed-form approximate expressions for Mean and
    Variance
  • Complexity still O(rs)
  • New experiments
  • 5 data sets
  • Similar behavior

14
Conclusions
  • Expressions for several moments of the MI
    distribution are available
  • The distribution can be approximated well
  • Safer inferences at the same computational
    complexity as empirical MI
  • Why not use it?
  • Robust feature selection shows power of MI
    distribution
  • FF outperforms traditional filter F
  • Many useful applications possible
  • Inference of Bayesian nets
  • Inference of classification trees