Title: Three Papers: AUC, PFA and Bioinformatics
1. Three Papers: AUC, PFA and Bioinformatics
- The three papers are posted online
2. Learning Algorithms for Better Ranking
- Jin Huang, Charles X. Ling: Using AUC and Accuracy in Evaluating Learning Algorithms. IEEE Trans. Knowl. Data Eng. 17(3): 299-310 (2005)
- Find the citations online (Google Scholar)
- Goal: accuracy vs. ranking
- Secondary goal: decision trees vs. Bayesian networks in ranking
- Design algorithms that directly optimize ranking
3. Accuracy not good enough
- Ranking positive examples higher is more desirable
- (Figure: ranked test examples for Classifier 1 and Classifier 2, with a cutoff line separating predicted positives from predicted negatives)
- Accuracy of Classifier 1: 4/5; accuracy of Classifier 2: 4/5
- But intuitively, Classifier 1 is better!
4. Accuracy vs. ranking
- Accuracy-based evaluation makes two assumptions: a balanced class distribution and equal costs for misclassification
- Ranking steps aside from these assumptions
- Problem: training examples are labeled, not ranked
- How to evaluate ranking?
5. ROC curve
- (Provost & Fawcett, AAAI'97)
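An ROC curve plots the true-positive rate against the false-positive rate as the decision threshold sweeps over the classifier's scores. A minimal sketch of computing the curve's points (the labels and scores are made-up illustrative data, and ties in scores are ignored for simplicity):

```python
# Sketch: ROC curve points by sweeping a threshold over predicted scores.
# Assumes binary labels (1 = positive) and no tied scores.

def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per threshold, from (0,0) to (1,1)."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Visit examples from highest score down; each step lowers the threshold
    # past one more example.
    order = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, label in order:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

pts = roc_points([1, 1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.2])
```

Each step up in the curve is a positive example encountered, each step right a negative one; a curve hugging the top-left corner corresponds to a better ranking.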
6. How to calculate AUC
- Rank the test examples in increasing order of predicted score
- Let ri be the rank of the ith positive example (low ri = ranked left/low; high ri = ranked right/high, which is better)
- S0 = Σ ri
- AUC = (S0 − n0(n0 + 1)/2) / (n0 n1), where n0 is the number of positive examples and n1 the number of negative examples
- (Hand & Till, 2001, MLJ)
7. An example
Classifier 1 (the better result):
- ri: 5, 7, 8, 9, 10
- S0 = 5 + 7 + 8 + 9 + 10 = 39
- AUC = (39 − 5 × 6/2) / 25 = 24/25
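The rank formula above is easy to turn into code. A small sketch that reproduces the worked example:

```python
# The Hand & Till rank formula from the slide: AUC from the ranks of the
# positive examples after sorting all test examples by increasing score
# (rank 1 = lowest score).

def auc_from_ranks(positive_ranks, n_examples):
    n0 = len(positive_ranks)          # number of positive examples
    n1 = n_examples - n0              # number of negative examples
    s0 = sum(positive_ranks)          # S0 = sum of positive ranks
    return (s0 - n0 * (n0 + 1) / 2) / (n0 * n1)

# Classifier 1 from the example: positives ranked 5, 7, 8, 9, 10 out of 10.
print(auc_from_ranks([5, 7, 8, 9, 10], 10))  # 0.96, i.e. 24/25
```

A perfect ranking puts all n0 positives at the top (ranks 6–10 here), giving S0 = 40 and AUC = 1.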
8. ROC curve and AUC
- If curve A dominates curve D, then A is better than D
- Often two curves A and B do not dominate each other
- AUC (area under the ROC curve) summarizes overall performance in one number
- AUC can be used for evaluating ranking
9. ROC curve and AUC
- Traditional learning algorithms produce poor probability estimates as a by-product
- Decision tree algorithms
- Strategies to improve
- How about Bayesian network learning algorithms?
10. Evaluation of Classifiers
- Classification accuracy or error rate.
- ROC curve and AUC.
11. AUC
- AUC of Classifier 1: 24/25
- AUC of Classifier 2: 16/25
- Classifier 1 is better than Classifier 2!
12. AUC is more discriminating
- For N examples:
- (N + 1) different accuracy values
- N(N + 1)/2 different AUC values
- AUC is a better and more discriminating evaluation measure than accuracy
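The claim can be illustrated by brute force on a small test set. This sketch fixes a balanced 5/5 class split on N = 10 examples (an illustrative count, not the paper's general formula) and enumerates every possible placement of the positives in the ranking:

```python
from itertools import combinations

# Reusing the rank formula: AUC from the ranks of the positive examples.
def auc_from_ranks(positive_ranks, n_examples):
    n0 = len(positive_ranks)
    n1 = n_examples - n0
    s0 = sum(positive_ranks)
    return (s0 - n0 * (n0 + 1) / 2) / (n0 * n1)

N, n0 = 10, 5
# Every way the 5 positives can occupy ranks 1..10 gives one AUC value.
aucs = {auc_from_ranks(ranks, N) for ranks in combinations(range(1, N + 1), n0)}
# Accuracy on N examples can only be 0/N, 1/N, ..., N/N.
accuracies = {k / N for k in range(N + 1)}
print(len(accuracies), len(aucs))  # 11 vs 26
```

Even on this tiny set, AUC distinguishes 26 outcomes where accuracy distinguishes only 11, so AUC can break ties that accuracy cannot.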
13. Naïve Bayes vs. C4.4
- Overall, Naïve Bayes outperforms C4.4 in AUC
- (Ling & Zhang, submitted, 2002)
14. PCA in Face Recognition
15. Problem with PCA
- The features are principal components
- Thus they do not correspond directly to the original features
- Problem in face recognition: we wish to pick a subset of the original features rather than composed ones
- Principal Feature Analysis (PFA): pick the best uncorrelated subset of features of a data set
- Equivalent to finding q dimensions of a random variable X = [x1, x2, …, xn]^T
16. How to find the q features?
- Form the matrix whose columns are the principal components q1, q2, q3, …, qn
- The ith row of this matrix represents the ith original feature
- Keep the first q columns
17. The subspace
18. Algorithm
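A sketch of the PFA algorithm outlined above, under the standard formulation: project each original feature into the space spanned by the top-q principal components (the rows of the truncated eigenvector matrix), cluster those feature rows with k-means, and keep one representative feature per cluster. The function name `pfa`, the hand-rolled k-means loop, and the choice of exactly q clusters are illustrative assumptions, not details taken from the slides:

```python
import numpy as np

def pfa(X, q, n_iter=50, seed=0):
    """Select up to q original features of data matrix X (samples x features)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    A_q = eigvecs[:, -q:]                  # top-q components; row i = feature i
    rng = np.random.default_rng(seed)
    centers = A_q[rng.choice(len(A_q), size=q, replace=False)]
    for _ in range(n_iter):                # plain k-means on the rows of A_q
        d = np.linalg.norm(A_q[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for k in range(q):
            if np.any(assign == k):
                centers[k] = A_q[assign == k].mean(axis=0)
    # Per cluster, keep the feature whose row lies closest to the center.
    selected = []
    for k in range(q):
        members = np.where(assign == k)[0]
        if len(members):
            best = members[np.argmin(np.linalg.norm(A_q[members] - centers[k], axis=1))]
            selected.append(int(best))
    return sorted(selected)
```

Unlike PCA itself, the output is a set of indices into the original features, which is exactly what the face-recognition setting on the previous slides asks for.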
19. Result
20. When PCA does not work
21. PCA + Clustering: Bad Idea
22. More
23. Rand Index for Clusters (Partitions)
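The Rand index compares two partitions of the same items: it is the fraction of item pairs on which the partitions agree, i.e. pairs placed together in both or apart in both. A minimal sketch, assuming each clustering is given as a list of cluster labels:

```python
from itertools import combinations

def rand_index(a, b):
    """Fraction of item pairs on which clusterings a and b agree."""
    pairs = list(combinations(range(len(a)), 2))
    # A pair agrees if both clusterings put it in the same cluster,
    # or both put it in different clusters.
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: same partition, relabelled
```

Note the index depends only on which items share a cluster, never on the label values themselves, so relabelling the clusters leaves it unchanged.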
24. Results