Treatment Learning: Implementation and Application

Presentation by Ying Hu, Electrical & Computer Engineering, University of British Columbia


1
Treatment Learning: Implementation and Application

Ying Hu
Electrical & Computer Engineering
University of British Columbia

2
Outline

An example
Background Review
TAR2 Treatment Learner
TARZAN: Tim Menzies
TAR2: Ying Hu & Tim Menzies
TAR3: improved TAR2
TAR3: Ying Hu
Evaluation of treatment learning
Application of Treatment Learning
Conclusion

3
First Impression

Boston Housing Dataset
(506 examples, 4 classes)

4
Background Review

What is KDD?
KDD: Knowledge Discovery in Databases [fayyad96]
Data mining: one step in the KDD process
Machine learning: learning algorithms
Common data mining tasks
Classification
Decision tree induction (C4.5) [quinlan86]
Nearest neighbors [cover67]
Neural networks [rosenblatt62]
Naive Bayes classifier [duda73]
Association rule mining
APRIORI algorithm [agrawal93]
Variants of APRIORI

5
Treatment Learning: Definition

Input: classified dataset
Assume classes are ordered
Output: Rx = conjunction of attribute-value pairs
Size of Rx: # of pairs in the Rx
confidence(Rx w.r.t. Class) = P(Class | Rx)
Goal: to find Rx that have different levels of confidence across classes
Evaluate Rx: lift
Visualization: form of output
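
The definitions on this slide can be illustrated with a minimal Python sketch. The slide does not say how lift is computed, so the version below is an assumption: each ordered class gets a numeric score, and lift is the mean score of the rows selected by the Rx divided by the mean score of all rows. The toy rows, the "class" column name, and the class scores are hypothetical.

```python
# Toy classified dataset (hypothetical values, not the Boston housing data).
rows = [
    {"rooms": "many", "pollution": "low", "class": "high"},
    {"rooms": "many", "pollution": "low", "class": "high"},
    {"rooms": "many", "pollution": "high", "class": "low"},
    {"rooms": "few", "pollution": "low", "class": "low"},
    {"rooms": "few", "pollution": "high", "class": "low"},
    {"rooms": "few", "pollution": "high", "class": "low"},
]

# Assumption: ordered classes are mapped to numeric scores (better = larger).
class_score = {"low": 1, "high": 2}

# An Rx is a conjunction of attribute-value pairs; its size is len(rx).
rx = {"rooms": "many", "pollution": "low"}


def matches(row, rx):
    """True if the row satisfies every attribute-value pair in rx."""
    return all(row.get(attr) == value for attr, value in rx.items())


def confidence(rows, rx, cls):
    """P(Class | Rx): fraction of the rows selected by rx that belong to cls."""
    selected = [r for r in rows if matches(r, rx)]
    if not selected:
        return 0.0
    return sum(r["class"] == cls for r in selected) / len(selected)


def worth(rows, class_score):
    """Mean class score of a set of rows."""
    return sum(class_score[r["class"]] for r in rows) / len(rows)


def lift(rows, rx, class_score):
    """Assumed lift: worth of the treated subset over worth of the baseline."""
    selected = [r for r in rows if matches(r, rx)]
    if not selected:
        return 0.0
    return worth(selected, class_score) / worth(rows, class_score)


print(confidence(rows, rx, "high"))
print(lift(rows, rx, class_score))
```

Under these assumptions, an Rx with lift above 1 shifts the class distribution toward the better classes relative to the untreated dataset.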

6
Motivation: Narrow Funnel Effect

When is enough learning enough?
Attributes < 50%: accuracy decreases 3-5% [shavlik91]
A 1-level decision tree is comparable to C4 [Holte93]
Data engineering: ignoring 81% of features results in a 2% increase in accuracy [kohavi97]
Scheduling: random sampling outperforms complete search (depth-first) [crawford94]
Narrow funnel effect
Control variables vs. derived variables
Treatment learning: finding funnel variables

7
TAR2: The Algorithm

Search: attribute utility estimation
Estimation heuristic: Confidence1
Search: depth-first search
Search space: confidence1 > threshold
Discretization: equal-width interval binning
Reporting Rx
Lift(Rx) > threshold
Software package and online distribution
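
The discretization step named above, equal-width interval binning, can be sketched as follows. This is a generic illustration of the technique, not TAR2's actual code; the function name and the choice to put the maximum value in the last bin are assumptions.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1.

    The range [min, max] is split into n_bins intervals of equal width.
    The maximum value is clamped into the last bin.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so a single bin suffices.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


print(equal_width_bins([0, 2.5, 5, 7.5, 10], 4))  # [0, 1, 2, 3, 3]
```

After binning, every numeric attribute becomes a small set of discrete ranges, so each (attribute, range) pair can be scored and combined into an Rx like any other attribute-value pair.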

8
The Pilot Case Study

Requirements optimization
Goal: find an optimal set of mitigations in a cost-effective manner

[Diagram: Requirements, Risks, Mitigations, Cost, Benefit; relation labels: relates, incur, reduce, achieve]

Iterative learning cycle

9
The Pilot Study (continued)

Cost-benefit distribution (30/99 mitigations)

10
The Problem with TAR2

Runtime vs. Rx size

To generate Rx of size r
To generate Rx from size 1..N

11
TAR3: the improvement

Random sampling
Key idea
Treat the confidence1 distribution as a probability distribution
Sample Rx from the confidence1 distribution
Steps
Place the items (ai) in increasing order of confidence1 value
Compute the CDF of each ai
Sample a uniform value u in [0..1]
The sample is the least ai whose CDF > u
Repeat until we get an Rx of the given size
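
The sampling steps above can be sketched in Python. This is a minimal illustration, not TAR3's actual implementation. It assumes each item is an (attribute, value) pair with a positive confidence1 score, that scores are normalized to sum to 1 before the CDF is built, and that an Rx uses each attribute at most once. The function names and the example scores are hypothetical.

```python
import bisect
import itertools
import random


def build_cdf(scores):
    """Order items by increasing score and return (items, cdf)."""
    if any(score <= 0 for score in scores.values()):
        raise ValueError("scores must be positive")
    ordered = sorted(scores.items(), key=lambda kv: kv[1])
    items = [item for item, _ in ordered]
    total = sum(score for _, score in ordered)
    cdf = list(itertools.accumulate(score / total for _, score in ordered))
    cdf[-1] = 1.0  # guard against floating-point round-off
    return items, cdf


def sample_item(items, cdf, rng):
    """Draw u uniformly from [0, 1) and return the least item whose CDF > u."""
    u = rng.random()
    return items[bisect.bisect_right(cdf, u)]


def sample_rx(scores, size, rng):
    """Repeat the draw until the Rx holds `size` distinct attributes."""
    if size > len({attr for attr, _ in scores}):
        raise ValueError("size exceeds the number of distinct attributes")
    items, cdf = build_cdf(scores)
    rx = {}
    while len(rx) < size:
        attr, value = sample_item(items, cdf, rng)
        rx.setdefault(attr, value)  # keep the first value drawn per attribute
    return rx


# Hypothetical confidence1 scores for three attribute-value pairs.
scores = {("rooms", "many"): 6.0, ("pollution", "low"): 3.0, ("age", "old"): 1.0}
print(sample_rx(scores, 2, random.Random(0)))
```

Because higher-scoring items occupy wider intervals of the CDF, they are drawn more often, so the sampler concentrates on promising items without enumerating every conjunction.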

12
Comparison of Efficiency
13
Comparison of Results

10 UCI domains, identical best Rx

Final Rx: TAR2 = 19, TAR3 = 20

14
External Evaluation
C4.5, Naive Bayes

FSS framework

All attributes (10 UCI datasets)
Feature subset selector: TAR2less
15
The Results

Number of attributes

Accuracy using C4.5
(avg decrease 0.9%)

Accuracy using Naïve Bayes
(avg increase 0.8%)
16
Comparison with other FSS methods

# of attributes selected (C4.5)

# of attributes selected (Naive Bayes)

17/20: fewest attributes selected
Further evidence for funnels

17
Applications of Treatment Learning

Download site: http://www.ece.ubc.ca/yingh/
Collaborators: JPL, WV, Portland, Miami
Application examples
Pair programming vs. conventional programming
Identify software metrics that are superior error indicators
Identify attributes that make FSMs easy to test
Find the best software inspection policy for a particular software development organization
Other applications
1 journal, 4 conference, 6 workshop papers

18
Main Contributions