Title: Ch. 5
Provided by: mvanso
Slides: 27

Transcript and Presenter's Notes
1
Ch. 5
2
Evaluate models
  • Is the model useful?
  • In the context of a learning algorithm: which model is
    better?
  • Evaluating methods: which method finds the best models?

3
Confidence intervals (again)
  • Bernoulli process with success probability p
  • Sampling distribution for the estimate of p
  • Sample of size N -> observed proportion p̂
  • Mean: p
  • Variance: p(1-p)/N
  • Can be approximated by a normal distribution
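The normal approximation above can be sketched as a short Python helper; the function name, the 95% z-value of 1.96, and the example numbers are illustrative assumptions, not from the slides:

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a Bernoulli
    proportion: mean p, variance p(1 - p)/N (95% for z = 1.96)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error
    return p_hat - z * se, p_hat + z * se

# e.g. 750 successes out of N = 1000 observations
lo, hi = proportion_ci(0.75, 1000)
```

The interval shrinks with sqrt(N), so quadrupling the sample roughly halves its width.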

4
Cross validation
  • Goal: find the average accuracy of models found by a
    method on the data
  • Split the data into N subsets
  • Train on N-1 subsets and test on the remaining one
  • Rotate
  • Average
  • NB:
  • Models may be different!
  • Variance is also interesting!
  • Pessimistic estimate for when we use the entire
    sample for training
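The split/train/rotate/average loop above can be sketched as follows; the function names, the interleaved split, and the toy majority-class "learner" in the usage example are assumptions for illustration:

```python
def k_fold_cv(data, k, learn, accuracy):
    """Split data into k subsets, train on k-1, test on the
    remaining one, rotate, and return mean and variance of the
    fold accuracies."""
    folds = [data[i::k] for i in range(k)]   # simple interleaved split
    scores = []
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        model = learn(train)
        scores.append(accuracy(model, test))
    mean = sum(scores) / k
    var = sum((s - mean) ** 2 for s in scores) / k
    return mean, var

# toy usage on labels only, with a majority-class "learner"
labels = [1] * 8 + [0] * 2
learn = lambda train: max(set(train), key=train.count)
acc = lambda model, test: sum(x == model for x in test) / len(test)
mean, var = k_fold_cv(labels, 5, learn, acc)
```

Reporting the variance alongside the mean is exactly the "variance is also interesting" point: it shows how stable the method is across folds.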

5
Issues
  • Stratification by class
  • Also possible: multiple random samples (repeated
    holdout)
  • Also possible: leave-one-out
  • Computationally more expensive
  • Looks good, but ...
  • Small samples -> extreme accuracy scores

6
  • Also possible: bootstrapping
  • Sample with replacement a set of N from the sample
    (of N) -> training data
  • Test data: the sample minus the training data
  • Estimate the error from the error on the training data
    and the error on the test data by weighting them by the
    probability of being selected as training/test data
  • Repeat
  • Useful for small datasets
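A sketch of this procedure, using the common 0.632 bootstrap weights as one concrete choice for the weighting the slide mentions; the function name and round count are assumptions:

```python
import random

def bootstrap_error(data, learn, error, rounds=10, seed=0):
    """Sample N items with replacement as training data; items never
    sampled form the test set. Combine the two error estimates with
    the 0.632 bootstrap weights and average over several rounds."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(rounds):
        chosen = [rng.randrange(n) for _ in range(n)]
        in_train = set(chosen)
        train = [data[i] for i in chosen]
        test = [x for i, x in enumerate(data) if i not in in_train]
        if not test:          # every item was sampled: skip this round
            continue
        model = learn(train)
        estimates.append(0.632 * error(model, test)
                         + 0.368 * error(model, train))
    return sum(estimates) / len(estimates)
```

The 0.632 comes from the chance that a given item appears in a bootstrap sample, roughly 1 - 1/e.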

7
Comments
  • No single best way
  • Problems become smaller when more data are
    available
  • Look at the variance in the predictions of models found
    on different samples!
  • Look at the learning curve to see if more data will
    help

8
5.6 Predicting probabilities
  • Predict a class vs. a distribution
  • Decision tree?
  • Naive Bayes?
  • Rule learners?
  • Different loss function
  • So far: zero-one loss
  • Prediction error for a distribution:
  • Quadratic error

9
  • Quadratic error: Σj (pj - aj)²
  • For 1 instance
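The quadratic error for one instance is a one-liner; the function name is an illustrative assumption:

```python
def quadratic_loss(p, a):
    """Quadratic error for one instance: sum_j (p_j - a_j)^2,
    where p is the predicted class distribution and a the actual
    0/1 indicator vector of the true class."""
    return sum((pj - aj) ** 2 for pj, aj in zip(p, a))

# predicted [0.7, 0.2, 0.1], actual class is the first one
err = quadratic_loss([0.7, 0.2, 0.1], [1, 0, 0])
```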

10
Aside: probability matching
  • Suppose that we know p*, the actual p
  • Which prediction p minimises the quadratic error?
  • E(quadratic error) = E Σj (pj - aj)²
  • = Σj (E(pj²) - 2·E(pj·aj) + E(aj²))
  • = Σj (pj² - 2·pj·pj* + pj*)
  • = Σj ((pj - pj*)² + pj*(1 - pj*))
  • Pick p = p* -> minimises the quadratic error
  • The remaining error is just the variance pj*(1 - pj*)
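The closed form from this derivation can be checked numerically for a two-class case; the variable names and the grid are illustrative assumptions:

```python
def expected_sq_error(p, p_star):
    """Closed form from the derivation, per class:
    E = (p - p*)^2 + p*(1 - p*)."""
    return (p - p_star) ** 2 + p_star * (1 - p_star)

# a grid search confirms that p = p* minimises the expected error
# and that the remaining error is the variance p*(1 - p*)
p_star = 0.3
best_err, best_p = min((expected_sq_error(i / 100, p_star), i / 100)
                       for i in range(101))
```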

11
Informational loss function
  • -log₂ pi (i is the class that actually occurs)
  • If the true probabilities are p1*, p2*, ... then the
    expected value is
  • -p1*·log₂ p1 - p2*·log₂ p2 - ...
  • Minimised if we take p = p*
  • Related to the MDL principle
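Both the per-instance loss and its expectation are easy to write down; the function names are illustrative assumptions:

```python
import math

def informational_loss(p, i):
    """-log2 p_i, where i is the class that actually occurred."""
    return -math.log2(p[i])

def expected_info_loss(p, p_star):
    """Expected loss -sum_i p*_i log2 p_i when the true class
    distribution is p*; minimised when p = p* (cross-entropy is
    never below entropy)."""
    return -sum(ps * math.log2(pi) for ps, pi in zip(p_star, p))
```

Note that predicting probability 0 for a class that then occurs gives infinite loss, which is why smoothed estimates are used in practice.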

12
Counting costs
  • Different errors have different costs, asymmetric
  • More than two classes Confusion matrix
  • Use kappa as measure of strength of association
  • Cost-sensitive associate costs with confusions
  • Cost-sensitive classification minimise costs
    (instead of errors)
  • Cost-sensitive learning minimise costs (instead
    of errors)
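Cost-sensitive classification from predicted probabilities can be sketched as picking the class with minimal expected cost; the function name and cost matrix are illustrative assumptions:

```python
def min_cost_class(p, cost):
    """Predict the class k with minimal expected cost
    sum_j p[j] * cost[j][k], instead of the most probable class.
    cost[j][k] is the cost of predicting k when the true class is j."""
    classes = range(len(cost[0]))
    return min(classes, key=lambda k: sum(p[j] * cost[j][k]
                                          for j in range(len(p))))

# missing class 1 costs 10, the opposite confusion costs 1:
# class 1 is predicted even though class 0 is more probable
choice = min_cost_class([0.7, 0.3], [[0, 1], [10, 0]])
```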

13
Lift chart
  • Marketing
  • Goal: the effect of selecting different subsets (e.g.
    of clients)
  • Improvement in accuracy (true positives + true
    negatives) when a subset is selected
  • True positives vs. (sub)set size
  • Idea: rank the examples by predicted probability and
    plot accuracy for increasing subsets
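The rank-and-count idea above gives the points of a lift chart directly; the function name is an illustrative assumption:

```python
def lift_points(scored):
    """Rank examples by predicted probability (descending) and
    return (subset size, true positives found) for every prefix:
    the points of a lift chart. scored is a list of
    (probability, is_positive) pairs."""
    ranked = sorted(scored, key=lambda s: -s[0])
    points, tp = [], 0
    for i, (_, is_pos) in enumerate(ranked, 1):
        tp += is_pos
        points.append((i, tp))
    return points

pts = lift_points([(0.9, 1), (0.8, 0), (0.7, 1), (0.4, 0)])
```

A good ranker collects most positives in the early prefixes, which is exactly the "lift" over random selection.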

14
ROC curves
  • ROC = Receiver Operating Characteristic
  • Plot true positives vs. false positives for
    different thresholds
  • Learning/classification that outputs
    probabilities, with a variable threshold
  • Shifting the threshold so that more positives are
    accepted -> more true positives AND more false
    positives
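Sweeping the threshold over ranked scores yields the ROC points; the function name is an illustrative assumption, and the input needs at least one positive and one negative example:

```python
def roc_points(scored):
    """Sweep the acceptance threshold over scores ranked descending:
    each step accepts one more example, adding either a true or a
    false positive. scored is a list of (probability, is_positive)
    pairs; returns (false positive rate, true positive rate) points."""
    pos = sum(y for _, y in scored)
    neg = len(scored) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, y in sorted(scored, key=lambda s: -s[0]):
        if y:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

pts = roc_points([(0.9, 1), (0.8, 1), (0.3, 0), (0.1, 0)])
```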

15
(figure: ROC curve, true positives plotted against false positives)
16
  • Compare methods by their ROC curves
  • Set some thresholds for the learner/classifier
  • Find points on the curve
  • Draw a smooth curve
  • Compare the area under the ROC curve (AURC)
  • Interesting: one method's ROC curve may dominate
    another completely, but the curves may also cross! Then
    one method is better for a small number of
    classifications, the other for a large number
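The area comparison can be computed from the curve's points with the trapezoid rule; the function name is an illustrative assumption:

```python
def auc(points):
    """Trapezoidal area under an ROC curve given as (fpr, tpr)
    points sorted by fpr; a larger area means a better ranking
    overall, but says nothing about where the curves cross."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```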

17
Recall, precision and F
  • Recall = retrieved relevant / total relevant
  • Precision = retrieved relevant / total retrieved
  • F = 2·recall·precision / (recall + precision)
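These three definitions translate directly from counts of true positives (tp), false positives (fp), and false negatives (fn); the function name is an illustrative assumption:

```python
def prf(tp, fp, fn):
    """Recall = tp/(tp+fn), precision = tp/(tp+fp),
    F = 2*recall*precision / (recall + precision)."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f = 2 * recall * precision / (recall + precision)
    return recall, precision, f

r, p, f = prf(8, 2, 2)
```

F is the harmonic mean of recall and precision, so it is dragged down by whichever of the two is worse.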

18
Sensitivity and specificity
  • Sensitivity: proportion of those with the disease who
    test positive (the true positive rate)
  • Specificity: proportion of those without the disease who
    test negative (1 - the false positive rate)
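Both quantities come straight from the confusion-matrix counts; the function name is an illustrative assumption:

```python
def sens_spec(tp, fn, tn, fp):
    """Sensitivity = tp/(tp+fn): proportion with the disease who
    test positive. Specificity = tn/(tn+fp): proportion without
    the disease who test negative (= 1 - false positive rate)."""
    return tp / (tp + fn), tn / (tn + fp)

# 90 of 100 diseased test positive; 80 of 100 healthy test negative
s, sp = sens_spec(90, 10, 80, 20)
```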

19
Error Curves
  • (figure: error curves, P(error) plotted against P(+))

20
Minimal Description Length
  • Occam's razor: if model 1 and model 2 both explain
    data D and model 1 is simpler than model 2, then
    prefer model 1
  • Idea of MDL:
  • Pattern ~ compressibility ~ predictability
  • Randomness = unpredictability =
    incompressibility = no pattern
  • MDL for ML:
  • Make a compact, simple model as far as possible
  • The rest is random/incompressible

21
What is simplicity?
  • Compactness: the minimal number of bits needed for
    encoding

22
Why is MDL interesting for ML?
  • A theory of what the best model is for the data!

23
Two-part code optimisation
  • Encode the data as pattern/hypothesis +
    unexplained data
  • Finding the pattern/hypothesis = ML
  • Plan:
  • Minimise L(pattern/hypothesis) + L(Data |
    pattern/hypothesis)
  • Data | pattern/hypothesis = the examples that are not
    explained by the hypothesis
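A toy sketch of the two-part code length; the function name is an illustrative assumption, and using log2 of a binomial coefficient as the code for the exception set is one simple (assumed) encoding choice, not the slides' method:

```python
import math

def two_part_length(model_bits, n_exceptions, n_examples):
    """Toy two-part code length: L(hypothesis) plus the bits needed
    to point out which of the n_examples the hypothesis fails to
    explain, coded as log2 C(n, k) for the exception set."""
    return model_bits + math.log2(math.comb(n_examples, n_exceptions))

# a 50-bit model with 2 exceptions beats a 5-bit model with 50
# exceptions on 100 examples: total code length decides
tight = two_part_length(50, 2, 100)
loose = two_part_length(5, 50, 100)
```

Minimising the total trades model complexity against the cost of listing unexplained examples, which is exactly the plan on this slide.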

24
Is two-part code optimisation optimal?
  • Yes!
  • P(H | Data) = P(Data | H) · P(H) / P(Data)
  • Posterior probability = conditional · prior
  • Take the H for which P(H | Data) is maximal
  • P(Data) is constant for all H
  • -log₂(P(H | Data)) = -log₂(P(Data | H))
    - log₂(P(H)) + log₂(P(Data))
  • -log₂(P(x)) = the minimal number of bits for
    encoding x!!

25
Issues
  • In the Bayesian formulation: P(H)?
  • Encoding
  • Measuring length
  • Finding models
  • NB: no assumption about representation, domain, ...!
  • The same approach works for sequences, tables, ...!

26
Examples
  • Decision trees, rule sets
  • Naïve Bayes