Machine Learning and WEKA


1
Machine Learning and WEKA
  • Roman Eisner
  • Wishart Lab Presentation
  • Sept 21, 2007

2
Outline
  • Motivating Machine Learning
  • Predictors
  • Evaluation
  • Interpretation
  • Software
  • WEKA

3
Motivating Machine Learning
  • Machine Learning
    • study of computer algorithms that improve automatically through experience
    • extract information from data automatically, by computational and statistical methods
  • Supervised Learning
    • Mapping a set of inputs to a desired output
    • Regression, Classification

4
Motivating Machine Learning
  • Successful Applications in
  • natural language processing
  • search engines
  • medical diagnosis
  • bioinformatics and cheminformatics
  • detecting credit card fraud
  • stock market analysis
  • classifying DNA sequences
  • speech and handwriting recognition
  • object recognition in computer vision
  • game playing
  • robot locomotion

5
Predictors
  • Given input $X = (x_1, x_2, \ldots, x_p)$, produce an output $Y$
    • e.g. Given an image of a face, predict gender
    • e.g. Given a urine metabolite profile, predict cancer state
    • e.g. Given a protein sequence, predict function
  • Terminology
    • $x_i$: feature
    • $Y$: label
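
To make the features-to-label mapping concrete, here is a minimal sketch of one predictor type from the deck, nearest neighbor. The feature values, the label names, and the function name `nearest_neighbor_predict` are hypothetical and chosen only for illustration; they are not taken from the presentation.

```python
import math

def nearest_neighbor_predict(train_X, train_y, x):
    """Return the label of the training sample closest to x (Euclidean distance)."""
    best_index = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best_index]

# Hypothetical toy data: each row is a feature vector (x1, x2); each entry of
# train_y is the label Y for the corresponding row.
train_X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.1), (4.8, 5.3)]
train_y = ["healthy", "healthy", "cancer", "cancer"]

# Predict the label of a new, unlabeled sample.
prediction = nearest_neighbor_predict(train_X, train_y, (4.9, 5.0))
print(prediction)
```

The predictor is just a function from a feature vector to a label. Other learning algorithms differ in how that function is built from the training data.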

6
Predictors - Workflow
  • [Figure: workflow diagram with the labels Learning Algorithm, Predictor, Predictions]

7
Predictors

8
(No Transcript)

9
Predictors
  • SVM
  • Bayesian Networks
  • Decision Trees
  • Nearest Neighbor
  • Neural Networks
  • Many others

10
Predictors
  • Which to use? Depends on the Application
  • Data distribution / complexity
  • Data set size
  • Interpretability
  • Regression vs Classification
  • Speed Requirements
  • Other data set constraints

11
Evaluation
  • Accuracy
    • Percentage of predictions that are correct
    • Problematic for imbalanced data sets, where one class greatly outnumbers the other
  • Precision
    • Percentage of positive predictions that are correct
  • Recall (Sensitivity)
    • Percentage of positive-labeled samples predicted as positive
  • Specificity
    • Percentage of negative-labeled samples predicted as negative
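
The four measures above can be computed from the counts of true/false positives and negatives. The sketch below assumes binary labels encoded as 1 (positive) and 0 (negative); the function names and the toy data are hypothetical. Returning 0.0 when a denominator is zero is a convention chosen here, not something the slides specify.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, true negatives, false negatives."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and specificity as fractions in [0, 1]."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Assumed convention: report 0.0 when the denominator is zero.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }

# Hypothetical imbalanced data set: 5 positives, 95 negatives,
# and a predictor that always answers "negative".
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
metrics = evaluate(y_true, y_pred)
print(metrics)
```

On this toy data the always-negative predictor scores high on accuracy while its recall is zero, which is the weakness of accuracy on imbalanced data noted above.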

12
Evaluation

13
Evaluation
  • Accuracy = 100%?

14
Evaluation

15
Evaluation

16
Evaluation
  • External evaluation set
  • Cross-Validation
    • Split the data set into k parts
    • Use parts 1, 2, ..., k-1 to create the predictor, and the kth part to evaluate it
    • Repeat so that each of the k parts is used once for evaluation
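
The cross-validation steps above can be sketched as follows. This is a minimal illustration, not WEKA's implementation: the helper names, the majority-class learner, and the toy data are all hypothetical. In practice the data would usually be shuffled before splitting, which is omitted here to keep the example deterministic.

```python
from collections import Counter

def k_fold_indices(n_samples, k):
    """Assign sample indices 0..n_samples-1 to k folds of nearly equal size."""
    folds = [[] for _ in range(k)]
    for i in range(n_samples):
        folds[i % k].append(i)
    return folds

def cross_validate(X, y, k, train_fn, predict_fn):
    """Train on k-1 folds, evaluate accuracy on the held-out fold, repeat for every fold."""
    accuracies = []
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        model = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict_fn(model, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies

# Hypothetical stand-in learner: always predicts the most common training label.
def train_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]

def predict_majority(model, x):
    return model

# Hypothetical toy data: 10 samples, one feature each.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
scores = cross_validate(X, y, 5, train_majority, predict_majority)
print(scores, sum(scores) / len(scores))
```

Each sample is used for evaluation exactly once and is never part of the training data for the fold that evaluates it.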

17
Interpretation
  • Machine Learning algorithms have varying levels of interpretability
  • Linear SVMs
    • Just a linear function over the features
    • Weights represent importance of features (if normalized)
    • $f(X) = \operatorname{sign}(w \cdot X - b)$
  • Must use an external test set to ensure the analysis is not misguided
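
As a rough illustration of why a linear SVM is easy to interpret, the sketch below evaluates the decision rule sign(w · X - b) and ranks features by the magnitude of their weights. The weights, offset, and feature names are hypothetical values, not a trained model, and the ranking is meaningful only if the features were normalized to comparable scales. Treating a score of exactly zero as the positive class is an assumption made here.

```python
def linear_decision(w, x, b):
    """Return +1 or -1 according to the sign of w . x - b."""
    score = sum(wi * xi for wi, xi in zip(w, x)) - b
    # Assumed convention: a score of exactly 0 maps to the positive class.
    return 1 if score >= 0 else -1

def rank_features(w, names):
    """Sort features by absolute weight, largest first."""
    return sorted(zip(names, w), key=lambda pair: abs(pair[1]), reverse=True)

# Hypothetical weights and offset for three normalized features.
w = [2.0, -0.1, 0.5]
b = 0.25
names = ["metabolite_a", "metabolite_b", "metabolite_c"]

label = linear_decision(w, [1.0, 1.0, 1.0], b)
ranking = rank_features(w, names)
print(label, ranking)
```

The sign of each weight shows which class a feature pushes toward, and its magnitude shows how strongly. As the slide notes, this reading should be confirmed on an external test set.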

18
Software
  • Software tends to be varied and quite specialized
    • BNT - Bayesian Nets
    • LIBSVM - SVM
    • Netlab - Neural Nets
  • Often don't include code for analysis and proper evaluation
  • Each package has its own preferred data format

19
WEKA: Waikato Environment for Knowledge Analysis
  • Free
  • General Machine Learning Package
  • Cross-Validation
  • Some Limited Data Visualization
  • Many Machine Learning Algorithms
  • Java API

20
References
  • Machine Learning, Tom Mitchell, McGraw Hill, 1997.
  • http://en.wikipedia.org/wiki/Machine_learning
  • http://en.wikipedia.org/wiki/Support_vector_machine