Machine Learning - PowerPoint PPT Presentation

Slides: 26
Provided by: mhor4

Transcript and Presenter's Notes


1
Machine Learning
  • Márk Horváth
  • Morgan Stanley
  • FID
  • Institutional Securities

2
Content
  • AI Paradigm
  • Data Mining
  • Weka
  • Application Areas
  • Introduce many fields and the whole paradigm
  • No time for details

3
AI Paradigm
  • The area of computer science which deals with
    problems that we were not able to cope with
    before.
  • Computer science is a branch of mathematics, btw.
  • Algorithms solving problems mainly through
    interaction with the problem. The programmer does
    not have to understand the solution to the
    problem itself, but only the details of the
    learning algorithm.

4
AI Paradigm
  • Why AI?
  • new, fast-expanding science, applicable to most
    other sciences
  • it also deals with explaining evidence
  • interdisciplinary
  • math
  • computer science
  • applied math
  • philosophy of science
  • biology (many naturally inspired algorithms,
    thinking machine)
  • Why Machine Learning / Data Mining?
  • it can be applied to any data (financial,
    medical, demographic, ...)

5
AI Paradigm
  • 1965 John McCarthy => 42 years
  • Hilbert, theorem proving machine
  • Occam (14th century)
  • Many distinct fields
  • Many algorithms in each field
  • => 1 hour is nothing.
  • Empirical and theoretical science
  • Intuition needed to use and hybridize
  • Few proofs
  • Area too big to grasp everything in detail, but
    concepts are important
  • => BIG PICTURE, no formulas!

6
AI Taxonomy
AI
Model / PCA, ICA
Logic / Expert Sys
Machine Learning / Data Mining / Function Approximation
Optimization
Control
Clustering
AGI

Decision Tree / Covering
Linear Regression / Gradient Methods
Kernel Based / Nearest Neighbor
Naive Bayes
0R, 1R (max likelihood)

7
Data Mining vs. Statistics
  • Statistics
  • hypothesis testing
  • DM
  • search through hypotheses
  • Empirical side
  • Many methods work in practice even though they
    are proven not to converge
  • Some methods do not work although they should
    (due to limited computing power or slow
    convergence)

8
Relation, Attribute, Class
(O, A, P)
X = MYCT x MMIN x MMAX x CACH x CHMIN x CHMAX (Attribute, Feature)
Y = class (Class, Target)
O ⊆ X x Y
?(Y | X) ?

@relation 'cpu'
@attribute MYCT real
@attribute MMIN real
@attribute MMAX real
@attribute CACH real
@attribute CHMIN real
@attribute CHMAX real
@attribute class real   % performance
@data
125,256,6000,256,16,128,199
29,8000,32000,32,8,32,253
29,8000,16000,32,8,16,132
26,8000,32000,64,8,32,290
23,16000,32000,64,16,32,381

9
General View of Data Mining
  • Language
  • Build model / search over the Language

10
Simple Cases
  • 0R
  • 1R (nominal class)
  • Max likelihood
  • Linear Regression
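
A minimal sketch of the two simplest learners named above, assuming nominal attributes and a nominal class. The function names and the toy data are hypothetical and are not taken from Weka: 0R ignores all attributes and predicts the most frequent class, while 1R keeps the single attribute whose "value -> majority class" rule makes the fewest training errors.

```python
from collections import Counter, defaultdict

def zero_r(labels):
    """0R: always predict the most frequent class."""
    return Counter(labels).most_common(1)[0][0]

def one_r(rows, labels):
    """1R: choose the one attribute whose value -> majority-class rule
    makes the fewest errors on the training data."""
    best = None
    for a in range(len(rows[0])):
        counts = defaultdict(Counter)
        for row, y in zip(rows, labels):
            counts[row[a]][y] += 1
        # For each attribute value, predict its most frequent class.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(rule[row[a]] != y for row, y in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, a, rule)
    return best[1], best[2], best[0]  # attribute index, rule, training errors

# Hypothetical toy data: (outlook, windy) -> play
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "no"),
        ("rainy", "yes"), ("overcast", "yes"), ("overcast", "no")]
labels = ["no", "no", "yes", "yes", "yes", "yes"]

baseline = zero_r(labels)
attribute, rule, errors = one_r(rows, labels)
```

On this toy data 0R predicts "yes" and 1R selects attribute 0 (outlook), which separates the classes without error.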

11
Data Mining Taxonomy
  • Regression vs. Classification (exchangeable)
  • Deterministic vs. Stochastic (exchangeable,
    Chebyshev)
  • Batch driven vs. Updateable (exchangeable, but
    with cost)
  • Symbolic vs. Subsymbolic

12
Methodology
  • Clean data
  • Try many methods
  • Optimize good methods
  • Hybridize good methods, make meta algorithms

13
Evaluation Measures
  • Mean Absolute Error / Root Mean Squared Error
  • Correlation Coefficient
  • Information gain
  • Custom (e.g. weighted)
  • Significance analysis (Bernoulli process)
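
The first three numeric measures above can be written out directly. This is a plain-Python sketch with hypothetical function names and made-up values; the correlation shown is the Pearson coefficient, which is assumed to be what the slide means.

```python
import math

def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def correlation_coefficient(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Hypothetical values, for illustration only
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 2.0, 6.0]

mae = mean_absolute_error(actual, predicted)        # 1.0
rmse = root_mean_squared_error(actual, predicted)   # sqrt(1.5)
corr = correlation_coefficient(actual, predicted)   # 6 / sqrt(60)
```

RMSE penalizes the single large error more heavily than MAE does, which is why it comes out larger here.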

14
Overfitting, Learning Noise
  • Philosophical question
  • When do we accept or deny a model?
  • No chance to prove, only to reject
  • Train / (Validation) / Test
  • Cross-validation, leave one out
  • Minimum Description Length principle
  • Occam
  • Kolmogorov complexity
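
A sketch of how cross-validation partitions the data, using a hypothetical helper `k_fold_splits`. It assumes the instances are already shuffled, which a real experiment would normally do first; leave-one-out is simply the case where k equals the number of instances.

```python
def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation
    over n instances. Every instance lands in exactly one test fold."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

two_fold = list(k_fold_splits(5, 2))
leave_one_out = list(k_fold_splits(4, 4))
```

Each model is trained on the `train` indices and scored on the `test` indices, and the scores are averaged over the folds.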

15
Nearest Neighbor / Kernel
  • Instance based
  • Statistical (k neighbors)
  • Distance: Euclidean, Manhattan / evolved
  • Missing attribute: maximal distance
  • KD-tree (O(log n) lookup), ball tree, metric tree
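
A minimal k-nearest-neighbor sketch for the points above, assuming numeric attributes with no missing values. It uses a linear scan rather than a KD-tree or ball tree, and the function names and data are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train, query, k=3, distance=euclidean):
    """train is a list of (vector, label) pairs; the k closest vote on the class."""
    neighbors = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training instances in two clusters
train = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
         ((8, 8), "b"), ((9, 8), "b"), ((8, 9), "b")]

near_a = knn_predict(train, (2, 2), k=3)
near_b = knn_predict(train, (7, 9), k=3, distance=manhattan)
```

The distance function is passed in as a parameter, so swapping Euclidean for Manhattan (or an evolved metric) does not change the rest of the method.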

16
Decision Trees / Covering
  • Divide and Conquer
  • Split by the best feature
  • User Classifier / REP Tree
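
One way to pick "the best feature" is information gain, which is listed among the evaluation measures earlier in this deck. The slide does not say which criterion is meant, so treat that choice as an assumption. The sketch below covers only the split selection for nominal attributes, not the recursive tree construction; names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Class entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy before the split minus the weighted entropy after it."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[row[attribute]].append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_split(rows, labels):
    """Index of the attribute with the highest information gain."""
    return max(range(len(rows[0])), key=lambda a: information_gain(rows, labels, a))

# Hypothetical toy data: (outlook, windy) -> play
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "no"), ("rainy", "yes")]
labels = ["no", "no", "yes", "yes"]

chosen = best_split(rows, labels)
```

Divide and conquer then repeats the same selection on each subset produced by the chosen attribute.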

17
Naive Bayes
  • Independent Attributes
  • P(Y | X) = P(X | Y) P(Y) / P(X)
    ≈ ∏_i P(X_i | Y) · P(Y) / P(X)
  • Discrete Class
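
A sketch of Naive Bayes for nominal attributes and a discrete class, multiplying the class prior by one conditional probability per attribute. Add-one (Laplace) smoothing is an assumption added here so that unseen attribute values do not force a zero probability; the function names and data are hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values = defaultdict(set)            # attribute index -> distinct values seen
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, y)][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values

def predict_naive_bayes(model, row):
    """Return normalized class probabilities for one instance."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for y, count_y in class_counts.items():
        score = count_y / total  # prior P(Y)
        for i, v in enumerate(row):
            # P(X_i | Y) with add-one smoothing (an assumption of this sketch)
            score *= (value_counts[(i, y)][v] + 1) / (count_y + len(values[i]))
        scores[y] = score
    norm = sum(scores.values())  # plays the role of P(X)
    return {y: s / norm for y, s in scores.items()}

# Hypothetical toy data: (outlook, windy) -> play
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "no"),
        ("rainy", "yes"), ("rainy", "no")]
labels = ["no", "no", "yes", "yes", "yes"]

model = train_naive_bayes(rows, labels)
posterior = predict_naive_bayes(model, ("sunny", "no"))
```

Because the scores are normalized at the end, the denominator P(X) never has to be estimated separately.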

18
Artificial Neural Networks
  • Structure (Weka)
  • Theoretical limitations (Minsky, AI winter)
  • Recurrent networks for time series

19
Feedforward Learning Rules
  • Learning rules
  • Perceptron / Winnow (very simple rules for
    special cases)
  • Various gradient descent methods
  • Slower than perceptron
  • Faster than differentiating the whole
    expression
  • Local search
  • Evolution
  • Global search
  • A bit slower, but easy to hybridize with local
    search
  • Can evolve
  • Weights
  • Structure
  • Transfer functions
  • Recurrent networks
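
As an illustration of the gradient descent family, here is a sketch for the smallest possible case: a single linear unit trained by batch gradient descent on mean squared error. The model, the loss, the learning rate, and the data are all assumptions chosen for the example, not details from the slides.

```python
def fit_linear_unit(xs, ys, learning_rate=0.05, epochs=2000):
    """Batch gradient descent on mean squared error for y ~ w * x + b."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Local search: take a small step downhill
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_linear_unit(xs, ys)
```

The fitted weights approach w = 2 and b = 1. An evolutionary method would instead mutate and select whole weight vectors, without using the gradient.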

20
Perceptron / Winnow
  • Perceptron
  • Add the misclassified instance to the weight
    vector (subtract it for the negative class)
  • Converges if the data is linearly separable
  • Winnow
  • Binary attributes
  • Increase or decrease the weights of non-zero
    attributes
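
Both update rules fit in a few lines. The sketch below makes several assumptions that the slide does not state: perceptron classes are coded +1/-1 with a constant bias input appended, and Winnow uses a multiplicative factor `alpha` and a user-chosen `threshold`. All names and data are hypothetical.

```python
def train_perceptron(instances, labels, max_epochs=100):
    """Labels are +1 / -1. A constant 1.0 is appended to each instance as bias."""
    w = [0.0] * (len(instances[0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(instances, labels):
            xb = list(x) + [1.0]
            predicted = 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else -1
            if predicted != y:
                # Add the instance for class +1, subtract it for class -1.
                w = [wi + y * xi for wi, xi in zip(w, xb)]
                mistakes += 1
        if mistakes == 0:  # every instance classified correctly
            break
    return w

def perceptron_predict(w, x):
    xb = list(x) + [1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else -1

def train_winnow(instances, labels, alpha=2.0, threshold=2.0, max_epochs=100):
    """Binary attributes, labels 1 / 0. Only weights of non-zero attributes change."""
    w = [1.0] * len(instances[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(instances, labels):
            predicted = 1 if sum(wi * xi for wi, xi in zip(w, x)) > threshold else 0
            if predicted != y:
                factor = alpha if y == 1 else 1.0 / alpha
                w = [wi * factor if xi else wi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return w

def winnow_predict(w, x, threshold=2.0):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > threshold else 0

# Perceptron on logical AND (linearly separable)
and_x = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_y = [-1, -1, -1, 1]
pw = train_perceptron(and_x, and_y)

# Winnow on "attribute 0 OR attribute 1" with two irrelevant attributes
or_x = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 1), (0, 0, 1, 0),
        (0, 0, 0, 1), (1, 1, 0, 0), (1, 0, 1, 1)]
or_y = [1, 1, 0, 0, 0, 1, 1]
ww = train_winnow(or_x, or_y)
```

In the Winnow run only the two relevant attributes gain weight, while the irrelevant ones stay at their initial value.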

21
Feature extraction
  • Discretization
  • PCA/ICA
  • Various state space transitions
  • Evolving features
  • Clustering

22
Meta / Hybrid Methods
  • LEGO :)
  • Vote (many ways)
  • Use meta algorithm to predict based on base
    methods
  • Embed
  • Apply regression in the leaves of decision trees
  • Embed decision tree, or training samples in ANN
  • Unify
  • Choose a general purpose language
  • Use conventional training methods to build models
  • Hybridize training methods, evolve
  • Easy to write articles, countless new ideas
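
The simplest of the "Vote" variants is an unweighted majority vote over base classifiers. In this sketch the base classifiers are hypothetical threshold rules standing in for trained models.

```python
from collections import Counter

def majority_vote(classifiers, instance):
    """Each base classifier is a callable: instance -> class label."""
    votes = Counter(classify(instance) for classify in classifiers)
    return votes.most_common(1)[0][0]

# Hypothetical base classifiers (simple threshold rules)
base = [
    lambda x: "high" if x[0] > 5 else "low",
    lambda x: "high" if x[1] > 5 else "low",
    lambda x: "high" if x[0] + x[1] > 12 else "low",
]

unanimous = majority_vote(base, (6, 7))
outvoted = majority_vote(base, (6, 2))
split = majority_vote(base, (9, 4))
```

Weighted voting, or a meta algorithm trained on the base predictions, would replace the `Counter` with a learned combination.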

23
Practical Uses
  • New paradigm
  • Countless applications
  • In all natural sciences
  • finance, psychology, sociology, biology,
    medicine, chemistry, ...
  • actually discovering and explaining evidence is
    science itself
  • Business
  • predictive enterprise

24
Applications in AI
  • Optimal Control (model building)
  • Use in other AI methods
  • Speech recognition
  • OCR
  • Speech synthesis
  • Vision, recognition
  • AGI (logic, DM, evolution, clustering,
    reinforcement learning, ...)

25
TDK, Article
  • Any topic you've found interesting