Title: Machine Learning
1Machine Learning
- Márk Horváth
- Morgan Stanley
- FID
- Institutional Securities
2Content
- AI Paradigm
- Data Mining
- Weka
- Application Areas
- Introduce many fields and the whole paradigm
- No time for details
3AI Paradigm
- The area of computer science that deals with problems we were not able to cope with before.
- Computer science is a branch of mathematics, by the way.
- Algorithms solving problems mainly through
interaction with the problem. The programmer does
not have to understand the solution to the
problem itself, but only the details of the
learning algorithm.
4AI Paradigm
- Why AI?
- new, fast-expanding science, applicable to most other sciences
- it also deals with explaining evidence
- interdisciplinary
- math
- computer science
- applied math
- philosophy of science
- biology (many naturally inspired algorithms, thinking machine)
- Why Machine Learning / Data Mining?
- it can be applied to any data (financial, medical, demographic, ...)
5AI Paradigm
- 1965 John McCarthy -> 42 years
- Hilbert, theorem proving machine
- Occam (14th century)
- Many distinct fields
- Many algorithms in each field
- -> 1 hour is nothing.
- Empirical and theoretical science
- Intuition needed to use and hybridize
- Few proofs
- Area too big to grasp everything in detail, but concepts are important
- -> BIG PICTURE, no formulas!
6AI Taxonomy
- AI
- Model / PCA, ICA
- Logic / Expert Systems
- Machine Learning / Data Mining / Function Approximation
- Optimization
- Control
- Clustering
- AGI
- Decision Tree / Covering
- Linear Regression / Gradient Methods
- Kernel Based / Nearest Neighbor
- Naive Bayes
- 0R, 1R (max likelihood)
7Data Mining vs. Statistics
- Statistics
- hypothesis testing
- DM
- search through hypotheses
- Empirical side
- Many methods work in practice although they are proven not to converge
- Some methods do not work although they should (due to limited computing power or slow convergence)
8Relation, Attribute, Class
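$(\Omega, \mathcal{A}, P)$
$X = \mathrm{MYCT} \times \mathrm{MMIN} \times \mathrm{MMAX} \times \mathrm{CACH} \times \mathrm{CHMIN} \times \mathrm{CHMAX}$ (Attribute, Feature)
$Y = \mathrm{class}$ (Class, Target)
$\Omega = X \times Y$
$P(Y \mid X) = ?$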
@relation 'cpu'
@attribute MYCT real
@attribute MMIN real
@attribute MMAX real
@attribute CACH real
@attribute CHMIN real
@attribute CHMAX real
% class: performance
@attribute class real
@data
125,256,6000,256,16,128,199
29,8000,32000,32,8,32,253
29,8000,16000,32,8,16,132
26,8000,32000,64,8,32,290
23,16000,32000,64,16,32,381
9General View of Data Mining
- Language
- Build model / search over the Language
10Simple Cases
- 0R
- 1R (nominal class)
- Max likelihood
- Linear Regression
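The two simplest learners on this slide can be sketched in a few lines of Python. This is only an illustration, assuming nominal attributes stored as tuples and a nominal class; the function names `zero_r` and `one_r` and the toy weather-style data are made up for the example.

```python
from collections import Counter, defaultdict

def zero_r(labels):
    """0R: ignore all attributes and always predict the most frequent class."""
    return Counter(labels).most_common(1)[0][0]

def one_r(rows, labels):
    """1R: choose the single attribute whose 'value -> majority class' rule
    makes the fewest errors on the training data."""
    best = None
    for a in range(len(rows[0])):
        counts = defaultdict(Counter)          # attribute value -> class counts
        for row, y in zip(rows, labels):
            counts[row[a]][y] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(rule[row[a]] != y for row, y in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, a, rule)
    return best[1], best[2]

# Hypothetical toy data: (outlook, windy) -> play
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "no"),
        ("rainy", "yes"), ("sunny", "no")]
labels = ["yes", "yes", "no", "no", "yes"]

majority = zero_r(labels)
attribute, rule = one_r(rows, labels)
```

0R is mainly useful as a baseline: any other method should beat it before being taken seriously.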
11Data Mining Taxonomy
- Regression vs. Classification (exchangeable)
- Deterministic vs. Stochastic (exchangeable, Chebyshev)
- Batch driven vs. Updateable (exchangeable, but with cost)
- Symbolic vs. Subsymbolic
12Methodology
- Clean data
- Try many methods
- Optimize good methods
- Hybridize good methods, make meta algorithms
13Evaluation Measures
- Mean Absolute Error / Root Mean Squared Error
- Correlation Coefficient
- Information gain
- Custom (e.g. weighted)
- Significance analysis (Bernoulli process)
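The first three numeric measures can be written down directly. The sketch below is a plain-Python illustration; the function names and the sample values are assumptions made for the example, not taken from any particular tool.

```python
import math

def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def correlation_coefficient(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a, mean_p = sum(actual) / n, sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Made-up values for illustration
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 5.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
corr = correlation_coefficient(actual, predicted)
```

RMSE punishes large individual errors more than MAE does, which is why the two can rank the same models differently.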
14Overfitting, Learning Noise
- Philosophical question
- When do we accept or deny a model?
- No chance to prove, only to reject
- Train / (Validation) / Test
- Cross-validation, leave one out
- Minimum Description Length principle
- Occam
- Kolmogorov complexity
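Cross-validation and leave-one-out differ only in the number of folds. The sketch below assumes a deliberately trivial model (predict the mean of the training targets) so that the splitting logic stays visible; `k_fold_indices` and `cross_validate` are hypothetical helper names.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k folds; k == n gives leave-one-out."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def cross_validate(values, k):
    """Mean absolute error of a 'predict the training mean' model,
    measured only on instances the model has not seen."""
    errors = []
    for train, test in k_fold_indices(len(values), k):
        mean = sum(values[j] for j in train) / len(train)
        errors.extend(abs(values[j] - mean) for j in test)
    return sum(errors) / len(errors)

values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]    # made-up target values
cv_error = cross_validate(values, k=3)
loo_error = cross_validate(values, k=len(values))
```

Every instance is used for testing exactly once and never appears in the training part of its own fold, which is what makes the error estimate honest.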
15Nearest Neighbor / Kernel
- Instance based
- Statistical (k neighbors)
- Distance: Euclidean, Manhattan / evolved
- Missing attribute: maximal distance
- KD-tree (O(log n) search on average), ball tree, metric tree
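A brute-force version of the k-nearest-neighbor idea fits in a short sketch. It assumes attributes are already scaled to [0, 1], so that a missing value (represented here as `None`) can be given the maximal per-attribute distance of 1; the function names and data are made up for the example, and no KD-tree or ball tree is used.

```python
import math
from collections import Counter

def distance(a, b, metric="euclidean"):
    """Distance between two instances with attributes scaled to [0, 1].
    A missing value (None) contributes the maximal per-attribute distance, 1."""
    diffs = [1.0 if x is None or y is None else abs(x - y)
             for x, y in zip(a, b)]
    if metric == "manhattan":
        return sum(diffs)
    return math.sqrt(sum(d * d for d in diffs))

def knn_predict(train, query, k=3, metric="euclidean"):
    """Majority class among the k nearest training instances (linear scan)."""
    nearest = sorted(train, key=lambda item: distance(item[0], query, metric))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical training instances: (attributes, class)
train = [((0.1, 0.2), "a"), ((0.2, 0.1), "a"), ((0.9, 0.8), "b"),
         ((0.8, None), "b"), ((0.85, 0.9), "b")]
prediction = knn_predict(train, (0.15, 0.15), k=3)
```

The linear scan costs one distance computation per training instance; the tree structures listed above exist to avoid exactly that.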
16Decision Trees / Covering
- Divide and Conquer
- Split by the best feature
- User Classifier / REP Tree
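One common way to decide which feature is "best" for a split is information gain. The slide does not say which criterion is meant, so treat the choice as an assumption of this sketch; the function names and toy data are illustrative as well.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one nominal attribute."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_split(rows, labels):
    """Index of the attribute with the highest information gain."""
    return max(range(len(rows[0])),
               key=lambda a: information_gain(rows, labels, a))

# Hypothetical toy data: (outlook, windy) -> play
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "no"),
        ("rainy", "yes"), ("sunny", "no")]
labels = ["yes", "yes", "no", "no", "yes"]
split_attribute = best_split(rows, labels)
```

A full tree learner would apply `best_split` recursively to each resulting subset (divide and conquer) until the subsets are pure or too small.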
17Naive Bayes
- Independent Attributes
- $P(Y \mid X) = P(X \mid Y)\,P(Y) / P(X) \approx \prod_i P(X_i \mid Y)\,P(Y) / P(X)$
- Discrete Class
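With independent nominal attributes and a discrete class, training reduces to counting. The sketch below is a minimal illustration: the add-one (Laplace) smoothing is an assumption added to avoid zero probabilities, and the function names and data are made up for the example.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)    # (attribute index, class) -> value counts
    distinct = [set() for _ in rows[0]]    # values seen per attribute
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, y)][v] += 1
            distinct[i].add(v)
    return class_counts, value_counts, [len(d) for d in distinct]

def predict_naive_bayes(model, row):
    """Return the class maximising P(class) * prod_i P(value_i | class).
    The common denominator P(row) is the same for every class and is skipped."""
    class_counts, value_counts, n_values = model
    total = sum(class_counts.values())
    scores = {}
    for y, cy in class_counts.items():
        score = cy / total
        for i, v in enumerate(row):
            # add-one smoothing (assumption of this sketch)
            score *= (value_counts[(i, y)][v] + 1) / (cy + n_values[i])
        scores[y] = score
    return max(scores, key=scores.get)

# Hypothetical toy data: (outlook, windy) -> play
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "no"),
        ("rainy", "yes"), ("sunny", "no")]
labels = ["yes", "yes", "no", "no", "yes"]
model = train_naive_bayes(rows, labels)
```

For example, `predict_naive_bayes(model, ("sunny", "yes"))` scores each class from the stored counts alone, without revisiting the training data.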
18Artificial Neural Networks
- Structure (Weka)
- Theoretical limitations (Minsky, AI winter)
- Recurrent networks for time series
19Feedforward Learning Rules
- Learning rules
- Perceptron / Winnow (very simple rules for special cases)
- Various gradient descent methods
- Slower than perceptron
- Faster than differentiating the whole expression
- Local search
- Evolution
- Global search
- A bit slower, but easy to hybridize with local search
- Can evolve
- Weights
- Structure
- Transfer functions
- Recurrent networks
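Gradient descent as a local search can be shown on the smallest possible model, a single linear unit. The sketch below fits a line by batch gradient descent on the mean squared error; the learning rate, step count, function name, and data are assumptions chosen for the illustration, not values from the talk.

```python
def fit_line_gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # partial derivatives of the mean squared error
        grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        # step against the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Made-up data lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_line_gradient_descent(xs, ys)
```

Each step only uses the local slope of the error surface, which is why the method is a local search and why it combines naturally with a global search such as evolution.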
20Perceptron / Winnow
- Perceptron
- Add the misclassified instance to the weight vector (or subtract it, depending on its class)
- Converges if the classes are linearly separable
- Winnow
- Binary attributes
- Multiply or divide the weights of the nonzero attributes
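Both update rules are short enough to sketch side by side. The code below is an illustration under stated assumptions: labels are -1/+1 for the perceptron and 0/1 for Winnow, a constant 1 is appended to each perceptron instance as a bias, and Winnow uses a threshold of half the number of attributes with factor `alpha = 2`. All names and data are hypothetical.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_perceptron(samples, epochs=100):
    """samples: list of (features, label) with label in {-1, +1}."""
    w = [0.0] * (len(samples[0][0]) + 1)
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:
            xb = list(x) + [1.0]               # constant input acts as the bias
            if y * dot(w, xb) <= 0:
                # add (label +1) or subtract (label -1) the misclassified instance
                w = [wi + y * xi for wi, xi in zip(w, xb)]
                mistakes += 1
        if mistakes == 0:                      # a full pass without errors: done
            break
    return w

def train_winnow(samples, alpha=2.0, epochs=100):
    """samples: list of (binary features, label in {0, 1})."""
    n = len(samples[0][0])
    theta = n / 2.0                            # threshold (assumption of this sketch)
    w = [1.0] * n
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:
            predicted = 1 if dot(w, x) > theta else 0
            if predicted != y:
                # multiply (missed positive) or divide (false positive)
                # the weights of the attributes that are switched on
                factor = alpha if y == 1 else 1.0 / alpha
                w = [wi * factor if xi else wi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return w, theta

# Logical AND, which is linearly separable
and_samples = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w_p = train_perceptron(and_samples)

# Target concept: class is 1 exactly when the first attribute is 1
winnow_samples = [((1, 0, 0), 1), ((0, 1, 0), 0), ((0, 0, 1), 0),
                  ((1, 1, 0), 1), ((0, 1, 1), 0), ((1, 0, 1), 1)]
w_w, theta = train_winnow(winnow_samples)
```

The perceptron changes weights additively, Winnow multiplicatively; the latter tends to suit problems where only a few of many binary attributes are relevant.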
21Feature extraction
- Discretization
- PCA/ICA
- Various state space transitions
- Evolving features
- Clustering
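Discretization, the first item above, is the easiest of these transformations to show. The sketch uses equal-width binning, which is only one of several possible schemes and is an assumption here; the input is the MMAX column of the five cpu instances shown earlier, and the function name is made up.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index 0 .. n_bins - 1
    using intervals of equal width between the minimum and maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0      # fall back to 1.0 if all values are equal
    # the maximum would land in bin n_bins, so clamp it into the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

mmax = [6000, 32000, 16000, 32000, 32000]  # MMAX values of the cpu instances
bins = equal_width_bins(mmax, 3)
```

After this step a numeric attribute can be fed to methods that expect nominal values, such as 1R or Naive Bayes.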
22Meta / Hybrid Methods
- LEGO :)
- Vote (many ways)
- Use a meta algorithm to predict based on the base methods
- Embed
- Apply regression in the leaves of decision trees
- Embed decision tree, or training samples in ANN
- Unify
- Choose a general purpose language
- Use conventional training methods to build models
- Hybridize training methods, evolve
- Easy to write articles, countless new ideas
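The simplest of the combination schemes above, voting, can be sketched as follows. Majority voting is only one of the "many ways"; the three base methods are hypothetical hand-written rules over the cpu attributes, standing in for trained models.

```python
from collections import Counter

def majority_vote(classifiers, instance):
    """Combine base classifiers by returning the most frequently predicted class."""
    votes = [clf(instance) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Three hypothetical base methods, each a plain function: instance -> class
def by_memory(x):
    return "fast" if x["MMAX"] >= 16000 else "slow"

def by_cache(x):
    return "fast" if x["CACH"] >= 64 else "slow"

def by_cycle_time(x):
    return "fast" if x["MYCT"] <= 30 else "slow"

instance = {"MYCT": 29, "MMAX": 32000, "CACH": 32}
combined = majority_vote([by_memory, by_cache, by_cycle_time], instance)
```

Because the base methods are just callables, any mix of learners can be plugged in, which is the LEGO idea in miniature.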
23Practical Uses
- New paradigm
- Countless applications
- In all natural sciences
- finance, psychology, sociology, biology, medicine, chemistry, ...
- actually, discovering and explaining evidence is science itself
- Business
- predictive enterprise
24Applications in AI
- Optimal Control (model building)
- Using in other AI methods
- Speech recognition
- OCR
- Speech synthesis
- Vision, recognition
- AGI (logic, DM, evolution, clustering, reinforcement learning, ...)
25TDK, Article
- Any topic you've found interesting