Title: Learning
1Learning
- Shyh-Kang Jeng
- Department of Electrical Engineering/
- Graduate Institute of Communication Engineering
- National Taiwan University
2References
- J. P. Bigus and J. Bigus, Constructing Intelligent Agents with Java, Wiley Computer Publishing, 1998
- S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, Englewood Cliffs, NJ: Prentice Hall, 1995
3Learning Agents
[Figure: learning agent architecture - performance standard, feedback, learning goals, and changes to knowledge inside the agent, connected to the environment through sensors and effectors]
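The architecture above can be read as a simple control loop. Below is a minimal Python sketch (not from the slides); the critic, learning-element, and performance-element objects are hypothetical placeholders for the boxes in the figure.

class LearningAgent:
    # Minimal sketch of the learning-agent loop, with hypothetical collaborators
    def __init__(self, performance_element, learning_element, critic):
        self.performance_element = performance_element  # uses knowledge to choose actions
        self.learning_element = learning_element        # turns feedback into knowledge changes
        self.critic = critic                            # compares outcomes to the performance standard

    def step(self, percept):
        action = self.performance_element.act(percept)     # sensors -> effectors
        feedback = self.critic.evaluate(percept, action)   # performance standard -> feedback
        changes = self.learning_element.learn(feedback)    # feedback -> changes
        self.performance_element.update(changes)           # changes -> knowledge
        return action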
4Forms of Learning
- Rote learning
- Parameter or weight adjustment
- Induction
- Clustering, chunking, or abstraction of knowledge
- Data mining as knowledge discovery
5Learning Paradigms
- Supervised learning
  - Programming by examples
  - Most common
  - Historical data is often used as the training data
- Unsupervised learning
  - Performs a type of feature detection
- Reinforcement learning
  - Error information is less specific
6Neural Networks
- Borrowing heavily from the metaphor of the human brain
- Can be used in supervised, unsupervised, and reinforcement learning scenarios
- Can be used for classification, clustering, and prediction
- Most are implemented as programs running on serial computers
7Nerve Cell
[Figure: nerve cell anatomy - soma, nucleus, dendrites, axon, axonal arborization, and a synapse with the axon from another cell]
8Neuron
[Figure: processing unit j with inputs x1, x2, x3, ..., xn, weights w1j, w2j, w3j, ..., wnj, and output yj = f(sumj), where sumj is the weighted sum of the inputs]
9Logistic (Sigmoid) Activation Function
[Figure: logistic activation function, plotting activation against sum]
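A minimal Python sketch (not from the slides) of the unit in the figure: the weighted sum is passed through the logistic activation, giving an output between 0 and 1.

import math

def logistic(s):
    # Logistic (sigmoid) activation: maps the weighted sum into (0, 1)
    return 1.0 / (1.0 + math.exp(-s))

def neuron_output(x, w):
    # y_j = f(sum_j), where sum_j = sum over i of w_ij * x_i
    s = sum(xi * wi for xi, wi in zip(x, w))
    return logistic(s)

# Example with three inputs: the weighted sum is 0.5, so the output is about 0.62
print(neuron_output([1.0, 0.5, -1.0], [0.4, 0.6, 0.2]))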
10Back Propagation Network
11Generic Neural Network Learning Algorithm
- Assign weights randomly
- repeat
  - for each e in examples
    - o <- Output(e)
    - t <- observed output values from e
    - update the weights based on e, o, t
- until all examples are correctly predicted or a stopping criterion is reached
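The slide leaves Output and the weight update abstract. One common instantiation, sketched below in Python (an assumption, not the slides' own code), is a single sigmoid unit trained with the delta rule.

import math, random

def logistic(s):
    return 1.0 / (1.0 + math.exp(-s))

def train(examples, n_inputs, alpha=0.5, max_epochs=1000):
    # Generic loop from the slide: assign weights randomly, then repeat over the examples
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]   # last weight is the bias
    for _ in range(max_epochs):
        all_correct = True
        for x, t in examples:                       # for each e in examples
            xb = x + [1.0]                          # constant input for the bias weight
            o = logistic(sum(wi * xi for wi, xi in zip(w, xb)))   # o <- Output(e)
            err = t - o                             # t is the observed output value
            for i in range(len(w)):                 # delta-rule update based on e, o, t
                w[i] += alpha * err * o * (1 - o) * xb[i]
            if round(o) != t:
                all_correct = False
        if all_correct:                             # until all examples correctly predicted
            break
    return w

# Example: a linearly separable target (logical OR)
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
print(train(data, n_inputs=2))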
12Changes to the Weight
13Kohonen Map
14Changes to the Weight
15Models of Chord Classification
16Self-Organized Map for Musical Chord
Identification
17References for Decision Tree Learning
- T. Mitchell, Machine Learning, McGraw-Hill, 1997
- J. R. Quinlan, C4.5 Programs for Machine
Learning, Morgan Kaufmann Publishers, 1993
18Play Tennis
19Entropy and Information Gain
20Information Gain Computation
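For a set S with a fraction p+ of positive and p- of negative examples, the standard definitions from Mitchell (1997), which match the computations on the next two slides, are:

\mathrm{Entropy}(S) = -p_{+}\log_{2}p_{+} - p_{-}\log_{2}p_{-}

\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)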
21Option 1
S: [9+, 5-], E = 0.940
Split on Wind:
- Strong: [3+, 3-], E = 1.00
- Weak: [6+, 2-], E = 0.811
Gain(S, Wind) = 0.940 - (8/14)(0.811) - (6/14)(1.00) = 0.048
22Option 2
S: [9+, 5-], E = 0.940
Split on Humidity:
- High: [3+, 4-], E = 0.985
- Normal: [6+, 1-], E = 0.592
Gain(S, Humidity) = 0.940 - (7/14)(0.985) - (7/14)(0.592) = 0.151
23Information Gain for Four Attributes
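The Python sketch below recomputes the gain for all four attributes. It assumes the standard 14-example PlayTennis table from Mitchell (1997), which is consistent (up to rounding) with the [9+, 5-] counts and the gains shown on the previous two slides.

import math
from collections import Counter

# Assumed data: the PlayTennis table from Mitchell (1997)
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Wind": 3}

def entropy(examples):
    counts = Counter(e[-1] for e in examples)
    n = len(examples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def gain(examples, attr):
    i, n = ATTRS[attr], len(examples)
    remainder = 0.0
    for v in set(e[i] for e in examples):
        subset = [e for e in examples if e[i] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(examples) - remainder

for a in ATTRS:
    print(a, round(gain(DATA, a), 3))
# Prints approximately: Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048
# (the slide's 0.151 comes from rounding the intermediate entropies to three decimals)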
24Partially Learned Tree
{D1, D2, ..., D14}: [9+, 5-]
Split on Outlook:
- Sunny: {D1, D2, D8, D9, D11}, [2+, 3-] -> ?
- Overcast: {D3, D7, D12, D13}, [4+, 0-] -> Yes
- Rain: {D4, D5, D6, D10, D14}, [3+, 2-] -> ?
25Next Step
26Decision Tree
Outlook = Sunny:
  Humidity = High: No
  Humidity = Normal: Yes
Outlook = Overcast: Yes
Outlook = Rain:
  Wind = Strong: No
  Wind = Weak: Yes
27Decision Tree Learning Algorithm (1)
- DecisionTreeLearning( examples, attributes, default )
- 1. if examples is empty, return default
- 2. if all examples have the same classification, return the classification
- 3. if attributes is empty, return MajorityValue( examples )
- 4. best <- ChooseAttribute( attributes, examples )
- 5. tree <- a new decision tree with root test best
28Decision Tree Learning Algorithm (2)
- 6. for each value vi of best do
  - examplesi <- elements of examples with best = vi
  - subtree <- DecisionTreeLearning( examplesi, attributes - best, MajorityValue( examples ) )
  - add a branch to tree with label vi and subtree subtree
- 7. return tree
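A direct Python rendering of the pseudocode above (a sketch, not the slides' own implementation). Examples are assumed to be dictionaries with a "class" key plus one key per attribute, and ChooseAttribute is passed in as a function, for instance the information-gain criterion from the earlier slides.

from collections import Counter

def majority_value(examples):
    # Most common classification among the examples
    return Counter(e["class"] for e in examples).most_common(1)[0][0]

def decision_tree_learning(examples, attributes, default, choose_attribute):
    if not examples:                                   # 1. no examples left
        return default
    classes = {e["class"] for e in examples}
    if len(classes) == 1:                              # 2. all examples agree
        return classes.pop()
    if not attributes:                                 # 3. no attributes left
        return majority_value(examples)
    best = choose_attribute(attributes, examples)      # 4. pick the best attribute
    tree = {best: {}}                                  # 5. root test on best
    for v in {e[best] for e in examples}:              # 6. one branch per observed value of best
        examples_v = [e for e in examples if e[best] == v]
        subtree = decision_tree_learning(
            examples_v,
            [a for a in attributes if a != best],
            majority_value(examples),
            choose_attribute)
        tree[best][v] = subtree                        # add a branch labelled v
    return tree                                        # 7. return tree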
29Output of JDecisionTree
Type = Action: Yes ( 2.0 / 0.0 )
Type = Romance: Yes ( 4.0 / 0.0 )
Type = Funny: No ( 3.0 / 0.0 )
Type = History:
  Finish = Comedy: Yes ( 3.0 / 0.0 )
  Finish = Tragedy: Yes ( 0.0 / 0.0 )
  Finish = Insipid: Yes ( 0.0 / 0.0 )
  Finish = Unknown: No ( 1.0 / 0.0 )
Type = Horror: Yes ( 0.0 / 0.0 )
Type = Warm: Yes ( 0.0 / 0.0 )
Type = Science_fiction: Yes ( 3.0 / 0.0 )
Type = Art: Yes ( 2.0 / 0.0 )
Type = Story:
  Class = Movie: Yes ( 0.0 / 0.0 )
  Class = Series: No ( 1.0 / 0.0 )
  Class = Soap_opera: Yes ( 0.0 / 0.0 )
  Class = Cartoon: Yes ( 2.0 / 0.0 )
  Class = Animation: Yes ( 0.0 / 0.0 )
30IMPECCABLE
Agent
31IMPECCABLE
32Website of cable TV service provider
33eHome Architecture
34Learning from the User
35eHome Agents 2002
- Lighting control agent
- Air conditioning control agent
- TV program selection agent
36eHome Center
[Figure: eHome Center - user, appliances, server, user profiles, information server, agent place, and agent communities]
37Personalization Agent
38Expressiveness of Decision Trees
- All learning can be seen as learning the representation of a function
- Any Boolean function can be written as a decision tree
- If the function is the parity function, an exponentially large decision tree is needed
- It is also difficult to use a decision tree to represent a majority function
39Ockham's Razor
- The most likely hypothesis is the simplest one that is consistent with all observations
- Extracting a pattern means being able to describe a large number of cases in a concise way
- Rather than just trying to find a decision tree that agrees with the examples, we try to find a concise one, too
40Assessing the Performance of the Learning Algorithm
- Collect a large set of examples
- Divide it into two disjoint sets: the training set and the test set
- Use the learning algorithm with the training set as examples to generate a hypothesis H
- Measure the percentage of examples in the test set that are correctly classified by H
- Repeat the above steps for different sizes of training sets and different randomly selected training sets of each size
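A minimal Python sketch of this holdout procedure (an illustration, not the slides' own code). The learn and classify arguments are placeholders for any learning algorithm, for example the decision-tree learner sketched earlier; calling the function for increasing training-set sizes yields the points of a learning curve.

import random

def assess(examples, learn, classify, train_fraction=0.7, trials=10):
    # Average test-set accuracy over several random training/test splits
    scores = []
    for _ in range(trials):
        shuffled = examples[:]
        random.shuffle(shuffled)
        cut = int(train_fraction * len(shuffled))
        training, test = shuffled[:cut], shuffled[cut:]       # two disjoint sets
        h = learn(training)                                   # hypothesis H from the training set
        correct = sum(1 for x, y in test if classify(h, x) == y)
        scores.append(correct / len(test))                    # fraction correctly classified by H
    return sum(scores) / len(scores)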
41Learning Curve
- Average prediction quality as a function of the size of the training set
[Figure: learning curve]
42Noise and Overfitting
- Two or more examples with the same descriptions but different classifications
- In many cases the learning algorithm can use the irrelevant attributes to make spurious distinctions among the examples
- For decision tree learning, tree pruning is useful to deal with overfitting
43Issues in Applications of Decision-Tree Learning
- Missing data
- Multivalued attributes
- Continuous-valued attributes
44Tree Pruning
- The resulting decision tree is often a very complex tree that overfits the examples by inferring more structure than is justified by the training cases
- The complex tree can actually have a higher error rate than a simpler tree
- Tasks are often at least partly indeterminate because the attributes do not capture all information relevant to classification
45Pessimistic Pruning (1)
physician fee freeze = n:
  adoption of the budget resolution = y: D (151)
  adoption of the budget resolution = u: D (1)
  adoption of the budget resolution = n:
    education spending = n: D (6)
    education spending = y: D (9)
    education spending = u: R (1)
46Pessimistic Pruning (2)
- Subtree:
    education spending = n: D (6)
    education spending = y: D (9)
    education spending = u: R (1)
- Estimated errors for the subtree:
    6 × U25(0,6) + 9 × U25(0,9) + 1 × U25(0,1) = 3.273
- Estimated errors if the subtree is replaced by its most common leaf, D, covering 16 cases with 1 error:
    16 × U25(1,16) = 2.512
47Pessimistic Pruning (3)
- After pruning:
    physician fee freeze = n:
      adoption of the budget resolution = y: D (151)
      adoption of the budget resolution = u: D (1)
      adoption of the budget resolution = n: D (16/1)
- Estimated errors for the new subtree:
    151 × U25(0,151) + 1 × U25(0,1) + 16 × U25(1,16) = 4.642
- Estimated errors if the new subtree is replaced by its most common leaf, D, covering 168 cases with 1 error:
    168 × U25(1,168) = 2.610
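Here U25(E, N) is the upper confidence limit (at the 25% confidence level) on the error rate of a leaf that covers N cases and misclassifies E of them, so the predicted error count of a leaf is N × U25(E, N). The Python sketch below computes the limit from the exact binomial tail, which reproduces the E = 0 figures above; the 2.512 and 2.610 figures appear to follow Quinlan's (1993) worked example, which uses an approximate limit, so the exact values here come out slightly larger, but the pruning decisions are unchanged.

from math import comb

def binom_cdf(e, n, p):
    # P(at most e errors in n cases when the true error rate is p)
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(e + 1))

def upper_limit(e, n, cf=0.25):
    # U_CF(e, n): the largest error rate p with P(<= e errors | n, p) >= cf, by bisection
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(e, n, mid) >= cf:
            lo = mid
        else:
            hi = mid
    return lo

# Figures from the slides above
print(6*upper_limit(0, 6) + 9*upper_limit(0, 9) + upper_limit(0, 1))         # ~3.27 (slide: 3.273)
print(16*upper_limit(1, 16))                                                 # ~2.55 (slide: 2.512)
print(151*upper_limit(0, 151) + upper_limit(0, 1) + 16*upper_limit(1, 16))   # ~4.68 (slide: 4.642)
print(168*upper_limit(1, 168))                                               # ~2.68 (slide: 2.610)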
48References for Probability Interval Estimation
- A. M. Mood and F. A. Graybill, Introduction to the Theory of Statistics, 2nd edition, New York: McGraw-Hill, 1963, Section 11.6
- M. Abramowitz and I. Stegun, eds., Handbook of Mathematical Functions, New York: Dover, 1964, Sections 26.5 and 26.2
49Estimate of Binomial Probability
- The sample is drawn from a point binomial (Bernoulli) distribution with density f(y; p) = p^y (1 - p)^(1-y), y = 0, 1
- The maximum-likelihood estimate of p is the sample proportion k/n
- Given n and y = k, we need to estimate an interval for p such that the probability that the actual parameter p falls within the interval equals ConfidenceLevel
50Interval Estimation
- The upper limit is the p value that satisfies
- The lower limit is the p value that satisfies
51Incomplete Beta Function
- Cumulative of the beta distribution
- Incomplete Beta function
52Connection between Binomial and Beta Distributions
53Upper Limit
54Lower Limit
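Slides 50 through 54 rest on the connection between the binomial tail probability and the incomplete beta function, which lets the interval limits be read off beta-distribution quantiles. A Python sketch, assuming equal-tailed (Clopper-Pearson style) limits as one common convention:

from scipy.stats import beta

def binomial_interval(k, n, confidence=0.95):
    # Equal-tailed interval for the binomial parameter p after k successes in n trials,
    # via the binomial-beta connection: P(X >= k | n, p) = I_p(k, n - k + 1)
    alpha = 1.0 - confidence
    lower = 0.0 if k == 0 else beta.ppf(alpha / 2, k, n - k + 1)
    upper = 1.0 if k == n else beta.ppf(1 - alpha / 2, k + 1, n - k)
    return lower, upper

print(binomial_interval(3, 20))   # roughly (0.032, 0.379)

# The same connection gives the one-sided U25 limits used on the pruning slides,
# e.g. U25(0, 6) = beta.ppf(0.75, 1, 6), which is about 0.206
print(beta.ppf(0.75, 1, 6))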
55Probably Approximately Correct
- Any hypothesis that is seriously wrong will, with high probability, be found out after a small number of examples, because it will make an incorrect prediction
- Any hypothesis that is consistent with a sufficiently large set of training examples is unlikely to be seriously wrong; it must be probably approximately correct
56Two Activation Functions for Neurons
- Step function
- Sign function
57Neurons as Logical Gates
58Nonlinear Regression
59Perceptrons
60Linearly Separable Functions
[Figure: AND and OR are linearly separable; XOR is not]
- A function can be represented by a perceptron if and only if it is linearly separable (see the sketch below)
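A small Python demonstration (not from the slides) of this claim, using the classic perceptron rule with a step activation: training reaches 100% accuracy on AND and OR, while on XOR no weight setting can classify more than three of the four cases correctly.

def train_perceptron(examples, epochs=25, alpha=0.2):
    # Perceptron learning rule with a step activation; w[0] is the bias weight
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (x1, x2), t in examples:
            x = [1.0, x1, x2]
            o = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            for i in range(3):
                w[i] += alpha * (t - o) * x[i]        # adjust weights by the error
    return w

def accuracy(w, examples):
    correct = 0
    for (x1, x2), t in examples:
        o = 1 if w[0] + w[1]*x1 + w[2]*x2 > 0 else 0
        correct += (o == t)
    return correct / len(examples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
OR  = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
for name, data in [("AND", AND), ("OR", OR), ("XOR", XOR)]:
    print(name, accuracy(train_perceptron(data), data))   # AND 1.0, OR 1.0, XOR at most 0.75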
61Back Propagation Network
62Changes to the Weight
63Gradient Descent Search