Learning - PowerPoint PPT Presentation

Transcript and Presenter's Notes
1
Learning
  • Shyh-Kang Jeng
  • Department of Electrical Engineering/
  • Graduate Institute of Communication Engineering
  • National Taiwan University

2
References
  • J. P. Bigus and J. Bigus, Constructing
    Intelligent Agents with Java, Wiley Computer
    Publishing, 1998
  • S. Russell and P. Norvig, Artificial
    Intelligence: A Modern Approach, Englewood
    Cliffs, NJ: Prentice Hall, 1995

3
Learning Agents
(Diagram: a learning agent and its environment — performance standard, feedback, knowledge changes, learning goals, sensors, and effectors.)
4
Forms of Learning
  • Rote learning
  • Parameter or weight adjustment
  • Induction
  • Clustering, chunking, or abstraction of knowledge
  • ------------------------------------------------
  • Data mining as knowledge discovery

5
Learning Paradigms
  • Supervised learning
    • Programming by examples
    • Most common
    • Historical data is often used as the training
      data
  • Unsupervised learning
    • Perform a type of feature detection
  • Reinforcement learning
    • Error information is less specific

6
Neural Networks
  • Borrowing heavily from the metaphor of the human
    brain
  • Can be used in supervised, unsupervised, and
    reinforcement learning scenarios
  • Can be used for classification, clustering, and
    prediction
  • Most are implemented as programs running on
    serial computers

7
Nerve Cell
(Diagram: a nerve cell — soma, nucleus, dendrites, axon, axonal arborization, and a synapse with the axon from another cell.)
8
Neuron
(Diagram: processing unit j with inputs x1, ..., xn, weights w1j, ..., wnj, and output yj = f(sumj).)
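In symbols, the processing unit in the diagram computes a weighted sum of its inputs and passes it through the activation function f:

  sum_j = \sum_{i=1}^{n} w_{ij} x_i , \qquad y_j = f(sum_j)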
9
Logistic (Sigmoid) Activation Function
(Plot: activation as a function of sum.)
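The curve plotted here is the standard logistic function,

  f(sum) = \frac{1}{1 + e^{-sum}} ,

which maps any sum into the interval (0, 1) and is differentiable everywhere, a property the back-propagation training discussed later relies on.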
10
Back Propagation Network
11
Generic Neural Network Learning Algorithm
  • Assign weights randomly
  • repeat
  •   for each e in examples
  •     o ← Output(e)
  •     t ← observed output values from e
  •     update the weights based on e, o, t
  • until all examples are correctly predicted or a
    stopping criterion is reached
  • (a Java sketch of this loop follows)
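The transcript contains no concrete code for this loop, so the following is a minimal Java sketch of it, assuming a single linear-threshold unit and a simple error-correction (delta-rule) weight update. The class and method names are illustrative, not from the slides.

// Minimal sketch of the generic learning loop above, assuming a single
// linear-threshold unit and an error-correction (delta-rule) weight update.
public class GenericLearner {
    double[] w;          // weights, assigned randomly at the start
    double eta = 0.1;    // learning rate

    GenericLearner(int nInputs) {
        w = new double[nInputs];
        java.util.Random r = new java.util.Random();
        for (int i = 0; i < w.length; i++) w[i] = r.nextDouble() - 0.5;
    }

    double output(double[] x) {              // o = Output(e)
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) sum += w[i] * x[i];
        return sum >= 0.0 ? 1.0 : 0.0;       // step activation
    }

    void train(double[][] xs, double[] ts, int maxEpochs) {
        for (int epoch = 0; epoch < maxEpochs; epoch++) {   // repeat ... until
            boolean allCorrect = true;
            for (int e = 0; e < xs.length; e++) {           // for each e in examples
                double o = output(xs[e]);                    // o = Output(e)
                double t = ts[e];                            // t = observed output value
                if (o != t) {
                    allCorrect = false;
                    for (int i = 0; i < w.length; i++)       // update weights based on e, o, t
                        w[i] += eta * (t - o) * xs[e][i];
                }
            }
            if (allCorrect) break;           // stopping criterion
        }
    }
}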

12
Changes to the Weight
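The equation itself is not preserved in this transcript. For a sigmoid unit trained by gradient descent on the squared error, a standard form of the weight change (using the symbols of the neuron slide) is

  \Delta w_{ij} = \eta \, \delta_j \, x_i , \qquad \delta_j = (t_j - y_j) \, f'(sum_j) \ \text{for an output unit},

where \eta is the learning rate and t_j the target output.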
13
Kohonen Map
14
Changes to the Weight
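Again the equation is missing from the transcript; the standard Kohonen (self-organizing map) update moves the winning unit's weight vector, and more weakly its neighbors', toward the input:

  w_j(t+1) = w_j(t) + \eta(t) \, h_{j,c}(t) \, \bigl(x(t) - w_j(t)\bigr) ,

where c is the best-matching unit and h_{j,c} is the neighborhood function.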
15
Models of Chord Classification
16
Self-Organized Map for Musical Chord
Identification
17
References for Decision Tree Learning
  • T. Mitchell, Machine Learning, McGraw-Hill, 1997
  • J. R. Quinlan, C4.5: Programs for Machine
    Learning, Morgan Kaufmann Publishers, 1993

18
Play Tennis
19
Entropy and Information Gain
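The formulas themselves are not preserved in this transcript; the standard definitions, which the computations on the following slides use, are

  Entropy(S) = -p_{+} \log_2 p_{+} - p_{-} \log_2 p_{-}
  Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, Entropy(S_v)

For the [9+, 5-] Play Tennis set used below, Entropy(S) = -(9/14)\log_2(9/14) - (5/14)\log_2(5/14) \approx 0.940.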
20
Information Gain Computation
21
Option 1
S: [9+, 5-], E = 0.940
Split on Wind
  Wind = Weak: [6+, 2-], E = 0.811
  Wind = Strong: [3+, 3-], E = 1.00
Gain(S, Wind) = 0.940 - (8/14)(0.811) - (6/14)(1.00) = 0.048
22
Option 2
S: [9+, 5-], E = 0.940
Split on Humidity
  Humidity = High: [3+, 4-], E = 0.985
  Humidity = Normal: [6+, 1-], E = 0.592
Gain(S, Humidity) = 0.940 - (7/14)(0.985) - (7/14)(0.592) = 0.151
23
Information Gain for Four Attributes
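The table on this slide is not preserved in the transcript. For the standard Play Tennis data the four gains are (the Humidity and Wind values match the two preceding slides):

  Gain(S, Outlook) = 0.246,  Gain(S, Humidity) = 0.151,
  Gain(S, Wind) = 0.048,  Gain(S, Temperature) = 0.029

so Outlook is selected as the root test, as the partially learned tree on the next slide shows.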
24
Partially Learned Tree
{D1, D2, ..., D14}  [9+, 5-]
Outlook
  Sunny:    {D1, D2, D8, D9, D11}   [2+, 3-]  ?
  Overcast: {D3, D7, D12, D13}      [4+, 0-]  Yes
  Rain:     {D4, D5, D6, D10, D14}  [3+, 2-]  ?
25
Next Step
26
Decision Tree
Outlook
  Sunny: Humidity
    High: No
    Normal: Yes
  Overcast: Yes
  Rain: Wind
    Strong: No
    Weak: Yes
27
Decision Tree Learning Algorithm (1)
  • DecisionTreeLearning( examples, attributes,
    default )
  • 1. if examples is empty then return default
  • 2. if all examples have the same classification
    then return the classification
  • 3. if attributes is empty then return
    MajorityValue( examples )
  • 4. best ← ChooseAttribute( attributes, examples )
  • 5. tree ← a new decision tree with root test best

28
Decision Tree Learning Algorithm (2)
  • 6. for each value vi of best do
  •      examplesi ← elements of examples with
         best = vi
  •      subtree ← DecisionTreeLearning( examplesi,
         attributes - best, MajorityValue( examples ) )
  •      add a branch to tree with label vi and
         subtree subtree
  • 7. return tree
  • (a Java sketch of this recursion follows)
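The slides give only pseudocode, so below is a compact Java sketch of the recursion above, with ChooseAttribute picking the attribute of highest information gain (slides 19-23). The example representation (a Map from attribute names to values, with the classification stored under the key "Class") and all identifiers are illustrative assumptions, not from the slides.

import java.util.*;

public class SimpleDecisionTree {

  // A node is either a leaf (label != null) or a test on an attribute.
  static class Node {
    String label;
    String attribute;
    Map<String, Node> branches = new HashMap<>();
  }

  static Node leaf(String label) {
    Node n = new Node();
    n.label = label;
    return n;
  }

  static Node learn(List<Map<String, String>> examples,
                    Set<String> attributes, String defaultClass) {
    if (examples.isEmpty()) return leaf(defaultClass);                // step 1
    String first = examples.get(0).get("Class");
    if (examples.stream().allMatch(e -> first.equals(e.get("Class"))))
      return leaf(first);                                             // step 2
    if (attributes.isEmpty()) return leaf(majorityValue(examples));   // step 3

    String best = chooseAttribute(attributes, examples);              // step 4
    Node tree = new Node();                                           // step 5
    tree.attribute = best;
    for (String v : values(best, examples)) {                         // step 6
      List<Map<String, String>> subset = new ArrayList<>();
      for (Map<String, String> e : examples)
        if (v.equals(e.get(best))) subset.add(e);
      Set<String> rest = new HashSet<>(attributes);
      rest.remove(best);
      tree.branches.put(v, learn(subset, rest, majorityValue(examples)));
    }
    return tree;                                                      // step 7
  }

  static String majorityValue(List<Map<String, String>> examples) {
    Map<String, Integer> counts = new HashMap<>();
    for (Map<String, String> e : examples)
      counts.merge(e.get("Class"), 1, Integer::sum);
    return Collections.max(counts.entrySet(),
                           Map.Entry.comparingByValue()).getKey();
  }

  static Set<String> values(String attribute, List<Map<String, String>> examples) {
    Set<String> vs = new HashSet<>();
    for (Map<String, String> e : examples) vs.add(e.get(attribute));
    return vs;
  }

  // Entropy of the class distribution of a set of examples (slide 19).
  static double entropy(List<Map<String, String>> examples) {
    Map<String, Integer> counts = new HashMap<>();
    for (Map<String, String> e : examples)
      counts.merge(e.get("Class"), 1, Integer::sum);
    double h = 0.0, n = examples.size();
    for (int c : counts.values()) {
      double p = c / n;
      h -= p * (Math.log(p) / Math.log(2));
    }
    return h;
  }

  // ChooseAttribute: the attribute with the highest information gain,
  // Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v).
  static String chooseAttribute(Set<String> attributes,
                                List<Map<String, String>> examples) {
    String best = null;
    double bestGain = Double.NEGATIVE_INFINITY;
    for (String a : attributes) {
      double remainder = 0.0;
      for (String v : values(a, examples)) {
        List<Map<String, String>> subset = new ArrayList<>();
        for (Map<String, String> e : examples)
          if (v.equals(e.get(a))) subset.add(e);
        remainder += subset.size() / (double) examples.size() * entropy(subset);
      }
      double gain = entropy(examples) - remainder;
      if (gain > bestGain) { bestGain = gain; best = a; }
    }
    return best;
  }
}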

29
Output of JDecisionTree
Type = Action: Yes (2.0/0.0)
Type = Romance: Yes (4.0/0.0)
Type = Funny: No (3.0/0.0)
Type = History
    Finish = Comedy: Yes (3.0/0.0)
    Finish = Tragedy: Yes (0.0/0.0)
    Finish = Insipid: Yes (0.0/0.0)
    Finish = Unknown: No (1.0/0.0)
Type = Horror: Yes (0.0/0.0)
Type = Warm: Yes (0.0/0.0)
Type = Science_fiction: Yes (3.0/0.0)
Type = Art: Yes (2.0/0.0)
Type = Story
    Class = Movie: Yes (0.0/0.0)
    Class = Series: No (1.0/0.0)
    Class = Soap_opera: Yes (0.0/0.0)
    Class = Cartoon: Yes (2.0/0.0)
    Class = Animation: Yes (0.0/0.0)
30
IMPECCABLE Agent
31
IMPECCABLE
32
Website of cable TV service provider
33
eHome Architecture
34
Learning from the User
35
eHome Agents 2002
  • Lighting control agent
  • Air-conditioning control agent
  • TV program selection agent

36
eHome Center
(Diagram: eHome Center architecture — user, appliances, server, user profiles, information server, agent place, and agent communities.)
37
Personalization Agent
38
Expressiveness of Decision Trees
  • All learning can be seen as learning the
    representation of a function
  • Any Boolean function can be written as a decision
    tree
  • If the function is the parity function, then an
    exponentially large decision tree is needed
  • It is also difficult to use a decision tree to
    represent a majority function

39
Ockham's Razor
  • The most likely hypothesis is the simplest one
    that is consistent with all observations
  • Extracting a pattern means being able to describe
    a large number of cases in a concise way
  • Rather than just trying to find a decision tree
    that agrees with the examples, we try to find a
    concise one, too

40
Assessing the Performance of the Learning
Algorithm
  • Collect a large set of examples
  • Divide it into two disjoint sets: the training
    set and the test set
  • Use the learning algorithm with the training set
    as examples to generate a hypothesis H
  • Measure the percentage of examples in the test
    set that are correctly classified by H
  • Repeat the above steps for different sizes of
    training sets and different randomly selected
    training sets of each size (a sketch of one such
    train/test run follows this list)
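A minimal Java sketch of one train/test run of this procedure; the names and the learner interface are illustrative assumptions, not from the slides.

import java.util.*;
import java.util.function.Function;

public class HoldoutEvaluation {
    // Split the examples at random, learn a hypothesis H from the training
    // set, and return the fraction of test examples H classifies correctly.
    static <E> double evaluate(List<E> examples, double trainFraction,
                               Function<List<E>, Function<E, String>> learner,
                               Function<E, String> trueClass, long seed) {
        List<E> shuffled = new ArrayList<>(examples);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) (trainFraction * shuffled.size());
        List<E> train = shuffled.subList(0, cut);                // training set
        List<E> test = shuffled.subList(cut, shuffled.size());   // disjoint test set
        Function<E, String> h = learner.apply(train);            // hypothesis H
        int correct = 0;
        for (E e : test)
            if (h.apply(e).equals(trueClass.apply(e))) correct++;
        return (double) correct / test.size();                   // accuracy on the test set
    }
}

To obtain the learning curve on the next slide, this evaluation would be repeated for increasing training-set sizes and several random splits of each size, averaging the returned accuracies.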

41
Learning Curve
  • Average prediction quality as a function of the
    size of the training set

42
Noise and Overfitting
  • Two or more examples with the same descriptions
    but different classifications
  • In many cases the learning algorithm can use the
    irrelevant attributes to make spurious
    distinctions among the examples
  • For decision tree learning, tree-pruning is
    useful to deal with overfitting

43
Issues in Applications of Decision-Tree Learning
  • Missing data
  • Multivalued attributes
  • Continuous-valued attributes

44
Tree Pruning
  • The resultant decision tree is often a very
    complex tree that overfits the examples by
    inferring more structure than is justified by the
    training cases
  • The complex tree can actually have a higher error
    rate than a simple tree
  • Tasks are often at least partly indeterminate
    because the attributes do not capture all
    information relevant to classification

45
Pessimistic Pruning (1)
  • physician fee freeze = n:
  •   adoption of the budget resolution = y: D (151)
  •   adoption of the budget resolution = u: D (1)
  •   adoption of the budget resolution = n:
  •     education spending = n: D (6)
  •     education spending = y: D (9)
  •     education spending = u: R (1)

46
Pessimistic Pruning (2)
  • Subtree
  •   education spending = n: D (6)
  •   education spending = y: D (9)
  •   education spending = u: R (1)
  • Estimated errors of the subtree
  •   6·U25%(0,6) + 9·U25%(0,9) + 1·U25%(0,1) = 3.273
  • Estimated errors if the subtree is replaced by
    the most common leaf (D)
  •   16·U25%(1,16) = 2.512
  • Since 2.512 < 3.273, the subtree is pruned to the
    single leaf D
  • (U25%(E, N) is the upper limit of the 25%
    confidence interval for the error probability
    given E errors in N cases; see the interval
    estimation slides below)

47
Pessimistic Pruning (3)
  • After pruning
  •   physician fee freeze = n:
  •     adoption of the budget resolution = y: D (151)
  •     adoption of the budget resolution = u: D (1)
  •     adoption of the budget resolution = n: D (16/1)
  • Estimated errors of the new subtree
  •   151·U25%(0,151) + 1·U25%(0,1) + 16·U25%(1,16) = 4.642
  • Estimated errors if the new subtree is replaced
    by the most common leaf
  •   168·U25%(1,168) = 2.610
  • Since 2.610 < 4.642, this subtree is in turn
    replaced by a single leaf D

48
References for Probability Interval Estimation
  • A. M. Mood and F. A. Graybill, Introduction to
    the Theory of Statistics, 2nd edition, New York:
    McGraw-Hill, 1963, Section 11.6
  • M. Abramowitz and I. Stegun, eds., Handbook of
    Mathematical Functions, New York: Dover, 1964,
    Sections 26.5 and 26.2

49
Estimate of Binomial Probability
  • A sample is drawn from a point binomial
    distribution (density given below)
  • The maximum-likelihood estimate of p (given below)
  • Given n and an observed number of successes k, we
    need to estimate an interval of p such that the
    probability that the actual parameter p falls
    within the interval equals ConfidenceLevel
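The two equations on this slide are not preserved in the transcript; for the standard point-binomial (Bernoulli) model they are

  f(y; p) = p^{y} (1 - p)^{1 - y} , \quad y \in \{0, 1\}
  \hat{p} = \frac{1}{n} \sum_{i=1}^{n} y_i = \frac{k}{n}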

50
Interval Estimation
  • The upper limit is the p value that satisfies
  • The lower limit is the p value that satisfies
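The defining conditions are missing from the transcript; the standard exact (Clopper-Pearson) conditions, writing \alpha = 1 - ConfidenceLevel, are

  upper limit p_U :  \sum_{j=0}^{k} \binom{n}{j} p_U^{j} (1 - p_U)^{n-j} = \alpha / 2
  lower limit p_L :  \sum_{j=k}^{n} \binom{n}{j} p_L^{j} (1 - p_L)^{n-j} = \alpha / 2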

51
Incomplete Beta Function
  • Cumulative of the beta distribution
  • Incomplete Beta function
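The definitions referred to here are, in the notation of Abramowitz and Stegun,

  I_x(a, b) = \frac{1}{B(a, b)} \int_0^x t^{a-1} (1 - t)^{b-1} \, dt , \qquad B(a, b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a + b)}

so I_x(a, b) is exactly the cumulative distribution function of a Beta(a, b) random variable evaluated at x.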

52
Connection between Binomial and Beta Distributions
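The identity behind this slide's title (the figure itself is not in the transcript) is

  \sum_{j=k}^{n} \binom{n}{j} p^{j} (1 - p)^{n-j} = I_p(k, n - k + 1) ,

which lets the binomial tail conditions of the previous slide be solved numerically through the incomplete beta function.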
53
Upper Limit
54
Lower Limit
55
Probably Approximately Correct
  • Any hypothesis that is seriously wrong will
    almost certainly be found out with high
    probability after a small number of examples,
    because it will make an incorrect prediction
  • Any hypothesis that is consistent with a
    sufficiently large set of training examples is
    unlikely to be seriously wrong: it must be
    probably approximately correct

56
Two Activation Functions for Neurons
  • Step function
  • Sign function
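The formulas are not preserved in the transcript; the usual definitions (as in Russell and Norvig) are

  step_t(x) = \begin{cases} 1 & x \ge t \\ 0 & x < t \end{cases} , \qquad sign(x) = \begin{cases} +1 & x \ge 0 \\ -1 & x < 0 \end{cases}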

57
Neurons as Logical Gates
58
Nonlinear Regression
59
Perceptrons
60
Linearly Separable Functions
(Plots: AND, OR, and XOR in the input plane; AND and OR are linearly separable, XOR is not.)
  • A function can be represented by a perceptron if
    and only if it is linearly separable

61
Back Propagation Network
62
Changes to the Weight
63
Gradient Descent Search