Markov Chains as a Learning Tool - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Markov Chains as a Learning Tool


1
Markov Chains as a Learning Tool
2
Markov Process: Simple Example
  • Weather:
  • raining today → 40% rain tomorrow, 60% no rain tomorrow
  • not raining today → 20% rain tomorrow, 80% no rain tomorrow

Stochastic Finite State Machine
3
Markov Process: Simple Example
  • Weather:
  • raining today → 40% rain tomorrow, 60% no rain tomorrow
  • not raining today → 20% rain tomorrow, 80% no rain tomorrow

The transition matrix
  • Stochastic matrix: rows sum to 1
  • Doubly stochastic matrix: rows and columns sum to 1

             Rain   No rain
  Rain       0.4    0.6
  No rain    0.2    0.8
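
To make the stochastic-matrix idea concrete, here is a minimal sketch (not part of the original slides) that encodes the weather chain with NumPy and checks that each row sums to 1:

    import numpy as np

    # Rows = today's state, columns = tomorrow's state, order: (rain, no rain)
    P = np.array([[0.4, 0.6],
                  [0.2, 0.8]])

    assert np.allclose(P.sum(axis=1), 1.0)   # stochastic: every row sums to 1
    print(P[0, 0])                           # Pr[rain tomorrow | rain today] = 0.4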
4
Markov Process
Let X_i be the weather of day i, 1 ≤ i ≤ t. We may determine the probability of X_{t+1} from X_i, 1 ≤ i ≤ t.
Markov Property: X_{t+1}, the state of the system at time t+1, depends only on the state of the system at time t:
Pr[X_{t+1} = x_{t+1} | X_t = x_t, ..., X_1 = x_1] = Pr[X_{t+1} = x_{t+1} | X_t = x_t]
Stationary Assumption: the transition probabilities are independent of the time t.
5
Markov Process: Gambler's Example
The gambler starts with $10 (the initial state). At each play, one of the following happens:
  • the gambler wins $1 with probability p
  • the gambler loses $1 with probability 1 - p
The game ends when the gambler goes broke or gains a fortune of $100 (both $0 and $100 are absorbing states).
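A small simulation sketch (not from the slides; the $10 start, the $100 goal, and p are the quantities named above) illustrating that every game ends in one of the two absorbing states:

    import random

    def gamblers_ruin(start=10, goal=100, p=0.5):
        """Play one game; return the absorbing state reached (0 or goal)."""
        money = start
        while 0 < money < goal:
            money += 1 if random.random() < p else -1
        return money

    fortunes = sum(gamblers_ruin() == 100 for _ in range(10_000))
    print(f"{fortunes} of 10000 simulated games ended at $100")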
6
Markov Process
  • Markov process - described by a stochastic FSM
  • Markov chain - a random walk on this graph (a distribution over paths)
  • Edge weights give us the one-step transition probabilities, e.g., Pr[X_{t+1} = b | X_t = a]
  • We can ask more complex questions, like Pr[X_{t+2} = b | X_t = a]

7
Markov Process: Coke vs. Pepsi Example
  • Given that a person's last cola purchase was Coke, there is a 90% chance that his next cola purchase will also be Coke.
  • If a person's last cola purchase was Pepsi, there is an 80% chance that his next cola purchase will also be Pepsi.

The transition matrix:

             Coke   Pepsi
  Coke       0.9    0.1
  Pepsi      0.2    0.8
8
Markov Process: Coke vs. Pepsi Example (cont.)
Given that a person is currently a Pepsi purchaser, what is the probability that he will purchase Coke two purchases from now?

Pr[Pepsi → ? → Coke] = Pr[Pepsi → Coke → Coke] + Pr[Pepsi → Pepsi → Coke]
                     = 0.2 × 0.9 + 0.8 × 0.2 = 0.18 + 0.16 = 0.34
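
The same answer can be read off the square of the transition matrix; a minimal check (not from the slides), with rows and columns ordered (Coke, Pepsi):

    import numpy as np

    P = np.array([[0.9, 0.1],
                  [0.2, 0.8]])

    P2 = P @ P          # two-step transition probabilities
    print(P2[1, 0])     # Pr[Coke two purchases after Pepsi] = 0.34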
9
Markov Process: Coke vs. Pepsi Example (cont.)
Given that a person is currently a Coke
purchaser, what is the probability that he will
buy Pepsi at the third purchase from now?
10
Markov Process: Coke vs. Pepsi Example (cont.)
  • Assume each person makes one cola purchase per week.
  • Suppose 60% of all people now drink Coke, and 40% drink Pepsi.
  • What fraction of people will be drinking Coke three weeks from now?

Pr[X_3 = Coke] = 0.6 × 0.781 + 0.4 × 0.438 = 0.6438
(0.781 and 0.438 are the three-step probabilities of buying Coke given that the current purchase is Coke or Pepsi, respectively, i.e., the Coke column of P³.)

Q_i - the distribution in week i
Q_0 = (0.6, 0.4) - the initial distribution
Q_3 = Q_0 P³ = (0.6438, 0.3562)
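
A short sketch (not from the slides) reproducing Q_3, with the same (Coke, Pepsi) ordering as above:

    import numpy as np

    P  = np.array([[0.9, 0.1],
                   [0.2, 0.8]])
    Q0 = np.array([0.6, 0.4])                 # initial distribution (Coke, Pepsi)

    Q3 = Q0 @ np.linalg.matrix_power(P, 3)
    print(Q3)                                 # [0.6438 0.3562]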
11
Markov Process: Coke vs. Pepsi Example (cont.)
Simulation: plot of Pr[X_i = Coke] against week i. The curve converges to 2/3, the stationary probability of Coke, since solving π = πP gives π_Coke = 0.2 / (0.1 + 0.2) = 2/3.
12
How to obtain the stochastic matrix?
  • Solve the linear equations, e.g., ...
  • Learn from examples, e.g., which letters follow which letters in English words: mast, tame, same, teams, team, meat, steam, stem.

13
How to obtain the stochastic matrix?
  • Counts table vs. stochastic matrix (below: the stochastic matrix P; @ marks the start of a word and \0 the end of a word)

P    a    s    t    m    e    \0
a    0    1/7  1/7  5/7  0    0
e    4/7  0    0    1/7  0    2/7
m    1/8  1/8  0    0    3/8  3/8
s    1/5  0    3/5  0    0    1/5
t    1/7  0    0    0    4/7  2/7
@    0    3/8  3/8  2/8  0    0
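
The table can be reproduced programmatically; a sketch (not from the slides) that counts transitions in the eight example words, using '@' for start-of-word and '\0' for end-of-word:

    from collections import Counter, defaultdict

    words = ["mast", "tame", "same", "teams", "team", "meat", "steam", "stem"]
    counts = defaultdict(Counter)
    for w in words:
        chars = ["@"] + list(w) + ["\0"]
        for cur, nxt in zip(chars, chars[1:]):   # count letter-to-letter transitions
            counts[cur][nxt] += 1

    # Normalize each row of the counts table to get the stochastic matrix P.
    P = {cur: {nxt: n / sum(row.values()) for nxt, n in row.items()}
         for cur, row in counts.items()}
    print(P["a"])   # {'s': 0.142..., 'm': 0.714..., 't': 0.142...} -> row 'a' of the table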
14
Application of the Stochastic Matrix
  • Using the stochastic matrix to generate a random word:
  • Generate a first letter
  • For each current letter, generate a next letter

A    a    s    t    m    e    \0
a    -    1    2    7    -    -
e    4    -    -    5    -    7
m    1    2    -    -    5    8
s    1    -    4    -    -    5
t    1    -    -    -    5    7
@    -    3    6    8    -    -

If C_{r,j} > 0, let A_{r,j} = C_{r,1} + C_{r,2} + ... + C_{r,j}, where C is the counts table and A holds the cumulative (running-total) counts of row r.

15
Application of the Stochastic Matrix
  • Using the stochastic matrix to generate a random word:
  • Generate a first letter: generate a random number x between 1 and 8. If 1 ≤ x ≤ 3, the letter is s; if 4 ≤ x ≤ 6, the letter is t; otherwise, it is m.
  • For each current letter, generate a next letter: suppose the current letter is s and we generate a random number x between 1 and 5. If x = 1, the next letter is a; if 2 ≤ x ≤ 4, the next letter is t; otherwise, the current letter is the last letter of the word.

A    a    s    t    m    e    \0
a    -    1    2    7    -    -
e    4    -    -    5    -    7
m    1    2    -    -    5    8
s    1    -    4    -    -    5
t    1    -    -    -    5    7
@    -    3    6    8    -    -

If C_{r,j} > 0, let A_{r,j} = C_{r,1} + C_{r,2} + ... + C_{r,j}.
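
Putting the two steps together, a sketch of the whole word generator (not from the slides; it rebuilds the same counts table as the earlier sketch and samples each next letter in proportion to the counts, which is equivalent to drawing x against the cumulative table A):

    import random
    from collections import Counter, defaultdict

    words = ["mast", "tame", "same", "teams", "team", "meat", "steam", "stem"]
    counts = defaultdict(Counter)
    for w in words:
        chars = ["@"] + list(w) + ["\0"]
        for cur, nxt in zip(chars, chars[1:]):
            counts[cur][nxt] += 1

    def random_word():
        """Walk the chain from '@' until the end-of-word symbol is drawn."""
        cur, letters_out = "@", []
        while True:
            nxt_letters, weights = zip(*counts[cur].items())
            cur = random.choices(nxt_letters, weights=weights)[0]  # sample next letter
            if cur == "\0":
                return "".join(letters_out)
            letters_out.append(cur)

    print([random_word() for _ in range(5)])   # e.g. ['team', 'stam', 'mame', 'steat', 'same']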

16
Supervised vs. Unsupervised
  • Decision tree learning is supervised learning, as we know the correct output of each example.
  • Learning based on Markov chains is unsupervised learning, as we are not told the correct output (the next letter) for each example.

17
K-Nearest Neighbor
  • Features:
  • All instances correspond to points in an n-dimensional Euclidean space
  • Classification is delayed until a new instance arrives
  • Classification is done by comparing the feature vectors of the different points
  • The target function may be discrete- or real-valued

18
1-Nearest Neighbor
19
3-Nearest Neighbor
20
Example: Identify Animal Type
14 examples, 10 attributes, 5 types. What's the type of this new animal?
21
K-Nearest Neighbor
  • An arbitrary instance is represented by (a_1(x), a_2(x), a_3(x), ..., a_n(x))
  • a_i(x) denotes the i-th feature of x
  • Euclidean distance between two instances (a code sketch follows this list):
    d(x_i, x_j) = sqrt( Σ_{r=1..n} ( a_r(x_i) - a_r(x_j) )² )
  • Continuous-valued target function: predict the mean value of the k nearest training examples
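
A minimal k-NN sketch under these definitions (illustrative; the function names, the toy data, and k = 3 are not from the slides):

    import math
    from collections import Counter

    def euclidean(xi, xj):
        """d(xi, xj) = sqrt of the sum over features r of (ar(xi) - ar(xj))^2."""
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

    def knn_classify(query, examples, k=3):
        """examples: list of (feature_vector, label); majority vote of the k nearest."""
        nearest = sorted(examples, key=lambda ex: euclidean(query, ex[0]))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    data = [((1.0, 2.0), "cat"), ((1.2, 1.8), "cat"),
            ((5.0, 5.5), "dog"), ((5.2, 5.0), "dog")]
    print(knn_classify((1.1, 2.1), data, k=3))   # -> cat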

22
Distance-Weighted Nearest Neighbor Algorithm
  • Assign weights to the neighbors based on their distance from the query point
  • The weight may be the inverse square of the distance (see the sketch after this list)
  • All training points may influence a particular instance
  • Shepard's method
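
A sketch of the inverse-square weighting for a real-valued target (illustrative only; it reuses the euclidean helper from the k-NN sketch above and returns the stored value directly when the distance is zero):

    def weighted_predict(query, examples):
        """examples: list of (feature_vector, real_value); Shepard-style global weighting."""
        num = den = 0.0
        for x, y in examples:
            d = euclidean(query, x)
            if d == 0:
                return y                 # exact match: use its value
            w = 1.0 / (d * d)            # weight = inverse square of the distance
            num += w * y
            den += w
        return num / den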

23
Remarks
  • Highly effective inductive inference method for noisy training data and complex target functions
  • The target function for the whole space may be described as a combination of less complex local approximations
  • Learning is very simple, but classification is time-consuming (except 1-NN)