Title: Programming = assign proper weights

1. Programming = assign proper weights
- Learning
[Figure: a single unit i with inputs u_1, ..., u_n and weights W_{i,1}, ..., W_{i,n}]

Activation: $O_i = f\left(\sum_{j=1}^{n} W_{ij} u_j\right)$

f = transfer function (usually non-linear: threshold function, sigmoid, hyperbolic tangent, etc.)
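A minimal Python sketch of this unit computation; the weights, inputs, and the two choices of f below are illustrative, not values from the slides:

    import math

    def unit_output(weights, inputs, f):
        """O_i = f(sum over j of W_ij * u_j) for a single unit i."""
        return f(sum(w * u for w, u in zip(weights, inputs)))

    def step(s):                    # threshold transfer function
        return 1 if s > 0 else 0

    def sigmoid(s):                 # sigmoid transfer function
        return 1.0 / (1.0 + math.exp(-s))

    # Hypothetical weights and inputs, purely for illustration:
    print(unit_output([0.5, -0.3, 0.8], [1, 0, 1], step))     # -> 1
    print(unit_output([0.5, -0.3, 0.8], [1, 0, 1], sigmoid))  # -> ~0.79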
2. Perceptron
Rosenblatt, 1962; Minsky, 1969
[Figure: perceptron with inputs x_1, ..., x_n, weights W_1, ..., W_n, and output O_i]

Activation: $O_i = \begin{cases} 1, & \text{if } \sum_i W_i x_i > \theta \\ 0, & \text{if } \sum_i W_i x_i < \theta \end{cases}$

The threshold can be absorbed as a weight $W_0 = -\theta$ on an extra constant input 1:

[Figure: the same perceptron with the extra input 1 and weight $W_0 = -\theta$]

Activation: $O_i = \begin{cases} 1, & \text{if } \sum_i W_i x_i > 0 \\ 0, & \text{if } \sum_i W_i x_i < 0 \end{cases}$
3. Algorithm for Perceptron Learning
1. Initialize (w_0, w_1, ..., w_n) randomly.
2. Iterate through the training set, collecting the examples misclassified by the current weights.
3. If all examples are correctly classified (or classified up to an acceptable threshold), then quit; else
   - compute the sum S over the misclassified examples x:
     - S = S + x, if the unit failed to fire when it should have
     - S = S - x, if it fired when it shouldn't have
4. Modify the weights: $w^{t+1} = w^t + k \cdot S$.
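A minimal Python sketch of this procedure, assuming batch updates exactly as listed above; the AND data at the end is just a hypothetical test case:

    import random

    def train_perceptron(examples, k=0.1, max_epochs=500):
        """examples: list of (x, target); x includes the constant input 1 for w0."""
        n = len(examples[0][0])
        w = [random.uniform(-1, 1) for _ in range(n)]        # 1. initialize randomly
        for _ in range(max_epochs):
            S = [0.0] * n
            misclassified = 0
            for x, target in examples:                       # 2. collect misclassified examples
                fired = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
                if fired < target:                           # failed to fire when it should have
                    S = [s + xi for s, xi in zip(S, x)]
                    misclassified += 1
                elif fired > target:                         # fired when it shouldn't have
                    S = [s - xi for s, xi in zip(S, x)]
                    misclassified += 1
            if misclassified == 0:                           # 3. all correct -> quit
                break
            w = [wi + k * si for wi, si in zip(w, S)]        # 4. w(t+1) = w(t) + k*S
        return w

    # Hypothetical example: learn AND (the leading 1 in each x is the bias input)
    data = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
    print(train_perceptron(data))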
4. Perceptron learning

Total input: $in = w_0 + w_1 x_1 + w_2 x_2$

[Figure: in the (x_1, x_2) plane, the line $in = 0$ is the decision surface]

Learning = locating the proper decision surface
5. To find the decision surface, use gradient descent methods
- $E(W_1, W_2)$ = error function = sum of the distances of misclassified input vectors from the decision surface

[Figure: the error surface $E(W_1, W_2)$ over the $(W_1, W_2)$ plane]
6. Cannot do the XOR problem
- Can do it with a multi-layer network

[Figure: a two-layer threshold network computing XOR; unit thresholds -1.5 and -0.5, weights 1.0 from x_1 and x_2, and a large negative weight (-9.0) from the hidden unit to the output unit]

Perceptron training doesn't work for such multi-layer networks (Minsky-Papert)
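For concreteness, here is a hand-set two-layer threshold network that computes XOR. The weights are the classic construction (a hidden AND unit inhibiting an OR-like output unit), not necessarily the exact values drawn on the slide:

    def step(s):
        return 1 if s > 0 else 0

    def xor_net(x1, x2):
        h = step(1.0 * x1 + 1.0 * x2 - 1.5)                  # hidden unit: AND(x1, x2)
        return step(1.0 * x1 + 1.0 * x2 - 2.0 * h - 0.5)     # OR(x1, x2), suppressed when AND fires

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, '->', xor_net(a, b))                 # 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0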
7. Back Propagation

Output $= \dfrac{1}{1 + e^{-\sum_j w_{ij} x_j}}$  (sigmoid, ranging between 0 and 1)

- Can get stuck in local minima.
- Slow speed of learning.
- (The Boltzmann machine uses simulated annealing for its search and doesn't get stuck.)
8. [Figure: two-layer network with inputs x_k (binary or continuous), hidden units v_j reached through weights w_{jk}, and output units o_i reached through weights W_{ij}]

Output of $v_j$: $V_j^m = g(h_j^m) = g\big(\sum_k w_{jk} x_k^m\big)$

Output of $o_i$: $O_i^m = g(h_i^m) = g\big(\sum_j W_{ij} V_j^m\big) = g\big(\sum_j W_{ij}\, g(\sum_k w_{jk} x_k^m)\big)$

Error function: $E(w) = \tfrac{1}{2} \sum_{i,m} \big[ z_i^m - g\big(\sum_j W_{ij}\, g(\sum_k w_{jk} x_k^m)\big) \big]^2$

Continuous, differentiable. ($z_i$ is the desired output.)

$g(h) = 1 / (1 + e^{-2h})$ or $\tanh(h)$
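A small sketch of this forward pass, using the slide's $g(h) = 1/(1 + e^{-2h})$; the list-of-lists weight layout is just one convenient representation:

    import math

    def g(h):
        return 1.0 / (1.0 + math.exp(-2.0 * h))              # the slide's g; tanh(h) also works

    def forward(x, w, W):
        """x: input vector; w[j][k]: input-to-hidden weights; W[i][j]: hidden-to-output weights."""
        V = [g(sum(wjk * xk for wjk, xk in zip(wj, x))) for wj in w]   # V_j = g(h_j)
        O = [g(sum(Wij * vj for Wij, vj in zip(Wi, V))) for Wi in W]   # O_i = g(h_i)
        return V, O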
9. To get the weights:

$\Delta W_{ij} = -\eta\, \partial E / \partial W_{ij} = \eta \sum_m [z_i^m - O_i^m]\, g'(h_i^m)\, V_j^m = \eta \sum_m \delta_i^m V_j^m$

where $\delta_i^m = g'(h_i^m)\, [z_i^m - O_i^m]$
10. $\Delta w_{jk} = -\eta\, \partial E / \partial w_{jk} = -\eta \sum_m \frac{\partial E}{\partial V_j^m} \frac{\partial V_j^m}{\partial w_{jk}} = \eta \sum_{m,i} [z_i^m - O_i^m]\, g'(h_i^m)\, W_{ij}\, g'(h_j^m)\, x_k^m = \eta \sum_{m,i} \delta_i^m W_{ij}\, g'(h_j^m)\, x_k^m = \eta \sum_m \delta_j^m x_k^m$

where $\delta_j^m = g'(h_j^m) \sum_i \delta_i^m W_{ij}$
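A sketch of one batch weight update implementing these two delta rules, reusing forward() and g() from the sketch above; for this g, $g'(h) = 2\,g(h)\,(1 - g(h))$:

    def backprop_step(batch, w, W, eta=0.1):
        """batch: list of (x, z) pairs, with z the desired output vector."""
        dW = [[0.0] * len(w) for _ in W]                     # accumulates eta * delta_i * V_j
        dw = [[0.0] * len(batch[0][0]) for _ in w]           # accumulates eta * delta_j * x_k
        for x, z in batch:
            V, O = forward(x, w, W)
            d_out = [2 * Oi * (1 - Oi) * (zi - Oi)           # delta_i = g'(h_i) * (z_i - O_i)
                     for Oi, zi in zip(O, z)]
            d_hid = [2 * Vj * (1 - Vj) *                     # delta_j = g'(h_j) * sum_i delta_i W_ij
                     sum(d_out[i] * W[i][j] for i in range(len(W)))
                     for j, Vj in enumerate(V)]
            for i in range(len(W)):
                for j in range(len(V)):
                    dW[i][j] += eta * d_out[i] * V[j]
            for j in range(len(w)):
                for k in range(len(x)):
                    dw[j][k] += eta * d_hid[j] * x[k]
        W = [[Wij + dWij for Wij, dWij in zip(Wi, dWi)] for Wi, dWi in zip(W, dW)]
        w = [[wjk + dwjk for wjk, dwjk in zip(wj, dwj)] for wj, dwj in zip(w, dw)]
        return w, W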
11. Judd, 1988
- Teaching nets is NP-Complete in the number of nodes.
- Possible solution:
  - Use probabilistic algorithms to train them (like GAs).
12. Example back-propagation network
- Input vector: 19 bits corresponding to a person
  - 1 0 1 0 1 0 1 ... etc.
- Output vector: 5 bits
  - 0 1 0 0 0
- Training and testing data vectors
  - 24 bits each
  - 80 training vectors
  - 20 testing vectors

[Figure labels, the attributes encoded by the bits: Mets fan, Likes lemonade, From Chicago, From NY, Cubs fan, Democrat, Republican, Likes tennis, NY Jets fan, Bears fan, NY Yankees fan, White Sox fan]
13. Possible rules from the data
- People from Chicago are never fans of NY teams
- 50% of Cubs fans are White Sox fans
- All Mets fans are Jets fans.
- Can train a BP network to learn such patterns.
- After training and testing, if we give as input
  - 1 0 1 0 0 0 0 ... 0 (the characteristics of a person)
- we get output: 0.44 0.88 0.23 0.03 0.02
- i.e., there is a small chance that the person likes the Sox, a high probability that he likes the Bears, and no chance that he likes the Mets, the Jets, or tennis.
14. Input: a Democrat Cubs fan
- Output: 0.11 0.88 0.14 0.01 0.02
________________________________
- Input: a Republican Cubs fan
- Output: 0.77 0.92 0.13 0.05 0.02
- So Republican Cubs fans also like the Sox, while Democrat Cubs fans do not.
________________________________
- Input: a Chicagoan who doesn't like the Cubs
- Output: 0.17 0.26 0.79 0.05 0.04
- He likes tennis.
- (Fuzzy rules)
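To make this querying concrete, here is how such a network could be trained and probed using the forward() and backprop_step() sketches above. The random 19-bit inputs, 5-bit targets, and hidden-layer size are purely illustrative stand-ins for the slide's 80 training / 20 testing person vectors, which are not reproduced here:

    import random

    random.seed(0)
    train = [([random.randint(0, 1) for _ in range(19)],     # 19-bit person description
              [random.randint(0, 1) for _ in range(5)])      # 5-bit target
             for _ in range(80)]

    n_hidden = 8                                             # assumed hidden-layer size
    w = [[random.uniform(-0.5, 0.5) for _ in range(19)] for _ in range(n_hidden)]
    W = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(5)]

    for epoch in range(200):                                 # repeated batch updates
        w, W = backprop_step(train, w, W, eta=0.05)

    # Query with one person's bit vector; each output is read as a graded
    # ("fuzzy") degree, like the 0.44 / 0.88 / ... readings on the slide.
    person = train[0][0]
    _, out = forward(person, w, W)
    print([round(o, 2) for o in out])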