Title: Programming = assign proper weights
1 - Programming = assign proper weights
- Learning
[Figure: a unit i with inputs u_1, u_2, ..., u_n feeding in through weights W_i,1, W_i,2, ..., W_i,n]

Activation: O_i = f( Σ_{j=1..n} W_ij u_j )
f = transfer function (usually non-linear: threshold function, sigmoid, hyperbolic tangent, etc.)
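A minimal sketch of such a unit in Python; the weights, inputs, and choice of f below are illustrative values, not from the slide:

```python
import math

def threshold(h):               # hard threshold transfer function
    return 1.0 if h > 0 else 0.0

def sigmoid(h):                 # smooth sigmoid transfer function
    return 1.0 / (1.0 + math.exp(-h))

def unit_output(weights, inputs, f):
    h = sum(w * u for w, u in zip(weights, inputs))   # weighted sum of inputs
    return f(h)                                       # O_i = f(sum_j W_ij u_j)

print(unit_output([0.5, -0.3, 0.8], [1.0, 2.0, 0.5], sigmoid))
print(unit_output([0.5, -0.3, 0.8], [1.0, 2.0, 0.5], math.tanh))
```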
2 - Perceptron
Rosenblatt, 1962; Minsky, 1969
[Figure: perceptron with inputs x_1, ..., x_n connected through weights W_1, ..., W_n to the output O_i]

Activation:
O_i = 1, if Σ W_i x_i > θ
O_i = 0, if Σ W_i x_i < θ

The threshold can be absorbed as a bias: add a constant input x_0 = 1 with weight W_0 = -θ.

[Figure: the same perceptron with the extra constant input 1 and weight W_0 = -θ]

Activation:
O_i = 1, if Σ W_i x_i > 0
O_i = 0, if Σ W_i x_i < 0
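A minimal sketch of this bias trick (Python; the weights, inputs, and θ are made-up illustration values), checking that the two formulations fire identically:

```python
def fires(weights, xs, theta):
    # original form: compare the weighted sum against the threshold theta
    return 1 if sum(w * x for w, x in zip(weights, xs)) > theta else 0

def fires_with_bias(weights, xs, theta):
    w = [-theta] + weights          # W_0 = -theta
    x = [1.0] + xs                  # constant input x_0 = 1
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

w, x, theta = [0.4, 0.9], [1.0, 0.2], 0.5
assert fires(w, x, theta) == fires_with_bias(w, x, theta)
```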
3 - Algorithm for Perceptron Learning
1. Initialize (w_0, w_1, ..., w_n) randomly.
2. Iterate through the training set, collecting the examples misclassified by the current weights.
3. If all examples are correctly classified (or classified up to an acceptable threshold), then quit; else compute the sum S of the misclassified examples x:
   - S ← S + x, if the unit failed to fire when it should have
   - S ← S - x, if it fired when it shouldn't have
4. Modify the weights: w_{t+1} = w_t + k·S.
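A minimal sketch of this loop in Python, assuming a unit with the bias folded in as w_0 (constant first input 1); the AND training set and step size k are illustrative choices:

```python
import random

def classify(w, x):                     # x includes the constant bias input 1
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

data = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]  # AND
w = [random.uniform(-1, 1) for _ in range(3)]   # (w0, w1, w2) random
k = 0.5

for _ in range(1000):
    S = [0.0, 0.0, 0.0]
    errors = 0
    for x, target in data:
        out = classify(w, x)
        if out == 0 and target == 1:            # failed to fire: S <- S + x
            S = [s + xi for s, xi in zip(S, x)]
            errors += 1
        elif out == 1 and target == 0:          # fired wrongly: S <- S - x
            S = [s - xi for s, xi in zip(S, x)]
            errors += 1
    if errors == 0:                             # all correctly classified: quit
        break
    w = [wi + k * si for wi, si in zip(w, S)]   # w_{t+1} = w_t + k * S

print(w, [classify(w, x) for x, _ in data])
```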
4 - Perceptron learning
Total input: in = w_0 + w_1 x_1 + w_2 x_2

[Figure: in the (x_1, x_2) plane, the line in = 0 is the decision surface]

Learning = locating the proper decision surface
5 - To find the decision surface, use gradient descent methods.

E(W_1, W_2) = error function = sum of the distances of the misclassified input vectors from the decision surface

[Figure: the error surface E(W_1, W_2) plotted over the weight plane (W_1, W_2)]
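A sketch of the descent itself in Python; the quadratic E below is an illustrative stand-in for the slide's sum-of-distances error, which would be computed from the misclassified vectors:

```python
def E(w1, w2):
    # illustrative bowl-shaped error surface, minimum at (2, -1)
    return (w1 - 2.0) ** 2 + (w2 + 1.0) ** 2

w1, w2, eta, eps = 0.0, 0.0, 0.1, 1e-5
for _ in range(200):
    # numerical partial derivatives of E
    dw1 = (E(w1 + eps, w2) - E(w1 - eps, w2)) / (2 * eps)
    dw2 = (E(w1, w2 + eps) - E(w1, w2 - eps)) / (2 * eps)
    w1 -= eta * dw1                             # step downhill
    w2 -= eta * dw2

print(w1, w2)   # approaches the minimum (2, -1)
```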
6 - Cannot do the XOR problem.
Can do it with a multi-layer network:

[Figure: two-layer network for XOR: inputs x_1 and x_2 feed a hidden unit (bias -1.5, weights 1.0 from each input) and the output unit (bias -0.5, weights 1.0 from each input, weight -9.0 from the hidden unit)]

Perceptron training doesn't work for it (Minsky-Papert).
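To see that the figure's weights do compute XOR, here is a quick check in Python; how the surviving numbers map onto the wires is my reading of the figure residue:

```python
def step(h):
    return 1 if h > 0 else 0

def xor_net(x1, x2):
    hidden = step(1.0 * x1 + 1.0 * x2 - 1.5)              # fires only for (1, 1)
    return step(1.0 * x1 + 1.0 * x2 - 9.0 * hidden - 0.5) # OR, vetoed by AND

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))   # prints 0, 1, 1, 0: XOR
```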
7 - Back Propagation

Output = 1 / (1 + e^(-Σ_j w_ij x_j)), a sigmoid ranging between 0 and 1.

- Can get stuck in local minima.
- Slow speed of learning.

(The Boltzmann machine uses simulated annealing for search and doesn't get stuck.)
8 - [Figure: three-layer network: inputs z_k (binary or continuous) feed hidden units v_j through weights w_jk; the hidden units feed output units o_i through weights W_ij]
Output of v_j: V_j^m = g(h_j^m) = g( Σ_k w_jk z_k^m )

Output of o_i: O_i^m = g(h_i^m) = g( Σ_j W_ij V_j^m ) = g( Σ_j W_ij g( Σ_k w_jk z_k^m ) )
Error function: E(w) = ½ Σ_{i,m} [ ζ_i^m - g( Σ_j W_ij g( Σ_k w_jk z_k^m ) ) ]², where ζ_i^m is the target output for pattern m.

E is continuous and differentiable.

g(h) = 1 / (1 + e^(-2h)) or tanh(h)
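A runnable sketch of this forward pass and error in Python; the layer sizes, random weights, and the single pattern (z^m, ζ^m) are illustrative, and g is the sigmoid above:

```python
import math, random

def g(h):
    return 1.0 / (1.0 + math.exp(-2.0 * h))

K, J, I = 4, 3, 2                                   # input, hidden, output sizes
w = [[random.uniform(-1, 1) for _ in range(K)] for _ in range(J)]   # w_jk
W = [[random.uniform(-1, 1) for _ in range(J)] for _ in range(I)]   # W_ij

def forward(z):
    V = [g(sum(w[j][k] * z[k] for k in range(K))) for j in range(J)]  # hidden
    O = [g(sum(W[i][j] * V[j] for j in range(J))) for i in range(I)]  # output
    return V, O

z, target = [1, 0, 1, 1], [1, 0]                    # one pattern (z^m, zeta^m)
V, O = forward(z)
E = 0.5 * sum((t - o) ** 2 for t, o in zip(target, O))   # E(w) for this pattern
print(O, E)
```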
9 - To get the weights, use gradient descent:

ΔW_ij = -η ∂E/∂W_ij
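Continuing the sketch above, the standard back-propagation deltas give this update; the chain-rule derivation is not spelled out on the slide, and note that g'(h) = 2 g(h) (1 - g(h)) for g(h) = 1 / (1 + e^(-2h)):

```python
eta = 0.5
for _ in range(1000):
    V, O = forward(z)
    # output-layer deltas: delta_i = g'(h_i) * (zeta_i - O_i)
    dO = [2 * o * (1 - o) * (t - o) for o, t in zip(O, target)]
    # hidden-layer deltas, back-propagated through the W_ij weights
    dV = [2 * V[j] * (1 - V[j]) * sum(dO[i] * W[i][j] for i in range(I))
          for j in range(J)]
    for i in range(I):                  # Delta W_ij = eta * delta_i * V_j
        for j in range(J):
            W[i][j] += eta * dO[i] * V[j]
    for j in range(J):                  # Delta w_jk = eta * delta_j * z_k
        for k in range(K):
            w[j][k] += eta * dV[j] * z[k]

print(forward(z)[1])   # outputs move toward the targets [1, 0]
```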
10 - Judd, 1988
- Teaching nets is NP-complete in the number of nodes.
- Possible solution:
- Use probabilistic algorithms to train them (like GAs); a minimal sketch follows.
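The sketch below illustrates the idea with the simplest probabilistic trainer, mutation plus selection (in effect a GA with population one); the OR task, mutation scale, and iteration count are made-up choices:

```python
import random

def fitness(w, data):
    # count correctly classified examples under a threshold unit
    return sum(1 for x, t in data
               if (1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0) == t)

data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]  # OR
best = [random.uniform(-1, 1) for _ in range(3)]

for _ in range(500):
    child = [wi + random.gauss(0, 0.3) for wi in best]  # mutate the weights
    if fitness(child, data) >= fitness(best, data):     # keep if no worse
        best = child

print(best, fitness(best, data))
```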
11 - Example back propagation network
- Input vector: 19 bits correspond to a person
- 1 0 1 0 1 0 1 ... etc.
- Output vector: 5 bits
- 0 1 0 0 0
- Training and testing data vectors:
- 24 bits each
- 80 training
- 20 testing

Feature labels (from the figure): Mets fan, Likes lemonade, From Chicago, From NY, Cubs fan, Democrat, Republican, NY Jets fan, Likes tennis, Bears fan, NY Yankees fan, White Sox fan.
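A sketch of the bit-vector encoding in Python; the slide does not give the order of the 19 input bits, so the feature list and layout below are assumptions for illustration only:

```python
# Assumed feature order; the real slide's bit assignment is not recoverable.
FEATURES = ["Mets fan", "Likes lemonade", "From Chicago", "From NY",
            "Cubs fan", "Democrat", "Republican", "NY Jets fan",
            "Likes tennis", "Bears fan", "NY Yankees fan", "White Sox fan"]

def encode(person):
    # 1 if the person has the feature, 0 otherwise
    return [1 if f in person else 0 for f in FEATURES]

print(encode({"From Chicago", "Cubs fan", "Democrat"}))
```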
12 - Possible rules from data
- People from Chicago are never fans of NY teams.
- 50% of Cubs fans are White Sox fans.
- All Mets fans are Jets fans.
- Can train a BP-network to learn such patterns.
- After training, in testing, given an input
- 1 0 1 0 0 0 0 0
- we get the output 0.44 0.88 0.23 0.03 0.02,
- i.e., there is a small chance that the person likes the Sox, a high probability that he likes the Bears, and no chance that he likes the Mets, the Jets, or tennis.
13 - Input: a Democrat Cubs fan
- Output: 0.11 0.88 0.14 0.01 0.02
- Input: a Republican Cubs fan
- Output: 0.77 0.92 0.13 0.05 0.02
- So, Republican Cubs fans also like the White Sox while Democrat Cubs fans do not.
- Input: a Chicagoan who doesn't like the Cubs
- Output: 0.17 0.26 0.79 0.05 0.04
- He likes tennis.
- (Fuzzy rules)