The Perceptron - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: The Perceptron

1
The Perceptron
  • Single neuron as a feed-forward network
  • Serves as a classifier
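A minimal sketch in Python of such a single-neuron classifier, trained with the classic perceptron learning rule (the data, learning rate, and epoch count here are illustrative assumptions, not values from the slides):

import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    # X: (n_samples, n_features); y: labels in {-1, +1}.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified (or on the boundary)
                w += lr * yi * xi        # nudge the separating hyperplane toward xi
                b += lr * yi
    return w, b

def predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)

# Example: the (linearly separable) AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print(predict(X, w, b))  # [-1 -1 -1  1]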

2-9
(No Transcript)
10
  • XOR is not linearly separable, so a single perceptron cannot compute it,
    but it can be solved by a more complex network with
    hidden units (a minimal sketch follows below)

[Figure: two-layer network diagram for XOR; units labeled with Threshold 1]
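A minimal sketch of such a network in Python, using hand-set weights (the particular choice of hidden units, an OR unit and a NAND unit feeding an AND output, is one standard construction and an assumption here, not read off the slide):

def step(z, threshold=1.0):
    # Heaviside threshold unit: fires iff the weighted input reaches the threshold.
    return 1 if z >= threshold else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2)              # OR: fires if at least one input is 1
    h2 = step(-x1 - x2, -1.0)       # NAND: fires unless both inputs are 1
    return step(h1 + h2, 2.0)       # AND of the hidden units gives XOR

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', xor_net(x1, x2))  # prints the XOR truth table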
11-19
(No Transcript)
20
Gradient descent method could be thought
of as a ball rolling down from a hill
the ball will roll down and finally
stop at the valley
21
  • The gradient direction is the uphill direction
  • For example, in the figure, at position 0.4 the
    gradient of F points uphill (F plays the role of
    the error E; consider the one-dimensional case)

[Figure: one-dimensional error curve F, with the value F(0.4) marked]
22
  • The gradient direction is the uphill direction
  • In the gradient descent algorithm we have
    w(t+1) = w(t) - η(t)∇F(w(t))
  • therefore the ball goes downhill, since
    -∇F(w(t)) is the downhill direction

[Figure: one descent step on the error curve, starting from position w(t)]
23
(Repeat of slide 22)
24
  • The gradient direction is the uphill direction
  • In the gradient descent algorithm we have
    w(t+1) = w(t) - η(t)∇F(w(t))
  • therefore the ball goes downhill, since
    -∇F(w(t)) is the downhill direction
  • Gradually the ball comes to rest at a local
    minimum, where the gradient is zero
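A minimal one-dimensional sketch of this update rule in Python (the error function F, the starting point, and the constant learning rate eta are illustrative assumptions, not taken from the slides):

def F(w):
    return (w - 2.0) ** 2 + 1.0   # a simple 'valley' with its minimum at w = 2

def dF(w):
    return 2.0 * (w - 2.0)        # gradient (here: ordinary derivative) of F

w = 0.4       # starting position of the 'ball'
eta = 0.1     # learning rate eta(t), kept constant for simplicity
for t in range(50):
    w = w - eta * dF(w)           # w(t+1) = w(t) - eta(t) * F'(w(t))

print(w, F(w))  # w has rolled close to the minimum at w = 2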

25-27
(No Transcript)
28
Another use of the sigmoid function
  • Suppose the data are somewhat contradictory: there
    exist two classes of data, corresponding to labels
    1 and -1, yet they overlap in their locations.
  • Clearly one cannot fit such a situation with a
    binary perceptron. However, one can fit it with a
    sigmoid function. This turns out to coincide with
    the solution favored by a probability-theory
    approach.
  • In the following example we consider data
    distributions p(x|C1) and p(x|C2) that overlap
    somewhat. The question one tries to answer is:
    what is p(C1|x)? By Bayes' rule,
    p(C1|x) = p(x|C1)p(C1)/p(x).
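A sketch of that probability-theory argument (following Bishop, chapter 3): writing out Bayes' rule for the two-class posterior and dividing through by the numerator shows that the posterior is exactly a logistic sigmoid:

p(C_1 \mid x)
  = \frac{p(x \mid C_1)\, p(C_1)}{p(x \mid C_1)\, p(C_1) + p(x \mid C_2)\, p(C_2)}
  = \frac{1}{1 + e^{-a}} = \sigma(a),
\qquad
a = \ln \frac{p(x \mid C_1)\, p(C_1)}{p(x \mid C_2)\, p(C_2)}.

So fitting a sigmoid output can be read as estimating the posterior class probability p(C1|x).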

29
The sigmoid as a logistic discriminant
Neural Computation with Artificial Neural
Networks, USC 2005. Based on the book by Bishop
(chapter 3)
30-40
(No Transcript)