The Perceptron - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: The Perceptron

1
The Perceptron
  • Single neuron as a feed-forward network
  • Serves as a classifier
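A minimal sketch in Python of such a single-neuron classifier, trained with the classic perceptron learning rule (the data, learning rate, and epoch count here are illustrative assumptions, not values from the slides):

import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    # X: (n_samples, n_features); y: labels in {-1, +1}.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified (or on the boundary)
                w += lr * yi * xi        # nudge the separating hyperplane toward xi
                b += lr * yi
    return w, b

def predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)

# Example: the (linearly separable) AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print(predict(X, w, b))  # [-1 -1 -1  1]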

2-9
(No Transcript)
10
  • XOR is not linearly separable, so a single perceptron cannot compute it,
    but it can be solved by a more complex network with
    hidden units (a minimal sketch follows below)

[Figure: two-layer network diagram for XOR; units labeled with Threshold 1]
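A minimal sketch of such a network in Python, using hand-set weights (the particular choice of hidden units, an OR unit and a NAND unit feeding an AND output, is one standard construction and an assumption here, not read off the slide):

def step(z, threshold=1.0):
    # Heaviside threshold unit: fires iff the weighted input reaches the threshold.
    return 1 if z >= threshold else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2)              # OR: fires if at least one input is 1
    h2 = step(-x1 - x2, -1.0)       # NAND: fires unless both inputs are 1
    return step(h1 + h2, 2.0)       # AND of the hidden units gives XOR

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', xor_net(x1, x2))  # prints the XOR truth table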
11-19
(No Transcript)
20
Gradient descent method could be thought
of as a ball rolling down from a hill
the ball will roll down and finally
stop at the valley
21
  • The gradient direction is the uphill direction
  • For example, in the figure, at position 0.4 the
    gradient of F points uphill (F plays the role of
    the error E; consider the one-dimensional case)

[Figure: one-dimensional error curve F, with the value F(0.4) marked]
22
  • The gradient direction is the uphill direction
  • In the gradient descent algorithm we have
    w(t+1) = w(t) - η(t)∇F(w(t))
  • therefore the ball goes downhill, since
    -∇F(w(t)) is the downhill direction

[Figure: one descent step on the error curve, starting from position w(t)]
23
(Repeat of slide 22)
24
  • The gradient direction is the uphill direction
  • In the gradient descent algorithm we have
    w(t+1) = w(t) - η(t)∇F(w(t))
  • therefore the ball goes downhill, since
    -∇F(w(t)) is the downhill direction
  • Gradually the ball comes to rest at a local
    minimum, where the gradient is zero
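A minimal one-dimensional sketch of this update rule in Python (the error function F, the starting point, and the constant learning rate eta are illustrative assumptions, not taken from the slides):

def F(w):
    return (w - 2.0) ** 2 + 1.0   # a simple 'valley' with its minimum at w = 2

def dF(w):
    return 2.0 * (w - 2.0)        # gradient (here: ordinary derivative) of F

w = 0.4       # starting position of the 'ball'
eta = 0.1     # learning rate eta(t), kept constant for simplicity
for t in range(50):
    w = w - eta * dF(w)           # w(t+1) = w(t) - eta(t) * F'(w(t))

print(w, F(w))  # w has rolled close to the minimum at w = 2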

25-27
(No Transcript)
28
Another use of the sigmoid function
  • Suppose the data are somewhat contradictory: there
    exist two classes of data, corresponding to labels
    1 and -1, yet they overlap in their locations.
  • Clearly one cannot fit such a situation with a
    binary perceptron. However, one can fit it with a
    sigmoid function. This turns out to coincide with
    the solution favored by a probability-theory
    approach.
  • In the following example we consider data
    distributions p(x|C1) and p(x|C2) that overlap
    somewhat. The question one tries to answer is:
    what is p(C1|x)? By Bayes' rule,
    p(C1|x) = p(x|C1)p(C1)/p(x).
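A sketch of that probability-theory argument (following Bishop, chapter 3): writing out Bayes' rule for the two-class posterior and dividing through by the numerator shows that the posterior is exactly a logistic sigmoid:

p(C_1 \mid x)
  = \frac{p(x \mid C_1)\, p(C_1)}{p(x \mid C_1)\, p(C_1) + p(x \mid C_2)\, p(C_2)}
  = \frac{1}{1 + e^{-a}} = \sigma(a),
\qquad
a = \ln \frac{p(x \mid C_1)\, p(C_1)}{p(x \mid C_2)\, p(C_2)}.

So fitting a sigmoid output can be read as estimating the posterior class probability p(C1|x).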

29
The sigmoid as a logistic discriminant
Neural Computation with Artificial Neural
Networks, USC 2005. Based on the book by Bishop
(chapter 3)
30-40
(No Transcript)