Title: A Model of V1 (Primary Visual Cortex)
5. Receptive Fields (RF)
6. The Receptive Field
10. The Receptive Field
14. ICA - Independent Component Analysis
- A method for separating signals into statistically independent sources.
- Given a linear mixture of N sources, ICA recovers the original signals from the mixture (without knowing the mixing process or the sources themselves).
15. ICA Continued
- A simple example with two mixed signals, X1 and X2.
- The standard assumptions on the sources:
- 1. The mean of each source is zero.
- 2. The variance of each source is one.
- 3. The covariance of each pair of sources is zero.
- 4. The sources are not normally distributed, i.e., not N(0,1); at most one source may be Gaussian.
16. ICA Continued
17. (No Transcript)
18. Solving Problems Using ICA
19. Step 1: Deciding who the sources are and how to derive the output
20. In our case, each neuron in the input layer is one pixel of the image, and each neuron in the output layer receives all the pixels of the image.
21. Overcomplete representation
(Diagram: input from the retina feeding the visual cortex, with S > X.)
S can be viewed as the independent sources that cause the response on the retina; X is that response, and S is a reconstruction of the sources that caused it.
22. Natural scene; 12 × 12 patch
23. After the images are cut into 12 × 12 patches, we obtain the matrix X, which represents the training data.
Row i contains the values of pixel i in each patch; the number of rows equals the patch size, i.e., 144 in our case.
Column j contains all the pixel values of patch j; the number of columns equals the number of patches, i.e., 15000 in our case.
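A minimal sketch of how such a patch matrix could be assembled (Python/NumPy here, although the deck's simulation was written in MATLAB; the image source and function name are assumptions):

```python
import numpy as np

def build_patch_matrix(images, patch_size=12, n_patches=15000, seed=0):
    """Sample random patches and stack them as the columns of X.

    X gets one row per pixel (patch_size**2 = 144 rows) and one
    column per patch (15000 columns), as described on the slide.
    """
    rng = np.random.default_rng(seed)
    X = np.empty((patch_size * patch_size, n_patches))
    for j in range(n_patches):
        img = images[rng.integers(len(images))]        # pick a natural image
        r = rng.integers(img.shape[0] - patch_size + 1)
        c = rng.integers(img.shape[1] - patch_size + 1)
        X[:, j] = img[r:r + patch_size, c:c + patch_size].ravel()
    return X
```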
24. Deriving the output
The function calculates the output of neuron i when it is presented with patch k.
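The equation itself was an image and did not survive the transcript; a standard form consistent with the caption, assuming the weighted sum is passed through the squashing function g introduced on the next slide, is:

$$ s_i(k) \;=\; g\!\left(\sum_{j=1}^{N} W_{ij}\, x_j(k)\right) $$

where x_j(k) is pixel j of patch k and W_ij is the weight from input neuron j to output neuron i.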
25. The input-output function of each neuron is a nonlinear function, squashing the output sensitivity over the range of input values into a biologically plausible behavior.
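The slide's formula was likewise an image; a common choice for such a squashing function, assuming the logistic sigmoid used in infomax ICA, is:

$$ g(u) = \frac{1}{1 + e^{-u}} $$

which maps any weighted sum into the bounded range (0, 1).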
26. The matrix W represents the connections between the input neurons and the output neurons.
Row i holds the N (number of input neurons) weights feeding output neuron i.
Column j holds the M (number of output neurons) weights leaving input neuron j.
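In code, this layout means W has shape M × N, so the whole output layer for one patch is a single matrix-vector product. A minimal sketch (the function name and the logistic nonlinearity are assumptions carried over from the previous slides):

```python
import numpy as np

def forward(W, x):
    """Compute the outputs of all M neurons for one patch.

    W : (M, N) weight matrix; row i holds the N input weights of
        output neuron i, column j the M weights leaving pixel j.
    x : length-N vector of pixel values for one patch.
    """
    return 1.0 / (1.0 + np.exp(-W @ x))  # logistic squashing
```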
27. Overview so far
28. Step 2
29. Why do we preprocess the data?
- To meet the conditions of the central limit theorem, if not fully then at least partially.
- To reduce the dimensionality, which serves two purposes: 1. it is easier to compute; 2. we can choose the network's input and output sizes.
30. Methods of preprocessing
- Centering (subtracting the mean from the data): at least one condition is met, and computing the correlation matrix between the pixels, <XX'>, becomes easier.
- PCA: reduces the dimensionality of the data and removes the second-order statistics (pixel correlations).
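A minimal sketch of this preprocessing in Python/NumPy (the deck's own simulation was in MATLAB; the names and exact eigenvalue handling here are assumptions), assuming X is the 144 × 15000 patch matrix from before:

```python
import numpy as np

def center_and_whiten(X, n_components=100):
    """Center the data, then PCA-whiten and reduce 144 -> 100 dims."""
    Xc = X - X.mean(axis=1, keepdims=True)       # zero-mean pixels
    C = (Xc @ Xc.T) / Xc.shape[1]                # correlation matrix <XX'>
    eigvals, eigvecs = np.linalg.eigh(C)         # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:n_components]
    D = np.diag(1.0 / np.sqrt(eigvals[idx]))     # scale to unit variance
    V = eigvecs[:, idx]
    Wz = D @ V.T                                 # whitening matrix
    Z = Wz @ Xc                                  # whitened data
    # Keep Wz and V so learned filters can later be mapped back
    # to pixel space for visualization.
    return Z, Wz, V
```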
31. After Preprocessing
- The data is called whitened.
- The mean of the data is zero.
- The pixels have no correlation.
- We reduced the data dimension from 144 to 100.
- In order to reconstruct the filters later, some additional processing is done.
32. Step 3
- The cost function and the learning rule
33. ICA Again
- ICA can be implemented in several ways. The main difference between them is the method used to estimate how close the output distribution is to normal.
34. Infomax approach
The purpose of the learning process is to improve the representation of information in the output layer. Using mathematical methods taken from information theory, information can be quantified, and algorithms that improve the network's representation of information by changing the weights can be developed. We assume that the brain uses similar methods to better represent information.
35. Three important information theory quantities
- Information: defined as -log(p(x)); the rarer the appearance of a given value, the more information it carries.
- Entropy: the mean value of the information of a given random variable.
- Mutual information: the amount of uncertainty about a given variable Y that is resolved by observing X.
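In symbols, these are the standard definitions:

$$ I(x) = -\log p(x), \qquad H(X) = -\sum_x p(x)\log p(x), \qquad I(X;Y) = H(Y) - H(Y \mid X) $$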
36. Infomax basic assumptions
- Assumption: minimize the mutual information between neurons of the output layer. Intuition: the activity of one output neuron should not give information about the activity of the other neurons.
- Assumption: maximize the mutual information between the input and the output. Intuition: a different reaction in the output layer for each input pattern, and a consistent reaction for the same pattern.
- Assumption: the noise level is fixed, so its effect on the entropy of the output layer is constant. Intuition: only the entropy of the output affects the total mutual information between the layers.
37. I/O layers' mutual information
The mutual information between the input and output layers depends on the entropy of the output layer, because the value of s is a function of x. We want to maximize H(s).
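The slide's equation was an image; the standard decomposition it relies on is

$$ I(x; s) = H(s) - H(s \mid x), $$

and since s is a function of x with a fixed noise level, the conditional term H(s|x) is constant, so maximizing the mutual information reduces to maximizing H(s).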
38. Output layer mutual information
- From the expression for the mutual information (although written for discrete values) we can see that if the neurons are statistically independent, the log in the expression becomes zero and the mutual information is zero.
- We want to minimize the mutual information in the output layer.
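That expression did not survive the transcript; in its standard discrete form the mutual information between the outputs is

$$ I(s_1,\dots,s_M) \;=\; \sum_{s} p(s)\,\log\frac{p(s)}{\prod_{i} p_i(s_i)}, $$

so when the outputs are independent, p(s) equals the product of the marginals, the logarithm is zero, and I = 0.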
39. Estimating the output distribution
- After long and painful math we derive the expression as the estimate of the distribution of s.
- χ (chi) is called the susceptibility matrix.
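The derived expression itself is missing from the transcript; it is presumably the change-of-variables identity for the map from x to s, which matches the caption:

$$ p(s) = \frac{p(x)}{\lvert \det \chi \rvert}, \qquad \chi_{ij} = \frac{\partial s_i}{\partial x_j}. $$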
40. Entropy of the output: estimation
- Because we don't use the explicit equation of s's entropy, the integral is solved with H(x) treated as zero or constant.
- P(s) is estimated for the same reason as previously mentioned.
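Reading these captions against the previous slide, the estimate being described is presumably

$$ H(s) = H(x) + \big\langle \log \lvert \det \chi \rvert \big\rangle, $$

where H(x) does not depend on the weights and can therefore be treated as zero or constant; maximizing H(s) then amounts to maximizing the average log-volume term.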
41. The cost function
The sum of the squared changes in each s_i with respect to the input. The minus sign makes the value of the error decrease as the value on the right increases.
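The cost-function formula was an image and is lost. In the standard infomax derivation the quantity to minimize is

$$ E = -\big\langle \log \lvert \det \chi \rvert \big\rangle, $$

and the caption suggests the deck's variant was built from the squared sensitivities (∂s_i/∂x_j)² of each output to the input; either way, the minus sign makes the error fall as the sensitivity term grows.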
42. Geometrical interpretation of the cost function
Geometrically, the goal is to maximize the volume change of the transformation. This improves discrimination and increases the mutual information between the input and the output.
43. Learning Rules
Using the gradient descent method, we define learning rules: how to change W in response to a given set of outputs. The update is the rate of learning, η, times the derivative of the cost function with respect to W.
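The update equation was an image; in symbols the gradient-descent rule is

$$ \Delta W = -\eta\, \frac{\partial E}{\partial W}, $$

and, assuming a square W and the logistic nonlinearity, this gradient evaluates to the Bell-Sejnowski infomax rule

$$ \Delta W = \eta \left( (W^{\mathsf T})^{-1} + (1 - 2s)\, x^{\mathsf T} \right). $$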
44. Step 4
- Writing the simulation in MATLAB and getting results
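The MATLAB code itself is not part of the transcript; a compact Python sketch of the training step, using the hypothetical helpers above and the Bell-Sejnowski update (an assumption, since the deck's exact rule is lost), might look like:

```python
import numpy as np

def train_ica(Z, eta=0.01, n_iters=2000, batch=100, seed=0):
    """Infomax ICA on whitened patches Z (100 x n_patches), square W."""
    rng = np.random.default_rng(seed)
    dim = Z.shape[0]
    W = rng.normal(scale=0.1, size=(dim, dim))   # 100 outputs, 100 inputs
    for _ in range(n_iters):
        cols = rng.integers(Z.shape[1], size=batch)
        X = Z[:, cols]                           # mini-batch of patches
        S = 1.0 / (1.0 + np.exp(-W @ X))         # logistic outputs
        # Bell-Sejnowski gradient for square W and logistic g
        dW = np.linalg.inv(W.T) + ((1.0 - 2.0 * S) @ X.T) / batch
        W += eta * dW
    return W
```

After training, each row of W, mapped back through the stored whitening transform, is one learned filter; displaying those rows as 12 × 12 images is presumably how the oriented, localized filters on the results slides were produced.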
45. The obtained results
47. Histogram of the filters' orientations
(Axes: number of cells vs. angle (rad).)
50. "When we will discover the laws underlying natural computation... we will finally understand the nature of computation itself."
-- J. von Neumann
53. Comparison between PCA and ICA