Title: Intro. to Neural Networks
1. Intro. to Neural Networks: Using a Radial-Basis Neural Network to Classify Mammograms
Pattern Recognition, 2nd Presentation
Mohammed Jirari - Spring 2003
2. Neural Network History
- Originally hailed as a breakthrough in AI
- Biologically inspired information processing systems (the parallel architecture of animal brains vs. the processing/memory abstraction of human information processing)
- Referred to as Connectionist Networks
- Now better understood
- Hundreds of variants
- Less a model of the actual brain than a useful tool
- Numerous applications
- handwriting, face, speech recognition
- CMU van that drives itself
3. Perceptrons
- Initial proposal of connectionist networks
- Rosenblatt, 1950s and 60s
- Essentially a linear discriminant composed of nodes and weights
4. Perceptron Example
Net input = 2(0.5) + 1(0.3) + (-1) = 0.3 > 0, so O = 1
Learning Procedure
- Randomly assign weights (between 0 and 1)
- Present inputs from the training data
- Get output O, nudge the weights to give results closer to our desired output T
- Repeat; stop when there are no errors or enough epochs have been completed
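A minimal sketch of this learning procedure, assuming the standard Rosenblatt update rule (weights change by rate * (T - O) * input) with the threshold handled as an extra weight on a constant input of 1; the rate and epoch limit are illustrative, not from the slides:

```python
import random

def train_perceptron(examples, n_inputs, rate=1.0, max_epochs=100):
    # Randomly assign weights (between 0 and 1); the last weight acts as the threshold
    weights = [random.random() for _ in range(n_inputs + 1)]
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in examples:
            x = list(inputs) + [1.0]                       # append constant bias input
            net = sum(w * xi for w, xi in zip(weights, x))
            output = 1 if net > 0 else 0                   # hard threshold
            if output != target:
                errors += 1
                # Nudge the weights toward the desired output T
                for i in range(len(weights)):
                    weights[i] += rate * (target - output) * x[i]
        if errors == 0:                                    # stop when no errors
            break
    return weights

# Usage: learn the logical AND of two inputs
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(data, n_inputs=2))
```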
5. Perceptron Training
Weights include the threshold. T = desired output, O = actual output.
Example: T = 0, O = 1, W1 = 0.5, W2 = 0.3, I1 = 2, I2 = 1, Theta = -1
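A worked step for these numbers, assuming the standard perceptron update $\Delta W_i = \eta\,(T - O)\,I_i$ with learning rate $\eta = 1$ and the threshold treated as a weight on a constant input of 1 (this rule is the usual one but is not spelled out on the slide):

$\Delta W_1 = (0 - 1)(2) = -2 \;\Rightarrow\; W_1 = 0.5 - 2 = -1.5$
$\Delta W_2 = (0 - 1)(1) = -1 \;\Rightarrow\; W_2 = 0.3 - 1 = -0.7$
$\Delta \Theta = (0 - 1)(1) = -1 \;\Rightarrow\; \Theta = -1 - 1 = -2$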
6. Perceptrons
- Can add a learning rate to speed up the learning process: just multiply it in with the delta computation
- Essentially a linear discriminant
- Perceptron convergence theorem: if a linear discriminant exists that can separate the classes without error, the training procedure is guaranteed to find that line or plane.
7. Strengths of Neural Networks
- Inherently Non-Linear
- Rely on generalized input-output mappings
- Provide confidence levels for solutions
- Efficient handling of contextual data
- Adaptable
- Great for changing environment
- Potential problem with spikes in the environment
8. Strengths of Neural Networks (continued)
- Can benefit from Neurobiological Research
- Uniform analysis and design
- Hardware implementable
- Speed
- Fault tolerance
9. Hebb's Postulate of Learning
- The effectiveness of a variable synapse between two neurons is increased by the repeated activation of one neuron by the other across that synapse
- This postulate is often viewed as the basic principle behind neural networks
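One common formalization of this postulate (an assumption here, not stated on the slide) is the Hebbian weight update, in which the connection strengthens in proportion to correlated activity on both sides of the synapse:

$\Delta w_{ij} = \eta \, x_i \, y_j$

where $x_i$ is the presynaptic activity, $y_j$ the postsynaptic activity, and $\eta$ a small learning rate.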
10. LMS Learning
LMS (Least Mean Square) learning systems are more general than the previous perceptron learning rule. The concept is to minimize the total error E, as measured over all training examples P:
E = (1/2) Σp (Tp - Op)²
where O is the raw output, as calculated by the weighted sum of the unit's inputs.
E.g., if we have two patterns with T1 = 1, O1 = 0.8 and T2 = 0, O2 = 0.5, then E = (0.5)[(1 - 0.8)² + (0 - 0.5)²] = 0.145.
We want to minimize this LMS error by adjusting the weights.
[Figure: error E plotted against a weight W, showing the step from W(old) to W(new); C = learning rate.]
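A short sketch reproducing the slide's two-pattern error calculation:

```python
# LMS error for the example above: E = (1/2) * sum of squared differences
targets = [1.0, 0.0]
outputs = [0.8, 0.5]
E = 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))
print(E)  # 0.145
```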
11. LMS Gradient Descent
- Using LMS, we want to minimize the error. We can do this by finding the direction on the error surface that most rapidly reduces the error; this means finding the slope of the error function by taking its derivative. The approach is called gradient descent (similar to hill climbing).
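A minimal sketch of one gradient-descent step for a single linear unit, assuming the raw output is the weighted sum of the inputs so that the error slope with respect to each weight is -(T - O) * Ii (the delta rule); the learning rate and values are placeholders:

```python
def lms_step(weights, inputs, target, rate=0.1):
    output = sum(w * x for w, x in zip(weights, inputs))   # raw (linear) output
    error = target - output
    # Move each weight down the error surface: dE/dWi = -(T - O) * Ii
    return [w + rate * error * x for w, x in zip(weights, inputs)]

w = [0.5, 0.3]
print(lms_step(w, inputs=[2.0, 1.0], target=0.0))
```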
12. Activation Function
- To apply the LMS learning rule, also known as the delta rule, we need a differentiable activation function.
[Figure: the old hard-limiting threshold activation is replaced by a new, differentiable sigmoidal activation.]
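A brief sketch of the kind of differentiable activation the slides refer to, assuming the standard logistic sigmoid (the exact function used is not shown here):

```python
import math

def sigmoid(x):
    # Smooth, differentiable replacement for the hard threshold
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # convenient closed form used by the delta rule
```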
13. LMS vs. Limiting Threshold
- With the new sigmoidal function that is differentiable, we can apply the delta rule toward learning.
- Perceptron Method
- Forces the output to 0 or 1, while LMS uses the net output
- Guaranteed to separate, if a zero-error solution exists and the classes are linearly separable
- Gradient Descent Method
- May oscillate and not converge
- May converge to a wrong answer
- Will converge to some minimum even if the classes are not linearly separable, unlike the earlier perceptron training method
14. Backpropagation Networks
- Attributed to Rumelhart and McClelland, mid-80s
- To get past the limitation to linearly separable problems, we can construct multilayer networks. Typically these are fully connected, feedforward networks.
[Figure: a fully connected feedforward network with input layer I1, I2, I3, hidden layer H1, H2, and output layer O1, O2; weights Wi,j connect the inputs to the hidden units and Wj,k connect the hidden units to the outputs; constant inputs of 1 supply the bias to each layer.]
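A hedged sketch of a forward pass through the 3-2-2 network drawn above, using the sigmoid from the previous slide; the weight values are placeholders, not taken from the slides:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights):
    # weights: one row per unit in this layer; the last entry of each row is the
    # bias (the weight on the constant 1 input shown in the diagram)
    return [sigmoid(sum(w * x for w, x in zip(row, inputs + [1.0]))) for row in weights]

# Placeholder weights for a 3-input, 2-hidden, 2-output network
W_ij = [[0.1, -0.2, 0.4, 0.05],   # hidden unit H1: 3 input weights + bias
        [0.3, 0.1, -0.5, -0.1]]   # hidden unit H2
W_jk = [[0.7, -0.3, 0.2],         # output unit O1: 2 hidden weights + bias
        [-0.4, 0.6, 0.1]]         # output unit O2

hidden = layer([0.5, 0.9, 0.1], W_ij)
outputs = layer(hidden, W_jk)
print(outputs)
```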
15. Backprop - Learning
Learning Procedure
- Randomly assign weights (between 0 and 1)
- Present inputs from the training data, propagate to the outputs
- Compute the outputs O, adjust the weights according to the delta rule, backpropagating the errors. The weights will be nudged closer so that the network learns to give the desired output.
- Repeat; stop when there are no errors or enough epochs have been completed
16. Backprop - Modifying Weights
We had computed the weight change ΔW from the derivative of the error (the delta rule). For the output unit k, f(sum) = O(k). For the output units, this is
δk = Ok (1 - Ok)(Tk - Ok)
For the hidden units (skipping some math), this is
δj = Hj (1 - Hj) Σk Wj,k δk
[Figure: I → H → O, with weights Wi,j between the input and hidden layers and Wj,k between the hidden and output layers.]
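A hedged sketch of one backpropagation update built from these deltas, reusing layer() and the placeholder weights W_ij, W_jk from the earlier forward-pass sketch; the learning rate and example data are illustrative:

```python
def backprop_step(inputs, targets, W_ij, W_jk, rate=0.5):
    # Forward pass
    hidden = layer(inputs, W_ij)
    outputs = layer(hidden, W_jk)

    # Output-unit deltas: delta_k = Ok (1 - Ok)(Tk - Ok)
    d_out = [o * (1 - o) * (t - o) for o, t in zip(outputs, targets)]
    # Hidden-unit deltas: delta_j = Hj (1 - Hj) * sum_k Wj,k * delta_k
    d_hid = [h * (1 - h) * sum(W_jk[k][j] * d_out[k] for k in range(len(d_out)))
             for j, h in enumerate(hidden)]

    # Nudge each weight by rate * delta * the activation feeding it (bias input is 1)
    for k, row in enumerate(W_jk):
        for j, a in enumerate(hidden + [1.0]):
            row[j] += rate * d_out[k] * a
    for j, row in enumerate(W_ij):
        for i, a in enumerate(inputs + [1.0]):
            row[i] += rate * d_hid[j] * a

backprop_step([0.5, 0.9, 0.1], [1.0, 0.0], W_ij, W_jk)
```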
17. Backprop
- Very powerful: given enough hidden units, it can approximate any function.
- Has the same problem of generalization vs. memorization. With too many units, the network will tend to memorize the input and not generalize well. Some schemes exist to prune the neural network.
- Networks require extensive training and have many parameters to fiddle with. They can be extremely slow to train and may fall into local minima.
- Inherently a parallel algorithm, ideal for multiprocessor hardware.
- Despite the cons, a very powerful algorithm that has seen widespread successful deployment.
18. Why This Project?
- Breast cancer is the most common cancer and the second leading cause of cancer deaths
- Mammographic screening reduces the mortality of breast cancer
- But mammography has a low positive predictive value, PPV (only 35% of suspicious findings are malignant)
- The goal of Computer Aided Diagnosis (CAD) is to provide a second reading, hence reducing the false positive rate
19. Data Used in my Project
- The dataset used is the Mammographic Image Analysis Society (MIAS) MiniMIAS database, containing Medio-Lateral Oblique (MLO) views of each breast for 161 patients, for a total of 322 images.
- Every image is 1024 pixels x 1024 pixels, with 256 grey levels
20. Sample of a Well-Defined/Circumscribed Masses Mammogram
21. Sample of a Normal Mammogram
22. Sample of an Ill-Defined Masses Mammogram
23. Sample of an Asymmetric Mammogram
24. Sample of an Architecturally Distorted Mammogram
25. Sample of a Spiculated Masses Mammogram
26. Sample of a Calcification Mammogram
27. Approach Followed
- Normalize all images between 0 and 1
- Normalize the features between 0 and 1
- Train the network
- Test on an image (Simulate the network)
- Denormalize the classification values
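A hedged sketch of the normalization steps, assuming 8-bit images (divide grey levels by 255) and min-max scaling for the features; the original work used different tooling, so this is only illustrative:

```python
import numpy as np

def normalize_image(img):
    # Map 8-bit grey levels (0-255) into [0, 1]
    return img.astype(np.float64) / 255.0

def normalize_features(features):
    # Min-max scale each feature column into [0, 1]
    features = np.asarray(features, dtype=np.float64)
    lo, hi = features.min(axis=0), features.max(axis=0)
    return (features - lo) / np.where(hi > lo, hi - lo, 1.0)
```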
28. Features Used to Train
- Character of background tissue
- Fatty, Fatty-Glandular, and Dense-Glandular
- Severity of abnormality
- Benign or Malignant
- Class of abnormality present
- Calcification, Well-Defined/Circumscribed Masses,
Spiculated Masses, Other/Ill-Defined Masses,
Architectural Distortion, Asymmetry, and Normal
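A possible encoding of these categorical features as numbers in [0, 1] so they can be fed to the network; the specific numeric codes below are assumptions for illustration, not values taken from the slides:

```python
# Hypothetical numeric codes for the MiniMIAS categorical features, scaled to [0, 1]
TISSUE = {"F": 0.0, "G": 0.5, "D": 1.0}          # Fatty, Fatty-Glandular, Dense-Glandular
SEVERITY = {"B": 0.0, "M": 1.0}                   # Benign, Malignant
ABNORMALITY = {"CALC": 0.0, "CIRC": 1/6, "SPIC": 2/6,
               "MISC": 3/6, "ARCH": 4/6, "ASYM": 5/6, "NORM": 1.0}

def encode(tissue, abnormality, severity=None):
    # Normal cases carry no severity label; use a neutral placeholder value
    sev = SEVERITY.get(severity, 0.5)
    return [TISSUE[tissue], ABNORMALITY[abnormality], sev]

print(encode("G", "CIRC", "B"))
```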
29. Radial Basis Network Used
- Radial basis networks may require more neurons than standard feed-forward backpropagation (FFBP) networks
- BUT, they can be designed in a fraction of the time it takes to train FFBP networks
- They work best with many training vectors
30. Radial Basis Network with R Inputs
31. The radial basis transfer function: radbas(n) = e^(-n²)
a = radbas(n)
32. The radial basis network consists of 2 layers: a hidden radial basis layer of S1 neurons and an output linear layer of S2 neurons
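A hedged sketch of this two-layer structure; the distance-based hidden layer and linear output layer follow the general radial-basis design the slides describe, but the centers, spread, and output weights below are placeholders:

```python
import numpy as np

def radbas(n):
    # Radial basis transfer function from the previous slide: a = exp(-n^2)
    return np.exp(-n ** 2)

def rbf_forward(x, centers, spread, W2, b2):
    # Hidden radial basis layer: each of the S1 neurons responds to the distance
    # between the input vector and its center, scaled by the spread
    dist = np.linalg.norm(centers - x, axis=1)
    a1 = radbas(dist * spread)
    # Output linear layer of S2 neurons
    return W2 @ a1 + b2

# Placeholder parameters: R = 4 inputs, S1 = 3 hidden neurons, S2 = 2 outputs
centers = np.random.rand(3, 4)
W2, b2 = np.random.rand(2, 3), np.random.rand(2)
print(rbf_forward(np.random.rand(4), centers, spread=1.0, W2=W2, b2=b2))
```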
33. Results and Future Work
- The network was able to correctly classify 55% of the mammograms
- I will use more pre-processing, including sub-sampling, segmentation, and statistical features extracted from the images, as well as the coordinates of the center of the abnormality and the approximate radius of a circle enclosing the abnormality.
- I will use different networks, such as the fuzzy ARTMAP network, self-organizing networks, and cellular networks, and compare their results in designing a good CAD.