Title: Neural networks
1Neural networks Hands on
- Delta rule and Backpropagation algorithm
- MetaNeural format for predictive data mining
- Iris Data
- Magnetocardiogram data
2Neural net yields weights to map inputs to outputs
[Figure: neural network diagram. Molecular descriptors (inputs): molecular weight, boiling point, H-bonding, hydrophobicity, electrostatic interactions. Hidden neurons (h) joined by weights (w11, w23, w34). Observable projection (output): biological response.]
There are many algorithms that can determine the weights for ANNs
RENSSELAER
3McCulloch-Pitts neuron
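The slide gives only the title, so the following is a minimal sketch of a McCulloch-Pitts style threshold unit, not the presenter's own code. The function names, the hand-picked weights, and the thresholds are illustrative assumptions.

```python
def mp_neuron(inputs, weights, threshold):
    """Return 1 when the weighted input sum reaches the threshold, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation >= threshold else 0


# Logical AND and OR as single threshold units (values chosen by hand).
def logical_and(a, b):
    return mp_neuron([a, b], [1, 1], threshold=2)


def logical_or(a, b):
    return mp_neuron([a, b], [1, 1], threshold=1)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, logical_and(a, b), logical_or(a, b))
```

The unit has no learning step: its behavior is fixed entirely by the weights and the threshold.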
4Neural network as collection of M-P neurons
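As a rough illustration of combining such units, the sketch below wires three threshold neurons into a two-layer network that computes XOR, which a single threshold unit cannot do. The weights, thresholds, and names are hand-picked assumptions for this example and do not come from the slides.

```python
def step_neuron(inputs, weights, threshold):
    """Threshold unit: 1 if the weighted sum reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def xor_network(a, b):
    """Two hidden threshold units feeding one output unit."""
    h_or = step_neuron([a, b], [1, 1], 1)        # fires if a OR b
    h_nand = step_neuron([a, b], [-1, -1], -1)   # fires unless a AND b
    return step_neuron([h_or, h_nand], [1, 1], 2)  # fires if both hidden units fire


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_network(a, b))
```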
5Standard Data Mining Terminology
- Basic Terminology
- - MetaNeural Format
- - Descriptors, features, response (or activity) and ID
- - Classification versus regression
- - Modeling/Feature detection
- - Training/Validation/Calibration
- - Vertical and horizontal view of data
- Outliers, rare events and minority classes
- Data Preparation
- - Data cleansing
- - Scaling
- Leave-one-out and leave-several-out validation
- Confusion matrix and ROC curves
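Two of the terms above can be sketched in a few lines. This is a minimal illustration, assuming binary class labels coded as 0 and 1; the function names are hypothetical and not part of the MetaNeural or Analyze tools.

```python
def leave_one_out_splits(n_patterns):
    """Yield (training_indices, held_out_index), holding out each pattern once."""
    for held_out in range(n_patterns):
        train = [i for i in range(n_patterns) if i != held_out]
        yield train, held_out


def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for a two-class problem."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts


if __name__ == "__main__":
    print(list(leave_one_out_splits(3)))
    print(confusion_matrix([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

An ROC curve is built from the same counts: each decision threshold gives one confusion matrix, and hence one (false positive rate, true positive rate) point.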
7TERMINOLOGY
- Standard Data Mining Problem
- Header and Data
- MetaNeural Format
- - descriptors and/or features
- - response (or activity to predict)
- - pattern ID
- - data matrix
- Validation/Calibration
- Training/Validation/Test Set
Demo iris_view.bat
9UC IRVINE DATA REPOSITORY
Datafile Name: Fisher's Iris
Datafile Subjects: Agriculture, Famous datasets
Description: This is a dataset made famous by Fisher, who used it to illustrate principles of discriminant analysis. It contains 6 variables with 150 observations.
Reference: Fisher, R. A. (1936). The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics 7, 179-188.
Story Names: Fisher's Irises
Authorization: free use
Number of cases: 150
Variable Names:
1. Species_No: Flower species as a code
2. Species_Name: Species name
3. Petal_Width: Petal Width
4. Petal_Length: Petal Length
5. Sepal_Width: Sepal Width
6. Sepal_Length: Sepal Length
11 ANALYZE code has neural network modules built in
Either run:
analyze root.pat 4331 (single training and testing)
analyze root.pat 4332 (LOO)
analyze root.txt 4333 (bootstrap mode)
Results for analyze are in resultss.xxx and resultss.ttt
Note that patterns have to be properly scaled first
The file name meta overrides the default input file for analyze
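The slide does not say which scaling the patterns need. As one common choice, the sketch below standardizes each descriptor column to zero mean and unit standard deviation; treat the method and the function name as assumptions rather than the scaling Analyze itself applies.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Scale each column of a list-of-rows matrix to zero mean, unit std."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; divide by 1.0 so it maps to zeros.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


if __name__ == "__main__":
    patterns = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
    for row in standardize_columns(patterns):
        print(row)
```

The means and standard deviations should come from the training patterns only and then be reused for the test patterns, so that both sets are on the same scale.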
12Neural Network Module in Analyze Code
[Diagram: input files ROOT, ROOT.PAT, ROOT.TES, (ROOT.WGT), (ROOT.FWT), (ROOT.DBD) feed into Analyze, which produces resultss.XXX, resultss.TTT, ROOT.TRN, (ROOT.DBD), ROOT.WGT, ROOT.FWT]
- Use Analyze root 4331 for the easy way
- (the file meta lets you override defaults)
13MetaNeural Input File for the ROOT
Generating and Scaling Data
4 > 4 layers
2 > 2 inputs
16 > hidden neurons in layer 1
4 > hidden neurons in layer 2
1 > outputs
300 > epoch length (hint: always use 1, for the entire batch)
0.01 > learning parameters by weight layer (hint: 1/(number of patterns) or 1/(number of epochs))
0.01
0.01
0.5 > momentum parameters by weight layer (hint: use 0.5)
0.5
0.5
10000000 > some very large number of training epochs
200 > error display refresh rate
1 > sigmoid transfer function
1 > Temperature of sigmoid
check.pat > name of file with training patterns (test patterns in root.tes)
0 > not used (legacy entry)
100 > not used (legacy entry)
0.02000 > exit training if error < 0.02
0 > initial weights from a flat random distribution
0.2 > initial random weights all fall between -0.2 and 0.2
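To make a few of these entries concrete, here is a small sketch of how a learning parameter, a momentum parameter, a sigmoid temperature, and a flat random weight range typically enter a training loop. It is not the MetaNeural implementation: the function names are hypothetical, and the sigmoid form (temperature dividing the net input) and the symmetric weight range are assumptions.

```python
import math
import random


def sigmoid(x, temperature=1.0):
    """Logistic transfer function; assumed form with temperature dividing x."""
    return 1.0 / (1.0 + math.exp(-x / temperature))


def init_weights(n, spread=0.2, seed=None):
    """Draw n weights from a flat (uniform) distribution on [-spread, spread]."""
    rng = random.Random(seed)
    return [rng.uniform(-spread, spread) for _ in range(n)]


def momentum_step(weight, gradient, previous_delta,
                  learning_rate=0.01, momentum=0.5):
    """One gradient-descent update with momentum; returns (new_weight, delta)."""
    delta = -learning_rate * gradient + momentum * previous_delta
    return weight + delta, delta


if __name__ == "__main__":
    print(sigmoid(0.0))
    print(init_weights(3, seed=0))
    print(momentum_step(1.0, 2.0, 0.1))
```

The momentum term reuses part of the previous weight change, which smooths the updates when successive gradients point in different directions.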
14Generating and Scaling Iris Data
15Run Neural Net for Iris Data