Neural Network Training - PowerPoint PPT Presentation

Provided by: randyz


1
Lecture 4
  • Neural Network Training Applications
  • Chapters 6-9
  • Wu & McLarty

2
Input/Output Design
  • Choices of methods for encoding inputs and
    decoding outputs drive the design of a neural
    network application.
  • Training against amino acid sequence data is
    perhaps the most complex challenge, as
    applications may involve coding both sequence and
    physicochemical properties.

3
Amino Acid Properties
  • Variation in amino acid properties is found in
    the twenty different sidechains.
  • The sidechains vary in:
  • Size, ranging from 1 atom (Glycine) to 16 atoms
    (Tryptophan, Arginine); volume from 60.1 Å³
    (Glycine) to 227.8 Å³ (Tryptophan).
  • Polarity, ranging from hydrophobic
    (Phenylalanine, Valine, etc.), to neutral polar
    (Serine, Threonine, Tyrosine, etc.) to ionized
    (Glutamate, Lysine, etc.)

4
Protein Folding
  • The varying properties of the sidechains
    support the phenomenon called protein folding:
    some sequences can adopt stable three-dimensional
    conformations called folds. The fold lends the
    sequence a distinct 3D shape, which may include
    functionally-important features such as crevices
    (which may be binding sites for smaller ligand
    molecules).

5
Describing Protein Conformation
http://swissmodel.expasy.org/course/text/chapter1.htm

6
Some Principles of Protein Folding
  • The fold will tend to sequester hydrophobic
    sidechains from solvent, while keeping polar
    sidechains well-solvated.
  • Hydrophobic sidechains cluster to form the
    hydrophobic core
  • Backbone hydrogen bond donors and acceptors often
    become partly or fully desolvated upon folding;
    this potential free-energy penalty is avoided by
    the preference to form regular secondary
    structures (local folds) with regular patterns
    of hydrogen bonding. These structures compensate
    for the loss of hydrogen bonds to solvent.

7
Categories of secondary (local) structure
http://swissmodel.expasy.org/course/text/chapter1.htm

8
Alpha Helix
9
Beta Sheet
10
Beta Sheet Topologies
11
Antiparallel Example
12
Turns
13
Secondary Structure Propensity
  • A given amino acid residue is more or less
    compatible with a particular secondary
    structure class.
  • Amino acids in turns tend to be small and possibly polar.
  • Proline is excluded from the interior of helical
    segments, but may be found at the ends.
  • Leucine is often found as an interior residue in
    helices.
  • Propensity can be represented using numerical
    indices calculated from statistics determined for
    experimental protein structures (e.g. Chou-Fasman
    rules).

14
What Properties to Encode?
  • One can work directly with the 20 amino acids:
    use a 20-character alphabet directly for input.
  • One can use physicochemical properties to
    represent an amino acid; for example, a separate
    index representing the hydrophobicity of each
    residue in a window, using a particular scale
    (e.g. Kyte-Doolittle, Engelman, Eisenberg). These
    are all coded as real numbers.
  • You can break up the hydrophobicity scale into
    ranges, and assign amino acids on this basis.
  • Or, use a reduced alphabet, based on similar
    physical properties, or statistical propensity to
    cross-substitute under selective pressure (as
    expressed in the PAM matrices).
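
As a minimal sketch of the two hydrophobicity options above, the block below encodes residues as real numbers using the Kyte-Doolittle scale and then assigns each residue to a range of that scale. The function names and the cut-off values are assumptions chosen for illustration only.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(sequence):
    """Encode each residue as a real number (its hydropathy index)."""
    return [KYTE_DOOLITTLE[res] for res in sequence.upper()]


def hydrophobicity_class(residue, low=-2.0, high=1.5):
    """Assign a residue to one of three ranges of the scale.

    The cut-offs `low` and `high` are illustrative assumptions.
    """
    value = KYTE_DOOLITTLE[residue.upper()]
    if value <= low:
        return "hydrophilic"
    if value >= high:
        return "hydrophobic"
    return "intermediate"


profile = hydrophobicity_profile("MKVLG")
classes = [hydrophobicity_class(r) for r in "MKVLG"]
```

The real-valued profile gives one input per residue, while the range assignment yields a three-letter reduced alphabet that can be fed to any of the categorical encodings.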

15
Reduced Alphabets
16
Sequence Encoding
  • Direct Encoding - code each sequence character as
    a vector
  • Maintains relative positions of sequence
    characters, no loss of information
  • Drawback: forced to scan the sequence with a
    window of fixed length
  • Usually use an indicator vector, a string of 0s
    and a single 1, which specifies the identity of
    the residue
  • Also possible to use binary numbers (e.g. A=00,
    T=01, G=10, C=11), although this denser
    representation appears not to work as well as the
    indicator approach.
  • Also possible to represent each residue by a real
    number (e.g. hydrophobicity index)
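
The indicator-vector form of direct encoding might be sketched as follows. The alphabet ordering and the function names are assumptions; any fixed ordering works as long as it is used consistently.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20-letter alphabet in a fixed order


def indicator_vector(residue, alphabet=AMINO_ACIDS):
    """Return a list of 0s with a single 1 marking the residue's identity."""
    vec = [0] * len(alphabet)
    vec[alphabet.index(residue)] = 1
    return vec


def encode_windows(sequence, window, alphabet=AMINO_ACIDS):
    """Scan the sequence with a fixed-length window.

    Each window becomes one input vector of length window * len(alphabet).
    """
    inputs = []
    for start in range(len(sequence) - window + 1):
        vec = []
        for residue in sequence[start:start + window]:
            vec.extend(indicator_vector(residue, alphabet))
        inputs.append(vec)
    return inputs


# A 5-residue sequence scanned with a window of 3 gives 3 input vectors.
windows = encode_windows("ACDGA", window=3)
```

With a window of 3 over the 20-letter alphabet, each input vector has 60 elements, exactly 3 of which are 1.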

17
  • Indirect Encoding - Use the entire sequence to
    generate the input
  • To include maximal information while restricting
    the size of the input, use n-gram hashing.
  • Assign each residue an identity using a selected
    alphabet of length M.
  • Slide a window of length n across the sequence,
    and count the number of occurrences of each
    n-tuple.
  • Input is of size M^n, where each input corresponds
    to a possible n-tuple. The magnitude of each
    input is the count accumulated for the
    corresponding n-tuple.
  • Different kinds of measures can be combined; for
    example, an n-gram hash vector along with an
    additional input that measures average
    hydrophobic index over the entire sequence.
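
A minimal sketch of the n-tuple counting step is shown below. The two-letter alphabet (H for hydrophobic, P for polar) is a hypothetical reduced alphabet used only to keep the output small.

```python
from itertools import product


def ngram_counts(sequence, alphabet, n):
    """Count every n-tuple seen while sliding a window of length n.

    Returns a vector of length M**n (M = len(alphabet)), with one slot per
    possible n-tuple, ordered as itertools.product enumerates them.
    """
    index = {"".join(t): i for i, t in enumerate(product(alphabet, repeat=n))}
    counts = [0] * len(index)
    for start in range(len(sequence) - n + 1):
        counts[index[sequence[start:start + n]]] += 1
    return counts


# Slots are ordered HH, HP, PH, PP for this alphabet and n = 2.
vector = ngram_counts("HPHPPH", alphabet="HP", n=2)
```

The input length depends only on M and n, not on the sequence length, which is what lets the whole sequence be presented to a network of fixed size.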

18
Input Trimming
  • In general, it is desirable to limit the size of
    the input.
  • Smaller networks tend to generalize better
  • Smaller networks are faster to train
  • If inputs are correlated, they can be combined
    into an aggregate descriptor. There are a number
    of statistical methods to approach this,
    including:
  • Principal Component Analysis (PCA)
  • Singular Value Decomposition (SVD)
  • Partial Least Squares regression (PLS)
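
To illustrate how correlated inputs can be collapsed into one aggregate descriptor, the sketch below finds the leading principal component by power iteration on the covariance matrix. Power iteration is used here only to keep the example dependency-free; in practice a library SVD routine would be the usual choice. The data values are made up.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the leading principal component.

    `rows` is a list of equal-length input vectors. Assumes the data are
    not constant, so the covariance matrix is non-zero.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered example onto the component: one number per example.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Two strongly correlated inputs (the second is roughly twice the first).
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8]]
direction, aggregate = first_principal_component(data)
```

Here two inputs are replaced by a single score per example, halving the input size while keeping nearly all of the variance.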

19
Output
  • The simplest element to design
  • If N categories are being simultaneously
    predicted, there will need to be N outputs.
  • A yes/no classification can be made for each
    category using a threshold function, or the
    output strengths can be used directly as measures
    of confidence.
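
Decoding N outputs could look like the sketch below, which reports both the thresholded yes/no call and the raw output strength. The 0.5 threshold, the category names, and the output values are assumptions for illustration.

```python
def decode_outputs(outputs, categories, threshold=0.5):
    """Turn N output activations into yes/no calls plus confidences.

    The default threshold is an illustrative assumption and would normally
    be tuned on a validation set.
    """
    return {
        name: {"assigned": value >= threshold, "confidence": value}
        for name, value in zip(categories, outputs)
    }


# Hypothetical three-category predictor with one output per category.
calls = decode_outputs([0.91, 0.12, 0.47], ["helix", "sheet", "turn"])
```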

20
Network Design
  • As a general principle, networks should be kept
    as small as possible, in terms of both numbers of
    units and connections. Smaller networks are less
    prone to overfitting, and thus generalize
    better.
  • In network growing, the number of units in a
    hidden layer is steadily increased, retraining at
    each step, until the optimum performance of the
    network is realized.
  • In network pruning, one begins with a large
    network with good performance, and then applies
    an automated method to identify connections with
    small weights, and neurons with low activation.
    Connections and neurons can then be culled,
    reducing the size of the network, and presumably
    improving its capacity to generalize.
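
A minimal sketch of the pruning idea is given below, assuming a single weight matrix stored as nested lists. It culls connections by weight magnitude and flags hidden units left with no incoming connections. Identifying neurons by low activation, as mentioned above, would need activation statistics that this sketch does not model. The cutoff and weights are made up.

```python
def prune_small_weights(weights, cutoff):
    """Zero out connections whose absolute weight is below `cutoff`.

    `weights[i][j]` is the weight from input i to hidden unit j.
    Returns the pruned matrix and the number of connections removed.
    """
    pruned = [[w if abs(w) >= cutoff else 0.0 for w in row] for row in weights]
    removed = sum(
        1
        for old_row, new_row in zip(weights, pruned)
        for old, new in zip(old_row, new_row)
        if old != 0.0 and new == 0.0
    )
    return pruned, removed


def dead_units(weights):
    """Indices of hidden units left with no incoming connections."""
    n_units = len(weights[0])
    return [j for j in range(n_units) if all(row[j] == 0.0 for row in weights)]


weights = [[0.80, 0.01, -0.40],
           [-0.02, 0.03, 0.65]]
pruned, removed = prune_small_weights(weights, cutoff=0.05)
culled = dead_units(pruned)  # hidden unit 1 has lost all of its inputs
```

A pruned network would normally be retrained and re-evaluated before the smaller version is accepted.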

21
Network Training
  • Learning rate can have a big impact on how
    quickly an optimum set of parameters is found. A
    poor choice of rate may make it impossible to
    achieve convergence.
  • The backpropagation algorithm is sensitive to the
    initial weights; in some cases, convergence may
    depend on the initial conditions.
  • In general, it is not necessary or desirable to
    train a network until the error function reaches
    a minimum. The network may generalize better if
    training is stopped short of the minimum, by
    application of a user-specified tolerance.
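
The effect of learning rate and of a stopping tolerance can be seen even in a toy setting. The sketch below fits a single linear unit by gradient descent as a stand-in for backpropagation in a full network. The data, learning rates, and tolerance are assumptions chosen so that one run converges and the other diverges.

```python
def train_linear_unit(samples, learning_rate, tolerance, max_epochs=10000,
                      initial_weight=0.0):
    """Fit y = w * x by gradient descent on mean squared error.

    Training stops as soon as the error falls below `tolerance`,
    rather than continuing toward the exact minimum.
    Returns (weight, last computed error, epochs used).
    """
    w = initial_weight
    error = float("inf")
    for epoch in range(max_epochs):
        error = sum((w * x - y) ** 2 for x, y in samples) / len(samples)
        if error < tolerance:
            return w, error, epoch
        gradient = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= learning_rate * gradient
    return w, error, max_epochs


samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x

# A modest learning rate reaches the tolerance within a few epochs.
w_ok, err_ok, epochs_ok = train_linear_unit(
    samples, learning_rate=0.05, tolerance=1e-4)

# Too large a rate overshoots on every step, so the error grows instead.
w_bad, err_bad, epochs_bad = train_linear_unit(
    samples, learning_rate=0.5, tolerance=1e-4, max_epochs=50)
```

Changing `initial_weight` changes how many epochs are needed; in a multi-layer network the starting weights can also decide which minimum is reached.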

22
Training/Validation Sets
  • The most critical component of constructing a
    neural network application
  • Training is hampered by uneven representation of
    categories to be recognized, and by incorrect
    annotation (e.g. bad examples).
  • Often negative examples outnumber positives by
    orders of magnitude; one often proceeds by
    generating negatives that are randomized or noisy
    versions of positive examples, thus maintaining a
    balance between these two sets.
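
One way to generate randomized negatives, sketched under the assumption that shuffling residue order is an acceptable form of randomization for the task at hand, is shown below. The sequences and function names are hypothetical.

```python
import random


def shuffled_negative(sequence, rng):
    """Return a randomized version of a positive example.

    Shuffling keeps the residue composition but destroys the order.
    """
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


def balanced_training_set(positives, seed=0):
    """Pair every positive (label 1) with one shuffled negative (label 0)."""
    rng = random.Random(seed)  # fixed seed so the set is reproducible
    examples = [(seq, 1) for seq in positives]
    examples += [(shuffled_negative(seq, rng), 0) for seq in positives]
    return examples


training = balanced_training_set(["MKVLGAAT", "GSHMLEDP"])
```

A shuffle can occasionally reproduce the original sequence, so a real pipeline would want to check for and discard such cases.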

23
Evaluation
  • In cases where we assign outputs to discrete
    categories, we can apply a number of standardized
    measures of performance. These require that we
    count up, for a given validation set,
  • True Positives (TP): examples that belong in the
    positive category, and are correctly assigned.
  • False Positives (FP): negative examples that are
    incorrectly assigned as positive.
  • True Negatives (TN): examples that belong in the
    negative category, and are correctly assigned.
  • False Negatives (FN): positive examples that are
    incorrectly assigned as negative.
  • Total Examples = TP + FP + TN + FN

24
Evaluation Measures
  • Sensitivity = TP / (TP + FN) (correctly assigning
    the positives)
  • Specificity = TN / (TN + FP) (correctly assigning
    the negatives)

25
  • Positive Predictive Value = TP / (TP + FP)
    (probability that a positive assignment is correct)
  • Negative Predictive Value = TN / (TN + FN)
    (probability that a negative assignment is correct)

26
  • Accuracy = (TP + TN) / (TP + FP + TN + FN)
    (probability of correct assignment, positive or negative)
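
The counts and measures from the evaluation slides can be computed as in the sketch below, which uses the standard definitions of each measure. The validation labels are invented for illustration, and the code assumes no denominator is zero.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN from parallel lists of booleans."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fn = sum(1 for a, p in pairs if a and not p)
    return tp, fp, tn, fn


def evaluation_measures(tp, fp, tn, fn):
    """Standard measures; assumes every denominator is non-zero."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical validation set: 3 positives and 5 negatives.
actual = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]
tp, fp, tn, fn = confusion_counts(actual, predicted)
measures = evaluation_measures(tp, fp, tn, fn)
```

When negatives greatly outnumber positives, accuracy alone can look high even if sensitivity is poor, which is why the measures are reported together.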

27
Chapters 9-11
  • These chapters provide a literature survey of
    applications of neural network methods to
    sequence analysis and protein structure
    prediction.
  • It's a very valuable resource.

28
Loops, etc. in Java
  • Lewis & Loftus,
  • Chapter 5

29
Loops, etc.
  • It's just like in C. We will not cover this in
    any detail. The only interesting extras:
  • Be aware that Java often uses boolean variables
    to represent truth values. Recognize the special
    literals true and false, which take the place of
    the 1 and 0 used in C.
  • Be aware that the String class provides the
    equals() method for comparing two strings,
    character-by-character.

30
But, an interesting graphics example!
  • Chapter 5 also provides some more examples of GUI
    design. The most important topic presented has to
    do with how an ActionListener distinguishes
    events generated by different objects.
  • We will look at the program in Listing 5.24,
    QuoteOptions.