Modular Neural Networks

1
  • Modular Neural Networks

G. Horváth
Department of Measurement and Information Systems
2
Modular networks
  • Why a modular approach?
  • Motivations
  • Biological
  • Learning
  • Computational
  • Implementation

3
Motivations
  • Biological
  • Biological systems are not homogeneous
  • Functional specialization
  • Fault tolerance
  • Cooperation, competition
  • Scalability
  • Extendibility

4
Motivations
  • Complexity of learning (divide and conquer)
  • Training of complex networks (many layers)
  • layer-by-layer learning
  • Speed of learning
  • Catastrophic interference, incremental learning
  • Mixing supervised and unsupervised learning
  • Hierarchical knowledge structure

5
Motivations
  • Computational
  • The capacity of a network
  • The size of the network
  • Catastrophic interference
  • Generalization capability vs network complexity

6
Motivations
  • Implementation (hardware)
  • The degree of parallelism
  • Number of connections
  • The length of physical connections
  • Fan out

7
Modular networks
  • What are the modules?
  • The modules disagree on some inputs
  • every module solves the same, whole problem
  • different ways of solution (different modules)
  • every module solves a different task (sub-task)
  • task decomposition (input space, output space)

8
Modular networks
  • How to combine modules (the rules are sketched in code below)
  • Cooperative modules
  • simple average
  • weighted average (fixed weights)
  • optimal linear combination (OLC) of networks
  • Competitive modules
  • majority vote
  • winner takes all
  • Competitive/cooperative modules
  • weighted average (input-dependent weights)
  • mixture of experts (MOE)

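A minimal sketch of these combination rules in NumPy; the module outputs, votes, and confidences below are hypothetical placeholders:

```python
import numpy as np

# Outputs of M = 4 modules for one input.
y = np.array([0.9, 1.1, 1.0, 0.7])        # regression outputs
votes = np.array([2, 1, 2, 2])            # predicted class labels
conf = np.array([0.6, 0.8, 0.55, 0.7])    # per-module confidences

# Cooperative combination.
simple_avg = y.mean()                     # simple average
w = np.array([0.4, 0.3, 0.2, 0.1])        # fixed weights, sum to 1
weighted_avg = w @ y                      # weighted average

# Competitive combination.
majority = np.bincount(votes).argmax()    # majority vote
winner = votes[conf.argmax()]             # winner takes all

print(simple_avg, weighted_avg, majority, winner)
```

The input-dependent weighting and MOE cases replace the fixed weights with a gating function of the input (see the MOE slides later).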
9
Modular networks
  • Construction of modular networks
  • Task decomposition, subtask definition
  • Training modules for solving subtasks
  • Integration of the results
  • (cooperation and/or competition)

10
Modular networks
  • Cooperative networks
  • Ensemble (average)
  • Optimal linear combination of networks
  • Disjoint subtasks
  • Competitive networks
  • Ensemble (vote)
  • Competitive/cooperative networks
  • Mixture of experts

11
Cooperative networks
  • Ensemble of cooperating networks
    (classification/regression)
  • The motivation
  • Heuristic explanation
  • Different experts together can solve a problem
    better
  • Complementary knowledge
  • Mathematical justification
  • Accurate and diverse modules

12
Ensemble of networks
  • Mathematical justification
  • Ensemble output: $\bar{y}(\mathbf{x}) = \sum_k w_k\, y_k(\mathbf{x})$
  • Ambiguity (diversity): $a_k(\mathbf{x}) = \left(y_k(\mathbf{x}) - \bar{y}(\mathbf{x})\right)^2$
  • Individual error: $e_k(\mathbf{x}) = \left(d(\mathbf{x}) - y_k(\mathbf{x})\right)^2$
  • Ensemble error: $e(\mathbf{x}) = \left(d(\mathbf{x}) - \bar{y}(\mathbf{x})\right)^2$
  • Constraint: $\sum_k w_k = 1,\; w_k \ge 0$

13
Ensemble of networks
  • Mathematical justification (contd)
  • Weighted error: $\bar{e}(\mathbf{x}) = \sum_k w_k\, e_k(\mathbf{x})$
  • Weighted diversity: $\bar{a}(\mathbf{x}) = \sum_k w_k\, a_k(\mathbf{x})$
  • Ensemble error: $e(\mathbf{x}) = \bar{e}(\mathbf{x}) - \bar{a}(\mathbf{x})$
  • Averaging over the input distribution: $E = \bar{E} - \bar{A}$
  • Solution: an ensemble of accurate (small $\bar{E}$) and
    diverse (large $\bar{A}$) networks, as derived below

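The identity $e = \bar{e} - \bar{a}$ (Krogh and Vedelsby's ambiguity decomposition) follows in a few lines from the constraint $\sum_k w_k = 1$; dropping the argument $\mathbf{x}$:

```latex
\begin{aligned}
\bar{e} - \bar{a}
  &= \sum_k w_k\left[(d - y_k)^2 - (y_k - \bar{y})^2\right] \\
  &= \sum_k w_k\left[d^2 - 2 d y_k + 2 y_k \bar{y} - \bar{y}^2\right] \\
  &= d^2 - 2 d \bar{y} + 2\bar{y}^2 - \bar{y}^2
     \quad\text{(using } \sum_k w_k = 1,\ \sum_k w_k y_k = \bar{y}\text{)} \\
  &= (d - \bar{y})^2 = e .
\end{aligned}
```

Since $\bar{a} \ge 0$, the ensemble error never exceeds the weighted average of the individual errors.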
14
Ensemble of networks
  • How to get accurate and diverse networks
  • different structures more than one network
    structure (e.g. MLP, RBF, CCN, etc.)
  • different size, different complexity networks
    (number of hidden units, number of layers,
    nonlinear function, etc.)
  • different learning strategies (BP, CG, random
    search, etc.), batch learning, sequential learning
  • different training algorithms, sample order,
    learning samples
  • different training parameters
  • different starting parameter values
  • different stopping criteria

15
Linear combination of networks
16
Linear combination of networks
  • Computation of the optimal coefficients $\alpha_k$
  • $\alpha_k = 1/M$ → simple average
  • $\alpha_k = \alpha_k(\mathbf{x})$ → the coefficients depend on
    the input; for different input domains a different
    network (alone) gives the output
  • optimal values using the constraint $\sum_k \alpha_k = 1$
  • optimal values without any constraint →
    the Wiener-Hopf equation $\mathbf{R}\boldsymbol{\alpha}^* = \mathbf{p}$, with the correlation
    matrix $\mathbf{R} = E[\mathbf{y}\mathbf{y}^T]$ of the module outputs $\mathbf{y}$ and the
    cross-correlation $\mathbf{p} = E[d\,\mathbf{y}]$ with the desired output
    (see the sketch below)

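A minimal sketch of both computations, using sample estimates of $\mathbf{R}$ and $\mathbf{p}$; the training data here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 200, 3                        # N samples, M trained modules
d = rng.normal(size=N)               # desired outputs (placeholder data)
Y = d[:, None] + 0.3 * rng.normal(size=(N, M))  # module outputs: target + noise

# Unconstrained optimum: solve R alpha = p (sample Wiener-Hopf equation).
R = Y.T @ Y / N                      # output correlation matrix
p = Y.T @ d / N                      # cross-correlation with the target
alpha = np.linalg.solve(R, p)

# Constrained optimum (weights sum to 1) via a Lagrange multiplier.
ones = np.ones(M)
Rinv_ones = np.linalg.solve(R, ones)
alpha_c = alpha + Rinv_ones * (1 - ones @ alpha) / (ones @ Rinv_ones)

print(alpha, alpha_c, alpha_c.sum())  # alpha_c sums to 1
```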
17
Task decomposition
  • Decomposition related to learning
  • before learning (subtask definition)
  • during learning (automatic task decomposition)
  • Problem space decomposition
  • input space (input space clustering, definition
    of different input regions)
  • output space (desired response)

18
Task decomposition
  • Decomposition into separate subproblems
  • K-class classification → K two-class
    problems (coarse decomposition)
  • Complex two-class problems → smaller
    two-class problems (fine decomposition)
  • Integration (module combination)

19
Task decomposition
  • A 3-class problem

20
Task decomposition
  • 3 classes

[Figure: each of the classes split into 2 small classes]
21
Task decomposition
  • 3 classes

[Figure: the small classes grouped into 2-class problems]
22
Task decomposition
  • 3 classes

[Figure: the resulting two-class decomposition]
23
Task decomposition
24
Task decomposition
  • A two-class problem decomposed into subtasks

25
Task decomposition
[Figure: logic realization of the decomposition, (M11 AND M12) OR (M21 AND M22)]
26
Task decomposition
[Figure: network realization; the input feeds modules M11, M12, M21 and M22, MIN units combine M11 with M12 and M21 with M22, and a MAX unit combines the two MIN outputs into the class output C1]
27
Task decomposition
  • Training set decomposition
  • Original training set
  • Training set for each of the K two-class
    problems
  • Each of the two-class problems is divided into
    K-1 smaller two-class problems; using an inverter
    module, (K-1)/2 of them is actually enough
    (the coarse one-vs-rest step is sketched below)

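The coarse step of this decomposition (one two-class training set per class) can be written in a few lines; `X` and `y` are a hypothetical feature matrix and label vector:

```python
import numpy as np

def coarse_decomposition(X, y, K):
    """Split a K-class training set into K one-vs-rest two-class sets."""
    return [(X, (y == k).astype(int)) for k in range(K)]

X = np.random.default_rng(1).normal(size=(6, 2))
y = np.array([0, 1, 2, 0, 1, 2])
sets = coarse_decomposition(X, y, K=3)
print([t.tolist() for _, t in sets])  # three binary target vectors
```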
28
Task decomposition
  • A practical example: ZIP code recognition

29
Task decomposition
  • ZIP code recognition (handwritten character
    recognition): a modular solution

30
Mixture of Experts (MOE)
[Figure: MOE architecture; the input x feeds Expert 1 ... Expert M (outputs µ1 ... µM) and the gating network (outputs g1 ... gM), and a summing unit forms the combined output µ = Σ gi µi]
31
Mixture of Experts (MOE)
  • The output is the weighted sum of the outputs of
    the experts: $\mathbf{y} = \sum_{i=1}^{M} g_i\, \boldsymbol{\mu}_i(\mathbf{x}, \boldsymbol{\theta}_i)$
  • $\boldsymbol{\theta}_i$ is the parameter of the i-th expert
  • The output of the gating network is the softmax
    function: $g_i = \dfrac{\exp(\mathbf{v}_i^T\mathbf{x})}{\sum_j \exp(\mathbf{v}_j^T\mathbf{x})}$
  • $\mathbf{v}_i$ is the parameter of the gating network


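A minimal sketch of this forward pass with linear experts and a linear-softmax gate; all dimensions and parameter values are placeholders:

```python
import numpy as np

def softmax(s):
    s = s - s.max()                  # numerical stability
    e = np.exp(s)
    return e / e.sum()

rng = np.random.default_rng(2)
n_in, M = 4, 3                       # input dimension, number of experts
Theta = rng.normal(size=(M, n_in))   # theta_i: one linear expert per row
V = rng.normal(size=(M, n_in))       # v_i: gating parameters

x = rng.normal(size=n_in)
mu = Theta @ x                       # expert outputs mu_i
g = softmax(V @ x)                   # gating outputs g_i (sum to 1)
y = g @ mu                           # y = sum_i g_i mu_i
print(g, y)
```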
32
Mixture of Experts (MOE)
  • Probabilistic interpretation
  • The probabilistic model with true parameters:
    $P(\mathbf{d}\mid\mathbf{x}, \Theta^0) = \sum_i g_i^0\, P(\mathbf{d}\mid\mathbf{x}, \boldsymbol{\theta}_i^0)$
  • $g_i$ is the a priori probability that expert i
    generates the data
33
Mixture of Experts (MOE)
  • Training
  • Training data: $\{\mathbf{x}^{(l)}, \mathbf{d}^{(l)}\},\; l = 1, \dots, N$
  • Probability of generating the output from the input:
    $P(\mathbf{d}^{(l)}\mid\mathbf{x}^{(l)}, \Theta) = \sum_i g_i^{(l)}\, P(\mathbf{d}^{(l)}\mid\mathbf{x}^{(l)}, \boldsymbol{\theta}_i)$
  • The log likelihood function (maximum likelihood
    estimation): $L(\Theta) = \sum_l \ln \sum_i g_i^{(l)}\, P(\mathbf{d}^{(l)}\mid\mathbf{x}^{(l)}, \boldsymbol{\theta}_i)$

34
Mixture of Experts (MOE)
  • Training (contd)
  • Gradient method
  • The parameter of the expert network:
    $\dfrac{\partial L}{\partial \boldsymbol{\theta}_i} = \sum_l h_i^{(l)}\, \dfrac{\partial \ln P(\mathbf{d}^{(l)}\mid\mathbf{x}^{(l)}, \boldsymbol{\theta}_i)}{\partial \boldsymbol{\theta}_i}$
  • The parameter of the gating network:
    $\dfrac{\partial L}{\partial \mathbf{v}_i} = \sum_l \left(h_i^{(l)} - g_i^{(l)}\right)\mathbf{x}^{(l)}$
  where $h_i^{(l)}$ is the a posteriori probability defined
  on the next slide
35
Mixture of Experts (MOE)
  • Training (contd)
  • A priori probability: $g_i$ (the gating output)
  • A posteriori probability:
    $h_i = \dfrac{g_i\, P(\mathbf{d}\mid\mathbf{x}, \boldsymbol{\theta}_i)}{\sum_j g_j\, P(\mathbf{d}\mid\mathbf{x}, \boldsymbol{\theta}_j)}$

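For Gaussian experts with a common variance, the posterior reduces to a reweighting of the gating outputs; a small sketch continuing the one above (all names and values are hypothetical):

```python
import numpy as np

def posteriors(g, mu, d, sigma=1.0):
    """h_i = g_i P(d|x,theta_i) / sum_j g_j P(d|x,theta_j), Gaussian experts."""
    lik = np.exp(-(d - mu) ** 2 / (2 * sigma ** 2))  # unnormalized likelihoods
    h = g * lik
    return h / h.sum()

g = np.array([0.5, 0.3, 0.2])        # a priori (gating) probabilities
mu = np.array([1.2, 0.4, 0.9])       # expert outputs
print(posteriors(g, mu, d=1.0))      # a posteriori probabilities, sum to 1
```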
36
Mixture of Experts (MOE)
  • Training (contd)
  • EM (Expectation Maximization) algorithm
  • A general iterative technique for maximum
    likelihood estimation
  • Introducing hidden variables
  • Defining a log likelihood function
  • Two steps
  • Expectation of the hidden variables
  • Maximization of the log likelihood function

37
EM (Expectation Maximization) algorithm
  • A simple example: estimating the means of k (= 2)
    Gaussians

38
EM (Expectation Maximization) algorithm
  • A simple example: estimating the means of k (= 2)
    Gaussians
  • hidden variables for every observation:
    $(x^{(l)}, z_{l1}, z_{l2})$, where $z_{li} = 1$ if $x^{(l)}$ was drawn
    from Gaussian $i$ and 0 otherwise
  • likelihood function of a complete observation:
    $p(x^{(l)}, z_{l1}, z_{l2} \mid \mu_1, \mu_2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{\sum_i z_{li}\,(x^{(l)} - \mu_i)^2}{2\sigma^2}\right)$
  • Log likelihood function: the sum of its logarithm
    over all observations
  • E step: the expected value of $z_{li}$ with the means given:
    $E[z_{li}] = \dfrac{\exp\!\left(-(x^{(l)} - \mu_i)^2 / 2\sigma^2\right)}{\sum_{j=1}^{2} \exp\!\left(-(x^{(l)} - \mu_j)^2 / 2\sigma^2\right)}$

39
Mixture of Experts (MOE)
  • A simple example: estimating the means of k (= 2)
    Gaussians (contd)
  • Expected log likelihood function:
    $E[\ln p] = \sum_l \left(\ln\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{\sum_i E[z_{li}]\,(x^{(l)} - \mu_i)^2}{2\sigma^2}\right)$
  • where $E[z_{li}]$ is computed in the E step above
  • M step, the estimate of the means:
    $\hat{\mu}_i = \dfrac{\sum_l E[z_{li}]\, x^{(l)}}{\sum_l E[z_{li}]}$

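Putting the E and M steps together for the two-Gaussian example gives a complete, minimal EM loop (synthetic data, known common sigma):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 1.0
x = np.concatenate([rng.normal(-2, sigma, 100), rng.normal(3, sigma, 100)])

mu = np.array([-1.0, 1.0])           # initial guesses for the two means
for _ in range(50):
    # E step: responsibilities E[z_li] for each observation and Gaussian.
    w = np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma ** 2))
    Ez = w / w.sum(axis=1, keepdims=True)
    # M step: means as responsibility-weighted averages.
    mu = (Ez * x[:, None]).sum(axis=0) / Ez.sum(axis=0)

print(mu)                            # converges near (-2, 3)
```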
40
Mixture of Experts (MOE)
  • Applications
  • Simple experts: linear experts
  • ECG diagnostics
  • Mixture of Kalman filters
  • Discussion: comparison to non-modular
    architectures

41
Support vector machines
  • A new approach
  • Gives answers to questions not solved by the
    classical approach:
  • The size of the network
  • The generalization capability

42
Support vector machines
  • Classification

[Figure: decision boundaries of classical neural learning vs. the SVM optimal hyperplane]
43
VC dimension
44
Structural risk minimization
45
Support vector machines
  • Linearly separable two-class problem
  • separating hyperplane: $\mathbf{w}^T\mathbf{x} + b = 0$, with
    $d_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1$ for every training sample

[Figure: the optimal (maximal-margin) hyperplane]
46
Support vector machines
  • Geometric interpretation: the margin of separation
    is $2/\|\mathbf{w}\|$, so maximizing the margin is equivalent
    to minimizing $\|\mathbf{w}\|^2$

47
Support vector machines
  • Criterion function: minimize $\frac{1}{2}\|\mathbf{w}\|^2$ under the
    constraints above; Lagrange function:
    $J(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_i \alpha_i\left[d_i(\mathbf{w}^T\mathbf{x}_i + b) - 1\right]$
  • a constrained optimization problem
  • conditions: $\partial J/\partial\mathbf{w} = 0 \Rightarrow \mathbf{w} = \sum_i \alpha_i d_i \mathbf{x}_i$, and $\sum_i \alpha_i d_i = 0$
  • dual problem: maximize $\sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j d_i d_j\, \mathbf{x}_i^T\mathbf{x}_j$ with $\alpha_i \ge 0$
  • support vectors (the samples with $\alpha_i > 0$) define
    the optimal hyperplane

48
Support vector machines
  • Linearly nonseparable case
  • separating hyperplane with slack variables $\xi_i \ge 0$:
    $d_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1 - \xi_i$
  • criterion function: $\frac{1}{2}\|\mathbf{w}\|^2 + C\sum_i \xi_i$
  • Lagrange function and dual as in the separable case,
    with the additional constraint $0 \le \alpha_i \le C$
  • support vectors define the optimal hyperplane

[Figure: optimal hyperplane for the nonseparable case, with margin violations]
49
Support vector machines
  • Nonlinear separation
  • separating hyperplane in the feature space:
    $\mathbf{w}^T\boldsymbol{\varphi}(\mathbf{x}) + b = 0$
  • decision surface: $\sum_i \alpha_i d_i\, K(\mathbf{x}, \mathbf{x}_i) + b = 0$
  • kernel function: $K(\mathbf{x}, \mathbf{x}_i) = \boldsymbol{\varphi}(\mathbf{x})^T\boldsymbol{\varphi}(\mathbf{x}_i)$
  • criterion function: as in the linear case, with
    $\mathbf{x}_i^T\mathbf{x}_j$ replaced by $K(\mathbf{x}_i, \mathbf{x}_j)$

50
Support vector machines
  • Examples of SVM kernels
  • Polynomial: $K(\mathbf{x}, \mathbf{x}_i) = (\mathbf{x}^T\mathbf{x}_i + 1)^p$
  • RBF: $K(\mathbf{x}, \mathbf{x}_i) = \exp\!\left(-\|\mathbf{x} - \mathbf{x}_i\|^2 / 2\sigma^2\right)$
  • MLP: $K(\mathbf{x}, \mathbf{x}_i) = \tanh\!\left(\kappa\, \mathbf{x}^T\mathbf{x}_i + \delta\right)$ (a valid
    kernel only for some $\kappa$, $\delta$)

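These kernels correspond directly to standard library options; a usage sketch with scikit-learn's SVC (assuming scikit-learn is available; the data is synthetic):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)   # nonlinearly separable labels

for clf in (SVC(kernel='poly', degree=2, coef0=1),  # (gamma x^T x_i + 1)^2
            SVC(kernel='rbf', gamma=0.5),           # exp(-0.5 ||x - x_i||^2)
            SVC(kernel='sigmoid', coef0=-1)):       # tanh(gamma x^T x_i - 1)
    clf.fit(X, y)
    print(clf.kernel, len(clf.support_), clf.score(X, y))
```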
51
Support vector machines
  • Example: polynomial kernel, $p = 2$, two-dimensional input
  • basis functions:
    $\boldsymbol{\varphi}(\mathbf{x}) = \left(1, \sqrt{2}x_1, \sqrt{2}x_2, \sqrt{2}x_1 x_2, x_1^2, x_2^2\right)$
  • kernel function: $K(\mathbf{x}, \mathbf{y}) = \boldsymbol{\varphi}(\mathbf{x})^T\boldsymbol{\varphi}(\mathbf{y}) = (1 + \mathbf{x}^T\mathbf{y})^2$

52
SVM (classification): summary
  • Separable samples: minimize $\frac{1}{2}\|\mathbf{w}\|^2$ under the
    constraint $d_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1$
  • Nonseparable samples: minimize $\frac{1}{2}\|\mathbf{w}\|^2 + C\sum_i \xi_i$
    under the constraint $d_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$
53
SVR (regression)
54
SVR (regression)

  • Constraints: $d_i - \mathbf{w}^T\mathbf{x}_i - b \le \varepsilon + \xi_i$,
    $\mathbf{w}^T\mathbf{x}_i + b - d_i \le \varepsilon + \xi_i^*$, $\xi_i, \xi_i^* \ge 0$
  • Minimize: $\frac{1}{2}\|\mathbf{w}\|^2 + C\sum_i \left(\xi_i + \xi_i^*\right)$
55
SVR (regression)
  • Lagrange function: formed from the criterion and the
    constraints with multipliers $\alpha_i$, $\alpha_i^*$
  • dual problem: maximize
    $\sum_i d_i(\alpha_i - \alpha_i^*) - \varepsilon\sum_i(\alpha_i + \alpha_i^*) - \frac{1}{2}\sum_{i,j}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, K(\mathbf{x}_i, \mathbf{x}_j)$
  • constraints: $\sum_i(\alpha_i - \alpha_i^*) = 0$, $0 \le \alpha_i, \alpha_i^* \le C$
  • support vectors: the samples with $\alpha_i - \alpha_i^* \ne 0$
    (on or outside the $\varepsilon$-tube)
  • solution: $y(\mathbf{x}) = \sum_i (\alpha_i - \alpha_i^*)\, K(\mathbf{x}, \mathbf{x}_i) + b$

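scikit-learn's SVR returns a solution of exactly this kernel-expansion form; a short usage sketch on synthetic data (the hyperparameter values are arbitrary):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(5)
X = np.sort(rng.uniform(-3, 3, size=(80, 1)), axis=0)
d = np.sin(X).ravel() + 0.1 * rng.normal(size=80)   # noisy sine target

svr = SVR(kernel='rbf', C=10.0, epsilon=0.1)        # epsilon sets the tube width
svr.fit(X, d)
# Only samples on or outside the epsilon-tube become support vectors.
print(len(svr.support_), 'support vectors out of', len(X), 'samples')
```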
56
SVR (regression)
57
SVR (regression)
58
SVR (regression)
59
SVR (regression)
60
Support vector machines
  • Main advantages
  • generalization
  • size of the network
  • centre parameters for RBF
  • linear-in-the-parameter structure
  • noise immunity

61
Support vector machines
  • Main disadvantages
  • computation intensive (quadratic optimization)
  • hyperparameter selection
  • VC dimension (classification)
  • batch processing

62
Support vector machines
  • Variants
  • LS-SVM (least-squares SVM)
  • basic criterion function:
    $J(\mathbf{w}, \mathbf{e}) = \frac{1}{2}\mathbf{w}^T\mathbf{w} + \frac{\gamma}{2}\sum_i e_i^2$
    subject to $d_i = \mathbf{w}^T\boldsymbol{\varphi}(\mathbf{x}_i) + b + e_i$
  • Advantages: easier to compute (a linear system, shown
    below, instead of quadratic programming), adaptivity

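With these equality constraints, the KKT conditions of the LS-SVM reduce to a single linear system (Suykens' standard formulation, stated here for completeness):

```latex
\begin{bmatrix}
  0 & \mathbf{1}^T \\
  \mathbf{1} & \boldsymbol{\Omega} + \gamma^{-1}\mathbf{I}
\end{bmatrix}
\begin{bmatrix} b \\ \boldsymbol{\alpha} \end{bmatrix}
=
\begin{bmatrix} 0 \\ \mathbf{d} \end{bmatrix},
\qquad \Omega_{ij} = K(\mathbf{x}_i, \mathbf{x}_j),
```

so training costs one matrix solve instead of a quadratic program, which is the "easier to compute" advantage above.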
63
Mixture of SVMs
  • The problem of hyper-parameter selection for SVMs
  • Different SVMs with different hyper-parameters
  • e.g. different kernel widths $\sigma$
  • Soft separation of the input space

64
Boosting techniques
  • Boosting by filtering
  • Boosting by subsampling
  • Boosting by reweighting

65
Other modular architectures
  • Modular classifiers
  • Decoupled modules
  • Hierarchical modules
  • Network ensemble (linear combination)
  • Network ensemble (decision, voting)

66
Modular architectures