Neural Implementations of Bayesian Inference (PowerPoint presentation transcript)
1
Neural Implementations of Bayesian Inference
Alexandre Pouget Department of Brain and
Cognitive Sciences University of Rochester
2
Outline
  • Encoding probability distributions with spikes
  • Bayesian inference with spikes: multisensory
    integration
  • Bayesian inference with spikes: decision making
  • Alternative schemes
  • Maximum likelihood estimation

3
Visuo-Tactile Integration
(Ernst and Banks, Nature, 2002)
4
Visuo-Tactile Integration
Bimodal: p(s|Vision, Touch) ∝ p(s|Vision) p(s|Touch)
[Figure: probability vs. S (width), showing p(s|Vision) and the combined bimodal posterior.]
5
Main Issues
  • How do cortical neurons represent probability
    distributions?
  • How do they take products of distributions?
  • How do we make optimal decisions? How do neurons
    collapse distributions onto maximum likelihood
    estimates?

6
Main Issues
  • And how do they do so given the high level of
    variability in neuronal responses in cortex?

7
Poisson Variability in Cortex
The variability is Poisson-like: p(r|s) (r = spike
counts) is bell-shaped with variance proportional
to the mean (Fano factors within 0.3-1.8; the Fano
factor for a Poisson process is 1).
[Figure: spike rasters for trials 1-4 in response to the same stimulus.]
8
Probabilistic population code
  • As an example, we consider a population of
    neurons with Gaussian tuning curves and
    independent Poisson variability.

r = (r1, r2, ..., rn)
[Figure: left, Gaussian tuning curves (activity vs. stimulus, -45 to 45); right, the population pattern of activity on a single trial (spike count vs. preferred stimulus).]
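The example above can be made concrete with a minimal simulation sketch; the parameter values (number of neurons, tuning width, gain) are assumptions chosen only for illustration:

```python
# A population of neurons with Gaussian tuning curves and independent
# Poisson spike-count variability; r = (r1, ..., rn) is one trial's pattern.
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 50
pref = np.linspace(-45, 45, n_neurons)   # preferred stimuli (deg)
sigma_tc = 15.0                          # tuning-curve width (deg), assumed
gain = 20.0                              # peak mean spike count, assumed

def tuning(s):
    """Mean spike count f_i(s) of every neuron for stimulus s."""
    return gain * np.exp(-(s - pref) ** 2 / (2 * sigma_tc ** 2))

s_true = 5.0
r = rng.poisson(tuning(s_true))          # population pattern on a single trial
```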
9
Population codes
  • Standard approach: estimate a single value
    (e.g. with the population vector) from the
    population activity r.

[Figure: population activity r (spike count vs. preferred stimulus, -45 to 45) collapsed onto a population-vector estimate.]
Underlying assumption: population codes encode
single values.
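A minimal sketch of this standard single-value readout; the population vector is written here for a circular variable (for a non-circular stimulus such as width, a center-of-mass readout plays the same role), and the function name is ours:

```python
# Standard single-value readout: the population vector (circular version).
import numpy as np

def population_vector(r, pref_deg):
    """Decode one value (deg) from spike counts r and preferred stimuli."""
    angles = np.deg2rad(pref_deg)
    return np.rad2deg(np.arctan2(np.sum(r * np.sin(angles)),
                                 np.sum(r * np.cos(angles))))
```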
10
Probabilistic population codes
  • Alternative: compute a posterior distribution
    p(s|r) from r (Foldiak, 1993; Sanger, 1996).

[Figure: population activity r (spike count vs. preferred stimulus, -45 to 45).]
Variability in neural responses for a constant
stimulus: Poisson-like.
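A hedged sketch of this probabilistic-population-code readout under the assumptions above (independent Poisson noise, a flat prior); the grid-based helper below is an illustration, not the presentation's implementation:

```python
# Grid-based posterior p(s|r) for independent Poisson noise and a flat prior.
import numpy as np

def posterior(r, s_grid, tuning):
    """Return p(s|r) evaluated on s_grid, given mean counts tuning(s)."""
    log_post = np.array([np.sum(r * np.log(tuning(s) + 1e-12) - tuning(s))
                         for s in s_grid])
    log_post -= log_post.max()           # subtract max for numerical stability
    p = np.exp(log_post)
    return p / p.sum()
```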
11
Probabilistic population codes
  • For independent Poisson noise: a product of experts

12
For independent Poisson noise, the width of the
posterior shrinks as the gain grows. Therefore, the
gain encodes the certainty associated with the
encoded variable.
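A hedged reconstruction of the standard relation behind this point, assuming Gaussian tuning curves f_i(s) of width sigma, a gain g, dense translation-invariant tiling, and a flat prior:

```latex
p(s \mid \mathbf{r}) \;\propto\; \prod_i e^{-f_i(s)}\, f_i(s)^{\,r_i}
\qquad\Longrightarrow\qquad
\operatorname{Var}(s \mid \mathbf{r}) \;\approx\; \frac{\sigma^2}{\sum_i r_i} \;\propto\; \frac{1}{g}
```

Each neuron contributes one expert to the product, and since the expected total spike count grows with the gain, a larger gain gives a narrower posterior, i.e. more certainty.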
13
Gain and variance
  • For independent Poisson noise, the gain sets the
    average width of the posterior.
14
Experimental Evidence
  • Contrast

Anderson et al, Nature 2000
15
Experimental Evidence
  • Contrast
  • Motion Coherence
  • Retinal Eccentricity

16
Outline
  • Bayesian inference: multisensory integration

17
Inferences with probabilistic population codes
[Figure: two input populations (activity vs. preferred s, -45 to 45). Cue 1 (vision), area C1, with gain g1; cue 2 (touch), area C2, with gain g2.]
18
[Figure: adding the two population responses (C1, gain g1; C2, gain g2) yields a combined pattern with gain g = g1 + g2 (activity vs. preferred s, -45 to 45).]
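A sketch of this cue-combination step, continuing the earlier code (it reuses the assumed tuning(), posterior(), rng, and s_true): adding the spike counts of two independent-Poisson populations approximately multiplies their posteriors.

```python
# Cue combination with PPCs: summing spike counts multiplies posteriors.
import numpy as np

s_grid = np.linspace(-45, 45, 181)
r1 = rng.poisson(0.5 * tuning(s_true))   # low-gain cue (e.g. vision), gain g1
r2 = rng.poisson(1.5 * tuning(s_true))   # high-gain cue (e.g. touch), gain g2
r3 = r1 + r2                             # combined pattern, gain g1 + g2

p_product = posterior(r1, s_grid, tuning) * posterior(r2, s_grid, tuning)
p_product /= p_product.sum()
p_combined = posterior(r3, s_grid, tuning)
# The two should agree closely wherever the posterior has appreciable mass.
```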
19
Visuo-Tactile Integration
Bimodal: p(s|Vision, Touch) ∝ p(s|Vision) p(s|Touch)
[Figure: probability vs. S (width), showing p(s|Vision) and the combined bimodal posterior.]
20
Normalization
  • Divisive normalization can be used to keep
    neurons within their firing range.
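A minimal sketch of one possible divisive-normalization step; the exact form and constants are assumptions, not the network's actual normalization:

```python
# One possible divisive-normalization step for a population response r.
import numpy as np

def divisive_normalization(r, sigma=50.0, target=100.0):
    """Rescale population activity by its summed activity."""
    return target * r / (sigma + np.sum(r))
```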

21
Assumptions
  • Neural noise: independent Poisson noise
  • Gaussian tuning curves
  • Unimodal Gaussian probability distributions over
    the stimulus
  • Is this more general?

22
Bayesian decoder
[Figure: population responses r1 (cue 1, area C1) and r2 (cue 2, area C2) and their sum r1 + r2 (activity vs. preferred s, -45 to 45). A Bayesian decoder applied to the summed pattern is compared with Bayesian decoders applied to r1 and r2 separately, combined by Bayesian inference.]
23
Variability requirements
  • Exponential family of distributions with linear
    sufficient statistics: p(r|s) = φ(r) exp(h(s)·r),
    where the kernel h(s) is set by the covariance
    matrix of r and the derivative of the tuning
    curves.
24
Kernel h(s): h'(s) = Σ(s)^-1 f'(s), where Σ(s) is the
covariance matrix of r and f'(s) is the derivative of
the tuning curves (the local covariance between r and
s). This is the local optimal linear estimator!
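A sketch of the locally optimal linear estimator for the independent-Poisson case, continuing the earlier tuning-curve example (tuning() and the reference point s0 are assumptions):

```python
# Locally optimal linear estimator (LOLE) around s0 for independent Poisson
# noise, where the covariance of r is diag(f(s0)): w = Sigma^{-1} f'(s0).
import numpy as np

s0, ds = 0.0, 0.1
f0 = tuning(s0)
f_prime = (tuning(s0 + ds) - tuning(s0 - ds)) / (2 * ds)  # numerical f'(s0)
Sigma = np.diag(f0)                                       # Poisson: var = mean
w_lole = np.linalg.solve(Sigma, f_prime)                  # Sigma^{-1} f'(s0)
# Locally unbiased linear estimate near s0:
#   s_hat = s0 + w_lole @ (r - f0) / (w_lole @ f_prime)
```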
25
Covariance requirements
  • This family includes any distribution in which
    the covariance matrix is proportional to the
    mean, regardless of the form of the correlations.
  • Any exponential-family distribution with a fixed
    Fano factor works.

26
Tuning curve requirements
  • The tuning curve f(s) can take any shape.
    However, h(s) has to be the same in all
    populations. What if it's not the same?

27
Tuning curves: identical Gaussians
[Figure: cue 1 and cue 2 populations with identical Gaussian tuning curves (activity vs. preferred s).]
28
Tuning curves: Gaussians with different widths
[Figure: cue 1 and cue 2 populations with Gaussian tuning curves of different widths.]
29
Tuning curves: Gaussians vs. sigmoids
[Figure: cue 1 with Gaussian tuning curves and cue 2 with sigmoidal tuning curves (activity vs. preferred s, -50 to 50).]
30
Tuning curve requirements
  • Let's say r1 has Gaussian tuning curves and r2
    uses sigmoidal tuning curves. Then the optimal
    combination is still a linear combination,
    involving a matrix A.
  • The matrix A exists if the tuning curves are
    basis sets.

31
Distribution over s
  • p(r|s) does not have to be a normal distribution
    over s.

32
Prior Distributions
  • Priors are easily incorporated.
  • Prediction: baseline activity in cortex (e.g.
    before the start of a trial) should encode the
    prior distribution.
  • There is evidence for this idea in LIP (Glimcher
    and Platt) and the superior colliculus (Basso and
    Wurtz).

33
Summary
  • Linear combinations of PPCs are equivalent to
    optimal Bayesian inference when the variability
    follows an exponential-family distribution. This
    works for
  • all covariance matrices that are proportional to
    the mean (fixed Fano factor)
  • any set of tuning curves that forms a basis set
  • any probability distribution over s
  • any prior distribution over s

34
Integrate and fire neurons
  • Can we get a similar result with realistic
    networks of spiking neurons, such as integrate
    and fire neurons?

35
Integrate and fire neurons
  • Output layer: 1200 conductance-based
    integrate-and-fire neurons (1000 excitatory,
    200 inhibitory)
  • Lateral connections
  • High Fano factors (0.3 to 1)
  • Correlated activity
  • Linear in rates

[Figure: the input consists of near-Poisson, correlated spike trains for cue 1 and cue 2, with different gains (g1, g2) and slightly different means (activity vs. preferred s, -45 to 45).]
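For concreteness, a minimal current-based leaky integrate-and-fire sketch driven by Poisson input; this is not the 1200-neuron conductance-based network of the slide, and all parameter values are assumptions:

```python
# A single current-based leaky integrate-and-fire neuron driven by Poisson
# input spikes (all constants are assumed values, chosen for illustration).
import numpy as np

rng = np.random.default_rng(1)
dt, T = 1e-4, 1.0                          # time step and duration (s)
tau_m = 20e-3                              # membrane time constant (s)
v_rest, v_th, v_reset = -70e-3, -50e-3, -60e-3   # potentials (V)
w_in, rate_in = 1.5e-3, 800.0              # PSP size (V) and input rate (Hz)

v = v_rest
spike_times = []
for step in range(int(T / dt)):
    n_in = rng.poisson(rate_in * dt)                  # input spikes this step
    v += dt / tau_m * (v_rest - v) + w_in * n_in      # leak + synaptic drive
    if v >= v_th:                                     # threshold crossing
        spike_times.append(step * dt)
        v = v_reset
```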
36
Test cue 1 alone
[Figure: output-layer response r1 and the cue 1 input (activity vs. preferred s, -45 to 45).]
37
Test cue 2 alone
[Figure: output-layer response r2 and the cue 2 input (activity vs. preferred s, -45 to 45).]
38
Test cue 1 and cue 2 together
[Figure: output-layer response r3 together with the cue 1 and cue 2 inputs (activity vs. preferred s, -45 to 45).]
39
Compare the distributions
How does p(r3|s) compare to p(r1|s) p(r2|s)?
[Figure: output response r3 together with the cue 1 and cue 2 responses (activity vs. preferred s).]
40
p(r3|s) versus p(r1|s) p(r2|s)
[Figure: cue 1 and cue 2 populations with identical tuning curves (activity vs. preferred s).]
41
p(r3|s) versus p(r1|s) p(r2|s)
[Figure: left, cue 1 and cue 2 populations with different tuning curves and different correlations (activity vs. preferred s); right, scatter plots comparing the mean (approx. 89-96) and the variance (approx. 0-3) of p(r3|s) against the mean and variance of p(r1|s) p(r2|s).]
42
p(r3|s) versus p(r1|s) p(r2|s)
[Figure: cue 1 and cue 2 populations (activity vs. preferred s).]
43
Experimental prediction
  • Multisensory neurons should be linear on average

44
Experimental prediction
  • The main results in the literature are nonlinear
    combinations (superadditivity)!

Wallace, Meredith, and Stein, J Neurophys 1998
45
Experimental prediction
  • The main results in the literature are nonlinear
    combinations (superadditivity)!
  • In fact, nonlinearity is the criterion used to
    define multisensory areas in fMRI
  • Are we already proven wrong?

46
Experimental prediction
Perrault, Vaughan, Stein, and Wallace, J
Neurophys 2005
47
Inference over time
  • Can we generalize this approach to inference over
    time, and more generally time varying signals?

48
Outline
  • Bayesian inference: decision making

49
Binary Decision Making
Shadlen et al.
50
Binary Decision Making
  • The Bayesian strategy involves computing the
    posterior distribution given all activity
    patterns from MT up to the current time.
  • Therefore, all we need to do is add the activity
    patterns over time.
  • This predicts that decision neurons act like
    integrators (see the sketch below).
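A sketch of this integration idea, continuing the earlier code (reusing the assumed tuning(), posterior(), rng, and n_neurons): decoding the running sum of spike counts yields the posterior given all the evidence so far.

```python
# Evidence accumulation with PPCs: the running sum of spike counts acts as
# an integrator; decoding it gives the posterior given all evidence so far.
import numpy as np

s_grid = np.linspace(-45, 45, 181)
r_sum = np.zeros(n_neurons)
for t in range(30):                            # 30 time steps of weak evidence
    r_sum += rng.poisson(0.1 * tuning(s_true))
p_now = posterior(r_sum, s_grid, tuning)       # sharpens as evidence accumulates
```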

51
Bayesian decoder
[Figure: population activity (spike count vs. preferred s, -45 to 45) feeding the Bayesian decoder.]
52
LIP
Roitman and Shadlen, 2002, J. Neurosci.
53
Outline
  • Alternative schemes

54
Alternative schemes
Log likelihood ratio (Shadlen et al.; Deneve)
55
Log Likelihood
  • Race models and the Bayesian approach: a temporal
    sum over the log likelihood.
56
Differences between Log odds and PPCs
  • With PPCs, LIP neurons do not compute the
    activity difference between MT neurons with
    opposite direction preferences.
  • PPCs and log odds both turn products into sums,
    but for log odds, sums are products regardless of
    the noise distribution. Not so for PPCs.
  • At the end of the integration, LIP encodes the
    posterior distribution over direction, i.e., LIP
    knows how much it can trust its choice.
57
Alternative schemes
Log likelihood ratio (Shadlen et al.; Deneve)
  • Log probability
  • (Barlow; Rao; Jazayeri and Movshon)
  • Probability
  • (Anastasio et al.; Simoncelli; Hoyer and
    Hyvarinen; Rao; Koechlin et al.)
  • Convolution codes
  • (Anderson; Zemel, Dayan, and Pouget)
58
Alternative schemes
[Figure: population activity (vs. stimulus, -90 to 90 deg), with one value s_i highlighted; the list of alternative schemes is repeated from the previous slide.]

59
Alternative schemes
[Figure: population activity (vs. stimulus, -90 to 90 deg), with two candidate values marked s_i?; the list of alternative schemes is repeated from the previous slides.]

60
Alternative schemes
[Figure: population activity (vs. stimulus, -90 to 90 deg), with one candidate value marked s_i?; the list of alternative schemes is repeated from the previous slides.]

61
Alternative schemes
The convolution codes and the log likelihood fail
to account for contrast invariance.
[Figure: log p(s|r) vs. orientation (deg, -45 to 45): contrast-invariant responses compared with the predictions of the convolution codes and of the log likelihood scheme.]
62
Outline
  • Maximum likelihood estimation

63
Decision Making
[Diagram: LIP and the superior colliculus.]
64
Maximum Likelihood
[Figure: population activity vs. preferred direction (deg).]
65
Neural implementation
  • Attractor networks

66
Optimal decision making
[Figure: LIP population activity and superior colliculus population activity vs. preferred saccade direction (-100 to 100 deg).]
67
Nonlinear Networks
  • Networks in which the activity at time t+1 is a
    nonlinear function of the activity at the
    previous time step.

68
Line Attractor Networks
  • Attractor network with population code
  • Periodic variable
  • Translation invariant weights

69
Line Attractor Networks
  • A fixed point is reached when one update of the
    network dynamics leaves the activity profile
    unchanged.
70
Line Attractor Networks
  • Computing the weights from the desired activity
    profile and its representation over u.
71
Line Attractor Networks
  • The problem with the previous approach is that
    the weights tend to oscillate. Instead, we
    minimize a smoother, regularized cost.
  • The solution yields the weight pattern shown on
    the next slide.

72
Weight Pattern
[Figure: recurrent weight amplitude (approx. -2 to 5) as a function of the difference in preferred orientation.]
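A hedged sketch of a line attractor network with translation-invariant (circulant) weights and a rectified, divisively normalized update; the weight profile, nonlinearity, and constants are assumptions rather than the exact network from the slides:

```python
# Line attractor network with translation-invariant (circulant) weights and
# a rectified, divisively normalized update; all constants are assumptions.
import numpy as np

rng = np.random.default_rng(2)
n = 100
theta = np.linspace(-np.pi, np.pi, n, endpoint=False)   # preferred directions
dtheta = theta[:, None] - theta[None, :]
W = 0.9 * np.exp(3 * (np.cos(dtheta) - 1)) - 0.25       # excitation - inhibition

def step(o):
    """Activity at t+1 is a nonlinear function of the activity at time t."""
    u = np.maximum(W @ o, 0.0)                           # rectified recurrent drive
    return u ** 2 / (1.0 + 0.01 * np.sum(u ** 2))        # squaring + normalization

o = rng.poisson(5 * np.exp(2 * (np.cos(theta) - 1))).astype(float)  # noisy input
for _ in range(20):
    o = step(o)
estimate = theta[np.argmax(o)]     # peak of the smoothed hill = the estimate
```

Starting from a noisy population input, repeated updates relax the activity toward a smooth hill whose peak is read out as the estimate.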
73
Optimal decision making
[Figure: LIP population activity (vs. preferred saccade direction, -100 to 100 deg) projecting to the superior colliculus.]
74
Optimal decision making
A maximum likelihood estimate minimizes this
variance.
[Figure: LIP and superior colliculus population activity vs. preferred saccade direction (-100 to 100 deg).]
75
Is the network an ML estimator?
[Figure: estimator variances above maximum likelihood, for the population vector and for the network.]
76
Optimality constraint
The eigenvector with eigenvalue equal to 0 must align
with Σ(s)^-1 f'(s), where Σ(s) is the covariance matrix
of r and f'(s) is the derivative of the tuning curves
(the covariance between r and s), i.e. with the local
optimal linear estimator. This network is effectively
projecting its input onto the LOLE.
77
General Results
  • Line attractor networks (stable smooth hills) are
    equivalent to maximum likelihood estimators.
  • This result holds regardless of the exact form
    of the nonlinear activation function.

78
Performance Over Time
[Figure: standard deviation of the estimate (deg, approx. 0-6) vs. time (number of iterations, 0-15).]
79
Optimal decision making
[Figure: population activity over S and over f(S) (-100 to 100); see the sensorimotor transformation lecture.]
80
Kalman and Particle Filters