
1
Entropy and Information
For a random variable X with distribution p(x), the entropy is
H[X] = -\sum_x p(x) \log_2 p(x)
Information (mutual information): how much knowing the value of one random variable r (the response) reduces uncertainty about another random variable s (the stimulus).
Variability in the response is due both to different stimuli and to noise. How much of the response variability is useful, i.e. can represent different messages, depends on the noise. Noise can be specific to a given stimulus.
→ We need to know the conditional distribution P(s|r) or P(r|s).
Take a particular stimulus s = s0 and repeat it many times to obtain P(r|s0). Compute the variability due to noise: the noise entropy.
Information is the difference between the total response entropy and the mean noise entropy:
I(s;r) = H[P(r)] - \sum_s P(s) H[P(r|s)]
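As a concrete illustration of these definitions, here is a minimal Python/NumPy sketch that computes H[X] and I(s;r) = H[P(r)] - \sum_s P(s) H[P(r|s)] from a discrete joint table. The function names and the toy joint distribution are my own, not from the slides.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution p (sums to 1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # convention: 0 * log 0 = 0
    return -np.sum(p * np.log2(p))

def mutual_information(p_joint):
    """I(s;r) = H[P(r)] - sum_s P(s) H[P(r|s)] from a joint table p_joint[s, r]."""
    p_joint = np.asarray(p_joint, dtype=float)
    p_s = p_joint.sum(axis=1)          # marginal over stimuli
    p_r = p_joint.sum(axis=0)          # marginal over responses
    h_total = entropy(p_r)             # total response entropy
    h_noise = sum(p_s[i] * entropy(p_joint[i] / p_s[i])
                  for i in range(len(p_s)) if p_s[i] > 0)
    return h_total - h_noise

# toy joint distribution: 2 stimuli x 3 response bins
p = np.array([[0.30, 0.15, 0.05],
              [0.05, 0.15, 0.30]])
print(mutual_information(p))           # information in bits
```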
2
Information in single cells
How can one compute the entropy and information
of spike trains?
Discretize the spike train into binary words w with letter size Δt and length T. This takes into account correlations between spikes on timescales up to T. Compute p_i = p(w_i); the naive entropy is then
H_naive = -\sum_i p_i \log_2 p_i
Strong et al., 1997; Panzeri et al.
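A minimal sketch of this discretization and plug-in entropy estimate, assuming spike times in seconds; the helper names (spike_words, naive_entropy) and the non-overlapping word choice are illustrative, not taken from the slides.

```python
import numpy as np
from collections import Counter

def spike_words(spike_times, dt, T_word, t_max):
    """Turn a spike train (times in seconds) into binary letters of width dt,
    then into consecutive non-overlapping words of duration T_word."""
    n_bins = int(np.floor(t_max / dt))
    letters = np.zeros(n_bins, dtype=int)
    idx = (np.asarray(spike_times) / dt).astype(int)
    letters[idx[idx < n_bins]] = 1                 # 1 if the bin contains a spike
    L = int(round(T_word / dt))                    # letters per word
    return [tuple(letters[i:i + L]) for i in range(0, n_bins - L + 1, L)]

def naive_entropy(words):
    """Plug-in ('naive') entropy, in bits, of the empirical word distribution."""
    counts = np.array(list(Counter(words).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))
```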
3
Information in single cells
Many information calculations are limited by sampling: P(w) and P(w|s) are hard to determine, and undersampling produces a systematic bias. A correction for finite-size effects is needed.
Strong et al., 1997
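One common way to implement such a correction is to estimate the entropy on subsets of the data and extrapolate to infinite data size. A rough sketch, reusing naive_entropy from above; the subset fractions and quadratic fit are assumptions, not details taken from the slides.

```python
import numpy as np

def extrapolated_entropy(words, fractions=(1.0, 0.8, 0.6, 0.5), n_rep=10, seed=0):
    """Estimate the naive entropy on random subsets of the data and extrapolate
    H(1/N) to 1/N -> 0 with a quadratic fit (in the spirit of Strong et al. 1997)."""
    rng = np.random.default_rng(seed)
    words = list(words)
    inv_n, h_mean = [], []
    for f in fractions:
        n = int(f * len(words))
        hs = [naive_entropy([words[i] for i in rng.choice(len(words), n, replace=False)])
              for _ in range(n_rep)]
        inv_n.append(1.0 / n)
        h_mean.append(np.mean(hs))
    c = np.polyfit(inv_n, h_mean, deg=2)    # H(1/N) ~ c2/N^2 + c1/N + c0
    return c[-1]                            # intercept: the extrapolated entropy
```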
4
Information in single cells
Information is the difference between the variability driven by stimuli and that due to noise. Take a stimulus sequence s and repeat it many times. For each time t in the repeated stimulus, collect the word distribution P(w|s(t)). Ideally one should average over all s with weight P(s); instead, average over time:
H_noise = \langle H[P(w|s(t))] \rangle_t
Choose the length of the repeated sequence long enough to sample the noise entropy adequately. Finally, do this as a function of word length T and extrapolate to infinite T.
Reinagel and Reid, 00
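A sketch of the total-minus-noise calculation for one fixed word length, reusing naive_entropy from above. In practice each entropy would also be bias-corrected and the result extrapolated in T; that is omitted here, and the data layout is an assumption.

```python
import numpy as np

def information_rate(words_unique, words_by_time, t_word):
    """Information rate in bits/s for words of duration t_word (seconds).
    words_unique  : words from a long non-repeating stimulus, sampling P(w).
    words_by_time : words_by_time[t] = words seen across all repeats at time
                    slice t of the frozen stimulus, sampling P(w|s(t))."""
    h_total = naive_entropy(words_unique)
    h_noise = np.mean([naive_entropy(ws) for ws in words_by_time])   # <H[P(w|s(t))]>_t
    return (h_total - h_noise) / t_word
```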
5
Information in single cells
Obtain an information rate of about 80 bits/s, or 1-2 bits/spike.
6
Information in single cells
How much information does a single spike convey about the stimulus?
Key idea: the information that a spike gives about the stimulus is the reduction in entropy between the distribution of spike times not knowing the stimulus and the distribution of spike times knowing the stimulus.
The response to an (arbitrary) stimulus sequence s is r(t). Without knowing that the stimulus was s, the probability of observing a spike in a given bin is proportional to the mean rate \bar{r} and the size of the bin. Consider a bin Δt small enough that it can contain at most one spike. Then, in the bin at time t, the spike probability is r(t)Δt given the stimulus, versus \bar{r}Δt overall.
7
Information in single cells
I_{1 spike} = (1/T) \int_0^T dt \, (r(t)/\bar{r}) \log_2 (r(t)/\bar{r})
Note the substitution of a time average for an average over the stimulus ensemble.
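A minimal sketch of this single-spike information estimate from a trial-averaged rate (PSTH); the binning of spikes into the rate r(t) is assumed to have been done already, and the function name is illustrative.

```python
import numpy as np

def single_spike_information(rate):
    """Information per spike, in bits, from a trial-averaged rate r(t):
    I_1spike = (1/T) * integral dt (r/rbar) log2(r/rbar), taken as a time average."""
    r = np.asarray(rate, dtype=float)
    x = r / r.mean()
    vals = np.zeros_like(x)
    nz = x > 0
    vals[nz] = x[nz] * np.log2(x[nz])      # bins with zero rate contribute zero
    return vals.mean()
```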
8
Information in single cells
Given the single-spike information formula above, note that:
  • it does not depend explicitly on the stimulus;
  • the rate r does not have to be the rate of spikes; it can be the rate of any event;
  • information is limited by spike precision, which blurs r(t), and by the mean spike rate.

Compute it as a function of Δt; the estimate is undersampled for small bins.
9
Information in single cells
An example: temporal coding in the LGN (Reinagel and Reid, 2000).
10
Information in single cells
Apply the same procedure: collect word distributions for a random (non-repeating) stimulus, then for a repeated stimulus.
11
Information in single cells
Use this to quantify how precise the code is, and
over what timescales correlations are important.
12
Information in single cells
How important is the information in multispike patterns?
The information in any given event E (for example a spike pair) can be computed from its event rate r_E(t) in the same way as for a single spike.
Define the synergy as the information gained from the joint symbol beyond that available from its parts:
Syn[E_1, E_2] = I[E_1, E_2] - I[E_1] - I[E_2]
Negative synergy is called redundancy.
Brenner et al., 00.
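A sketch of the synergy calculation under the reading above, reusing single_spike_information applied to event rates (for instance the rate of spike pairs at a given separation); the decomposition Syn = I[joint] - I[a] - I[b] follows the definition stated above, and the function names are my own.

```python
def event_information(event_rate):
    """Information in the timing of an arbitrary event, from its trial-averaged
    event rate r_E(t); same time-average formula as for single spikes."""
    return single_spike_information(event_rate)

def synergy(rate_joint, rate_a, rate_b):
    """Synergy of a compound event (e.g. a spike pair at separation dt) relative
    to its parts: Syn = I[joint] - I[a] - I[b]; negative values = redundancy."""
    return (event_information(rate_joint)
            - event_information(rate_a)
            - event_information(rate_b))
```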
13
Information in single cells multispike patterns
In the identified neuron H1, compute the information in a spike pair separated by an interval dt:
Brenner et al., 00.
14
Information in single cells
Information in patterns in the LGN
Define the pattern information as the difference between the extrapolated word information and the single-letter information.
Reinagel and Reid 00
15
Using information to evaluate neural models
We can use the information about the stimulus to
evaluate our reduced dimensionality models.
16
Using information to evaluate neural models
Information in the timing of one spike.
By definition (as above),
I_{1 spike} = (1/T) \int_0^T dt \, (r(t)/\bar{r}) \log_2 (r(t)/\bar{r})
17
Given this, by the definition of the rate and Bayes' rule,
r(t)/\bar{r} = P(spike|s_t)/P(spike) = P(s_t|spike)/P(s_t),
so I_{1 spike} = \int ds \, P(s|spike) \log_2 [P(s|spike)/P(s)].
18
Given the same definition and Bayes' rule, apply dimensionality reduction: replace the full stimulus s by its projection x onto the model's relevant dimension(s), giving
I_x = \int dx \, P(x|spike) \log_2 [P(x|spike)/P(x)] \le I_{1 spike}.
The ratio I_x / I_{1 spike} measures how much of the single-spike information the reduced model captures.
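A sketch of how the information in a 1D reduced description can be estimated from histograms of the stimulus projection, for comparison with the full single-spike information; the function name, binning, and array conventions are illustrative assumptions.

```python
import numpy as np

def projection_information(x_all, x_spike, bins=30):
    """Information, in bits, carried by a 1D reduced description:
    I_x = sum_x P(x|spike) log2[ P(x|spike) / P(x) ],
    estimated from histograms of the projection for all stimuli (x_all)
    and for spike-triggered stimuli (x_spike)."""
    edges = np.histogram_bin_edges(x_all, bins=bins)
    p_all, _ = np.histogram(x_all, bins=edges)
    p_spk, _ = np.histogram(x_spike, bins=edges)
    p_all = p_all / p_all.sum()
    p_spk = p_spk / p_spk.sum()
    ok = (p_spk > 0) & (p_all > 0)
    return np.sum(p_spk[ok] * np.log2(p_spk[ok] / p_all[ok]))
```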
19
Using information to evaluate neural models
Here we used information to evaluate reduced
models of the Hodgkin-Huxley neuron.
Twist model
2D: two covariance modes
1D: STA only
20
Adaptive coding
  • Just about every neuron adapts. Why?
  • To stop the brain from pooping out
  • To make better use of a limited dynamic range
  • To stop reporting already known facts
  • All reasonable ideas. What does that mean for coding? What part of the signal is the brain meant to read?
  • Adaptation can be a mechanism for early sensory systems to make use of statistical information about the environment.
  • How can the brain interpret an adaptive code?

From The Basis of Sensation, Adrian (1929)
21
Adaptation to stimulus statistics: information
Rate, or spike-frequency, adaptation is a classic form of adaptation. Let's go back to the picture of neural computation we discussed before. One can adapt both the system's filters and the input/output relation (threshold function). Both are observed, and in both cases the observed adaptations can be thought of as increasing information transmission through the system.
Information maximization as a principle of adaptive coding: for optimal information transmission, the coding strategy should adjust to the statistics of the inputs. To compute the best strategy, one has to impose constraints (Stemmler & Koch), e.g. on the variance of the output or the maximum firing rate.
22
Adaptation of the input/output relation
If we constrain the maximum output, the entropy-maximizing distribution of output symbols is P(r) = constant. Take the output to be a nonlinear transformation of the input, r = g(s). From P(r) dr = P(s) ds with P(r) constant, it follows that g(s) is proportional to the cumulative distribution of s.
Fly LMC cells; measured contrast in natural scenes.
Laughlin, 1981.
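A sketch of the resulting optimal input/output function: with a bounded output and a flat output distribution, g(s) follows the empirical cumulative distribution of the stimulus, as in Laughlin's matching of the LMC response to the natural contrast distribution. The function name and r_max parameter are assumptions.

```python
import numpy as np

def optimal_nonlinearity(stimulus_samples, r_max=1.0):
    """With a bounded output and a flat (maximum-entropy) output distribution,
    the optimal input/output function g(s) follows the cumulative distribution
    of the stimulus. Returns sorted stimulus values and g(s)."""
    s = np.sort(np.asarray(stimulus_samples, dtype=float))
    cdf = np.arange(1, len(s) + 1) / len(s)      # empirical CDF of s
    return s, r_max * cdf
```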
23
Adaptation of filters
Changes in retinal filters with different light levels and contrasts. Changes in V1 receptive fields with contrast.
24
Dynamical adaptive coding
But is all adaptation to statistics on an evolutionary scale? The world is highly fluctuating: light intensities vary by a factor of 10^10 over a day. We expect adaptation to statistics to happen dynamically, in real time. In the retina, one observes adaptation to variance, or contrast, over tens of seconds. This is surprisingly slow: contrast gain control acts after hundreds of milliseconds. Adaptation to spatial scale is also observed on a similar timescale.
25
Dynamical adaptive coding
The H1 neuron of the fly visual system rescales its input/output relation according to the steady-state stimulus statistics.
Brenner et al., 00
26
Dynamical adaptive coding
As in the Smirnakis et al. paper, there is rate
adaptation in response to the variance change
27
Dynamical adaptive coding
This is a form of learning. Does the
timescale reflect the time required to learn the
new statistics?
28
Dynamical adaptive coding
As we have done before, extract the spike-triggered average.
29
Dynamical adaptive coding
Compute the input/output relations as described before: project the stimulus onto the STA, s = stim · STA, and use
P(spike|s) = r_ave P(s|spike) / P(s)
Do this at different times in the variance-modulation cycle.
Find an ongoing normalisation with respect to the stimulus standard deviation.
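A sketch of this input/output estimate from spike-triggered and raw stimulus projections; the array shapes, function name, and binning are assumptions, not details from the slides.

```python
import numpy as np

def io_relation(stim_segments, spike_triggered_segments, sta, r_ave, bins=20):
    """Input/output relation P(spike|s) = r_ave * P(s|spike) / P(s), where
    s is the projection of the stimulus segment onto the spike-triggered average.
    stim_segments, spike_triggered_segments: arrays of shape (N, len(sta))."""
    s_all = stim_segments @ sta                    # projections, all stimuli
    s_spk = spike_triggered_segments @ sta         # projections before spikes
    edges = np.histogram_bin_edges(s_all, bins=bins)
    p_s, _ = np.histogram(s_all, bins=edges, density=True)
    p_s_spk, _ = np.histogram(s_spk, bins=edges, density=True)
    with np.errstate(divide='ignore', invalid='ignore'):
        rate = np.where(p_s > 0, r_ave * p_s_spk / p_s, np.nan)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, rate
```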
30
Dynamical adaptive coding
Take a more complex stimulus: randomly modulated white noise, not unlike natural stimuli (Ruderman and Bialek, 1997).
31
Find continuous rescaling to variance envelope.
32
Dynamical information maximisation
This should imply that information transmission is being maximized. We can compute the information directly and observe the timescale. How much information is available about the stimulus fluctuations? Return to the two-state switching experiment.
Method: present n different white-noise sequences (probes), randomly ordered, throughout the variance modulation. Collect word responses indexed by time with respect to the cycle, P(w(t)). Now divide according to probe identity and compute
I_t(w;s) = H[P(w(t))] - \sum_i P(s_i) H[P(w(t)|s_i)], with P(s_i) = 1/n.
Similarly, one can compute the information about the variance,
I_t(w;\sigma) = H[P(w(t))] - \sum_i P(\sigma_i) H[P(w(t)|\sigma_i)], with P(\sigma_i) = 1/2.
Convert to information per spike by dividing, at each time, by the mean number of spikes.
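A sketch of the time-resolved information calculation for equally likely probes, reusing naive_entropy from above; it assumes the words have already been collected per probe and per time slice, with equal trial counts per probe.

```python
import numpy as np

def time_resolved_information(words_by_probe_and_time):
    """I_t(w;s) = H[P(w(t))] - (1/n) sum_i H[P(w(t)|s_i)] at each time t.
    words_by_probe_and_time[i][t] = words observed for probe s_i at time t
    (across trials); probes are equally likely, with equal trial counts."""
    n_probes = len(words_by_probe_and_time)
    n_times = len(words_by_probe_and_time[0])
    info = np.zeros(n_times)
    for t in range(n_times):
        pooled = [w for i in range(n_probes) for w in words_by_probe_and_time[i][t]]
        h_total = naive_entropy(pooled)
        h_cond = np.mean([naive_entropy(words_by_probe_and_time[i][t])
                          for i in range(n_probes)])
        info[t] = h_total - h_cond
    return info
```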
33
Tracking information in time
34
Adaptation and ambiguity
The stimulus normalization leads to information recovery within about 100 ms. If the stimulus is represented in normalised form, how are the spikes to be interpreted upstream? The rate conveys the variance information, but on slow timescales.
35
Tracking information in time the variance
36
What conveys variance information?
Where is the variance information, and how can one decode it?
Notice that the interspike-interval histograms in the different variance regimes are quite distinct (one needs a log scale to see this clearly). Could these intervals provide enough information, rapidly enough, to distinguish the variance?
37
Decoding the variance information
38
Decoding the variance information
Use signal detection theory. Collect the steady-state interval distributions P(d|s_i). Then, for a given observed interval, compute the likelihood ratio P(d|s_1)/P(d|s_2). After observing a sequence of n intervals, compute the log-likelihood ratio for the entire sequence:
D_n = \sum_{k=1}^{n} \log [P(d_k|s_1)/P(d_k|s_2)]
Since we cannot sample the joint distributions, we assume that the intervals are independent (giving an upper bound). Now calculate the signal-to-noise ratio of D_n, \langle D_n \rangle^2 / \mathrm{var}(D_n), as a function of n.
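A sketch of this discrimination analysis; the histogram binning, the small regularizer, and the use of non-overlapping blocks of n intervals are my choices, not specified in the slides.

```python
import numpy as np

def variance_discrimination_snr(isi_s1, isi_s2, test_isi, n_max=20, bins=50):
    """Build steady-state interval histograms P(d|s1), P(d|s2), sum the
    per-interval log-likelihood ratios over blocks of n consecutive test
    intervals (treated as independent), and return the SNR
    <D_n>^2 / var(D_n) as a function of n."""
    eps = 1e-12
    edges = np.histogram_bin_edges(np.concatenate([isi_s1, isi_s2]), bins=bins)
    p1, _ = np.histogram(isi_s1, bins=edges, density=True)
    p2, _ = np.histogram(isi_s2, bins=edges, density=True)
    idx = np.clip(np.digitize(test_isi, edges) - 1, 0, bins - 1)
    llr = np.log((p1[idx] + eps) / (p2[idx] + eps))     # log LR per interval
    ns, snr = np.arange(1, n_max + 1), []
    for n in ns:
        d_n = np.array([llr[i:i + n].sum()              # D_n for each block
                        for i in range(0, len(llr) - n + 1, n)])
        snr.append(d_n.mean() ** 2 / d_n.var())
    return ns, np.array(snr)
```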
39
Decoding the variance information
On average, the number of intervals required for
accurate discrimination is 5-8.
40
Adaptive coding conclusions
We have shown that information is available from the spike train in three forms: the timing of single spikes, the rate, and the local distribution of spike intervals. The adaptation properties of some systems serve to rapidly maximize information transmission through the system under conditions of changing stimulus statistics. We demonstrated this for the variance; in other systems one can probe adaptation to more complex stimulus correlations (Meister). The mechanisms remain open: intrinsic properties, conductance-level learning (Tony, Stemmler & Koch, Turrigiano), circuit-level learning (Tony).
41
Conclusions

Characterising the neural computation: uncovering the richness of single neurons and systems.
Using information to evaluate coding.
Adaptation as a method for the brain to make use of stimulus statistics: more examples? How is it implemented?
42
(No Transcript)
43
The rate dynamics: what's going on?
  • Recall: no fixed timescale
  • Consistent with power-law adaptation

This suggests that the rate behaves like a fractional differentiation of the log-variance envelope.
44
Fractional differentiation
Scaling adaptive response to a square wave; power-law response to a step.
Fourier representation (i\omega)^\alpha: each frequency component is scaled by \omega^\alpha and phase-shifted by a constant phase \alpha\pi/2 (since i^\alpha = e^{i\alpha\pi/2}).
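A sketch of a fractional differentiator implemented in the Fourier domain, as described above; applied to the log-variance envelope it would produce the kind of rate prediction discussed here. The function name and default order are assumptions.

```python
import numpy as np

def fractional_derivative(x, alpha=0.2, dt=1.0):
    """Fractional differentiation of order alpha via the Fourier domain:
    multiply each frequency component by (i*omega)^alpha, i.e. scale the
    amplitude by |omega|^alpha and shift the phase by a constant alpha*pi/2."""
    n = len(x)
    omega = 2 * np.pi * np.fft.fftfreq(n, d=dt)
    h = np.zeros(n, dtype=complex)
    nz = omega != 0
    h[nz] = (1j * omega[nz]) ** alpha        # DC component left at zero
    return np.real(np.fft.ifft(np.fft.fft(x) * h))
```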
45
Linear analysis agrees
  • Stimulate with a set of sine waves at different frequencies
  • Variance envelope exp[sin(t/T)], for a range of frequencies 1/T

46
Fits pretty well
From the sinusoid experiments, find an exponent α ≈ 0.2.
47
So it's a fractional differentiator
  • connects with the universal power-law behaviour of receptors
  • uncommon to see it in a higher computation
  • functional interpretation: whitening the stimulus spectrum (van Hateren)
  • introduces long history dependence: a linear realisation of long-memory effects
  • also has the property of emphasizing rapid changes and extending dynamic range (Adrian)
  • but what's the mechanism? Some ideas, but we don't know.