Title: Analysis of uncertain data: Selection of probes for information gathering
Eugene Fink
May 27, 2009
Outline
- High-level part
- Research interests and dreams
- Proactive learning under uncertainty
- Military intelligence applications
- Technical part
- Evaluation of given hypotheses
- Choice of relevant observations
- Selection of effective probes
High-Level Part
Research interests and dreams
- Semi-automated representation changes
- Problem reformulation and simplification
- Selection of search and learning algorithms
- Trade-offs among completeness, accuracy, and
speed of these algorithms
Research interests and dreams
- Semi-automated representation changes
- Semi-automated reasoning under uncertainty
- Conclusions from incomplete and imprecise data
- Passive and active learning
- Targeted information gathering
Research interests and dreams
- Semi-automated representation changes
- Semi-automated reasoning under uncertainty
Recent projects:
- Scheduling based on uncertain resources and constraints
- Excel tools for uncertain numeric and nominal data
- Analysis of military intelligence and targeted data gathering
Representation changes
- Semi-automated representation changes
- Semi-automated reasoning under uncertainty
- Theoretical foundations of AI
- Formalizing messy AI techniques
- AI-complexity and AI-completeness
Representation changes
- Semi-automated representation changes
- Semi-automated reasoning under uncertainty
- Theoretical foundations of AI
- Algorithm theory
- Generalized convexity
- Indexing of approximate data
- Compression of time series
- Smoothing of probability densities
Subject of the talk
- Semi-automated representation changes
- Semi-automated reasoning under uncertainty
- Analysis of military intelligence
- Targeted information gathering
- Theoretical foundations of AI
- Algorithm theory
Learning under uncertainty
Learning is almost always a response to
uncertainty.
If we knew everything, we would not need to learn.
Learning under uncertainty
Construction of predictive models, response
mechanisms, etc. based on available data.
Learning under uncertainty
- Passive learning
- Active learning
Targeted requests for additional data, based on
simplifying assumptions.
- The oracle can answer any question.
- The answers are always correct.
- All questions have the same cost.
Learning under uncertainty
- Passive learning
- Active learning
- Proactive learning
Extensions to active learning aimed at removing
these assumptions.
- Different questions incur different costs.
- We may not receive an answer.
- An answer may be incorrect.
- The information value depends on the intended use
of the learned knowledge.
Proactive learning architecture
[Architecture diagram: Top-Level Control; Model Construction; Model Evaluation; Question Selection; Reasoning or Optimization; Data Collection; flows include the current model, questions, answers, and model utility and limitations.]
Military intelligence applications
We have studied proactive learning in the context of military intelligence and homeland security.
The purpose is to develop tools for:
- Drawing conclusions from available intelligence.
- Planning of additional intelligence gathering.
Modern military intelligence
Gather and analyze:
Front end: Massive data collection, including satellite and aerial imaging, interviews, human intelligence, etc.
Back end: Sifting through massive data sets, both public and classified.
Almost no feedback loop: back-end analysts are passive learners, who do not give tasks to front-end data collectors.
Traditional goals
- Gather and analyze massive data
- Draw (semi-)reliable conclusions
- Propose actions that are likely to accomplish
given objectives
Novel goals
Identify critical missing intelligence and plan
effective information gathering.
- Targeted observations (expensive).
- Active probing (very expensive).
Analysis of leadership and pathways
We can evaluate the intent and possible future
actions of an adversary through the analysis of
its leadership and pathways.
Analysis of leadership and pathways
Leadership: Social networks, goals, and pet projects of decision makers.
Pathways: Typical projects and their sequences in research, development, and production.
[Pathway diagram with nodes: research on enhanced orcs; secret orc development; mass orc production; military orc deployment.]
Analysis of leadership and pathways
- Construct models of social networks and production pathways.
- For each set of reasonable assumptions about the adversary's intent, use these models to predict observable events.
- Check which of the predictions match actual observations.
Example
Model predictions: If Sauron were secretly forging a new ring:
- 80% chance we would observe deliveries of black-magic materials to Mordor.
- 60% chance we would observe an unusual concentration of orcs.
Intelligence: The aerial imaging by eagles shows black-magic deliveries but no orcs.
What can we conclude?
Technical Part
Anatole Gershman, Eugene Fink, Bin Fu, and Jaime
G. Carbonell
General problem
We have to distinguish among n mutually exclusive hypotheses, denoted H1, H2, …, Hn. We base the analysis on m observable features, denoted obs1, obs2, …, obsm. Each observation is a variable that takes one of several discrete values.
Input
- Prior probabilities: For every hypothesis, we know its prior; thus, we have an array of n priors, prior[1..n].
- Possible observations: For every observation, obs[a], we know the number of its possible values, num[a]. Thus, we have the array num[1..m] with the number of values for each observation.
- Observation distributions: For every hypothesis, we know the related probability distribution of each observation. Thus, we have a matrix chance[1..n, 1..m], where each element is a probability-density function. Every element chance[i, a] is itself a one-dimensional array with num[a] elements, which represent the probabilities of possible values of obs[a].
- Actual observations: We know a specific value of each observation, which represents the available intelligence. Thus, we have an array of m observed values, val[1..m].
Output
We have to evaluate the posterior probabilities of the n given hypotheses, denoted post[1..n].
Approach
We can apply the Bayesian rule, but we have to address two complications.
- The hypotheses may not cover all possibilities. Sauron may be neither working on a new ring nor doing white-magic research.
- The observations may not be independent, and we usually do not know the dependencies. The concentration of orcs may or may not be directly related to the black-magic deliveries.
Simple Bayesian case
We have one observed value, val[a], and the sum of the prior[1..n] probabilities is exactly 1.0.
Integrated likelihood of observing val[a]:
likelihood(val[a]) = chance[1, a][val[a]] · prior[1] + … + chance[n, a][val[a]] · prior[n].
Posterior probability of Hi:
post[i] = prob(Hi | val[a]) = chance[i, a][val[a]] · prior[i] / likelihood(val[a]).
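The update on this slide can be sketched directly in Python; the names `priors`, `chance`, `a`, and `val_a` are placeholders for the arrays defined on the Input slide, not code from the talk.

```python
def simple_bayes(priors, chance, a, val_a):
    """Posteriors when the priors sum to exactly 1.0.

    priors -- prior[1..n] as a Python list
    chance -- chance[i][a] is the list of value probabilities of obs[a] under H_i
    a      -- index of the single observed feature
    val_a  -- the observed value of that feature
    """
    n = len(priors)
    # Integrated likelihood: sum of chance[i, a][val[a]] * prior[i] over all i.
    likelihood = sum(chance[i][a][val_a] * priors[i] for i in range(n))
    # Bayesian rule: post[i] = chance[i, a][val[a]] * prior[i] / likelihood.
    return [chance[i][a][val_a] * priors[i] / likelihood for i in range(n)]
```

For two hypotheses with priors [0.6, 0.4] and value distributions [0.8, 0.2] and [0.3, 0.7], observing value 0 gives likelihood 0.6 and posteriors [0.8, 0.2].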
Rejection of all hypotheses
We have one observed value, val[a], and the sum of the prior[1..n] probabilities is less than 1.0.
We consider the hypothesis H0, representing the belief that all n hypotheses are incorrect:
prior[0] = 1.0 − prior[1] − … − prior[n].
Posterior probability of H0:
post[0] = prior[0] · prob(val[a] | H0) / prob(val[a]) = prior[0] · prob(val[a] | H0) / (prior[0] · prob(val[a] | H0) + likelihood(val[a])).
Rejection of all hypotheses
Bad news: We do not know prob(val[a] | H0).
Good news: post[0] monotonically depends on prob(val[a] | H0); thus, if we obtain lower and upper bounds for prob(val[a] | H0), we also get bounds for post[0]:
post[0] = prior[0] · prob(val[a] | H0) / (prior[0] · prob(val[a] | H0) + likelihood(val[a])).
Plausibility principle
Unlikely events normally do not happen; thus, if we have observed val[a], then its likelihood must not be too small.
Plausibility threshold: We use a global constant plaus, which must be between 0.0 and 1.0. If we have observed val[a], we assume that prob(val[a]) ≥ plaus / num[a].
We use it to obtain bounds for prob(val[a] | H0):
Lower: (plaus / num[a] − likelihood(val[a])) / prior[0].
Upper: 1.0.
Plausibility principle
We substitute these bounds into the dependency of post[0] on prob(val[a] | H0), thus obtaining the bounds for post[0]:
Lower: 1.0 − likelihood(val[a]) · num[a] / plaus.
Upper: prior[0] / (prior[0] + likelihood(val[a])).
We have derived bounds for the probability that none of the given hypotheses is correct.
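A minimal sketch of these bounds, assuming the quantities from the preceding slides; the clamp at 0.0 is an added safeguard, since the lower-bound formula can go negative for sufficiently likely values.

```python
def post0_bounds(prior0, likelihood_val, num_a, plaus=0.1):
    """Lower and upper bounds on post[0], the posterior probability
    that all given hypotheses are incorrect.

    likelihood_val -- likelihood(val[a]) from the simple Bayesian case
    plaus          -- plausibility threshold (the talk uses 0.1)
    """
    # Lower bound: 1.0 - likelihood(val[a]) * num[a] / plaus, clamped at 0.0.
    lower = max(0.0, 1.0 - likelihood_val * num_a / plaus)
    # Upper bound: prior[0] / (prior[0] + likelihood(val[a])).
    upper = prior0 / (prior0 + likelihood_val)
    return lower, upper
```

For example, with prior[0] = 0.3, likelihood 0.02, and a binary observation, the bounds are 0.6 and 0.9375.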
Judgment calls
A human has to specify a plausibility threshold and decide between the use of the lower and the upper bounds.
- Plausibility threshold: Reducing it leads to more reliable conclusions at the expense of a looser lower bound. We have used 0.1, which tends to give good practical results.
- Lower vs. upper bound: We should err on the pessimistic side. If H0 is a pleasant surprise, use the lower bound; else, use the upper bound.
Multiple observations
We have multiple observed values, val[1..m]. We have tried several approaches:
- Joint distributions: We usually cannot obtain joint distributions or information about dependencies.
- Independence assumption: We usually get terrible practical results, which are no better (and sometimes worse) than random guessing.
- Use of one most relevant observation: We usually get surprisingly good practical results.
Most relevant observation
We identify the highest-utility observation and do not use other observations to corroborate it.
Pay attention only to black-magic deliveries and ignore observations of orc armies.
Advantage: We use a conservative approach, which never leads to excessive over-confidence.
Drawback: We may significantly underestimate the value of available observations.
Most relevant observation
We identify the highest-utility observation and do not use other observations to corroborate it.
Selection procedure:
- For each of the m observable values:
  - Compute the posteriors based on this value.
  - Evaluate their information utility.
- Select the observable value that gives the highest information utility of the posteriors.
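The selection procedure above can be sketched as follows; `utility` stands for any of the utility measures discussed next, and the function name is a placeholder.

```python
def most_relevant_observation(priors, vals, chance, utility):
    """Pick the single observed value whose posteriors maximize `utility`.

    vals    -- observed value val[a] of each of the m features
    chance  -- chance[i][a] is the value distribution of obs[a] under H_i
    utility -- function mapping a posterior list to a number
    """
    n = len(priors)
    best = (None, float("-inf"), None)
    for a in range(len(vals)):
        # Posteriors based on this observed value alone.
        likelihood = sum(chance[i][a][vals[a]] * priors[i] for i in range(n))
        post = [chance[i][a][vals[a]] * priors[i] / likelihood for i in range(n)]
        u = utility(post)
        if u > best[1]:
            best = (a, u, post)
    return best[0], best[2]  # chosen observation index and its posteriors
```

With a sharp observation and an uninformative one, a certainty-rewarding utility selects the sharp observation.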
Alternative utility measures
Negation of Shannon's entropy:
post[0] · log post[0] + … + post[n] · log post[n].
It rewards high certainty, that is, situations in which the posteriors clearly favor one hypothesis over all others. It is high when the probability of some hypothesis is close to 1.0; it is low when all hypotheses are about equally likely.
Drawback: It may reward unwarranted certainty.
Alternative utility measures
Negation of Shannon's entropy:
post[0] · log post[0] + … + post[n] · log post[n].
Kullback-Leibler divergence:
post[0] · log (post[0] / prior[0]) + … + post[n] · log (post[n] / prior[n]).
It rewards situations in which the posteriors are very different from the priors. It tends to give preference to observations that have the potential for paradigm shifts.
Drawback: It may encourage unwarranted departure from the right conclusions.
Alternative utility measures
Task-specific utilities: We may construct better utility measures by analyzing the impact of posterior estimates on our future actions and evaluating the related rewards and penalties, but it involves more lengthy formulas.
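The two closed-form measures can be written directly as code; the function names are mine, and zero-probability terms are treated as contributing zero (the limit of p · log p as p approaches 0).

```python
import math

def neg_entropy(post):
    """Negation of Shannon's entropy; high when one posterior is near 1.0."""
    return sum(p * math.log(p) for p in post if p > 0.0)

def kl_divergence(post, priors):
    """Kullback-Leibler divergence of the posteriors from the priors;
    high when the posteriors depart far from the priors."""
    return sum(p * math.log(p / q) for p, q in zip(post, priors) if p > 0.0)
```

Note that a uniform posterior minimizes neg_entropy, while posteriors equal to the priors give zero KL divergence.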
Probe selection
We may obtain additional intelligence by probing
the adversary, that is, affecting it by external
actions and observing its response.
Increase the cost of black-magic materials
through market manipulation and observe whether
Sauron continues purchasing them.
We have to select among k available probes.
Additional input
- Probe costs: For every probe, we know its expected cost; thus, we have an array of k numeric costs, cost[1..k].
- Observation distributions: The likelihood of specific observed values depends on (1) which hypothesis is correct and (2) which probe has been applied. For every hypothesis and every probe, we know the related probability distribution of each observation. Thus, we have an array with n × m × k elements, chance[1..n, 1..m, 1..k], where each element is a probability-density function. Every element chance[i, a, j] is itself a one-dimensional array with num[a] elements, which represent the probabilities of possible values of obs[a].
Selection procedure
- For each of the k probes:
  - Consider the related observation distributions.
  - Select the most relevant observation.
  - Compute the expected gain as the difference between the expected utility of the posterior probabilities and the probe cost.
- Select the probe with the highest gain.
- If this gain is positive, recommend its application.
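A sketch of this procedure, assuming the chance[1..n, 1..m, 1..k] array from the Additional input slide as a nested list `chance[i][a][j]`, and reading "expected utility" as the posterior utility averaged over the possible values of an observation, weighted by their likelihoods; the names are placeholders.

```python
def select_probe(priors, chance, costs, utility):
    """Return (probe index, expected gain) for the best of the k probes.

    chance[i][a][j] -- value distribution of obs[a] under H_i after probe j
    The slide recommends applying the probe only if its gain is positive.
    """
    n, m = len(priors), len(chance[0])
    best = (None, float("-inf"))
    for j in range(len(costs)):
        # Most relevant observation for probe j: highest expected
        # posterior utility over the possible values of the observation.
        best_obs = float("-inf")
        for a in range(m):
            expected = 0.0
            for v in range(len(chance[0][a][j])):
                likelihood = sum(chance[i][a][j][v] * priors[i] for i in range(n))
                if likelihood > 0.0:
                    post = [chance[i][a][j][v] * priors[i] / likelihood
                            for i in range(n)]
                    expected += likelihood * utility(post)
            best_obs = max(best_obs, expected)
        gain = best_obs - costs[j]
        if gain > best[1]:
            best = (j, gain)
    return best
```

With a certainty-rewarding utility, an informative probe wins over a cheaper but uninformative one, since its expected posterior utility is much higher.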
Extensions
- Task-specific utility functions.
- Accounting for the probabilities of observation and probe failures.
- Selection of multiple observations based on their independence or joint distributions.
- Application of parameterized probes.
Analysis of Uncertain Data