Title: Intelligent Data Analysis and Probabilistic Inference
1. Intelligent Data Analysis and Probabilistic Inference
- Lecturer: Duncan Gillies (dfg)
2. Course Outline
- Lectures 1-12: Probabilistic Inference and Bayesian Networks
- Lectures 13-16: Data Modelling - distributions, sampling, re-sampling, principal component analysis
- Lectures 17-18: Classification - selected topics
3. Coursework
- A practical exercise implementing data modelling algorithms in the language of your choice.
- Hand out: Week 4, after lecture 6
- Hand in: Week 7
- You will require a DoC login to submit the coursework
4. Web Resources
- The course material can be found on my web page: www.doc.ic.ac.uk/dfg
- Tutorial solutions will be posted a few days after the tutorial is set.
5. Lecture 1
- Bayes' theorem and Bayesian inference
6. Probability and Statistics
- Statistics emerged as an important mathematical discipline in the nineteenth century.
- Probability is much older, and has been studied for as long as people have taken an interest in games of chance.
- Our story starts with the famous theorem of the Rev. Thomas Bayes, published in 1763.
7. Independent Events
- For independent events S and D:
- P(D ∧ S) = P(D) P(S)
- (read "disease" for D and "symptom" for S)
8. Dependencies
- However, in cases where S and D are not independent we must write:
- P(D ∧ S) = P(D) P(S|D)
- where P(S|D) is the probability of the symptom given that the disease has occurred.
9. Bayes' Theorem
- Now since conjunction is commutative:
- P(D|S) P(S) = P(D ∧ S) = P(D) P(S|D)
- and by rearranging we get:
- P(D|S) = P(D) P(S|D) / P(S)
- (Bayes' Theorem)
10. Bayes' Theorem as an Inference Mechanism
- P(D|S) = P(D) P(S|D) / P(S)
- P(D|S), the probability of the disease given the symptom, is what we wish to infer.
- P(D) is the probability of the disease (within a population); this is a measurable quantity.
- P(S|D) is the probability of the symptom given the disease. We can measure this from the case histories of the disease.
11. Normalisation
- P(D|S) is a conditional probability, so for a given S its values sum to 1.
- Suppose D has just two states, d0 and d1; then:
- P(d0|s0) = P(s0|d0) P(d0) / P(s0)
- P(d1|s0) = P(s0|d1) P(d1) / P(s0)
- The denominator is the same in both cases, so we can calculate it using P(d0|s0) + P(d1|s0) = 1.
- In other words, we can eliminate P(s0).
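A minimal Python sketch of this elimination, using made-up priors and likelihoods (the numbers are illustrative, not course data):

```python
# Hypothetical priors P(d0), P(d1) and likelihoods P(s0|d0), P(s0|d1)
prior = {"d0": 0.9, "d1": 0.1}
likelihood_s0 = {"d0": 0.2, "d1": 0.8}

# Unnormalised posteriors P(s0|di) P(di); note P(s0) is never needed
unnorm = {d: likelihood_s0[d] * prior[d] for d in prior}

# Dividing by the sum makes the posteriors sum to 1, which implicitly
# supplies the common factor 1/P(s0)
total = sum(unnorm.values())
posterior = {d: p / total for d, p in unnorm.items()}
print(posterior)  # {'d0': 0.692..., 'd1': 0.307...}
```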
12. Prior and Likelihood Information
- P(D|S) = α P(D) P(S|D)
- (We write α = 1/P(S) to remind us that it is a constant)
- P(D) is prior information, since we knew it before we made any measurements.
- P(S|D) is likelihood information, since we gain it from measurement of symptoms.
13. Bayesian Inference (in its most general form)
- Convert the prior and likelihood information to probabilities
- Multiply them together
- Normalise the result to get the posterior probability of each hypothesis given the evidence
- Select the most probable hypothesis.
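A small Python sketch of this recipe; the function name and the example numbers are my own, not part of the course:

```python
def bayes_infer(prior, likelihood):
    """Return the normalised posterior and the most probable hypothesis.

    prior:      {hypothesis: P(h)}
    likelihood: {hypothesis: P(evidence | h)}
    """
    unnorm = {h: prior[h] * likelihood[h] for h in prior}  # multiply
    alpha = 1.0 / sum(unnorm.values())                     # normalise
    posterior = {h: alpha * p for h, p in unnorm.items()}
    best = max(posterior, key=posterior.get)               # select most probable
    return posterior, best

# Illustrative numbers only
posterior, best = bayes_infer({"cat": 0.3, "not_cat": 0.7},
                              {"cat": 0.8, "not_cat": 0.1})
print(posterior, best)  # best == 'cat'
```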
14. Prior Knowledge
- In simple cases we obtain the prior probability from statistics. For example, we can calculate the prior probability as the number of instances of the disease divided by the number of patients presenting for treatment.
- However, in many cases this is not possible, since the data isn't there.
- There may also be prior knowledge in other forms.
15. Example from Computer Vision
- Consider a program to determine whether an image contains a picture of a cat.
- (Drawing by Kliban)
16. Feature Extraction
- There are a lot of things that we could use to detect a cat, but let's start with something simple.
- We'll write a program to extract circles from the image. If we find two adjacent circles then we'll assume that they are a cat's eyes.
- (I know that they sleep a lot, but never mind)
17. Representing prior knowledge about a cat
- We can formalise our model by specifying:
- Ri ≈ Rj (the eyes are approximately the same size)
- S ≈ 2(Ri + Rj) (the eyes are spaced correctly)
18. Semantic description of a Cat
- To find our cat we will extract every circle from the image and record its position and radius.
- For each pair of circles we will calculate a "catness" measure:
- Catness = |Ri - Rj| / Ri + |S - 2(Ri + Rj)| / Ri
- The catness measure is zero for a perfect match to our model.
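A short Python sketch of the catness computation (the absolute values follow from the measure being zero only at a perfect match; the function and variable names are mine):

```python
def catness(r_i, r_j, s):
    """Zero for a perfect match to the two-eyes model, larger otherwise."""
    size_term = abs(r_i - r_j) / r_i                # eyes roughly equal in size
    spacing_term = abs(s - 2 * (r_i + r_j)) / r_i   # spacing close to 2(Ri + Rj)
    return size_term + spacing_term

print(catness(10.0, 10.0, 40.0))  # 0.0 -> a perfect pair of cat's eyes
print(catness(10.0, 6.0, 50.0))   # 2.2 -> a much poorer match
```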
19. Turning measures into probabilities
- We could choose some common-sense way of changing our catness measure into a probability.
20. Using a distribution
- We can make this look more respectable mathematically by choosing a distribution.
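For instance, one plausible (but entirely subjective) choice is an exponential fall-off, so that catness 0 maps to probability 1; this particular form is my assumption, not the lecture's:

```python
import math

def catness_to_probability(c, scale=1.0):
    # Any monotonically decreasing map from [0, inf) to (0, 1] would do
    return math.exp(-c / scale)

print(catness_to_probability(0.0))  # 1.0 for a perfect match
print(catness_to_probability(2.2))  # ~0.11 for a poor match
```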
21. Subjective Probability
- In all cases we are making a subjective estimate of the probability.
- There is no formal reason for choosing the distribution other than personal judgement.
22. Objective Probabilities
- We make measurements from a large set of photographs.
23. Frequencies
- To get our discrete probability distribution we could process a vast library of photographs.
- Every time we extract a pair of circles and apply a catness measure, we also get an expert to tell us whether the extracted structure really does represent a cat.
- For each bin we calculate the ratio of correctly identified cats to the total.
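A sketch of this frequency calculation in Python, with invented expert-labelled samples (the bin width and the data are assumptions):

```python
def binned_probabilities(samples, n_bins=5, max_catness=5.0):
    """samples: list of (catness, is_cat) pairs from the photo library."""
    totals = [0] * n_bins
    cats = [0] * n_bins
    width = max_catness / n_bins
    for c, is_cat in samples:
        b = min(int(c / width), n_bins - 1)  # clamp overflow into last bin
        totals[b] += 1
        cats[b] += int(is_cat)
    # Ratio of correctly identified cats to the total, per bin
    return [cats[b] / totals[b] if totals[b] else 0.0 for b in range(n_bins)]

samples = [(0.1, True), (0.3, True), (0.4, False), (1.2, True),
           (1.5, False), (2.8, False), (3.1, False), (4.6, False)]
print(binned_probabilities(samples))  # [0.66..., 0.5, 0.0, 0.0, 0.0]
```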
24. Fitting a Distribution
- To overcome experimental error, and to make our representation more compact, we may try to fit an appropriate distribution.
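One way to do the fit, sketched with SciPy's curve_fit; the exponential model and the bin data are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(c, scale):
    # Assumed functional form: probability of 'cat' decays with catness
    return np.exp(-c / scale)

bin_centres = np.array([0.5, 1.5, 2.5, 3.5, 4.5])
frequencies = np.array([0.67, 0.50, 0.05, 0.02, 0.01])  # made-up bin ratios

(scale,), _ = curve_fit(model, bin_centres, frequencies, p0=[1.0])
print(scale)  # a single smooth parameter replaces the noisy histogram
```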
25. Subjective vs. Objective Probabilities
- There is a long-standing debate as to whether the subjective or the objective approach is the more appropriate.
- Objective may seem more plausible at first, but it requires lots of data and is prone to experimental error.
- NB: Some people say the Bayesian approach must be subjective. I do not subscribe to this view.
26. Likelihood
- Our prior probabilities represent long-standing beliefs. They can be taken to be our established knowledge of the subject in question.
- When we process data and make measurements, we create likelihood information.
- Likelihood can incorporate uncertainty in the measurement process.
27. Likelihood and Catness
- Our computer vision process could not just extract a circle, but also tell us how good a circle it is, for example by counting the pixels that contribute to it.
28. Problem Break
- Using a suitable measure, estimate the likelihood of each circle below.
29. Solution
- There are 28 possible circle pixels.
- The first circle has 22, so its likelihood is 22/28 ≈ 0.79.
- The second circle has 11, so its likelihood is 11/28 ≈ 0.39.
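The same calculation as a short Python sketch (the pixel counts come from the slide; the function name is mine):

```python
def circle_likelihood(found_pixels, possible_pixels=28):
    # Fraction of the ideal circle's pixels actually found by the detector
    return found_pixels / possible_pixels

print(circle_likelihood(22))  # 22/28 ~ 0.79
print(circle_likelihood(11))  # 11/28 ~ 0.39
```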
30. Summary on Bayesian Inference and Cats 1
- Returning to our inference rule:
- P(C|I) = α P(C) P(I|C)
- PRIOR
- P(C) is the probability that two circles represent a cat, found by measuring catness and using prior knowledge to convert catness to probability.
31. Summary on Bayesian Inference and Cats 2
- P(C|I) = α P(C) P(I|C)
- LIKELIHOOD
- P(I|C) is the probability of the image information, given that two circles represent a cat.
- (Rather a roundabout way of expressing the idea that the image information does actually represent two circles.)
32. Prior and Subjective
- Should we use subjective or objective methods? (Many schools of thought exist.)
- Prior information should be subjective. It represents our belief about the domain we are considering.
- (Even if data has made a substantial contribution to our belief.)
33. Likelihood and Objective
- Likelihood information should be objective. It is a result of the data gathering from which we are going to make an inference.
- It makes some assessment of the accuracy of our data gathering.
- In practice, either or both forms can be subjective or objective.