Title: Intelligent Data Analysis and Probabilistic Inference
1. Intelligent Data Analysis and Probabilistic Inference
- Lecturer: Duncan Gillies (dfg)
2. Course Outline
- Lectures 1-12: Probabilistic Inference and Bayesian Networks
- Lectures 13-16: Data Modelling - distributions, sampling, re-sampling, principal component analysis
- Lectures 17-18: Classification - selected topics
3. Coursework
- A practical exercise implementing data modelling algorithms in the language of your choice.
- Hand out: Week 4, after lecture 6
- Hand in: Week 7
- You will require a DoC login to submit the coursework
4. Web Resources
- The course material can be found on my web page: www.doc.ic.ac.uk/dfg
- Tutorial solutions will be posted a few days after the tutorial is set.
5. Lecture 1
- Bayes' theorem and Bayesian inference
6. Probability and Statistics
- Statistics emerged as an important mathematical discipline in the nineteenth century.
- Probability is much older, and has been studied for as long as people have taken an interest in games of chance.
- Our story starts with the famous theorem of the Rev. Thomas Bayes, published in 1763.
7. Independent Events
- For independent events S and D:
- P(D ∧ S) = P(D) P(S)
- (read "disease" for D and "symptom" for S)
8. Dependencies
- However, in cases where S and D are not independent we must write:
- P(D ∧ S) = P(D) P(S|D)
- where P(S|D) is the probability of the symptom given that the disease has occurred.
9. Bayes' Theorem
- Now since conjunction is commutative:
- P(D|S) P(S) = P(D ∧ S) = P(D) P(S|D)
- and by rearranging we get:
- P(D|S) = P(D) P(S|D) / P(S)
- (Bayes' Theorem)
10. Bayes' Theorem as an Inference Mechanism
- P(D|S) = P(D) P(S|D) / P(S)
- P(D|S), the probability of the disease given the symptom, is what we wish to infer.
- P(D) is the probability of the disease (within a population); this is a measurable quantity.
- P(S|D) is the probability of the symptom given the disease. We can measure this from the case histories of the disease.
11. Normalisation
- P(D|S) is a conditional probability, so for a given S its values sum to 1.
- Suppose D has just two states, d0 and d1; then:
- P(d0|s0) = P(s0|d0) P(d0) / P(s0)
- P(d1|s0) = P(s0|d1) P(d1) / P(s0)
- The denominator is the same in both cases, so we can calculate it using P(d0|s0) + P(d1|s0) = 1.
- In other words, we can eliminate P(s0).
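A minimal Python sketch of this elimination, using made-up priors and likelihoods (the numbers are illustrative, not course data):

```python
# Hypothetical priors P(d0), P(d1) and likelihoods P(s0|d0), P(s0|d1)
prior = {"d0": 0.9, "d1": 0.1}
likelihood_s0 = {"d0": 0.2, "d1": 0.8}

# Unnormalised posteriors P(s0|di) P(di); note P(s0) is never needed
unnorm = {d: likelihood_s0[d] * prior[d] for d in prior}

# Dividing by the sum makes the posteriors sum to 1, which implicitly
# supplies the common factor 1/P(s0)
total = sum(unnorm.values())
posterior = {d: p / total for d, p in unnorm.items()}
print(posterior)  # {'d0': 0.692..., 'd1': 0.307...}
```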
12. Prior and Likelihood Information
- P(D|S) = α P(D) P(S|D)
- (We write α = 1/P(S) to remind us that it is a constant)
- P(D) is prior information, since we knew it before we made any measurements.
- P(S|D) is likelihood information, since we gain it from measurement of symptoms.
13. Bayesian Inference (in its most general form)
- Convert the prior and likelihood information to probabilities
- Multiply them together
- Normalise the result to get the posterior probability of each hypothesis given the evidence
- Select the most probable hypothesis.
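A small Python sketch of this recipe; the function name and the example numbers are my own, not part of the course:

```python
def bayes_infer(prior, likelihood):
    """Return the normalised posterior and the most probable hypothesis.

    prior:      {hypothesis: P(h)}
    likelihood: {hypothesis: P(evidence | h)}
    """
    unnorm = {h: prior[h] * likelihood[h] for h in prior}  # multiply
    alpha = 1.0 / sum(unnorm.values())                     # normalise
    posterior = {h: alpha * p for h, p in unnorm.items()}
    best = max(posterior, key=posterior.get)               # select most probable
    return posterior, best

# Illustrative numbers only
posterior, best = bayes_infer({"cat": 0.3, "not_cat": 0.7},
                              {"cat": 0.8, "not_cat": 0.1})
print(posterior, best)  # best == 'cat'
```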
14. Prior Knowledge
- In simple cases we obtain the prior probability from statistics. For example, we can calculate the prior probability as the number of instances of the disease divided by the number of patients presenting for treatment.
- However, in many cases this is not possible, since the data isn't there.
- There may also be prior knowledge in other forms.
15. Example from Computer Vision
- Consider a program to determine whether an image contains a picture of a cat.
- (Drawing by Kliban)
16. Feature Extraction
- There are a lot of things that we could use to detect a cat, but let's start with something simple.
- We'll write a program to extract circles from the image. If we find two adjacent circles then we'll assume that they are a cat's eyes.
- (I know that they sleep a lot, but never mind)
17. Representing prior knowledge about a cat
- We can formalise our model by specifying:
- Ri ≈ Rj (the eyes are approximately the same size)
- S ≈ 2(Ri + Rj) (the eyes are spaced correctly)
18. Semantic description of a Cat
- To find our cat we will extract every circle from the image and record its position and radius.
- For each pair of circles we will calculate a "catness" measure:
- Catness = |Ri - Rj| / Ri + |S - 2(Ri + Rj)| / Ri
- The catness measure is zero for a perfect match to our model.
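A short Python sketch of the catness computation (the absolute values follow from the measure being zero only at a perfect match; the function and variable names are mine):

```python
def catness(r_i, r_j, s):
    """Zero for a perfect match to the two-eyes model, larger otherwise."""
    size_term = abs(r_i - r_j) / r_i                # eyes roughly equal in size
    spacing_term = abs(s - 2 * (r_i + r_j)) / r_i   # spacing close to 2(Ri + Rj)
    return size_term + spacing_term

print(catness(10.0, 10.0, 40.0))  # 0.0 -> a perfect pair of cat's eyes
print(catness(10.0, 6.0, 50.0))   # 2.2 -> a much poorer match
```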
19. Turning measures into probabilities
- We could choose some common-sense way of changing our catness measure into a probability.
20. Using a distribution
- We can make this look more respectable mathematically by choosing a distribution.
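For instance, one plausible (but entirely subjective) choice is an exponential fall-off, so that catness 0 maps to probability 1; this particular form is my assumption, not the lecture's:

```python
import math

def catness_to_probability(c, scale=1.0):
    # Any monotonically decreasing map from [0, inf) to (0, 1] would do
    return math.exp(-c / scale)

print(catness_to_probability(0.0))  # 1.0 for a perfect match
print(catness_to_probability(2.2))  # ~0.11 for a poor match
```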
21. Subjective Probability
- In all cases we are making a subjective estimate of the probability.
- There is no formal reason for choosing the distribution other than personal judgement.
22. Objective Probabilities
- We make measurements from a large set of photographs.
23. Frequencies
- To get our discrete probability distribution we could process a vast library of photographs.
- Every time we extract a pair of circles and apply a catness measure, we also get an expert to tell us whether the extracted structure really does represent a cat.
- For each bin we calculate the ratio of correctly identified cats to the total.
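A sketch of this frequency calculation in Python, with invented expert-labelled samples (the bin width and the data are assumptions):

```python
def binned_probabilities(samples, n_bins=5, max_catness=5.0):
    """samples: list of (catness, is_cat) pairs from the photo library."""
    totals = [0] * n_bins
    cats = [0] * n_bins
    width = max_catness / n_bins
    for c, is_cat in samples:
        b = min(int(c / width), n_bins - 1)  # clamp overflow into last bin
        totals[b] += 1
        cats[b] += int(is_cat)
    # Ratio of correctly identified cats to the total, per bin
    return [cats[b] / totals[b] if totals[b] else 0.0 for b in range(n_bins)]

samples = [(0.1, True), (0.3, True), (0.4, False), (1.2, True),
           (1.5, False), (2.8, False), (3.1, False), (4.6, False)]
print(binned_probabilities(samples))  # [0.66..., 0.5, 0.0, 0.0, 0.0]
```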
24. Fitting a Distribution
- To overcome experimental error, and to make our representation more compact, we may try to fit an appropriate distribution.
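One way to do the fit, sketched with SciPy's curve_fit; the exponential model and the bin data are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(c, scale):
    # Assumed functional form: probability of 'cat' decays with catness
    return np.exp(-c / scale)

bin_centres = np.array([0.5, 1.5, 2.5, 3.5, 4.5])
frequencies = np.array([0.67, 0.50, 0.05, 0.02, 0.01])  # made-up bin ratios

(scale,), _ = curve_fit(model, bin_centres, frequencies, p0=[1.0])
print(scale)  # a single smooth parameter replaces the noisy histogram
```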
25. Subjective vs. Objective Probabilities
- There is a long-standing debate as to whether the subjective or the objective approach is the more appropriate.
- Objective may seem more plausible at first, but it requires lots of data and is prone to experimental error.
- NB: Some people say the Bayesian approach must be subjective. I do not subscribe to this view.
26. Likelihood
- Our prior probabilities represent long-standing beliefs. They can be taken to be our established knowledge of the subject in question.
- When we process data and make measurements, we create likelihood information.
- Likelihood can incorporate uncertainty in the measurement process.
27. Likelihood and Catness
- Our computer vision process could not just extract a circle, but also tell us how good a circle it is, for example by counting the pixels that contribute to it.
28. Problem Break
- Using a suitable measure, estimate the likelihood of each circle below.
29. Solution
- There are 28 possible circle pixels.
- The first circle has 22, so its likelihood is 22/28 ≈ 0.79.
- The second circle has 11, so its likelihood is 11/28 ≈ 0.39.
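The same calculation as a short Python sketch (the pixel counts come from the slide; the function name is mine):

```python
def circle_likelihood(found_pixels, possible_pixels=28):
    # Fraction of the ideal circle's pixels actually found by the detector
    return found_pixels / possible_pixels

print(circle_likelihood(22))  # 22/28 ~ 0.79
print(circle_likelihood(11))  # 11/28 ~ 0.39
```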
30. Summary on Bayesian Inference and Cats 1
- Returning to our inference rule:
- P(C|I) = α P(C) P(I|C)
- PRIOR
- P(C) is the probability that two circles represent a cat, found by measuring catness and using prior knowledge to convert catness to probability.
31. Summary on Bayesian Inference and Cats 2
- P(C|I) = α P(C) P(I|C)
- LIKELIHOOD
- P(I|C) is the probability of the image information, given that two circles represent a cat.
- (Rather a roundabout way of expressing the idea that the image information does actually represent two circles.)
32. Prior and Subjective
- Should we use subjective or objective methods? (Many schools of thought exist.)
- Prior information should be subjective. It represents our belief about the domain we are considering.
- (Even if data has made a substantial contribution to our belief.)
33. Likelihood and Objective
- Likelihood information should be objective. It is a result of the data gathering from which we are going to make an inference.
- It makes some assessment of the accuracy of our data gathering.
- In practice, either or both forms can be subjective or objective.