1
Intelligent Data Analysis and Probabilistic
Inference
  • Lecturer: Duncan Gillies (dfg)

2
Course Outline
  • Lectures 1-12
  • Probabilistic Inference and Bayesian Networks
  • Lectures 13-16
  • Data Modelling - Distributions, sampling and
    re-sampling, principal component analysis
  • Lectures 17-18
  • Classification - selected topics

3
Coursework
  • A practical exercise implementing data modelling
    algorithms in the language of your choice.
  • Hand out
  • Week 4, after lecture 6
  • Hand in
  • Week 7
  • You will require a doc login to submit the
    coursework

4
Web Resources
  • The material for the course can be found on my
    web page
  • www.doc.ic.ac.uk/dfg
  • Tutorial solutions will be posted a few days
    after the tutorial is set.

5
Lecture 1
  • Bayes theorem and Bayesian inference

6
Probability and Statistics
  • Statistics emerged as an important mathematical
    discipline in the nineteenth century.
  • Probability is much older, and has been studied
    for as long as people have taken an interest in
    games of chance.
  • Our story starts with the famous theorem of the
    Rev. Thomas Bayes, published in 1763.

7
Independent Events
  • For independent events S and D
  • P(D ∧ S) = P(D) P(S)
  • (read "disease" for D and "symptom" for S)

8
Dependencies
  • However in cases where S and D are not
    independent we must write
  • P(D ∧ S) = P(D) P(S|D)
  • where P(S|D) is the probability of the symptom
    given that the disease has occurred.

9
Bayes Theorem
  • Now since conjunction is commutative
  • P(D|S) P(S) = P(D ∧ S) = P(D) P(S|D)
  • and by rearranging we get
  • P(D|S) = P(D) P(S|D) / P(S)
  • (Bayes' Theorem)

10
Bayes Theorem as an Inference Mechanism
  • P(D|S) = P(D) P(S|D) / P(S)
  • P(D|S), the probability of the disease given
    the symptom, is what we wish to infer.
  • P(D) is the probability of the disease (within
    a population); this is a measurable quantity.
  • P(S|D) is the probability of the symptom given
    the disease. We can measure this from the case
    histories of the disease.

11
Normalisation
  • P(D|S) is a conditional probability, so for a
    given S its values sum to 1.
  • Suppose D has just two states, d0 and d1; then
  • P(d0|s0) = P(s0|d0) P(d0) / P(s0)
  • P(d1|s0) = P(s0|d1) P(d1) / P(s0)
  • The denominator is the same in both cases, so we
    can calculate it using P(d0|s0) + P(d1|s0) = 1
  • In other words, we can eliminate P(s0)
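The elimination of P(s0) can be sketched in a few lines: form the two unnormalised products and divide by their sum. The priors and likelihoods here are invented for illustration:

```python
# Made-up priors P(d) and likelihoods P(s0|d) for a two-state D.
p_d = {"d0": 0.7, "d1": 0.3}
p_s0_given_d = {"d0": 0.2, "d1": 0.8}

# Unnormalised posteriors: P(s0|d) P(d)
unnormalised = {d: p_s0_given_d[d] * p_d[d] for d in p_d}

# Their sum recovers P(s0), so we never need it as a separate input.
z = sum(unnormalised.values())
posterior = {d: v / z for d, v in unnormalised.items()}

print(posterior["d0"] + posterior["d1"])  # sums to 1 by construction
```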

12
Prior and Likelihood Information
  • P(D|S) = α P(D) P(S|D)
  • (We write α = 1/P(S) to remind us it is a
    constant)
  • P(D) is prior information, since we knew it
    before we made any measurements
  • P(S|D) is likelihood information, since we gain
    it from measurement of symptoms.

13
Bayesian Inference (in its most general form)
  • Convert the prior and likelihood information to
    probabilities 
  • Multiply them together
  • Normalise the result to get the posterior
    probability of each hypothesis given the
    evidence 
  • Select the most probable hypothesis.
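The four steps above can be sketched as follows; the hypotheses and numbers are invented for illustration:

```python
# Step 1: prior and likelihood information expressed as probabilities
# (hypothetical values for a two-hypothesis problem).
priors = {"cat": 0.3, "not_cat": 0.7}        # prior probabilities
likelihoods = {"cat": 0.8, "not_cat": 0.1}   # P(evidence | hypothesis)

# Step 2: multiply them together
unnormalised = {h: priors[h] * likelihoods[h] for h in priors}

# Step 3: normalise to get the posterior
z = sum(unnormalised.values())
posterior = {h: v / z for h, v in unnormalised.items()}

# Step 4: select the most probable hypothesis
best = max(posterior, key=posterior.get)
print(best, posterior[best])
```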

14
Prior Knowledge
  • In simple cases we obtain the prior probability
    from statistics. For example, we can calculate
    the prior probability as the number of instances
    of the disease divided by the number of patients
    presenting for treatment.
  • However, in many cases this is not possible,
    since the data isn't there.
  • There may also be prior knowledge in other
    forms.

15
Example from Computer Vision
  • Consider a program to determine whether an image
    contains a picture of a cat.

Drawing by Kliban
16
Feature Extraction
  • There are a lot of things that we could use to
    detect a cat, but let's start with something
    simple.
  • We'll write a program to extract circles from
    the image. If we find two adjacent circles then
    we'll assume that it's a cat's eyes.
  • (I know that they sleep a lot, but never mind)

17
Representing prior knowledge about a cat
We can formalise our model by specifying Ri ≈ Rj
(the eyes are approximately the same size) and S ≈
2 (Ri + Rj) (the eyes are spaced correctly)
18
Semantic description of a Cat
  • To find our cat we will extract every circle from
    the image and record its position and radius.
  • For each pair of circles we will calculate a
    catness measure which is
  • Catness = |Ri - Rj| / Ri + |S - 2 (Ri + Rj)| / Ri
  • The catness measure is zero for a perfect match
    to our model.

19
Turning measures into probabilities
  • We could choose some common-sense way of
    changing our catness measure into a probability.

20
Using a distribution
  • We can make this look more respectable
    mathematically by choosing a distribution
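One such common-sense mapping (my own illustrative choice; the lecture does not prescribe a particular distribution) is a decaying exponential, so that a perfect match with catness 0 gets probability 1:

```python
import math

# Hypothetical mapping from catness to probability: exp(-c / scale).
# The scale parameter is an assumption controlling how quickly the
# probability falls off as the match gets worse.
def catness_to_probability(c, scale=1.0):
    return math.exp(-c / scale)

print(catness_to_probability(0.0))   # perfect match -> 1.0
print(catness_to_probability(0.6))   # worse matches -> lower probability
```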

21
Subjective Probability
  • In all cases we are making a subjective estimate
    of the probability.
  • There is no formal reason for choosing the
    distribution other than personal judgement.

22
Objective Probabilities
  • We make measurements from a large set of
    photographs.

23
Frequencies
  • To get our discrete probability distribution we
    could process a vast library of photographs.
  • Every time we extract a pair of circles and apply
    a catness measure, we also get an expert to
    tell us whether the extracted structure does
    represent a cat.
  • For each bin we calculate the ratio of correctly
    identified cats to the total.
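The binning procedure can be sketched with an invented labelled set, where each item is a catness value paired with the expert's verdict:

```python
# Hypothetical labelled data: (catness value, expert says it is a cat).
labelled = [(0.10, True), (0.15, True), (0.20, True),
            (0.30, False), (0.90, False), (1.10, False)]
bin_width = 0.5  # assumed bin size for the discrete distribution

counts = {}  # bin index -> (cats, total)
for c, is_cat in labelled:
    b = int(c / bin_width)
    cats, total = counts.get(b, (0, 0))
    counts[b] = (cats + int(is_cat), total + 1)

# For each bin: ratio of expert-confirmed cats to the bin's total.
probabilities = {b: cats / total for b, (cats, total) in counts.items()}
print(probabilities)
```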

24
Fitting a Distribution
  • To overcome experimental error, and to make our
    representation more compact, we may try to fit
    an appropriate distribution.

25
Subjective vs. Objective Probabilities
  • There is a long-standing debate as to whether
    the subjective or the objective approach is the
    more appropriate.
  • Objective may seem more plausible at first, but
    it does require lots of data and is prone to
    experimental error.
  • NB: Some people say the Bayesian approach must
    be subjective. I do not subscribe to this view.

26
Likelihood
  • Our prior probabilities represent long-standing
    beliefs. They can be taken to be our established
    knowledge of the subject in question.
  • When we process data and make measurements, we
    create likelihood information.
  • Likelihood can incorporate uncertainty in the
    measurement process.

27
Likelihood and Catness
  • Our computer vision process could not only
    extract a circle but also tell us how good a
    circle it is, for example by counting the pixels
    that contribute to it.

28
Problem Break
  • Using a suitable measure, estimate the
    likelihood of each circle below

29
Solution
  • There are 28 possible circle pixels.
  • The first circle has 22, so likelihood = 22/28
    ≈ 0.79
  • The second circle has 11, so likelihood = 11/28
    ≈ 0.39
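The pixel-count likelihood amounts to a single ratio; as a sketch:

```python
# Fraction of the ideal circle's pixels actually found in the image.
# The 28-pixel total comes from the problem break above.
def circle_likelihood(pixels_found, pixels_possible=28):
    return pixels_found / pixels_possible

print(round(circle_likelihood(22), 2))  # first circle
print(round(circle_likelihood(11), 2))  # second circle
```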

30
Summary on Bayesian Inference and Cats 1
  • Returning to our inference rule
  • P(C|I) = α P(C) P(I|C)
  • PRIOR
  • P(C) is the probability that two circles
    represent a cat, found by measuring catness and
    using prior knowledge to convert catness to
    probability.

31
Summary on Bayesian Inference and Cats 2
  • P(C|I) = α P(C) P(I|C)
  • LIKELIHOOD
  • P(I|C) is the probability of the image
    information, given that two circles represent a
    cat.
  • (Rather a roundabout way of expressing the idea
    that the image information does actually
    represent two circles.)

32
Prior and Subjective
  • Should we use subjective or objective methods?
    (Many schools of thought exist)
  • Prior information should be subjective. It
    represents our belief about the domain we are
    considering.
  • (Even if data has made a substantial
    contribution to our belief)

33
Likelihood and Objective
  • Likelihood information should be objective. It is
    a result of the data gathering from which we are
    going to make an inference.
  • It makes some assessment of the accuracy of our
    data gathering.
  • In practice either or both forms can be
    subjective or objective.