Probability and probability distributions

Slides: 69

Provided by: moll2

Transcript and Presenter's Notes

1
Probability and probability distributions

PU5005 Lecture 2

2
Administration

SPSS practicals in Computer room 3 (behind 2nd
floor refectory); this will be the room for every
SPSS practical
Occasionally there will be a classroom based
tutorial
Sign attendance register at lecture and practical
£5 to be paid to the office for the handbook, if
not already done
All people attending must be registered for this
course including those staff and PhD students
taking this course only

3
Room change to Med Chi Hall

Med Chi Hall (Medico-Chirurgical Hall), Ground
Floor, Polwarth Building
If entering the Polwarth Building from the west,
follow the main corridor east past the medical
library, through one set of double doors and take
the stairs to your left down one flight. The Med
Chi Hall is through a single wooden door at the
bottom of the stairs. The lecture room itself is
to the left.
If entering the Polwarth Building from the east
through the main entrance, go through the foyer
past the main stairs on the left and through an
open set of double doors. Turn left signposted to
Medical Microbiology. Follow this corridor to its
end where there will be a single wooden door to
the Med Chi Hall. The lecture room itself is to
the left.

4
Room change to Med Chi Hall

Larger room. The chairs have arms, but no tables.
Bring hard folder or paper pad to lean on.
12 Oct (Next Thursday)
Not 19 Oct when we are in this room (1.147) again
26 Oct
2 Nov
9 Nov
16 Nov
23 Nov
30 Nov
7 Dec

5
Why do we need probability?

Basis for statistical inference
Using data from a sample to address questions
regarding the population

6
What is probability?

Nothing new: you use it all the time in everyday
life
Waiting time in a supermarket queue before being served
Number of yellow cars to pass in 30 minutes
Chance that a baby will be a boy or girl
Chance that a person will get cancer in their
lifetime

7
Definitions

An experiment is an activity whose outcome is
uncertain.
Examples: Throwing a die (1, 2, 3, 4, 5 or 6);
having a disease (yes/no)
An event consists of one or more possible
outcomes.
Experiment: Throwing a die
Event A: an even number

8
Approaches to computing probabilities

Classical
Uses the equally likely outcomes approach
Probability of an event A =
(Number of equally likely outcomes in A) /
(Total number of equally likely outcomes)
Relative frequency
The proportion of times the event occurs if the
experiment were repeated a large number of times

9
Approaches to computing probabilities

What is the probability that on one throw of a
die, the 6 will face upwards?

Classical
Number of outcomes in A = 1; total number of equally
likely outcomes = 6; Probability = 1/6
10
Approaches to computing probabilities

What is the probability that on one throw of a
loaded die, the 6 will face upwards?

Relative frequency
Generate a large number of die throws and
build a frequency distribution for the outcomes
1, 2, 3, 4, 5, 6. P(6) = Number of sixes /
total number of throws
11
Quick experiment
12
Toss one die 600 times
13
Toss a die 600 times
14
Probability

Probability measures uncertainty
A probability measures the chance of an event
occurring
Probabilities must lie between 0 and 1
The sum of the probabilities of all possible
mutually exclusive events is 1
Probability is central to statistical inference

15
Rules of probability

There are rules that allow us to compute the
probability of an event occurring
Addition rule
Multiplication rule
Terminology
Independent events
Mutually exclusive events
Conditional probability

16
Independent events

Definition: Two events are independent if one
happening has no bearing on whether the other
happens or not.
Examples
An experiment involves throwing two dice. The
value of the first die tells us nothing about
what the value of the second die will be.
Pick two people in the class. Knowing the eye
colour of one will tell me nothing about the eye
colour of the other.

17
Mutually exclusive events
Definition: Two events are mutually exclusive if
one happening means that the other cannot happen

The sum of the probabilities of all possible
mutually exclusive events is equal to 1.
18
Addition Rule

Consider two events A and B: what is the
probability that either event occurs?
The addition rule states that
P(A or B) = P(A) + P(B) - P(A and B)

19
Example

A die is thrown and the upper face is observed.
Event A is that an even number is observed:
{2, 4, 6}
Event B is that a number < 4 is observed: {1, 2, 3}
P(A) = 3/6, P(B) = 3/6, P(A and B) = 1/6

Using the addition rule
P(A or B) = P(A) + P(B) - P(A and B)
= 3/6 + 3/6 - 1/6 = 5/6
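
The fair-die example can be checked by listing outcomes. This is a small sketch using exact fractions; the helper name `prob` is introduced here for illustration and assumes all six faces are equally likely.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}   # fair die: all faces equally likely
A = {2, 4, 6}                   # even number
B = {1, 2, 3}                   # number less than 4

def prob(event):
    """Classical probability: outcomes in the event / total outcomes."""
    return Fraction(len(event), len(outcomes))

p_union_direct = prob(A | B)                      # count outcomes in A or B
p_union_rule = prob(A) + prob(B) - prob(A & B)    # addition rule

print(p_union_direct, p_union_rule)  # both 5/6
```

Subtracting P(A and B) stops the outcome 2, which lies in both events, from being counted twice.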
20
Pictorial explanation
Venn diagram: A only {4, 6}; B only {1, 3}; both A and B {2}
21
Example P14

A loaded die is thrown and the upper face is
observed.
Event A is that an even number is observed
Event B is that a number < 4 is observed
P(A) = 0.4, P(B) = 0.2, P(A and B) = 0.1

Using the addition rule
P(A or B) = P(A) + P(B) - P(A and B)
= 0.4 + 0.2 - 0.1 = 0.5
22
Pictorial explanation
Venn diagram: P(A) = 0.4; P(B) = 0.2; P(A and B) = 0.1
23
Mutually Exclusive Events

Two events are mutually exclusive if one
happening means that the other cannot happen.
e.g. Throw a die
Event A: Even number
Event B: Odd number
P(A or B) = P(A) + P(B)
= 3/6 + 3/6
= 1

24
Multiplication rule of probability

If two events are independent then
P(event A and event B)
= P(A) × P(B)
In a die toss experiment (fair/unloaded die),
events A and B are
A = observe an even number and
B = observe a number ≤ 4
What is the probability of observing A and B?

25
Solution

P(A) = P({2, 4, 6}) = 3/6 = 1/2
P(B) = P({1, 2, 3, 4}) = 4/6 = 2/3
P(A and B) = P({2, 4}) = 2/6 = 1/3
Note P(A and B) = 1/2 × 2/3 = P(A) × P(B)
This implies that events A and B are independent,
as P(A and B) = P(A) × P(B)
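
The independence check can be written out as a short sketch. It assumes a fair die and takes B to be {1, 2, 3, 4}, as in the solution above; the event C = {1, 2, 3} from the earlier addition-rule example is added here as a contrast, and the function names are illustrative.

```python
from fractions import Fraction

FACES = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event on a fair die."""
    return Fraction(len(event), len(FACES))

def is_independent(e1, e2):
    """Independent exactly when P(e1 and e2) = P(e1) x P(e2)."""
    return prob(e1 & e2) == prob(e1) * prob(e2)

A = {2, 4, 6}        # even number
B = {1, 2, 3, 4}     # number of 4 or less
C = {1, 2, 3}        # number less than 4 (contrast case)

print(is_independent(A, B))  # True:  1/3 == 1/2 x 2/3
print(is_independent(A, C))  # False: 1/6 != 1/2 x 1/2
```

The contrast shows that independence is a property of the particular pair of events, not of the die.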

26
Conditional Probability

Independence is a strong assumption.
Consider instead the probability of one event
happening given that the other has occurred.
Probability of event A occurring GIVEN that B has
occurred is denoted with a vertical line:
P(A|B)
P(A|B) = P(A and B) / P(B)

27
Definition of conditional probability

Conditional probability
P(B|A) = P(A and B) / P(A)
P(B given A has occurred) = P(A and B) / P(A)
P(A and B) = P(B|A) × P(A)

28
Example

A standard European pack has 52 cards: 13 each
of 4 suits, hearts ♥, diamonds ♦, clubs ♣ and
spades ♠
Choose two cards from the pack. What is the
probability that both are hearts, under two
scenarios?
The first card is replaced
The first card is not replaced

29
Scenario A

First card is replaced in pack
The two events are independent
When drawing each card all 52 cards are
available.
P(1st card is a heart) = 13/52
P(2nd card is a heart) = 13/52
P(1st and 2nd card are hearts) = 13/52 × 13/52
= 0.0625

30
Scenario B

The first card is not replaced in the pack
Therefore, the probability of the second card
being a heart is conditional on the first card.
P(1st card is a heart) = 13/52
P(2nd card is a heart given that the first was a
heart) = 12/51
P(1st heart and 2nd heart)
= P(2nd heart | 1st heart) × P(1st heart)
= 12/51 × 13/52
= 0.0588
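
Both card scenarios can be reproduced with exact fractions. This is a minimal sketch; the variable names are illustrative.

```python
from fractions import Fraction

HEARTS, PACK = 13, 52

# Scenario A: first card replaced, so the two draws are independent
p_with_replacement = Fraction(HEARTS, PACK) * Fraction(HEARTS, PACK)

# Scenario B: first card not replaced, so the second draw is conditional
# on the first: P(2nd heart | 1st heart) = 12/51
p_first = Fraction(HEARTS, PACK)
p_second_given_first = Fraction(HEARTS - 1, PACK - 1)
p_without_replacement = p_second_given_first * p_first

print(p_with_replacement, float(p_with_replacement))
print(p_without_replacement, round(float(p_without_replacement), 4))
```

Removing a heart from the pack makes a second heart slightly less likely, which is why the second figure is smaller.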

31
Example

A computing magazine reported that one third of
graduates working in computing have degrees in IT
and two-thirds have degrees in other subjects.
They conclude that arts and science graduates
are twice as likely as IT graduates to pursue
careers as IT professionals
Is this statement definitely correct?

32
Discussion of example

"One third of graduates working in computing have
degrees in IT and two-thirds have degrees in
other subjects"

P(arts or science graduate | works in IT) = 2/3
P(graduate works in IT | arts or science
graduate) ≠ 2/3
P(A|B) ≠ P(B|A)
33
Diagnostic tests Conditional probability

In certain situations it is necessary to know the
probability of a particular event or outcome
happening given that we already know that another
event or outcome has already occurred.
P(B|A) = P(A and B) / P(A)
The main application of these types of
probabilities is in diagnostic tests and screening
programs.

34
Screening example

As part of the Breast Cancer Screening Project of
the Health Insurance Plan of Greater New York
(HIP Program) 64,810 women aged between 40 and 64
were screened for breast cancer by mammography
and physical examination.
A total of 1,115 women were positive on
screening, of whom 132 had breast cancer
diagnosed. During a five-year follow-up period
45 further cases of cancer were detected among
women who were negative on screening. Source:
American Journal of Epidemiology (1974), 100:
357-366.

35
Screening
What is the probability that a woman will develop
breast cancer given that she has had a positive
test?
P(breast cancer | positive test result)
36
Solution
P(cancer | +ve screening) = P(cancer and +ve
screening) / P(+ve screening)
P(cancer and +ve screening) = 132/64,810
P(+ve screening) = 1,115/64,810
P(cancer | +ve screening) = 132/1,115 = 0.118
37
Predictive values

We have just calculated the positive predictive
value of the screening test in this sample.
Screening or diagnostic tests are used to
identify diseases and to help make a diagnosis.
It is important to know the probability that the
test is giving the correct diagnosis (positive or
negative).
Positive predictive value (PPV) is the
probability of a person having the disease given
a positive test result.
PPV = P(test +ve and disease +ve) /
P(test +ve)

38
Sensitivity

Sensitivity is the probability that the test is
positive given that the disease is present (i.e.
true positives).
In conditional probability notation this is
P(Test is positive given that we know that the
Disease is present)
= P(Test +ve | Disease +ve)
= P(Test +ve and Disease +ve) /
P(Disease +ve)

39
Calculations - Sensitivity

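P(Test +ve | Disease +ve) = P(Test +ve and Disease
+ve) / P(Disease +ve)

= (132/64,810) / (177/64,810)
= 132/177 = 0.75
where 177 = 132 + 45 is the total number of
cancers (detected at screening or during follow-up)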
40
Specificity

Specificity is the probability that the test is
negative given that the disease is absent (i.e.
true negatives).
In conditional probability notation this is
P(Test is negative given we know that the Disease
is absent) = P(Test -ve | Disease -ve)
= P(Test -ve and Disease -ve) /
P(Disease -ve)

41
Calculations - Specificity

P(Test -ve | Disease -ve) = P(Test -ve and Disease
-ve) / P(Disease -ve)

= (63,650/64,810) / (64,633/64,810)

= 63,650/64,633 = 0.985
42
Sensitivity, Specificity, PPV

Sensitivity is the proportion of disease
positives that are correctly identified by the
test
132/177 = 0.746 = 74.6%
Specificity is the proportion of disease
negatives that are correctly identified by the
test
63,650/64,633 = 0.985 = 98.5%
Positive predictive value is the proportion of
patients with positive test results who are
correctly diagnosed
132/1,115 = 0.118 = 11.8%
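
The three measures can be rebuilt from the four counts quoted in the screening example. The sketch below derives the remaining cells of the 2x2 table by subtraction; the variable names are illustrative.

```python
# Counts quoted in the HIP screening example
screened = 64_810   # women screened
test_pos = 1_115    # positive on screening
true_pos = 132      # positive on screening and cancer diagnosed
false_neg = 45      # negative on screening, cancer found in follow-up

# Remaining cells follow by subtraction
false_pos = test_pos - true_pos               # screen +ve, no cancer
diseased = true_pos + false_neg               # all cancers: 177
true_neg = screened - test_pos - false_neg    # screen -ve, no cancer
non_diseased = true_neg + false_pos           # all without cancer: 64,633

sensitivity = true_pos / diseased        # P(test +ve | disease +ve)
specificity = true_neg / non_diseased    # P(test -ve | disease -ve)
ppv = true_pos / test_pos                # P(disease +ve | test +ve)

print(round(sensitivity, 3), round(specificity, 3), round(ppv, 3))
```

Sensitivity and specificity condition on disease status, while the positive predictive value conditions on the test result, so the denominators differ.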

43
A note on these diagnostic values
The predictive values are only of limited
validity. In clinical practice the predictive
values depend critically on the prevalence of the
abnormality in the patients being tested. This
may well differ from the prevalence in the published
study assessing the usefulness of the test.
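
The dependence on prevalence can be made concrete. The sketch below uses the standard identity that follows from the definition of conditional probability, PPV = (sensitivity × prevalence) / (sensitivity × prevalence + (1 - specificity) × (1 - prevalence)). The sensitivity and specificity are those of the screening example; the prevalences of 1%, 5% and 20% are hypothetical values chosen only for illustration.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence                # P(test +ve and disease +ve)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(test +ve and disease -ve)
    return true_pos / (true_pos + false_pos)

sens = 132 / 177          # from the screening example
spec = 63_650 / 64_633    # from the screening example
study_prevalence = 177 / 64_810

# The last three prevalences are hypothetical, not from the study
for prev in (study_prevalence, 0.01, 0.05, 0.20):
    print(round(prev, 4), round(ppv(sens, spec, prev), 3))
```

At the study's own prevalence the function returns the 0.118 calculated earlier; with the same test, a higher prevalence gives a higher predictive value.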
44
So what!

The rules of probability allow us to compute
probabilities that a mutually exclusive event
will occur.
From these we can produce theoretical probability
distributions.
Each probability distribution is defined by
certain parameters (e.g. the mean and variance)
which characterise the distribution.
Probability distributions can be discrete or
continuous.

45
Some discrete and continuous probability
distributions

Discrete probability distributions
Binomial (number of trials (n) and probability of
success (p))
Poisson (count of the number of events occurring
independently (average rate))
Continuous probability distributions
Normal (mean and variance)
t (degrees of freedom)
F
Chi-square (χ²)

46
Binomial distribution

This type of distribution is used to calculate
probabilities when we have a dichotomous variable
(e.g. success/failure following treatment)
3 patients have a headache
They are each given a tablet to relieve their
headache
The outcomes (relief/no relief) for the three
patients are independent of one another
Of the three patients there are 3 combinations in
which 2 of the three will have relief following
treatment (Patients 1 and 2, or 1 and 3, or 2 and
3)

47
Binomial distribution example

Cystic fibrosis is a serious congenital disease
which results in abnormal amounts of thick sticky
secretions in the lungs. It is autosomal
recessive, so that if both parents are carriers
there is a 1 in 4 chance that each child they
have will be affected.
If the parents are both carriers and have 3
children
What is the probability that none of the three
children are affected?
What is the probability that two of the three are
affected?

48
Answer

P(none has CF)
= P(child 1 has no CF and child 2 has no CF and
child 3 has no CF)
= 3/4 × 3/4 × 3/4 = 0.4219
The probability that none of the children will
have CF is 0.42 (i.e. there is a 42% chance that
none will have CF)

49
Probability that two of the three children will
have CF
P(child 1 has CF and child 2 has CF and child 3
has no CF
or child 1 has CF and child 2 has no CF and child
3 has CF
or child 1 has no CF and child 2 has CF and child
3 has CF)

= (1/4 × 1/4 × 3/4) + (1/4 × 3/4 × 1/4) + (3/4 ×
1/4 × 1/4)
= 3 × (1/4 × 1/4 × 3/4) = 0.1406
The probability that two out of three will have
CF is 0.14.

50
Binomial distribution
51
Binomial distribution

Probabilities can be computed using a
mathematical formula.
These are listed in statistical tables, where it is
easy to find the probability of r events
occurring in n trials with probability of
success p.
However, with computers these can be easily
accessed without the need to consult tables of
probabilities.
When the number of trials (n) increases, the
binomial distribution can be approximated by the
Normal distribution.
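
The mathematical formula referred to here is the standard binomial probability, P(r events in n trials) = C(n, r) × p^r × (1 - p)^(n - r), where C(n, r) counts the ways of choosing which r trials are the events. A minimal sketch follows, checked against the cystic fibrosis example with n = 3 and p = 1/4; the function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r events in n independent trials,
    each with event probability p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Cystic fibrosis example: 3 children, each affected with probability 1/4
print(round(binomial_pmf(0, 3, 0.25), 4))  # 0.4219, none affected
print(round(binomial_pmf(2, 3, 0.25), 4))  # 0.1406, two of three affected

# The probabilities over r = 0, 1, 2, 3 sum to 1
print(sum(binomial_pmf(r, 3, 0.25) for r in range(4)))
```

The factor comb(3, 2) = 3 is the "3 combinations" counted by hand in the headache and cystic fibrosis examples.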

52
Continuous probability distributions

For continuous probability distributions we can
only calculate the probability of a random
variable taking values in a certain range.
A curve can be drawn from the equation that
represents the appropriate probability
distribution.
The area under the curve must equal 1 since the
curve represents the probability of all possible
events.

53
Normal distribution

Turning our attention to continuous variables
such as height, weight or blood pressure it is
also possible to calculate probabilities.
The distributions of many medical measurements
approximate to the Normal distribution (e.g. serum
uric acid levels, cholesterol levels).

54
The Normal distribution

This distribution is a smooth bell-shaped
distribution which is symmetrical about its mean
value.
It will be flatter if the variance is larger and
more peaked if the variance is smaller.
Areas under this Normal curve correspond to
probabilities.

55
Properties of Normal curve
Figure: areas under the Normal curve (P = 0.68, P = 0.95, P = 0.999)
56
Standard Normal Distribution

Since there are an infinite number of Normal
distributions, it is easier to work with the
Standard Normal distribution.
The Standard Normal distribution has a mean of 0
and a standard deviation of 1.
The standardised Normal deviate (Z score) is a
random variable that has a Standard Normal
distribution.
A Z-score establishes how many standard
deviations a particular value is from the mean
value.

57
How to calculate Z-scores

It is easier to work with Z-scores
The Z-score is found as
Z = (X - μ) / σ
where
X is the random variable of interest
μ represents the population mean of the Normal
distribution that the random variable follows
σ represents the population standard deviation

58
Properties of Normal curve
Figure: areas under the Normal curve (P = 0.68, P = 0.95, P = 0.999)
59

The probability of being between any two points may
be calculated as the area under the curve. These
areas are available in tables, but roughly
the probability that a random Normal variable
will take a value within
1 standard deviation either side of the mean is
0.68
1.96 standard deviations either side of the mean
is 0.95
2.58 standard deviations either side of the mean
is 0.99
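
These areas can be checked without tables. The sketch below uses `statistics.NormalDist` from the Python standard library (version 3.8 or later); the helper name `prob_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def prob_within(k):
    """Area under the Standard Normal curve within k standard
    deviations either side of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 1.96, 2.58):
    print(k, round(prob_within(k), 3))
```

Because any Normal variable can be converted to a Z-score, the same areas apply whatever the mean and standard deviation.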

60
Example

Suppose we know that in a population of patients,
diastolic blood pressure follows a Normal
distribution with mean 100mmHg and standard
deviation 8 mmHg.
Find the probability that the diastolic blood
pressure of a particular patient is less than 92
mmHg.

61
Solution

If we let X represent diastolic blood pressure,
μ represent mean blood pressure and σ
represent standard deviation,
we can perform the following
Calculate the probability that blood pressure is
less than 92 mmHg, denoted P(X < 92)
If X comes from the Normal distribution with mean
100 and SD = 8,
then standardise the variable into a Z score,
thus
P(X < 92) = P(Z < (92 - 100)/8) = P(Z < -1),
where Z has a standardised Normal distribution
(Z ~ N(0,1)).

62
P(Z < -1) = 0.16
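
The blood pressure calculation can be reproduced in a few lines. This is a sketch using the standard library's `statistics.NormalDist`; it computes the answer both through the Z-score and directly from the Normal distribution with mean 100 and standard deviation 8.

```python
from statistics import NormalDist

mu, sigma = 100, 8    # population mean and SD of diastolic BP (mmHg)
threshold = 92

# Route 1: standardise, then use the Standard Normal distribution
z_score = (threshold - mu) / sigma           # (92 - 100) / 8
p_via_z = NormalDist().cdf(z_score)          # P(Z < z_score)

# Route 2: use the N(100, 8) distribution directly
p_direct = NormalDist(mu, sigma).cdf(threshold)   # P(X < 92)

print(z_score, round(p_via_z, 2), round(p_direct, 2))
```

Both routes give the same area, which is the point of standardising: one table (or one function) serves every Normal distribution.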
63
The Normal distribution

The Normal distribution is for continuous data
when the population mean and variance are known.
When these are not known and we only have sample
information about the mean and variance then we
use the Student's t-distribution. As the sample
size increases this tends towards the Normal.
In fact many of the distributions tend towards
Normality especially if many samples are
collected. Hence, the Normal distribution has
become an important part in the theory of
statistics.

64
Chi-Squared

Another important distribution related to the
normal is the Chi-squared distribution.
It is used when investigating categorical
data.

65
Crosstab Example

If eye and hair colour are not associated then,
for example, the expected number with blue eyes
and blond hair would be
(number with blue eyes × number with blond hair) /
total number of people

So the chi-squared statistic is found by looking at a function
of the discrepancy between observed and expected
counts in each cell,
summed over all combinations of hair and eye
colour.
If this is large and in the tail of the
distribution, then it may be said that the
observed is not as expected!
More of this later.
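
A small numerical sketch may help here. The counts below are hypothetical, invented purely for illustration, and the statistic used is the usual Pearson form, the sum over cells of (observed - expected)² / expected, which is assumed to be the function of the discrepancy that the slide has in mind.

```python
# Hypothetical 2x2 crosstab (made-up counts, not from the lecture):
# rows = blue eyes / other eyes, columns = blond hair / other hair
observed = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under no association: row total x column total / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-squared statistic, summed over all cells
chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(expected)
print(chi_squared)
```

The larger the statistic, the further the observed counts sit from what no association would predict; judging whether it is "in the tail" requires comparing it with the chi-squared distribution.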

67
Summary

Probabilities are integral to all things around
us.
We can derive and understand probabilities.
We have seen that probabilities build together to
form probability distributions.
Some are theoretical distributions that are well
understood, the most important being the Normal.
Using these theoretical distributions we can
begin to make inferences about the population on
the basis of samples.

68
On a brighter note

Don't worry if you find the theory underlying
these distributions confusing.
In practice, it is more important that you know
when and how to use each of the probability
distributions rather than understand how the
probabilities are calculated.
They allow us to quantify how likely an event,
measurement, or statistic is to occur.
