Title: Probability
1. Lecture 4
- Probability, and what it has to do with data analysis
2. Please Read
- Doug Martinson's Chapter 2, "Probability Theory"
- Available on Courseworks
3. Abstraction
- Random variable, x
- It has no set value until you realize it
- Its properties are described by a distribution, p(x)
4. Probability density distribution
- When you realize x, the probability that the value you get is between x and x+dx is p(x) dx
5. The probability, P, that the value you get is between x1 and x2 is
  P = \int_{x1}^{x2} p(x) dx
- Note that it is written with a capital P
- and is a number between 0 (never) and 1 (always)
6. The probability P that x is between x1 and x2 is proportional to this area.
[Figure: p(x) versus x, with the area under the curve between x1 and x2 shaded.]
7. The probability that x is between -\infty and +\infty is unity, so the total area is 1.
[Figure: p(x) versus x, with the entire area under the curve shaded.]
- The probability that the value you get is something is unity:
  \int_{-\infty}^{\infty} p(x) dx = 1
  (or whatever the allowable range of x is)
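As an added illustration (not part of the original slides), here is a minimal Python sketch that checks the normalization and evaluates P for an interval by numerical integration. The exponential density p(x) = exp(-x) on [0, \infty) is purely an illustrative choice.

```python
import numpy as np

# Assumed, illustrative density: p(x) = exp(-x) on x >= 0
x = np.linspace(0.0, 50.0, 200001)      # dense grid; 50 is "effectively infinity" here
p = np.exp(-x)

# Total area should be 1 (normalization)
total = np.trapz(p, x)

# P that x lies between x1 and x2 is the area under p(x) on that interval
x1, x2 = 1.0, 2.0
mask = (x >= x1) & (x <= x2)
P = np.trapz(p[mask], x[mask])

print(total)   # ~1.0
print(P)       # ~exp(-1) - exp(-2) ~ 0.2325
```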
8. Why all this is relevant
- Any measurement that contains noise is treated as a random variable, x
- The distribution p(x) embodies both the true value of the quantity being measured and the measurement noise
- All quantities derived from a random variable are themselves random variables, so
- The algebra of random variables allows you to understand how measurement noise affects inferences made from the data
9. Basic Description of Distributions
10. Mode: the x at which the distribution has its peak; the most likely value of x.
[Figure: p(x) versus x, with the peak labeled at x = xmode.]
11. But modes can be deceptive
100 realizations of x:

  x      N
  0-1    3
  1-2    18
  2-3    11
  3-4    8
  4-5    11
  5-6    14
  6-7    8
  7-8    7
  8-9    11
  9-10   9

Sure, the 1-2 range has the most counts, but most of the measurements are bigger than 2!
[Figure: p(x) versus x from 0 to 10, with the peak at xmode in the 1-2 bin.]
12. Median: 50% chance that x is smaller than xmedian, 50% chance that x is bigger than xmedian.
There is no special reason the median needs to coincide with the peak.
[Figure: p(x) versus x, split at xmedian into two regions of 50% probability each.]
13. Expected value, or mean: the x you would get if you took the mean of lots of realizations of x.
Let's examine a discrete distribution, for simplicity ...
[Figure: bar chart of p(x) for the discrete values x = 1, 2, 3.]
14. Hypothetical table of 140 realizations of x: 20 ones, 80 twos, 40 threes.

mean = (20 \times 1 + 80 \times 2 + 40 \times 3) / 140
     = (20/140) \times 1 + (80/140) \times 2 + (40/140) \times 3
     = p(1) \times 1 + p(2) \times 2 + p(3) \times 3
     = \sum_i p(x_i) x_i
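A quick numerical check of the slide's arithmetic (an added sketch, not from the original deck):

```python
import numpy as np

# The 140 hypothetical realizations from the slide: 20 ones, 80 twos, 40 threes
counts = {1: 20, 2: 80, 3: 40}
N = sum(counts.values())                      # 140

# Mean computed directly from the realizations
mean_direct = sum(x * n for x, n in counts.items()) / N

# Mean computed as sum_i p(x_i) * x_i
p = {x: n / N for x, n in counts.items()}     # p(1), p(2), p(3)
mean_from_p = sum(p[x] * x for x in p)

print(mean_direct, mean_from_p)               # both ~2.142857
```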
15. By analogy, for a smooth distribution:
- Expected value of x:
  E(x) = \int_{-\infty}^{\infty} x p(x) dx
16. By the way:
- You can compute the expected (mean) value of any function of x this way:
- E(x) = \int_{-\infty}^{\infty} x p(x) dx
- E(x^2) = \int_{-\infty}^{\infty} x^2 p(x) dx
- E(\sqrt{x}) = \int_{-\infty}^{\infty} \sqrt{x} p(x) dx
- etc.
17. Beware!
- E(x^2) \neq [E(x)]^2
- E(x) \neq [E(\sqrt{x})]^2
- and so forth
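A small numerical illustration of the warning above (an added sketch, not from the slides), using a uniform distribution on [0, 1]:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 1_000_000)   # realizations of a uniform random variable on [0, 1]

E_x = x.mean()                # ~0.5
E_x2 = (x**2).mean()          # ~1/3
E_sqrt = np.sqrt(x).mean()    # ~2/3

print(E_x2, E_x**2)           # ~0.333 vs ~0.25  -> E(x^2) != E(x)^2
print(E_x, E_sqrt**2)         # ~0.5   vs ~0.444 -> E(x)  != E(sqrt(x))^2
```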
18. Width of a distribution. Here's a perfectly sensible way to define the width of a distribution:
[Figure: p(x) versus x, with a central interval of width W50 containing 50% of the probability and 25% in each tail.]
It's not used much, though.
19. Width of a distribution. Here's another way:
[Figure: p(x) versus x, overlaid with the parabola [x-E(x)]^2 centered at E(x).]
Multiply the two and integrate.
20. The idea is that if the distribution is narrow, then most of the probability lines up with the low spot of the parabola. But if it is wide, then some of the probability lines up with the high parts of the parabola.
[Figure: the parabola [x-E(x)]^2 overlaid on p(x), and the product [x-E(x)]^2 p(x), whose total area is computed.]
Variance: \sigma^2 = \int_{-\infty}^{\infty} [x-E(x)]^2 p(x) dx
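As an added sketch (not in the original slides), the variance integral can be evaluated numerically; the uniform density on [0, 1] used here is just an illustrative choice, for which the exact answer is 1/12.

```python
import numpy as np

# Illustrative density: uniform on [0, 1], so p(x) = 1 there
x = np.linspace(0.0, 1.0, 100001)
p = np.ones_like(x)

Ex = np.trapz(x * p, x)                 # E(x) = integral of x p(x) dx          -> 0.5
var = np.trapz((x - Ex)**2 * p, x)      # sigma^2 = integral of (x-E(x))^2 p(x) -> 1/12

print(Ex, var)                          # ~0.5, ~0.0833
```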
21. \sqrt{variance} = \sigma: a measure of width.
[Figure: p(x) versus x, with \sigma marked as a width about E(x).]
We don't immediately know its relationship to area, though.
22. The Gaussian or normal distribution:
p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x-\bar{x})^2}{2\sigma^2} \right)
where \bar{x} is the expected value and \sigma^2 is the variance.
Memorize me!
23. Examples of Normal Distributions
[Figure: two panels of p(x) versus x, one with \bar{x} = 1, \sigma = 1 and one with \bar{x} = 3, \sigma = 0.5.]
24. Properties of the normal distribution
- Expectation = Median = Mode = \bar{x}
- 95% of the probability lies within 2\sigma of the expected value
[Figure: p(x) versus x, with the central 95% region shaded.]
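A quick numerical check of these properties (an added sketch; the values \bar{x} = 3, \sigma = 0.5 are taken from the second example on slide 23):

```python
import math
import numpy as np

xbar, sigma = 3.0, 0.5

# Gaussian density from slide 22
x = np.linspace(xbar - 8*sigma, xbar + 8*sigma, 200001)
p = np.exp(-(x - xbar)**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

mask = np.abs(x - xbar) <= 2 * sigma
print(np.trapz(p[mask], x[mask]))       # ~0.954: about 95% of the probability within 2 sigma
print(np.trapz(x * p, x))               # ~3.0: the expectation equals xbar
```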
25. Functions of a random variable
Any function of a random variable is itself a random variable.
26. If x has distribution p(x), then y(x) has distribution
p(y) = p(x(y)) |dx/dy|
27. This follows from the rule for transforming integrals:
1 = \int_{x1}^{x2} p(x) dx = \int_{y1}^{y2} p(x(y)) |dx/dy| dy
with the limits chosen so that y1 = y(x1), etc.
28. Example
- Let x have a uniform (white) distribution on [0, 1]
[Figure: p(x) = 1 for x between 0 and 1.]
Uniform probability that x is anywhere between 0 and 1.
29.
- Let y = x^2
- then x = y^{1/2}
- y(x=0) = 0, y(x=1) = 1
- dx/dy = (1/2) y^{-1/2}
- p(x(y)) = 1
- So p(y) = (1/2) y^{-1/2} on the interval [0, 1]
30. Numerical test: histogram of 1000 random numbers
[Figure, top: histogram of x, generated with Excel's rand() function, which claims to be based on a uniform distribution. Plausible that it's uniform.]
[Figure, bottom: histogram of x^2, generated by squaring the x's from above. Plausible that it's proportional to 1/\sqrt{y}.]
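The same test can be reproduced in Python instead of Excel (an added sketch; with only 1000 samples the agreement is rough, especially in the smallest bins, which is the point of "plausible"):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 1000)          # 1000 uniform random numbers, like rand()
y = x**2                                  # the transformed variable

# Histogram of y, normalized so the bar heights estimate p(y)
counts, edges = np.histogram(y, bins=10, range=(0.0, 1.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Predicted density from slide 29: p(y) = (1/2) y^(-1/2)
predicted = 0.5 / np.sqrt(centers)

for c, obs, pred in zip(centers, counts, predicted):
    print(f"y~{c:.2f}  observed {obs:.2f}  predicted {pred:.2f}")
```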
31. Multivariate distributions
32. Example
- Liberty Island is inhabited by both pigeons and seagulls
- 40% of the birds are pigeons and 60% of the birds are gulls
- 50% of pigeons are white and 50% are tan
- 100% of gulls are white
33. Two variables
- Species s takes two values: pigeon (p) and gull (g)
- Color c takes two values: white (w) and tan (t)
Of 100 birds: 20 are white pigeons, 20 are tan pigeons, 60 are white gulls, 0 are tan gulls.
34. What is the probability that a (random) bird has species s and color c?

        c = w    c = t
  s = p   20%      20%
  s = g   60%       0%

Note: the sum of all boxes is 100%.
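For later reference, the same table can be held in a small array (an added sketch, not part of the slides):

```python
import numpy as np

# Joint probability table for Liberty Island birds, as fractions:
# rows are species s = (pigeon, gull), columns are color c = (white, tan)
P_sc = np.array([[0.20, 0.20],
                 [0.60, 0.00]])

print(P_sc.sum())        # 1.0: all boxes sum to 100%
print(P_sc[1, 0])        # 0.6: the probability of a white gull
```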
35. This is called the Joint Probability and is written P(s,c).
36. Two continuous variables, say x1 and x2, have a joint probability distribution, written p(x1, x2), with
\int \int p(x1, x2) dx1 dx2 = 1
37. The probability that x1 is between x1 and x1+dx1 and x2 is between x2 and x2+dx2 is p(x1, x2) dx1 dx2, so
\int \int p(x1, x2) dx1 dx2 = 1
38. You would contour a joint probability distribution, and it would look something like:
[Figure: contour plot of p(x1, x2) in the (x1, x2) plane.]
39. What is the probability that a bird has color c?
Of 100 birds: 20 are white pigeons, 20 are tan pigeons, 60 are white gulls, 0 are tan gulls.
Start with P(s,c):

        c = w    c = t
  s = p   20%      20%
  s = g   60%       0%

and sum the columns to get P(c):

        c = w    c = t
  P(c)    80%      20%
40. What is the probability that a bird has species s?
Of 100 birds: 20 are white pigeons, 20 are tan pigeons, 60 are white gulls, 0 are tan gulls.
Start with P(s,c) and sum the rows to get P(s):

        c = w    c = t    P(s)
  s = p   20%      20%     40%
  s = g   60%       0%     60%
41. These operations make sense with distributions, too.
[Figure: contour plot of p(x1,x2) together with the marginal curves p(x1) and p(x2).]
p(x1) = \int p(x1,x2) dx2   (distribution of x1, irrespective of x2)
p(x2) = \int p(x1,x2) dx1   (distribution of x2, irrespective of x1)
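In the continuous case the "sum over a row or column" becomes a numerical integral along one axis; a minimal sketch (added here, using an illustrative pair of independent standard normals):

```python
import numpy as np

# Illustrative joint density: independent standard normals in x1 and x2
x1 = np.linspace(-5, 5, 401)
x2 = np.linspace(-5, 5, 401)
X1, X2 = np.meshgrid(x1, x2, indexing="ij")
p = np.exp(-0.5 * (X1**2 + X2**2)) / (2 * np.pi)

# Marginal p(x1) = integral of p(x1,x2) dx2: integrate along the x2 axis
p_x1 = np.trapz(p, x2, axis=1)

print(np.trapz(p_x1, x1))                  # ~1.0: the marginal is normalized
print(p_x1[np.argmin(np.abs(x1))])         # ~0.3989 = 1/sqrt(2*pi) at x1 = 0
```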
42. Given that a bird is species s, what is the probability that it has color c?
Of 100 birds: 20 are white pigeons, 20 are tan pigeons, 60 are white gulls, 0 are tan gulls.

        c = w    c = t
  s = p   50%      50%
  s = g  100%       0%

Note: all rows sum to 100%.
43. This is called the Conditional Probability of c given s, and is written P(c|s). Similarly:
44. Given that a bird is color c, what is the probability that it has species s?
Of 100 birds: 20 are white pigeons, 20 are tan pigeons, 60 are white gulls, 0 are tan gulls. So 25% of white birds are pigeons.

        c = w    c = t
  s = p   25%     100%
  s = g   75%       0%

Note: all columns sum to 100%.
45. This is called the Conditional Probability of s given c, and is written P(s|c).
46. Beware! P(c|s) \neq P(s|c)

  P(c|s):
          c = w    c = t
    s = p   50%      50%
    s = g  100%       0%

  P(s|c):
          c = w    c = t
    s = p   25%     100%
    s = g   75%       0%
47. Note: 25% of 80% is 20%. That is, P(s=p|c=w) P(c=w) = P(s=p, c=w).
48. And: 50% of 40% is 20%. That is, P(c=w|s=p) P(s=p) = P(s=p, c=w).
49. And if
P(s,c) = P(s|c) P(c) = P(c|s) P(s)
then
P(s|c) = P(c|s) P(s) / P(c)   and   P(c|s) = P(s|c) P(c) / P(s)
which is called Bayes' Theorem.
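A quick numerical check of Bayes' Theorem using the bird table (an added sketch, not part of the slides):

```python
import numpy as np

# Joint P(s,c): rows = species (pigeon, gull), columns = color (white, tan)
P_sc = np.array([[0.20, 0.20],
                 [0.60, 0.00]])

P_s = P_sc.sum(axis=1)                    # [0.4, 0.6]
P_c = P_sc.sum(axis=0)                    # [0.8, 0.2]

P_c_given_s = P_sc / P_s[:, None]         # rows sum to 1
P_s_given_c = P_sc / P_c[None, :]         # columns sum to 1

# Bayes: P(s|c) = P(c|s) P(s) / P(c)
bayes = P_c_given_s * P_s[:, None] / P_c[None, :]
print(np.allclose(bayes, P_s_given_c))    # True
print(P_s_given_c[0, 0])                  # 0.25: 25% of white birds are pigeons
```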
50. Why Bayes' Theorem is important
Consider the problem of fitting a straight line to data, d, where the intercept and slope are given by the vector m. If we guess m and use it to predict d, we are doing something like P(d|m). But if we observe d and use it to estimate m, then we are doing something like P(m|d). Bayes' Theorem provides a framework for relating what we do to get P(d|m) to what we do to get P(m|d).
51. Expectation, Variance, and Covariance of a multivariate distribution
52. The expected values of x1 and x2 are calculated in a fashion analogous to the one-variable case:
E(x1) = \int\int x1 p(x1,x2) dx1 dx2
E(x2) = \int\int x2 p(x1,x2) dx1 dx2
Note:
E(x1) = \int\int x1 p(x1,x2) dx1 dx2 = \int x1 [ \int p(x1,x2) dx2 ] dx1 = \int x1 p(x1) dx1
so the formula really is just the expectation of a one-variable distribution.
53. The variances of x1 and x2 are calculated in a fashion analogous to the one-variable case, too:
\sigma_{x1}^2 = \int\int (x1 - \bar{x}_1)^2 p(x1,x2) dx1 dx2, with \bar{x}_1 = E(x1), and similarly for \sigma_{x2}^2.
Note, once again:
\sigma_{x1}^2 = \int\int (x1 - \bar{x}_1)^2 p(x1,x2) dx1 dx2 = \int (x1 - \bar{x}_1)^2 [ \int p(x1,x2) dx2 ] dx1 = \int (x1 - \bar{x}_1)^2 p(x1) dx1
so the formula really is just the variance of a one-variable distribution.
54. Note that in this distribution, if x1 is bigger than \bar{x}_1 then x2 tends to be bigger than \bar{x}_2, and if x1 is smaller than \bar{x}_1 then x2 tends to be smaller than \bar{x}_2.
This is a positive correlation.
[Figure: contours of p(x1,x2) elongated along a positive slope, with the expected value (\bar{x}_1, \bar{x}_2) marked.]
55. Conversely, in this distribution, if x1 is bigger than \bar{x}_1 then x2 tends to be smaller than \bar{x}_2, and if x1 is smaller than \bar{x}_1 then x2 tends to be bigger than \bar{x}_2.
This is a negative correlation.
[Figure: contours of p(x1,x2) elongated along a negative slope, with the expected value (\bar{x}_1, \bar{x}_2) marked.]
56. This correlation can be quantified by multiplying the distribution by a four-quadrant function and then integrating.
[Figure: the (x1, x2) plane divided into four quadrants about (\bar{x}_1, \bar{x}_2), with alternating + and - signs.]
The function (x1 - \bar{x}_1)(x2 - \bar{x}_2) works fine:
cov(x1,x2) = \int\int (x1 - \bar{x}_1)(x2 - \bar{x}_2) p(x1,x2) dx1 dx2
This is called the covariance.
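An illustrative numerical check (an added sketch): sample a positively correlated pair and estimate the covariance straight from the definition, with expectations replaced by sample means.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative positively correlated pair: x2 = x1 + independent noise
x1 = rng.normal(0.0, 1.0, 1_000_000)
x2 = x1 + rng.normal(0.0, 0.5, 1_000_000)

# Covariance from the definition, using sample means in place of expectations
cov = np.mean((x1 - x1.mean()) * (x2 - x2.mean()))

print(cov)                         # ~1.0: positive correlation
print(np.cov(x1, x2)[0, 1])        # numpy's built-in estimate, for comparison
```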
57. Note that the vector \bar{x} with elements
\bar{x}_i = E(x_i) = \int\int x_i p(x1,x2) dx1 dx2
is the expectation of x, and the matrix C_x with elements
[C_x]_{ij} = \int\int (x_i - \bar{x}_i)(x_j - \bar{x}_j) p(x1,x2) dx1 dx2
has diagonal elements equal to the variance of x_i, [C_x]_{ii} = \sigma_{x_i}^2, and off-diagonal elements equal to the covariance of x_i and x_j, [C_x]_{ij} = cov(x_i, x_j).
58. Center of a multivariate distribution: \bar{x}. Width and correlatedness of a multivariate distribution: C_x. Together they summarize a lot, but not everything, about a multivariate distribution.
59. Functions of a set of random variables, x
A set of N random variables collected in a vector, x.
60. Given y(x), do you remember how to transform the integral?
\int \cdots \int p(x) d^N x = \int \cdots \int \; ? \; d^N y
61. Given y(x), then
\int \cdots \int p(x) d^N x = \int \cdots \int p(x(y)) |dx/dy| d^N y
where |dx/dy| is the Jacobian determinant, that is, the determinant of the matrix J whose elements are J_{ij} = dx_i/dy_j.
62. But here's something that's EASIER.
- Suppose y(x) is a linear function, y = Mx
- Then we can easily calculate the expectation of y:
  \bar{y}_i = E(y_i) = \int \cdots \int y_i p(x_1 \ldots x_N) dx_1 \ldots dx_N
            = \int \cdots \int \sum_j M_{ij} x_j \, p(x_1 \ldots x_N) dx_1 \ldots dx_N
            = \sum_j M_{ij} \int \cdots \int x_j \, p(x_1 \ldots x_N) dx_1 \ldots dx_N
            = \sum_j M_{ij} E(x_j) = \sum_j M_{ij} \bar{x}_j
So \bar{y} = M\bar{x}
63. And we can easily calculate the covariance:
  [C_y]_{ij} = \int \cdots \int (y_i - \bar{y}_i)(y_j - \bar{y}_j) p(x_1 \ldots x_N) dx_1 \ldots dx_N
             = \int \cdots \int \sum_p M_{ip}(x_p - \bar{x}_p) \sum_q M_{jq}(x_q - \bar{x}_q) p(x_1 \ldots x_N) dx_1 \ldots dx_N
             = \sum_p M_{ip} \sum_q M_{jq} \int \cdots \int (x_p - \bar{x}_p)(x_q - \bar{x}_q) p(x_1 \ldots x_N) dx_1 \ldots dx_N
             = \sum_p M_{ip} \sum_q M_{jq} [C_x]_{pq}
So C_y = M C_x M^T
Memorize!
64. Note that these rules work regardless of the distribution of x. If y is linearly related to x, y = Mx, then
\bar{y} = M\bar{x}   (rule for means)
C_y = M C_x M^T   (rule for propagating error)
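A closing sketch (added, not from the slides) that checks both rules by Monte Carlo; the mean, covariance, and matrix M below are arbitrary illustrative choices, and a deliberately non-Gaussian x is used to show that the rules do not depend on the distribution.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative (assumed) mean, covariance, and linear map
xbar = np.array([1.0, 2.0])
Cx = np.array([[1.0, 0.3],
               [0.3, 0.5]])
M = np.array([[2.0, -1.0],
              [0.5,  1.0]])

# Rules from slide 64
ybar_rule = M @ xbar
Cy_rule = M @ Cx @ M.T

# Monte Carlo check with a NON-Gaussian x:
# uniform u with zero mean and unit covariance, colored by the Cholesky factor of Cx
N = 500_000
u = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(N, 2))   # variance 1 per component
L = np.linalg.cholesky(Cx)
x = xbar + u @ L.T                                           # mean xbar, covariance Cx
y = x @ M.T                                                  # y = M x for each sample

print(ybar_rule, y.mean(axis=0))                   # rule vs sample mean
print(Cy_rule, np.cov(y, rowvar=False), sep="\n")  # rule vs sample covariance
```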