Title: Computing Fundamentals 2 Lecture 7 Statistics
1Computing Fundamentals 2Lecture 7 Statistics
- Lecturer Patrick Browne
- http://www.comp.dit.ie/pbrowne/
- Room K408
2Statistics
- Raw data are just lists of facts and numbers. The
branch of mathematics that organizes, analyzes
and interprets raw data is called statistics.
3Permutations and Combinations
- $P(n,r) = \frac{n!}{(n-r)!}$
- Permutations of a, b, and c taken 2 at a time:
$P(3,2) = \frac{3 \cdot 2}{1} = 6$ sequences: <ab>, <ba>, <ac>, <ca>, <bc>, <cb>
- $C(n,r) = \frac{n!}{r!\,(n-r)!}$
- Combinations of a, b, and c taken 2 at a time:
$C(3,2) = \frac{3 \cdot 2}{2 \cdot 1} = 3$, the set {ab, ac, bc}
- ab is the same combination as ba, but they are
distinct permutations.
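The counts above can be checked by enumeration. This is a minimal sketch using Python's standard library; the variable names are illustrative only, and `math.perm` / `math.comb` assume Python 3.8 or later.

```python
from itertools import combinations, permutations
from math import comb, perm  # available from Python 3.8

items = ["a", "b", "c"]

# Ordered selections: <ab> and <ba> are counted separately.
perms = ["".join(p) for p in permutations(items, 2)]

# Unordered selections: ab and ba are the same combination.
combs = ["".join(c) for c in combinations(items, 2)]

print(perms, perm(3, 2))
print(combs, comb(3, 2))
```

Listing the selections and counting them with the closed formulas should agree: six permutations and three combinations.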
4Probability Calculations
- Conditional probability
- $P(A \mid E) = \frac{P(A \cap E)}{P(E)}$
- Test for independence
- $P(A \cap B) = P(A)\,P(B)$
- Calculation of union
- $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
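As a small illustration of the three calculations, the sketch below uses one roll of a fair die. The events A (even number) and E (greater than 3) are hypothetical choices made for this example, not part of the lecture.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # one roll of a fair die (illustrative sample space)
A = {2, 4, 6}            # hypothetical event: an even number
E = {4, 5, 6}            # hypothetical event: a number greater than 3

def prob(event):
    """Probability of an event in the equiprobable space S."""
    return Fraction(len(event & S), len(S))

p_A_given_E = prob(A & E) / prob(E)            # conditional probability
independent = prob(A & E) == prob(A) * prob(E)  # independence test
p_union = prob(A) + prob(E) - prob(A & E)       # union rule

print(p_A_given_E, independent, p_union)
```

Here P(A ∩ E) = 1/3 while P(A)P(E) = 1/4, so these two events are not independent.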
5Frequency Table
- One way of organizing raw data is to use a
frequency table (or frequency distribution),
which shows the number of times that an
individual item occurs or the number of items
that fall within a given range or interval.
6Frequency Distribution
- Suppose that a sample consists of the heights of
100 male students at XYZ University. We arrange
the data into classes or categories and determine
the number of individuals belonging to each
class, called the class frequency. The resulting
table is called a frequency distribution or
frequency table
7Frequency Distribution
- The first class or category, for example,
consists of heights from 60 to 62 inches,
indicated by 60-62, which is called a class
interval. Since 5 students have heights belonging
to this class, the corresponding class frequency
is 5. Since a height that is recorded as 60
inches is actually between 59.5 and 60.5 inches
while one recorded as 62 inches is actually
between 61.5 and 62.5 inches, we could just as
well have recorded the class interval as
59.5-62.5. In the class interval 59.5-62.5, the
numbers 59.5 and 62.5 are often called class
boundaries.
8Frequency Distribution
- The midpoint of the class interval, which can be
taken as representative of the class, is called
the class mark. A graph for the frequency
distribution can be supplied by a histogram.
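The terms class interval, class frequency, class boundary and class mark can be tied together in a short sketch. The heights below are made up for illustration (they are not the XYZ University data), and the class width of 3 recorded inches is an assumption chosen to match the 60-62 interval.

```python
from collections import Counter

# Hypothetical heights in inches, invented for this example.
heights = [60, 61, 62, 63, 63, 64, 65, 66, 66, 67, 68, 68, 70, 71, 72]
width = 3      # each class covers 3 recorded values, e.g. 60-62
lowest = 60    # lower limit of the first class

def class_of(h):
    """Return the (lower, upper) class interval containing height h."""
    lower = lowest + ((h - lowest) // width) * width
    return (lower, lower + width - 1)

table = Counter(class_of(h) for h in heights)

for (lower, upper), freq in sorted(table.items()):
    boundaries = (lower - 0.5, upper + 0.5)   # class boundaries
    mark = (lower + upper) / 2                # class mark (midpoint)
    print(f"{lower}-{upper}  frequency={freq}  boundaries={boundaries}  mark={mark}")
```

Plotting the frequencies against the class marks would give the histogram.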
9Frequency table class interval
11Mean
- The arithmetic mean is the sum of the values in a
data set divided by the number of elements in
that data set.
- $\bar{x} = \frac{\sum x_i}{n}$
- For a frequency distribution,
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$, where $f$
denotes frequency.
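Both forms of the mean give the same answer when the frequency table is built from the same raw data. A minimal sketch, with illustrative data and function names that are not from the lecture:

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

def mean_from_frequencies(freq):
    """Mean of a frequency table mapping each value x_i to its frequency f_i."""
    total = sum(freq.values())
    return sum(x * f for x, f in freq.items()) / total

data = [2, 3, 3, 5, 7]              # hypothetical raw data
freq = {2: 1, 3: 2, 5: 1, 7: 1}     # the same data as a frequency table

print(mean(data), mean_from_frequencies(freq))
```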
12Variance and Standard Deviation
- List A: 12, 10, 9, 9, 10
- List B: 7, 10, 14, 11, 8
- The mean ($\bar{x}$) of both A and B is 10, but the
values of A are more closely clustered around the
mean than those in B (that is, there is greater
dispersion or spread in B). We use the standard
deviation to measure this spread:
SD(A) = 1.1, SD(B) = 2.4.
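The two quoted standard deviations can be reproduced with Python's `statistics` module. This sketch assumes the population form (dividing by n), which is the form that matches the figures 1.1 and 2.4.

```python
from statistics import mean, pstdev, pvariance

A = [12, 10, 9, 9, 10]
B = [7, 10, 14, 11, 8]

for name, data in (("A", A), ("B", B)):
    # pvariance and pstdev divide by n (population form), not n - 1.
    print(name, mean(data), pvariance(data), round(pstdev(data), 1))
```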
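13Variance and Standard Deviation
- The variance is never negative and is zero only
when all values are equal.
- $\text{variance} = \frac{\sum (x_i - \bar{x})^2}{n}$
- $\text{standard deviation} = \sqrt{\text{variance}}$
- Alternatively, $\text{variance} = \frac{\sum x_i^2}{n} - \bar{x}^2$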
14Variance of a frequency distribution
15Median
- The median is the middle value. If the elements
are sorted (positions counted from 1) the median is
- Median = valueAt((n+1)/2) for odd n
- Median = average(valueAt(n/2), valueAt(n/2 + 1)) for even n
- Example: 1,2,3,4,5, Median = 3
- Example: 1,2,3,4,5,6, Median = 3.5
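The odd and even cases can be written as one small function. This is a sketch only; note that Python lists are indexed from 0, so position (n+1)/2 counted from 1 becomes index n // 2.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # 1-based position (n+1)/2
    return (ordered[mid - 1] + ordered[mid]) / 2  # 1-based positions n/2 and n/2 + 1

print(median([1, 2, 3, 4, 5]))
print(median([1, 2, 3, 4, 5, 6]))
```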
16Mode
- The mode is the class or class value which occurs
most frequently. We can have bimodal or
multimodal collections of data.
The height of the bars is the number of cases in
the category
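A bimodal collection can be detected by counting occurrences and keeping every value that reaches the highest count. The data below is hypothetical, and `statistics.multimode` assumes Python 3.8 or later.

```python
from collections import Counter
from statistics import multimode

data = [1, 2, 2, 3, 3, 4]          # hypothetical data with two modes

counts = Counter(data)             # value -> number of occurrences (bar heights)
highest = max(counts.values())
modes = sorted(v for v, c in counts.items() if c == highest)

print(modes, multimode(data))
```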
17Bernoulli Trials
- Independent repeated trials with two outcomes are
called Bernoulli trials. The probability of k
successes in a binomial experiment is
- $P(k) = C(n,k)\, p^k (1-p)^{n-k}$
- where p is the probability of success in one
trial, n is the number of trials and (n-k) is the
number of failures.
18Bernoulli Trials Example
- John hits the target with probability p = 1/4,
- John fires 6 times, n = 6,
- What is the probability John hits the target
exactly 2 times?
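The question can be answered with the standard binomial probability for k successes in n independent trials. The sketch below uses exact fractions; the function name `binomial_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_two_hits = binomial_pmf(2, 6, Fraction(1, 4))
print(p_two_hits, float(p_two_hits))
```

With p = 1/4 and n = 6 this gives 15 · (1/4)² · (3/4)⁴ = 1215/4096, roughly 0.297.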
19Bernoulli Trials Example
- John hits the target with probability p = 1/4,
- John fires 6 times, n = 6,
- What is the probability John hits the target at
least once?
- Probability that John does not hit the target
(no successes, all failures):
$P(0) = C(6,0)\,(1/4)^0\,(3/4)^6 = (3/4)^6$.
There is only 1 way to pick 0 from 6, and anything
other than 0 to the power of 0 is 1 (0 to the
power 0 is undefined).
- Probability that John hits the target at least
once: $1 - (3/4)^6$. In Excel: 1-((3/4)^6)
20Bernoulli Trials Example
- Probability that Mary hits the target p = 1/4,
- Mary fires 6 times, n = 6,
- What is the probability Mary hits the target more
than 4 times?
- $P(5) + P(6) = 6\,(1/4)^5 (3/4)^1 + (1/4)^6$
- In Excel: 6*((1/4)^5)*((3/4)^1)+(1/4)^6
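The "at least once" and "more than 4 times" questions both reduce to sums of binomial terms. A minimal sketch with exact fractions, using an illustrative helper name:

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p, n = Fraction(1, 4), 6

# At least one hit is the complement of no hits at all.
p_at_least_once = 1 - binomial_pmf(0, n, p)

# More than 4 hits means exactly 5 or exactly 6 hits.
p_more_than_4 = binomial_pmf(5, n, p) + binomial_pmf(6, n, p)

print(float(p_at_least_once), float(p_more_than_4))
```

Using the complement avoids summing the six terms for 1 to 6 hits.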
21Random variables and probability distributions.
- Suppose you toss a coin two times. There are four
possible outcomes: HH, HT, TH, and TT. Let the
variable X represent the number of heads that
result from this experiment. The variable X can
take on the values 0, 1, or 2. In this example, X
is a random variable because its value is
determined by the outcome of a statistical
experiment.
22Random variables and probability distributions.
- A probability distribution is a table or an
equation that links each outcome of a statistical
experiment with its probability of occurrence.
The table below associates each outcome
(the number of heads) with its probability. This
is an example of a probability distribution.
Number of heads   0      1      2
Probability       0.25   0.50   0.25
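The distribution of the number of heads in two tosses can be derived by listing the four outcomes and counting. A short sketch, assuming a fair coin so that all four outcomes are equally likely:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = ["".join(t) for t in product("HT", repeat=2)]   # HH, HT, TH, TT

# X = number of heads in each outcome
heads = Counter(o.count("H") for o in outcomes)
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(heads.items())}

for x, p in distribution.items():
    print(x, p)
```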
23Random Variable
- A random variable X on a finite sample space S is
a function (or mapping) from S to the set of real
numbers R.
- Let S be the sample space of outcomes from tossing
two coins. Then mapping a is
- S = {HH, HT, TH, TT} (assume HT ≠ TH)
- $X_a(HH) = 1$, $X_a(HT) = 2$, $X_a(TH) = 3$, $X_a(TT) = 4$
- The range (image) of $X_a$ is
- {1, 2, 3, 4}
24Random Variable
- Let S be the sample space of outcomes from tossing
two coins, where we are interested in the number
of heads. Mapping b is
- S = {HH, HT, TH, TT}
- $X_b(HH) = 2$, $X_b(HT) = 1$, $X_b(TH) = 1$, $X_b(TT) = 0$
- The range (image) of $X_b$ is
- {0, 1, 2}
25Random Variable
- A random variable is a function that maps a
finite sample space into a set of numeric values.
The numeric values form a finite probability space
of real numbers, where probabilities are assigned
to the new space according to the following rule:
- $p_i = P(x_i)$ = sum of the probabilities of the
points in S whose image is $x_i$.
- Recall a function F: Domain -> Range (Image)
26Random Variable
- The function assigning $p_i$ to $x_i$ can be given as a
table called the distribution of the random
variable.
- In an equiprobable space,
$p_i = P(x_i) = \frac{\text{number of points in S whose image is } x_i}{\text{number of points in S}}$
- (i = 1, 2, 3, ..., n) gives the distribution of X
27Random Variable
- The equiprobable space generated by tossing a pair
of fair dice consists of 36 ordered pairs:
- S = {<1,1>, <1,2>, <1,3>, ..., <6,6>}
- Let X be the random variable which assigns to
each element of S the sum of the two
integers: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
28Random Variable
- Continuing with the sum of the two dice.
- There is only one point whose image is 2, giving
P(2) = 1/36.
- There are two points whose image is 3, giving
P(3) = 2/36. (<1,2> ≠ <2,1>, but their sums are equal.)
- Below is the distribution of X.
xi   2     3     4     5     6     7     8     9     10    11    12
pi   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Total 36/36
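The counting rule for an equiprobable space (points whose image is x, divided by the number of points in S) can be sketched as a general helper and applied to the two dice. The function name `distribution` is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def distribution(space, X):
    """Distribution of random variable X over an equiprobable sample space."""
    counts = Counter(X(point) for point in space)
    return {x: Fraction(c, len(space)) for x, c in sorted(counts.items())}

S = list(product(range(1, 7), repeat=2))   # the 36 ordered pairs <1,1> ... <6,6>
dist = distribution(S, sum)                # X = sum of the two dice

for x, p in dist.items():
    print(x, p)
```

Note that `Fraction` prints probabilities in lowest terms, so 2/36 appears as 1/18.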
29Example Random Variable
- A box contains 9 good items and 3 defective items
(total 12 items). Three items are selected at
random from the box. Let X be the random variable
that counts the number of defective items in a
sample. X has a range space $R_X = \{0, 1, 2, 3\}$.
- The sample space has 12-choose-3 = 220 different
samples of size 3.
- There are 9-choose-3 = 84 samples of size 3 with
0 defective items.
- There are 3 · 9-choose-2 = 108 samples of size
3 with 1 defective.
- There are 3-choose-2 · 9 = 27 samples of size 3
with 2 defective.
- There is 3-choose-3 = 1 sample of size 3 with 3
defective items.
- Where n-choose-r means the number of
combinations (in Excel, COMBIN(12,3)).
- Check: 84 + 108 + 27 + 1 = 220
30Example Random Variable
- A box contains 9 good items and 3 defective items
(total 12 items). Three items are selected at
random from the box. Let X be the random variable
that counts the number of defective items in a
sample. X can have values 0-3.
- Below is the distribution of X.
xi   0        1         2        3
pi   84/220   108/220   27/220   1/220
Check: 84 + 108 + 27 + 1 = 220, so the
probabilities total 220/220.
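The sample counts for each number of defective items follow one pattern: choose k of the 3 defective items and the remaining 3 - k from the 9 good ones. A minimal sketch with illustrative variable names:

```python
from fractions import Fraction
from math import comb

good, defective, sample_size = 9, 3, 3
total = comb(good + defective, sample_size)      # 12-choose-3

# Number of samples containing exactly k defective items.
counts = {k: comb(defective, k) * comb(good, sample_size - k)
          for k in range(sample_size + 1)}
dist = {k: Fraction(c, total) for k, c in counts.items()}

print(total, counts)
```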
31Functions of a Random Variable
- If X is a random variable then so is $Y = f(X)$.
- $P(y_k)$ = sum of the probabilities of the $x_i$ such that
$y_k = f(x_i)$
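To see the rule in action, the sketch below pushes a distribution through a function and adds up the probabilities of all x values that land on the same y. The example distribution (number of heads in two fair coin tosses) and the function f(x) = (x - 1)² are illustrative choices.

```python
from collections import defaultdict
from fractions import Fraction

def transform(dist, f):
    """Distribution of Y = f(X): P(y) is the sum of P(x) over all x with f(x) == y."""
    out = defaultdict(Fraction)       # missing keys start at Fraction(0)
    for x, p in dist.items():
        out[f(x)] += p
    return dict(out)

# X = number of heads in two tosses of a fair coin
X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
Y = transform(X, lambda x: (x - 1) ** 2)

print(Y)
```

Both x = 0 and x = 2 map to y = 1, so their probabilities are added.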
32Expectation and variance of a random variable
- Let X be a discrete random variable over sample
space S.
- X takes values $x_1, x_2, x_3, \ldots, x_t$ with respective
probabilities $p_1, p_2, p_3, \ldots, p_t$.
- An experiment which generates S is repeated n
times and the numbers $x_1, x_2, x_3, \ldots, x_t$ occur with
frequencies $f_1, f_2, f_3, \ldots, f_t$ ($\sum f_i = n$).
- If n is large then one expects
$f_i / n \approx p_i$.
33Expectation of a random variable
- So $\bar{x} = \frac{\sum f_i x_i}{n} = \sum x_i \frac{f_i}{n}$ becomes $\sum x_i p_i$.
- The final formula is the population mean,
expectation, or expected value of X, and is
denoted µ or E(X).
34Variance of a random variable
- The variance of X is denoted as $\sigma^2$ or Var(X).
- $\sigma^2 = \mathrm{Var}(X) = \sum (x_i - \mu)^2 p_i$
- The standard deviation is $\sigma = \sqrt{\mathrm{Var}(X)}$
35Expected value, Variance, Standard Deviation
- $E(X) = \mu = \mu_X = \sum x_i p_i$
- $\mathrm{Var}(X) = \sigma^2 = \sigma_X^2 = \sum (x_i - \mu)^2 p_i$
- $SD(X) = \sigma_X$
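The three definitions translate directly into code when a distribution is stored as a mapping from each value to its probability. This is a sketch under that assumption; the function names and the example distribution (number of heads in two fair coin tosses) are illustrative.

```python
from fractions import Fraction
from math import sqrt

def expectation(dist):
    """E(X): sum of x_i * p_i."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X): sum of (x_i - mu)^2 * p_i."""
    mu = expectation(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def std_dev(dist):
    """SD(X): square root of the variance."""
    return sqrt(variance(dist))

heads = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
print(expectation(heads), variance(heads), std_dev(heads))
```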
36Relation between population and sample mean.
- If we select a sample of size N at random from a
population, then it is possible to show that the
expected value of the sample mean m is the
population mean µ.
- This rule differs slightly for variance. The
expected value of the sample variance (computed
with divisor N) is (N-1)/N times the population
variance.
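Both statements can be checked exactly on a tiny case by listing every possible sample. The sketch below assumes sampling with replacement from a hypothetical population {1, 2, 3} with N = 2, and uses the divisor-N form of the variance throughout.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]    # hypothetical population, each value equally likely
N = 2                     # sample size

def mean(values):
    return Fraction(sum(values), len(values))

def variance_n(values):
    """Variance with divisor n (not n - 1)."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

samples = list(product(population, repeat=N))    # all 9 equally likely samples

expected_sample_mean = mean([mean(s) for s in samples])
expected_sample_variance = mean([variance_n(s) for s in samples])

print(expected_sample_mean, mean(population))
print(expected_sample_variance, variance_n(population))
```

The average of the sample variances comes out at half the population variance, which is (N-1)/N for N = 2.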
37Example Random Variable Expected Value
- A box contains 9 good items and 3 defective
items. Three items are selected at random from
the box. Let X be the random variable that counts
the number of defective items in a sample. X can
have values 0-3.
- Below is the distribution of X.
xi   0        1         2        3
pi   84/220   108/220   27/220   1/220
38Example Random Variable Expected Value
- µ is the expected number of defective items
in a sample of size 3.
- $\mu = E(X) = 0(84/220) + 1(108/220) + 2(27/220) + 3(1/220) = 132/220 = 0.6$
- $\mathrm{Var}(X) = 0^2(84/220) + 1^2(108/220) + 2^2(27/220) + 3^2(1/220) - \mu^2 = 225/220 - 0.36 \approx 0.66$
- $SD(X) = \sqrt{\mathrm{Var}(X)} \approx 0.81$
39Fair Game1?
- If a prime number appears on a fair die the
player wins that value. If a non-prime appears
the player loses that value. Is the game
fair (E(X) = 0)?
- S = {1, 2, 3, 4, 5, 6}
- E(X) = 2(1/6) + 3(1/6) + 5(1/6) + (-1)(1/6) + (-4)(1/6) + (-6)(1/6) = -1/6
- Note: 1 is not prime.
- Since E(X) < 0, the game is not fair; it is
unfavourable to the player.
40Fair Game2?
- A player gambles on the toss of two fair coins.
If 2 heads occur the player wins 2 Euro. If 1
head occurs he wins 1 Euro. If no heads occur he
loses 3 Euro. Is the game fair (E(X) = 0)?
- S = {HH, HT, TH, TT}
- X(HH) = 2, X(HT) = X(TH) = 1, X(TT) = -3
- E(X) = 2(1/4) + 1(2/4) - 3(1/4) = 0.25
- Since E(X) > 0, the game is not fair; it is
favourable to the player.
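Both fair-game questions come down to computing an expected payoff and comparing it with zero. A short sketch; the helper names `expected_value` and `is_prime` are illustrative.

```python
from fractions import Fraction

def expected_value(payoffs):
    """payoffs: list of (payoff, probability) pairs."""
    return sum(x * p for x, p in payoffs)

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

# Game 1: win the face value on a prime, lose it on a non-prime.
game1 = [(face if is_prime(face) else -face, Fraction(1, 6))
         for face in range(1, 7)]

# Game 2: two fair coins; payoff depends on the number of heads.
game2 = [(2, Fraction(1, 4)), (1, Fraction(2, 4)), (-3, Fraction(1, 4))]

print(expected_value(game1), expected_value(game2))
```

A game is fair only when the expected payoff is exactly zero, which neither of these is.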
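41Mean (µ), Variance (σ²), Standard Deviation (σ)
xi   2     3     11
pi   1/3   1/2   1/6
- $\mu = \sum x_i p_i = 2(1/3) + 3(1/2) + 11(1/6) = 4$
- $E(X^2) = \sum x_i^2 p_i = 2^2(1/3) + 3^2(1/2) + 11^2(1/6) = 26$
- $\sigma^2 = \mathrm{Var}(X) = E(X^2) - \mu^2 = 26 - 4^2 = 10$
- $\sigma = \sqrt{\mathrm{Var}(X)} = \sqrt{10} \approx 3.2$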
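The shortcut Var(X) = E(X²) - µ² used here agrees with the defining form Σ(xᵢ - µ)²pᵢ. A brief derivation, followed by a direct check on the same three values, is sketched below.

```latex
\begin{align*}
\sum (x_i - \mu)^2 p_i
  &= \sum x_i^2 p_i - 2\mu \sum x_i p_i + \mu^2 \sum p_i \\
  &= E(X^2) - 2\mu^2 + \mu^2 \\
  &= E(X^2) - \mu^2 \\[6pt]
(2-4)^2 \tfrac{1}{3} + (3-4)^2 \tfrac{1}{2} + (11-4)^2 \tfrac{1}{6}
  &= \tfrac{4}{3} + \tfrac{1}{2} + \tfrac{49}{6} = 10
\end{align*}
```

The derivation uses only $\sum x_i p_i = \mu$ and $\sum p_i = 1$.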
43Distribution Example(1)
- Five cards are numbered 1 to 5. Two cards are
drawn at random. Let X denote the sum of the
numbers drawn. Find (a) the distribution of X and
(b) the mean, variance, and standard deviation.
- There are C(5,2) = 10 ways of drawing two cards
at random.
44Distribution Example(2)
- Ten equiprobable sample points with their
corresponding X-values are
points   {1,2}  {1,3}  {1,4}  {1,5}  {2,3}  {2,4}  {2,5}  {3,4}  {3,5}  {4,5}
xi       3      4      5      6      5      6      7      7      8      9
45Distribution Example(3)
- Grouping equal sums gives the distribution of X.
xi   3     4     5     6     7     8     9
pi   0.1   0.1   0.2   0.2   0.2   0.1   0.1
46Distribution Example(4)
xi   3     4     5     6     7     8     9
pi   0.1   0.1   0.2   0.2   0.2   0.1   0.1
- The mean is 3(0.1) + ... + 9(0.1) = 6
- E(X²) is 3²(0.1) + ... + 9²(0.1) = 39
- The variance is 39 - 6² = 3
- The SD is sqrt(3) ≈ 1.7
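The whole card example can be reproduced by enumerating the ten draws, grouping equal sums, and applying the formulas for the mean and variance. A minimal sketch with illustrative variable names:

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import sqrt

draws = list(combinations(range(1, 6), 2))       # the C(5,2) = 10 sample points
counts = Counter(a + b for a, b in draws)        # how often each sum occurs
dist = {x: Fraction(c, len(draws)) for x, c in sorted(counts.items())}

mean = sum(x * p for x, p in dist.items())
e_x2 = sum(x * x * p for x, p in dist.items())
var = e_x2 - mean ** 2
sd = sqrt(var)

print(dist)
print(mean, e_x2, var, round(sd, 1))
```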