Title: Modelling Variability
1. Unit 7
2. Generalising from data
Weight gain of a litter of 40 pigs in 20 days
- A farmer is interested in the specific pigs
- A researcher wants to generalise: what would you expect from other pigs on the same diet?
3. Generalising from data
Weights of 144 carrots
- No interest in the specific carrots
- A researcher wants to generalise: what is the distribution of weights for this type of carrot in general?
4. Randomness of data
- Repeat data collection
- Different data
- But similar data
- What do we mean by a generalisation?
- Any data set gives information about it
5. Model for randomness of data
- Data = sample from an underlying population
- Population may be real
- Herd sizes from a sample of 20 dairy farms in a region
- Popn is the herd sizes of all dairy farms in the region
- Usually the population is hypothetical
- What we would get if an infinite data set were collected
6. Describing population
7. The Normal Curve
- Many data sets are fairly symmetric, tailing off in the same way on both tails.
- Pig weight gains
- Skewed data become more symmetric if averages or totals are examined.
[Figure panels: single carrots; bunches of 4 carrots]
8. Examples with normal model
9. Normal curves (distributions)
- Defined by two numbers (parameters)
- Mean (centre of distribution), μ
- Standard deviation (spread of distn), σ
- Shape: all normal distns look as follows
[Figure: normal curve, labelled in terms of μ and σ]
10. 70-95-100 rule for normal distn
[Figure: normal curve, labelled in terms of μ and σ]
- Exact value (to 3 decimal places), then easy approximation
- P(X within σ of μ) = 0.683, approx. 70%
- P(X within 2σ of μ) = 0.954, approx. 95%
- P(X within 3σ of μ) = 0.997, approx. 100%
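The exact figures in this rule can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist`; the helper name `prob_within` is made up for illustration.

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """P(X lies within k standard deviations of the mean) for normal(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    # Area to the left of the upper end minus area to the left of the lower end.
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# k = 1, 2, 3 standard deviations either side of the mean.
exact = {k: prob_within(k) for k in (1, 2, 3)}
for k, p in exact.items():
    print(f"within {k} st devn(s) of the mean: {p:.3f}")
```

The answer does not depend on the particular mean and standard deviation passed in, which is why one rule serves every normal distn.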
11. 70-95-100 rule-of-thumb
Any symmetric bell-shaped distn
- 70-95-100 rule-of-thumb
- P(X within s of x̄) is approx. 70%
- P(X within 2s of x̄) is approx. 95%
- P(X within 3s of x̄) is approx. 100%
Pigs: s = 5.09
12. Best-fitting normal distribution
- Best estimate of μ is the sample mean
- x̄ = (sum of the values) / n
- Best estimate of σ is the sample st devn
- s = sqrt( sum of (x - x̄)² / (n - 1) )
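As an illustration, the two estimates can be computed for the 40 pig weight gains listed on the data slide of this unit. This is a sketch only, assuming Python 3.8 or later; `NormalDist.from_samples` is the standard-library helper that builds a normal distn from the sample mean and the sample st devn (divisor n - 1), and the variable names are made up.

```python
from statistics import NormalDist, mean, stdev

# Weight gains of the 40 pigs, copied from the data slide in this unit.
gains = [
    14.7, 23.6, 15.1, 12.9, 17.4, 25.4, 5.3, 10.7, 17.4, 16.0,
    14.2, 13.8, 4.9, 13.4, 8.5, 10.7, 23.6, 19.6, 8.5, 13.4,
    17.4, 15.1, 14.7, 14.7, 14.7, 17.4, 16.0, 14.2, 14.2, 13.4,
    7.6, 9.8, 8.9, 8.5, 23.6, 9.3, 23.6, 11.1, 17.8, 9.3,
]

x_bar = mean(gains)  # sample mean: estimate of the population mean
s = stdev(gains)     # sample st devn (divisor n - 1): estimate of the population st devn

# Best-fitting normal distn: same two estimates, wrapped in one object.
fitted = NormalDist.from_samples(gains)

print(f"sample mean = {x_bar:.2f}, sample st devn = {s:.2f}")
```

The printed values can be compared with the mean and s quoted on the pig weight gain slides.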
13. Probabilities for normal distn
- Normal curve is an idealised (smooth) histogram
- Proportion of values (probability) equals area
- How likely is it that we will get a result between 25 and 28?
- 1/4 of area
- About 1 time in 4
- Probability 0.25
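The idea that probability equals area under the curve can be checked by adding up thin strips. The sketch below uses the standard normal purely as an illustration (the distn behind the 25 to 28 question is not given here); the function name `area_under_curve` and the number of strips are assumptions.

```python
from statistics import NormalDist

def area_under_curve(dist, a, b, steps=10_000):
    """Trapezium-rule area under the density curve of `dist` between a and b."""
    width = (b - a) / steps
    heights = [dist.pdf(a + i * width) for i in range(steps + 1)]
    # End strips count half; every interior height is shared by two strips.
    return width * (sum(heights) - 0.5 * (heights[0] + heights[-1]))

z = NormalDist()  # standard normal, chosen only for illustration

area = area_under_curve(z, -1.0, 1.0)  # area measured strip by strip
prob = z.cdf(1.0) - z.cdf(-1.0)        # probability from the cumulative distn

print(f"area = {area:.4f}, probability = {prob:.4f}")
```

The two numbers agree, which is what lets a rough sketch of the curve stand in for a probability calculation.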
14. Finding normal probabilities
- By eye
- Draw rough normal curve and guess area
- Exact
- Find z-scores for ends of interval (number of standard deviations from the mean)
- Look up areas to left of z-scores (tables)
- Add or subtract areas to give answer
15. Using Standard Normal Tables
- Area (probability) of lower value than z for normal(μ = 0, σ = 1) distribution
- Find probability of:
- z < 1.30
- z < -0.75
- z > 1.23
- z > -1.68
- Find the z value if the area is 0.6734
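Software can stand in for the printed table. The following is a minimal sketch, assuming Python 3.8 or later: `NormalDist().cdf(z)` plays the role of the tabulated area to the left of z, and `inv_cdf` reads the table backwards. The variable names are made up.

```python
from statistics import NormalDist

z = NormalDist()  # normal(mu = 0, sigma = 1)

# "Less than" questions: read the area to the left directly.
p_lt_130 = z.cdf(1.30)
p_lt_m075 = z.cdf(-0.75)

# "Greater than" questions: total area is 1, so subtract the left area.
p_gt_123 = 1 - z.cdf(1.23)
p_gt_m168 = 1 - z.cdf(-1.68)

# Reverse lookup: which z has area 0.6734 to its left?
z_for_area = z.inv_cdf(0.6734)

for label, value in [
    ("P(z < 1.30)", p_lt_130),
    ("P(z < -0.75)", p_lt_m075),
    ("P(z > 1.23)", p_gt_123),
    ("P(z > -1.68)", p_gt_m168),
    ("z with area 0.6734 to the left", z_for_area),
]:
    print(f"{label}: {value:.4f}")
```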
16. Probabilities for Real Data
- For normal distns with μ ≠ 0 or σ ≠ 1
- z-value is number of st devns above mean
- Translate question into z-values
- z = (x - μ) / σ
- Look up z values in the table as before.
17. Examples
- Weight gains are approx normal
- Mean is 15 kg, standard deviation is 3 kg
- What proportion of animals:
- Gain less than 10 kg
- Gain less than 19 kg
- Gain more than 14 kg
- Gain more than 22 kg
- Gain between 13 and 20 kg
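These five questions can be worked through by converting each weight to a z-value and reading the area to its left. The sketch below assumes Python 3.8 or later; the mean and standard deviation are the ones stated on the slide, while the helper names `z_score` and `area_left` are made up for illustration.

```python
from statistics import NormalDist

MU, SIGMA = 15.0, 3.0      # mean and standard deviation of weight gain (kg), from the slide
std_normal = NormalDist()  # stands in for the standard normal table

def z_score(x, mu=MU, sigma=SIGMA):
    """Number of standard deviations that x lies above the mean."""
    return (x - mu) / sigma

def area_left(x):
    """Proportion of animals gaining less than x kg."""
    return std_normal.cdf(z_score(x))

answers = {
    "less than 10 kg": area_left(10),
    "less than 19 kg": area_left(19),
    "more than 14 kg": 1 - area_left(14),                   # right area = 1 - left area
    "more than 22 kg": 1 - area_left(22),
    "between 13 and 20 kg": area_left(20) - area_left(13),  # subtract the two left areas
}

for question, p in answers.items():
    print(f"Gain {question}: {p:.3f}")
```

Answers read from a printed table may differ slightly in the last digit, because tables round z to two decimal places.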
18. Solution
- Sketch distn
-
- From tables area to left is
- Probability is
19. Weight gains of 40 pigs in 20 days
14.7 23.6 15.1 12.9 17.4 25.4  5.3 10.7 17.4 16.0
14.2 13.8  4.9 13.4  8.5 10.7 23.6 19.6  8.5 13.4
17.4 15.1 14.7 14.7 14.7 17.4 16.0 14.2 14.2 13.4
 7.6  9.8  8.9  8.5 23.6  9.3 23.6 11.1 17.8  9.3
20. Pig weight gains
s = 5.09
mean - 2s   mean - s   mean    mean + s   mean + 2s
4.08        9.17       14.26   19.35      24.44
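The 70-95-100 rule-of-thumb can be checked against these pigs by counting how many gains fall within 1, 2 and 3 sample st devns of the sample mean. This is a sketch under the assumption that the 40 values are those on the data slide; the helper name `count_within` is made up.

```python
from statistics import mean, stdev

# Weight gains of the 40 pigs, copied from the data slide in this unit.
gains = [
    14.7, 23.6, 15.1, 12.9, 17.4, 25.4, 5.3, 10.7, 17.4, 16.0,
    14.2, 13.8, 4.9, 13.4, 8.5, 10.7, 23.6, 19.6, 8.5, 13.4,
    17.4, 15.1, 14.7, 14.7, 14.7, 17.4, 16.0, 14.2, 14.2, 13.4,
    7.6, 9.8, 8.9, 8.5, 23.6, 9.3, 23.6, 11.1, 17.8, 9.3,
]

x_bar = mean(gains)
s = stdev(gains)  # sample st devn, divisor n - 1

# Positions of mean - 2s, mean - s, mean, mean + s, mean + 2s.
ticks = [x_bar + k * s for k in (-2, -1, 0, 1, 2)]

def count_within(k):
    """Number of pigs whose gain lies within k sample st devns of the sample mean."""
    low, high = x_bar - k * s, x_bar + k * s
    return sum(low <= x <= high for x in gains)

counts = {k: count_within(k) for k in (1, 2, 3)}

print("ticks:", [round(t, 2) for t in ticks])
for k, c in counts.items():
    print(f"within {k}s of the mean: {c} of {len(gains)} ({100 * c / len(gains):.1f}%)")
```

With only 40 animals the percentages cannot match 70, 95 and 100 exactly, but they can be compared with the rule to judge whether a normal model looks reasonable.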
21. Probability and Distributions
- Random Phenomenon
- A phenomenon is called random if individual outcomes are uncertain but there is nonetheless a regular distribution in a large number of repetitions.
- Probability
- The probability of an event is its relative frequency in a great many repetitions of the random phenomenon.
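The relative-frequency definition can be seen in a small simulation. The example below is hypothetical and not from the unit: it rolls a fair die and tracks how often a six turns up as the number of repetitions grows. The function names and the seed are assumptions made for illustration.

```python
import random

def relative_frequency(event, trial, repetitions, seed=0):
    """Fraction of `repetitions` independent trials in which `event` occurs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(event(trial(rng)) for _ in range(repetitions))
    return hits / repetitions

def roll_die(rng):
    """One roll of a fair six-sided die (hypothetical random phenomenon)."""
    return rng.randint(1, 6)

def is_six(outcome):
    return outcome == 6

freqs = {n: relative_frequency(is_six, roll_die, n) for n in (10, 1_000, 100_000)}

for n, f in freqs.items():
    print(f"{n:>7} rolls: relative frequency of a six = {f:.4f}")
```

Individual rolls stay unpredictable, yet the relative frequency settles near 1/6 once the number of repetitions is large.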
22. Central Limit Theorem
- Averages or totals are closer to a normal distn than individual values
- Grass grubs: 200 cores (100 mm diameter)
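A simulation makes the effect visible. The sketch below is not based on the grass grub data: it assumes an exponential distn as a stand-in for strongly skewed individual values, then measures how the skewness of averages shrinks as more values are averaged. The skewness helper, the sample sizes and the seed are all assumptions.

```python
import random
from statistics import fmean, pstdev

def skewness(values):
    """Moment-based skewness: roughly 0 for symmetric data, positive for a long right tail."""
    m = fmean(values)
    sd = pstdev(values, mu=m)
    return fmean([((v - m) / sd) ** 3 for v in values])

rng = random.Random(1)  # fixed seed so the run is repeatable

def sample_means(group_size, how_many=20_000):
    """Averages of `group_size` skewed (exponential) values, repeated `how_many` times."""
    return [
        fmean([rng.expovariate(1.0) for _ in range(group_size)])
        for _ in range(how_many)
    ]

skew_single = skewness(sample_means(1))   # individual values
skew_of_4 = skewness(sample_means(4))     # averages of 4
skew_of_25 = skewness(sample_means(25))   # averages of 25

print(f"skewness of single values:  {skew_single:.2f}")
print(f"skewness of averages of 4:  {skew_of_4:.2f}")
print(f"skewness of averages of 25: {skew_of_25:.2f}")
```

Skewness moves towards zero as the group size grows, which is the sense in which averages are closer to a normal distn than individual values.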
23. Central Limit Theorem
- Agricultural data are often totals from many individuals, so a normal distribution is often a reasonable model.
- We often summarise data with means
- Normal distribution for sample means