Title: Assignments
1Assignments
- Assignment 1 handed out October 27, due November 3 in class
- Assignment 2 handed out November 22, due November 29 in class
2Statistics
- The science of collecting, analyzing, presenting and interpreting data
3- Descriptive statistics are tabular, graphical and numerical summaries of data. The purpose of descriptive statistics is to facilitate the presentation and interpretation of data.
- Inferential statistics: inference, in statistics, is the process of drawing conclusions about a particular parameter of a statistical distribution.
4Characteristics of a statistical problem
- Associated with the problem is a large group of objects about which inferences are to be made. This group of objects is the population.
- There is at least one random variable whose behavior is to be studied relative to the population.
- The population is too large to study in its entirety (or the techniques used in the study are destructive in nature). Conclusions about the population must be based on observing only a portion, or sample, of objects drawn from the population.
5- State research question
- Formulate null and alternative hypotheses
- Identify population variable and, when possible, its distribution
- Sample data according to chosen sampling procedure
- Determine appropriate test statistic
- Calculate appropriate test statistic
- A) Determine critical values for sampling distribution and appropriate level of significance
- B) Determine P value of the test statistic
- Compare the test statistic to critical values
- Reject or accept null hypothesis
- State conclusion and answer the question in step 1
6Guidelines for hypothesis testing
- When testing a hypothesis concerning the value of some parameter, the statement of equality will always be included in H0. In this way H0 pinpoints a specific numerical value that could be the actual value of the parameter.
- Whatever is to be detected or supported is the alternative hypothesis (H1).
- Since our research hypothesis is H1, it is hoped that the evidence leads us to reject H0 and thereby accept H1.
7Random sample
- A random sample of size n from the distribution of the random variable X is a collection of n independent random variables, each with the same distribution as X
- A random sample is a sample of size n drawn from a population of size N in such a way that every possible sample of size n has the same probability of being drawn
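The second definition can be sketched in a few lines of Python. This is only an illustration: the population of 500 numbered trees and the function name `simple_random_sample` are assumptions made for the example, not part of the lecture.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items so that every possible sample of size n
    has the same probability of being drawn (sampling without replacement)."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: trees numbered 1..500 (N = 500)
trees = list(range(1, 501))
sample = simple_random_sample(trees, 100, seed=1)

# 100 items were drawn and no tree appears twice
print(len(sample), len(set(sample)))
```

Note that the sampling needs a complete list of the population to draw from, which is exactly what is hard to obtain in a forest.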
8Random Sample?
- Question: Do green and red birds of the same species occur in the same frequency?
- Sample red and green birds in a forest
- Question: What is the size distribution of sugar maple in the same forest?
- Sample 100 individuals
9One sample hypothesis
- For example
- We have a population and we assume that it has a normal distribution
- We want to know if the population mean is smaller or larger than a specific, selected value
10Normal distribution
The location on the X-axis depends on the population mean. The shape of the distribution depends on the population variance. These are the two parameters of the normal distribution.
11- We estimate the mean $\mu$ of the population from which we have drawn our sample with the sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
- We estimate the variance $\sigma^2$ of the population from which we have drawn our sample with the sample variance $s^2$
12How good are these estimators?
- An unbiased estimator is centered around the right spot: the value it is supposed to estimate
- Biased estimator
13Unbiased estimator of population variance
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
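As a small sketch, the sample mean and the unbiased sample variance (divisor n - 1) can be computed by hand and checked against Python's standard library. The four data values are made up for the illustration.

```python
import statistics

def sample_mean(xs):
    """Sample mean: sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Unbiased sample variance: sum of squared deviations divided by n - 1."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

data = [2.0, 4.0, 4.0, 6.0]  # made-up observations

print(sample_mean(data))          # 4.0
print(sample_variance(data))      # 8/3, about 2.667
print(statistics.variance(data))  # the library also divides by n - 1
```

Dividing by n instead of n - 1 would give 2.0 here, which on average underestimates the population variance.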
14Importance of sample size
- If we take many samples of size n from a population which is normally distributed with variance $\sigma^2$, then the means of these samples are normally distributed with variance $\sigma^2 / n$
15Standard error of sample mean (mean standard error)
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$
This is estimated with $s_{\bar{x}} = s / \sqrt{n}$
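A quick simulation can make the role of sample size concrete. The sketch below draws many samples of size n from a normal population and compares the variance of the sample means with $\sigma^2 / n$. The population values (mean 50, standard deviation 10), the sample size and the number of samples are arbitrary choices for the illustration.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, n_samples, seed=0):
    """Return the means of n_samples random samples of size n
    drawn from a normal population with mean mu and s.d. sigma."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(n_samples)
    ]

mu, sigma, n = 50.0, 10.0, 25  # assumed population and sample size
means = simulate_sample_means(mu, sigma, n, n_samples=5000)

observed = statistics.variance(means)  # variance of the simulated sample means
theoretical = sigma ** 2 / n           # sigma^2 / n = 4.0
print(observed, theoretical)
```

The observed variance should land close to 4.0, so the standard error of the mean is about 2.0 rather than the population standard deviation of 10.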
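16Under the assumption that the stated null hypothesis is true, the t-ratio
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
(where $\mu_0$ is the value of the population mean stated in H0) follows a t-distribution with n-1 degrees of freedom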
17One sample hypothesis test
- Compute the t-ratio.
- Under the assumption that the null hypothesis is true, what is the probability of obtaining this t-ratio or a more extreme value of the t-ratio?
- If this probability is high: do not reject H0
- If this probability is low: reject H0
18What is considered a high versus a low
probability?
- YOU DECIDE!
- Conventionally, a probability of 0.05 or less is considered sufficiently low for the null hypothesis to be rejected
19Level of significance
- Is under our control and is usually chosen to be 0.05, 0.01, or 0.001
- Rejecting an H0 at 0.05, the result is significant (*)
- Rejecting an H0 at 0.01, the result is highly significant (**)
- Rejecting an H0 at 0.001, the result is very highly significant (***)
20- We reject the null hypothesis
- For a two-tailed test: if $|t| \geq t_{\alpha/2,\,n-1}$
- For a one-tailed test: if $t \geq t_{\alpha,\,n-1}$ or $t \leq -t_{\alpha,\,n-1}$, depending on the null hypothesis
22Marine arthropods
- A species of marine arthropod lives in seawater that contains calcium in a concentration of 32 mmole/kg.
- Question: Do members of this species maintain a calcium concentration in their coelomic fluid (extracellular body fluid) that is less than that of their environment?
23Coelomic fluid
- assists respiration and circulation by diffusing nutrients, and excretion by accumulating wastes
- functions in place of several organ systems in higher animals such as mammals
- protects internal organs and also serves as a hydrostatic skeleton
- (Just in case you did not know.)
24- Hypothesis
- H0: The calcium concentration of the arthropod is the same as or higher than that of the seawater
- H1: The calcium concentration of the arthropod is lower than that of the seawater
25This is the same as
H0: $\mu \geq 32$
H1: $\mu < 32$
(Remember that the seawater has a concentration of 32)
26- Thirteen animals are randomly sampled and the calcium concentration in their coelomic fluid (extracellular body fluid) is measured
- 28 27 29 29 30 30 31 30 33 27 30 32 31
27Marine arthropod example
28Calculate t-ratio for experiment
n - 1 = 13 - 1 = 12 d.f.
- If we look in the t-table we find that
- What is the conclusion?
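The calculation for the thirteen measurements can be sketched as follows. The data and the hypothesized value of 32 come from the slides. The level of significance (0.05) is an assumption, and the critical value 1.782 is the standard tabulated one-tailed value for 12 degrees of freedom at that level.

```python
import math
import statistics

# Calcium concentrations in the coelomic fluid of the 13 sampled animals
calcium = [28, 27, 29, 29, 30, 30, 31, 30, 33, 27, 30, 32, 31]
mu0 = 32.0  # calcium concentration of the seawater (value stated in H0)

n = len(calcium)
mean = statistics.fmean(calcium)   # sample mean
s = statistics.stdev(calcium)      # sample standard deviation (divisor n - 1)
se = s / math.sqrt(n)              # estimated standard error of the mean
t_ratio = (mean - mu0) / se
df = n - 1

# Assumed level of significance 0.05, one-tailed, 12 d.f. (t-table value)
t_crit = 1.782
reject_h0 = t_ratio <= -t_crit     # H1 says the mean is LOWER than 32

print(f"mean = {mean:.2f}, s = {s:.2f}, t = {t_ratio:.2f}, d.f. = {df}")
print("reject H0" if reject_h0 else "do not reject H0")
```

With these data the t-ratio is about -4.50, far below -1.782, so under the assumed significance level H0 is rejected: the coelomic fluid has a lower calcium concentration than the seawater.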
29- Because the predictions in H0 and H1 are written so that they are mutually exclusive and all-inclusive, we have a situation where one is true and the other is false
- 1. When H0 is true, then H1 is false
- -If we accept H0, we have done the right thing
- -If we reject H0, we have made a mistake
- This type of mistake is called Type I error
30Type I error
- Probability of rejecting a true null hypothesis
- Probability of making a type I error = $\alpha$
- It is the same as the level of significance
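A simulation can illustrate why the Type I error probability equals the level of significance. In the sketch below H0 is true by construction (every sample is drawn from a normal population whose mean equals the hypothesized value), and we count how often a two-tailed t-test rejects it anyway. The population values, the sample size of 13 and the number of trials are assumptions. The value 2.179 is the tabulated two-tailed critical value for 12 degrees of freedom at 0.05.

```python
import math
import random
import statistics

def type_i_error_rate(mu0, sigma, n, t_crit, trials, seed=0):
    """Fraction of samples for which a TRUE null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        xs = [rng.gauss(mu0, sigma) for _ in range(n)]  # H0 is true here
        se = statistics.stdev(xs) / math.sqrt(n)
        t = (statistics.fmean(xs) - mu0) / se
        if abs(t) >= t_crit:  # two-tailed rejection rule
            rejections += 1
    return rejections / trials

rate = type_i_error_rate(mu0=32.0, sigma=2.0, n=13, t_crit=2.179, trials=5000)
print(rate)  # should be close to the chosen level of significance, 0.05
```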
31- 2. When H0 is false, then H1 is true
- -If we accept H0, we have made a mistake
- -If we reject H0, we have done the right thing
- This type of mistake is called Type II error
32Type II error
- Probability of not rejecting a false null hypothesis
- Probability of making a type II error = $\beta$
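33Statistical power
- Probability of rejecting a false null hypothesis = $1 - \beta$, where $\beta$ is the probability of making a type II error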
34Statistical power
- Increases with increasing sample size
- Increases with effect size
- Increases with increasing level of significance $\alpha$
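The three statements can be checked numerically. The sketch below uses a simplification that is not in the lecture: a one-tailed z-test with known population standard deviation, for which the power has a closed form. The same qualitative behavior holds for the t-test. The function name and all numbers are assumptions for the illustration.

```python
import math
from statistics import NormalDist

def power_one_tailed_z(effect, sigma, n, alpha):
    """Power of a one-tailed z-test (sigma known) to detect a true
    difference `effect` between the population mean and the H0 value."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # critical value of the test
    return NormalDist().cdf(effect * math.sqrt(n) / sigma - z_crit)

base = power_one_tailed_z(effect=1.0, sigma=2.0, n=10, alpha=0.05)
print(base)
print(power_one_tailed_z(1.0, 2.0, 30, 0.05))  # larger sample size
print(power_one_tailed_z(2.0, 2.0, 10, 0.05))  # larger effect size
print(power_one_tailed_z(1.0, 2.0, 10, 0.10))  # larger alpha
```

Each of the last three values is larger than the first. With an effect of zero the function returns alpha itself, which is the Type I error probability.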
35Fast rotation energy forest
36Basket willow example
- Is waste water influencing the harvest yield for a specific variety (clone) of basket willow?
- We choose to measure harvest yield in the form of plant height.
- Is this a good indicator of harvest yield?
37Assumptions
- Assume that the height of untreated plants has a normal distribution with population mean $\mu_1$
- Assume that the height of treated plants has a normal distribution with population mean $\mu_2$
- Equal variances: $\sigma_1^2 = \sigma_2^2$
38We set up the hypothesis
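- H0: There is no difference between $\mu_1$ and $\mu_2$ (the population means of untreated and treated plants)
- H1: There is a difference between $\mu_1$ and $\mu_2$
- this is the same as
- H0: $\mu_1 = \mu_2$
- H1: $\mu_1 \neq \mu_2$
- which is the same as
- H0: $\mu_1 - \mu_2 = 0$
- H1: $\mu_1 - \mu_2 \neq 0$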
39We obtain two random samples, one from each population
40We obtain two random samples, one from each population
- We estimate $\mu_1$ and $\mu_2$ with $\bar{x}_1$ and $\bar{x}_2$, respectively.
- And we estimate $\sigma_1^2$ and $\sigma_2^2$ with $s_1^2$ and $s_2^2$, respectively.
41Two sample hypothesis test
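- Remember that for the one sample hypothesis we used $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
- Here our estimate is $\bar{x}_1 - \bar{x}_2$
- So that $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$, where $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ is the pooled variance (see my notes)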
42The t-ratio has a t distribution with $n_1 + n_2 - 2$ degrees of freedom under the null hypothesis. We reject the null hypothesis if $|t| \geq t_{\alpha/2,\,n_1+n_2-2}$
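A minimal sketch of the two-sample t-ratio with pooled variance, assuming equal population variances. The plant heights below are invented purely to show the arithmetic; they are not the energy forest data. The function name `pooled_t` is likewise an assumption.

```python
import math
import statistics

def pooled_t(x1, x2):
    """Two-sample t-ratio with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(x1), len(x2)
    s1_sq = statistics.variance(x1)  # sample variances (divisor n - 1)
    s2_sq = statistics.variance(x2)
    # Pooled variance: the two sample variances weighted by their d.f.
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    diff = statistics.fmean(x1) - statistics.fmean(x2)
    t = diff / math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Made-up heights (cm), NOT data from the lecture
untreated = [150.0, 160.0, 170.0]
treated = [165.0, 175.0, 185.0]

t, df = pooled_t(untreated, treated)
print(f"t = {t:.3f} with {df} d.f.")
```

For these invented numbers both sample variances are 100, so the pooled variance is also 100 and t is about -1.837 with 4 degrees of freedom. The absolute value is then compared with the two-tailed critical value from the t-table.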
43Keep track of the degrees of freedom
- The t-distribution is more spread out than the normal distribution. In fact, the fewer the degrees of freedom, the more spread out the t-distribution is.
44t-distribution
45Violations of the two-sample t test assumptions
- The two sample t-test assumes that the two populations are normally distributed and have equal variances!
- However, experience has shown that this test is rather robust (has high power) even when these assumptions are not met.
46Statistical power in two sample hypothesis
testing
- The power improves with increasing sample size
- Also, for a given number of data ($n_1 + n_2$), maximum power is obtained if the sample sizes are equal ($n_1 = n_2$)
- If the sample variances are unequal the Type I error will tend to be greater than $\alpha$
47Assessing departures from normality
- Graphical assessment of normality
- Check for outliers
- Frequency curve should look normal
- Cumulative frequency curve should be S-shaped
- We will come back to this in a later lecture
48Testing for homoscedasticity (homogeneity among
variances)
- The variance ratio test can be used, but remember that this test is severely and adversely affected by non-normal populations!
- However, understanding this test makes it a bit easier to understand the logic behind ANOVA (coming lectures)
49Energy forest example
- Question: Do treated and untreated plants have the same variance for plant height?
- H0: $\sigma_1^2 = \sigma_2^2$
- H1: $\sigma_1^2 \neq \sigma_2^2$
50Variance ratio tests
- Take the larger of the two sample variances and divide it by the smaller, e.g. $F = s_1^2 / s_2^2$ if $s_1^2 > s_2^2$
- If the two samples come from normal populations with equal variances, this ratio is F distributed with $n_1 - 1$ and $n_2 - 1$ degrees of freedom
51The shape of the F-distribution depends on the
degrees of freedom
52Variance ratio test
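- So reject the null hypothesis of equal variances if the F ratio is at least the critical value of the F-distribution for the chosen level of significance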
53Manipulation of tuber size distribution in
Solanum tuberosum L
- Breeding goal: reduce the size variability
- Let X1 be the tuber size in the year 1954
- Let X2 be the tuber size in the year 2004 (after 50 generations of breeding)
54Potato example
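- Has 50 generations of breeding led to a reduced variability in tuber size?
- H0: $\sigma_1^2 \leq \sigma_2^2$ (where $\sigma_1^2$ and $\sigma_2^2$ are the population variances of X1 and X2)
- H1: $\sigma_1^2 > \sigma_2^2$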
55For a specific potato variety in 1954 a random sample of 30 potatoes had a sample variance of 1367. A random sample of 30 potatoes of the same variety in the year 2004 (after 50 years of extensive breeding) had a sample variance of 985.
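56- We use our numbers to calculate the F ratio: $F = s_1^2 / s_2^2 = 1367 / 985 \approx 1.39$
- Under the null hypothesis this ratio is F-distributed with 29 and 29 degrees of freedom, so it is compared with the critical value of that distribution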
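The arithmetic for the potato example can be sketched as below. The sample variances and sample sizes are those given on the slides; the helper name `variance_ratio` is an assumption. The critical value is deliberately left to the F-table, since the slides' chosen level of significance is not stated here.

```python
def variance_ratio(var_a, n_a, var_b, n_b):
    """Return (F, numerator d.f., denominator d.f.),
    with the larger sample variance in the numerator."""
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1

# 1954 sample: variance 1367, n = 30; 2004 sample: variance 985, n = 30
f_ratio, df_num, df_den = variance_ratio(1367, 30, 985, 30)
print(round(f_ratio, 3), df_num, df_den)  # 1.388 29 29
```

Because the 1954 variance is the larger one, this ratio is also the one-tailed statistic $s_1^2 / s_2^2$ that matches H1 for the potato question.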
57- So what do we do when we are unable to tell if the two samples originate from populations with normal distributions, or if there is a significant difference between the sample variances?
- Well, the problem is that the t-ratio then does not have a t-distribution!
58Nonparametric tests
- These tests do not rely on the normal
distribution and its parameters
59Important
- If you have a data set where either a parametric or a nonparametric test can be applied, then go for the parametric. In these situations the parametric test is always more powerful than the nonparametric (the nonparametric tests tend to have a higher Type II error)
61Example from "Pollinators entering female dioecious figs: why commit suicide?" Patel et al. 1995 (J Evol Biol)
- In the dioecious fig/pollinator mutualism, female wasps that pollinate figs on female trees die without reproducing, whereas female wasps that pollinate figs on male trees produce offspring.
- Selection should strongly favor wasps that avoid female figs and enter only male figs.
- Consequently, fig trees would not be pollinated and fig seed production would ultimately cease, leading to extinction of both wasp and fig.
63- Question: Do wasps prefer male figs over female figs?
- H0: Equal or larger number of wasps on female figs than on male figs
- H1: Fewer wasps on female figs than on male figs
64- In a controlled experiment, pollinators in the wild (southern India) were presented with a choice between male and female figs of the species Ficus hispida. This was repeated 3 times, on 3 different occasions. The data from the first experiment are presented on the next slide.
65Results
66Mann-Whitney U Test (Wilcoxon Rank-Sum Test)
- Assumptions
- The variables we are testing are continuous random variables
- The samples must be two independent random samples; however, the sample sizes do not have to be equal
67Mann-Whitney U Test
- Pool all observations into one sample
- Observations are ranked from smallest to largest, irrespective of which population each observation was sampled from
- Midranks are used for tied values.
68Mann-Whitney U Test
- The test statistic W1 is the sum of the ranks from the X1 population (female figs)
- If the sum is too small (too large) then this is an indication that the values of the X1 population tend to be smaller (larger) than those of the X2 population (male figs)
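The ranking procedure and the rank sum W1 can be sketched in plain Python. The wasp counts below are invented to show how midranks work; they are not the data from Patel et al. The function names are assumptions as well.

```python
def midranks(values):
    """Rank values from smallest to largest (1-based);
    tied values all receive the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    return ranks

def rank_sum_w1(x1, x2):
    """Pool both samples, rank them, and sum the ranks of the x1 sample."""
    ranks = midranks(list(x1) + list(x2))
    return sum(ranks[:len(x1)])

# Made-up wasp counts, NOT the data from the study
female = [0, 1, 1, 3]
male = [1, 2, 4, 5, 6]

w1 = rank_sum_w1(female, male)
print(w1)  # 13.0: ranks 1, 3, 3 and 6 (the three 1s share midrank 3)
```

A small W1 relative to the critical value in the rank-sum table would indicate that the female-fig counts tend to be smaller than the male-fig counts.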
69The fig example
- The number of wasps was counted on 10 female figs and 9 male figs, so we have to rank these 19 observations
70Results
71- In our case we will use
- We compare this value to our critical value
72Conclusion
- Do not reject H0. There are not significantly fewer wasps on female figs than on male figs.
73Allergy
- Swedish researchers (Bill Hesselmar et al.) claim that children who have pets at home are less likely to develop allergies than children who have no pets
- Conclusion: Buy a furry pet for your child
- Can you see any problems here?