Assignments presentation

Title: Assignments

1
Assignments

Assignment 1 handed out October 27,
due November 3 in class
Assignment 2 handed out November 22,
due November 29 in class

2
Statistics

The science of collecting, analyzing,
presenting and interpreting data

Descriptive statistics
are tabular, graphical and numerical summaries
of data. The purpose of descriptive statistics is
to facilitate the presentation and interpretation
of data
Inferential statistics
Inference, in statistics, is the process of
drawing conclusions about a particular parameter
of a statistical distribution.

4
Characteristics of a statistical problem

Associated with the problem is a large group
about which inferences are to be made. This group
of objects is the population
There is at least one random variable whose
behavior is to be studied relative to the population
The population is too large to study in its
entirety (or techniques used in the study are
destructive in nature). Conclusions about the
population must be based on observing only a
portion or sample of objects drawn from the
population.

1. State the research question
2. Formulate null and alternative hypotheses
3. Identify the population variable and, when possible,
its distribution
4. Sample data according to the chosen sampling
procedure
5. Determine the appropriate test statistic
6. Calculate the test statistic
7. A) Determine critical values for the sampling
distribution at the appropriate level of
significance, or
B) Determine the P value of the test
statistic
8. Compare the test statistic to the critical values.
Reject or do not reject the null hypothesis
9. State the conclusion and answer the question in step
1

6
Guidelines for hypothesis testing

When testing a hypothesis concerning the value of
some parameter, the statement of equality will
always be included in H0. In this way H0
pinpoints a specific numerical value that could
be the actual value of the parameter.
Whatever is to be detected or supported is the
alternative hypothesis (H1).
Since our research hypothesis is H1, it is hoped
that the evidence leads us to reject H0 and
thereby accept H1.

7
Random sample

A random sample of size n from the distribution of
the random variable X is a collection of n
independent random variables, each with the same
distribution as X
A random sample is a sample of size n drawn from a
population of size N in such a way that every
possible sample of size n has the same
probability of being drawn
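
A minimal Python sketch of the second definition. The population of 20 labelled trees and the sample size are made up for illustration; `random.sample` draws without replacement, so every subset of size n is equally likely to be drawn.

```python
import random

# Hypothetical population of N = 20 labelled trees (not from the lecture).
population = [f"tree_{i}" for i in range(1, 21)]


def draw_random_sample(pop, n, seed=None):
    """Draw n distinct objects so that every sample of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(pop, n)


sample = draw_random_sample(population, n=5, seed=1)
print(sample)
```

Walking through the forest and measuring the first trees you meet would not satisfy this definition, because trees near the path are more likely to be chosen.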

8
Random Sample?

Question: Do green and red birds of the same
species occur with the same frequency?
Sample red and green birds in a forest
Question: What is the size distribution of sugar
maple in the same forest?
Sample 100 individuals

9
One sample hypothesis

For example
We have a population and we assume that it
has a normal distribution
We want to know if the population mean is
smaller or larger than a specified value

10
Normal distribution
The location on the X-axis depends on the
population mean.
The shape of the distribution depends on the
population variance.
These are the two parameters of the normal distribution.
11

We estimate the mean of the population from which
we have drawn our sample with the
sample mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
We estimate the variance of the population from
which we have drawn our sample with the
sample variance, $s^2$

12
How good are these estimators?

An unbiased estimator is centered on the
true value of what it is supposed to estimate
A biased estimator is systematically off-center

13
Unbiased estimator of population variance
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
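
The difference between the two divisors can be checked by simulation. This is a sketch only: the population (normal, variance 4), the sample size of 5 and the number of repetitions are arbitrary choices, not values from the lecture. `statistics.variance` divides the sum of squares by n - 1, `statistics.pvariance` divides by n.

```python
import random
import statistics


def average_variance_estimates(n=5, sigma=2.0, n_samples=5000, seed=0):
    """Average both variance estimators over many simulated samples."""
    rng = random.Random(seed)
    total_unbiased = 0.0  # divisor n - 1
    total_biased = 0.0    # divisor n
    for _ in range(n_samples):
        x = [rng.gauss(0.0, sigma) for _ in range(n)]
        total_unbiased += statistics.variance(x)
        total_biased += statistics.pvariance(x)
    return total_unbiased / n_samples, total_biased / n_samples


unbiased, biased = average_variance_estimates()
print(f"true variance 4.0 | divisor n-1: {unbiased:.2f} | divisor n: {biased:.2f}")
```

On average the n - 1 version should land close to the true variance, while the n version should come out too small by a factor of about (n - 1)/n.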
14
Importance of sample size

If we take many samples of size n from a population
which is normally distributed with variance $\sigma^2$,
then the means of these samples are normally
distributed with variance $\sigma^2/n$
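
A small simulation sketch of this statement: the variance of the sample means should be close to the population variance divided by n. The population mean (10), standard deviation (3) and the sample sizes are assumptions chosen for illustration.

```python
import random
import statistics


def variance_of_sample_means(n, mu=10.0, sigma=3.0, n_samples=5000, seed=0):
    """Draw many samples of size n and return the variance of their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(sample) / n)
    return statistics.pvariance(means)


# Population variance is 3.0 ** 2 = 9, so the expected value is 9 / n.
results = {n: variance_of_sample_means(n) for n in (4, 16, 64)}
for n, observed in results.items():
    print(f"n={n:2d}  observed={observed:.3f}  expected={9 / n:.3f}")
```

Quadrupling the sample size cuts the variance of the mean to a quarter, which is why sample size matters so much.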

15
Standard error of sample mean (mean standard
error)
$\sigma_{\bar{x}} = \sigma/\sqrt{n}$
This is estimated with $s_{\bar{x}} = s/\sqrt{n}$
16
Under the assumption that the stated null
hypothesis ($\mu = \mu_0$) is true,
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
follows a t-distribution with n-1 degrees of
freedom
17
One sample hypothesis test

Compute the t-ratio.
Under the assumption that the null hypothesis is
true
What is the probability of obtaining this t-ratio
or a more extreme value of the t-ratio?
If this probability is high: do not reject H0
If this probability is low: reject H0

18
What is considered a high versus a low
probability?

YOU DECIDE!
Conventionally, a probability of 0.05 or less is
considered sufficiently low for the null
hypothesis to be rejected

19
Level of significance

The level of significance ($\alpha$) is under our
control and is usually chosen to be 0.05, 0.01, or 0.001
If H0 is rejected at 0.05, the result is
significant
If H0 is rejected at 0.01, the result is highly
significant
If H0 is rejected at 0.001, the result is very
highly significant

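We reject the null hypothesis
For a two-tailed test: if $|t| \ge t_{\alpha/2,\,n-1}$
For a one-tailed test: if $t \ge t_{\alpha,\,n-1}$
or $t \le -t_{\alpha,\,n-1}$, depending on the null hypothesis
(Here $t_{p,\,n-1}$ is the value exceeded with probability p
by a t-distribution with n-1 degrees of freedom.)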

22
Marine arthropods

A species of marine arthropod lives in seawater
that contains calcium in a concentration of 32
mmole/kg.
Question: Do members of this species maintain a
calcium concentration in their coelomic fluid
(extracellular body fluid) that is lower
than that of their environment?

23
Coelomic fluid

assists respiration and circulation by diffusing
nutrients, and excretion by accumulating wastes
functions in place of several organ systems in
higher animals such as mammals
protects internal organs and also serves as a
hydrostatic skeleton
(Just in case you did not know.)

Hypothesis
H0: The calcium concentration of the arthropod is
the same as or higher than that of the seawater
H1: The calcium concentration of the arthropod is
lower than that of the seawater

25
This is the same as
H0: $\mu \ge 32$
H1: $\mu < 32$
(Remember that the seawater has a concentration
of 32)
26

Thirteen animals are randomly sampled and the
calcium concentrations (mmole/kg) in their coelomic fluid
(extracellular body fluid) are measured:
28 27 29 29 30 30 31 30 33 27 30 32 31

27
Marine arthropod example
28
Calculate t-ratio for experiment
n - 1 = 13 - 1 = 12 d.f.

If we look in the t-table we find the critical value for 12 d.f.
What is the conclusion?
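
The calculation for the thirteen measurements can be sketched in Python as follows. The data and the reference value 32 are from the slides; the one-tailed 0.05 level and the tabulated critical value 1.782 for 12 d.f. are assumptions, since the level used in the lecture is not stated here.

```python
import math
import statistics

calcium = [28, 27, 29, 29, 30, 30, 31, 30, 33, 27, 30, 32, 31]  # slide data
mu_0 = 32  # calcium concentration of the seawater, the value in H0

n = len(calcium)
mean = statistics.mean(calcium)
s = statistics.stdev(calcium)   # sample standard deviation, divisor n - 1
se = s / math.sqrt(n)           # estimated standard error of the mean
t_ratio = (mean - mu_0) / se
df = n - 1

# Assumption: one-tailed test at the 0.05 level; 1.782 is the standard
# t-table value for 12 d.f. H1 says "lower", so we reject in the left tail.
t_crit = 1.782
reject_h0 = t_ratio <= -t_crit

print(f"mean={mean:.2f}  s={s:.2f}  t={t_ratio:.2f}  df={df}  reject H0: {reject_h0}")
```

With these data the t-ratio comes out at about -4.5, well beyond the assumed critical value, so under that assumption H0 would be rejected.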

Because the predictions in H0 and H1 are written so
that they are mutually exclusive and all-inclusive,
we have a situation where one is true and the other
is false
1. When H0 is true, then H1 is false
-If we accept H0, we have done the right thing
-If we reject H0, we have made a mistake
This type of mistake is called a Type I error

30
Type I error

Probability of rejecting a true null hypothesis
Probability of making a Type I error: $\alpha$
It is the same as the level of significance
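
A simulation sketch of this point: when H0 is true and we test at the 0.05 level, about 5% of samples should still lead to rejection. The population (mean 32, standard deviation 2) and n = 13 are assumptions for illustration; 2.179 is the standard two-tailed t-table value for 12 d.f. at 0.05.

```python
import math
import random
import statistics

T_CRIT = 2.179  # two-tailed critical value, alpha = 0.05, 12 d.f. (t-table)


def type_i_error_rate(mu_0=32.0, sigma=2.0, n=13, n_trials=4000, seed=0):
    """Fraction of simulated samples in which a true H0 (mu = mu_0) is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        x = [rng.gauss(mu_0, sigma) for _ in range(n)]  # H0 is true here
        t = (statistics.fmean(x) - mu_0) / (statistics.stdev(x) / math.sqrt(n))
        if abs(t) >= T_CRIT:
            rejections += 1
    return rejections / n_trials


rate = type_i_error_rate()
print(f"observed Type I error rate: {rate:.3f}")
```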

2. When H0 is false, then H1 is true
-If we reject H0, we have done the right thing
-If we accept H0, we have made a mistake
This type of mistake is called a Type II error

32
Type II error

Probability of not rejecting a false null
hypothesis
Probability of making a Type II error: $\beta$

33
Statistical power
34
Statistical power

Increases with increasing sample size
Increases with effect size
Increases with increasing $\alpha$ (level of significance)
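
The three statements can be illustrated with a simulation sketch in which H0 (mean = 32) is false and we count how often a two-tailed one-sample t-test rejects it. The true means, the standard deviation of 2 and the sample sizes are assumptions for illustration; the critical values are standard t-table entries.

```python
import math
import random
import statistics

# Two-tailed critical t values from a standard t-table, keyed by (n, alpha).
CRITICAL_T = {(13, 0.05): 2.179, (13, 0.01): 3.055, (30, 0.05): 2.045}


def power(true_mean, n, alpha, mu_0=32.0, sigma=2.0, n_trials=2000, seed=0):
    """Fraction of simulated samples in which a false H0 (mu = mu_0) is rejected."""
    rng = random.Random(seed)
    t_crit = CRITICAL_T[(n, alpha)]
    rejections = 0
    for _ in range(n_trials):
        x = [rng.gauss(true_mean, sigma) for _ in range(n)]
        t = (statistics.fmean(x) - mu_0) / (statistics.stdev(x) / math.sqrt(n))
        if abs(t) >= t_crit:
            rejections += 1
    return rejections / n_trials


baseline = power(true_mean=31.0, n=13, alpha=0.05)
larger_sample = power(true_mean=31.0, n=30, alpha=0.05)
larger_effect = power(true_mean=30.0, n=13, alpha=0.05)
smaller_alpha = power(true_mean=31.0, n=13, alpha=0.01)

print(f"baseline {baseline:.2f} | n=30 {larger_sample:.2f} | "
      f"effect x2 {larger_effect:.2f} | alpha=0.01 {smaller_alpha:.2f}")
```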

35
Fast rotation energy forest

36
Basket willow example

Is waste water influencing the harvest yield for
a specific variety (clone) of basket willow?
We choose to measure harvest yield in the
form of plant height.
Is this a good indicator of harvest yield?

37
Assumptions

Assume that height of untreated plants has
a normal distribution with population mean $\mu_1$
Assume that height of treated plants has
a normal distribution with population mean $\mu_2$
Assume equal variances ($\sigma_1^2 = \sigma_2^2$)

38
We set up the hypothesis

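H0: There is no difference between the population means $\mu_1$ and $\mu_2$
H1: There is a difference between $\mu_1$ and $\mu_2$
this is the same as
H0: $\mu_1 = \mu_2$
H1: $\mu_1 \ne \mu_2$
which is the same as
H0: $\mu_1 - \mu_2 = 0$
H1: $\mu_1 - \mu_2 \ne 0$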

39
We obtain a random sample from each of the two populations
40
We obtain a random sample from each of the two populations

We estimate $\mu_1$ and $\mu_2$ with $\bar{x}_1$ and $\bar{x}_2$,
respectively.
And we estimate $\sigma_1^2$ and $\sigma_2^2$ with $s_1^2$ and $s_2^2$,
respectively.

41
Two sample hypothesis test

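Remember that for the one sample hypothesis we
used $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}}$, with $s_{\bar{x}} = s/\sqrt{n}$
Here our t is $t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}$
So that $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2/n_1 + s_p^2/n_2}}$, where $s_p^2$ is
the pooled variance (see my notes)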

42
t has a t-distribution with $n_1 + n_2 - 2$ degrees of
freedom under the null hypothesis. We reject the
null hypothesis if $|t| \ge t_{\alpha/2,\,n_1+n_2-2}$
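
A minimal sketch of the two-sample t-ratio. It assumes the usual pooled variance, the two sample variances weighted by their degrees of freedom, which is the textbook definition and may differ in notation from the lecturer's notes. The plant heights are hypothetical, not data from the lecture.

```python
import math
import statistics


def pooled_t(sample_1, sample_2):
    """Return (t, df) for the equal-variance two-sample t-test."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1, mean_2 = statistics.mean(sample_1), statistics.mean(sample_2)
    var_1, var_2 = statistics.variance(sample_1), statistics.variance(sample_2)
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * var_1 + (n2 - 1) * var_2) / df
    se_diff = math.sqrt(pooled_var / n1 + pooled_var / n2)
    return (mean_1 - mean_2) / se_diff, df


# Hypothetical plant heights in cm (illustration only).
untreated = [182, 175, 190, 168, 177, 185]
treated = [195, 188, 201, 179, 192, 198]

t, df = pooled_t(untreated, treated)
print(f"t = {t:.2f} with {df} d.f.")
```

The absolute value of t is then compared with the two-tailed critical value for the printed degrees of freedom.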
43
Keep track of the degrees of freedom

The t-distribution is more spread out than the
normal distribution. In fact the smaller the
degrees of freedom the more spread out is the
t-distribution.

44
t-distribution
45
Violations of the two-sample t test assumptions

The two sample t-test assumes that the two
populations are normally distributed and have
equal variances!
However, experience has shown that this test
is rather robust (retains high power) even when
these assumptions are not met.

46
Statistical power in two sample hypothesis
testing

The power improves with increasing sample size
Also, for a given number of data ($n_1 + n_2$),
maximum power is obtained if the sample sizes are
equal ($n_1 = n_2$)
If the variances are unequal the probability of a Type I
error will tend to be greater than $\alpha$

47
Assessing departures from normality

Graphical assessment of normality
Check for outliers
Frequency curve should look normal
Cumulative frequency curve should be S-shaped
We will come back to this in a later lecture

48
Testing for homoscedasticity (homogeneity among
variances)

The variance ratio test can be used but
remember that this test is severely and adversely
affected by non-normal populations!
However, understanding this test makes it a
bit easier to understand the logic behind ANOVA
(coming lectures)

49
Energy forest example

Question: Do treated and untreated plants
have the same variance for plant height?
H0: $\sigma_1^2 = \sigma_2^2$
H1: $\sigma_1^2 \ne \sigma_2^2$

50
Variance ratio tests

Take the larger of the two sample variances and
divide it by the smaller, e.g. $F = s_1^2 / s_2^2$ when $s_1^2 > s_2^2$
If the two samples come from normal populations
with equal variances this ratio is F-distributed
with $n_1 - 1$ and $n_2 - 1$ degrees of freedom

51
The shape of the F-distribution depends on the
degrees of freedom
52
Variance ratio test

So reject the null hypothesis if $F \ge F_{\alpha/2,\,n_1-1,\,n_2-1}$
(the upper critical value of the two-tailed test)

53
Manipulation of tuber size distribution in
Solanum tuberosum L

Breeding goal: reduce the size variability
Let X1 be the tuber size in the year 1954
Let X2 be the tuber size in the year 2004 (after
50 generations of breeding)

54
Potato example

Has 50 generations of breeding led to a
reduced variability in tuber size?
H0: $\sigma_1^2 \le \sigma_2^2$
H1: $\sigma_1^2 > \sigma_2^2$

55
For a specific potato variety in 1954 a random
sample of 30 potatoes had a sample variance of
1367
A random sample of 30 potatoes of the
same variety in the year 2004 (after 50 years of
extensive breeding) had a sample variance of 985
56

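We use our numbers to calculate the F ratio
$F = s_1^2 / s_2^2 = 1367 / 985 \approx 1.39$
Which is F-distributed with 29 and 29 degrees
of freedom, so that it can be compared with the critical value from the F-table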
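
Instead of a table lookup, the upper-tail probability of the F ratio can be approximated numerically. This is a sketch using only the Python standard library: the density formula is the standard F density, and integrating it with Simpson's rule is a choice made here for illustration, not the lecture's method. The variances and sample sizes are the ones given for the potato samples.

```python
import math


def f_pdf(x, d1, d2):
    """Density of the F-distribution with d1 and d2 degrees of freedom."""
    if x <= 0:
        return 0.0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = (0.5 * (d1 * math.log(d1 * x) + d2 * math.log(d2)
                      - (d1 + d2) * math.log(d1 * x + d2))
               - math.log(x) - log_beta)
    return math.exp(log_pdf)


def f_upper_tail(f, d1, d2, steps=2000):
    """P(F >= f) via Simpson's rule on [0, f]; adequate when d1 > 2."""
    h = f / steps  # steps must be even for Simpson's rule
    total = f_pdf(0.0, d1, d2) + f_pdf(f, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1.0 - total * h / 3


var_1954, var_2004, n = 1367, 985, 30
f_ratio = var_1954 / var_2004
p_one_tailed = f_upper_tail(f_ratio, n - 1, n - 1)
print(f"F = {f_ratio:.3f}, one-tailed P = {p_one_tailed:.3f}")
```

If the printed P value is larger than the chosen level of significance, H0 is not rejected.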

So what do we do when we are unable to tell
if the two samples originate from populations
with normal distributions or if there is a
significant difference between the sample
variances?
Well, the problem is that the t-ratio then does not
follow a t-distribution!

58
Nonparametric tests

These tests do not rely on the normal
distribution and its parameters

59
Important

If you have a data set where either a parametric
or a nonparametric test can be applied, then
go for the parametric. In these situations the
parametric test is generally more powerful than
the nonparametric (the nonparametric tests
tend to have a higher Type II error rate)

61
Example from "Pollinators entering female
dioecious figs: why commit suicide?", Patel et
al. 1995 (J Evol Biol)

In the dioecious fig/pollinator mutualism,
female wasps that pollinate figs on female trees
die without reproducing,
whereas female wasps that pollinate figs on male
trees produce offspring.
Selection should strongly favor wasps
that avoid female figs and enter only male figs.
Consequently, fig trees would not be pollinated
and fig seed production would ultimately cease,
leading to extinction of both wasp and fig.

63

Question: Do wasps prefer male figs over
female figs?
H0: Equal or larger number of wasps on female
figs than on male figs
H1: Fewer wasps on female figs than on male figs

In a controlled experiment pollinators in the
wild (southern India) were presented with a
choice between male and female figs of the
species Ficus hispida. This was repeated 3 times
on 3 different occasions. The data from the first
experiment is presented on the next slide.

65
Results

66
Mann-Whitney U Test (Wilcoxon Rank-Sum Test)

Assumptions
The variables we are testing are continuous
random variables
The samples must be two independent random
samples; however, the sample sizes do not have to
be equal

67
Mann-Whitney U Test

Pool all observations into one sample
Observations are ranked from smallest to largest,
irrespective of which population each
observation was sampled from
Midranks are used for tied values.

68
Mann-Whitney U Test

The test statistic W1 is the sum of the ranks
from the X1 population (female figs)
If the sum is too small (too large) then
this is an indication that the values of the X1
population tend to be smaller (larger) than
those of the X2 population (male figs)
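
The ranking procedure and the rank sum can be sketched as below. The function name and the wasp counts are hypothetical and are not the lecture's data. Tied values receive the midrank, that is, the average of the positions they occupy in the pooled, sorted sample.

```python
def rank_sum(sample_1, sample_2):
    """Return (W1, W2): the rank sums of the two samples in the pooled ranking."""
    pooled = sorted(sample_1 + sample_2)
    # Collect the 1-based positions each distinct value occupies.
    positions = {}
    for position, value in enumerate(pooled, start=1):
        positions.setdefault(value, []).append(position)
    # Midrank = average position of a (possibly tied) value.
    midrank = {value: sum(p) / len(p) for value, p in positions.items()}
    w1 = sum(midrank[v] for v in sample_1)
    w2 = sum(midrank[v] for v in sample_2)
    return w1, w2


# Hypothetical wasp counts on 10 female and 9 male figs (illustration only).
female_figs = [0, 1, 1, 2, 3, 4, 5, 6, 8, 9]
male_figs = [1, 3, 5, 7, 9, 10, 12, 14, 15]

w1, w2 = rank_sum(female_figs, male_figs)
print(f"W1 = {w1}, W2 = {w2}")
```

A useful check is that W1 + W2 always equals N(N + 1)/2 for N pooled observations, here 19 * 20 / 2 = 190.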

69
The fig example

The number of wasps was counted on
10 female figs and 9 male figs so we have to
rank these 19 observations

70
Results

71

In our case we will use
We compare this value to our critical value

72
Conclusion

Do not reject H0. There are not significantly
fewer wasps on female figs than on male figs.

73
Allergy

Swedish researchers (Bill Hasselmar et al.)
claim that children who have pets at home are
less likely to develop allergies than children
who have no pets
Conclusion: Buy a furry pet for your child
Can you see any problems here?
