Title: Non-parametric methods
1 Non-parametric methods
The t-test (and similar tests) tests hypotheses about
the parameters of a distribution (in the t-test, about µ
as a parameter of the normal distribution). There are
other approaches too.
2 What to do if the data are not normally distributed,
and the departure from normality is so large that I
cannot rely on the robustness of the test?
- There are transformations that improve normality
and homoscedasticity; we will go through them later.
- If the data have a distribution that can be
approximated by one of a selected set of distribution
types, then special methods developed for them can be
used (generalized linear models).
- We use non-parametric tests.
3 Non-parametric methods
Most often:
- Permutation tests (commonly called randomization tests)
- Rank-based tests
4 Permutation tests
- Basic idea (for the t-test)
- The attained significance level is the probability
of getting samples this different just by chance if
they come from one population. So I can try it: I put
all the observations from both groups together and
then randomly assign their group membership (e.g. by
drawing from a hat or with a computer random number
generator).
5 And so on, at least a thousand times
I look at how often t from the randomly generated
groups is bigger than t from the data.
I do not believe this P (from the parametric test), as
I do not know whether the assumptions are fulfilled.
So I try to simulate it here.
6 The attained significance level (P) is then computed
from the number of random permutations in which the
result was more extreme than in the data (i.e. where
$t_{permut} > t_{data}$), relative to the total number
of permutations.
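The procedure on slides 4 to 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular program: the function names are hypothetical, the test is made two-sided by comparing absolute values of t, ties between the permuted and observed t count as "at least as extreme", and the observed arrangement is counted as one of the permutations (the "+ 1" in numerator and denominator), which is a common convention but is not stated on the slides.

```python
import random
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Two-sample t statistic with pooled variance."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))


def permutation_t_test(a, b, n_perm=1999, seed=1):
    """Two-sided permutation P for the difference between two groups."""
    rng = random.Random(seed)            # fixed seed so the result is repeatable
    pooled = list(a) + list(b)           # put all observations together
    t_data = abs(t_statistic(a, b))
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)              # randomly reassign group membership
        t_perm = abs(t_statistic(pooled[:len(a)], pooled[len(a):]))
        if t_perm >= t_data - 1e-12:     # small tolerance for rounding error
            n_extreme += 1
    # Assumption: the observed data count as one permutation.
    return (n_extreme + 1) / (n_perm + 1)


# Made-up values, for illustration only.
low = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
high = [11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]
p_value = permutation_t_test(low, high)
```

With clearly separated groups like these, almost no random reassignment gives a t as large as the observed one, so the resulting P is small.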
7 Attention
- I test the hypothesis that both samples are from one
(and the same) population. If I want to interpret the
test as a location test, then I have to add the
assumption that both populations have the same
distribution shape. If they then differ, they can
differ only in the location parameter.
8 Rank-based tests
- Basic idea: we do not know what the distribution
is, so we forget the real values and replace them
with their ranks.
- Many parametric methods have their non-parametric
counterparts.
9 Mann-Whitney test: non-parametric analogue of the
two-sample t-test
- All values from both samples are ordered together
(and so they get ranks from 1 to n, where
$n = n_1 + n_2$).
- It does not matter whether the ranking is made
from the top or from the bottom, but I must pay
attention to it if one-tailed tests are used.
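10 Compute
$U = n_1 n_2 + n_1(n_1 + 1)/2 - R_1$,
which gives an especially high value if the ranks in
the first group are low,
or
$U' = n_1 n_2 + n_2(n_2 + 1)/2 - R_2$,
which gives an especially high value if the ranks in
the second group are low ($R_1$ and $R_2$ are the sums
of the ranks in the first and second group).
It holds that $U + U' = n_1 n_2$.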
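A small sketch of this computation follows. It assumes the usual rank-sum form of the statistic, U = n1·n2 + n1(n1 + 1)/2 − R1 with R1 the sum of ranks in the first group (and analogously for U'), which is consistent with the statements above; it also assumes there are no tied values. The function name and the data are made up for illustration.

```python
def mann_whitney_u(sample1, sample2):
    """Return (U, U') from rank sums; assumes no tied values."""
    n1, n2 = len(sample1), len(sample2)
    pooled = sorted(sample1 + sample2)
    # Rank 1 goes to the smallest value (ranking "from the bottom").
    rank_of = {value: rank for rank, value in enumerate(pooled, start=1)}
    r1 = sum(rank_of[v] for v in sample1)    # rank sum of the first group
    r2 = sum(rank_of[v] for v in sample2)    # rank sum of the second group
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u_prime = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u, u_prime


# Made-up values: every value in group1 is lower than every value in group2.
group1 = [1.2, 2.5, 3.1]
group2 = [4.0, 5.5, 6.3, 7.7]
u, u_prime = mann_whitney_u(group1, group2)
```

Here the first group holds the three lowest ranks, so U reaches its maximum n1·n2 = 12 and U' is 0, and the two values add up to n1·n2.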
11 H0: Male and female students are of the same height.
HA: Male and female students are not of the same height.
[Table: height of males, height of females, rank of
male heights, rank of female heights]
Conclusion: we reject H0.
Mann-Whitney test for non-parametric testing of the
two-tailed hypothesis that there is no difference
between the heights of male and female students.
12 Attention
All sorts of values are tabulated, so pay attention
to what is tabulated and how. Statistica prints
"2*1sided exact p" (if I want a one-tailed test and
the deviation goes in the right direction, I divide
it by two).
13 Normal approximation: if there is a large number of
observations,
$Z = (U - \mu_U)/\sigma_U$ has an approximately normal
distribution, and it is then easy to find the
corresponding p (Statistica prints it).
- Attention: if I have the exact p, this value is no
longer of interest.
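As a sketch of the approximation, the code below uses the standard mean and standard deviation of U under the null hypothesis, µU = n1·n2/2 and σU = sqrt(n1·n2(n1 + n2 + 1)/12). These expressions are not given in the text above, so treat them as an assumption of this example; they hold when there are no ties, and no continuity correction is applied. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_z(u, n1, n2):
    """Normal approximation for U (no ties, no continuity correction)."""
    mu_u = n1 * n2 / 2                            # mean of U under H0
    sigma_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # standard deviation of U under H0
    z = (u - mu_u) / sigma_u
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Example: the most extreme possible U for two groups of 10 observations.
z, p = mann_whitney_z(100, 10, 10)
```

For a one-tailed test in the expected direction, the two-sided p would be halved, as with the exact p.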
14 Similar to the permutation test
- Even the M-W test has its assumptions.
- It is either a test of the null hypothesis that the
samples are from the same population,
- or, if it is formulated as a location test, there is
an assumption that the samples have the same
distribution shape.
15 It is thus absurd to write
- "As we did not have homogeneity of variances, we had
to use a non-parametric test."
- 1. Testing whether it is the same population, when I
have already proved inhomogeneity of variances, does
not make any sense.
- 2. For a location test, inhomogeneity of variances
is the same problem for the M-W test as for the t-test.
16 Another assumption: the data can be ranked
Tied values get the average of their ranks. This
departure from the original assumption can cause
problems; some tests use a correction for ties.
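Averaging the ranks of tied values can be illustrated as follows. This is a minimal sketch with a hypothetical helper name; it only assigns the ranks and does not apply any correction for ties to a test statistic.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1      # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


# The two values of 3 would get ranks 3 and 4, so each receives 3.5.
tied_ranks = average_ranks([3, 1, 3, 2])
```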
17 Median test
- I compute the median of all observations and count
how many observations in each group lie above and how
many below this median. I then analyse the counts as a
classic 2 x 2 table. So it is a test about the overall
median; it has no further assumptions, but it is very
weak.
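The construction of the 2 x 2 table can be sketched like this. Several choices here are assumptions of the example rather than part of the description above: values equal to the overall median are counted as "not above", the table is analysed with an uncorrected Pearson chi-square statistic, and the p-value uses the exact relation for one degree of freedom, P = erfc(sqrt(chi2/2)). The function name and data are made up.

```python
from math import erfc, sqrt
from statistics import median


def median_test(sample1, sample2):
    """Return (2 x 2 table, chi-square, p) for a two-sample median test."""
    grand_median = median(sample1 + sample2)
    above1 = sum(v > grand_median for v in sample1)
    above2 = sum(v > grand_median for v in sample2)
    # Rows: above / not above the overall median; columns: group 1 / group 2.
    table = [[above1, above2],
             [len(sample1) - above1, len(sample2) - above2]]
    n = len(sample1) + len(sample2)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    p = erfc(sqrt(chi2 / 2))    # upper tail of chi-square with 1 degree of freedom
    return table, chi2, p


# Made-up values, for illustration only.
table, chi2, p = median_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
```

With counts this small the chi-square approximation is rough; the example is meant only to show how the table is built.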
18 Wilcoxon test
- Analogue of the paired t-test
- Attention: several tests are called Wilcoxon, so
this one is sometimes written as the Wilcoxon test for
paired observations.
19 Wilcoxon test
- First, we compute the differences within the pairs
of observations, then we rank them according to the
size of their absolute value, from the smallest to
the largest. After that we sum the ranks of the
positive differences and the ranks of the negative
differences (marked as $T_+$ and $T_-$). (As the sum
of the series of numbers from 1 to n is $n(n+1)/2$, we
can easily compute $T_+ = n(n+1)/2 - T_-$.)
Thus, the test reflects the number as well as the size
of the positive and negative differences.
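The steps above can be sketched as follows. The sketch assumes that no difference is zero and that no two absolute differences are tied, so simple integer ranks are enough; the function name and the data are made up for illustration.

```python
def wilcoxon_rank_sums(first, second):
    """Return (T+, T-) for paired data; assumes no zero or tied |differences|."""
    diffs = [a - b for a, b in zip(first, second)]
    by_size = sorted(diffs, key=abs)     # smallest absolute difference gets rank 1
    t_plus = sum(rank for rank, d in enumerate(by_size, start=1) if d > 0)
    t_minus = sum(rank for rank, d in enumerate(by_size, start=1) if d < 0)
    return t_plus, t_minus


# Made-up paired measurements; the differences are 2, -1, 4, 3, -6.
first = [10, 12, 9, 15, 11]
second = [8, 13, 5, 12, 17]
t_plus, t_minus = wilcoxon_rank_sums(first, second)
n = len(first)
```

In this example the positive differences hold ranks 2, 3 and 4 and the negative ones ranks 1 and 5, so the two sums add up to n(n + 1)/2 = 15.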
20 H0: The length of the foreleg and of the hind leg is
the same in roe deer.
HA: The length of the foreleg and of the hind leg is
not the same in roe deer.
[Table: roe deer, hind leg length, foreleg length,
difference, rank, signed rank]
Conclusion: H0 is rejected.
Wilcoxon paired test applied to the data on roe deer
leg lengths.
21 An approximation can be used again (for large
samples): the mean and standard deviation of T are
computed, and from these Z. Attention: Statistica
shows just the normal approximation and does not print
the exact p; look for it in tables if needed. Tables
can be found here:
http://fsweb.berry.edu/academic/education/vbissonnette/tables/wilcox_t.pdf
The test assumes that the differences have a symmetric
distribution.
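A sketch of the large-sample approximation follows. It uses the standard mean and standard deviation of the rank sum under the null hypothesis, µT = n(n + 1)/4 and σT = sqrt(n(n + 1)(2n + 1)/24); these expressions are not written out in the text above, so they are an assumption of this example, valid without ties and without a continuity correction. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilcoxon_z(t, n):
    """Normal approximation for a Wilcoxon rank sum T from n non-zero differences."""
    mu_t = n * (n + 1) / 4                           # mean of T under H0
    sigma_t = sqrt(n * (n + 1) * (2 * n + 1) / 24)   # standard deviation of T under H0
    z = (t - mu_t) / sigma_t
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Example: all 20 differences negative, so the sum of positive ranks is 0.
z, p = wilcoxon_z(0, 20)
```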
22 Sign test
Compares the numbers of positive and negative
differences. It has no assumptions, but it is very weak.
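One way to carry out the comparison is an exact binomial calculation, sketched below. The assumptions of this example are that zero differences are dropped, that under the null hypothesis positive and negative differences are equally likely, and that the two-sided p is twice the smaller tail (capped at 1). The function name and data are made up.

```python
from math import comb


def sign_test(first, second):
    """Return (n_positive, n_negative, two-sided exact p) for paired data."""
    diffs = [a - b for a, b in zip(first, second) if a != b]   # drop zero differences
    n = len(diffs)
    n_pos = sum(d > 0 for d in diffs)
    n_neg = n - n_pos
    k = min(n_pos, n_neg)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_pos, n_neg, min(1.0, 2 * tail)


# Made-up paired values in which every difference is positive.
first = [5, 6, 7, 8, 9, 10, 11, 12]
second = [4, 5, 6, 7, 8, 9, 10, 11]
n_pos, n_neg, p = sign_test(first, second)
```

Only the signs enter the calculation, not the sizes of the differences, which is why the test needs so few assumptions and is so weak.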
23 Non-parametric tests
- If the assumptions of a parametric test are
fulfilled, non-parametric tests are weaker than the
corresponding parametric test.
- The common idea that non-parametric tests have no
assumptions is not true.
- Generally, the more observations I have, the more
robust parametric tests tend to be to violations of
their assumptions.
- The stronger the assumptions that are fulfilled, the
more powerful the test I can usually use.