Title: Power recap
Power recap
- It is good to fake data
- BUT
- The p-value from one fake data set is crap!
- A power analysis based on 1000 simulations is not crap!
- BUT
- Power depends on
  - Sample size
  - Effect size
  - Variation
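A minimal R sketch of a simulation-based power analysis, assuming a two-group comparison with normally distributed data tested with a t-test. The helper power_sim() and all parameter values are hypothetical. Each call fakes 1000 data sets and reports the share that give p < 0.05, so you can see how power changes with sample size, effect size and variation.

```r
# Hypothetical helper: estimated power of a two-sample t-test by simulation
power_sim <- function(n, effect, sd, nsim = 1000, alpha = 0.05) {
  p <- replicate(nsim, {
    g1 <- rnorm(n, mean = 0, sd = sd)       # fake data, group 1
    g2 <- rnorm(n, mean = effect, sd = sd)  # fake data, group 2 shifted by 'effect'
    t.test(g1, g2)$p.value
  })
  mean(p < alpha)  # proportion of fake data sets giving a significant result
}

set.seed(1)
power_sim(n = 20, effect = 1, sd = 1)  # baseline
power_sim(n = 50, effect = 1, sd = 1)  # larger sample size
power_sim(n = 20, effect = 2, sd = 1)  # larger effect size
power_sim(n = 20, effect = 1, sd = 3)  # more variation
```

One fake data set gives one p-value; the proportion over many fake data sets is what estimates power.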
Project considerations I
- Make graphs
- Check for outliers
- Check assumptions
- Decide if you want to transform y and/or x
- Check VIF
- Are your assumptions still messed up?
  → Well, that's for today.
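A minimal base-R sketch of what "Check VIF" computes: each explanatory variable is regressed on the others, and VIF = 1 / (1 - R^2). The data and the helper vif_manual() are hypothetical; in practice the car package's vif() applied to a fitted model is the usual shortcut.

```r
set.seed(2)
n <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)  # hypothetical: x2 strongly correlated with x1
x3 <- rnorm(n)                 # hypothetical: unrelated to the others
X <- data.frame(x1, x2, x3)

# Hypothetical helper: VIF of each column, from R^2 of that column on the rest
vif_manual <- function(X) {
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

vif_manual(X)  # x1 and x2 inflated, x3 close to 1
```

Values well above roughly 5 to 10 are commonly taken as a warning sign of collinearity.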
Project considerations II
- Interpret interactions first!
- If they are significant, are the main effects still interpretable?
- Distinguish between y ~ x1 and y ~ x1 given x2
- Simplify your models!
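A minimal R sketch of the difference between y ~ x1 and y ~ x1 given x2, using hypothetical simulated data where y depends only on x2 but x1 is correlated with x2. Alone, x1 looks important; once x2 is in the model, its effect disappears. The interaction check compares models with and without x1:x2.

```r
set.seed(3)
n <- 200
x2 <- rnorm(n)
x1 <- 0.8 * x2 + rnorm(n, sd = 0.6)  # x1 correlated with x2
y <- 2 * x2 + rnorm(n)               # hypothetical: y depends on x2 only

m_alone <- lm(y ~ x1)       # y ~ x1
m_given <- lm(y ~ x2 + x1)  # y ~ x1 given x2
coef(m_alone)["x1"]         # clearly positive
coef(m_given)["x1"]         # near zero once x2 is accounted for
anova(m_given)              # sequential tests: x1 is tested after x2

# Interactions first: is the x1:x2 term needed?
m_int <- lm(y ~ x1 * x2)
anova(m_given, m_int)
```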
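Response variable vs explanatory variable
- Continuous response, continuous explanatory: Regression
- Continuous response, categoric explanatory: Anova
- Categoric response, continuous explanatory: Logistic regression
- Categoric response, categoric explanatory: 2 × 2 tables
[Figure: example panels: Prob. of choosing Melica (vs Luzula) against Ant size (4.5 to 7.5); Seed size]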
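A minimal R sketch of one call for each combination of continuous/categoric response and explanatory variable (regression, Anova, logistic regression, 2 × 2 table). All variables are hypothetical simulated stand-ins, not the Melica/Luzula data.

```r
set.seed(4)
n <- 60
x <- runif(n, 4.5, 7.5)                               # continuous explanatory
group <- factor(rep(c("A", "B"), each = n / 2))       # categoric explanatory
y <- rnorm(n, mean = 1.2, sd = 0.1)                   # continuous response
chose <- rbinom(n, size = 1, prob = plogis(-6 + x))   # binary (categoric) response
habitat <- factor(sample(c("forest", "open"), n, replace = TRUE))

m_reg   <- lm(y ~ x)                          # continuous ~ continuous: regression
m_anova <- lm(y ~ group)                      # continuous ~ categoric: Anova
anova(m_anova)
m_logit <- glm(chose ~ x, family = binomial)  # categoric ~ continuous: logistic regression
tab <- table(chose, habitat)                  # categoric ~ categoric: 2 x 2 table
chisq.test(tab)
```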
Continuous response variable: Regression (continuous explanatory) or Anova (categoric explanatory)
[Figure: Seed size example]
Assumptions for parametric tests with a continuous response (i.e., also linear models!)
- About the same variation in all groups, along a continuous variable, or along the fitted values
- Reasonably normal residuals (the noise)
The residuals
- are the noise that is not explained by the explanatory variable(s)
- In a regression, the residuals are the distances from the data points to the regression line
- In an Anova, the residuals are the distances to the group mean
- In a linear model, the residuals are the distances from the data points to the fitted values
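A minimal R sketch checking the definitions above on hypothetical data: for a linear model the residuals equal observed minus fitted values, and for an Anova (a linear model with a factor) the fitted values are the group means, so the residuals are the distances to the group mean.

```r
set.seed(5)
g <- factor(rep(c("a", "b", "c"), each = 10))
y <- rnorm(30, mean = c(a = 5, b = 7, c = 6)[as.character(g)])  # hypothetical data
m <- lm(y ~ g)   # an Anova written as a linear model

r <- residuals(m)
# Linear model: residual = observed value - fitted value
all.equal(unname(r), unname(y - fitted(m)))
# Anova: fitted values are the group means, so residual = distance to group mean
all.equal(unname(r), y - ave(y, g))
```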
[Figures: Residuals (2 slides); Assumption check (3 slides)]
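A minimal R sketch of a visual assumption check on a hypothetical linear model: residuals against fitted values (about the same variation everywhere?) and a normal Q-Q plot (pretty normal residuals?).

```r
set.seed(6)
x <- runif(50, 0, 10)          # hypothetical explanatory variable
y <- 2 + 0.5 * x + rnorm(50)   # hypothetical response with constant variation
m <- lm(y ~ x)

op <- par(mfrow = c(1, 2))
# Same variation along the fitted values? Look for an even band around zero
plot(fitted(m), residuals(m), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
# Pretty normal residuals? Points should lie close to the line
qqnorm(residuals(m))
qqline(residuals(m))
par(op)

# plot(m) draws R's built-in set of diagnostic plots for the same model
```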
Solutions
- Poisson for counts (generalized linear model)
- Non-parametric tests
- Resampling methods
  - Permutation
  - Bootstrap
- Binarize your response
[Figures: quasipoisson fit on Xanthoria; Xanthoria reproduction again]
Poisson distribution
- Response: numbers, i.e., counts (not truly continuous)
- Example
  - Are there more maple seedlings close to a maple?
  - Response: number per square
  - m1 <- glm(number ~ distance, family = poisson)
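A minimal R sketch of the maple-seedling model, with hypothetical simulated counts per square and distances. Note that the R family is written in lower case (poisson). Refitting with quasipoisson estimates a dispersion parameter; values far above 1 suggest overdispersion, as in the Xanthoria example.

```r
set.seed(7)
distance <- runif(80, 0, 20)                           # hypothetical distance to a maple
number <- rpois(80, lambda = exp(2 - 0.1 * distance))  # hypothetical seedlings per square

m1 <- glm(number ~ distance, family = poisson)
summary(m1)

# Overdispersion check: should be near 1 for Poisson-like counts
m2 <- glm(number ~ distance, family = quasipoisson)
summary(m2)$dispersion
```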
Poisson distribution
- Usually log(y) also works fine (log(y + 1) if there are zeroes).
- Poisson excels with
  - small means
  - many zeroes
- Very many zeroes → hurdle models
Break?
Non-parametric tests
- Based on ranked values instead of the actual data values.
- Still often in use.
- Questionable with modern computers.
  → In principle, permutations of ranked values
  → But worse than real permutations, because information about the actual data values is discarded.
- BENEFIT: Calms down outliers!
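A minimal R sketch of a rank-based test on hypothetical data: the W statistic from wilcox.test() is just the rank sum of the first group minus its minimum possible value. Because only ranks are used, shrinking the outlier (as long as it stays the largest value) leaves the test unchanged, while the t-test result changes.

```r
x <- c(1.2, 1.5, 1.1, 1.4, 9.0)   # hypothetical group with one extreme outlier
y <- c(2.1, 2.3, 1.9, 2.6, 2.4)   # hypothetical second group

r <- rank(c(x, y))
# Rank-sum statistic for x (the W reported by wilcox.test)
W_manual <- sum(r[seq_along(x)]) - length(x) * (length(x) + 1) / 2
W_manual
wilcox.test(x, y)$statistic

# Shrinking the outlier to 2.7 keeps its rank (still the largest value),
# so the rank-based test is unchanged, while the t-test p-value changes
x_tame <- replace(x, 5, 2.7)
wilcox.test(x_tame, y)$statistic
t.test(x, y)$p.value
t.test(x_tame, y)$p.value
```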
Continuous response variable: parametric tests
- Continuous explanatory: Regression
- Categoric explanatory: Anova (2 groups: also t-test)
Continuous response variable: non-parametric alternatives
- Continuous explanatory: Kendall rank correlation (also Spearman rank correlation)
- Categoric explanatory: Kruskal-Wallis (2 groups: also Mann-Whitney U-test; paired: sign test (binomial))
[Figure: Seed size example]
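A minimal R sketch of the non-parametric alternatives listed above, using hypothetical simulated data. The sign test is written with binom.test(): count the positive paired differences and test them against a probability of 0.5.

```r
set.seed(8)
g <- factor(rep(c("A", "B", "C"), each = 8))
y <- rexp(24, rate = c(A = 1, B = 0.5, C = 0.3)[as.character(g)])  # hypothetical skewed response
x <- runif(24)                                                      # hypothetical continuous explanatory

kruskal.test(y ~ g)                      # Kruskal-Wallis (several groups)
wilcox.test(y[g == "A"], y[g == "B"])    # Mann-Whitney U-test (2 groups)
cor.test(x, y, method = "kendall")       # Kendall rank correlation
cor.test(x, y, method = "spearman")      # Spearman rank correlation

# Sign test for paired data (hypothetical before/after values)
before <- c(12, 15, 11, 14, 13, 16, 12, 15)
after  <- c(14, 17, 10, 16, 15, 18, 13, 17)
d <- after - before
binom.test(sum(d > 0), sum(d != 0), p = 0.5)
```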
Permutations
- Does not require a normal distribution
- BUT, does require the distributions to be equal if your hypothesis is not true.
  → Example
  - If the lichens are equally large in the city as on campus, they must also have the same variation and, e.g., skewness.
- In principle, a test of whether the distributions differ.
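A minimal R sketch of a two-sided permutation test for a difference in means. The helper perm_test() and the city/campus lichen sizes are hypothetical. Group labels are shuffled many times; the p-value is the share of shuffles giving a difference at least as large as the observed one.

```r
# Hypothetical helper: two-sided permutation test for a difference in means
perm_test <- function(a, b, nperm = 9999) {
  obs <- mean(a) - mean(b)
  pooled <- c(a, b)
  na <- length(a)
  perm <- replicate(nperm, {
    idx <- sample(length(pooled), na)        # random relabelling of the groups
    mean(pooled[idx]) - mean(pooled[-idx])
  })
  # Count permutations at least as extreme as observed (small tolerance for rounding);
  # adding 1 counts the observed labelling itself
  (sum(abs(perm) >= abs(obs) - 1e-12) + 1) / (nperm + 1)
}

set.seed(9)
city   <- rlnorm(15, meanlog = 1.0, sdlog = 0.5)  # hypothetical lichen sizes, city
campus <- rlnorm(15, meanlog = 1.3, sdlog = 0.5)  # hypothetical lichen sizes, campus
perm_test(city, campus)
```

Shuffling labels is only fair if the two groups would have the same distribution when there is no real difference, which is the requirement stated on the slide.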
[Figures: Ash seed dispersal; Acer twigs plasticity; Birch cost of reproduction]
Bootstrap
- to pull oneself up by one's bootstraps
- to succeed only on one's own effort or
abilities.
Shrimp-booting...
[Figure: histograms for Rumex crispus and Rumex longifolius]
Confidence intervals
- show how sure we are of a group mean.
- A 95% confidence interval will contain the true mean 95% of the time (i.e., in 95% of repeated samples).
- The larger our sample size, the more sure (confident!) we are of our sample mean → the confidence interval narrows
- And (of course), the more variation within groups, the less sure we get → the confidence interval widens
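A minimal R sketch that fakes many samples from a known (hypothetical) true mean to show the three points above: about 95% of t-based 95% confidence intervals contain the true mean, intervals get narrower with larger samples, and wider with more variation. The helpers coverage() and mean_width() are hypothetical.

```r
set.seed(10)
true_mean <- 10

# Hypothetical helper: fraction of 95% t-based CIs that contain the true mean
coverage <- function(n, sd, nsim = 2000) {
  mean(replicate(nsim, {
    ci <- t.test(rnorm(n, mean = true_mean, sd = sd))$conf.int
    ci[1] <= true_mean && true_mean <= ci[2]
  }))
}

# Hypothetical helper: average CI width for a given sample size and SD
mean_width <- function(n, sd, nsim = 500) {
  mean(replicate(nsim, diff(t.test(rnorm(n, mean = true_mean, sd = sd))$conf.int)))
}

coverage(n = 20, sd = 2)    # close to 0.95
mean_width(n = 20, sd = 2)
mean_width(n = 80, sd = 2)  # larger sample: narrower interval
mean_width(n = 20, sd = 6)  # more variation: wider interval
```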
Bootstrap for tests
[Figure: histogram of bootstrapped differences; x axis: boot.difference, y axis: No. boot-samples]
Bootstrap
- Does not require normally distributed residuals.
- Does not require the same variation.
- The only requirement is that what you bootstrap (e.g., the means) is the same if your hypothesis is not correct.
- And, in practice, a large, representative sample
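A minimal base-R sketch of a bootstrap test for a difference between two groups, resampling rows with sample(row.names(d), replace = TRUE). The data frame d, its columns shoot and forest, and the group labels are hypothetical stand-ins for a moss-shoot data set. A common reading is that if 0 lies outside the 2.5% to 97.5% quantiles of the bootstrapped differences, the difference is significant at about the 5% level.

```r
set.seed(11)
d <- data.frame(
  shoot  = c(rlnorm(20, 2.0, 0.4), rlnorm(20, 1.6, 0.6)),  # hypothetical shoot lengths
  forest = factor(rep(c("old", "young"), each = 20))       # hypothetical forest types
)

boot_diff <- replicate(2000, {
  b <- d[sample(row.names(d), replace = TRUE), ]  # resample whole rows with replacement
  mean(b$shoot[b$forest == "old"]) - mean(b$shoot[b$forest == "young"])
})

hist(boot_diff, xlab = "Bootstrapped difference in moss shoot length",
     ylab = "No. boot-samples", main = "")
quantile(boot_diff, c(0.025, 0.975))  # percentile 95% interval for the difference
```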
[Figure: moss.shoot vs forest type; histogram of the bootstrapped difference in moss shoot length]
Bootstrap
- We use the function sample(row.names(d), replace = TRUE)
- More advanced (and better): library(boot); ?boot; ?boot.ci
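A minimal R sketch of the same kind of two-group comparison with the boot package (shipped with R). The data and the statistic function diff_means() are hypothetical. The strata argument resamples within each forest type, so both groups keep their original sample sizes; boot.ci() then gives percentile and BCa intervals.

```r
library(boot)

set.seed(12)
d <- data.frame(
  shoot  = c(rlnorm(20, 2.0, 0.4), rlnorm(20, 1.6, 0.6)),  # hypothetical shoot lengths
  forest = factor(rep(c("old", "young"), each = 20))       # hypothetical forest types
)

# Hypothetical statistic: boot() calls it with the data and resampled row indices
diff_means <- function(data, i) {
  b <- data[i, ]
  mean(b$shoot[b$forest == "old"]) - mean(b$shoot[b$forest == "young"])
}

b <- boot(d, statistic = diff_means, R = 2000, strata = d$forest)
boot.ci(b, type = c("perc", "bca"))
```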
Binarize your response
- If all other efforts fail
- Binarize your response
  - Nothing vs Something
  - Above the median vs Below the median
- bin.y <- ifelse(y < median(y), 0, 1)
- bin.y <- factor(bin.y)
  → Then do a logistic regression, a 2 × 2 table, or another generalized linear model
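A minimal R sketch of binarizing a skewed response at its median and fitting a logistic regression. The variables x and y are hypothetical simulated data. Values equal to the median would end up in the "1" (above) class.

```r
set.seed(13)
x <- runif(60, 0, 10)                    # hypothetical explanatory variable
y <- rexp(60, rate = 1 / (1 + x))        # hypothetical skewed response

bin.y <- ifelse(y < median(y), 0, 1)     # 0 = below the median, 1 = at or above
bin.y <- factor(bin.y)                   # glm() treats the first level ("0") as failure
table(bin.y)

m <- glm(bin.y ~ x, family = binomial)   # logistic regression on the binarized response
summary(m)
```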
Wednesday seminar
- Read one PowerPoint paper
- Read one book chapter on graphs
- Watch one YouTube film
- Bring two graphs
- 3 groups
  - Narin, Mandeep, Ruben, Mamun, Karolina
  - Keshav, Hanna, Malin, Georg
  - Lovisa, Dries, Andrea, Mehrnaz
Friday morning 09.00
Friday afternoon 13.00
Computer exercise
- Use your own data.
- Or old data.
- Use either a continuous or a categorical explanatory variable.
- Also possible with many explanatory variables?
  - Non-parametric → Well, usually not
  - Permutation → Yes, but hard
  - Bootstrap → Yes, easy
  - Binarizing → Yes, easy
Mail me your data!
- Excel file
- Help option: booking list
Exam
- Read the learning goals
- Read Crawley in relation to the learning goals
  - E.g., no GAM or survival analysis
- Check the lecture PowerPoints in relation to the learning goals
- Practice understanding the exercises (they ARE in the learning goals)
Nice books
- www.bokfynd.nu
- www.abebooks.com
Lunch?
or
Dedication!