Experimental design and statistical methods in biology

Slides: 61

Provided by: GnaC6

Transcript and Presenter's Notes

1
Experimental design and statistical methods in
biology

Lesson 7
Analysis of ratios
Logistic regression
Generalized Linear Models (GENMOD)
Analysis of frequency tables
Log-linear models

2
Imagine an investigation aiming to examine the
lynx's food selection.

Method: Contents of 6 lynx stomachs were
examined.
Remains that could be identified to species
(genus, family, order, class) were weighed.
The number of prey individuals was estimated from
the remains (bones, feathers, hairs etc.).

3
(No Transcript)
4
(No Transcript)
5
Because it is a ratio between numbers of observations
6
Because it depends on the weight unit, i.e. 534
g/1021 g = 0.534 kg/1.021 kg = 0.523
7
The noise of a true ratio depends on the number
of observations.
For instance, let p denote the probability that a
prey item is a mouse and 1 - p the probability
that it is not a mouse. Then the probability of
the next prey item being a mouse (given that the
number of mice r among n prey items is binomially
distributed) is found as r/n
8
The variance of r
The variance of p
which shows that the variance of p declines with n
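The formulas on this slide did not survive in the transcript. As a sketch, the standard binomial results they most likely showed are given below, assuming r is the number of mice among n independent prey items and that p is estimated by r/n (the hat notation is an assumption, not taken from the slides).

```latex
\hat{p} = \frac{r}{n}, \qquad
\operatorname{Var}(r) = n\,p\,(1-p), \qquad
\operatorname{Var}(\hat{p}) = \frac{\operatorname{Var}(r)}{n^{2}} = \frac{p\,(1-p)}{n}
```

Doubling n halves the variance of the estimated ratio, which is the sense in which the variance declines with n.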
9
Conditions for analyzing frequency tables

Observations are on a nominal scale
Observations show stochastic independence (that
is, every new observation should be drawn from a
distribution (binomial or multinomial) with
probability pj for event j and 1 - pj for not
event j).

For instance, teeth from prey animals cannot be
regarded as stochastically independent because
each prey contributes with 1, ..., m observations.
In this situation, it is more correct to use the
prey item as the unit of observation.
10
Logistic regression

Used when data are dichotomous.
Used when data are true fractions between 0 and 1

11
Example
Does predation of eggs in nests of Oyster catcher
depend on

The distance from the nest to the nearest nest of
Herring gull?
On the vegetation surrounding the nest?
On the number of eggs in the nest?

12
(No Transcript)
13
(No Transcript)
14
Data
OBS   DIST   EGGS   VEG   KILLED
  1    0.5      3     B        3
  2    1.0      7     C        5
  3    5.7      5     B        1
  4    3.8      9     A        6
  5    3.0      7     C        5
  6    6.1      8     A        3
........
 57    3.3      3     A        3
15
Analysis of dichotomous data

Nests are categorized according to whether
predation has occurred or not.
No predation is scored as 0
Predation is scored as 1

16
Plus/minus predator visit to Oyster catcher nest
17
The purpose is to fit a model to the data: a
model that predicts the probability of a nest
being predated
18
The logistic regression model
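The model equation itself is missing from the transcript. A sketch of the usual form of the logistic regression model follows, assuming p is the probability that a nest is predated, x_1, ..., x_k are the explanatory variables (for example DIST), and the coefficients are written with the same beta symbols used in the maximum likelihood slides.

```latex
\operatorname{logit}(p) = \ln\!\left(\frac{p}{1-p}\right)
  = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
\qquad\Longleftrightarrow\qquad
p = \frac{e^{\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k}}
         {1 + e^{\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k}}
```

The logit maps probabilities in (0, 1) onto the whole real line, so the linear predictor can take any value while p stays between 0 and 1.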
19
y 0
20
How to do it in SAS
21
DATA logist;
OPTIONS LINESIZE=90;
/* Example on logistic regression */
/* The example is inspired by Dorthe Lahrmann's investigations of
   Oyster catchers (strandskader) on Langli in Ho Bugt */
INFILE 'h:\lin-mod\logist.prn' FIRSTOBS=2;
INPUT dist eggs veg $ killed;
/* dist: Distance to the nearest nest of Herring gull (sølvmåge) */
/* eggs: Number of Oyster catcher eggs in a nest */
/* veg: vegetation type surrounding an Oyster catcher nest */
IF killed GT 0 THEN visit = 1;
IF killed = 0 THEN visit = 0;
/* If killed GT 0 then the nest has been visited by a predator
   at least once */
22
/* Example A: Analysis of whether a nest has been visited or
   not visited by predators, i.e. visit = 1 or 0 */
PROC GENMOD; /* The procedure is Generalized Linear Models */
TITLE 'Eksempel A';
CLASS veg; /* veg is a class variable */
MODEL visit = dist veg / DIST=binomial LINK=logit TYPE3 DSCALE OBSTATS;
/* DIST: distribution function (here chosen as binomial) */
/* LINK: the model uses a logit-transformation of data */
/* TYPE3: type 3 is used in order to evaluate the relative
   contribution of the different factors to the dependent variable */
/* DSCALE: an option which tells SAS to scale the error in order to
   meet the demands of the model. If the scale parameter is
   approximately 1, scaling is not needed. */
/* OBSTATS: gives the predicted values as well as their
   confidence limits */
RUN;
23
Eksempel A    10:19 Thursday, November 22, 2001    87
The GENMOD Procedure

Model Information
Description           Value
Data Set              WORK.LOGIST
Distribution          BINOMIAL
Link Function         LOGIT
Dependent Variable    VISIT
Observations Used     57
Number Of Events      52
Number Of Trials      57

Class Level Information
Class    Levels    Values
VEG      3         A B C
24
Low values (for a given DF) indicate a good fit
Criteria For Assessing Goodness Of Fit
Criterion             DF    Value       Value/DF
Deviance              53    20.2819     0.3827
Scaled Deviance       53    53.0000     1.0000
Pearson Chi-Square    53    22.2740     0.4203
Scaled Pearson X2     53    58.2057     1.0982
Log Likelihood        .     -26.5000    .

Values greater than unity indicate overdispersion
(variance greater than expected)
Values less than unity indicate underdispersion
(variance less than expected)
25
Analysis Of Parameter Estimates
Parameter    DF    Estimate    Std Err    ChiSquare    Pr>Chi
INTERCEPT    1      8.5639     2.1271     16.2093      0.0001
DIST         1     -1.0032     0.2651     14.3173      0.0002
VEG A        1      0.2489     0.9555      0.0678      0.7945
VEG B        1      0.4370     0.9250      0.2232      0.6366
VEG C        0      0.0000     0.0000      .           .
SCALE        0      0.6186     0.0000      .           .
NOTE: The scale parameter was estimated by the square
root of DEVIANCE/DOF.

LR Statistics For Type 3 Analysis
Source    NDF    DDF    F          Pr>F      ChiSquare    Pr>Chi
DIST      1      53     34.8596    0.0001    34.8596      0.0001
VEG       2      53      0.1118    0.8944     0.2237      0.8942
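The NOTE in the output says that the scale parameter is the square root of DEVIANCE/DOF. The short Python sketch below recomputes it from the numbers reported for Example A, and also divides the Pearson chi-square by the squared scale. The function names are hypothetical, and the rescaling rule is an assumption about how the scaled criteria are obtained; it is included because it reproduces the reported values to rounding.

```python
import math


def dscale(deviance: float, df: int) -> float:
    """Scale parameter as the square root of deviance / degrees of freedom."""
    return math.sqrt(deviance / df)


def scaled(statistic: float, scale: float) -> float:
    """Divide a goodness-of-fit statistic by the squared scale parameter."""
    return statistic / scale ** 2


# Deviance, DF and Pearson chi-square as reported for Example A (with VEG).
scale_a = dscale(20.2819, 53)
scaled_deviance_a = scaled(20.2819, scale_a)
scaled_pearson_a = scaled(22.2740, scale_a)

print(round(scale_a, 4), round(scaled_deviance_a, 4), round(scaled_pearson_a, 4))
```

By construction the scaled deviance equals its degrees of freedom, which is why Value/DF for that row is exactly 1.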
26
Criteria For Assessing Goodness Of Fit
Criterion             DF    Value       Value/DF
Deviance              55    20.3675     0.3703
Scaled Deviance       55    55.0000     1.0000
Pearson Chi-Square    55    21.6364     0.3934
Scaled Pearson X2     55    58.4265     1.0623
Log Likelihood        .     -27.5000    .

Analysis Of Parameter Estimates
Parameter    DF    Estimate    Std Err    ChiSquare    Pr>Chi
INTERCEPT    1      8.8288     2.0182     19.1363      0.0001
DIST         1     -1.0012     0.2587     14.9777      0.0001
SCALE        0      0.6085     0.0000      .           .
NOTE: The scale parameter was estimated by the square
root of DEVIANCE/DOF.

LR Statistics For Type 3 Analysis
Source    NDF    DDF    F          Pr>F      ChiSquare    Pr>Chi
DIST      1      55     36.4999    0.0001    36.4999      0.0001
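Comparing the two Example A fits (with and without VEG) illustrates how a likelihood ratio statistic can be formed. The sketch below assumes the statistic is the difference in deviance divided by the squared scale of the larger model; that assumption reproduces the ChiSquare and Pr>Chi reported for VEG in the Type 3 table, but it is not confirmed by the slides. Function names are hypothetical.

```python
import math


def lr_chisq(dev_reduced: float, dev_full: float, scale_full: float) -> float:
    """Scaled likelihood ratio statistic for dropping terms from the full model."""
    return (dev_reduced - dev_full) / scale_full ** 2


def chisq_sf_2df(x: float) -> float:
    """Upper tail probability of a chi-square variate with 2 df (closed form)."""
    return math.exp(-x / 2.0)


# Deviances reported for Example A: DIST + VEG (53 df) and DIST only (55 df).
scale_full = math.sqrt(20.2819 / 53)
stat_veg = lr_chisq(20.3675, 20.2819, scale_full)
p_veg = chisq_sf_2df(stat_veg)  # VEG has 3 levels, so 2 df

print(round(stat_veg, 4), round(p_veg, 4))
```

The closed form for the tail probability only holds for 2 degrees of freedom; other degrees of freedom need a general chi-square routine.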
27
Observation Statistics
VISIT    Pred      Xbeta      Std       HessWgt     Lower     Upper     Resraw
1        0.9998     8.3283    1.8909    0.000652    0.9903    1.0000     0.000242
1        0.9996     7.8277    1.7639    0.001075    0.9875    1.0000     0.000398
1        0.9578     3.1222    0.6185    0.1091      0.8710    0.9871     0.0422
1        0.9935     5.0244    1.0628    0.0175      0.9498    0.9992     0.006533
1        0.9971     5.8253    1.2605    0.007924    0.9663    0.9998     0.002943
1        0.9383     2.7217    0.5356    0.1563      0.8418    0.9775     0.0617
1        0.9971     5.8253    1.2605    0.007924    0.9663    0.9998     0.002943
1        0.9973     5.9255    1.2854    0.007173    0.9679    0.9998     0.002663
0        0.3358    -0.6822    0.5813    0.6023      0.1392    0.6123    -0.3358
1        0.9764     3.7229    0.7525    0.0622      0.9045    0.9945     0.0236
0        0.7150
..........................................
28
Predicted values and 95% confidence limits
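The predicted values and limits in the OBSTATS listing can be recovered from the fitted model by back-transforming the linear predictor. The sketch below uses the DIST-only estimates reported above (intercept 8.8288, slope -1.0012) and the Std value listed for the third observation (DIST = 5.7). The function names and the normal quantile 1.96 are assumptions made for illustration.

```python
import math


def inv_logit(x: float) -> float:
    """Back-transform a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predict(intercept: float, slope: float, dist: float) -> tuple[float, float]:
    """Return the linear predictor (Xbeta) and the predicted probability."""
    xbeta = intercept + slope * dist
    return xbeta, inv_logit(xbeta)


def confidence_limits(xbeta: float, std: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% limits: build them on the logit scale, then back-transform."""
    return inv_logit(xbeta - z * std), inv_logit(xbeta + z * std)


xbeta, pred = predict(8.8288, -1.0012, 5.7)       # third nest, DIST = 5.7
lower, upper = confidence_limits(xbeta, 0.6185)   # Std taken from the listing

print(round(xbeta, 4), round(pred, 4), round(lower, 4), round(upper, 4))
```

Building the interval on the logit scale keeps both limits inside (0, 1), which is why the limits are not symmetric around the predicted value.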
29
/* Example B: Analysis of the fraction of eggs in a nest
   that are lost */
PROC GENMOD; /* procedure is Generalized Linear Models */
TITLE 'Eksempel B';
CLASS veg; /* veg is a class variable */
MODEL killed/eggs = dist veg eggs / DIST=binomial LINK=logit TYPE3 DSCALE OBSTATS;
/* DIST: distribution function (here chosen as binomial) */
/* LINK: the model uses a logit-transformation of data */
/* TYPE3: a Type 3 analysis is used to determine the contribution
   of the individual factors to the dependent variable */
/* DSCALE: option that can be used if Deviance/DF is different
   from 1. It reduces the risk of Type I errors if the scale
   parameter is GT 1 and the risk of Type II errors if the scale
   parameter is LT 1 */
/* OBSTATS: gives the predicted values and the confidence limits */
RUN;
30
Eksempel B    12:26 Thursday, November 22, 2001    7
The GENMOD Procedure

Model Information
Description           Value
Data Set              WORK.LOGIST
Distribution          BINOMIAL
Link Function         LOGIT
Dependent Variable    KILLED
Dependent Variable    EGGS
Observations Used     57
Number Of Events      183
Number Of Trials      336

Class Level Information
Class    Levels    Values
VEG      3         A B C
31
Criteria For Assessing Goodness Of Fit
Criterion             DF    Value        Value/DF
Deviance              52    53.9491      1.0375
Scaled Deviance       52    52.0000      1.0000
Pearson Chi-Square    52    44.1413      0.8489
Scaled Pearson X2     52    42.5465      0.8182
Log Likelihood        .     -171.3777    .
32
Analysis Of Parameter Estimates
Parameter    DF    Estimate    Std Err    ChiSquare    Pr>Chi
INTERCEPT    1      2.6437     0.5644     21.9369      0.0001
DIST         1     -0.5284     0.0623     71.9060      0.0001
VEG A        1      0.1425     0.3629      0.1541      0.6946
VEG B        1      0.1623     0.3602      0.2029      0.6524
VEG C        0      0.0000     0.0000      .           .
EGGS         1     -0.0314     0.0637      0.2433      0.6219
SCALE        0      1.0186     0.0000      .           .
NOTE: The scale parameter was estimated by the square root of
DEVIANCE/DOF.

LR Statistics For Type 3 Analysis
Source    NDF    DDF    F          Pr>F      ChiSquare    Pr>Chi
DIST      1      52     97.2164    0.0001    97.2164      0.0001
VEG       2      52      0.1135    0.8929     0.2271      0.8927
EGGS      1      52      0.2443    0.6232     0.2443      0.6211
33
Criteria For Assessing Goodness Of Fit
Criterion             DF    Value        Value/DF
Deviance              55    54.5182      0.9912
Scaled Deviance       55    55.0000      1.0000
Pearson Chi-Square    55    45.0882      0.8198
Scaled Pearson X2     55    45.4867      0.8270
Log Likelihood        .     -179.6600    .

Analysis Of Parameter Estimates
Parameter    DF    Estimate    Std Err    ChiSquare    Pr>Chi
INTERCEPT    1      2.5156     0.2950     72.7128      0.0001
DIST         1     -0.5212     0.0589     78.3656      0.0001
SCALE        0      0.9956     0.0000      .           .
NOTE: The scale parameter was estimated by the square
root of DEVIANCE/DOF.

LR Statistics For Type 3 Analysis
Source    NDF    DDF    F           Pr>F      ChiSquare    Pr>Chi
DIST      1      55     107.8859    0.0001    107.8859     0.0001
34
Predicted values and 95% confidence limits
35
Criteria For Assessing Goodness Of Fit
Criterion             DF    Value        Value/DF
Deviance              52    53.9491      1.0375
Scaled Deviance       52    52.0000      1.0000
Pearson Chi-Square    52    44.1413      0.8489
Scaled Pearson X2     52    42.5465      0.8182
Log Likelihood        .     -171.3777    .
36
The likelihood function
37
The binomial distribution
A nest contains n eggs of which r are eaten by
predators. The probability that a given egg is
eaten is denoted p. The probability that exactly
r of the eggs are killed is
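The formula that followed on this slide is missing from the transcript. With n, r and p as defined above, the binomial probability is presumably the standard expression:

```latex
P(r) = \binom{n}{r}\, p^{\,r}\,(1-p)^{\,n-r},
\qquad
\binom{n}{r} = \frac{n!}{r!\,(n-r)!}
```

For example, for a nest with n = 3 eggs and p = 0.5, the probability that all three are eaten is 0.5^3 = 0.125.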
38
r1 = number of killed eggs out of n1 eggs in the
first nest
r2 = number of killed eggs out of n2 eggs in the
second nest
ri = number of killed eggs out of ni eggs in the
ith nest
L = P(r1) × P(r2) × P(r3) × ... × P(ri) × ... × P(rk)
ln L = ln P(r1) + ln P(r2) + ln P(r3) + ... + ln
P(ri) + ... + ln P(rk)
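As an illustration of the log-likelihood as a sum over nests, the Python sketch below evaluates ln L for the seven nests that are legible on the data slide (not the full set of 57). It makes the simplifying assumption that every egg has the same probability p of being killed, which corresponds to a model with an intercept only. The function names are hypothetical.

```python
import math

# (killed, eggs) for the seven nests listed on the data slide.
nests = [(3, 3), (5, 7), (1, 5), (6, 9), (5, 7), (3, 8), (3, 3)]


def log_binom(r: int, n: int, p: float) -> float:
    """Natural log of the binomial probability of r killed eggs out of n."""
    return math.log(math.comb(n, r)) + r * math.log(p) + (n - r) * math.log(1.0 - p)


def log_likelihood(p: float, data: list[tuple[int, int]]) -> float:
    """ln L = sum of ln P(r_i) over nests, for a common p with 0 < p < 1."""
    return sum(log_binom(r, n, p) for r, n in data)


# With a common p, ln L is largest at the pooled fraction of killed eggs.
p_hat = sum(r for r, _ in nests) / sum(n for _, n in nests)

for p in (p_hat - 0.05, p_hat, p_hat + 0.05):
    print(round(p, 3), round(log_likelihood(p, nests), 4))
```

Moving p away from the pooled fraction in either direction lowers ln L, which is what the maximum likelihood principle exploits.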
39
Maximum likelihood
The parameter estimates are found as the values
that maximize the likelihood of observing exactly
r1, r2, ..., ri, ... positive events out of n1, n2,
..., ni, ... trials
The maximum value of L can be found by
differentiating L with respect to β0, β1,
..., βp, and setting the derivatives equal to 0.
This gives the same estimates as differentiating
ln L with respect to the same parameters, because
ln L is largest where L is largest
......
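The equations on this slide are not in the transcript. As a sketch, for a binomial model with logit link the log-likelihood and the equations obtained by setting its derivatives to zero take the following standard form, where x_ij is the value of explanatory variable j for nest i (with x_i0 = 1 for the intercept) and p_i is the predicted probability for nest i. The index notation is an assumption.

```latex
\ln L = \sum_{i=1}^{k} \left[ \ln\binom{n_i}{r_i} + r_i \ln p_i + (n_i - r_i)\ln(1 - p_i) \right],
\qquad
\frac{\partial \ln L}{\partial \beta_j} = \sum_{i=1}^{k} \left( r_i - n_i p_i \right) x_{ij} = 0,
\quad j = 0, 1, \dots, p
```

These equations have no closed-form solution in general, so they are solved iteratively.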
40
The variance of a parameter
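The formula for this slide is missing. A common large-sample approximation, given here as an assumption about what the slide showed, takes the variance of a single maximum likelihood estimate from the curvature of ln L at its maximum:

```latex
\operatorname{Var}(\hat{\beta}) \approx
\left[ -\,\frac{\partial^{2} \ln L}{\partial \beta^{2}} \right]^{-1}_{\beta = \hat{\beta}}
```

A sharply peaked log-likelihood (large negative second derivative) gives a small variance; a flat one gives a large variance.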
41
An example: Estimation of β0
42
(No Transcript)
43
The variance of
44

Analysis of frequency tables

45
One-way classification
Example Tomato plants
Height of tomato plants is determined by two
alleles: T = tall (dominant), d = dwarf (recessive)
Leaf morphology is determined by two alleles: C =
cut-leaves (dominant), p = potato-shaped leaves
(recessive)
46
TTCC x ddpp
x TdCp
47
F2-generation
48
9 Tall, Cut-leaves
49
3 Tall, potato-leaves
9 Tall, Cut-leaves
50
3 Tall, potato-leaves
3 dwarf, cut-leaves
9 Tall, Cut-leaves
51
3 Tall, potato-leaves
3 dwarf, cut-leaves
9 Tall, Cut-leaves
1 dwarf, potato-leaves
52

H0: The observed distribution agrees with the
expected 9:3:3:1 distribution
H1: The observed distribution does not agree with the
expected distribution
α = 0.05
53

54

55

56
χ² one-sample test
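The test statistic is not shown in the transcript. In its usual form, with O_i the observed and E_i the expected number in class i out of a classes (symbols chosen here for illustration), it is:

```latex
\chi^{2} = \sum_{i=1}^{a} \frac{(O_i - E_i)^{2}}{E_i},
\qquad \text{df} = a - 1
```

For the 9:3:3:1 hypothesis, E_i is the total number of plants multiplied by 9/16, 3/16, 3/16 and 1/16 respectively.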

57
G-test

58
G-test

59
G-test

60
G-test
is distributed approximately as χ² with df = a - 1
with 3 df
P = 0.687
Conclusion: The observed and the expected
distributions agree well
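The G statistic is usually computed as twice the sum over classes of observed × ln(observed / expected). The Python sketch below applies this to a 9:3:3:1 hypothesis. The observed counts did not survive in the transcript, so the counts used here are illustrative assumptions, and the function names are hypothetical. The tail probability uses a closed form that is valid only for 3 degrees of freedom.

```python
import math


def g_test(observed: list[int], ratios: list[int]) -> tuple[float, int]:
    """Return the G statistic and its degrees of freedom (classes - 1)."""
    total = sum(observed)
    expected = [total * w / sum(ratios) for w in ratios]
    # Classes with zero observations contribute nothing to the sum.
    g = 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    return g, len(observed) - 1


def chisq_sf_3df(x: float) -> float:
    """Upper tail probability of a chi-square variate with 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


# Illustrative counts (assumed): tall/cut, tall/potato, dwarf/cut, dwarf/potato.
counts = [926, 288, 293, 104]
g, df = g_test(counts, [9, 3, 3, 1])
p_value = chisq_sf_3df(g)

print(round(g, 3), df, round(p_value, 3))
```

A P value well above α = 0.05 means that H0 is not rejected for these counts.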
