ANOVA

Slides: 48

Provided by: johnt1

Transcript and Presenter's Notes

1
ANOVA

Analysis of Variance
Method to compare more than 2 means
As with comparing more than 2 p's (proportions), we can't use
subtraction
Will use sum of squares

2
ANOVA

Think of an extension to the 2 sample t problem
Need an estimate of SD
In 2 sample t, we always assumed the two distributions
had the same SD
Had to combine our 2 estimates (one from each
sample) to get a single estimate

3
ANOVA

Review 2 sample t test
Given N, Avg, SD for 2 groups
SP = pool(s1,n1,s2,n2)
PV = tprob(0, SP*sqrt(1/n1+1/n2), n1-1+n2-1, av1-av2, 999)
(double for two sided)

4
ANOVA

pool(s1,n1,s2,n2)
= sqrt( ((n1-1)*s1^2 + (n2-1)*s2^2) / (n1-1 + n2-1) )
Concentrate on the numerator and ignore sqrt
Σ (n-1)*s^2
= SSE, Sum of Squares Error
(Error not in sense of mistake)
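
As a rough illustration of the pooling formula, here is a minimal Python sketch (Python rather than the course's own tool; the function name pool mirrors the slide, and the input numbers are made up for the example).

```python
import math

def pool(s1, n1, s2, n2):
    """Pooled SD of two samples; each variance is weighted by its df (n - 1)."""
    sse = (n1 - 1) * s1**2 + (n2 - 1) * s2**2   # the numerator: Sum of Squares Error
    df = (n1 - 1) + (n2 - 1)
    return math.sqrt(sse / df)

# Hypothetical summary statistics, chosen only for illustration.
sp = pool(2.0, 5, 3.0, 7)
print(sp)
```

The numerator is kept in its own variable because it is the quantity the following slides call SSE.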

5
ANOVA

If we ignored the fact that our data came from
(potentially) different groups, we might have
started with
Σ (x-avg)^2
= SSTotal
Note that sqrt(SSTotal/(N-1)) would be our estimator
of SD using all the data (and ignoring grouping)

6
ANOVA

The Sum of Squares Identity
SSTotal = SSE + SSTreatment
Where
SSTr = Σ nj*(avgj-AVG)^2
Except for nj this is like the sum of squares of
the averages
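
The identity can be checked numerically. The Python sketch below uses a small made-up data set (not the flow-rate data) and hypothetical variable names; it only illustrates that the three sums of squares add up.

```python
# Hypothetical data: three groups of unequal size.
groups = {1: [4.0, 5.0, 6.0], 2: [7.0, 9.0], 3: [1.0, 2.0, 3.0, 6.0]}

all_x = [x for xs in groups.values() for x in xs]
grand = sum(all_x) / len(all_x)                      # AVG of all the data

ss_total = sum((x - grand) ** 2 for x in all_x)      # ignores grouping

sse = 0.0    # within-group sum of squares
sstr = 0.0   # between-group (treatment) sum of squares
for xs in groups.values():
    avg_j = sum(xs) / len(xs)
    sse += sum((x - avg_j) ** 2 for x in xs)
    sstr += len(xs) * (avg_j - grand) ** 2

print(ss_total, sse + sstr)   # the two numbers should agree up to rounding
```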

7
ANOVA

See Ex 62, p. 488
3 levels of blockage
Measure flow rate where collapse occurs
In book as Fig 11.4
Not a good form for Excel
See ANOVA1.xls
First col tells which blockage level
Second col is the flowrate

8
ANOVA

To extract the values that correspond to blockage
level 1, use
Dt(:,1)==1
: means all rows
Equality is ==
Single = is assignment
Use above expression to select rows
Dt( Dt(:,1)==1, :)

9
ANOVA

To count how many, use length()
length(Dt( Dt(:,1)==1, :))
Average is mean()
SD is std()
Most convenient to store these in lists

10
ANOVA

for i=1:3,
Ns(i) = length(Dt( Dt(:,1)==i, :))
Avs(i) = mean(Dt( Dt(:,1)==i, 2))
Sds(i) = std(Dt( Dt(:,1)==i, 2))
end

11
ANOVA

These are rows
We can display conveniently by making them cols
of a matrix
ns avs sds
11.0000 11.2091 1.8987
14.0000 15.0857 2.1504
10.0000 17.3300 2.1680

12
ANOVA

Alternate plan is to compute Ns, Avs, SDs as a
matrix in the first place
Write a function SUMSTATS(X, grp) which returns
such a matrix
The number of rows of the answer will be the
number of groups
Assume grp contains the values 1, ..., #groups
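
One possible shape for such a function, sketched in Python rather than the course's tool. The name sumstats follows the slide; the data are made up, and the sketch assumes every group has at least 2 observations so that an SD exists.

```python
from statistics import mean, stdev

def sumstats(x, grp):
    """Return one row [n, average, SD] per group; grp holds labels 1..G."""
    n_groups = max(grp)
    table = []
    for g in range(1, n_groups + 1):
        vals = [xi for xi, gi in zip(x, grp) if gi == g]   # rows belonging to group g
        table.append([len(vals), mean(vals), stdev(vals)])
    return table

# Hypothetical data: values and their group labels.
x = [4.0, 5.0, 6.0, 7.0, 9.0, 1.0, 2.0, 3.0]
grp = [1, 1, 1, 2, 2, 3, 3, 3]
table = sumstats(x, grp)
for row in table:
    print(row)
```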

13
ANOVA

Notice that SDs are quite similar in our example
Good if we are going to assume the underlying SDs
are all equal
Easy to get SSE from here
>> sse = sum((ns-1).*sds.^2)
138.4672
(Close to p. 497 in book)
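
The same calculation as a Python sketch, using the rounded Ns and SDs from the summary table for the flow-rate data. Because the inputs are rounded to 4 decimals, the result can differ from 138.4672 in the last digits.

```python
# Ns and SDs copied from the summary table for the three blockage levels.
ns = [11, 14, 10]
sds = [1.8987, 2.1504, 2.1680]

# Each group contributes (n - 1) * s^2 to the Sum of Squares Error.
sse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
df_error = sum(n - 1 for n in ns)

print(sse, df_error)
```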

14
ANOVA

Exercises
1. Fig 11.5, p. 489, has data on roadway
aggregate. Do sumstats.
2. Fig 11.6, p. 491, has data on carpet fibers.
Do sumstats.

15
ANOVA

To get SSTr, we need overall average
Generally NOT the avg of the avgs
Have to weight by the Ns
>> grand = sum(ns.*avs)/sum(ns)
14.5086
>> sstr = sum(ns.*(avs-grand).^2)
204.0202
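
A Python sketch of the same two steps, again from the rounded summary table, so the last digits may differ slightly from the values above. It also shows that the plain average of the averages is a different number.

```python
# Ns and averages copied from the summary table for the flow-rate data.
ns = [11, 14, 10]
avs = [11.2091, 15.0857, 17.3300]

# Overall average: weight each group average by its N.
grand = sum(n * a for n, a in zip(ns, avs)) / sum(ns)

# The unweighted average of the averages, shown only for contrast.
avg_of_avgs = sum(avs) / len(avs)

# Treatment sum of squares.
sstr = sum(n * (a - grand) ** 2 for n, a in zip(ns, avs))

print(grand, avg_of_avgs, sstr)
```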

16
ANOVA

Don't actually need SST, but it would be SSTr+SSE
342.4874
>> (length(d114)-1)*std(dt(:,2)).^2
342.4874

17
ANOVA

H0: all means equal
Ha: not all means equal
(Not the same as all means different)
It can be shown that when H0 is true, then each
of the SSs is a (multiple of a) Chi2

18
ANOVA

Degrees of Freedom
Usual est of SD has df = N-1
So, if H0 true, Total df = N-1
Each term of SSE has df = ni-1
Fact: Sums of (ind) Chi2 has df given by sum of
df
So df for SSE = Σ (ni-1) = N-G
Df for SSTr is the difference
= #groups-1 = G-1

19
ANOVA
20
ANOVA

Recall that the mean of a Chi2 = df
For comparison, we should divide SS/df
Called Mean Square
MSTr, MSE (don't usually do MST)

21
ANOVA
22
ANOVA
23
ANOVA

If H0 is true, then MSTr ≈ MSE
Consider their ratio (to get rid of the common
multiple)
F = MSTr / MSE
Near 1 -> H0
F large -> avgs are more spread out than expected
-> Ha

24
ANOVA

F distribution
Depends on df for numerator and df for
denominator
FPROB(df1, df2, lo, hi)
Can calculate p-value

25
ANOVA

For our problem, F = 102/4.33 = 23.57
>> fprob(2,32,23.57,999)
5.1058e-007
So if H0 were true, it would be nearly impossible
to get F this large.
Can be quite sure that H0 is not true.
Means are not all exactly the same.
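
The F ratio and its p-value can be reproduced in a few lines of Python. This sketch does not use the course's fprob; it relies on a closed form for the upper tail that holds only when the numerator df is exactly 2, P(F > f) = (1 + 2f/df2)^(-df2/2). The helper name is hypothetical.

```python
# SS and df values taken from the flow-rate slides.
ss_tr, df_tr = 204.0202, 2      # treatment: G - 1 = 2
ss_e, df_e = 138.4672, 32       # error: N - G = 35 - 3

ms_tr = ss_tr / df_tr           # Mean Square for Treatment
ms_e = ss_e / df_e              # Mean Square Error
f = ms_tr / ms_e

def f_upper_tail_df1_2(f_value, df2):
    """P(F > f_value) for an F distribution whose numerator df is exactly 2."""
    return (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)

pv = f_upper_tail_df1_2(23.57, df_e)   # same rounded F as on the slide
print(f, pv)
```

For other numerator df there is no such shortcut, and a statistics library's F distribution would be the usual choice.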

26
ANOVA

Write a function that takes SUMSTATS and returns
ANOVA
Since ANOVA table is not a full matrix, suggest
we return a matrix for SS, df, MS and then return
F and PV as scalars
[F,pv,SS] = ANOVA(sumstats)

27
ANOVA

Exercises
1. Fig 11.5, p. 489, has data on roadway
aggregate. Do ANOVA.
2. Fig 11.6, p. 491, has data on carpet fibers.
Do ANOVA.

28
ANOVA

If we only have 2 groups, should we do ANOVA or 2
sample t test?
Consider the following
N1=8, Av1=12.3, SD1=5.3
N2=12, Av2=16.4, SD2=6.6

29
ANOVA

SP = 6.1273
For the two sided Ha
>> pv = tprob(0,sp*sqrt(1/N1+1/N2),N1-1+N2-1,Av2-Av1,999)*2
0.1599

30
ANOVA

For ANOVA
sm =
8.0000 12.3000 5.3000
12.0000 16.4000 6.6000
>> [f,pv,ssa] = anova1(sm)
f =
2.1492
pv =
0.1599
So the (two-sided) p-value is exactly the same by
either method

31
ANOVA

But there's more!!
ssa =
80.6880 1.0000 80.6880
675.7900 18.0000 37.5439
756.4780 19.0000 39.8146
and SP^2 = 37.5439
So sqrt(MSE) = RMSE is the analog of SP
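
A Python sketch of how the SS/df/MS table can be built from the summary matrix for this two-group example. The function name anova_from_summary is hypothetical (it is not the course's anova1), and the p-value is left out because the standard library has no F or t distribution. The sketch also checks that the squared two-sample t statistic equals F.

```python
import math

def anova_from_summary(sm):
    """sm has one row [n, average, SD] per group. Returns (F, table),
    where table rows are [SS, df, MS] for Treatment, Error, Total."""
    ns = [row[0] for row in sm]
    avs = [row[1] for row in sm]
    sds = [row[2] for row in sm]
    n_total = sum(ns)
    grand = sum(n * a for n, a in zip(ns, avs)) / n_total
    ss_tr = sum(n * (a - grand) ** 2 for n, a in zip(ns, avs))
    ss_e = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    df_tr, df_e = len(sm) - 1, n_total - len(sm)
    table = [
        [ss_tr, df_tr, ss_tr / df_tr],
        [ss_e, df_e, ss_e / df_e],
        [ss_tr + ss_e, df_tr + df_e, (ss_tr + ss_e) / (df_tr + df_e)],
    ]
    return table[0][2] / table[1][2], table

sm = [[8, 12.3, 5.3], [12, 16.4, 6.6]]
f, table = anova_from_summary(sm)

sp = math.sqrt(table[1][2])                          # sqrt(MSE), the analog of SP
t = (16.4 - 12.3) / (sp * math.sqrt(1 / 8 + 1 / 12))  # two-sample t statistic
print(f, t * t, sp)
```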

32
ANOVA

If we reject H0, then some means are different.
Which ones?
Problem of multiple comparisons
If H0 were true (all means equal) and we did 20
tests with alpha = 0.05, then we should expect that
about 1 of the comparisons would appear to be different
If we are going to do a number of 2 sample t
tests, we will have to use a lower alpha than usual
33
ANOVA

The book uses the Studentized range method
There are a number of different methods for
addressing this problem
We will use the Bonferroni method

34
ANOVA

The probability of a union of sets could be as
high as the sum of the probabilities (if the sets
are mutually disjoint)
In our case, the sets are "making a Type I
error" for each test we are doing
If we are doing N tests and each test has a
Prob(Type I) = 0.05/N, then the overall Prob(Type
I) <= 0.05

35
ANOVA

Therefore, if we want to end up at an overall
level of 5%, say, then we need to conduct each
test at a level of 0.05 / (# tests)
For G groups, there are G(G-1)/2 pairs
For flowrate example, G=3 and there are 3*2/2 = 3
pairs to compare
Do 2 sample t tests with alpha/3 = 0.05/3
36
ANOVA

If we use p-values, then we should multiply the
(apparent) p-value by the # of comparisons
If we do confidence intervals, we should get
wider intervals
For 95% intervals, we usually solve for 0.025
Now we solve for 0.025 / (# comparisons)

37
ANOVA

Recall that in the 2 sample t test, we use
SD*sqrt(1/n1+1/n2) for the diff of avgs
What do we use for the pooled SD?
This was the basis for SSE (or MSE)
For SD, use sqrt(MSE)
For df, use df for MSE

38
ANOVA

The p-value for comparing 1-2 is 0.0082
For 1-3, PV = 0.0003
For 2-3, PV = 0.0872
Using Bonferroni, we cannot conclude that 2 and 3
are different, but 1 IS diff from both 2 and 3.
Note that on p. 504, he gets a slightly diff
result using conf intervals and Studentized range
The Studentized range is a bit more efficient
than Bonferroni, but note that his last CI comes
quite close to 0
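
The Bonferroni decision for these three comparisons, as a Python sketch. It assumes the p-values listed are the apparent (unadjusted) ones, which the slide does not state; if they were already multiplied by 3, comparing them with 0.05 directly would lead to the same three conclusions.

```python
# p-values copied from the slide; keys are the pairs of groups compared.
pvals = {(1, 2): 0.0082, (1, 3): 0.0003, (2, 3): 0.0872}

alpha = 0.05
level = alpha / len(pvals)        # per-test level: 0.05 / 3

# A pair is declared different only if its p-value is below the per-test level.
different = {pair: p < level for pair, p in pvals.items()}

for pair, is_diff in different.items():
    print(pair, "different" if is_diff else "cannot conclude a difference")
```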

39
Rank procedures

The underlying assumption of ANOVA is that the
data have a normal distribution
When we are not comfortable with that assumption
there are other methods available
Nonparametric methods

40
Rank procedures

A large class of NP methods are based on ranks
Replace the data with ranks and then do
calculations
We should feel comfortable in knowing which
values are larger than which other values, even
if we don't know the probability of such a
difference

41
Rank procedures

Function y = ranks(x)
Midranks to handle ties in the x's
For comparing multiple groups, we use
Kruskal-Wallis
Based on the sum of ranks within each group
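
A minimal Python sketch of a midrank function. The name ranks follows the slide, but this is only one possible implementation, not the course's own code: tied values all receive the average of the ranks they would jointly occupy.

```python
def ranks(x):
    """Return 1-based ranks of x, using midranks for ties."""
    order = sorted(range(len(x)), key=lambda i: x[i])   # indices in sorted order
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with x[order[i]].
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1        # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = mid
        i = j + 1
    return r

print(ranks([10, 20, 20, 30]))
```

Whatever the ties, the ranks still sum to N(N+1)/2, which is what makes the expected rank sums on the next slide work.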

42
Rank procedures

Text has a formula on p. 713, but not very
meaningful
Better formula is based on Σ(obs-exp)^2/exp
If we have N total obs and n1 obs in group 1,
then the avg rank of all the data is (N+1)/2
We would expect the rank sum for group 1 to be
n1*(N+1)/2, etc
Can then compute the diff between obs and exp
TRICK: There is a fudge factor. Have to multiply
by 6/N to get Chi2

43
Rank procedures

For artery flow
r = ranks(dt(:,2))
rs(1) = sum(r(dt(:,1)==1)) etc
ns(1) = sum(dt(:,1)==1) etc
expt = ns*(n+1)/2
>> c2 = 6/n*sum((rs-expt).^2./expt)
20.4380
>> pv = chiprob(2,c2,99)
3.6471e-005
NOTE: df = #groups-1 = df for Treatment
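
The p-value for the artery flow statistic can be checked without the course's chiprob, because a Chi2 with exactly 2 df has the closed-form upper tail P(X > c) = exp(-c/2). A short Python sketch, with a hypothetical helper name:

```python
import math

def chi2_upper_tail_df2(c):
    """P(X > c) for a chi-square distribution with exactly 2 df."""
    return math.exp(-c / 2.0)

c2 = 20.4380                      # statistic from the slide
pv = chi2_upper_tail_df2(c2)
print(pv)
```

This shortcut is specific to df = 2 (three groups); other df need a proper chi-square routine.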

44
Rank procedures

Can use pooling as we did for Chi2 to determine
which differences are significant
But the pooling can be done on the rank sums (and
expecteds)
As before, df do not change

45
Rank procedures

Carpet fiber example
Assume we've run sumstats
>> ns = smst(:,1)'
ns =
16 16 13 16 14 15
>> r = ranksums(x(:,2),x(:,1))
r =
495 1177.5 353
854.5 288 927
>> n = sum(ns)
n =
90

46
Rank procedures

>> expt = ns*(n+1)/2
expt =
728 728 591.5
728 637 682.5
>> c2 = 6/n*sum((r-expt).^2 ./ max(1,expt))
c2 =
49.937
Chi2 w/ df=5, so extremely large
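
The carpet fiber statistic can be recomputed from the group sizes and rank sums shown on the slides. A Python sketch with hypothetical variable names:

```python
# Group sizes and observed rank sums copied from the carpet fiber slides.
ns = [16, 16, 13, 16, 14, 15]
r = [495, 1177.5, 353, 854.5, 288, 927]

n = sum(ns)                                   # total number of observations
expt = [nj * (n + 1) / 2 for nj in ns]        # expected rank sum per group

# Sum of (obs - exp)^2 / exp, then the 6/N fudge factor.
c2 = 6 / n * sum((rj - ej) ** 2 / ej for rj, ej in zip(r, expt))
df = len(ns) - 1

print(expt, c2, df)
```

As a sanity check, the observed rank sums add up to N(N+1)/2 = 4095, the same total as the expected ones.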

47
Rank procedures

>> (r-expt)./expt
ans =
-0.32005 0.61745 -0.40321
0.17376 -0.54788 0.35824
Note that 1 and 3 look similar
>> r = cpool(r,1,3), expt = cpool(expt,1,3),
r =
848 1177.5 0
854.5 288 927
expt =
1319.5 728 0
728 637 682.5
>> c2 = 6/n*sum((r-expt).^2 ./ max(1,expt))
c2 =
49.787
Still large (essentially unchanged), so diff is not due to a
diff between 1 and 3
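
The pooling step can be sketched in Python as well. The function below is a hypothetical stand-in for the course's cpool: it adds one group's entry into another and leaves a 0 in the emptied slot, which is why the denominator uses max(1, expt) to avoid dividing by zero.

```python
def cpool(values, i, j):
    """Pool group j into group i (1-based), leaving 0 in position j."""
    out = list(values)
    out[i - 1] += out[j - 1]
    out[j - 1] = 0
    return out

def rank_chi2(r, expt, n):
    """6/N * sum((obs - exp)^2 / exp), guarding against zero expected values."""
    return 6 / n * sum((rj - ej) ** 2 / max(1, ej) for rj, ej in zip(r, expt))

# Rank sums and expected rank sums from the carpet fiber slides.
n = 90
r = [495, 1177.5, 353, 854.5, 288, 927]
expt = [728, 728, 591.5, 728, 637, 682.5]

c2_before = rank_chi2(r, expt, n)
r_pooled, expt_pooled = cpool(r, 1, 3), cpool(expt, 1, 3)
c2_after = rank_chi2(r_pooled, expt_pooled, n)

print(r_pooled, expt_pooled, c2_before, c2_after)
```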
