10  One-Way ANOVA vignette

Author

Author: Antdrew D. Nguyen

Published

June 1, 2017

11 Libraries

library(plyr)
library(tidyr)

12 Reviewing Analysis of Variance (ANOVA)

I’ll be following Chapter 10 of A Primer of Ecological Statistics by Gotelli and Ellison

Key Concepts:

ANOVA aims to determine differences in a continuous variable between 2 or more groups.

ANOVA bulit on partitioning on the concept of paritioning of the sum of squares (\(SS_{total}\)). How do we calculate this?

The total sum of squares of the data is the sum of squared deviations of each observation (\(Y_i\)) from the grand mean (\(\bar{Y}\)). There are \(i\) = 1 to \(a\) treatment levels and \(j\) = 1 to \(n\) replicates per treatment.

recap:

  • \(i\) refers to the treatment levels
    • it looks like \(a\) represents the number of different treatments
  • \(j\) refers to each observations, with \(n\) being the number of replicates

\[ SS_{total} = \sum_{i=1}^a\sum_{j=1}^n (Y_{ij}-\bar{Y})^2\]

The total sum of squares is the deviation of each observation from the grand mean.

\(SS_{total}\) can then be partitioned into different components, mainly among and within groups.

\[SS_{total} = SS_{among} + SS_{within}\] So for the among group SS:

\[ SS_{among} = \sum_{i=1}^a\sum_{j=1}^n (\bar{Y}_{i}-\bar{Y})^2\] So for the within group SS:

\[ SS_{within} = \sum_{i=1}^a\sum_{j=1}^n (Y_{ij}-\bar{Y}_i)^2\]

Bringing it all together:

\[ \sum_{i=1}^a\sum_{j=1}^n (Y_{ij}-\bar{Y})^2 = \sum_{i=1}^a\sum_{j=1}^n (\bar{Y}_{i}-\bar{Y})^2 + \sum_{i=1}^a\sum_{j=1}^n (Y_{ij}-\bar{Y}_i)^2\]

12.1 Enter data

#data, 
# a = 3, 3 treatments
# n = 4, 4 reps
n=4
unman<-c(10,12,12,13)
control<-c(9,11,11,12)
treat<-c(12,13,15,16)

wide.dat<-data.frame(unman,control,treat);wide.dat
  unman control treat
1    10       9    12
2    12      11    13
3    12      11    15
4    13      12    16
long.dat<-gather(wide.dat,treatment,measure,unman:treat);long.dat
   treatment measure
1      unman      10
2      unman      12
3      unman      12
4      unman      13
5    control       9
6    control      11
7    control      11
8    control      12
9      treat      12
10     treat      13
11     treat      15
12     treat      16
#global mean
grandmean<-round(mean(c(unman,control,treat)),2);grandmean
[1] 12.17

12.2 \(SS_{total}\) calculations

sum((long.dat$measure-grandmean)^2)
[1] 41.6668

12.3 \(SS_{among}\) calculations

## calculating 1 case

(mean(unman)-grandmean)^2
[1] 0.1764
### making a whole function 
#with ddply
ssa<-function(n=n,vec=c(1,3,3),grandmean=grandmean){
  SSa<-(mean(vec)-grandmean)^2
  SSa
}
ssa(vec=unman,n=n,grandmean=grandmean) # verify function
[1] 0.1764
## executing function
ssam<-ddply(long.dat,.(treatment),summarize,ssamong=ssa(vec=measure,n=n,grandmean=grandmean));ssam
  treatment ssamong
1   control  2.0164
2     treat  3.3489
3     unman  0.1764
SSAM<-n*sum(ssam$ssamong);SSAM
[1] 22.1668
# for a balanced design!

12.4 \(SS_{within}\) calculations

## calculating 1 case
sum((unman-mean(unman))^2)
[1] 4.75
### making a whole function 
#with ddply
sswi<-function(x){
  SSwithin<-sum((x-mean(x))^2)
  SSwithin
}
sswi(unman) # verify function
[1] 4.75
SSwi<-ddply(long.dat,.(treatment),summarize,sswi=sswi(measure));SSwi
  treatment  sswi
1   control  4.75
2     treat 10.00
3     unman  4.75
sumwithin<-sum(SSwi$sswi);sumwithin
[1] 19.5

13 Assumptions of ANOVAs

  1. Samples are indepednent and identically distributed.
  2. Variances are homogeneous among groups
    • variance within each group approx to variance within other groups
  3. Residuals are normally distributed
  4. Samples are classified correctly
  5. Maine effects are additive

14 Hypothesis testing

Definitions:

  • \(Y_{ij}\): replicated \(j\) associated with treatment level \(i\)
  • \(\mu\) is the true grand mean or average
  • \(A_i\) is the additive linear component associated with level \(i\) of treatment \(A\).
    • There is a different coefficient \(A_i\) associated with each treatment level (\(i\)).
    • positive coefficients mean that the treatment level has a higher value than grand mean

Alternative Hypothesis, \(H_a\): \(Y_{ij} = \mu + A_i + \epsilon_{ij}\)

Null Hypothesis, \(H_o\): \(Y_{ij} = \mu + \epsilon_{ij}\)

15 Verify with aov() function

knitr::kable(round(summary(aov(measure~treatment,data=long.dat))[[1]],2))
Df Sum Sq Mean Sq F value Pr(>F)
treatment 2 22.17 11.08 5.12 0.03
Residuals 9 19.50 2.17 NA NA