How to perform an ANCOVA in R
I want to perform an ANCOVA analysis of data concerning density of plant epiphytes. At first, I would like to know if there is any difference in plant density between two slopes, one N and one S, but I have other data such as altitude, canopy openness and height of the host plant. I know that my covariate would have to be the two slopes (N and S). I built this model that runs in R and although I have no idea if it performs well. Also I would like to know what the difference is if I use the symbol
+
or*
.model1 <- aov(density~slope+altitude+canopy+height) summary(model1) model1
The basic tool for this is
lm
; note thataov
is a wrapper forlm
.In particular, if you have some grouping variable (factor), $g$, and a continuous covariate $x$, the model
y ~ x + g
would fit a main effects ANCOVA model, whiley ~ x * g
would fit a model which includes interaction with the covariate.aov
will take the same formulas.Pay particular attention to the
Note
in the help onaov
.As for
+
vs*
, russellpierce pretty much covers it, but I'd recommend you look at?lm
and?formula
and most especially section 11.1 of the manual An Introduction to R that comes with R (or you can find it online if you haven't figured out how to find it on your computer; most easily, this involves finding the "Help" pull down menu in either R or RStudio).suppose i have two group factors $g_1,g_2$, and two covariate $x_1,x_2$, with my model being $$y_{ij}=\mu+\alpha_i+\eta_j+x_{ij1}\gamma_1+x_{ij2}\gamma_2+\epsilon_{ij}$$ Does y~g_1+g_2+x_1+x_2 does the same trick? Do the F values obtained against x_1 and x_2 test $\gamma_1=0$ and $\gamma_2=0$ respectively?
Not sure how I missed this. Yes. .... and if you want to test both at once, fit both with and without them and pass the fitted lm objects to `anova` (you'll soon see if you give them in the wrong order because some SS will be negative if you do)
I recommend getting and reading Discovering Statistics using R by Field. He has a nice section on ANCOVA.
To run ANCOVA in R load the following packages:
car compute.es effects ggplot2 multcomp pastecs WRS
If you are using
lm
oraov
(I useaov
) make sure that you set the contrasts using the "contrasts" function before doing eitheraov
orlm
. R uses non-orthogonal contrasts by default which can mess everything up in an ANCOVA. If you want to set orthogonal contrasts use:contrasts(dataname$factorvariable)=contr.poly(# of levels, i.e. 3)
then run your model as
model.1=aov(dv~covariate+factorvariable, data=dataname)
To view the model use:
Anova(model.1, type="III")
Make sure you use capital "A"
Anova
here and notanova
. This will give results using type III SS.summary.lm(model.1)
will give another summary and includes the R-sq. output.posth=glht(model.1, linfct=mcp(factorvariable="Tukey")) ##gives the post-hoc Tukey analysis summary(posth) ##shows the output in a nice format.
If you want to test for homogeneity of regression slopes you can also include an interaction term for the IV and covariate. That would be:
model=aov(dv~covariate+IV+covariate:IV, data=dataname)
If the interaction term is significant then you do not have homogeneity.
Why do non-orthogonal contrasts mess everything up?
To answer the question above about "why non-orthogonal contrasts mess everything up". The answer is that R defaults to non-orthogonal (i.e. difference between means) which can cause problems if you want to see the contribution of each IV separately. When we specify orthogonal contrasts we tell R that we want the SS for the IV's to be completely partitioned and non-overlapping. In this way we can see the variation attributed to each predictor cleanly and clearly. If you do not specify, R defaults to a more liberal approach to the contrast.
Why the interest in type III SS?
Here is a complementary documentation http://goo.gl/yxUZ1R of the procedure suggested by @Butorovich. In addition, my observation is that when the covariate is binary, using summary(lm.object) would give same IV estimate as generate by Anova(lm.object, type="III").
It isn't clear that this is supposed to be an answer. Is it? If so, please edit to clarify. If it is a question, please ask by clicking the `ASK QUESTION` at the top & asking it there. Then we can help you properly.
Agreed. The message has been revised as an (complementary) answer to the previous one.
We use Regression analysis to create models which describe the effect of variation in predictor variables on the response variable. Sometimes if we have a categorical variable with values like Yes/No or Male/Female etc. the simple regression analysis give multiple results for each value of the categorical variable. In such scenario, we can study the effect of the categorical variable by using it along with the predictor variable and comparing the regression lines for each level of the categorical variable. Such an analysis is termed as Analysis of Covariance also called as ANCOVA.
Example
Consider theR
built-in data setmtcars
. In it we observer that the fieldam
represents the type of transmission (auto or manual). It is a categorical variable with values 0 and 1. The miles per gallon value (mpg
) of a car can also depend on it besides the value of horse power (hp
). We study the effect of the value ofam
on the regression betweenmpg
andhp
. It is done by using theaov()
function followed by theanova()
function to compare the multiple regressions.Input Data
Create a data frame containing the fieldsmpg
,hp
andam
from the data setmtcars
. Here we takempg
as the response variable,hp
as the predictor variable andam
as the categorical variable.input <- mtcars[,c("am","mpg","hp")] head(input)
When we execute above code, it produces following result:
am mpg hp Mazda RX4 1 21.0 110 Mazda RX4 Wag 1 21.0 110 Datsun 710 1 22.8 93 Hornet 4 Drive 0 21.4 110 Hornet Sportabout 0 18.7 175 Valiant 0 18.1 105
ANCOVA Analysis
We create a regression model takinghp
as the predictor variable andmpg
as the response variable taking into account the interaction betweenam
andhp
.Model with interaction between categorical variable and predictor variable
Create regression model1
result1 <- aov(mpg~hp*am,data=mtcars) summary(result1)
When we execute above code, it produces following result:
Df Sum Sq Mean Sq F value Pr(>F) hp 1 678.4 678.4 77.391 1.50e-09 *** am 1 202.2 202.2 23.072 4.75e-05 *** hp:am 1 0.0 0.0 0.001 0.981 Residuals 28 245.4 8.8 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
This result shows that both horse power and transmission type has significant effect on miles per gallon as the p-value in both cases is less than 0.05. But the interaction between these two variables is not significant as the p-value is more than 0.05.
Model without interaction between categorical variable and predictor variable
Create the regression model2
result2 <- aov(mpg~hp+am,data=mtcars) summary(result2)
When we execute above code, it produces following result:
Df Sum Sq Mean Sq F value Pr(>F) hp 1 678.4 678.4 80.15 7.63e-10 *** am 1 202.2 202.2 23.89 3.46e-05 *** Residuals 29 245.4 8.5 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
This result shows that both horse power and transmission type has significant effect on miles per gallon as the p-value in both cases is less than 0.05.
Comparing two models
Now we can compare the two models to conclude if the interaction of the variables is truly insignificant. For this we use theanova()
function.anova(result1,result2) Model 1: mpg ~ hp * am Model 2: mpg ~ hp + am Res.Df RSS Df Sum of Sq F Pr(>F) 1 28 245.43 2 29 245.44 -1 -0.0052515 6e-04 0.9806
As the p-value is greater than 0.05 we conclude that the interaction between horse power and transmission type is not significant. So the mileage per gallon will depend in a similar manner on the horse power of the car in both auto and manual transmission mode.
So which came first, this answer or this post on tutorials point? https://www.tutorialspoint.com/r/r_analysis_of_covariance.htm. Should this answer be deleted as plagiarism? Or did tutorials point just copy from here?
License under CC-BY-SA with attribution
Content dated before 6/26/2020 9:53 AM
russellpierce 8 years ago
+ will calculate main effects only, * will estimate interactions between factors connected with *. ANCOVA frameworks usually estimate only a main effect of the continuous factor, but interactions between all grouped factors.