How to interpret and report eta squared / partial eta squared in statistically significant and non-significant analyses?
I have eta squared and partial eta squared values calculated as measures of effect size for group mean differences.
What is the difference between eta squared and partial eta squared? Can they both be interpreted using the same Cohen's guidelines (1988 I think: 0.01 = small, 0.06 = medium, 0.13 = large)?
Also, is there any use in reporting effect size if the comparison test (i.e., a t-test or one-way ANOVA) is non-significant? In my head, this is like saying "the mean difference did not reach statistical significance but is still of particular note because the effect size indicated by the eta squared is medium". Or is effect size a replacement for significance testing, rather than complementary to it?
In fact, SPSS calculates partial eta squared for all ANOVAs. This gives the same value as eta squared in single-IV independent-groups designs, but a different value in single-IV repeated-measures designs. This causes no end of problems with my students.
Effect sizes for group mean differences
- In general, I find standardised group mean differences (e.g., Cohen's d) a more meaningful effect size measure within the context of group differences. Measures like eta squared are influenced by whether group sample sizes are equal, whereas Cohen's d is not. I also think that the meaning of d-based measures is more intuitive when what you are trying to quantify is a difference between group means.
- The above point is particularly strong for the case where you only have two groups (e.g., the effect of treatment versus control). If you have more than two groups, then the situation is a little more complicated. I can see the argument for variance explained measures in this case. Alternatively, Cohen's $f^2$ is another option.
- A third option is that within the context of experimental effects, even when there are more than two groups, the concept of effect is best conceptualised as a binary comparison (i.e., the effect of one condition relative to another). In this case, you can once again return to d-based measures. The d-based measure is not an effect size measure for the factor, but rather of one group relative to a reference group. The key is to define a meaningful reference group.
- Finally, it is important to remember the broader aim of including effect size measures. It is to give the reader a sense of the size of the effect of interest. Any standardised measure of effect should assist the reader in this task. If the dependent variable is on an inherently meaningful scale, then don't shy away from interpreting the size of effect in terms of that scale. E.g., scales like reaction time, salary, height, weight, etc. are inherently meaningful. If you find, as I do, eta squared to be a bit unintuitive within the context of experimental effects, then perhaps choose another index.
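For illustration, here is a minimal sketch of Cohen's d for two independent groups, using the pooled standard deviation (the group data below are hypothetical):

```python
from statistics import mean, variance

def cohens_d(group1, group2):
    """Cohen's d: standardised mean difference using the pooled SD."""
    n1, n2 = len(group1), len(group2)
    # Pool the sample variances, weighted by degrees of freedom
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / pooled_var ** 0.5

# Hypothetical reaction times (ms) for treatment vs control groups
treatment = [512, 498, 503, 489, 507]
control = [530, 541, 525, 536, 529]
print(round(cohens_d(treatment, control), 2))  # negative: treatment is faster
```

Note that d, unlike eta squared, keeps the sign of the difference and reads directly as "how many pooled standard deviations apart the group means are", and it is unaffected by unequal group sizes in the way eta squared is.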
Eta squared versus partial eta squared
- Partial eta squared is the default effect size measure reported in several ANOVA procedures in SPSS. I assume this is why I frequently get questions about it.
- If you only have one predictor variable, then partial eta squared is equivalent to eta squared.
- This article explains the difference between eta squared and partial eta squared (Levine and Hullett, *Eta Squared, Partial Eta Squared...*).
- In summary, if you have more than one predictor, partial eta squared is the proportion of variance explained by a given predictor out of the variance remaining after the variance explained by the other predictors has been excluded.
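In terms of sums of squares, the two measures can be sketched as follows (the SS values below are invented purely for illustration):

```python
def eta_squared(ss_effect, ss_total):
    # Proportion of the *total* variance explained by the effect
    return ss_effect / ss_total

def partial_eta_squared(ss_effect, ss_error):
    # Proportion of variance explained once variance attributable
    # to the other predictors has been excluded
    return ss_effect / (ss_effect + ss_error)

# Hypothetical two-way ANOVA decomposition (no interaction term)
ss_a, ss_b, ss_error = 40.0, 60.0, 100.0
ss_total = ss_a + ss_b + ss_error  # 200.0

print(eta_squared(ss_a, ss_total))          # 0.2
print(partial_eta_squared(ss_a, ss_error))  # ~0.29
```

With a single predictor, ss_b is zero, so ss_total = ss_effect + ss_error and the two formulas coincide.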
Rules of thumb for eta squared and partial eta squared
- If you only have one predictor, then eta squared and partial eta squared are the same, and thus the same rules of thumb apply.
- If you have more than one predictor, then I think the general rules of thumb for eta squared apply more to partial eta squared than to eta squared. This is because partial eta squared in factorial ANOVA arguably more closely approximates what eta squared would have been for the factor had it been a one-way ANOVA; and it is presumably one-way ANOVA that gave rise to Cohen's rules of thumb. In general, including other factors in an experimental design should typically reduce eta squared, but not necessarily partial eta squared, because the additional factor, if it has an effect, increases the variability in the dependent variable.
- Despite what I say about rules of thumb for eta squared and partial eta squared, I reiterate that I'm not a fan of variance explained measures of effect size within the context of interpreting the size and meaning of experimental effects. Equally, rules of thumb are just that, rough, context dependent, and not to be taken too seriously.
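If you do apply the conventional benchmarks, a small helper like the following makes the mapping explicit. It uses the commonly cited Cohen cut-offs of .01/.06/.14, and it should carry all the caveats above: these are rough guides, not critical values.

```python
def label_eta_squared(value):
    """Map an eta squared value onto Cohen's conventional benchmarks.

    These are rough, context-dependent guides, not critical values.
    """
    if value >= 0.14:
        return "large"
    if value >= 0.06:
        return "medium"
    if value >= 0.01:
        return "small"
    return "negligible"

print(label_eta_squared(0.08))  # medium
```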
Reporting effect size in the context of significant and non-significant results
- In some sense, an aim of your research is to obtain quantitative estimates of the effects of your variables of interest in the population.
- Effect sizes are one quantification of a point estimate of this effect. The larger your sample size, the closer, in general, your sample point estimate will be to the true population effect.
- In broad terms, significance testing aims to rule out chance as an explanation of your results. Thus, the p-value tells you the probability of observing an effect as extreme as or more extreme than the one observed, assuming the null hypothesis is true.
- Ultimately, you want to rule out no effect and want to say something about the size of the true population effect. Confidence intervals and credibility intervals around effect sizes are two approaches that get at this issue more directly. However, reporting p-values and point estimates of effect size is quite common and much better than reporting only p-values or only effect size measures.
- With regards to your specific question, if you have non-significant results, it is your decision as to whether you report effect size measures. I think if you have a table with many results then having an effect size column that is used regardless of significance makes sense. Even in non-significant contexts effect sizes with confidence intervals can be informative in indicating whether the non-significant findings could be due to inadequate sample size.
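To illustrate the sample-size point, here is a rough sketch using a standard large-sample approximation to the standard error of Cohen's d (the numbers are hypothetical). With the same observed effect, a small sample yields a confidence interval too wide to rule much out:

```python
def se_d(d, n1, n2):
    # Large-sample approximation to the standard error of Cohen's d
    return ((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))) ** 0.5

d = 0.5  # a "medium" standardised difference, held fixed
for n in (10, 40, 160):
    half_width = 1.96 * se_d(d, n, n)  # approximate 95% CI half-width
    print(n, round(half_width, 2))
```

With n = 10 per group the interval spans roughly d ± 0.89 and comfortably includes zero; with n = 160 per group it spans about d ± 0.22 and excludes zero. Same point estimate, very different conclusions about whether the non-significance reflects "no effect" or simply an inadequate sample.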
Hi Jeremy - I differ with you when you say "partial eta squared in factorial ANOVA arguably more closely approximates what eta squared would have been for the factor had it been a one-way ANOVA." In fact, eta squared if the predictor were used alone is liable to be much larger than its partial eta squared in the company of other predictors. In the latter case, shared variance explained in the outcome does not get credited to the predictor in question; in the former, there is no "competition" for explained variance, so the predictor gets credit for any overlap it shows with the outcome.
@rolando2 Perhaps my point was ambiguous. I'm referring to designed experiments. Say experiment 1 manipulates factor A, and experiment 2 manipulates both A and B. Assuming a balanced design, the two factors are orthogonal. Assuming both factors explain variance, the variance explained by factor A in experiment 2 will be less than in experiment 1, where the level of factor B is held constant. Thus, when comparing factorial experiments with one-factor experiments, I think partial eta squared is more similar across the two kinds of experiment, especially if there is no interaction effect.
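A toy numeric version of this comparison (all sums of squares invented): factor A explains the same 40 units in both experiments, and manipulating B in experiment 2 adds 60 units of orthogonal variance.

```python
ss_a, ss_error = 40.0, 100.0

# Experiment 1: one-way design, factor B held constant
ss_total_1 = ss_a + ss_error
eta_sq_1 = ss_a / ss_total_1                 # ~0.29 (equals partial eta sq here)

# Experiment 2: balanced factorial design, B also manipulated (no interaction)
ss_b = 60.0
ss_total_2 = ss_a + ss_b + ss_error
eta_sq_2 = ss_a / ss_total_2                 # 0.20 -- shrinks
partial_eta_sq_2 = ss_a / (ss_a + ss_error)  # ~0.29 -- matches experiment 1
```

Eta squared for A shrinks in the factorial design because B's variance inflates the total, while partial eta squared recovers the one-way value.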
Most people identify problems with eta squared; it has been argued that it is biased. For me, however, the question of interpreting eta squared and partial eta squared remains: what are the cut-off values? What value suggests a small effect, what value a medium effect, and what value a large effect?