In statistical inference, an effect size is a measure of the strength of the relationship between two variables. Effect sizes are useful descriptive statistics: they provide a standard metric for comparing results across studies and are thus critical to meta-analysis. When reporting statistical significance for an inferential test, effect size(s) should also be reported. This is emphasized by the American Psychological Association (APA) Task Force on Statistical Inference (Wilkinson & APA Task Force on Statistical Inference, 1999): "reporting and interpreting effect sizes in the context of previously reported effects is essential to good research" (p. 599, emphasis added).[1][2] This page provides an undergraduate-level introduction to effect sizes and their usage.
An inferential test may be statistically significant (i.e., the result would be unlikely to occur by chance alone if there were no true effect), but this doesn’t necessarily indicate how large the effect is.
Conversely, an effect may be notable yet non-significant, especially in low-powered tests.
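The two points above can be illustrated with a minimal sketch. It assumes two independent groups of equal size, for which the t statistic is related to Cohen's d by t = d × sqrt(n / 2), where n is the size of each group. The function name `t_from_d`, the effect sizes, and the sample sizes are hypothetical and chosen only for illustration; the critical values are the usual two-tailed .05 cut-offs.

```python
import math

def t_from_d(d, n_per_group):
    """t statistic implied by Cohen's d for two independent groups of equal size."""
    return d * math.sqrt(n_per_group / 2)

# Two-tailed .05 critical values (standard table values).
CRIT_LARGE_DF = 1.96   # normal approximation, appropriate for very large df
CRIT_DF_14 = 2.145     # df = 14, i.e. two groups of 8

# Hypothetical case 1: tiny effect, very large samples.
t_large_n = t_from_d(0.05, 10_000)
# Hypothetical case 2: sizeable effect, very small samples.
t_small_n = t_from_d(0.60, 8)

print(f"d = 0.05, n = 10000 per group: t = {t_large_n:.2f}, "
      f"significant = {t_large_n > CRIT_LARGE_DF}")
print(f"d = 0.60, n = 8 per group:     t = {t_small_n:.2f}, "
      f"significant = {t_small_n > CRIT_DF_14}")
```

In the first case the test is significant although the effect is very small; in the second the effect is moderate to large but the test lacks the power to detect it.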
Effect sizes are influenced by sample size and by the number of groups/conditions (e.g., see Murray and Dosser, 1987). Therefore, caution must be taken when interpreting effect sizes and when comparing them between different studies. Effect sizes can give additional information about the effects of independent variables within the same study, but should not be used to compare effects between studies unless sample size and study design are carefully taken into account.
Some commonly used effect sizes are:
There are no agreed standards for how to interpret an effect size (ES). Interpretation is ultimately subjective.
For equal and large sample sizes the following formulae are appropriate:
and conversely,
as discussed by Cohen (1965, 1988), Friedman (1968), Glass, McGraw, and Smith (1981), Rosenthal (1984), and Wolf (1986).
The correct formula for unequal or small sample sizes is given by (see [3]):
When the two sample sizes are equal and both are large, this reduces to Eq. 1.
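The equations referred to in this passage are not reproduced in the text. As a hedged sketch only, and assuming the passage concerns the widely cited conversion between Cohen's d and the point-biserial correlation r (an assumption, not something the text confirms), the usual forms are r = d / sqrt(d² + 4) and d = 2r / sqrt(1 − r²) for equal, large samples, and r = d / sqrt(d² + (N² − 2N) / (n1 × n2)) with N = n1 + n2 otherwise. The function names below are hypothetical.

```python
import math

def d_to_r_equal_n(d):
    """Assumed equal/large-sample conversion: r = d / sqrt(d^2 + 4)."""
    return d / math.sqrt(d * d + 4)

def r_to_d_equal_n(r):
    """Inverse of the conversion above: d = 2r / sqrt(1 - r^2)."""
    return 2 * r / math.sqrt(1 - r * r)

def d_to_r(d, n1, n2):
    """Assumed general conversion for unequal or small samples.

    Replaces the constant 4 with (N^2 - 2N) / (n1 * n2), where N = n1 + n2.
    """
    n_total = n1 + n2
    correction = (n_total ** 2 - 2 * n_total) / (n1 * n2)
    return d / math.sqrt(d * d + correction)

d = 0.5
print(f"equal/large n:        r = {d_to_r_equal_n(d):.4f}")
print(f"n1 = n2 = 10000:      r = {d_to_r(d, 10_000, 10_000):.4f}")
print(f"n1 = 10, n2 = 90:     r = {d_to_r(d, 10, 90):.4f}")
print(f"round trip back to d: {r_to_d_equal_n(d_to_r_equal_n(d)):.4f}")
```

With equal sample sizes n1 = n2 = n, the correction term equals 4(1 − 1/n), which approaches 4 as n grows, so under these assumptions the general form converges on the simpler one.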
Q: Twenty athletes rate their personal playing ability on a scale of 1 to 5, M = 3.4 (SD = .6). After an intensive training program, the players rate their personal playing ability again, M = 3.8 (SD = .6). What is the effect size (ES)? How good was the intervention? (For simplicity, this example uses the same SD for both occasions.)
A: Standardised mean effect size = (M2 - M1) / SDpooled = (3.8 - 3.4) / .6 = .4 / .6 = .67, a moderate-to-large change over time.
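The calculation in this example can be reproduced with a short sketch. It assumes equal numbers of ratings on both occasions, so the pooled SD is taken as the square root of the average of the two variances; the function names are hypothetical.

```python
import math

def pooled_sd(sd1, sd2):
    """Pooled SD for two sets of scores of equal size (root mean of the variances)."""
    return math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)

def standardised_mean_difference(m1, m2, sd1, sd2):
    """(M2 - M1) / SDpooled, as in the worked example."""
    return (m2 - m1) / pooled_sd(sd1, sd2)

# Values from the athlete example: before M = 3.4, after M = 3.8, SD = .6 both times.
d = standardised_mean_difference(3.4, 3.8, 0.6, 0.6)
print(round(d, 2))  # 0.67
```

Like the worked example, this sketch treats the two occasions as if they were independent groups and does not use the correlation between the repeated ratings.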
In ANOVA, is reporting partial eta-squared sufficient, or should Cohen's d also be reported?
Partial eta-squared and Cohen's d provide two different types of effect size and both may be appropriate and useful in reporting the results of ANOVA.
Partial eta-squared indicates the proportion (often expressed as a %) of the variance in the Dependent Variable (DV) attributable to a particular Independent Variable (IV), after excluding the variance explained by the other IVs in the model. If the model has more than one IV, then report the partial eta-squared for each.
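As a minimal sketch, partial eta-squared for an IV can be computed from an ANOVA table as SS_effect / (SS_effect + SS_error). The sums of squares and the IV names below are hypothetical values for a model with two IVs, not results from any study.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta-squared: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)

# Hypothetical sums of squares from a two-way ANOVA.
ss_effects = {"IV_A": 30.0, "IV_B": 10.0}
ss_error = 60.0

# One partial eta-squared per IV, as recommended when a model has several IVs.
results = {iv: partial_eta_squared(ss, ss_error) for iv, ss in ss_effects.items()}
for iv, value in results.items():
    print(f"{iv}: partial eta-squared = {value:.3f}")
```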
Cohen's d indicates the size of the difference between two means in standard deviation units.
To be thorough, one should report partial eta-squared for each IV and Cohen's ds for each pairwise comparison of interest (e.g., if posthoc tests or planned comparisons were conducted, then Cohen's d should be provided for each contrast).
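Computing Cohen's d for each pairwise comparison can be sketched as follows. The group names and scores are hypothetical, and the sketch assumes one common choice of standardiser: the pooled SD of the two groups being compared, weighted by their degrees of freedom (other choices, such as the error term from the full ANOVA, are also used).

```python
import math
from itertools import combinations
from statistics import mean, variance

def cohens_d(x, y):
    """Cohen's d for two independent groups, using the df-weighted pooled SD."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(y) - mean(x)) / math.sqrt(pooled_var)

# Hypothetical scores for an IV with three levels.
groups = {
    "control": [2, 4, 6, 4, 4],
    "low": [3, 5, 7, 5, 5],
    "high": [4, 6, 8, 6, 6],
}

# One d per pairwise comparison of interest.
pairwise_d = {
    (a, b): cohens_d(groups[a], groups[b]) for a, b in combinations(groups, 2)
}
for (a, b), d in pairwise_d.items():
    print(f"{a} vs {b}: d = {d:.2f}")
```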
If F is non-significant for an IV, then reporting the partial eta-squared is sufficient and appropriate. If F is significant and you go on to do planned contrasts or posthoc tests, then each of these pairwise comparisons should also be accompanied by Cohen's d effect sizes.
Note that if the IV of interest only has two levels, then the partial eta-squared and the Cohen's d will communicate similar information but do so using different scales of measurement (% and SD units respectively).
Ward (2002)[4] examined articles in 3 psychology journals to assess the current status of statistical power and effect size measures:
Ward (2002) found that:
Effect sizes indicate the amount of difference or strength of relationship. They have historically been underutilized. Inferential tests should be accompanied by effect sizes and confidence intervals.
Commonly used effect sizes include Cohen’s d and r.
Whilst rules of thumb for interpreting effect sizes are available, it is best to compare effect sizes with similar or related studies.
Standardized mean effect sizes are not available in SPSS; calculate them by hand or use a spreadsheet calculator.
Coe, R. (2002). It’s the effect size, stupid: What effect size is and why it is important. Paper presented at the Annual Conference of the British Educational Research Association, University of Exeter, England.
Thalheimer, W., & Cook, S. (2002, August). How to calculate effect sizes from published research articles: A simplified methodology.