Jonas Ranstam PhD
This is a brief description of common statistical misunderstandings that often appear in manuscripts.
1. The greatest problem in medical research is insufficient statistical testing.
No, evaluations of inferential uncertainty may be necessary, but hypothesis testing is not. The greatest problems in medical research are related
to inadequate research questions, flawed study designs, and confused interpretation of findings.
2. Why are p-values controversial?
P-values are often misunderstood and incorrectly interpreted as descriptive measures. Findings are considered practically important when p<0.05,
and a p>0.05 is taken as an indication of equivalence. However, p-values are uncertainty measures, and a statistically significant finding is
not necessarily scientifically relevant. Scientific relevance has to be shown by other means than p-values. Furthermore, statistical nonsignificance
cannot be used to claim equivalence, as a p>0.05 merely reflects uncertainty. This incorrect use of p-values has evolved into an unfortunate standard
and become a substitute for scientific reasoning.
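The two misinterpretations above can be illustrated numerically. The following is a minimal sketch using a normal approximation; the effect sizes and sample sizes are made up for illustration:

```python
import math

def z_test_p(diff, sd, n_per_group):
    """Two-sided p-value from a z-test for a difference in means
    (normal approximation; illustrative only)."""
    se = sd * math.sqrt(2.0 / n_per_group)
    z = abs(diff) / se
    return math.erfc(z / math.sqrt(2.0))

# A clinically trivial effect becomes "significant" with a huge sample:
p_tiny_effect = z_test_p(diff=0.01, sd=1.0, n_per_group=1_000_000)

# A moderate effect stays "nonsignificant" with a small sample,
# which reflects uncertainty, not equivalence:
p_small_n = z_test_p(diff=0.3, sd=1.0, n_per_group=15)
```

The first p-value falls below 0.05 without the effect being practically important; the second exceeds 0.05 without demonstrating equivalence.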
3. What measure can be used to show the uncertainty of an estimated treatment effect?
Estimation uncertainty needs to be considered when the clinical relevance of an estimated effect is evaluated. The p-value cannot be used for
this purpose, as it measures the uncertainty of the relation between the null hypothesis and the data, not of the estimated effect size. The correct uncertainty
measure of an estimated effect is its confidence interval.
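A confidence interval conveys both the estimate and its uncertainty. The sketch below uses the normal approximation for a difference in means; the numbers are hypothetical:

```python
import math

def ci_mean_diff(diff, sd, n_per_group, z=1.96):
    """95% confidence interval for a difference in means
    (normal approximation; illustrative only)."""
    se = sd * math.sqrt(2.0 / n_per_group)
    return diff - z * se, diff + z * se

# With a very large sample, the interval is tight around a clinically
# trivial effect: statistically significant, but irrelevant.
lo, hi = ci_mean_diff(diff=0.01, sd=1.0, n_per_group=1_000_000)
```

Unlike a bare p-value, the interval shows directly that the whole range of plausible effects is too small to matter clinically.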
4. Why are odds ratios controversial?
The odds ratio is in some cases (e.g. in case-control studies) a relevant measure in itself, but in other cases (e.g. cohort studies) it is
used as an approximation of the relative risk of an exposure. The approximation is good when the baseline risk is low, but otherwise two similar
odds ratios can have different clinical interpretations (and two different odds ratios the same) because:
RR = OR/(1-R+OR*R) where
R = baseline risk, RR = relative risk, and OR = odds ratio. The clinical significance of a treatment effect cannot always be evaluated if the
studied effect is presented as an odds ratio. The problem can be avoided by using a statistical method that provides direct estimates of the
relative risk.
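The conversion formula above can be sketched in code. The odds ratios and baseline risks below are hypothetical, chosen to show how the same odds ratio maps to very different relative risks:

```python
def rr_from_or(odds_ratio, baseline_risk):
    """Convert an odds ratio to a relative risk:
    RR = OR / (1 - R + OR * R), where R is the baseline risk."""
    return odds_ratio / (1.0 - baseline_risk + odds_ratio * baseline_risk)

# The same OR corresponds to very different RRs at different baseline risks:
rr_rare = rr_from_or(3.0, 0.01)    # rare outcome: RR close to the OR
rr_common = rr_from_or(3.0, 0.40)  # common outcome: RR much smaller
```

With a baseline risk of 1% the relative risk is about 2.94, close to the odds ratio of 3; with a baseline risk of 40% it shrinks to about 1.67, which would be interpreted quite differently clinically.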
5. When analysing data, it is important to check that all continuous variables have Gaussian distributions.
No, some statistical methods, such as Student's t-test, are based on an underlying assumption of a Gaussian distribution, but why should all
continuous variables in a research project have a Gaussian distribution? Furthermore, the p-value from a distributional test is, like all other
p-values, a measure of uncertainty. It cannot directly show whether or not a variable has a Gaussian distribution. Moreover, in some cases it
is not the observed variables but a derived quantity that is assumed to have a Gaussian distribution, such as the residual of a linear model, and this
can have a Gaussian distribution even when the original variables do not.
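A small simulation sketch of this last point, assuming NumPy is available (all numbers are made up): the predictor is clearly non-Gaussian, yet the residuals of the linear model are Gaussian because the error term is.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# x is clearly non-Gaussian (a bimodal mixture) ...
x = np.concatenate([rng.normal(-3, 0.5, n // 2), rng.normal(3, 0.5, n // 2)])

# ... but the model's error term is Gaussian with sd 1:
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)

# Fit a straight line and compute the residuals:
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)
```

Testing x for normality would "fail" here, yet the model's distributional assumption is perfectly satisfied: the residuals have mean zero and standard deviation close to 1.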
6. What about non-parametric data?
First, a null hypothesis may or may not include assumptions about a parameter, and a non-parametric null hypothesis can often be tested using a
distribution-free test, but the term non-parametric has no specific implications for data. Second, distribution-free tests provide p-values but
not necessarily effect size estimates, and p-values are controversial (see 2 above), which means that such tests are not useful for evaluating
clinical relevance.
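One way to accompany a rank test's p-value with an effect size estimate is the Hodges-Lehmann estimator, the median of all pairwise differences between the two groups. A minimal sketch with made-up data:

```python
import itertools
import statistics

def hodges_lehmann(a, b):
    """Hodges-Lehmann shift estimate: the median of all pairwise
    differences (b - a). A distribution-free effect size estimate
    to report alongside a rank test's p-value."""
    return statistics.median(y - x for x, y in itertools.product(a, b))

# Hypothetical measurements from two groups:
shift = hodges_lehmann([1, 2, 3], [4, 6, 8])
```

Reporting such an estimate (with a confidence interval) addresses the clinical-relevance question that the p-value alone cannot.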
7. Why shouldn't I use Bonferroni corrections?
Multiplicity issues (related to the testing of multiple null hypotheses) are important to address in confirmatory studies, and one way is to use
a Bonferroni correction, i.e. by lowering the significance level by a factor of 1/m, where m is the number of tested null hypotheses. However,
to avoid subjectivity the adjustment should be pre-specified, and as it reduces the statistical power of the comparisons, it should
also be accounted for in the sample size calculation, which increases patient numbers and costs. Multiplicity problems can often be avoided
in the study design by careful endpoint definitions or solved by using closed test procedures or more effective adjustment methods such as Holm's
or Hochberg's methods. In addition, while the existence of multiplicity issues is a problem in confirmatory studies, this is not relevant in
exploratory or hypothesis-generating studies. Furthermore, the statistical analysis of observational studies needs to include validity
considerations, as selection and confounding bias cannot be prevented in the study design, which implies that detailed pre-specification is not
practically possible. Moreover, the strategy, common in laboratory studies, of Bonferroni correcting for the number of exposure groups but
ignoring that multiple endpoints are tested, does not solve the multiplicity problem.
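Holm's step-down method, mentioned above, is uniformly more powerful than the plain Bonferroni correction and can be sketched in a few lines (the p-values below are hypothetical):

```python
def holm_adjust(pvals):
    """Holm step-down adjusted p-values. Each ordered p-value is
    multiplied by the number of hypotheses not yet rejected, and
    monotonicity is enforced; adjusted values are capped at 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Three hypothetical p-values:
adj = holm_adjust([0.01, 0.04, 0.03])
```

For comparison, a Bonferroni correction would multiply every p-value by 3, giving 0.03, 0.12, and 0.09; Holm's method yields 0.03, 0.06, and 0.06, so it never rejects less than Bonferroni.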
8. I have always performed my lab experiments in triplicate, and now the statistical reviewer complains about n=3.
To be continued...
© Copyright 2019 Jonas Ranstam. All rights reserved.