Most reports are probably written with the intention of being clear, coherent, and stringent. However, methodological misunderstandings cause many publications to be vague, ambiguous, and confusing. Misinterpretation of statistical significance plays a major role in this problem.
In modern medicine, research is usually based on a dataset. Statistical methods produce p-values, which are used to categorise findings as either significant or nonsignificant. Significance is then taken as a token of practical importance and nonsignificance as evidence of irrelevance, so nonsignificant findings are described as “no difference”. However, statistical significance and p-values were developed to evaluate the effects of sampling variation and cannot be used to measure clinical significance. Defining a minimal clinically significant difference requires clinical knowledge. For example, what is the smallest clinically significant effect of an antihypertensive therapy?
Whether a difference observed in a dataset is clinically as well as statistically significant depends on its size and on the uncertainty of its estimate. A confidence interval can be used for the evaluation. To claim clinical significance, the question is whether all clinically insignificant effects are excluded from the confidence interval. Conversely, if clinically significant effects are included in the interval, it would be a mistake to state “no difference”. A difference, whether clinically significant or insignificant, can be statistically significant or nonsignificant. Considering and reporting only statistical significance may be simple, but it is misleading.
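The confidence-interval reasoning can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `diff_ci` and `interpret`, the normal approximation for the interval, the minimal clinically significant difference of 5 mmHg, and the four sets of estimates and standard errors. None of these values come from a real study.

```python
from statistics import NormalDist


def diff_ci(estimate, se, level=0.95):
    """Normal-approximation confidence interval for an estimated difference."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se


def interpret(ci, mcid):
    """Classify a confidence interval statistically and clinically.

    mcid is the (hypothetical) minimal clinically significant difference;
    effects with absolute size >= mcid count as clinically significant.
    """
    low, high = ci
    # Statistically significant: the interval excludes zero.
    statistical = "significant" if (low > 0 or high < 0) else "nonsignificant"
    if low >= mcid or high <= -mcid:
        # All clinically insignificant effects are excluded from the interval.
        clinical = "clinically significant"
    elif -mcid < low and high < mcid:
        # All clinically significant effects are excluded from the interval.
        clinical = "clinically insignificant"
    else:
        # The interval contains both kinds of effect: no conclusion either way.
        clinical = "inconclusive"
    return statistical, clinical


MCID = 5.0  # hypothetical threshold in mmHg, chosen only for illustration

# Hypothetical (estimate, standard error) pairs, in mmHg.
examples = [(8.0, 1.0), (1.0, 0.4), (4.0, 3.0), (0.5, 1.0)]

for estimate, se in examples:
    ci = diff_ci(estimate, se)
    statistical, clinical = interpret(ci, MCID)
    print(f"{estimate:4.1f} (95% CI {ci[0]:5.2f} to {ci[1]:5.2f}): "
          f"{statistical}, {clinical}")
```

Under these assumed inputs, the second case is statistically significant but clinically insignificant, while the third is statistically nonsignificant and yet inconclusive, because its interval still contains clinically significant effects. Reporting the third case as “no difference” would therefore be a mistake.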
The ICMJE recommendations for manuscript preparation include the guideline, “Link the conclusions with the goals of the study but avoid unqualified statements and conclusions not adequately supported by the data. In particular, distinguish between clinical and statistical significance … When possible, quantify findings and present them with appropriate indicators of measurement error or uncertainty (such as confidence intervals). Avoid relying solely on statistical hypothesis testing, such as P values, which fail to convey important information about effect size and precision of estimates.”