Statistical significance

A common but flawed view of medical research is that all you need is a data set and the ability to run statistical tests. In the past, when tests had to be calculated by hand or on a mainframe, it took statisticians to do the tests. Today, with personal computers and easy-to-use software, anyone can calculate p-values.

A p-value is the probability of drawing a sample with a characteristic that is at least as extreme as a certain value (for example, an apparent effect observed in a particular sample of patients) from a hypothetical population with a property defined by the hypothesis being tested (in this example, that no such effect exists). The smaller the p-value, the less likely it is that the sample was drawn from such a population.
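This definition can be illustrated with a small simulation sketch, assuming a hypothetical null population that is standard normal with mean zero (the function name and parameters are illustrative, not from any particular statistics package): we repeatedly draw samples from that population and count how often the sample mean is at least as extreme as the one actually observed.

```python
import random
import statistics

random.seed(1)

def simulated_p_value(observed_mean, n, n_simulations=10_000):
    """Estimate a two-sided p-value by repeatedly sampling from a
    hypothetical null population (standard normal, mean 0) and counting
    how often the sample mean is at least as extreme as the observed one."""
    extreme = 0
    for _ in range(n_simulations):
        sample = [random.gauss(0, 1) for _ in range(n)]
        if abs(statistics.mean(sample)) >= abs(observed_mean):
            extreme += 1
    return extreme / n_simulations

# A sample mean far from 0 is rarely reproduced under the null
# hypothesis, so its p-value is small; a mean near 0 is common
# under the null, so its p-value is large.
print(simulated_p_value(0.8, n=25))
print(simulated_p_value(0.1, n=25))
```

Note that the simulation only answers the question the definition poses: how often would a population with no effect produce a sample this extreme? It says nothing about whether the effect matters clinically.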

A p-value of less than 0.05 is usually described as statistically significant and interpreted as evidence against the tested hypothesis: the finding is considered unlikely to be explained by sampling variation alone, and rejecting the hypothesis is taken to have empirical support.

However, a statistically significant finding is not necessarily clinically relevant. Moreover, inventing biologically plausible explanations post hoc for statistically significant findings, after having screened a dataset for low p-values (a practice known as data dredging), is meaningless, and it is misleading if the tests are then presented as pre-specified. Unfortunately, many journals will publish this type of research as long as the APC (article processing charge) is paid.
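Why screening for low p-values is so treacherous can be shown with a short sketch (a simplified setup of my own, using a crude z-approximation rather than any specific published method): generate datasets of pure noise, test many unrelated outcomes between two groups, and see how often at least one comparison reaches p < 0.05 by chance alone.

```python
import math
import random

random.seed(0)

def z_test_p(a, b):
    """Two-sided p-value for the difference in means of two equal-size
    samples, using a simple z approximation."""
    n = len(a)
    diff = sum(a) / n - sum(b) / n
    # Sum of the two sample variances (population form, adequate here).
    var = (sum(x * x for x in a) / n - (sum(a) / n) ** 2
           + sum(x * x for x in b) / n - (sum(b) / n) ** 2)
    z = diff / math.sqrt(var / n)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def smallest_p_after_dredging(n_outcomes=20, n=50):
    """Test n_outcomes unrelated outcomes between two groups of pure
    noise and return the smallest p-value found."""
    return min(
        z_test_p([random.gauss(0, 1) for _ in range(n)],
                 [random.gauss(0, 1) for _ in range(n)])
        for _ in range(n_outcomes)
    )

hits = sum(smallest_p_after_dredging() < 0.05 for _ in range(500))
print(f"{hits / 500:.0%} of null datasets yield at least one p < 0.05")
```

With 20 independent tests at the 0.05 level, the chance of at least one spurious "significant" finding is 1 − 0.95²⁰ ≈ 64%, even though no real effect exists anywhere in the data.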

There is more to a good study than p-values and statistical significance. Developing a study design, analysis plan, and data collection procedure to investigate a particular phenomenon is more challenging than most authors and reviewers realize. Just searching for p-values is a poor substitute for scientific reasoning.
