Statistical terminology
February 21, 2025
Michael Healy, professor of medical statistics at the London School of Hygiene and Tropical Medicine, described clinical research in a Fisher Memorial Lecture in 1995 as "a largely amateur pursuit conducted by doctors". Whether or not this statement is true today, misuse of well-defined statistical terms is a sure way of appearing amateurish. This problem can easily be avoided by checking the definitions of the terms used. I recommend consulting The Oxford Dictionary of Statistical Terms (New York: Oxford University Press, 2003). Here are a few examples of the most common errors.
Tertiles, quartiles and quintiles are quantiles that divide sorted data into equal parts. Two tertiles are used to divide the data into three parts, three quartiles into four parts and four quintiles into five parts. However, medical publications are full of results presented with three tertiles, four quartiles and five quintiles, where the groups defined by the cut points (thirds, quarters and fifths) are mistaken for the quantiles themselves.
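The distinction between cut points and groups can be sketched in a few lines of Python. This is only an illustration: the sample values and the helper `group_index` are made up, and `statistics.quantiles` uses one of several conventions for computing quantiles.

```python
import statistics

# Hypothetical sample of 12 measurements, made up for illustration.
values = [2, 4, 4, 5, 7, 8, 9, 11, 12, 14, 15, 18]

# statistics.quantiles returns the n - 1 cut points that split the data
# into n parts of equal probability.
tertiles = statistics.quantiles(values, n=3)   # 2 cut points -> 3 groups
quartiles = statistics.quantiles(values, n=4)  # 3 cut points -> 4 groups
quintiles = statistics.quantiles(values, n=5)  # 4 cut points -> 5 groups


def group_index(x, cut_points):
    """Return 0 for the lowest group, 1 for the next one, and so on."""
    return sum(x > c for c in cut_points)


# Each observation belongs to one of three groups (thirds) defined by
# the two tertiles; the groups are not themselves tertiles.
thirds = [group_index(x, tertiles) for x in values]

print(len(tertiles), len(quartiles), len(quintiles))
print(thirds)
```

The lengths of the three lists are 2, 3 and 4, while the observations fall into 3, 4 and 5 groups respectively.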
The range and interquartile range describe the difference (a single value) between the largest and smallest values of a variable and between the third and first quartiles respectively, not the largest and smallest values or the first and third quartiles themselves (two values each).
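As a minimal sketch with made-up data, both quantities come out as one number each. The exact value of the interquartile range depends on the quartile convention used; `statistics.quantiles` with its default method is assumed here.

```python
import statistics

# Hypothetical sample of 12 measurements, made up for illustration.
values = [2, 4, 4, 5, 7, 8, 9, 11, 12, 14, 15, 18]

# The range is a single value: largest minus smallest.
value_range = max(values) - min(values)

# The interquartile range is also a single value: third quartile minus
# first quartile (the middle cut point is the median and is not needed).
q1, _median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

print(f"range = {value_range}")
print(f"IQR = {iqr}")
print(f"min to max: {min(values)} to {max(values)}")  # not the range
print(f"Q1 to Q3: {q1} to {q3}")                      # not the IQR
```

Reporting the two endpoints is perfectly reasonable, but they should then be labelled as minimum and maximum, or as first and third quartiles.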
A non-parametric hypothesis can be tested using a distribution-free test (often referred to as a non-parametric test), but it is nonsense to describe the data as being non-parametric.
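To illustrate that "distribution-free" describes the test and not the data, here is a rough sketch of an exact two-sided sign test, chosen only as a simple example of such a test. The function name `sign_test_p_value` and the paired differences are assumptions made for illustration.

```python
from math import comb


def sign_test_p_value(differences):
    """Two-sided exact sign test of the hypothesis that the median
    difference is zero."""
    nonzero = [d for d in differences if d != 0]  # ties carry no sign
    n = len(nonzero)
    positives = sum(d > 0 for d in nonzero)
    k = min(positives, n - positives)
    # Under the null hypothesis each sign is + or - with probability 1/2,
    # whatever the shape of the distribution of the differences.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical paired differences (after minus before), made up.
differences = [1.2, 0.4, -0.3, 2.1, 0.8, 1.5, -0.1, 0.9, 1.1, 0.6]
p_value = sign_test_p_value(differences)
print(p_value)
```

Only the signs of the differences enter the calculation, so no distributional form has to be assumed. The data themselves are ordinary numbers and are neither parametric nor non-parametric.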
Multiple regression analysis can be used to fit a multivariable model, that is, a model with more than one explanatory variable, but a multivariate model is based on the assumption of a multivariate probability distribution, which implies a statistical model with more than one response variable. Thus, a multivariate model, like a univariate model, can be univariable or multivariable.
The word 'correlation' may seem more scientific than the simpler 'relation', but 'correlation' is one of the most misused statistical terms, and 'relation' is often a more appropriate term because not all relations are correlations. Correlation, in its usual sense of the Pearson correlation coefficient, measures linear association, and even some closely related variables are not correlated.
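A small sketch shows how two variables can be perfectly related and still uncorrelated. The data are made up, and the helper `pearson_r` is a plain implementation of the Pearson correlation coefficient written for this illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# y is completely determined by x, but the relation is not linear.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)
```

Because the x values are symmetric around zero, the positive and negative contributions to the covariance cancel and the coefficient is zero, although knowing x tells you y exactly.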
Even worse, the use of trial-specific terminology such as primary endpoint, interim analysis, and intention-to-treat in an observational study can be interpreted as spin, an attempt to mislead the reader about the level of evidence of the results.