Unlike art, which focuses on conveying beauty or evoking emotion and is judged by subjective criteria, science strives for truth and demands precise, reproducible results. Empirical sciences rely on observation and experimentation to formulate, test and confirm hypotheses using objective evidence. Precise language, often including descriptive statistics, statistical methodology and statistical terminology, is used to communicate research results.

Observation and experimentation are usually performed using sampling. For example, the effect of a new antihypertensive drug can be evaluated using a sample of 250 patients. Descriptive statistics in tables and figures can describe what has been observed in the experiment. The reporting problem is to present data clearly and economically while maintaining all relevant information. However, there is an additional problem when interpreting what has been observed. The trial aims to estimate the size of the drug’s effect among all potential patients, not just among the randomised ones. However, a heterogeneous population, such as all hypertensive patients (including future ones), can generate many samples with different characteristics. This phenomenon is known as sampling variation. The consequence is that the findings from a single sample must be interpreted cautiously; what the sample can tell us about the properties of the population is uncertain.
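Sampling variation can be illustrated with a small simulation. The sketch below is purely hypothetical: it assumes a population of blood-pressure reductions with a mean of 10 mmHg and a standard deviation of 12 mmHg (illustrative values, not data from any trial) and shows that repeated samples of 250 patients yield different estimates of the same population mean.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: each patient's true reduction in systolic blood
# pressure (mmHg) under the drug; mean 10 and SD 12 are illustrative values.
population = [random.gauss(10, 12) for _ in range(100_000)]

# Draw several samples of 250 patients and estimate the mean effect in each.
sample_means = [
    statistics.mean(random.sample(population, 250)) for _ in range(5)
]
print([round(m, 1) for m in sample_means])
# The estimates differ even though every sample comes from the same
# population: this spread is sampling variation.
```

Each run of the trial, so to speak, would report a somewhat different effect; no single sample pins down the population value exactly.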

Fortunately, this uncertainty can be quantified using statistics, and it should be described whenever the effect observed in the trial is presented. A confidence interval is often used for this purpose. Alternatively, a p-value can be presented for the statistical test of the null hypothesis that the new drug is no better than a placebo. The outcome of the test is then interpreted as either significant or nonsignificant. However, statistical significance says nothing about clinical relevance: p-values depend on sample size, and a minute biological effect can be statistically significant in a large sample.
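The dependence of the p-value on sample size can be made concrete. The sketch below uses a simple normal approximation for the difference between two equal-sized groups (an assumption, not a prescription for how such a trial should be analysed): the same 1 mmHg effect is nonsignificant with 50 patients per group but highly significant with 5000.

```python
from math import sqrt
from statistics import NormalDist

def summarise(effect, sd, n):
    """95% CI (normal approximation) and two-sided p-value for H0: no effect,
    given an observed mean difference, a common SD, and n patients per group.
    Assumes equal-sized drug and placebo groups."""
    se = sd * sqrt(2 / n)            # standard error of the difference
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    ci = (effect - z * se, effect + z * se)
    p = 2 * (1 - NormalDist().cdf(abs(effect) / se))
    return ci, p

# A minute 1 mmHg effect (SD 12 mmHg): the effect size is unchanged,
# only the sample size differs.
print(summarise(1.0, 12, 50))    # wide CI, p well above 0.05
print(summarise(1.0, 12, 5000))  # narrow CI, p far below 0.05
```

The confidence interval, by contrast, keeps the size of the effect in view: it narrows with the larger sample but remains centred on the same 1 mmHg difference.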

Conversely, a clinically relevant effect may be statistically nonsignificant in a small sample. Thus, the confidence interval is more helpful than the p-value for describing the clinical relevance of a finding. It can be used to evaluate both whether a finding is clinically significant (when a clinically relevant effect size has been defined) and whether it is statistically significant (when a significance level has been defined).
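This dual use of the confidence interval can be sketched as follows. The 5 mmHg threshold and the observed values are illustrative assumptions only; in practice, the clinically relevant effect size must be defined in advance for the trial at hand.

```python
from statistics import NormalDist

def interpret(effect, se, clinically_relevant=5.0, alpha=0.05):
    """Classify a finding from its confidence interval rather than a p-value.
    The 5 mmHg threshold is an illustrative choice, not a standard."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lo, hi = effect - z * se, effect + z * se
    statistically = lo > 0 or hi < 0        # CI excludes "no effect"
    clinically = lo >= clinically_relevant  # even the low end is relevant
    return (lo, hi), statistically, clinically

# Illustrative finding: an observed 8 mmHg reduction with SE 1.2 mmHg.
ci, stat_sig, clin_sig = interpret(8.0, 1.2)
print(ci, stat_sig, clin_sig)
```

A p-value alone would only support the first of these two judgements; the interval supports both at once.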

It is crucial to understand that the interpretation of statistical significance depends on the type of hypothesis tested. Some hypotheses are tested for confirmation, others for a wide range of exploratory reasons. A pre-defined analysis strategy for addressing the relevant issues is necessary in confirmatory trials, and authors should explain the analysis strategies used in exploratory evaluations. A presentation that includes p-values but no explanation of the analysis strategy followed may be of little help to the reader trying to interpret the scientific value of the reported findings.