🔬MetaLens AI
Education · 2026-04-10 · 7 min read

p-Values and Statistical Significance in Medical Research

What Is a p-Value?

The p-value is one of the most widely used and widely misunderstood statistics in medical research. The formal definition: the p-value is the probability of observing results at least as extreme as those found, assuming the null hypothesis is true. The null hypothesis is usually "there is no effect" or "the two treatments are equal." A small p-value means: if there truly were no effect, it would be very unlikely to see results this extreme by chance. A p-value of 0.03 means: if the null hypothesis were true, you'd see results this extreme or more extreme only 3% of the time by chance.
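To make the definition concrete, here is a minimal sketch (not from the article) of how a two-sided p-value falls out of a z-statistic under a standard normal null; a z of about 2.17 corresponds to the p = 0.03 mentioned above:

```python
from math import erfc, sqrt

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-statistic under a standard normal null:
    the probability of a result at least this extreme if the null is true."""
    return erfc(abs(z) / sqrt(2))

# A z-statistic of about 2.17 gives the p = 0.03 from the text.
print(round(two_sided_p(2.17), 3))  # → 0.03
```

The familiar z = 1.96 cutoff recovers p = 0.05, which is where the conventional threshold comes from.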

What p < 0.05 Does NOT Mean

The p < 0.05 threshold is deeply embedded in medical research, but it's often interpreted incorrectly.

**p < 0.05 does NOT mean:**
- There is a 95% chance the result is correct
- The treatment definitely works
- The effect is clinically meaningful
- The study will replicate
- The null hypothesis is false

**p < 0.05 DOES mean:**
- If the null hypothesis were true, results this extreme would occur less than 5% of the time by chance
- The finding meets an arbitrary threshold for "statistical significance"

The 0.05 threshold was chosen by Ronald Fisher in the 1920s as a rule of thumb, not a fundamental law of nature.
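One way to internalize what p < 0.05 does mean: if the null hypothesis is always true, roughly 5% of tests will still come out "significant". A minimal simulation sketch (hypothetical setup, standard library only):

```python
import random
from math import erfc, sqrt

random.seed(42)

def two_sided_p(z):
    """Two-sided p-value for a z-statistic under a standard normal null."""
    return erfc(abs(z) / sqrt(2))

# Simulate 10,000 studies in which the null hypothesis is TRUE:
# each draws n = 50 observations from a no-effect (standard normal) population.
n, trials = 50, 10_000
false_positives = 0
for _ in range(trials):
    sample_mean = sum(random.gauss(0, 1) for _ in range(n)) / n
    z = sample_mean * sqrt(n)  # z-statistic for testing mean = 0
    if two_sided_p(z) < 0.05:
        false_positives += 1

# Roughly 5% of null studies reach "significance" by chance alone.
print(false_positives / trials)
```

The printed fraction lands near 0.05, which is exactly the false-positive rate the threshold is designed to permit.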

Statistical Significance vs. Clinical Significance

A statistically significant result is not necessarily clinically meaningful.

**Example:** A large trial with 50,000 patients finds that a new drug reduces blood pressure by 1 mmHg (p = 0.0001). This is highly statistically significant but clinically meaningless: a 1 mmHg difference has no practical impact on cardiovascular outcomes. Conversely, a small trial with 30 patients finds a drug reduces tumor size by 40% (p = 0.08). This misses the 0.05 threshold but may represent a genuinely important effect that deserves further investigation.

Always ask:
- What is the effect size?
- Is it clinically meaningful?
- What is the confidence interval?
- Does it include the minimum clinically important difference?
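The large-trial/small-trial contrast can be sketched with a normal approximation. The standard deviations and per-arm sizes below are illustrative assumptions, not figures from the article; the point is only that sample size, not clinical importance, drives the p-value:

```python
from math import erfc, sqrt

def two_sided_p(z):
    """Two-sided p-value for a z-statistic under a standard normal null."""
    return erfc(abs(z) / sqrt(2))

def p_for_mean_difference(diff, sd, n_per_arm):
    """p-value for a difference in means between two equal arms
    (normal approximation; sd is an assumed per-patient standard deviation)."""
    se = sd * sqrt(2 / n_per_arm)
    return two_sided_p(diff / se)

# Hypothetical: a 1 mmHg blood-pressure drop (assumed SD 15 mmHg) with
# 25,000 patients per arm is "highly significant" despite being trivial.
p_large = p_for_mean_difference(diff=1.0, sd=15.0, n_per_arm=25_000)

# Hypothetical: a much larger effect measured in only 15 patients per arm
# can still miss the p < 0.05 threshold.
p_small = p_for_mean_difference(diff=8.0, sd=12.0, n_per_arm=15)
```

With enough patients, almost any nonzero difference becomes "significant"; with too few, even a substantial one does not.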

Confidence Intervals Are More Informative

A 95% confidence interval (CI) tells you more than a p-value alone. If the 95% CI for an odds ratio is 1.2 to 3.4:
- The point estimate is roughly 2.0 (the geometric mean of the limits, since OR intervals are symmetric on the log scale)
- You can be 95% confident the true effect lies between 1.2 and 3.4
- Since 1.0 (no effect) is excluded, the result is statistically significant

Confidence intervals communicate:
- The direction of the effect
- The magnitude of the effect
- The precision of the estimate
- Whether the effect is clinically meaningful

A CI that stretches from 1.1 to 12.0 is technically significant because 1.0 is excluded, but the huge range tells you the estimate is very imprecise.
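The odds-ratio arithmetic above can be sketched in a few lines. This is an illustrative helper (the function name is ours, not a library API); it recovers the point estimate from the CI limits and checks whether 1.0 is excluded:

```python
from math import sqrt, log

def summarize_or_ci(lower, upper):
    """Summarize an odds-ratio confidence interval.
    OR intervals are symmetric on the log scale, so the point
    estimate is the geometric mean of the limits."""
    estimate = sqrt(lower * upper)
    significant = lower > 1.0 or upper < 1.0  # is 1.0 (no effect) excluded?
    log_width = log(upper / lower)            # rough measure of precision
    return estimate, significant, log_width

est, sig, width = summarize_or_ci(1.2, 3.4)        # est ≈ 2.02, significant
est2, sig2, width2 = summarize_or_ci(1.1, 12.0)    # significant but imprecise
```

Both intervals exclude 1.0, but the second is far wider on the log scale, flagging a much less precise estimate.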

Multiple Comparisons and the Problem of P-Hacking

If you run 20 statistical tests and use p < 0.05 as your threshold, you'd expect 1 "significant" result purely by chance, even if nothing is actually happening. This is called the multiple comparisons problem, and it leads to p-hacking: running many analyses and selectively reporting the ones that reach p < 0.05.

To address this:
- **Bonferroni correction**: Divide the significance threshold by the number of comparisons (e.g., 0.05/10 = 0.005)
- **Pre-registration**: Commit to your primary outcome before collecting data
- **False Discovery Rate (FDR)**: Controls the expected proportion of false positives among the results declared significant

When reading a study with multiple outcomes, check whether the primary outcome was pre-specified and whether corrections for multiple comparisons were applied.
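The arithmetic behind the multiple comparisons problem is short enough to show directly. With 20 independent tests and all nulls true, the chance of at least one false positive is about 64%, and the Bonferroni correction is simply a tighter per-test threshold:

```python
def familywise_error(k, alpha=0.05):
    """Probability of at least one false positive across k independent
    tests at level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k

def bonferroni_threshold(k, alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate <= alpha."""
    return alpha / k

fwer_20 = familywise_error(20)       # ≈ 0.64: a false positive is more likely than not
bonf_10 = bonferroni_threshold(10)   # 0.005, the example from the text
```

This is why a study reporting 20 outcomes with one "significant" finding at p = 0.04 should be read very skeptically.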

Beyond p-Values: Effect Sizes

The American Statistical Association and many journals now recommend moving beyond binary p < 0.05 decisions and reporting effect sizes with confidence intervals.

Common effect size measures:
- **Cohen's d**: Standardized mean difference (d = 0.2 small, 0.5 medium, 0.8 large)
- **Odds Ratio (OR)**: Ratio of odds of outcome in exposed vs. unexposed
- **Relative Risk (RR)**: Ratio of risk in treated vs. control group
- **Absolute Risk Reduction (ARR)**: Difference in event rates (clinically most intuitive)
- **Number Needed to Treat (NNT)**: 1/ARR — how many patients need treatment for one to benefit

MetaLens AI extracts and displays these effect sizes from published abstracts, giving you a richer picture than p-values alone.
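The absolute measures in the list above follow directly from raw event counts. A sketch with hypothetical trial numbers (7% event rate on treatment vs. 10% on control):

```python
def effect_sizes(events_treated, n_treated, events_control, n_control):
    """ARR, RR, and NNT from raw two-arm trial counts."""
    risk_t = events_treated / n_treated
    risk_c = events_control / n_control
    arr = risk_c - risk_t   # absolute risk reduction
    rr = risk_t / risk_c    # relative risk
    nnt = 1 / arr           # number needed to treat for one to benefit
    return arr, rr, nnt

# Hypothetical trial: 70/1000 events on treatment vs. 100/1000 on control.
arr, rr, nnt = effect_sizes(70, 1000, 100, 1000)
# ARR = 0.03, RR = 0.7, NNT ≈ 33
```

Note how the framing changes perception: "30% relative risk reduction" sounds dramatic, while "treat about 33 patients for one to benefit" conveys the same result in clinically concrete terms.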

Ready to try AI-powered meta-analysis?

Try MetaLens AI Free