📊 Medical Statistics: From Population to Inference

An interactive guide built around clinical examples. Every concept is illustrated with data your patients will actually generate: blood pressure, HbA1c, drug trials.

For medical students & junior clinicians

The Population: N = 5,000 patients

🏥 Clinical Context
You are a researcher studying systolic blood pressure (SBP) in adult hypertensive patients registered at hospitals across the country. The full population is 5,000 patients. Each dot below is one patient.

In practice, you can never measure everyone: there are too many patients and too little time and budget. This is why we sample. Choose a different variable to see how real clinical distributions look.

The population has true, fixed values called parameters, denoted by Greek letters: μ (mu) = true mean, σ (sigma) = true standard deviation.
These are the "ground truth" we are always trying to estimate.

Population Parameters: the ground truth

🏥 Clinical Context
We can see the true parameters here because we generated this simulated population. In a real study, μ is unknown: if we already knew the true mean BP of every hypertensive patient in the country, there would be nothing to study.
Parameter = a numerical summary of the entire population.
μ = true population mean  |  σ = true population SD  |  N = population size

Parameters are fixed constants: they do not change when you repeat a study. Only our estimates of them change.
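
As a minimal sketch of what "parameters of a simulated population" means, the snippet below generates a hypothetical population of 5,000 SBP readings and computes μ and σ from every value. The centre of 150 mmHg, the spread of 15 mmHg and the normal shape are illustrative assumptions, not values taken from this guide.

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulated population is reproducible

# Hypothetical population: 5,000 SBP readings in mmHg.
# The centre (150) and spread (15) are illustrative assumptions.
N = 5000
population = [random.gauss(150, 15) for _ in range(N)]

mu = statistics.fmean(population)      # population mean (a parameter)
sigma = statistics.pstdev(population)  # population SD: divides by N, not N - 1

print(f"N = {N}, mu = {mu:.1f} mmHg, sigma = {sigma:.1f} mmHg")
```

Because the whole population is in memory, `mu` and `sigma` here are exact for that population. A real study only ever sees a sample of it.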

Random Sampling: your study cohort

🏥 Clinical Context
You design an RCT and recruit n patients from the population. Orange dots = enrolled patients. Blue dots = patients not in your study. Adjust n and see how coverage changes.
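Random sampling gives every patient in the population an equal chance of selection, which protects against selection bias.
Note that randomisation in a clinical trial is a separate step: it randomly allocates the patients you have already enrolled to treatment arms, so that the arms are comparable. Without random sampling, your sample may systematically differ from the population (e.g. only sicker patients presenting to your clinic).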
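
The sketch below illustrates simple random sampling from a hypothetical registry of 5,000 patient IDs, followed by random allocation of the enrolled patients to two arms. The ID list, the arm names and the 1:1 split are assumptions made for illustration only.

```python
import random

random.seed(1)

patient_ids = list(range(5000))  # hypothetical registry of 5,000 patient IDs
n = 50

# Random sampling: every patient has the same chance of being enrolled.
enrolled = random.sample(patient_ids, n)  # sampling without replacement
enrolled_set = set(enrolled)
not_enrolled = [pid for pid in patient_ids if pid not in enrolled_set]

# Random allocation: shuffle the enrolled patients, then split into two arms.
shuffled = enrolled[:]
random.shuffle(shuffled)
treatment_arm = shuffled[: n // 2]
control_arm = shuffled[n // 2 :]

print(f"Enrolled {len(enrolled)} of {len(patient_ids)} patients")
print(f"Treatment arm: {len(treatment_arm)}, control arm: {len(control_arm)}")
```

Sampling decides who enters the study. Allocation decides which arm each enrolled patient joins. The two steps answer different questions.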

Sample Statistics vs Population Parameters: estimation

🏥 Clinical Context
You measure SBP in your recruited patients and compute the sample mean x̄. This is your estimate of the true population mean μ. Every published clinical trial does exactly this.

Population: true parameters

Sample (n=50): estimates

x̄ ≈ μ: the sample mean estimates the true mean  |  s ≈ σ: the sample SD estimates the true SD
Larger n → better estimates. This is why underpowered trials (tiny n) are unreliable: their x̄ can be far from μ by chance alone.
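
To make the parameter-versus-statistic distinction concrete, the following sketch draws samples of increasing size from a hypothetical SBP population and compares x̄ and s with μ and σ. The population values (centre 150, spread 15) and the sample sizes are illustrative assumptions.

```python
import random
import statistics

random.seed(7)

# Hypothetical SBP population (mmHg); 150 and 15 are illustrative assumptions.
population = [random.gauss(150, 15) for _ in range(5000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

estimates = {}
for n in (10, 50, 500):
    sample = random.sample(population, n)
    x_bar = statistics.fmean(sample)  # sample mean, estimates mu
    s = statistics.stdev(sample)      # sample SD (divides by n - 1), estimates sigma
    estimates[n] = (x_bar, s)
    print(f"n = {n:3d}: x_bar = {x_bar:6.1f} (mu = {mu:.1f}), "
          f"s = {s:5.1f} (sigma = {sigma:.1f})")
```

A single small sample can land close to μ by luck, so one run proves little. The tendency for larger n to give closer estimates only becomes visible across repeated draws.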

Sampling Error: why studies disagree

🏥 Clinical Context
Two hospitals run identical studies on the same antihypertensive drug (same protocol, same population) yet report slightly different mean SBP reductions. This is not researcher error. It is sampling error: random variation between samples.
Each line = one study's sample mean. Dashed orange = true population μ.
Standard Error (SE) = σ / √n, which quantifies how much x̄ varies across studies.
Increase n: notice the sample means cluster tighter around μ. Bigger trials = less sampling error.
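
One way to check the SE formula is to simulate many "studies" of the same size and measure how much their sample means actually vary. The sketch below does this for a hypothetical SBP population. The helper name `empirical_se`, the population values and the number of simulated studies are assumptions.

```python
import math
import random
import statistics

random.seed(3)

# Hypothetical SBP population (mmHg); 150 and 15 are illustrative assumptions.
population = [random.gauss(150, 15) for _ in range(5000)]
sigma = statistics.pstdev(population)

def empirical_se(n, studies=2000):
    """SD of the sample means across many simulated studies of size n."""
    # Sampling with replacement keeps sigma / sqrt(n) exact
    # (no finite-population correction is needed).
    means = [statistics.fmean(random.choices(population, k=n))
             for _ in range(studies)]
    return statistics.stdev(means)

for n in (10, 30, 100):
    theoretical = sigma / math.sqrt(n)
    print(f"n = {n:3d}: theoretical SE = {theoretical:.2f}, "
          f"simulated SE = {empirical_se(n):.2f}")
```

The simulated and theoretical values should agree closely, and both shrink in proportion to 1/√n: quadrupling n only halves the SE.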

Central Limit Theorem: why we can use t-tests on non-normal data

🏥 Clinical Context
HbA1c values in a diabetic population are right-skewed, not normally distributed. Yet clinical trials still use t-tests on HbA1c. How? Because the CLT tells us that, for a sufficiently large sample, the distribution of sample means is approximately normal even when the raw data are not.

Original population (skewed)

Distribution of sample means x̄

CLT: as n increases, the distribution of x̄ approaches Normal(μ, σ²/n), regardless of the original distribution's shape (provided its variance is finite).
Rule of thumb: n ≥ 30 is usually sufficient, although heavily skewed data may need more. This underpins nearly every parametric test you will use in clinical research (t-test, ANOVA, regression).
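
The CLT can be checked numerically by measuring skewness, which is 0 for a symmetric distribution. The sketch below builds a hypothetical right-skewed population (a floor of 5.5 plus an exponential tail, chosen only to resemble HbA1c loosely) and compares the skewness of the raw values with that of sample means. The helper names `skewness` and `sample_means` are assumptions.

```python
import random
import statistics

random.seed(11)

# Hypothetical right-skewed "HbA1c-like" values: floor of 5.5 plus an exponential tail.
population = [5.5 + random.expovariate(1 / 1.5) for _ in range(5000)]

def skewness(values):
    """Moment-based skewness: 0 for a symmetric distribution, > 0 for a right skew."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

def sample_means(n, draws=2000):
    """Means of `draws` independent samples of size n (drawn with replacement)."""
    return [statistics.fmean(random.choices(population, k=n)) for _ in range(draws)]

print(f"raw data: skewness = {skewness(population):.2f}")
for n in (5, 30, 100):
    print(f"means of n = {n:3d}: skewness = {skewness(sample_means(n)):.2f}")
```

The skewness of the sample means should move towards 0 as n grows, which is the CLT at work. How quickly it gets there depends on how skewed the raw data are.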

Confidence Intervals: what every paper reports

🏥 Clinical Context
The SPRINT trial reported that intensive BP control reduced SBP by 14.8 mmHg (95% CI: 14.3 – 15.3). You read this in every paper, but what does it actually mean?
Formula: x̄ ± z × (σ/√n), where z = 1.96 for a 95% CI. In practice σ is rarely known, so the sample SD s and a t critical value replace σ and z.

Correct interpretation: "If we repeated this study 100 times, about 95 of the resulting intervals would be expected to contain the true μ."
Common misconception: It does NOT mean "there is a 95% probability that μ lies in this interval." μ is fixed: it either is or isn't in the interval. The probability refers to the procedure, not this particular interval.

A narrow CI = more precise estimate = larger n or smaller σ. When you see wide CIs in a paper, the estimate is imprecise, most often because the study was small (and likely underpowered) or the outcome was highly variable.
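
The "repeat the study many times" interpretation can be simulated directly. The sketch below draws 1,000 hypothetical studies of n = 40 from a simulated SBP population, builds a 95% CI for each using the known-σ formula, and counts how many intervals contain μ. The population values, the number of studies and the helper name `confidence_interval` are assumptions.

```python
import math
import random
import statistics

random.seed(5)

# Hypothetical SBP population (mmHg); 150 and 15 are illustrative assumptions.
population = [random.gauss(150, 15) for _ in range(5000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)
z = statistics.NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96

def confidence_interval(sample):
    """x_bar +/- z * sigma / sqrt(n), treating sigma as known."""
    x_bar = statistics.fmean(sample)
    half_width = z * sigma / math.sqrt(len(sample))
    return x_bar - half_width, x_bar + half_width

studies, n, covered = 1000, 40, 0
for _ in range(studies):
    sample = random.choices(population, k=n)  # one simulated study
    low, high = confidence_interval(sample)
    if low <= mu <= high:
        covered += 1

coverage = covered / studies
print(f"{covered} of {studies} intervals contain mu (coverage = {coverage:.1%})")
```

Coverage should come out close to 95%, but each individual interval either contains μ or it does not. That is why the 95% describes the procedure and not any single interval.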

Hypothesis Testing & the p-value: the most misunderstood concept in medicine

🏥 Clinical Context
You run an RCT: Drug A vs Placebo for SBP reduction. After 6 months, the Drug A group has a lower mean SBP. But could this difference be due to chance alone? This is what hypothesis testing answers.
H₀: Null Hypothesis
The drug has no effect. Any observed difference is due to chance alone.
"Drug A does not change SBP."
H₁: Alternative Hypothesis
The drug has a real effect on the population.
"Drug A reduces mean SBP."
p-value = probability of observing a difference at least as large as the one seen, if H₀ were true.

p < 0.05 → "statistically significant": we reject H₀ (by convention).
p ≥ 0.05 → we fail to reject H₀: insufficient evidence for an effect.

⚠️ p < 0.05 does NOT mean:
  • The drug is clinically important (a 1 mmHg reduction can be "significant" with n=10,000)
  • H₀ has less than a 5% probability of being true (the p-value is calculated assuming H₀ is true, so it cannot be the probability of H₀)
  • The result will replicate

Always report effect size + CI + p-value together.
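
The definition of the p-value can be turned into code with a permutation test. If H₀ is true, the group labels are interchangeable, so we shuffle them many times and count how often the shuffled difference is at least as large as the observed one. This is a swapped-in illustration, not the t-test a trial report would normally use. The simulated data (a true 5 mmHg effect, n = 40 per arm) and the helper name `permutation_p_value` are assumptions.

```python
import random
import statistics

random.seed(2024)

# Hypothetical trial data (mmHg): the true effect of 5 and n = 40 per arm are assumptions.
n = 40
placebo = [random.gauss(150, 15) for _ in range(n)]
drug_a = [random.gauss(145, 15) for _ in range(n)]

def permutation_p_value(a, b, permutations=5000):
    """Two-sided p-value: share of label shuffles with |difference| >= observed."""
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = a + b  # new list, so the inputs are not modified
    count = 0
    for _ in range(permutations):
        random.shuffle(pooled)  # under H0 the group labels are interchangeable
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (count + 1) / (permutations + 1)

effect = statistics.fmean(placebo) - statistics.fmean(drug_a)
p = permutation_p_value(placebo, drug_a)
print(f"Observed reduction = {effect:.1f} mmHg, permutation p = {p:.3f}")
```

The printed effect size is the clinically meaningful quantity. The p-value only says how surprising that effect would be if the drug did nothing.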

Parametric vs Non-parametric Tests: when normal rules break down

🏥 Clinical Context
Many clinical variables are not normally distributed: length of hospital stay, pain scores (1–10), cytokine levels, tumour sizes. Using a t-test on these can violate its assumptions, especially in small samples. Rank-based non-parametric tests do not assume a normal distribution; they work on ranks instead of raw values.

Choose a distribution shape and sample size below. See how the two groups compare visually, and watch which test gives the right answer.


Group A (Control)

Group B (Treatment)

📋 Which test should you use?

Situation  |  Parametric test  |  Non-parametric alternative  |  When to use non-parametric
⚠️ Common mistake in clinical papers: Using a t-test on pain scores, Likert scales, or heavily skewed lab values without checking normality. Always plot your data first: a histogram or Q-Q plot reveals the shape.

Rule of thumb: If n ≥ 30 per group, the CLT usually rescues you even for skewed data (use a t-test). If n < 30 and the data are clearly non-normal → use a non-parametric test. For ordinal data, use a non-parametric test regardless of n.
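
To show what "working on ranks" means, here is a minimal sketch of the Mann-Whitney U test, the usual rank-based alternative to the two-sample t-test. It uses the normal approximation with no tie or continuity correction, so its p-values are rough for small or heavily tied samples. The length-of-stay data and the helper names `average_ranks` and `mann_whitney_u` are assumptions for illustration.

```python
import math
import statistics

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Return (U, two-sided p) via the normal approximation (no tie correction)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u  # z <= 0 because u is the smaller U
    return u, min(1.0, 2 * statistics.NormalDist().cdf(z))

# Hypothetical length of stay in days; one control patient stayed 30 days.
control = [3, 4, 4, 5, 6, 7, 9, 30]
treatment = [2, 2, 3, 3, 4, 4, 5, 6]
u, p = mann_whitney_u(control, treatment)
print(f"U = {u}, approximate p = {p:.3f}")

# Making the outlier ten times larger leaves U unchanged: only its rank matters.
u_extreme, _ = mann_whitney_u([3, 4, 4, 5, 6, 7, 9, 300], treatment)
print(f"U with a 300-day outlier = {u_extreme}")
```

Because the test only sees ranks, an extreme value carries no more weight than any other largest observation. This is the property that makes rank tests attractive for skewed clinical data.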