Statistical Model Diagnostics

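An interactive guide to five of the diagnostic plots that plot() produces for a linear model in R. Plots 1 to 3 check regression assumptions (linearity, constant variance, normally distributed residuals); plots 4 and 5 flag observations with outsized influence on the fit. Calling plot(model) with no which argument shows plots 1, 2, 3 and 5; the Cook's distance plot must be requested with which=4.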


Residuals vs Fitted  plot(model, which=1)

Checks two assumptions: linearity (is the relationship actually linear?) and homoscedasticity (constant variance of errors). Points should scatter randomly around the horizontal zero line — no pattern.

What to look for:
• Random cloud around the dashed zero line = assumptions met ✅
• Curved/U-shaped smoothed line = non-linearity → consider polynomial terms or log transformation
• Funnel shape (spread widens as fitted values increase) = heteroscedasticity → use heteroscedasticity-robust standard errors or transform the outcome
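A minimal sketch of the curved-pattern case, using simulated data (the variables x and y and the model names are hypothetical, not from the guide). The true relationship is quadratic, so the straight-line fit should leave a U-shaped smoother that largely disappears once a polynomial term is added.

```r
# Simulated data (assumption for illustration): the true relationship is quadratic
set.seed(1)
x <- runif(100, 0, 10)
y <- 2 + 0.5 * x^2 + rnorm(100)

fit_linear <- lm(y ~ x)            # misspecified: straight line only
fit_quad   <- lm(y ~ x + I(x^2))   # adds the polynomial term

# Residuals vs Fitted for both models, side by side
op <- par(mfrow = c(1, 2))
plot(fit_linear, which = 1)  # smoother should bend into a U shape
plot(fit_quad,   which = 1)  # smoother should sit close to the zero line
par(op)
```

Comparing the two panels is usually more informative than judging a single plot in isolation.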

Normal Q-Q Plot  plot(model, which=2)

Checks whether residuals are approximately normally distributed. Points should follow the diagonal reference line closely. Minor deviations at the extremes are usually acceptable.

What to look for:
• Points follow the diagonal = normality assumption met ✅
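• S-shape (both tails curve away from the line) = tails differ from the normal distribution; lower points falling below the line and upper points rising above it indicate heavy tails or outliers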
• Points bend upward at the right = right-skewed residuals → consider log transformation
• With large n, the central limit theorem makes coefficient tests and confidence intervals fairly robust to non-normal residuals, so minor deviations are usually acceptable
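The right-skew case can be sketched as follows, again with simulated data (all names are hypothetical). The outcome is generated with multiplicative errors, so residuals from the raw-scale model should bend away from the line at the right, while the log-scale model should track it. The Shapiro-Wilk test is included only as a numerical complement to the plot, not as a replacement for it.

```r
# Simulated data (assumption for illustration): errors are normal on the log scale
set.seed(2)
x <- runif(200)
y <- exp(1 + 2 * x + rnorm(200, sd = 0.5))

fit_raw <- lm(y ~ x)        # outcome on the raw scale
fit_log <- lm(log(y) ~ x)   # log-transformed outcome

# Normal Q-Q plots for both models, side by side
op <- par(mfrow = c(1, 2))
plot(fit_raw, which = 2)
plot(fit_log, which = 2)
par(op)

# Shapiro-Wilk test on standardised residuals; W closer to 1 means closer to normal
sw_raw <- shapiro.test(rstandard(fit_raw))
sw_log <- shapiro.test(rstandard(fit_log))
c(raw = unname(sw_raw$statistic), log = unname(sw_log$statistic))
```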

Scale-Location  plot(model, which=3)

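Also called Spread-Location. Checks homoscedasticity more directly than plot 1 by plotting the square root of the absolute standardised residuals against the fitted values. Taking absolute values folds negative residuals upward, so a trend in spread shows up as a slope. The red smoothed line should be approximately horizontal.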

What to look for:
• Flat red line + evenly spread points = homoscedasticity ✅
• Upward slope = variance increases with fitted values (common in count data, income, biological measurements)
• Fix: log or square-root transform the outcome variable, or use weighted least squares (WLS)
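As a rough sketch, the quantity on the vertical axis can be computed by hand, and the WLS fix applied, as shown here. The data are simulated so that the error standard deviation grows with x; the weights 1/x^2 are an assumption that is only justified because the variance structure is known by construction.

```r
# Simulated data (assumption for illustration): error SD is proportional to x
set.seed(3)
x <- runif(150, 1, 10)
y <- 1 + 2 * x + rnorm(150, sd = 0.5 * x)

fit_ols <- lm(y ~ x)

# The y-axis of plot 3: square root of the absolute standardised residuals
sl <- sqrt(abs(rstandard(fit_ols)))
plot(fitted(fit_ols), sl,
     xlab = "Fitted values", ylab = "sqrt(|standardised residuals|)")
lines(lowess(fitted(fit_ols), sl), col = "red")  # should slope upward here

# Weighted least squares: weights are inversely proportional to the error variance
fit_wls <- lm(y ~ x, weights = 1 / x^2)
plot(fit_wls, which = 3)  # smoother should be flatter than for fit_ols
```

With real data the variance structure is unknown, so the weights would have to be estimated or motivated by subject knowledge.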

Residuals vs Leverage  plot(model, which=5)

Identifies influential observations — points with high leverage (unusual predictor values) AND large residuals. Points outside Cook's D contour lines are potentially distorting your regression line.

Key concepts:
Leverage = how far the x-value is from the mean — unusual predictor values
Influence = high leverage + large residual = high Cook's D
• High leverage but small residual: unusual x, but model fits it well — usually OK
• Cook's D > 0.5: moderate concern  |  Cook's D > 1: strong concern, always investigate
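The relationship between leverage, residual size and Cook's D can be checked numerically. The sketch below plants one point with an extreme x-value and a large residual in simulated data (all names are hypothetical), then reproduces cooks.distance() from the standardised residuals and hat values using D_i = (r_i^2 / p) * h_i / (1 - h_i), where p is the number of estimated coefficients.

```r
# Simulated data (assumption for illustration): 30 ordinary points plus one planted point
set.seed(4)
x <- c(rnorm(30), 8)                       # last x is far from the mean: high leverage
y <- c(1 + 2 * x[1:30] + rnorm(30), 0)     # last y is far from the trend: large residual

fit <- lm(y ~ x)

lev <- hatvalues(fit)        # leverage h_i
rs  <- rstandard(fit)        # internally standardised residuals r_i
cd  <- cooks.distance(fit)   # Cook's D as computed by R

# Cook's D rebuilt from its two ingredients
p <- length(coef(fit))
cd_manual <- rs^2 / p * lev / (1 - lev)
all.equal(unname(cd), unname(cd_manual))

plot(fit, which = 5)   # the planted point should fall outside the Cook's D contours
which.max(cd)          # index of the most influential observation
```

The formula makes the "high leverage + large residual" rule concrete: D is small whenever either factor is small.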

Cook's Distance Bar Plot  plot(model, which=4)

Shows the influence of each individual observation on the entire fitted model. A bar above the threshold means that one data point is substantially shifting your regression line.

Threshold rules of thumb:
• Cook's D > 4/n: worth a look (dashed threshold line)
• Cook's D > 0.5: moderate concern — examine the observation
• Cook's D > 1: strong concern; report a sensitivity analysis with and without the observation
• High Cook's D ≠ automatically delete! Investigate why. It may be the most scientifically interesting point.
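One way to apply these rules of thumb is sketched below on simulated data with a single planted influential point (the data frame dat and all other names are hypothetical). Base R's Cook's distance plot does not draw the 4/n line itself, so it is added with abline(). The refit is a sensitivity check, not a recommendation to discard the flagged rows.

```r
# Simulated data (assumption for illustration): one planted influential point
set.seed(5)
n <- 40
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
x[n] <- 6
y[n] <- -5
dat <- data.frame(x, y)

fit_all <- lm(y ~ x, data = dat)
cd <- cooks.distance(fit_all)

# Cook's distance bar plot with the 4/n rule-of-thumb line added
plot(fit_all, which = 4)
abline(h = 4 / n, lty = 2)

# Logical indexing stays safe even when no observation is flagged
keep <- cd <= 4 / n
which(!keep)   # observations worth a closer look

# Sensitivity analysis: refit without the flagged observations and compare
fit_drop <- lm(y ~ x, data = dat[keep, ])
rbind(all = coef(fit_all), without_flagged = coef(fit_drop))
```

If the two rows of coefficients differ materially, both fits are worth reporting alongside an explanation of what makes the flagged observations unusual.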