Just a little demo of what happens if you do or don’t adjust your r-squared. Here’s the bottom line…

As we increase the number of explanatory variables in a linear model (e.g. multiple regression), the unadjusted r-squared increases (green dots), even if the additional explanatory variables contain only random numbers. The adjusted r-squared (red dots) is “adjusted” for the number of explanatory variables, so it does not! So if we simply want to know the proportion of variance explained by our model, we are fine using the unadjusted r-squared. If, however, we want to compare the r-squared of models with different numbers of explanatory variables, we should compare the adjusted r-squared.
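To see what the adjustment actually does, here is a minimal sketch, separate from the figure code. For a model with an intercept, n observations and p explanatory variables, the adjusted r-squared is 1 - (1 - r-squared) * (n - 1) / (n - p - 1). The sketch recomputes it by hand and compares it with the value that summary() reports. The seed, the choice of p = 10 and the names p, s, r2 and adj.by.hand are assumptions made only for this illustration.

```r
## A minimal sketch (not part of the figure code): recompute the adjusted
## r-squared by hand and compare it with the value reported by summary().
set.seed(1)   # assumed seed, only so the example is reproducible

n <- 100      # number of observations
p <- 10       # number of explanatory variables, all pure noise (assumed value)
y <- rnorm(n)
x <- as.data.frame(matrix(rnorm(n * p), n, p))

s <- summary(lm(y ~ ., data = x))
r2 <- s$r.squared

## adjusted r-squared = 1 - (1 - r2) * (n - 1) / (n - p - 1)
adj.by.hand <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

## compare the hand-computed value with the one from summary()
print(all.equal(adj.by.hand, s$adj.r.squared))
print(c(unadjusted = r2, adjusted = adj.by.hand))
```

Because (n - 1) / (n - p - 1) is greater than 1 whenever p is at least 1, the adjusted value is always below the unadjusted one. With pure-noise explanatory variables the unadjusted r-squared averages about p / (n - 1), roughly 0.1 in this sketch, which is why it keeps climbing as variables are added.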

Here’s the code for making the figure. (Done before we converted to the tidyverse!)

## Let's do some multiple regression with different numbers of explanatory variables,
## using completely random data
numb.expl.vars <- floor(rep(2^seq(0, 5, 0.5), each=50))

## Number of observations
n <- 100

## The response variable
y <- rnorm(n)

get.r2 <- function(ne) {
  ## ne explanatory variables, each filled with random numbers
  x <- as.data.frame(matrix(rnorm(n*ne), n, ne))
  m1 <- lm(y ~ ., x)
  ## return the unadjusted and the adjusted r-squared
  s <- summary(m1)
  c(s$r.squared, s$adj.r.squared)
}

## use lapply to run the function over the vector of numbers of explanatory variables
## (column 1 of rez is the unadjusted r-squared, column 2 the adjusted r-squared)
rez <- do.call(rbind, lapply(numb.expl.vars, get.r2))

## get the mean r-squared and adjusted r-squared for each number of explanatory variables
means <- aggregate(rez, list(numb.expl.vars=numb.expl.vars), mean)

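## plot the data: set up an empty plot, then add jittered, semi-transparent points
## (green = unadjusted r-squared, red = adjusted r-squared)
matplot(log2(numb.expl.vars), rez, type="n", ann=FALSE, axes=FALSE)
box()
abline(h=0)
matpoints(jitter(log2(numb.expl.vars)), rez, pch=19, col=c("#11ff1144", "#ff111144"))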
mtext(1, line=2.5, text="Number of explanatory variables")