levels() not working in 2nd Edition of Getting Started

Sometimes, in the second edition, we use the levels function to get the unique levels of a variable. For example, on page 133 we run levels(growth.moo$diet) to get the unique levels of the diet variable. With current versions of R (4.0.0 and later), this returns NULL instead. Below I explain why and how to fix it. The short version: use unique instead of levels, or convert the variables to factors.

Prepare

We will use the mutate function from the dplyr package, so please ensure you have that package installed.

Import the data

In the next line of code I import the data from GitHub rather than from a local copy. This saves us from having to deal with the location of a local data file. Normally, however, I would work with a local copy.

growth.moo <- read.csv(url("https://raw.githubusercontent.com/r4all/datasets/master/growth.csv"))

Using unique rather than levels

Looking at the structure of the data in R we see:

str(growth.moo)
## 'data.frame':    48 obs. of  3 variables:
##  $ supplement: chr  "supergain" "supergain" "supergain" "supergain" ...
##  $ diet      : chr  "wheat" "wheat" "wheat" "wheat" ...
##  $ gain      : num  17.4 16.8 18.1 15.8 17.7 ...

Supplement and diet are both chr (character) type variables.

The levels function returns the levels attribute of a factor. A character vector has no such attribute, so instead of the levels we get NULL:

levels(growth.moo$supplement)
## NULL
levels(growth.moo$diet)
## NULL
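To see why NULL comes back, it can help to look at the attributes directly. Below is a minimal sketch on a small made-up vector; the names diet_chr and diet_fac are hypothetical stand-ins, not objects from the book. A character vector has no attributes at all, whereas a factor stores its levels in a "levels" attribute, which is what levels reads.

```r
# Hypothetical stand-in for a character column such as growth.moo$diet
diet_chr <- c("wheat", "wheat", "oats", "barley")

# A plain character vector carries no attributes,
# so there is no "levels" attribute for levels() to return
attributes(diet_chr)       # NULL
levels(diet_chr)           # NULL

# A factor keeps its levels in an attribute, and levels() reads it
diet_fac <- factor(diet_chr)
attr(diet_fac, "levels")   # "barley" "oats" "wheat"
levels(diet_fac)           # "barley" "oats" "wheat"
```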

So instead, use unique, which returns each distinct value once:

unique(growth.moo$supplement)
## [1] "supergain" "control"   "supersupp" "agrimore"
unique(growth.moo$diet)
## [1] "wheat"  "oats"   "barley"

Awesomeness!
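One difference worth knowing: unique returns values in the order they first appear in the data, whereas factor levels are sorted alphabetically by default. That is why unique gives "supergain" first above, while the factor version further down starts with "agrimore". If you want the same order that levels would give, wrap unique in sort. A minimal sketch, using hypothetical values rather than the real data:

```r
# Hypothetical values, in the order they might appear in the data
diet_chr <- c("wheat", "wheat", "oats", "barley", "oats")

unique(diet_chr)          # "wheat" "oats" "barley"  (order of first appearance)
sort(unique(diet_chr))    # "barley" "oats" "wheat"  (alphabetical)
levels(factor(diet_chr))  # "barley" "oats" "wheat"  (factor() sorts by default)
```

Both sort and factor use the collation of the current locale, so for simple lowercase names like these the two orders agree.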

Converting to a factor

Another option is to convert the chr type variables to factor type variables. There are many ways to achieve this; here are two.

To convert all the chr variables in our data to factors, we can use the type.convert function with the argument as.is = FALSE. Setting as.is to FALSE tells type.convert not to keep character variables as they are, but to convert them to factors.

growth.moo.factors1 <- type.convert(growth.moo, as.is = FALSE)
str(growth.moo.factors1)
## 'data.frame':    48 obs. of  3 variables:
##  $ supplement: Factor w/ 4 levels "agrimore","control",..: 3 3 3 3 2 2 2 2 4 4 ...
##  $ diet      : Factor w/ 3 levels "barley","oats",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ gain      : num  17.4 16.8 18.1 15.8 17.7 ...

Great! What were character type variables are now factors.
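To compare the two settings of as.is side by side, and to check that levels works once the columns are factors, here is a minimal sketch on a hypothetical three-row data frame standing in for growth.moo (the object names df, converted and kept are made up for illustration):

```r
# Hypothetical three-row stand-in for growth.moo
df <- data.frame(supplement = c("supergain", "control", "agrimore"),
                 diet = c("wheat", "oats", "barley"),
                 gain = c(17.4, 16.8, 18.1),
                 stringsAsFactors = FALSE)

# as.is = FALSE: character columns become factors, numbers stay numeric
converted <- type.convert(df, as.is = FALSE)
sapply(converted, class)   # supplement "factor", diet "factor", gain "numeric"
levels(converted$diet)     # "barley" "oats" "wheat"

# as.is = TRUE: character columns are kept as character
kept <- type.convert(df, as.is = TRUE)
sapply(kept, class)        # supplement "character", diet "character", gain "numeric"
```

On the real data, the str output above implies that levels(growth.moo.factors1$supplement) gives "agrimore", "control", "supergain" and "supersupp", and levels(growth.moo.factors1$diet) gives "barley", "oats" and "wheat".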

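By the way, since R 4.0.0, read.csv no longer converts character columns to factors by default: its stringsAsFactors argument now defaults to FALSE (equivalently, its as.is argument defaults to TRUE), which can be understood as keep variables as they are, do not convert them to factors. We wrote the second edition before R 4.0.0, when read.csv did convert character columns to factors, and this is why levels worked when we wrote the second edition but does not work now.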
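The change in default can be seen without downloading anything by giving read.csv a few lines of CSV text. This is a minimal sketch assuming R 4.0.0 or later; csv_text is a hypothetical snippet with the same columns as growth.csv, not the real file. Passing stringsAsFactors = TRUE asks read.csv for the pre-4.0.0 behaviour explicitly.

```r
# Hypothetical CSV text with the same columns as growth.csv
csv_text <- "supplement,diet,gain
supergain,wheat,17.4
control,oats,16.8"

# R >= 4.0.0 default (stringsAsFactors = FALSE): diet stays character
new_default <- read.csv(text = csv_text)
class(new_default$diet)     # "character"
levels(new_default$diet)    # NULL

# Requesting factors at import time restores the old behaviour
old_style <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(old_style$diet)       # "factor"
levels(old_style$diet)      # "oats" "wheat"
```

Applied to the real file, the same argument would read read.csv(url("https://raw.githubusercontent.com/r4all/datasets/master/growth.csv"), stringsAsFactors = TRUE), after which the levels calls in the book should work as printed.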

Another way is to convert each variable individually, for example with mutate and as.factor:

growth.moo.factors2  <- dplyr::mutate(growth.moo,
                                      supplement = as.factor(supplement),
                                      diet = as.factor(diet))
str(growth.moo.factors2)
## 'data.frame':    48 obs. of  3 variables:
##  $ supplement: Factor w/ 4 levels "agrimore","control",..: 3 3 3 3 2 2 2 2 4 4 ...
##  $ diet      : Factor w/ 3 levels "barley","oats",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ gain      : num  17.4 16.8 18.1 15.8 17.7 ...

Awesomeness 2!
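If you would rather not install dplyr, base R can do the same per-variable conversion. Below is a minimal sketch using transform, which works much like mutate here, and plain $ assignment for a single column; df and df_factors are hypothetical stand-ins, not objects from the book.

```r
# Hypothetical stand-in for growth.moo
df <- data.frame(supplement = c("supergain", "control"),
                 diet = c("wheat", "oats"),
                 gain = c(17.4, 16.8),
                 stringsAsFactors = FALSE)

# Base R transform(): same idea as dplyr::mutate(), no extra package
df_factors <- transform(df,
                        supplement = as.factor(supplement),
                        diet = as.factor(diet))
str(df_factors)

# Or convert one column at a time by assigning back into the data frame
df$diet <- as.factor(df$diet)
levels(df$diet)   # "oats" "wheat"
```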

Thanks for reading. Have a nice day!

Owen Petchey
Professor of Integrative Ecology

Interested in ecology, diversity, prediction, quantitative methods, a bit of programming, and making beer.