Sometimes, in the second edition, we use the levels function to get the unique levels of a variable. For example, on page 133 we use levels(growth.moo$diet) to get the unique levels of the diet variable. In current versions of R this no longer works: it returns NULL instead of the levels. Below I explain why and how to fix it. The short version: use unique instead of levels, or convert the variables to factors.
Prepare
We will use the mutate function from the dplyr package, so please make sure you have that package installed.
Import the data
In the next line of code I import the data from GitHub rather than from a local copy. This saves us having to deal with where the data file is stored on your computer. I would normally work with a local copy, however.
growth.moo <- read.csv(url("https://raw.githubusercontent.com/r4all/datasets/master/growth.csv"))
Using unique rather than levels
Looking at the structure of the data in R, we see:
str(growth.moo)
## 'data.frame': 48 obs. of 3 variables:
## $ supplement: chr "supergain" "supergain" "supergain" "supergain" ...
## $ diet : chr "wheat" "wheat" "wheat" "wheat" ...
## $ gain : num 17.4 16.8 18.1 15.8 17.7 ...
Supplement and diet are both chr (character) variables, not factors. The levels function returns the levels of a factor, and a character vector has none, so instead of the levels we get NULL:
levels(growth.moo$supplement)
## NULL
levels(growth.moo$diet)
## NULL
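To see the same behaviour without downloading anything, here is a minimal sketch using a small made-up data frame (toy, with hypothetical values) that has the same column types as growth.moo. stringsAsFactors = FALSE is spelled out so the result is the same whatever version of R you use.

```r
# Made-up data frame with the same column types as growth.moo (hypothetical values)
toy <- data.frame(
  supplement = c("supergain", "control", "supergain"),
  diet = c("wheat", "oats", "barley"),
  gain = c(17.4, 16.8, 18.1),
  stringsAsFactors = FALSE  # the default since R 4.0.0, spelled out here
)

class(toy$diet)           # "character"
levels(toy$diet)          # NULL: a character vector has no levels
levels(factor(toy$diet))  # "barley" "oats" "wheat"
```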
So, use unique instead:
unique(growth.moo$supplement)
## [1] "supergain" "control" "supersupp" "agrimore"
unique(growth.moo$diet)
## [1] "wheat" "oats" "barley"
Awesomeness!
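One thing to be aware of: unique returns the values in the order they first appear in the data, whereas the levels of a factor are sorted by default (alphabetically, for text). If you want the order that levels would have given, sort the result. A small sketch with a made-up vector:

```r
# Made-up vector, for illustration only
diet <- c("wheat", "wheat", "oats", "barley", "oats")

unique(diet)          # "wheat" "oats" "barley" (order of first appearance)
sort(unique(diet))    # "barley" "oats" "wheat"
levels(factor(diet))  # "barley" "oats" "wheat" (factor levels are sorted by default)
```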
Converting to a factor
Another option is to convert the chr type variables to factor type variables. There are many ways to achieve this; here are two.
If we want to convert all the chr variables in our data to factors, we can use the type.convert function with the argument as.is = FALSE. Setting this to FALSE tells type.convert not to keep character variables as they are, but rather to convert them to factors.
growth.moo.factors1 <- type.convert(growth.moo, as.is = FALSE)
str(growth.moo.factors1)
## 'data.frame': 48 obs. of 3 variables:
## $ supplement: Factor w/ 4 levels "agrimore","control",..: 3 3 3 3 2 2 2 2 4 4 ...
## $ diet : Factor w/ 3 levels "barley","oats",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ gain : num 17.4 16.8 18.1 15.8 17.7 ...
Great! What were character type variables are now factors.
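To check that levels now works after the conversion, here is a self-contained sketch on a small made-up data frame (toy, hypothetical values) with the same columns as growth.moo. On the real data, levels(growth.moo.factors1$diet) should behave the same way.

```r
# Made-up data frame mirroring growth.moo's columns (hypothetical values)
toy <- data.frame(
  supplement = c("supergain", "control", "agrimore", "supersupp"),
  diet = c("wheat", "oats", "barley", "wheat"),
  gain = c(17.4, 16.8, 18.1, 15.8),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor; numeric columns stay numeric
toy_factors <- type.convert(toy, as.is = FALSE)

levels(toy_factors$diet)        # "barley" "oats" "wheat"
levels(toy_factors$supplement)  # "agrimore" "control" "supergain" "supersupp"
class(toy_factors$gain)         # "numeric"
```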
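By the way, the reason for all this is a change in R 4.0.0. Since that version, read.csv defaults to stringsAsFactors = FALSE (equivalently as.is = TRUE), which can be understood as: keep character variables as they are; do not convert them to factors. Before R 4.0.0 the default was stringsAsFactors = TRUE, so read.csv created factors automatically. We wrote the second edition before 4.0.0, and this is why levels worked when we wrote the second edition, but does not work now.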
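Since read.csv has defaulted to stringsAsFactors = FALSE since R 4.0.0, yet another option is to ask for factors at import time by setting stringsAsFactors = TRUE. For the real data that would be read.csv(url("https://raw.githubusercontent.com/r4all/datasets/master/growth.csv"), stringsAsFactors = TRUE). The sketch below instead reads a few made-up rows (hypothetical values) from a character vector via the text argument, so it runs without an internet connection.

```r
# A few made-up rows in the same layout as growth.csv (hypothetical values)
csv_lines <- c(
  "supplement,diet,gain",
  "supergain,wheat,17.4",
  "control,oats,16.8",
  "agrimore,barley,18.1"
)

# Default import: character columns stay character (R >= 4.0.0)
as_read <- read.csv(text = csv_lines)

# Ask read.csv to create factors while importing
as_factors <- read.csv(text = csv_lines, stringsAsFactors = TRUE)

class(as_read$diet)      # "character" in R >= 4.0.0
levels(as_factors$diet)  # "barley" "oats" "wheat"
```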
Another way is to individually convert each variable, for example:
growth.moo.factors2 <- dplyr::mutate(growth.moo,
supplement = as.factor(supplement),
diet = as.factor(diet))
str(growth.moo.factors2)
## 'data.frame': 48 obs. of 3 variables:
## $ supplement: Factor w/ 4 levels "agrimore","control",..: 3 3 3 3 2 2 2 2 4 4 ...
## $ diet : Factor w/ 3 levels "barley","oats",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ gain : num 17.4 16.8 18.1 15.8 17.7 ...
Awesomeness 2!
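If you would rather not depend on dplyr, a base R sketch can convert every character column at once. This is an alternative suggestion rather than the approach used in the book, and toy is a made-up data frame with hypothetical values.

```r
# Made-up data frame with character and numeric columns (hypothetical values)
toy <- data.frame(
  supplement = c("supergain", "control"),
  diet = c("wheat", "oats"),
  gain = c(17.4, 16.8),
  stringsAsFactors = FALSE
)

# Find the character columns, then replace only those with factor versions
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], as.factor)

str(toy)
```

Selecting the columns with is.character leaves numeric columns such as gain untouched, which matches what the two approaches above produce.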
Thanks for reading. Have a nice day!