In the second edition, we sometimes use the levels function to get the unique levels of a variable. For example, on page 133 we do levels(growth.moo$diet) to get the unique levels of the diet variable. Today, this does not work. Below I explain why and how to fix it. The short version: use unique instead of levels, or convert the variables to factors.
Prepare: we will use the mutate function from the dplyr package, so please ensure you have that package installed.
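Here is a minimal sketch of the problem and the two fixes. It assumes the cause is that the diet variable is stored as character strings rather than as a factor (since R 4.0.0, read.csv no longer converts strings to factors by default). The data values are made up for illustration and are not the book's growth.moo dataset.

```r
# Made-up stand-in for growth.moo; diet is deliberately kept as character
growth.moo <- data.frame(
  diet = c("wheat", "oats", "barley", "wheat", "oats", "barley"),
  gain = c(12.1, 10.4, 11.3, 12.8, 9.9, 11.0),
  stringsAsFactors = FALSE
)

# levels() returns NULL for a character variable
levels(growth.moo$diet)

# Fix 1: unique() works on character variables (values in order of appearance)
unique(growth.moo$diet)

# Fix 2: convert the variable to a factor, then levels() works again
growth.fct <- transform(growth.moo, diet = factor(diet))
levels(growth.fct$diet)

# The same conversion with mutate(), if dplyr is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  growth.fct2 <- dplyr::mutate(growth.moo, diet = factor(diet))
  print(levels(growth.fct2$diet))
}
```

Note that unique gives the values in the order they first appear in the data, whereas the levels of a factor are sorted alphabetically by default.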
In our book Insights, we help readers learn how to use the amazing ggplot2 package to make visualisations. If you have some experience with ggplot2, you may think our method of teaching it, and of using it in the book, is a bit odd. Here we explain our reasons for teaching it the way we do.
For example, here is the code for the first graph we make:
bats_Age_Sex %>% ggplot() + geom_col(mapping = aes(x = Sex, y = num_bat_IDs, fill = Age))
And if we had not piped the dataset into ggplot, then we would have done this:
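To make the comparison concrete, here is a sketch of the piped and non-piped forms side by side. The bats_Age_Sex data frame below is a made-up stand-in with the same column names, not the book's data, and the non-piped version is one plausible way to write it (passing the dataset as the data argument), not necessarily the exact code from the post.

```r
library(ggplot2)
library(dplyr)  # provides the %>% pipe

# Hypothetical stand-in for the book's bats_Age_Sex dataset
bats_Age_Sex <- data.frame(
  Sex = c("Female", "Female", "Male", "Male"),
  Age = c("Adult", "Juvenile", "Adult", "Juvenile"),
  num_bat_IDs = c(30, 12, 25, 9)
)

# Piped form: the dataset is piped into ggplot()
p_piped <- bats_Age_Sex %>%
  ggplot() +
  geom_col(mapping = aes(x = Sex, y = num_bat_IDs, fill = Age))

# Non-piped form: the dataset is given as the data argument of ggplot()
p_plain <- ggplot(data = bats_Age_Sex) +
  geom_col(mapping = aes(x = Sex, y = num_bat_IDs, fill = Age))
```

Printing p_piped or p_plain should draw the same stacked column graph.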
[This is a minimal post due to very limited time.]
We need to check that the assumptions of our linear model (e.g. regression, ANOVA, ANCOVA) are not too badly violated. We often use four diagnostic graphs to do so. One of these shows standardised residuals plotted against leverage (each observation has its own leverage value).
The take-home message of this post is: if your model contains at least one continuous explanatory variable, use the base R methods for making your diagnostic plots:
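As a sketch of what the base R approach can look like: calling plot on a fitted lm object draws the four standard diagnostic graphs, including the residuals versus leverage one. The model below uses R's built-in mtcars data purely as an assumed example; it is not a model from the book.

```r
# Example model with one continuous explanatory variable (built-in data)
model <- lm(mpg ~ wt, data = mtcars)

# Draw the four base R diagnostic plots in a 2 x 2 grid
old_par <- par(mfrow = c(2, 2))
plot(model)
par(old_par)  # restore the previous plotting layout

# Each observation has its own leverage value
leverage <- hatvalues(model)
head(leverage)
```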
Inspiration for the following comes from Richard McElreath’s Statistical Rethinking book, and some of the code comes from here: https://bookdown.org/ajkurz/Statistical_Rethinking_recoded/multivariate-linear-models.html#masked-relationship
Let us think about the question of how the response variable y is related to two explanatory variables x1 and x2.
First we make a dataset in which we know the relationships because we specify them: we make y = x1 - x2. Before this, we create x1 and x2 and make them correlated…
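A dataset of this kind could be simulated roughly as follows. The sample size, the correlation rho, and the noise added to y are all assumptions chosen for illustration; the original post may use different values.

```r
set.seed(1)
n   <- 1000  # assumed sample size
rho <- 0.7   # assumed correlation between x1 and x2

# Create x1, then x2 so that it is correlated with x1
x1 <- rnorm(n)
x2 <- rnorm(n, mean = rho * x1, sd = sqrt(1 - rho^2))

# y = x1 - x2, plus some random noise (assumed sd of 1)
y <- rnorm(n, mean = x1 - x2, sd = 1)
sim <- data.frame(y, x1, x2)

cor(sim$x1, sim$x2)

# One explanatory variable at a time, then both together
coef(lm(y ~ x1, data = sim))
coef(lm(y ~ x2, data = sim))
both <- lm(y ~ x1 + x2, data = sim)
coef(both)
```

Comparing the coefficients from the single-variable models with those from the model containing both variables shows how each variable can mask the effect of the other.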
This is the first in a series of posts about maximum likelihood methods for fitting statistical models to data. Inspiration for the material comes in large part from Drew Purves, who presented something similar. Owen is using Drew’s approach as the basis for this course. Much of the R-specific material is heavily influenced by Ben Bolker’s excellent book, Ecological Models and Data in R. The goals of this and the following posts include learning how to fit more mechanistic models of arbitrary complexity to our data.
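To give a flavour of the general idea, here is a minimal sketch of fitting a straight line by maximum likelihood with base R's optim. This is an illustration under assumed simulated data and a normal error model, not the code from the course or from Bolker's book.

```r
set.seed(42)
x <- runif(200, 0, 10)
y <- 2 + 0.5 * x + rnorm(200, sd = 1)  # simulated data with known parameters

# Negative log-likelihood of a straight-line model with normal errors.
# par = c(intercept, slope, log(sigma)); sigma is kept positive via exp()
nll <- function(par) {
  mu    <- par[1] + par[2] * x
  sigma <- exp(par[3])
  -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

# Minimise the negative log-likelihood from sensible starting values
start <- c(mean(y), 0, log(sd(y)))
fit <- optim(par = start, fn = nll, method = "BFGS")

fit$par[1:2]     # maximum likelihood estimates of intercept and slope
coef(lm(y ~ x))  # least squares estimates, for comparison
```

For this simple model the two sets of estimates should agree closely. The appeal of the likelihood approach is that nll can be swapped for a function describing a more mechanistic model.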
Just a little demo of what happens if you don’t or do adjust your r-squared. Here’s the bottom line…
As we increase the number of explanatory variables in a linear model (e.g. multiple regression), the unadjusted r-squared increases (green dots), even if the additional explanatory variables contain only random numbers. The adjusted r-squared is “adjusted” so that it does not! So if we simply want to know the proportion of variance explained by our model, we are fine using the unadjusted r-squared.
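A small simulation along these lines might look like the sketch below. The sample size, the number of random variables, and all variable names are assumptions made for illustration.

```r
set.seed(123)
n <- 50
x <- rnorm(n)
y <- x + rnorm(n)  # y really depends on x only

# Twenty extra explanatory variables containing only random numbers
noise <- as.data.frame(matrix(rnorm(n * 20), nrow = n))
names(noise) <- paste0("noise", 1:20)
dat <- cbind(data.frame(y, x), noise)

# Fit models with x plus the first k random variables, for k = 0 to 20
r2 <- sapply(0:20, function(k) {
  predictors <- c("x", names(noise)[seq_len(k)])
  fit <- summary(lm(reformulate(predictors, response = "y"), data = dat))
  c(unadjusted = fit$r.squared, adjusted = fit$adj.r.squared)
})

# Rows are the two kinds of r-squared, columns are k = 0, 1, ..., 20
round(r2, 3)
```

The unadjusted row can never decrease as variables are added, while the adjusted row is free to go down.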
To use the functions in an add-on package, you first need to install the package. Remember you only need to install it once.
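One way to make sure you only install when needed is sketched below. The helper name install_if_missing is made up for this example and is not part of any package.

```r
# Hypothetical helper: install a package only if it is not already installed
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package is now available
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" comes with R, so nothing is downloaded here
available <- install_if_missing("stats")
available
```

For an add-on package you would call, for example, install_if_missing("dplyr") once, and then library(dplyr) at the start of each session in which you use it.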
During the writing of the book, and in early 2018, the normal method for installing the ggfortify add-on package didn’t work (we got the message “package ggfortify is not available (for R Version 3.2.4)”).
This has not happened for some time, so hopefully you won’t experience it. If you do…
As the Second Edition of Getting Started with R was going to press, RStudio changed the function it uses to import data in the Import Dataset tool, from the base function read.csv() to the read_csv() function in the readr package. Since then, the Import Dataset button gives a menu with an option to use either (“base” uses read.csv and “readr” uses read_csv). From the RStudio Blog about the readr package:
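To see the two import functions side by side, here is a small sketch that writes a made-up CSV file to a temporary location and reads it back both ways. The file contents are invented for illustration, and the readr part only runs if that package is installed.

```r
# Write a tiny made-up CSV file to a temporary location
csv_file <- tempfile(fileext = ".csv")
writeLines(c("diet,gain", "wheat,12.1", "oats,10.4"), csv_file)

# Base R: read.csv() returns a plain data frame
base_df <- read.csv(csv_file)
class(base_df)
str(base_df)

# readr: read_csv() returns a tibble (a data frame with extra classes)
if (requireNamespace("readr", quietly = TRUE)) {
  readr_df <- suppressMessages(readr::read_csv(csv_file))
  print(class(readr_df))
}
```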