knitr::opts_chunk$set(echo = TRUE) devtools::load_all()
Have you ever been in that situation where your R script is painfully long? Something like:
# first remove everything on the environment rm(list = ls()) # then install the packages you need install.packages("ggplot2") install.packages("nlmer") install.packages("rethinking") install.packages("whatever") install.packages("some bullshit I need 4 just that one line") # then load the packages library(ggplot2) library(nlmer) library(rethinking) library(whatever) library(some bullshit I need 4 just that one line) # then load the data data <- read.csv(C:/was_it_this_folder?/within_this_one?/not_sure/maybe_here?/data.csv) # then tidy the raw data and only keep what you need for your analysis actual_data <- data.frame(unicorn_id = data$UniCorn_Identity, horn_length = data$Length_of_THe_horn_of_the_UNicorn, weight = data$W_of_the_Unicorn_when_born) # to then finally be able to run the fricking analysis model <- lm(horn_length ~ weight, actual_data) # to then get a fricking ERROR AGAIN GODDAMN IT!
Installing packages, loading them, cleaning data, preparing it for analysis, doing the analysis and plotting the results all within the same file can result in absurdly long R
scripts. There is a way to solve this and it is by treating your entire research project as an R
package. Let me explain:
R
package?An R
package is a type of R
project, in other words, a way to work with R
on a specific topic. A package, just like any project, is set up from a folder in your computer that you set up where you store all your files and data. If you develop your own R
package it will work similarly to those packages you install and load every time you run an R
script (i.e ggplot2
, nlmer
etc.) when you need a function that does something useful, only this time, you will be the person behind the scenes!
A package can include, functions, data, and all kinds of scripts to process or document your work. As an example, I have developed the package cataract
por the EcoLunch presentation. I can load this package and with it I can specify other packages that need to be both installed and loaded with it with the function
devtools::load_all()
When loading the package I can also access functions that I have written myself to perform specific tasks. For example, here is a function that generates facts about Chris H. with different degrees (1-5) of spicyness:
generate_chris_fact(spicyness = 3)
I can also access any data sets I am working with. I can actually save them after being processed and with the variables I want so when I load them they are ready to be played with. As an example, I have added some avengers data to the cataract
project:
# just typing the dataset's name when the package is loaded will make the data pop up!
avengers
And this is just the tip of the iceberg of the things you can do with your R
package! Would you be interested in creating one for yourself? Here are some tips:
R
packageOkay dude but how in the hell do you actually do this? No problem! Here is a step by step tutorial for you:
Have you ever noticed when you open Rstudio that little R sign that says something like Project:(None)
. Click there and go to New Project
.
Then this window will pop up, we want a New Directory
.
Here we will choose R package
.
Which will open this:
Three things to consider here:
projects
where you can locate all your work!GIT
! You should have GIT installed in your computer right now (if not go to: https://ggcostoya.github.io/cataract/articles/git.html and check how to do it). If you indeed have it, a little check box with Create git repository
should appear in this screen. Be sure to check that box! It will be essential later. Once you are done, click on Create Project
and KABOOM! Congrats! You've created your first R
package, now it's time to see what is going on here.
R
packageAs soon as you create your R
package you will see that a bunch of folders automatically appear on the folder for it. You can see these in the lower right-hand panel in Rstudio where it says Files
, it should look something like this:
Among these files and documents the ones you got to pay attention to are:
.Rproj
The .Rproj
file is the source to all of your R
package. In this case, as an example, I've created the kurama
package so in the picture above you see kurama.Rproj
. When you create projects of your own the .Rproj
file will be your package's name. If you go to the folder where you created the package and click on it an R
session with your package name will appear. Notice also that on the top right hand side it doesn't say Project:(None)
anymore! The name of your package should appear up there.
DESCRIPTION
The DESCRIPTION
file contains all the information related to your project. If you open it, it should look like this:
Package: kurama # here's my example, but here the name of your package will appear. Type: Package Title: What the Package Does (Title Case) Version: 0.1.0 Author: Who wrote it Maintainer: The package maintainer <yourself@somewhere.net> Description: More about what it does (maybe more than one line) Use four spaces when indenting paragraphs within the Description. License: What license is it under? Encoding: UTF-8 LazyData: true
The DESCRIPTION
file can include all Dependencies, which are all packages that need to be installed and loaded for your package to run. You will need to add them manually. You can also add information about the package (title, author, description etc.). As an example, here is the DESCRIPTION
file for cataract
:
Package: cataract Type: Package Title: Project Cataract Version: 1.0 Author: Guillermo Garcia Costoya Maintainer: Guillermo Garcia Costoya <guille@nevada.unr.edu> Description: A quick guide on how to use GIT, R packages, GitHub and GitHub pages as part of your research process to keep your files and data tidy, tracked, organized and kewl. A package develop to complement the April 30th 2021 EcoLunch talk by Guillermo Garcia Costoya at the EECB program of the University of Nevada, Reno License: What license is it under? Encoding: UTF-8 LazyData: true Depends: ## Here you will need to specify the packages you want to be installed and loaded! R(>= 3.3.0), tidyverse, devtools, DescTools, gridExtra, here RoxygenNote: 7.1.1 ## Do me a favor and add this as well if you are copy pasting. It's for your own good.
Setting up a DESCRIPTION
file like this saves you the trouble of having to install and load packages every time. You just need to include them as Dependencies and BOOM! They will be loaded every time you load your R
package!
\R
The \R
folder will contain all the functions for your package. If you click there as soon as you create your package you'll see that R
has automatically created a function called hello
, which is saved on the hello.R
file that looks like this:
# Hello, world! # # This is an example function named 'hello' # which prints 'Hello, world!'. # # You can learn more about package authoring with RStudio at: # # http://r-pkgs.had.co.nz/ # # Some useful keyboard shortcuts for package authoring: # # Install Package: 'Ctrl + Shift + B' # Check Package: 'Ctrl + Shift + E' # Test Package: 'Ctrl + Shift + T' hello <- function() { print("Hello, world!") }
The goal here is that you create functions for yourself to make tasks easier and more repeatable. Writing your own functions is a way to avoid repetitions and keep your code short. For example, let's say that for the avengers
data set I showed you earlier I want to determine who would win in a fight between Vision and Thor:
# let's check the data first once again:
avengers
# filter the data to keep only data for vision and thor avg_vision_thor <- filter(avengers, name %in% c("vision", "thor")) # keep the avenger with the highest power value winner <- filter(avg_vision_thor, power == max(power)) # extract the name of the winner name_winner <- as.character(winner$name) # and here's the winner! VISION! of course... name_winner
# what if I needed to know who would win between scarlett witch and hulk? # Redo the entire shannanigan again? HELL NO! # I can write a function to automate this process. I can call this function "match_avengers": match_avengers <- function(avg_1, avg_2){ # avg_1 and avg_2 are the arguments of the function, # filter avenger data and keep only two specified avengers as well as the most powerful one match <- avengers %>% filter(name %in% c(avg_1, avg_2)) %>% filter(power == max(power)) # get the name of the winner winner <- as.character(match$name) return(winner) } # let's test the function out! SCARLETT WITCH ALL DAY!! match_avengers("scarlett witch", "hulk")
If you check the \R
folder for the cataract
package here: https://github.com/ggcostoya/cataract/tree/master/R , you'll see that one of the files there is for the match_avengers
function. If it is there I can use the function any time cataract
is loaded!
.gitignore
The .gitignore
file will specify which files or folders you don't want GIT
tracking version-control style. For example, a good addition to .gitignore
is the .Rproj
file, it will never change, why keep track of it?
R
packageAlthough the package would work with the basic stuff that is generated when you created, to upgrade your R
package to the next level you'll need to create the following folders:
\raw_data
In the \raw_data
folder you will save all data that comes fresh from the field or the lab. I am talking about those messy excels, .txt
files, .csv
files anything! All in there. The \raw_data
folder will also contain what I call cleaning scripts. These will have all the data processing, filtering and cleaning necessary for your raw data to become the actual data you'll use in your scripts.
If you go to: https://github.com/ggcostoya/cataract/tree/master/raw_data, you'll be to see the raw data for the cataract
. As an example I have included some meerkats
data a friend of mine used for a project a couple of years ago. Below is an an example cleaning script (called meerkats_data_prep.R
):
## Meerkat data preparation library(here) # load here package meerkats <- read.table(file.path(here(), "raw_data", "meerkats.txt")) # load the raw meerkat data # rename some columns to make analysis easier meerkats <- data.frame(obs_feeds = meerkats$observedNfeeds, obs_hours = meerkats$observedhours, r_2_brood = meerkats$relatedness2brood, group_id = meerkats$socialgroupID, brood_id = meerkats$broodID, delta_mass = meerkats$massChange) # save meerkat data in "data" folder save(meerkats, file = file.path(here::here(), "data", "meerkats.RData"))
As you see above, I am reading it, renaming some variables to make it look prettier and easier to handle when running stats on it and finally I am saving it in the \data
folder, which I'll talk about next:
\data
The \data
folder will contain the actual data you will use in your scripts (in .RData
format) and will be available to you immediately after you load your R
package. As an example, I can take a look at the meerkats
data without having to load anything because it is saved in the data
folder!
head(meerkats)
\sandbox
And finally, the \sandbox
is where you will save all those scripts in which you are trying out stuff. This is a personal recommendation of mine; a folder where you can save all your playing around with models and data.
Everything I have explained is extracted from the book "R
packages" by the most awesome Hadley Wickham, the head developer for Rstudio and co-author of packages as awesome as tidyverse
, ggplot2
, pkgdown
etc. The book is absolutely free and you can find it here: https://r-pkgs.org/. It has soo many more awesome things I did not develop here! Feel free to check it out!
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.