Creating packages tutorial

Check for a correct devtools install/development environment.

Tip: The RStudio Release Preview has the most up-to-date build tools and code-linting features. I recommend using it to avoid mistakes.

# install.packages(c("devtools", "roxygen2", "knitr", "testthat", "ggplot2"))
devtools::find_rtools()
devtools::has_devel()

Both should return true. If not, you will need to install RTools (Windows) or the XCode/Command-Line tools for Mac. (You should already have the r-base-dev package on Linux.)

What's the funky :: operator? It's a shortcut for using functions and data from packages when you haven't actually loaded them with library().

X::Y() is the equivalent of saying "from package X, use function Y"

So this:

devtools::has_devel()

is the same as

library(devtools)
has_devel()

Neat, right? It'll become VERY useful later.


Making Packages

Okay let's make a package with the built-in RStudio interface.

New Project-> New Project in new directory -> New R Package (with git repo checked)

Commit everything to the git repo. (In case something goes wrong later...)

Now go to RStudio's build tab and hit "Build & Reload". This will construct your package and tell R that it exists and is ready for use.

As a test, run the following function (this is all your package contains right now): hello()

Woo! We just made our first package. But why? Why do we want to make packages? Hopefully as we continue, it will become more apparent, but I'm putting my favorite reasons here:


So what is a package?

People have different definitions of what a package does, but they commonly are a set of functions and data that extend R's function. However, strictly speaking, they are anything that has a DESCRIPTION file (this is what RStudio/devtools considers a package).

So what goes into a package and how does it work? What are we going to do when we make a package? Lets go over what each of the files do.


DESCRIPTION

This is the description of your package. This has all of the information needed to understand what your package does, who wrote it, and what other packages it requires to run. Fill in the fields as you like right now. If you want something to be on multiple lines, you'll need to hit return mid sentence and add two spaces before you continue typing (a bit irksome, but that's life).

What if we want our package to require other packages to run? Use the following code:

devtools::use_package("NameOfPackage")

You should see the name of the package pop up under imports. If not close the file and open again and it should be there. Try adding an import for ggplot2.


NAMESPACE

This is a special file that is handled automatically by RStudio. You will never need to touch it.

However, in the interests of being thorough, it contains the "names" of everything you add to your package. Stuff that is found in this file becomes available to the user. RStudio will automatically generate this for you as you write code.


.gitignore

Stuff you don't want git to see. Add a new line here with the filename for every file you wish to hide from it (you can use the "*" character as a wildcard).


.Rbuildignore

This is another special file that you will never need to touch. It contains information on which files to ignore when building your package from source.


R/ (directory)

This is where our actual R code lives. So what types of code goes in here? Long story short: functions, methods, and classes (we'll explain what methods/classes are later). Generally scripts (ie. probably what you've been writing up until now) are NOT ALLOWED.

Why can't we use scripts here?

R packages are run differently from normal R code. In a normal R script/markdown or on the console, code is run when you execute it (when you "source it" or hit enter). In an R package, the code is run only once: when the package is built (code is only run when you hit the "build and reload" button).

So if you put a script in your R/ directory, it'll be run when you hit "build and reload"

As a BAD EXAMPLE to demonstrate this, create a new Rscript and paste this in there.

print("this is why not to use scripts")
4 + 5

Save the file under scriptsAreBad.R in the R/ directory. Now rebuild your project. Notice anything weird?

So how do we get our code to run when we want it to? Make it a function. Here's a good example (paste this code into a new R script, save it in R/, and rebuild).

functionsAreGood <- function() {
    print("this is why not to use scripts")
    return(4 + 5)
}

Now run functionsAreGood(). Our code runs when its intended to. Yay!

As you can see, when adding code to your packages, it is a much better idea to use functions instead of script. If you want to add a script to your package that does a lot of work, it is usually best to break it down into a couple smaller functions that each do a small, discrete chunk of the work (loading data/doing calculations/etc.) instead of having one big function that does everything.

Another thing to keep in mind. Because code only is run when the package is built, how do we run code from other packages? We can't use library() because that only gets run when we build things!

Answer: the :: operator. Let's try an example using ggplot2.

Save this into a new .R file:

testPlot <- function() {
  ggplot2::ggplot(mapping = ggplot2::aes(x = 1:10, y = 1:10)) +
    ggplot2::geom_point()
}

# Build/reload, then run `testPlot()`. This *should* work. 
testPlot()

Awesome! So that is the basics of putting stuff and getting it to work in the R/ directory. Note that when you are saving functions/other stuff into .R files, the name of the file doesn't matter, and you can have any number of functions/things in a single .R file. To jump to a function press "Ctrl + ." and then start typing its name!

One final note: R doesn't actually read anything in subdirectories of R/. So this means you can't organize your files using folders, which is annoying. Your best bet for organizing your files is clever naming/grouping related functions in a single file.


data/ (directory)

This directory holds data. More specifically, this directory holds data that your users can see and access. Ever wondered where the example data comes from in packages? This is it.

Let's see if we can include the gapminder dataset in our package.

# load the gapminder data...
library(gapminder)
gap <- gapminder
# check to make sure its what we think it is...
head(gap)
# save it under gapminder.rda in the data/ directory
devtools::use_data(gap)
# note: this is equivalent to the following line:
# save(gap, file = "data/gap.rda")

Now let's see if we can use it! Rebuild your package then enter the following code:

# Get rid of the data to test loading it as a demo
rm(gap)
# See what datasets are available. Scroll to the bottom and our data should be there!
data()
# To load the data
data(gap)
head(gap)

man/ (directory)

This is where the manuals for our functions/stuff lives. Also known as what pops up when we type `?functionName'

Let's open it up and examine the example docs (hello.Rd) for the hello() function that our package came up with. Looks complicated. I'd rather not type that out. Why not let RStudio make it for us?

In an awesome turn of events, we don't actually have to ever touch the man/ folder to create documentation. When using roxygen2 to generate documentation, all the work is done where the actual function/object is defined (in the .R file in the R/ folder).

Let's go back to the R/ folder and make a new function that does basic addition as an example.

Copy and paste this into a new .R file (or write a function of your own!).

adder <- function(x, y) {
  return(x + y)
}

Now, with the keyboard cursor located somewhere inside the function, press Ctrl+Alt+Shift+R (or you can go to Code -> Insert Roxygen skeleton in the menu). Your function should now look like this:

#' Title
#'
#' @param x 
#' @param y 
#'
#' @return
#' @export
#'
#' @examples
adder <- function(x, y) {
  return(x + y)
}

This is a "skeleton" set of documentation for you to add to. Note that the #' is a special type of comment that indicates that each line is something for Roxygen to read and document. Each item of information is preceded by the @ character.

Heres what each of the common tags mean:

I've filled it out with the necessary information here as an example:

#' Add two numbers
#'
#' This is a riveting description that describes what your function does. This
#' is a link to \code{\link{hello}}.
#'
#' @param x A number
#' @param y Another number!
#'
#' @return Returns the result of addition
#' @export
#'
#' @seealso \code{\link{hello}}
#'
#' @examples
#' adder(1, 2)
#'
adder <- function(x, y) {
  return(x + y)
}

Now run devtools::document() and build/reload your project. Now when you type ?adder (or search for the documentation in any other manner), the documentation we just wrote will pop up.


tests/ (directory)

Sometimes you want to want to check your code to make sure it runs properly. You can test it informally (by testing functions manually) or you can automate it with a package like testthat (also called "unit testing"). I'll give a quick example or two on how to perform unit testing using testthat.

devtools::use_testthat() will set up your package to be run with testthat.

To create a unit test, create a .R file inside tests/testthat/.

Inside this file, create a series of tests using the following format:

test_that("message that explain what this test does", {
  expectation1
  expectation2
  expectation3
})

Simply put, a test generally tests one "unit" of functionality. Expectations are the actual individual trials your code needs to complete to pass this test. Your expectations can vary, but generally the left side of an expectation must match the right side. Here are a few examples of what testthat expectations look like:

expect_equal() - Used to test whether two numbers are the same.

expect_equal(number1, number2)

expect_match() - Does the string on the left contain the string on the right (CASE-SENSITIVE)?

expect_match(string_you_are_testing, does_it_contain_this)

expect_error() and expect_warning() - does your code generate a warning/error like it's supposed to?

expect_warning(expression_to_test)
expect_warning(expression_to_test, error_message)

Let's try making an example test file for our adder() function. Note that I've added a function called context() with the description of what this file is testing near the top.

context("We're testing the adder function")

test_that("Adder adds numbers together properly", {
  expect_equal(adder(2, 3), 5)
  expect_equal(adder(2, -2), 0)
})

test_that("Adding fails when given weird input", {
  expect_error(adder(1, "ted"))
  expect_equal(adder(4, NA), NA)
})

IMPORTANT: your test file needs to start with "test" or testthat won't find it!

To run our tests, we simply use:

devtools::test()

Huh, that's weird. It appears that NA is not actually equal to NA here (due to being different object types). I learned something new there... but thats informative and demonstrates what happens when you fail a test! (note that the passed expectations in a test file are represented by "."s).

As a side note, I love that "this is why not to use scripts" keeps popping up over and over as a nice reminder...


Using and distributing your package

Those are the raw basics to making a package. Sure, you add other stuff as well, like compiled C++ or Java code, unit tests (see testthat package), or other package-specific features, but this should be enough to get started.

How do you use a package that you've written? It's as simple as using library(yourPackageName), the same as any other package. After all, we did just build a real package.

What if you want to use your package on another computer? What if you have a colleague that wants to use it on their computer? devtools::install_github("yourGitHubAccount/repositoryName") will install the package for you along with the required dependencies (that you specified in the "Imports" section in the DESCRIPTION file). The only prerequisites are that your end user has the devtools package and the necessary requirements to build stuff from source (RTools on Windows installations). You obviously need to have your package on GitHub for this to work, but an equivalent command also exists for other online code repositories as well (as an example devtools::install_bitbucket() will install something from BitBucket).


Common errors (that I've run into at least)

Package won't build, throwing an error with a traceback.

Package won't build, giving an error that says something like "setwd(), permission denied" .... In my experience (only run into this on Windows so far), either one of the following two things are wrong:

Documentation is unavailable for a function, even though you just rand devtools::document()

A function/dataset/documentation element cannot be found even though you are 100% sure it's there and have rebuilt the package.

"Cyclic namespace dependency detected" error

I want to get rid of a package. I've either renamed it or simply don't want it anymore.



jstaf/TestPackage documentation built on May 20, 2019, 2:11 a.m.