README.md

Notes from R Packages Book by Hadley Wickham

Dan Yavorsky

Minimum Package Creation with VCS and C++

Create a Package

Connect to GitHub

First, track the package with git:

Second, create a GitHub repo:

Last, connect the local project to the GitHub repo:

git remote add origin git@github.com:dyavorsky/how-to-r-pkg.git  
git push -u origin master

To work with Git/GitHub:

You'll want a markdown README file at the top level of the directory so that GitHub renders it on the repo's landing page. It's best to add README.Rmd and README.html to .gitignore so there's no confusion about which file GitHub renders (README.md).

Package Pieces

Package Development Workflow

To load the development version of the package, run devtools::load_all(). The easier way, however, is to press Ctrl-Shift-L in RStudio, which saves all open files and loads the package.

You'll mostly be writing functions. That workflow is:

To omit a directory or file when building a package, add a regular expression that matches it to the .Rbuildignore file. For this package, the R Markdown version of this README (README.Rmd) has been added to .Rbuildignore. The line reads ^ReadMe\.Rmd$; patterns are matched case-insensitively, so this matches README.Rmd.
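As a rough check of how such a pattern behaves, the sketch below applies it to a few file names with grepl(). The file names are made up for illustration, and the perl = TRUE and ignore.case = TRUE settings are an assumption meant to mirror how .Rbuildignore entries are documented to work (Perl-like patterns, matched case-insensitively against paths relative to the package root).

```r
# The pattern as it would appear in .Rbuildignore (backslash doubled for R)
pattern <- "^ReadMe\\.Rmd$"

# Hypothetical paths, relative to the top of the package directory
files <- c("README.Rmd", "README.md", "docs/README.Rmd", "README.Rmd.bak")

# Assumed matching rules: Perl-like regex, case-insensitive
ignored <- grepl(pattern, files, perl = TRUE, ignore.case = TRUE)
names(ignored) <- files

# Only the first entry matches: ^ and $ anchor the whole path,
# so files in subdirectories or with extra suffixes are kept.
print(ignored)
```

The anchors matter: without ^ and $, the same pattern would also exclude the last two paths.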

When you document functions with roxygen (see below), that workflow is:

DESCRIPTION File

This file uses Debian Control Format (DCF), which means each line has a field name and a value, separated by a colon. When values span multiple lines, the continuation lines need to be indented.
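One way to see the format in action is to parse a small DCF file with base R's read.dcf(). This is only a sketch: the package name and field values below are invented, and the file is written to a temporary path rather than a real package.

```r
# A hypothetical DESCRIPTION; note the indented continuation line
desc_text <- c(
  "Package: examplepkg",
  "Title: What the Package Does",
  "Description: A longer description that spans",
  "    more than one line, so the continuation is indented.",
  "Version: 0.0.1"
)

path <- tempfile("DESCRIPTION")
writeLines(desc_text, path)

# read.dcf() returns a character matrix with one column per field
fields <- read.dcf(path)
print(colnames(fields))
print(fields[1, "Description"])
```

If the continuation line were not indented, the parser would try to read it as a new field and fail, which is why the indentation rule matters.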

Title and Description appear on the CRAN download page:

This file is where you declare dependencies on other packages (not in a library() or require() statement somewhere in your code!).

You can manually add information to the DESCRIPTION file, or you can use devtools::use_package("dplyr") to do it for you (in devtools 2.0 and later, this function lives in usethis: usethis::use_package("dplyr")).

It's a good idea to require a minimum version of other packages if they're dependencies. So your DESCRIPTION file might look something like this:

Imports:
  dplyr (>= 0.3.0.1),
  ggvis (>= 0.2)
Suggests:
  MASS (>= 7.3.0)
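To get a feel for how a constraint such as (>= 0.3.0.1) is evaluated, the sketch below compares version numbers with base R's package_version(). The candidate versions are made up; this illustrates version ordering only, not the actual check performed at install time.

```r
# Minimum version taken from the Imports example above
required <- package_version("0.3.0.1")

# Hypothetical versions that might be installed on a user's machine
candidates <- package_version(c("0.3.0", "0.3.0.1", "0.4.1"))

# Versions compare component by component, not as strings or decimals
meets_requirement <- candidates >= required
print(data.frame(version = as.character(candidates), ok = meets_requirement))
```

Note that 0.3.0 does not satisfy the requirement even though it shares the first three components with 0.3.0.1.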

NAMESPACE File

This file controls how your package interacts with the rest of R, importantly making it self-contained. It controls imports, which define how a function in one package finds a function in another; and exports, which define which of your functions are available outside of your package.
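The split between exported and internal objects can be inspected for any installed package. The sketch below uses the stats package, chosen only because it ships with R; the same calls should work for other packages.

```r
# Names that stats makes available to users and to other packages
exports <- getNamespaceExports("stats")

# Everything defined inside the namespace, exported or not
all_objects <- ls(asNamespace("stats"), all.names = TRUE)

# Objects that exist in the namespace but are not exported
internal <- setdiff(all_objects, exports)

print("median" %in% exports)  # median() is part of the public interface
print(length(internal) > 0)   # some objects are kept internal
```

Internal objects remain usable by the package's own functions, which is what keeps a package self-contained without cluttering the user's search path.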

We will not generate the NAMESPACE file by hand, but will instead use roxygen2, as we do for function/dataset documentation (see below). The NAMESPACE workflow will be the same as the documentation workflow:

For a function to be usable outside of your package, you must export it. Do this by putting #' @export in a roxygen block. Roxygen will handle the specific "export" code in the NAMESPACE file for you. (For datasets that live in data/, they don't use the usual namespace mechanism and don't need to be exported.)
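A minimal sketch of what this looks like in an .R file follows. Both function names are hypothetical; the point is only that the first block carries @export and the second does not.

```r
#' Add one to a number.
#'
#' @param x A numeric vector.
#' @return \code{x} with one added to each element.
#' @export
add_one <- function(x) {
  check_numeric(x)
  x + 1
}

# Internal helper: no @export tag, so roxygen would not add it to NAMESPACE
# and it would only be reachable from inside the package.
check_numeric <- function(x) {
  if (!is.numeric(x)) {
    stop("`x` must be numeric", call. = FALSE)
  }
  invisible(x)
}

print(add_one(1:3))
```

After running the documentation step, roxygen would be expected to write an export(add_one) line to NAMESPACE, while check_numeric stays private.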

If your package uses functions from another package, you must import them (i.e., that package must be loaded or loaded-and-attached). Here's the recommended way to do so:
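One common pattern, which may or may not match the steps originally listed here, is to declare the other package under Imports in DESCRIPTION and then call its functions with an explicit pkg::fun() prefix. For packages listed under Suggests, a run-time check with requireNamespace() is a typical companion. The function names below are hypothetical, and stats stands in for a real dependency because it is always installed.

```r
# Imports-style dependency: call the function with an explicit namespace
col_medians <- function(df) {
  vapply(df, stats::median, FUN.VALUE = numeric(1))
}

# Suggests-style dependency: check availability before using it
has_suggested <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

toy <- data.frame(a = c(1, 2, 3), b = c(10, 20, 60))
print(col_medians(toy))
print(has_suggested("stats"))
```

The explicit prefix avoids attaching the other package, so your code does not depend on what the user happens to have on the search path.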

C++

To set up your package with Rcpp:

This does the following:

The workflow will be:

"Build and Reload" does a lot of work behind the scenes. One of those things is to call Rcpp::compileAttributes(), which inspects .cpp functions looking for attributes of the form // [[Rcpp::export]]. When it finds one, it generates the code needed to make the function available in R, and creates src/RcppExports.cpp and R/RcppExports.R. Never modify these by hand.

Two important parts of each C++ file are:

  1. The header

    #include <Rcpp.h>
    using namespace Rcpp;

  2. Making the function available in R

    // [[Rcpp::export]]

A bit about package development

For package development, it's helpful to know about the 5 states that a package can be in.

  1. Source: Just a directory of files and subdirectories (i.e., the development version of a package)
  2. Bundled: A compressed, single file (.tar.gz) of the package; not useful on its own, but a useful intermediary step
  3. Binary: Platform-specific; how packages are distributed via CRAN
  4. Installed: A decompressed binary package
  5. In-Memory: to use a package, you need to load it into memory
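The last two states can be told apart from within an R session. The sketch below uses the tools package purely as an example because it ships with R; the variable names are illustrative.

```r
pkg <- "tools"

# Installed: the package exists in a library on disk
is_installed <- nzchar(system.file(package = pkg))

# In memory: load the namespace without attaching it to the search path
invisible(loadNamespace(pkg))
is_loaded <- isNamespaceLoaded(pkg)

# Attached: on the search path, as after library(tools)
is_attached <- paste0("package:", pkg) %in% search()

print(c(installed = is_installed, loaded = is_loaded, attached = is_attached))
```

A package can be loaded without being attached, which is what happens when you call one of its functions with the pkg::fun() form.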

The command line tool R CMD INSTALL powers all package installation. The R package devtools provides functions that are wrappers for R CMD INSTALL so that you can call this command from inside of R.

Additional Topics

Documentation Workflow with Roxygen2

The four basic steps are:

  1. Add roxygen comments to your .R files
  2. Run devtools::document() (or press Ctrl-Shift-D in RStudio; this must be enabled in Package Options > Build Tools) to convert roxygen comments to .Rd files
  3. Preview documentation with ?
  4. Repeat

Documenting R code with roxygen2 involves putting the help documentation directly into the .R code files using roxygen comments, which start with #'. In C++ files, roxygen comments start with //'. Lines should be wrapped at 80 characters.

Roxygen comments come in blocks. A block is all the documentation for a specific function and it goes before the function. Thus one .R file can have multiple documented functions.

Each block is made up of an introduction and tags with the format @tagname details. The intro has a title, description, and (optionally) details. Then you include tags for documentation elements.

Common tags are:

Functions

Navigation

Finding Documentation

Datasets

Other

Rd character formatting

Character

Links

Lists

Math

So an example might be:

#' Sum of vector elements.
#'
#' \code{sum} returns the sum of all the values present in its arguments.
#'
#' This is a generic function: methods can be defined for it directly
#' or via the \code{\link{Summary}} group generic. For this to work properly,
#' the arguments \code{...} should be unnamed, and dispatch is on the
#' first argument.
#'
#' @param ... Numeric, complex, or logical vectors.
#' @param na.rm A logical scalar. Should missing values (including NaN)
#'   be removed?
#' @return If all inputs are integer and logical, then the output
#'   will be an integer. If integer overflow
#'   \url{http://en.wikipedia.org/wiki/Integer_overflow} occurs, the output
#'   will be NA with a warning. Otherwise it will be a length-one numeric or
#'   complex vector.
#'
#'   Zero-length vectors have sum 0 by definition. See
#'   \url{http://en.wikipedia.org/wiki/Empty_sum} for more details.
#' @examples
#' sum(1:10)
#' sum(1:5, 6:10)
#' sum(F, F, F, T, T)
#'
#' sum(.Machine$integer.max, 1L)
#' sum(.Machine$integer.max, 1)
#'
#' \dontrun{
#' sum("a")
#' }
#'
#' @section Warning:
#' Do not operate heavy machinery within 8 hours of using this function.
#'
#' @family aggregate functions
#' @seealso \code{\link{prod}} for products, \code{\link{cumsum}} for cumulative
#'   sums, and \code{\link{colSums}}/\code{\link{rowSums}} marginal sums over
#'   high-dimensional arrays.
#'
sum <- function(..., na.rm = TRUE) {}

Vignettes

Common vignette browsing commands:

To create a vignette with devtools, run devtools::use_vignette("my-vignette") (usethis::use_vignette("my-vignette") in devtools 2.0 and later), which does three things:

  1. Creates a vignettes/ directory
  2. Adds the necessary dependencies to the DESCRIPTION file (adds knitr to the Suggests and VignetteBuilder fields)
  3. Drafts a template vignette, vignettes/my-vignette.Rmd

Workflow:

CRAN Notes

You build vignettes locally. CRAN only receives the output (html/pdf) and the source code. CRAN does not rebuild the vignette; it only checks that the code is runnable (by running it).

Testing

[ADD TESTING CHAPTER NOTES]

Checking

[ADD CHECKING CHAPTER NOTES]

Datasets

You can include data in your package. .RData datasets go in data/; raw datasets go in inst/extdata/.

#' Prices of 50,000 round cut diamonds.
#'
#' A dataset containing the prices and other attributes of almost 54,000
#' diamonds.
#'
#' @format A data frame with 53940 rows and 10 variables:
#' \describe{
#'   \item{price}{price, in US dollars}
#'   \item{carat}{weight of the diamond, in carats}
#'   ...
#' }
#' @source \url{http://www.diamondse.info/}
"diamonds"

For CRAN, datasets should be less than 1MB and compressed.
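The layout and size guidance above can be sketched with base R alone. Everything here is illustrative: the package directory is a temporary folder, the toy data frame is invented, and the .rda extension and xz compression are assumptions rather than requirements stated in these notes.

```r
# A throwaway directory standing in for a package source tree
pkg_dir <- file.path(tempdir(), "examplepkg")
data_dir <- file.path(pkg_dir, "data")
raw_dir <- file.path(pkg_dir, "inst", "extdata")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)

# Hypothetical dataset
toy <- data.frame(price = c(326, 327), carat = c(0.23, 0.21))

# Binary, compressed R data goes in data/
rda_path <- file.path(data_dir, "toy.rda")
save(toy, file = rda_path, compress = "xz")

# Raw data goes in inst/extdata/
csv_path <- file.path(raw_dir, "toy.csv")
write.csv(toy, csv_path, row.names = FALSE)

# Rough size check against the 1MB guideline
size_ok <- file.size(rda_path) < 1e6

# Confirm the saved object round-trips
reloaded <- new.env()
load(rda_path, envir = reloaded)
print(c(size_ok = size_ok, same = identical(reloaded$toy, toy)))
```

In an installed package, files under inst/extdata/ would typically be located with system.file("extdata", "toy.csv", package = "examplepkg"), where the package name is again hypothetical.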



dyavorsky/HowToPkg documentation built on Feb. 2, 2024, 8:24 p.m.