Dan Yavorsky
Run git init in the top level directory of the package.
To make an existing package directory an RStudio project, run devtools::use_rstudio("path/to/package").
First, track the package with git by running
git init
in a shell after you cd into the package's directory.
Second, create a GitHub repo.
Last, connect the local project to the GitHub repo:
git remote add origin git@github.com:dyavorsky/how-to-r-pkg.git
git push -u origin master
To work with Git/GitHub:
Add files you don't want tracked to .gitignore to keep them untracked (you can do this with a right-click in the Git pane of RStudio).
You'll want a markdown README file at the top level of the directory so that GitHub renders it on the repo's landing page. It's best to add README.Rmd and README.html to .gitignore so there's no confusion about which one GitHub renders (the README.md one).
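Appending to .gitignore can also be scripted from R. The sketch below is a hypothetical helper, add_ignore_entries() (not a devtools function), that appends only the entries not already listed. The demo writes to a temporary file rather than a real repository.

```r
# Hypothetical helper (not part of devtools): append entries to an ignore
# file such as .gitignore, skipping any that are already listed.
# Assumes an existing file ends with a newline.
add_ignore_entries <- function(path, entries) {
  existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character()
  new_entries <- setdiff(entries, existing)
  if (length(new_entries) > 0) {
    # write() puts one character element per line
    write(new_entries, file = path, append = TRUE)
  }
  invisible(new_entries)
}

# Demo against a temporary file instead of a real repository.
gitignore <- tempfile(fileext = ".gitignore")
add_ignore_entries(gitignore, c("README.Rmd", "README.html"))
add_ignore_entries(gitignore, "README.Rmd")  # already listed: nothing added
readLines(gitignore)
```

Skipping existing entries keeps the helper safe to run repeatedly.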
R code goes in the R/ directory. You cannot put subdirectories in there.
Compiled code (e.g., C++) goes in the src/ directory.
Documentation goes in the man/ directory (done automatically with roxygen comments).
Data goes in the data/ directory.
Other files, such as CITATION or raw data (extdata), go in the inst (installed files) directory.
The DESCRIPTION and NAMESPACE files live at the top level.
Include a README file, especially if the package is on GitHub.
To load the development version of the package, run devtools::load_all(). The easier way, however, is to press Ctrl-Shift-L in RStudio, which saves all open files and loads the package.
You'll mostly be writing functions. That workflow is:
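To omit a directory or file when building a package, add a regular expression (RegEx) that matches it to the .Rbuildignore file. For this package, the R Markdown version of this README file (README.Rmd) has been added to .Rbuildignore. The line reads ^ReadMe\.Rmd$ (R CMD build matches these patterns without regard to case, so this also matches README.Rmd).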
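To check what a .Rbuildignore line will exclude, here is a minimal sketch in base R. build_ignore_pattern() and is_build_ignored() are hypothetical helpers. The matching assumes, per our reading of Writing R Extensions, that R CMD build treats each line as a Perl-style regex matched against paths relative to the package root, without regard to case.

```r
# Hypothetical helper: turn a literal file name into an anchored regex,
# escaping dots so "." only matches a literal dot.
build_ignore_pattern <- function(path) {
  paste0("^", gsub("\\.", "\\\\.", path), "$")
}

# Hypothetical helper: which paths (relative to the package root) would a set
# of .Rbuildignore patterns exclude? Perl regexes, case-insensitive.
is_build_ignored <- function(paths, patterns) {
  hits <- lapply(patterns, grepl, x = paths, perl = TRUE, ignore.case = TRUE)
  Reduce(`|`, hits, FALSE)
}

build_ignore_pattern("README.Rmd")
paths <- c("README.Rmd", "README.md", "R/README.Rmd")
is_build_ignored(paths, "^ReadMe\\.Rmd$")
```

The ^ and $ anchors prevent the pattern from also excluding files like R/README.Rmd or README.Rmd.bak.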
When you document functions with roxygen (see below), that workflow is:
The DESCRIPTION file uses Debian Control Format (DCF), which means each line has a field name and a value, separated by a colon. When values span multiple lines, the continuation lines need to be indented.
Title and Description appear on the CRAN download page:
Title: a one-line description (65 characters max, no punctuation, no formatting)
Description: a one-paragraph description (80 characters per line max; indent subsequent lines)
This file is also where you declare dependencies on other packages (not in a library() or require() call somewhere in your code!):
Imports: dplyr, ggvis means these packages must be present for your package to work.
Suggests: dplyr, ggvis is weaker: your package can use these packages but doesn't require them (see Hadley's R Packages, page 35).
You can add this information to the DESCRIPTION file manually, or you can use devtools::use_package("dplyr") to do it for you.
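A package listed only under Suggests may not be installed, so code that uses it is usually guarded at run time. Here is a minimal sketch with requireNamespace(), using MASS as the suggested package and a hypothetical function robust_center(). Packages under Imports are guaranteed to be installed, so they can be called with :: directly.

```r
# Hypothetical package function. MASS is assumed to be listed under Suggests,
# so it may be missing: check for it at run time and fall back if needed.
robust_center <- function(x) {
  if (requireNamespace("MASS", quietly = TRUE)) {
    MASS::huber(x)$mu   # Huber M-estimate of location
  } else {
    stats::median(x)    # fallback when MASS is not installed
  }
}

robust_center(c(2.1, 2.4, 2.2, 2.3, 9.8))
```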
It's a good idea to require a minimum version of other packages if they're dependencies. So your DESCRIPTION file might look something like this:
Imports:
    dplyr (>= 0.3.0.1),
    ggvis (>= 0.2)
Suggests:
    MASS (>= 7.3.0)
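Because DESCRIPTION is plain DCF, base R can read it with read.dcf(). The sketch below parses an illustrative DESCRIPTION (the package name and title are made up). It checks the Title against the length limit mentioned earlier, splits Imports into package names and minimum versions, and compares versions with package_version(). parse_deps() and meets_minimum() are hypothetical helpers.

```r
# An illustrative DESCRIPTION (hypothetical values) in DCF:
# "Field: value", with continuation lines indented.
desc_lines <- c(
  "Package: mypkg",
  "Title: Tools for Tidying Example Data",
  "Version: 0.1.0",
  "Imports:",
  "    dplyr (>= 0.3.0.1),",
  "    ggvis (>= 0.2)",
  "Suggests:",
  "    MASS (>= 7.3.0)"
)

# read.dcf() parses DCF into a character matrix, one row per record.
con <- textConnection(desc_lines)
desc <- read.dcf(con)
close(con)

# A check in the spirit of the Title rule: at most 65 characters, no final period.
title <- desc[, "Title"]
stopifnot(nchar(title) <= 65, !grepl("\\.$", title))

# Hypothetical helper: split a dependency field into names and minimum versions.
parse_deps <- function(field) {
  deps <- trimws(strsplit(field, ",")[[1]])
  data.frame(
    package = sub("\\s*\\(.*$", "", deps),
    min_version = ifelse(grepl("\\(>=", deps),
                         sub("^.*\\(>=\\s*([^)]+)\\).*$", "\\1", deps),
                         NA_character_),
    stringsAsFactors = FALSE
  )
}

imports <- parse_deps(desc[, "Imports"])
imports

# Hypothetical helper: package_version objects compare component by component,
# so "0.10" counts as newer than "0.9".
meets_minimum <- function(installed, required) {
  package_version(installed) >= package_version(required)
}
meets_minimum("0.3.1", imports$min_version[imports$package == "dplyr"])
```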
The NAMESPACE file controls how your package interacts with the rest of R, importantly making it self-contained. It controls imports, which define how a function in one package finds a function in another, and exports, which define which of your functions are available outside of your package.
We will not write the NAMESPACE file by hand, but will instead generate it with roxygen2, as we do for function and dataset documentation (see below). The NAMESPACE workflow will be the same as the documentation workflow:
For a function to be usable outside of your package, you must export it. Do this by putting #' @export in its roxygen block. Roxygen will write the corresponding export directive in the NAMESPACE file for you. (Datasets that live in data/ don't use the usual namespace mechanism and don't need to be exported.)
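You can see the difference between exported and internal objects by inspecting an installed package's namespace. This sketch uses base R's stats and tools packages as stand-ins for your own, and relies only on getNamespaceExports(), asNamespace(), and loadedNamespaces().

```r
# Only the names a namespace exports (the export() directives that
# @export generates) are reachable with ::.
stats_exports <- getNamespaceExports("stats")
"sd" %in% stats_exports

# Objects in the namespace that are not exported are internal: they are
# reachable with ::: but are not part of the package's public interface.
stats_ns <- asNamespace("stats")
internal <- setdiff(ls(stats_ns, all.names = TRUE), stats_exports)
length(internal) > 0

# :: loads a namespace without attaching it to the search path.
tools::file_ext("README.Rmd")
"tools" %in% loadedNamespaces()
```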
If your package uses functions from another package, you must import them (i.e., that package must be loaded or loaded-and-attached). Here's the recommended way to do so:
List the package in the Imports: field of the DESCRIPTION file, and then call its functions using :: (e.g., bayesm::breg())
To avoid typing :: repeatedly, import specific functions with @importFrom pkg fun as a roxygen comment
To import every exported function of a package, use @import pkg as a roxygen comment
To set up your package with Rcpp:
devtools::use_rcpp()
This does the following:
Creates a src/ directory to hold the .cpp files
Adds Rcpp to the LinkingTo and Imports fields in the DESCRIPTION file
Sets up a .gitignore file
Prompts you to add the roxygen tags (#' @useDynLib yourpkgname and #' @importFrom Rcpp sourceCpp) for the namespace imports
The workflow will be:
"Build and Reload" does a lot of work behind the scenes. One of those things is to call Rcpp::compileAttributes(), which inspects .cpp files for functions marked with an attribute of the form // [[Rcpp::export]]. When it finds one, it generates the code needed to make the function available in R, in the files src/RcppExports.cpp and R/RcppExports.R. Never modify these files by hand.
Two important parts of each C++ file are:
The header
#include <Rcpp.h>
using namespace Rcpp;
Making the function available in R (the attribute goes on the line immediately above the function definition)
// [[Rcpp::export]]
For package development, it's helpful to know about the 5 states that a package can be in.
The command line tool R CMD INSTALL powers all package installation. The R package devtools provides functions that wrap R's command line tools, so that you can call them from inside of R.
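To illustrate what such wrappers do, the sketch below calls R CMD INSTALL from inside R with system2(). It only asks for the usage text. install_from_source() is a hypothetical, stripped-down wrapper (not devtools' implementation) that is shown for its structure and not run.

```r
# Path to the R front end of the running installation; R CMD <tool> is how
# the command line build and install tools are invoked.
r_bin <- file.path(R.home("bin"), "R")

# Ask R CMD INSTALL for its usage text, capturing it as a character vector.
usage <- system2(r_bin, c("CMD", "INSTALL", "--help"), stdout = TRUE)
head(usage, 3)

# Hypothetical minimal wrapper (not run here): install a source package
# directory or bundle into a library and report success.
install_from_source <- function(pkg_path, lib = .libPaths()[1]) {
  status <- system2(r_bin, c("CMD", "INSTALL",
                             paste0("--library=", shQuote(lib)),
                             shQuote(pkg_path)))
  invisible(status == 0)
}
```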
devtools::install() is a wrapper for R CMD INSTALL
devtools::build() is a wrapper for R CMD build that turns source packages into bundles
devtools::install_github() downloads a source package from GitHub, runs build() to make vignettes, and then uses R CMD INSTALL to do the install
devtools::install_url() and devtools::install_bitbucket() work similarly
The four basic steps for documentation are:
Run devtools::document() (or press Ctrl-Shift-D in RStudio; this shortcut must first be enabled in Package Options > Build Tools) to convert roxygen comments to .Rd files, then preview the help page with ? (e.g., ?sum)
Documenting R code with roxygen2 involves putting the help documentation directly into the .R code files using roxygen comments, which start with #'. In C++ files, roxygen comments start with //'. Lines should wrap at 80 characters.
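The 80-character rule is easy to check mechanically. Here is a minimal sketch, assuming the source is available as a character vector of lines (e.g., from readLines()). long_roxygen_lines() is a hypothetical helper that flags #' and //' lines over the limit.

```r
# Hypothetical check: return the indices of roxygen comment lines
# (#' in R, //' in C++) that are longer than `width` characters.
long_roxygen_lines <- function(lines, width = 80) {
  is_roxygen <- grepl("^\\s*(#'|//')", lines)
  which(is_roxygen & nchar(lines) > width)
}

src <- c(
  "#' Add two numbers.",
  paste0("#' ", strrep("x", 90)),
  "add <- function(x, y) x + y"
)
long_roxygen_lines(src)
```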
Roxygen comments come in blocks. A block is all the documentation for a specific function and it goes before the function. Thus one .R file can have multiple documented functions.
Each block is made up of an introduction and tags with the format @tagname details. The introduction has a title, a description, and (optionally) details. Then you include tags for documentation elements.
Common tags are:
Functions
@param name description
@examples
@return description
Navigation
@seealso: point to other places: a web page (\url{}), a function in the same package (\code{\link{functionname}}), or a function in another package (\code{\link[packagename]{functionname}})
@family: use when all functions in a family should link to each other
Finding Documentation
@aliases alias1 alias2 adds aliases which can be used with ?
@keywords keyword1 keyword2: keywords must be taken from a predefined list found in file.path(R.home("doc"), "KEYWORDS")
Datasets
@format for providing an overview of a dataset
@source to provide details of where you got a dataset
Other
@section title for long documentation that requires section breaks
Rd character formatting
Character
\emph{italics}
\strong{bold}
\code{rfunction}
Links
\code{\link{function}}
\code{\link[package]{function}}
\link[destination]{name}
\url{http://google.com}
\href{http://google.com}{google}
\email{dyavorsky@gmail.com}
Lists
\enumerate{}
\itemize{}
\describe{}
Math
\eqn{}
\deqn{}
So an example might be:
#' Sum of vector elements.
#'
#' \code{sum} returns the sum of all the values present in its arguments.
#'
#' This is a generic function: methods can be defined for it directly
#' or via the \code{\link{Summary}} group generic. For this to work properly,
#' the arguments \code{...} should be unnamed, and dispatch is on the
#' first argument.
#'
#' @param ... Numeric, complex, or logical vectors.
#' @param na.rm A logical scalar. Should missing values (including NaN)
#' be removed?
#' @return If all inputs are integer or logical, then the output
#' will be an integer. If integer overflow
#' \url{http://en.wikipedia.org/wiki/Integer_overflow} occurs, the output
#' will be NA with a warning. Otherwise it will be a length-one numeric or
#' complex vector.
#'
#' Zero-length vectors have sum 0 by definition. See
#' \url{http://en.wikipedia.org/wiki/Empty_sum} for more details.
#' @examples
#' sum(1:10)
#' sum(1:5, 6:10)
#' sum(F, F, F, T, T)
#'
#' sum(.Machine$integer.max, 1L)
#' sum(.Machine$integer.max, 1)
#'
#' \dontrun{
#' sum("a")
#' }
#'
#' @section Warning:
#' Do not operate heavy machinery within 8 hours of using this function.
#'
#' @family aggregate functions
#' @seealso \code{\link{prod}} for products, \code{\link{cumsum}} for cumulative
#' sums, and \code{\link{colSums}}/\code{\link{rowSums}} marginal sums over
#' high-dimensional arrays.
#'
sum <- function(..., na.rm = TRUE) {}
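Because @keywords values must come from R's KEYWORDS file, it can help to extract the valid names. This sketch assumes each entry in that file is a tab-indented keyword followed by & and a description. parse_keywords() is a hypothetical helper, and the sample lines are illustrative rather than copied from the file.

```r
# Hypothetical helper: extract keyword names from lines shaped like
# "<tab>keyword<tab>&<tab>Description" (the assumed KEYWORDS format).
parse_keywords <- function(lines) {
  entries <- lines[grepl("&", lines, fixed = TRUE)]
  trimws(sub("&.*$", "", entries))
}

sample_lines <- c(
  "Basics",
  "\tdatasets\t&\tData sets",
  "\tinternal\t&\tInternal functions"
)
parse_keywords(sample_lines)

# On a real installation, read the actual list and check your @keywords.
kw_file <- file.path(R.home("doc"), "KEYWORDS")
if (file.exists(kw_file)) {
  valid <- parse_keywords(readLines(kw_file))
  c("datasets", "notakeyword") %in% valid
}
```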
Common vignette browsing commands:
browseVignettes() to see all installed vignettes
browseVignettes("packagename") to see vignettes for a specific package
vignette("name") to read a vignette
edit(vignette("name")) to see the vignette's code
To create a vignette with devtools, run devtools::use_vignette("my-vignette"), which does 3 things:
Creates a vignettes/ directory
Modifies the DESCRIPTION file (adds knitr to the Suggests and VignetteBuilder fields)
Drafts a vignette, vignettes/my-vignette.Rmd
Workflow:
CRAN Notes
You build vignettes locally. CRAN only receives the output (html/pdf) and the source code. CRAN does not rebuild the vignette; it only checks that the code is runnable (by running it).
[ADD TESTING CHAPTER NOTES]
[ADD CHECKING CHAPTER NOTES]
You can include data in your package. .RData datasets go in data/; raw datasets go in inst/extdata/.
Save .RData files in the data/ directory with the same name as the object they contain once loaded into R's workspace.
Set LazyData: true in the DESCRIPTION file, so that datasets don't occupy memory until used.
Document each dataset with roxygen, using @format and @source:
#' Prices of 50,000 round cut diamonds.
#'
#' A dataset containing the prices and other attributes of almost 54,000
#' diamonds.
#'
#' @format A data frame with 53940 rows and 10 variables:
#' \describe{
#' \item{price}{price, in US dollars}
#' \item{carat}{weight of the diamond, in carats}
#' ...
#' }
#' @source \url{http://www.diamondse.info/}
"diamonds"
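The naming rule matters because load() restores objects under the names they were saved with, not under the file name. Here is a minimal sketch with a hypothetical prices data frame, saved into the data/ directory of a scratch package under tempdir().

```r
# Illustrative dataset (hypothetical values).
prices <- data.frame(
  carat = c(0.23, 0.21, 0.29),
  price = c(326L, 326L, 334L)
)

# Save it under data/ in a scratch package directory, naming the file after
# the object it contains.
pkg_dir <- file.path(tempdir(), "mypkg")
dir.create(file.path(pkg_dir, "data"), recursive = TRUE, showWarnings = FALSE)
rda_path <- file.path(pkg_dir, "data", "prices.rda")
save(prices, file = rda_path)

# load() returns the names of the objects it restored; keeping file name and
# object name identical means prices.rda provides an object called prices.
env <- new.env()
loaded <- load(rda_path, envir = env)
loaded
```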
For CRAN, datasets should be less than 1MB and compressed.
Run tools::checkRdaFiles() to determine the best compression for each file.
Re-save the data with devtools::use_data(), with the compress argument set to that optimal value.
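One way to find the best compression is to save the same object with each method and compare the file sizes. Here is a sketch with an illustrative data frame. tools::checkRdaFiles() then reports each file's size and compression type. The winning method (best) is the value you would pass as compress to devtools::use_data().

```r
# Illustrative data frame, large enough for compression to matter.
example_df <- data.frame(
  id = rep(1:1000, times = 20),
  grp = rep(c("a", "b", "c", "d"), length.out = 20000)
)

# Save one copy per compression method and compare file sizes.
methods <- c("gzip", "bzip2", "xz")
paths <- file.path(tempdir(), paste0("example_df-", methods, ".rda"))
for (i in seq_along(methods)) {
  save(example_df, file = paths[i], compress = methods[i])
}
sizes <- setNames(file.size(paths), methods)
sizes
best <- names(which.min(sizes))

# checkRdaFiles() reports size, format and compression of existing files.
info <- tools::checkRdaFiles(paths)
info[, c("size", "compress")]
```

For files already in data/, tools::resaveRdaFiles() can re-save them in place with a chosen compression.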