There are two ways to call C code from : .C and .Call. .C is more straightforward, since it automatically between R and C data-structures. .Call is more more complicated, but gives additional flexibility required for more complex packages. The first bulleted list in this article provides a useful overview of the differences between the two approaches.

Of course, here we are focused on specifically how to use C-code in an R Package. There are many articles which cover these components, but I needed to triangulate about 7/8 of them, talk to people, find examples on github, etc. before I finally got a hang of it. This document summarizes what I learned so that I can use it as a reference.

Adding C Code using .Call

I started using .Call over .C because the algorithm I am working on for my thesis uses a lot of memory. So, I needed as much memory flexibility as possible when working on my algorithm. This is available from .Call, since .C requires all the memory to be allocated from R. When I was trying to allocate large R arrays, my session would freeze or crash.

I want to write as much of my code as possible using C, only using R for input and output. Taylor pointed me to the the following R package, that contains both a c library, and an R package calling this C library. In addition, Taylor wrote an R package called primesum, implementing a condensed version of the glmgen package.

The structure of these two packages was ideal: An R-function using the .Call()interface. This R function calls a function defined in r-wrapper.c, which converts the R data structure to C, runs the c-function, then converts the output back into an R data-structure. Conceptually, this is straightforward. But there are several smaller details required for getting the code working efficiently in an R Package.

The code

Lets start with the R function, located as usual at R/dft.R. The algorithm I am implementing for my thesis is a type of discrete Fourier transform, so here we will implement the DFT. The goal is getting the structure of code, wrappers, etc down, so hopefully the complexities of including complex number do not make this more challenging. Our R-function is:

#' Discrete Fourier Transform
#'
#' @param x real or complex vector
#' @param inverse logical, defaults to FALSE
#'
#' @useDynLib leeR dft_
#'
#' @export
#'
#' @return the DFT of x
dft <- function(x, inverse = FALSE) {
  # If inverse is true: 1, false: -1
  inverse <- ifelse(inverse, 1, -1)

  # Ensure the input is a complex number
  if (is.complex(x) == FALSE) {  x <- as.complex(x) }

  # Call the C-code with the DFT implementation
  a <- .Call("dft_", x, inverse, PACKAGE = "leeR")

  return(a)
}

For now, the most important part is the second line of non-commented code starting with a <-. This line uses the .Call interface, the focus of this section. The first argument of .Call gives the name of the c-function that we are calling. In this case, we have a c-function called dft_ defined in our r-wrapper function. The second and third arguments just inputs to the C function, and the final argument specifies either the Dynamic Linked Library .dll, or Shared Object .so library containing this function. In our case, this is simply the name of our package, and our final built package will have a libs/leeR.so shared object file, which is what the PACKAGE argument is calling.

And that is it for the .Call interface. You do not need to allocate the outputs from the function, as the raw R objects are passed to C to be dealt with.

Now, the .Call function is looking for a c-function with name dft_, in the leeR.so shared object library. In our case, this is located in the file with all of our r-wrapper functions, in src/r-wrapper.c. This file looks like:

```{c, eval=FALSE}

include

include

include

// R-wrapper C dft function SEXP dft_(SEXP _x, SEXP _inverse) { // Convert SEXP variables to C variables --- int inverse = asInteger(_inverse);

Rcomplex *x; x = COMPLEX(_x);

// Allocate the outputs for returning to R -- int n = length(x); SEXP a = PROTECT(allocVector(CPLXSXP, n)); Rcomplex *a; a = COMPLEX(a_);

// Call the C dft function using C variables int n = length(_x); dft(x, a, inverse, n);

// Unprotect the a_ vector from garbage collection UNPROTECT(1);

return(a_); }

Now, if you add this file and try to run, it might not work. See the final section for details on how to do this, but we need to define a header file for the wrapper functions, and tell R where to compile from using Makevars. In addition, make sure you have the `#' @useDynLib leeR dft_` type line in your R-function.

On to the c-wrapper function. The first thing that jumps out at you is the SEXP variables. As it turns out, all R-objects are internally represented as SEXP's, short for "S Expression". SEXP's are pointers to SEXPREC structures, which hold the R objects we are familiar with (numeric, integer, character, complex. etc). SEXPREC structures have a 5-bit header specifying the 2^5 = 32 SEXP types.  

Since we are using the R API, we need to include the R.h and Rinternals.h header files.  R.h is R's API, defining almost all of R's functionality. Rinternals.h gives you the data structures, and dft.h is the header file for the c-functions we will be calling. The inputs and outputs to our function are SEXPs, which just implies that the input is an R object, and the output is an R object, consistent with what we are expecting.

Now, the purpose of this c-wrapper is converting R-objects into c-variables, running the computation in c using the c-variables, and converting the output back to R-objects. Our first step is converting the R SEXPs to c-variables, shown in the first section of the function. All of the functions used here are available from [this cheatsheet](http://tolstoy.newcastle.edu.au/R/e17/devel/att-0724/R_API_cheat_sheet.pdf). From this, we can convert the integers to c-int's using asInteger, and access the complex vector using COMPLEX. Vectors are a bit more tricky, but what worked for me is initializing a pointer with the internal C structure, then using the accessor funcion, (in this case, COMPLEX) for directing this pointer to the appropriate data.

In this situation, our c-function takes a pointer for input and output. So, next we allocate the output we are returning to R. To do this, we use the allocVector function, allocating a length-n complex vector. A complex vector is internally represented as an Rcomplex, which is just to C99 complex doubles, meaning that each complex element is 16 bytes. To access this vector with c-functions, we do the same thing we did for the input. Specifically, we create a pointer using R's internal structure, and point it to the complex data of our allocVector output using our COMPLEX accessor function. PROTECT should always be used with allocVector, since this tells R not to perform garbage collection on this object. All thats needed is remembering to UNPROTECT at the end of the function.

_SIDE NOTE: Since Rcomplex is a C99 double complex, can I get away with using complex.h in my dft function, opposed to Rcomplex (which defeats the point)_

Finally, all that remains is calling the c-function, unprotecting, and returning the final r-object!

The C function is probably the least important component of this example, since it is just a c-function working with c-variables. In addition, I probably made this more complicated than it should be, since I need a lot of extra parts to get the complex number portions to work (I would love to find a way around this! using the C99 complex.h header instead of Rcomplex.) Anyway, The C-code is:

```{c, eval = FALSE}
#include <stdio.h>
#include <stdlib.h>
#include <dft_utils.h> 
#include <R.h> 

void dft(Rcomplex *x, Rcomplex *a, int inverse, int n) {
  // Initialize the iterator variables
  int j, k;

  // Exponent of the complex exponential
  double exponent;

  // Allocate variables needed for the coefficient
  Rcomplex tot, val, twiddle;

  // Loop over coefficients
  for (k = 0; k < n; k++) {
    // Initialize the value to 0 + i 0
    tot.r = 0;
    tot.i = 0;

    // Loop over data-vector
    for (j = 0; j < n; j++) {
      // Get the twiddle factor for this particular iteration
      exponent = ((j * k) / (double) n);
      twiddle = get_twiddle(exponent, inverse);

      // Multiply the x-value by the current twiddle factor
      val = mult(x[j], twiddle);

      // Add the total variable to the current calculated value
      tot = add(tot, val);
    }

    a[k] = tot;
  }
}

This is just raw c-code, the only R-dependent part is that we are using the R.h header for the Rcomplex data-type. This is the core coding portion of of writing an R-wrapper to a c-function. However, if we are doing this inside of a package, it still needs a few more details in order to get it working. These are covered in the next section.

Additional Details for Packages

The first thing we want to do is make the functions in our src/r-wrapper.c available from the package. To do this, the common practice from package I've seen:

https://github.com/statsmaths/glmgen https://github.com/joshuaulrich/xts https://github.com/hadley/dplyr/

is adding a package header file to inst/include/leeR.h. This header file defines all of the functions with r-wrappers. To do this, lets create the file and add:

```{c, eval = FALSE}

include

include

// Define the r-wrapper functions used in the leeR package SEXP dft_(SEXP _x, SEXP _inverse);

This uses R and Rinternals, definining the wrapper function we wrote in the previous section. Now, to make sure we are compuling in the right space, create a `src/Makevars` file with:

```r
PKG_CPPFLAGS = -I../inst/include

Now, in order to use the leeR.so object with our compiled c-code, we need to add useDynLib(leeR,dft_) to our namespace file (Section 1.5.4 in writing R extensions for more details). This loads the function dft_ from the shared object leeR.so using the library.dyam function. This happens "after the package code has been loaded and before running the load hook function", I am not entirely clear what that means, but it makes it so the package loads it before using it, I think. Fortunately we can use devtools and roxygen2 to do this, by putting

#' @useDynLib leeR dft_

in the documentation of our R-function.

Finally, the last part of the process involves registering our .Call (and .C, .Fortan functions) explicitly. This helps for making sure our functions work accross operating systems. An unbelievably clear presentation of this was given by Duncan Temple Lang, and this is mainly what I am following here. His reasoning:

"That’s really the point: we are depending on highly system-specific features that are not entirely reproducible and can be very, very frustrating to diagnose. Ideally, we want S to help out and tell us we are calling native routines with the wrong function, signal that we have the wrong number of arguments, and perhaps even convert those arguments to the appropriate types ..."

"... Well, there is a better approach which allows S to do exactly these things. The idea is to have the DLL explicitly tell S which routines are available to S, and for which interface mechanisms (.C(), .Call(), . . .). R stores the information about these routines and consults it when the user calls a native routine. When does the DLL get to tell R about the routines? When we load the DLL, R calls the R-specific initialization routine in that DLL (named R_init_dllname()), if it exists. This routine can register the routines as well as performing any other initialization it wants"

So, we are explicitly registering our functions with .Call (and .C, etc.) interfaces. To do this, we add a file src/r-init with:

```{c, eval = FALSE}

include

include

include

include

/ --- REGISTER FUNCTIONS --- /

// Create an array of all methods using .Call interface // First Argument: Name of function // Second Argument: Pointer to the actual routing // Third Argument: Number of arguments the routine is expecting static R_CallMethodDef callMethods[] = { {"dft_", (DL_FUNC)&dft_, 2}, {NULL, NULL, 0} };

// Register the .Call methods. Dll info comes from R and is // information about the shared object/dll void R_init_leeR (DllInfo info) { // Register routines defines in the above arrays R_registerRoutines(info, NULL, callMethods, NULL, NULL);

// Default to dynamic lookup if the routine is not found R_useDynamicSymbols(info, TRUE); } ```

Using the R-internals, the first function here contains an array containaing 4 elements: the c-function name, a pointing to the c-function to take, and the number of arguments. Now, we define the R_init_leeR function which registers the functions when the package is loaded, using the R_registerRoutines function, and using the callMethods array we created about.

We are all set! We write a function in tests/testthat/test-dft.R, and we are home free, having succesfully created a function in an R package, using the .Call interface!

Resources

Some various useful collected links along the way:



leerichardson/leeR documentation built on May 21, 2019, 1:39 a.m.