An Introduction to the rClr package

library(knitr)
opts_chunk$set(out.extra='style="display:block; margin: auto"', fig.align="center")

Introduction

The rClr package is a low-level interoperability bridge between R and a Common Language Runtime (CLR), the Microsoft .NET CLR or the Mono implementation. rClr is to a CLR the equivalent of what rJava is to a Java runtime.

A few overarching principles in the design of rClr are: Limit the syntactic tedium Stay as close as possible to the expected behaviors of R code and R data structures Have transparent bi-directional data conversion for commonly used data types e.g. numeric vectors Avoid requiring changes and additions to the .NET code (C#, F#, etc.) for it to be used from R * The Common Language Runtime should behave as it usually does, so long as the behavior does not clash with expected R behaviors.

Release 0.7 has major changes compared to the 0.5 series released in 2013. Major evolutions in R.NET have allowed to move interop code from C++ to C#. The package will continue to increase its use of C# over C++, to reduce code complexity and be more agnostic with respect to the language runtime. The recent announcements in November 2014 by Microsoft with respect to .NET may change the landscape anyway, for the better.

Date-time handling and known limitations

Several use cases for rClr involve the handling of time series. A lot of effort was devoted to handling the conversion of date and time between R and .NET representations. A large portion of the unit tests is devoted to these. Some conversions that are known to pass the test on Microsoft .NET do not pass on Mono, mostly if not solely because of daylight savings (at least for the Australia/Canberra time zone...). It may be wise to stick as much as possible to handling UTC dates, and you should exert caution and double-check if you are handling date-time in another time zone than UTC.

Getting Started

On loading, the package attempts to detect and initialise the CLR.

To start with, the customary "Hello" example follows:

library(rClr)
clrCallStatic('Rclr.HelloWorld', 'Hello')

To demonstrate something marginally snazzier:

if (tolower(Sys.info()['sysname'])=='windows') {
  # TODO: change that to a nicer prepackaged forms example. Or, allow for shorter forms for assembly names. Careful not to allow for ambiguity however.
  clrLoadAssembly("System.Windows.Forms, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089")
  f <- clrNew('System.Windows.Forms.Form')
  clrSet(f, 'Text', "Hello from the .NET framework")
  clrCall(f, 'Show')
}

Main functions

Most functions in the package are prefixed with "clr" for mnemonic. clrNew creates a new object, and returns an R object, either of (S4) class cobjRef or one of the common R data structures if a natural conversion is found, e.g. a System.String to an R character vector.

testClassName <- "Rclr.TestObject";
(testObj <- clrNew(testClassName))

It is possible to dynamically inspect the CLR object "members": in the .NET world members refer to properties, fields and methods of an object.

clrGetProperties(testObj)
clrGetMemberSignature(testObj, 'GetMethodWithParameters')

Calling the instance methods on objects is done by specifying the method name as a string - convenience R functions may appear when the package is more mature. Static methods are called by specifying the type name, at least namespace qualified. We can determine the parameters necessary for calling the public method CreateDateArray on the class Rclr.TestCases. The method signature is expressed using CLR types, not R types. The .NET method expects a String and an Int32.

clrCall(testObj, "GetFieldIntegerOne")
clrGetMemberSignature('Rclr.TestCases', 'CreateDateArray')

Two arguments are passed to the function clrCallStatic: a string in an ISO date time format, and a numeric cast as an integer. rClr will transparently convert these to CLR types. Note that if the last argument had just been 4, the mode of the R vector would have been numeric, not integer. To avoid ambiguity in method calls you must pass an integer vector. The returned value of the .NET method is a DateTime[] i.e. an array of System.DateTime objects. Again rClr transparently converts these to a standard R vector with the appropriate POSIX time class attribute.

dates <- clrCallStatic('Rclr.TestCases', 'CreateDateArray', '2001-01-01', as.integer(4))
str(dates)
class(dates)

clrGet and clrSet are used to get/set public fields and properties of CLR objects. The argument type must match the expected type; observe what happens if a numeric is passed instead of an integer

clrGet(testObj, "PropertyIntegerOne")
# clrSet(testObj, "PropertyIntegerOne", 1) # this would currently fail
clrSet(testObj, "PropertyIntegerOne", as.integer(1))

There are functions to list static members (a.k.a. class members). The discovery of static members can be done on a class name or on an object type; there is no ambiguity.

# clrGetStaticMembers(testObj)
# would have the same result as:
clrGetStaticMembers('Rclr.TestObject')

clrGet and clrSet can also set static fields and properties, but only if the argument is the name of a class (Type). To avoid risks of ambiguity and confusion, you cannot do this with a concrete object.

# the following call would fail
# clrGet(testObj,  "StaticFieldIntegerOne") 
# This is less ambiguous, and is supported
clrGet('Rclr.TestObject',  "StaticFieldIntegerOne")
clrSet('Rclr.TestObject',  "StaticFieldIntegerOne", as.integer(3))

Data conversion

Where there is an obvious and natural conversion between CLR and R data types, the conversion is done in preference to passing pointer to external data structures. Most of the basic modes in R (character,numeric,integer etc.) have relatively obvious equivalents in the CLR. For basic these basic modes, a bijection (i.e. round-trip) with these CLR conterparts is defined if possible.

The notion of time is important for many application, notably for the processing of time series. In order to facilitate the mapping of more complex types by packages that depend on rClr, it supports the conversion of some chosen R date and time types. A choice was made to convert R date and time classes found in the base package (see (Ripley and Hornik, 2001)). Other packages such as lubridate offer compelling features and you should consider using these. Note that the R classes Date and POSIXct both map to a System.DateTime in the CLR, and a strict bijection is not possible. DateTime in the CLR is converted to a POSIXct object, as this is the most appropriate to retain the original information. Date, time and time zone software constructs are fraught with oddities and inconsistent between software systems. While considerable effort was devoted to achieve a sensible conversion behavior in rClr, you should double check conversions on your data and in your time zone, and you should prefer the use of UTC time zone for conversion.

The rationale for converting data to native types at the boundary of R and the CLR is that this limits the leakage of concepts and behaviors between the two systems. This is particularly important when the expect behavior differs significantly between systems e.g. copying objects versus passing references to objects. Things behave 'as expected' on each side of the boundary.

Objects in R are vectors, and single values are just a particular case of vectors of length one. In the CLR, scalar values and arrays are different types. An R vector will be translated to two different types in the CLR depending on its length. Any length other than one results in an array in the CLR, a length of one becomes a scalar value. This is a choice based on the expected behavior of .NET code, and on the typical code available for reuse. See the section on method bindings below.

r_types = list(
  letters[1:3],
  as.integer(1:3),
  1:3 * 1.1,
  1:3 == 2,
  complex(real=1:3, imaginary=4:6),
  as.Date('2001-01-01') + 0:2,
  as.POSIXct('2001-01-01') + 0:2,
  as.difftime(3, units='secs') + 0:2
)

library(plyr)
conversion = lapply(r_types, rToClrType)
result <- ldply(lapply(conversion, FUN=as.data.frame))

r_vec_len_one = lapply(r_types, `[`, 1)
conversion = lapply(r_vec_len_one, rToClrType)
result2 <- ldply(lapply(conversion, FUN=as.data.frame))
result <- ldply(list(result, result2))

## Cannot seem to get the proper Rmarkdown with pander.
# library(pander)
# print(pander(result, style='rmarkdown'))
library(xtable)
print(xtable(result),type='html')

More advanced data types

Most basic vector data types can be handled in rClr in the native code layer written in C/C++. However, limitations and difficulties appear to handle for instance multidimensional arrays, and matching more complicated types between the CLR and R.

Fortunately, it is easier to handle these directly from C#, using the library R.NET, the close relative of rClr. The following code demonstrate how this allows the conversion of multi-dimensional arrays from the CLR to R matrices.

# setRDotNet(TRUE)  # this should be the default package state after 0.7.0
cTypename <- "Rclr.TestCases"
clrGetMemberSignature(cTypename, "CreateRectDoubleArray")
clrCallStatic(cTypename, "CreateRectDoubleArray")

There are some cases where, as of version 0.7, automatic conversion can be turned on or off. An example is for CLR dictionaries and arrays, for which an R list is a sensible equivalent. However, a priori converting R lists to CLR equivalent is not well defined. You may also want to manipulate CLR objects from R without rClr converting to an R data representation. This can be achieved with the function setConvertAdvancedTypes. Consider the following behavior converting a few types of dictionaries and arrays:

library(stringr)
toShortTypeName <- function(x) {str_replace_all( x, ", mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089", "") }
getComplexDataCase <- function(i) {clrCallStatic(cTypename, "GetComplexDataCase", as.integer(i-1))}
getComplexDataTypeName <- function(i) {clrCallStatic(cTypename, "GetComplexDataTypeName", as.integer(i-1))}

numDataTestCases <- clrCallStatic(cTypename, "GetNumComplexDataCases")
clrTnames <- toShortTypeName(sapply(1:numDataTestCases, getComplexDataTypeName))
withConversion <- sapply(1:numDataTestCases, function(i) {class(getComplexDataCase(i))})

setConvertAdvancedTypes(FALSE)
withoutConversion <- sapply(1:numDataTestCases, function(i) {tryCatch(class(getComplexDataCase(i)), error = function(e){'Not supported'})})
setConvertAdvancedTypes(TRUE)

convSummary <- data.frame(clrType=clrTnames, enabled=withConversion, disabled=withoutConversion)

print(xtable(convSummary),type='html')

Method binding in the CLR

In the common language infrastructure methods can be overloaded, that is have the same names but varying types of arguments. The feature is common to other languages such as Java and C++. R has a partly similar behavior throught the use of default argument values.

An important difference between most languages running on the CLR (C#, F#, etc.) and R is that the former are statically typed (or statically inferred) whereas R is dynamically typed. rClr tries hard to use R idioms, but without clashing with the strongly typed languages targeting the CLR.

The rClr functions clrCall and clrCallStatic currently relies mostly on the default behavior of the CLR to find the best method to call for the arguments passed to these functions. The .NET lingo for this is "method binding". In the future, there may be additional behaviors added to facilitate the operations from R. A typical case is that, in order to emulate the behavior expected by R users with respect to vectorized operations.

params keywords

C# has a feature to handle elegantly methods allowing for a variable number of arguments, the params keyword. This helps handle concisely methods that would otherwise have a large number of overloads.

public static int TestParams(string a, string b, params int[] c)

You can call such a method qith a variable number of arguments; however the types must match that in the CLR method.

testObj <- clrNew('Rclr.TestObject')
clrCall(testObj, 'TestParams', "Hello, ", "World!", 1L, 2L, 3L, 6L, 5L, 4L, 9L)
# As in C#, it also works if passing a vector
clrCall(testObj, 'TestParams', "Hello, ", "World!", 1:5)

Optional parameter values in CLR methods

C# also has a way to define functions with optional parameters, or rather parameters with default values.

public void TestDefaultValues(string a, int b, int c=1, int d=2) { }

These methods are supported, as demonstrated in the following code. Note that argument passing is based on the order of parameters. Named arguments is not yet supported, though it may be a feature available in future versions.

clrCall(testObj, 'TestDefaultValues', "Hello", 1L)
clrCall(testObj, 'TestDefaultValues', "Hello", 1L, 2L)
clrCall(testObj, 'TestDefaultValues', "Hello", 1L, 2L, 3L)

Generic methods

This section is a placeholder. At the time of writing, generic methods are not supported, at least not readily.

Metaprogramming

This section is a placeholder

Runtime performance

While the primary concern of the package is the numerical correctness, attention is also being paid to the speed of execution. This section highlights a few figures and advice that may be of importance to users to avoid unnecessarily sluggish performance. The main take home messages are: * Prefer passing large arrays/vectors, rather than numerous calls to clrCallStatic or clrCall with smaller vectors passed or expected as return value from the call.

The following code and figures show the effective transfer rate for numeric vectors, i.e. double precision array in the .NET world. The experimental design is such that 10 repetitions of the measurement are made for a given vector size. Performance is measured in both direction, with comparable speed. Note that the speed of execution is calculated by removing an estimate of the cost of measurement itself; you may thus notice outliers and even negative transfer rates if the measurement was perturbed by garbage collection in R or the CLR.

perfMeasureFile <- list.files(system.file('extras', package='rClr'), pattern='performance_profiling', full.names=TRUE)
if(length(perfMeasureFile)==0) {stop('performance_profiling file not found in the installed rClr package')}
stopifnot(file.exists(perfMeasureFile[1]))
source(perfMeasureFile[1])
maxArrayLen<-7.5e6
# need to reduce the size if we run this on 32 bits machines:
if(Sys.info()['machine'] == 'x86') {maxArrayLen<-6.5e6}
trials <- createCases(numReps=10, maxArrayLen=maxArrayLen)
bench <- rclrNumericPerf(trials, tag='no.r.net')
plotRate(bench, case = 'CLR->R')
plotRate(bench, case = 'CLR->R', logy=FALSE)

The transfer rate going form CLR arrays of double to R vectors peaks at a value ranging from 40 million to ~200 million items per second. This seems to largely depend on the hardware. The log/log plot of the conversion rate is particularly useful to visualize at which point the overhead of the function call becomes negligible compared to the array conversion itself: Below ~5000 elements in the vector the transfer rate is roughly linear in log-log space. You may notice a decrease in performance in that case of 'CLR->R'; this may be an artefact of the experimental design interplaying with garbage collection.

plotRate(bench, case = 'R->CLR')
plotRate(bench, case = 'R->CLR', logy=FALSE)

Related work

Acknowledgements

I gratefully acknowledge: Kosei ABE, creator of R.NET Hadley Wickham, creator of devtools and testtat * Yihui Xie, creator of the knitr package

References

Ripley, B. D. and Hornik, K. (2001) Date-time classes. R News, 1/2, 8\9611



jmp75/rClr documentation built on June 11, 2019, 6:45 p.m.