largeVectors: Internal Computations for Large Vectors


Description

Internal Computations for Large Vectors

Sending Large Vectors between R and Julia

Large vectors will be slow to transfer as JSON, and may fail in Julia. Internal computations have been added to transfer vectors of types real, integer, logical and character by more direct computations when they are large. The computations and their implementation are described here.

R and Julia both have the concept of numeric (floating point) and integer arrays whose elements have a consistent type. Both implement these (following Fortran) as contiguous blocks in memory, augmented by length or dimension information. Both also have a mechanism for arrays of character strings: class "character" in R and array type Array{String, 1} in Julia. Julia has arrays for boolean data; R stores the corresponding logical vectors as integers.

JSON has no such concepts, so interface evaluators using the standard JSON form provided by 'XR' must send such data as a JSON list. This will become inefficient for very large data from these classes. Users have reported failure by Julia to parse the corresponding JSON.

The 'XRJulia' package (as of version 0.7.9) implements special code to send vectors to Julia, by writing an intermediate file that Julia reads. The actual text sent to Julia is a call to the relevant Julia function. The code is triggered within the methods for the asServerObject function, so vectors should be transferred this way whether on their own or as part of a larger structure, such as an array or the column of a data frame.

Similarly, large arrays to be retrieved in R by the Get() method or the optional argument .get = TRUE will be written to an intermediate file by Julia and read by R.

As vectors become large, direct transfer becomes much faster. On a not-very-powerful laptop, vectors of length 10^7 transfer in an elapsed time of a few seconds. Character vectors are slightly slower than numeric, as explained below, but in all cases it would be hard to do much computation with the data that did not swamp the cost of transfer. That said, as always it's more sensible to transfer data once and then use the corresponding proxy object in later calls.

Details

For all vectors, the method uses binary writes and reads, which are defined in both R and Julia. No special computations are needed for numeric, integer, complex and raw vectors. For these, the R binary representation corresponds to array types in Julia. The special pseudo-value NA is defined for vectors in R, but no corresponding concept exists in Julia. For numeric and complex vectors, the floating-point pattern NaN is used. For all other vectors, a warning is issued and either a numeric object or a special character string is used instead.

For logicals, the internal representation in R uses integers. The Julia code when data is sent from R casts the integer array to a boolean array. On the return side, the Julia boolean array is converted to integer before writing.
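The binary layout described above can be inspected from R alone. The following is a minimal sketch using only base R (writeBin() and readBin()); it does not call XRJulia or Julia and is not the package's internal code. The file name and the sample vectors are illustrative assumptions.

```r
# Base R only: illustrates the binary layout, not XRJulia's internal code.
path <- tempfile(pattern = "Julia")    # illustrative temporary file

# Doubles are written as a contiguous block, 8 bytes per element.
x <- c(1.5, NA, 3)
con <- file(path, "wb")
writeBin(x, con)
close(con)
n_bytes <- file.size(path)             # 3 elements * 8 bytes

# Reading the block back restores the vector, including the NA.
con <- file(path, "rb")
y <- readBin(con, what = "double", n = length(x))
close(con)

# Logical vectors are written as 4-byte integers (TRUE = 1, FALSE = 0).
lg <- c(TRUE, FALSE, TRUE)
lg_bytes <- writeBin(lg, raw())
lg_ints <- readBin(lg_bytes, what = "integer", n = length(lg))
lg_back <- as.logical(lg_ints)         # the cast a receiver has to perform

# An integer NA is just the smallest 32-bit integer, so a receiver
# without an NA concept sees an ordinary number.
na_bytes <- writeBin(NA_integer_, raw(), endian = "little")

unlink(path)                           # remove the intermediate file
```

Reading the logical bytes back as integers and then converting mirrors the cast that the text attributes to the Julia side.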

Character vectors take a little more work, partly because of a quirk in binary writes for string arrays in Julia. R character vectors can be written in binary form and then read back in, because each string is followed by an end-of-string character. Writing a String array in Julia omits the end-of-string character, effectively writing a single string, from which the array cannot be recovered. When a character vector is sent to Julia, the Julia side must therefore split the single string resulting from the R binary write by matching the end-of-string character explicitly. For sending back to R, the Julia code appends an end-of-string character to each string before writing the array to a file. This produces the R format for a binary read of a character vector.
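The role of the end-of-string character can be seen with base R alone. The sketch below writes a character vector to a raw vector, then recovers the elements by locating the terminating zero bytes, which is the kind of splitting the text attributes to the Julia side. It is an illustration under assumed sample data, not the package's implementation.

```r
# Base R only: shows the format of a binary write of a character vector.
x <- c("alpha", "be", "gamma")

# Each string is written followed by a single zero byte (the terminator).
bytes <- writeBin(x, raw())
n_expected <- sum(nchar(x, type = "bytes")) + length(x)

# R's own binary read uses the terminators to recover the elements.
via_readBin <- readBin(bytes, what = "character", n = length(x))

# The same recovery done by hand: find the terminators and cut between them.
ends <- which(bytes == as.raw(0))
starts <- c(1L, head(ends, -1L) + 1L)
pieces <- mapply(
  function(s, e) rawToChar(bytes[seq.int(s, length.out = e - s)]),
  starts, ends,
  USE.NAMES = FALSE
)

# Without terminators the boundaries are lost: only one string remains.
joined <- paste(x, collapse = "")
```

Here joined holds a single string with no trace of where one element ended and the next began, which is why the terminators have to be added before the data is written for R.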

Two fields in the evaluator object control details. A large object is defined as a vector of length greater than the integer field largeObject. Julia creates intermediate files for sending large arrays to R by appending sequential numbers to a character field fileBase. By default, largeObject and fileBase is obtained from tempfile() with pattern "Julia". Note that all the files are removed at the end of the evaluation of the expression sending or getting the relevant objects.

Since these fields must be known to the Julia evaluator, they should not be set directly; doing so will have no effect. Instead, call the function juliaOptions() with these parameter names.
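A call might look like the fragment below. It assumes a working Julia installation and that juliaOptions() takes the two fields as named arguments, as the text above indicates. The particular values are arbitrary and chosen only for illustration.

```r
library(XRJulia)

# Illustrative values only; they are not the package defaults.
# Treat vectors longer than 50000 elements as large.
juliaOptions(largeObject = 50000L)

# Use a different stem for the intermediate files (hypothetical name).
juliaOptions(fileBase = tempfile(pattern = "JuliaLarge"))
```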


XRJulia documentation built on May 6, 2019, 1:01 a.m.