Description
This function allows us to launch a CUDA kernel on a GPU and so run many
instances of the computation in parallel. It is similar in spirit to .C
in that it transfers control from the R interpreter to external code,
copying some of the arguments to GPU memory and then back again in case
the external code changes them. Unlike .C, the outputs parameter allows
us to control which arguments are copied back to R, so that we can avoid
unnecessary copying. The function .gpu is another name for .cuda.
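As a minimal sketch of the calling pattern (the module file, kernel name, argument list, and launch dimensions below are all illustrative, not part of an actual RCUDA installation):

```r
# Hypothetical: load a compiled PTX module and launch one of its kernels.
m = loadModule("kernels.ptx")
x = rnorm(1000L)
ans = .cuda(m$someKernel, x, 1000L, numeric(1000L),
            gridDim = 2L, blockDim = 512L,
            outputs = 3)   # copy back only the third argument
```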
Usage

.cuda(fun, ..., .args = list(...), gridDim, blockDim, sharedMemBytes,
      stream, outputs, .gc, inplace, gridBy, .async, .numericAsDouble)
Arguments

fun
    the CUDA kernel reference, typically obtained from a
    pre-compiled PTX module using the loadModule function.

...
    zero or more arguments to the kernel.

.args
    the arguments to the kernel as a single list.

gridDim, blockDim
    integer vectors (of up to three elements) giving the dimensions
    of the grid of blocks and of each block of threads, respectively.

sharedMemBytes
    an integer value specifying the number of bytes of shared memory
    to allocate per block, shared between the threads of that block.

stream
    currently ignored. This corresponds to a CUDA stream and allows
    us to interleave computations on the GPU.

outputs
    an optional mechanism which serves as a means to identify which of
    the kernel arguments are copied back from the GPU and returned as
    the results of the computations. This works independently of the
    argument types. In other words, if some arguments are specified as
    R vectors and others as references to GPU-allocated memory,
    outputs still controls which of them are returned.

.gc
    a logical value which controls whether the R garbage collector is
    run before proceeding with the computations in this function. The
    reason for this is to ensure that any data allocated passively on
    the GPU in earlier calls is released so that we can allocate some
    of the arguments on the GPU, if appropriate and necessary.

inplace
    currently ignored, but intended to indicate which arguments are
    modified in place.

gridBy
    a vector or a list of vectors used to determine how many threads
    should be run and hence compute the grid dimensions to use. This
    is a convenience parameter that, for common but simple cases,
    allows callers to identify what we are vectorizing over on the
    GPU. The function then determines the grid and block dimensions
    to use.

.async
    a logical value that, if TRUE, launches the kernel asynchronously,
    returning control to R before the computation completes.

.numericAsDouble
    a logical value or vector that controls whether numeric vectors
    should be mapped to the GPU as arrays of double-precision values
    or converted to single-precision (float) arrays.
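For a one-dimensional launch, the grid dimension is typically derived from the number of elements and a chosen block size; a small sketch in plain R (the 512-thread block size is just an illustrative choice):

```r
N = 1e6L          # number of elements to process
blockDim = 512L   # threads per block (an illustrative choice)
gridDim = as.integer(ceiling(N / blockDim))  # enough blocks to cover all N
# gridDim * blockDim >= N, so each element maps to one thread;
# the kernel itself must still guard against thread indices >= N.
```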
Value

The result depends on outputs and on the actual values of the
arguments passed to the kernel via ... or .args.

If the arguments are all scalars and/or pointers to GPU-allocated
memory, the result is just the success status.

If any of the inputs are vectors (not scalars), only they are
returned, copying their potentially modified contents from the GPU.

If outputs is specified, only those inputs are returned.

If there is only a single argument to be returned, it is returned
directly and not as a list.
All RCUDA functions may raise an error, and the class of the error
condition is the name of the error. This allows specific errors to be
caught with tryCatch.
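For example, an RCUDA error can be caught by its class name; the class used below (CUDA_ERROR_LAUNCH_FAILED) is illustrative — the actual class depends on which CUDA error occurred:

```r
ans = tryCatch(.cuda(kernel, x, N, mu, sigma, numeric(N),
                     gridDim = c(64L, 32L), blockDim = 512L, outputs = 5),
               CUDA_ERROR_LAUNCH_FAILED = function(e) {
                   # handle this particular CUDA error
                   warning("kernel launch failed: ", conditionMessage(e))
                   NULL
               },
               error = function(e) stop(e))  # re-raise anything else
```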
If .async
is TRUE
, the function returns the objects which
were copied from R to the GPU by this function. This allows the caller
to retrieve the results after synchronizing with the GPU.
Note

It is important to determine whether to pre-allocate memory and data
on the GPU for reuse in subsequent calls or to leave .cuda to marshal
this for us and incur that one-time expense in each call.
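A sketch of the pre-allocation approach, reusing GPU buffers across several launches so the copying cost is paid only once (the loop and parameter sweep are illustrative):

```r
dx   = copyToDevice(x)                   # copy the input to the GPU once
dout = cudaAlloc(N, elType = "numeric")  # output buffer lives on the GPU
for(m in seq(0, 1, by = .25))            # e.g. sweep over several mean values
    .cuda(kernel, dx, N, m, sigma, dout,
          gridDim = c(64L, 32L), blockDim = 512L)
res = dout[]                             # copy the results back a single time
```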
Author(s)

Duncan Temple Lang
See Also

cudaAlloc, copyToDevice, copyFromDevice
Examples

if(getNumDevices() > 0) {
  ptx = system.file("sampleKernels", "dnormOutput.ptx", package = "RCUDA")
  if(!file.exists(ptx))
     ptx = nvcc(system.file("sampleKernels", "dnormOutput.cu", package = "RCUDA"),
                "dnormOutput.ptx")

  m = loadModule(ptx)
  kernel = m$dnorm_kernel

  N = 1e6L
  x = rnorm(N)
  mu = .5
  sigma = 1.1

  ans = .cuda(kernel, x, N, mu, sigma, numeric(N),
              gridDim = c(64L, 32L), blockDim = 512L, outputs = 5)

  ans = .cuda(kernel, x, N, mu, sigma, out = numeric(N),
              gridDim = c(64L, 32L), blockDim = 512L, outputs = "out")

  head(ans)
  dnorm(x[1:5], mu, sigma)
  summary(abs(ans - dnorm(x, mu, sigma)))

  # Compare the above to allocating the output vector directly on the GPU.
  # This avoids allocating a vector in R and also copying each element to
  # the corresponding element on the GPU.
  ans = .cuda(kernel, x, N, mu, sigma, cudaAlloc(N, elType = "numeric"),
              gridDim = c(64L, 32L), blockDim = 512L, outputs = 5)

  # Using gridBy
  ans = .cuda(kernel, x, N, mu, sigma, numeric(N),
              gridBy = x, outputs = 5)
  ans = .cuda(kernel, x, N, mu, sigma, numeric(N),
              gridBy = N, outputs = 5)

  # Explicitly allocating data on the GPU and passing these as inputs.
  cx = copyToDevice(x)
  vals = cudaAlloc(N, elType = "numeric")
  .cuda(kernel, cx, N, mu, sigma, vals,
        gridDim = c(64L, 32L), blockDim = 512L, out = FALSE)
  head(vals[])
  max(vals[])
}