r-apply-scidbst-method: Apply custom R functions on scidbst array chunks

Description

This function applies a custom R function to each individual chunk of a scidbst array using the r_exec interface of SciDB.

Usage

## S4 method for signature 'scidbst,function'
r.apply(x, f, array, packages,
  parallel = FALSE, cores = 1, aggregates, output, logfile, dim, dim.spec,
  method = "rexec", ...)

## S4 method for signature 'scidb,function'
r.apply(x, f, array, packages, parallel = FALSE,
  cores = 1, aggregates = c(), output, logfile, ...)

Arguments

x

scidbst array or scidb array

f

R function of the form function(x) { ... }; its parameter x is a subset of the incoming data, grouped according to the aggregates argument

array

string with the name of the output array

packages

a character vector naming the packages required by the function f

parallel

(optional) logical; whether the chunk is processed in parallel on an instance

cores

(optional) if parallel is TRUE, the number of cores to use on an instance

aggregates

(optional) a vector of attribute names to group by

output

a named list of output attributes and their SciDB types (with the "rexec" method the type is 'double' regardless)

logfile

(optional) the file path used to log during the processing if required

dim

(optional) a named list of the form attribute name = output dimension name, e.g. list(dimy="y",dimx="x")

dim.spec

(optional) a named list with the dimension specification, using the output dimension name as an identifier and a named numeric vector with min, max, overlap and chunk to specify each dimension

method

The method to use, either "rexec" or "stream"; not utilized currently

...

see Details
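To illustrate how the entries of dim.spec relate to a SciDB dimension declaration, the following sketch turns a dim.spec list into a schema fragment in the comma notation name=low:high,chunk,overlap. The helper dim_schema is hypothetical and not part of scidbst; how r.apply itself builds the redimension call is not shown in this documentation, and newer SciDB versions may use a different dimension syntax.

```r
# dim.spec as it would be passed to r.apply: one named numeric vector per
# output dimension, with the entries min, max, chunk and overlap
dim.spec <- list(y = c(min = 0, max = 99, chunk = 20, overlap = 0),
                 x = c(min = 0, max = 99, chunk = 20, overlap = 0))

# hypothetical helper (not part of scidbst): render each specification as
# "name=min:max,chunk,overlap" and wrap the result in square brackets
dim_schema <- function(spec) {
  parts <- vapply(names(spec), function(nm) {
    s <- spec[[nm]]
    sprintf("%s=%d:%d,%d,%d", nm,
            as.integer(s[["min"]]), as.integer(s[["max"]]),
            as.integer(s[["chunk"]]), as.integer(s[["overlap"]]))
  }, character(1))
  paste0("[", paste(parts, collapse = ","), "]")
}

schema <- dim_schema(dim.spec)
print(schema)
```

Accessing the entries by name (s[["min"]] and so on) means the order in which min, max, chunk and overlap are written in dim.spec does not matter.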

Details

The script created by this function handles the installation of the required R packages on each of the instances. It then combines the incoming attribute vectors into a data.frame object, which is passed on to the 'ddply' function of the package 'plyr'. The function 'f' is applied to each sub data.frame (parameter x of function f) obtained by grouping on the stated aggregates parameter. Using the output list, the array is projected onto the selected attributes. When 'dim' and 'dim.spec' are specified, the stated columns of the data.frame are used as dimensions in a subsequent redimension call.
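The per-chunk processing described above can be approximated locally. The following is a minimal sketch in base R, not the script that r.apply generates: split() and lapply() stand in for the ddply call so that the example needs no extra packages, and the attribute vectors, the function f and the output list are made-up illustrations.

```r
# hypothetical attribute vectors as they might arrive for one chunk
dimx <- c(0, 0, 1, 1, 1)
dimy <- c(0, 0, 0, 0, 0)
val  <- c(1, 3, 2, 4, 6)

# step 1: combine the attribute vectors into a data.frame
chunk <- data.frame(dimx = dimx, dimy = dimy, val = val)

# step 2: group by the 'aggregates' attributes and apply f to each group
aggregates <- c("dimy", "dimx")
f <- function(x, ...) c(n = nrow(x), mean = mean(x$val))

groups <- split(chunk, chunk[aggregates], drop = TRUE)
rows <- lapply(groups, function(g) {
  # keep the grouping keys next to the values returned by f
  cbind(g[1, aggregates, drop = FALSE], as.data.frame(as.list(f(g))))
})
result <- do.call(rbind, rows)
rownames(result) <- NULL

# step 3: project onto the attributes named in 'output'
output <- list(dimy = "double", dimx = "double", mean = "double")
result <- result[, names(output)]
print(result)
```

Each row of result corresponds to one group, which mirrors how every call of f contributes one record per combination of the aggregates values.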

The ... argument can contain the parameter eval, which is set to TRUE by default. It can also contain a developer parameter called result with the allowed values "afl" and "rscript"; r.apply then returns the resulting AFL query or the submitted R script, respectively. To prevent the function from being executed, use result in combination with eval=FALSE.

The following variable names are reserved if the spatial and temporal references exist; they are transferred to the ddply function:

affine

a 2x3 matrix for spatial coordinate transformation

crs

a CRS object stating the used coordinate reference system

extent

an extent object stating the spatial extent

tmin / tmax

POSIXlt objects stating the minimum and maximum temporal boundary

t0

POSIXlt object marking the datum (time at value 0)

tunit

character describing the temporal measurement unit

tres

a number describing the temporal resolution
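As an illustration of how a function f might use the temporal variables listed above, the sketch below converts temporal indices into timestamps. It assumes that the variables reach f through ... and that a timestamp is computed as t0 + index * tres, measured in tunit; both points are assumptions and are not confirmed by this documentation. The values for t0, tunit and tres are made up, and tunit is assumed to be a unit accepted by as.difftime (such as "days").

```r
f <- function(x, ...) {
  ref <- list(...)
  # assumed semantics: timestamp = t0 + index * tres, measured in tunit
  time <- ref$t0 + as.difftime(x$dimt * ref$tres, units = ref$tunit)
  # return plain numbers (seconds since 1970-01-01), because the output
  # attributes are of type 'double' when the rexec method is used
  c(nt = nrow(x),
    first = as.numeric(min(time)),
    last = as.numeric(max(time)))
}

# local call with a mock chunk and hypothetical reference values
x <- data.frame(dimt = c(0, 1, 2), val = c(0.2, 0.4, 0.3))
res <- f(x,
         t0 = as.POSIXlt("2000-01-01", tz = "UTC"),
         tunit = "days",
         tres = 16)
print(res)
```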

Value

scidbst array or scidb array depending on the input

Note

The function must have the form "function(x,...)". The x parameter is a data.frame of the attributes stored in one chunk. If you need the dimension values to perform calculations, you are in most cases advised to transform the array beforehand so that the dimension values are available as attributes. The function will be passed on to the ddply function.

The option "stream" for 'method' is currently not supported.
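Because x is an ordinary data.frame, a function can be tried on mock data before it is submitted. The following is a minimal sketch of such a local dry run; the mock data, the nrow check and the comparison against the output list are illustrative assumptions and not something r.apply performs.

```r
# function in the form expected by r.apply
f <- function(x, ...) {
  if (is.null(x) || nrow(x) == 0) {
    return(c(nt = 0, var = 0, median = 0, mean = 0))
  }
  c(nt = length(x$dimt), var = var(x$val),
    median = median(x$val), mean = mean(x$val))
}

# mock data.frame standing in for one group of a chunk, with the
# dimension values already available as attributes
mock <- data.frame(dimx = 0, dimy = 0, dimt = c(0, 1, 2, 3), val = c(2, 4, 4, 6))
res <- f(mock)

# the attributes requested in 'output' that are not grouping keys should
# all be returned by f as named numeric values
aggregates <- c("dimy", "dimx")
output <- list(dimy = "double", dimx = "double", nt = "double",
               var = "double", median = "double", mean = "double")
expected <- setdiff(names(output), aggregates)
stopifnot(is.numeric(res), all(expected %in% names(res)))
print(res)
```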

Examples

## Not run: 
 input.arr = scidbst("some_scidbst_array")

 # make sure to have the dimensions as attributes if you plan to use them in calculations
 input.arr = transform(input.arr, dimx="double(x)",dimy="double(y)", dimt="double(t)")
 f <- function(x,...) {
     # assign the named parameters passed as ... to variables in the function's
     # environment
     list2env(list(...), envir = environment())

     if (is.null(x)) {
       return(c(nt=0,var=0,median=0,mean=0))
     }
     t = x$dimt
     n = x$val
     return(c(nt=length(t),var=var(n),median=median(n),mean=mean(n)))
   }
 rexec.arr = r.apply(x=input.arr,
     f=f,
     array="output_array",
     parallel=FALSE,
     cores=1,
     aggregates=c("dimy","dimx"),
     output=list(dimy="double",dimx="double",nt="double",var="double",median="double",mean="double"),
     dim=list(dimy="y",dimx="x"),
     dim.spec=list(y=c(min=0,max=99,chunk=20,overlap=0),x=c(min=0,max=99,chunk=20,overlap=0)),
     logfile="/tmp/logfile.log")

## End(Not run)

flahn/scidbst documentation built on May 16, 2019, 1:15 p.m.