
Miscellaneous code snippets for use with the parallel package, including “Snowdoop.”

```
formrowchunks(cls,m,mchunkname,scramble=FALSE)
matrixtolist(rc,m)
addlists(lst1,lst2,add)
setclsinfo(cls)
getpte()
exportlibpaths(cls)
distribsplit(cls,dfname,scramble=FALSE)
distribcat(cls,dfname)
distribagg(cls,ynames,xnames,dataname,FUN,FUNdim=1,FUN1=FUN)
distribrange(cls,vec,na.rm=FALSE)
distribcounts(cls,xnames,dataname)
distribmeans(cls,ynames,xnames,dataname,saveni=FALSE)
dwhich.min(cls,vecname)
dwhich.max(cls,vecname)
distribgetrows(cls,cmd)
distribisdt(cls,dataname)
docmd(toexec)
doclscmd(cls,toexec)
geteltis(lst,i)
ipstrcat(str1 = stop("str1 not supplied"), ..., outersep = "", innersep = "")
```

`cls`
A cluster for the parallel package.

`scramble`
If TRUE, randomize the row order in the resulting data frame.

`rc`
Set to 1 for rows; any other value for columns.

`m`
A matrix or data frame.

`mchunkname`
Quoted name to be given to the created chunks.

`lst1`
An R list.

`lst2`
An R list.

`add`
“Addition” function, which could be summation, concatenation and so on.

`dfname`
Quoted name of a data frame, either centralized or distributed.

`ynames`
Vector of quoted names of variables on which

`vecname`
Quoted name of a vector.

`...`
One or more vectors of character strings, where the vectors are typically of length 1.

`xnames`
Vector of quoted names of variables that define the grouping.

`dataname`
Quoted name of a distributed data frame or data.table.

`saveni`
If TRUE, save the chunk sizes.

`FUN`
Quoted name of a single-argument function to be used in
aggregating within cluster nodes. If

`FUNdim`
Number of elements in the return value of

`FUN1`
Quoted name of function to be used in aggregation between cluster nodes.

`vec`
Quoted expression that evaluates to a vector.

`na.rm`
If TRUE, remove NA values.

`cmd`
An R command.

`toexec`
Quoted string containing command to be executed.

`lst`
An R list of vectors.

`i`
A column index.

`str1`
A character string.

`outersep`
Separator, e.g. a comma, between strings specified in `...`.

`innersep`
Separator, e.g. a comma, within string vectors specified in `...`.

The `setclsinfo` function performs the initialization needed to use
the tools in the package.

`formrowchunks` splits `m` into chunks of rows and places each chunk
in the global environment of a worker, under the name given by
`mchunkname`.

A call to `matrixtolist` extracts the rows or columns of a matrix
or data frame and forms an R list from them.
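The row/column extraction described above can be sketched in base R. The helper name `rows_or_cols_to_list` is hypothetical and is not the package's `matrixtolist`; it only mirrors the documented convention that `rc = 1` means rows and any other value means columns.

```r
# Hypothetical stand-in for the behavior described above; not partools code.
rows_or_cols_to_list <- function(rc, m) {
  if (rc == 1) {
    # one list element per row
    lapply(seq_len(nrow(m)), function(i) m[i, ])
  } else {
    # one list element per column
    lapply(seq_len(ncol(m)), function(j) m[, j])
  }
}

m <- rbind(1:2, 3:4, 5:6)
row_list <- rows_or_cols_to_list(1, m)  # 3 components, one per row
col_list <- rows_or_cols_to_list(2, m)  # 2 components, one per column
```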

The function `addlists` does the following: Say we have two lists,
with numeric values. We wish to form a new list, with all the keys
(names) from the two input lists appearing in the new list. In the case
of a key in common to the two lists, the value in the new list will be
the sum of the two individual values for that key. (Here “sum” means
the result of applying `add`.) For a key appearing in one list and
not the other, the value in the new list will be the value in the input
list.
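As an illustration of these merge semantics, here is a minimal base R sketch. The function name `merge_named_lists` is hypothetical; it is one plausible way to get the described behavior, not necessarily how `addlists` is written.

```r
# Hypothetical sketch of the key-wise merge described above.
merge_named_lists <- function(lst1, lst2, add) {
  out <- lst1
  for (key in names(lst2)) {
    if (key %in% names(out)) {
      # key present in both lists: combine the two values with 'add'
      out[[key]] <- add(out[[key]], lst2[[key]])
    } else {
      # key present only in lst2: copy the value over unchanged
      out[[key]] <- lst2[[key]]
    }
  }
  out
}

l1 <- list(a = 2, b = 5, c = 1)
l2 <- list(a = 8, c = 12, d = 28)
summed <- merge_named_lists(l1, l2, sum)

z1 <- list(x = c(5, 12, 13), y = c(3, 4, 5))
z2 <- list(y = c(8, 88))
joined <- merge_named_lists(z1, z2, c)
```

With `sum` the shared keys are added; with `c` they are concatenated, which is why the `add` argument is described as a generalized “addition.”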

The function `exportlibpaths`, invoked from the manager, exports
the manager's R search path to the workers.

The function `distribsplit` splits a data frame `dfname` into
approximately equal-sized chunks of rows, placing the chunks on the
cluster nodes, as global variables of the same name. The opposite action
is taken by `distribcat`, coalescing variables of the given name in
the cluster nodes into one grand data frame at the calling (i.e.
manager) node.
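The split-then-recombine idea can be tried locally without a cluster. The sketch below is an assumption-laden stand-in: it keeps the chunks in an ordinary list rather than on worker nodes, uses `splitIndices` from the parallel package to get near-equal row blocks, and assumes two workers.

```r
library(parallel)  # for splitIndices()

n_nodes <- 2  # assumed number of workers

# Near-equal blocks of row indices, one block per (imagined) node
idx <- splitIndices(nrow(mtcars), n_nodes)

# "Distribute": one chunk of rows per node (held locally in a list here)
chunks <- lapply(idx, function(rows) mtcars[rows, ])

# "Concatenate": stack the chunks back into a single data frame
recombined <- do.call(rbind, chunks)
```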

The package's `distribagg` function is a distributed (and somewhat
restricted) form of `aggregate`. The latter is called on each
distributed chunk with the function `FUN`. The manager then collects
the results and applies `FUN1` to aggregate them across nodes.
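The two-stage pattern described here (aggregate within chunks, then aggregate the partial results) can be sketched with base R alone. This is an illustration under assumptions, not the package's code: chunks live in a local list, and `max` serves as both the within-node and between-node function, which works because the maximum of per-chunk maxima equals the overall maximum.

```r
library(parallel)  # for splitIndices()

idx <- splitIndices(nrow(mtcars), 2)
chunks <- lapply(idx, function(rows) mtcars[rows, ])

# Stage 1: aggregate within each chunk (the role played by FUN)
partial <- lapply(chunks, function(ch) {
  aggregate(cbind(mpg, hp) ~ cyl + gear, data = ch, FUN = max)
})

# Stage 2: stack the partial results and aggregate again (the role of FUN1)
combined <- aggregate(cbind(mpg, hp) ~ cyl + gear,
                      data = do.call(rbind, partial), FUN = max)

# Reference: the same aggregation on the undivided data
direct <- aggregate(cbind(mpg, hp) ~ cyl + gear, data = mtcars, FUN = max)
```

A function such as `mean` would not survive this naive second stage, since a mean of chunk means is generally not the overall mean.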

The special cases of aggregating counts and means are handled by the
wrappers `distribcounts` and `distribmeans`. In each case,
cells are defined by `xnames`, and aggregation is done first within
workers and then across workers.
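For means, the across-worker step needs per-cell counts as well as per-cell sums. The following sketch shows one way to do that locally; it is an assumption about the general technique, not a description of how `distribmeans` is implemented, and the column names `mpg_sum` and `n` are made up for the example.

```r
library(parallel)  # for splitIndices()

idx <- splitIndices(nrow(mtcars), 2)
chunks <- lapply(idx, function(rows) mtcars[rows, ])

# Within each chunk: per-cell sums and per-cell counts
partial <- lapply(chunks, function(ch) {
  sums <- aggregate(mpg ~ cyl + gear, data = ch, FUN = sum)
  counts <- aggregate(mpg ~ cyl + gear, data = ch, FUN = length)
  data.frame(sums[c("cyl", "gear")], mpg_sum = sums$mpg, n = counts$mpg)
})

# Across chunks: add up sums and counts per cell, then divide
stacked <- do.call(rbind, partial)
totals <- aggregate(cbind(mpg_sum, n) ~ cyl + gear, data = stacked, FUN = sum)
totals$mpg_mean <- totals$mpg_sum / totals$n

# Reference: cell means computed on the undivided data
direct <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = mean)
```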

The `distribrange` function is a distributed form of `range`.

The `dwhich.min` and `dwhich.max` functions are distributed
analogs of R's `which.min` and `which.max`.
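A distributed `which.min` can be pictured as a local search in each chunk followed by a comparison of the per-chunk minima. The sketch below runs that logic on a locally split vector; the variable names are illustrative and the chunks stand in for data held at two assumed worker nodes.

```r
library(parallel)  # for splitIndices()

vec <- mtcars$mpg
idx <- splitIndices(length(vec), 2)
chunks <- lapply(idx, function(i) vec[i])

# Each "node" reports the position and value of its own minimum
local_pos <- sapply(chunks, which.min)
local_min <- sapply(chunks, min)

# The "manager" picks the node whose local minimum is smallest
node <- which.min(local_min)
result <- c(node = node, row = local_pos[node])

# Translate (node, row) back to a position in the original vector
global_pos <- idx[[node]][local_pos[node]]
```

The pair in `result` has the same shape as the two-tuple of node number and within-node row number that the functions are documented to return.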

The `distribgetrows` function is useful in a variety of situations.
It can be used, for instance, as a distributed form of `select`.
In that case, the specified rows will be selected at each cluster
node, then `rbind`-ed together at the caller.
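The select-then-`rbind` behavior can be imitated locally by evaluating the quoted command once per chunk. This is a sketch under assumptions: the chunks are list elements rather than worker-side globals, and each is bound to the name `mtc` so that the command string from the examples applies unchanged.

```r
library(parallel)  # for splitIndices()

idx <- splitIndices(nrow(mtcars), 2)
chunks <- lapply(idx, function(rows) mtcars[rows, ])

cmd <- "mtc[mtc$cyl == 8, ]"  # quoted command, run against each chunk

# Evaluate the command with 'mtc' bound to the chunk held by each "node"
selected <- lapply(chunks, function(mtc) eval(parse(text = cmd)))

# Stack the per-node results at the caller
result <- do.call(rbind, selected)
```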

The `docmd` function executes the quoted command, which is useful when
building up a complex command for remote execution. The `doclscmd`
function does so directly at the cluster nodes.

An R `formula` will be constructed from the arguments `ynames`
and `xnames`, with the former put on the left side of the `~`
sign, with `cbind` for combining, and the latter put on the right
side, with `+` signs as delimiters.
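The formula construction just described can be reproduced with string pasting. This is a minimal sketch of the idea, not the package's internal code; the variable names `lhs`, `rhs`, and `frml` are illustrative.

```r
ynames <- c("mpg", "hp")    # variables to aggregate
xnames <- c("cyl", "gear")  # variables defining the cells

# Left side: cbind(...) of the y variables; right side: x variables joined by +
lhs <- paste0("cbind(", paste(ynames, collapse = ","), ")")
rhs <- paste(xnames, collapse = "+")
frml <- as.formula(paste(lhs, "~", rhs))

# The resulting formula can be handed to aggregate() as usual
agg <- aggregate(frml, data = mtcars, FUN = max)
```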

The `geteltis` function extracts element `i` from each vector in an
R list of vectors.
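In base R terms this kind of extraction is a one-line `sapply`. The helper name `ith_of_each` below is hypothetical and only illustrates the described behavior.

```r
# Hypothetical equivalent: take element i from every vector in the list
ith_of_each <- function(lst, i) sapply(lst, function(v) v[i])

lst <- list(c(5, 12, 13), c(3, 4, 5))
second_elts <- ith_of_each(lst, 2)
```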

In the case of `addlists`, the return value is the new list.

The `distribcat` function returns the concatenated data frame;
`distribgetrows` works similarly.

The `distribagg` function returns a data frame, the same as would a
call to `aggregate`, though possibly in different row order;
`distribcounts` works similarly.

The `dwhich.min` and `dwhich.max` functions each return a
two-tuple, consisting of the node number and the row number within
that node at which the min or max occurs.

Norm Matloff

```
library(partools)  # load the package that provides the functions below
# examples of addlists()
l1 <- list(a=2, b=5, c=1)
l2 <- list(a=8, c=12, d=28)
addlists(l1,l2,sum) # list with a=10, b=5, c=13, d=28
z1 <- list(x = c(5,12,13), y = c(3,4,5))
z2 <- list(y = c(8,88))
addlists(z1,z2,c) # list with x=(5,12,13), y=(3,4,5,8,88)
# need 'parallel' cluster for the remaining examples
cls <- makeCluster(2)
setclsinfo(cls)
# check it
clusterEvalQ(cls,partoolsenv$myid) # returns 1, 2
clusterEvalQ(cls,partoolsenv$ncls) # returns 2, 2
# formrowchunks example; set up a matrix to be distributed first
m <- rbind(1:2,3:4,5:6)
# apply the function
formrowchunks(cls,m,"mc")
# check results
clusterEvalQ(cls,mc) # list of a 1x2 and a 2x2 matrix
matrixtolist(1,m) # 3-component list, first is (1,2)
# test of distribagg():
# form and distribute test data
x <- sample(1:3,10,replace=TRUE)
y <- sample(0:1,10,replace=TRUE)
u <- runif(10)
v <- runif(10)
d <- data.frame(x,y,u,v)
distribsplit(cls,"d")
# check that it's there at the cluster nodes, in distributed form
clusterEvalQ(cls,d)
d
# try the aggregation function
distribagg(cls,c("u","v"), c("x","y"),"d","max")
# check result
aggregate(cbind(u,v) ~ x+y,d,max)
# real data
mtc <- mtcars
distribsplit(cls,"mtc")
distribagg(cls,c("mpg","disp","hp"),c("cyl","gear"),"mtc","max")
# check
aggregate(cbind(mpg,disp,hp) ~ cyl+gear,data=mtcars,FUN=max)
distribcounts(cls,c("cyl","gear"),"mtc")
# check
table(mtc$cyl,mtc$gear)
# find mean mpg, hp for each cyl/gear combination
distribmeans(cls,c('mpg','hp'),c('cyl','gear'),'mtc')
# extract and collect all the mtc rows in which the number of cylinders is 8
distribgetrows(cls,'mtc[mtc$cyl == 8,]')
# check
mtc[mtc$cyl == 8,]
# same for data.tables
library(data.table)  # for as.data.table() and setkey()
mtc <- as.data.table(mtc)
setkey(mtc,cyl)
distribsplit(cls,'mtc')
distribcounts(cls,c("cyl","gear"),"mtc")
distribmeans(cls,c('mpg','hp'),c('cyl','gear'),'mtc')
dwhich.min(cls,'mtc$mpg') # smallest is at node 1, row 15
dwhich.max(cls,'mtc$mpg') # largest is at node 2, row 4
stopCluster(cls)
```
