Parallelization of machine learning algorithms.
caclassfit(cls,fitcmd)
caclasspred(fitobjs,newdata,yidx=NULL,...)
vote(preds)
re_code(x)
cls: A cluster run under the parallel package.
fitcmd: A string containing a model-fitting command to be run on each cluster node. This will typically include specification of the distributed data set.
fitobjs: An R list of objects returned by caclassfit.
newdata: Data to be predicted from the fit computed by caclassfit.
yidx: If provided, index of the true class values in newdata.
...: Arguments to be passed to the underlying prediction function for the given method.
preds: A vector of predicted classes, from which the "winner" will be selected by voting.
x: A vector of integers, in this context class codes.
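The voting step described for preds can be illustrated with a minimal base-R sketch. The helper name majority_vote is hypothetical and is not the package's vote function; breaking ties in favor of the smallest class code is an assumption made here for determinism.

```r
# Hypothetical helper, NOT the package's vote(): return the most frequent class code.
majority_vote <- function(preds) {
  counts <- table(preds)
  # which.max() returns the first maximum, so ties go to the smallest code (assumption)
  as.integer(names(counts)[which.max(counts)])
}

# Three of five predictions are class 2
winner <- majority_vote(c(2, 3, 2, 1, 2))
```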
This should work for almost any classification code that has a “fit” function and a predict method.
The method assumes i.i.d. data. If your data set is stored in some sorted order, it must be randomized first, say using the scramble option in distribsplit or by calling readnscramble, depending on whether your data is already in memory or still in a file.
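For data already in memory, the idea of randomizing the row order can be sketched in base R as follows. This only illustrates the concept; it does not show how the scramble option or readnscramble work internally, and the toy data frame d is made up.

```r
set.seed(1)  # only so the illustration is reproducible

# Toy data frame stored sorted by class, which would violate the i.i.d. assumption
d <- data.frame(x = 1:6, cls = rep(1:3, each = 2))

# Randomly permute the rows before the data are split across cluster nodes
d_shuffled <- d[sample(nrow(d)), ]
```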
It is assumed that class labels are 1,2,... If not, use re_code.
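As an illustration of what recoding to 1,2,... means, here is a minimal base-R sketch. The function recode_sketch is a hypothetical stand-in and not the package's re_code; in particular, assigning the new codes in increasing order of the old ones is an assumption.

```r
# Hypothetical stand-in, NOT the package's re_code(): map arbitrary integer codes to 1,2,...
recode_sketch <- function(x) {
  # Position of each value among the sorted distinct values (assumed ordering)
  match(x, sort(unique(x)))
}

# The distinct codes 102, 106, 140 become 1, 2, 3
new_codes <- recode_sketch(c(102, 140, 102, 106))
```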
The caclassfit function returns an R list of objects as in fitobjs above.
The caclasspred function returns an R list with these components:
predmat: a matrix of predicted classes for newdata, one row per cluster node
preds: the final predicted classes, after using vote to resolve possible differences in predictions among nodes
consensus: the proportion of cases for which all nodes gave the same predictions (higher values indicating more stability)
acc: if yidx is non-NULL, the proportion of cases in which preds is correct
confusion: if yidx is non-NULL, the confusion matrix
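The components above can be illustrated on a small made-up prediction matrix. This is a sketch under stated assumptions, not the package's code: vote_sketch is a hypothetical helper, ties go to the smallest class code, and the row/column orientation of the confusion table is chosen arbitrarily here.

```r
# Toy prediction matrix: 3 nodes (rows) by 4 cases (columns); entries are class codes
predmat <- rbind(c(1, 2, 2, 3),
                 c(1, 2, 3, 3),
                 c(1, 1, 3, 3))
truth <- c(1, 2, 3, 1)  # made-up true classes

# Hypothetical majority vote; ties go to the smallest code (assumption)
vote_sketch <- function(p) {
  counts <- table(p)
  as.integer(names(counts)[which.max(counts)])
}

# Final prediction for each case: vote down each column
preds <- apply(predmat, 2, vote_sketch)

# Proportion of cases on which every node gave the same prediction
consensus <- mean(apply(predmat, 2, function(p) length(unique(p)) == 1))

# Proportion of cases predicted correctly, and a cross-tabulation of truth vs. prediction
acc <- mean(preds == truth)
confusion <- table(truth, preds)
```

In this toy case the nodes agree on the first and last cases only, and the voted prediction is wrong only for the last case.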
Norm Matloff
## Not run:
# set up 'parallel' cluster
cls <- makeCluster(2)
setclsinfo(cls)
# data prep
data(prgeng)
prgeng$occ <- re_code(prgeng$occ)
prgeng$bs <- as.integer(prgeng$educ == 13)
prgeng$ms <- as.integer(prgeng$educ == 14)
prgeng$phd <- as.integer(prgeng$educ == 15)
prgeng$sex <- prgeng$sex - 1
pe <- prgeng[,c(1,7,8,9,12,13,14,5)]
pe$occ <- as.factor(pe$occ) # needed for rpart!
# go
distribsplit(cls,'pe')
library(rpart)
clusterEvalQ(cls,library(rpart))
fit <- caclassfit(cls,"rpart(occ ~ .,data=pe)")
predout <- caclasspred(fit,pe,8,type='class')
predout$acc # 0.36
stopCluster(cls)
## End(Not run)