caclassfit: Software Alchemy for Machine Learning

View source: R/Class.R

Description

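Parallelization of machine learning (classification) algorithms: a model is fit separately on each cluster node's chunk of the data, and the per-node predictions are combined by voting.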

Usage

caclassfit(cls,fitcmd) 
caclasspred(fitobjs,newdata,yidx=NULL,...)
vote(preds)
re_code(x)

Arguments

cls

A cluster run under the parallel package.

fitcmd

A string containing a model-fitting command to be run on each cluster node. This will typically include specification of the distributed data set.

fitobjs

An R list of objects returned by the fitcmd calls.

newdata

Data to be predicted from the fit computed by caclassfit.

yidx

If provided, index of the true class values in newdata, typically in a cross-validation setting.

...

Arguments to be passed to the underlying prediction function for the given method, e.g. predict.rpart.

preds

A vector of predicted classes, from which the "winner" will be selected by voting.

x

A vector of integers, in this context class codes.
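
As an illustration of the voting idea described for preds, here is a minimal base-R sketch of a majority vote over integer class codes. The function name majority_vote is hypothetical, and this is not necessarily how vote is implemented in partools; in particular, the tie-breaking rule shown here (the smallest class code wins) is an assumption.

```r
# Hypothetical helper; an illustration only, not the partools vote() code.
majority_vote <- function(preds) {
  counts <- table(preds)                       # tally each predicted class
  winner <- names(counts)[which.max(counts)]   # first maximum wins a tie
  as.integer(winner)
}

majority_vote(c(2, 3, 2, 1, 2))   # class 2 has the most votes
```

Because table sorts integer codes in increasing order, a tie in this sketch is resolved in favor of the smallest class code.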

Details

This should work with almost any classification package that provides a model-fitting (“fit”) function and a predict method.
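
The fit-per-chunk, predict, and vote pattern can be sketched sequentially, without a cluster, to show what is being parallelized. This is only a rough illustration under stated assumptions: it uses the built-in iris data and rpart, splits the rows into three chunks by hand in place of a distributed data set, and the variable names (chunks, fits, predmat, final) are made up for the sketch. It does not reproduce the partools internals.

```r
library(rpart)   # a recommended package, shipped with most R installations

set.seed(1)
dat <- iris[sample(nrow(iris)), ]               # randomize the row order first
chunk_id <- rep(1:3, length.out = nrow(dat))
chunks <- split(dat, chunk_id)                  # stand-in for per-node chunks

# fit one model per chunk
fits <- lapply(chunks, function(d) rpart(Species ~ ., data = d))

# one column of predicted classes per chunk model
predmat <- sapply(fits, function(f)
  as.character(predict(f, dat, type = "class")))

# majority vote across the chunk models, row by row
final <- apply(predmat, 1, function(p) names(which.max(table(p))))

mean(final == dat$Species)                      # proportion predicted correctly
```

In the package itself, the per-chunk fits run on the cluster nodes through fitcmd, and caclasspred handles the prediction and voting.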

The method assumes i.i.d. data. If your data set is stored in some sorted order, it must be randomized first, say by using the scramble option in distribsplit or by calling readnscramble, depending on whether your data is already in memory or still in a file.
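
To see why the ordering matters, consider this small base-R sketch. It uses a made-up data frame sorted by class and contiguous chunks as a stand-in for a distributed split; it does not call distribsplit or readnscramble, and all names in it are illustrative assumptions.

```r
set.seed(42)
sorted_df <- data.frame(id = 1:10, y = rep(1:2, each = 5))   # sorted by class y
chunk <- rep(1:2, each = 5)                                  # two contiguous chunks

# without shuffling, each chunk contains only one class
table(chunk, sorted_df$y)

# a random permutation of the rows removes the ordering
shuffled_df <- sorted_df[sample(nrow(sorted_df)), ]
rownames(shuffled_df) <- NULL
table(chunk, shuffled_df$y)
```

A model fit to a chunk that contains a single class cannot learn to predict the others, which is why sorted data has to be randomized before it is split.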

It is assumed that class labels are 1,2,... If not, use re_code.
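
As a rough illustration of what recoding to labels 1,2,... means, the following base-R sketch maps arbitrary integer codes to consecutive integers in increasing order of the original codes. The function name recode_consecutive is hypothetical, and the ordering rule is an assumption; re_code itself may be implemented differently.

```r
# Hypothetical helper; shows the idea only, not the re_code() source.
recode_consecutive <- function(x) match(x, sort(unique(x)))

occ <- c(102, 140, 102, 100, 141)   # made-up class codes
recode_consecutive(occ)             # 2 3 2 1 4
```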

Value

The caclassfit function returns an R list of objects as in fitobjs above.

The caclasspred function returns an R list. One of its components is acc, the prediction accuracy on newdata, which the example retrieves after supplying yidx.

Author(s)

Norm Matloff

Examples

## Not run: 
# set up 'parallel' cluster
cls <- makeCluster(2)
setclsinfo(cls)
# data prep
data(prgeng)
prgeng$occ <- re_code(prgeng$occ)
prgeng$bs <- as.integer(prgeng$educ == 13)
prgeng$ms <- as.integer(prgeng$educ == 14)
prgeng$phd <- as.integer(prgeng$educ == 15)
prgeng$sex <- prgeng$sex - 1
pe <- prgeng[,c(1,7,8,9,12,13,14,5)]
pe$occ <- as.factor(pe$occ)   # needed for rpart!
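# distribute pe across the cluster nodes
distribsplit(cls,'pe')
# load rpart on the manager and on each node
library(rpart)
clusterEvalQ(cls,library(rpart))
# fit a tree on each node's chunk, then predict with voting;
# column 8 of pe (occ) holds the true classes
fit <- caclassfit(cls,"rpart(occ ~ .,data=pe)")
predout <- caclasspred(fit,pe,8,type='class')
predout$acc  # 0.36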

stopCluster(cls)

## End(Not run)

partools documentation built on May 2, 2019, 5:14 a.m.