Parallel-Voting version of machine learning algorithms

Description

parallelML samples your data, fits the machine learning algorithm on each sample in parallel on your own machine, and lets the resulting models vote on a prediction. Because each model sees only a fraction of the data and the fits run at the same time, this is typically much faster than a single fit of the regular algorithm on the full data, and the voted predictions may even be more accurate.
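The idea can be sketched with base R tools alone. The following is an illustrative sketch, not the package's implementation: it assumes rpart as the learner, three workers, and row sampling without replacement, and the names nModels, samples, votes and voted are made up for this example.

```r
library(parallel)
library(rpart)

set.seed(42)
nModels <- 3          # stands in for numberCores (odd, to limit ties)
samplingSize <- 0.8   # fraction of rows per sample

# Draw one row sample per model (without replacement: an assumption)
samples <- lapply(seq_len(nModels), function(i) {
  iris[sample(nrow(iris), size = round(samplingSize * nrow(iris))), ]
})

# Fit one tree per sample, each on its own worker process
cl <- makeCluster(nModels)
models <- parLapply(cl, samples, function(d) {
  rpart::rpart(Species ~ ., data = d)
})
stopCluster(cl)

# Every model predicts a label for every row: one column per model
votes <- sapply(models, function(m) {
  as.character(predict(m, newdata = iris, type = "class"))
})

# The most frequent label in each row wins
voted <- apply(votes, 1, function(v) names(which.max(table(v))))

table(voted, iris$Species)
```

With parallelML the same steps are driven by a single call, and predictML performs the voting.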

Usage

## Default S3 method:
parallelML(MLCall, MLPackage,
           samplingSize = 0.2, numberCores = detectCores(),
           underSample = FALSE, underSampleTarget = NULL,
           sampleMethod = "bagging")

Arguments

MLCall

Your call to a machine learning algorithm. All arguments in this call should be named and the package only allows formula calls. Hence the call should look like "machineLearningAlgorithm(formula = ..., data = ..., ...)".

MLPackage

A character string of the package which provides your machine learning algorithm. This is needed since all cores should load the package.

samplingSize

Fraction of your data to draw into each sample, e.g. 0.2 for 20 percent.

numberCores

Number of cores of your machine you want to use. One sample is drawn per core, so this is also the number of samples you take.

underSample

Logical indicating whether you want to undersample on your desired target.

underSampleTarget

When you set underSample to TRUE, underSampleTarget names the target category you want to keep in full in every sample. For example, if you have 5 elements of category1 and 100 elements of category2, underSampleTarget is category1 and samplingSize is 0.2, then every sample will contain 25 elements: all 5 of category1 and 20 of category2.
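The arithmetic of this example can be reproduced in a few lines of base R. This is only a sketch of the described behaviour, not the package's code; the vector target and the index names are hypothetical.

```r
set.seed(1)

# Toy target matching the example: 5 of category1, 100 of category2
target <- factor(c(rep("category1", 5), rep("category2", 100)))
samplingSize <- 0.2
underSampleTarget <- "category1"

# Keep every element of the undersample target ...
keepIdx <- which(target == underSampleTarget)

# ... and draw the sampling fraction from the remaining elements
otherIdx <- which(target != underSampleTarget)
drawIdx <- sample(otherIdx, size = round(samplingSize * length(otherIdx)))

sampleIdx <- c(keepIdx, drawIdx)

# 5 elements of category1 and 20 of category2, 25 in total
table(target[sampleIdx])
```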

sampleMethod

String which decides whether you sample on your observations ("bagging") or on your variables ("random").
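The difference between the two methods can be illustrated on iris. This is a sketch under assumptions: rows are drawn without replacement, the "random" variant keeps all rows, and the response column is always retained. The package may differ in these details.

```r
set.seed(1)
samplingSize <- 0.6
predictors <- setdiff(names(iris), "Species")

# "bagging": keep every variable, sample a fraction of the observations
rowIdx <- sample(nrow(iris), size = round(samplingSize * nrow(iris)))
baggingSample <- iris[rowIdx, ]

# "random": keep every observation, sample a fraction of the predictors
# (the response stays in so that a formula can still be fitted)
nVars <- max(1, round(samplingSize * length(predictors)))
keepVars <- sample(predictors, size = nVars)
randomSample <- iris[, c(keepVars, "Species")]

dim(baggingSample)  # 90 rows, 5 columns
dim(randomSample)   # 150 rows, 3 columns
```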

Value

A list of numberCores machine learning models, one per sample.

Note

Although it can cope with numeric probability predictions, this package is designed for classification labeling.
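To make the distinction concrete, the sketch below combines hypothetical predictions from three models in both ways: a majority vote for class labels and a plain average for numeric probabilities. The matrices and the helper majorityVote are invented for illustration and are not part of the package.

```r
# Hypothetical label predictions: one column per model, one row per observation
labelPreds <- cbind(m1 = c("a", "b", "b", "a"),
                    m2 = c("a", "b", "a", "a"),
                    m3 = c("b", "b", "a", "a"))

# Classification labels: the most frequent label per observation wins
majorityVote <- function(v) names(which.max(table(v)))
voted <- apply(labelPreds, 1, majorityVote)
voted  # "a" "b" "a" "a"

# Hypothetical probability predictions for the same observations
probPreds <- cbind(m1 = c(0.9, 0.2, 0.4, 0.8),
                   m2 = c(0.7, 0.1, 0.6, 0.9),
                   m3 = c(0.4, 0.3, 0.7, 1.0))

# Numeric predictions: voting is not meaningful, so average instead
averaged <- rowMeans(probPreds)
averaged
```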

Author(s)

Wannes Rosiers

See Also

This package can be regarded as a parallel extension of machine learning algorithms. Therefore, check the documentation of the package that provides the machine learning algorithm you want to use.

Examples

## Not run: 
# Load parallelML and the library which provides svm
library(parallelML)
library(e1071)

# Create your data
data(iris)

# Create a model
parSvmModel <- parallelML("svm(formula = Species ~ ., data = iris)",
                     "e1071",samplingSize = 0.8)
                     
# Get prediction
parSvmPred   <- predictML("predict(parSvmModel,newdata=iris)",
                          "e1071","vote")

# Check the quality
table(parSvmPred,iris$Species)

## End(Not run)
## Not run: 
# Load parallelML and the library which provides rpart
library(parallelML)
library(rpart)

# Create your data (the calls below assume this provides trainData and testData)
data("magicData")

# Create a model
parTreeModel  <- parallelML("rpart(formula = V11 ~ ., data = trainData[,-1])",
                            "rpart",samplingSize = 0.8)

# Get prediction
parTrainTreePred  <- predictML("predict(parTreeModel,newdata=trainData[,-1],type='class')",
                               "rpart","vote")
parTestTreePred  <- predictML("predict(parTreeModel,newdata=testData[,-1],type='class')",
                              "rpart","vote")

# Check the quality
table(parTrainTreePred,trainData$V11)
table(parTestTreePred,testData$V11)	

## End(Not run)
## Not run: 
# Load parallelML and the library which provides svm
library(parallelML)
library(e1071)

# Create your data
data(iris)
subdata <- iris[1:60,]

# Create a model
parsvmmodel   <- parallelML("svm(formula = Species ~ ., data = subdata)",
                            "e1071",samplingSize = 0.8,
                            underSample = TRUE, underSampleTarget = "versicolor")
                            
# Get prediction                            
parsvmpred    <- predictML("predict(parsvmmodel,newdata=subdata)",
                           "e1071","vote")
                           
# Check the quality                           
table(parsvmpred,subdata$Species)

## End(Not run)
## Not run: 
# Load parallelML and the library which provides svm
library(parallelML)
library(e1071)

# Create your data
data(iris)

# Create a model
parsvmmodel   <- parallelML("svm(formula = Species ~ ., data = iris)",
                            "e1071",samplingSize = 0.6,
                            sampleMethod = "random")
                            
# Get prediction                            
parsvmpred    <- predictML("predict(parsvmmodel,newdata=iris)",
                           "e1071","vote")
                           
# Check the quality                           
table(parsvmpred,iris$Species)

## End(Not run)