Sample data in parallel

Description

Sample data, or data and output, in parallel: each core produces one sample of the requested size.

Usage

trainSample(data, numberCores = detectCores(), samplingSize = 0.2,
			underSample = FALSE, toPredict = NULL, underSampleTarget = NULL,
			sampleMethod = "bagging")

Arguments

data

A data frame, or a structure convertible to a data frame, from which you want to sample.

numberCores

The number of cores to use. This also equals the number of training samples created: one per core.

samplingSize

Size of each training sample, given as a fraction of the data (the default 0.2 means 20%).

underSample

Logical: whether to undersample on your desired target.

toPredict

The column of your dataset you want to predict.

underSampleTarget

When underSample is TRUE, underSampleTarget is the target category that is kept in full in every sample. For example, if you have 5 elements of category1 and 100 elements of category2, underSampleTarget is category1, and samplingSize is 0.2, then every sample contains 25 elements: the 5 of category1 and 20 of category2.

sampleMethod

String that decides whether you sample on your observations ("bagging") or on your variables ("random").
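
The difference between the two sampleMethod options can be sketched in base R. This is an illustration only, not the package's implementation: the helpers sampleRows and sampleCols are hypothetical names, and keeping the toPredict column in a "random" sample is an assumption.

```r
# Hypothetical helpers illustrating the two sampleMethod options;
# they are not taken from the package source.
set.seed(1)
x <- data.frame(a = 1:10, b = 10:1, c = letters[1:10])

# "bagging": draw a fraction of the rows (observations)
sampleRows <- function(df, samplingSize) {
  n <- round(nrow(df) * samplingSize)
  df[sample(nrow(df), n), , drop = FALSE]
}

# "random": draw a fraction of the columns (variables).
# Assumption: the column named in toPredict is always kept.
sampleCols <- function(df, samplingSize, toPredict = NULL) {
  candidates <- setdiff(names(df), toPredict)
  n <- round(length(candidates) * samplingSize)
  df[, c(sample(candidates, n), toPredict), drop = FALSE]
}

rowSample <- sampleRows(x, 0.5)                   # half of the rows, all columns
colSample <- sampleCols(x, 0.5, toPredict = "c")  # all rows, half of the predictors plus "c"
```

Row sampling leaves the set of variables intact, while column sampling leaves every observation in place but hides some predictors.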

Value

A list of length numberCores. Each core creates one item of the list: a data frame containing a sample of data of size samplingSize.
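
To make the shape of the return value concrete, the sketch below reproduces the undersampling arithmetic from the underSampleTarget description (5 rows of category1, 100 rows of category2, samplingSize 0.2) and collects one sample per "core" in a list. The helper underSampleOnce is a hypothetical name, and lapply is a sequential stand-in for the parallel loop; neither is taken from the package.

```r
set.seed(42)

# Toy data: 5 rows of category1 and 100 rows of category2
d <- data.frame(value = rnorm(105),
                class = rep(c("category1", "category2"), times = c(5, 100)))

# Hypothetical helper: keep every row of the target category and
# draw a samplingSize fraction of the remaining rows.
underSampleOnce <- function(df, toPredict, underSampleTarget, samplingSize) {
  isTarget <- df[[toPredict]] == underSampleTarget
  others <- which(!isTarget)
  keep <- sample(others, round(length(others) * samplingSize))
  df[c(which(isTarget), keep), , drop = FALSE]
}

# One sample per core; lapply stands in for the parallel loop here.
numberCores <- 2
samples <- lapply(seq_len(numberCores), function(i) {
  underSampleOnce(d, "class", "category1", 0.2)
})

length(samples)            # one list item per core
table(samples[[1]]$class)  # class counts within the first sample
```

Each list item holds 25 rows: all 5 rows of category1 plus 20 of the 100 rows of category2.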

Author(s)

Wannes Rosiers

See Also

Under the hood this function uses foreach and sample.
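
As a rough idea of how foreach and sample can be combined, the sketch below draws one row sample per iteration. It is an assumption about the general pattern, not the package's code: drawSample is a hypothetical helper, %do% runs the loop sequentially (%dopar% with a registered backend would run it in parallel), and base lapply is used as a fallback when foreach is not installed.

```r
set.seed(7)
x <- data.frame(a = 1:10, b = 10:1)
numberCores <- 2
samplingSize <- 0.5

# Hypothetical helper: draw a samplingSize fraction of the rows
drawSample <- function(df, samplingSize) {
  df[sample(nrow(df), round(nrow(df) * samplingSize)), , drop = FALSE]
}

if (requireNamespace("foreach", quietly = TRUE)) {
  `%do%` <- foreach::`%do%`
  # Sequential foreach loop; the result is a list with one item per iteration
  result <- foreach::foreach(i = seq_len(numberCores)) %do%
    drawSample(x, samplingSize)
} else {
  # Fallback without the foreach package
  result <- lapply(seq_len(numberCores), function(i) drawSample(x, samplingSize))
}

length(result)  # one data frame per iteration
```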

Examples

## Not run: 
# Create your data
x <- data.frame(1:10,10:1)

# Sampling on observations
trainSample(x,numberCores=2,samplingSize = 0.5)

# Create your data
data(iris)

# Sampling on variables
trainSample(iris,numberCores=2,samplingSize = 0.6,
            toPredict = "Species", sampleMethod = "random")

# Create your data
data(iris)
data <- iris[1:110,]

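# Undersampling: keep every "virginica" row in each sample
trainSamples <- trainSample(data, numberCores = 2, samplingSize = 0.2,
                            underSample = TRUE, toPredict = "Species",
                            underSampleTarget = "virginica")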

## End(Not run)
