Train and Test datasets.

Description

This standard function allows multiple machine learning algorithms to be applied to the same data to determine which algorithm may be the most appropriate.

Usage

blkbox(data, labels, holdout, holdout.labels, ntrees, mTry, Kernel, Gamma,
  exclude, max.depth, xgtype = "binary:logistic", seed)

Arguments

data

Data partitioned into a list, or a data frame of training data where the features correspond to columns and the samples to rows. As data size increases, the memory required and the run time of some algorithms may grow exponentially.

labels

a character or numeric vector that contains the training class identifiers for the samples in the data frame, in the same order as the rows. Does not need to be specified if using a partitioned data list.

holdout

a data frame of holdout or testing data where the features correspond to columns and the samples to rows. Does not need to be specified if using a partitioned data list.

holdout.labels

a character or numeric vector that contains the holdout or testing class identifiers for the samples in the holdout data frame. Does not need to be specified if using a partitioned data list.

ntrees

The number of trees used in the ensemble-based learners (randomforest, bigrf, party, bartmachine). default = 500.

mTry

The number of features sampled at each node in the trees of the ensemble-based learners (randomforest, bigrf, party, bartmachine). default = sqrt(number of features).

Kernel

The type of kernel used in the support vector machine algorithm (linear, radial, sigmoid, polynomial). default = "linear".

Gamma

Advanced parameter; defines how far the influence of a single training example reaches. A low Gamma produces an SVM with softer boundaries; as Gamma increases, the boundaries eventually become restricted to their individual support vectors. default = 1/(ncol - 1).

exclude

removes certain algorithms from the analysis; for example, to exclude random forest, set exclude = "randomforest". The algorithms each have their own character identifier: randomforest = "randomforest", knn = "kknn", bartmachine = "bartmachine", party = "party", glmnet = "GLM", pam = "PamR", nnet = "nnet", svm = "SVM", xgboost = "xgboost".
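A brief sketch of excluding more than one learner at once. Whether exclude accepts a character vector of identifiers (rather than a single string) is an assumption here; the objects reuse the iris-based setup from the Examples section.

```r
# Prepare the same partitioned iris subset used in the Examples section.
my_data <- iris[1:100, 1:4]
my_labels <- as.character(iris[1:100, 5])
my_partition <- Partition(data = my_data, labels = my_labels)

# Assumption: exclude can take a vector of algorithm identifiers.
# This would run blkbox without random forest or xgboost.
model_subset <- blkbox(data = my_partition,
                       exclude = c("randomforest", "xgboost"))
```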

max.depth

The maximum depth of the trees in the xgboost model. default = sqrt(ncol(data)).

xgtype

Either "binary:logistic" or "reg:linear", for logistic regression or linear regression respectively.

seed

Sets the seed for the bartMachine model.

Author(s)

Zachary Davies, Boris Guennewig

Examples

my_data <- iris[1:100, 1:4]
my_labels <- as.character(iris[1:100, 5])
my_partition <- Partition(data = my_data, labels = my_labels)
model_1 <- blkbox(data = my_partition)
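The example above uses the partitioned-list interface. A sketch of the manual interface, supplying data, labels, holdout and holdout.labels explicitly, is shown below; the 70/30 split and the argument values are illustrative assumptions, not package defaults.

```r
# Manually split the first 100 iris rows into training and holdout sets.
set.seed(1)
idx <- sample(1:100, 70)

train_data <- iris[idx, 1:4]
train_labels <- as.character(iris[idx, 5])
test_data <- iris[setdiff(1:100, idx), 1:4]
test_labels <- as.character(iris[setdiff(1:100, idx), 5])

# Illustrative call: train on the 70-sample set, evaluate on the 30-sample
# holdout, with the documented defaults made explicit.
model_2 <- blkbox(data = train_data, labels = train_labels,
                  holdout = test_data, holdout.labels = test_labels,
                  ntrees = 500, Kernel = "linear")
```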