cross_validate: used for cross-validation of various algorithms.


View source: R/cross_validate.R

Description

Performs n-fold cross-validation of the specified algorithm.
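
As a rough illustration of what n-fold cross-validation means, the following base R sketch splits a toy label vector into folds and scores a stand-in model on each held-out fold. It is not the RTextTools implementation: the toy labels, the balanced fold assignment, and the majority-class "model" are all assumptions made for the example.

```r
# Generic n-fold cross-validation sketch in base R (not the RTextTools internals).
set.seed(42)                                   # fix the fold assignment
n      <- 100                                  # hypothetical number of documents
nfold  <- 5
labels <- factor(sample(c("a", "b", "c"), n, replace = TRUE))  # toy labels

# Shuffle a balanced vector of fold ids so each document lands in exactly one fold.
fold_id <- sample(rep(seq_len(nfold), length.out = n))

accuracy <- numeric(nfold)
for (k in seq_len(nfold)) {
  train <- labels[fold_id != k]                # all folds except k
  test  <- labels[fold_id == k]                # held-out fold k
  # Stand-in "model": always predict the most frequent training label.
  majority <- names(which.max(table(train)))
  accuracy[k] <- mean(test == majority)
  cat("Fold", k, "Out of Sample Accuracy =", accuracy[k], "\n")
}
mean_accuracy <- mean(accuracy)                # average over the held-out folds
```

Each document is used for testing exactly once and for training in the remaining folds, so the mean of the per-fold accuracies estimates out-of-sample performance.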

Usage

cross_validate(container, nfold, algorithm = c("SVM", "SLDA", "BOOSTING", 
"BAGGING", "RF", "GLMNET", "TREE", "NNET"), seed = NA, 
method = "C-classification", cross = 0, cost = 100, kernel = "radial", 
maxitboost = 100, maxitglm = 10^5, size = 1, maxitnnet = 1000, MaxNWts = 10000, 
rang = 0.1, decay = 5e-04, ntree = 200, l1_regularizer = 0, l2_regularizer = 0, 
use_sgd = FALSE, set_heldout = 0, verbose = FALSE)

Arguments

container

An object of class matrix_container-class, as generated by the create_container function.

nfold

Number of folds to perform for cross-validation.

algorithm

A string specifying which algorithm to use. Use print_algorithms to see a list of options.

seed

Random seed number used to replicate cross-validation results.

method

Method parameter for the SVM implementation. See e1071 documentation for more details.

cross

Cross parameter for the SVM implementation. See e1071 documentation for more details.

cost

Cost parameter for the SVM implementation. See e1071 documentation for more details.

kernel

Kernel parameter for the SVM implementation. See e1071 documentation for more details.

maxitboost

Maximum iterations parameter for the boosting implementation. See caTools documentation for more details.

maxitglm

Maximum iterations parameter for the glmnet implementation. See glmnet documentation for more details.

size

Size parameter for the neural network implementation. See nnet documentation for more details.

maxitnnet

Maximum iterations parameter for the neural network implementation. See nnet documentation for more details.

MaxNWts

Maximum number of weights parameter for the neural network implementation. See nnet documentation for more details.

rang

Range parameter for the neural network implementation. See nnet documentation for more details.

decay

Decay parameter for the neural network implementation. See nnet documentation for more details.

ntree

Number of trees parameter for the random forest implementation. See randomForest documentation for more details.

l1_regularizer

A numeric value that turns on L1 regularization and sets the regularization parameter. A value of 0 disables L1 regularization. See maxent documentation for more details.

l2_regularizer

A numeric value that turns on L2 regularization and sets the regularization parameter. A value of 0 disables L2 regularization. See maxent documentation for more details.

use_sgd

A logical indicating that SGD parameter estimation should be used. Defaults to FALSE. See maxent documentation for more details.

set_heldout

An integer specifying the number of documents to hold out. Sets a held-out subset of your data to test against and prevent overfitting. See maxent documentation for more details.

verbose

A logical specifying whether to provide descriptive output about the training process. Defaults to FALSE (no output). See maxent documentation for more details.

Author(s)

Loren Collingwood, Timothy P. Jurka

Examples

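library(RTextTools)
data(NYTimes)
# Draw a random sample of 100 of the 3100 articles
data <- NYTimes[sample(1:3100,size=100,replace=FALSE),]
# Build a TF-IDF weighted document-term matrix from the Title and Subject columns
matrix <- create_matrix(cbind(data["Title"],data["Subject"]), language="english",
                        removeNumbers=TRUE, stemWords=FALSE, weighting=tm::weightTfIdf)
# Wrap the matrix and the topic codes in a container (rows 1-75 train, 76-100 test)
container <- create_container(matrix,data$Topic.Code,trainSize=1:75, testSize=76:100,
                              virgin=FALSE)
# Run 2-fold cross-validation with the SVM algorithm
svm <- cross_validate(container,2,algorithm="SVM")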

Example output

Loading required package: SparseM

Attaching package: 'SparseM'

The following object is masked from 'package:base':

    backsolve

Fold 1 Out of Sample Accuracy = 0.2083333
Fold 2 Out of Sample Accuracy = 0.1923077
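
Because the example draws a random sample, the accuracies above will vary from run to run. A minimal sketch of a repeatable run, using the seed argument listed under Arguments, might look like the following. The seed value and the variable names docs, doc_matrix, run1, and run2 are arbitrary choices for illustration, and the code only runs when RTextTools is installed.

```r
# Sketch: repeat a cross-validation run with a fixed seed (requires RTextTools).
cv_seed <- 1234   # arbitrary value chosen for illustration

if (requireNamespace("RTextTools", quietly = TRUE)) {
  library(RTextTools)
  data(NYTimes)
  set.seed(cv_seed)  # also fix which 100 rows are sampled
  docs <- NYTimes[sample(1:3100, size = 100, replace = FALSE), ]
  doc_matrix <- create_matrix(cbind(docs["Title"], docs["Subject"]),
                              language = "english", removeNumbers = TRUE,
                              stemWords = FALSE, weighting = tm::weightTfIdf)
  container <- create_container(doc_matrix, docs$Topic.Code,
                                trainSize = 1:75, testSize = 76:100,
                                virgin = FALSE)
  # Same container, same seed: both calls receive identical inputs.
  run1 <- cross_validate(container, 2, algorithm = "SVM", seed = cv_seed)
  run2 <- cross_validate(container, 2, algorithm = "SVM", seed = cv_seed)
}
```

Calling set.seed before sampling is a separate step from passing seed to cross_validate: the former fixes which documents enter the container, while the latter is the argument this page describes for replicating the cross-validation results.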

RTextTools documentation built on April 26, 2020, 9:05 a.m.