train_test: Create Training and Test Data Subsets


Description

Creates a list of lists, containing the dependent variable and matrix of predictors for the training set and the test set, respectively.

Usage

train_test(current_fold, index, y, X, scale = T)

is.train_test(x)

## S3 method for class 'train_test'
subset(x, y = T, train = T)

## S3 method for class 'train_test'
levels(x)

## S3 method for class 'train_test'
dimnames(x)

## S3 method for class 'train_test'
as.integer(x, train = T)

## S3 method for class 'train_test'
print(x)

## S3 method for class 'train_test'
as.data.frame(x, train = T)

## S3 method for class 'train_test'
size(x, y = T, train = T)

Arguments

current_fold

The fold number whose observations are withheld and used as the test subset.

index

A vector of indices indicating which observations belong to which fold for cross-validation (see cv_index).

y

A vector for the dependent variable. Converted to a factor if it is not already.

X

A matrix of predictors.

scale

Logical; if TRUE, standardizes the predictors.
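The relationship between current_fold and index can be illustrated with a short base-R sketch. This is not the package's implementation: the fold labels are built by hand as a hypothetical stand-in for cv_index, and the name parts is purely illustrative. It assumes that observations whose fold label equals current_fold form the test subset and all others form the training subset.

```r
# Base-R sketch only; not the package's implementation.
set.seed(1)
n <- 20
n_folds <- 4

# Hypothetical stand-in for cv_index(): a shuffled vector of fold labels
index <- sample(rep(seq_len(n_folds), length.out = n))

y <- factor(sample(c("Yes", "No"), n, replace = TRUE))
X <- matrix(rnorm(n * 3), n, 3)

current_fold <- 1
is_test <- index == current_fold

# Observations in the current fold are withheld for testing; the rest train
parts <- list(
  train = list(y = y[!is_test], X = X[!is_test, , drop = FALSE]),
  test  = list(y = y[is_test],  X = X[is_test, , drop = FALSE])
)

# Number of observations in each subset
c(train = nrow(parts$train$X), test = nrow(parts$test$X))
```

Looping current_fold over the fold labels would give one train/test split per fold, which is the usual cross-validation pattern.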

Details

The method subset can be used to extract the dependent variable (y = TRUE) or matrix of predictors from the training (train = TRUE) or test subsets, respectively. The method as.data.frame converts the specified subset into a data frame (useful for glm). The methods dimnames and levels can extract the column labels for predictors and the levels for the dependent variable, respectively. The method as.integer can convert the subset for the dependent variable into integer values. When standardizing predictors, the training set is standardized, and then the test set is scaled relative to the training set.
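One common way to scale a test set relative to a training set is to reuse the training set's column means and standard deviations. The base-R sketch below shows that approach; it is an assumption about what "scaled relative to the training set" means here, not a copy of the package's code, and all variable names are illustrative.

```r
# Base-R sketch only; assumes centering and scaling by training-set statistics.
set.seed(2)
X_train <- matrix(rnorm(50 * 3, mean = 5, sd = 2), 50, 3)
X_test  <- matrix(rnorm(10 * 3, mean = 5, sd = 2), 10, 3)

# Column means and standard deviations come from the training set only
mu  <- colMeans(X_train)
sdv <- apply(X_train, 2, sd)

# Apply the same transformation to both subsets
X_train_z <- scale(X_train, center = mu, scale = sdv)
X_test_z  <- scale(X_test,  center = mu, scale = sdv)
```

Under this approach the training columns have mean 0 and standard deviation 1, while the test columns generally do not, because no information from the test set enters the transformation.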

Value

A list of lists, of class 'train_test'. The element train contains the training subset for the dependent variable y and matrix of predictors X, while the element test contains the test subset.

Examples

# Example data: binary outcome and 8 standard-normal predictors
y <- sample( c( 'Yes', 'No' ), 1000, replace = TRUE, prob = c(.4,.6) )
y <- as.factor(y)
X <- matrix( rnorm( 1000*8 ), 1000, 8 )
# Assign each of the 1000 observations to one of 10 folds
index <- cv_index( 10, 1000 )
# Withhold fold 1 as the test subset
ex <- train_test( 1, index, y, X )
ex
# Extract dependent variable from test set
y_test <- subset( ex, y = TRUE, train = FALSE )
# Extract matrix of predictors from training set
X_train <- subset( ex, y = FALSE, train = TRUE )
# Extract column labels for predictors
dimnames( ex )
# Extract levels for dependent variable
levels( ex )

rettopnivek/binclass documentation built on May 13, 2019, 4:46 p.m.