lol.xval.eval: Embedding Cross Validation


View source: R/xval.R

Description

A function for performing leave-one-out (or k-fold) cross-validation for a given embedding model. This function produces fold-wise cross-validated misclassification rates for standard embedding techniques. Users can optionally specify custom embedding techniques by properly configuring the alg.* parameters and hyperparameters. Any classifier implementing the S3 predict method can be used to compute the misclassification rate, with its hyperparameters specified via the classifier.* parameters.

Usage

lol.xval.eval(X, Y, r, alg, sets = NULL, alg.dimname = "r",
  alg.opts = list(), alg.embedding = "A", classifier = lda,
  classifier.opts = list(), classifier.return = "class", k = "loo", ...)

Arguments

X

[n, d] the data with n samples in d dimensions.

Y

[n] the labels of the samples with K unique labels.

r

the number of embedding dimensions desired, where r <= d.

alg

the algorithm to use for embedding. Should be a function that accepts inputs X and Y and has a parameter named by alg.dimname if alg is supervised, or just X and the alg.dimname parameter if alg is unsupervised. This algorithm should return a list containing a matrix that embeds from d to r <= d dimensions.
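As a sketch of this interface, a custom supervised embedding might look like the following (my.embed is illustrative, not part of lolR); it accepts X, Y, and a parameter matching the default alg.dimname of "r", and returns a list whose "A" attribute holds the [d, r] embedding matrix:

```r
# Illustrative custom embedding via PCA (ignores Y); not part of lolR.
# Accepts X, Y, and "r" (the default alg.dimname), and returns a list
# with the [d, r] embedding matrix under "A" (the default alg.embedding).
my.embed <- function(X, Y, r) {
  Xc <- sweep(X, 2, colMeans(X))   # center each column
  A <- svd(Xc, nu = 0, nv = r)$v   # top-r right singular vectors, [d, r]
  list(A = A)
}
# then, for example: xval.fit <- lol.xval.eval(X, Y, r=5, my.embed)
```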

sets

a user-defined cross-validation set. Defaults to NULL.

  • is.null(sets) randomly partition the inputs X and Y into training and testing sets.

  • !is.null(sets) use a user-defined partitioning of the inputs X and Y into training and testing sets. Should be in the format of the outputs from lol.xval.split. That is, a list with each element containing X.train, an [n-k][d] subset of data to train the model on; Y.train, an [n-k] subset of class labels for X.train; X.test, a [k][d] subset of data to test the model on; and Y.test, a [k] subset of class labels for X.test.

alg.dimname

the name of the parameter accepted by alg for indicating the embedding dimensionality desired. Defaults to "r".

alg.opts

the hyper-parameter options you want to pass into your algorithm, as a keyworded list. Defaults to list(), or no hyper-parameters.

alg.embedding

the attribute returned by alg containing the embedding matrix. Defaults to assuming that alg returns the embedding matrix in an attribute named "A".

  • !is.nan(alg.embedding) Assumes that alg will return a list containing an attribute, alg.embedding, a [d, r] matrix that embeds [n, d] data from [d] to [r < d] dimensions.

  • is.nan(alg.embedding) Assumes that alg returns a [d, r] matrix that embeds [n, d] data from [d] to [r < d] dimensions.
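As a sketch of the second case: if an algorithm returns the [d, r] matrix directly rather than inside a list, pass alg.embedding=NaN (my.embed.raw is illustrative, not part of lolR):

```r
# Illustrative embedding that returns a bare [d, r] matrix, not a list.
my.embed.raw <- function(X, Y, r) {
  svd(sweep(X, 2, colMeans(X)), nu = 0, nv = r)$v
}
# xval.fit <- lol.xval.eval(X, Y, r=5, my.embed.raw, alg.embedding=NaN)
```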

classifier

the classifier to use for assessing performance. The classifier should accept X, an [n, d] array, and Y, an [n] array of labels, as its first two arguments. The class should implement a predict method, predict.classifier, compatible with the stats::predict S3 generic. Defaults to MASS::lda.

classifier.opts

any extraneous options to be passed to the classifier function, as a list. Defaults to an empty list.

classifier.return

if the return type of stats::predict is a list, the attribute containing the prediction labels. Defaults to "class", the attribute returned by MASS::lda.

  • !is.nan(classifier.return) Assumes that predict.classifier will return a list containing an attribute, classifier.return, that encodes the predicted labels.

  • is.nan(classifier.return) Assumes that predict.classifier returns a [n] vector/array containing the prediction labels for [n, d] inputs.
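A minimal custom classifier fitting this interface might be sketched as follows (nearest.mean and its predict method are illustrative, not part of lolR). Since its predict method returns a bare label vector, it would be used with classifier.return=NaN:

```r
# Illustrative nearest-class-mean classifier; not part of lolR.
nearest.mean <- function(X, Y, ...) {
  mus <- apply(X, 2, function(col) tapply(col, Y, mean))  # [K, d] class means
  structure(list(mus = mus), class = "nearest.mean")
}
# S3 predict method returning a bare [n] vector of labels, so pass
# classifier.return=NaN to lol.xval.eval.
predict.nearest.mean <- function(object, X, ...) {
  K <- nrow(object$mus)
  D <- as.matrix(dist(rbind(object$mus, X)))   # pairwise distances
  D <- D[-(1:K), 1:K, drop = FALSE]            # test-point-to-mean distances
  rownames(object$mus)[apply(D, 1, which.min)]
}
```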

k

the cross-validated method to perform. Defaults to 'loo'. If sets is provided, this option is ignored. See lol.xval.split for details.

  • 'loo' Leave-one-out cross validation

  • is.integer(k) perform k-fold cross-validation with k as the number of folds.

...

trailing args.

rank

whether to force the training set to low-rank. Defaults to FALSE. If sets is provided, this option is ignored. See lol.xval.split for details.

  • if rank == FALSE, uses the default cross-validation method with standard k-fold validation. Training sets are k-1 folds and testing sets are 1 fold, where the fold held out for testing is rotated to ensure no dependence of potential downstream inference on the cross-validated misclassification rates.

  • if rank == TRUE, uses a cross-validation method with ntrain = min((k-1)/k*n, d) sample training sets, where d is the number of dimensions in X. This ensures that the training data is always low-rank, with ntrain < d + 1. Note that the resulting training sets may have ntrain < (k-1)/k*n, but the resulting testing sets will always be properly rotated with ntest = n/k to ensure no dependencies in fold-wise testing.
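The rank option is passed through the trailing arguments to the fold construction; a sketch of forcing low-rank training folds (assuming X, Y, and r are defined as in the Examples below):

```r
# Force low-rank training sets during 10-fold cross-validation; rank is
# forwarded through ... (see lol.xval.split for details).
xval.fit <- lol.xval.eval(X, Y, r, lol.project.lol, k=10, rank=TRUE)
```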

Value

Returns a list containing:

lhat

the mean cross-validated error.

model

The model returned by alg computed on all of the data.

classifier

The classifier trained on all of the embedded data.

lhats

the cross-validated error for each of the k-folds.

Details

For more details see the help vignette: vignette("xval", package = "lolR")

For extending cross-validation techniques shown here to arbitrary embedding algorithms, see the vignette: vignette("extend_embedding", package = "lolR")

For extending cross-validation techniques shown here to arbitrary classification algorithms, see the vignette: vignette("extend_classification", package = "lolR")

Author(s)

Eric Bridgeford

Examples

# train model and analyze with loo validation using the nearestCentroid classifier
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30)  # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
r=5  # embed into r=5 dimensions
# run cross-validation with the nearestCentroid method and
# leave-one-out cross-validation, which returns only
# prediction labels so we specify classifier.return as NaN
xval.fit <- lol.xval.eval(X, Y, r, lol.project.lol,
                          classifier=lol.classify.nearestCentroid,
                          classifier.return=NaN, k='loo')

# train model and analyze with 5-fold validation using lda classifier
data <- lol.sims.rtrunk(n=200, d=30)  # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
xval.fit <- lol.xval.eval(X, Y, r, lol.project.lol, k=5)

# pass in existing cross-validation sets
sets <- lol.xval.split(X, Y, k=2)
xval.fit <- lol.xval.eval(X, Y, r, lol.project.lol, sets=sets)

neurodata/lol documentation built on Oct. 17, 2018, 8:58 a.m.