ExposureClassifyCV: k-fold cross-validation of sample classification by exposure...

ExposureClassifyCV    R Documentation

k-fold cross-validation of sample classification by exposure levels

Description

Splits the labeled samples into a number of groups given by fold (default 8), keeping the proportion of classes stable across groups. Classifies the samples in each group using the remaining fold-1 groups as training data, then gathers the results and evaluates overall classification performance.
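The procedure described above can be sketched in base R as follows. This is a simplified illustration, not signeR's internal code: it works on a single feature matrix (samples in rows) instead of on repeated exposure instances, and the helper names stratified_folds and cv_predict, as well as the classify argument, are hypothetical.

```r
# Hypothetical helper: assign each labeled sample to one of `fold` groups,
# class by class, so every group keeps roughly the same class proportions.
stratified_folds <- function(labels, fold = 8) {
  fold_id <- rep(NA_integer_, length(labels))  # NA = unlabeled, left out
  for (cl in unique(labels[!is.na(labels)])) {
    idx <- which(labels == cl)
    idx <- idx[sample.int(length(idx))]                  # shuffle within class
    fold_id[idx] <- rep_len(seq_len(fold), length(idx))  # deal round-robin
  }
  fold_id
}

# Hypothetical CV loop: classify each group using the other fold - 1 groups
# as training data. classify(train, test, cls) must return one label per
# row of test (the interface described under "method").
cv_predict <- function(features, labels, classify, fold = 8) {
  ids <- stratified_folds(labels, fold)
  pred <- rep(NA_character_, length(labels))
  for (f in seq_len(fold)) {
    test_idx <- which(ids == f)
    if (length(test_idx) == 0) next
    train_idx <- which(ids != f)  # which() also drops NA (unlabeled) entries
    pred[test_idx] <- classify(features[train_idx, , drop = FALSE],
                               features[test_idx, , drop = FALSE],
                               labels[train_idx])
  }
  pred  # table(pred, labels) gives a confusion matrix
}
```

Dealing fold ids round-robin inside each class is one simple way to keep class proportions stable; any stratified assignment would serve the same purpose.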

Usage

## S4 method for signature 'SignExp,character'
ExposureClassifyCV(signexp_obj, labels, method="knn",
    max_instances=200, k=3, weights=NA, plot_to_file=FALSE, 
    file="Classification_CV_barplot.pdf", colors=NA_character_, 
    min_agree=0.75, fold=8, ...)

Arguments

signexp_obj

A SignExp object returned by the signeR function.

labels

Sample labels. Unlabeled samples (NA labels) will be ignored.

method

Classification algorithm to use. Default is k-nearest neighbors (kNN). Any other algorithm may be used, provided it is wrapped to satisfy the following conditions:
Input: a matrix of labeled samples, with one sample per row and one feature per column; a matrix of unlabeled samples to classify, with the same columns; and a vector of labels, with one entry for each labeled sample.
Output: a vector of assigned labels, one for each unlabeled sample.
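A custom algorithm following the input/output conditions above might look like the sketch below, a nearest-centroid classifier written in base R. The function name nearest_centroid is hypothetical, and whether method accepts a function object directly is an assumption that should be checked against the package source before use.

```r
# Hypothetical custom classifier following the interface described above:
#   train: labeled samples (rows) x features (columns)
#   test:  unlabeled samples, same columns
#   cls:   one label per row of train
# Returns one assigned label per row of test.
nearest_centroid <- function(train, test, cls, ...) {
  train <- as.matrix(train)
  test <- as.matrix(test)
  cls <- as.character(cls)
  classes <- sort(unique(cls))
  # One column of squared Euclidean distances per class centroid
  dists <- vapply(classes, function(cl) {
    centroid <- colMeans(train[cls == cl, , drop = FALSE])
    rowSums(sweep(test, 2, centroid)^2)
  }, numeric(nrow(test)))
  # vapply returns a plain vector when test has a single row; restore a matrix
  dists <- matrix(dists, nrow = nrow(test))
  classes[apply(dists, 1, which.min)]
}
```

If method does accept functions, the call would be ExposureClassifyCV(signatures$SignExposures, labels = my_labels, method = nearest_centroid).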

max_instances

Maximum number of exposure matrix (E) instances to analyze. If more E instances are available than this value, a random subset of max_instances of them is selected for analysis.
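The subsampling rule can be illustrated with a small sketch. The helper select_instances is hypothetical and only shows the selection logic, not how signeR stores or indexes its instances.

```r
# Hypothetical helper: choose which exposure-matrix instances to analyze.
select_instances <- function(n_available, max_instances = 200) {
  if (n_available > max_instances) {
    sort(sample.int(n_available, max_instances))  # random subset, no repeats
  } else {
    seq_len(n_available)                          # use every instance
  }
}
```

If the instances were held in a signatures x samples x instances array E (an assumption made only for this example), E[, , select_instances(dim(E)[3])] would keep the selected ones.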

k

Number of nearest neighbors considered for classification; used only if method="knn". Default is 3.
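To show what k controls, here is a generic kNN majority vote in base R. It is a sketch of the standard technique with Euclidean distance, not necessarily the distance or tie-breaking rule used inside signeR; the name knn_vote is hypothetical.

```r
# Hypothetical helper: plain kNN with Euclidean distance and majority vote.
knn_vote <- function(train, test, cls, k = 3) {
  train <- as.matrix(train)
  test <- as.matrix(test)
  cls <- as.character(cls)
  vapply(seq_len(nrow(test)), function(i) {
    d <- rowSums(sweep(train, 2, test[i, ])^2)   # squared distances to train
    nn <- order(d)[seq_len(min(k, nrow(train)))] # indices of the k nearest
    votes <- table(cls[nn])
    names(votes)[which.max(votes)]  # ties go to the first label in sorted order
  }, character(1))
}
```

A larger k smooths the decision over more neighbors; an odd k avoids ties in two-class problems.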

weights

Vector of weights applied to the signatures when performing classification. Default is NA, which gives every signature a weight of 1.
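One plausible reading of signature weights, assumed here rather than taken from the package, is that each signature's exposure column is scaled by its weight before distances are computed. The helper apply_signature_weights is hypothetical and uses the samples-in-rows, signatures-in-columns layout described under method.

```r
# Hypothetical helper (assumed behavior): scale each signature column by its
# weight. NA, the documented default, means weight 1 for every signature.
apply_signature_weights <- function(exposures, weights = NA) {
  exposures <- as.matrix(exposures)
  if (length(weights) == 1 && is.na(weights)) {
    weights <- rep(1, ncol(exposures))
  }
  stopifnot(length(weights) == ncol(exposures))
  sweep(exposures, 2, weights, `*`)
}
```

Under this reading, a weight above 1 makes differences in that signature count more in the distance, and a weight of 0 removes the signature from consideration.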

plot_to_file

Whether to save the plot to the file named by the file parameter. Default is FALSE.

file

Name of the file to which the cross-validation plot is written when plot_to_file=TRUE. Default is "Classification_CV_barplot.pdf".

colors

Array of color names, one for each sample class. Colors will be recycled if the length of this array is less than the number of classes.

min_agree

Minimum frequency of agreement among individual classifications. Samples whose frequency of agreement falls below this value are classified as "undefined". Default is 0.75.
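The agreement rule can be sketched from an assignment-frequency matrix laid out like allfreqs (classes in rows, samples in columns). The helper call_with_min_agree is hypothetical and illustrates the thresholding only.

```r
# Hypothetical helper: turn assignment frequencies into final calls.
# allfreqs: one row per class, one column per sample.
call_with_min_agree <- function(allfreqs, min_agree = 0.75) {
  best <- apply(allfreqs, 2, which.max)                  # most frequent class
  freq <- allfreqs[cbind(best, seq_len(ncol(allfreqs)))] # its frequency
  assigned <- rownames(allfreqs)[best]
  assigned[freq < min_agree] <- "undefined"              # too little agreement
  list(class = assigned, freq = freq)
}
```

For example, a sample assigned to class "a" in 60% of the repetitions stays "undefined" at the default min_agree of 0.75.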

fold

Number of subsets into which the labeled samples are split. Default is 8.

...

Additional parameters passed to the classification algorithm (defined by "method" above).

Value

A list with the following items:

confusion_matrix

Contingency table of assigned sample classes against original labels.

class

The assigned classes for each sample.

freq

Classification agreement for each sample: the relative frequency of assignment of each sample to the group specified in "class".

allfreqs

Matrix with one column for each sample and one row for each class label. Contains the assignment frequencies of each sample to each class.

probs

As above, a matrix with samples in columns and class labels in rows. Contains the average probability, among repeated exposure classifications, of each sample belonging to each class.
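How allfreqs and confusion_matrix relate to repeated classifications can be sketched as follows. The helpers assignment_frequencies and confusion are hypothetical, and preds is assumed to hold one row per repeated classification (for instance, one per exposure instance) and one column per sample.

```r
# Hypothetical helper: relative frequency of each class per sample.
assignment_frequencies <- function(preds, classes) {
  counts <- vapply(seq_len(ncol(preds)), function(j) {
    tabulate(match(preds[, j], classes), nbins = length(classes))
  }, integer(length(classes)))
  # vapply returns a vector when there is a single class; rebuild the matrix
  matrix(counts / nrow(preds), nrow = length(classes),
         dimnames = list(classes, colnames(preds)))
}

# Hypothetical helper: assigned classes against original labels.
confusion <- function(assigned, original) {
  table(assigned = assigned, original = original)
}
```

Each column of the frequency matrix sums to 1, so its largest entry is the agreement value reported in freq.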

Examples

# assuming signatures is the return value of signeR()


my_labels <- c("a","a","a","a","a","b","b","b","b","b")
ClassCV <- ExposureClassifyCV(signatures$SignExposures, labels=my_labels, fold=5)

# see also
vignette(package="signeR")
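A single accuracy figure can be read off the returned confusion matrix. The sketch below assumes only that confusion_matrix is a two-way table whose row and column names are class labels (its orientation does not matter here); the helper cv_accuracy is hypothetical, and it counts "undefined" samples as misclassified.

```r
# Hypothetical helper: overall accuracy from a two-way confusion table.
# Rows without a matching column (e.g. "undefined") count as errors.
cv_accuracy <- function(cm) {
  cm <- unclass(cm)
  shared <- intersect(rownames(cm), colnames(cm))
  sum(cm[cbind(shared, shared)]) / sum(cm)
}
```

With the example above, cv_accuracy(ClassCV$confusion_matrix) would give the fraction of labeled samples assigned to their original class.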

rvalieris/signeR documentation built on April 20, 2024, 2:08 p.m.