predict_classes-matrix-method: Predict classes for new samples based on signature centroid...

predict_classes-matrix-method    R Documentation

Predict classes for new samples based on signature centroid matrix

Description

Predict classes for new samples based on signature centroid matrix

Usage

## S4 method for signature 'matrix'
predict_classes(object, mat, dist_method = c("euclidean", "correlation", "cosine"),
    nperm = 1000, p_cutoff = 0.05, plot = TRUE, col_fun = NULL, split_by_sigatures = FALSE,
    verbose = TRUE, prefix = "", mc.cores = 1, cores = mc.cores, width1 = NULL, width2 = NULL)

Arguments

object

The signature centroid matrix. See the Details section.

mat

The new matrix for which the classes are to be predicted. It should have the same number of rows as the signature centroid matrix, and the rows should be in the same order. Note that mat should be on the same scale as the centroid matrix.

dist_method

Distance method. Value should be "euclidean", "correlation" or "cosine".

nperm

Number of permutations. It is used when dist_method is set to "euclidean" or "cosine".

p_cutoff

Cutoff for the p-values for determining class assignment.

plot

Whether to draw the plot that visualizes the process of prediction.

col_fun

A color mapping function generated from colorRamp2. It is applied to both heatmaps.

verbose

Whether to print messages.

split_by_sigatures

Whether the heatmaps are split by the patterns of the signature heatmap, instead of by k-means clustering on the main heatmap.

prefix

Used internally.

mc.cores

Number of cores. This argument will be removed in future versions.

cores

Number of cores, or a cluster object returned by makeCluster.

width1

Width of the first heatmap.

width2

Width of the second heatmap.

Details

The signature centroid matrix is a k-column matrix where each column is the centroid of samples in the corresponding class (k-group classification).
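As a minimal sketch of what such a matrix looks like, the centroids can be computed in base R as per-class row means. The names expr, cl and centroid_mat are hypothetical and the data are random; this is not necessarily how cola derives its signature centroids (in the Examples section they are taken from the output of get_signatures).

```r
# Toy data: 5 rows (features) x 6 samples in two classes (hypothetical values)
set.seed(1)
expr = matrix(rnorm(30), nrow = 5,
    dimnames = list(paste0("g", 1:5), paste0("s", 1:6)))
cl = c("class1", "class1", "class1", "class2", "class2", "class2")

# One column per class: the row-wise mean over the samples in that class
centroid_mat = sapply(split(seq_len(ncol(expr)), cl), function(ind) {
    rowMeans(expr[, ind, drop = FALSE])
})

dim(centroid_mat)        # number of rows x k
colnames(centroid_mat)   # these are used as the class labels
```

The column names matter because predict_classes treats them as the class labels in the returned data frame.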

For each sample in the new matrix, the task is to test which signature centroid the sample is closest to. There are three methods: the Euclidean distance, the cosine distance and the correlation (Spearman) distance.

For the Euclidean/cosine distance method, let x denote the vector that corresponds to sample i in the new matrix. To test which class should be assigned to sample i, the distances between sample i and all k signature centroids are calculated and denoted as d_1, d_2, ..., d_k. The class with the smallest distance is assigned to sample i. The distances to the k centroids are sorted in increasing order, and we design a statistic named "difference ratio", denoted as r and calculated as: (|d_(1) - d_(2)|)/mean(d), which is the difference between the smallest distance and the second smallest distance, normalized by the mean distance. To test the statistical significance of r, we randomly permute the rows of the signature centroid matrix and calculate r_rand. The random permutation is performed nperm times and the p-value is calculated as the proportion of r_rand values that are larger than r.
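The difference ratio and its permutation test can be sketched in base R as follows. This is an illustration of the description above with the Euclidean distance, not the package's internal code; the helper diff_ratio and the toy data are hypothetical.

```r
# Difference ratio of one sample x against the columns of a centroid matrix
diff_ratio = function(x, centroid) {
    d = sqrt(colSums((centroid - x)^2))   # Euclidean distance to each centroid
    d_sorted = sort(d)
    list(class = names(d)[which.min(d)],
         r = abs(d_sorted[[1]] - d_sorted[[2]]) / mean(d))
}

# Toy centroids: 50 rows, 3 separated classes (hypothetical values)
set.seed(123)
centroid = cbind(class1 = rnorm(50, -2), class2 = rnorm(50, 0),
    class3 = rnorm(50, 2))
x = centroid[, "class2"] + rnorm(50, sd = 0.3)   # a sample close to class2

obs = diff_ratio(x, centroid)

# Null distribution: permute the rows of the centroid matrix nperm times
nperm = 1000
r_rand = replicate(nperm, {
    shuffled = centroid[sample(nrow(centroid)), , drop = FALSE]
    diff_ratio(x, shuffled)$r
})

# p-value: proportion of permuted ratios that are larger than the observed one
p_value = mean(r_rand > obs$r)

obs$class
p_value
```

A large r means the closest centroid is clearly closer than the runner-up, so a small p-value indicates that the assignment is unlikely to arise when the row correspondence is destroyed.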

For the correlation method, the Spearman correlation between sample i and each of the k signature centroids is calculated. The label of the class with the maximal correlation value is assigned to sample i. The p-value is calculated by cor.test between sample i and the centroid of the assigned class.
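A minimal base R sketch of the correlation method is shown below. The toy data and variable names are hypothetical, and the sketch assumes that the p-value is taken from the test against the best-matching centroid.

```r
# Toy centroids: 50 rows, 3 classes (hypothetical values)
set.seed(123)
centroid = cbind(class1 = rnorm(50), class2 = rnorm(50), class3 = rnorm(50))
x = centroid[, "class3"] + rnorm(50, sd = 0.3)   # a sample close to class3

# Spearman correlation between the sample and every centroid
rho = apply(centroid, 2, function(ce) cor(x, ce, method = "spearman"))

# The class with the maximal correlation is assigned
best = names(rho)[which.max(rho)]

# p-value from cor.test against the assigned centroid
p_value = cor.test(x, centroid[, best], method = "spearman")$p.value

best
p_value
```

No permutation is involved here, which is why nperm only applies to the "euclidean" and "cosine" settings.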

If a sample is tested with a p-value higher than p_cutoff, the corresponding class label is set to NA.
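The cutoff rule amounts to a single masking step. In the sketch below the data frame pred and its column names are hypothetical and only mimic the two-column layout described in the Value section.

```r
# Hypothetical predictions for three samples
pred = data.frame(class = c("class1", "class2", "class3"),
    p = c(0.001, 0.2, 0.04), stringsAsFactors = FALSE)
p_cutoff = 0.05

# Samples with a p-value above the cutoff get no class label
pred$class[pred$p > p_cutoff] = NA
pred
```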

Value

A data frame with two columns: the class labels (the column names of the signature centroid matrix are treated as class labels) and the corresponding p-values.

Examples


data(golub_cola)
res = golub_cola["ATC:skmeans"]
mat = get_matrix(res)
# note scaling should be applied here because the matrix was scaled in the cola analysis
mat2 = t(scale(t(mat)))

tb = get_signatures(res, k = 3, plot = FALSE)
sig_mat = tb[, grepl("scaled_mean", colnames(tb))]
sig_mat = as.matrix(sig_mat)
colnames(sig_mat) = paste0("class", seq_len(ncol(sig_mat)))
# this is what the signature centroid matrix looks like:
head(sig_mat)

mat2 = mat2[tb$which_row, , drop = FALSE]

# now we predict the class for `mat2` based on `sig_mat`
predict_classes(sig_mat, mat2)


jokergoo/cola documentation built on Feb. 29, 2024, 1:41 a.m.