nscentroids: Nearest Shrunken Centroids

nscentroids    R Documentation

Nearest Shrunken Centroids

Description

Nearest shrunken centroids performs regularized classification of high-dimensional data. Originally developed for classification of microarrays, it calculates test statistics for each feature/dimension based on the deviation between the class centroids and the global centroid. It applies regularization (via soft thresholding) to these test statistics to produce shrunken centroids for each class.

Usage

# Nearest shrunken centroids
nscentroids(x, y, s = 0, distfun = NULL,
	priors = table(y), center = NULL, transpose = FALSE,
	verbose = NA, nchunks = NA, BPPARAM = bpparam(), ...)

## S3 method for class 'nscentroids'
fitted(object, type = c("response", "class"), ...)

## S3 method for class 'nscentroids'
predict(object, newdata,
	type = c("response", "class"), ...)

## S3 method for class 'nscentroids'
logLik(object, ...)

Arguments

x

The data matrix.

y

The response. (Coerced to a factor.)

s

The sparsity (soft thresholding) parameter used to shrink the test statistics. May be a vector.

distfun

A function of the form function(x, y, ...) that generates a distance function of the form function(i), which gives the distances between the ith object(s) in x and all objects in y. If provided, it must support an argument called weights that takes a vector of feature weights used to scale the features during the distance calculation.

priors

The prior probabilities or sample sizes for each class. (Will be normalized.)

center

An optional vector giving the pre-calculated global centroid.

transpose

A logical value indicating whether x should be considered transposed or not. This can be useful if the input matrix is (P x N) instead of (N x P) and storing the transpose is expensive. This is not necessary for matter_mat and sparse_mat objects, but can be useful for large in-memory (P x N) matrices.

verbose

Should progress be printed for each iteration? Not passed to distfun.

nchunks

The number of chunks to use (for centering and scaling only). Passed to distfun.

BPPARAM

An optional instance of BiocParallelParam. See documentation for bplapply. Passed to distfun.

...

Additional options passed to distfun.

object

An object inheriting from nscentroids.

newdata

An optional data matrix to use for the prediction.

type

The type of prediction: "response" returns the posterior probability matrix, and "class" returns the vector of class predictions.
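The relationship between the two prediction types can be sketched in base R. This is an illustration only: the probability matrix below is made up, and it assumes that the "class" prediction is simply the most probable class in each row of the "response" matrix, which is not stated explicitly here.

```r
# Made-up posterior probabilities for 3 observations and 2 classes
probability <- matrix(c(0.9, 0.1,
                        0.4, 0.6,
                        0.2, 0.8), ncol = 2, byrow = TRUE,
                      dimnames = list(NULL, c("a", "b")))

# Assumption: the class prediction is the column with the highest probability
idx <- max.col(probability, ties.method = "first")
cls <- factor(colnames(probability)[idx], levels = colnames(probability))
cls
```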

Details

This function implements nearest shrunken centroids based on the original algorithm by Tibshirani et al. (2002). It provides a sparse strategy for classification based on regularized class centroids. The class centroids are shrunken toward the global centroid. The shrunken test statistics used to perform the regularization can then be interpreted to determine which features are relevant to the classification. (Important features will have nonzero test statistics after soft thresholding.)
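The shrinkage step described above can be sketched in base R. This is a rough illustration of the published algorithm, not the code used by nscentroids(): the scale factor m = sqrt(1/n_k - 1/n), the offset s0 = median of the pooled standard deviations, and all variable names are assumptions taken from the general description of the method, and the package may differ in these details.

```r
set.seed(1)
n <- 100; p <- 5
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- factor(ifelse(x[, 1L] > 0 | x[, 2L] < 0, "a", "b"))
s <- 1.5                                   # soft thresholding parameter

nk <- as.vector(table(y))                  # class sizes
K <- nlevels(y)
center <- colMeans(x)                      # global centroid
centers <- rowsum(x, y) / nk               # K x p class centroids

# Pooled within-class standard deviation for each feature
resid <- x - centers[as.integer(y), , drop = FALSE]
sdev <- sqrt(colSums(resid^2) / (n - K))

# Standardized deviation of each class centroid from the global centroid
m <- sqrt(1 / nk - 1 / n)                  # assumption: per-class scale factor
s0 <- median(sdev)                         # assumption: offset added to sdev
se <- outer(m, sdev + s0)                  # K x p standard errors
statistic <- sweep(centers, 2L, center) / se

# Soft thresholding: shrink toward zero by s, truncating at zero
soft <- function(d, s) sign(d) * pmax(abs(d) - s, 0)
shrunken <- soft(statistic, s)

# Shrunken centroids: global centroid plus the rescaled shrunken statistics
shrunk_centers <- sweep(se * shrunken, 2L, center, "+")

# Features with at least one nonzero shrunken statistic are retained
colSums(shrunken != 0) > 0
```

With s = 0 the soft thresholding leaves the statistics unchanged, so the shrunken centroids equal the ordinary class centroids. Larger values of s zero out more statistics, pulling the corresponding centroid coordinates to the global centroid.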

Unlike the original algorithm, this implementation allows specifying a custom dissimilarity function. If not provided, then this defaults to rowDistFun() if transpose=FALSE or colDistFun() if transpose=TRUE.

If a custom function is passed, it should take the form function(x, y, ...), and it must return a function of the form function(i). The returned function should return the distances between the ith object(s) in x and all objects in y. In addition, it must support an argument called weights that takes a vector of feature weights used to scale the features during the distance calculation. rowDistFun() and colDistFun() are examples of functions that satisfy these properties.
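The contract described above can be illustrated with a small base R sketch. The name manhattanDistFun is hypothetical, and several details are assumptions: that objects are stored as rows (transpose=FALSE), that the result has one row per selected object in x and one column per object in y, and that weights multiply the per-feature absolute differences. The way rowDistFun() and colDistFun() apply weights may differ.

```r
# Hypothetical generator: takes (x, y, ...) and returns a function of i
manhattanDistFun <- function(x, y, weights = NULL, ...) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  if (is.null(weights))
    weights <- rep(1, ncol(x))
  # Extra arguments (e.g. nchunks, BPPARAM) are absorbed by ... and ignored
  function(i) {
    xi <- x[i, , drop = FALSE]
    d <- matrix(0, nrow = nrow(xi), ncol = nrow(y))
    for (j in seq_len(nrow(y))) {
      # Weighted sum of absolute feature differences to the jth object in y
      ad <- abs(sweep(xi, 2L, y[j, ]))
      d[, j] <- as.vector(ad %*% weights)
    }
    d
  }
}

x <- matrix(c(0, 0, 1, 2, 3, 3), ncol = 2, byrow = TRUE)
y <- matrix(c(0, 0, 1, 1), ncol = 2, byrow = TRUE)
dist_i <- manhattanDistFun(x, y, weights = c(1, 2))
dist_i(2:3)   # distances from rows 2 and 3 of x to every row of y
```

Whether a function like this can be passed directly as distfun depends on details not covered here, such as the expected orientation of the result, so treat it as a template for the calling convention only.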

Value

An object of class nscentroids, with the following components:

  • class: The predicted classes.

  • probability: A matrix of posterior class probabilities.

  • centers: The shrunken class centroids used for classification.

  • statistic: The shrunken test statistics.

  • sd: The pooled within-class standard deviations for each feature.

  • priors: The prior class probabilities.

  • s: The regularization (soft thresholding) parameter.

  • distfun: The function used to generate the dissimilarity function.
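How the priors and the distances to the shrunken centroids might combine into the probability component can be sketched as follows. This follows the discriminant score described by Tibshirani et al. (2002), where the squared standardized distance is penalized by twice the log prior. The distances below are made up, and the exact computation inside nscentroids(), especially with a custom distfun, is an assumption.

```r
# Priors given as sample sizes are normalized to sum to 1
priors <- c(a = 3, b = 1)
priors <- priors / sum(priors)

# Made-up squared distances from 3 observations to 2 shrunken centroids
d2 <- matrix(c(1, 4,
               3, 3,
               6, 2), ncol = 2, byrow = TRUE,
             dimnames = list(NULL, c("a", "b")))

# Discriminant score: squared distance minus twice the log prior
score <- sweep(d2, 2L, 2 * log(priors), "-")

# Posterior probabilities: softmax of -score / 2, shifted for numerical stability
z <- -score / 2
z <- z - apply(z, 1L, max)
probability <- exp(z) / rowSums(exp(z))
probability
```

When an observation is equally distant from every centroid, as in the second row, the posterior probabilities reduce to the normalized priors.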

Author(s)

Kylie A. Bemis

References

R. Tibshirani, T. Hastie, B. Narasimhan, and G. Chu. “Diagnosis of multiple cancer types by shrunken centroids of gene expression.” Proceedings of the National Academy of Sciences of the USA, vol. 99, no. 10, pp. 6567-6572, 2002.

R. Tibshirani, T. Hastie, B. Narasimhan, and G. Chu. “Class prediction by nearest shrunken centroids, with applications to DNA microarrays.” Statistical Science, vol. 18, no. 1, pp. 104-117, 2003.

See Also

rowDistFun, colDistFun

Examples

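# register() and SerialParam() are provided by BiocParallel
library(matter)
library(BiocParallel)

# Run serially (no parallel workers)
register(SerialParam())

# Simulate 100 observations of 5 features; only x1 and x2 determine the class
set.seed(1)
n <- 100
p <- 5
x <- matrix(rnorm(n * p), nrow=n, ncol=p)
colnames(x) <- paste0("x", seq_len(p))
y <- ifelse(x[,1L] > 0 | x[,2L] < 0, "a", "b")

# Fit with soft thresholding parameter s = 1.5
nscentroids(x, y, s=1.5)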

kuwisdelu/matter documentation built on May 1, 2024, 5:17 a.m.