nscentroids: Nearest Shrunken Centroids

nscentroids    R Documentation

Description

Nearest shrunken centroids performs regularized classification of high-dimensional data. Originally developed for classification of microarrays, it calculates test statistics for each feature/dimension based on the deviation between the class centroids and the global centroid. It applies regularization (via soft thresholding) to these test statistics to produce shrunken centroids for each class.

Usage

# Nearest shrunken centroids
nscentroids(x, y, s = 0, distfun = NULL,
	priors = table(y), center = NULL, transpose = FALSE,
	verbose = NA, chunkopts = list(),
	BPPARAM = bpparam(), ...)

## S3 method for class 'nscentroids'
fitted(object, type = c("response", "class"), ...)

## S3 method for class 'nscentroids'
predict(object, newdata,
	type = c("response", "class"), priors = NULL, ...)

## S3 method for class 'nscentroids'
logLik(object, ...)

Arguments

x

The data matrix.

y

The response. (Coerced to a factor.)

s

The sparsity (soft thresholding) parameter used to shrink the test statistics. May be a vector.

distfun

A distance function with the same usage (i.e., supports the same arguments and return values) as rowDists or colDists. In particular, it must support an argument called weights that takes a vector of feature weights used to scale the feature-wise distance components.

priors

The prior probabilities or sample sizes for each class. (Will be normalized.)

center

An optional vector giving the pre-calculated global centroid.

transpose

A logical value indicating whether x should be considered transposed or not. This can be useful if the input matrix is (P x N) instead of (N x P) and storing the transpose is expensive. This is not necessary for matter_mat and sparse_mat objects, but can be useful for large in-memory (P x N) matrices.

verbose

Should progress be printed for the initial centroid calculations and for each fitted model (i.e., each value of s)?

chunkopts

An (optional) list of chunk options including nchunks, chunksize, and serialize. See chunkApply.

BPPARAM

An optional instance of BiocParallelParam. See documentation for bplapply. Passed to distfun.

...

Additional options passed to distfun.

object

An object inheriting from nscentroids.

newdata

An optional data matrix to use for the prediction.

type

The type of prediction: "response" returns the matrix of posterior class probabilities, and "class" returns the vector of predicted classes.

Details

This function implements nearest shrunken centroids based on the original algorithm by Tibshirani et al. (2002). It provides a sparse strategy for classification based on regularized class centroids. The class centroids are shrunken toward the global centroid. The shrunken test statistics used to perform the regularization can then be interpreted to determine which features are relevant to the classification. (Important features will have nonzero test statistics after soft thresholding.)
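The shrinkage step described above can be sketched in base R. This is an illustration following the general formulation of Tibshirani et al. (2002), not the code used by nscentroids: the helper soft(), the offset s0 (taken here as the median pooled standard deviation), and the scaling factor mk = sqrt(1/n_k - 1/n) are assumptions of this sketch and may differ from what the package actually computes.

```r
# Illustrative sketch only; not the matter implementation.
set.seed(1)
n <- 100
p <- 5
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- factor(ifelse(x[, 1L] > 0 | x[, 2L] < 0, "a", "b"))

# Soft thresholding: move each value toward zero by s, stopping at zero
soft <- function(d, s) sign(d) * pmax(abs(d) - s, 0)

nk <- as.vector(table(y))            # class sizes
global <- colMeans(x)                # global centroid (length P)
centers <- rowsum(x, y) / nk         # class centroids (K x P)

# Pooled within-class standard deviation for each feature
resid <- x - centers[as.integer(y), , drop = FALSE]
sd_pooled <- sqrt(colSums(resid^2) / (n - nlevels(y)))

# Assumed standardization of the centroid deviations (hypothetical choices)
s0 <- median(sd_pooled)
mk <- sqrt(1 / nk - 1 / n)
se <- outer(mk, sd_pooled + s0)      # K x P standard errors

# Test statistics: standardized deviation of class centroids from the global centroid
statistic <- sweep(centers, 2L, global) / se

# Shrink the statistics, then rebuild the shrunken class centroids
shrunken <- soft(statistic, s = 1.5)
shrunken_centers <- sweep(se * shrunken, 2L, global, "+")

# Features that keep a nonzero statistic in at least one class
selected <- colSums(shrunken != 0) > 0
```

With s = 0 the statistics are unchanged and the original class centroids are recovered; as s grows, more statistics become exactly zero and the corresponding features stop contributing to the classification.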

A custom distance function can be passed via distfun. If not provided, then this defaults to rowDists if transpose=FALSE or colDists if transpose=TRUE.

A custom function must support the same arguments and return values as rowDists and colDists, including the weights argument used to scale the feature-wise distance components.

Value

An object of class nscentroids, with the following components:

  • class: The predicted classes.

  • probability: A matrix of posterior class probabilities.

  • centers: The shrunken class centroids used for classification.

  • statistic: The shrunken test statistics.

  • sd: The pooled within-class standard deviations for each feature.

  • priors: The prior class probabilities.

  • s: The regularization (soft thresholding) parameter.

  • distfun: The function used to generate the dissimilarity function.
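As a rough illustration of how components such as centers, sd, and priors relate to the class and probability outputs, the sketch below computes posterior probabilities with the discriminant score from Tibshirani et al. (2002). All input values are hypothetical stand-ins rather than output from nscentroids, the helper score() is invented for this example, and the sketch assumes plain standard-deviation weighting with no offset, which may not match the package's internal calculation.

```r
# Hypothetical stand-ins for fitted components (not produced by nscentroids)
centers <- rbind(a = c(0.5, -0.4, 0), b = c(-0.8, 0.6, 0))   # K x P shrunken centroids
sd <- c(1.0, 1.2, 0.9)                                       # per-feature standard deviations
priors <- c(a = 75, b = 25)
priors <- priors / sum(priors)                               # normalize sample sizes to probabilities

# Two new observations (N x P)
newx <- rbind(c(0.4, -0.5, 0.1), c(-1.0, 0.7, -0.2))

# Discriminant score for one observation:
# standardized squared distance to each centroid minus 2 * log(prior)
score <- function(xi) {
  d2 <- colSums((t(centers) - xi)^2 / sd^2)
  d2 - 2 * log(priors)
}

scores <- t(apply(newx, 1L, score))          # N x K matrix of scores

# Posterior class probabilities: normalize exp(-score / 2) across classes
probability <- exp(-scores / 2)
probability <- probability / rowSums(probability)

# Predicted class: the class with the highest posterior probability
pred <- colnames(probability)[apply(probability, 1L, which.max)]
```

Each row of probability sums to one, and a larger prior lowers the score of its class, so common classes are favored when the distances are similar.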

Author(s)

Kylie A. Bemis

References

R. Tibshirani, T. Hastie, B. Narasimhan, and G. Chu. “Diagnosis of multiple cancer types by shrunken centroids of gene expression.” Proceedings of the National Academy of Sciences of the USA, vol. 99, no. 10, pp. 6567-6572, 2002.

R. Tibshirani, T. Hastie, B. Narasimhan, and G. Chu. “Class prediction by nearest shrunken centroids, with applications to DNA microarrays.” Statistical Science, vol. 18, no. 1, pp. 104-117, 2003.

See Also

rowDists, colDists

Examples

library(matter)

# Use serial (non-parallel) evaluation
register(SerialParam())

set.seed(1)
n <- 100
p <- 5
x <- matrix(rnorm(n * p), nrow=n, ncol=p)
colnames(x) <- paste0("x", seq_len(p))
y <- ifelse(x[,1L] > 0 | x[,2L] < 0, "a", "b")

nscentroids(x, y, s=1.5)
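The fitted model can also be passed to the S3 methods listed under Usage. The following is a minimal sketch that assumes the matter and BiocParallel packages are installed and that a single value of s yields one nscentroids object; newx is a hypothetical matrix of new observations invented for this example.

```r
library(matter)
library(BiocParallel)

register(SerialParam())

set.seed(1)
n <- 100
p <- 5
x <- matrix(rnorm(n * p), nrow=n, ncol=p)
colnames(x) <- paste0("x", seq_len(p))
y <- ifelse(x[,1L] > 0 | x[,2L] < 0, "a", "b")

fit <- nscentroids(x, y, s=1.5)

# Fitted classes and posterior probabilities for the training data
fitted_class <- fitted(fit, type="class")
fitted_prob <- fitted(fit, type="response")

# Predicted classes for new observations with the same features (hypothetical data)
newx <- matrix(rnorm(10 * p), nrow=10, ncol=p)
colnames(newx) <- colnames(x)
pred_class <- predict(fit, newdata=newx, type="class")

# Log-likelihood of the fitted model
logLik(fit)
```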

kuwisdelu/matter documentation built on Dec. 8, 2024, 8:09 p.m.