nscentroids: Nearest Shrunken Centroids
In kuwisdelu/matter: Out-of-core statistical computing and signal processing

nscentroids

R Documentation

Nearest Shrunken Centroids

Description

Nearest shrunken centroids performs regularized classification of high-dimensional data. Originally developed for classification of microarrays, it calculates test statistics for each feature/dimension based on the deviation between the class centroids and the global centroid. It applies regularization (via soft thresholding) to these test statistics to produce shrunken centroids for each class.

Usage

# Nearest shrunken centroids
nscentroids(x, y, s = 0, distfun = NULL,
	priors = table(y), center = NULL, transpose = FALSE,
	verbose = NA, chunkopts=list(),
	BPPARAM = bpparam(), ...)

## S3 method for class 'nscentroids'
fitted(object, type = c("response", "class"), ...)

## S3 method for class 'nscentroids'
predict(object, newdata,
	type = c("response", "class"), priors = NULL, ...)

## S3 method for class 'nscentroids'
logLik(object, ...)

Arguments

`x`	The data matrix.
`y`	The response. (Coerced to a factor.)
`s`	The sparsity (soft thresholding) parameter used to shrink the test statistics. May be a vector.
`distfun`	A distance function with the same usage (i.e., supports the same arguments and return values) as `rowDists` or `colDists`. In particular, it must support an argument called `weights` that takes a vector of feature weights used to scale the feature-wise distance components.
`priors`	The prior probabilities or sample sizes for each class. (Will be normalized.)
`center`	An optional vector giving the pre-calculated global centroid.
`transpose`	A logical value indicating whether `x` should be considered transposed or not. This can be useful if the input matrix is (P x N) instead of (N x P) and storing the transpose is expensive. This is not necessary for `matter_mat` and `sparse_mat` objects, but can be useful for large in-memory (P x N) matrices.
`verbose`	Should progress be printed for the initial centroid calculations and for each fitted model (i.e., each value of `s`)?
`chunkopts`	An (optional) list of chunk options including `nchunks`, `chunksize`, and `serialize`. See `chunkApply`.
`BPPARAM`	An optional instance of `BiocParallelParam`. See documentation for `bplapply`. Passed to `distfun`.
`...`	Additional options passed to `distfun`.
`object`	An object inheriting from `nscentroids`.
`newdata`	An optional data matrix to use for the prediction.
`type`	The type of prediction, where `"response"` returns the matrix of posterior class probabilities and `"class"` returns the vector of predicted classes.

Details

This function implements nearest shrunken centroids based on the original algorithm by Tibshirani et al. (2002). It provides a sparse strategy for classification based on regularized class centroids. The class centroids are shrunken toward the global centroid. The shrunken test statistics used to perform the regularization can then be interpreted to determine which features are relevant to the classification. (Important features will have nonzero test statistics after soft thresholding.)
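The shrinkage step described above can be sketched in base R. This is a minimal illustration following the formulation commonly given for Tibshirani et al.'s method, not the code used inside `nscentroids`. The helper names `soft_threshold` and `shrunken_centroids` are hypothetical, and the scaling term `sqrt(1/n_k - 1/n)` and the offset `s0 = median(sd)` are assumptions taken from that formulation.

```r
# Soft thresholding: move each value toward zero by `s`, stopping at zero
soft_threshold <- function(d, s) sign(d) * pmax(abs(d) - s, 0)

# Hypothetical helper (not part of matter): shrunken centroids for an
# N x P matrix `x` and class labels `y`
shrunken_centroids <- function(x, y, s = 0) {
  y <- droplevels(as.factor(y))
  n <- nrow(x)
  nk <- as.vector(table(y))            # class sizes
  K <- nlevels(y)
  center <- colMeans(x)                # global centroid (length P)
  centers <- rowsum(x, y) / nk         # class centroids (K x P)
  # pooled within-class standard deviation for each feature
  resid <- x - centers[as.integer(y), , drop = FALSE]
  sd <- sqrt(colSums(resid^2) / (n - K))
  s0 <- median(sd)                     # assumed offset guarding against small sd
  mk <- sqrt(1 / nk - 1 / n)           # assumed class-size scaling
  scale <- outer(mk, sd + s0)          # K x P standard errors
  # test statistics: standardized deviation of each class centroid
  # from the global centroid, then soft-thresholded
  stat <- soft_threshold(sweep(centers, 2, center) / scale, s)
  # map the shrunken statistics back to centroids
  shrunk <- sweep(stat * scale, 2, center, "+")
  list(centers = shrunk, statistic = stat, sd = sd)
}

set.seed(1)
x <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
y <- ifelse(x[, 1L] > 0 | x[, 2L] < 0, "a", "b")
fit <- shrunken_centroids(x, y, s = 1.5)
# features whose statistic is nonzero in some class survive the threshold
which(colSums(fit$statistic != 0) > 0)
```

With `s = 0` the class centroids are returned unchanged, and with a sufficiently large `s` every statistic is zero and every class centroid collapses onto the global centroid.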

A custom distance function can be passed via `distfun`. If none is provided, it defaults to `rowDists` if `transpose=FALSE` or `colDists` if `transpose=TRUE`. A custom function must support the same arguments and return values as `rowDists` and `colDists`.
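To illustrate what feature weights mean for a distance, here is a base R sketch of a weighted Euclidean distance between the rows of two matrices. The function `weighted_row_dists` is hypothetical and is not a drop-in replacement for `rowDists`: it does not reproduce that function's full argument list, and the form `sqrt(sum(w * (x - y)^2))` is an assumption about how the weights scale each feature-wise component.

```r
# Hypothetical stand-in (not matter's rowDists): weighted Euclidean distance
# between each row of `x` (N x P) and each row of `y` (K x P)
weighted_row_dists <- function(x, y, weights = rep(1, ncol(x))) {
  stopifnot(ncol(x) == ncol(y), length(weights) == ncol(x))
  d <- matrix(0, nrow = nrow(x), ncol = nrow(y))
  for (j in seq_len(nrow(y))) {
    diff <- sweep(x, 2, y[j, ])              # x[i, ] - y[j, ] for every row i
    d[, j] <- sqrt(drop(diff^2 %*% weights)) # weighted sum over features
  }
  d
}

x <- rbind(c(0, 0), c(3, 4))
centers <- rbind(c(0, 0))
weighted_row_dists(x, centers)                     # unweighted distances
weighted_row_dists(x, centers, weights = c(1, 0))  # second feature ignored
```

In a nearest shrunken centroids setting, the weights would typically down-weight noisy features, for example using inverse squared standard deviations.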

Value

An object of class nscentroids, with the following components:

class: The predicted classes.
probability: A matrix of posterior class probabilities.
centers: The shrunken class centroids used for classification.
statistic: The shrunken test statistics.
sd: The pooled within-class standard deviations for each feature.
priors: The prior class probabilities.
s: The regularization (soft thresholding) parameter.
distfun: The function used to generate the dissimilarity function.
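As a rough guide to how posterior probabilities and predicted classes can follow from distances and priors, the sketch below applies the discriminant rule from Tibshirani et al. in base R. The helper `posterior_from_dists` is hypothetical, and the assumption that the score is `-d2 / 2 + log(prior)` comes from the paper's formulation rather than from the `nscentroids` source.

```r
# Hypothetical helper (not part of matter): posterior probabilities from
# squared standardized distances `d2` (N x K) and class priors (length K)
posterior_from_dists <- function(d2, priors) {
  priors <- priors / sum(priors)                   # normalize (e.g., class counts)
  score <- sweep(-0.5 * d2, 2, log(priors), "+")   # log unnormalized posterior
  score <- score - apply(score, 1, max)            # stabilize exp()
  p <- exp(score)
  p / rowSums(p)                                   # each row sums to 1
}

d2 <- rbind(c(0, 4), c(4, 0), c(1, 1))
p <- posterior_from_dists(d2, priors = c(a = 50, b = 50))
c("a", "b")[apply(p, 1, which.max)]                # predicted classes
```

An observation equidistant from two centroids is assigned according to the priors, which is why changing `priors` at prediction time can change the predicted class.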

Author(s)

Kylie A. Bemis

References

R. Tibshirani, T. Hastie, B. Narasimhan, and G. Chu. “Diagnosis of multiple cancer types by shrunken centroids of gene expression.” Proceedings of the National Academy of Sciences of the USA, vol. 99, no. 10, pp. 6567-6572, 2002.

R. Tibshirani, T. Hastie, B. Narasimhan, and G. Chu. “Class prediction by nearest shrunken centroids, with applications to DNA microarrays.” Statistical Science, vol. 18, no. 1, pp. 104-117, 2003.

Examples

library(matter)

# use serial (non-parallel) evaluation for the example
register(SerialParam())

set.seed(1)
n <- 100
p <- 5
x <- matrix(rnorm(n * p), nrow=n, ncol=p)
colnames(x) <- paste0("x", seq_len(p))
y <- ifelse(x[,1L] > 0 | x[,2L] < 0, "a", "b")

nscentroids(x, y, s=1.5)

kuwisdelu/matter documentation built on April 12, 2025, 2:41 p.m.