nscentroids | R Documentation |
Nearest shrunken centroids performs regularized classification of high-dimensional data. Originally developed for classification of microarrays, it calculates test statistics for each feature/dimension based on the deviation between the class centroids and the global centroid. It applies regularization (via soft thresholding) to these test statistics to produce shrunken centroids for each class.
# Nearest shrunken centroids
nscentroids(x, y, s = 0, distfun = NULL,
priors = table(y), center = NULL, transpose = FALSE,
verbose = NA, nchunks = NA, BPPARAM = bpparam(), ...)
## S3 method for class 'nscentroids'
fitted(object, type = c("response", "class"), ...)
## S3 method for class 'nscentroids'
predict(object, newdata,
type = c("response", "class"), ...)
## S3 method for class 'nscentroids'
logLik(object, ...)
x | The data matrix. |
y | The response. (Coerced to a factor.) |
s | The sparsity (soft thresholding) parameter used to shrink the test statistics. May be a vector. |
distfun | The function of the form |
priors | The prior probabilities or sample sizes for each class. (Will be normalized.) |
center | An optional vector giving the pre-calculated global centroid. |
transpose | A logical value indicating whether |
verbose | Should progress be printed for each iteration? Not passed to |
nchunks | The number of chunks to use (for centering and scaling only). Passed to |
BPPARAM | An optional instance of |
... | Additional options passed to |
object | An object inheriting from |
newdata | An optional data matrix to use for the prediction. |
type | The type of prediction, where |
This function implements nearest shrunken centroids based on the original algorithm by Tibshirani et al. (2002). It provides a sparse strategy for classification based on regularized class centroids: the class centroids are shrunken toward the global centroid by soft thresholding the per-feature test statistics. The shrunken test statistics can then be interpreted to determine which features are relevant to the classification. (Important features will have nonzero test statistics after soft thresholding.)
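The shrinkage step can be sketched in base R. This is a minimal illustration that follows the formulas of Tibshirani et al. (2002, 2003), not the code used by nscentroids(); the names soft, sdev, s0, and mk are hypothetical, and the offset s0 and scaling mk are assumptions taken from the papers that may differ from this implementation.

```r
# Soft thresholding: move each value toward zero by s, zeroing anything within s of zero
soft <- function(d, s) sign(d) * pmax(abs(d) - s, 0)

set.seed(1)
n <- 100
p <- 5
x <- matrix(rnorm(n * p), nrow=n, ncol=p)
y <- factor(ifelse(x[,1L] > 0 | x[,2L] < 0, "a", "b"))

nk <- as.vector(table(y))            # class sizes
K <- nlevels(y)
center <- colMeans(x)                # global centroid (length p)
centers <- rowsum(x, y) / nk         # K x p matrix of class centroids

# Pooled within-class standard deviation for each feature
resid <- x - centers[as.integer(y), , drop=FALSE]
sdev <- sqrt(colSums(resid^2) / (n - K))

# Assumed from the papers: offset s0 and class-size scaling mk
s0 <- median(sdev)
mk <- sqrt(1 / nk - 1 / n)
se <- outer(mk, sdev + s0)           # K x p standard errors

# Test statistics: standardized deviation of each class centroid from the global centroid
statistic <- sweep(centers, 2, center, "-") / se

# Shrink the statistics, then map them back to shrunken class centroids
shrunken <- soft(statistic, s=1.5)
shrunken_centers <- sweep(shrunken * se, 2, center, "+")
```

Features whose column of shrunken is zero for every class contribute nothing to the classification, since their shrunken class centroids all equal the global centroid.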
Unlike the original algorithm, this implementation allows specifying a custom dissimilarity function. If not provided, then this defaults to rowDistFun() if transpose=FALSE or colDistFun() if transpose=TRUE.
If a custom function is passed, it should take the form function(x, y, ...), and it must return a function of the form function(i). The returned function should return the distances between the ith object(s) in x and all objects in y. In addition, it must support an argument called weights that takes a vector of feature weights used to scale the features during the distance calculation. rowDistFun() and colDistFun() are examples of functions that satisfy these properties.
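As an illustration of this contract, here is a minimal base R sketch of a custom dissimilarity generator. The name manhattanDistFun is hypothetical. The sketch assumes that observations are rows (as with transpose=FALSE), that weights is an argument of the outer function, and that the returned function yields one row per selected object in x and one column per object in y; check these assumptions against rowDistFun() before relying on them.

```r
# Hypothetical custom dissimilarity: weighted Manhattan distance between rows.
# Assumption: `weights` is accepted by the outer function and scales each feature.
manhattanDistFun <- function(x, y, weights = NULL, ...) {
  if (is.null(weights))
    weights <- rep(1, ncol(x))
  function(i) {
    xi <- x[i, , drop=FALSE]
    # For each row j of y, compute the weighted L1 distance to every selected row of x
    d <- vapply(seq_len(nrow(y)), function(j) {
      colSums(weights * abs(t(xi) - y[j, ]))
    }, numeric(nrow(xi)))
    # Assumed return shape: length(i) rows by nrow(y) columns
    matrix(d, nrow=nrow(xi), ncol=nrow(y))
  }
}

# Small check on toy data
xs <- matrix(c(0, 0,
               1, 1), nrow=2, byrow=TRUE)
ys <- matrix(c(0, 0,
               2, 0,
               1, 3), nrow=3, byrow=TRUE)
dfun <- manhattanDistFun(xs, ys)
dfun(1)      # distances from the 1st row of xs to each row of ys
dfun(1:2)    # one row of distances per selected row of xs
```

A generator like this would then be supplied through the distfun argument, for example nscentroids(x, y, s=1.5, distfun=manhattanDistFun), provided the assumptions above match what the function expects.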
An object of class nscentroids, with the following components:
class: The predicted classes.
probability: A matrix of posterior class probabilities.
centers: The shrunken class centroids used for classification.
statistic: The shrunken test statistics.
sd: The pooled within-class standard deviations for each feature.
priors: The prior class probabilities.
s: The regularization (soft thresholding) parameter.
distfun: The function used to generate the dissimilarity function.
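To show how the class, probability, and priors components relate, the following base R sketch applies the classification rule from Tibshirani et al. (2002): each sample's squared distance to a shrunken centroid is penalized by the log prior, and the posterior is a softmax of the resulting scores. The distances in dist2 are made-up values, and whether nscentroids() computes its probability component in exactly this way is an assumption.

```r
# Hypothetical squared (standardized) distances from 3 samples to 2 shrunken centroids
dist2 <- matrix(c(1.0, 4.0,
                  3.0, 3.0,
                  6.0, 0.5), nrow=3, byrow=TRUE,
                dimnames=list(NULL, c("a", "b")))

# Priors may be given as sample sizes; normalize them to sum to 1
priors <- c(a=75, b=25)
priors <- priors / sum(priors)

# Discriminant score (smaller is better): distance minus twice the log prior
scores <- sweep(dist2, 2, 2 * log(priors), "-")

# Posterior probabilities; subtract the row minimum first for numerical stability
z <- -(scores - apply(scores, 1, min)) / 2
probability <- exp(z) / rowSums(exp(z))

# Predicted class is the one with the highest posterior probability
pred <- factor(colnames(probability)[max.col(probability, ties.method="first")],
               levels=colnames(probability))
```

When a sample is equally distant from every centroid, as in the second row of dist2, its posterior probabilities reduce to the priors.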
Kylie A. Bemis
R. Tibshirani, T. Hastie, B. Narasimhan, and G. Chu. “Diagnosis of multiple cancer types by shrunken centroids of gene expression.” Proceedings of the National Academy of Sciences of the USA, vol. 99, no. 10, pp. 6567-6572, 2002.
R. Tibshirani, T. Hastie, B. Narasimhan, and G. Chu. “Class prediction by nearest shrunken centroids, with applications to DNA microarrays.” Statistical Science, vol. 18, no. 1, pp. 104-117, 2003.
rowDistFun, colDistFun
register(SerialParam())
set.seed(1)
n <- 100
p <- 5
x <- matrix(rnorm(n * p), nrow=n, ncol=p)
colnames(x) <- paste0("x", seq_len(p))
y <- ifelse(x[,1L] > 0 | x[,2L] < 0, "a", "b")
nscentroids(x, y, s=1.5)