cosci_is_select: Use a data-driven approach to select the features

Description

Once you have the feature scores from cosci_is, you can select the features in one of three ways:

  1. based on a pre-defined threshold,

  2. using Table A.10 in the paper [1] to determine an appropriate threshold, or

  3. using the data-driven approach described in the references, which selects the features and yields an implicit threshold value.

cosci_is_select implements option 3.
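
For contrast, option 1 can be sketched in a few lines of base R. This is only an illustration: the score values and the threshold below are made up, `scores` and `threshold` are hypothetical names, and the sketch assumes that features with higher scores are the ones to keep, consistent with the Details section, which speaks of screening out features with lower scores.

```r
# Hypothetical scores for five features (illustrative values, not cosci_is output)
scores <- c(0.02, 0.31, 0.05, 0.27, 0.01)

# Hypothetical pre-defined threshold; this page does not prescribe a value
threshold <- 0.1

# Keep the indices of the features whose score exceeds the threshold
selected <- which(scores > threshold)
print(selected)
```

With option 3, no such threshold has to be supplied; only the noise proportion gamma is needed.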

Usage

cosci_is_select(score, gamma)

Arguments

score

a vector of length p holding the feature scores, for example the scores obtained from cosci_is

gamma

the assumed proportion of the p features that are noise. If the sample size n is smaller than 100, gamma = 0.85 is recommended; otherwise use gamma = 0.9.
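
The recommendation for gamma can be written as a small helper. This is a sketch only: `choose_gamma` is a hypothetical function and not part of fusionclust, and it simply encodes the rule of thumb stated above.

```r
# Hypothetical helper (not part of fusionclust): pick gamma from the sample size n
choose_gamma <- function(n) {
  # Rule of thumb from the argument description: 0.85 when n < 100, otherwise 0.9
  if (n < 100) 0.85 else 0.9
}

choose_gamma(50)    # small sample
choose_gamma(1000)  # large sample
```

The result could then be passed as the gamma argument, with n taken as the number of rows of the data matrix.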

Details

The function recasts the problem of screening out features with lower scores as a large-scale multiple testing problem and applies the procedure described in reference [2] to identify the signal features.

Value

a vector of selected features

References

  1. Banerjee, T., Mukherjee, G. and Radchenko, P., Feature Screening in Large Scale Cluster Analysis, Journal of Multivariate Analysis 161 (2017) 191-212

  2. Cai, T. and Sun, W., Optimal screening and discovery of sparse signals with applications to multistage high throughput studies, J. Roy. Statist. Soc. Ser. B (Statistical Methodology) 79, no. 1 (2017) 197-223

See Also

cosci_is

Examples

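library(fusionclust)
# Simulate 1000 observations of 49 pure-noise features
set.seed(42)
noise <- matrix(rnorm(49000), nrow = 1000, ncol = 49)
# One signal feature: a two-component normal mixture with means -1.5 and 1.5
set.seed(42)
signal <- c(rnorm(500, -1.5, 1), rnorm(500, 1.5, 1))
# Column 1 is the signal feature; columns 2 to 50 are noise
x <- cbind(signal, noise)
# Score every feature, then select features with gamma = 0.9 (n >= 100)
scores <- cosci_is(x, 0)
features <- cosci_is_select(scores, 0.9)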

trambakbanerjee/fusionclust documentation built on June 18, 2021, 5:40 a.m.