uni.selection: Univariate feature selection based on univariate significance tests



Description

This function performs univariate feature selection using significance tests (Wald tests or score tests) of the association between each individual feature and survival. A feature is selected if its P-value is less than a given threshold (P.value).
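The selection rule can be illustrated with a minimal base-R sketch of the score-test variant. This is not the compound.Cox source: the helper names `cox_score_z` and `select_features` are hypothetical, ties are handled with the Breslow convention, P-values are two-sided normal approximations, and the `d0` stabilizing constant is omitted (presumably equivalent to the default `d0 = 0`).

```r
# Hypothetical helpers illustrating univariate score-test selection;
# NOT the compound.Cox implementation. Assumes d0 = 0 and Breslow ties.

# Score statistic Z = U / sqrt(I) for H0: beta = 0 in a one-covariate Cox model
cox_score_z <- function(t.vec, d.vec, x) {
  U <- 0  # score
  I <- 0  # information (variance of the score under H0)
  for (s in unique(t.vec[d.vec == 1])) {
    at.risk <- t.vec >= s                 # risk set at event time s
    events  <- t.vec == s & d.vec == 1    # deaths at time s
    m <- mean(x[at.risk])                 # risk-set mean of x
    v <- mean((x[at.risk] - m)^2)         # risk-set variance (divisor = set size)
    U <- U + sum(x[events]) - sum(events) * m
    I <- I + sum(events) * v
  }
  U / sqrt(I)
}

# Keep the columns of X.mat whose two-sided P-value is below P.value
select_features <- function(t.vec, d.vec, X.mat, P.value = 0.001) {
  X.mat <- as.matrix(X.mat)
  Z <- apply(X.mat, 2, function(x) cox_score_z(t.vec, d.vec, x))
  P <- 2 * pnorm(-abs(Z))
  keep <- P < P.value
  list(Z = Z[keep], P = P[keep])
}

# Toy data: g1 decreases with survival time, g2 simply alternates
t.vec <- 1:20
d.vec <- rep(1, 20)
X.mat <- cbind(g1 = 20:1, g2 = rep(c(1, -1), 10))
res <- select_features(t.vec, d.vec, X.mat, P.value = 0.05)
names(res$Z)  # "g1"
```

A positive Z means that larger feature values go with a higher hazard, matching the sign convention of the `beta` values in the example output.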

Usage

uni.selection(t.vec, d.vec, X.mat, P.value = 0.001, K = 10, score = TRUE, d0 = 0,
              randomize = FALSE, CC.plot = FALSE, permutation = FALSE, M = 200)

Arguments

t.vec

Vector of survival times (time to either death or censoring)

d.vec

Vector of censoring indicators (1=death, 0=censoring)

X.mat

n by p matrix of covariates, where n is the sample size and p is the number of covariates

P.value

P-value threshold for selecting features; a feature is selected if its P-value is below this value

K

The number of cross-validation folds

score

If TRUE, score tests are used; otherwise, Wald tests are used

d0

A nonnegative constant (default 0) to stabilize the variance of the score statistics (Witten & Tibshirani 2010)

randomize

If TRUE, patient IDs are randomized before cross-validation

CC.plot

If TRUE, the compound covariate (CC) predictors are plotted

permutation

If TRUE, the FDR is computed by a permutation method (Witten & Tibshirani 2010; Emura et al. 2018).

M

The number of permutations used to compute the FDR
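The idea behind the permutation FDR (permutation, M) can be sketched as follows: permuting the (time, status) pairs against the covariates destroys any true association, so the number of features that still pass the threshold estimates the number of false discoveries. This is a hedged sketch, not the package's code; `perm_fdr` and its `pval.fun` argument are hypothetical, and capping the estimate at 1 is an assumption.

```r
# Hypothetical sketch of a permutation estimate of the FDR; NOT the
# compound.Cox source. pval.fun(t.vec, d.vec, X.mat) must return one
# P-value per column of X.mat (e.g., from a univariate score test).
perm_fdr <- function(t.vec, d.vec, X.mat, pval.fun, P.value = 0.001, M = 200) {
  n.selected <- sum(pval.fun(t.vec, d.vec, X.mat) < P.value)
  if (n.selected == 0) return(NA_real_)   # FDR undefined with no selections
  n.null <- numeric(M)
  for (b in seq_len(M)) {
    idx <- sample(length(t.vec))          # permute (time, status) pairs jointly
    n.null[b] <- sum(pval.fun(t.vec[idx], d.vec[idx], X.mat) < P.value)
  }
  # average number selected under the null / number actually selected
  min(1, mean(n.null) / n.selected)
}
```

Any per-feature test can be plugged in as `pval.fun`; `M` plays the role of the number of permutations.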

Details

The cross-validated likelihood (CVL) value is computed for the selected features (Matsui 2006; Emura et al. 2018-). A higher CVL value indicates better predictive ability of the selected features, so the CVL can be used to find the optimal set of features. The CVL is computed by K-fold cross-validation, where K is chosen by the user.

The false discovery rate (FDR) is computed by a formula and, if permutation=TRUE, also by a permutation method.

RCVL1 and RCVL2 are "re-substitution" CVL values and provide upper control limits for the CVL. If the CVL is less than both RCVL1 and RCVL2, it is considered in control. If the CVL exceeds either RCVL1 or RCVL2, the CVL may be recomputed after changing the sample allocation (for example, with randomize=TRUE).
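The K-fold CVL can be sketched in base R under an explicit assumption: here it takes the Verweij and van Houwelingen form CVL = sum over folds k of [l(beta^(-k)) - l^(-k)(beta^(-k))], where beta^(-k) is estimated without fold k, l is the log partial likelihood on all data and l^(-k) the one on the training folds. The exact definition in Emura et al. (2018-) may differ in details. The helpers `cox_loglik` and `cvl`, the placeholder `fit.fun` (e.g., returning univariate estimates for features selected on the training folds), and the round-robin fold assignment are all hypothetical.

```r
# Hypothetical sketch of a K-fold cross-validated partial likelihood (CVL);
# assumes the Verweij-van Houwelingen form, NOT the compound.Cox source.

# Cox log partial likelihood for a fixed linear predictor eta (Breslow ties)
cox_loglik <- function(t.vec, d.vec, eta) {
  ll <- 0
  for (i in which(d.vec == 1)) {
    ll <- ll + eta[i] - log(sum(exp(eta[t.vec >= t.vec[i]])))
  }
  ll
}

# fit.fun(t.vec, d.vec, X.mat) returns a coefficient vector from training data
cvl <- function(t.vec, d.vec, X.mat, fit.fun, K = 10, randomize = FALSE) {
  X.mat <- as.matrix(X.mat)
  n <- length(t.vec)
  ord <- if (randomize) sample(n) else seq_len(n)  # optional shuffle of IDs
  fold <- integer(n)
  fold[ord] <- rep_len(seq_len(K), n)              # assign folds 1..K
  total <- 0
  for (k in seq_len(K)) {
    train <- fold != k
    beta <- fit.fun(t.vec[train], d.vec[train], X.mat[train, , drop = FALSE])
    eta <- drop(X.mat %*% beta)                    # predictor for all subjects
    total <- total + cox_loglik(t.vec, d.vec, eta) -
      cox_loglik(t.vec[train], d.vec[train], eta[train])
  }
  total
}
```

With `randomize = TRUE` the fold allocation changes between calls, which is one way to recompute the CVL when it exceeds a control limit.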

Value

gene

Gene symbols

beta

Estimated regression coefficients

Z

Z-values for significance tests

P

P-values for significance tests

CVL

The values of CVL, RCVL1, and RCVL2 (Emura et al. 2018-)

Genes

The number of genes, the number of selected genes, and the number of falsely selected genes

FDR

False discovery rate (by a formula or a permutation method)
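The returned beta values are the ingredients of the compound covariate (CC) predictor (Matsui 2006), which weights each selected feature by its univariate coefficient: CC_i = sum over selected j of beta_j * x_ij. A minimal sketch, assuming `beta` is a named vector like the `$beta` component in the example output and that `X.mat` has matching column names; `cc_predictor` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: compound covariate (CC) predictor for each subject,
# CC_i = sum over selected features j of beta_j * x_ij.
cc_predictor <- function(X.mat, beta) {
  # pick the selected columns, in the same order as beta
  X.sel <- as.matrix(X.mat)[, names(beta), drop = FALSE]
  drop(X.sel %*% beta)
}

X.mat <- cbind(A = c(1, 0, 2), B = c(0, 1, 1), C = c(5, 5, 5))
beta  <- c(A = 0.5, B = -1)      # C was not selected
cc_predictor(X.mat, beta)        # 0.5 -1.0 0.0
```

Because positive Cox coefficients correspond to higher hazard, a larger CC value indicates a higher predicted risk.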

Author(s)

Takeshi Emura

References

Emura T, Matsui S, Chen HY (2018-). compound.Cox: Univariate feature selection and compound covariate for predicting survival. Computer Methods and Programs in Biomedicine, to appear.

Matsui S (2006). Predicting survival outcomes using subsets of significant genes in prognostic marker studies with microarrays. BMC Bioinformatics 7:156.

Witten DM, Tibshirani R (2010). Survival analysis with high-dimensional covariates. Statistical Methods in Medical Research 19:29-51.

Examples

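library(compound.Cox)
data(Lung)
train <- Lung$train == TRUE          # training-set indicator
t.vec <- Lung$t.vec[train]           # survival times
d.vec <- Lung$d.vec[train]           # censoring indicators
X.mat <- Lung[train, -c(1, 2, 3)]    # drop the first three (non-gene) columns
uni.selection(t.vec, d.vec, X.mat, P.value = 0.05, K = 5, score = FALSE)
## the outputs reproduce Table 3 of Emura and Chen (2016) ##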

Example output

Loading required package: numDeriv
Loading required package: survival
$beta
     ANXA5       DLG2     ZNF264      DUSP6      CPEB4        LCK      STAT1 
-1.0876762  1.3215044  0.5473276  0.7524497  0.5891676 -0.8447389 -0.5844262 
      RNF4       IRF4      STAT2        HGF      ERBB3        NF1      FRAP1 
 0.6463635  0.5176704  0.5849869  0.5086750  0.5509026  0.4715235 -0.7696768 
       MMD       HMMR 
 0.9151541  0.5156711 

$Z
    ANXA5      DLG2    ZNF264     DUSP6     CPEB4       LCK     STAT1      RNF4 
-2.885540  2.872880  2.654412  2.628478  2.404015 -2.384028 -2.329287  2.290596 
     IRF4     STAT2       HGF     ERBB3       NF1     FRAP1       MMD      HMMR 
 2.171948  2.155568  2.127643  2.126139  2.074913 -2.045298  2.034407  1.976606 

$P
      ANXA5        DLG2      ZNF264       DUSP6       CPEB4         LCK 
0.003907424 0.004067486 0.007944666 0.008576790 0.016216117 0.017124302 
      STAT1        RNF4        IRF4       STAT2         HGF       ERBB3 
0.019843870 0.021986777 0.029859561 0.031117422 0.033366690 0.033491656 
        NF1       FRAP1         MMD        HMMR 
0.037994593 0.040825466 0.041910555 0.048086199 

$CVL
      CVL     RCVL1     RCVL2 
-96.00449 -83.71309 -85.32446 

$Genes
         No. of genes No. of selected genes 
                   97                    16 

$FDR
P.value * (No. of genes) 
                0.303125 

Warning messages:
1: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
  Loglik converged before variable  1 ; beta may be infinite. 
2: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
  Loglik converged before variable  1 ; beta may be infinite. 
3: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
  Loglik converged before variable  1 ; beta may be infinite. 
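The printed FDR agrees with FDR = P.value × (No. of genes) / (No. of selected genes): the expected number of genes passing the threshold by chance divided by the number actually selected (the output label shows only the numerator). The in-control rule from Details can be checked the same way. The snippet below only re-does this arithmetic on the printed values; the variable names are illustrative.

```r
# Values copied from the example output above
P.value    <- 0.05
n.genes    <- 97
n.selected <- 16
fdr <- P.value * n.genes / n.selected
fdr          # 0.303125, matching $FDR

cvl.out <- c(CVL = -96.00449, RCVL1 = -83.71309, RCVL2 = -85.32446)
in.control <- cvl.out[["CVL"]] < cvl.out[["RCVL1"]] &&
  cvl.out[["CVL"]] < cvl.out[["RCVL2"]]
in.control   # TRUE: the CVL lies below both re-substitution limits
```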

compound.Cox documentation built on July 21, 2018, 5:01 p.m.