Elastic net uses a mixing parameter alpha to tune the penalty term continuously from ridge (alpha=0) to lasso (alpha=1). eNetXplorer generates a family of elastic net models over different values of alpha for the quantitative exploration of the effects of shrinkage. For each alpha, the regularization parameter lambda is chosen by optimizing a quality (objective) function based on out-of-bag cross-validation predictions. Statistical significance of each model, as well as that of individual features within a model, is assigned by comparison to a set of null models generated by random permutations of the response. eNetXplorer fits linear (gaussian), logistic (binomial), multinomial, and Cox regression models.
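As a rough illustration of how alpha interpolates between the two penalties, the sketch below evaluates the elastic net penalty in the parameterization used by glmnet, lambda * [(1 - alpha)/2 * sum(beta^2) + alpha * sum(|beta|)]. The helper name enet_penalty and the coefficient values are hypothetical and are not part of eNetXplorer.

```r
# Elastic net penalty in the glmnet parameterization.
# Illustrative helper only; not an eNetXplorer or glmnet function.
enet_penalty <- function(beta, lambda, alpha) {
  ridge_term <- (1 - alpha) / 2 * sum(beta^2)  # L2 part, the only term at alpha = 0
  lasso_term <- alpha * sum(abs(beta))         # L1 part, the only term at alpha = 1
  lambda * (ridge_term + lasso_term)
}

beta <- c(0.5, -1.0, 0.0, 2.0)  # made-up coefficients for illustration

# Penalty evaluated along the default alpha grid, seq(0, 1, by = 0.2)
penalties <- sapply(seq(0, 1, by = 0.2),
                    function(a) enet_penalty(beta, lambda = 0.1, alpha = a))
```

At alpha=0 only the squared (ridge) term contributes, and at alpha=1 only the absolute-value (lasso) term does.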
eNetXplorer(x, y, family=c("gaussian","binomial","multinomial","cox"),
alpha=seq(0,1,by=0.2), nlambda=100, nlambda.ext=NULL, seed=NULL, scaled=TRUE,
n_fold=5, n_run=100, n_perm_null=25, save_obj=FALSE, dest_dir=getwd(),
dest_dir_create=TRUE, dest_dir_create_recur=FALSE, dest_obj="eNet.Robj",
save_lambda_QF_full=FALSE, QF.FUN=NULL, QF_label=NULL,
cor_method=c("pearson","kendall","spearman"),
binom_method=c("accuracy","precision","recall","Fscore","specificity","auc"),
multinom_method=c("avg accuracy","avg precision","avg recall","avg Fscore"),
binom_pos=NULL, fscore_beta=NULL, fold_distrib_fail.max=100,
cox_index=c("concordance","Dindex"), logrank=FALSE, survAUC=FALSE,
survAUC_time=NULL, ...)

x 
Input numerical matrix with instances as rows and features as columns. Instance and feature labels should be provided as row and column names, respectively. Can be in sparse matrix format (inherit from class 
y 
Response variable. For 
family 
Response type: 
alpha 
Sequence of values for the mixing parameter penalty term in the elastic net family. Default is 
nlambda 
Number of values for the regularization parameter 
nlambda.ext 
If set to a value larger than 
seed 
Sets the pseudorandom number seed to enforce reproducibility. Default is 
scaled 
Z-score transformation of individual features across all instances. Default is 
n_fold 
Number of cross-validation folds per run. 
n_run 
Number of runs (i.e. cross-validated model iterations); for each run, instances are randomly assigned to cross-validation folds. Default is 100. 
n_perm_null 
Number of random null-model permutations of the response per run. Default is 25. 
save_obj 
Logical to save the 
dest_dir 
Destination directory. Default is the working directory. 
dest_dir_create 
Creates destination directory if it does not exist already. Default is 
dest_dir_create_recur 
Creates destination directory recursively if it does not exist already. Default is 
dest_obj 
Name for output 
save_lambda_QF_full 
Full lambda vs QF information is included in the 
QF.FUN 
User-defined quality (objective) function as maximization criterion to select 
QF_label 
Label for user-defined quality function, if 
cor_method 
For 
binom_method 
For 
multinom_method 
For 
binom_pos 
For 
fscore_beta 
For 
fold_distrib_fail.max 
For categorical models, maximum number of failed attempts per run to have all classes represented in each in-bag fold. If this number is exceeded, the execution is halted; try again with larger 
cox_index 
For 
logrank 
For 
survAUC 
For 
survAUC_time 
For 
... 
Accepts parameters from 
For each alpha, a set of nlambda values is obtained using the full data; if provided, nlambda.ext extends the range of lambda values symmetrically while keeping their density uniform in log scale. Using these values of lambda, elastic net cross-validation models are generated for n_run random assignments of instances among n_fold folds; the best lambda is determined by the maximization of a quality (objective) function that compares out-of-bag predictions against the response.
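The selection of lambda by repeated cross-validation can be sketched in base R as follows. This is a simplified stand-in under stated assumptions, not the package's implementation: it uses a closed-form ridge fit without intercept instead of glmnet, Pearson correlation as the quality function, and it assumes that quality function values are averaged over runs before taking the maximum. All names and the simulated data are hypothetical.

```r
set.seed(1)
n <- 60; p <- 5
x <- matrix(rnorm(n * p), n, p)                      # simulated predictors
y <- as.numeric(x %*% c(1, -1, 0.5, 0, 0) + rnorm(n)) # simulated response
lambdas <- 10^seq(2, -2, length.out = 20)             # grid uniform in log scale
n_fold <- 5; n_run <- 10

# Closed-form ridge coefficients (stand-in for an elastic net fit at alpha = 0)
ridge_coef <- function(x, y, lambda) {
  solve(crossprod(x) + lambda * diag(ncol(x)), crossprod(x, y))
}

qf <- matrix(NA_real_, n_run, length(lambdas))
for (r in seq_len(n_run)) {
  # Random assignment of instances to folds for this run
  fold <- sample(rep(seq_len(n_fold), length.out = n))
  for (j in seq_along(lambdas)) {
    pred <- numeric(n)
    for (k in seq_len(n_fold)) {
      oob <- fold == k  # held-out (out-of-bag) instances for this fold
      b <- ridge_coef(x[!oob, , drop = FALSE], y[!oob], lambdas[j])
      pred[oob] <- x[oob, , drop = FALSE] %*% b
    }
    qf[r, j] <- cor(pred, y)  # quality function: out-of-bag predictions vs response
  }
}

mean_qf <- colMeans(qf)                      # average quality over runs (assumption)
best_lambda <- lambdas[which.max(mean_qf)]   # lambda maximizing the quality function
```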
A variety of quality functions are implemented for each response type, namely: for gaussian models, correlation (different correlation methods available); for binomial models, accuracy, precision, recall, F-score, specificity, area-under-curve; for multinomial models, average accuracy, precision, recall, F-score; for Cox regression models, concordance, D-index (Schroeder et al). Some of these choices require additional parameters: binomial measures that are not invariant under class permutation (see Sokolova & Lapalme) require specifying which class is to be considered positive; F-score requires specifying the value of the beta factor to balance precision and recall (F-score equals precision for beta=0 and tends to recall in the large beta limit). Besides these built-in options, user-defined quality functions can be provided via QF.FUN.
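The role of the beta factor can be checked with the standard F-beta definition, F = (1 + beta^2) * precision * recall / (beta^2 * precision + recall). The helper fscore below is a hypothetical illustration, not a function exported by the package, and the precision and recall values are made up.

```r
# Standard F-beta score; illustrative helper, not part of eNetXplorer.
fscore <- function(precision, recall, beta) {
  (1 + beta^2) * precision * recall / (beta^2 * precision + recall)
}

precision <- 0.8
recall <- 0.5

f_beta0 <- fscore(precision, recall, beta = 0)      # reduces to precision
f_beta1 <- fscore(precision, recall, beta = 1)      # harmonic mean of precision and recall
f_large <- fscore(precision, recall, beta = 1000)   # approaches recall for large beta
```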
For each run, using the same assignment of instances into folds, n_perm_null null models are generated by shuffling the response. By using the quality function to compare the out-of-bag performance of the model to that of the null models, an empirical significance p-value is assigned to the model.
Similar procedures yield p-values for individual features based on absolute coefficient magnitude and on the frequency of nonzero coefficients.
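The idea of an empirical p-value from response permutations can be sketched as below. This is a simplified illustration under several assumptions: it uses an ordinary linear model with in-sample fitted values instead of out-of-bag elastic net predictions, and the p-value formula with the +1 correction is a common convention that is assumed here, since the exact formula used by the package is not given in this page. All names and data are hypothetical.

```r
set.seed(42)
n <- 50
x <- rnorm(n)            # simulated predictor
y <- 0.8 * x + rnorm(n)  # simulated response with a true association

# Quality function: Pearson correlation between predictions and response
qf <- function(predicted, response) cor(predicted, response)

# Performance of the model fitted to the observed response
qf_model <- qf(fitted(lm(y ~ x)), y)

# Null models: refit after shuffling the response, keeping the predictor fixed
n_null <- 200
qf_null <- replicate(n_null, {
  y_perm <- sample(y)
  qf(fitted(lm(y_perm ~ x)), y_perm)
})

# Empirical p-value: fraction of null models performing at least as well
p_value <- (1 + sum(qf_null >= qf_model)) / (1 + n_null)
```

A small p_value indicates that the model outperforms most of the null models generated from shuffled responses.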
A family of elastic net models is thus generated for multiple values of alpha spanning the range from ridge (alpha=0) to lasso (alpha=1). This function returns an eNetXplorer object on which summary, plotting and export functions in this package can be applied for further analysis.
For details about the underlying elastic net models (Friedman et al; Zou & Hastie), refer to the glmnet package and references therein.
For more details about eNetXplorer, see Candia & Tsang.
For Cox regression models, setting logrank=T generates cross-validated log-rank test p-values of low- vs high-risk groups, which are defined by the median of out-of-bag predicted risk (Simon et al). Moreover, setting survAUC=T and providing a numerical vector survAUC_time with timepoints of interest generates the AUC from cross-validated time-dependent ROC curves based on out-of-bag predicted risk (Simon et al) using the timeROC package (Blanche et al).
An object with S3 class "eNetXplorer".
predictor 
Predictor matrix used for regression (in sparse matrix format). 
response 
Response variable used for regression. 
family 
Input parameter. 
alpha 
Input parameter. 
nlambda 
Input parameter. 
nlambda.ext 
Input parameter. 
seed 
Input parameter. 
scaled 
Input parameter. 
n_fold 
Input parameter. 
n_run 
Input parameter. 
n_perm_null 
Input parameter. 
QF_label 
Input parameter. 
cor_method 
Input parameter. 
binom_method 
Input parameter. 
multinom_method 
Input parameter. 
binom_pos 
Input parameter. 
fscore_beta 
Input parameter. 
fold_distrib_fail.max 
Input parameter. 
cox_index 
Input parameter. 
logrank 
Input parameter. 
survAUC 
Input parameter. 
survAUC_time 
Input parameter. 
survAUC_method 
Input parameter. 
survAUC_lambda 
Input parameter. 
survAUC_span 
Input parameter. 
instance 
Instance labels. 
feature 
Feature labels. 
glmnet_params 

best_lambda 

model_QF_est 
Quality function values obtained by cross-validation. 
QF_model_vs_null_pval 
P-value from model vs null comparison to assess statistical significance. 
lambda_values 
List of 
lambda_QF_est 
List of quality function values obtained for each 
predicted_values 
List of out-of-bag predicted values for each 
feature_coef_wmean 
Mean of feature coefficients (over runs) weighted by nonzero frequency (over folds) in sparse matrix format, with features as rows and 
feature_coef_wsd 
Standard deviation of feature coefficients (over runs) weighted by nonzero frequency (over folds) in sparse matrix format, with features as rows and 
feature_freq_mean 
Mean of nonzero frequency in sparse matrix format, with features as rows and 
feature_freq_sd 
Standard deviation of nonzero frequency in sparse matrix format, with features as rows and 
null_feature_coef_wmean 
Analogous to 
null_feature_coef_wsd 
Analogous to 
null_feature_freq_mean 
Analogous to 
null_feature_freq_sd 
Analogous to 
feature_coef_model_vs_null_pval 
P-value from model vs null comparison to assess statistical significance of mean nonzero feature coefficients in sparse matrix format, with features as rows and 
feature_freq_model_vs_null_pval 
P-value from model vs null comparison to assess statistical significance of mean nonzero feature frequencies in sparse matrix format, with features as rows and 
logrank_pval 
For Cox regression (if 
AUC_mean 
For Cox regression (if 
AUC_sd 
For Cox regression (if 
AUC_perc025 
For Cox regression (if 
AUC_perc500 
For Cox regression (if 
AUC_perc975 
For Cox regression (if 
AUC_pval 
For Cox regression (if 
Julian Candia and John S. Tsang
Maintainer: Julian Candia julian.candia@nih.gov
Blanche P, Dartigues JF and Jacqmin-Gadda H. Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks, Statistics in Medicine (2013) 32:5381-5397.
Candia J and Tsang JS. eNetXplorer: an R package for the quantitative exploration of elastic net families for generalized linear models, BMC Bioinformatics (2019) 20:189.
Friedman J, Hastie T and Tibshirani R. Regularization paths for generalized linear models via coordinate descent, Journal of Statistical Software (2010) 33:1-22.
Schroeder MS, Culhane AC, Quackenbush J and Haibe-Kains B. survcomp: an R/Bioconductor package for performance assessment and comparison of survival models, Bioinformatics (2011) 27:3206-8.
Simon RM, Subramanian J, Li MC and Menezes S. Using cross-validation to evaluate predictive accuracy of survival risk classifiers based on high-dimensional data, Briefings in Bioinformatics (2011) 12:203-14.
Sokolova M and Lapalme G. A systematic analysis of performance measures for classification tasks, Information Processing and Management (2009) 45:427-437.
Zou H and Hastie T. Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society Series B (2005) 67:301-20.
summary, plot, summaryPDF, export, mergeObj
# Linear models (synthetic dataset comprised of 20 features and 75 instances):
data(QuickStartEx)
fit = eNetXplorer(x=QuickStartEx$predictor, y=QuickStartEx$response,
family="gaussian", n_run=20, n_perm_null=10, seed=111)
# Custom QF provided (negative mean squared error, so that larger values are better)
data(QuickStartEx)
customQF = function(predicted,response){
    -mean((predicted-response)**2)
}
fit = eNetXplorer(x=QuickStartEx$predictor, y=QuickStartEx$response,
family="gaussian", n_run=20, n_perm_null=10, seed=111, QF.FUN=customQF, QF_label="MSE")
# Linear models to predict numerical day-70 H1N1 serum titers based on
# day-7 cell population frequencies:
data(H1N1_Flow)
fit = eNetXplorer(x=H1N1_Flow$predictor_day7, y=H1N1_Flow$response_numer[rownames(
H1N1_Flow$predictor_day7)], family="gaussian", n_run=25, n_perm_null=15, seed=111)
# Binomial models to predict acute myeloid (AML) vs acute lymphoblastic (ALL)
# leukemias:
data(Leukemia_miR)
fit = eNetXplorer(x=Leuk_miR_filt$predictor, y=Leuk_miR_filt$response_binomial,
family="binomial", n_run=25, n_perm_null=15, seed=111)
# Multinomial models to predict acute myeloid (AML), acute B-cell lymphoblastic
# (B-ALL) and acute T-cell lymphoblastic (T-ALL) leukemias:
data(Leukemia_miR)
fit = eNetXplorer(x=Leuk_miR_filt$predictor, y=Leuk_miR_filt$response_multinomial,
family="multinomial", n_run=25, n_perm_null=15, seed=111)
# Binomial models to predict B-ALL vs T-ALL:
data(Leukemia_miR)
fit = eNetXplorer(x=Leuk_miR_filt$predictor[Leuk_miR_filt$response_multinomial!="AML",],
y=Leuk_miR_filt$response_multinomial[Leuk_miR_filt$response_multinomial!="AML"],
family="binomial", n_run=25, n_perm_null=15, seed=111)
# Cox regression models to predict survival based on 7-gene signature:
data(breastCancerSurv)
fit = eNetXplorer(x=breastCancerSurv$predictor, y=breastCancerSurv$response, family="cox",
n_run=25, n_perm_null=15, seed=111)
