Elastic nets use a mixing parameter alpha to tune the penalty term continuously from ridge (alpha=0) to lasso (alpha=1). This function generates a family of elastic net models over different values of alpha for the quantitative exploration of the effects of shrinkage. For each alpha, the regularization parameter lambda is chosen by optimizing a quality function based on out-of-bag cross-validation predictions. Statistical significance of each model, as well as that of individual features within a model, is assigned by comparison to a set of null models generated by random permutations of the response. This function fits linear (gaussian), logistic (binomial) and multinomial models.
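For reference, the roles of alpha and lambda can be read off the objective that the glmnet package minimizes for gaussian models. The form below is glmnet's parameterization; the symbols N (number of instances), beta_0 (intercept) and beta (coefficient vector) are notation introduced here for illustration.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\left(y_i-\beta_0-x_i^{\top}\beta\right)^2
+\lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2+\alpha\,\lVert\beta\rVert_1\right]
```

Setting alpha=0 leaves only the squared (ridge) term, and alpha=1 leaves only the absolute-value (lasso) term.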
x |
Input numerical matrix with instances as rows and features as columns. Instance and feature labels should be provided as row and column names, respectively. Can be in sparse matrix format (inherit from class |
y |
Response variable. For |
family |
Response type: |
alpha |
Sequence of values for the mixing parameter penalty term in the elastic net family. Default is |
nlambda |
Number of values for the regularization parameter |
nlambda.ext |
If set to a value larger than |
seed |
Sets the pseudo-random number seed to enforce reproducibility. Default is |
scaled |
Z-score transformation of individual features across all instances. Default is |
n_fold |
Number of cross-validation folds per run. |
n_run |
Number of runs; for each run, instances are randomly assigned to cross-validation folds. Default is 100. |
n_perm_null |
Number of random null-model permutations of the response per run. Default is 25. |
QF.FUN |
User-defined quality function as maximization criterion to select |
QF_label |
Label for user-defined quality function, if QF.FUN is provided. |
cor_method |
Correlation method to be used in the default quality function |
fold_distrib_fail.max |
For categorical models, maximum number of failed attempts per run to have all classes represented in each in-bag fold. If this number is exceeded, the execution is halted; try again with larger |
... |
Accepts parameters from |
For each alpha, a set of nlambda values is obtained using the full data; if provided, nlambda.ext extends the range of lambda values symmetrically while keeping its density constant (in log scale). Using these values of lambda, elastic net cross-validation models are generated for n_run random assignments of instances among n_fold folds; the optimal lambda is determined by the maximization of a quality function that compares out-of-bag predictions against the response. User-defined quality functions can be provided via QF.FUN; otherwise, sensible defaults are used (e.g. correlation for gaussian models).
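The procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: closed-form ridge regression stands in for the glmnet fits (so only the alpha=0 end is covered, and lambda is not on glmnet's scale), the data are synthetic, and extend_lambda, ridge_predict and oob_predict are hypothetical helper names.

```r
set.seed(111)

# Synthetic data (illustrative only): 60 instances, 5 features.
n <- 60; p <- 5
x <- scale(matrix(rnorm(n * p), n, p))
y <- as.numeric(x %*% c(2, -1, 0, 0, 0.5) + rnorm(n))

# Hypothetical helper: extend a log-spaced lambda grid symmetrically,
# keeping the spacing in log scale unchanged. The number of points added
# per side is rounded up, so the result can exceed nlambda_ext by one.
extend_lambda <- function(lambda, nlambda_ext) {
  n_l <- length(lambda)
  stopifnot(n_l >= 2, nlambda_ext >= n_l)
  log_l <- sort(log(lambda), decreasing = TRUE)
  step <- (log_l[1] - log_l[n_l]) / (n_l - 1)
  n_extra <- ceiling((nlambda_ext - n_l) / 2)
  exp(c(log_l[1] + step * rev(seq_len(n_extra)),
        log_l,
        log_l[n_l] - step * seq_len(n_extra)))
}

# Stand-in model (assumption): closed-form ridge regression with an
# unpenalized intercept, fitted in-bag and used to predict out-of-bag.
ridge_predict <- function(x_in, y_in, x_out, lambda) {
  x_mu <- colMeans(x_in); y_mu <- mean(y_in)
  xc <- sweep(x_in, 2, x_mu)
  beta <- solve(crossprod(xc) + lambda * diag(ncol(xc)),
                crossprod(xc, y_in - y_mu))
  as.numeric(y_mu + sweep(x_out, 2, x_mu) %*% beta)
}

# Out-of-bag predictions for one random assignment of instances to folds.
oob_predict <- function(x, y, lambda, n_fold) {
  fold <- sample(rep_len(seq_len(n_fold), nrow(x)))
  pred <- numeric(nrow(x))
  for (k in seq_len(n_fold)) {
    out <- fold == k
    pred[out] <- ridge_predict(x[!out, , drop = FALSE], y[!out],
                               x[out, , drop = FALSE], lambda)
  }
  pred
}

# Five log-spaced values extended to nine with the same log-spacing.
lambda_grid <- extend_lambda(10^seq(1, -1, length.out = 5), 9)
n_run <- 10; n_fold <- 5

# Quality function: Pearson correlation between out-of-bag predictions and
# the response, averaged over runs for each lambda.
qf <- sapply(lambda_grid, function(l) {
  mean(replicate(n_run, cor(oob_predict(x, y, l, n_fold), y)))
})
best_lambda <- lambda_grid[which.max(qf)]
```

Averaging the quality function over runs before taking the maximum is one plausible reading of the text; the package may aggregate runs differently.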
For each run, using the same assignment of instances into folds, n_perm_null null models are generated by shuffling the response. By using the quality function to compare the out-of-bag performance of the model to that of the null models, an empirical significance p-value is assigned to the model.
Similar procedures yield p-values for individual features based on absolute coefficient magnitude and on the frequency of non-zero coefficients.
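The model-versus-null comparison described above can be illustrated with a short base R sketch. It makes several assumptions that are not taken from the package: ordinary least squares stands in for the elastic net fits, the data are synthetic, only a single run is shown, and the hypothetical helper empirical_pval uses the common "+1" correction, which may differ from how eNetXplorer computes its p-values.

```r
set.seed(111)

# Synthetic data (illustrative only): 60 instances, 3 features.
n <- 60
x <- matrix(rnorm(n * 3), n, 3)
y <- as.numeric(x %*% c(1.5, -1, 0) + rnorm(n))

n_fold <- 5; n_perm_null <- 25
# One run: a single fold assignment shared by the model and its null models.
fold <- sample(rep_len(seq_len(n_fold), n))

# Out-of-bag predictions from a least-squares stand-in model (assumption).
oob_predict <- function(x, y, fold) {
  pred <- numeric(length(y))
  for (k in unique(fold)) {
    out <- fold == k
    fit <- lm.fit(cbind(1, x[!out, , drop = FALSE]), y[!out])
    pred[out] <- cbind(1, x[out, , drop = FALSE]) %*% fit$coefficients
  }
  pred
}

# Quality function: correlation between predictions and response.
qf <- function(pred, resp) cor(pred, resp)

# Hypothetical helper: fraction of null quality values that reach or exceed
# the model's value, with a +1 correction (an assumption of this sketch).
empirical_pval <- function(qf_model, qf_null) {
  (sum(qf_null >= qf_model) + 1) / (length(qf_null) + 1)
}

qf_model <- qf(oob_predict(x, y, fold), y)

# Null models: shuffle the response, refit, and score with the same folds.
qf_null <- replicate(n_perm_null, {
  y_perm <- sample(y)
  qf(oob_predict(x, y_perm, fold), y_perm)
})

pval <- empirical_pval(qf_model, qf_null)
```

With n_perm_null null models per run, the smallest p-value this helper can return is 1/(n_perm_null + 1), which is one reason to pool null models across many runs.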
A family of elastic net models is thus generated for multiple values of alpha, typically spanning the range from ridge (alpha=0) to lasso (alpha=1). This function returns an eNetXplorer object to which the summary, plotting and export functions in this package can be applied for further analysis.
For details about the underlying elastic net models, please refer to the glmnet package and references therein.
An object with S3 class "eNetXplorer".
predictor |
|
response |
|
family |
|
alpha |
|
nlambda |
|
nlambda.ext |
|
seed |
|
scaled |
|
n_fold |
|
n_run |
|
n_perm_null |
|
QF_label |
|
instance |
|
feature |
|
fold_distrib_fail.max |
|
glmnet_params |
|
best_lambda |
|
model_QF_est |
|
QF_model_vs_null_pval |
|
lambda_values |
|
lambda_QF_est |
|
predicted_values |
|
feature_coef_wmean |
|
feature_coef_wsd |
|
feature_freq_mean |
|
feature_freq_sd |
|
null_feature_coef_wmean |
|
null_feature_coef_wsd |
|
null_feature_freq_mean |
|
null_feature_freq_sd |
|
feature_coef_model_vs_null_pval |
|
feature_freq_model_vs_null_pval |
Julian Candia and John S. Tsang
Maintainer: Julian Candia julian.candia@nih.gov
## Not run:
# Gaussian eNetXplorer models (synthetic dataset comprised of 20 features and 75 instances):
data(QuickStartExample)
result = eNetXplorer(x=QuickStartExample$predictor,y=QuickStartExample$response,family="gaussian",n_run=25,n_perm_null=20,seed=111)
# Gaussian eNetXplorer models to predict H1N1 serum titers at day 70 based on cell subpopulation frequencies at day 7:
data(H1N1_Flow)
result = eNetXplorer(x=H1N1_Flow$predictor_day7,y=H1N1_Flow$response[rownames(H1N1_Flow$predictor_day7)],family="gaussian",n_run=25,n_perm_null=20,seed=111)
# Binomial eNetXplorer models to predict acute myeloid (AML) vs acute lymphoblastic (ALL) leukemias:
data(Leukemia_miR)
result = eNetXplorer(x=Leukemia_miR$predictor,y=Leukemia_miR$response_binomial,family="binomial",n_run=25,n_perm_null=20,seed=111)
# Multinomial eNetXplorer models to predict acute myeloid (AML), acute B-cell lymphoblastic (B-ALL) and acute T-cell lymphoblastic (T-ALL) leukemias:
data(Leukemia_miR)
result = eNetXplorer(x=Leukemia_miR$predictor,y=Leukemia_miR$response_multinomial,family="multinomial",n_run=25,n_perm_null=20,seed=111)
# Binomial eNetXplorer models to predict B-ALL vs T-ALL:
data(Leukemia_miR)
result = eNetXplorer(x=Leukemia_miR$predictor[Leukemia_miR$response_multinomial!="AML",],y=Leukemia_miR$response_multinomial[Leukemia_miR$response_multinomial!="AML"],family="binomial",n_run=25,n_perm_null=20,seed=111)
## End(Not run)