grridge: Group-regularized (logistic) ridge regression

Description

This function implements adaptive group-regularized (logistic) ridge regression by use of co-data. It uses co-data to improve predictions of binary and continuous responses from high-dimensional (e.g. genomics) data. Here, co-data is auxiliary information on the variables (e.g. genes), such as annotation or p-values from other studies.
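
To make the idea of group-specific penalties concrete, the following base-R sketch fits a linear ridge model in which each variable's penalty is a global lambda times a multiplier for its group. It is a toy illustration under stated assumptions, not the GRridge implementation: the data are simulated, the group multipliers are fixed by hand rather than estimated from co-data, and all object names are hypothetical.

```r
# Toy illustration of group-specific ridge penalties for a linear response.
# NOT the GRridge algorithm: multipliers are set by hand, not estimated.
set.seed(1)
n <- 20                                  # number of samples
p <- 6                                   # number of variables
X <- matrix(rnorm(n * p), n, p)          # rows = samples here (grridge expects the transpose)
beta_true <- c(2, 2, 2, 0, 0, 0)
y <- drop(X %*% beta_true) + rnorm(n)

# Hypothetical co-data: two groups of variables
groups <- list(informative = 1:3, noise = 4:6)
lambda <- 5                                   # global penalty (assumed value)
mult   <- c(informative = 0.2, noise = 5)     # group multipliers (assumed values)

# Per-variable penalty: global lambda times the multiplier of the variable's group
pen <- numeric(p)
for (g in names(groups)) pen[groups[[g]]] <- lambda * mult[[g]]

# Generalized ridge estimate (X'X + diag(pen))^{-1} X'y
beta_group <- solve(crossprod(X) + diag(pen), crossprod(X, y))

# Ordinary ridge with the same global lambda, for comparison
beta_ridge <- solve(crossprod(X) + diag(lambda, p), crossprod(X, y))

round(cbind(group = drop(beta_group), ridge = drop(beta_ridge)), 3)
```

In this sketch a multiplier below 1 penalizes a group less than ordinary ridge and a multiplier above 1 penalizes it more.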

Usage

grridge(highdimdata, response, partitions, unpenal = ~1, 
        offset=NULL, method="exactstable",
        niter=10, monotone=NULL, optl=NULL, innfold=NULL, 
        fixedfoldsinn=TRUE, maxsel=c(25,100),selectionEN=FALSE,cvlmarg=1,
        savepredobj="all", dataunpen=NULL, ord = 1:length(partitions),
        comparelasso=FALSE,optllasso=NULL,cvllasso=TRUE,
        compareunpenal=FALSE,trace=FALSE,modus=1,
        EBlambda=FALSE,standardizeX = TRUE)       

Arguments

highdimdata

Matrix or numerical data frame. Contains the primary data of the study. Columns are samples, rows are variables (features).
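
Because many data sets are stored with samples in rows, it may help to check the orientation before calling grridge. The sketch below uses simulated data and hypothetical object names; it only illustrates the layout described above.

```r
# Hypothetical data in the common layout: rows = samples, columns = variables
set.seed(1)
samples_by_vars <- matrix(
  rnorm(5 * 100), nrow = 5,
  dimnames = list(paste0("sample", 1:5), paste0("gene", 1:100))
)
response <- factor(c("Normal", "Precursor", "Normal", "Precursor", "Normal"))

# grridge expects variables in rows and samples in columns, so transpose
highdimdata <- t(samples_by_vars)

# The number of response values should equal the number of columns
stopifnot(ncol(highdimdata) == length(response))
dim(highdimdata)   # 100 variables (rows), 5 samples (columns)
```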

response

Factor, numeric, binary or survival. Response values. The number of response values should equal ncol(highdimdata).

partitions

List of lists. Each list component contains a partition of the variables, which is again a list. See details.

unpenal

Formula. Includes unpenalized variables. Set to unpenal = ~0 if an intercept is not desired.

offset

Numeric (vector). Optional offset, either one constant or sample-specific; in the latter case length(offset) should equal ncol(highdimdata).

method

Character. One of "exactstable": the stable, iterative, systems-based method; "stable": the iterative, non-systems-based method; "exact": the non-iterative, systems-based method; "adaptridge": adaptive ridge (not recommended).

niter

Integer. Maximum number of re-penalization iterations.

monotone

Vector of booleans. If the jth component of monotone equals TRUE, then the group-penalties of the jth partition are forced to be monotone. If monotone=NULL, monotonicity is not imposed for any partition.

optl

Numeric. Value of the global regularization parameter (lambda). If specified, it skips optimization by cross-validation.

innfold

Integer. The number of folds for cross-validating the global regularization parameter lambda and for computing cross-validated likelihoods. Defaults to LOOCV.
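
As a rough illustration of what the choice of innfold means for computing time, the base-R sketch below contrasts leave-one-out folds with a random 10-fold assignment. The sample size and the fold assignment scheme are assumptions made for illustration; they do not necessarily match how grridge assigns folds internally.

```r
# Hypothetical sample size
n <- 37

# LOOCV: every sample is its own fold, so n model fits per candidate lambda
loocv_folds <- seq_len(n)

# 10-fold CV: samples are spread over 10 folds, so 10 fits per candidate lambda
set.seed(1)
tenfold <- sample(rep(seq_len(10), length.out = n))

length(unique(loocv_folds))   # number of folds under LOOCV
length(unique(tenfold))       # number of folds under 10-fold CV
table(tenfold)                # fold sizes differ by at most one
```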

fixedfoldsinn

Boolean. Use fixed folds for inner cross-validation?

selectionEN

Boolean. If selectionEN=TRUE then post-hoc variable selection by weighted elastic net is performed.

maxsel

Vector of integers. The maximum number of selected variables. Can be multiple to allow comparing models of various sizes.

cvlmarg

Numeric. Maximum margin (in percentage) that the cross-validated likelihood of the model with selected variables may deviate from the optimum one.

savepredobj

Character. If savepredobj="last", only the last penalized prediction object is saved; if savepredobj="all" all are saved; if savepredobj="none", none are saved.

dataunpen

Data frame. Optional data for unpenalized variables.

ord

Integer vector. The order in which the partitions in partitions are used.

comparelasso

Boolean. If comparelasso=TRUE the results of lasso regression are included.

optllasso

Numeric. Value of the global regularization parameter (lambda) in the lasso. If specified, optimization by cross-validation is skipped.

cvllasso

Boolean. If cvllasso=TRUE it returns the cross-validated likelihood for lasso when comparelasso=TRUE.

compareunpenal

Boolean. If compareunpenal=TRUE the results of regression with unpenalized covariates only are included. Only relevant when dataunpen is specified.

trace

Boolean. If trace=TRUE the results of the cross-validation for parameter (lambda) tuning are shown.

modus

Integer. Please use modus=1. Only use modus=2 when backward compatibility with versions <= 1.6 is desired.

EBlambda

Boolean. If EBlambda=TRUE global lambda is estimated by empirical Bayes (currently only available for linear model).

standardizeX

Boolean. If standardizeX=TRUE variables in X are standardized prior to the analysis.

Details

About partitions: this is a list of partitions, or one partition represented as a simple list. Each partition is a (named) list that contains, per group, the indices (row numbers) of the variables in that group. Such a partition is usually created by CreatePartition.

About savepredobj: use savepredobj="all" if you want to compare performances of the various predictors (e.g. ordinary ridge, group-regularized ridge, group-regularized ridge + selection) using grridgeCV.

About monotone: we recommend setting the jth component of monotone to TRUE when the jth partition is based on external p-values, test statistics or regression coefficients. This increases stability of the predictions.

About selectionEN and maxsel: if selectionEN=TRUE, EN selection will, for each element m of maxsel, select m or fewer variables. Note that EN is only used for selection; the final predictive model is a group-ridge model fitted only on the selected variables using the penalties estimated by GRridge. Using multiple values for maxsel allows comparing models of various sizes, also in terms of cross-validated performance when using grridgeCV.

About cvlmarg: we recommend using values between 0 and 2. A larger value will generally result in fewer selected variables by forward selection.

About innfold: for large data sets, considerable computing time may be saved by setting innfold=10 instead of the default leave-one-out cross-validation (LOOCV).

About method: "exactstable" is recommended. If the number of variables is not very large, say <2000, the faster non-iterative "exact" method can be used as an alternative.

grridge uses the penalized package to fit logistic and survival ridge models; glmnet is used for linear response and for fitting lasso when comparelasso=TRUE.
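
The partition structure described above (a named list holding the row indices of the variables in each group) can be mimicked in base R as follows. This is a sketch with a hypothetical annotation vector; in practice CreatePartition is the intended way to build partitions, and its output may carry additional structure not shown here.

```r
# Hypothetical annotation for 8 variables (one label per row of highdimdata)
annotation <- factor(c("Island", "Shore", "Distant", "Island",
                       "Distant", "Shore", "Island", "Distant"))

# One partition: a named list with the row indices of the variables in each group
cpg <- split(seq_along(annotation), annotation)

# 'partitions' as a list of partitions (here only one)
partitions <- list(cpg = cpg)
str(partitions)

# Every variable belongs to exactly one group in this toy partition
stopifnot(all(sort(unlist(cpg, use.names = FALSE)) == seq_along(annotation)))
```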

Value

A list object containing:

true

True values of the response

cvfit

Measure of fit. Cross-validated likelihoods from the iterations for logistic and survival model; minus CV error for linear model.

lambdamults

List of lists object containing the penalty multipliers per group per partition

optl

Global penalty parameter lambda

lambdamultvec

Vector with penalty multipliers per variable

predobj

List of prediction objects

betas

Estimated regression coefficients

reslasso

Results of the lasso. NULL when comparelasso=FALSE

resEN

Results of the Elastic Net selection for all elements of maxsel. list() when selectionEN=FALSE

model

Model used for fitting: logistic, linear or survival

arguments

Arguments used to call the function

allpreds

Predictions on the same data

Author(s)

Mark A. van de Wiel

References

Mark van de Wiel, Tonje Lien, Wina Verlaat, Wessel van Wieringen, Saskia Wilting. (2016). Better prediction by use of co-data: adaptive group-regularized ridge regression. Statistics in Medicine, 35(3), 368-81.

Novianti PW, Snoek B, Wilting SM, van de Wiel MA (2017). Better diagnostic signatures from RNAseq data through use of auxiliary co-data. Bioinformatics, 33, 1572-1574.

See Also

Creating partitions: CreatePartition; Cross-validation for assessing predictive performance: grridgeCV.

Examples

## NOTE: 
## 1. EXAMPLE DEVIATES SOMEWHAT FROM THE EXAMPLE IN THE MANUSCRIPT IN ORDER TO SHOW SOME
##    OTHER FUNCTIONALITIES.
## 2. HERE WE SHOW A SIMPLE EXAMPLE FROM THE FARKAS DATA SET 
## MORE EXTENSIVE EXAMPLES OF FUNCTIONALITIES IN THE GRRIDGE PACKAGE ARE PROVIDED IN 
## THE VIGNETTE DOCUMENTATION FILE


## 1ST EXAMPLE: Farkas DATA, USING ANNOTATION: DISTANCE TO CpG

##load data objects:
##datcenFarkas: methylation data for cervix samples (arcsine-transformed beta values)
##respFarkas: binary response (Normal and Precursor)
##CpGannFarkas: annotation of probes according to location
##(CpG-Island, North-Shelf, South-Shelf, North-Shore, South-Shore, Distant) 
data(dataFarkas)

##Create list of partition(s), here only one partition included
partitionFarkas <- list(cpg=CreatePartition(CpGannFarkas))

##Group-regularized ridge applied to data datcenFarkas, 
##response respFarkas and partition partitionFarkas. 
##Saves the prediction objects from ordinary and group-regularized ridge.
##Includes unpenalized intercept by default.

#grFarkas <- grridge(datcenFarkas,respFarkas, optl=5.680087,
#                      partitionFarkas,monotone=FALSE)

## 2ND EXAMPLE: Verlaat DATA, USING P-VALUES AND SIGN OF EFFECT FROM FARKAS DATA
## see vignette documentation file!

GRridge documentation built on Nov. 8, 2020, 5:47 p.m.