optPenaltyGLMmultiT.kCVauto: Automatic search for optimal penalty parameters of the targeted ridge GLM estimator.

View source: R/ridgeGLMandCo.R

Automatic search for optimal penalty parameters of the targeted ridge GLM estimator.

Description

This function finds the optimal penalty parameters of the multi-targeted ridge regression estimator of the generalized linear model parameter. The optimum is defined as the minimizer of the cross-validated loss associated with this estimator.
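
As an illustration of the criterion being minimized, the following minimal sketch (not the porridge implementation; all names are assumptions) evaluates the k-fold cross-validated sum-of-squares of a single-target ridge estimator for the linear model. optPenaltyGLMmultiT.kCVauto minimizes the analogous loss jointly over one penalty per column of the target matrix.

# minimal sketch, not the package implementation: the k-fold cross-validated
# sum-of-squares of a single-target linear ridge estimator
cvSOS <- function(lambda, Y, X, target, folds){
    loss <- 0
    for (fold in folds){
        # targeted ridge estimate from the samples outside the fold
        Xtrain <- X[-fold, , drop=FALSE]
        bHat   <- solve(crossprod(Xtrain) + lambda * diag(ncol(X)),
                        crossprod(Xtrain, Y[-fold]) + lambda * target)
        # accumulate the sum-of-squares over the left-out samples
        loss   <- loss + sum((Y[fold] - X[fold, , drop=FALSE] %*% bHat)^2)
    }
    return(loss)
}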

Usage

optPenaltyGLMmultiT.kCVauto(Y, X, lambdaInit, model="linear", targetMat,
                      folds=makeFoldsGLMcv(min(10, length(Y)), Y, model=model),
                      loss="loglik", lambdaMin=10^(-5),
                      minSuccDiff=10^(-5), maxIter=100)     

Arguments

Y

A numeric vector representing the response.

X

The design matrix. The number of rows should match the number of elements of Y.

lambdaInit

A numeric vector giving the starting values for the search of the optimal penalty parameters, one per target.

model

A character, either "linear" and "logistic" (a reference to the models currently implemented), indicating which generalized linear model model instance is to be fitted.

targetMat

A matrix with targets for the regression parameter as columns.
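
For illustration, with betasPrior a hypothetical previously obtained estimate of the regression parameter, a two-column target matrix combining it with the zero target (which amounts to regular ridge) may be formed as:

# hypothetical example: a prior estimate and the zero vector as targets
targetMat <- cbind(betasPrior, rep(0, length(betasPrior)))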

folds

A list, with each item representing a fold: an integer vector indexing the samples that comprise the fold. This object can be generated with the makeFoldsGLMcv function.
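
For instance, balanced five-fold splits for a logistic model may be generated by the package's own helper:

# five folds, stratified for the binary response
folds <- makeFoldsGLMcv(5, Y, model="logistic")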

loss

A character, either "loglik" or "sos", specifying the loss criterion to be used in the cross-validation. Used only if model="linear".
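
For the linear model one may thus, as a hedged sketch (Y, X, and the two-column targetMat being hypothetical, appropriately dimensioned objects), tune under the sum-of-squares criterion:

# tune under the sum-of-squares instead of the log-likelihood criterion
optLambdas <- optPenaltyGLMmultiT.kCVauto(Y, X, c(50, 0.1), model="linear",
                                          targetMat=targetMat, loss="sos")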

lambdaMin

A positive numeric, the lower bound of the search interval of the ridge penalty parameters.

minSuccDiff

A numeric, the minimum difference between the log-likelihoods of two successive iterations required for convergence. Used only if model="logistic".

maxIter

A numeric specifying the maximum number of iterations. Used only if model="logistic".

Value

The function returns an all-positive numeric vector containing the cross-validated optimal penalty parameters. By default the average log-likelihood over the left-out samples serves as the cross-validation criterion. If model="linear", the average sum-of-squares over the left-out samples is offered as an alternative criterion (loss="sos").

Author(s)

W.N. van Wieringen.

References

van Wieringen, W.N. and Binder, H. (2022), "Sequential learning of regression models by penalized estimation", accepted.

Examples

# set the sample size
n <- 50

# set the true parameter
betas <- (c(0:100) - 50) / 20

# generate covariate data
X <- matrix(rnorm(length(betas)*n), nrow=n)

# sample the response
linPred <- tcrossprod(betas, X)[1,]
probs   <- exp(linPred) / (1 + exp(linPred))
Y       <- rbinom(n, 1, probs)

# create targets
targets <- cbind(betas/2, rep(0, length(betas)))

# tune the penalty parameter
### optLambdas <- optPenaltyGLMmultiT.kCVauto(Y, X, c(50,0.1),
###                                           folds=makeFoldsGLMcv(5, Y, model="logistic"),
###                                           targetMat=targets, model="logistic",
###                                           minSuccDiff=10^(-3))

# estimate the logistic regression parameter
### bHat <- ridgeGLMmultiT(Y, X, lambdas=optLambdas, targetMat=targets, model="logistic") 
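
# as a possible follow-up (hedged), fitted success probabilities from the
# estimated logistic regression parameter
### pHat <- exp(X %*% bHat) / (1 + exp(X %*% bHat))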
