cv.c2plasso: Perform k-fold cross validation for the...
In rakheon/c2plasso: Pliable Lasso and Structural Varying-coefficient Regression


View source: R/c2plasso.R

Perform k-fold cross-validation for the continuous-categorical pliable lasso (c2plasso) model over a sequence of regularization parameter values.

cv.c2plasso(
  X,
  Z,
  Y,
  df_Z,
  kfold = 10,
  lambda_seq = NULL,
  alpha = 0.5,
  tt = 0.1,
  zlinear = TRUE,
  tol = 1e-07,
  iter = 500,
  cvseed = NULL
)

`X`	N by p matrix of main predictors
`Z`	N by K matrix of modifying variables. Modifying variables can be continuous, categorical, or a mix of both. Categorical variables should be coded as 0-1 dummy variables.
`Y`	response vector of length N
`df_Z`	vector of degrees of freedom for each group of modifying variables. Continuous and binary variables have one degree of freedom; a categorical variable with C categories has (C-1) degrees of freedom. For example, if there is one continuous modifying variable, one binary modifying variable and one categorical modifying variable with 4 levels coded as 3 dummy variables, then df_Z = c(1,1,3).
`kfold`	the number of folds (=k) for the k-fold cross-validation. Default value is 10.
`lambda_seq`	sequence of values for the tuning parameter lambda. Can be a vector or a scalar.
`alpha`	weight parameter between group penalty and individual penalty. Default value is 0.5.
`tt`	learning rate for the gradient descent procedure. Default value is 0.1.
`zlinear`	if TRUE, the linear terms of the modifying variables are included; these terms are not regularized. Default value is TRUE.
`tol`	tolerance for convergence. Convergence is determined by the value of the objective function: abs(objective_old - objective_new) is compared with the tolerance value. Default value is 1e-7.
`iter`	maximum number of iterations for each lambda value. Default value is 500.
`cvseed`	if specified, the seed for the random sampling in the cross-validation procedure is fixed, so the result is reproducible. If unspecified, the result is not reproducible. Default value is NULL.
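The dummy coding of `Z` and the matching `df_Z` can be built together from a data frame of modifying variables. The following is a minimal sketch in base R, assuming numeric columns are continuous or already 0-1 binary and factor columns are categorical with the first level as the reference. The helper name `build_Z` is hypothetical and not part of the c2plasso package.

```r
# Hypothetical helper (not part of c2plasso): expand a data frame of
# modifying variables into a 0-1 dummy-coded matrix Z and the matching df_Z.
build_Z <- function(mods) {
  blocks <- list()
  df_Z <- integer(0)
  for (nm in names(mods)) {
    v <- mods[[nm]]
    if (is.factor(v)) {
      # C levels -> C - 1 dummy columns (first level is the reference)
      d <- model.matrix(~ v)[, -1, drop = FALSE]
      colnames(d) <- paste0(nm, levels(v)[-1])
      blocks[[nm]] <- d
      df_Z <- c(df_Z, nlevels(v) - 1L)
    } else {
      # Continuous or binary variable: one column, one degree of freedom
      blocks[[nm]] <- matrix(v, ncol = 1, dimnames = list(NULL, nm))
      df_Z <- c(df_Z, 1L)
    }
  }
  list(Z = do.call(cbind, blocks), df_Z = df_Z)
}

# The df_Z example: one continuous, one binary and one
# 4-level categorical modifying variable
mods <- data.frame(
  cont = rnorm(8),
  bin = c(0, 1, 0, 1, 1, 0, 0, 1),
  cat = factor(c("a", "b", "c", "d", "a", "b", "c", "d"))
)
res <- build_Z(mods)
res$df_Z      # 1 1 3
ncol(res$Z)   # 5
```

Because the columns of each group are placed next to each other in the same order as `df_Z`, the entries of `df_Z` add up to the number of columns of `Z`.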

lambda_seq: lambda sequence used in the algorithm

beta_mat: p by (length of lambda_seq) matrix of estimated beta for the scaled and centered main predictors. Each column is the fitted beta vector for one lambda value, in the order of lambda_seq. If lambda_seq is a scalar, the output is a p-dimensional vector of fitted beta.

theta_mat: p by K by (length of lambda_seq) array of estimated theta for scaled and centered main predictors and modifying variables.

beta0_vec: intercept term

theta0_vec: coefficient for the linear terms of the modifying variables. If zlinear = FALSE, the output is a vector of zeros.

beta_raw_mat: estimated beta for raw main predictors (non-standardized)

theta_raw_mat: estimated theta for raw modifying variables (non-standardized)

beta0_raw_vec: intercept term (non-standardized)

theta0_raw_vec: coefficient for the linear terms of the modifying variables (non-standardized)

lambda_min: the lambda value in lambda_seq that minimizes the cross-validated mean squared error (cvm).

lambda_1se: the largest lambda value in lambda_seq whose cross-validated mean squared error is within one standard error of the minimum

cvm: the cross-validated mean squared error for each value in lambda_seq

cvse: the standard error of cvm for each value in lambda_seq

cvfold: (kfold) by (length of lambda_seq) matrix of the mean squared error of the test set for each fold

sqerror: N by (length of lambda_seq) matrix of the squared error
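The selection of lambda_min and lambda_1se can be sketched in base R. This is a minimal sketch assuming that cvm is the column mean of cvfold and cvse is its standard error across folds, the convention used by glmnet; the package's own computation may differ (for instance, by averaging sqerror over observations). The helper `select_lambda` and the toy `cvfold` values are hypothetical.

```r
# Hypothetical helper: choose lambda_min and lambda_1se from a
# (kfold) by (length of lambda_seq) matrix of per-fold test MSE.
# Assumes cvm = column means of cvfold, cvse = their standard errors.
select_lambda <- function(cvfold, lambda_seq) {
  kfold <- nrow(cvfold)
  cvm <- colMeans(cvfold)
  cvse <- apply(cvfold, 2, sd) / sqrt(kfold)
  i_min <- which.min(cvm)
  # One-standard-error rule: keep every lambda whose CV error is at most
  # cvm[i_min] + cvse[i_min], then take the largest (most regularized) one
  within_1se <- cvm <= cvm[i_min] + cvse[i_min]
  list(
    cvm = cvm,
    cvse = cvse,
    lambda_min = lambda_seq[i_min],
    lambda_1se = max(lambda_seq[within_1se])
  )
}

# Toy input: 3 folds, 3 lambda values
lambda_seq <- c(1, 0.5, 0.25)
cvfold <- rbind(c(2.0, 1.0, 0.9),
                c(2.2, 1.1, 1.1),
                c(1.8, 1.0, 1.0))
sel <- select_lambda(cvfold, lambda_seq)
sel$lambda_min  # 0.25
sel$lambda_1se  # 0.5
```

Here lambda = 0.25 has the smallest mean error (1.0), but lambda = 0.5 (mean error about 1.033) lies within one standard error of it, so the larger, sparser-model value is reported as lambda_1se.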

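library(c2plasso)

# Simulated data: N = 100 observations, p = 5 main predictors,
# 3 continuous modifiers and 2 categorical modifiers with 4 levels each
x <- matrix(rnorm(100*5, 0, 1), 100, 5)
z1 <- matrix(rnorm(100*3, 0, 1), 100, 3)
z2 <- matrix(as.factor(sample(0:3, 100*2, prob = c(1/4,1/4,1/4,1/4), replace = TRUE)), 100, 2)
# Expand each categorical modifier into 3 dummy variables (first level is the reference)
z2 <- as.data.frame(model.matrix(~., data = as.data.frame(z2))[, -1])
z <- as.matrix(cbind(z1, z2))
y <- 2*x[,1] - (2+2*z[,1])*x[,2] + (2+3*z[,4]+2*z[,5]-2*z[,6])*x[,3] + rnorm(100, 0, 1)
# df_Z: 1 df for each continuous modifier, 3 df for each 4-level categorical modifier
cv.c2plasso(X=x, Z=z, Y=y, df_Z=c(1,1,1,3,3), lambda_seq=c(1,0.5))
# Fix the seed for the cross-validation sampling so the result is reproducible
cv.c2plasso(X=x, Z=z, Y=y, df_Z=c(1,1,1,3,3), lambda_seq=c(1,0.5), cvseed=1)
# Exclude the unpenalized linear terms of the modifying variables
cv.c2plasso(X=x, Z=z, Y=y, df_Z=c(1,1,1,3,3), lambda_seq=c(1,0.5), zlinear=FALSE)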
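The `cvseed = 1` call is reproducible because fixing the seed fixes the random split of the observations into folds. The sketch below shows one common fold-assignment scheme in base R; it is an assumption about the mechanism, the helper `make_folds` is hypothetical, and the package's own fold assignment may differ.

```r
# Hypothetical helper: assign N observations to kfold folds of
# near-equal size; a fixed cvseed makes the assignment repeatable.
make_folds <- function(N, kfold = 10, cvseed = NULL) {
  # Note: set.seed() changes the global random number generator state
  if (!is.null(cvseed)) set.seed(cvseed)
  sample(rep(seq_len(kfold), length.out = N))
}

f1 <- make_folds(100, kfold = 10, cvseed = 1)
f2 <- make_folds(100, kfold = 10, cvseed = 1)
identical(f1, f2)  # TRUE
table(f1)          # 10 observations in each of the 10 folds
```

Without a seed, each call draws a different split, so the per-fold errors, and potentially the selected lambda values, change from run to run.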