gesso.fit: gesso fit
In gesso: Hierarchical GxE Interactions in a Regularized Regression Model

Fits the gesso model over a two-dimensional grid of hyperparameters lambda_1 and lambda_2 and returns the estimated coefficients for each pair of hyperparameters.

gesso.fit(G, E, Y, C = NULL, normalize = TRUE, normalize_response = FALSE,
          grid = NULL, grid_size = 20, grid_min_ratio = NULL, 
          alpha = NULL, family = "gaussian", weights = NULL,
          tolerance = 1e-3, max_iterations = 5000, 
          min_working_set_size = 100,
          verbose = FALSE)

`G`	matrix of main effects of size `n x p`, variables organized by columns
`E`	vector of environmental measurements
`Y`	outcome vector. Set `family="gaussian"` for the continuous outcome and `family="binomial"` for the binary outcome with 0/1 levels
`C`	matrix of confounders of size `n x m`, variables organized by columns
`normalize`	`TRUE` to normalize matrix `G` and vector `E`
`normalize_response`	`TRUE` to normalize vector `Y`
`grid`	grid sequence for tuning hyperparameters; the same grid is used for `lambda_1` and `lambda_2`
`grid_size`	specify `grid_size` to generate grid automatically. Grid is generated by calculating `max_lambda` from the data (smallest lambda such that all the coefficients are zero). `min_lambda` is calculated as a product of `max_lambda` and `grid_min_ratio`. The program then generates `grid_size` values equidistant on the log10 scale from `min_lambda` to `max_lambda`
`grid_min_ratio`	parameter to determine `min_lambda` (smallest value for the grid of lambdas), default is 0.1 for p > n, 0.01 otherwise
`alpha`	if `NULL` independent 2D grid is used for (`lambda_1`, `lambda_2`), else 1D grid is used where `lambda_2` = `alpha` * `lambda_1`, i.e. (`lambda_1`, `alpha` * `lambda_1`)
`family`	`"gaussian"` for continuous outcome and `"binomial"` for binary
`tolerance`	tolerance for the dual gap convergence criterion
`max_iterations`	maximum number of iterations
`min_working_set_size`	minimum size of the working set
`weights`	inner fitting parameter
`verbose`	`TRUE` to print messages
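
The grid construction described for `grid_size`, `grid_min_ratio` and `alpha` can be sketched in base R as follows. This is only an illustration, not the package's internal code: the value of `max_lambda` is made up here (the package computes it from the data), and the names `pairs_2d` and `pairs_1d` are hypothetical. The ordering of the pairs in an actual fit may also differ from what `expand.grid` produces.

```r
# Hypothetical inputs: gesso.fit derives max_lambda from the data;
# here it is fixed by hand purely for illustration.
max_lambda <- 2.0
grid_min_ratio <- 0.01
grid_size <- 20

# min_lambda is the product of max_lambda and grid_min_ratio
min_lambda <- max_lambda * grid_min_ratio

# grid_size values equidistant on the log10 scale from min_lambda to max_lambda
grid <- 10^seq(log10(min_lambda), log10(max_lambda), length.out = grid_size)

# alpha = NULL: every (lambda_1, lambda_2) combination drawn from the same grid
pairs_2d <- expand.grid(lambda_1 = grid, lambda_2 = grid)

# alpha given: lambda_2 = alpha * lambda_1, so the search is one-dimensional
alpha <- 0.5
pairs_1d <- data.frame(lambda_1 = grid, lambda_2 = alpha * grid)
```

With these settings the independent 2D grid has `grid_size^2` pairs, while the `alpha` variant has only `grid_size`, which is why fixing `alpha` shortens the fit considerably.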

A list of estimated coefficients and other model fit metrics for each pair of hyperparameters (lambda_1, lambda_2):

`beta_0`	vector of estimated intercept values, one per (`lambda_1`, `lambda_2`) pair
`beta_e`	vector of estimated environment coefficients, one per (`lambda_1`, `lambda_2`) pair
`beta_g`	matrix of estimated main effect coefficients organized by rows: one row per (`lambda_1`, `lambda_2`) pair, `p` columns
`beta_gxe`	matrix of estimated interaction coefficients organized by rows: one row per (`lambda_1`, `lambda_2`) pair, `p` columns
`beta_c`	matrix of estimated confounder coefficients organized by rows: one row per (`lambda_1`, `lambda_2`) pair, `m` columns, where `m` is the number of confounders
`num_iterations`	number of iterations until convergence for each fit
`working_set_size`	maximum number of variables in the working set for each fit
`has_converged`	1 if the model converged within given `max_iterations`, 0 otherwise
`objective_value`	objective function (loss) value for each fit
`beta_g_nonzero`	number of estimated non-zero main effects for each fit
`beta_gxe_nonzero`	number of estimated non-zero interactions for each fit
`lambda_1`	`lambda_1` path values, decreasing
`lambda_2`	`lambda_2` path values, oscillating
`grid`	vector of values used for hyperparameter tuning
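
Because every returned element is indexed by fit (one entry or row per hyperparameter pair), the coefficients for a particular pair can be looked up through `lambda_1` and `lambda_2`. The sketch below uses a small hand-built list with the documented field names in place of a real `gesso.fit` result; all numbers in it are made up, and `target_l1`, `target_l2`, `idx`, `selected_g` and `not_converged` are illustrative names only.

```r
# Hand-built stand-in with the documented field names; a real object would
# come from gesso.fit(). All numbers are invented for illustration.
fit <- list(
  lambda_1 = c(1.0, 1.0, 0.1, 0.1),
  lambda_2 = c(1.0, 0.1, 0.1, 1.0),
  has_converged = c(1, 1, 1, 0),
  beta_g = rbind(c(0, 0, 0), c(0, 0.2, 0), c(0.3, 0.2, 0), c(0.1, 0, 0)),
  beta_gxe = rbind(c(0, 0, 0), c(0, 0.1, 0), c(0, 0.1, 0), c(0, 0, 0))
)

# Locate the fit for one hyperparameter pair; compare with a tolerance
# rather than == because the grid values are floating point.
target_l1 <- 0.1
target_l2 <- 0.1
idx <- which(abs(fit$lambda_1 - target_l1) < 1e-8 &
             abs(fit$lambda_2 - target_l2) < 1e-8)

# Row idx of beta_g holds the p main effect coefficients for that pair
main_effects <- fit$beta_g[idx, ]
selected_g <- which(main_effects != 0)

# Fits that hit max_iterations without converging
not_converged <- which(fit$has_converged == 0)
```

Checking `has_converged` before interpreting a row is a reasonable precaution, since a fit that stopped at `max_iterations` may not satisfy the `tolerance` criterion.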

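library(gesso)

# simulate a dataset and fit the model over the default grid
data = data.gen()
fit = gesso.fit(G=data$G_train, E=data$E_train, Y=data$Y_train, normalize=TRUE)
# number of non-zero main effects (black) and interactions (red) along the path
plot(fit$beta_g_nonzero, pch=19, cex=0.4,
     ylab="num of non-zero features", xlab="lambdas path")
points(fit$beta_gxe_nonzero, pch=19, cex=0.4, col="red")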