shim: Fit Strong Heredity Interaction Model

Description Usage Arguments Details Value Note Author(s) See Also Examples

View source: R/models.R

Description

function to fit the Strong Heredity Interaction Model for a sequence of tuning parameters. This is a penalized regression method that ensures the interaction term is non-zero only if its corresponding main-effects are non-zero.

Usage

1
2
3
4
5
6
shim(x, y, main.effect.names, interaction.names, family = c("gaussian",
  "binomial", "poisson"), weights, lambda.factor = ifelse(nobs < nvars, 0.01,
  1e-06), lambda.beta = NULL, lambda.gamma = NULL, nlambda.gamma = 10,
  nlambda.beta = 10, nlambda = 100, threshold = 1e-04, max.iter = 100,
  initialization.type = c("ridge", "univariate"), center = TRUE,
  normalize = TRUE, verbose = TRUE, cores = 2)

Arguments

x

Design matrix of dimension n x q, where n is the number of subjects and q is the total number of variables; each row is an observation vector. This must include all main effects and interactions as well, with column names corresponding to the names of the main effects (e.g. x1, x2, E) and their interactions (e.g. x1:E, x2:E). All columns should be scaled to have mean 0 and variance 1; this is done internally by the shim function.

y

response variable. For family="gaussian" should be a 1 column matrix or numeric vector. For family="binomial", if the response is a vector it can be numeric with 0 for failure and 1 for success, or a factor with the first level representing "failure" and the second level representing "success". Alternatively, For binomial logistic regression, the response can be a matrix where the first column is the number of "successes" and the second column is the number of "failures".

main.effect.names

character vector of main effects names. MUST be ordered in the same way as the column names of x. e.g. if the column names of x are "x1","x2" then main.effect.names = c("x1","x2")

interaction.names

character vector of interaction names. MUST be separated by a colon (e.g. x1:x2), AND MUST be ordered in the same way as the column names of x

family

response type. see y for details. Currently only family = "gaussian" is implemented.

weights

observation weights. Can be total counts if responses are proportion matrices. Default is 1 for each observation. Currently NOT IMPLEMENTED

lambda.factor

The factor for getting the minimal lambda in lambda sequence, where min(lambda) = lambda.factor * max(lambda). max(lambda) is the smallest value of lambda for which all coefficients are zero. The default depends on the relationship between N (the number of rows in the matrix of predictors) and p (the number of predictors). If N > p, the default is 1e-6, close to zero. If N < p, the default is 0.01. A very small value of lambda.factor will lead to a saturated fit.

lambda.beta

sequence of tuning parameters for the main effects. If NULL (default), this function will automatically calculate a sequence using the shim_once function which will be over a grid of tuning parameters for gamma as well. If the user specifies a sequence then this function will not automatically perform the serach over a grid. You will need to create the grid yourself e.g. repeat the lambda.gamma for each value of lambda.beta

lambda.gamma

sequence of tuning parameters for the interaction effects. Default is NULL which means this function will automatically calculate a sequence of tuning paramters. See shim_once for details on how this sequence is calculated.

nlambda.gamma

number of tuning parameters for gamma. This needs to be specified even for user defined inputs

nlambda.beta

number of tuning parameters for beta. This needs to be specified even for user defined inputs

nlambda

total number of tuning parameters. If lambda.beta = NULL and lambda.gamma = NULL then nlambda should be equal to nlambda.beta x nlambda.gamma. This is important to specify especially when a user defined sequence of tuning parameters is set.

threshold

Convergence threshold for coordinate descent. Each coordinate-descent loop continues until the change in the objective function after all coefficient updates is less than threshold. Default value is 1e-4.

max.iter

Maximum number of passes over the data for all tuning parameter values; default is 100.

initialization.type

The procedure used to estimate the regression coefficients and used in the uni_fun function. If "univariate" then a series of univariate regressions is performed with the response variable y. If "ridge" then ridge regression is performed using the cv.glmnet function and the tuning parameter is chosen using 10 fold cross validation. The default is "ridge".

center

Should x and y be centered. Default is TRUE. Centering y applies to family="gaussian" only.

normalize

Should x be scaled to have unit variance. Default is TRUE

verbose

Should iteration number and vector of length nlambda be printed to console? Default is TRUE. 0 represents the algorithm has not converged for the pair of tuning parameters lambda.beta and lambda.gamma and 1 means it has converged

cores

The number of cores to use for certain calculations in the shim function, i.e. at most how many child processes will be run simultaneously using the parallel package. Must be at least one, and parallelization requires at least two cores. Default is 2.

Details

the index of the tuning parameters is as follows. If for example there are 10 lambda_gammas, and 20 lambda_betas, then the first lambda_gamma gets repeated 20 times. So the first twenty entries of tuning parameters correspond to 1 lambda_gamma and the 20 lambda_betas

Value

An object with S3 class "shim"

b0

Intercept sequence of length nlambda

beta

A nvars x nlambda matrix of main effects (β) coefficients, stored in sparse column format ("CsparseMatrix")

alpha

A nvars x nlambda matrix of interaction effects (α) coefficients, stored in sparse column format ("CsparseMatrix")

gamma

A nvars x nlambda matrix of (γ) coefficients, stored in sparse column format ("CsparseMatrix")

lambda.beta

The sequence of tuning parameters used for the main effects

lambda.gamma

The sequence of tuning parameters used for the interaction effects

tuning.parameters

2 x nlambda matrix of tuning parameters. The first row corresponds to lambda.beta and the second row corresponds to lambda.gamma

dfbeta

list of length nlambda where each element gives the index of the nonzero β coefficients

dfalpha

list of length nlambda where each element gives the index of the nonzero α coefficients

x

x matrix

y

response data

bx

column means of x matrix

by

mean of response

sx

column standard deviations of x matrix

call

the call to the function

nlambda.gamma

nlambda.gamma

nlambda.beta

nlambda.beta

nlambda

nlambda

interaction.names

interaction names

main.effect.names

main effect names

Note

if the user specifies lambda.beta and lambda.gamma then they this will not take all possible combinations of lambda.beta and lambda.gamma. It will be the first element of each as a pair, and so on. This is done on purpose for use with the cv.shim function which uses the same lambda sequences for each fold.

Author(s)

Sahir Bhatnagar

Maintainer: Sahir Bhatnagar sahir.bhatnagar@mail.mcgill.ca

See Also

shim_once

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
# number of observations
n <- 100

# number of predictors
p <- 5

# environment variable
e <- sample(c(0,1), n, replace = T)

# main effects
x <- cbind(matrix(rnorm(n*p), ncol = p), e)

# need to label columns
dimnames(x)[[2]] <- c("x1","x2","x3","x4","x5","e")

# design matrix without intercept (can be user defined interactions)
X <- model.matrix(~(x1+x2+x3)*e+x1*x4+x3*x5-1, data = as.data.frame(x))

# names must appear in the same order as X matrix
interaction_names <- grep(":", colnames(X), value = T)
main_effect_names <- setdiff(colnames(X), interaction_names)

# response
Y <- X %*% rbinom(ncol(X), 1, 0.6) + 3*rnorm(n)

# standardize data
data_std <- standardize(X,Y)

result <- shim(x = data_std$x, y = data_std$y,
            main.effect.names = main_effect_names,
            interaction.names = interaction_names)

sahirbhatnagar/shim documentation built on May 29, 2019, 12:59 p.m.