stat.lasso_coefdiff_bin: Importance statistics based on regularized logistic...

stat.lasso_coefdiff_bin    R Documentation

Importance statistics based on regularized logistic regression with cross-validation

Description

Fits a logistic regression model via penalized maximum likelihood and cross-validation, then computes the difference statistic

W_j = |Z_j| - |\tilde{Z}_j|

where Z_j and \tilde{Z}_j are the coefficient estimates for the jth variable and its knockoff, respectively. The value of the regularization parameter λ is selected by cross-validation and computed with glmnet.

Usage

stat.lasso_coefdiff_bin(X, X_k, y, cores = 2, ...)

Arguments

X

n-by-p matrix of original variables.

X_k

n-by-p matrix of knockoff variables.

y

vector of length n, containing the response variables. It should be either a factor with two levels, or a two-column matrix of counts or proportions (the second column is treated as the target class; for a factor, the last level in alphabetical order is the target class). If y is presented as a vector, it will be coerced into a factor.

cores

Number of cores used to compute the statistics by running cv.glmnet. If not specified, the number of cores is set to approximately half of the number of cores detected by the parallel package.

...

additional arguments specific to glmnet (see Details).
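
The target-class rule for y described above can be checked in base R before calling the function. The following is only a minimal sketch: the vectors y_chr and y_num are made-up illustrations, and the coercion shown is plain factor(), which is assumed here to mirror what happens when y is given as a vector.

```r
# Hypothetical response vectors, used only for illustration
y_chr <- c("control", "case", "case", "control")
y_num <- c(0, 1, 1, 0)

# factor() sorts the levels, so the last level becomes the target class
f_chr <- factor(y_chr)
f_num <- factor(y_num)

target_chr <- tail(levels(f_chr), 1)  # "control", not "case"
target_num <- tail(levels(f_num), 1)  # "1"

print(levels(f_chr))
print(target_chr)
print(target_num)
```

If the intended target class is not the last level alphabetically (as with "case" here), the levels can be reordered explicitly, for example with factor(y_chr, levels = c("control", "case")).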

Details

This function uses the glmnet package to fit the penalized logistic regression path and is a wrapper around the more general stat.glmnet_coefdiff.

The statistics W_j are constructed by taking the difference between the absolute values of the coefficients of the j-th variable and its knockoff.
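
As a rough illustration of the formula W_j = |Z_j| - |\tilde{Z}_j| from the Description, the sketch below computes W from a made-up coefficient vector. The values in Z are hypothetical; in the actual function the coefficients would come from the cross-validated glmnet fit on the augmented matrix [X, X_k], and the package may apply additional steps that are not shown here.

```r
# Hypothetical coefficients from a fit on cbind(X, X_k) with p = 4:
# the first p entries belong to the original variables,
# the last p entries to their knockoffs (intercept already dropped).
p <- 4
Z <- c(1.2, 0.0, -0.8,  0.1,   # originals
       0.3, 0.0,  0.0, -0.5)   # knockoffs

orig  <- 1:p
knock <- (p + 1):(2 * p)

# Difference of absolute coefficient values, one statistic per variable
W <- abs(Z[orig]) - abs(Z[knock])
print(W)
```

A large positive W_j suggests that the original variable carries more signal than its knockoff, while a negative or zero value gives no evidence in favor of the variable.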

By default, the value of the regularization parameter is chosen by 10-fold cross-validation.

The optional nlambda parameter can be used to control the granularity of the grid of λ's. The default value of nlambda is 500.

For a complete list of the available additional arguments, see cv.glmnet and glmnet.

Value

A vector of statistics W of length p.

See Also

Other statistics: stat.forward_selection(), stat.glmnet_coefdiff(), stat.glmnet_lambdadiff(), stat.lasso_coefdiff(), stat.lasso_lambdadiff_bin(), stat.lasso_lambdadiff(), stat.random_forest(), stat.sqrt_lasso(), stat.stability_selection()

Examples

library(knockoff)

set.seed(2022)
p=200; n=100; k=15
mu = rep(0,p); Sigma = diag(p)
X = matrix(rnorm(n*p),n)
nonzero = sample(p, k)
beta = 3.5 * (1:p %in% nonzero)
pr = 1/(1+exp(-X %*% beta))
y = rbinom(n,1,pr)
knockoffs = function(X) create.gaussian(X, mu, Sigma)

# Basic usage with default arguments
result = knockoff.filter(X, y, knockoffs=knockoffs, 
                           statistic=stat.lasso_coefdiff_bin)
print(result$selected)

# Advanced usage with custom arguments
foo = stat.lasso_coefdiff_bin
k_stat = function(X, X_k, y) foo(X, X_k, y, nlambda=200)
result = knockoff.filter(X, y, knockoffs=knockoffs, statistic=k_stat)
print(result$selected)


knockoff documentation built on Aug. 15, 2022, 9:06 a.m.